CWL when:/pickValue → Galaxy branching translation
Audience: a Mold author looking at a summary-cwl.json whose steps carry when: predicates and/or whose workflow outputs use pickValue, deciding which Galaxy translation to recommend.
The three honest translations
CWL has two related branching mechanisms with no 1:1 gxformat2 equivalent (until galaxy#22222 — see cwl-pickvalue-to-galaxy):
when:on a step — execute conditionally on a JS predicate.pickValue:on a step input or workflow output — fan in N candidate sources and pickfirst_non_null/the_only_non_null/all_non_null.
Three Galaxy-idiomatic translations are available; each is honest for a different source shape.
Translation A — paired_or_unpaired collection (preferred when the discriminator is paired-vs-single)
When the CWL when: predicates discriminate the paired-vs-single mode of read inputs (the seqprep-subwf pattern: single_reads: File? triggers single mode; absence triggers paired mode), the entire branching collapses into a single workflow with a paired_or_unpaired (or list:paired_or_unpaired for batches) input.
- The mode-switch parameter disappears.
- The per-step
when:predicates disappear. - The
pickValuefan-in disappears — both branches feed the same wrapper, branching happens inside the wrapper viahas_single_item. - Result: one workflow, fewer steps, no synthetic merge.
See [[galaxy-paired-or-unpaired-collections]] for the type itself and the decision rule.
Translation B — native pick_value workflow step (preferred for non-collection-shaped branches, post galaxy#22222)
When the CWL pickValue is on the workflow output (or a step input) but the discriminator is not a collection-shape concern (e.g., it’s a parameter-driven mode, or branches that produce different artifact kinds, or all_non_null aggregation), use Galaxy’s native pick_value workflow module.
- Each upstream branch keeps a per-step gxformat2
when:predicate. - A
type: pick_valuestep withstate.mode: <cwl_mode>fans in the branch outputs. - Workflow output reads from the
pick_valuestep’soutput.
See [[cwl-pickvalue-to-galaxy]] for the full mapping (mode table, gxformat2 surface, gotchas, version floor).
Translation C — sibling workflows per mode (preferred when IWC convention exists or modes diverge structurally)
When the IWC corpus already publishes the same upstream pipeline as two separate workflow files (one per mode), follow the IWC convention. Same applies when the two CWL branches diverge structurally beyond a single fan-in point (different downstream steps per mode, not just different sources to the same output).
- Each Galaxy
.gxwf.ymlis a linear workflow with no branching. - Two sibling files:
<name>-paired.gxwf.yml,<name>-single.gxwf.yml. - Downstream consumers (and registries) pick the workflow that matches their data shape.
Evidence: EBI-Metagenomics/pipeline-v5 → IWC ports amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-quality-control-paired-end and …-single-end.
Picking among A / B / C
Decision walk for the data-flow Mold:
- Is the discriminator paired-vs-single reads (or a near-cousin: optional File? read inputs gated by
when:)? → A. Translation B/C are fallbacks ifpaired_or_unpairedcannot model the shape (e.g., scatter/nesting depth limit; see[[galaxy-paired-or-unpaired-collections]]§“Limitation: only deepest rank”). - Does an IWC port of the same upstream pipeline exist as sibling workflows? → C, unless updating to
paired_or_unpaired(A) is explicitly an upgrade. Surface both options incompare-against-iwc-exemplar’s comparison notes. - Does the workflow output use
pickValue: all_non_null(collection aggregation), or does the branching produce structurally divergent outputs that need first-non-null/the-only-non-null fan-in? → B. - None of the above (custom predicate, non-paired-mode
when:) → B (default to nativepick_value). Use C only if the branches share no downstream structure.
Anti-patterns to call out in the brief
- Synthesizing a “pick non-empty” custom Galaxy tool. With galaxy#22222 (
pick_valuemodule), this is no longer needed and is now an anti-pattern. The Foundry’s prior data-flow briefs that recommended this should be updated. - Emitting a workflow-level
reads_modeselect parameter when the discriminator is collection-shape. Use translation A instead. - Translating CWL
first_non_nullto Galaxyfirst_or_skipwithout flagging that this is a Galaxy-only semantic (first_or_skipemits a “skipped” HDA on all-null; CWLfirst_non_nullerrors). Don’t swap modes silently.
Source-side anti-patterns to surface
CWL sources sometimes carry mode-switch branching that is a CWL accommodation rather than a real design choice. Surface these as anti-pattern flags during the data-flow brief:
- Bespoke per-pipeline utility scripts (
count_lines.py, ad-hoc renamers) that emit scalar metrics consumable only via CWL’s typed workflow outputs. Galaxy convention: drop them in favor of MultiQC / fastp JSON / FastQC reports. - Decompress-just-to-pass-through chains (CWL
multiple-gunzip.cwlstep that exists because the next CWL tool refuses gzipped input). Galaxy tools commonly accept.fastq.gzdirectly; verify before keeping the unzip step. Where IWC keeps it (e.g.,__UNZIP_COLLECTION__in MGnify amplicon ports), the downstream is intentional.
Galaxy version floor
Translation B (native pick_value) requires Galaxy ≥ main 2026-03-31 (galaxy#22222 merge date). Workflows using type: pick_value will not import on older releases. Translations A and C have no recent version floor (A requires paired_or_unpaired support, present since PR #19377 / late 2024).
Citations
[[cwl-pickvalue-to-galaxy]]— full PR analysis of galaxy#22222 (the nativepick_valuemodule).[[galaxy-paired-or-unpaired-collections]]— the discriminated-union collection type and thepaired ⊏ paired_or_unpairedsubtyping lattice.- IWC sibling-workflows convention:
amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-quality-control-paired-endand…-single-endin the IWC corpus. - Run-dir evidence for the discriminator-classification trap:
casts/claude/skills/_emulated-runs/mgnify-seqprep-subwf/cwl-galaxy-interface.md(the under-read Option A) vs the now-recommended translation A.
Evidence quality
- Corpus-observed (concrete): IWC sibling-workflows precedent for the paired-vs-single MGnify pipeline.
- Galaxy-source-observed (concrete via referenced notes):
pick_valuesemantics,paired_or_unpairedsubtyping. - Foundry-inference (marked): the three-translation taxonomy and the decision walk are inferred from one run + IWC evidence; refine after more CWL→Galaxy test-drives.
- Anti-pattern catalog: corpus-grounded for
count_lines.py(one run) and unzip-just-to-pass-through (same run); expect drift as more CWL fixtures land.