Nextflow Operators To Galaxy Collection Recipes
Most Nextflow operators are not Galaxy tools. Translate them first as source-side data-flow intent, then decide whether the Galaxy representation is simple wiring, collection semantics, an explicit Galaxy step, or a user-review checkpoint.
Decision Vocabulary
| Label | Meaning |
|---|---|
| channel-only rewiring | The operator disappears into Galaxy connections, labels, branch wiring, or output selection. |
| Galaxy collection semantics | Translation relies on collection identifiers, collection type, map-over, reduction, or nesting behavior. |
| explicit Galaxy step | Add a collection-operation, tabular, text-processing, or domain tool step. |
| user review | Translation is likely lossy or semantically ambiguous. |
Operator Recipes
| Nextflow operator | Galaxy recipe | Class | Confidence |
|---|---|---|---|
| `map` | Treat metadata-only projection as wiring. Use Apply Rules only when materialized identifiers or collection shape must change. Use an explicit tool if file contents or table rows change. | Usually channel-only; sometimes collection semantics or explicit step | High for metadata-only, medium for identifier parsing |
| `join` | If two Galaxy collections have matching identifiers and structure, use normal multi-input map-over. Otherwise add synchronization by extracting, sorting, filtering, or relabeling identifiers. Use tabular joins for row-key joins. | Collection semantics or explicit step | High for file/index pairing, medium for loose joins |
| `groupTuple` | If grouping represents nested collection structure, model collection nesting or Apply Rules. If grouping is scatter/gather for a domain operation, implement the downstream merge/reduce tool. | Collection semantics plus explicit reduction | High for interval gather, medium for arbitrary grouping |
| `branch` | Use branch wiring only for static workflow-level classes. Per-element conditional routing usually needs explicit filters/classifiers or user review. | Channel-only, explicit step, or review | Medium |
| `mix` | Keep report/version aggregation as wiring. Use `__MERGE_COLLECTION__` or `__BUILD_LIST__` only when a materialized collection is required. | Usually channel-only; explicit collection assembly when materialized | High for report/version aggregation |
| `combine` | With `by:`, treat like keyed pairing. Without `by:`, treat as Cartesian expansion and require review unless a specific Galaxy cross-product recipe is intended. | Collection semantics or review | Medium-low for unkeyed combine |
| `multiMap` | Usually split one tuple into separate synchronized Galaxy inputs. Preserve enough edge notes to rejoin later if needed. | Usually channel-only | Medium |
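
The `join` and `groupTuple` recipes both depend on identifier matching, which can be illustrated with a small model. The following is a minimal Python sketch, not Galaxy or Nextflow code: collections are modeled as plain dicts keyed by element identifier, and the helper names `join_by_identifier` and `group_tuple` as well as the sample file names are hypothetical.

```python
from collections import defaultdict


def join_by_identifier(left, right):
    """Pair elements of two collections that share an element identifier.

    Returns (pairs, unmatched). A non-empty `unmatched` list means the two
    collections are not aligned and need synchronization before map-over.
    """
    pairs = {key: (left[key], right[key]) for key in left if key in right}
    unmatched = sorted(set(left) ^ set(right))
    return pairs, unmatched


def group_tuple(items):
    """Group (key, value) tuples by key, ignoring size and sort options."""
    groups = defaultdict(list)
    for key, value in items:
        groups[key].append(value)
    return dict(groups)


# File/index pairing: only identifiers present on both sides are paired.
bams = {"sampleA": "sampleA.bam", "sampleB": "sampleB.bam"}
indexes = {"sampleA": "sampleA.bai", "sampleC": "sampleC.bai"}
pairs, unmatched = join_by_identifier(bams, indexes)

# Scatter/gather: per-interval outputs are regrouped per sample.
shards = [("sampleA", "chr1.vcf"), ("sampleB", "chr1.vcf"), ("sampleA", "chr2.vcf")]
grouped = group_tuple(shards)
```

In this model, an empty `unmatched` list corresponds to the case where normal multi-input map-over should suffice; anything else suggests the synchronization path in the `join` recipe.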
User-Review Triggers
- `branch` has non-trivial predicates, an `unknown`/default branch, or discarded branch data.
- `join` uses optional/remainder behavior, duplicate keys, non-metadata keys, or mismatch-tolerant settings.
- `combine` lacks `by:` and may imply all-vs-all expansion.
- `groupTuple` uses explicit size, sort, or `groupKey` behavior.
- `map` mutates metadata, parses identifiers with regex, returns variable arity, or hides content-level computation.
- `mix` combines branches with overlapping identifiers or unclear ordering.
- Any Groovy closure transforms file bytes or table rows.
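
Some of these triggers can be detected from operator parameters alone. The sketch below shows one way such a check might look in Python. It assumes a hypothetical summary format in which each operator use is available as a name plus a dict of its parameters; the function name `review_reasons` and the `REVIEW_PARAMS` table are illustrative, and triggers that require reading closure bodies (predicates, regex parsing, content-level computation) are out of scope.

```python
# Parameters whose presence signals a review trigger (illustrative subset only).
REVIEW_PARAMS = {
    "join": ("remainder",),
    "groupTuple": ("size", "sort"),
}


def review_reasons(operator, params):
    """Return parameter-level review triggers for one operator use."""
    reasons = []
    if operator == "combine" and "by" not in params:
        reasons.append("combine without by: may imply all-vs-all expansion")
    for name in REVIEW_PARAMS.get(operator, ()):
        # Only flag parameters that are actually set to a truthy value.
        if params.get(name):
            reasons.append(f"{operator} sets {name}")
    return reasons


print(review_reasons("combine", {}))
print(review_reasons("groupTuple", {"size": 2, "sort": True}))
```

An empty result does not mean the translation is safe; it only means none of the parameter-level triggers fired.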
Evidence
Pinned Nextflow fixtures provide direct examples of all covered operators except some edge cases of combine and arbitrary map closures:
- `workflow-fixtures/pipelines/nf-core__taxprofiler/main.nf` for `map`, `branch`, `mix`, and collection split/merge behavior.
- `workflow-fixtures/pipelines/nf-core__taxprofiler/subworkflows/local/visualization_krona/main.nf` for keyed `combine` plus `multiMap`.
- `workflow-fixtures/pipelines/nf-core__sarek/subworkflows/local/*/main.nf` for `join`, interval `groupTuple`, and scatter/gather patterns.
- `workflow-fixtures/pipelines/nf-core__fetchngs/workflows/sra/main.nf` for accession/data-fetching branches and reductions.
Galaxy-side evidence:
- galaxy-collection-semantics for map-over and reduction behavior.
- iwc-transformations-survey for cleanup-after-fanout, identifier synchronization, collection flattening, and corpus-observed collection recipes.
- galaxy-collection-tools for built-in collection operations.
- galaxy-apply-rules-dsl for identifier-derived collection reshaping.
- iwc-tabular-operations-survey for cases where operator translation leaves collection-land and becomes tabular/text transformation.
Low-Confidence Areas
- Unkeyed `combine` can be represented by Galaxy cross-product collection tools, but the IWC survey found little or no corpus uptake. Prefer review.
- `branch` cleanup via null filtering is possible but weakly attested; avoid claiming it as the default pattern.
- Arbitrary `map` closures cannot be safely translated from syntax alone. Summarization should classify closure intent when possible.
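
To make the cost of unkeyed `combine` concrete, the following Python sketch models it as a Cartesian product over two identifier-keyed collections. It is an illustration of the all-vs-all semantics only and does not reproduce the output shape of any particular Galaxy cross-product tool; the function name `cross_product` and the sample and database names are hypothetical.

```python
from itertools import product


def cross_product(left, right):
    """Pair every element of `left` with every element of `right`.

    The result is keyed by (left_id, right_id), so its size is
    len(left) * len(right).
    """
    return {
        (l_id, r_id): (left[l_id], right[r_id])
        for l_id, r_id in product(left, right)
    }


samples = {"s1": "s1.fastq", "s2": "s2.fastq", "s3": "s3.fastq"}
databases = {"kraken": "kraken.db", "centrifuge": "centrifuge.db"}
pairs = cross_product(samples, databases)
```

Because every output element carries a compound identifier, downstream steps must be able to cope with paired or nested identifiers, which is part of why review is preferred here.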
Mold Use
- nextflow-summary-to-galaxy-data-flow should use this as the primary operator-translation reference.
- implement-galaxy-tool-step should use this when operator decisions become concrete Galaxy collection or tabular steps.
- debug-galaxy-workflow-output should use this when wrong nesting, missing elements, branch merges, or gather outputs indicate a bad operator translation.
TODOs
- Decide whether `summary-nextflow.schema.json` should record operator parameters such as `by`, `remainder`, `failOnMismatch`, `size`, and `sort`.
- Consider a dedicated pattern for Nextflow per-element `branch` to Galaxy conditionals/filtering.
- Decide whether unkeyed `combine` always requires review.