
Nextflow operators to Galaxy collection recipes

Classifies common Nextflow operators as Galaxy wiring, collection semantics, explicit steps, or review triggers.

Revised: 2026-05-02 | Rev: 1 | component


Most Nextflow operators are not Galaxy tools. Translate each operator first into its source-side data-flow intent, then decide whether the Galaxy representation is simple wiring, collection semantics, an explicit Galaxy step, or a user-review checkpoint.

Decision Vocabulary

Label | Meaning
channel-only rewiring | The operator disappears into Galaxy connections, labels, branch wiring, or output selection.
Galaxy collection semantics | Translation relies on collection identifiers, collection type, map-over, reduction, or nesting behavior.
explicit Galaxy step | Add a collection-operation, tabular, text-processing, or domain tool step.
user review | Translation is likely lossy or semantically ambiguous.
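The four labels can be read as a small decision procedure over what one operator use does at the source level. The Python sketch below is illustrative only: Decision, OperatorIntent, classify, and the precedence order (ambiguity first, then content changes, then identifier or shape changes) are assumptions made for this example, not part of any Galaxy or Nextflow API.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The four labels from the Decision Vocabulary table."""

    CHANNEL_ONLY = "channel-only rewiring"
    COLLECTION_SEMANTICS = "Galaxy collection semantics"
    EXPLICIT_STEP = "explicit Galaxy step"
    USER_REVIEW = "user review"


@dataclass
class OperatorIntent:
    """Hypothetical source-side summary of one operator use."""

    operator: str
    ambiguous: bool = False            # lossy or semantically unclear translation
    changes_content: bool = False      # file contents or table rows change
    changes_identifiers: bool = False  # element identifiers must be materialized or rewritten
    changes_shape: bool = False        # collection type or nesting must change


def classify(intent: OperatorIntent) -> Decision:
    """Pick one label; the precedence order is an assumption of this sketch."""
    if intent.ambiguous:
        # Ambiguity wins: stop at a user-review checkpoint before building anything.
        return Decision.USER_REVIEW
    if intent.changes_content:
        # Content-level computation needs a real tool step.
        return Decision.EXPLICIT_STEP
    if intent.changes_identifiers or intent.changes_shape:
        # Identifier or shape changes rely on collection semantics (e.g. Apply Rules).
        return Decision.COLLECTION_SEMANTICS
    # Metadata-only data flow disappears into connections and labels.
    return Decision.CHANNEL_ONLY


if __name__ == "__main__":
    print(classify(OperatorIntent("map")).value)  # channel-only rewiring
    print(classify(OperatorIntent("map", changes_identifiers=True)).value)  # Galaxy collection semantics
```

Real operator uses often need more than one label (for example, a join that needs a sort step before map-over), so returning a single value is a deliberate simplification.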

Operator Recipes

Nextflow operator | Galaxy recipe | Class | Confidence
map | Treat metadata-only projection as wiring. Use Apply Rules only when materialized identifiers or collection shape must change. Use an explicit tool if file contents or table rows change. | Usually channel-only; sometimes collection semantics or explicit step | High for metadata-only, medium for identifier parsing
join | If two Galaxy collections have matching identifiers and structure, use normal multi-input map-over. Otherwise add synchronization by extracting, sorting, filtering, or relabeling identifiers. Use tabular joins for row-key joins. | Collection semantics or explicit step | High for file/index pairing, medium for loose joins
groupTuple | If grouping represents nested collection structure, model collection nesting or Apply Rules. If grouping is scatter/gather for a domain operation, implement the downstream merge/reduce tool. | Collection semantics plus explicit reduction | High for interval gather, medium for arbitrary grouping
branch | Use branch wiring only for static workflow-level classes. Per-element conditional routing usually needs explicit filters/classifiers or user review. | Channel-only, explicit step, or review | Medium
mix | Keep report/version aggregation as wiring. Use __MERGE_COLLECTION__ or __BUILD_LIST__ only when a materialized collection is required. | Usually channel-only; explicit collection assembly when materialized | High for report/version aggregation
combine | With by:, treat like keyed pairing. Without by:, treat as Cartesian expansion and require review unless a specific Galaxy cross-product recipe is intended. | Collection semantics or review | Medium-low for unkeyed combine
multiMap | Usually split one tuple into separate synchronized Galaxy inputs. Preserve enough edge notes to rejoin later if needed. | Usually channel-only | Medium
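The join and groupTuple rows can be sketched with a toy model in Python, where a flat list collection is an ordered dict from element identifier to dataset path. The model and all function names are hypothetical. It assumes that linked map-over pairs elements position by position, which is why the identifiers must match in the same order; a sort stands in for the synchronization step, and string concatenation stands in for a downstream merge/reduce tool.

```python
from typing import Dict, List, Tuple

# Toy model, not Galaxy's data model: a flat list collection is an
# insertion-ordered dict of element identifier -> dataset path.
ListCollection = Dict[str, str]


def aligned(a: ListCollection, b: ListCollection) -> bool:
    """True when both collections carry the same identifiers in the same order."""
    return list(a) == list(b)


def synchronize(coll: ListCollection) -> ListCollection:
    """One possible synchronization step: sort elements by identifier."""
    return dict(sorted(coll.items()))


def linked_map_over(a: ListCollection, b: ListCollection) -> List[Tuple[str, str, str]]:
    """Pair two flat collections element by element, as a keyed join would.

    Assumption of this sketch: pairing is positional, so identifiers must already line up.
    """
    if not aligned(a, b):
        raise ValueError("identifiers differ; add a sort, filter, or relabel step first")
    return [(ident, a[ident], b[ident]) for ident in a]


def group_tuple(items: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Model groupTuple on (key, file) tuples as a nested list:list collection."""
    nested: Dict[str, List[str]] = {}
    for key, path in items:
        nested.setdefault(key, []).append(path)
    return nested


def gather(nested: Dict[str, List[str]]) -> ListCollection:
    """Reduce each inner list to one dataset; the join stands in for a merge tool."""
    return {key: "+".join(paths) for key, paths in nested.items()}


if __name__ == "__main__":
    bams = {"s2": "s2.bam", "s1": "s1.bam"}
    bais = {"s1": "s1.bam.bai", "s2": "s2.bam.bai"}
    print(aligned(bams, bais))  # False: same identifiers, different order
    print(linked_map_over(synchronize(bams), synchronize(bais)))
    per_interval = [("s1", "s1.chr1.vcf"), ("s1", "s1.chr2.vcf"), ("s2", "s2.chr1.vcf")]
    print(gather(group_tuple(per_interval)))
```

The file/index pairing case only needs the sort; looser joins (dropped or duplicated keys) are where the synchronization step stops being trivial.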

User-Review Triggers

  • branch has non-trivial predicates, an unknown/default branch, or discarded branch data.
  • join uses optional/remainder behavior, duplicate keys, non-metadata keys, or mismatch-tolerant settings.
  • combine lacks by: and may imply all-vs-all expansion.
  • groupTuple uses explicit size, sort, or groupKey behavior.
  • map mutates metadata, parses identifiers with regex, returns variable arity, or hides content-level computation.
  • mix combines branches with overlapping identifiers or unclear ordering.
  • Any Groovy closure transforms file bytes or table rows.
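Most of these triggers can be checked mechanically once operator options and a few closure facts are recorded. A minimal Python sketch, assuming a hypothetical record with a params dict (Nextflow options such as by, remainder, size, and sort) and a flags collection of summarizer-derived facts; the flag names and messages are invented for illustration and are not the summary-nextflow.schema.json format.

```python
from typing import Any, Dict, List, Set

# Hypothetical record for one operator use: "params" mirrors Nextflow operator
# options; "flags" are illustrative facts a summarizer might record.
OperatorUse = Dict[str, Any]


def review_triggers(use: OperatorUse) -> List[str]:
    """Return the user-review triggers that fire for one operator use."""
    op: str = use["operator"]
    params: Dict[str, Any] = use.get("params", {})
    flags: Set[str] = set(use.get("flags", ()))
    found: List[str] = []
    if op == "branch" and flags & {"nontrivial_predicate", "default_branch", "discards_data"}:
        found.append("branch routing is not a static workflow-level split")
    if op == "join" and (params.get("remainder") or flags & {"duplicate_keys", "non_metadata_key"}):
        found.append("join relies on remainder, duplicate, or non-metadata keys")
    if op == "combine" and "by" not in params:
        found.append("combine without by: may imply all-vs-all expansion")
    if op == "groupTuple" and ("size" in params or "sort" in params or "groupKey" in flags):
        found.append("groupTuple uses size, sort, or groupKey")
    if op == "map" and flags & {"mutates_metadata", "regex_identifiers", "variable_arity"}:
        found.append("map closure changes metadata or identifiers")
    if op == "mix" and flags & {"overlapping_identifiers", "unclear_ordering"}:
        found.append("mix merges overlapping or unordered branches")
    if "content_transform" in flags:
        # Applies to any operator whose closure rewrites file bytes or table rows.
        found.append("closure transforms file bytes or table rows")
    return found


if __name__ == "__main__":
    print(review_triggers({"operator": "combine", "params": {}}))
    print(review_triggers({"operator": "combine", "params": {"by": 0}}))  # []
```

An empty result only means no recorded fact fired; closures whose intent was never summarized still need a human look.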

Evidence

Pinned Nextflow fixtures provide direct examples of every covered operator, apart from some combine edge cases and arbitrary map closures:

  • workflow-fixtures/pipelines/nf-core__taxprofiler/main.nf for map, branch, mix, and collection split/merge behavior.
  • workflow-fixtures/pipelines/nf-core__taxprofiler/subworkflows/local/visualization_krona/main.nf for keyed combine plus multiMap.
  • workflow-fixtures/pipelines/nf-core__sarek/subworkflows/local/*/main.nf for join, interval groupTuple, and scatter/gather patterns.
  • workflow-fixtures/pipelines/nf-core__fetchngs/workflows/sra/main.nf for accession/data-fetching branches and reductions.

Galaxy-side evidence:

Low-Confidence Areas

  • Unkeyed combine can be represented by Galaxy cross-product collection tools, but the IWC survey found little or no corpus uptake. Prefer review.
  • branch cleanup via null filtering is possible but weakly attested; avoid claiming it as the default pattern.
  • Arbitrary map closures cannot be safely translated from syntax alone. Summarization should classify closure intent when possible.
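Why unkeyed combine defaults to review is easiest to see by counting outputs. In this Python sketch, plain lists stand in for channels and the function names are hypothetical; combine_by_key approximates combine(by: 0), crossing items only within a shared key and emitting (key, left, right).

```python
from itertools import product
from typing import List, Tuple


def combine_unkeyed(left: List[str], right: List[str]) -> List[Tuple[str, str]]:
    """combine without by: pair every left item with every right item."""
    return list(product(left, right))


def combine_by_key(
    left: List[Tuple[str, str]], right: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Approximate combine(by: 0): cross items only when their keys match."""
    return [(lk, lv, rv) for (lk, lv), (rk, rv) in product(left, right) if lk == rk]


if __name__ == "__main__":
    samples = ["s1", "s2", "s3"]
    databases = ["db1", "db2"]
    print(len(combine_unkeyed(samples, databases)))  # 6: grows as len(left) * len(right)
    reads = [("s1", "s1.fq"), ("s2", "s2.fq")]
    refs = [("s1", "s1.db")]
    print(combine_by_key(reads, refs))  # [('s1', 's1.fq', 's1.db')]
```

The keyed form behaves like pairing and maps onto identifier-driven collection semantics; the unkeyed form multiplies element counts, which is the shape a Galaxy cross-product recipe would have to reproduce deliberately.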

Mold Use

TODOs

  • Decide whether summary-nextflow.schema.json should record operator parameters such as by, remainder, failOnMismatch, size, and sort.
  • Consider a dedicated pattern for Nextflow per-element branch to Galaxy conditionals/filtering.
  • Decide whether unkeyed combine always requires review.

Incoming References (9)

  • Galaxy Apply Rules DSL (related note): Reference for Galaxy's Apply Rules DSL, covering rule operations, mapping operations, composition patterns, and pitfalls.
  • Galaxy collection semantics (related note): Vendored formal spec of Galaxy dataset-collection mapping/reduction semantics, with labeled examples and pinned test references.
  • Galaxy collection-operation tools (related note): Catalog of Galaxy's collection-operation tools (purpose, IO, parameters, selection guide). Companion to galaxy-collection-semantics.
  • Galaxy data-flow draft contract (related note): Defines the proposed boundary between Galaxy data-flow drafts, gxformat2 templates, and concrete step implementation.
  • IWC Map Over Lifecycle Survey (related note): Survey of IWC map-over lifecycle recipes, with a Nextflow-to-Galaxy crosswalk for collection construction, cleanup, reshape, reduce, and publish phases.
  • IWC Tabular Operations Survey (related note): Corpus survey of tabular tools and operations across IWC workflows; map for the operation pattern hierarchy on row/column data manipulation.
  • IWC Transformations Survey (related note): Corpus survey of collection-shape transformations across IWC, covering built-in collection ops, toolshed transformers, and the multi-step recipes that bracket map-over.
  • Nextflow-to-Galaxy channel shape mapping (related note): Maps common Nextflow channel, tuple, and path shapes to Galaxy dataset and collection shapes.
  • Nextflow: source pattern map (related note): Use this source-pattern map to route recurring Nextflow channel and operator idioms to Galaxy implementation patterns.