
Nextflow-to-Galaxy channel shape mapping

Maps common Nextflow channel, tuple, and path shapes to Galaxy dataset and collection shapes.

Revised 2026-05-06 | Rev 2 | component


This note maps source-level Nextflow channel shapes onto Galaxy data and dataset-collection shapes. Evidence quality is uneven: Galaxy collection semantics and IWC collection operations are well grounded, and several Nextflow shapes are observed in the pinned fixtures, but fromFilePairs and arbitrarily deep tuple mapping remain low-confidence because no pinned fixture exercises them directly.

Shape Mapping

| Nextflow shape | Galaxy shape | Confidence | Notes |
| --- | --- | --- | --- |
| path(x) or a single Channel.fromPath(...).first() | File | High | One dataset input or output. |
| val(meta) alone | No dataset shape | High | Treat as identifiers, labels, tags, sample-sheet metadata, or parameters; it is not a Galaxy dataset by itself. |
| Repeated tuple val(meta), path(file) | list | High | One dataset per sample/key; Galaxy maps tools over list collections. |
| One tuple(meta, [R1, R2]) | paired | High | Use only when the workflow input is a single paired sample or one tool consumes one pair. |
| Repeated tuple(meta, [R1, R2]) | list:paired | High | The common paired-end workflow input shape. |
| Repeated tuple(meta, [single]) | list | High | nf-core often normalizes single-end reads to one-element lists, but Galaxy should not model this as paired. |
| Mixed single-end and paired-end reads | paired_or_unpaired, or a split into list plus list:paired | Medium | Galaxy supports mixed collections, but splitting into branches may be clearer when the tools diverge. |
| tuple(meta, path(a), path(b)) | Parallel lists or per-step File inputs | Medium | Not automatically a paired collection; usually a keyed record of several tool inputs. |
| Global file plus per-sample files | Broadcast File plus mapped list | Medium | Connect the global input once and map over the collection input. |
| collect() or toList() | Collection reduction | High | Usually one downstream invocation that consumes the whole collection or a multiple-input parameter. |
| collectFile(...) | File | High | Nextflow creates one new file; Galaxy should model this as a tool output, not a collection operation. |
| groupTuple() by key | list, list:paired, or list:list | Medium | Depends on the grouped payload and on whether the grouping axis matters downstream. |
| transpose() after grouping | Explicit reshape or subcollection mapping | Medium | May need flattening, Apply Rules, or identifier-preserving reshaping. |
| combine(..., by: key) or join(...) | Multi-input collection map if identifiers match | Medium | Otherwise, synchronize explicitly by identifier or order. |
| branch { ... } | Branch wiring or explicit filters | Medium | Each branch has the shape of what it emits; per-element routing needs review. |
| mix(...) | Merge compatible streams/collections | Medium | Use direct wiring for reports/versions; use __MERGE_COLLECTION__ or __BUILD_LIST__ when the merged result must be materialized. |
| multiMap { ... } | Synchronized split into Galaxy inputs | Medium | Usually channel-only fan-out, but any downstream rejoin depends on identifier discipline. |
| fromFilePairs | paired or list:paired | Low | Conceptually clean, but not directly observed in the pinned fixtures. |
| Arbitrarily deep tuples | Parallel lists, list:list, or manual modeling | Low | Galaxy collection types are not arbitrary tuple records; require per-tool review. |
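For the repeated (meta, files) rows in the table, the list versus list:paired decision can be checked mechanically from the number of files per element. The following is a minimal sketch, assuming each channel element is modeled as a Python (meta, files) pair whose meta dict carries an "id" key (an nf-core convention, not a Nextflow requirement). The function name galaxy_collection_type is hypothetical, and its result is a suggestion for review rather than a translation.

```python
from typing import Any, Mapping, Sequence

# Hypothetical model of one channel element: (meta, files). Assumption: meta
# carries an "id" key (nf-core convention) and files lists the read paths.
Element = tuple[Mapping[str, Any], Sequence[str]]


def galaxy_collection_type(elements: Sequence[Element]) -> str:
    """Suggest a Galaxy collection type for a repeated (meta, files) channel.

    One file per element -> "list"; two files per element -> "list:paired".
    A mix returns "mixed" so a reviewer can choose between paired_or_unpaired
    and split list plus list:paired branches, as the mapping table suggests.
    """
    if not elements:
        raise ValueError("empty channel: no shape to infer")
    ids = [meta["id"] for meta, _ in elements]
    if len(set(ids)) != len(ids):
        # Element identifiers must be unique within one Galaxy collection.
        raise ValueError("duplicate sample ids; fix identifiers before mapping")
    widths = {len(files) for _, files in elements}
    if widths - {1, 2}:
        # Anything other than one or two files is outside this sketch's scope.
        raise ValueError(f"unsupported file counts {sorted(widths)}; model manually")
    if widths == {1}:
        # One-element lists (nf-core single-end normalization) stay a plain list.
        return "list"
    if widths == {2}:
        return "list:paired"
    return "mixed"


channel = [
    ({"id": "s1"}, ["s1_R1.fq.gz", "s1_R2.fq.gz"]),
    ({"id": "s2"}, ["s2_R1.fq.gz", "s2_R2.fq.gz"]),
]
print(galaxy_collection_type(channel))  # list:paired
```

The sketch deliberately refuses to guess for duplicate identifiers or unusual file counts, which matches the table's advice that deep or irregular tuples need per-tool review.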

Explicit Galaxy Operations

Use explicit Galaxy collection-operation or tabular steps when the translation changes the shape of a materialized collection, not when it only wires an existing collection into a mapped step.
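Flattening and unzipping are two shape-changing operations whose effect on element identifiers is easy to misjudge during translation. A minimal sketch models a collection as a nested dict from element identifier to dataset; it illustrates the shape change only and is not Galaxy's implementation. The helpers flatten and unzip are hypothetical, and the "_" join character is an assumption about the flatten tool's default, so check the tool form before relying on it. The "forward"/"reverse" keys follow Galaxy's paired element identifiers.

```python
from typing import Any

# Hypothetical model: a collection maps element identifiers to either a dataset
# (represented here by a file name) or a nested collection (another dict).
Collection = dict[str, Any]


def flatten(collection: Collection, sep: str = "_", prefix: str = "") -> dict[str, str]:
    """Model of a flatten step: nested identifiers are joined into one level.

    Assumption: `sep` stands in for the flatten tool's join character; "_" is
    a guess at its default, not a verified setting.
    """
    flat: dict[str, str] = {}
    for ident, value in collection.items():
        key = f"{prefix}{sep}{ident}" if prefix else ident
        if isinstance(value, dict):
            # Recurse into the nested collection, carrying the joined prefix.
            flat.update(flatten(value, sep, key))
        else:
            flat[key] = value
    return flat


def unzip(list_paired: dict[str, dict[str, str]]) -> tuple[dict[str, str], dict[str, str]]:
    """Model of splitting list:paired into forward and reverse lists by identifier."""
    forward = {ident: pair["forward"] for ident, pair in list_paired.items()}
    reverse = {ident: pair["reverse"] for ident, pair in list_paired.items()}
    return forward, reverse


samples = {
    "s1": {"forward": "s1_1.fq.gz", "reverse": "s1_2.fq.gz"},
    "s2": {"forward": "s2_1.fq.gz", "reverse": "s2_2.fq.gz"},
}
print(flatten(samples))
print(unzip(samples))
```

Flattening keeps every dataset but encodes the lost nesting level into the identifier, whereas unzipping keeps sample identifiers intact and splits the pair axis into two sibling lists.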

| Need | Candidate Galaxy recipe |
| --- | --- |
| Build a collection from separate datasets | __BUILD_LIST__ |
| Pair forward/reverse files | __ZIP_COLLECTION__ or an Apply Rules paired mapping |
| Split paired reads into forward/reverse collections | __UNZIP_COLLECTION__ |
| Flatten nested collections | __FLATTEN__ |
| Regroup, swap nesting, split identifiers, or build pairs from metadata | __APPLY_RULES__ |
| Harmonize sibling collections by identifier or order | Identifier extraction, filtering, sorting, or relabeling |
| Remove empty, failed, or null elements after fan-out | Filter empty/failed/null collection tools |
| Unbox a singleton collection | __EXTRACT_DATASET__ |
| Convert a collection of tabular outputs to one table | collapse_dataset or collection_column_join |
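Harmonizing sibling collections is the Galaxy-side counterpart of join(...) or combine(..., by: key): before a multi-input map, both sides should contain the same identifiers in the same order. A minimal sketch, assuming each sibling collection is modeled as a dict from element identifier to dataset; harmonize is a hypothetical helper, and it does not reproduce Galaxy's own multi-input matching rules. Like Nextflow's join without the remainder option, it drops unmatched keys, but it returns them so the translation can report lost samples.

```python
# Hypothetical model: each sibling collection is a dict from element identifier
# to dataset (a file name here). This is not Galaxy's own matching logic.


def harmonize(
    left: dict[str, str], right: dict[str, str]
) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Align two sibling collections on shared identifiers, in sorted order.

    Unmatched identifiers are dropped from the aligned rows but returned
    (sorted) so they can be reported instead of silently disappearing.
    """
    shared_set = left.keys() & right.keys()
    shared = sorted(shared_set)
    dropped = sorted((left.keys() | right.keys()) - shared_set)
    rows = [(ident, left[ident], right[ident]) for ident in shared]
    return rows, dropped


bams = {"s1": "s1.bam", "s2": "s2.bam"}
vcfs = {"s2": "s2.vcf", "s3": "s3.vcf"}
rows, dropped = harmonize(bams, vcfs)
print(rows)     # [('s2', 's2.bam', 's2.vcf')]
print(dropped)  # ['s1', 's3']
```

Sorting both sides by identifier is one reasonable way to make order-based synchronization safe; it assumes the identifiers are comparable strings shared by both collections.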

Evidence

Corpus-observed Galaxy semantics:

Pinned fixture examples used by the research pass:

  • workflow-fixtures/pipelines/nf-core__demo/subworkflows/local/utils_nfcore_demo_pipeline/main.nf for tuple(meta, reads) paired-read handling.
  • workflow-fixtures/pipelines/nf-core__rnaseq/main.nf for single-end versus paired-end branching and reads normalization.
  • workflow-fixtures/pipelines/nf-core__taxprofiler/main.nf and workflow-fixtures/pipelines/nf-core__taxprofiler/subworkflows/local/*/main.nf for groupTuple, transpose, mix, combine, and multiMap patterns.
  • workflow-fixtures/pipelines/nf-core__fetchngs/workflows/sra/main.nf for collectFile and accession-driven data fetching.
  • $IWC_FORMAT2/virology/influenza-isolates-consensus-and-subtyping/influenza-consensus-and-subtyping.gxwf.yml as a Galaxy list:paired exemplar.

Low-Confidence TODOs

  • Confirm fromFilePairs with a direct fixture or external Nextflow documentation before treating it as corpus-observed.
  • Define a small shape grammar for summary-nextflow outputs, or explicitly require downstream Molds to preserve shape strings plus rationale.
  • Decide whether mixed single/paired reads should prefer Galaxy paired_or_unpaired or split branches.
  • Add a decision rule for grouped runs: preserve as list:list when the run axis matters; reduce to list only when downstream semantics ignore that axis.
  • Avoid using list:list just because a Nextflow tuple is nested.
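The grouped-runs rule in the TODO list can be stated as a small function. A minimal sketch, assuming per-run records are modeled as (sample_id, run_id, file) triples, roughly what groupTuple() receives before grouping. The names group_runs and concat are hypothetical, and concat only stands in for whatever Galaxy tool step would merge a sample's runs (for example, concatenation); the rule itself is still an open TODO, not a settled convention.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional

# Hypothetical per-run records: (sample_id, run_id, file).
Record = tuple[str, str, str]


def group_runs(
    records: Iterable[Record],
    run_axis_matters: bool,
    merge: Optional[Callable[[list[str]], str]] = None,
) -> tuple[str, dict]:
    """Keep list:list when the run axis matters downstream; otherwise reduce
    each sample's runs to one dataset with `merge` and return a list."""
    by_sample: dict[str, dict[str, str]] = defaultdict(dict)
    for sample, run, path in records:
        if run in by_sample[sample]:
            raise ValueError(f"duplicate run {run!r} for sample {sample!r}")
        by_sample[sample][run] = path
    if run_axis_matters:
        # Outer identifiers are samples, inner identifiers are runs.
        return "list:list", dict(by_sample)
    if merge is None:
        # In this sketch, dropping the run axis is modeled as an explicit merge
        # step rather than an implicit reshape.
        raise ValueError("reducing to list requires an explicit merge step")
    # Sort run ids so the merge order is deterministic.
    return "list", {s: merge([runs[r] for r in sorted(runs)]) for s, runs in by_sample.items()}


def concat(files: list[str]) -> str:
    """Placeholder (hypothetical) for a Galaxy concatenation tool step."""
    return "+".join(files)


records = [("s1", "run2", "s1_run2.fq"), ("s1", "run1", "s1_run1.fq"), ("s2", "run1", "s2_run1.fq")]
print(group_runs(records, run_axis_matters=True)[0])  # list:list
print(group_runs(records, run_axis_matters=False, merge=concat))
# ('list', {'s1': 's1_run1.fq+s1_run2.fq', 's2': 's2_run1.fq'})
```

Note that the nesting here comes from a real sample/run axis, not from the fact that the Nextflow tuple happens to be nested.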

Mold Use
