Nextflow: keyed join/combine(by:) to identifier-synchronized map-over
Use this pattern when the source Nextflow workflow uses join(...) or combine(..., by: ...) to pair records by a key that should become, or match, a Galaxy collection element identifier.
Do not use this pattern for unkeyed combine(). Treat an unkeyed combine as a Cartesian product (every item on one side paired with every item on the other) and flag it for review by default.
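To make the keyed versus unkeyed distinction concrete, here is a minimal Python sketch that models channels as lists of tuples with the key at index 0 (Nextflow's default join key position). The function names are hypothetical. The sketch assumes inner-join behavior (unmatched keys dropped) and unique keys on the right side. It also emits results in input order, whereas a real Nextflow channel does not guarantee ordering. Treat any non-default operator flags in the source as a review trigger rather than relying on this model.

```python
from itertools import product


def keyed_join(left, right):
    """Inner join on element 0: unmatched keys on either side are dropped.

    Assumes each key appears at most once on the right side.
    """
    right_by_key = {item[0]: item[1:] for item in right}
    return [item + right_by_key[item[0]] for item in left if item[0] in right_by_key]


def cartesian_combine(left, right):
    """Unkeyed combine: every left item is paired with every right item."""
    return [a + b for a, b in product(left, right)]
```

With three keyed items on the left and two on the right, `keyed_join` yields only the two shared keys. `cartesian_combine` yields six pairs, which is why an unkeyed combine should not be mapped onto identifier-synchronized wiring.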
Translation rule
If both sides are already Galaxy collections with matching identifiers and compatible structure, ordinary multi-input mapped wiring may be enough.
If membership, order, or labels may drift, add explicit identifier synchronization before downstream map-over:
- Extract identifiers from the reference collection.
- Filter sibling collections by identifiers when membership must match.
- Sort sibling collections by an identifier file when order must match.
- Relabel when order is correct but useful element names were lost.
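The four synchronization steps above can be sketched in Python by modeling a Galaxy list collection as an ordered list of (element_identifier, payload) pairs. This is an illustration of the semantics only. The function names are hypothetical, not Galaxy tool names. Relabeling is modeled as an old-to-new identifier mapping, similar in spirit to a tabular relabel. Identifiers missing from the mapping are kept unchanged, which is an assumption of this sketch.

```python
def extract_identifiers(collection):
    """Step 1: identifiers of the reference collection, in its order."""
    return [ident for ident, _ in collection]


def filter_by_identifiers(collection, identifiers):
    """Step 2: membership sync. Keep elements whose identifier is listed.

    The sibling collection keeps its own order; nothing is reordered.
    """
    wanted = set(identifiers)
    return [(ident, payload) for ident, payload in collection if ident in wanted]


def sort_by_identifier_file(collection, identifiers):
    """Step 3: order sync. Follow the order of the identifier list.

    In this sketch, elements absent from the list are dropped.
    """
    by_id = dict(collection)  # Galaxy identifiers are unique within a collection
    return [(ident, by_id[ident]) for ident in identifiers if ident in by_id]


def relabel(collection, mapping):
    """Step 4: rename elements via an old -> new mapping. No filter, no reorder."""
    return [(mapping.get(ident, ident), payload) for ident, payload in collection]
```

For example, filtering `[("s3", "b3"), ("s1", "b1"), ("s4", "b4")]` by `["s1", "s2", "s3"]` keeps `s3` before `s1`. Only the sort step puts `s1` first.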
Choose an implementation pattern
- sync-collections-by-identifier for membership intersection by identifier.
- harmonize-by-sortlist-from-identifiers for order sync before zip-like mapped consumption.
- regex-relabel-via-tabular to restore or clean labels before keyed pairing.
- tabular-join-on-key when the source join is row/table data joining rather than file collection alignment.
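The tabular-join-on-key case differs because the units being matched are rows, not collection elements, and one key can match many rows. The Python sketch below shows an inner join of two tab-separated tables on a shared column name. It illustrates row-join semantics under that inner-join assumption. It does not model any specific Galaxy tool's options. The function name is hypothetical.

```python
import csv
import io


def tabular_join_on_key(left_tsv, right_tsv, key):
    """Inner join two TSV tables (with header rows) on a shared column.

    Every matching row pair produces an output row, so one-to-many
    matches multiply rows instead of aligning single elements.
    """
    left_rows = list(csv.DictReader(io.StringIO(left_tsv), delimiter="\t"))
    right_rows = list(csv.DictReader(io.StringIO(right_tsv), delimiter="\t"))
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined
```

Joining a sample sheet to a per-gene count table on `sample` yields one output row per (sample, gene) match. A collection identifier sync would never produce that shape.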
Decision checklist
- What is the key?
- Is the key unique on both sides?
- Is unmatched data dropped, fatal, or preserved?
- Does downstream need membership sync only, order sync too, or relabeling?
- Are payloads file-like collection elements or tabular rows?
- Is one side global or broadcast rather than keyed?
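The key-related checklist questions (uniqueness on both sides, unmatched keys on each side) can be answered mechanically once the keys are listed. The sketch below is a hypothetical Python helper for that audit. It does not decide whether unmatched data should be dropped, fatal, or preserved. It also does not answer the payload and broadcast questions, which still need human review of the source workflow.

```python
from collections import Counter


def audit_keys(left_keys, right_keys):
    """Report duplicate, unmatched, and shared keys for two keyed sides."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    left_set, right_set = set(left_counts), set(right_counts)
    return {
        "left_duplicates": sorted(k for k, n in left_counts.items() if n > 1),
        "right_duplicates": sorted(k for k, n in right_counts.items() if n > 1),
        "left_only": sorted(left_set - right_set),
        "right_only": sorted(right_set - left_set),
        "shared": sorted(left_set & right_set),
    }
```

Non-empty duplicate lists mean the key is not unique, so a one-to-one element pairing is unsafe. Non-empty `left_only` or `right_only` lists mean the source's unmatched-key behavior must be decided explicitly.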
Pitfalls
- Identifier sync is not order sync.
- Sorting by an identifier file can act as a reorder plus an intersection, dropping elements whose identifiers are absent from the file; do not use it if unmatched elements must survive.
- Relabeling does not filter or reorder.
- Tabular key joins are different from collection identifier sync.
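The first two pitfalls can be caught with explicit checks before zip-like mapped consumption. The Python sketch below uses hypothetical helper names. It separates a membership check (set equality) from an order check (positional equality). It also lists identifiers that a file-driven sort would drop. It assumes identifiers are unique within each collection, as Galaxy requires.

```python
def membership_matches(ref_ids, other_ids):
    """True when both sides hold the same identifiers, in any order."""
    return set(ref_ids) == set(other_ids)


def order_matches(ref_ids, other_ids):
    """True only when identifiers line up position by position,
    which is what zip-like mapped consumption relies on."""
    return list(ref_ids) == list(other_ids)


def dropped_by_sort(collection_ids, sort_ids):
    """Identifiers a sort driven by sort_ids would silently drop."""
    keep = set(sort_ids)
    return [ident for ident in collection_ids if ident not in keep]
```

A sibling with identifiers `["s3", "s1", "s2"]` passes the membership check against `["s1", "s2", "s3"]` but fails the order check. A non-empty `dropped_by_sort` result means the sort is also acting as a filter.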
Evidence posture
This page is grounded in existing Foundry research and Galaxy implementation patterns. Generated Nextflow fixtures were not present during authoring, so exact operator-flag behavior remains a review trigger.