Nextflow Channel Operators
Operational grounding for summarize-nextflow §6 (“Reconcile the workflow DAG”). The Mold’s deterministic parser records the literal operator chain a workflow uses (["map", "join", "groupTuple"] in Edge.via); a second LLM pass reconciles the chain into a coherent from → to edge with resolved shape.
That reconciliation depends on operator-level cardinality semantics: which operators preserve cardinality, which fan out, which fan in, which fork. Without a structured operator catalog, the LLM is guessing.
Companion structured form: component-nextflow-channel-operators.yml. The cast skill consumes the YAML at runtime; this prose note explains the categories and the cardinality model.
Categories
Seven categories, ordered by how they affect downstream channel cardinality:
- transform: emits one output per input. Cardinality preserved. (map, view, set, tap, dump, ifEmpty, randomSample.)
- filter: emits ≤ one output per input. Cardinality reduced. (filter, distinct, unique, first, last, take, until.)
- fan-in: collects N inputs into 1 (or fewer) outputs. (collect, collectFile, groupTuple, reduce, sum, count*, max, min, toList, toSortedList, buffer, collate.)
- fan-out: emits >1 outputs per input. (flatten, flatMap, transpose, splitCsv, splitFasta, splitFastq, splitJson, splitText.)
- combine: joins or concatenates two source channels. (combine, concat, cross, join, merge, mix.)
- fork: splits one source channel into multiple downstream channels. (branch, multiMap.) Returns a multi-channel object, not a single channel.
- terminal: consumes a channel without producing one. (subscribe.) Rare in workflow definitions; usually only in view-adjacent debug paths.
The category determines what the resolver records as the chain’s effect on shape. A chain that ends in groupTuple produces a list-shaped output even if every preceding operator was transform-category. A chain containing branch produces multiple downstream channels, and the cast skill must record the branch keys.
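The category-to-effect rule above can be sketched in a few lines of Python. This is a minimal illustration, not the resolver's actual code: the CATEGORY table is a hand-written, abbreviated stand-in for the YAML catalog, and the function name chain_effect and its result keys are assumptions made for this example.

```python
# Hypothetical, abbreviated operator-to-category table. The real catalog
# lives in component-nextflow-channel-operators.yml.
CATEGORY = {
    "map": "transform",
    "view": "transform",
    "filter": "filter",
    "groupTuple": "fan-in",
    "collect": "fan-in",
    "flatten": "fan-out",
    "join": "combine",
    "mix": "combine",
    "branch": "fork",
    "multiMap": "fork",
    "subscribe": "terminal",
}


def chain_effect(via):
    """Summarize the shape effect of an operator chain (a list of names)."""
    categories = [CATEGORY.get(op, "unknown") for op in via]
    return {
        "categories": categories,
        # The last operator decides whether the output is aggregated,
        # regardless of how many transform steps came before it.
        "ends_in_fan_in": bool(categories) and categories[-1] == "fan-in",
        # Any fork operator means several downstream channels whose keys
        # must be recorded.
        "forks": "fork" in categories,
        # Operators missing from the table are surfaced, not guessed at.
        "unknown_ops": [op for op, c in zip(via, categories) if c == "unknown"],
    }


effect = chain_effect(["map", "join", "groupTuple"])
print(effect["categories"], effect["ends_in_fan_in"], effect["forks"])
```

Reporting unknown operators separately, instead of defaulting them to a category, keeps the uncertainty visible to the later reconciliation pass.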
Reading the YAML
Each entry has:
- name: exact operator identifier as it appears in DSL2 source.
- category: one of the seven above.
- arity_in: how many channels the operator consumes (1 or 2).
- cardinality: preserved | reduced | expanded | aggregated | forked.
- output_shape_rule: terse description of what the output channel’s shape is, given the input.
- key_args: the parameters that materially change the shape effect (e.g., groupTuple(by: [0, 1])’s grouping key).
- notes: anything that surprises an inattentive reader (e.g., merge is deprecated in DSL2, cross joins on first element).
The cast skill’s reconciliation pass walks the chain left-to-right, applying each entry’s output_shape_rule to the running shape estimate. When the LLM is uncertain, Edge.notes records the reasoning chain for review.
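As a rough illustration of the left-to-right walk, the sketch below models a catalog entry with the fields listed above and folds a chain over it. It is an assumption-laden toy, not the cast skill's implementation: the OperatorEntry class, the reconcile function, the result keys, and the choice to skip unknown operators are all hypothetical, and any rule text other than the one quoted in this note for map is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class OperatorEntry:
    # Field names mirror the entry layout described in this note.
    name: str
    category: str
    arity_in: int
    cardinality: str
    output_shape_rule: str
    key_args: list = field(default_factory=list)
    notes: str = ""


def reconcile(via, catalog):
    """Walk the chain left to right, collecting each entry's rule in order."""
    steps = []  # (operator, cardinality, output_shape_rule) per known operator
    notes = []  # reasoning that could be copied into Edge.notes
    for op in via:
        entry = catalog.get(op)
        if entry is None:
            # Assumption for this sketch: unknown operators are skipped
            # and flagged for review rather than guessed at.
            notes.append(f"{op}: not in catalog, skipped")
            continue
        steps.append((entry.name, entry.cardinality, entry.output_shape_rule))
        if entry.notes:
            notes.append(f"{op}: {entry.notes}")
    # Every non-preserving step contributes to the net shape change.
    net_effects = [card for _, card, _ in steps if card != "preserved"]
    return {"steps": steps, "net_effects": net_effects, "notes": notes}


catalog = {
    "map": OperatorEntry(
        "map", "transform", 1, "preserved",
        "same shape unless closure restructures",
    ),
    "groupTuple": OperatorEntry(
        "groupTuple", "fan-in", 1, "aggregated",
        "placeholder rule text for illustration", key_args=["by"],
    ),
}
result = reconcile(["map", "groupTuple"], catalog)
print(result["net_effects"])
```

Keeping the full step list alongside the net effects means a reviewer can see which operator introduced each shape change.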
Anti-patterns to recognize, not resolve
The Mold §6 says operator chains with deeply nested closures may produce edges flagged with low confidence. Specifically:
- map { ... } with substantial Groovy logic. The closure can reshape arbitrarily; the YAML’s output_shape_rule ("same shape unless closure restructures") is honest about the limit. The LLM falls back to surface inspection of the closure body.
- branch { ... } with non-obvious keys. The branch keys are determined by the closure; static parsing recovers the keys from branch.<name> references in the workflow body.
- multiMap { ... } returning records. Same as branch: the keys come from the closure’s emit list.
- cross / combine with complex keys. cross joins on the first tuple element by default; combine(by: [0, 1]) joins on multiple elements. The reconciliation needs to know which.
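Recovering fork keys from references in the workflow body could look something like the following. This is a deliberately naive, regex-based sketch under stated assumptions: the function name forked_keys is hypothetical, the forked output is assumed to be bound to a known variable name, and anything followed by "(" or "{" is treated as a method call, not a key. A real parser would work on the syntax tree.

```python
import re


def forked_keys(source, var):
    """Return names referenced as `<var>.<name>`, in order of first use."""
    # Skip method-style uses such as `var.view()` or `var.set { ... }`.
    pattern = re.compile(
        r"\b" + re.escape(var) + r"\.([A-Za-z_]\w*)\b(?!\s*[({])"
    )
    keys = []
    for match in pattern.finditer(source):
        name = match.group(1)
        if name not in keys:
            keys.append(name)
    return keys


# Hypothetical workflow fragment, used only to exercise the function.
workflow_body = """
reads.branch { small: it.size < 10; large: true }.set { split }
split.small.view()
PROCESS(split.large)
split.small | OTHER
"""
print(forked_keys(workflow_body, "split"))
```

An empty result is itself a useful signal: if no key references are found for a chain that contains a fork operator, the edge is a candidate for a low-confidence flag.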
Cross-references
- summarize-nextflow.md §6: the consumer of this digest.
- component-nextflow-pipeline-anatomy: DSL2 layout context.
- The Edge.via field in summary-nextflow: where the operator chain is recorded.
Open gaps
Updated when contact with real pipelines reveals an operator pattern the bucketing rules do not handle cleanly. Each entry names the motivating target.