Claude skill · cast nextflow-summary-to-galaxy-data-flow
Translate a Nextflow summary into a Galaxy data-flow design brief.
Install
/plugin marketplace add jmchilton/foundry
/plugin install foundry-skills@galaxy-workflow-foundry
Then invoke as:
/foundry-skills:nextflow-summary-to-galaxy-data-flow
Skill Bundle
/ packaged cast
- attached files: 11
- upfront: 4
- on demand: 7
- cast rev: 3
- validated: 0
Produces: 1 artifact.
Consumes: 3 artifacts.
Artifact Contract
/ skill handoff
Produces
nextflow-galaxy-data-flow
Reviewable Markdown brief: abstract operations, collection map/reduce choices, shape-changing placeholder steps, unresolved Galaxy tool needs, confidence, open questions.
markdown · nextflow-galaxy-data-flow.md
Raw artifact contract
{
"id": "nextflow-galaxy-data-flow",
"kind": "markdown",
"default_filename": "nextflow-galaxy-data-flow.md",
"description": "Reviewable Markdown brief: abstract operations, collection map/reduce choices, shape-changing placeholder steps, unresolved Galaxy tool needs, confidence, open questions."
}
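As a minimal sketch, a harness could read the contract above like this when deciding where to write the brief. The helper name `resolve_output_path` and its `explicit_path` parameter are illustrative assumptions, not part of the skill; only the contract fields are taken from the JSON above.

```python
from pathlib import Path
from typing import Optional

# Contract fields copied from the raw artifact contract above.
PRODUCED_CONTRACT = {
    "id": "nextflow-galaxy-data-flow",
    "kind": "markdown",
    "default_filename": "nextflow-galaxy-data-flow.md",
}


def resolve_output_path(contract: dict, explicit_path: Optional[str] = None) -> Path:
    """Hypothetical helper: prefer an explicit path, else the declared default filename."""
    if explicit_path:
        return Path(explicit_path)
    return Path(contract["default_filename"])
```

Falling back to `default_filename` keeps the declared artifact name stable unless the user or harness overrides it.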
Consumes
summary-nextflow
Structured Nextflow pipeline summary emitted by [[summarize-nextflow]]; the JSON the data-flow translation reads.
Raw artifact contract
{
"id": "summary-nextflow",
"description": "Structured Nextflow pipeline summary emitted by [[summarize-nextflow]]; the JSON the data-flow translation reads.",
"inherited_schema": "[[summary-nextflow]]",
"producers": [
"summarize-nextflow"
]
}
nextflow-galaxy-reference-data
Reference-data shape brief from [[nextflow-summary-to-galaxy-reference-data]] that pins per-asset reference inputs and rebuild-on-absence behavior.
Raw artifact contract
{
"id": "nextflow-galaxy-reference-data",
"description": "Reference-data shape brief from [[nextflow-summary-to-galaxy-reference-data]] that pins per-asset reference inputs and rebuild-on-absence behavior.",
"producers": [
"nextflow-summary-to-galaxy-reference-data"
]
}
nextflow-galaxy-interface
Preceding Galaxy interface brief from [[nextflow-summary-to-galaxy-interface]] that pins inputs, outputs, and labels.
Raw artifact contract
{
"id": "nextflow-galaxy-interface",
"description": "Preceding Galaxy interface brief from [[nextflow-summary-to-galaxy-interface]] that pins inputs, outputs, and labels.",
"producers": [
"nextflow-summary-to-galaxy-interface"
]
}
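The three consumed contracts above suggest a simple pre-flight check before translation starts. The sketch below is an assumption about how a caller might do that; `missing_inputs` is a hypothetical name, and only the artifact ids and producer names are taken from the contracts.

```python
# Artifact ids and producers copied from the three "Consumes" contracts above.
CONSUMED = {
    "summary-nextflow": "summarize-nextflow",
    "nextflow-galaxy-reference-data": "nextflow-summary-to-galaxy-reference-data",
    "nextflow-galaxy-interface": "nextflow-summary-to-galaxy-interface",
}


def missing_inputs(available_ids):
    """Hypothetical pre-flight check: list (artifact id, producer) pairs not yet available."""
    available = set(available_ids)
    return [(aid, producer) for aid, producer in CONSUMED.items() if aid not in available]
```

Reporting the producer alongside each missing id tells the reviewer which upstream skill still has to run.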
Attached Files
/ runtime references
Load upfront
Keep the data-flow brief separate from gxformat2 templating and concrete step implementation.
upfront runtime verbatim hypothesis deterministic 6.4 KB
- bundle: references/notes/galaxy-data-flow-draft-contract.md
- source: content/research/galaxy-data-flow-draft-contract.md
Preview md
---
type: research
subtype: design-spec
title: "Galaxy data-flow draft contract"
tags:
- research/design-spec
- target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-03
revision: 2
ai_generated: true
related_notes:
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
related_molds:
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[paper-summary-to-galaxy-design]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[cwl-summary-to-galaxy-template]]"
- "[[paper-summary-to-galaxy-template]]"
- "[[compare-against-iwc-exemplar]]"
sources:
- "https://github.com/jmchilton/foundry/issues/54"
summary: "Defines the proposed boundary between Galaxy data-flow drafts, gxformat2 templates, and concrete step implementation."
---
# Galaxy Data-Flow Draft Contract
This is an architectural contract, not a schema. Evidence is strongest for Mold and Pipeline boundaries. Proposed fields are speculative until exercised by two or three worked translations.
## Boundary
The data-flow draft owns a target-shaped abstract DAG for Galaxy. It should not be valid `gxformat2` and should not resolve exact Tool Shed tools.
Data-flow draft owns:
- Galaxy-facing workflow inputs and outputs.
- Abstract nodes, edges, branches, collection mapping, collection reduction, and placeholder transformations.
- Input/output shape decisions such as `File`, `list`, `paired`, `list:paired`, or `list:list`.
- Conceptual Galaxy idioms: map-over, reduction, Apply Rules, collection cleanup, identifier synchronization, tabular bridge.
- Abstract unresolved tool needs with input and output shapes.
- Confidence and rationale on inferred nodes, edges, transforms, and tool needs.
The Galaxy template
...
Classify Nextflow operators as Galaxy wiring, collection semantics, explicit steps, or review triggers.
upfront runtime verbatim corpus-observed deterministic 6.5 KB
- bundle: references/notes/nextflow-operators-to-galaxy-collection-recipes.md
- source: content/research/nextflow-operators-to-galaxy-collection-recipes.md
Preview md
---
type: research
subtype: component
title: "Nextflow operators to Galaxy collection recipes"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
related_notes:
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[galaxy-collection-semantics]]"
- "[[galaxy-collection-tools]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[iwc-transformations-survey]]"
- "[[iwc-tabular-operations-survey]]"
- "[[galaxy-data-flow-draft-contract]]"
- "[[iwc-map-over-lifecycle-survey]]"
- "[[nextflow-patterns]]"
related_molds:
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[implement-galaxy-tool-step]]"
- "[[debug-galaxy-workflow-output]]"
sources:
- "https://github.com/jmchilton/foundry/issues/53"
summary: "Classifies common Nextflow operators as Galaxy wiring, collection semantics, explicit steps, or review triggers."
---
# Nextflow Operators To Galaxy Collection Recipes
Most Nextflow operators are not Galaxy tools. Translate them first as source-side data-flow intent, then decide whether the Galaxy representation is simple wiring, collection semantics, an explicit Galaxy step, or a user-review checkpoint.
## Decision Vocabulary
| Label | Meaning |
|---|---|
| `channel-only rewiring` | The operator disappears into Galaxy connections, labels, branch wiring, or output selection. |
| `Galaxy collection semantics` | Translation relies on collection identifiers, collection type, map-over, reduction, or nesting behavior. |
| `explicit Galaxy step` | Add a collection-operation, tabular, text-processing, or domain tool step. |
| `user review` | Translation is likely lossy or semantically ambiguous. |
## Operator Recipes
| Nextflow operator | Galaxy recipe | Class | Confidenc
...
Translate Nextflow channel, tuple, and path shapes into Galaxy dataset and collection shapes.
upfront runtime verbatim corpus-observed deterministic 8.6 KB
- bundle: references/notes/nextflow-to-galaxy-channel-shape-mapping.md
- source: content/research/nextflow-to-galaxy-channel-shape-mapping.md
Preview md
---
type: research
subtype: component
title: "Nextflow-to-Galaxy channel shape mapping"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-06
revision: 2
ai_generated: true
related_notes:
- "[[nextflow-workflow-io-semantics]]"
- "[[nextflow-params-to-galaxy-inputs]]"
- "[[nextflow-path-glob-to-galaxy-datatype]]"
- "[[galaxy-collection-semantics]]"
- "[[galaxy-collection-tools]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[iwc-transformations-survey]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
- "[[galaxy-data-flow-draft-contract]]"
- "[[iwc-conditionals-survey]]"
- "[[manifest-to-mapped-collection-lifecycle]]"
- "[[map-workflow-enum-to-tool-parameter]]"
- "[[regex-relabel-via-tabular]]"
- "[[relabel-via-rules-and-find-replace]]"
- "[[reshape-relabel-remap-by-collection-axis]]"
- "[[sync-collections-by-identifier]]"
- "[[tabular-compute-new-column]]"
- "[[tabular-concatenate-collection-to-table]]"
- "[[tabular-cut-and-reorder-columns]]"
- "[[tabular-filter-by-column-value]]"
- "[[tabular-filter-by-regex]]"
- "[[tabular-group-and-aggregate-with-datamash]]"
- "[[tabular-join-on-key]]"
- "[[tabular-pivot-collection-to-wide]]"
- "[[tabular-prepend-header]]"
- "[[tabular-relabel-by-row-counter]]"
- "[[tabular-split-taxonomy-string]]"
- "[[tabular-sql-query]]"
- "[[tabular-synthesize-bed-from-3col]]"
- "[[tabular-to-collection-by-row]]"
- "[[iwc-map-over-lifecycle-survey]]"
- "[[nextflow-patterns]]"
related_molds:
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[implement-galaxy-tool-step]]"
sources:
- "https://github.com/jmchilton/foundry/issu
...
schema · summary-nextflow · packaged
Read process, channel, operator, and fixture structure while drafting Galaxy-facing abstract data flow.
upfront runtime verbatim corpus-observed deterministic 56.7 KB
- bundle: references/schemas/summary-nextflow.schema.json
- source: package://@galaxy-foundry/summarize-nextflow#summaryNextflowSchema
Preview json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://galaxyproject.org/foundry/schemas/summary-nextflow.schema.json",
"$comment": "Canonical source: packages/summarize-nextflow/src/schema/summary-nextflow.schema.json in jmchilton/foundry. Mold frontmatter cites this schema via [[summary-nextflow]] wiki-links; the cast pipeline imports the `summaryNextflowSchema` runtime export and serializes it into cast bundles.",
"title": "Nextflow Pipeline Summary",
"description": "Structured per-source summary emitted by the summarize-nextflow Mold.\n\nPer-source schema by design — paper, Nextflow, and CWL each have their own summary shape; downstream Molds (data flow, templates, tool wrappers) consume any source's summary and handle the polymorphism.\n\nField names mirror gxy-sketches' SketchSource / ToolSpec / TestDataRef / ExpectedOutputRef where parity exists; see content/research/gxy-sketches-alignment.md.",
"$ref": "#/$defs/Summary",
"$defs": {
"Summary": {
"title": "Summary",
"description": "Top-level shape. Every Nextflow summary is exactly this object.",
"type": "object",
"additionalProperties": false,
"required": [
"source",
"params",
"sample_sheets",
"profiles",
"tools",
"processes",
"subworkflows",
"workflow",
"reference_assets",
"reference_rebuilds",
"test_fixtures",
"nf_tests"
],
"properties": {
"source": {
"$ref": "#/$defs/SourceRecord"
},
"params": {
"type": "array",
"items": {
"$ref": "#/$defs/Param"
}
},
"sample_sheets": {
"type": "array",
"items": {
"$ref": "#/$defs/SampleSheet"
},
"description": "Structured sample-sheet inputs. Each entry binds one `params[]` parameter to a row schema (column names, types, path-vs-meta classification, required flags, enums, patterns). Promoted from prose inside `params[].description` so downstream target translations (Galaxy `sample_sheet*` collections, CWL records-of-arrays) can choose collection variants without re-parsing the source pipeline. Empty array when no sample-sheet idiom is detected. Discovery sources: nf-schema `schema:` references, `samplesheetToList()` calls, and `splitCsv(header: true)` m
...
Load on demand
Ground collection-shape choices in curated, corpus-observed operation and recipe patterns.
Trigger: When selecting collection cleanup, reshape, identifier, or collection-tabular bridge patterns.
on-demand runtime verbatim corpus-observed deterministic 4.4 KB
- bundle: references/patterns/galaxy-collection-patterns.md
- source: content/patterns/galaxy-collection-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: collection patterns"
aliases:
- "Galaxy collection pattern MOC"
- "collection transformation patterns"
- "IWC collection pattern map"
tags:
- pattern
- target/galaxy
- topic/galaxy-transform
- topic/collection-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy collection transformation patterns."
related_notes:
- "[[iwc-transformations-survey]]"
- "[[iwc-conditionals-survey]]"
related_patterns:
- "[[manifest-to-mapped-collection-lifecycle]]"
- "[[cleanup-sync-and-publish-nonempty-results]]"
- "[[reshape-relabel-remap-by-collection-axis]]"
- "[[fan-in-bundle-consume-and-flatten]]"
- "[[collection-cleanup-after-mapover-failure]]"
- "[[sync-collections-by-identifier]]"
- "[[harmonize-by-sortlist-from-identifiers]]"
- "[[regex-relabel-via-tabular]]"
- "[[relabel-via-rules-and-find-replace]]"
- "[[collection-swap-nesting-with-apply-rules]]"
- "[[collection-split-identifier-via-rules]]"
- "[[collection-build-list-paired-with-apply-rules]]"
- "[[tabular-to-collection-by-row]]"
- "[[tabular-concatenate-collection-to-table]]"
- "[[tabular-pivot-collection-to-wide]]"
related_molds:
- "[[implement-galaxy-tool-step]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[cwl-summary-to-galaxy-template]]"
- "[[paper-summary-to-galaxy-template]]"
- "[[compare-against-iwc-exemplar]]"
---
# Galaxy: collection patterns
This is the runtime-facing map for Galaxy collection transformation choices. Use it before loading raw survey notes. The survey remains evidence backing;
...
Ground tabular bridge and table-operation choices in curated, corpus-observed operation patterns.
Trigger: When data-flow translation needs filtering, joining, aggregation, pivoting, or tabular-collection bridges.
on-demand runtime verbatim corpus-observed deterministic 3.1 KB
- bundle: references/patterns/galaxy-tabular-patterns.md
- source: content/patterns/galaxy-tabular-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: tabular patterns"
aliases:
- "Galaxy tabular pattern MOC"
- "tabular transformation patterns"
- "IWC tabular pattern map"
tags:
- pattern
- target/galaxy
- topic/galaxy-transform
- topic/tabular-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy tabular transformation patterns."
related_notes:
- "[[iwc-tabular-operations-survey]]"
related_patterns:
- "[[tabular-filter-by-column-value]]"
- "[[tabular-filter-by-regex]]"
- "[[tabular-cut-and-reorder-columns]]"
- "[[tabular-compute-new-column]]"
- "[[tabular-join-on-key]]"
- "[[tabular-group-and-aggregate-with-datamash]]"
- "[[tabular-sql-query]]"
- "[[tabular-prepend-header]]"
- "[[tabular-synthesize-bed-from-3col]]"
- "[[tabular-split-taxonomy-string]]"
- "[[tabular-relabel-by-row-counter]]"
- "[[tabular-to-collection-by-row]]"
- "[[tabular-concatenate-collection-to-table]]"
- "[[tabular-pivot-collection-to-wide]]"
related_molds:
- "[[implement-galaxy-tool-step]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[cwl-summary-to-galaxy-template]]"
- "[[paper-summary-to-galaxy-template]]"
- "[[compare-against-iwc-exemplar]]"
---
# Galaxy: tabular patterns
This is the runtime-facing map for Galaxy tabular transformation choices. Use it before loading raw survey notes. The survey remains evidence backing; the operation pages are the actionable references.
## Row And Column Operations
- [[tabular-filter-by-column-value]] — keep/drop rows by string column value with `Filter1`.
- [[tabular-filter-by-regex]] — k
...
Preserve per-row metadata on the data-flow side: keep sample_sheet column_definitions wired through identifier-keyed steps instead of dropping into parallel parameter inputs, and re-attach metadata after map-over steps that lose it.
Trigger: When the upstream interface brief carries a sample_sheet[:paired|:paired_or_unpaired|:record] input, or when the Nextflow summary shows tuple(meta, path...) channel shape originating from samplesheetToList or splitCsv(header: true).
on-demand runtime verbatim corpus-observed deterministic 8.4 KB
- bundle: references/notes/galaxy-sample-sheet-collections.md
- source: content/research/galaxy-sample-sheet-collections.md
Preview md
---
type: research
subtype: component
title: "Galaxy sample_sheet collection types"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-05-05
revised: 2026-05-06
revision: 2
ai_generated: true
related_notes:
- "[[galaxy-collection-semantics]]"
- "[[galaxy-collection-tools]]"
- "[[nextflow-workflow-io-semantics]]"
- "[[nextflow-params-to-galaxy-inputs]]"
- "[[nextflow-path-glob-to-galaxy-datatype]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[nextflow-to-galaxy-reference-data-mapping]]"
related_molds:
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
sources:
- "Galaxy PR #19305 (Implement Sample Sheets), merged 2025-07-30"
- "lib/galaxy/model/dataset_collections/types/sample_sheet.py"
- "lib/galaxy/model/dataset_collections/types/sample_sheet_util.py"
- "lib/galaxy/model/dataset_collections/type_description.py"
- "lib/galaxy/schema/schema.py (SampleSheetColumnDefinition, SampleSheetRow)"
- "lib/galaxy/tools/wrappers.py (DatasetCollectionWrapper.sample_sheet_row)"
- "lib/galaxy/tools/sample_sheet_to_tabular.xml"
- "lib/galaxy/webapps/galaxy/api/dataset_collections.py (sample_sheet_workbook endpoints)"
- "lib/galaxy/model/migrations/alembic/versions_gxy/3af58c192752_implement_sample_sheets.py"
summary: "Galaxy's sample_sheet collection family: typed column metadata, four variants, mapping rules, validator allowlist."
---
# Galaxy sample_sheet collection types
Reference for the Galaxy backend shape that targets structured per-row metadata — the natural landing zone for Nextflow `samplesheetToList` parameters and for any source-side idiom that pairs typed metadata columns with dataset references.
## Shape
A `sample_sheet` is a list-shaped collection where each el
...
Decide between subworkflow `when:` and inline tool-step `when:` for each source conditional, and pick the right output fan-in primitive (`pick_value` vs twin-cascade) so the data-flow brief carries a coherent conditional disposition forward.
Trigger: When the Nextflow summary's `workflow.conditionals[]` is non-empty, or when subworkflow boundaries in the source align with parameter-driven branches (step, aligner, wes, tools, skip_*, use_*).
on-demand runtime verbatim corpus-observed deterministic 13.7 KB
- bundle: references/notes/nextflow-conditional-to-galaxy-subworkflow-when.md
- source: content/research/nextflow-conditional-to-galaxy-subworkflow-when.md
Preview md
---
type: research
subtype: component
title: "Nextflow conditional to Galaxy subworkflow / when"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-08
revised: 2026-05-08
revision: 1
ai_generated: true
related_notes:
- "[[nextflow-to-galaxy-reference-data-mapping]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[summary-nextflow]]"
- "[[gxformat2-schema]]"
related_molds:
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
sources:
- "https://github.com/galaxyproject/gxformat2"
- "https://github.com/iwc-workflows"
summary: "Stub. Translate Nextflow conditionals into Galaxy `when:` (single-workflow v1). Subworkflow vs inline is an aesthetic call, not a rule."
---
# Nextflow conditional to Galaxy subworkflow / when
Stub. Surfaced from sarek emulation (2026-05-08). Companion to [[nextflow-to-galaxy-reference-data-mapping]] — same v1 posture (one Galaxy workflow per source pipeline; trench-coat shape is acceptable as a draft for human review), different gap (control flow rather than reference data).
## Posture
For v1 of the Nextflow-to-Galaxy translation Molds the output is a single Galaxy workflow per source pipeline, even when the source has substantial branching. IWC reviewers historically prefer sibling workflows for what looks like one pipeline with toggles, and we agree; but for the *translation step* a single artifact keeps the Mold pipeline deterministic, the harness simple, and the reviewer's mental model of "this draft maps 1:1 to the source" intact. Sibling-extraction is a polish pass a human or follow-up Mold runs *after* translation, not a decision the translation Mold makes.
The question this note addresses is: given that v1 is one Galaxy workflow, *h
...
Preserve datatype confidence while translating path-like data-flow edges, process output patterns, and published outputs.
Trigger: When choosing or reviewing Galaxy datatype extensions for data-flow edges, collection elements, or output datasets.
on-demand runtime verbatim corpus-observed deterministic 12.8 KB
- bundle: references/notes/nextflow-path-glob-to-galaxy-datatype.md
- source: content/research/nextflow-path-glob-to-galaxy-datatype.md
Preview md
---
type: research
subtype: component
title: "Nextflow path/glob to Galaxy datatype mapping"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-06
revised: 2026-05-06
revision: 1
ai_generated: true
related_notes:
- "[[nextflow-workflow-io-semantics]]"
- "[[gxformat2-workflow-inputs]]"
- "[[galaxy-datatypes-conf]]"
- "[[galaxy-sample-sheet-collections]]"
- "[[nextflow-params-to-galaxy-inputs]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[summary-nextflow]]"
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
related_molds:
- "[[summarize-nextflow]]"
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
sources:
- "content/research/datatypes_conf.xml.sample"
- "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/config/datatypes_conf.xml.sample"
- "https://www.nextflow.io/docs/latest/process.html"
- "https://www.nextflow.io/docs/latest/reference/channel.html"
- "https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/"
summary: "Rules for mapping Nextflow path, glob, sample-sheet, and output filename evidence to Galaxy datatype extensions."
---
# Nextflow path/glob to Galaxy datatype mapping
Use this note when a Nextflow-to-Galaxy Mold needs a gxformat2 `format` value for a `data` input, collection element, or workflow output. [[nextflow-params-to-galaxy-inputs]] decides whether something is a dataset or collection; this note only decides datatype extension and confidence.
Evidence quality:
- **Corpus-observed** claims cite pinned fixtures under `$NEXTFLOW_FIXTURES`, the shared clone at `/Users/jxc755/projects/repositories/workflow-fixt
...
Cross-check source-side reference-data classifications before deciding how reference assets and optional rebuild branches flow through the Galaxy data-flow draft.
Trigger: When the reference-data or interface brief is silent, low-confidence, or conflicts with source evidence for iGenomes-derived params, coordinated bundles, compute-if-missing branches, multi-DB pick-lists, or cohort-specific assets.
on-demand runtime verbatim corpus-observed deterministic 7.8 KB
- bundle: references/notes/nextflow-reference-data-classification.md
- source: content/research/nextflow-reference-data-classification.md
Preview md
---
type: research
subtype: component
title: "Nextflow reference-data classification"
tags:
- research/component
- source/nextflow
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 3
ai_generated: true
related_notes:
- "[[summary-nextflow]]"
- "[[nextflow-to-galaxy-reference-data-mapping]]"
- "[[nextflow-summary-to-galaxy-reference-data]]"
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
related_molds:
- "[[summarize-nextflow]]"
- "[[nextflow-summary-to-galaxy-reference-data]]"
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
sources:
- "https://nf-co.re/docs/usage/reference_genomes"
- "https://github.com/nf-core/sarek/blob/master/conf/igenomes.config"
- "https://github.com/nf-core/configs"
- "https://github.com/jmchilton/foundry/issues/221"
summary: "Source-side taxonomy of how Nextflow pipelines use reference data — eight classifications detectable from a summary-nextflow artifact."
---
# Nextflow reference-data classification
Reference-data shape varies along several roughly orthogonal dimensions: whether the pipeline consumes or produces reference data, the cardinality of the assets, whether they're keyed or per-asset, whether rebuild fallback exists, and whether multiple bundles run in parallel. The classifications below are flags an LLM can detect from a `summary-nextflow` artifact; a single pipeline often matches more than one. Grounded in the complexity bridge fixtures from jmchilton/foundry#221.
For the Galaxy-side translation of these classifications, see [[nextflow-to-galaxy-reference-data-mapping]].
## None
Pipeline consumes no reference d
...
Decide how reference assets and their indexes flow through the Galaxy data-flow draft (preserving dbkey through map-overs, deferring index-building to wrappers vs surfacing as workflow steps).
Trigger: When the upstream interface brief carries reference-data inputs (FASTA, fai, dict, indexes, known sites, intervals, PoN) or when the source pipeline's compute-if-missing branches imply rebuild semantics the data flow has to honor.
on-demand runtime verbatim corpus-observed deterministic 12.0 KB
- bundle: references/notes/nextflow-to-galaxy-reference-data-mapping.md
- source: content/research/nextflow-to-galaxy-reference-data-mapping.md
Preview md
---
type: research
subtype: component
title: "Nextflow to Galaxy reference-data mapping"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-08
revised: 2026-05-10
revision: 5
ai_generated: true
related_notes:
- "[[nextflow-reference-data-classification]]"
- "[[nextflow-params-to-galaxy-inputs]]"
- "[[nextflow-path-glob-to-galaxy-datatype]]"
- "[[summary-nextflow]]"
- "[[nextflow-summary-to-galaxy-reference-data]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[galaxy-sample-sheet-collections]]"
- "[[galaxy-datatypes-conf]]"
related_molds:
- "[[summarize-nextflow]]"
- "[[nextflow-summary-to-galaxy-reference-data]]"
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
sources:
- "https://github.com/jmchilton/foundry/issues/221"
summary: "Galaxy-side translation of Nextflow reference-data classifications: idioms available, the v1 posture, datatype defaults, and the in-tool rebuild trade-off."
---
# Nextflow to Galaxy reference-data mapping
Mapping research for [[nextflow-summary-to-galaxy-reference-data]]. Once a Nextflow pipeline's reference-data usage is classified per [[nextflow-reference-data-classification]], this note pins the Galaxy-side translation: idioms available, the v1 posture, datatype defaults, the in-tool rebuild trade-off, and known representation gaps the brief should flag.
## Galaxy side
Galaxy has multiple idioms for surfacing reference data. The bullets below are presented as available shapes; the recommendations that follow narrow them to the v1 posture.
- **`dbkey`-keyed cached lookups.** Workflow inputs carry a `dbkey` annotation; tools consume an admin-pre-loaded data table indexed by `db
...
SKILL.md
# nextflow-summary-to-galaxy-data-flow
Follow the procedure below and use the artifact/reference sections as the runtime contract.
## When To Use
- Translate a Nextflow summary into a Galaxy data-flow design brief.
## Inputs
- Read artifact `summary-nextflow`. Schema: summary-nextflow. Produced by `summarize-nextflow`. Structured Nextflow pipeline summary emitted by summarize-nextflow; the JSON the data-flow translation reads.
- Read artifact `nextflow-galaxy-reference-data`. Produced by `nextflow-summary-to-galaxy-reference-data`. Reference-data shape brief from nextflow-summary-to-galaxy-reference-data that pins per-asset reference inputs and rebuild-on-absence behavior.
- Read artifact `nextflow-galaxy-interface`. Produced by `nextflow-summary-to-galaxy-interface`. Preceding Galaxy interface brief from nextflow-summary-to-galaxy-interface that pins inputs, outputs, and labels.
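Before reading the `summary-nextflow` input in depth, a quick structural check can catch a truncated or malformed summary. The sketch below assumes the twelve required top-level keys shown in the packaged schema preview; `missing_summary_keys` is a hypothetical helper and is not a substitute for full JSON Schema validation.

```python
import json

# Required top-level keys as listed in the packaged summary-nextflow schema preview.
REQUIRED_SUMMARY_KEYS = (
    "source", "params", "sample_sheets", "profiles", "tools", "processes",
    "subworkflows", "workflow", "reference_assets", "reference_rebuilds",
    "test_fixtures", "nf_tests",
)


def missing_summary_keys(summary_text: str) -> list:
    """Hypothetical sanity check: report required keys absent from a summary JSON document."""
    summary = json.loads(summary_text)
    return [key for key in REQUIRED_SUMMARY_KEYS if key not in summary]
```

Any key reported here is better recorded as an open question in the brief than silently worked around.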
## Outputs
- Write artifact `nextflow-galaxy-data-flow` as `nextflow-galaxy-data-flow.md`. Format: `markdown`. Reviewable Markdown brief: abstract operations, collection map/reduce choices, shape-changing placeholder steps, unresolved Galaxy tool needs, confidence, open questions.
## Required Tools
- None declared. Procedure should not assume external CLIs are present.
## Load Upfront
- `references/notes/galaxy-data-flow-draft-contract.md`: Research note copied verbatim into the bundle. Keep the data-flow brief separate from gxformat2 templating and concrete step implementation.
- `references/notes/nextflow-operators-to-galaxy-collection-recipes.md`: Research note copied verbatim into the bundle. Classify Nextflow operators as Galaxy wiring, collection semantics, explicit steps, or review triggers.
- `references/notes/nextflow-to-galaxy-channel-shape-mapping.md`: Research note copied verbatim into the bundle. Translate Nextflow channel, tuple, and path shapes into Galaxy dataset and collection shapes.
- `references/schemas/summary-nextflow.schema.json`: Schema file copied verbatim into the bundle. Read process, channel, operator, and fixture structure while drafting Galaxy-facing abstract data flow.
## Load On Demand
- `references/patterns/galaxy-collection-patterns.md`: Pattern note copied verbatim into the bundle. Ground collection-shape choices in curated, corpus-observed operation and recipe patterns. Use when: selecting collection cleanup, reshape, identifier, or collection-tabular bridge patterns.
- `references/patterns/galaxy-tabular-patterns.md`: Pattern note copied verbatim into the bundle. Ground tabular bridge and table-operation choices in curated, corpus-observed operation patterns. Use when: data-flow translation needs filtering, joining, aggregation, pivoting, or tabular-collection bridges.
- `references/notes/galaxy-sample-sheet-collections.md`: Research note copied verbatim into the bundle. Preserve per-row metadata on the data-flow side: keep sample_sheet column_definitions wired through identifier-keyed steps instead of dropping into parallel parameter inputs, and re-attach metadata after map-over steps that lose it. Use when: the upstream interface brief carries a sample_sheet[:paired|:paired_or_unpaired|:record] input, or when the Nextflow summary shows tuple(meta, path...) channel shape originating from samplesheetToList or splitCsv(header: true).
- `references/notes/nextflow-conditional-to-galaxy-subworkflow-when.md`: Research note copied verbatim into the bundle. Decide between subworkflow `when:` and inline tool-step `when:` for each source conditional, and pick the right output fan-in primitive (`pick_value` vs twin-cascade) so the data-flow brief carries a coherent conditional disposition forward. Use when: the Nextflow summary's `workflow.conditionals[]` is non-empty, or when subworkflow boundaries in the source align with parameter-driven branches (step, aligner, wes, tools, skip_*, use_*).
- `references/notes/nextflow-path-glob-to-galaxy-datatype.md`: Research note copied verbatim into the bundle. Preserve datatype confidence while translating path-like data-flow edges, process output patterns, and published outputs. Use when: choosing or reviewing Galaxy datatype extensions for data-flow edges, collection elements, or output datasets.
- `references/notes/nextflow-reference-data-classification.md`: Research note copied verbatim into the bundle. Cross-check source-side reference-data classifications before deciding how reference assets and optional rebuild branches flow through the Galaxy data-flow draft. Use when: the reference-data or interface brief is silent, low-confidence, or conflicts with source evidence for iGenomes-derived params, coordinated bundles, compute-if-missing branches, multi-DB pick-lists, or cohort-specific assets.
- `references/notes/nextflow-to-galaxy-reference-data-mapping.md`: Research note copied verbatim into the bundle. Decide how reference assets and their indexes flow through the Galaxy data-flow draft (preserving dbkey through map-overs, deferring index-building to wrappers vs surfacing as workflow steps). Use when: the upstream interface brief carries reference-data inputs (FASTA, fai, dict, indexes, known sites, intervals, PoN) or when the source pipeline's compute-if-missing branches imply rebuild semantics the data flow has to honor.
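The "Use when" triggers above can be read as simple predicates over the inputs. The sketch below covers only two of them and rests on assumptions: `select_on_demand` and `interface_has_sample_sheet` are hypothetical names, and a non-empty `sample_sheets[]` is used as a stand-in for the `tuple(meta, path...)` trigger, which the real procedure may evaluate differently.

```python
def select_on_demand(summary: dict, interface_has_sample_sheet: bool = False) -> list:
    """Hypothetical trigger check; returns bundle paths of notes worth loading."""
    selected = []
    # Trigger quoted in the list above: workflow.conditionals[] is non-empty.
    if summary.get("workflow", {}).get("conditionals"):
        selected.append("references/notes/nextflow-conditional-to-galaxy-subworkflow-when.md")
    # Assumption: a non-empty sample_sheets[] stands in for the tuple(meta, path...) trigger.
    if interface_has_sample_sheet or summary.get("sample_sheets"):
        selected.append("references/notes/galaxy-sample-sheet-collections.md")
    return selected
```

Keeping the triggers as explicit checks makes it easy to explain in the brief why a given reference was or was not consulted.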
## Validation
- None declared.
## Procedure
Read the Nextflow summary together with the preceding Galaxy reference-data and interface briefs, then emit a reviewable Markdown data-flow brief. Capture abstract operations, collection map/reduce choices, shape-changing placeholder transformations, unresolved Galaxy tool needs, confidence, and open questions.
The output is not gxformat2 and should not resolve exact Tool Shed tools. nextflow-summary-to-galaxy-template turns this handoff and the interface brief into a skeleton.
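To make the shape of that handoff concrete, here is a minimal sketch of a brief held in memory and rendered to Markdown. The class names, field names, section headings, and confidence wording are all illustrative assumptions; the skill does not prescribe a schema for the brief, and the operations stay abstract rather than naming Tool Shed tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbstractOperation:
    name: str        # abstract step, not a resolved Tool Shed tool
    shape_in: str    # e.g. "list:paired"
    shape_out: str   # e.g. "list"
    confidence: str  # free-text level such as "high" or "low" (assumed vocabulary)


@dataclass
class DataFlowBrief:
    operations: List[AbstractOperation] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the brief using hypothetical section headings."""
        lines = ["# Galaxy data-flow brief", "", "## Abstract operations"]
        for op in self.operations:
            lines.append(
                f"- {op.name}: `{op.shape_in}` -> `{op.shape_out}` (confidence: {op.confidence})"
            )
        lines += ["", "## Open questions"]
        lines += [f"- {q}" for q in self.open_questions] or ["- None recorded."]
        return "\n".join(lines) + "\n"
```

For example, `DataFlowBrief([AbstractOperation("align reads", "list:paired", "list", "medium")]).to_markdown()` yields a short brief with one operation and an explicit "None recorded." entry under open questions.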
## Runtime Notes
- Do not read Foundry source files at runtime; use only files packaged in this skill bundle and user-supplied artifacts.
- Preserve declared artifact filenames unless the user or harness supplies explicit paths.
- Carry unresolved assumptions into the output artifact instead of silently inventing missing source evidence.
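
The first note above, about reading only packaged files, can be enforced mechanically. The following is a sketch under assumptions: `is_inside_bundle` is a hypothetical guard, and the bundle root is whatever directory the harness unpacks the skill into.

```python
from pathlib import Path


def is_inside_bundle(bundle_root: str, candidate: str) -> bool:
    """Hypothetical guard: True only if candidate resolves to a path under bundle_root."""
    root = Path(bundle_root).resolve()
    target = (root / candidate).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True
```

With this guard, a packaged path such as `references/notes/galaxy-data-flow-draft-contract.md` passes, while a path that climbs out of the bundle (for example one starting with `../content/`) is rejected.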