Claude skill · cast

nextflow-summary-to-galaxy-template

gxformat2 skeleton with per-step TODOs from a Nextflow summary and prior Galaxy design briefs.

← All cast skills · Source mold →

Install

/plugin marketplace add jmchilton/foundry
/plugin install foundry-skills@galaxy-workflow-foundry

Then invoke as:

/foundry-skills:nextflow-summary-to-galaxy-template

Skill Bundle

/ packaged cast
attached files
11
upfront
4
on demand
7
cast rev
3
validated
0

Produces: 1 artifact.

Consumes: 5 artifacts.

Artifact Contract

/ skill handoff

Produces

galaxy-workflow-draft

gxformat2 draft (see [[galaxy-workflow-draft-format]]): topology fully resolved (workflow inputs, outputs, step set, edges); tool_id / tool_state / tool_shed_repository and wrapper-determined port names may be TODO with free-text _plan_state / _plan_context / _plan_in / _plan_out per step for later implementation Molds.

yamlgalaxy-workflow-draft.gxwf.yml
Raw artifact contract
{
  "id": "galaxy-workflow-draft",
  "kind": "yaml",
  "default_filename": "galaxy-workflow-draft.gxwf.yml",
  "description": "gxformat2 draft (see [[galaxy-workflow-draft-format]]): topology fully resolved (workflow inputs, outputs, step set, edges); tool_id / tool_state / tool_shed_repository and wrapper-determined port names may be TODO with free-text _plan_state / _plan_context / _plan_in / _plan_out per step for later implementation Molds."
}

Consumes

summary-nextflow

Structured Nextflow pipeline summary emitted by [[summarize-nextflow]]; consulted while emitting placeholder steps.

Raw artifact contract
{
  "id": "summary-nextflow",
  "description": "Structured Nextflow pipeline summary emitted by [[summarize-nextflow]]; consulted while emitting placeholder steps.",
  "inherited_schema": "[[summary-nextflow]]",
  "producers": [
    "summarize-nextflow"
  ]
}

nextflow-galaxy-reference-data

Reference-data shape brief from [[nextflow-summary-to-galaxy-reference-data]] that pins per-asset reference inputs, optionality, datatype hints, and rebuild-on-absence behavior the skeleton must honor.

Raw artifact contract
{
  "id": "nextflow-galaxy-reference-data",
  "description": "Reference-data shape brief from [[nextflow-summary-to-galaxy-reference-data]] that pins per-asset reference inputs, optionality, datatype hints, and rebuild-on-absence behavior the skeleton must honor.",
  "producers": [
    "nextflow-summary-to-galaxy-reference-data"
  ]
}

nextflow-galaxy-interface

Galaxy interface brief from [[nextflow-summary-to-galaxy-interface]] that pins workflow inputs, outputs, labels.

Raw artifact contract
{
  "id": "nextflow-galaxy-interface",
  "description": "Galaxy interface brief from [[nextflow-summary-to-galaxy-interface]] that pins workflow inputs, outputs, labels.",
  "producers": [
    "nextflow-summary-to-galaxy-interface"
  ]
}

nextflow-galaxy-data-flow

Galaxy data-flow brief from [[nextflow-summary-to-galaxy-data-flow]] that pins abstract operations and collection choices.

Raw artifact contract
{
  "id": "nextflow-galaxy-data-flow",
  "description": "Galaxy data-flow brief from [[nextflow-summary-to-galaxy-data-flow]] that pins abstract operations and collection choices.",
  "producers": [
    "nextflow-summary-to-galaxy-data-flow"
  ]
}

iwc-comparison-notes

Structural diff guidance from [[compare-against-iwc-exemplar]] (run on the design briefs); steers the skeleton toward IWC-aligned structure before per-step authoring.

Raw artifact contract
{
  "id": "iwc-comparison-notes",
  "description": "Structural diff guidance from [[compare-against-iwc-exemplar]] (run on the design briefs); steers the skeleton toward IWC-aligned structure before per-step authoring.",
  "producers": [
    "compare-against-iwc-exemplar"
  ]
}

Attached Files

/ runtime references

Load upfront

research

galaxy-data-flow-draft-contract

packaged

Respect the handoff from abstract data-flow draft to gxformat2 skeleton.

Trigger: When translating abstract nodes, unresolved tool needs, and placeholder transformations into template TODOs.

upfront runtime verbatim hypothesis deterministic 6.4 KB
bundle
references/notes/galaxy-data-flow-draft-contract.md
source
content/research/galaxy-data-flow-draft-contract.md
Preview md
---
type: research
subtype: design-spec
title: "Galaxy data-flow draft contract"
tags:
  - research/design-spec
  - target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-03
revision: 2
ai_generated: true
related_notes:
  - "[[nextflow-to-galaxy-channel-shape-mapping]]"
  - "[[nextflow-operators-to-galaxy-collection-recipes]]"
related_molds:
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[paper-summary-to-galaxy-design]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[paper-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
sources:
  - "https://github.com/jmchilton/foundry/issues/54"
summary: "Defines the proposed boundary between Galaxy data-flow drafts, gxformat2 templates, and concrete step implementation."
---

# Galaxy Data-Flow Draft Contract

This is an architectural contract, not a schema. Evidence is strongest for Mold and Pipeline boundaries. Proposed fields are speculative until exercised by two or three worked translations.

## Boundary

The data-flow draft owns a target-shaped abstract DAG for Galaxy. It should not be valid `gxformat2` and should not resolve exact Tool Shed tools.

Data-flow draft owns:

- Galaxy-facing workflow inputs and outputs.
- Abstract nodes, edges, branches, collection mapping, collection reduction, and placeholder transformations.
- Input/output shape decisions such as `File`, `list`, `paired`, `list:paired`, or `list:list`.
- Conceptual Galaxy idioms: map-over, reduction, Apply Rules, collection cleanup, identifier synchronization, tabular bridge.
- Abstract unresolved tool needs with input and output shapes.
- Confidence and rationale on inferred nodes, edges, transforms, and tool needs.

The Galaxy template
...
research

galaxy-workflow-draft-format

packaged

Emit the gxformat2 draft superset: TODO tool_id, optional tool_state / tool_shed_repository, and per-step _plan_state / _plan_context planning fields.

upfront runtime verbatim hypothesis deterministic 7.1 KB
bundle
references/notes/galaxy-workflow-draft-format.md
source
content/research/galaxy-workflow-draft-format.md
Preview md
---
type: research
subtype: design-spec
title: "Galaxy workflow draft format"
tags:
  - research/design-spec
  - target/galaxy
status: draft
created: 2026-05-06
revised: 2026-05-10
revision: 2
ai_generated: true
related_notes:
  - "[[gxformat2-schema]]"
  - "[[galaxy-data-flow-draft-contract]]"
  - "[[discover-shed-tool]]"
related_molds:
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[paper-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
  - "[[implement-galaxy-tool-step]]"
summary: "gxformat2 draft superset: wrapper-tier TODOs (tool_id, tool_state, port names) plus _plan_state / _plan_context / _plan_in / _plan_out per tool step."
---

# Galaxy workflow draft format

The output artifact `galaxy-workflow-draft` produced by the `*-summary-to-galaxy-template` Molds is **gxformat2 with wrapper-tier relaxations and free-text planning fields**, sized to the gap between data-flow design and tool-resolved implementation.

Topology — workflow inputs and their collection shapes, workflow outputs, the step set, the producer→consumer edge graph, branches, and `when:` guards — is **settled by the template Mold itself**, drawing on the upstream interface and data-flow briefs (see [[galaxy-data-flow-draft-contract]]). The output is concrete gxformat2 with no topology TODOs. Everything deferred to the per-step implementation Mold is wrapper-tier: which Tool Shed wrapper, what parameters, and the wrapper-determined port names that populate `in:` / `out:` / `outputSource`.

## Relaxations vs. gxformat2

For tool steps, when the wrapper has not been picked:

- `tool_id` and `tool_version` MAY be the literal string `TODO`. Resolution belongs to [[discover-shed-tool]] and the per-step implementation Mold.
- `tool_shed_repos
...
research

gxformat2-schema

packaged

Use the gxformat2 structural vocabulary for workflow inputs, outputs, steps, and producer-side output actions while emitting the skeleton.

upfront runtime verbatim corpus-observed deterministic 3.8 KB
bundle
references/notes/gxformat2-schema.md
source
content/research/gxformat2-schema.md
Preview md
---
type: research
subtype: component
title: "gxformat2 structural JSON Schema"
tags:
  - research/component
  - target/galaxy
status: draft
created: 2026-05-05
revised: 2026-05-06
revision: 2
ai_generated: true
related_notes:
  - "[[galaxy-collection-semantics]]"
  - "[[galaxy-datatypes-conf]]"
  - "[[galaxy-workflow-testability-design]]"
  - "[[gxformat2-workflow-inputs]]"
sources:
  - "https://github.com/jmchilton/galaxy-tool-util-ts/blob/7ae4ecd0ba8d492225f58a6d455c4cc5317298f0/packages/schema/src/galaxy-workflow.ts"
summary: "Vendored structural JSON Schema for gxformat2 workflows: vocabulary for inputs, outputs, steps, and step subtypes."
---

> **Vendored from upstream**, pinned at SHA `7ae4ecd`. One file lives next to this note:
>
> - `gxformat2.schema.json` — Draft-07 JSON Schema generated from the `@galaxy-tool-util/schema` Effect-TS schema via `gxwf structural-schema --format format2`. **Agents and casting should consume this** when validating gxformat2 workflows or reasoning about the closed vocabulary of input/step types.
>
> **Re-sync:** `pnpm sync:vendored --update`. Sync builds galaxy-tool-util-ts and re-runs the generator before copy.

## Top-level shape

A gxformat2 document is an object with required `inputs`, `outputs`, `class` (`enum: ["GalaxyWorkflow"]`), and `steps`. Optional: `id`, `label`, `doc`, `uuid`, `report`, `tags`, `comments`, `creator`, `license`, `release`.

## Workflow input vocabulary

Each entry under `inputs` carries:

- `type` — closed enum: `null | boolean | int | long | float | double | string | integer | text | File | data | collection`. Drives whether the input is a parameter, a single dataset, or a collection.
- `format` — Galaxy datatype extension when `type ∈ {File, data}`. See [[galaxy-datatypes-conf]] for valid values.
- `
...
schema

summary-nextflow

packaged

Read process, channel, operator, and fixture structure when emitting placeholder steps and TODO context.

upfront runtime verbatim corpus-observed deterministic 56.7 KB
bundle
references/schemas/summary-nextflow.schema.json
source
package://@galaxy-foundry/summarize-nextflow#summaryNextflowSchema
Preview json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://galaxyproject.org/foundry/schemas/summary-nextflow.schema.json",
  "$comment": "Canonical source: packages/summarize-nextflow/src/schema/summary-nextflow.schema.json in jmchilton/foundry. Mold frontmatter cites this schema via [[summary-nextflow]] wiki-links; the cast pipeline imports the `summaryNextflowSchema` runtime export and serializes it into cast bundles.",
  "title": "Nextflow Pipeline Summary",
  "description": "Structured per-source summary emitted by the summarize-nextflow Mold.\n\nPer-source schema by design — paper, Nextflow, and CWL each have their own summary shape; downstream Molds (data flow, templates, tool wrappers) consume any source's summary and handle the polymorphism.\n\nField names mirror gxy-sketches' SketchSource / ToolSpec / TestDataRef / ExpectedOutputRef where parity exists; see content/research/gxy-sketches-alignment.md.",
  "$ref": "#/$defs/Summary",
  "$defs": {
    "Summary": {
      "title": "Summary",
      "description": "Top-level shape. Every Nextflow summary is exactly this object.",
      "type": "object",
      "additionalProperties": false,
      "required": [
        "source",
        "params",
        "sample_sheets",
        "profiles",
        "tools",
        "processes",
        "subworkflows",
        "workflow",
        "reference_assets",
        "reference_rebuilds",
        "test_fixtures",
        "nf_tests"
      ],
      "properties": {
        "source": {
          "$ref": "#/$defs/SourceRecord"
        },
        "params": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/Param"
          }
        },
        "sample_sheets": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/SampleSheet"
          },
          "description": "Structured sample-sheet inputs. Each entry binds one `params[]` parameter to a row schema (column names, types, path-vs-meta classification, required flags, enums, patterns). Promoted from prose inside `params[].description` so downstream target translations (Galaxy `sample_sheet*` collections, CWL records-of-arrays) can choose collection variants without re-parsing the source pipeline. Empty array when no sample-sheet idiom is detected. Discovery sources: nf-schema `schema:` references, `samplesheetToList()` calls, and `splitCsv(header: true)` m
...

Load on demand

pattern

galaxy-collection-patterns

packaged

Use corpus-grounded collection pattern guidance for unresolved skeleton steps.

Trigger: When adding TODO steps for collection cleanup, reshaping, relabeling, identifier synchronization, or collection-tabular bridges.

on-demand runtime verbatim corpus-observed deterministic 4.4 KB
bundle
references/patterns/galaxy-collection-patterns.md
source
content/patterns/galaxy-collection-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: collection patterns"
aliases:
  - "Galaxy collection pattern MOC"
  - "collection transformation patterns"
  - "IWC collection pattern map"
tags:
  - pattern
  - target/galaxy
  - topic/galaxy-transform
  - topic/collection-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy collection transformation patterns."
related_notes:
  - "[[iwc-transformations-survey]]"
  - "[[iwc-conditionals-survey]]"
related_patterns:
  - "[[manifest-to-mapped-collection-lifecycle]]"
  - "[[cleanup-sync-and-publish-nonempty-results]]"
  - "[[reshape-relabel-remap-by-collection-axis]]"
  - "[[fan-in-bundle-consume-and-flatten]]"
  - "[[collection-cleanup-after-mapover-failure]]"
  - "[[sync-collections-by-identifier]]"
  - "[[harmonize-by-sortlist-from-identifiers]]"
  - "[[regex-relabel-via-tabular]]"
  - "[[relabel-via-rules-and-find-replace]]"
  - "[[collection-swap-nesting-with-apply-rules]]"
  - "[[collection-split-identifier-via-rules]]"
  - "[[collection-build-list-paired-with-apply-rules]]"
  - "[[tabular-to-collection-by-row]]"
  - "[[tabular-concatenate-collection-to-table]]"
  - "[[tabular-pivot-collection-to-wide]]"
related_molds:
  - "[[implement-galaxy-tool-step]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[paper-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
---

# Galaxy: collection patterns

This is the runtime-facing map for Galaxy collection transformation choices. Use it before loading raw survey notes. The survey remains evidence backing; 
...
pattern

galaxy-tabular-patterns

packaged

Use corpus-grounded tabular pattern guidance for unresolved skeleton steps.

Trigger: When adding TODO steps for tabular filtering, projection, joins, aggregation, text-processing recipes, or tabular-collection bridges.

on-demand runtime verbatim corpus-observed deterministic 3.1 KB
bundle
references/patterns/galaxy-tabular-patterns.md
source
content/patterns/galaxy-tabular-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: tabular patterns"
aliases:
  - "Galaxy tabular pattern MOC"
  - "tabular transformation patterns"
  - "IWC tabular pattern map"
tags:
  - pattern
  - target/galaxy
  - topic/galaxy-transform
  - topic/tabular-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy tabular transformation patterns."
related_notes:
  - "[[iwc-tabular-operations-survey]]"
related_patterns:
  - "[[tabular-filter-by-column-value]]"
  - "[[tabular-filter-by-regex]]"
  - "[[tabular-cut-and-reorder-columns]]"
  - "[[tabular-compute-new-column]]"
  - "[[tabular-join-on-key]]"
  - "[[tabular-group-and-aggregate-with-datamash]]"
  - "[[tabular-sql-query]]"
  - "[[tabular-prepend-header]]"
  - "[[tabular-synthesize-bed-from-3col]]"
  - "[[tabular-split-taxonomy-string]]"
  - "[[tabular-relabel-by-row-counter]]"
  - "[[tabular-to-collection-by-row]]"
  - "[[tabular-concatenate-collection-to-table]]"
  - "[[tabular-pivot-collection-to-wide]]"
related_molds:
  - "[[implement-galaxy-tool-step]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[paper-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
---

# Galaxy: tabular patterns

This is the runtime-facing map for Galaxy tabular transformation choices. Use it before loading raw survey notes. The survey remains evidence backing; the operation pages are the actionable references.

## Row And Column Operations

- [[tabular-filter-by-column-value]] — keep/drop rows by string column value with `Filter1`.
- [[tabular-filter-by-regex]] — k
...
research

galaxy-collection-semantics

packaged

Preserve Galaxy collection typing and map-over/reduction semantics in the gxformat2 skeleton.

Trigger: When creating workflow inputs, outputs, and placeholder connections involving collections.

on-demand runtime verbatim corpus-observed deterministic 1.8 KB
bundle
references/notes/galaxy-collection-semantics.md
source
content/research/galaxy-collection-semantics.md
Preview md
---
type: research
subtype: component
title: "Galaxy collection semantics"
tags:
  - research/component
  - target/galaxy
status: draft
created: 2026-04-30
revised: 2026-05-05
revision: 3
ai_generated: false
related_notes:
  - "[[galaxy-xsd]]"
  - "[[galaxy-collection-tools]]"
  - "[[galaxy-apply-rules-dsl]]"
  - "[[nextflow-to-galaxy-channel-shape-mapping]]"
  - "[[nextflow-operators-to-galaxy-collection-recipes]]"
  - "[[galaxy-tool-job-failure-reference]]"
  - "[[galaxy-workflow-invocation-failure-reference]]"
  - "[[iwc-transformations-survey]]"
  - "[[galaxy-discover-datasets]]"
sources:
  - "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/lib/galaxy/model/dataset_collections/types/collection_semantics.yml"
summary: "Vendored formal spec of Galaxy dataset-collection mapping/reduction semantics, with labeled examples and pinned test references."
---

> **Vendored from upstream**, pinned at SHA `7765fae`. Two files live next to this note:
>
> - `galaxy-collection-semantics.yml` — the structured source. **Agents and casting should consume this.** It carries the `tests:` blocks that pin concrete Galaxy test names; the rendered upstream view drops them.
> - `galaxy-collection-semantics.upstream.myst` — Galaxy's auto-generated MyST/LaTeX rendering of the YAML, vendored only so the human view below has something to render. Sync is manual.
>
> **When to consult:** authoring or reasoning about Molds and patterns that touch `data_collection` inputs, map-over / reduction shape changes, sub-collection mapping, `paired_or_unpaired`, or `sample_sheet`.

```vendored-myst
file: galaxy-collection-semantics.upstream.myst
source: https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/doc/source/dev/collection_semant
...
research

galaxy-workflow-testability-design

packaged

Choose stable workflow input/output labels, testable checkpoint outputs, and fixture-compatible workflow interfaces while drafting the skeleton.

Trigger: When the template decides workflow inputs, workflow outputs, promoted checkpoints, or collection output identifiers that future tests will need to address.

on-demand runtime verbatim corpus-observed deterministic 11.4 KB
bundle
references/notes/galaxy-workflow-testability-design.md
source
content/research/galaxy-workflow-testability-design.md
Preview md
---
type: research
subtype: component
tags:
  - research/component
  - target/galaxy
status: draft
created: 2026-05-03
revised: 2026-05-06
revision: 2
ai_generated: true
related_notes:
  - "[[iwc-workflow-testability-survey]]"
  - "[[iwc-test-data-conventions]]"
  - "[[planemo-asserts-idioms]]"
  - "[[iwc-shortcuts-anti-patterns]]"
  - "[[planemo-workflow-test-architecture]]"
  - "[[implement-galaxy-workflow-test]]"
  - "[[gxformat2-schema]]"
  - "[[gxformat2-workflow-inputs]]"
  - "[[galaxy-datatypes-conf]]"
summary: "Design guidance for Galaxy workflow inputs, outputs, and checkpoints that make IWC-style workflow tests possible."
---

# Galaxy workflow testability design

Use this note when authoring or translating a Galaxy workflow **before** the `-tests.yml` file exists. It covers workflow structure choices that make later IWC-style tests meaningful: labels, promoted checkpoints, collection identifiers, and fixture-compatible inputs.

This is not a `content/patterns/` page. It is cross-cutting design guidance for Molds that need testable Galaxy workflows. Assertion syntax lives in [[planemo-asserts-idioms]]. Test YAML fixture shapes live in [[iwc-test-data-conventions]]. Accepted shortcut vs smell calls live in [[iwc-shortcuts-anti-patterns]]. Corpus evidence trail lives in [[iwc-workflow-testability-survey]].

## 1. Treat labels as API

Workflow input and output labels are not cosmetic. Planemo and IWC tests address workflow inputs and outputs by label, and the survey found exact label matches for every asserted output across 114 matched workflow/test pairs. A generated workflow should therefore pick stable, descriptive labels before test authoring starts.

Rules:

- Label every output that may need a test assertion.
- Treat input/output renames as breaking changes
...
research

nextflow-conditional-to-galaxy-subworkflow-when

packaged

Emit `when:` clauses on the right step type (subworkflow vs tool) and the right output fan-in primitive (`pick_value`, modes `first_non_null` / `first_or_skip` / `the_only_non_null`) when the data-flow brief carries a conditional disposition through to the skeleton.

Trigger: When the upstream data-flow brief assigns a source conditional to a subworkflow `when:` or inline `when:` shape, or when emitting the merge step for two or more `when:`-gated branches that produce the same logical output.

on-demand runtime verbatim corpus-observed deterministic 13.7 KB
bundle
references/notes/nextflow-conditional-to-galaxy-subworkflow-when.md
source
content/research/nextflow-conditional-to-galaxy-subworkflow-when.md
Preview md
---
type: research
subtype: component
title: "Nextflow conditional to Galaxy subworkflow / when"
tags:
  - research/component
  - source/nextflow
  - target/galaxy
status: draft
created: 2026-05-08
revised: 2026-05-08
revision: 1
ai_generated: true
related_notes:
  - "[[nextflow-to-galaxy-reference-data-mapping]]"
  - "[[nextflow-to-galaxy-channel-shape-mapping]]"
  - "[[summary-nextflow]]"
  - "[[gxformat2-schema]]"
related_molds:
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
sources:
  - "https://github.com/galaxyproject/gxformat2"
  - "https://github.com/iwc-workflows"
summary: "Stub. Translate Nextflow conditionals into Galaxy `when:` (single-workflow v1). Subworkflow vs inline is an aesthetic call, not a rule."
---

# Nextflow conditional to Galaxy subworkflow / when

Stub. Surfaced from sarek emulation (2026-05-08). Companion to [[nextflow-to-galaxy-reference-data-mapping]] — same v1 posture (one Galaxy workflow per source pipeline; trench-coat shape is acceptable as a draft for human review), different gap (control flow rather than reference data).

## Posture

For v1 of the Nextflow-to-Galaxy translation Molds the output is a single Galaxy workflow per source pipeline, even when the source has substantial branching. IWC reviewers historically prefer sibling workflows for what looks like one pipeline with toggles, and we agree; but for the *translation step* a single artifact keeps the Mold pipeline deterministic, the harness simple, and the reviewer's mental model of "this draft maps 1:1 to the source" intact. Sibling-extraction is a polish pass a human or follow-up Mold runs *after* translation, not a decision the translation Mold makes.

The question this note addresses is: given that v1 is one Galaxy workflow, *h
...
research

nextflow-reference-data-classification

packaged

Cross-check source-side reference-data classifications before committing reference assets, optional inputs, and rebuild behavior into the gxformat2 skeleton.

Trigger: When emitting workflow inputs or input-connection wiring for reference assets and the reference-data brief is silent, low-confidence, or conflicts with source evidence for iGenomes-derived params, coordinated bundles, compute-if-missing branches, multi-DB pick-lists, or cohort-specific assets.

on-demand runtime verbatim corpus-observed deterministic 7.8 KB
bundle
references/notes/nextflow-reference-data-classification.md
source
content/research/nextflow-reference-data-classification.md
Preview md
---
type: research
subtype: component
title: "Nextflow reference-data classification"
tags:
  - research/component
  - source/nextflow
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 3
ai_generated: true
related_notes:
  - "[[summary-nextflow]]"
  - "[[nextflow-to-galaxy-reference-data-mapping]]"
  - "[[nextflow-summary-to-galaxy-reference-data]]"
  - "[[nextflow-summary-to-galaxy-interface]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
related_molds:
  - "[[summarize-nextflow]]"
  - "[[nextflow-summary-to-galaxy-reference-data]]"
  - "[[nextflow-summary-to-galaxy-interface]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
sources:
  - "https://nf-co.re/docs/usage/reference_genomes"
  - "https://github.com/nf-core/sarek/blob/master/conf/igenomes.config"
  - "https://github.com/nf-core/configs"
  - "https://github.com/jmchilton/foundry/issues/221"
summary: "Source-side taxonomy of how Nextflow pipelines use reference data — eight classifications detectable from a summary-nextflow artifact."
---

# Nextflow reference-data classification

Reference-data shape varies along several roughly orthogonal dimensions: whether the pipeline consumes or produces reference data, the cardinality of the assets, whether they're keyed or per-asset, whether rebuild fallback exists, and whether multiple bundles run in parallel. The classifications below are flags an LLM can detect from a `summary-nextflow` artifact; a single pipeline often matches more than one. Grounded in the complexity bridge fixtures from jmchilton/foundry#221.

For the Galaxy-side translation of these classifications, see [[nextflow-to-galaxy-reference-data-mapping]].

## None

Pipeline consumes no reference d
...
research

nextflow-to-galaxy-reference-data-mapping

packaged

Resolve reference-asset declarations into gxformat2 inputs with the right datatype, optionality, and rebuild-on-absence wiring; cross-check the upstream brief against the datatypes table and v1 posture when the brief is silent or low-confidence on a specific asset.

Trigger: When emitting workflow inputs or input-connection wiring for any reference asset flagged in the reference-data brief (FASTA, fai, dict, indexes, dbsnp, known_indels, intervals, PoN, …), or when the source pipeline declares iGenomes-derived params, per-asset reference path params, or any compute-if-missing index-building branch.

on-demand runtime verbatim corpus-observed deterministic 12.0 KB
bundle
references/notes/nextflow-to-galaxy-reference-data-mapping.md
source
content/research/nextflow-to-galaxy-reference-data-mapping.md
Preview md
---
type: research
subtype: component
title: "Nextflow to Galaxy reference-data mapping"
tags:
  - research/component
  - source/nextflow
  - target/galaxy
status: draft
created: 2026-05-08
revised: 2026-05-10
revision: 5
ai_generated: true
related_notes:
  - "[[nextflow-reference-data-classification]]"
  - "[[nextflow-params-to-galaxy-inputs]]"
  - "[[nextflow-path-glob-to-galaxy-datatype]]"
  - "[[summary-nextflow]]"
  - "[[nextflow-summary-to-galaxy-reference-data]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[galaxy-sample-sheet-collections]]"
  - "[[galaxy-datatypes-conf]]"
related_molds:
  - "[[summarize-nextflow]]"
  - "[[nextflow-summary-to-galaxy-reference-data]]"
  - "[[nextflow-summary-to-galaxy-interface]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
sources:
  - "https://github.com/jmchilton/foundry/issues/221"
summary: "Galaxy-side translation of Nextflow reference-data classifications: idioms available, the v1 posture, datatype defaults, and the in-tool rebuild trade-off."
---

# Nextflow to Galaxy reference-data mapping

Mapping research for [[nextflow-summary-to-galaxy-reference-data]]. Once a Nextflow pipeline's reference-data usage is classified per [[nextflow-reference-data-classification]], this note pins the Galaxy-side translation: idioms available, the v1 posture, datatype defaults, the in-tool rebuild trade-off, and known representation gaps the brief should flag.

## Galaxy side

Galaxy has multiple idioms for surfacing reference data. The bullets below are presented as available shapes; the recommendations that follow narrow them to the v1 posture.

- **`dbkey`-keyed cached lookups.** Workflow inputs carry a `dbkey` annotation; tools consume an admin-pre-loaded data table indexed by `db
...

SKILL.md


# nextflow-summary-to-galaxy-template

Follow the procedure below and use the artifact/reference sections as the runtime contract.

## When To Use

- gxformat2 skeleton with per-step TODOs from a Nextflow summary and prior Galaxy design briefs.

## Inputs

- Read artifact `summary-nextflow`. Schema: summary-nextflow. Produced by `summarize-nextflow`. Structured Nextflow pipeline summary emitted by summarize-nextflow; consulted while emitting placeholder steps.
- Read artifact `nextflow-galaxy-reference-data`. Produced by `nextflow-summary-to-galaxy-reference-data`. Reference-data shape brief from nextflow-summary-to-galaxy-reference-data that pins per-asset reference inputs, optionality, datatype hints, and rebuild-on-absence behavior the skeleton must honor.
- Read artifact `nextflow-galaxy-interface`. Produced by `nextflow-summary-to-galaxy-interface`. Galaxy interface brief from nextflow-summary-to-galaxy-interface that pins workflow inputs, outputs, labels.
- Read artifact `nextflow-galaxy-data-flow`. Produced by `nextflow-summary-to-galaxy-data-flow`. Galaxy data-flow brief from nextflow-summary-to-galaxy-data-flow that pins abstract operations and collection choices.
- Read artifact `iwc-comparison-notes`. Produced by `compare-against-iwc-exemplar`. Structural diff guidance from compare-against-iwc-exemplar (run on the design briefs); steers the skeleton toward IWC-aligned structure before per-step authoring.

## Outputs

- Write artifact `galaxy-workflow-draft` as `galaxy-workflow-draft.gxwf.yml`. Format: `yaml`. gxformat2 draft (see galaxy-workflow-draft-format): topology fully resolved (workflow inputs, outputs, step set, edges); tool_id / tool_state / tool_shed_repository and wrapper-determined port names may be TODO with free-text _plan_state / _plan_context / _plan_in / _plan_out per step for later implementation Molds.

## Required Tools

- None declared. Procedure should not assume external CLIs are present.

## Load Upfront

- `references/notes/galaxy-data-flow-draft-contract.md`: Research note copied verbatim into the bundle. Respect the handoff from abstract data-flow draft to gxformat2 skeleton. Use when: translating abstract nodes, unresolved tool needs, and placeholder transformations into template TODOs.
- `references/notes/galaxy-workflow-draft-format.md`: Research note copied verbatim into the bundle. Emit the gxformat2 draft superset: TODO tool_id, optional tool_state / tool_shed_repository, and per-step _plan_state / _plan_context planning fields.
- `references/notes/gxformat2-schema.md`: Research note copied verbatim into the bundle. Use the gxformat2 structural vocabulary for workflow inputs, outputs, steps, and producer-side output actions while emitting the skeleton.
- `references/schemas/summary-nextflow.schema.json`: Schema file copied verbatim into the bundle. Read process, channel, operator, and fixture structure when emitting placeholder steps and TODO context.

## Load On Demand

- `references/patterns/galaxy-collection-patterns.md`: Pattern note copied verbatim into the bundle. Use corpus-grounded collection pattern guidance for unresolved skeleton steps. Use when: adding TODO steps for collection cleanup, reshaping, relabeling, identifier synchronization, or collection-tabular bridges.
- `references/patterns/galaxy-tabular-patterns.md`: Pattern note copied verbatim into the bundle. Use corpus-grounded tabular pattern guidance for unresolved skeleton steps. Use when: adding TODO steps for tabular filtering, projection, joins, aggregation, text-processing recipes, or tabular-collection bridges.
- `references/notes/galaxy-collection-semantics.md`: Research note copied verbatim into the bundle. Preserve Galaxy collection typing and map-over/reduction semantics in the gxformat2 skeleton. Use when: creating workflow inputs, outputs, and placeholder connections involving collections.
- `references/notes/galaxy-workflow-testability-design.md`: Research note copied verbatim into the bundle. Choose stable workflow input/output labels, testable checkpoint outputs, and fixture-compatible workflow interfaces while drafting the skeleton. Use when: the template decides workflow inputs, workflow outputs, promoted checkpoints, or collection output identifiers that future tests will need to address.
- `references/notes/nextflow-conditional-to-galaxy-subworkflow-when.md`: Research note copied verbatim into the bundle. Emit `when:` clauses on the right step type (subworkflow vs tool) and the right output fan-in primitive (`pick_value`, modes `first_non_null` / `first_or_skip` / `the_only_non_null`) when the data-flow brief carries a conditional disposition through to the skeleton. Use when: the upstream data-flow brief assigns a source conditional to a subworkflow `when:` or inline `when:` shape, or when emitting the merge step for two or more `when:`-gated branches that produce the same logical output.
- `references/notes/nextflow-reference-data-classification.md`: Research note copied verbatim into the bundle. Cross-check source-side reference-data classifications before committing reference assets, optional inputs, and rebuild behavior into the gxformat2 skeleton. Use when: emitting workflow inputs or input-connection wiring for reference assets and the reference-data brief is silent, low-confidence, or conflicts with source evidence for iGenomes-derived params, coordinated bundles, compute-if-missing branches, multi-DB pick-lists, or cohort-specific assets.
- `references/notes/nextflow-to-galaxy-reference-data-mapping.md`: Research note copied verbatim into the bundle. Resolve reference-asset declarations into gxformat2 inputs with the right datatype, optionality, and rebuild-on-absence wiring; cross-check the upstream brief against the datatypes table and v1 posture when the brief is silent or low-confidence on a specific asset. Use when: emitting workflow inputs or input-connection wiring for any reference asset flagged in the reference-data brief (FASTA, fai, dict, indexes, dbsnp, known_indels, intervals, PoN, …), or when the source pipeline declares iGenomes-derived params, per-asset reference path params, or any compute-if-missing index-building branch.

## Validation

- None declared.

## Procedure

Read the original Nextflow source artifact, the `summary-nextflow.json` summary, the Nextflow-to-Galaxy reference-data brief, the Nextflow-to-Galaxy interface brief, and the Nextflow-to-Galaxy data-flow brief. Emit a gxformat2 skeleton with workflow inputs, workflow outputs, placeholder steps, rough connections, and TODO slots for later implementation skills.

The reference-data, interface, and data-flow briefs guide the skeleton, but they do not replace source evidence. Treat the prior-step index as the working context: Nextflow source, source summary, reference-data brief, interface brief, data-flow brief, and any open questions carried forward.

Topology is this skill's job to settle. The output must be concrete gxformat2: workflow inputs with their final collection shapes and formats, workflow outputs, the step set, the producer→consumer edge graph, branches, and `when:` guards are all decided here. The upstream interface and data-flow briefs guide those decisions, but if they hedge or leave a topology choice open, this skill makes the call from source evidence, IWC exemplars, and pattern pages — never emit a topology `TODO`. What is deferred to per-step authoring is strictly wrapper-tier: `tool_id`, `tool_version`, `tool_shed_repository`, `tool_state`, and the wrapper-determined port names that surface in `in:` / `out:` / `outputSource`. Capture deferred intent in the `_plan_*` family (`_plan_state`, `_plan_context`, `_plan_in`, `_plan_out`) so the per-step skill has the source evidence and constraints it needs.

Defer thoughtfully. When research surfaces a Foundry pattern page that names the exact recipe — a galaxy-collection-patterns reshape, a conditional-run-optional-step gate, a galaxy-tabular-patterns filter — fill the step in as completely as the pattern allows: concrete `tool_id`, parameters, port names from the pattern's worked example. Pattern pages encode resolved choices; emitting `TODO` over a covered recipe discards real evidence the per-step skill cannot recover. Defer only when the step is a domain-specific scientific tool with no covering pattern (alignment, variant calling, expression quantification, etc.), and pack `_plan_context` with the source command, conda specs, container tags, and pre/post conditions the per-step skill will need to pick a wrapper.

Output shape is gxformat2 with wrapper-tier relaxations and `_plan_state` / `_plan_context` / `_plan_in` / `_plan_out` per tool step — see galaxy-workflow-draft-format. Refinement open work for those planning fields lives in `refinement.md`.

## Runtime Notes

- Do not read Foundry source files at runtime; use only files packaged in this skill bundle and user-supplied artifacts.
- Preserve declared artifact filenames unless the user or harness supplies explicit paths.
- Carry unresolved assumptions into the output artifact instead of silently inventing missing source evidence.