Nextflow pipeline summary

JSON Schema for the structured summary emitted by the summarize-nextflow Mold.

Revised: 2026-05-06
Rev: 8
Related:
summarize-nextflow
nextflow-workflow-io-semantics
nextflow-params-to-galaxy-inputs
nextflow-path-glob-to-galaxy-datatype
nextflow-reference-data-classification
nextflow-summary-to-galaxy-reference-data
nextflow-summary-to-galaxy-interface
nextflow-summary-to-galaxy-data-flow
nextflow-summary-to-galaxy-template
nextflow-summary-to-cwl-interface
nextflow-summary-to-cwl-data-flow
author-galaxy-tool-wrapper
nextflow-test-to-galaxy-test-plan
nextflow-test-to-cwl-test-plan

Schema: summary-nextflow (package @galaxy-foundry/summarize-nextflow @ 0.0.0)

Nextflow Pipeline Summary

Structured per-source summary emitted by the summarize-nextflow Mold. Per-source schema by design — paper, Nextflow, and CWL each have their own summary shape; downstream Molds (data flow, templates, tool wrappers) consume any source's summary and handle the polymorphism. Field names mirror gxy-sketches' SketchSource / ToolSpec / TestDataRef / ExpectedOutputRef where parity exists; see content/research/gxy-sketches-alignment.md.

29 definitions. Anchor links per definition (e.g. #Tool) are stable.

Channel

One top-level channel constructed in the workflow. Sources include `Channel.fromPath`, `Channel.fromFilePairs`, `samplesheetToList`-driven `Channel.fromList`, `splitCsv`, `file`/`files`, and plain `params.*` references.

field	type	req	description
construct	string	✓	Classifies the channel's primary materialization factory or shape-determining operator. Selection precedence: (1) `samplesheetToList` when the chain contains `samplesheetToList(...)` (typically wrapped in `Channel.fromList`); (2) `splitCsv` when the chain ends in `.splitCsv(header: true)` over a path; (3) otherwise the outermost factory (`Channel.fromPath` → `fromPath`, `Channel.fromFilePairs` → `fromFilePairs`, `file(...)` → `file`, etc.); (4) `other` for unrecognized constructions. The verbatim `source` retains the full chain — `construct` is the typed lookup the converter would otherwise re-parse.
from_param	string \| null	✓	Foreign key into `params[].name` when the construction expression directly references `params.X` (e.g. `Channel.fromPath(params.reads)`, `samplesheetToList(params.input, ...)`, `file(params.fasta)`). Null for literal-glob construction, expressions that compose multiple params, and channels derived from other channels. v1 resolution is direct-only; one-hop Groovy bindings (`def reads = params.reads; Channel.fromPath(reads)`) are not chased — see jmchilton/foundry#211.
name	string	✓
required_runtime	boolean	✓	True when the construction chain ends in `.ifEmpty { error ... }` (or an equivalent imperative emptiness-throw guard). Captures runtime requiredness even when the param's nf-schema entry does not mark it required. Combine with `params[].required` and any imperative pre-construction `error` checks for the full requiredness picture.
shape	string	✓	String-encoded channel shape; same convention as ChannelIO.
source	string	✓	Verbatim channel-construction expression.
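
The `construct` precedence rules can be sketched as a small classifier. This is a minimal TypeScript illustration, assuming regex matching over the verbatim `source` string is good enough for the common idioms. `classifyConstruct` is hypothetical, the `fromList` and `files` values are assumptions (the schema only says "etc." past `file`), and bare `params.*` references fall through to `other` in this sketch.

```typescript
// Hypothetical helper, not part of @galaxy-foundry/summarize-nextflow.
// Regex matching over the verbatim `source` string stands in for real Groovy parsing.
type Construct =
  | "samplesheetToList"
  | "splitCsv"
  | "fromPath"
  | "fromFilePairs"
  | "fromList"
  | "file"
  | "files"
  | "other";

function classifyConstruct(source: string): Construct {
  const chain = source.trim();
  // (1) samplesheetToList(...) anywhere in the chain takes precedence.
  if (/\bsamplesheetToList\s*\(/.test(chain)) return "samplesheetToList";
  // (2) A chain ending in .splitCsv(header: true) is classified by that operator.
  if (/\.splitCsv\s*\(\s*header\s*:\s*true\s*\)\s*$/.test(chain)) return "splitCsv";
  // (3) Otherwise use the outermost factory the chain starts with.
  const factory = /^[Cc]hannel\.(fromPath|fromFilePairs|fromList)\s*\(/.exec(chain);
  if (factory) return factory[1] as Construct;
  const fileCall = /^(files?)\s*\(/.exec(chain);
  if (fileCall) return fileCall[1] as Construct;
  // (4) Anything unrecognized, including bare params.* references in this sketch.
  return "other";
}
```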

ChannelIO

One declared input or output channel of a process. Channel shape is a string, not a structured type — `tuple(meta, [path,path])` is enough for downstream Molds and avoids a research project on NF channel typing.

field	type	req	description
name	string	✓
shape	string	✓	String-encoded channel shape, e.g. `tuple(meta, [path,path])`, `path`, `val(integer)`.
description	string
topic	string \| null		Nextflow channel-topic name when this output is bound to a topic (e.g. `versions` for the standard nf-core version-aggregation topic). Topics are a Nextflow 24+ feature: the module-level `topic: <name>` annotation on a process output emits to a global named channel that any consumer can read with `channel.topic('<name>')`. Null for non-topic outputs and for all inputs.

Conditional

One workflow-level conditional that gates a subgraph.

field	type	req	description
affects	string[]	✓	Process or subworkflow names whose execution is gated by this conditional.
branch	string	✓	Which side of the conditional this entry describes. `default` is the truthy branch; `alternate` is the else branch.
guard	string	✓	Verbatim guard expression, e.g. `params.skip_alignment`.

Edge

One edge in the workflow's call graph. The deterministic parser records the literal operator chain in `via`; reconciliation of the source and target shapes is the LLM step.

field	type	req	description
from	string	✓	Channel name or `<PROCESS>.out.<chan>` reference.
to	string	✓	Process or subworkflow name receiving this channel.
notes	string		LLM-emitted note when the reconciliation is low-confidence (e.g. deeply nested closures).
via	string[]		Operators on the path between source and target, in order (e.g. `["map", "join", "groupTuple"]`). Empty for a direct edge.
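
A deterministic parser can recover `via` by walking the edge expression and keeping only method calls made at nesting depth zero. The sketch below assumes that rule: property hops such as `.out.reads` are skipped because they are not followed by `(` or `{`, and anything inside closures or argument lists is ignored. `extractVia` is a hypothetical name, and it does not handle brackets inside string literals.

```typescript
// Hypothetical helper: list operators applied along a channel expression, in order.
function extractVia(expr: string): string[] {
  const ops: string[] = [];
  let depth = 0;
  for (let i = 0; i < expr.length; i++) {
    const c = expr[i];
    if (c === "(" || c === "{" || c === "[") depth++;
    else if (c === ")" || c === "}" || c === "]") depth--;
    else if (c === "." && depth === 0) {
      // Identifier after the dot, plus an optional opening `(` or `{`.
      const m = /^\.([A-Za-z_]\w*)\s*([({])?/.exec(expr.slice(i));
      // Only invoked identifiers count as operators; `.out` / `.reads` do not.
      if (m && m[2]) ops.push(m[1]);
    }
  }
  return ops;
}
```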

Evidence

Shared provenance block for source-detected facts. `source_path` localizes the finding; `confidence` is a coarse self-assessment; `evidence[]` carries free-form snippets (matched lines, normalized expressions, detector notes). Reused by ReferenceAsset and ReferenceRebuildRule.

field	type	req	description
confidence	string	✓	Detector self-assessment. `high`: all signals match the expected idiom; `medium`: some signals match but the guard or builder is partial / mixes non-param locals; `low`: heuristic match, downstream should treat as a hint.
evidence	string[]	✓	Free-form evidence snippets (matched source lines, normalized expressions, detector notes). Empty when none.
source_path	string \| null	✓	Repo-relative path of the source file where the finding was detected (e.g. `subworkflows/local/prepare_genome/main.nf`, `nextflow.config`). Null when the finding spans files or cannot be localized.

ExpectedOutputRef

One expected output. Field names mirror gxy-sketches' ExpectedOutputRef verbatim. At least one of path / url / assertions is required.

field	type	req	description
role	string	✓
assertions	string[]		Assertion strings. Simple equality / regex / contains-string checks are preserved verbatim; complex Groovy assertions are summarized to prose with a `warnings[]` flag in the parent Summary.
description	string \| null
kind	string \| null		Coarse output kind: `report`, `tabular`, `image`, `archive`, `sequence`, `other`.
path	string \| null
url	string \| null
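
The "at least one of path / url / assertions" constraint is easy to check before emitting an entry. A minimal sketch follows; `hasExpectation` is hypothetical, and treating an empty `assertions` array as absent is an assumption.

```typescript
interface ExpectedOutputRef {
  role: string;
  assertions?: string[];
  description?: string | null;
  kind?: string | null;
  path?: string | null;
  url?: string | null;
}

// Hypothetical check: true when at least one of path / url / assertions is present.
// An empty assertions array counts as absent in this sketch.
function hasExpectation(ref: ExpectedOutputRef): boolean {
  return Boolean(ref.path) || Boolean(ref.url) || (ref.assertions?.length ?? 0) > 0;
}
```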

InvocationBinding

Binding of one positional argument to a callee `take:` name.

field	type	req	description
argument	string	✓	Verbatim caller-side argument expression.
take	string	✓	Callee `take:` name (FK into the callee subworkflow's `inputs[].name`).

ModuleMeta

Normalized subset of nf-core module `meta.yml`, captured next to a process so module-scoped downstream Molds can consume `processes[i]` without reading the full pipeline summary.

field	type	req
authors	string[]	✓
input	→ ModuleMetaEntry[]	✓
keywords	string[]	✓
maintainers	string[]	✓
output	→ ModuleMetaEntry[]	✓
tools	→ ModuleMetaEntry[]	✓
description	string

ModuleMetaEntry

One named entry from a module `meta.yml` section, normalized from nf-core's single-key YAML maps into `{ name, ...fields }` objects.

field	type	req
name	string	✓
description	string
documentation	string
doi	string
homepage	string
identifier	string
licence	string[]
pattern	string
tool_dev_url	string
type	string
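
The single-key-map normalization can be sketched as follows, assuming the `meta.yml` section has already been parsed from YAML into an array of objects such as `[{ fastp: { homepage: "..." } }]`. `normalizeEntries` is a hypothetical helper; any nested list shapes are out of scope here.

```typescript
type ModuleMetaEntry = { name: string } & Record<string, unknown>;

// Hypothetical helper: turn `{ fastp: { description, homepage } }` list items
// into `{ name: "fastp", description, homepage }` objects.
function normalizeEntries(items: Array<Record<string, unknown>>): ModuleMetaEntry[] {
  return items.map((item) => {
    const keys = Object.keys(item);
    if (keys.length !== 1) throw new Error(`expected a single-key map, got ${keys.length} keys`);
    const name = keys[0];
    const fields = item[name];
    // A bare `- fastp:` entry parses to null; treat it as having no extra fields.
    const rest = fields && typeof fields === "object" ? (fields as Record<string, unknown>) : {};
    // Spread first so the map key always wins as `name`.
    return { ...rest, name };
  });
}
```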

NfTest

One `tests/*.nf.test` file. nf-core templates use a near-uniform shape: a `nextflow_pipeline { test("<desc>") { when { params { ... } } then { assertAll(...) } } }` block with one `test()` per profile, asserting `workflow.success` plus a snapshot match. This shape captures that data so downstream test-conversion Molds (e.g. nextflow-test-to-galaxy-test-plan, nextflow-test-to-cwl-test-plan) don't have to re-parse Groovy.

field	type	req	description
assert_workflow_success	boolean	✓	Whether the test asserts `workflow.success`. Almost always true for nf-core templates.
name	string	✓	Description string passed to `test("...")`. Often duplicates the profile name.
path	string	✓	Repo-relative path of the .nf.test file (e.g. `tests/dfast.nf.test`).
profiles	string[]	✓	Profile name(s) the test runs under, from the file-level `profile "<name>"` declaration and/or per-test config overrides. Usually a single profile.
prose_assertions	string[]	✓	Other assertions (regex matches, count checks, content equality on specific files) summarized to prose strings. Empty for snapshot-only tests, which is the common nf-core case.
params_overrides	object		The `when { params { ... } }` block as a key→value map. Most templates set only `outdir`; pipeline-specific overrides land here.
snapshot	→ SnapshotFixture \| null		Structured representation of the `assert snapshot(...).match()` clause when present; null when the test uses no snapshot.

Param

One pipeline parameter. Sourced from `nextflow.config`'s `params { ... }` block, augmented from `nextflow_schema.json` when present (nf-core).

field	type	req	description
name	string	✓
required	boolean	✓
type	string	✓	JSON Schema-style type when nextflow_schema.json supplies it (string, integer, number, boolean, path); free-form when inferred from a default value.
default	any		Default value verbatim from the source. May be null.
description	string
enum	any[]		Allowed values when nextflow_schema.json declares them.
fa_icon	string \| null		Font Awesome icon class from the parent `$defs` section's `fa_icon` (e.g. `fas fa-terminal`). Optional UI hint; null when undeclared.
format	string \| null		JSON Schema `format` keyword from nextflow_schema.json. nf-schema vocabulary: `file-path`, `directory-path`, `path`, `file-path-pattern`. Distinguishes path-typed params (datasets/collections) from plain strings without re-reading the source schema. Null when undeclared.
hidden	boolean \| null		nf-schema `hidden` keyword. True for CLI-plumbing params (e.g. `validate_params`, `pipelines_testdata_base_path`, `version`) that shouldn't surface in user-facing target interfaces. Null when undeclared.
mimetype	string \| null		nf-schema `mimetype` keyword (e.g. `text/csv`, `text/plain`, `application/gzip`). Useful for seeding Galaxy `format` on path-typed params. Null when undeclared.
schema_group	string \| null		Human-readable group label from the parent nextflow_schema.json `$defs` section's `title` (e.g. `Input/output options`, `Reference genome options`). Preserves nf-schema sectioning so target interfaces can group inputs without re-parsing the source. Null when the param is not under a titled section.
source_expression	string \| null		Verbatim right-hand-side expression when the param was assigned in Groovy (e.g. `getGenomeAttribute('fasta_fai')`). Null when the param comes from a JSON Schema entry only.
source_kind	string \| null		How this param entry was derived. `nextflow_schema`: declared in `nextflow_schema.json`. `params_block`: declared (or defaulted) in `nextflow.config`'s `params { ... }` block. `getGenomeAttribute`: synthesized from `params.X = getGenomeAttribute('X')` assignments in `nextflow.config` (nf-core key-expanded bundle pattern). `computed`: derived from some other Groovy expression (anything other than `getGenomeAttribute`). Null when provenance was not classified.
source_path	string \| null		Repo-relative path of the source file where the assignment lives (e.g. `nextflow.config`, `conf/igenomes.config`). Null when undefined or when the param comes only from `nextflow_schema.json`.
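
Because `schema_group` and `hidden` are preserved, a target interface can be laid out without re-reading `nextflow_schema.json`. A minimal sketch, assuming hidden params are simply dropped and ungrouped params land under an arbitrary `Ungrouped` label; `groupVisibleParams` is hypothetical.

```typescript
interface Param {
  name: string;
  required: boolean;
  type: string;
  hidden?: boolean | null;
  schema_group?: string | null;
}

// Hypothetical helper: group user-facing params by nf-schema section title,
// preserving declaration order within each group.
function groupVisibleParams(params: Param[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const p of params) {
    if (p.hidden === true) continue; // CLI plumbing such as validate_params
    const label = p.schema_group ?? "Ungrouped"; // placeholder label, an assumption
    const names = groups.get(label) ?? [];
    names.push(p.name);
    groups.set(label, names);
  }
  return groups;
}
```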

Process

One Nextflow `process { ... }` block.

field	type	req	description
inputs	→ ChannelIO[]	✓
meta	→ ModuleMeta \| null	✓	Normalized `meta.yml` from the module directory when present. Null for local/ad-hoc processes without module metadata.
module_path	string	✓	Relative path from the pipeline root to the file declaring the process.
module_tests	→ NfTest[]	✓	Module-scoped nf-test files under the module's `tests/` directory. Empty for local/ad-hoc modules without unit tests.
name	string	✓	Canonical process name (uppercase NF convention) as declared by the `process <NAME>` line in `module_path`.
outputs	→ ChannelIO[]	✓
aliases	string[]		Alias names this process is imported as in workflow scopes via `include { <NAME> as <ALIAS> }`. Real nf-core pipelines re-import a single module multiple times under different aliases to invoke it with distinct runtime args (e.g. `MINIMAP2_ALIGN as MINIMAP2_CONSENSUS`, `MINIMAP2_ALIGN as MINIMAP2_POLISH`). Each alias appears in `workflow.edges[].from`/`.to` exactly as written; the canonical `name` does not appear in edges unless an unaliased import exists. Default `[]` when the process is only imported under its canonical name.
conda	string \| null		Verbatim text of the process's `conda` directive (or null). Two common shapes: a literal `bioconda::<name>=<version>` (legacy) or a file reference `${moduleDir}/environment.yml` (modern nf-core convention). Either way the resolved bioconda spec lands in `tools[].bioconda`.
container	string \| null		Verbatim text of the process's `container` directive (or null). Modern nf-core modules use a ternary expression `"${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? '<singularity-uri>' : '<docker-uri>' }"` — keep that text intact here. The two resolved branches are split out into `tools[].docker` / `tools[].singularity` (and `tools[].wave` / `tools[].biocontainer` as appropriate).
in_subworkflows	string[]		Denormalized FK: names of `subworkflows[]` whose `calls[]` invoke this process (canonical name or any alias). Empty `[]` when the process is only invoked from the primary `workflow` body. Useful for grouping the flat process-tier DAG into a subworkflow-tier view client-side. Multi-entry when the same module is invoked from several subworkflows (typically via aliases).
publish_dir	string \| null		`publishDir` value when the process publishes outputs (or null).
script_summary	string		One-line LLM-derived summary of what the `script:` body does. Free text.
tool	string \| null		Foreign key into `tools[].name` when the process runs a single recognizable tool; null for multi-tool or pure-Groovy processes.
when	string \| null		Verbatim `when:` guard expression, or null if absent.
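
Since edges use alias names verbatim, consumers typically need an alias-to-canonical index before joining edges to `processes[]`. This sketch assumes edge endpoints look like `<NAME>` or `<NAME>.out.<chan>`; `buildAliasIndex` and `canonicalEndpoint` are hypothetical names.

```typescript
interface ProcessRef {
  name: string;
  aliases?: string[];
}

// Hypothetical helper: map every name usable in edges back to the canonical
// process name. The canonical name maps to itself for unaliased imports.
function buildAliasIndex(processes: ProcessRef[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const p of processes) {
    index.set(p.name, p.name);
    for (const alias of p.aliases ?? []) index.set(alias, p.name);
  }
  return index;
}

// Resolve the leading segment of `<NAME>.out.<chan>`; undefined for plain channels.
function canonicalEndpoint(endpoint: string, index: Map<string, string>): string | undefined {
  return index.get(endpoint.split(".")[0]);
}
```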

ReferenceAsset

Curated source-side view of one reference-data input the pipeline consumes. A foreign-key projection onto `params[].name` plus the metadata downstream classification Molds need. Target-agnostic: no Galaxy translation decisions are encoded here.

field	type	req	description
asset_kind	string	✓	Coarse asset classification (e.g. `fasta`, `fasta_index`, `sequence_dictionary`, `bwa_index`, `bwamem2_index`, `tabix_index`, `gtf`, `gff`, `bed`, `vcf`, `database`, `other`). Free-form string; downstream Molds choose how to map onto target datatypes.
evidence	→ Evidence	✓
param	string	✓	FK into `params[].name`.
required	boolean	✓	True when the source pipeline treats this asset as required (no compute-if-missing branch, no optional guard). When false, see `reference_rebuilds[]` for any rebuild branch covering this param.
used_by	string[]	✓	Names of processes / subworkflows that receive this param as an argument (via direct invocation binding or channel construction). Empty when no caller could be attributed.
format_hint	string \| null		Format hint, often the nf-schema `format` keyword (`file-path`, `directory-path`, `path`, `file-path-pattern`) or a domain hint like `fai`, `dict`. Null when undetermined.
schema_group	string \| null		Mirrors `params[<this>].schema_group`. Useful for grouping (`Reference genome options`).
source_expression	string \| null		Mirrors `params[<this>].source_expression`. Duplicated for the same reason as `source_kind`.
source_kind	string \| null		Mirrors `params[<this>].source_kind`. Duplicated here so reference-data consumers don't have to dereference.

ReferenceRebuildRule

One compute-if-missing rebuild branch detected in a workflow or subworkflow body. Binds an asset param to the guard expression and builder process that produces it when absent. Source evidence only; in-tool-rebuild decisions live downstream.

field	type	req	description
asset_param	string	✓	FK into `params[].name` — the asset this branch reconstitutes when missing.
builder	string	✓	Builder process or subworkflow name invoked inside the branch (e.g. `SAMTOOLS_FAIDX`, `BWA_INDEX`).
builder_outputs	string[]	✓	Output channel names from the builder that are assigned to the asset (e.g. `fai`, `dict`).
evidence	→ Evidence	✓
guard	string	✓	Verbatim guard expression, e.g. `!fasta_fai_in && step != "annotate"`.
fallback_for	string \| null		Name of the `<asset>_in`-style param that, when present, supplies the asset directly and skips the rebuild. Null when no such pairing was detected.
guard_params	string[]		Param names referenced by the guard. Empty when the guard binds only non-param locals.
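
A consumer can pair each optional asset with the rebuild branches that cover it by joining `reference_assets[].param` to `reference_rebuilds[].asset_param`. A minimal sketch; `rebuildPlan` is hypothetical and keeps only the builder names.

```typescript
interface ReferenceAsset {
  param: string;
  required: boolean;
  asset_kind: string;
}

interface ReferenceRebuildRule {
  asset_param: string;
  builder: string;
  guard: string;
  fallback_for?: string | null;
}

// Hypothetical helper: for each optional asset, list the builders that can
// recompute it. Required assets have no compute-if-missing branch, so they are skipped.
function rebuildPlan(assets: ReferenceAsset[], rules: ReferenceRebuildRule[]): Record<string, string[]> {
  const plan: Record<string, string[]> = {};
  for (const asset of assets) {
    if (asset.required) continue;
    plan[asset.param] = rules.filter((r) => r.asset_param === asset.param).map((r) => r.builder);
  }
  return plan;
}
```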

SampleSheet

One sample-sheet-shaped pipeline input. Captures the row schema, path-vs-meta column classification, and discovery provenance so downstream Molds can map onto Galaxy `sample_sheet[:variant]` collections, CWL records-of-arrays, or other target shapes without re-reading the source.

field	type	req	description
columns	→ SampleSheetColumn[]	✓	Row schema in declaration order. nf-schema's `samplesheetToList` emits columns in property order, not source-column order — preserve property order here so downstream translations match runtime channel item layout.
discovered_via	string	✓	How the sample-sheet shape was detected. `nf-schema`: nextflow_schema.json entry has `schema:` keyword. `samplesheetToList`: workflow imports nf-schema's helper and calls it on the param. `splitCsv`: workflow uses `splitCsv(header: true)` on the param's path. `ad-hoc`: pipeline-specific CSV/TSV parsing inferred from script bodies.
param	string	✓	Foreign key into `params[].name` — the parameter whose value points at the sample-sheet file.
format	string \| null		Tabular format declared by the param's `mimetype` or inferred from `samplesheetToList` / `splitCsv` semantics. Null when undeclared.
header	boolean \| null		Whether the file is expected to have a header row. Null when the discovery source doesn't pin it.
schema_path	string \| null		Repo-relative path of the JSON Schema file describing rows (e.g. `assets/schema_input.json`), resolved from the param's nf-schema entry's `schema:` keyword. Null when discovered via `samplesheetToList` without a schema reference, or via `splitCsv(header: true)` ad-hoc parsing.

SampleSheetColumn

One column in a sample-sheet row schema. Type vocabulary is JSON Schema-style scalar; `kind` separates dataset-reference columns from per-row metadata. Validation hints (`pattern`, `enum`, `exists`, `mimetype`) are preserved verbatim — target Molds decide which of them survive translation (e.g. Galaxy's `sample_sheet` validator allowlist is regex/in_range/length).

field	type	req	description
kind	string	✓	`data`: path-typed column whose values are dataset references (becomes an element/inner-collection slot in Galaxy `sample_sheet`). `meta`: scalar metadata column (becomes a `column_definitions` entry, populated per row in `columns[]`). Follows the nf-schema `meta:` annotation when present; otherwise inferred from `format` (`file-path`/`directory-path`/`path` → `data`, anything else → `meta`).
name	string	✓
required	boolean	✓
type	string	✓	JSON Schema-style scalar type. Path columns are typed `string` with a `format: file-path\|directory-path\|path` qualifier — see `format` and `kind`.
default	any		Default value verbatim from the sample-sheet schema. May be null.
description	string \| null
enum	any[]		Allowed values when the column schema declares an `enum`. Empty array when absent.
exists	boolean \| null		nf-schema `exists` flag for path columns. True when the parser must verify the file exists.
format	string \| null		JSON Schema `format` keyword. nf-schema vocabulary: `file-path`, `directory-path`, `path`, `file-path-pattern`. Null for scalar metadata columns without a format hint.
mimetype	string \| null		nf-schema `mimetype` keyword (e.g. `text/csv`, `application/gzip`). Useful for seeding target datatypes.
pattern	string \| null		JSON Schema regex `pattern` when present.
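
The `kind` fallback rule is small enough to state as code. This sketch assumes a column carrying an nf-schema `meta:` annotation is classified `meta`; `inferColumnKind` is a hypothetical name.

```typescript
type ColumnKind = "data" | "meta";

// Formats that mark a column as a dataset reference, per the `kind` description.
const PATH_FORMATS = new Set(["file-path", "directory-path", "path"]);

// Hypothetical helper: the `meta:` annotation wins; otherwise decide from `format`.
function inferColumnKind(format: string | null, hasMetaAnnotation: boolean): ColumnKind {
  if (hasMetaAnnotation) return "meta";
  return format !== null && PATH_FORMATS.has(format) ? "data" : "meta";
}
```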

SnapshotChannel

One output channel or flat snapshot list inside a `.snap` entry.

field	type	req	description
files	→ SnapshotFile[]	✓	File md5 assertions parsed from `<path>:md5,<hex>` strings.
key	string \| null	✓	Channel key from the snapshot object, e.g. `0`, `1`, or `genome_gtf`; null for flat file-list snapshots.
values	any[]	✓	Non-file snapshot values preserved verbatim for consumers that need versions, counts, or scalar tuple emissions.

SnapshotContent

One top-level snapshot entry from an nf-test `.snap` JSON sidecar.

field	type	req	description
channels	→ SnapshotChannel[]	✓	Channel-keyed snapshot values. Pipeline-template flat file lists use `key: null`.
name	string	✓	Top-level key in the `.snap` file, usually the `test("...")` name.

SnapshotFile

One file digest assertion from a `.snap` sidecar.

field	type	req	description
basename	string	✓	Final path segment, suitable for Galaxy test `file:` assertions when the snapshot stores only basenames.
md5	string	✓	MD5 digest from the sidecar.
path	string	✓	Path portion before `:md5,` exactly as stored in the sidecar.
stub	boolean	✓	True when the digest is the empty-file md5 emitted by many `-stub` tests.
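
Parsing a `<path>:md5,<hex>` string into a SnapshotFile can be sketched as below. It assumes lowercase 32-character hex digests and uses `d41d8cd98f00b204e9800998ecf8427e`, the MD5 of empty input, as the stub marker; `parseSnapshotFile` is hypothetical.

```typescript
interface SnapshotFile {
  basename: string;
  md5: string;
  path: string;
  stub: boolean;
}

// MD5 of zero bytes; stub runs typically produce empty output files.
const EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e";

// Hypothetical helper: returns null for non-file values, which belong in
// SnapshotChannel.values instead.
function parseSnapshotFile(entry: string): SnapshotFile | null {
  const m = /^(.*):md5,([0-9a-f]{32})$/.exec(entry);
  if (!m) return null;
  const [, path, md5] = m;
  const basename = path.split("/").pop() ?? path;
  return { basename, md5, path, stub: md5 === EMPTY_MD5 };
}
```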

SnapshotFixture

Structured shape of an nf-test `snapshot(...).match()` assertion. nf-core templates pass a small set of values into snapshot() — succeeded-task count, version YAML (with Nextflow version stripped), stable file-name list, stable file-content list — and prune the comparison via `getAllFilesFromDir(..., ignoreFile: ..., ignore: [...])` helpers. This breakdown lets downstream consumers reconstruct equivalent assertions in target test frameworks without re-parsing Groovy.

field	type	req	description
captures	string[]	✓	Logical names of values passed to snapshot(), in order. Common set: `succeeded_task_count`, `versions_yml`, `stable_names`, `stable_paths`. Free-form strings rather than an enum because pipelines can pass other values.
helpers	string[]	✓	nf-test helper functions invoked in the snapshot expression (e.g. `getAllFilesFromDir`, `removeNextflowVersion`). Used to detect whether a target framework can replicate the comparison.
ignore_files	string[]	✓	Repo-relative paths passed as `ignoreFile:` to helpers (e.g. `tests/.nftignore`, `tests/.nftignore_files_entirely`). The files themselves are line-delimited globs.
ignore_globs	string[]	✓	Inline `ignore: [...]` glob list passed to helpers (e.g. `['Prokka/', '/multiqc_busco.yaml']`). Distinct from `ignore_files`, which references on-disk lists.
parsed_content	→ SnapshotContent[]		Parsed entries from the `.nf.test.snap` JSON sidecar. Empty when the sidecar is absent, too large for compact summary output, or not valid JSON. Each entry preserves the snapshot name plus channel-keyed file md5 assertions and non-file scalar values.
snap_path	string \| null		Repo-relative path of the corresponding `.nf.test.snap` file when present.

SourceRecord

Provenance block. Mirrors gxy-sketches SketchSource field names so a Foundry summary is structurally consumable by anyone using that shape.

field	type	req	description
ecosystem	string	✓	Pipeline flavor. `nf-core` when `manifest.name` is `nf-core/<slug>` and the canonical layout is present; `nextflow` for ad-hoc DSL2 pipelines. gxy-sketches' enum additionally includes `iwc`, `snakemake-workflows`, and `wdl` for its other ingestors.
slug	string	✓	Kebab-case identifier; `<owner>-<repo>` for nf-core, repo basename otherwise.
url	string	✓	Git remote URL of the pipeline repository.
version	string	✓	Tag, branch, or commit SHA the summary was derived from.
workflow	string	✓	Pipeline name (typically the part after `nf-core/`).
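
The `ecosystem` and `slug` rules can be sketched as follows. `hasCanonicalLayout` stands in for the unspecified "canonical layout is present" check, and the sketch assumes the nf-core repo name equals the part of `manifest.name` after `nf-core/`; `deriveSourceIdentity` is hypothetical.

```typescript
interface SourceIdentity {
  ecosystem: "nf-core" | "nextflow";
  slug: string;
}

// Hypothetical helper: derive `ecosystem` and `slug` for a SourceRecord.
// `hasCanonicalLayout` is supplied by the caller; its criteria are not defined here.
function deriveSourceIdentity(manifestName: string, repoUrl: string, hasCanonicalLayout: boolean): SourceIdentity {
  // Repo basename: strip trailing slashes and a `.git` suffix, keep the last segment.
  const trimmed = repoUrl.replace(/\/+$/, "").replace(/\.git$/, "");
  const repoBase = trimmed.split("/").pop() ?? trimmed;
  const nfCore = /^nf-core\/([A-Za-z0-9_.-]+)$/.exec(manifestName);
  if (nfCore && hasCanonicalLayout) {
    // `<owner>-<repo>`, assuming the repo name matches the manifest slug.
    return { ecosystem: "nf-core", slug: `nf-core-${nfCore[1]}` };
  }
  return { ecosystem: "nextflow", slug: repoBase };
}
```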
license	string \| null		SPDX license identifier when detectable from a LICENSE file.

Subworkflow

One named `workflow <NAME> { ... }` block other than the primary workflow. Includes nf-core utility wrappers (PIPELINE_INITIALISATION, PIPELINE_COMPLETION, UTILS_NFCORE_PIPELINE) which exist to compose helper-function calls rather than to invoke processes.

field	type	req	description
calls	string[]	✓	Names of processes and nested subworkflows this subworkflow invokes. Free-function calls (e.g. `paramsHelp`, `completionEmail`) are NOT recorded here; they're implied by `kind = utility`.
kind	string	✓	`pipeline` when the subworkflow invokes pipeline processes (data-flow contributor). `utility` when it composes helper functions only — common nf-core template subworkflows like PIPELINE_INITIALISATION (which calls `paramsHelp`, `samplesheetToList`, `UTILS_NFSCHEMA_PLUGIN`) and PIPELINE_COMPLETION (`completionEmail`, `completionSummary`). Utility subworkflows have empty `calls[]` for processes but may still expose channel outputs the primary workflow consumes.
name	string	✓
path	string	✓	Relative path from the pipeline root.
tests	→ NfTest[]	✓	Subworkflow-scoped nf-test files under the subworkflow's `tests/` directory. Empty when absent.
inputs	→ ChannelIO[]		Parsed entries of the subworkflow's `take:` block. `name` is the take name, `description` carries any adjacent `// ...` comment, `shape` carries the verbatim source line.
invocations	→ SubworkflowInvocation[]		Call-sites where this subworkflow was invoked. Each entry records the calling workflow/subworkflow, the verbatim positional arguments, and a per-argument binding onto this subworkflow's `inputs[].name` (take names). Drives reference-asset attribution (`reference_assets[].used_by`) without re-parsing call sites. Empty when uncalled or when the call could not be positionally aligned.
outputs	→ ChannelIO[]

SubworkflowInvocation

One call-site of a subworkflow, with positional arguments bound to its `take:` names.

field	type	req	description
arguments	string[]	✓	Verbatim positional argument expressions in source order (e.g. `params.fasta`, `PREPARE_GENOME.out.fai`).
bindings	→ InvocationBinding[]	✓	Per-argument binding onto the callee's `inputs[].name` (take names), in source order. Length matches `arguments[]` when the call positionally aligns to the callee's take block; shorter when only a prefix aligns.
caller	string	✓	Name of the calling workflow or subworkflow.
caller_path	string \| null		Repo-relative path of the file the caller is defined in. Null when not resolvable.
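
Binding positional arguments to `take:` names reduces to zipping two lists. This sketch assumes "only a prefix aligns" means truncating to the shorter list; the real detector may apply stricter alignment checks, and `bindArguments` is hypothetical.

```typescript
interface InvocationBinding {
  argument: string;
  take: string;
}

// Hypothetical helper: pair arguments with take names in source order,
// binding only the prefix both lists cover.
function bindArguments(args: string[], takes: string[]): InvocationBinding[] {
  const n = Math.min(args.length, takes.length);
  return args.slice(0, n).map((argument, i) => ({ argument, take: takes[i] }));
}
```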

Summary

Top-level shape. Every Nextflow summary is exactly this object.

field	type	req	description
nf_tests	→ NfTest[]	✓	All `tests/*.nf.test` files in the pipeline. Each entry captures one test-program — its profile, params overrides, and structured assertions. Empty array when the pipeline has no nf-test fixtures.
params	→ Param[]	✓
processes	→ Process[]	✓
profiles	string[]	✓
reference_assets	→ ReferenceAsset[]	✓	Curated, target-agnostic view of reference-data inputs the source pipeline consumes (FASTAs, indexes, dictionaries, annotation tables, …). Each entry is a foreign-key projection onto `params[]` plus source-side metadata (asset kind, format hint, source expression, schema group, callers). Empty array when no reference assets are detected. Downstream Molds (e.g. nextflow-summary-to-galaxy-reference-data) classify these against Galaxy translation rungs; classification decisions are not encoded here.
reference_rebuilds	→ ReferenceRebuildRule[]	✓	Compute-if-missing rebuild branches detected in workflow/subworkflow bodies. Each entry binds an asset param to the guard and builder process that recomputes it when absent. Empty when no rebuild idioms are found. Source evidence only — Galaxy in-tool-rebuild decisions live downstream.
sample_sheets	→ SampleSheet[]	✓	Structured sample-sheet inputs. Each entry binds one `params[]` parameter to a row schema (column names, types, path-vs-meta classification, required flags, enums, patterns). Promoted from prose inside `params[].description` so downstream target translations (Galaxy `sample_sheet*` collections, CWL records-of-arrays) can choose collection variants without re-parsing the source pipeline. Empty array when no sample-sheet idiom is detected. Discovery sources: nf-schema `schema:` references, `samplesheetToList()` calls, and `splitCsv(header: true)` materializations.
source	→ SourceRecord	✓
subworkflows	→ Subworkflow[]	✓
test_fixtures	→ TestFixtures	✓	Test-input data shape for the selected profile (the cast's `profile` argument, default `test`). For pipelines with multiple test profiles, see `nf_tests[]` for the full enumeration.
tools	→ Tool[]	✓
workflow	→ Workflow	✓
warnings	string[]		Cast-skill-emitted advisory messages (e.g. DSL1 detected, meta.yml/script disagreement, lossy assertion summarization). Empty array on a clean run.

TestDataRef

One test-fixture input. Field names mirror gxy-sketches' TestDataRef verbatim. The sketch-bundle constraint that `path` must live under `test_data/` is intentionally dropped; the Foundry summary describes fixtures as data, it does not bundle them.

field	type	req	description
role	string	✓	Logical role of the input (e.g. `samplesheet`, `reference_fasta`).
description	string \| null
filetype	string \| null		File format, e.g. `fastq.gz`, `csv`.
path	string \| null		Repo-relative path when the fixture lives in-tree, or local filesystem path when the CLI fetched the remote fixture into a caller-provided test-data directory.
sha1	string \| null		SHA-1 integrity hash when published alongside the fixture.
url	string \| null

TestFixtures

Test fixtures derived from a `conf/<profile>.config` for the cast's selected profile. Pipelines with multiple test profiles (bacass has 11) record only the selected profile here; the full nf-test enumeration lives in `nf_tests[]`.

field	type	req	description
inputs	→ TestDataRef[]	✓
outputs	→ ExpectedOutputRef[]	✓
profile	string	✓	Which `conf/<profile>.config` produced these fixtures (typically `test`).

Tool

One tool/dependency the pipeline runs. Mirrors gxy-sketches ToolSpec (name + version) and augments it with the resolved container/conda strings the bridge to author-galaxy-tool-wrapper needs.

field	type	req	description
name	string	✓	Canonical tool name (e.g. `fastp`, `samtools`). Used as the foreign key from `processes[].tool`.
version	string	✓
bioconda	string \| null		`bioconda::<name>=<version>` spec when a `conda` directive provides one — directly or by reference to a `${moduleDir}/environment.yml` whose `dependencies:` list contains a single bioconda entry. The latter is the modern nf-core convention.
biocontainer	string \| null		BioContainers image reference when one was found. Accepts both the `quay.io/biocontainers/<name>:<version>--<build>` form and the docker.io alias `biocontainers/<name>:<version>--<build>` (modern nf-core modules typically use the docker.io form in the docker branch and the depot.galaxyproject.org form in the singularity branch).
docker	string \| null		Non-biocontainer Docker registry image string when present.
mulled_components	→ ToolSpec[]		Constituent Bioconda packages for an opaque `mulled-v2-*` multi-package container when resolved from a cached BioContainers multi-package-containers TSV. Omitted when no matching cached row is available.
singularity	string \| null		Singularity image reference when present (e.g. `https://depot.galaxyproject.org/singularity/...`).
wave	string \| null		Seqera Wave / community-cr registry reference (e.g. `community.wave.seqera.io/library/<name>:<version>--<digest>` or `https://community-cr-prod.seqera.io/.../sha256/<digest>/data`). Wave-built containers are increasingly common in nf-core; kept distinct from `docker` because the resolution rules and provenance differ.
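
Resolving the singularity and docker-branch images from the `container` ternary can be sketched with one regex, assuming both URIs are single-quoted as in the template shown under Process `container`. `splitContainerTernary` is hypothetical; classifying the docker-branch value as `biocontainer`, `docker`, or `wave` is a separate step.

```typescript
// Hypothetical helper: split `cond ? '<singularity-uri>' : '<docker-uri>'`
// into its two branches; null when no single-quoted ternary is found.
function splitContainerTernary(directive: string): { singularity: string; docker: string } | null {
  const m = /\?\s*'([^']+)'\s*:\s*'([^']+)'/.exec(directive);
  return m ? { singularity: m[1], docker: m[2] } : null;
}
```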

ToolSpec

One constituent package in a decomposed multi-package container.

field	type	req	description
bioconda	string	✓	Exact Bioconda requirement spec, e.g. `bioconda::samtools=1.20`.
name	string	✓
version	string	✓
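A ToolSpec carries each package twice: as the exact `bioconda` requirement text and as split `name` / `version`. A minimal sketch of deriving the split fields from the spec string, assuming the `bioconda::<name>=<version>` form with an optional `=<build>` suffix; `parseBiocondaSpec` is a hypothetical helper.

```typescript
interface ToolSpec {
  bioconda: string; // exact requirement text, e.g. "bioconda::samtools=1.20"
  name: string;
  version: string;
}

// Parse "bioconda::<name>=<version>[=<build>]". The full text is kept in
// `bioconda`; a build suffix, if present, is not split out. Specs from other
// channels return null.
function parseBiocondaSpec(spec: string): ToolSpec | null {
  const trimmed = spec.trim();
  const m = /^bioconda::([A-Za-z0-9._\-]+)=([^=\s]+)(?:=[^=\s]+)?$/.exec(trimmed);
  if (m === null) return null;
  return { bioconda: trimmed, name: m[1], version: m[2] };
}

console.log(parseBiocondaSpec("bioconda::samtools=1.20"));
```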

Workflow

Primary workflow: channel construction plus the call-graph DAG. Real nf-core pipelines have several `workflow` blocks: an anonymous entrypoint `workflow {}` in main.nf that wires PIPELINE_INITIALISATION → NFCORE_<NAME> → PIPELINE_COMPLETION, a named NFCORE_<NAME> wrapper, and a substantive named workflow under workflows/<name>.nf. The summary collapses these into one primary `workflow` plus `subworkflows[]`. Selection rule: pick the workflow that invokes the most pipeline processes, typically the one under `workflows/<name>.nf`. The anonymous `workflow {}` glue and the NFCORE_<NAME> wrapper land in `subworkflows[]`.

field	type	req	description
channels	→ Channel[]	✓
edges	→ Edge[]	✓
name	string	✓	Name of the selected primary workflow (uppercase NF convention). The anonymous `workflow {}` entrypoint is never the primary; if it is the only workflow block, the pipeline is too small to summarize and the cast skill should emit a warning.
conditionals	→ Conditional[]
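The selection rule in the Workflow description can be sketched as below. `WorkflowBlock`, `selectPrimaryWorkflow`, and counting distinct process names are assumptions for illustration; the tie-break (the earlier block wins) is not specified by the schema.

```typescript
// Hypothetical input: one entry per `workflow` block found in the source tree.
interface WorkflowBlock {
  name: string | null; // null for the anonymous `workflow {}` entrypoint
  processCalls: string[]; // process names invoked in the block body
}

interface WorkflowSelection {
  primary: WorkflowBlock | null;
  rest: WorkflowBlock[]; // candidates for subworkflows[]
  warning?: string;
}

function selectPrimaryWorkflow(blocks: WorkflowBlock[]): WorkflowSelection {
  // The anonymous entrypoint is never eligible as the primary.
  const named = blocks.filter((b) => b.name !== null);
  if (named.length === 0) {
    return {
      primary: null,
      rest: blocks,
      warning: "only an anonymous workflow {} block: pipeline too small to summarize",
    };
  }
  // Count distinct processes so repeated calls to one process do not inflate the score.
  const score = (b: WorkflowBlock) => new Set(b.processCalls).size;
  let primary = named[0];
  for (const b of named) {
    if (score(b) > score(primary)) primary = b; // ties keep the earlier block
  }
  return { primary, rest: blocks.filter((b) => b !== primary) };
}
```

With the anonymous entrypoint, an NFCORE_<NAME> wrapper that calls no processes directly, and a workflows/<name>.nf block that calls several, the last one becomes `primary` and the other two end up in `rest`.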

This page is auto-rendered from the JSON Schema authored in this repo and shipped on npm as part of @galaxy-foundry/summarize-nextflow (the producer co-locates its own schema). Each $def becomes its own section on this page with a stable anchor ID, so research notes and Mold bodies can deep-link individual shapes via summary-nextflow#Tool.

Source-of-truth chain:

packages/summarize-nextflow/src/schema/summary-nextflow.schema.json — the canonical JSON, hand-edited as part of the Mold/cast loop (summarize-nextflow). Mold frontmatter cites it via summary-nextflow wiki-links; cast imports the summaryNextflowSchema runtime export and serializes it into cast bundles.
packages/summarize-nextflow/scripts/sync-schema.mjs runs at prebuild, regenerating the typed summary-nextflow.schema.generated.ts const wrapper from the canonical JSON.
Published as @galaxy-foundry/summarize-nextflow on npm. Site rendering imports the schema directly from this package via site/src/lib/schema-registry.ts; the published artifact also exports validateSummary() and ships the standalone summarize-nextflow bin (self-validates by default). The unified foundry CLI in @galaxy-foundry/foundry exposes the same gate as foundry validate-summary-nextflow for downstream cast skills.
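The prebuild step in the second item turns the canonical JSON into a typed `const` wrapper. A hypothetical sketch of that transformation as a pure function; the real sync-schema.mjs may emit different headers or typing, and only the `summaryNextflowSchema` export name is taken from the chain above.

```typescript
// Hypothetical sketch of a prebuild "sync" step: canonical JSON in,
// typed TypeScript module text out. File I/O is left to the caller.
function renderGeneratedWrapper(
  canonicalJson: string,
  exportName: string = "summaryNextflowSchema",
): string {
  // Parse first so malformed JSON fails the build instead of emitting a broken module.
  const schema: unknown = JSON.parse(canonicalJson);
  // `as const` preserves literal types (e.g. enum strings) for type-level consumers.
  return [
    "// GENERATED from summary-nextflow.schema.json; do not edit by hand.",
    `export const ${exportName} = ${JSON.stringify(schema, null, 2)} as const;`,
    "",
  ].join("\n");
}

console.log(renderGeneratedWrapper('{"$defs":{"Tool":{"type":"object"}}}'));
```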

At runtime in cast skills, validation should go through the CLI command:

foundry validate-summary-nextflow summary.json

The same schema is copied verbatim into references/schemas/summary-nextflow.schema.json per the casting policy in docs/COMPILATION_PIPELINE.md. The package additionally exports validateSummary (AJV gate) for TypeScript consumers, but generated skills should prefer command-shaped validation so failures are easy to reproduce outside the agent runtime.

Contrast with tests-format, which is vendored from an external npm package (@galaxy-tool-util/schema); this schema is authored here and shipped to npm — the direction of the source-of-truth chain is reversed.

Why per-source

Paper, Nextflow, and CWL are different enough that forcing a shared cross-source summary shape would either lose detail or bloat all three (docs/HARNESS_PIPELINES.md §“Mold-inventory parity”). Each summarize-<source> Mold emits its own schema; downstream source-target Molds such as nextflow-summary-to-galaxy-interface, nextflow-summary-to-galaxy-data-flow, nextflow-summary-to-cwl-interface, and nextflow-summary-to-cwl-data-flow consume this summary without pretending every source has one shared shape.

Field-name parity with gxy-sketches

Three sub-shapes reuse gxy-sketches field names (see gxy-sketches-alignment for the rationale):

SourceRecord — mirrors SketchSource (ecosystem, workflow, url, version, license, slug).
Tool — extends ToolSpec (name, version) with the resolved container/conda strings the bridge to author-galaxy-tool-wrapper needs.
TestDataRef / ExpectedOutputRef — mirror gxy-sketches’ field names exactly. The sketch-bundle invariant that path must live under test_data/ is intentionally dropped: the Foundry summary describes fixtures as data rather than bundling them.

Cast-time role

Per docs/COMPILATION_PIPELINE.md’s per-kind dispatch, this schema is referenced by summarize-nextflow via output_artifacts[].schema and copied verbatim into the cast bundle’s references/schemas/. The cast skill validates its emitted JSON with validate-summary-nextflow before returning. Failure is deliberately loud: downstream Molds bind to this shape and would otherwise produce worse errors later.

What is intentionally not modeled

Structured channel typing. processes[].inputs[].shape is a string ("tuple(meta, [path,path])"), not a structured type. NF channel typing is a research project; a string is enough for downstream Molds to reason about and an LLM to emit.
Operator-chain semantics. Edge.via records the literal operator chain (["map", "join", "groupTuple"]). Reconciling what the chain does to channel shapes is left to the LLM step that fills Edge.notes when confidence is low.
Multi-tool processes outside decomposed mulled-v2 containers. A process can run multiple tools (e.g. a shell pipeline of two binaries). Process.tool is nullable; multi-tool processes set it to null and surface tool details in script_summary and container. A tools[] foreign-key array on Process would be cleaner; it is deferred until downstream use forces it.

Revision 2 — 2026-05-01

First cast against nf-core/demo @ 1.1.0 exposed gaps in the v1 shape (see content/log.md’s 2026-05-01 entry). Changes:

Tool.biocontainer description widened to accept the docker.io alias biocontainers/<name>:<version>--<build> alongside quay.io/biocontainers/.... Modern nf-core modules publish the docker.io form in the docker branch and the depot.galaxyproject.org/singularity/... form in the singularity branch.
Tool.wave field added for Seqera Wave / community-cr registry images (community.wave.seqera.io/..., https://community-cr-prod.seqera.io/...). Kept distinct from docker because resolution rules and provenance differ.
Process.container and Process.conda re-described as verbatim directive text (not “resolved”). Modern container directives are ternary expressions over workflow.containerEngine; modern conda directives are file references to ${moduleDir}/environment.yml. The schema now records the directive faithfully and pushes resolution into tools[].
ChannelIO.topic field added for Nextflow 24+ channel topics. nf-core templates emit a per-process topic: versions triple to a global topic for version aggregation; the v1 shape had no place to record this.
Subworkflow.kind enum added (pipeline | utility). nf-core template subworkflows like PIPELINE_INITIALISATION compose free-function calls (paramsHelp, completionEmail) without invoking processes; their calls[] is empty by design. The kind field disambiguates real data-flow contributors from utility wrappers.
Workflow.name description names the selection rule for pipelines with multiple named workflow blocks (anonymous workflow {} + NFCORE_<NAME> + a substantive named workflow): pick the one with the most process invocations, typically workflows/<name>.nf, and route the rest into subworkflows[].
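The `kind` disambiguation from the Subworkflow.kind item can be approximated by separating a subworkflow body's invoked names into process/workflow calls and free functions. A minimal sketch, assuming the caller already knows the set of callable process and workflow names (aliases included); `classifySubworkflow` and `SubworkflowSketch` are hypothetical, and free-function names are simply discarded here.

```typescript
type SubworkflowKind = "pipeline" | "utility";

// Hypothetical reduced shape; the real Subworkflow def has more fields.
interface SubworkflowSketch {
  name: string;
  kind: SubworkflowKind;
  calls: string[];
}

// `invoked` lists every name called in the body; `callables` holds known
// process and workflow names (aliases included). Anything else is treated
// as a free function such as paramsHelp and is not recorded in calls[].
function classifySubworkflow(
  name: string,
  invoked: string[],
  callables: Set<string>,
): SubworkflowSketch {
  const calls = invoked.filter((n) => callables.has(n));
  return { name, kind: calls.length === 0 ? "utility" : "pipeline", calls };
}
```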

What was not changed despite biting:

nf-test snapshot assertions (snapshot(...).match() with helpers) remain summarized to prose strings in ExpectedOutputRef.assertions[]. A structured “snapshot fixture” shape would help but is deferred until rev 3 when the testing note has paragraphs to inform the design.
Free-function calls in workflow bodies (paramsSummaryMap, softwareVersionsToYAML) remain folded into channel description text. They get no first-class representation: their effects show up as channel sources, and the function names are nf-core idiom rather than pipeline-specific signal.

Revision 3 — 2026-05-01

Second cast against nf-core/bacass @ 2.5.0 (33 processes, 9 nf-test files, 11 test profiles) exposed two structural-coverage gaps the second pipeline made universal. Changes:

Process.aliases: string[] added. Real pipelines re-import a single module multiple times under different aliases via include { X as Y } — bacass has six such patterns (CAT_FASTQ→{SHORT,LONG}, MINIMAP2_ALIGN→{CONSENSUS,POLISH} (3 aliases of one process), KRAKEN2_KRAKEN2→{KRAKEN2,KRAKEN2_LONG}, QUAST→QUAST_BYREFSEQID, plus FASTQC→{RAW,TRIM} inside FASTQ_TRIM_FASTP_FASTQC). Workflow edges[].from/.to reference alias names; canonical names didn’t appear in edges at all. The new field captures the alias→canonical mapping so downstream skills (especially author-galaxy-tool-wrapper, which needs to know “MINIMAP2_CONSENSUS shares MINIMAP2_ALIGN’s container/conda but is invoked with different runtime args”) can resolve references.
Summary.nf_tests: NfTest[] added. bacass has 9 tests/*.nf.test files, one per test profile. The previous schema’s test_fixtures is singular (one selected profile’s data shape); the rest of the test surface was invisible. The new array enumerates every .nf.test with structured fields: profile, params overrides, assert_workflow_success, prose_assertions, and a structured snapshot: SnapshotFixture | null capturing the snapshot(...).match() semantics.
SnapshotFixture shape added. nf-core templates use a near-uniform snapshot pattern: pass succeeded-task-count + version-yaml + stable-name list + stable-path list into snapshot(), pruning via ignoreFile: and ignore: globs. The new shape records captures, helpers, ignore_files, ignore_globs, and the .snap path — enough for downstream test-plan molds (e.g. nextflow-test-to-galaxy-test-plan) to reconstruct equivalent assertion intent in target frameworks without re-parsing Groovy.
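Because edges reference alias names, a consumer needs an alias-to-canonical index before it can look up a process's shared container or conda directive. A minimal sketch over hypothetical `ProcessEntry` / `Edge` shapes that carry only the fields relevant here (`name`, `aliases`, `from`, `to`); `buildAliasIndex` and `canonicalEdge` are illustrative helpers.

```typescript
// Minimal hypothetical shapes: only the fields this sketch reads.
interface ProcessEntry {
  name: string; // canonical process name from the module
  aliases: string[]; // names introduced via include { X as Y }
}

interface Edge {
  from: string;
  to: string;
}

// Map every alias (and each canonical name) to its canonical process name.
function buildAliasIndex(processes: ProcessEntry[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const p of processes) {
    index.set(p.name, p.name);
    for (const alias of p.aliases) {
      const prior = index.get(alias);
      if (prior !== undefined && prior !== p.name) {
        throw new Error(`alias ${alias} maps to both ${prior} and ${p.name}`);
      }
      index.set(alias, p.name);
    }
  }
  return index;
}

// Rewrite edge endpoints to canonical names; unknown endpoints pass through.
function canonicalEdge(edge: Edge, index: Map<string, string>): Edge {
  return {
    from: index.get(edge.from) ?? edge.from,
    to: index.get(edge.to) ?? edge.to,
  };
}
```

Endpoints that are not processes (workflow inputs, subworkflow names) are left untouched, so the index can be applied to every edge without pre-filtering.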

What was not changed despite biting:

TestFixtures stayed singular. Multiple test profiles surfaced via nf_tests[] rather than promoting test_fixtures to an array — this preserves backward compatibility and keeps the “data shape of the selected profile” abstraction.
Mulled-v2 multi-package containers, multiMap/.branch/.cross fan-out, conditional channel construction, .mix-then-reassign — all still had only one bite each (bacass), so deferred per the “grow from contact” rule.

Revision 4 — 2026-05-05

Snapshot-sidecar parsing landed for module and subworkflow tests whose interesting assertions live in sibling .nf.test.snap JSON files. Changes:

SnapshotFixture.parsed_content: SnapshotContent[] added. Each parsed sidecar entry preserves the snapshot name plus channel-keyed SnapshotChannel values.
SnapshotFile added. <path>:md5,<hex> strings become file digest assertions with path, basename, md5, and a stub flag for empty-file md5s.
Non-file values preserved. Version tuples, counts, and other scalar snapshot values remain in SnapshotChannel.values so downstream test-plan Molds do not re-read .snap files.
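The sidecar handling above can be sketched as follows. The stub flag compares against the md5 of zero bytes (`d41d8cd98f00b204e9800998ecf8427e`); `parseSnapshotFile` and `partitionChannel` are hypothetical helpers, and the SnapshotFile interface models only the fields named in this revision.

```typescript
interface SnapshotFile {
  path: string;
  basename: string;
  md5: string;
  stub: boolean; // true when the digest is the md5 of an empty file
}

// md5 of zero bytes; a digest equal to this sets stub = true.
const EMPTY_FILE_MD5 = "d41d8cd98f00b204e9800998ecf8427e";

// Parse "<path>:md5,<hex>"; return null for anything else.
function parseSnapshotFile(value: string): SnapshotFile | null {
  const m = /^(.+):md5,([0-9a-f]{32})$/.exec(value);
  if (m === null) return null;
  const path = m[1];
  const md5 = m[2];
  return {
    path,
    basename: path.slice(path.lastIndexOf("/") + 1),
    md5,
    stub: md5 === EMPTY_FILE_MD5,
  };
}

// Split one snapshot channel's entries into file digests and other values
// (version tuples, counts, ...), which are kept unchanged.
function partitionChannel(entries: unknown[]): { files: SnapshotFile[]; values: unknown[] } {
  const files: SnapshotFile[] = [];
  const values: unknown[] = [];
  for (const entry of entries) {
    const file = typeof entry === "string" ? parseSnapshotFile(entry) : null;
    if (file !== null) files.push(file);
    else values.push(entry);
  }
  return { files, values };
}
```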

Revision 7 — 2026-05-06

Top-level Param entries gained the nf-schema metadata previously only available on sample-sheet columns. Resolves jmchilton/foundry#186.

Param.format added (string|null). nf-schema format keyword: file-path, directory-path, path, file-path-pattern. Disambiguates path-typed params from plain strings without re-reading nextflow_schema.json.
Param.hidden added (boolean|null). nf-schema hidden keyword. CLI-plumbing params (validate_params, pipelines_testdata_base_path, version, …) now drop out of user-facing target interfaces structurally.
Param.mimetype added (string|null). nf-schema mimetype keyword. Seeds Galaxy format on path params when present.
Param.schema_group added (string|null). The parent $defs section’s title (e.g. Input/output options, Reference genome options). Preserves nf-schema sectioning for UI grouping.
Param.fa_icon added (string|null). The parent section’s Font Awesome icon hint.
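With `hidden`, `format`, and `schema_group` on each Param, an interface Mold can drop CLI plumbing and group the remaining params without re-reading nextflow_schema.json. A minimal sketch, assuming the four nf-schema path formats listed above are what mark a param as path-typed; `ParamEntry` models only the fields used here, and `userFacingParams` and the `(ungrouped)` label are hypothetical.

```typescript
// Only the Param fields this sketch reads.
interface ParamEntry {
  name: string;
  format: string | null;
  hidden: boolean | null;
  schema_group: string | null;
}

const PATH_FORMATS = new Set(["file-path", "directory-path", "path", "file-path-pattern"]);

interface InterfaceParam {
  name: string;
  isPath: boolean;
}

// Group non-hidden params by nf-schema section, flagging path-typed ones.
function userFacingParams(params: ParamEntry[]): Map<string, InterfaceParam[]> {
  const groups = new Map<string, InterfaceParam[]>();
  for (const p of params) {
    if (p.hidden === true) continue; // CLI plumbing drops out structurally
    const group = p.schema_group ?? "(ungrouped)"; // hypothetical placeholder label
    const bucket = groups.get(group) ?? [];
    bucket.push({ name: p.name, isPath: p.format !== null && PATH_FORMATS.has(p.format) });
    groups.set(group, bucket);
  }
  return groups;
}
```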

Downstream Molds — nextflow-summary-to-galaxy-interface, nextflow-summary-to-cwl-interface, nextflow-summary-to-galaxy-data-flow — can now consume these structurally instead of re-parsing the source schema. Mapping table in nextflow-params-to-galaxy-inputs.

Revision 6 — 2026-05-05

Sample-sheet schemas became first-class structured inputs. Resolves the open question raised in nextflow-workflow-io-semantics §“Open questions” and tracked in jmchilton/foundry#177.

Summary.sample_sheets: SampleSheet[] added (required; empty array when none). Promotes sample-sheet shape out of params[].description prose so downstream target Molds can pick collection variants without re-parsing the source pipeline.
SampleSheet shape added. Binds one params[] parameter (param) to a row schema (columns) plus discovery provenance (discovered_via: nf-schema | samplesheetToList | splitCsv | ad-hoc), optional schema_path, format, and header.
SampleSheetColumn shape added. Captures name, JSON Schema-style scalar type, kind (data for path-typed dataset references vs meta for per-row metadata), format, required, default, enum, pattern, exists, mimetype, description. Validation hints stay verbatim — target Molds decide which survive translation (e.g. Galaxy’s sample_sheet validator allowlist is regex/in_range/length only; richer nf-schema validation downgrades to prose with confidence note).
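Since SampleSheetColumn keeps validation hints verbatim, the decision about which hints survive sits in each target Mold. A hypothetical sketch of that decision for a Galaxy-bound Mold, following the mapping named in this section (patterns to regex validators, enums to `restrictions[]`) and downgrading an nf-schema `exists` check to prose. The interface nullability, the `TranslatedHints` shape, and both helper names are assumptions, not a Galaxy API.

```typescript
type ColumnKind = "data" | "meta";

// Assumed subset of SampleSheetColumn; nullability is a guess for this sketch.
interface SampleSheetColumn {
  name: string;
  type: string;
  kind: ColumnKind;
  enum: unknown[] | null;
  pattern: string | null;
  exists: boolean | null;
}

// Hypothetical target-side hint shape.
interface TranslatedHints {
  regex: string | null; // regex validator
  restrictions: string[] | null; // allowed values from an nf-schema enum
  prose: string[]; // hints with no structural target
}

function translateHints(col: SampleSheetColumn): TranslatedHints {
  const prose: string[] = [];
  if (col.exists === true) {
    prose.push(`${col.name}: nf-schema "exists" check has no sample_sheet validator; kept as a note`);
  }
  return {
    regex: col.pattern,
    restrictions: col.enum === null ? null : col.enum.map((v) => String(v)),
    prose,
  };
}

// Names of dataset-reference columns ("data") or per-row metadata ("meta").
function columnsByKind(cols: SampleSheetColumn[], kind: ColumnKind): string[] {
  return cols.filter((c) => c.kind === kind).map((c) => c.name);
}
```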

Why now: nf-core’s samplesheetToList(params.input, "assets/schema_input.json") idiom maps almost 1:1 onto Galaxy’s sample_sheet[:paired|:paired_or_unpaired|:record] collection types (column_definitions, typed columns including element_identifier cross-row refs, restrictions[] for enums, regex validators). Without structured columns the interface Mold cannot choose between sample_sheet:paired, list:paired, and a flat file in a principled way, and per-row meta fields silently degrade into parallel parameter inputs. See galaxy-sample-sheet-collections for the target-side mapping table consumed by nextflow-summary-to-galaxy-interface and nextflow-summary-to-galaxy-data-flow.

What was not changed: Param.type still records the param’s own type (string/path) — the sample-sheet relationship is expressed by sample_sheets[].param referencing params[].name, not by mutating the param entry.
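Because the link runs from `sample_sheets[].param` to `params[].name` rather than living as a flag on the param, a consumer can check it like a foreign key. A minimal sketch; `danglingSampleSheetParams` is hypothetical and the summary type is reduced to the two fields it reads.

```typescript
// Reduced summary shape: only the two arrays the check reads.
interface SummaryRefs {
  params: { name: string }[];
  sample_sheets: { param: string }[];
}

// Return every sample_sheets[].param that names no params[] entry.
function danglingSampleSheetParams(summary: SummaryRefs): string[] {
  const known = new Set(summary.params.map((p) => p.name));
  return summary.sample_sheets.map((s) => s.param).filter((name) => !known.has(name));
}
```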

Revision 5 — 2026-05-05

Mulled-v2 multi-package container decomposition now has a narrow optional shape. Changes:

Tool.mulled_components: ToolSpec[] added. When summarize-nextflow is given a cached BioContainers multi-package-containers TSV, opaque mulled-v2-* container IDs can be decomposed into constituent Bioconda package specs.
ToolSpec added. Constituent packages record name, version, and exact bioconda requirement text.
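A sketch of the decomposition, assuming the cached TSV has already been loaded into a map keyed by the image name after the last `/`, with a comma-separated `name=version` targets value. That cache layout is an assumption, not the actual BioContainers file format, and `decomposeMulled` is hypothetical. Returning `undefined` for uncached images mirrors the rule that `mulled_components` is omitted when no matching row exists.

```typescript
interface ToolSpec {
  bioconda: string;
  name: string;
  version: string;
}

// Assumed cache: "mulled-v2-<hash>:<tag>" -> "name=version,name=version".
type MulledCache = Map<string, string>;

function decomposeMulled(image: string, cache: MulledCache): ToolSpec[] | undefined {
  // Strip any registry/URL prefix so quay.io and depot.galaxyproject.org
  // references to the same image share one cache key.
  const key = image.slice(image.lastIndexOf("/") + 1);
  if (!key.startsWith("mulled-v2-")) return undefined;
  const targets = cache.get(key);
  if (targets === undefined) return undefined; // no cached row: omit the field
  return targets.split(",").map((target) => {
    const parts = target.trim().split("=");
    if (parts.length !== 2) throw new Error(`unexpected target: ${target}`);
    const [name, version] = parts;
    return { name, version, bioconda: `bioconda::${name}=${version}` };
  });
}
```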

Incoming References (21)

Summarize Nextflow (related note): Statically introspect a Nextflow / nf-core pipeline tree and emit a validated JSON summary.
Validate Summary Nextflow (related note): AJV gate for summarize-nextflow JSON documents.
author-galaxy-tool-wrapper (related note): Author a new Galaxy user-defined tool YAML definition when discovery yields nothing acceptable.
nextflow-summary-to-cwl-data-flow (related note): Translate a Nextflow summary into a CWL data-flow design brief.
nextflow-summary-to-cwl-interface (related note): Map a Nextflow summary into a CWL Workflow interface design brief.
nextflow-summary-to-galaxy-data-flow (related note): Translate a Nextflow summary into a Galaxy data-flow design brief.
nextflow-summary-to-galaxy-interface (related note): Map a Nextflow summary into a Galaxy workflow interface design brief.
nextflow-summary-to-galaxy-reference-data (related note): Decide the Galaxy-side shape of external reference data declared by a Nextflow pipeline.
nextflow-summary-to-galaxy-template (related note): gxformat2 skeleton with per-step TODOs from a Nextflow summary and prior Galaxy design briefs.
nextflow-test-to-cwl-test-plan (related note): Translate Nextflow test evidence into a CWL workflow test plan.
nextflow-test-to-galaxy-test-plan (related note): Translate Nextflow test evidence into a Galaxy workflow test plan.
summarize-nextflow (related note): Read a Nextflow pipeline source tree (nf-core or ad-hoc DSL2) and emit a structured JSON summary for downstream translation Molds.
summarize-nextflow (schema of artifact output): Read a Nextflow pipeline source tree (nf-core or ad-hoc DSL2) and emit a structured JSON summary for downstream translation Molds.
Alignment: gxy-sketches ↔ Galaxy Workflow Foundry (related note): Where the Foundry's per-source summary Molds align with gxy-sketches on field names and source/test-fixture vocabulary, and where they intentionally do not.
Nextflow conditional to Galaxy subworkflow / when (related note): Stub. Translate Nextflow conditionals into Galaxy `when:` (single-workflow v1). Subworkflow vs inline is an aesthetic call, not a rule.
Nextflow params to Galaxy workflow inputs (related note): Rules for translating Nextflow params, sample sheets, channels, and control flags into gxformat2 inputs.
Nextflow path/glob to Galaxy datatype mapping (related note): Rules for mapping Nextflow path, glob, sample-sheet, and output filename evidence to Galaxy datatype extensions.
Nextflow reference-data classification (related note): Source-side taxonomy of how Nextflow pipelines use reference data: eight classifications detectable from a summary-nextflow artifact.
Nextflow nf-test snapshots to Galaxy/Planemo assertions (related note): Translates nf-test snapshot assertions into Galaxy workflow test-format assertions, broken out by module-level vs pipeline-level test shape.
Nextflow to Galaxy reference-data mapping (related note): Galaxy-side translation of Nextflow reference-data classifications: idioms available, the v1 posture, datatype defaults, and the in-tool rebuild trade-off.
Nextflow workflow I/O semantics (related note): Defines Nextflow workflow inputs and outputs from docs plus observed fixture pipeline structures.