Nextflow pipeline summary

JSON Schema for the structured summary emitted by the summarize-nextflow Mold.

Revised 2026-05-06, rev. 8
Schema: summary-nextflow (@galaxy-foundry/summarize-nextflow @ 0.0.0)

Nextflow Pipeline Summary

Structured per-source summary emitted by the summarize-nextflow Mold. Per-source schema by design — paper, Nextflow, and CWL each have their own summary shape; downstream Molds (data flow, templates, tool wrappers) consume any source's summary and handle the polymorphism. Field names mirror gxy-sketches' SketchSource / ToolSpec / TestDataRef / ExpectedOutputRef where parity exists; see content/research/gxy-sketches-alignment.md.

29 definitions. Anchor links per definition (e.g. #Tool) are stable.

Channel

One top-level channel constructed in the workflow. Sources include `Channel.fromPath`, `Channel.fromFilePairs`, `samplesheetToList`-driven `Channel.fromList`, `splitCsv`, `file`/`files`, and plain `params.*` references.

field type req description
construct string Classifies the channel's primary materialization factory or shape-determining operator. Selection precedence: (1) `samplesheetToList` when the chain contains `samplesheetToList(...)` (typically wrapped in `Channel.fromList`); (2) `splitCsv` when the chain ends in `.splitCsv(header: true)` over a path; (3) otherwise the outermost factory (`Channel.fromPath` → `fromPath`, `Channel.fromFilePairs` → `fromFilePairs`, `file(...)` → `file`, etc.); (4) `other` for unrecognized constructions. The verbatim `source` retains the full chain — `construct` is the typed lookup the converter would otherwise re-parse.
from_param string | null Foreign key into `params[].name` when the construction expression directly references `params.X` (e.g. `Channel.fromPath(params.reads)`, `samplesheetToList(params.input, ...)`, `file(params.fasta)`). Null for literal-glob construction, expressions that compose multiple params, and channels derived from other channels. v1 resolution is direct-only; one-hop Groovy bindings (`def reads = params.reads; Channel.fromPath(reads)`) are not chased — see jmchilton/foundry#211.
name string
required_runtime boolean True when the construction chain ends in `.ifEmpty { error ... }` (or an equivalent imperative emptiness-throw guard). Captures runtime requiredness even when the param's nf-schema entry does not mark it required. Combine with `params[].required` and any imperative pre-construction `error` checks for the full requiredness picture.
shape string String-encoded channel shape; same convention as ChannelIO.
source string Verbatim channel-construction expression.
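
The `construct` precedence and the direct-only `from_param` rule can be sketched as follows. This is a hypothetical TypeScript illustration, not the summarize-nextflow implementation: `classifyConstruct` and `fromParam` are invented names, only a few of the factories covered by "etc." are handled, and regular expressions over the verbatim `source` string stand in for real Groovy parsing.

```typescript
// Hypothetical sketch: classify Channel.construct and resolve Channel.from_param
// from the verbatim `source` string. Regexes stand in for a Groovy parser.
type Construct =
  | "samplesheetToList"
  | "splitCsv"
  | "fromPath"
  | "fromFilePairs"
  | "file"
  | "other";

function classifyConstruct(source: string): Construct {
  const s = source.trim();
  // (1) samplesheetToList anywhere in the chain wins.
  if (s.includes("samplesheetToList(")) return "samplesheetToList";
  // (2) the chain ends in .splitCsv(header: true).
  if (/\.splitCsv\(\s*header\s*:\s*true\s*\)$/.test(s)) return "splitCsv";
  // (3) otherwise the outermost factory (only a few are handled here).
  if (/^[Cc]hannel\.fromPath\(/.test(s)) return "fromPath";
  if (/^[Cc]hannel\.fromFilePairs\(/.test(s)) return "fromFilePairs";
  if (/^file\(/.test(s)) return "file";
  // (4) unrecognized construction.
  return "other";
}

// Direct-only resolution: exactly one distinct `params.X` reference yields X;
// literal globs (zero params) and multi-param expressions yield null.
function fromParam(source: string): string | null {
  const names = new Set<string>();
  const re = /\bparams\.([A-Za-z_]\w*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    names.add(m[1]);
  }
  return names.size === 1 ? Array.from(names)[0] : null;
}

console.log(classifyConstruct("Channel.fromPath(params.input).splitCsv(header: true)")); // splitCsv
console.log(fromParam("Channel.fromPath(params.reads)")); // reads
```

Because plain string matching ignores closures, a `params.X` mentioned inside a `.map { }` body would still be counted; a production detector would resolve references on a parsed expression instead.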

ChannelIO

One declared input or output channel of a process. Channel shape is a string, not a structured type — `tuple(meta, [path,path])` is enough for downstream Molds and avoids a research project on NF channel typing.

field type req description
name string
shape string String-encoded channel shape, e.g. `tuple(meta, [path,path])`, `path`, `val(integer)`.
description string
topic string | null Nextflow channel-topic name when this output is bound to a topic (e.g. `versions` for the standard nf-core version-aggregation topic). Topics are a Nextflow 24+ feature: the module-level `topic: <name>` annotation on a process output emits to a global named channel, which any consumer can read with `channel.topic('<name>')`. Null for non-topic outputs and for all inputs.

Conditional

One workflow-level conditional that gates a subgraph.

field type req description
affects string[] Process or subworkflow names whose execution is gated by this conditional.
branch string Which side of the conditional this entry describes. `default` is the truthy branch; `alternate` is the else branch.
guard string Verbatim guard expression, e.g. `params.skip_alignment`.

Edge

One edge in the workflow's call graph. The deterministic parser records the literal operator chain in `via`; reconciliation of the source and target shapes is the LLM step.

field type req description
from string Channel name or `<PROCESS>.out.<chan>` reference.
to string Process or subworkflow name receiving this channel.
notes string LLM-emitted note when the reconciliation is low-confidence (e.g. deeply nested closures).
via string[] Operators on the path between source and target, in order (e.g. `["map", "join", "groupTuple"]`). Empty for a direct edge.

Evidence

Shared provenance block for source-detected facts. `source_path` localizes the finding; `confidence` is a coarse self-assessment; `evidence[]` carries free-form snippets (matched lines, normalized expressions, detector notes). Reused by ReferenceAsset and ReferenceRebuildRule.

field type req description
confidence string Detector self-assessment. `high`: all signals match the expected idiom; `medium`: some signals match but the guard or builder is partial / mixes non-param locals; `low`: heuristic match, downstream should treat as a hint.
evidence string[] Free-form evidence snippets (matched source lines, normalized expressions, detector notes). Empty when none.
source_path string | null Repo-relative path of the source file where the finding was detected (e.g. `subworkflows/local/prepare_genome/main.nf`, `nextflow.config`). Null when the finding spans files or cannot be localized.

ExpectedOutputRef

One expected output. Field names mirror gxy-sketches' ExpectedOutputRef verbatim. At least one of path / url / assertions is required.

field type req description
role string
assertions string[] Assertion strings. Simple equality / regex / contains-string checks are preserved verbatim; complex Groovy assertions are summarized to prose with a `warnings[]` flag in the parent Summary.
description string | null
kind string | null Coarse output kind: `report`, `tabular`, `image`, `archive`, `sequence`, `other`.
path string | null
url string | null
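
The "at least one of path / url / assertions" rule can be checked with a small guard. This is a hypothetical TypeScript mirror of the shape, not the package's generated type, and it assumes that `null`, an empty string, and an empty `assertions` array all count as absent; the published schema may encode the rule differently.

```typescript
// Hypothetical TypeScript mirror of ExpectedOutputRef; nullable fields are
// also marked optional with `?`.
interface ExpectedOutputRef {
  role: string;
  assertions?: string[];
  description?: string | null;
  kind?: string | null;
  path?: string | null;
  url?: string | null;
}

// Assumption: null, "" and [] all count as "absent" for the at-least-one rule.
function hasLocator(ref: ExpectedOutputRef): boolean {
  const hasAssertions = (ref.assertions ?? []).length > 0;
  return Boolean(ref.path) || Boolean(ref.url) || hasAssertions;
}

console.log(hasLocator({ role: "report", path: "multiqc/multiqc_report.html" })); // true
console.log(hasLocator({ role: "report", url: null, assertions: [] })); // false
```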

InvocationBinding

Binding of one positional argument to a callee `take:` name.

field type req description
argument string Verbatim caller-side argument expression.
take string Callee `take:` name (FK into the callee subworkflow's `inputs[].name`).

ModuleMeta

Normalized subset of nf-core module `meta.yml`, captured next to a process so module-scoped downstream Molds can consume `processes[i]` without reading the full pipeline summary.

field type req description
authors string[]
input → ModuleMetaEntry[]
keywords string[]
maintainers string[]
output → ModuleMetaEntry[]
tools → ModuleMetaEntry[]
description string

ModuleMetaEntry

One named entry from a module `meta.yml` section, normalized from nf-core's single-key YAML maps into `{ name, ...fields }` objects.

field type req description
name string
description string
documentation string
doi string
homepage string
identifier string
licence string[]
pattern string
tool_dev_url string
type string
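
The single-key-map normalization can be sketched as follows, assuming the `meta.yml` section has already been parsed from YAML into plain objects (YAML parsing is out of scope here). `normalizeSection` is a hypothetical name, and flattening nested lists is an assumption about newer nf-core `meta.yml` files that group tuple members into inner lists.

```typescript
// Hypothetical normalized entry: `name` plus whatever fields the YAML map held
// (description, type, pattern, homepage, licence, ...).
interface ModuleMetaEntry {
  name: string;
  [field: string]: unknown;
}

// Turn `[{ reads: { type: "file", ... } }, ...]` into
// `[{ name: "reads", type: "file", ... }, ...]`.
function normalizeSection(section: unknown[]): ModuleMetaEntry[] {
  const out: ModuleMetaEntry[] = [];
  for (const item of section) {
    if (Array.isArray(item)) {
      // Assumption: inner lists group tuple members; flatten them in order.
      out.push(...normalizeSection(item));
      continue;
    }
    const map = item as Record<string, Record<string, unknown> | null>;
    const keys = Object.keys(map);
    if (keys.length !== 1) {
      throw new Error(`expected a single-key map, got: ${keys.join(", ")}`);
    }
    const name = keys[0];
    // A key with no fields (`- meta:`) parses as null; treat it as {}.
    out.push({ ...(map[name] ?? {}), name });
  }
  return out;
}

console.log(normalizeSection([{ reads: { type: "file", pattern: "*.fastq.gz" } }]));
```

Spreading the fields before `name` means a stray `name` key inside the YAML map can never override the map's own key.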

NfTest

One `tests/*.nf.test` file. nf-core templates use a near-uniform shape: a `nextflow_pipeline { test("<desc>") { when { params { ... } } then { assertAll(...) } } }` block, with one `test()` per profile, asserting `workflow.success` plus a snapshot match. This shape captures that data so downstream test-conversion Molds (e.g. nextflow-test-to-target-tests) don't have to re-parse Groovy.

field type req description
assert_workflow_success boolean Whether the test asserts `workflow.success`. Almost always true for nf-core templates.
name string Description string passed to `test("...")`. Often duplicates the profile name.
path string Repo-relative path of the .nf.test file (e.g. `tests/dfast.nf.test`).
profiles string[] Profile name(s) the test runs under, from the file-level `profile "<name>"` declaration and/or per-test config overrides. Usually a single profile.
prose_assertions string[] Other assertions (regex matches, count checks, content equality on specific files) summarized to prose strings. Empty for snapshot-only tests, which is the common nf-core case.
params_overrides object The `when { params { ... } }` block as a key→value map. Most templates set only `outdir`; pipeline-specific overrides land here.
snapshot → SnapshotFixture | null Structured representation of the `assert snapshot(...).match()` clause when present; null when the test uses no snapshot.

Param

One pipeline parameter. Sourced from `nextflow.config`'s `params { ... }` block, augmented from `nextflow_schema.json` when present (nf-core).

field type req description
name string
required boolean
type string JSON Schema-style type when nextflow_schema.json supplies it (string, integer, number, boolean, path); free-form when inferred from a default value.
default any Default value verbatim from the source. May be null.
description string
enum any[] Allowed values when nextflow_schema.json declares them.
fa_icon string | null Font Awesome icon class from the parent `$defs` section's `fa_icon` (e.g. `fas fa-terminal`). Optional UI hint; null when undeclared.
format string | null JSON Schema `format` keyword from nextflow_schema.json. nf-schema vocabulary: `file-path`, `directory-path`, `path`, `file-path-pattern`. Distinguishes path-typed params (datasets/collections) from plain strings without re-reading the source schema. Null when undeclared.
hidden boolean | null nf-schema `hidden` keyword. True for CLI-plumbing params (e.g. `validate_params`, `pipelines_testdata_base_path`, `version`) that shouldn't surface in user-facing target interfaces. Null when undeclared.
mimetype string | null nf-schema `mimetype` keyword (e.g. `text/csv`, `text/plain`, `application/gzip`). Useful for seeding Galaxy `format` on path-typed params. Null when undeclared.
schema_group string | null Human-readable group label from the parent nextflow_schema.json `$defs` section's `title` (e.g. `Input/output options`, `Reference genome options`). Preserves nf-schema sectioning so target interfaces can group inputs without re-parsing the source. Null when the param is not under a titled section.
source_expression string | null Verbatim right-hand-side expression when the param was assigned in Groovy (e.g. `getGenomeAttribute('fasta_fai')`). Null when the param comes from a JSON Schema entry only.
source_kind string | null How this param entry was derived. `nextflow_schema`: declared in `nextflow_schema.json`. `params_block`: declared (or defaulted) in `nextflow.config`'s `params { ... }` block. `getGenomeAttribute`: synthesized from `params.X = getGenomeAttribute('X')` assignments in `nextflow.config` (nf-core key-expanded bundle pattern). `computed`: derived from another non-getGenomeAttribute expression. Null when provenance was not classified.
source_path string | null Repo-relative path of the source file where the assignment lives (e.g. `nextflow.config`, `conf/igenomes.config`). Null when no assignment was located or when the param comes only from `nextflow_schema.json`.

Process

One Nextflow `process { ... }` block.

field type req description
inputs → ChannelIO[]
meta → ModuleMeta | null Normalized `meta.yml` from the module directory when present. Null for local/ad-hoc processes without module metadata.
module_path string Relative path from the pipeline root to the file declaring the process.
module_tests → NfTest[] Module-scoped nf-test files under the module's `tests/` directory. Empty for local/ad-hoc modules without unit tests.
name string Canonical process name (uppercase NF convention) as declared by the `process <NAME>` line in `module_path`.
outputs → ChannelIO[]
aliases string[] Alias names this process is imported as in workflow scopes via `include { <NAME> as <ALIAS> }`. Real nf-core pipelines re-import a single module multiple times under different aliases to invoke it with distinct runtime args (e.g. `MINIMAP2_ALIGN as MINIMAP2_CONSENSUS`, `MINIMAP2_ALIGN as MINIMAP2_POLISH`). Each alias appears in `workflow.edges[].from`/`.to` exactly as written; the canonical `name` does not appear in edges unless an unaliased import exists. Default `[]` when the process is only imported under its canonical name.
conda string | null Verbatim text of the process's `conda` directive (or null). Two common shapes: a literal `bioconda::<name>=<version>` (legacy) or a file reference `${moduleDir}/environment.yml` (modern nf-core convention). Either way the resolved bioconda spec lands in `tools[].bioconda`.
container string | null Verbatim text of the process's `container` directive (or null). Modern nf-core modules use a ternary expression `"${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? '<singularity-uri>' : '<docker-uri>' }"` — keep that text intact here. The two resolved branches are split out into `tools[].docker` / `tools[].singularity` (and `tools[].wave` / `tools[].biocontainer` as appropriate).
in_subworkflows string[] Denormalized FK: names of `subworkflows[]` whose `calls[]` invoke this process (canonical name or any alias). Empty `[]` when the process is only invoked from the primary `workflow` body. Useful for grouping the flat process-tier DAG into a subworkflow-tier view client-side. Multi-entry when the same module is invoked from several subworkflows (typically via aliases).
publish_dir string | null `publishDir` value when the process publishes outputs (or null).
script_summary string One-line LLM-derived summary of what the `script:` body does. Free text.
tool string | null Foreign key into `tools[].name` when the process runs a single recognizable tool; null for multi-tool or pure-Groovy processes.
when string | null Verbatim `when:` guard expression, or null if absent.
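
Splitting the modern nf-core container ternary into its two branches can be sketched with a regular expression over the verbatim `container` text. `splitContainerTernary` and its field names are hypothetical; deciding which `tools[]` field each branch lands in (`biocontainer`, `docker`, `singularity`, `wave`) is a separate step not shown here, and the image strings are illustrative.

```typescript
// Hypothetical helper: pull the two quoted branches out of
// `"${ <condition> ? '<singularity-uri>' : '<docker-uri>' }"`.
interface ContainerBranches {
  singularityBranch: string;
  dockerBranch: string;
}

function splitContainerTernary(directive: string): ContainerBranches | null {
  // `?` then a quoted literal, `:` then another quoted literal; the
  // backreferences \1 and \3 require each literal to close with its own quote.
  const m = /\?\s*(['"])([^'"]+)\1\s*:\s*(['"])([^'"]+)\3/.exec(directive);
  return m ? { singularityBranch: m[2], dockerBranch: m[4] } : null;
}

const exampleDirective =
  "\"${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? " +
  "'https://depot.galaxyproject.org/singularity/fastp:0.23.4--h5f740d0_0' : " +
  "'biocontainers/fastp:0.23.4--h5f740d0_0' }\"";
console.log(splitContainerTernary(exampleDirective));
```

A plain, non-ternary directive returns null, leaving the verbatim `container` text as the only record.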

ReferenceAsset

Curated source-side view of one reference-data input the pipeline consumes. A foreign-key projection onto `params[].name` plus the metadata downstream classification Molds need. Target-agnostic: no Galaxy translation decisions are encoded here.

field type req description
asset_kind string Coarse asset classification (e.g. `fasta`, `fasta_index`, `sequence_dictionary`, `bwa_index`, `bwamem2_index`, `tabix_index`, `gtf`, `gff`, `bed`, `vcf`, `database`, `other`). Free-form string; downstream Molds choose how to map onto target datatypes.
evidence → Evidence
param string FK into `params[].name`.
required boolean True when the source pipeline treats this asset as required (no compute-if-missing branch, no optional guard). When false, see `reference_rebuilds[]` for any rebuild branch covering this param.
used_by string[] Names of processes / subworkflows that receive this param as an argument (via direct invocation binding or channel construction). Empty when no caller could be attributed.
format_hint string | null Format hint, often the nf-schema `format` keyword (`file-path`, `directory-path`, `path`, `file-path-pattern`) or a domain hint like `fai`, `dict`. Null when undetermined.
schema_group string | null Mirrors `params[<this>].schema_group`. Useful for grouping (`Reference genome options`).
source_expression string | null Mirrors `params[<this>].source_expression`, duplicated so reference-data consumers don't have to dereference `params[]`.
source_kind string | null Mirrors `params[<this>].source_kind`. Duplicated here so reference-data consumers don't have to dereference.

ReferenceRebuildRule

One compute-if-missing rebuild branch detected in a workflow or subworkflow body. Binds an asset param to the guard expression and builder process that produces it when absent. Source evidence only; in-tool-rebuild decisions live downstream.

field type req description
asset_param string FK into `params[].name` — the asset this branch reconstitutes when missing.
builder string Builder process or subworkflow name invoked inside the branch (e.g. `SAMTOOLS_FAIDX`, `BWA_INDEX`).
builder_outputs string[] Output channel names from the builder that are assigned to the asset (e.g. `fai`, `dict`).
evidence → Evidence
guard string Verbatim guard expression, e.g. `!fasta_fai_in && step != "annotate"`.
fallback_for string | null Name of the `<asset>_in`-style param that, when present, supplies the asset directly and skips the rebuild. Null when no such pairing was detected.
guard_params string[] Param names referenced by the guard. Empty when the guard binds only non-param locals.

SampleSheet

One sample-sheet-shaped pipeline input. Captures the row schema, path-vs-meta column classification, and discovery provenance so downstream Molds can map onto Galaxy `sample_sheet[:variant]` collections, CWL records-of-arrays, or other target shapes without re-reading the source.

field type req description
columns → SampleSheetColumn[] Row schema in declaration order. nf-schema's `samplesheetToList` emits columns in property order, not source-column order — preserve property order here so downstream translations match runtime channel item layout.
discovered_via string How the sample-sheet shape was detected. `nf-schema`: nextflow_schema.json entry has `schema:` keyword. `samplesheetToList`: workflow imports nf-schema's helper and calls it on the param. `splitCsv`: workflow uses `splitCsv(header: true)` on the param's path. `ad-hoc`: pipeline-specific CSV/TSV parsing inferred from script bodies.
param string Foreign key into `params[].name` — the parameter whose value points at the sample-sheet file.
format string | null Tabular format declared by the param's `mimetype` or inferred from `samplesheetToList` / `splitCsv` semantics. Null when undeclared.
header boolean | null Whether the file is expected to have a header row. Null when the discovery source doesn't pin it.
schema_path string | null Repo-relative path of the JSON Schema file describing rows (e.g. `assets/schema_input.json`), resolved from the param's nf-schema entry's `schema:` keyword. Null when discovered via `samplesheetToList` without a schema reference, or via `splitCsv(header: true)` ad-hoc parsing.

SampleSheetColumn

One column in a sample-sheet row schema. Type vocabulary is JSON Schema-style scalar; `kind` separates dataset-reference columns from per-row metadata. Validation hints (`pattern`, `enum`, `exists`, `mimetype`) are preserved verbatim — target Molds decide which of them survive translation (e.g. Galaxy's `sample_sheet` validator allowlist is regex/in_range/length).

field type req description
kind string `data`: path-typed column whose values are dataset references (becomes an element/inner-collection slot in Galaxy `sample_sheet`). `meta`: scalar metadata column (becomes a `column_definitions` entry, populated per row in `columns[]`). Columns carrying the nf-schema `meta:` annotation are `meta`; without it, kind is inferred from `format` (`file-path`/`directory-path`/`path` → `data`, anything else → `meta`).
name string
required boolean
type string JSON Schema-style scalar type. Path columns are typed `string` with a `format: file-path|directory-path|path` qualifier — see `format` and `kind`.
default any Default value verbatim from the sample-sheet schema. May be null.
description string | null
enum any[] Allowed values when the column schema declares an `enum`. Empty array when absent.
exists boolean | null nf-schema `exists` flag for path columns. True when the parser must verify the file exists.
format string | null JSON Schema `format` keyword. nf-schema vocabulary: `file-path`, `directory-path`, `path`, `file-path-pattern`. Null for scalar metadata columns without a format hint.
mimetype string | null nf-schema `mimetype` keyword (e.g. `text/csv`, `application/gzip`). Useful for target datatypes.
pattern string | null JSON Schema regex `pattern` when present.
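
The `kind` rule can be sketched as a small function. This hypothetical TypeScript version assumes the column's nf-schema `meta:` annotation is surfaced as a `meta` property (undefined when absent). Read literally, the rule sends `file-path-pattern` columns to `meta`, which the sketch preserves.

```typescript
type ColumnKind = "data" | "meta";

// Hypothetical input: the raw column schema, with the nf-schema `meta:`
// annotation (if any) surfaced as `meta`.
interface RawColumn {
  meta?: unknown;
  format?: string | null;
}

const PATH_FORMATS = new Set(["file-path", "directory-path", "path"]);

function inferKind(column: RawColumn): ColumnKind {
  // An explicit `meta:` annotation marks the column as per-row metadata.
  if (column.meta !== undefined) return "meta";
  // Otherwise path-typed formats are dataset references; everything else,
  // including `file-path-pattern` and a missing format, is metadata.
  return column.format != null && PATH_FORMATS.has(column.format) ? "data" : "meta";
}

console.log(inferKind({ format: "file-path" })); // data
console.log(inferKind({ format: null })); // meta
```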

SnapshotChannel

One output channel or flat snapshot list inside a `.snap` entry.

field type req description
files → SnapshotFile[] File md5 assertions parsed from `<path>:md5,<hex>` strings.
key string | null Channel key from the snapshot object, e.g. `0`, `1`, or `genome_gtf`; null for flat file-list snapshots.
values any[] Non-file snapshot values preserved verbatim for consumers that need versions, counts, or scalar tuple emissions.

SnapshotContent

One top-level snapshot entry from an nf-test `.snap` JSON sidecar.

field type req description
channels → SnapshotChannel[] Channel-keyed snapshot values. Pipeline-template flat file lists use `key: null`.
name string Top-level key in the `.snap` file, usually the `test("...")` name.

SnapshotFile

One file digest assertion from a `.snap` sidecar.

field type req description
basename string Final path segment, suitable for Galaxy test `file:` assertions when the snapshot stores only basenames.
md5 string MD5 digest from the sidecar.
path string Path portion before `:md5,` exactly as stored in the sidecar.
stub boolean True when the digest is the empty-file md5 emitted by many `-stub` tests.
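
Parsing one `<path>:md5,<hex>` string into a SnapshotFile can be sketched as follows. `parseSnapshotFile` is hypothetical; the stub check compares against `d41d8cd98f00b204e9800998ecf8427e`, the MD5 digest of zero bytes, which is what an empty file produced by a `-stub` run hashes to.

```typescript
interface SnapshotFile {
  basename: string;
  md5: string;
  path: string;
  stub: boolean;
}

// MD5 digest of zero bytes, i.e. of an empty file.
const EMPTY_FILE_MD5 = "d41d8cd98f00b204e9800998ecf8427e";
const MD5_MARKER = ":md5,";

// Hypothetical parser: returns null for snapshot values that are not
// `<path>:md5,<32 hex chars>` strings (those belong in SnapshotChannel.values).
function parseSnapshotFile(entry: string): SnapshotFile | null {
  const at = entry.lastIndexOf(MD5_MARKER);
  if (at < 0) return null;
  const path = entry.slice(0, at);
  const md5 = entry.slice(at + MD5_MARKER.length);
  if (!/^[0-9a-f]{32}$/.test(md5)) return null;
  const basename = path.split("/").pop() ?? path;
  return { basename, md5, path, stub: md5 === EMPTY_FILE_MD5 };
}

console.log(parseSnapshotFile("sample1.log:md5,d41d8cd98f00b204e9800998ecf8427e")?.stub); // true
```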

SnapshotFixture

Structured shape of an nf-test `snapshot(...).match()` assertion. nf-core templates pass a small set of values into snapshot() — succeeded-task count, version YAML (with Nextflow version stripped), stable file-name list, stable file-content list — and prune the comparison via `getAllFilesFromDir(..., ignoreFile: ..., ignore: [...])` helpers. This breakdown lets downstream consumers reconstruct equivalent assertions in target test frameworks without re-parsing Groovy.

field type req description
captures string[] Logical names of values passed to snapshot(), in order. Common set: `succeeded_task_count`, `versions_yml`, `stable_names`, `stable_paths`. Free-form strings rather than an enum because pipelines can pass other values.
helpers string[] nf-test helper functions invoked in the snapshot expression (e.g. `getAllFilesFromDir`, `removeNextflowVersion`). Used to detect whether a target framework can replicate the comparison.
ignore_files string[] Repo-relative paths passed as `ignoreFile:` to helpers (e.g. `tests/.nftignore`, `tests/.nftignore_files_entirely`). The files themselves are line-delimited globs.
ignore_globs string[] Inline `ignore: [...]` glob list passed to helpers (e.g. `['Prokka/**', '**/multiqc_busco.yaml']`). Distinct from `ignore_files`, which references on-disk lists.
parsed_content → SnapshotContent[] Parsed entries from the `.nf.test.snap` JSON sidecar. Empty when the sidecar is absent, too large for compact summary output, or not valid JSON. Each entry preserves the snapshot name plus channel-keyed file md5 assertions and non-file scalar values.
snap_path string | null Repo-relative path of the corresponding `.nf.test.snap` file when present.

SourceRecord

Provenance block. Mirrors gxy-sketches SketchSource field names so a Foundry summary is structurally consumable by anyone using that shape.

field type req description
ecosystem string Pipeline flavor. `nf-core` when `manifest.name` is `nf-core/<workflow>` and the canonical layout is present; `nextflow` for ad-hoc DSL2 pipelines. gxy-sketches' enum additionally includes `iwc`, `snakemake-workflows`, and `wdl` for its other ingestors.
slug string Kebab-case identifier; `<owner>-<repo>` for nf-core, repo basename otherwise.
url string Git remote URL of the pipeline repository.
version string Tag, branch, or commit SHA the summary was derived from.
workflow string Pipeline name (typically the part after `nf-core/`).
license string | null SPDX license identifier when detectable from a LICENSE file.
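
The slug rule can be sketched from the `url` field. This hypothetical `deriveSlug` takes the last two path segments of an HTTPS or SSH remote as `<owner>` and `<repo>`; lowercasing and collapsing non-alphanumeric runs to hyphens is an assumed reading of "kebab-case", not a documented rule.

```typescript
type Ecosystem = "nf-core" | "nextflow";

// Hypothetical helper: derive SourceRecord.slug from SourceRecord.url.
function deriveSlug(url: string, ecosystem: Ecosystem): string {
  // Drop trailing slashes and a `.git` suffix, then split on `/` and `:` so
  // both https://host/owner/repo and git@host:owner/repo forms work.
  const cleaned = url.replace(/\/+$/, "").replace(/\.git$/, "");
  const parts = cleaned.split(/[\/:]/).filter((p) => p.length > 0);
  const repo = parts[parts.length - 1] ?? "";
  const owner = parts[parts.length - 2] ?? "";
  const raw = ecosystem === "nf-core" ? `${owner}-${repo}` : repo;
  // Assumed kebab-case normalization: lowercase, non-alphanumeric runs -> "-".
  return raw.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

console.log(deriveSlug("https://github.com/nf-core/bacass.git", "nf-core")); // nf-core-bacass
```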

Subworkflow

One named `workflow <NAME> { ... }` block other than the primary workflow. Includes nf-core utility wrappers (PIPELINE_INITIALISATION, PIPELINE_COMPLETION, UTILS_NFCORE_PIPELINE) which exist to compose helper-function calls rather than to invoke processes.

field type req description
calls string[] Names of processes and nested subworkflows this subworkflow invokes. Free-function calls (e.g. `paramsHelp`, `completionEmail`) are NOT recorded here; they're implied by `kind = utility`.
kind string `pipeline` when the subworkflow invokes pipeline processes (data-flow contributor). `utility` when it composes helper functions only — common nf-core template subworkflows like PIPELINE_INITIALISATION (which calls `paramsHelp`, `samplesheetToList`, `UTILS_NFSCHEMA_PLUGIN`) and PIPELINE_COMPLETION (`completionEmail`, `completionSummary`). Utility subworkflows have empty `calls[]` for processes but may still expose channel outputs the primary workflow consumes.
name string
path string Relative path from the pipeline root.
tests → NfTest[] Subworkflow-scoped nf-test files under the subworkflow's `tests/` directory. Empty when absent.
inputs → ChannelIO[] Parsed entries of the subworkflow's `take:` block. `name` is the take name, `description` carries any adjacent `// ...` comment, `shape` carries the verbatim source line.
invocations → SubworkflowInvocation[] Call-sites where this subworkflow was invoked. Each entry records the calling workflow/subworkflow, the verbatim positional arguments, and a per-argument binding onto this subworkflow's `inputs[].name` (take names). Drives reference-asset attribution (`reference_assets[].used_by`) without re-parsing call sites. Empty when uncalled or when the call could not be positionally aligned.
outputs → ChannelIO[]

SubworkflowInvocation

One call-site of a subworkflow, with positional arguments bound to its `take:` names.

field type req description
arguments string[] Verbatim positional argument expressions in source order (e.g. `params.fasta`, `PREPARE_GENOME.out.fai`).
bindings → InvocationBinding[] Per-argument binding onto the callee's `inputs[].name` (take names), in source order. Length matches `arguments[]` when the call positionally aligns to the callee's take block; shorter when only a prefix aligns.
caller string Name of the calling workflow or subworkflow.
caller_path string | null Repo-relative path of the file the caller is defined in. Null when not resolvable.
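
Positional binding can be sketched as a zip over the caller's arguments and the callee's `take:` names. This hypothetical `bindArguments` treats "aligns" as purely positional, so surplus arguments stay unbound and produce the shorter `bindings[]` case; the real detector may apply additional alignment checks. The take names in the example are illustrative.

```typescript
interface InvocationBinding {
  argument: string;
  take: string;
}

// Hypothetical positional binder: pair args[i] with takes[i] in source order.
// Arguments past the end of the take block stay unbound, so the result is
// shorter than `args` in that case.
function bindArguments(args: string[], takes: string[]): InvocationBinding[] {
  const n = Math.min(args.length, takes.length);
  const bindings: InvocationBinding[] = [];
  for (let i = 0; i < n; i++) {
    bindings.push({ argument: args[i], take: takes[i] });
  }
  return bindings;
}

// Hypothetical take names `ch_fasta` and `ch_fai`.
console.log(bindArguments(["params.fasta", "PREPARE_GENOME.out.fai"], ["ch_fasta", "ch_fai"]));
```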

Summary

Top-level shape. Every Nextflow summary is exactly this object.

field type req description
nf_tests → NfTest[] All `tests/*.nf.test` files in the pipeline. Each entry captures one test-program — its profile, params overrides, and structured assertions. Empty array when the pipeline has no nf-test fixtures.
params → Param[]
processes → Process[]
profiles string[]
reference_assets → ReferenceAsset[] Curated, target-agnostic view of reference-data inputs the source pipeline consumes (FASTAs, indexes, dictionaries, annotation tables, …). Each entry is a foreign-key projection onto `params[]` plus source-side metadata (asset kind, format hint, source expression, schema group, callers). Empty array when no reference assets are detected. Downstream Molds (e.g. nextflow-summary-to-galaxy-reference-data) classify these against Galaxy translation rungs; classification decisions are not encoded here.
reference_rebuilds → ReferenceRebuildRule[] Compute-if-missing rebuild branches detected in workflow/subworkflow bodies. Each entry binds an asset param to the guard and builder process that recomputes it when absent. Empty when no rebuild idioms are found. Source evidence only — Galaxy in-tool-rebuild decisions live downstream.
sample_sheets → SampleSheet[] Structured sample-sheet inputs. Each entry binds one `params[]` parameter to a row schema (column names, types, path-vs-meta classification, required flags, enums, patterns). Promoted from prose inside `params[].description` so downstream target translations (Galaxy `sample_sheet*` collections, CWL records-of-arrays) can choose collection variants without re-parsing the source pipeline. Empty array when no sample-sheet idiom is detected. Discovery sources: nf-schema `schema:` references, `samplesheetToList()` calls, and `splitCsv(header: true)` materializations.
source → SourceRecord
subworkflows → Subworkflow[]
test_fixtures → TestFixtures Test-input data shape for the *selected* profile (the cast's `profile` argument, default `test`). For pipelines with multiple test profiles, see `nf_tests[]` for the full enumeration.
tools → Tool[]
workflow → Workflow
warnings string[] Cast-skill-emitted advisory messages (e.g. DSL1 detected, meta.yml/script disagreement, lossy assertion summarization). Empty array on a clean run.

TestDataRef

One test-fixture input. Field names mirror gxy-sketches' TestDataRef verbatim. The sketch-bundle constraint that `path` must live under `test_data/` is intentionally dropped; the Foundry summary describes fixtures as data, it does not bundle them.

field type req description
role string Logical role of the input (e.g. `samplesheet`, `reference_fasta`).
description string | null
filetype string | null File format, e.g. `fastq.gz`, `csv`.
path string | null Repo-relative path when the fixture lives in-tree, or local filesystem path when the CLI fetched the remote fixture into a caller-provided test-data directory.
sha1 string | null SHA-1 integrity hash when published alongside the fixture.
url string | null

TestFixtures

Test fixtures derived from a `conf/<profile>.config` for the cast's selected profile. Pipelines with multiple test profiles (bacass has 11) record only the selected profile here; the full nf-test enumeration lives in `nf_tests[]`.

field type req description
inputs → TestDataRef[]
outputs → ExpectedOutputRef[]
profile string Which `conf/<profile>.config` produced these fixtures (typically `test`).

Tool

One tool/dependency the pipeline runs. Mirrors gxy-sketches ToolSpec (name + version) and augments it with the resolved container/conda strings the bridge to author-galaxy-tool-wrapper needs.

field type req description
name string Canonical tool name (e.g. `fastp`, `samtools`). Used as the foreign key from `processes[].tool`.
version string
bioconda string | null `bioconda::<name>=<version>` spec when a `conda` directive provides one — directly or by reference to a `${moduleDir}/environment.yml` whose `dependencies:` list contains a single bioconda entry. The latter is the modern nf-core convention.
biocontainer string | null BioContainers image reference when one was found, in either the `quay.io/biocontainers/<name>:<version>--<build>` form or the docker.io alias `biocontainers/<name>:<version>--<build>` (modern nf-core modules typically use the docker.io form in the docker branch and the depot.galaxyproject.org form in the singularity branch).
docker string | null Non-biocontainer Docker registry image string when present.
mulled_components → ToolSpec[] Constituent Bioconda packages for an opaque `mulled-v2-*` multi-package container when resolved from a cached BioContainers multi-package-containers TSV. Omitted when no matching cached row is available.
singularity string | null Singularity image reference when present (e.g. `https://depot.galaxyproject.org/singularity/...`).
wave string | null Seqera Wave / community-cr registry reference (e.g. `community.wave.seqera.io/library/<name>:<version>--<digest>` or `https://community-cr-prod.seqera.io/.../sha256/<digest>/data`). Wave-built containers are increasingly common in nf-core; kept distinct from `docker` because the resolution rules and provenance differ.
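
Resolving `bioconda` from a modern nf-core `${moduleDir}/environment.yml` can be sketched as follows, assuming the YAML has already been parsed into a plain object. `biocondaFromEnv` is hypothetical; it returns the spec only when exactly one `bioconda::`-prefixed dependency is present and returns null for multi-package environments, mirroring the "single bioconda entry" condition.

```typescript
// Hypothetical parsed shape of an nf-core module environment.yml.
interface CondaEnvironment {
  channels?: string[];
  dependencies?: unknown[];
}

function biocondaFromEnv(env: CondaEnvironment): string | null {
  // Keep plain string dependencies that carry an explicit bioconda:: prefix;
  // conda-forge entries and nested `pip:` maps are ignored.
  const specs = (env.dependencies ?? []).filter(
    (dep): dep is string => typeof dep === "string" && dep.startsWith("bioconda::"),
  );
  // Only a single bioconda entry maps cleanly onto Tool.bioconda.
  return specs.length === 1 ? specs[0] : null;
}

console.log(
  biocondaFromEnv({ channels: ["conda-forge", "bioconda"], dependencies: ["bioconda::fastp=0.23.4"] }),
); // bioconda::fastp=0.23.4
```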

ToolSpec

One constituent package in a decomposed multi-package container.

field type req description
bioconda string Exact Bioconda requirement spec, e.g. `bioconda::samtools=1.20`.
name string
version string

Workflow

Primary workflow: channel construction plus the call-graph DAG. Real nf-core pipelines have multiple named `workflow` blocks (an anonymous entrypoint `workflow {}` in main.nf that wires PIPELINE_INITIALISATION → NFCORE_<NAME> → PIPELINE_COMPLETION; a named NFCORE_<NAME> wrapper; a substantive named workflow under workflows/<name>.nf). The summary collapses this into one primary `workflow` plus `subworkflows[]`. Selection rule: pick the workflow that invokes the most pipeline processes — typically the one under `workflows/<name>.nf`. The anonymous `workflow {}` glue and the NFCORE_<NAME> wrapper land in `subworkflows[]`.

field type req description
channels → Channel[]
edges → Edge[]
name string Name of the selected primary workflow (uppercase NF convention). The anonymous `workflow {}` entrypoint is never the primary; if it is the only workflow block, the pipeline is too small to summarize and the cast skill should emit a warning.
conditionals → Conditional[]
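The selection rule can be sketched as follows. `WorkflowBlock` is a hypothetical stand-in for a parsed `workflow` block, and breaking ties by keeping the first block seen is an assumption the schema does not specify.

```typescript
// Hypothetical stand-in for a parsed workflow block.
interface WorkflowBlock {
  name: string | null; // null for the anonymous `workflow {}` entrypoint
  processCalls: string[]; // pipeline processes invoked in the body
}

interface Selection {
  primary: WorkflowBlock | null;
  subworkflows: WorkflowBlock[];
  warning?: string;
}

function selectPrimaryWorkflow(blocks: WorkflowBlock[]): Selection {
  // The anonymous entrypoint is never eligible as primary.
  const named = blocks.filter((b) => b.name !== null);
  if (named.length === 0) {
    return {
      primary: null,
      subworkflows: blocks,
      warning: "only an anonymous workflow block; pipeline too small to summarize",
    };
  }
  // Most process invocations wins; on a tie the first block seen is kept.
  let primary = named[0];
  for (const b of named) {
    if (b.processCalls.length > primary.processCalls.length) primary = b;
  }
  // Everything else (anonymous glue, NFCORE_<NAME> wrapper) becomes a subworkflow.
  return { primary, subworkflows: blocks.filter((b) => b !== primary) };
}
```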

This page is auto-rendered from the JSON Schema authored in this repo and shipped on npm as part of @galaxy-foundry/summarize-nextflow (the producer co-locates its own schema). Each $def becomes a section below with a stable anchor ID — research notes and Mold bodies can deep-link individual shapes via summary-nextflow#Tool.

Source-of-truth chain:

  1. packages/summarize-nextflow/src/schema/summary-nextflow.schema.json — the canonical JSON, hand-edited as part of the Mold/cast loop (summarize-nextflow). Mold frontmatter cites it via summary-nextflow wiki-links; cast imports the summaryNextflowSchema runtime export and serializes it into cast bundles.
  2. packages/summarize-nextflow/scripts/sync-schema.mjs runs at prebuild, regenerating the typed summary-nextflow.schema.generated.ts const wrapper from the canonical JSON.
  3. Published as @galaxy-foundry/summarize-nextflow on npm. Site rendering imports the schema directly from this package via site/src/lib/schema-registry.ts; the published artifact also exports validateSummary() and ships the standalone summarize-nextflow bin (self-validates by default). The unified foundry CLI in @galaxy-foundry/foundry exposes the same gate as foundry validate-summary-nextflow for downstream cast skills.

At runtime, cast skills should validate through the CLI command:

foundry validate-summary-nextflow summary.json

The same schema is copied verbatim into references/schemas/summary-nextflow.schema.json per the casting policy in docs/COMPILATION_PIPELINE.md. The package additionally exports validateSummary (AJV gate) for TypeScript consumers, but generated skills should prefer command-shaped validation so failures are easy to reproduce outside the agent runtime.

Contrast with tests-format, which is vendored from an external npm package (@galaxy-tool-util/schema); this schema is authored here and shipped to npm — the direction of the source-of-truth chain is reversed.

Why per-source

Paper, Nextflow, and CWL are different enough that forcing a shared cross-source summary shape would either lose detail or bloat all three (docs/HARNESS_PIPELINES.md §“Mold-inventory parity”). Each summarize-<source> Mold emits its own schema; downstream source-target Molds such as nextflow-summary-to-galaxy-interface, nextflow-summary-to-galaxy-data-flow, nextflow-summary-to-cwl-interface, and nextflow-summary-to-cwl-data-flow consume this summary without pretending every source has one shared shape.

Field-name parity with gxy-sketches

Three sub-shapes follow gxy-sketches’ field names (see gxy-sketches-alignment for the rationale):

  • SourceRecord — mirrors SketchSource (ecosystem, workflow, url, version, license, slug).
  • Tool — extends ToolSpec (name, version) with the resolved container/conda strings the bridge to author-galaxy-tool-wrapper needs.
  • TestDataRef / ExpectedOutputRef — mirror gxy-sketches’ field names exactly. The sketch-bundle invariant that path must live under test_data/ is intentionally dropped; the Foundry summary describes fixtures as data, it does not bundle them.

Cast-time role

Per docs/COMPILATION_PIPELINE.md’s per-kind dispatch, this schema is referenced by summarize-nextflow via output_artifacts[].schema and copied verbatim into the cast bundle’s references/schemas/. The cast skill validates its emitted JSON with validate-summary-nextflow before returning; failure is loud — downstream Molds bind to this shape and would produce worse errors later.

What is intentionally not modeled

  • Structured channel typing. processes[].inputs[].shape is a string ("tuple(meta, [path,path])"), not a structured type. NF channel typing is a research project; a string is enough for downstream Molds to reason about and an LLM to emit.
  • Operator-chain semantics. Edge.via records the literal operator chain (["map", "join", "groupTuple"]). Reconciling what the chain does to channel shapes is left to the LLM step that fills Edge.notes when confidence is low.
  • Multi-tool processes outside decomposed mulled-v2 containers. A process can run multiple tools (a shell pipeline of two binaries). Process.tool is nullable; multi-tool processes set it null and surface tool details in script_summary and container. A tools[] foreign-key array on Process would be cleaner; deferred until downstream use forces it.
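To make the Edge.via convention concrete, here is a minimal sketch that recovers the literal operator chain from a channel expression by collecting top-level method calls and skipping closure and argument bodies. `operatorChain` is hypothetical and is not a Groovy parser: property hops such as `.out.zip` are dropped because they are not followed by `(` or `{`, and pipe (`|`) syntax and string literals containing brackets are not handled.

```typescript
// Hypothetical sketch: collect the names of top-level method calls in a
// channel expression, e.g.
//   FASTQC.out.zip.map { meta, z -> z }.join(ch_ref).groupTuple()
// yields ["map", "join", "groupTuple"].
function operatorChain(expr: string): string[] {
  const ops: string[] = [];
  let depth = 0; // nesting of (), {}, [] so closure/argument bodies are skipped
  for (let i = 0; i < expr.length; i++) {
    const c = expr[i];
    if (c === "(" || c === "{" || c === "[") depth++;
    else if (c === ")" || c === "}" || c === "]") depth--;
    else if (c === "." && depth === 0) {
      // Keep the identifier only if it is called: followed by "(" or "{".
      const m = /^([A-Za-z_][A-Za-z0-9_]*)\s*[({]/.exec(expr.slice(i + 1));
      if (m !== null) ops.push(m[1]);
    }
  }
  return ops;
}
```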

Revision 2 — 2026-05-01

First cast against nf-core/demo @ 1.1.0 exposed gaps in the v1 shape (see content/log.md’s 2026-05-01 entry). Changes:

  • Tool.biocontainer description widened to accept the docker.io alias biocontainers/<name>:<version>--<build> alongside quay.io/biocontainers/.... Modern nf-core modules publish the docker.io form in the docker branch and the depot.galaxyproject.org/singularity/... form in the singularity branch.
  • Tool.wave field added for Seqera Wave / community-cr registry images (community.wave.seqera.io/..., https://community-cr-prod.seqera.io/...). Kept distinct from docker because resolution rules and provenance differ.
  • Process.container and Process.conda re-described as verbatim directive text (not “resolved”). Modern container directives are ternary expressions over workflow.containerEngine; modern conda directives are file references to ${moduleDir}/environment.yml. The schema now records the directive faithfully and pushes resolution into tools[].
  • ChannelIO.topic field added for Nextflow 24+ channel topics. nf-core templates emit a per-process topic: versions triple to a global topic for version aggregation; the v1 shape had no place to record this.
  • Subworkflow.kind enum added (pipeline | utility). nf-core template subworkflows like PIPELINE_INITIALISATION compose free-function calls (paramsHelp, completionEmail) without invoking processes; their calls[] is empty by design. The kind field disambiguates real data-flow contributors from utility wrappers.
  • Workflow.name description names the selection rule for pipelines with multiple workflow blocks (anonymous workflow {} + NFCORE_<NAME> + a substantive named workflow): pick the one with the most process invocations (typically workflows/<name>.nf) and route the rest into subworkflows[].
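Because Process.container now holds the directive verbatim, pulling candidate images out for tools[] is a separate step. A minimal sketch, assuming the single-quoted ternary branches seen in nf-core modules (`containerCandidates` is a hypothetical name): it keeps quoted literals containing `/` or `:` and drops engine names such as `'singularity'`.

```typescript
// Hypothetical sketch: return the single-quoted literals in a verbatim
// container directive that look like image references. Engine names used in
// the ternary condition (e.g. 'singularity') contain neither '/' nor ':'.
function containerCandidates(directive: string): string[] {
  const re = /'([^']*)'/g;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(directive)) !== null) {
    const literal = m[1];
    if (literal.includes("/") || literal.includes(":")) out.push(literal);
  }
  return out;
}
```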

What was not changed despite biting:

  • nf-test snapshot assertions (snapshot(...).match() with helpers) remain summarized to prose strings in ExpectedOutputRef.assertions[]. A structured “snapshot fixture” shape would help but is deferred to rev 3, when the testing note should have enough material to inform the design.
  • Free-function calls in workflow bodies (paramsSummaryMap, softwareVersionsToYAML) remain folded into channel description text. They get no first-class representation: their effects show up as channel sources, and the names are nf-core idiom rather than pipeline-specific signal.

Revision 3 — 2026-05-01

Second cast against nf-core/bacass @ 2.5.0 (33 processes, 9 nf-test files, 11 test profiles) exposed two structural-coverage gaps that, having now appeared in a second pipeline, were treated as general rather than one-off. Changes:

  • Process.aliases: string[] added. Real pipelines re-import a single module multiple times under different aliases via include { X as Y } — bacass has six such patterns (CAT_FASTQ→{SHORT,LONG}, MINIMAP2_ALIGN→{CONSENSUS,POLISH} (3 aliases of one process), KRAKEN2_KRAKEN2→{KRAKEN2,KRAKEN2_LONG}, QUAST→QUAST_BYREFSEQID, plus FASTQC→{RAW,TRIM} inside FASTQ_TRIM_FASTP_FASTQC). Workflow edges[].from/.to reference alias names; canonical names didn’t appear in edges at all. The new field captures the alias→canonical mapping so downstream skills (especially author-galaxy-tool-wrapper, which needs to know “MINIMAP2_CONSENSUS shares MINIMAP2_ALIGN’s container/conda but is invoked with different runtime args”) can resolve references.
  • Summary.nf_tests: NfTest[] added. bacass has 9 tests/*.nf.test files, each tied to one test profile. The existing test_fixtures field is singular (one selected profile’s data shape), so the rest of the test surface was invisible. The new array enumerates every .nf.test with structured fields: profile, params overrides, assert_workflow_success, prose_assertions, and a structured snapshot: SnapshotFixture | null capturing the snapshot(...).match() semantics.
  • SnapshotFixture shape added. nf-core templates use a near-uniform snapshot pattern: pass succeeded-task-count + version-yaml + stable-name list + stable-path list into snapshot(), pruning via ignoreFile: and ignore: globs. The new shape records captures, helpers, ignore_files, ignore_globs, and the .snap path — enough for downstream test-plan molds (e.g. nextflow-test-to-galaxy-test-plan) to reconstruct equivalent assertion intent in target frameworks without re-parsing Groovy.
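Alias resolution can be sketched from the include statements alone. `parseIncludeAliases` and `canonicalName` are hypothetical helpers, and the regular expressions assume the simple `include { X as Y; ... } from '...'` form rather than every legal Nextflow include.

```typescript
// Hypothetical sketch: map alias -> canonical process name from include
// statements such as
//   include { MINIMAP2_ALIGN as MINIMAP2_CONSENSUS } from '../modules/...'
function parseIncludeAliases(source: string): Map<string, string> {
  const aliases = new Map<string, string>();
  const includeRe = /include\s*\{([^}]*)\}/g;
  let block: RegExpExecArray | null;
  while ((block = includeRe.exec(source)) !== null) {
    // One include block may list several names separated by ';'.
    for (const part of block[1].split(";")) {
      const m = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s+as\s+([A-Za-z_][A-Za-z0-9_]*)\s*$/.exec(part);
      if (m !== null) aliases.set(m[2], m[1]);
    }
  }
  return aliases;
}

// Edge endpoints use alias names; a name with no alias entry is already canonical.
function canonicalName(name: string, aliases: Map<string, string>): string {
  return aliases.get(name) ?? name;
}
```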

What was not changed despite biting:

  • TestFixtures stayed singular. Multiple test profiles surfaced via nf_tests[] rather than promoting test_fixtures to an array — this preserves backward compatibility and keeps the “data shape of the selected profile” abstraction.
  • Mulled-v2 multi-package containers, multiMap/.branch/.cross fan-out, conditional channel construction, and .mix-then-reassign had each still bitten only once (in bacass), so they were deferred per the “grow from contact” rule.

Revision 4 — 2026-05-05

Snapshot-sidecar parsing landed for module and subworkflow tests whose interesting assertions live in sibling .nf.test.snap JSON files. Changes:

  • SnapshotFixture.parsed_content: SnapshotContent[] added. Each parsed sidecar entry preserves the snapshot name plus channel-keyed SnapshotChannel values.
  • SnapshotFile added. <path>:md5,<hex> strings become file digest assertions with path, basename, md5, and a stub flag for empty-file md5s.
  • Non-file values preserved. Version tuples, counts, and other scalar snapshot values remain in SnapshotChannel.values so downstream test-plan Molds do not re-read .snap files.
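A sketch of turning one `<path>:md5,<hex>` sidecar string into the SnapshotFile fields listed above. `parseSnapshotFile` is hypothetical; the stub test compares against d41d8cd98f00b204e9800998ecf8427e, the MD5 of zero bytes, on the assumption that "empty-file md5" means exactly that digest.

```typescript
// Fields mirror the SnapshotFile description above.
interface SnapshotFile {
  path: string;
  basename: string;
  md5: string;
  stub: boolean;
}

// MD5 digest of zero bytes (an empty file).
const EMPTY_FILE_MD5 = "d41d8cd98f00b204e9800998ecf8427e";

// Hypothetical parser for one "<path>:md5,<hex>" snapshot string.
// Returns null for anything else, which stays in SnapshotChannel.values.
function parseSnapshotFile(value: string): SnapshotFile | null {
  const m = /^(.*):md5,([0-9a-f]{32})$/.exec(value);
  if (m === null) return null;
  const path = m[1];
  const basename = path.split("/").pop() ?? path;
  return { path, basename, md5: m[2], stub: m[2] === EMPTY_FILE_MD5 };
}
```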

Revision 7 — 2026-05-06

Top-level Param entries gained the nf-schema metadata previously only available on sample-sheet columns. Resolves jmchilton/foundry#186.

  • Param.format added (string|null). nf-schema format keyword: file-path, directory-path, path, file-path-pattern. Disambiguates path-typed params from plain strings without re-reading nextflow_schema.json.
  • Param.hidden added (boolean|null). nf-schema hidden keyword. CLI-plumbing params (validate_params, pipelines_testdata_base_path, version, …) now drop out of user-facing target interfaces structurally.
  • Param.mimetype added (string|null). nf-schema mimetype keyword. Seeds Galaxy format on path params when present.
  • Param.schema_group added (string|null). The parent $defs section’s title (e.g. Input/output options, Reference genome options). Preserves nf-schema sectioning for UI grouping.
  • Param.fa_icon added (string|null). The parent section’s Font Awesome icon hint.

Downstream Molds — nextflow-summary-to-galaxy-interface, nextflow-summary-to-cwl-interface, nextflow-summary-to-galaxy-data-flow — can now consume these structurally instead of re-parsing the source schema. Mapping table in nextflow-params-to-galaxy-inputs.
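As an illustration of consuming these fields structurally, the sketch below drops hidden params, groups the rest by schema_group, and checks format against the four path-like values listed above. The helper names and the choice to treat `hidden: null` as visible are assumptions; only the Param field names come from this revision.

```typescript
// Subset of Param relevant to revision 7 (other Param fields omitted).
interface Param {
  name: string;
  type: string;
  format: string | null;
  hidden: boolean | null;
  mimetype: string | null;
  schema_group: string | null;
  fa_icon: string | null;
}

// The nf-schema format values listed above that denote filesystem paths.
const PATH_FORMATS = new Set(["file-path", "directory-path", "path", "file-path-pattern"]);

function isPathParam(p: Param): boolean {
  return p.format !== null && PATH_FORMATS.has(p.format);
}

// Hypothetical helper: user-facing params grouped by nf-schema section title.
// Assumption: hidden === null is treated as visible.
function userFacingParamsByGroup(params: Param[]): Map<string, Param[]> {
  const groups = new Map<string, Param[]>();
  for (const p of params) {
    if (p.hidden === true) continue; // CLI plumbing drops out structurally
    const key = p.schema_group ?? "(ungrouped)";
    const list = groups.get(key) ?? [];
    list.push(p);
    groups.set(key, list);
  }
  return groups;
}
```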

Revision 6 — 2026-05-05

Sample-sheet schemas became first-class structured inputs. Resolves the open question raised in nextflow-workflow-io-semantics §“Open questions” and tracked in jmchilton/foundry#177.

  • Summary.sample_sheets: SampleSheet[] added (required; empty array when none). Promotes sample-sheet shape out of params[].description prose so downstream target Molds can pick collection variants without re-parsing the source pipeline.
  • SampleSheet shape added. Binds one params[] parameter (param) to a row schema (columns) plus discovery provenance (discovered_via: nf-schema | samplesheetToList | splitCsv | ad-hoc), optional schema_path, format, and header.
  • SampleSheetColumn shape added. Captures name, JSON Schema-style scalar type, kind (data for path-typed dataset references vs meta for per-row metadata), format, required, default, enum, pattern, exists, mimetype, description. Validation hints stay verbatim — target Molds decide which survive translation (e.g. Galaxy’s sample_sheet validator allowlist is regex/in_range/length only; richer nf-schema validation downgrades to prose with confidence note).

Why now: nf-core’s samplesheetToList(params.input, "assets/schema_input.json") idiom maps almost 1:1 onto Galaxy’s sample_sheet[:paired|:paired_or_unpaired|:record] collection types (column_definitions, typed columns including element_identifier cross-row refs, restrictions[] for enums, regex validators). Without structured columns the interface Mold has no principled basis for choosing between sample_sheet:paired, list:paired, and a flat file, and per-row meta fields silently fall back to parallel parameter inputs. See galaxy-sample-sheet-collections for the target-side mapping table consumed by nextflow-summary-to-galaxy-interface and nextflow-summary-to-galaxy-data-flow.

What was not changed: Param.type still records the param’s own type (string/path) — the sample-sheet relationship is expressed by sample_sheets[].param referencing params[].name, not by mutating the param entry.
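The foreign-key relationship can be sketched as a small join: keep sample sheets whose param names an existing params[] entry, then split each sheet's columns by kind. The helper `describeSampleSheets` and the trimmed-down types are hypothetical; choosing a Galaxy collection variant from the result is left to the target Molds and is not shown.

```typescript
// Trimmed-down shapes: only the fields this sketch reads.
interface SampleSheetColumn {
  name: string;
  kind: "data" | "meta";
}

interface SampleSheet {
  param: string; // foreign key into params[].name
  discovered_via: "nf-schema" | "samplesheetToList" | "splitCsv" | "ad-hoc";
  columns: SampleSheetColumn[];
}

// Hypothetical helper: resolve sample-sheet params and split columns into
// dataset references (kind "data") vs per-row metadata (kind "meta").
function describeSampleSheets(paramNames: string[], sheets: SampleSheet[]) {
  return sheets
    .filter((s) => paramNames.includes(s.param)) // skip dangling foreign keys
    .map((s) => ({
      param: s.param,
      data: s.columns.filter((c) => c.kind === "data").map((c) => c.name),
      meta: s.columns.filter((c) => c.kind === "meta").map((c) => c.name),
    }));
}
```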

Revision 5 — 2026-05-05

Mulled-v2 multi-package container decomposition now has a narrow optional shape. Changes:

  • Tool.mulled_components: ToolSpec[] added. When summarize-nextflow is given a cached BioContainers multi-package-containers TSV, opaque mulled-v2-* container IDs can be decomposed into constituent Bioconda package specs.
  • ToolSpec added. Constituent packages record name, version, and exact bioconda requirement text.
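A minimal sketch of the lookup, assuming the cached TSV has already been parsed into a map from mulled-v2 image name to package pins (`MulledCache` and `decomposeMulled` are hypothetical, and the real TSV's column layout is not modeled). Returning undefined on a miss matches the rule that mulled_components is omitted when no cached row matches.

```typescript
interface ToolSpec {
  name: string;
  version: string;
  bioconda: string;
}

// Hypothetical pre-parsed cache: "mulled-v2-<hash>" -> constituent packages.
type MulledCache = Map<string, Array<{ name: string; version: string }>>;

// Returns constituent ToolSpecs, or undefined so the caller can omit
// mulled_components entirely.
function decomposeMulled(image: string, cache: MulledCache): ToolSpec[] | undefined {
  // ".../mulled-v2-<hash>:<tag>" -> "mulled-v2-<hash>"
  const last = image.split("/").pop() ?? image;
  const id = last.split(":")[0];
  if (!id.startsWith("mulled-v2-")) return undefined; // not an opaque mulled image
  const pins = cache.get(id);
  if (pins === undefined) return undefined; // no cached row
  return pins.map((p) => ({
    name: p.name,
    version: p.version,
    bioconda: `bioconda::${p.name}=${p.version}`,
  }));
}
```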

Incoming References (21)