nf-core/tools and the nf-core Pipeline Toolchain: A Technical Survey
Source clone: ~/projects/repositories/nf-core-tools (commit b6c5737, version 4.0.2).
Overview
nf-core/tools is the official Python package that the nf-core community publishes to PyPI as nf-core (current release 4.0.2, May 2026). It is a Click-based CLI plus an importable Python library (nf_core.*) that handles essentially every lifecycle task a pipeline author or operator performs against an nf-core Nextflow pipeline: scaffolding new pipelines from a Jinja-rendered cookiecutter template, installing and updating shared modules and subworkflows from nf-core/modules, linting, schema management, listing remote pipelines, downloading pipelines together with their container images for offline use, and synchronising pipelines with the upstream template as it evolves.
Conceptually the package solves three problems for the community: enforcing convention (every nf-core pipeline shares a directory layout, file inventory, and metadata schema, and nf-core pipelines lint is the reference enforcer), enabling code reuse across pipelines (the modules/subworkflows subcommands implement a Git-tracked package manager whose state lives in modules.json), and bridging the pipeline to its surrounding registries (pipelines.json, nf-core/configs, nf-core/test-datasets).
Historically the package began as a small scaffolding helper around 2018 and has tracked the nf-core pipeline standard ever since. The 2.x line introduced subworkflows; 3.x rewrote the create UI in Textual and added the schema validator plugin model; 4.x consolidates around nf-test as the canonical pipeline test harness, replaces pytest_workflow and pytest_modules style harnesses, and tightens the .nf-core.yml schema with a Pydantic v2 model.
It sits in the centre of an ecosystem of GitHub repositories: nf-core/pipelines.json (pipeline registry), nf-core/modules (the canonical module + subworkflow registry), nf-core/configs (institutional Nextflow configs), nf-core/test-datasets (branch-per-pipeline test data), and the website nf-co.re which serves the schema-builder web UI and API. The CLI talks to all of them over HTTPS and the GitHub API.
The nf-core conventions the tools encode
A pipeline that the tools recognise as nf-core compliant follows the layout reproduced verbatim in nf_core/pipeline-template/. The canonical structure is:
- main.nf — entrypoint; imports workflows/<name>.nf and the boilerplate utility subworkflows (utils_nfcore_pipeline, utils_nfschema_plugin).
- nextflow.config — sets manifest, params, profiles (at minimum test, test_full, docker, singularity, conda, apptainer, arm), and includes the conf/ files.
- nextflow_schema.json — JSON Schema (Draft-07) for params, with nf-core extensions (see Schema universe).
- workflows/<name>.nf — the pipeline DSL2 workflow definition.
- modules/nf-core/<tool>/<subtool>/ — vendored modules pulled from nf-core/modules. Each module has main.nf, meta.yml, environment.yml, and tests/main.nf.test plus snapshots.
- modules/local/ — pipeline-specific modules.
- subworkflows/nf-core/<name>/ and subworkflows/local/<name>/ — same split for subworkflows. meta.yml for subworkflows declares components: (the modules/subworkflows it depends on).
- conf/base.config, conf/modules.config, conf/test.config, conf/test_full.config — required by the included_configs and base_config lint checks.
- assets/ — schema_input.json (sample sheet schema), MultiQC config, email templates, logos.
- bin/ — pipeline-shipped helper scripts placed on PATH for processes.
- docs/ — usage.md, output.md, images/.
- tests/ — top-level nf-test files; the lint check nf_test_content enforces presence and naming.
- .nf-core.yml — see below.
- modules.json — see below.
- nf-test.config, tower.yml, CHANGELOG.md, CITATIONS.md, CODE_OF_CONDUCT.md, LICENSE, README.md — all enforced by files_exist.
.nf-core.yml
Validated by NFCoreYamlConfig (Pydantic v2) in nf_core/utils.py. The recognised fields are:
- repository_type: "pipeline" | "modules" — discriminates pipeline repos from module repos. nf_core/components/components_command.py switches behaviour on this.
- nf_core_version — version of the tools used to render or last sync the template.
- org_path — used in modules-type repos (e.g. nf-core or an institutional fork).
- lint: — NFCoreYamlLintConfig; per-check disable map. A check listed here is skipped or downgraded; the files_exist and files_unchanged checks accept lists of paths to ignore.
- template: — NFCoreTemplateConfig; captures the cookiecutter answers (name, description, author, version, skipped features).
- bump_version: — modules-repo only; per-component opt-out of bump-versions.
- update: — pipeline-repo only; per-module pinning that suppresses nf-core modules update.
Because the model is defined with explicit attributes, unknown fields are not retained on the parsed object and are reachable only via dict-style .get access; at lint time mismatches are reported by the nfcore_yml check.
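Tying these fields together, a minimal pipeline-side .nf-core.yml might look like the following (illustrative values; field names as described above):

```yaml
repository_type: pipeline
nf_core_version: 4.0.2
lint:
  files_unchanged:
    - .github/CONTRIBUTING.md
template:
  name: example
  description: An example pipeline
  author: Jane Doe
```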
modules.json
Lives at the pipeline root. JSON object of the form:
{
"name": "<pipeline>",
"homePage": "...",
"repos": {
"https://github.com/nf-core/modules.git": {
"modules": {"nf-core": {"<tool>/<subtool>": {"branch":"master","git_sha":"...","installed_by":["modules"]}}},
"subworkflows": {"nf-core": {"<name>": {"branch":"master","git_sha":"...","installed_by":["subworkflows"]}}}
}
}
}
nf_core/modules/modules_json.py reads, validates, and rewrites it. Every install/update/remove/patch command mutates this file. The installed_by field tracks whether a module was added directly or pulled in transitively as a subworkflow dependency — removing the parent removes the child only if no other parent remains. git_sha pins the commit in nf-core/modules from which the module’s directory was copied; nf-core modules update diffs the working tree against that SHA and against the requested SHA.
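Because the manifest is plain JSON, it can also be read without importing nf_core at all. A stdlib-only sketch, assuming the layout shown above:

```python
import json
from pathlib import Path

NF_CORE_REMOTE = "https://github.com/nf-core/modules.git"

def pinned_components(pipeline_dir: str, kind: str = "modules") -> dict:
    """Map each nf-core component name to its pinned git_sha from modules.json.

    `kind` is "modules" or "subworkflows", mirroring the two top-level maps.
    """
    manifest = json.loads((Path(pipeline_dir) / "modules.json").read_text())
    entries = manifest["repos"][NF_CORE_REMOTE][kind]["nf-core"]
    return {name: entry["git_sha"] for name, entry in entries.items()}
```

This is a read-only convenience; any mutation should go through ModulesJson so installed_by bookkeeping stays consistent.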
nf-core CLI surface
The CLI is defined in nf_core/__main__.py using click + rich-click. Top-level groups: pipelines, modules, subworkflows, test-datasets, interface (TUI via trogon).
nf-core pipelines
- create — Textual TUI (or --template-yaml for headless) that renders nf_core/pipeline-template/ through Jinja using a template_features.yml answer set. Skipped features remove sections of the template.
- lint — runs the lint test battery (see Linting); supports --release, --fix, --key, --show-passed, --fail-warned, --fail-ignored, --json <file>, --markdown <file>, --sort-by.
- download — clones a pipeline at a revision, optionally fetches container images. Shells out to nextflow inspect -format json to enumerate processes and containers (see Container resolution).
- list — fetches https://nf-co.re/pipelines.json, joins it with local clone state, prints table or --json.
- launch — interactive parameter wizard against nextflow_schema.json; can post to the nf-co.re web GUI for collaborative editing then poll back.
- create-params-file — non-interactive; emits a YAML params file populated with schema defaults.
- sync — fetches the current template, re-renders against the recorded answers in .nf-core.yml, commits to a TEMPLATE branch, opens a PR back to dev.
- bump-version — rewrites manifest.version in nextflow.config, nextflow_schema.json, CITATIONS.md, etc.
- create-logo — produces nf-core-styled logos.
- rocrate — emits Research Object Crate metadata via repo2rocrate.
- schema validate <pipeline> <params> — validates a params file against nextflow_schema.json.
- schema build — interactive; opens the web schema builder, polls for the result, writes back.
- schema lint — schema-itself validation (Draft-07 + nf-core conventions).
- schema docs — generates Markdown documentation from the schema.
nf-core modules
- list remote / list local — enumerate modules in a remote modules repo or the current pipeline's vendored set.
- install <name>, update <name>, remove <name> — package-manager operations against modules.json.
- create <tool>/<subtool> — scaffolds a new module from nf_core/module-template/.
- info <name> — pretty-prints meta.yml for a remote or local module.
- lint — runs the module lint suite (see below).
- patch <name> — captures local diffs against the upstream module as a <name>.diff file that survives update.
- bump-versions — bumps tool versions in environment.yml and the container directive.
- test <name> — runs the module's nf-test.
nf-core subworkflows
Same surface as modules (create, install, update, remove, list, info, lint, patch, test) backed by a shared nf_core/components/ layer. A subworkflow’s meta.yml declares components: [<module>, <subworkflow>] which the install command resolves transitively.
nf-core test-datasets
- search — keyword search across branches of nf-core/test-datasets.
- list — list all data files for the current pipeline branch.
- list-branches — list branches (one per pipeline).
nf-core interface
Trogon-rendered TUI wrapping the click app — useful for discovery, not a separate command surface.
A caveat worth flagging: almost no commands offer machine-readable output. pipelines list and pipelines lint do (--json); the rest are human-oriented Rich tables and prompts. Programmatic consumers usually drop to the Python API.
Python API
The package exposes modules under nf_core/ whose __init__.py files are deliberately minimal — most public API is reached by importing the leaf modules:
- nf_core.pipelines.schema.PipelineSchema — .load_schema(), .validate_params(), .validate_schema(), .get_schema_defaults(), .schema_to_markdown(). The most stable internal interface; reused by lint, launch, create-params-file.
- nf_core.pipelines.lint.PipelineLint — registry of lint tests as instance methods named after the entries in lint_tests. The _get_results_md(), _get_lint_results() outputs include a "nf_core_tools_version" field, per-category test arrays (tests_pass, tests_warned, tests_failed, tests_ignored, tests_fixed), and counts.
- nf_core.pipelines.list.Workflows — wraps https://nf-co.re/pipelines.json; exposes .remote_workflows and .local_workflows lists, each containing Pipeline models.
- nf_core.pipelines.download.DownloadWorkflow — full download orchestrator, including container fetch.
- nf_core.pipelines.create.create.PipelineCreate — programmatic scaffolding.
- nf_core.modules.modules_json.ModulesJson — read/write the manifest.
- nf_core.modules.modules_repo.ModulesRepo — clone and resolve refs in a modules repository (default https://github.com/nf-core/modules.git, branch master, all overridable via env vars NF_CORE_MODULES_REMOTE, NF_CORE_MODULES_NAME, NF_CORE_MODULES_DEFAULT_BRANCH).
- nf_core.components.components_command.ComponentCommand — base class shared by modules and subworkflows operations; .get_local_components(), .has_modules_file(), .check_modules_structure().
- nf_core.utils — is_pipeline_directory, fetch_wf_config (runs nextflow config), load_tools_config (returns a Pydantic NFCoreYamlConfig), setup_requests_cachedir (a 1-day requests_cache for GitHub calls), GitHubAPISession (rate-limit-aware), anaconda_package, get_biocontainer_tag, determine_base_dir, is_file_binary.
There is no documented stable API contract. The README and online docs cover the CLI; nf_core.* modules are imported by other tools (nf-core/configs scripts, nf-validation, internal nf-core webapps) but breaking changes happen across major releases. Type hints are mostly in place since 4.x.
The schema universe
- nextflow_schema.json — JSON Schema Draft-07 for params. nf-core layers conventions on top: top-level definitions groups parameters into UI panels; per-property keywords fa_icon (FontAwesome), hidden (boolean, hides from launch UI), help_text, mimetype (for file params; checked by check_for_input_mimetype), and a custom default resolution that tolerates nulls and Nextflow-style closures. The schema is consumed by pipelines launch, create-params-file, the nf-co.re schema builder, the in-pipeline nf-validation/nf-schema Nextflow plugins, and Seqera Platform.
- assets/schema_input.json — JSON Schema for the sample-sheet CSV/TSV; consumed at runtime by the nf-schema plugin's samplesheetToList operator.
- pipeline_template.yml (in tools, nf_core/pipelines/create/template_features.yml) — describes the cookiecutter feature flags (fastqc, multiqc, nf_schema, igenomes, email, slackreport, adaptivecard, …). Each entry has skippable_paths, forbidden_paths, nfcore_yml_skip_value, etc.
- .nf-core.yml — covered above. Pydantic-validated; additionalProperties is effectively closed because Pydantic v2 with explicit fields ignores extras by default.
- modules.json — covered above. Validated by nf_core/modules/modules_json.py against an internal jsonschema.
- meta.yml (modules and subworkflows) — YAML; declares name, description, keywords, tools (each with version, license, doi, homepage, biocontainer/container hints), input and output channel specifications including types, patterns, and ontology terms (EDAM where present), and authors/maintainers. The meta_yml lint check validates against a JSON Schema bundled in the tools repo. For introspection use cases, meta.yml is the most valuable single file: it's the only declarative source of channel IO shapes per module.
- nf-test.config and per-module tests/main.nf.test, tests/main.nf.test.snap — nf-test's own format (Groovy DSL + JSON snapshots). The nf_test_content lint check parses these for required tags and the setup block that pulls test data from nf-core/test-datasets.
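The grouped-definitions layout means parameter defaults are scattered across panels. A stdlib-only sketch of collecting them, assuming the Draft-07 shape described above (the $defs fallback is a defensive assumption, not an nf-core guarantee):

```python
import json

def schema_defaults(schema: dict) -> dict:
    """Collect per-parameter defaults from a parsed nextflow_schema.json.

    Walks the top-level definition panels plus any ungrouped top-level
    properties, returning {param_name: default}.
    """
    defaults = {}
    groups = schema.get("definitions") or schema.get("$defs") or {}
    for section in list(groups.values()) + [schema]:
        for name, prop in section.get("properties", {}).items():
            if "default" in prop:
                defaults[name] = prop["default"]
    return defaults

# Typical use: schema_defaults(json.load(open("nextflow_schema.json")))
```

This mirrors what create-params-file needs; the real PipelineSchema.get_schema_defaults() additionally applies the null/closure tolerance mentioned above.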
The downstream ecosystem
How resolution works at runtime:
- Pipeline registry: https://nf-co.re/pipelines.json is fetched by nf_core.pipelines.list (cached via requests_cache). Anonymous; no auth.
- GitHub API: nf_core.utils.GitHubAPISession wraps requests_cache.CachedSession with token discovery from GITHUB_TOKEN/GITHUB_AUTH_TOKEN and rate-limit retry. Used for branch/release enumeration (get_repo_releases_branches) and SHA resolution (get_repo_commit).
- Modules repo: nf_core.modules.modules_repo.ModulesRepo performs a real git clone --no-checkout of https://github.com/nf-core/modules.git (override via NF_CORE_MODULES_REMOTE) into the user's nf-core cache directory, then git checkout of specific component subtrees.
- Test-datasets: nf_core.test_datasets.test_datasets_utils calls https://api.github.com/repos/nf-core/test-datasets/branches for branch listing and https://raw.githubusercontent.com/nf-core/test-datasets/<branch>/<path> for content. The website also publishes https://raw.githubusercontent.com/nf-core/website/refs/heads/main/public/pipeline_names.json as the canonical list of pipeline-named branches.
- Configs: not directly fetched by the tools CLI; pipelines includeConfig from https://raw.githubusercontent.com/nf-core/configs/master/... at Nextflow runtime, gated by params.custom_config_base. The configs lint check verifies the include statement is present.
Linting, in detail
The pipeline lint registry is the list lint_tests in nf_core/pipelines/lint/__init__.py:
files_exist, nextflow_config, nf_test_content, files_unchanged, actions_nf_test, actions_awstest, actions_awsfulltest, readme, pipeline_todos, pipeline_if_empty_null, plugin_includes, pipeline_name_conventions, template_strings, schema_lint, schema_params, system_exit, schema_description, actions_schema_validation, merge_markers, modules_json, multiqc_config, modules_structure, local_component_structure, base_config, modules_config, nfcore_yml, rocrate_readme_sync, container_configs. In --release mode version_consistency and included_configs are added.
Categories:
- Files exist / files unchanged: files_exist checks the canonical inventory (over 60 paths). files_unchanged diffs template-shipped files against what pipelines create would render today, flagging local edits.
- Schema: schema_lint validates the JSON Schema itself; schema_params cross-checks every param declared in nextflow.config (via fetch_wf_config) against the schema, in both directions. schema_description requires a description on every parameter.
- Modules: modules_json integrity (every directory under modules/nf-core/ has an entry, every entry has a directory, SHAs are resolvable). modules_structure checks the <tool>/<subtool>/{main.nf,meta.yml,environment.yml,tests/} layout. local_component_structure does the same for modules/local/.
- CI: actions_nf_test, actions_awstest, actions_awsfulltest, actions_schema_validation parse .github/workflows/*.yml and assert presence of expected jobs.
- Code hygiene: merge_markers, pipeline_todos (TODO grep), template_strings (no leftover {{ jinja }}), system_exit (Groovy System.exit is forbidden — use error()).
- Container/config: nextflow_config requires a long list of manifest fields and params.* defaults; container_configs validates container settings; base_config, modules_config check conf/; multiqc_config validates assets/multiqc_config.yml.
The module lint suite (nf_core/modules/lint/): main_nf (process structure, container directive form, output channels), meta_yml (schema-validate the meta), environment_yml (conda channels, package pinning, name == module name), module_changes (working-tree diff against pinned SHA), module_version (compare with upstream master), module_tests (nf-test presence and tags), module_todos, module_deprecations, module_patch (patch file integrity).
--json <path> writes a structured report keyed by:
nf_core_tools_version, date_run,
tests_pass[], tests_warned[], tests_failed[], tests_ignored[], tests_fixed[],
num_tests_pass/warned/failed/ignored/fixed,
has_tests_pass/warned/failed/ignored/fixed (booleans),
markdown_result
Every entry is [check_id, message]. Disabling a check is done in .nf-core.yml:
lint:
files_exist: false # disable entirely
files_unchanged: # or pass arguments
- .github/CONTRIBUTING.md
nextflow_config:
- manifest.name
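The JSON report keyed as above can be consumed programmatically, e.g. in CI gates. A stdlib-only sketch relying only on the documented keys (every entry a [check_id, message] pair):

```python
import json

def summarise_lint_report(path: str) -> dict:
    """Summarise a `nf-core pipelines lint --json` report file."""
    with open(path) as fh:
        report = json.load(fh)
    return {
        "tools_version": report.get("nf_core_tools_version"),
        "num_failed": len(report.get("tests_failed", [])),
        # first element of each entry is the lint check id
        "failing_checks": sorted({check for check, _msg in report.get("tests_failed", [])}),
        "warning_checks": sorted({check for check, _msg in report.get("tests_warned", [])}),
    }
```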
Container resolution
Modules declare containers via the canonical Groovy ternary:
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0' :
'biocontainers/fastqc:0.12.1--hdfd78af_0' }"
For tools with multiple dependencies the mulled-v2 convention is used: https://depot.galaxyproject.org/singularity/mulled-v2-<hash>:<verhash>-0, where the hash is reproducible from the sorted package list (galaxy-tool-util provides the hash function; nf-core/tools accepts both pre-computed hashes and resolves them via Biocontainers/Quay).
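Given a container directive string read statically from a module's main.nf, the two ternary branches can be classified heuristically. This resolver is ours, not part of nf-core/tools, and assumes the canonical single-quoted form shown above:

```python
import re

def container_uris(directive: str) -> dict:
    """Classify the singularity and docker URIs in the canonical ternary.

    Heuristic: collect all single-quoted strings, treat https:// (or .img/.sif)
    URIs as the singularity branch and registry-style paths as the docker branch.
    """
    out = {}
    for uri in re.findall(r"'([^']+)'", directive):
        if uri.startswith("https://") or uri.endswith((".img", ".sif")):
            out["singularity"] = uri
        elif "/" in uri:
            out["docker"] = uri
    return out
```

Note the stray quoted 'singularity' in the engine comparison is skipped because it matches neither pattern; overrides in conf/modules.config can still invalidate a purely static read.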
nf-core pipelines download --container-system singularity --container-cache-utilisation amend materialises images by:
- Cloning the pipeline at the requested revision.
- Running nextflow inspect -format json -profile <profile> <entrypoint> per requested profile and collecting the container field of each process. The download module retries inspect with a synthetic outdir params file when the pipeline aborts only on a missing outdir.
- For each unique container URI, pulling via singularity pull / apptainer pull / docker pull into the --singularity-cache-dir (or NXF_SINGULARITY_CACHEDIR) layout that Nextflow itself expects.
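The cache layout matters because Nextflow locates pre-pulled images by a derived filename. The sketch below encodes the scheme we have observed (scheme stripped, '/' and ':' mapped to '-', '.img' appended) — this naming is an assumption about Nextflow's behaviour, so verify it against your Nextflow version before relying on it:

```python
import re

def cache_image_name(uri: str) -> str:
    """Filename assumed for a container URI inside NXF_SINGULARITY_CACHEDIR.

    ASSUMPTION: Nextflow strips the URI scheme and replaces '/' and ':'
    with '-', appending '.img'.
    """
    name = re.sub(r"^(https?|docker|oras)://", "", uri)
    return re.sub(r"[:/]", "-", name) + ".img"
```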
Wave / Seqera Containers fits as an alternative ternary branch and as the alternative registries community.wave.seqera.io/library/... (Docker) and community-cr-prod.seqera.io/docker/registry/v2/... (Singularity), encoded in nf_core/pipelines/download/utils.py. The download flow recognises both and pulls them by URL.
Practical patterns
Listing every module a pipeline uses
The authoritative source is modules.json. Read repos["https://github.com/nf-core/modules.git"]["modules"]["nf-core"] keys and walk modules/nf-core/<key>/ directories. The Python helper is:
from nf_core.modules.modules_json import ModulesJson
mj = ModulesJson(pipeline_dir)
mj.load()
modules = mj.get_all_components("modules") # -> list[(repo_url, install_dir, name)]
For the upstream version + SHA per module, the same JSON has git_sha and branch per entry.
Enumerating test profiles
Profiles live in nextflow.config in a profiles { ... } block. Static parsing is brittle because Groovy allows arbitrary code; the canonical answer is nf_core.utils.fetch_wf_config(path) which shells out to nextflow config -flat and returns the resolved keys. Profile names are recoverable from conf/test.config and conf/test_full.config filenames (the convention enforced by files_exist).
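nextflow config -flat emits one key = value pair per line, which is what fetch_wf_config parses. A simplified stdlib-only take on that parsing step (the quote-stripping detail is ours):

```python
def parse_flat_config(text: str) -> dict:
    """Parse `nextflow config -flat` output into a {key: value} dict.

    Each line has the form `some.dotted.key = value`; surrounding quotes
    on values are stripped.
    """
    config = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        key, _, value = line.partition(" = ")
        config[key.strip()] = value.strip().strip("'\"")
    return config
```

Profile names themselves do not appear in -flat output; recover them from the profiles block or the conf/test*.config convention as noted above.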
Every container image used by every process
The only sound static answer is also the one nf-core download uses: nextflow inspect -format json -profile <profile> main.nf. This emits a JSON document with a processes array, each with a container field already resolved by Nextflow’s interpolation, eliminating the ternary. nf_core/pipelines/download/download.py::run_nextflow_inspect wraps this. For static-only use, parsing modules/*/*/main.nf for container directives and then resolving the ternary against an assumed engine yields a reasonable approximation, but module overrides in conf/modules.config (withName: { container = '...' }) can still bypass it.
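Consuming the inspect output is then a one-liner over the documented shape (a top-level processes array whose entries carry name and container fields):

```python
import json

def containers_from_inspect(inspect_json: str) -> dict:
    """Map process name -> container URI from `nextflow inspect -format json` output.

    Processes without a resolved container are skipped.
    """
    doc = json.loads(inspect_json)
    return {p["name"]: p["container"]
            for p in doc.get("processes", [])
            if p.get("container")}
```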
IO shape per module
modules/nf-core/<tool>/<subtool>/meta.yml’s input: and output: sections are the declarative source. They are structured as a list of lists: each top-level entry corresponds to a positional channel, and each nested entry is a tuple element with type:, description:, pattern:, and optionally ontologies:.
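Assuming meta.yml has already been parsed (e.g. with PyYAML) into Python structures, flattening the list-of-lists shape is straightforward. flatten_io is a hypothetical helper of ours, shown on an inlined dict standing in for a parsed file:

```python
def flatten_io(meta: dict, section: str = "input") -> list:
    """Flatten a module meta.yml input:/output: section into
    (channel_index, element_name, element_type) tuples."""
    flat = []
    for idx, channel in enumerate(meta.get(section, [])):
        # each channel is a list of single-key dicts describing tuple elements
        for element in channel:
            for name, spec in element.items():
                flat.append((idx, name, spec.get("type", "unknown")))
    return flat
```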
Limitations and gaps for static introspection
nf-core/tools is a convention enforcer and package manager, not a workflow analyser. It does not:
- Build a process or channel graph. There is no nf_core API that returns a DAG. nextflow inspect returns a flat process list, not edges. For graph-level insight the realistic paths are the Nextflow language server (nextflow-io/language-server), parsing the DSL2 AST yourself, or running -with-dag and parsing the generated DOT/HTML.
- Resolve channel topology between modules. meta.yml documents channel shapes per module but says nothing about how workflows/<name>.nf wires them.
- Type-check parameters end-to-end (only the params.* declared in nextflow_schema.json are validated; runtime usage is unchecked).
- Provide a stable Python API contract — re-imports across major versions break.
- Cover non-nf-core pipelines. A pipeline that lacks .nf-core.yml, modules.json, or the canonical layout is rejected by is_pipeline_directory and most subcommands.
- Resolve container images without running Nextflow. The static ternary is engine-conditional and can be overridden in conf/modules.config.
For workflow topology, the right tools are nextflow inspect (process list + containers + resolved configs), the Nextflow language server (AST), nf-test (reified IO at test time), and Seqera Platform / nf-tower (runtime DAG).
Open gaps
Updated when contact with real pipelines reveals an nf-core convention or tooling behaviour we hadn’t accounted for.