Component — Nextflow Workflow Testing

Synthesis of (a) external documentation/community material and (b) concrete evidence from a pinned local corpus of 7 nf-core pipelines at ~/projects/repositories/nextflow-fixtures/pipelines/. Every corpus claim is grounded in a file path + line numbers; every external claim has a URL.

Corpus pins (see fixtures.yaml): demo 1.1.0, fetchngs 1.12.0, bacass 2.5.0, hlatyping 2.2.0, taxprofiler 2.0.0, rnaseq 3.24.0, sarek 3.8.1.


1. The framework landscape

Nextflow itself ships no testing framework. Official docs cover only one test-adjacent feature: the stub: block executed via -stub-run (Nextflow process docs). Everything else is community-driven.

Two frameworks have mattered in practice: nf-test and pytest-workflow.

Corpus evidence: all 7 pipelines use nf-test. Zero pytest.ini, .pytest_workflow.yml, or conftest.py found anywhere under the pipelines (find -name 'pytest*' returns empty). Each pipeline has a root nf-test.config and a .github/workflows/nf-test.yml. pytest-workflow is operationally extinct in the active nf-core corpus.
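The marker-file check above can be reproduced with a short scan. This is a minimal sketch, not the procedure used to produce the corpus numbers: the function name detect_frameworks and the grouping of marker files into two sets are illustrative assumptions.

```python
from pathlib import Path

# Marker file names taken from the paragraph above; grouping them per
# framework is an assumption made for this sketch.
NF_TEST_MARKERS = {"nf-test.config"}
PYTEST_WORKFLOW_MARKERS = {"pytest.ini", ".pytest_workflow.yml", "conftest.py"}


def detect_frameworks(pipeline_dir):
    """Return the set of framework names whose marker files occur under pipeline_dir."""
    found = set()
    for path in Path(pipeline_dir).rglob("*"):
        if not path.is_file():
            continue
        if path.name in NF_TEST_MARKERS:
            found.add("nf-test")
        elif path.name in PYTEST_WORKFLOW_MARKERS:
            found.add("pytest-workflow")
    return found
```

Running it once per pipeline directory gives a per-pipeline answer rather than a single corpus-wide one, which makes a stray legacy config easy to spot.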


2. File layout of a tested pipeline

Every pipeline in the corpus follows the same shape:

<pipeline>/
├── nf-test.config                       ← framework config
├── conf/
│   ├── test.config                      ← minimal-dataset profile
│   ├── test_full.config                 ← AWS megatest profile
│   └── test_<variant>.config            ← per-flavor test profiles (sarek, taxprofiler, ...)
├── tests/
│   ├── default.nf.test                  ← pipeline-level test
│   ├── <scenario>.nf.test               ← additional pipeline tests
│   ├── *.nf.test.snap                   ← snapshot outputs
│   ├── .nftignore                       ← unstable files to exclude from snapshots
│   └── nextflow.config                  ← test-only overrides
├── modules/nf-core/<tool>/tests/        ← per-module tests (when vendored in-repo)
├── subworkflows/nf-core/<sw>/tests/     ← per-subworkflow tests
└── .github/workflows/nf-test.yml        ← CI entry point

Concrete nf-test.config (from nf-core__demo/nf-test.config:1-24):

config {
    testsDir "."                                            // search repo root down
    workDir System.getenv("NFT_WORKDIR") ?: ".nf-test"
    configFile "tests/nextflow.config"                      // layered on top of main
    ignore 'modules/nf-core/**/tests/*', 'subworkflows/nf-core/**/tests/*'
    profile "test"
    triggers 'nextflow.config', 'nf-test.config',
             'conf/test.config', 'tests/nextflow.config',
             'tests/.nftignore'                             // changes that force full run
    plugins { load "nft-utils@0.0.3" }
}

The ignore directive is a load-bearing convention: nf-core pipelines vendor modules from nf-core/modules verbatim, and those module tests are exercised upstream in the nf-core/modules repo — pipelines only run their own pipeline-level tests.


3. Test profiles (conf/test.config, conf/test_full.config, variants)

Two canonical profiles plus per-scenario variants for branchy pipelines.

test = minimal smoke dataset, CI-runnable. Overrides params.input and the reference paths, and caps resources. Example: nf-core__rnaseq/conf/test.config:13-36 (excerpt):

params {
    input            = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a/samplesheet/v3.10/samplesheet_test.csv'
    fasta            = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab.../reference/genome.fasta'
    gtf              = '...genes_with_empty_tid.gtf.gz'
    salmon_index     = '...salmon.tar.gz'
    skip_bbsplit     = false
    pseudo_aligner   = 'salmon'
    umitools_bc_pattern = 'NNNN'
}
process {
    withName: 'RSEM_PREPAREREFERENCE_GENOME|...' { ext.args2 = "--genomeSAindexNbases 7" }
    withName: '.*:BOWTIE2_ALIGN$'                { ext.args = '--very-sensitive-local --seed 1 --reorder' }
    withName: '.*:RIBODETECTOR'                  { ext.args = '--seed 1' }
}

Two notable patterns here:

  1. Test data URLs are pinned to a full test-datasets commit SHA (626c8fab63...). That’s a deliberate reproducibility choice — the test-datasets branch can evolve without breaking pinned pipeline tests.
  2. Process-level seeds/flags are set from the test profile. Determinism hacks (--seed 1, --reorder, --genomeSAindexNbases 7 for tiny indices) live in the test profile, not in the module — modules stay generic.
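Whether a test profile really pins its data can be checked mechanically. The sketch below assumes raw GitHub URLs of the form /<owner>/<repo>/<ref>/<path> and treats only a full 40-character hex ref as pinned; the name classify_refs is hypothetical.

```python
import re

# Matches raw.githubusercontent.com URLs of the form /<owner>/<repo>/<ref>/<path>.
RAW_URL = re.compile(
    r"https://raw\.githubusercontent\.com/([^/\s'\"]+)/([^/\s'\"]+)/([^/\s'\"]+)/([^\s'\"]+)"
)
# Assumption for this sketch: only a full 40-hex-digit ref counts as a pin.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def classify_refs(config_text):
    """Return (url, ref, pinned) for every raw GitHub URL found in config_text."""
    results = []
    for match in RAW_URL.finditer(config_text):
        ref = match.group(3)
        results.append((match.group(0), ref, FULL_SHA.fullmatch(ref) is not None))
    return results
```

A URL whose ref is a branch name (for example the modules branch) comes back with pinned set to False, which is the case a reviewer would want to flag.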

nf-core__demo/conf/test.config:13-27 shows a minimum-overhead variant with only a resourceLimits map (cpus: 2, memory: '4.GB', time: '1.h') and a single input URL.

test_full = full-size public dataset, AWS megatest only. Example nf-core__rnaseq/conf/test_full.config:13-21:

params {
    input          = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab.../samplesheet/v3.10/samplesheet_full.csv'
    genome         = 'GRCh37'
    pseudo_aligner = 'salmon'
}

Runs on AWS via .github/workflows/awsfulltest.yml rather than in the regular nf-test CI workflow. Results land in s3://nf-core-awsmegatests (nf-core/awsmegatests, Bytesize 19).

Variant profiles. Branchy pipelines such as sarek and taxprofiler carry per-flavor configs rather than encoding the matrix in test code.

Each variant corresponds 1:1 to a .nf.test under tests/. This is how nf-core handles workflow-level branching in testing: one profile + one test file per mode, rather than a single parametric test.
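The 1:1 pairing can be verified per pipeline. A minimal sketch, assuming the naming convention conf/test_<variant>.config paired with tests/<variant>.nf.test and treating test_full as the megatest profile rather than a variant; pair_variants is a hypothetical helper.

```python
from pathlib import Path


def pair_variants(pipeline_dir):
    """Map each conf/test_<variant>.config to whether tests/<variant>.nf.test exists."""
    root = Path(pipeline_dir)
    pairs = {}
    for config in sorted((root / "conf").glob("test_*.config")):
        variant = config.stem[len("test_"):]
        if variant == "full":
            continue  # test_full is the megatest profile, not a per-mode variant
        pairs[variant] = (root / "tests" / f"{variant}.nf.test").is_file()
    return pairs
```

Any variant mapped to False is a profile without a matching test file, so either the convention does not hold for that pipeline or a test is missing.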


4. Test data: where it actually comes from

Test data is stored in the nf-core/test-datasets GitHub repo — one git branch per pipeline plus a special modules branch (nf-core/test-datasets README). Rationale: “Due [to] the large number of large files in this repository for each pipeline, we highly recommend cloning only the branches you would use.” Branching (instead of subdirectories) lets contributors shallow-clone exactly what they need.

Guiding principle: “as small as possible, as large as necessary.” Contributors are told to ask on Slack before adding test data.

Access pattern in the corpus: raw GitHub content URLs pinned to a commit SHA, as in the rnaseq test profile excerpted in section 3.

Module-level test data uses a different base path: params.modules_testdata_base_path, which resolves to https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/. The modules branch is the shared pool for module/subworkflow tests. Example: nf-core__sarek/tests/variant_calling_mutect2.nf.test:1-2 hardcodes def modules_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/' at file scope.

In-repo test fixtures (CSVs, small sample sheets) show up under tests/csv/ and assets/ — e.g., nf-core__sarek/tests/csv/3.0/recalibrated_somatic.csv referenced at variant_calling_mutect2.nf.test:20. FASTQs / references are never in-repo.


5. The shape of an nf-test test

Tests are written in a Groovy-like DSL with BDD-style when { } / then { } blocks. Official reference: nf-test.com and nf-core’s writing-tests tutorial.

Four kinds of tests, each with its own outer block name: nextflow_pipeline, nextflow_workflow, nextflow_process, and nextflow_function.

5a. Pipeline-level test (canonical shape)

nf-core__demo/tests/default.nf.test:1-33:

nextflow_pipeline {
    name "Test pipeline"
    script "../main.nf"
    tag "pipeline"

    test("-profile test") {
        when {
            params { outdir = "$outputDir" }
        }
        then {
            def stable_name = getAllFilesFromDir(params.outdir, relative: true, includeDir: true,
                                                 ignore: ['pipeline_info/*.{html,json,txt}'])
            def stable_path = getAllFilesFromDir(params.outdir, ignoreFile: 'tests/.nftignore')
            assertAll(
                { assert workflow.success },
                { assert snapshot(
                    removeNextflowVersion("$outputDir/pipeline_info/nf_core_demo_software_mqc_versions.yml"),
                    stable_name,   // path tree listing
                    stable_path    // md5 of each file's content
                ).match() }
            )
        }
    }
}

The three-part snapshot is the nf-core idiom:

  1. Normalized versions.yml — Nextflow version stripped so the test passes on multiple Nextflow versions in CI.
  2. stable_name — recursive file listing as strings (catches renamed/added/removed outputs).
  3. stable_path — content-hashed files (catches output drift), with .nftignore filtering out known-unstable files.

Wrapping in assertAll() is idiomatic — one failed assertion doesn’t mask others (assertions tutorial).
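To make the second and third snapshot parts concrete, here is a rough Python analog of what they capture: a sorted relative listing and a content hash per non-ignored file. It is a sketch of the idea only, not the nft-utils implementation; the exact semantics of getAllFilesFromDir (ordering, directory handling, ignore syntax) are not reproduced, and the is_ignored callback is an assumption.

```python
import hashlib
from pathlib import Path


def stable_name(outdir):
    """Sorted relative paths of every file and directory under outdir."""
    root = Path(outdir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def stable_path(outdir, is_ignored=lambda rel: False):
    """Map relative file path -> MD5 of file content, skipping ignored paths."""
    root = Path(outdir)
    hashes = {}
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root).as_posix()
        if p.is_file() and not is_ignored(rel):
            hashes[rel] = hashlib.md5(p.read_bytes()).hexdigest()
    return hashes
```

The split matters: a file excluded from the hashed set still appears in the name listing, so its disappearance is caught even though its content is never compared.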

5b. Pipeline-level test — explicit assertions (no snapshot)

nf-core__fetchngs/tests/main.nf.test:16-60 takes a different approach — heavy use of assertAll({ assert new File(...).exists() }) for specific filenames plus targeted readLines() checks:

then {
    assert workflow.success
    assertAll(
        { assert new File("$outputDir/samplesheet/samplesheet.csv").readLines().size() == 15 },
        { assert new File("$outputDir/samplesheet/samplesheet.csv").readLines()*.split(',')[0].take(4)
              == ['"sample"', '"fastq_1"', '"fastq_2"', '"run_accession"'] },
        { assert new File("$outputDir/fastq/md5/DRX024467_DRR026872.fastq.gz.md5").exists() },
        // ... dozens more existence checks
    )
}

Seqera’s blog cites fetchngs as nf-core’s best-practice reference (Seqera blog). The style fits: FASTQ downloads produce binary content with embedded metadata that md5 can’t snapshot meaningfully, so the test checks shape + key files + specific header content instead.

5c. Matrix-of-scenarios pattern

nf-core__sarek/tests/variant_calling_mutect2.nf.test:4-45 shows how sarek does multi-scenario testing in a single file — top-of-file Groovy list of scenario maps:

def test_scenario = [
    [ name: "-profile test --tools mutect2 somatic",
      params: [ genome: null, igenomes_ignore: true,
                dbsnp: modules_testdata_base_path + 'genomics/.../dbsnp_138.hg38.vcf.gz',
                fasta: modules_testdata_base_path + '...',
                input: "${projectDir}/tests/csv/3.0/recalibrated_somatic.csv",
                step: "variant_calling", tools: 'mutect2', wes: true ] ],
    [ name: "-profile test --tools mutect2 somatic --no_intervals", params: [ ..., no_intervals: true, ... ] ],
    // ...
]

These are fanned out into separate test(...) blocks later in the file. Sarek has 59 pipeline-level tests and 0 module-level tests in the corpus — it inherits modules from nf-core/modules upstream.

5d. Module-level test

nf-core__rnaseq/modules/nf-core/rustqc/tests/main.nf.test:1-50 is representative of nextflow_process tests:

nextflow_process {
    name "Test Process RUSTQC"
    script "../main.nf"
    process "RUSTQC"

    tag "modules"
    tag "modules_nfcore"
    tag "rustqc"

    test("homo_sapiens paired-end [bam]") {
        config './nextflow.config'
        when {
            process {
                """
                input[0] = channel.of([ [ id:'test', single_end:false ],
                                        file(params.modules_testdata_base_path + "...test.paired_end.sorted.bam", checkIfExists: true),
                                        file(params.modules_testdata_base_path + "...test.paired_end.sorted.bam.bai", checkIfExists: true) ])
                input[1] = channel.of([ [ id:'homo_sapiens' ],
                                        file(params.modules_testdata_base_path + "...genome.gtf", checkIfExists: true) ])
                """
            }
        }
        then {
            assertAll(
                { assert process.success },
                { assert snapshot(
                    process.out.featurecounts,
                    process.out.preseq,
                    process.out.rseqc[0][1].findAll { it.toString().endsWith("infer_experiment.txt") || ... },
                    // non-reproducible outputs — filenames only
                    process.out.dupradar[0][1].collect { file(it).name }.sort(),
                    process.out.qualimap[0][1].findAll { !file(it).isDirectory() }.collect { file(it).name }.sort(),
                    ...
                ).match() },
            )
        }
    }
}

Key idiom: cherry-pick stable vs unstable outputs within one snapshot call. Reproducible text outputs go in as full content; binary/timestamped outputs go in as filenames-only (sorted). Tag hierarchy (modules, modules_nfcore, <tool>) drives selective CI runs.

Tags are the scoping mechanism. nf-test test --tag rustqc runs only rustqc tests; nf-test test --tag modules runs all module tests. Pipeline-level tests carry tag "pipeline" or tag "PIPELINE". Sarek uses custom tags for vendor-specific tools: tag "sentieon" in nf-core__sarek/tests/sentieon*.nf.test.
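For review tooling, the tag declarations can be indexed without running nf-test. A minimal sketch that assumes tags appear one per line as tag "<name>", as in the excerpts above; tags_in and index_by_tag are hypothetical helpers, and the regex is not a parser for the nf-test DSL.

```python
import re
from collections import defaultdict

# One tag declaration per line: tag "name" or tag 'name'.
TAG_LINE = re.compile(r"""^\s*tag\s+["']([^"']+)["']""", re.MULTILINE)


def tags_in(test_source):
    """Return the tags declared in one .nf.test file, in order of appearance."""
    return TAG_LINE.findall(test_source)


def index_by_tag(sources):
    """Build tag -> sorted list of file names from a {file name: source text} mapping."""
    index = defaultdict(list)
    for name, text in sources.items():
        for tag in set(tags_in(text)):
            index[tag].append(name)
    return {tag: sorted(names) for tag, names in index.items()}
```

The resulting index answers which files a given --tag selection would touch, which is useful when estimating CI cost for a change.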


6. Snapshots (.nf.test.snap)

Snapshots are pretty-printed JSON files keyed by test name, regenerated with nf-test test <path> --update-snapshot. Content is whatever the test passed to snapshot(...). nf-test replaces each file path with the MD5 of that file's content, so binary files compare sanely; structured outputs (channel lists, maps) are serialized directly (nf-test snapshot docs).

Example from nf-core__rnaseq/tests/default.nf.test.snap:1-80:

{
  "Params: default - stub": {
    "content": [
      27,                                     // workflow.trace.succeeded().size()
      {
        "FASTQC":   { "fastqc": "0.12.1" },
        "STAR_GENOMEGENERATE": { "star": "2.7.11b", "samtools": 1.21, "gawk": "5.1.0" },
        "TRIMGALORE": { "trimgalore": "0.6.10" },
        ...                                   // normalized versions.yml
      },
      [ "fastqc", "fastqc/raw", "fastqc/raw/RAP1_IAA_30M_REP1_raw.html", ... ]  // stable_name listing
    ]
  }
}
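Because snapshots are JSON keyed by test name, two versions of a .nf.test.snap file can be compared per test. The sketch below assumes the files on disk are plain JSON (the // comments in the excerpt above are annotations) and reads only the "content" key shown there; load_snapshot and changed_tests are hypothetical names.

```python
import json


def load_snapshot(text):
    """Parse .nf.test.snap text into {test name: content list}."""
    return {name: entry.get("content") for name, entry in json.loads(text).items()}


def changed_tests(old_text, new_text):
    """Names of tests added, removed, or whose content differs between two snapshots."""
    old, new = load_snapshot(old_text), load_snapshot(new_text)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

Comparing at the test-name level keeps a review focused on which scenarios drifted before anyone reads individual hashes.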

Observations from the corpus:

.nftignore filters unstable files out of stable_path. nf-core__demo/tests/.nftignore:1-14:

.DS_Store
multiqc/multiqc_data/fastqc_top_overrepresented_sequences_table.txt
multiqc/multiqc_data/multiqc.parquet
multiqc/multiqc_data/multiqc.log
multiqc/multiqc_data/multiqc_data.json
multiqc/multiqc_data/multiqc_sources.txt
multiqc/multiqc_data/multiqc_software_versions.txt
multiqc/multiqc_data/llms-full.txt
multiqc/multiqc_plots/{svg,pdf,png}/*.{svg,pdf,png}
multiqc/multiqc_report.html
multiqc/multiqc_data/BETA-multiqc.parquet
fastqc/**/*_fastqc.{html,zip}
pipeline_info/*.{html,json,txt,yml}

This tells you directly what’s known to drift: MultiQC reports, FastQC HTML/ZIP bundles, parquet caches, pipeline_info metadata. New pipelines start with a similar list out of the nf-core template.
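A translation or review tool can reuse the list by matching output paths against it. The sketch below is an approximation: it expands {a,b} alternatives by hand and then uses Python's fnmatch, whose * also crosses directory separators. How nft-utils itself interprets .nftignore is not stated in the sources above, so the matching rules here are an assumption, and all function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand {a,b} alternatives (non-nested) into a list of plain patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def load_patterns(nftignore_text):
    """One pattern per non-empty line, with brace alternatives expanded."""
    patterns = []
    for line in nftignore_text.splitlines():
        line = line.strip()
        if line:
            patterns.extend(expand_braces(line))
    return patterns


def is_ignored(relative_path, patterns):
    """True if relative_path matches any pattern (fnmatch: '*' also crosses '/')."""
    return any(fnmatchcase(relative_path, p) for p in patterns)
```

Paths that match are the ones to assert on by name only; everything else is a candidate for a content comparison.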

Helper plugins for format-aware hashing. nf-core points to nft-utils, nft-bam, and nft-vcf; the latter two compute checksums that ignore headers/timestamps in BAM/VCF files instead of raw MD5 (Seqera blog).


7. Stub blocks and -stub-run

Official feature: a stub: block inside a process supplies a dummy script that runs under -stub-run to produce filename-compatible outputs without real computation (Nextflow process docs).

Corpus evidence: widespread in module code. 76 stub blocks in nf-core__rnaseq/modules alone. Representative block (dupradar module, grep-extracted):

stub:
"""
touch ${meta.id}_duprateExpDens.pdf
touch ${meta.id}_duprateExpBoxplot.pdf
touch ${meta.id}_expressionHist.pdf
touch ${meta.id}_dupMatrix.txt
touch ${meta.id}_intercept_slope.txt
touch ${meta.id}_dup_intercept_mqc.txt
...

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bioconductor-dupradar: \$(Rscript -e "library(dupRadar); cat(as.character(packageVersion('dupRadar')))")
END_VERSIONS
"""

Stub use in nf-test: the rnaseq default.nf.test.snap key is "Params: default - stub" — the pipeline-level test runs under -stub-run and the snapshot shows .stub marker files (multiqc/star_salmon/multiqc_data/.stub) rather than real outputs. This is how the flagship “is the workflow wiring correct?” CI test runs fast without burning full aligner time.

Known anti-patterns (process docs, Indap blog, issue #6556):

nf-core’s current push is for all modules to have stubs so end-to-end stub runs are reliable (nf-core blog 2026).


8. CI integration

All 7 pipelines ship a .github/workflows/nf-test.yml. The canonical shape (from nf-core__demo/.github/workflows/nf-test.yml):

Additional workflows seen in larger pipelines:


9. Common shortcuts, gaps, and anti-patterns

Drawn from corpus evidence + external guidance (Seqera blog, nf-core testing recommendations, nf-test snapshot docs):

What isn’t rigorously tested:

What’s conspicuously good:

Common mistakes (writing-tests tutorial):


10. Implications for Galaxy-side tooling

For a review-nextflow skill or downstream translation work:

  1. Test data is structured and discoverable. Look in conf/test.config params for URLs; URLs are raw GitHub content pinned to a SHA. params.modules_testdata_base_path in module tests resolves to the modules branch of nf-core/test-datasets. A review agent can inventory the full test-data surface with a grep.
  2. The test profile is the contract for what a minimal run looks like — parameters, references, seeds. Translating this to a Galaxy test should preserve the same dataset choices.
  3. Variant profiles 1:1 map to testable modes. Each conf/test_<variant>.config corresponds to a tests/<variant>.nf.test. This is the explicit enumeration of branches the pipeline authors consider test-worthy — a natural input to the “workflow splitter” problem in nf→Galaxy.
  4. .nftignore is a cheat sheet for unstable outputs. When building Galaxy tests for a translated workflow, anything in .nftignore on the nf-core side is known-unstable and likely needs the same treatment (or an nft-bam/vcf-style format-aware comparator).
  5. Snapshots aren’t directly portable to Galaxy, but they tell you what the pipeline authors believed was stable enough to pin. That’s useful provenance for deciding which translated outputs deserve assertions.
  6. Stub blocks are a shortcut for workflow-wiring validation. A Galaxy analog would be dry-running the translated .ga with synthetic inputs to check connectivity — worth modeling on this pattern.
  7. Containers / seeds are part of the test contract. When translating, preserve the seed flags from conf/test.config’s process { ext.args } or the translated Galaxy workflow will produce non-matching outputs.
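Point 7 can be partly automated by pulling the per-process flags out of the test profile. This is a minimal sketch that handles only withName blocks containing a single quoted ext.args (or ext.argsN) assignment, as in the rnaseq excerpt in section 3; it is a regex over text, not a Nextflow config parser, and determinism_flags is a hypothetical name.

```python
import re

# Handles only: withName: '<selector>' { ext.args = '<flags>' }
# (one quoted assignment per block; anything more complex is skipped).
EXT_ARGS = re.compile(
    r"""withName:\s*(?P<q1>['"])(?P<selector>.+?)(?P=q1)\s*\{\s*"""
    r"""(?P<key>ext\.args\d*)\s*=\s*(?P<q2>['"])(?P<value>.*?)(?P=q2)\s*\}"""
)


def determinism_flags(config_text):
    """Return (selector, key, value) for each simple withName/ext.args setting."""
    return [
        (m.group("selector"), m.group("key"), m.group("value"))
        for m in EXT_ARGS.finditer(config_text)
    ]
```

The extracted tuples give a translator an explicit list of flags that must be carried into the Galaxy tool parameters for outputs to stay comparable.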

Corpus statistics (pinned SHAs)

Pipeline                    Pipeline tests  Module tests (in-repo)  Subworkflow tests  Snap files
nf-core/demo 1.1.0          1               3                       5                  7
nf-core/fetchngs 1.12.0     1               10                      9                  15
nf-core/bacass 2.5.0        9               21                      6                  34
nf-core/hlatyping 2.2.0     7               9                       5                  19
nf-core/taxprofiler 2.0.0   8               59                      5                  70
nf-core/rnaseq 3.24.0       20              76                      34                 124
nf-core/sarek 3.8.1         59              0                       3                  62

Sarek’s 0 module tests reflect its policy of inheriting modules from nf-core/modules upstream rather than vendoring module tests in-repo.


Sources

External (documentation, blogs, papers):

Corpus (local pins at ~/projects/repositories/nextflow-fixtures/pipelines/):