Galaxy workflow testability design

Use this note when authoring or translating a Galaxy workflow before the -tests.yml file exists. It covers workflow structure choices that make later IWC-style tests meaningful: labels, promoted checkpoints, collection identifiers, and fixture-compatible inputs.

This is not a content/patterns/ page. It is cross-cutting design guidance for Molds that need testable Galaxy workflows. Assertion syntax lives in planemo-asserts-idioms. Test YAML fixture shapes live in iwc-test-data-conventions. Calls on accepted shortcuts versus smells live in iwc-shortcuts-anti-patterns. The corpus evidence trail lives in iwc-workflow-testability-survey.

1. Treat labels as API

Workflow input and output labels are not cosmetic. Planemo and IWC tests address workflow inputs and outputs by label, and the survey found exact label matches for every asserted output across 114 matched workflow/test pairs. A generated workflow should therefore pick stable, descriptive labels before test authoring starts.

Rules:

Label every output that may need a test assertion.
Treat input/output renames as breaking changes requiring sibling -tests.yml updates.
Prefer stable domain names over tool-step defaults or positional names.
Do not rely on unlabeled or positional outputs for tests.

Evidence:

Scanpy exposes outputs such as Initial Anndata General Info, UMAP of louvain, Ranked genes with Wilcoxon test, and Dotplot of top genes on clusters ($IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml:105-147). The sibling test keys assertions by those exact labels ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:27-205).
VGP scaffolding uses punctuation-heavy labels such as Hi-C duplication stats on scaffolds: Raw, Hi-C duplication stats on scaffolds: MultiQc, and Merged Alignment stats ($IWC_FORMAT2/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8.gxwf.yml:170-196). The test asserts those exact labels ($IWC/workflows/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8-tests.yml:218-245).
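
The label rules above can be checked mechanically before a test is run. The following is a minimal sketch, assuming the workflow output labels and the test output keys have already been read into plain Python lists; the function name is hypothetical, and the sample labels are borrowed from the evidence above purely as strings.

```python
def unmatched_test_keys(workflow_output_labels, test_output_keys):
    """Return test output keys that match no workflow output label exactly."""
    labels = set(workflow_output_labels)
    return sorted(key for key in test_output_keys if key not in labels)


# Hand-written stand-ins for labels read from a workflow and its sibling test file.
workflow_output_labels = [
    "Initial Anndata General Info",
    "UMAP of louvain",
    "Hi-C duplication stats on scaffolds: Raw",
]
test_output_keys = [
    "UMAP of louvain",
    "Hi-C duplication stats on scaffolds: Raw",
    "Umap of louvain",  # case drift: this key would address no output
]

print(unmatched_test_keys(workflow_output_labels, test_output_keys))
```

The comparison is deliberately exact, with no case folding or punctuation stripping, because the survey found exact label matches rather than normalized ones.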

2. Promote assertable checkpoints

IWC workflow tests assert workflow-level outputs. Intermediate step results are invisible unless promoted to top-level workflow outputs. When final reports are weakly assertable, expose intermediate checkpoints that carry deterministic content or structure.

Rules:

Promote intermediate outputs when they are the best deterministic or structural checkpoint.
Prefer a checkpoint table/text/HDF5 object that can prove content over a final plot/report that can only prove existence.
Accept some output-list clutter when it buys meaningful tests.
Do not promote every intermediate by default; expose checkpoints that map to concrete assertion intent.

Evidence:

Scanpy exposes 21 workflow outputs and the sibling test asserts all 21. These include initial AnnData summaries, intermediate plots, ranked-gene tables, final AnnData, cluster-count tables, and final plots ($IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml:105-147; $IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:27-205).
RNA-seq paired-end exposes mapped reads, stranded/unstranded coverage, abundance estimates, expression tables, counts tables, and MultiQC reports ($IWC_FORMAT2/transcriptomics/rnaseq-pe/rnaseq-pe.gxwf.yml:90-112). The sibling test asserts sizes for coverage/read outputs and stronger regex/line checks for expression/count outputs ($IWC/workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml:48-97).
MGnify complete exposes 83 workflow outputs; 38 are asserted in the sibling test, including MultiQC reports, FASTA collections, taxonomic classifications, OTU tables, and HDF5/JSON outputs ($IWC_FORMAT2/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-complete/mgnify-amplicon-pipeline-v5-complete.gxwf.yml:163-329; $IWC/workflows/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-complete/mgnify-amplicon-pipeline-v5-complete-tests.yml:15-120).
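
One way to keep promotion tied to assertion intent is to list which exposed outputs have no assertion yet. This is a rough sketch with hypothetical labels and a hypothetical helper name; it assumes both label lists are already available as Python lists.

```python
def split_by_assertion(exposed_labels, asserted_labels):
    """Split exposed workflow outputs into asserted and unasserted labels."""
    asserted = set(asserted_labels)
    covered = [label for label in exposed_labels if label in asserted]
    uncovered = [label for label in exposed_labels if label not in asserted]
    return covered, uncovered


# Hypothetical labels; a real run would read them from the workflow and test files.
exposed = ["Counts table", "Mapped reads", "Debug log", "MultiQC report"]
asserted = ["Counts table", "Mapped reads", "MultiQC report"]

covered, uncovered = split_by_assertion(exposed, asserted)
print(f"{len(covered)} of {len(exposed)} outputs asserted; review: {uncovered}")
```

An unasserted output is a prompt for review, not an error: the MGnify example asserts only part of what it exposes.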

3. Stabilize collection output identifiers

Collection tests key assertions by element identifier. If a workflow emits collections with unstable or opaque identifiers, the test cannot target elements cleanly.

Rules:

Preserve biologically or sample-meaningful identifiers through map-over and collection reshaping.
When generating or relabeling collections, make the identifier derivation deterministic and visible in workflow structure.
For nested collections, ensure each axis has predictable identifiers.
Quote special identifiers in tests when YAML requires it, but do not simplify identifiers merely for YAML convenience.

Evidence:

In the corpus survey, 59 of 115 IWC test files use element_tests:, for a total of 227 element_tests: blocks.
The 10x CellPlex test addresses nested collection outputs by subsample first, then by the inner matrix, barcodes, and genes elements ($IWC/workflows/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex-tests.yml:82-128). The workflow exposes the corresponding collection outputs ($IWC_FORMAT2/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex.gxwf.yml:73-91).
HyPhy collection outputs are tested by generated gene identifiers such as NC_001477.1|capsid_protein_C|95-394_DENV1 ($IWC/workflows/comparative_genomics/hyphy/hyphy-core-tests.yml:31-71). The workflow exposes collection outputs for MEME, PRIME, BUSTED, and FEL ($IWC_FORMAT2/comparative_genomics/hyphy/hyphy-core.gxwf.yml:26-38).
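
The identifier rules can be illustrated by flattening an element_tests block into identifier paths and comparing them with what the workflow produces. This sketch assumes the parsed block is a nested dict in which nested collections sit under an elements key. The helper names and the subsample identifier are hypothetical, and the assertion bodies are left empty.

```python
def element_paths(element_tests, prefix=()):
    """Yield identifier paths, descending through nested 'elements' mappings."""
    for identifier, spec in element_tests.items():
        path = prefix + (identifier,)
        nested = spec.get("elements")
        if nested:
            yield from element_paths(nested, path)
        else:
            yield path


def untargetable(element_tests, produced_paths):
    """Return tested identifier paths that the workflow output does not contain."""
    produced = set(produced_paths)
    return [path for path in element_paths(element_tests) if path not in produced]


# Assumed parsed form of a nested element_tests block (assertions omitted).
nested_tests = {
    "subsample": {
        "elements": {
            "matrix": {"asserts": {}},
            "barcodes": {"asserts": {}},
            "genes": {"asserts": {}},
        }
    }
}
produced = [("subsample", "matrix"), ("subsample", "barcodes"), ("subsample", "genes")]
print(untargetable(nested_tests, produced))

# A flat collection keyed by an accession-like identifier with special characters.
flat_tests = {"NC_001477.1|capsid_protein_C|95-394_DENV1": {"asserts": {}}}
print(untargetable(flat_tests, [("capsid_protein_C",)]))
```

The second call shows the failure mode the rules warn about: once a workflow simplifies an identifier, a test written against the original identifier has nothing to target.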

4. Choose checkpoints by assertion strength

Assertion choice is not only a test-file decision. It should feed back into workflow output design. If the only exposed output is a stochastic plot or binary file, the best possible test may be a weak size check. Exposing a sibling table, report, HDF5 structure, or summary line can make the same workflow much more testable.

Rules:

For image-heavy workflows, expose data or summary outputs behind the plot when possible.
For stochastic statistical outputs, expose structural checkpoints and stable summary tokens.
For binary outputs, expose a text/table report or stats file when the tool can produce one.
Use planemo-asserts-idioms to select the assertion family after choosing the checkpoint.

Evidence:

Scanpy image outputs are mostly smoke-tested with has_size, has_image_width, and has_image_height, but the same test also checks AnnData HDF5 keys and cluster-count tables that the workflow exposes ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:33-205).
RNA-seq paired-end pairs coarse size checks for coverage/mapped-read outputs with stronger regex and exact-line checks for expression/count tables ($IWC/workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml:48-97).
VGP scaffolding tests combine stable text checks for scaffold/report stats with size checks for map/alignment artifacts ($IWC/workflows/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8-tests.yml:173-245).
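
To find outputs that may deserve a stronger sibling checkpoint, a test file can be scanned for outputs whose assertions only prove existence or rough size. The sketch below assumes each output's asserts block has been parsed into a dict keyed by assertion name. Grouping has_size, has_image_width, and has_image_height as existence-only is an illustrative choice here, not an official taxonomy, and the labels are hypothetical.

```python
# Assumption: this grouping is illustrative, not an official classification.
EXISTENCE_ASSERTIONS = {"has_size", "has_image_width", "has_image_height"}


def weakly_asserted(test_outputs):
    """Return labels whose assertions only prove existence or rough size."""
    weak = []
    for label, spec in test_outputs.items():
        names = set(spec.get("asserts", {}))
        if names and names <= EXISTENCE_ASSERTIONS:
            weak.append(label)
    return weak


# Hypothetical parsed test outputs; assertion parameters are omitted.
test_outputs = {
    "Cluster plot": {"asserts": {"has_size": {}, "has_image_width": {}}},
    "Cluster counts table": {"asserts": {"has_text": {}, "has_n_lines": {}}},
}

print(weakly_asserted(test_outputs))
```

A label reported here is a candidate for exposing the table or summary behind it, not a defect in itself.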

5. Design inputs with fixtures in mind

Workflow input labels and types constrain the eventual job: block. Fixture planning is not only a test-file activity: it should influence whether the workflow exposes a file input, a collection input, a string data-table input, or a typed parameter.

Rules:

Choose input labels that will be readable as test job: keys.
Match workflow input collection types to realistic fixture shapes.
Decide early whether reference data should be a portable remote file or a CVMFS/data-table string.
Keep typed parameters explicit when tests need to set them (int, boolean, string) rather than burying them in step defaults.

Evidence:

10x CellPlex job inputs include fastq PE collection GEX, reference genome, gtf, cellranger_barcodes_3M-february-2018.txt, fastq PE collection CMO, sample name and CMO sequence collection, and Number of expected cells ($IWC/workflows/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex-tests.yml:2-75). The workflow declares matching collection, data, string, boolean, and int inputs ($IWC_FORMAT2/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex.gxwf.yml:4-72).
HyPhy accepts a list collection of unaligned sequences and preserves accession-like fixture identifiers through to output element assertions ($IWC/workflows/comparative_genomics/hyphy/hyphy-core-tests.yml:7-30; $IWC_FORMAT2/comparative_genomics/hyphy/hyphy-core.gxwf.yml:14-25).
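
The coupling between input declarations and the job: block can be sketched as a check that every job key names a workflow input and that the fixture shape fits the declared type. This sketch assumes file fixtures are dicts with class File, collection fixtures are dicts with class Collection, and scalars are plain values. The helper names, the URL, and the types attached to each label are assumptions made for illustration only.

```python
def fixture_kind(value):
    """Map a job value onto a coarse workflow input type."""
    if isinstance(value, dict):
        return {"File": "data", "Collection": "collection"}.get(value.get("class"), "unknown")
    if isinstance(value, bool):  # checked before int because bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    return "unknown"


def job_mismatches(workflow_inputs, job):
    """Return (label, problem) pairs for job entries that do not fit the workflow."""
    problems = []
    for label, value in job.items():
        declared = workflow_inputs.get(label)
        kind = fixture_kind(value)
        if declared is None:
            problems.append((label, "no matching input label"))
        elif kind != declared:
            problems.append((label, f"declared {declared}, fixture looks like {kind}"))
    return problems


# Input types here are assumed for the example, not read from the real workflow.
workflow_inputs = {
    "fastq PE collection GEX": "collection",
    "gtf": "data",
    "Number of expected cells": "int",
}
job = {
    "fastq PE collection GEX": {"class": "Collection", "collection_type": "list:paired", "elements": []},
    "Number of expected cells": "5000",  # a quoted number arrives as a string
    "GTF": {"class": "File", "location": "https://example.org/genes.gtf"},  # label case drift
}

problems = job_mismatches(workflow_inputs, job)
for label, problem in problems:
    print(f"{label}: {problem}")
```

Both reported problems are cheap to avoid at design time: an explicit typed parameter makes the expected scalar type visible, and a readable input label is harder to mistype as a job key.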

6. Know what a gxformat2 output entry contains

The top-level gxformat2 outputs: block is the public workflow-output surface. It is separate from per-step out: declarations and from step post-job actions such as change_datatype or rename.

Authoring rules:

Use label as the stable public name tests and users will address.
Use outputSource to point at the producing step output; do not rely on positional output order.
Use doc for short user-facing context when the label is not self-explanatory.
Keep type aligned with the exposed value (data, collection, or scalar vocabulary from gxformat2-workflow-inputs) when the schema needs it.
Apply change_datatype at the producing step output when Galaxy needs a stronger datatype than the tool reports; choose values from galaxy-datatypes-conf.
Use rename only for generated dataset names inside Galaxy histories. It is not a substitute for a stable workflow-output label.
Treat add_tags and remove_tags as metadata helpers, not as the test API. IWC tests key by labels and collection element identifiers, not tags.
Avoid hide or delete_intermediate_datasets on outputs that are promoted as test checkpoints.

Design inference: a workflow-output promotion decision should pick both the public outputs: entry and any producer-side post-job action needed to make that output useful. For example, a synthesized BED checkpoint needs a stable output label plus a producer-side change_datatype: bed; one without the other is incomplete for a testable workflow.
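
The pairing of a public outputs: entry with its producer-side actions can be sketched as a consistency check. The dict shape below is an assumed parsed form: outputs is a list of entries with label and outputSource, and each step carries an out list whose entries have an id plus optional post-job actions. The sketch further assumes every promoted output is declared in its step's out list. Step names, labels, and the helper name are hypothetical.

```python
def checkpoint_problems(workflow):
    """Return (label, problem) pairs for promoted outputs that are not usable checkpoints."""
    problems = []
    steps = workflow.get("steps", {})
    for entry in workflow.get("outputs", []):
        label = entry.get("label", "<unlabeled>")
        # Split on the last slash so the output id is always the final component.
        step_name, _, output_id = entry.get("outputSource", "").rpartition("/")
        declared = {out["id"]: out for out in steps.get(step_name, {}).get("out", [])}
        producer = declared.get(output_id)
        if producer is None:
            problems.append((label, "outputSource does not resolve to a step output"))
            continue
        for action in ("hide", "delete_intermediate_datasets"):
            if producer.get(action):
                problems.append((label, f"{action} set on a promoted output"))
    return problems


# Hypothetical workflow fragment in an assumed parsed shape.
workflow = {
    "outputs": [
        {"label": "Merged intervals", "outputSource": "merge/output"},
        {"label": "Raw intervals", "outputSource": "convert/output"},
        {"label": "Summary", "outputSource": "summarize/report"},
    ],
    "steps": {
        "merge": {"out": [{"id": "output", "change_datatype": "bed"}]},
        "convert": {"out": [{"id": "output", "hide": True}]},
        "summarize": {"out": [{"id": "stats"}]},
    },
}

for label, problem in checkpoint_problems(workflow):
    print(f"{label}: {problem}")
```

In this fragment only the first output is a complete checkpoint: it has a label, a resolvable source, and a producer-side change_datatype, matching the BED example above.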

Cross-references

iwc-workflow-testability-survey — corpus survey and distribution rationale.
iwc-test-data-conventions — job/input YAML shapes, remote fixtures, hashes, collection fixture syntax.
planemo-asserts-idioms — assertion-family choice after an output is exposed.
iwc-shortcuts-anti-patterns — accepted shortcut vs smell calls for weak assertions and label coupling.
planemo-workflow-test-architecture — Planemo execution, output-problem ambiguity, and structured artifacts.
gxformat2-schema — structural vocabulary for top-level workflow outputs and step post-job actions.
galaxy-datatypes-conf — valid Galaxy datatype extensions for format and change_datatype choices.
