Component — Galaxy Workflow Testing (IWC + planemo)
Synthesis of (a) the planemo-centric external documentation + Galaxy ecosystem specs and (b) concrete evidence from the IWC corpus at /Users/jxc755/projects/repositories/iwc/. Every corpus claim is grounded in a file path + line numbers; every external claim has a URL.
Scope & positioning. This document covers the public-facing, contribution-oriented workflow testing layer: the -tests.yml format run by planemo, bundled into IWC, and enforced by IWC CI. It is complementary to the existing vault note Component - Workflow Testing.md (887 lines), which covers Galaxy core’s internal frameworks (.gxwf.yml / .gxwf-tests.yml + the procedural test_workflows.py suite). Both layers share the assertion vocabulary — they differ in packaging, discovery, and execution environment.
1. The two layers
- Galaxy core (covered in the companion vault note): .gxwf.yml / .gxwf-tests.yml fixtures under galaxy/test/functional/ + Python driver test_workflows.py. Exercises Galaxy internals, runs inside the Galaxy pytest suite.
- IWC / planemo (this document): directory-per-workflow, -tests.yml sibling of each .ga, run by planemo test, wrapped in a GitHub Actions shard matrix, deployed post-merge to Dockstore + WorkflowHub + iwc-workflows GitHub org. This is the path contributors follow to land a workflow in usegalaxy.*
The assertion vocabulary is shared (same galaxy.tool_util.verify.asserts code path). The harness, discovery mechanism, and CI environment differ.
2. The -tests.yml format (planemo spec)
Authoritative reference: planemo.readthedocs.io/en/latest/test_format.html.
A tests file is a YAML list of test cases. Each case has three top-level keys:
- doc: is a description string (required in practice).
- job: is the input mapping, keyed by the workflow’s input label (not step index).
- outputs: is a mapping keyed by the workflow’s output label, containing assertions or file comparisons.
Inputs referenced by workflow label, not index. The natural-language workflow input label is the key verbatim — spaces, colons, question marks and all. Example: /Users/jxc755/projects/repositories/iwc/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:23 has Manually annotate celltypes?: true as a job key. This makes labeled inputs a load-bearing planemo practice — unlabeled inputs fall back to “Input dataset” defaults which are fragile (planemo best practices).
2a. Canonical minimal example
/Users/jxc755/projects/repositories/iwc/workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/consensus-from-variation-tests.yml (32 lines) is the simplest complete test in the sample:

    - doc: Test consensus building from called variants
      job:
        Reference genome:
          class: File
          location: 'https://zenodo.org/record/4555735/files/NC_045512.2_reference.fasta?download=1'
          hashes:
            - hash_function: SHA-1
              hash_value: db3759c2e1d9ce8827ba4aa1749e759313591240
        aligned reads data for depth calculation:
          class: Collection
          collection_type: 'list'
          elements:
            - identifier: SRR11578257
              class: File
              path: test-data/aligned_reads_for_coverage.bam
        ...
      outputs:
        multisample_consensus_fasta:
          file: test-data/masked_consensus.fa

Three patterns in 32 lines: remote data + SHA-1 integrity check, a list collection from local fixtures, and an exact-file output assertion.
2b. Input shapes
All documented at the planemo test_format page; all observed in the corpus:
- Scalar param: bare value keyed by label. short-read-quality-control-and-trimming-tests.yml:19-23 sets Qualified quality score: '15' and Adapter to remove on forward reads: null.
- Single file, remote: class: File + location: + filetype: + hashes: (optional but IWC-idiomatic for integrity).
- Single file, local: class: File + path: test-data/....
- List collection: class: Collection, collection_type: list, elements: [{identifier, class: File, path/location, filetype}]. Example: hyphy-core-tests.yml:8-30.
- Paired / list:paired / list:list:paired: nested class: Collection with type: paired inside elements:. Full pattern at pox-virus-half-genome-tests.yml:17-38:

        class: Collection
        collection_type: list:paired
        elements:
          - class: Collection
            type: paired
            identifier: 20L70
            elements:
              - identifier: forward
                class: File
                location: ...
              - identifier: reverse
                class: File
                location: ...

- CWL-style shorthand: list of File dicts without identifiers (documented; not observed in sampled IWC).
- composite_data: for multi-file datatypes (e.g. imzml + ibd).
- tags: on elements (Galaxy 20.09+).
- External job file: job: some_job.yml (documented; rare in IWC).
- Data-table / built-in-index refs: the workflow input receives a plain string (e.g. "hg38") matching a .loc entry. IWC CI mounts CVMFS via test_workflows.yml:83 (setup-cvmfs: true) to make these resolvable. No special JSON/YAML form; it’s just a string parameter.
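To make the flatter shapes above concrete, here is a minimal sketch of a job: block that mixes scalar parameters, a local file, and a built-in-index string. The two scalar labels are the ones cited from the corpus; the labels Reference genome build and Input reads, the fixture path, and the filetype value are hypothetical and only illustrate the shape.

```yaml
- doc: Hypothetical case mixing scalar, file, and built-in-index inputs
  job:
    # Scalar parameters: bare values keyed by the workflow input label.
    Qualified quality score: '15'
    Adapter to remove on forward reads: null
    # Built-in index reference: a plain string that must match a .loc entry
    # (hypothetical label; only resolvable where CVMFS is mounted).
    Reference genome build: hg38
    # Single local file (hypothetical label, path, and filetype).
    Input reads:
      class: File
      path: test-data/reads.fastq.gz
      filetype: fastqsanger.gz
```

Note that the data-table input is indistinguishable from any other string parameter in the test file, which is why a missing .loc entry only surfaces at run time.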
2c. Output assertion shapes
Three patterns (documented at test_format.html; all observed):
- Exact file: file: test-data/expected.ext, compared byte-for-byte or via a compare: option.
- Checksum: checksum: "sha1$..." (documented; not observed in sampled IWC).
- Structured asserts: content assertions (preferred by IWC for outputs > 1 MB, per workflows/README.md).
File-compare options:
- compare: diff (default) + lines_diff: tolerance.
- compare: sim_size + delta: (bytes) or delta_frac: for a size-only comparison with slack. Used heavily in repeatmasking for non-deterministic RepeatModeler outputs: RepeatMasking-Workflow-tests.yml:12-20 pairs each output with compare: sim_size, delta: 30000, up to delta: 90000000 on a large Stockholm alignment.
- compare: re_match / re_match_multiline.
- compare: contains.
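A short sketch of how these file-compare options might look side by side in an outputs: block. The output labels and fixture paths are hypothetical; only the compare: keys and the delta: 30000 value mirror what is described above.

```yaml
outputs:
  # Line-based diff with a small tolerance for changed lines.
  summary_table:                      # hypothetical output label
    file: test-data/expected_summary.tsv
    compare: diff
    lines_diff: 2
  # Size-only comparison with slack, for non-deterministic outputs.
  repeat_library:                     # hypothetical output label
    file: test-data/expected_library.fa
    compare: sim_size
    delta: 30000
  # Pass if the expected fixture's content appears in the output.
  run_log:                            # hypothetical output label
    file: test-data/expected_log_fragment.txt
    compare: contains
```

The sim_size form trades semantic checking for stability, so it is usually worth pairing with at least one content assertion where the output allows it.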
asserts: vocabulary (shared with tool XML <assert_contents> via galaxy.tool_util.verify.asserts; authoritative list in the Galaxy XSD at galaxy/lib/galaxy/tool_util/xsd/galaxy.xsd):
| Category | Assertions |
|---|---|
| Text | has_text, not_has_text, has_text_matching, has_line, has_line_matching, has_n_lines (with delta:) |
| Tabular | has_n_columns |
| Size | has_size (value, min, max, delta) |
| Archives | has_archive_member (regex path; nests assertions on member content) |
| HDF5 | has_h5_keys, has_h5_attribute |
| XML | is_valid_xml, has_element_with_path, has_n_elements_with_path, element_text_matches, element_text_is, attribute_matches, attribute_is, xml_element |
| JSON | has_json_property_with_value, has_json_property_with_text |
| Images | has_image_width, has_image_height, has_image_channels, has_image_center_of_mass, plus related |
Verify exact assertion names against the XSD before relying on them — the corpus-surfaced names (has_image_width/has_image_height/has_size) are confirmed; the broader image-assertion list is indicative, not audit-verified.
Diverse asserts: examples from the corpus:
- Content substring on HTML: short-read-quality-control-and-trimming-tests.yml:25-28 uses has_text: text: "Filtered Reads" against a MultiQC report.
- Exact-line check: Preprocessing-...-Scanpy-tests.yml:97-100 uses has_line with a tab-delimited literal.
- AnnData structure: same file, lines 27-32, uses has_h5_keys: keys: "obs/louvain".
- Image plots: same file, lines 33-42, uses has_size: size: 696399, delta: 60000 + has_image_width / has_image_height with delta: across ~15 PNGs.
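The fragment below sketches how assertions like these could be written in an asserts: block. The output labels and the image dimensions are hypothetical; the has_text, has_h5_keys, and has_size values are the ones quoted from the corpus above. The parameter names for the image assertions (width, height, delta) are an assumption to confirm against the Galaxy XSD.

```yaml
outputs:
  multiqc_report:                 # hypothetical output label
    asserts:
      has_text:
        text: "Filtered Reads"
  clustered_anndata:              # hypothetical output label
    asserts:
      has_h5_keys:
        keys: "obs/louvain"
  umap_plot:                      # hypothetical output label
    asserts:
      has_size:
        size: 696399
        delta: 60000
      has_image_width:
        width: 600                # hypothetical value
        delta: 50
      has_image_height:
        height: 400               # hypothetical value
        delta: 50
```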
2d. Collection output assertions
element_tests: keyed by element identifier; each value is the same assertion dict used for a single file; nested collections nest element_tests: recursively.
- List of JSONs: hyphy-core-tests.yml:31-71 has four output collections (meme_output, prime_output, busted_output, fel_output), each keyed by gene identifier, each element asserting has_text: text: "{".
- list:paired: short-read-quality-control-and-trimming-tests.yml:31-43 uses element_tests: pair: asserts: has_text: ....
Deeply nested (list:list:paired) collection assertions are legal but sparsely exemplified; planemo’s _writing_collections.rst is the de-facto reference.
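As a sketch of the recursion described above, the fragment below shows a flat list and a list:paired collection. The element identifiers gene1 and sample1, the output label trimmed_pairs, and the asserted text are hypothetical; meme_output and the "{" probe come from the cited hyphy test.

```yaml
outputs:
  # Flat list: one assertion dict per element identifier.
  meme_output:
    element_tests:
      gene1:                      # hypothetical element identifier
        asserts:
          has_text:
            text: "{"
  # list:paired: the outer element nests its own element_tests.
  trimmed_pairs:                  # hypothetical output label
    element_tests:
      sample1:                    # hypothetical outer identifier
        element_tests:
          forward:
            asserts:
              has_text:
                text: "@"         # hypothetical probe for FASTQ content
          reverse:
            asserts:
              has_text:
                text: "@"
```

Each additional nesting level (for example list:list:paired) would add one more element_tests: layer under the corresponding identifier.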
2e. Not observed in IWC
- expect_failure: keys. IWC is happy-path only across the sampled files, with zero negative tests.
- Stray md5: assertions. The corpus-wide preference is sim_size + delta or content probes (has_text, has_n_lines).
3. IWC repository contract
Per /Users/jxc755/projects/repositories/iwc/workflows/README.md:12-18 and sampled directories, every IWC workflow directory holds:
<category>/<workflow-name>/
├── <workflow-name>.ga ← Galaxy native workflow (mandatory)
├── <workflow-name>-tests.yml ← planemo tests file (mandatory, basename matches)
├── README.md ← narrative + Input/Output Datasets sections
├── CHANGELOG.md ← keepachangelog format, ISO dates
├── .dockstore.yml ← Dockstore 1.2 descriptor
└── test-data/ ← optional: small fixtures + expected outputs
- requirements.txt:3 pins planemo>=0.74.5.
- .ga front matter requires a_galaxy_workflow: "true", creator: (Person with ORCID URI or Organization with URL; see consensus-from-variation.ga:4-10), license:, name:, release: (e.g. "0.4.3"), annotation:. Optional: top-level report: markdown: (custom invocation report, short-read-quality-control-and-trimming.ga:21-23), embedded readme:.
- CHANGELOG.md + release are kept in lockstep by the repo-root bump_version.py helper. The "### Automatic update" entries come from the weekly planemo-autoupdate bot (workflows/README.md:231-247).
- ro-crate-metadata.json is not stored in this repo; it’s generated on merge by workflows/gen_crates.py and lands in the downstream github.com/iwc-workflows/<name> repo.
3a. Multi-workflow families in one directory
- hyphy/ holds four workflows. /Users/jxc755/projects/repositories/iwc/workflows/comparative_genomics/hyphy/.dockstore.yml:2-54 enumerates all four with distinct name: / primaryDescriptorPath: / testParameterFiles:. One <wfname>.ga + one <wfname>-tests.yml per workflow at directory root; shared README.md, shared CHANGELOG.md, shared test-data/. Per workflows/README.md:217-221, co-resident workflows must bump version in lockstep.
- repeatmasking/ has one published workflow (RepeatMasking-Workflow.ga) and an orphaned second tests file (Repeat-masking-with-RepeatModeler-and-RepeatMasker-tests.yml) without a matching registered .ga, an example of legacy/alternate test harness residue.
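A multi-workflow descriptor of this kind might look roughly like the sketch below. The workflow names and file names are hypothetical placeholders, not the actual hyphy entries; the subclass: Galaxy value and the leading-slash path style are assumptions about the Dockstore 1.2 descriptor that should be checked against a real IWC .dockstore.yml.

```yaml
# Hypothetical .dockstore.yml for a directory holding two workflows.
version: 1.2
workflows:
  - name: workflow-a                    # hypothetical name
    subclass: Galaxy                    # assumed descriptor subclass
    primaryDescriptorPath: /workflow-a.ga
    testParameterFiles:
      - /workflow-a-tests.yml
  - name: workflow-b                    # hypothetical name
    subclass: Galaxy
    primaryDescriptorPath: /workflow-b.ga
    testParameterFiles:
      - /workflow-b-tests.yml
```

Each entry pairs exactly one .ga with its -tests.yml sibling, which is what lets the orphaned tests file in repeatmasking/ go unnoticed: nothing in the descriptor points at it.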
4. Test data organization
Two storage patterns, often combined in one test case:
- Remote via location: for large inputs. Overwhelmingly Zenodo (persistent DOI):
  - short-read-quality-control-and-trimming-tests.yml:13,17: https://zenodo.org/records/11484215/files/paired_r1.fastq.gz
  - Preprocessing-...-Scanpy-tests.yml:8-13: https://zenodo.org/record/3581213/files/...
  - Mass_spectrometry__LC-MS_...-tests.yml: 11+ mzML files from zenodo.org/record/10130758/files/
  - consensus-from-variation-tests.yml:6: https://zenodo.org/record/4555735/files/...

  Also EBI/ENA and SRA FTP for virology-style raw reads: pox-virus-half-genome-tests.yml:5 (EBI reference), :27, :34, :48, :55 (SRA FTP fastq). Every remote location: is paired with a SHA-1 hashes: block for integrity.
- In-repo test-data/ via path: for small fixtures and expected outputs.
  - Structured subdirs: hyphy uses test-data/unaligned_seqs/, test-data/codon_alignments/, test-data/iqtree_trees/ (see hyphy-compare-tests.yml:12,19,25).
  - Element identifiers can contain pipes (AB178040.1|2002) and survive the YAML round-trip.
IWC convention (workflows/README.md): large inputs go to Zenodo; only toy data in-repo. Reviewers push back on large files committed in test-data/.
5. CI integration
Important path note: the file /Users/jxc755/projects/repositories/iwc/.github/workflows/gh-build-and-test.yml triggers only on website/** changes (it’s the static-site Playwright E2E job). The real workflow-test CI lives at /Users/jxc755/projects/repositories/iwc/.github/workflows/workflow_test.yml.
Structure of workflow_test.yml:
- Triggers (L4-14): push and pull_request, ignoring **/*.md, scripts/**, website/**.
- setup job (L18-25) calls reusable setup.yml; pins galaxy-branch: release_25.1, galaxy-fork: galaxyproject, max-chunks: 4, python-version: 3.11. Uses galaxyproject/planemo-ci-action@v1 (setup.yml:86-94) to run planemo’s ci_find_repos logic against changed files and emit repository-list + chunk-list outputs.
- lint job (L28-58) runs planemo-ci-action in mode: lint, workflows: true, additional-planemo-options: --iwc (the --iwc flag toggles IWC-specific lint rules).
- test job (L60-73) calls reusable test_workflows.yml, which runs a matrix of chunk × python-version against a PostgreSQL service (test_workflows.yml:53-61). Each chunk runs planemo-ci-action in mode: test with setup-cvmfs: true (L83). fail-fast: false, so chunks fail independently.
- combine_outputs (L75-117) downloads per-chunk artifacts, runs mode: combine (HTML + Markdown report + $GITHUB_STEP_SUMMARY), then mode: check to fail on any test failure.
- deploy (L120-160, main branch + galaxyproject org only) regenerates RO-Crates via workflows/gen_crates.py, then planemo-ci-action with mode: deploy, workflow-namespace: iwc-workflows pushes each workflow to github.com/iwc-workflows/<name>.
- deploy-report (L162-182) posts a PR comment on deploy failure.
- determine-success (L184-192): non-main PRs pass only when lint + combine_outputs succeed.
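The shape of the trigger block and the test matrix can be sketched as below. This is a simplified, inlined approximation rather than the contents of workflow_test.yml (the real file delegates to reusable workflows); the runs-on value and the literal chunk list are assumptions, and the wiring of repository and chunk outputs from the setup job is omitted. Only the action inputs named in the list above are used.

```yaml
# Simplified sketch; not the actual IWC workflow file.
on:
  push:
    paths-ignore:
      - '**/*.md'
      - 'scripts/**'
      - 'website/**'
  pull_request:
    paths-ignore:
      - '**/*.md'
      - 'scripts/**'
      - 'website/**'

jobs:
  test:
    runs-on: ubuntu-latest            # assumption
    strategy:
      fail-fast: false                # chunks fail independently
      matrix:
        chunk: [0, 1, 2, 3]           # assumption: literal list for max-chunks: 4
        python-version: ['3.11']
    steps:
      - uses: galaxyproject/planemo-ci-action@v1
        with:
          mode: test
          workflows: true
          setup-cvmfs: true
          galaxy-fork: galaxyproject
          galaxy-branch: release_25.1
          # repository-list / chunk wiring from the setup job omitted here
```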
Key properties:
- Planemo-driven, shard-parallel (max 4 chunks).
- Galaxy runs in-CI, not against usegalaxy.*. Release pinned to 25.1.
- CVMFS mounted in-runner for built-in indices / .loc lookups.
- Three merge gates: lint passes → tests pass → human reviewer approves.
- Post-merge auto-deploy: Dockstore (via .dockstore.yml), WorkflowHub, iwc-workflows/<name> mirror repo, LifeMonitor registration.
6. Planemo toolchain for workflows
Cross-referenced from planemo.readthedocs.io + the GTN FAQ + the workflow-fairification tutorial:
| Command | Purpose |
|---|---|
| planemo test <workflow.ga> | Run -tests.yml (auto-discovered by filename). Local Galaxy by default; --galaxy_url + --galaxy_user_key for remote. Outputs HTML / JSON / xUnit / JUnit. |
| planemo run <workflow.ga> <job.yml> | Execute without assertions. Supports --engine external_galaxy, --profile, --download_outputs, --output_json. |
| planemo serve | Launch local Galaxy preloaded with workflow tools. |
| planemo workflow_lint / planemo lint --iwc | Validate .ga / format2. --iwc adds IWC-specific rules (creator URI, license, release, connected inputs, labeled outputs). |
| planemo workflow_test_init | Scaffold a -tests.yml. With --from_invocation <id> it reconstructs job + outputs + test-data/ from a completed invocation. |
| planemo workflow_test_on_invocation <tests.yml> <id> | Re-validate edited assertions against a saved invocation without re-running the workflow. Added to reduce the inner-loop cost of assertion iteration. |
| planemo workflow_job_init | Scaffold a job.yml template. |
| planemo list_invocations, planemo invocation_download, planemo invocation_export, planemo rerun | Post-hoc invocation tooling. |
| planemo dockstore_init | Generate .dockstore.yml for submission. |
The --from_invocation pattern is strongly preferred by IWC reviewers: generate the test from a real run on usegalaxy.*, don’t hand-write it. See help.galaxyproject.org/t/adding-galaxy-eu-workflow-to-iwc-library and the workflow-fairification tutorial.
7. The .ga format and gxformat2
Two formats exist (galaxyproject/gxformat2, v19_09 spec):
- Legacy .ga: Galaxy-native JSON, verbose, not human-writable; what IWC commits.
- Format 2: human-writable YAML with inputs: / steps: / outputs:, structured state: instead of tool_state. Galaxy ingests both; gxformat2 round-trips. Ships gxwf-lint, gxwf-viz, gxwf-abstract-export, and Python / Java / TypeScript bindings (Schema Salad-generated).
Tests reference workflow inputs / outputs by label, not step index. That makes labeled inputs/outputs load-bearing — planemo workflow_lint enforces it. Renaming a labeled output in the .ga silently breaks its test unless -tests.yml is updated in the same commit.
Format2 adoption in IWC is slow — workflows stay committed as .ga; gxformat2 is used for linting / round-tripping (see gxformat2#61).
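For contrast with the JSON .ga form, a minimal Format 2 workflow might look like the sketch below. It follows the general shape of the gxformat2 examples; the input and output names and the choice of the cat1 tool are illustrative assumptions, not taken from any IWC workflow.

```yaml
# Minimal Format 2 sketch (illustrative names, not an IWC workflow).
class: GalaxyWorkflow
inputs:
  the_input: data          # this label is the key a -tests.yml job: would use
outputs:
  the_output:              # this label is the key a -tests.yml outputs: would use
    outputSource: cat/out_file1
steps:
  cat:
    tool_id: cat1
    in:
      input1: the_input
```

The labels under inputs: and outputs: are the same strings the tests file keys on, which makes the label coupling described above easier to see here than in the equivalent .ga JSON.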
8. Scale of corpus (sampled)
Categories sampled: read-preprocessing, comparative_genomics, virology, metabolomics, scRNAseq, repeatmasking, sars-cov-2-variant-calling. Other categories present include amplicon, bacterial_genomics, computational-chemistry, data-fetching, epigenetics, genome_annotation, genome-assembly, imaging, microbiome, proteomics.
Every sampled workflow carries a -tests.yml sibling. Per workflows/README.md:61-64, contribution without tests is permitted but deprioritized; publication-to-usegalaxy is gated on tests passing.
9. Common shortcuts, gaps, anti-patterns
Corpus-observed shortcuts:
- Existence-only content probes. HyPhy family tests assert only that each output JSON contains a { character (hyphy-core-tests.yml:37-71, via has_text). Applies to 10+ statistical outputs across MEME/PRIME/BUSTED/FEL/CFEL/RELAX: “file exists and is valid-ish JSON,” nothing about content correctness.
- Size-only comparisons for non-deterministic outputs. RepeatMasking-Workflow-tests.yml:12-20 reduces every output to compare: sim_size with delta: 30000..90000000. Correctness = “output is within ~30KB of expected size.” Pragmatic for RepeatModeler but no semantic check.
- Image tests check pixel dimensions, not content. Scanpy clustering asserts has_image_width / has_image_height / has_size with 5-10% deltas across PNG plots, which catches “something rendered” but not “correct plot.”
- Happy path only. Zero expect_failure: in the sampled corpus.
- Remote-data availability dependency. Heavy Zenodo + EBI/SRA FTP usage. SHA-1 hashes guard against silent corruption but not against service outages; a Zenodo hiccup breaks CI across many IWC PRs simultaneously.
- Big raw data in CI. pox-virus pulls full SRA fastq.gz pairs every run; LCMS pulls 11+ mzML files. Beyond pip / planemo caches (setup.yml:66-77), no cross-run data cache.
CI / environment gaps:
- CVMFS-gated tests are not portable. Tests that require built-in indices need CVMFS mounted. Works in IWC CI (test_workflows.yml:83), fails on a plain developer laptop running planemo test. The corollary: contributors hit reference-genome / .loc mismatches when a required entry isn’t in the mounted cache.
- Pinned Galaxy version. CI runs release_25.1. Regressions against dev/main Galaxy are invisible until the next release bump.
- No per-test timeouts, no retry. A hanging tool hangs the whole chunk until GHA’s 6-hour hard limit.
- Format2 lag. .ga is the canonical committed form; format2 (gxformat2) is a lint/round-trip side-path. Format2-first workflows aren’t first-class in IWC yet.
Workflow-testing friction points:
- Intermediate step outputs can’t be directly asserted; they have to be promoted to workflow outputs first. Common contributor pain point.
- Tests are coupled to output labels. Renaming an output in the .ga silently breaks the sibling test.
- Invocation iteration cost. planemo test re-runs the whole workflow per change. workflow_test_on_invocation was added to iterate on assertions without re-running, but it’s under-used.
Common PR-review feedback (community / help threads):
- Generate tests via --from_invocation, don’t hand-write. help.galaxyproject.org thread 13903.
- Replace locally-copied large inputs with Zenodo location: URLs before submitting.
- Set creator identifier: to a full ORCID URL, the most common lint failure. Enforced in planemo#1458.
- Don’t use compare: diff on outputs that embed timestamps; switch to has_text / has_n_lines with delta:.
- Bump release in the .ga and add a CHANGELOG.md entry in the same PR; the IWC PR template enforces this, and reviewers catch it via bump_version.py.
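The timestamp advice above can be sketched as a swap from a file comparison to tolerant content assertions. The output label, the asserted text, and the line count are hypothetical; only the assertion names and the use of delta: come from the feedback itself.

```yaml
outputs:
  run_log:                    # hypothetical output label
    # Instead of: file: test-data/expected.log with compare: diff,
    # which breaks whenever an embedded timestamp changes.
    asserts:
      has_text:
        text: "Analysis complete"   # hypothetical stable marker string
      has_n_lines:
        n: 120                      # hypothetical expected line count
        delta: 5
```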
10. Implications for gxwf + review-nextflow skill development
- Assertion vocabulary is shared between workflows and tools. The asserts: block is the same code path as tool XML <assert_contents>. Anything gxwf or a conversion skill emits can reuse the existing Galaxy XSD as the source-of-truth schema. This is a strong schema to target for JSON-schema-driven static validation.
- Tests reference inputs by workflow label, not index. For any nf→Galaxy translation, the label discipline has to be preserved end-to-end: an unlabeled input in the translated .ga means its test becomes fragile / unspecifiable.
- --from_invocation is the preferred authoring path. The equivalent story for gxwf-authored workflows should probably be: run on a Galaxy instance, capture the invocation, regenerate the tests file. The tooling already exists (planemo workflow_test_init --from_invocation); wrapping it into the gxwf workflow-authoring loop would match how humans actually do this.
- IWC’s format is the contribution contract. Anything intended to land in IWC must satisfy: directory layout + -tests.yml + README.md + CHANGELOG.md + .dockstore.yml + labeled inputs/outputs + creator ORCID + license + release. A review skill or conversion skill should audit against this checklist, not just against planemo lint.
- Test data strategy for generated Galaxy workflows. Mirror IWC’s Zenodo-first pattern; toy data only in test-data/. For nf→Galaxy translations, the nf-core test-datasets URLs (covered in COMPONENT_NEXTFLOW_WORKFLOW_TESTING.md) are already stable persistent URLs, so they can be reused directly in the translated -tests.yml.
- .nftignore ↔ compare: sim_size + delta / filename-only assertions. The Nextflow-side convention of excluding unstable files from snapshots maps naturally to the planemo convention of using tolerant assertions (sim_size + delta, has_image_* + delta, has_n_lines + delta) for the same outputs. A translator should preserve this mapping so translated tests aren’t stricter than their source.
- CVMFS + built-in indices are a translation friction point. Nextflow pipelines parameterize references via URLs in test.config; Galaxy workflows frequently use .loc-backed data tables resolved via CVMFS. A faithful translation needs to pick a lane: stay URL-driven (portable but slow), or switch to data-table driven (fast but requires CVMFS-aware CI).
- This document + the core-side vault note cover both layers. Anything targeting “Galaxy workflow testing” should reference both. The core note covers .gxwf.yml internals; this note covers IWC + planemo contribution flow.
Key paths and sources
IWC corpus:
- Repo root: /Users/jxc755/projects/repositories/iwc/
- Contribution contract: workflows/README.md
- CI entry point: .github/workflows/workflow_test.yml (NOT gh-build-and-test.yml, which is the website job)
- Reusable CI jobs: .github/workflows/setup.yml, .github/workflows/test_workflows.yml
- Version helper: bump_version.py
- Sample workflows cited:
  - workflows/read-preprocessing/short-read-qc-trimming/
  - workflows/comparative_genomics/hyphy/
  - workflows/virology/pox-virus-amplicon/
  - workflows/metabolomics/lcms-preprocessing/
  - workflows/scRNAseq/scanpy-clustering/
  - workflows/repeatmasking/
  - workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/
External:
- Primary spec: planemo.readthedocs.io/en/latest/test_format.html
- Best practices: planemo.readthedocs.io/en/latest/best_practices_workflows.html
- Running workflows: planemo.readthedocs.io/en/latest/running.html
- Assertion source of truth (Galaxy XSD): galaxy/lib/galaxy/tool_util/xsd/galaxy.xsd
- Galaxy verify module: docs.galaxyproject.org lib/galaxy.tool_util.verify.html
- gxformat2 spec: galaxyproject.github.io/gxformat2/v19_09.html
- IWC workflows README: github.com/galaxyproject/iwc/blob/main/workflows/README.md
- IWC CI workflow: github.com/galaxyproject/iwc/blob/main/.github/workflows/workflow_test.yml
- planemo-ci-action: github.com/galaxyproject/planemo-ci-action
- Collections reference: github.com/galaxyproject/planemo/blob/master/docs/_writing_collections.rst
- GTN FAQ on workflow tests: training.galaxyproject.org/training-material/faqs/gtn/gtn_workflow_testing.html
- GTN workflow-fairification: training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/workflow-fairification/tutorial.html
- Creator URI enforcement: galaxyproject/planemo#1458
- Help thread on IWC submission: help.galaxyproject.org/t/13903
- gxformat2 adoption issue: galaxyproject/gxformat2#61
Complementary internal note:
- /Users/jxc755/projects/repositories/galaxy-brain/vault/research/Component - Workflow Testing.md, which covers Galaxy core’s .gxwf.yml / test_workflows.py layer (887 lines).
Unverified / caveats
- Galaxy release pin “25.1” reflects the CI file at research time; minor versions drift.
- The full image-assertion list is taken from XSD search output and not individually page-verified — confirm assertion names against the XSD before emitting them programmatically.
- has_n_bytes as a distinct assertion is unverified; only has_size confirmed.
- Exact IWC corpus size (workflow count) was not enumerated in this pass; the subagent sampled ~7 workflows across 7 categories, and the full corpus spans at least 20+ categories.