COMPONENT_GALAXY_WORKFLOW_TESTING

Component — Galaxy Workflow Testing (IWC + planemo)

Synthesis of (a) the planemo-centric external documentation + Galaxy ecosystem specs and (b) concrete evidence from the IWC corpus at /Users/jxc755/projects/repositories/iwc/. Every corpus claim is grounded in a file path + line numbers; every external claim has a URL.

Scope & positioning. This document covers the public-facing, contribution-oriented workflow testing layer: the -tests.yml format run by planemo, bundled into IWC, and enforced by IWC CI. It is complementary to the existing vault note Component - Workflow Testing.md (887 lines), which covers Galaxy core’s internal frameworks (.gxwf.yml / .gxwf-tests.yml + the procedural test_workflows.py suite). Both layers share the assertion vocabulary — they differ in packaging, discovery, and execution environment.


1. The two layers

The assertion vocabulary is shared (same galaxy.tool_util.verify.asserts code path). The harness, discovery mechanism, and CI environment differ.


2. The -tests.yml format (planemo spec)

Authoritative reference: planemo.readthedocs.io/en/latest/test_format.html.

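A tests file is a YAML list of test cases. Each case has three top-level keys, all visible in the example in 2a: doc (a free-text description), job (the workflow inputs), and outputs (assertions keyed by output label).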

Inputs are referenced by workflow label, not index. The natural-language workflow input label is used verbatim as the key, including spaces, colons, and question marks. Example: /Users/jxc755/projects/repositories/iwc/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:23 has Manually annotate celltypes?: true as a job key. This makes labeled inputs a load-bearing planemo practice: unlabeled inputs fall back to “Input dataset” defaults, which are fragile (planemo best practices).
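A minimal sketch of the label check this implies. It assumes the .ga file is JSON whose steps carry a type and a label, that input steps are typed data_input, data_collection_input, or parameter_input, and that the job mapping has already been parsed from YAML into a dict. The function name and the sample data are hypothetical, not part of planemo.

```python
import json

# Step types treated as workflow inputs (an assumption about the .ga layout).
INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def unmatched_job_keys(workflow: dict, job: dict) -> list[str]:
    """Return job keys that do not match any labeled workflow input verbatim."""
    labels = {
        step.get("label")
        for step in workflow.get("steps", {}).values()
        if step.get("type") in INPUT_STEP_TYPES and step.get("label")
    }
    return sorted(key for key in job if key not in labels)


# Hypothetical, heavily trimmed .ga content.
ga_text = json.dumps({
    "steps": {
        "0": {"type": "data_input", "label": "Reference genome"},
        "1": {"type": "parameter_input", "label": "Manually annotate celltypes?"},
        "2": {"type": "tool", "label": None},
    }
})
workflow = json.loads(ga_text)

# The second key differs from the label only by case, so it is reported.
job = {"Reference genome": {"class": "File"}, "manually annotate celltypes?": True}
print(unmatched_job_keys(workflow, job))  # ['manually annotate celltypes?']
```

The comparison is deliberately exact: because the label is the key verbatim, a case or punctuation difference counts as a mismatch.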

2a. Canonical minimal example

/Users/jxc755/projects/repositories/iwc/workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/consensus-from-variation-tests.yml (32 lines) is the simplest complete test in the sample:

- doc: Test consensus building from called variants
  job:
    Reference genome:
      class: File
      location: 'https://zenodo.org/record/4555735/files/NC_045512.2_reference.fasta?download=1'
      hashes:
      - hash_function: SHA-1
        hash_value: db3759c2e1d9ce8827ba4aa1749e759313591240
    aligned reads data for depth calculation:
      class: Collection
      collection_type: 'list'
      elements:
      - identifier: SRR11578257
        class: File
        path: test-data/aligned_reads_for_coverage.bam
        ...
  outputs:
    multisample_consensus_fasta:
      file: test-data/masked_consensus.fa

Three patterns in 32 lines: remote data + SHA-1 integrity check, a list collection from local fixtures, and an exact-file output assertion.
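The SHA-1 integrity check on the remote input can be reproduced locally before a fixture is referenced in a tests file. The sketch below is one way to do it with the standard library; the helper name and the throwaway file are hypothetical, and it does not claim to mirror how planemo or Galaxy perform the check internally.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file through SHA-1 and compare with the expected hex digest."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # Read in chunks so large fixtures do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Usage with a throwaway file standing in for a downloaded fixture.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "reference.fasta"
    fixture.write_bytes(b">seq\nACGT\n")
    expected = hashlib.sha1(b">seq\nACGT\n").hexdigest()
    print(sha1_matches(fixture, expected))   # True
    print(sha1_matches(fixture, "0" * 40))   # False
```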

2b. Input shapes

All documented at the planemo test_format page; all observed in the corpus:

2c. Output assertion shapes

Three patterns (documented at test_format.html; all observed):

  1. Exact file: file: test-data/expected.ext — byte-for-byte or via compare: option.
  2. Checksum: checksum: "sha1$..." (documented; not observed in sampled IWC).
  3. Structured: an asserts: block of content assertions (preferred by IWC for outputs > 1 MB, per workflows/README.md).

File-compare options:

asserts: vocabulary (shared with tool XML <assert_contents> via galaxy.tool_util.verify.asserts; authoritative list in the Galaxy XSD at galaxy/lib/galaxy/tool_util/xsd/galaxy.xsd):

| Category | Assertions |
|---|---|
| Text | has_text, not_has_text, has_text_matching, has_line, has_line_matching, has_n_lines (with delta:) |
| Tabular | has_n_columns |
| Size | has_size (value, min, max, delta) |
| Archives | has_archive_member (regex path; nests assertions on member content) |
| HDF5 | has_h5_keys, has_h5_attribute |
| XML | is_valid_xml, has_element_with_path, has_n_elements_with_path, element_text_matches, element_text_is, attribute_matches, attribute_is, xml_element |
| JSON | has_json_property_with_value, has_json_property_with_text |
| Images | has_image_width, has_image_height, has_image_channels, has_image_center_of_mass, plus related |

Verify exact assertion names against the XSD before relying on them — the corpus-surfaced names (has_image_width/has_image_height/has_size) are confirmed; the broader image-assertion list is indicative, not audit-verified.

Diverse asserts: examples from the corpus:

2d. Collection output assertions

element_tests: keyed by element identifier; each value is the same assertion dict used for a single file; nested collections nest element_tests: recursively.
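The recursive shape described above can be walked with a short helper. This is a sketch that assumes the outputs entry has already been parsed into nested dicts; the identifiers and the has_text assertion in the sample are illustrative, not taken from the corpus.

```python
def flatten_element_tests(spec: dict, prefix: tuple[str, ...] = ()) -> dict:
    """Map each leaf element's identifier path to its assertion dict."""
    leaves = {}
    for identifier, value in spec.get("element_tests", {}).items():
        path = prefix + (identifier,)
        if isinstance(value, dict) and "element_tests" in value:
            # Nested collection: recurse one level deeper.
            leaves.update(flatten_element_tests(value, path))
        else:
            # Leaf element: the value is a single-file assertion dict.
            leaves[path] = value
    return leaves


# Hypothetical list:paired output.
output_spec = {
    "element_tests": {
        "sample1": {
            "element_tests": {
                "forward": {"asserts": {"has_text": {"text": "@"}}},
                "reverse": {"asserts": {"has_text": {"text": "@"}}},
            }
        }
    }
}

for path, assertion in flatten_element_tests(output_spec).items():
    print("/".join(path), sorted(assertion))
```

Flattening to identifier paths makes it easy to see which elements of a nested collection actually carry assertions and which are left unchecked.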

Deeply nested (list:list:paired) collection assertions are legal but sparsely exemplified; planemo’s _writing_collections.rst is the de-facto reference.

2e. Not observed in IWC


3. IWC repository contract

Per /Users/jxc755/projects/repositories/iwc/workflows/README.md:12-18 and sampled directories, every IWC workflow directory holds:

<category>/<workflow-name>/
├── <workflow-name>.ga              ← Galaxy native workflow (mandatory)
├── <workflow-name>-tests.yml       ← planemo tests file (mandatory, basename matches)
├── README.md                       ← narrative + Input/Output Datasets sections
├── CHANGELOG.md                    ← keepachangelog format, ISO dates
├── .dockstore.yml                  ← Dockstore 1.2 descriptor
└── test-data/                      ← optional: small fixtures + expected outputs
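The layout above lends itself to a mechanical pre-submission check. The following sketch treats test-data/ as optional and requires one matching -tests.yml per .ga file, so it also covers directories that hold several workflows. The function name is hypothetical and this is not what IWC CI runs.

```python
import tempfile
from pathlib import Path


def missing_contract_files(workflow_dir: Path) -> list[str]:
    """List contract files absent from a directory, per the layout above."""
    missing = []
    workflows = sorted(workflow_dir.glob("*.ga"))
    if not workflows:
        missing.append("*.ga")
    for ga in workflows:
        # The tests file must share the workflow's basename.
        tests = ga.with_name(f"{ga.stem}-tests.yml")
        if not tests.is_file():
            missing.append(tests.name)
    for name in ("README.md", "CHANGELOG.md", ".dockstore.yml"):
        if not (workflow_dir / name).is_file():
            missing.append(name)
    return missing


# Usage on a throwaway directory with only a workflow and a README.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo.ga").write_text("{}")
    (root / "README.md").write_text("# demo\n")
    print(missing_contract_files(root))
    # ['demo-tests.yml', 'CHANGELOG.md', '.dockstore.yml']
```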

3a. Multi-workflow families in one directory


4. Test data organization

Two storage patterns, often combined in one test case:

IWC convention (workflows/README.md): large inputs go to Zenodo; only toy data in-repo. Reviewers push back on large files committed in test-data/.


5. CI integration

Important path note: the file /Users/jxc755/projects/repositories/iwc/.github/workflows/gh-build-and-test.yml triggers only on website/** changes (it’s the static-site Playwright E2E job). The real workflow-test CI lives at /Users/jxc755/projects/repositories/iwc/.github/workflows/workflow_test.yml.

Structure of workflow_test.yml:

Key properties:


6. Planemo toolchain for workflows

Cross-referenced from planemo.readthedocs.io + the GTN FAQ + the workflow-fairification tutorial:

| Command | Purpose |
|---|---|
| planemo test <workflow.ga> | Run -tests.yml (auto-discovered by filename). Local Galaxy by default; --galaxy_url + --galaxy_user_key for remote. Outputs HTML / JSON / xUnit / JUnit. |
| planemo run <workflow.ga> <job.yml> | Execute without assertions. Supports --engine external_galaxy, --profile, --download_outputs, --output_json. |
| planemo serve | Launch local Galaxy preloaded with workflow tools. |
| planemo workflow_lint / planemo lint --iwc | Validate .ga / format2. --iwc adds IWC-specific rules (creator URI, license, release, connected inputs, labeled outputs). |
| planemo workflow_test_init | Scaffold a -tests.yml. With --from_invocation <id> it reconstructs job + outputs + test-data/ from a completed invocation. |
| planemo workflow_test_on_invocation <tests.yml> <id> | Re-validate edited assertions against a saved invocation without re-running the workflow. Added to reduce the inner-loop cost of assertion iteration. |
| planemo workflow_job_init | Scaffold a job.yml template. |
| planemo list_invocations, planemo invocation_download, planemo invocation_export, planemo rerun | Post-hoc invocation tooling. |
| planemo dockstore_init | Generate .dockstore.yml for submission. |

The --from_invocation pattern is strongly preferred by IWC reviewers: generate the test from a real run on usegalaxy.*, don’t hand-write it. See help.galaxyproject.org/t/adding-galaxy-eu-workflow-to-iwc-library and the workflow-fairification tutorial.
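For scripting this step, the command line can be assembled as an argument list. The sketch below only builds the argv and does not run planemo. Combining --galaxy_url and --galaxy_user_key with workflow_test_init is an assumption carried over from the remote options listed for planemo test; the function name, URL, and environment variable are hypothetical placeholders.

```python
import os
import shlex


def build_test_init_argv(invocation_id: str, galaxy_url: str, user_key: str) -> list[str]:
    """Assemble the planemo call that scaffolds tests from an invocation."""
    return [
        "planemo", "workflow_test_init",
        "--from_invocation", invocation_id,
        "--galaxy_url", galaxy_url,
        "--galaxy_user_key", user_key,
    ]


# Placeholder values; the API key is read from the environment, never hard-coded.
argv = build_test_init_argv(
    "<invocation-id>",
    "https://usegalaxy.example",
    os.environ.get("GALAXY_API_KEY", "<api-key>"),
)
print(shlex.join(argv))
```

Keeping the call as a list rather than a shell string means it can be passed to subprocess.run without quoting issues.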


7. The .ga format and gxformat2

Two formats exist (galaxyproject/gxformat2, v19_09 spec):

Tests reference workflow inputs / outputs by label, not step index. That makes labeled inputs/outputs load-bearing — planemo workflow_lint enforces it. Renaming a labeled output in the .ga silently breaks its test unless -tests.yml is updated in the same commit.
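The silent breakage on rename can be caught before a commit by comparing the outputs keys of a test case with the labeled outputs in the .ga. The sketch assumes each step in the .ga JSON lists its exposed outputs under workflow_outputs with a label field. The helper name and sample labels are hypothetical.

```python
def stale_output_keys(workflow: dict, outputs: dict) -> list[str]:
    """Return test output keys with no matching labeled workflow output."""
    labels = {
        out.get("label")
        for step in workflow.get("steps", {}).values()
        for out in (step.get("workflow_outputs") or [])
        if out.get("label")
    }
    return sorted(key for key in outputs if key not in labels)


# Hypothetical workflow after an output was renamed in the editor.
renamed_workflow = {
    "steps": {
        "3": {
            "type": "tool",
            "workflow_outputs": [
                {"label": "consensus_fasta", "output_name": "out_file"}
            ],
        }
    }
}

# The tests file still uses the old label, so the key is reported as stale.
test_outputs = {"multisample_consensus_fasta": {"file": "test-data/masked_consensus.fa"}}
print(stale_output_keys(renamed_workflow, test_outputs))
# ['multisample_consensus_fasta']
```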

Format2 adoption in IWC is slow — workflows stay committed as .ga; gxformat2 is used for linting / round-tripping (see gxformat2#61).


8. Scale of corpus (sampled)

Categories sampled: read-preprocessing, comparative_genomics, virology, metabolomics, scRNAseq, repeatmasking, sars-cov-2-variant-calling. Other categories present include amplicon, bacterial_genomics, computational-chemistry, data-fetching, epigenetics, genome_annotation, genome-assembly, imaging, microbiome, proteomics.

Every sampled workflow carries a -tests.yml sibling. Per workflows/README.md:61-64, contribution without tests is permitted but deprioritized; publication-to-usegalaxy is gated on tests passing.


9. Common shortcuts, gaps, anti-patterns

Corpus-observed shortcuts:

CI / environment gaps:

Workflow-testing friction points:

Common PR-review feedback (community / help threads):


10. Implications for gxwf + review-nextflow skill development

  1. Assertion vocabulary is shared between workflows and tools. The asserts: block is the same code path as tool XML <assert_contents>. Anything gxwf or a conversion skill emits can reuse the existing Galaxy XSD as the source-of-truth schema. This is a strong schema to target for JSON-schema-driven static validation.
  2. Tests reference inputs by workflow label, not index. For any nf→Galaxy translation, the label discipline has to be preserved end-to-end — an unlabeled input in the translated .ga means its test becomes fragile / unspecifiable.
  3. --from_invocation is the preferred authoring path. The equivalent story for gxwf-authored workflows should probably be: run on a Galaxy instance, capture the invocation, regenerate the tests file. The tooling already exists (planemo workflow_test_init --from_invocation); wrapping it into the gxwf workflow-authoring loop would match how humans actually do this.
  4. IWC’s format is the contribution contract. Anything intended to land in IWC must satisfy: directory layout + -tests.yml + README.md + CHANGELOG.md + .dockstore.yml + labeled inputs/outputs + creator ORCID + license + release. A review skill or conversion skill should audit against this checklist, not just against planemo lint.
  5. Test data strategy for generated Galaxy workflows. Mirror IWC’s Zenodo-first pattern; toy data only in test-data/. For nf→Galaxy translations, the nf-core test-datasets URLs (covered in COMPONENT_NEXTFLOW_WORKFLOW_TESTING.md) are already stable persistent URLs — they can be reused directly in the translated -tests.yml.
  6. .nftignore maps to compare: sim_size + delta / filename-only assertions. The Nextflow-side convention of excluding unstable files from snapshots maps naturally to the planemo convention of using tolerant assertions (sim_size+delta, has_image_*+delta, has_n_lines+delta) for the same outputs. A translator should preserve this mapping so translated tests aren’t stricter than their source.
  7. CVMFS + built-in indices are a translation friction point. Nextflow pipelines parameterize references via URLs in test.config; Galaxy workflows frequently use .loc-backed data tables resolved via CVMFS. A faithful translation needs to pick a lane — stay URL-driven (portable but slow), or switch to data-table driven (fast but requires CVMFS-aware CI).
  8. This document + the core-side vault note cover both layers. Anything targeting “Galaxy workflow testing” should reference both. The core note covers .gxwf.yml internals; this note covers IWC + planemo contribution flow.
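Item 6 can be made concrete with a small translator-side sketch: outputs flagged as unstable on the Nextflow side get a tolerant sim_size spec, and everything else keeps an exact file comparison. The helper names and the default delta of 1000 bytes are assumptions for illustration. within_delta states the intended tolerance rule in plain terms and is not Galaxy's implementation.

```python
def within_delta(actual: int, expected: int, delta: int) -> bool:
    """True when two sizes or counts differ by at most delta."""
    return abs(actual - expected) <= delta


def output_spec(expected_path: str, unstable: bool, delta: int = 1000) -> dict:
    """Build an outputs entry: exact file match, or sim_size with a tolerance."""
    spec = {"file": expected_path}
    if unstable:
        # Tolerant comparison so the translated test is not stricter than its source.
        spec.update({"compare": "sim_size", "delta": delta})
    return spec


# Hypothetical output labels; the set stands in for entries from a .nftignore file.
ignored = {"multiqc_report"}
expected_files = {
    "multiqc_report": "test-data/multiqc_report.html",
    "counts_table": "test-data/counts.tsv",
}
outputs = {
    label: output_spec(path, unstable=label in ignored)
    for label, path in expected_files.items()
}
print(outputs["multiqc_report"])
print(outputs["counts_table"])
print(within_delta(10_450, 10_000, 1000))  # True
```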

Key paths and sources

IWC corpus:

External:

Complementary internal note:


Unverified / caveats