
Planemo Asserts Idioms

Decision and idiom guide for picking planemo workflow-test assertions: which family per output type, how to size tolerances, when to validate.

Revised 2026-05-11 · Rev 6 · component

Planemo asserts: idiom and decision guide

Companion to iwc-test-data-conventions (input shapes), galaxy-workflow-testability-design (workflow structure before test YAML exists), and iwc-shortcuts-anti-patterns (what's accepted vs what's a smell). This note is forward-looking: when authoring a new <workflow>-tests.yml, it answers which assertion family fits which output, and which tolerances and operators are recommended.

The vocabulary itself is not restated here — every assertion’s parameter list, types, defaults, required fields, and Python docstring are rendered from the test-format JSON Schema at tests-format. Assertion names below deep-link into that page (e.g. has_text jumps straight to its $def).

1. Choose by output type

The single most useful decision table. Pick the entry that matches the file format the workflow emits; default to the recommended assertion family.

Each entry gives the output type, its Default assertion family, the Why, and a Fallback where one applies.

  • Plain text reports / logs (FastQC summary, MultiQC text section). Default: [[tests-format#has_text_model|has_text]] (substring on a known stable token) + [[tests-format#has_n_lines_model|has_n_lines]] with delta:.
  • HTML reports (MultiQC HTML, custom dashboards). Default: [[tests-format#has_text_model|has_text]] against stable section names. Why: HTML embeds timestamps and asset hashes; byte-diff is hopeless.
  • Tabular (TSV, CSV, BED-like). Default: [[tests-format#has_n_columns_model|has_n_columns]] + [[tests-format#has_text_model|has_text]] for headers + [[tests-format#has_n_lines_model|has_n_lines]].
  • VCF. Default: compare: diff with lines_diff: 6. Why: the lines_diff: 6 constant matches the typical VCF header preamble that embeds ##fileDate= and ##source=. Fallback: [[tests-format#has_text_matching_model|has_text_matching]].
  • BAM. Default: [[tests-format#has_size_model|has_size]] + [[tests-format#has_archive_member_model|has_archive_member]] (BAM is a gzipped block format).
  • FASTA (deterministic — assemblies, consensus). Default: file: exact comparison or [[tests-format#has_text_model|has_text]] for a known sequence. Why: output is byte-stable when the upstream tool is deterministic.
  • FASTA (non-deterministic — RepeatModeler libraries). Default: compare: sim_size with a large delta:. Why: family content varies run-to-run. Fallback: [[tests-format#has_n_lines_model|has_n_lines]].
  • FASTQ (rare as workflow output). Default: [[tests-format#has_n_lines_model|has_n_lines]] (must be a multiple of 4). Why: quality scores are read-id-dependent.
  • JSON (deterministic — config dumps, params). Default: [[tests-format#has_json_property_with_value_model|has_json_property_with_value]] / [[tests-format#has_json_property_with_text_model|has_json_property_with_text]].
  • JSON (stochastic — HyPhy stats, MCMC results). Default: has_text: text: "{" (existence-only). Why: embedded floats break any structural assertion; see iwc-shortcuts-anti-patterns §1.
  • HDF5 / AnnData. Default: [[tests-format#has_h5_keys_model|has_h5_keys]] + [[tests-format#has_h5_attribute_model|has_h5_attribute]] for known structure.
  • XML. Default: [[tests-format#is_valid_xml_model|is_valid_xml]] + [[tests-format#has_element_with_path_model|has_element_with_path]] + element_text_is / element_text_matches.
  • PNG / image plots. Default: [[tests-format#has_image_width_model|has_image_width]] + [[tests-format#has_image_height_model|has_image_height]] + [[tests-format#has_size_model|has_size]].
  • TIFF / multipage images. Default: [[tests-format#has_image_frames_model|has_image_frames]] + [[tests-format#has_image_channels_model|has_image_channels]] + [[tests-format#has_size_model|has_size]].
  • Archives (zip, tar.gz). Default: has_archive_member: path: "regex" with nested asserts:. Why: asserts on a specific member; archive timestamps are never byte-stable. Fallback: [[tests-format#has_size_model|has_size]].
  • GFF / GTF. Default: [[tests-format#has_n_lines_model|has_n_lines]] with delta: + [[tests-format#has_text_model|has_text]] for a known feature.
  • Cool / HiC matrices. Default: compare: sim_size with a multi-MB delta:. Why: binary, run-to-run variance. Fallback: [[tests-format#has_archive_member_model|has_archive_member]].

When in doubt: start with has_size + delta_frac: 0.1. It catches the catastrophic failure mode (empty / 10x bigger output). Then add a content probe.
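
In recipe form, a minimal sketch of that default; the output label, size value, and token are placeholders, and the has_size + delta_frac: pairing follows the recommendation above (confirm the exact parameter names against the schema entry before copying):

my_output:
  asserts:
    has_size: { size: 250000, delta_frac: 0.1 }   # catastrophic-failure guard
    has_text: { text: "expected_token" }          # content probe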

1a. If the assertion is too weak, revisit the workflow output

Assertion choice sometimes reveals a workflow-design problem. If the only available output supports nothing stronger than a size check or an image-dimension smoke test, check whether the workflow should expose a stronger checkpoint before settling for the weak assertion.

Rule: if a translated workflow exposes only weakly assertable final reports, consult galaxy-workflow-testability-design and consider promoting a table/text/HDF5 checkpoint before writing the final assertions.

2. The compare: operators

compare: sits at the top level of an output that supplies a file: fixture (a sibling of file:, as in the §9 VCF recipe), never inside asserts:. The operators, in decreasing strictness:

  • diff (default). Byte-for-byte equality with optional lines_diff: tolerance for a fixed number of header lines. Use only when the upstream tool is deterministic on fixed inputs and the output has no embedded timestamps, command lines, version banners, or hash-ordered Python-dict-style keys.
  • re_match / re_match_multiline. Each line of the expected fixture is a regex that must match the corresponding output line. Useful when a few fields per row are timestamped but the rest is canonical. Rare in the corpus.
  • contains. The expected fixture is a substring of the output. Cheap; weak. Prefer asserts: has_text for new code unless you genuinely have a multi-line block to assert as a whole.
  • sim_size. Output file size matches the fixture’s size within delta: (bytes) or delta_frac: (fraction). Use when the output is necessarily non-deterministic but its rough size is reproducible (RepeatModeler libraries, HiC matrices, Bayesian sampler outputs).
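
For orientation, a hedged sketch of the sim_size shape on an output, mirroring the VCF recipe in §9; the label, fixture path, and tolerance are illustrative, and this assumes delta_frac: sits alongside compare: the same way lines_diff: does:

repeat_library:
  file: test-data/expected_families.fa
  compare: sim_size
  delta_frac: 0.1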

To pick lines_diff:, count the mutable header lines in the output format. VCF: ~6 (##fileformat, ##fileDate, ##source, ##reference, and contig/info lines vary). SAM/text headers: count the @HD/@PG/@CO lines. Set lines_diff: to that count exactly — looser values mask real diffs.

3. Tolerance picking

delta: is in bytes (for has_size and compare: sim_size) or an absolute count (for has_n_lines, has_n_columns, has_image_width, etc.). Suffix multipliers are documented in the schema; 1K, 1M, and 1G work.

delta_frac: is a fraction (0.1 = 10%). Use when expected size scales with input volume. Three IWC tests use it (scRNAseq/baredsc/*, genome-assembly/polish-with-long-reads/*); the rest use absolute delta:.
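
A side-by-side sketch of the two tolerance styles; the labels and sizes are illustrative, and the 5M suffix assumes the schema accepts the string multiplier form mentioned above:

aligned_bam:
  asserts:
    has_size: { size: 52000000, delta: 5M }        # absolute tolerance with a suffix multiplier
umap_embedding:
  asserts:
    has_size: { size: 1200000, delta_frac: 0.1 }   # fractional tolerance, 10% of expected size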

Picking magnitudes (from corpus survey in iwc-shortcuts-anti-patterns §2):

  • Image dimensions: delta: 25–30 pixels (5% of typical matplotlib defaults).
  • Image file size: delta: 5K–60K (5–10% of file size).
  • Small text reports: delta: 1K–10K.
  • HTML reports: delta: 25K–100K.
  • BAM files: delta: 1M–10M.
  • RepeatModeler / Bayesian sampler outputs: delta: 30K–90M (extreme, but justified by the underlying nondeterminism).

Heuristic for new outputs: delta_frac: 0.1 is a defensible default. Tighten if the output proves more deterministic than expected.

4. Text family — has_text vs has_text_matching vs has_line vs has_line_matching vs has_n_lines

All five are common; choose by what you’re verifying.

  • has_text — output contains the substring text:. Anywhere in the output, any number of times. Add n: / min: / max: to constrain occurrence count. Add delta: to allow slack on the count.
  • has_text_matching — output matches the regex expression:. Use sparingly; prefer literal has_text when you can.
  • has_line — output has at least one line matching line: exactly. Use when line boundaries matter (e.g. asserting on a specific row in a table).
  • has_line_matching — same but with regex.
  • has_n_lines — assert the line count is n: ± delta:.

A common combination in IWC: has_n_lines: n: 100, delta: 5 + has_text: text: "expected_token" — line-count sanity-check plus a content marker. This catches both truncation and content drift in one assertion pair.

negate: true is supported on every assertion. Used for the “this output should NOT contain X” case.
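
A hedged sketch of the occurrence-count knobs plus negate:; the label, tokens, and counts are placeholders, and the list form with that: is used so the same assertion type can repeat:

filtered_report:
  asserts:
    - { that: has_text, text: "PASS", min: 90, max: 110 }
    - { that: has_text, text: "ERROR", negate: true }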

5. Collection output assertions (element_tests:)

For Galaxy collection outputs, the test format keys element assertions by element identifier:

my_collection_output:
  element_tests:
    sample_1:
      asserts:
        has_text:
          text: "expected"
    sample_2:
      file: test-data/expected_sample_2.txt

An optional attributes: block at the collection level can assert on the shape of the produced collection:

my_collection_output:
  attributes: {collection_type: list:list}
  element_tests:
    ...

Nested collections: outer element_tests: keyed by outer identifier; inner uses elements: (note plural, no _tests suffix on the inner). See iwc-test-data-conventions §2f for the live example.

For a list-of-files where every element should pass the same minimal check, the existence-probe pattern (has_text: "{" for JSON; has_size: min: 100 for any non-empty binary) is widely used and accepted in IWC.
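
As a sketch, the existence-probe shape for a list collection of binary outputs; the collection label and element identifiers are placeholders, and has_size: min: is the form quoted above:

per_sample_bams:
  element_tests:
    sample_1:
      asserts:
        has_size: { min: 100 }
    sample_2:
      asserts:
        has_size: { min: 100 }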

6. The validate-against-workflow inner loop

A -tests.yml file can be structurally invalid in two distinct ways:

  1. Schema-invalid — wrong field names, wrong nesting, wrong types. Caught by the test-format JSON Schema.
  2. Workflow-incoherent — schema-valid YAML, but the input/output labels don’t match the actual workflow. Renaming an output in the .ga and forgetting to update its sibling -tests.yml produces this case. Planemo will surface it as an “output not found” error at test-runtime, but only after a full workflow run.

The @galaxy-tool-util/schema npm package ships two validators that catch both cases statically — no Galaxy or Planemo invocation needed:

  • validateTestsFile(yaml) — runs the file against tests.schema.json (AJV). Reports schema violations with paths.
  • checkTestsAgainstWorkflow(workflow, tests) — cross-checks a .ga / format2 workflow against a tests file: missing input labels, missing output labels, type incompatibilities (e.g. test supplies a File for a parameter typed int).

Both are pure-JS, take milliseconds, and have no Galaxy dependency. Wire them into the inner authoring loop:

edit -tests.yml
  → validateTestsFile()                    # schema gate
  → checkTestsAgainstWorkflow(.ga, tests)  # coherence gate
  → planemo workflow_test_on_invocation    # assertion gate (no full re-run)
  → planemo test                           # full integration (slow)

The first two gates short-circuit cheap mistakes before a slow planemo run. They are the static-validation equivalent of gxwf for tests, and the implement-galaxy-workflow-test mold should reference them as its primary inner-loop tooling. Source: galaxy-tool-util-ts package, src/test-format/index.ts exports.

7. Authoring loop — generation, then refinement

Reviewer convention is to generate the initial -tests.yml rather than hand-write it. Two planemo subcommands cover this:

  • planemo workflow_test_init --from_invocation <invocation_id> (planemo-workflow_test_init) — given a successful Galaxy invocation ID, emit a -tests.yml with a job: block that captures all inputs (with SHA-1 hashes) and an outputs: block with file: references to the actual outputs (downloaded into test-data/). Hand-tighten the assertions afterward.
  • planemo workflow_test_on_invocation <tests.yml> <invocation_id> (planemo-workflow_test_on_invocation) — re-evaluate an edited -tests.yml against a saved invocation without re-running the workflow. The fast inner loop for assertion iteration; complements the static gates in §6.

Together these cut the assertion-iteration cost dramatically. An agent should:

  1. Run the workflow once on usegalaxy.* (or local) to get a known-good invocation.
  2. --from_invocation to bootstrap the test file.
  3. Replace the autogenerated file: exact-comparison assertions with assertion-family-appropriate alternatives per §1.
  4. planemo-workflow_test_on_invocation after each edit; full planemo-test at the end.

8. What the schema gives you for free

When the test-format schema lands as a Foundry-rendered note, the agent can consult any assertion’s $def directly for: parameter types, defaults, required fields, the that: discriminator constant, and the original Python docstring (carried through as description). This note does not restate that vocabulary — it complements it with the corpus-grounded which-and-when.

What’s still missing from the schema and worth keeping in research notes:

  • This decision table (§1) — output-type → assertion family.
  • Tolerance magnitudes (§3) — corpus-derived defaults.
  • The validateTestsFile / checkTestsAgainstWorkflow integration story (§6).
  • Anti-pattern flags — see iwc-shortcuts-anti-patterns.

9. Common combinations (recipes)

Seven recipes worth memorizing.

Stable text report (FastQC summary, simple stats).

my_report:
  asserts:
    has_n_lines: { n: 12, delta: 2 }
    has_text: { text: "Total Sequences" }

MultiQC HTML report.

multiqc_report:
  asserts:
    # list form (that:) so the same assertion type can appear twice without duplicate YAML keys
    - { that: has_text, text: "Filtered Reads" }
    - { that: has_text, text: "FastQC" }

VCF (pinned tool, fixed reference).

called_variants:
  file: test-data/expected.vcf
  compare: diff
  lines_diff: 6

Stochastic JSON (HyPhy-style).

hyphy_meme:
  element_tests:
    geneA: { asserts: { has_text: { text: "{" } } }
    geneB: { asserts: { has_text: { text: "{" } } }

Matplotlib plot.

umap_plot:
  asserts:
    has_size: { size: 68416, delta: 6000 }
    has_image_width: { width: 601, delta: 30 }
    has_image_height: { height: 429, delta: 25 }

AnnData (HDF5).

clustered_anndata:
  asserts:
    # list form (that:) so has_h5_keys can repeat without duplicate YAML keys
    - { that: has_h5_keys, keys: "obs/louvain" }
    - { that: has_h5_keys, keys: "var/highly_variable" }
    - { that: has_h5_keys, keys: "uns/rank_genes_groups" }
    - { that: has_size, size: 12000000, delta: 1500000 }
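
Archive with a known member (zip, tar.gz). A hedged sketch following the §1 entry: the member path regex and token are placeholders, and the nested asserts: form follows the has_archive_member description there.

report_bundle:
  asserts:
    has_archive_member:
      path: ".*summary\\.txt"
      asserts:
        has_text: { text: "expected_token" }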

10. Cross-references

Incoming References (11)

  • Validate Tests (related note) — Validate Galaxy workflow test files and optionally cross-check labels against their workflow.
  • implement-galaxy-workflow-test (related note) — Assemble Galaxy workflow test fixtures and assertions.
  • Component Nextflow Testing (related note) — nf-test patterns mapped to Galaxy planemo asserts and CWL test equivalents — backs nextflow-test-to-target-tests Mold and summarize-nextflow §7.
  • Galaxy <discover_datasets> (related note) — Reference for the <discover_datasets> Galaxy XML element — attributes, named/regex patterns, <data> vs <collection> contexts, test assertions.
  • Galaxy Workflow Testability Design (related note) — Design guidance for Galaxy workflow inputs, outputs, and checkpoints that make IWC-style workflow tests possible.
  • Iwc Shortcuts Anti Patterns (related note) — What IWC test suites cut corners on (accepted) vs what's a code smell — existence-only probes, sim_size deltas, image dim checks, label coupling.
  • Iwc Tabular Operations Survey (related note) — Corpus survey of tabular tools and operations across IWC workflows; map for the operation pattern hierarchy on row/column data manipulation.
  • Iwc Test Data Conventions (related note) — How IWC workflows organize and reference test data — Zenodo-first, SHA-1 integrity, collection shapes, CVMFS gotchas.
  • Nextflow nf-test snapshots to Galaxy/Planemo assertions (related note) — Translates nf-test snapshot assertions into Galaxy workflow test-format assertions, broken out by module-level vs pipeline-level test shape.
  • Planemo workflow-test architecture (related note) — Reference for Planemo workflow test/run architecture, Galaxy modes, API polling, and noisy failure boundaries.
  • Galaxy workflow test format (related note) — JSON Schema for the planemo workflow test format (`<workflow>-tests.yml`), vendored from `@galaxy-tool-util/schema`.