IWC workflow testability survey
Source corpus: 120 cleaned gxformat2 workflows under $IWC_FORMAT2/, 120 skeletons under $IWC_SKELETONS/, and 115 sibling *-tests.yml files under $IWC/workflows/ as materialized in workflow-fixtures/iwc-src/workflows/. This survey supports galaxy-workflow-testability-design; it is an evidence hub, not a pattern-page proposal.
1. Scope
Topic type: workflow-shape concern. The question is not “which assertion should a test use?” That is covered by planemo-asserts-idioms. The question is “how should a Galaxy workflow be structured so useful tests can be written later?”
Evidence strategy, adapted from the /iwc-survey command:
- Skeleton scan first. Use `$IWC_SKELETONS/` to catalog workflow outputs, labels, subworkflow boundaries, and collection-producing recipes without paying full parameter-read cost.
- Sibling-test scan second. Use `*-tests.yml` files to identify which workflow outputs are actually asserted, which collection element identifiers matter, and which assertion families imply deterministic or stochastic behavior.
- Selective full workflow reads. Read full `$IWC_FORMAT2` workflows only for examples where output promotion, labels, or collection topology need line-level confirmation.
- Out of scope. Do not re-survey the assertion vocabulary itself, fixture-hosting conventions, checksum/negative-test absences, or shortcut-vs-smell calls already pinned in iwc-test-data-conventions, planemo-asserts-idioms, and iwc-shortcuts-anti-patterns.
2. Corpus counts
The first full pass matched top-level workflow `outputs:` labels against sibling test `outputs:` keys.
| Measure | Count | Interpretation |
|---|---|---|
| Cleaned format2 workflows | 120 | Survey denominator for workflow structure. |
| Workflows with sibling tests found by path convention | 114 | Six workflows lacked a matching sibling test file in this materialized corpus. |
| Total workflow-level outputs in those 114 workflows | 1,305 | IWC workflows expose many labeled outputs, including checkpoint/report outputs. |
| Distinct asserted output labels across sibling tests | 617 | Tests assert a selective subset of exposed workflow outputs. |
| Workflows where every asserted test output matched a workflow output label exactly | 114 / 114 | Output labels are the test API; no positional or unlabeled assertion route was observed. |
| Test files with `element_tests:` | 59 / 115 | Collection-shaped output testing is common. |
| `element_tests:` blocks | 227 | Stable collection element identifiers are central to testability. |
| Test files with inline `attributes: {collection_type: ...}` | 2 / 115 | Explicit collection-type assertions exist but are much rarer than element-keyed assertions; four inline occurrences appear across those files. |
Highest asserted-output examples:
| Workflow | Workflow outputs | Asserted outputs | Why it matters |
|---|---|---|---|
$IWC_FORMAT2/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-complete/mgnify-amplicon-pipeline-v5-complete.gxwf.yml | 83 | 38 | Large workflow exposing many domain outputs; tests select report, table, FASTA, HDF5/JSON, and collection checkpoints. |
$IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml | 21 | 21 | Every exposed output is asserted, including many mid-pipeline plots and AnnData checkpoints. |
$IWC_FORMAT2/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8.gxwf.yml | 48 | 16 | Workflow exposes many report/checkpoint outputs, while tests assert a diagnostic subset. |
$IWC_FORMAT2/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-rrna-prediction/mgnify-amplicon-pipeline-v5-rrna-prediction.gxwf.yml | 13 | 13 | Smaller report-heavy workflow where all outputs are test-facing. |
Assertion-family counts from sibling tests:
| Assertion family | Files | Line hits | Workflow-design implication |
|---|---|---|---|
has_text | 78 | 597 | Text/report checkpoints are broadly assertable with stable tokens. |
has_size | 57 | 201 | Binary/report/plot checkpoints often use size bands. |
has_n_lines | 40 | 146 | Line-count checkpoints are common for tabular/text outputs. |
has_line | 23 | 69 | Deterministic table rows make stronger checkpoints than final reports. |
has_text_matching | 17 | 56 | Regex checkpoints cover numeric drift while preserving content checks. |
compare: sim_size | 9 | 26 | Stochastic or binary outputs may only support magnitude checks. |
compare: diff | 5 | 10 | Strict exact comparison is rare and format-specific. |
has_image_width / has_image_height | 1 | 15 each | Image checkpoints are mostly smoke-tested by size/dimensions in this corpus. |
3. Findings
3a. Workflow labels are the test API
Every asserted output key in the 114 matched workflow/test pairs resolved to a top-level workflow output label. The direct coupling is visible in Scanpy: the workflow exposes outputs like Initial Anndata General Info, UMAP of louvain, Ranked genes with Wilcoxon test, and Dotplot of top genes on clusters at $IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml:105-147; the test file keys assertions by those labels at $IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:27-205.
VGP scaffolding shows the same with punctuation-heavy labels: workflow outputs include Hi-C duplication stats on scaffolds: Raw, Hi-C duplication stats on scaffolds: MultiQc, and Merged Alignment stats at $IWC_FORMAT2/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8.gxwf.yml:170-196; tests assert those exact labels at $IWC/workflows/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8-tests.yml:218-245.
Design implication: input/output labels are stable interface names. Renaming is a test-breaking API change.
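A minimal sketch of this coupling, using hypothetical labels (`qc_report` and the `fastqc` step are illustrative, not from the corpus): the test file keys its assertions by the exact top-level workflow output label.

```yaml
# workflow.gxwf.yml (fragment) -- hypothetical labels
outputs:
  qc_report:
    outputSource: fastqc/html_file

# workflow-tests.yml (fragment)
- doc: assertions are keyed by the workflow output label, verbatim
  outputs:
    qc_report:                # renaming this label in the workflow breaks the test
      asserts:
        has_text:
          text: "FastQC Report"
```

Because the label is the only addressing mechanism observed in the corpus, there is no positional fallback if it drifts.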
3b. IWC promotes checkpoint outputs so tests can see them
Workflow tests can assert only workflow-level outputs, so IWC workflows expose intermediate summaries, plots, and diagnostics as top-level outputs.
Scanpy exposes a dense ladder of checkpoints: initial AnnData summaries, intermediate plots, ranked-gene tables, final AnnData, cluster-count tables, and final plots ($IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml:105-147). The sibling test asserts every one of the 21 workflow outputs, mixing HDF5 key probes, text probes, image dimensions, and line checks ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:27-205).
RNA-seq paired-end exposes mapped reads, stranded/unstranded coverage, abundance estimates, expression tables, counts tables, and MultiQC reports ($IWC_FORMAT2/transcriptomics/rnaseq-pe/rnaseq-pe.gxwf.yml:90-112). The sibling test asserts coverage sizes, mapped-read sizes, expression regexes, and a deterministic counts row ($IWC/workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml:48-97).
Design implication: promote the strongest useful checkpoint, not only the final human-facing report. The cost is output-list clutter; IWC tolerates that when it buys testability.
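A sketch of the promotion move, with hypothetical step and label names: workflow tests can only see labeled top-level outputs, so a deterministic intermediate is lifted alongside the human-facing report.

```yaml
# workflow.gxwf.yml (fragment) -- hypothetical labels
outputs:
  final_report:               # end product (HTML/plot); only weakly assertable
    outputSource: multiqc/html_report
  counts_table:               # promoted checkpoint: deterministic table that can
    outputSource: featurecounts/counts   # carry has_line / has_n_lines assertions
```

The `counts_table` entry exists purely so a stronger assertion has somewhere to attach; that is the output-list clutter IWC tolerates.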
3c. Collection-shaped outputs need stable element identifiers
The 10x CellPlex workflow exposes collection-shaped outputs such as Seurat input for gene expression (filtered), CITE-seq-Count report, and Seurat input for CMO (UMI) ($IWC_FORMAT2/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex.gxwf.yml:73-91). The test asserts collection shape with attributes: {collection_type: ...} and drills into element identifiers such as subsample, matrix, barcodes, and genes ($IWC/workflows/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex-tests.yml:82-128).
HyPhy shows identifier stability in a different form: the workflow emits meme_output, prime_output, busted_output, and fel_output collection outputs ($IWC_FORMAT2/comparative_genomics/hyphy/hyphy-core.gxwf.yml:26-38), and tests key element checks by generated gene identifiers such as NC_001477.1|capsid_protein_C|95-394_DENV1 ($IWC/workflows/comparative_genomics/hyphy/hyphy-core-tests.yml:31-71).
Design implication: generated workflows need deterministic collection element identifiers before test authoring begins. Otherwise element_tests: cannot target outputs cleanly.
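A sketch of the dependency, with hypothetical identifiers: `element_tests:` addresses collection members by identifier, so those identifiers must be fixed before the test is written.

```yaml
# workflow-tests.yml (fragment) -- hypothetical output and identifiers
outputs:
  matrix_collection:
    attributes: {collection_type: list}   # optional shape check, rare in corpus
    element_tests:
      barcodes:               # stable element identifier, not a positional index
        asserts:
          has_n_lines:
            n: 737280
            delta: 1000
      matrix:
        asserts:
          has_text:
            text: "%%MatrixMarket"
```

If the workflow generates element identifiers nondeterministically, there is no key to put under `element_tests:` at all.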
3d. Assertion strength feeds back into checkpoint choice
Scanpy plot outputs are mostly smoke-tested with has_size, has_image_width, and has_image_height tolerances ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:33-205). The same workflow also exposes stronger non-image checkpoints such as AnnData HDF5 keys and Number of cells per cluster line checks ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:159-195).
RNA-seq paired-end pairs coarse size checks for BAM/bigWig-like outputs with stronger regex/line checks for expression/count tables ($IWC/workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml:48-97). MGnify complete uses MultiQC token checks, exact file/location comparisons, collection element checks, and table-shape assertions in the same test file ($IWC/workflows/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-complete/mgnify-amplicon-pipeline-v5-complete-tests.yml:15-120).
Design implication: if final outputs are stochastic, binary, image-only, or report-heavy, expose an adjacent table/text/HDF5 checkpoint that can carry a stronger assertion.
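A sketch of the pairing, with hypothetical labels and illustrative numbers: the image gets a size band, while an adjacent table checkpoint carries the content check.

```yaml
# workflow-tests.yml (fragment) -- hypothetical labels and values
outputs:
  umap_plot:                  # image bytes drift between runs; size band only
    asserts:
      has_size:
        value: 150000
        delta: 50000
  cells_per_cluster:          # promoted table checkpoint: the real assertion
    asserts:
      has_line:
        line: "0\t1876"
```

The weak assertion on the plot is acceptable precisely because the strong assertion lives on the neighboring table output.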
3e. Fixture shape constrains workflow inputs
Tests supply job inputs by workflow input label. The 10x CellPlex test uses labels like fastq PE collection GEX, reference genome, gtf, cellranger_barcodes_3M-february-2018.txt, fastq PE collection CMO, sample name and CMO sequence collection, and Number of expected cells ($IWC/workflows/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex-tests.yml:2-75); the workflow declares matching inputs and types, including data, string, boolean, int, and collection inputs ($IWC_FORMAT2/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex.gxwf.yml:4-72).
HyPhy’s input collection identifiers contain pipes and accession-like labels ($IWC/workflows/comparative_genomics/hyphy/hyphy-core-tests.yml:7-30), while the workflow only declares the collection type ($IWC_FORMAT2/comparative_genomics/hyphy/hyphy-core.gxwf.yml:14-25). The test contract therefore lives partly in workflow input shape and partly in fixture element identifiers.
Design implication: workflow input labels and collection types should be designed with fixture authoring in mind, not treated as a test-file afterthought.
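A sketch of the input side of the contract, with hypothetical paths and sample names: the `job:` block keys fixtures by workflow input label, and collection fixtures pin the element identifiers that later appear in element-level assertions.

```yaml
# workflow-tests.yml (fragment) -- hypothetical labels, identifiers, and paths
- doc: fixture shape mirrors the workflow input interface
  job:
    reference genome: hg38              # string input, matched by label
    fastq PE collection GEX:
      class: Collection
      collection_type: list:paired
      elements:
        - identifier: sample1           # this identifier is part of the contract
          elements:
            - identifier: forward
              path: test-data/sample1_R1.fastq.gz
            - identifier: reverse
              path: test-data/sample1_R2.fastq.gz
```

Changing either an input label or a fixture element identifier invalidates the test, even if the workflow logic is untouched.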
4. Distribution plan
No formal content/patterns/*.md pages are recommended from this issue right now. These findings are cross-cutting workflow-testability guidance, not operation-anchored Galaxy construction patterns as defined in docs/PATTERNS.md.
| Finding | Permanent home | Integration action |
|---|---|---|
| Labels as test API | galaxy-workflow-testability-design, with shortcut warning retained in iwc-shortcuts-anti-patterns | New durable note owns design guidance; anti-pattern note keeps accepted/smell wording. |
| Promote checkpoint outputs | galaxy-workflow-testability-design | New durable note owns the workflow-authoring rule and evidence. |
| Stable collection output identifiers | galaxy-workflow-testability-design, cross-linked from iwc-test-data-conventions | Design note owns output-side stability; test-data note keeps YAML shapes. |
| Assertion strength affects checkpoint choice | galaxy-workflow-testability-design, cross-linked from planemo-asserts-idioms | Assertion note keeps assertion families; design note explains upstream checkpoint selection. |
| Fixture shape constrains workflow inputs | iwc-test-data-conventions plus galaxy-workflow-testability-design | Test-data note owns input fixture YAML; design note owns workflow interface implications. |
| Planemo missing-output ambiguity | planemo-workflow-test-architecture | Architecture note should mention label drift and omitted workflow outputs as likely causes. |
| Mold auto-load behavior | implement-galaxy-workflow-test frontmatter references | Add design note as on-demand research reference; avoid expanding Mold body. |
5. Open questions
- Should galaxy-workflow-testability-design be loaded only by implement-galaxy-workflow-test, or also by upstream workflow-construction Molds that choose outputs before tests exist?
- Should the survey stay as a draft evidence hub, or become `stale` after the durable note absorbs the guidance?
- Should we add a separate workflow-authoring research note later for user-facing output curation, or is testability-specific output clutter enough for this note?