IWC workflow testability survey
Source corpus: 120 cleaned gxformat2 workflows under $IWC_FORMAT2/, 120 skeletons under $IWC_SKELETONS/, and 115 sibling *-tests.yml files under $IWC/workflows/ as materialized in workflow-fixtures/iwc-src/workflows/. This survey supports galaxy-workflow-testability-design; it is an evidence hub, not a pattern-page proposal.
1. Scope
Topic type: workflow-shape concern. The question is not “which assertion should a test use?” That is covered by planemo-asserts-idioms. The question is “how should a Galaxy workflow be structured so useful tests can be written later?”
Evidence strategy, adapted from the /iwc-survey command:
- Skeleton scan first. Use `$IWC_SKELETONS/` to catalog workflow outputs, labels, subworkflow boundaries, and collection-producing recipes without paying full parameter-read cost.
- Sibling-test scan second. Use `*-tests.yml` files to identify which workflow outputs are actually asserted, which collection element identifiers matter, and which assertion families imply deterministic or stochastic behavior.
- Selective full workflow reads. Read full `$IWC_FORMAT2` workflows only for examples where output promotion, labels, or collection topology need line-level confirmation.
- Out of scope. Do not re-survey the assertion vocabulary itself, fixture-hosting conventions, checksum/negative-test absences, or shortcut-vs-smell calls already pinned in iwc-test-data-conventions, planemo-asserts-idioms, and iwc-shortcuts-anti-patterns.
2. Corpus counts
The first full pass matched top-level workflow `outputs:` labels against sibling test `outputs:` keys.
| Measure | Count | Interpretation |
|---|---|---|
| Cleaned format2 workflows | 120 | Survey denominator for workflow structure. |
| Workflows with sibling tests found by path convention | 114 | Six workflows lacked a matching sibling test file in this materialized corpus. |
| Total workflow-level outputs in those 114 workflows | 1,305 | IWC workflows expose many labeled outputs, including checkpoint/report outputs. |
| Distinct asserted output labels across sibling tests | 617 | Tests assert a selective subset of exposed workflow outputs. |
| Workflows where every asserted test output matched a workflow output label exactly | 114 / 114 | Output labels are the test API; no positional or unlabeled assertion route was observed. |
| Test files with `element_tests:` | 59 / 115 | Collection-shaped output testing is common. |
| `element_tests:` blocks | 227 | Stable collection element identifiers are central to testability. |
| Test files with inline `attributes: {collection_type: ...}` | 2 / 115 | Explicit collection-type assertions exist but are much rarer than element-keyed assertions; four inline occurrences appear across those files. |
Highest asserted-output examples:
| Workflow | Workflow outputs | Asserted outputs | Why it matters |
|---|---|---|---|
$IWC_FORMAT2/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-complete/mgnify-amplicon-pipeline-v5-complete.gxwf.yml | 83 | 38 | Large workflow exposing many domain outputs; tests select report, table, FASTA, HDF5/JSON, and collection checkpoints. |
$IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml | 21 | 21 | Every exposed output is asserted, including many mid-pipeline plots and AnnData checkpoints. |
$IWC_FORMAT2/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8.gxwf.yml | 48 | 16 | Workflow exposes many report/checkpoint outputs, while tests assert a diagnostic subset. |
$IWC_FORMAT2/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-rrna-prediction/mgnify-amplicon-pipeline-v5-rrna-prediction.gxwf.yml | 13 | 13 | Smaller report-heavy workflow where all outputs are test-facing. |
Assertion-family counts from sibling tests:
| Assertion family | Files | Line hits | Workflow-design implication |
|---|---|---|---|
has_text | 78 | 597 | Text/report checkpoints are broadly assertable with stable tokens. |
has_size | 57 | 201 | Binary/report/plot checkpoints often use size bands. |
has_n_lines | 40 | 146 | Line-count checkpoints are common for tabular/text outputs. |
has_line | 23 | 69 | Deterministic table rows make stronger checkpoints than final reports. |
has_text_matching | 17 | 56 | Regex checkpoints cover numeric drift while preserving content checks. |
compare: sim_size | 9 | 26 | Stochastic or binary outputs may only support magnitude checks. |
compare: diff | 5 | 10 | Strict exact comparison is rare and format-specific. |
has_image_width / has_image_height | 1 | 15 each | Image checkpoints are mostly smoke-tested by size/dimensions in this corpus. |
3. Findings
3a. Workflow labels are the test API
Every asserted output key in the 114 matched workflow/test pairs resolved to a top-level workflow output label. The direct coupling is visible in Scanpy: the workflow exposes outputs like Initial Anndata General Info, UMAP of louvain, Ranked genes with Wilcoxon test, and Dotplot of top genes on clusters at $IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml:105-147; the test file keys assertions by those labels at $IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:27-205.
VGP scaffolding shows the same with punctuation-heavy labels: workflow outputs include Hi-C duplication stats on scaffolds: Raw, Hi-C duplication stats on scaffolds: MultiQc, and Merged Alignment stats at $IWC_FORMAT2/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8.gxwf.yml:170-196; tests assert those exact labels at $IWC/workflows/VGP-assembly-v2/Scaffolding-HiC-VGP8/Scaffolding-HiC-VGP8-tests.yml:218-245.
Design implication: input/output labels are stable interface names. Renaming is a test-breaking API change.
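A minimal sketch of this coupling, using hypothetical labels (`qc_report` and the `fastqc` step are illustrative, not from the corpus): the test file keys its assertions by the exact top-level workflow output label.

```yaml
# workflow.gxwf.yml (fragment) -- hypothetical labels
outputs:
  qc_report:
    outputSource: fastqc/html_file

# workflow-tests.yml (fragment)
- doc: assertions are keyed by the workflow output label, verbatim
  outputs:
    qc_report:                # renaming this label in the workflow breaks the test
      asserts:
        has_text:
          text: "FastQC Report"
```

Because the label is the only addressing mechanism observed in the corpus, there is no positional fallback if it drifts.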
3b. IWC promotes checkpoint outputs so tests can see them
Workflow tests can assert only workflow-level outputs, so IWC workflows expose intermediate summaries, plots, and diagnostics as top-level outputs.
Scanpy exposes a dense ladder of checkpoints: initial AnnData summaries, intermediate plots, ranked-gene tables, final AnnData, cluster-count tables, and final plots ($IWC_FORMAT2/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy.gxwf.yml:105-147). The sibling test asserts every one of the 21 workflow outputs, mixing HDF5 key probes, text probes, image dimensions, and line checks ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:27-205).
RNA-seq paired-end exposes mapped reads, stranded/unstranded coverage, abundance estimates, expression tables, counts tables, and MultiQC reports ($IWC_FORMAT2/transcriptomics/rnaseq-pe/rnaseq-pe.gxwf.yml:90-112). The sibling test asserts coverage sizes, mapped-read sizes, expression regexes, and a deterministic counts row ($IWC/workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml:48-97).
Design implication: promote the strongest useful checkpoint, not only the final human-facing report. The cost is output-list clutter; IWC tolerates that when it buys testability.
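A sketch of the promotion move, with hypothetical step and label names: workflow tests can only see labeled top-level outputs, so a deterministic intermediate is lifted alongside the human-facing report.

```yaml
# workflow.gxwf.yml (fragment) -- hypothetical labels
outputs:
  final_report:               # end product (HTML/plot); only weakly assertable
    outputSource: multiqc/html_report
  counts_table:               # promoted checkpoint: deterministic table that can
    outputSource: featurecounts/counts   # carry has_line / has_n_lines assertions
```

The `counts_table` entry exists purely so a stronger assertion has somewhere to attach; that is the output-list clutter IWC tolerates.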
3c. Collection-shaped outputs need stable element identifiers
The 10x CellPlex workflow exposes collection-shaped outputs such as Seurat input for gene expression (filtered), CITE-seq-Count report, and Seurat input for CMO (UMI) ($IWC_FORMAT2/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex.gxwf.yml:73-91). The test asserts collection shape with attributes: {collection_type: ...} and drills into element identifiers such as subsample, matrix, barcodes, and genes ($IWC/workflows/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex-tests.yml:82-128).
HyPhy shows identifier stability in a different form: the workflow emits meme_output, prime_output, busted_output, and fel_output collection outputs ($IWC_FORMAT2/comparative_genomics/hyphy/hyphy-core.gxwf.yml:26-38), and tests key element checks by generated gene identifiers such as NC_001477.1|capsid_protein_C|95-394_DENV1 ($IWC/workflows/comparative_genomics/hyphy/hyphy-core-tests.yml:31-71).
Design implication: generated workflows need deterministic collection element identifiers before test authoring begins. Otherwise element_tests: cannot target outputs cleanly.
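A sketch of the dependency, with hypothetical identifiers: `element_tests:` addresses collection members by identifier, so those identifiers must be fixed before the test is written.

```yaml
# workflow-tests.yml (fragment) -- hypothetical output and identifiers
outputs:
  matrix_collection:
    attributes: {collection_type: list}   # optional shape check, rare in corpus
    element_tests:
      barcodes:               # stable element identifier, not a positional index
        asserts:
          has_n_lines:
            n: 737280
            delta: 1000
      matrix:
        asserts:
          has_text:
            text: "%%MatrixMarket"
```

If the workflow generates element identifiers nondeterministically, there is no key to put under `element_tests:` at all.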
3d. Assertion strength feeds back into checkpoint choice
Scanpy plot outputs are mostly smoke-tested with has_size, has_image_width, and has_image_height tolerances ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:33-205). The same workflow also exposes stronger non-image checkpoints such as AnnData HDF5 keys and Number of cells per cluster line checks ($IWC/workflows/scRNAseq/scanpy-clustering/Preprocessing-and-Clustering-of-single-cell-RNA-seq-data-with-Scanpy-tests.yml:159-195).
RNA-seq paired-end pairs coarse size checks for BAM/bigWig-like outputs with stronger regex/line checks for expression/count tables ($IWC/workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml:48-97). MGnify complete uses MultiQC token checks, exact file/location comparisons, collection element checks, and table-shape assertions in the same test file ($IWC/workflows/amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-complete/mgnify-amplicon-pipeline-v5-complete-tests.yml:15-120).
Design implication: if final outputs are stochastic, binary, image-only, or report-heavy, expose an adjacent table/text/HDF5 checkpoint that can carry a stronger assertion.
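A sketch of the pairing, with hypothetical labels and illustrative numbers: the image gets a size band, while an adjacent table checkpoint carries the content check.

```yaml
# workflow-tests.yml (fragment) -- hypothetical labels and values
outputs:
  umap_plot:                  # image bytes drift between runs; size band only
    asserts:
      has_size:
        value: 150000
        delta: 50000
  cells_per_cluster:          # promoted table checkpoint: the real assertion
    asserts:
      has_line:
        line: "0\t1876"
```

The weak assertion on the plot is acceptable precisely because the strong assertion lives on the neighboring table output.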
3e. Fixture shape constrains workflow inputs
Tests supply job inputs by workflow input label. The 10x CellPlex test uses labels like fastq PE collection GEX, reference genome, gtf, cellranger_barcodes_3M-february-2018.txt, fastq PE collection CMO, sample name and CMO sequence collection, and Number of expected cells ($IWC/workflows/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex-tests.yml:2-75); the workflow declares matching inputs and types, including data, string, boolean, int, and collection inputs ($IWC_FORMAT2/scRNAseq/fastq-to-matrix-10x/scrna-seq-fastq-to-matrix-10x-cellplex.gxwf.yml:4-72).
HyPhy’s input collection identifiers contain pipes and accession-like labels ($IWC/workflows/comparative_genomics/hyphy/hyphy-core-tests.yml:7-30), while the workflow only declares the collection type ($IWC_FORMAT2/comparative_genomics/hyphy/hyphy-core.gxwf.yml:14-25). The test contract therefore lives partly in workflow input shape and partly in fixture element identifiers.
Design implication: workflow input labels and collection types should be designed with fixture authoring in mind, not treated as a test-file afterthought.
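A sketch of the input side of the contract, with hypothetical paths and sample names: the `job:` block keys fixtures by workflow input label, and collection fixtures pin the element identifiers that later appear in element-level assertions.

```yaml
# workflow-tests.yml (fragment) -- hypothetical labels, identifiers, and paths
- doc: fixture shape mirrors the workflow input interface
  job:
    reference genome: hg38              # string input, matched by label
    fastq PE collection GEX:
      class: Collection
      collection_type: list:paired
      elements:
        - identifier: sample1           # this identifier is part of the contract
          elements:
            - identifier: forward
              path: test-data/sample1_R1.fastq.gz
            - identifier: reverse
              path: test-data/sample1_R2.fastq.gz
```

Changing either an input label or a fixture element identifier invalidates the test, even if the workflow logic is untouched.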
4. Distribution plan
No formal content/patterns/*.md pages are recommended from this issue right now. These findings are cross-cutting workflow-testability guidance, not operation-anchored Galaxy construction patterns as defined in docs/PATTERNS.md.
| Finding | Permanent home | Integration action |
|---|---|---|
| Labels as test API | galaxy-workflow-testability-design, with shortcut warning retained in iwc-shortcuts-anti-patterns | New durable note owns design guidance; anti-pattern note keeps accepted/smell wording. |
| Promote checkpoint outputs | galaxy-workflow-testability-design | New durable note owns the workflow-authoring rule and evidence. |
| Stable collection output identifiers | galaxy-workflow-testability-design, cross-linked from iwc-test-data-conventions | Design note owns output-side stability; test-data note keeps YAML shapes. |
| Assertion strength affects checkpoint choice | galaxy-workflow-testability-design, cross-linked from planemo-asserts-idioms | Assertion note keeps assertion families; design note explains upstream checkpoint selection. |
| Fixture shape constrains workflow inputs | iwc-test-data-conventions plus galaxy-workflow-testability-design | Test-data note owns input fixture YAML; design note owns workflow interface implications. |
| Planemo missing-output ambiguity | planemo-workflow-test-architecture | Architecture note should mention label drift and omitted workflow outputs as likely causes. |
| Mold auto-load behavior | implement-galaxy-workflow-test frontmatter references | Add design note as on-demand research reference; avoid expanding Mold body. |
5. Open questions
- Should galaxy-workflow-testability-design be loaded only by implement-galaxy-workflow-test, or also by upstream workflow-construction Molds that choose outputs before tests exist?
- Should the survey stay as a draft evidence hub, or become `stale` after the durable note absorbs the guidance?
- Should we add a separate workflow-authoring research note later for user-facing output curation, or is testability-specific output clutter enough for this note?