Improved Test Model Plan
Independent follow-up to TEST_VALIDATION_PLAN.md §“TestOutputAssertions depth”. Goal: make TestOutputAssertions errors usable by collapsing the un-discriminated Union into a tagged one. Secondary: parity fixes for the nested-element variants.
Target file: lib/galaxy/tool_util_models/__init__.py:185-235.
Status: IMPLEMENTED — commit 0adf9f776c on branch wf_test_job_schema. See §8 for what diverged.
0. Verified claims
TestOutputAssertions = Union[TestCollectionOutputAssertions, TestDataOutputAssertions, TestOutputLiteral]— confirmed (__init__.py:235). Un-discriminated.class_defaults:"File"onTestDataOutputAssertions(__init__.py:202) and"Collection"onTestCollectionOutputAssertions(__init__.py:225). BothOptional[Literal[...]] = Field(..., alias="class")—class:key is optional in YAML.TestCollectionCollectionElementAssertionshas noclass_field (__init__.py:205-207). Nested siblings:TestCollectionDatasetElementAssertions(BaseTestOutputModel)— inherits noclass_either (onlyTestDataOutputAssertionsadds it). Nested dataset elements therefore also cannot carryclass: File.TestCollectionElementAssertion = Union[TestCollectionDatasetElementAssertions, TestCollectionCollectionElementAssertions]— also un-discriminated.
_discriminate_fileprecedent:test_job.py:102-132,Annotated[Union[Tag(...), ...], Discriminator(callable)]— in-tree pattern.- Existing
Field(discriminator="class_")string-form precedent:__init__.py:159(DynamicToolSources),test_job.py:146(CollectionElement). StrictModel→extra="forbid"(_base.py:17-18).- Assertions root model is discriminated on
"that"— verified atassertions.py:1-60(auto-generated,AssertionModelusesextra="forbid"). 3641-line file, not touching. Tests/TestJobexternal consumers:test/unit/tool_util_models/test_simple_parse.py,test/unit/tool_util/test_test_format_model.py, andlib/galaxy_test/workflow/test_framework_workflows.py:17(likelyUserToolSourcebundle; not the assertion types — verify).
1. Discriminator strategy
Use a callable Discriminator on TestOutputAssertions, three tags: "File", "Collection", "scalar".
def _discriminate_output(v):
if isinstance(v, dict):
cls = v.get("class")
if cls == "Collection":
return "Collection"
return "File" # default — class absent or class: "File"
if isinstance(v, TestCollectionOutputAssertions):
return "Collection"
if isinstance(v, TestDataOutputAssertions):
return "File"
if isinstance(v, (bool, int, float, str)):
return "scalar"
return None
TestOutputAssertions = Annotated[
Union[
Annotated[TestCollectionOutputAssertions, Tag("Collection")],
Annotated[TestDataOutputAssertions, Tag("File")],
Annotated[TestOutputLiteral, Tag("scalar")],
],
Discriminator(_discriminate_output),
]
Why callable and not Field(discriminator="class_"):
- The scalar branch has no
class_field. class_is optional everywhere; string-form discriminator requires the key present.- Pydantic’s string discriminator does not see defaults at discrimination time.
An unknown class: value (e.g. class: Banana) falls through to "File" → user gets a clean “literal_error: expected ‘File’” on the class_ field rather than three parallel tracebacks. Acceptable.
Same treatment applies to TestCollectionElementAssertion — but the two inner types currently share zero distinguishing keys. See §2.
2. Nested element fix — TestCollectionCollectionElementAssertions
Parallels TestCollectionOutputAssertions but omits class_. No surrounding tests exercise the nested element shape. Looks like oversight, not deliberate.
Recommended: add the field, symmetric with outer.
class TestCollectionCollectionElementAssertions(StrictModel):
class_: Optional[Literal["Collection"]] = Field("Collection", alias="class")
elements: Optional[Dict[str, "TestCollectionElementAssertion"]] = None
element_tests: Optional[Dict[str, "TestCollectionElementAssertion"]] = None
And give the dataset nested variant an explicit class_ too (on TestCollectionDatasetElementAssertions since the base model intentionally doesn’t have one — keep base bare, push to the concrete variant):
class TestCollectionDatasetElementAssertions(BaseTestOutputModel):
class_: Optional[Literal["File"]] = Field("File", alias="class")
Then apply the same _discriminate_output-style callable to TestCollectionElementAssertion so nested-element diagnostics match.
Risk: if any real test YAML has nested collection elements with unknown keys that currently parse as TestCollectionDatasetElementAssertions (the Union leftmost winner) by accident, adding strict class_ won’t change that path — class_ defaults to "File" so omitted-class still goes to dataset branch. Should be pure improvement.
3. Backwards compatibility
Grep for TestOutputAssertions / TestCollectionOutputAssertions / TestDataOutputAssertions / TestCollectionCollectionElementAssertions: only __init__.py references them. No external importers.
The discriminator change is additive for valid inputs: same models still accept same dicts. Only the error-surface shape changes. model_dump() round-trips still work — Annotated[... Tag(), Discriminator()] is dump-transparent.
model_json_schema() output changes from flat anyOf to oneOf with a discriminator property (see §5) — any consumer parsing that schema needs to handle discriminator, but we control both sides.
4. Test impact & new tests
Existing:
test/unit/tool_util_models/test_simple_parse.py— one positive case, passes under either model.test/unit/tool_util/test_test_format_model.py— runsTests.model_validateagainstlib/galaxy_test/workflow/*.gxwf-tests.ymland optionally IWC. Positive-only; should continue passing. Verify locally.
New tests (red-to-green, drop in test/unit/tool_util_models/ or test/unit/tool_util/):
- Positive: implicit-File output (no
class:key) validates asTestDataOutputAssertions. - Positive: explicit
class: Collectionwithelements:validates asTestCollectionOutputAssertions. - Positive: scalar literals (
True,42,"hello") validate asTestOutputLiteral. - Positive: nested collection element with
class: Collectionvalidates (regression guard for §2 fix). - Negative (the crux):
Fileoutput with unknown field — assertValidationErrorcontains ONE error targetingTestDataOutputAssertions(not three), and the error string does NOT include_modelfan-out:errs = exc.value.errors() assert len(errs) == 1 assert errs[0]["loc"][:2] == ("outputs", "out") assert "File" in str(errs[0]["loc"]) # discriminator tag present - Negative: bad
class: Banana— one error onclass_literal, not three branches. - Negative:
asserts: [{that: "not_a_real_assert"}]— verify error is scoped toassertsand doesn’t bleed into Collection / scalar branches. - Snapshot-ish: assert
Tests.model_json_schema()contains adiscriminatorkey at the output-assertions union site.
All additions are in-package; no integration-test impact. Run pytest test/unit/tool_util_models test/unit/tool_util/test_test_format_model.py before commit.
5. JSON Schema export impact (shared benefit)
- Current:
Tests.model_json_schema()emitsanyOf: [CollectionOutput, DataOutput, {type: [bool, int, number, string]}]for each output value. - After:
oneOf: [...]with adiscriminator: {propertyName: "class", mapping: {File: "#/$defs/TestDataOutputAssertions", Collection: "#/$defs/TestCollectionOutputAssertions"}}for the model arms. Callable discriminators may not export a JSON-Schema-level discriminator automatically — verify empirically, and if missing addjson_schema_extra={"discriminator": {...}}override.
Benefits for downstream:
- Ajv picks the right branch by
class— single-path error instead of all-branches-failed. - TS codegen emits a proper tagged union (
{ class: 'File', ... } | { class: 'Collection', ... } | boolean | number | string). - The
TestOutputLiteralarm remains a scalar union — fine.
6. Implementation order (red-to-green)
- Write new negative test #5 first against current code — confirm it fails in the bad way (triple error,
_modelfan-out string). Locks the regression baseline. - Write positive tests #1-#4 — confirm #1-#3 pass; #4 (nested
class: Collection) fails on current code. Leave red. - Apply §2 fix: add
class_toTestCollectionCollectionElementAssertionsandTestCollectionDatasetElementAssertions. Run tests — nested #4 passes; others still pass. - Apply §1 discriminator on outer
TestOutputAssertions. Run #5-#7. Should go green. - Apply discriminator on inner
TestCollectionElementAssertion(same callable pattern, tags"File"/"Collection"). Run nested negative tests. - Write test #8 (JSON Schema). Inspect schema; if no discriminator key emitted for callable case, add
json_schema_extraoverride. - Run existing
test_test_format_model.pyandtest_simple_parse.py. Spot-check IWC tree if env available. - Commit.
7. Unresolved questions
- Require
class:key explicitly (break implicit-File)? Would let us use string-formField(discriminator="class_")and skip callable. Probably no — IWC test YAMLs routinely omit it. - Pydantic callable-
DiscriminatorJSON Schema: auto-emitdiscriminatorproperty or needjson_schema_extra? Verify in step 6. TestOutputLiteraltag — single"scalar"or split bool/int/float/str? Single simpler; split gives better TS types but clutters errors.BaseTestOutputModelitself growclass_instead of duplicating on two subclasses? Cleaner, but changesTestCollectionDatasetElementAssertions’s class_ default semantics. Recommend: duplicate on subclasses, keep base bare.- Rename
TestOutputAssertions(Union) vsTestDataOutputAssertions/TestCollectionOutputAssertions(classes)? Confusing but out of scope.
Key paths
lib/galaxy/tool_util_models/__init__.py:185-250lib/galaxy/tool_util_models/test_job.py:102-147(precedent)lib/galaxy/tool_util_models/_base.py:17-18(StrictModel)lib/galaxy/tool_util_models/assertions.py:1-60(AssertionModel; don’t touch)test/unit/tool_util/test_test_format_model.py(regression guard)test/unit/tool_util_models/test_simple_parse.py(regression guard)test/unit/tool_util_models/test_output_assertions.py(new — 11 tests added in commit0adf9f776c)
8. Implementation debrief (2026-04-21)
Landed in commit 0adf9f776c on wf_test_job_schema. Full regression suite (test/unit/tool_util_models + test/unit/tool_util/test_test_format_model.py): 32 passed, 1 skipped.
Deviations from plan
A. Three gxwf-tests YAMLs were buggy; fixed, not accommodated.
Plan §1 said class absent → default to File. Regression surfaced: three workflow test YAMLs had nested collection outputs with elements: but no class: Collection — the un-discriminated Union had been silently accepting them by trying branches until one worked. Rather than add a shape-peek fallback to the discriminator (initial pass did this; reverted on review), fixed the YAMLs:
lib/galaxy_test/workflow/collection_semantics_cat.gxwf-tests.yml—[3].outputs.wf_output.elements.el1lib/galaxy_test/workflow/subcollection_rank_sorting.gxwf-tests.yml—[2].outputs.out.elements.test_level_3lib/galaxy_test/workflow/subcollection_rank_sorting_paired.gxwf-tests.yml—[2].outputs.out.elements.test_level_3
Verified no other workflow YAMLs have the same omission, and no type: vs collection_type: confusion on class: Collection entries anywhere. Discriminators stay strict per plan: class == "Collection" → Collection, else → File.
B. Test loc-shape correction.
Plan §4 test #5 asserted errs[0]["loc"][:2] == ("outputs", "out"). Actual pydantic loc begins with the RootModel list index, so (0, "outputs", "out", "File", ...). Tests were updated to match. No behavioral impact — just mechanical.
C. BaseTestOutputModel stays bare; class_ duplicated on subclasses.
Plan §7 asked this as an open question; implementation went with “duplicate on subclasses”. TestCollectionDatasetElementAssertions now has class_: Optional[Literal["File"]] = Field("File", alias="class") explicitly. Keeps the base model flexible for any future subclass that doesn’t need a class tag.
D. JSON-schema test (#8) left loose.
Plan §5 flagged uncertainty about whether callable Discriminator auto-emits a JSON-Schema discriminator keyword. Test asserts "discriminator" in schema OR "oneOf" in schema — passes under pydantic 2.12 but doesn’t pin down which. Tightening requires inspecting actual schema output; deferred.
What matched the plan exactly
- §1 tag names (
"File"/"Collection"/"scalar") - §2 nested
class_additions (symmetric with outer) - §3 backwards-compat: all existing positive fixtures still parse
- §6 red-to-green order loosely followed — skipped the baseline-lock test at user’s direction and wrote final-shape tests directly
- No external consumers touched (confirmed —
__init__.pyis the only site that references these names)
Updated unresolved questions
- (Original)
TestOutputLiteraltag — single"scalar"or split? Kept single. Revisit if downstream TS consumers want tighter types. - (Original) Rename
TestOutputAssertionsvsTestDataOutputAssertions/TestCollectionOutputAssertions? Still out of scope. - (New) Tighten test #8 once pydantic 2.12’s emit shape is confirmed. Cheap follow-up.
- (New)
json_schema_extradiscriminator override needed for TS codegen downstream? Gated on whatever consumer surfaces the need.