Test-format schema parity: TS ajv vs Python Pydantic divergence
Debrief from mirroring Galaxy’s Python IWC sweep tests into the TS monorepo
(packages/cli/test/iwc-sweep.test.ts). Feeds directly into
TEST_FORMAT_EFFECT_SCHEMA_PLAN — see “Implication” below.
What was done
- Fixed a real latent bug in the TS strict-encoding sweep (ran checkStrictEncoding on raw .ga; every step’s tool_state is a JSON string → would fail 119/120). Now cleans first, mirroring Python TestIWCSweepStrictEncodingClean. Also fixed strict (all): structure → clean → encoding order, and a dead skip filter (r.status === "skip" never matched; now SKIP_STATUSES.has).
- Added a tests-file sweep mirroring Python TestIWCSweepValidateTests (discover *-tests.yml / -test.yml, validate vs Tests schema).
The divergence the new sweep surfaced
Ran both validators over all 119 IWC tests files:
| verdict | count |
|---|---|
| both reject (faithful) | 46 |
| Python rejects, TS accepts (TS too permissive) | 30 |
| TS rejects, Python accepts (TS too strict) | 5 |
| Python valid / TS valid totals | 43 / 68 |
So TS and Python disagree on 35 files, in both directions.
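The totals row and the 35-file figure follow from the three reported buckets plus an implied both-accept bucket that the table does not list. A quick arithmetic check (variable names are illustrative, not taken from the sweep code):

```python
# Counts copied from the verdict table; both_accept is derived, not reported.
total_files = 119
both_reject = 46
python_rejects_ts_accepts = 30  # TS too permissive
ts_rejects_python_accepts = 5   # TS too strict

both_accept = (
    total_files
    - both_reject
    - python_rejects_ts_accepts
    - ts_rejects_python_accepts
)

# A validator's "valid" total is both-accept plus the files only it accepts.
python_valid = both_accept + ts_rejects_python_accepts
ts_valid = both_accept + python_rejects_ts_accepts
disagreements = python_rejects_ts_accepts + ts_rejects_python_accepts

print(both_accept, python_valid, ts_valid, disagreements)  # 38 43 68 35
```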
Root cause
TS validates with ajv against Tests.model_json_schema()
(packages/schema/src/test-format/tests.schema.json, generated by
scripts/dump-test-format-schema.py). Python validates with
Tests.model_validate() at runtime. JSON Schema is structural and cannot
encode imperative Pydantic logic, so the exported artifact is lossy. Two
mechanisms, both confirmed against real IWC data:
1. Callable Discriminator functions are not serializable
galaxy.tool_util_models.__init__:
- TestOutputAssertions = tagged union Discriminator(_discriminate_output)
- TestCollectionElementAssertion = tagged union Discriminator(_discriminate_collection_element)
_discriminate_output(v): dict with class == "Collection" → Collection
model; else → File model (extra="forbid"). So a class-less
{element_tests: {...}} output is routed to the File model at runtime and
rejected (“Extra inputs are not permitted”). model_json_schema() can’t emit
the callable, so it degrades the union to a plain oneOf of the members. ajv
tries all branches, finds element_tests matches TestCollectionOutputAssertions
cleanly → accepts. → the 30 too-permissive files (collection assertions
written without class: Collection).
Inverse: ajv’s oneOf requires exactly one matching branch and applies every branch’s
additionalProperties: false, whereas the runtime discriminator picks a single branch
up front. This contributes to the 5 too-strict files.
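The too-permissive direction can be reproduced with a toy model of the two dispatch styles. This is a minimal stdlib sketch, not Galaxy's code: the key sets and function names below are hypothetical stand-ins, and "strict" is reduced to a no-unknown-keys check in place of extra="forbid".

```python
# Hypothetical, heavily reduced key sets for the two union members.
FILE_KEYS = {"class", "path", "asserts"}
COLLECTION_KEYS = {"class", "collection_type", "element_tests"}


def fits(value: dict, allowed: set) -> bool:
    """Strict-model stand-in: reject any key outside the allowed set."""
    return set(value) <= allowed


def validate_runtime(value: dict) -> bool:
    """Callable-discriminator style: choose one branch from `class`, then validate."""
    branch = COLLECTION_KEYS if value.get("class") == "Collection" else FILE_KEYS
    return fits(value, branch)


def validate_one_of(value: dict) -> bool:
    """Plain oneOf style: try every branch, accept if exactly one matches."""
    matches = [fits(value, FILE_KEYS), fits(value, COLLECTION_KEYS)]
    return matches.count(True) == 1


# A collection assertion written without `class: Collection`.
classless = {"element_tests": {"forward": {"asserts": []}}}

print(validate_runtime(classless))  # False: routed to File, element_tests is extra
print(validate_one_of(classless))   # True: only the Collection branch matches
```

The sketch only shows why losing the dispatch function changes the verdict; the real models carry many more fields and constraints.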
2. @model_validator(mode="before") normalizers are absent from the schema
TestCollectionOutputAssertions._normalize_type_alias →
normalize_collection_type_alias maps legacy type: → collection_type on
Collection shapes before validation (IWC authors write type: paired on
nested class: Collection elements). Pure Python, invisible to
model_json_schema(). ajv sees type as an undeclared key on a strict
Collection model → rejects valid input. → the rest of the 5 too-strict files
(e.g. microbiome/binning-evaluation, Scaffolding-HiC-VGP8).
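The effect of a before-validator can be sketched the same way. The function below is a hypothetical stand-in for normalize_collection_type_alias; its behaviour when both keys are present is an assumption, and the key set is a reduced placeholder for the real Collection model.

```python
# Hypothetical reduced key set for the strict Collection model.
COLLECTION_KEYS = {"class", "collection_type", "element_tests"}


def normalize_type_alias(value):
    """Stand-in for the before-validator: move legacy `type` to `collection_type`.

    Assumption: an explicit `collection_type` wins and `type` is left untouched.
    """
    if not isinstance(value, dict):
        return value
    if "type" in value and "collection_type" not in value:
        value = dict(value)  # copy so the caller's dict is not mutated
        value["collection_type"] = value.pop("type")
    return value


def strict_ok(value: dict) -> bool:
    """Schema-only view: any undeclared key is rejected."""
    return set(value) <= COLLECTION_KEYS


element = {"class": "Collection", "type": "paired", "element_tests": {}}

print(strict_ok(element))                        # False: `type` is undeclared
print(strict_ok(normalize_type_alias(element)))  # True: alias mapped first
```

A schema-only validator sees the first case; a runtime validator that normalizes before checking sees the second.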
Implication for TEST_FORMAT_EFFECT_SCHEMA_PLAN
The plan’s Option A (generate Effect Schema from the emitted JSON Schema)
cannot achieve parity: the loss happens upstream, in
model_json_schema(), before TS sees anything. Swapping ajv for an Effect Schema
built from the same JSON artifact reproduces all 35 divergences.
To actually match Python, the generator must capture the discriminator +
before-validator semantics, i.e. Option B (walk Tests.model_fields /
TypeAdapter core schema in Python and emit the discriminator dispatch + the
type→collection_type alias), or a hand-mirror. Minimum faithful set:
- class-keyed dispatch for the output union and collection-element union (emit if/then/else on class, no fall-through) instead of oneOf.
- a pre-validation type→collection_type normalization pass.
Status of the divergent files
Not a regression in either validator — these are genuine IWC authoring issues
(class-less collection assertions; legacy type: aliases) that Python’s model
deliberately handles (alias) or rejects (missing class). The sweep is a
gated triage harness (skipped without GALAXY_TEST_IWC_DIRECTORY), matching
Python — it is expected to fail while drift exists.
Fix implemented (mechanism #1)
scripts/dump-test-format-schema.py now post-processes the dumped schema:
the two callable-discriminator oneOf unions (output assertions, collection
elements) are rewritten into if/then/else keyed on class (== “Collection”
→ Collection model; other object → File model; non-object → scalar branch for
the output union). Regenerated via make sync-test-format-schema. The callable
discriminators stay upstream in the model — only their JSON-Schema projection
is made faithful (the callables are justified: a Pydantic string discriminator
can’t default class-less dicts to File, nor route scalar members).
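The shape of the rewritten union can be sketched as a small builder. This is not the actual code in scripts/dump-test-format-schema.py: the function name, the File model ref, and the scalar branch are hypothetical placeholders, and only the if/then/else layout follows the description above.

```python
import json


def class_dispatch(collection_ref, file_ref, scalar_branches=None):
    """Build an if/then/else schema keyed on `class` (illustrative sketch)."""
    is_collection = {
        "type": "object",
        "properties": {"class": {"const": "Collection"}},
        "required": ["class"],
    }
    # Objects: class == "Collection" -> Collection model, anything else -> File model.
    object_dispatch = {
        "if": is_collection,
        "then": {"$ref": collection_ref},
        "else": {"$ref": file_ref},
    }
    if not scalar_branches:
        return object_dispatch
    # Output union only: non-objects go to the scalar members instead.
    return {
        "if": {"type": "object"},
        "then": object_dispatch,
        "else": {"anyOf": scalar_branches},
    }


schema = class_dispatch(
    "#/$defs/TestCollectionOutputAssertions",
    "#/$defs/FileModelPlaceholder",         # hypothetical ref name
    scalar_branches=[{"type": "string"}],  # hypothetical scalar member
)
print(json.dumps(schema, indent=2))
```

Because `required: ["class"]` sits inside the `if`, a class-less object fails the condition and lands on the File branch, which is the no-fall-through behaviour the runtime discriminator has.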
Parity result (119 IWC tests files)
| verdict | before | after |
|---|---|---|
| both reject (agreed) | 46 | 76 |
| Python rejects / TS accepts (too permissive) | 30 | 0 |
| TS rejects / Python accepts (too strict) | 5 | 5 |
The 30 false-accepts are closed (TS now catches the real class-less
collection-assertion bugs). The remaining 5 are exactly the type:→collection_type
alias files (mechanism #2), which are descoped: those IWC files are to be fixed in
IWC, not worked around in TS. They are a known TS-stricter-than-Python gap that
persists only because Galaxy’s normalize_collection_type_alias before-validator
still tolerates the alias; removing that upstream would make both
sides agree (separate decision). No regressions: 4895 schema tests pass;
validate-tests unit tests pass; if/then/else routing verified positively.
Repro
# TS (built dist required): packages/cli
GALAXY_TEST_IWC_DIRECTORY=~/projects/repositories/iwc \
pnpm exec vitest run test/iwc-sweep.test.ts -t "tests-file validation"
# Python: galaxy wf_tool_state worktree, .venv
PYTHONPATH=lib .venv/bin/python -c "from galaxy.tool_util.workflow_state.validation_tests import load_tests_file, validate_tests_file; ..."