TEST_FORMAT_SCHEMA_PARITY_FINDINGS

Test-format schema parity: TS ajv vs Python Pydantic divergence

Debrief from mirroring Galaxy’s Python IWC sweep tests into the TS monorepo (packages/cli/test/iwc-sweep.test.ts). Feeds directly into TEST_FORMAT_EFFECT_SCHEMA_PLAN — see “Implication” below.

What was done

The divergence the new sweep surfaced

Ran both validators over all 119 IWC tests files:

verdict                                           count
both reject (faithful)                            46
Python rejects, TS accepts (TS too permissive)    30
TS rejects, Python accepts (TS too strict)        5
Python valid / TS valid totals                    43 / 68

So TS and Python disagree on 35 files, in both directions.
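The counts above can be cross-checked with a small tally. This is only an arithmetic sketch: the per-file verdict pairs are rebuilt from the reported counts, not read from the sweep, and the "both accept" figure is derived (119 minus the three reported buckets), not stated in the sweep output.

```python
from collections import Counter


def classify(python_ok: bool, ts_ok: bool) -> str:
    """Bucket one file by the pair of validator verdicts."""
    if python_ok and ts_ok:
        return "both accept"
    if not python_ok and not ts_ok:
        return "both reject"
    return "TS too permissive" if ts_ok else "TS too strict"


TOTAL = 119
reported = {"both reject": 46, "TS too permissive": 30, "TS too strict": 5}
both_accept = TOTAL - sum(reported.values())  # derived, not reported

# Synthetic (python_ok, ts_ok) pairs that reproduce the reported buckets.
pairs = (
    [(False, False)] * reported["both reject"]
    + [(False, True)] * reported["TS too permissive"]
    + [(True, False)] * reported["TS too strict"]
    + [(True, True)] * both_accept
)

tally = Counter(classify(p, t) for p, t in pairs)
python_valid = sum(1 for p, _ in pairs if p)
ts_valid = sum(1 for _, t in pairs if t)
disagreements = tally["TS too permissive"] + tally["TS too strict"]

print(both_accept, python_valid, ts_valid, disagreements)
```

The derived totals line up with the "43 / 68" row and the 35 disagreements.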

Root cause

TS validates with ajv against Tests.model_json_schema() (packages/schema/src/test-format/tests.schema.json, generated by scripts/dump-test-format-schema.py). Python validates with Tests.model_validate() at runtime. JSON Schema is structural and cannot encode imperative Pydantic logic, so the exported artifact is lossy. Two mechanisms, both confirmed against real IWC data:

1. Callable Discriminator functions are not serializable

galaxy.tool_util_models.__init__:

_discriminate_output(v): dict with class == "Collection" → Collection model; else → File model (extra="forbid"). So a class-less {element_tests: {...}} output is routed to the File model at runtime and rejected (“Extra inputs are not permitted”). model_json_schema() can’t emit the callable, so it degrades the union to a plain oneOf of the members. ajv tries all branches, finds element_tests matches TestCollectionOutputAssertions cleanly → accepts. → the 30 too-permissive files (collection assertions written without class: Collection).

Inverse: ajv oneOf requires exactly one match / applies every branch’s additionalProperties:false; the runtime discriminator cleanly picks one. This contributes to the 5 too-strict files.
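A toy model of the two dispatch strategies, to make the asymmetry concrete. It does not use Pydantic or ajv; the key sets FILE_KEYS and COLLECTION_KEYS are hypothetical stand-ins for the real models, and only the routing rule (class == "Collection" goes to Collection, everything else to File) is taken from the description above.

```python
# Hypothetical allowed-key sets standing in for two strict (extra="forbid") models.
FILE_KEYS = {"class", "path", "asserts"}
COLLECTION_KEYS = {"class", "collection_type", "element_tests"}
BRANCHES = {"File": FILE_KEYS, "Collection": COLLECTION_KEYS}


def matches(value: dict, allowed: set) -> bool:
    """Strict-model check: no keys outside the allowed set."""
    return set(value) <= allowed


def validate_with_discriminator(value: dict):
    """Runtime-style dispatch: pick exactly one model first, then validate."""
    name = "Collection" if value.get("class") == "Collection" else "File"
    return name if matches(value, BRANCHES[name]) else None


def validate_with_one_of(value: dict):
    """oneOf-style: try every branch, accept only if exactly one matches."""
    hits = [name for name, keys in BRANCHES.items() if matches(value, keys)]
    return hits[0] if len(hits) == 1 else None


# Collection assertions written without class: Collection.
classless = {"element_tests": {"forward": {"asserts": []}}}
print(validate_with_discriminator(classless))  # None: routed to File, extra key rejected
print(validate_with_one_of(classless))         # Collection: the only branch that fits

# In this toy, an empty object fits both branches, so oneOf rejects it
# while the discriminator routes it to File and accepts.
print(validate_with_discriminator({}))  # File
print(validate_with_one_of({}))         # None
```

The first pair of calls mirrors the too-permissive direction; the second pair shows how exactly-one-match can be stricter than a discriminator. Whether the real models behave this way on an empty object is not established here.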

2. @model_validator(mode="before") normalizers are absent from the schema

TestCollectionOutputAssertions._normalize_type_alias → normalize_collection_type_alias maps the legacy type key to collection_type on Collection shapes before validation (IWC authors write type: paired on nested class: Collection elements). This is pure Python and invisible to model_json_schema(). ajv sees type as an undeclared key on a strict Collection model → rejects valid input. → the rest of the 5 too-strict files (e.g. microbiome/binning-evaluation, Scaffolding-HiC-VGP8).
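A minimal sketch of what such a pre-validation pass could look like on plain dicts. The function name normalize_type_alias, the recursive walk over the whole tree, and the choice to leave the dict untouched when collection_type is already present are assumptions for illustration; Galaxy's actual normalize_collection_type_alias may differ.

```python
from typing import Any


def normalize_type_alias(node: Any) -> Any:
    """Return a copy in which Collection-shaped dicts use collection_type
    instead of the legacy type key. The input is not mutated."""
    if isinstance(node, list):
        return [normalize_type_alias(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: normalize_type_alias(value) for key, value in node.items()}
    # Assumption: only rename when the target key is not already set.
    if out.get("class") == "Collection" and "type" in out and "collection_type" not in out:
        out["collection_type"] = out.pop("type")
    return out


doc = {
    "class": "Collection",
    "collection_type": "list:paired",
    "element_tests": {
        # Legacy spelling on a nested Collection element.
        "sample1": {"class": "Collection", "type": "paired", "element_tests": {}},
    },
}
normalized = normalize_type_alias(doc)
print(normalized["element_tests"]["sample1"])
```

Running a pass like this before a structural validator would let the strict Collection schema see only declared keys.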

Implication for TEST_FORMAT_EFFECT_SCHEMA_PLAN

The plan’s Option A (generate Effect Schema from the emitted JSON Schema) cannot achieve parity: the loss happens upstream, in model_json_schema(), before TS sees anything. Swapping ajv for Effect Schema fed the same JSON artifact reproduces all 35 divergences.

To actually match Python, the generator must capture the discriminator + before-validator semantics, i.e. Option B (walk Tests.model_fields / TypeAdapter core schema in Python and emit the discriminator dispatch + the type → collection_type alias), or a hand-mirror. Minimum faithful set:

  1. class-keyed dispatch for the output union and collection-element union (emit if/then/else on class, no fall-through) instead of oneOf.
  2. a pre-validation type → collection_type normalization pass.

Status of the divergent files

Not a regression in either validator — these are genuine IWC authoring issues (class-less collection assertions; legacy type: aliases) that Python’s model deliberately handles (alias) or rejects (missing class). The sweep is a gated triage harness (skipped without GALAXY_TEST_IWC_DIRECTORY), matching Python — it is expected to fail while drift exists.

Fix implemented (mechanism #1)

scripts/dump-test-format-schema.py now post-processes the dumped schema: the two callable-discriminator oneOf unions (output assertions, collection elements) are rewritten into if/then/else keyed on class (== “Collection” → Collection model; other object → File model; non-object → scalar branch for the output union). Regenerated via make sync-test-format-schema. The callable discriminators stay upstream in the model — only their JSON-Schema projection is made faithful (the callables are justified: a Pydantic string discriminator can’t default class-less dicts to File, nor route scalar members).
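The shape of the rewritten node can be sketched as follows. This is not the code in scripts/dump-test-format-schema.py; the helper name class_dispatch and the $defs references are hypothetical, and only the routing rule (class == "Collection", other object, non-object) comes from the description above.

```python
import json


def class_dispatch(collection_ref: str, file_ref: str, scalar_schemas=()) -> dict:
    """Build an if/then/else node keyed on class, with no fall-through."""
    by_class = {
        # Matches only objects that carry class == "Collection".
        "if": {
            "type": "object",
            "properties": {"class": {"const": "Collection"}},
            "required": ["class"],
        },
        "then": {"$ref": collection_ref},
        # Every other object, including class-less ones, goes to File.
        "else": {"$ref": file_ref},
    }
    if not scalar_schemas:
        return by_class
    # Output union: non-objects are routed to the scalar members instead.
    return {
        "if": {"type": "object"},
        "then": by_class,
        "else": {"anyOf": list(scalar_schemas)},
    }


# Hypothetical $defs names, for illustration only.
node = class_dispatch(
    "#/$defs/TestCollectionOutputAssertions",
    "#/$defs/TestDataOutputAssertions",
    scalar_schemas=[{"type": "string"}],
)
print(json.dumps(node, indent=2))
```

Because each value reaches exactly one branch, a class-less object with element_tests is checked against the File schema only, which is what the runtime discriminator does.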

Parity result (119 IWC tests files)

                                              before  after
both reject (agreed)                          46      76
Python rejects / TS accepts (too permissive)  30      0
TS rejects / Python accepts (too strict)      5       5

The 30 false-accepts are closed (TS now catches the real class-less collection-assertion bugs). The remaining 5 are exactly the type → collection_type alias files (mechanism #2), which are descoped: those IWC files are to be fixed in IWC, not worked around in TS. They are a known TS-stricter-than-Python gap that persists only because Galaxy’s normalize_collection_type_alias before-validator still tolerates the alias; removing that upstream would make both sides agree (separate decision). No regressions: 4895 schema tests pass; validate-tests unit tests pass; if/then/else routing verified positively.

Repro

# TS (built dist required): packages/cli
GALAXY_TEST_IWC_DIRECTORY=~/projects/repositories/iwc \
  pnpm exec vitest run test/iwc-sweep.test.ts -t "tests-file validation"
# Python: galaxy wf_tool_state worktree, .venv
PYTHONPATH=lib .venv/bin/python -c "from galaxy.tool_util.workflow_state.validation_tests import load_tests_file, validate_tests_file; ..."