Test-format schema parity: TS ajv vs Python Pydantic divergence

Debrief from mirroring Galaxy’s Python IWC sweep tests into the TS monorepo (packages/cli/test/iwc-sweep.test.ts). Feeds directly into TEST_FORMAT_EFFECT_SCHEMA_PLAN — see “Implication” below.

What was done

Fixed a real latent bug in the TS strict-encoding sweep: it ran checkStrictEncoding on raw .ga files, where every step’s tool_state is a JSON string, so it would fail 119/120. The sweep now cleans first, mirroring Python TestIWCSweepStrictEncodingClean. Also fixed the strict (all) sweep to run in structure → clean → encoding order, and repaired a dead skip filter (r.status === "skip" never matched; it now uses SKIP_STATUSES.has).
Added a tests-file sweep mirroring Python TestIWCSweepValidateTests (discover *-tests.yml / *-test.yml, validate against the Tests schema).

The divergence the new sweep surfaced

Ran both validators over all 119 IWC tests files:

verdict	count
both reject (faithful)	46
Python rejects, TS accepts (TS too permissive)	30
TS rejects, Python accepts (TS too strict)	5
Python valid / TS valid totals	43 / 68

So TS and Python disagree on 35 files, in both directions.
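
The four verdict buckets follow mechanically from each validator's per-file pass/fail result. A minimal sketch of that tally, assuming each file's outcome is reduced to a pair of booleans; the classify and tally names are illustrative and not taken from either sweep, and the 38 "both accept" files are simply the remainder of 119 after the three rows above:

```python
from collections import Counter

def classify(python_ok: bool, ts_ok: bool) -> str:
    """Map one file's pair of validator outcomes to a verdict bucket."""
    if python_ok and ts_ok:
        return "both accept"
    if not python_ok and not ts_ok:
        return "both reject"
    if ts_ok:
        return "TS too permissive"  # Python rejects, TS accepts
    return "TS too strict"          # TS rejects, Python accepts

def tally(results):
    """Count verdicts over an iterable of (python_ok, ts_ok) pairs."""
    return Counter(classify(p, t) for p, t in results)

# Synthetic input rebuilt from the table; 38 is derived as 119 - 46 - 30 - 5.
results = (
    [(False, False)] * 46
    + [(False, True)] * 30
    + [(True, False)] * 5
    + [(True, True)] * 38
)
counts = tally(results)
python_valid = counts["both accept"] + counts["TS too strict"]
ts_valid = counts["both accept"] + counts["TS too permissive"]
disagreements = counts["TS too permissive"] + counts["TS too strict"]
```

With these inputs the derived totals line up with the table: Python-valid and TS-valid are each the agreed accepts plus the files only that side accepts.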

Root cause

TS validates with ajv against Tests.model_json_schema() (packages/schema/src/test-format/tests.schema.json, generated by scripts/dump-test-format-schema.py). Python validates with Tests.model_validate() at runtime. JSON Schema is structural and cannot encode imperative Pydantic logic, so the exported artifact is lossy. Two mechanisms, both confirmed against real IWC data:

1. Callable `Discriminator` functions are not serializable

galaxy.tool_util_models.__init__:

TestOutputAssertions = tagged union Discriminator(_discriminate_output)
TestCollectionElementAssertion = tagged union Discriminator(_discriminate_collection_element)

_discriminate_output(v) routes a dict with class == "Collection" to the Collection model and everything else to the File model (extra="forbid"). A class-less {element_tests: {...}} output is therefore routed to the File model at runtime and rejected (“Extra inputs are not permitted”). model_json_schema() cannot emit the callable, so it degrades the union to a plain oneOf of the members. ajv tries all branches, finds that element_tests matches TestCollectionOutputAssertions cleanly, and accepts. This accounts for the 30 too-permissive files (collection assertions written without class: Collection).

The inverse also occurs: ajv’s oneOf requires exactly one matching branch and applies every branch’s additionalProperties: false, whereas the runtime discriminator picks a single branch up front. This contributes to the 5 too-strict files.
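
The difference between dispatching on class and searching every branch can be reproduced with a toy model. This is a sketch under stated assumptions, not Galaxy's or ajv's code: the key sets below are hypothetical stand-ins for the real File and Collection models, and "strict" is reduced to "no undeclared keys".

```python
# Hypothetical subsets of the real models' fields, for illustration only.
FILE_KEYS = {"class", "path", "asserts"}
COLLECTION_KEYS = {"class", "element_tests", "collection_type"}

def matches(value: dict, allowed: set) -> bool:
    """Stand-in for a strict model: every key must be declared."""
    return set(value) <= allowed

def runtime_dispatch(value: dict) -> bool:
    """Callable-discriminator behaviour: choose one branch from `class` first."""
    allowed = COLLECTION_KEYS if value.get("class") == "Collection" else FILE_KEYS
    return matches(value, allowed)

def one_of(value: dict) -> bool:
    """Degraded schema behaviour: valid when exactly one branch matches."""
    return sum(matches(value, keys) for keys in (FILE_KEYS, COLLECTION_KEYS)) == 1

# A collection assertion written without `class: Collection`.
classless = {"element_tests": {"forward": {"asserts": []}}}
# The same assertion with the class tag present.
tagged = {"class": "Collection", "element_tests": {"forward": {"asserts": []}}}
```

In this toy, runtime_dispatch(classless) sends the value to the File branch and rejects it, while one_of(classless) finds the Collection branch and accepts it. The tagged value passes both ways, which is why only class-less files diverge.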

2. `@model_validator(mode="before")` normalizers are absent from the schema

TestCollectionOutputAssertions._normalize_type_alias delegates to normalize_collection_type_alias, which maps the legacy type: key to collection_type on Collection shapes before validation (IWC authors write type: paired on nested class: Collection elements). This is pure Python and invisible to model_json_schema(). ajv sees type as an undeclared key on a strict Collection model and rejects input that Python accepts. This accounts for the rest of the 5 too-strict files (e.g. microbiome/binning-evaluation, Scaffolding-HiC-VGP8).
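
A before-validator of this kind can be approximated in a few lines. The sketch below assumes the alias is a plain key rename on dicts tagged class: Collection; the real normalize_collection_type_alias may cover more cases, and the function name, key set, and strict_ok helper here are hypothetical.

```python
# Hypothetical subset of the Collection model's declared fields.
COLLECTION_KEYS = {"class", "collection_type", "element_tests"}

def normalize_type_alias(value):
    """Return a copy with legacy `type` renamed to `collection_type` (sketch)."""
    if not isinstance(value, dict) or value.get("class") != "Collection":
        return value  # only Collection-shaped dicts are touched
    if "type" in value and "collection_type" not in value:
        normalized = dict(value)  # copy so the caller's input is not mutated
        normalized["collection_type"] = normalized.pop("type")
        return normalized
    return value

def strict_ok(value: dict) -> bool:
    """Stand-in for a strict model: no undeclared keys allowed."""
    return set(value) <= COLLECTION_KEYS

legacy = {"class": "Collection", "type": "paired", "element_tests": {}}
```

Here strict_ok(legacy) fails because type is undeclared, which is the position ajv is in, while strict_ok(normalize_type_alias(legacy)) passes, which is the position Pydantic is in after the before-validator runs.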

Implication for TEST_FORMAT_EFFECT_SCHEMA_PLAN

The plan’s Option A (generate Effect Schema from the emitted JSON Schema) cannot achieve parity: the loss happens upstream, in model_json_schema(), before TS sees anything. Swapping ajv for an Effect Schema generated from the same JSON artifact reproduces all 35 divergences.

To actually match Python, the generator must capture the discriminator and before-validator semantics, i.e. Option B (walk Tests.model_fields / TypeAdapter core schema in Python and emit the discriminator dispatch + the type→collection_type alias), or a hand-mirror. Minimum faithful set:

class-keyed dispatch for the output union and collection-element union (emit if/then/else on class, no fall-through) instead of oneOf.
a pre-validation type→collection_type normalization pass.

Status of the divergent files

Not a regression in either validator — these are genuine IWC authoring issues (class-less collection assertions; legacy type: aliases) that Python’s model deliberately handles (alias) or rejects (missing class). The sweep is a gated triage harness (skipped without GALAXY_TEST_IWC_DIRECTORY), matching Python — it is expected to fail while drift exists.

Fix implemented (mechanism #1)

scripts/dump-test-format-schema.py now post-processes the dumped schema: the two callable-discriminator oneOf unions (output assertions, collection elements) are rewritten into if/then/else keyed on class (== “Collection” → Collection model; other object → File model; non-object → scalar branch for the output union). Regenerated via make sync-test-format-schema. The callable discriminators stay upstream in the model — only their JSON-Schema projection is made faithful (the callables are justified: a Pydantic string discriminator can’t default class-less dicts to File, nor route scalar members).
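
The shape of such a rewrite can be sketched with plain dicts. This is not the code in scripts/dump-test-format-schema.py: the helper names, the "#/$defs/..." reference paths, and the location of the oneOf node are assumptions, and rejecting non-objects when no scalar branches are given is a guess at the collection-element case.

```python
import copy

def class_dispatch(collection_ref, file_ref, scalar_branches=None):
    """Build an if/then/else fragment keyed on `class`, with no fall-through."""
    # Non-objects go to the scalar branches if any are given; otherwise the
    # boolean schema False rejects them.
    non_object = {"anyOf": scalar_branches} if scalar_branches else False
    return {
        "if": {
            "type": "object",
            "properties": {"class": {"const": "Collection"}},
            "required": ["class"],
        },
        "then": {"$ref": collection_ref},
        "else": {
            "if": {"type": "object"},
            "then": {"$ref": file_ref},
            "else": non_object,
        },
    }

def rewrite_one_of(node, dispatch):
    """Return a copy of `node` with its oneOf replaced by the dispatch fragment."""
    out = copy.deepcopy(node)
    out.pop("oneOf", None)
    out.update(dispatch)
    return out

# Toy union node with hypothetical definition names.
toy = {
    "oneOf": [
        {"$ref": "#/$defs/File"},
        {"$ref": "#/$defs/Collection"},
        {"type": "boolean"},
    ]
}
dispatch = class_dispatch("#/$defs/Collection", "#/$defs/File", [{"type": "boolean"}])
rewritten = rewrite_one_of(toy, dispatch)
```

Requiring class inside the if is what keeps class-less objects out of the Collection branch: without "required", a missing class would satisfy the properties check vacuously and be routed the wrong way.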

Parity result (119 IWC tests files)

verdict	before	after
both reject (agreed)	46	76
Python rejects / TS accepts (too permissive)	30	0
TS rejects / Python accepts (too strict)	5	5

The 30 false-accepts are closed (TS now catches the real class-less collection-assertion bugs). The remaining 5 are exactly the type:→collection_type alias files (mechanism #2), which are descoped: those IWC files are to be fixed in IWC, not worked around in TS. They are a known TS-stricter-than-Python gap that persists only because Galaxy’s normalize_collection_type_alias before-validator still tolerates the alias; removing that upstream would make both sides agree (separate decision). No regressions: 4895 schema tests pass; validate-tests unit tests pass; if/then/else routing verified positively.

Repro

# TS (built dist required): packages/cli
GALAXY_TEST_IWC_DIRECTORY=~/projects/repositories/iwc \
  pnpm exec vitest run test/iwc-sweep.test.ts -t "tests-file validation"
# Python: galaxy wf_tool_state worktree, .venv
PYTHONPATH=lib .venv/bin/python -c "from galaxy.tool_util.workflow_state.validation_tests import load_tests_file, validate_tests_file; ..."
