JSON Schema Validation Plan
Validate that exported Pydantic JSON Schemas are standard Draft 2020-12 and that they round-trip through Python jsonschema and TypeScript ajv.
Two schema sources:
- gxformat2: GalaxyWorkflow.model_json_schema() for whole-workflow structural validation
- galaxy-tool-util: WorkflowStepToolState.model_json_schema() for per-step tool state
1. JSON Schema Export Reliability
Current state: CustomGenerateJsonSchema in galaxy.tool_util.parameters.json injects $schema dialect. Pydantic v2 generates Draft 2020-12 natively. run_schema in cache.py already exports per-tool schemas.
Missing for gxformat2: GalaxyWorkflow.model_json_schema() uses Pydantic default generator — no $schema key. Need to use CustomGenerateJsonSchema or duplicate the 5-line class in gxformat2 (can’t depend on galaxy-tool-util).
Action items:
- A. Add json_schema() function in gxformat2 using duplicated CustomGenerateJsonSchema
- B. Add CLI: galaxy-tool-cache structural-schema -o gxformat2_schema.json
- C. Unit test: exported schema passes jsonschema.Draft202012Validator.check_schema()
Known issues:
- extra="allow" on gxformat2 models → additionalProperties: true → won't catch typos. Intentional for forward compat. gxformat2_strict.py may have extra="forbid"; investigate.
- Pydantic uses $defs for $ref targets; both jsonschema and ajv support this natively in 2020-12.
2. Two-Level Validation Module
New file: lib/galaxy/tool_util/workflow_state/validation_json_schema.py
Level 1 — Structural (gxformat2 schema):
- Input: raw workflow dict (YAML-loaded format2)
- Schema: GalaxyWorkflow.model_json_schema(schema_generator=CustomGenerateJsonSchema)
- Validator: jsonschema.Draft202012Validator
Level 2 — Per-step tool state:
- Walk steps, resolve tool_id → cache → WorkflowStepToolState.parameter_model_for() → to_json_schema() → validate state block
- Pre-exported schemas via galaxy-tool-cache schema also supported (offline mode)
API:
from dataclasses import dataclass
from typing import List, Literal, Optional

# GetToolInfo is the existing tool-lookup type; its import is omitted here.


@dataclass
class JsonSchemaValidationError:
    path: str  # JSON pointer
    message: str
    schema_path: str


@dataclass
class JsonSchemaStepResult:
    step: str
    tool_id: Optional[str]
    errors: List[JsonSchemaValidationError]
    status: Literal["ok", "fail", "skip"]


@dataclass
class JsonSchemaValidationResult:
    structural_errors: List[JsonSchemaValidationError]
    step_results: List[JsonSchemaStepResult]

    @property
    def valid(self) -> bool:
        return not self.structural_errors and all(s.status != "fail" for s in self.step_results)


def validate_workflow_json_schema(
    workflow_dict: dict,
    get_tool_info: Optional[GetToolInfo] = None,
    tool_schema_dir: Optional[str] = None,
) -> JsonSchemaValidationResult:
    ...


def validate_structural_json_schema(
    workflow_dict: dict,
) -> List[JsonSchemaValidationError]:
    """Level 1 only: no tool cache needed."""
    ...
Key decisions:
- Use jsonschema library (not model_validate): proves the exported JSON Schema works, same schema TS consumers use
- get_tool_info for dynamic generation; tool_schema_dir for offline pre-exported schemas
- Structural schema cached as module-level constant (generated once)
- Map results into existing ValidationStepResult for formatter compat; add validation_mode: Literal["pydantic", "json_schema"] to distinguish
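The last decision, mapping into the formatter's result type, might look roughly like this. ValidationStepResultSketch is a hypothetical stand-in: the fields of the real ValidationStepResult are not shown in this plan and may differ, as may the error string format chosen here.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class JsonSchemaValidationError:
    path: str
    message: str
    schema_path: str


@dataclass
class JsonSchemaStepResult:
    step: str
    tool_id: Optional[str]
    errors: List[JsonSchemaValidationError]
    status: Literal["ok", "fail", "skip"]


@dataclass
class ValidationStepResultSketch:
    """Hypothetical stand-in; the real ValidationStepResult may have other fields."""

    step: str
    tool_id: Optional[str]
    status: str
    errors: List[str] = field(default_factory=list)
    validation_mode: Literal["pydantic", "json_schema"] = "pydantic"


def to_formatter_result(result: JsonSchemaStepResult) -> ValidationStepResultSketch:
    """Flatten structured JSON Schema errors into formatter-friendly strings."""
    return ValidationStepResultSketch(
        step=result.step,
        tool_id=result.tool_id,
        status=result.status,
        # An empty pointer means the document root, shown here as "/".
        errors=[f"{e.path or '/'}: {e.message}" for e in result.errors],
        validation_mode="json_schema",
    )
```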
3. CLI Integration
Preferred: add --mode json-schema to existing gxwf-state-validate:
gxwf-state-validate workflow.gxwf.yml --mode json-schema
gxwf-state-validate workflow.gxwf.yml --mode json-schema --tool-schema-dir ./schemas/
In validate.py run_validate():
- mode == "json-schema" → validate_workflow_json_schema()
- Map results to same ValidationStepResult list
- All downstream formatting (text, JSON, markdown) unchanged
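A rough sketch of the argument handling and dispatch, using only argparse. The --mode and --tool-schema-dir flags come from this plan; the default value "pydantic" and the tuple returned by run_validate() are assumptions standing in for the real validator calls.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gxwf-state-validate")
    parser.add_argument("workflow_path")
    # "pydantic" as the default mode name is an assumption.
    parser.add_argument("--mode", choices=["pydantic", "json-schema"], default="pydantic")
    parser.add_argument("--tool-schema-dir", default=None)
    return parser


def run_validate(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Return which validation path would run; placeholder for the real calls."""
    args = build_parser().parse_args(argv)
    if args.mode == "json-schema":
        # The real code would call validate_workflow_json_schema() here and
        # map its results to the ValidationStepResult list.
        return ("json_schema", args.tool_schema_dir)
    # Existing Pydantic-based path, left untouched.
    return ("pydantic", None)
```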
Structural schema export on galaxy-tool-cache:
galaxy-tool-cache structural-schema -o gxformat2_schema.json
4. TypeScript Round-Trip Testing
Goal: Prove exported schemas (structural + per-tool) consumable by ajv.
Setup: test/ts_json_schema/ with vitest + ajv:
test/ts_json_schema/
  package.json                   # ajv, vitest
  validate_structural.test.ts
  validate_tool_state.test.ts
Test flow:
- Python fixture exports schemas to temp dir
- vitest loads JSON files, instantiates Ajv({ strict: false })
- Validates known-good and known-bad documents
- Asserts pass/fail as expected
strict: false is needed because Pydantic's Draft 2020-12 output uses features that ajv strict mode flags.
ajv compatibility notes:
- $defs: natively supported in ajv 2020-12 mode
- prefixItems for tuples: supported
- Discriminated unions (oneOf + if/then): need to verify. gxformat2 uses Discriminator + Tag for comments and creators. Highest risk area; test specifically.
Alternative (simpler): Single-file Node.js script via subprocess:
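const fs = require("fs");
const Ajv = require("ajv/dist/2020"); // Draft 2020-12 build of ajv
// Usage: node <script> <schema.json> <data.json>
const schema = JSON.parse(fs.readFileSync(process.argv[2], "utf8"));
const data = JSON.parse(fs.readFileSync(process.argv[3], "utf8"));
// strict: false, matching the vitest setup above
const ajv = new Ajv({ strict: false });
// Exit 0 when the document validates, 1 otherwise
process.exit(ajv.validate(schema, data) ? 0 : 1);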
Make TS tests optional in CI (skip if Node not available).
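On the Python side, the subprocess call and the skip-if-missing behaviour could be wrapped like this. The function name ajv_validates and its parameters are hypothetical; a None return is meant to be turned into a test skip by the caller.

```python
import shutil
import subprocess
from typing import Optional


def ajv_validates(
    script_path: str,
    schema_path: str,
    data_path: str,
    node_binary: str = "node",
) -> Optional[bool]:
    """Run the Node.js script; None means Node is unavailable and the test should skip."""
    node = shutil.which(node_binary)
    if node is None:
        return None
    completed = subprocess.run(
        [node, script_path, schema_path, data_path],
        capture_output=True,
        text=True,
        check=False,
    )
    # The script exits 0 when the document validates. Any other exit code,
    # including a crash such as ajv not being installed, is reported as False.
    return completed.returncode == 0
```

A pytest caller could then use pytest.skip() when the result is None, which keeps the suite green on machines without Node.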
5. Test Strategy (Red-to-Green)
Phase 1: Schema Export Correctness
test_json_schema_export.py:
- test_gxformat2_schema_is_valid_draft_2020_12: passes Draft202012Validator.check_schema()
- test_gxformat2_schema_has_schema_dialect: $schema key present
- test_tool_state_schema_is_valid_draft_2020_12: per-tool schema passes check_schema
- test_tool_state_schema_has_expected_properties: properties match tool inputs
Phase 2: Structural Validation (Level 1)
test_json_schema_structural.py:
- test_valid_minimal_workflow_passes
- test_missing_required_field_fails: no steps → error
- test_invalid_step_type_fails: type: "bogus" → enum error
- test_extra_keys_allowed: unknown top-level key passes (extra="allow")
- test_comment_discriminator_works: type: "text" ok, type: "bogus" fails
Phase 3: Per-Step Tool State (Level 2)
test_json_schema_tool_state.py:
- test_valid_state_passes
- test_wrong_type_fails: int param given string
- test_missing_required_param_fails
- test_extra_state_key_behavior: verify matches model config
- test_offline_schema_dir_mode: pre-export then validate
Phase 4: Integration (Two-Level Combined)
test_json_schema_validation.py:
- test_full_workflow_valid: real format2 workflow + populated cache → all green
- test_structural_error_reported: bad structure → structural_errors populated
- test_tool_state_error_reported: good structure + bad state → step failures
- test_cli_mode_json_schema: CLI --mode json-schema, check exit code + output
Phase 5: TypeScript Round-Trip
test/ts_json_schema/ vitest suite as described in section 4.
Unresolved Questions
- Where does CustomGenerateJsonSchema live canonically? gxformat2 can't depend on galaxy-tool-util. Options: (a) duplicate 5-line class, (b) extract tiny shared pkg, (c) plain generator + post-process inject $schema. Option (a) simplest.
- Should structural failure block step validation? Probably yes: step walking might crash on malformed structure. Return early.
- Schema caching for level 2. Generating per-step schema is expensive. LRU cache keyed on (tool_id, tool_version) for compiled Draft202012Validator instances?
- Discriminated unions in jsonschema + ajv. gxformat2 Discriminator + Tag for comments/creators generates oneOf/if-then. Need to verify both jsonschema and ajv handle these. Highest risk.
- Strict mode? extra="allow" won't catch typos. Offer --strict-schema that generates with additionalProperties: false? gxformat2 has gxformat2_strict.py; investigate if it has extra="forbid".
- Offline schema naming convention. For tool_schema_dir: {tool_id_safe}/{version}.json where / → ~. Matches TRS ID convention.
- jsonschema vs Pydantic validation agreement. Both should agree on same inputs. Add test that validates identical input with both approaches, asserts same pass/fail.
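The naming convention and the caching question can be sketched together with the standard library alone. The function names tool_schema_path and load_tool_schema are hypothetical, and this version caches the parsed schema dict rather than a compiled validator; the same decorator could wrap validator construction instead.

```python
import json
from functools import lru_cache
from pathlib import Path


def tool_schema_path(schema_dir: str, tool_id: str, tool_version: str) -> Path:
    """Build {tool_id_safe}/{version}.json, replacing "/" in the tool id with "~"."""
    tool_id_safe = tool_id.replace("/", "~")
    return Path(schema_dir) / tool_id_safe / f"{tool_version}.json"


@lru_cache(maxsize=256)
def load_tool_schema(schema_dir: str, tool_id: str, tool_version: str) -> dict:
    """Read a pre-exported schema once per (schema_dir, tool_id, tool_version).

    The cached dict is shared between callers, so treat it as read-only.
    """
    path = tool_schema_path(schema_dir, tool_id, tool_version)
    return json.loads(path.read_text(encoding="utf-8"))
```

For an illustrative id such as toolshed.example.org/repos/owner/repo/tool at version 1.0, the schema would sit under the directory toolshed.example.org~repos~owner~repo~tool in a file named 1.0.json.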
Critical Files
| File | Action |
|---|---|
| lib/galaxy/tool_util/workflow_state/validation_json_schema.py | New: core two-level validation |
| lib/galaxy/tool_util/workflow_state/validate.py | Modify: add --mode json-schema path |
| lib/galaxy/tool_util/workflow_state/scripts/workflow_validate.py | Modify: --mode, --tool-schema-dir args |
| lib/galaxy/tool_util/parameters/json.py | Reference: CustomGenerateJsonSchema |
| gxformat2/schema/gxformat2.py | Modify: add JSON Schema export function |
| lib/galaxy/tool_util/workflow_state/scripts/tool_cache.py | Modify: structural-schema subcommand |