
TypeScript Tool State Meta-Models Plan

Goal

Create a TypeScript package (galaxy-tool-util) using Effect Schema that can:

  1. Accept serialized ToolParameterBundleModel JSON (as produced by Python’s model_dump())
  2. Dynamically generate validation schemas for each state representation
  3. Validate tool state dicts against those schemas
  4. Produce JSON Schema from those schemas
  5. (Future) Cache tool objects, produce OpenAPI fragments, etc.

The correctness oracle is parameter_specification.yml from the Galaxy repo — the same YAML that drives the Python test suite.

Key Design Decision: Effect Schema over Zod

Effect Schema was chosen because custom validation filters can declare their JSON Schema representation via the jsonSchema annotation, whereas Zod’s .refine() is opaque to JSON Schema generation. This matters because the Galaxy API exposes these schemas, so validators such as in_range, regex, and length need to appear in the JSON Schema output.
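To make the distinction concrete, here is a minimal sketch of the idea in plain TypeScript. It deliberately does not use Effect Schema's API; the names AnnotatedFilter, inRange, and toJsonSchema are hypothetical, and inclusive bounds are an assumption made for illustration. The point is only that a filter which carries its own JSON Schema fragment stays visible to a schema generator, while a bare predicate does not.

```typescript
// Hypothetical model of an "annotated filter": a runtime predicate paired with
// the JSON Schema keywords that describe the same constraint.
interface AnnotatedFilter<T> {
  predicate: (value: T) => boolean;
  jsonSchema: Record<string, unknown>;
}

// An in_range-style filter. Inclusive bounds are an assumption of this sketch.
function inRange(min: number, max: number): AnnotatedFilter<number> {
  return {
    predicate: (value) => value >= min && value <= max,
    jsonSchema: { minimum: min, maximum: max },
  };
}

// Merge the fragment of every filter into the base schema for the type.
function toJsonSchema(
  base: Record<string, unknown>,
  filters: AnnotatedFilter<number>[],
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...base };
  for (const filter of filters) {
    Object.assign(out, filter.jsonSchema);
  }
  return out;
}

const rangeFilter = inRange(0, 10);
const emitted = toJsonSchema({ type: "integer" }, [rangeFilter]);
```

With an opaque predicate there would be nothing to merge, so the emitted schema would describe only the base type and silently drop the range constraint.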

Project Structure

galaxy-tool-util/
  package.json
  tsconfig.json
  vitest.config.ts
  src/
    index.ts                          # public API
    schema/                           # Effect Schema generation (this plan)
      index.ts
      state-representations.ts        # StateRepresentation type + registry
      parameters/                     # per-parameter-type schema generators
        index.ts                      # registry: parameter_type -> generator
        base.ts                       # shared types, DynamicSchemaInfo
        gx-integer.ts
        gx-float.ts
        gx-text.ts
        gx-boolean.ts
        gx-select.ts
        gx-data.ts
        gx-data-collection.ts
        gx-color.ts
        gx-hidden.ts
        gx-directory-uri.ts
        gx-drill-down.ts
        gx-data-column.ts
        gx-group-tag.ts
        gx-genomebuild.ts
        gx-baseurl.ts
        gx-rules.ts
        cwl-integer.ts
        cwl-float.ts
        cwl-string.ts
        cwl-boolean.ts
        cwl-file.ts
        cwl-directory.ts
        cwl-null.ts
        cwl-union.ts
      containers/                     # conditional, repeat, section
        conditional.ts
        repeat.ts
        section.ts
      validators/                     # Galaxy XML validators -> Effect filters
        index.ts
        regex.ts
        in-range.ts
        length.ts
        expression.ts
        empty-field.ts
        no-options.ts
      model-factory.ts                # createFieldModel() — assembles schemas
      connected-value.ts              # ConnectedValue union helper
    bundle/                           # ToolParameterBundleModel parsing (future)
      index.ts
    cache/                            # tool object caching (future)
      index.ts
    json-schema/                      # JSON Schema generation (future)
      index.ts
  test/
    parameter-specification.test.ts   # main spec-driven test suite
    fixtures/
      parameter_specification.yml     # copied from Galaxy repo
      parameter_models/               # serialized bundles per tool (generated by Python)
        gx_int.json
        gx_boolean.json
        ...

Input Data

The TypeScript side consumes two artifacts from the Python side:

  1. parameter_specification.yml — the test oracle (valid/invalid payloads per tool per state representation)
  2. Serialized ToolParameterBundleModel per tool — JSON files produced by running:
    # in Galaxy repo
    PYTHONPATH=lib python test/unit/tool_util/test_parameter_specification.py
    This dumps parameter_models.yml (or we write a small script to dump per-tool JSON files).

The TS test harness loads a tool’s serialized bundle, passes it to createFieldModel(bundle, stateRep), and validates each test case against the resulting Effect Schema.
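The loop this describes might look roughly like the sketch below. It assumes each tool entry in the spec maps keys of the form <state_rep>_valid and <state_rep>_invalid to lists of payloads; that shape, the runToolSpec name, and the stand-in validator are assumptions for illustration. In the real harness the validator would come from createFieldModel(bundle, stateRep).

```typescript
type Payload = Record<string, unknown>;
// Assumed spec shape: "request_valid" / "request_invalid" style keys -> payload lists.
type ToolSpec = Record<string, Payload[]>;
type Validate = (payload: Payload) => boolean;

interface Mismatch {
  key: string;
  index: number;
  expectedValid: boolean;
}

// Run every payload for one tool and collect the cases whose outcome
// disagrees with the oracle.
function runToolSpec(
  spec: ToolSpec,
  validatorFor: (stateRep: string) => Validate,
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const [key, payloads] of Object.entries(spec)) {
    const match = /^(.+)_(valid|invalid)$/.exec(key);
    if (match === null) continue; // not a test-case key
    const stateRep = match[1] ?? "";
    const expectedValid = match[2] === "valid";
    const validate = validatorFor(stateRep);
    payloads.forEach((payload, index) => {
      if (validate(payload) !== expectedValid) {
        mismatches.push({ key, index, expectedValid });
      }
    });
  }
  return mismatches;
}

// Stand-in for a generated schema: accepts an integer "parameter" field.
const integerOnly: Validate = (payload) =>
  typeof payload["parameter"] === "number" && Number.isInteger(payload["parameter"]);

const exampleSpec: ToolSpec = {
  request_valid: [{ parameter: 5 }],
  request_invalid: [{ parameter: "five" }, { parameter: null }],
};

const mismatches = runToolSpec(exampleSpec, () => integerOnly);
const mismatchesIfAlwaysValid = runToolSpec(exampleSpec, () => () => true);
```

Collecting mismatches instead of throwing on the first failure keeps one broken case from hiding the rest of a tool's results.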

Enumerated Scope

State Representations (12)

| State Representation | Key Behavior |
|---|---|
| request | String IDs, batching allowed, defaults fill absent |
| relaxed_request | Like request but nulls -> defaults |
| request_internal | Int IDs, batching allowed |
| request_internal_dereferenced | Int IDs, no URL sources |
| landing_request | String IDs, ALL optional |
| landing_request_internal | Int IDs, ALL optional |
| job_internal | Int IDs, ALL required, no batching |
| job_runtime | CWL-style file metadata, ALL required |
| test_case_xml | File paths, string splitting |
| test_case_json | File paths, no string splitting |
| workflow_step | Data params always None |
| workflow_step_linked | Allows ConnectedValue |
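One way to encode this list is a literal union plus small lookup functions. The sketch below is illustrative only: optionalityFor and isStateRepresentation are hypothetical names, and treating every rep not marked "ALL optional" or "ALL required" as per-parameter is an assumption drawn from the table, not a statement about the actual computeIsOptional.

```typescript
const STATE_REPRESENTATIONS = [
  "request",
  "relaxed_request",
  "request_internal",
  "request_internal_dereferenced",
  "landing_request",
  "landing_request_internal",
  "job_internal",
  "job_runtime",
  "test_case_xml",
  "test_case_json",
  "workflow_step",
  "workflow_step_linked",
] as const;

type StateRepresentation = (typeof STATE_REPRESENTATIONS)[number];

type Optionality = "all_optional" | "all_required" | "per_parameter";

// Coarse optionality rule per state rep, read off the table above.
function optionalityFor(rep: StateRepresentation): Optionality {
  switch (rep) {
    case "landing_request":
    case "landing_request_internal":
      return "all_optional";
    case "job_internal":
    case "job_runtime":
      return "all_required";
    default:
      // Assumption: remaining reps defer to each parameter's own optional flag.
      return "per_parameter";
  }
}

// Narrow an arbitrary string (for example, one parsed from a spec key).
function isStateRepresentation(value: string): value is StateRepresentation {
  return (STATE_REPRESENTATIONS as readonly string[]).includes(value);
}
```

Deriving the type from the const array keeps the runtime list and the compile-time union from drifting apart.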

Parameter Types (28 distinct parameter_type values across ~80 tool entries)

Galaxy scalar: gx_integer, gx_float, gx_text, gx_boolean, gx_color, gx_hidden, gx_directory_uri, gx_genomebuild, gx_baseurl

Galaxy choice: gx_select (single + multiple), gx_drill_down, gx_data_column, gx_group_tag

Galaxy data: gx_data (single + multiple), gx_data_collection (many collection_type variants)

Galaxy containers: gx_conditional (boolean + select test), gx_repeat, gx_section

Galaxy special: gx_rules (rule builder with mappings — complex nested dict structure)

CWL: cwl_integer, cwl_float, cwl_string, cwl_boolean, cwl_file, cwl_directory, cwl_null, cwl_union

Validator Types

regex, in_range, length, expression, empty_field, no_options (others like metadata validators are dynamic/runtime-only — skip for now)
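The expression validator is the least mechanical of these, since its source is a Python expression. The Risks table notes that only the forms 'X' in value and value == 'X' are currently parsed and that anything else passes through. A minimal sketch of that behavior follows; compileExpression is a hypothetical name, and handling only single-quoted literals is an assumption of the sketch.

```typescript
type Check = (value: string) => boolean;

// Translate a small subset of Python expressions into a predicate.
// Unrecognized expressions yield a predicate that accepts everything.
function compileExpression(expression: string): Check {
  const trimmed = expression.trim();

  // 'X' in value  ->  substring test
  const contains = /^'([^']*)'\s+in\s+value$/.exec(trimmed);
  if (contains !== null) {
    const needle = contains[1] ?? "";
    return (value) => value.includes(needle);
  }

  // value == 'X'  ->  equality test
  const equals = /^value\s*==\s*'([^']*)'$/.exec(trimmed);
  if (equals !== null) {
    const expected = equals[1] ?? "";
    return (value) => value === expected;
  }

  // Anything else passes through unchecked.
  return () => true;
}

const hasAbc = compileExpression("'abc' in value");
const isX = compileExpression("value == 'X'");
const unknownExpression = compileExpression("len(value) > 3");
```

Passing unrecognized expressions through means the TypeScript side can accept values that the Python side would reject, which is why the risk remains open.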

Per-Parameter Semantic Quirks

These are non-obvious behaviors that differ from naive assumptions:


Phase 1: Infrastructure — COMPLETE (2026-03-29)

All steps implemented. Test results: 33 passed, 139 skipped.

What was built

Learnings / deviations from plan


Phase 2: First Parameter + First State Representation — COMPLETE (2026-03-29)

Implemented as part of Phase 1 infrastructure validation.


Phase 3: Iterative Expansion

This phase is a structured loop. Each iteration adds either a parameter type or a state representation (or both) and fixes tests until green. The order is chosen to maximize coverage per iteration.

Critical ordering note: Validators must be implemented alongside the parameter types that use them. The skip logic checks both parameter_type registration AND validator support to prevent premature un-skipping.
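A sketch of that two-part check, under stated assumptions: the ParameterInfo shape (a parameter_type plus an optional list of validators with a type field) and the shouldSkip name are hypothetical, the registries are hard-coded for the example, and nested container parameters are ignored.

```typescript
interface ValidatorInfo {
  type: string;
}

interface ParameterInfo {
  parameter_type: string;
  validators?: ValidatorInfo[];
}

// Example registries; in the package these would be derived from the
// parameter and validator generator registries.
const registeredParameterTypes = new Set<string>(["gx_integer", "gx_text"]);
const supportedValidators = new Set<string>(["in_range", "regex"]);

// A tool stays skipped while any parameter type OR any validator it uses
// is still unimplemented.
function shouldSkip(parameters: ParameterInfo[]): boolean {
  return parameters.some(
    (parameter) =>
      !registeredParameterTypes.has(parameter.parameter_type) ||
      (parameter.validators ?? []).some((v) => !supportedValidators.has(v.type)),
  );
}

const plainInteger: ParameterInfo[] = [{ parameter_type: "gx_integer" }];
const textWithLength: ParameterInfo[] = [
  { parameter_type: "gx_text", validators: [{ type: "length" }] },
];
const unregistered: ParameterInfo[] = [{ parameter_type: "gx_data" }];
```

Without the validator half of the check, the text tool above would be un-skipped as soon as gx_text was registered and would then fail on every invalid payload that depends on the length validator.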

Round 1 — Scalars, choice, all validators, request state rep — COMPLETE (2026-03-29)

Test results: 216 passed, 323 skipped (up from 33 passed).

Parameter types implemented (10 total):

Validators implemented (6 total):

Architecture established:

Post-Round 1 Review Fixes — COMPLETE (2026-03-29)

A subagent review identified P0/P1 issues. All fixed:

Tooling added:

Round 2 — Container parameters — COMPLETE (2026-03-29)

Test results: 275 passed, 347 skipped (up from 216).

Extended GeneratorContext with buildChildSchemaInfos() and assembleStruct() methods to give containers access to raw field infos and struct assembly without circular imports.

Container types implemented (3):

Design decisions:

Round 3 — Choice, data, data_collection, data_column, group_tag — COMPLETE (2026-03-29)

Test results: 498 passed, 548 skipped (up from 275). All remaining skips are state rep coverage.

Parameter types implemented (5), completing all 18 Galaxy parameter types other than gx_rules (scheduled for Round 5):

Bundle type updates: DrillDownParameterModel.hierarchy, DataColumnParameterModel.value, DataCollectionParameterModel.extensions/value.

Review findings (all addressed):

Round 4 — Expand state representations — COMPLETE (2026-03-29)

Test results: 2087 passed, 10 skipped (up from 498). All 12 state representations implemented. Remaining 10 skips are CWL parameter types (no fixture bundles).

The 11 state reps beyond request (covered in Round 1) were added in one pass, bringing the total to all 12:

  1. request_internal + request_internal_dereferenced + landing_request + landing_request_internal: Scalars worked unchanged via the existing computeIsOptional. Data changes: allowsUrlSources was fixed to cover all request-like reps (it previously covered only request_internal). request_internal_dereferenced excludes URL sources and inline collections for data_collection.
  2. job_internal: All parameters required (via computeIsOptional). Data: {src: "hda"|"dce", id: int}, which adds dce as a direct source. Data collection: {src: "hdca"|"dce", id: int}. No batching, no URL sources, no inline collections.
  3. job_runtime: All parameters required. Data: {class: "File", basename, location, path, nameroot, nameext, format, size, element_identifier?}. Data collection: recursive, collection_type-aware validation, where list → array of elements, paired → {forward, reverse} record, and nested types such as list:paired are validated recursively. Extra fields (column_definitions, has_single_item) are accepted as optional.
  4. workflow_step + workflow_step_linked: Data params use S.Unknown.pipe(S.filter(() => false)), which rejects all values. workflow_step: data is always optional (absent only). workflow_step_linked: ConnectedValue wrapping is applied centrally in model-factory.ts via S.Union(schema, ConnectedValueSchema). Array types (select_multiple) use a connectedValueHandled flag to accept ConnectedValue at the item level instead of the outer wrapping.
  5. test_case_xml + test_case_json: Data: S.Union({class: "File", path}, {class: "File", location}). Data collection: recursive schema with typed File elements ({class: "File", identifier, path}) that validates the nested collection structure. test_case_xml accepts comma-separated strings for select_multiple and data_column_multiple.
  6. relaxed_request: Only 6 tests, mostly the same as request. Key difference: gx_text allows null for non-optional params (null coercion via a stateRep === "relaxed_request" check).
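The recursive collection_type handling in item 3 is the most algorithmic piece of this round. The sketch below shows the recursion in plain TypeScript rather than as an Effect Schema. It assumes a simplified value shape (a list is a bare array, a paired collection is a record with forward and reverse, a leaf is any object with class "File"), and matchesCollectionType is a hypothetical name; the actual job_runtime shape carries more fields.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Simplified leaf check: only the class discriminator is inspected.
function isFile(value: unknown): boolean {
  return isRecord(value) && value["class"] === "File";
}

// Validate a value against a colon-separated collection type such as
// "list", "paired", or "list:paired". The outermost rank is checked first,
// then each element is checked against the remaining ranks.
function matchesCollectionType(value: unknown, collectionType: string): boolean {
  const [outer, ...rest] = collectionType.split(":");
  const inner = rest.join(":");
  const checkElement = (element: unknown): boolean =>
    inner === "" ? isFile(element) : matchesCollectionType(element, inner);

  if (outer === "list") {
    return Array.isArray(value) && value.every(checkElement);
  }
  if (outer === "paired") {
    return (
      isRecord(value) &&
      checkElement(value["forward"]) &&
      checkElement(value["reverse"])
    );
  }
  return false; // unknown rank
}

const file = { class: "File" };
const listOfPairs = [{ forward: file, reverse: file }];
const brokenPair = [{ forward: file }];
```

Peeling one rank off the front of the type string at each level means arbitrarily deep types such as list:list:paired need no extra cases.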

Key design decisions:

Round 5 — Special parameters

  1. gx_rules — rule builder with mappings, complex nested dict structure

Round 6 — CWL parameters

  1. cwl_integer, cwl_float, cwl_string, cwl_boolean — simple wrappers
  2. cwl_file, cwl_directory — CWL-specific data refs
  3. cwl_null — validates Literal[None]
  4. cwl_union — union of other CWL types (recursive)
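Since this round is not yet implemented, the following is only a sketch of how the recursion in cwl_union could work, written as a plain predicate rather than an Effect Schema. The CwlModel shape, in particular a parameters array on the union model, is an assumption about the serialized bundle; cwl_file and cwl_directory are omitted.

```typescript
// Assumed, simplified model shapes for the CWL parameter types.
type CwlModel =
  | { parameter_type: "cwl_integer" | "cwl_float" | "cwl_string" | "cwl_boolean" | "cwl_null" }
  | { parameter_type: "cwl_union"; parameters: CwlModel[] };

// A union accepts a value if any member accepts it; members may themselves
// be unions, so the function recurses.
function accepts(model: CwlModel, value: unknown): boolean {
  switch (model.parameter_type) {
    case "cwl_integer":
      return typeof value === "number" && Number.isInteger(value);
    case "cwl_float":
      return typeof value === "number";
    case "cwl_string":
      return typeof value === "string";
    case "cwl_boolean":
      return typeof value === "boolean";
    case "cwl_null":
      return value === null;
    case "cwl_union":
      return model.parameters.some((member) => accepts(member, value));
  }
}

// Optional integer nested inside a union with string.
const nestedUnion: CwlModel = {
  parameter_type: "cwl_union",
  parameters: [
    { parameter_type: "cwl_string" },
    {
      parameter_type: "cwl_union",
      parameters: [{ parameter_type: "cwl_integer" }, { parameter_type: "cwl_null" }],
    },
  ],
};
```

In a schema-based version the same recursion would build a union of member schemas, with a lazy reference wherever a type refers to itself.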

Development harness for each iteration

  1. Pick next item from the iteration order
  2. Run tests: make test — note which tools/reps are newly unskipped
  3. Implement the parameter type or state representation
  4. Run tests — iterate until all newly-unskipped tests are green
  5. Verify no regressions: make check && make test
  6. Commit

Phase 4: Drop All Skips

Step 4.1: Remove Skip Logic

Step 4.2: Full Green

Step 4.3: CI


Future Work (not in this plan, but project structure supports)


Risks

| Risk | Severity | Status | Mitigation |
|---|---|---|---|
| Data parameter complexity (~8 type shapes across state reps) | High | Resolved | gx-data.ts branches on state rep: source refs (request-like), File objects (job_runtime), File path/location (test_case), S.Never (workflow_step). gx-data-collection.ts has recursive collection_type validation for job_runtime. |
| Conditional discriminated union with onExcessProperty: "error" | High | Resolved | S.Union of per-branch structs with S.Literal discriminators + S.optional in default branch. Tested with boolean, select, and nested conditionals. |
| relaxed_request null coercion is per-parameter-type, not global | Medium | Resolved | Only gx_text needed null coercion; a simple stateRep === "relaxed_request" check adds NullOr wrapping. Only 6 test cases for relaxed_request total. |
| allowsUrlSources() predicate is wrong | Medium | Resolved | Fixed to include request, relaxed_request, request_internal, landing_request, landing_request_internal. |
| expression validator only handles subset of Python expressions | Medium | Open | Currently parses 'X' in value and value == 'X'. Unrecognized expressions pass through. |
| cwl_union is recursive (union of CWL types including nested unions) | Low | Open | Small scope, deferred to Round 6. Schema.suspend handles it. |
| Fixture staleness: Python model changes break TS tests silently | Medium | Open | CI step to regenerate fixtures and diff against committed versions. |
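The corrected allowsUrlSources() predicate can be written as a set membership test over the five reps named above. This is a sketch of the rule as stated in the table, not a copy of the implementation, and the string-typed parameter is a simplification.

```typescript
// State reps whose data parameters may reference URL sources,
// as listed in the mitigation above.
const URL_SOURCE_REPS: ReadonlySet<string> = new Set([
  "request",
  "relaxed_request",
  "request_internal",
  "landing_request",
  "landing_request_internal",
]);

function allowsUrlSources(stateRep: string): boolean {
  return URL_SOURCE_REPS.has(stateRep);
}
```

An explicit allow-list makes the exclusions easy to audit: request_internal_dereferenced and job_internal are absent, which matches their "no URL sources" behavior described elsewhere in this plan.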

Unresolved Questions