TS_STATEFUL_CONVERSION_PLAN

TypeScript Stateful Conversion Plan

Date: 2026-04-04
Repo: jmchilton/galaxy-tool-util-ts (galaxy-tool-util)
Goal: Schema-aware format conversion (gxwf convert --stateful) and roundtrip validation (gxwf roundtrip) using tool definitions from the cache to properly re-encode parameter values.


Background

Schema-free vs Stateful

Schema-free (current gxwf convert): copies tool_state as-is between formats. It is fast and has no tool cache dependency, but it produces lossy results: native tool_state may contain stale bookkeeping keys, string-typed numbers, comma-delimited multi-selects, and ConnectedValue/RuntimeValue markers mixed into the state.

Stateful (this plan): walks the parameter tree using tool definitions to:

Design Principles

  1. No double encoding. Native tool_state is a dict of properly typed values: dicts, lists, numbers, booleans, and strings. The {key: json.dumps(value)} pattern from Python’s encode_state_to_native() is a legacy serialization artifact that we do not replicate. (The Python side is fixing this in STRICT_STATE_PLAN Step 0.)

  2. No legacy decode. The walker does not silently decode JSON-string containers (as_dict()/as_list()). Containers must be proper dicts/lists. Legacy-encoded workflows are rejected by precheck, not silently accommodated. (Matches Python commit 67aa42d.)

  3. Graceful degradation. Per-step conversion failure falls back to schema-free passthrough. The caller gets a structured report of which steps converted vs fell back.
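To make the first two principles concrete, here is a hypothetical sketch (not code from either repo). It contrasts the legacy per-key encoding with a clean dict and adds a heuristic that only detects a JSON-string container, so the value can be rejected rather than decoded. `legacyDoubleEncode` and `looksLikeEncodedContainer` are illustrative names. A real check would also consult the parameter type, because a text parameter can legitimately hold a JSON-looking string.

```typescript
type State = Record<string, unknown>;

// Legacy pattern (mirrors Python's {key: json.dumps(value)}): every value becomes a JSON string.
function legacyDoubleEncode(state: State): Record<string, string> {
  return Object.fromEntries(
    Object.entries(state).map(([key, value]): [string, string] => [key, JSON.stringify(value)]),
  );
}

// Heuristic detector: a string that parses to an object or array looks like a
// legacy-encoded container. The plan rejects such values; it never decodes them.
function looksLikeEncodedContainer(value: unknown): boolean {
  if (typeof value !== "string") return false;
  try {
    const parsed: unknown = JSON.parse(value);
    return typeof parsed === "object" && parsed !== null;
  } catch {
    return false;
  }
}
```

A clean native tool_state such as `{ cond: { sel: "a" }, n: 3 }` keeps its real types. Its legacy form would be `{ cond: '{"sel":"a"}', n: "3" }`.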

Python Reference

Key files in galaxy-tool-util Python (wf_tool_state branch):

Key files in gxformat2:


Plan

Step 1: Port the state walker

New file: packages/schema/src/workflow/walker.ts

Port Python’s _walker.py — two walker functions with a shared leaf callback pattern.

walkNativeState()

```typescript
type LeafCallback = (
  toolInput: ToolParameterModel,
  value: unknown,
  statePath: string,
) => unknown | typeof SKIP_VALUE;

function walkNativeState(
  inputConnections: Record<string, unknown>,
  toolInputs: ToolParameterModel[],
  state: Record<string, unknown>,
  leafCallback: LeafCallback,
  options?: { prefix?: string; checkUnknownKeys?: boolean },
): Record<string, unknown>;
```

Handles:

Returns a new dict of {paramName: callbackResult} for non-skipped leaves, with nested dicts for conditionals/sections and arrays for repeats.
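The signature only hints at what the checkUnknownKeys option does. Here is a minimal sketch, assuming it compares state keys against the declared parameter names and tolerates Galaxy bookkeeping keys. The key set is illustrative and not exhaustive. This UnknownKeyError is a stand-in whose fields may differ from the walker's exported class.

```typescript
// Illustrative, non-exhaustive set of Galaxy bookkeeping keys tolerated in native tool_state.
const BOOKKEEPING_KEYS = new Set(["__page__", "__rerun_remap_job_id__", "__current_case__", "__index__"]);

// Stand-in for the walker's exported UnknownKeyError; the real fields may differ.
class UnknownKeyError extends Error {
  readonly keys: string[];
  readonly prefix: string;
  constructor(keys: string[], prefix: string) {
    super(`unknown keys at ${prefix || "<root>"}: ${keys.join(", ")}`);
    this.name = "UnknownKeyError";
    this.keys = keys;
    this.prefix = prefix;
  }
}

// Throws if the state dict holds keys that are neither declared parameters nor bookkeeping.
function checkUnknownKeys(
  state: Record<string, unknown>,
  declaredNames: string[],
  prefix = "",
): void {
  const declared = new Set(declaredNames);
  const extras = Object.keys(state).filter((k) => !declared.has(k) && !BOOKKEEPING_KEYS.has(k));
  if (extras.length > 0) throw new UnknownKeyError(extras, prefix);
}
```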

walkFormat2State()

```typescript
function walkFormat2State(
  toolInputs: ToolParameterModel[],
  state: Record<string, unknown>,
  leafCallback: LeafCallback,
  prefix?: string,
): Record<string, unknown>;
```

Simpler: no double encoding, no bookkeeping keys, no input_connections. It is a clean dict walk with conditional branch selection, repeat iteration, and section recursion.
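The recursion both walkers share can be sketched over a deliberately simplified parameter model. Everything here is hypothetical: the `Param` union, its `kind` discriminant, and `walkFormat2Sketch`. The sketch assumes Galaxy-style flat paths (`|` between levels, `_<index>` for repeat instances). It also applies the no-legacy-decode rule by throwing on string containers.

```typescript
// Hypothetical, simplified parameter model; the real ToolParameterModel is richer.
type Leaf = { kind: "leaf"; name: string; parameterType: string };
type Section = { kind: "section"; name: string; parameters: Param[] };
type Repeat = { kind: "repeat"; name: string; parameters: Param[] };
type Conditional = {
  kind: "conditional";
  name: string;
  test: Leaf;
  whens: { discriminator: string; parameters: Param[] }[];
};
type Param = Leaf | Section | Repeat | Conditional;

const SKIP_VALUE = Symbol("SKIP_VALUE");
type LeafCallback = (param: Leaf, value: unknown, statePath: string) => unknown;

// Containers must already be dicts; strings are rejected, never JSON-decoded.
function asDict(value: unknown, path: string): Record<string, unknown> {
  if (value === undefined || value === null) return {};
  if (typeof value !== "object" || Array.isArray(value)) {
    throw new Error(`expected dict at ${path}, got ${typeof value}`);
  }
  return value as Record<string, unknown>;
}

function walkFormat2Sketch(
  params: Param[],
  state: Record<string, unknown>,
  callback: LeafCallback,
  prefix = "",
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const param of params) {
    // Galaxy-style flat path: "|" between levels, "_<index>" for repeat instances.
    const path = prefix ? `${prefix}|${param.name}` : param.name;
    const value = state[param.name];
    switch (param.kind) {
      case "leaf": {
        const result = callback(param, value, path);
        if (result !== SKIP_VALUE) out[param.name] = result;
        break;
      }
      case "section":
        out[param.name] = walkFormat2Sketch(param.parameters, asDict(value, path), callback, path);
        break;
      case "repeat": {
        const items: unknown = value ?? [];
        if (!Array.isArray(items)) throw new Error(`expected list at ${path}`);
        out[param.name] = items.map((item, i) =>
          walkFormat2Sketch(param.parameters, asDict(item, `${path}_${i}`), callback, `${path}_${i}`),
        );
        break;
      }
      case "conditional": {
        const branch = asDict(value, path);
        const testValue = branch[param.test.name];
        const walked: Record<string, unknown> = {};
        const testResult = callback(param.test, testValue, `${path}|${param.test.name}`);
        if (testResult !== SKIP_VALUE) walked[param.test.name] = testResult;
        // Select the active branch by matching the test value against each when's discriminator.
        const when = param.whens.find((w) => w.discriminator === String(testValue));
        if (when) Object.assign(walked, walkFormat2Sketch(when.parameters, branch, callback, path));
        out[param.name] = walked;
        break;
      }
    }
  }
  return out;
}
```

Returning SKIP_VALUE drops a leaf without disturbing its parent container. The conversion callbacks rely on this to discard data inputs and connection markers.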

Relationship to state-merge.ts

state-merge.ts does similar tree walking for connection injection/stripping, but it mutates in place and doesn’t use a leaf callback. The walker generalizes the pattern. We keep state-merge.ts as-is for now because it works and is tested; the walker is for new stateful conversion code. A future refactor could unify them, but that is out of scope here.

Reusable utilities

state-merge.ts already exports flatStatePath(), repeatInputsToArray(), keysStartingWith(), and selectWhichWhen(). The walker should import and reuse these rather than duplicate them.

Tests: Unit tests covering:

Step 2: Conversion functions

New file: packages/schema/src/workflow/stateful-convert.ts

Native → Format2: convertStateToFormat2()

```typescript
interface Format2ConvertedState {
  state: Record<string, unknown>;
  in: Record<string, string>;  // connection mapping (statePath → placeholder)
}

function convertStateToFormat2(
  nativeStep: NormalizedNativeStep,
  toolInputs: ToolParameterModel[],
): Format2ConvertedState;
```

Logic:

  1. Extract tool_state, input_connections, connected paths from step
  2. Walk native state with leaf callback that:
    • gx_data / gx_data_collection: always SKIP_VALUE, record in in block if connected/runtime
    • gx_rules: parse JSON string to object; SKIP_VALUE if null or connected
    • ConnectedValue/RuntimeValue markers: record in in block, SKIP_VALUE
    • Scalars: coerce via convertScalarValue()
    • Null/"null" values: SKIP_VALUE
  3. Return {state, in} pair
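A leaf callback that follows the rules above might look like this sketch. Several simplifications apply:

- It takes a parameter-type string instead of a ToolParameterModel.
- It recognizes markers by Galaxy's `{"__class__": "ConnectedValue"}` / `{"__class__": "RuntimeValue"}` shape.
- It writes the marker class name into the `in` block as the placeholder. This is an assumption, since the placeholder format is not specified here.

`makeFormat2LeafCallback` is a hypothetical name.

```typescript
const SKIP_VALUE = Symbol("SKIP_VALUE");

type MarkerClass = "ConnectedValue" | "RuntimeValue";

// Galaxy native state marks connected/runtime inputs with {"__class__": "..."} objects.
function markerClass(value: unknown): MarkerClass | null {
  if (typeof value !== "object" || value === null) return null;
  const cls = (value as { __class__?: unknown }).__class__;
  return cls === "ConnectedValue" || cls === "RuntimeValue" ? cls : null;
}

// Simplified leaf callback factory; fills `inBlock` as a side effect.
function makeFormat2LeafCallback(
  connectedPaths: Set<string>,
  inBlock: Record<string, string>,
  coerce: (parameterType: string, value: unknown) => unknown,
) {
  return (parameterType: string, value: unknown, statePath: string): unknown => {
    const marker = markerClass(value);
    if (marker !== null || connectedPaths.has(statePath)) {
      // Hypothetical placeholder: the marker class name.
      inBlock[statePath] = marker ?? "ConnectedValue";
      return SKIP_VALUE;
    }
    // Unconnected data inputs are still dropped from format2 state.
    if (parameterType === "gx_data" || parameterType === "gx_data_collection") return SKIP_VALUE;
    if (value === null || value === "null") return SKIP_VALUE;
    if (parameterType === "gx_rules") return typeof value === "string" ? JSON.parse(value) : value;
    return coerce(parameterType, value);
  };
}
```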

Scalar coercions: convertScalarValue()

| Parameter type | Native | Format2 |
|---|---|---|
| gx_integer | "42" or 42 | 42 (number) |
| gx_float | "3.14" or 3.14 | 3.14 (number) |
| gx_boolean | "true"/"false" or bool | true/false (boolean) |
| gx_select (multiple) | "a,b,c" or list | ["a","b","c"] (array) |
| gx_data_column (multiple) | "0,1" or list | [0, 1] (number array) |
| gx_data_column (single) | "3" or 3 | 3 (number) |
| gx_text, gx_color, gx_hidden, etc. | string | string (passthrough) |
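A sketch of the coercions in the table follows. It assumes a simplified `{ parameterType, multiple }` descriptor in place of ToolParameterModel. `convertScalarSketch` is a hypothetical stand-in for convertScalarValue(). Its handling of edge cases, such as empty multi-select strings or non-integral gx_integer values, is a guess and may not match the Python reference.

```typescript
// Hypothetical, simplified descriptor; the real function reads ToolParameterModel.
interface ScalarParam {
  parameterType: string;
  multiple?: boolean;
}

function toNumber(value: unknown): number {
  if (typeof value === "number") return value;
  if (typeof value === "string" && value.trim() !== "" && !Number.isNaN(Number(value))) {
    return Number(value);
  }
  throw new Error(`cannot coerce ${JSON.stringify(value)} to a number`);
}

// Native multi-values may be comma-delimited strings or real lists.
function toList(value: unknown): unknown[] {
  if (Array.isArray(value)) return value;
  if (typeof value === "string") return value === "" ? [] : value.split(","); // "" -> [] is an assumption
  throw new Error(`cannot coerce ${JSON.stringify(value)} to a list`);
}

function convertScalarSketch(param: ScalarParam, value: unknown): unknown {
  switch (param.parameterType) {
    case "gx_integer":
    case "gx_float":
      return toNumber(value);
    case "gx_boolean":
      if (typeof value === "boolean") return value;
      if (value === "true" || value === "false") return value === "true";
      throw new Error(`cannot coerce ${JSON.stringify(value)} to a boolean`);
    case "gx_select":
      return param.multiple ? toList(value) : value;
    case "gx_data_column":
      return param.multiple ? toList(value).map((v) => toNumber(v)) : toNumber(value);
    default:
      return value; // gx_text, gx_color, gx_hidden, ...: passthrough
  }
}
```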

Format2 → Native: encodeStateToNative()

```typescript
function encodeStateToNative(
  toolInputs: ToolParameterModel[],
  state: Record<string, unknown>,
): Record<string, unknown>;
```

Walks format2 state reversing coercions:

No JSON.stringify per-key. Returns a clean dict. The structural conversion (toNative()) places this dict directly as tool_state — a proper object, not double-encoded JSON strings.

Validation wrapper

```typescript
function convertStateToFormat2Validated(
  nativeStep: NormalizedNativeStep,
  toolInputs: ToolParameterModel[],
): Format2ConvertedState;  // throws ConversionValidationFailure
```

  1. Validate native state against createFieldModel(bundle, "workflow_step_native")
  2. Convert via convertStateToFormat2()
  3. Validate result against createFieldModel(bundle, "workflow_step") + linked validation with in connections
  4. Throw ConversionValidationFailure if either validation fails; the caller catches it and falls back
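The validate, convert, validate sequence and the caller-side fallback can be sketched generically. The validator shape (a function returning a list of issues) and the helper names are hypothetical; the real wrapper validates with models from createFieldModel(). The fallback checks the error name instead of using instanceof. That way it still works when Error subclasses are down-leveled to ES5.

```typescript
// Hypothetical error fields; the plan names ConversionValidationFailure but not its shape.
class ConversionValidationFailure extends Error {
  readonly stage: "native" | "format2";
  readonly issues: string[];
  constructor(stage: "native" | "format2", issues: string[]) {
    super(`${stage} validation failed: ${issues.join("; ")}`);
    this.name = "ConversionValidationFailure";
    this.stage = stage;
    this.issues = issues;
  }
}

// Hypothetical validator shape: returns a list of issues, empty when valid.
type Validator = (state: unknown) => string[];

// Validate the input, convert, then validate the output; throw on either failure.
function convertValidatedSketch<In, Out>(
  input: In,
  validateInput: Validator,
  convert: (input: In) => Out,
  validateOutput: Validator,
): Out {
  const before = validateInput(input);
  if (before.length > 0) throw new ConversionValidationFailure("native", before);
  const output = convert(input);
  const after = validateOutput(output);
  if (after.length > 0) throw new ConversionValidationFailure("format2", after);
  return output;
}

// Caller side: validation failures become null (passthrough); other errors propagate.
function orFallback<T>(attempt: () => T): T | null {
  try {
    return attempt();
  } catch (e) {
    if (e instanceof Error && e.name === "ConversionValidationFailure") return null;
    throw e;
  }
}
```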

Tests:

Step 3: Hook into toFormat2/toNative

Modified files:

ConversionOptions

```typescript
interface ConversionOptions {
  /** Per-step callback: native step → format2 state dict, or null for passthrough. */
  stateEncodeToFormat2?: (nativeStep: NormalizedNativeStep) => Record<string, unknown> | null;
  /** Per-step callback: (step, format2State) → native tool_state dict, or null for default. */
  stateEncodeToNative?: (step: Record<string, unknown>, state: Record<string, unknown>) => Record<string, unknown> | null;
  compact?: boolean;
}
```

Add an optional options parameter to toFormat2() and toNative():

```typescript
function toFormat2(raw: unknown, options?: ConversionOptions): NormalizedFormat2Workflow;
function toNative(raw: unknown, options?: ConversionOptions): NormalizedNativeWorkflow;
```

In _buildFormat2Step(), if options.stateEncodeToFormat2 is provided, call it with the native step. If it returns non-null, use the returned dict as the format2 state (replacing the passthrough tool_state). If null, fall back to current behavior.

Same pattern for _buildStep() in toNative.ts with stateEncodeToNative.
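The hook semantics reduce to a null-coalescing choice between the callback result and the passthrough state. Here is a minimal sketch with a hypothetical, simplified step shape; `chooseFormat2State` is an illustrative name, not a function in the codebase.

```typescript
// Hypothetical, simplified step shape.
interface NativeStepLike {
  id: string;
  tool_state?: Record<string, unknown>;
}

interface HookOptions {
  stateEncodeToFormat2?: (step: NativeStepLike) => Record<string, unknown> | null;
}

function chooseFormat2State(step: NativeStepLike, options?: HookOptions): Record<string, unknown> {
  // Stateful path: a non-null callback result replaces the passthrough state.
  const encoded = options?.stateEncodeToFormat2?.(step) ?? null;
  // Schema-free path: pass the native tool_state through unchanged.
  return encoded ?? step.tool_state ?? {};
}
```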

Stateful wrappers

New file: packages/schema/src/workflow/normalized/toFormat2Stateful.ts

```typescript
interface StepExportStatus {
  stepId: string;
  toolId?: string;
  converted: boolean;
  error?: string;
}

interface StatefulExportResult {
  workflow: NormalizedFormat2Workflow;
  steps: StepExportStatus[];
}

async function toFormat2Stateful(
  raw: unknown,
  toolCache: ToolCache,
  options?: { compact?: boolean },
): Promise<StatefulExportResult>;
```

Creates the stateEncodeToFormat2 callback:

  1. For each step, load tool from cache
  2. Call convertStateToFormat2Validated(step, tool.inputs)
  3. Track per-step status (converted vs fallback with error)
  4. Return converted state or null (fallback)
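Here is a hypothetical sketch of the callback factory. It resolves tool inputs by (toolId, toolVersion), runs a supplied conversion, records one StepExportStatus-like entry per tool step, and returns null to request passthrough. The resolver signature follows the (toolId, toolVersion) shape described in the Step 3 progress notes. `makeStatefulEncoder` and the simplified step type are illustrative.

```typescript
// Hypothetical, simplified shapes for illustration.
interface StepLike {
  id: string;
  tool_id?: string;
  tool_version?: string;
  tool_state?: Record<string, unknown>;
}

interface StepExportStatus {
  stepId: string;
  toolId?: string;
  converted: boolean;
  error?: string;
}

type ToolInputsResolver<P> = (toolId: string, toolVersion?: string) => P[] | undefined;

// Builds a stateEncodeToFormat2-style callback that records one status per tool step.
function makeStatefulEncoder<P>(
  resolve: ToolInputsResolver<P>,
  convert: (step: StepLike, inputs: P[]) => Record<string, unknown>,
  statuses: StepExportStatus[],
): (step: StepLike) => Record<string, unknown> | null {
  return (step) => {
    if (!step.tool_id) return null; // non-tool steps always pass through
    try {
      const inputs = resolve(step.tool_id, step.tool_version);
      if (!inputs) throw new Error(`tool not resolved: ${step.tool_id}@${step.tool_version}`);
      const state = convert(step, inputs);
      statuses.push({ stepId: step.id, toolId: step.tool_id, converted: true });
      return state;
    } catch (e) {
      // Any failure degrades this step to schema-free passthrough.
      const error = e instanceof Error ? e.message : String(e);
      statuses.push({ stepId: step.id, toolId: step.tool_id, converted: false, error });
      return null;
    }
  };
}
```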

New file: packages/schema/src/workflow/normalized/toNativeStateful.ts

Same pattern with stateEncodeToNative callback using encodeStateToNative().

Tests:

Step 4: CLI wiring

Modified files:

Add a --stateful flag to gxwf convert and gxwf convert-tree:

```shell
gxwf convert my-workflow.ga --to format2 --stateful
gxwf convert-tree ./workflows/ --to format2 --stateful --output-dir ./converted/
```

When --stateful:

Without --stateful: behavior unchanged (schema-free passthrough).

Tests:

Step 5: Precheck / legacy encoding gate

New file: packages/schema/src/workflow/precheck.ts

```typescript
interface PrecheckResult {
  canProcess: boolean;
  skipReasons: string[];
}

function precheckNativeWorkflow(
  workflow: NormalizedNativeWorkflow,
  toolInputs?: Map<string, ToolParameterModel[]>,
): PrecheckResult;
```

Checks:

Wire into stateful conversion: if precheck fails, skip stateful conversion for that workflow (fall back to schema-free).
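Here is a minimal per-step precheck sketch. It assumes a step is skipped when its tool_state is still a JSON string or contains `${...}` replacement parameters. It is coarser than the type-aware check described in the progress notes: it flags every string leaf and omits legacy container detection. `precheckStepSketch` is a hypothetical name.

```typescript
interface PrecheckSketchResult {
  canProcess: boolean;
  skipReasons: string[];
}

// Matches Galaxy-style ${name} replacement parameters.
const REPLACEMENT_PATTERN = /\$\{[^}]*\}/;

// Recursively records a reason for every string leaf holding a replacement parameter.
function collectReplacements(value: unknown, path: string, reasons: string[]): void {
  if (typeof value === "string") {
    if (REPLACEMENT_PATTERN.test(value)) reasons.push(`${path}: contains a \${...} replacement parameter`);
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => collectReplacements(item, `${path}_${i}`, reasons));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      collectReplacements(child, path ? `${path}|${key}` : key, reasons);
    }
  }
}

function precheckStepSketch(toolState: unknown): PrecheckSketchResult {
  const skipReasons: string[] = [];
  if (typeof toolState === "string") {
    skipReasons.push("tool_state is a JSON string, not a dict (legacy encoding)");
  } else {
    collectReplacements(toolState, "", skipReasons);
  }
  return { canProcess: skipReasons.length === 0, skipReasons };
}
```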

Tests:

Step 6: Roundtrip validation

New file: packages/schema/src/workflow/roundtrip.ts

```typescript
interface StepRoundtripResult {
  stepId: string;
  toolId?: string;
  success: boolean;
  failureClass?: FailureClass;
  error?: string;
  diffs: string[];
}

interface RoundtripResult {
  workflowName: string;
  stepResults: StepRoundtripResult[];
  success: boolean;
}

async function roundtripValidate(
  nativeWorkflow: NormalizedNativeWorkflow,
  toolCache: ToolCache,
): Promise<RoundtripResult>;
```

Pipeline: native → format2 (stateful) → native’ (stateful) → compare

Comparison logic

Per-step comparison of original vs reimported tool_state:
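Here is a minimal structural diff that illustrates the comparison, assuming states are compared leaf by leaf using flat paths. It reports every mismatch. It does not separate benign artifacts (for example `42` vs `"42"`) from real differences. `diffStates` is a hypothetical helper.

```typescript
function isDict(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively compares two tool_state values and returns one message per differing leaf,
// using Galaxy-style flat paths ("|" between levels, "_<index>" for list items).
function diffStates(a: unknown, b: unknown, path = "", diffs: string[] = []): string[] {
  if (isDict(a) && isDict(b)) {
    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
    for (const key of keys) diffStates(a[key], b[key], path ? `${path}|${key}` : key, diffs);
  } else if (Array.isArray(a) && Array.isArray(b)) {
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
      diffStates(a[i], b[i], `${path}_${i}`, diffs);
    }
  } else if (JSON.stringify(a) !== JSON.stringify(b)) {
    diffs.push(`${path || "<root>"}: ${JSON.stringify(a)} != ${JSON.stringify(b)}`);
  }
  return diffs;
}
```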

CLI

```shell
gxwf roundtrip my-workflow.ga
gxwf roundtrip-tree ./workflows/ --json
```

Exit codes: 0 = clean roundtrip, 1 = benign diffs only, 2 = real diffs or errors.
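The exit-code contract fits in a small function. The outcome shape and the `"benign" | "real"` severity values are assumptions for illustration, not the exported DiffSeverity type.

```typescript
// Assumed severity labels; the exported DiffSeverity type may use different values.
type Severity = "benign" | "real";

interface StepOutcome {
  error?: string;
  diffs: { severity: Severity }[];
}

// 0 = clean roundtrip, 1 = benign diffs only, 2 = any real diff or any step error.
function roundtripExitCode(steps: StepOutcome[]): 0 | 1 | 2 {
  const diffs = steps.flatMap((step) => step.diffs);
  if (steps.some((step) => step.error !== undefined)) return 2;
  if (diffs.some((diff) => diff.severity === "real")) return 2;
  return diffs.length > 0 ? 1 : 0;
}
```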

Tests:

Step 7: Documentation

Update docs/guide/workflow-operations.md:

Update docs/packages/cli.md:


Implementation Order

  1. Step 1 — Walker (foundation for everything) ✅ Done (2026-04-04)
  2. Step 2 — Conversion functions (uses walker) ✅ Done (2026-04-04) — validation wrapper deferred to Step 3 (depends on integration)
  3. Step 5 — Precheck (independent, wired into steps 3-4) ✅ Done (2026-04-04)
  4. Step 3 — ConversionOptions hooks in toFormat2/toNative + stateful wrappers ✅ Done (2026-04-04, revised 2026-04-04)
  5. Step 4 — CLI wiring ✅ Done (2026-04-04)
  6. Step 6 — Roundtrip validation ✅ Done (2026-04-04)
  7. Step 7 — Documentation ✅ Done (2026-04-04)

Steps 1-2 are schema-package work. Step 5 is small and independent. Steps 3-4 wire everything together. Step 6 builds on top.

Progress notes (2026-04-04)

Step 1: packages/schema/src/workflow/walker.ts + test/walker.test.ts (34 tests). Exports: walkNativeState, walkFormat2State, SKIP_VALUE, UnknownKeyError, LeafCallback, WalkNativeOptions. Reuses flatStatePath, repeatInputsToArray, and selectWhichWhen from state-merge.ts. Rejects string containers (no legacy decode).

Step 2: packages/schema/src/workflow/stateful-convert.ts + test/stateful-convert.test.ts (52 tests). Exports: convertScalarValue, reverseScalarValue, convertStateToFormat2, encodeStateToNative, Format2ConvertedState. The scalar coercion table matches the Python reference (confirmed via research agent). encodeStateToNative returns clean dicts with no per-key JSON.stringify. Deferred to Step 3: the convertStateToFormat2Validated wrapper with ConversionValidationFailure, since Step 3 is where createFieldModel validation integrates with toFormat2/toNative.

Step 5: packages/schema/src/workflow/precheck.ts + test/precheck.test.ts (9 tests). Exports: precheckNativeWorkflow, PrecheckResult, StepPrecheckResult. Reuses scanForReplacements (typed ${...} detection) and scanToolState (legacy encoding detection), so there is no duplicated walking. Per-step results enable per-step fallback in later steps.

Step 3 — ConversionOptions hooks + stateful wrappers.

Resolved open question (sync vs async callbacks): use a callback-shaped resolver, matching gxformat2’s ConversionOptions.state_encode_to_* design (options.py, _conversion.py:1264-1276). Core conversion stays sync, and the CLI layer handles async preloading. The schema package has no runtime dependency on core.

Revised 2026-04-04: The original design used a Map<tool_id, ToolParameterModel[]> keyed by tool_id alone. A reviewer identified (and gxformat2 research confirmed) that this (a) collides on version for multi-version workflows and (b) requires pre-computed lookups that miss external subworkflow refs. Switched to a ToolInputsResolver callback taking (toolId, toolVersion), with the error message "tool not resolved: {id}@{version}". This is the gxformat2-aligned shape.
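The version-collision fix amounts to keying lookups by both id and version. Below is a hypothetical map-backed resolver. It is distinct from the existing mapResolver helper, which keys on toolId alone. `versionedMapResolver` is an illustrative name.

```typescript
// Hypothetical resolver type matching the (toolId, toolVersion) shape.
type ToolInputsResolverSketch<P> = (toolId: string, toolVersion?: string) => P[] | undefined;

// Keys entries by "id@version" so two versions of one tool cannot overwrite each other.
function versionedMapResolver<P>(
  entries: Array<[toolId: string, version: string, inputs: P[]]>,
): ToolInputsResolverSketch<P> {
  const byKey = new Map<string, P[]>();
  for (const [toolId, version, inputs] of entries) byKey.set(`${toolId}@${version}`, inputs);
  // A missing version is treated as unresolved rather than guessed.
  return (toolId, toolVersion) =>
    toolVersion === undefined ? undefined : byKey.get(`${toolId}@${toolVersion}`);
}
```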

Total: 100 new tests. Full schema suite: 4443 passed | 88 skipped. Lint/format/typecheck clean across all packages.

Step 4 — CLI wiring (gxwf convert / gxwf convert-tree --stateful flag).

Design notes:

Full test count: 4443 schema, 97 CLI (4 new), 13 proxy. make check + make test clean.

Step 6 — Roundtrip validation.

Exports added: roundtripValidate, RoundtripResult, StepRoundtripResult, StepDiff, DiffSeverity, BenignArtifactKind, RoundtripFailureClass (via workflow/index.ts and src/index.ts).

Total: 12 new tests. make check + make test clean: 4449 schema, 103 CLI (6 new), 97 core, 13 proxy.

Step 6 review pass (2026-04-04). Reviewed and applied three should-fixes + nits from a subagent review:

Python parity gaps not ported (deliberate):
  • step ID remapping (label+type matching); TS preserves step IDs through both conversions
  • subworkflow recursion in diff
  • comment remapping
  • visual/position/label/annotation diffs (_compare_step_visual)
  • opportunistic JSON-decode of string leaves (_try_json_decode), which violates the “no legacy decode” principle
  • KnownBenignArtifacts rich enum (we have a smaller BenignArtifactKind)

Version-key handling in resolvers is untested at the roundtrip layer: mapResolver(toolId) ignores version.

Step 7 — Documentation.


Relationship to Python STRICT_STATE_PLAN

The STRICT_STATE_PLAN decomposes --strict into --strict-structure, --strict-encoding, and --strict-state. Several items are directly relevant:


Future Work (not in this plan)


Unresolved Questions