COMPONENT_WORKFLOW_STATE_INITIAL_WORK

Structured Workflow Tool State: Initial Work Summary

Branch: structured_tool_state

Commits: ed4d1eeb4a (gxformat2 abstraction layer), 1a69df9462 (workflow conversion and validation), c33843b28b (WIP populate_state test)

Reference docs:

Problem Statement

Galaxy’s two workflow formats handle tool state very differently:

The gxformat2 library converts between formats without consulting tool definitions. It can’t validate parameter names/types, can’t infer missing __current_case__ values, and can’t produce clean state on export (it just copies tool_state with per-value JSON strings intact). The research docs identify this as a key gap: the export path produces tool_state (machine-format) not state (human-format).
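To make "per-value JSON strings" concrete, here is a minimal sketch of what decoding a native tool_state involves. The step content and the decode_tool_state helper are hypothetical and for illustration only; they are not part of gxformat2 or Galaxy.

```python
import json

# Hypothetical native tool_state: a JSON string whose individual values
# are themselves JSON-encoded strings.
native_tool_state = json.dumps({
    "num_lines": json.dumps(5),
    "input1": json.dumps({"__class__": "ConnectedValue"}),
    "__page__": json.dumps(None),
})


def decode_tool_state(tool_state: str) -> dict:
    """Decode the outer JSON string, then each per-value JSON string."""
    outer = json.loads(tool_state)
    decoded = {}
    for key, value in outer.items():
        if isinstance(value, str):
            try:
                decoded[key] = json.loads(value)
            except json.JSONDecodeError:
                # Not valid JSON: keep the raw string.
                decoded[key] = value
        else:
            decoded[key] = value
    return decoded


decoded = decode_tool_state(native_tool_state)
```

Without the tool definition, a decoder like this cannot tell whether a value such as "5" was meant as text or as an integer, which is one reason schema-aware conversion is needed.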

Galaxy already has a sophisticated tool state validation infrastructure (12 state representations with Pydantic models), including workflow_step and workflow_step_linked representations that validate workflow-context tool state. This work starts connecting that infrastructure to workflow format conversion.

What Was Built

1. gxformat2 Abstraction Layer (ed4d1eeb4a)

lib/galaxy/workflow/format2.py — Extracts gxformat2 conversion calls from managers/workflows.py into a reusable module:

managers/workflows.py was simplified to use these helpers, removing ~30 lines of duplicated conversion boilerplate.

test/unit/workflows/test_convert.py — Roundtrip test: format2 -> native -> format2.

2. Workflow State Validation and Conversion Package (1a69df9462)

New package: lib/galaxy/tool_util/workflow_state/

This is the core of the work. It’s in tool_util (runtime-independent), not galaxy.workflow (runtime-dependent).

Types (_types.py)

Validation — Format2 (validation_format2.py)

For each tool step in a format2 workflow:

  1. Resolves the tool via GetToolInfo
  2. Validates state dict against WorkflowStepToolState pydantic model (parameters without connections)
  3. Merges connections from in/connect into state as ConnectedValue markers
  4. Validates merged state against WorkflowStepLinkedToolState model (parameters with connections allowed)

Key function: merge_inputs() — walks tool parameter tree (conditionals, repeats, sections) and injects ConnectedValue into state dict for each connected input. This is the schema-aware analog of what gxformat2’s setup_connected_values() does schema-free.
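The tree walk can be sketched roughly as follows. This is not the actual merge_inputs() code: the dict-based parameter tree, the merge_connections name, and the "section|param" and "repeat_0|param" path conventions are assumptions made for illustration.

```python
from typing import Any, Dict, List, Set

CONNECTED_VALUE = {"__class__": "ConnectedValue"}


def merge_connections(
    params: List[Dict[str, Any]],
    state: Dict[str, Any],
    connected: Set[str],
    prefix: str = "",
) -> None:
    """Inject a ConnectedValue marker for every connected leaf parameter."""
    for param in params:
        name, kind = param["name"], param["type"]
        path = prefix + name
        if kind == "section":
            merge_connections(param["children"], state.setdefault(name, {}), connected, path + "|")
        elif kind == "repeat":
            # Only repeat instances already present in the state are visited.
            for index, instance in enumerate(state.setdefault(name, [])):
                merge_connections(param["children"], instance, connected, f"{path}_{index}|")
        elif kind == "conditional":
            branch_state = state.setdefault(name, {})
            test_value = branch_state.get(param["test"])
            # Descend only into the branch selected by the test parameter.
            merge_connections(param["whens"].get(test_value, []), branch_state, connected, path + "|")
        elif path in connected:
            state[name] = dict(CONNECTED_VALUE)


# Hypothetical tool definition and step state.
params = [
    {"name": "input1", "type": "data"},
    {"name": "adv", "type": "section", "children": [{"name": "ref", "type": "data"}]},
    {"name": "queries", "type": "repeat", "children": [{"name": "input2", "type": "data"}]},
    {"name": "cond", "type": "conditional", "test": "mode",
     "whens": {"a": [{"name": "x", "type": "data"}], "b": []}},
]
state = {"queries": [{}, {}], "cond": {"mode": "a"}}
connected = {"input1", "adv|ref", "queries_1|input2", "cond|x"}
merge_connections(params, state, connected)
```

The point of walking the tool definition rather than the connection keys is that nesting is resolved from the schema, so a connection path never has to be parsed heuristically.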

Handles:

Validation — Native (validation_native.py)

For each tool step in a native workflow:

  1. Parses tool_state JSON string
  2. Merges input_connections into state as ConnectedValue markers (some older workflows don’t have them inline)
  3. Walks parameter tree validating:
    • Integers: int(value) check
    • Data/collection: must be ConnectedValue/RuntimeValue dict or null+connected (unless optional)
    • Selects: value must be in options list
    • Conditionals: resolves when branch, cross-checks __current_case__ index
    • Extra keys (not in tool def): raises error
  4. Allowed extra keys: __page__, __rerun_remap_job_id__ at root level; __current_case__ + test param name in conditional branches
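The checks listed above might look something like the sketch below. It is a simplified illustration, not the contents of validation_native.py: the dict-based parameter definitions, StateValidationError, and validate_state are hypothetical, and the sketch assumes input_connections have already been merged into the state.

```python
from typing import Any, Dict, Iterable, List

ROOT_EXTRA_KEYS = frozenset({"__page__", "__rerun_remap_job_id__"})
MARKER_CLASSES = ("ConnectedValue", "RuntimeValue")


class StateValidationError(Exception):
    """Hypothetical error type for this sketch."""


def is_marker(value: Any) -> bool:
    return isinstance(value, dict) and value.get("__class__") in MARKER_CLASSES


def validate_state(
    params: List[Dict[str, Any]],
    state: Dict[str, Any],
    allowed_extra: Iterable[str] = ROOT_EXTRA_KEYS,
) -> None:
    # Keys that are neither tool parameters nor explicitly allowed are errors.
    known = {p["name"] for p in params}
    extra = set(state) - known - set(allowed_extra)
    if extra:
        raise StateValidationError(f"unexpected keys: {sorted(extra)}")
    for param in params:
        name, kind = param["name"], param["type"]
        value = state.get(name)
        if kind == "integer":
            try:
                int(value)
            except (TypeError, ValueError):
                raise StateValidationError(f"{name}: not an integer: {value!r}")
        elif kind == "data":
            if value is None:
                if not param.get("optional", False):
                    raise StateValidationError(f"{name}: required input is unset")
            elif not is_marker(value):
                raise StateValidationError(f"{name}: expected a connection marker")
        elif kind == "select":
            if value not in param["options"]:
                raise StateValidationError(f"{name}: {value!r} is not an option")
        elif kind == "conditional":
            cond_state = state.get(name) or {}
            test_name = param["test"]
            values = [when["value"] for when in param["whens"]]
            if cond_state.get(test_name) not in values:
                raise StateValidationError(f"{name}: no matching when branch")
            case = values.index(cond_state[test_name])
            current = cond_state.get("__current_case__")
            # Cross-check the stored case index against the resolved branch.
            if current is not None and current != case:
                raise StateValidationError(f"{name}: __current_case__ mismatch")
            validate_state(
                param["whens"][case]["params"],
                cond_state,
                {"__current_case__", test_name},
            )
```

Passing the allowed extra keys down as an argument keeps the root-level and conditional-branch allowances separate, matching the distinction drawn in step 4.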

Conversion (convert.py)

convert_state_to_format2(native_step_dict, get_tool_info) -> Format2State

Defensive conversion with validation guards:

  1. Resolve tool via GetToolInfo
  2. Validate native step state (fail -> ConversionValidationFailure)
  3. Convert to format2 state (currently handles gx_integer and gx_data only)
  4. Validate resulting format2 state (fail -> ConversionValidationFailure)
  5. Return Format2State (pydantic model with state and in fields)

The caller catches ConversionValidationFailure and falls back to the raw native tool_state — “better ugly than corrupted.”
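The guard-and-fallback pattern can be sketched as below. The exception class here is a local stand-in for the one named above, and convert_guarded, export_step, and the three stage callables are hypothetical; the real convert_state_to_format2 signature is the one shown earlier in this section.

```python
from typing import Any, Callable, Dict

Step = Dict[str, Any]


class ConversionValidationFailure(Exception):
    """Local stand-in for the exception named in the text."""


def convert_guarded(
    native_step: Step,
    validate_native: Callable[[Step], None],
    convert: Callable[[Step], Step],
    validate_format2: Callable[[Step], None],
) -> Step:
    """Convert only when both the input and the output validate."""
    try:
        validate_native(native_step)
    except Exception as exc:
        raise ConversionValidationFailure("native state is invalid") from exc
    format2_step = convert(native_step)
    try:
        validate_format2(format2_step)
    except Exception as exc:
        raise ConversionValidationFailure("converted state is invalid") from exc
    return format2_step


def export_step(native_step: Step, **stages: Callable) -> Step:
    """Caller side: fall back to the raw tool_state on any validation failure."""
    try:
        return convert_guarded(native_step, **stages)
    except ConversionValidationFailure:
        return {"tool_state": native_step["tool_state"]}
```

Validating on both sides of the conversion means a bug in the converter surfaces as a fallback to the original state rather than as a silently corrupted export.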

Status: Only the gx_integer and gx_data parameter types are implemented. Other types currently fall through to a pass or raise NotImplementedError. The module still contains debug print() statements.

Dispatch (validation.py)

validate_workflow(workflow_dict, get_tool_info) — detects format via a_galaxy_workflow == "true" and dispatches to format2 or native validator.
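A minimal sketch of that dispatch, assuming the two validators are passed in as callables. The detect_format and dispatch_validation names are hypothetical and do not reflect the actual validate_workflow signature.

```python
from typing import Any, Callable, Dict

Workflow = Dict[str, Any]


def detect_format(workflow_dict: Workflow) -> str:
    """Native workflows carry the marker a_galaxy_workflow == "true"."""
    if workflow_dict.get("a_galaxy_workflow") == "true":
        return "native"
    return "format2"


def dispatch_validation(
    workflow_dict: Workflow,
    validate_native: Callable[[Workflow], Any],
    validate_format2: Callable[[Workflow], Any],
) -> Any:
    # Route the workflow to the validator that matches its detected format.
    if detect_format(workflow_dict) == "native":
        return validate_native(workflow_dict)
    return validate_format2(workflow_dict)
```

Note that the marker is compared against the string "true", not the boolean True.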

3. Galaxy-Side Validator (lib/galaxy/workflow/gx_validator.py)

GalaxyGetToolInfo — concrete GetToolInfo implementation using Galaxy stock tools:

validate_workflow(as_dict) — convenience function using the global instance.

4. Supporting Changes

5. Tests

test/unit/workflows/test_workflow_validation.py — Main test file:

test/unit/workflows/test_workflow_state_conversion.py — converts test_workflow_1.ga cat step to format2.

test/unit/workflows/test_workflow_validation_helpers.py — tests GalaxyGetToolInfo (resolves cat1 by version and latest).

Test fixtures in test/unit/workflows/valid/ and invalid/ — minimal gxwf.yml workflows using gx_int and gx_data parameter spec test tools.

6. WIP Populate State Test (c33843b28b)

Adds a TestMetadata class to test_populate_state.py that tests populate_state() with the gx_data_column.xml tool. Marked WIP and may not be kept.

Current Gaps / What’s Left