Plan C: Representation-Transform Architecture
Approach
workflow_step and workflow_step_linked were already designed for format2 state validation (commit 39641f6531). Don’t create a new representation — leverage the existing pair and reframe conversion as a representation transform between them and the native encoding. The visitor pattern + Pydantic models already handle all parameter types; the missing piece is wiring them into the conversion pipeline.
Why No New Representation
The research confirms workflow_step already encodes format2’s contract:
| Property | workflow_step | Format2 state | Match? |
|---|---|---|---|
| Structured dicts | Yes | Yes | Yes |
| No ConnectedValue | Yes (rejected) | Yes (in in dict) | Yes |
No __current_case__ | Yes (discriminated union on selector) | Yes (inferred) | Yes |
No __page__/__rerun_remap_job_id__ | Yes | Yes | Yes |
| Data params | type(None), requires_value=False → absent OK | Absent (in in dict) | Yes |
| Defaults not required | Yes (nearly everything optional) | Yes | Yes |
| Extra fields forbidden | Yes (extra=“forbid”) | Yes | Yes |
Specific design choices that confirm intent:
- DataParameterModel:
type(None)withrequires_value=False— absent is valid - SelectParameterModel:
py_type_workflow_stepforces all selects optional - HiddenParameterModel: non-optional hidden forced optional for workflow context
- ConditionalParameterModel:
__absent__branch allows entire conditional to be missing, discriminator uses test param value not__current_case__ - BooleanParameterModel: StrictBool compatible with YAML bool deserialization
Creating a 13th representation that duplicates workflow_step would be confusing — the old one would lose its reason to exist.
Core Architecture: Conversion as Representation Transform
The key insight: native tool_state (double-encoded JSON with ConnectedValue + __current_case__ + bookkeeping) needs to become format2 state (clean structured dict). This is a transform from one encoding to workflow_step representation. The reverse (format2→native) is a transform from workflow_step to the native encoding.
Native tool_state (JSON string)
│
▼ parse + strip bookkeeping
workflow_step_linked (structured dict with ConnectedValue markers)
│
▼ strip ConnectedValue → connections dict
workflow_step (clean dict) ←→ Format2 state
+
connections dict ←→ Format2 in/connect
The existing visit_input_values() visitor and pydantic_template() infrastructure handle the parameter tree traversal. We just need conversion callbacks.
Implementation Steps
Phase 1: Validate That workflow_step Already Works (1-2 days)
Before building conversion, confirm the representation handles all format2 patterns.
Step 1.1: Audit parameter_specification.yml for workflow_step coverage
File: test/unit/tool_util/parameter_specification.yml
Check that workflow_step_valid and workflow_step_invalid entries exist for all parameter types used in format2 workflows. Identify any gaps. Key types to verify:
- gx_text, gx_integer, gx_float, gx_boolean
- gx_select (single + multiple)
- gx_conditional_boolean, gx_conditional_select
- gx_repeat, gx_section
- gx_data, gx_data_collection, gx_data_optional
Step 1.2: Add missing workflow_step spec entries
For any parameter type lacking workflow_step_valid/workflow_step_invalid entries, add them. These test the exact values that would appear in format2 state blocks.
Step 1.3: Test with real format2 workflow state
Write a test that:
- Takes a format2 workflow fixture (e.g.,
default_values.gxwf.yml) - Extracts the
statedict from a tool step - Validates it against
WorkflowStepToolState.parameter_model_for(tool.inputs) - Confirms it passes
If any real format2 state fails validation, that’s a bug in workflow_step to fix — not a reason for a new representation.
Phase 2: Native → workflow_step_linked Parsing (1 week)
The first conversion step: parse native tool_state JSON into a workflow_step_linked dict.
Step 2.1: Build native state parser
File: lib/galaxy/tool_util/workflow_state/convert.py
def parse_native_to_workflow_step_linked(
tool_state_json: str,
input_connections: dict,
input_models: ToolParameterBundle,
) -> dict:
"""Parse native tool_state JSON → workflow_step_linked dict.
- Outer JSON.loads to get the double-encoded dict
- Per-value JSON.loads to decode string values
- Type coercion using parameter type info (string "5" → int 5)
- Inject ConnectedValue for connected params
- Strip __page__, __rerun_remap_job_id__, __current_case__
"""
This uses visit_input_values() with a callback that:
- For each leaf parameter, decodes the JSON string value and coerces to the correct Python type
- For connected params (in
input_connections), returns{"__class__": "ConnectedValue"} - For conditionals, strips
__current_case__(the discriminated union handles it) - Skips
__page__,__rerun_remap_job_id__
Step 2.2: Validate result
After parsing, validate the result against WorkflowStepLinkedToolState:
linked_model = WorkflowStepLinkedToolState.parameter_model_for(input_models)
linked_model.model_validate(parsed_state)
This catches any parsing errors immediately.
Phase 3: workflow_step_linked → workflow_step (Format2 State) (3-4 days)
Strip ConnectedValue markers and separate into state + connections.
Step 3.1: Build stripping converter
File: lib/galaxy/tool_util/workflow_state/convert.py or new convert_format2.py
def linked_to_format2(
linked_state: dict,
input_models: ToolParameterBundle,
) -> tuple[dict, dict]:
"""workflow_step_linked → (workflow_step dict, connections dict).
Walk the parameter tree. For each ConnectedValue, remove from state
and record in connections dict. Return clean state + connections.
"""
connections = {}
def callback(parameter, value, prefix_path):
if isinstance(value, dict) and value.get("__class__") == "ConnectedValue":
connections[prefix_path] = True
return VISITOR_REMOVE
return VISITOR_NO_REPLACEMENT
clean_state = visit_input_values(input_models, linked_state, callback)
return clean_state, connections
Step 3.2: Validate result
step_model = WorkflowStepToolState.parameter_model_for(input_models)
step_model.model_validate(clean_state)
Step 3.3: Build connections dict → format2 in dict
Map the internal connection paths to format2 in syntax. The connections dict keys are parameter paths (e.g., anno|reference_gene_sets). These become entries in the format2 in block, with the actual source step/output resolved from input_connections.
Phase 4: Full Pipeline (convert_state_to_format2) (2-3 days)
Step 4.1: Wire the pipeline together
File: lib/galaxy/tool_util/workflow_state/convert.py
Replace the current per-type _convert_state_at_level() approach:
def convert_state_to_format2(native_step_dict, get_tool_info):
parsed_tool = get_tool_info.get_tool_info(tool_id, tool_version)
# Step 1: Parse native → workflow_step_linked
linked_state = parse_native_to_workflow_step_linked(
native_step_dict["tool_state"],
native_step_dict.get("input_connections", {}),
parsed_tool.inputs,
)
# Step 2: Validate as workflow_step_linked
WorkflowStepLinkedToolState.parameter_model_for(parsed_tool.inputs) \
.model_validate(linked_state)
# Step 3: Strip connections → workflow_step + connections
clean_state, connections = linked_to_format2(linked_state, parsed_tool.inputs)
# Step 4: Validate as workflow_step
WorkflowStepToolState.parameter_model_for(parsed_tool.inputs) \
.model_validate(clean_state)
# Step 5: Build format2 in dict from connections + input_connections
format2_in = build_format2_in(connections, native_step_dict)
return Format2State(state=clean_state, in_=format2_in)
This replaces per-type conversion with representation transforms. All parameter types work automatically because the visitor pattern + Pydantic models already handle them.
Step 4.2: Reverse pipeline (format2 → native)
For round-trip support:
def convert_format2_to_native(format2_state, format2_in, input_models):
# Step 1: Validate as workflow_step
WorkflowStepToolState.parameter_model_for(input_models) \
.model_validate(format2_state)
# Step 2: Inject ConnectedValue markers from in dict
linked_state = inject_connections(format2_state, format2_in, input_models)
# Step 3: Validate as workflow_step_linked
WorkflowStepLinkedToolState.parameter_model_for(input_models) \
.model_validate(linked_state)
# Step 4: Encode to native format (per-value JSON.dumps, outer JSON.dumps)
native_tool_state = encode_to_native(linked_state)
return native_tool_state
Phase 5: Improve validation_format2.py (1-2 days)
Step 5.1: Use the conversion pipeline for validation
File: lib/galaxy/tool_util/workflow_state/validation_format2.py
The current validate_step_against() already uses WorkflowStepToolState and WorkflowStepLinkedToolState. Simplify to use the pipeline:
def validate_step_against(step_dict, parsed_tool):
if "state" in step_dict:
# Phase 1: Validate raw format2 state
model = WorkflowStepToolState.parameter_model_for(parsed_tool.inputs)
model.model_validate(step_dict["state"])
# Phase 2: Merge connections and validate linked state
linked_step = merge_inputs(step_dict, parsed_tool)
linked_model = WorkflowStepLinkedToolState.parameter_model_for(parsed_tool.inputs)
linked_model.model_validate(linked_step["state"])
This is already close to what exists — just ensure it’s clean.
Phase 6: Schema Generation for External Tools (Optional, 1 day)
Step 6.1: JSON Schema from workflow_step model
def format2_state_json_schema(parsed_tool):
model = WorkflowStepToolState.parameter_model_for(parsed_tool.inputs)
return model.model_json_schema(mode="validation")
Serve via Tool Shed 2.0 API. External tools validate format2 state without Galaxy.
How This Enables Each Deliverable
| Deliverable | How It Works |
|---|---|
| D1: Conversion | parse_native_to_workflow_step_linked() + linked_to_format2() — representation transforms using visitor pattern. No per-type code. |
| D2: Validation | WorkflowStepToolState.parameter_model_for(inputs).model_validate(state) — already exists, just needs wiring. |
| D3: Native Validator | parse_native_to_workflow_step_linked() validates native state by parsing and validating as workflow_step_linked. |
| D4: Format2 Validator | validate_step_against() in validation_format2.py — already uses both representations. |
| D5: Round-Trip | Full pipeline: native → workflow_step_linked → workflow_step → validate → reverse → compare. Both directions validated. |
| D6: Export | Export calls convert_state_to_format2() per step, producing state (not tool_state). Falls back on failure. |
Advantages
- No new representation — reuses existing
workflow_step/workflow_step_linkedpair. No 13th representation to maintain. - All parameter types work immediately —
pydantic_template("workflow_step")already defined for every parameter model class. No per-type conversion code. - Visitor pattern handles tree traversal — conditionals, repeats, sections all handled by
visit_input_values()recursion. - Validated at every step — parse → validate → transform → validate → output. Errors caught early.
- Testable via existing infrastructure —
parameter_specification.ymlalready hasworkflow_step_valid/workflow_step_invalidentries. - JSON Schema for free —
model.model_json_schema()on existing models.
Risks
- Native parsing complexity — The double-encoded JSON with per-value string encoding is the hardest part. The visitor callback needs to handle type coercion (string “5” → int 5, string “true” → bool True) correctly for each parameter type.
- Legacy native encodings — Some older workflows have triple-encoded JSON or other legacy formats. The parser needs to handle these gracefully.
workflow_stepmodel completeness unknown — We won’t know if the models are complete until we start validating real workflows. Expect to discover and fix gaps.
Resolved Decisions
- Defaults: Don’t fill defaults. Native (.ga) workflows are essentially always fully filled in. Format2 workflows may have absent defaults from hand-coding — that’s expected and valid.
$linkin state:workflow_stepshould fail validation if$linkmarkers appear.$linkis frowned upon and not expected in real workflows.workflow_step_linkedhandles connection markers and may be used as an intermediary during conversion.- Dynamic selects: Lenient pass-through when options aren’t available at validation time.
__current_case__in reverse pipeline: Assume omitting it works fine. Validate via Galaxy framework tests and IWC corpus — fixing any defects is a project deliverable.
Status: Partially Superseded by Plan B
Plan B implemented the core architecture this plan describes: native → workflow_step_linked → workflow_step pipeline with post-conversion validation against both models. Key outcomes:
- Confirmed:
workflow_step/workflow_step_linkedis the right representation pair. No new representation needed. - Confirmed: Post-conversion validation (validate native, convert, validate cleaned) catches bugs early.
- Not adopted: Visitor pattern (
visit_input_values()) for conversion. Per-type explicit conversion was kept — more debuggable, worked well in practice. The “no per-type code” claim in Advantages #2 turned out to be aspirational; type coercion inherently requires per-type logic. - Not adopted:
linked_to_format2()as a separate visitor pass. Connection stripping and state building happen together in_convert_state_at_level(). - Still relevant: Phases 5-6 (improve validation_format2.py, JSON Schema generation) are not yet addressed by Plan B.
Open Questions
- Where does round-trip utility live — galaxy-tool-util, gxformat2, or both?