PLAN_B_ROUNDTRIP_FIRST

Plan B: Round-Trip First (IWC-Driven)

Approach

Start from the end goal (D5: round-trip validation) and work backward. Build a harness that takes real workflows through native→format2→native, catalog every failure, and let those failures drive library work. The workflows are the agenda — we don’t guess what to build; failures tell us.

Test Workflow Inventory

Framework Test Workflows (53 files)

Location: lib/galaxy_test/workflow/

Each has {name}.gxwf.yml (format2 source) + {name}.gxwf-tests.yml (execution spec).

| Category | Count | Examples |
| Basic types | 5 | default_values, default_values_optional, multiple_text |
| Integer handling | 3 | multiple_versions, integer_into_data_column |
| Collections | 6 | zip_collection, flatten_collection, empty_collection_sort |
| Collection mapping | 5 | multi_select_mapping, subcollection_rank_sorting |
| Conditionals | 2 | optional_conditional_inputs_to_build_list |
| Replacement params | 3 | replacement_parameters_text, _legacy, _nested |
| Other | ~29 | Various advanced patterns |

Strengths: Already format2, have execution specs. Good for format2→native→format2 direction.

Limitation: Not native .ga, no toolshed tools.

Native Test Workflows (13 files)

Location: lib/galaxy_test/base/data/

| File | Status | Blocker |
| test_workflow_1.ga (2.7KB) | Baseline target | - |
| test_workflow_2.ga (3.2KB) | Target | - |
| test_workflow_two_random_lines.ga (3.3KB) | Currently validates | - |
| test_workflow_pause.ga (3.7KB) | Target | Pause step handling |
| test_workflow_missing_tool.ga (2.8KB) | Error case | Tool not in toolbox |
| test_workflow_matching_lists.ga (3.6KB) | Target | List matching |
| test_workflow_randomlines_legacy_params.ga (1.8KB) | Target | Legacy param format |
| test_workflow_randomlines_legacy_params_mixed_types.ga (1.8KB) | Target | Mixed types |
| test_workflow_batch.ga (5.1KB) | Blocked | gx_text not handled |
| test_workflow_map_reduce_pause.ga (6.7KB) | Blocked | Double-nested JSON |
| test_subworkflow_with_integer_input.ga (16KB) | Blocked | Subworkflows |
| test_subworkflow_with_tags.ga (5.1KB) | Blocked | Subworkflows |
| test_workflow_topoambigouity.ga (13KB) | Blocked | Disconnected input |

Unit Test Workflows (5 files)

Location: test/unit/workflows/valid/ and invalid/

The Round-Trip Pipeline

The existing workflow_step and workflow_step_linked Pydantic representations were designed for format2 state validation (commit 39641f6531). Conversion is a representation transform: parse native encoding → validate as workflow_step_linked → strip ConnectedValue markers → produce workflow_step (= format2 state) + connections dict (= format2 in).

Native (.ga) tool_state JSON

    ▼  parse + type-coerce + strip bookkeeping
workflow_step_linked dict (structured, with ConnectedValue)

    ▼  validate against WorkflowStepLinkedToolState model

    ▼  strip ConnectedValue → separate connections dict
workflow_step dict (= Format2 state) + connections (= Format2 in)

    ▼  validate against WorkflowStepToolState model

    ▼  convert_from_format2() [gxformat2: python_to_workflow()]
Native' (.ga)

    ▼  compare(Native, Native')

  PASS / FAIL with structured diff
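
The "strip ConnectedValue" stage of the pipeline above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes markers appear as {"__class__": "ConnectedValue"} dicts, that nested paths are joined with "|" and repeat instances suffixed with "_N", and the helper names (is_connected, split_connections) are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def is_connected(value: Any) -> bool:
    """True if value looks like a ConnectedValue marker (assumed shape)."""
    return isinstance(value, dict) and value.get("__class__") == "ConnectedValue"


def split_connections(linked: Dict[str, Any], prefix: str = "") -> Tuple[Dict[str, Any], List[str]]:
    """Split a workflow_step_linked-style dict into (state, connected paths).

    Connected leaves are dropped from the state and their flat paths are
    collected so they can be emitted as format2 `in` entries.
    """
    state: Dict[str, Any] = {}
    connected: List[str] = []
    for key, value in linked.items():
        path = f"{prefix}|{key}" if prefix else key
        if is_connected(value):
            connected.append(path)
        elif isinstance(value, dict):
            sub_state, sub_connected = split_connections(value, path)
            state[key] = sub_state
            connected.extend(sub_connected)
        elif isinstance(value, list):
            items = []
            for index, item in enumerate(value):
                item_path = f"{path}_{index}"
                if is_connected(item):
                    connected.append(item_path)
                elif isinstance(item, dict):
                    sub_state, sub_connected = split_connections(item, item_path)
                    items.append(sub_state)
                    connected.extend(sub_connected)
                else:
                    items.append(item)
            state[key] = items
        else:
            state[key] = value
    return state, connected


linked_example = {
    "input1": {"__class__": "ConnectedValue"},
    "num": 3,
    "cond": {"mode": "a", "ref": {"__class__": "ConnectedValue"}},
}
state_example, connected_example = split_connections(linked_example)
# state_example     -> {"num": 3, "cond": {"mode": "a"}}
# connected_example -> ["input1", "cond|ref"]
```

Keeping the connected paths separate from the state is what lets each half be validated against its own model before the return conversion.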

Comparison Logic: “Functional Equivalence”

Fields that MUST match:

Fields explicitly SKIPPED:

Comparison implementation:

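from typing import List

def compare_tool_state(orig: dict, after: dict, path: str = "") -> List[str]:
    """Recursively compare parsed tool_state dicts.
    Skip bookkeeping keys. Report diffs with dot-path."""
    SKIP_KEYS = {"__current_case__", "__page__", "__rerun_remap_job_id__"}
    diffs: List[str] = []
    for key in sorted(set(orig) | set(after)):
        if key in SKIP_KEYS:
            continue
        key_path = f"{path}.{key}" if path else key
        if key not in orig or key not in after:
            diffs.append(f"{key_path}: present on one side only")
        elif isinstance(orig[key], dict) and isinstance(after[key], dict):
            diffs.extend(compare_tool_state(orig[key], after[key], key_path))
        elif orig[key] != after[key]:
            # Plain equality stands in here for the type-aware matching, which is not shown.
            diffs.append(f"{key_path}: {orig[key]!r} != {after[key]!r}")
    return diffs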

Phased Execution

Phase 1: Harness + Simple Workflows (2-3 weeks)

Goal: Build harness, get 3-5 native workflows passing, catalog all failure modes.

Step 1.1: Build round-trip test harness

New file: test/unit/workflows/test_roundtrip.py

Step 1.2: Run initial sweep against all 71 workflows

Expected output: Failure classification table:

| Workflow | Failure Class | Details |
| test_workflow_batch.ga | TYPE_NOT_HANDLED | gx_text parameter |
| test_workflow_map_reduce_pause.ga | PARSE_ERROR | double-nested JSON |
| test_subworkflow_*.ga | NOT_SUPPORTED | subworkflows |
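
A sweep that produces rows like the table above might look like the sketch below. The failure-class names come from the table; the exception-to-class mapping, the UNKNOWN class, and the sweep/classify helpers are assumptions made for illustration, and the round-trip function is passed in rather than implemented here.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class FailureClass(Enum):
    TYPE_NOT_HANDLED = "TYPE_NOT_HANDLED"
    PARSE_ERROR = "PARSE_ERROR"
    NOT_SUPPORTED = "NOT_SUPPORTED"
    UNKNOWN = "UNKNOWN"  # hypothetical catch-all, not in the table


def classify(exc: Exception) -> FailureClass:
    """Map an exception to a failure class (this mapping is an assumption)."""
    if isinstance(exc, NotImplementedError):
        return FailureClass.NOT_SUPPORTED
    if isinstance(exc, TypeError):
        return FailureClass.TYPE_NOT_HANDLED
    if isinstance(exc, ValueError):  # includes json.JSONDecodeError
        return FailureClass.PARSE_ERROR
    return FailureClass.UNKNOWN


def sweep(names: Iterable[str], roundtrip: Callable[[str], None]) -> List[Tuple[str, str, str]]:
    """Run roundtrip on every workflow and return (workflow, class, details) rows for failures."""
    rows: List[Tuple[str, str, str]] = []
    for name in names:
        try:
            roundtrip(name)
        except Exception as exc:  # the sweep must never stop at the first failure
            rows.append((name, classify(exc).value, str(exc)))
    return rows
```

Catching broadly inside the loop is deliberate: the point of the initial sweep is a complete catalog, so one crashing workflow should not hide the others.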

Step 1.3: Get baseline workflows passing

Step 1.4: Document failure categories

Each failure maps to a D1-D3 work item. Expected categories:

| Failure Class | Count (est.) | Root Cause | Maps To |
| Type coercion (string↔int) | ~10 | Native uses strings, format2 typed | D1 |
| __current_case__ mismatch | ~5 | Case index not recalculated | D1 |
| gx_text not handled | ~5 | Missing parameter type | D1/D2 |
| Tool not found | ~3 | Stock tool registry incomplete | D2 (GetToolInfo) |
| Double-nested JSON | ~2 | Legacy encoding | D1 |
| Subworkflows | ~2 | Not yet supported | D1 (early priority) |
| Disconnected input | ~1 | Non-optional without connection | D2 |

Phase 2: Fix Failures Systematically (3-4 weeks)

Goal: Fix failure classes in priority order (most workflows unblocked per fix). Target: 70%+ passing.

Step 2.1: Native → workflow_step_linked parser

Unblocks: Most failures at once (the core conversion)

Step 2.2: workflow_step_linked → workflow_step (format2 state)

Unblocks: Clean format2 export

Step 2.3: Conditional handling in parser

Unblocks: ~5 workflows with conditionals
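
Recalculating the case index for a conditional could be sketched like this. It assumes the branch discriminators are available as an ordered list of strings and that boolean test values compare against "true"/"false"; infer_current_case is a hypothetical helper name, not an existing function.

```python
from typing import Any, List, Optional


def infer_current_case(test_value: Any, discriminators: List[str]) -> Optional[int]:
    """Return the index of the branch whose discriminator matches test_value.

    Returns None when no branch matches, so the caller can report a
    structured failure instead of silently picking a branch.
    """
    if isinstance(test_value, bool):
        normalized = "true" if test_value else "false"
    else:
        normalized = str(test_value)
    for index, discriminator in enumerate(discriminators):
        if discriminator == normalized:
            return index
    return None


case_example = infer_current_case("paired", ["single", "paired"])
# case_example -> 1
```

Deriving the index from the test value, rather than trusting a stored __current_case__, is what allows format2 state to omit the bookkeeping key entirely.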

Step 2.4: Type coercion edge cases

Unblocks: remaining type-specific failures
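
A sketch of type-directed coercion from native string values to typed format2 values follows. The parameter type names other than gx_text are assumed for illustration, and coerce_value is a hypothetical helper; dict values such as ConnectedValue markers are passed through untouched.

```python
from typing import Any


def coerce_value(parameter_type: str, value: Any) -> Any:
    """Coerce a native (often string-encoded) value to the type format2 expects."""
    if value is None or isinstance(value, dict):
        return value  # nulls and marker dicts are not coerced
    if parameter_type == "gx_integer":
        return int(value)
    if parameter_type == "gx_float":
        return float(value)
    if parameter_type == "gx_boolean":
        if isinstance(value, bool):
            return value
        lowered = str(value).strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"cannot interpret {value!r} as boolean")
    if parameter_type == "gx_text":
        return str(value)
    return value  # unknown types pass through unchanged


coerced_example = coerce_value("gx_integer", "5")
# coerced_example -> 5
```

Raising on an uninterpretable boolean, rather than defaulting to False, keeps the failure visible in the sweep.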

Step 2.5: Stock tool registry expansion

Unblocks: ~3 workflows with tool-not-found

Step 2.6: Fix connection merging bugs

File: validation_native.py:native_connections_for()

Step 2.7: Double-nested JSON handling

Unblocks: test_workflow_map_reduce_pause.ga
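
One way to handle the legacy encoding, where values inside tool_state are themselves JSON-encoded strings, is sketched below. It is an assumption-laden illustration: decode_nested is a hypothetical helper, and it only unwraps strings that decode to a dict or list, leaving scalar strings such as "1" for the type-coercion step.

```python
import json
from typing import Any


def decode_nested(value: Any) -> Any:
    """Recursively unwrap JSON strings that encode containers."""
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except ValueError:
            return value  # plain text, not JSON
        if isinstance(decoded, (dict, list)):
            return decode_nested(decoded)
        return value  # scalar: keep the original string for later coercion
    if isinstance(value, dict):
        return {key: decode_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_nested(item) for item in value]
    return value


raw_example = json.dumps({"input": json.dumps({"__class__": "RuntimeValue"}), "num_lines": "1"})
decoded_example = decode_nested(raw_example)
# decoded_example -> {"input": {"__class__": "RuntimeValue"}, "num_lines": "1"}
```

Unwrapping only containers avoids guessing whether a text parameter holding "1" was meant as a string or a number; that decision needs the parameter type.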

Red-to-green for each fix:

  1. Identify failing workflow in test_roundtrip.py
  2. Write focused test case for the specific failure
  3. Implement fix
  4. Verify focused test passes
  5. Re-run full sweep to check for regressions

Phase 2.5: Refactor to visitor pattern

Motivation: convert.py and validation_native.py each hand-roll tree traversal (conditional branch selection, repeat iteration, section recursion) that duplicates visit_input_values() in visitor.py. Three places to update when adding parameter types or fixing traversal bugs.

Current visitor limitations (must fix first):

Approach:

  1. Fix visitor.py bugs (remove print statements)
  2. Extend callback protocol with path context: (parameter, value, path, context) -> replacement
    • path: flat state path (e.g. cond|repeat_0|param) — visitor already computes this via flat_state_path()
    • context: opaque dict passed through (carries step, format2_in, etc.)
  3. Add pre-processing hook for native state: JSON string decoding, connection merge — runs before visitor walks each level
  4. Refactor convert.py to use visitor for tree traversal, keep leaf logic in callback:
    • Callback handles: connection check → format2_in, type coercion → format2_state, RuntimeValue detection
    • ~30 lines of callback replacing ~60 lines of manual traversal
  5. Refactor validation_native.py similarly — merge + validation as visitor callback
  6. Verify 100% sweep still passes (no behavioral change, pure refactor)
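
The extended callback protocol from the approach above can be sketched with a toy visitor. This is not visit_input_values(): the dict-based parameter definitions (with a "kind" field) and the visit/to_format2 names are hypothetical, chosen only to show a single traversal driving a leaf callback with (parameter, value, path, context).

```python
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any], Any, str, Dict[str, Any]], Any]


def visit(parameters: List[Dict[str, Any]], state: Dict[str, Any],
          callback: Callback, context: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Walk state following the parameter definitions; return state with leaves replaced."""
    result: Dict[str, Any] = {}
    for parameter in parameters:
        name = parameter["name"]
        if name not in state:
            continue
        path = f"{prefix}|{name}" if prefix else name
        value = state[name]
        kind = parameter["kind"]
        if kind == "section":
            result[name] = visit(parameter["parameters"], value, callback, context, path)
        elif kind == "repeat":
            result[name] = [
                visit(parameter["parameters"], instance, callback, context, f"{path}_{index}")
                for index, instance in enumerate(value)
            ]
        elif kind == "conditional":
            test = parameter["test"]
            # Only the branch selected by the test value is traversed.
            branch = parameter["whens"].get(str(value[test["name"]]), [])
            result[name] = visit([test] + branch, value, callback, context, path)
        else:
            result[name] = callback(parameter, value, path, context)
    return result


def to_format2(parameter: Dict[str, Any], value: Any, path: str, context: Dict[str, Any]) -> Any:
    """Leaf logic only: record connections, coerce integers, pass everything else through."""
    if isinstance(value, dict) and value.get("__class__") == "ConnectedValue":
        context["format2_in"].append(path)
        return None  # a real implementation would drop the key instead
    if kind_is_integer(parameter):
        return int(value)
    return value


def kind_is_integer(parameter: Dict[str, Any]) -> bool:
    return parameter["kind"] == "integer"
```

Because keys absent from the parameter definitions are never visited, bookkeeping entries such as __current_case__ fall away without special handling in the callback.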

Risk: Low — existing 41/41 sweep is the safety net. Pure refactor, no new functionality.

When: Before Phase 3 (subworkflows, IWC corpus). Subworkflows will add another traversal concern; better to have one traversal path first.

Phase 2.6: Subworkflow Support

Motivation: Two native test workflows (test_subworkflow_with_integer_input.ga, test_subworkflow_with_tags.ga) and one framework workflow (replacement_parameters_nested.gxwf.yml) are blocked. Subworkflows are structural (no tool_state), so they need a different code path than tool steps. gxformat2 already handles them recursively in both directions — the gap is entirely in the workflow_state module and the round-trip test harness.

Key insight: Subworkflow steps have no tool_state to convert. The tool_state field is null/absent. The work is: (1) don’t crash on them, (2) recursively process nested tool steps within the embedded subworkflow, (3) compare nested structures in round-trip.
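
Point (2) above, recursing into the embedded subworkflow, might be sketched as follows. It assumes native steps live in a dict keyed by stringified integers, that a step's "type" is "tool" or "subworkflow", and that the nested workflow sits under a "subworkflow" key; iter_tool_steps is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_tool_steps(workflow: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (dotted step path, step) for every tool step, descending into subworkflows."""
    steps = workflow.get("steps", {})
    for step_id in sorted(steps, key=int):
        step = steps[step_id]
        path = f"{prefix}{step_id}"
        step_type = step.get("type")
        if step_type == "subworkflow":
            nested = step.get("subworkflow")
            if nested:  # no tool_state to convert here; only recurse
                yield from iter_tool_steps(nested, prefix=f"{path}.")
        elif step_type == "tool":
            yield path, step
```

The dotted path (for example "1.1" for step 1 of the subworkflow at step 1) gives the comparison stage a stable label for reporting nested diffs.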

Step 2.6.1: Understand the native subworkflow representation

Step 2.6.2: Handle subworkflow steps in convert.py

Step 2.6.3: Handle subworkflow steps in validation_native.py

Step 2.6.4: Update round-trip test harness

Step 2.6.5: Full round-trip with nested subworkflows

Step 2.6.6: Edge cases

Test plan (red-to-green):

  1. Add subworkflow test files to inventory → tests fail (SUBWORKFLOW failure class)
  2. Handle subworkflow steps in convert.py → per-step conversion passes for nested tool steps
  3. Handle in validation_native.py → validation passes
  4. Update comparison logic → full round-trip comparison works
  5. Assert 100% sweep still holds with subworkflow workflows included

Risk: Low-medium. gxformat2 does the heavy lifting for structural conversion. Main risk is edge cases in nested connection comparison and subworkflow output mapping.

When: After Phase 2.5 (visitor refactor), before Phase 3 (IWC/ToolShed). Subworkflows are common in IWC corpus — must work before scaling to real-world workflows.

Phase 3: IWC Scale + Execution Validation (3-4 weeks)

Goal: Extend to IWC workflows, prove execution equivalence.

Step 3.1: IWC workflow collection

Step 3.2: Toolshed tool support in GetToolInfo

Step 3.3: Execution equivalence testing

For workflows that pass structural round-trip:

  1. Run original native workflow through Galaxy test infrastructure
  2. Run round-tripped native’ workflow through same infrastructure
  3. Compare job outputs (not just metadata)
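
Step 3 could be approximated by comparing content digests per output label, as in the sketch below. How outputs are fetched from Galaxy is out of scope here; the inputs are assumed to be already-downloaded bytes keyed by output label, and compare_outputs is a hypothetical helper.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """SHA-256 hex digest of an output's content."""
    return hashlib.sha256(content).hexdigest()


def compare_outputs(original: Dict[str, bytes], roundtripped: Dict[str, bytes]) -> List[str]:
    """Return one diff message per output that is missing or differs in content."""
    diffs: List[str] = []
    for label in sorted(set(original) | set(roundtripped)):
        if label not in original:
            diffs.append(f"{label}: only in round-tripped run")
        elif label not in roundtripped:
            diffs.append(f"{label}: only in original run")
        elif digest(original[label]) != digest(roundtripped[label]):
            diffs.append(f"{label}: content differs")
    return diffs
```

Byte-level comparison is only meaningful for deterministic tools; workflows that sample randomly would likely need a fixed seed or a looser check.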

Step 3.4: Framework test workflows WITHOUT __current_case__

How Failures Map to Deliverables

Phase 1 failures

    ├─→ "10 workflows fail type coercion"
    │   └─→ D1 work: convert.py type handlers

    ├─→ "5 workflows fail conditional case"
    │   └─→ D1 work: __current_case__ inference

    ├─→ "5 workflows fail gx_text"
    │   └─→ D1+D2: text parameter support

    ├─→ "3 workflows fail tool lookup"
    │   └─→ D3: GetToolInfo expansion

    └─→ "2 workflows fail subworkflows"
        └─→ D1: subworkflow support (early priority)

Phase 2 fixes → D1 (conversion library) + D2 (validation library)
Phase 3 fixes → D3 (native validator) + D4 (format2 validator)
Harness itself → D5 (round-trip utility)
Export integration → D6

Key File Paths

Files to Create

| File | Purpose | Phase |
| test/unit/workflows/test_roundtrip.py | Round-trip harness + comparison | 1 |
| lib/galaxy/tool_util/workflow_state/roundtrip.py | Reusable round-trip utility | 2 |
| test/integration/workflows/test_roundtrip_execution.py | Execution equivalence | 3 |

Files to Modify

| File | Changes | Phase |
| lib/galaxy/tool_util/workflow_state/convert.py | Replace per-type _convert_state_at_level() with visitor-based parse → workflow_step_linked → workflow_step pipeline | 2 |
| lib/galaxy/tool_util/workflow_state/validation_native.py | Fix connection bug, simplify to use WorkflowStepLinkedToolState models | 2 |
| lib/galaxy/workflow/gx_validator.py | Expand tool registry | 2-3 |
| test/unit/workflows/test_workflow_validation.py | Uncomment blocked tests | 2 |

Reference Files (read, don’t modify)

| File | What It Tells Us |
| gxformat2/export.py:from_galaxy_native() | Current export behavior (produces tool_state not state) |
| gxformat2/converter.py:python_to_workflow() | Current import behavior (state vs tool_state paths) |
| lib/galaxy/tool_util_models/parameters.py | All parameter model classes + pydantic_template("workflow_step"), the target representation |
| lib/galaxy/tool_util/parameters/convert.py | fill_static_defaults() and visitor pattern |
| lib/galaxy/tool_util/parameters/visitor.py | visit_input_values(): tree traversal for representation transforms |

Advantages of This Approach

  1. Real-world driven — test against actual workflows, not synthetic data
  2. Failure-prioritized — fix what’s broken first, not what we think matters
  3. Incremental value — each phase produces working functionality
  4. Natural prioritization — the most common parameter types appear in the most workflows, so they get fixed first
  5. D5 is the harness itself — the testing infrastructure IS the deliverable

Risks

  1. gxformat2 export limitations: from_galaxy_native() currently produces tool_state, not state. The round-trip pipeline bypasses this by doing its own conversion: parse native → workflow_step_linked → workflow_step (= format2 state). The gxformat2 export path is only used for non-state parts of the workflow (connections, metadata, structure).
  2. Tool availability: Framework workflows use stock tools (available). IWC workflows use toolshed tools (need API integration). Phase 3 may be blocked on Tool Shed integration.
  3. Subworkflows: 2 native + 1 framework test workflows use subworkflows. Resolved in Phase 2.6 (42ae7a48aa).
  4. Execution tests require a Galaxy instance: Phase 3 execution equivalence needs a running Galaxy with test tools. Heavier infrastructure than structural comparison.

Resolved Decisions

Progress

Phase 1+2 Complete (2026-03-07)

All stock-tool test workflows pass both per-step conversion and full native→format2→native round-trip comparison.

Per-step conversion sweep: 41/41 (100%), 1 excluded (intentional missing tool)

Full round-trip sweep: 14/14 (100%) native workflows

Commits (branch wf_tool_state)

  1. be3344f8c2 — Round-trip harness + extend native validation/conversion to all param types (35/42)
  2. 244bd48f12 — Version-tolerant tool lookup, full round-trip comparison tests (37/42, 14/15 full)
  3. 422d1524a5 — Fix ToolExpressionOutput.to_model() for boolean type (40/42)
  4. baf139333e — Fix native validation merge for gxformat2 output reference strings (41/42)
  5. 2c1484cabd — Exclude list for intentional missing tool, assert 100% in sweeps

What was built

Key bugs fixed

Known limitations

Plan deviations

Phase 2.6 Complete (2026-03-08)

Subworkflow steps now fully supported in round-trip pipeline.

Per-step conversion sweep: 43/43 (100%), was 41, +2 subworkflow workflows

Full round-trip sweep: 16/16 (100%), was 14, +2 subworkflow workflows

Commit (branch wf_tool_state)

  1. 42ae7a48aa — Subworkflow support for round-trip conversion and validation

What was built

Key insight

Subworkflow steps have no tool_state — they’re structural. No changes needed in convert.py or _walker.py. The work was entirely in the test harness (recursion into nested workflows) and the validation entry point.

Remaining: Phase 3

Phase 3 (IWC scale + execution validation) is not started. Key items:

Open Questions