PLAN_NEXT

Plan: Phases 3+ (Building on Completed Phases 1-2.6)

Starting Point

Phases 1, 2, 2.5, and 2.6 of Plan B are complete. What exists:

All stock-tool workflows pass. The remaining gaps: ToolShed tools need cache infrastructure before IWC validation, there is no execution equivalence proof, and there is no Galaxy export endpoint.


Phase 2.7: Tool Info Cache Infrastructure — COMPLETE

Goal: Build local cache + CLI tooling for ToolShed tool metadata, enabling Phase 3+ without requiring a fully-working ToolShed API.

Full plan: See CLIENT_TOOL_CACHE_PLAN.md

Commits (branch wf_tool_state)

What was built

Remaining (deferred to Phase 3)


Phase 2.8: Stale State Detection & Fixing — COMPLETE

Goal: Detect and strip stale (undeclared) keys from persisted tool_state in workflows, and provide CLI tooling for auditing/cleaning.

Commits (branch wf_tool_state)

What was built


Phase 3: ToolShed Tool Validation at Scale

Goal: Prove round-trip works on real-world workflows with ToolShed tools.

Depends on: Phase 2.7 (cache populated for target tools)

3.1: IWC workflow selection — COMPLETE

Ran against all 111 IWC workflows (exceeding the original 10-20 target). Full results in IWC_REVIEW_SUMMARY.md.

3.2: Cache population for IWC corpus — COMPLETE

509/509 unique tools cached. 499 via ToolShed API, 10 via add-local fallback (stock Galaxy tools not served by ToolShed API, plus expression tools tracked in #22007). Zero skips.

3.3: IWC validation sweep — PARTIALLY COMPLETE

CLI sweep done:

Workflows: 111 | Steps: 2074 OK, 64 FAIL, 0 SKIP, 0 ERROR

Still TODO: Formalize into test_roundtrip.py IWC test class with GET_TOOL_INFO_WITH_TOOLSHED for CI regression checking.

3.4: Format2 subworkflow recursion in validation — COMPLETE

Both validate_workflow_format2 and _validate_format2 now detect steps with a run dict and recurse into inline subworkflows, matching native validation’s subworkflow handling. The CLI path produces dotted step prefixes (e.g. "0.0.0") for nested tools. Nine tests cover valid and invalid state, deep nesting (3 levels), string run refs, a missing run key, and empty subworkflows.

Commit: 2635b0a3fc on wf_tool_state
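
The recursion and dotted-prefix scheme described above can be illustrated roughly as follows. This is a minimal sketch, not the code from the commit: iter_tool_steps is a hypothetical helper, and it assumes that steps may be given as a list or a dict and that the prefix is built from the step index or key.

```python
from typing import Any, Iterator


def iter_tool_steps(workflow: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (dotted_label, step) for each tool step, descending into inline subworkflows."""
    steps = workflow.get("steps", [])
    # Normalize list-style and dict-style steps to (key, step) pairs.
    items = steps.items() if isinstance(steps, dict) else enumerate(steps)
    for key, step in items:
        label = f"{prefix}{key}"
        run = step.get("run")
        if isinstance(run, dict):
            # Inline subworkflow: recurse and extend the dotted prefix.
            yield from iter_tool_steps(run, prefix=f"{label}.")
        elif "tool_id" in step:
            yield label, step
        # String run refs and steps without a tool_id are skipped in this sketch.


nested = {"steps": [{"run": {"steps": [{"run": {"steps": [{"tool_id": "cat1"}]}}]}}, {"tool_id": "cat1"}]}
labels = [label for label, _ in iter_tool_steps(nested)]
```

Here labels evaluates to ["0.0.0", "1"]: the tool nested three levels deep gets a three-part prefix, and the top-level tool keeps its plain index.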

3.5: Fix IWC-specific failures — IN PROGRESS

Done:

Remaining — actual failure classes from IWC sweep:

The original plan predicted “unknown parameter types” as the key risk. The IWC sweep revealed stale state as the dominant issue instead. Failure categories:

| Category | Steps | Workflows | Root cause |
| --- | --- | --- | --- |
| saveLog in multiqc | 16 | 14 | Stale key from tool upgrade |
| __workflow_invocation_uuid__ | 20 | 1 | Runtime leak via encode fallback |
| __identifier__ | ~10 | 4 | Runtime leak via extraction cleanup gap |
| trim_front2/trim_tail2 in fastp | 3 | 3 | Stale keys from tool upgrade |
| images in imagemagick | 4 | 2 | Stale key from tool upgrade |
| Tool-specific stale keys | ~11 | 5 | Various tool upgrades |

3.6: Stale bookkeeping key leak fixes

Root cause analysis in STALE_STATE_BOOKKEEPING.md. Two runtime-only values leak into persisted .ga files through different paths:

__workflow_invocation_uuid__ — injected at tools/execute.py:223, leaks when DefaultToolState.encode() raises ValueError and the fallback in modules.py:374 returns raw self.state.inputs unfiltered, bypassing params_to_strings.

Fix needed: lib/galaxy/workflow/modules.py:374 — filter or strip the fallback return value instead of returning raw self.state.inputs.
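
One possible shape for that filter is sketched below. It is an illustration only: strip_runtime_keys and RUNTIME_ONLY_KEYS are hypothetical names, not existing Galaxy helpers, and the sketch assumes the fallback value is a plain nested structure of dicts and lists.

```python
from typing import Any

# Assumption: only the key named in this section is treated as runtime-only here.
RUNTIME_ONLY_KEYS = frozenset({"__workflow_invocation_uuid__"})


def strip_runtime_keys(value: Any) -> Any:
    """Return a copy of value with runtime-only keys removed at every nesting level."""
    if isinstance(value, dict):
        return {k: strip_runtime_keys(v) for k, v in value.items() if k not in RUNTIME_ONLY_KEYS}
    if isinstance(value, list):
        return [strip_runtime_keys(v) for v in value]
    return value


raw_inputs = {
    "input1": "dataset",
    "__workflow_invocation_uuid__": "abc123",
    "cond": {"mode": "fast", "__workflow_invocation_uuid__": "abc123"},
}
filtered = strip_runtime_keys(raw_inputs)
```

The fallback would then return the filtered copy rather than raw self.state.inputs, so the original state object is left untouched.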

__identifier__ — injected at tools/actions/__init__.py:501 with pipe-delimited keys (input_name|__identifier__). Leaks because __cleanup_param_values() in extract.py:430-487 only handles underscore-suffix keys, not pipe-delimited patterns.

Fix needed: lib/galaxy/workflow/extract.py:430-487 — add __identifier__ cleanup matching the pipe-delimited pattern.
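
The pipe-delimited match could look something like the sketch below. It assumes a flat dict of parameter values whose keys use "|" as the path separator, as described above; drop_identifier_keys is a hypothetical helper, not part of extract.py.

```python
IDENTIFIER_KEY = "__identifier__"


def drop_identifier_keys(values: dict) -> dict:
    """Drop keys whose last pipe-delimited segment is __identifier__."""
    return {
        key: value
        for key, value in values.items()
        # "input1|__identifier__" and a bare "__identifier__" both end in the identifier segment.
        if key.split("|")[-1] != IDENTIFIER_KEY
    }


extracted = {
    "input1": "dataset",
    "input1|__identifier__": "sample_A",
    "cond|input2|__identifier__": "sample_B",
}
cleaned = drop_identifier_keys(extracted)
```

Matching on the last segment, rather than on a substring, avoids removing a legitimately declared parameter whose name merely contains the text __identifier__.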

Note: the Phase 2.8 params_to_strings fix already prevents these keys from surviving save/load cycles. The two fixes above go further and prevent the keys from entering persisted state in the first place.


Phase 4: Execution Equivalence

Goal: Prove that round-tripped workflows produce identical execution results, and that __current_case__ omission is safe.

4.1: __current_case__ stripping test

This is the key proof that __current_case__ is unnecessary:

  1. Take the 24 framework format2 workflows (already have execution specs in {name}.gxwf-tests.yml)
  2. Convert format2 → native (produces __current_case__)
  3. Strip all __current_case__ values from native
  4. Run through Galaxy’s workflow test runner
  5. All tests must pass

Implementation: Add a pytest fixture or test class in test/integration/workflows/ that:

If any fail, that’s a bug to fix in Galaxy’s workflow execution engine (it should derive case from selector value, not rely on persisted index).
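
The stripping in step 3 could be sketched as below. This is a minimal illustration with hypothetical function names. It assumes each step's tool_state is either a JSON string or an already decoded dict, and that nested values are plain containers; if nested values are themselves JSON-encoded strings, they would need an additional decoding pass.

```python
import json
from typing import Any

CURRENT_CASE = "__current_case__"


def strip_current_case(value: Any) -> Any:
    """Recursively remove __current_case__ from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_current_case(v) for k, v in value.items() if k != CURRENT_CASE}
    if isinstance(value, list):
        return [strip_current_case(v) for v in value]
    return value


def strip_current_case_from_native(workflow: dict) -> dict:
    """Return a copy of a native workflow dict with __current_case__ stripped from each step."""
    result = dict(workflow)
    new_steps = {}
    for step_id, step in workflow.get("steps", {}).items():
        step = dict(step)
        tool_state = step.get("tool_state")
        if isinstance(tool_state, str):
            # tool_state stored as a JSON string: decode, strip, re-encode.
            step["tool_state"] = json.dumps(strip_current_case(json.loads(tool_state)))
        elif isinstance(tool_state, dict):
            step["tool_state"] = strip_current_case(tool_state)
        if isinstance(step.get("subworkflow"), dict):
            # Embedded subworkflows get the same treatment.
            step["subworkflow"] = strip_current_case_from_native(step["subworkflow"])
        new_steps[step_id] = step
    result["steps"] = new_steps
    return result
```

Because the function returns a copy, the same converted workflow can be run both with and without __current_case__ in a single test.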

4.2: Round-tripped workflow execution

For the 16 native workflows that pass full round-trip:

  1. Run original native workflow through Galaxy test infrastructure
  2. Run round-tripped native’ workflow through same infrastructure
  3. Compare: same jobs created, same outputs produced

New file: test/integration/workflows/test_roundtrip_execution.py

This is heavy (needs running Galaxy instance + test tools). Run selectively, not in every CI build.
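
The comparison in step 3 might be expressed as a multiset comparison, since job ordering can differ between runs. The sketch below is an assumption about how results could be summarized: JobSummary is a hypothetical record, not a Galaxy type, and collecting tool IDs and output checksums from a running instance is left out.

```python
from collections import Counter
from typing import NamedTuple


class JobSummary(NamedTuple):
    """Hypothetical hashable summary of one job."""
    tool_id: str
    outputs: tuple  # sorted (output_name, checksum) pairs


def compare_executions(original, roundtripped):
    """Compare two runs as multisets of jobs; both lists empty means equivalent."""
    before, after = Counter(original), Counter(roundtripped)
    return {
        "missing": sorted((before - after).elements()),  # in original only
        "extra": sorted((after - before).elements()),    # in round-trip only
    }


job_a = JobSummary("cat1", (("out_file1", "abc"),))
job_b = JobSummary("head", (("out_file1", "def"),))
same = compare_executions([job_a, job_b], [job_b, job_a])
```

In this example same has empty "missing" and "extra" lists, because the two runs contain the same jobs in a different order.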


Phase 5: Format2 Export from Galaxy

Goal: D6 — Galaxy can export workflows as clean Format2 with state (not tool_state).

5.1: Export function

New or extend: lib/galaxy/workflow/ export path

def export_workflow_to_format2(workflow_dict: dict, get_tool_info: GetToolInfo) -> dict:
    """Export native workflow as format2 with clean `state` blocks.

    For each tool step:
    1. convert_state_to_format2() → Format2State(state, in_)
    2. Replace tool_state with state, merge in_ into step

    Falls back to tool_state for steps that fail conversion.
    """

Currently gxformat2.export.from_galaxy_native() produces tool_state (JSON strings) because it has no tool definitions. The schema-aware path uses convert_state_to_format2() per step to produce clean state.
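
The per-step fallback behaviour from the docstring can be sketched as follows. The sketch does not call convert_state_to_format2() directly, because its signature is not shown here. It takes a hypothetical convert_step callable that returns a (state, in_) pair or raises on failure, and it assumes in_ is merged under the step's "in" key.

```python
from typing import Any, Callable

# Hypothetical callable: native step dict -> (state, in_), raising on conversion failure.
ConvertStep = Callable[[dict[str, Any]], tuple[dict[str, Any], dict[str, Any]]]


def export_steps_with_fallback(steps: dict[str, dict[str, Any]], convert_step: ConvertStep) -> dict[str, dict[str, Any]]:
    """Replace tool_state with state per step, keeping tool_state when conversion fails."""
    exported = {}
    for key, step in steps.items():
        out = {k: v for k, v in step.items() if k != "tool_state"}
        try:
            state, in_ = convert_step(step)
        except Exception:
            # Conversion failed: keep the original tool_state so the step still exports.
            if "tool_state" in step:
                out["tool_state"] = step["tool_state"]
        else:
            out["state"] = state
            if in_:
                out["in"] = in_
        exported[key] = out
    return exported


def fake_convert(step):
    # Stand-in converter used only for this example.
    if step["tool_id"] == "bad":
        raise ValueError("cannot convert")
    return {"n": 1}, {"input1": {"source": "x/out"}}


result = export_steps_with_fallback(
    {"0": {"tool_id": "ok", "tool_state": '{"n": "1"}'}, "1": {"tool_id": "bad", "tool_state": "{}"}},
    fake_convert,
)
```

Step "0" comes back with state and in, while step "1" keeps its raw tool_state, so a single failing step does not block the export of the rest.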

5.2: API endpoint

Add or modify Galaxy API endpoint to return format2 YAML:

5.3: Round-trip validation gate

Only offer format2 export for workflows that pass round-trip validation:


Phase 6: External Tooling Support

Goal: Enable validation of format2 workflows without a Galaxy instance, using only Tool Shed API + local cache.

6.1: JSON Schema generation from workflow_step models

def format2_state_json_schema(parsed_tool: ParsedTool) -> dict:
    model = WorkflowStepToolState.parameter_model_for(parsed_tool.inputs)
    return model.model_json_schema(mode="validation")

This is nearly free — pydantic_template("workflow_step") already exists for all parameter types.

6.2: Tool Shed API endpoint for workflow state schema

Serve the JSON Schema via Tool Shed 2.0 API:

6.3: gxformat2 lint integration

Extend gxformat2/lint.py with optional schema-aware validation:


Phase Summary

| Phase | Delivers | Depends On | Status | Key Risk |
| --- | --- | --- | --- | --- |
| 2.7: Cache Infrastructure | galaxy-tool-cache CLI, cache index, multi-source | | COMPLETE | |
| 2.8: Stale State Detection | params_to_strings fix, validate/clean CLIs | | COMPLETE | |
| 3: ToolShed Validation | IWC round-trip at scale | Phase 2.7 | 3.1, 3.2, 3.4 COMPLETE; 3.3 PARTIALLY COMPLETE; 3.5-3.6 IN PROGRESS | Stale state leakage (not unknown param types) |
| 4: Execution Equivalence | __current_case__ proof, execution comparison | None (stock tools only) | Not started | Galaxy engine bugs |
| 5: Format2 Export | Galaxy export API with state blocks | Phases 3-4 | Not started | Fallback UX |
| 6: External Tooling | IDE/agent validation support | Phase 2.7, 3 | Not started | gxformat2 dependency boundary |

Parallelism: Phases 2.7 and 2.8 are complete. Phase 3.5-3.6 and Phase 4 can proceed in parallel — Phase 4 uses only stock tools, Phase 3 uses the cache infra from 2.7. Phase 5 depends on confidence from 3+4. Phase 6 depends on 3 but not on 4+5.


Unresolved Questions

Resolved Questions