Plan: Phases 3+ (Building on Completed Phases 1-2.6)
Starting Point
Phases 1, 2, 2.5, and 2.6 of Plan B are complete. What exists:
- Conversion library (convert.py, _walker.py): all parameter types, conditionals, repeats, sections, subworkflows
- Validation (validation_native.py, validation_format2.py): both native + format2 validated against workflow_step / workflow_step_linked Pydantic models
- Round-trip harness (test_roundtrip.py): 43/43 per-step (100%), 16/16 full round-trip (100%), stock tools only
- GetToolInfo Protocol: generic interface, GalaxyGetToolInfo (stock), ToolShedGetToolInfo (API+cache), CombinedGetToolInfo (stock+toolshed)
- Shared walker (_walker.py): unified tree traversal for convert + validate
All stock-tool workflows pass. The gap: ToolShed tools need cache infrastructure before IWC validation, no execution equivalence proof, no Galaxy export endpoint.
Phase 2.7: Tool Info Cache Infrastructure — COMPLETE
Goal: Build local cache + CLI tooling for ToolShed tool metadata, enabling Phase 3+ without requiring a fully-working ToolShed API.
Full plan: See CLIENT_TOOL_CACHE_PLAN.md
Commits (branch wf_tool_state)
- cfe3898e87 - ToolShedGetToolInfo with API fetch + filesystem cache, CombinedGetToolInfo, gx_validator.py wiring
- aab87eeca8 - galaxy-tool-cache CLI, cache index, workflow extractor, Galaxy API stub, 33 unit tests
What was built
- toolshed_tool_info.py: ToolShedGetToolInfo with API fetch, memory + filesystem cache, CacheIndex (index.json with provenance), populate_from_parsed_tool(), has_cached(), list_cached(), clear_cache(), GALAXY_TOOL_CACHE_DIR env var, backfill for legacy cache files
- workflow_tools.py: Extract toolshed tool refs from native (.ga) and format2 (.gxwf.yml) workflows with subworkflow recursion and deduplication
- tool_cache.py: galaxy-tool-cache CLI with subcommands populate-workflow, add, add-local, list, info, clear. Sources: ToolShed API (working), Galaxy instance API (stubbed, raises NotImplementedError), local XML
- packages/tool_util/setup.cfg: galaxy-tool-cache entry point registered
- test/unit/workflows/iwc/: 2 IWC sample workflows (RepeatMasking, average-bigwig)
- 33 unit tests: CacheIndex CRUD/persistence/corruption recovery, parse_toolshed_tool_id, get_cache_dir, ToolShedGetToolInfo cache ops + backfill, workflow extraction (native/format2/subworkflow/dedup/IWC), CLI parser, Galaxy API stub
Remaining (deferred to Phase 3)
- Wire IWC tests into test_roundtrip.py with GET_TOOL_INFO_WITH_TOOLSHED
- Galaxy instance API source implementation
Phase 2.8: Stale State Detection & Fixing — COMPLETE
Goal: Detect and strip stale (undeclared) keys from persisted tool_state in workflows, and provide CLI tooling for auditing/cleaning.
Commits (branch wf_tool_state)
- e2eb2ee70e - RED tests: stale parameter keys persist in workflow tool_state across upgrades
- e32e9da58f - Fix params_to_strings/params_from_strings to strip stale tool_state keys
- e51990291c - Add galaxy-workflow-validate and galaxy-workflow-clean-stale-state CLIs
- 802fb755e2 - Fix clean-stale-state: preserve key order and values, add --diff flag
- f6c5172f37 - Deduplicate as_dict/as_list helpers and DEFAULT_TOOLSHED_URL constant
- bcc81a367e - Add --recursive tree ops to workflow validate and clean-stale-state CLIs
- c79957b0a5 - Overhaul workflow_state CLIs for cohesion and consistency
- b52faccc21 - Polish CLI overhaul: deduplicate --tool-source arg, improve naming and help text
- 8aed179978 - Refactor workflow_state CLIs: extract domain logic into typed modules
What was built
- Core fix: params_to_strings/params_from_strings in tools/parameters/__init__.py now filter to declared inputs only, preventing stale keys from surviving save/load cycles
- galaxy-workflow-validate: CLI that validates every step’s tool_state against the tool’s declared inputs and reports undeclared/stale keys per step
- galaxy-workflow-clean-stale-state: CLI that strips stale keys from .ga files, with --diff mode for review
- --recursive: Both CLIs support recursive directory traversal for bulk operations
- --populate-cache: Auto-cache ToolShed tool metadata during validation
- Domain modules: CLI logic extracted into typed modules for reuse
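The core fix can be illustrated with a minimal sketch. It only handles top-level keys, whereas the real params_to_strings/params_from_strings walk nested conditionals, repeats, and sections. The helper name `strip_undeclared_keys` and its signature are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple


def strip_undeclared_keys(
    tool_state: Dict[str, Any], declared_inputs: Iterable[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: keep only keys the tool currently declares.

    Returns the filtered state plus the stale keys that were dropped,
    so a CLI could report them per step.
    """
    declared = set(declared_inputs)
    # Dict comprehensions preserve insertion order, so surviving keys
    # keep their original order and values.
    kept = {key: value for key, value in tool_state.items() if key in declared}
    stale = [key for key in tool_state if key not in declared]
    return kept, stale


if __name__ == "__main__":
    state = {"input": 1, "saveLog": "false", "title": "report"}
    kept, stale = strip_undeclared_keys(state, ["input", "title"])
    print(kept, stale)
```

Returning the dropped keys alongside the cleaned state is one way a validate-style report and a clean-style rewrite could share the same pass.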
Phase 3: ToolShed Tool Validation at Scale
Goal: Prove round-trip works on real-world workflows with ToolShed tools.
Depends on: Phase 2.7 (cache populated for target tools)
3.1: IWC workflow selection — COMPLETE
Ran against all 111 IWC workflows (exceeding the original 10-20 target). Full results in IWC_REVIEW_SUMMARY.md.
3.2: Cache population for IWC corpus — COMPLETE
509/509 unique tools cached. 499 via ToolShed API, 10 via add-local fallback (stock Galaxy tools not served by ToolShed API, plus expression tools tracked in #22007). Zero skips.
3.3: IWC validation sweep — PARTIALLY COMPLETE
CLI sweep done:
Workflows: 111 | Steps: 2074 OK, 64 FAIL, 0 SKIP, 0 ERROR
- 86/111 workflows fully clean
- 25 workflows have 124 stale keys across ~50 steps
Still TODO: Formalize into test_roundtrip.py IWC test class with GET_TOOL_INFO_WITH_TOOLSHED for CI regression checking.
3.4: Format2 subworkflow recursion in validation — COMPLETE
Both validate_workflow_format2 and _validate_format2 now detect steps with a run dict and recurse into inline subworkflows, matching native validation’s subworkflow handling. CLI path produces dotted step prefixes (e.g. "0.0.0") for nested tools. 9 tests covering valid/invalid state, deep nesting (3 levels), string run refs, missing run key, empty subworkflows.
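A rough sketch of how the recursion and dotted prefixes could work is shown below. It is not the code from validate_workflow_format2; the function name `iter_tool_steps` is hypothetical, and it assumes steps are given either as a list or as a dict keyed by label.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_tool_steps(
    workflow: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Hypothetical walker: yield (dotted_prefix, step) for every tool step."""
    steps = workflow.get("steps") or []
    if isinstance(steps, dict):
        # Assumption: dict-style steps are visited in insertion order.
        steps = list(steps.values())
    for index, step in enumerate(steps):
        label = f"{prefix}{index}"
        run = step.get("run")
        if isinstance(run, dict):
            # Inline subworkflow: recurse and extend the dotted prefix.
            yield from iter_tool_steps(run, prefix=f"{label}.")
        elif "tool_id" in step:
            yield label, step
        # String run refs and non-tool steps are skipped in this sketch.


if __name__ == "__main__":
    nested = {"steps": [{"run": {"steps": [{"run": {"steps": [{"tool_id": "cat1"}]}}]}}]}
    print([label for label, _ in iter_tool_steps(nested)])
```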
Commit: 2635b0a3fc on wf_tool_state
3.5: Fix IWC-specific failures — IN PROGRESS
Done:
- hidden_data param modeled as optional data in tool meta-models (7662e14a37)
- params_to_strings/params_from_strings stale key stripping (Phase 2.8)
Remaining — actual failure classes from IWC sweep:
The original plan predicted “unknown parameter types” as the key risk. The IWC sweep revealed stale state as the dominant issue instead. Failure categories:
| Category | Steps | Workflows | Root cause |
|---|---|---|---|
| saveLog in multiqc | 16 | 14 | Stale key from tool upgrade |
| __workflow_invocation_uuid__ | 20 | 1 | Runtime leak via encode fallback |
| __identifier__ | ~10 | 4 | Runtime leak via extraction cleanup gap |
| trim_front2/trim_tail2 in fastp | 3 | 3 | Stale keys from tool upgrade |
| images in imagemagick | 4 | 2 | Stale key from tool upgrade |
| Tool-specific stale keys | ~11 | 5 | Various tool upgrades |
3.6: Stale bookkeeping key leak fixes
Root cause analysis in STALE_STATE_BOOKKEEPING.md. Two runtime-only values leak into persisted .ga files through different paths:
__workflow_invocation_uuid__ — injected at tools/execute.py:223, leaks when DefaultToolState.encode() raises ValueError and the fallback in modules.py:374 returns raw self.state.inputs unfiltered, bypassing params_to_strings.
Fix needed: lib/galaxy/workflow/modules.py:374 — filter or strip the fallback return value instead of returning raw self.state.inputs.
__identifier__ — injected at tools/actions/__init__.py:501 with pipe-delimited keys (input_name|__identifier__). Leaks because __cleanup_param_values() in extract.py:430-487 only handles underscore-suffix keys, not pipe-delimited patterns.
Fix needed: lib/galaxy/workflow/extract.py:430-487 — add __identifier__ cleanup matching the pipe-delimited pattern.
Note: the Phase 2.8 params_to_strings fix prevents these keys from surviving through save/load cycles, but these fixes prevent them from entering persisted state in the first place.
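Both leak fixes come down to recognizing runtime-only keys, including the pipe-delimited form. The sketch below shows one possible matcher; the names `RUNTIME_ONLY_KEYS`, `is_runtime_only_key`, and `strip_runtime_keys` are hypothetical and do not exist in modules.py or extract.py.

```python
from typing import Any, Dict

# Runtime-only bookkeeping keys named in the root cause analysis above.
RUNTIME_ONLY_KEYS = ("__workflow_invocation_uuid__", "__identifier__")


def is_runtime_only_key(key: str) -> bool:
    """True for a bare runtime key or its pipe-delimited form.

    Matches "__identifier__" as well as "input_name|__identifier__".
    """
    return key.rsplit("|", 1)[-1] in RUNTIME_ONLY_KEYS


def strip_runtime_keys(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical filter that could wrap a raw-inputs fallback return value."""
    return {key: value for key, value in inputs.items() if not is_runtime_only_key(key)}


if __name__ == "__main__":
    raw = {
        "input1": "dataset",
        "input1|__identifier__": "sample_A",
        "__workflow_invocation_uuid__": "abc123",
    }
    print(strip_runtime_keys(raw))
```

Matching on the last pipe-delimited segment keeps legitimate flattened keys such as cond|param intact while catching the suffix pattern that underscore-only cleanup misses.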
Phase 4: Execution Equivalence
Goal: Prove that round-tripped workflows produce identical execution results, and that __current_case__ omission is safe.
4.1: __current_case__ stripping test
This is the key proof that __current_case__ is unnecessary:
- Take the 24 framework format2 workflows (already have execution specs in {name}.gxwf-tests.yml)
- Convert format2 → native (produces __current_case__)
- Strip all __current_case__ values from native
- Run through Galaxy’s workflow test runner
- All tests must pass
Implementation: Add a pytest fixture or test class in test/integration/workflows/ that:
- Loads each framework workflow
- Converts to native via python_to_workflow()
- Strips __current_case__ from every conditional in every step’s tool_state
- Submits to Galaxy for execution
- Asserts execution matches the .gxwf-tests.yml spec
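The stripping step in the list above could look like the following sketch. It assumes the step's tool_state has already been decoded from its JSON string into plain dicts and lists; the helper name `strip_current_case` is hypothetical.

```python
from typing import Any


def strip_current_case(value: Any) -> Any:
    """Return a copy of a decoded tool_state without any __current_case__ keys.

    Recurses through dicts (conditionals, sections) and lists (repeats).
    """
    if isinstance(value, dict):
        return {
            key: strip_current_case(item)
            for key, item in value.items()
            if key != "__current_case__"
        }
    if isinstance(value, list):
        return [strip_current_case(item) for item in value]
    return value


if __name__ == "__main__":
    state = {"cond": {"selector": "a", "__current_case__": 0}}
    print(strip_current_case(state))
```

The function builds new containers instead of mutating its input, so the original native workflow stays available for comparison.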
If any fail, that’s a bug to fix in Galaxy’s workflow execution engine (it should derive case from selector value, not rely on persisted index).
4.2: Round-tripped workflow execution
For the 16 native workflows that pass full round-trip:
- Run original native workflow through Galaxy test infrastructure
- Run round-tripped native’ workflow through same infrastructure
- Compare: same jobs created, same outputs produced
New file: test/integration/workflows/test_roundtrip_execution.py
This is heavy (needs running Galaxy instance + test tools). Run selectively, not in every CI build.
Phase 5: Format2 Export from Galaxy
Goal: D6 — Galaxy can export workflows as clean Format2 with state (not tool_state).
5.1: Export function
New or extend: lib/galaxy/workflow/ export path
```python
def export_workflow_to_format2(workflow_dict: dict, get_tool_info: "GetToolInfo") -> dict:
    """Export native workflow as format2 with clean `state` blocks.

    For each tool step:
    1. convert_state_to_format2() → Format2State(state, in_)
    2. Replace tool_state with state, merge in_ into step

    Falls back to tool_state for steps that fail conversion.
    """
    ...  # planned signature only; body to be written in Phase 5
```
Currently gxformat2.export.from_galaxy_native() produces tool_state (JSON strings) because it has no tool definitions. The schema-aware path uses convert_state_to_format2() per step to produce clean state.
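The per-step fallback could be sketched as below. The converter is injected as a callable so the example stays self-contained; `export_steps` and `StepConverter` are hypothetical names, the (state, in_) tuple stands in for Format2State, and merging in_ under the step's "in" key is an assumption about how the merge would work.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for convert_state_to_format2(): returns (state, in_).
StepConverter = Callable[[Dict[str, Any]], Tuple[Dict[str, Any], Dict[str, Any]]]


def export_steps(
    steps: List[Dict[str, Any]], convert_step: StepConverter
) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Replace tool_state with state where conversion succeeds.

    Steps whose conversion raises keep their original tool_state; their
    indices are returned so the caller can warn about them.
    """
    exported: List[Dict[str, Any]] = []
    failed: List[int] = []
    for index, step in enumerate(steps):
        new_step = dict(step)  # shallow copy; the input list is not mutated
        try:
            state, in_ = convert_step(step)
        except Exception:
            failed.append(index)  # fall back to the untouched tool_state
        else:
            new_step.pop("tool_state", None)
            new_step["state"] = state
            if in_:
                new_step["in"] = {**new_step.get("in", {}), **in_}
        exported.append(new_step)
    return exported, failed
```

An empty failed list is also the condition the round-trip gate in 5.3 would check before offering format2 export.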
5.2: API endpoint
Add or modify Galaxy API endpoint to return format2 YAML:
- GET /api/workflows/{id}/download?format=format2 (or similar)
- Uses CombinedGetToolInfo (stock tools + ToolShed)
- Returns format2 with state blocks where conversion succeeds
- Falls back to tool_state for steps that fail (with warning annotation)
5.3: Round-trip validation gate
Only offer format2 export for workflows that pass round-trip validation:
- Run convert_state_to_format2() for all steps
- If any step fails, warn user and offer native-only export
- If all pass, export with confidence
Phase 6: External Tooling Support
Goal: Enable validation of format2 workflows without a Galaxy instance, using only Tool Shed API + local cache.
6.1: JSON Schema generation from workflow_step models
```python
def format2_state_json_schema(parsed_tool: ParsedTool) -> dict:
    model = WorkflowStepToolState.parameter_model_for(parsed_tool.inputs)
    return model.model_json_schema(mode="validation")
```
This is nearly free — pydantic_template("workflow_step") already exists for all parameter types.
6.2: Tool Shed API endpoint for workflow state schema
Serve the JSON Schema via Tool Shed 2.0 API:
- GET /api/tools/{trs_tool_id}/version/{version}/workflow_state_schema
- Returns JSON Schema that validates format2 state blocks
- External tools (IDEs, linters, AI agents) validate without Galaxy
6.3: gxformat2 lint integration
Extend gxformat2/lint.py with optional schema-aware validation:
- When tool definitions are available (via ToolShed API, cache, or local), validate each step’s state against WorkflowStepToolState
- When not available, fall back to current structural-only lint
- gxformat2 stays dependency-free: tool definitions are passed in as ParsedTool dicts or JSON Schemas
- galaxy-tool-cache provides the bridge: populate cache → lint uses cached ParsedTool
Phase Summary
| Phase | Delivers | Depends On | Status | Key Risk |
|---|---|---|---|---|
| 2.7: Cache Infrastructure | galaxy-tool-cache CLI, cache index, multi-source | — | COMPLETE | — |
| 2.8: Stale State Detection | params_to_strings fix, validate/clean CLIs | — | COMPLETE | — |
| 3: ToolShed Validation | IWC round-trip at scale | Phase 2.7 | 3.1-3.4 COMPLETE, 3.5-3.6 IN PROGRESS | Stale state leakage (not unknown param types) |
| 4: Execution Equivalence | __current_case__ proof, execution comparison | — (stock tools only) | Not started | Galaxy engine bugs |
| 5: Format2 Export | Galaxy export API with state blocks | Phases 3-4 | Not started | Fallback UX |
| 6: External Tooling | IDE/agent validation support | Phase 2.7, 3 | Not started | gxformat2 dependency boundary |
Parallelism: Phases 2.7 and 2.8 are complete. Phase 3.5-3.6 and Phase 4 can proceed in parallel — Phase 4 uses only stock tools, Phase 3 uses the cache infra from 2.7. Phase 5 depends on confidence from 3+4. Phase 6 depends on 3 but not on 4+5.
Unresolved Questions
- Should the modules.py/extract.py leak fixes (3.6) be part of this branch or separate PRs?
- Is formalizing the IWC sweep into test_roundtrip.py still valuable given the CLI sweep already proved coverage?
- Of the 64 failing steps, how many are stale-key-only (fixable by cleaning) vs genuine conversion/model gaps?
Resolved Questions
- ToolShed tool ID parsing: toolshed.g2.bx.psu.edu/repos/owner/repo/tool_name/version → split on /repos/, join with ~ for TRS ID. Implemented in parse_toolshed_tool_id().
- IWC corpus access: Vendor a small subset (2-3 .ga files) in test/unit/workflows/iwc/; larger corpus fetched by CI or galaxy-tool-cache populate-workflow.
- Execution equivalence tests: Integration tests requiring Galaxy server, run selectively in CI, not on every build.
- Format2 export API: Extend existing /api/workflows/{id}/download with format=format2 parameter.
- gxformat2 lint: Pass ParsedTool dicts (not as a dependency); galaxy-tool-cache populates the cache, lint reads it.
- Cache infrastructure: galaxy-tool-cache CLI in packages/tool_util, GALAXY_TOOL_CACHE_DIR env var, no TTL. See CLIENT_TOOL_CACHE_PLAN.md.
- IWC corpus size: Full corpus (111 workflows, 509 tools) used instead of 10-20 subset. All tools cacheable.
- Dominant failure class: Stale state keys, not unknown parameter types. Most failures are tool-upgrade residue or runtime value leakage, not missing model coverage.
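The tool ID parsing rule from the first resolved question (split on /repos/, join with ~) can be sketched as follows. This is an illustration, not the body of parse_toolshed_tool_id(); the names `ToolShedRef` and `split_toolshed_tool_id` and the returned shape are hypothetical, and the version string in the example is made up.

```python
from typing import NamedTuple, Optional


class ToolShedRef(NamedTuple):
    host: str
    trs_tool_id: str
    version: str


def split_toolshed_tool_id(tool_id: str) -> Optional[ToolShedRef]:
    """Hypothetical parser: return None for stock tool IDs without /repos/."""
    host, separator, rest = tool_id.partition("/repos/")
    if not separator:
        return None
    # Expect owner/repo/tool_name/version; maxsplit keeps the version whole.
    parts = rest.split("/", 3)
    if len(parts) != 4:
        return None
    owner, repo, tool_name, version = parts
    return ToolShedRef(host, "~".join((owner, repo, tool_name)), version)


if __name__ == "__main__":
    # Illustrative ID only; the version is not taken from the IWC corpus.
    print(split_toolshed_tool_id("toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.0"))
```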