PHASE_6_DETAILED_PLAN

Phase 6: Format2 Export — Detailed Plan

Motivation

gxformat2.from_galaxy_native() converts a native .ga workflow to the format2 structure, but because it has no tool definitions it leaves each step's state as tool_state (JSON strings). The schema-aware conversion in convert_state_to_format2() already exists and produces clean state and in blocks, but there is no way to apply it to a whole workflow outside of test code.

Two audiences:

The plan separates these so that CLI tooling ships first, entirely within galaxy-tool-util.


6.0: Refactor test_roundtrip.py → roundtrip.py — DONE

Commit: d1aed972e0, refined in 96796ce6f2

Extracted all reusable logic from test/unit/workflows/test_roundtrip.py into lib/galaxy/tool_util/workflow_state/roundtrip.py:

test_roundtrip.py is now a thin wrapper containing only loaders, workflow inventories, sweep runners, and pytest classes. Tests call roundtrip_validate and consume RoundTripValidationResult directly; FullRoundTripResult was removed from the library.

Review fixes applied: _values_equivalent bool/string precedence fixed; full_roundtrip_native delegates to roundtrip_validate (single pipeline); unused imports removed.


6.1: galaxy-workflow-export-format2 CLI — DONE

Commit: fc1fbcd85c

Package: galaxy-tool-util (no Galaxy server dependency)
Entry point: galaxy-workflow-export-format2 = galaxy.tool_util.workflow_state.scripts.workflow_export_format2:main
Tool lookup: ToolShedGetToolInfo (tool cache) + stock tool source
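The tool lookup combines two sources. The sketch below shows one way such a chain could be composed, assuming each source exposes get_tool_info(tool_id, tool_version) and raises LookupError on a miss. ChainedGetToolInfo and the LookupError convention are hypothetical, not the classes galaxy-tool-util actually ships.

```python
from typing import Any, List, Optional, Protocol


class GetToolInfoLike(Protocol):
    # Hypothetical protocol mirroring the get_tool_info signature used in this plan.
    def get_tool_info(self, tool_id: str, tool_version: Optional[str]) -> Any: ...


class ChainedGetToolInfo:
    """Hypothetical: ask each source in order and return the first hit."""

    def __init__(self, sources: List[GetToolInfoLike]):
        self.sources = sources

    def get_tool_info(self, tool_id: str, tool_version: Optional[str]) -> Any:
        misses = []
        for source in self.sources:
            try:
                return source.get_tool_info(tool_id, tool_version)
            except LookupError as exc:  # assumed "not found" signal
                misses.append(f"{type(source).__name__}: {exc}")
        # No source knew the tool: report every miss so the caller can see why.
        raise LookupError(f"{tool_id}/{tool_version} not found ({'; '.join(misses)})")
```

Usage would look like ChainedGetToolInfo([cache_lookup, stock_lookup]), where both arguments are placeholders for the ToolShed-cache and stock-tool lookups.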

Files

Interface

galaxy-workflow-export-format2 workflow.ga [--output FILE] [--json]
                                           [--populate-cache] [--strict] [--diff]
Flag               Behavior
--output           Write format2 to a file (default: stdout, with the summary on stderr)
--json             Output JSON instead of YAML (default: YAML)
--populate-cache   Auto-fetch uncached ToolShed tools before converting
--strict           Fail on any step that can't be converted (default: best-effort fallback)
--diff             Show a unified diff against the naive from_galaxy_native() output
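As an illustration of the interface, here is a hypothetical argparse parser that mirrors the flags in the table, plus a --diff renderer built on the standard library's difflib.unified_diff. The real workflow_export_format2 module may be structured differently; build_parser, parse, and render_diff are names invented for this sketch.

```python
import argparse
import difflib
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the documented CLI flags."""
    parser = argparse.ArgumentParser(prog="galaxy-workflow-export-format2")
    parser.add_argument("workflow", help="Path to a native .ga workflow")
    parser.add_argument("--output", metavar="FILE", help="Write format2 here instead of stdout")
    parser.add_argument("--json", action="store_true", help="Emit JSON instead of YAML")
    parser.add_argument("--populate-cache", action="store_true",
                        help="Fetch uncached ToolShed tools before converting")
    parser.add_argument("--strict", action="store_true",
                        help="Fail on any step that cannot be converted")
    parser.add_argument("--diff", action="store_true",
                        help="Diff against naive from_galaxy_native() output")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


def render_diff(naive: str, schema_aware: str) -> str:
    """Unified diff of naive vs schema-aware serializations (what --diff would print)."""
    return "".join(difflib.unified_diff(
        naive.splitlines(keepends=True),
        schema_aware.splitlines(keepends=True),
        fromfile="from_galaxy_native",
        tofile="schema-aware",
    ))
```

argparse maps --populate-cache to the attribute populate_cache, so the script body would read args.populate_cache.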

Design decisions

Tested


6.2: galaxy-workflow-roundtrip-validate CLI — DONE

Commit: 7ebef1fa27

Package: galaxy-tool-util
Entry point: galaxy-workflow-roundtrip-validate = galaxy.tool_util.workflow_state.scripts.workflow_roundtrip_validate:main

Files

Interface

galaxy-workflow-roundtrip-validate workflow.ga [--populate-cache] [--strip-bookkeeping]
                                               [--output-native FILE] [--output-format2 FILE] [-v]
Flag                   Behavior
positional             Path to a .ga file or a directory (auto-detected)
--populate-cache       Auto-fetch uncached ToolShed tools
--strip-bookkeeping    Strip bookkeeping keys before comparison
--output-native FILE   Write the reimported native workflow for inspection
--output-format2 FILE  Write the intermediate format2 workflow for inspection
-v                     Print per-step failure details and diffs

Exit code 0 means every workflow passed; 1 means at least one failure.
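One way to implement the directory auto-detection and exit-code contract, assuming a recursive search for *.ga files and a per-workflow validate_one callable that returns True on success. Both the recursion and validate_one are assumptions; the real script calls roundtrip_validate, whose result type is not reproduced here.

```python
import sys
from pathlib import Path
from typing import Callable, List


def collect_workflows(path: Path) -> List[Path]:
    """Auto-detect: a single .ga file, or every .ga file under a directory (recursive, assumed)."""
    if path.is_dir():
        return sorted(path.rglob("*.ga"))
    return [path]


def run(path: Path, validate_one: Callable[[Path], bool]) -> int:
    """Validate each workflow; return 0 if all pass, 1 if any fail."""
    failures = [wf for wf in collect_workflows(path) if not validate_one(wf)]
    for wf in failures:
        print(f"FAIL {wf}", file=sys.stderr)
    return 1 if failures else 0
```

A main() would then end with sys.exit(run(Path(args.path), validate_one)).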

Tested


6.3: Galaxy API endpoint (format2 export)

Package: galaxy (web server)
Endpoint: GET /api/workflows/{id}/download?format=format2
Status: Not started

Key difference from CLI

The Galaxy server already has instantiated Tool objects in its toolbox, so the ToolShed API cache is unnecessary: the GetToolInfo implementation can wrap the toolbox directly.

Implementation

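A new GetToolInfo implementation in the Galaxy server (ToolNotFoundError is a placeholder exception name, and Tool.to_parsed_tool() does not exist yet; see the research below):

from typing import Optional

from galaxy.tool_util_models import ParsedTool


class ToolboxGetToolInfo:
    """GetToolInfo backed by a live Galaxy toolbox."""

    def __init__(self, app):
        self.app = app

    def get_tool_info(self, tool_id: str, tool_version: Optional[str]) -> ParsedTool:
        # ToolBox.get_tool() yields None when no matching tool is loaded.
        tool = self.app.toolbox.get_tool(tool_id, tool_version=tool_version)
        if tool is None:
            # Placeholder: raise whatever "tool not found" error GetToolInfo callers expect.
            raise ToolNotFoundError(tool_id, tool_version)
        # Planned bridge (step 6.3a); the parse_tool(tool_source) route is the alternative.
        return tool.to_parsed_tool()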

Endpoint change in lib/galaxy/webapps/galaxy/api/workflows.py:

if format == "format2":
    get_tool_info = ToolboxGetToolInfo(trans.app)
    result = export_workflow_to_format2(workflow_dict, get_tool_info)
    # Return format2 YAML or JSON based on Accept header
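The Accept-header branch could be decided by a small helper like the one below. The accepted media types and the YAML default (mirroring the CLI default) are assumptions; the helper honors q-values, treats q=0 as not acceptable, and ignores wildcards, which simply fall through to the default.

```python
from typing import Optional, Tuple

# Assumed media types; the plan does not fix which ones the endpoint accepts.
JSON_TYPES = {"application/json"}
YAML_TYPES = {"application/yaml", "application/x-yaml", "text/yaml"}


def choose_format2_serialization(accept: Optional[str]) -> str:
    """Return "json" or "yaml" for an Accept header, preferring the highest q-value."""
    best: Tuple[float, str] = (0.0, "yaml")  # default when nothing specific matches
    for media_range in (accept or "").split(","):
        parts = [p.strip() for p in media_range.split(";")]
        media_type = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        if media_type in JSON_TYPES:
            kind: Optional[str] = "json"
        elif media_type in YAML_TYPES:
            kind = "yaml"
        else:
            kind = None  # wildcards and unsupported types are ignored
        if kind and q > best[0]:
            best = (q, kind)
    return best[1]
```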

Fallback behavior

The API uses the same best-effort semantics as the CLI: steps that fail conversion keep their tool_state. The response includes metadata listing which steps fell back.
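A minimal sketch of the best-effort loop, assuming steps keyed by id as in a native .ga file and a hypothetical convert_step callable that raises when a step cannot be converted. The returned fallbacks list is one possible shape for the response metadata; its field names are illustrative.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Dict[str, Any]


def convert_steps_best_effort(
    steps: Dict[str, Step],
    convert_step: Callable[[Step], Step],
    strict: bool = False,
) -> Tuple[Dict[str, Step], List[Dict[str, str]]]:
    """Convert each step; on failure keep the original step and record why."""
    converted: Dict[str, Step] = {}
    fallbacks: List[Dict[str, str]] = []
    for step_id, step in steps.items():
        try:
            converted[step_id] = convert_step(step)
        except Exception as exc:
            if strict:
                raise  # strict mode: any unconvertible step aborts the export
            converted[step_id] = step  # fallback: tool_state left untouched
            fallbacks.append({
                "step": step_id,
                "tool_id": str(step.get("tool_id")),
                "reason": str(exc),
            })
    return converted, fallbacks
```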

Tool.to_parsed_tool() — Research Complete

No such method exists. Research findings:

ParsedTool is a Pydantic BaseModel in galaxy.tool_util_models with fields: id, version, name, description, inputs: List[ToolParameterT], outputs, citations, license, profile, edam_operations, edam_topics, xrefs, help.

Existing paths to ParsedTool:

  1. From a ToolSource → parse_tool(tool_source) in galaxy.tool_util.model_factory (used by the ToolShed API and gx_validator.py)
  2. From the ToolShed API → parsed_tool_model_for() in lib/tool_shed/managers/tools.py, which gets the ToolSource and calls parse_tool()

What the live Tool object already has:

The gap is a ~15-line assembly method:

from galaxy.tool_util_models import ParsedTool


def to_parsed_tool(self) -> ParsedTool:
    # Proposed Tool method: assemble a ParsedTool from attributes parsed at tool load time.
    return ParsedTool(
        id=self.id,
        version=self.version,
        name=self.name,
        description=self.description,
        inputs=self.parameters or [],  # already list[ToolParameterT]
        outputs=...,  # convert self.outputs dict to list
        citations=...,
        license=self.license,
        profile=self.profile,
        edam_operations=self.edam_operations,
        edam_topics=self.edam_topics,
        xrefs=self.xrefs,
        help=...,
    )

Key insight: tool.parameters is already a list[ToolParameterT], so the expensive part (parsing XML inputs into Pydantic models) is already done. The remaining fields are trivial attribute copies. The outputs conversion (from dict[str, ToolOutput] to list[ToolOutput]) is the only non-trivial mapping.
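Because a list drops the dict keys, the conversion should confirm that each output carries its own name. A hedged sketch, assuming outputs expose a .name attribute; if the server-side output objects are not the type ParsedTool expects, each item would additionally need a per-output conversion, which is not shown. outputs_dict_to_list is a hypothetical helper name.

```python
from typing import Dict, List, TypeVar

T = TypeVar("T")


def outputs_dict_to_list(outputs: Dict[str, T]) -> List[T]:
    """Flatten a name -> output dict into a list, preserving insertion order.

    Python dicts keep insertion order, so the list follows the order in which
    the tool parser added outputs. Keys are dropped, so each output's .name
    must match its key or information would be lost.
    """
    result: List[T] = []
    for key, output in outputs.items():
        name = getattr(output, "name", key)
        if name != key:
            raise ValueError(f"output key {key!r} does not match output name {name!r}")
        result.append(output)
    return result
```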

Alternative approach: since gx_validator.py's GalaxyGetToolInfo already builds ParsedTool from stock tool sources via parse_tool(tool_source), the ToolboxGetToolInfo adapter could do the same: get the tool's ToolSource from the toolbox and call parse_tool(). This avoids adding a method to the Tool class entirely. Trade-off: it re-parses from XML on every lookup instead of assembling from already-parsed attributes.
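A sketch of the alternative adapter. parse_tool is injected so the class runs without Galaxy (in the server it would be the parse_tool from galaxy.tool_util.model_factory), and the tool_source attribute on the live Tool is an assumption. The per-(id, version) memo is an optional addition, not part of the plan, that softens the re-parse cost; it would need clearing whenever the toolbox reloads a tool. ToolSourceGetToolInfo is a hypothetical name.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ToolSourceGetToolInfo:
    """Hypothetical adapter: re-parse the tool's ToolSource instead of adding Tool.to_parsed_tool()."""

    def __init__(self, toolbox: Any, parse_tool: Callable[[Any], Any]):
        self.toolbox = toolbox
        self.parse_tool = parse_tool
        # Memo keyed by the *requested* (id, version); clear it on toolbox reload.
        self._cache: Dict[Tuple[str, Optional[str]], Any] = {}

    def get_tool_info(self, tool_id: str, tool_version: Optional[str]) -> Any:
        key = (tool_id, tool_version)
        if key not in self._cache:
            tool = self.toolbox.get_tool(tool_id, tool_version=tool_version)
            if tool is None:
                raise LookupError(f"tool {tool_id} ({tool_version}) not in toolbox")
            # Assumes the live Tool keeps its ToolSource on a tool_source attribute.
            self._cache[key] = self.parse_tool(tool.tool_source)
        return self._cache[key]
```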


6.4: Round-trip validation gate for API

Status: Not started

Optional enhancement: before returning format2 from the API, run roundtrip_validate() on the result. If any step fails the round trip, include a warning in the response. If the strict=true query parameter is set, return 422 instead.
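The gate's decision logic is small enough to isolate from the endpoint. In this sketch, GateDecision and roundtrip_gate are hypothetical names, and failed_steps stands in for whatever is extracted from RoundTripValidationResult; the 200-with-warnings versus 422 split follows the behavior described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GateDecision:
    status: int  # HTTP status code the endpoint should return
    warnings: List[str] = field(default_factory=list)


def roundtrip_gate(failed_steps: List[str], strict: bool) -> GateDecision:
    """Map round-trip failures to a response: warn by default, 422 when strict."""
    if not failed_steps:
        return GateDecision(status=200)
    warnings = [f"step {s} did not round-trip cleanly" for s in failed_steps]
    # strict=true turns the warnings into 422 Unprocessable Content
    return GateDecision(status=422 if strict else 200, warnings=warnings)
```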

Low priority: the CLI galaxy-workflow-roundtrip-validate is the primary validation tool, and the API gate is a convenience.


Implementation Order

Step  Delivers                                    Status
6.0   Refactor test_roundtrip.py → roundtrip.py   DONE (d1aed972e0, 96796ce6f2)
6.1   galaxy-workflow-export-format2 CLI          DONE (fc1fbcd85c)
6.2   galaxy-workflow-roundtrip-validate CLI      DONE (7ebef1fa27)
6.3a  Tool.to_parsed_tool() bridge (if missing)   Not started
6.3b  ToolboxGetToolInfo adapter                  Not started
6.3c  API endpoint format=format2                 Not started
6.4   Round-trip validation gate                  Not started

6.0–6.2 are complete and live entirely in galaxy-tool-util, with no Galaxy server dependency.

6.3 is the Galaxy server integration and can follow later.


Testing

6.0 tests — DONE

6.1 tests — DONE (manual)

6.2 tests — DONE (manual)

6.3 tests


Unresolved Questions