First Planning Steps
Step 1: Audit Parameter Type Coverage
Inventory every Galaxy parameter type and assess current handling in:
- Native validation (validation_native.py)
- Format2 validation (validation_format2.py)
- State conversion (convert.py)
Produce a coverage matrix. Prioritize by frequency in IWC workflows (text, integer, float, boolean, select, conditional, repeat, section, data, data_collection cover ~95% of real usage).
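One possible shape for the coverage matrix is sketched below. The helper names (build_coverage_matrix, gaps) are hypothetical, and the audit values are placeholders that only show the data layout; they are not real findings about the three modules.

```python
from typing import Dict, Iterable, List, Tuple

# Parameter types the text names as covering most real usage, in priority order.
PRIORITY_TYPES = [
    "text", "integer", "float", "boolean", "select",
    "conditional", "repeat", "section", "data", "data_collection",
]


def build_coverage_matrix(
    handled: Dict[str, Iterable[str]],
    param_types: List[str] = PRIORITY_TYPES,
) -> Dict[str, Dict[str, bool]]:
    """Return {param_type: {module: handled?}} for every type and module."""
    handled_sets = {module: set(types) for module, types in handled.items()}
    return {
        ptype: {module: ptype in types for module, types in handled_sets.items()}
        for ptype in param_types
    }


def gaps(matrix: Dict[str, Dict[str, bool]]) -> List[Tuple[str, str]]:
    """List (param_type, module) pairs that still need work, in priority order."""
    return [
        (ptype, module)
        for ptype, row in matrix.items()
        for module, ok in row.items()
        if not ok
    ]


# Placeholder audit input, NOT real results: replace with the actual inventory.
example_audit = {
    "validation_native.py": ["text", "integer", "float", "boolean", "select"],
    "validation_format2.py": ["text", "integer"],
    "convert.py": ["text"],
}
matrix = build_coverage_matrix(example_audit)
```

Calling gaps(matrix) then yields the work list already sorted by the priority order above.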
Step 2: Design the state JSON Schema / Pydantic Model
The Format2 state block needs a formal schema. Design how to derive it from the existing workflow_step representation:
- Can workflow_step Pydantic models be reused directly for validating state?
- What adaptations are needed? (state omits connected params, uses structured dicts not JSON strings, has no ConnectedValue/__current_case__/__page__)
- Should a new state representation (e.g., format2_state) be added, or is workflow_step sufficient with pre-processing?
- How does $link in state interact with the schema?
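To make the pre-processing option concrete, here is a minimal sketch of one way $link entries could be pulled out of state before validation, so the schema only ever sees unconnected values. The function names and the path layout (cond|param, repeat_0|param) are assumptions for illustration, not an existing API.

```python
from typing import Any, Dict, Tuple


def _is_link(value: Any) -> bool:
    """True for a dict of the form {"$link": "<source>"}."""
    return isinstance(value, dict) and set(value) == {"$link"}


def split_links(state: Any, path: str = "") -> Tuple[Any, Dict[str, str]]:
    """Return (state without $link entries, {parameter path: link source})."""
    links: Dict[str, str] = {}
    if isinstance(state, dict):
        cleaned: Dict[str, Any] = {}
        for key, value in state.items():
            child_path = f"{path}|{key}" if path else key
            if _is_link(value):
                links[child_path] = value["$link"]
                continue  # connected params are left out of the validated state
            cleaned[key], child_links = split_links(value, child_path)
            links.update(child_links)
        return cleaned, links
    if isinstance(state, list):
        cleaned_list = []
        for index, item in enumerate(state):
            item_path = f"{path}_{index}"
            if _is_link(item):
                links[item_path] = item["$link"]
                continue
            cleaned_item, child_links = split_links(item, item_path)
            cleaned_list.append(cleaned_item)
            links.update(child_links)
        return cleaned_list, links
    return state, links


example_state = {
    "input1": {"$link": "first_step/out_file1"},
    "cond": {"selector": "a", "ref": {"$link": "ref_input"}},
    "queries": [{"input2": {"$link": "x/out"}, "n": 3}],
}
cleaned_state, found_links = split_links(example_state)
```

If the schema instead supports $link directly, this step disappears and every parameter model needs a union with a link type; the sketch only illustrates the pre-processing side of that trade-off.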
Step 3: Design the Conversion Interface
Define the contract for tool_state ↔ state conversion:
- Input/output types
- How tool definitions are provided (the GetToolInfo protocol, or direct ParsedTool/ToolParameterBundleModel?)
- Error handling strategy (structured errors vs exceptions, partial conversion vs all-or-nothing)
- Where in the package hierarchy this lives
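The contract questions above could resolve to something like the following sketch, which picks structured errors and partial conversion purely for illustration. ToolLookup, ConversionIssue, ConversionResult and convert_tool_state are hypothetical names; in particular ToolLookup is a stand-in and does not reproduce the signature of GetToolInfo.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol


class ToolLookup(Protocol):
    """Hypothetical stand-in for however tool definitions are supplied."""

    def get_parameters(
        self, tool_id: str, tool_version: Optional[str]
    ) -> Optional[Dict[str, str]]:
        """Return {parameter name: parameter type}, or None for an unknown tool."""
        ...


@dataclass
class ConversionIssue:
    path: str  # parameter path, empty for step-level problems
    message: str


@dataclass
class ConversionResult:
    state: Dict[str, Any] = field(default_factory=dict)
    issues: List[ConversionIssue] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.issues


def convert_tool_state(
    tool_id: str,
    tool_version: Optional[str],
    tool_state: Dict[str, Any],
    tools: ToolLookup,
) -> ConversionResult:
    """Partial conversion: keep what converts, report the rest as issues."""
    result = ConversionResult()
    params = tools.get_parameters(tool_id, tool_version)
    if params is None:
        result.issues.append(ConversionIssue("", f"unknown tool: {tool_id}"))
        return result
    for name, value in tool_state.items():
        if name not in params:
            result.issues.append(ConversionIssue(name, "parameter not defined by tool"))
            continue
        result.state[name] = value  # per-type conversion would go here
    return result


class DictLookup:
    """In-memory lookup used only for this example."""

    def __init__(self, tools: Dict[str, Dict[str, str]]) -> None:
        self._tools = tools

    def get_parameters(self, tool_id, tool_version):
        return self._tools.get(tool_id)


lookup = DictLookup({"cat1": {"lines": "integer"}})
partial = convert_tool_state("cat1", None, {"lines": 5, "bogus": 1}, lookup)
missing = convert_tool_state("nope", None, {}, lookup)
```

An all-or-nothing strategy would use the same result type and simply discard state whenever issues is non-empty.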
Plan how conversion handles each parameter type, especially:
- Conditionals: __current_case__ inference from selector values
- Repeats: array ↔ indexed-key mapping
- Connections: ConnectedValue ↔ in/$link extraction/injection
- Defaults: should conversion fill missing defaults or leave them absent?
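The first three cases can be sketched as small pure functions. This is an illustration under stated assumptions: the ConnectedValue marker is taken to be a dict with a __class__ key, indexed repeat keys are taken to look like queries_0|input2, and only one nesting level and top-level connections are handled. All function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Assumed marker for a connected parameter in native tool_state.
CONNECTED_VALUE = {"__class__": "ConnectedValue"}


def infer_current_case(selector_value: str, case_values: List[str]) -> Optional[int]:
    """Return the index of the case whose value matches the selector, else None."""
    try:
        return case_values.index(selector_value)
    except ValueError:
        return None


def repeat_to_indexed(name: str, instances: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Array form -> flat keys such as 'queries_0|input2' (assumed key layout)."""
    return {
        f"{name}_{index}|{child}": value
        for index, instance in enumerate(instances)
        for child, value in instance.items()
    }


def indexed_to_repeat(name: str, flat: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flat indexed keys -> array form, one level deep, ordered by index."""
    pattern = re.compile(rf"^{re.escape(name)}_(\d+)\|(.+)$")
    by_index: Dict[int, Dict[str, Any]] = {}
    for key, value in flat.items():
        match = pattern.match(key)
        if match:
            by_index.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return [by_index[index] for index in sorted(by_index)]


def extract_connections(
    tool_state: Dict[str, Any], sources: Dict[str, str]
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, str]]]:
    """Split top-level state into (unconnected state, Format2-style `in` block)."""
    state: Dict[str, Any] = {}
    step_in: Dict[str, Dict[str, str]] = {}
    for name, value in tool_state.items():
        if value == CONNECTED_VALUE:
            if name in sources:
                step_in[name] = {"source": sources[name]}
        else:
            state[name] = value
    return state, step_in
```

Returning None from infer_current_case rather than raising leaves the caller free to report an invalid selector as a structured error.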
Step 4: Implement Core Scalar + Container Conversion
Build out the conversion for the common parameter types in priority order:
- Scalars: text, integer, float, boolean, color, hidden
- Select (including dynamic selects — may need special handling)
- Sections
- Conditionals (with __current_case__ inference)
- Repeats
- Data/collection (ConnectedValue extraction)
Red-to-green: write test cases for each type before implementing.
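As an illustration of the test-first approach for scalars, the case table below is written before the function it exercises. It assumes native values arrive as JSON-encoded strings (for example '"5"' for an integer); decode_scalar and run_scalar_cases are hypothetical names, and select and data types are deliberately out of scope.

```python
import json
from typing import Any, List, Tuple

# Written first: (parameter type, raw native value, expected decoded value).
SCALAR_CASES: List[Tuple[str, str, Any]] = [
    ("integer", '"5"', 5),
    ("integer", "5", 5),
    ("float", '"0.5"', 0.5),
    ("boolean", '"true"', True),
    ("boolean", "false", False),
    ("text", '"hello"', "hello"),
    ("text", "null", None),
]


def decode_scalar(param_type: str, raw: str) -> Any:
    """Decode one JSON-encoded native value into a typed Python value."""
    value = json.loads(raw)
    if value is None:
        return None
    if param_type == "integer":
        return int(value)
    if param_type == "float":
        return float(value)
    if param_type == "boolean":
        if isinstance(value, bool):
            return value
        return str(value).lower() == "true"
    return str(value)  # text, color, hidden


def run_scalar_cases() -> List[Tuple[str, str, Any, Any]]:
    """Return the failing cases as (type, raw, expected, actual)."""
    failures = []
    for param_type, raw, expected in SCALAR_CASES:
        actual = decode_scalar(param_type, raw)
        if actual != expected or type(actual) is not type(expected):
            failures.append((param_type, raw, expected, actual))
    return failures
```

The type check in run_scalar_cases matters because True == 1 in Python, so a value comparison alone would not catch a boolean decoded as an integer.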
Step 5: Wire Validation into gxformat2 Lint Path
Design how gxformat2’s linter can optionally consume tool schemas:
- Extend ImporterGalaxyInterface or introduce a parallel interface?
- How are tool schemas provided? (local tool XML, Tool Shed API response, pre-built ParsedTool objects?)
- What lint messages should be emitted for invalid state?
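To give the lint-message question something concrete to react to, here is a rough sketch of a state check that returns plain message strings. It does not use gxformat2's real lint API; lint_state, the message wording, and the simplified {name: type} tool description are all assumptions.

```python
from typing import Any, Dict, List

# Python types accepted for each scalar parameter type in this sketch.
_EXPECTED = {
    "text": str,
    "integer": int,
    "float": (int, float),
    "boolean": bool,
}


def lint_state(
    step_label: str, state: Dict[str, Any], param_types: Dict[str, str]
) -> List[str]:
    """Return one message per invalid entry in a step's state block."""
    messages: List[str] = []
    for name, value in state.items():
        ptype = param_types.get(name)
        if ptype is None:
            messages.append(f"step '{step_label}': unknown parameter '{name}'")
            continue
        expected = _EXPECTED.get(ptype)
        if expected is None:
            continue  # container and data types are not checked here
        # bool is a subclass of int, so reject it explicitly for numeric types.
        is_bool = isinstance(value, bool)
        valid = isinstance(value, expected) and (ptype == "boolean" or not is_bool)
        if not valid:
            messages.append(
                f"step '{step_label}': parameter '{name}' expects {ptype}, "
                f"got {type(value).__name__}"
            )
    return messages


lint_messages = lint_state(
    "cat",
    {"lines": "ten", "unknown": 1, "flag": True},
    {"lines": "integer", "flag": "boolean"},
)
```

Whether such messages surface as warnings or errors, and what happens when no tool schema is available, remains part of the design work for this step.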
Step 6: Design the Round-Trip Test Harness
Plan the D5 round-trip validation:
- What “semantically equivalent” means precisely (which fields to compare, which to ignore)
- How to handle fields that legitimately differ (__current_case__ ordering, default filling, key ordering)
- Test corpus: start with framework test workflows, expand to IWC workflows
- Where this utility lives (galaxy-tool-util CLI? gxformat2 CLI? both?)
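One candidate definition of "semantically equivalent" is equality after normalization, sketched below. The ignored-key set and the top-level-only default filling are assumptions chosen for brevity; the real comparison would need per-parameter defaults from the tool definition.

```python
from typing import Any, Dict, Optional

# Bookkeeping keys assumed to carry no meaning for the comparison.
IGNORED_KEYS = {"__current_case__", "__page__"}


def normalize(value: Any) -> Any:
    """Recursively drop ignored keys; dict comparison already ignores key order."""
    if isinstance(value, dict):
        return {
            key: normalize(child)
            for key, child in value.items()
            if key not in IGNORED_KEYS
        }
    if isinstance(value, list):
        return [normalize(child) for child in value]
    return value


def semantically_equal(
    original: Dict[str, Any],
    round_tripped: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> bool:
    """Compare two states after normalization and top-level default filling."""
    fill = defaults or {}
    left = {**fill, **normalize(original)}
    right = {**fill, **normalize(round_tripped)}
    return left == right


before = {"a": 1, "cond": {"sel": "x", "__current_case__": 0}, "__page__": None}
after = {"cond": {"sel": "x"}, "a": 1, "b": "d"}
```

Here semantically_equal(before, after, defaults={"b": "d"}) holds, while the same call without defaults does not, which makes the effect of the default-filling decision visible in the harness.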
Step 7: Plan Format2 Export Integration
Design how Galaxy’s export path changes:
- Where does the schema-aware conversion plug into from_galaxy_native()?
- How does the UI offer Format2 download? (new API param? separate endpoint?)
- Fallback behavior when conversion fails
- What metadata is needed beyond tool state (comments are already known-lost; anything else?)
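The fallback question could be answered per step, as in this sketch: a step whose conversion fails keeps its original tool_state and the export records a warning. StateConversionError and export_step are hypothetical names, and the step dicts are reduced to the two fields needed for the example.

```python
from typing import Any, Callable, Dict, List, Tuple


class StateConversionError(Exception):
    """Hypothetical error raised when a step's tool_state cannot be converted."""


def export_step(
    native_step: Dict[str, Any],
    convert: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Tuple[Dict[str, Any], List[str]]:
    """Emit `state` when conversion succeeds, else fall back to `tool_state`."""
    exported: Dict[str, Any] = {"tool_id": native_step["tool_id"]}
    warnings: List[str] = []
    try:
        exported["state"] = convert(native_step)
    except StateConversionError as exc:
        exported["tool_state"] = native_step["tool_state"]
        warnings.append(f"{native_step['tool_id']}: kept tool_state ({exc})")
    return exported, warnings


def failing_convert(step: Dict[str, Any]) -> Dict[str, Any]:
    raise StateConversionError("tool definition unavailable")


def working_convert(step: Dict[str, Any]) -> Dict[str, Any]:
    return {"lines": 5}


step = {"tool_id": "cat1", "tool_state": '{"lines": "5"}'}
fallback_export, fallback_warnings = export_step(step, failing_convert)
clean_export, clean_warnings = export_step(step, working_convert)
```

Catching only the dedicated error type keeps genuine bugs in the converter from being silently turned into fallbacks.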
Unresolved Questions
- New format2_state representation or reuse workflow_step with pre-processing?
- Should conversion fill tool defaults or preserve only explicitly-set values?
- Where does the round-trip utility live — galaxy-tool-util, gxformat2, or a new package?
- Should Format2 export use state (fully decoded) or offer both state and tool_state output modes?
- How to handle tools not available to the validator (missing from Tool Shed, local-only tools)?
- Does the $link syntax in state need schema-level support or just pre-processing before validation?