Plan: Symmetric Schema-Aware Protocol for gxformat2
Goal
Add optional callback protocols to gxformat2’s from_galaxy_native() and python_to_workflow() so that schema-aware consumers (like galaxy-tool-util) can inject tool-definition-aware state conversion on both the export and import paths. gxformat2 stays schema-free; the callbacks are optional with unchanged default behavior.
Status
Step 1: gxformat2 PR — DONE
Branch: state_callbacks at /Users/jxc755/projects/worktrees/gxformat2/branch/state_callbacks
Commit: 4bcae5f “Add convert_tool_state and native_state_encoder callback protocols.”
Implemented:
- ConvertToolStateFn type alias + convert_tool_state param on from_galaxy_native()
- NativeStateEncoderFn type alias + native_state_encoder on ImportOptions
- Renamed encode_tool_state → encode_tool_state_json
- state and tool_state mutually exclusive on export
- Subworkflow recursion passes convert_tool_state through
- Warning logged on callback exception (not silent swallow)
- Both type aliases exported from gxformat2.__init__
- 12 new tests (7 export, 5 import), 100/100 suite passing
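The export-side behavior listed above (mutually exclusive state and tool_state, warning on callback failure) can be sketched roughly as follows. This is an illustration only, not the gxformat2 code: export_step_state is a hypothetical helper, and the fallback of emitting the decoded native tool_state is an assumption about the default path.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger(__name__)

ConvertToolStateFn = Optional[Callable[[dict], Optional[Dict[str, Any]]]]


def export_step_state(
    native_step: dict, convert_tool_state: ConvertToolStateFn = None
) -> Dict[str, Any]:
    """Hypothetical helper: return {"state": ...} or {"tool_state": ...}, never both."""
    if convert_tool_state is not None:
        try:
            state = convert_tool_state(native_step)
        except Exception:
            # Log instead of silently swallowing, then fall back to the default path.
            log.warning(
                "convert_tool_state failed for step %s; using default export",
                native_step.get("id"),
                exc_info=True,
            )
            state = None
        if state is not None:
            return {"state": state}
    # Assumed default: pass the native tool_state through, decoded if it is a string.
    tool_state = native_step.get("tool_state", {})
    if isinstance(tool_state, str):
        tool_state = json.loads(tool_state)
    return {"tool_state": tool_state}
```

A callback that returns None or raises leaves the step on the default path, so consumers that pass no callback see unchanged output.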
Step 2: galaxy-tool-util — encode_state_to_native — DONE
Commit: 8345935718 on wf_tool_state branch
Implemented in convert.py:
- encode_state_to_native(parsed_tool, state): recursive walker that reverses format2 conversions
  - Multiple select lists → comma-delimited strings
  - Recurses into conditionals, sections, repeats
  - ConnectedValue markers passed through as json.dumps
  - Everything else: json.dumps (same as default)
- _reverse_format2_values() / _reverse_value(): recursive walk using tool input defs
- _find_conditional_branch(): branch selection for format2 conditional state
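A minimal sketch of the kind of recursive walk described above. The real implementation works on parsed tool models; here the input definitions are plain dicts whose keys (name, type, multiple, inputs, test_param, cases) are invented for illustration, and the function names are stand-ins, not the ones in convert.py.

```python
import json
from typing import Any, Dict, List


def _reverse_value(input_def: dict, value: Any) -> Any:
    # ConnectedValue markers are passed through untouched.
    if isinstance(value, dict) and value.get("__class__") == "ConnectedValue":
        return value
    kind = input_def["type"]
    if kind == "select" and input_def.get("multiple") and isinstance(value, list):
        # Multiple select: list -> comma-delimited string.
        return ",".join(str(v) for v in value)
    if kind == "section":
        return _reverse_values(input_def["inputs"], value)
    if kind == "repeat":
        return [_reverse_values(input_def["inputs"], item) for item in value]
    if kind == "conditional":
        # Pick the branch from the test parameter's value in the state.
        selected = str(value.get(input_def["test_param"]))
        return _reverse_values(input_def["cases"].get(selected, []), value)
    return value


def _reverse_values(input_defs: List[dict], state: Dict[str, Any]) -> Dict[str, Any]:
    by_name = {d["name"]: d for d in input_defs}
    # Keys without a definition (e.g. the conditional's test parameter) pass through.
    return {
        name: _reverse_value(by_name[name], value) if name in by_name else value
        for name, value in state.items()
    }


def encode_state_sketch(input_defs: List[dict], state: Dict[str, Any]) -> Dict[str, str]:
    """Reverse format2 conversions, then json.dumps each top-level value."""
    return {k: json.dumps(v) for k, v in _reverse_values(input_defs, state).items()}
```

Walking the state rather than the definitions keeps keys the tool model does not know about, which matches the "everything else: json.dumps" rule.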
Step 3: galaxy-tool-util — Wire callbacks, remove post-processing — DONE
Same commit as Step 2.
Implemented:
- make_convert_tool_state(get_tool_info): factory for export callback
- make_encode_tool_state(get_tool_info): factory for import callback
- roundtrip_validate() rewritten to use callbacks
- replace_tool_state_with_format2_state() removed from roundtrip.py
- find_matching_native_step() and ensure_export_defaults() kept (still used by export_format2.py)
- Added list↔string equivalence to _values_equivalent for multiple selects
- 230/230 unit tests passing
- IWC corpus: 33 OK, 23 MISMATCH, 53 CONVERSION_FAIL, 3 ERROR (out of 112)
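The list↔string equivalence for multiple selects mentioned above might look something like this. It is a guess at the shape of the check, not the actual _values_equivalent; the function name is a stand-in.

```python
from typing import Any


def values_equivalent_sketch(a: Any, b: Any) -> bool:
    """Treat ["a", "b"] and "a,b" as equal, in either order of arguments."""
    if a == b:
        return True
    if isinstance(a, list) and isinstance(b, str):
        return ",".join(str(v) for v in a) == b
    if isinstance(a, str) and isinstance(b, list):
        return values_equivalent_sketch(b, a)
    return False
```

With a rule like this, a native comma string and its format2 list form no longer count as a roundtrip mismatch.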
Step 4: Pin gxformat2 >= new version — TODO
- gxformat2 PR not yet opened/merged
- Galaxy’s encode_tool_state reference in lib/galaxy/workflow/format2.py will break: the attribute is not used, but the import will fail if it is removed. The current rename is encode_tool_state → encode_tool_state_json; Galaxy never sets it, so no code change is needed, but the pin bump will surface it.
- export_format2.py still uses the old post-processing approach (it is the export CLI, not roundtrip). It could be migrated to callbacks too, but that is a separate task.
Design
Export Callback: ConvertToolStateFn
A callable that gxformat2 accepts but doesn’t implement. Returns only state — connections are always handled by gxformat2’s _convert_input_connections().
ConvertToolStateFn = Optional[Callable[[dict], Optional[Dict[str, Any]]]]
Why no in_connections? Three reasons: _convert_input_connections() would overwrite the callback’s in dict; the callback can’t produce correct source references without gxformat2’s label_map; and both would be derived from the same native input_connections dict, so they fully overlap.
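For illustration, a consumer-side export callback matching ConvertToolStateFn could be built along these lines. This is a sketch under assumptions: the get_tool_info signature (tool_id, tool_version), the multiple_selects key on the returned tool info, and the convention that returning None means "fall back to the default" are all hypothetical, and the factory name is a stand-in for make_convert_tool_state.

```python
import json
from typing import Any, Callable, Dict, Optional

ConvertToolStateFn = Optional[Callable[[dict], Optional[Dict[str, Any]]]]


def make_convert_sketch(
    get_tool_info: Callable[[Optional[str], Optional[str]], Optional[dict]],
) -> Callable[[dict], Optional[Dict[str, Any]]]:
    def convert(native_step: dict) -> Optional[Dict[str, Any]]:
        tool = get_tool_info(native_step.get("tool_id"), native_step.get("tool_version"))
        if tool is None:
            return None  # unknown tool: decline and let the default path run
        raw = native_step.get("tool_state") or {}
        state = json.loads(raw) if isinstance(raw, str) else raw
        result: Dict[str, Any] = {}
        for name, value in state.items():
            if isinstance(value, str):
                try:
                    value = json.loads(value)  # undo per-value JSON encoding
                except ValueError:
                    pass  # plain string, keep as-is
            if name in tool["multiple_selects"] and isinstance(value, str):
                # Comma-delimited string -> list for multiple selects.
                value = value.split(",") if value else []
            result[name] = value
        # Only state is returned; connections are left to gxformat2.
        return result

    return convert
```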
Import Callback: NativeStateEncoderFn
NativeStateEncoderFn = Optional[Callable[[dict, Dict[str, Any]], Optional[Dict[str, Any]]]]
setup_connected_values() runs before the callback. ConnectedValue markers are passed through as json.dumps(marker). RuntimeValue markers are injected separately after — the callback never sees them.
Encoding Flow
- gxformat2 transform_tool() pops state, runs setup_connected_values
- Calls native_state_encoder(step, step_state), our callback
- Our callback calls encode_state_to_native(parsed_tool, state), which walks format2 state with tool defs, reverses multiple select lists → comma strings, and json.dumps each value
- gxformat2 does tool_state.update(encoded) with the callback result
- gxformat2 calls _populate_tool_state(step, tool_state), which applies the outer json.dumps envelope
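The flow above can be simulated end to end with stand-ins. Nothing here is gxformat2 code: transform_tool_sketch only mimics the ordering (pop state and in, mark connected inputs, call the encoder, update, wrap in the outer json.dumps), the ConnectedValue marker shape is assumed, and the real tool_state may start with pre-populated keys.

```python
import json
from typing import Any, Callable, Dict, Optional

NativeStateEncoderFn = Optional[Callable[[dict, Dict[str, Any]], Optional[Dict[str, Any]]]]


def transform_tool_sketch(
    step: dict, native_state_encoder: NativeStateEncoderFn = None
) -> dict:
    step = dict(step)
    step_state = dict(step.pop("state", {}))
    connect = step.pop("in", {})
    # Stand-in for setup_connected_values: mark connected inputs before the callback.
    for name in connect:
        step_state[name] = {"__class__": "ConnectedValue"}

    tool_state: Dict[str, Any] = {}
    encoded = native_state_encoder(step, step_state) if native_state_encoder else None
    if encoded is None:
        # Default behavior: json.dumps each value unchanged.
        encoded = {k: json.dumps(v) for k, v in step_state.items()}
    tool_state.update(encoded)

    # Stand-in for _populate_tool_state: the outer json.dumps envelope.
    step["tool_state"] = json.dumps(tool_state)
    return step
```

The encoder sees tool_id and tool_version on the step but not state or in, and the final tool_state is double-encoded: a JSON object whose values are themselves JSON strings.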
Resolved Questions
Q: Should the export callback return in_connections?
No. Fully overlaps with _convert_input_connections, would be overwritten, can’t resolve step labels.
Q: Should encode_state_to_native handle ConnectedValue/RuntimeValue?
ConnectedValue: yes, trivially. RuntimeValue: N/A (injected after callback).
Q: Should the export callback receive workflow context?
No. Native step dict has everything needed.
Q: Should the callback handle pick_value steps?
No. gxformat2 already handles them.
Q: Naming for the import-side callback?
native_state_encoder. Boolean renamed to encode_tool_state_json.
Q: Does the import callback’s step dict still have tool_id/tool_version?
Yes. Only state and in/connect have been popped at callback time.
Unresolved Questions
None.