Galaxy workflow draft format
The output artifact galaxy-workflow-draft produced by the *-summary-to-galaxy-template Molds is gxformat2 with wrapper-tier relaxations and free-text planning fields, sized to the gap between data-flow design and tool-resolved implementation.
Topology — workflow inputs and their collection shapes, workflow outputs, the step set, the producer→consumer edge graph, branches, and when: guards — is settled by the template Mold itself, drawing on the upstream interface and data-flow briefs (see galaxy-data-flow-draft-contract). The output is concrete gxformat2 with no topology TODOs. Everything deferred to the per-step implementation Mold is wrapper-tier: which Tool Shed wrapper, what parameters, and the wrapper-determined port names that populate in: / out: / outputSource.
Relaxations vs. gxformat2
For tool steps, when the wrapper has not been picked:
tool_idandtool_versionMAY be the literal stringTODO. Resolution belongs to discover-shed-tool and the per-step implementation Mold.tool_shed_repository(the{ changeset_revision, name, owner, tool_shed }block) MAY be absent.tool_state/stateMAY be absent.- Step
out[].idandin[]keys MAY beTODO_<hint>sentinels (e.g.TODO_trimmed_paired,TODO_input). Workflowoutputs[].outputSourceMAY reference such a sentinel viastep/TODO_<hint>so the connection graph still resolves syntactically. The connection itself (which step’s output feeds which step’s input) is topology and MUST be present; only the wrapper-determined port names are deferred.
Workflow inputs (types, collection shapes, formats, optionality), workflow outputs, step labels, and the producer→consumer edge set MUST be expressed in normal gxformat2 form with concrete values — never TODO. The template Mold is the locus where these decisions are made; if the upstream brief leaves a topology choice open, decide it here from source evidence, IWC exemplars, and pattern pages rather than punting downstream.
Additions: _plan_* planning fields
Free-text fields per tool step capture the template Mold’s intent for the downstream per-step implementation Mold. All optional, but expected on any step where tool_id is TODO or tool_state is absent.
_plan_state— parameter binding intent: which knobs matter, value or range, why. Read by the per-step Mold to bind realtool_stateonce a wrapper is selected._plan_context— extras the per-step Mold needs to pick a wrapper: sourcecommand:block, conda packages, Docker/Singularity images, environment variables, preconditions, postconditions, container entrypoints, scratch-disk needs._plan_in— wrapper input port mapping intent. The connection graph already says “this source feeds this step”;_plan_inrecords the semantic role of each port and likely wrapper-side names (single_pairedvspaired_inputvsinput) so the per-step Mold can collapseTODO_*keys to the real ones._plan_out— wrapper output surface intent: which outputs downstream consumers depend on (with the existing edges as evidence) so the per-step Mold can pick a wrapper that exposes them, or insert a follow-up step if it does not.
The _plan_* family is wrapper-tier: each entry exists because the tool wrapper is not yet picked and disappears the moment one is. Topology decisions do not belong here.
_plan_* fields are draft-only. They MUST be removed before the workflow is treated as a runnable gxformat2 document. Any remaining TODO / TODO_* sentinels MUST also be resolved at the same time.
Example (sketch)
class: GalaxyWorkflow
inputs:
reads:
type: collection
collection_type: list:paired
format: fastqsanger.gz
outputs:
trimmed:
outputSource: fastp/TODO_trimmed_paired
steps:
fastp:
tool_id: TODO
label: trim and QC paired reads
in:
TODO_input: reads
out:
- id: TODO_trimmed_paired
- id: TODO_html_report
_plan_state: |
adapter trimming on, quality cutoff ~Q20, min length ~50.
preserve paired-end pairing for downstream alignment.
_plan_context: |
upstream: nf-core FASTP module.
conda: bioconda::fastp=0.23.4
container: quay.io/biocontainers/fastp:0.23.4--h5f740d0_0
precondition: paired list collection with sane element identifiers
_plan_in: |
single semantic port `reads`: feeds workflow `reads` (list:paired).
wrapper input port name likely one of `single_paired` | `paired_input` |
`input` depending on which fastp wrapper is picked.
_plan_out: |
need a paired output that preserves list:paired shape (downstream
alignment step consumes it). also expose the HTML report as a
checkpoint output for QC.
Why this shape
- Keeps the artifact recognizably gxformat2 so static tooling (
gxwf validate --no-tool-state, structural diff against IWC) still applies to topology and port wiring. - Carves a stable handoff between the template Mold and the per-step implementation Mold without sneaking tool resolution into either side.
- Makes the boundary between topology (settled upstream) and wrapper-tier (deferred) explicit — the
_plan_*family lives squarely on the deferred side. - Free-text
_plan_*is intentional for v1: it lets the templating agent record intent without a contract pretending to be parameterizable yet. Structuring those fields is open work — see each template Mold’srefinement.md.
Open work
- Decide whether
_plan_stateshould grow structure (parameter-name hints, value ranges, references back to the source summary). - Decide whether
_plan_contextshould be split into typed fields (source_command,conda,containers,env,pre,post). - Decide structure for
_plan_inand_plan_outonce two or three worked drafts exercise them. Candidate_plan_out: array of{ semantic_name, downstream_consumers[], required_for }. Candidate_plan_in: map keyed by theTODO_*placeholder key to{ semantic_name, source_step_output, shape_constraint }. - Pick a JSON Schema strategy: extend the gxformat2 schema with
_plan_*and the relaxed wrapper / port-name rules, or validate the draft via a sibling schema that wraps gxformat2 with the relaxations. - Specify the strip step that converts a draft into a runnable gxformat2 workflow. It MUST drop all
_plan_*fields and reject any remainingTODOorTODO_<hint>sentinel intool_id,tool_version,out[].id,in[]keys, oroutputSource.