Galaxy Data-Flow Draft Contract
This is an architectural contract, not a schema. Evidence is strongest for Mold and Pipeline boundaries. Proposed fields are speculative until exercised by two or three worked translations.
Boundary
The data-flow draft owns a target-shaped abstract DAG for Galaxy. It should not be valid gxformat2 and should not resolve exact Tool Shed tools.
Data-flow draft owns:
- Galaxy-facing workflow inputs and outputs.
- Abstract nodes, edges, branches, collection mapping, collection reduction, and placeholder transformations.
- Input/output shape decisions such as `File`, `list`, `paired`, `list:paired`, or `list:list`.
- Conceptual Galaxy idioms: map-over, reduction, Apply Rules, collection cleanup, identifier synchronization, tabular bridge.
- Abstract unresolved tool needs with input and output shapes.
- Confidence and rationale on inferred nodes, edges, transforms, and tool needs.
The Galaxy template owns:
- A `gxformat2` skeleton.
- Ordered placeholder steps, labels, TODO slots, workflow inputs, workflow outputs, and rough connections.
- Placeholder collection-operation or Apply Rules steps only when the data-flow draft says they are necessary.
- Handoff units for the per-step implementation loop.
Concrete step implementation owns:
- Exact `tool_id`, version, owner/repository metadata, changeset, parameters, and `input_connections`.
- Concrete built-in collection-operation steps and parameters.
- Validation with `gxwf` and repair after schema/lint failures.
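
The ownership split above can be checked mechanically. The following is a minimal sketch, assuming the draft is held as nested dicts and lists; the key spellings in `CONCRETE_ONLY_KEYS` and the helper name `find_boundary_leaks` are hypothetical, chosen only to illustrate flagging concrete-step concerns that leak into a data-flow draft.

```python
# Keys that, per the boundary above, belong to concrete step implementation.
# The exact spellings are assumptions made for this sketch.
CONCRETE_ONLY_KEYS = {"tool_id", "changeset", "input_connections"}


def find_boundary_leaks(draft, path="draft"):
    """Return paths of keys that should not appear in a data-flow draft."""
    leaks = []
    if isinstance(draft, dict):
        for key, value in draft.items():
            child = f"{path}.{key}"
            if key in CONCRETE_ONLY_KEYS:
                leaks.append(child)
            leaks.extend(find_boundary_leaks(value, child))
    elif isinstance(draft, list):
        for index, item in enumerate(draft):
            leaks.extend(find_boundary_leaks(item, f"{path}[{index}]"))
    return leaks


# Made-up draft: the second node resolves a tool too early.
draft = {
    "nodes": [
        {"id": "trim_reads", "tool_need": "read trimming"},
        {"id": "align", "tool_id": "some/concrete/tool"},
    ]
}
leaks = find_boundary_leaks(draft)
print(leaks)
```

A check like this would sit alongside the draft, not inside it, so that harness routing stays out of Mold output.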
Proposed Body-Level Contract
Do not add these as frontmatter fields yet.
| Concept | Status | Notes |
|---|---|---|
| `source_summary_ref` | Speculative | Reference to the source summary consumed by the Mold. |
| `workflow_inputs` | Speculative | Abstract Galaxy-facing inputs and collection types. |
| `workflow_outputs` | Speculative | Intended outputs and source-summary provenance. |
| `nodes` | Speculative | Abstract operations, not concrete Galaxy tools. |
| `edges` | Speculative | Data dependencies, shape before/after, and source evidence. |
| `galaxy_idioms` | Speculative | Map-over, reduction, Apply Rules, collection filter, tabular bridge, etc. |
| `unresolved_tool_needs` | Speculative | Per abstract step needs for discover-shed-tool or author-galaxy-tool-wrapper. |
| `placeholder_transformations` | Speculative | Shape/text/table transforms needed for Galaxy semantics but not concretely implemented. |
| `confidence` | Speculative | Prefer `high`, `medium`, `low` plus rationale. |
| `handoff_notes` | Speculative | Instructions to template, exemplar comparison, and per-step Molds. |
| `open_questions` | Speculative | Semantic/tooling issues carried forward. |
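
One way to picture the concepts in the table is as plain Python dataclasses. This is a sketch under stated assumptions, not a schema: the class names, the per-field types, the `dangling_edges` helper, and the sample values (including `summary.json`) are all hypothetical, and confidence is attached per node here rather than as a top-level field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbstractNode:
    id: str
    operation: str             # abstract operation, not a concrete Galaxy tool
    confidence: str = "medium"
    rationale: str = ""


@dataclass
class Edge:
    source: str
    target: str
    shape_before: str
    shape_after: str


@dataclass
class DataFlowDraft:
    source_summary_ref: str
    workflow_inputs: List[dict] = field(default_factory=list)
    workflow_outputs: List[dict] = field(default_factory=list)
    nodes: List[AbstractNode] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)
    galaxy_idioms: List[str] = field(default_factory=list)
    unresolved_tool_needs: List[dict] = field(default_factory=list)
    placeholder_transformations: List[dict] = field(default_factory=list)
    handoff_notes: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)


def dangling_edges(draft):
    """Edges whose endpoints are neither a node nor a workflow input."""
    known = {n.id for n in draft.nodes}
    known |= {i["name"] for i in draft.workflow_inputs}
    return [e for e in draft.edges
            if e.source not in known or e.target not in known]


# Made-up example values for illustration only.
draft = DataFlowDraft(
    source_summary_ref="summary.json",
    workflow_inputs=[{"name": "reads", "collection_type": "list:paired"}],
    nodes=[AbstractNode("trim", "trim FASTQ reads")],
    edges=[
        Edge("reads", "trim", "list:paired", "list:paired"),
        Edge("trim", "align", "list:paired", "list:paired"),  # "align" undefined
    ],
    galaxy_idioms=["map-over"],
)
print([e.target for e in dangling_edges(draft)])
```

Keeping this as body-level structure rather than frontmatter matches the instruction above not to promote these fields yet.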
Handoff Examples
Unresolved tool need:
- nextflow-summary-to-galaxy-data-flow emits an abstract node such as `trim FASTQ reads`, input `list:paired fastq`, output `list:paired fastq`, tool need `read trimming`, confidence `medium`.
- nextflow-summary-to-galaxy-template creates a placeholder step with TODOs and collection-shaped connections.
- The harness routes through tool discovery or wrapper authoring.
- implement-galaxy-tool-step fills exact Galaxy tool metadata, parameters, and connections.
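
The first two hops of this handoff can be sketched as a small transformation. The dict keys and the `to_placeholder_step` helper are hypothetical; the placeholder is an illustrative shape, not validated gxformat2, and the tool is deliberately left unresolved for the later implementation step.

```python
def to_placeholder_step(node):
    """Turn an abstract data-flow node into a template placeholder step."""
    todos = [f"resolve tool for: {node['tool_need']}"]
    if node["confidence"] != "high":
        todos.append(f"review {node['confidence']}-confidence inference")
    return {
        "label": node["label"],
        "tool_id": None,  # filled later by concrete step implementation
        "input_shape": node["input_shape"],
        "output_shape": node["output_shape"],
        "todo": todos,
    }


# The abstract node from the example above, written as a dict.
node = {
    "label": "trim FASTQ reads",
    "input_shape": "list:paired fastq",
    "output_shape": "list:paired fastq",
    "tool_need": "read trimming",
    "confidence": "medium",
}
step = to_placeholder_step(node)
print(step["todo"])
```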
Collection cleanup after fan-out:
- Data-flow records a conceptual cleanup transform after a mapped step.
- Template materializes a placeholder collection-operation step.
- Implementation chooses exact built-in tool and parameters.
Identifier-derived reshaping:
- Data-flow records desired input shape, output shape, and identifier transformation.
- Template emits an Apply Rules placeholder only if needed.
- Implementation fills concrete rule JSON after exemplar comparison confirms the shape.
IWC exemplar comparison:
- Data-flow and template should hand compare-against-iwc-exemplar the abstract topology, placeholder transformations, unresolved tool needs, and confidence notes.
- Exemplar comparison should flag structural divergence, not resolve tools.
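
Flagging structural divergence without resolving tools could look like a set difference over abstract edges. This sketch assumes both the draft and the exemplar have already been reduced to `(source, target, shape)` tuples with comparable operation labels, which is itself a nontrivial assumption; the function name and the sample edges are made up.

```python
def structural_divergence(draft_edges, exemplar_edges):
    """Compare abstract topology only; no tool identifiers are involved."""
    draft, exemplar = set(draft_edges), set(exemplar_edges)
    return {
        "missing_from_draft": sorted(exemplar - draft),
        "extra_in_draft": sorted(draft - exemplar),
    }


# Made-up topologies: the exemplar inserts a cleanup step after alignment.
draft_edges = [
    ("reads", "trim", "list:paired"),
    ("trim", "align", "list:paired"),
    ("align", "merge_counts", "list"),
]
exemplar_edges = [
    ("reads", "trim", "list:paired"),
    ("trim", "align", "list:paired"),
    ("align", "collapse", "list"),
    ("collapse", "merge_counts", "list"),
]
report = structural_divergence(draft_edges, exemplar_edges)
print(report)
```

The report only says where the shapes differ; deciding whether the draft or the exemplar is right stays with a later stage.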
Confidence Guidance
- Attach confidence to the smallest useful unit: node, edge, transformation, or tool need.
- Use qualitative `high`, `medium`, `low` until examples justify a richer schema.
- Require rationale for low confidence.
- Do not reuse source-summary `warnings[]` as data-flow confidence.
- Keep evidence quality distinct from translation confidence. A claim can be corpus-observed but still low-confidence for a specific workflow.
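
The guidance above can be expressed as a small validator. This is a minimal sketch, assuming one annotation per node, edge, transformation, or tool need; the `Annotation` class, its field names, and the `evidence` values are hypothetical.

```python
from dataclasses import dataclass

LEVELS = ("high", "medium", "low")


@dataclass
class Annotation:
    target: str          # smallest useful unit, e.g. "node:trim_reads"
    confidence: str      # translation confidence for this workflow
    rationale: str = ""
    evidence: str = ""   # evidence quality, tracked separately


def validate(annotation):
    """Return a list of problems; an empty list means the annotation is fine."""
    problems = []
    if annotation.confidence not in LEVELS:
        problems.append(f"{annotation.target}: unknown confidence "
                        f"{annotation.confidence!r}")
    if annotation.confidence == "low" and not annotation.rationale.strip():
        problems.append(f"{annotation.target}: low confidence needs a rationale")
    return problems


# Corpus-observed evidence can still carry low translation confidence.
ok = Annotation("edge:trim->align", "low",
                rationale="channel shape inferred from one operator",
                evidence="corpus-observed")
bad = Annotation("node:merge", "low")
print(validate(ok), validate(bad))
```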
Risks
If the data-flow draft is too broad, it duplicates the template Mold, makes premature tool decisions, and leaks harness routing into Mold output.
If it is too narrow, the template Mold receives source-summary details without Galaxy-shaped semantics, exemplar comparison has only a surface skeleton to diff, and tool discovery loses input/output shape context.
Evidence
- nextflow-to-galaxy-channel-shape-mapping and nextflow-operators-to-galaxy-collection-recipes show why a Galaxy-shaped abstraction is needed between source summary and `gxformat2` template.
- `content/pipelines/nextflow-to-galaxy.md` places data-flow before template, exemplar comparison, and per-step implementation.
- `content/molds/summarize-nextflow/index.md` says source summarization should not produce Galaxy data flow.
- `content/molds/implement-galaxy-tool-step/index.md` owns concrete step implementation.
TODOs
- Decide whether to author `galaxy-data-flow-draft.schema.json` now or wait for worked examples.
- Decide whether confidence should be a single enum or per-axis vector.
- Decide how much ordering guidance belongs in data-flow versus template.
- Decide whether built-in collection operations should be abstract placeholders or concrete template steps.