Harness Pipelines

Harness pipelines for the Galaxy Workflow Foundry. Each named pipeline phase corresponds to one atomic, harness-step-sized Mold, and the union of phases across pipelines is the Mold catalog. See MOLDS.md.

Framing

A harness is hand-authored orchestration glue. Harnesses sequence Molds, manage user-approval gates, and maintain run state. They are not cast from Molds and live outside the Foundry's casting pipeline. Some harnesses are heavyweight (Archon-style); some are simple orchestration skills.
Each phase below is intended to be a Mold — atomic, cast from the Foundry, LLM-driven content, reusable across harnesses where the phase recurs.
"atomic" means atomic relative to harness pipeline phases, not necessarily small. summarize-nextflow and implement-tool-step are both atomic at this tier even though they differ in LOC.

CWL as intermediate (one option, not the path)

CWL is unofficially positioned as a low-level, high-structure interchange format — suitable as an intermediate target between an unstructured/loosely-structured source (a paper, a Nextflow pipeline) and Galaxy. The Foundry must support both direct and composed paths as first-class options:

PAPER → GALAXY (direct) and PAPER → CWL → GALAXY (composed) are both valid.
NEXTFLOW → GALAXY (direct) and NEXTFLOW → CWL → GALAXY (composed) are both valid.
Direct paths are simpler to run and debug. Composed paths buy a structured checkpoint (CWL) at the cost of running two harnesses.
Whether composition is reliable enough to prefer over direct is a longer-term research question. For now: both paths must be possible from the Mold inventory; the harness picks.
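
As a minimal sketch of how a harness might enumerate direct and composed paths, the snippet below treats each pipeline named in this document as an edge in a small graph and lists every route from a source to a target. The PIPELINES table and the paths function are hypothetical names introduced for illustration; they are not part of the Foundry.

```python
from typing import Dict, List

# Edges mirror the pipelines in this document (source -> reachable targets).
# The table itself is an illustrative assumption, not a Foundry artifact.
PIPELINES: Dict[str, List[str]] = {
    "PAPER": ["GALAXY", "CWL"],
    "NEXTFLOW": ["GALAXY", "CWL"],
    "CWL": ["GALAXY"],
}


def paths(source: str, target: str) -> List[List[str]]:
    """Return every route from source to target, direct routes first."""
    if source == target:
        return [[target]]
    found: List[List[str]] = []
    for nxt in PIPELINES.get(source, []):
        for tail in paths(nxt, target):
            found.append([source] + tail)
    return found


paper_routes = paths("PAPER", "GALAXY")
# paper_routes == [["PAPER", "GALAXY"], ["PAPER", "CWL", "GALAXY"]]
```

Because the graph has no cycles, the recursion terminates. A harness would pick one of the returned routes; nothing here ranks them.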

Mold-inventory parity. Source summarizers emit per-source schemas (paper, NF, CWL each different by design). Interface and data-flow handoffs are source-target Molds that produce reviewable Markdown design briefs rather than rich workflow schemas. This avoids pushing all polymorphism into one target Mold while keeping direct/composed pipelines explicit.

Harness-level concerns (not Molds)

Some recurring pipeline activities are harness-level, not Mold-shaped, and are therefore not in the Mold inventory. They are listed here so the boundary is visible.

Approval gates / scope confirmation / plan presentation. Whether and when to pause for user confirmation (after planning, before authoring, after a partial cast) is a property of the harness's autonomy posture, not of any individual Mold. Different harnesses (interactive vs. batch vs. fully autonomous) want different gates around the same Molds; baking gates into Molds would either constrain that or duplicate logic. Harnesses own gates.
Tool-discovery routing. "Try discover-shed-tool (find an existing wrapper via the Tool Shed); if nothing acceptable, fall through to author-galaxy-tool-wrapper" is a routing decision the harness makes; the two underlying capabilities are clean Molds. (discover-shed-tool is named for the mechanism — the Galaxy Tool Shed — leaving room for siblings like discover-tool-via-galaxy-api or discover-tool-on-github if other discovery paths get wrapped.)
State and resumption. Persisting harness state across phases, resuming a partial run, and managing run history are harness concerns.
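
To illustrate the state-and-resumption concern, here is a minimal sketch of a harness loop that records each completed phase in a JSON file and skips recorded phases on the next run. The function name run_with_resume, the JSON layout, and the file name run-state.json are assumptions made for this example; the source does not prescribe a storage format.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def run_with_resume(
    phases: List[str],
    run_phase: Callable[[str], str],
    state_path: Path,
) -> Dict[str, str]:
    """Run phases in order, skipping any already recorded in the state file."""
    state: Dict[str, str] = {}
    if state_path.exists():
        state = json.loads(state_path.read_text())
    for name in phases:
        if name in state:
            continue  # completed in an earlier run
        state[name] = run_phase(name)
        # Persist after every phase so a crash loses at most one phase.
        state_path.write_text(json.dumps(state, indent=2))
    return state


# Demo with a stand-in phase runner; the second call resumes from saved state.
executed: List[str] = []


def fake_phase(name: str) -> str:
    executed.append(name)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "run-state.json"
    run_with_resume(["summarize-paper", "validate-galaxy-workflow"], fake_phase, state_file)
    final_state = run_with_resume(
        ["summarize-paper", "validate-galaxy-workflow", "run-workflow-test"],
        fake_phase,
        state_file,
    )
```

In the demo, the second call executes only run-workflow-test, since the first two phases are already in the state file. Keeping this logic in the harness leaves the Molds themselves stateless.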

Runtime tooling

The Foundry distinguishes:

Design time: gxwf — workflow validation, tool discovery, schema, conversion. Used by Molds that author or validate workflow content.
Run time: Planemo — executes Galaxy and CWL workflows. Used by run-workflow-test, debug-galaxy-workflow-output, debug-cwl-workflow-output.

Validation posture: schema, not caveats

gxwf provides static schema validation for gxformat2 workflows and tool steps. It catches the failure modes that prior-art skills (e.g., the existing nf-to-galaxy skill in SKILLS_NF.md) had to enumerate as prose caveats: UUID validity, tool-ID/owner/+galaxyN suffix mismatches, input_connections parameter-name mismatches, conditional-selector branches in tool_state, and so on. The Foundry does not maintain a parallel "caveat catalog" of these failure modes; gxwf's schema is the source of truth and the validation loop is the enforcement mechanism.

This shifts the per-step loop from "author and hope" to author → validate → fix, with validation running inline after each step is implemented rather than only as a terminal phase. The pipelines below reflect this by invoking validate-galaxy-step (or validate-cwl) inside the per-step loop.
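
The author → validate → fix loop can be sketched as follows. This is an illustration under stated assumptions: the implement and validate callables stand in for Molds such as implement-galaxy-tool-step and validate-galaxy-step, the convention that validation returns a list of error strings is hypothetical, and the max_attempts cap is an addition of this example rather than something the source specifies.

```python
from typing import Callable, List


def implement_with_validation(
    step: dict,
    implement: Callable[[dict, List[str]], dict],
    validate: Callable[[dict], List[str]],
    max_attempts: int = 3,  # assumed cap; the source does not define one
) -> dict:
    """Author a step, validate it, and feed errors back until it is green."""
    errors: List[str] = []
    for _ in range(max_attempts):
        candidate = implement(step, errors)  # author (or fix, given prior errors)
        errors = validate(candidate)         # empty list means "green"
        if not errors:
            return candidate
    raise RuntimeError(f"step still invalid after {max_attempts} attempts: {errors}")


# Stand-ins for the two Molds, used only to exercise the loop.
def fake_implement(step: dict, errors: List[str]) -> dict:
    candidate = dict(step)
    if "missing tool_id" in errors:
        candidate["tool_id"] = "example_tool"
    return candidate


def fake_validate(candidate: dict) -> List[str]:
    return [] if "tool_id" in candidate else ["missing tool_id"]


fixed = implement_with_validation({"label": "align"}, fake_implement, fake_validate)
```

Here the first attempt fails validation, the error is passed back to the implementer, and the second attempt passes.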

Pipelines

Each pipeline is presented as an ordered list of phases. Phases marked [loop] run once per step in the workflow being constructed. Phases marked [branch] are harness-level routing — binary branches with fallthrough, or N-step fallback chains. They are not Molds; they reference Molds. The discover-or-author branch in Galaxy-targeting per-step loops is [branch] routing between two underlying capabilities.

Other inline phase annotations may be coined as needs surface — e.g., [gate] for an approval / scope-confirmation checkpoint that pauses for user input. None appear inline in the pipelines below today, so we don't pre-enumerate. [branch] and [gate] are unrelated behaviors; they don't share an umbrella tag.
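
One way to read the [loop] annotation is as a scheduling rule. The sketch below models a pipeline as a list of annotated phases and expands it into an execution order for a given set of workflow steps. The Phase dataclass and expand function are hypothetical, and treating consecutive [loop] phases as a single per-step loop body is an interpretation made for this example.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass(frozen=True)
class Phase:
    name: str
    loop: bool = False    # runs once per workflow step
    branch: bool = False  # harness-level routing; not used by expand()


def expand(phases: List[Phase], steps: List[str]) -> List[Tuple[str, str]]:
    """Return (phase, step) pairs in execution order; step is "" outside loops."""
    schedule: List[Tuple[str, str]] = []
    for is_loop, group in groupby(phases, key=lambda p: p.loop):
        block = list(group)
        if is_loop:
            # Assumption: adjacent [loop] phases form one per-step loop body.
            for step in steps:
                schedule.extend((p.name, step) for p in block)
        else:
            schedule.extend((p.name, "") for p in block)
    return schedule


excerpt = [
    Phase("summary-to-cwl-template"),
    Phase("summarize-cwl-tool", loop=True),
    Phase("implement-cwl-tool-step", loop=True),
    Phase("validate-cwl", loop=True),
    Phase("run-workflow-test"),
]
schedule = expand(excerpt, ["a", "b"])
```

With two workflow steps, the three [loop] phases run for step "a" and then for step "b", between the single template and test phases.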

PAPER → GALAXY

summarize-paper — extract methods, named tools/algorithms, sample data, metrics, references to existing pipelines.
paper-summary-to-galaxy-design — combined Galaxy interface and abstract data-flow design brief.
compare-against-iwc-exemplar — structural diff of the design brief against nearest IWC exemplar(s); guidance feeds template authoring.
paper-summary-to-galaxy-template — gxformat2 skeleton with per-step TODOs from paper source evidence, the design brief, and exemplar comparison notes.
[loop] [branch] discover-or-author branch:
- try discover-shed-tool.
- on fallthrough, author-galaxy-tool-wrapper.
[loop] summarize-galaxy-tool — pull JSON schema, containers, inputs/outputs for the resolved tool.
[loop] implement-galaxy-tool-step — convert abstract step to concrete gxformat2 step.
[loop] validate-galaxy-step — schema-validate the just-implemented step; on red, the harness loops back to implement-galaxy-tool-step (phase 7).
[branch] test-data resolution chain: try paper-to-test-data → on failure, find-test-data → on failure, harness gates to user-supplied data.
implement-galaxy-workflow-test — assemble test fixtures and assertions.
validate-galaxy-workflow — terminal schema/lint pass on the assembled workflow.
run-workflow-test — execute via Planemo.
debug-galaxy-workflow-output — triage failures, propose fixes.
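
The test-data resolution chain in this pipeline is an N-step fallback, and the discover-or-author branch is the two-element case of the same shape. A minimal sketch, assuming each Mold is wrapped as a callable that returns None on failure (that convention, the resolve_test_data name, and the dataset identifier are hypothetical):

```python
from typing import Callable, List, Optional, Tuple

Resolver = Callable[[], Optional[str]]


def resolve_test_data(chain: List[Tuple[str, Resolver]]) -> Tuple[str, Optional[str]]:
    """Try each resolver in order; fall back to the user when all fail."""
    for mold_name, resolver in chain:
        data = resolver()
        if data is not None:
            return mold_name, data
    # Nothing succeeded: the harness gates to user-supplied data.
    return "user-supplied", None


# Stand-in resolvers: the first fails, the second succeeds.
chain = [
    ("paper-to-test-data", lambda: None),
    ("find-test-data", lambda: "hypothetical-dataset-id"),
]
source, data = resolve_test_data(chain)
```

The routing lives entirely in the harness function; each resolver stays a plain capability with no knowledge of what comes before or after it.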

PAPER → CWL

summarize-paper
paper-summary-to-cwl-design
summary-to-cwl-template — CWL Workflow skeleton with per-step TODOs from source evidence and prior handoffs.
[loop] summarize-cwl-tool — derive a CommandLineTool description for each candidate (container, baseCommand, inputs/outputs).
[loop] implement-cwl-tool-step — concrete CommandLineTool and Workflow step.
[loop] validate-cwl — schema-validate the just-implemented step; on red, the harness loops back to implement-cwl-tool-step (phase 5).
[branch] test-data resolution chain: try paper-to-test-data → on failure, find-test-data → on failure, harness gates to user-supplied data.
implement-cwl-workflow-test
validate-cwl — terminal cwltool --validate / schema lint.
run-workflow-test — execute via Planemo.
debug-cwl-workflow-output — triage failures, propose fixes.

NEXTFLOW → CWL

summarize-nextflow — enumerate processes, channels, conditionals, containers, test data; emit a structured summary (NF-specific schema).
nextflow-summary-to-cwl-interface
nextflow-summary-to-cwl-data-flow
summary-to-cwl-template
[loop] summarize-cwl-tool
[loop] implement-cwl-tool-step
[loop] validate-cwl — inline schema validation per step; loop back on red.
nextflow-test-to-cwl-test-plan — translate NF test data and expectations into a CWL workflow test plan.
validate-cwl — terminal pass on the assembled workflow.
run-workflow-test — execute via Planemo.
debug-cwl-workflow-output

NEXTFLOW → GALAXY

summarize-nextflow
nextflow-summary-to-galaxy-reference-data — decide Galaxy-side shape of external reference data (iGenomes key, per-asset, compute-if-missing) before interface and data-flow choices pin workflow inputs.
nextflow-summary-to-galaxy-interface
nextflow-summary-to-galaxy-data-flow
compare-against-iwc-exemplar — structural diff of the design briefs against nearest IWC exemplar(s); guidance feeds template authoring.
nextflow-summary-to-galaxy-template
[loop] [branch] discover-or-author branch (discover-shed-tool → fallthrough to author-galaxy-tool-wrapper).
[loop] summarize-galaxy-tool
[loop] implement-galaxy-tool-step
[loop] validate-galaxy-step — inline schema validation per step; loop back on red.
nextflow-test-to-galaxy-test-plan — translate NF test data and expectations into a Galaxy workflow test plan.
implement-galaxy-workflow-test — assemble test fixtures and assertions from the translated test plan.
validate-galaxy-workflow — terminal pass on the assembled workflow.
run-workflow-test — execute via Planemo.
debug-galaxy-workflow-output

CWL → GALAXY

CWL is already structured, so the upstream extraction work is much lighter than for a paper or a Nextflow pipeline.

summarize-cwl — read CWL Workflow + referenced CommandLineTools, identify inputs/outputs, scatter, conditional logic.
cwl-summary-to-galaxy-interface — choose Galaxy workflow interface from CWL inputs/outputs.
cwl-summary-to-galaxy-data-flow — re-shape into Galaxy-shaped data-flow idioms from a CWL summary that's already nearly a DAG.
compare-against-iwc-exemplar — structural diff of the design briefs against nearest IWC exemplar(s); guidance feeds template authoring.
cwl-summary-to-galaxy-template
[loop] [branch] discover-or-author branch (discover-shed-tool → fallthrough to author-galaxy-tool-wrapper).
[loop] summarize-galaxy-tool
[loop] implement-galaxy-tool-step
[loop] validate-galaxy-step — inline schema validation per step; loop back on red.
cwl-test-to-galaxy-test-plan — translate CWL test fixtures into a Galaxy workflow test plan.
implement-galaxy-workflow-test — assemble test fixtures and assertions from the translated test plan.
validate-galaxy-workflow — terminal pass on the assembled workflow.
run-workflow-test — execute via Planemo.
debug-galaxy-workflow-output

Cross-pipeline observations

Source-specific (one per source): summarize-paper, summarize-nextflow, summarize-cwl. Each emits its own schema by design.
Source × target interface/data-flow: nextflow-summary-to-galaxy-interface, nextflow-summary-to-galaxy-data-flow, cwl-summary-to-galaxy-interface, cwl-summary-to-galaxy-data-flow, nextflow-summary-to-cwl-interface, nextflow-summary-to-cwl-data-flow, plus combined paper design Molds until paper examples justify a split.
Source × target template generation (Galaxy): nextflow-summary-to-galaxy-template, cwl-summary-to-galaxy-template, paper-summary-to-galaxy-template. Each consumes its source-specific design briefs.
Target-specific (one per target):
- Templates: summary-to-cwl-template.
- Per-step (Galaxy): discover-shed-tool, summarize-galaxy-tool, author-galaxy-tool-wrapper, implement-galaxy-tool-step.
- Per-step (CWL): summarize-cwl-tool, implement-cwl-tool-step.
- Validate: validate-galaxy-step, validate-galaxy-workflow, validate-cwl.
- Debug: debug-galaxy-workflow-output, debug-cwl-workflow-output.
Cross-target (Planemo-backed): run-workflow-test.
Source × target (test-plan translation): nextflow-test-to-galaxy-test-plan, cwl-test-to-galaxy-test-plan, nextflow-test-to-cwl-test-plan. These produce reviewable test plans, not final test artifacts.
Test data extraction (source-specific, target-agnostic): paper-to-test-data is a separate Mold because a paper rarely ships a test bundle the way NF/CWL pipelines do.
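
The reuse pattern above can be checked mechanically. As a rough sketch, the snippet below lists the phases of the two CWL-targeting pipelines from this document, then computes their union (a slice of the Mold catalog) and their intersection (the Molds shared across sources). Counting the Molds referenced by the [branch] test-data chain as catalog members is an assumption of this example, and the variable names are illustrative.

```python
# Phase names are copied from the PAPER → CWL and NEXTFLOW → CWL pipelines.
PAPER_TO_CWL = [
    "summarize-paper",
    "paper-summary-to-cwl-design",
    "summary-to-cwl-template",
    "summarize-cwl-tool",
    "implement-cwl-tool-step",
    "validate-cwl",
    "paper-to-test-data",   # referenced by the [branch] chain
    "find-test-data",       # referenced by the [branch] chain
    "implement-cwl-workflow-test",
    "validate-cwl",
    "run-workflow-test",
    "debug-cwl-workflow-output",
]

NEXTFLOW_TO_CWL = [
    "summarize-nextflow",
    "nextflow-summary-to-cwl-interface",
    "nextflow-summary-to-cwl-data-flow",
    "summary-to-cwl-template",
    "summarize-cwl-tool",
    "implement-cwl-tool-step",
    "validate-cwl",
    "nextflow-test-to-cwl-test-plan",
    "validate-cwl",
    "run-workflow-test",
    "debug-cwl-workflow-output",
]

# Union across pipelines: every distinct Mold these two pipelines need.
catalog = sorted(set(PAPER_TO_CWL) | set(NEXTFLOW_TO_CWL))

# Intersection: target-specific and cross-target Molds reused by both sources.
shared = sorted(set(PAPER_TO_CWL) & set(NEXTFLOW_TO_CWL))
```

The shared set contains only target-side Molds (template, per-step, validate, run, debug), which matches the source-specific versus target-specific split described above.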

Pattern pages, not Molds

Per the architecture, the design-* knowledge skills (collection manipulation, tabular manipulation, conditional handling, …) are Foundry pattern pages, not Molds. They are wiki-linked from action Molds (especially implement-galaxy-tool-step and the source-specific Galaxy template Molds) and pulled into generated skills via casting's link resolution.

Custom-Galaxy-tool authoring is split: a pattern page (reference and guidance) plus a companion action Mold (author-galaxy-tool-wrapper) that performs the authoring. The Mold links to the pattern page; the pattern page is consumed by the generated skill via link resolution.

Tracked Follow-Up

Composed paths (PAPER → CWL → GALAXY, NEXTFLOW → CWL → GALAXY) reuse the existing Mold inventory. Issue #200 tracks whether they become distinct pipeline notes or remain runtime compositions.