Harness Pipelines
The source-to-target journeys that compose Molds, loops, and branch phases.
Harness pipelines for the Galaxy Workflow Foundry. Each named pipeline phase corresponds to one atomic, harness-step-sized Mold, and the union of phases across pipelines is the Mold catalog. See MOLDS.md.
Framing
- A harness is hand-authored orchestration glue. Harnesses sequence Molds, manage user-approval gates, and maintain run state. They are not cast from Molds and live outside the Foundry's casting pipeline. Some harnesses are heavyweight (Archon-style); some are simple orchestration skills.
- Each phase below is intended to be a Mold — atomic, cast from the Foundry, LLM-driven content, reusable across harnesses where the phase recurs.
- "Atomic" means atomic relative to harness pipeline phases, not necessarily small. summarize-nextflow and implement-tool-step are both atomic at this tier even though they differ in LOC.
CWL as intermediate (one option, not the path)
CWL is unofficially positioned as a low-level, high-structure interchange format — suitable as an intermediate target between an unstructured/loosely-structured source (a paper, a Nextflow pipeline) and Galaxy. The Foundry must support both direct and composed paths as first-class options:
- PAPER → GALAXY (direct) and PAPER → CWL → GALAXY (composed) are both valid.
- NEXTFLOW → GALAXY (direct) and NEXTFLOW → CWL → GALAXY (composed) are both valid.
- Direct paths are simpler to run and debug. Composed paths buy a structured checkpoint (CWL) at the cost of running two harnesses.
- Whether composition is reliable enough to prefer over direct is a longer-term research question. For now: both paths must be possible from the Mold inventory; the harness picks.
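As a rough illustration of what "both paths must be possible from the Mold inventory" implies, the sketch below treats each pipeline in this document as a directed edge and enumerates every route from a source to a target. The PIPELINES set mirrors the five pipelines listed here; the routes helper and its return shape are hypothetical and not part of the Foundry.

```python
from typing import List, Set, Tuple

# One edge per pipeline described in this document.
PIPELINES: Set[Tuple[str, str]] = {
    ("PAPER", "GALAXY"),
    ("PAPER", "CWL"),
    ("NEXTFLOW", "CWL"),
    ("NEXTFLOW", "GALAXY"),
    ("CWL", "GALAXY"),
}


def routes(source: str, target: str) -> List[List[str]]:
    """Return every acyclic route from source to target, shortest first."""
    found: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        if node == target:
            found.append(trail)
            return
        for src, dst in sorted(PIPELINES):
            if src == node and dst not in trail:
                walk(dst, trail + [dst])

    walk(source, [source])
    return sorted(found, key=len)


# A direct route has two nodes; a composed route passes through CWL.
for route in routes("PAPER", "GALAXY"):
    print(" -> ".join(route))
```

Under this reading, a harness "picking" a path amounts to choosing one of the returned routes; nothing in the sketch ranks them beyond length.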
Mold-inventory parity. Source summarizers emit per-source schemas (paper, NF, CWL each different by design). Interface and data-flow handoffs are source-target Molds that produce reviewable Markdown design briefs rather than rich workflow schemas. This avoids pushing all polymorphism into one target Mold while keeping direct/composed pipelines explicit.
Harness-level concerns (not Molds)
Some recurring pipeline activities are harness-level, not Mold-shaped, and are therefore not in the Mold inventory. They are listed here so the boundary is visible.
- Approval gates / scope confirmation / plan presentation. Whether and when to pause for user confirmation (after planning, before authoring, after a partial cast) is a property of the harness's autonomy posture, not of any individual Mold. Different harnesses (interactive vs. batch vs. fully autonomous) want different gates around the same Molds; baking gates into Molds would either constrain that or duplicate logic. Harnesses own gates.
- Tool-discovery routing. "Try discover-shed-tool (find an existing wrapper via the Tool Shed); if nothing acceptable, fall through to author-galaxy-tool-wrapper" is a routing decision the harness makes; the two underlying capabilities are clean Molds. (discover-shed-tool is named for its mechanism, the Galaxy Tool Shed, which leaves room for siblings like discover-tool-via-galaxy-api or discover-tool-on-github if other discovery paths get wrapped.)
- State and resumption. Persisting harness state across phases, resuming a partial run, and managing run history are harness concerns.
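To make the "harnesses own gates" point concrete, here is a minimal sketch in which the same phase list runs under two autonomy postures. The run_with_gates function, its parameters, and the stand-in callables are all hypothetical; the phase names are the first four phases of PAPER → GALAXY.

```python
from typing import Callable, Iterable, List, Set


def run_with_gates(
    phases: Iterable[str],
    run_mold: Callable[[str], None],
    gate_after: Set[str],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Run phases in order, pausing for confirmation after gated phases."""
    completed: List[str] = []
    for phase in phases:
        run_mold(phase)  # the Mold itself knows nothing about gates
        completed.append(phase)
        if phase in gate_after and not confirm(phase):
            break  # the user declined, so the harness stops here
    return completed


PHASES = [
    "summarize-paper",
    "paper-summary-to-galaxy-design",
    "compare-against-iwc-exemplar",
    "paper-summary-to-galaxy-template",
]

# Interactive posture: gate after the design brief; here the user declines.
interactive = run_with_gates(
    PHASES, lambda p: None, {"paper-summary-to-galaxy-design"}, lambda p: False
)

# Autonomous posture: identical Molds, no gates at all.
autonomous = run_with_gates(PHASES, lambda p: None, set(), lambda p: True)
```

Because the gate set is an argument to the harness rather than a property of any phase, changing posture does not require touching a Mold.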
Runtime tooling
The Foundry distinguishes:
- Design time: gxwf (workflow validation, tool discovery, schema, conversion). Used by Molds that author or validate workflow content.
- Run time: Planemo, which executes Galaxy and CWL workflows. Used by run-workflow-test, debug-galaxy-workflow-output, debug-cwl-workflow-output.
Validation posture: schema, not caveats
gxwf provides static schema validation for gxformat2 workflows and tool steps that catches the failure modes prior-art skills (e.g., the existing nf-to-galaxy skill in SKILLS_NF.md) had to enumerate as prose caveats — UUID validity, tool-ID/owner/+galaxyN suffix mismatches, input_connections parameter-name mismatches, conditional-selector branches in tool_state, etc. The Foundry does not maintain a parallel "caveat catalog" of these failure modes; gxwf's schema is the source of truth and the validation loop is the enforcement mechanism.
This shifts the per-step loop from "author and hope" to author → validate → fix with validation running inline after each step is implemented, not only as a terminal phase. The pipelines below reflect this by invoking validate-galaxy-step (or validate-cwl) inside the per-step loop.
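The author → validate → fix loop might look roughly like the sketch below. It assumes a validator that returns a list of error strings and an implementer that accepts the previous errors as feedback; both callables, the max_attempts cap, and the demo stubs are hypothetical stand-ins and do not reflect gxwf's actual interface.

```python
from typing import Callable, Dict, List, Tuple

Step = Dict[str, str]


def author_validate_fix(
    abstract_step: Step,
    implement: Callable[[Step, List[str]], Step],
    validate: Callable[[Step], List[str]],
    max_attempts: int = 3,  # assumed cap; the source does not specify one
) -> Tuple[Step, int]:
    """Re-implement a step until validation is green or attempts run out."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = implement(abstract_step, errors)
        errors = validate(candidate)
        if not errors:
            return candidate, attempt
    raise RuntimeError(f"still invalid after {max_attempts} attempts: {errors}")


def demo_implement(step: Step, errors: List[str]) -> Step:
    # Stand-in for implement-galaxy-tool-step: fixes the step once told what failed.
    concrete = dict(step)
    if errors:
        concrete["tool_id"] = "example_tool"
    return concrete


def demo_validate(step: Step) -> List[str]:
    # Stand-in for validate-galaxy-step: a real check would be schema-driven.
    return [] if "tool_id" in step else ["missing tool_id"]


step, attempts = author_validate_fix({"label": "align"}, demo_implement, demo_validate)
```

In this demo the first attempt fails validation and the second succeeds, which is the "on red, loop back" behavior the pipelines describe.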
Pipelines
Each pipeline is presented as an ordered list of phases. Phases marked [loop] run once per step in the workflow being constructed. Phases marked [branch] are harness-level routing — binary branches with fallthrough, or N-step fallback chains. They are not Molds; they reference Molds. The discover-or-author branch in Galaxy-targeting per-step loops is [branch] routing between two underlying capabilities.
Other inline phase annotations may be coined as needs surface — e.g., [gate] for an approval / scope-confirmation checkpoint that pauses for user input. None appear inline in the pipelines below today, so we don't pre-enumerate. [branch] and [gate] are unrelated behaviors; they don't share an umbrella tag.
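One way to read the [loop] annotation is sketched below: consecutive [loop] phases form a block that runs once per workflow step, while unannotated phases run once overall. The Phase dataclass and schedule function are hypothetical, and grouping adjacent [loop] phases into a single per-step block is an assumption drawn from the inline-validation description, not a stated rule.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Phase:
    name: str
    loop: bool = False  # corresponds to the [loop] annotation


def schedule(phases: List[Phase], steps: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Expand a phase list into (phase name, workflow step) pairs in run order."""
    plan: List[Tuple[str, Optional[str]]] = []
    for is_loop, group in groupby(phases, key=lambda p: p.loop):
        block = list(group)
        if is_loop:
            # Run the whole block for one step before moving to the next step.
            for step in steps:
                plan.extend((p.name, step) for p in block)
        else:
            plan.extend((p.name, None) for p in block)
    return plan


phases = [
    Phase("cwl-summary-to-galaxy-template"),
    Phase("implement-galaxy-tool-step", loop=True),
    Phase("validate-galaxy-step", loop=True),
    Phase("validate-galaxy-workflow"),
]
plan = schedule(phases, ["fastqc", "bwa"])
```

With two workflow steps, the plan validates fastqc before implementing bwa, so each step is checked immediately after it is authored.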
PAPER → GALAXY
1. summarize-paper: extract methods, named tools/algorithms, sample data, metrics, references to existing pipelines.
2. paper-summary-to-galaxy-design: combined Galaxy interface and abstract data-flow design brief.
3. compare-against-iwc-exemplar: structural diff of the design brief against nearest IWC exemplar(s); guidance feeds template authoring.
4. paper-summary-to-galaxy-template: gxformat2 skeleton with per-step TODOs from paper source evidence, the design brief, and exemplar comparison notes.
5. [loop][branch] discover-or-author branch:
   - try discover-shed-tool.
   - on fallthrough, author-galaxy-tool-wrapper.
6. [loop] summarize-galaxy-tool: pull JSON schema, containers, inputs/outputs for the resolved tool.
7. [loop] implement-galaxy-tool-step: convert abstract step to concrete gxformat2 step.
8. [loop] validate-galaxy-step: schema-validate the just-implemented step; on red, the harness loops back to (7).
9. [branch] test-data resolution chain: try paper-to-test-data → on failure, find-test-data → on failure, harness gates to user-supplied data.
10. implement-galaxy-workflow-test: assemble test fixtures and assertions.
11. validate-galaxy-workflow: terminal schema/lint pass on the assembled workflow.
12. run-workflow-test: execute via Planemo.
13. debug-galaxy-workflow-output: triage failures, propose fixes.
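The test-data resolution chain in this pipeline is an N-step fallback, which can be sketched as follows. The sketch assumes each resolver signals failure by returning None and that the final fallback asks the user; the resolve_test_data function, the lambda stand-ins, and the file paths are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Resolver = Tuple[str, Callable[[], Optional[str]]]


def resolve_test_data(
    chain: List[Resolver], ask_user: Callable[[], str]
) -> Tuple[str, str]:
    """Try each resolver in order; fall back to user-supplied data."""
    for name, resolver in chain:
        data = resolver()
        if data is not None:
            return name, data
    # Every Mold in the chain failed, so the harness gates to the user.
    return "user-supplied", ask_user()


chain: List[Resolver] = [
    ("paper-to-test-data", lambda: None),  # the paper ships no usable data
    ("find-test-data", lambda: "test-data/reads.fastq"),  # hypothetical hit
]
source, path = resolve_test_data(chain, ask_user=lambda: "uploads/custom.fastq")
```

The chain order lives in the harness, so adding or reordering resolvers does not change either Mold.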
PAPER → CWL
1. summarize-paper
2. paper-summary-to-cwl-design
3. summary-to-cwl-template: CWL Workflow skeleton with per-step TODOs from source evidence and prior handoffs.
4. [loop] summarize-cwl-tool: derive a CommandLineTool description for each candidate (container, baseCommand, inputs/outputs).
5. [loop] implement-cwl-tool-step: concrete CommandLineTool and Workflow step.
6. [loop] validate-cwl: schema-validate the just-implemented step; on red, the harness loops back to (5).
7. [branch] test-data resolution chain: try paper-to-test-data → on failure, find-test-data → on failure, harness gates to user-supplied data.
8. implement-cwl-workflow-test
9. validate-cwl: terminal cwltool --validate / schema lint.
10. run-workflow-test: execute via Planemo.
11. debug-cwl-workflow-output: triage failures, propose fixes.
NEXTFLOW → CWL
1. summarize-nextflow: enumerate processes, channels, conditionals, containers, test data; emit a structured summary (NF-specific schema).
2. nextflow-summary-to-cwl-interface
3. nextflow-summary-to-cwl-data-flow
4. summary-to-cwl-template
5. [loop] summarize-cwl-tool
6. [loop] implement-cwl-tool-step
7. [loop] validate-cwl: inline schema validation per step; loop back on red.
8. nextflow-test-to-cwl-test-plan: translate NF test data and expectations into a CWL workflow test plan.
9. validate-cwl: terminal pass on the assembled workflow.
10. run-workflow-test: execute via Planemo.
11. debug-cwl-workflow-output
NEXTFLOW → GALAXY
1. summarize-nextflow
2. nextflow-summary-to-galaxy-reference-data: decide Galaxy-side shape of external reference data (iGenomes key, per-asset, compute-if-missing) before interface and data-flow choices pin workflow inputs.
3. nextflow-summary-to-galaxy-interface
4. nextflow-summary-to-galaxy-data-flow
5. compare-against-iwc-exemplar: structural diff of the design briefs against nearest IWC exemplar(s); guidance feeds template authoring.
6. nextflow-summary-to-galaxy-template
7. [loop][branch] discover-or-author branch (discover-shed-tool → fallthrough to author-galaxy-tool-wrapper).
8. [loop] summarize-galaxy-tool
9. [loop] implement-galaxy-tool-step
10. [loop] validate-galaxy-step: inline schema validation per step; loop back on red.
11. nextflow-test-to-galaxy-test-plan: translate NF test data and expectations into a Galaxy workflow test plan.
12. implement-galaxy-workflow-test: assemble test fixtures and assertions from the translated test plan.
13. validate-galaxy-workflow: terminal pass on the assembled workflow.
14. run-workflow-test: execute via Planemo.
15. debug-galaxy-workflow-output
CWL → GALAXY
CWL is already structured; the upstream extraction work is much lighter.
1. summarize-cwl: read CWL Workflow + referenced CommandLineTools, identify inputs/outputs, scatter, conditional logic.
2. cwl-summary-to-galaxy-interface: choose Galaxy workflow interface from CWL inputs/outputs.
3. cwl-summary-to-galaxy-data-flow: re-shape into Galaxy-shaped data-flow idioms from a CWL summary that's already nearly a DAG.
4. compare-against-iwc-exemplar: structural diff of the design briefs against nearest IWC exemplar(s); guidance feeds template authoring.
5. cwl-summary-to-galaxy-template
6. [loop][branch] discover-or-author branch (discover-shed-tool → fallthrough to author-galaxy-tool-wrapper).
7. [loop] summarize-galaxy-tool
8. [loop] implement-galaxy-tool-step
9. [loop] validate-galaxy-step: inline schema validation per step; loop back on red.
10. cwl-test-to-galaxy-test-plan: translate CWL test fixtures into a Galaxy workflow test plan.
11. implement-galaxy-workflow-test: assemble test fixtures and assertions from the translated test plan.
12. validate-galaxy-workflow: terminal pass on the assembled workflow.
13. run-workflow-test: execute via Planemo.
14. debug-galaxy-workflow-output
Cross-pipeline observations
- Source-specific (one per source): summarize-paper, summarize-nextflow, summarize-cwl. Each emits its own schema by design.
- Source × target interface/data-flow: nextflow-summary-to-galaxy-interface, nextflow-summary-to-galaxy-data-flow, cwl-summary-to-galaxy-interface, cwl-summary-to-galaxy-data-flow, nextflow-summary-to-cwl-interface, nextflow-summary-to-cwl-data-flow, plus combined paper design Molds until paper examples justify a split.
- Source × target template generation (Galaxy): nextflow-summary-to-galaxy-template, cwl-summary-to-galaxy-template, paper-summary-to-galaxy-template. Each consumes its source-specific design briefs.
- Target-specific (one per target):
  - Templates: summary-to-cwl-template.
  - Per-step (Galaxy): discover-shed-tool, summarize-galaxy-tool, author-galaxy-tool-wrapper, implement-galaxy-tool-step.
  - Per-step (CWL): summarize-cwl-tool, implement-cwl-tool-step.
  - Validate: validate-galaxy-step, validate-galaxy-workflow, validate-cwl.
  - Debug: debug-galaxy-workflow-output, debug-cwl-workflow-output.
- Cross-target (Planemo-backed): run-workflow-test.
- Source × target (test-plan translation): nextflow-test-to-galaxy-test-plan, cwl-test-to-galaxy-test-plan, nextflow-test-to-cwl-test-plan. These produce reviewable test plans, not final test artifacts.
- Test data extraction (source-specific, target-agnostic): paper-to-test-data is its own thing because a paper rarely ships a test bundle the way NF/CWL pipelines do.
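The claim that the Mold catalog is the union of phases across pipelines, and the grouping above into shared and source-specific Molds, can be checked mechanically. The sketch below does so for the two CWL-targeting pipelines only, with phase names copied from this document; the dictionary layout and variable names are hypothetical.

```python
from typing import Dict, List, Set

# Phase lists for the two CWL-targeting pipelines. [branch] entries are left
# out to keep the sketch short; the Molds they reference would also belong
# to the full catalog.
PIPELINES: Dict[str, List[str]] = {
    "PAPER -> CWL": [
        "summarize-paper",
        "paper-summary-to-cwl-design",
        "summary-to-cwl-template",
        "summarize-cwl-tool",
        "implement-cwl-tool-step",
        "validate-cwl",
        "implement-cwl-workflow-test",
        "validate-cwl",
        "run-workflow-test",
        "debug-cwl-workflow-output",
    ],
    "NEXTFLOW -> CWL": [
        "summarize-nextflow",
        "nextflow-summary-to-cwl-interface",
        "nextflow-summary-to-cwl-data-flow",
        "summary-to-cwl-template",
        "summarize-cwl-tool",
        "implement-cwl-tool-step",
        "validate-cwl",
        "nextflow-test-to-cwl-test-plan",
        "validate-cwl",
        "run-workflow-test",
        "debug-cwl-workflow-output",
    ],
}

# Catalog: every Mold named by any pipeline, with repeats collapsed.
catalog: Set[str] = set().union(*PIPELINES.values())

# Shared: Molds that every listed pipeline reuses (target-specific or cross-target).
shared: Set[str] = set.intersection(*(set(p) for p in PIPELINES.values()))

for name in sorted(shared):
    print(name)
```

The shared set here lines up with the CWL template, per-step, validate, debug, and run Molds listed above, while everything outside it is source-specific.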
Pattern pages, not Molds
Per the architecture, the design-* knowledge skills (collection manipulation, tabular manipulation, conditional handling, …) are Foundry pattern pages, not Molds. They are wiki-linked from action Molds (especially implement-galaxy-tool-step and the source-specific Galaxy template Molds) and pulled into generated skills via casting's link resolution.
Custom-Galaxy-tool authoring is split: a pattern page (reference and guidance) plus a companion action Mold (author-galaxy-tool-wrapper) that performs the authoring. The Mold links to the pattern page; the pattern page is consumed by the generated skill via link resolution.
Tracked Follow-Up
- Composed paths (PAPER -> CWL -> GALAXY, NEXTFLOW -> CWL -> GALAXY) reuse the existing Mold inventory. Track whether they become distinct pipeline notes or remain runtime compositions in issue #200.