Initial Molds

Initial Mold inventory for the Galaxy Workflow Foundry, derived as the union of phases across the harness pipelines sketched in INITIAL_HARNESS_PIPELINES.md, plus the CLI Molds that capture reusable tool-reference content the action Molds depend on. Each Mold is atomic at the harness-step tier (not necessarily small in content).

This is a v1 candidate list, not a spec. Names, splits, and groupings will shift as we ground each Mold against IWC corpus exemplars and write the first one or two end-to-end. The point of this list is to surface what metadata a Mold actually needs, by enumerating concrete cases.

Bucketing axes

Each Mold falls along these axes:

This isn’t a frontmatter schema; it’s a mental model for v1 grouping. Whether these axes graduate into Mold metadata is a spec-time decision. tool-specific is provisional and may collapse into generic if the distinction stays uninteresting.

Catalog

Source summarization (source-specific, target-agnostic)

Each source emits its own schema by design — paper, Nextflow, and CWL are different enough that forcing a shared summary shape would either lose detail or bloat all three. Downstream Molds (data flow, templates) consume any source’s summary; the cast skills are responsible for handling the polymorphism.
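One way a cast skill might handle that polymorphism is to treat the summaries as a tagged union and dispatch on type. The sketch below is only illustrative: the class names and every field (`methods`, `processes`, `steps`) are assumptions made up for the example, not the actual summary schemas.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical per-source summary shapes; the real schemas are not defined here.
@dataclass
class PaperSummary:
    methods: List[str] = field(default_factory=list)


@dataclass
class NextflowSummary:
    processes: List[str] = field(default_factory=list)


@dataclass
class CwlSummary:
    steps: List[str] = field(default_factory=list)


SourceSummary = Union[PaperSummary, NextflowSummary, CwlSummary]


def candidate_steps(summary: SourceSummary) -> List[str]:
    """Return the step-like items from any source summary.

    Each branch reads the source-specific field, so no shared summary
    shape is forced on the three sources.
    """
    if isinstance(summary, PaperSummary):
        return list(summary.methods)
    if isinstance(summary, NextflowSummary):
        return list(summary.processes)
    if isinstance(summary, CwlSummary):
        return list(summary.steps)
    raise TypeError(f"unsupported summary type: {type(summary).__name__}")
```

The downstream consumer stays small because the per-source detail lives in the summary classes, and an unknown summary type fails loudly instead of being silently coerced.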

Data flow (target-specific)

Split by target because Galaxy and CWL have different idioms (Galaxy collections / paired collections vs. CWL scatter / valueFrom). Each consumes any source-summarizer output:

Template generation (target-specific)

Per-step tool work (target-specific, runs in [loop])

For Galaxy targets, the harness performs discover-first, author-on-fallthrough: try discover-shed-tool first, and only invoke author-galaxy-tool-wrapper when no acceptable existing wrapper is found. The branch is harness logic; the Molds cleanly split the two cases.
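The branch described above could be sketched roughly as follows. This is a minimal illustration of harness-side logic, not the harness itself: `ToolWrapper`, `resolve_tool_step`, and the two stand-in callables are hypothetical names, and the stand-ins only simulate what the discover-shed-tool and author-galaxy-tool-wrapper casts would do.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolWrapper:
    tool_id: str
    origin: str  # "discovered" or "authored"


def resolve_tool_step(
    step_name: str,
    discover: Callable[[str], Optional[ToolWrapper]],
    author: Callable[[str], ToolWrapper],
) -> ToolWrapper:
    """Discover first; author a wrapper only when discovery finds nothing acceptable."""
    found = discover(step_name)
    if found is not None:
        return found
    return author(step_name)


# Hypothetical stand-ins for the two cast skills, for illustration only.
def fake_discover(step_name: str) -> Optional[ToolWrapper]:
    known = {"fastqc": "toolshed/fastqc"}
    tool_id = known.get(step_name)
    return ToolWrapper(tool_id, "discovered") if tool_id else None


def fake_author(step_name: str) -> ToolWrapper:
    return ToolWrapper(f"local/{step_name}", "authored")
```

Passing the two skills in as callables keeps the branch in the harness while each Mold stays responsible for exactly one case.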

Tests (mixed)

Two-step shape (translation/derivation, then assembly):

Derivation (gets the raw fixtures):

Assembly (turns fixtures into the final test artifact):

The derivation Molds and assembly Molds are complementary, not redundant: derivation produces fixtures; assembly produces the test artifact. Both fire in NF→Galaxy, CWL→Galaxy, etc.
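A minimal sketch of the two-step shape, assuming a simplified source test that maps names to file paths. The `Fixture` type, the function names, and the dictionary keys are all hypothetical; the assembled dictionary is a neutral placeholder and not any real test-file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fixture:
    name: str
    path: str
    role: str  # "input" or "expected_output"


def derive_fixtures(source_test: Dict[str, Dict[str, str]]) -> List[Fixture]:
    """Derivation: pull raw fixtures out of a source-side test description."""
    fixtures: List[Fixture] = []
    for key, role in (("inputs", "input"), ("outputs", "expected_output")):
        for name, path in sorted(source_test.get(key, {}).items()):
            fixtures.append(Fixture(name=name, path=path, role=role))
    return fixtures


def assemble_test(test_name: str, fixtures: List[Fixture]) -> Dict[str, object]:
    """Assembly: turn fixtures into a single target-side test artifact."""
    inputs = {f.name: f.path for f in fixtures if f.role == "input"}
    expected = {f.name: f.path for f in fixtures if f.role == "expected_output"}
    return {"name": test_name, "inputs": inputs, "expected": expected}
```

Neither function can replace the other: `derive_fixtures` knows only the source side, and `assemble_test` knows only the target side, with the fixture list as the handoff between them.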

Open question: whether the <source>-test-to-<target>-tests family factors cleanly through a generic intermediate, or stays per-pair.

Validation (target-specific)

Validate Molds describe the step in the process even where they wrap a static / structured CLI. The underlying validation is deterministic, but the cast skill is the Mold-shaped procedural description (when to run, how to interpret results, what to recommend on failure, when to loop back to authoring). Each Validate Mold wraps gxwf or cwltool but is not a hand-authored CLI skill; it is a Mold that references the relevant CLI manual pages.
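The procedural layer on top of a deterministic validator might look something like the sketch below. It does not call gxwf or cwltool; `ValidationResult`, `NextAction`, `interpret`, and the retry limit are all assumptions introduced for illustration, standing in for whatever the cast skill actually encodes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NextAction(Enum):
    PROCEED = "proceed"
    LOOP_BACK_TO_AUTHORING = "loop_back_to_authoring"
    ESCALATE = "escalate"


@dataclass
class ValidationResult:
    exit_code: int
    errors: List[str]


def interpret(result: ValidationResult, attempt: int, max_attempts: int = 3) -> NextAction:
    """Map a deterministic validation outcome onto the next harness action.

    attempt is 1-based; max_attempts is an arbitrary illustrative limit.
    """
    if result.exit_code == 0 and not result.errors:
        return NextAction.PROCEED
    if attempt < max_attempts:
        return NextAction.LOOP_BACK_TO_AUTHORING
    return NextAction.ESCALATE
```

The validator's output is the same every time for the same input; what the Mold adds is the decision about what to do with it.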

Run & debug (Planemo-backed runtime)

Planemo is the runtime tool (it can run both Galaxy and CWL workflows); gxwf is the design-time tool. Run/debug Molds reference Planemo’s CLI manual pages.

Note: the run/debug tier is sized for “smart enough as a Claude skill, but Claude could often do it ad-hoc without one.” Treat these Molds as nominally Mold-shaped for inventory completeness, but accept that they may end up thinner than the authoring Molds.

CLI Molds (tool-specific)

Whole-CLI casts. Each rolls up all cli/<tool>/* manual pages under it, plus a thin procedural overview, into a structured runtime artifact (typically JSON manifest + sidecar). The cast is what an agent loads when it needs general competence with the CLI rather than a specific verb. Per-action Molds (above) reference individual manual pages directly; they do not depend on the whole-CLI cast.
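As a rough sketch of the roll-up, assuming manual pages are available as a mapping from verb to page text: the function name, the manifest keys, and the placeholder tool and verbs are hypothetical and do not describe any real CLI or the actual cast format.

```python
import json
from typing import Dict


def cast_cli(tool: str, overview: str, pages: Dict[str, str]) -> str:
    """Roll up per-verb manual pages plus an overview into one JSON manifest."""
    manifest = {
        "tool": tool,
        "overview": overview,
        "pages": [
            {"id": f"cli/{tool}/{verb}", "verb": verb, "body": body}
            for verb, body in sorted(pages.items())
        ],
    }
    return json.dumps(manifest, indent=2, sort_keys=True)


# Placeholder tool and verbs, chosen only to exercise the function.
example_manifest = cast_cli(
    "exampletool",
    "Thin procedural overview goes here.",
    {"check": "Manual page text for check.", "build": "Manual page text for build."},
)
```

Sorting the pages keeps the manifest stable across casts, which would make diffs between cast bundles easier to review.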

Open question: whether one Mold per tool is the right granularity, or whether very large CLIs (planemo) should split into sub-Molds. v1: one Mold per tool; revisit if the cast bundle is unwieldy.

Corpus-grounding (Galaxy-specific, generic in source)

Not Molds

Excluded from the inventory by design. Naming them keeps the boundary visible.

Wrapping a CLI is not a Mold disqualifier. discover-shed-tool, validate-with-gxwf, run-workflow-test, and the CLI Molds all wrap CLIs and are all Molds. The criterion is whether there is procedural content worth casting (when to run, how to interpret, when to loop back), not whether the underlying mechanism is a CLI.

Counts and reuse

What this list is for

This list exists to drive the Mold metadata schema. Once we walk through the first few of these and see what each one actually needs to encode (typed references by kind: patterns, CLI manual pages, IO schemas, prompts, examples; dependencies on other Molds; evaluation hooks; casting hints), the schema falls out empirically. Suggested first walks, in priority order:

  1. summarize-paper — most novel, most uncertain, exercises source-summarization shape and IO-schema reference.
  2. implement-galaxy-tool-step — runs in inner loop, pulls heavily from pattern pages and corpus, exercises wiki-link resolution and condensation.
  3. validate-with-gxwf — exercises CLI-manual-page reference, error-feedback loop; surfaces what a per-action Mold needs from a manpage cast.
  4. gxwf-cli — exercises whole-CLI roll-up, manpage→JSON casting, the “structured runtime artifact” cast target.

After those four walks, the Mold schema should be clear enough to move to spec time.
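To make the candidate metadata concrete before any walk has happened, the items named above (typed references by kind, dependencies, evaluation hooks, casting hints) could be held in a placeholder structure like this one. It is a guess meant to be thrown away, not a proposed schema: every class, field, and value is an assumption, and the reference target is deliberately left as an unresolved placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RefKind(Enum):
    PATTERN = "pattern"
    CLI_MANUAL_PAGE = "cli_manual_page"
    IO_SCHEMA = "io_schema"
    PROMPT = "prompt"
    EXAMPLE = "example"


@dataclass
class Reference:
    kind: RefKind
    target: str  # path or identifier; the format is undecided


@dataclass
class MoldMetadata:
    name: str
    references: List[Reference] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)
    evaluation_hooks: List[str] = field(default_factory=list)
    casting_hints: Dict[str, str] = field(default_factory=dict)

    def refs_of(self, kind: RefKind) -> List[Reference]:
        """Return only the references of one kind."""
        return [r for r in self.references if r.kind is kind]


# Illustrative instance; the target is a placeholder, not a real page path.
validate_mold = MoldMetadata(
    name="validate-with-gxwf",
    references=[Reference(RefKind.CLI_MANUAL_PAGE, "cli/gxwf/<page>")],
)
```

Each walk would then either confirm a field, reshape it, or delete it, which is the empirical process the list is meant to drive.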