INITIAL_COMPILATION_PIPELINE

Initial Compilation Pipeline

Initial sketch of how Molds become cast skills. Anchored to the file layout in INITIAL_ARCHITECTURE.md (molds/<name>/casts/<target>/<name>/). Working premise: LLM-driven, evolution-friendly, reproducible enough to diff. Casting is not deterministic; it is recorded.

What casting is

Casting takes a Mold (a typed reference manifest plus a procedural body) and its declared references — pattern pages, CLI manual pages, IO schemas, prompt fragments, examples — and produces a target-specific skill artifact. The cast is condensed and isolated — no links back to the Foundry, no runtime dependency on it.

Casting operates as per-kind dispatch over the manifest, not a single resolve-and-inline pass. Different reference kinds get different transformations:

| Reference kind | Source location | Casting transformation | Lands at |
|---|---|---|---|
| pattern | content/patterns/*.md | LLM-condensed, mixed verbatim + summarization | inlined into SKILL.md (or references/patterns/ for large pages) |
| cli-command | content/cli/<tool>/<cmd>.md | Cast to structured JSON sidecar | references/cli/<tool>/<cmd>.json |
| schema | schemas/*.schema.json | Verbatim copy | references/schemas/<name>.schema.json |
| prompt | content/prompts/*.md | Inlined verbatim, no LLM rewrite | inlined into SKILL.md or references/prompts/ |
| example | content/molds/<slug>/examples/, shared content/examples/ | Verbatim copy | references/examples/ |
| eval | content/molds/<slug>/eval.md | Never packaged | none (Foundry-only) |
| mold (smell) | another Mold | Discouraged; see Open questions | |

Verbatim-copy paths are deterministic; LLM-driven condensation is reserved for kinds where it adds value (patterns, partial manpage extracts when only a slice is referenced).
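To make the split concrete, here is a minimal Python sketch that encodes the dispatch table as data and separates deterministic kinds from LLM-driven ones. The `Mode` enum, the `DISPATCH` map, and both functions are hypothetical names for illustration, not existing Foundry code. Because Mold-to-Mold references are still an open question, the sketch simply rejects the `mold` kind.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    VERBATIM = "verbatim"  # deterministic file copy
    LLM = "llm"            # LLM condensation/casting, non-deterministic
    SKIP = "skip"          # never packaged


# Hypothetical encoding of the casting table: kind -> (mode, landing dir).
# A landing dir of None means nothing from this kind lands in the cast.
DISPATCH: dict[str, tuple[Mode, Optional[str]]] = {
    "pattern": (Mode.LLM, "references/patterns"),      # or inlined into SKILL.md
    "cli-command": (Mode.LLM, "references/cli"),
    "schema": (Mode.VERBATIM, "references/schemas"),
    "prompt": (Mode.VERBATIM, "references/prompts"),   # or inlined into SKILL.md
    "example": (Mode.VERBATIM, "references/examples"),
    "eval": (Mode.SKIP, None),                         # Foundry-only
}


def dispatch(kind: str) -> tuple[Mode, Optional[str]]:
    """Look up how a reference kind is cast; unlisted kinds (incl. 'mold') are rejected."""
    try:
        return DISPATCH[kind]
    except KeyError:
        raise ValueError(f"no casting rule for reference kind {kind!r}") from None


def deterministic_kinds() -> set[str]:
    """Kinds whose casting step is a plain copy, so re-casting them is byte-identical."""
    return {k for k, (mode, _) in DISPATCH.items() if mode is Mode.VERBATIM}
```

Keeping the table as data means adding a kind, or moving one from LLM to verbatim, is a one-line change. That fits the stance of locking in a contract rather than an algorithm.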

The casting process is itself expected to evolve. Today: an LLM with a target-specific prompt for the condensation steps; deterministic file copies for the rest. Tomorrow: maybe smarter prompts, different models per kind, partial determinism within a kind. The Foundry does not lock in a casting algorithm; it locks in a contract (input shape, output shape, provenance).

When casting runs

Three triggers, in increasing automation:

  1. Manual. foundry cast <mold-name> --target=<target>. The default for v1. A maintainer runs this when a Mold has changed and they want to see the new cast.
  2. CI on Mold change. When a PR touches molds/<name>/, CI re-casts that Mold against all configured targets and surfaces the diff in review.
  3. Watch-on-change (dev convenience). foundry cast --watch re-casts on file change for tight iteration.
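For the CI trigger, the job has to map a PR's changed files to the Molds that PR touched. The sketch below is minimal and rests on two assumptions. First, the changed paths come from something like `git diff --name-only origin/main...HEAD`. Second, casts live under `molds/<name>/casts/`, as in the INITIAL_ARCHITECTURE.md layout. Edits confined to `casts/` are ignored so that committing a fresh cast does not re-trigger casting. `molds_touched` and `recast_plan` are hypothetical helpers.

```python
from pathlib import PurePosixPath


def molds_touched(changed_paths: list[str]) -> set[str]:
    """Mold names whose molds/<name>/ source (not their casts/) contains a changed file."""
    names = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Count molds/<name>/<file...>, but skip molds/<name>/casts/... (cast output).
        if len(parts) >= 3 and parts[0] == "molds" and parts[2] != "casts":
            names.add(parts[1])
    return names


def recast_plan(changed_paths: list[str], targets: list[str]) -> list[tuple[str, str]]:
    """Every (mold, target) pair CI should re-cast, in a stable order for logs and review."""
    return [(m, t) for m in sorted(molds_touched(changed_paths)) for t in sorted(targets)]
```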

foundry status reports drift: which casts in casts/<target>/<name>/ were produced from a Mold content hash that differs from the Mold currently on disk.

Input contract

To cast a Mold, the casting process consumes:

Resolution policy is per-kind, not a single rule:

Output contract

Per cast: casts/<target>/<mold-name>/. Layout depends on target.

For the Claude target:

casts/claude/<mold-name>/
├── SKILL.md                  # the skill body Claude loads
├── references/               # supporting content, organized by kind
│   ├── schemas/              # verbatim *.schema.json
│   ├── cli/                  # JSON sidecars cast from manpages
│   │   └── <tool>/<cmd>.json
│   ├── patterns/             # condensed pattern excerpts (when not fully inlined)
│   ├── prompts/              # verbatim prompt fragments (when not fully inlined)
│   └── examples/             # verbatim fixtures
└── _provenance.json          # required, not part of the skill

Per-kind subdirectories under references/ mirror the casting dispatch and let the cast skill’s runtime locate any artifact deterministically.
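The sketch below shows that deterministic lookup for the Claude layout. It assumes references are named as in `_provenance.json`: `<tool>/<cmd>` for CLI commands and full filenames for schemas and examples. The `.md` suffix for condensed patterns and prompts is also an assumption, since the layout above does not fix file extensions for those kinds. `reference_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Per-kind subdirectories under references/ in the Claude layout.
REFERENCE_DIRS = {
    "schema": "schemas",
    "cli-command": "cli",
    "pattern": "patterns",
    "prompt": "prompts",
    "example": "examples",
}


def reference_path(mold_name: str, kind: str, name: str) -> PurePosixPath:
    """Where a reference of this kind lands inside casts/claude/<mold-name>/."""
    if kind not in REFERENCE_DIRS:
        raise ValueError(f"{kind!r} artifacts are not written under references/")
    base = PurePosixPath("casts", "claude", mold_name, "references", REFERENCE_DIRS[kind])
    if kind == "cli-command":
        # "<tool>/<cmd>" becomes references/cli/<tool>/<cmd>.json
        tool, cmd = name.split("/", 1)
        return base / tool / f"{cmd}.json"
    if kind in ("pattern", "prompt"):
        # Assumption: condensed patterns and prompt fragments are stored as markdown.
        return base / f"{name}.md"
    # Schemas and examples keep their source filename verbatim.
    return base / name
```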

For the web target (sketch):

casts/web/<mold-name>/
├── skill.json                # structured skill description
├── prompt.md
└── _provenance.json

For the generic target: shape TBD; probably a single self-contained markdown file.
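Whatever each layout ends up being, the process steps below touch a target only through a small adapter surface: `pattern_prompt`, `cli_prompt`, and `assemble_skill`. Here is a minimal sketch of that surface as a Python `Protocol`, with a placeholder Claude adapter. The prompt strings, the section-joining logic, and the registry are illustrative assumptions. This `assemble_skill` takes the Mold body plus the condensed patterns and inlined prompts, and omits the manifest argument for brevity.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TargetAdapter(Protocol):
    """Hypothetical adapter surface used by the casting steps."""
    name: str
    pattern_prompt: str
    cli_prompt: str

    def assemble_skill(self, body: str, condensed_patterns: list[str],
                       inlined_prompts: list[str]) -> str: ...


class ClaudeTarget:
    """Placeholder Claude adapter: Mold body first, then condensed patterns, then prompts."""
    name = "claude"
    pattern_prompt = "Condense this pattern page for a Claude skill."  # placeholder text
    cli_prompt = "Cast this manpage to a JSON sidecar."                 # placeholder text

    def assemble_skill(self, body: str, condensed_patterns: list[str],
                       inlined_prompts: list[str]) -> str:
        sections = [body.rstrip()] + [s.strip() for s in condensed_patterns + inlined_prompts]
        return "\n\n".join(sections) + "\n"


# Only the Claude target is registered in this sketch.
ADAPTERS: dict[str, TargetAdapter] = {"claude": ClaudeTarget()}


def load_target_adapter(target: str) -> TargetAdapter:
    try:
        return ADAPTERS[target]
    except KeyError:
        raise ValueError(f"unknown target {target!r}") from None
```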

_provenance.json is required for every cast and contains:

{
  "mold_name": "implement-galaxy-tool-step",
  "mold_content_hash": "<sha256 of mold.md>",
  "mold_commit": "<git SHA at cast time>",
  "casting_model": "claude-opus-4-7",
  "casting_prompt_version": "v3",
  "casting_target": "claude",
  "cast_at": "2026-04-29T20:15:00Z",
  "resolved_refs": [
    { "kind": "pattern",     "name": "galaxy-collection-manipulation", "hash": "<sha256>" },
    { "kind": "cli-command", "name": "gxwf/tool-search",                "hash": "<sha256>" },
    { "kind": "cli-command", "name": "gxwf/tool-versions",              "hash": "<sha256>" },
    { "kind": "schema",      "name": "summary-paper.schema.json",       "hash": "<sha256>" },
    { "kind": "example",     "name": "scatter-with-collection.gxformat2.yml", "hash": "<sha256>" }
  ]
}
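The sketch below builds that record in Python. Field names and order mirror the example above. The dataclasses, `sha256_hex`, and the timestamp format (UTC, second precision, trailing `Z`) are assumptions chosen to match the sample, not a fixed spec.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def utc_now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class ResolvedRef:
    kind: str
    name: str
    hash: str


@dataclass
class Provenance:
    mold_name: str
    mold_content_hash: str
    mold_commit: str
    casting_model: str
    casting_prompt_version: str
    casting_target: str
    cast_at: str = field(default_factory=utc_now)
    resolved_refs: list[ResolvedRef] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into the nested ResolvedRef entries; key order follows field order.
        return json.dumps(asdict(self), indent=2) + "\n"
```

In practice `mold_commit` could be filled from `git rev-parse HEAD` at cast time, and each `ResolvedRef.hash` from `sha256_hex` over the source file's bytes.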

Provenance is the foundation for drift detection, reproducibility audits, and “why does this cast contain X” forensics.

Process steps (per cast)

cast_mold(mold_name, target):
  mold     <- read molds/<mold_name>/index.md
  validate mold against frontmatter schema (incl. typed-reference manifest)
  refs     <- resolve_manifest(mold)               # by kind: patterns, cli_commands, schemas, prompts, examples
  validate every ref exists and conforms to its kind's contract
  adapter  <- load_target_adapter(target)
  staging  <- fresh temp dir beside casts/<target>/<mold_name>/
  condensed_patterns <- []
  inlined_prompts    <- []

  # Per-kind dispatch; every write goes to staging, never to the live cast:
  for ref in refs:
    case ref.kind:
      pattern      -> condensed = llm.condense(ref, adapter.pattern_prompt)
                      append condensed to condensed_patterns, or write it to staging/references/patterns/
      cli-command  -> sidecar = llm.cast_manpage_to_json(ref, adapter.cli_prompt)
                      validate sidecar parses, then write to staging/references/cli/<tool>/<cmd>.json
      schema       -> copy verbatim to staging/references/schemas/
      prompt       -> append to inlined_prompts, or copy verbatim to staging/references/prompts/
      example      -> copy verbatim to staging/references/examples/
      eval         -> skip                         # Foundry-only, never packaged

  skill_md  <- adapter.assemble_skill(mold.body, condensed_patterns, inlined_prompts, refs)
  write skill_md to staging/SKILL.md
  write staging/_provenance.json (mold hash, model(s), prompt version(s), per-ref hashes, timestamp)
  replace casts/<target>/<mold_name>/ with staging  # reached only if every handler succeeded

The LLM is invoked per kind that needs condensation, not once globally. Streaming, retries, and per-kind output validation (does the JSON sidecar parse? does the condensed pattern preserve required sections?) live in the per-kind handler. If any handler fails, the cast aborts and the previous cast on disk is unchanged.
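One way to honor the abort guarantee is to run every handler against a sibling staging directory and swap it in only after all of them succeed. The minimal Python sketch below assumes handlers are plain callables that write into the staging dir. `cli_sidecar_handler` shows the "does the JSON sidecar parse?" check, with the LLM call already made and its output passed in as a string. Both function names are hypothetical. The two-rename swap is not strictly atomic: a crash between the renames leaves the old cast under `<name>.old`.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# A handler writes one reference's artifact into the staging dir, raising on failure.
Handler = Callable[[Path], None]


def cli_sidecar_handler(rel_path: str, llm_output: str) -> Handler:
    """Validate an LLM-produced sidecar before it can land: it must be a JSON object."""
    def run(staging: Path) -> None:
        parsed = json.loads(llm_output)  # raises json.JSONDecodeError (a ValueError)
        if not isinstance(parsed, dict):
            raise ValueError(f"{rel_path}: sidecar must be a JSON object")
        out = staging / rel_path
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(parsed, indent=2) + "\n")
    return run


def cast_with_staging(cast_dir: Path, handlers: list[Handler]) -> None:
    """Run all handlers in a sibling staging dir; replace cast_dir only if all succeed."""
    cast_dir.parent.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=cast_dir.parent))
    try:
        for handler in handlers:
            handler(staging)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)  # abort: previous cast is untouched
        raise
    backup = cast_dir.with_name(cast_dir.name + ".old")
    shutil.rmtree(backup, ignore_errors=True)  # leftover from an interrupted swap
    if cast_dir.exists():
        cast_dir.rename(backup)
    staging.rename(cast_dir)
    shutil.rmtree(backup, ignore_errors=True)
```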

Drift detection

A cast is stale when any of:

foundry status enumerates stale casts; foundry cast --all re-casts every stale entry. Re-casting against an unchanged Mold and unchanged refs with the same model can still produce a different artifact, because the LLM is non-deterministic; that’s expected and reviewed via diff.
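Here is a sketch of the staleness check `foundry status` could run, comparing the hashes recorded in `_provenance.json` with the files currently on disk. For illustration, it treats a cast as stale when the Mold hash or any resolved reference hash no longer matches, when a recorded reference no longer resolves, or when a new reference appears. Whether a newer casting prompt version or model should also count is left open here. `stale_reasons` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_reasons(provenance_path: Path, mold_path: Path,
                  ref_paths: dict[tuple[str, str], Path]) -> list[str]:
    """Compare a cast's recorded hashes with the current files.

    ref_paths maps (kind, name) -> current source file for each reference the Mold
    declares today. An empty result means the cast is fresh under these rules.
    """
    prov = json.loads(provenance_path.read_text())
    reasons = []
    if prov["mold_content_hash"] != file_hash(mold_path):
        reasons.append("mold content changed")
    recorded = {(r["kind"], r["name"]): r["hash"] for r in prov["resolved_refs"]}
    for (kind, name), old_hash in recorded.items():
        src = ref_paths.get((kind, name))
        if src is None or not src.exists():
            reasons.append(f"{kind} {name!r} no longer resolves")
        elif file_hash(src) != old_hash:
            reasons.append(f"{kind} {name!r} changed")
    for kind, name in sorted(ref_paths.keys() - recorded.keys()):
        reasons.append(f"{kind} {name!r} newly referenced")
    return reasons
```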

Versioning

No semver on Molds, no semver on casts. Identity is content hash + commit SHA. Re-casting is the migration path. If a cast skill needs to be “frozen” (e.g., a published skill on a marketplace), pin it by commit SHA in the consumer.

This keeps the Foundry’s iteration loop fast: change a Mold, re-cast, review the diff. Don’t bump versions, don’t manage compatibility tables, don’t write changelogs for every cast.

Reproducibility

Casting is non-deterministic (LLM). What we guarantee instead is traceability: every cast records exactly what went into it (Mold hash, ref hashes, model, prompt version). A reviewer can:

We do not guarantee that re-casting produces byte-identical output. We do guarantee that re-casting produces output derivable from the same inputs.

What casting does not do

v1 minimum

To exercise the architecture without overbuilding:

If those casts look reasonable and the provenance flow holds, scale to more Molds and more targets.

Open questions