Corpus Integration
How IWC grounding works without turning the Foundry into an upstream workflow mirror.
The Foundry has no IWC ingestion pipeline, no exemplar mirror, and no workflow-fixtures runtime dependency. It integrates the IWC corpus through citations, survey notes, optional inline excerpts, and Molds that can fetch live corpus evidence at runtime.
workflow-fixtures/ now lives as a top-level directory inside the Foundry checkout (Foundry support infrastructure, not a separate product). Generated corpora — pipelines/, cwl/, iwc-src/, iwc-cleaned/, iwc-format2/, iwc-skeletons/ — are gitignored. The validator and site-content traversal stay scoped to content/; workflow-fixtures/ is invisible to them. The directory is cited by $IWC_FORMAT2/... and $IWC_SKELETONS/... from authoring/survey notes; CWL research can cite workflow-fixtures/cwl/<repo>/... paths in ad-hoc notes, but polished content should prefer upstream URLs or abstract workflow IDs. Nothing in casts/ or content/ reads from it at build time.
Skeleton tier
Survey-time evidence has three tiers:
- Grep over $IWC_FORMAT2/**/*.gxwf.yml: cheap, but blind to step-pair / step-sequence patterns.
- Skeleton scan over $IWC_SKELETONS/**/*.gxwf.yml: cheap structural read; sees topology, control flow, and tool sequences. All 120 skeletons fit in agent context (median ~6KB, total ~1MB).
- Whole-workflow reading of selective $IWC_FORMAT2 files: expensive; reserved for parameter-level evidence on the recipes that look promising from tier 1 or 2.
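What the skeleton tier sees and grep misses can be sketched as a scan for tool-to-tool edges. This is a minimal illustration, not Foundry tooling: the SkeletonStep / Skeleton shapes and the toolPairs name are hypothetical, and the sketch assumes steps are keyed by label and that each input source has the form "step_label/output_name".

```typescript
// Assumed (hypothetical) skeleton shape: steps keyed by label, each with an
// optional tool_id and an `in` map whose values carry a source reference.
interface SkeletonStep {
  tool_id?: string;
  in?: Record<string, string | { source?: string }>;
}

interface Skeleton {
  steps: Record<string, SkeletonStep>;
}

// Collect "upstream tool -> downstream tool" pairs by following `source:` edges.
function toolPairs(skeleton: Skeleton): string[] {
  const pairs = new Set<string>();
  for (const step of Object.values(skeleton.steps)) {
    if (!step.tool_id) continue; // subworkflow or input-only step
    for (const input of Object.values(step.in ?? {})) {
      const source = typeof input === "string" ? input : input.source;
      if (!source) continue;
      // Assumption: the step label is everything before the first "/".
      const upstreamLabel = source.split("/")[0] ?? source;
      const upstream = skeleton.steps[upstreamLabel];
      if (upstream?.tool_id) pairs.add(`${upstream.tool_id} -> ${step.tool_id}`);
    }
  }
  return Array.from(pairs).sort();
}

// Usage: a two-step skeleton where a mapper consumes a trimmer's output.
const demoSkeleton: Skeleton = {
  steps: {
    trim: { tool_id: "fastp", in: { reads: { source: "raw_reads" } } },
    map: { tool_id: "bowtie2", in: { input: "trim/out1" } },
  },
};
console.log(toolPairs(demoSkeleton)); // prints [ 'fastp -> bowtie2' ]
```

A line-oriented grep can find each tool_id on its own but cannot tell that one step feeds the other; that relationship only appears once the in: / source: edges are followed.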
A skeleton is the format2 workflow with non-structural fields stripped, leaving:
- tool_id, label, doc per step
- in: (with source:) / out: (ids only) / step_inputs topology
- when: expressions and other control flow
- run: subworkflow descents (recursive)
- workflow-level inputs: / outputs: / tags / release / license
Dropped: tool_state parameter blobs, step position: UI metadata, step-level comments: / uuid / tool_shed_repository / tool_version (redundant with tool_id), output post-processing fields (add_tags, change_datatype, hide, rename, …), and top-level comments: (Galaxy sticky-notes, not topology).
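The keep/drop rules above amount to an allow-list applied recursively. The sketch below is not the contents of build-skeletons.ts, which is not shown here; the function names (skeletonizeWorkflow, skeletonizeStep, outIds) are hypothetical, and it assumes the workflow has already been parsed from YAML into plain objects. It keeps in: verbatim rather than reducing it to source: only.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

const isObject = (v: Json | undefined): v is JsonObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Reduce `out:` to output ids, dropping add_tags, rename, hide, and similar.
function outIds(out: Json | undefined): string[] {
  if (Array.isArray(out)) {
    const ids: string[] = [];
    for (const o of out) {
      if (typeof o === "string") ids.push(o);
      else if (isObject(o)) {
        const id = o.id;
        if (typeof id === "string") ids.push(id);
      }
    }
    return ids;
  }
  return isObject(out) ? Object.keys(out) : [];
}

// Keep only structural step fields; tool_state, position, uuid, etc. fall away.
function skeletonizeStep(step: JsonObject): JsonObject {
  const kept: JsonObject = {};
  for (const key of ["tool_id", "label", "doc", "in", "when"]) {
    const value = step[key];
    if (value !== undefined) kept[key] = value;
  }
  if (step.out !== undefined) kept.out = outIds(step.out);
  const run = step.run;
  if (isObject(run)) kept.run = skeletonizeWorkflow(run); // recursive descent
  return kept;
}

// Keep workflow-level metadata plus skeletonized steps (map or list form).
function skeletonizeWorkflow(wf: JsonObject): JsonObject {
  const kept: JsonObject = {};
  for (const key of ["inputs", "outputs", "tags", "release", "license"]) {
    const value = wf[key];
    if (value !== undefined) kept[key] = value;
  }
  const steps = wf.steps;
  if (Array.isArray(steps)) {
    kept.steps = steps.map((s) => (isObject(s) ? skeletonizeStep(s) : s));
  } else if (isObject(steps)) {
    const byLabel: JsonObject = {};
    for (const [label, s] of Object.entries(steps)) {
      byLabel[label] = isObject(s) ? skeletonizeStep(s) : s;
    }
    kept.steps = byLabel;
  }
  return kept;
}
```

An allow-list fails closed: a field the sketch does not know about is dropped rather than leaked into the skeleton, which matches the goal of keeping skeletons small enough to read in bulk.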
Regen: cd workflow-fixtures && make skeletons (or tsx workflow-fixtures/scripts/build-skeletons.ts). Idempotent — rebuilds iwc-skeletons/ from the current iwc-format2/. Re-run after make iwc bumps the IWC pin.
The pattern is skeletons + selective full reads, not skeletons replacing full reads. /iwc-survey defaults to "skeleton scan first, then drill into $IWC_FORMAT2" for workflow-shape topics; tool-level topics still lean grep + structured-block extraction.
What the Foundry does instead
- Patterns cite IWC by URL, in the page body. A pattern's ## Exemplars section lists IWC workflows that demonstrate it, each as a free-form Markdown link ([bacterial-genomics/...](https://github.com/galaxyproject/iwc/blob/<sha>/workflows/...)) with one-liner author commentary. Pin to a specific commit SHA when stability matters; pin to main when freshness matters. Author choice per citation; no enforced policy.
- Inline excerpts when they earn it. A pattern author may paste 10–30 lines of cleaned gxformat2 directly into a pattern body to illustrate an idiom. The cleaning is done at authoring time by the human running gxwf locally (probably against workflow-fixtures, or against a fresh clone, or against a raw IWC URL; the Foundry doesn't care). The excerpt is committed verbatim into the pattern page; no build-time regeneration; rot is rot.
- No IWC category aggregation layer. Corpus grounding lives in survey notes, pattern exemplars, and body citations.
- compare-against-iwc-exemplar (the Mold) operates against live IWC. The generated skill loads with instructions to fetch IWC at runtime via WebFetch / gxwf, not against Foundry-hosted exemplar pages. The Mold's source artifact describes the procedure, not a corpus index.
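The SHA-versus-main choice for citations can be made visible with a small classifier. This is an ad-hoc authoring aid sketched for illustration only; the Foundry runs no such check, and the classifyCitation name and Pin type are hypothetical. It assumes citations use the github.com/galaxyproject/iwc/blob/<ref>/... URL form shown above and treats any 7 to 40 character hex ref as a commit SHA.

```typescript
type Pin =
  | { kind: "sha"; ref: string }
  | { kind: "branch"; ref: string }
  | { kind: "not-iwc" };

// Matches IWC blob URLs and captures the ref segment after /blob/.
const IWC_BLOB = /^https:\/\/github\.com\/galaxyproject\/iwc\/blob\/([^/]+)\/.+$/;

function classifyCitation(url: string): Pin {
  const match = IWC_BLOB.exec(url);
  if (!match) return { kind: "not-iwc" };
  const ref = match[1] ?? "";
  // Heuristic: a hex ref of 7-40 characters is taken to be a commit SHA.
  // A branch whose name happens to be hex would be misclassified.
  return /^[0-9a-f]{7,40}$/i.test(ref)
    ? { kind: "sha", ref }
    : { kind: "branch", ref };
}

// Usage: the path below is a placeholder, not a real IWC workflow path.
console.log(
  classifyCitation("https://github.com/galaxyproject/iwc/blob/main/workflows/example.ga").kind,
); // prints branch
```

Branch names that contain a slash cannot be separated from the file path by URL inspection alone, so this stays a heuristic rather than a validator.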
What this gives up
- Per-workflow inverse view ("which Foundry patterns does this specific IWC workflow demonstrate"). No structural support. To recover, an author can hand-write a concept note for a particularly canonical workflow and wiki-link to it. By exception, not by structural rule.
- Per-workflow or per-category browsing in the static site. No structural support for now.
- Build-time inlining of full workflow content into casts. Casts that need workflow detail get IWC URLs the agent fetches at runtime, or the small hand-curated excerpts on pattern pages.
- Auto-detection of upstream IWC structural drift. A cited workflow can change shape without tripping any Foundry signal. Mitigation: pin citations to commit SHAs where stability matters; rely on review when authoring patterns.
Validation
There is no IWC-specific validator layer. Citations in pattern bodies are not validated (URLs are URLs; the cost of brokenness is moderate, the cost of automated link-check at scale is real).
What lives where (summary)
- In the Foundry repo: patterns with IWC citations and optional excerpts in body, plus surveys that explain the corpus evidence.
- NOT in the Foundry repo: workflow-fixtures, exemplar pages, _pin.txt, ingest scripts, hand-curated annotation markers, frontmatter schema for exemplar notes.
- In generated fixture directories: workflow-fixtures supports authoring and survey work only. No reference from Foundry tooling.
Minimum Exercise
To exercise this lighter integration:
- Author 2–3 patterns end-to-end with ## Exemplars sections citing IWC paths/URLs and one inline excerpt when useful.
- Confirm a Mold can wiki-link a pattern and that casting preserves the citations as live evidence pointers, not embedded mirrors.
If the loop holds, scale to more patterns. No further integration tooling is planned.