INITIAL_ARCHITECTURE

Initial Architecture

Initial sketch of the Galaxy Workflow Foundry’s architecture, anchored on the physical file layout of the foundry repo and on the structural conventions borrowed from galaxy-brain (see COMPONENT_GALAXY_BRAIN.md). Working premise: organize the data well — typed frontmatter, registered tags, wiki-linked references, generated indexes — and the skills, validation, and rendering fall out naturally.

These are sketches, not specs. Layouts and component edges will move as we walk concrete Molds end-to-end and as ingestion/casting tooling lands. Where galaxy-brain has a battle-tested pattern that maps cleanly, the Foundry adopts it verbatim; where the Foundry’s purpose differs (it’s more targeted, action-oriented, and grounded in the IWC corpus), the pattern is reshaped or replaced.

The Foundry has no runtime or content dependency on galaxy-brain. Galaxy-brain is design influence, not an upstream.

1. Component map

External:

Foundry-internal (in the foundry/ repo):

Consumers (external):

2. Concepts and vocabulary

Cribs galaxy-brain’s vocabulary where it carries (Note, Type, Subtype, Tag, Wiki link, Log, Slash command), and adds Foundry-specific terms. Authoritative term definitions live in GLOSSARY.md; this section is the architectural picture.

Note: galaxy-brain’s concept and moc types are not carried over. The Foundry’s content types already aggregate references — Molds aggregate patterns/CLI/schemas/examples, Pipelines aggregate Molds in order, Patterns aggregate IWC URLs and link out to companion Molds. Each is a focused MOC. A separate “navigation hub note” type would be a fourth aggregation surface without any content the others can’t already host.

Note: the content root is content/ (not galaxy-brain’s vault/). The Foundry isn’t an Obsidian vault by intent — content/ is the Astro idiom and reads accurately to a new contributor.

3. Note types and subtypes

Source of truth: meta_schema.yml type.enum and the allOf/if/then block; meta_tags.yml for the matching tag.

| type | subtype | Required extra | Tag(s) | Directory |
| --- | --- | --- | --- | --- |
| mold | | name, axis | mold | content/molds/<slug>/index.md only |
| pattern | | title | pattern (+ optional iwc/*) | content/patterns/ |
| cli-command | | tool, command | cli-reference (+ cli/<tool>) | content/cli/<tool>/ |
| pipeline | | title, phases | pipeline (+ optional source/*, target/*) | content/pipelines/ |
| research | component | (base + subtype) | research/component | content/research/ |
| research | design-problem | (base + subtype) | research/design-problem | content/research/ |
| research | design-spec | (base + subtype) | research/design-spec | content/research/ |

mold has a directory-placement contract enforced by the validator’s findMdFiles (sibling .md files in content/molds/<slug>/ are skipped). The pattern is lifted from galaxy-brain’s project rule but the Foundry doesn’t carry forward project itself — docs/ holds long-form design docs and Mold is the only directory-note type.

cli-command notes are not directory-based — each command is a flat single file. The two-level content/cli/<tool>/<cmd>.md directory structure is for organization, not directory-note semantics. Slug for wiki-link resolution: <tool>-<cmd> or namespaced as cli/<tool>/<cmd> — TBD when the resolver shared module is updated; see §7.

The research subtype list is intentionally narrower than galaxy-brain’s seven. The Foundry expects most “issue/PR research” to live in galaxy-brain or upstream; the Foundry keeps component, design-problem, design-spec for self-design notes plus background syntheses (e.g., the existing COMPONENT_NEXTFLOW_WORKFLOW_TESTING.md lands as a research/component note).

4. Tag system

meta_tags.yml is a flat YAML dict whose keys are the entire allowed tag vocabulary; each value is { description: "..." }. Hierarchy is purely textual (slash-delimited). Examples:

mold:
  description: "Mold note (source artifact for casting)"
pattern:
  description: "Pattern reference page (Galaxy workflow construction patterns)"
iwc/variant-calling:
  description: "Variant-calling workflows (DNA-seq, somatic, germline)"
iwc/rna-seq:
  description: "RNA-seq quantification, splicing, differential expression"

Validation injects the registry keys into the schema at runtime (scripts/lib/schema.ts:loadTags / loadSchema), so meta_schema.yml’s tag enum stays empty on disk. Vocabulary changes touch one file; the schema stays static. Pattern lifted from galaxy-brain — the separation is load-bearing.
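
As a rough illustration of the injection step, the sketch below copies the registry keys into the schema's tag enum in memory. The schema shape (`properties.tags.items.enum`) and the helper name `injectTagEnum` are assumptions made for this example; the real `loadSchema` may be structured differently.

```typescript
// Hypothetical shapes: the real meta_schema.yml layout may differ.
type TagRegistry = Record<string, { description: string }>;

interface FrontmatterSchema {
  properties: {
    tags: { type: "array"; items: { type: "string"; enum: string[] } };
    [field: string]: unknown;
  };
  [keyword: string]: unknown;
}

// Return a copy of the schema whose tag enum is the registry's key set.
// The input schema object is not mutated, so the on-disk enum can stay empty.
function injectTagEnum(schema: FrontmatterSchema, registry: TagRegistry): FrontmatterSchema {
  const enumValues = Object.keys(registry).sort();
  return {
    ...schema,
    properties: {
      ...schema.properties,
      tags: {
        ...schema.properties.tags,
        items: { ...schema.properties.tags.items, enum: enumValues },
      },
    },
  };
}

const registry: TagRegistry = {
  mold: { description: "Mold note (source artifact for casting)" },
  "iwc/rna-seq": { description: "RNA-seq quantification, splicing, differential expression" },
};
const onDisk: FrontmatterSchema = {
  properties: { tags: { type: "array", items: { type: "string", enum: [] } } },
};
const loaded = injectTagEnum(onDisk, registry);
```

Adding a tag then means adding one key to the registry; the schema file itself is untouched.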

Tag families:

Subject-area tags beyond iwc/* are deferred. Galaxy-brain’s galaxy/* family (Galaxy code/feature areas — collections, tools, conditionals) is not committed to up front. The kinds of knowledge the Foundry will hold (background research like COMPONENT_NEXTFLOW_WORKFLOW_TESTING, gxformat2 syntax notes, custom-tool-authoring detail, etc.) haven’t been catalogued yet; locking in a subject-area taxonomy before content lands is premature. Tag families bloom as patterns surface real cross-cutting needs.

Coherence check (TYPE_TAG_MAP + validate_tag_coherence) emits a warning (not error) when a note’s (type, subtype) doesn’t carry its expected note-type tag. Hierarchy-aware: plan/section satisfies plan.
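
A minimal sketch of the hierarchy-aware check, assuming `TYPE_TAG_MAP` is keyed by `type` or `type/subtype` and mirrors the table in §3. The camel-cased function name, the `NoteMeta` shape, and the warning text are illustrative, not the validator's actual ones.

```typescript
// Assumed mapping, derived from the note-type table in section 3.
const TYPE_TAG_MAP: Record<string, string> = {
  mold: "mold",
  pattern: "pattern",
  "cli-command": "cli-reference",
  pipeline: "pipeline",
  "research/component": "research/component",
  "research/design-problem": "research/design-problem",
  "research/design-spec": "research/design-spec",
};

interface NoteMeta {
  type: string;
  subtype?: string;
  tags: string[];
}

// Returns warnings (never errors). A child tag such as "pattern/x" satisfies "pattern".
function validateTagCoherence(note: NoteMeta): string[] {
  const key = note.subtype ? `${note.type}/${note.subtype}` : note.type;
  const expected = TYPE_TAG_MAP[key];
  if (expected === undefined) return [];
  const satisfied = note.tags.some(
    (tag) => tag === expected || tag.startsWith(expected + "/"),
  );
  return satisfied ? [] : [`tags: expected "${expected}" for note type "${key}"`];
}
```

Because the result is a list of warnings, a caller can print them without failing the run.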

5. Frontmatter schema

meta_schema.yml is JSON Schema Draft 07 written in YAML. Adopted wholesale from galaxy-brain.

Base required (everywhere): type, tags, status, created, revised, revision, ai_generated, summary.

Conditional fields declared at top level (must be, due to additionalProperties: false) and gated by allOf/if/then. Foundry-specific blocks beyond the galaxy-brain set:

- if: { properties: { type: { const: mold } }, required: [type] }
  then: { required: [name, axis] }
- if: { properties: { type: { const: pattern } }, required: [type] }
  then: { required: [title] }
- if: { properties: { type: { const: cli-command } }, required: [type] }
  then: { required: [tool, command] }
- if: { properties: { type: { const: pipeline } }, required: [type] }
  then: { required: [title, phases] }

Foundry-specific field types:

Mold = typed reference manifest. Beyond the wiki-link fields below, a Mold’s frontmatter declares typed references by reference kind (sketch — exact field shape pending MOLD_SPEC after a couple of walked Molds):

The validator resolves each kind with its own check (slug-resolves for wiki-link kinds; file-exists + JSON-Schema-parseable for input_schemas / output_schemas; etc.). The casting tool dispatches per kind — see INITIAL_COMPILATION_PIPELINE.md.
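
The per-kind dispatch could look roughly like the sketch below. Everything here is hypothetical: the kind names follow the validation list in §6, references are assumed to be already-resolved slugs or paths, `prompts` is left out because its note type is not defined in this document, and the JSON Schema parse step for schema kinds is reduced to an existence check.

```typescript
// Hypothetical reference kinds and context; the real field shape is pending MOLD_SPEC.
type RefKind = "patterns" | "cli_commands" | "input_schemas" | "output_schemas" | "examples";

interface RefContext {
  noteTypeBySlug: Map<string, string>; // slug -> note type, built from content/
  existingPaths: Set<string>;          // stand-in for filesystem existence checks
}

// A check returns null on success or an error message on failure.
type RefCheck = (ref: string, ctx: RefContext) => string | null;

const slugOfType = (expected: string): RefCheck => (ref, ctx) =>
  ctx.noteTypeBySlug.get(ref) === expected ? null : `${ref}: no ${expected} note with this slug`;

const pathExists: RefCheck = (ref, ctx) =>
  ctx.existingPaths.has(ref) ? null : `${ref}: path does not exist`;

// One entry per kind. Schema kinds would additionally parse the file as JSON Schema.
const REF_CHECKS: Record<RefKind, RefCheck> = {
  patterns: slugOfType("pattern"),
  cli_commands: slugOfType("cli-command"),
  input_schemas: pathExists,
  output_schemas: pathExists,
  examples: pathExists,
};

function validateMoldRefs(refs: Partial<Record<RefKind, string[]>>, ctx: RefContext): string[] {
  const errors: string[] = [];
  for (const kind of Object.keys(REF_CHECKS) as RefKind[]) {
    for (const ref of refs[kind] ?? []) {
      const problem = REF_CHECKS[kind](ref, ctx);
      if (problem !== null) errors.push(`${kind}: ${problem}`);
    }
  }
  return errors;
}
```

Keeping the checks in a table keyed by kind means a new reference kind is one new entry, which is the same shape the casting tool's dispatch is described as having.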

Wiki-link frontmatter fields (regex ^\[\[.+\]\]$):

No exemplar-related fields. IWC workflows are referenced by URL in pattern bodies, not as typed frontmatter (see INITIAL_CORPUS_INGESTION.md).

Strict mode: additionalProperties: false. Every conditional field declared at top level. Carried from galaxy-brain.

6. Validation pipeline

scripts/validate.ts is the validator entry point, runnable via tsx scripts/validate.ts (or compiled). Dependencies: Ajv (JSON Schema Draft 07), gray-matter (frontmatter parse), js-yaml (load schema + tag registry). Same shape as galaxy-brain’s validate_frontmatter.py, ported to TS.

Layered validation (validateData orchestrates):

  1. preprocessFrontmatter — normalize parsed dates (gray-matter / js-yaml may produce Date objects) to ISO strings before schema check.
  2. validateSchema — Ajv compiled against the schema with tag enum injected at load time.
  3. validateDates — second pass on created / revised via strict ISO parse.
  4. validateWikiLinks — regex-checks the inner text of [[...]] for whitespace-only payloads.
  5. validateTagCoherence — warning when (type, subtype) doesn’t carry its expected tag.
  6. validateBidirectionalRelatedNotes (cross-file) — builds slug→file map; warns on asymmetric related_notes links.
  7. validateIwcTags (Foundry-specific) — every iwc/<category> tag used in a note is declared in meta_tags.yml. Same enforcement as the existing tag pipeline; no separate mechanism.
  8. validateMoldRefs (Foundry-specific) — every Mold’s typed references resolve, per kind:
    • patterns, cli_commands, prompts — slug resolves to a content note of the expected type.
    • input_schemas / output_schemas — file exists in schemas/, parses as JSON Schema Draft 07.
    • examples — path exists.
    Failures error. The per-kind dispatch here is the static-validation analog of casting’s per-kind dispatch.
  9. validatePipelinePhases (Foundry-specific) — every pipeline note’s phases items resolve:
    • mold-shaped phases — wiki link resolves to a type: mold note.
    • branch-shaped phases — branch value is a known routing pattern; embedded wiki links (in branches, chain, etc.) resolve to type: mold notes.
    • Other phase kinds (e.g., gate) — validated per the kind’s own shape when introduced.
    Failures error.
    Inventory coverage warning — emits a warning listing Molds that have zero pipeline membership across all pipeline notes (candidate dead Molds, or pipeline gaps).
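
Steps 1 and 3 of the list above can be sketched as follows. This assumes dates are stored as date-only `YYYY-MM-DD` strings; the `Frontmatter` type, the `isStrictIsoDate` helper, and the message text are illustrative.

```typescript
type Frontmatter = Record<string, unknown>;

// Step 1: YAML parsers may turn unquoted dates into Date objects;
// convert them back to YYYY-MM-DD strings before the schema check.
function preprocessFrontmatter(data: Frontmatter): Frontmatter {
  const out: Frontmatter = {};
  for (const [key, value] of Object.entries(data)) {
    out[key] = value instanceof Date ? value.toISOString().slice(0, 10) : value;
  }
  return out;
}

// Strict check: the string must match the pattern AND survive a round trip,
// which rejects impossible dates such as 2026-02-30.
function isStrictIsoDate(value: unknown): boolean {
  if (typeof value !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Step 3: second pass over created / revised only.
function validateDates(data: Frontmatter): string[] {
  return ["created", "revised"]
    .filter((field) => field in data && !isStrictIsoDate(data[field]))
    .map((field) => `${field}: not a valid ISO date`);
}
```

The round-trip comparison is what makes the second pass stricter than a regex alone.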

findMdFiles skip rules:

const SKIP_DIRS = new Set([".obsidian", "casts"]);
const SKIP_FILES = new Set(["Dashboard.md", "Index.md", "iwc-overview.md", "log.md"]);
// directory-note rule, generalized:
const DIR_NOTE_TYPES = new Set(["molds"]); // Mold is the only directory-note type (see §10)

Hidden directories skipped. Casts directory (casts/) is always skipped — it’s generated content, validated by casting tooling separately.
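
Put together, the skip rules amount to a single path predicate. The sketch below is one possible reading: it assumes paths are relative to content/ with forward slashes, that `SKIP_FILES` applies at any depth, and that Mold is the only directory-note type, as §10 states. The function name `shouldValidate` is hypothetical.

```typescript
const SKIP_DIRS = new Set([".obsidian", "casts"]);
const SKIP_FILES = new Set(["Dashboard.md", "Index.md", "iwc-overview.md", "log.md"]);
const DIR_NOTE_TYPES = new Set(["molds"]);

// relPath: forward-slash path relative to the content root (assumption).
function shouldValidate(relPath: string): boolean {
  const parts = relPath.split("/");
  const fileName = parts[parts.length - 1] ?? "";
  const dirs = parts.slice(0, -1);

  if (!fileName.endsWith(".md")) return false;
  // Hidden directories and explicitly skipped directories.
  if (dirs.some((dir) => dir.startsWith(".") || SKIP_DIRS.has(dir))) return false;
  // Generated or journal files.
  if (SKIP_FILES.has(fileName)) return false;
  // Directory-note rule: inside molds/, only index.md carries frontmatter.
  if (dirs.some((dir) => DIR_NOTE_TYPES.has(dir)) && fileName !== "index.md") return false;
  return true;
}
```

A walker such as `findMdFiles` could apply this predicate to every path it visits, and the index generator could reuse it unchanged.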

One slug-resolver, not two. Because everything is TS, the wiki-link slug + resolver lives in one shared module (scripts/lib/wiki-links.ts) imported by both the validator and the Astro site (site/src/lib/wiki-links.ts re-exports from it, or the site imports directly via path alias). Galaxy-brain had to maintain two parallel implementations (Python + TS) and risk drift; the Foundry collapses that to one. This is the most concrete win from going TS-only.

tests/validate.test.ts (Vitest) loads the real meta_schema.yml and meta_tags.yml and exercises validateData (unit) and validateFile (integration with tmp directories). Mirrors galaxy-brain’s test layout.

7. Wiki links

Frontmatter wiki-link fields: parent_pattern, related_notes, related_patterns, related_molds. All regex ^\[\[.+\]\]$.

Format: [[Target Name]]. Pipe-aliasing supported in body ([[Target|display]]) by the remark plugin; not in frontmatter.
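
A small sketch of how a frontmatter value might be checked and unwrapped. The helper name `wikiLinkTarget` is hypothetical, and treating a pipe alias in frontmatter as invalid is an assumption drawn from the "not in frontmatter" note above.

```typescript
const WIKI_LINK_FIELD = /^\[\[.+\]\]$/;

// Returns the link target, or null when the value is not a usable frontmatter wiki link.
function wikiLinkTarget(value: string): string | null {
  if (!WIKI_LINK_FIELD.test(value)) return null;
  const inner = value.slice(2, -2);
  if (inner.trim() === "") return null; // whitespace-only payload
  if (inner.includes("|")) return null; // aliasing is body-only (assumption)
  return inner.trim();
}
```

The regex alone accepts `[[   ]]`, which is why the whitespace check is a separate step.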

Resolution algorithm — adopted verbatim from galaxy-brain. Single shared module (scripts/lib/wiki-links.ts); validator, site page renderer, and the remark transformer all import the same slugify and resolveWikiLink.

slug = lower(name) → "  -  " → "-" → spaces → "-" → strip [^a-z0-9-] → collapse dashes

Lookup: exact match on a basename-keyed map first, then prefix-match fallback. Directory-based notes (projects/<slug>/index.md, molds/<slug>/index.md) are keyed by their parent directory name. Lets [[implement-galaxy-tool-step]] resolve to content/molds/implement-galaxy-tool-step/index.md.

Tighten galaxy-brain’s prefix-match non-determinism (dict iteration order) by sorting candidates shortest-first, then alphabetically: [[foo-b]] resolves to foo-bar rather than foo-bar-baz, which is what an author typing a partial stub almost always means. Cheap to do in the shared module; eliminates a class of cross-version flake.
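
The slug rule and the lookup order described above can be sketched as below. The function names match the ones the document mentions, but the bodies are an illustration under assumptions: the slug map is taken to be a `Map` from slug to file path, and the spaced-dash step is implemented as "any whitespace around a dash".

```typescript
// lower -> " - " to "-" -> spaces to "-" -> strip [^a-z0-9-] -> collapse dashes
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/\s+-\s+/g, "-")
    .replace(/\s+/g, "-")
    .replace(/[^a-z0-9-]/g, "")
    .replace(/-+/g, "-");
}

// Exact match first, then prefix match with a deterministic order:
// shortest candidate first, ties broken alphabetically.
function resolveWikiLink(name: string, slugToPath: Map<string, string>): string | undefined {
  const slug = slugify(name);
  if (slug === "") return undefined; // an empty slug would prefix-match everything

  const exact = slugToPath.get(slug);
  if (exact !== undefined) return exact;

  const candidates = Array.from(slugToPath.keys())
    .filter((key) => key.startsWith(slug))
    .sort((a, b) => a.length - b.length || (a < b ? -1 : a > b ? 1 : 0));
  const best = candidates[0];
  return best === undefined ? undefined : slugToPath.get(best);
}

const slugs = new Map<string, string>([
  ["foo-bar-baz", "content/patterns/foo-bar-baz.md"],
  ["foo-bar", "content/patterns/foo-bar.md"],
  ["implement-galaxy-tool-step", "content/molds/implement-galaxy-tool-step/index.md"],
]);
```

With this map, `resolveWikiLink("foo-b", slugs)` picks `foo-bar` regardless of insertion order, because the sort no longer depends on how the map was built.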

Backlinks computed only from typed frontmatter fields (bounded, fast, author-controlled). Each note page renders an “Incoming References” section grouped by field. Body wiki links don’t backlink — same scope cut as galaxy-brain (revisit if Mold pages need full backlink graphs).

Bidirectional warning: validator emits related_notes: missing backlink to [[X]]. Asymmetric and informational only.
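
One way the asymmetry check could work, assuming the cross-file pass has already reduced every note to its slug and the slugs listed in `related_notes`. Attaching the warning to the note that lacks the reciprocal link is an assumption; the document only fixes the message text.

```typescript
// related: slug -> slugs listed in that note's related_notes (already resolved).
// Returns slug -> warnings for notes that are linked to but do not link back.
function validateBidirectionalRelatedNotes(
  related: Map<string, string[]>,
): Map<string, string[]> {
  const warnings = new Map<string, string[]>();
  related.forEach((targets, source) => {
    for (const target of targets) {
      const back = related.get(target);
      if (back === undefined) continue; // unresolved targets belong to another check
      if (!back.includes(source)) {
        const list = warnings.get(target) ?? [];
        list.push(`related_notes: missing backlink to [[${source}]]`);
        warnings.set(target, list);
      }
    }
  });
  return warnings;
}
```

Since the result is informational, a caller would print these without changing the exit code.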

8. Generated artifacts

All generated files live under content/ and are committed to git; CI runs --check drift gates before deploy.

Dashboard.md — Obsidian Dataview tables, one per section. site/src/pages/index.astro — same sections rendered as HTML tables.

dashboard_sections.json is the single source of truth:

[
  { "label": "Pipelines", "tag": "pipeline" },
  { "label": "Molds", "tag": "mold" },
  { "label": "Patterns", "tag": "pattern" },
  { "label": "Plans", "tag": "plan" },
  { "label": "Component Research", "tag": "research/component" },
  { "label": "Design Problems", "tag": "research/design-problem" },
  { "label": "Projects", "tag": "project" }
]

Pipelines lead the dashboard because they are the primary task surface of the Foundry: a contributor or agent landing cold should first see the journeys (“convert a Nextflow workflow to Galaxy”), then drill into Molds / Patterns / CLI as the reference layer beneath. Type-based sections are preserved as the reference surface; pipelines are the journey surface. See §11 for how this propagates to the Astro routes.

scripts/generate-dashboard.ts emits Dataview blocks; the Astro page imports the same JSON. Both filter status !== 'archived', sort revised DESC. Pattern lifted from galaxy-brain.
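
The shared filter-and-sort rule can be sketched as one pure function that both the Dataview generator and the Astro page could mirror. The `NoteRow` shape and the exact-tag membership test are assumptions; the real sections may match tags hierarchically.

```typescript
interface DashboardSection {
  label: string;
  tag: string;
}

interface NoteRow {
  slug: string;
  tags: string[];
  status: string;
  revised: string; // ISO date string, so plain string comparison orders correctly
}

// Rows for one section: drop archived notes, newest revision first.
function sectionRows(notes: NoteRow[], section: DashboardSection): NoteRow[] {
  return notes
    .filter((note) => note.status !== "archived" && note.tags.includes(section.tag))
    .sort((a, b) => (a.revised < b.revised ? 1 : a.revised > b.revised ? -1 : 0));
}

const sampleNotes: NoteRow[] = [
  { slug: "paper-to-galaxy", tags: ["pipeline"], status: "draft", revised: "2026-04-01" },
  { slug: "cwl-to-galaxy", tags: ["pipeline"], status: "reviewed", revised: "2026-04-20" },
  { slug: "old-pipeline", tags: ["pipeline"], status: "archived", revised: "2026-04-25" },
  { slug: "summarize-paper", tags: ["mold"], status: "draft", revised: "2026-04-10" },
];
const pipelineRows = sectionRows(sampleNotes, { label: "Pipelines", tag: "pipeline" });
```

Here `pipelineRows` contains `cwl-to-galaxy` followed by `paper-to-galaxy`; the archived note and the Mold are excluded.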

Index.md — flat prose catalog grouped by type/subtype, alphabetized within each group:

- [[slug]] — {summary} *(stale)*

scripts/generate-index.ts walks findMdFiles (reusing the validator’s skip logic), groups by type, emits the file. Directory-note slugs use the parent directory name.

content/iwc-overview.md — Foundry-specific. Auto-generated grouping of every iwc/<category> tag into a single dashboard, with counts and per-category lists of patterns + Molds. Single landing page for “what does the Foundry have for variant-calling, RNA-seq, …”. Detail in INITIAL_CORPUS_INGESTION.md.

Drift detection: --check flag on every generator reads the file and string-compares with re-generation; exit 1 on mismatch. Wired into npm run check:dashboard, check:index, check:iwc-overview. Designed as CI gates — galaxy-brain had this pattern but no CI to run it; the Foundry wires it from day one.
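
The `--check` behaviour reduces to "regenerate, compare, report". A minimal sketch with the file access injected so it can run without touching disk; `GeneratorIo` and `runGenerator` are hypothetical names, not the scripts' actual API.

```typescript
interface GeneratorIo {
  read(): string | null; // current file contents, or null if the file is missing
  write(text: string): void;
}

// Returns a process exit code: 0 when up to date (or after writing), 1 on drift.
function runGenerator(render: () => string, io: GeneratorIo, check: boolean): number {
  const fresh = render();
  if (!check) {
    io.write(fresh);
    return 0;
  }
  return io.read() === fresh ? 0 : 1;
}

// In-memory stand-in for a generated file such as content/Index.md.
let fileContents: string | null = "- [[old-note]]\n";
const memoryIo: GeneratorIo = {
  read: () => fileContents,
  write: (text) => {
    fileContents = text;
  },
};
const renderIndex = () => "- [[new-note]]\n";
```

Running with `check` set to true against the stale contents returns 1 without writing; running without it rewrites the file, after which the check returns 0.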

9. Authoring flow

Two authoring entry points:

Galaxy-brain’s third entry point (Obsidian Templater files under content/templates/) is not carried over. The Foundry isn’t an Obsidian vault by intent; agent-driven authoring through slash commands handles scaffold-prompt-stamp-validate without an interactive plugin in the loop.

Foundry slash commands (sketch — see open questions):

There is no IWC ingestion command. IWC is referenced by URL in pattern bodies (see INITIAL_CORPUS_INGESTION.md); no ingest-iwc script exists.

The keystone agent shape from galaxy-brain — classify → fetch → dedup → draft → cross-ref → write → validate → log → regenerate — is preserved in /cast.

10. Directory-based note types

One type uses the directory-note pattern: Mold.

Mold (content/molds/<slug>/):

content/molds/implement-galaxy-tool-step/
  index.md           ← only file with frontmatter (the "mold.md" of casting)
  eval.md            ← evaluation plan; never packaged into the cast
  examples/          ← optional walk-throughs
  casting-hints.md   ← optional per-target overrides (deferred until walk-throughs surface need)

eval.md co-locates evaluation with the Mold (improves discoverability and ownership) without bleeding it into the cast skill. Casting reads index.md and refs; never reads eval.md.

Galaxy-brain’s project type is not carried forward — docs/ holds long-form Foundry-meta design narrative; the validator’s directory-note rule is reused for Mold but not generalized to a second type.

Validator distinction:

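const DIR_NOTE_TYPES = new Set(["molds"]);
// Inside a directory-note tree only index.md is a note; sibling files are skipped.
// `parts` are the path segments of `file` (both names assumed from the surrounding walker).
if (parts.some((p) => DIR_NOTE_TYPES.has(p)) && path.basename(file) !== "index.md") continue;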

Two Astro content collections:

Routes:

Casts directory (casts/<target>/<name>/) is not a content collection — it’s generated, language-target-shaped, and rendered via a dedicated route family (pages/casts/[target]/[mold]/[...path].astro) that treats the cast as a standalone artifact, not a foundry note. Open question: whether casts render on the public site at all, or only as a downloadable archive.

11. Site / Astro layer

Stack: Astro static + Tailwind CSS v4 (@tailwindcss/vite) + @tailwindcss/typography. Lifted from galaxy-brain. Font choice (Atkinson Hyperlegible was a galaxy-brain personal accessibility default) is reconsidered for the Foundry — open question.

Routes (departures from galaxy-brain noted):

Theme: CSS custom properties under @theme { ... } with @custom-variant dark and a .dark { ... } override block. Galaxy palette renamed for Foundry brand; structure preserved. Status badges (.badge-draft, …) and .tag chips first-class. .dangling styles unresolved wiki links muted+italic.

Deployment: minimal two-job GitHub Actions on push to main (withastro/action@v3 + actions/deploy-pages@v4). Unlike galaxy-brain, CI runs npm run validate, check:index, check:dashboard, check:iwc-overview, and test before the deploy — galaxy-brain has these gates as Makefile targets but doesn’t run them in CI. Closing that hole is part of v1.

12. Ingestion and maintenance

One ingestion spine — Mold casting. There is no IWC ingestion (see INITIAL_CORPUS_INGESTION.md for the deconstruction).

Mold casting (scripts/cast-mold.ts, driven by /cast). Covered in INITIAL_COMPILATION_PIPELINE.md. Reads from molds/, patterns/, schemas/; writes only to casts/<target>/<name>/.

content/log.md — append-only, excluded from validator and Astro collections, Obsidian-visible. Reserved entry types: cast, plus the planned lint and query. Format follows galaxy-brain:

## 2026-04-29 cast — implement-galaxy-tool-step (claude)
- **mold**: [[implement-galaxy-tool-step]]
- **target**: claude
- **model**: claude-opus-4-7
- **prompt-version**: v3
- **resolved-refs**: 4 patterns

package.json scripts (replacing galaxy-brain’s Makefile):

Stack:

13. Cross-cutting concerns

Validation. Two layers, per galaxy-brain:

Versioning. No semver on Molds, no semver on casts. Identity = name + content hash. Re-casting is the migration path. Carried directly from INITIAL_COMPILATION_PIPELINE.md.
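
As an illustration of "identity = name + content hash", the sketch below derives an identity string from a name and the note's text. SHA-256, the `name@digest` layout, and the function name are assumptions for this example; the actual hashing scheme is defined in INITIAL_COMPILATION_PIPELINE.md, not here.

```typescript
import { createHash } from "node:crypto";

// Identity changes whenever the content changes, with no version number to bump.
function contentIdentity(name: string, content: string): string {
  const digest = createHash("sha256").update(content, "utf8").digest("hex");
  return `${name}@${digest}`;
}

const before = contentIdentity("implement-galaxy-tool-step", "body v1");
const after = contentIdentity("implement-galaxy-tool-step", "body v2");
const unchanged = contentIdentity("implement-galaxy-tool-step", "body v1");
```

Comparing a stored identity with a freshly computed one is enough to tell whether a cast is stale and needs re-casting.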

Provenance. Every derived artifact records what produced it:

IWC-cited URLs in pattern bodies are not tracked as provenance — they are author-controlled citations. Pinning to a commit SHA is at the author’s discretion per citation.

Status lifecycle. Status enum (draft | reviewed | revised | stale | archived) on every note. Archived notes filtered everywhere a list appears. First-class, not a tag convention. Lifted from galaxy-brain.

14. Physical file layout

Directory tree. Names provisional; the shape is the proposal.

foundry/
├── README.md
├── GLOSSARY.md
├── KNOWLEDGE_BASE.md
├── meta_schema.yml                       # JSON Schema Draft 07 in YAML
├── meta_tags.yml                         # tag registry (incl. iwc/*)
├── dashboard_sections.json               # single source for Obsidian + Astro dashboards
├── docs/
│   ├── ARCHITECTURE.md
│   ├── MOLD_SPEC.md
│   ├── HARNESS_PIPELINES.md
│   ├── MOLD_INVENTORY.md
│   ├── CORPUS_INGESTION.md
│   ├── COMPILATION_PIPELINE.md
│   ├── PROBLEM_AND_GOAL.md
│   └── SCOPE_V1.md
├── schemas/                              # Mold IO schemas (the schema library)
│   ├── summary-paper.schema.json         # per-source summary outputs
│   ├── summary-nextflow.schema.json
│   ├── summary-cwl.schema.json
│   ├── galaxy-tool-summary.schema.json   # output of summarize-galaxy-tool
│   └── …                                 # one or more per Mold with structured IO
├── content/
│   ├── Dashboard.md                      # generated; --check
│   ├── Index.md                          # generated; --check
│   ├── iwc-overview.md                   # generated; --check
│   ├── log.md                            # append-only operations journal
│   ├── molds/
│   │   ├── implement-galaxy-tool-step/
│   │   │   ├── index.md                  # frontmatter + body (the "mold.md")
│   │   │   ├── eval.md                   # not packaged into cast
│   │   │   └── examples/
│   │   ├── summarize-paper/
│   │   ├── discover-shed-tool/
│   │   ├── gxwf-cli/                     # whole-CLI Mold
│   │   ├── planemo-cli/                  # whole-CLI Mold
│   │   └── …
│   ├── patterns/
│   │   ├── galaxy-collection-manipulation.md   # body cites IWC URLs
│   │   ├── galaxy-tabular-manipulation.md
│   │   ├── galaxy-conditional-handling.md
│   │   ├── galaxy-custom-tool-authoring.md
│   │   └── …
│   ├── cli/
│   │   ├── gxwf/
│   │   │   ├── tool-search.md            # one file per command/subcommand
│   │   │   ├── tool-versions.md
│   │   │   ├── tool-revisions.md
│   │   │   ├── validate.md
│   │   │   ├── lint.md
│   │   │   ├── convert.md
│   │   │   └── …
│   │   └── planemo/
│   │       ├── test.md
│   │       ├── run.md
│   │       └── …
│   ├── pipelines/
│   │   ├── paper-to-galaxy.md
│   │   ├── nextflow-to-galaxy.md
│   │   ├── cwl-to-galaxy.md
│   │   ├── paper-to-cwl.md
│   │   └── nextflow-to-cwl.md
│   └── research/
│       └── component-nextflow-workflow-testing.md  # background syntheses
├── casts/                                # generated; committed; skipped by validator
│   ├── claude/
│   │   ├── _target.yml                   # prompt template, model, output schema
│   │   ├── implement-galaxy-tool-step/
│   │   │   ├── SKILL.md
│   │   │   ├── references/
│   │   │   └── _provenance.json
│   │   └── …
│   ├── web/
│   └── generic/
├── scripts/
│   ├── validate.ts
│   ├── generate-dashboard.ts
│   ├── generate-index.ts
│   ├── generate-iwc-overview.ts
│   ├── seed-iwc-tags.ts                  # one-time, then archived
│   ├── cast-mold.ts
│   ├── status.ts                         # cast drift detection
│   └── lib/
│       ├── schema.ts                     # load + tag-enum injection
│       ├── frontmatter.ts                # gray-matter wrapper + date normalization
│       ├── wiki-links.ts                 # slug + resolver (shared with site)
│       └── walk.ts                       # findMdFiles + skip rules
├── tests/
│   └── validate.test.ts                  # Vitest
├── site/                                 # Astro renderer
│   ├── src/
│   │   ├── content.config.ts             # content + directoryNoteFiles collections
│   │   ├── lib/
│   │   │   └── remark-wiki-links.ts      # imports scripts/lib/wiki-links.ts
│   │   ├── pages/
│   │   ├── components/
│   │   └── styles/global.css
│   └── astro.config.mjs
├── .claude/
│   └── commands/
│       ├── draft-mold.md
│       ├── draft-pattern.md
│       └── cast.md
├── .github/workflows/
│   ├── ci.yml                            # validate + check:* + test + tsc --noEmit
│   └── deploy.yml                        # Astro → GitHub Pages
├── package.json                          # one dep tree for tooling + site
├── tsconfig.json                         # path alias for scripts/lib/* shared with site
└── vitest.config.ts

Key decisions reflected in the layout:

15. What’s adopted from galaxy-brain (and what’s reshaped)

Direct lifts (in priority order, per COMPONENT_GALAXY_BRAIN.md §“What to borrow”):

  1. Frontmatter contract pattern. JSON Schema Draft 07 in YAML + runtime tag-enum injection + additionalProperties: false + allOf/if/then for conditional requireds. Replaced enums with Foundry types; structure verbatim.
  2. Wiki-link resolver — collapsed to one shared TS module. Galaxy-brain maintained parallel Python + TS implementations and risked drift; the Foundry’s TS-only stack lets validator, site renderer, and remark transformer all import the same slugify and resolveWikiLink. Generalized basename keying handles projects/<slug>/index.md and molds/<slug>/index.md. Prefix-match is sorted shortest-first, then alphabetically (galaxy-brain’s dict-order non-determinism eliminated).
  3. Generated Index.md + Dashboard.md with --check drift gates, plus dashboard_sections.json driving both Obsidian Dataview and the Astro landing page.
  4. Two-collection Astro split (typed content + passthrough sibling files). Generalized to cover both Project and Mold directory notes.
  5. Slash-command authoring shape: classify → fetch → dedup → draft → cross-ref → write → validate → log → regenerate. Realized as /cast and the /draft-* commands; galaxy-brain’s URL/issue-centric /ingest is not carried over.
  6. content/log.md append-only operations journal, excluded from validator and Astro collections.
  7. Single-file scripts pattern. Galaxy-brain used PEP 723 + uv; the Foundry uses tsx + a single package.json. Same goal — zero virtualenv ceremony, scripts live next to the data.
  8. Status lifecycle as first-class enum with badge rendering and global archived filtering.
  9. Raw markdown endpoints + clipboard copy on every page.
  10. CSS custom-property theme tokens + class-based .dark override, with semantic surface/text/badge/tag tokens.
  11. additionalProperties: false + bidirectional related_notes warning.
  12. Tag registry as separate file with empty enum in schema, injected at runtime.

Reshaped or replaced:

Explicitly not carried over from galaxy-brain:

Gaps galaxy-brain has that the Foundry closes:

16. Open questions

Layout:

Tag families:

Pipelines:

Schema:

Tooling:

Process:

Resolved (moved out of this list):