Component Archon

Archon as a possible Foundry harness substrate; strong fit for heavy harnesses, with per-step sub-DAG looping as the main gap.

Revised 2026-05-06 (rev 2)

Archon evaluation

Project-infrastructure research on Archon as a possible Foundry harness substrate. Reviewed against https://github.com/coleam00/Archon on 2026-04-29, dev branch at CHANGELOG 0.3.10. Operational details are point-in-time evidence — re-check before adopting.

1. What is it?

Tagline (verbatim from README.md line 8): “The first open-source harness builder for AI coding. Make AI coding deterministic and repeatable.”

Elevator pitch (README.md line 23): “Archon is a workflow engine for AI coding agents. Define your development processes as YAML workflows — planning, implementation, validation, code review, PR creation — and run them reliably across all your projects.” It self-positions as “Like what Dockerfiles did for infrastructure and GitHub Actions did for CI/CD … Think n8n, but for software development.”

Category. A YAML-DAG workflow engine that wraps coding-agent SDKs (Claude Agent SDK, OpenAI Codex SDK, and a community Pi provider that fronts ~20 LLM backends). It is not a generic agent framework, not a multi-agent runtime, and not a RAG/KB system. It is closer in spirit to GitHub Actions for AI than to LangGraph or CrewAI. Platform adapters (Slack/Telegram/Discord/GitHub webhooks/Web/CLI) feed messages into a router that picks a workflow and runs it.

Important pivot. v1 of Archon was a Python-based “task management + RAG” multi-agent system. That codebase is archived on the archive/v1-task-management-rag branch and is no longer developed. The current main/dev is a complete rewrite in TypeScript/Bun (started ~Feb 2025) with a different premise. Anything written about Archon before mid-2025 is about a different product.

Maturity / activity.

  • License: MIT (LICENSE).
  • Stars: ~20,200; forks: ~3,090; open issues: 191.
  • Last push: 2026-04-29 (the day this review was done). CHANGELOG shows roughly weekly patch releases through 0.3.x. Still pre-1.0; no compatibility promise.
  • Primary maintainer: Cole Medin (coleam00) plus contributors. Bus factor appears to be roughly 1 + a small contributor pool — issues #1428–#1497 in the last ~2 weeks are mostly authored or merged by a tight group.
  • Trendshift-listed; clearly still in marketing/growth phase.

Stated goals. Determinism and repeatability (“same workflow, same sequence, every time”), portability (workflows committed to your repo run identically from CLI/Web/Slack/Telegram/GitHub), team-shareable processes, and isolation by default (every run gets its own git worktree). Stated non-goals and constraints (from CLAUDE.md): not multi-tenant; a single-developer tool; KISS/YAGNI/SRP enforced explicitly. No scheduling, no general task-graph compute, no agent-to-agent negotiation.

2. Core architecture and primitives

Archon’s vocabulary is small and concrete. The primitives are: workflow, node, command (prompt template), codebase (registered repo), conversation, session, isolation environment (worktree), workflow run, workflow event.

Composition model. Workflow-centric, not agent-centric. A workflow is a YAML DAG of nodes with depends_on edges. Independent nodes in the same topological layer run concurrently. The author writes the structure; the AI fills in the intelligence at each AI node.

Node types (from packages/workflows/src/schemas/dag-node.ts, mutually exclusive within a node):

  • command: — invokes a named markdown prompt file from .archon/commands/.
  • prompt: — inline prompt string, sent to the AI provider.
  • bash: — shell script, no AI; stdout captured as $nodeId.output.
  • script: — inline TS/Python (runtime: bun | uv) with optional deps: and timeout:; stdout captured as $nodeId.output.
  • loop: — iterative AI prompt with until: completion-signal string, max_iterations, optional fresh_context: true (new session per iteration), interactive: true (pause for user via /workflow approve), and until_bash: for deterministic loop-exit checks.
  • approval: — pure human gate. approval.message, optional capture_response: true to store the user’s reply as $node.output, optional on_reject: with its own redraft prompt and max_attempts.
  • cancel: — terminate the run with a reason string (think “abort branch” of a when: conditional).
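Put together, a minimal workflow using these node types might look like the following sketch. The node-type and depends_on fields are as documented in dag-node.ts above; the top-level shape, node ids, and all values are assumptions for illustration:

```yaml
# Illustrative only; exact top-level schema not verified against the repo.
name: example-pipeline
nodes:
  - id: summarize
    command: summarize-input        # resolves to .archon/commands/summarize-input.md
  - id: lint
    depends_on: [summarize]
    bash: gxwf lint workflow.ga     # no AI; stdout becomes $lint.output
  - id: implement
    depends_on: [summarize]         # same topological layer as lint, so they run concurrently
    prompt: "Implement the plan from $summarize.output."
  - id: gate
    depends_on: [lint, implement]
    approval:
      message: "Lint passed and implementation is in; proceed to PR?"
      capture_response: true        # user reply stored as $gate.output
```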

Per-node knobs (also in dagNodeBaseSchema):

  • when: JS-expression conditions; trigger_rule: for join semantics (all_success | one_success | none_failed_min_one_success | all_done).
  • context: fresh | shared; output_format: (JSON-schema for structured output, SDK-enforced for Claude/Codex).
  • allowed_tools / denied_tools; hooks: (per-node SDK hooks); mcp: (per-node MCP server config path).
  • skills: (per-node Claude skill preload — directly relevant); agents: (inline Claude sub-agent definitions usable via the Task tool).
  • effort / thinking / maxBudgetUsd / systemPrompt / fallbackModel / betas / sandbox (Claude SDK passthrough).
  • retry: (with backoff); idle_timeout:; model: / provider: overrides.
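These knobs compose onto any node. A hypothetical sketch, using the field names from dagNodeBaseSchema listed above; the nesting, sub-fields (notably under retry: and output_format:), skill name, and model id are assumptions:

```yaml
# Illustrative node; sub-shapes are guesses, not the verified schema.
- id: review
  depends_on: [implement]
  when: "$implement.output != ''"     # JS-expression condition
  context: fresh                      # fresh session for this node
  prompt: "Review the diff produced upstream and report findings."
  output_format:                      # JSON-schema structured output
    type: object
    properties:
      approved: { type: boolean }
      notes: { type: string }
  skills: [workflow-review]           # hypothetical Claude skill name
  retry: { max_attempts: 2 }          # assumed retry sub-fields
  idle_timeout: 600
```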

Storage / state. SQLite by default at ~/.archon/archon.db; PostgreSQL via DATABASE_URL. 8 tables, remote_agent_* prefix: codebases, conversations, sessions (immutable, transition-linked), messages, isolation_environments, workflow_runs (with working_path for resume detection — migration 019_workflow_resume_path.sql), workflow_events (step-level event log for observability), codebase_env_vars. Per-run artifacts live on disk under ~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<id>/ and are reachable from inside nodes via $ARTIFACTS_DIR. Per-run logs go to …/logs/. Cross-run repo-scoped state lives at <repo>/.archon/state/ (gitignored).

Execution model. Async, in-process (Bun event loop), per-run path-exclusive lock (overridable via mutates_checkout: false). Node concurrency is per-DAG-layer. The DAG executor lives at packages/workflows/src/dag-executor.ts; the loader+validator at loader.ts; orchestration that maps platform messages to runs at packages/core/src/orchestrator/.

Multi-agent coordination. Limited and deliberate. Archon does not have an agent-to-agent negotiation layer. The closest things are: (a) parallel DAG layers (the archon-idea-to-pr workflow runs five reviewer nodes concurrently and joins them with trigger_rule: one_success into a synthesize node — exactly the multi-reviewer fan-out you’d expect); (b) per-node agents: blocks that define Claude sub-agents callable via the Task tool, scoped to a single node. There’s no “team of agents debating” abstraction. This is a feature, not a gap, given the determinism goal.
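The fan-out/join in (a) reduces to plain depends_on layers plus a trigger_rule on the join. A sketch mirroring the bundled multi-reviewer pattern, with invented node ids and prompts:

```yaml
- id: review-security
  depends_on: [implement]
  prompt: "Security review of the change."
- id: review-style
  depends_on: [implement]             # same topological layer → runs concurrently
  prompt: "Style review of the change."
- id: synthesize
  depends_on: [review-security, review-style]
  trigger_rule: one_success           # join proceeds if any reviewer succeeds
  prompt: "Merge the reviewer findings into one report."
```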

3. Tech stack and deployment

  • Languages/runtime: TypeScript on Bun (workspaces monorepo). Strict TS. Zod schemas via @hono/zod-openapi (every API/engine type is OpenAPI-generated).
  • Server: Hono (OpenAPIHono) on port 3090. SSE for streaming.
  • Frontend: React + Vite + Tailwind v4 + shadcn/ui + Zustand. Generated TypeScript types from the OpenAPI spec.
  • DB: SQLite (default, zero-setup) or PostgreSQL.
  • Deployment options: local Bun dev (bun run dev), compiled binary (Homebrew formula at homebrew/, install scripts at scripts/install.{sh,ps1}, archon serve to run the bundled web UI), Docker (Dockerfile, docker-compose.yml, Caddyfile.example for VPS), and a wizard-driven setup that’s the recommended path.
  • External deps: the chosen LLM provider’s SDK (Anthropic, OpenAI Codex, or @mariozechner/pi-coding-agent); MCP servers if you use them; gh CLI; git; optional Slack/Telegram/Discord credentials; PostHog (telemetry, opt-out via ARCHON_TELEMETRY_DISABLED=1).
  • UI surfaces: Web UI (chat, dashboard “Mission Control”, drag-and-drop Workflow Builder, run-step viewer), CLI (bun run cli workflow run|status|resume|abandon|cleanup, cli isolation list|cleanup, cli validate workflows|commands), platform adapters (Slack, Telegram, Discord, GitHub webhooks).
  • Configuration: .archon/config.yaml (repo-level) and ~/.archon/config.yaml (global). Provider/model defaults, Claude settingSources, Codex reasoning effort, additional directories, docs path, env-var allow-lists.

4. Skill / tool integration model

This is where Archon’s fit with generated skills and other cast artifacts is closest.

  • Claude skills are first-class. The skills: array on any AI node preloads Claude skill directories via SDK AgentDefinition wrapping. Skills live in the standard Claude location and are referenced by name. Per-node, not workflow-wide.
  • Commands (.archon/commands/*.md) are reusable prompt templates with $1/$2/$ARGUMENTS/$ARTIFACTS_DIR/$WORKFLOW_ID/$BASE_BRANCH/$DOCS_DIR/$LOOP_USER_INPUT/$REJECTION_REASON/$LOOP_PREV_OUTPUT/$<nodeId>.output substitution. They are not code — they are prompt fragments. This maps cleanly onto a prompt-shaped generated skill with a contract.
  • MCP servers are configured per-node via mcp: (path to a config JSON, env vars expanded at execution time). Claude-only. Useful if generated skills expose themselves as MCP tools.
  • Inline agents: define Claude sub-agents callable via the Task tool, scoped to the node — kebab-case IDs, optional model/tools/skills/maxTurns. This is the closest analog to dynamically composing generated skills inside a phase.
  • External executables are reached via bash: or script: nodes. There is no “tool registry” — you call gxwf, planemo, gh, etc. directly. script: nodes with runtime: uv and deps: [...] are particularly useful: a self-contained Python script with its own dependency set runs as a node and emits structured stdout. This is the integration point for gxwf lint/validate/roundtrip and Planemo CLI invocations.
  • No Claude-skill-directory auto-discovery beyond what Claude Code itself does. Archon doesn’t have its own skill registry; it leans on Claude’s. For non-Claude providers, skills: is ignored.
  • Workflow + command discovery: automatic, three-layer precedence (bundled defaults < ~/.archon/{workflows,commands,scripts}/ global < <repo>/.archon/... project). Subfolder one-level nesting allowed. Validated at load time (bun run cli validate workflows).
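As a sketch of the script:-node integration point for an external validator CLI: the runtime:, deps:, timeout:, and script: fields are from §2, while the JSON contract and the specific gxwf invocation are assumptions:

```yaml
- id: validate
  runtime: uv                   # self-contained Python script under uv
  deps: []                      # add PyPI deps here if the script needs any
  timeout: 300
  script: |
    import json, subprocess
    # Call the external CLI directly; there is no tool registry.
    r = subprocess.run(["gxwf", "lint", "workflow.ga"],
                       capture_output=True, text=True)
    # Structured stdout becomes $validate.output for downstream nodes.
    print(json.dumps({"ok": r.returncode == 0, "stderr": r.stderr[-2000:]}))
```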

5. Orchestration features

Feature-by-feature support:

  • Sequencing / DAG. First-class: nodes with depends_on; layer-parallel execution.
  • Conditional branching. when: expressions on nodes; a cancel: node terminates a branch.
  • Routing. The router (packages/workflows/src/router.ts) uses an LLM call to pick a workflow from descriptions, with a 4-tier name-resolution fallback (exact → case-insensitive → suffix → substring) and ambiguity detection. Within a workflow, branching is when:-based, not LLM-routed.
  • Looping / per-item iteration. loop: with until: signal, max_iterations, fresh_context, until_bash:, $LOOP_PREV_OUTPUT. The looped unit is a single prompt — there is no native “for each item in list, run this sub-DAG” primitive. To loop over a list of steps, you either drive iteration from inside the prompt (the archon-piv-loop “Ralph pattern” reads a plan from disk and picks one task per iteration) or you author N copies of the layer. This is a real limitation for the Foundry’s “loop over workflow steps” requirement.
  • Approval gates / HITL. approval: nodes (pure gate, optional capture_response, optional on_reject redraft). loop: { interactive: true, gate_message } for iterative HITL. CLI/web/platform /workflow approve <id> <text> and /workflow reject <id> <reason>.
  • Retry / failure. Per-node retry: with backoff (not on loop nodes — loops manage their own iteration). trigger_rule lets a join survive partial failures. Failures are classified into structured error types (dag.node_empty_output, codex_stream_incomplete, etc.).
  • State persistence and resumption. Yes. Workflow runs persist to the DB; working_path lets a re-run on the same branch find prior failed runs. cli workflow resume <run-id> re-runs, skipping completed nodes. Approval gates pause the run and survive process restarts. archon-piv-loop after-resume semantics are handled explicitly (e.g., $LOOP_USER_INPUT is only populated on the first iteration after resume).
  • Observability / tracing. The workflow_events table is a step-level event log (transitions, artifacts, errors). JSONL file logs per run under …/logs/. The Web UI “Workflow Execution” view streams events. Pino structured logs with {domain}.{action}_{state} event naming.
  • Cost tracking. Per-node maxBudgetUsd: cap (Claude only — SDK passthrough). No global cost dashboard.
  • Concurrency. Per-DAG-layer parallelism inside a run. Multiple concurrent runs across worktrees. Same-checkout concurrency requires mutates_checkout: false (author asserts no race).
  • Worktree isolation. Default. cli workflow run --branch <name> or an auto-generated name; --no-worktree to opt out. archon-resolve-conflicts, cli isolation cleanup, and cli complete <branch> cover the lifecycle.
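The Ralph-pattern workaround for per-item iteration might be written as follows; the nesting under loop: is assumed, the field names are those from §2, and the plan-file path and completion string are invented:

```yaml
- id: implement-steps
  loop:
    prompt: |
      Read .archon/state/plan.md and pick the first unchecked step.
      Implement it, mark it done in the plan, then stop.
      If every step is already done, reply exactly: STEPS_COMPLETE
    until: STEPS_COMPLETE        # completion-signal string
    max_iterations: 20
    fresh_context: true          # new session per iteration
    until_bash: "! grep -q '\\[ \\]' .archon/state/plan.md"   # deterministic exit check
```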

6. Knowledge base / RAG features

There is no KB. v1 had RAG (Supabase + pgvector + Crawl4AI); v2 dropped it entirely. The closest things v2 has:

  • $DOCS_DIR (configurable docs path the agent can grep) — file-system docs, not a vector store.
  • $ARTIFACTS_DIR per run, plus <repo>/.archon/state/ for cross-run state — basically a scratchpad and a journal, not a queryable corpus.
  • Codebase-scoped env vars and registered codebases.

For the Foundry’s wish to host patterns and IWC exemplars in retrievable form, Archon is the wrong shape. You would either keep them as static files in <repo>/.archon/state/ or docs/ and grep from inside nodes, or pair Archon with an external retriever (MCP server, custom skill, separate vector DB). The repo’s authors deliberately removed the KB layer; it is not coming back.

7. Extensibility and customization

  • Workflows are the extension point. A “harness” in Foundry terms maps almost 1-to-1 onto an Archon workflow file. They are first-class, file-based, version-controlled, and discoverable. This is by far Archon’s strongest fit.
  • Commands are reusable prompt fragments — extract a phase prompt once, call it from many workflows.
  • Scripts (.archon/scripts/*.{ts,py}) are reusable code units called from script: nodes.
  • Code-level extensibility is narrower. New AI providers implement IAgentProvider (packages/providers/src/types.ts), new platform adapters implement IPlatformAdapter. There is no plugin loader for arbitrary new node types or new workflow primitives — adding (e.g.) a for_each: node would mean a fork of @archon/workflows. The repo’s stated principles (KISS/YAGNI, fail-fast, no autonomous lifecycle mutation, see CLAUDE.md) suggest the maintainer will resist speculative extension hooks.
  • How much code vs config for a non-trivial harness? For something like the Foundry’s full pipeline — sequence, gates, routing on discover-or-author, per-step loop, gxwf/planemo/test runs, debug loop — the answer is “mostly config.” You’d write one workflow YAML, ~10 commands, and 2–3 script: nodes for gxwf/planemo. The single sharp limitation is the per-step loop semantics (see §10).
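Under that “mostly config” answer, the on-disk shape of such a harness would be roughly the following; file names are invented, locations are those from §2 and §4:

```text
<repo>/.archon/
├── workflows/
│   └── foundry-cast.yaml      # the harness: one workflow DAG
├── commands/                  # ~10 reusable phase prompts (markdown)
│   ├── summarize-nextflow.md
│   └── ...
├── scripts/                   # bodies for script: nodes (gxwf, planemo)
│   └── validate.py
└── state/                     # cross-run scratchpad (gitignored)
```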

8. Failure modes and limitations

  • Per-iteration “for each step” is awkward. As noted, loops iterate a single prompt; iterating a sub-DAG over N items requires either (a) the Ralph pattern (loop reads list from disk, picks one) or (b) static unrolling. Foundry’s “loop over steps: discover-or-author tool → summarize → implement → validate” cannot be expressed as a sub-DAG-per-item natively.
  • Single-developer tooling assumption. No multi-tenancy, no auth model beyond per-adapter env-var allow-lists, no audit log targeted at compliance. Fine for a research foundry; not OK if you ever want a hosted multi-user Foundry.
  • Pre-1.0, schema churn. CHANGELOG 0.3.10 alone shows substantive engine semantics changes (e.g., dag.node_empty_output failure mode, provider/model resolution rewrite, $LOOP_PREV_OUTPUT). Lock-in via YAML is small (workflows are portable), but lock-in via runtime semantics is real.
  • Provider lock-in. The whole engine assumes one of three provider SDKs. Not a generic “any LLM” engine. Pi (community) widens this via OpenRouter-style routing but it’s builtIn: false.
  • No native KB/retrieval (see §6).
  • Bus factor. Effectively one primary maintainer with a contributor pool. Pace is fast but concentrated.
  • Known recent bugs / open issues (sample from issues 1485–1497): security CVEs in dev-chain deps (Discord adapter undici/lodash; vite/astro/postcss in dev), IME enter-key handling on web, release-workflow race conditions, GitHub App auth roadmap (#1495). Roadmap-pinned: setup overhaul to “binary-install primary + archon doctor” (#1494).
  • No autonomous lifecycle mutation across processes is an explicit architectural rule (CLAUDE.md, citing #1216) — Archon will refuse to mark stale runs as failed. Pragmatic: requires user one-click action. Worth noting if you want “fully autonomous” harnesses.
  • Lock-in on extraction. Workflows are plain YAML referencing prompts in plain markdown. Easy to migrate. The DB schema (8 tables) is more entangled, but artifacts on disk and per-repo .archon/ make extraction tractable.

9. Roadmap and trajectory

  • Cadence: weekly-ish patch releases through 0.3.x. CHANGELOG follows Keep a Changelog. SemVer.
  • Recent themes (0.3.10): maintainer-workflow suite (auto PR review, daily standup, contributor-reply triage), provider/model trust-the-SDK rewrite, looped-output passthrough ($LOOP_PREV_OUTPUT), mutates_checkout for same-checkout concurrency, autodetection of Claude/Codex binary paths.
  • Active roadmap items in issues: GitHub App auth (#1495), setup overhaul / archon doctor (#1494), security-dep churn for community adapters (#1485, #1486).
  • Direction: hardening for “fire-and-forget remote agentic coding” — Slack/Telegram/GitHub triggering, multi-platform unified inbox, more automation around PR review and triage. Not moving toward general orchestration, KB, or multi-agent.

10. Concrete fit assessment for harnesses

Re-read against Foundry harness requirements:

What Archon covers off-the-shelf:

  • Sequencing the pipeline. summarize-nextflow → nextflow-summary-to-galaxy-interface → nextflow-summary-to-galaxy-data-flow → … → run-workflow-test → debug-galaxy-workflow-output maps directly onto a nodes: list with depends_on. Each phase is a command: or prompt: node calling a generated skill (or wrapping gxwf/planemo via bash: or script:).
  • Approval gates across the autonomy-posture range. approval: for pure gates, loop: { interactive: true } for iterative HITL, plain non-interactive nodes for fully autonomous runs. The same workflow file can be authored interactive or batch by toggling interactive: and removing approval nodes — exactly the autonomy-posture knob you described.
  • State persistence and resumption. Run state in DB; cli workflow resume <id> skips completed nodes; approval pauses survive restart; per-run $ARTIFACTS_DIR is durable; cross-run <repo>/.archon/state/ is durable.
  • CLI integration for gxwf/planemo. script: nodes with runtime: uv + deps:, or bash: nodes. Stdout becomes $node.output for downstream nodes. output_format: JSON schema for Claude/Codex enforces structured outputs from validators.
  • Observability. workflow_events table + Web UI step viewer + JSONL logs. Probably exceeds what a hand-rolled harness would have.
  • Concurrent runs. Worktrees by default — useful if multiple Galaxy workflows are being foundry-cast at once.
  • Cast skill loading. Claude skills via per-node skills:; reusable prompt fragments via command: nodes; MCP servers via mcp:.

What requires non-trivial custom code on top of Archon:

  • Per-step looping over a sub-DAG. The “[loop over steps] discover-or-author tool → summarize tool → implement step → validate-galaxy-step → translate tests → assemble tests” sub-DAG cannot be expressed as a native for-each. Workarounds: (a) Ralph pattern in a single big loop: (one iteration per step, fresh context, plan-on-disk); (b) flatten by writing a shell driver that calls bun run cli workflow run <sub-workflow> once per step and then synthesizes — feasible but adds an outer driver; (c) fork the engine to add for_each: (rejected by KISS/YAGNI culture, would diverge from upstream). The Ralph pattern is what the bundled archon-piv-loop and archon-ralph-dag workflows do, and it works, but each iteration is a single prompt — sub-DAG per iteration is not native.
  • Routing within a phase (the discover-or-author branch). Easiest expression: a prompt: node returning structured output_format: { found: bool }, then two siblings with when: conditions (when: $discover.output.found == true vs false), with a cancel: on a third branch. This works but the when: expression evaluator is intentionally simple; complex routing graphs become hard to read.
  • Foundry’s casting step. Archon doesn’t compile or cast Molds — it consumes the generated skills or other cast artifacts. Casting lives outside Archon. Not a fit problem; just confirming the boundary.
  • Pattern/IWC-exemplar retrieval. Either keep them on the filesystem and grep from inside nodes, or pair with an external retriever. Not Archon’s job.
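The discover-or-author routing described above, spelled out as a sketch; the command names are invented and the when: expression syntax is assumed:

```yaml
- id: discover
  prompt: "Search existing tools for this step; report whether one fits."
  output_format:
    type: object
    properties:
      found: { type: boolean }
- id: use-existing
  depends_on: [discover]
  when: "$discover.output.found == true"
  command: wire-existing-tool     # hypothetical command name
- id: author-tool
  depends_on: [discover]
  when: "$discover.output.found == false"
  command: author-new-tool        # hypothetical command name
- id: bail
  depends_on: [discover]
  when: "$discover.output.found == null"
  cancel: "Discovery produced no usable signal."
```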

What Archon would actively get in the way of: very little. The main friction is if you want a different runtime model (durable async like Temporal, externally-scheduled, or a different agent runtime than Claude/Codex/Pi). Archon assumes one in-process Bun event loop, one of three SDKs.

Lightweight harness end of the spectrum (simple sequence of generated skills): Archon is overkill. A 10-line shell script or a small Python file calling the Anthropic SDK does this and incurs no DB, no daemon, no platform-adapter layer, no schema. But: if you’re going to write more than two harnesses, the marginal cost of the second Archon workflow is much lower than the second hand-rolled harness, because you already have observability, resumption, and a UI. Verdict: lightweight harnesses alone don’t justify Archon; lightweight + heavy mixed does.

Heavy harness end (gated, resumable, multi-step, observability-equipped): Archon is genuinely close to enough, with the per-step-loop caveat. Concretely it covers ~80% of the heavy-harness requirements, and the missing 20% (per-item sub-DAGs, complex routing) is workable via Ralph + when: patterns. You would not need to compose Archon with LangGraph or Temporal for the Foundry’s stated requirements. You might compose it with a small external retriever for patterns/IWC exemplars.

Concrete recommendation. Hybrid — lean on Archon as the harness substrate, but isolate the boundary. Author Foundry harnesses as Archon workflows under <repo>/.archon/workflows/, generated skills as Claude skills + .archon/commands/, and call gxwf/planemo via script: and bash: nodes. Keep casting entirely outside Archon. Keep IWC/pattern retrieval outside Archon (filesystem or MCP). Do not build deep dependencies on Archon’s DB schema or its TypeScript engine APIs — treat workflows-as-YAML as the only durable contract; that’s where extraction cost is low. Accept the per-step-loop limitation up front and design the per-step pipeline around the Ralph pattern (single big loop reading the step list from disk). Revisit in 3–6 months: if Archon hits 1.0 with the loop-over-list primitive added, double down; if v2 churn continues, the YAML + commands are still portable to a hand-rolled runner.

11. Alternatives worth comparing

  • LangGraph (Python). Graph-based agent orchestration with explicit state, conditional edges, checkpointing, and a strong “for each / branch / merge” vocabulary. Better than Archon at expressing per-step sub-DAGs and complex routing; weaker at being a product (no UI, no platform adapters, no built-in worktree/PR lifecycle). If the Foundry’s hardest problem is the per-step branching graph, LangGraph is a better orchestration kernel; but you would build all the Archon-shaped scaffolding (run DB, resumption, gates UI, multi-platform invocation) yourself.
  • Temporal / Inngest. Durable workflow orchestrators with rock-solid state, retries, and resumption — far stronger than Archon on those axes — but at the cost of being infrastructure, not a coding-agent product. Activities are arbitrary code; they have no concept of LLM nodes, approvals, skills, MCP, or worktrees. You’d be building the agent layer on top. Use only if “this must survive any restart, forever” is a hard requirement; for a research foundry it isn’t.
  • CrewAI / AutoGen. Multi-agent frameworks where “agents” with roles negotiate. Wrong shape for the Foundry’s deterministic-pipeline goal — you don’t want agents deciding the order, you want the harness deciding. Both are also weaker on resumption and gates than Archon.
  • Plain Python with the Anthropic SDK (the do-nothing-fancy baseline). Honest answer: for a single harness end-to-end, this is fine. You write one script per harness, you ship generated skills as files, you subprocess.run("gxwf …"). You lose: parallel reviewer fan-out, structured observability, the Web UI step viewer, resume-from-failure, multi-platform invocation. You gain: zero dependencies on a pre-1.0 engine, full control. Use this for the lightweight-harness end if you don’t already need Archon for the heavy end. If you need Archon at all, use it for both.
  • Claude Code itself, with skills only. A Claude skill is a small harness when its instructions and supporting scripts are well-shaped. For very simple sequencing (one phase calls another via the SlashCommand or Task tool), no orchestrator is needed. The Foundry’s “lightweight harness” end may genuinely be expressible as a single Claude skill that calls generated skills sequentially. Worth piloting before adopting Archon.

Recommendation

Adopt Archon as the substrate for heavy Foundry harnesses, treating workflow YAML and commands as the only durable contract. It gives you ~80% of the heavy-harness requirements off-the-shelf — sequencing, approval gates, retries, resumption, observability, parallel fan-out, worktree isolation, Claude-skill loading, MCP, structured outputs, and a usable Web UI — with the noteworthy limitation that “loop over a list of items, each running a sub-DAG” is not a native primitive (use the bundled Ralph pattern). Do not let it host pattern/IWC-exemplar retrieval; that’s the wrong shape and v2 deliberately removed the KB. For lightweight harnesses (simple sequence of generated skills) Archon is overkill in isolation but cheap once you’re already running it for the heavy end.

Next steps (prioritized):

  1. Build a throwaway proof-of-concept harness in Archon that exercises (a) a script: { runtime: uv } node calling gxwf validate, (b) a Ralph-pattern loop over a fake step list, (c) one approval: gate, (d) a cancel: branch from a when: route. Goal: confirm the per-step-loop ergonomics are tolerable for the real pipeline.
  2. Decide where Mold compilation lives (outside Archon) and where pattern/IWC retrieval lives (outside Archon — pick filesystem grep vs. MCP server). Document the boundary so the Foundry doesn’t drift Archon into KB territory.
  3. Pin to a specific Archon release in the Foundry’s docs and re-evaluate at each minor bump until 1.0; v2 semantics are still moving (CHANGELOG 0.3.10 alone changed provider/model resolution).
  4. If the proof-of-concept exposes that per-step sub-DAGs are too painful: prototype the same flow in LangGraph as a head-to-head comparison before committing.
  5. Skip CrewAI/AutoGen/Temporal/Inngest for now — wrong shape or too heavy for a research foundry.