COMPONENT_ARCHON

Component: Archon (orchestration evaluation)

Source: https://github.com/coleam00/Archon, cloned to ~/projects/repositories/Archon at the dev-branch commit matching CHANGELOG 0.3.10 (2026-04-29). All file paths below are relative to that clone.

1. What is it?

Tagline (verbatim from README.md line 8): “The first open-source harness builder for AI coding. Make AI coding deterministic and repeatable.”

Elevator pitch (README.md line 23): “Archon is a workflow engine for AI coding agents. Define your development processes as YAML workflows — planning, implementation, validation, code review, PR creation — and run them reliably across all your projects.” It self-positions as “Like what Dockerfiles did for infrastructure and GitHub Actions did for CI/CD … Think n8n, but for software development.”

Category. A YAML-DAG workflow engine that wraps coding-agent SDKs (Claude Agent SDK, OpenAI Codex SDK, and a community Pi provider that fronts ~20 LLM backends). It is not a generic agent framework, not a multi-agent runtime, and not a RAG/KB system. It is closer in spirit to GitHub Actions for AI than to LangGraph or CrewAI. Platform adapters (Slack/Telegram/Discord/GitHub webhooks/Web/CLI) feed messages into a router that picks a workflow and runs it.

Important pivot. v1 of Archon was a Python-based “task management + RAG” multi-agent system. That codebase is archived on the archive/v1-task-management-rag branch and is no longer developed. The current main/dev is a complete rewrite in TypeScript/Bun (started ~Feb 2025) with a different premise. Anything written about Archon before mid-2025 is about a different product.

Maturity / activity.

Stated goals. Determinism and repeatability (“same workflow, same sequence, every time”), portability (workflows committed to your repo, run identically from CLI/Web/Slack/Telegram/GitHub), team-shareable processes, and isolation-by-default (every run gets its own git worktree). Stated constraints (from CLAUDE.md): a single-developer tool, not multi-tenant, with KISS/YAGNI/SRP enforced explicitly. Non-goals: scheduling, general task-graph compute, and agent-to-agent negotiation.

2. Core architecture and primitives

Archon’s vocabulary is small and concrete. The primitives are: workflow, node, command (prompt template), codebase (registered repo), conversation, session, isolation environment (worktree), workflow run, workflow event.

Composition model. Workflow-centric, not agent-centric. A workflow is a YAML DAG of nodes with depends_on edges. Independent nodes in the same topological layer run concurrently. The author writes the structure; the AI fills in the intelligence at each AI node.
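The layer-parallel model described above can be illustrated with a minimal sketch. It is not Archon's executor (which is TypeScript, in dag-executor.ts); the function name, the dict-based workflow shape, and the node ids are all hypothetical. It only shows how depends_on edges partition nodes into layers whose members could run concurrently.

```python
from typing import Dict, List


def topological_layers(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group node ids into layers whose dependencies all sit in earlier layers."""
    remaining = {node: set(deps) for node, deps in depends_on.items()}
    for node, deps in remaining.items():
        unknown = deps - set(remaining)
        if unknown:
            raise ValueError(f"{node} depends on unknown nodes: {sorted(unknown)}")

    layers: List[List[str]] = []
    done: set = set()
    while remaining:
        # A node is ready once every dependency has been placed in an earlier layer.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        layers.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return layers


# Hypothetical workflow shape (assumption): node id -> its depends_on list.
workflow = {
    "plan": [],
    "implement": ["plan"],
    "review_security": ["implement"],
    "review_tests": ["implement"],
    "synthesize": ["review_security", "review_tests"],
}
layers = topological_layers(workflow)
```

In this sketch the two reviewer nodes land in the same layer, which is the point at which a layer-parallel executor would run them concurrently before the join node.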

Node types (from packages/workflows/src/schemas/dag-node.ts, mutually exclusive within a node):

Per-node knobs (also in dagNodeBaseSchema): when: JS-expression conditions, trigger_rule: for join semantics (all_success | one_success | none_failed_min_one_success | all_done), context: fresh | shared, output_format: (JSON-schema for structured output, SDK-enforced for Claude/Codex), allowed_tools/denied_tools, hooks: (per-node SDK hooks), mcp: (per-node MCP server config path), skills: (per-node Claude skill preload — directly relevant), agents: (inline Claude sub-agent definitions usable via the Task tool), effort/thinking/maxBudgetUsd/systemPrompt/fallbackModel/betas/sandbox (Claude SDK passthrough), retry: (with backoff), idle_timeout:, model:/provider: overrides.
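To make the join semantics of trigger_rule concrete, here is a hedged sketch. The rule names come from the schema listed above, but the semantics below are inferred from the names alone (they resemble the same-named trigger rules in Apache Airflow) and have not been checked against dag-executor.ts. The Status enum and should_run function are hypothetical.

```python
from enum import Enum
from typing import List


class Status(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


def should_run(rule: str, upstream: List[Status]) -> bool:
    """Decide whether a join node runs, given terminal upstream statuses.

    Assumption: this is called only after every upstream node has finished.
    """
    succeeded = sum(s is Status.SUCCESS for s in upstream)
    failed = sum(s is Status.FAILED for s in upstream)
    if rule == "all_success":
        return succeeded == len(upstream)
    if rule == "one_success":
        return succeeded >= 1
    if rule == "none_failed_min_one_success":
        return failed == 0 and succeeded >= 1
    if rule == "all_done":
        # Every upstream node is terminal by assumption, whatever the outcome.
        return True
    raise ValueError(f"unknown trigger_rule: {rule}")
```

Under these assumed semantics, a join with one_success still runs when some reviewers fail, whereas all_success would block it.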

Storage / state. SQLite by default at ~/.archon/archon.db; PostgreSQL via DATABASE_URL. 8 tables, remote_agent_* prefix: codebases, conversations, sessions (immutable, transition-linked), messages, isolation_environments, workflow_runs (with working_path for resume detection — migration 019_workflow_resume_path.sql), workflow_events (step-level event log for observability), codebase_env_vars. Per-run artifacts live on disk under ~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<id>/ and are reachable from inside nodes via $ARTIFACTS_DIR. Per-run logs go to …/logs/. Cross-run repo-scoped state lives at <repo>/.archon/state/ (gitignored).

Execution model. Async, in-process (Bun event loop), per-run path-exclusive lock (overridable via mutates_checkout: false). Node concurrency is per-DAG-layer. The DAG executor lives at packages/workflows/src/dag-executor.ts; the loader+validator at loader.ts; orchestration that maps platform messages to runs at packages/core/src/orchestrator/.
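The idea of a per-run path-exclusive lock with an opt-out can be sketched in a few lines of asyncio. This is a conceptual illustration only: the PathLocks class and its hold method are hypothetical, and Archon's real implementation runs on Bun, not Python. Only the mutates_checkout flag name is taken from the source.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Dict, List


class PathLocks:
    """One asyncio.Lock per working path, all inside a single event loop."""

    def __init__(self) -> None:
        self._locks: Dict[str, asyncio.Lock] = {}

    @asynccontextmanager
    async def hold(self, working_path: str, mutates_checkout: bool = True):
        if not mutates_checkout:
            # The author asserts the run is race-free, so the lock is skipped.
            yield
            return
        lock = self._locks.setdefault(working_path, asyncio.Lock())
        async with lock:
            yield


async def demo() -> List[str]:
    locks = PathLocks()
    events: List[str] = []

    async def run(name: str, path: str) -> None:
        async with locks.hold(path):
            events.append(f"{name}:start")
            await asyncio.sleep(0.01)  # stands in for the workflow's work
            events.append(f"{name}:end")

    # Two runs target the same checkout, so the second waits for the first.
    await asyncio.gather(run("a", "/repo/wt1"), run("b", "/repo/wt1"))
    return events
```

Runs on different paths would not contend in this sketch, which mirrors the claim that multiple runs can proceed concurrently across worktrees.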

Multi-agent coordination. Limited and deliberate. Archon does not have an agent-to-agent negotiation layer. The closest things are: (a) parallel DAG layers (the archon-idea-to-pr workflow runs five reviewer nodes concurrently and joins them with trigger_rule: one_success into a synthesize node — exactly the multi-reviewer fan-out you’d expect); (b) per-node agents: blocks that define Claude sub-agents callable via the Task tool, scoped to a single node. There’s no “team of agents debating” abstraction. This is a feature, not a gap, given the determinism goal.

3. Tech stack and deployment

4. Skill / tool integration model

This is where Archon’s fit with the Foundry’s “cast skills” concept is closest.

5. Orchestration features

| Feature | Archon support |
| --- | --- |
| Sequencing / DAG | First-class. nodes with depends_on. Layer-parallel execution. |
| Conditional branching | when: expression on nodes; cancel: node terminates a branch. |
| Routing | The router (packages/workflows/src/router.ts) uses an LLM call to pick a workflow from descriptions, with a 4-tier name-resolution fallback (exact → case-insensitive → suffix → substring) and ambiguity detection. Within a workflow, branching is when:-based, not LLM-routed. |
| Looping / per-item iteration | loop: with until: signal, max_iterations, fresh_context, until_bash:, $LOOP_PREV_OUTPUT. The looped unit is a single prompt; there is no native “for each item in list, run this sub-DAG” primitive. To loop over a list of steps, you either drive iteration from inside the prompt (the archon-piv-loop “Ralph pattern” reads a plan from disk and picks one task per iteration) or you author N copies of the layer. This is a real limitation for the Foundry’s “loop over workflow steps” requirement. |
| Approval gates / HITL | approval: nodes (pure gate, optional capture_response, optional on_reject redraft). loop: { interactive: true, gate_message } for iterative HITL. CLI/web/platform /workflow approve <id> <text> and /workflow reject <id> <reason>. |
| Retry / failure | Per-node retry: with backoff (not on loop nodes; loops manage their own iteration). trigger_rule lets a join survive partial failures. Failures classified into structured error types (dag.node_empty_output, codex_stream_incomplete, etc.). |
| State persistence and resumption | Yes. Workflow runs persist to DB; working_path lets a re-run on the same branch find prior failed runs. cli workflow resume <run-id> re-runs, skipping completed nodes. Approval gates pause the run and survive process restarts. archon-piv-loop after-resume semantics handled explicitly (e.g., $LOOP_USER_INPUT only populated on first iteration after resume). |
| Observability / tracing | workflow_events table is a step-level event log (transitions, artifacts, errors). JSONL file logs per run under …/logs/. Web UI “Workflow Execution” view streams events. Pino structured logs with {domain}.{action}_{state} event naming. |
| Cost tracking | Per-node maxBudgetUsd: cap (Claude only, SDK passthrough). No global cost dashboard. |
| Concurrency | Per-DAG-layer parallelism inside a run. Multiple concurrent runs across worktrees. Same-checkout concurrency requires mutates_checkout: false (author asserts no race). |
| Worktree isolation | Default. cli workflow run --branch <name> or auto-generated name; --no-worktree opt-out. archon-resolve-conflicts, cli isolation cleanup, cli complete <branch> lifecycle. |
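The 4-tier name-resolution fallback in the Routing row can be sketched as follows. The tier order (exact, case-insensitive, suffix, substring) is taken from the table; everything else is an assumption, including the function and exception names, the choice to compare suffix and substring tiers case-insensitively, and the choice to raise on the first tier that yields more than one hit. The real logic lives in router.ts and may differ.

```python
from typing import List, Optional


class AmbiguousWorkflowError(Exception):
    """Raised when a tier matches more than one workflow name (hypothetical)."""


def resolve_workflow(requested: str, available: List[str]) -> Optional[str]:
    wanted = requested.lower()
    tiers = [
        lambda name: name == requested,               # 1. exact
        lambda name: name.lower() == wanted,          # 2. case-insensitive
        lambda name: name.lower().endswith(wanted),   # 3. suffix
        lambda name: wanted in name.lower(),          # 4. substring
    ]
    for matches in tiers:
        hits = [name for name in available if matches(name)]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            raise AmbiguousWorkflowError(f"{requested!r} matches {hits}")
    return None


# Workflow names mentioned elsewhere in this document.
names = ["archon-idea-to-pr", "archon-piv-loop", "archon-resolve-conflicts"]
```

With these names, "piv-loop" resolves at the suffix tier and "idea" at the substring tier, while "archon" matches all three at the substring tier and is reported as ambiguous instead of being guessed.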

6. Knowledge base / RAG features

There is no KB. v1 had RAG (Supabase + pgvector + Crawl4AI); v2 dropped it entirely. The closest things v2 has:

For the Foundry’s wish to host patterns and IWC exemplars in retrievable form, Archon is the wrong shape. You would either keep them as static files in <repo>/.archon/state/ or docs/ and grep from inside nodes, or pair Archon with an external retriever (MCP server, custom skill, separate vector DB). The repo’s authors deliberately removed the KB layer; it is not coming back.
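As a rough idea of what the static-file option could look like from inside a script node, here is a minimal sketch. The function name, the .md suffix, and the directory layout in the demo are all hypothetical; nothing here is an Archon API, and a real setup might just shell out to grep.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def grep_files(root: Path, keyword: str, suffix: str = ".md") -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, line) for case-insensitive keyword hits."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob(f"*{suffix}")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


def demo() -> List[Tuple[str, int, str]]:
    # Builds a throwaway pattern directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "patterns").mkdir()
        (root / "patterns" / "fanout.md").write_text(
            "# Fan-out\nRun reviewers in parallel.\n", encoding="utf-8")
        (root / "patterns" / "gate.md").write_text(
            "# Approval gate\nPause for a human.\n", encoding="utf-8")
        return grep_files(root, "approval")
```

Keyword search of this kind has no ranking or semantic matching, which is the trade-off against pairing Archon with an external retriever.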

7. Extensibility and customization

8. Failure modes and limitations

9. Roadmap and trajectory

10. Concrete fit assessment for harnesses

Re-read against Foundry harness requirements:

What Archon covers off-the-shelf:

What requires non-trivial custom code on top of Archon:

What Archon would actively get in the way of: very little. The main friction arises if you want a different runtime model (durable async like Temporal, externally-scheduled, or a different agent runtime than Claude/Codex/Pi). Archon assumes one in-process Bun event loop and one of three SDKs.

Lightweight harness end of the spectrum (simple sequence of cast skills): Archon is overkill. A 10-line shell script or a small Python file calling the Anthropic SDK does the job with no DB, no daemon, no platform-adapter layer, and no schema. However, if you’re going to write more than two harnesses, the marginal cost of the second Archon workflow is much lower than that of the second hand-rolled harness, because you already have observability, resumption, and a UI. Verdict: lightweight harnesses alone don’t justify Archon; a mix of lightweight and heavy harnesses does.

Heavy harness end (gated, resumable, multi-step, observability-equipped): Archon is genuinely close to enough, with the per-step-loop caveat. Concretely it covers ~80% of the heavy-harness requirements, and the missing 20% (per-item sub-DAGs, complex routing) is workable via Ralph + when: patterns. You would not need to compose Archon with LangGraph or Temporal for the Foundry’s stated requirements. You might compose it with a small external retriever for patterns/IWC exemplars.

Concrete recommendation. Hybrid — lean on Archon as the harness substrate, but isolate the boundary. Author Foundry harnesses as Archon workflows under <repo>/.archon/workflows/, cast skills as Claude skills + .archon/commands/, and call gxwf/planemo via script: and bash: nodes. Keep the Mold-compilation step entirely outside Archon. Keep IWC/pattern retrieval outside Archon (filesystem or MCP). Do not build deep dependencies on Archon’s DB schema or its TypeScript engine APIs — treat workflows-as-YAML as the only durable contract; that’s where extraction cost is low. Accept the per-step-loop limitation up front and design the per-step pipeline around the Ralph pattern (single big loop reading the step list from disk). Revisit in 3–6 months: if Archon hits 1.0 with the loop-over-list primitive added, double down; if v2 churn continues, the YAML + commands are still portable to a hand-rolled runner.
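The Ralph pattern recommended above (one loop that re-reads a step list from disk and handles one step per iteration) can be sketched outside Archon to show its control flow. This is an illustration under stated assumptions: the plan.json shape, the ralph_loop and demo names, and the step names are hypothetical, and in Archon the loop body would be a prompt inside a loop: node, not a Python callable. Only max_iterations is borrowed from the loop options listed earlier.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def ralph_loop(plan_path: Path, run_step: Callable[[str], bool],
               max_iterations: int = 10) -> bool:
    """Process one pending step per iteration; return True once all are done."""
    for _ in range(max_iterations):
        # State lives on disk, so each iteration can start from a fresh context.
        plan = json.loads(plan_path.read_text())
        pending = [step for step in plan["steps"] if not step["done"]]
        if not pending:
            return True
        step = pending[0]
        step["done"] = run_step(step["name"])
        plan_path.write_text(json.dumps(plan, indent=2))
    plan = json.loads(plan_path.read_text())
    return all(step["done"] for step in plan["steps"])


def demo(max_iterations: int) -> Tuple[bool, List[str]]:
    seen: List[str] = []

    def fake_step(name: str) -> bool:
        seen.append(name)  # stands in for the per-step agent work
        return True

    with tempfile.TemporaryDirectory() as tmp:
        plan_path = Path(tmp) / "plan.json"
        steps = [{"name": n, "done": False} for n in ("fetch", "validate", "report")]
        plan_path.write_text(json.dumps({"steps": steps}))
        finished = ralph_loop(plan_path, fake_step, max_iterations)
    return finished, seen
```

Because progress is written back to the plan file after every step, a run that exhausts its iteration budget can be restarted and will pick up at the first unfinished step, which is the property that makes the pattern a workable substitute for a per-item sub-DAG.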

11. Alternatives worth comparing

Recommendation

Adopt Archon as the substrate for heavy Foundry harnesses, treating workflow YAML and commands as the only durable contract. It gives you ~80% of the heavy-harness requirements off-the-shelf — sequencing, approval gates, retries, resumption, observability, parallel fan-out, worktree isolation, Claude-skill loading, MCP, structured outputs, and a usable Web UI — with the noteworthy limitation that “loop over a list of items, each running a sub-DAG” is not a native primitive (use the bundled Ralph pattern). Do not let it host pattern/IWC-exemplar retrieval; that’s the wrong shape and v2 deliberately removed the KB. For lightweight harnesses (simple sequence of cast skills) Archon is overkill in isolation but cheap once you’re already running it for the heavy end.

Next steps (prioritized):

  1. Build a throwaway proof-of-concept harness in Archon that exercises (a) a script: { runtime: uv } node calling gxwf validate, (b) a Ralph-pattern loop over a fake step list, (c) one approval: gate, (d) a cancel: branch from a when: route. Goal: confirm the per-step-loop ergonomics are tolerable for the real pipeline.
  2. Decide where Mold compilation lives (outside Archon) and where pattern/IWC retrieval lives (outside Archon — pick filesystem grep vs. MCP server). Document the boundary so the Foundry doesn’t drift Archon into KB territory.
  3. Pin to a specific Archon release in the Foundry’s docs and re-evaluate at each minor bump until 1.0; v2 semantics are still moving (CHANGELOG 0.3.10 alone changed provider/model resolution).
  4. If the proof-of-concept exposes that per-step sub-DAGs are too painful: prototype the same flow in LangGraph as a head-to-head comparison before committing.
  5. Skip CrewAI/AutoGen/Temporal/Inngest for now — wrong shape or too heavy for a research foundry.