ABSTRACT_CLAUDE

Achieving Reproducibility in the Age of Agents

Agent-driven science is changing how computational research gets done. The scale of the shift is staggering, and the word “slop” has become synonymous with agentic artifacts. If agents aren’t strongly encouraged to do science reproducibly, the democratization of data analysis will produce more tangles of bash scripts and more results without provenance. At the very moment when reproducibility is easier than ever, the reproducibility crisis may worsen.

We argue that the tools we’ve built to encourage humans to do reproducible data analysis — structured workflows, versioned artifacts, rich metadata, typed tool registries — are exactly the tools we need for agent-assisted science. We demonstrate this through two new Galaxy capabilities, both designed for humans but immediately exploitable by agents.

History Notebooks are Galaxy-flavored markdown documents attached to histories. They give every analysis a living narrative — with embedded dataset views, job parameter tables, and interactive visualizations — that persists and versions alongside the data. An AI assistant can inspect history contents and co-author the notebook, with every revision attributed via edit_source provenance (human, in-app agent, or external agent). Three usage modes share one infrastructure: a researcher documenting solo, a human co-authoring with the in-app agent, or an external agent (Claude Code via MCP, a CI pipeline) driving analysis and documenting its own work through the API. Crucially, when a workflow is extracted from a documented history, the notebook becomes the workflow’s built-in report — the narrative travels with the computation, so every future invocation ships with the story of why it exists and how to interpret its outputs.
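As a rough illustration of the attribution model described above, the sketch below keeps an append-only list of notebook revisions, each tagged with an edit_source. Only the edit_source field and its three categories come from the text; the class names, the enum string values, and the save and revisions_by methods are hypothetical and are not Galaxy's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class EditSource(str, Enum):
    # The three categories named in the text; the string values are assumed.
    HUMAN = "human"
    IN_APP_AGENT = "in_app_agent"
    EXTERNAL_AGENT = "external_agent"


@dataclass(frozen=True)
class Revision:
    number: int
    content: str
    edit_source: EditSource
    created_at: datetime


@dataclass
class Notebook:
    """Hypothetical append-only revision log for one history's notebook."""
    history_id: str
    revisions: list = field(default_factory=list)

    def save(self, content, edit_source):
        # EditSource(...) raises ValueError for an unknown source, so every
        # stored revision carries one of the three recognized attributions.
        revision = Revision(
            number=len(self.revisions) + 1,
            content=content,
            edit_source=EditSource(edit_source),
            created_at=datetime.now(timezone.utc),
        )
        self.revisions.append(revision)
        return revision

    @property
    def current(self):
        return self.revisions[-1] if self.revisions else None

    def revisions_by(self, edit_source):
        wanted = EditSource(edit_source)
        return [r for r in self.revisions if r.edit_source == wanted]


notebook = Notebook(history_id="example-history")
notebook.save("# Variant calling\nInitial notes.", "human")
notebook.save("# Variant calling\nAdded job parameter table.", "in_app_agent")
notebook.save("# Variant calling\nAdded QC summary.", "external_agent")
```

Because earlier revisions are never overwritten, a reviewer can list exactly which changes came from an agent, for example with notebook.revisions_by("external_agent"), before accepting them.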

Workflow State Validation brings per-tool parameter validation to Galaxy workflows offline, with no server required. Galaxy’s ToolShed serves typed parameter schemas for 10,000+ tools — full parameter trees with types, constraints, conditional logic, and valid option enumerations. Every major workflow system (Nextflow, Snakemake, WDL) now validates at the pipeline configuration level. Only Galaxy validates at the tool invocation level — every parameter name, value, select option, conditional branch, and collection type connection — because only Galaxy has a centralized registry with full parameter schemas. For agents composing multi-step pipelines, this provides structured, per-parameter error reports in milliseconds rather than trial-and-error execution cycles that take minutes to hours. The same infrastructure powers Format2 (YAML) as a first-class authoring surface for both agents and IDE users, with JSON Schema-powered auto-completion and hover documentation in VS Code.
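To make the idea of per-parameter validation concrete, here is a minimal sketch of checking one workflow step's state against a typed parameter schema without executing anything. The schema layout, the EXAMPLE_SCHEMA tool, and the validate_state function are all hypothetical and heavily simplified; they do not reproduce the ToolShed's schema format or Galaxy's validator.

```python
def validate_state(schema, state, path=""):
    """Return a list of per-parameter errors; an empty list means valid."""
    errors = []
    known = {param["name"] for param in schema}
    for key in state:
        if key not in known:
            errors.append(f"{path}{key}: unknown parameter")
    for param in schema:
        name, kind = param["name"], param["type"]
        where = f"{path}{name}"
        if name not in state:
            errors.append(f"{where}: missing required parameter")
            continue
        value = state[name]
        if kind == "integer":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{where}: expected integer, got {value!r}")
            elif "min" in param and value < param["min"]:
                errors.append(f"{where}: {value} is below minimum {param['min']}")
        elif kind == "text":
            if not isinstance(value, str):
                errors.append(f"{where}: expected text, got {value!r}")
        elif kind == "select":
            if value not in param["options"]:
                errors.append(f"{where}: {value!r} is not one of {param['options']}")
        elif kind == "conditional":
            if not isinstance(value, dict):
                errors.append(f"{where}: expected a mapping of branch parameters")
                continue
            selected = value.get(param["test"])
            if selected not in param["whens"]:
                errors.append(f"{where}.{param['test']}: {selected!r} is not a valid branch")
                continue
            # Only the parameters of the selected branch are legal here.
            test_param = {"name": param["test"], "type": "select",
                          "options": list(param["whens"])}
            branch = [test_param] + param["whens"][selected]
            errors.extend(validate_state(branch, value, f"{where}."))
    return errors


# Hypothetical schema for an imaginary aligner tool.
EXAMPLE_SCHEMA = [
    {"name": "threads", "type": "integer", "min": 1},
    {"name": "mode", "type": "select", "options": ["fast", "sensitive"]},
    {"name": "reference", "type": "conditional", "test": "source",
     "whens": {
         "cached": [{"name": "index", "type": "select", "options": ["hg38", "mm10"]}],
         "history": [{"name": "fasta", "type": "text"}],
     }},
]

good_state = {"threads": 4, "mode": "fast",
              "reference": {"source": "history", "fasta": "genome.fa"}}
bad_state = {"threads": 0, "mode": "turbo", "extra": 1,
             "reference": {"source": "cached", "index": "hg19"}}

good_errors = validate_state(EXAMPLE_SCHEMA, good_state)
bad_errors = validate_state(EXAMPLE_SCHEMA, bad_state)
```

In this sketch every problem is reported with a dotted path to the offending parameter (for example reference.index), which is the kind of structured feedback an agent can act on directly instead of parsing a failed job's log.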

Together these features illustrate a pattern: Galaxy’s reproducibility infrastructure isn’t overhead agents should bypass — it’s the feedback loop that makes agent-assisted science trustworthy. Typed tool metadata validates agent-composed workflows before execution. History Notebooks give agents a structured, versioned medium to document their reasoning, subject to human review. The result is agent-assisted analysis that is not just powerful but auditable, reproducible, and portable.