# Format 2 and gxwf: Schema-Aware Authoring and Validation of Galaxy Workflows

## Abstract

Bioinformatics workflow systems have converged on a common pattern: pipelines are authored as text, validated before execution, and shared through community repositories. Galaxy has historically diverged from this pattern, treating workflows as predominantly GUI-authored runtime artifacts whose machine-readable native form encodes per-tool state opaquely. We close that gap. **Format 2** is the human- and agent-writable representation of Galaxy workflows: a concise YAML document whose step state is structured, named, and editable. **gxwf** is the schema-aware validation and authoring stack built on Format 2 and on Galaxy's existing tool metadata: it validates workflow structure, individual tool invocations, and connections — including Galaxy's collection semantics — against typed schemas served by the ToolShed for more than 10,000 community-maintained tools. The same validation core runs from a command-line interface, in a Visual Studio Code extension, in a browser-based editor with no Galaxy server required, and as a static check suitable for continuous integration over corpora such as the Intergalactic Workflow Commission (IWC) collection. The result is a single artifact — a Format 2 workflow — that a human can write, a reviewer can diff, an IDE can complete, an agent can generate, and a static checker can reject before any compute is consumed. We describe Format 2, the validation architecture, the loss-aware conversion between Format 2 and the native `.ga` representation, the IWC corpus round-trip results, and the four authoring surfaces that share the core. The contribution is not the existence of pre-execution validation — every major workflow system now offers some form of it — but its *depth*: schema-aware checking of individual scientific tool invocations, made possible by Galaxy's centralized, typed tool registry.

## Introduction

A workflow in modern bioinformatics is a multi-step computation: aligning sequencing reads, calling variants, quantifying transcripts, fitting models. The systems researchers use to build, share, and re-execute these computations have converged technically over the past decade. Nextflow [Di Tommaso 2017], Snakemake [Mölder 2021], and the Workflow Description Language (WDL) [OpenWDL] all expose workflows as text artifacts that fit in version control, accept community contributions through pull requests, and validate themselves before submitting a single job. Each ecosystem has matured its validation surface: Nextflow's nf-schema validates pipeline-level parameters against JSON Schema [nf-schema]; Snakemake offers `--lint`, `--dry-run`, and config-schema validation [Snakemake docs]; WDL has three independent implementations — womtool [Cromwell], miniwdl [miniwdl], and Sprocket [Sprocket] — that perform static type checking. These are real capabilities, and they have made text-first workflow authoring credible at scale. The Common Workflow Language [Crusoe 2022] provides a portable specification that adjacent platforms, including Galaxy, interoperate with.

Galaxy [Galaxy Community 2024] occupies a different position. Galaxy's strength is breadth: a centralized, curated, versioned registry of more than 10,000 bioinformatics tools [Blankenberg 2014], each described by a typed XML schema that declares parameters, types, constraints, conditional structure, valid select options, and collection requirements. The same metadata that powers Galaxy's graphical user interface — populating forms, validating user input, driving the workflow editor — is, in principle, available to any client that can speak to the ToolShed. In practice, that metadata has not been connected to text-based workflow authoring. Galaxy's native workflow format (`.ga`) is JSON, comprehensive, and operationally important, but its per-step `tool_state` is encoded as string-quoted JSON whose interpretation depends on tool-specific decoding rules. A human cannot reasonably edit a `.ga` workflow by hand; a code-review tool cannot meaningfully diff one; an agent cannot generate one without round-tripping through the GUI.

Format 2 is the response to that gap. Originally introduced as a YAML serialization for Galaxy workflows, Format 2 makes step state explicit and structured: parameter names are keys, values are typed, conditional branches are addressable, and the document as a whole is readable in the same way a Snakefile or a `main.nf` is readable. The `gxformat2` library [gxformat2] performs the underlying conversion between Format 2 and the native `.ga` representation. The format has existed for several years, has been referenced from prior Galaxy-side methodological work [Bray 2023], and has been used in a number of community projects, most notably the Intergalactic Workflow Commission's curated workflow corpus [IWC].

What has not existed, until now, is a validation and authoring stack that takes Format 2 seriously as a primary artifact. Writability without static checking is a regression, not progress: a YAML file that produces a runtime error after the first scheduled job has wasted both human time and compute. The contribution of this work is that stack. We call it **gxwf** — a command-line entry point, a TypeScript library distributed as the `@galaxy-tool-util/*` monorepo, a Visual Studio Code extension, and a browser-based workflow editor that share a single validation core.

The central technical claim is not that other systems lack validation. They have it, and they have invested heavily in it. The claim is one of *depth*. Pipeline-level validation can catch a misspelled output directory; whole-workflow type checking can catch a `File` connected to a `String`. Neither can catch an invalid alignment scoring option that will silently produce wrong results, because the parameter contracts of individual scientific tools — BWA-MEM's gap-open penalty, samtools' index format, DESeq2's design formula — are not expressed in those systems' type systems. They are implicit in the underlying process scripts or rule blocks. In Galaxy they are explicit, machine-readable, queryable, and versioned. gxwf is the layer that puts that metadata to use at authoring time.

The remainder of the paper is organized as follows. We describe Format 2 as the writable artifact and contrast it concretely with the native `.ga` form. We then describe schema-aware validation across three layers: per-step state, per-connection types including collection semantics, and conditional state consistency. We report results from validating and round-tripping the IWC workflow corpus. We describe the four authoring surfaces that share the validation core. We compare the resulting validation depth against the principal text-based workflow systems. We close with a discussion of implications for agent-assisted workflow authoring, ecosystem direction, and the limits of the approach.

## Format 2: A Writable Galaxy Workflow

A Galaxy workflow declares inputs, tool steps, the connections between them, and the state (the parameterization) of each step. The native `.ga` representation captures all of this in JSON. The per-step `tool_state` field is itself a JSON document, encoded as a string, whose schema depends on the tool. A small fragment of a real workflow's native form appears below; for clarity, we have elided every step field other than `id`, `tool_id`, and `tool_state`:

```json
{
  "id": 2,
  "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa_mem/bwa_mem/0.7.17.2",
  "tool_state": "{\"__current_case__\": 0, \"analysis_type\": {\"analysis_type_selector\": \"illumina\", \"__current_case__\": 0}, \"fastq_input\": {\"fastq_input_selector\": \"paired\", \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}, \"fastq_input2\": {\"__class__\": \"ConnectedValue\"}, \"iset_stats\": \"\"}, \"output_sort\": \"coordsorted\", \"reference_source\": {\"reference_source_selector\": \"cached\", \"__current_case__\": 0, \"ref_file\": \"hg38\"}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}}"
}
```

A human reading this for the first time has to perform several decodings: parse the outer JSON, parse the inner string-quoted JSON, recognize `ConnectedValue` as the encoding for an upstream step output, interpret `__current_case__` indices as conditional-branch selectors keyed positionally rather than by name, and reconcile the implicit hierarchy of the parameter tree against the tool's XML schema. Diffing two versions of this state across a parameter change is meaningful only if the reader performs all of those decodings simultaneously.
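The decoding burden can be made concrete with a minimal Python sketch. It uses a trimmed copy of the fragment above, and the helper names `decode_tool_state` and `connected_paths` are ours for illustration; they are not part of `gxformat2` or gxwf. The sketch only handles nested objects, not repeats.

```python
import json

# Trimmed copy of the native fragment: tool_state is a JSON document stored as a string.
native_step = {
    "id": 2,
    "tool_state": json.dumps({
        "fastq_input": {
            "fastq_input_selector": "paired",
            "fastq_input1": {"__class__": "ConnectedValue"},
            "fastq_input2": {"__class__": "ConnectedValue"},
        },
        "output_sort": "coordsorted",
        "reference_source": {
            "reference_source_selector": "cached",
            "__current_case__": 0,
            "ref_file": "hg38",
        },
    }),
}


def decode_tool_state(step):
    """Second decoding pass: parse the string-quoted inner JSON."""
    return json.loads(step["tool_state"])


def connected_paths(node, prefix=()):
    """Yield pipe-joined paths of parameters marked as fed by an upstream step."""
    if isinstance(node, dict):
        if node.get("__class__") == "ConnectedValue":
            yield "|".join(prefix)
            return
        for key, value in node.items():
            yield from connected_paths(value, prefix + (key,))


state = decode_tool_state(native_step)
print(sorted(connected_paths(state)))
print(state["reference_source"]["__current_case__"])
```

Even in this reduced form, the reader needs two parsing passes and a tree walk before the connections and the selected conditional branch become visible.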

The same step in Format 2 is a YAML object whose structure mirrors the tool's logical parameter tree:

```yaml
- label: bwa_mem
  tool_id: toolshed.g2.bx.psu.edu/repos/devteam/bwa_mem/bwa_mem/0.7.17.2
  in:
    fastq_input|fastq_input1: trim_galore/reads_1
    fastq_input|fastq_input2: trim_galore/reads_2
  state:
    analysis_type:
      analysis_type_selector: illumina
    fastq_input:
      fastq_input_selector: paired
    output_sort: coordsorted
    reference_source:
      reference_source_selector: cached
      ref_file: hg38
    rg:
      rg_selector: do_not_set
```

The differences are not cosmetic. The Format 2 representation makes named structure explicit (`reference_source.ref_file: hg38` rather than positional case indices), uses connection labels (`trim_galore/reads_1`) instead of opaque `ConnectedValue` placeholders, and elides the bookkeeping fields (`__current_case__`) that the runtime maintains. A reviewer can diff a parameter change. An editor can complete a key against the tool schema. A CI system can run this through validation and refuse a pull request. An agent can produce this from a natural-language specification with substantially less context and fewer encoding errors than generating well-formed native `.ga` would involve.

We treat the relationship between Format 2 and the native representation as bidirectional and loss-aware, not as an export pipeline. Both formats are first-class. The native `.ga` form remains operationally important — Galaxy servers consume and emit it, and the long tail of existing workflows is encoded in it — and so the authoring stack must read, write, validate, and round-trip both. We return to this in the Conversion section.

## Schema-Aware Validation

The validation core, distributed as the `@galaxy-tool-util/*` TypeScript monorepo, consumes a workflow (in either Format 2 or native form) and a source of tool metadata, and produces a structured validation report. The metadata source is pluggable: a Galaxy server [Galaxy Community 2024], a local cache of ToolShed-fetched schemas [Blankenberg 2014; GA4GH TRS], or a directory of tool XML files. Validation starts with a structural baseline and then runs in three schema-aware layers.

**Structural validation** checks the workflow document itself: required top-level fields are present, step references are resolvable, inputs are declared, outputs are reachable, and the workflow is acyclic. Structural validation is necessary but shallow, and is functionally comparable to what every other workflow system already provides. We do not claim it as a contribution.
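As an illustration of what the structural layer involves, the following Python sketch checks two of the listed properties, resolvable step references and acyclicity, over a deliberately simplified workflow model. The data shape (a set of input labels plus a mapping from step label to connections) and the function name `structural_problems` are assumptions made for this example, not the validator's actual data model.

```python
from graphlib import CycleError, TopologicalSorter


def structural_problems(inputs, steps):
    """Return structural problems for a simplified workflow.

    inputs: set of workflow input labels.
    steps:  {step_label: {input_name: "source_label/output" or "input_label"}}
    """
    problems = []
    graph = {}
    for label, connections in steps.items():
        upstream = set()
        for input_name, source in connections.items():
            source_label = source.split("/", 1)[0]
            if source_label in steps:
                upstream.add(source_label)
            elif source_label not in inputs:
                problems.append(f"{label}.{input_name}: unresolved source '{source}'")
        graph[label] = upstream
    try:
        # static_order() raises CycleError when the step graph is not acyclic.
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        problems.append("workflow graph contains a cycle")
    return problems


workflow_steps = {
    "trim_galore": {"input": "reads"},
    "bwa_mem": {"fastq_input|fastq_input1": "trim_galore/reads_1"},
}
print(structural_problems({"reads"}, workflow_steps))
```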

**Per-step state validation** is the first depth contribution. Given a step's `tool_id` (a versioned ToolShed identifier) and its declared state, the validator resolves the tool's schema and checks the state against it parameter by parameter. Parameter names are checked against the tool's declared parameter tree: an unknown key (a misspelling or a parameter that has been removed in a newer tool version) produces a warning with a precise source range. Parameter values are checked against their declared types: an integer parameter cannot hold a string, a float parameter cannot hold an arbitrary expression. Select-input values are checked against the tool's declared option enumeration: a select parameter that accepts `coordsorted`, `queryname`, or `unsorted` will reject any other value, with the legal options listed in the error message. Conditional inputs are descended into: when a conditional has selected its `illumina` branch, the validator checks the inputs declared inside that branch and ignores the inputs declared inside the unselected branches. Repeats and sections are descended into the same way.

This is the validation that other workflow systems cannot easily perform, not because their type systems are weaker, but because the parameter contracts of the tools they orchestrate are not declared in machine-readable form. A Nextflow process declares its inputs and outputs in the process header; the per-tool flags inside the process script are free-form shell. A Snakemake rule declares its inputs, outputs, and shell or run block; the meaning of `--genome hg38` inside a rule's shell command is invisible to the lint checker. WDL types its inputs and outputs at the task level; the tool inside a task's command section is still an opaque shell invocation. Galaxy is structurally different because tools are not embedded; they are referenced by versioned ID, and their schemas live in a registry the validator can query.
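The per-step checks described above can be sketched in a few lines of Python. The schema below is hand-written and hypothetical: real schemas are derived from tool XML, and the parameter names `min_seed_length` and `gap_open_penalty`, the `full` case, and the dictionary layout are assumptions made for illustration. Only the `output_sort` option set and the `illumina` case come from the text.

```python
# Hypothetical, hand-written schema; real schemas are derived from tool XML.
SCHEMA = {
    "output_sort": {"type": "select", "options": ["coordsorted", "queryname", "unsorted"]},
    "min_seed_length": {"type": "integer"},  # hypothetical parameter
    "analysis_type": {
        "type": "conditional",
        "selector": "analysis_type_selector",
        "cases": {
            "illumina": {},
            "full": {"gap_open_penalty": {"type": "integer"}},  # hypothetical case
        },
    },
}


def validate_state(state, schema, path=""):
    """Return (severity, parameter path, message) tuples for one step's state."""
    diagnostics = []
    for key, value in state.items():
        where = f"{path}{key}"
        spec = schema.get(key)
        if spec is None:
            diagnostics.append(("warning", where, "unknown parameter"))
        elif spec["type"] == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                diagnostics.append(("error", where, "expected an integer"))
        elif spec["type"] == "select":
            if value not in spec["options"]:
                legal = ", ".join(spec["options"])
                diagnostics.append(("error", where, f"expected one of: {legal}"))
        elif spec["type"] == "conditional":
            diagnostics.extend(validate_conditional(value, spec, where + "."))
    return diagnostics


def validate_conditional(value, spec, path):
    """Descend into the selected case only; other cases are out of scope."""
    selector = spec["selector"]
    if not isinstance(value, dict) or value.get(selector) not in spec["cases"]:
        return [("error", path + selector, "selector is not a declared case")]
    branch_schema = spec["cases"][value[selector]]
    branch_state = {k: v for k, v in value.items() if k != selector}
    return validate_state(branch_state, branch_schema, path)


example_state = {
    "output_sort": "coordinate",
    "min_seed_length": "19",
    "outptu_sort": "queryname",
    "analysis_type": {"analysis_type_selector": "illumina", "gap_open_penalty": 6},
}
for diagnostic in validate_state(example_state, SCHEMA):
    print(diagnostic)
```

In this sketch a parameter that belongs to an unselected case is reported as unknown under the selected one, which is the simplest reading of case-scoped descent; source ranges are omitted.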

**Per-connection validation** is the second depth contribution. A Galaxy connection between two steps is more than a producer-consumer link: it carries information about Galaxy's collection system. A tool that produces a `list` collection connected to a tool that consumes a single `dataset` causes the downstream tool to map over the list — implicit parallelism. A `list:paired` collection connected to a tool that consumes a `paired` collection maps over the outer `list` dimension, processing each pair in turn. These map-over semantics, with their reduction edges and collection-depth bookkeeping, are operationally central to Galaxy and have historically been validated only at runtime.

The validation core models collection types using the `CollectionTypeDescription` library, originally part of the Galaxy server and now extracted as the zero-dependency `@galaxy-tool-util/workflow-graph` package. For every connection in a workflow, the validator computes the producing output's collection type, the consuming input's required type, the resulting map-over dimension if any, and the implied collection type of every downstream step. Connections that imply incompatible collection types — for example, a `paired` output connected to an input that expects `list:paired` without an intervening operation that supplies the outer list — are rejected statically. The same package backs the diagram renderer in the browser editor, so the depiction of map-over depth in a visual diagram and its acceptance by the static validator are guaranteed to agree.
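The core of the map-over computation can be sketched as a suffix match over colon-separated collection ranks. This Python sketch is a simplification under stated assumptions: it represents a plain dataset as the empty string and ignores any additional subtyping rules the real `CollectionTypeDescription` implements. The function name `map_over` is ours.

```python
def map_over(output_type, input_type):
    """Return the collection type that is mapped over, or None for a direct match.

    Types are colon-separated ranks such as "list:paired"; a plain dataset is "".
    Raises ValueError when the connection is incompatible.
    """
    out_ranks = output_type.split(":") if output_type else []
    in_ranks = input_type.split(":") if input_type else []
    if len(in_ranks) > len(out_ranks):
        raise ValueError(f"'{output_type}' cannot supply '{input_type}'")
    # The consumer's ranks must be the innermost ranks of the producer.
    if in_ranks and out_ranks[len(out_ranks) - len(in_ranks):] != in_ranks:
        raise ValueError(f"'{output_type}' does not end in '{input_type}'")
    outer = out_ranks[: len(out_ranks) - len(in_ranks)]
    return ":".join(outer) or None


print(map_over("list", ""))                 # list mapped over a dataset input
print(map_over("list:paired", "paired"))    # outer list is mapped over
print(map_over("list:paired", "list:paired"))
```

The rejected case from the text, a `paired` output feeding a `list:paired` input, raises `ValueError` here because the producer has fewer ranks than the consumer requires.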

**Conditional and stale-state validation** is the third layer. Galaxy tool schemas frequently express conditional structure: a `reference_source_selector` set to `cached` exposes one set of inputs (a select over Galaxy-cached references), while `history` exposes another (a dataset connection to a user-uploaded reference). The native `.ga` form encodes the chosen branch via the integer `__current_case__` bookkeeping field; Format 2 makes the choice implicit by which sub-keys appear under the conditional. The validator checks that the selected branch's required inputs are present, that the unselected branches' inputs are absent (or, configurably, that they may persist as stale-state), and that the conditional's selector value is one of the declared cases.
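The three conditional checks can be illustrated with a short Python sketch. It assumes a hypothetical case table in which every input of the selected case is required; the input name `ref_history` and the function `check_conditional` are invented for the example, and the `allow_stale` flag stands in for the configurable policy mentioned above.

```python
# Hypothetical case table: selector value -> inputs exposed by that case.
REFERENCE_SOURCE_CASES = {
    "cached": {"ref_file"},
    "history": {"ref_history"},  # hypothetical input name
}


def check_conditional(state, selector, cases, allow_stale=False):
    """Check selector validity, required inputs, and leftovers from other cases."""
    chosen = state.get(selector)
    if chosen not in cases:
        return [("error", f"{selector}: '{chosen}' is not a declared case")]
    diagnostics = []
    present = set(state) - {selector}
    for name in sorted(cases[chosen] - present):
        diagnostics.append(("error", f"{name}: required by case '{chosen}'"))
    # Inputs that only exist in cases other than the selected one.
    other_inputs = set().union(*(v for k, v in cases.items() if k != chosen)) - cases[chosen]
    for name in sorted(present & other_inputs):
        severity = "warning" if allow_stale else "error"
        diagnostics.append((severity, f"{name}: belongs to an unselected case"))
    return diagnostics


good = {"reference_source_selector": "cached", "ref_file": "hg38"}
mixed = {"reference_source_selector": "cached", "ref_history": "reference/output"}
print(check_conditional(good, "reference_source_selector", REFERENCE_SOURCE_CASES))
print(check_conditional(mixed, "reference_source_selector", REFERENCE_SOURCE_CASES))
```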

Stale state — parameters that were valid in a previous tool version but no longer exist in the current one — is a chronic source of friction in long-lived Galaxy workflows. Tool authors add, remove, and reorganize parameters across versions, and workflow state captured against an old version can carry parameter names that no longer mean anything to the new tool. The validation core classifies stale state into removed-parameter, renamed-parameter (when a heuristic match succeeds), and conditional-branch-vestigial categories, and exposes a `clean` operation that the CLI surfaces as `gxwf clean`. The policy — what to strip, what to flag, what to preserve — is configurable, and the result is itself a validated workflow.
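A minimal Python sketch of the removed-versus-renamed distinction follows. The text does not specify the rename heuristic, so this example substitutes string similarity from the standard library's `difflib`; the cutoff of 0.8, the parameter names, and the function `classify_stale` are assumptions. The conditional-branch-vestigial category is not covered.

```python
import difflib


def classify_stale(state_keys, schema_keys, cutoff=0.8):
    """Classify state keys that the current tool schema no longer declares."""
    report = {}
    for key in state_keys:
        if key in schema_keys:
            continue
        # Stand-in heuristic: treat a close string match as a probable rename.
        match = difflib.get_close_matches(key, schema_keys, n=1, cutoff=cutoff)
        report[key] = ("renamed", match[0]) if match else ("removed", None)
    return report


old_state_keys = ["output_sort", "min_seed_len", "legacy_flag"]  # hypothetical names
current_schema_keys = ["output_sort", "min_seed_length"]         # hypothetical names
print(classify_stale(old_state_keys, current_schema_keys))
```

A cleaning policy could then strip the removed keys, rewrite the renamed ones, or merely flag both, depending on configuration.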

## Loss-Aware Conversion and Corpus Validation

A writable format is credible only if it is *interconvertible* with the format the runtime actually consumes. Format 2 and native `.ga` are two encodings of the same underlying Galaxy workflow graph, but the encodings carry different bookkeeping. The native form preserves runtime fields — uuids, position metadata, internal IDs — that Format 2 does not require and does not represent. The Format 2 form carries higher-level structure — named conditionals, explicit step labels — that the native form encodes positionally.

We treat conversion as a tool-aware operation backed by the same schema metadata that drives validation. Converting a `.ga` workflow to Format 2 is not a syntactic transformation; it is a re-encoding of each step's `tool_state` against the tool's schema, which requires resolving conditional cases, distinguishing connected-value placeholders from literal values, and reconstructing the named structure that the native form encodes positionally. Converting Format 2 to `.ga` performs the inverse, generating the bookkeeping fields the runtime expects and serializing the state in the native string-quoted JSON form.
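For a single conditional, the schema dependence of conversion can be sketched as follows. The Python example assumes that the tool declares the cases of `reference_source` in the order `cached`, `history` (consistent with the index 0 shown for `cached` in the native fragment, but otherwise an assumption), and the helper names are hypothetical. Real conversion also handles connections, repeats, and sections.

```python
import json

CASE_ORDER = ["cached", "history"]  # assumed declaration order in the tool schema


def to_native_conditional(state, selector, case_order):
    """Format 2 -> native: derive the positional index from the named selector value."""
    native = dict(state)
    native["__current_case__"] = case_order.index(state[selector])
    return native


def from_native_conditional(native, selector, case_order):
    """Native -> Format 2: drop the bookkeeping field after checking it is consistent."""
    state = {k: v for k, v in native.items() if k != "__current_case__"}
    index = native.get("__current_case__")
    if index is not None and case_order[index] != state[selector]:
        raise ValueError("__current_case__ disagrees with the selector value")
    return state


reference_source = {"reference_source_selector": "cached", "ref_file": "hg38"}
native = to_native_conditional(reference_source, "reference_source_selector", CASE_ORDER)
# The native form stores the whole state as string-quoted JSON.
encoded = json.dumps({"reference_source": native})
print(encoded)
```

Without the case order from the schema, the index cannot be generated, and a native index cannot be checked against its selector; this is why neither direction is purely syntactic.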

We use round-trip equivalence as the operational measure of conversion quality. A native workflow `W` is converted to Format 2 to produce `W'` and back to native to produce `W''`. We compare `W` and `W''`: an unchanged round-trip is the desired result; a benign diff — for example, a regenerated UUID — is acceptable; a state-altering diff is a bug in the conversion or, occasionally, a latent inconsistency in `W` itself that the round-trip surfaces.
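The three-way classification can be sketched in Python as a comparison at two levels of strictness. The set of benign keys (here only `uuid`), the treatment of key order as benign, and the function names are assumptions for this illustration; the actual comparison rules in gxwf may differ.

```python
import json

BENIGN_KEYS = {"uuid"}  # assumption: fields whose regeneration does not alter state


def strip_benign(node):
    """Drop benign fields and decode string-quoted tool_state for comparison."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key in BENIGN_KEYS:
                continue
            if key == "tool_state" and isinstance(value, str):
                value = json.loads(value)
            cleaned[key] = strip_benign(value)
        return cleaned
    if isinstance(node, list):
        return [strip_benign(item) for item in node]
    return node


def classify_round_trip(original, round_tripped):
    """Compare W with W'' and return 'unchanged', 'benign', or 'state-altering'."""
    if json.dumps(original) == json.dumps(round_tripped):
        return "unchanged"
    # sort_keys makes the comparison insensitive to key reordering.
    if json.dumps(strip_benign(original), sort_keys=True) == json.dumps(
        strip_benign(round_tripped), sort_keys=True
    ):
        return "benign"
    return "state-altering"


w = {"uuid": "a", "steps": {"0": {"tool_state": '{"a": 1, "b": 2}'}}}
w_benign = {"steps": {"0": {"tool_state": '{"b": 2, "a": 1}'}}, "uuid": "b"}
w_altered = {"uuid": "a", "steps": {"0": {"tool_state": '{"a": 1, "b": 3}'}}}
print(classify_round_trip(w, w_benign), classify_round_trip(w, w_altered))
```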

The Intergalactic Workflow Commission [IWC] curates a corpus of Galaxy workflows covering RNA-seq, ChIP-seq, variant calling, mass spectrometry, single-cell, and imaging pipelines. The corpus is community-maintained, version-controlled, and continuously updated as new analyses are contributed. We use the corpus as the empirical testbed for the validation core: every workflow is validated, converted, and round-tripped with the published `gxwf` binary, and we report those results here. As of the time of writing, the corpus contains [NUMBER] workflows. Of these, [NUMBER] validate cleanly under the strict-state policy, [NUMBER] surface stale-state diagnostics that can be cleaned automatically, and [NUMBER] surface diagnostics that require human attention. Round-trip equivalence holds for [NUMBER] workflows; benign diffs (UUID regeneration, key reordering) account for [NUMBER]; state-altering diffs account for [NUMBER] and are tracked as bugs.

The corpus is not a held-out test set in the machine-learning sense — we use it both to develop the validator and to report on it — but it is a community-curated artifact whose composition was determined independently of our work. Its function in this paper is empirical: it bounds the claims we can make about what the validator actually catches on real workflows authored by real Galaxy contributors.

## Authoring Surfaces

The validation core is a TypeScript library, but a library does not constitute an authoring experience. We describe four surfaces that consume the core. All four share the same validator, the same tool-metadata cache, and the same conversion logic; they differ in how they present diagnostics, complete keys, and integrate with the rest of a researcher's environment.

### Command-Line Interface

The `gxwf` command-line tool is the lowest-level authoring surface and the substrate for the other three. It exposes the validation core as a set of subcommands operating over Format 2 and native workflow files: `gxwf validate` produces a structured validation report; `gxwf convert` performs format conversion with validation; `gxwf clean` performs stale-state cleaning under a configurable policy; `gxwf lint` performs structural and best-practice checks; `gxwf mermaid` and `gxwf cytoscapejs` render workflow diagrams suitable for embedding in documentation or rendering in a browser. A parallel family of `-tree` subcommands (`validate-tree`, `lint-tree`, `roundtrip-tree`, and others) operates over a directory of workflows and emits HTML or Markdown summary reports via `--report-html` and `--report-markdown`. Adjacent binaries — `galaxy-tool-cache` for managing the local tool-metadata cache, `galaxy-tool-proxy` for serving a cache to other clients, and `gxwf-web` for serving the browser editor — round out the CLI surface; `gxwf` alone exposes 24 subcommands and `galaxy-tool-cache` adds 7. The library is distributed as a pnpm monorepo of 10 packages published on npm; the binaries are also published as standalone executables for users who prefer not to install a Node runtime.

A strict-mode flag family (`--strict-structure`, `--strict-encoding`, `--strict-state`, and the convenience `--strict`) controls the diagnostic threshold. The default mode reports warnings on issues that may be benign — unknown parameters under permissive tool versions, stale-state remnants, conventional formatting drift — and errors only on issues that would cause runtime failure. Strict modes promote categories of warning to errors, enabling CI configurations that refuse pull requests with any unresolved diagnostic.
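The promotion behavior can be modeled in a few lines of Python. This is a sketch of the policy as described, not of gxwf's implementation: the category names (`structure`, `encoding`, `state`), their one-to-one mapping to the flags, and the exit status of 1 are assumptions.

```python
# Assumed mapping from strict flags to the diagnostic categories they promote.
STRICT_FLAGS = {
    "--strict-structure": {"structure"},
    "--strict-encoding": {"encoding"},
    "--strict-state": {"state"},
    "--strict": {"structure", "encoding", "state"},
}


def effective_severity(diagnostic, flags):
    """Promote a warning to an error when its category is under a strict flag."""
    severity, category, _message = diagnostic
    promoted = set().union(*(STRICT_FLAGS[flag] for flag in flags))
    if severity == "warning" and category in promoted:
        return "error"
    return severity


def exit_status(diagnostics, flags):
    """Return a non-zero status if any diagnostic is an error after promotion."""
    return 1 if any(effective_severity(d, flags) == "error" for d in diagnostics) else 0


report = [
    ("warning", "state", "stale parameter"),
    ("warning", "structure", "step has no label"),
]
print(exit_status(report, []), exit_status(report, ["--strict-state"]))
```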

### Visual Studio Code Extension

The Visual Studio Code extension `galaxy-workflows-vscode` brings the validation core into a working editor environment. The extension communicates with the validator through a Language Server Protocol [LSP] interface, so the same validation logic that runs in the CLI runs in the editor without re-implementation. The extension provides diagnostics, completions, hover documentation, conversion commands, and tool-discovery and navigation features over both `.gxwf.yml` and `.ga` files.

Diagnostics are tool-aware. An unknown parameter inside a `state:` block produces a warning with a precise sub-range — not the whole step, but the specific YAML key. An invalid select value produces an error whose message includes the legal options, again with sub-range precision. Conditional and repeat structures are descended into the same way the standalone validator descends, so a parameter that would be legal under one branch and illegal under another is reported correctly against whichever branch is currently selected. Diagnostics update incrementally as the user edits.

Completions are parameter-aware. Inside a `state:` block, the extension completes parameter keys against the tool's schema, including descent into selected conditional branches and into `section` and `repeat` containers. Value completions are offered for select inputs (the legal option set) and for typed inputs where the schema declares a constraint. Completions reflect the structure of the *currently selected* conditional case, which is the behavior a working editor experience requires but which is non-trivial to implement against a representation where the selection is implicit.

Hover documentation surfaces tool-XML content — parameter name, type, label, help text, and select-option labels — directly in the editor. A user editing a `bwa_mem` step's `output_sort` parameter sees the human-readable label and help for that specific parameter without leaving the editor or consulting external documentation.

A quick-fix code action repairs legacy string-encoded `tool_state` in older `.ga` files, and six conversion commands support previewing, sibling-writing, and in-place conversion between Format 2 and native forms.

The extension also assists with tool discovery and navigation. An *Insert Tool Step* command queries the ToolShed search API and presents the results in a quick-pick — each hit showing the tool name, its `owner/repo` provenance, a description, and a one-click link to the tool's ToolShed page — so a step can be added without leaving the editor or knowing a tool's fully qualified identifier in advance. A *Workflow Tools* tree view enumerates every tool a workflow references, and a CodeLens over each `tool_id` opens that tool's ToolShed page directly, turning the otherwise opaque identifier strings in a workflow document into navigable references back to their source repositories.

Tool metadata is resolved on demand: opening a workflow triggers background resolution of its referenced tools against the configured ToolShed, populating the local cache so that diagnostics and completions become available without an explicit fetch step. A `Populate Tool Cache` command performs the same batch fetch eagerly from ToolShed TRS endpoints; a status-bar indicator reflects cache state, and the extension is fully usable offline once the cache is warm.

Architecturally, the extension consumes the same `@galaxy-tool-util/schema` and `@galaxy-tool-util/core` packages that back the CLI. An earlier generation of the extension vendored its own schema sources and TRS client; that code has been retired in favor of dependencies on the published packages. The contribution this consolidation makes is not new functionality but a guarantee: any diagnostic the user sees in the editor is the same diagnostic the CLI would report, the browser editor would report, and the CI checker would report.

Because its language servers are packaged as web workers, the extension also runs as a *web extension*: the same `galaxy-workflows-vscode` build loads in the browser-hosted VS Code at vscode.dev and github.dev, where the validation core executes entirely in-browser with no local installation and no Galaxy server. This is distinct from the standalone `gxwf-ui` editor described next — the web extension is the full VS Code workbench hosting the extension, whereas `gxwf-ui` is a purpose-built single-page application that embeds the same language server. Both run the identical validation core in the browser; they differ in host and audience. (The VS Code figures in this paper were captured from this browser-hosted extension.)

### Browser-Based Editor

The browser editor, distributed as `gxwf-ui`, runs the same validator in a web browser, with no Galaxy server required. It is a Vue 3 + PrimeVue single-page application that embeds the Monaco editor [Monaco] and the `galaxy-workflows-vscode` language server (via `@codingame/monaco-vscode-api`), backed by an IndexedDB-resident tool cache. The same Language Server Protocol interface [LSP] that drives the VS Code extension drives the in-browser editor; a diagnostic surfaced in one is, by construction, the same diagnostic surfaced in the other. The diagram renderer uses Cytoscape with map-over depth and reduction-edge annotations supplied by the shared `workflow-graph` package.

The browser editor exists for two purposes. First, it lowers the barrier to text-based Galaxy workflow authoring to zero: a researcher can open a URL, paste a Format 2 document, see schema-aware validation in real time, edit with completions, and download the result. Second, it makes the validation infrastructure usable from environments where installing a development toolchain is impractical — classroom settings, citizen-science contexts, review environments. A jsDelivr-hosted IIFE bundle (`gxwf-report-shell`) additionally allows Python-side report generators to embed the same Vue components in standalone HTML, so workflow documentation and validation reports share a single rendering path.

### Continuous-Integration Surface

The fourth surface is not an editor but a continuous-integration pattern the validation core is built to support. Because `gxwf validate --strict` is a single command that exits non-zero on any unresolved diagnostic, a curated workflow repository can gate its pull requests on it: a contribution whose state does not validate against the relevant tool schemas would be rejected before review, and the same mechanism extends to round-trip equivalence — a contribution that round-trips cleanly is mergeable, one that does not is held until the divergence is explained. We describe this as the intended integration path for curated corpora such as IWC; we have not deployed it as a production merge gate in any repository to date.

The value of the pattern is preventive. Parameter drift introduced by tool-version bumps would be caught at the time of contribution rather than surfacing when a downstream user next executes the affected steps; a corpus that adopts the check trades a slowly accumulating tail of stale state for an enforced merge-boundary guarantee.
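A merge gate built on this pattern could be as small as the following Python sketch. It relies only on the property stated above, that `gxwf validate --strict` exits non-zero on any unresolved diagnostic; passing the workflow path as a positional argument, the file names, and the `gate` helper are assumptions. The `run` parameter is injectable so the logic can be exercised without the binary installed.

```python
import subprocess


def gate(paths, run=subprocess.run):
    """Return the workflows that fail strict validation; an empty list means mergeable."""
    failed = []
    for path in paths:
        # Assumption: the workflow file is passed as a positional argument.
        result = run(["gxwf", "validate", "--strict", str(path)])
        if result.returncode != 0:
            failed.append(str(path))
    return failed


class FakeResult:
    """Stand-in for subprocess.CompletedProcess in a dry run."""

    def __init__(self, returncode):
        self.returncode = returncode


def fake_run(command):
    # Pretend that only files with 'broken' in their name fail validation.
    return FakeResult(1 if "broken" in command[-1] else 0)


print(gate(["rnaseq.gxwf.yml", "broken.gxwf.yml"], run=fake_run))
```

In a real CI job, the script would call `gate` with the default runner and exit non-zero when the returned list is non-empty.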

## Validation Across Workflow Systems

Every major bioinformatics workflow system now offers some form of pre-execution validation. The contribution this work makes is not the existence of validation but its depth. We separate two questions: what classes of error each system can catch before execution, and what architectural prerequisites the deeper classes require.

**Table 1.** Static validation capabilities across the principal text-based workflow systems. The first row is comparable across systems; the next three are the depth contributions of this work; the last row is a substrate-level capability all systems now share.

| Capability | Galaxy (this work) | Nextflow | Snakemake | WDL |
|---|---|---|---|---|
| Pipeline/workflow-level parameter validation | Yes | Yes (nf-schema) | Yes (config schema) | Yes (womtool, miniwdl, Sprocket) |
| Per-tool invocation parameter validation | **Yes** — names, types, constraints, options | No | No | No |
| Per-connection type validation | **Yes** — including Galaxy collection semantics | No | No | Partial (WDL types) |
| Tool-aware IDE diagnostics, completion, and hover | **Yes** — galaxy-workflows-vscode | Community extensions, workflow-level | Community extensions, workflow-level | Yes (Sprocket LSP) — workflow-level only |
| Offline validation, no server required | Yes — Node, browser, IDE | Yes (nf-schema) | Yes (`--lint`, `--dry-run`) | Yes (womtool, miniwdl, Sprocket) |

The depth contributions in rows two through four are not architectural accidents. They presuppose a typed tool registry against which a workflow's per-step state can be validated. Galaxy has one: the ToolShed [Blankenberg 2014] serves typed schemas for the more than ten thousand tools currently registered, and the GA4GH Tool Registry Service protocol [GA4GH TRS; O'Connor 2017] makes those schemas queryable from any client. The other systems would need to construct an equivalent registry to perform tool-level validation. The nf-core community module collection holds I/O metadata for roughly thirteen hundred modules but does not maintain parameter schemas; Snakemake [Mölder 2021] has no central tool registry by design, each rule's tool being embedded in shell or Python; Dockstore [O'Connor 2017] hosts WDL workflows but does not maintain separate per-tool schemas. The depth differential between Galaxy and the comparison set is a downstream consequence of an architectural decision Galaxy made early and has invested in over fifteen years [Galaxy Community 2024].

This is not a claim that Galaxy's text-authoring experience is uniformly superior to its competitors'. Nextflow [Di Tommaso 2017] has stronger community momentum, broader cloud-execution integration, and a domain-specific language whose expressive power for control flow exceeds what Format 2 was designed to offer. Snakemake's Python integration is unmatched. WDL's strong typing at the workflow level catches a class of errors Format 2 does not. The Common Workflow Language [Crusoe 2022] provides a portable specification for cross-platform workflow exchange that Galaxy interoperates with rather than competes against. The claim is narrower: that for the specific problem of validating individual scientific tool invocations inside a workflow, Galaxy's centralized typed registry provides a structural advantage no competitor currently has.

## Discussion

### Consumers of the Validation Core

A single validator, a single tool-metadata cache, and a single diagnostic vocabulary serve four classes of consumer. We treat them in order of present installed base. The contribution of this work is not most usefully framed as "validation for agents" or as "VS Code support for Galaxy"; it is the shared infrastructure that makes all four cases possible, and the architectural commitment that the same diagnostic surfaces in each.

**Human authors in development environments** are the largest consumer class today. A researcher composing a Format 2 workflow in Visual Studio Code receives parameter-name completions that descend into the tool's conditional structure, hover documentation sourced directly from the tool XML, and diagnostics with the precision needed to fix an error in place. The experience is comparable to what other workflow systems' IDE integrations offer at the workflow-syntax level, with the additional capability that tool invocations are themselves type-checked. Training-material authors [Hiltemann 2023] gain a way to commit workflows alongside their tutorial text whose correctness is checkable in continuous integration rather than asserted in prose, closing a gap between training artifacts and runtime behaviour that has historically required manual reconciliation.

**Continuous-integration maintainers of curated workflow repositories** are the consumer class with the highest potential leverage per workflow. A curated repository such as IWC can gate its pull requests on `gxwf validate --strict` and on round-trip equivalence; the operational effect would be that parameter drift introduced by tool-version bumps is caught at the time of contribution rather than at the time of execution by a downstream user. The same pattern transfers to other curated Galaxy workflow repositories — Galaxy training materials, lab-internal collections, institutional registries — without infrastructure change. For a maintainer, the validator is the difference between a corpus whose health is asserted in a README and one whose health is enforced at the merge boundary.
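
As a minimal sketch of such a gate, assume the validation and round-trip results have already been parsed into per-workflow records. The `WorkflowCheck` shape, the `evaluateGate` function, and the file paths below are hypothetical illustrations, not the output format or API of `gxwf`. The policy blocks on any strict-validation finding or state-altering round-trip diff and lets benign diffs through.

```typescript
// Hypothetical per-workflow outcome; field names are illustrative only.
type RoundTripOutcome = "identical" | "benign-diff" | "state-altering-diff";

interface WorkflowCheck {
  path: string; // workflow file within the repository
  strictErrors: number; // findings reported under strict validation
  roundTrip: RoundTripOutcome;
}

interface GateResult {
  pass: boolean;
  blocking: string[]; // one human-readable reason per blocking condition
}

// Merge-boundary policy: block on any strict-validation finding or any
// state-altering round-trip diff; benign diffs do not block.
function evaluateGate(checks: WorkflowCheck[]): GateResult {
  const blocking: string[] = [];
  for (const c of checks) {
    if (c.strictErrors > 0) {
      blocking.push(`${c.path}: ${c.strictErrors} strict validation finding(s)`);
    }
    if (c.roundTrip === "state-altering-diff") {
      blocking.push(`${c.path}: round trip alters step state`);
    }
  }
  return { pass: blocking.length === 0, blocking };
}

// Illustrative input: one clean workflow, one with a benign diff, one failing.
const gate = evaluateGate([
  { path: "workflows/a.gxwf.yml", strictErrors: 0, roundTrip: "identical" },
  { path: "workflows/b.gxwf.yml", strictErrors: 0, roundTrip: "benign-diff" },
  { path: "workflows/c.gxwf.yml", strictErrors: 2, roundTrip: "state-altering-diff" },
]);
console.log(gate.pass ? "gate passed" : gate.blocking.join("\n"));
```

Keeping the policy a pure function over parsed results means the same decision can be unit-tested independently of how the CI job invokes the validator.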

**Contexts where installation cost matters** are the consumer class the browser editor was designed for. A classroom session, a citizen-science outreach event, a code-review meeting, or a one-off collaborator request can run schema-aware Galaxy workflow validation in a browser tab with no Node toolchain, no Galaxy server, and no local cache to manage. The same library code that backs the CLI runs in the browser by virtue of the IndexedDB cache backend, so a user who gains familiarity in the browser can migrate to a local install or a Galaxy server without learning a new interface. The barrier to first contact with text-based Galaxy workflow authoring is, in this consumer class, a single URL.

**Agent-assisted workflow authoring** is the most demanding consumer class and the one most likely to drive design pressure in the coming years. Recent work on agent-driven bioinformatics analysis [BioAgents 2025; Xin 2024] has converged on a recurring observation: agents that interact with workflow systems only through execution face a per-iteration cost that limits how many iterations they can perform before exhausting time, compute, or the user's patience. An agent that submits a workflow, waits minutes for failure, and reads an ambiguous error message has a very different trajectory from one that validates locally, receives a structured per-parameter diagnostic, and corrects. This work supplies the second loop. The validation core is sufficiently fast — sub-second on representative workflows once the tool cache is warm — to be invoked at every authoring step rather than only at submission. Diagnostics are structured: each finding carries a path into the workflow document, a category, and a machine-readable description of the violated constraint. We are explicit that this contribution is a *substrate* for agent-assisted authoring, not an agent. The construction of agent-authoring layers on top of this substrate — the orchestration of tool discovery, the decomposition of authoring tasks, the management of provenance for agent-generated workflows — is the subject of separate work.
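
The validate-and-correct loop can be sketched as follows. This is a toy illustration under stated assumptions: the `Finding` shape mirrors the three fields named above (path, category, machine-readable constraint) but is not the actual diagnostic type of the validation core, `validateStep` is a hand-written stand-in for the validator, and `repair` stands in for whatever correction an agent would propose.

```typescript
// Hypothetical finding shape: a path into the document, a category, and a
// machine-readable description of the violated constraint.
type Constraint =
  | { kind: "enum"; allowed: string[] }
  | { kind: "unknown-parameter"; known: string[] };

interface Finding {
  path: string[];
  category: string;
  constraint: Constraint;
}

type StepState = Record<string, string>;
type ToySchema = Record<string, string[]>; // parameter name -> legal values

// Toy stand-in for the validator: checks one step's state against a schema.
function validateStep(state: StepState, schema: ToySchema): Finding[] {
  const findings: Finding[] = [];
  for (const name of Object.keys(state)) {
    const allowed = schema[name];
    const value = state[name];
    if (allowed === undefined) {
      findings.push({
        path: ["state", name],
        category: "unknown-parameter",
        constraint: { kind: "unknown-parameter", known: Object.keys(schema) },
      });
    } else if (value === undefined || allowed.indexOf(value) === -1) {
      findings.push({
        path: ["state", name],
        category: "illegal-select",
        constraint: { kind: "enum", allowed },
      });
    }
  }
  return findings;
}

// Toy stand-in for the agent's correction: drop unknown parameters and
// replace an illegal select value with the first legal option.
function repair(state: StepState, findings: Finding[]): StepState {
  const next: StepState = { ...state };
  for (const f of findings) {
    const name = f.path[f.path.length - 1];
    if (name === undefined) continue;
    const c = f.constraint;
    if (c.kind === "enum") {
      const first = c.allowed[0];
      if (first !== undefined) next[name] = first;
    } else {
      delete next[name];
    }
  }
  return next;
}

// Validate at every step; stop when clean or when the budget is exhausted.
function authorUntilClean(initial: StepState, schema: ToySchema, maxIterations: number) {
  let state = initial;
  for (let i = 0; i < maxIterations; i++) {
    const findings = validateStep(state, schema);
    if (findings.length === 0) return { state, iterations: i, clean: true };
    state = repair(state, findings);
  }
  return { state, iterations: maxIterations, clean: validateStep(state, schema).length === 0 };
}

const toySchema: ToySchema = { mode: ["fast", "sensitive"], strand: ["forward", "reverse"] };
const outcome = authorUntilClean({ mode: "quick", strnd: "forward" }, toySchema, 5);
console.log(outcome);
```

Because each finding names its path and constraint, the correction step needs no parsing of free-text error messages; that is the property that makes the local loop cheap per iteration.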

### Limits and Honest Risks

The depth-of-validation claim depends on the completeness and correctness of tool schemas in the ToolShed [Blankenberg 2014]. Tools whose XML is incomplete, whose conditional structure misrepresents the underlying executable's actual parameters, or whose select-option enumerations omit legal values will let through invalid state that the validator cannot catch. ToolShed metadata is a community artifact, and its coverage and accuracy vary across the registry. The validator is a *floor* on workflow correctness, not a *ceiling*.

The conversion layer carries a related risk. Lossless round-tripping is an empirical claim, not a proven theorem. The corpus testing reported above demonstrates that the conversion is faithful on the IWC's workflows, but the corpus is finite and curated. Workflows that exercise tool features the test corpus does not exercise may surface conversion bugs we have not seen. We mitigate this by treating round-trip equivalence as a tested invariant across the corpus and by emitting diagnostics that flag uncertainty rather than silently accepting it.

A more structural risk is fragmentation. The Galaxy ecosystem has historically benefited from a single canonical workflow representation served by a single canonical runtime. Promoting Format 2 to a co-equal artifact introduces the possibility that workflows in the two representations drift in ways the conversion layer does not catch. The mitigation is treating the round-trip property as a tested invariant, not an aspiration, and surfacing benign and state-altering diffs distinctly so the boundary remains visible.

Finally, we note that the consumer classes above are asymmetric in the evidence they supply. The IDE, CI, and low-install-cost consumers are documented by working systems with measurable behaviour and adopting users. The agent consumer is documented by a small and rapidly evolving literature whose empirical claims have not yet stabilised. The validation core's usefulness to that consumer is the most speculative of the four, and the agent consumer is the one least likely to bear out the design decisions we made for it. We have tried to keep the design defensible across all four cases by treating diagnostics as a programmable interface rather than a presentation, by keeping the validator's API stable across the surfaces that consume it, and by avoiding consumer-specific affordances in the core. Whether that balance was right will be visible only in retrospect.

## Methods

The `@galaxy-tool-util/*` monorepo is a pnpm-managed TypeScript workspace of 13 packages, 10 of them published to npm. Core packages include `@galaxy-tool-util/schema` (Effect Schema parameter types, per-step state validators, and Format 2 parsing, serialization, and conversion), `@galaxy-tool-util/core` (the tool-metadata cache with filesystem and IndexedDB backends, the ToolShed/TRS client, and the parsed-tool models), `@galaxy-tool-util/connection-validation` (per-connection collection-semantics validation, ported from Galaxy's `tool_util.workflow_state.connection_validation`), `@galaxy-tool-util/workflow-graph` (collection-type algebra and datatype subtyping, distributed as a zero-dependency package consumable from non-Node environments), `@galaxy-tool-util/search` (ToolShed search and tool discovery), and `@galaxy-tool-util/cli` (the `gxwf` and `galaxy-tool-cache` binaries). The conceptual validation core spans `schema`, `connection-validation`, and `workflow-graph` rather than the package literally named `core`. The browser editor `gxwf-ui` is deployed separately as a hosted Vue 3 application; the Visual Studio Code extension is published as `galaxy-workflows-vscode` on the VS Code Marketplace.
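
To make the collection-type algebra concrete, the sketch below checks a single connection under a deliberately simplified model: a collection type is a colon-separated list of ranks (for example `list:paired`), an input matches when its type equals the trailing ranks of the output's type, and the step maps over whatever outer ranks remain. The function and type names are hypothetical and are not the API of `@galaxy-tool-util/workflow-graph`; the model ignores multi-dataset inputs, newer collection kinds, and datatype subtyping.

```typescript
// null stands for a plain dataset; a string such as "list:paired" is a
// nested collection type written outermost rank first.
type CollectionType = string | null;

type ConnectionResult =
  | { ok: true; mapOver: CollectionType }
  | { ok: false; reason: string };

function describeType(t: CollectionType): string {
  return t === null ? "a dataset" : t;
}

// Simplified rule: the input's ranks must equal the trailing ranks of the
// output; any remaining outer ranks become the map-over structure.
function checkConnection(output: CollectionType, input: CollectionType): ConnectionResult {
  const outRanks: string[] = output === null ? [] : output.split(":");
  const inRanks: string[] = input === null ? [] : input.split(":");
  const offset = outRanks.length - inRanks.length;
  const mismatch = `output provides ${describeType(output)}, input expects ${describeType(input)}`;
  if (offset < 0) return { ok: false, reason: mismatch };
  for (let i = 0; i < inRanks.length; i++) {
    if (outRanks[offset + i] !== inRanks[i]) return { ok: false, reason: mismatch };
  }
  const outer = outRanks.slice(0, offset);
  return { ok: true, mapOver: outer.length === 0 ? null : outer.join(":") };
}

console.log(checkConnection("list:paired", "paired")); // maps over "list"
console.log(checkConnection("list", null)); // dataset input, maps over "list"
console.log(checkConnection("list", "list")); // direct match, no map-over
console.log(checkConnection("paired", "list")); // rejected: ranks differ
```

Operating on plain strings with no dependencies is in the spirit of the zero-dependency packaging described above, since nothing here assumes a Node environment.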

Tool metadata is sourced from ToolShed 2.0 [Blankenberg 2014; Galaxy Community 2024] over the GA4GH Tool Registry Service (TRS) API [GA4GH TRS; O'Connor 2017]. The cache layer abstracts over the storage backend: `FilesystemCacheStorage` writes to a directory on disk in Node environments; `IndexedDBCacheStorage` writes to the browser's IndexedDB in browser and Web Worker environments. The same library code runs against either backend. Cache freshness is governed by ToolShed-provided revision identifiers, so the cache survives across tool updates without manual invalidation.
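
The backend abstraction might look roughly like the sketch below. The actual interfaces of `FilesystemCacheStorage` and `IndexedDBCacheStorage` are not reproduced here; `ToolCacheStorage`, `InMemoryCacheStorage`, `ToolMetadataCache`, and the key format are hypothetical, and keying entries by revision is one plausible reading of how revision identifiers govern freshness. The contract is asynchronous because IndexedDB access is asynchronous.

```typescript
// Hypothetical storage contract shared by all backends.
interface ToolCacheStorage {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// In-memory backend, standing in for a filesystem or IndexedDB backend.
class InMemoryCacheStorage implements ToolCacheStorage {
  private readonly entries = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.entries.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.entries.set(key, value);
  }
}

interface ToolRef {
  toolId: string;
  revision: string;
}

type ToolFetcher = (ref: ToolRef) => Promise<string>;

// Assumption: the revision is part of the key, so a new revision is simply
// a cache miss and older entries never need explicit invalidation.
function cacheKey(ref: ToolRef): string {
  return `${ref.toolId}@${ref.revision}`;
}

class ToolMetadataCache {
  private readonly storage: ToolCacheStorage;
  private readonly fetchTool: ToolFetcher;

  constructor(storage: ToolCacheStorage, fetchTool: ToolFetcher) {
    this.storage = storage;
    this.fetchTool = fetchTool;
  }

  // Return cached metadata when present; otherwise fetch and store it.
  async load(ref: ToolRef): Promise<string> {
    const key = cacheKey(ref);
    const cached = await this.storage.get(key);
    if (cached !== undefined) return cached;
    const fresh = await this.fetchTool(ref);
    await this.storage.set(key, fresh);
    return fresh;
  }
}

// Demo: count how often the (fake) registry is contacted.
async function demoCache(): Promise<number> {
  let fetches = 0;
  const cache = new ToolMetadataCache(new InMemoryCacheStorage(), async (ref) => {
    fetches += 1;
    return JSON.stringify({ id: ref.toolId, revision: ref.revision });
  });
  await cache.load({ toolId: "example_tool", revision: "rev1" }); // miss
  await cache.load({ toolId: "example_tool", revision: "rev1" }); // hit
  await cache.load({ toolId: "example_tool", revision: "rev2" }); // miss: new revision
  return fetches;
}
```

Swapping the backend changes only the object passed to the constructor, which is the property that lets the same library code run in Node and in the browser.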

JSON Schemas are exported for three artifact categories: per-tool parameter schemas, the structural workflow schema (covering Format 2 and native forms), and the Galaxy test-format schema (covering tool-test definitions). These schemas are usable by any language and any editor with JSON Schema support; the gxwf-specific tooling is one consumer among potentially many.

The CLI binaries are distributed via npm and via standalone executables built with [PACKAGING TOOL]. The library packages are published under the `@galaxy-tool-util` npm scope. All source code is hosted at [REPOSITORY URL] under [LICENSE].

## Availability

The `@galaxy-tool-util/*` packages are available on npm under the `@galaxy-tool-util` scope; sources are at [REPOSITORY URL]. The Visual Studio Code extension `galaxy-workflows-vscode` is available on the VS Code Marketplace; sources are at [REPOSITORY URL]. The browser editor `gxwf-ui` is available at [BROWSER EDITOR URL]; sources are at [REPOSITORY URL]. All components are released under [LICENSE]. Format 2 is documented at [FORMAT 2 DOCS URL]; the `gxformat2` library is available at [GXFORMAT2 REPOSITORY URL].

## Supporting Information

Supporting information provides the reproducible artifacts and extended figures behind the validation claims. Because every CLI-backed item regenerates from the published `gxwf` binary against the public IWC corpus, the supporting information is not merely illustrative — it is the audit trail for the depth claim.

- **Extended figures (Figures S1–S10)** — per-surface detail the body only summarizes: native `.ga` vs Format 2 for one real step (S1); the planted-error worked example with its actual diagnostics (S2); collection/map-over connection validation with an annotated diagram (S3); native `.ga` parity, the legacy `tool_state` quick fix, conditional-aware completion, and tool-cache graceful degradation in VS Code (S4–S7); the `gxwf-ui` browser editor running with no Galaxy server (S8); bidirectional conversion and round-trip fidelity (S9); and tool-ID search in two forms — the `gxwf tool-search` CLI result table and the VS Code *Insert Tool Step* ToolShed-search QuickPick (S10).
- **Worked example (Listing S1, Workflows S1–S2)** — a single named IWC workflow in both native `.ga` and Format 2, plus a variant carrying one mistake per error class (misspelled parameter, illegal select, type mismatch, stale `__current_case__`, invalid collection connection) each shown with the real `gxwf validate` diagnostic and contrasted against the comparable Nextflow/Snakemake/WDL toolchain.
- **Tool-install list (Data S1)** — an ephemeris/shed-tools YAML pinning the worked example's tools, so a reader can cache them for offline validation or install them into a Galaxy server.
- **Corpus reports (Reports S1–S2)** — the full per-workflow `gxwf validate-tree --strict` and `gxwf roundtrip-tree` results behind Figure 4 and the corpus counts, with the corpus commit SHA and capture date recorded for auditability.

## References

Citations appear inline in the text; full bibliographic records are maintained in `references.yml`, and the list below is generated from it at render time. Entries marked "repository" denote software whose canonical citable form is the source repository; we cite these explicitly in the prose to make the absence of a peer-reviewed reference visible to readers.


## Figures

(Figure placeholders for production; each figure is to be produced from artifacts already in the repository or from screenshots of the deployed tooling.)

**Figure 1. The Format 2 / gxwf stack.** ToolShed-served tool schemas flow into the validation core, which is consumed by the CLI, the VS Code extension, the browser editor, and CI. The same core, the same metadata, four authoring surfaces.

**Figure 2. Validation depth in layers.** Three concentric or stacked layers: structural validation (workflow document shape), per-step state validation (per-tool parameter, value, select, conditional), and per-connection validation (collection semantics, map-over depth). The depth claim is the second and third layers — the first is comparable across systems.

**Figure 3. Schema-aware editing in Visual Studio Code.** A screenshot of a Format 2 workflow open in the browser-hosted `galaxy-workflows-vscode` web extension (as run at vscode.dev), with a diagnostic highlighting an invalid select value, a completion popup showing the legal options for a `state:` key, and a hover surfacing the tool-XML help text. The figure should ideally include two or three representative diagnostic categories.

**Figure 4. IWC corpus validation and round-trip results.** A small table or bar chart summarizing validate/clean/round-trip outcomes across the IWC corpus, with a short categorical breakdown of the diagnostics surfaced.