User-Defined Tools in Galaxy
Synthesis of the User-Defined Tools (UDT) initiative across PRs #19434 → #22625, with linkage to the structured tool-state work. Open questions at end.
1. Motivation
Galaxy’s tool ecosystem has, since inception, assumed a privileged author: admins install XML tool definitions via the Tool Shed or filesystem, those tools are loaded into a global toolbox, and Cheetah templating gives them unrestricted access to Galaxy internals. That model serves the published-tool catalog well but blocks several increasingly important use cases:
- Casual/user authorship — researchers who want to wrap a custom shell pipeline for a single analysis without negotiating a Tool Shed PR or admin install.
- Agent-authored tools — LLM agents (cf. Galaxy’s MCP / agent-operations layer, #22625) generating tools on the fly during a session.
- Workflow-embedded scripts — workflows that bundle a small custom step without pretending it is a published tool.
- Portability / reproducibility — shipping the tool definition with the workflow rather than depending on a remote Tool Shed installation matching in name and version.
The User-Defined Tools (UDT) initiative, opened by PR #19434 (Marius van den Beek, target Galaxy 25.0), introduces a YAML tool format and a privilege model that lets non-admin users author and run tools inside Galaxy — at the cost of walking away from the unconstrained Cheetah templating that XML tools enjoy.
This paper synthesizes the architectural, security, and validation work that turns UDTs from a beta toy into a defensible foundation, and surfaces the post-merge gaps that remain.
2. Two tool formats, one toolbox
UDTs do not replace XML tools. They sit alongside them as a second tool source class with intentionally different trust assumptions:
| Aspect | Standard XML Tools | User-Defined YAML Tools |
|---|---|---|
| Author | Admin / Tool Shed | Any user with the Custom Tool Execution role |
| Format | XML | YAML |
| Templating | Cheetah (full Python access) | Sandboxed JavaScript $() expressions |
| DB / FS access at templating | Full | None |
| Container | Optional | Required (#21161 hardens this for 25.1) |
| Storage | Filesystem | Database (dynamic_tool + user_dynamic_tool_association) |
| Discovery | Global toolbox | UUID lookup, per-user panel |
| Workflow embedding | By tool id | Tool definition copied into workflow |
The two flavors share the same tool-source machinery via parallel Pydantic
models, UserToolSource (class: GalaxyUserTool) and AdminToolSource
(class: GalaxyTool), in lib/galaxy/tool_util/models.py. The strict
UserToolSource is what a user can POST; the looser AdminToolSource is
reserved for admin-authored dynamic tools.
Example: a user-written cat
class: GalaxyUserTool
id: cat_user_defined
version: "0.1"
name: Concatenate Files
container: busybox
shell_command: |
cat $(inputs.datasets.map((i) => i.path).join(' ')) > output.txt
inputs:
- name: datasets
multiple: true
type: data
outputs:
- name: output1
type: data
format_source: datasets
from_work_dir: output.txt
The $(...) block is a JavaScript expression evaluated at job-creation time
against a runtime model derived from the inputs — explicitly not a Cheetah
template, not Python, and not able to import pathlib and write to
$HOME.
3. The security model
UDTs solve a fundamentally different security problem than XML tools. XML-tool security is mostly “the admin vetted the tool”; UDT security has to hold up against the tool author themselves potentially being adversarial.
Three-layer defense
- Sandboxed expression language.
$()blocks run in a JavaScript evaluator pinned to ES2017 with no host-object exposure. There is noapp, nomodel, noos, no filesystem. The runtime model only carries what the tool itself declared as inputs (paths, formats, element identifiers, etc.). - Mandatory containerization. A UDT without a
container:field is rejected. PR #21161 (still draft, slated for 25.1) makes this requirement uniform with interactive tools, ensuring even admin-supplied dynamic tools cannot escape the container envelope when run in user-author mode. - Per-user authorization. Authoring requires the
Custom Tool Executionrole; running someone else’s UDT requires explicit sharing or workflow embedding (which copies the tool into the recipient’s private namespace).
Boundaries that matter
- Filesystem outside the container. UDTs cannot read
extra_files, metadata indices (e.g. BAM.baifiles), or reference data — features XML tools take for granted. Some of these are intentional (reference data is a side-channel for authority); others are #19434 limitations not yet resolved (configfiles partially landed in #20761). - Job placement. UDTs need to be addressable by job_conf.yml so admins
can route them to a sandboxed destination. PR
#20932 added a
tool_typefor this. - Network and credentials. UDTs accept credentials parameters (a separate feature line) but the YAML schema has been narrowed (#22507) to prevent declaring properties the runtime ignores — closing a class of “looks valid, isn’t honored” footguns.
4. From request to runtime: the structured tool state pipeline
The hardest part of UDTs is not authoring — it is making sure the same tool
definition runs the same way every time, that the form re-prefills correctly,
and that workflow extraction reconstructs the exact job that ran. XML tools
get away with JobParameter rows and pipe-delimited string state because
basic.py is the source of truth. YAML tools have a richer, typed shape
that does not fit cleanly into that legacy encoding, so the structured tool
state work was a precondition for taking UDTs seriously.
State representations
Component - Tool State Specification catalogs roughly twelve state representations. The ones that matter for UDTs:
| State | When | Validated by |
|---|---|---|
request | Inbound API payload | RequestToolState Pydantic |
request_internal | After id resolution | RequestInternalToolState |
request_internal_dereferenced | After URI/HDA dereferencing | same, dereferenced |
job_internal | Persisted on the Job | JobInternalToolState → Job.tool_state column |
job_runtime | At evaluation, with paths | dynamic discriminated unions |
test_case_json | YAML test definitions | full parameter validation |
PR #21828 added the
Job.tool_state JSONB column and the runtimeify path that converts the
persisted internal state into the runtime CWL-style inputs the YAML tool
evaluator consumes. PR
#20935 introduced the
typed request side; #21828 closed the loop on the runtime side.
Discriminated collection runtime models
Collections are where the typing pays for itself. #21828 and follow-ups (#21991 for subcollection mapping / DCE, #22116 for hidden_data parameters, #22362 for JSON-Schema generation) build a recursive Pydantic discriminated-union family covering:
- Leaf collections:
paired,list,record,sample_sheet,paired_or_unpaired. - Nested types:
list:paired,list:list:paired,record:paired, … — generated lazily bybuild_collection_model_for_type()with LRU caching inlib/galaxy/tool_util_models/parameters.py. - Subset unions:
list:paired,list:listfor tools that accept several shapes. - DCE references for subcollection mapping (
{src: "dce", id: …}).
The factory returns a create_model() Pydantic class with a
Literal[collection_type] discriminator — the same dynamic-model pattern
captured in Component - Tool State Dynamic Models. Unknown leaf or
nested segments yield None and a controlled fallback, rather than a silent
type widening.
This is the foundation that makes “user wrote a YAML tool with a
list:paired input” into a job whose state Galaxy can validate, persist,
and reconstruct.
5. Schema hardening — closing the slop surface
Once a tool format is database-stored and authored by users, every permissive corner of the schema becomes an attack surface and a support burden. The schema-hardening campaign:
| PR | Effect |
|---|---|
| #22280 | Fix validation of optional text validators — closing a 26.0-era regression where some text validator combos let invalid tool defs load |
| #22362 | Generate complete, tested JSON-Schema from the Pydantic tool state models; allow validating workflows via either Pydantic or JSON-Schema |
| #22507 | Narrow the YAML tool schema — reject truevalue/falsevalue on boolean params and other input properties the runtime ignores. This is the branch this paper is being written from |
| #22566 | Tighten the workflow test schema, unifying the Planemo and in-tree framework test formats |
| #21828 follow-ups | Credential test definitions; JSON Schema keywords (color, length, in_range, regex); recursive-union warning silencing; format alias |
The narrowing matters for two distinct audiences: human authors get told their tool is malformed at create time instead of at job time, and agents generating tools get a tighter schema to validate against, which materially reduces invalid-but-syntactically-plausible output. The latter is the same motivation behind exposing tool state JSON-Schema externally — agents need schemas, not Pydantic.
There is a trade-off: the client-side ToolSourceSchema.json ballooned in
size after #22507, which the author flagged. Worth tracking, not yet a
problem.
6. The post-hoc divergence problem
Problem - YAML Tool Post-Hoc State Divergence is the open architectural issue and the most important honest qualifier to put on the UDT story.
Today, even though Job.tool_state is a validated structured column, the
post-hoc consumers of “what was run” still read the legacy JobParameter
rows via params_from_strings:
| Consumer | Path | Source |
|---|---|---|
| Job display UI | summarize_job_parameters | JobParameter rows |
| Tool form rerun | Tool.to_json(job=…) | JobParameter rows |
| Workflow extraction | workflow/extract.py:step_inputs | JobParameter rows |
| History export | dual: emits both tool_state and params; reads params on import | both written, only legacy read |
For XML tools this is fine because basic.py is the source of truth. For
YAML tools it is structurally risky: collection runtime metadata
(column_definitions, fields, has_single_item, columns),
comma-separated collection types, and DCE-source elements for subcollection
mapping all live cleanly on the Job.tool_state side and have no proven
round-trip through the flat JobParameter encoding.
There are no end-to-end tests today proving:
- Run YAML tool → rerun from history → second job’s
Job.tool_stateequals the first. - Run YAML tool → extract workflow → run extracted workflow →
tool_statematches. - Run YAML tool → export → import → rerun.
These are the missing invariants. Closing them either requires adding the
tests against the current dual representation (and accepting some lossiness),
or making Job.tool_state the source of truth for post-hoc consumers when
present (a from_runtime_state(job) symmetric to runtimeify). The right
answer is probably both: tests first, then a controlled switchover.
7. Authoring surface
UDTs are not just a backend feature; the authoring UX is a substantial part of the value.
- Monaco editor with full YAML schema validation, JS intellisense for
embedded
$()blocks, and mixed YAML/JS syntax highlighting (yaml-with-js.ts). The narrowed schema (#22507) is what makes the red-squiggle experience trustworthy. - User Tool Panel in the sidebar listing the user’s private tools.
- Build / runtime model preview —
/api/unprivileged_tools/buildand/runtime_modellet the editor preview the form and the JS runtime without committing to a job. - Workflow embedding — embedding a UDT in a workflow copies the tool definition; importing the workflow copies it into the new owner’s namespace. No global registry pollution.
- Upload from URL (#20860) for sharing tools out-of-band.
8. Agent-native authoring
PR #22625 lifts UDT
operations into the agent-operations layer with MCP wrappers — create_user_tool,
delete_user_tool, run_user_tool. PR
#22628 excludes UDTs
from requiring a Galaxy-env destination, which matters because agents
generating tools on the fly cannot assume a particular cluster/Conda
environment exists.
This is the inflection point that justifies the schema-hardening work. An
agent that hallucinates truevalue on a boolean param under #22507 gets a
clean validation error; pre-#22507 the tool would load and silently ignore
the field. The same logic applies to JSON-Schema externalization in #22362
— agents need a schema they can target, and the schema must reflect what
the runtime actually honors.
The run_user_tool guards in commits bec6b06813 / 3c02ff707c ensure
agents can’t run tools that have been deactivated, which closes a real
race window.
9. Where this leaves us
UDTs as of mid-2026:
- ✅ A coherent YAML format with a published schema and a sandboxed expression language.
- ✅ A typed, persisted, validated tool state column.
- ✅ End-to-end runtime path (
request → internal → job_internal → runtime) with Pydantic validation at each step. - ✅ Recursive collection-type modeling, including comma-separated unions and subcollection mapping.
- ✅ JSON-Schema externalization for cross-platform / agent consumption.
- ✅ Schema narrowing closing several “looks valid, isn’t honored” classes.
- ✅ Agent-operations / MCP surface for creation, deletion, and execution.
- ⚠️ Mandatory containerization landing in 25.1 (#21161 still draft).
- ⚠️ Post-hoc consumers (rerun, display, extract, export) still use legacy
JobParameterrows — divergence proven possible, not yet measured. - ❌ No E2E tests for rerun/extract/export round-trip on YAML tools.
- ❌ Output collections in YAML tools partial —
_parse_teststill hard-codesoutput_collections = []per #21828. - ❌ No
extra_files/ metadata-files / reference-data access paths for UDTs.
10. Recommended next steps
In rough priority order:
- Land #21161 so the container guarantee is uniform across UDTs and interactive tools before broader rollout.
- Add the missing E2E divergence tests described in
Problem - YAML Tool Post-Hoc State Divergence — rerun, extract,
export, all asserting
Job.tool_stateequality across the round trip. Red-to-green: write the tests first, watch them fail, then drive the reconciliation work. - Implement
from_runtime_state(job)symmetric toruntimeifyfor post-hoc consumers; switch UI form rerun, job display, and workflow extraction to prefer it whenJob.tool_stateis present. - Finish output-collection support in YAML tools (the
_parse_testTODO). - Document the JSON-Schema externalization so agent authors and other Galaxy clients can consume it without reading the Pydantic source.
- Consider broader rollout policy: who gets the
Custom Tool Executionrole by default on community Galaxies, and what does the operator playbook for UDT abuse look like?
11. Unresolved questions
to_cwldeprecation timeline — the legacy fallback inevaluation.py:1116logs “may work differently in the future” but has no removal plan. When does it go?- 24.2 minimum-profile bump — what happens to UDTs with no
profileset or with an older one? #22507 narrowed the schema; do legacy older-profile YAML tools still load? - Does
params_from_stringsround-tripdata_collectionwith comma-separatedcollection_typefor YAML tools? Likely no test today. - Does the legacy flat encoding represent
dce-source elements? Unverified for YAML tool jobs. - For collection runtime metadata (
column_definitions,fields,has_single_item,columns), is there any path back fromJobParameterrows to a structured shape, or is it lost on rerun / extract / display? - Is the dual emission of
tool_stateandparamson history export acceptable, or does it need an export-time consistency check? - ToolSourceSchema.json size after #22507 — at what size does client-bundle cost become a problem?
- Sharing model — only embed-in-workflow today. Is direct user-to-user share desired, or does workflow remain the canonical sharing unit?
- Reference data / metadata files in UDTs — is the long-term answer “expose
via explicit
inputsof new types” or “never, use a workflow”?