Galaxy Tool And Job Failure Reference

This is reference material, not a debug recipe. Use it to understand what Galaxy can know about a failed tool job and which API surfaces preserve that evidence.

Model

Galaxy tool failure handling is layered:

The tool wrapper defines expected failure semantics through detect_errors, <stdio>, exit-code checks, regex checks, and command strictness.
The job runner executes the command and captures exit code plus tool/job stdout and stderr streams.
Galaxy evaluates configured failure rules and records structured job_messages.
The job reaches a terminal state, output datasets may enter the error state, and dependent jobs may pause or fail later.
Workflow invocation APIs summarize those jobs, but job APIs preserve the most detailed tool-level evidence.

Tool Wrapper Failure Controls

Important wrapper controls:

Control	Meaning
`detect_errors="default"`	For non-legacy XML tools without explicit `<stdio>`, Galaxy defaults to non-zero exit-code failure detection.
`detect_errors="exit_code"`	Adds fatal checks for non-zero exit codes, plus optional OOM exit-code handling.
`detect_errors="aggressive"`	Adds non-zero exit-code checks plus broad stdout/stderr regexes for OOM and generic error text.
`<stdio><exit_code ... /></stdio>`	Adds explicit exit-code ranges or values with levels such as `fatal`, `warning`, or `fatal_oom`.
`<stdio><regex ... /></stdio>`	Adds case-insensitive stdout/stderr regex checks.
`<command strict="...">`	Controls shell strictness; newer tool profiles default to `set -e` behavior.

Current Galaxy source parses these rules while loading the tool, then stores exit-code and regex rules on the tool object. XML parser behavior lives under ~/projects/repositories/galaxy/lib/galaxy/tool_util/parser/xml.py; concrete stdio presets live under ~/projects/repositories/galaxy/lib/galaxy/tool_util/parser/stdio.py.
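To make the table concrete, here is a minimal sketch that reads a `<stdio>` block with Python's standard library. The wrapper fragment is made up for illustration, `summarize_stdio` is a hypothetical helper, and the fallback level of `fatal` is an assumption of the sketch. Galaxy's own parser handles much more, including range parsing, presets, and profile defaults.

```python
import xml.etree.ElementTree as ET

# Illustrative wrapper fragment; these rules are invented for the example.
STDIO_XML = """
<stdio>
    <exit_code range="137" level="fatal_oom" description="Killed, possibly out of memory" />
    <exit_code range="1:" level="fatal" description="Non-zero exit code" />
    <regex match="out of memory" source="stderr" level="fatal_oom" description="Out of memory" />
    <regex match="deprecated" source="both" level="warning" description="Deprecation notice" />
</stdio>
"""


def summarize_stdio(xml_text):
    """Return (kind, selector, level) tuples in document order."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for element in root:
        if element.tag == "exit_code":
            selector = element.get("range", "")
        elif element.tag == "regex":
            selector = element.get("match", "")
        else:
            continue  # anything else is outside what this sketch models
        # Falling back to "fatal" is an assumption of this sketch.
        rules.append((element.tag, selector, element.get("level", "fatal")))
    return rules


for rule in summarize_stdio(STDIO_XML):
    print(rule)
```

Listing the rules in document order matters because a more specific rule (here the `137` exit code) has to appear before a broader one (`1:`) if it is to be the one that classifies the failure.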

Rule Ordering And Levels

Galaxy evaluates explicit and preset rules in a defined order:

Exit-code rules are evaluated before regex rules.
Regexes search stdout and/or stderr case-insensitively.
A fatal rule stops later checks.
Warnings, log messages, and QC messages can produce job_messages without failing the job.

Useful levels:

Level	Effect
`log`	Records informational message; does not fail the job.
`qc`	Records QC message; does not fail the job.
`warning`	Records warning; does not fail the job.
`fatal`	Fails the job as a generic tool error.
`fatal_oom`	Fails as out-of-memory; runner behavior can use this for OOM handling/resubmission.

Low-confidence caveat: OOM resubmission behavior depends on runner and destination configuration. The wrapper and output checker can classify a failure as OOM, but retry policy is not solely a wrapper property.
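The ordering and levels in this section can be sketched as a small simulation. This is a simplified model of the behavior described here, not Galaxy's output checker: the `ExitCodeRule` and `RegexRule` classes, the `check_output` function, and the message dictionary shape are all hypothetical.

```python
import re
from dataclasses import dataclass

FATAL_LEVELS = {"fatal", "fatal_oom"}  # levels that fail the job


@dataclass
class ExitCodeRule:
    low: int
    high: int
    level: str
    desc: str


@dataclass
class RegexRule:
    pattern: str
    source: str  # "stdout", "stderr", or "both"
    level: str
    desc: str


def check_output(exit_code, stdout, stderr, exit_rules, regex_rules):
    """Return (failed, messages) using the ordering described in this section."""
    messages = []

    # 1. Exit-code rules are evaluated before regex rules.
    for rule in exit_rules:
        if rule.low <= exit_code <= rule.high:
            messages.append({"type": "exit_code", "level": rule.level, "desc": rule.desc})
            if rule.level in FATAL_LEVELS:
                return True, messages  # a fatal rule stops later checks

    # 2. Regex rules search the selected streams case-insensitively.
    for rule in regex_rules:
        streams = []
        if rule.source in ("stdout", "both"):
            streams.append(stdout)
        if rule.source in ("stderr", "both"):
            streams.append(stderr)
        if any(re.search(rule.pattern, text, re.IGNORECASE) for text in streams):
            messages.append({"type": "regex", "level": rule.level, "desc": rule.desc})
            if rule.level in FATAL_LEVELS:
                return True, messages

    # Only log, qc, or warning messages were recorded (or none at all).
    return False, messages


exit_rules = [ExitCodeRule(1, 255, "fatal", "Non-zero exit code")]
regex_rules = [
    RegexRule("deprecated", "both", "warning", "Deprecation notice"),
    RegexRule("out of memory", "stderr", "fatal_oom", "Out of memory"),
]
failed, messages = check_output(0, "", "WARNING: option is DEPRECATED", exit_rules, regex_rules)
print(failed, [m["level"] for m in messages])
```

In the usage lines, the job exits with code 0 and only the warning regex matches, so a message is recorded while the job is not marked as failed.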

Job And Dataset States

Job states relevant to workflow tests include:

In progress or queued: new, queued, running, waiting, upload, resubmitted.
Success: ok.
Failure or terminal problem: error, failed, paused, stopped, deleted, skipped, depending on context.

Dataset states relevant to downstream failures include ok, error, paused, failed_metadata, deferred, and discarded.

For workflow debugging, do not collapse job state and dataset state. A job can fail, its outputs can become error datasets, and a downstream workflow step can later fail because it consumes those datasets.
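One way to keep the two kinds of state apart is to report them side by side instead of merging them into a single status. The sketch below uses the state names listed in this section. The grouping, `classify_job_state`, and `summarize_step` are hypothetical helpers for this note, not a Galaxy API.

```python
# State names come from the lists in this section; the grouping is this note's own.
IN_PROGRESS_JOB_STATES = {"new", "queued", "running", "waiting", "upload", "resubmitted"}
SUCCESS_JOB_STATES = {"ok"}
PROBLEM_JOB_STATES = {"error", "failed", "paused", "stopped", "deleted", "skipped"}


def classify_job_state(state):
    """Map a job state to a coarse class without consulting dataset states."""
    if state in SUCCESS_JOB_STATES:
        return "success"
    if state in IN_PROGRESS_JOB_STATES:
        return "in_progress"
    if state in PROBLEM_JOB_STATES:
        return "problem"
    return "unknown"


def summarize_step(job_state, output_dataset_states):
    """Report the job state and each output dataset state separately."""
    return {
        "job_state": job_state,
        "job_class": classify_job_state(job_state),
        "dataset_states": dict(output_dataset_states),
        "non_ok_outputs": sorted(
            name for name, state in output_dataset_states.items() if state != "ok"
        ),
    }


# Output names here are invented for the example.
summary = summarize_step("error", {"report": "error", "log": "ok"})
print(summary["job_class"], summary["non_ok_outputs"])
```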

Stream And Message Fields

Galaxy distinguishes tool streams from job/runner streams:

Field	Meaning
`tool_stdout`	stdout from the executed tool command.
`tool_stderr`	stderr from the executed tool command.
`job_stdout`	stdout from job wrapper/runner context.
`job_stderr`	stderr from job wrapper/runner context.
`stdout`	Combined stdout compatibility view.
`stderr`	Combined stderr compatibility view.
`job_messages`	Structured failure/warning messages produced by stdio and output checking.
`exit_code`	Process exit code observed by the runner.

Prefer job_messages plus separate streams for reference-quality failure interpretation. Combined stdout and stderr are useful for humans but lose provenance.
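The preference above can be sketched as a small function over a job-detail dictionary. The field names are the ones in the table. `failure_evidence` is a hypothetical helper, and the sample job, including the shape of its `job_messages` entry, is invented for illustration.

```python
SEPARATE_STREAMS = ("tool_stdout", "tool_stderr", "job_stdout", "job_stderr")
COMBINED_STREAMS = ("stdout", "stderr")


def failure_evidence(job):
    """Collect failure evidence from a job-detail dict, keeping provenance."""
    evidence = {
        "exit_code": job.get("exit_code"),
        "job_messages": job.get("job_messages") or [],
        "streams": {},
        "used_combined_fallback": False,
    }
    # Prefer the separate streams: they say whether the tool or the runner wrote the text.
    for name in SEPARATE_STREAMS:
        text = job.get(name)
        if text:
            evidence["streams"][name] = text
    # Use the combined views only when no separate stream is available.
    if not evidence["streams"]:
        for name in COMBINED_STREAMS:
            text = job.get(name)
            if text:
                evidence["streams"][name] = text
                evidence["used_combined_fallback"] = True
    return evidence


# Invented sample; real job detail carries many more fields.
sample_job = {
    "exit_code": 1,
    "tool_stderr": "Error: input file is empty",
    "job_stderr": "",
    "stderr": "Error: input file is empty",
    "job_messages": [{"type": "exit_code", "desc": "Fatal error: Exit code 1"}],
}
evidence = failure_evidence(sample_job)
print(sorted(evidence["streams"]), evidence["used_combined_fallback"])
```

Recording whether the fallback was used keeps later interpretation honest about how much provenance was actually available.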

API Surfaces

Useful job APIs from Galaxy source under ~/projects/repositories/galaxy/lib/galaxy/webapps/galaxy/api/jobs.py:

API	Use
`GET /api/jobs`	Filter jobs by state, tool id, workflow id, invocation id, history id, etc.
`GET /api/jobs/{job_id}`	Basic job detail.
`GET /api/jobs/{job_id}?full=true`	Full job detail, including streams and `job_messages`.
`GET /api/jobs/{job_id}/stdout`	Combined stdout as plain text.
`GET /api/jobs/{job_id}/stderr`	Combined stderr as plain text.
`GET /api/jobs/{job_id}/console_output`	Live or stored tool stdout/stderr, useful while a job is running.
`GET /api/jobs/{job_id}/inputs`	Input datasets for the job.
`GET /api/jobs/{job_id}/outputs`	Output datasets and output collections.
`GET /api/jobs/{job_id}/metrics`	Job metrics, if available.
`GET /api/jobs/{job_id}/common_problems`	Known simple problems such as empty or duplicate inputs.

Admin-only or admin-enriched fields can include command line, traceback, destination, runner, handler, external id, and destination parameters. Do not assume a normal workflow-testing API key can retrieve everything.
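A minimal sketch of requesting full job detail with only the standard library follows. The endpoint and the `full=true` parameter are the ones in the table. Sending the key in an `x-api-key` header is an assumption about how the target server is configured, and the host name, job id, and function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_job_request(base_url, job_id, api_key, full=True):
    """Build a GET request for /api/jobs/{job_id}, optionally with full=true."""
    url = base_url.rstrip("/") + "/api/jobs/" + urllib.parse.quote(job_id, safe="")
    if full:
        url += "?" + urllib.parse.urlencode({"full": "true"})
    # Assumption: the server accepts the API key in an x-api-key header.
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def fetch_job(base_url, job_id, api_key, full=True, timeout=30):
    """Send the request and decode the JSON body; raises on HTTP errors."""
    request = build_job_request(base_url, job_id, api_key, full=full)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_job_request("https://galaxy.example.org/", "abc123", "secret")
print(request.get_method(), request.full_url)
```

Because a non-admin key may not return every field, callers should read fields such as the command line or traceback defensively, for example with `job.get("command_line")`, and treat a missing value as "not available to this key" and not as "not recorded".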

Durable Reference Use

This note should inform generated skills when they need to preserve or inspect tool-level failure evidence:

A concrete workflow step should preserve tool id, tool version, input labels, output labels, datatype, and collection shape so that a later job failure can be traced back to the authoring decision.
A runtime failure should not be classified from stderr alone if job_messages, exit code, and wrapper stdio rules are available.
A workflow invocation failure may be caused by an upstream job, but invocation APIs are not a substitute for full job detail.
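As an illustration of the first point, a generated skill could keep a small record per authored step and match failed jobs against it. The `StepProvenance` class, its field names, and `trace_failure` are hypothetical. The sketch assumes the job detail exposes `tool_id` and `tool_version`, and the tool ids and labels are invented.

```python
from dataclasses import dataclass, field


@dataclass
class StepProvenance:
    """What was decided at authoring time for one workflow step."""
    tool_id: str
    tool_version: str
    input_labels: list = field(default_factory=list)
    output_labels: list = field(default_factory=list)
    datatype: str = ""
    collection_shape: str = ""  # empty for a plain dataset


def trace_failure(steps, failed_job):
    """Return the authored steps whose tool id and version match a failed job."""
    return [
        step
        for step in steps
        if step.tool_id == failed_job.get("tool_id")
        and step.tool_version == failed_job.get("tool_version")
    ]


steps = [
    StepProvenance("example_trimmer", "1.0", ["raw reads"], ["trimmed reads"], "fastqsanger", "list:paired"),
    StepProvenance("example_mapper", "2.3", ["trimmed reads"], ["alignments"], "bam", "list"),
]
failed_job = {"tool_id": "example_mapper", "tool_version": "2.3", "state": "error"}
for step in trace_failure(steps, failed_job):
    print(step.tool_id, step.input_labels, step.output_labels)
```

Matching on tool id and version alone can return several steps when a workflow uses the same tool more than once, so a real record would likely also carry a step label or index.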

Low-Confidence Or Version-Sensitive Points

Tool profile defaults changed over time. Prefer current Galaxy source and wrapper profile over old generic rules.
Some docs describe failure messages as stream text; current code preserves structured job_messages separately.
OOM handling depends on job runner and destination configuration.
Planemo may print or summarize job detail differently from raw Galaxy APIs; see planemo-workflow-test-architecture.
