Tool Request API
Galaxy PR #20935 - Asynchronous Tool Execution API
Executive Summary
The Tool Request API introduces a new asynchronous job submission mechanism for Galaxy via POST /api/jobs. This replaces the problematic synchronous POST /api/tools endpoint that blocks web threads during tool execution, which can take minutes for large collection-based workflows. The new architecture offloads job expansion and creation to Celery workers while providing strongly-typed, Pydantic-validated state transformations at each step.
Problem Statement
The legacy tool submission process (POST /api/tools) has several critical issues:
- Blocking Web Threads - Tool execution happens entirely in the web thread, even when processing could take dozens of minutes (e.g., mapping large collections over tools can create hundreds of thousands of jobs)
- Semantic Endpoint Confusion - POST /api/tools creates jobs, not tools, violating REST semantics
- Untyped State Dictionaries - Tool parameters are passed as opaque, mostly unvalidated dictionaries, making debugging and documentation difficult
- Poor Validation Timing - Parameter validation happens deep in execution rather than at request time
Architecture Overview
API Flow
API Request ──HTTP JSON──▶ Jobs API ──create()──▶ Job Service
Job Service (web thread):
    If not strict: build RelaxedRequestToolState, then strictify() to RequestToolState
    If strict: build and validate RequestToolState
    decode() to RequestInternalToolState
Job Service ──serialize ToolRequest──▶ Database
Job Service ──queue QueueJobs──▶ Task Queue
API Request ◀──JSON response── Jobs API ◀──JobCreateResponse── Job Service
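The key property of the web-thread half is that it only persists and enqueues. No jobs are created before the response goes out. The sketch below models that pattern with in-memory stand-ins. `FakeToolRequest` and `FakeJobService` are hypothetical. A dict plays the database, `queue.Queue` plays the Celery task queue, and IDs are plain integers rather than Galaxy's encoded IDs.

```python
import itertools
import queue
from dataclasses import dataclass, field


@dataclass
class FakeToolRequest:
    """Hypothetical stand-in for a persisted ToolRequest row."""

    id: int
    request: dict  # validated, decoded request state
    state: str = "new"


@dataclass
class FakeJobService:
    """Hypothetical web-side service: persist, enqueue, and return at once."""

    db: dict = field(default_factory=dict)  # stands in for the database
    task_queue: queue.Queue = field(default_factory=queue.Queue)  # stands in for Celery
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create(self, tool_id: str, internal_state: dict) -> dict:
        # 1. Persist the request state so a worker can load it later.
        tool_request = FakeToolRequest(id=next(self._ids), request=internal_state)
        self.db[tool_request.id] = tool_request
        # 2. Enqueue a small message that only references the stored request.
        self.task_queue.put({"tool_id": tool_id, "tool_request_id": tool_request.id})
        # 3. Return immediately; no jobs have been created yet.
        return {"tool_request_id": tool_request.id}
```

Because the queued message carries only a reference to the stored request, the web thread's cost stays independent of how many jobs the request eventually expands into.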
Backend Processing (Celery Worker)
Task Queue ──QueueJobs──▶ JobSubmitter
JobSubmitter:
    load ToolRequest from the database
    dereference() URI inputs to HDAs
    materialize() deferred data
JobSubmitter ──handle_input_async()──▶ Tool.execute()
Tool.execute() ──create & queue individual jobs──▶ Job Manager
New API Endpoints
Primary Endpoint
POST /api/jobs
Creates a tool request and queues job creation asynchronously.
Request Schema (JobRequest):
from typing import Optional


class JobRequest:
    tool_id: Optional[str]  # Tool identifier
    tool_uuid: Optional[str]  # Tool UUID (alternative identifier)
    tool_version: Optional[str]  # Specific tool version
    history_id: Optional[str]  # Target history (encoded ID)
    inputs: Optional[dict]  # Tool parameters
    strict: bool = True  # Enable strict validation
    use_cached_jobs: Optional[bool]  # Reuse existing job results
    rerun_remap_job_id: Optional[str]  # Job to rerun and remap (encoded ID)
    send_email_notification: bool = False
Response Schema (JobCreateResponse):
class JobCreateResponse:
    tool_request_id: str  # Encoded ID of the ToolRequest
    task_result: AsyncTaskResultSummary  # Celery task tracking info
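A client submission to the new endpoint is an ordinary JSON POST. The sketch below builds the request with the standard library but does not send it, so it can be inspected offline. It assumes the API key goes in an `x-api-key` header. The server URL, tool ID and encoded IDs are placeholders. Only the body fields come from the JobRequest schema above.

```python
import json
import urllib.request


def build_job_request(galaxy_url: str, api_key: str, tool_id: str,
                      history_id: str, inputs: dict,
                      strict: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request (sketch)."""
    body = {
        "tool_id": tool_id,
        "history_id": history_id,  # encoded history ID
        "inputs": inputs,  # e.g. {"input1": {"src": "hda", "id": "<encoded>"}}
        "strict": strict,
    }
    return urllib.request.Request(
        url=galaxy_url.rstrip("/") + "/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},  # assumed header
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would return a JobCreateResponse body. Its `tool_request_id` is the handle for the tool request endpoints listed below.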
Supporting Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/tool_requests/{id} | GET | Get tool request details |
| /api/tool_requests/{id}/state | GET | Get tool request state |
| /api/histories/{history_id}/tool_requests | GET | List tool requests for a history |
| /api/tools/{tool_id}/inputs | GET | Get tool input schema |
| /api/tools/{tool_id}/parameter_request_schema | GET | JSON Schema for tool request API |
| /api/tools/{tool_id}/parameter_landing_request_schema | GET | JSON Schema for landing request API |
| /api/tools/{tool_id}/parameter_test_case_xml_schema | GET | JSON Schema for test case construction |
State Classes and Transformations
The API introduces a hierarchy of strongly-typed state classes with explicit, validated transformations between them.
State Class Hierarchy
ToolState (abstract)
├── RelaxedRequestToolState ──strictify()──▶ RequestToolState
├── RequestToolState
│   └─ decode() ──▶ RequestInternalToolState
│                   └─ dereference() ──▶ RequestInternalDereferencedToolState
│                                        └─ expand() ──▶ JobInternalToolState
└── WorkflowStepToolState
    └──▶ WorkflowStepLinkedToolState
State Representations
| State Class | Representation | Object References | Features |
|---|---|---|---|
| RelaxedRequestToolState | relaxed_request | {src: "hda", id: <encoded>} | Allows legacy syntax quirks |
| RequestToolState | request | {src: "hda", id: <encoded>} | Strict validation, map/reduce |
| RequestInternalToolState | request_internal | {src: "hda", id: <decoded>} | Database-ready, allows URI sources |
| RequestInternalDereferencedToolState | request_internal_dereferenced | {src: "hda", id: <decoded>} | All URIs converted to HDAs |
| JobInternalToolState | job_internal | {src: "hda", id: <decoded>} | Mapping expanded, per-job state |
| TestCaseToolState | test_case_xml | File names and URIs | For test case construction |
| WorkflowStepToolState | workflow_step | Mixed | Nearly everything optional |
| WorkflowStepLinkedToolState | workflow_step_linked | With link references | Includes workflow connections |
Transformation Functions
# API layer (web thread)
strictify(relaxed: RelaxedRequestToolState) -> RequestToolState
decode(request: RequestToolState, decode_id) -> RequestInternalToolState
# Celery worker
dereference(internal: RequestInternalToolState) -> RequestInternalDereferencedToolState
expand(dereferenced: RequestInternalDereferencedToolState) -> list[JobInternalToolState]
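A toy pipeline can illustrate what each transformation does to a data reference. Everything here is hypothetical. The four classes are simplified stand-ins for the real state classes. The `{"src": "url", ...}` shape and the `"map_over": True` marker are invented for the sketch and are not Galaxy's request syntax. `decode_id`, `create_hda` and `collection_elements` are callbacks standing in for ID decoding, URI-to-HDA creation and collection lookup.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical, heavily simplified stand-ins for the state classes.
@dataclass(frozen=True)
class RequestState:
    inputs: dict  # data references carry encoded (string) IDs


@dataclass(frozen=True)
class InternalState:
    inputs: dict  # decoded (int) IDs; URI sources still allowed


@dataclass(frozen=True)
class DereferencedState:
    inputs: dict  # every URI replaced by an HDA reference


@dataclass(frozen=True)
class JobState:
    inputs: dict  # one concrete value per parameter, no mapping left


def _is_ref(value, *sources) -> bool:
    return isinstance(value, dict) and value.get("src") in sources


def decode(request: RequestState, decode_id: Callable[[str], int]) -> InternalState:
    # Web thread: swap encoded IDs for database IDs.
    return InternalState({
        name: {**value, "id": decode_id(value["id"])} if _is_ref(value, "hda", "hdca") else value
        for name, value in request.inputs.items()
    })


def dereference(internal: InternalState, create_hda: Callable[[str], int]) -> DereferencedState:
    # Worker: turn each URI source into a (new) HDA reference.
    return DereferencedState({
        name: {"src": "hda", "id": create_hda(value["url"])} if _is_ref(value, "url") else value
        for name, value in internal.inputs.items()
    })


def expand(state: DereferencedState,
           collection_elements: Callable[[int], list[int]]) -> list[JobState]:
    # Worker: one JobState per element of the (single) mapped-over collection.
    mapped = [n for n, v in state.inputs.items() if _is_ref(v, "hdca") and v.get("map_over")]
    if not mapped:
        return [JobState(dict(state.inputs))]
    if len(mapped) > 1:
        raise NotImplementedError("this sketch maps over a single parameter only")
    name = mapped[0]
    return [
        JobState({**state.inputs, name: {"src": "hda", "id": element_id}})
        for element_id in collection_elements(state.inputs[name]["id"])
    ]
```

Each step returns a new, more constrained type. Code that receives a `DereferencedState` can therefore rely on there being no URI sources left, without re-checking.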
Database Models
ToolRequest
class ToolRequest:
    id: int  # Primary key
    tool_source_id: int  # FK to ToolSource
    history_id: Optional[int]  # FK to History
    request: dict  # Serialized RequestInternalToolState
    state: str  # "new" | "submitted" | "failed"
    state_message: Optional[str]  # Error details if failed
    # Relationships
    tool_source: ToolSource
    history: Optional[History]
    jobs: list[Job]  # Created jobs
    implicit_collections: list[ToolRequestImplicitCollectionAssociation]
ToolRequestState Enum
from enum import Enum


class ToolRequestState(str, Enum):
    NEW = "new"  # Request created, pending processing
    SUBMITTED = "submitted"  # Jobs created successfully
    FAILED = "failed"  # Processing failed
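Mixing `str` into the enum means a state compares equal to its database string, so stored values can be looked up directly. The sketch below also encodes the lifecycle the states imply. The `ALLOWED` table and `advance` helper are hypothetical. They assume a request leaves `new` exactly once and that `submitted` and `failed` are terminal. They also assume a failed request should carry a `state_message`.

```python
from enum import Enum
from typing import Optional


class ToolRequestState(str, Enum):
    NEW = "new"
    SUBMITTED = "submitted"
    FAILED = "failed"


# Assumed lifecycle: NEW moves to exactly one terminal state.
ALLOWED = {
    ToolRequestState.NEW: {ToolRequestState.SUBMITTED, ToolRequestState.FAILED},
    ToolRequestState.SUBMITTED: set(),
    ToolRequestState.FAILED: set(),
}


def advance(current: ToolRequestState, target: ToolRequestState,
            message: Optional[str] = None) -> tuple[ToolRequestState, Optional[str]]:
    """Return the new (state, state_message) pair or reject the transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target is ToolRequestState.FAILED and not message:
        raise ValueError("a failed request should carry a state_message")
    return target, message
```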
ToolRequestImplicitCollectionAssociation
Links implicit output collections to their source tool request:
class ToolRequestImplicitCollectionAssociation:
    id: int
    tool_request_id: int
    dataset_collection_id: int
    output_name: str
Celery Task
queue_jobs Task
@galaxy_task(action="queuing up submitted jobs")
def queue_jobs(request: QueueJobs, app: MinimalManagerApp, job_submitter: JobSubmitter):
    # Rebuild the Tool from the serialized tool source carried by the task
    tool = cached_create_tool_from_representation(
        app=app,
        raw_tool_source=request.tool_source.raw_tool_source,
        tool_dir=request.tool_source.tool_dir,
        tool_source_class=request.tool_source.tool_source_class,
    )
    # Dereference inputs, create the jobs, and update the ToolRequest
    job_submitter.queue_jobs(tool, request)
QueueJobs Task Request
class QueueJobs:
    tool_source: ToolSource  # Serialized tool definition
    tool_request_id: int  # Reference to persisted request
    user: RequestUser  # User context for job creation
    use_cached_jobs: bool  # Enable job caching
    rerun_remap_job_id: Optional[int]  # For reruns
JobSubmitter Processing
The JobSubmitter class handles the asynchronous job creation:
class JobSubmitter:
    def queue_jobs(self, tool: Tool, request: QueueJobs) -> None:
        tool_request = self._tool_request(request.tool_request_id)
        request_context = self._context(tool_request, request)
        # 1. Dereference URI inputs to HDAs
        tool_state, new_hdas = self.dereference(request_context, tool, request, tool_request)
        # 2. Materialize the newly created (deferred) HDAs, unless the request
        #    explicitly asked for them to stay deferred
        for hda_pair in [p for p in new_hdas if not p.request.deferred]:
            self.hda_manager.materialize(...)
        # 3. Execute the tool; this is where the jobs are created
        tool.handle_input_async(
            request_context,
            tool_request,
            tool_state,
            history=tool_request.history,
            use_cached_job=request.use_cached_jobs,
            rerun_remap_job_id=request.rerun_remap_job_id,
        )
        # 4. Mark the request as submitted
        tool_request.state = ToolRequest.states.SUBMITTED
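The excerpt above shows only the success path. The ToolRequest model also has a `failed` state and a `state_message` column, so a worker needs some way to record errors on the request instead of leaving it in `new`. The sketch below shows one such wrapper. `ToolRequestRecord` and `process` are hypothetical, and `create_jobs` stands in for steps 1 to 3.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolRequestRecord:
    """Hypothetical stand-in for a ToolRequest row."""

    id: int
    state: str = "new"
    state_message: Optional[str] = None


def process(tool_request: ToolRequestRecord, create_jobs: Callable[[], list]) -> list:
    """Run job creation and record the outcome on the request (sketch)."""
    try:
        jobs = create_jobs()
    except Exception as exc:  # record any failure where API clients can see it
        tool_request.state = "failed"
        tool_request.state_message = str(exc)
        return []
    tool_request.state = "submitted"
    return jobs
```

Recording the error on the row means a client polling `/api/tool_requests/{id}/state` sees the failure. Otherwise the error would exist only in a worker log.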
Strict vs Relaxed Mode
The API supports two validation modes:
Strict Mode (default, strict=True)
- Full Pydantic validation of inputs
- No legacy behavior accommodations
- Cleaner, more predictable validation errors
Relaxed Mode (strict=False)
- Preserves some legacy behavior for backwards compatibility
- Examples:
  - Empty string defaults for non-optional text inputs
  - Conversion of explicit None to empty string for non-optional text
  - More lenient conditional/repeat initialization
# Building the validated request state (web thread)
if not strict:
    relaxed_request_state = RelaxedRequestToolState(inputs)
    relaxed_request_state.validate(tool)
    request_state = strictify(relaxed_request_state, tool)
else:
    request_state = RequestToolState(inputs)
    request_state.validate(tool)
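The first two relaxed-mode accommodations for text parameters can be shown in isolation. The sketch below is not Galaxy's `strictify()`. `TextParam` and `strictify_text` are hypothetical. The sketch handles only non-optional text parameters. It fills in an empty string when a value is missing and the parameter has no default, and it converts an explicit `None` to an empty string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextParam:
    """Hypothetical minimal description of a text parameter."""

    name: str
    optional: bool = False
    default: Optional[str] = None


def strictify_text(inputs: dict, params: list[TextParam]) -> dict:
    """Apply two relaxed-mode accommodations to text parameters (sketch)."""
    strict = dict(inputs)  # never mutate the caller's inputs
    for param in params:
        if param.optional:
            continue  # optional text may legitimately be None or absent
        if param.name not in strict and param.default is None:
            strict[param.name] = ""  # empty-string default
        elif param.name in strict and strict[param.name] is None:
            strict[param.name] = ""  # explicit None -> empty string
    return strict
```

Running such normalization before strict validation lets one strict validator serve both modes. The legacy quirks stay confined to the relaxed entry point.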
Benefits
- Non-Blocking Web Requests - Tool execution no longer blocks web threads; the response returns immediately with a tracking ID
- Correct REST Semantics - POST /api/jobs creates jobs, and POST /api/tools is reserved for tool management
- Strong Typing Throughout - Pydantic models validate state at each transformation step
- Self-Documenting - JSON Schema endpoints describe valid inputs for any tool
- Better Error Messages - Validation errors pinpoint exact parameter issues early
- Scalable - Job creation is distributed across Celery workers
- Traceable - ToolRequest provides an audit trail linking requests to the jobs they created
Testing
The PR includes tests at several levels:
- test/functional/test_toolbox_pytest.py - Framework tool tests
- lib/galaxy_test/api/test_tool_execute.py - Existing tests adapted
- lib/galaxy_test/api/test_tool_execution.py - New async API tests

The test matrix covers both the legacy and the new API paths, selected via the GALAXY_TEST_USE_LEGACY_TOOL_API environment variable (if_needed | always).
Migration Path
The legacy POST /api/tools endpoint remains functional. Applications can migrate to POST /api/jobs incrementally:
- Update the client to handle the async response pattern
- Poll /api/tool_requests/{id}/state for completion
- Retrieve job IDs from /api/tool_requests/{id}
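The polling step of the migration can be written as a small loop. It stops on either terminal state (`submitted` or `failed`). The sketch below assumes a caller-supplied `fetch_state` function that performs `GET /api/tool_requests/{id}/state` and returns the state string. The exact response shape of that endpoint is an assumption. `wait_for_tool_request` is hypothetical, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

TERMINAL_STATES = {"submitted", "failed"}  # from the ToolRequestState enum


def wait_for_tool_request(fetch_state: Callable[[str], str], tool_request_id: str,
                          interval: float = 2.0, timeout: float = 300.0,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the tool request leaves "new"; return its final state (sketch)."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state(tool_request_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"tool request {tool_request_id} still {state!r}")
        sleep(interval)
```

Once the loop returns `submitted`, the created job IDs can be read from `/api/tool_requests/{id}`. On `failed`, the request details should carry the error.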
Future Work
As noted in the PR, this forms the backend foundation for:
- Workflow transformation using these state models
- Tool form adaptation to use the new API
- Enhanced linting using the Pydantic models