PR 20935 - Tool Request API

Asynchronous job submission via POST /api/jobs with Pydantic-validated state transformations

Revised: 2026-06-06
Revision: 8
GitHub PR: #20935
Related Notes: Component - API Tests Tools, Component - Tool State Dynamic Models, Component - Tool State Specification, Component - YAML Tool Runtime, Dependency - Pydantic Dynamic Models, PR 18641 - Parameter Model Improvements Research, PR 18758 - Tool Execution Typing and Decomposition, PR 19434 - User Defined Tools, PR 21335 - GA4GH WES API, PR 21828 - YAML Tool Hardening and Tool State, PR 21842 - Tool Execution Migrated to api jobs, PR 21932 - History Graph API, PR 22706 - Workflow Extraction by IDs, Problem - YAML Tool Post-Hoc State Divergence, Problem - basic.py Parameter Hierarchy

Tool Request API

Galaxy PR #20935 - Asynchronous Tool Execution API

Executive Summary

The Tool Request API introduces an asynchronous job submission mechanism for Galaxy via POST /api/jobs. It is intended to supersede the synchronous POST /api/tools endpoint, which blocks a web thread for the whole of job creation; for large collection-based workflows this can take many minutes. The new architecture offloads job expansion and creation to Celery workers and applies strongly typed, Pydantic-validated state transformations at each step.

Problem Statement

The legacy tool submission process (POST /api/tools) has several critical issues:

  1. Blocking Web Threads - Job creation happens entirely in the web thread, even when it takes dozens of minutes (e.g., mapping a tool over a large collection can create hundreds of thousands of jobs)

  2. Semantic Endpoint Confusion - POST /api/tools creates jobs, not tools, violating REST semantics

  3. Untyped State Dictionaries - Tool parameters are passed as opaque, mostly unvalidated dictionaries, which makes debugging and documentation difficult

  4. Poor Validation Timing - Parameter validation happens deep in execution rather than at request time

Architecture Overview

API Flow

┌─────────────┐     ┌───────────┐     ┌─────────────┐     ┌──────────┐     ┌────────────┐
│ API Request │────▶│ Jobs API  │────▶│ Job Service │────▶│ Database │     │ Task Queue │
└─────────────┘     └───────────┘     └─────────────┘     └──────────┘     └────────────┘
      │                   │                  │                   │                │
      │  HTTP JSON        │    create()      │                   │                │
      │                   │                  │                   │                │
      │                   │     ┌────────────┴───────────┐       │                │
      │                   │     │ If not strict:         │       │                │
      │                   │     │  - Build RelaxedRequest│       │                │
      │                   │     │  - strictify() to      │       │                │
      │                   │     │    RequestToolState    │       │                │
      │                   │     │ If strict:             │       │                │
      │                   │     │  - Build & validate    │       │                │
      │                   │     │    RequestToolState    │       │                │
      │                   │     │ decode() to            │       │                │
      │                   │     │   RequestInternalState │       │                │
      │                   │     └────────────┬───────────┘       │                │
      │                   │                  │                   │                │
      │                   │                  │──────────────────▶│ Serialize      │
      │                   │                  │                   │ ToolRequest    │
      │                   │                  │                   │                │
      │                   │                  │───────────────────┼───────────────▶│
      │                   │                  │                   │  Queue QueueJobs
      │                   │                  │                   │                │
      │◀──────────────────│◀─────────────────│ JobCreateResponse │                │
      │   JSON Response   │                  │                   │                │

Backend Processing (Celery Worker)

┌─────────────┐     ┌───────────────┐     ┌────────────────┐     ┌──────────────┐
│ Task Queue  │────▶│ JobSubmitter  │────▶│ Tool.execute() │────▶│ Job Manager  │
└─────────────┘     └───────────────┘     └────────────────┘     └──────────────┘
      │                    │                      │                      │
      │  QueueJobs         │                      │                      │
      │                    │                      │                      │
      │           ┌────────┴────────┐             │                      │
      │           │ Load ToolRequest│             │                      │
      │           │ from Database   │             │                      │
      │           │                 │             │                      │
      │           │ dereference()   │             │                      │
      │           │ URI inputs to   │             │                      │
      │           │ HDAs            │             │                      │
      │           │                 │             │                      │
      │           │ materialize()   │             │                      │
      │           │ deferred data   │             │                      │
      │           └────────┬────────┘             │                      │
      │                    │                      │                      │
      │                    │─────────────────────▶│                      │
      │                    │ handle_input_async() │                      │
      │                    │                      │                      │
      │                    │                      │─────────────────────▶│
      │                    │                      │   Create & queue     │
      │                    │                      │   individual jobs    │

New API Endpoints

Primary Endpoint

POST /api/jobs

Creates a tool request and queues job creation asynchronously.

Request Schema (JobRequest):

class JobRequest:
    tool_id: Optional[str]          # Tool identifier
    tool_uuid: Optional[str]        # Tool UUID (alternative identifier)
    tool_version: Optional[str]     # Specific tool version
    history_id: Optional[str]       # Target history (encoded ID)
    inputs: Optional[dict]          # Tool parameters
    strict: bool = True             # Enable strict validation
    use_cached_jobs: Optional[bool] # Reuse existing job results
    rerun_remap_job_id: Optional[str]
    send_email_notification: bool = False

Response Schema (JobCreateResponse):

class JobCreateResponse:
    tool_request_id: str            # Encoded ID of the ToolRequest
    task_result: AsyncTaskResultSummary  # Celery task tracking info
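
As a rough illustration of the request shape, the sketch below assembles a JSON body for POST /api/jobs using only the standard library. The helper name build_job_request, the tool ID and the encoded IDs are hypothetical placeholders; only the field names come from the JobRequest schema above.

```python
import json
from typing import Any, Optional


def build_job_request(
    tool_id: str,
    history_id: str,
    inputs: dict[str, Any],
    strict: bool = True,
    tool_version: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a body for POST /api/jobs (hypothetical helper, not Galaxy code)."""
    body: dict[str, Any] = {
        "tool_id": tool_id,
        "history_id": history_id,
        "inputs": inputs,
        "strict": strict,
    }
    # Optional fields are left out instead of being sent as null.
    if tool_version is not None:
        body["tool_version"] = tool_version
    return body


payload = build_job_request(
    tool_id="cat1",  # placeholder tool ID
    history_id="f2db41e1fa331b3e",  # placeholder encoded history ID
    inputs={"input1": {"src": "hda", "id": "a799d38679e985db"}},  # placeholder encoded HDA ID
)
print(json.dumps(payload, indent=2))
```

The printed body is what a client would send; the response then carries tool_request_id and task_result for tracking.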

Supporting Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/tool_requests/{id} | GET | Get tool request details |
| /api/tool_requests/{id}/state | GET | Get tool request state |
| /api/histories/{history_id}/tool_requests | GET | List tool requests for a history |
| /api/tools/{tool_id}/inputs | GET | Get tool input schema |
| /api/tools/{tool_id}/parameter_request_schema | GET | JSON Schema for tool request API |
| /api/tools/{tool_id}/parameter_landing_request_schema | GET | JSON Schema for landing request API |
| /api/tools/{tool_id}/parameter_test_case_xml_schema | GET | JSON Schema for test case construction |

State Classes and Transformations

The API introduces a hierarchy of strongly-typed state classes with explicit, validated transformations between them.

State Class Hierarchy

                    ToolState (abstract)

         ┌───────────────┼───────────────────────────────┐
         │               │                               │
         ▼               ▼                               ▼
RelaxedRequestToolState  RequestToolState        WorkflowStepToolState
         │               │                               │
         │  strictify()  │                               ▼
         └──────────────▶│                    WorkflowStepLinkedToolState
                         │  decode()

              RequestInternalToolState

                         │  dereference()

         RequestInternalDereferencedToolState

                         │  expand()

                JobInternalToolState

State Representations

| State Class | Representation | Object References | Features |
|---|---|---|---|
| RelaxedRequestToolState | relaxed_request | {src: "hda", id: <encoded>} | Allows legacy syntax quirks |
| RequestToolState | request | {src: "hda", id: <encoded>} | Strict validation, map/reduce |
| RequestInternalToolState | request_internal | {src: "hda", id: <decoded>} | Database-ready, allows URI sources |
| RequestInternalDereferencedToolState | request_internal_dereferenced | {src: "hda", id: <decoded>} | All URIs converted to HDAs |
| JobInternalToolState | job_internal | {src: "hda", id: <decoded>} | Mapping expanded, per-job state |
| TestCaseToolState | test_case_xml | File names and URIs | For test case construction |
| WorkflowStepToolState | workflow_step | Mixed | Nearly everything optional |
| WorkflowStepLinkedToolState | workflow_step_linked | With link references | Includes workflow connections |

Transformation Functions

# Simplified signatures: the real functions take further arguments
# (for example, strictify is also passed the tool).

# API layer (web thread)
strictify(relaxed: RelaxedRequestToolState) -> RequestToolState
decode(request: RequestToolState, decode_id) -> RequestInternalToolState

# Celery worker
dereference(internal: RequestInternalToolState) -> RequestInternalDereferencedToolState
expand(dereferenced: RequestInternalDereferencedToolState) -> list[JobInternalToolState]
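
To make the chain more concrete, here is a toy sketch that pushes one data parameter through decode and expand using plain dicts. It is not Galaxy's implementation: toy_decode_id, the way a collection is mapped over, and the element IDs are simplifying assumptions made purely for illustration, and the real request syntax for mapping is more involved.

```python
from typing import Any, Callable

State = dict[str, Any]


def toy_decode_id(encoded: str) -> int:
    """Stand-in for Galaxy's ID decoding: here encoded IDs are just hex strings."""
    return int(encoded, 16)


def decode(request: State, decode_id: Callable[[str], int]) -> State:
    """request -> request_internal: swap encoded IDs for database IDs."""
    internal: State = {}
    for name, value in request.items():
        if isinstance(value, dict) and "id" in value:
            internal[name] = {**value, "id": decode_id(value["id"])}
        else:
            internal[name] = value
    return internal


def expand(internal: State, batch_param: str, element_ids: list[int]) -> list[State]:
    """Dereferenced state -> one job_internal state per mapped element."""
    return [
        {**internal, batch_param: {"src": "hda", "id": element_id}}
        for element_id in element_ids
    ]


request_state = {"input1": {"src": "hdca", "id": "1f"}, "lines": 10}
internal_state = decode(request_state, toy_decode_id)
# Pretend the collection holds HDAs 101 and 102 (assumed lookup result).
job_states = expand(internal_state, "input1", [101, 102])
print(internal_state)
print(job_states)
```

Each function returns a new dict and leaves its input untouched, mirroring the idea that every representation is a distinct, separately validated state.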

Database Models

ToolRequest

class ToolRequest:
    id: int                        # Primary key
    tool_source_id: int           # FK to ToolSource
    history_id: Optional[int]     # FK to History
    request: dict                 # Serialized RequestInternalToolState
    state: str                    # "new" | "submitted" | "failed"
    state_message: Optional[str]  # Error details if failed

    # Relationships
    tool_source: ToolSource
    history: Optional[History]
    jobs: list[Job]               # Created jobs
    implicit_collections: list[ToolRequestImplicitCollectionAssociation]

ToolRequestState Enum

class ToolRequestState(str, Enum):
    NEW = "new"           # Request created, pending processing
    SUBMITTED = "submitted"  # Jobs created successfully
    FAILED = "failed"     # Processing failed
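
The comments above suggest a simple lifecycle: a request starts as new and ends as either submitted or failed. The following sketch encodes that reading as a transition table. The table and the helpers is_terminal and can_transition are assumptions drawn from those comments, not code from the PR.

```python
from enum import Enum


class ToolRequestState(str, Enum):
    NEW = "new"
    SUBMITTED = "submitted"
    FAILED = "failed"


# Assumed transition table: a request leaves NEW exactly once.
ALLOWED_TRANSITIONS = {
    ToolRequestState.NEW: {ToolRequestState.SUBMITTED, ToolRequestState.FAILED},
    ToolRequestState.SUBMITTED: set(),
    ToolRequestState.FAILED: set(),
}


def is_terminal(state: ToolRequestState) -> bool:
    """A state is terminal when no further transition is allowed."""
    return not ALLOWED_TRANSITIONS[state]


def can_transition(current: ToolRequestState, target: ToolRequestState) -> bool:
    return target in ALLOWED_TRANSITIONS[current]


# The str mixin lets API strings be converted straight into enum members.
print(is_terminal(ToolRequestState("new")))
print(can_transition(ToolRequestState.NEW, ToolRequestState.FAILED))
```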

ToolRequestImplicitCollectionAssociation

Links implicit output collections to their source tool request:

class ToolRequestImplicitCollectionAssociation:
    id: int
    tool_request_id: int
    dataset_collection_id: int
    output_name: str

Celery Task

queue_jobs Task

@galaxy_task(action="queuing up submitted jobs")
def queue_jobs(request: QueueJobs, app: MinimalManagerApp, job_submitter: JobSubmitter):
    tool = cached_create_tool_from_representation(
        app=app,
        raw_tool_source=request.tool_source.raw_tool_source,
        tool_dir=request.tool_source.tool_dir,
        tool_source_class=request.tool_source.tool_source_class,
    )
    job_submitter.queue_jobs(tool, request)

QueueJobs Task Request

class QueueJobs:
    tool_source: ToolSource        # Serialized tool definition
    tool_request_id: int          # Reference to persisted request
    user: RequestUser             # User context for job creation
    use_cached_jobs: bool         # Enable job caching
    rerun_remap_job_id: Optional[int]  # For reruns

JobSubmitter Processing

The JobSubmitter class handles the asynchronous job creation:

class JobSubmitter:
    def queue_jobs(self, tool: Tool, request: QueueJobs) -> None:
        tool_request = self._tool_request(request.tool_request_id)
        request_context = self._context(tool_request, request)
        target_history = request_context.history
        use_cached_jobs = request.use_cached_jobs
        rerun_remap_job_id = request.rerun_remap_job_id

        # 1. Dereference URI inputs to HDAs
        tool_state, new_hdas = self.dereference(request_context, tool, request, tool_request)

        # 2. Materialize the new HDAs, skipping those requested as deferred
        for hda_pair in [p for p in new_hdas if not p.request.deferred]:
            self.hda_manager.materialize(...)

        # 3. Execute tool (creates jobs)
        tool.handle_input_async(
            request_context,
            tool_request,
            tool_state,
            history=target_history,
            use_cached_job=use_cached_jobs,
            rerun_remap_job_id=rerun_remap_job_id,
        )

        # 4. Update request state
        tool_request.state = ToolRequest.states.SUBMITTED
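
The excerpt shows only the success path. The FAILED state and the state_message field imply a failure path as well; one plausible shape is sketched below with stand-in classes. ToyToolRequest, process and the error text are hypothetical, and the real JobSubmitter may handle errors differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyToolRequest:
    """Stand-in for the ToolRequest model; only the state fields are modelled."""

    state: str = "new"
    state_message: Optional[str] = None


def process(tool_request: ToyToolRequest, create_jobs: Callable[[], None]) -> None:
    """Run job creation and record the outcome on the request."""
    try:
        create_jobs()
        tool_request.state = "submitted"
    except Exception as exc:
        # Keep the error text so clients polling the request can see what went wrong.
        tool_request.state = "failed"
        tool_request.state_message = str(exc)


def broken_create_jobs() -> None:
    raise ValueError("dataset 42 is not accessible")


ok = ToyToolRequest()
process(ok, lambda: None)
bad = ToyToolRequest()
process(bad, broken_create_jobs)
print(ok.state, bad.state, bad.state_message)
```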

Strict vs Relaxed Mode

The API supports two validation modes:

Strict Mode (default, strict=True)

  • Full Pydantic validation of inputs
  • No legacy behavior accommodations
  • Cleaner, more predictable validation errors

Relaxed Mode (strict=False)

  • Preserves some legacy behavior for backwards compatibility
  • Examples:
    • Empty string defaults for non-optional text inputs
    • Conversion of explicit None to empty string for non-optional text
    • More lenient conditional/repeat initialization

# Validation mode handling in the API layer
if not strict:
    relaxed_request_state = RelaxedRequestToolState(inputs)
    relaxed_request_state.validate(tool)
    request_state = strictify(relaxed_request_state, tool)
else:
    request_state = RequestToolState(inputs)
    request_state.validate(tool)
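
To show what one of these accommodations could look like in practice, the toy function below applies the two text-input rules from the list above: a missing or explicitly None value for a non-optional text parameter becomes an empty string. toy_strictify_text and its arguments are hypothetical; Galaxy's strictify works on the tool's parameter models rather than a set of names.

```python
from typing import Any


def toy_strictify_text(inputs: dict[str, Any], required_text_params: set[str]) -> dict[str, Any]:
    """Fill in "" for non-optional text parameters that are missing or None."""
    strict = dict(inputs)  # copy so the relaxed state is left unchanged
    for name in required_text_params:
        if strict.get(name) is None:
            strict[name] = ""
    return strict


relaxed = {"label": None, "comment": "keep me"}
strict_state = toy_strictify_text(relaxed, {"label", "title", "comment"})
print(strict_state)
```

In strict mode no such filling happens, so the same request would presumably be rejected with a validation error instead.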

Benefits

  1. Non-Blocking Web Requests - Tool execution no longer blocks web threads; immediate response with tracking ID

  2. Correct REST Semantics - POST /api/jobs creates jobs, leaving POST /api/tools reserved for tool management

  3. Strong Typing Throughout - Pydantic models validate state at each transformation step

  4. Self-Documenting - JSON Schema endpoints describe valid inputs for any tool

  5. Better Error Messages - Validation errors pinpoint exact parameter issues early

  6. Scalable - Job creation distributed across Celery workers

  7. Traceable - ToolRequest provides audit trail linking requests to created jobs

Testing

The PR includes comprehensive testing:

  • test/functional/test_toolbox_pytest.py - Framework tool tests
  • lib/galaxy_test/api/test_tool_execute.py - Existing tests adapted
  • lib/galaxy_test/api/test_tool_execution.py - New async API tests

The test matrix covers both the legacy and the new API paths, selected via the GALAXY_TEST_USE_LEGACY_TOOL_API environment variable (values: if_needed | always).

Migration Path

The legacy POST /api/tools endpoint remains functional. Applications can migrate to POST /api/jobs incrementally:

  1. Update client to handle async response pattern
  2. Poll /api/tool_requests/{id}/state until the request reaches submitted or failed
  3. Retrieve job IDs from /api/tool_requests/{id}
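
The polling step can be sketched as a small loop. To keep the example runnable without a server, the HTTP call is injected as fetch_state; in a real client it would issue GET /api/tool_requests/{id}/state, and this sketch assumes that call yields the state as a plain string. The function name, the defaults and the fake responses are all hypothetical.

```python
import time
from typing import Callable


def wait_for_tool_request(
    fetch_state: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the tool request is submitted or failed, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in ("submitted", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"tool request still in state {state!r}")
        sleep(poll_interval)


# Fake server responses standing in for successive GET requests.
responses = iter(["new", "new", "submitted"])
final_state = wait_for_tool_request(lambda: next(responses), sleep=lambda _: None)
print(final_state)
```

Note that submitted only means job creation has finished; the jobs themselves may still be queued or running and are tracked separately.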

Future Work

As noted in the PR, this forms the backend foundation for:

  • Workflow transformation using these state models
  • Tool form adaptation to use the new API
  • Enhanced linting using the Pydantic models
