YAML Tool Runtime State Representation

Overview

This document describes how YAML-defined tools (User-Defined Tools and Admin Tools) convert tool state into a runtime representation suitable for building command lines. The runtime representation uses a CWL-like format where dataset references are transformed into file objects with path, format, and metadata.
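As a rough illustration of the target format, the snippet below contrasts an internal dataset reference with the CWL-like file object it becomes at runtime. The field names mirror DataInternalJson as described later in this document; every value (ID, path, size) is made up for the example.

```python
# Internal reference to a dataset, as stored in job-internal state.
internal_reference = {"src": "hda", "id": 42}

# Illustrative runtime representation of the same dataset.
# All values here are invented for the example.
runtime_file = {
    "class": "File",
    "location": "step_input://0",
    "format": "txt",
    "path": "/galaxy/files/dataset_42.dat",
    "basename": "sample.txt",
    "nameroot": "sample",
    "nameext": ".txt",
    "size": 1024,
    "listing": [],
}
```

A command template or JavaScript expression can then read values such as the path or format directly from the file object instead of querying a Galaxy model object.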

Current Architecture

Two Paths to Runtime State

The UserToolEvaluator.build_param_dict() method in lib/galaxy/tools/evaluation.py now supports two paths for building the CWL-style inputs dictionary:

New Path (via runtimeify): Uses validated JobInternalToolState persisted with the job
Legacy Path (via to_cwl): Falls back to the workflow module's to_cwl function

# From lib/galaxy/tools/evaluation.py (UserToolEvaluator.build_param_dict)
if validated_tool_state is not None:
    from galaxy.tool_util.parameters.convert import runtimeify
    from galaxy.tools.runtime import setup_for_runtimeify

    hda_references, adapt_datasets = setup_for_runtimeify(self.app, compute_environment, input_datasets)
    job_runtime_state = runtimeify(validated_tool_state, self.tool, adapt_datasets)
    cwl_style_inputs = job_runtime_state.input_state
else:
    from galaxy.workflow.modules import to_cwl

    log.info(
        "Building CWL style inputs using deprecated to_cwl function - tool may work differently in the future."
    )
    hda_references = []
    cwl_style_inputs = to_cwl(incoming, hda_references=hda_references, compute_environment=compute_environment)

The `to_cwl` Shortcut

Location

lib/galaxy/workflow/modules.py - function to_cwl()

Purpose

The to_cwl function was originally designed for workflow execution, transforming Galaxy model objects into CWL-compatible representations. It was repurposed as a “shortcut” for YAML tools because:

It recursively converts HDAs, HDCAs, and collections to file/directory objects
It handles nested tool state (conditionals, repeats)
It already produced the exact format needed for JavaScript expression evaluation

How It Works

def to_cwl(value, hda_references, step=None, compute_environment=None):
    if isinstance(value, model.HistoryDatasetAssociation):
        hda_references.append(value)
        properties = {
            "class": "File",
            "location": f"step_input://{len(hda_references)}",
            "format": value.extension,
            "path": compute_environment.input_path_rewrite(value) if compute_environment else value.get_file_name(),
        }
        set_basename_and_derived_properties(properties, value.dataset.created_from_basename or value.name)
        return properties
    elif isinstance(value, model.DatasetCollection):
        # Handle collections recursively (body elided in this excerpt)
        ...
    elif isinstance(value, dict):
        # Recurse into nested state (remaining arguments elided in this excerpt)
        return {k: to_cwl(v, ...) for k, v in value.items()}
    # ... remaining cases elided
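The recursion described above can be tried in isolation with a small stand-in. This is a minimal sketch, not the Galaxy implementation: FakeHDA and to_cwl_sketch are hypothetical names, collection handling and path rewriting are left out, and the list branch is an assumption about how nested lists would be treated.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FakeHDA:
    """Hypothetical stand-in for model.HistoryDatasetAssociation."""
    name: str
    extension: str
    file_name: str


def to_cwl_sketch(value: Any, hda_references: List[FakeHDA]) -> Any:
    """Untyped recursive conversion: every dict and list is walked blindly."""
    if isinstance(value, FakeHDA):
        hda_references.append(value)
        return {
            "class": "File",
            "location": f"step_input://{len(hda_references)}",
            "format": value.extension,
            "path": value.file_name,
        }
    if isinstance(value, dict):
        return {k: to_cwl_sketch(v, hda_references) for k, v in value.items()}
    if isinstance(value, list):
        # Assumption for this sketch: lists are converted element by element.
        return [to_cwl_sketch(v, hda_references) for v in value]
    return value


refs: List[FakeHDA] = []
state = {
    "input1": FakeHDA("reads.fastq", "fastq", "/data/1.dat"),
    "cond": {"mode": "paired", "mate": FakeHDA("mates.fastq", "fastq", "/data/2.dat")},
    "threshold": 0.5,
}
converted = to_cwl_sketch(state, refs)
```

The function never consults a parameter model: it decides what to convert purely from the Python type of each value, which is the behavior the limitations below refer to.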

Limitations of `to_cwl`

No Type Information: Recursively processes all dict values without understanding the tool parameter model
Model Object Dependency: Requires actual Galaxy model objects (HDAs, HDCAs) at evaluation time
Workflow-Specific Logic: Contains workflow-related checks (step readiness, dataset state) that aren’t relevant for tool execution
No Validation: No validation against the tool’s parameter model

The New Approach: `runtimeify`

Location

lib/galaxy/tool_util/parameters/convert.py - function runtimeify()

Key Concept

The runtimeify function transforms a validated JobInternalToolState into a JobRuntimeToolState. This is a model-aware transformation that:

Takes strongly-typed internal state (with dataset IDs already decoded)
Uses the tool’s parameter model to identify data parameters
Transforms data references into CWL-style file objects

Implementation

def runtimeify(
    internal_state: JobInternalToolState,
    input_models: ToolParameterBundle,
    adapt_dataset: DatasetToRuntimeJson,
) -> JobRuntimeToolState:

    def adapt_dict(value: dict):
        data_request_internal_hda = DataRequestInternalHda(**value)
        as_json = adapt_dataset(data_request_internal_hda).model_dump()
        as_json["class"] = as_json.pop("class_")  # Pydantic alias handling
        return as_json

    def to_runtime_callback(parameter: ToolParameterT, value: Any):
        if isinstance(parameter, DataParameterModel):
            if parameter.multiple and isinstance(value, list):
                return list(map(adapt_dict, value))
            else:
                return adapt_dict(value)
        elif isinstance(parameter, DataCollectionParameterModel):
            raise NotImplementedError("DataCollectionParameterModel runtime adaptation not implemented yet.")
        else:
            return VISITOR_NO_REPLACEMENT

    runtime_state_dict = visit_input_values(input_models, internal_state, to_runtime_callback)
    runtime_state = JobRuntimeToolState(runtime_state_dict)
    runtime_state.validate(input_models)
    return runtime_state
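To show the model-aware idea without Galaxy's classes, here is a simplified sketch under stated assumptions. ParamSketch, visit_flat, and adapt_reference are hypothetical stand-ins for the parameter models, visit_input_values, and the dataset adapter; the visitor is flat and ignores conditionals and repeats.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Sentinel meaning "keep the original value" (stand-in for VISITOR_NO_REPLACEMENT).
NO_REPLACEMENT = object()


@dataclass
class ParamSketch:
    """Hypothetical, heavily simplified parameter model."""
    name: str
    is_data: bool = False
    multiple: bool = False


def visit_flat(
    params: List[ParamSketch],
    state: Dict[str, Any],
    callback: Callable[[ParamSketch, Any], Any],
) -> Dict[str, Any]:
    """Flat stand-in for visit_input_values: one callback call per parameter."""
    result = {}
    for param in params:
        value = state[param.name]
        replacement = callback(param, value)
        result[param.name] = value if replacement is NO_REPLACEMENT else replacement
    return result


def adapt_reference(reference: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical adapter: turns {src, id} into a minimal File object."""
    return {"class": "File", "path": f"/data/dataset_{reference['id']}.dat"}


def to_runtime(param: ParamSketch, value: Any) -> Any:
    # Only parameters declared as data inputs are converted.
    if param.is_data:
        if param.multiple and isinstance(value, list):
            return [adapt_reference(v) for v in value]
        return adapt_reference(value)
    return NO_REPLACEMENT


params = [
    ParamSketch("input1", is_data=True),
    ParamSketch("extras", is_data=True, multiple=True),
    ParamSketch("label"),
]
internal_state = {
    "input1": {"src": "hda", "id": 1},
    "extras": [{"src": "hda", "id": 2}, {"src": "hda", "id": 3}],
    "label": "sample A",
}
runtime_state = visit_flat(params, internal_state, to_runtime)
```

Unlike the untyped recursion in to_cwl, the decision to convert a value is driven by the parameter's declared type, so non-data values pass through unchanged whatever their shape.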

Support Infrastructure

`lib/galaxy/tools/runtime.py`

def setup_for_runtimeify(app, compute_environment, input_datasets):
    # Collects every HDA resolved by adapt_dataset; returned to the caller below.
    hda_references = []
    # Map dataset ID -> (HDA, positional index among the job's input datasets).
    hdas_by_id = {d.id: (d, i) for (i, d) in enumerate(input_datasets.values()) if d is not None}

    def adapt_dataset(value: DataRequestInternalDereferencedT) -> DataInternalJson:
        hda, index = hdas_by_id[value.id]
        hda_references.append(hda)
        properties = {
            "class": "File",
            "location": f"step_input://{index}",
            "format": hda.extension,
            "path": compute_environment.input_path_rewrite(hda) if compute_environment else hda.get_file_name(),
            "size": int(hda.dataset.get_size()),
            "listing": [],
        }
        # Adds basename, nameroot, and nameext derived from the dataset name.
        set_basename_and_derived_properties(properties, hda.dataset.created_from_basename or hda.name)
        return DataInternalJson(**properties)

    return hda_references, adapt_dataset
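The helper set_basename_and_derived_properties is not shown in this document. The sketch below is an assumption about its behavior, based on how CWL defines basename, nameroot, and nameext; the function name derive_name_properties is hypothetical.

```python
import os
from typing import Any, Dict


def derive_name_properties(properties: Dict[str, Any], basename: str) -> Dict[str, Any]:
    """Assumed behavior: split the basename into root and final extension."""
    properties["basename"] = basename
    # os.path.splitext splits on the last dot only, so nameroot + nameext == basename.
    properties["nameroot"], properties["nameext"] = os.path.splitext(basename)
    return properties


props = derive_name_properties({"class": "File"}, "reads.fastq.gz")
```

With this splitting rule only the final suffix counts as the extension, so a name such as reads.fastq.gz yields a nameroot of reads.fastq and a nameext of .gz.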

State Classes Involved

`JobInternalToolState`

Representation: "job_internal"
Data References: {src: "hda", id: <decoded_int>}
Purpose: Internal state after decoding, dereferencing, and expansion - per-job state

`JobRuntimeToolState`

Representation: "job_runtime"
Data References: DataInternalJson (CWL-style File objects)
Purpose: Runtime state suitable for JavaScript expression evaluation

`DataInternalJson`

class DataInternalJson(StrictModel):
    class_: Literal["File"]
    basename: str
    location: str
    path: str                # Absolute path to file
    listing: Optional[List[str]]
    nameroot: Optional[str]
    nameext: Optional[str]
    format: str              # Galaxy extension (txt, bam, etc.)
    checksum: Optional[str]
    size: int
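The field is named class_ because class is a reserved word in Python, which is why runtimeify renames the key after dumping the model. The following sketch shows that renaming with a plain dataclass instead of Pydantic; FileSketch and to_runtime_json are hypothetical names, and only a subset of the fields is included.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FileSketch:
    """Hypothetical, reduced stand-in for DataInternalJson."""
    class_: str  # "class" itself cannot be used as a Python attribute name
    basename: str
    location: str
    path: str
    format: str
    size: int
    listing: Optional[List[str]] = field(default_factory=list)


def to_runtime_json(model: FileSketch) -> Dict[str, Any]:
    """Dump the model and restore the CWL key name 'class'."""
    as_json = asdict(model)
    as_json["class"] = as_json.pop("class_")
    return as_json


file_json = to_runtime_json(
    FileSketch(
        class_="File",
        basename="sample.txt",
        location="step_input://0",
        path="/data/dataset_1.dat",
        format="txt",
        size=1024,
    )
)
```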

How Job Tool State Gets Persisted

Commit 5ad27b8ca8fe759e2f6ad7cec5670c07374ca1c7 (“Persist validated job tool state”) added:

New tool_state column on Job model: Stores JobInternalToolState.input_state as JSON
ToolSource.source_class column: Stores the tool source class name for reconstruction

# From lib/galaxy/tools/execute.py
if execution_slice.validated_param_combination:
    tool_state = execution_slice.validated_param_combination.input_state
    job.tool_state = tool_state

The validated_param_combination flows through:

Tool Request API creates JobInternalToolState via state transformations
MappingParameters carries validated state through expansion
ExecutionSlice receives validated state for each job
State persisted to job.tool_state at job creation
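Because the internal state holds plain IDs rather than model objects, it can be stored as JSON and read back unchanged at evaluation time. The sketch below illustrates that round trip only. JobSketch is a hypothetical stand-in for the Job model, and the explicit json.dumps call is a simplification of whatever serialization the database column type performs.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSketch:
    """Hypothetical stand-in for the Job model; only the new column is shown."""
    tool_state: Optional[str] = None  # JSON text, as a database column might hold it


# Validated per-job state: dataset references are already decoded integers.
validated_input_state = {"input1": {"src": "hda", "id": 42}, "threshold": 0.5}

# At job creation: persist the state with the job.
job = JobSketch()
job.tool_state = json.dumps(validated_input_state)

# At job evaluation: load the state back for the runtime conversion.
restored_state = json.loads(job.tool_state)
```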

State Transformation Flow

RequestToolState (API)
        |
        | decode()
        v
RequestInternalToolState (persisted in ToolRequest)
        |
        | dereference() - URI inputs -> HDA references
        v
RequestInternalDereferencedToolState
        |
        | expand() - collection mapping
        v
JobInternalToolState (persisted in job.tool_state)
        |
        | runtimeify() - at job evaluation time
        v
JobRuntimeToolState (used for command building)
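The first three arrows of the flow can be sketched over plain dictionaries. This is an illustration under several assumptions, not Galaxy's implementation: the hexadecimal ID decoder, the materialize callback, and the rule that any hdca reference is mapped over are all invented for the example, and the real request syntax for batch inputs differs.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def decode(state: State, decode_id: Callable[[str], int]) -> State:
    """Replace encoded string IDs with integer IDs."""
    return {
        k: {**v, "id": decode_id(v["id"])} if isinstance(v, dict) and "id" in v else v
        for k, v in state.items()
    }


def dereference(state: State, materialize: Callable[[str], int]) -> State:
    """Replace URI inputs with references to (hypothetically) created HDAs."""
    return {
        k: {"src": "hda", "id": materialize(v["url"])}
        if isinstance(v, dict) and v.get("src") == "url"
        else v
        for k, v in state.items()
    }


def expand(state: State, collections: Dict[int, List[int]]) -> List[State]:
    """Produce one job state per element of the first mapped-over collection."""
    for key, value in state.items():
        if isinstance(value, dict) and value.get("src") == "hdca":
            return [{**state, key: {"src": "hda", "id": i}} for i in collections[value["id"]]]
    return [state]


request_state = {
    "input1": {"src": "hda", "id": "2a"},
    "reference": {"src": "url", "url": "https://example.org/ref.fa"},
    "mapped": {"src": "hdca", "id": "0f"},
}
internal = decode(request_state, lambda encoded: int(encoded, 16))
dereferenced = dereference(internal, lambda url: 99)
job_states = expand(dereferenced, {15: [7, 8]})
```

Each element of job_states plays the role of a JobInternalToolState: it refers only to decoded HDA IDs, which is the input the runtimeify step expects.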

Current Gaps and Future Work

Not Yet Implemented

Collection inputs: DataCollectionParameterModel runtime adaptation raises NotImplementedError
Nested collections: More complex collection types need handling in runtimeify

Path Forward

The goal is to:

Fully implement runtimeify to handle all parameter types
Add comprehensive testing of the state transformation pipeline
Eventually deprecate the to_cwl fallback path
Use the validated state for more than just YAML tools (command-line construction, provenance, etc.)

Relevant Code Locations

Component	Location
`runtimeify`	`lib/galaxy/tool_util/parameters/convert.py`
`setup_for_runtimeify`	`lib/galaxy/tools/runtime.py`
`to_cwl` (legacy)	`lib/galaxy/workflow/modules.py`
`UserToolEvaluator`	`lib/galaxy/tools/evaluation.py`
`JobRuntimeToolState`	`lib/galaxy/tool_util/parameters/state.py`
`DataInternalJson`	`lib/galaxy/tool_util_models/parameters.py`
State persistence	`lib/galaxy/tools/execute.py`
`job_runtime` model factory	`lib/galaxy/tool_util_models/parameters.py`

Testing

The state conversion is tested via:

Tool test cases that exercise the new API path
Unit tests in test/unit/tool_util/test_parameter_test_cases.py
Integration tests that verify tool execution through the Jobs API

The GALAXY_TEST_USE_LEGACY_TOOL_API environment variable controls whether tests use the legacy POST /api/tools endpoint or the new POST /api/jobs endpoint.
