PR #18758: More Typing, Docs, and Decomposition Around Tool Execution
- PR: https://github.com/galaxyproject/galaxy/pull/18758
- Title: More typing, docs, and decomposition around tool execution
- Status: Merged
Overview
PR #18758 introduced structured type annotations for the tool state lifecycle and decomposed monolithic tool execution methods into focused, well-typed functions. This PR is foundational to the structured tool state work — it creates the vocabulary of type aliases that describe how tool state transforms through the execution pipeline.
Key Changes
1. lib/galaxy/tools/_types.py — Tool State Type Aliases (NEW FILE)
Created a new module defining type aliases for each stage of tool state transformation. While all are Dict[str, Any] at runtime, the type aliases serve as documentation markers describing what processing has occurred.
Type Lifecycle Table (from the module docstring):
| Type | State For | Object References | Validated? |
|---|---|---|---|
| ToolRequestT | request | src dicts of encoded ids | no |
| ToolStateJobInstanceT | a job | src dicts of encoded ids | no |
| ToolStateJobInstancePopulatedT | a job | model objs loaded from db | check_param |
| ToolStateDumpedToJsonT | a job | src dicts of encoded ids (normalized) | yes |
| ToolStateDumpedToJsonInternalT | a job | src dicts of decoded ids (normalized) | yes |
| ToolStateDumpedToStringsT | a job | src dicts dumped to strs (normalized) | yes |
| ParameterValidationErrorsT | errors | nested dict of str/Exception | n/a |
| InputFormatT | format flag | Literal["legacy", "21.01"] | n/a |
Current location: lib/galaxy/tools/_types.py (69 lines)
Key insight: The lifecycle is ToolRequestT → (expand) → ToolStateJobInstanceT → (populate/check_param) → ToolStateJobInstancePopulatedT → (dump) → ToolStateDumpedToJson*T / ToolStateDumpedToStringsT
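To make the lifecycle concrete, here is a minimal sketch of how aliases that are all Dict[str, Any] at runtime can still label each stage in function signatures. Only the alias names come from the PR. FakeDataset, expand(), populate(), dump(), and the shape of the request are hypothetical stand-ins, not Galaxy code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Identical at runtime; the names only record which stage a dict is in.
ToolRequestT = Dict[str, Any]
ToolStateJobInstanceT = Dict[str, Any]
ToolStateJobInstancePopulatedT = Dict[str, Any]
ToolStateDumpedToJsonT = Dict[str, Any]


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a model object loaded from the database."""

    encoded_id: str


def expand(request: ToolRequestT) -> List[ToolStateJobInstanceT]:
    """Toy expansion: one job state per entry in the request's 'n' list."""
    return [{**request, "n": n} for n in request["n"]]


def populate(
    state: ToolStateJobInstanceT, db: Dict[str, FakeDataset]
) -> ToolStateJobInstancePopulatedT:
    """Toy population: replace the src dict with the object it points at."""
    return {**state, "input": db[state["input"]["id"]]}


def dump(state: ToolStateJobInstancePopulatedT) -> ToolStateDumpedToJsonT:
    """Toy dump: turn the object back into a src dict of encoded ids."""
    return {**state, "input": {"src": "hda", "id": state["input"].encoded_id}}


db = {"abc": FakeDataset("abc")}
request: ToolRequestT = {"input": {"src": "hda", "id": "abc"}, "n": [1, 2]}
populated = [populate(state, db) for state in expand(request)]
dumped = [dump(state) for state in populated]
```

A type checker treats all four aliases as interchangeable, so the benefit is for the reader: a signature states which stage it expects and which stage it returns.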
2. lib/galaxy/tools/__init__.py — expand_incoming() Decomposition
The monolithic expand_incoming() method (~40 lines of inline logic) was decomposed into focused methods:
Before (single method):
def expand_incoming(self, trans, incoming, request_context, input_format="legacy"):
    # inline: decode rerun_remap_job_id
    # inline: expand meta parameters
    # inline: validate expansion
    # inline: loop over expanded, populate each
    ...
After (decomposed):
a) expand_incoming() — orchestrator, typed signature:
def expand_incoming(
    self, request_context: WorkRequestContext, incoming: ToolRequestT, input_format: InputFormatT = "legacy"
) -> Tuple[List[ToolStateJobInstancePopulatedT], List[ParameterValidationErrorsT], Optional[int], Optional[MatchingCollections]]
Note: trans parameter removed — now uses request_context directly.
b) _rerun_remap_job_id() — module-level function extracted:
def _rerun_remap_job_id(trans, incoming, tool_id: Optional[str]) -> Optional[int]
c) _ensure_expansion_is_valid() — validation guard:
def _ensure_expansion_is_valid(self, expanded_incomings: List[ToolStateJobInstanceT], rerun_remap_job_id: Optional[int]) -> None
d) _populate() — per-job parameter population:
def _populate(self, request_context, expanded_incoming: ToolStateJobInstanceT, input_format: InputFormatT) -> Tuple[ToolStateJobInstancePopulatedT, ParameterValidationErrorsT]
e) completed_jobs() — job caching lookup extracted from handle_input():
def completed_jobs(self, trans, use_cached_job: bool, all_params: List[ToolStateJobInstancePopulatedT]) -> Dict[int, Optional[model.Job]]
The same lookup loop was previously duplicated in workflow/modules.py; that module now calls this method, which removes the duplication.
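The shape of the decomposition can be sketched with toy functions: an orchestrator that calls small single-purpose steps in order. The function names loosely mirror the ones listed above, but every body here is a hypothetical illustration. In particular, the two checks in ensure_expansion_is_valid() are assumptions about what such a guard might enforce, not a copy of Galaxy's logic.

```python
from typing import Any, Dict, List, Optional, Tuple

State = Dict[str, Any]
Errors = Dict[str, str]


def rerun_remap_job_id(incoming: State) -> Optional[int]:
    """Toy version: the id is assumed to be a plain int already."""
    return incoming.get("rerun_remap_job_id")


def expand(incoming: State) -> List[State]:
    """Toy expansion: one state per entry of the 'batch' list."""
    rest = {
        key: value
        for key, value in incoming.items()
        if key not in ("batch", "rerun_remap_job_id")
    }
    return [{**rest, "value": value} for value in incoming.get("batch", [])]


def ensure_expansion_is_valid(expanded: List[State], remap_id: Optional[int]) -> None:
    """Assumed checks: at least one job, and exactly one when remapping."""
    if not expanded:
        raise ValueError("nothing to run")
    if remap_id is not None and len(expanded) > 1:
        raise ValueError("cannot remap one job onto several jobs")


def populate(state: State) -> Tuple[State, Errors]:
    """Toy per-job validation: 'value' must be an int."""
    errors: Errors = {}
    if not isinstance(state["value"], int):
        errors["value"] = "an integer is required"
    return state, errors


def expand_incoming(incoming: State) -> Tuple[List[State], List[Errors], Optional[int]]:
    """Orchestrator: each step is a small, separately testable function."""
    remap_id = rerun_remap_job_id(incoming)
    expanded = expand(incoming)
    ensure_expansion_is_valid(expanded, remap_id)
    all_params: List[State] = []
    all_errors: List[Errors] = []
    for state in expanded:
        params, errors = populate(state)
        all_params.append(params)
        all_errors.append(errors)
    return all_params, all_errors, remap_id


params, errors, remap_id = expand_incoming({"batch": [1, "x"]})
```

Because each step has its own signature, a variant of one step (for example an async populate) can be swapped in without touching the others.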
3. lib/galaxy/tools/execute.py — Execution Framework Typing
a) MappingParameters NamedTuple — typed fields:
class MappingParameters(NamedTuple):
    param_template: ToolRequestT  # was ToolParameterRequestT
    param_combinations: List[ToolStateJobInstancePopulatedT]  # was ToolParameterRequestInstanceT
Renamed from the old ToolParameterRequestT/ToolParameterRequestInstanceT aliases (which were deleted from execute.py and moved to _types.py with new names).
b) ExecutionSlice — typed param_combination:
param_combination: ToolStateJobInstancePopulatedT # was ToolParameterRequestInstanceT
c) ExecutionTracker — class-level attribute type annotations added:
execution_errors: List[ExecutionErrorsT]
successful_jobs: List[model.Job]
output_datasets: List[Tuple[str, model.HistoryDatasetAssociation]]
output_collections: List[Tuple[str, model.HistoryDatasetCollectionAssociation]]
implicit_collections: Dict[str, model.HistoryDatasetCollectionAssociation]
d) ExecutionErrorsT — new type alias:
ExecutionErrorsT = Union[str, Exception]
e) Null safety for collection_info — throughout ExecutionTracker, self.collection_info accesses were guarded with assert collection_info or if collection_info is not None checks, replacing direct attribute access on potentially-None objects.
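The two guard styles mentioned in (e) can be illustrated with a small sketch. CollectionInfo and Tracker are hypothetical stand-ins and the method names are invented for the example; only the pattern (copy the attribute to a local, then assert or branch on it) reflects the description above.

```python
from typing import List, Optional


class CollectionInfo:
    """Hypothetical stand-in for the mapping metadata object."""

    def __init__(self, names: List[str]) -> None:
        self.names = names


class Tracker:
    """Hypothetical tracker whose collection_info may be None."""

    def __init__(self, collection_info: Optional[CollectionInfo]) -> None:
        self.collection_info = collection_info

    def element_names(self) -> List[str]:
        # Style 1: the caller guarantees a mapping, so assert and narrow.
        collection_info = self.collection_info
        assert collection_info
        return collection_info.names

    def job_count(self) -> int:
        # Style 2: both cases are legitimate, so branch explicitly.
        collection_info = self.collection_info
        if collection_info is not None:
            return len(collection_info.names)
        return 1


unmapped = Tracker(None)
mapped = Tracker(CollectionInfo(["a", "b"]))
```

Binding the attribute to a local first matters for type checkers, which generally narrow a local variable more reliably than a repeated self.attribute access.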
4. lib/galaxy/tools/actions/ — Typed Action execute() Methods
All ToolAction subclass execute() methods had their incoming parameter retyped:
- Before: incoming: Optional[ToolParameterRequestInstanceT]
- After: incoming: Optional[ToolStateJobInstancePopulatedT]
Affected files:
- actions/__init__.py: ToolAction (abstract), DefaultToolAction
- actions/data_manager.py: DataManagerToolAction
- actions/history_imp_exp.py: ImportHistoryToolAction, ExportHistoryToolAction
- actions/metadata.py: SetMetadataToolAction
- actions/model_operations.py: ModelOperationToolAction
- actions/upload.py: UploadToolAction
Also added get_output_name() as an @abstractmethod on ToolAction.
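As a rough illustration of what marking a method abstract buys, the sketch below uses deliberately simplified, hypothetical signatures (the real execute() takes many more arguments). It also assumes the base class derives from abc.ABC, which is what makes Python enforce the decorator at instantiation time.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

ToolStateJobInstancePopulatedT = Dict[str, Any]


class SketchToolAction(ABC):
    """Hypothetical, heavily simplified base class."""

    @abstractmethod
    def execute(self, incoming: Optional[ToolStateJobInstancePopulatedT] = None) -> str:
        """Run the action for one job's populated state."""

    @abstractmethod
    def get_output_name(self, label: str) -> str:
        """Name the output produced for a given label."""


class SketchUploadAction(SketchToolAction):
    """Hypothetical subclass that implements both abstract methods."""

    def execute(self, incoming: Optional[ToolStateJobInstancePopulatedT] = None) -> str:
        return f"executed with {len(incoming or {})} parameter(s)"

    def get_output_name(self, label: str) -> str:
        return f"upload: {label}"


action = SketchUploadAction()
```

A subclass that omits get_output_name() cannot be instantiated under these assumptions, so the omission surfaces immediately rather than at the first call.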
5. lib/galaxy/tools/parameters/__init__.py — New Functions and Typed Signatures
a) ToolInputsT — new type alias:
ToolInputsT = Dict[str, Union[Group, ToolParameter]]
b) params_to_json_internal() — new convenience function:
def params_to_json_internal(params: ToolInputsT, param_values: ToolStateJobInstancePopulatedT, app) -> ToolStateDumpedToJsonInternalT
Wraps params_to_strings() with nested=True, use_security=False → decoded IDs.
c) params_to_json() — new convenience function:
def params_to_json(params: ToolInputsT, param_values: ToolStateJobInstancePopulatedT, app) -> ToolStateDumpedToJsonT
Wraps params_to_strings() with nested=True, use_security=True → encoded IDs.
d) params_to_strings() — enhanced signature and docs:
def params_to_strings(
    params: ToolInputsT, param_values: ToolStateJobInstancePopulatedT, app, nested=False, use_security=False
) -> Union[ToolStateDumpedToJsonT, ToolStateDumpedToJsonInternalT, ToolStateDumpedToStringsT]
e) populate_state() — typed parameters:
def populate_state(
    request_context, inputs: ToolInputsT, incoming: ToolStateJobInstanceT,
    state: ToolStateJobInstancePopulatedT, errors: Optional[ParameterValidationErrorsT] = None,
    ..., input_format: InputFormatT = "legacy"
)
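The relationship between params_to_strings() and the two JSON wrappers described in (b) through (d) can be sketched as one flag-driven function plus two thin wrappers that pin the flags. This is a toy model: the params and app arguments are dropped, encode_id() is a hypothetical stand-in for Galaxy's id encoding, and values are plain dicts rather than model objects.

```python
import json
from typing import Any, Dict


def encode_id(raw_id: int) -> str:
    """Hypothetical stand-in for real id encoding."""
    return f"enc{raw_id:04d}"


def params_to_strings(
    param_values: Dict[str, Any], nested: bool = False, use_security: bool = False
) -> Dict[str, Any]:
    """Toy dump: optionally encode ids, optionally keep values nested."""
    dumped: Dict[str, Any] = {}
    for name, value in param_values.items():
        if isinstance(value, dict) and "id" in value:
            ident = encode_id(value["id"]) if use_security else value["id"]
            value = {**value, "id": ident}
        # nested=True keeps JSON-like structures; otherwise dump to a string.
        dumped[name] = value if nested else json.dumps(value)
    return dumped


def params_to_json_internal(param_values: Dict[str, Any]) -> Dict[str, Any]:
    """Nested structure with decoded (raw) ids."""
    return params_to_strings(param_values, nested=True, use_security=False)


def params_to_json(param_values: Dict[str, Any]) -> Dict[str, Any]:
    """Nested structure with encoded ids."""
    return params_to_strings(param_values, nested=True, use_security=True)


state = {"input": {"src": "hda", "id": 7}, "threshold": 5}
internal = params_to_json_internal(state)
external = params_to_json(state)
as_strings = params_to_strings(state)
```

Giving each flag combination its own named function lets each one declare a single precise return alias instead of the three-way Union on params_to_strings().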
6. lib/galaxy/tools/parameters/grouping.py — Constructor Refactoring
All Group subclasses changed to require name in constructor:
Before:
group = Repeat()
group.name = "r"
After:
group = Repeat("r")
Affected classes: Group, Repeat, Section, UploadDataset, Conditional
Also added class-level type annotations:
- Group.name: str
- Repeat.inputs: ToolInputsT, Repeat.min: int, Repeat.max: float
- Section.inputs: ToolInputsT
- UploadDataset.inputs: ToolInputsT
- Conditional.cases: List[ConditionalWhen], Conditional.value_ref: Optional[str]
Repeat.min defaults to 0 and Repeat.max defaults to inf (from math.inf), replacing None.
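A minimal sketch of the constructor change and the new defaults follows. The classes are stripped-down stand-ins, and accepts() is a hypothetical helper added only to show why numeric defaults are more convenient than None.

```python
import math
from typing import Dict


class Group:
    """Stand-in base class: the name is now a required argument."""

    name: str

    def __init__(self, name: str) -> None:
        self.name = name


class Repeat(Group):
    """Stand-in Repeat with the defaults described above."""

    inputs: Dict[str, Group]
    min: int
    max: float  # float because math.inf is a float

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.inputs = {}
        self.min = 0
        self.max = math.inf

    def accepts(self, count: int) -> bool:
        """Hypothetical helper: no None checks needed for the bounds."""
        return self.min <= count <= self.max


repeat = Repeat("r")
```

With None as the default, every comparison against min or max would need a guard; with 0 and math.inf the comparison works unconditionally, and name can be annotated as str rather than Optional[str].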
7. Other Typed Improvements
a) lib/galaxy/tools/parameters/meta.py:
- ExpandedT = Tuple[List[ToolStateJobInstanceT], Optional[matching.MatchingCollections]]
- expand_meta_parameters(trans, tool, incoming: ToolRequestT) -> ExpandedT
b) lib/galaxy/managers/jobs.py:
- by_tool_input() method typed with ToolStateJobInstancePopulatedT and ToolStateDumpedToJsonInternalT
- New type aliases: JobStateT = str, JobStatesT = Union[JobStateT, List[JobStateT]]
c) lib/galaxy/webapps/galaxy/api/jobs.py:
- search() endpoint updated to use proxy_work_context_for_history() instead of constructing WorkRequestContext directly
- expand_incoming() call site updated for new signature (no trans param)
d) lib/galaxy/work/context.py:
- proxy_work_context_for_history() now has an explicit -> WorkRequestContext return type
e) lib/galaxy/workflow/modules.py:
- Duplicated completed_jobs loop replaced with tool.completed_jobs(trans, use_cached_job, param_combinations)
f) lib/galaxy/tools/parameters/basic.py:
- ToolParameter.name: str class-level annotation added
g) test/unit/app/tools/test_evaluation.py:
- Test code updated for new Group constructor signatures
Current Codebase State (Post-PR Evolution)
Cross-referencing PR #18758 with the current structured_tool_state branch:
All PR Changes Intact
Every change from PR #18758 is present in the current codebase at the expected locations.
Significant Evolution Since PR
1. Async Variants Added:
- expand_incoming_async(): async version of expand_incoming() in Tool class
- _populate_async(): async version of _populate() in Tool class
- populate_state_async(): async version in parameters module
- These support the tool request/tasks API for async tool execution
2. MappingParameters Enhanced:
class MappingParameters(NamedTuple):
    param_template: ToolRequestT
    param_combinations: List[ToolStateJobInstancePopulatedT]
    validated_param_template: Optional[RequestInternalDereferencedToolState] = None
    validated_param_combinations: Optional[List[JobInternalToolState]] = None

    def ensure_validated(self): ...
Added optional schema-validated state fields for the structured tool state execution path.
3. ExecutionSlice Enhanced:
- Added validated_param_combination: Optional[JobInternalToolState] field
- Supports both legacy and schema-validated execution modes
4. _ensure_expansion_is_valid() Updated:
def _ensure_expansion_is_valid(
    self,
    expanded_incomings: Union[List[JobInternalToolState], List[ToolStateJobInstanceT]],
    rerun_remap_job_id: Optional[int],
) -> None
Union type now includes JobInternalToolState for schema-validated paths.
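The dual-carrier idea behind the enhanced MappingParameters (item 2) can be sketched with a NamedTuple whose validated fields default to None. ValidatedState is a hypothetical placeholder for the schema-validated state classes. The body of ensure_validated() is elided in these notes, so the check below is an assumption about what such a method might do, not its actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, NamedTuple, Optional


@dataclass
class ValidatedState:
    """Hypothetical placeholder for a schema-validated state object."""

    data: Dict[str, Any]


class MappingParametersSketch(NamedTuple):
    param_template: Dict[str, Any]
    param_combinations: List[Dict[str, Any]]
    # Optional with defaults, so legacy callers need not supply them.
    validated_param_template: Optional[ValidatedState] = None
    validated_param_combinations: Optional[List[ValidatedState]] = None

    def ensure_validated(self) -> None:
        """Assumed behavior: fail fast if validated state is missing."""
        if self.validated_param_template is None or self.validated_param_combinations is None:
            raise ValueError("schema-validated state is required on this path")


legacy = MappingParametersSketch({"a": 1}, [{"a": 1}])
validated = MappingParametersSketch(
    {"a": 1},
    [{"a": 1}],
    validated_param_template=ValidatedState({"a": 1}),
    validated_param_combinations=[ValidatedState({"a": 1})],
)
validated.ensure_validated()  # passes: both validated fields are present
```

Appending defaulted fields to the end of a NamedTuple keeps existing two-argument construction sites working unchanged.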
Relationship to Structured Tool State
PR #18758 is the bridge PR between the old untyped tool execution and the structured tool state system:
- Type aliases as documentation: Even though all types are Dict[str, Any] at runtime, the aliases document what has happened to the state at each point in the pipeline
- Decomposition enables insertion points: Breaking expand_incoming() into parts created clean insertion points for the async/schema-validated variants added later
- MappingParameters as dual carrier: The later addition of validated_param_template/validated_param_combinations shows MappingParameters became the bridge carrying both legacy and schema-validated state through execution
- Foundation for _types.py: This module is imported across the tools package and serves as the central type vocabulary for tool state
File Index
| File | Lines (PR) | Current Lines | Status |
|---|---|---|---|
| lib/galaxy/tools/_types.py | 65 (new) | 69 | Intact |
| lib/galaxy/tools/__init__.py | major changes | ~5000+ | Intact + async variants |
| lib/galaxy/tools/execute.py | major changes | ~700+ | Intact + validated state fields |
| lib/galaxy/tools/actions/__init__.py | typed execute() | ~1000+ | Intact |
| lib/galaxy/tools/actions/data_manager.py | typed execute() | ~80+ | Intact |
| lib/galaxy/tools/actions/history_imp_exp.py | typed execute() | ~200+ | Intact |
| lib/galaxy/tools/actions/metadata.py | typed execute() | ~200+ | Intact |
| lib/galaxy/tools/actions/model_operations.py | typed execute() | ~200+ | Intact |
| lib/galaxy/tools/actions/upload.py | typed execute() | ~200+ | Intact |
| lib/galaxy/tools/parameters/__init__.py | typed + new fns | ~700+ | Intact + async populate |
| lib/galaxy/tools/parameters/grouping.py | constructor refactor | ~800+ | Intact |
| lib/galaxy/tools/parameters/meta.py | typed expand | ~250+ | Intact |
| lib/galaxy/tools/parameters/basic.py | name annotation | ~2500+ | Intact |
| lib/galaxy/managers/jobs.py | typed by_tool_input | ~500+ | Intact |
| lib/galaxy/webapps/galaxy/api/jobs.py | updated call site | ~500+ | Intact |
| lib/galaxy/workflow/modules.py | deduplicated completed_jobs | ~2500+ | Intact |
| lib/galaxy/work/context.py | return type | ~200+ | Intact |