PR #18524 Research Summary: Add Tool-Centric APIs to the Tool Shed 2.0
Metadata
| Field | Value |
|---|---|
| Title | Add Tool-Centric APIs to the Tool Shed 2.0 |
| Author | John Chilton (@jmchilton) |
| Status | MERGED |
| Created | 2024-07-10 |
| Merged | 2024-07-15 |
| Milestone | 24.2 |
| Labels | kind/enhancement, area/toolshed, area/tools |
| Branch | structured_tool_state_models -> dev |
| Stats | 96 files changed, +6287 / -151 |
| Merge Commit | f7427844ae641812f20f75db08135f557b8403fe |
| URL | https://github.com/galaxyproject/galaxy/pull/18524 |
High-Level Description
This PR is a foundational piece of the “structured tool state” initiative. It brings Pydantic-based models for describing tool inputs, outputs, and metadata from galaxy-tool-util into a form that can be served via new Tool Shed 2.0 APIs, enabling external tooling (IDE plugins, workflow validators, etc.) to reason about Galaxy tools without depending on Galaxy’s internal Tool classes.
Three Core Backend Enhancements to galaxy-tool-util:
- Input parameter model layer: Pydantic models describing every type of Galaxy and CWL tool input parameter (text, integer, float, boolean, select, conditional, repeat, section, data, data_collection, color, hidden, rules, CWL types). Each model can generate a dynamic Pydantic model at runtime for validating tool state in different representations (request, request_internal, job_internal, test_case).
- Citation handling refactor: Citation parsing was decoupled from XML and moved into galaxy.tool_util.parser using a Citation Pydantic model, so it can be reused in library contexts without Galaxy app dependencies.
- Tool output Pydantic models: New output_models.py with a ToolOutput discriminated union (data, collection, text, integer, float, boolean) that mirrors output metadata from ToolSource.
Two New Tool Shed 2.0 API Endpoints:
- GET /api/tools/{trs_tool_id}/version/{version}: Returns a ParsedTool Pydantic model containing parsed tool metadata (id, version, name, description, inputs, outputs, citations, license, profile, EDAM operations/topics, xrefs, help).
- GET /api/tools/{trs_tool_id}/version/{version}/parameter_request_schema: Returns a JSON schema for validating tool inputs according to the structured tool state API (from PR #17393).
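As a rough illustration of how an external client might address these two endpoints, the sketch below builds the URLs using only the standard library. The base URL, the `owner~repo~tool` style identifier, and the helper names are all hypothetical; only the two path templates come from the list above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: placeholder host, not a real Tool Shed deployment.
TOOL_SHED_URL = "https://toolshed.example.org"


def parsed_tool_url(base_url: str, trs_tool_id: str, version: str) -> str:
    """Build the URL for the ParsedTool endpoint described above."""
    # Quote each path segment so characters such as "+" or "/" are escaped.
    tool_part = quote(trs_tool_id, safe="")
    version_part = quote(version, safe="")
    return f"{base_url.rstrip('/')}/api/tools/{tool_part}/version/{version_part}"


def parameter_request_schema_url(base_url: str, trs_tool_id: str, version: str) -> str:
    """Build the URL for the JSON schema endpoint described above."""
    return parsed_tool_url(base_url, trs_tool_id, version) + "/parameter_request_schema"


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON document (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical tool identifier and version, for illustration only.
example_url = parameter_request_schema_url(TOOL_SHED_URL, "owner~repo~my_tool", "1.0+galaxy0")
```

A validator or IDE plugin could pass the result of `fetch_json(example_url)` to any JSON Schema library to check a tool request before submitting it.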
The PR was described as an improvement over the earlier attempt (#18470), adding repeat min/max, sections, a richer API (inputs and outputs together), a caching rework, and stock tool support.
Detailed File-by-File Breakdown
New Core Files (Introduced by PR)
lib/galaxy/tool_util/parameters/models.py (+931 lines)
The heart of the PR. Defines Pydantic models for every Galaxy tool parameter type:
- TextParameterModel, IntegerParameterModel, FloatParameterModel, BooleanParameterModel
- ColorParameterModel, HiddenParameterModel, RulesParameterModel
- SelectParameterModel with LabelValue options
- DataParameterModel, DataCollectionParameterModel
- ConditionalParameterModel with ConditionalWhen for when-blocks
- RepeatParameterModel with min/max instance counts
- SectionParameterModel
- CWL types: CwlIntegerParameterModel, CwlFloatParameterModel, CwlStringParameterModel, CwlBooleanParameterModel, CwlFileParameterModel, CwlDirectoryParameterModel, CwlNullParameterModel, CwlUnionParameterModel
Each model implements pydantic_template(state_representation) returning DynamicModelInformation used to build runtime Pydantic models for different state representations (request vs request_internal vs job_internal vs test_case).
Current state: This file no longer exists at lib/galaxy/tool_util/parameters/models.py. It has been refactored into lib/galaxy/tool_util_models/parameters.py (now 2229 lines — more than double the original 931 lines). The architecture is preserved but significantly expanded with additional parameter types (drill_down, data_column, group_tag, baseurl, genomebuild, directory_uri, sample_sheet), many more state representations (relaxed_request, workflow_step, workflow_step_linked, landing_request, landing_request_internal, test_case_json, job_runtime), and validators.
lib/galaxy/tool_util/parameters/factory.py (+294 lines)
Factory functions to build parameter models from InputSource/ToolSource abstractions:
- from_input_source(): routes to Galaxy or CWL parameter model construction
- input_models_for_tool_source(): builds full parameter bundle from tool source
- input_models_for_pages() / input_models_for_page(): iterates page sources
- get_color_value(): helper for color parameter default
Current state: Still exists at same path, expanded to 481 lines. Now handles many more parameter types (drill_down, data_column, group_tag, baseurl, genomebuild, directory_uri) and validators. The _from_input_source_galaxy function grew substantially.
lib/galaxy/tool_util/parameters/__init__.py (+101 lines)
Package init that re-exports all parameter models and utilities.
Current state: Still exists, expanded to 189 lines with many more exports (validation functions, additional state types, visitor functions, case functions, convert functions).
lib/galaxy/tool_util/parameters/state.py (+98 lines)
Defines ToolState base class and subclasses:
- RequestToolState (state_representation="request")
- RequestInternalToolState (state_representation="request_internal")
- JobInternalToolState (state_representation="job_internal")
- TestCaseToolState (state_representation="test_case")
Each has a _to_base_model() method that builds a dynamic Pydantic model for validation.
Current state: Still exists, expanded significantly with additional state types: RelaxedRequestToolState, WorkflowStepToolState, WorkflowStepLinkedToolState, LandingRequestToolState, LandingRequestInternalToolState, RequestInternalDereferencedToolState, TestCaseJsonToolState, JobRuntimeToolState, and a HasToolParameters protocol.
lib/galaxy/tool_util/parameters/_types.py (+43 lines)
Type utility functions: optional_if_needed, is_optional, union_type, list_type, cast_as_type.
Current state: This file no longer exists at lib/galaxy/tool_util/parameters/_types.py. Moved to lib/galaxy/tool_util_models/_types.py.
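To make the role of these helpers more concrete, here is a guess at what a few of them might look like, inferred only from their names. The signatures and bodies are assumptions for illustration and are not copied from Galaxy's `_types.py`.

```python
from typing import Any, List, Optional, Sequence, Union, get_args, get_origin


def is_optional(type_: Any) -> bool:
    """True if type_ is a typing.Union that includes None (e.g. Optional[int])."""
    return get_origin(type_) is Union and type(None) in get_args(type_)


def optional_if_needed(type_: Any, optional: bool) -> Any:
    """Wrap type_ in Optional only when the parameter is declared optional."""
    return Optional[type_] if optional else type_


def union_type(args: Sequence[Any]) -> Any:
    """Build a Union from a runtime sequence of types."""
    return Union[tuple(args)]


def list_type(arg: Any) -> Any:
    """Build List[arg] from a runtime type."""
    return List[arg]


# Compose a type for an optional parameter accepting ints or lists of ints.
composed = optional_if_needed(union_type([int, list_type(int)]), optional=True)
```

Helpers of this kind matter because the parameter models assemble type annotations at runtime, where the static `Optional[...]` and `Union[...]` syntax cannot be written out by hand.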
lib/galaxy/tool_util/parameters/visitor.py (+56 lines)
visit_input_values() function for traversing parameter model trees (handles conditionals, repeats, sections).
Current state: Still exists at same path, expanded with flat_state_path, keys_starting_with, repeat_inputs_to_array, validate_explicit_conditional_test_value, VISITOR_NO_REPLACEMENT.
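The traversal idea can be sketched with plain dictionaries. This is a simplified stand-in, not Galaxy's `visit_input_values()`: the parameter dictionary layout (`type`, `parameters`, `test_parameter`, `whens`) and the callback signature are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any], Any], Any]


def visit_values(params: List[Dict[str, Any]], state: Dict[str, Any], callback: Callback) -> Dict[str, Any]:
    """Return a new state with callback applied to every leaf parameter value."""
    new_state: Dict[str, Any] = {}
    for param in params:
        name = param["name"]
        if name not in state:
            continue
        value = state[name]
        if param["type"] == "section":
            # Sections nest a dict of child values.
            new_state[name] = visit_values(param["parameters"], value, callback)
        elif param["type"] == "repeat":
            # Repeats hold a list of instances, each a dict of child values.
            new_state[name] = [visit_values(param["parameters"], inst, callback) for inst in value]
        elif param["type"] == "conditional":
            # The test parameter's value selects which when-block is active.
            test_param = param["test_parameter"]
            branch = param["whens"].get(value[test_param["name"]], [])
            new_state[name] = visit_values([test_param] + branch, value, callback)
        else:
            new_state[name] = callback(param, value)
    return new_state


PARAMS = [
    {"name": "threshold", "type": "integer"},
    {"name": "advanced", "type": "section", "parameters": [{"name": "label", "type": "text"}]},
    {"name": "filters", "type": "repeat", "parameters": [{"name": "cutoff", "type": "integer"}]},
    {
        "name": "mode",
        "type": "conditional",
        "test_parameter": {"name": "kind", "type": "select"},
        "whens": {"fast": [{"name": "k", "type": "integer"}], "slow": []},
    },
]
STATE = {
    "threshold": 1,
    "advanced": {"label": "x"},
    "filters": [{"cutoff": 2}, {"cutoff": 3}],
    "mode": {"kind": "fast", "k": 4},
}

# Multiply every integer leaf by ten and leave other values untouched.
scaled = visit_values(PARAMS, STATE, lambda p, v: v * 10 if p["type"] == "integer" else v)
```

Returning a new dictionary rather than mutating the input keeps the original state available for comparison, which is convenient when converting between representations.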
lib/galaxy/tool_util/parameters/convert.py (+73 lines)
decode() function for converting RequestToolState to RequestInternalToolState (decodes encoded IDs to internal integers).
Current state: Still exists, expanded with encode, encode_test, fill_static_defaults, landing_decode, landing_encode, dereference, strictify.
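A minimal sketch of the request to request_internal conversion follows. It is not Galaxy's `decode()`: the `{"src": ..., "id": ...}` value shape, the flat list of data parameter names, and the hex-based `fake_decode_id` are all assumptions chosen to keep the example self-contained.

```python
from typing import Any, Callable, Dict, List


def fake_decode_id(encoded: str) -> int:
    """Stand-in for a real ID decoder; here an encoded ID is just hex."""
    return int(encoded, 16)


def decode_state(
    data_param_names: List[str],
    request_state: Dict[str, Any],
    decode_id: Callable[[str], int],
) -> Dict[str, Any]:
    """Return a copy of request_state with dataset IDs decoded to integers."""
    internal_state = dict(request_state)
    for name in data_param_names:
        value = internal_state.get(name)
        if isinstance(value, dict) and "id" in value:
            # Build a new dict so the caller's request state is not mutated.
            internal_state[name] = {**value, "id": decode_id(value["id"])}
    return internal_state


request_state = {"input": {"src": "hda", "id": "2a"}, "threshold": 3}
internal_state = decode_state(["input"], request_state, fake_decode_id)
```

Passing the decoder in as a callable reflects the goal stated above of keeping this layer usable without Galaxy app dependencies.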
lib/galaxy/tool_util/parameters/json.py (+27 lines)
to_json_schema_string() — converts tool parameter bundle to JSON schema string.
Current state: Still exists at same path.
lib/galaxy/tool_util/models.py (+74 lines)
Defines ParsedTool Pydantic model and parse_tool() factory function.
Current state: No longer exists at this path. Refactored into lib/galaxy/tool_util_models/__init__.py which now contains ParsedTool, ToolSourceBase, UserToolSource, AdminToolSource, plus test-related models (TestJob, Tests, etc.). The parse_tool() function moved to lib/galaxy/tool_util/model_factory.py as parse_tool() and parse_tool_custom().
lib/galaxy/tool_util/parser/output_models.py (+115 lines)
Pydantic models for tool outputs: ToolOutputDataset, ToolOutputCollection, ToolOutputText, ToolOutputInteger, ToolOutputFloat, ToolOutputBoolean, plus DatasetCollectionDescription hierarchy and from_tool_source().
Current state: No longer exists at this path. Moved to lib/galaxy/tool_util_models/tool_outputs.py with the same model structure but enhanced with Generic types for incoming vs strict output models, and the from_tool_source() function moved to lib/galaxy/tool_util/parser/output_objects.py.
lib/galaxy/tool_util/unittest_utils/parameters.py (+46 lines)
Test utility for loading tool sources from the test parameters directory.
Current state: Still exists at same path.
lib/galaxy/tools/stock.py (+33 lines)
Functions to iterate stock Galaxy tool paths and tool sources: stock_tool_paths(), stock_tool_sources().
Current state: Still exists at same path, largely unchanged.
lib/tool_shed/managers/model_cache.py (+64 lines)
ModelCache class for caching parsed tool models on disk, keyed by model schema hash + tool_id + tool_version. Gracefully handles model schema changes by using MD5 of the JSON schema as a cache key prefix.
Current state: Still exists at same path, largely unchanged.
lib/tool_shed/managers/tools.py (+128 lines)
Core tool shed tool management: parsed_tool_model_cached_for(), parsed_tool_model_for(), tool_source_for(), stock tool source lookup.
Current state: Still exists, expanded. Now imports parse_tool_custom from galaxy.tool_util.model_factory, uses ShedParsedTool (a subclass of ParsedTool with repository_revision).
Modified Existing Files
lib/galaxy/managers/citations.py (+30/-18)
Refactored to use Citation Pydantic model from galaxy.tool_util.parser.interface instead of parsing XML elements directly. parse_citation() now takes a Citation model. BibtexCitation and DoiCitation constructors take Citation model.
Current state: Uses Citation from galaxy.tool_util_models.tool_source (refactored path). The pattern is preserved.
lib/galaxy/tool_util/parser/interface.py (+25/-7)
Added Citation Pydantic model (type + content), XrefDict TypedDict, abstract methods parse_citations(), parse_edam_operations(), parse_edam_topics(), parse_xrefs(), parse_license(), parse_help().
Current state: Still exists. Citation and XrefDict moved to lib/galaxy/tool_util_models/tool_source.py. The abstract methods are still on ToolSource. parse_help() now returns Optional[HelpContent] (a structured model with format + content) instead of Optional[str].
lib/galaxy/tool_util/parser/xml.py (+43/-10)
Implemented parse_citations(), parse_edam_operations(), parse_edam_topics(), parse_xrefs(), parse_license(), parse_help() on XmlToolSource.
Current state: Still exists with these methods, plus many additional parsing methods added since.
lib/galaxy/tool_util/parser/yaml.py (+17/-12)
Implemented citation/metadata parsing methods for YAML tool sources.
Current state: Still exists with expanded parsing.
lib/galaxy/tool_util/parser/cwl.py (+25/-2)
Added CWL-specific implementations of the new parsing methods (mostly returning empty defaults).
Current state: Still exists.
lib/galaxy/tool_util/parser/output_collection_def.py (+62/-15)
Added dataset_collector_descriptions_from_output_dict() and Pydantic model conversion methods.
Current state: Still exists with these methods preserved and expanded.
lib/galaxy/tool_util/parser/output_objects.py (+86/-8)
Added to_model() methods on ToolOutput and ToolOutputCollection to convert to Pydantic output models.
Current state: Still exists. Now includes from_tool_source() function (moved from the deleted output_models.py).
lib/galaxy/tool_shed/metadata/metadata_generator.py (+20/-2)
Added RepositoryMetadataToolDict TypedDict and a refactored method for reuse.
Current state: Still exists with the RepositoryMetadataToolDict type.
lib/galaxy/tools/__init__.py (+14/-20)
Citation handling refactored to use the new Citation model pattern instead of parsing XML directly.
Current state: Still uses citation parsing through the model pattern.
lib/galaxy/tools/parameters/basic.py (+1/-1)
Bug fix in color parameter parsing — get_color_value() fix.
Current state: Still exists.
lib/galaxy/util/__init__.py (+12/-4)
Added listify improvements and galaxy_directory() function.
Current state: Still exists with these additions.
lib/galaxy/tool_util/ontologies/ontology_data.py (+5/-2)
Small improvements to EDAM ontology data handling.
Current state: Still exists.
lib/tool_shed/webapp/api2/tools.py (+49/-1)
Added the two new API endpoints (show_tool and tool_state_request).
Current state: Still exists with the endpoints preserved and expanded. Now also includes parameter_landing_request_schema, parameter_test_case_xml_schema, and tool_source endpoints.
lib/tool_shed/webapp/app.py (+2/-0)
Added model_cache initialization.
Current state: Still exists.
lib/tool_shed/structured_app.py (+2/-0)
Added model_cache to the structured app interface.
Current state: Still exists.
lib/tool_shed/webapp/model/__init__.py (+2/-0)
Minor model additions.
Current state: Still exists.
lib/tool_shed/managers/trs.py (+8/-6)
Small refactoring for TRS ID handling.
Current state: Still exists.
lib/tool_shed/webapp/frontend/src/schema/schema.ts (+1099/-0)
Auto-generated TypeScript schema for the tool shed frontend reflecting new API types.
Current state: Still exists, continuously regenerated.
Test Files
test/unit/tool_util/test_parameter_specification.py (+226 lines)
Core test framework for parameter model validation. Tests each parameter type against valid/invalid request states, request_internal states, job_internal states, and test_case states, driven by parameter_specification.yml.
Current state: Still exists, expanded to handle many more state representations and test scenarios.
test/unit/tool_util/parameter_specification.yml (+681 lines)
YAML test data defining expected valid/invalid states for each parameter type across different state representations.
Current state: Still exists, expanded from 681 to 4199 lines — dramatically more test coverage.
test/unit/tool_util/test_parsing.py (+124/-38)
Tests for tool source parsing; this PR added tests for citation, output, EDAM, and help parsing.
Current state: Still exists with expanded test coverage.
test/unit/tool_shed/test_model_cache.py (+53 lines)
Tests for the ModelCache class.
Current state: Still exists.
test/unit/tool_shed/test_tool_source.py (+38 lines)
Tests for tool source retrieval in the tool shed context.
Current state: Still exists.
test/unit/app/tools/test_citations.py (+5/-1)
Updated citation tests for the new model pattern.
Current state: Still exists.
test/unit/app/tools/test_stock.py (+16 lines)
Tests for stock tool iteration.
Current state: Still exists.
test/functional/tools/parameters/*.xml and *.cwl (36 new files)
Test tool XML and CWL files covering every parameter type: boolean, color, conditional, data, data_collection, float, hidden, int, repeat, section, select, text, and CWL equivalents.
Current state: All still exist. Many more parameter test tools have been added since (drill_down, data_column, genomebuild, group_tag, directory_uri, sample_sheet, etc.).
Documentation Files
doc/source/dev/ PlantUML files and diagrams
- tool_state_api.plantuml.txt / .svg: Sequence diagram showing API request flow through RequestToolState validation
- tool_state_state_classes.plantuml.txt / .svg: Class diagram of ToolState hierarchy (RequestToolState -> RequestInternalToolState -> JobInternalToolState)
- image.Makefile, plantuml_options.txt, plantuml_style.txt: Build infrastructure for diagrams
Current state: All still exist.
Key Architectural Decisions and Patterns
1. Discriminated Union Pattern for Parameter Types
All parameter types are modeled as a discriminated union (ToolParameterT) using Pydantic’s Discriminator based on parameter_type field. This enables clean serialization/deserialization and JSON schema generation.
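The pattern can be illustrated with two trimmed-down models. This is a sketch assuming Pydantic v2; the class names, the `gx_integer` / `gx_text` tag values, and the fields are illustrative stand-ins rather than the real parameter models, and it uses the simpler `Field(discriminator=...)` form.

```python
from typing import Annotated, List, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class IntegerParam(BaseModel):
    # Hypothetical, reduced stand-in for an integer parameter model.
    parameter_type: Literal["gx_integer"] = "gx_integer"
    name: str
    min: Optional[int] = None
    max: Optional[int] = None


class TextParam(BaseModel):
    # Hypothetical, reduced stand-in for a text parameter model.
    parameter_type: Literal["gx_text"] = "gx_text"
    name: str


# The parameter_type field tells Pydantic which member of the union to build.
ToolParam = Annotated[Union[IntegerParam, TextParam], Field(discriminator="parameter_type")]

adapter = TypeAdapter(List[ToolParam])
params = adapter.validate_python(
    [
        {"parameter_type": "gx_integer", "name": "threshold", "min": 0},
        {"parameter_type": "gx_text", "name": "label"},
    ]
)
```

Because the tag is checked first, a payload with an unknown `parameter_type` fails with a single clear error instead of one error per union member.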
2. State Representation Architecture
The ToolState class hierarchy models different representations of tool state:
- RequestToolState — External API state with encoded IDs and mapping/batch constructs
- RequestInternalToolState — Decoded IDs, still supports mapping
- JobInternalToolState — Expanded (no mapping), defaults filled
- TestCaseToolState — For tool test XML validation
Each generates a dynamic Pydantic model from the parameter definitions for that specific representation.
3. Dynamic Model Generation
pydantic_template() on each parameter model returns DynamicModelInformation (name, type definition, validators). These are assembled via pydantic.create_model() into a runtime validation model. This is a key pattern that enables JSON schema generation for arbitrary tools.
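A small sketch of the idea, assuming Pydantic v2: the same parameter definitions yield different validation models depending on the state representation. The dictionary-based parameter definitions, `field_for`, and `build_state_model` are hypothetical simplifications; in particular, a data parameter is reduced to a bare ID (a string when encoded, an integer when decoded).

```python
from typing import Any, Dict, List, Optional, Tuple

from pydantic import ConfigDict, StrictInt, StrictStr, ValidationError, create_model


def field_for(param: Dict[str, Any], state_representation: str) -> Tuple[Any, Any]:
    """Return a (type, default) pair for one parameter in one representation."""
    if param["type"] == "integer":
        py_type: Any = StrictInt
    elif param["type"] == "data":
        # Encoded string ID in the external request, integer ID internally.
        py_type = StrictStr if state_representation == "request" else StrictInt
    else:
        raise ValueError(f"unsupported parameter type: {param['type']}")
    if param.get("optional", False):
        return (Optional[py_type], None)
    return (py_type, ...)  # Ellipsis marks the field as required.


def build_state_model(params: List[Dict[str, Any]], state_representation: str):
    """Assemble a runtime Pydantic model for the given representation."""
    fields = {p["name"]: field_for(p, state_representation) for p in params}
    return create_model(
        f"DynamicState_{state_representation}",
        __config__=ConfigDict(extra="forbid"),
        **fields,
    )


PARAMS = [
    {"name": "threshold", "type": "integer", "optional": True},
    {"name": "input", "type": "data"},
]
RequestModel = build_state_model(PARAMS, "request")
InternalModel = build_state_model(PARAMS, "request_internal")

request_schema = RequestModel.model_json_schema()
```

Calling `model_json_schema()` on the generated class is what makes it possible to publish a schema for an arbitrary tool without writing a model by hand.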
4. Model-Based Caching
The ModelCache uses the MD5 of the Pydantic model's JSON schema as a cache key prefix. When models evolve (fields added or removed), the hash changes and old cache entries are no longer looked up, so the cache is invalidated without maintaining an explicit version number.
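The invalidation behaviour can be sketched with the standard library alone. `SchemaKeyedCache` is a hypothetical class, not the real ModelCache: it takes a schema dictionary directly instead of a Pydantic model class, and its on-disk layout is an assumption.

```python
import hashlib
import json
import os
import tempfile
from typing import Optional
from urllib.parse import quote


class SchemaKeyedCache:
    """Hypothetical disk cache keyed by schema hash + tool id + tool version."""

    def __init__(self, directory: str, json_schema: dict) -> None:
        self._directory = directory
        # Sort keys so logically equal schemas always hash identically.
        canonical = json.dumps(json_schema, sort_keys=True).encode("utf-8")
        self._schema_hash = hashlib.md5(canonical).hexdigest()

    def _path(self, tool_id: str, tool_version: str) -> str:
        # Quote the parts so arbitrary IDs are safe to use as file names.
        filename = f"{quote(tool_id, safe='')}__{quote(tool_version, safe='')}.json"
        return os.path.join(self._directory, self._schema_hash, filename)

    def get(self, tool_id: str, tool_version: str) -> Optional[dict]:
        path = self._path(tool_id, tool_version)
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)

    def put(self, tool_id: str, tool_version: str, document: dict) -> None:
        path = self._path(tool_id, tool_version)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(document, handle)


schema_v1 = {"properties": {"id": {"type": "string"}}}
schema_v2 = {"properties": {"id": {"type": "string"}, "name": {"type": "string"}}}

with tempfile.TemporaryDirectory() as directory:
    old_cache = SchemaKeyedCache(directory, schema_v1)
    old_cache.put("cat1", "1.0.0", {"id": "cat1"})
    hit = old_cache.get("cat1", "1.0.0")
    # A changed schema hashes to a different prefix, so the old entry is missed.
    miss = SchemaKeyedCache(directory, schema_v2).get("cat1", "1.0.0")
```

One trade-off of this design is that stale entries stay on disk under the old hash prefix until something removes them.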
5. Citation Decoupling
Citations moved from XML-parsing in galaxy.tools to a clean Citation(type, content) Pydantic model in the parser layer, enabling reuse without Galaxy app dependencies.
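The decoupling can be pictured as two layers that share only the model. In this sketch (assuming Pydantic is installed), the `Citation` fields match the `Citation(type, content)` shape described above, while `parse_citations`, `describe`, the XML snippet, and the DOI value are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

from pydantic import BaseModel


class Citation(BaseModel):
    type: str
    content: str


def parse_citations(tool_xml: str) -> List[Citation]:
    """Parser layer: the only place that knows about the XML layout."""
    root = ET.fromstring(tool_xml)
    return [
        Citation(type=elem.get("type", ""), content=(elem.text or "").strip())
        for elem in root.findall("./citations/citation")
    ]


def describe(citation: Citation) -> str:
    """Consumer layer: works from the model and never touches XML."""
    if citation.type == "doi":
        return f"DOI {citation.content}"
    return f"{citation.type} entry ({len(citation.content)} characters)"


# Made-up tool XML with a placeholder DOI, for illustration only.
TOOL_XML = """
<tool id="example">
  <citations>
    <citation type="doi">10.1000/example</citation>
    <citation type="bibtex">@misc{example, title={Example}}</citation>
  </citations>
</tool>
"""

citations = parse_citations(TOOL_XML)
summaries = [describe(c) for c in citations]
```

A YAML or CWL tool source could produce the same `Citation` objects from its own format, and `describe` would work unchanged.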
6. Stock Tool Support
The PR added the ability to serve built-in Galaxy tools through the same Tool Shed API, not just repository-hosted tools.
Cross-Reference: Current Codebase vs PR
Major Refactoring Since the PR
The most significant change since this PR merged is the extraction of galaxy.tool_util_models as a separate package:
| PR Location | Current Location | Notes |
|---|---|---|
| lib/galaxy/tool_util/parameters/models.py | lib/galaxy/tool_util_models/parameters.py | Grew from 931 to 2229 lines |
| lib/galaxy/tool_util/parameters/_types.py | lib/galaxy/tool_util_models/_types.py | Moved to models package |
| lib/galaxy/tool_util/models.py | lib/galaxy/tool_util_models/__init__.py | Expanded with UserToolSource, AdminToolSource, test models |
| lib/galaxy/tool_util/parser/output_models.py | lib/galaxy/tool_util_models/tool_outputs.py | Enhanced with Generic types |
| (new) | lib/galaxy/tool_util_models/_base.py | New base model |
| (new) | lib/galaxy/tool_util_models/tool_source.py | Citation, XrefDict, HelpContent, etc. |
| (new) | lib/galaxy/tool_util_models/parameter_validators.py | Validator models |
| (new) | lib/galaxy/tool_util_models/sample_sheet.py | Sample sheet models |
| (new) | lib/galaxy/tool_util_models/assertions.py | Assertion models |
| (new) | lib/galaxy/tool_util/model_factory.py | parse_tool/parse_tool_custom extracted |
| (new) | lib/galaxy/tool_util/parameters/case.py | Test case state handling |
| (new) | lib/galaxy/tool_util/parameters/model_validation.py | Validation entry points |
Files That Still Exist Unchanged or Minimally Changed
- lib/tool_shed/managers/model_cache.py: Largely unchanged
- lib/galaxy/tools/stock.py: Largely unchanged
- lib/galaxy/tool_util/unittest_utils/parameters.py: Largely unchanged
- All doc/source/dev PlantUML files: Unchanged
- All test parameter tool XML/CWL files: Still present (many more added)
State Representations Growth
The PR introduced 4 state representations. The current codebase has expanded to 12+:
- request, relaxed_request, request_internal, request_internal_dereferenced
- job_internal, job_runtime
- test_case, test_case_json
- workflow_step, workflow_step_linked
- landing_request, landing_request_internal
Parameter Types Growth
The PR introduced core Galaxy types + CWL types. Since then, many more have been added:
- drill_down, data_column, group_tag, baseurl, genomebuild, directory_uri
- Sample sheet support
- Parameter validators (in_range, length, regex, expression, empty_field, no_options)
Review Discussion Summary
Reviewer: @mvdbeek
- Overall very positive: “Looks great, can’t wait to play with this” and “Looks great, let’s deploy it”
- APPROVED on 2024-07-15
Key Discussion Points:
- typing_extensions usage (on _types.py): mvdbeek suggested using typing_extensions for get_args/get_origin for Python 3.7 compatibility. jmchilton was initially concerned about dropping 3.7, then agreed after seeing it works.
- case.py WIP code (on parameters/case.py): mvdbeek raised concerns about unqualified input references and nested input handling. jmchilton clarified this was WIP code that would move to the full structured tool state branch (#17393).
- Default values in factory (on factory.py): mvdbeek asked if providing defaults in the factory was the right layer vs producing optional types. He also suggested eventually consuming parameter models in basic.py to avoid drift. jmchilton agreed with this as a goal but deferred it, creating issue #18537 to track it.
- Empty tool_state.md: mvdbeek noted the file was empty. jmchilton removed it, keeping only the PlantUML diagrams.
- TODO comment cleanup: mvdbeek pointed out a completed TODO that should be removed.
Community Comments:
- @hexylena: “Fantastic to see, it’ll let me throw away a repository where I collect this data from live servers” — highlighted community need for this metadata API.
- @mvdbeek (post-merge): “This is so awesome… This will enable so many cool things for a long time to come!”
Post-Deployment Issues (Sentry):
After deployment, Sentry reported 5 issues:
- ValueError: too many values to unpack (expected 3) on /api/tools/{tool_id}/versions/{tool_version}
- AttributeError: 'NoneType' object has no attribute 'installable_revisions'
- AttributeError: 'object' object has no attribute 'app'
- ValidationError: 1 validation error for Citation
- AssertionError on display_tool
These were presumably addressed in follow-up PRs.
Related PRs
- #17393 — Full structured tool state PR (the larger initiative this feeds into)
- #18470 — Previous attempt at these APIs (superseded by this PR)
- #18537 — Issue tracking consuming parameter models in basic.py to avoid drift