CWL Tool Loading and Reference Test Infrastructure
Research document covering how Galaxy loads CWL tools from .cwl files into executable Tool objects, how the CWL reference/conformance test infrastructure works, and how loaded CWL tools interact with the tool request API.
Branch: cwl_on_tool_request_api_2
Table of Contents
- Topic 1: How Galaxy Loads CWL Tools
- Topic 2: CWL Reference Test Infrastructure
- Topic 3: Tool Loading and the Tool Request API
Topic 1: How Galaxy Loads CWL Tools
Overview
The CWL tool loading pipeline transforms a .cwl file into a fully usable Galaxy Tool object through a multi-layered chain:
```text
.cwl file
  -> get_tool_source()                       (factory.py:105-111)
  -> CwlToolSource                           (parser/cwl.py:72)
  -> tool_proxy()                            (cwl/parser.py:761)
  -> SchemaLoader.tool()                     (cwl/schema.py:94)
  -> cwltool loads & validates CWL document
  -> _cwl_tool_object_to_proxy()             (cwl/parser.py:858)
  -> CommandLineToolProxy or ExpressionToolProxy
  -> CwlToolSource.parse_tool_type() returns "cwl" or "galactic_cwl"
  -> create_tool_from_source()               (__init__.py:450)
  -> tool_types["cwl"] -> CwlTool class
  -> Tool.__init__() calls tool.parse(tool_source)
  -> CwlCommandBindingTool.parse() stores _cwl_tool_proxy
  -> Tool.parse_inputs() -> input_models_for_pages() -> CWL parameter models
```
Step 1: File Detection and ToolSource Creation
Entry point: get_tool_source() in lib/galaxy/tool_util/parser/factory.py:64-114
When Galaxy encounters a .cwl (or .json) file, it is identified as a CWL tool:
```python
# factory.py:105-111
elif config_file.endswith(".json") or config_file.endswith(".cwl"):
    uuid = uuid or uuid4()
    return CwlToolSource(config_file, strict_cwl_validation=strict_cwl_validation,
                         tool_id=tool_id, uuid=uuid)
```
For CWL tools to be recognized by the directory loader, enable_beta_formats must be True:
- lib/galaxy/tool_util/loader_directory.py:119-150: looks_like_a_tool() only checks CWL files when enable_beta_formats=True
- lib/galaxy/tool_util/loader_directory.py:253-277: _find_tool_files() only searches non-XML files when enable_beta_formats=True
- lib/galaxy/config/__init__.py:960: Galaxy config option enable_beta_tool_formats (default: False)
- The test driver sets enable_beta_tool_formats=True at lib/galaxy_test/driver/driver_util.py:212
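The two gates can be combined in a small sketch: the extension dispatch in get_tool_source() and the beta-format check in the directory loader. This is a simplification under stated assumptions. pick_tool_source_class is a hypothetical helper. The XML branch and its "XmlToolSource" return value were added only for contrast. The real loader may also perform content checks that are not modeled here.

```python
from typing import Optional

# Extensions that get_tool_source() routes to CwlToolSource (per factory.py:105-111)
CWL_EXTENSIONS = (".cwl", ".json")


def pick_tool_source_class(config_file: str, enable_beta_formats: bool) -> Optional[str]:
    """Hypothetical helper: the ToolSource class name a file would get, or None if skipped."""
    if config_file.endswith(".xml"):
        # XML tools are considered regardless of the beta flag (assumption for contrast)
        return "XmlToolSource"
    if config_file.endswith(CWL_EXTENSIONS):
        # Non-XML files are only searched by the directory loader when beta formats are on
        return "CwlToolSource" if enable_beta_formats else None
    return None
```

With enable_beta_tool_formats left at its default of False, a directory scan would silently skip every .cwl file. This is why the test driver turns the option on.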
Step 2: CwlToolSource and the ToolProxy
File: lib/galaxy/tool_util/parser/cwl.py:72-346
CwlToolSource extends ToolSource (the abstract interface all tool formats implement). It lazily creates a ToolProxy on first access:
```python
# cwl.py:93-116
@property
def tool_proxy(self) -> "ToolProxy":
    if self._tool_proxy is None:
        if self._source_path is not None:
            self._tool_proxy = tool_proxy(
                self._source_path,
                strict_cwl_validation=self._strict_cwl_validation,
                tool_directory=self._tool_directory,
                tool_id=self._tool_id,
                uuid=self._uuid,
            )
        else:
            # From persistent representation (Celery deserialization)
            self._tool_proxy = tool_proxy_from_persistent_representation(
                self._source_object, ...)
    return self._tool_proxy
```
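The lazy-initialization pattern above can be shown on its own. This is a minimal sketch, not Galaxy code. LazyProxyHolder and load_count are hypothetical names. The single loader callable stands in for the choice between tool_proxy() and tool_proxy_from_persistent_representation().

```python
from typing import Any, Callable, Optional


class LazyProxyHolder:
    """Hypothetical stand-in for CwlToolSource's lazily built tool_proxy property."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader  # the expensive step, e.g. cwltool load + validate
        self._tool_proxy: Optional[Any] = None
        self.load_count = 0  # only here to make the caching observable

    @property
    def tool_proxy(self) -> Any:
        # Build on first access only; later accesses return the cached object
        if self._tool_proxy is None:
            self._tool_proxy = self._loader()
            self.load_count += 1
        return self._tool_proxy


holder = LazyProxyHolder(lambda: {"class": "CommandLineTool"})
first = holder.tool_proxy
second = holder.tool_proxy
print(holder.load_count, first is second)  # 1 True
```

Deferring the load means that constructing a CwlToolSource, for example during a directory scan, does not pay the cost of cwltool validation until something actually asks for the proxy.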
Key parse methods on CwlToolSource:
| Method | Returns | Notes |
|---|---|---|
| parse_tool_type() (line 125) | "cwl" or "galactic_cwl" | Checks for gx:Interface hint |
| parse_command() (line 137) | "$__cwl_command" | Placeholder; real command built by cwltool at exec time |
| parse_input_pages() (line 223) | PagesSource([CwlPageSource], inputs_style="cwl") | Creates CWL-style input page |
| parse_outputs() (line 228) | (outputs, output_collections) | Delegates to ToolProxy.output_instances() |
| parse_requirements() (line 305) | containers, software reqs | Extracts DockerRequirement, SoftwareRequirement, etc. |
| parse_profile() (line 322) | "17.09" | Hardcoded CWL profile |
| to_string() (line 344) | JSON string | For Celery serialization; calls tool_proxy.to_persistent_representation() |
The gx:Interface hint determines the tool type:
- With gx:Interface (e.g., galactic_cat.cwl): tool_type = "galactic_cwl" -> GalacticCwlTool class
- Without gx:Interface: tool_type = "cwl" -> CwlTool class
Step 3: Schema Loading Pipeline
File: lib/galaxy/tool_util/cwl/schema.py:1-111
The SchemaLoader class wraps cwltool’s document loading pipeline:
```python
# schema.py:32-110 (abridged)
class SchemaLoader:
    def __init__(self, strict=True, validate=True):
        self._strict = strict
        self._validate = validate

    def loading_context(self):
        # Creates a cwltool LoadingContext with:
        loading_context = LoadingContext()
        loading_context.strict = self._strict
        loading_context.do_validate = self._validate
        loading_context.enable_dev = True  # allows dev CWL versions
        loading_context.do_update = True
        loading_context.relax_path_checks = True
        return loading_context

    def raw_process_reference(self, path):
        # Step 1: Normalize path, create file:// URI
        # Step 2: load_tool.fetch_document(uri, loadingContext)
        # Returns: RawProcessReference(loading_context, process_object, uri)
        ...

    def process_definition(self, raw_process_reference):
        # Step 3: resolve_and_validate_document() - full CWL validation
        # Returns: ResolvedProcessDefinition
        ...

    def tool(self, **kwds):
        # Step 4: load_tool.make_tool() - creates cwltool Process object
        # Returns: cwltool Process (CommandLineTool or ExpressionTool)
        ...
```
Two singleton instances:
- schema_loader = SchemaLoader(): strict, validating (line 109)
- non_strict_non_validating_schema_loader = SchemaLoader(strict=False, validate=False) (line 110)
Step 4: ToolProxy Construction
File: lib/galaxy/tool_util/cwl/parser.py:127-256, 761-879
The tool_proxy() function (line 761) calls _to_cwl_tool_object() (line 811):
```python
# parser.py:811-855
def _to_cwl_tool_object(tool_path=None, tool_object=None, ...):
    schema_loader = _schema_loader(strict_cwl_validation)
    if tool_path is not None:
        # Load from file path
        raw_process_reference = schema_loader.raw_process_reference(tool_path)
        cwl_tool = schema_loader.tool(raw_process_reference=raw_process_reference)
    elif tool_object is not None:
        # Load from dict/YAML object (for persistent representations)
        tool_object = yaml_no_ts().load(json.dumps(tool_object))
        raw_process_reference = schema_loader.raw_process_reference_for_object(tool_object)
        cwl_tool = schema_loader.tool(raw_process_reference=raw_process_reference)
    raw_tool = cwl_tool.tool  # the loaded CWL document as a dict
    _hack_cwl_requirements(cwl_tool)  # Galaxy-specific requirement adjustments
    check_requirements(raw_tool)  # Validate supported requirements
    return _cwl_tool_object_to_proxy(cwl_tool, tool_id, uuid, ...)
```
_cwl_tool_object_to_proxy() (line 858) selects the proxy class based on the document's class field:
```python
# parser.py:858-879
def _cwl_tool_object_to_proxy(cwl_tool, tool_id, uuid, ...):
    raw_tool = cwl_tool.tool  # the loaded CWL document as a dict
    process_class = raw_tool["class"]
    if process_class == "CommandLineTool":
        proxy_class = CommandLineToolProxy
    elif process_class == "ExpressionTool":
        proxy_class = ExpressionToolProxy
    else:
        raise Exception("File not a CWL CommandLineTool.")
    return proxy_class(cwl_tool, tool_id, uuid, raw_process_reference, tool_path)
```
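The if/elif chain can also be written as a table lookup, which makes the set of accepted process classes explicit. This is a sketch with hypothetical stand-in classes. The real proxies wrap a cwltool Process and take more constructor arguments, and ValueError replaces the generic Exception in the excerpt.

```python
from typing import Any, Dict, Type


class CommandLineToolProxy:
    """Hypothetical stand-in; the real class wraps a cwltool Process."""
    _class = "CommandLineTool"

    def __init__(self, raw_tool: Dict[str, Any]):
        self.raw_tool = raw_tool


class ExpressionToolProxy(CommandLineToolProxy):
    _class = "ExpressionTool"  # per the text, the only override


PROXY_CLASSES: Dict[str, Type[CommandLineToolProxy]] = {
    "CommandLineTool": CommandLineToolProxy,
    "ExpressionTool": ExpressionToolProxy,
}


def proxy_for(raw_tool: Dict[str, Any]) -> CommandLineToolProxy:
    """Pick the proxy class from the document's "class" field."""
    process_class = raw_tool.get("class")
    proxy_class = PROXY_CLASSES.get(process_class)
    if proxy_class is None:
        # Workflow and any other process class are rejected at this layer
        raise ValueError(f"File not a CWL CommandLineTool (class={process_class!r})")
    return proxy_class(raw_tool)
```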
ToolProxy base class (lines 127-256) provides:
- job_proxy(input_dict, output_dict, job_directory) (line 150): creates a JobProxy for execution
- galaxy_id() (line 162): derives the Galaxy tool ID from the CWL id field or UUID
- to_persistent_representation() (line 199): serializes for Celery/database storage
- from_persistent_representation() (line 215): deserializes
- requirements / hints_or_requirements_of_class(): CWL requirement access
The constructor (lines 130-148) strips format from input fields to prevent cwltool validation errors:
```python
for input_field in self._tool.inputs_record_schema["fields"]:
    if "format" in input_field:
        del input_field["format"]
```
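Here is the same stripping step applied to a plain dict shaped like cwltool's inputs_record_schema. The field contents are invented for illustration, and strip_input_formats is a hypothetical name. The constructor deletes the keys in place. This sketch works on a copy instead, so the before and after states can be compared.

```python
import copy
from typing import Any, Dict


def strip_input_formats(inputs_record_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a record schema with any "format" key removed from each field."""
    schema = copy.deepcopy(inputs_record_schema)  # the Galaxy code mutates in place instead
    for input_field in schema["fields"]:
        input_field.pop("format", None)
    return schema


schema = {
    "type": "record",
    "fields": [
        {"name": "file1", "type": "File", "format": "http://edamontology.org/format_2330"},
        {"name": "numbering", "type": ["null", "boolean"]},
    ],
}
cleaned = strip_input_formats(schema)
```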
CommandLineToolProxy (lines 258-322) adds:
- input_fields() (line 278): reads inputs_record_schema["fields"], resolves schemaDefs
- input_instances() (line 305): converts fields to InputInstance objects
- output_instances() (line 308): reads outputs_record_schema["fields"]
- docker_identifier() (line 315): extracts DockerRequirement
ExpressionToolProxy (line 325) - subclass of CommandLineToolProxy, only changes _class = "ExpressionTool".
Step 5: Input Parameters - From CWL Schema to Galaxy Parameter Models
CWL inputs flow through two parallel systems:
A. Galaxy Legacy Parameters (parse_inputs in __init__.py)
Tool.parse_inputs() at lib/galaxy/tools/__init__.py:1718-1757:
```python
def parse_inputs(self, tool_source):
    self.has_galaxy_inputs = False
    pages = tool_source.parse_input_pages()
    # CwlToolSource returns PagesSource with inputs_style="cwl"
    # PagesSource.inputs_defined returns True (style != "none")
    try:
        parameters = input_models_for_pages(pages, self.profile)
        self.parameters = parameters
    except Exception:
        pass
    if pages.inputs_defined:
        self.has_galaxy_inputs = True  # <-- WAS True for CWL
        # BUT the new branch bypasses this for CWL
```
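Key change on this branch: parse_inputs() still sets has_galaxy_inputs to True for CWL tools, because inputs_style="cwl" is not "none". The expand_incoming_async() method (lines 2183-2191) checks self.has_galaxy_inputs to decide whether to run Galaxy's parameter expansion machinery. The raw state passes through only when has_galaxy_inputs is False, which is what the new CWL path relies on. The code in parse_inputs() does not show where the flag is forced to False for CWL, so this is listed under Unresolved Questions.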
B. CWL Parameter Models (New typed system)
CwlPageSource (parser/cwl.py:366) creates CwlInputSource objects from the tool proxy’s input_instances().
These flow into input_models_for_pages() at lib/galaxy/tool_util/parameters/factory.py:453:
```python
def from_input_source(input_source, profile):
    if input_source.input_class == "cwl":  # CwlInputSource.input_class returns "cwl"
        tool_parameter = _from_input_source_cwl(input_source)
    else:
        tool_parameter = _from_input_source_galaxy(input_source, profile)
```
_from_input_source_cwl() (factory.py:421-436) maps CWL schema-salad types to parameter models:
| CWL Type | Galaxy Parameter Model | parameter_type |
|---|---|---|
| int | CwlIntegerParameterModel | "cwl_integer" |
| float | CwlFloatParameterModel | "cwl_float" |
| string | CwlStringParameterModel | "cwl_string" |
| boolean | CwlBooleanParameterModel | "cwl_boolean" |
| null | CwlNullParameterModel | "cwl_null" |
| org.w3id.cwl.cwl.File | CwlFileParameterModel | "cwl_file" |
| org.w3id.cwl.cwl.Directory | CwlDirectoryParameterModel | "cwl_directory" |
| [type1, type2, ...] (union) | CwlUnionParameterModel | "cwl_union" |
These models live in lib/galaxy/tool_util_models/parameters.py:1943-2100.
CwlFileParameterModel and CwlDirectoryParameterModel (lines 2061-2088) both use DataRequest as their py_type, meaning the API expects {src: "hda", id: <encoded_id>} for dataset inputs.
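The type table can be read as a lookup from schema-salad type names to parameter_type strings. This is a minimal sketch that assumes each type arrives either as a name string or, for unions, as a list. cwl_parameter_type is a hypothetical name. The real _from_input_source_cwl() builds parameter model instances, not strings.

```python
from typing import Any

# parameter_type values from the table above, keyed by schema-salad type name
CWL_PARAMETER_TYPES = {
    "int": "cwl_integer",
    "float": "cwl_float",
    "string": "cwl_string",
    "boolean": "cwl_boolean",
    "null": "cwl_null",
    "org.w3id.cwl.cwl.File": "cwl_file",
    "org.w3id.cwl.cwl.Directory": "cwl_directory",
}


def cwl_parameter_type(cwl_type: Any) -> str:
    """Hypothetical sketch of the dispatch performed by _from_input_source_cwl()."""
    if isinstance(cwl_type, list):
        # Any list of types is modeled as a union
        return "cwl_union"
    if isinstance(cwl_type, str) and cwl_type in CWL_PARAMETER_TYPES:
        return CWL_PARAMETER_TYPES[cwl_type]
    # Array and record schemas (dicts) have no mapping yet
    raise NotImplementedError(f"Unsupported CWL type: {cwl_type!r}")
```

An input declared as string? is normalized to the union ["null", "string"] by schema-salad, so it lands on "cwl_union" rather than "cwl_string".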
Step 6: Tool Class Instantiation
File: lib/galaxy/tools/__init__.py:450-472, 5085-5103
```python
# Line 460-466
elif tool_type := tool_source.parse_tool_type():
    ToolClass = tool_types.get(tool_type)
    if ToolClass is None:
        if tool_type == "cwl":
            raise ToolLoadError("Runtime support for CWL tools is not implemented currently")

# Line 5085-5103 - TOOL_CLASSES list includes:
#   CwlTool,          # tool_type = "cwl"
#   GalacticCwlTool,  # tool_type = "galactic_cwl"
tool_types = {tool_class.tool_type: tool_class for tool_class in TOOL_CLASSES}
```
Note: The error at line 463-464 fires only if CwlTool is not in TOOL_CLASSES (i.e., on mainline Galaxy without CWL support). On this CWL branch, CwlTool IS in the list.
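The registry behavior can be reproduced with stand-in classes. This shows why the same lookup succeeds on this branch and fails on mainline. Every class here is a hypothetical stand-in. Only the comprehension and the CWL error message come from the excerpt. The message for other unknown types is an assumption.

```python
from typing import Dict, List, Type


class Tool:
    tool_type = "default"


class CwlCommandBindingTool(Tool):
    pass


class GalacticCwlTool(CwlCommandBindingTool):
    tool_type = "galactic_cwl"


class CwlTool(CwlCommandBindingTool):
    tool_type = "cwl"


class ToolLoadError(Exception):
    pass


def build_registry(tool_classes: List[Type[Tool]]) -> Dict[str, Type[Tool]]:
    # Same comprehension as tool_types in __init__.py
    return {tool_class.tool_type: tool_class for tool_class in tool_classes}


def resolve_tool_class(tool_types: Dict[str, Type[Tool]], tool_type: str) -> Type[Tool]:
    tool_class = tool_types.get(tool_type)
    if tool_class is None:
        if tool_type == "cwl":
            raise ToolLoadError("Runtime support for CWL tools is not implemented currently")
        raise ToolLoadError(f"Unknown tool type: {tool_type}")  # assumed fallback
    return tool_class


branch_registry = build_registry([Tool, GalacticCwlTool, CwlTool])  # CWL branch
mainline_registry = build_registry([Tool])  # no CWL classes registered
```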
CWL Tool Class Hierarchy
```text
Tool (lib/galaxy/tools/__init__.py)
└── CwlCommandBindingTool (line 3754)
    ├── GalacticCwlTool (line 3843, tool_type="galactic_cwl")
    └── CwlTool (line 3855, tool_type="cwl")
```
CwlCommandBindingTool (lines 3754-3840):
- exec_before_job(): creates a JobProxy, pre-computes the command via cwltool, stages files
- parse() (line 3831): stores _cwl_tool_proxy from tool_source.tool_proxy
- param_dict_to_cwl_inputs(): abstract, raises NotImplementedError

CwlTool (lines 3855-3873):
- tool_type = "cwl"
- may_use_container_entry_point = True
- param_dict_to_cwl_inputs(): legacy path via to_cwl_job() (not used in the new path)
- inputs_from_dict() (line 3866): translates API payloads between galaxy and cwl representations

GalacticCwlTool (lines 3843-3852):
- tool_type = "galactic_cwl"
- param_dict_to_cwl_inputs(): uses galactic_flavored_to_cwl_job() (legacy)
Serialization for Celery
CWL tools serialize/deserialize for Celery task processing:
- Serialize: Tool.to_raw_tool_source() (__init__.py:1799) calls CwlToolSource.to_string() (parser/cwl.py:344), which calls ToolProxy.to_persistent_representation() (cwl/parser.py:199). The result is JSON containing class, raw_process_reference (the raw CWL document), tool_id, and uuid.
- Deserialize: create_tool_from_representation() (__init__.py:475) calls get_tool_source(tool_source_class="CwlToolSource", raw_tool_source=json_string), which calls build_cwl_tool_source() (factory.py:48), which calls tool_proxy_from_persistent_representation().
- The tool_source_class is persisted as "CwlToolSource" (it is type(self.tool_source).__name__).
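Here is a sketch of the round trip using only the four keys listed above. The exact JSON layout that ToolProxy produces is not shown in this document, so both functions here are hypothetical. The sketch takes the value of the "class" key as an explicit argument rather than guessing what the real code stores there.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = {"class", "raw_process_reference", "tool_id", "uuid"}


def to_persistent_representation(proxy_class: str, raw_process_reference: Dict[str, Any],
                                 tool_id: str, tool_uuid: str) -> str:
    """Hypothetical layout: one JSON object carrying the four keys listed above."""
    return json.dumps({
        "class": proxy_class,
        "raw_process_reference": raw_process_reference,  # the raw CWL document
        "tool_id": tool_id,
        "uuid": tool_uuid,
    })


def from_persistent_representation(raw_tool_source: str) -> Dict[str, Any]:
    """Parse the JSON and check that everything needed to rebuild the proxy is present."""
    representation = json.loads(raw_tool_source)
    missing = REQUIRED_KEYS - set(representation)
    if missing:
        raise ValueError(f"Incomplete persistent representation, missing: {sorted(missing)}")
    return representation


raw = to_persistent_representation(
    "CommandLineTool", {"class": "CommandLineTool", "baseCommand": "cat"},
    "cat1", "00000000-0000-0000-0000-000000000000",
)
restored = from_persistent_representation(raw)
```

Carrying the whole CWL document inside the payload lets a Celery worker rebuild the tool without access to the original .cwl file. This matters for dynamic tools.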
Supported CWL Requirements
From lib/galaxy/tool_util/cwl/parser.py:82-96:
```python
SUPPORTED_TOOL_REQUIREMENTS = [
    "CreateFileRequirement",
    "DockerRequirement",
    "EnvVarRequirement",
    "InitialWorkDirRequirement",
    "InlineJavascriptRequirement",
    "LoadListingRequirement",
    "ResourceRequirement",
    "ShellCommandRequirement",
    "ScatterFeatureRequirement",
    "SchemaDefRequirement",
    "SubworkflowFeatureRequirement",
    "StepInputExpressionRequirement",
    "MultipleInputFeatureRequirement",
    "CredentialsRequirement",
]
```
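This list is presumably what check_requirements() (mentioned in Step 4) validates against. The sketch below shows one way such a check could work. It is hypothetical and makes several assumptions. It checks only the requirements list, not hints. It expects cwltool to have already normalized requirements into a list of {"class": ...} dicts. It does not recurse into workflow steps. Galaxy's actual check_requirements() may differ on any of these points.

```python
from typing import Any, Dict

SUPPORTED_TOOL_REQUIREMENTS = frozenset([
    "CreateFileRequirement", "DockerRequirement", "EnvVarRequirement",
    "InitialWorkDirRequirement", "InlineJavascriptRequirement", "LoadListingRequirement",
    "ResourceRequirement", "ShellCommandRequirement", "ScatterFeatureRequirement",
    "SchemaDefRequirement", "SubworkflowFeatureRequirement",
    "StepInputExpressionRequirement", "MultipleInputFeatureRequirement",
    "CredentialsRequirement",
])


def check_requirements_sketch(raw_tool: Dict[str, Any]) -> None:
    """Hypothetical: reject a loaded tool whose requirements use an unsupported class."""
    # Assumes the list form [{"class": ...}, ...]; hints are deliberately not checked
    for requirement in raw_tool.get("requirements", []):
        requirement_class = requirement.get("class")
        if requirement_class not in SUPPORTED_TOOL_REQUIREMENTS:
            raise ValueError(f"Unsupported CWL requirement: {requirement_class}")
```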
Topic 2: CWL Reference Test Infrastructure
Conformance Test Provisioning: update_cwl_conformance_tests.sh
The CWL conformance test tools are not vendored or submoduled. They are downloaded on-demand by scripts/update_cwl_conformance_tests.sh and not committed to git. This is a two-stage process:
Stage 1: Shell Script Downloads Tools
File: scripts/update_cwl_conformance_tests.sh
For each CWL version (1.0, 1.1, 1.2), the script:
- Downloads the official CWL spec repo as a zip from GitHub:
  - v1.0: common-workflow-language/common-workflow-language repo
  - v1.1: common-workflow-language/cwl-v1.1 repo
  - v1.2: common-workflow-language/cwl-v1.2 repo
- Extracts into test/functional/tools/cwl_tools/v{version}/:
  - conformance_tests.yaml: the test manifest (source paths differ per version: v1.0 uses v1.0/conformance_test_v1.0.yaml, the others use the root conformance_tests.yaml)
  - The test tools directory: v1.0 copies v1.0/v1.0/ (creating the cwl_tools/v1.0/v1.0/ path that sample_tool_conf.xml references); the others copy tests/
- Runs scripts/cwl_conformance_to_test_cases.py to generate Python test files
Result directory structure after running:
```text
test/functional/tools/cwl_tools/
├── v1.0/
│   ├── conformance_tests.yaml
│   └── v1.0/              # actual test tools (cat1-testcli.cwl, bwa-mem-tool.cwl, etc.)
├── v1.0_custom/           # committed Galaxy-specific CWL test tools
├── v1.1/
│   ├── conformance_tests.yaml
│   └── tests/             # CWL v1.1 test tools
└── v1.2/
    ├── conformance_tests.yaml
    └── tests/             # CWL v1.2 test tools
```
Stage 2: Python Script Generates Test Cases
File: scripts/cwl_conformance_to_test_cases.py
- Reads conformance_tests.yaml recursively (following $import references via its own conformance_tests_gen())
- For each conformance test entry, generates a pytest method in a TestCwlConformance class:

```python
@pytest.mark.cwl_conformance
@pytest.mark.cwl_conformance_v1_0
@pytest.mark.command_line_tool  # from CWL test tags
@pytest.mark.green  # or @pytest.mark.red
def test_conformance_v1_0_cat1(self):
    """Test doc string..."""
    self.cwl_populator.run_conformance_test("v1.0", "Test doc string...")
```

- Tests are marked red (known-failing in Galaxy) or green based on a hardcoded RED_TESTS dict:
  - v1.0: ~30 red tests (mostly scatter/valuefrom/subworkflow/secondary files)
  - v1.1: ~50 red tests (adds timelimit, networkaccess, inplace_update, etc.)
  - v1.2: ~100+ red tests (adds conditionals, v1.2-specific features)
- Writes the generated test file to lib/galaxy_test/api/cwl/test_cwl_conformance_v{version_simple}.py
- The generated test class extends BaseCwlWorkflowsApiTestCase. Each method calls self.cwl_populator.run_conformance_test(version, doc), which looks up the test by its doc string in conformance_tests.yaml, stages inputs, runs the tool or workflow, and compares outputs
The generated test files ARE committed; the downloaded tool files are NOT.
Conformance Test Lookup at Runtime
CwlPopulator.run_conformance_test(version, doc) (populators.py:3150):
- Calls get_conformance_test(version, doc), which iterates over conformance_tests.yaml entries and matches on the doc field
- Each entry has tool (relative .cwl path), job (input JSON), and output (expected output) fields
- Resolves the tool path relative to the conformance test directory
- Stages inputs via stage_inputs() (uploads files referenced in the job JSON)
- Runs via _run_cwl_tool_job() (POST /api/jobs) or _run_cwl_workflow_job()
- Compares outputs using cwltest.compare.compare()
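The lookup and path-resolution steps can be sketched over entries of the kind conformance_tests_gen() yields, where each entry carries a "directory" key. find_conformance_test is a hypothetical name. The sketch assumes an exact string match on doc, while the real get_conformance_test() may normalize whitespace.

```python
import os
from typing import Any, Dict, Iterable


def find_conformance_test(entries: Iterable[Dict[str, Any]], doc: str) -> Dict[str, Any]:
    """Hypothetical lookup: first entry whose doc matches exactly, with paths resolved."""
    for entry in entries:
        if entry.get("doc") == doc:
            resolved = dict(entry)  # shallow copy; leave the yielded dict untouched
            for key in ("tool", "job"):
                if resolved.get(key):
                    # Paths are relative to the directory recorded by conformance_tests_gen()
                    resolved[key] = os.path.join(entry["directory"], resolved[key])
            return resolved
    raise KeyError(f"No conformance test with doc {doc!r}")


entries = [
    {"doc": "cat1 doc", "tool": "v1.0/cat1-testcli.cwl", "job": "v1.0/cat-job.json",
     "directory": "cwl_tools/v1.0"},
    {"doc": "no job doc", "tool": "other.cwl", "job": None, "directory": "cwl_tools/v1.2"},
]
found = find_conformance_test(entries, "cat1 doc")
```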
Test Tool Locations
| Location | Committed? | Purpose |
|---|---|---|
| test/functional/tools/parameters/cwl_*.cwl | Yes | CWL parameter type testing (10 files) |
| test/functional/tools/cwl_tools/v1.0_custom/ | Yes | Galaxy-specific CWL test tools (11 files) |
| test/functional/tools/cwl_tools/v1.0/v1.0/ | No (downloaded) | CWL v1.0 conformance tools |
| test/functional/tools/cwl_tools/v1.1/tests/ | No (downloaded) | CWL v1.1 conformance tools |
| test/functional/tools/cwl_tools/v1.2/tests/ | No (downloaded) | CWL v1.2 conformance tools |
| test/functional/tools/galactic_cat.cwl | Yes | Galactic (gx:Interface) CWL tool |
| test/functional/tools/galactic_record_input.cwl | Yes | Galactic CWL with record inputs |
| lib/galaxy_test/api/cwl/test_cwl_conformance_v*.py | Yes (generated) | Generated pytest conformance test cases |
Unit tests in test/unit/tool_util/test_cwl.py reference paths like v1.0/v1.0/cat1-testcli.cwl — these require update_cwl_conformance_tests.sh to have been run first.
Tool Configuration for Tests
File: test/functional/tools/sample_tool_conf.xml
All test tools are registered in this file. The CWL section (lines 268-287):
```xml
<!-- CWL Testing -->
<tool file="parameters/cwl_int.cwl" />
<tool file="cwl_tools/v1.0/v1.0/cat3-tool.cwl" />
<tool file="cwl_tools/v1.0/v1.0/env-tool1.cwl" />
<tool file="cwl_tools/v1.0/v1.0/null-expression1-tool.cwl" />
<tool file="cwl_tools/v1.0/v1.0/null-expression2-tool.cwl" />
<tool file="cwl_tools/v1.0/v1.0/optional-output.cwl" />
<tool file="cwl_tools/v1.0/v1.0/parseInt-tool.cwl" />
<tool file="cwl_tools/v1.0/v1.0/record-output.cwl" />
<tool file="cwl_tools/v1.0/v1.0/sorttool.cwl" />
<tool file="cwl_tools/v1.0_custom/any1.cwl" />
<tool file="cwl_tools/v1.0_custom/cat1-tool.cwl" />
<tool file="cwl_tools/v1.0_custom/cat2-tool.cwl" />
<tool file="cwl_tools/v1.0_custom/cat-default.cwl" />
<tool file="cwl_tools/v1.0_custom/default_path_custom_1.cwl" />
<tool file="cwl_tools/v1.0_custom/index1.cwl" />
<tool file="cwl_tools/v1.0_custom/optional-output2.cwl" />
<tool file="cwl_tools/v1.0_custom/showindex1.cwl" />
<tool file="galactic_cat.cwl" />
<tool file="galactic_record_input.cwl" />
```
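Note: Several entries reference cwl_tools/v1.0/v1.0/*.cwl. Those files are downloaded by update_cwl_conformance_tests.sh and are not committed on this branch, so the tools fail to load until that script has been run. In a fresh checkout, only parameters/cwl_int.cwl, the v1.0_custom/ tools, and the root-level galactic tools exist.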
Test Framework Configuration
File: lib/galaxy_test/driver/driver_util.py
Key constants:
- FRAMEWORK_TOOLS_DIR = os.path.join(GALAXY_TEST_DIRECTORY, "functional", "tools") (line 60)
- FRAMEWORK_SAMPLE_TOOLS_CONF = os.path.join(FRAMEWORK_TOOLS_DIR, "sample_tool_conf.xml") (line 62)
- enable_beta_tool_formats=True (line 212): required for .cwl file loading
Tool conf is resolved at line 177:
```python
tool_conf = os.environ.get("GALAXY_TEST_TOOL_CONF", default_tool_conf)
```
CWL Parameter Specification Tests
File: test/unit/tool_util/parameter_specification.yml (lines 3946-4196)
Defines validation test cases for CWL parameter types. These test the CwlParameterModel pydantic models:
```yaml
cwl_int:
  request_valid:
    - parameter: 5
  request_invalid:
    - parameter: "5"  # must be strict int
    - {}  # required
    - parameter: null
cwl_file:
  request_valid:
    - parameter: {src: hda, id: abcdabcd}
  request_invalid:
    - parameter: {src: hda, id: 7}  # id must be encoded
    - parameter: {src: hdca, id: abcdabcd}  # hdca not valid for File
    - parameter: null
```
These are tested by test/unit/tool_util/test_parameter_specification.py.
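The same cases can be restated as plain predicates to show what "valid" means for each model. These functions are hypothetical and are not the pydantic models. Treating a bool as an invalid int is an assumption based on the "strict int" comment. An encoded id is approximated as "any string".

```python
from typing import Any, Dict


def validate_cwl_int(request: Dict[str, Any]) -> bool:
    """Hypothetical restatement of the cwl_int cases above."""
    if "parameter" not in request:
        return False  # the parameter is required
    value = request["parameter"]
    # Strict int: reject strings and None; bool is excluded too (it subclasses int)
    return isinstance(value, int) and not isinstance(value, bool)


def validate_cwl_file(request: Dict[str, Any]) -> bool:
    """Hypothetical restatement of the cwl_file cases above."""
    value = request.get("parameter")
    if not isinstance(value, dict):
        return False  # null or missing
    # Only dataset (hda) references with an encoded (string) id are accepted
    return value.get("src") == "hda" and isinstance(value.get("id"), str)
```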
API Test Infrastructure
File: lib/galaxy_test/api/test_tools_cwl.py
The TestCwlTools class runs CWL tools via Galaxy's API. There are three execution paths:
- Galaxy representation (_run method, line 374): uses run_tool_payload(), which posts to /api/tools with Galaxy-format inputs ({src: "hda", id: ...})
- CWL representation (lines 54-64): same endpoint, but with inputs_representation="cwl", sending native CWL inputs
- CWL job files (via CwlPopulator.run_cwl_job(), lines 67-73): uses the tool request API (POST /api/jobs) with CWL job JSON
CwlPopulator
File: lib/galaxy_test/base/populators.py:3019-3178
Key constant:
```python
CWL_TOOL_DIRECTORY = os.path.join(galaxy_root_path, "test", "functional", "tools", "cwl_tools")
# => test/functional/tools/cwl_tools
```
Methods:
- run_cwl_job(artifact, job_path, ...) (line 3084): main entry point. Determines whether artifact is a tool or a workflow, stages inputs via stage_inputs(), then dispatches to _run_cwl_tool_job() or _run_cwl_workflow_job().
- _run_cwl_tool_job(tool_id, job, history_id) (line 3030): posts to the tool request API via tool_request_raw(). If the tool does not exist in Galaxy, creates it as a dynamic tool via create_tool_from_path().
- run_conformance_test(version, doc) (line 3150): loads the conformance test spec, runs the CWL job, and compares outputs using cwltest.compare.compare().
- get_conformance_test(version, doc) (line 3024): looks up a test by its doc field in conformance_tests.yaml in the test directory.
Conformance Test Discovery
File: lib/galaxy_test/base/populators.py:320-331
```python
import os

import yaml


def conformance_tests_gen(directory, filename="conformance_tests.yaml"):
    conformance_tests_path = os.path.join(directory, filename)
    with open(conformance_tests_path) as f:
        conformance_tests = yaml.safe_load(f)
    for conformance_test in conformance_tests:
        if "$import" in conformance_test:
            # Recurse into the imported manifest, relative to the current directory
            import_dir, import_filename = os.path.split(conformance_test["$import"])
            yield from conformance_tests_gen(os.path.join(directory, import_dir), import_filename)
        else:
            # Record where the entry came from so relative tool/job paths can be resolved
            conformance_test["directory"] = directory
            yield conformance_test
```
This expects conformance_tests.yaml in each CWL version directory (e.g., test/functional/tools/cwl_tools/v1.0/conformance_tests.yaml). Each test entry has tool, job, output, and doc fields.
Test Categories
| Test Type | Location | Runs Against | Requires v1.0/v1.0? |
|---|---|---|---|
| Unit: ToolProxy creation | test/unit/tool_util/test_cwl.py | cwltool directly | Yes |
| Unit: Parameter validation | test/unit/tool_util/test_parameter_specification.py | Pydantic models | No (uses parameters/ tools) |
| Unit: Runtime model | test/unit/tool_util/test_parameter_cwl_runtime_model.py | Galaxy parameter models | No (uses Galaxy tools) |
| API: Tool execution | lib/galaxy_test/api/test_tools_cwl.py | Running Galaxy server | Yes (mostly), some use v1.0_custom |
| Conformance: CWL spec | Via CwlPopulator.run_conformance_test() | Running Galaxy server | Yes |
CWL Test Tool Examples
ExpressionTool (parameters/cwl_int.cwl):
```yaml
class: ExpressionTool
requirements:
  - class: InlineJavascriptRequirement
cwlVersion: v1.2
inputs:
  parameter:
    type: int
outputs:
  output: int
expression: "$({'output': inputs.parameter})"
```
CommandLineTool (v1.0_custom/cat1-tool.cwl):
```yaml
class: CommandLineTool
cwlVersion: v1.0
inputs:
  file1:
    type: File
    inputBinding: {position: 1}
  numbering:
    type: boolean?
    inputBinding: {position: 0, prefix: -n}
baseCommand: cat
outputs: {}
```
Galactic CWL Tool (galactic_cat.cwl) - with gx:Interface:
```yaml
class: CommandLineTool
$namespaces:
  gx: "http://galaxyproject.org/cwl#"
hints:
  gx:interface:
    gx:inputs:
      - gx:name: input1
        gx:type: data
        gx:format: 'txt'
```
Topic 3: Tool Loading and the Tool Request API
How CWL Tools Enter the Tool Request API
CWL tools now use the tool request API (POST /api/jobs) instead of the legacy POST /api/tools path. The flow:
```text
POST /api/jobs (CwlPopulator._run_cwl_tool_job)
  -> lib/galaxy/webapps/galaxy/services/jobs.py
  -> creates ToolRequest model
  -> dispatches Celery task: queue_jobs
  -> JobCreationManager.queue_jobs() (lib/galaxy/managers/jobs.py:2174)
     -> dereference() - converts URIs to HDAs
     -> tool.handle_input_async() - creates Job
```
Dereference Step
File: lib/galaxy/managers/jobs.py:2129-2172
Before handle_input_async(), the dereferencer converts raw data requests to internal HDA references:
```python
tool_state = RequestInternalToolState(tool_request.request)
return dereference(tool_state, tool, dereference_callback, dereference_collection_callback), new_hdas
```
For CWL tools, CwlFileParameterModel and CwlDirectoryParameterModel have py_type = DataRequest (which expects {src: "hda", id: <encoded_id>}). The dereference step converts URI-based requests to internal HDA IDs.
handle_input_async for CWL
File: lib/galaxy/tools/__init__.py:2377
After dereference, queue_jobs() calls:
```python
tool.handle_input_async(
    request_context,
    tool_request,
    tool_state,  # RequestInternalDereferencedToolState
    history=target_history,
    use_cached_job=use_cached_jobs,
    rerun_remap_job_id=rerun_remap_job_id,
)
```
Inside handle_input_async, expand_incoming_async() is called:
```python
# __init__.py:2183-2191
if self.has_galaxy_inputs:
    expanded_incomings, job_tool_states, collection_info = expand_meta_parameters_async(...)
else:
    # CWL tools: pass state through as-is
    expanded_incomings = [deepcopy(tool_request_internal_state.input_state)]
    job_tool_states = [deepcopy(tool_request_internal_state.input_state)]
    collection_info = None
```
Since CWL tools bypass Galaxy’s parameter expansion, the input state passes through unchanged. A JobInternalToolState is created and validated against the tool’s CWL parameter models:
```python
internal_tool_state = JobInternalToolState(job_tool_state)
internal_tool_state.validate(self, f"{self.id} (job internal model)")
```
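The pass-through branch can be isolated to show its two properties: it always yields exactly one job state, and that state is a deep copy. expand_incoming_sketch is a hypothetical name. The expand callable stands in for expand_meta_parameters_async(), whose real signature is elided in the excerpt.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List, Optional, Tuple

ExpandResult = Tuple[List[Dict[str, Any]], List[Dict[str, Any]], Optional[Any]]


def expand_incoming_sketch(has_galaxy_inputs: bool, input_state: Dict[str, Any],
                           expand: Callable[[Dict[str, Any]], ExpandResult]) -> ExpandResult:
    """Hypothetical sketch of the branch in expand_incoming_async()."""
    if has_galaxy_inputs:
        # Galaxy tools: meta-parameter expansion may yield several job states
        return expand(input_state)
    # CWL path: one job, with copies so later mutation cannot leak back into the request
    expanded_incomings = [deepcopy(input_state)]
    job_tool_states = [deepcopy(input_state)]
    collection_info = None
    return expanded_incomings, job_tool_states, collection_info


state = {"file1": {"src": "hda", "id": 1}, "numbering": True}
incomings, job_states, info = expand_incoming_sketch(False, state, lambda s: ([], [], None))
job_states[0]["file1"]["id"] = 2  # mutate the job copy only
```

A practical consequence of this design is that batch or multi-run requests are not expanded for CWL tools on this path. Each request produces a single job.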
Job Persistence
File: lib/galaxy/tools/execute.py:254-256
```python
if execution_slice.validated_param_combination:
    tool_state = execution_slice.validated_param_combination.input_state
    job.tool_state = tool_state
```
The JobInternalToolState.input_state dict is persisted as JSON on the Job model. For CWL tools, this contains the raw CWL-compatible inputs with dataset references as {src: "hda", id: <int>}.
Celery Serialization of CWL Tools
The tool request API dispatches jobs via Celery. The tool itself must be serializable:
File: lib/galaxy/tools/execute.py:326-345
```python
raw_tool_source, tool_source_class = tool.to_raw_tool_source()
# For CWL: tool_source_class = "CwlToolSource"
# raw_tool_source = JSON string of ToolProxy.to_persistent_representation()
```
On the Celery worker:
```python
# lib/galaxy/celery/tasks.py:83-92
def queue_jobs(tool_id, raw_tool_source, tool_source_class, ...):
    tool = create_tool_from_representation(
        app=app, raw_tool_source=raw_tool_source,
        tool_source_class=tool_source_class,  # "CwlToolSource"
    )
```
This reconstructs the full CWL tool from its persistent representation. Fixed in commit d4d68d2a9b.
Job Preparation and Evaluation
When the job is ready to execute:
- Evaluator selection (jobs/__init__.py:1402-1415):

```python
if self.tool.base_command or self.tool.shell_command:
    klass = UserToolEvaluator  # YAML tools
else:
    klass = ToolEvaluator  # CWL tools get this
```

- State reconstruction (evaluation.py:217-220):

```python
if job.tool_state:
    internal_tool_state = JobInternalToolState(job.tool_state)
    internal_tool_state.validate(self.tool, ...)
```

- param_dict construction (evaluation.py:263-276):

```python
if self.tool.tool_type == "cwl":
    param_dict = self.param_dict  # plain dict, not TreeDict
    # ...
    # Skip output wrapping, sanitization
    param_dict["__local_working_directory__"] = self.local_working_directory
    return param_dict
```

- Hook execution: calls exec_before_job(validated_tool_state=internal_tool_state)
- exec_before_job (__init__.py:3757-3829): takes validated_tool_state.input_state, creates a JobProxy, pre-computes the command via cwltool, and stores it in param_dict["__cwl_command"].
The Input State Gap
Currently there is a structural gap in the new path: validated_tool_state.input_state at exec_before_job time still contains dataset references ({src: "hda", id: N}) rather than CWL File objects with paths. The JobProxy._normalize_job() expects File objects with path or location keys.
This conversion (dataset reference -> CWL File object with filesystem path) is the key missing piece. In the YAML tool path, runtimeify() + setup_for_runtimeify() handles this. For CWL, it needs to happen somewhere between state reconstruction and JobProxy creation, enriched with CWL-specific data (secondaryFiles, format URIs, etc.).
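Here is a sketch of the missing conversion step, limited to its simplest form: recursively replacing {"src": "hda", "id": N} with a CWL File object that carries a path. dataset_refs_to_cwl_files is hypothetical, and path_for_hda is a placeholder for whatever maps a dataset id to its file on disk. The sketch deliberately ignores secondaryFiles, format URIs, and Directory inputs, which the text notes are also needed.

```python
from typing import Any, Callable


def dataset_refs_to_cwl_files(value: Any, path_for_hda: Callable[[int], str]) -> Any:
    """Hypothetical: replace {"src": "hda", "id": N} with CWL File objects, recursively."""
    if isinstance(value, dict):
        if value.get("src") == "hda" and "id" in value:
            return {"class": "File", "path": path_for_hda(value["id"])}
        return {key: dataset_refs_to_cwl_files(item, path_for_hda) for key, item in value.items()}
    if isinstance(value, list):
        return [dataset_refs_to_cwl_files(item, path_for_hda) for item in value]
    return value  # ints, strings, booleans and None pass through unchanged


paths = {7: "/data/dataset_7.dat"}  # illustrative id -> path mapping
job_state = {"file1": {"src": "hda", "id": 7}, "numbering": True, "files": [{"src": "hda", "id": 7}]}
cwl_inputs = dataset_refs_to_cwl_files(job_state, paths.__getitem__)
```

Because the function builds new containers instead of mutating job_state, the persisted job.tool_state keeps its dataset references. The File-object form exists only as the input handed to JobProxy.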
Dynamic Tool Loading (Test Infrastructure)
When a CWL tool is not pre-loaded in the toolbox, tests create it dynamically:
```python
# populators.py:3040-3050
if os.path.exists(tool_id):
    tool_versions = self.dataset_populator._get("tools", data=dict(tool_id=raw_tool_id)).json()
    if tool_versions:
        galaxy_tool_id = raw_tool_id
    else:
        dynamic_tool = self.dataset_populator.create_tool_from_path(tool_id)
        galaxy_tool_id = None
        tool_uuid = dynamic_tool["uuid"]
```
create_tool_from_path() (line 1057) posts to Galaxy’s dynamic tool creation API with src="from_path". This uses lib/galaxy/managers/tools.py which requires enable_beta_tool_formats config.
Test API Paths
| Test Method | API Endpoint | Input Format | Notes |
|---|---|---|---|
| _run() in test_tools_cwl.py | POST /api/tools | Galaxy ({src: "hda", id: ...}) or CWL | Legacy path |
| CwlPopulator._run_cwl_tool_job() | POST /api/jobs | CWL-native | New tool request API |
| CwlPopulator.run_cwl_job() | Routes to above | CWL job JSON file | Stages inputs first |
Summary of Loading -> Execution Path
```text
1. Tool Loading (startup or dynamic):
   .cwl file -> CwlToolSource -> ToolProxy -> CwlTool/GalacticCwlTool

2. API Request:
   POST /api/jobs {tool_id, inputs: {param: {src: "hda", id: ...}}}

3. Request Processing:
   -> ToolRequest created -> Celery task dispatched
   -> Tool deserialized from CwlToolSource persistent representation
   -> dereference() resolves data references to HDAs

4. Job Creation:
   -> expand_incoming_async() bypasses Galaxy parameter expansion (has_galaxy_inputs=False)
   -> JobInternalToolState validated against CWL parameter models
   -> Job persisted with tool_state = input_state dict

5. Job Execution:
   -> ToolEvaluator (not UserToolEvaluator)
   -> JobInternalToolState reconstructed from job.tool_state
   -> exec_before_job():
        -> input_json = validated_tool_state.input_state
        -> [GAP: needs dataset ref -> File object conversion]
        -> JobProxy(input_json, output_dict, job_dir)
        -> cwltool generates command, stages files
        -> param_dict["__cwl_command"] = command_line
   -> build() uses __cwl_command verbatim

6. Output Collection:
   -> relocate_dynamic_outputs.py (appended to job script)
   -> Reconstructs JobProxy from .cwl_job.json
   -> cwltool's collect_outputs() evaluates output globs
```
Unresolved Questions
- Only parameters/cwl_int.cwl from the parameters directory is listed in sample_tool_conf.xml. Should the other CWL parameter tools (cwl_float.cwl, cwl_string.cwl, cwl_file.cwl, etc.) be added?
- The has_galaxy_inputs flag for CWL is True because inputs_style="cwl" satisfies inputs_defined. How is it overridden to False in the new path? Is there a separate mechanism?
- How are CWL array and record input types handled by the new parameter model system? _from_input_source_cwl() only handles simple types and unions; there is no array/record support yet.
- CwlUnionParameterModel has request_requires_value = False (with a TODO comment). Is this correct for all unions, or only for unions containing null?