Galaxy Workflow Import: Component Architecture
A deep-dive into the full request-to-database stack for importing workflows into Galaxy.
Architectural Overview
Galaxy follows a layered architecture for workflow imports:
HTTP Request
│
▼
┌──────────────────────────────────────────┐
│ API Controller (WSGI) │
│ WorkflowsAPIController.create() │
│ lib/galaxy/webapps/galaxy/api/ │
│ workflows.py │
└──────────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ Manager Layer │
│ WorkflowContentsManager │
│ lib/galaxy/managers/workflows.py │
│ │
│ ┌────────────────────────────────────┐ │
│ │ Format Normalization │ │
│ │ (gxformat2 ↔ Galaxy JSON) │ │
│ └────────────────────────────────────┘ │
│ ┌────────────────────────────────────┐ │
│ │ Workflow Construction │ │
│ │ (Steps, Modules, Connections) │ │
│ └────────────────────────────────────┘ │
└──────────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ Module System │
│ WorkflowModuleFactory │
│ lib/galaxy/workflow/modules.py │
│ │
│ ToolModule, SubWorkflowModule, │
│ InputDataModule, PauseModule, ... │
└──────────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ ORM Model Layer (SQLAlchemy) │
│ StoredWorkflow → Workflow → WorkflowStep│
│ lib/galaxy/model/__init__.py │
└──────────────┬───────────────────────────┘
│
▼
Database
The service layer (WorkflowsService in lib/galaxy/webapps/galaxy/services/workflows.py) exists but workflow import logic bypasses it — the controller talks directly to the managers. This is a known architectural wrinkle; the service layer is more active for invocation and refactoring operations.
Layer 1: API Controller
File: lib/galaxy/webapps/galaxy/api/workflows.py
Class Hierarchy
class WorkflowsAPIController(
BaseGalaxyAPIController,
UsesStoredWorkflowMixin, # shared workflow CRUD helpers
UsesAnnotations, # annotation helpers
SharableMixin, # sharing/publishing helpers
ServesExportStores, # model store export
ConsumesModelStores, # model store import
):
service: WorkflowsService = depends(WorkflowsService)
The controller gets its manager references via the app singleton at __init__ time:
def __init__(self, app: StructuredApp):
self.workflow_manager = app.workflow_manager # WorkflowsManager
self.workflow_contents_manager = app.workflow_contents_manager # WorkflowContentsManager
These are registered as singletons during app startup in lib/galaxy/app.py:
self.workflow_manager = self._register_singleton(WorkflowsManager)
self.workflow_contents_manager = self._register_singleton(WorkflowContentsManager)
The POST /api/workflows Endpoint
Line 196 — create() is the single endpoint handling all workflow creation methods. It enforces exactly-one-of six mutually exclusive payload parameters:
| Parameter | Import Method |
|---|---|
archive_source | URL, TRS URL, or file:// path |
archive_file | Uploaded file |
from_history_id | Extract workflow from execution history |
from_path | Server filesystem path (admin only) |
shared_workflow_id | Copy another user’s shared workflow |
workflow | Direct JSON/dict payload |
Validation (line 228-237):
- Bootstrap admin users are rejected (real user required)
- Exactly one creation method must be present
validate_uri_access()checks URL safety forarchive_source
Import Dispatch Logic
The create() method routes to internal helpers based on which parameter is present:
archive_source / archive_file → Two sub-paths:
-
TRS import (
archive_source == "trs_tool"): Delegates directly toWorkflowContentsManager.get_or_create_workflow_from_trs()which handles TRS resolution and deduplication. -
URL import: Streams the URL content via
stream_url_to_str(), then calls__api_import_from_archive(). -
File upload: Reads the uploaded file content, calls
__api_import_from_archive(). -
file://scheme: Rewrites as afrom_pathpayload and delegates to__api_import_new_workflow().
from_history_id → Calls extract_workflow() from lib/galaxy/workflow/extract.py (a separate code path that builds a workflow by analyzing job execution history).
from_path → Rewrites payload to {"src": "from_path", "path": ...} and calls __api_import_new_workflow().
shared_workflow_id → Calls __api_import_shared_workflow() which copies an existing workflow via the UsesStoredWorkflowMixin._import_shared_workflow() method.
workflow → Direct JSON payload, calls __api_import_new_workflow().
Controller Helper Methods
__api_import_from_archive() (line 586):
Parses archive data (tries JSON first, falls back to YAML if GalaxyWorkflow marker present), normalizes format, and calls _workflow_from_dict().
__api_import_new_workflow() (line 623):
Takes the workflow dict from the payload, normalizes, creates, returns encoded workflow dict with annotations, URL, owner, step count.
_workflow_from_dict() (line 680):
The convergence point for most import paths. Orchestrates:
- Validates publish/importable compatibility
- Calls
WorkflowContentsManager.build_workflow_from_raw_description() - Makes workflow accessible if importable
- Optionally triggers tool installation via
_import_tools_if_needed()
_import_tools_if_needed() (line 705):
Admin-only. Extracts tool_shed_repository metadata from step dicts and uses InstallRepositoryManager to install missing tools from the Tool Shed.
Response Format
__api_import_response() (line 604) returns:
{
"message": "Workflow 'name' imported successfully.",
"status": "success",
"id": "<encoded_stored_workflow_id>"
}
Status degrades to "error" if the workflow has_errors, has zero steps, or has_cycles.
Legacy/Deprecated Endpoints
POST /api/workflows/upload(line 374) — deprecated, maps to__api_import_new_workflow()POST /api/workflows/import(line 647) — deprecated, imports shared workflows
There is also a WSGI UI controller at lib/galaxy/webapps/galaxy/controllers/workflow.py with an imp() method for the web UI import flow.
Layer 2: Manager — WorkflowContentsManager
File: lib/galaxy/managers/workflows.py, line 593
This is the core business logic layer for workflow content manipulation. It is distinct from WorkflowsManager (line 151) which handles CRUD/sharing/access control on StoredWorkflow objects.
class WorkflowContentsManager(UsesAnnotations):
def __init__(self, app: MinimalManagerApp, trs_proxy: TrsProxy):
self.app = app
self.trs_proxy = trs_proxy
Format Normalization
normalize_workflow_format() (line 620):
All incoming workflow descriptions pass through this method. Its job is to convert any supported format into Galaxy’s internal JSON representation.
Input formats:
├── Galaxy native JSON (.ga) → passed through unchanged
├── Format2 YAML (class: GalaxyWorkflow) → converted via gxformat2
├── CWL $graph documents → resolved via artifact_class()
└── File path references (src: from_path) → loaded from disk (admin only)
Output: RawWorkflowDescription(as_dict, workflow_path)
Format detection is handled by artifact_class() in lib/galaxy/managers/executables.py. It checks:
src == "from_path"— loads YAML from filesystem (admin-gated)classfield — e.g."GalaxyWorkflow"indicates Format2$graphfield — CWL packed workflow format, resolves byobject_id
Format2 conversion uses the gxformat2 library:
from gxformat2 import python_to_workflow, ImporterGalaxyInterface, ImportOptions
galaxy_interface = Format2ConverterGalaxyInterface()
import_options = ImportOptions()
import_options.deduplicate_subworkflows = True
as_dict = python_to_workflow(as_dict, galaxy_interface,
workflow_directory=workflow_directory,
import_options=import_options)
Format2ConverterGalaxyInterface (line 2273) is a minimal implementation of gxformat2’s ImporterGalaxyInterface — its import_workflow() raises NotImplementedError, meaning nested Format2 subworkflow imports must go through the standard Galaxy path.
Workflow Construction
build_workflow_from_raw_description() (line 653):
This is the primary public entry point. It:
- Sets
trans.workflow_building_mode = ENABLED - Appends
(imported from <source>)to the workflow name - Calls
_workflow_from_raw_description()to build the transient model - Creates and wires up
StoredWorkflow↔Workflow - Applies annotations and tags
- Persists to database via
trans.sa_session.add()+commit() - Returns
CreatedWorkflow(stored_workflow, workflow, missing_tools)
_workflow_from_raw_description() (line 784):
The core construction method. This is ~130 lines of orchestration:
Phase 1 — Workflow model creation:
workflow = model.Workflow()
workflow.name = name
workflow.reports_config = data.get("report")
workflow.license = data.get("license")
workflow.creator_metadata = data.get("creator")
workflow.logo_url = data.get("logo_url")
workflow.doi = data.get("doi") # validated
workflow.help = data.get("help")
workflow.readme = data.get("readme")
Phase 2 — Source metadata tracking: If imported from TRS or URL, records provenance:
workflow.source_metadata = {
"trs_tool_id": ...,
"trs_version_id": ...,
"trs_server": ...,
"trs_url": ...
}
# or for URL imports:
workflow.source_metadata = {"url": archive_source}
Phase 3 — Subworkflow preloading:
If the workflow dict contains a top-level "subworkflows" map, these are recursively built first and stored in subworkflow_id_map for later reference by steps.
Phase 4 — Step iteration (two passes):
First pass — subworkflow resolution:
for step_dict in self.__walk_step_dicts(data):
self.__load_subworkflows(trans, step_dict, subworkflow_id_map, ...)
Second pass — module and step creation:
for step_dict in self.__walk_step_dicts(data):
module, step = self.__module_from_dict(trans, steps, steps_by_external_id, step_dict, **module_kwds)
if isinstance(module, ToolModule) and module.tool is None:
missing_tool_tups.append(...)
Phase 5 — Connection wiring:
self.__connect_workflow_steps(steps, steps_by_external_id, dry_run)
Phase 6 — Comment processing: WorkflowComment objects are created and parent-child relationships established between comments and steps.
Phase 7 — Step ordering:
if not is_subworkflow:
attach_ordered_steps(workflow)
Key Helper Methods
__walk_step_dicts() (line 1781):
Iterates through data["steps"] in order. Handles both dict-keyed and list-keyed step formats. Assigns discovery output UUIDs.
__module_from_dict() (line 1852):
Creates a WorkflowStep model and its corresponding WorkflowModule:
step = model.WorkflowStep()
step.position = step_dict.get("position")
step.uuid = step_dict.get("uuid")
step.label = step_dict.get("label")
module = module_factory.from_dict(trans, step_dict, **kwds)
module.save_to_step(step)
Also processes: annotations, when-expressions, workflow outputs, and stores temp_input_connections on the step for the connection pass.
__connect_workflow_steps() (line 1978):
Second pass — creates WorkflowStepConnection objects linking steps:
for input_name, conn_list in step.temp_input_connections.items():
for conn_dict in conn_list:
output_step = steps_by_external_id[conn_dict["id"]]
step.add_connection(input_name, conn_dict["output_name"], output_step, ...)
__load_subworkflow_from_step_dict() (line 1938):
Resolves subworkflow for a step from one of three sources:
- Embedded
"subworkflow"dict in the step → recursively built "content_id"referencingsubworkflow_id_map→ locally resolved"content_id"as a stored workflow ID → loaded from database
__build_embedded_subworkflow() (line 1967):
Recursive call back to build_workflow_from_raw_description() with hidden=True, is_subworkflow=True.
TRS Integration
get_or_create_workflow_from_trs() (line 2086):
Deduplication-aware import. Checks if a workflow with the same trs_id and trs_version already exists for this user before fetching.
create_workflow_from_trs_url() (line 2109):
Fetches from TRS, parses YAML, normalizes format, builds workflow with TRS source metadata.
TrsProxy (lib/galaxy/workflow/trs_proxy.py):
Handles GA4GH TRS v2 protocol:
- Parses TRS URLs via regex:
https://<server>/ga4gh/trs/v2/tools/<tool_id>/versions/<version_id> - Fetches
GALAXYtype descriptors - Default server: Dockstore (
dockstore.org)
Layer 3: Module System
File: lib/galaxy/workflow/modules.py
WorkflowModuleFactory
The factory pattern dispatches step creation by type:
module_types = {
"data_input": InputDataModule,
"data_collection_input": InputDataCollectionModule,
"parameter_input": InputParameterModule,
"pause": PauseModule,
"tool": ToolModule,
"subworkflow": SubWorkflowModule,
}
module_factory = WorkflowModuleFactory(module_types)
Each module class implements:
from_dict(trans, d, **kwargs)— creates module from step dict during importfrom_workflow_step(trans, step, **kwargs)— creates from ORM objectsave_to_step(step)— persists module state into WorkflowStep
ToolModule Resolution
When ToolModule.from_dict() processes a step dict:
- Extracts
content_id/tool_id,tool_version,tool_uuid - If no tool_id/uuid but a
tool_representationexists → creates a dynamic tool (admin only) - Attempts to resolve the tool from the local toolbox
- If tool not found →
module.tool = None, tracked as missing - If version mismatch → records version_changes message
SubWorkflowModule Resolution
SubWorkflowModule.from_dict() resolves from:
"subworkflow"key in dict → already-built model object (set by__load_subworkflows)"content_id"→ fetches owned workflow from database
Layer 4: ORM Models
File: lib/galaxy/model/__init__.py
Entity Relationship
StoredWorkflow (line 8324)
│ User-owned wrapper. Tags, annotations, sharing, published/importable flags.
│ Table: stored_workflow
│
├─── workflows: [Workflow] (all revisions)
└─── latest_workflow: Workflow (current revision)
│
│ Workflow (line 8507)
│ A specific revision. Name, UUID, license, DOI, source_metadata,
│ reports_config, creator_metadata, readme, help, logo_url.
│ Table: workflow
│
├─── steps: [WorkflowStep] (eager loaded, cascade delete)
│ │
│ │ WorkflowStep (line 8730)
│ │ type, tool_id, tool_version, tool_inputs (JSON), position (JSON),
│ │ order_index, label, uuid, when_expression, config (JSON)
│ │ Table: workflow_step
│ │
│ ├─── inputs: [WorkflowStepInput]
│ │ │ name, merge_type, scatter_type, default_value,
│ │ │ value_from, runtime_value
│ │ │ Table: workflow_step_input
│ │ │
│ │ └─── connections: [WorkflowStepConnection]
│ │ output_step_id, output_name,
│ │ input_subworkflow_step_id
│ │ Table: workflow_step_connection
│ │
│ ├─── workflow_outputs: [WorkflowOutput]
│ │ output_name, label, uuid
│ │ Table: workflow_output
│ │
│ ├─── post_job_actions: [PostJobAction]
│ ├─── subworkflow: Workflow (optional, for subworkflow steps)
│ ├─── dynamic_tool: DynamicTool (optional)
│ ├─── tags, annotations
│ └─── parent_comment: WorkflowComment (optional)
│
└─── comments: [WorkflowComment]
Key Model Details
StoredWorkflow — the user-facing entity. Owns name, slug, published, importable, deleted, hidden, from_path (set when imported from a filesystem path). Has sharing associations (StoredWorkflowUserShareAssociation) and menu entries.
Workflow — a specific revision of a StoredWorkflow. A StoredWorkflow can have many Workflow revisions; latest_workflow points to the current one. Carries the bulk of the metadata: source_metadata (JSON, tracks TRS/URL provenance), reports_config, creator_metadata, license, doi, readme, help, logo_url.
WorkflowStep — a node in the DAG. The type field determines behavior:
"tool"— referencestool_id,tool_version;tool_inputsholds serialized state as JSON"subworkflow"— references anotherWorkflowviasubworkflow_id"data_input","data_collection_input","parameter_input"— workflow inputs"pause"— human-intervention pause point
WorkflowStepConnection — an edge in the DAG. Links a source step’s named output to a target step’s named input. Special constant NON_DATA_CONNECTION = "__NO_INPUT_OUTPUT_NAME__" for non-data dependencies. The input_subworkflow_step_id field routes connections into subworkflow internals.
Configuration & Options
WorkflowStateResolutionOptions (line 2197)
Base pydantic model controlling how tool state is resolved during import:
| Field | Default | Purpose |
|---|---|---|
fill_defaults | False | Fill missing tool state with tool defaults |
from_tool_form | False | Expect form-generated state vs simpler JSON |
exact_tools | True | Require exact tool version match |
WorkflowCreateOptions (line 2217)
Extends WorkflowStateResolutionOptions:
| Field | Default | Purpose |
|---|---|---|
import_tools | False | Auto-install tools from Tool Shed |
publish | False | Make workflow published |
importable | None | Make workflow importable (defaults to publish) |
archive_source | None | Source identifier for provenance |
trs_tool_id | None | TRS tool ID |
trs_version_id | None | TRS version ID |
trs_server | None | TRS server identifier |
trs_url | None | Full TRS URL |
install_* | False | Tool Shed install options |
tool_panel_section_* | "" | Where to place installed tools in the panel |
Supported Workflow Formats
Galaxy Native JSON (.ga)
The canonical format. A JSON document with top-level keys:
name,annotation,tags,uuidsteps— dict keyed by string step IDs, each containingtype,tool_id,tool_version,tool_state,position,input_connections,workflow_outputs,post_job_actionssubworkflows— optional map of locally-defined subworkflow dicts- Metadata:
report,license,creator,logo_url,doi,help,readme
Format2 (gxformat2 YAML)
A YAML-based format designed for human readability. Detected by class: GalaxyWorkflow marker or yaml_content key. Converted to native JSON via gxformat2.python_to_workflow() before processing.
CWL $graph
Packed CWL documents with a $graph key. The artifact_class() function resolves the target object by object_id (defaults to "main").
Complete Import Flow (Happy Path)
1. POST /api/workflows { "workflow": { ... } }
│
2. WorkflowsAPIController.create()
│ Validates exactly one creation method
│
3. __api_import_new_workflow()
│
4. __normalize_workflow() → WorkflowContentsManager.normalize_workflow_format()
│ ├── artifact_class() detects format
│ ├── Format2? → gxformat2.python_to_workflow()
│ └── Returns RawWorkflowDescription
│
5. _workflow_from_dict()
│ ├── Validates publish/importable
│ │
│ ├── WorkflowContentsManager.build_workflow_from_raw_description()
│ │ │
│ │ ├── _workflow_from_raw_description()
│ │ │ ├── Create Workflow() model, set metadata
│ │ │ ├── Record source_metadata (TRS/URL provenance)
│ │ │ ├── Preload subworkflows (recursive)
│ │ │ ├── Pass 1: Load subworkflows for each step
│ │ │ ├── Pass 2: Create modules and steps
│ │ │ │ └── module_factory.from_dict() → type-specific Module
│ │ │ │ └── Module.save_to_step(step)
│ │ │ ├── Pass 3: Connect steps (WorkflowStepConnection)
│ │ │ ├── Process comments
│ │ │ └── attach_ordered_steps()
│ │ │
│ │ ├── Create StoredWorkflow(), wire to Workflow
│ │ ├── Set annotations and tags
│ │ ├── sa_session.add() + commit()
│ │ └── Return CreatedWorkflow
│ │
│ ├── Make importable if requested
│ └── Install tools if requested (admin only)
│
6. Return encoded workflow dict to client
File Index
| Component | File | Key Lines |
|---|---|---|
| API Controller | lib/galaxy/webapps/galaxy/api/workflows.py | 140-742 |
| WSGI UI Controller | lib/galaxy/webapps/galaxy/controllers/workflow.py | 30-180 |
| Service Layer | lib/galaxy/webapps/galaxy/services/workflows.py | 50-294 |
| WorkflowsManager | lib/galaxy/managers/workflows.py | 151-580 |
| WorkflowContentsManager | lib/galaxy/managers/workflows.py | 593-2180 |
| Options Models | lib/galaxy/managers/workflows.py | 2197-2259 |
| Format Detection | lib/galaxy/managers/executables.py | 14-46 |
| Module Factory | lib/galaxy/workflow/modules.py | 2635-2666 |
| ToolModule | lib/galaxy/workflow/modules.py | ~1917-1985 |
| SubWorkflowModule | lib/galaxy/workflow/modules.py | ~670-700 |
| TRS Proxy | lib/galaxy/workflow/trs_proxy.py | 61-100 |
| Workflow Extraction | lib/galaxy/workflow/extract.py | 34+ |
| StoredWorkflow Model | lib/galaxy/model/__init__.py | 8324-8503 |
| Workflow Model | lib/galaxy/model/__init__.py | 8507-8725 |
| WorkflowStep Model | lib/galaxy/model/__init__.py | 8730-8969 |
| WorkflowStepInput | lib/galaxy/model/__init__.py | 9059-9107 |
| WorkflowStepConnection | lib/galaxy/model/__init__.py | 9109-9158 |
| WorkflowOutput | lib/galaxy/model/__init__.py | 9165-9198 |
| Pydantic Schemas | lib/galaxy/schema/schema.py | 2410-2568 |
| Workflow Schemas | lib/galaxy/schema/workflows.py | 109-298 |
| App Wiring | lib/galaxy/app.py | 637-638 |
| Controller Mixin | lib/galaxy/webapps/base/controller.py | 582+ |
Architectural Notes
-
No FastAPI for imports yet. The
create()endpoint uses the WSGI@expose_apidecorator, not FastAPI routes. The controller inherits fromBaseGalaxyAPIController(WSGI-based). FastAPI migration for this endpoint has not happened. -
Service layer bypass. Import logic lives in the controller and manager, skipping the service layer. The
WorkflowsServiceclass handles invocation and refactoring but not import. -
Two managers, one concern.
WorkflowsManagerhandles StoredWorkflow CRUD (access control, sharing, listing).WorkflowContentsManagerhandles the content — building, updating, exporting workflow internals. Both are registered as app singletons. -
Module system as strategy pattern. The
WorkflowModuleFactory+ per-typeWorkflowModulesubclasses isolate step-type-specific logic from the general construction flow. Adding a new step type means adding a module class and registering it inmodule_types. -
Recursive subworkflows. Subworkflow import is recursive —
__build_embedded_subworkflow()calls back tobuild_workflow_from_raw_description()withis_subworkflow=True, hidden=True. This creates separateStoredWorkflow+Workflowrecords for each embedded subworkflow. -
Format normalization as gateway. All format diversity is collapsed to Galaxy native JSON before the construction pipeline. The
gxformat2library handles Format2→native conversion. CWL$graphdocuments are resolved to their target object. After normalization, the rest of the stack works with a single format. -
Missing tools are warnings, not errors. Workflows can be imported even when referenced tools aren’t installed locally. Missing tools are tracked as tuples
(tool_id, name, version, step_id)and optionally trigger Tool Shed installation ifimport_tools=True(admin-only).