Post Job Actions (PJA) - Component Architecture & Design
Overview
Post Job Actions (PJA) are a core Galaxy component that enable automatic transformations and operations to be performed on job outputs after job execution or workflow step completion. PJAs provide a declarative, extensible mechanism for post-processing datasets and managing workflow intermediate outputs without requiring explicit tool definitions.
Originally designed for workflows, PJA support has been extended to standalone jobs in recent versions.
Architecture
Core Model Layer
Location: lib/galaxy/model/__init__.py
PostJobAction
The primary data model for defining an action to be performed on outputs:
class PostJobAction(Base, RepresentById):
__tablename__ = "post_job_action"
id: Mapped[int]
workflow_step_id: Mapped[Optional[int]] # FK to WorkflowStep
action_type: Mapped[str]
output_name: Mapped[Optional[str]]
_action_arguments: Mapped[Optional[dict[str, Any]]]
Key properties:
action_type: String identifier for the action class (e.g., “RenameDatasetAction”)output_name: Target output name, empty string = all outputsaction_arguments: Dict of action-specific parameters (e.g.,{"newname": "result.txt"})
PostJobActionAssociation
Join table linking PJAs to jobs when executed:
class PostJobActionAssociation(Base, RepresentById):
__tablename__ = "post_job_action_association"
id: Mapped[int]
job_id: Mapped[int] # FK to Job
post_job_action_id: Mapped[int] # FK to PostJobAction
This table exists because:
- PJAs defined on workflow steps are reused across all invocations
- Some PJAs are created/added dynamically at job creation time
- Allows tracking which PJAs were actually executed for a job
Job Integration
The Job model maintains relationships with PJAs:
class Job:
post_job_actions: Mapped[list["PostJobActionAssociation"]]
Methods:
add_post_job_action(pja)- Create PostJobActionAssociationget_post_job_actions()- Retrieve associated PJAsset_post_job_actions(post_job_actions)- Bulk set
Execution Layer
Location: lib/galaxy/job_execution/actions/post.py
ActionBox Registry
Central dispatcher for all PJA execution. Only registered actions can be executed:
class ActionBox:
actions: dict[str, type[DefaultJobAction]] = {
"RenameDatasetAction": RenameDatasetAction,
"HideDatasetAction": HideDatasetAction,
"ChangeDatatypeAction": ChangeDatatypeAction,
"ColumnSetAction": ColumnSetAction,
"EmailAction": EmailAction,
"DeleteIntermediatesAction": DeleteIntermediatesAction,
"TagDatasetAction": TagDatasetAction,
"RemoveTagDatasetAction": RemoveTagDatasetAction,
}
public_actions: list[str] # Explicitly exposed to users
immediate_actions: list[str] # Applied during workflow scheduling
mapped_over_output_actions: list[str] # Applied to collection outputs
Important: Actions defined as classes but NOT in the actions registry cannot be executed:
ValidateOutputsAction- class exists but not registeredSetMetadataAction- class exists but not registeredDeleteDatasetAction- class exists but not registered (disabled due to breaking downstream dependencies)
Default Job Action Base Class
class DefaultJobAction:
@classmethod
def execute(cls, app, sa_session, action, job,
replacement_dict=None, final_job_state=None):
"""Execute on standalone jobs or post-completion"""
pass
@classmethod
def execute_on_mapped_over(cls, trans, sa_session, action,
step_inputs, step_outputs,
replacement_dict, final_job_state=None):
"""Execute during workflow step processing with mapped inputs"""
pass
@classmethod
def get_short_str(cls, pja) -> str:
"""Human-readable description for UI"""
pass
Built-in Action Types
Note: Only 8 of the 11 action classes defined in the codebase are registered in ActionBox.actions and can be executed. See “Unregistered Actions” section below.
1. RenameDatasetAction
- Purpose: Rename output datasets
- Timing: Immediate (during workflow scheduling) + separate mapped_over handling for collections
- Immediate Execution: Called via
job_callbackafter outputs recorded but before job queued - Mapped Collections: Executed separately via
execute_on_mapped_over()when workflow step processes collections - Parameters:
newname- supports template syntax - Template Features:
#{variable_name}- substitute input name#{variable|basename}- filename without extension#{variable|upper}- uppercase#{variable|lower}- lowercase${replacement_key}- runtime replacement parameters
- Scope: Can target specific outputs or apply to all (empty
output_name)
2. ChangeDatatypeAction
- Purpose: Convert output datatype post-execution
- Timing: Immediate (ahead of job) + job completion for collections
- Parameters:
newtype- target Galaxy datatype - Special Behavior: For dynamic collections, creates PostJobActionAssociation for later execution
- Constraints: Skipped if job state is SKIPPED
3. HideDatasetAction
- Purpose: Hide outputs from history visibility
- Timing: Job completion only
- Mapped Collections: Supported via
execute_on_mapped_over() - Execution: Skipped on job ERROR state
- Scope: Targets specific output or all outputs
- Note: Registered in
ActionBox.actionsbut NOT inpublic_actionslist (internal use)
4. ColumnSetAction
- Purpose: Set tabular metadata (BED file column assignments)
- Timing: Job completion only
- Parameters:
chromCol,startCol,endCol,strandCol,nameCol - Conversion: Automatically converts “cX” format to integer
5. EmailAction
- Purpose: Notify user of job completion via email
- Timing: Always executes (even on failure)
- Content: Includes dataset names, history link, workflow invocation link (if applicable)
- Parameters:
host(optional) - server to link from - Error Handling: Gracefully fails with logging, doesn’t block job completion
6. DeleteIntermediatesAction
- Purpose: Clean up intermediate datasets created during workflow execution
- Timing: Job completion only (after entire workflow)
- Conditions:
- Only applies to workflow invocations with output definitions
- Targets non-output steps that are not marked as outputs
- Skips deletion if dependent jobs are in non-terminal states
- Aborts if workflow invocation still active
- Complexity: Extensive safety checks to avoid deleting needed intermediates
- Performance: No optimization yet - full scan approach
7. TagDatasetAction / RemoveTagDatasetAction
- Purpose: Add or remove tags from outputs
- Timing: Immediate (during workflow) + job completion
- Mapped Collections: Supported via
execute_on_mapped_over() - Parameters:
tags- comma-separated tag list - Tag Formats:
#name:value- name:value tagsregular_tag- untyped tags
- Supports: Both datasets and dataset collections
Unregistered Actions
The following action classes are defined in the codebase but not registered in ActionBox.actions and therefore cannot be executed:
DeleteDatasetAction
- Purpose: Mark datasets as deleted
- Status: NOT REGISTERED (class exists but disabled)
- Reason: Disabled because deleting datasets in the middle of a workflow causes errors for subsequent steps that depend on that data
- Code Location:
lib/galaxy/job_execution/actions/post.pylines 304-326 - Note: To use dataset deletion, must be done at workflow end or after all dependent steps complete
SetMetadataAction
- Purpose: Apply custom metadata to outputs
- Status: NOT REGISTERED (class exists but broken/incomplete)
- Code Comment: “DBTODO Setting of Metadata is currently broken and disabled. It should not be used (yet).”
- Code Location:
lib/galaxy/job_execution/actions/post.pylines 351-359 - Note: Full implementation required before registration
ValidateOutputsAction
- Purpose: Validate produced outputs against expected datatype
- Status: NOT REGISTERED (class exists but incomplete)
- Implementation Note: “no-op: needs to inject metadata handling parameters ahead of time.”
- Code Location:
lib/galaxy/job_execution/actions/post.pylines 88-104 - Note: Metadata validation infrastructure needed before this can be activated
Action Registration Categories
ActionBox maintains four lists for organizing actions:
-
actions: The active registry - only actions here can be executed
- 8 registered actions (see Built-in Action Types above)
- 3 unregistered but defined (see Unregistered Actions above)
-
public_actions: Actions exposed to workflow designers via UI
- 7 actions (all except HideDatasetAction, which is internal)
-
immediate_actions: Actions executed during workflow scheduling (before job runs)
ChangeDatatypeAction,RenameDatasetAction,TagDatasetAction,RemoveTagDatasetAction- Execute after outputs recorded but before job queued
- Can affect job setup/metadata
-
mapped_over_output_actions: Actions supporting collection mapping
RenameDatasetAction,HideDatasetAction,TagDatasetAction,RemoveTagDatasetAction- Called via
execute_on_mapped_over()when workflow steps have mapped inputs - Operate on implicit collection outputs
Workflow Integration
Location: lib/galaxy/workflow/modules.py
Step-Level PJA Management
class ToolModule:
def get_post_job_actions(self, incoming):
"""Parse incoming workflow form data for PJA definitions"""
# Translates form input like "pja__output__ActionType__param"
# into structured PostJobAction objects
Execution During Workflow
# Immediate actions execute during workflow scheduling
if pja.action_type in ActionBox.immediate_actions:
ActionBox.execute(trans.app, trans.sa_session, pja, current_job,
replacement_dict)
# Mapped-over actions execute separately for collection steps
if pja.action_type in ActionBox.mapped_over_output_actions:
ActionBox.execute_on_mapped_over(trans, trans.sa_session, pja,
step_inputs, step_outputs,
replacement_dict)
# Non-immediate actions stored as PostJobActionAssociation
# and executed on job completion
Mapped Over Collection Handling
When workflow steps process collections (mapped over):
execute_on_mapped_over()is called with step inputs/outputs- Different execution path than immediate actions
- Allows actions to operate on collection semantics
- Supports per-element operations on implicit collections
Job Execution & Completion
Location: lib/galaxy/jobs/__init__.py
Execution Flow
-
Job Startup (setup phase)
- Immediate PJAs are executed during job creation/scheduling
- Allows setting datatypes/names before job runs
-
Job Success Path
for pja in job.post_job_actions: if pja.post_job_action.action_type not in ActionBox.immediate_actions: ActionBox.execute(self.app, self.sa_session, pja.post_job_action, job, final_job_state=final_job_state) -
Job Failure Path
- Only EmailAction is executed (user notification)
- Other PJAs are skipped
- Error state prevents HideDatasetAction execution
-
Parameters Passed to Actions
app- Galaxy application instancesa_session- SQLAlchemy session for persistenceaction- PostJobAction objectjob- Job instance (with access to outputs, inputs, user, history)replacement_dict- Runtime workflow parameters for templatingfinal_job_state- Job terminal state (ok, error, deleted, etc.)
Frontend Integration
Location: client/src/
Type Definitions
File: stores/workflowStepStore.ts
export interface PostJobAction {
action_type: string;
output_name: string;
action_arguments: {
[index: string]: string;
};
}
export interface PostJobActions {
[index: string]: PostJobAction;
}
Storage Format: PostJobActions is a dictionary indexed by concatenated action_type + output_name
Examples:
"ChangeDatatypeAction__out_file1"= ChangeDatatypeAction targeting “out_file1”"RenameDatasetAction__out_file1"= RenameDatasetAction targeting “out_file1”"TagDatasetAction__"= TagDatasetAction targeting all outputs (empty output_name)
Each key maps to a PostJobAction object with the action details.
UI Components
File: components/Workflow/Editor/Forms/FormTool.vue
- Hosts FormSection component for PJA editing
- Receives and emits PJA updates
File: components/Workflow/Editor/Forms/FormOutput.vue
- Per-output PJA configuration interface
- Displays changeable actions:
- Rename dataset
- Change datatype
- Add/remove tags
- Assign columns (tabular metadata)
- Handles form field generation from action definitions
File: components/Workflow/Editor/Forms/FormSection.vue
- Parent container for FormOutput components
- Manages collection of outputs and their PJAs
- Coordinates state updates with parent FormTool
Form Data Handling
Actions are serialized to form fields with naming convention:
- Format:
pja__[output_name]__[action_type]__[parameter] - Example:
pja__out_file__RenameDatasetAction__newname
ActionBox.handle_incoming() parses HTTP form data back into structured objects:
{
"action_type": "RenameDatasetAction",
"output_name": "out_file",
"action_arguments": {"newname": "result.txt"}
}
API & Serialization
Location: lib/galaxy/managers/workflows.py
When returning workflow step details:
step_model["post_job_actions"] = [
{
"short_str": ActionBox.get_short_str(pja),
"action_type": pja.action_type,
"output_name": pja.output_name,
"action_arguments": pja.action_arguments,
}
for pja in step.post_job_actions
]
Key design: Always includes short_str for UI display without further processing.
Execution Timeline
Workflow Invocation Scenario
1. Workflow Scheduling Phase
- Workflow steps are initialized
- Immediate PJAs execute (ChangeDatatypeAction, RenameDatasetAction, etc.)
- Jobs are created and enqueued
2. Job Execution Phase
- Tool runs
- Outputs created
3. Job Completion Phase
- Non-immediate PJAs execute
- Post-job action associations processed
- cleanup scheduled
4. Workflow Completion
- DeleteIntermediatesAction evaluates safety
- Intermediate files cleaned up
Standalone Job Scenario
1. Job Creation
- PJAs can be attached via API/UI
- Immediate actions executed
2. Job Execution
- Tool runs
3. Completion
- All non-immediate PJAs execute
Key Design Patterns
1. Polymorphic Action Dispatch
- Base class
DefaultJobActiondefines interface - ActionBox registry holds mapping of action types to classes
execute()andexecute_on_mapped_over()are optional (implement only needed ones)
2. Output Targeting
- Empty
output_name= apply to all outputs - Specific name = targeted to one output
- Supports both datasets and dataset collections
3. Template Substitution
- Input-based:
#{variable_name}with operators (basename, upper, lower) - Parameter-based:
${parameter_key}for runtime workflow parameters - Allows dynamic naming based on inputs/workflow context
4. Dual Execution Paths for Scheduling-Time Actions
-
Immediate (regular): Called via
job_callbackafter outputs recorded for non-mapped steps- Executes:
execute(app, sa_session, action, job, replacement_dict)
- Executes:
-
Immediate (mapped-over): Called separately for steps with mapped inputs
- Executes:
execute_on_mapped_over(trans, sa_session, action, step_inputs, step_outputs, replacement_dict) - Operates on collection semantics, not individual datasets
- Executes:
-
Deferred (completion): Called after job finishes
- Executes:
execute(app, sa_session, action, job, replacement_dict, final_job_state) - All non-immediate actions follow this path
- Has access to final job state (error, ok, etc.)
- Executes:
5. Error Resilience
- EmailAction always executes (even on job failure)
- Most other actions skip on ERROR job state
- HideDatasetAction explicitly skips on ERROR state
- Graceful degradation (log errors, don’t propagate to break job completion)
6. Workflow-Aware Operations
- DeleteIntermediatesAction understands workflow DAG structure
- Can access workflow_invocation context via
job.workflow_invocation_step - Extensive safety checks prevent deletion of required intermediate data
- Aborts if workflow still actively scheduling
Performance Considerations
Current Limitations
-
DeleteIntermediatesAction
- Full table scan approach
- Expensive for large workflows
- No optimization for identifying candidates
-
Execution Timing
- Actions executed synchronously in job completion handler
- Email actions can block if mail service is slow
- No parallelization
-
Immediate Actions
- Run during workflow scheduling
- Can delay workflow startup for many steps
- No batch optimization
Async Challenges
- DeleteIntermediatesAction logs “PJA Async Issues” when dataset/job relationships incomplete
- Happens under concurrent job completion conditions
- Refresh operations used as workaround
Extension Points
Adding Custom Actions
- Create class extending DefaultJobAction
- Implement
execute()and/orexecute_on_mapped_over() - Implement
get_short_str()for UI - Register in ActionBox.actions dictionary
- Add to appropriate lists (public_actions, immediate_actions, mapped_over_output_actions)
Requirements
- Idempotent execution (may retry on failure)
- Handle missing/skipped outputs gracefully
- Access outputs via job model relationships
- Use provided sa_session for persistence
Integration Points
- Frontend: Add UI component to FormOutput.vue
- Serialization: ActionBox.handle_incoming() handles parsing
- Execution: ActionBox.execute() dispatches
- API: Managers serialize to JSON via get_short_str()
Recent Changes
Standalone Job Support
- PostJobActionAssociation now linkable directly to jobs (not just via workflow steps)
- Allows non-workflow tools to use PJAs
- Same execution paths as workflow-based PJAs
Validation Improvements
- ChangeDatatypeAction now validates datatype availability
- ValidateOutputsAction available but not auto-executed
Known Issues & Limitations
- SetMetadataAction & ValidateOutputsAction: Currently disabled/broken
- DeleteDatasetAction: Disabled due to breaking downstream dependencies
- DeleteIntermediatesAction: Performance issues on large workflows
- Tag Operations: Flush=False used, requires outer commit
- Dynamic Collections: ChangeDatatypeAction deferred for dynamic outputs
- Async Races: Async dataset relationship issues in DeleteIntermediatesAction
Database Schema
-- Post job action definitions (workflow-step-level)
CREATE TABLE post_job_action (
id INTEGER PRIMARY KEY,
workflow_step_id INTEGER REFERENCES workflow_step(id),
action_type VARCHAR(255),
output_name VARCHAR(255),
action_arguments JSON -- MutableJSONType (database-agnostic)
);
-- Association of PJAs to actual jobs (for tracking execution)
CREATE TABLE post_job_action_association (
id INTEGER PRIMARY KEY,
job_id INTEGER REFERENCES job(id),
post_job_action_id INTEGER REFERENCES post_job_action(id)
);
Note: action_arguments uses SQLAlchemy’s MutableJSONType which provides change tracking and automatic persistence for nested dictionaries. This allows safe mutation of action parameters.
Related Components
- Workflow Modules (
lib/galaxy/workflow/modules.py) - PJA parsing & execution context - Job Handlers (
lib/galaxy/jobs/__init__.py) - Completion flow - Tool Actions (
lib/galaxy/tools/actions/__init__.py) - Job creation hooks - Workflow Managers (
lib/galaxy/managers/workflows.py) - API layer - Workflow Editor Frontend (
client/src/components/Workflow/Editor/) - UI layer
Summary
Post Job Actions provide a declarative, extensible framework for automating post-execution transformations on Galaxy job outputs. The design separates action definitions (database models) from execution logic (ActionBox registry), enabling clean extension while maintaining compatibility with both workflow-based and standalone job execution models.
The two-phase execution model (immediate during scheduling, deferred on completion) balances workflow optimization with access to final job state. While powerful, the current implementation has performance concerns and some disabled actions that would benefit from refactoring in high-throughput scenarios.