Workflow Extraction - Known Issues
Summary of open GitHub issues related to workflow extraction limitations in Galaxy.
UI/UX Modernization
#17506 - Convert workflow extraction interface to Vue
Status: Open | Labels: enhancement, UI-UX, refactoring, backend
The build_from_current_history Mako template is the last non-data display Mako in Galaxy. Needs conversion to FastAPI + Vue.
Discussion points:
- Option A: Keep selection UI, convert to Vue
- Option B: Extract full workflow automatically, let users edit in workflow editor
- Concern about large histories generating “a lot of crap” without selection
- jmchilton: “extracting workflows from histories is pretty much the core idea of Galaxy”
Copied Dataset Problems
#9161 - Extracting workflow from history with copied datasets breaks
Status: Open | Labels: bug, workflows
When datasets are copied from other histories:
- All connections are broken
- Includes tools from original history that weren’t run in target history
- Copied datasets not treated as inputs (should be like uploads)
Workaround: Download and re-upload datasets instead of copying.
#21336 - Extract workflow from history misses connections
Status: Open (Nov 2025)
- Extracted workflow has 5 tools without any inputs
- May be related to job cache usage
- Reported on usegalaxy.eu (v25.0)
#13823 - Workflow extraction fails (in specific identified cases)
Status: Open | Labels: bug
Fails when:
- Tool has multiple collection outputs (e.g., FastQC)
- One output collection is copied to new history
- Another tool runs on that copied collection
- Extraction produces empty workflow
Collection Extraction Problems
#21788 - Workflow extraction with empty collection produces empty workflow
Status: Open (Feb 2026)
Follow-up to #18484 (which fixed server crash but not underlying extraction).
- Empty collection filtering causes no jobs to run for downstream tools
- Job-based extraction finds nothing to trace
- Extracted workflow contains 0 steps instead of expected structure
Test: test_empty_collection_map_over_extract_workflow
#21789 - Workflow extraction fails to connect tools for dynamically-created nested collections
Status: Open (Feb 2026)
Related to #21722 (ID-based extraction for cross-history). This is same-history HID-reliance problem.
- Tool creates dynamic nested collection (no inputs)
- Downstream tool reduces/operates on it
- Extraction creates spurious input collection step
- Downstream tool misconnected to spurious input instead of upstream tool
Test: test_subcollection_reduction
Tool-Specific Issues
#12590 - Expression tool disconnected when extracting workflow
Status: Open | Labels: bug, workflows, paper-cut
- Multiple root datasets cause label mix-ups
- “Compose text parameter value” left disconnected
- Input dataset labels assigned incorrectly
#14541 - Extract Workflow does not include Extract Dataset Tool
Status: Open | Labels: bug
- Extract Dataset tool missing from extracted workflows
- Creates duplicate input files instead of preserving tool chain
#12236 - Extracting workflows with Unzip Collection for Copied Collections is Broken
Status: Open | Labels: bug, workflows, dataset-collections
Crashes
#14423 - build_from_current_history crashed with key error
Status: Open | Labels: bug, workflows
AttributeError: 'dict' object has no attribute 'hid'
- Crash in
extract.py:416within__cleanup_param_values - Occurs with collections and deferred datasets (DL)
- Expected HDA object, received dict with
{'values': [{'src': 'hda', 'id': '...'}]}
#5189 - workflow extraction failed
Status: Open (2018) | Labels: bug
#6126 - cannot extract workflow with history that contains ‘quast’ tool
Status: Open | Labels: bug, workflows
Feature Requests
#6714 - Option of applying dataset annotations to extracted workflow
Status: Open | Labels: feature-request
#7003 - Feature: Extract workflow also to include queued jobs
Status: Open | Labels: feature-request, workflows
Include jobs that are queued but not yet completed in extraction.
#17194 - Set a label for checkboxes in workflow extraction view
Status: Open | Labels: UI-UX, feature-request, paper-cut, accessibility
Related Closed Issues
- #18484 - Workflow extraction from history with empty collection will fail (Partially fixed - server crash resolved, but extraction still produces 0 steps; see #21788)
- #11059 - Workflow extraction fails for list:paired inputs copied from another history (Fixed)
- #19524 - Extract workflow from purged history does not work (Fixed)
Common Patterns
- Copied datasets are a major source of extraction failures
- Multi-output tools with collections cause issues
- Expression tools and parameter-based tools often disconnect
- Deferred/DL datasets can cause crashes
- Legacy Mako UI limits modern UX improvements
- Empty collections cause no jobs to run, breaking job-based extraction
- Dynamic collections with HID-based tracing create spurious inputs
Analysis: Relationship to Fundamental Model Limitations
The current extraction architecture relies on tracing HDAs/HDCAs back to their creating jobs. When this tracing fails (no job exists, job in wrong history, etc.), extraction breaks. See WORKFLOW_EXTRACTION_LIMITATIONS.md for detailed evidence.
Issues Likely Caused by Model Limitations
| Issue | Problem | Why Job-Based Extraction Fails |
|---|---|---|
| #9161 | Copied datasets break extraction | creating_job_associations points to job in original history, causing wrong HIDs and foreign jobs pulled in |
| #21336 | Missing connections | Job cache causes associations to point to cached jobs in different history contexts |
| #13823 | Multi-output copied collection fails | Partial copy of collection outputs breaks job tracing chain |
| #14541 | Extract Dataset tool missing | Association chain incomplete for __EXTRACT_DATASET__ tool |
| #7003 | Queued jobs not included | JobParameter records may not be available until job completes |
| #21788 | Empty collection produces empty workflow | No jobs run for empty collections, nothing to trace |
| #21789 | Dynamic nested collection misconnected | HID-based tracing fails for dynamically-created collections |
Issues Possibly Related to Model Limitations
| Issue | Problem | Possible Model Relationship |
|---|---|---|
| #12590 | Expression tool disconnected | Parameter-based inputs handled differently than dataset inputs in job associations |
| #12236 | Unzip Collection with copied collections | Variant of copied collection problem |
| #14423 | Key error crash with deferred datasets | Object reconstruction failure when tracing job parameters |
Issues Unrelated to Model Limitations
| Issue | Problem | Actual Cause |
|---|---|---|
| #17506 | Convert to Vue | UI framework modernization |
| #17194 | Checkbox labels | UI accessibility |
| #6714 | Dataset annotations | Metadata feature request |
| #5189 | Extraction failed (2018) | Insufficient information |
| #6126 | QUAST tool fails | Tool-specific compatibility |
Key Insight: The Copied Dataset Problem
The copied dataset problem is the most prevalent root cause, affecting 3+ open issues (#9161, #13823, #12236, possibly #21336).
Why it happens:
- User copies dataset/collection from History A to History B
- The copied item’s
creating_job_associationsstill points to the job in History A - When extracting from History B:
- Extraction follows the association to History A’s job
- That job’s inputs/outputs reference HIDs in History A
- HID mismatches cause broken connections
- Jobs from History A incorrectly appear in extracted workflow
Current workaround: Download and re-upload datasets instead of copying.
Proposed Solution: ToolRequest-Based Extraction
The ToolRequest model (already exists in codebase) would fix all “Likely” issues because:
- Created at request time - before job cache lookup, before jobs queue
- Captures current history context - not dependent on where data originated
- Stores original request dict - parameters available regardless of job state
- Links to output collections - via
ToolRequestImplicitCollectionAssociation
Existing TODO comments confirm this is the intended solution:
extract.py:323: “TODO track this via tool request model”test_workflow_extraction.py:235: “TODO: after adding request models we should be able to recover implicit collection job requests” (relates to #21788)test_workflow_extraction.py:305: “TODO: refactor workflow extraction to not rely on HID” (relates to #21789)