UC2 debrief — TAL1 peaks to candidate regulated genes (issue #13)
Built interactively via the notebooks MCP against the local Galaxy (worktree history_pages), 2026-06-13. Setup story in SETUP_DEBRIEF.md, operational facts in index.md. Sibling: UC1_DEBRIEF.md.
Artifacts produced
| Thing | ID / location |
|---|---|
| History | TAL1 peaks to candidate regulated genes — f30a35c999095ed7 |
| Notebook page | f0f309c56aff0025 (history-attached) |
| G1E TAL1 narrowPeak (261) | e516d7c43b2ce824 |
| Megakaryocyte TAL1 narrowPeak, pooled R1+R2 (150) | 7f09d52a860db821 |
| common / G1E-only / mega-only peaks | 5bb18c11f5b70a41 / 38d70b27d179c236 / 603e7db97773b4d1 |
| promoter windows (TSS −1000/+500) | b735ed9e5e005602 |
| distance-to-TSS figure (PNG) | d33e32db742aed56 |
| bowtie2 + mm10 index installed | devteam/bowtie2; database/bowtie2_index/mm10 (prebuilt, registered in bowtie2_indices.loc) |
Data-source decision
The tutorial’s TAL1 narrowPeak files are tutorial outputs, not hosted inputs (Zenodo 197100 has only FASTQ + the RefSeq/ROI BEDs). GSE51338 hosts real TAL1 peaks but in mm9 broadPeak (build mismatch with the mm10 RefSeq). User chose mm10 + regenerate peaks: align the tutorial FASTQ to mm10 and call peaks ourselves. (Rejected: mm9+ENCODE peaks [build switch], liftover, synthetic.)
What was done
G1E (erythroid) and megakaryocyte TAL1 ChIP, 36 bp single-end; a single replicate (R1) for G1E, pooled R1+R2 for megakaryocyte:
FASTQ ─► Bowtie2 (mm10 prebuilt index) ─► BAM ─► MACS2 callpeak (TAL1 vs input, gsize mouse) ─► narrowPeak
RefSeq mm10 BED12 ─► awk TSS±window (promoters) + awk TSS points
bedtools intersect (G1E vs mega) ─► common / G1E-only / mega-only
├─ ∩ promoters (-u) ─► promoter-bound candidate genes
└─ closest -d vs TSS ─► nearest gene + distance
No read trimming (36 bp, good quality — documented simplification).
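The promoter and TSS steps in the pipeline above can be sketched in Python. This is a minimal sketch, not the awk that was actually run: the function names, the 0-based half-open BED convention, and the exact off-by-one handling on the minus strand are assumptions. It shows only the strand-aware idea behind the TSS −1000/+500 window: "upstream" means lower coordinates on `+` and higher coordinates on `-`.

```python
from typing import NamedTuple


class Window(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str
    strand: str


def tss_position(start: int, end: int, strand: str) -> int:
    """0-based TSS for a BED record [start, end): the 5' end of the transcript."""
    return start if strand == "+" else end - 1


def promoter_window(chrom, start, end, name, strand,
                    upstream=1000, downstream=500) -> Window:
    """Strand-aware promoter window around the TSS (assumed convention)."""
    tss = tss_position(start, end, strand)
    if strand == "+":
        lo, hi = tss - upstream, tss + downstream
    else:
        # On the minus strand, upstream lies at higher coordinates.
        lo, hi = tss - downstream + 1, tss + upstream + 1
    return Window(chrom, max(lo, 0), hi, name, strand)  # clamp at chromosome start


# Hypothetical transcripts, for illustration only.
plus_gene = promoter_window("chr1", 5000, 9000, "GeneA", "+")
minus_gene = promoter_window("chr1", 5000, 9000, "GeneB", "-")
```

Both windows are 1500 bp wide and mirror each other around their TSS, which is the property a strand-aware window has to satisfy.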
Headline result (textbook-correct)
The erythroid↔megakaryocyte lineage switch falls out of the differential TAL1 binding:
- Promoter-bound, G1E-only: Gata1 (master erythroid TF) + erythroid program (Cpox, Tfr2).
- Promoter-bound, mega-only: Fli1 (master megakaryocyte TF) + Tal1 autoregulation.
- Common: Cbfa2t3 (ETO2, a TAL1-complex member), Pf4.
- Distal nearest-gene network: Runx1 (all classes), Erg, Cd44, Nprl3 (α-globin locus, erythroid).
- Most TAL1 peaks are distal (median 12–24 kb to TSS) → enhancer-dominated, as expected.
Peak-set sizes: G1E 261, mega 150 (pooled), common 39, G1E-only 222, mega-only 110.
Snags / findings
- Megakaryocyte single-replicate gave only 17 peaks. Cause: deep TAL1 (4.4M reads) vs shallow input (750k) → MACS2 scales the larger library down to the smaller, killing sensitivity. Adding Tal1 R2 alone would NOT fix it (worsens the imbalance); the fix is deeper control. Pooled R1+R2 for both treatment and control → 150 peaks. (The 17 were all real, at Gata1/Gata2/Runx1/Cbfa2t3.)
- mm10 bowtie2 index setup (Bakta-style infrastructure): downloaded the genome-idx prebuilt mm10 index (3.4 GB zip → 6 .bt2 files), registered in tool-data/shed/bowtie2_indices.loc (cols value, dbkey, name, path; path = index basename), restarted. The shed install of bowtie2 auto-registered the bowtie2_indexes data table. Prebuilt index avoids a slow emulated bowtie2-build.
- The MCP upload bug (UC1) and run_tool-no-map-over gap (UC1) still apply; UC2 used run_tool for flat/conditional params (worked) and didn’t need map-over.
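The depth imbalance behind the 17-peak megakaryocyte run can be put into rough numbers. The sketch below is back-of-envelope arithmetic only, assuming the default behaviour described above (the larger library is scaled toward the smaller); it is not MACS2's internal code, and the function name is hypothetical. The read counts are the approximate figures quoted in the first snag.

```python
def scale_to_smaller(treatment_reads: int, control_reads: int) -> dict:
    """Rough model: the deeper library is scaled down to the shallower one."""
    smaller = min(treatment_reads, control_reads)
    larger = max(treatment_reads, control_reads)
    if treatment_reads == control_reads:
        scaled = None  # nothing to scale
    else:
        scaled = "treatment" if treatment_reads > control_reads else "control"
    return {
        "scaled_library": scaled,
        "scale_factor": smaller / larger,   # multiplier applied to the deeper library
        "effective_depth": smaller,         # both sides compared at this depth
    }


# Single-replicate megakaryocyte run: ~4.4M TAL1 reads vs ~750k input reads.
single_rep = scale_to_smaller(4_400_000, 750_000)
print(single_rep["scaled_library"], round(single_rep["scale_factor"], 2))
```

Under this model the treatment signal is weighted at roughly 0.17 of its raw depth, which is why adding more treatment reads alone cannot help: the effective depth stays pinned to the control.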
Review pass (subagent, live-verified) — corrections applied
A review subagent verified every quantitative claim against the live Galaxy data (peak counts, gene lists, promoter-window logic, dedup), checked MACS2 params, and fact-checked the biology. Verdict: clears the issue #13 MVP bar; all six peak counts, all three promoter-bound gene lists, the strand-aware window logic, and the distance bins matched the data exactly. Fixes applied to the notebook (rev 5114a2a207b7caff):
- Biology framing (the important one): G1E is a GATA1-null erythroid line, not an “erythroid progenitor.” The earlier “Gata1 (master erythroid TF) bound only in G1E” implied GATA1 is active in G1E — it is not. Reframed to “TAL1 occupies the Gata1 locus only in G1E,” with an explicit note that this is differential TAL1 binding at the Gata1 gene, not GATA1 activity. The lineage-contrast headline survives (it rests on differential binding).
- Stale medians: notebook said median dist-to-TSS 12/15/24 kb (computed over tie-inflated closest lines); corrected to 14/15.5/20 kb (deduped per peak; reconfirmed live).
- Intersect-count asymmetry: common=39 is the G1E-side count (reciprocal is 40 mega-side); added a footnote so 39+222=261 and 40+110=150 both reconcile.
- Genes in both lists: Pdcd4/Fli1 appear promoter-bound and distal (separate peaks at one gene); clarified.
- Minor: Cbfa2t3 described as ETO2/MTG16 corepressor of the TAL1 complex.
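The stale-median correction comes down to whether a peak with tied nearest genes is counted once or once per tie. The following is a small sketch with made-up rows (the peak names, genes, and distances are illustrative, not the real data), assuming rows shaped like `bedtools closest -d` output reduced to (peak, gene, distance).

```python
from statistics import median


def median_distance(rows, dedupe=True):
    """Median distance-to-TSS over (peak_id, gene, distance) rows.

    With dedupe=True each peak contributes one distance; tied rows for the
    same peak carry the same distance, so keeping the first is enough.
    """
    if dedupe:
        per_peak = {}
        for peak_id, _gene, dist in rows:
            per_peak.setdefault(peak_id, dist)
        distances = list(per_peak.values())
    else:
        distances = [dist for _peak, _gene, dist in rows]
    return median(distances)


# Toy data: peak2 ties across three genes at the same distance.
rows = [
    ("peak1", "GeneA", 100),
    ("peak2", "GeneB", 20000),
    ("peak2", "GeneC", 20000),
    ("peak2", "GeneD", 20000),
    ("peak3", "GeneE", 5000),
]

inflated = median_distance(rows, dedupe=False)
deduped = median_distance(rows, dedupe=True)
```

In this toy case the tied peak drags the median from 5000 up to 20000, the same kind of distortion the review caught in the notebook.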
Completeness gaps the review flagged (beyond the open list below): the issue’s bar chart of candidate counts by class and a consolidated peak-set summary table (peaks / promoter-overlapping / nearest-assigned per set) aren’t present — minor, would strengthen paper-worthiness.
Still-open gaps vs issue #13
- Expression stretch not done — no RNA-seq cross-reference (the issue’s stretch; needs verified GSE51338 processed tables).
- No JBrowse locus view (Gata1 vs Fli1).
- Distance-to-TSS “distribution” rendered as a binned count-heatmap (no histogram tool installed).
- Workflow not extracted — pipeline is linear/clean but not yet run through PR #22860 extraction; note the G1E (R1) vs mega (pooled) asymmetry would need harmonizing for a single clean workflow.
Extractability — original NOT extractable; extractable rebuild done (a912e9e5d84530d4)
A full extractable rebuild was done in a fresh history TAL1 peaks to candidate genes (extractable) (a912e9e5d84530d4), notebook 72ad249754f05d26. It reproduces the original science exactly (G1E 261 / mega 150 peaks; common 39 / G1E-only 222 / mega-only 110; same promoter-bound genes incl. Gata1 G1E-only, Fli1 mega-only) but as a clean tool/collection DAG:
- 6 FASTQ fetched into ONE list collection (TAL1 ChIP reads, cbbbf59e8f08c98c) → one map-over Bowtie2 step → BAM collection (964b37715ec9bd22). Replaces the original’s 6 independent uploads + 6 alignment jobs.
- MACS2 ×2 (G1E; mega R1+R2 pooled) pulling treatment/control BAMs from the collection.
- bedtools intersect (classes + promoter overlap), awk promoters/TSS, SortBED, closest: all tools.
- Candidate gene lists via the Group tool (group by gene column), not bash, so the gene set is a reproducible dataset (b489799d…/981bfb3a…/cc428750…).
- Op note: Docker has only 7.75 GB, so 4-wide Bowtie2 (3.4 GB index each) would OOM; set the local runner to workers: 2 in config/galaxy.yml (left at 2; safer for heavy emulated jobs).
- Distance-to-TSS figure now also extractable. Installed iuc/datamash_ops and rebuilt the figure entirely with tools: bedtools closest (ties=first) → awk bins → Datamash count-per-bin (×3 classes) → Multi-Join on the bin column (filler 0) → cat a header-label row → ggplot2_heatmap2 (PNG cb1423dc5924128e). Counts identical to the original (common 4/11/22/2, G1E-only 18/66/97/41, mega-only 5/30/45/30). Only non-tool input is a tiny header-label constant. So the figure tail extracts cleanly, but see the extraction-test section below: the alignment→MACS2 seam does NOT wire (element-addressing), so the workflow isn’t fully runnable until the Extract Dataset bridge is added.
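A quick consistency check on the figure counts quoted above: with ties=first there is one closest row per peak, so each class's four bin counts should sum to that class's peak count. This is a sketch of the arithmetic only; the dictionary and function names are hypothetical, and the numbers are copied from this debrief.

```python
# Per-class bin counts from the rebuilt distance-to-TSS figure.
bin_counts = {
    "common": [4, 11, 22, 2],
    "G1E-only": [18, 66, 97, 41],
    "mega-only": [5, 30, 45, 30],
}

# Peak-set sizes reported for the three classes.
class_sizes = {"common": 39, "G1E-only": 222, "mega-only": 110}


def bin_mismatches(bins, sizes):
    """Return {class: (binned_total, expected)} for classes that do not reconcile."""
    return {
        name: (sum(counts), sizes[name])
        for name, counts in bins.items()
        if sum(counts) != sizes[name]
    }


mismatches = bin_mismatches(bin_counts, class_sizes)
print(mismatches)  # an empty dict means every peak landed in exactly one bin
```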
This history is the one to run extraction against. The original f30a35c999095ed7 remains as the first-pass reference. Original redo instructions retained below for reference.
Extraction test — run 2026-06-13 (feature WORKS; map-then-reduce-by-element seam breaks)
Booted local Galaxy and ran extraction against a912e9e5d84530d4 both ways — the worktree first had only PR #22706 (history-based extraction); jmchilton then merged extract_next/#22860 mid-session, so the page-based flow was re-tested properly.
Run B (the real feature) — page-based extract_next/#22860
GET /api/pages/72ad249754f05d26/workflow_extraction_summary → POST /api/workflows/extract {…, from_page_id}. Workflow f597429621d6eb2b, .ga at /tmp/uc2_page_workflow.ga.
What works (the headline — the feature is sound):
- Smart subgraph seeding. The page summary seeds only the subgraph behind the notebook’s displayed outputs (31/36 rows) and pre-exposes exactly the 3 outputs the notebook shows (2 Group gene-lists + the heatmap figure) — not the whole history. 0 summary warnings.
- Workflow outputs set from the notebook. The extracted workflow has the 3 displayed outputs marked as workflow outputs with labels.
- Report rewrite is flawless. The notebook markdown became the workflow report (4126 chars): all 3 display directives rewritten to workflow-relative output="…" labels, zero leftover instance ids, 0 report_warnings. The analysis story travels into the workflow correctly. (/tmp/uc2_report.md.)
Two real defects this history exposes — both rooted in map-over then reduce-by-named-element:
(A) Seeding bug — the map’s input collection is dropped when its output is consumed as loose elements. → FIXED. In lib/galaxy/managers/workflow_extraction_summary.py::_backward_job_closure, map-input recovery (reading implicit_input_collections and enqueuing the input collection) fired only when the queue item was the output HDCA. But MACS2 consumed individual BAM elements (src=hda, e.g. G1E_Tal1 BAM hid 10 3ee1d7c9a966c95c). Walking back from a loose element, the code seeded the Bowtie2 ICJ but never enqueued the reads collection. Net effect: the reads collection (hid 1) was not seeded (hdca_ids: 0), so Bowtie2 extracted dangling, and a spurious hidden fastq element (G1E_Tal1 hid 3) was surfaced as a loose input.
- Fix applied: factored the recovery into _enqueue_mapped_input_collections(output_hdca, queue) and now also call it in the job loop when icj_assoc is not None, iterating the ICJ’s output_dataset_collection_instances to recover their implicit_input_collections. So a map reached via a loose element seeds its input collection just like one reached via the output HDCA.
- Tests: new red→green unit test test_loose_element_of_map_output_seeds_input_collection (verified failing without the fix); MockIcjAssoc/MockJob extended to model the real implicit_collection_jobs relationship. Full unit suite green (test_workflow_extraction_summary.py 16/16, test_extract_report.py + test_markdown_export.py 58/58).
- Confirmed end-to-end: after the fix, the page summary seeds TAL1 ChIP reads (collection) and the spurious G1E_Tal1 input is gone; re-extracted workflow 1cd8e2f6b131e891 has Bowtie2 wired to the collection input and only the 2 MACS2 steps dangling (= issue B alone). Still recommended before any PR: run the API tests TestNotebookWorkflowExtractionSummary (live-server, left to CI).
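The shape of defect A and its fix can be illustrated with a toy backward walk. This is not Galaxy's `_backward_job_closure`: the `Job` class, the `mapped_collection` field, and the dataset names are all hypothetical stand-ins, and the toy ignores the `seen_jobs` detail. It only shows the idea that a map-over job reached from a loose output element should seed its input collection rather than the per-element inputs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    inputs: list
    # Input collection this job was mapped over, if it is part of a map-over.
    mapped_collection: Optional[str] = None


def boundary_inputs(targets, producers, recover_mapped_inputs):
    """Walk back from targets; items with no producing job are workflow inputs."""
    seen, boundary = set(), set()
    queue = deque(targets)
    while queue:
        item = queue.popleft()
        if item in seen:
            continue
        seen.add(item)
        job = producers.get(item)
        if job is None:
            boundary.add(item)
        elif recover_mapped_inputs and job.mapped_collection is not None:
            queue.append(job.mapped_collection)  # seed the collection instead
        else:
            queue.extend(job.inputs)
    return boundary


# Toy graph: Bowtie2 mapped over a "reads" collection, MACS2 on loose BAM elements.
producers = {
    "G1E_Tal1.bam": Job("bowtie2", ["G1E_Tal1.fastq"], mapped_collection="reads"),
    "G1E_input.bam": Job("bowtie2", ["G1E_input.fastq"], mapped_collection="reads"),
    "peaks": Job("macs2", ["G1E_Tal1.bam", "G1E_input.bam"]),
}

before_fix = boundary_inputs(["peaks"], producers, recover_mapped_inputs=False)
after_fix = boundary_inputs(["peaks"], producers, recover_mapped_inputs=True)
```

Without recovery the walk surfaces loose FASTQ elements as inputs; with it, the single `reads` collection is the only boundary input, which matches the post-fix behaviour described above.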
(A2) Sibling elements of a shared fetch job silently dropped → resolved as a side effect of A. The seen_jobs guard meant all 6 fastq elements sharing one __DATA_FETCH__ job → only 1 of 6 surfaced as a boundary input. Now moot: with A fixed, the per-element fastqs are skipped (their input name matches the recovered mapped_input_names) and the collection is seeded instead, so no loose element surfaces at all. (The underlying seen_jobs-drops-siblings behavior remains latent for any non-map “one upload, N siblings” shape, but no longer affects this pipeline.)
(B) Fundamental topology limit — “pick named element out of a collection → single-dataset input” has no workflow-connection representation. MACS2 reads specific BAM elements (treatment vs control, pooled replicates) out of the alignment collection. Even with A fixed (collection seeded), the two MACS2 steps come out disconnected (input_treatment_file: null), because a collection→single-dataset element edge can’t be a workflow connection. Confirmed independently in Run A (classic full-history extract, workflow f2db41e1fa331b3e, /tmp/uc2_workflow.ga): there the collection was seeded and Bowtie2 wired, but MACS2 still dangled — isolating B from A. Not a bug; a real limit.
Net verdict & fix
The extraction feature itself is solid — summary, subgraph seeding, output exposure, and the notebook→report rewrite all work cleanly at this scale. What a912e9e5d84530d4 exposed in a map-over-then-reduce-by-element pipeline was one real seeding bug (A/A2 — now fixed, Bowtie2 wires) and one inherent topology limit (B — the MACS2 element-addressing seam, history-side fix only). So the earlier “the whole UC2 now extracts” claim was over-stated — the figure tail extracts and (post-fix) the alignment map extracts, but the map→MACS2 reduce-by-element seam still needs the Extract Dataset bridge below.
Redo recipe for B — structure the reads so MACS2 reduces/maps instead of element-addressing
The decisive tool fact: MACS2 callpeak’s treatment and control pooling inputs are multiple="true" data params (not <repeat>) — iuc/macs2 .../macs2_callpeak.xml, treatment conditional “Are you pooling?” → input_treatment_file multiple="true", same for control|c_multiple. multiple="true" data inputs participate in collection reduction + map-over (LIST_REDUCTION / NESTED_LIST_REDUCTION). So the pooling we needed isn’t an obstacle — it’s what a list:list reduces into.
Build the reads as two list:list inputs (outer = condition, inner = replicate):
treatment_reads : list:list { G1E:{r1:G1E_Tal1}, mega:{r1:Mega_Tal1_R1, r2:Mega_Tal1_R2} }
control_reads : list:list { G1E:{r1:G1E_input}, mega:{r1:Mega_input_R1, r2:Mega_input_R2} }
Bowtie2 map-over each → treatment_bams / control_bams (list:list, structure preserved) [NESTED_LIST_MAPPING]
MACS2 map treatment_bams over treatment(multiple) + control_bams over control(multiple),
linked on the outer (condition) key [NESTED_LIST_REDUCTION ×2, linked]
→ peaks : list { G1E:…narrowPeak, mega:…narrowPeak }
Inner list reduces (pools replicates — handles the G1E-1-rep vs mega-2-reps asymmetry for free, since multiple="true" takes 1+); outer list maps (one MACS2 job per condition); treatment/control link on the shared outer key → matched inputs. One Bowtie2 step per input, one MACS2 step, zero element-addressing — extracts clean, and a new user just supplies their own two list:list (condition→replicates). Strictly better than the Extract Dataset bridge, which hard-codes element="G1E_Tal1" and only works for these sample names.
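The intended outer-map / inner-reduce behaviour can be modelled with plain nested dicts. This is a toy model of the semantics described above, not Galaxy's collection-matching code; the function name and the `.bam` file names are hypothetical, derived from the sample names in the recipe.

```python
def map_outer_reduce_inner(treatment, control):
    """One job per outer key (condition); inner replicates are pooled per job.

    Both inputs are list:list-like dicts: {condition: {replicate: dataset}}.
    Linking requires the outer keys to match, in the same order.
    """
    if list(treatment) != list(control):
        raise ValueError("outer (condition) keys must match to link the inputs")
    return [
        {
            "condition": condition,
            "treatment": list(treatment[condition].values()),  # pooled replicates
            "control": list(control[condition].values()),
        }
        for condition in treatment
    ]


treatment_bams = {
    "G1E": {"r1": "G1E_Tal1.bam"},
    "mega": {"r1": "Mega_Tal1_R1.bam", "r2": "Mega_Tal1_R2.bam"},
}
control_bams = {
    "G1E": {"r1": "G1E_input.bam"},
    "mega": {"r1": "Mega_input_R1.bam", "r2": "Mega_input_R2.bam"},
}

jobs = map_outer_reduce_inner(treatment_bams, control_bams)
for job in jobs:
    print(job["condition"], len(job["treatment"]), len(job["control"]))
```

Two jobs result, one per condition, and the G1E single-replicate case needs no special handling because a one-element inner list pools trivially.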
One assumption to verify before banking on it: that Galaxy maps two list:list over MACS2’s two multiple="true" inputs simultaneously, linked on the outer key. Compositionally supported by the semantics rules but not pinned by a specific test — worth a one-off live prototype (build the two collections, map-align, run MACS2 once).
Why not sample sheets (the semantically perfect model — the sample-sheet backend doc literally motivates it with ChIP-seq condition/replicate/control_sample): (1) it’s an authored input, not a derivable shape — column_definitions live on the workflow input parameter and are user-filled at run time; extraction emits plain data_collection_input steps and has no path to reconstruct the schema. (2) No tools consume the metadata yet — bowtie2/MACS2 see a sample sheet as a list; routing treatment/control from the columns needs a “split/route by sample-sheet column” tool that the doc’s own Limitations list as future work. So sample_sheet = best hand-authored input, not an extraction target. The nested list:list encodes the same grouping structurally and needs no metadata-aware tools — which is why it extracts. (paired_or_unpaired doesn’t fit at all: treatment/control ≠ forward/reverse, and it caps at 2 elements.)
The residual — and why UC2 is a stress-test, not a showcase, for single-notebook extraction
The list:list restructure dissolves the MACS2 addressing, but not the downstream G1E-vs-mega comparison (common / G1E-only / mega-only). That’s an inherent 2-way set comparison of two named conditions — picking two specific elements out of the peaks list, the same addressing problem in a new spot. The ideal engineering answer is to split into two workflows (a map-over peak-caller + a 2-data-input pairwise comparator), but we’re demoing notebook→one-workflow extraction, so a split defeats the point. Within the one-notebook constraint, the comparison tail still needs 2 condition-pinned Extract Dataset steps (element="G1E_Tal1" etc. — workflow-compatible, so it runs, just not reusably).
Honest verdict: a differential two-condition analysis is not the cleanest showcase for “notebook → one reusable workflow.” Its irreducible 2-way comparison means the best single-notebook outcome is a runnable-but-condition-pinned workflow. It is, however, a genuinely good robustness test — it found and drove a real seeding-bug fix (A). For a pristine happy-path demo, prefer a naturally map-over analysis with no cross-sample comparison (per-sample/per-isolate — closer to UC1’s shape); save UC2 as the “extraction survives a realistic messy pipeline” story. (Fixing A/A2 was a code change to #22860’s seeding walk — committed 7e1d6d730f. The list:list rebuild and the prototype are not yet done; documented here as the next-run recipe.)
Original redo instructions (now executed above)
Job-graph scan: 0 collections; the 6 FASTQ were uploaded individually → 6 separate bowtie2 jobs (no map-over), the volcano-style figure used a Python-computed pasted matrix, and the candidate-gene lists were produced in bash (awk/sort -u), not as Galaxy datasets. So PR #22860 extraction would yield a sprawling fixed-sample DAG (6 named FASTQ inputs, parallel hardcoded alignments) plus dead-ends where external computation was injected — not a reusable workflow.
To redo so it extracts cleanly:
- Upload reads into a collection, not individually. Build a list (or list:paired) collection of the FASTQ samples (e.g. one collection of TAL1 ChIP samples + one of matched inputs, with element identifiers = sample names). Use the dataset-collection builder, not N separate upload_file_from_url calls. Via API: POST /api/histories/{id}/contents with type=dataset_collection.
- Map Bowtie2 over the collection ({"library|input_1": {"batch": true, "values": [{"src":"hdca","id":...}]}}) → one BAM collection, one workflow step instead of six branches.
- MACS2 with matched treatment/control. Either call MACS2 per condition (TAL1 BAM vs input BAM) as explicit steps, or pair via collections; keep the pooled-replicate handling as a single MACS2 step (it already pools, so that part is fine). Avoid the one-off R1-only run.
- Do every downstream transform as a Galaxy tool (these already are, keep them): bedtools intersect for common/G1E-only/mega-only, tp_awk for promoter/TSS, bedtools closest for nearest-gene.
- Produce the candidate-gene list as a dataset, not in bash. After the promoter ∩ peaks intersect, extract + dedupe gene symbols with text_processing (Cut column 4 → Sort → Unique), so the gene list is a workflow output, not a terminal awk in the shell.
- Build the figure’s count matrix with Galaxy tools (datamash/awk on the closest output), not Python → pasted, so the figure step is reproducible.
The interpretive bedtools half is already tool-based and would survive extraction; the alignment front-end and the figure/gene-list tails are what break it.
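The first two redo steps can be sketched as request payloads. This is a minimal sketch that only builds the JSON bodies and sends nothing; the helper names and the dataset and collection ids are placeholders, and the `collection_type` / `element_identifiers` field layout is my understanding of the collection-creation payload rather than something stated in this debrief, so check it against the API docs before relying on it. The batch structure is the one quoted in the list above.

```python
import json


def collection_payload(name, elements, collection_type="list"):
    """Body for POST /api/histories/{id}/contents creating a list collection.

    `elements` maps sample name -> already-uploaded dataset (HDA) id.
    """
    return {
        "type": "dataset_collection",
        "collection_type": collection_type,
        "name": name,
        "element_identifiers": [
            {"name": sample, "src": "hda", "id": hda_id}
            for sample, hda_id in elements.items()
        ],
    }


def bowtie2_batch_inputs(hdca_id):
    """Tool inputs that map Bowtie2 over a collection (one job per element)."""
    return {
        "library|input_1": {
            "batch": True,
            "values": [{"src": "hdca", "id": hdca_id}],
        }
    }


# Placeholder ids, not real Galaxy ids.
reads = collection_payload(
    "TAL1 ChIP reads",
    {"G1E_Tal1": "hda-id-1", "G1E_input": "hda-id-2"},
)
inputs = bowtie2_batch_inputs("hdca-id-1")
print(json.dumps(reads, indent=2))
```

Using sample names as element identifiers is what lets downstream steps and outputs stay labelled by sample rather than by history position.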
CLEAN REBUILD — 2026-06-13 (list:list MACS2 validated; full extractable workflow with one pinned seam)
Rebuilt from scratch in a clean history with the list:list restructure from the redo recipe above. The prototype assumption is now confirmed live, and the whole analysis extracts into a complete runnable workflow — better than the earlier “stress-test, not showcase” verdict: the MACS2 seam fully dissolves, and even the 2-way comparison extracts (condition-pinned).
Artifacts:
- History TAL1 peaks to candidate genes (clean, extractable): 96d9e11f37f34b29
- Notebook page 42a2c611109e5ed3
- Extracted workflow 5969b1f7201f12ae (/tmp/uc2_workflow.ga)
- treatment_reads 90240358ebde1489 / control_reads 86cf1d3beeec9f1c (both list:list {G1E:{r1}, mega:{r1,r2}})
- peaks list 846fb0a2a64137c0 (G1E 261, mega 150); figure PNG 3e28f7bb496103da
The decisive prototype — MACS2 maps two list:list (the assumption the recipe flagged “verify before banking”). Confirmed the encoding:
- Naive {"batch":true,"values":[{"src":"hdca","id":X}]} on a multiple="true" input maps over all leaves (3 jobs, replicates NOT pooled). Wrong.
- {"batch":true,"values":[{"src":"hdca","id":X,"map_over_type":"list"}]} splits the list:list into inner-list subcollections → outer-map (condition) + inner-reduce (pool replicates). Exactly 2 MACS2 jobs (G1E single; mega pools R1+R2, i.e. 2 treatment + 2 control BAMs), output a list of 2 condition narrowPeaks. The two list:list link automatically on the outer key. (map_over_type is read in lib/galaxy/tools/parameters/meta.py.)
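The two encodings above differ by a single key, and the job counts observed follow from what unit gets mapped. The sketch below builds both payload values and models the observed counts on a toy list:list; the helper names are hypothetical and `job_count` is a model of the behaviour reported here, not Galaxy's matching logic.

```python
def batch_value(hdca_id, map_over_type=None):
    """Build the batch input value; map_over_type is added only when given."""
    value = {"src": "hdca", "id": hdca_id}
    if map_over_type is not None:
        value["map_over_type"] = map_over_type
    return {"batch": True, "values": [value]}


def job_count(nested, map_over_type=None):
    """Toy model: naive mapping iterates leaves; 'list' iterates inner lists."""
    if map_over_type == "list":
        return len(nested)  # one job per condition, replicates pooled
    return sum(len(inner) for inner in nested.values())  # one job per leaf


# Same shape as treatment_reads: {G1E:{r1}, mega:{r1,r2}}.
treatment = {"G1E": {"r1": "bam"}, "mega": {"r1": "bam", "r2": "bam"}}

naive = batch_value("X")
pooled = batch_value("X", map_over_type="list")
print(job_count(treatment), job_count(treatment, "list"))
```

The model gives 3 jobs for the naive form and 2 for the `map_over_type` form, matching the counts recorded in the prototype.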
Science reproduced exactly from the clean tool graph: G1E 261 / mega 150 peaks; common 39 / G1E-only 222 / mega-only 110; promoter-bound Group gene lists = Cbfa2t3/Pf4 (common), Gata1+11 (G1E-only), Fli1/Tal1/Dock8/Lgi1 (mega-only); figure bin counts common 4/11/22/2, G1E-only 18/66/97/41, mega-only 5/30/45/30 (note: figure closest uses ties=first, one row per peak — ties=all tie-inflates and was caught/fixed mid-build).
Extraction: fully runnable workflow, one pinned seam, plus a real seeding-walk gap. Page summary: 31 seeded, 4 ICJ rows covering 3 distinct ICJs (2 Bowtie2 + 1 MACS2; the two MACS2 summary rows share one ICJ, so dedupe implicit_collection_jobs_ids before POSTing /api/workflows/extract). The extracted .ga (with the fix below): 34 steps, every tool step has inputs, 0 dangling, 3 workflow outputs, report 0 leftover ids. MACS2 wires to both Bowtie2 BAM collections; the list:list step survives extraction as a single mapped step.
- The seam: the G1E-vs-mega comparison uses __EXTRACT_DATASET__ (element=G1E / element=mega) → bedtools intersect. Confirmed it extracts as condition-pinned Extract dataset steps that run, so the workflow is complete and runnable, just not sample-agnostic at that seam (the irreducible 2-way comparison, as predicted).
- Seeding gap: FOUND, ROOT-CAUSED, and FIXED. The page-extraction summary originally did not surface __EXTRACT_DATASET__ jobs, so the auto-built payload (= what a UI “extract” sends) omitted them and the comparison intersects came out input-less (dangling). Root cause (DB-confirmed): the Extract Dataset output HDA has both copied_from_history_dataset_association (→ the MACS2 peaks element) and its own creating job (__EXTRACT_DATASET__); galaxy.workflow.extract._original_hda unconditionally walked copied_from, so both summarize and the closure normalized the extracted dataset back to the MACS2 element and dropped the Extract Dataset step entirely (the summary even showed G1E/mega as MACS2 outputs). Fix (lib/galaxy/workflow/extract.py): _original_hda/_original_hdca now stop walking copied_from when the content has its own creating_job_associations. A passive copy (drag-drop) has none and still normalizes; a collection-operation output (Extract Dataset, Filter, Relabel, Flatten, …) has one and is kept as a real step. Generalizes to every DatabaseOperationTool. Tests added (test_extract_report.py::test_index_does_not_normalize_collection_operation_output, stub extended); 34 extraction unit tests pass. Verified live: post-fix the page summary surfaces 2 seeded Extract dataset rows (no Bowtie2 duplication), and the auto-built extraction yields a complete 34-step workflow with the Extract Dataset bridge connected and zero dangling, so a UI extract now works.
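The rule behind the `_original_hda` fix can be shown on a toy model. This is not the code in lib/galaxy/workflow/extract.py: the `Hda` class, its `has_creating_job` flag (standing in for a non-empty `creating_job_associations`), and the dataset names are hypothetical. It shows only the stopping condition: follow `copied_from` for passive copies, but stop at content that its own job created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hda:
    name: str
    copied_from: Optional["Hda"] = None
    has_creating_job: bool = False  # stand-in for creating_job_associations


def original_hda(hda, stop_at_creating_job=True):
    """Follow copied_from links back to the dataset extraction should use."""
    current = hda
    while current.copied_from is not None:
        if stop_at_creating_job and current.has_creating_job:
            break  # produced by a real step (e.g. Extract Dataset): keep it
        current = current.copied_from
    return current


macs2_element = Hda("MACS2 G1E narrowPeak", has_creating_job=True)
extracted = Hda("Extract dataset: G1E", copied_from=macs2_element, has_creating_job=True)
passive_copy = Hda("drag-drop copy", copied_from=macs2_element)

old_behaviour = original_hda(extracted, stop_at_creating_job=False)
new_behaviour = original_hda(extracted)
```

With the old unconditional walk the Extract Dataset output collapses onto the MACS2 element and the step disappears; with the stopping rule it stays its own node, while the passive copy still normalizes to its source.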
Net: UC2 is no longer just a “robustness test” — with list:list it’s a genuinely extractable differential-ChIP workflow whose only non-reusable point is the inherent 2-condition comparison (condition-pinned, but runnable). The clean uploads (2 list:list from Zenodo), all-tool graph (no bash, no pasted matrix), and exact science reproduction hold.
Two-workflow split — VERIFIED reusable (2026-06-14)
To make the differential analysis fully sample-agnostic (vs the single condition-pinned workflow), it extracts cleanly as two workflows via extract_by_ids:
- WF1 peak caller (3f5830403180d620), 5 steps: two list:list reads inputs → Bowtie2 ×2 (map-over) → MACS2 (one mapped step) → condition peaks list. Fully map-over reusable (supply your own condition→replicate reads).
- WF2 differential comparator (e85a3be143d5905b), 29 steps: 4 inputs (G1E peaks, mega peaks, RefSeq, header) → 3-way bedtools intersect (common/only/only) → promoters/TSS → Group gene lists + closest→datamash→heatmap figure → 3 outputs (2 gene lists + figure). All tools connected, edges resolve, no condition-name pinning (the two peak sets are plain data inputs, so any A-vs-B pair can be fed in).
Both verified: all tool steps connected, all edges resolve. Chaining WF1→WF2 still requires picking the two condition elements out of WF1’s peaks list (the irreducible 2-way comparison), but each workflow is individually fully reusable. This is the clean answer to “differential A-vs-B doesn’t fit one reusable notebook→workflow”: split the map-over caller from the pairwise comparator.
Suggested next moves
JBrowse locus view, the RNA-seq expression stretch, or move on. UC2 now extracts either as one complete condition-pinned workflow (post _original_hda fix) or as two fully-reusable workflows (caller + comparator).