UC3_NEXT_STEPS

UC3 Next Steps — emit the filtered + ranked differential-peaks tables

Issue: https://github.com/jmchilton/galaxy-brain/issues/14
For: the agent that built the clean UC3 history.
Handoff to push the differential-ATAC analysis from “great paper figures” to “complete workflow.”

Why this handoff

You optimized UC3 for the Galaxy Notebooks paper: a clean 5-step spine (counts collection + sample sheet → DESeq2 → tp_awk NA-filter → volcanoplot) that extracts with all rows seeded and that forced the PDF-renderer contributions. That goal was met (history 241d84796a24640a, page a7e42332dab8f5db, extracted workflow 0a248a1f62a0cc04).

But at the workflow level, the headline scientific deliverable is missing. The issue (#14, steps 7 and 9) calls for filtering significant peaks and ranking the top gained/lost peaks, and the current workflow produces neither as data:

No new tools are needed: bgruening/text_processing (tp_awk, sort) is already installed and already in the spine. This is additive; don't disturb the existing DESeq2 / NA-filter / volcano path.

Step A — significance filter → significant-peaks table
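A minimal Python sketch of the Step A predicate, useful for checking the expected row count outside Galaxy. It assumes (hypothetically; verify against the actual DESeq2 output) that the NA-filtered table is header-less and tab-separated with columns peak_id, baseMean, log2FC, lfcSE, stat, pvalue, padj, so log2FC is column 3 and padj is column 7 (1-based). Under that assumption the matching tp_awk program would be along the lines of $7 < 0.05 && ($3 >= 1 || $3 <= -1), written that way because awk has no built-in abs().

```python
# Sketch of the Step A significance filter (padj < 0.05, |LFC| >= 1).
# ASSUMPTION: header-less, tab-separated DESeq2 table with columns
#   peak_id, baseMean, log2FC, lfcSE, stat, pvalue, padj  (0-based 0..6).
LFC_COL = 2
PADJ_COL = 6
PADJ_MAX = 0.05     # keep padj strictly below this
ABS_LFC_MIN = 1.0   # keep |log2FC| at or above this


def is_significant(fields):
    """True when a split row passes padj < 0.05 and |log2FC| >= 1."""
    lfc = float(fields[LFC_COL])
    padj = float(fields[PADJ_COL])
    return padj < PADJ_MAX and abs(lfc) >= ABS_LFC_MIN


def filter_significant(lines):
    """Yield the input lines (newline stripped) that pass the filter.

    Rows whose LFC or padj cannot be parsed (a stray "NA", a header) are
    dropped; "nan" parses as NaN, fails both comparisons, and is dropped too.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) <= PADJ_COL:
            continue  # blank or truncated row
        try:
            keep = is_significant(fields)
        except ValueError:
            continue  # non-numeric LFC or padj
        if keep:
            yield line
```

Note the boundaries: a row with padj exactly 0.05 is dropped, while a row with |LFC| exactly 1 is kept, matching the "padj<0.05, |LFC|≥1" thresholds in the verification table.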

Step B — rank top gained / top lost
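Step B could be sketched as two sorts over the significant-peaks table. This sketch assumes "top" means most extreme log2FC within each direction (ranking by padj would be an equally defensible reading) and uses a hypothetical cut-off n, since the note does not fix how many peaks each table should hold. Gained means LFC > 0 and lost means LFC < 0, per the reference_level = Erythroblast convention. In the workflow, the same result would presumably come from the text_processing sort step on the LFC column (numeric, descending for gained, ascending for lost) followed by keeping the first n rows.

```python
# Sketch of Step B: top-n gained (LFC > 0) and top-n lost (LFC < 0) peaks.
# ASSUMPTIONS: rows are (peak_id, log2FC) pairs from the significant-peaks
# table; "top" means most extreme LFC; n is a hypothetical cut-off.
from typing import List, Tuple

Row = Tuple[str, float]


def top_gained(rows: List[Row], n: int) -> List[Row]:
    """Positive-LFC peaks, largest LFC first, truncated to n."""
    gained = [r for r in rows if r[1] > 0]
    return sorted(gained, key=lambda r: r[1], reverse=True)[:n]


def top_lost(rows: List[Row], n: int) -> List[Row]:
    """Negative-LFC peaks, most negative LFC first, truncated to n."""
    lost = [r for r in rows if r[1] < 0]
    return sorted(lost, key=lambda r: r[1])[:n]
```

Filtering by sign before sorting keeps the two tables disjoint even if one direction has fewer than n peaks.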

Direction reminder (reference_level = Erythroblast): LFC > 0 means B-cell-gained and LFC < 0 means Erythroblast-gained. Confirm the orientation with a known marker: the MS4A1/CD20 peak on chr11 has LFC > 0.
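Because the sign convention is easy to get backwards, a tiny labelling helper may be worth keeping next to the verification step. This sketch only encodes the reminder above (LFC > 0 is B-cell-gained, LFC < 0 is Erythroblast-gained) and counts the split; the marker check takes the MS4A1 peak's LFC as input, and how that value is looked up is left open (hypothetical).

```python
# Encodes the reference_level = Erythroblast sign convention from this note.
from collections import Counter
from typing import Iterable


def direction(lfc: float) -> str:
    """Label a peak by the sign of its log2 fold change."""
    if lfc > 0:
        return "B-cell-gained"
    if lfc < 0:
        return "Erythroblast-gained"
    return "unchanged"  # cannot occur after the |LFC| >= 1 filter


def direction_split(lfcs: Iterable[float]) -> Counter:
    """Count peaks per direction label."""
    return Counter(direction(x) for x in lfcs)


def orientation_ok(ms4a1_lfc: float) -> bool:
    """MS4A1/CD20 is a B-cell marker, so its peak should be B-cell-gained."""
    return direction(ms4a1_lfc) == "B-cell-gained"
```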

Step C — nearest-gene annotation (optional / stretch)

The debrief lists nearest-gene as a “next.” Keep it light per the issue’s scope note (peak-to-gene assignment is not trivial):
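If Step C is attempted, the lightest option is probably nearest-TSS assignment. The sketch below is exactly that and nothing more: it assumes each peak is available as (chrom, start, end) and that a TSS annotation matching the genome build has been loaded as (chrom, position, gene) records; both inputs and all names here are hypothetical. It ignores strand, distance cut-offs, and distal regulation, and makes no attempt at the harder peak-to-gene problem the scope note mentions.

```python
# Nearest-TSS lookup: one simple (and admittedly crude) peak-to-gene rule.
# ASSUMPTION: TSS records are (chrom, position, gene) tuples from an
# annotation matching the genome build used for the peaks.
import bisect
from typing import Dict, Iterable, List, Optional, Tuple

Index = Dict[str, Tuple[List[int], List[str]]]


def build_index(tss_records: Iterable[Tuple[str, int, str]]) -> Index:
    """Group TSSs by chromosome and sort them by position."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for chrom, pos, gene in tss_records:
        by_chrom.setdefault(chrom, []).append((pos, gene))
    index: Index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([p for p, _ in items], [g for _, g in items])
    return index


def nearest_gene(index: Index, chrom: str, start: int,
                 end: int) -> Optional[Tuple[str, int]]:
    """Return (gene, distance) for the TSS nearest the peak midpoint, or None."""
    if chrom not in index:
        return None
    positions, genes = index[chrom]
    mid = (start + end) // 2
    i = bisect.bisect_left(positions, mid)
    best: Optional[Tuple[str, int]] = None
    # Only the TSSs immediately left and right of the midpoint can be nearest.
    for j in (i - 1, i):
        if 0 <= j < len(positions):
            dist = abs(positions[j] - mid)
            if best is None or dist < best[1]:
                best = (genes[j], dist)
    return best
```

Reporting the distance alongside the gene name lets the top-gained/top-lost tables show how far each marker call actually reaches.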

Verification (what “done” looks like)

Check                     Expected
Workflow step count       grows from 5 → ~7–8 (filter + sort, plus optional annotate)
Significant-peaks table   exposed output, 45,620 rows (padj < 0.05, |LFC| ≥ 1)
Direction split           34,873 B-cell-gained (LFC > 0) / 10,747 Eryth-gained (LFC < 0)
Top-gained table markers  EBF1/MS4A1/PAX5/CD19 among top LFC > 0
Top-lost table markers    HBB/GATA1 among top LFC < 0
Volcano                   unchanged; still fed the full non-NA table, not the filtered subset
Re-extraction             new outputs exposed, zero dangling, report clean (hold to UC3 §7)
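The numeric rows of the table can be checked mechanically once the significant-peaks output's LFC column has been loaded as a list of floats (the loading step is not shown and is assumed). This sketch hard-codes the expected values from the table, which are internally consistent: 34,873 + 10,747 = 45,620. The step-count bound of 7 to 8 reads the table's "~7–8" literally and is an assumption.

```python
# Mechanical checks for the numeric rows of the verification table.
# ASSUMPTION: `lfcs` is the log2FC column of the significant-peaks output.
from typing import List, Sequence

EXPECTED_ROWS = 45_620
EXPECTED_B_CELL_GAINED = 34_873   # LFC > 0
EXPECTED_ERYTH_GAINED = 10_747    # LFC < 0
assert EXPECTED_B_CELL_GAINED + EXPECTED_ERYTH_GAINED == EXPECTED_ROWS


def check_significant_table(lfcs: Sequence[float]) -> List[str]:
    """Return human-readable failures; an empty list means all checks pass."""
    failures = []
    if len(lfcs) != EXPECTED_ROWS:
        failures.append(f"row count {len(lfcs)} != {EXPECTED_ROWS}")
    n_pos = sum(1 for x in lfcs if x > 0)
    n_neg = sum(1 for x in lfcs if x < 0)
    if n_pos != EXPECTED_B_CELL_GAINED:
        failures.append(f"B-cell-gained {n_pos} != {EXPECTED_B_CELL_GAINED}")
    if n_neg != EXPECTED_ERYTH_GAINED:
        failures.append(f"Eryth-gained {n_neg} != {EXPECTED_ERYTH_GAINED}")
    if any(abs(x) < 1 for x in lfcs):
        failures.append("a row has |LFC| < 1, so the filter leaked")
    return failures


def step_count_ok(n_steps: int) -> bool:
    """Workflow should grow from 5 steps to roughly 7-8."""
    return 7 <= n_steps <= 8
```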

Scope / risks