UC1_PAPER_INTEGRATION

UC1 → paper integration proposal (MRSA mobile-AMR, issue #12)

What this is: ideas for integrating the UC1 work into the Galaxy Notebooks paper (vault/papers/galaxy-notebooks/manuscript.md), generated by feeding the UC1 debrief pair (UC1_DEBRIEF.md + UC1_DEBRIEF_2.md) and the paper draft to a review subagent. The paper itself was not modified — this is a proposal for the author to apply (or not). Line numbers reference manuscript.md as of 2026-06-14.

Framing: what UC1 unlocks in the current draft

The manuscript repeatedly hedges its central claim — notebook-driven workflow extraction — as unproven:

UC1 is that captured, verified, byte-identical end-to-end vignette. The highest-value move is to use UC1 to flip these hedges from conditional to delivered. That is the must-include thread; everything else supports it.

MUST-INCLUDE

M1. Concrete worked example in the Extraction section (load-bearing)

Where: new paragraph after the “implementation status” paragraph (~line 90). Drop-in paraphrase:

We validated this path on a real comparative-genomics analysis. A four-isolate S. aureus mobile-resistome study was documented as a Galaxy Notebook in which every analytical step (ARG detection, insertion-sequence scanning, integron finding, coordinate reformatting, and ARG↔IS distance computation) ran as a collection map-over across the four isolates, and both result figures were on-graph tool outputs rather than pasted images. Backward extraction from the notebook’s referenced artifacts recovered a 14-step workflow (one input collection plus 13 tool steps) with every input connection resolved (zero dangling), nine workflow outputs, eight collection map-over steps, and a seeded workflow report requiring no manual repair (zero leftover dataset-instance identifiers). The recovered workflow is sample-agnostic.

Then soften the ~line 90 conditional: report that the vignette has been captured on the current branch (page-based extraction, PR #22860 merged), keeping the “polished/contributed vignette” aspiration for the richer UC2/UC3 cases.
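
Before the M1 counts go into the paper, it may be worth re-deriving them from the exported workflow. The Python sketch below is one way to do that under stated assumptions: that the export follows Galaxy's .ga JSON layout (a top-level steps mapping whose entries carry type, input_connections, and workflow_outputs), and that "dangling" means a connection pointing at a step id absent from the workflow. The function name summarize_workflow and the toy workflow are hypothetical; this is not the check the extraction code itself performs.

```python
# Step types treated as workflow inputs (assumed from the .ga layout).
INPUT_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def summarize_workflow(workflow: dict) -> dict:
    """Count steps, workflow outputs, and unresolved input connections."""
    steps = workflow["steps"]
    known_ids = {str(key) for key in steps}
    summary = {
        "total_steps": len(steps),
        "input_steps": 0,
        "tool_steps": 0,
        "workflow_outputs": 0,
        "dangling": [],
    }
    for key, step in steps.items():
        if step.get("type") in INPUT_TYPES:
            summary["input_steps"] += 1
        elif step.get("type") == "tool":
            summary["tool_steps"] += 1
        summary["workflow_outputs"] += len(step.get("workflow_outputs", []))
        for input_name, conn in step.get("input_connections", {}).items():
            # A connection may be a single mapping or a list of mappings.
            conns = conn if isinstance(conn, list) else [conn]
            for c in conns:
                if str(c.get("id")) not in known_ids:
                    summary["dangling"].append((str(key), input_name))
    return summary


# Toy workflow for illustration only (not the UC1 export).
toy = {
    "steps": {
        "0": {"type": "data_collection_input", "input_connections": {},
              "workflow_outputs": []},
        "1": {"type": "tool",
              "input_connections": {"input": {"id": 0, "output_name": "output"}},
              "workflow_outputs": [{"output_name": "out", "label": "hits"}]},
        "2": {"type": "tool",
              "input_connections": {"a": [{"id": 1, "output_name": "out"},
                                          {"id": 7, "output_name": "out"}]},
              "workflow_outputs": []},
    }
}
print(summarize_workflow(toy))
```

For the real check, load the exported file with json.load and confirm that the summary matches the numbers quoted in the drop-in paragraph (14 total steps, 13 tool steps, nine workflow outputs, empty dangling list).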

M2. “Byte-identical science” claim

Where: Extraction section (after M1) + echoed in Discussion “reuse” benefit (~line 174).

The extracted workflow reproduces the original exactly: re-running it produced an ARG↔IS distance collection byte-identical to the validated original across all four isolates, and both figure matrices carried identical content (one differing from its original only in cosmetic row order). Extraction recovers the same computation, not an approximate reconstruction.

UC1 is the best evidence of the three for this soundness claim; substantiates the existing “deliberately not free-text workflow synthesis” contrast (~line 88).
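
The M2 wording has to distinguish strict byte-identity from "same rows, different order", since one figure matrix falls in the second category. The sketch below shows one way to classify a pair of outputs. It assumes the matrices are line-oriented text tables with a single header row; the function names and the three labels are illustrative, not part of Galaxy or the UC1 tooling.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Digest suitable for recording alongside a byte-identity claim."""
    return hashlib.sha256(data).hexdigest()


def compare_tables(original: bytes, rerun: bytes) -> str:
    """Classify a rerun output against its original.

    Returns "byte-identical", "row-order-only", or "different".
    """
    if sha256_hex(original) == sha256_hex(rerun):
        return "byte-identical"

    def normalize(data: bytes) -> list:
        # Keep the header row in place and sort the remaining rows.
        lines = data.splitlines()
        return lines[:1] + sorted(lines[1:])

    if normalize(original) == normalize(rerun):
        return "row-order-only"
    return "different"


# Toy tables for illustration only (not UC1 data).
a = b"isolate\tcount\nA\t1\nB\t2\n"
b = b"isolate\tcount\nB\t2\nA\t1\n"
print(compare_tables(a, a))
print(compare_tables(a, b))
```

Reporting the label per output lets the paper state exactly which artifacts are byte-identical and which match only after row sorting.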

M3. The remove_short_is gotcha → evidence that on-graph artifacts are auditable

Where: Design Goals “reference artifacts, not just describe them” (~lines 29–30) and/or Discussion limits (~line 176).

Because referenced outputs are real on-graph tool results, parameter choices that affect them remain auditable. A single tool default — the IS scanner’s remove_short_is flag — silently altered a figure (25 vs 17 element calls for one isolate, spurious zero-distance overlaps); because the figure was on-graph, the discrepancy was traceable through provenance and corrected. A pasted image would have hidden it.

New, concrete argument for the core thesis; pre-empts “notebooks just paste prettier figures.” Frame it as “made the error auditable,” not “prevented it” (see T1).

M4. Ground the Evaluation Plan in the real vignette

Where: Evaluation Plan evidence layers 2–3 (~lines 144–146), currently generic.

The worked vignette is a four-isolate S. aureus mobile-AMR comparison (BioProject PRJDB8599); the notebook embeds two on-graph heatmaps and a comparative finding, and extraction recovers a 14-step, nine-output, sample-agnostic workflow with a clean seeded report.

UC1 turns layers 2 (worked vignette) and 3 (workflow handoff) into reported results. It does not cover layer 1 (test counts) or layer 4 (agent authorship); flag that those need separate sourcing.

FIGURES / TABLES

TENSIONS / HONESTY

Where UC1 is best vs UC2/UC3

Suggested arc: lead extraction evidence with UC1 (clean baseline), then UC2/UC3 as the harder cases that exercised real fixes.

Key IDs for figure capture: notebook eafb646da3b7aac5, workflow 33b43b4e7093c91f, Fig1 29e36fb8642bf5ed, Fig2 579ae69ccbd17e45, history 48916fac0de9a85d; extraction = page-based PR #22860.