Manuscript Polish TODO

Working list of things needed to lift manuscript.md from honest first draft to submission-ready. Maintained alongside the draft, not as a planning document for a future paper. Items resolve as the manuscript improves; the list shrinks rather than grows.

Highest-Leverage Work Still Ahead

These two items came out of the first sub-agent review pass and are the most impactful changes still outstanding. Both involve substantive writing and/or analysis beyond mechanical polish, so they sit at the top of the list rather than buried inside the polish sections below.

Axis 1 — Add a central worked example threading depth claims through the paper

The depth-of-validation claim is currently asserted abstractly three times, each time through the BWA-MEM scoring-matrix hypothetical (in the Abstract, the Introduction, and Validation Across Workflow Systems), but it is never demonstrated. The Format 2 example shows encoding differences, not validation depth: it presents a clean workflow that validates clean.

Introduce a single named workflow with a sequence of plausible authoring mistakes — a misspelled output_sort, a stale __current_case__ carrying a parameter removed in a newer tool version, a list:paired collection wired to a tool expecting paired without a flattening operation, an illegal select on a reference_source_selector. For each mistake, show the gxwf diagnostic (path, category, legal alternatives), and contrast briefly with what the comparable Nextflow / Snakemake / WDL toolchain would or would not catch on the equivalent error.

This becomes “Listing 1”, referenced in turn from the Introduction, Schema-Aware Validation, the VS Code subsection, and Validation Across Workflow Systems. The same listing also fixes a subtler problem: the three validation layers (per-step, per-connection, conditional/stale) are currently described at uneven levels of concreteness. A single example that exercises all three layers puts them on equal footing.

Axis 4 — Corpus section needs a finding, not just counts

Filling the [NUMBER] placeholders gives the corpus section quantity but not significance. A reviewer will read “X validate cleanly, Y surface auto-cleanable diagnostics, Z require human attention” and ask: so what? Which categories of error are most common? Did the validator find real bugs in IWC workflows? Did any stale state propagate into wrong scientific results, or was it all cosmetic?

The target-ladder in index.md already calls this out: “Strengthened materially if a PhD contributor lands one biological re-validation vignette.” The corpus section should cash that in or, failing a full biological vignette, at least land one categorical finding: a histogram of diagnostic categories surfaced across the corpus with the modal category named and exemplified; or a before/after snapshot showing how many warnings the validator surfaced N months ago that have since been fixed by maintainers; or a single named tool whose version bump introduced a parameter rename the validator flagged across K workflows.

This is the upgrade from Methods to Resource framing at Genome Research (GR). The analysis is real work (a day of running the validator over IWC snapshots and categorizing the output) and likely needs a co-author or PhD contributor to execute. It is distinct from the [NUMBER] items below: those track counts, while this item tracks the categorical analysis that turns counts into a finding.
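Once the validator's output for a corpus run has been collected, the categorical analysis is mostly bookkeeping. A minimal Python sketch of the tallying step follows, assuming the diagnostics have already been flattened into records carrying a workflow name, a diagnostic category, and (where relevant) the tool ID that triggered them. The `Diagnostic` record and all function names are hypothetical and do not reflect gxwf's actual output format. The sketch covers the three fallback findings listed above: the category histogram with its modal category, the K workflows affected by one tool's diagnostics, and the before/after snapshot comparison.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Diagnostic:
    """One diagnostic from one workflow in a corpus run.

    Hypothetical record shape for illustration; not gxwf's real output format.
    frozen=True makes records hashable so snapshots can be compared as sets.
    """

    workflow: str
    category: str
    tool_id: Optional[str] = None


def category_histogram(diags: Iterable[Diagnostic]) -> List[Tuple[str, int]]:
    """Count diagnostics per category, most common first."""
    return Counter(d.category for d in diags).most_common()


def modal_category(diags: Iterable[Diagnostic]) -> Optional[str]:
    """Return the most frequent category, or None for an empty corpus run."""
    hist = category_histogram(diags)
    return hist[0][0] if hist else None


def workflows_affected(
    diags: Iterable[Diagnostic], category: str, tool_id: str
) -> List[str]:
    """Distinct workflows (the K in the text) where one tool triggered one category."""
    return sorted(
        {d.workflow for d in diags if d.category == category and d.tool_id == tool_id}
    )


def fixed_since(
    old: Iterable[Diagnostic], new: Iterable[Diagnostic]
) -> Set[Diagnostic]:
    """Diagnostics present in an older snapshot but absent from a newer one."""
    return set(old) - set(new)
```

The category strings would come from whatever taxonomy gxwf actually reports; the sketch only assumes each diagnostic carries exactly one category.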

Numbers to Fill In

Every [NUMBER] in the draft is an empirical claim the manuscript cannot make without measurement. Land these before any external review.

Concrete Examples and Figures

Citations

The manuscript now carries inline citations, and bibliographic records live in references.yml as the single source: references.md is retired, and the site renders the reference list from the YAML. The work remaining:

Manuscript Hygiene Sweeps

Scope and Framing Decisions Still Open

Items here are decisions deferred during drafting, not unknowns. Resolve and propagate to the manuscript.

Honest Risks in the Current Draft

Concerns to raise with co-authors and reviewers; not procedural TODOs.

Fallback Trim Plan (if retreating from Genome Research)

If the venue ladder retreats to Bioinformatics Original Paper (~5000 words), the trim ops are:

If retreating further to Application Note (~2000 words), the trim ops are:

Maintained as a parallel checklist so the trim is mechanical, not creative, if the timeline forces it.