Jinja Templated Reports Plan
Goal: Replace hand-written Python format_*_markdown() functions with Jinja2
templates driven by the CLI --report-json output so the same templates can be
reused by the TypeScript mirror via Nunjucks.
Does the idea make sense?
Yes. The current state makes the transition low-risk:
- Every CLI already produces structured JSON via Pydantic
model_dump(by_alias=True)and its shape is identical to what the markdown formatter consumes (both operate on the sameTree*Reportmodel). jinja2is already a declared dependency of bothgalaxy-tool-util(packages/tool_util/setup.cfg) and the Galaxy monorepo (pyproject.toml), so no new runtime deps.- Nunjucks is a near-drop-in port of Jinja2. If we stay within a small, well-known
subset (
for,if,set,macro,include, and a handful of filters:length,default,upper,lower,replace,join,sort,selectattr), templates render identically in both engines. - The one structural risk: a couple of Python-side formatters call model methods
(e.g.
TreeReportBase.by_category()) that do not appear in JSON. We address this by promoting those to@computed_fields (see below) — a small, localized change that also makes the JSON richer for downstream consumers.
Current inventory
Six markdown formatters and their tree-level Pydantic inputs
(lib/galaxy/tool_util/workflow_state/):
| CLI | Formatter | Tree report model |
|---|---|---|
gxwf-state-validate | validate.py:format_tree_markdown | TreeValidationReport |
gxwf-state-clean | clean.py:format_tree_clean_markdown | TreeCleanReport |
gxwf-roundtrip-validate | roundtrip.py:format_roundtrip_markdown | RoundTripTreeReport |
gxwf-to-format2-stateful | export_format2.py:_format_tree_markdown | ExportTreeReport |
gxwf-to-native-stateful | to_native_stateful.py:_format_tree_markdown | ToNativeTreeReport |
gxwf-lint-stateful | lint_stateful.py:_format_lint_tree_markdown | LintTreeReport |
Plus the embedded format_connection_markdown that validate.py splices in.
All six are wired through _report_output.emit_reports(... markdown_formatter=...)
and _tree_orchestrator.run_tree(... format_markdown=...), so there is a single
injection point per CLI.
Implementation plan
Step 1 — Make every report model fully serializable
Anything the markdown rendering needs must be in JSON. Audit each formatter and
promote derived values to @computed_fields (not @property, which Pydantic
does not emit in model_dump) on the report models so no template depends
on a Python method or attribute that vanishes on serialization.
Before making any model change, land a JSON-contract-freeze snapshot test
(see Step 5) over representative fixtures so the computed_field additions can
be verified as additive-only. This protects any downstream consumer that pins
the --report-json shape (notably the incoming TS mirror).
Concretely, add the following:
_report_models.py
WorkflowResultBase→name(basename ofrelative_path). Used in 3 formatters viaos.path.basename. Nobasenamefilter exists in the Jinja2/Nunjucks shared subset.TreeReportBase→categories: list[{name, results}]computed field. Replaces theby_category()method. Sorted by name. All tree reports inherit it.WorkflowValidationResult— reuse the existingsummarycomputed field (_report_models.py:68) in templates rather than adding paralleln_ok/n_fail/n_skip. Add afailures: [{step, tool_id, message}]list so the template doesn’t have to do the two-pass accumulation thatvalidate.py:format_tree_markdowndoes today (build table rows and a failure appendix in the same loop).TreeValidationReport→all_failures: [{workflow, step, tool_id, message}]so the per-workflow failures can be rendered as a grouped appendix without a second template loop.CleanStepResult→display_label(thetool_id or "unknown"+ optional version suffix currently built atclean.py:619-621).WorkflowCleanResult→steps_affectedcount + keeptotal_removed.LintWorkflowResult→step_countscomputed field mirroringWorkflowValidationResult.summary(n_ok/n_fail/n_skip).
roundtrip.py — biggest block of work, because the current formatter
relies entirely on plain @property that don’t serialize:
- Promote
RoundTripValidationResult.status,.ok,.error_diffs,.benign_diffs,.summary_linefrom@propertyto@computed_field. Without this promotion, the Jinja roundtrip template has nothing to branch on — this is the single biggest correctness risk in the port and must land before the roundtrip template is written. - Add
RoundTripValidationResult.conversion_failure_lines— the exact strings_format_conversion_failures()emits, precomputed Python-side so the template stays Nunjucks-safe. - Add
RoundTripTreeReport.totalfor template convenience. - Add
RoundTripTreeReport.tool_failure_modes: list[{tool_id, count, failure_class}]aggregated across all results. This is the “top-N offending tools” table called out in the report-utility review and is the single most actionable artifact for the roundtrip report. Aggregating in Python keeps templates inside the Jinja2/Nunjucks shared subset. - Note:
StepDiff.format_line()is a method used only by the text formatter, not the markdown formatter. Leave it alone.
export_format2.py / to_native_stateful.py
WorkflowExportResult/WorkflowToNativeResult→ derivedstatus: "ok"|"partial"|"error"|"skipped"computed field. Today the formatters reconstruct this from a 4-branch (error/skipped_reason/ok/else-partial) negative-definition chain — promoting to an explicit field tightens correctness.
Out of scope for Step 1 (deferred to dedicated commits):
lint_messages: list[str]onLintWorkflowResult— requires plumbingLintContextmessages throughrun_lint_stateful_tree’sprocess_one, not just a model field. Treat as its own reviewable change alongside the lint template port.- Stale-keys classification histogram on
CleanStepResult— requires plumbingstale_keys.pyclassification into theremoved_keysdata, which currently ships only as alist[str]. Own reviewable change. - Per-tool fallback aggregation on
ExportTreeReport/ToNativeTreeReport— requires capturing per-steptool_idon step results (currently only counts survive). Own reviewable change. - Lint exit-code documentation footer — requires exposing the exit code
semantics on the report model; a new
exit_code_meaningfield, not just a template change.
These improvements are still in scope for the overall effort but land as separate commits from the template port itself. This is the “separate model changes from template changes” discipline the review called out.
Red→green: add a unit test per model that freezes a fixture dict and asserts on the new computed fields. Then fill them in.
Step 2 — Land a template rendering module ✅ done
New file: lib/galaxy/tool_util/workflow_state/_report_templates.py
Responsibilities:
- Build a single
jinja2.Environment(lazy singleton via_get_env()) with:PackageLoader("galaxy.tool_util.workflow_state", "templates/reports")trim_blocks=True,lstrip_blocks=True,keep_trailing_newline=False- Autoescape disabled (markdown, not HTML)
- No custom filters or globals — Nunjucks parity is the constraint.
- Provide
render_report(template_name: str, report: BaseModel, meta=None) -> strthat takes the report,model_dump(by_alias=True, mode="json")s it, passes the result under a single top-level var (report) plus ametavar forgenerated_at,tool_util_version(optionalmetaoverride for tests). - Provide a
make_markdown_renderer(template_name)helper returning aCallable[[BaseModel], str]— the exact signatureemit_reports/run_treealready expect, so the swap at each CLI is one line.
Step 3 — Template scaffolding ✅ done
Scope of Step 3 is the shared scaffolding only — the six tree-level templates are created inside Step 4 alongside the per-CLI port so each diff stays reviewable. Delivered in this step:
lib/galaxy/tool_util/workflow_state/templates/reports/
_macros.md.j2 # status_badge, kv_summary, workflow_state_cells, failure_bullet
connection_section.md.j2 # ported from validate.format_connection_markdown; also include target for validate_tree
Added in Step 4 (one per CLI port):
validate_tree.md.j2
clean_tree.md.j2
roundtrip_tree.md.j2
export_tree.md.j2
to_native_tree.md.j2
lint_tree.md.j2
One macro file avoids cross-template duplication while staying inside the
Jinja2/Nunjucks shared subset ({% import "_macros.md.j2" as m %} works in both).
Templates are packaged as data files. packages/tool_util/setup.cfg has
include_package_data = True, but sdists built from packages/tool_util/
still need an explicit MANIFEST.in include for the new .j2 files —
added as include galaxy/tool_util/workflow_state/templates/reports/*.j2.
Step 4 — Port formatters one at a time ✅ done
For each CLI, in this order (smallest → largest), to keep diffs reviewable:
export_format2— simplest (bullet list). Lowest risk smoke test. ✅to_native_stateful— near-identical to #1. ✅lint_stateful— single table. ✅clean— affected table + per-workflow details. ✅validate— categories + failure details + spliced connection section. ✅roundtrip— most structure (diff classification, conversion failures). ✅
Per CLI, the change is:
- Write the
.md.j2file. - Replace the
format_*_markdownimport at theemit_reportscall site withmake_markdown_renderer("validate_tree.md.j2"). - Delete the old
format_*_markdownfunction in the same commit. Hard cut, no shims. Parity is enforced by the Step 5 snapshot tests (old-formatter goldens are checked in first, then the template is required to match them — or to match a new golden when we intentionally improve the output).
Conventions established while porting CLIs #1–#4:
- Golden file suffix is
.md.golden, not.md, to keep prettier’s table-column-aligner off them. Templates emit unaligned tables because Jinja2/Nunjucks have no shared column-pad filter. Goldens live undertest/unit/tool_util/workflow_state/report_markdown_goldens/. - Templates emit a blank line after the H1 (prettier enforces this on markdown and we want the rendered output to be prettier-compliant anyway).
- Templates reach for
selectattr(attr)(single-arg, truthy-check) plusloop.firstwhen a filtered section header needs gating — avoids mutablenamespace, which isn’t in the Jinja2/Nunjucks shared subset. - Content improvements from the review (category grouping on
export_treeandto_native_tree, lint_messages surfacing, etc.) have been deferred from the per-CLI port commits to keep diffs tight. They’ll land as follow-ups once the six pure ports are in. - Caveat: CLIs #1–#4 were committed without running the full pytest
(no .venv in the worktree). Each template was verified against real
model_dumpoutput via a standalonejinja2.Environmentdry-run and, for clean, viaexec-ing_report_models.pywith a stubbedgxformat2. Runtest_report_templates.pyunder a bootstrapped venv before merging.
Step 5 — Testing strategy (red→green)
Three layers:
- JSON contract freeze (lands first, before any Step 1 model changes):
test/unit/tool_util/workflow_state/test_report_json_contract.py. For each of the six tree reports, build a representative fixture and snapshotmodel_dump(by_alias=True, mode="json"). After Step 1 lands, update the goldens and review the diff — it must be additive-only. Catches any accidental breakage of the TS mirror’s or IWC CI’s JSON expectations. - Snapshot tests (new, per CLI):
test/unit/tool_util/workflow_state/test_report_templates.py. Feed a hand-built fixtureTreeValidationReport(etc.) into the renderer, compare output to a checked-in.mdgolden. Runpytest --snapshot-updateonly intentionally. This gives us the red-before-green for Step 4. - Regression parity (one-shot): before deleting each
format_*_markdown, capture its output against the shared fixtures as a checked-in golden. The Jinja template must match the golden exactly (pure port) or the golden must be updated deliberately in the same commit (intentional improvement). No dual-rendering shim phase — goldens are the bridge.
Also: extend test_iwc_sweep.py (gated on GALAXY_TEST_IWC_DIRECTORY) with one
run per CLI that writes --report-markdown to a tempfile and asserts non-empty
- parseable structure (section count, table headers present).
Step 6 — Nunjucks compatibility gate (removed)
Decided against a dedicated CI gate. The templates stay within the
Jinja2/Nunjucks shared subset by convention (documented in _macros.md.j2
header); the npm package itself will validate compatibility when consumed.
Step 7 — TypeScript consumer wiring (removed)
Out of scope for this effort. TS consumption will be handled separately when the TS mirror project is ready.
Markdown report utility review + improvement suggestions
Reviewing each current formatter against its JSON counterpart:
validate_tree (strong baseline)
- Works well: per-category tables, failure details appendix, connection validation section spliced in.
- Gaps:
- No overall status banner (PASS/FAIL) at the top — a human scanning a CI artifact has to read the counts line. Add an explicit badge-style line.
- Failure details are flat bullets with the full relative path every time;
grouping by workflow (
### workflow→ bullet list) is easier to read for IWC-scale output (120 workflows). - No link anchors from the category table rows to the failure details, so clicking through a GH preview is manual scrolling.
- Connection section emits a
## Connections: <path>block for every workflow that has aconnection_report, including fully-valid ones with zero invalid connections. Should be suppressed when the connection report is fully green.
clean_tree (good)
- Works well: affected table + per-workflow details section.
- Gaps:
- “Clean workflows” section currently says only “All other workflows have
no stale keys.” Useful to list them (optionally, collapsed under a
<details>tag, which GH-flavored markdown supports) so reviewers can confirm coverage. - No aggregate “stale keys by classification” — we already have the category
info in
stale_keys.py. A summary histogram (e.g.bookkeeping: 45, removed_param: 12, case_index: 8) would tell the reader why things are being cleaned at a glance.
- “Clean workflows” section currently says only “All other workflows have
no stale keys.” Useful to list them (optionally, collapsed under a
roundtrip_tree (this one most needs improvement)
- Works well: status per workflow, benign/error diff counts.
- Gaps:
- Currently just a flat bullet list — 120 IWC workflows produces a wall of
text. Needs per-category tables like
validate_tree. - No “tool failure modes” aggregation. When a roundtrip fails across multiple
workflows it is almost always the same tool(s) — a top-N table of offending
tool_id→ failure count is the single most actionable artifact for this report. - Benign-diff workflows are reported identically to clean ones aside from a parenthetical count; a separate collapsed section listing the benign patterns observed (“empty-repeat normalization”, “multi-select scalar→list”) would make the benign classification system visible.
conversion_failure_linesare currently sub-bullets; a small dedicated table with columns (workflow, step, tool, failure_class) is easier to scan.
- Currently just a flat bullet list — 120 IWC workflows produces a wall of
text. Needs per-category tables like
export_tree / to_native_tree (sparse)
- Identical structure:
# Title+ bullet list + summary line. Feels like a stub. - Gaps:
- No category grouping even though the tree walker has category info.
- No per-tool fallback aggregation — for export/to-native, the thing you actually want to know is “which tools keep hitting the fallback path”, same argument as roundtrip.
- No section listing skipped workflows with skip reasons distinct from errors.
- Suggest standardizing on the same structure as
validate_tree(header + category tables + failure appendix) via shared macros.
lint_tree (adequate)
- Works well: single wide table is easy to scan.
- Gaps (model changes deferred — see Step 1 “out of scope”):
- Lint errors/warnings are shown as counts only; the actual lint messages
(which
LintContexthas) are dropped from the markdown. Fixing this needsLintWorkflowResult.lint_messagesplus plumbing throughrun_lint_stateful_tree’sprocess_one— model and pipeline change, not just a template edit. - Exit code semantics aren’t documented in the report itself (0 vs 1 vs
EXIT_CODE_LINT_FAILED). Needs a newexit_code_meaningfield onLintTreeReport.
- Lint errors/warnings are shown as counts only; the actual lint messages
(which
Cross-cutting improvements to bake into the shared macros
- Consistent H1:
# <Operation>: <root>, consistent summary badge line (**Status:** PASS | **Workflows:** N | ...), consistent “Summary” section. - A single macro
workflow_row(r)that handles the error/skipped/ok cases — currently every formatter reimplements the same three-branch logic. - Emit
generated_atand the galaxy-tool-util version in a footer (from themetadict passed torender_report). Useful for diffing reports over time. - GitHub-flavored
<details>collapsibles for long lists (clean workflows, benign-diff details) so the rendered output is scannable but complete. - Anchor links from summary tables to detail sections.
None of the above require template features outside the Jinja2/Nunjucks shared subset.
Deliverable summary
- Computed-field additions on report models (Step 1). ✅
_report_templates.pyrendering module (Step 2). ✅templates/reports/scaffolding —_macros.md.j2+connection_section.md.j2(Step 3). ✅- Per-CLI
*_tree.md.j2templates + swap offormat_*_markdown→ Jinja renderer (Step 4).- export_format2 ✅ · to_native_stateful ✅ · lint_stateful ✅ · clean ✅ · validate ✅ · roundtrip ✅
- Snapshot + parity tests (Step 5) — JSON contract freeze ✅, per-CLI goldens landing alongside each Step 4 commit (6 of 6 done).
Nunjucks compat CI gate(removed — convention-enforced, not CI-gated).TypeScript consumer wiring(removed — out of scope).- Updated report content per review above — un-bundled from Step 4 in practice: pure ports first, content improvements as follow-up commits after all six ports land.
Unresolved questions
- Meta footer: include galaxy-tool-util version? Git SHA? Opt-in?
- Collapse
<details>blocks — acceptable in all consumer surfaces, or do some render raw markdown where<details>won’t work? - Do we want a
--report-htmlmode later that reuses the same templates with a different macro set? If yes, factor macros to minimize rework now. - Do the Step 1 computed_field additions warrant a JSON schema version bump for downstream consumers, or is additive-only (enforced by the Step 5 contract freeze) sufficient?
Resolved (no longer unresolved):
- Snapshot lib: plain golden files.
syrupyis not currently a dep. - Shim vs hard cut: hard cut. Goldens are the bridge.
connection_section.md.j2: both —format_connection_markdownis already standalone and spliced intovalidate_treetoday; the template port mirrors that.- Roundtrip “tool failure modes” aggregation: compute in Python as a
computed_field on
RoundTripTreeReportto keep templates inside the Jinja2/Nunjucks shared subset. - Nunjucks CI gate: removed — convention-enforced via
_macros.md.j2header, not a separate CI step. - TS consumer wiring: out of scope for this effort.