Foundry operations log

Append-only journal for cast, lint (planned), and query operations. Excluded from the validator and the Astro collection.

2026-05-01 — cast run: summarize-nextflow on nf-core/demo (45904cb)

Cast: casts/claude/summarize-nextflow/ (hand-cast, rev 1)
Target: nf-core/demo @ 1.1.0 (workflow-fixtures/pipelines/nf-core__demo, tier=tiny)
Output: casts/claude/summarize-nextflow/runs/nf-core__demo/summary.json (schema-valid, ajv clean).

Gaps surfaced (issue #17 — paragraphs land in the named note's ## Open gaps):

Schema verdict (single-pipeline sample, biased toward the easy case):

Next: run the same skill against nf-core/bacass (small, mid-tier) before graduating to rnaseq or sarek.

2026-05-01 — schema rev 2 + mold rev 3 + cast rev 2 (re-cast against nf-core/demo)

Schema (content/schemas/summary-nextflow.schema.json) rev 2: Tool.wave, ChannelIO.topic, Subworkflow.kind added; Tool.biocontainer widened to docker.io alias; Process.container/Process.conda re-described as verbatim directive text; Workflow.name documents the multi-workflow selection rule. Schema-note summary-nextflow.md rev 2.

Mold (content/molds/summarize-nextflow/index.md) rev 3: §1 names the multi-workflow selection rule; §4 captures verbatim directives + channel topics; §5 5-pattern resolver (biocontainer-quay, biocontainer-docker.io alias, wave, depot.galaxyproject.org/singularity, fallthrough docker) + file-path conda; §6 utility-vs-pipeline subworkflow split + free-function call handling.
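The five §5 container patterns can be sketched as a prefix classifier over the image reference. This is a minimal, hypothetical sketch rather than the mold's actual rules: the kind labels, the exact prefixes tested (quay.io/biocontainers/, biocontainers/ with an optional docker.io/, depot.galaxyproject.org/singularity/), and treating any wave.seqera.io host as Wave are assumptions. File-path conda handling is not covered.

```typescript
// Hypothetical labels for the five container patterns named in mold §5.
type ContainerKind =
  | "biocontainer-quay"
  | "biocontainer-dockerio-alias"
  | "wave"
  | "galaxy-depot-singularity"
  | "docker";

interface ContainerRef {
  kind: ContainerKind;
  image: string; // reference with quotes and any URL scheme removed
}

function classifyContainer(raw: string): ContainerRef {
  // Drop surrounding quotes, then a scheme such as https:// (depot URLs carry one).
  const image = raw
    .trim()
    .replace(/^['"]|['"]$/g, "")
    .replace(/^[a-z]+:\/\//, "");
  if (image.startsWith("quay.io/biocontainers/")) {
    return { kind: "biocontainer-quay", image };
  }
  // Assumption: the alias is the short "biocontainers/..." form, with or without docker.io/.
  if (/^(docker\.io\/)?biocontainers\//.test(image)) {
    return { kind: "biocontainer-dockerio-alias", image };
  }
  // Assumption: Wave images are served from wave.seqera.io or one of its subdomains.
  const host = image.split("/", 1)[0] ?? "";
  if (host === "wave.seqera.io" || host.endsWith(".wave.seqera.io")) {
    return { kind: "wave", image };
  }
  if (image.startsWith("depot.galaxyproject.org/singularity/")) {
    return { kind: "galaxy-depot-singularity", image };
  }
  // Fallthrough: any other reference is treated as a plain Docker image.
  return { kind: "docker", image };
}
```

Ordering matters in this sketch: the quay and alias checks run first so that a biocontainer never falls through to the generic Docker bucket.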

Cast (casts/claude/summarize-nextflow/) rev 2: SKILL.md regenerated; schema recopied; provenance bumped with cast_history. Notes still stubs.

Re-cast on nf-core/demo: schema-valid (ajv clean), warnings down from 8 → 2. Of the 13 gaps logged in the prior entry, 9 are now structurally captured in the schema; 4 remain (snapshot-fixture structure, free-function-call modeling, accumulator-channel evolving sources, samplesheet-URL fetch policy) — kept in the notes' Open gaps sections to grow on next pipeline contact.

2026-05-01 — cast run: summarize-nextflow on nf-core/bacass (76e4b12)

Cast: casts/claude/summarize-nextflow/ rev 2 (mold rev 3, schema rev 2)
Target: nf-core/bacass @ 2.5.0 (workflow-fixtures/pipelines/nf-core__bacass, tier=small, bigger than tier=tiny suggested; 33 processes / 7 subworkflows / 694-line workflow body)
Output: casts/claude/summarize-nextflow/runs/nf-core__bacass/summary.json (schema-valid, ajv clean).

Schema held: required fields covered all 33 processes, both kinds of subworkflow, multi-tool envs, mulled containers, and a 20-conditional gating tree. No fields had to be force-fitted. Warnings: 14, all structural-coverage warnings rather than schema failures.

New gaps surfaced (different from demo):

What survived from the rev 2 schema unchanged: container ternary handling, file-path conda directives, channel topics (none in bacass; version aggregation here uses the older path "versions.yml" style), Subworkflow.kind, and the multi-workflow selection rule (BACASS picked correctly).
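For reference, nf-core modules typically write the container directive as a ternary on workflow.containerEngine, with a depot.galaxyproject.org Singularity URL in the true branch and a biocontainers Docker reference in the false branch. A minimal sketch of pulling both literals out of the verbatim directive text follows; the function name and return shape are hypothetical, and directives built by string interpolation are deliberately left unresolved (null) rather than guessed.

```typescript
interface ContainerBranches {
  singularity: string | null; // true branch of an nf-core-style ternary
  docker: string | null; // false branch, or the only literal
}

// Pulls image literals out of verbatim container-directive text. Handles the
// "cond ? 'A' : 'B'" ternary and a single quoted literal; anything else (for
// example a value built by interpolation) comes back as nulls.
function splitContainerTernary(directive: string): ContainerBranches {
  const ternary = directive.match(/\?\s*(['"])([^'"]+)\1\s*:\s*(['"])([^'"]+)\3/);
  if (ternary) {
    return { singularity: ternary[2] ?? null, docker: ternary[4] ?? null };
  }
  const literal = directive.match(/^\s*(?:container\s+)?(['"])([^'"$]+)\1\s*$/);
  return { singularity: null, docker: literal?.[2] ?? null };
}
```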

What bacass didn't exercise: the Wave registry (only multiqc 1.33 used Wave in demo; bacass uses multiqc 1.27 from bioconda).

Schema verdict (two-pipeline sample, demo + bacass):

Next: keep the mulled-container per-component breakdown deferred (the mold §5 caveats already defer it); add Process.aliases in the next schema bump if a downstream Mold (likely author-galaxy-tool-wrapper) confirms it needs the canonical→alias mapping. Otherwise, this gap stays logged for the third pipeline to confirm the pattern.

2026-05-01 — schema rev 3 + mold rev 4 + cast rev 3 (aliases + nf_tests)

Promoted two bacass-surfaced patterns straight to schema rather than waiting for a third pipeline. Both are universal in nf-core templates (alias-multiplexing of modules, per-profile .nf.test files); the "wait and see" rule is for ambiguous patterns, not unambiguous ones.

Schema rev 3 (content/schemas/summary-nextflow.schema.json):

Schema-note summary-nextflow.md rev 3.

Mold (content/molds/summarize-nextflow/index.md) rev 4: §4 grew the alias-sweep rule (recover aliases from include { X as Y } imports across main.nf, workflows/, and subworkflows/). §7 split into test_fixtures + nf_tests[] enumeration with explicit SnapshotFixture extraction (captures, helpers, ignore_files, ignore_globs, snap_path).
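The alias sweep can be sketched as a scan for Nextflow DSL2 include statements of the form include { X as Y } from '...', where several entries inside the braces are separated by ;. The names sweepAliases, aliasMap, and AliasEntry are hypothetical, and this regex-based sketch does not skip commented-out includes.

```typescript
interface AliasEntry {
  canonical: string;
  alias: string;
  from: string; // module path as written in the include
}

// Recovers "include { X as Y } from '...'" aliases from one .nf source file.
function sweepAliases(source: string): AliasEntry[] {
  const out: AliasEntry[] = [];
  const includeRe = /include\s*\{([^}]*)\}\s*from\s*(['"])([^'"]+)\2/g;
  for (const m of source.matchAll(includeRe)) {
    const body = m[1] ?? "";
    const from = m[3] ?? "";
    // Entries are separated by ';' (newlines are tolerated as well).
    for (const entry of body.split(/[;\n]/)) {
      const parts = entry.trim().match(/^(\w+)\s+as\s+(\w+)$/);
      if (parts) out.push({ canonical: parts[1]!, alias: parts[2]!, from });
    }
  }
  return out;
}

// Groups entries into the canonical -> aliases mapping, dropping duplicates.
function aliasMap(entries: AliasEntry[]): Map<string, string[]> {
  const map = new Map<string, string[]>();
  for (const { canonical, alias } of entries) {
    const list = map.get(canonical) ?? [];
    if (!list.includes(alias)) list.push(alias);
    map.set(canonical, list);
  }
  return map;
}
```

Un-aliased includes (include { MULTIQC } from ...) produce no entry, so only modules that are actually multiplexed show up in the mapping.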

Cast (casts/claude/summarize-nextflow/) rev 3: SKILL.md regenerated; schema recopied; provenance bumped with cast_history. Notes still stubs.

Re-cast both pipelines:

Of the 17 bacass gaps logged in the prior entry, 4 are now structurally captured (process aliasing, multi-test-profile enumeration, snapshot helper extraction, snapshot ignore-globs). 13 remain (mulled containers, multi-dep envs, racon env bug, multiMap/.branch/.cross, meta-mutation closures, conditional channel construction, .mix-then-reassign, .dump pervasiveness, exit/log.error, subworkflow-typed-value-inputs, empty-list-literal channel inputs, params runtime concatenation, params runtime existence checks). These are held over for the third pipeline.

2026-05-02 — summarize-nextflow eval.md + 7-fixture sweep

Added content/molds/summarize-nextflow/eval.md (14 cases: schema/fidelity/utility/regression buckets). Ran @galaxy-foundry/summarize-nextflow against all seven pinned fixtures (demo, fetchngs, hlatyping, bacass, rnaseq, sarek, taxprofiler) for the first time end-to-end.

Schema-conformance (eval bucket "schema"): PASS for all 7. Every fixture exits 0 with --validate enabled. additionalProperties: false holds.

Fidelity findings (eval bucket "fidelity"):

Regression (eval bucket "regression"): FAIL on both committed runs. demo diff 822 lines, bacass diff 4624 lines. Sampling the head of demo: committed run has hand-curated param descriptions ("Path to a samplesheet CSV.") while the CLI emits the verbatim nextflow_schema.json text ("Path to a metadata file containing information about the samples in the experiment."). The committed runs predate the strict-deterministic CLI; they are no longer the right regression baseline. Either re-baseline (commit the new CLI outputs as the canonical run) or drop the regression cases until a "v1 frozen" reference run is established.
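Whichever option is chosen, a text diff against a re-baselined run only measures content if serialization is stable. If the CLI's determinism does not already cover object-key order (an assumption; this entry doesn't say), one option is to canonicalize before writing or diffing. A minimal sketch, where canonicalJson and sortKeys are hypothetical helpers rather than part of the CLI:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively sorts object keys (array order is preserved) so two runs that
// differ only in key insertion order produce identical text.
function sortKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) out[key] = sortKeys(value[key] as Json);
    return out;
  }
  return value;
}

function canonicalJson(value: Json): string {
  return JSON.stringify(sortKeys(value), null, 2) + "\n";
}
```

JSON.stringify still emits integer-like keys in ascending numeric order ahead of other keys, so the output is deterministic but not strictly lexicographic for such keys.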

Utility (eval bucket "utility"): not exercised. Requires running nextflow-summary-to-galaxy-data-flow and author-galaxy-tool-wrapper against the new outputs. Left open as the next step.

Recommended next steps:

  1. Re-baseline runs/nf-core__{demo,bacass}/summary.json with the current CLI output, after deciding on the nf_tests scoping question (module-level in or out).
  2. Add the 5 new fixtures (fetchngs, hlatyping, rnaseq, sarek, taxprofiler) as runs under runs/ if they are intended baselines.
  3. Investigate process-count divergence on bacass (34 vs naive 31) before declaring the fidelity case authoritative.
  4. Investigate the uniform 2-warning count — either the warnings system is under-sampling or the corpus genuinely doesn't trigger §"Caveats" predicates.
  5. Run the utility cases (data-flow Mold + tool-wrapper Mold consumers) against bacass output to surface missing fields / open gaps.

2026-05-03 — summarize-nextflow ad-hoc fixture sweep (overfitting check)

Raised issue #110, then added 8 ad-hoc DSL2 pipelines to workflow-fixtures/fixtures.yaml (new field flavor: adhoc; the existing 7 nf-core entries are marked flavor: nf-core). Stress-test goal: confirm the resolver doesn't silently degrade against pipelines that don't follow the nf-core template.

Result: severe overfitting. The resolver works only on nf-core layouts.

Hard failures (2/8)

Pipeline-root detection (packages/summarize-nextflow/src/resolver.ts:121) requires <root>/nextflow.config and offers no fallback for nested-pipeline or non-standard-root layouts.
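A minimal sketch of the pipeline-root auto-detect described as fix 3 in the fix-list, assuming a breadth-first search from the given path: the shallowest directory holding nextflow.config wins; otherwise the shallowest directory with a .nf file containing an anonymous workflow { block; otherwise null (today's hard failure). The Fs interface, the SKIP_DIRS contents, and the detectPipelineRoot name are hypothetical; the returned reason is one way to surface the chosen root in warnings.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Minimal filesystem view so the search can run against a real tree or an in-memory fixture.
interface Entry {
  name: string;
  isDir: boolean;
}
interface Fs {
  list(dir: string): Entry[];
  read(path: string): string;
}

const nodeFs: Fs = {
  list: (dir) =>
    readdirSync(dir, { withFileTypes: true }).map((d) => ({ name: d.name, isDir: d.isDirectory() })),
  read: (path) => readFileSync(path, "utf8"),
};

// Hypothetical skip list; vendored submodules would need adding per repository.
const SKIP_DIRS = new Set([".git", "work", ".nextflow"]);
// Anonymous entry workflow: "workflow {" at the start of a line.
const ENTRY_WORKFLOW = /^[ \t]*workflow[ \t]*\{/m;

interface RootChoice {
  root: string;
  reason: "nextflow.config" | "entry-workflow";
}

function detectPipelineRoot(start: string, fs: Fs = nodeFs): RootChoice | null {
  const queue: string[] = [start];
  let fallback: string | null = null;
  while (queue.length > 0) {
    const dir = queue.shift()!;
    const entries = fs.list(dir);
    // A nextflow.config at any depth beats the entry-workflow fallback.
    if (entries.some((e) => !e.isDir && e.name === "nextflow.config")) {
      return { root: dir, reason: "nextflow.config" };
    }
    for (const e of entries) {
      const path = join(dir, e.name);
      if (e.isDir) {
        if (!SKIP_DIRS.has(e.name)) queue.push(path);
      } else if (fallback === null && e.name.endsWith(".nf") && ENTRY_WORKFLOW.test(fs.read(path))) {
        fallback = dir; // first hit in breadth-first order is the shallowest
      }
    }
  }
  return fallback === null ? null : { root: fallback, reason: "entry-workflow" };
}
```

One trade-off of this ordering: a stray nextflow.config deeper in the tree (for example, one used only for tests) still wins over a shallower entry-workflow directory.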

Silent degradation (6/8)

The other six exit 0 and emit schema-valid but empty summaries. Process counts (resolver-emitted vs filesystem grep ^process ):

| pipeline | emitted | fs grep | gap reason |
|---|---|---|---|
| CalliNGS-NF | 0 | 11 | 11 processes in a single modules.nf file at root |
| mcmicro | 0 | 17 | flat modules/<name>.nf (one file per module, not nf-core's per-dir) |
| nf-demos | 0 | 9 | no modules/ dir; processes in <dir>/<file>.nf |
| crispr-process-nf | 0 | 12 | processes inline in main.nf itself |
| What_the_Phage | 0 | 94 | flat modules/<tool>.nf + custom phage.nf entrypoint |
| wf-human-variation | 0 | 99 | processes spread across workflows/, lib/, modules/local/<name>.nf |

Root cause

discoverProcessFiles() (resolver.ts:284-286):

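```typescript
// Excerpt; the path import is added for context. walk() is defined elsewhere (not shown).
import { basename, join } from "node:path";
function discoverProcessFiles(pipelineRoot: string): string[] {
  return walk(join(pipelineRoot, "modules")).filter((path) => basename(path) === "main.nf");
}
```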

Two hardcoded nf-core assumptions:

  1. <root>/modules/ exists. Half the ad-hoc pipelines have processes elsewhere (<root>/main.nf, <root>/<custom-name>.nf, <root>/workflows/, <root>/lib/, flat <root>/modules/<x>.nf).
  2. One process per main.nf file. CalliNGS-NF puts 11 processes in one modules.nf; ad-hoc pipelines routinely put multiple processes in a single file. parseProcessFile uses matchOne(/process\s+(...)\{/u) — single match — so even if the file were found, only the first process would be parsed.

Surviving fields

Implications for the resolver

The fix-list, ordered by impact:

  1. Walk all .nf files under the pipeline root (excluding .git/, work/, .nextflow/, vendored submodules) and grep each for ^[ \t]*process [A-Z_][A-Z0-9_]*[ \t]*\{. This single change unblocks 5 of the 6 silent-degradation cases.
  2. Multi-process-per-file support. Replace matchOne with matchAll; emit one processes[] entry per match. Each gets its own module_path (the file) plus a span/offset into the file for IO/script extraction.
  3. Pipeline-root auto-detect. When <path>/nextflow.config is absent, walk down looking for one (handles MOP2's mop_preprocess/); when none found anywhere, fall back to the nearest dir containing a .nf file with a workflow { block (handles egapx's nf/). Surface the chosen root in source or warnings.
  4. Entrypoint detection beyond main.nf. What_the_Phage uses phage.nf; egapx uses ui/main.nf or similar. The resolver should pick the .nf file containing workflow { ... } (named or anonymous) at the chosen root, not require a literal main.nf.
  5. Local-module IO inference. Once 5/6 pipelines start emitting processes, every one of those processes will be a "local" module without meta.yml. The resolver currently leans on meta.yml for IO docs; it'll need a script-block IO inference path (already noted in mold §4 but not implemented).
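Fixes 1 and 2 together might look like the following minimal sketch. It reuses the header pattern from fix 1 (adding a capture group for the name), walks every .nf file outside a skip list, and emits one hit per match with the file as module_path and a character offset standing in for the span. ProcessHit, findProcesses, walkNfFiles, and discoverProcesses are hypothetical names, not the resolver's API. The fix-1 pattern accepts only uppercase identifiers, so pipelines that use lowercase process names would need a wider pattern.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Directories never searched; vendored submodules would be added per repository.
const SKIP_DIRS = new Set([".git", "work", ".nextflow"]);

// Fix 1's header pattern, with a capture group for the name, made global and
// multiline so every process in a file is found (fix 2).
const PROCESS_HEADER = /^[ \t]*process ([A-Z_][A-Z0-9_]*)[ \t]*\{/gm;

interface ProcessHit {
  name: string;
  module_path: string; // the .nf file that defines the process
  offset: number; // character offset of the header line, for later IO/script slicing
}

function findProcesses(modulePath: string, source: string): ProcessHit[] {
  // matchAll (not a single match) yields one hit per process header.
  return Array.from(source.matchAll(PROCESS_HEADER), (m) => ({
    name: m[1] ?? "",
    module_path: modulePath,
    offset: m.index ?? 0,
  }));
}

function walkNfFiles(dir: string): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    // Symlinks report neither isDirectory() nor isFile() here, so they are skipped.
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) out.push(...walkNfFiles(path));
    } else if (entry.isFile() && entry.name.endsWith(".nf")) {
      out.push(path);
    }
  }
  return out;
}

function discoverProcesses(pipelineRoot: string): ProcessHit[] {
  // Sorting keeps output order independent of readdir order.
  return walkNfFiles(pipelineRoot)
    .sort()
    .flatMap((path) => findProcesses(path, readFileSync(path, "utf8")));
}
```

Slicing a file from one hit's offset to the next is one way to approximate the per-process span that fix 2 calls for, though a robust implementation would need brace matching rather than offsets alone.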

Implications for the schema

Probably none from this sweep alone — additionalProperties: false holds, and the schema's required fields are all already satisfiable as empty arrays. The schema isn't overfit; the resolver is.

Implications for the eval

Two new fidelity cases to add to content/molds/summarize-nextflow/eval.md:

Recommended next step

File a follow-up issue: "summarize-nextflow resolver: walk all .nf files, support multi-process-per-file, auto-detect pipeline root." This is independent of issue #110 (per-module tests + meta.yml) and should land first — there's no point capturing per-module tests while the resolver finds zero modules.