NFCORE_TO_GALAXY_TOOLS

NF-Core Modules → Galaxy Tools

Plan for an IUC-shaped repository that converts nf-core modules into Galaxy tool wrappers, reviewed largely by Claude / Foundry-cast skills.

Two halves:

  1. Infrastructure plan — the new repo, CI (planemo + deterministic lint hooks), local-skill review, Foundry molds that drive it.
  2. Tooling roadmap — wave-based selection, high-value first, deferring hard cases.

1. Strategic Frame

2. XML vs YAML — Decision

Recommend XML. UDT YAML (GalaxyUserTool, [[author-galaxy-tool-wrapper]]) is for ad-hoc user tools and remains in the Foundry as one path. Tool Shed distribution requires XML.

Reasons:

YAML stays useful for:

The new repo accepts XML only. The existing UDT path keeps its niche.

3. New Repository

Layout (mirrors tools-iuc)

tools/<name>/<name>.xml          # primary wrapper
tools/<name>/macros.xml          # tool-local macros
tools/<name>/test-data/          # rare; only when a remote URL fixture won't fit (see §3 Test data)
tools/<name>/.shed.yml           # Tool Shed metadata
tools/<name>/_provenance.yml     # NEW — nf-core source SHA, conversion mold version, reviewer signoff
data_managers/                   # later
.github/workflows/pr.yaml        # planemo-ci-action on changed tools
.github/workflows/ci.yaml        # weekly global lint+test
CONTRIBUTING.md                  # human + agent onboarding
AGENTS.md                        # how the cast skill is consumed

_provenance.yml — new file per tool

Contract: every tool wrapper points back at the upstream module it was derived from. The pin survives nf-core churn and signals when a refresh is due. The nfcore_source block mirrors the shape of nf-core/modules’s own modules.json (git_sha, branch), so nf-core readers find familiar fields (see nf-core-tools/nf_core/modules/modules_json.py:27–31).

nfcore_source:
  modules_repo: nf-core/modules         # mirrors modules.json: per-module entry
  module_path: modules/nf-core/fastp
  branch: master                        # modules.json field
  git_sha: <sha at conversion time>     # modules.json field — the canonical "version" of the module
  meta_yml_hash: <sha256 of meta.yml at conversion>
  main_nf_hash:  <sha256 of main.nf at conversion>
  environment_yml_hash: <sha256 of environment.yml at conversion>
  test_datasets_sha: <sha pinning nf-core/test-datasets URLs in <test> blocks>
generated:
  by_mold: convert-nfcore-module-to-galaxy-tool   # our analog to modules.json `installed_by`
  mold_revision: <semver>
  cast_target: claude
  cast_artifact_sha: <sha of cast bundle used>
  on_date: 2026-05-09
overrides:
  # any human edit beyond the cast output
  - reason: "..."
    files: ["fastp.xml"]
review:
  reviewer: claude-cast/review-galaxy-tool-pr@<sha>
  signoff_at: 2026-05-09T12:00:00Z
  unresolved: []

A drift checker compares current upstream hashes vs _provenance.yml and opens an issue when an nf-core module has moved past the pin.
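
A minimal sketch of the hash comparison such a drift checker could perform. It assumes a local checkout of the upstream module directory and an already-parsed nfcore_source block; the names HASHED_FILES, sha256_of, and find_drift are hypothetical, and opening the issue is left out.

```python
import hashlib
from pathlib import Path

# Maps _provenance.yml hash fields to the upstream file each one covers.
HASHED_FILES = {
    "meta_yml_hash": "meta.yml",
    "main_nf_hash": "main.nf",
    "environment_yml_hash": "environment.yml",
}


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(nfcore_source: dict, module_dir: Path) -> list[str]:
    """Return upstream files whose current hash differs from the pinned one.

    `nfcore_source` is the parsed nfcore_source block (assumption: parsing
    happens elsewhere). A file missing upstream is reported as drift too.
    """
    drifted = []
    for field, filename in HASHED_FILES.items():
        upstream = module_dir / filename
        current = sha256_of(upstream) if upstream.is_file() else None
        if current != nfcore_source.get(field):
            drifted.append(filename)
    return drifted
```

An empty result means the module has not moved past the pin; any non-empty result is a candidate for a refresh issue.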

Versioning

Three orthogonal concerns, tracked in different places:

Concern | Source | Where it lives
Upstream tool version | environment.yml bioconda pin (bioconda::fastp=0.23.4) | <tool_version> in tool XML; matches IUC convention
Module pin | nf-core/modules git SHA (no per-module semver; modules evolve at HEAD per nf-core-tools/nf_core/modules/lint/module_version.py:15–29) | _provenance.yml.nfcore_source.git_sha
Wrapper iteration | XML wrapper changes that don’t move the bioconda pin | +galaxyN build tag

Tool XML version string: <bioconda_version>+galaxy<N>, strict IUC convention. Don’t embed the nf-core/modules SHA in the version string — Tool Shed version_compare expects the IUC shape and every other Galaxy wrapper follows it. The SHA lives in _provenance.yml.
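
The version shape can be checked and bumped mechanically. The sketch below is illustrative only: parse_version and next_version are hypothetical helpers, and resetting to +galaxy0 when the bioconda pin moves is assumed here as the usual IUC practice rather than something this plan states.

```python
import re

# <bioconda_version>+galaxy<N>, e.g. "0.23.4+galaxy0"
_VERSION_RE = re.compile(r"^(?P<tool>[^+\s]+)\+galaxy(?P<build>\d+)$")


def parse_version(version: str) -> tuple[str, int]:
    """Split a wrapper version into (bioconda_version, galaxy build number)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not in <bioconda_version>+galaxy<N> form: {version!r}")
    return match.group("tool"), int(match.group("build"))


def next_version(current: str, bioconda_version: str) -> str:
    """Bump +galaxyN for a wrapper-only change.

    Assumption: when the bioconda pin itself moves, the build tag restarts at 0.
    """
    tool, build = parse_version(current)
    if tool == bioconda_version:
        return f"{tool}+galaxy{build + 1}"
    return f"{bioconda_version}+galaxy0"
```

A version string that embeds anything other than the +galaxyN tag, such as a module SHA, fails parse_version, which is the behavior the convention calls for.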

+galaxyN bumps for any wrapper-only change, including:

Runtime version sink: nf-core’s versions: channel idiom in main.nf (e.g., eval('fastp --version 2>&1 | sed ...')) maps to Galaxy’s <version_command> — see pattern nfcore-versions-emit-to-galaxy-version-command.md.

Test data — remote URLs, not lifted fixtures

Galaxy 23.1+ tool XML supports a location attribute (xs:anyURI) on <param> and <output> elements inside <test> blocks. We pin to nf-core/test-datasets raw GitHub URLs at a SHA — no fixture lift, no LFS, no test-data/ bloat. The framework downloads, caches in GALAXY_TEST_DATA_REPO_CACHE, and (optionally) verifies a checksum="<algo>$<hex>".

Evidence in galaxy:

<test>
  <!-- location-only: filename inferred from URL basename -->
  <param name="reads_1"
         location="https://raw.githubusercontent.com/nf-core/test-datasets/&lt;pinned_sha&gt;/data/genomics/sarscov2/illumina/fastq/test_1.fastq.gz"/>
  <param name="reads_2"
         location="https://raw.githubusercontent.com/nf-core/test-datasets/&lt;pinned_sha&gt;/data/genomics/sarscov2/illumina/fastq/test_2.fastq.gz"/>
  <!-- value + location: local-first, remote fallback (rare; only when a tool authors a small fixture in test-data/ but wants a remote backup) -->
  <param name="adapters" value="adapters.fa"
         location="https://raw.githubusercontent.com/nf-core/test-datasets/&lt;pinned_sha&gt;/data/delete_me/adapters.fa"/>
  <!-- output with sha256 checksum -->
  <output name="trimmed"
          file="trimmed.fastq.gz"
          location="https://raw.githubusercontent.com/nf-core/test-datasets/&lt;pinned_sha&gt;/data/genomics/sarscov2/illumina/fastq/test_trimmed.fastq.gz"
          checksum="sha256$&lt;hex&gt;"/>
</test>

Conventions:

Containers — match upstream

Mirror the upstream module’s container choice. The Galaxy <requirements> block declares package + version matching environment.yml exactly; bioconda → BioContainers → singularity-at-depot.galaxyproject.org is the same resolution chain nf-core’s biocontainers/... branch hits, so the runtime image is the same byte stream. This is what lint-container-parity enforces.

Where nf-core’s container directive uses a multi-package mulled image (seqera mulled or community.wave.seqera.io/library/...), the Galaxy wrapper declares the same multi-package set in <requirements> so bioconda mulled-resolution produces an equivalent image. Don’t substitute a hand-picked container source — bioconda parity with the upstream module is the contract. Document any forced divergence in _provenance.yml.overrides.
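
One way the package and version parity check could look, as a sketch rather than the actual lint-container-parity implementation. It assumes the dependency entries have already been read from environment.yml as plain name=version pins, and that the tool XML is macro-expanded so requirement versions are literal; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pins_from_environment(dependencies: list[str]) -> dict[str, str]:
    """Turn entries such as 'bioconda::fastp=0.23.4' into {'fastp': '0.23.4'}."""
    pins = {}
    for entry in dependencies:
        spec = entry.split("::", 1)[-1]  # drop the channel prefix, if any
        name, _, version = spec.partition("=")
        pins[name.strip()] = version.strip()
    return pins


def pins_from_requirements(tool_xml: str) -> dict[str, str]:
    """Collect {package: version} from <requirement type="package"> elements."""
    root = ET.fromstring(tool_xml)
    return {
        (req.text or "").strip(): req.get("version", "")
        for req in root.iter("requirement")
        if req.get("type") == "package"
    }


def parity_problems(dependencies: list[str], tool_xml: str) -> list[str]:
    """List every package whose pin differs between the two sources."""
    expected = pins_from_environment(dependencies)
    declared = pins_from_requirements(tool_xml)
    problems = []
    for name in sorted(expected.keys() | declared.keys()):
        if expected.get(name) != declared.get(name):
            problems.append(
                f"{name}: environment.yml={expected.get(name)!r} "
                f"requirements={declared.get(name)!r}"
            )
    return problems
```

Because the comparison runs over the union of package names, a multi-package mulled set that is only partly declared in <requirements> shows up as a problem too.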

Macros — per-tool, no shared cross-family macros

Each tool’s macros.xml lives next to its tool.xml. There is no top-level macros/ directory and no cross-family shared macros — nf-core modules are self-contained, and this repo follows that structure. Revisit only if a concrete duplication pain emerges across multiple converted tools.

License

MIT, propagated from nf-core. Each tool.xml’s <citations> block plus the repo LICENSE carry MIT; double-check per-tool upstream licenses (some bundled binaries ship under non-MIT terms — record divergences in _provenance.yml.overrides).

CI stack

Tool Shed publication

4. Foundry Side — New Molds, Patterns, References

New molds (proposed slugs)

Two molds. Earlier drafts had six; the rest are dropped as YAGNI:

The kept molds:

New schema packages

New CLI manual pages

New / extended pattern pages

New prompts

(xml-tool-structured / xml-tool-critic were proposed as XML twins of speculative custom-tool-* prompts and dropped as YAGNI; the convert Mold’s body + the [[galaxy-xsd]] reference + planemo-lint feedback loop carry the same load.)

Cast targets

5. Local Review — Dimensions

The review-galaxy-tool-pr skill is run locally by the reviewer / merger before merge — not by CI. It produces a structured report with these dimensions, each scored pass | warn | fail | n/a plus rationale:

  1. Provenance integrity: _provenance.yml exists, hashes match upstream, and the pinned SHA is reachable.
  2. Parameter parity: every task.ext.args-able flag the upstream tool exposes is either a Galaxy param, part of the additional-options bag, or documented as intentionally hidden.
  3. Command-line fidelity: the rendered Galaxy command, given the test inputs, matches the rendered Nextflow script: block on the same conceptual fixture, modulo path differences.
  4. Output discovery: every output: channel from main.nf has a Galaxy <data> / <collection> (or a documented drop, e.g., versions.yml).
  5. Container / requirement parity: Galaxy <requirements> versions match environment.yml. The container directive’s BioContainers branch is present and reachable.
  6. Test parity: at least one <test> block consumes an nf-test fixture by remote location URL pinned to the _provenance.yml test_datasets_sha, with <output location=… checksum=sha256$…> on outputs where feasible. Assertions are concrete (file size, key tokens, format match), not just has_text=" ".
  7. XML idiom: planemo lint clean. Profile set. <citations> present (DOI from meta.yml). <help> non-empty. detect_errors="exit_code" unless commented otherwise.
  8. Pattern compliance: meta-map handling, paired/unpaired branching, versions emit, and the additional-options bag are all consistent with the relevant Foundry patterns.
  9. Scope fit: the wrapper isn’t trying to fold a subworkflow’s worth of logic into one tool. Subworkflows are out of scope (separate nfcore-subworkflow-to-galaxy-workflow Mold, not in this plan).

Reviewer output schema is the source of truth for the GH PR comment renderer; the comment is a deterministic projection of the JSON.
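
To illustrate what a deterministic projection means here, the sketch below renders a comment from a report. The JSON shape (a dimensions list with dimension, status, rationale) and the names DimensionResult and render_comment are assumptions made for the example, not the actual reviewer schema.

```python
import json
from dataclasses import dataclass

STATUSES = ("pass", "warn", "fail", "n/a")


@dataclass(frozen=True)
class DimensionResult:
    dimension: str
    status: str
    rationale: str

    def __post_init__(self):
        # Reject anything outside the four scores the review skill may emit.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def render_comment(report_json: str) -> str:
    """Project the report onto comment text; same JSON in, same text out."""
    items = json.loads(report_json)["dimensions"]
    results = [DimensionResult(**item) for item in items]
    verdict = "fail" if any(r.status == "fail" for r in results) else "ok"
    lines = [f"- {r.dimension}: {r.status} ({r.rationale})" for r in results]
    return "\n".join([f"Review verdict: {verdict}", *lines])
```

The renderer reads nothing but the JSON (no timestamps, no environment), which is what keeps the comment reproducible from the stored report.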

6. Conversion Flow (per module)

Per-module flow is short enough to describe in prose; not promoted to a Foundry pipeline note (YAGNI — pipelines exist for whole-source-to-target journeys, not per-tool conversion).

  1. Run the cast convert-nfcore-module-to-galaxy-tool skill against the module dir. Emits XML + _provenance.yml + remote-URL <test> blocks.
  2. Run planemo lint then planemo test locally; iterate inside the cast skill until clean.
  3. Run the cast review-galaxy-tool-pr skill against the resulting diff (local, pre-merge).
  4. Open the PR (gh pr create).

Per-tool is the unit; cross-tool batching is a harness/scripting concern, not a Foundry pipeline.
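
Steps 2 and 4 of the flow are plain CLI calls and could be scripted along these lines. This is a sketch under assumptions: the helper names are hypothetical, the runner is injectable so the sequence can be exercised without planemo or gh installed, and the cast-skill steps (1 and 3) are left out because their invocation is not specified here.

```python
import subprocess


def local_check_commands(tool_dir: str) -> list[list[str]]:
    """Step 2: lint first, then test, both against the tool directory."""
    return [
        ["planemo", "lint", tool_dir],
        ["planemo", "test", tool_dir],
    ]


def open_pr_command(title: str, body: str) -> list[str]:
    """Step 4: open the PR with the GitHub CLI."""
    return ["gh", "pr", "create", "--title", title, "--body", body]


def run_until_failure(commands, run=subprocess.run) -> int:
    """Run commands in order; return the index of the first failure, or -1."""
    for index, command in enumerate(commands):
        if run(command).returncode != 0:
            return index
    return -1
```

Stopping at the first failing command matches the iterate-until-clean loop: a lint failure is fixed before the slower test run is attempted.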

7. Wave Strategy

Goal: ship Wave 1 fast to validate plumbing; pull subsequent waves in only as the reviewer’s failure modes shrink.

Selection heuristics

A module is easy if all of:

A module is medium if any of:

A module is hard if any of:

Wave 1 — bootstrap, ~30 tools

Goal: prove the loop works on the simplest end of the corpus.

Probable picks (single-input, single-main-output, simple flags):

Process check: each Wave 1 PR runs the full pipeline; the contributor and the reviewer each run the local review skill before merge; a human merges. Track:

These metrics drive what Wave 2 looks like.

Wave 2 — multi-input, conditionals, paired-aware, ~60 tools

Add modules with:

Foundry deliverables before Wave 2:

Wave 3 — hard / deferred

Long tail

After Wave 3, the residual question becomes: what is Galaxy missing that nf-core has? Feed each refusal into the Foundry’s existing roadmap items (see W1–W7 in [[nf-schema-samplesheet-galaxy-gaps]]). Tool conversion gets us most of the way; the remaining ~5% drives Galaxy core / gxformat2 features.

8. Phased Rollout (calendar)

Phase 0 — Plumbing (1–2 weeks)

Phase 0 progress (2026-05-10)

Foundry artifacts now in-tree on branch module_farm:

Validates clean (0 errors, 18 warnings — same as baseline). Pattern bodies are scaffolding only; they flesh out next as the Mold drives convergence.

Phase 1 — Wave 1 (4–6 weeks)

Phase 2 — Wave 2 (8–12 weeks)

Phase 3 — Wave 3 + maintenance (open-ended)

9. Risk


10. What This Plan Does Not Cover