convert-nfcore-module-to-galaxy-tool

Convert one nf-core module directory into a Galaxy tool wrapper. Input is a path to modules/nf-core/<name>/ (or any directory of the same shape: main.nf + meta.yml + environment.yml + optional tests/). Output is a self-contained tool dir: tool.xml, macros.xml, _provenance.yml, with <test> blocks pinned to remote nf-core/test-datasets URLs.

The Mold authors a Galaxy tool XML wrapper directly from the nf-core module shape — the regular structure (tuple(meta, path) channels, task.ext.args escape hatch, versions emit, environment.yml bioconda pin) makes a mechanical mapping practical. It does not depend on [[summarize-nextflow]] — that Mold summarizes whole pipelines, the wrong granularity for one module.

The Mold is run per module by an outer harness (a script or human loop). Cross-module batches are not its concern.

Inputs

The Mold expects:

A path to the module directory (modules/nf-core/<name>/, or any local clone of that shape).
Optional module pin: tag, branch, or commit SHA of nf-core/modules. When absent, the cast skill resolves to git rev-parse HEAD of the dir’s containing repo.
Optional test-datasets pin: SHA of nf-core/test-datasets to use for <test> block location URLs. When absent, the cast skill resolves to a recent SHA on the module’s pipeline-of-record branch (best-effort; recorded in _provenance.yml either way).

The Mold does not accept “convert a subworkflow” — meta.yml with a populated components: field is out of scope (composes other modules; route to a separate subworkflow Mold not in this plan).

Outputs

Three files in a sibling output directory the harness specifies:

<output_dir>/
  <name>.xml          # primary wrapper
  macros.xml          # tool-local macros (token, requirements, version_command, citations)
  _provenance.yml     # nfcore source SHA, file hashes, mold revision, generated_at

tool.xml shape (skeleton; idiomatic IUC layout):

<tool id="<name>" name="<name>" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.1">
  <description><!-- meta.yml description, first sentence --></description>
  <macros>
    <import>macros.xml</import>
  </macros>
  <expand macro="requirements"/>
  <expand macro="version_command"/>
  <command detect_errors="exit_code"><![CDATA[
    <!-- script: body translated, $task.ext.args → $extra_args -->
  ]]></command>
  <inputs>
    <!-- per nfcore-channel-input-to-galaxy-collection + nfcore-meta-map-to-galaxy-params -->
    <param name="extra_args" type="text" optional="true" .../>  <!-- per task.ext.args pattern -->
  </inputs>
  <outputs>
    <!-- one Galaxy <data> / <collection> per output: channel, minus the versions channel -->
  </outputs>
  <tests>
    <test>
      <param name="reads_1" location="https://raw.githubusercontent.com/nf-core/test-datasets/<sha>/.../test_1.fastq.gz"/>
      <output name="trimmed" location="https://raw.githubusercontent.com/nf-core/test-datasets/<sha>/.../trimmed.fastq.gz"
              checksum="sha256$..."/>
    </test>
  </tests>
  <help><!-- meta.yml description --></help>
  <expand macro="citations"/>
</tool>

_provenance.yml shape (canonical):

nfcore_source:
  modules_repo: nf-core/modules
  module_path: modules/nf-core/<name>
  branch: master
  git_sha: <sha at conversion time>
  meta_yml_hash: <sha256 of meta.yml at conversion>
  main_nf_hash:  <sha256 of main.nf at conversion>
  environment_yml_hash: <sha256 of environment.yml at conversion>
  test_datasets_sha: <sha of nf-core/test-datasets pin>
generated:
  by_mold: convert-nfcore-module-to-galaxy-tool
  mold_revision: 1
  cast_target: claude
  cast_artifact_sha: <sha of cast bundle used>
  on_date: 2026-05-10
overrides: []

Procedure

The cast skill is not a single LLM prompt. It is a small program with embedded LLM calls:

Deterministic: read meta.yml / main.nf / environment.yml / tests/main.nf.test; tokenize the process block for input/output channels; resolve container directive into bioconda packages; compute file hashes; resolve git SHAs; emit _provenance.yml and the static portions of <requirements>, <citations>, <version_command>.
LLM-driven: translate the script: body into Galaxy <command> Cheetah; pick <inputs> shapes (data vs collection; conditional gating); name and document outputs; humanize <help>; place <test> block fixtures.

The boundary mirrors [[summarize-nextflow]] §Procedure: enumerated artifacts deterministic, free-text fields LLM.

1. Read the module

Open meta.yml, main.nf, environment.yml, and tests/main.nf.test (when present). Reject early if meta.yml.components: is populated (subworkflow composing other modules; out of scope — see Non-goals).

Compute sha256 of each file and capture for _provenance.yml.

2. Build `<requirements>`

Walk environment.yml.dependencies:. Each bioconda::<name>=<version> becomes a Galaxy <requirement type="package" version="<version>"><name></requirement> entry. For mulled / multi-package environments, declare every package — bioconda’s mulled-resolution produces an equivalent image (per [[component-nextflow-containers-and-envs]]).

Record any forced divergence from upstream container choice in _provenance.yml.overrides.

3. Translate `<inputs>`

Per [[nfcore-channel-input-to-galaxy-collection]], decide the Galaxy input shape from the process’s input channel cardinality. Per [[nfcore-meta-map-to-galaxy-params]], triage meta-map keys into Galaxy params, conditionals, or drops.

Emit a final extra_args text param per [[nfcore-task-ext-args-to-galaxy-additional-options]] if (and only if) the script body interpolates ${task.ext.args} / args2 / args3.

4. Translate `<outputs>`

For each output: channel that isn’t the versions emit, decide cardinality first, then shape (per [[galaxy-discover-datasets]] §Convert Mold posture). The Nextflow glob alone is not enough — path('*.bam') (N files, one per element of an upstream collection) and path('*.{bai,csi,crai}') (exactly one file, alternation across mutually-exclusive extensions) look the same but map to different Galaxy idioms.

Single output, deterministic name (path("${prefix}.json")) → <data name="..." format="json" from_work_dir="${prefix}.json"/>. No <discover_datasets>.
Single output, variable extension (alternation glob like path("*.{bai,csi,crai}"), or path("${prefix}.${ext}") where ext is computed): the channel emits one file whose extension depends on inputs or args. Map to a <data> with the most-common extension as format=, plus a <change_format> block that flips on the input ext or the responsible param. Preserve the upstream invocation byte-for-byte and capture the result with a tight mv — ln -s '$input' 'input.${input.ext}' to stage with the upstream-expected name, run the tool exactly as the nf-core script: body does, then mv 'input.${input.ext}'.{bai,csi,crai} '$output_name'. Do not use <collection> + <discover_datasets> for this shape — there is no list. Direct write to '$output_name' (instead of mv) is the secondary form, used only when the upstream script: body itself passes an output-path arg to the tool. See [[galaxy-discover-datasets]] §Convert Mold posture Rule 2 for the full pattern, including the variant and the from_work_dir callout.
Multi-output, list cardinality (true glob like path('*.bam') where the upstream process emits N files keyed by element identifier) → <collection type="list" name="..." format="bam"> with <discover_datasets pattern="__name_and_ext__" visible="true"/>.
Multi-output, paired cardinality (tuple val(meta), path("*_R{1,2}.fastp.fastq.gz")) → <collection type="paired" ...> with a custom (?P<name>...)_R(?P<identifier_1>[12])... regex.
versions channel → drop; the <version_command> carries that load (per [[nfcore-versions-emit-to-galaxy-version-command]]).

Cardinality heuristic: if the upstream input: channel is tuple(meta, path) (one item per process invocation) and output: emits one path per concept, the output is single — even when the path is glob-shaped. Process cardinality = output cardinality unless the script explicitly fans out.

5. Translate `script:` to `<command>`

LLM step. Pass:

The verbatim script: body.
The Galaxy <inputs> and <outputs> already chosen.
The task.ext.args mapping (text param → $extra_args).

Ask only for the Cheetah-flavored Galaxy command. Wrap in <![CDATA[...]]>. Set detect_errors="exit_code" unless a comment in the original script: argues otherwise.

6. Emit `<version_command>`

Per [[nfcore-versions-emit-to-galaxy-version-command]]: extract the primary tool’s version-emit line from the heredoc or topic: versions annotation, strip Nextflow escaping (\$( → $( etc.), and wrap in <![CDATA[...]]>.

7. Emit `<citations>` and `<help>`

<citations> are DOIs from meta.yml.tools[].doi (one <citation type="doi">…</citation> per tool). <help> is the meta.yml.description (humanized one-liner; expanded into a paragraph if meta.yml has a longer prose block).

8. Emit `<test>` blocks (remote URLs)

Read tests/main.nf.test. For each test that asserts a successful run with a non-trivial fixture:

Resolve every input fixture path to a raw.githubusercontent.com/nf-core/test-datasets/<test_datasets_sha>/... URL. Pin <test_datasets_sha> upfront — never use a branch ref.
Emit Galaxy <param ... location="https://..."/> for inputs.
For outputs, prefer <output name="..." location="https://..." checksum="sha256$..."/> (compute checksum from the upstream snapshot file when one exists; omit if not).

When tests/main.nf.test has no usable fixture (stub-only coverage, missing test file), the convert Mold does not ship a placeholder <test>. It surfaces the gap in _provenance.yml.overrides and exits with a non-zero status; the harness escalates to human review. Every shipped wrapper carries at least one <test> block backed by a real fixture (per [[nfcore-stub-block-to-galaxy-noop-test]] and the reviewer Mold’s dimension #6).

9. Emit `_provenance.yml`

Collect: nf-core module source (repo, path, branch, git_sha), file hashes, test-datasets pin, mold metadata, any overrides (forced divergence from upstream container, dropped stub block, hand-edits the cast skill chose to apply).

10. Convergence loop: lint, test, fix

Iterate until clean. Both gates exit on structured signals — never free-text grep.

10.1 Lint

planemo lint <output_dir> (see planemo-lint). Treat by exit code:

Exit 0 — proceed to 10.2.
Non-zero — read the diagnostic block. XSD failures are hard: fix the XML and re-emit. Soft findings (missing <help>, missing <citations>) are fixed in place by the cast skill, then loop.

10.2 Test

planemo test <output_dir> --test_output_json <output_dir>/_planemo_test_report.json (see planemo-test). Always pass --test_output_json; planemo’s stdout is for humans and intentionally not part of the cast skill’s parsing surface.

After the run:

Validate. AJV-check the JSON against planemo-test-report (validate-planemo-test-report CLI from @galaxy-foundry/planemo-test-report-schema). A schema-invalid report means the pinned planemo SHA drifted or the run aborted before writing the report — escalate to human triage; do not classify.
Classify from schema fields, not free-text. tests[].data.job is dict[str, Any] (extra-allow) — its inner keys come from the Galaxy job state and are not constrained by planemo-test-report, so treat them as best-effort signals:
- tests[].data.status == "success" → pass.
- tests[].data.status == "failure" + data.problem_log matches an output-discovery pattern (<discover_datasets> mismatch, missing dataset name, format mismatch) → adjust the corresponding <output> / <discover_datasets> block.
- tests[].data.status == "failure", and data.job.stderr (when present) carries upstream tool stderr → the <command> Cheetah translation is wrong; LLM revises.
- tests[].data.status == "error" with HTTP/URL signals → fixture-availability fault; verify the nf-core/test-datasets URL resolves and consider a local test-data/ fallback, recording the divergence in _provenance.yml.overrides.
- Any other failure shape → human triage.
Stop when lint and test both clear.

The convergence loop is bounded (default 3 attempts). On exhaustion, the cast skill writes whatever it has and surfaces the final _planemo_test_report.json (plus the lint diagnostics) for human triage.

Non-goals

Subworkflow conversion. meta.yml.components: populated → out of scope. Routed to a separate Mold (not in this plan).
Pipeline conversion. Whole nf-core pipelines stay with [[nextflow-to-galaxy]]; this Mold runs at the module tier.
Discovery / dedup against IUC. The new repo coexists with IUC by design (different interface contract for the same upstream tool).
Cross-module refactoring. Per-module unit; harness owns batches.
Tool Shed publication. The output is a tool dir on disk; .shed.yml and shed publication live in the new repo’s CI, not in this Mold.

Caveats

meta.yml may lie. Hand-authored, can drift from script: IO. When the LLM-inferred IO disagrees with meta.yml, prefer meta.yml and surface the disagreement in _provenance.yml.overrides.
Container directive ternaries require pulling both branches; the bioconda pin from environment.yml is the source of truth for <requirements>. Don’t substitute a hand-picked container source.
task.ext.args with embedded Groovy logic can’t be cleanly mapped to a single text param. Surface as a text param plus a heavy <help> block; document the divergence in _provenance.yml.overrides.
Stub-only tests. When tests/main.nf.test has only stub-mode coverage (-stub-run), the convert Mold cannot derive a Galaxy <test> from it (per [[nfcore-stub-block-to-galaxy-noop-test]]). Surface the gap; let the harness decide whether to author a <test> by hand.

Reference dispatch (for casting)

patterns → 5 nf-core→Galaxy translation patterns (see frontmatter references[]). LLM-condensed into the cast skill.
cli-tool → planemo carries the pinned install metadata; flows into the cast bundle’s _required_tools.json via the PR #235 mechanism.
cli-command → planemo-lint and planemo-test cast to JSON sidecars; consulted on-demand inside the §10 loop.
schema → planemo-test-report copied verbatim into the cast bundle; the convergence loop AJV-validates --test_output_json output against it before classifying failures.
research → [[component-nf-core-tools]], [[component-nextflow-containers-and-envs]], [[galaxy-discover-datasets]]. On-demand, condensed into runtime context when triggered.
examples — pending: 3 hand-picked Wave 1 modules (one trivial, one paired-aware, one with conditional). Used for round-trip smoke testing before this Mold ships.

Revision history

rev 1 (2026-05-10) — initial draft. Procedure sketched against the trimmed plan; no cast runs yet. Pattern + CLI references all evidence: hypothesis.
rev 2 (2026-05-11) — convergence loop rewritten against the JSON test-report gate: §10.2 now consumes planemo test --test_output_json validated against planemo-test-report; pulled in cli-tool/cli-command/schema references for planemo, planemo-lint, planemo-test, planemo-test-report. Deferred-manpages caveat removed.

convert-nfcore-module-to-galaxy-tool

Mold health

Reference Loading

Cast artifacts

convert-nfcore-module-to-galaxy-tool

Inputs

Outputs

Procedure

1. Read the module

2. Build `<requirements>`

3. Translate `<inputs>`

4. Translate `<outputs>`

5. Translate `script:` to `<command>`

6. Emit `<version_command>`

7. Emit `<citations>` and `<help>`

8. Emit `<test>` blocks (remote URLs)

9. Emit `_provenance.yml`

10. Convergence loop: lint, test, fix

10.1 Lint

10.2 Test

Non-goals

Caveats

Reference dispatch (for casting)

Revision history

Incoming References (10)

Mold health

Reference Loading

Cast artifacts

convert-nfcore-module-to-galaxy-tool

Inputs

Outputs

Procedure

1. Read the module

2. Build <requirements>

3. Translate <inputs>

4. Translate <outputs>

5. Translate script: to <command>

6. Emit <version_command>

7. Emit <citations> and <help>

8. Emit <test> blocks (remote URLs)

9. Emit _provenance.yml

10. Convergence loop: lint, test, fix

10.1 Lint

10.2 Test

Non-goals

Caveats

Reference dispatch (for casting)

Revision history

Incoming References (10)

2. Build `<requirements>`

3. Translate `<inputs>`

4. Translate `<outputs>`

5. Translate `script:` to `<command>`

6. Emit `<version_command>`

7. Emit `<citations>` and `<help>`

8. Emit `<test>` blocks (remote URLs)

9. Emit `_provenance.yml`