Claude skill · cast

convert-nfcore-module-to-galaxy-tool

Convert one nf-core module dir into a Galaxy tool wrapper (tool.xml + macros.xml + _provenance.yml + remote-URL <test> blocks).


Install

/plugin marketplace add jmchilton/foundry
/plugin install foundry-skills@galaxy-workflow-foundry

Then invoke as:

/foundry-skills:convert-nfcore-module-to-galaxy-tool

Skill Bundle

/ packaged cast
- attached files: 12
- upfront: 6
- on demand: 6
- cast rev: n/a
- validated: 0

Attached Files

/ runtime references

Load upfront

cli-tool

planemo

packaged

Install metadata + pin SHA for the planemo CLI invoked by the convergence loop.

Trigger: Always — the cast skill needs planemo on PATH before running lint/test.

upfront runtime verbatim hypothesis deterministic 2.2 KB
bundle: references/cli/planemo.md
source: content/cli/planemo/index.md
Preview md
---
type: cli-tool
tool: planemo
origin: pypi
package: planemo
package_version: "git+https://github.com/jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3"
invoke: planemo
invoke_fallback: "uvx --from git+https://github.com/jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3 planemo"
availability_check: "planemo --version"
docs_url: "https://planemo.readthedocs.io/"
tags:
  - cli-tool
  - cli/planemo
status: draft
created: 2026-05-10
revised: 2026-05-11
revision: 2
ai_generated: true
summary: "Galaxy tool/workflow runtime testing CLI; used by run-workflow-test and friends."
---

# planemo

Galaxy's runtime testing and authoring CLI. Foundry Molds invoke `planemo test`, `planemo lint`, and friends for end-to-end workflow validation; the cast skill consumes structured JSON output from `--test_output_json` and validates it against [[planemo-test-report]].

## Pin

`package_version` pins to `jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3`, the merge commit that introduces `planemo cli_metadata` and `planemo output_schema` ([galaxyproject/planemo#1636](https://github.com/galaxyproject/planemo/pull/1636), OPEN as of 2026-05-11). The SHA lives on `jmchilton/planemo`, not `galaxyproject/planemo`, because #1636 has not merged upstream yet. Flip the pin to a released `planemo>=<version>` once the PR merges and a release containing it is published.

The convergence loop in [[convert-nfcore-module-to-galaxy-tool]] depends on the JSON test report produced by this SHA; older planemo releases emit only free-text output that the cast skill cannot classify mechanically.

## Install

```sh
uvx --from git+https://github.com/jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3 planemo --version
```

For a persistent install:

```sh
uv tool install git+https://github.com/jmchilton/pl
...
pattern

nfcore-channel-input-to-galaxy-collection

packaged

Map process input channels (tuple(meta, path)) to Galaxy <param type="data"> / <param type="data_collection">.

Trigger: When emitting <inputs> for a module.

upfront both condense hypothesis llm 7.4 KB
bundle: references/patterns/nfcore-channel-input-to-galaxy-collection.md
source: content/patterns/nfcore-channel-input-to-galaxy-collection.md
Preview md
---
type: pattern
pattern_kind: operation
evidence: hypothesis
title: "nf-core channel input → Galaxy data / collection"
tags:
  - pattern
  - target/galaxy
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 2
ai_generated: true
summary: "Map an nf-core process's tuple(meta, path) input channel to a Galaxy <param type=\"data\"> or paired/list collection input."
related_molds:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
related_patterns:
  - "[[nfcore-meta-map-to-galaxy-params]]"
related_notes:
  - "[[galaxy-discover-datasets]]"
sources:
  - "https://github.com/nf-core/modules/tree/9b261a459473bc8e2d830bfc626f480c0733f4fe"
  - "https://github.com/galaxyproject/tools-iuc"
---

# nf-core channel input → Galaxy data / collection

Cited modules pinned to `nf-core/modules@9b261a459473bc8e2d830bfc626f480c0733f4fe`.

## Triage table

| nf-core input shape | meta-driven branching? | Galaxy mapping |
|---|---|---|
| `tuple val(meta), path(input)` (single artifact) | no | one `<param type="data">` |
| `tuple val(meta), path(input), path(secondary)` | no | two `<param type="data">` (one per role) |
| `tuple val(meta), path(reads)` with `meta.single_end` driving the script | yes | `<conditional>` switching `<param type="data">` (single) ↔ `<param type="data_collection" collection_type="paired">` (paired) |
| `tuple val(meta), path(reads, stageAs: "input*/*")` (multi-list) | yes (single/paired branching) | `<conditional>` switching `<param type="data" multiple="true">` ↔ `<param type="data_collection" collection_type="list:paired">` |
| `tuple val(meta), path("*.bam")` glob | no | `<param type="data_collection" collection_type="list">` |
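
As a rough illustration of the `meta.single_end` row, the Python sketch below builds the corresponding `<conditional>` with the standard library. The parameter names (`library`, `type`, `reads`), the labels, and the `fastqsanger` format are hypothetical placeholders chosen for the example, not values taken from any cited module or from the convert Mold itself.

```python
import xml.etree.ElementTree as ET


def single_or_paired_conditional() -> ET.Element:
    """Build a <conditional> switching between one dataset and a paired collection."""
    # Every name, label, and format below is an illustrative assumption.
    cond = ET.Element("conditional", name="library")
    selector = ET.SubElement(
        cond, "param", name="type", type="select", label="Library layout"
    )
    ET.SubElement(selector, "option", value="single").text = "Single-end"
    ET.SubElement(selector, "option", value="paired").text = "Paired-end"

    # Stands in for the branch where meta.single_end is true: one dataset.
    single = ET.SubElement(cond, "when", value="single")
    ET.SubElement(single, "param", name="reads", type="data", format="fastqsanger")

    # Stands in for the branch where meta.single_end is false: a paired collection.
    paired = ET.SubElement(cond, "when", value="paired")
    ET.SubElement(
        paired,
        "param",
        name="reads",
        type="data_collection",
        collection_type="paired",
        format="fastqsanger",
    )
    return cond


if __name__ == "__main__":
    element = single_or_paired_conditional()
    ET.indent(element)  # pretty-printing helper, available from Python 3.9
    print(ET.tostring(element, encoding="unicode"))
```

The user's choice in the select param takes over the role that `meta.single_end` plays upstream, so the `meta` map never has to appear as an input.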

The `meta` map itself is **never** an input — see `[[nfcore-meta-map-to-galaxy-params]]`. Its keys may *gate* a `<conditional>`, bu
...
pattern

nfcore-meta-map-to-galaxy-params

packaged

Triage meta-map keys: behavior-driving keys become Galaxy <param>s; identity keys are dropped.

Trigger: When a process consumes a meta-map and any meta keys influence the script: body.

upfront both condense hypothesis llm 5.6 KB
bundle: references/patterns/nfcore-meta-map-to-galaxy-params.md
source: content/patterns/nfcore-meta-map-to-galaxy-params.md
Preview md
---
type: pattern
pattern_kind: operation
evidence: hypothesis
title: "nf-core meta-map → Galaxy params"
tags:
  - pattern
  - target/galaxy
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 2
ai_generated: true
summary: "Promote nf-core meta-map keys to Galaxy <param>s only when they drive script behavior; drop identity-only keys; pull naming from $input.element_identifier."
related_molds:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
related_patterns:
  - "[[nfcore-channel-input-to-galaxy-collection]]"
sources:
  - "https://github.com/nf-core/modules/tree/9b261a459473bc8e2d830bfc626f480c0733f4fe"
---

# nf-core meta-map → Galaxy params

The nf-core `meta` map travels with the data channel as `tuple val(meta), path(...)`. Galaxy has no metadata-channel equivalent — datasets carry only their own `element_identifier`, `name`, and format. The convert Mold's job is to triage every meta key the script body reads.

Cited modules pinned to `nf-core/modules@9b261a459473bc8e2d830bfc626f480c0733f4fe`.

## Triage rule

Walk the module's `script:` body for `meta.<key>` references. For each key, classify:

| Class | Diagnostic | Galaxy mapping |
|---|---|---|
| **Identity** | only used in output filenames (`${meta.id}.bam`) | **drop** — use `$input.element_identifier` in `<command>` |
| **Behavior-driving** | gates branches in the script body | `<conditional>` (boolean) or `<param type="select">` (multi-mode) |
| **Mode/strategy** | substituted into a flag value (`--lib_type ${meta.strandedness}`) | `<param type="select">` with explicit options |
| **Pass-through tag** | written into output metadata or filenames but doesn't change behavior | optional `<param type="text">`, default `${input.element_identifier}` |
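
The first half of the triage rule, finding which `meta.<key>` references the script body contains, can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function name `meta_key_usages` is hypothetical, the regular expression only sees literal `meta.<key>` accesses (not `meta['key']` or aliased maps), and the classification into the four classes is still left to whoever reads the collected lines.

```python
import re
from collections import defaultdict

# Matches literal accesses such as meta.id or ${meta.single_end}.
META_REF = re.compile(r"\bmeta\.([A-Za-z_]\w*)")


def meta_key_usages(script_body: str) -> dict[str, list[str]]:
    """Map each referenced meta key to the stripped lines that mention it."""
    usages: dict[str, list[str]] = defaultdict(list)
    for line in script_body.splitlines():
        for key in META_REF.findall(line):
            usages[key].append(line.strip())
    return dict(usages)


if __name__ == "__main__":
    # Hypothetical script fragment, written only for this example.
    example = '''
    def prefix = task.ext.prefix ?: "${meta.id}"
    if (meta.single_end) {
        println "single-end branch"
    }
    '''
    for key, lines in meta_key_usages(example).items():
        print(key, "->", lines)
```

A key that only shows up in filename interpolations is a candidate for the **Identity** class; a key that shows up in an `if` or ternary is a candidate for **Behavior-driving**. The sketch surfaces the evidence and does not make that call itself.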

When in doubt, prefer **dropping** over surfacing. Gal
...
pattern

nfcore-stub-block-to-galaxy-noop-test

packaged

Document the intentional drop of stub: blocks; rely on planemo test for fixture coverage.

Trigger: When the module's main.nf contains a stub: block.

upfront both condense hypothesis llm 4.7 KB
bundle: references/patterns/nfcore-stub-block-to-galaxy-noop-test.md
source: content/patterns/nfcore-stub-block-to-galaxy-noop-test.md
Preview md
---
type: pattern
pattern_kind: operation
evidence: hypothesis
title: "nf-core stub: block → Galaxy (intentional drop)"
tags:
  - pattern
  - target/galaxy
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 2
ai_generated: true
summary: "nf-core's stub: block has no Galaxy analog; the convert Mold drops it intentionally and records the drop in _provenance.yml."
related_molds:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
sources:
  - "https://github.com/nf-core/modules/tree/9b261a459473bc8e2d830bfc626f480c0733f4fe"
---

# nf-core `stub:` block → Galaxy (intentional drop)

Cited modules pinned to `nf-core/modules@9b261a459473bc8e2d830bfc626f480c0733f4fe`.

## What the `stub:` block does in nf-core

Most nf-core modules ship a `stub:` block alongside `script:`. It fakes the outputs cheaply so `nextflow run -stub-run` exercises the DAG (channels, joins, output discovery, naming) without invoking the upstream tool.

`modules/nf-core/samtools/index/main.nf`:

```nextflow
script:
def args = task.ext.args ?: ''
"""
samtools index -@ ${task.cpus} ${args} ${input}
"""

stub:
def args = task.ext.args ?: ''
def extension = file(input).getExtension() == 'cram'
    ? "crai"
    : args.contains("-c") ? "csi" : "bai"
"""
touch ${input}.${extension}
"""
```

`modules/nf-core/fastp/main.nf` has a more elaborate `stub:` block that produces every conditional output the script can emit (paired/single, merged, fail FASTQs, JSON, HTML, log).

## Why Galaxy doesn't need an analog

Galaxy's tool-execution model has no DAG-level dry-run. The Galaxy/IUC equivalent of "did the wrapper plumb correctly" is `planemo lint` (XML-shape correctness) followed by `planemo test` (real CLI invocation against a fixture). Together they cover the same ground:

| nf-core | Galaxy |
|---|---|
| `ne
...
pattern

nfcore-task-ext-args-to-galaxy-additional-options

packaged

Surface task.ext.args as a single Galaxy text param; do not enumerate per-flag inputs.

Trigger: When the upstream script: body interpolates ${task.ext.args} (or args2/args3).

upfront both condense hypothesis llm 5.3 KB
bundle: references/patterns/nfcore-task-ext-args-to-galaxy-additional-options.md
source: content/patterns/nfcore-task-ext-args-to-galaxy-additional-options.md
Preview md
---
type: pattern
pattern_kind: operation
evidence: hypothesis
title: "nf-core task.ext.args → Galaxy additional-options bag"
tags:
  - pattern
  - target/galaxy
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 2
ai_generated: true
summary: "Map nf-core's task.ext.args escape hatch to a single Galaxy text param surfacing extra command-line arguments."
related_molds:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
sources:
  - "https://github.com/nf-core/modules/tree/9b261a459473bc8e2d830bfc626f480c0733f4fe"
  - "https://github.com/galaxyproject/tools-iuc"
---

# nf-core task.ext.args → Galaxy additional-options bag

Cited modules pinned to `nf-core/modules@9b261a459473bc8e2d830bfc626f480c0733f4fe`.

## Why a single bag, not per-flag inputs

nf-core modules expose `task.ext.args` (sometimes `args2`, `args3`) as the configuration escape hatch — pipelines override these per-process via `modules.config`. The bag is intentionally opaque: the module's authors chose **not** to enumerate every flag of the upstream tool, so the wrapper inherits a one-string-of-CLI surface.

Galaxy has no per-step config layer; the natural mirror is one optional text `<param>` appended to the command line. Going further (per-flag inputs) re-enumerates the upstream tool's CLI in Galaxy XML, which:

- Forces the wrapper author to track upstream releases at the flag level — the exact maintenance load nf-core declined.
- Doubles the cognitive surface for users: which flags are first-class, which need the bag.

The **policy** is: surface `task.ext.args` as a single bag. Promote individual flags to first-class `<param>`s only when the module already does (e.g., `discard_trimmed_pass` is a `val` input in fastp; that's a `<param type="boolean">`, not part of the args bag).

## Cited cases


...
pattern

nfcore-versions-emit-to-galaxy-version-command

packaged

Translate the versions.yml emit block (or topic: versions) into Galaxy's <version_command>.

Trigger: When the script: body or output: declarations contain a versions emit.

upfront both condense hypothesis llm 5.0 KB
bundle: references/patterns/nfcore-versions-emit-to-galaxy-version-command.md
source: content/patterns/nfcore-versions-emit-to-galaxy-version-command.md
Preview md
---
type: pattern
pattern_kind: operation
evidence: hypothesis
title: "nf-core versions emit → Galaxy <version_command>"
tags:
  - pattern
  - target/galaxy
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 2
ai_generated: true
summary: "Translate nf-core's versions emit (heredoc or topic: versions) into Galaxy's <version_command>, dropping the versions output channel."
related_molds:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
sources:
  - "https://github.com/nf-core/modules/tree/9b261a459473bc8e2d830bfc626f480c0733f4fe"
---

# nf-core versions emit → Galaxy `<version_command>`

Cited modules pinned to `nf-core/modules@9b261a459473bc8e2d830bfc626f480c0733f4fe`.

Every nf-core module emits a tool-version string. Galaxy's `<version_command>` is the natural sink. Two emission shapes are both common in current nf-core/modules:

1. **`topic: versions`** (Nextflow 24+) — modern; an output tuple with `eval(...)` capturing the version string.
2. **`cat <<-END_VERSIONS > versions.yml`** heredoc — older; still present in many modules.

The convert Mold's job is the same either way: extract the primary tool's version-line command, strip Nextflow escaping, drop the versions channel from outputs, and emit `<version_command>`.

## Cited cases

### `topic: versions` style → strip `eval('...')`, keep the inner command

`modules/nf-core/fastp/main.nf`:

```nextflow
output:
...
tuple val("${task.process}"), val('fastp'),
      eval('fastp --version 2>&1 | sed -e "s/fastp //g"'),
      emit: versions_fastp, topic: versions
```

Galaxy:

```xml
<version_command><![CDATA[fastp --version 2>&1 | sed -e 's/fastp //g']]></version_command>
```
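
The `topic: versions` case can be sketched mechanically in Python. This is a hedged illustration, not the Mold's actual implementation: the function name `version_command_from_output` is hypothetical, the regular expression assumes a single `eval('...')` or `eval("...")` whose content does not contain its own closing quote, and the inner command is kept verbatim, quoting included.

```python
import re

# Captures the quoted argument of the first eval(...) in an output declaration.
EVAL_ARG = re.compile(r"""eval\(\s*(['"])(.*?)\1\s*\)""", re.DOTALL)


def version_command_from_output(output_decl: str) -> str:
    """Wrap the eval(...) argument of a versions output in <version_command>."""
    match = EVAL_ARG.search(output_decl)
    if match is None:
        raise ValueError("no eval(...) found in output declaration")
    command = match.group(2)
    if "]]>" in command:
        # A literal ]]> would terminate the CDATA section early.
        raise ValueError("command contains ']]>' and cannot be CDATA-wrapped")
    return f"<version_command><![CDATA[{command}]]></version_command>"


if __name__ == "__main__":
    declaration = """
    tuple val("${task.process}"), val('fastp'),
          eval('fastp --version 2>&1 | sed -e "s/fastp //g"'),
          emit: versions_fastp, topic: versions
    """
    print(version_command_from_output(declaration))
```

The heredoc style is not covered here, since it needs a different extraction step before the same wrapping applies.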

Mechanical translation: take the `eval(...)` argument verbatim, strip nothing (no escaping needed for `eval(...)` content), wrap in `<!
...

Load on demand

cli-command

planemo-lint

packaged

Reference for `planemo lint` flags and output classification; first gate in the convergence loop.

Trigger: Step 10.1 — after every <command>/<inputs>/<outputs> emission.

on-demand runtime sidecar hypothesis deterministic 2.0 KB
bundle: references/cli/planemo-lint.json
source: content/cli/planemo/planemo-lint.md
Preview json
{
  "type": "cli-command",
  "tool": "planemo",
  "command": "lint",
  "summary": "Check for common errors and best practices.",
  "source_path": "content/cli/planemo/planemo-lint.md",
  "source_revision": 1,
  "body": "<!-- planemo-cli-meta: BEGIN auto-generated -->\n\n# `planemo lint`\n\nCheck for common errors and best practices.\n\n## Synopsis\n\n```text\nplanemo lint [OPTIONS] TOOL_PATH\n```\n\n\n## Arguments\n\n| Argument | Type | Required | Help |\n|---|---|---|---|\n| TOOL_PATH | path | — | — |\n\n## Options\n\n| Option | Type | Default | Required | Help |\n|---|---|---|---|---|\n| --report_level | choice | all | — | — |\n| --report_xunit | path | — | — | Output an XUnit report, useful for CI testing |\n| --fail_level | choice | warn | — | — |\n| -s, --skip | text | — | — | Comma-separated list of lint tests to skip (e.g. passing --skip 'citations,xml_order' would skip linting of citations and best-practice XML ordering. |\n| --skip_file | file | — | — | File containing a list of lint tests to skip |\n| -r, --recursive | flag | false | — | Recursively perform command for subdirectories. |\n| --urls | flag | false | — | Check validity of URLs in XML files |\n| --doi | flag | false | — | Check validity of DOIs in XML files |\n| --conda_requirements | flag | false | — | Check tool requirements for availability in best practice Conda channels. |\n| --biocontainer, --biocontainers | flag | false | — | Check best practice BioContainer namespaces for a container definition applicable for this tool. |\n\n<!-- planemo-cli-meta: END auto-generated -->\n## Output\n\n<!-- Hand-edited. Preserved across `tsx scripts/sync-planemo-cli.ts`. -->\n\nConsole output is human-oriented; use the process exit status as the pass/fail gate.\n\n## Examples\n\n<!-- Hand-edited. Preserved across `tsx scripts/sync-planemo-cli.ts`. -->\n\n```sh\nplanemo lint <tool_dir>\n```\n\n## Gotchas\n\n<!-- Hand-edited. Preserved across `tsx scripts/sync-planemo-cli.ts`. -->\n\nNo Foundry-specific gotchas recorded yet."
}
cli-command

planemo-test

packaged

Reference for `planemo test --test_output_json` invocation, exit codes, and the JSON report path.

Trigger: Step 10.2 — after lint clears.

on-demand runtime sidecar hypothesis deterministic 12.8 KB
bundle: references/cli/planemo-test.json
source: content/cli/planemo/planemo-test.md
Preview json
{
  "type": "cli-command",
  "tool": "planemo",
  "command": "test",
  "summary": "Run specified tool or workflow tests within Galaxy.",
  "source_path": "content/cli/planemo/planemo-test.md",
  "source_revision": 1,
  "body": "<!-- planemo-cli-meta: BEGIN auto-generated -->\n\n# `planemo test`\n\nRun specified tool or workflow tests within Galaxy.\n\n## Synopsis\n\n```text\nplanemo test [OPTIONS] TOOL_PATH\n```\n\n\n## Arguments\n\n| Argument | Type | Required | Help |\n|---|---|---|---|\n| TOOL_PATH | path | — | — |\n\n## Options\n\n| Option | Type | Default | Required | Help |\n|---|---|---|---|---|\n| --failed | flag | false | — | Re-run only failed tests. This command will read tool_test_output.json to determine which tests failed so this file must have been produced with the same set of tool ids previously. |\n| --test_index | integer | [] | — | Index(es) of specific test(s) to run (1-based). Can be specified multiple times (e.g., --test_index 1 --test_index 3) to run specific tests. If not specified, all tests are run. |\n| --polling_backoff | integer | 0 | — | Poll resources with an increasing interval between requests. Useful when testing against remote and/or production instances to limit generated traffic. |\n| --galaxy_root | directory | — | — | Root of development galaxy directory to execute command with. |\n| --galaxy_python_version | choice | — | — | Python version to start Galaxy under |\n| --extra_tools | path | — | — | Extra tool sources to include in Galaxy's tool panel (file or directory). These will not be linted/tested/etc... but they will be available to workflows and for interactive use. |\n| --install_galaxy | flag | false | — | Download and configure a disposable copy of Galaxy from github. |\n| --galaxy_branch | text | — | — | Branch of Galaxy to target (defaults to master) if a Galaxy root isn't specified. |\n| --galaxy_source | text | — | — | Git source of Galaxy to target (defaults to the official galaxyproject github source if a Galaxy root isn't specified. 
|\n| --skip_venv | flag | false | — | Do not create or source a virtualenv environment for Galaxy, this should be used to preserve an externally configured virtual environment or conda environment. |\n| --no_cache_galaxy | flag | false | — | Skip caching of Galaxy source and dependencies obtained with --install_galaxy. Not caching this results in faster downloads (no git) - 
...
research

component-nextflow-containers-and-envs

packaged

Resolve the container directive (mulled, biocontainer, Wave) and environment.yml into a Galaxy <requirements> block with matching bioconda pins.

Trigger: When emitting <requirements> and the module's container directive is non-trivial (ternary or mulled).

on-demand runtime condense hypothesis llm 30.8 KB
bundle: references/notes/component-nextflow-containers-and-envs.md
source: content/research/component-nextflow-containers-and-envs.md
Preview md
---
type: research
subtype: component
tags:
  - research/component
  - source/nextflow
  - target/galaxy
component: "Nextflow Containers and Environments"
status: draft
created: 2026-05-01
revised: 2026-05-05
revision: 3
ai_generated: true
summary: "Container URL grammar (depot, BioContainers, mulled-v2, Wave, ORAS) and conda directive resolution rules backing summarize-nextflow §5."
sources:
  - "https://docs.seqera.io/nextflow/process"
  - "https://docs.seqera.io/nextflow/reference/process"
  - "https://github.com/nf-core/modules/blob/master/modules/nf-core/fastqc/main.nf"
  - "https://github.com/nf-core/modules/blob/master/modules/nf-core/multiqc/main.nf"
  - "https://github.com/nf-core/modules/blob/master/modules/nf-core/dragmap/align/main.nf"
  - "https://github.com/nf-core/modules/blob/master/modules/nf-core/seqkit/sample/main.nf"
  - "https://github.com/nf-core/modules/blob/master/modules/meta-schema.json"
  - "https://github.com/nf-core/modules/blob/master/modules/environment-schema.json"
  - "https://github.com/nf-core/tools/blob/master/nf_core/module-template/main.nf"
  - "https://github.com/BioContainers/multi-package-containers"
  - "https://github.com/BioContainers/singularity-build-bot"
  - "https://depot.galaxyproject.org/singularity/"
  - "https://biocontainers.pro/registry"
  - "https://bioconda.github.io/"
  - "https://docs.seqera.io/wave"
  - "https://nf-co.re/events/2024/bytesize_pipeline_container_urls"
related_molds:
  - "[[summarize-nextflow]]"
  - "[[author-galaxy-tool-wrapper]]"
  - "[[summarize-galaxy-tool]]"
related_notes:
  - "[[component-nextflow-pipeline-anatomy]]"
  - "[[component-nf-core-tools]]"
  - "[[component-nextflow-inspect]]"
---

# Nextflow Containers and Environments

Operational grounding for `[[summarize-nextflow]]` §5 ("Build 
...
research

component-nf-core-tools

packaged

Reference for nf-core module conventions: meta.yml shape, modules.json, environment.yml posture, test layout, container directive idioms.

Trigger: When parsing meta.yml, environment.yml, or main.nf and a convention is unclear; when populating _provenance.yml.

on-demand runtime condense hypothesis llm 24.4 KB
bundle: references/notes/component-nf-core-tools.md
source: content/research/component-nf-core-tools.md
Preview md
---
type: research
subtype: component
tags:
  - research/component
component: "nf-core/tools (Python package + ecosystem)"
status: draft
created: 2026-05-01
revised: 2026-05-01
revision: 1
ai_generated: true
summary: "White paper on nf-core/tools — conventions, CLI surface, schema universe, container resolution. Survey, not decision."
related_molds:
  - "[[summarize-nextflow]]"
  - "[[convert-nfcore-module-to-galaxy-tool]]"
related_notes:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
  - "[[component-nextflow-containers-and-envs]]"
sources:
  - "https://github.com/nf-core/tools"
  - "https://nf-co.re/docs/nf-core-tools"
  - "https://nf-co.re/pipelines.json"
  - "https://github.com/nf-core/modules"
  - "https://github.com/nf-core/test-datasets"
  - "https://github.com/nf-core/configs"
---

# `nf-core/tools` and the nf-core Pipeline Toolchain: A Technical Survey

**Source clone:** `~/projects/repositories/nf-core-tools` (commit `b6c5737`, version `4.0.2`).

## Overview

`nf-core/tools` is the official Python package that the nf-core community publishes to PyPI as `nf-core` (current release **4.0.2**, May 2026). It is a Click-based CLI plus an importable Python library (`nf_core.*`) that handles essentially every lifecycle task a pipeline author or operator performs against an nf-core Nextflow pipeline: scaffolding new pipelines from a Jinja-rendered cookiecutter template, installing and updating shared modules and subworkflows from `nf-core/modules`, linting, schema management, listing remote pipelines, downloading pipelines together with their container images for offline use, and synchronising pipelines with the upstream template as it evolves.

Conceptually the package solves three problems for the community: **enforcing convention** (every nf-core pipeline shares a d
...
research

galaxy-discover-datasets

packaged

Reference for the <discover_datasets> XML element: attributes, named/regex patterns, <data> vs <collection> contexts, test-side <discovered_dataset>.

Trigger: When translating a Nextflow output: channel that uses a glob path or runtime-interpolated filenames into a Galaxy <collection> or multi-output <data>.

on-demand runtime condense hypothesis llm 20.1 KB
bundle: references/notes/galaxy-discover-datasets.md
source: content/research/galaxy-discover-datasets.md
Preview md
---
type: research
subtype: component
title: "Galaxy <discover_datasets>"
tags:
  - research/component
  - target/galaxy
component: "Galaxy <discover_datasets> XML element"
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 1
ai_generated: true
summary: "Reference for the <discover_datasets> Galaxy XML element — attributes, named/regex patterns, <data> vs <collection> contexts, test assertions."
related_molds:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
related_patterns:
  - "[[nfcore-channel-input-to-galaxy-collection]]"
related_notes:
  - "[[convert-nfcore-module-to-galaxy-tool]]"
  - "[[nfcore-channel-input-to-galaxy-collection]]"
  - "[[galaxy-collection-semantics]]"
  - "[[planemo-asserts-idioms]]"
sources:
  - "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/lib/galaxy/tool_util/xsd/galaxy.xsd"
  - "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/lib/galaxy/tool_util/parser/output_collection_def.py"
  - "https://planemo.readthedocs.io/en/latest/writing_advanced.html#multiple-output-files"
  - "https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools"
---

# Galaxy `<discover_datasets>`

Cited Galaxy source pinned to `galaxyproject/galaxy@7765fae9` (XSD: `lib/galaxy/tool_util/xsd/galaxy.xsd`; parser: `lib/galaxy/tool_util/parser/output_collection_def.py`).

`<discover_datasets>` is Galaxy's mechanism for collecting outputs whose names or counts aren't knowable at tool-wrapper authoring time. A tool that emits "one BAM per chromosome", "every `*.report.tsv` in the working dir", or "whatever fell out of split-by-this-column" uses `<discover_datasets>` to tell Galaxy how to find those files after the job completes.

Two parents, slightly different behavior:

| Parent | Wh
...
schema

planemo-test-report

packaged

Validate `planemo test --test_output_json` output before classifying failures; the JSON gate that replaces free-text parsing.

Trigger: Step 10.2 — after every `planemo test` invocation.

on-demand runtime verbatim hypothesis deterministic 5.1 KB
bundle: references/schemas/planemo-test-report.schema.json
source: package://@galaxy-foundry/planemo-test-report-schema#planemoTestReportSchema
Preview json
{
  "$defs": {
    "PlanemoTestCase": {
      "additionalProperties": true,
      "properties": {
        "data": {
          "anyOf": [
            {
              "$ref": "#/$defs/PlanemoTestCaseData"
            },
            {
              "type": "null"
            }
          ],
          "default": null
        },
        "doc": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Doc"
        },
        "has_data": {
          "title": "Has Data",
          "type": "boolean"
        },
        "id": {
          "title": "Id",
          "type": "string"
        },
        "test_type": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Test Type"
        }
      },
      "required": [
        "id",
        "has_data"
      ],
      "title": "PlanemoTestCase",
      "type": "object"
    },
    "PlanemoTestCaseData": {
      "additionalProperties": true,
      "properties": {
        "end_datetime": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "End Datetime"
        },
        "execution_problem": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Execution Problem"
        },
        "inputs": {
          "anyOf": [
            {
              "additionalProperties": true,
              "type": "object"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Inputs"
        },
        "invocation_details": {
          "anyOf": [
            {
              "additionalProperties": true,
              "type": "object"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Invocation Details"
        },
        "job": {
          "anyOf": [
            {
              "additionalProperties": true,
              "type
...

SKILL.md


# convert-nfcore-module-to-galaxy-tool

Follow the procedure below and use the artifact/reference sections as the runtime contract.

## When To Use

- Convert one nf-core module dir into a Galaxy tool wrapper (tool.xml + macros.xml + _provenance.yml + remote-URL <test> blocks).

## Inputs

- No upstream artifact inputs declared. See the procedure for user-supplied runtime inputs.

## Outputs

- None declared.

## Required Tools

- **`planemo`** (planemo). `uv tool install git+https://github.com/jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3` (or `pip install git+https://github.com/jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3`).
  Ephemeral run: `uvx --from git+https://github.com/jmchilton/planemo@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3 planemo`.
  Check: `planemo --version`.
  Docs: https://planemo.readthedocs.io/
  Bundled reference: `references/cli/planemo.md`.
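
A harness might choose between the direct invocation and the ephemeral `uvx` fallback along these lines. This is a minimal sketch with assumptions stated up front: `resolve_planemo` is a hypothetical helper, it only checks that an executable is on `PATH`, and it does not verify that an installed `planemo` is actually the pinned SHA.

```python
import shutil
from typing import Callable, Optional

PLANEMO_PIN = (
    "git+https://github.com/jmchilton/planemo"
    "@a9b8b8bc7ab3b12035d53bdb5383fe450413d9f3"
)


def resolve_planemo(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix for planemo, preferring an installed binary."""
    if which("planemo"):
        return ["planemo"]
    if which("uvx"):
        # Ephemeral run, mirroring the fallback command listed above.
        return ["uvx", "--from", PLANEMO_PIN, "planemo"]
    raise RuntimeError("neither planemo nor uvx was found on PATH")


if __name__ == "__main__":
    try:
        print(" ".join(resolve_planemo() + ["--version"]))
    except RuntimeError as error:
        print(f"planemo unavailable: {error}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; callers would normally rely on the default and pass the resulting list to `subprocess.run`.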

## Load Upfront

- `references/cli/planemo.md`: CLI tool reference copied verbatim into the bundle. Install metadata + pin SHA for the planemo CLI invoked by the convergence loop. Use when: always — the cast skill needs planemo on PATH before running lint/test.
- `references/patterns/nfcore-channel-input-to-galaxy-collection.md`: Pattern note copied verbatim into the bundle. Map process input channels (tuple(meta, path)) to Galaxy <param type="data"> / <param type="data_collection">. Use when: emitting <inputs> for a module.
- `references/patterns/nfcore-meta-map-to-galaxy-params.md`: Pattern note copied verbatim into the bundle. Triage meta-map keys: behavior-driving keys become Galaxy <param>s; identity keys are dropped. Use when: a process consumes a meta-map and any meta keys influence the script: body.
- `references/patterns/nfcore-stub-block-to-galaxy-noop-test.md`: Pattern note copied verbatim into the bundle. Document the intentional drop of stub: blocks; rely on planemo test for fixture coverage. Use when: the module's main.nf contains a stub: block.
- `references/patterns/nfcore-task-ext-args-to-galaxy-additional-options.md`: Pattern note copied verbatim into the bundle. Surface task.ext.args as a single Galaxy text param; do not enumerate per-flag inputs. Use when: the upstream script: body interpolates ${task.ext.args} (or args2/args3).
- `references/patterns/nfcore-versions-emit-to-galaxy-version-command.md`: Pattern note copied verbatim into the bundle. Translate the versions.yml emit block (or topic: versions) into Galaxy's <version_command>. Use when: the script: body or output: declarations contain a versions emit.

## Load On Demand

- `references/cli/planemo-lint.json`: CLI command reference packaged as a sidecar. Reference for `planemo lint` flags and output classification; first gate in the convergence loop. Use when: step 10.1, after every `<command>`/`<inputs>`/`<outputs>` emission.
- `references/cli/planemo-test.json`: CLI command reference packaged as a sidecar. Reference for `planemo test --test_output_json` invocation, exit codes, and the JSON report path. Use when: step 10.2, after lint clears.
- `references/notes/component-nextflow-containers-and-envs.md`: Research note copied verbatim into the bundle. Resolve the `container` directive (mulled, biocontainer, Wave) and `environment.yml` into a Galaxy `<requirements>` block with matching bioconda pins. Use when: emitting `<requirements>` and the module's container directive is non-trivial (ternary or mulled).
- `references/notes/component-nf-core-tools.md`: Research note copied verbatim into the bundle. Reference for nf-core module conventions: `meta.yml` shape, `modules.json`, `environment.yml` posture, test layout, container directive idioms. Use when: parsing `meta.yml`, `environment.yml`, or `main.nf` and a convention is unclear; when populating `_provenance.yml`.
- `references/notes/galaxy-discover-datasets.md`: Research note copied verbatim into the bundle. Reference for the `<discover_datasets>` XML element: attributes, named/regex patterns, `<data>` vs `<collection>` contexts, test-side `<discovered_dataset>`. Use when: translating a Nextflow `output:` channel that uses a glob path or runtime-interpolated filenames into a Galaxy `<collection>` or multi-output `<data>`.
- `references/schemas/planemo-test-report.schema.json`: Schema file copied verbatim into the bundle. Validate `planemo test --test_output_json` output before classifying failures; the JSON gate that replaces free-text parsing. Use when: step 10.2, after every `planemo test` invocation.

## Validation

- None declared.

## Procedure

Convert **one nf-core module directory** into a Galaxy tool wrapper. Input is a path to `modules/nf-core/<name>/` (or any directory of the same shape: `main.nf` + `meta.yml` + `environment.yml` + optional `tests/`). Output is a self-contained tool dir: `tool.xml`, `macros.xml`, `_provenance.yml`, with `<test>` blocks pinned to remote `nf-core/test-datasets` URLs.

The skill authors a Galaxy tool XML wrapper directly from the nf-core module shape — the regular structure (`tuple(meta, path)` channels, `task.ext.args` escape hatch, versions emit, environment.yml bioconda pin) makes a mechanical mapping practical. It does **not** depend on `summarize-nextflow` — that skill summarizes whole pipelines, the wrong granularity for one module.

The skill is run per module by an outer harness (a script or human loop). Cross-module batches are not its concern.

### Inputs

The skill expects:

- A **path** to the module directory (`modules/nf-core/<name>/`, or any local clone of that shape).
- Optional **module pin**: tag, branch, or commit SHA of `nf-core/modules`. When absent, the skill resolves to `git rev-parse HEAD` of the dir's containing repo.
- Optional **test-datasets pin**: SHA of `nf-core/test-datasets` to use for `<test>` block `location` URLs. When absent, the skill resolves to a recent SHA on the module's pipeline-of-record branch (best-effort; recorded in `_provenance.yml` either way).

The skill does **not** convert subworkflows: a `meta.yml` with a populated `components:` field composes other modules and is out of scope (route it to a separate subworkflow skill, which is not part of this plan).

### Outputs

Three files in a sibling output directory the harness specifies:

```
<output_dir>/
  <name>.xml          # primary wrapper
  macros.xml          # tool-local macros (token, requirements, version_command, citations)
  _provenance.yml     # nfcore source SHA, file hashes, mold revision, generated_at
```

`tool.xml` shape (skeleton; idiomatic IUC layout; written to disk as `<name>.xml`):

```xml
<tool id="<name>" name="<name>" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.1">
  <description><!-- meta.yml description, first sentence --></description>
  <macros>
    <import>macros.xml</import>
  </macros>
  <expand macro="requirements"/>
  <expand macro="version_command"/>
  <command detect_errors="exit_code"><![CDATA[
    ## script: body translated; task.ext.args becomes $extra_args
  ]]></command>
  <inputs>
    <!-- per nfcore-channel-input-to-galaxy-collection + nfcore-meta-map-to-galaxy-params -->
    <param name="extra_args" type="text" optional="true" .../>  <!-- per task.ext.args pattern -->
  </inputs>
  <outputs>
    <!-- one Galaxy <data> / <collection> per output: channel, minus the versions channel -->
  </outputs>
  <tests>
    <test>
      <param name="reads_1" location="https://raw.githubusercontent.com/nf-core/test-datasets/<sha>/.../test_1.fastq.gz"/>
      <output name="trimmed" location="https://raw.githubusercontent.com/nf-core/test-datasets/<sha>/.../trimmed.fastq.gz"
              checksum="sha256$..."/>
    </test>
  </tests>
  <help><!-- meta.yml description --></help>
  <expand macro="citations"/>
</tool>
```

`_provenance.yml` shape (canonical):

```yaml
nfcore_source:
  modules_repo: nf-core/modules
  module_path: modules/nf-core/<name>
  branch: master
  git_sha: <sha at conversion time>
  meta_yml_hash: <sha256 of meta.yml at conversion>
  main_nf_hash:  <sha256 of main.nf at conversion>
  environment_yml_hash: <sha256 of environment.yml at conversion>
  test_datasets_sha: <sha of nf-core/test-datasets pin>
generated:
  by_mold: convert-nfcore-module-to-galaxy-tool
  mold_revision: 1
  cast_target: claude
  cast_artifact_sha: <sha of cast bundle used>
  on_date: 2026-05-10
overrides: []
```

### Procedure

The skill is **not a single LLM prompt**. It is a small program with embedded LLM calls:

- **Deterministic:** read meta.yml / main.nf / environment.yml / tests/main.nf.test; tokenize the `process` block for input/output channels; resolve container directive into bioconda packages; compute file hashes; resolve git SHAs; emit `_provenance.yml` and the static portions of `<requirements>`, `<citations>`, `<version_command>`.
- **LLM-driven:** translate the `script:` body into Galaxy `<command>` Cheetah; pick `<inputs>` shapes (data vs collection; conditional gating); name and document outputs; humanize `<help>`; place `<test>` block fixtures.

The boundary mirrors `summarize-nextflow` §Procedure: enumerated artifacts are produced deterministically, and free-text fields are produced by the LLM.

#### 1. Read the module

Open `meta.yml`, `main.nf`, `environment.yml`, and `tests/main.nf.test` (when present). Reject early if `meta.yml.components:` is populated (subworkflow composing other modules; out of scope — see *Non-goals*).

Compute `sha256` of each file and capture for `_provenance.yml`.
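
The hashing half of this step can be sketched as below. This is a minimal sketch, not the skill's actual code: the helper names `sha256_of` and `hash_module_files` are hypothetical, and it assumes absent optional files (such as `tests/main.nf.test`) are simply skipped.

```python
import hashlib
from pathlib import Path

# File names taken from the step above; tests/main.nf.test is optional.
MODULE_FILES = ("meta.yml", "main.nf", "environment.yml", "tests/main.nf.test")


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_module_files(module_dir: Path) -> dict[str, str]:
    """Hash every module file that exists (hypothetical helper).

    Assumption: files that are absent are left out of the result rather
    than treated as an error.
    """
    return {
        name: sha256_of(module_dir / name)
        for name in MODULE_FILES
        if (module_dir / name).is_file()
    }
```

The resulting mapping lines up with the `*_hash` fields of `_provenance.yml`; how the keys are renamed on the way in is left to the emitter.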

#### 2. Build `<requirements>`

Walk `environment.yml.dependencies:`. Each `bioconda::<name>=<version>` becomes a Galaxy `<requirement type="package" version="<version>"><name></requirement>` entry. For mulled / multi-package environments, declare every package — bioconda's mulled-resolution produces an equivalent image (per `component-nextflow-containers-and-envs`).

Record any forced divergence from upstream container choice in `_provenance.yml.overrides`.
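
As an illustration of the mapping above, the sketch below turns already-parsed dependency strings into a `<requirements>` element. It assumes the entries have the `channel::name=version` shape (the channel prefix being optional); the function name `requirements_from_dependencies` is hypothetical, and non-string entries in `environment.yml` are not handled.

```python
import xml.etree.ElementTree as ET


def requirements_from_dependencies(dependencies: list[str]) -> ET.Element:
    """Build a <requirements> element from conda dependency strings.

    Assumption: each entry looks like 'bioconda::fastp=0.23.4' or
    'fastp=0.23.4'; every package is declared, pinned or not.
    """
    requirements = ET.Element("requirements")
    for spec in dependencies:
        # Drop the optional 'channel::' prefix, then split name from version.
        _channel, _, pinned = spec.rpartition("::")
        name, _, version = pinned.partition("=")
        attrib = {"type": "package"}
        if version:
            attrib["version"] = version.lstrip("=")  # tolerate 'name==version'
        ET.SubElement(requirements, "requirement", attrib).text = name
    return requirements
```

Serializing the result with `ET.tostring(..., encoding="unicode")` gives the block that would live in `macros.xml`.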

#### 3. Translate `<inputs>`

Per `nfcore-channel-input-to-galaxy-collection`, decide the Galaxy input shape from the process's input channel cardinality. Per `nfcore-meta-map-to-galaxy-params`, triage meta-map keys into Galaxy params, conditionals, or drops.

Emit a final `extra_args` text param per `nfcore-task-ext-args-to-galaxy-additional-options` if (and only if) the script body interpolates `${task.ext.args}` / `args2` / `args3`.
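
The "if and only if" gate can be approximated with a plain text scan. The sketch below is one possible check, assuming it is enough to look for `task.ext.args`, `task.ext.args2`, and so on anywhere in the `main.nf` text (nf-core modules commonly bind them with `def args = task.ext.args ?: ''` before interpolating). The name `ext_args_used` is hypothetical.

```python
import re

# Matches task.ext.args, task.ext.args2, task.ext.args3, ...
EXT_ARGS_RE = re.compile(r"\btask\.ext\.(args\d*)\b")


def ext_args_used(main_nf_text: str) -> list[str]:
    """Return the distinct task.ext.args* names referenced, in first-seen order."""
    seen: list[str] = []
    for match in EXT_ARGS_RE.finditer(main_nf_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

An empty result means no `extra_args` param is emitted; a non-empty one tells the emitter which upstream arg slots the text param has to cover.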

#### 4. Translate `<outputs>`

For each `output:` channel that isn't the `versions` emit, **decide cardinality first, then shape** (per `galaxy-discover-datasets` §*Convert skill posture*). The Nextflow glob alone is not enough — `path('*.bam')` (N files, one per element of an upstream collection) and `path('*.{bai,csi,crai}')` (exactly one file, alternation across mutually-exclusive extensions) look the same but map to different Galaxy idioms.

- **Single output, deterministic name** (`path("${prefix}.json")`) → `<data name="..." format="json" from_work_dir="${prefix}.json"/>`. No `<discover_datasets>`.
- **Single output, variable extension** (alternation glob like `path("*.{bai,csi,crai}")`, or `path("${prefix}.${ext}")` where `ext` is computed): the channel emits **one** file whose extension depends on inputs or args. Map to a `<data>` with the most-common extension as `format=`, plus a `<change_format>` block that flips on the input ext or the responsible param. **Preserve the upstream invocation byte-for-byte** and capture the result with a tight `mv` — `ln -s '$input' 'input.${input.ext}'` to stage with the upstream-expected name, run the tool exactly as the nf-core `script:` body does, then `mv 'input.${input.ext}'.{bai,csi,crai} '$output_name'`. **Do not** use `<collection>` + `<discover_datasets>` for this shape — there is no list. Direct write to `'$output_name'` (instead of `mv`) is the secondary form, used only when the upstream `script:` body itself passes an output-path arg to the tool. See `galaxy-discover-datasets` §*Convert skill posture* Rule 2 for the full pattern, including the variant and the `from_work_dir` callout.
- **Multi-output, list cardinality** (true glob like `path('*.bam')` where the upstream process emits N files keyed by element identifier) → `<collection type="list" name="..." format="bam">` with `<discover_datasets pattern="__name_and_ext__" visible="true"/>`.
- **Multi-output, paired cardinality** (`tuple val(meta), path("*_R{1,2}.fastp.fastq.gz")`) → `<collection type="paired" ...>` with a custom `(?P<name>...)_R(?P<identifier_1>[12])...` regex.
- **`versions` channel** → drop; the `<version_command>` carries that load (per `nfcore-versions-emit-to-galaxy-version-command`).

**Cardinality heuristic**: if the upstream `input:` channel is `tuple(meta, path)` (one item per process invocation) and `output:` emits one path per concept, the output is single — even when the path is glob-shaped. Process cardinality = output cardinality unless the script explicitly fans out.
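
The decision order above (cardinality first, then shape) might be sketched as follows. This is a rough heuristic under stated assumptions, not the rule set from `galaxy-discover-datasets`: the shape labels, the `fans_out` flag (supplied by whoever inspected the script), and the `needs-review` fallback are all hypothetical.

```python
import re

ALTERNATION_RE = re.compile(r"\{[^{}]*,[^{}]*\}")      # shell-style {a,b} alternation
INTERPOLATED_EXT_RE = re.compile(r"\.\$\{?ext\}?$")    # trailing .${ext} or .$ext


def classify_output(path_pattern: str, fans_out: bool) -> str:
    """Pick a Galaxy output shape label for one Nextflow output path pattern.

    fans_out: True when the script explicitly emits N files per invocation.
    """
    if "{1,2}" in path_pattern:
        return "paired"                      # <collection type="paired">
    if fans_out and "*" in path_pattern:
        return "list"                        # <collection type="list">
    if ALTERNATION_RE.search(path_pattern) or INTERPOLATED_EXT_RE.search(path_pattern):
        return "single-variable-extension"   # <data> plus <change_format>
    if "*" not in path_pattern:
        return "single-deterministic"        # <data> with from_work_dir
    return "needs-review"                    # glob-shaped but single: hand off
```

Note that `*.{bai,csi,crai}` and `*.bam` only diverge once `fans_out` is known, which is why the glob alone cannot decide the shape.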

#### 5. Translate `script:` to `<command>`

LLM step. Pass:

- The verbatim `script:` body.
- The Galaxy `<inputs>` and `<outputs>` already chosen.
- The `task.ext.args` mapping (text param → `$extra_args`).

Ask only for the Cheetah-flavored Galaxy command. Wrap in `<![CDATA[...]]>`. Set `detect_errors="exit_code"` unless a comment in the original `script:` argues otherwise.

#### 6. Emit `<version_command>`

Per `nfcore-versions-emit-to-galaxy-version-command`: extract the primary tool's version-emit line from the heredoc or `topic: versions` annotation, strip Nextflow escaping (`\$(` → `$(`, etc.), and wrap in `<![CDATA[...]]>`.
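
The unescape-and-wrap part of this step is small enough to sketch directly. The version below assumes the only escaping to undo is the backslash before `$`; the function names are hypothetical, and selecting the right line out of the heredoc is not shown.

```python
def strip_nextflow_escaping(line: str) -> str:
    """Drop the backslash Nextflow needs before a literal '$' in script strings."""
    return line.replace("\\$", "$")


def version_command_xml(command: str) -> str:
    """Wrap a shell command in <version_command> with a CDATA section."""
    if "]]>" in command:
        # A CDATA section cannot contain its own terminator.
        raise ValueError("command contains ']]>' and cannot be wrapped in CDATA")
    return f"<version_command><![CDATA[{command}]]></version_command>"
```

For example, an upstream line such as `echo \$(samtools --version | head -n1)` (an illustrative input, not taken from a specific module) comes out with a plain `$(`.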

#### 7. Emit `<citations>` and `<help>`

`<citations>` are DOIs from `meta.yml.tools[].doi` (one `<citation type="doi">…</citation>` per tool). `<help>` is the `meta.yml.description` (humanized one-liner; expanded into a paragraph if `meta.yml` has a longer prose block).

#### 8. Emit `<test>` blocks (remote URLs)

Read `tests/main.nf.test`. For each test that asserts a successful run with a non-trivial fixture:

- Resolve every input fixture path to a `raw.githubusercontent.com/nf-core/test-datasets/<test_datasets_sha>/...` URL. Pin `<test_datasets_sha>` upfront — never use a branch ref.
- Emit Galaxy `<param ... location="https://..."/>` for inputs.
- For outputs, prefer `<output name="..." location="https://..." checksum="sha256$..."/>` (compute checksum from the upstream snapshot file when one exists; omit if not).

When `tests/main.nf.test` has no usable fixture (stub-only coverage, missing test file), the convert skill **does not ship a placeholder `<test>`**. It surfaces the gap in `_provenance.yml.overrides` and exits with a non-zero status; the harness escalates to human review. Every shipped wrapper carries at least one `<test>` block backed by a real fixture (per `nfcore-stub-block-to-galaxy-noop-test` and the reviewer skill's dimension #6).
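
The "pin a SHA, never a branch ref" rule can be enforced where the URL is built. The sketch below assumes a pin is a full 40-character hexadecimal commit SHA; the helper name `fixture_url` is hypothetical, and the base URL is the one used in the `<test>` skeleton.

```python
import re

SHA_RE = re.compile(r"[0-9a-f]{40}")  # assumption: full-length SHA-1 commit id
RAW_BASE = "https://raw.githubusercontent.com/nf-core/test-datasets"


def fixture_url(test_datasets_sha: str, relative_path: str) -> str:
    """Build a pinned raw URL for one fixture; reject anything that is not a SHA."""
    if not SHA_RE.fullmatch(test_datasets_sha):
        raise ValueError(f"not a pinned commit SHA: {test_datasets_sha!r}")
    return f"{RAW_BASE}/{test_datasets_sha}/{relative_path.lstrip('/')}"
```

Rejecting branch names at this point keeps an unpinned ref from ever reaching a `location=` attribute.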

#### 9. Emit `_provenance.yml`

Collect: nf-core module source (repo, path, branch, git_sha), file hashes, test-datasets pin, mold metadata, any overrides (forced divergence from upstream container, dropped stub block, hand-edits the skill chose to apply).

#### 10. Convergence loop: lint, test, fix

Iterate until clean. Both gates exit on structured signals — never free-text grep.

##### 10.1 Lint

Run `planemo lint <output_dir>` (see planemo-lint) and branch on the exit code:

- Exit 0 — proceed to 10.2.
- Non-zero — read the diagnostic block. XSD failures are hard: fix the XML and re-emit. Soft findings (missing `<help>`, missing `<citations>`) are fixed in place by the skill, then loop.
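
A gate keyed on the exit code could look like the sketch below. It is an illustration only: `run_gate` and `lint` are hypothetical names, the diagnostics are returned unparsed, and the only planemo invocation used is the `planemo lint <output_dir>` form given above.

```python
import subprocess
from typing import Sequence


def run_gate(command: Sequence[str]) -> tuple[bool, str]:
    """Run one gate command; return (passed, diagnostics) keyed on the exit code."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def lint(output_dir: str) -> tuple[bool, str]:
    """Run the lint gate for one tool directory (requires planemo on PATH)."""
    return run_gate(["planemo", "lint", output_dir])
```

The pass/fail decision comes from `returncode` alone; the captured text is kept only so the diagnostic block can be handed to the fixer.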

##### 10.2 Test

`planemo test <output_dir> --test_output_json <output_dir>/_planemo_test_report.json` (see planemo-test). Always pass `--test_output_json`; planemo's stdout is for humans and intentionally not part of the skill's parsing surface.

After the run:

1. **Validate.** AJV-check the JSON against planemo-test-report (`validate-planemo-test-report` CLI from `@galaxy-foundry/planemo-test-report-schema`). A schema-invalid report means the pinned planemo SHA drifted or the run aborted before writing the report — escalate to human triage; do not classify.
2. **Classify** from schema fields, not free-text. `tests[].data.job` is `dict[str, Any]` (extra-allow) — its inner keys come from the Galaxy job state and are not constrained by planemo-test-report, so treat them as best-effort signals:
   - `tests[].data.status == "success"` → pass.
   - `tests[].data.status == "failure"` + `data.problem_log` matches an output-discovery pattern (`<discover_datasets>` mismatch, missing dataset name, format mismatch) → adjust the corresponding `<output>` / `<discover_datasets>` block.
   - `tests[].data.status == "failure"`, and `data.job.stderr` (when present) carries upstream tool stderr → the `<command>` Cheetah translation is wrong; LLM revises.
   - `tests[].data.status == "error"` with HTTP/URL signals → fixture-availability fault; verify the `nf-core/test-datasets` URL resolves and consider a local `test-data/` fallback, recording the divergence in `_provenance.yml.overrides`.
   - Any other failure shape → human triage.
3. **Stop** when lint and test both clear.
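
The classification rules above might be sketched as a single function over one `tests[]` entry. Two parts are assumptions, not facts from the schema: the regular expressions standing in for "output-discovery pattern" and "HTTP/URL signals", and the choice to look for URL signals in `data.problem_log`. The returned labels are hypothetical as well.

```python
import re

# Hypothetical markers; the real wording of planemo problem logs is not
# specified in this document.
DISCOVERY_RE = re.compile(r"discover_datasets|dataset name|format mismatch", re.IGNORECASE)
URL_RE = re.compile(r"https?://|\bHTTP\b|\bURL\b", re.IGNORECASE)


def classify_test(test: dict) -> str:
    """Map one validated tests[] entry to the next action in the loop."""
    data = test.get("data") or {}
    status = data.get("status")
    problem_log = data.get("problem_log") or ""
    # data.job is extra-allow, so stderr is a best-effort signal.
    stderr = (data.get("job") or {}).get("stderr") or ""
    if status == "success":
        return "pass"
    if status == "failure" and DISCOVERY_RE.search(problem_log):
        return "fix-outputs"
    if status == "failure" and stderr.strip():
        return "fix-command"
    if status == "error" and URL_RE.search(problem_log):
        return "fixture-unavailable"
    return "human-triage"
```

The function is meant to run only after schema validation has succeeded, so a schema-invalid report never reaches it.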

The convergence loop is bounded (default 3 attempts). On exhaustion, the skill writes whatever it has and surfaces the final `_planemo_test_report.json` (plus the lint diagnostics) for human triage.
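
The bounded loop itself can be sketched with the gates passed in as callables. The sketch assumes one pass through the loop counts as one attempt whether it stops at lint or at test; `converge` and the `fix` callback are hypothetical names.

```python
from typing import Callable


def converge(
    lint: Callable[[], bool],
    test: Callable[[], bool],
    fix: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Run lint, then test, fixing after each failure; give up after max_attempts."""
    for _ in range(max_attempts):
        if not lint():
            fix("lint")   # repair the XML, then retry from the lint gate
            continue
        if test():
            return True   # both gates cleared
        fix("test")       # revise outputs or command, then retry
    return False          # exhausted: caller surfaces reports for human triage
```

A `False` return corresponds to the exhaustion path: the caller writes out what it has together with the final report and lint diagnostics.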

### Non-goals

- **Subworkflow conversion.** `meta.yml.components:` populated → out of scope. Routed to a separate skill (not in this plan).
- **Pipeline conversion.** Whole nf-core pipelines stay with `nextflow-to-galaxy`; this skill runs at the module tier.
- **Discovery / dedup against IUC.** The new repo coexists with IUC by design (different interface contract for the same upstream tool).
- **Cross-module refactoring.** Per-module unit; harness owns batches.
- **Tool Shed publication.** The output is a tool dir on disk; `.shed.yml` and shed publication live in the new repo's CI, not in this skill.

### Caveats

- **`meta.yml` may lie.** It is hand-authored and can drift from the `script:` IO. When the LLM-inferred IO disagrees with `meta.yml`, prefer `meta.yml` and surface the disagreement in `_provenance.yml.overrides`.
- **Container directive ternaries** require pulling **both** branches; the bioconda pin from `environment.yml` is the source of truth for `<requirements>`. Don't substitute a hand-picked container source.
- **`task.ext.args` with embedded Groovy logic** can't be cleanly mapped to a single text param. Surface as a text param **plus** a heavy `<help>` block; document the divergence in `_provenance.yml.overrides`.
- **Stub-only tests.** When `tests/main.nf.test` has only stub-mode coverage (`-stub-run`), the convert skill cannot derive a Galaxy `<test>` from it (per `nfcore-stub-block-to-galaxy-noop-test`). Surface the gap; let the harness decide whether to author a `<test>` by hand.

### Reference dispatch (for casting)

- `patterns` → 5 nf-core→Galaxy translation patterns (see frontmatter `references[]`). LLM-condensed into the skill.
- `cli-tool` → planemo carries the pinned install metadata; flows into the cast bundle's `_required_tools.json` via the PR #235 mechanism.
- `cli-command` → planemo-lint and planemo-test cast to JSON sidecars; consulted on-demand inside the §10 loop.
- `schema` → planemo-test-report copied verbatim into the cast bundle; the convergence loop AJV-validates `--test_output_json` output against it before classifying failures.
- `research` → `component-nf-core-tools`, `component-nextflow-containers-and-envs`, `galaxy-discover-datasets`. On-demand, condensed into runtime context when triggered.
- `examples` → pending: 3 hand-picked Wave 1 modules (one trivial, one paired-aware, one with conditional). Used for round-trip smoke testing before this skill ships.

### Revision history

- **rev 1 (2026-05-10)** — initial draft. Procedure sketched against the trimmed plan; no cast runs yet. Pattern + CLI references all `evidence: hypothesis`.
- **rev 2 (2026-05-11)** — convergence loop rewritten against the JSON test-report gate: §10.2 now consumes `planemo test --test_output_json` validated against planemo-test-report; pulled in `cli-tool`/`cli-command`/`schema` references for planemo, planemo-lint, planemo-test, planemo-test-report. Deferred-manpages caveat removed.

## Runtime Notes

- Do not read Foundry source files at runtime; use only files packaged in this skill bundle and user-supplied artifacts.
- Preserve declared artifact filenames unless the user or harness supplies explicit paths.
- Carry unresolved assumptions into the output artifact instead of silently inventing missing source evidence.