
Nextflow params to Galaxy workflow inputs

Rules for translating Nextflow params, sample sheets, channels, and control flags into gxformat2 inputs.

Revised 2026-05-08 · Rev 4 · component

Use this note when nextflow-summary-to-galaxy-interface turns a summary-nextflow artifact into Galaxy workflow inputs. nextflow-workflow-io-semantics defines what counts as a Nextflow interface surface; this note narrows that into gxformat2 input decisions.

Evidence quality:

  • Corpus-observed claims cite pinned fixtures under $NEXTFLOW_FIXTURES, the shared clone at /Users/jxc755/projects/repositories/workflow-fixtures/pipelines/.
  • Foundry-internal claims cite existing Foundry notes and the summary-nextflow schema.
  • External-doc claims cite Nextflow, nf-schema, gxformat2, and Galaxy docs.
  • Design inference states the translation posture this Foundry note recommends.

Translation order

  1. Read summary.params[], summary.sample_sheets[], summary.workflow.channels[], summary.workflow.conditionals[], process and subworkflow inputs[], and any warnings. Index summary.workflow.channels[] by from_param so each launch param can be looked up against its materialization channels in O(1).
  2. Classify each launch param as data-bearing, structured sample sheet, scalar workflow parameter, workflow-shape control, runtime/publish control, or reference/data-table selector.
  3. For data-bearing params, use materialization evidence — read workflow.channels[].construct (typed enum: fromPath / fromFilePairs / fromList / samplesheetToList / splitCsv / file / files) joined to the param via from_param, before consulting process inputs. When from_param is null but a channel’s verbatim source references the param, fall back to substring matching (covers one-hop Groovy bindings until jmchilton/foundry#211 is implemented).
  4. Use process and named-workflow inputs to refine shape and identifiers, not to invent top-level Galaxy inputs without upstream launch-param or external-source evidence.
  5. Decide gxformat2 type (data, collection, string, int, float, boolean), then separately decide format, collection_type, optional, default, and confidence.

Design inference: translate from launch params plus materialization evidence, not from every process input. Process inputs refine shape; they do not create top-level Galaxy inputs unless traced to a launch param or external materialization.
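
Steps 1 and 3 can be sketched as a small lookup. This is a minimal illustration, not the Foundry implementation: it assumes each channel record is a dict with from_param and construct keys as named above, and it uses a hypothetical source key for the channel's verbatim source text.

```python
from collections import defaultdict


def index_channels(channels):
    """Group channel records by from_param; keep unbound ones for the fallback."""
    by_param = defaultdict(list)
    unbound = []
    for ch in channels:
        if ch.get("from_param"):
            by_param[ch["from_param"]].append(ch)
        else:
            unbound.append(ch)
    return by_param, unbound


def channels_for_param(name, by_param, unbound):
    """Return (channels, match_kind) for one launch param."""
    hits = by_param.get(name, [])
    if hits:
        return list(hits), "from_param"
    # Fallback: substring match on the verbatim source. `source` is an assumed
    # field name. "params.input" also matches "params.input_dir", so a
    # substring hit deserves lower confidence than a from_param hit.
    needle = f"params.{name}"
    fallback = [ch for ch in unbound if needle in ch.get("source", "")]
    return fallback, ("substring" if fallback else "none")


# Illustrative records only; not taken from a fixture.
channels = [
    {"construct": "fromFilePairs", "from_param": "reads",
     "source": "Channel.fromFilePairs(params.reads)"},
    {"construct": "file", "from_param": None,
     "source": "file(params.fasta)"},
]
by_param, unbound = index_channels(channels)
```

Returning the match kind alongside the channels lets a later step lower confidence for substring hits instead of treating them like joined evidence.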

Summary to gxformat2 mapping

| summary-nextflow evidence | Question | gxformat2 input | Fields | Confidence |
| --- | --- | --- | --- | --- |
| sample_sheets[] with param = p | Does launch param p describe row-structured datasets? | type: collection | collection_type: sample_sheet*, column_definitions, optional | High for nf-schema or samplesheetToList; lower for splitCsv or ad-hoc. |
| params[] scalar type, not data/control | User-facing scalar? | string, int, float, boolean | default, optional, restrictions from enum | High when schema-backed. |
| params[] path-like + workflow.channels[] entry with from_param == p.name and construct in {"fromPath","file"} over a concrete (non-glob) path | Single dataset? | type: data | format only if nextflow-path-glob-to-galaxy-datatype is confident | Medium to high. |
| params[] path-like + workflow.channels[] entry with from_param == p.name and construct in {"fromPath","fromList","files","splitCsv"} over a glob/directory/list | Collection? | type: collection | collection_type: list unless shape evidence says otherwise | Medium. |
| workflow.channels[].construct == "fromFilePairs" (and from_param == p.name when known) | Paired files? | type: collection | collection_type: paired or list:paired | Medium; high when current source evidence pins sample cardinality. |
| Repeated tuple val(meta), path(reads) | Per-sample dataset list | type: collection | collection_type: list; use meta for identifiers | High. |
| Repeated tuple val(meta), path(reads) where reads is [R1, R2] | Paired read collection | type: collection | collection_type: list:paired or sample_sheet:paired if sample-sheet-backed | High. |
| tuple val(meta), path(a), path(b) | Heterogeneous record? | Usually parallel inputs or sample_sheet:record | Do not default to paired | Medium. |
| processes[].inputs[] only | Internal task input? | No top-level input | Use only to refine shape | High. |
| workflow.conditionals[] guard references param | Branching control? | Scalar input if preserving branch | boolean or enum/string; default from param | Medium to high. |
| Hidden/report/resource/output-only param | Runtime control? | Usually exclude | Record as excluded control param when useful | High. |
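
The channel-construct rows of the mapping can be sketched as a function. This is a rough illustration under stated assumptions: the many_samples argument and the "glob characters or trailing slash" heuristic are hypothetical, and the function returns None whenever the table does not give a direct answer, so the caller can fall back to sample-sheet evidence or a manual decision.

```python
import re

# Heuristic (assumption): these characters mark a glob pattern.
GLOB_CHARS = re.compile(r"[*?\[\]{}]")
SINGLE = {"fromPath", "file"}
MANY = {"fromPath", "fromList", "files", "splitCsv"}


def propose_input(construct, path_value, many_samples=None):
    """Propose a gxformat2 input shape from one channel construct."""
    if construct == "fromFilePairs":
        if many_samples is None:
            # Cardinality not pinned by source evidence: leave it open.
            return {"type": "collection", "collection_type": None,
                    "confidence": "medium"}
        ctype = "list:paired" if many_samples else "paired"
        return {"type": "collection", "collection_type": ctype,
                "confidence": "high"}
    looks_many = (
        construct in {"fromList", "files", "splitCsv"}
        or bool(GLOB_CHARS.search(path_value))
        or path_value.endswith("/")  # assumed directory marker
    )
    if construct in SINGLE and not looks_many:
        return {"type": "data", "confidence": "medium"}
    if construct in MANY and looks_many:
        return {"type": "collection", "collection_type": "list",
                "confidence": "medium"}
    return None  # e.g. samplesheetToList: decide from sample_sheets[] instead
```

Format is deliberately absent from the result, since the table defers datatype to nextflow-path-glob-to-galaxy-datatype.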

Scalar typing

| Nextflow / nf-schema type | gxformat2 type | Notes |
| --- | --- | --- |
| boolean | boolean | Preserve default if present. |
| integer | int | Use current gxformat2 spelling, not integer. |
| number | float | Use current gxformat2 spelling, not double. |
| string | string | Add restrictions from enum when choices are closed. |
| Primitive list | [string], [int], [float], [boolean] | Use only when Galaxy should expose multiple primitive values; otherwise keep as string or ask for interface design. |
| Path string that is a user dataset | data or collection | Do not leave as scalar if Galaxy should receive uploaded or selected data. |
| Path string that selects reference data | data input the user supplies, or a workflow-curated string enum with restrictions | Avoid introducing new Galaxy data tables (from_data_table / .loc); see §Reference data below. |

A path-like Nextflow string is not automatically Galaxy data. First decide whether it is a dataset, collection, sample sheet, reference selector, output directory, or runtime-control path.
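
For params already classified as plain scalars, the type rows above reduce to a lookup. The sketch below assumes a param record with nf-schema-style type, default, and enum keys; those key names are illustrative, not confirmed summary-nextflow fields.

```python
# nf-schema type -> current gxformat2 spelling.
SCALAR_TYPES = {
    "boolean": "boolean",
    "integer": "int",
    "number": "float",
    "string": "string",
}


def scalar_input(param):
    """Build a gxformat2 scalar input stub for an already-classified param."""
    spec = {"type": SCALAR_TYPES[param["type"]]}
    if "default" in param:
        spec["default"] = param["default"]
    # Closed choice sets become restrictions on string inputs.
    if spec["type"] == "string" and param.get("enum"):
        spec["restrictions"] = list(param["enum"])
    # `optional` is intentionally not set here: a default does not imply
    # optionality, so requiredness is decided separately.
    return spec
```

A path-like string must be ruled out before calling this, since the lookup cannot tell a dataset path from an ordinary string.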

Sample sheets

When summary.sample_sheets[] exists, it is the preferred input-shape source for that param. Map it to Galaxy sample_sheet* when row metadata should survive as invocation-time structured metadata.

| Sample-sheet evidence | Galaxy input |
| --- | --- |
| One path/data column per row | type: collection, collection_type: sample_sheet |
| Two required path/data columns forming R1/R2 | collection_type: sample_sheet:paired |
| Optional second mate or mixed single/paired rows | collection_type: sample_sheet:paired_or_unpaired |
| Multiple heterogeneous path columns per row | collection_type: sample_sheet:record |
| splitCsv(header: true) without column schema | Use sample_sheet* only if roles are clear; otherwise fall back to flat data CSV or ordinary collections. |
| Source consumes the manifest file literally | type: data, format: csv or tsv if confident. |
| Source uses the sheet to fetch accessions or remote references | Treat as manifest data or scalar/reference design choice; do not invent dataset collections. |

Use flat data CSV only when the workflow consumes the manifest as a file or the target cannot safely model row datasets and metadata.
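
The column-driven rows of the table can be sketched as follows. The column record shape is an assumption made for illustration: is_path, required, and a role of "r1" or "r2" are hypothetical keys that a caller would fill in after reading the row schema.

```python
def sample_sheet_collection_type(columns):
    """Pick a sample_sheet* collection_type from per-column evidence."""
    paths = [c for c in columns if c.get("is_path")]
    if not paths:
        # No row datasets: manifest data or a design choice, not a collection.
        return None
    if len(paths) == 1:
        return "sample_sheet"
    roles = {c.get("role") for c in paths}
    if len(paths) == 2 and roles == {"r1", "r2"}:
        if all(c.get("required") for c in paths):
            return "sample_sheet:paired"
        # An optional mate column implies mixed single/paired rows.
        return "sample_sheet:paired_or_unpaired"
    # Several path columns that are not a clean R1/R2 pair.
    return "sample_sheet:record"


columns = [
    {"name": "sample", "is_path": False, "required": True},
    {"name": "fastq_1", "is_path": True, "required": True, "role": "r1"},
    {"name": "fastq_2", "is_path": True, "required": False, "role": "r2"},
]
```

Non-path columns such as sample do not affect the collection type; they would carry over as column_definitions metadata.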

Non-sample-sheet collections

| Nextflow evidence | Galaxy shape |
| --- | --- |
| Direct fromPath over many files | collection_type: list |
| Direct paired glob / fromFilePairs | paired for one sample; list:paired for many samples |
| Existing per-sample tuple stream without explicit column schema | list or list:paired |
| Mixed single/paired branch split | paired_or_unpaired or split into list plus list:paired |
| Nested grouping axis matters | list:list, list:paired, or explicit reshaping; do not collapse silently. |
| Arbitrary tuple/record | Parallel inputs, explicit tabular metadata, or manual interface decision. |

See nextflow-to-galaxy-channel-shape-mapping for detailed channel-shape rules. Keep collection shape separate from datatype; R1/R2 names determine pairing, not the Galaxy datatype extension.

Control params

Exclude by default:

| Class | Examples | Reason |
| --- | --- | --- |
| Output location/publishing | outdir, publish_dir_mode | Galaxy owns histories and workflow outputs. |
| Email/notification | email, email_on_fail, plaintext_email, hook_url | Runtime reporting, not scientific input. |
| Logs/reports/runtime metadata | trace_report_suffix, monochrome_logs, validate_params | Execution UX. |
| Institutional/profile config | config_profile_name, pipelines_testdata_base_path | Site/run environment config. |
| CLI plumbing | help, help_full, show_hidden, version | Nextflow CLI behavior. |
| Pure publish toggles | save_* flags that only affect publishDir | Galaxy output exposure should be a workflow-design choice. |

Keep as Galaxy scalar inputs when they alter workflow shape:

| Class | Examples | Reason |
| --- | --- | --- |
| Tool or mode switches | aligner, trimmer, ribo_removal_tool, tools | Selects subgraph or tool branch. |
| Skip flags that gate analysis | skip_alignment, skip_trimming, skip_fastqc | Changes which steps run. |
| Save flags that change command outputs | tool arguments that make extra datasets | Alters command and output set. |
| Report config consumed by a Galaxy tool | multiqc_config, multiqc_logo | Real data input if the target includes that customization. |

A save_* or skip_* name is not enough. Classify by effect: if the param only changes publishDir, exclude. If it changes whether processes run, which tool is called, or which files a command creates, keep as scalar input or resolve to a fixed Galaxy design default.
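
The classify-by-effect rule can be sketched as a check over where a param is referenced. The site labels below (publishDir, report_config, branch_guard, and so on) are hypothetical names for categories a caller would derive from the summary; they are not summary-nextflow fields.

```python
# Reference sites that only affect publishing or reporting (assumed labels).
EXCLUDE_ONLY_SITES = {"publishDir", "report_config"}


def classify_control(ref_sites):
    """Classify a control param by where it is used, never by its name."""
    sites = set(ref_sites)
    if not sites:
        # No traced references yet: a name match alone is not an assessment.
        return "unassessed"
    if sites <= EXCLUDE_ONLY_SITES:
        return "exclude"
    # Used in a process script, channel construction, or branch guard:
    # keep as a scalar input or resolve to a fixed design default.
    return "keep"
```

Under this sketch a save_* flag referenced only from publishDir is excluded, while the same flag referenced from a process script is kept.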

Warning and impact assessment

Most execution-control params (outdir, publish_dir_mode, email, save_* toggles, reporting flags) are not Galaxy workflow concerns: Galaxy owns history layout, output exposure, and notifications. Drop them silently from the Galaxy interface, but record them so casting can surface a single warning to the user and a per-param impact note to the agent.

Cast Mold posture:

  • Warn the user. Emit one consolidated notice listing each Nextflow param dropped from the Galaxy interface, grouped by class (publish, notification, runtime UX, CLI plumbing). The user should know the Galaxy target is not a faithful CLI replica.
  • Assess problematic cases. For each dropped param the agent must decide whether the omission only changes runtime UX (safe) or changes scientific output (problematic). Mark a param problematic when any of these hold:
    • It is referenced outside publishDir / report config — e.g. inside a process script, channel construction, or branch guard.
    • A save_* flag gates a tool argument or process that produces a published dataset the Galaxy target should expose.
    • A skip_* flag gates a process whose outputs feed downstream steps the target keeps.
    • It selects a reference / database location with no portable Galaxy substitute (user-supplied data, curated string enum, or existing CVMFS path).
    • It is required by nf-schema and has no Galaxy-side substitute.
  • Promote, don’t drop, when problematic. Convert the param into a scalar input, fixed design default, or recorded validation loss; do not let it stay in the silent-exclude bucket.
  • Record the decision. Each excluded param gets source_param, class, effect, assessment (safe or problematic), and a one-line reason in the interface brief, so review can audit the exclusion list.

Default to safe-exclude only after the agent has traced the param’s references in the summary; a name match alone is not assessment.
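
One possible shape for the exclusion record and the consolidated notice is sketched below. The field names follow the list above, except that class is spelled param_class because class is a Python keyword; the notice wording and the helper names are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExcludedParam:
    source_param: str
    param_class: str  # e.g. "publish", "notification", "runtime UX"
    effect: str
    assessment: str   # "safe" or "problematic"
    reason: str


def to_promote(records):
    """Problematic params must leave the silent-exclude bucket."""
    return [r.source_param for r in records if r.assessment == "problematic"]


def consolidated_notice(records):
    """One notice listing safely dropped params, grouped by class."""
    groups = defaultdict(list)
    for r in records:
        if r.assessment == "safe":
            groups[r.param_class].append(r.source_param)
    lines = ["Nextflow params dropped from the Galaxy interface:"]
    for cls in sorted(groups):
        lines.append(f"  {cls}: {', '.join(sorted(groups[cls]))}")
    return "\n".join(lines)


records = [
    ExcludedParam("outdir", "publish", "publishDir target", "safe",
                  "Galaxy owns output location"),
    ExcludedParam("email", "notification", "completion mail", "safe",
                  "Galaxy owns notifications"),
    ExcludedParam("save_extra", "publish", "adds a tool argument",
                  "problematic", "gates a dataset the target should expose"),
]
```

Here save_extra is a made-up param name; it appears in to_promote(records) and is left out of the notice, matching the promote-don't-drop rule.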

Requiredness and defaults

| Evidence | Interpretation | Strength |
| --- | --- | --- |
| nf-schema root required[] includes param | Required launch param | High |
| Param.required: true | Required, source-normalized | High |
| Missing-default imperative error | Required | High |
| workflow.channels[].required_runtime == true (channel construction guarded by .ifEmpty { error ... }) | Materialized data required | High, but may be branch-conditional. |
| Sample-sheet column required | Required row column | High |
| Default exists | Default value, not optionality | Medium to high |
| Optional placeholder path or empty channel | Optional branch plumbing | Medium |
| Branch guard uses param | Required only in that branch | Medium |

Rules:

  • Set optional: false unless omission is semantically valid.
  • Do not infer optional: true from default.
  • Set default when Galaxy should supply a value if the user omits or nulls the input.
  • For branch-required inputs, prefer a scalar mode input plus branch-specific data inputs and a confidence note.
  • For sample sheets, required metadata columns become column_definitions[].optional: false; optional mate columns usually imply sample_sheet:paired_or_unpaired.
  • ifEmpty { error } after filters may mean content constraint, not launch-param requiredness. Preserve that as validation loss or an open question.
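
The first three rules can be sketched as a small function that keeps optional and default independent. The param record keys (name, required, default) and the omission_valid argument are assumptions for illustration; deciding whether omission is semantically valid remains a human or agent judgment made outside this code.

```python
def requiredness(param, schema_required=(), omission_valid=False):
    """Decide optional and default separately for one launch param."""
    basis = []
    if param["name"] in schema_required:
        basis.append("nf-schema required[]")
    if param.get("required"):
        basis.append("Param.required")
    spec = {
        # A default never makes an input optional. Only a judgment that
        # omission is valid does, and required evidence overrides it.
        "optional": bool(omission_valid) and not basis,
        "requiredness_basis": basis,
    }
    if param.get("default") is not None:
        spec["default"] = param["default"]
    return spec
```

For example, a param with a default and no other evidence yields optional: False plus the default, which is the case the second rule guards against misreading.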

Confidence annotations

| Confidence | Use when |
| --- | --- |
| High | nf-schema and materialization agree; sample-sheet schema exists; channel shape is simple; Galaxy shape is native. |
| Medium | One strong source plus inferred shape; branch/default interaction; path format known but arity inferred. |
| Low | Ad-hoc CSV parsing; glob-only typing; dynamic Groovy closures; arbitrary tuple/record; control effect unclear. |

Downstream interface briefs should record source_param, evidence, chosen galaxy_input_type, collection_type if applicable, requiredness_basis, known losses, confidence, and open questions.

Corpus examples

Corpus-observed:

  • $NEXTFLOW_FIXTURES/nf-core__rnaseq/nextflow_schema.json declares input as required with format: file-path, exists: true, schema: assets/schema_input.json, and mimetype: text/csv. Do not map it as plain data by default; inspect the row schema and materialization.
  • $NEXTFLOW_FIXTURES/nf-core__rnaseq/assets/schema_input.json includes sample, fastq_1, optional fastq_2, strandedness, and optional BAM path columns. This is sample metadata plus datasets, not just a CSV file.
  • $NEXTFLOW_FIXTURES/nf-core__rnaseq/workflows/rnaseq/main.nf passes mode and skip parameters such as trimmer, remove_ribo_rna, ribo_removal_tool, skip_alignment, and aligner into workflow branches and subworkflows. Preserve these only when the Galaxy target keeps source configurability.
  • $NEXTFLOW_FIXTURES/nf-core__taxprofiler/subworkflows/local/utils_nfcore_taxprofiler_pipeline/main.nf calls samplesheetToList for both biological input and database definitions. These likely map to different Galaxy interface decisions: sample datasets versus database/reference strategy.
  • $NEXTFLOW_FIXTURES/nf-core__sarek/subworkflows/local/samplesheet_to_channel/main.nf uses ifEmpty { error } after sample filters for tumor/normal constraints. That is conditional content validation, not unconditional top-level requiredness.

Foundry-internal:

  • gxformat2-workflow-inputs separates optional from default and recommends current gxformat2 scalar spellings.
  • galaxy-sample-sheet-collections defines sample_sheet, sample_sheet:paired, sample_sheet:paired_or_unpaired, and sample_sheet:record.
  • nextflow-workflow-io-semantics records that params.input is only a name; materialization decides whether it is a sample sheet, direct dataset, directory, glob, accession list, or mode switch.

Reference data

Nextflow pipelines often pass reference paths through params (genomes, indices, annotation bundles, kraken DBs). Translating these to Galaxy:

  • Prefer a data input the user supplies when the reference is small, distributable, or already lives on the Galaxy instance as a regular dataset.
  • Prefer a workflow-curated string input with restrictions when there is a small closed set of supported references the workflow author wants to enumerate.
  • Reuse existing CVMFS / data-table-backed inputs only when an established Galaxy tool the workflow already calls expects that exact .loc value (e.g. bowtie2_indexes). The string is then the .loc first column, as in iwc-test-data-conventions.
  • Do not introduce new Galaxy data tables to support a translated workflow. Data tables require admin install, tool_data_table_conf.xml edits, and .loc files, which break the Foundry’s portability-first posture: a translated workflow should run on a stock Galaxy with user-uploaded inputs. Record the loss and ask for an interface decision instead.

The same rule applies to database/reference sample sheets (e.g. nf-core/taxprofiler databases): map them to a regular sample-sheet collection of user-supplied datasets, or to a curated string selector — not to a new admin-managed table.

Open questions

  • Conditional requiredness has no clean pure-gxformat2 expression; interface briefs need review notes.
