Nextflow params to Galaxy workflow inputs
Use this note when nextflow-summary-to-galaxy-interface turns a summary-nextflow artifact into Galaxy workflow inputs. nextflow-workflow-io-semantics defines what counts as a Nextflow interface surface; this note narrows that into gxformat2 input decisions.
Evidence quality:
- Corpus-observed claims cite pinned fixtures under
$NEXTFLOW_FIXTURES, the shared clone at/Users/jxc755/projects/repositories/workflow-fixtures/pipelines/. - Foundry-internal claims cite existing Foundry notes and the
summary-nextflowschema. - External-doc claims cite Nextflow, nf-schema, gxformat2, and Galaxy docs.
- Design inference states the translation posture this Foundry note recommends.
Translation order
- Read
summary.params[],summary.sample_sheets[],summary.workflow.channels[],summary.workflow.conditionals[], process and subworkflowinputs[], and any warnings. Indexsummary.workflow.channels[]byfrom_paramso each launch param can be looked up against its materialization channels in O(1). - Classify each launch param as data-bearing, structured sample sheet, scalar workflow parameter, workflow-shape control, runtime/publish control, or reference/data-table selector.
- For data-bearing params, use materialization evidence — read
workflow.channels[].construct(typed enum:fromPath/fromFilePairs/fromList/samplesheetToList/splitCsv/file/files) joined to the param viafrom_param, before consulting process inputs. Whenfrom_paramis null but a channel’s verbatimsourcereferences the param, fall back to substring matching (covers one-hop Groovy bindings until jmchilton/foundry#211 is implemented). - Use process and named-workflow inputs to refine shape and identifiers, not to invent top-level Galaxy inputs without upstream launch-param or external-source evidence.
- Decide gxformat2 type (
data,collection,string,int,float,boolean), then separately decideformat,collection_type,optional,default, and confidence.
Design inference: translate from launch params plus materialization evidence, not from every process input:. Process inputs refine shape; they do not create top-level Galaxy inputs unless traced to a launch param or external materialization.
Summary to gxformat2 mapping
summary-nextflow evidence | Question | gxformat2 input | Fields | Confidence |
|---|---|---|---|---|
sample_sheets[] with param = p | Does launch param p describe row-structured datasets? | type: collection | collection_type: sample_sheet*, column_definitions, optional | High for nf-schema or samplesheetToList; lower for splitCsv or ad-hoc. |
params[] scalar type, not data/control | User-facing scalar? | string, int, float, boolean | default, optional, restrictions from enum | High when schema-backed. |
params[] path-like + workflow.channels[] entry with from_param == p.name and construct in {"fromPath","file"} over a concrete (non-glob) path | Single dataset? | type: data | format only if nextflow-path-glob-to-galaxy-datatype is confident | Medium to high. |
params[] path-like + workflow.channels[] entry with from_param == p.name and construct in {"fromPath","fromList","files","splitCsv"} over a glob/directory/list | Collection? | type: collection | collection_type: list unless shape evidence says otherwise | Medium. |
workflow.channels[].construct == "fromFilePairs" (and from_param == p.name when known) | Paired files? | type: collection | collection_type: paired or list:paired | Medium; high when current source evidence pins sample cardinality. |
Repeated tuple val(meta), path(reads) | Per-sample dataset list | type: collection | collection_type: list; use meta for identifiers | High. |
Repeated tuple val(meta), path(reads) where reads is [R1, R2] | Paired read collection | type: collection | collection_type: list:paired or sample_sheet:paired if sample-sheet-backed | High. |
tuple val(meta), path(a), path(b) | Heterogeneous record? | Usually parallel inputs or sample_sheet:record | Do not default to paired | Medium. |
processes[].inputs[] only | Internal task input? | No top-level input | Use only to refine shape | High. |
workflow.conditionals[] guard references param | Branching control? | Scalar input if preserving branch | boolean or enum/string; default from param | Medium to high. |
| Hidden/report/resource/output-only param | Runtime control? | Usually exclude | Record as excluded control param when useful | High. |
Scalar typing
| Nextflow / nf-schema type | gxformat2 type | Notes |
|---|---|---|
boolean | boolean | Preserve default if present. |
integer | int | Use current gxformat2 spelling, not integer. |
number | float | Use current gxformat2 spelling, not double. |
string | string | Add restrictions from enum when choices are closed. |
| Primitive list | [string], [int], [float], [boolean] | Use only when Galaxy should expose multiple primitive values; otherwise keep as string or ask for interface design. |
| Path string that is a user dataset | data or collection | Do not leave as scalar if Galaxy should receive uploaded or selected data. |
| Path string that selects reference data | data input the user supplies, or a workflow-curated string enum with restrictions | Avoid introducing new Galaxy data tables (from_data_table / .loc); see §Reference data below. |
A path-like Nextflow string is not automatically Galaxy data. First decide whether it is a dataset, collection, sample sheet, reference selector, output directory, or runtime-control path.
Sample sheets
When summary.sample_sheets[] exists, it is the preferred input-shape source for that param. Map it to Galaxy sample_sheet* when row metadata should survive as invocation-time structured metadata.
| Sample-sheet evidence | Galaxy input |
|---|---|
| One path/data column per row | type: collection, collection_type: sample_sheet |
| Two required path/data columns forming R1/R2 | collection_type: sample_sheet:paired |
| Optional second mate or mixed single/paired rows | collection_type: sample_sheet:paired_or_unpaired |
| Multiple heterogeneous path columns per row | collection_type: sample_sheet:record |
splitCsv(header: true) without column schema | Use sample_sheet* only if roles are clear; otherwise fall back to flat data CSV or ordinary collections. |
| Source consumes the manifest file literally | type: data, format: csv or tsv if confident. |
| Source uses the sheet to fetch accessions or remote references | Treat as manifest data or scalar/reference design choice; do not invent dataset collections. |
Use flat data CSV only when the workflow consumes the manifest as a file or the target cannot safely model row datasets and metadata.
Non-sample-sheet collections
| Nextflow evidence | Galaxy shape |
|---|---|
Direct fromPath over many files | collection_type: list |
Direct paired glob / fromFilePairs | paired for one sample; list:paired for many samples |
| Existing per-sample tuple stream without explicit column schema | list or list:paired |
| Mixed single/paired branch split | paired_or_unpaired or split into list plus list:paired |
| Nested grouping axis matters | list:list, list:paired, or explicit reshaping; do not collapse silently. |
| Arbitrary tuple/record | Parallel inputs, explicit tabular metadata, or manual interface decision. |
See nextflow-to-galaxy-channel-shape-mapping for detailed channel-shape rules. Keep collection shape separate from datatype; R1/R2 names determine pairing, not the Galaxy datatype extension.
Control params
Exclude by default:
| Class | Examples | Reason |
|---|---|---|
| Output location/publishing | outdir, publish_dir_mode | Galaxy owns histories and workflow outputs. |
| Email/notification | email, email_on_fail, plaintext_email, hook_url | Runtime reporting, not scientific input. |
| Logs/reports/runtime metadata | trace_report_suffix, monochrome_logs, validate_params | Execution UX. |
| Institutional/profile config | config_profile_name, pipelines_testdata_base_path | Site/run environment config. |
| CLI plumbing | help, help_full, show_hidden, version | Nextflow CLI behavior. |
| Pure publish toggles | save_* flags that only affect publishDir | Galaxy output exposure should be a workflow-design choice. |
Keep as Galaxy scalar inputs when they alter workflow shape:
| Class | Examples | Reason |
|---|---|---|
| Tool or mode switches | aligner, trimmer, ribo_removal_tool, tools | Selects subgraph or tool branch. |
| Skip flags that gate analysis | skip_alignment, skip_trimming, skip_fastqc | Changes which steps run. |
| Save flags that change command outputs | tool arguments that make extra datasets | Alters command and output set. |
| Report config consumed by a Galaxy tool | multiqc_config, multiqc_logo | Real data input if the target includes that customization. |
A save_* or skip_* name is not enough. Classify by effect: if the param only changes publishDir, exclude. If it changes whether processes run, which tool is called, or which files a command creates, keep as scalar input or resolve to a fixed Galaxy design default.
Warning and impact assessment
Most execution-control params (outdir, publish_dir_mode, email, save_* toggles, reporting flags) are not Galaxy workflow concerns: Galaxy owns history layout, output exposure, and notifications. Drop them silently from the Galaxy interface, but record them so casting can surface a single warning to the user and a per-param impact note to the agent.
Cast Mold posture:
- Warn the user. Emit one consolidated notice listing each Nextflow param dropped from the Galaxy interface, grouped by class (publish, notification, runtime UX, CLI plumbing). The user should know the Galaxy target is not a faithful CLI replica.
- Assess problematic cases. For each dropped param the agent must decide whether the omission only changes runtime UX (safe) or changes scientific output (problematic). Mark a param problematic when any of these hold:
- It is referenced outside
publishDir/ report config — e.g. inside a process script, channel construction, or branch guard. - A
save_*flag gates a tool argument or process that produces a published dataset the Galaxy target should expose. - A
skip_*flag gates a process whose outputs feed downstream steps the target keeps. - It selects a reference / database location with no portable Galaxy substitute (user-supplied
data, curatedstringenum, or existing CVMFS path). - It is required by nf-schema and has no Galaxy-side substitute.
- It is referenced outside
- Promote, don’t drop, when problematic. Convert the param into a scalar input, fixed design default, or recorded validation loss; do not let it stay in the silent-exclude bucket.
- Record the decision. Each excluded param gets
source_param,class,effect,assessment(safeorproblematic), and a one-line reason in the interface brief, so review can audit the exclusion list.
Default to safe-exclude only after the agent has traced the param’s references in the summary; a name match alone is not assessment.
Requiredness and defaults
| Evidence | Interpretation | Strength |
|---|---|---|
nf-schema root required[] includes param | Required launch param | High |
Param.required: true | Required, source-normalized | High |
Missing-default imperative error | Required | High |
workflow.channels[].required_runtime == true (channel construction guarded by .ifEmpty { error ... }) | Materialized data required | High, but may be branch-conditional. |
| Sample-sheet column required | Required row column | High |
| Default exists | Default value, not optionality | Medium to high |
| Optional placeholder path or empty channel | Optional branch plumbing | Medium |
| Branch guard uses param | Required only in that branch | Medium |
Rules:
- Set
optional: falseunless omission is semantically valid. - Do not infer
optional: truefromdefault. - Set
defaultwhen Galaxy should supply a value if the user omits or nulls the input. - For branch-required inputs, prefer a scalar mode input plus branch-specific data inputs and a confidence note.
- For sample sheets, required metadata columns become
column_definitions[].optional: false; optional mate columns usually implysample_sheet:paired_or_unpaired. ifEmpty { error }after filters may mean content constraint, not launch-param requiredness. Preserve that as validation loss or an open question.
Confidence annotations
| Confidence | Use when |
|---|---|
| High | nf-schema and materialization agree; sample-sheet schema exists; channel shape is simple; Galaxy shape is native. |
| Medium | One strong source plus inferred shape; branch/default interaction; path format known but arity inferred. |
| Low | Ad-hoc CSV parsing; glob-only typing; dynamic Groovy closures; arbitrary tuple/record; control effect unclear. |
Downstream interface briefs should record source_param, evidence, chosen galaxy_input_type, collection_type if applicable, requiredness_basis, known losses, confidence, and open questions.
Corpus examples
Corpus-observed:
$NEXTFLOW_FIXTURES/nf-core__rnaseq/nextflow_schema.jsondeclaresinputas required withformat: file-path,exists: true,schema: assets/schema_input.json, andmimetype: text/csv. Do not map it as plaindataby default; inspect the row schema and materialization.$NEXTFLOW_FIXTURES/nf-core__rnaseq/assets/schema_input.jsonincludessample,fastq_1, optionalfastq_2,strandedness, and optional BAM path columns. This is sample metadata plus datasets, not just a CSV file.$NEXTFLOW_FIXTURES/nf-core__rnaseq/workflows/rnaseq/main.nfpasses mode and skip parameters such astrimmer,remove_ribo_rna,ribo_removal_tool,skip_alignment, andalignerinto workflow branches and subworkflows. Preserve these only when the Galaxy target keeps source configurability.$NEXTFLOW_FIXTURES/nf-core__taxprofiler/subworkflows/local/utils_nfcore_taxprofiler_pipeline/main.nfcallssamplesheetToListfor both biological input and database definitions. These likely map to different Galaxy interface decisions: sample datasets versus database/reference strategy.$NEXTFLOW_FIXTURES/nf-core__sarek/subworkflows/local/samplesheet_to_channel/main.nfusesifEmpty { error }after sample filters for tumor/normal constraints. That is conditional content validation, not just unconditional top-level requiredness.
Foundry-internal:
- gxformat2-workflow-inputs separates
optionalfromdefaultand recommends current gxformat2 scalar spellings. - galaxy-sample-sheet-collections defines
sample_sheet,sample_sheet:paired,sample_sheet:paired_or_unpaired, andsample_sheet:record. - nextflow-workflow-io-semantics records that
params.inputis only a name; materialization decides whether it is a sample sheet, direct dataset, directory, glob, accession list, or mode switch.
Reference data
Nextflow pipelines often pass reference paths through params (genomes, indices, annotation bundles, kraken DBs). Translating these to Galaxy:
- Prefer a
datainput the user supplies when the reference is small, distributable, or already lives on the Galaxy instance as a regular dataset. - Prefer a workflow-curated
stringinput withrestrictionswhen there is a small closed set of supported references the workflow author wants to enumerate. - Reuse existing CVMFS / data-table-backed inputs only when an established Galaxy tool the workflow already calls expects that exact
.locvalue (e.g.bowtie2_indexes). The string is then the.locfirst column, as in iwc-test-data-conventions. - Do not introduce new Galaxy data tables to support a translated workflow. Data tables require admin install,
tool_data_table_conf.xmledits, and.locfiles, which break the Foundry’s portability-first posture: a translated workflow should run on a stock Galaxy with user-uploaded inputs. Record the loss and ask for an interface decision instead.
The same rule applies to database/reference sample sheets (e.g. nf-core/taxprofiler databases): map them to a regular sample-sheet collection of user-supplied datasets, or to a curated string selector — not to a new admin-managed table.
Open questions
- Conditional requiredness has no clean pure-gxformat2 expression; interface briefs need review notes.