Galaxy sample_sheet collection types
Reference for the Galaxy backend shape that targets structured per-row metadata — the natural landing zone for Nextflow samplesheetToList parameters and for any source-side idiom that pairs typed metadata columns with dataset references.
Shape
A sample_sheet is a list-shaped collection where each element carries a typed columns row, and the parent collection carries a column_definitions schema. The model (lib/galaxy/model/__init__.py) adds two nullable JSON columns: dataset_collection.column_definitions and dataset_collection_element.columns. Existing collections are unaffected.
DatasetCollection
collection_type: "sample_sheet"
column_definitions: [{name, type, optional, ...}]
elements:
DatasetCollectionElement(element_identifier, columns: [v1, v2, ...], hda or child_collection)
Column definition vocabulary
SampleSheetColumnDefinition (lib/galaxy/schema/schema.py:384-405):
type: "string" | "int" | "float" | "boolean" | "element_identifier"
optional: bool
default_value, validators[], restrictions[], suggestions[]
element_identifiercross-references another row’s identifier in the same collection. Captures patterns like nf-corecontrol_sample. Within-collection only — no cross-collection refs.restrictions[]enumerates allowed values (maps to nf-schemaenum).validators[]is restricted to the safe allowlist (AnySafeValidatorModel, lib/galaxy/tool_util_models/parameter_validators.py):regex,in_range,length.expressionand other arbitrary-Python validators are rejected. Translation from richer nf-schema validation must drop to prose or fall through.- String values and column names are restricted to
[\w\-_ ?]*— no tabs, newlines, quotes, or most specials. Necessary for safe CSV/TSV serialization.
Variants
Enforced by COLLECTION_TYPE_REGEX in type_description.py:
| Variant | Inner element shape | Typical source |
|---|---|---|
sample_sheet | one dataset per row | one path column in nf-schema |
sample_sheet:paired | paired (forward/reverse) | two path columns, paired-end fastq |
sample_sheet:paired_or_unpaired | mixed single or pair | mixed-library sample sheets |
sample_sheet:record | record (named typed slots) | heterogeneous per-sample artifacts |
Constraints to plan around:
sample_sheetmust be outermost.list:sample_sheetandsample_sheet:listare invalid.prototype_elements()is not implemented — sample-sheet-shaped pre-creation isn’t possible because element count is unknown until data exists.
Mapping rules
allow_implicit_mapping is True for everything except record, so sample sheets behave like lists for mapping:
sample_sheetover a(dataset → dataset)tool produces an implicitsample_sheet-shaped output.column_definitionsandcolumnsare not propagated — metadata lives only on the input collection.sample_sheet:pairedover apairedinput produces asample_sheet-shaped output (one job per row).- A
sample_sheetcan be reduced by amultiple=truedata input. - Two sample sheets with identical structure can be linked for dot-product execution.
Tool-side access
Tools read columns at runtime through DatasetCollectionWrapper.sample_sheet_row(element_identifier) (lib/galaxy/tools/wrappers.py:643-707), which builds a {element_identifier: row} dict from element columns. The built-in __SAMPLE_SHEET_TO_TABULAR__ tool (lib/galaxy/tools/sample_sheet_to_tabular.xml) iterates and tab-joins for downstream tabular consumers. Tool inputs declare acceptance via data_collection collection_type="sample_sheet,sample_sheet:paired,...".
DataCollectionToolParameter carries column_definitions through to_dict(), which lets the workflow editor render column-definition forms for sample-sheet inputs.
API surfaces
POST /api/dataset_collectionsacceptscolumn_definitionsandrows(a{element_identifier: [values]}dict).POST /api/tools/fetchcarriescolumn_definitionsat target level and per-elementrowfor remote-URI sample-sheet creation.- Workbook endpoints (
/api/sample_sheet_workbook[/parse],/api/dataset_collections/{id}/sample_sheet_workbook[/parse]) generate and parse XLSX/CSV/TSV using openpyxl, with header rows, dropdowns forrestrictions, type validation, and an instructions sheet.
Workflow integration
InputCollectionModule (lib/galaxy/workflow/modules.py) validates column_definitions at workflow save and propagates them to the runtime form. Galaxy workflow YAML accepts:
inputs:
chipseq_data:
type: collection
collection_type: sample_sheet:paired
column_definitions:
- {type: string, name: condition, optional: false}
- {type: int, name: replicate, optional: false}
- {type: element_identifier, name: control_sample, optional: true}
This is the structured-input lever a Nextflow → Galaxy translation reaches for when source pipelines use samplesheetToList(params.input, "assets/schema_input.json").
Mapping from Nextflow sample sheets
| Nextflow / nf-schema | Galaxy sample_sheet* |
|---|---|
samplesheetToList(params.x, schema) materialization | sample_sheet collection input bound to params.x |
nf-schema column type: string|integer|number|boolean | column type: string|int|float|boolean |
Column annotated meta: (carried in meta map) | non-path column in columns[] |
| Path-typed column (one per row) | element dataset (variant sample_sheet) |
| Two path columns (fastq pair) | inner paired (variant sample_sheet:paired) |
| Mixed single/paired rows | inner paired_or_unpaired |
| Heterogeneous typed per-row artifacts | inner record (variant sample_sheet:record) |
nf-schema enum | restrictions[] |
nf-schema pattern | regex validator |
nf-core control_sample style cross-row reference | element_identifier column type |
Edges to flag, not silently squash, when translating:
- nf-schema validation richer than regex/in_range/length cannot be expressed; downgrade to prose with confidence note.
- Cross-sample-sheet references have no Galaxy expression — keep as plain
stringand warn. - Ad-hoc
splitCsv(header: true)parsing without anassets/schema_*.jsonproduces no column types — translation needs ad-hoc inference; this note’s mapping does not apply. - Output side has no automatic propagation: a tool mapped over a
sample_sheetproduces asample_sheet-shaped collection withoutcolumn_definitions. Carry-forward must be explicit (re-attaching metadata via__SAMPLE_SHEET_TO_TABULAR__or rules DSLadd_column_from_sample_sheet_index).
Why this matters for the Foundry
Without sample_sheet shape awareness, nextflow-summary-to-galaxy-interface defaults nf-core sample-sheet inputs to list:paired or a flat file input, dropping per-row metadata. Surfacing the column schema lets the interface Mold pick the right variant and lets the data-flow Mold preserve meta fields through identifier-keyed wiring instead of inventing parallel parameter inputs.