
PR 19305 - Implement Sample Sheets

Sample sheets attach typed columnar metadata to dataset collection elements for bioinformatics workflows

Revised: 2026-04-22
Revision: 3
GitHub PR: #19305
Related Notes: Component - Collections - Sample Sheets Backend, Component - Collection API, Component - Collection Models, PR 19377 - Collection Types and Wizard UI

Research: Galaxy PR #19305 - Implement Sample Sheets

PR: https://github.com/galaxyproject/galaxy/pull/19305
Branch: sample_sheets -> dev
Merged: 2025-07-30
Stats: 113 files changed, +6504 / -235
Implements: Issue #19085


Overview

Problem

Bioinformatics workflows (e.g. ChIP-seq) require complex “sample sheets” that encode per-sample metadata not captured by Galaxy’s existing simple list/paired collection types. Without a collection type that carries this metadata, users had to either:

  • Upload tabular files alongside lists, losing the structural connection between datasets and their metadata
  • Manually encode metadata in file naming conventions

Solution

This PR introduces a new sample_sheet collection type that attaches typed, validated, columnar metadata to each element of a dataset collection. Sample sheets extend the existing collection system with:

  1. column_definitions on DatasetCollection — a JSON column defining the schema (column name, type, validators, restrictions, default values)
  2. columns on DatasetCollectionElement — a JSON column storing each element’s row of metadata values
  3. New collection types: sample_sheet, sample_sheet:paired, sample_sheet:paired_or_unpaired, sample_sheet:record
  4. Workbook (XLSX/CSV/TSV) generation and parsing — users can download spreadsheets, fill them in externally, and upload them back
  5. Workflow editor UI for defining column schemas on collection inputs
  6. Wizard-based collection creator for interactively building sample sheets from URIs, pasted data, existing collections, or uploaded workbooks
  7. AG Grid-based spreadsheet UI for in-browser editing of sample sheet metadata
  8. Rule Builder integration — new add_column_from_sample_sheet_index rule for using sample sheet metadata in the Apply Rules tool
  9. __SAMPLE_SHEET_TO_TABULAR__ tool — converts sample sheet metadata to a tabular dataset for downstream use

Database Migration

File: lib/galaxy/model/migrations/alembic/versions_gxy/3af58c192752_implement_sample_sheets.py
Revision: 3af58c192752, depends on 338d0e5deb03

Adds two JSON columns:

  • dataset_collection.column_definitions — stores SampleSheetColumnDefinitions (list of column definition dicts)
  • dataset_collection_element.columns — stores SampleSheetRow (list of column values for that element)

Both are nullable with default None, so existing collections are unaffected.
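The effect of adding nullable columns can be illustrated without Alembic. The following is a minimal sketch using the standard-library sqlite3 module instead of Galaxy's real migration; the table layout is reduced to the bare minimum and is an assumption made for illustration only.

```python
import sqlite3

# In-memory stand-in for the Galaxy database (assumption: heavily simplified tables).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_collection (id INTEGER PRIMARY KEY, collection_type TEXT)")
conn.execute("CREATE TABLE dataset_collection_element (id INTEGER PRIMARY KEY, element_identifier TEXT)")

# Rows that exist before the migration runs.
conn.execute("INSERT INTO dataset_collection (collection_type) VALUES ('list:paired')")
conn.execute("INSERT INTO dataset_collection_element (element_identifier) VALUES ('sample1')")

# Same idea as the migration: add two nullable JSON-typed columns with no default.
conn.execute("ALTER TABLE dataset_collection ADD COLUMN column_definitions JSON")
conn.execute("ALTER TABLE dataset_collection_element ADD COLUMN columns JSON")

# Pre-existing rows simply read back NULL (None) for the new columns.
collections = conn.execute(
    "SELECT collection_type, column_definitions FROM dataset_collection"
).fetchall()
elements = conn.execute(
    "SELECT element_identifier, columns FROM dataset_collection_element"
).fetchall()
```

Because neither column has a NOT NULL constraint or a non-null default, no backfill of existing rows is needed.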

Status in codebase: EXISTS, unchanged.


Architecture

Data Model Flow

Workflow Input Definition
  -> collection_type = "sample_sheet:paired"
  -> column_definitions = [{name, type, optional, description, validators, restrictions, suggestions, default_value}, ...]

Collection Creation (API or fetch)
  -> DatasetCollection.column_definitions = column_definitions
  -> DatasetCollectionElement.columns = [value1, value2, ...] (one per element)

Downstream tool usage
  -> DatasetCollectionWrapper.sample_sheet_row(element_identifier) returns the row
  -> Rule Builder: add_column_from_sample_sheet_index extracts metadata columns for rule-based operations

API Flow

  1. Workflow author defines sample_sheet:paired input with column definitions in workflow editor
  2. User running workflow sees a wizard UI that lets them:
    • Select a source (remote files, paste URIs, upload workbook, select existing collection)
    • Auto-pair files if needed (for paired sample sheets)
    • Fill in metadata in an AG Grid spreadsheet
    • Download a pre-seeded workbook to fill in externally
  3. Submission either:
    • Creates via fetch API (for URI-based imports): POST /api/tools/fetch with column_definitions on the target and row on each element
    • Creates via collection API (for existing datasets): POST /api/dataset_collections with column_definitions and rows on the payload
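For the collection API path, a payload could be assembled roughly as follows. This is a sketch under assumptions: the helper name build_create_payload is hypothetical, and rows is assumed to be keyed by element identifier; only the presence of column_definitions and rows on the payload comes from the notes above.

```python
from typing import Any, Dict, List


def build_create_payload(
    history_id: str,
    name: str,
    column_definitions: List[Dict[str, Any]],
    dataset_ids: Dict[str, str],
    rows: Dict[str, List[Any]],
) -> Dict[str, Any]:
    """Assemble a body for POST /api/dataset_collections (hypothetical helper)."""
    # Every element needs a metadata row; fail early rather than let the server reject it.
    missing = sorted(set(dataset_ids) - set(rows))
    if missing:
        raise ValueError(f"no row supplied for elements: {missing}")
    return {
        "history_id": history_id,
        "name": name,
        "collection_type": "sample_sheet",
        "column_definitions": column_definitions,
        "element_identifiers": [
            {"name": identifier, "src": "hda", "id": dataset_id}
            for identifier, dataset_id in dataset_ids.items()
        ],
        # Assumption: rows maps element identifier -> list of column values.
        "rows": rows,
    }


example_payload = build_create_payload(
    history_id="HISTORY_ID",
    name="chipseq samples",
    column_definitions=[
        {"name": "condition", "type": "string", "optional": False},
        {"name": "replicate", "type": "int", "optional": False},
    ],
    dataset_ids={"sample1": "DATASET_ID_1", "sample2": "DATASET_ID_2"},
    rows={"sample1": ["treated", 1], "sample2": ["control", 1]},
)
```

The IDs above are placeholders, not real encoded Galaxy IDs.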

Client Flow

FormDataWorkflowRunTabs.vue  (detects sample_sheet type)
  -> SampleSheetCollectionCreator.vue
    -> SampleSheetWizard.vue (multi-step wizard)
      Step 1: Select source (SourceFromRemoteFiles, SourceFromPastedData, SourceFromWorkbook, SourceFromCollection, etc.)
      Step 2: Source-specific input (folder selection, paste area, dataset selection, etc.)
      Step 3: Auto-pairing (for paired types)
      Step 4: Upload workbook (if workbook source)
      Step 5: SampleSheetGrid.vue (AG Grid spreadsheet for editing metadata)
        -> Submit: either fetch API or collection create API

File Inventory

Backend Model Layer

File | Change | Exists
lib/galaxy/model/__init__.py | Added column_definitions (Mapped JSON) to DatasetCollection, columns (Mapped JSON) to DatasetCollectionElement. Updated __init__, _base_to_dict, _serialize, dict_element_visible_keys. | YES
lib/galaxy/model/dataset_collections/types/sample_sheet.py | NEW. SampleSheetDatasetCollectionType: flat list of named elements with column metadata. generate_elements() validates rows against column_definitions. | YES
lib/galaxy/model/dataset_collections/types/sample_sheet_util.py | NEW. Core validation logic: SampleSheetColumnDefinitionModel (Pydantic), validate_column_definitions(), validate_row(), validate_column_value(). Validates types (int/float/string/boolean/element_identifier), restrictions, and safe validators. | YES
lib/galaxy/model/dataset_collections/types/sample_sheet_workbook.py | NEW (614 lines). Workbook generation and parsing for sample sheets. Key classes: CreateWorkbookRequest, ParseWorkbook, ParsedWorkbook, CreateWorkbookForCollection, ParseWorkbookForCollection. Functions: generate_workbook_from_request(), generate_workbook_from_request_for_collection(), parse_workbook(), parse_workbook_for_collection(). Generates XLSX with data validation, instructions sheet, column help. Parses XLSX/CSV/TSV back into structured data. | YES
lib/galaxy/model/dataset_collections/workbook_util.py | Extended with CSV/TSV support. Added ReadOnlyWorkbook protocol, ExcelReadOnlyWorkbook, CsvReaderReadOnlyWorkbook, CsvDialect, ContentTypeMessage, CsvDialectInferenceMessage, parse_format_messages(). load_workbook_from_base64() now detects file format (xlsx vs CSV/TSV via magic bytes). | YES
lib/galaxy/model/dataset_collections/builder.py | build_collection() and set_collection_elements() now accept column_definitions and rows. CollectionBuilder tracks _current_row_data, get_level() and add_dataset() accept row param. New build_elements_and_rows(). BoundCollectionBuilder.populate_partial() passes rows. | YES
lib/galaxy/model/dataset_collections/registry.py | Added sample_sheet module import and SampleSheetDatasetCollectionType to PLUGIN_CLASSES. | YES
lib/galaxy/model/dataset_collections/rule_target_columns.py | column_titles_to_headers() now returns Tuple[List[HeaderColumn], List[InferredColumnMapping]] (added inference logging). New InferredColumnMapping model. Accepts column_offset param. | YES
lib/galaxy/model/dataset_collections/type_description.py | Added COLLECTION_TYPE_REGEX validating all valid collection type strings including sample_sheet*. New validate() method on CollectionTypeDescription. | YES
lib/galaxy/model/dataset_collections/adapters.py | Added columns property (returns None) to adapter class for compatibility. | YES
lib/galaxy/model/store/__init__.py | materialize_elements() now passes columns=element_attrs.get("columns") to DatasetCollectionElement. | YES
lib/galaxy/model/store/discover.py | _populate_elements() tracks rows list, passes row to get_level() and add_dataset(). persist_elements_to_hdca() uses BoundCollectionBuilder with row support. JsonCollectedDatasetMatch gets row property. | YES

Backend Schema Layer

File | Change | Exists
lib/galaxy/schema/schema.py | Added SampleSheetColumnType literal, SampleSheetColumnValueT union, SampleSheetColumnDefinition TypedDict, SampleSheetColumnDefinitions, SampleSheetRow, SampleSheetRows types. Added columns to DCESummary, column_definitions to HDCADetailed, column_definitions/rows to CreateNewCollectionPayload. | YES
lib/galaxy/schema/fetch_data.py | BaseCollectionTarget gets column_definitions. BaseDataElement gets row. | YES

Backend API / Services

File | Change | Exists
lib/galaxy/webapps/galaxy/api/dataset_collections.py | 4 new endpoints: POST /api/sample_sheet_workbook (create workbook), POST /api/sample_sheet_workbook/parse (parse workbook), POST /api/dataset_collections/{hdca_id}/sample_sheet_workbook (create workbook for collection), POST /api/dataset_collections/{hdca_id}/sample_sheet_workbook/parse (parse workbook for collection). Query params for base64 column definitions, prefix values, filename. | YES
lib/galaxy/webapps/galaxy/services/dataset_collections.py | New service methods: create_workbook(), create_workbook_for_collection(), parse_workbook(), parse_workbook_for_collection(). New models: CreateWorkbookForCollectionApi, ParseWorkbookForCollectionApi, ParsedWorkbookHda, ParsedWorkbookCollection, ParsedWorkbookElement, ParsedWorkbookForCollection. Helper: _attach_elements_to_parsed_workbook(). | YES

Backend Manager Layer

File | Change | Exists
lib/galaxy/managers/collections.py | create() and create_dataset_collection() accept column_definitions and rows, pass to builder.build_collection(). __init_rule_data() accepts parent_columns, propagates element.columns into sources for rule builder. | YES
lib/galaxy/managers/collections_util.py | api_payload_to_create_params() extracts column_definitions and rows, calls validate_column_definitions(). | YES

Backend Workflow Layer

File | Change | Exists
lib/galaxy/workflow/modules.py | InputCollectionModule.validate_state() validates collection_type using COLLECTION_TYPE_DESCRIPTION_FACTORY. get_runtime_inputs() passes column_definitions and fields to DataCollectionToolParameter. _parse_state_into_dict() extracts column_definitions. | YES

Backend Tools

File | Change | Exists
lib/galaxy/tools/sample_sheet_to_tabular.xml | NEW. Tool __SAMPLE_SHEET_TO_TABULAR__ converts sample sheet collection metadata to tabular. Uses Cheetah configfile template iterating $input.keys() and $input.sample_sheet_row($key). Handles None, empty string, boolean replacements. | YES
lib/galaxy/tools/wrappers.py | DatasetCollectionWrapper.__init__() builds self.__rows dict mapping element_identifier to columns. New sample_sheet_row() method. | YES
lib/galaxy/tools/data_fetch.py | Passes column_definitions through to fetched target. Copies row from src_item to target_metadata. | YES
lib/galaxy/tools/fetch/workbooks.py | Updated to use ReadOnlyWorkbook protocol, parse_format_messages(), new column_titles_to_headers() return type. _load_row_data() uses workbook’s iter_rows(). FetchParseLog type. | YES
lib/galaxy/tools/parameters/basic.py | DataCollectionToolParameter.__init__() reads fields and column_definitions from input_source. to_dict() includes them. | YES

Backend Utility Layer

File | Change | Exists
lib/galaxy/util/rules_dsl.py | NEW rule: AddColumnFromSampleSheetByIndex, which extracts a column from source["columns"] by index and appends to row data. Registered in RULES. | YES
lib/galaxy/util/rules_dsl_spec.yml | Added test cases for add_column_from_sample_sheet_index rule (single and multiple columns). | YES

Backend Tool Util / CWL

File | Change | Exists
lib/galaxy/tool_util/client/staging.py | create_collection_func() accepts optional rows param, passes to API payload. | YES
lib/galaxy/tool_util/cwl/util.py | New CollectionCreateFunc protocol with rows kwarg. replacement_collection() passes rows for sample_sheet types. | YES
lib/galaxy/tool_util/parser/parameter_validators.py | New UnsafeValidatorConfiguredInUntrustedContext exception (replaces bare assert). | YES
lib/galaxy/tool_util_models/parameter_validators.py | New AnySafeValidatorModel (union of Regex, InRange, Length validators only) and DiscriminatedAnySafeValidatorModel TypeAdapter. Used to restrict which validators sample sheet column definitions can use. | YES

Backend Config / Job Execution

File | Change | Exists
lib/galaxy/config/sample/tool_conf.xml.sample | Added <tool file="sample_sheet_to_tabular.xml" /> to sample tool conf. | YES
lib/galaxy/job_execution/output_collect.py | Minor change (passes through sample sheet context during output collection). | YES

Client API Layer

File | Change | Exists
client/src/api/datasetCollections.ts | New types: SampleSheetCollectionType, SampleSheetColumnValueT, CreateWorkbookForCollectionPayload, CreateWorkbookPayload. New functions: createWorkbook(), createWorkbookForCollection(). | YES
client/src/api/index.ts | Exports SampleSheetColumnDefinition, SampleSheetColumnDefinitionType, SampleSheetColumnDefinitions from schema. | YES
client/src/api/jobs.ts | New exported constants: NON_TERMINAL_STATES, ERROR_STATES, TERMINAL_STATES. | YES
client/src/api/tools.ts | Extended fetch data types and fetch function to support sample sheet payloads. | YES
client/src/api/schema/schema.ts | Auto-generated schema updates for all new API types. | YES

Client Components — Collection Creation

File | Change | Exists
client/src/components/Collections/SampleSheetCollectionCreator.vue | NEW. Thin wrapper loading config then rendering SampleSheetWizard. Props: collectionType, extendedCollectionType, extensions. | YES
client/src/components/Collections/SampleSheetWizard.vue | NEW (499 lines). Multi-step wizard component orchestrating sample sheet creation. Uses useWizard composable. Steps: select-source, select-remote-files-folder, paste-data, select-dataset, select-collection, auto-pairing, upload-workbook, fill-grid. Manages source state, auto-pairing, workbook parsing, fetch job monitoring, and collection creation. | YES
client/src/components/Collections/sheet/SampleSheetGrid.vue | NEW (663 lines). AG Grid-based spreadsheet for editing sample sheet metadata. Generates dynamic column definitions from schema. Handles two modes: uris (creating from URIs) and model_objects (from existing collections). Builds fetch targets or collection create payloads. Supports drag-and-drop workbook upload. Includes extension/dbKey selectors and collection name input. | YES
client/src/components/Collections/sheet/workbooks.ts | NEW (114 lines). Client-side workbook utilities: downloadWorkbook(), downloadWorkbookForCollection(), parseWorkbook(), withAutoListIdentifiers(), initialValue(). | YES
client/src/components/Collections/sheet/DownloadWorkbookButton.vue | NEW. Small button component for downloading workbooks. | YES
client/src/components/Collections/CollectionCreatorIndex.vue | Modified to route sample_sheet collection types to SampleSheetCollectionCreator. | YES

Client Components — Wizard Sources

File | Change | Exists
client/src/components/Collections/wizard/SourceFromRemoteFiles.vue | NEW. Card for selecting remote files as source. | YES
client/src/components/Collections/wizard/SourceFromPastedData.vue | NEW. Card for pasting URI data. | YES
client/src/components/Collections/wizard/SourceFromDatasetAsTable.vue | NEW. Card for selecting a tabular dataset as source. | YES
client/src/components/Collections/wizard/SourceFromWorkbook.vue | NEW. Card for uploading a workbook file. | YES
client/src/components/Collections/wizard/SourceFromCollection.vue | NEW. Card for selecting an existing collection. | YES
client/src/components/Collections/wizard/SelectCollection.vue | NEW. Collection selection dialog for sample sheet wizard. | YES
client/src/components/Collections/wizard/SelectDataset.vue | NEW. Dataset selection for tabular source. | YES
client/src/components/Collections/wizard/UploadSampleSheet.vue | NEW. Upload interface for sample sheet workbooks. | YES
client/src/components/Collections/wizard/CardDownloadWorkbook.vue | NEW. Card with download workbook action. | YES
client/src/components/Collections/wizard/fetchWorkbooks.ts | NEW. Client-side workbook column title to target type mapping (columnTitleToTargetType()). | YES
client/src/components/Collections/wizard/fetchWorkbooks.test.ts | NEW. Tests for columnTitleToTargetType. | YES
client/src/components/Collections/wizard/types.ts | NEW. Type definitions: InitialElements, PrefixColumnsType, RulesSourceFrom, ParsedFetchWorkbookColumn, AnyParsedSampleSheetWorkbook. | YES
client/src/components/Collections/wizard/rule_target_column_specification.yml | NEW. YAML spec for rule target column titles. | YES

Client Components — Pairing

File | Change | Exists
client/src/components/Collections/common/AutoPairing.vue | Extended to support generic HasName type (not just HistoryItemSummary). | YES
client/src/components/Collections/common/usePairingSummary.ts | New composable for pairing summary text. | YES
client/src/components/Collections/usePairing.ts | Generic auto-pairing composable used by wizard. | YES

Client Components — Workflow Editor

File | Change | Exists
client/src/components/Workflow/Editor/Forms/FormCollectionType.vue | Added sample_sheet collection type options. Validates collection type string before emitting. | YES
client/src/components/Workflow/Editor/Forms/FormColumnDefinition.vue | NEW (269 lines). Form for editing a single column definition: name, type, description, restrictions/suggestions, optional flag, default value. Validates column names against reserved Galaxy column titles. | YES
client/src/components/Workflow/Editor/Forms/FormColumnDefinitions.vue | NEW (161 lines). Repeatable form for managing list of column definitions. Add/remove/reorder columns. Download example workbook button. | YES
client/src/components/Workflow/Editor/Forms/FormColumnDefinitionType.vue | NEW (53 lines). Select dropdown for column type (Text, Integer, Float, Boolean, Element Identifier). | YES
client/src/components/Workflow/Editor/Forms/FormInputCollection.vue | Extended to show FormColumnDefinitions when collection_type starts with sample_sheet. | YES
client/src/components/Workflow/Editor/modules/collectionTypeDescription.ts | Updated to handle sample_sheet collection types. | YES
client/src/components/Workflow/Editor/modules/inputs.ts | Updated input module to handle sample_sheet collection type metadata. | YES

Client Components — Form Data / Workflow Run

File | Change | Exists
client/src/components/Form/Elements/FormData/FormData.vue | Updated to detect sample_sheet collection types and route to sample sheet creator. | YES
client/src/components/Form/Elements/FormData/FormDataWorkflowRunTabs.vue | Tabs component updated for sample sheet collection type handling. | YES
client/src/components/Form/Elements/FormData/collections.ts | Collection type utilities updated for sample_sheet types. | YES
client/src/components/Form/Elements/FormData/types.ts | New ExtendedCollectionType type with columnDefinitions field. | YES
client/src/components/Form/FormElement.vue | Minor updates for sample sheet support. | YES

Client Components — Rule Builder

File | Change | Exists
client/src/components/RuleCollectionBuilder.vue | Added add_column_from_sample_sheet_index rule type to UI. Added sampleSheetMetadataAvailable computed property. Updated populateElementsFromCollectionDescription() to propagate columns from sample sheet elements. Updated metadata options to recognize sample_sheet types. Modernized fetch call to use typed API. | YES
client/src/components/RuleBuilder/rule-definitions.js | Added rule definition for add_column_from_sample_sheet_index. | YES

Client Components — History / Other

File | Change | Exists
client/src/components/History/Content/Collection/CollectionDescription.vue | Updated to display sample_sheet collection type descriptions. | YES
client/src/components/History/CurrentHistory/HistoryOperations/SelectionOperations.vue | Updated selection operations for sample sheet support. | YES
client/src/components/History/adapters/buildCollectionModal.ts | Updated modal building to support extendedCollectionType. | YES
client/src/components/JobInformation/JobInformation.vue | Minor updates for compatibility. | YES
client/src/components/JobStates/wait.js | Minor updates. | YES
client/src/components/Libraries/LibraryFolder/TopToolbar/FolderTopBar.vue | Added extended-collection-type prop to CollectionCreatorIndex. | YES
client/src/components/Upload/DefaultBox.vue | Added extended-collection-type prop to CollectionCreatorIndex. | YES
client/src/components/WorkflowInvocationState/util.ts | State constants moved (NON_TERMINAL_STATES, ERROR_STATES now in api/jobs.ts). | YES
client/src/components/admin/JobsList.vue | Minor updates. | YES
client/src/components/providers/utils.js | Minor updates. | YES

Client Composables / Stores / Utils

File | Change | Exists
client/src/composables/fetch.ts | NEW (107 lines). useJobWatcher() watches a job by ID using resource watcher. useFetchJobMonitor() submits fetch API call, monitors the resulting job until terminal state. | YES
client/src/composables/resourceWatcher.ts | Added startWatchingResourceIfNeeded() and stopWatchingResourceIfNeeded() exports. | YES
client/src/stores/workflowStepStore.ts | Updated to handle sample sheet collection type in workflow step store. | YES
client/src/utils/navigation/navigation.yml | Added navigation selectors for sample sheet UI elements (grid cells, wizard buttons, paste textarea, etc.). | YES
client/src/utils/utils.ts | Minor utility updates. | YES

Selenium Navigation / Test Infrastructure

File | Change | Exists
lib/galaxy/selenium/navigates_galaxy.py | Added ColumnDefinition dataclass. New methods: workflow_editor_enter_column_definitions(), workflow_editor_connect(), workflow_editor_source_sink_terminal_ids(), workflow_index_open_with_name(). Uses seletools.drag_and_drop. | YES
client/src/utils/navigation/navigation.yml | Extensive additions for sample sheet testing: sample_sheet section under workflow_run.input with selectors for grid cells, wizard navigation, paste data, collection selection, etc. | YES
packages/navigation/setup.cfg | Added seletools dependency. | YES

Test Files

File | Change | Exists
test/unit/data/dataset_collections/test_sample_sheet_util.py | NEW (205 lines). Tests for validation: column types, restrictions, optional columns, validators (regex, in_range, length), element_identifier type, special characters. | YES
test/unit/data/dataset_collections/test_sample_sheet_workbook.py | NEW (~200 lines). Tests for workbook generation and parsing: simple sample sheet, paired, paired_or_unpaired, from collection, TSV parsing, dbkey columns. | YES
test/unit/data/dataset_collections/test_type_descriptions.py | Added tests for sample_sheet collection type validation. | YES
test/unit/data/dataset_collections/test_workbook_util.py | Added tests for CSV/TSV workbook parsing utilities. | YES
test/unit/data/model/test_model_discovery.py | Tests for model discovery with sample sheet row data propagation. | YES
test/unit/app/tools/test_fetch_workbooks.py | Extended with CSV/TSV parsing tests. Updated column_titles_to_headers() calls for new return type. | YES
lib/galaxy_test/api/test_dataset_collections.py | ~240 lines of new API tests: test_sample_sheet_column_definition_problems, test_sample_sheet_element_identifier_column_type, test_sample_sheet_of_pairs_creation, test_sample_sheet_validating_against_column_definition, test_sample_sheet_requires_columns, workbook download/parse roundtrip, workbook for collection, sample sheet via fetch API, paired via fetch API, plus additional tests. | YES
lib/galaxy_test/api/test_tools.py | Extended with sample sheet tool test coverage. | YES
lib/galaxy_test/api/test_workflows.py | Extended with sample sheet workflow integration tests. | YES
lib/galaxy_test/base/populators.py | New methods: download_workbook(), download_workbook_for_collection(), parse_workbook(), parse_workflow_for_collection(). | YES
lib/galaxy_test/base/rules_test_data.py | New test data: EXAMPLE_SAMPLE_SHEET_SIMPLE_TO_NESTED_LIST, EXAMPLE_SAMPLE_SHEET_SIMPLE_TO_NESTED_LIST_OF_PAIRS. Tests converting sample sheets to nested lists via rules. | YES
lib/galaxy_test/selenium/test_workflow_editor.py | New: CHIPSEQ_COLUMNS test fixture, test_collection_input_sample_sheet_chipseq_example selenium test for workflow editor column definition entry. Refactored helper methods. | YES
lib/galaxy_test/selenium/test_workflow_run.py | ~250 lines of new selenium tests: test_collection_input_sample_sheet_chipseq_example_from_uris (full end-to-end: paste URIs -> auto-pair -> fill grid -> submit -> verify tabular output), test_collection_input_sample_sheet_chipseq_example_from_list_pairs (create from existing list:paired collection). | YES
test/functional/tools/sample_tool_conf.xml | Added sample_sheet_to_tabular.xml to tool conf. | YES

Test Fixtures

File | Exists
lib/galaxy/model/unittest_utils/filled_in_workbook_1.tsv | YES
lib/galaxy/model/unittest_utils/filled_in_workbook_1.xlsx | YES
lib/galaxy/model/unittest_utils/filled_in_workbook_1_with_dbkey.xlsx | YES
lib/galaxy/model/unittest_utils/filled_in_workbook_from_collection.xlsx | YES
lib/galaxy/model/unittest_utils/filled_in_workbook_paired.xlsx | YES
lib/galaxy/model/unittest_utils/filled_in_workbook_paired_or_unpaired.xlsx | YES
lib/galaxy/app_unittest_utils/fetch_workbook.csv | YES
lib/galaxy/app_unittest_utils/fetch_workbook.tsv | YES

Key Implementation Details

Collection Type System

Sample sheets introduce a new top-level rank type sample_sheet that can be composed:

  • sample_sheet — flat list of datasets, each with columnar metadata
  • sample_sheet:paired — each element is a paired collection, with metadata per pair
  • sample_sheet:paired_or_unpaired — each element is either paired or unpaired
  • sample_sheet:record — each element is a record (heterogeneous collection)

The regex for valid collection types:

^((list|paired|paired_or_unpaired|record)(:(list|paired|paired_or_unpaired|record))*|sample_sheet|sample_sheet:paired|sample_sheet:record|sample_sheet:paired_or_unpaired)$

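Key difference from list: sample sheets cannot be nested further. sample_sheet is always the outermost rank and wraps at most one inner rank (paired, paired_or_unpaired, or record), so strings such as list:sample_sheet or sample_sheet:list:paired are rejected. This is enforced by the regex.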
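The nesting rule can be checked directly against the pattern quoted above. This is a small sketch: the regex is copied from this note, while the helper is_valid_collection_type is a hypothetical name and not Galaxy's validate() method.

```python
import re

# Pattern copied from the note; split across two string literals for readability.
COLLECTION_TYPE_REGEX = re.compile(
    r"^((list|paired|paired_or_unpaired|record)(:(list|paired|paired_or_unpaired|record))*"
    r"|sample_sheet|sample_sheet:paired|sample_sheet:record|sample_sheet:paired_or_unpaired)$"
)


def is_valid_collection_type(collection_type: str) -> bool:
    """Hypothetical helper: True if the string is an accepted collection type."""
    return COLLECTION_TYPE_REGEX.match(collection_type) is not None


accepted = ["list", "list:paired", "sample_sheet", "sample_sheet:paired_or_unpaired"]
rejected = ["list:sample_sheet", "sample_sheet:list", "sample_sheet:paired:paired", "sample_sheet:"]

results = {t: is_valid_collection_type(t) for t in accepted + rejected}
```

Every entry in accepted matches and every entry in rejected fails, since sample_sheet only appears in the four explicitly enumerated alternatives.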

Column Definition Schema

class SampleSheetColumnDefinition(TypedDict):
    name: str                    # column name (no special characters)
    type: SampleSheetColumnType  # "string" | "int" | "float" | "boolean" | "element_identifier"
    optional: bool
    description: NotRequired[Optional[str]]
    default_value: NotRequired[Optional[SampleSheetColumnValueT]]
    validators: NotRequired[Optional[List[Dict[str, Any]]]]  # only safe validators: regex, in_range, length
    restrictions: NotRequired[Optional[List[SampleSheetColumnValueT]]]
    suggestions: NotRequired[Optional[List[SampleSheetColumnValueT]]]
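To make the schema concrete, here is a simplified sketch of checking one row against a list of column definitions. It is not Galaxy's validate_row(): the function name row_problems, the type mapping, and the error strings are all assumptions, validators are ignored, and element_identifier is treated as a plain string.

```python
from typing import Any, Dict, List

# Assumed mapping from sample sheet column types to Python types.
PYTHON_TYPES = {
    "string": str,
    "int": int,
    "float": (int, float),
    "boolean": bool,
    "element_identifier": str,  # simplification: no cross-element lookup here
}


def row_problems(row: List[Any], column_definitions: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the row passed."""
    if len(row) != len(column_definitions):
        return [f"expected {len(column_definitions)} values, got {len(row)}"]
    problems = []
    for value, definition in zip(row, column_definitions):
        name, column_type = definition["name"], definition["type"]
        if value is None:
            if not definition.get("optional", False):
                problems.append(f"{name}: value is required")
            continue
        # bool is a subclass of int in Python, so reject it for non-boolean columns.
        wrong_type = isinstance(value, bool) != (column_type == "boolean")
        if wrong_type or not isinstance(value, PYTHON_TYPES[column_type]):
            problems.append(f"{name}: expected {column_type}")
            continue
        restrictions = definition.get("restrictions")
        if restrictions and value not in restrictions:
            problems.append(f"{name}: {value!r} is not one of {restrictions}")
    return problems


example_definitions = [
    {"name": "condition", "type": "string", "optional": False, "restrictions": ["treated", "control"]},
    {"name": "replicate", "type": "int", "optional": False},
    {"name": "note", "type": "string", "optional": True},
]
```

For example, row_problems(["treated", 1, None], example_definitions) returns an empty list, while a row with an unknown condition or a boolean in the replicate column yields one problem each.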

Workbook Generation

Uses openpyxl to generate XLSX files with:

  • Column headers derived from prefix columns (URI columns for sample_sheet, URI 1/URI 2 for paired) + user-defined columns
  • Data validation (dropdown lists for restrictions, type validation)
  • Cell protection on non-editable columns
  • Instructions sheet
  • Extra column help sheet (for Galaxy-recognized columns like dbkey, file_type, etc.)

Workbook Parsing

Supports three formats:

  1. XLSX — detected by ZIP magic bytes (PK\x03\x04), parsed with openpyxl
  2. CSV — detected by csv.Sniffer, parsed with csv module
  3. TSV — detected by csv.Sniffer (delimiter=\t)

The ReadOnlyWorkbook protocol abstracts over the two reader implementations: ExcelReadOnlyWorkbook for XLSX and CsvReaderReadOnlyWorkbook for CSV/TSV.
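The detection order described above (magic bytes first, then delimiter sniffing) can be sketched with the standard library alone. The function names detect_workbook_format and read_delimited_rows are hypothetical, and the real load_workbook_from_base64() may differ in details such as sample size and encoding handling.

```python
import csv
from typing import List

XLSX_MAGIC = b"PK\x03\x04"  # XLSX files are ZIP archives


def detect_workbook_format(content: bytes) -> str:
    """Return 'xlsx', 'csv', or 'tsv' for uploaded workbook bytes (hypothetical helper)."""
    if content.startswith(XLSX_MAGIC):
        return "xlsx"
    sample = content[:4096].decode("utf-8", errors="replace")
    # Restrict the sniffer to the two delimiters of interest.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return "tsv" if dialect.delimiter == "\t" else "csv"


def read_delimited_rows(content: bytes) -> List[List[str]]:
    """Parse CSV or TSV bytes into rows of strings using the sniffed dialect."""
    text = content.decode("utf-8")
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",\t")
    return list(csv.reader(text.splitlines(), dialect))


tsv_bytes = b"uri\tcondition\treplicate\nhttp://example.org/a.fq\ttreated\t1\nhttp://example.org/b.fq\tcontrol\t2\n"
csv_bytes = b"uri,condition,replicate\nhttp://example.org/a.fq,treated,1\nhttp://example.org/b.fq,control,2\n"
```

Checking the ZIP signature before sniffing matters because csv.Sniffer would otherwise be handed binary data it cannot interpret.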

Rule Builder Integration

New rule add_column_from_sample_sheet_index:

  • Extracts a value from source["columns"] at a given index
  • Appends it as a new column to the row data
  • Enables deriving collections (e.g., nested lists grouped by treatment) from sample sheet metadata

The __init_rule_data() in the collections manager propagates element.columns into the sources dict so rules can access them.
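The behaviour of the rule can be approximated in a few lines. This is a sketch, not the AddColumnFromSampleSheetByIndex class itself: the standalone function and the exact shape of the data and sources lists are assumptions based on the description above.

```python
from typing import Any, Dict, List, Tuple


def add_column_from_sample_sheet_index(
    data: List[List[Any]], sources: List[Dict[str, Any]], index: int
) -> Tuple[List[List[Any]], List[Dict[str, Any]]]:
    """Append source["columns"][index] to each row; sources are returned unchanged."""
    new_data = [row + [source["columns"][index]] for row, source in zip(data, sources)]
    return new_data, sources


# One data row and one source per element (assumed layout).
example_data = [["sample1"], ["sample2"], ["sample3"]]
example_sources = [
    {"identifiers": ["sample1"], "columns": ["treated", 1]},
    {"identifiers": ["sample2"], "columns": ["control", 1]},
    {"identifiers": ["sample3"], "columns": ["treated", 2]},
]

with_condition, _ = add_column_from_sample_sheet_index(example_data, example_sources, 0)
```

The new column can then feed later rules, for instance as the outer list identifier when grouping samples by treatment.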

Tool Wrapper Integration

DatasetCollectionWrapper exposes sample_sheet_row(element_identifier) which returns the columns list for an element. This is used by the __SAMPLE_SHEET_TO_TABULAR__ tool’s Cheetah template to iterate over rows and produce tabular output.
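The lookup and the tabular conversion can be sketched together in plain Python. The class CollectionRows is a hypothetical stand-in for DatasetCollectionWrapper, and both the cell formatting (None to empty string, booleans to true/false) and the leading identifier column are assumptions; the note only says that such replacements are handled.

```python
from typing import Any, List, Sequence, Tuple


class CollectionRows:
    """Hypothetical stand-in exposing keys() and sample_sheet_row() like the wrapper."""

    def __init__(self, elements: Sequence[Tuple[str, List[Any]]]):
        # Map element_identifier -> columns, mirroring the described rows dict.
        self._rows = {identifier: columns for identifier, columns in elements}

    def keys(self) -> List[str]:
        return list(self._rows)

    def sample_sheet_row(self, element_identifier: str) -> List[Any]:
        return self._rows[element_identifier]


def format_cell(value: Any) -> str:
    """Assumed replacements for tabular output."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_tabular(collection: CollectionRows) -> str:
    lines = []
    for key in collection.keys():
        cells = [key] + [format_cell(v) for v in collection.sample_sheet_row(key)]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


example = CollectionRows([("sample1", ["treated", 1, True]), ("sample2", ["control", 2, None])])
tabular = to_tabular(example)
```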

Fetch API Path

When creating sample sheets from URIs, the fetch API path is used:

  1. Each element in the fetch payload has a row field containing its column values
  2. The target has column_definitions describing the schema
  3. data_fetch.py passes row through to discovered file metadata
  4. discover.py propagates rows through CollectionBuilder.add_dataset(row=row) and get_level(row=row)
  5. Builder stores rows and passes them to build_collection() which sets DatasetCollectionElement.columns
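Steps 1 and 2 above correspond to a request body along these lines. This is a sketch: build_fetch_payload is a hypothetical helper, the URLs and history ID are placeholders, and fields other than column_definitions and row follow the general shape of fetch targets as an assumption rather than a verified schema.

```python
from typing import Any, Dict, List, Tuple


def build_fetch_payload(
    history_id: str,
    name: str,
    column_definitions: List[Dict[str, Any]],
    samples: List[Tuple[str, str, List[Any]]],
) -> Dict[str, Any]:
    """Build a body for POST /api/tools/fetch from (identifier, uri, row) tuples."""
    elements = [
        # Each element carries its own metadata row (step 1).
        {"src": "url", "url": uri, "name": identifier, "ext": "auto", "row": row}
        for identifier, uri, row in samples
    ]
    target = {
        "destination": {"type": "hdca"},
        "collection_type": "sample_sheet",
        "name": name,
        # The schema lives on the target, not on individual elements (step 2).
        "column_definitions": column_definitions,
        "elements": elements,
    }
    return {"history_id": history_id, "targets": [target]}


fetch_payload = build_fetch_payload(
    history_id="HISTORY_ID",
    name="chipseq samples",
    column_definitions=[{"name": "condition", "type": "string", "optional": False}],
    samples=[
        ("sample1", "https://example.org/sample1.fastq", ["treated"]),
        ("sample2", "https://example.org/sample2.fastq", ["control"]),
    ],
)
```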

API Changes Summary

New Endpoints

Method | Path | Description
POST | /api/sample_sheet_workbook | Generate XLSX workbook for sample sheet definition
POST | /api/sample_sheet_workbook/parse | Parse uploaded workbook against sample sheet definition
POST | /api/dataset_collections/{hdca_id}/sample_sheet_workbook | Generate workbook pre-seeded with collection elements
POST | /api/dataset_collections/{hdca_id}/sample_sheet_workbook/parse | Parse workbook against collection’s elements

Modified Endpoints

Method | Path | Change
POST | /api/dataset_collections | Accepts column_definitions and rows in payload
POST | /api/tools/fetch | Targets accept column_definitions, elements accept row

New Schema Types

  • SampleSheetColumnDefinitionModel
  • CreateWorkbookRequest
  • ParseWorkbook / ParsedWorkbook
  • CreateWorkbookForCollectionApi
  • ParseWorkbookForCollectionApi / ParsedWorkbookForCollection

Cross-Reference Notes

All 113 files from the PR exist at their original paths in the current codebase. The collection_specification branch HEAD (2a5538b103) matches the dev branch merge base — no additional commits have been made on top of the sample_sheets merge. The codebase is in the exact state left by the PR merge.


Test Coverage Summary

Unit Tests

  • Validation: 205 lines in test_sample_sheet_util.py covering all column types, restrictions, optional fields, validators (regex, in_range, length), element_identifier validation, special character rejection
  • Workbook generation/parsing: test_sample_sheet_workbook.py covers XLSX and TSV roundtrips for simple, paired, paired_or_unpaired, from-collection scenarios
  • Type descriptions: test_type_descriptions.py validates the collection type regex
  • Workbook utilities: test_workbook_util.py tests CSV/TSV parsing
  • Fetch workbooks: test_fetch_workbooks.py adds CSV/TSV parsing, updates existing tests for new return types
  • Model discovery: test_model_discovery.py tests row propagation during collection population

API Integration Tests

  • Collection creation with sample_sheet type (simple, paired, element_identifier columns)
  • Column definition validation (bad types, missing fields, unsafe validators)
  • Row validation against column definitions (type mismatches, out-of-range)
  • Workbook download/parse roundtrip via API
  • Workbook for collection via API
  • Fetch-based sample sheet creation
  • Paired fetch-based sample sheet creation

Selenium Tests

  • Workflow editor: defining sample_sheet:paired input with ChIP-seq-like column definitions
  • Workflow run from URIs: paste URLs -> auto-pair -> fill grid -> submit -> verify tabular output
  • Workflow run from existing collection: select list:paired -> fill metadata grid -> submit -> verify

Rules DSL Tests

  • rules_dsl_spec.yml test cases for add_column_from_sample_sheet_index
  • rules_test_data.py integration test data: sample_sheet -> nested list, sample_sheet:paired -> nested list:paired
