Galaxy Dataset Collection API Layer

Comprehensive reference for the API layer that exposes Galaxy’s dataset collection system. Covers endpoints, service/manager interactions, schemas, serialization, authentication, and test coverage.

1. Endpoint Inventory
2. Dedicated Collection API (dataset_collections.py)
3. History Contents API (history_contents.py)
4. Service Layer
5. Manager Layer (API-Relevant Parts)
6. Request/Response Schemas
7. Serialization Pipeline
8. Collection Creation: Full Request Path
9. Collection Access and Navigation
10. Update, Delete, and Bulk Operations
11. Collection Downloads
12. Job State and Implicit Collections in API
13. Authentication and Authorization
14. Pagination and Filtering
15. Error Handling
16. Sample Sheet and Workbook Endpoints
17. Test Coverage
18. File Index

1. Endpoint Inventory

All endpoints that deal with dataset collections, across all API files.

Dedicated Collection Endpoints (`/api/dataset_collections/...`)

Method	Path	Operation	Description
POST	`/api/dataset_collections`	`create`	Create a new collection instance
GET	`/api/dataset_collections/{hdca_id}`	`show`	Get detailed info about a collection
PUT	`/api/dataset_collections/{hdca_id}`	`update_collection`	Update collection attributes
GET	`/api/dataset_collections/{hdca_id}/contents/{parent_id}`	`contents`	Get child elements of a subcollection
GET	`/api/dataset_collection_element/{dce_id}`	`content`	Get a single DCE by its ID
GET	`/api/dataset_collections/{hdca_id}/attributes`	`attributes`	Get dbkey/extension for all elements
GET	`/api/dataset_collections/{hdca_id}/suitable_converters`	`suitable_converters`	Get applicable converters
POST	`/api/dataset_collections/{hdca_id}/copy`	`copy`	Copy collection with new dbkey
GET	`/api/dataset_collections/{hdca_id}/download`	`download`	Download as zip archive
POST	`/api/dataset_collections/{hdca_id}/prepare_download`	`prepare_download`	Async prepare zip download

Sample Sheet / Workbook Endpoints

Method	Path	Operation	Description
POST	`/api/sample_sheet_workbook`	`create_workbook`	Generate XLSX for sample sheet definition
POST	`/api/sample_sheet_workbook/parse`	`parse_workbook`	Parse XLSX workbook
POST	`/api/dataset_collections/{hdca_id}/sample_sheet_workbook`	`create_workbook_for_collection`	Generate XLSX targeting existing collection
POST	`/api/dataset_collections/{hdca_id}/sample_sheet_workbook/parse`	`parse_workbook_for_collection`	Parse XLSX for existing collection

History Contents Endpoints (Collection-Relevant)

Method	Path	Operation	Description
GET	`/api/histories/{history_id}/contents`	`index`	List history contents (HDAs + HDCAs)
GET	`/api/histories/{history_id}/contents/{type}s`	`index_typed`	List filtered by type (datasets or dataset_collections)
GET	`/api/histories/{history_id}/contents/{type}s/{id}`	`show`	Get detail for HDA or HDCA
POST	`/api/histories/{history_id}/contents/{type}s`	`create_typed`	Create HDA or HDCA in history
POST	`/api/histories/{history_id}/contents`	`create` (deprecated)	Create HDA or HDCA
PUT	`/api/histories/{history_id}/contents/{type}s/{id}`	`update_typed`	Update HDA or HDCA
DELETE	`/api/histories/{history_id}/contents/{type}s/{id}`	`delete_typed`	Delete HDA or HDCA
PUT	`/api/histories/{history_id}/contents`	`update_batch`	Batch update multiple items
PUT	`/api/histories/{history_id}/contents/bulk`	`bulk_operation`	Bulk ops (hide, delete, tag, etc.)
GET	`/api/histories/{history_id}/contents/{type}s/{id}/jobs_summary`	`show_jobs_summary`	Job state summary for HDCA
GET	`/api/histories/{history_id}/jobs_summary`	`index_jobs_summary`	Batch job state summaries
GET	`/api/histories/{history_id}/contents/dataset_collections/{hdca_id}/download`	`download`	Download collection as zip
POST	`/api/histories/{history_id}/contents/dataset_collections/{hdca_id}/prepare_download`	`prepare_download`	Async prepare download
POST	`/api/histories/{history_id}/contents/{type}s/{id}/prepare_store_download`	`prepare_store_download`	Export-style download (model store)
POST	`/api/histories/{history_id}/contents/{type}s/{id}/write_store`	`write_store`	Write to external URI
POST	`/api/histories/{history_id}/copy_contents`	`copy_contents`	Copy datasets/collections between histories

2. Dedicated Collection API

File: lib/galaxy/webapps/galaxy/api/dataset_collections.py

This file defines FastAPIDatasetCollections, a class-based view (CBV) using FastAPI’s router. It delegates entirely to DatasetCollectionsService.

POST `/api/dataset_collections` — Create

Request body: CreateNewCollectionPayload (Pydantic model).

Key fields:

collection_type (str): e.g. "list", "paired", "list:paired", "sample_sheet"
element_identifiers (list): Elements to include, each with name, src (hda/ldda/hdca/new_collection), id, optional tags, optional nested element_identifiers
instance_type: "history" (default) or "library"
history_id: Required when instance_type == "history"
folder_id: Required when instance_type == "library"
name: Collection name
hide_source_items, copy_elements: Behavioral flags
fields: For record type, field definitions (or "auto")
column_definitions, rows: For sample_sheet type

Response: HDCADetailed — full collection representation including elements.

Flow: Service layer calls api_payload_to_create_params() to extract/validate params, then DatasetCollectionManager.create().
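
As an illustration, a request body for a list:paired collection might be assembled as shown here. This is a minimal sketch, not Galaxy code: build_list_paired_payload is a hypothetical helper, the IDs are placeholders, and the "forward"/"reverse" element names are assumed to follow the usual paired convention rather than anything stated above.

```python
from typing import Any


def build_list_paired_payload(
    history_id: str, name: str, pairs: dict[str, tuple[str, str]]
) -> dict[str, Any]:
    """Build a create payload for a list:paired collection.

    `pairs` maps a sample name to (forward_hda_id, reverse_hda_id).
    Hypothetical helper; field names mirror the key fields listed above.
    """
    element_identifiers = []
    for sample, (forward_id, reverse_id) in pairs.items():
        element_identifiers.append(
            {
                "name": sample,
                "src": "new_collection",  # nested collection described inline
                "collection_type": "paired",
                "element_identifiers": [
                    {"name": "forward", "src": "hda", "id": forward_id},
                    {"name": "reverse", "src": "hda", "id": reverse_id},
                ],
            }
        )
    return {
        "collection_type": "list:paired",
        "instance_type": "history",
        "history_id": history_id,  # required because instance_type is "history"
        "name": name,
        "element_identifiers": element_identifiers,
    }


# Placeholder IDs; real requests would use encoded Galaxy IDs.
payload = build_list_paired_payload(
    "abc123", "My Collection", {"sample1": ("hda_f1", "hda_r1")}
)
```

The resulting dictionary would be sent as the JSON body of POST `/api/dataset_collections`.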

GET `/api/dataset_collections/{hdca_id}` — Show

Query params: instance_type (history/library), view (element/element-reference/collection)

Response: AnyHDCA = Union[HDCACustom, HDCADetailed, HDCASummary]

The view parameter controls the level of detail:

"element" — Full element details including nested HDA metadata (default)
"element-reference" — Minimal element info (id, state, type) for efficient UI rendering
"collection" — No element information at all

GET `/api/dataset_collections/{hdca_id}/contents/{parent_id}` — Contents

Paginated child elements of a specific (sub)collection within an HDCA. Used for lazy-loading nested collections.

Query params: limit, offset, instance_type

Response: DatasetCollectionContentElements (root model wrapping list[DCESummary])

Security: validates that parent_id is a subcollection within the given hdca_id using hdca.contains_collection(parent_id) (recursive CTE query).

For subcollection elements, the response includes a contents_url for further drill-down navigation.

GET `/api/dataset_collection_element/{dce_id}` — Single Element

Returns a single DCESummary for a DatasetCollectionElement by its ID. Security check uses security_agent.can_access_collection() on the parent or child collection.

GET `/api/dataset_collections/{hdca_id}/attributes` — Attributes

Returns DatasetCollectionAttributesResult containing dbkey, extension, plus sets of all dbkeys and extensions in the collection.

GET `/api/dataset_collections/{hdca_id}/suitable_converters` — Converters

Returns SuitableConverters — list of tools that can convert all datatypes in the collection. Uses set intersection across all leaf dataset extensions to find converters applicable to the entire collection.
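
The set-intersection idea can be sketched in a few lines. This is an illustration only: converters_for_all and the converter names are hypothetical, and the real lookup goes through the datatypes registry rather than a plain dictionary.

```python
from typing import Optional


def converters_for_all(
    extensions: set[str], converters_by_extension: dict[str, set[str]]
) -> set[str]:
    """Return the converters that apply to every extension in the collection."""
    common: Optional[set[str]] = None
    for extension in extensions:
        available = converters_by_extension.get(extension, set())
        # The first extension seeds the result; later ones narrow it down.
        common = set(available) if common is None else common & available
    return common or set()


# Hypothetical converter names, for illustration only.
registry = {
    "fastq": {"fastq_to_fasta", "fastq_groomer"},
    "fastqsanger": {"fastq_to_fasta"},
}
shared = converters_for_all({"fastq", "fastqsanger"}, registry)
```

A converter that is missing for even one leaf extension drops out of the result, which is why mixed-datatype collections often have few or no suitable converters.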

POST `/api/dataset_collections/{hdca_id}/copy` — Copy with Attributes

Copies entire collection with new dbkey. Returns 204 No Content.

3. History Contents API

File: lib/galaxy/webapps/galaxy/api/history_contents.py

Defines FastAPIHistoryContents. Collections appear as HistoryContentType.dataset_collection items in history contents. Many endpoints are polymorphic — they handle both HDAs and HDCAs based on the type parameter.

Index/Listing

Two versions:

Legacy (v param unset): Uses History.contents_iter() with filter params (ids, types, deleted, visible, shareable). Collections serialized via dictify_dataset_collection_instance.
Dev (v=dev): Uses HistoryContentsManager.contents() with ORM filter chain. Collections serialized via HDCASerializer.

Collections in listings use "summary" or "collection" view (no elements). Only when dataset_details="all" or a specific ID matches does the listing use "element" view.

Sending the header Accept: application/vnd.galaxy.history.contents.stats+json returns HistoryContentsWithStatsResult, which includes the total match count. This mode also requests the elements_datatypes key during serialization.

Show

GET /api/histories/{history_id}/contents/dataset_collections/{id}

Supports fuzzy_count parameter for large collections — a heuristic to limit how many elements are returned at each nesting level. See gen_rank_fuzzy_counts() in collections_util.py. This is explicitly not a stable API — it provides a best-effort “balanced start” of large collections for UI rendering.

Create (Collections via History Contents)

POST /api/histories/{history_id}/contents/dataset_collections

When type=dataset_collection, routes to __create_dataset_collection():

If source=new_collection (default): calls DatasetCollectionManager.create() with params from api_payload_to_create_params()
If source=hdca: calls DatasetCollectionManager.copy() to copy an existing collection into the target history, optionally with copy_elements=True and dbkey override

Update

PUT /api/histories/{history_id}/contents/dataset_collections/{id}

Delegates to DatasetCollectionManager.update(). For anonymous users, only deleted and visible are allowed. For authenticated users:

name — validated and sanitized
deleted, visible — boolean validation
tags — sanitized string list
annotation — stored via annotation system

Delete

DELETE /api/histories/{history_id}/contents/dataset_collections/{id}

Supports recursive, purge, stop_job flags. Delegates to DatasetCollectionManager.delete():

Sets deleted=True on the HDCA
If recursive=True: iterates all leaf datasets and deletes them
If purge=True: also purges each leaf dataset

Bulk Operations

PUT /api/histories/{history_id}/contents/bulk

Operations that affect collections:

hide/unhide — sets visible flag
delete — calls DatasetCollectionManager.delete(recursive=True)
undelete — via HDCAManager.undelete()
purge — calls DatasetCollectionManager.delete(recursive=True, purge=True)
change_datatype — chains Celery tasks for all leaf datasets
change_dbkey — sets dbkey on all leaf datasets
add_tags/remove_tags — via tag handler

4. Service Layer

DatasetCollectionsService

File: lib/galaxy/webapps/galaxy/services/dataset_collections.py

Thin service that mediates between API endpoints and managers.

Dependencies: HistoryManager, HDAManager, HDCAManager, DatasetCollectionManager, Registry (datatypes)

Key methods map directly to endpoints:

create() — validates instance_type, resolves parent (history or library folder), calls DatasetCollectionManager.create(), serializes result via dictify_dataset_collection_instance()
show() — gets HDCA/LDCA via DatasetCollectionManager.get_dataset_collection_instance(), serializes with chosen view
contents() — validates HDCA, checks subcollection membership, gets elements via DatasetCollectionManager.get_collection_contents(), serializes each element via dictify_element_reference()
dce_content() — direct session lookup of DatasetCollectionElement, access check via security_agent.can_access_collection()
attributes() — calls dataset_collection_instance.to_dict(view="dbkeysandextensions")
copy() — delegates to DatasetCollectionManager.copy()
suitable_converters() — delegates to DatasetCollectionManager.get_converters_for_collection()

HistoriesContentsService

File: lib/galaxy/webapps/galaxy/services/history_contents.py

Handles polymorphic history content operations. Collection-relevant methods:

__show_dataset_collection() — gets accessible collection, serializes with dictify_dataset_collection_instance() using view param and fuzzy_count
__create_dataset_collection() — routes to DatasetCollectionManager.create() or .copy() based on source
__update_dataset_collection() — delegates to DatasetCollectionManager.update()
show_jobs_summary() — for collections, checks job_source_type (Job or ImplicitCollectionJobs) and returns state summary
get_dataset_collection_archive_for_download() — streams collection as zip via hdcas.stream_dataset_collection()
prepare_collection_download() — async version using Celery task

5. Manager Layer

DatasetCollectionManager

File: lib/galaxy/managers/collections.py

The central service object for all collection operations. Not a typical ModelManager subclass — it directly manages creation, access, matching, and rule-based building.

Key methods exposed to API:

get_dataset_collection_instance(trans, instance_type, id, check_ownership=False, check_accessible=True):

For "history": loads HDCA by ID, checks history ownership/accessibility
For "library": loads LDCA by ID, checks library accessibility via security agent
Overloaded with type hints to return the correct type

create(trans, parent, name, collection_type, element_identifiers=None, elements=None, ...):

Entry point for all user-initiated collection creation
Validates identifiers (unless trusted_identifiers)
Creates DatasetCollection via create_dataset_collection()
Creates HDCA or LDCA instance
Handles tags, implicit inputs, implicit output name

create_dataset_collection(trans, collection_type, element_identifiers=None, elements=None, ...):

Core collection building logic
Resolves element identifiers to actual objects (__load_elements())
For nested collections, recursively creates subcollections
Calls builder.build_collection() with the type plugin

update(trans, instance_type, id, payload):

Validates and parses update payload
Delegates to dataset_collection_instance.set_from_dict() for model fields
Handles annotations and tags separately

delete(trans, instance_type, id, recursive=False, purge=False):

Sets deleted flag
If recursive: iterates all leaf datasets, verifying ownership for each
If purge: purges each leaf dataset

get_collection_contents(trans, parent_id, limit=None, offset=None):

SQL query on DatasetCollectionElement table filtered by dataset_collection_id
Ordered by element_index
Eager loads child_collection and hda relationships

match_collections(collections_to_match):

Delegates to MatchingCollections.for_collections() — used during tool execution, not directly by API

apply_rules(hdca, rule_set, handle_dataset):

Rule-based collection manipulation
Flattens collection to tabular data + sources, applies rules, builds new elements

HDCAManager

File: lib/galaxy/managers/hdcas.py

Standard Galaxy model manager for HDCAs. Extends ModelManager, AccessibleManagerMixin, OwnableManagerMixin, PurgableManagerMixin, AnnotatableManagerMixin.

is_owner(item, user, **kwargs):

Checks item.history.user == user
For anonymous users: checks item.history == kwargs.get("history")
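
The two ownership rules can be modelled with plain dataclasses. This is a simplified sketch under stated assumptions: the History and CollectionInstance classes are stand-ins for the real models, and current_history stands for the value passed as kwargs["history"].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class History:
    id: int
    user: Optional[str]  # None models a history created by an anonymous user


@dataclass
class CollectionInstance:
    history: History


def is_owner(
    item: CollectionInstance,
    user: Optional[str],
    current_history: Optional[History] = None,
) -> bool:
    """Registered users own items in their histories; anonymous users own
    items only in the history attached to their current session."""
    if user is None:
        return item.history == current_history
    return item.history.user == user


anonymous_history = History(id=1, user=None)
other_anonymous_history = History(id=2, user=None)
alice_history = History(id=3, user="alice")
```

With these stand-ins, is_owner(CollectionInstance(anonymous_history), None, anonymous_history) is true, while the same call with other_anonymous_history as the current history is false.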

map_datasets(content, fn, *parents):

Recursive walker over all datasets in a collection
Used by bulk operations to apply changes to every leaf dataset
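
A recursive walker of this kind might look like the following. It is a simplified sketch with hypothetical Dataset and Collection stand-ins; the real method works on HDCA, DatasetCollection, and DatasetCollectionElement objects, and the exact contents of the parents tuple may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    dbkey: str = "?"


@dataclass
class Collection:
    name: str
    elements: list = field(default_factory=list)  # Dataset or Collection items


def map_datasets(content, fn, *parents):
    """Apply fn(dataset, *parents) to every leaf dataset, depth first."""
    results = []
    for element in content.elements:
        if isinstance(element, Collection):
            # Descend, recording the collection we came from as a parent.
            results.extend(map_datasets(element, fn, content, *parents))
        else:
            results.append(fn(element, content, *parents))
    return results


def set_dbkey(dataset, *parents):
    dataset.dbkey = "hg38"
    return dataset.name


nested = Collection(
    "outer",
    [Collection("pair1", [Dataset("forward"), Dataset("reverse")]), Dataset("single")],
)
touched = map_datasets(nested, set_dbkey)
```

This mirrors how a bulk operation such as change_dbkey can reach every leaf dataset without knowing the collection's nesting depth in advance.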

6. Request/Response Schemas

File: lib/galaxy/schema/schema.py

Core Enums

from enum import Enum
from typing import Literal

DatasetCollectionInstanceType = Literal["history", "library"]

class DatasetCollectionPopulatedState(str, Enum):
    NEW = "new"
    OK = "ok"
    FAILED = "failed"

class DCEType(str, Enum):
    hda = "hda"
    dataset_collection = "dataset_collection"

class CollectionSourceType(str, Enum):
    hda = "hda"
    ldda = "ldda"
    hdca = "hdca"
    new_collection = "new_collection"

Request Models

CreateNewCollectionPayload — POST body for creating collections:

collection_type: Optional[str], e.g. "list", "paired", "list:paired", "sample_sheet"
element_identifiers: Optional[list[CollectionElementIdentifier]]
name, hide_source_items, copy_elements
instance_type: "history" or "library"
history_id, folder_id
fields: For record type
column_definitions, rows: For sample_sheet type

CollectionElementIdentifier (in history_contents service):

name, src (CollectionSourceType), id, tags
element_identifiers: For nested new_collection src (self-referencing)
collection_type: For nested collections

UpdateHistoryContentsPayload — PUT body for updates (shared with HDAs):

Flexible payload, used with model_dump(exclude_unset=True)

DeleteHistoryContentPayload — DELETE body:

purge, recursive, stop_job booleans

Response Models

DCSummary — DatasetCollection summary:

id, create_time, update_time, collection_type, populated_state, populated_state_message, element_count

DCDetailed extends DCSummary:

populated (bool), elements (list[DCESummary])

DCESummary — DatasetCollectionElement:

id, element_index, element_identifier, element_type (DCEType)
object: Union[HDAObject, HDADetailed, DCObject] — actual content
columns: Optional sample sheet row data

DCObject — Nested DatasetCollection as element:

id, collection_type, populated, element_count
contents_url: Optional URL for drill-down
elements: list[DCESummary] (recursive)
elements_states, elements_deleted, elements_datatypes: Summary stats

HDAObject — Dataset as element:

id, state, hda_ldda, history_id, tags, purged
accessible: Optional (set during contents serialization)

HDCASummary — HDCA summary (used in listings):

id, name, hid, history_id, collection_id
history_content_type: Always "dataset_collection"
type: Always "collection"
collection_type, populated_state, populated_state_message, element_count
elements_datatypes, elements_states, elements_deleted
job_source_id, job_source_type, job_state_summary
deleted, visible, create_time, update_time
tags, url, contents_url
store_times_summary

HDCADetailed extends HDCASummary:

populated (bool)
elements (list[DCESummary])
implicit_collection_jobs_id
column_definitions (for sample_sheet type)

AnyHDCA = Union[HDCACustom, HDCADetailed, HDCASummary]

7. Serialization Pipeline

There are two serialization paths for collections:

Path 1: `dictify_dataset_collection_instance()` (collections_util.py)

Used by DatasetCollectionsService and HistoriesContentsService.__collection_dict().

dictify_dataset_collection_instance(hdca, parent, security, url_builder, view, fuzzy_count)
    |
    ├── hdca.to_dict(view=hdca_view)  -- base model serialization
    |
    ├── Compute URL and contents_url
    |
    ├── If view in ("element", "element-reference"):
    |     ├── gen_rank_fuzzy_counts(collection_type, fuzzy_count)
    |     ├── get_fuzzy_count_elements(collection, rank_fuzzy_counts)
    |     └── For each element:
    |           ├── dictify_element(element, ...)    # full view
    |           └── dictify_element_reference(...)    # reference view
    |
    └── Attach implicit_collection_jobs_id

dictify_element(): Full recursive serialization. Calls element_object.to_dict() for datasets (includes all HDA metadata). For subcollections, recursively serializes nested elements.

dictify_element_reference(): Lightweight serialization. For datasets: just id, model_class, state, hda_ldda, purged, history_id, tags. For subcollections: collection_type, element_count, populated, elements_states, elements_deleted, elements_datatypes, plus recursive nested elements.

Path 2: `HDCASerializer` (hdcas.py)

Used by HistoriesContentsService._serialize_content_item() (the v=dev index).

Standard Galaxy serializer framework with view-based key selection:

Summary view keys: id, type_id, name, history_id, collection_id, hid, history_content_type, collection_type, populated_state, populated_state_message, element_count, elements_datatypes, elements_deleted, elements_states, job_source_id, job_source_type, job_state_summary, deleted, visible, type, url, create_time, update_time, tags, contents_url, store_times_summary

Detailed view adds: populated, elements

Collection-proxied keys (delegated to DCSerializer): create_time, update_time, collection_type, populated, populated_state, populated_state_message, elements, element_count.

The elements serializer recursively uses DCESerializer, which delegates to HDASerializer for dataset elements and DCSerializer for subcollection elements.

Fuzzy Count Mechanism

gen_rank_fuzzy_counts(collection_type, fuzzy_count) converts a global element budget into per-rank limits:

paired ranks always get 2
list ranks split the remaining budget by nth-root
The goal is balanced representation across nesting levels
Example: list:paired with fuzzy_count=100 -> [~50, 2]

This is explicitly unstable / heuristic. The only guarantee is the API won’t return orders of magnitude more elements than the requested fuzzy_count.
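
The budget split can be approximated as follows. This is a hedged sketch of the heuristic as described here, not the actual gen_rank_fuzzy_counts() implementation: rank_budgets and integer_root are hypothetical names, every non-paired rank is assumed to behave like a list, and an integer root is used to sidestep floating-point rounding.

```python
def integer_root(value: int, degree: int) -> int:
    """Largest integer r such that r ** degree <= value."""
    root = 0
    while (root + 1) ** degree <= value:
        root += 1
    return root


def rank_budgets(collection_type: str, fuzzy_count: int) -> list[int]:
    """Split a global element budget into one limit per nesting rank."""
    ranks = collection_type.split(":")
    paired_ranks = sum(1 for rank in ranks if rank == "paired")
    list_ranks = len(ranks) - paired_ranks
    # Each paired rank always contributes exactly 2 elements.
    remaining = fuzzy_count // (2 ** paired_ranks)
    per_list_rank = integer_root(remaining, list_ranks) + 1 if list_ranks else 0
    return [2 if rank == "paired" else per_list_rank for rank in ranks]


print(rank_budgets("list:paired", 100))      # [51, 2]
print(rank_budgets("list:list:list", 1000))  # [11, 11, 11]
```

The "+ 1" makes each list rank slightly over-provisioned, which is consistent with the only stated guarantee: the result stays within the same order of magnitude as the requested fuzzy_count.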

8. Collection Creation — Full Request Path

Via `/api/dataset_collections` (Direct)

Client POST /api/dataset_collections
  { collection_type: "list:paired",
    element_identifiers: [...],
    instance_type: "history",
    history_id: "abc123",
    name: "My Collection" }
    │
    ▼
FastAPIDatasetCollections.create()
    │
    ▼
DatasetCollectionsService.create(trans, payload)
    │
    ├── api_payload_to_create_params(payload)
    │     ├── Validates required: collection_type, element_identifiers
    │     ├── validate_column_definitions() if sample_sheet
    │     └── Returns dict with: collection_type, element_identifiers, name,
    │         hide_source_items, copy_elements, fields, column_definitions, rows
    │
    ├── Resolve parent:
    │   ├── history: HistoryManager.get_mutable(history_id, user)
    │   └── library: get_library_folder + check_user_can_add_to_library_item
    │
    ▼
DatasetCollectionManager.create(trans, parent, name, collection_type, element_identifiers, ...)
    │
    ├── validate_input_element_identifiers(element_identifiers)
    │     ├── Check no __object__ key (injection prevention)
    │     ├── Check all have "name" field
    │     ├── Check no duplicate names
    │     ├── Check src in (hda, hdca, ldda, new_collection)
    │     └── Recursive validation for new_collection children
    │
    ├── create_dataset_collection(trans, collection_type, element_identifiers, ...)
    │     │
    │     ├── CollectionTypeDescriptionFactory.for_collection_type(collection_type)
    │     │
    │     ├── _element_identifiers_to_elements()
    │     │     │
    │     │     ├── If nested: __recursively_create_collections_for_identifiers()
    │     │     │     └── For each src="new_collection": recursive create_dataset_collection()
    │     │     │
    │     │     └── __load_elements()
    │     │           └── For each identifier:
    │     │                 ├── src="hda": hda_manager.get_accessible() [+ copy if copy_elements]
    │     │                 ├── src="ldda": ldda_manager.get() -> to_history_dataset_association()
    │     │                 ├── src="hdca": __get_history_collection_instance().collection
    │     │                 └── Apply tags from identifier
    │     │
    │     ├── builder.build_collection(type_plugin, elements)
    │     │     └── type_plugin.generate_elements(elements)
    │     │           └── Yields DatasetCollectionElement objects
    │     │
    │     └── Set collection_type on DatasetCollection
    │
    ├── _create_instance_for_collection(trans, parent, name, collection, ...)
    │     ├── Create HDCA (or LDCA for library)
    │     ├── Set implicit_input_collections if applicable
    │     ├── parent.add_dataset_collection() -- assigns HID
    │     └── Apply tags (list of strings or dict of tag objects)
    │
    └── __persist() -- session.add() + session.commit()
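
The identifier checks in the flow above can be sketched as a small recursive validator. This is an illustration under stated assumptions, not the real validate_input_element_identifiers(): it raises ValueError instead of a Galaxy exception, and it requires src explicitly because any default value is not described here.

```python
ALLOWED_SRC = {"hda", "hdca", "ldda", "new_collection"}


def validate_identifiers(identifiers: list[dict]) -> None:
    """Raise ValueError on the first problem, checking nested children too."""
    seen_names = set()
    for identifier in identifiers:
        if "__object__" in identifier:
            raise ValueError("__object__ is not allowed in element identifiers")
        name = identifier.get("name")
        if name is None:
            raise ValueError("element identifier is missing 'name'")
        if name in seen_names:
            raise ValueError(f"duplicate element name: {name}")
        seen_names.add(name)
        src = identifier.get("src")
        if src not in ALLOWED_SRC:
            raise ValueError(f"unknown src: {src}")
        if src == "new_collection":
            if "element_identifiers" not in identifier:
                raise ValueError("new_collection requires element_identifiers")
            if "collection_type" not in identifier:
                raise ValueError("new_collection requires collection_type")
            # Names only need to be unique among siblings, so recurse afresh.
            validate_identifiers(identifier["element_identifiers"])


def is_valid(identifiers: list[dict]) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate_identifiers(identifiers)
    except ValueError:
        return False
    return True
```

For example, is_valid([{"name": "a", "src": "hda"}, {"name": "a", "src": "hda"}]) is false because of the duplicate name.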

Via History Contents API (Copy)

POST /api/histories/{history_id}/contents/dataset_collections
  { type: "dataset_collection", source: "hdca", content: "encoded_hdca_id",
    copy_elements: true, dbkey: "hg38" }
    │
    ▼
HistoriesContentsService.__create_dataset_collection()
    │
    ▼
DatasetCollectionManager.copy(trans, parent=history, source="hdca",
                              encoded_source_id, copy_elements=True,
                              dataset_instance_attributes={dbkey: "hg38"})
    │
    ├── __get_history_collection_instance(trans, encoded_source_id)
    ├── source_hdca.copy(element_destination=history, dataset_instance_attributes=...)
    ├── new_hdca.copy_tags_from(source=source_hdca)
    └── session.commit()

9. Collection Access and Navigation

Fetching a Collection

GET /api/dataset_collections/{hdca_id}?view=element

Returns HDCADetailed with full element tree. For very large collections, use view=element-reference for lighter payloads, or view=collection to skip elements entirely.

Fetching via History Contents

GET /api/histories/{history_id}/contents/dataset_collections/{hdca_id}?fuzzy_count=100

The fuzzy_count parameter sets an approximate element budget that is split across nesting levels. For list:paired with fuzzy_count=100, approximately 50 list elements are returned, each with both of its paired elements.

Navigating Nested Collections

Step 1: Get the HDCA with elements or get contents_url:

GET /api/histories/{history_id}/contents?v=dev&view=summary&keys=contents_url

Step 2: Use contents_url to get root elements:

GET /api/dataset_collections/{hdca_id}/contents/{collection_id}

Step 3: For subcollections, each element’s object.contents_url provides the next level:

GET /api/dataset_collections/{hdca_id}/contents/{child_collection_id}

This supports pagination with limit and offset at each level.
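
A client-side traversal of these steps might look like the sketch below. It avoids any HTTP library: fetch_page is a hypothetical callable that stands in for GET `/api/dataset_collections/{hdca_id}/contents/{parent_id}?limit=...&offset=...`, and the sketch assumes that a subcollection element's object.id is the ID to use as the next parent_id.

```python
from typing import Callable, Iterator

FetchPage = Callable[[str, int, int], list[dict]]  # (parent_id, limit, offset)


def iter_elements(fetch_page: FetchPage, parent_id: str, limit: int = 50) -> Iterator[dict]:
    """Yield all direct children of one (sub)collection, page by page."""
    offset = 0
    while True:
        page = fetch_page(parent_id, limit, offset)
        yield from page
        if len(page) < limit:  # a short page means there is nothing left
            break
        offset += limit


def iter_leaves(fetch_page: FetchPage, parent_id: str, limit: int = 50) -> Iterator[dict]:
    """Yield dataset elements at any depth, descending into subcollections."""
    for element in iter_elements(fetch_page, parent_id, limit):
        if element["element_type"] == "dataset_collection":
            yield from iter_leaves(fetch_page, element["object"]["id"], limit)
        else:
            yield element


# In-memory stand-in for the server, keyed by collection ID.
FAKE_CONTENTS = {
    "root": [
        {"element_identifier": "s1", "element_type": "dataset_collection", "object": {"id": "c1"}},
        {"element_identifier": "s2", "element_type": "hda", "object": {"id": "d3"}},
    ],
    "c1": [
        {"element_identifier": "forward", "element_type": "hda", "object": {"id": "d1"}},
        {"element_identifier": "reverse", "element_type": "hda", "object": {"id": "d2"}},
    ],
}


def fake_fetch(parent_id: str, limit: int, offset: int) -> list[dict]:
    return FAKE_CONTENTS[parent_id][offset:offset + limit]


leaf_ids = [e["object"]["id"] for e in iter_leaves(fake_fetch, "root", limit=1)]
```

Because each level is fetched lazily, a client can stop early (for example after rendering the first screen of elements) without ever requesting the rest of the tree.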

Accessing Individual Elements

GET /api/dataset_collection_element/{dce_id}

Returns DCESummary for any element by its ID. Access checked via the parent collection’s permissions.

10. Update, Delete, and Bulk Operations

Update

PUT /api/dataset_collections/{hdca_id}
  { name: "New Name", tags: ["tag1", "tag2"], visible: false }

Or via history contents:

PUT /api/histories/{history_id}/contents/dataset_collections/{hdca_id}
  { name: "New Name" }

Allowed fields:

name — sanitized string
deleted — boolean
visible — boolean
tags — list of strings (calls tag_handler.set_tags_from_list())
annotation — text

Anonymous users can only update deleted and visible.
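
The field restrictions can be expressed as a simple allow-list. This is a sketch only: split_update_payload is a hypothetical helper, and how the server treats disallowed keys (ignoring them or raising an error) is not stated here, so the sketch merely reports them.

```python
AUTHENTICATED_FIELDS = {"name", "deleted", "visible", "tags", "annotation"}
ANONYMOUS_FIELDS = {"deleted", "visible"}


def split_update_payload(payload: dict, anonymous: bool) -> tuple[dict, set[str]]:
    """Return (accepted fields, rejected keys) for the given kind of user."""
    allowed = ANONYMOUS_FIELDS if anonymous else AUTHENTICATED_FIELDS
    accepted = {key: value for key, value in payload.items() if key in allowed}
    rejected = set(payload) - allowed
    return accepted, rejected


update = {"name": "New Name", "tags": ["tag1", "tag2"], "visible": False}
accepted_anon, rejected_anon = split_update_payload(update, anonymous=True)
accepted_user, rejected_user = split_update_payload(update, anonymous=False)
```

For the example payload, an anonymous session keeps only visible, while an authenticated user keeps all three fields.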

Delete

DELETE /api/histories/{history_id}/contents/dataset_collections/{hdca_id}

Payload/query options:

recursive (bool, deprecated as query param): also delete leaf datasets
purge (bool, deprecated as query param): purge leaf datasets from disk
stop_job (bool, deprecated as query param): stop the job that is creating the collection

Returns 202 (accepted, async purge) or 204 (immediate).

Batch Update

PUT /api/histories/{history_id}/contents
  { items: [{ id: "...", history_content_type: "dataset_collection" }],
    visible: false }

Applies same payload to all listed items. HDCAs updated via DatasetCollectionManager.update().

Bulk Operations

PUT /api/histories/{history_id}/contents/bulk
  { operation: "delete",
    items: [{ id: "...", history_content_type: "dataset_collection" }] }

Or with filter-based selection (no explicit items, uses filter query params).

Operations and their effect on collections:

hide/unhide: sets visible on HDCA
delete: recursive delete of HDCA + all leaf datasets
undelete: undeletes HDCA (fails if purged)
purge: recursive delete + purge of all leaf datasets
change_datatype: chains Celery tasks for all leaf datasets, then touches HDCA
change_dbkey: sets dbkey on all leaf dataset instances
add_tags/remove_tags: modifies HDCA tags

11. Collection Downloads

Synchronous Download

GET /api/dataset_collections/{hdca_id}/download
GET /api/histories/{history_id}/contents/dataset_collections/{hdca_id}/download

Returns StreamingResponse with zip archive. Structure:

Collection name as root directory
Elements named by element_identifier + file extension
Nested collections create subdirectories

Uses ZipstreamWrapper for streaming. Skips datasets not in ok state or that are purged.
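
The resulting archive layout can be previewed with a small path generator. This is a sketch of the naming rules listed above, not the streaming code itself: archive_paths and the dictionary shape of the elements are assumptions made for illustration.

```python
import posixpath


def archive_paths(root: str, elements: list[dict]) -> list[str]:
    """Compute archive member paths for a (possibly nested) collection."""
    paths = []
    for element in elements:
        if "elements" in element:
            # A nested collection becomes a subdirectory.
            subdirectory = posixpath.join(root, element["identifier"])
            paths.extend(archive_paths(subdirectory, element["elements"]))
        elif element.get("state") == "ok" and not element.get("purged", False):
            filename = f"{element['identifier']}.{element['extension']}"
            paths.append(posixpath.join(root, filename))
        # Datasets that are purged or not in the ok state are skipped.
    return paths


example_elements = [
    {"identifier": "s1", "elements": [
        {"identifier": "forward", "extension": "fastq", "state": "ok"},
        {"identifier": "reverse", "extension": "fastq", "state": "error"},
    ]},
    {"identifier": "s2", "extension": "txt", "state": "ok", "purged": True},
    {"identifier": "s3", "extension": "txt", "state": "ok"},
]
members = archive_paths("My Collection", example_elements)
```

Here only "My Collection/s1/forward.fastq" and "My Collection/s3.txt" would be included; the errored and purged datasets are left out.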

Prerequisite: Collection must be fully populated (populated_optimized == True), otherwise raises 400.

Async Download

POST /api/dataset_collections/{hdca_id}/prepare_download

Returns AsyncFile with storage_request_id. Uses Celery task prepare_dataset_collection_download. Client polls short-term storage API for completion.
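
The client-side polling loop can be sketched generically. The short-term storage endpoints are not described here, so is_ready is a hypothetical callable that would wrap whatever readiness check the client performs with the storage_request_id.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Simulated readiness check: not ready twice, then ready.
states = iter([False, False, True])
finished = wait_until_ready(lambda: next(states), interval=0, sleep=lambda _: None)
```

Injecting sleep keeps the loop testable; a real client would leave the default and choose an interval appropriate for large archives.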

Export-Style Download

POST /api/histories/{history_id}/contents/dataset_collections/{id}/prepare_store_download
  { model_store_format: "tar.gz", include_files: true }

Exports collection as a Galaxy model store archive (tar.gz, rocrate, etc.).

12. Job State and Implicit Collections in API

Job State Summary

GET /api/histories/{history_id}/contents/dataset_collections/{id}/jobs_summary

Returns AnyJobStateSummary. For implicit collections (created by map-over):

Checks job_source_type: either "Job" (single job) or "ImplicitCollectionJobs" (job group)
Returns aggregate state counts across all jobs in the group

Batch Job State Polling

GET /api/histories/{history_id}/jobs_summary?ids=id1,id2&types=ImplicitCollectionJobs,Job

Efficient bulk lookup. The ids and types arrays must have the same length, with each type describing the ID at the same position. Uses fetch_job_states() for efficient SQL.
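
Building the query from (id, type) pairs guarantees that both lists stay aligned. This is a minimal sketch with a hypothetical helper name; the IDs are placeholders, and the allowed type names are taken from the job_source_type values mentioned in this section.

```python
from urllib.parse import urlencode

JOB_SOURCE_TYPES = {"Job", "ImplicitCollectionJobs"}


def jobs_summary_url(history_id: str, sources: list[tuple[str, str]]) -> str:
    """Build the batch jobs_summary URL from (job_source_id, job_source_type) pairs."""
    for _, source_type in sources:
        if source_type not in JOB_SOURCE_TYPES:
            raise ValueError(f"unexpected job source type: {source_type}")
    query = urlencode(
        {
            "ids": ",".join(source_id for source_id, _ in sources),
            "types": ",".join(source_type for _, source_type in sources),
        },
        safe=",",  # keep commas readable instead of percent-encoding them
    )
    return f"/api/histories/{history_id}/jobs_summary?{query}"


url = jobs_summary_url("h1", [("id1", "ImplicitCollectionJobs"), ("id2", "Job")])
print(url)  # /api/histories/h1/jobs_summary?ids=id1,id2&types=ImplicitCollectionJobs,Job
```

The pairs would typically come from the job_source_id and job_source_type fields of each HDCA in a history listing.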

How Implicit Collections Appear in API Responses

HDCASummary / HDCADetailed always include:

job_source_id: encoded ID of the Job or ImplicitCollectionJobs
job_source_type: "Job" or "ImplicitCollectionJobs" or null
job_state_summary: HDCJobStateSummary with counts per job state (new, waiting, running, ok, error, paused, etc.)
implicit_collection_jobs_id (detailed view only)

These fields allow the UI to track progress of implicit collection population. The populated_state field indicates whether the collection structure itself is finalized:

"new" — elements not yet fully determined
"ok" — all elements present (though individual datasets may still be running)
"failed" — collection population failed

13. Authentication and Authorization

Access Control Model

All collection access goes through history/library access checks:

History collections (HDCA):
- get_dataset_collection_instance() calls history_manager.error_unless_accessible() or error_unless_owner() depending on check_ownership flag
- Most read operations use check_accessible=True (allows shared histories)
- Write operations (update, delete, copy) use check_ownership=True
Library collections (LDCA):
- Uses security_agent.can_access_library_item() for access checks
- Ownership checks for library collections are not yet implemented (raises NotImplementedError)
Element-level access:
- dce_content() checks security_agent.can_access_collection() on the DCE’s parent or child collection
- This checks dataset permissions for all leaf datasets in the collection
- Admin users bypass element-level checks
Anonymous users:
- Can access their current history’s contents
- Limited to deleted and visible updates only
- Ownership verified by history == current_history when history.user is None

Security Patterns

Collection creation requires mutable history access: history_manager.get_mutable()
hide_source_items requires ownership of each source HDA
Recursive delete verifies ownership of each leaf dataset before deletion
hda_manager.get_accessible() used when resolving element identifiers (allows referencing accessible-but-not-owned datasets)
Job state summary intentionally has no access checks — considered non-sensitive data for efficiency

14. Pagination and Filtering

Collection Contents Pagination

GET /api/dataset_collections/{hdca_id}/contents/{parent_id}?limit=20&offset=0

Direct SQL LIMIT/OFFSET on DatasetCollectionElement query, ordered by element_index.

History Contents Filtering

Collections appear alongside datasets in history contents. The v=dev index supports:

Standard filter params via FilterQueryParams (q, qv, offset, limit, order)
Ordering by hid-asc (default) or other fields
Type filtering: ?types=dataset_collection restricts to HDCAs only

Legacy index supports:

types: comma-separated list including "dataset_collection"
ids: specific encoded IDs
deleted, visible: boolean filters
shareable: filter by object store shareability

Fuzzy Count (Large Collection Handling)

Not true pagination — a heuristic budget for the show endpoint:

GET /api/histories/{history_id}/contents/dataset_collections/{id}?fuzzy_count=500

Distributes an element budget across nesting levels. For list:list:list with fuzzy_count=1000:

Each list rank gets approximately cube_root(1000) + 1 = 11 elements

15. Error Handling

Validation Errors

Element identifier validation (validate_input_element_identifiers()):

Missing name field -> 400
Duplicate name values -> 400
Unknown src type -> 400
__object__ key present (injection) -> 400
Missing element_identifiers for new_collection -> 400
Missing collection_type for new_collection -> 400

Creation errors:

Missing collection_type -> 400 (ERROR_NO_COLLECTION_TYPE)
Missing element_identifiers and no elements -> 400 (ERROR_INVALID_ELEMENTS_SPECIFICATION)
Missing history_id when instance_type="history" -> 400
Record type without fields -> 400

Column definition validation (sample_sheet):

Invalid type -> 400
Missing required keys -> 400
Invalid validators -> 400
Row data mismatching column definitions -> 400

Access Errors

History not accessible -> 403 (ItemAccessibilityException)
History not owned (for mutations) -> 403
Collection not found -> 400 (RequestParameterInvalidException with “not found” message)
Library collection not accessible -> 403
HDA not accessible during element resolution -> 403

Containment Errors

contents endpoint with parent_id not contained in HDCA -> 404 (ObjectNotFound)

Population Errors

Download of unpopulated collection -> 400 (RequestParameterInvalidException)
Serialization failure when collection not populated -> logs exception and re-raises ValidationError

16. Sample Sheet and Workbook Endpoints

Sample sheets are a collection type where each element carries row-level metadata (columns field on DCE). The API includes workbook endpoints for Excel-based data entry.

Create Workbook

POST /api/sample_sheet_workbook
  { title: "...", column_definitions: [...] }

Returns the XLSX file as a StreamingResponse. No collection is required; the endpoint only generates a template.

Create Workbook for Existing Collection

POST /api/dataset_collections/{hdca_id}/sample_sheet_workbook
  { column_definitions: [...], prefix_values: [...] }

Pre-fills the workbook with element identifiers from the existing collection.

Parse Workbook

POST /api/sample_sheet_workbook/parse
  { content: "base64-encoded-xlsx", column_definitions: [...], collection_type: "..." }

Returns ParsedWorkbook with rows extracted from the XLSX.
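
Since the workbook travels inside a JSON body, the client has to base64-encode the XLSX bytes first. A minimal sketch of assembling the payload, assuming standard (not URL-safe) base64 and treating `column_definitions` as an opaque list supplied by the caller; the helper name is hypothetical:

```python
import base64


def build_parse_payload(xlsx_bytes: bytes, column_definitions: list, collection_type: str) -> dict:
    """Assemble the JSON body for POST /api/sample_sheet_workbook/parse."""
    return {
        # JSON cannot carry raw bytes, so the XLSX is sent as base64 text.
        "content": base64.b64encode(xlsx_bytes).decode("ascii"),
        "column_definitions": column_definitions,
        "collection_type": collection_type,
    }
```

The same helper would serve the per-collection parse endpoint if the `collection_type` key is dropped, since that request body lists only `content` and `column_definitions`.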

Parse Workbook for Existing Collection

POST /api/dataset_collections/{hdca_id}/sample_sheet_workbook/parse
  { content: "base64-encoded-xlsx", column_definitions: [...] }

Returns ParsedWorkbookForCollection — includes rows plus elements from the existing collection, allowing the client to map rows to collection elements.

17. Test Coverage

File: lib/galaxy_test/api/test_dataset_collections.py

Test Inventory

Test	What It Covers
`test_create_pair_from_history`	Create paired collection via fetch API
`test_create_list_from_history`	Create list collection via direct POST
`test_create_list_of_existing_pairs`	Reference existing HDCA as element (src=hdca)
`test_create_list_of_new_pairs`	Nested collection creation (list:paired with new subcollections)
`test_create_paried_or_unpaired`	paired_or_unpaired collection with single “unpaired” element
`test_create_record`	Record collection with explicit fields
`test_record_requires_fields`	400 when record type without fields
`test_record_auto_fields`	Auto-detect fields from identifiers
`test_record_field_validation`	Rejects wrong field count/names
`test_sample_sheet_*` (7 tests)	Sample sheet creation, column definitions, validation, nested sample sheets
`test_workbook_download`	XLSX generation
`test_workbook_download_for_collection`	XLSX generation from existing collection
`test_workbook_parse`	XLSX parsing
`test_workbook_parse_for_collection`	XLSX parsing with collection context
`test_list_download`	Download list as zip
`test_pair_download`	Download pair as zip
`test_list_pair_download`	Download list:paired as zip
`test_list_list_download`	Download list:list as zip
`test_list_list_list_download`	Download list:list:list as zip
`test_download_non_english_characters`	Non-ASCII collection names in zip
`test_hda_security`	403 when element is inaccessible to another user
`test_dataset_collection_element_security`	DCE endpoint security for nested collections
`test_enforces_unique_names`	400 on duplicate element identifiers
`test_upload_collection`	Fetch API collection upload with tags
`test_upload_nested`	Fetch API nested collection upload
`test_upload_collection_from_url`	Upload from base64 URL
`test_upload_collection_deferred`	Deferred dataset in collection
`test_upload_collection_failed_expansion_url`	Failed bagit expansion
`test_upload_flat_sample_sheet`	Fetch API sample sheet upload
`test_upload_sample_sheet_paired`	Fetch API sample_sheet:paired upload
`test_collection_contents_security`	403 on contents of non-owned collection
`test_published_collection_contents_accessible`	Contents accessible in published history
`test_collection_contents_invalid_collection`	404 for invalid subcollection ID
`test_show_dataset_collection`	GET show endpoint basic functionality
`test_show_dataset_collection_contents`	Contents endpoint with drill-down
`test_collection_contents_limit_offset`	Pagination params on contents
`test_collection_contents_empty_root`	Empty collection contents
`test_get_suitable_converters_*` (3 tests)	Converter intersection logic
`test_collection_tools_tag_propagation`	Tags propagated through tool execution

Testing Patterns

Populator objects: DatasetCollectionPopulator and DatasetPopulator provide helper methods for creating test data
Fetch API usage: Many tests use dataset_populator.fetch(payload) (tools/fetch endpoint) instead of direct collection creation
Wait patterns: wait_for_fetched_collection() polls until collection is populated
Response validation: _check_create_response() verifies 200 status and required keys (elements, url, name, collection_type, element_count)
Security tests: Use _different_user() context manager to test access control
Helper pattern: _create_collection_contents_pair() creates a simple collection and returns (hdca_dict, contents_url) for reuse
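
The response validation pattern can be sketched as below. This is an illustration of what a helper like `_check_create_response()` is described as doing, not its actual code; the function signature here is an assumption, and only the status code and key list come from the description above.

```python
REQUIRED_KEYS = ("elements", "url", "name", "collection_type", "element_count")


def check_create_response(status_code: int, body: dict) -> dict:
    """Assert a 200 status and the presence of every required key."""
    assert status_code == 200, f"unexpected status {status_code}"
    missing = [key for key in REQUIRED_KEYS if key not in body]
    assert not missing, f"response is missing keys: {missing}"
    return body
```

Returning the body lets a test chain the check with further assertions on the created collection.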

Coverage Gaps (Observed)

No explicit test for library collection creation/access
No test for PUT /api/dataset_collections/{id} (direct update)
No test for POST /api/dataset_collections/{id}/copy (copy with attributes)
No explicit test for the fuzzy_count parameter behavior
No test for GET /api/dataset_collections/{id}/attributes
Bulk operations on collections are not tested in this file; they are covered elsewhere, in the history contents tests

18. File Index

File	Contents
`lib/galaxy/webapps/galaxy/api/dataset_collections.py`	FastAPI endpoints for `/api/dataset_collections/`, `/api/dataset_collection_element/`, and `/api/sample_sheet_workbook/`
`lib/galaxy/webapps/galaxy/api/history_contents.py`	FastAPI endpoints for `/api/histories/{id}/contents/` including collection-typed operations and download endpoints
`lib/galaxy/webapps/galaxy/api/common.py`	Shared path/query param definitions: `HistoryHDCAIDPathParam`, `DatasetCollectionElementIdPathParam`, `serve_workbook()`
`lib/galaxy/webapps/galaxy/services/dataset_collections.py`	`DatasetCollectionsService` — service layer for dedicated collection endpoints. Also defines `UpdateCollectionAttributePayload`, `DatasetCollectionAttributesResult`, `SuitableConverters`, `DatasetCollectionContentElements`, workbook API models
`lib/galaxy/webapps/galaxy/services/history_contents.py`	`HistoriesContentsService` — service layer for history contents endpoints. Also defines `CreateHistoryContentPayload`, `CollectionElementIdentifier`, `HistoryContentsIndexParams`, `HistoryItemOperator`
`lib/galaxy/managers/collections.py`	`DatasetCollectionManager` — central business logic for collection CRUD, matching, rule application
`lib/galaxy/managers/hdcas.py`	`HDCAManager`, `DCESerializer`, `DCSerializer`, `DCASerializer`, `HDCASerializer` — CRUD manager and serialization
`lib/galaxy/managers/collections_util.py`	`api_payload_to_create_params()`, `validate_input_element_identifiers()`, `dictify_dataset_collection_instance()`, `dictify_element()`, `dictify_element_reference()`, `gen_rank_fuzzy_counts()`
`lib/galaxy/schema/schema.py`	Pydantic models: `CreateNewCollectionPayload`, `DCSummary`, `DCDetailed`, `DCESummary`, `DCObject`, `HDAObject`, `HDCASummary`, `HDCADetailed`, `HDCJobStateSummary`, `DatasetCollectionPopulatedState`, `DCEType`, `CollectionSourceType`, `DatasetCollectionInstanceType`
`lib/galaxy_test/api/test_dataset_collections.py`	API integration tests for collection creation, download, contents navigation, security, converters, sample sheets, workbooks
