Galaxy Dataset Collection API Layer
Comprehensive reference for the API layer that exposes Galaxy’s dataset collection system. Covers endpoints, service/manager interactions, schemas, serialization, authentication, and test coverage.
Table of Contents
- Endpoint Inventory
- Dedicated Collection API — dataset_collections.py
- History Contents API — history_contents.py
- Service Layer
- Manager Layer (API-Relevant Parts)
- Request/Response Schemas
- Serialization Pipeline
- Collection Creation — Full Request Path
- Collection Access and Navigation
- Update, Delete, and Bulk Operations
- Collection Downloads
- Job State and Implicit Collections in API
- Authentication and Authorization
- Pagination and Filtering
- Error Handling
- Sample Sheet and Workbook Endpoints
- Test Coverage
- File Index
1. Endpoint Inventory
All endpoints that deal with dataset collections, across all API files.
Dedicated Collection Endpoints (/api/dataset_collections/...)
| Method | Path | Operation | Description |
|---|---|---|---|
| POST | /api/dataset_collections | create | Create a new collection instance |
| GET | /api/dataset_collections/{hdca_id} | show | Get detailed info about a collection |
| PUT | /api/dataset_collections/{hdca_id} | update_collection | Update collection attributes |
| GET | /api/dataset_collections/{hdca_id}/contents/{parent_id} | contents | Get child elements of a subcollection |
| GET | /api/dataset_collection_element/{dce_id} | content | Get a single DCE by its ID |
| GET | /api/dataset_collections/{hdca_id}/attributes | attributes | Get dbkey/extension for all elements |
| GET | /api/dataset_collections/{hdca_id}/suitable_converters | suitable_converters | Get applicable converters |
| POST | /api/dataset_collections/{hdca_id}/copy | copy | Copy collection with new dbkey |
| GET | /api/dataset_collections/{hdca_id}/download | download | Download as zip archive |
| POST | /api/dataset_collections/{hdca_id}/prepare_download | prepare_download | Async prepare zip download |
Sample Sheet / Workbook Endpoints
| Method | Path | Operation | Description |
|---|---|---|---|
| POST | /api/sample_sheet_workbook | create_workbook | Generate XLSX for sample sheet definition |
| POST | /api/sample_sheet_workbook/parse | parse_workbook | Parse XLSX workbook |
| POST | /api/dataset_collections/{hdca_id}/sample_sheet_workbook | create_workbook_for_collection | Generate XLSX targeting existing collection |
| POST | /api/dataset_collections/{hdca_id}/sample_sheet_workbook/parse | parse_workbook_for_collection | Parse XLSX for existing collection |
History Contents Endpoints (Collection-Relevant)
| Method | Path | Operation | Description |
|---|---|---|---|
| GET | /api/histories/{history_id}/contents | index | List history contents (HDAs + HDCAs) |
| GET | /api/histories/{history_id}/contents/{type}s | index_typed | List filtered by type (datasets or dataset_collections) |
| GET | /api/histories/{history_id}/contents/{type}s/{id} | show | Get detail for HDA or HDCA |
| POST | /api/histories/{history_id}/contents/{type}s | create_typed | Create HDA or HDCA in history |
| POST | /api/histories/{history_id}/contents | create (deprecated) | Create HDA or HDCA |
| PUT | /api/histories/{history_id}/contents/{type}s/{id} | update_typed | Update HDA or HDCA |
| DELETE | /api/histories/{history_id}/contents/{type}s/{id} | delete_typed | Delete HDA or HDCA |
| PUT | /api/histories/{history_id}/contents | update_batch | Batch update multiple items |
| PUT | /api/histories/{history_id}/contents/bulk | bulk_operation | Bulk ops (hide, delete, tag, etc.) |
| GET | /api/histories/{history_id}/contents/{type}s/{id}/jobs_summary | show_jobs_summary | Job state summary for HDCA |
| GET | /api/histories/{history_id}/jobs_summary | index_jobs_summary | Batch job state summaries |
| GET | /api/histories/{history_id}/contents/dataset_collections/{hdca_id}/download | download | Download collection as zip |
| POST | /api/histories/{history_id}/contents/dataset_collections/{hdca_id}/prepare_download | prepare_download | Async prepare download |
| POST | /api/histories/{history_id}/contents/{type}s/{id}/prepare_store_download | prepare_store_download | Export-style download (model store) |
| POST | /api/histories/{history_id}/contents/{type}s/{id}/write_store | write_store | Write to external URI |
| POST | /api/histories/{history_id}/copy_contents | copy_contents | Copy datasets/collections between histories |
2. Dedicated Collection API
File: lib/galaxy/webapps/galaxy/api/dataset_collections.py
This file defines FastAPIDatasetCollections, a class-based view (CBV) using FastAPI’s router. It delegates entirely to DatasetCollectionsService.
POST /api/dataset_collections — Create
Request body: CreateNewCollectionPayload (Pydantic model).
Key fields:
- collection_type (str): e.g. "list", "paired", "list:paired", "sample_sheet"
- element_identifiers (list): Elements to include, each with name, src (hda/ldda/hdca/new_collection), id, optional tags, optional nested element_identifiers
- instance_type: "history" (default) or "library"
- history_id: Required when instance_type == "history"
- folder_id: Required when instance_type == "library"
- name: Collection name
- hide_source_items, copy_elements: Behavioral flags
- fields: For record type, field definitions (or "auto")
- column_definitions, rows: For sample_sheet type
Response: HDCADetailed — full collection representation including elements.
Flow: Service layer calls api_payload_to_create_params() to extract/validate params, then DatasetCollectionManager.create().
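As an illustration of the request body described above, here is a minimal sketch that assembles a list:paired payload from existing HDAs. The helper names (pair_identifier, list_of_pairs_payload) are hypothetical, the IDs are placeholders, and the "forward"/"reverse" element names are an assumption about what the paired type expects rather than something stated here.

```python
from typing import Any


def pair_identifier(name: str, forward_id: str, reverse_id: str) -> dict[str, Any]:
    """Describe one new paired subcollection built from two existing HDAs."""
    return {
        "name": name,
        "src": "new_collection",
        "collection_type": "paired",
        # Assumed element names for a paired collection.
        "element_identifiers": [
            {"name": "forward", "src": "hda", "id": forward_id},
            {"name": "reverse", "src": "hda", "id": reverse_id},
        ],
    }


def list_of_pairs_payload(
    history_id: str, name: str, pairs: dict[str, tuple[str, str]]
) -> dict[str, Any]:
    """Assemble a request body for POST /api/dataset_collections."""
    return {
        "collection_type": "list:paired",
        "instance_type": "history",
        "history_id": history_id,
        "name": name,
        "element_identifiers": [
            pair_identifier(sample, fwd, rev) for sample, (fwd, rev) in pairs.items()
        ],
    }
```

The resulting dict can be serialized as JSON and sent with any HTTP client; each nested new_collection identifier carries its own collection_type and element_identifiers.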
GET /api/dataset_collections/{hdca_id} — Show
Query params: instance_type (history/library), view (element/element-reference/collection)
Response: AnyHDCA = Union[HDCACustom, HDCADetailed, HDCASummary]
The view parameter controls the level of detail:
- "element": Full element details including nested HDA metadata (default)
- "element-reference": Minimal element info (id, state, type) for efficient UI rendering
- "collection": No element information at all
GET /api/dataset_collections/{hdca_id}/contents/{parent_id} — Contents
Paginated child elements of a specific (sub)collection within an HDCA. Used for lazy-loading nested collections.
Query params: limit, offset, instance_type
Response: DatasetCollectionContentElements (root model wrapping list[DCESummary])
Security: validates that parent_id is a subcollection within the given hdca_id using hdca.contains_collection(parent_id) (recursive CTE query).
For subcollection elements, the response includes a contents_url for further drill-down navigation.
GET /api/dataset_collection_element/{dce_id} — Single Element
Returns a single DCESummary for a DatasetCollectionElement by its ID. Security check uses security_agent.can_access_collection() on the parent or child collection.
GET /api/dataset_collections/{hdca_id}/attributes — Attributes
Returns DatasetCollectionAttributesResult containing dbkey, extension, plus sets of all dbkeys and extensions in the collection.
GET /api/dataset_collections/{hdca_id}/suitable_converters — Converters
Returns SuitableConverters — list of tools that can convert all datatypes in the collection. Uses set intersection across all leaf dataset extensions to find converters applicable to the entire collection.
POST /api/dataset_collections/{hdca_id}/copy — Copy with Attributes
Copies entire collection with new dbkey. Returns 204 No Content.
3. History Contents API
File: lib/galaxy/webapps/galaxy/api/history_contents.py
Defines FastAPIHistoryContents. Collections appear as HistoryContentType.dataset_collection items in history contents. Many endpoints are polymorphic — they handle both HDAs and HDCAs based on the type parameter.
Index/Listing
Two versions:
- Legacy (v param unset): Uses History.contents_iter() with filter params (ids, types, deleted, visible, shareable). Collections serialized via dictify_dataset_collection_instance.
- Dev (v=dev): Uses HistoryContentsManager.contents() with ORM filter chain. Collections serialized via HDCASerializer.
Collections in listings use "summary" or "collection" view (no elements). Only when dataset_details="all" or a specific ID matches does the listing use "element" view.
Supports Accept: application/vnd.galaxy.history.contents.stats+json to return HistoryContentsWithStatsResult which includes total match count and requests elements_datatypes key in serialization.
Show
GET /api/histories/{history_id}/contents/dataset_collections/{id}
Supports fuzzy_count parameter for large collections — a heuristic to limit how many elements are returned at each nesting level. See gen_rank_fuzzy_counts() in collections_util.py. This is explicitly not a stable API — it provides a best-effort “balanced start” of large collections for UI rendering.
Create (Collections via History Contents)
POST /api/histories/{history_id}/contents/dataset_collections
When type=dataset_collection, routes to __create_dataset_collection():
- If source=new_collection (default): calls DatasetCollectionManager.create() with params from api_payload_to_create_params()
- If source=hdca: calls DatasetCollectionManager.copy() to copy an existing collection into the target history, optionally with copy_elements=True and dbkey override
Update
PUT /api/histories/{history_id}/contents/dataset_collections/{id}
Delegates to DatasetCollectionManager.update(). For anonymous users, only deleted and visible are allowed. For authenticated users:
- name: validated and sanitized
- deleted, visible: boolean validation
- tags: sanitized string list
- annotation: stored via annotation system
Delete
DELETE /api/histories/{history_id}/contents/dataset_collections/{id}
Supports recursive, purge, stop_job flags. Delegates to DatasetCollectionManager.delete():
- Sets deleted=True on the HDCA
- If recursive=True: iterates all leaf datasets and deletes them
- If purge=True: also purges each leaf dataset
Bulk Operations
PUT /api/histories/{history_id}/contents/bulk
Operations that affect collections:
- hide/unhide: sets visible flag
- delete: calls DatasetCollectionManager.delete(recursive=True)
- undelete: via HDCAManager.undelete()
- purge: calls DatasetCollectionManager.delete(recursive=True, purge=True)
- change_datatype: chains Celery tasks for all leaf datasets
- change_dbkey: sets dbkey on all leaf datasets
- add_tags/remove_tags: via tag handler
4. Service Layer
DatasetCollectionsService
File: lib/galaxy/webapps/galaxy/services/dataset_collections.py
Thin service that mediates between API endpoints and managers.
Dependencies: HistoryManager, HDAManager, HDCAManager, DatasetCollectionManager, Registry (datatypes)
Key methods map directly to endpoints:
- create(): validates instance_type, resolves parent (history or library folder), calls DatasetCollectionManager.create(), serializes result via dictify_dataset_collection_instance()
- show(): gets HDCA/LDCA via DatasetCollectionManager.get_dataset_collection_instance(), serializes with chosen view
- contents(): validates HDCA, checks subcollection membership, gets elements via DatasetCollectionManager.get_collection_contents(), serializes each element via dictify_element_reference()
- dce_content(): direct session lookup of DatasetCollectionElement, access check via security_agent.can_access_collection()
- attributes(): calls dataset_collection_instance.to_dict(view="dbkeysandextensions")
- copy(): delegates to DatasetCollectionManager.copy()
- suitable_converters(): delegates to DatasetCollectionManager.get_converters_for_collection()
HistoriesContentsService
File: lib/galaxy/webapps/galaxy/services/history_contents.py
Handles polymorphic history content operations. Collection-relevant methods:
- __show_dataset_collection(): gets accessible collection, serializes with dictify_dataset_collection_instance() using view param and fuzzy_count
- __create_dataset_collection(): routes to DatasetCollectionManager.create() or .copy() based on source
- __update_dataset_collection(): delegates to DatasetCollectionManager.update()
- show_jobs_summary(): for collections, checks job_source_type (Job or ImplicitCollectionJobs) and returns state summary
- get_dataset_collection_archive_for_download(): streams collection as zip via hdcas.stream_dataset_collection()
- prepare_collection_download(): async version using Celery task
5. Manager Layer
DatasetCollectionManager
File: lib/galaxy/managers/collections.py
The central service object for all collection operations. Not a typical ModelManager subclass — it directly manages creation, access, matching, and rule-based building.
Key methods exposed to API:
get_dataset_collection_instance(trans, instance_type, id, check_ownership=False, check_accessible=True):
- For "history": loads HDCA by ID, checks history ownership/accessibility
- For "library": loads LDCA by ID, checks library accessibility via security agent
- Overloaded with type hints to return the correct type
create(trans, parent, name, collection_type, element_identifiers=None, elements=None, ...):
- Entry point for all user-initiated collection creation
- Validates identifiers (unless trusted_identifiers)
- Creates DatasetCollection via create_dataset_collection()
- Creates HDCA or LDCA instance
- Handles tags, implicit inputs, implicit output name
create_dataset_collection(trans, collection_type, element_identifiers=None, elements=None, ...):
- Core collection building logic
- Resolves element identifiers to actual objects (__load_elements())
- For nested collections, recursively creates subcollections
- Calls builder.build_collection() with the type plugin
update(trans, instance_type, id, payload):
- Validates and parses update payload
- Delegates to dataset_collection_instance.set_from_dict() for model fields
- Handles annotations and tags separately
delete(trans, instance_type, id, recursive=False, purge=False):
- Sets deleted flag
- If recursive: iterates all leaf datasets, verifying ownership for each
- If purge: purges each leaf dataset
get_collection_contents(trans, parent_id, limit=None, offset=None):
- SQL query on DatasetCollectionElement table filtered by dataset_collection_id
- Ordered by element_index
- Eager loads child_collection and hda relationships
match_collections(collections_to_match):
- Delegates to MatchingCollections.for_collections(); used during tool execution, not directly by API
apply_rules(hdca, rule_set, handle_dataset):
- Rule-based collection manipulation
- Flattens collection to tabular data + sources, applies rules, builds new elements
HDCAManager
File: lib/galaxy/managers/hdcas.py
Standard Galaxy model manager for HDCAs. Extends ModelManager, AccessibleManagerMixin, OwnableManagerMixin, PurgableManagerMixin, AnnotatableManagerMixin.
is_owner(item, user, **kwargs):
- Checks item.history.user == user
- For anonymous users: checks item.history == kwargs.get("history")
map_datasets(content, fn, *parents):
- Recursive walker over all datasets in a collection
- Used by bulk operations to apply changes to every leaf dataset
6. Request/Response Schemas
File: lib/galaxy/schema/schema.py
Core Enums
DatasetCollectionInstanceType = Literal["history", "library"]

class DatasetCollectionPopulatedState(str, Enum):
    NEW = "new"
    OK = "ok"
    FAILED = "failed"

class DCEType(str, Enum):
    hda = "hda"
    dataset_collection = "dataset_collection"

class CollectionSourceType(str, Enum):
    hda = "hda"
    ldda = "ldda"
    hdca = "hdca"
    new_collection = "new_collection"
Request Models
CreateNewCollectionPayload — POST body for creating collections:
- collection_type: Optional[str], e.g. "list", "paired", "list:paired", "sample_sheet"
- element_identifiers: Optional[list[CollectionElementIdentifier]]
- name, hide_source_items, copy_elements
- instance_type: "history" or "library"
- history_id, folder_id
- fields: For record type
- column_definitions, rows: For sample_sheet type
CollectionElementIdentifier (in history_contents service):
- name, src (CollectionSourceType), id, tags
- element_identifiers: For nested new_collection src (self-referencing)
- collection_type: For nested collections
UpdateHistoryContentsPayload — PUT body for updates (shared with HDAs):
- Flexible payload, used with model_dump(exclude_unset=True)
DeleteHistoryContentPayload — DELETE body:
- purge, recursive, stop_job booleans
Response Models
DCSummary — DatasetCollection summary:
- id, create_time, update_time, collection_type, populated_state, populated_state_message, element_count
DCDetailed extends DCSummary:
- populated (bool), elements (list[DCESummary])
DCESummary — DatasetCollectionElement:
- id, element_index, element_identifier, element_type (DCEType)
- object: Union[HDAObject, HDADetailed, DCObject], the actual content
- columns: Optional sample sheet row data
DCObject — Nested DatasetCollection as element:
- id, collection_type, populated, element_count
- contents_url: Optional URL for drill-down
- elements: list[DCESummary] (recursive)
- elements_states, elements_deleted, elements_datatypes: Summary stats
HDAObject — Dataset as element:
- id, state, hda_ldda, history_id, tags, purged
- accessible: Optional (set during contents serialization)
HDCASummary — HDCA summary (used in listings):
- id, name, hid, history_id, collection_id
- history_content_type: Always "dataset_collection"
- type: Always "collection"
- collection_type, populated_state, populated_state_message, element_count
- elements_datatypes, elements_states, elements_deleted
- job_source_id, job_source_type, job_state_summary
- deleted, visible, create_time, update_time
- tags, url, contents_url
- store_times_summary
HDCADetailed extends HDCASummary:
- populated (bool)
- elements (list[DCESummary])
- implicit_collection_jobs_id
- column_definitions (for sample_sheet type)
AnyHDCA = Union[HDCACustom, HDCADetailed, HDCASummary]
7. Serialization Pipeline
There are two serialization paths for collections:
Path 1: dictify_dataset_collection_instance() (collections_util.py)
Used by DatasetCollectionsService and HistoriesContentsService.__collection_dict().
dictify_dataset_collection_instance(hdca, parent, security, url_builder, view, fuzzy_count)
|
├── hdca.to_dict(view=hdca_view) -- base model serialization
|
├── Compute URL and contents_url
|
├── If view in ("element", "element-reference"):
| ├── gen_rank_fuzzy_counts(collection_type, fuzzy_count)
| ├── get_fuzzy_count_elements(collection, rank_fuzzy_counts)
| └── For each element:
| ├── dictify_element(element, ...) # full view
| └── dictify_element_reference(...) # reference view
|
└── Attach implicit_collection_jobs_id
dictify_element(): Full recursive serialization. Calls element_object.to_dict() for datasets (includes all HDA metadata). For subcollections, recursively serializes nested elements.
dictify_element_reference(): Lightweight serialization. For datasets: just id, model_class, state, hda_ldda, purged, history_id, tags. For subcollections: collection_type, element_count, populated, elements_states, elements_deleted, elements_datatypes, plus recursive nested elements.
Path 2: HDCASerializer (hdcas.py)
Used by HistoriesContentsService._serialize_content_item() (the v=dev index).
Standard Galaxy serializer framework with view-based key selection:
Summary view keys: id, type_id, name, history_id, collection_id, hid, history_content_type, collection_type, populated_state, populated_state_message, element_count, elements_datatypes, elements_deleted, elements_states, job_source_id, job_source_type, job_state_summary, deleted, visible, type, url, create_time, update_time, tags, contents_url, store_times_summary
Detailed view adds: populated, elements
Collection-proxied keys (delegated to DCSerializer): create_time, update_time, collection_type, populated, populated_state, populated_state_message, elements, element_count.
The elements serializer recursively uses DCESerializer, which delegates to HDASerializer for dataset elements and DCSerializer for subcollection elements.
Fuzzy Count Mechanism
gen_rank_fuzzy_counts(collection_type, fuzzy_count) converts a global element budget into per-rank limits:
- paired ranks always get 2
- list ranks split the remaining budget by nth-root
- The goal is balanced representation across nesting levels
- Example: list:paired with fuzzy_count=100 -> [~50, 2]
This is explicitly unstable / heuristic. The only guarantee is the API won’t return orders of magnitude more elements than the requested fuzzy_count.
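The budget split can be sketched as follows. This is not the body of gen_rank_fuzzy_counts(); it is an illustrative reimplementation built only from the rules listed above, and it assumes every non-paired rank is treated like a list rank. The names rank_budgets and integer_nth_root are hypothetical.

```python
def integer_nth_root(value: int, n: int) -> int:
    """Largest integer r with r**n <= value (avoids float rounding)."""
    root = 0
    while (root + 1) ** n <= value:
        root += 1
    return root


def rank_budgets(collection_type: str, fuzzy_count: int) -> list[int]:
    """Split a global element budget into one limit per nesting rank."""
    ranks = collection_type.split(":")
    paired_ranks = sum(1 for rank in ranks if rank == "paired")
    list_ranks = len(ranks) - paired_ranks
    # Every paired rank doubles the element total, so take that factor
    # out of the budget before sharing the rest among the list ranks.
    remaining = fuzzy_count // (2 ** paired_ranks)
    per_list_rank = integer_nth_root(remaining, list_ranks) + 1 if list_ranks else 1
    return [2 if rank == "paired" else per_list_rank for rank in ranks]
```

Under these assumptions, rank_budgets("list:paired", 100) gives [51, 2], which is in line with the "[~50, 2]" example, and the "+ 1" means the total can slightly exceed the requested budget.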
8. Collection Creation — Full Request Path
Via /api/dataset_collections (Direct)
Client POST /api/dataset_collections
{ collection_type: "list:paired",
element_identifiers: [...],
instance_type: "history",
history_id: "abc123",
name: "My Collection" }
│
▼
FastAPIDatasetCollections.create()
│
▼
DatasetCollectionsService.create(trans, payload)
│
├── api_payload_to_create_params(payload)
│ ├── Validates required: collection_type, element_identifiers
│ ├── validate_column_definitions() if sample_sheet
│ └── Returns dict with: collection_type, element_identifiers, name,
│ hide_source_items, copy_elements, fields, column_definitions, rows
│
├── Resolve parent:
│ ├── history: HistoryManager.get_mutable(history_id, user)
│ └── library: get_library_folder + check_user_can_add_to_library_item
│
▼
DatasetCollectionManager.create(trans, parent, name, collection_type, element_identifiers, ...)
│
├── validate_input_element_identifiers(element_identifiers)
│ ├── Check no __object__ key (injection prevention)
│ ├── Check all have "name" field
│ ├── Check no duplicate names
│ ├── Check src in (hda, hdca, ldda, new_collection)
│ └── Recursive validation for new_collection children
│
├── create_dataset_collection(trans, collection_type, element_identifiers, ...)
│ │
│ ├── CollectionTypeDescriptionFactory.for_collection_type(collection_type)
│ │
│ ├── _element_identifiers_to_elements()
│ │ │
│ │ ├── If nested: __recursively_create_collections_for_identifiers()
│ │ │ └── For each src="new_collection": recursive create_dataset_collection()
│ │ │
│ │ └── __load_elements()
│ │ └── For each identifier:
│ │ ├── src="hda": hda_manager.get_accessible() [+ copy if copy_elements]
│ │ ├── src="ldda": ldda_manager.get() -> to_history_dataset_association()
│ │ ├── src="hdca": __get_history_collection_instance().collection
│ │ └── Apply tags from identifier
│ │
│ ├── builder.build_collection(type_plugin, elements)
│ │ └── type_plugin.generate_elements(elements)
│ │ └── Yields DatasetCollectionElement objects
│ │
│ └── Set collection_type on DatasetCollection
│
├── _create_instance_for_collection(trans, parent, name, collection, ...)
│ ├── Create HDCA (or LDCA for library)
│ ├── Set implicit_input_collections if applicable
│ ├── parent.add_dataset_collection() -- assigns HID
│ └── Apply tags (list of strings or dict of tag objects)
│
└── __persist() -- session.add() + session.commit()
Via History Contents API (Copy)
POST /api/histories/{history_id}/contents/dataset_collections
{ type: "dataset_collection", source: "hdca", content: "encoded_hdca_id",
copy_elements: true, dbkey: "hg38" }
│
▼
HistoriesContentsService.__create_dataset_collection()
│
▼
DatasetCollectionManager.copy(trans, parent=history, source="hdca",
encoded_source_id, copy_elements=True,
dataset_instance_attributes={dbkey: "hg38"})
│
├── __get_history_collection_instance(trans, encoded_source_id)
├── source_hdca.copy(element_destination=history, dataset_instance_attributes=...)
├── new_hdca.copy_tags_from(source=source_hdca)
└── session.commit()
9. Collection Access and Navigation
Fetching a Collection
GET /api/dataset_collections/{hdca_id}?view=element
Returns HDCADetailed with full element tree. For very large collections, use view=element-reference for lighter payloads, or view=collection to skip elements entirely.
Fetching via History Contents
GET /api/histories/{history_id}/contents/dataset_collections/{hdca_id}?fuzzy_count=100
The fuzzy_count parameter limits elements at each level. For list:paired with fuzzy_count=100, approximately 50 list elements are returned, each with their full 2 paired elements.
Navigating Nested Collections
Step 1: Get the HDCA with elements or get contents_url:
GET /api/histories/{history_id}/contents?v=dev&view=summary&keys=contents_url
Step 2: Use contents_url to get root elements:
GET /api/dataset_collections/{hdca_id}/contents/{collection_id}
Step 3: For subcollections, each element’s object.contents_url provides the next level:
GET /api/dataset_collections/{hdca_id}/contents/{child_collection_id}
This supports pagination with limit and offset at each level.
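The three navigation steps can be combined into a paginated depth-first walk. The sketch below keeps HTTP out of the picture: fetch_page is a hypothetical callable standing in for GET /api/dataset_collections/{hdca_id}/contents/{parent_id}?limit=...&offset=..., and it assumes that a subcollection element's object.id is the ID to pass as the next parent_id.

```python
from typing import Callable, Iterator

# fetch_page(parent_id, limit, offset) -> list of DCESummary-like dicts
FetchPage = Callable[[str, int, int], list[dict]]


def iter_children(
    fetch_page: FetchPage, parent_id: str, page_size: int = 50
) -> Iterator[dict]:
    """Yield every direct child of one (sub)collection, page by page."""
    offset = 0
    while True:
        page = fetch_page(parent_id, page_size, offset)
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left
        offset += page_size


def iter_leaf_paths(
    fetch_page: FetchPage, collection_id: str, prefix: tuple[str, ...] = ()
) -> Iterator[tuple[str, ...]]:
    """Depth-first walk yielding the identifier path of each leaf dataset."""
    for element in iter_children(fetch_page, collection_id):
        path = prefix + (element["element_identifier"],)
        if element["element_type"] == "dataset_collection":
            # Assumption: object.id identifies the child collection.
            yield from iter_leaf_paths(fetch_page, element["object"]["id"], path)
        else:
            yield path
```

Injecting the fetch function keeps the traversal logic testable without a running server; a real client would wrap its HTTP call in a function with the same signature.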
Accessing Individual Elements
GET /api/dataset_collection_element/{dce_id}
Returns DCESummary for any element by its ID. Access checked via the parent collection’s permissions.
10. Update, Delete, and Bulk Operations
Update
PUT /api/dataset_collections/{hdca_id}
{ name: "New Name", tags: ["tag1", "tag2"], visible: false }
Or via history contents:
PUT /api/histories/{history_id}/contents/dataset_collections/{hdca_id}
{ name: "New Name" }
Allowed fields:
- name: sanitized string
- deleted: boolean
- visible: boolean
- tags: list of strings (calls tag_handler.set_tags_from_list())
- annotation: text
Anonymous users can only update deleted and visible.
Delete
DELETE /api/histories/{history_id}/contents/dataset_collections/{hdca_id}
Payload/query options:
- recursive (bool, deprecated as query param): also delete leaf datasets
- purge (bool, deprecated as query param): purge leaf datasets from disk
- stop_job (bool, deprecated as query param): stop creating job
Returns 202 (accepted, async purge) or 204 (immediate).
Batch Update
PUT /api/histories/{history_id}/contents
{ items: [{ id: "...", history_content_type: "dataset_collection" }],
visible: false }
Applies same payload to all listed items. HDCAs updated via DatasetCollectionManager.update().
Bulk Operations
PUT /api/histories/{history_id}/contents/bulk
{ operation: "delete",
items: [{ id: "...", history_content_type: "dataset_collection" }] }
Or with filter-based selection (no explicit items, uses filter query params).
Operations and their effect on collections:
- hide/unhide: sets visible on HDCA
- delete: recursive delete of HDCA + all leaf datasets
- undelete: undeletes HDCA (fails if purged)
- purge: recursive delete + purge of all leaf datasets
- change_datatype: chains Celery tasks for all leaf datasets, then touches HDCA
- change_dbkey: sets dbkey on all leaf dataset instances
- add_tags/remove_tags: modifies HDCA tags
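A small helper for building the explicit-items form of the bulk request might look like the sketch below. The function name bulk_payload is hypothetical, and any operation-specific parameters (for example the target datatype for change_datatype) are not covered because their shape is not described here.

```python
from typing import Any

# Operation names taken from the list above.
COLLECTION_OPERATIONS = {
    "hide", "unhide", "delete", "undelete", "purge",
    "change_datatype", "change_dbkey", "add_tags", "remove_tags",
}


def bulk_payload(operation: str, hdca_ids: list[str]) -> dict[str, Any]:
    """Build a body for PUT /api/histories/{history_id}/contents/bulk."""
    if operation not in COLLECTION_OPERATIONS:
        raise ValueError(f"unsupported bulk operation: {operation!r}")
    return {
        "operation": operation,
        "items": [
            {"id": hdca_id, "history_content_type": "dataset_collection"}
            for hdca_id in hdca_ids
        ],
    }
```

Rejecting unknown operation names on the client side is a design choice of this sketch; the server performs its own validation regardless.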
11. Collection Downloads
Synchronous Download
GET /api/dataset_collections/{hdca_id}/download
GET /api/histories/{history_id}/contents/dataset_collections/{hdca_id}/download
Returns StreamingResponse with zip archive. Structure:
- Collection name as root directory
- Elements named by element_identifier + file extension
- Nested collections create subdirectories
Uses ZipstreamWrapper for streaming. Skips datasets not in ok state or that are purged.
Prerequisite: Collection must be fully populated (populated_optimized == True), otherwise raises 400.
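To make the archive layout concrete, the following sketch computes the member paths such a zip would contain. It is not the ZipstreamWrapper code path; the input shape (dicts with identifier, state, purged, extension, and a nested elements list for subcollections) and the name archive_paths are assumptions made for illustration.

```python
from typing import Iterator


def archive_paths(name: str, elements: list[dict], prefix: str = "") -> Iterator[str]:
    """Yield archive member paths for a collection and its nested elements."""
    root = f"{prefix}{name}/"
    for element in elements:
        if "elements" in element:
            # A nested collection becomes a subdirectory.
            yield from archive_paths(element["identifier"], element["elements"], root)
        elif element["state"] == "ok" and not element["purged"]:
            # Datasets that are purged or not in the ok state are skipped.
            yield f"{root}{element['identifier']}.{element['extension']}"
```

For a list:paired collection named "My Collection", an ok forward read of sample s1 would land at "My Collection/s1/forward.fastq" under these assumptions.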
Async Download
POST /api/dataset_collections/{hdca_id}/prepare_download
Returns AsyncFile with storage_request_id. Uses Celery task prepare_dataset_collection_download. Client polls short-term storage API for completion.
Export-Style Download
POST /api/histories/{history_id}/contents/dataset_collections/{id}/prepare_store_download
{ model_store_format: "tar.gz", include_files: true }
Exports collection as a Galaxy model store archive (tar.gz, rocrate, etc.).
12. Job State and Implicit Collections in API
Job State Summary
GET /api/histories/{history_id}/contents/dataset_collections/{id}/jobs_summary
Returns AnyJobStateSummary. For implicit collections (created by map-over):
- Checks job_source_type: either "Job" (single job) or "ImplicitCollectionJobs" (job group)
- Returns aggregate state counts across all jobs in the group
Batch Job State Polling
GET /api/histories/{history_id}/jobs_summary?ids=id1,id2&types=ImplicitCollectionJobs,Job
Efficient bulk lookup. IDs and types arrays must have same length. Uses fetch_job_states() for efficient SQL.
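Because ids and types are positional, it is convenient to build both query parameters from a single list of pairs so they cannot drift out of sync. A minimal sketch, with the helper name jobs_summary_url being hypothetical:

```python
from urllib.parse import urlencode


def jobs_summary_url(history_id: str, sources: list[tuple[str, str]]) -> str:
    """Build the batch polling URL from (job_source_id, job_source_type) pairs."""
    ids = ",".join(source_id for source_id, _ in sources)
    types = ",".join(source_type for _, source_type in sources)
    # Keep commas literal so the arrays stay readable in the query string.
    query = urlencode({"ids": ids, "types": types}, safe=",")
    return f"/api/histories/{history_id}/jobs_summary?{query}"
```

The pairs would typically come from the job_source_id and job_source_type fields of the HDCA summaries already present in a history listing.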
How Implicit Collections Appear in API Responses
HDCASummary / HDCADetailed always include:
- job_source_id: encoded ID of the Job or ImplicitCollectionJobs
- job_source_type: "Job" or "ImplicitCollectionJobs" or null
- job_state_summary: HDCJobStateSummary with counts per job state (new, waiting, running, ok, error, paused, etc.)
- implicit_collection_jobs_id (detailed view only)

These fields allow the UI to track progress of implicit collection population. The populated_state field indicates whether the collection structure itself is finalized:
- "new": elements not yet fully determined
- "ok": all elements present (though individual datasets may still be running)
- "failed": collection population failed
13. Authentication and Authorization
Access Control Model
All collection access goes through history/library access checks:
- History collections (HDCA):
  - get_dataset_collection_instance() calls history_manager.error_unless_accessible() or error_unless_owner() depending on check_ownership flag
  - Most read operations use check_accessible=True (allows shared histories)
  - Write operations (update, delete, copy) use check_ownership=True
- Library collections (LDCA):
  - Uses security_agent.can_access_library_item() for access checks
  - Ownership checks for library collections are not yet implemented (raises NotImplementedError)
- Element-level access:
  - dce_content() checks security_agent.can_access_collection() on the DCE’s parent or child collection
  - This checks dataset permissions for all leaf datasets in the collection
  - Admin users bypass element-level checks
- Anonymous users:
  - Can access their current history’s contents
  - Limited to deleted and visible updates only
  - Ownership verified by history == current_history when history.user is None
Security Patterns
- Collection creation requires mutable history access: history_manager.get_mutable()
- hide_source_items requires ownership of each source HDA
- Recursive delete verifies ownership of each leaf dataset before deletion
- hda_manager.get_accessible() used when resolving element identifiers (allows referencing accessible-but-not-owned datasets)
- Job state summary intentionally has no access checks (considered non-sensitive data, skipped for efficiency)
14. Pagination and Filtering
Collection Contents Pagination
GET /api/dataset_collections/{hdca_id}/contents/{parent_id}?limit=20&offset=0
Direct SQL LIMIT/OFFSET on DatasetCollectionElement query, ordered by element_index.
History Contents Filtering
Collections appear alongside datasets in history contents. The v=dev index supports:
- Standard filter params via FilterQueryParams (q, qv, offset, limit, order)
- Ordering by hid-asc (default) or other fields
- Type filtering: ?types=dataset_collection restricts to HDCAs only
Legacy index supports:
- types: comma-separated list including "dataset_collection"
- ids: specific encoded IDs
- deleted, visible: boolean filters
- shareable: filter by object store shareability
Fuzzy Count (Large Collection Handling)
Not true pagination — a heuristic budget for the show endpoint:
GET /api/histories/{history_id}/contents/dataset_collections/{id}?fuzzy_count=500
Distributes an element budget across nesting levels. For list:list:list with fuzzy_count=1000:
- Each list rank gets approximately cube_root(1000) + 1 = 11 elements
15. Error Handling
Validation Errors
Element identifier validation (validate_input_element_identifiers()):
- Missing name field -> 400
- Duplicate name values -> 400
- Unknown src type -> 400
- __object__ key present (injection) -> 400
- Missing element_identifiers for new_collection -> 400
- Missing collection_type for new_collection -> 400
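The identifier checks listed above can be mirrored client-side before sending a request. The sketch below is not the body of validate_input_element_identifiers(); it is a standalone approximation that raises ValueError where the server would answer 400, and it assumes a missing src counts as unknown.

```python
ALLOWED_SRC = {"hda", "hdca", "ldda", "new_collection"}


def validate_identifiers(element_identifiers: list[dict]) -> None:
    """Raise ValueError for any of the identifier problems listed above."""
    seen_names = set()
    for identifier in element_identifiers:
        if "__object__" in identifier:
            raise ValueError("__object__ is not allowed in element identifiers")
        if "name" not in identifier:
            raise ValueError("element identifier is missing 'name'")
        name = identifier["name"]
        if name in seen_names:
            raise ValueError(f"duplicate element name: {name!r}")
        seen_names.add(name)
        src = identifier.get("src")  # assumption: no default src in this sketch
        if src not in ALLOWED_SRC:
            raise ValueError(f"unknown src: {src!r}")
        if src == "new_collection":
            if "element_identifiers" not in identifier:
                raise ValueError("new_collection requires element_identifiers")
            if "collection_type" not in identifier:
                raise ValueError("new_collection requires collection_type")
            # Children are validated recursively, with their own name scope.
            validate_identifiers(identifier["element_identifiers"])
```

Duplicate names are checked per nesting level, so two different pairs may each contain an element with the same name.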
Creation errors:
- Missing collection_type -> 400 (ERROR_NO_COLLECTION_TYPE)
- Missing element_identifiers and no elements -> 400 (ERROR_INVALID_ELEMENTS_SPECIFICATION)
- Missing history_id when instance_type="history" -> 400
- Record type without fields -> 400
Column definition validation (sample_sheet):
- Invalid type -> 400
- Missing required keys -> 400
- Invalid validators -> 400
- Row data mismatching column definitions -> 400
Access Errors
- History not accessible -> 403 (ItemAccessibilityException)
- History not owned (for mutations) -> 403
- Collection not found -> 400 (RequestParameterInvalidException with “not found” message)
- Library collection not accessible -> 403
- HDA not accessible during element resolution -> 403
Containment Errors
- contents endpoint with parent_id not contained in HDCA -> 404 (ObjectNotFound)
Population Errors
- Download of unpopulated collection -> 400 (RequestParameterInvalidException)
- Serialization failure when collection not populated -> logs exception and re-raises ValidationError
16. Sample Sheet and Workbook Endpoints
Sample sheets are a collection type where each element carries row-level metadata (columns field on DCE). The API includes workbook endpoints for Excel-based data entry.
Create Workbook
POST /api/sample_sheet_workbook
{ title: "...", column_definitions: [...] }
Returns XLSX file as StreamingResponse. No collection needed — just generates a template.
Create Workbook for Existing Collection
POST /api/dataset_collections/{hdca_id}/sample_sheet_workbook
{ column_definitions: [...], prefix_values: [...] }
Pre-fills the workbook with element identifiers from the existing collection.
Parse Workbook
POST /api/sample_sheet_workbook/parse
{ content: "base64-encoded-xlsx", column_definitions: [...], collection_type: "..." }
Returns ParsedWorkbook with rows extracted from the XLSX.
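A client-side sketch of assembling the parse request body is shown below. The helper name build_parse_payload is hypothetical, the workbook bytes are a placeholder rather than a real XLSX file, and column_definitions is left empty because the shape of a column definition is not specified in this section.

```python
import base64
import json
from typing import Any, Dict, List


def build_parse_payload(
    xlsx_bytes: bytes,
    column_definitions: List[Dict[str, Any]],
    collection_type: str,
) -> str:
    """Serialize a workbook parse request with the XLSX content base64-encoded."""
    payload = {
        "content": base64.b64encode(xlsx_bytes).decode("ascii"),
        "column_definitions": column_definitions,
        "collection_type": collection_type,
    }
    return json.dumps(payload)


# Placeholder bytes standing in for a workbook read from disk.
fake_xlsx = b"PK\x03\x04 placeholder bytes, not a real workbook"
body = build_parse_payload(fake_xlsx, [], "sample_sheet")
print(len(body) > 0)
```

The resulting string would be sent as the JSON body of POST /api/sample_sheet_workbook/parse.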
Parse Workbook for Existing Collection
POST /api/dataset_collections/{hdca_id}/sample_sheet_workbook/parse
{ content: "base64-encoded-xlsx", column_definitions: [...] }
Returns ParsedWorkbookForCollection — includes rows plus elements from the existing collection, allowing the client to map rows to collection elements.
17. Test Coverage
File: lib/galaxy_test/api/test_dataset_collections.py
Test Inventory
| Test | What It Covers |
|---|---|
| test_create_pair_from_history | Create paired collection via fetch API |
| test_create_list_from_history | Create list collection via direct POST |
| test_create_list_of_existing_pairs | Reference existing HDCA as element (src=hdca) |
| test_create_list_of_new_pairs | Nested collection creation (list:paired with new subcollections) |
| test_create_paried_or_unpaired | paired_or_unpaired collection with single “unpaired” element |
| test_create_record | Record collection with explicit fields |
| test_record_requires_fields | 400 when record type without fields |
| test_record_auto_fields | Auto-detect fields from identifiers |
| test_record_field_validation | Rejects wrong field count/names |
| test_sample_sheet_* (7 tests) | Sample sheet creation, column definitions, validation, nested sample sheets |
| test_workbook_download | XLSX generation |
| test_workbook_download_for_collection | XLSX generation from existing collection |
| test_workbook_parse | XLSX parsing |
| test_workbook_parse_for_collection | XLSX parsing with collection context |
| test_list_download | Download list as zip |
| test_pair_download | Download pair as zip |
| test_list_pair_download | Download list:paired as zip |
| test_list_list_download | Download list:list as zip |
| test_list_list_list_download | Download list:list:list as zip |
| test_download_non_english_characters | Non-ASCII collection names in zip |
| test_hda_security | 403 when element is inaccessible to another user |
| test_dataset_collection_element_security | DCE endpoint security for nested collections |
| test_enforces_unique_names | 400 on duplicate element identifiers |
| test_upload_collection | Fetch API collection upload with tags |
| test_upload_nested | Fetch API nested collection upload |
| test_upload_collection_from_url | Upload from base64 URL |
| test_upload_collection_deferred | Deferred dataset in collection |
| test_upload_collection_failed_expansion_url | Failed bagit expansion |
| test_upload_flat_sample_sheet | Fetch API sample sheet upload |
| test_upload_sample_sheet_paired | Fetch API sample_sheet:paired upload |
| test_collection_contents_security | 403 on contents of non-owned collection |
| test_published_collection_contents_accessible | Contents accessible in published history |
| test_collection_contents_invalid_collection | 404 for invalid subcollection ID |
| test_show_dataset_collection | GET show endpoint basic functionality |
| test_show_dataset_collection_contents | Contents endpoint with drill-down |
| test_collection_contents_limit_offset | Pagination params on contents |
| test_collection_contents_empty_root | Empty collection contents |
| test_get_suitable_converters_* (3 tests) | Converter intersection logic |
| test_collection_tools_tag_propagation | Tags propagated through tool execution |
Testing Patterns
- Populator objects: DatasetCollectionPopulator and DatasetPopulator provide helper methods for creating test data
- Fetch API usage: Many tests use dataset_populator.fetch(payload) (tools/fetch endpoint) instead of direct collection creation
- Wait patterns: wait_for_fetched_collection() polls until collection is populated
- Response validation: _check_create_response() verifies 200 status and required keys (elements, url, name, collection_type, element_count)
- Security tests: Use _different_user() context manager to test access control
- Helper pattern: _create_collection_contents_pair() creates a simple collection and returns (hdca_dict, contents_url) for reuse
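The wait pattern above can be illustrated with a generic polling loop. This is a sketch of the idea only, not the populator's wait_for_fetched_collection(): the function wait_until_populated is hypothetical, the fetch_state callable stands in for an HTTP GET of the collection, and the "populated" key is an assumed field of the response.

```python
import time
from typing import Any, Callable, Dict


def wait_until_populated(
    fetch_state: Callable[[], Dict[str, Any]],
    timeout: float = 5.0,
    interval: float = 0.01,
) -> Dict[str, Any]:
    """Call fetch_state repeatedly until it reports a populated collection."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state.get("populated"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("collection was not populated in time")
        time.sleep(interval)


# Fake fetcher that reports a populated collection on the third call.
responses = iter([
    {"populated": False},
    {"populated": False},
    {"populated": True, "element_count": 2},
])
result = wait_until_populated(lambda: next(responses))
print(result["element_count"])  # 2
```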
Coverage Gaps (Observed)
- No explicit test for library collection creation/access
- No test for PUT /api/dataset_collections/{id} (direct update)
- No test for POST /api/dataset_collections/{id}/copy (copy with attributes)
- No explicit test for the fuzzy_count parameter behavior
- No test for GET /api/dataset_collections/{id}/attributes
- Bulk operations on collections tested elsewhere in history contents tests
18. File Index
| File | Contents |
|---|---|
| lib/galaxy/webapps/galaxy/api/dataset_collections.py | FastAPI endpoints for /api/dataset_collections/ and /api/dataset_collection_element/ and /api/sample_sheet_workbook/ |
| lib/galaxy/webapps/galaxy/api/history_contents.py | FastAPI endpoints for /api/histories/{id}/contents/ including collection-typed operations and download endpoints |
| lib/galaxy/webapps/galaxy/api/common.py | Shared path/query param definitions: HistoryHDCAIDPathParam, DatasetCollectionElementIdPathParam, serve_workbook() |
| lib/galaxy/webapps/galaxy/services/dataset_collections.py | DatasetCollectionsService — service layer for dedicated collection endpoints. Also defines UpdateCollectionAttributePayload, DatasetCollectionAttributesResult, SuitableConverters, DatasetCollectionContentElements, workbook API models |
| lib/galaxy/webapps/galaxy/services/history_contents.py | HistoriesContentsService — service layer for history contents endpoints. Also defines CreateHistoryContentPayload, CollectionElementIdentifier, HistoryContentsIndexParams, HistoryItemOperator |
| lib/galaxy/managers/collections.py | DatasetCollectionManager — central business logic for collection CRUD, matching, rule application |
| lib/galaxy/managers/hdcas.py | HDCAManager, DCESerializer, DCSerializer, DCASerializer, HDCASerializer — CRUD manager and serialization |
| lib/galaxy/managers/collections_util.py | api_payload_to_create_params(), validate_input_element_identifiers(), dictify_dataset_collection_instance(), dictify_element(), dictify_element_reference(), gen_rank_fuzzy_counts() |
| lib/galaxy/schema/schema.py | Pydantic models: CreateNewCollectionPayload, DCSummary, DCDetailed, DCESummary, DCObject, HDAObject, HDCASummary, HDCADetailed, HDCJobStateSummary, DatasetCollectionPopulatedState, DCEType, CollectionSourceType, DatasetCollectionInstanceType |
| lib/galaxy_test/api/test_dataset_collections.py | API integration tests for collection creation, download, contents navigation, security, converters, sample sheets, workbooks |