Dashboard

Component Collection Adapters

Ephemeral wrappers promoting datasets/pairs to collections at tool runtime (PromoteDatasetToCollection family)

Raw
Revised:
2026-04-22
Revision:
3
Related Notes:
Component - CWL Ephemeral Collections, Component - Collection Models, Component - Collection Tool Execution Semantics, Component - Tool XML Collection Commands, PR 19377 - Collection Types and Wizard UI

Component: Collection Adapters

Overview

Collection adapters are a framework for wrapping model objects (datasets, collection elements) into ephemeral, collection-like facades for tool execution. They exist only during tool evaluation — never persisted as real collections, never visible to users. They let Galaxy accept flexible input combinations (single datasets, pairs of datasets, collection elements) where a tool declares a collection input, bridging type gaps at runtime.

Introduced in PR #19377 (“Empower Users to Build More Kinds of Collections, More Intelligently”), implementing the design from issue #19359. Evolved from the earlier “EphemeralCollections” concept in the CWL branch.

Key source file: lib/galaxy/model/dataset_collections/adapters.py

Motivation

Galaxy tool authors historically had to write complex conditionals to handle different data structures. A mapper tool shouldn’t need separate code paths for paired reads, single-end reads, and mixed collections. The adapter framework pushes that complexity into Galaxy’s runtime:

<!-- Before: tool authors wrote complex conditionals -->
<conditional name="single_paired">
    <param name="type" type="select">
        <option value="single">Single-end</option>
        <option value="paired">Paired-end</option>
        <option value="paired_collection">Paired Collection</option>
        <option value="paired_list">List of Pairs</option>
    </param>
    <!-- ... dozens of lines per option ... -->
</conditional>

<!-- After: one input, Galaxy handles the rest -->
<param type="data_collection" collection_type="paired_or_unpaired" label="Reads" />

When a user passes a single dataset to a paired_or_unpaired input, Galaxy wraps it in a PromoteDatasetToCollection adapter. When passing paired to paired_or_unpaired, an adapter bridges the type gap. The tool code never knows the difference.

Architecture

Base Class: CollectionAdapter

adapters.py:28-70 — Abstract base defining the interface that all adapters must implement. Mirrors the DatasetCollection interface used by tool execution code:

PropertyPurpose
dataset_action_tuplesSecurity permission tuples for RBAC checks
dataset_states_and_extensions_summaryMetadata summary (states, extensions, dbkeys)
dataset_instancesList of underlying HistoryDatasetAssociation objects
elementsCollection elements (real or transient)
collection_typeString type identifier (e.g., "paired_or_unpaired", "list")
collectionReturns self — adapter serves as both the association and collection object
adaptingThe wrapped model object (for recording job provenance)
to_adapter_model()Serialize to Pydantic model for database persistence

Concrete Adapters

1. PromoteCollectionElementToCollectionAdapter (adapters.py:122-135)

  • Wraps a DatasetCollectionElement → acts as paired_or_unpaired
  • Use case: mapping over list:paired_or_unpaired — each element (which may be a single dataset rather than a subcollection) gets promoted
  • Created in subcollections.py:44-45 when splitting flat collections via single_datasets pseudo-type
  • Inherits from DCECollectionAdapter (shared logic for wrapping DCEs)

2. PromoteDatasetToCollection (adapters.py:138-189)

  • Wraps a single HistoryDatasetAssociation → acts as list or paired_or_unpaired
  • For paired_or_unpaired: element identifier becomes "unpaired"
  • For list: element identifier is the dataset name
  • Creates TransientCollectionAdapterDatasetInstanceElement to wrap the HDA

3. PromoteDatasetsToCollection (adapters.py:192-262)

  • Wraps multiple HDAs → acts as paired or paired_or_unpaired
  • Takes pre-built list of TransientCollectionAdapterDatasetInstanceElement objects
  • Use case: user specifies forward + reverse datasets separately, adapter makes them look like a paired collection

Helper: TransientCollectionAdapterDatasetInstanceElement

adapters.py:265-285 — Lightweight stand-in for DatasetCollectionElement. Holds an HDA + element identifier without database backing. Implements element_object, dataset_instance, is_collection, columns to satisfy the DCE interface.

Type Matching

type_description.py:107-126can_match_type() determines when adapters are needed:

  • paired matches paired_or_unpaired (asymmetric — not vice versa)
  • list:paired_or_unpaired matches both list and list:paired
  • Pattern: X:paired_or_unpaired matches X and X:paired

When can_match_type() returns true but types differ, the tool parameter code creates an appropriate adapter.

Persistence & Recovery

Adapters are ephemeral but their configuration is persisted on job input association tables for provenance:

DB columns (migration ec25b23b08e2):

  • job_to_input_dataset.adapter — JSON
  • job_to_input_dataset_collection.adapter — JSON
  • job_to_input_dataset_collection_element.adapter — JSON

Pydantic models (tool_util_models/parameters.py):

External API models (user-facing, reference by encoded ID):

  • AdaptedDataCollectionPromoteDatasetToCollectionRequest
  • AdaptedDataCollectionPromoteDatasetsToCollectionRequest
  • Discriminated union: AdaptedDataCollectionRequest on adapter_type

Internal models (DB-facing, reference by raw int ID):

  • AdaptedDataCollectionPromoteCollectionElementToCollectionRequestInternal
  • AdaptedDataCollectionPromoteDatasetToCollectionRequestInternal
  • AdaptedDataCollectionPromoteDatasetsToCollectionRequestInternal
  • Discriminated union: AdaptedDataCollectionRequestInternal on adapter_type

Recovery: recover_adapter() (adapters.py:288-297) reconstitutes the adapter from a persisted model + the wrapped object loaded from DB. Called during job input recovery in tools/parameters/basic.py.

Integration Points

Tool Parameter Resolution (tools/parameters/basic.py)

DataCollectionToolParameter.from_json() accepts CollectionAdapter objects directly or dicts with src: "CollectionAdapter". When receiving a dict, it validates against the adapter models, loads the referenced objects from the DB, and calls recover_adapter().

Tool Actions (tools/actions/__init__.py)

_collect_input_datasets() checks isinstance(value, CollectionAdapter) and handles adapters alongside real collections. Adapters flow through:

  1. Security checks via dataset_action_tuples
  2. State/extension summaries via dataset_states_and_extensions_summary
  3. Dataset listing via dataset_instances
  4. Job recording via to_adapter_model() → serialized to adapter JSON column

Subcollection Splitting (dataset_collections/subcollections.py)

When mapping over collections, _split_dataset_collection() creates PromoteCollectionElementToCollectionAdapter for flat collection elements that need to look like subcollections (line 44-45). The single_datasets pseudo-type triggers this.

Collection Builder (dataset_collections/builder.py)

replace_elements_in_collection() accepts Union[CollectionAdapter, DatasetCollection] as template, enabling format conversion on adapter-wrapped inputs.

Proposed Future Adapters (from #19359)

Issue #19359 remains open. Several additional adapters are proposed but not yet implemented:

  1. PromoteDatasetsToListAdapter — send a set of individual datasets to a list collection input. API: {"src": "CollectionAdapter", "adapter_type": "PromoteDatasetsToList", "adapting": [{"src": "hda", "id": ...}, ...]}

  2. WrapEachElementInAListAdapter — map a list over a list input or multi-data input. Would wrap each element of an existing list collection in its own singleton list, enabling mapping over list-typed inputs.

  3. Broader PromoteDatasetToCollection scope — currently limited to list and paired_or_unpaired. Could expand to support promoting a dataset into any flat collection type.

  4. Tool form enhancements — adapters are most powerful when paired with UI that lets users select “single dataset” or “forward/reverse pair” on a tool form that declares paired_or_unpaired, with Galaxy auto-creating the adapter.

The overall direction: absorb collection complexity out of tools and into Galaxy’s runtime + UI, so tools declare what they need and Galaxy figures out how to provide it.

Relationship to Other Collection Work

  • paired_or_unpaired type (types/paired_or_unpaired.py): The primary collection type driving adapter creation. Mixed paired/unpaired data in one collection.
  • record type (types/record.py): Heterogeneous tuples with named fields. Foundation for sample sheets. Records are a generalization of paired (paired = record with fields [{type: File, name: forward}, {type: File, name: reverse}]).
  • List Wizard UI (Collections/ListWizard.vue, PairedOrUnpairedListCollectionCreator.vue): Replaced the old PairedListCollectionCreator. Auto-detects pairing, builds list:paired_or_unpaired collections.
  • __SPLIT_PAIRED_AND_UNPAIRED__ tool (tools/split_paired_and_unpaired.xml): Splits list:paired_or_unpaired into homogeneous list + list:paired.
  • Collection semantics spec (types/collection_semantics.yml): Formal spec of mapping, reduction, and type matching rules.

Key Files

FileRole
lib/galaxy/model/dataset_collections/adapters.pyCore adapter classes + recovery
lib/galaxy/model/dataset_collections/subcollections.pyCreates adapters during collection splitting
lib/galaxy/model/dataset_collections/type_description.pycan_match_type() — when adapters are needed
lib/galaxy/tool_util_models/parameters.pyPydantic models for adapter serialization
lib/galaxy/tools/parameters/basic.pyAdapter resolution in tool parameter processing
lib/galaxy/tools/actions/__init__.pyAdapter handling in job recording
lib/galaxy/model/dataset_collections/builder.pyAdapter-aware collection building
lib/galaxy/model/dataset_collections/registry.pyCollection type plugin registration
lib/galaxy/model/dataset_collections/types/paired_or_unpaired.pyPrimary type driving adapter use

Unresolved Questions

  • Should can_match_type() logic move into the type registry/plugins rather than being hardcoded? (Comment in source: “can we push this to the type registry somehow?”)
  • How will adapter creation surface in the tool form UI? Currently adapters are created programmatically; richer UI could let users pick “single dataset” / “forward+reverse pair” on collection inputs.
  • Will WrapEachElementInAListAdapter require changes to implicit mapping logic or just tool parameter resolution?
  • How do adapters interact with workflow connections? Can the workflow editor infer adapter requirements from connected output types?

Incoming References (5)