PR 19377 - Collection Types and Wizard UI

The paired_or_unpaired and record collection types, plus collection adapters, enable flexible tool input matching.

Revised: 2026-04-28
Revision: 5
GitHub PR: #19377
Related Notes: Component - Dataset Collections, Component - Collections - Paired or Unpaired, Component - Auto Pairing, Component - Collection Adapters, Component - Collection Models, Component - Collection Tool Execution Semantics, PR 19305 - Implement Sample Sheets, PR 21828 - YAML Tool Hardening and Tool State

PR #19377 Research Summary

Title: Empower Users to Build More Kinds of Collections, More Intelligently
Author: John Chilton (jmchilton)
URL: https://github.com/galaxyproject/galaxy/pull/19377
Status: MERGED into dev (branch: fixed_length_collections)
Labels: area/UI-UX, kind/feature, area/API, area/dataset-collections, area/tool-framework, highlight
Merge commit in dev: c212434dc8
In current branch’s merge base: Yes; all PR changes are already in the yaml_tool_harden_1 working tree.

Summary

Massive feature PR (~6700 additions across 130+ files) introducing:

  1. paired_or_unpaired collection type — mixed paired/unpaired data in a single collection. Elements are either {forward, reverse} (paired) or {unpaired} (singleton). Tools declaring collection<paired_or_unpaired> also accept plain paired input. Lists of these (list:paired_or_unpaired) can match both list and list:paired.

  2. record collection type — CWL-style heterogeneous tuples with named, typed fields. Generalization of paired. Fields defined via FieldDict ({name, type, format?}). Database stores fields JSON on DatasetCollection.

  3. Collection Adapter framework (adapters.py) — wraps model objects to create ephemeral/pseudo collections for tool execution. Key adapters: PromoteCollectionElementToCollectionAdapter, PromoteDatasetToCollection, PromoteDatasetsToCollection. Adapter state serialized to adapter JSON column on job input association tables.

  4. List Wizard UI — replaces PairedListCollectionCreator (deleted, -1257 lines) with wizard-based ListWizard.vue + PairedOrUnpairedListCollectionCreator.vue. “Auto Build List” and “Advanced Build List” options in history dropdown. Auto-detects pairing.

  5. Rule-Based Import Activity — new standalone activity + wizard for seeding rule builder from remote files, pasted data, existing datasets, etc.

  6. Collection Semantics Documentation (collection_semantics.yml + generated collection_semantics.md) — formal specification of mapping, reduction, sub-collection mapping, and paired_or_unpaired semantics with labeled examples tied to test cases.

  7. __SPLIT_PAIRED_AND_UNPAIRED__ tool — new collection operation that splits list:paired_or_unpaired into homogeneous list + list:paired.

  8. Database migration (ec25b23b08e2): adds a fields column to dataset_collection and an adapter column to each of job_to_input_dataset, job_to_input_dataset_collection, and job_to_input_dataset_collection_element.
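
The element shapes from item 1 and the split from item 7 can be sketched in plain Python. This is only an illustration under assumptions: the dict layout and the split_paired_and_unpaired helper are hypothetical stand-ins, not Galaxy's model classes or the tool's actual implementation.

```python
from typing import Dict, Tuple

# Hypothetical stand-in: each outer element maps inner identifiers to dataset names.
# A paired element has "forward" and "reverse"; an unpaired one has only "unpaired".
Element = Dict[str, str]


def split_paired_and_unpaired(
    elements: Dict[str, Element],
) -> Tuple[Dict[str, str], Dict[str, Element]]:
    """Split a list:paired_or_unpaired-like mapping into (list, list:paired)."""
    unpaired: Dict[str, str] = {}
    paired: Dict[str, Element] = {}
    for identifier, element in elements.items():
        keys = set(element)
        if keys == {"forward", "reverse"}:
            paired[identifier] = element
        elif keys == {"unpaired"}:
            # Flatten the singleton so the result behaves like a plain list.
            unpaired[identifier] = element["unpaired"]
        else:
            raise ValueError(f"unexpected element shape for {identifier!r}: {sorted(keys)}")
    return unpaired, paired


samples = {
    "sample1": {"forward": "s1_R1.fastq", "reverse": "s1_R2.fastq"},
    "sample2": {"unpaired": "s2.fastq"},
}
flat_list, paired_list = split_paired_and_unpaired(samples)
```

Here flat_list holds only sample2 and paired_list holds only sample1, mirroring the homogeneous list + list:paired outputs described above.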

Key Architectural Decisions

  • Collection type matching is asymmetric: paired_or_unpaired consumes paired but not vice versa. can_match_type() in type_description.py implements this. list:paired_or_unpaired matches list and list:paired.
  • Adapters bridge type gaps at runtime: When a paired collection is passed to a paired_or_unpaired input, an adapter wraps it. Adapter JSON is persisted on job input associations for provenance but isn’t re-used during job evaluation.
  • single_datasets pseudo-type: Used as a subcollection type to allow flat collections (like paired_or_unpaired) to be split into individual DCE-level adapters.
  • fields parameter threaded everywhere: From API payload -> collections_util -> collections.py -> builder.py -> type plugins. "auto" value triggers field guessing from element identifiers.
  • Discriminated unions for adapter models in tool_util_models/parameters.py: AdaptedDataCollectionRequest and AdaptedDataCollectionRequestInternal use an adapter_type discriminator.
  • DatasetCollection.allow_implicit_mapping: Records return False — they don’t participate in implicit mapping.
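
The asymmetric matching rule in the first bullet can be sketched as a small predicate. This is a simplified illustration covering only the cases named in this note; can_match_type_sketch is a hypothetical name, and the real can_match_type() in type_description.py may handle more cases and differ in signature.

```python
def can_match_type_sketch(input_type: str, other_type: str) -> bool:
    """Return True if a collection of other_type could feed an input declaring input_type."""
    if input_type == other_type:
        return True
    if input_type == "paired_or_unpaired":
        # paired_or_unpaired consumes paired, but not the other way around.
        return other_type == "paired"
    suffix = ":paired_or_unpaired"
    if input_type.endswith(suffix):
        prefix = input_type[: -len(suffix)]
        # list:paired_or_unpaired accepts list:paired (all elements paired)
        # and list (every element treated as unpaired).
        return other_type in (prefix, f"{prefix}:paired")
    return False
```

The asymmetry shows up in the fact that swapping the arguments changes the answer: a paired input never accepts a paired_or_unpaired collection in this sketch.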

Files Changed (130 files)

New Backend Files

| File | Description |
| --- | --- |
| lib/galaxy/model/dataset_collections/adapters.py (+289) | CollectionAdapter hierarchy |
| lib/galaxy/model/dataset_collections/types/paired_or_unpaired.py (+43) | paired_or_unpaired type plugin |
| lib/galaxy/model/dataset_collections/types/record.py (+45) | record type plugin |
| lib/galaxy/model/dataset_collections/types/semantics.py (+240) | YAML->Markdown doc generator |
| lib/galaxy/model/dataset_collections/types/collection_semantics.yml (+563) | Formal collection semantics spec |
| lib/galaxy/tools/split_paired_and_unpaired.xml (+132) | Split paired/unpaired tool |
| lib/galaxy/model/migrations/.../ec25b23b08e2_...py (+46) | Alembic migration |
| doc/source/dev/collection_semantics.md (+762) | Generated docs |

New Frontend Files (selection)

| File | Description |
| --- | --- |
| client/src/components/Collections/ListWizard.vue (+279) | Main list wizard component |
| client/src/components/Collections/PairedOrUnpairedListCollectionCreator.vue (+798) | New paired/unpaired creator |
| client/src/components/Collections/BuildFileSetWizard.vue (+220) | Rule-based import wizard |
| client/src/components/Collections/common/AutoPairing.vue (+188) | Auto-pairing UI |
| client/src/components/Workflow/Editor/Forms/FormRecordFieldDefinitions.vue (+145) | Record field editor |
| client/src/components/Workflow/Editor/Forms/FormFieldType.vue (+60) | Field type selector |

Deleted Files

File
client/src/components/Collections/PairedListCollectionCreator.vue (-1257)
client/src/components/Collections/PairedListCollectionCreator.test.js (-147)
client/src/components/Upload/useUploadDatatypes.ts (-29)

Significantly Modified Backend Files

| File | +/- | Key Changes |
| --- | --- | --- |
| lib/galaxy/model/__init__.py | +32/-10 | fields on DatasetCollection, adapter on job input assocs, allow_implicit_mapping |
| lib/galaxy/model/dataset_collections/type_description.py | +39/-34 | can_match_type() rewrite for paired_or_unpaired, single_datasets, fields param |
| lib/galaxy/model/dataset_collections/builder.py | +24/-5 | fields threading, guess_fields() |
| lib/galaxy/model/dataset_collections/registry.py | +10/-3 | Register record + paired_or_unpaired plugins |
| lib/galaxy/model/dataset_collections/subcollections.py | +16/-1 | PromoteCollectionElementToCollectionAdapter for flat collections |
| lib/galaxy/tools/parameters/basic.py | +80/-20 | CollectionAdapter in type unions, src_id_to_item_collection(), adapter recovery |
| lib/galaxy/tools/actions/__init__.py | +45/-5 | Adapter handling in _record_inputs() and collection security checks |
| lib/galaxy/tools/__init__.py | +60/-3 | SplitPairedAndUnpairedTool, ExtractDatasetCollectionTool supports new types |
| lib/galaxy/tool_util_models/parameters.py | +74/-0 | Full adapter model hierarchy with discriminated unions |
| lib/galaxy/tool_util_models/tool_source.py | +18/-3 | FieldDict, CwlType, nullable collection_type |
| lib/galaxy/managers/collections.py | +20/-6 | fields param in create/create_dataset_collection, indices in rule data |
| lib/galaxy/tool_util/parser/interface.py | +20/-3 | fields on TestCollectionDef, nullable collection_type |
| lib/galaxy/tool_util/parser/output_objects.py | +10/-0 | fields on ToolOutputCollectionStructure |
| lib/galaxy/schema/schema.py | +6/-0 | fields_ on CreateNewCollectionPayload |
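
To make the fields threading concrete, here is a minimal sketch of the FieldDict shape ({name, type, format?}) and of the "auto" value that triggers guessing from element identifiers. The names FieldDictSketch and guess_fields_sketch are hypothetical, and the choice of "File" as the guessed type is an assumption for illustration; the note does not spell out what builder.py's guess_fields() returns.

```python
from typing import List, TypedDict, Union


class _FieldRequired(TypedDict):
    name: str
    type: str


class FieldDictSketch(_FieldRequired, total=False):
    # format is optional, matching the "format?" in the note.
    format: str


def guess_fields_sketch(
    fields: Union[str, List[FieldDictSketch]],
    element_identifiers: List[str],
) -> List[FieldDictSketch]:
    """Return explicit fields, or derive one field per element identifier for "auto"."""
    if fields == "auto":
        # Assumption: every guessed field is typed as a file.
        return [{"name": identifier, "type": "File"} for identifier in element_identifiers]
    if isinstance(fields, str):
        raise ValueError(f"unsupported fields value: {fields!r}")
    return list(fields)


record_fields = guess_fields_sketch("auto", ["parent", "child"])
```

Passing an explicit list instead of "auto" returns that list unchanged, which is the case where the caller already knows the record's named, typed fields.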

Cross-References to Current Branch (yaml_tool_harden_1)

Files Modified by Both PR and Current Branch (40 files overlap)

The current branch focuses on structured tool state, YAML tool support, and collection runtime hardening. Key overlapping areas:

| File | PR Purpose | Current Branch Purpose |
| --- | --- | --- |
| lib/galaxy/tool_util_models/parameters.py | Added adapter models (discriminated unions) | Added collection runtime models with discriminated unions |
| lib/galaxy/tool_util_models/tool_source.py | Added FieldDict, CwlType | Modified for YAML tool source changes |
| lib/galaxy/model/dataset_collections/builder.py | Added fields support | Modified for collection runtime |
| lib/galaxy/model/dataset_collections/subcollections.py | Added adapter for flat collections | Modified for collection runtime |
| lib/galaxy/tools/parameters/basic.py | Added CollectionAdapter support | Changes for structured tool state |
| lib/galaxy/tools/actions/__init__.py | Adapter recording in _record_inputs | Structured tool state changes |
| lib/galaxy/model/__init__.py | fields/adapter columns | Various model changes |
| lib/galaxy/managers/collections.py | fields param threading | Collection-related changes |

No Conflicts

Since PR #19377 is already merged into dev and the current branch is based on a commit after that merge (c212434dc8 is an ancestor of the merge base c89c5f0644), there are no merge conflicts. The current branch builds on top of all PR #19377 changes.

Architectural Alignment

The current branch’s work on collection runtime models with discriminated unions (commits 200d5b8fbc, cd04873951) directly extends the patterns established by PR #19377:

  • PR #19377 introduced discriminated unions for adapter models (AdaptedDataCollectionRequestInternal discriminated on adapter_type)
  • Current branch extends discriminated unions to collection runtime models (e.g., DataCollectionPairedRuntime, dynamic model factory for recursive collection types)
  • PR #19377’s FieldDict type from tool_source.py is available for the current branch’s YAML tool work
  • The collection_semantics.yml spec provides a reference for what test cases should exist for the new collection types
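
The discriminated-union pattern referred to above can be sketched without any dependencies: a discriminator key selects which model a payload is parsed into. This is a stand-in for the pattern only. The two request classes, their fields, and the discriminator values are hypothetical and do not reproduce the actual models in parameters.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, Union


@dataclass
class PromoteDatasetRequestSketch:
    adapter_type: str
    dataset_id: int


@dataclass
class PromoteElementRequestSketch:
    adapter_type: str
    element_id: int


AdaptedRequestSketch = Union[PromoteDatasetRequestSketch, PromoteElementRequestSketch]

# The discriminator value selects which model handles the payload.
_MODELS_BY_ADAPTER_TYPE: Dict[str, Type[Any]] = {
    "promote_dataset": PromoteDatasetRequestSketch,
    "promote_element": PromoteElementRequestSketch,
}


def parse_adapted_request(payload: Dict[str, Any]) -> AdaptedRequestSketch:
    """Dispatch on adapter_type and build the matching request object."""
    adapter_type = payload.get("adapter_type")
    model = _MODELS_BY_ADAPTER_TYPE.get(adapter_type)
    if model is None:
        raise ValueError(f"unknown adapter_type: {adapter_type!r}")
    return model(**payload)


request = parse_adapted_request({"adapter_type": "promote_dataset", "dataset_id": 42})
```

The benefit of the pattern is that an unknown or missing discriminator fails immediately with a clear error, instead of the payload being tried against each model in turn.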

Key Types/Interfaces Introduced by PR Available in Current Branch

  • CollectionAdapter and subclasses (adapters.py)
  • FieldDict, CwlType, FieldType (tool_source.py)
  • AdaptedDataCollectionRequest* models (parameters.py)
  • PairedOrUnpairedDatasetCollectionType and RecordDatasetCollectionType (type plugins)
  • DatasetCollection.fields and DatasetCollection.allow_implicit_mapping
  • can_match_type() with paired_or_unpaired semantics
  • SplitPairedAndUnpairedTool tool type
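
As a rough illustration of what a CollectionAdapter does, the sketch below wraps a single dataset so it presents as an ephemeral paired_or_unpaired collection and serializes its state to JSON for provenance. Everything here is assumed for illustration: DatasetRef, PromoteDatasetSketch, the "unpaired" element layout, and the JSON keys other than adapter_type are hypothetical, and the adapter_type value simply mirrors a class name from this note.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in for a persisted dataset."""

    id: int
    name: str


@dataclass(frozen=True)
class PromoteDatasetSketch:
    """Presents one dataset as an ephemeral paired_or_unpaired collection."""

    dataset: DatasetRef
    collection_type: str = "paired_or_unpaired"

    @property
    def elements(self) -> Dict[str, DatasetRef]:
        # A lone dataset appears as the single "unpaired" element.
        return {"unpaired": self.dataset}

    def to_adapter_json(self) -> str:
        # Assumed serialization shape; only the adapter_type key is named in the note.
        return json.dumps(
            {
                "adapter_type": "PromoteDatasetToCollection",
                "collection_type": self.collection_type,
                "dataset_id": self.dataset.id,
            },
            sort_keys=True,
        )


adapter = PromoteDatasetSketch(DatasetRef(id=7, name="reads.fastq"))
adapter_json = adapter.to_adapter_json()
```

Nothing is written to a collection table in this sketch; only the JSON string would be stored, which matches the note's description of adapter state being persisted on job input associations while the wrapped collection itself stays ephemeral.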