CWL_REPLACE_EPHEMERAL_COLLECTIONS_PLAN_V2

Revised Plan: Replace EphemeralCollections with CollectionAdapter Merge Subclasses (V2)

Executive Summary

Attempt 1 proved that Steps 1-4 (adapter classes, Pydantic models, recovery, job recording) work. The failures were in the collection iteration pipeline: on_text HID assertions and SubWorkflowModule duck-typing on HDCA properties.

Revised strategy — two-tier approach:

  1. Adapters flow through the tool parameter pipeline (serialization, recording, recovery) — already works.
  2. Fix iteration pipeline to tolerate adapter-shaped objects; materialize only where DB IDs are required (CWL scatter, subworkflow invocation).

Complete Inventory of “Collection-ness” Assumptions

Category A: HID assumptions (fix consumer — don’t need real HIDs)

Site | File:Line | Current Code
A1 | execute.py:462 | assert item.hid is not None (guarded by uses_non_persisted_collections)
A2 | actions/__init__.py:989 | assert dataset_collection.hid (NOT guarded)
A3 | matching.py:25 | not getattr(hdca, "hid", None) (sets guard flag)

Fix: A1 is already guarded. A2: replace the assert with a conditional (if dataset_collection.hid:) so inputs without a HID are skipped. A3 is already correct in the WIP.
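A minimal sketch of the HID handling described in Category A, using hypothetical stand-in classes rather than the real Galaxy models. StubCollection, uses_non_persisted, and hids_for_on_text are illustrative names; the actual body of _get_on_text is not shown in this plan, so only the assert-to-conditional change is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubCollection:
    """Hypothetical stand-in for an HDCA (has a HID) or an adapter (no HID)."""
    name: str
    hid: Optional[int] = None


def uses_non_persisted(collections):
    # Mirrors the A3 check: any input without a truthy HID flips the guard flag.
    return any(not getattr(c, "hid", None) for c in collections)


def hids_for_on_text(collections):
    # A2 fix: skip inputs without a HID instead of asserting on them.
    hids = []
    for dataset_collection in collections:
        if dataset_collection.hid:
            hids.append(dataset_collection.hid)
    return hids


inputs = [StubCollection("persisted list", hid=7), StubCollection("merged adapter")]
print(uses_non_persisted(inputs))  # True
print(hids_for_on_text(inputs))    # [7]
```

Since on_text only affects the cosmetic job label, dropping HID-less inputs from it should not change execution behaviour.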

Category B: .collection + allow_implicit_mapping (adapter implements)

Site | File:Line | Property
B1 | modules.py:881 | .collection.allow_implicit_mapping
B2 | matching.py:42 | .collection via get_child_collection
B3 | matching.py:93 | .collection in slice_collections_crossproduct
B4 | matching.py:118 | .dataset_action_tuples
B5 | structure.py:233 | .collection in get_structure
B6 | structure.py:104 | .collection in walk_collections

Adapter already has .collection (returns self), allow_implicit_mapping = True, dataset_action_tuples. These work.

Category C: Structure/walk interface (add 2 properties to adapter)

Site | File:Line | Property | Fix
C1 | structure.py:100 | .column_definitions | Add column_definitions = None to CollectionAdapter base
C2 | structure.py:107,110,122 | collection[index] | Add __getitem__ to CollectionAdapter base
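The adapter surface from Categories B and C can be sketched as follows. This is an illustration under assumptions, not the real CollectionAdapter: SketchCollectionAdapter, its elements list, and the toy walk consumer are hypothetical, and positional indexing is assumed to be all that collection[index] needs.

```python
class SketchCollectionAdapter:
    """Hypothetical base holding only the members named in Categories B and C."""

    allow_implicit_mapping = True
    column_definitions = None  # C1: read during the structure walk

    def __init__(self, elements):
        self._elements = list(elements)

    @property
    def collection(self):
        # Category B: consumers written for HDCAs go through `.collection`.
        return self

    @property
    def elements(self):
        return self._elements

    def __getitem__(self, index):
        # C2: positional access, assumed sufficient for collection[index].
        return self._elements[index]


def walk(hdca_like):
    """Toy consumer touching the same attributes as the sites listed above."""
    collection = hdca_like.collection
    columns = collection.column_definitions
    items = [collection[i] for i in range(len(collection.elements))]
    return columns, items


adapter = SketchCollectionAdapter(["forward.fq", "reverse.fq"])
print(walk(adapter))  # (None, ['forward.fq', 'reverse.fq'])
```

Returning self from .collection keeps HDCA-shaped call sites unchanged, which is why only the two missing members need to be added.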

Category D: SubWorkflowModule (accept materialization)

Site | File:Line | Approach
D1 | modules.py:877-880 | Materialize in get_collections_to_match; subworkflow invocation needs real DB objects
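One way to picture the materialize-at-the-boundary idea in D1 is the sketch below. Everything in it is assumed for illustration: StubHDCA, StubMergeAdapter, the materialize hook, and the in-memory id counter stand in for real persistence, which this plan does not spell out.

```python
import itertools
from dataclasses import dataclass, field

_next_id = itertools.count(1)  # stand-in for DB-assigned primary keys


@dataclass
class StubHDCA:
    """Hypothetical persisted collection; `id` stands in for a DB primary key."""
    elements: list
    id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class StubMergeAdapter:
    """Hypothetical adapter: holds elements but has no DB identity."""
    elements: list

    def materialize(self):
        # Assumed hook; the real persistence call is not shown in this plan.
        return StubHDCA(list(self.elements))


def collections_to_match(inputs):
    """Swap adapters for persisted objects before mapping over a subworkflow."""
    return {
        name: value.materialize() if isinstance(value, StubMergeAdapter) else value
        for name, value in inputs.items()
    }


existing = StubHDCA(["a.txt"])
resolved = collections_to_match({"x": existing, "y": StubMergeAdapter(["b.txt", "c.txt"])})
print(resolved["x"] is existing, isinstance(resolved["y"], StubHDCA))  # True True
```

Already-persisted inputs pass through untouched, so the cost of materialization is paid only where an adapter actually reaches the subworkflow boundary.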

Category E: Invocation recording (already handled in WIP)

Site | File:Line | Approach
E1 | model/__init__.py:10294 | isinstance(CollectionAdapter): return guard
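The early-return guard in E1 might look roughly like this. StubInvocation, add_output, and StubCollectionAdapter are hypothetical names; the sketch only shows the shape of the guard, not the real recording method.

```python
class StubCollectionAdapter:
    """Hypothetical non-persisted collection."""


class StubInvocation:
    """Hypothetical invocation that records persisted outputs only."""

    def __init__(self):
        self.recorded = []

    def add_output(self, label, value):
        # Guard: an adapter has no DB row to associate, so recording is skipped.
        if isinstance(value, StubCollectionAdapter):
            return
        self.recorded.append((label, value))


invocation = StubInvocation()
invocation.add_output("merged", StubCollectionAdapter())
invocation.add_output("result", "hdca-42")
print(invocation.recorded)  # [('result', 'hdca-42')]
```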

Category F: Tag propagation (already handled in WIP)

Site | File:Line | Approach
F1 | managers/collections.py:447 | Iterates v.dataset_instances for adapter
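As a rough illustration of F1, tag propagation can branch on the input type and read tags from the adapter's member datasets. The stub classes and collect_input_tags are hypothetical; only the dataset_instances attribute name comes from the table above, and tags are modelled as plain string sets.

```python
class StubDataset:
    def __init__(self, tags):
        self.tags = set(tags)


class StubHDCA:
    """Hypothetical persisted collection carrying its own tags."""

    def __init__(self, tags):
        self.tags = set(tags)


class StubAdapter:
    """Hypothetical adapter exposing dataset_instances, as named in F1."""

    def __init__(self, datasets):
        self.dataset_instances = list(datasets)


def collect_input_tags(inputs):
    """Union the tags of all inputs, reading adapter tags from member datasets."""
    tags = set()
    for v in inputs.values():
        if isinstance(v, StubAdapter):
            for dataset in v.dataset_instances:
                tags |= dataset.tags
        else:
            tags |= v.tags
    return tags


inputs = {
    "persisted": StubHDCA(["group:control"]),
    "merged": StubAdapter([StubDataset(["name:s1"]), StubDataset(["name:s2"])]),
}
print(sorted(collect_input_tags(inputs)))  # ['group:control', 'name:s1', 'name:s2']
```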

Category G: CWL-specific (already handled in WIP)

Site | Approach
build_cwl_input_dict | Materialize; scatter needs HDCA IDs
scatter subcollection wrapping | Direct HDCA (not adapter)
subworkflow_progress multi-conn | Materialize

What’s Needed Beyond WIP (2eeef24cfe)

The WIP has 90% of the work. Remaining delta:

File | Change | ~Lines
adapters.py | Add column_definitions = None and __getitem__ to CollectionAdapter base | +6
actions/__init__.py | Fix _get_on_text HID assert → conditional skip | ~1

That’s it. ~7 lines of code to fix both blockers.


Development Order (Red-to-Green)

Phase 1: Fix on_text HID tolerance

  1. actions/__init__.py:_get_on_text(): change assert dataset_collection.hid → if dataset_collection.hid:
  2. Test: wf_wc_scatter_multiple_nested (Path C — hit this blocker first)

Phase 2: Fix CollectionAdapter interface for get_structure/walk_collections

  1. Add column_definitions = None property to CollectionAdapter base
  2. Add __getitem__ to CollectionAdapter base
  3. Test: scatter_multi_input_embedded_subworkflow (subworkflow test that hit blocker 2)

Phase 3: Full regression


Decision Matrix

Consumer | Approach | Rationale
on_text (execute.py, actions) | Fix consumer | 1-2 lines, cosmetic output
matching.py detection | Already works | not getattr(hdca, "hid", None)
structure.py get_structure/walk | Adapter implements | +4 lines: column_definitions, __getitem__
SubWorkflowModule mapping | Materialize | Subworkflow needs real DB objects
CWL scatter | Materialize | Scatter loads HDCAs by DB ID
Tool parameter pipeline | Adapter flows through | Serialization/recovery works cleanly
Tag propagation | Already handled | WIP iterates v.dataset_instances
Invocation recording | Already handled | Guard skips adapters

Unresolved Questions