We’re working on /Users/jxc755/projects/repositories/galaxy-brain/vault/projects/history_markdown/EXTRACT_BY_ID_PLAN.md. The final state from the other mode is:
⏺
After studying the model further, I want to pause here before committing to the implicit-map design. Step 2 needs careful navigation of
DatasetCollectionElement
↔ parent
HDCA
(no direct back-link in the model — requires reverse query), and the implicit-map cases need real-DB validation. Pushing further without testing on real data is risky.
Final state for this session
Branch
history_notebook_extract
, 6 new commits:
1787213de2 extract: support hdca_ids inputs in extract_steps_by_ids fd9f63fd71 extract: implement extract_steps_by_ids skeleton (HDA inputs) a0ecf5312e extract: drop eager sa_session capture in BaseWorkflowSummary f0ded11559 extract: scaffold ID-based workflow extraction endpoint df4339e966 extract: lift WorkflowSummary helpers to BaseWorkflowSummary
Plan progress:
- ✅ Step 0:
BaseWorkflowSummary
lift (helpers + warnings only — narrow lift per plan §3a) - ✅ Step 0b: Schema
WorkflowExtractionByIdsPayload
, stubs,
POST /api/workflow/extract
registered,
WorkflowsService.extract_by_ids
- ✅ Step 1 (partially): HDA + HDCA input handling in
extract_steps_by_ids
, 6 unit tests in
test/unit/workflows/test_extract_by_ids.py
(all pass; 97 total workflow unit tests green) - ❌ Step 1 remaining: tool-job branch (joining via
id_to_output_pair
) - ⏸ Steps 2–9: pending
Key open questions for next session:
- Walk
DatasetCollectionElement
→ parent
HDCA
: model has no direct back-ref. Plan §2 says try without it first; fall back to
JobToInputDatasetCollectionAssociation
. Worth deciding before writing the cleanup walker. 2. Implicit-map representative-job logic in ID path: jobs share
ImplicitCollectionJobs
. New payload may pass any participating job_id — need to detect and look up output HDCA. Plan §4. 3. The plan calls unit tests #1, #3, #4 “unit” but #2 (collection-through-tool), #5 (implicit-map), #6 (copy) are most natural at API/integration level given their DB dependencies. Worth reconciling before writing tests #2/#5 with brittle mocks.
We’re working on /Users/jxc755/projects/repositories/galaxy-brain/vault/projects/history_markdown/EXTRACT_BY_ID_PLAN.md. The final state from the other agent is:
#6c
—
test_extract_dce_as_data_param_flows_through_as_leaf_hda
✅ DCE → leaf HDA, no HDCA in workflow -
#14
—
test_extract_with_cached_job_cross_history
✅ cached B-side job extracts cleanly -
#15
—
test_roundtrip_basic_by_ids
✅ extract → invoke on fresh history, output verified
11/11
by-ids API tests passing.
On #15b (subworkflow roundtrip):
I held off — couldn’t find an existing subworkflow extraction test to port, and subworkflow-ness lives on the
invocation
, not on resulting jobs/datasets, so extraction sees a flat post-run history regardless. I think this can be punted to a follow-up unless you specifically want me to construct the scenario from scratch.
Remaining from plan:
- Step 8 — Vue form: switch to
hda_ids
/
hdca_ids
payload, swap to
extractWorkflowByIds
, update
WorkflowExtractionForm.test.ts
- Step 9 — Selenium roundtrip + back-compat sanity (existing HID suite already green from earlier full run) - #15b subworkflow if wanted
I think dropping subworkflow stuff makes sense for now (update the plan to reflect this please). After that can you continue working on this project.