Tool-Source Identity Tracking — Implementation Plan

Date: 2026-05-18 (rev 3 — dynamic_tool linkage pulled in-scope as load-bearing PHASE 1 (was deferred “PHASE 2b”); core/consumer branch split off origin/dev; stale a994fe6a99 pin flagged) Branch (split — see BRANCH_SPLIT): the persistence+celery core (ToolSource identity columns incl. the dynamic_tool FK, write site, celery slim) touches only infra already on origin/dev → cut tool-source-identity off origin/dev and build the core there; land/rebase it first. The extraction consumer (_tool_from_request guid + dynamic-tool wiring, step dynamic_tool_id) + PHASE 3 edit code that exists only in the #7003 stack — those commits stay layered on extract_issue_followups (= origin/dev + 12, 0 behind). Stale-hash note: the rev-2 base pin a994fe6a99 is no longer in extract_issue_followups (rebased; #7003 head is now 5a40094b8a). All “a994fe6a99” references below mean the #7003 stack as it currently sits on the branch — reference #7003 by content, not by that dead hash. Tracking: no single GitHub issue — a model-correctness + provenance-fidelity fix underpinning the tool-request extraction family. Closes the guid=None toolshed-id edge and the symmetric dynamic/user-tool dynamic_tool=None edge documented in #7003 and the #21788 commit (05e4ac7c54). Related:

MAP_OVER_EMPTY_EXTRACTION_TOOL_REQUEST_PLAN — #21788, introduced _tool_from_request(guid=None) (the edge this closes)

QUEUED_EXECUTION_EXTRACTION_TOOL_REQUEST_PLAN — #7003, the reroute that widened the edge to completed/grey tool-request ICJs

EXTRACT_TOOL_REQUEST_STATE_PLAN — structured-state gate; established blob-first reconstruction as the deliberate de-toolbox direction

vault/research/Workflow Extraction Issues.md

At a glance


Problem	The persisted `ToolSource` DB model stores only `source` + `source_class` + `hash`. The resolved tool identity (`tool_id`/guid, `tool_version`, and the `DynamicTool` link for user/dynamic tools) lives only inside the serialized blob, on a `Job`, or on the transient `QueueJobs` message. So `_tool_from_request` rebuilds with `guid=None` and `dynamic_tool=None` → a toolshed tool reconstructs with its short `id`, not its namespaced guid, and a user/dynamic tool reconstructs with no `DynamicTool` link (so the extracted step has no `dynamic_tool_id`/`tool_uuid`) → an extracted `WorkflowStep` may not resolve to the real tool. Same under-capture, two tool classes.
Key insight	Identity is intrinsic to the snapshot + the request’s resolved tool, not recoverable from the request inputs. `ToolSource` is the content-addressable tool snapshot. `tool_id`/`tool_version` belong there; the `DynamicTool` reference is the same identity gap for the dynamic/user-tool class and belongs on the same row (a nullable FK) — not on `ToolRequest` (stays “which source + what inputs”), and not solved by toolbox-first (reintroduces the install/upgrade/removal drift this initiative deliberately moves away from).
This plan delivers	PHASE 1 (load-bearing, small): `ToolSource.{tool_id, tool_version, dynamic_tool_id}` (FK→`dynamic_tool`) columns, populated at the single write site from the already-resolved tool; `_tool_from_request` passes `guid=tool_source.tool_id`, attaches the linked `DynamicTool`, and the step-builder sets `step.dynamic_tool_id`. Closes both the toolshed-id edge and the dynamic/user-tool edge family-wide (#21788 direct `tool_request_ids` path and #7003 ICJ reroute) in one place. PHASE 2 (cleanup): celery reads `source`/`source_class`/`tool_id`/`dynamic_tool_id` off the model via the `tool_request_id` it already receives → slim the `QueueJobs` message (now incl. dropping `dynamic_tool_id` — enabled by the new FK). PHASE 3: three review-surfaced tidy-ups in the messy #21788/#7003 extract surface.
This plan does NOT	Add toolbox-first resolution (explicitly rejected). Persist `tool_dir` (runtime-only, location-coupled, no in-scope reader — stays ephemeral on the message). Fix the classic job-backed dynamic-tool extraction gap (`extract.py` never sets `step.dynamic_tool_id` for `job is not None` either — pre-existing, separate; see Unresolved). Backfill legacy rows. Change runtime execution semantics.
Risk	PHASE 1 low — additive nullable columns (incl. one nullable FK), one write site, two small extraction changes, unit-pinnable. PHASE 2 low-medium — `JobSubmitter` already loads the `ToolRequest`, lru-cache key byte-identical; main care is avoiding a double-fetch and that the `dynamic_tool_id` read-off-model is byte-equivalent to today’s message value. PHASE 3 low — comment/guard/test.

Why this exists & why bundle it here

Bundling bar: does this improve the (admittedly messy) #21788/#7003 work? Yes —

The guid=None toolshed-id limitation originated in committed #21788 (_tool_from_request, 05e4ac7c54) and was widened by #7003 (the reroute now routes completed/grey tool-request ICJs — previously job→guid via _tool_for_job — through the blob path). The same reroute widened a parallel limitation: a jobless dynamic/user-tool request rebuilds with dynamic_tool=None (the message-only link is absent in the extraction path), so the extracted step has no dynamic_tool_id/tool_uuid. PHASE 1 closes both for the whole family at the correct layer, retroactively fixing #21788’s direct path too.
PHASE 2 makes ToolSource the source of truth its consumers already implicitly need: celery threads raw_tool_source/tool_source_class/tool_id and dynamic_tool_id through a parallel DTO because the model under-captures. JobSubmitter.queue_jobs already re-loads the ToolRequest by id (managers/jobs.py:2239, :2303-2308) — so reading the blob+identity+dynamic_tool off the model is a no-new-coupling cleanup. Once the FK exists, dropping QueueJobs.dynamic_tool_id is no longer a separate model change — it falls out of the same PHASE 2 read-off-model (this is what rev-2 deferred as “PHASE 2b”; it is no longer deferred).
PHASE 3 clears three correctness/clarity asterisks the review found in the same extract.py/validator surface.

Retracted en route (recorded so it is not re-litigated): the earlier finding that reconstruction “fails for macros / toolshed-installed tools” was wrong. Macros are expanded at toolbox-load before XmlToolSource exists; to_string() serializes the post-expansion tree. Celery queue_jobs runs every async tool-request job by reconstructing from the same persisted blob — if it failed for those tools the async API could not execute them. The real residuals are the id-string namespacing and the dynamic/user-tool DynamicTool link below (same under-capture, two tool classes); tool_dir-dependent runtime concerns are genuinely deferred and not touched by parameter-model extraction.

Verified facts (code-traced 2026-05-17, subagent-confirmed)

Bug mechanism. Tool.parse (lib/galaxy/tools/__init__.py:1330-1333): if guid is None: self.id = self.old_id (= tool_source.parse_id() = root.get("id"), the short id, xml.py:193-194) else: self.id = guid. _tool_from_request (extract.py:612-628) calls create_tool_from_representation(..., guid=None) (:627) → reconstructed .id is the short id. A toolshed tool’s executed Job.tool_id / live toolbox id is the namespaced guid. Mismatch → WorkflowStep.tool_id may not resolve.
tool_version is recovered from the blob (XmlToolSource.parse_version = root.get("version"), xml.py:190-191). So tool_version on the model is identity/queryability only — not a reconstruction-correctness fix. Only tool_id/guid is load-bearing for the bug.
Model under-captures. DB ToolSource (model/__init__.py:1402-1408): id, hash, source, source_class only. ToolRequest (:1411-1428): request inputs-only (services/jobs.py:272); no tool identity anywhere on the request. For a jobless request, identity is recorded nowhere today (confirmed — not a missed abstraction).
Identity is free at the single write site. services/jobs.py:create() (:246-300) holds the resolved tool and builds ToolSourceModel(source=tool.tool_source.to_string(), source_class=…, hash="TODO") at :267-271. It already passes tool.id (:285), tool.tool_dir (:283), tool.dynamic_tool.id (:299) into the transient celery DTO only.
Column-shape precedent = DynamicTool, not Job. DynamicTool (model:1460-1475) already models this exact shape — tool_id, tool_version, tool_format, tool_path, tool_directory, all Unicode(255); ToolSource.hash is itself Unicode(255) (:1406). (Job.tool_id is String(255), Job.tool_version is TEXT default "1.0.0" — a different shape; do not cite Job for DDL.) No conflict: DynamicTool is the user-tool registry, ToolSource the per-request snapshot — adding identity to ToolSource does not duplicate DynamicTool’s role.
uuid is via the dynamic_tool FK, never a string column (managers/jobs.py:2060-2063; WorkflowStep.tool_uuid is a property = self.dynamic_tool and self.dynamic_tool.uuid, model:9123-9124, and WorkflowStep already carries dynamic_tool_id FK :9056). So ToolSource gets a nullable dynamic_tool_id FK (mirroring WorkflowStep/Job), not a tool_uuid string column.
Celery already has the link to read off the model. QueueJobs (schema/tasks.py:188-199) already carries tool_request_id (:190). request.tool_source is consumed only in celery/tasks.py:queue_jobs:448-455; JobSubmitter.queue_jobs never reads it but already loads the ToolRequest via _tool_request(request.tool_request_id) (managers/jobs.py:2239, :2303-2308). lru-cache key stays byte-identical (persisted source == today’s raw_tool_source = tool.tool_source.to_string()).
dynamic_tool_id is the same under-capture, not a separate concern (rev-3 reclassification). Consumed at managers/jobs.py:2241-2242 (tool.dynamic_tool = sa_session.get(DynamicTool, request.dynamic_tool_id)); set on the message at services/jobs.py:299 (tool.dynamic_tool.id if tool.dynamic_tool else None). create_tool_from_representation (tools/__init__.py:471-479) does not set tool.dynamic_tool — only the celery path patches it back from the message. The extraction path (_tool_from_request) has no message, so a jobless dynamic/user-tool request rebuilds with dynamic_tool=None, and the step-builder (extract.py:843-851) sets only step.tool_id/tool_version, never step.dynamic_tool_id → an extracted user/dynamic tool step cannot resolve via tool_uuid. This is load-bearing and exactly symmetric to the guid=None toolshed edge — same model under-capture, other tool class. An async tool request can be for a dynamic tool (services/jobs.py:247 ToolRunReference(... tool_uuid ...)), so the scenario is reachable, not theoretical. ⇒ add the dynamic_tool_id FK to ToolSource now, in PHASE 1 (rev-2’s “PHASE 2b deferral” was a misclassification of a PHASE 1 fidelity fix as celery-message cleanup).
tool.tool_dir is an absolute install path (tools/__init__.py:1028-1032: os.path.dirname(realpath(config_file)); None for dynamic/user tools, config_file=None). Used only at runtime (<command> exec :1425-1426, <code file> :1585-1586, <options from_file> :1639-1657) — never by parameter-model extraction. The only similar existing column, DynamicTool.tool_directory, is a user-supplied CWL path, not an install-tree path — there is no precedent for persisting install absolute paths in a DB column.
Migration reality. 1d1d7bf6ac02_tool_request_implicit_outputs.py is a create_table migration — not an add-column precedent. Additive columns use add_column/drop_column (migrations/util.py:308,312). versions_gxy has ~15 unmerged heads (periodic merge-migrations, e.g. 98621a25ab75_merge_migration_heads.py); a hand-picked down_revision will be wrong. Generate via sh manage_db.sh revision (autogenerate resolves the head chain / flags merge-heads).

Settled decisions

IDENTITY_ON_TOOLSOURCE. Add tool_id, tool_version, and dynamic_tool_id (nullable FK→dynamic_tool) to ToolSource, not ToolRequest. Intrinsic to the snapshot/resolved tool, shared across requests, fixes celery’s under-capture too. Column DDL: tool_id/tool_version Mapped[Optional[str]] = mapped_column(Unicode(255)) (index tool_id), anchored on DynamicTool/ToolSource.hash; dynamic_tool_id: Mapped[Optional[int]] = mapped_column(ForeignKey("dynamic_tool.id"), index=True) + dynamic_tool: Mapped[Optional["DynamicTool"]] = relationship() — anchored on WorkflowStep.dynamic_tool_id (model:9056) / Job.dynamic_tool_id, the established FK shape (note: name the FK constraint per the local convention seen on ToolRequestImplicitCollectionAssociation, :1435 name="fk_...", if autogenerate doesn’t). Rejected: ToolRequest.{tool_id,tool_version,tool_uuid} (denormalizes per-request, leaves ToolSource under-specified for celery); tool_uuid string column (uuid is reached via the FK — fact 6).
TOOL_DIR_NOT_PERSISTED (was Open → B, resolved; reversed an earlier A-lean on subagent evidence). Do not add ToolSource.tool_dir. It is runtime-only, install-location-coupled, read by nothing in scope (extraction passes tool_dir=None; only celery needs it, request→queue window). No real precedent for persisting install absolute paths (DynamicTool.tool_directory is user-CWL, not install-tree). tool_dir stays an ephemeral field on the (slimmed) QueueJobs message. The PHASE 2 collapse is therefore partial by design — identity/blob/dynamic_tool → model, tool_dir → message. “Single source of truth” applies to identity, not runtime location.
PHASE_2_UNSPLIT (rev-3 — supersedes rev-2 PHASE_2_SPLIT). The 2a/2b split existed only because dynamic_tool had no persisted home; PHASE 1 now adds the FK, so there is one PHASE 2: celery/tasks.py:queue_jobs reads source/source_class/tool_id/dynamic_tool_id from the persisted ToolSource via the loaded ToolRequest; tool_dir stays on the message; pass the loaded tool_request through to JobSubmitter.queue_jobs instead of re-fetching at managers/jobs.py:2239 (avoid double-fetch). Slim QueueJobs.tool_source to just tool_dir; drop QueueJobs.dynamic_tool_id (now read off tool_source.dynamic_tool_id); the managers/jobs.py:2241-2242 consume site reads it from the loaded ToolRequest.tool_source instead of request.dynamic_tool_id. Verify the off-model dynamic_tool_id is byte-equivalent to today’s message value for the same request (same resolved tool at write time).
IDENTITY_NOT_UUID (was MIRROR_JOB_NOT_UUID). No tool_uuid string column; the dynamic-tool link is a real FK on ToolSource mirroring WorkflowStep/Job (fact 6) — added in PHASE 1, not deferred.
NO_TOOLBOX_FIRST. Do not prefer the live toolbox. Blob-first reconstruction is the feature; the fix is feeding it the correct guid and the persisted DynamicTool, not bypassing it.
NO_BACKFILL. New columns nullable. Legacy rows (NULL tool_id/dynamic_tool_id) → guid=None/dynamic_tool=None → today’s behavior (short id, built-in-safe; legacy dynamic-tool requests keep today’s no-op extraction). No data migration.
BRANCH_SPLIT — core off origin/dev, consumer on #7003 (rev-3 — corrects the rev-2 PHASE_1_INDEPENDENT inconsistency). The rev-2 claim “PHASE 1 is independent of #7003” is false as it bundled the _tool_from_request edit into PHASE 1 — _tool_from_request exists only in the #7003 stack (absent on origin/dev; verified). Real split is by file-existence on origin/dev, not by phase number:
- CORE (origin/dev-branchable, land/rebase first): ToolSource columns + FK + migration; services/jobs.py:create() write site; PHASE 2 celery slim (celery/tasks.py, managers/jobs.py, schema/tasks.py). All touch infra already on origin/dev. No client/API schema rebuild (pure ORM, no API Pydantic surface — unlike the unrelated ade60f0ff0).
- CONSUMER (stays stacked on extract_issue_followups): _tool_from_request guid + dynamic_tool attach, the step-builder step.dynamic_tool_id set, PHASE 3. These edit code that exists only in #7003.
- The family-wide regression test (red-to-green step 3) needs the #7003 code present → runs on the consumer branch, not the core branch.
NO_PLAN_REFS_IN_CODE. Comments explain rationale directly; no reference to this plan or its labels.

Architecture / seam

 services/jobs.py:create()                         ← single write site
   tool = validate_tool_for_running(...)             (resolved here)
   ToolSourceModel(
     source=tool.tool_source.to_string(),
     source_class=...,
     tool_id=tool.id,                                  ← NEW (guid for toolshed/installed)
     tool_version=tool.version,                        ← NEW (identity/queryability)
     dynamic_tool_id=tool.dynamic_tool.id              ← NEW (FK; user/dynamic-tool identity,
       if tool.dynamic_tool else None,                       same value as today's QueueJobs.dynamic_tool_id)
   )                              # NO tool_dir column (TOOL_DIR_NOT_PERSISTED)
        │ persisted
        ▼
 ┌──────────────── two consumers, identity from one source ────────────────┐
 │ EXTRACTION (CONSUMER, on #7003)       │ CELERY queue_jobs (CORE, PHASE 2)  │
 │ extract.py:_tool_from_request         │ load tr = ToolRequest(            │
 │   create_tool_from_representation(    │     request.tool_request_id)      │
 │     ..., guid=tr.tool_source.tool_id) │ ts = tr.tool_source               │
 │   tool.dynamic_tool =                 │ create_tool_from_representation(  │
 │     ts.dynamic_tool   ← NEW           │   ts.source, request.tool_dir,    │
 │   → tool.id == guid                   │   ts.source_class, guid=ts.tool_id)│
 │ step-builder (:843-851):              │ tool.dynamic_tool = ts.dynamic_tool│
 │   step.dynamic_tool_id =              │ pass `tr` to JobSubmitter         │
 │     item.tool.dynamic_tool.id ← NEW   │   (no re-fetch); dynamic_tool_id  │
 │   → correct WorkflowStep tool_id      │   read off model → QueueJobs       │
 │     AND tool_uuid, family-wide        │   .dynamic_tool_id DROPPED         │
 │   (tool_dir stays None — unused)      │                                    │
 └────────────────────────────────────────┴────────────────────────────────┘

Load-bearing change = one argument + one attribute (guid=None → guid=tool_source.tool_id; tool.dynamic_tool ← tool_source.dynamic_tool) and the step-builder setting step.dynamic_tool_id, made correct by three populated columns. Everything else is cleanup riding the same seam.

Files to touch (checklist)

Tagging: [CORE] = origin/dev-branchable; [CONSUMER] = stays on the #7003 stack (BRANCH_SPLIT).

PHASE 1 — close the toolshed-id and dynamic/user-tool gaps family-wide

[CORE] lib/galaxy/model/__init__.py ToolSource (:1402-1408): add tool_id: Mapped[Optional[str]] = mapped_column(Unicode(255), index=True), tool_version: Mapped[Optional[str]] = mapped_column(Unicode(255)), dynamic_tool_id: Mapped[Optional[int]] = mapped_column(ForeignKey("dynamic_tool.id"), index=True), and dynamic_tool: Mapped[Optional["DynamicTool"]] = relationship(). Identity-string shape anchored on DynamicTool (:1460-1475) / ToolSource.hash (:1406); FK shape anchored on WorkflowStep.dynamic_tool_id (:9056). (DynamicTool is declared after ToolSource at :1460 — confirm forward-ref/string-relationship resolves; it does for the existing ToolRequest→ToolSource order.)
[CORE] New Alembic revision (add_column/drop_column, migrations/util.py:308,312) for the three nullable tool_source columns (incl. the FK; name the constraint per local convention if autogenerate doesn’t — cf. fk_trica_* :1435). Generate via sh manage_db.sh revision (do not hand-pick down_revision; ~15 heads, may need a merge-heads). Downgrade drops all three. No data step.
[CORE] lib/galaxy/webapps/galaxy/services/jobs.py:267-271: set tool_id=tool.id, tool_version=tool.version, dynamic_tool_id=tool.dynamic_tool.id if tool.dynamic_tool else None on the ToolSourceModel(...) (all values already in scope — tool :248, tool.dynamic_tool is the exact expression used at :299 for the message).
[CONSUMER] lib/galaxy/workflow/extract.py:_tool_from_request (:612-628): pass guid=tool_request.tool_source.tool_id (was guid=None :627); after reconstruction set tool.dynamic_tool = tool_request.tool_source.dynamic_tool (mirrors managers/jobs.py:2242). Update docstring :617-619: guid + dynamic-tool now recovered; keep the tool_dir runtime caveat. No plan reference.
[CONSUMER] lib/galaxy/workflow/extract.py step-builder (:843-851, jobless else branch): set step.dynamic_tool_id = item.tool.dynamic_tool.id if item.tool.dynamic_tool else None so the extracted step resolves via tool_uuid (model:9123-9124). (Confirm whether the job is not None branch also needs this — see Unresolved; default is no, keep scope to the jobless path.)
[CORE] No client/API schema rebuild (pure ORM; see BRANCH_SPLIT).

PHASE 2 — slim the celery message (one phase; rev-2’s 2a+2b)

[CORE] lib/galaxy/celery/tasks.py:queue_jobs (:447-456): load ToolRequest by request.tool_request_id; read source/source_class/tool_id/dynamic_tool_id off tool_request.tool_source; tool_dir from the (slimmed) message. Pass the loaded tool_request into job_submitter.queue_jobs(...).
[CORE] lib/galaxy/managers/jobs.py:JobSubmitter.queue_jobs (:2238-2306): accept the passed-through tool_request; drop the re-fetch at :2239/_tool_request call (or keep _tool_request as fallback only). At :2241-2242, set tool.dynamic_tool from tool_request.tool_source.dynamic_tool (was sa_session.get(DynamicTool, request.dynamic_tool_id)).
[CORE] lib/galaxy/schema/tasks.py (:181-199): slim DTO ToolSource to the still-needed runtime field(s) (tool_dir); drop QueueJobs.dynamic_tool_id (:199) — now read off tool_source.dynamic_tool_id. Do not remove tool_source wholesale (tool_dir remains).
[CORE] lib/galaxy/webapps/galaxy/services/jobs.py:281-300: stop populating the dropped DTO fields (incl. dynamic_tool_id :299); keep tool_dir.

PHASE 3 — bells & whistles (review-surfaced, same surface) — all [CONSUMER]

CLEANUP_A extract.py comment block :822-828 (overclaim sentence :826-827: “the service-layer validator rejects payloads that mix them”). The gap is real: in implicit_collection_jobs_ids, a classic ICJ sorts by representative_job.id (Job space, :805) while a jobless tool-request ICJ sorts by tool_request.id (:796 → _tool_request_work_item default :639) — incomparable, no guard. A grey (jobs-exist) tool-request ICJ uses representative_job.id and is comparable. Fix: tighten the comment to the precise guarantee and add a validator guard in _validate_extract_by_ids_payload that rejects a payload whose implicit_collection_jobs_ids resolves to both a Job-keyed ICJ and a ToolRequest-keyed (tool-request-backed AND not icj.jobs) ICJ — replicate the exact extract.py:796 branch logic; do not key on “tool-request-backed” alone (would wrongly reject valid grey+classic mixes). Mirror the existing mix-guard at services/workflows.py:287.
CLEANUP_B extract.py:858-864: post-#7003 the reroute makes _synthesize_request_input_steps run for completed tool-request ICJs — a no-op only when the input HDCA is co-selected. Add an API test selecting a tool-request ICJ without its input collection; pin that the synthesized input step is the intended structure (expected: yes, a workflow input node). Test-only unless it surfaces a wiring bug.
CLEANUP_C services/workflows.py:342-344: payload.job_ids or payload.implicit_collection_jobs_ids in single_request_only is unreachable (already raised at :287). Simplify to not (payload.hda_ids or payload.hdca_ids) + one-line comment, or leave defensive. Trivial, lowest priority.

Red-to-green test order

Project convention: red first, one suite at a time. Reuse existing harness; do not add tool fixtures without checking.

RED — unit, both bugs (unit-pure). New test/unit/workflows/ test: call _tool_from_request(trans, tool_request) where tool_request.tool_source is a SimpleNamespace/Mock. (a) toolshed: source=<cat1-shaped xml>, source_class="XmlToolSource", tool_id="toolshed.example/repos/o/r/cat/1.0", dynamic_tool=None → assert tool.id == "toolshed.example/repos/o/r/cat/1.0". (b) dynamic: tool_id=None, dynamic_tool=<mock DynamicTool with .id/.uuid> → assert tool.dynamic_tool is that mock. Both currently fail (guid=None → short id; dynamic_tool never attached). Control: tool_id=None, dynamic_tool=None → short id, no dynamic_tool (NO_BACKFILL safety).
GREEN — PHASE 1. Columns + FK + migration + populate + guid=/dynamic_tool wiring + step-builder dynamic_tool_id → test 1 green. tox -e unit -- test/unit/workflows/ + migration up/down check (incl. FK constraint create/drop).
GREEN — family regression. ./run_tests.sh -api lib/galaxy_test/api/test_workflow_extraction.py::TestWorkflowExtractionByIdsApi (#21788 + #7003 36-class) stays green (built-in tools → short id == effective id, no dynamic_tool → both changes no-op for them; proves no regression).
GREEN — PHASE 2. Existing async tool-request API/integration suite: a tool-request job still queues+runs after the message slims (load-off-model); add one for a user/custom (dynamic) tool asserting it still queues+runs after QueueJobs.dynamic_tool_id is dropped and read off tool_source.dynamic_tool_id (this is the primary new risk surface — the consume path moved from message to model).
PHASE 3 — CLEANUP_A: validator unit test (extends test_extract_by_ids_validation.py; note: needs ImplicitCollectionJobs mocking — output_dataset_collection_instances/jobs/populated_state — heavier than the existing tool-request-only mocks). CLEANUP_B: the API test above. CLEANUP_C: existing validation suite.
Toolshed e2e (deferred unless cheap fixture exists). The unit pin (1–2) proves the guid= plumbing. An installed-shed/guid-namespaced API e2e is heavy; defer unless the implementer finds an existing guid-namespaced test tool (grep first).

Run after each suite; one suite at a time (Galaxy single-run convention).

Out of scope (do not pull in)

(rev-2’s “PHASE 2b” — dropping dynamic_tool_id from QueueJobs — is no longer out of scope; the FK that made it a separate model change is now PHASE 1, and the drop is folded into PHASE 2.)
The classic job-backed dynamic-tool extraction gap (extract.py:846-847 sets step.tool_id=job.tool_id/tool_version and never step.dynamic_tool_id, even when job.dynamic_tool exists) — pre-existing, predates this family, not widened by #7003. Tracked as Unresolved (decide: separate follow-up vs. one-line opportunistic fix); not pulled in by default.
Persisting ToolSource.tool_dir (TOOL_DIR_NOT_PERSISTED).
Toolbox-first / any live-toolbox preference (NO_TOOLBOX_FIRST).
tool_dir-dependent runtime reconstruction edges (<command> exec, <code file>, <options from_file/from_code>).
Backfilling legacy ToolSource rows.
ToolSource.hash implementation (the existing "TODO") — adjacent, distinct concern.
Workflow-invocation-sourced extraction — CAPTURE_WORKFLOW_EXECUTION_STATE_PLAN.

Resolved questions

Reconstruction fail for macros/toolshed? No — pre-expanded blob; celery runs every async job from it. Retracted.
Identity on ToolRequest or ToolSource? ToolSource (IDENTITY_ON_TOOLSOURCE).
tool_version needed for reconstruction correctness? No (fact 2) — identity/queryability only; tool_id/guid is load-bearing.
Column DDL precedent? identity strings: DynamicTool/ToolSource.hash (Unicode(255)); FK: WorkflowStep.dynamic_tool_id/Job.dynamic_tool_id — not Job for the strings (different shape; rev-1 “mirror Job” was wrong).
Should ToolSource→DynamicTool linkage exist, and now? Yes, now, in PHASE 1 (rev-3). It is the same model under-capture as tool_id/guid, for the dynamic/user-tool class — load-bearing for jobless extraction (fact 8), not celery-message cleanup. Rev-2’s “PHASE 2b deferral” was a misclassification. (This question is what triggered rev-3.)
TOOL_DIR_PERSISTENCE? Resolved B — do not persist (no in-scope reader, no real precedent, location-coupled). Stays ephemeral on the message.
PHASE_2 scope? Resolved — one phase (PHASE_2_UNSPLIT); rev-2’s 2a/2b split dissolved once the FK landed in PHASE 1.
Independent of #7003? Split, not binary (BRANCH_SPLIT): the persistence+celery core is origin/dev-branchable; the _tool_from_request/step-builder consumer + PHASE 3 are #7003-bound (those symbols don’t exist on origin/dev — verified). Rev-2’s flat “Yes” was wrong because it bundled the consumer edit into PHASE 1.
Migration mechanics? add_column/drop_column via manage_db.sh revision autogenerate (not hand-picked head; 1d1d7bf6ac02 is create_table, not the precedent) — incl. the new nullable FK + its constraint.

Unresolved questions

Toolshed e2e fixture: does a cheap existing installed/guid-namespaced test tool exist for step 6, or is the unit pin sufficient and API e2e deferred? (grep at implementation time).
CLEANUP_A test cost: is the ImplicitCollectionJobs mocking acceptable at unit level, or should this assertion be an API-level test instead? (decide when writing step 5).
PR/issue boundary: CORE as its own PR off origin/dev (cross-linking #21788/#7003) with the CONSUMER commits landing on the #7003 stack once CORE merges/rebases, or one stacked PR? (BRANCH_SPLIT makes the branching mechanical; the PR packaging is still open. Recommendation: separate CORE PR — it is independently shippable and reviewable without the messy extract surface.)
Classic job-backed dynamic-tool step gap: opportunistically set step.dynamic_tool_id in the job is not None branch too (extract.py:846-847, one line via job.dynamic_tool), or strictly leave it as a pre-existing separate follow-up? (Cheap and symmetric, but widens blast radius / test surface beyond the tool-request family this stack targets.)
Reconstructed dynamic tool’s tool.id: after create_tool_from_representation(guid=None) on a dynamic tool’s blob, is tool.id the same value create_dynamic_tool would set (dynamic_tool.tool_id, tools/__init__.py:708)? If not, the step needs tool_id from tool_source.dynamic_tool.tool_id, not item.tool.id. (Verify at implementation; affects the step-builder line.)
e2e fixture for the dynamic-tool path: does a cheap existing user/unprivileged-tool async-request test fixture exist (grep dynamic_tool/tool_uuid in lib/galaxy_test/), or is the unit pin + the PHASE 2 queue test sufficient?

References (in-repo, file:line — read on `extract_issue_followups` @ `5a40094b8a`, the current #7003 head; the rev-2 `a994fe6a99` pin is dead)

Bug mechanism: tools/__init__.py:1330-1333 (guid→tool.id), :471-479 (create_tool_from_representation), xml.py:190-194 (parse_version/parse_id).
Extraction call site: workflow/extract.py:612-628 (_tool_from_request, guid=None :627); reroute loop :779 (tool-request branch :789-797, jobless sort_key :796, classic :805); overclaim comment block :822-828.
Model: model/__init__.py:1402-1408 (ToolSource; hash Unicode(255) :1406), :1411-1428 (ToolRequest, tool_source rel :1423), :1460-1475 (DynamicTool — identity-string precedent), :1435 (fk_trica_* FK-name convention), :9056 (WorkflowStep.dynamic_tool_id — FK precedent), :9123-9124 (WorkflowStep.tool_uuid property = dynamic_tool.uuid).
Dynamic-tool path: tools/__init__.py:471-479 (create_tool_from_representation — does not set dynamic_tool), :665-682 (dynamic_tool_to_tool), :703-708 (create_dynamic_tool, tool.id = dynamic_tool.tool_id); managers/jobs.py:2060-2063 (uuid via dynamic_tool FK), :2241-2242 (message→tool.dynamic_tool patch); services/jobs.py:247 (ToolRunReference(... tool_uuid ...) — async request can be dynamic).
Write site: services/jobs.py:246-300 (create; ToolSourceModel :267-271; tool.id :285, tool.tool_dir :283, tool.dynamic_tool.id :299).
Celery / submit: celery/tasks.py:95-109 (cached_create_tool_from_representation, guid=tool_id :108), :447-456 (queue_jobs); schema/tasks.py:181-199 (DTO ToolSource, QueueJobs, tool_request_id :190, dynamic_tool_id :199); managers/jobs.py:2238-2306 (JobSubmitter.queue_jobs; ToolRequest load :2239/:2303-2308; dynamic_tool_id consume :2241-2242).
tool_dir: tools/__init__.py:1028-1032; runtime use :1425-1426,:1585-1586,:1639-1657.
Migration: migrations/util.py:308,312 (add_column/drop_column); merge-heads pattern versions_gxy/98621a25ab75_merge_migration_heads.py.
Validator (CLEANUP_A/C): services/workflows.py:287 (mix-guard to mirror), :342-344 (redundant term).
#7003 head (current, on extract_issue_followups): 5a40094b8a (rev-2 a994fe6a99 no longer exists post-rebase); #21788: 05e4ac7c54, ade60f0ff0. extract_issue_followups = origin/dev (merge-base e232e72c9b) + 12, 0 behind.

TOOL_SOURCE_TRACKING_PLAN