TOOL_SOURCE_TRACKING_PLAN

Tool-Source Identity Tracking — Implementation Plan

Date: 2026-05-18 (rev 3: dynamic_tool linkage pulled in-scope as load-bearing PHASE 1 (was deferred “PHASE 2b”); core/consumer branch split off origin/dev; stale a994fe6a99 pin flagged)

Branch (split; see BRANCH_SPLIT): the persistence+celery core (ToolSource identity columns incl. the dynamic_tool FK, write site, celery slim) touches only infra already on origin/dev → cut tool-source-identity off origin/dev and build the core there; land/rebase it first. The extraction consumer (_tool_from_request guid + dynamic-tool wiring, step dynamic_tool_id) and PHASE 3 edit code that exists only in the #7003 stack, so those commits stay layered on extract_issue_followups (= origin/dev + 12, 0 behind).

Stale-hash note: the rev-2 base pin a994fe6a99 is no longer in extract_issue_followups (rebased; #7003 head is now 5a40094b8a). All “a994fe6a99” references below mean the #7003 stack as it currently sits on the branch; reference #7003 by content, not by that dead hash.

Tracking: no single GitHub issue. This is a model-correctness + provenance-fidelity fix underpinning the tool-request extraction family. It closes the guid=None toolshed-id edge and the symmetric dynamic/user-tool dynamic_tool=None edge documented in #7003 and the #21788 commit (05e4ac7c54).

Related:

  • MAP_OVER_EMPTY_EXTRACTION_TOOL_REQUEST_PLAN — #21788, introduced _tool_from_request(guid=None) (the edge this closes)
  • QUEUED_EXECUTION_EXTRACTION_TOOL_REQUEST_PLAN — #7003, the reroute that widened the edge to completed/grey tool-request ICJs
  • EXTRACT_TOOL_REQUEST_STATE_PLAN — structured-state gate; established blob-first reconstruction as the deliberate de-toolbox direction
  • vault/research/Workflow Extraction Issues.md

At a glance

Problem: The persisted ToolSource DB model stores only source + source_class + hash. The resolved tool identity (tool_id/guid, tool_version, and the DynamicTool link for user/dynamic tools) lives only inside the serialized blob, on a Job, or on the transient QueueJobs message. So _tool_from_request rebuilds with guid=None and dynamic_tool=None → a toolshed tool reconstructs with its short id, not its namespaced guid, and a user/dynamic tool reconstructs with no DynamicTool link (so the extracted step has no dynamic_tool_id/tool_uuid) → an extracted WorkflowStep may not resolve to the real tool. Same under-capture, two tool classes.

Key insight: Identity is intrinsic to the snapshot + the request’s resolved tool, not recoverable from the request inputs. ToolSource is the content-addressable tool snapshot. tool_id/tool_version belong there; the DynamicTool reference is the same identity gap for the dynamic/user-tool class and belongs on the same row (a nullable FK). It does not belong on ToolRequest (which stays “which source + what inputs”), and it is not solved by toolbox-first (which reintroduces the install/upgrade/removal drift this initiative deliberately moves away from).

This plan delivers:
  • PHASE 1 (load-bearing, small): ToolSource.{tool_id, tool_version, dynamic_tool_id} (FK→dynamic_tool) columns, populated at the single write site from the already-resolved tool; _tool_from_request passes guid=tool_source.tool_id, attaches the linked DynamicTool, and the step-builder sets step.dynamic_tool_id. Closes both the toolshed-id edge and the dynamic/user-tool edge family-wide (#21788 direct tool_request_ids path and #7003 ICJ reroute) in one place.
  • PHASE 2 (cleanup): celery reads source/source_class/tool_id/dynamic_tool_id off the model via the tool_request_id it already receives → slim the QueueJobs message (now incl. dropping dynamic_tool_id, enabled by the new FK).
  • PHASE 3: three review-surfaced tidy-ups in the messy #21788/#7003 extract surface.

This plan does NOT:
  • Add toolbox-first resolution (explicitly rejected).
  • Persist tool_dir (runtime-only, location-coupled, no in-scope reader; stays ephemeral on the message).
  • Fix the classic job-backed dynamic-tool extraction gap (extract.py never sets step.dynamic_tool_id for job is not None either; pre-existing and separate, see Unresolved).
  • Backfill legacy rows.
  • Change runtime execution semantics.

Risk:
  • PHASE 1 low: additive nullable columns (incl. one nullable FK), one write site, two small extraction changes, unit-pinnable.
  • PHASE 2 low-medium: JobSubmitter already loads the ToolRequest, lru-cache key byte-identical; main care is avoiding a double-fetch and ensuring the dynamic_tool_id read off the model is byte-equivalent to today’s message value.
  • PHASE 3 low: comment/guard/test.

Why this exists & why bundle it here

Bundling bar: does this improve the (admittedly messy) #21788/#7003 work? Yes —

  1. The guid=None toolshed-id limitation originated in committed #21788 (_tool_from_request, 05e4ac7c54) and was widened by #7003 (the reroute now routes completed/grey tool-request ICJs — previously job→guid via _tool_for_job — through the blob path). The same reroute widened a parallel limitation: a jobless dynamic/user-tool request rebuilds with dynamic_tool=None (the message-only link is absent in the extraction path), so the extracted step has no dynamic_tool_id/tool_uuid. PHASE 1 closes both for the whole family at the correct layer, retroactively fixing #21788’s direct path too.
  2. PHASE 2 makes ToolSource the source of truth its consumers already implicitly need: celery threads raw_tool_source/tool_source_class/tool_id and dynamic_tool_id through a parallel DTO because the model under-captures. JobSubmitter.queue_jobs already re-loads the ToolRequest by id (managers/jobs.py:2239, :2303-2308) — so reading the blob+identity+dynamic_tool off the model is a no-new-coupling cleanup. Once the FK exists, dropping QueueJobs.dynamic_tool_id is no longer a separate model change — it falls out of the same PHASE 2 read-off-model (this is what rev-2 deferred as “PHASE 2b”; it is no longer deferred).
  3. PHASE 3 clears three correctness/clarity asterisks the review found in the same extract.py/validator surface.

Retracted en route (recorded so it is not re-litigated): the earlier finding that reconstruction “fails for macros / toolshed-installed tools” was wrong. Macros are expanded at toolbox-load before XmlToolSource exists; to_string() serializes the post-expansion tree. Celery queue_jobs runs every async tool-request job by reconstructing from the same persisted blob — if it failed for those tools the async API could not execute them. The real residuals are the id-string namespacing and the dynamic/user-tool DynamicTool link below (same under-capture, two tool classes); tool_dir-dependent runtime concerns are genuinely deferred and not touched by parameter-model extraction.

Verified facts (code-traced 2026-05-17, subagent-confirmed)

  1. Bug mechanism. Tool.parse (lib/galaxy/tools/__init__.py:1330-1333): if guid is None: self.id = self.old_id (= tool_source.parse_id() = root.get("id"), the short id, xml.py:193-194) else: self.id = guid. _tool_from_request (extract.py:612-628) calls create_tool_from_representation(..., guid=None) (:627) → reconstructed .id is the short id. A toolshed tool’s executed Job.tool_id / live toolbox id is the namespaced guid. Mismatch → WorkflowStep.tool_id may not resolve.
  2. tool_version is recovered from the blob (XmlToolSource.parse_version = root.get("version"), xml.py:190-191). So tool_version on the model is identity/queryability only — not a reconstruction-correctness fix. Only tool_id/guid is load-bearing for the bug.
  3. Model under-captures. DB ToolSource (model/__init__.py:1402-1408): id, hash, source, source_class only. ToolRequest (:1411-1428): request inputs-only (services/jobs.py:272); no tool identity anywhere on the request. For a jobless request, identity is recorded nowhere today (confirmed — not a missed abstraction).
  4. Identity is free at the single write site. services/jobs.py:create() (:246-300) holds the resolved tool and builds ToolSourceModel(source=tool.tool_source.to_string(), source_class=…, hash="TODO") at :267-271. It already passes tool.id (:285), tool.tool_dir (:283), tool.dynamic_tool.id (:299) into the transient celery DTO only.
  5. Column-shape precedent = DynamicTool, not Job. DynamicTool (model:1460-1475) already models this exact shape — tool_id, tool_version, tool_format, tool_path, tool_directory, all Unicode(255); ToolSource.hash is itself Unicode(255) (:1406). (Job.tool_id is String(255), Job.tool_version is TEXT default "1.0.0" — a different shape; do not cite Job for DDL.) No conflict: DynamicTool is the user-tool registry, ToolSource the per-request snapshot — adding identity to ToolSource does not duplicate DynamicTool’s role.
  6. uuid is via the dynamic_tool FK, never a string column (managers/jobs.py:2060-2063; WorkflowStep.tool_uuid is a property = self.dynamic_tool and self.dynamic_tool.uuid, model:9123-9124, and WorkflowStep already carries dynamic_tool_id FK :9056). So ToolSource gets a nullable dynamic_tool_id FK (mirroring WorkflowStep/Job), not a tool_uuid string column.
  7. Celery already has the link to read off the model. QueueJobs (schema/tasks.py:188-199) already carries tool_request_id (:190). request.tool_source is consumed only in celery/tasks.py:queue_jobs:448-455; JobSubmitter.queue_jobs never reads it but already loads the ToolRequest via _tool_request(request.tool_request_id) (managers/jobs.py:2239, :2303-2308). lru-cache key stays byte-identical (persisted source == today’s raw_tool_source = tool.tool_source.to_string()).
  8. dynamic_tool_id is the same under-capture, not a separate concern (rev-3 reclassification). Consumed at managers/jobs.py:2241-2242 (tool.dynamic_tool = sa_session.get(DynamicTool, request.dynamic_tool_id)); set on the message at services/jobs.py:299 (tool.dynamic_tool.id if tool.dynamic_tool else None). create_tool_from_representation (tools/__init__.py:471-479) does not set tool.dynamic_tool — only the celery path patches it back from the message. The extraction path (_tool_from_request) has no message, so a jobless dynamic/user-tool request rebuilds with dynamic_tool=None, and the step-builder (extract.py:843-851) sets only step.tool_id/tool_version, never step.dynamic_tool_id → an extracted user/dynamic tool step cannot resolve via tool_uuid. This is load-bearing and exactly symmetric to the guid=None toolshed edge — same model under-capture, other tool class. An async tool request can be for a dynamic tool (services/jobs.py:247 ToolRunReference(... tool_uuid ...)), so the scenario is reachable, not theoretical. ⇒ add the dynamic_tool_id FK to ToolSource now, in PHASE 1 (rev-2’s “PHASE 2b deferral” was a misclassification of a PHASE 1 fidelity fix as celery-message cleanup).
  9. tool.tool_dir is an absolute install path (tools/__init__.py:1028-1032: os.path.dirname(realpath(config_file)); None for dynamic/user tools, config_file=None). Used only at runtime (<command> exec :1425-1426, <code file> :1585-1586, <options from_file> :1639-1657) — never by parameter-model extraction. The only similar existing column, DynamicTool.tool_directory, is a user-supplied CWL path, not an install-tree path — there is no precedent for persisting install absolute paths in a DB column.
  10. Migration reality. 1d1d7bf6ac02_tool_request_implicit_outputs.py is a create_table migration — not an add-column precedent. Additive columns use add_column/drop_column (migrations/util.py:308,312). versions_gxy has ~15 unmerged heads (periodic merge-migrations, e.g. 98621a25ab75_merge_migration_heads.py); a hand-picked down_revision will be wrong. Generate via sh manage_db.sh revision (autogenerate resolves the head chain / flags merge-heads).
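
The column shape from facts 5, 6 and 10 can be sketched with the standard-library sqlite3 module. This is an illustration under assumptions, not the migration itself: the table names (tool_source, dynamic_tool), the column types, and the sample row are assumed for the example, and the real change should still be generated via sh manage_db.sh revision using the add_column/drop_column helpers. The point it shows is that all three columns are additive and nullable, so a pre-existing row simply reads back NULL (no backfill), while the FK still rejects a dangling reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Assumed pre-migration shape: ToolSource holds id, hash, source, source_class only.
conn.executescript(
    """
    CREATE TABLE dynamic_tool (id INTEGER PRIMARY KEY, uuid TEXT);
    CREATE TABLE tool_source (
        id INTEGER PRIMARY KEY,
        hash VARCHAR(255),
        source TEXT,
        source_class VARCHAR(255)
    );
    INSERT INTO tool_source (hash, source, source_class)
        VALUES ('TODO', '<tool id="cat1"/>', 'XmlToolSource');
    """
)

# Additive, nullable columns; dynamic_tool_id is a nullable FK, not a uuid string.
conn.execute("ALTER TABLE tool_source ADD COLUMN tool_id VARCHAR(255)")
conn.execute("ALTER TABLE tool_source ADD COLUMN tool_version VARCHAR(255)")
conn.execute(
    "ALTER TABLE tool_source ADD COLUMN dynamic_tool_id INTEGER REFERENCES dynamic_tool(id)"
)

# A legacy row keeps working and reads back NULL identity.
legacy_row = conn.execute(
    "SELECT tool_id, tool_version, dynamic_tool_id FROM tool_source WHERE id = 1"
).fetchone()


def insert_with_dynamic_tool(dynamic_tool_id):
    """Insert a new snapshot row that links to a dynamic tool (hypothetical helper)."""
    conn.execute(
        "INSERT INTO tool_source (hash, source, source_class, dynamic_tool_id) "
        "VALUES ('TODO', '<tool id=\"user_tool\"/>', 'XmlToolSource', ?)",
        (dynamic_tool_id,),
    )
```

Inserting with an id that has no matching dynamic_tool row raises sqlite3.IntegrityError, which is the constraint the migration up/down check should exercise on the real backend.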

Settled decisions

Architecture / seam

 services/jobs.py:create()                         ← single write site
   tool = validate_tool_for_running(...)             (resolved here)
   ToolSourceModel(
     source=tool.tool_source.to_string(),
     source_class=...,
     tool_id=tool.id,                                  ← NEW (guid for toolshed/installed)
     tool_version=tool.version,                        ← NEW (identity/queryability)
     dynamic_tool_id=tool.dynamic_tool.id              ← NEW (FK; user/dynamic-tool identity,
       if tool.dynamic_tool else None,                       same value as today's QueueJobs.dynamic_tool_id)
   )                              # NO tool_dir column (TOOL_DIR_NOT_PERSISTED)
        │ persisted

 ┌──────────────── two consumers, identity from one source ────────────────┐
 │ EXTRACTION (CONSUMER, on #7003)       │ CELERY queue_jobs (CORE, PHASE 2)  │
 │ extract.py:_tool_from_request         │ load tr = ToolRequest(            │
 │   create_tool_from_representation(    │     request.tool_request_id)      │
 │     ..., guid=tr.tool_source.tool_id) │ ts = tr.tool_source               │
 │   tool.dynamic_tool =                 │ create_tool_from_representation(  │
 │     ts.dynamic_tool   ← NEW           │   ts.source, request.tool_dir,    │
 │   → tool.id == guid                   │   ts.source_class, guid=ts.tool_id)│
 │ step-builder (:843-851):              │ tool.dynamic_tool = ts.dynamic_tool│
 │   step.dynamic_tool_id =              │ pass `tr` to JobSubmitter         │
 │     item.tool.dynamic_tool.id ← NEW   │   (no re-fetch); dynamic_tool_id  │
 │   → correct WorkflowStep tool_id      │   read off model → QueueJobs       │
 │     AND tool_uuid, family-wide        │   .dynamic_tool_id DROPPED         │
 │   (tool_dir stays None — unused)      │                                    │
 └────────────────────────────────────────┴────────────────────────────────┘

Load-bearing change = one argument + one attribute (guid=None → guid=tool_source.tool_id; tool.dynamic_tool ← tool_source.dynamic_tool) and the step-builder setting step.dynamic_tool_id, made correct by three populated columns. Everything else is cleanup riding the same seam.
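
A minimal sketch of why that small change is sufficient, assuming the id-selection rule traced in Verified fact 1 (no guid means the short id from the blob). FakeTool, parse_tool, the two tool_from_request_* functions, build_step, and the short_id attribute are all hypothetical stand-ins for illustration; none of them are Galaxy code, and in the real path the short id is parsed out of the serialized source, not stored as an attribute.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class FakeTool:
    """Hypothetical stand-in for a reconstructed tool."""

    id: str
    dynamic_tool: Optional[Any] = None


def parse_tool(short_id: str, guid: Optional[str]) -> FakeTool:
    # Mirrors the traced rule: without a guid, fall back to the short id.
    return FakeTool(id=short_id if guid is None else guid)


def tool_from_request_before(tool_source) -> FakeTool:
    # Today: identity is not on the row, so guid is always None.
    return parse_tool(tool_source.short_id, guid=None)


def tool_from_request_after(tool_source) -> FakeTool:
    # Planned: read the identity off the persisted row.
    tool = parse_tool(tool_source.short_id, guid=tool_source.tool_id)
    tool.dynamic_tool = tool_source.dynamic_tool
    return tool


def build_step(tool: FakeTool) -> SimpleNamespace:
    # Step-builder sketch: carry the DynamicTool link when one is attached.
    return SimpleNamespace(
        tool_id=tool.id,
        dynamic_tool_id=tool.dynamic_tool.id if tool.dynamic_tool else None,
    )
```

With tool_id and dynamic_tool both None (a legacy row), tool_from_request_after behaves exactly like tool_from_request_before, which is why no backfill is needed for existing rows to keep their current behaviour.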

Files to touch (checklist)

Tagging: [CORE] = origin/dev-branchable; [CONSUMER] = stays on the #7003 stack (BRANCH_SPLIT).

PHASE 1 — close the toolshed-id and dynamic/user-tool gaps family-wide

PHASE 2 — slim the celery message (one phase; rev-2’s 2a+2b)

PHASE 3 — bells & whistles (review-surfaced, same surface) — all [CONSUMER]

Red-to-green test order

Project convention: red first, one suite at a time. Reuse existing harness; do not add tool fixtures without checking.

  1. RED — unit, both bugs (unit-pure). New test/unit/workflows/ test: call _tool_from_request(trans, tool_request) where tool_request.tool_source is a SimpleNamespace/Mock. (a) toolshed: source=<cat1-shaped xml>, source_class="XmlToolSource", tool_id="toolshed.example/repos/o/r/cat/1.0", dynamic_tool=None → assert tool.id == "toolshed.example/repos/o/r/cat/1.0". (b) dynamic: tool_id=None, dynamic_tool=<mock DynamicTool with .id/.uuid> → assert tool.dynamic_tool is that mock. Both currently fail (guid=None → short id; dynamic_tool never attached). Control: tool_id=None, dynamic_tool=None → short id, no dynamic_tool (NO_BACKFILL safety).
  2. GREEN — PHASE 1. Columns + FK + migration + populate + guid=/dynamic_tool wiring + step-builder dynamic_tool_id → test 1 green. tox -e unit -- test/unit/workflows/ + migration up/down check (incl. FK constraint create/drop).
  3. GREEN — family regression. ./run_tests.sh -api lib/galaxy_test/api/test_workflow_extraction.py::TestWorkflowExtractionByIdsApi (#21788 + #7003 36-class) stays green (built-in tools → short id == effective id, no dynamic_tool → both changes no-op for them; proves no regression).
  4. GREEN — PHASE 2. Existing async tool-request API/integration suite: a tool-request job still queues+runs after the message slims (load-off-model); add one for a user/custom (dynamic) tool asserting it still queues+runs after QueueJobs.dynamic_tool_id is dropped and read off tool_source.dynamic_tool_id (this is the primary new risk surface — the consume path moved from message to model).
  5. PHASE 3 — CLEANUP_A: validator unit test (extends test_extract_by_ids_validation.py; note: needs ImplicitCollectionJobs mocking — output_dataset_collection_instances/jobs/populated_state — heavier than the existing tool-request-only mocks). CLEANUP_B: the API test above. CLEANUP_C: existing validation suite.
  6. Toolshed e2e (deferred unless cheap fixture exists). The unit pin (1–2) proves the guid= plumbing. An installed-shed/guid-namespaced API e2e is heavy; defer unless the implementer finds an existing guid-namespaced test tool (grep first).

Run after each suite; one suite at a time (Galaxy single-run convention).
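
The shape of the step-1 unit pin might look like the following sketch. It is deliberately not the real test: tool_from_request here is a hypothetical stand-in that takes the tool factory as a parameter so the example runs on its own, and the inline XML and the DynamicTool stand-in are assumed fixtures. The real test would call _tool_from_request(trans, tool_request) and assert on tool.id; this version only pins that the guid argument and the dynamic_tool attribute are sourced from the mocked tool_source.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def tool_from_request(tool_request, create_tool):
    """Hypothetical stand-in for the post-PHASE 1 helper (not Galaxy code)."""
    ts = tool_request.tool_source
    tool = create_tool(ts.source, None, ts.source_class, guid=ts.tool_id)
    tool.dynamic_tool = ts.dynamic_tool
    return tool


def make_request(tool_id=None, dynamic_tool=None):
    # tool_source is a plain namespace, so no DB or toolbox is needed.
    tool_source = SimpleNamespace(
        source='<tool id="cat1" version="1.0.0"/>',
        source_class="XmlToolSource",
        tool_id=tool_id,
        dynamic_tool=dynamic_tool,
    )
    return SimpleNamespace(tool_source=tool_source)


def test_toolshed_guid_is_passed_through():
    create_tool = Mock()
    guid = "toolshed.example/repos/o/r/cat/1.0"
    tool_from_request(make_request(tool_id=guid), create_tool)
    assert create_tool.call_args.kwargs["guid"] == guid


def test_dynamic_tool_is_attached():
    dynamic_tool = SimpleNamespace(id=7, uuid="hypothetical-uuid")
    tool = tool_from_request(make_request(dynamic_tool=dynamic_tool), Mock())
    assert tool.dynamic_tool is dynamic_tool


def test_legacy_row_control():
    # NO_BACKFILL control: NULL identity falls through unchanged.
    create_tool = Mock()
    tool = tool_from_request(make_request(), create_tool)
    assert create_tool.call_args.kwargs["guid"] is None
    assert tool.dynamic_tool is None
```

Against today's code the first two cases are the red ones (guid is hard-coded to None and dynamic_tool is never attached); the control case should pass both before and after PHASE 1.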

Out of scope (do not pull in)

Resolved questions

Unresolved questions

References (in-repo, file:line — read on extract_issue_followups @ 5a40094b8a, the current #7003 head; the rev-2 a994fe6a99 pin is dead)