
Bookkeeping Models: ToolExecutionState + ToolSource Identity

Summary of model changes on workflow_state_backfill since 2109b9d6ff (the fda0f58413 Add rmd filetype merge), and how simple jobs, map-over jobs, the History Graph, and history → workflow extraction all consume them.

Commits in scope

07e1b23db3 Tool-source identity: persist tool_id/version/dynamic_tool; slim queue_jobs message
7d97dfeb73 Use ToolRequest state for workflow extraction
190f3177b9 Polish extraction: source-neutral structured-state seam + test trim
d4ebce5a8b Workflow extract: tool_request_ids primitive for jobless executions
fe599ce512 Rebuild schema.
e3b2adba94 Workflow extract: tool_request_ids covers queued/grey executions (#7003)
dfd629dce4 Workflow extract: recover tool identity off ToolSource; tighten ICJ mix guard
0415dd4624 Capture workflow tool-step request state via tool_execution_state
eecc209be6 fixup! Capture workflow tool-step request state via tool_execution_state
9109d06fbd History Graph UI integration prep (+ fixups 3e3ed076c7, e6e4f13308, 6e8ffef160, 49c769180f)
ef7a69c913 fixup! Tool-source identity ...
c067754e13 Converge on ToolExecutionState as the only payload seam
8bb8b20c12 TES docs: ICJ-supersedes framing + resolver-seam comments
decebd02b4 Centralize tool resolution in managers/tool_execution.py
4229a7f231 Unify extract_by_ids producer identity on ToolExecutionState.id
5194e2f388 Discriminate resolver outcome via ResolutionState
a64eb4f126 Walk Job -> WIS -> TES; reassert tool-not-None on extract
0fb619242f Move tool_source_id from ToolRequest to ToolExecutionState
8f5a92a0a2 Tighten tool_for_execution to a TES-shaped seam
79ff86fd5b Let WIS freely co-point with Job/ICJ at the same TES
1f87eebce3 Replace TRICA with TES-keyed TEICA; tighten TES back-pops to scalar


Five migrations and matching ORM changes converge on one idea: a tool execution’s validated request_internal payload lives on its own row, ToolExecutionState (TES), and every consumer (jobs, ICJs, workflow steps, tool requests, History Graph, workflow extraction) reaches it through one source-neutral seam. Tool identity hangs off the execution event (TES → ToolSource), not the request side.


1. tool_source becomes content + identity-addressable

Migrations 0b49ffb1e890 (identity columns), 29fe58dda936 (dedupe + unique constraint), and 395148707459 (move FK from TR to TES).

ToolSource (lib/galaxy/model/__init__.py:1402) gains:

get_or_create_tool_source (lib/galaxy/managers/tool_source.py) is the single lookup-or-create helper used at tool-request and workflow-step-TES mint time, with IntegrityError race rollback. The old “always insert a new row with hash='TODO'” path is gone; the identity-hash migration dedupes existing rows by repointing tool_request.tool_source_id at the survivor.
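
The lookup-or-create pattern with an IntegrityError fallback can be sketched as follows. This is a minimal illustration against an in-memory SQLite table, not the Galaxy helper: the function name get_or_create_tool_source_sketch, the table layout, and the hashing scheme in identity_hash are all assumptions made for the example.

```python
import hashlib
import sqlite3


def identity_hash(source, tool_id, tool_version):
    """Hash content plus identity (illustrative scheme, not Galaxy's)."""
    digest = hashlib.sha256()
    for part in (tool_id, tool_version, source):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def get_or_create_tool_source_sketch(conn, source, tool_id, tool_version):
    """Return the id of the matching row, inserting it if absent."""
    key = identity_hash(source, tool_id, tool_version)
    row = conn.execute("SELECT id FROM tool_source WHERE hash = ?", (key,)).fetchone()
    if row is not None:
        return row[0]
    try:
        with conn:  # commits on success, rolls back if the insert raises
            cur = conn.execute(
                "INSERT INTO tool_source (hash, source, tool_id, tool_version) "
                "VALUES (?, ?, ?, ?)",
                (key, source, tool_id, tool_version),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # A concurrent writer inserted the same hash first; re-read the survivor.
        return conn.execute(
            "SELECT id FROM tool_source WHERE hash = ?", (key,)
        ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_source (id INTEGER PRIMARY KEY, hash TEXT NOT NULL UNIQUE, "
    "source TEXT, tool_id TEXT, tool_version TEXT)"
)
first = get_or_create_tool_source_sketch(conn, "<tool/>", "cat1", "1.0")
second = get_or_create_tool_source_sketch(conn, "<tool/>", "cat1", "1.0")
other = get_or_create_tool_source_sketch(conn, "<tool/>", "cat1", "1.1")
```

The unique constraint on the hash column is what makes the race safe: two writers may both miss on the initial SELECT, but only one INSERT can succeed, and the loser falls back to reading the winner's row.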

After 395148707459, the tool_source_id FK lives on ToolExecutionState (NOT NULL), not on ToolRequest. Tool identity hangs off the execution event: every TES knows its tool, and ToolRequest reaches identity via tr.tool_execution_state.tool_source.

The QueueJobs/Celery task no longer ships raw_tool_source, tool_source_class, tool_id, or dynamic_tool_id. The celery worker reads identity off tool_request.tool_execution_state.tool_source (lib/galaxy/celery/tasks.py:534). The payload is now just { tool_request_id, ... } plus runtime knobs.


2. ToolExecutionState — single payload seam

Migration 28885b317f78 creates the new table and FKs from four hosts:

tool_request.tool_execution_state_id
job.tool_execution_state_id                       -- only for non-mapped jobs
implicit_collection_jobs.tool_execution_state_id  -- canonical anchor for mapped executions
workflow_invocation_step.tool_execution_state_id  -- workflow step before ICJ exists / failure capture

ToolExecutionState (lib/galaxy/model/__init__.py:1450):

The migration also drops the now-redundant tool_request.request column — the TES row is the only payload carrier.

“ICJ supersedes its Jobs” invariant

__strict_check_before_flush__ on Job and ImplicitCollectionJobs (gated by GALAXY_TEST_RAISE_EXCEPTION_ON_HISTORYLESS_HDA) enforces a single rule with two faces:

WIS is not part of this invariant. WIS keeps its TES FK across the execution event and co-points with either its Job (simple step) or its ICJ (mapped step) at the same TES row. TR likewise co-points with the materialized side (request + execution).

Backfill SQL respects this: every legacy tool_request gets a TES row (reusing the id for a 1:1 mapping), joined jobs get the FK, but jobs under an ICJ are then nulled out and the ICJ gets the shared TES. WIS rows linked to such ICJs keep their FK — they co-point with the ICJ at the same TES row.
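
The three backfill passes described above can be modelled in plain Python. This is only a sketch of the ordering; the actual backfill is SQL inside the migration, and the row classes and field names here (ToolRequestRow, JobRow, IcjRow, tes_id) are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolRequestRow:
    id: int
    tes_id: Optional[int] = None


@dataclass
class IcjRow:
    id: int
    tes_id: Optional[int] = None


@dataclass
class JobRow:
    id: int
    tool_request_id: Optional[int] = None
    icj_id: Optional[int] = None  # set when the job belongs to a map-over
    tes_id: Optional[int] = None


def backfill(tool_requests, jobs, icjs):
    """Apply the three passes in order; return the set of minted TES ids."""
    minted = set()
    # Pass 1: one TES per legacy tool_request, reusing the id (1:1 mapping).
    for tr in tool_requests:
        tr.tes_id = tr.id
        minted.add(tr.id)
    # Pass 2: jobs joined to a tool_request get the FK.
    for job in jobs:
        if job.tool_request_id in minted:
            job.tes_id = job.tool_request_id
    # Pass 3: jobs under an ICJ hand the TES to the ICJ and are nulled out.
    icj_by_id = {icj.id: icj for icj in icjs}
    for job in jobs:
        if job.icj_id is not None and job.tes_id is not None:
            icj_by_id[job.icj_id].tes_id = job.tes_id
            job.tes_id = None
    return minted


trs = [ToolRequestRow(1), ToolRequestRow(2)]
icjs = [IcjRow(10)]
jobs = [
    JobRow(100, tool_request_id=1),             # simple job
    JobRow(101, tool_request_id=2, icj_id=10),  # map-over job
    JobRow(102, tool_request_id=2, icj_id=10),  # map-over job
]
backfill(trs, jobs, icjs)
```

After the run, the simple job keeps its FK, both mapped jobs carry none, and the ICJ holds the shared TES id, which is the end state the invariant requires.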


3. Wire surface


4. Tool resolution — one helper

lib/galaxy/managers/tool_execution.py::tool_for_execution is the single helper that turns the captured tool identity into a Tool. It takes one of two strategies, explicitly (no inference from kwargs):

The strategy encodes which source is authoritative for the consumer (live registry vs persisted blob), which the input shape can no longer disambiguate — 8f5a92a0a2 made it required and renamed the prior "model" value to "rebuild".

Callers pass either tool_execution_state= (preferred — the TES carries every identity primitive symmetrically via tes.tool_source) or the identity primitives directly (tool_id / tool_version / dynamic_tool / tool_source). The two shapes are mutually exclusive. ResolvedStructuredRequest now carries the producing TES, so extract routes it straight into tool_for_execution without re-walking identity.

Toolbox MessageException is swallowed to None so display-time callers (History Graph) don’t need a try/except wrapper. Extract sites re-assert tool is not None after the call because a missing tool at extract time is a hard failure, not “no producer name to render” — see a64eb4f126.

History Graph display, extract’s job branch, and extract’s tool-request rebuild now all route through this one helper; the prior in-line helpers (_tool_from_request, _tool_for_job) in workflow/extract.py have dropped out. The rebuild path here is uncached; galaxy.celery.tasks keeps a worker-local cached_create_tool_from_representation for queue_jobs / finish_job hot paths, and collapsing the two cache homes into one is a documented follow-up (gated on dict-vs-str cache-key handling for in-process callers).
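
The calling contract of the helper (a required strategy, two mutually exclusive input shapes, and lookup failures collapsed to None) can be illustrated with a small stand-in. Nothing below is the Galaxy implementation: resolve_tool_sketch, LookupFailed, the stub classes, and the strategy name "registry" are assumptions for the example (only "rebuild" is named in the text above), and the primitives shape is reduced to a single tool_source argument.

```python
from dataclasses import dataclass
from typing import Optional


class LookupFailed(Exception):
    """Stand-in for the toolbox exception that the helper swallows."""


@dataclass
class ToolSourceStub:
    tool_id: str
    tool_version: str


@dataclass
class TesStub:
    tool_source: ToolSourceStub


# Hypothetical live registry keyed on (tool_id, tool_version).
LIVE_REGISTRY = {("cat1", "1.0"): "live cat1 1.0"}


def resolve_tool_sketch(*, strategy, tool_execution_state=None, tool_source=None):
    """Return a tool (a plain string here) or None when lookup fails."""
    if strategy not in ("registry", "rebuild"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    # The two input shapes are mutually exclusive: exactly one must be given.
    if (tool_execution_state is None) == (tool_source is None):
        raise ValueError("pass exactly one of tool_execution_state= or tool_source=")
    if tool_execution_state is not None:
        identity = tool_execution_state.tool_source
    else:
        identity = tool_source
    try:
        if strategy == "registry":
            key = (identity.tool_id, identity.tool_version)
            if key not in LIVE_REGISTRY:
                raise LookupFailed(key)
            return LIVE_REGISTRY[key]
        # "rebuild": reconstruct from the persisted identity, ignoring the registry.
        return f"rebuilt {identity.tool_id} {identity.tool_version}"
    except LookupFailed:
        return None  # display-time callers need no try/except wrapper


tes = TesStub(ToolSourceStub("cat1", "1.0"))
gone = TesStub(ToolSourceStub("removed_tool", "0.1"))
found = resolve_tool_sketch(strategy="registry", tool_execution_state=tes)
missing = resolve_tool_sketch(strategy="registry", tool_execution_state=gone)
rebuilt = resolve_tool_sketch(strategy="rebuild", tool_source=gone.tool_source)
```

A display-time caller can render "missing" as an unnamed producer, while an extract-style caller would follow the call with its own "tool is not None" assertion.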


How simple jobs vs. map-over jobs leverage these

Async-API tool-request path (writes)

services/jobs.py::create now:

  1. Calls get_or_create_tool_source(sa_session, tool) — content+identity-deduped row.
  2. Creates a ToolExecutionState(request=request_internal_state.input_state, state=VALIDATED) and links it to the new ToolRequest.
  3. Sends a slim QueueJobs payload — celery worker fetches tool source and identity off the row.

In JobSubmitter.queue_jobs (lib/galaxy/managers/jobs.py:2235), the payload is read through the new tool_request_payload(tool_request) helper (reads through tool_request.tool_execution_state.request); dynamic_tool is recovered from tool_request.tool_execution_state.tool_source.dynamic_tool. Anything that used to mutate tool_request.request (e.g. __data_manager_mode) now mutates tool_request.tool_execution_state.request.

_execute (lib/galaxy/tools/execute.py:215)

_tool_execution_state_for_jobs(tool_request, invocation_step) picks the TES row to stamp on this execution. Then:

For workflow tool steps the WIS carries the TES from step scheduling time onward. Once WorkflowStepExecutionTracker.ensure_implicit_collections_populated produces the ICJ (mapped step) or _execute stamps the Job (simple step), the same TES row is also referenced from that materialized anchor — WIS and Job/ICJ co-point. No move, no null-out.

Workflow-side TES synthesis (lib/galaxy/workflow/modules.py:2296)

_capture_workflow_tool_request_state synthesizes a request_internal payload from the workflow step’s resolved execution state plus a per-iteration validated_param_combinations list. It deliberately:

The resulting TES is threaded into MappingParameters(..., validated_param_template, validated_param_combinations) and into _execute. Simple workflow tool steps and map-over steps go through the same code; the only branch is which side gets the materialized-anchor FK (Job for simple, ICJ for mapped) — WIS keeps its FK either way.


How the History Graph uses these

lib/galaxy/managers/history_graph.py builds a provenance graph keyed on TES id as the producer node, not Job / ToolRequest / WIS id:

  1. _producers collects candidate producers for every selected HDA/HDCA:

    • Job-side: JobToOutputDatasetAssociation / JobToOutputDatasetCollectionAssociation joined to Job.
    • Jobless collection-side: HistoryDatasetCollectionAssociation → TEICA → ToolExecutionState → ToolRequest (joined to ToolSource for identity) — how an empty map-over (tool request, zero jobs) still appears as a producer. TEICA replaces TRICA as the producer-side bookkeeping in 10c4cd393d5a; it’s keyed on TES (rekeying lifts the link off the request side onto the execution event) and is written once at execute time, so HDCA.copy() never carries a TEICA row. The walk therefore answers “what did this execution originally produce” without filtering out copies after the fact — the join itself excludes them.
  2. For each Job it calls resolve_structured_request(job=...) from lib/galaxy/managers/workflow_request_state.py. That seam encodes the “ICJ-supersedes” rule with a fallback walk through the WIS:

    _tes_from_job:  if job is under ICJ → ICJ.tool_execution_state
                    elif job.tool_execution_state → it
                    elif job.workflow_invocation_step → WIS.tool_execution_state
    _tes_from_tool_request → tool_request.tool_execution_state

    The Job → WIS walk recovers TES.id for workflow tool executions whose capture wasn’t validated (the writer always mints a WIS-side TES; execute.py only propagates the link to the Job on validation success — see a64eb4f126).

    The resolver always returns a ResolvedStructuredRequest (never None); a state: ResolutionState field discriminates the outcome: VALIDATED / NOT_VALIDATED / VALIDATION_FAILED / MISSING (no TES row at all). source_id is set whenever a TES exists; payload is populated only for VALIDATED. A validated row with a non-dict payload is a write-side bug and asserts. Consumers can choose to react state-specifically; both Graph and extract debug-log non-VALIDATED outcomes and treat the item as having no producer edge / no structured payload. resolve_structured_request_payload is the thin Optional[dict] wrapper for callers that only need the validated payload.

  3. The returned ResolvedStructuredRequest(source_id=tes.id, payload=...) is the seam — simple job and map-over job both collapse to one TES id. So in the graph:

    • A simple-job execution = one producer node = TES id of job.tool_execution_state.
    • A map-over execution (N jobs in one ICJ) = one producer node = TES id of icj.tool_execution_state. All N output HDAs/HDCAs point at the same producer.
    • A jobless tool request = one producer node = TES id of tool_request.tool_execution_state.
  4. Producer nodes are encoded with the dedicated TOOL_EXECUTION_STATE_ENCODE_KIND cipher (_producer_ref) and emitted with src="tool_execution". Input edges are derived from request_internal_input_refs(payload) — from the validated TES payload itself, not from per-job JobToInputDataset* rows. Inputs come from the declared request, identical for every constituent job of a map-over.

  5. If multiple producers resolve to the same item (shouldn’t happen post-invariant, but defensive), the item is left node-only with a debug log.
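
The resolver walk and the discriminated result from step 2 can be sketched as below. This is a simplified model under stated assumptions, not the code in workflow_request_state.py: the Holder class stands in for both ICJ and WIS, Resolved stands in for ResolvedStructuredRequest, and the string values stored in Tes.state are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResolutionState(Enum):
    VALIDATED = "validated"
    NOT_VALIDATED = "not_validated"
    VALIDATION_FAILED = "validation_failed"
    MISSING = "missing"


@dataclass
class Tes:
    id: int
    state: str
    request: Optional[dict] = None


@dataclass
class Holder:  # stands in for an ICJ or a WIS
    tool_execution_state: Optional[Tes] = None


@dataclass
class Job:
    implicit_collection_jobs: Optional[Holder] = None
    tool_execution_state: Optional[Tes] = None
    workflow_invocation_step: Optional[Holder] = None


@dataclass
class Resolved:
    state: ResolutionState
    source_id: Optional[int] = None
    payload: Optional[dict] = None


def tes_from_job(job):
    """ICJ supersedes the Job; the WIS is the last fallback."""
    if job.implicit_collection_jobs is not None:
        return job.implicit_collection_jobs.tool_execution_state
    if job.tool_execution_state is not None:
        return job.tool_execution_state
    if job.workflow_invocation_step is not None:
        return job.workflow_invocation_step.tool_execution_state
    return None


def resolve(job):
    """Always return a Resolved; the state field discriminates the outcome."""
    tes = tes_from_job(job)
    if tes is None:
        return Resolved(ResolutionState.MISSING)
    state = ResolutionState(tes.state)
    if state is ResolutionState.VALIDATED:
        assert isinstance(tes.request, dict), "validated TES must carry a dict payload"
        return Resolved(state, tes.id, tes.request)
    return Resolved(state, tes.id)  # source_id set, payload withheld


shared = Tes(7, "validated", {"input1": {"src": "hdca", "id": 3}})
icj = Holder(shared)
map_over = [resolve(Job(implicit_collection_jobs=icj)) for _ in range(3)]
simple = resolve(Job(tool_execution_state=Tes(8, "validated", {})))
unvalidated = resolve(Job(workflow_invocation_step=Holder(Tes(9, "not_validated"))))
orphan = resolve(Job())
```

All three map-over jobs resolve to the same source_id, which is why a map-over collapses to a single producer node, while the unvalidated workflow job still yields a source_id through the WIS fallback but no payload.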


How history → workflow extraction uses these

lib/galaxy/workflow/extract.py::extract_steps_by_ids is the ID-based path used by the new wire surface. The structured payload reaches it via the same seam — extract uses the discriminated form so it can pick up both payload and source_id (the latter is the tier-1 sort key):

resolved = resolve_structured_request(job=job)                   # simple job
resolved = resolve_structured_request(icj=icj)                   # map-over
request_payload = tool_request_payload(tool_request)             # jobless / tool_request_ids

_WorkItem is the shared record (job optional, tool_request optional, request_payload optional). Three sources, one downstream:

The structured branch _structured_step_inputs_by_id validates the payload against the tool’s parameter model and projects to workflow-step state with to_workflow_step_state; associations are produced from request_internal_input_refs rather than from Job.input_dataset* rows, so map-over connections are wired to the pre-map input HDCAs (via implicit_input_collections + the request refs), not to individual sliced elements.

Sort ordering uses a 2-tuple (tier, id) on each _WorkItem:

Because tier-1 ids are universally comparable, the two prior service-layer mix-guards (TR-keyed vs. job/ICJ-keyed in one payload; job-keyed ICJs vs. TR-keyed ICJs) have dropped out; they existed only because the underlying ids weren’t comparable. The remaining cross-payload validation (populated_state, output-collection presence, job-must-not-have-ICJ, TR-state-new single-step gate, accessibility) stays in lib/galaxy/webapps/galaxy/services/workflows.py.


Net picture

Both the History Graph (read-side provenance) and structured workflow extraction (read-side workflow synthesis) used to have to peek into multiple shapes — Job, ToolRequest, WorkflowInvocationStep, ICJ — to find a validated request_internal. After these changes they both walk one seam: resolve_structured_request(...) → ToolExecutionState. Simple jobs and map-over jobs become the same shape to consumers — they differ only in which row owns the FK (Job vs. ICJ), and the resolver hides that. Jobless executions (empty map-over, never-submitted tool request) become first-class because the TES exists independently of any Job.