POLISH_MODELS_PLAN

Polish Plan: ToolExecutionState Nest

Five-item polish pass on the ToolExecutionState (TES) nest in workflow_state_backfill. Two doc-only items, one refactor, one substantive consumer-API unification, one resolver-API change. Greenfield on both consumer APIs (graph + extract-by-ids) makes the riskier items cheap.

Order is dependency-respecting: docs → refactor (#4) → identity (#5) → resolver shape (#6). #6 lands last so it can be informed by what #5 showed extraction actually needs.

Item 1: Rename “exactly one path” → “ICJ supersedes children”

Problem. The invariant in __strict_check_before_flush__ on Job, ICJ, WIS and the docstrings on ToolExecutionState / 28885b317f78_add_tool_execution_state.py advertise “exactly one path.” Not true: ToolRequest.tool_execution_state_id and Job.tool_execution_state_id legitimately point at the same TES (request side + materialized side). The actual rule is “ICJ supersedes its child Job/WIS.” Misleading framing → future readers will think TR + Job is also forbidden.

Scope.

Decision: keep dunder name or rename method? Verify whether __strict_check_before_flush__ is a SQLAlchemy event-hook-discovered name (grep for before_flush listeners that walk session.dirty / new). If hook-discovered, keep the dunder; if a regular method called explicitly, rename to _check_icj_supersedes_children.

Tests. test/unit/data/model/test_tes_path_invariant.py already covers the three rejection cases. Rename test functions to match new naming; assertion bodies stay identical. No new tests.

Effort. ~30 min including the hook-name verification.

Item 2: Docstring the canonical-access pattern on relationships

Problem. Today nothing stops a new consumer from doing job.tool_execution_state directly and getting None for a mapped Job. There is exactly one production reader (_tes_from_job) and one intentional test reader (simple-workflow-step assertion). Future drift is the risk; convention is the defense.

Scope. Single-line comments on three relationship declarations in lib/galaxy/model/__init__.py:

# Job-side link. NULL for mapped Jobs (the ICJ carries the TES). Read via
# galaxy.managers.workflow_request_state.resolve_structured_request, not
# this attribute directly.
tool_execution_state: Mapped[Optional["ToolExecutionState"]] = relationship(...)

Same idea on ImplicitCollectionJobs.tool_execution_state (“supersedes the constituent Job/WIS links when set”) and WorkflowInvocationStep.tool_execution_state (“transitional: moved to ICJ once mapped step produces an ICJ”).

Tests. None.

Effort. ~5 min.

Item 3: Centralize tool resolution

Problem. Three sites resolve a Tool from execution-time identity:

Three sites, two distinct resolution strategies, no centralization. A fourth consumer will pick one arbitrarily.

Design. New module lib/galaxy/managers/tool_execution.py (cleaner than colocating with tool_source.py — the helper is about producing a Tool from execution-time identity, not about ToolSource per se).

def tool_for_execution(
    app,
    toolbox: AbstractToolBox,
    *,
    tool_id: Optional[str],
    tool_version: Optional[str] = None,
    dynamic_tool: Optional[DynamicTool] = None,
    tool_source: Optional[ToolSource] = None,
    prefer: Optional[Literal["toolbox", "model"]] = None,
) -> Optional[Tool]:
    """Resolve a Tool for a captured execution event.

    prefer=None infers: model when tool_source is supplied, else toolbox.
    Pass prefer explicitly to override (e.g. toolbox-first with
    tool_source as a fallback).
    """

Reuse cached_create_tool_from_representation from celery/tasks.py for the model path (caching parity with the celery worker). If the lru_cache location is awkward to import, lift it into the new module instead.

Call site migration.

After migration, delete the two private helpers in extract.py.

Tests.

Effort. ~2-3 hours including tests.

Item 4: Unified producer identity on TES.id

Problem. Graph encodes producer nodes with TES.id (TOOL_EXECUTION_STATE_ENCODE_KIND). Extraction sorts work items by Job.id for job/ICJ items and ToolRequest.id for TR items, and explicitly rejects payloads mixing the two spaces. Same execution event, two identities depending on the consumer. Blocks graph-node → extraction-unit linking and forces mix-guards that exist solely because of the id-space split.

Design.

TES.id monotonicity for dependency ordering. TES is minted at request creation (TR path) or step scheduling (workflow path), both of which precede any Job in the same step. Across steps, the workflow scheduler walks dependency order so a downstream step’s TES is minted after its upstream’s. Job.id today is a heuristic in the same way — swapping to TES.id preserves the heuristic without weakening it.

Pre-TES legacy historical executions. Workflow tool-step executions that predate TES have no link. They are reachable only via the legacy fallback (workflow_extraction_fallback_to_legacy_state).

Two options:

Recommend 5a + flip workflow_extraction_fallback_to_legacy_state default to false in the same change. Greenfield consumer API; the flag was a transitional safety net, and structured capture has been landing on every new execution. Pre-TES historical extraction was never the target user flow for extract_by_ids (the HID-based extract_steps path is the historical one and is unaffected).

Tests.

Effort. ~1 day. The work is mostly test churn: many extraction tests assert specific ordering behavior; they’ll need to switch from “job_ids in this order” to “TES.id implied order.”

Item 5: Discriminated resolver outcome

Problem. resolve_structured_request returns Optional[ResolvedStructuredRequest]. None collapses three distinct failure modes: TES missing entirely, TES exists with state=not_validated (skipped conditional step), TES exists with state=validation_failed (capture attempted, payload rejected), TES exists but request is not a dict (post-validation invariant break). Consumers can’t tell them apart. The validity enum (TES.state) was introduced to express the “missing vs. attempted vs. validated” distinction, but the resolver throws that distinction away.

Design. Always return a ResolvedStructuredRequest; never None. Add a ResolutionState enum carrying both ToolExecutionStateValidity values plus a MISSING sentinel for the no-TES case and MALFORMED for the post-validation invariant break.

# lib/galaxy/managers/workflow_request_state.py
class ResolutionState(str, Enum):
    VALIDATED = "validated"           # mirror of ToolExecutionStateValidity
    NOT_VALIDATED = "not_validated"   # mirror
    VALIDATION_FAILED = "validation_failed"  # mirror
    MISSING = "missing"               # no TES row at all (pre-TES historical)
    MALFORMED = "malformed"           # TES validated but request is not a dict

class ResolvedStructuredRequest(NamedTuple):
    state: ResolutionState
    source_id: Optional[int] = None    # TES.id when state != MISSING
    payload: Optional[dict] = None     # set only when state == VALIDATED

resolve_structured_request_payload(...) -> Optional[dict] keeps the existing return shape — thin convenience for callers that only want the validated payload. tool_request_payload(tool_request) already raises RequestInternalToWorkflowStateError on missing/malformed — unchanged.

Consumer use.

Tests.

Effort. ~half day. Mostly mechanical at the consumer end (if result is Noneif result.state != VALIDATED).

Suggested ordering and PR shape

PRItemsWhy grouped
1#1, #2Pure docs; safe to merge while design discussions on 4-6 continue.
2#3Pure refactor, no behavior change. Lands before #4 so the centralized helper is in place.
3#4Substantive consumer-API change; deserves its own PR + reviewers.
4#5Resolver-shape change; touches both consumers, but smaller blast radius than #4.

Total: ~2-3 days of work spread across 4 PRs.

Unresolved questions