USER_DEFINED_TOOL_STEP_VALIDATION

User-Defined Tool Step Validation in workflow_state

Date: 2026-05-22 Branch: wf_tool_state Companion docs:


1. Problem

In a running Galaxy, a user-defined tool (UDT, class: GalaxyUserTool) looks indistinguishable from a regular tool to the workflow editor — the step carries a tool_uuid, the editor resolves the UUID against the per-user DynamicTool table, and the form/connection logic uses the parsed parameter model the same way it would for a ToolShed tool. The dynamic-tool table is the resolver.

Our offline CLI stack has no such resolver. GetToolInfo (lib/galaxy/tool_util/workflow_state/_types.py:32-35) looks tools up by (tool_id, tool_version), and its implementations (ToolShedGetToolInfo, CombinedGetToolInfo) hit the Tool Shed 2.0 API or a configured Galaxy. UDTs are in neither: they live only in dynamic_tool.value on the originating Galaxy, scoped to the creating user.

Galaxy’s own workflow exporter does the right thing: when a step has a dynamic_tool, it inlines the full tool YAML into step_dict["tool_representation"] and clears tool_id / tool_uuid / content_id (lib/galaxy/managers/workflows.py:1713-1721). gxformat2 round-trips that field (native tool_representation ↔ format2 run: GalaxyUserToolStub, gxformat2/normalized/_conversion.py:581-644, gxformat2/schema/native_strict.py:354). So the workflow document is self-contained — the resolver gap is purely on our side.

1.1 Current workflow_state behavior on a UDT step

UDT steps are silently skipped today, producing false-negative passes:

The result: a workflow that embeds a broken UserToolSource (undeclared input refs, blank version, missing output claim, malformed citation, weird container shape, …) passes gxwf-state-validate --strict-state clean. Any downstream consumer — VS Code, IWC lint-on-merge (D7), the gxformat2 IWC migration (D10) — inherits the false negative.

2. Goal

Make every workflow_state operation that resolves a tool also resolve UDT (and admin dynamic-tool) steps from their inline tool_representation, so a workflow that embeds a UDT is validated, lint-checked, connection-checked, and round-tripped with the same rigor as one that references a ToolShed tool. A workflow that POSTs cleanly through /api/unprivileged_tools should pass gxwf-state-validate; a workflow that does not, should fail.

Concretely:

  1. Resolve — produce a ParsedTool from step.tool_representation (or format2 step.run) without touching the network.
  2. Validate the source — run the UserToolSource / YamlToolSource pydantic gate plus lint_user_tool_source against the embedded YAML, surfacing the same friendly bullets PR #22615 introduced.
  3. Validate the state — feed the locally parsed tool through the existing native / format2 state validators.
  4. Validate connections — extend the connection graph so UDT outputs carry their declared formats and feed downstream type checks.
  5. Round-trip — confirm tool_representation survives native → format2 → native with no diff (gxformat2 already does this; lock with a fixture).
  6. Export schemas: galaxy-tool-cache learns to emit WorkflowStepToolState JSON Schemas for inline UDTs, so the JSON-Schema validation backend and VS Code can validate UDT step state without a tool cache.

3. Non-goals

4. Background — the discriminator

The native step is an inline UDT iff:

step.tool_representation is not None
  AND step.tool_representation["class"] == "GalaxyUserTool"

The native step carries an admin dynamic tool (out of scope per §3) iff:

step.tool_representation is not None
  AND step.tool_representation["class"] == "GalaxyTool"

Admin steps are detected only to surface the inline_source_unsupported warning; we do not parse them or feed them into the connection graph.

We resolve through UserToolSource.model_validate(...) directly — not the DynamicToolSources discriminated union (tool_util_models/__init__.py:325), since the admin class is skipped:

from galaxy.tool_util_models import UserToolSource
UserToolSource.model_validate(tool_representation)

The format2 step is an inline UDT iff step.run is a GalaxyUserToolStub or a dict with class: GalaxyUserTool (gxformat2/normalized/_format2.py:138-148, _conversion.py:1279-1282).

When both tool_id and tool_representation are set on a native step (rare: Galaxy’s exporter clears tool_id for UDTs, but third-party exporters might not), tool_representation wins. It is the user’s per-instance canonical copy; tool_id is just for human display.
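The discriminator and the precedence rule above can be sketched against raw native step dicts. This is a minimal illustration, not the workflow_state implementation: classify_native_step and resolution_source are hypothetical names, steps are modeled as plain dicts, and the tool ids in the examples are made up.

```python
from typing import Optional

INLINE_CLASSES = ("GalaxyUserTool", "GalaxyTool")


def classify_native_step(step: dict) -> Optional[str]:
    """Return the inline tool class of a native step dict, or None."""
    representation = step.get("tool_representation")
    if not isinstance(representation, dict):
        return None
    tool_class = representation.get("class")
    return tool_class if tool_class in INLINE_CLASSES else None


def resolution_source(step: dict) -> str:
    """Decide where the tool definition for a step should come from."""
    tool_class = classify_native_step(step)
    if tool_class == "GalaxyUserTool":
        # tool_representation wins even when a tool_id is also present.
        return "inline"
    if tool_class == "GalaxyTool":
        # Admin dynamic tool: detected, but only reported as unsupported.
        return "unsupported"
    return "remote" if step.get("tool_id") else "none"


# Hypothetical steps for illustration only.
udt_step = {"tool_id": "cat_udt", "tool_representation": {"class": "GalaxyUserTool"}}
admin_step = {"tool_representation": {"class": "GalaxyTool"}}
shed_step = {"tool_id": "cat1", "tool_version": "1.0"}

print(resolution_source(udt_step))    # inline
print(resolution_source(admin_step))  # unsupported
print(resolution_source(shed_step))   # remote
```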

4.1 The bridge to ParsedTool

Galaxy already does this: lib/galaxy/tools/__init__.py:653-660 constructs a YamlToolSource(tool_representation) from a DynamicTool.value and parses it. The same call works for us:

from galaxy.tool_util.parser.yaml import YamlToolSource
from galaxy.tool_util.model_factory import parse_tool

tool_source = YamlToolSource(tool_representation)
parsed_tool = parse_tool(tool_source)

parse_tool is in lib/galaxy/tool_util/model_factory.py:26 and returns a ParsedTool, which is exactly what GetToolInfo.get_tool_info returns. So no new abstraction is needed — we just inject a fast path before the remote fetch.

5. Design

5.1 Layer 1 — utility helpers (_util.py)

Add small, focused helpers next to step_tool_id / step_tool_state:

def step_tool_representation(step: StepLike) -> Optional[dict]: ...
def step_is_inline_tool(step: StepLike) -> bool: ...
def step_inline_tool_class(step: StepLike) -> Optional[Literal["GalaxyUserTool", "GalaxyTool"]]: ...

Both native (model + raw dict) and format2 (NormalizedWorkflowStep.run = GalaxyUserToolStub | dict) forms are normalized by these. Tests in test_workflow_state_helpers.py.

5.2 Layer 2 — inline tool resolver

New internal module workflow_state/_inline_tool.py (underscore-prefixed; Phase A surface is internal until Phase B settles the call site):

InlineToolSourceResult is a Pydantic report model and is the only public symbol from Phase A — it round-trips through JSON / Markdown reports. The two functions stay private until Phase B promotes them with their final call signatures.

5.2.1 Extend lint_user_tool_source to preserve severity

Required Galaxy-side change in lib/galaxy/tool_util/lint.py. Today the function (:319-337) returns a flat List[str] with error_messages + warn_messages concatenated:

def lint_user_tool_source(user_tool_source):
    # ...
    return error_messages + warn_messages

Change to return both lists (additive — back-compat for existing one-call site in managers/tools.py via a shim):

def lint_user_tool_source_structured(
    user_tool_source, *, skip_network: bool = True
) -> tuple[list[str], list[str]]:
    skip = list(NETWORK_LINTERS) if skip_network else []
    # ... build lint_ctx with skip_types=skip
    return error_messages, warn_messages

def lint_user_tool_source(user_tool_source):  # back-compat
    errors, warnings = lint_user_tool_source_structured(user_tool_source)
    return errors + warnings

The skip_network parameter is new — current callers (managers/tools.py, agents/custom_tool.py) get the old skip-always behavior; workflow_state passes skip_network=offline_mode.

5.3 Layer 3 — step-shaped resolution

GetToolInfo stays a typing.Protocol (structural typing matters — external consumers satisfy it without subclassing). The step-shaped resolution is a free function, not a protocol extension:

# workflow_state/_inline_tool.py
def resolve_for_step(
    get_tool_info: GetToolInfo, step: StepLike, *, offline: bool = False
) -> Optional[ParsedTool]:
    if step_is_inline_tool(step):
        if step_inline_tool_class(step) != "GalaxyUserTool":
            return None  # admin class: emit inline_source_unsupported diagnostic at caller
        return _parse_inline_tool(step_tool_representation(step))
    tool_id = step_tool_id(step)
    if not tool_id:
        return None
    return get_tool_info.get_tool_info(tool_id, step_tool_version(step))

Per-invocation memoization is layered on top in a _ResolveCache(get_tool_info) wrapper used by tree-mode entry points. It is keyed by step identity (id(step)); content-hash dedupe is deferred until profiles say it matters.
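A minimal sketch of identity-keyed memoization follows. It assumes the resolver is any callable taking a step; the class body, fake_resolve, and the example steps are hypothetical stand-ins, not the actual _ResolveCache.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ResolveCache:
    """Memoize a step resolver by step identity for one CLI invocation."""

    def __init__(self, resolve: Callable[[Any], Optional[Any]]):
        self._resolve = resolve
        # Keep the step alongside the result: holding a reference stops
        # Python from reusing id(step) for a different object mid-run.
        self._entries: Dict[int, Tuple[Any, Optional[Any]]] = {}

    def __call__(self, step: Any) -> Optional[Any]:
        key = id(step)
        if key not in self._entries:
            self._entries[key] = (step, self._resolve(step))
        return self._entries[key][1]


calls = []


def fake_resolve(step: dict) -> Optional[str]:
    """Stand-in for an expensive parse; records each real invocation."""
    calls.append(step["label"])
    return f"parsed:{step['label']}"


cache = ResolveCache(fake_resolve)
step_a = {"label": "a"}
step_b = {"label": "a"}  # equal content, different object

cache(step_a)
cache(step_a)  # served from the cache
cache(step_b)  # identity differs, so this parses again
print(len(calls))  # 2
```

Keying on identity means two steps with identical content are parsed twice, which is the cost the deferred content-hash dedupe would remove.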

5.4 Callers to update

Every site that currently consults GetToolInfo. All switch to resolve_for_step(get_tool_info, step, offline=...):

| Site | Today | After |
| --- | --- | --- |
| validation_native.get_parsed_tool_for_native_step (validation_native.py:65-70) | short-circuit on tool_id=None | call resolve_for_step; inline UDT steps now parsed locally |
| validation_format2.validate_step_format2 (validation_format2.py:69-73) | short-circuit on not tool_id | call resolve_for_step; GalaxyUserToolStub run resolves through the same helper |
| connection_graph._resolve_tool_step (connection_graph.py:179-213, inner check at :190) | gated on step.tool_id | use resolve_for_step; resolved ParsedTool drives output type propagation |
| clean.py step cleaner | short-circuits on tool_id=None (currently broken on UDT; Phase B side-effect fix) | resolve_for_step for stale-key classification of step.tool_state. Embedded tool_representation itself is never touched. |
| roundtrip.py | GetToolInfo for round-trip diff classification | resolve_for_step; inline-tool round-trip classified the same way |
| to_native_stateful.py (lines 46, 83, 120, 357) | resolves tools for format2→native encoding callbacks | resolve_for_step everywhere GetToolInfo is used; ensures inline-UDT format2 workflows produce correct native output |
| connection_validation.py (lines 29, 104, 114) | per-connection type validation via GetToolInfo | resolve_for_step; inline-UDT outputs participate in connection type checking |
| lint_stateful.py | GetToolInfo for state lint | add a new InlineSourceLint phase that runs _validate_inline_tool_source on every inline step; structural+state lint paths use resolve_for_step |
| cache.populate_workflow | walks steps, calls add_tool(tool_id) | skip inline steps; emit them in a new inline_tools[] inventory section |
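The cache.populate_workflow change listed above amounts to partitioning steps into tools to fetch and inline tools to inventory. The sketch below is illustrative only: partition_steps is a hypothetical name, steps are plain dicts, and the fields of each inventory entry are an assumption rather than the actual inline_tools[] shape.

```python
from typing import Dict, List, Tuple


def partition_steps(steps: List[dict]) -> Tuple[List[Tuple[str, str]], List[Dict[str, str]]]:
    """Split steps into (tool_id, version) pairs to fetch and an inline inventory."""
    to_fetch: List[Tuple[str, str]] = []
    inline_tools: List[Dict[str, str]] = []
    for index, step in enumerate(steps):
        representation = step.get("tool_representation")
        if isinstance(representation, dict):
            # Inline tool: never looked up remotely, only listed.
            inline_tools.append(
                {
                    "step": str(step.get("id", index)),
                    "class": str(representation.get("class")),
                    "tool_id": str(representation.get("id")),
                }
            )
        elif step.get("tool_id"):
            to_fetch.append((step["tool_id"], step.get("tool_version") or ""))
    return to_fetch, inline_tools


# Hypothetical workflow steps for illustration.
steps = [
    {"id": 0, "tool_id": "cat1", "tool_version": "1.0.0"},
    {"id": 1, "tool_representation": {"class": "GalaxyUserTool", "id": "cat_udt"}},
    {"id": 2, "type": "data_input"},
]
to_fetch, inline_tools = partition_steps(steps)
print(to_fetch)      # [('cat1', '1.0.0')]
print(inline_tools)  # [{'step': '1', 'class': 'GalaxyUserTool', 'tool_id': 'cat_udt'}]
```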

validation_tests.py is unaffected — workflow-test files don’t reference tools directly.

Subworkflow recursion is automatic. validation_native.py:79-86 recurses into step.subworkflow and threads get_tool_info through; connection_graph._resolve_subworkflow_step (connection_graph.py:216+) builds an inner graph with the same resolver. Once resolve_for_step is the single entry point, a UDT step inside a subworkflow is resolved the same as a top-level one. The §9.8 UDT-in-subworkflow fixture locks this.

Precheck / legacy detection. legacy_encoding.py classifies tool_state encoding; a UDT step’s tool_state is created post-PR-22615 and is modern-by-construction. precheck_native_workflow scans for legacy encoding signals only, not for missing tool_id, so inline UDTs do not trigger precheck skips today. No change needed — confirm with a test fixture (test_precheck.py gains udt_step_not_flagged_as_legacy).

5.5 Validation surface — what we surface where

We want four distinct categories in reports, not collapsed into a single bucket: three new inline-source diagnostics plus the existing state errors. They surface through the same Pydantic report models that already drive --report-json and the Jinja2 markdown:

  1. Inline-source errors (UserToolSource.model_validate fails on tool_representation). Bullet form: <step_id>/<tool_id>: <dotted.loc>: <pydantic_msg>. New StepDiagnostic type inline_source_invalid. Severity: error (workflow is structurally unsound).

  2. Inline-source lint findings (errors + warnings from lint_user_tool_source_structured). Bullet form: <step_id>/<tool_id>: <linter_name>: <message>. Severity preserved via the new structured-lint API (§5.2.1). Diagnostic types inline_source_lint_error / inline_source_lint_warning.

  3. Inline-source unsupported (class: GalaxyTool admin dynamic tool detected). New StepDiagnostic type inline_source_unsupported. Severity: warning. Tool is skipped from validation and connection graph; warning lands on the step.

  4. State validation errors against the inline tool model. Identical to today’s WorkflowStepNativeToolState / WorkflowStepLinkedToolState errors — no new diagnostic shape, just no longer skipped.
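The bullet forms for the first two categories can be sketched as follows. This assumes source errors arrive as dicts with loc and msg keys (the shape pydantic's ValidationError.errors() returns) and that lint findings arrive as (linter_name, message) pairs; the pair shape, the Diagnostic dataclass, both function names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Diagnostic:
    type: str
    severity: str
    message: str


def source_diagnostics(step_id: str, tool_id: str, errors: Sequence[Dict[str, Any]]) -> List[Diagnostic]:
    """Render pydantic-style error dicts as inline_source_invalid bullets."""
    diagnostics = []
    for error in errors:
        dotted_loc = ".".join(str(part) for part in error["loc"])
        message = f"{step_id}/{tool_id}: {dotted_loc}: {error['msg']}"
        diagnostics.append(Diagnostic("inline_source_invalid", "error", message))
    return diagnostics


def lint_diagnostics(
    step_id: str,
    tool_id: str,
    errors: Sequence[Tuple[str, str]],
    warnings: Sequence[Tuple[str, str]],
) -> List[Diagnostic]:
    """Render (linter_name, message) pairs, keeping errors and warnings apart."""
    diagnostics = []
    for severity, findings in (("error", errors), ("warning", warnings)):
        for linter_name, text in findings:
            message = f"{step_id}/{tool_id}: {linter_name}: {text}"
            diagnostics.append(Diagnostic(f"inline_source_lint_{severity}", severity, message))
    return diagnostics


invalid = source_diagnostics("1", "cat_udt", [{"loc": ("outputs", 0, "type"), "msg": "Field required"}])
linted = lint_diagnostics("1", "cat_udt", [], [("ExampleLinter", "example warning text")])
print(invalid[0].message)  # 1/cat_udt: outputs.0.type: Field required
print(linted[0].type)      # inline_source_lint_warning
```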

Strictness wiring:

--strict shorthand promotes all four axes (structure, encoding, state, inline-source).
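One way to model the shorthand is a small value object with one boolean per axis. The sketch below is hypothetical: Strictness and resolve_strictness are illustrative names, and only the four axis names come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strictness:
    structure: bool = False
    encoding: bool = False
    state: bool = False
    inline_source: bool = False


def resolve_strictness(strict: bool = False, **axes: bool) -> Strictness:
    """Combine the strict shorthand with per-axis opt-ins."""
    if strict:
        # The shorthand promotes every axis, whatever was set individually.
        return Strictness(structure=True, encoding=True, state=True, inline_source=True)
    return Strictness(**axes)


print(resolve_strictness(strict=True))
print(resolve_strictness(inline_source=True))
```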

5.6 Connection graph for inline UDT outputs

Once _resolve_tool_step returns a ParsedTool for an inline step, the connection graph just works — outputs carry their format declarations (from IncomingToolOutput), format_source: <input_name> resolves the same way, data_collection types feed into connection_types. No new connection logic required.

One subtlety: UDTs declare outputs via from_work_dir / discover_datasets, not via <data> / <collection> like XML. The IncomingToolOutput model already normalizes both; parse_tool produces the same ToolOutput shape on both sides.

5.7 Revalidation and read-only posture

Revalidation is unconditional: validate_inline_tool_source runs on every CLI invocation that touches a workflow with an inline UDT. Rationale:

lift_user_tool_source (tool_util_models/__init__.py:387) lifts known-drift cases out before validation. We do not call it: workflow_state is read-only and reports drift via inline_source_invalid diagnostics. The user updates the workflow upstream; we never rewrite their authored YAML.

Admin dynamic tools (class: GalaxyTool) are detected and emitted as inline_source_unsupported warnings — out of scope per §3.

6. gxformat2 enhancements

gxformat2 already does the heavy lifting on round-trip — what’s missing is the small ergonomic API for downstream consumers:

6.1 Step-level helpers

Add on NormalizedNativeStep:

@property
def is_inline_tool_step(self) -> bool:
    return bool(self.tool_representation and self.tool_representation.get("class") in ("GalaxyUserTool", "GalaxyTool"))

@property
def inline_tool_class(self) -> Optional[str]:
    return self.tool_representation.get("class") if self.tool_representation else None

And on NormalizedWorkflowStep (format2):

@property
def is_inline_tool_step(self) -> bool:
    if isinstance(self.run, GalaxyUserToolStub):
        return True
    if isinstance(self.run, dict) and self.run.get("class") in ("GalaxyUserTool", "GalaxyTool"):
        return True
    return False

@property
def inline_tool_representation(self) -> Optional[dict]:
    if isinstance(self.run, GalaxyUserToolStub):
        return self.run.model_dump(by_alias=True, exclude_none=True)
    if isinstance(self.run, dict) and self.run.get("class") in ("GalaxyUserTool", "GalaxyTool"):
        return self.run
    return None

Keeps workflow_state from special-casing dict vs. stub everywhere.

6.2 GalaxyUserToolStub schema tightening — deferred decision

Today GalaxyUserToolStub(extra="allow") preserves arbitrary fields. Two options:

Recommendation: keep loose. The pydantic schema lives in galaxy.tool_util_models; gxformat2 importing it would invert the dependency. workflow_state is the right place to enforce the gate because it already depends on both.

6.3 Native ↔ format2 round-trip lock

Add tests/test_inline_tool_roundtrip.py in gxformat2 with the synthetic-user-defined-tool example (already at gxformat2/examples/format2/synthetic-user-defined-tool.gxwf.yml) plus a synthetic native fixture. Confirms tool_representation survives both directions with no field drift.

6.4 Optional: format2 schema docs

gxformat2/schema/native_strict.py:354 documents tool_representation; the format2-side docs for run: should mention the GalaxyUserToolStub shape explicitly. Doc-only.

7. CLI surface

7.1 Global --offline flag

A single global flag on gxwf (and galaxy-tool-cache) that disables all network access across subcommands.

Important contrast with current lint_user_tool_source behavior. The existing function (lint.py:319-337) always skips NETWORK_LINTERS because the interactive Galaxy tool editor cannot block save on third-party APIs. Workflow lint is a deeper, less interactive pass — EDAM and biotools checks are worth running. The plan therefore enables network linters in workflow lint by default and --offline restores the always-skip posture. This is the inverse of what early drafts of this plan implied.

The new lint_user_tool_source_structured(user_tool_source, *, skip_network=True) (§5.2.1) is the seam: workflow_state passes skip_network=offline_mode. The existing one-arg lint_user_tool_source shim continues to skip network linters always, so the interactive editor keeps its current behavior.

--offline also:

Implementation: an OfflineMode context object plumbed through ToolCacheOptions (_cli_common.py:183-195 adds one field). Threaded as a kwarg into resolve_for_step and into _validate_inline_tool_source. A test (test_offline_coverage.py) asserts every gxwf subcommand registers the --offline flag.
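Registering the flag on every subcommand can be done with a shared argparse parent parser. The sketch below is an assumption about the wiring, not the actual _cli_common.py code: build_parser and offline_mode are hypothetical helpers, the OfflineMode dataclass is a stand-in for the real context object, and only two subcommands are listed.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OfflineMode:
    """Stand-in context object carrying the resolved --offline setting."""

    enabled: bool = False


def build_parser() -> argparse.ArgumentParser:
    # A help-less parent parser lets every subcommand inherit --offline.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--offline", action="store_true", help="disable all network access")

    parser = argparse.ArgumentParser(prog="gxwf")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("state-validate", "lint-stateful"):
        sub = subparsers.add_parser(name, parents=[common])
        sub.add_argument("workflow")
    return parser


def offline_mode(argv: List[str]) -> OfflineMode:
    args = build_parser().parse_args(argv)
    return OfflineMode(enabled=args.offline)


print(offline_mode(["state-validate", "--offline", "wf.ga"]))  # OfflineMode(enabled=True)
print(offline_mode(["lint-stateful", "wf.ga"]))                # OfflineMode(enabled=False)
```

With this layout a coverage test only needs to iterate over the registered subcommands and parse --offline for each.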

7.2 gxwf-state-validate / gxwf state-validate

7.3 gxwf-lint-stateful / gxwf lint-stateful

7.4 galaxy-tool-cache populate-workflow

7.5 galaxy-tool-cache embedded-schema (new subcommand)

Existing: galaxy-tool-cache schema <tool_id> exports WorkflowStepToolState JSON Schema for a cached tool.

Add: galaxy-tool-cache embedded-schema <workflow_path> → emits a flat directory of per-step JSON Schemas. Filename convention:

<tool_id>.<version>.<step_id>.schema.json

Step id guarantees uniqueness even when two steps embed the same (tool_id, version). Tool id + version up front keeps the names self-describing for grep/glob workflows.

Backend consumer side. validation_json_schema.py’s two-level backend currently keys tool_schema_dir lookups on <tool_id>.<version>.schema.json. After this plan: the backend’s per-step Level-2 resolver gains an inline-tool branch — if step.tool_representation is set, look for <tool_id>.<version>.<step_id>.schema.json and use it; else fall back to the existing <tool_id>.<version>.schema.json (cacheable across workflows). One new lookup branch, both naming schemes coexist. Test: test_json_schema_inline.py::test_per_step_schema_lookup_prefers_inline_file (§9.7).
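The two naming schemes and the lookup preference can be sketched with pathlib. This is a hedged illustration: find_schema and the two name helpers are hypothetical, the tool id, version, and step id are made up, and the real backend may key its lookups differently.

```python
import tempfile
from pathlib import Path
from typing import Optional


def inline_schema_name(tool_id: str, version: str, step_id: str) -> str:
    return f"{tool_id}.{version}.{step_id}.schema.json"


def shared_schema_name(tool_id: str, version: str) -> str:
    return f"{tool_id}.{version}.schema.json"


def find_schema(schema_dir: Path, tool_id: str, version: str, step_id: str, is_inline: bool) -> Optional[Path]:
    """Prefer the per-step file for inline steps, then fall back to the shared one."""
    candidates = []
    if is_inline:
        candidates.append(schema_dir / inline_schema_name(tool_id, version, step_id))
    candidates.append(schema_dir / shared_schema_name(tool_id, version))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    schema_dir = Path(tmp)
    (schema_dir / "cat_udt.0.1.3.schema.json").write_text("{}")
    (schema_dir / "cat_udt.0.1.schema.json").write_text("{}")
    inline_hit = find_schema(schema_dir, "cat_udt", "0.1", "3", is_inline=True).name
    shared_hit = find_schema(schema_dir, "cat_udt", "0.1", "3", is_inline=False).name
    missing = find_schema(schema_dir, "other", "0.1", "3", is_inline=True)

print(inline_hit)  # cat_udt.0.1.3.schema.json
print(shared_hit)  # cat_udt.0.1.schema.json
```

Because versions themselves contain dots, the per-step name for version 0.1 at step 3 is textually identical to the shared name for a version 0.1.3; the sketch does not attempt to disambiguate the two.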

7.6 gxwf-roundtrip-validate

8. Library entry points

workflow_state.__init__ gains:

from .inline_tool import (
    InlineToolSourceResult,
    parse_inline_tool,
    validate_inline_tool_source,
    InlineAwareGetToolInfo,
)

So in-process consumers can run the gate without going through argparse.

9. Test plan

Red-to-green per ../../../.claude/CLAUDE.md preference.

Red-first tests are marked R below — they describe the current false-negative behavior and fail before the implementation lands. G tests are green-only-after-implementation.

9.1 Unit tests

| Test | R/G | What it asserts |
| --- | --- | --- |
| test_inline_tool_resolver.py::test_parse_minimal_user_tool | G | _parse_inline_tool({...}) returns a ParsedTool with declared inputs/outputs. |
| test_inline_tool_resolver.py::test_validate_source_blank_version | G | _validate_inline_tool_source flags blank version (PR #22615 rule). |
| test_inline_tool_resolver.py::test_validate_source_undeclared_input_ref | G | $(inputs.missing) in shell_command is reported. |
| test_inline_tool_resolver.py::test_validate_source_unclaimed_output | G | output without from_work_dir or discover_datasets is reported. |
| test_inline_tool_resolver.py::test_container_shape_lint_warning | G | container-shape linter warning surfaces via the new structured lint API. |
| test_inline_tool_resolver.py::test_admin_class_emits_unsupported_warning | G | class: GalaxyTool returns an inline_source_unsupported warning, not a validation pass. |
| test_inline_tool_resolver.py::test_offline_skips_network_linters | G | offline=True skips NETWORK_LINTERS (BioToolsValid, EDAMTermsValid). |
| test_inline_tool_resolver.py::test_lint_structured_returns_severity | G | New lint_user_tool_source_structured returns (errors, warnings) tuples. |
| test_resolve_for_step.py::test_inline_step_skips_fallback | G | resolve_for_step does not invoke get_tool_info.get_tool_info for inline steps. |
| test_resolve_for_step.py::test_non_inline_step_delegates | G | Regular steps still hit the fallback GetToolInfo. |

9.2 Native-state validation tests

| Test | R/G | What it asserts |
| --- | --- | --- |
| test_validation_native_inline.py::test_inline_state_validated | R | A workflow with a UDT step whose tool_state violates the schema is reported with a state error pointing at the step. (Currently passes silently.) |
| test_validation_native_inline.py::test_inline_state_clean | G | Cleanly-formed inline UDT step passes. |
| test_validation_native_inline.py::test_broken_source_no_longer_silent | R | Workflow with a UDT step containing $(inputs.missing) is reported, not silently accepted. |
| test_validation_native_inline.py::test_clean_strips_state_not_representation | G | state-clean cleans step.tool_state’s stale keys; step.tool_representation is byte-identical post-clean. |
| test_precheck.py::test_udt_step_not_flagged_as_legacy | G | precheck_native_workflow does not skip workflows whose only “missing tool_id” steps are inline UDTs. |

9.3 Format2-state validation tests

| Test | R/G | What it asserts |
| --- | --- | --- |
| test_validation_format2_inline.py::test_user_tool_stub_resolved | G | A format2 workflow with run: {class: GalaxyUserTool, ...} validates against the parsed tool. |
| test_validation_format2_inline.py::test_user_tool_stub_state_error | R | State mismatch with the inline schema is reported. (Currently passes silently.) |
| test_to_native_stateful_inline.py::test_format2_udt_to_native_roundtrip | G | Format2 UDT workflow → native via to_native_stateful produces correct tool_representation. |

9.4 Connection-graph tests

| Test | R/G | What it asserts |
| --- | --- | --- |
| test_connection_inline.py::test_inline_output_type_propagates | R | Downstream connection consuming an inline UDT output sees the correct format. (Currently sees no type.) |
| test_connection_inline.py::test_format_source_resolves_to_inline_input | G | format_source: <input_name> on an inline UDT output resolves against the inline input list. |
| test_connection_inline.py::test_diagnostic_attributed_to_consumer | G | Wrong-typed connection diagnostic lands on the consuming step, not the producing UDT. |
| test_connection_inline.py::test_udt_in_subworkflow_resolved | G | UDT step nested inside a subworkflow is resolved via the same resolve_for_step path. |

9.5 Round-trip + lint

| Test | What it asserts |
| --- | --- |
| test_roundtrip_inline.py::test_native_to_format2_to_native_preserves_representation | tool_representation survives both directions, byte-identical post-canonicalization. |
| test_lint_inline.py::test_inline_source_lint_phase | gxwf lint-stateful reports inline-source lint findings alongside structural lint. |

9.6 CLI integration

| Test | R/G | What it asserts |
| --- | --- | --- |
| test_gxwf_inline_cli.py::test_state_validate_strict_inline_source | G | --strict-inline-source promotes inline-source errors to failure exit code. |
| test_gxwf_inline_cli.py::test_cache_populate_skips_inline_tools | G | galaxy-tool-cache populate-workflow does not attempt to fetch inline UDTs from ToolShed. |
| test_gxwf_inline_cli.py::test_embedded_schema_dump | G | galaxy-tool-cache embedded-schema <wf> produces per-step JSON Schemas with the documented filename convention. |
| test_offline_coverage.py::test_every_subcommand_accepts_offline | G | Every registered gxwf / galaxy-tool-cache subcommand parses --offline without error. |
| test_offline_coverage.py::test_offline_skips_network_lint | G | With --offline, EDAM/biotools linters are not invoked. |
| test_offline_coverage.py::test_offline_skips_toolshed_fetch | G | With --offline, populate-cache does not hit ToolShed. |

9.7 JSON Schema validation backend

| Test | R/G | What it asserts |
| --- | --- | --- |
| test_json_schema_inline.py::test_per_step_schema_for_inline_udt | G | validate_native_workflow_json_schema validates inline-UDT step state against a schema generated from the inline tool. |
| test_json_schema_inline.py::test_per_step_schema_lookup_prefers_inline_file | G | With --tool-schema-dir, inline steps look up <tool_id>.<version>.<step_id>.schema.json first; non-inline steps still use <tool_id>.<version>.schema.json. |

9.8 Fixtures

9.9 IWC sweep — DROPPED (2026-05-22)

Original plan added a synthetic IWC-shaped fixture to test_iwc_sweep.py. Dropped along with Phase F: IWC corpus has zero UDTs today and this work is expected to land in CI before IWC ever embeds one. If real IWC workflows start embedding UDTs, add fixtures from the real corpus at that point.

10. Phasing

Phase A — helpers and source validation (smallest landing unit) ✅ LANDED

  1. _util.py helpers (step_tool_representation, step_is_inline_tool, step_inline_tool_class).
  2. Galaxy-side: extend lint_user_tool_source to surface severity (new lint_user_tool_source_structured, back-compat shim — §5.2.1). Touches lib/galaxy/tool_util/lint.py; existing one-arg callers unaffected.
  3. _inline_tool.py (underscore-prefixed): _parse_inline_tool, _validate_inline_tool_source, InlineToolSourceResult (only InlineToolSourceResult is public).
  4. Unit tests 9.1, plus fixtures.

No behavior change in any CLI yet. Pure additions. Lands clean.

Phase B — resolver injection ✅ LANDED

  1. Add resolve_for_step free function to _inline_tool.py; promote private function names if the call signatures have settled.
  2. Per-invocation _ResolveCache wrapper (memoize parse per step identity within one CLI invocation — avoid re-parsing the same tool_representation across validate/clean/connections phases).
  3. Update every GetToolInfo caller from §5.4 to use resolve_for_step: validation_native, validation_format2, connection_graph, clean.py, roundtrip.py, to_native_stateful.py, connection_validation.py, lint_stateful.py, cache.populate_workflow.
  4. clean.py’s short-circuit on tool_id=None is fixed as a side effect — its tool-resolution path now succeeds for inline UDTs, so stale-key classification works against the inline schema. The embedded tool_representation itself is never modified.
  5. Unit tests 9.2-9.4.

After Phase B, state, clean, and connection validation honor inline UDTs but report nothing extra about source/lint problems; those still flow as today through whichever path fails first. The GetToolInfo Protocol shape is unchanged (structural typing preserved for external consumers).

Phase B status (2026-05-22)

Files modified (wf_tool_state branch, uncommitted):

Tests added (all green):

Suite health: 605 passed, 16 skipped in test/unit/tool_util/workflow_state/ (excluding pre-existing unrelated fixture/binary failures in test_declarative.py / test_gxwf_cli.py).

Phase B no-ops (deferred per §10 phasing):

Plan deviations:

Known issues:

Deferred Phase B test items (out of scope for B, owned by C/E):

Phase C — diagnostics surface ✅ LANDED

  1. New diagnostic types in _report_models.py (inline_source_invalid, inline_source_lint_warning, inline_source_lint_error).
  2. Wire validate_inline_tool_source into validate.py and lint_stateful.py.
  3. New --strict-inline-source axis.
  4. JSON + Markdown report rendering for inline diagnostics.
  5. Tests 9.5-9.6.

Phase C status (2026-05-22)

Files modified (wf_tool_state branch, uncommitted):

Tests added (all green):

Plan deviations:

Phase D — cache and JSON Schema surface ✅ LANDED

  1. galaxy-tool-cache populate-workflow skips inline steps + emits inventory.
  2. galaxy-tool-cache list-inline-tools new subcommand.
  3. galaxy-tool-cache embedded-schema <workflow> new subcommand.
  4. Two-level JSON Schema validation backend learns to consume the new per-step schemas.
  5. Tests 9.7.

Phase D status (2026-05-22)

Files modified (wf_tool_state branch, uncommitted):

Tests added (all green):

Suite health (post C+D + triage): 644 passed, 16 skipped in test/unit/tool_util/workflow_state/ (was 605 at start of C). 16 skipped is pre-existing.

Plan deviations / carry-overs:

Suggested Phase E pickups (from C/D review):

Phase E — gxformat2 ergonomic additions ✅ LANDED (mostly)

  1. Add is_inline_tool_step / inline_tool_representation properties on the gxformat2 models.
  2. Round-trip lock test in gxformat2.
  3. Refactor workflow_state call sites to use the new properties (removes duplicated dict probing).
  4. Bump gxformat2 dep pin (already on a branch dep, so this rides the same bump).
  5. Widen ExpandedWorkflowStep.run to include GalaxyUserToolStub; add pass-through branch in _expand_format2. Revert ensure_format2(expand=False) Phase B workaround.

Phase E status (2026-05-22)

gxformat2 PR: galaxyproject/gxformat2#218 — “Less Broken User Defined Tool Support during Normalization”. Open, three commits, currently being polished by a parallel agent. CI status: Java/TS/codecov/build_packages green; Python CI matrix has one failure on 3.11 with the rest cancelled — needs triage before merge.

gxformat2 commits on parameter_models:

Galaxy-side commits on wf_tool_state:

Confirmed in tree:

Phase E carry-overs:

Phase E follow-up: InlineResolver cache wired (2026-05-22)

Promoted _ResolveCache → public InlineResolver (Phase B deviation finally addressed). Threaded through the walk entry points so per-step parse_tool(YamlToolSource(...)) calls are memoized across the multiple validation phases that touch one workflow.

Files modified (wf_tool_state branch, uncommitted):

Tests added (8 new, all green):

Concrete cache wins (locked by tests):

Known limitations (deliberate, locked by tests):

Phase E follow-up: §9.8 on-disk fixtures landed (2026-05-22)

Migrated the Python-dict synthesizers from test_inline_udt_workflows.py, test_inline_source_validation.py, and test_json_schema_inline.py to two canonical on-disk fixtures plus a shared loader. Removes ~150 lines of duplicated CAT_UDT / _native_with_inline_udt / _format2_with_inline_udt definitions.

Fixtures created:

The embedded UDT body matches test/functional/tools/cat_user_defined.yml (same id, version, container, shell_command) so the fixtures double as a coupling point between offline workflow_state validation and Galaxy’s live UDT framework tools.

Loader (test/unit/tool_util/workflow_state/inline_udt_fixtures.py):

Test refactors:

Declarative coverage added:

Suite health: 716 passed (was 714), 12 pre-existing failures unchanged.

Phase F — REMOVED FROM SCOPE (2026-05-22)

Original Phase F was: synthetic IWC-shaped fixture in the sweep, docs in doc/source/dev/wf_tooling.md, CURRENT_STATE D7/D10 note. Dropped because this work is expected to land in CI before IWC ever embeds its first UDT, so the synthetic-IWC sweep fixture would be regression-testing against a hypothetical. If real IWC workflows start embedding UDTs later, revisit with a fresh fixture pass against actual content. Docs and CURRENT_STATE notes belong in whatever PR ships this work, not in this plan.

11. Risk and back-out

12. Decisions (resolved 2026-05-22)

Two interview passes with jmchilton, plus a review pass, captured here so Phase A implementation has unambiguous direction.

12.1 First-pass decisions

12.2 Review-pass decisions