LINTING_OVERHAUL_PLAN

Workflow Linting Overhaul Plan

Goal: give every workflow lint rule a stable identity and per-rule metadata, so that (a) declarative tests can assert against rule IDs instead of message substrings, (b) the cross-project rule inventory is no longer hand-maintained, and (c) VS Code can map messages to document ranges via a json_pointer carried on each message.

Pattern is forked from Galaxy’s galaxy.tool_util.lint — we take the ideas, not the dependency. Per-rule-subclass dispatch is explicitly rejected (Galaxy runs ~100 independent tree walks; gxformat2 does one). We keep the single-traversal call structure and attach per-rule metadata classes as lookup-only records.

Decisions already pinned

  1. Fork the pattern — gxformat2 does not depend on galaxy.tool_util.lint.
  2. Keep single-traversal linting. Linter subclasses are metadata holders only, not dispatch units. Future refactor to per-rule passes is allowed but not in scope here.
  3. Galaxy-style naming — rule ID = class name. NativeStepKeyNotInteger, WorkflowMissingLicense, etc. No A7_ numeric prefix.
  4. Profiles live in an external shared JSON mapping, not on the class. Three profiles:
    • structural — correctness gates. Workflow is broken if these fire.
    • best-practices — style / discoverability. IWC publishes these.
    • release — stricter gates that only apply before tagging a release (today: release: field present). Exactly one rule at start; profile exists so we can grow into it.
    • IWC profile = best-practices ∪ release.
    • Stateful checks (Galaxy-side tool state validation) classify as structural by default — they’re correctness gates against tool definitions, not style.

Glossary


Part 1 — gxformat2 (Python)

1.1 Shape of the new linting.py

Replace the string-bag LintContext with a structured-message version. Keep the old API surface (ctx.error(msg), ctx.warn(msg)) working as legacy shims so the first patch is purely additive.

# gxformat2/linting.py (new)

from __future__ import annotations
from enum import IntEnum
from typing import ClassVar, List, Optional


class LintLevel(IntEnum):
    SILENT = 5
    ERROR = 4
    WARN = 3
    INFO = 2
    VALID = 1
    ALL = 0


class Linter:
    """Metadata-only base class for lint rules.

    Subclasses never need to be instantiated. They exist so rule identity,
    default severity, and human docs live in one place next to the code
    that emits the message.
    """

    # class name is the id; override only if needed
    severity: ClassVar[str] = "error"             # "error" | "warning" | "info"
    applies_to: ClassVar[List[str]] = ["format2", "native"]
    default_profiles: ClassVar[List[str]] = ["structural"]
    description: ClassVar[str] = ""

    @classmethod
    def id(cls) -> str:
        return cls.__name__


class LintMessage:
    def __init__(
        self,
        level: str,
        message: str,
        linter: Optional[str] = None,
        json_pointer: Optional[str] = None,
    ):
        self.level = level
        self.message = message
        self.linter = linter
        self.json_pointer = json_pointer

    # Keep substring equality so legacy `"foo" in ctx.error_messages` tests work.
    def __eq__(self, other):
        if isinstance(other, str):
            return other in self.message
        if isinstance(other, LintMessage):
            return self.message == other.message and self.linter == other.linter
        return False


class LintContext:
    def __init__(self, level: str = "warn", training_topic: Optional[str] = None,
                 skip_rules: Optional[List[str]] = None, only_rules: Optional[List[str]] = None):
        self.level = level
        self.training_topic = training_topic
        self.messages: List[LintMessage] = []
        self.skip_rules = set(skip_rules or [])
        self.only_rules = set(only_rules) if only_rules is not None else None

    def _should_emit(self, linter_id: Optional[str]) -> bool:
        if linter_id is None:
            return True                            # legacy unlabeled messages always emit
        if self.only_rules is not None and linter_id not in self.only_rules:
            return False
        return linter_id not in self.skip_rules

    def _emit(self, level: str, message: str, linter: Optional[str], json_pointer: Optional[str]):
        if self._should_emit(linter):
            self.messages.append(LintMessage(level, message, linter=linter, json_pointer=json_pointer))

    def error(self, message: str, *args, linter: Optional[str] = None,
              json_pointer: Optional[str] = None, **kwds):
        if args or kwds:
            message = message.format(*args, **kwds)
        self._emit("error", message, linter, json_pointer)

    def warn(self, message: str, *args, linter: Optional[str] = None,
             json_pointer: Optional[str] = None, **kwds):
        if args or kwds:
            message = message.format(*args, **kwds)
        self._emit("warning", message, linter, json_pointer)

    # --- Backward-compat views used by tests/test_interop_tests.py ---
    @property
    def error_messages(self) -> List[str]:
        return [m.message for m in self.messages if m.level == "error"]

    @property
    def warn_messages(self) -> List[str]:
        return [m.message for m in self.messages if m.level == "warning"]

    @property
    def found_errors(self) -> bool:
        return any(m.level == "error" for m in self.messages)

    @property
    def found_warns(self) -> bool:
        return any(m.level == "warning" for m in self.messages)

Notes:

1.2 Rule registry

Each rule gets a small class next to its call site. The body stays inline in the single traversal — the class is just metadata.

# gxformat2/lint.py (diff against today)

class NativeStepKeyNotInteger(Linter):
    severity = "error"
    applies_to = ["native"]
    default_profiles = ["structural"]
    description = "Native workflow step keys must be numeric strings ('0', '1', ...)."


# call site inside lint_ga():
for order_index_str, step in nnw.steps.items():
    if not order_index_str.isdigit():
        lint_context.error(
            "expected step_key to be integer not [{value}]",
            value=order_index_str,
            linter=NativeStepKeyNotInteger.id(),
            json_pointer=f"/steps/{order_index_str}",
        )

The initial rule classes map 1:1 to the inventory in cross-project-linting.md:

Rule classProfile defaultInventory ID
NativeMissingMarkerstructuralA3
NativeMarkerNotTruestructuralA4
NativeMissingFormatVersionstructuralA5
NativeFormatVersionNotSupportedstructuralA6
NativeMissingStepsstructuralA2/A3 variant
NativeStepKeyNotIntegerstructuralA7
SubworkflowMissingStepsstructuralA8
StepExportErrorstructuralA9 (warning)
StepUsesTestToolshedstructuralA10 (warning)
WorkflowHasNoOutputsstructuralA11 (warning)
WorkflowOutputMissingLabelstructuralA12 (warning)
OutputSourceNotFoundstructuralA13
InputDefaultTypeInvalidstructuralA14
ReportMarkdownNotStringstructuralA15
ReportMarkdownInvalidstructuralA16
SchemaStrictExtraFieldstructural (warning)A17
SchemaStructuralErrorstructuralA18
Format2MissingClassstructuralA1
Format2MissingStepsstructuralA2
WorkflowMissingAnnotationbest-practicesB1
WorkflowMissingCreatorbest-practicesB2
WorkflowMissingLicensebest-practicesB3
WorkflowMissingReleasereleaseB4
CreatorIdentifierNotURIbest-practicesB5
StepMissingLabelbest-practicesB6
StepMissingAnnotationbest-practicesB7
StepInputDisconnectedbest-practicesB8
StepToolStateUntypedParameterbest-practicesB9
StepPJAUntypedParameterbest-practicesB10
TrainingWorkflowMissingTagbest-practicesB11
TrainingWorkflowMissingDocbest-practicesB12

All classes live in a single gxformat2/lint_rules.py module so that gxformat2/lint.py imports them near the top of the file (no inline imports) and downstream (Galaxy, TS codegen tooling) can enumerate them via Linter.__subclasses__().

1.3 External profile JSON

Shared, format-neutral, checked into gxformat2 so both Python and TS consume the same file.

# gxformat2/schema/linting/profiles.yml
structural:
  - NativeMissingMarker
  - NativeMarkerNotTrue
  - NativeMissingFormatVersion
  - NativeFormatVersionNotSupported
  - NativeStepKeyNotInteger
  - NativeMissingSteps
  - SubworkflowMissingSteps
  - StepExportError
  - StepUsesTestToolshed
  - WorkflowHasNoOutputs
  - WorkflowOutputMissingLabel
  - OutputSourceNotFound
  - InputDefaultTypeInvalid
  - ReportMarkdownNotString
  - ReportMarkdownInvalid
  - SchemaStrictExtraField
  - SchemaStructuralError
  - Format2MissingClass
  - Format2MissingSteps
  # Stateful rules (owned by Galaxy, registered here so profile membership is canonical)
  - ToolStateUnknownParameter
  - ToolStateInvalidValue
  - ToolNotInCache
  - StaleToolStateKeys
  - StepConnectionsInconsistent

best-practices:
  - WorkflowMissingAnnotation
  - WorkflowMissingCreator
  - WorkflowMissingLicense
  - CreatorIdentifierNotURI
  - StepMissingLabel
  - StepMissingAnnotation
  - StepInputDisconnected
  - StepToolStateUntypedParameter
  - StepPJAUntypedParameter
  - TrainingWorkflowMissingTag
  - TrainingWorkflowMissingDoc

release:
  - WorkflowMissingRelease

# Convenience profile assembled from the above; IWC = best-practices + release
iwc:
  extends:
    - best-practices
    - release

Loader sits in gxformat2/lint_profiles.py:

def load_profiles() -> dict[str, set[str]]:
    """Return profile-name -> set-of-rule-ids, with `extends` flattened."""
    ...

def rules_for_profiles(names: Iterable[str]) -> set[str]:
    ...

CLI: gxwf lint --profile structural,best-practices path.yml. Deprecate --skip-best-practices in favor of the profile flag but keep it as alias for one release.

1.4 lint_stateful_lite wrapper for declarative tests

tests/test_interop_tests.py today returns {errors, warnings, error_count, warn_count}. Extend to also return structured diagnostics without breaking old assertions:

def _lint_native(wf_dict):
    ctx = LintContext()
    nnw = ensure_native(wf_dict)
    _lint_ga_impl(ctx, nnw, raw_dict=wf_dict)
    return {
        "errors": ctx.error_messages,          # legacy
        "warnings": ctx.warn_messages,         # legacy
        "error_count": len(ctx.error_messages),
        "warn_count": len(ctx.warn_messages),
        "messages": [                          # new structured view
            {"level": m.level, "message": m.message,
             "linter": m.linter, "json_pointer": m.json_pointer}
            for m in ctx.messages
        ],
    }

Declarative assertions then support rule-ID navigation via the existing {field: value} path syntax:

test_lint_native_non_integer_step_key:
  fixture: synthetic-lint-non-integer-step.ga
  operation: lint_native
  assertions:
    - path: [messages, {linter: NativeStepKeyNotInteger}, level]
      value: error
    - path: [messages, {linter: NativeStepKeyNotInteger}, json_pointer]
      value: /steps/step_one
    - path: [messages, {linter: NativeStepKeyNotInteger}, message]
      value_contains: "integer"

1.5 Test plan (Python)

Red-to-green throughout. Each step should be one PR.

  1. PR 1 — scaffolding. Add linting.py rewrite + Linter base + LintMessage. Keep LintContext.error()/warn() unchanged in signature so no call site moves. All existing tests pass (the compat shims keep the legacy error_messages: List[str] view working). Add unit tests for LintMessage equality, LintContext.skip_rules, only_rules, messages property.
  2. PR 2 — profile module. Add lint_profiles.py + the YAML file. Unit tests: loader flattens extends, rejects unknown rules (cross-check against Linter.__subclasses__() after forcing subclass imports), iwc resolves to best-practices ∪ release.
  3. PR 3 — pilot rule (A7). Add NativeStepKeyNotInteger class in lint_rules.py. Update the one call site in lint_ga to pass linter=/json_pointer=. Extend _lint_native operation result with messages. Update lint_native.yml declarative fixture for test_lint_native_non_integer_step_key to assert on messages[{linter: ...}]. This is the red-to-green: the assertion fails before the linter-id wiring, passes after.
  4. PR 4 — migrate remaining rules. Mechanically add a Linter subclass per existing ctx.error/ctx.warn site. No behavior change. Add declarative messages[{linter: ...}] assertions to at least one expectation per rule (can reuse the existing fixture set). Remove the substring assertions once the ID-based one passes.
  5. PR 5 — profile enforcement. Add --profile CLI flag; deprecate --skip-best-practices (keep as alias). Tests: run lint with --profile structural over a fixture that has best-practice violations, assert best-practice rules are absent from messages. Same with --profile best-practices,release on a fixture missing release.
  6. PR 6 — json_pointer coverage. Audit every rule for a meaningful pointer. Anything without one gets a # TODO json_pointer comment and an explicit json_pointer=None pass. Declarative tests for at least 5 rules assert pointer shape.
  7. PR 7 — rule-metadata enumeration endpoint. Add gxwf lint --list-rules that prints the rule table (id, severity, profiles, applies_to, description). Test: snapshot test against the generated table. This is the mechanism that replaces the hand-curated cross-project-linting.md rule table.

CLAUDE.md constraints honored: red-to-green throughout, no test removal, no data mutation, imports at top of file (no per-method imports), no hand-edits to generated schema.


Part 2 — galaxy-tool-util TypeScript (gxwf-web tree)

Ground truth: /Users/jxc755/projects/worktrees/galaxy-tool-util/branch/gxwf-web. Research survey integrated below; key citations use TS-side file:line.

2.1 Current TS state (from the survey)

2.2 TS-side rule emission sites (from survey; use for mapping)

File:lineMessage (abbrev.)Rule ID
lint.ts:74”tool step contains error indicated during Galaxy export”StepExportError
lint.ts:80”test tool shed”StepUsesTestToolshed
lint.ts:93”markdown to be of class string”ReportMarkdownNotString
lint.ts:110/112a_galaxy_workflow absent/not trueNativeMissingMarker / NativeMarkerNotTrue
lint.ts:117/119format-version absent / != 0.1NativeMissingFormatVersion / NativeFormatVersionNotSupported
lint.ts:124native steps absentNativeMissingSteps
lint.ts:134step_key integerNativeStepKeyNotInteger
lint.ts:147/251subworkflow empty stepsSubworkflowMissingSteps
lint.ts:160/163no outputs / output without labelWorkflowHasNoOutputs / WorkflowOutputMissingLabel
lint.ts:194outputSource not foundOutputSourceNotFound
lint.ts:212/216/220invalid default typeInputDefaultTypeInvalid
lint.ts:233format2 steps absentFormat2MissingSteps
lint.ts:236format2 class absentFormat2MissingClass
lint.ts:340/373input disconnected (both paths)StepInputDisconnected
lint.ts:346step missing annotationStepMissingAnnotation
lint.ts:351step missing labelStepMissingLabel
lint.ts:357untyped tool_state paramStepToolStateUntypedParameter
lint.ts:362/381untyped PJA paramStepPJAUntypedParameter
lint.ts:395workflow not annotatedWorkflowMissingAnnotation
lint.ts:402missing creatorWorkflowMissingCreator
lint.ts:410creator identifier not URICreatorIdentifierNotURI
lint.ts:422missing licenseWorkflowMissingLicense

Gap vs Python inventory: TS does not implement ReportMarkdownInvalid (A16), TrainingWorkflowMissingTag/Doc (B11/B12), schema strict/lax lint (A17/A18). Those stay unported — out of scope.

2.3 TS design (adopted verbatim into this plan)

Use object-literal + Record, not class-as-metadata. Rule IDs are branded strings.

// packages/schema/src/workflow/lint-rules.ts (new)

export type LinterId = string & { readonly __brand: "LinterId" };

export interface LinterMetadata {
  id: LinterId;
  severity: "error" | "warning" | "info";
  appliesTo: readonly ("format2" | "native")[];
  defaultProfiles: readonly ("structural" | "best-practices" | "release")[];
  description: string;
}

export const RULES = {
  NativeStepKeyNotInteger: {
    id: "NativeStepKeyNotInteger" as LinterId,
    severity: "error",
    appliesTo: ["native"],
    defaultProfiles: ["structural"],
    description: "Native workflow step keys must be numeric strings.",
  },
  // ... full mirror of Python table
} as const satisfies Record<string, LinterMetadata>;

export type LintMessage = {
  level: "error" | "warning" | "info";
  message: string;
  linter: LinterId;
  json_pointer?: string;       // snake_case matches Python result shape
};

LintContext grows structured emission:

// lint.ts — replace today's string-bag
export class LintContext {
  messages: LintMessage[] = [];
  errors: string[] = [];       // legacy mirror, populated via getter
  warnings: string[] = [];

  error(message: string, opts: { linter: LinterId; json_pointer?: string }): void {
    this.messages.push({ level: "error", message, ...opts });
    this.errors.push(message);
  }
  warn(message: string, opts: { linter: LinterId; json_pointer?: string }): void {
    this.messages.push({ level: "warning", message, ...opts });
    this.warnings.push(message);
  }
}

export interface LintResult {
  errors: string[];
  warnings: string[];
  error_count: number;
  warn_count: number;
  messages: LintMessage[];     // new
}

Breaking change: every existing call site of ctx.error(str) must pass linter:. Enforced by TypeScript — a caller forgetting it won’t compile. This is the point.

2.4 Shared profile file

Write profiles once in gxformat2 (gxformat2/schema/linting/profiles.yml), sync to TS the same way fixtures are synced. New Makefile target sync-lint-profiles:

sync-lint-profiles:
    node scripts/sync-fixtures.mjs --group lint-profiles

Manifest entry in packages/schema/scripts/sync-manifest.json:

{
  "lint-profiles": {
    "source": "${GXFORMAT2_ROOT}/gxformat2/schema/linting/profiles.yml",
    "destination": "packages/schema/src/workflow/lint-profiles.yml"
  }
}

TS loader parses at build or test time; no runtime YAML parse on the hot path.

2.5 Declarative-test assertions for the TS harness

Extend declarative-test-utils.ts path navigation to match the Python {field: value} syntax so the same expectation YAML works on both sides:

// In navigate(), when a path element is a plain object (not array / not string / not int)
if (typeof element === "object" && !Array.isArray(element) && element !== null) {
  const [key, value] = Object.entries(element)[0];
  const found = (current as any[]).find((item) => item?.[key] === value);
  if (!found) throw new Error(`No element matching ${key}=${value}`);
  current = found;
  continue;
}

After the change, the identical YAML assertion from §1.4 runs unmodified on both vitest and pytest.

2.6 VS Code integration

Per the survey: server/packages/server-common/src/providers/validation/ has its own ValidationRule classes (TestToolShedValidationRule, StepExportErrorValidationRule, RequiredPropertyValidationRule). It does not call lintFormat2/lintNative. Adding json_pointer to TS lint messages creates the bridge:

Out of scope for this plan: actually retiring the duplicate ValidationRule classes. That’s a follow-up once json_pointer coverage is proven on the pilot.

2.7 TS test plan (pilot: NativeStepKeyNotInteger)

  1. Add lint-rules.ts with the RULES record (initially just the pilot).
  2. Update lint.ts:134 to call ctx.error(msg, { linter: RULES.NativeStepKeyNotInteger.id, json_pointer: \/steps/${orderIndex}` }). Compile error on every other ctx.error/ctx.warn` call — that’s the todo list.
  3. As a one-off, allow the other call sites to compile with a temporary linter: "TODO" as LinterId. Track with a commit message listing the rule-id to-assign. (Same back-compat dance as Python PR 1.)
  4. Extend navigate() in declarative-test-utils.ts to handle {field: value} path elements.
  5. Re-run make sync-workflow-expectations to pull in the updated lint_native.yml from gxformat2 (must land gxformat2 PR 3 first). Tests fail until step 2 is in place → go green after.
  6. Add a lint-profiles.yml sync step; unit test that the loaded profile set matches the Python side (snapshot test from a checked-in golden fixture).

2.8 TS risks / opens (surfaced by the survey — decisions noted)


Part 3 — Galaxy wf_tool_state stateful linter

Ground truth: /Users/jxc755/projects/worktrees/galaxy/branch/wf_tool_state/lib/galaxy/tool_util/workflow_state/lint_stateful.py. 483 lines, composes gxformat2’s lint + state validation.

3.1 Current shape

3.2 What changes

A. Adopt the gxformat2 Linter pattern for stateful checks. Add a galaxy/tool_util/workflow_state/lint_rules.py module mirroring gxformat2’s style:

from gxformat2.linting import Linter

class ToolStateUnknownParameter(Linter):
    severity = "warning"
    applies_to = ["format2", "native"]
    default_profiles = ["structural"]
    description = "tool_state contains a parameter not defined by the current tool XML."

class ToolStateInvalidValue(Linter):
    severity = "error"
    ...

class ToolNotInCache(Linter):
    severity = "info"
    default_profiles = ["structural"]

class StaleToolStateKeys(Linter):
    severity = "warning"
    default_profiles = ["structural"]

class StepConnectionsInconsistent(Linter):
    severity = "warning"
    default_profiles = ["structural"]

All five register in the shared profiles.yml under structural (matching the user’s note that stateful checks are “in some ways structural”).

B. Route stateful emissions through LintContext with rule IDs. Today stateful findings live in ValidationStepResult; they should also surface as LintMessage instances so a single result stream is what the CLI/VS Code consume. Concretely:

C. Profile-driven execution. Add --profile structural,best-practices,release to gxwf lint CLI (uncovered by today’s --skip-best-practices). StaleKeyPolicy remains orthogonal — it configures severity of stale-key hits within StaleToolStateKeys, not whether the rule runs.

D. Stateful-to-lint mapping table. Add a small module doc in lint_stateful.py mapping each stateful diagnostic class to a gxformat2 Linter ID. Used both at emission time and by tests asserting on linter: ToolStateUnknownParameter.

3.3 Galaxy test plan

  1. Unit test: Linter.__subclasses__() discovery. Import both gxformat2 lint rules and Galaxy stateful lint rules; assert the five stateful IDs appear in the structural profile resolution.
  2. Declarative interop tests. Galaxy’s test/unit/tool_util/workflow_state/ already has an expectations directory. Add lint_stateful.yml with cases like:
    test_stale_cat1_keys:
      fixture: synthetic-cat1-stale.ga
      operation: lint_stateful
      assertions:
        - path: [messages, {linter: StaleToolStateKeys}, level]
          value: warning
        - path: [messages, {linter: StaleToolStateKeys}, json_pointer]
          value_contains: "/steps/"
  3. Red-to-green for the pilot. Pick ToolStateUnknownParameter. Add the Linter subclass. Wire emission. Update declarative expectations. Verify previously green tests stay green (via legacy errors/warnings string views) while new messages[{linter: ...}] assertions start passing.
  4. CLI profile test. Run gxwf lint --profile structural on a fixture with best-practice violations; assert only structural + stateful rules emit. Run with --profile structural,best-practices; assert both.
  5. Backwards-compat. Ensure SingleLintReport.lint_errors/lint_warnings still count matches today for every existing fixture in test/unit/tool_util/workflow_state/fixtures/. Snapshot existing reports; regenerate; diff empty.

3.4 Galaxy risks / opens


Execution order

Strict sequencing (each phase is atomic on its own repo):

  1. gxformat2 PR 1–3 — linting.py rewrite, profile module, pilot rule A7. Release new gxformat2 version.
  2. galaxy-tool-util TS — pilot rule + harness extension in parallel with gxformat2 PR 4. Sync once gxformat2 PRs 1–3 are released and profiles.yml lives in gxformat2.
  3. gxformat2 PR 4–7 — migrate remaining rules, add CLI profile flag, pointer coverage, --list-rules. TS PRs follow behind, one rule group at a time.
  4. Galaxy stateful — bump gxformat2 dep, add lint_rules.py, pilot ToolStateUnknownParameter, then remaining four.
  5. VS Code — introduce WorkflowLintDiagnosticsRule adapter using json_pointer for range resolution. Retire duplicate ValidationRule classes one-by-one.

Success criteria

Unresolved questions