Workstream A - Upstream gxformat2 Schema Modeling
Goal
Make upstream gxformat2 explicitly aware of draft workflow markers so downstream TS codegen has a principled model to sync from. This workstream is deliberately upstream-first: B-E must not patch around missing draft fields in generated TS artifacts.
Inputs
- Metaplan: `INDEX.md`
- Draft format spec: `/Users/jxc755/projects/worktrees/foundry/branch/design/content/research/galaxy-workflow-draft-format.md`
- gxformat2 worktree: `/Users/jxc755/projects/worktrees/gxformat2/branch/abstraction_applications`
- Schema-salad source: `schema/v19_09/workflow.yml`
- Shared tool fields: `schema/common/common.yml`
- Pydantic / schema build: `build_schema.sh`
Modeling Decisions
- `TodoSentinel` is a constrained string shape, not a plain `string` alias.
- Intended sentinel pattern: `^TODO(_[a-z0-9_]+)?$`.
- Bare `TODO` is legal for `tool_id` and `tool_version`.
- `TODO_<hint>` is canonical for wrapper-determined port names in `in:` keys, `out[].id`, and the port half of `outputs[].outputSource`.
- `_plan_state`, `_plan_context`, `_plan_in`, and `_plan_out` are optional string fields on `WorkflowStep`.
- `_plan_*` fields are allowed on all step kinds in v1, including `subworkflow`, `pause`, and `pick_value`, until real examples show a reason to tighten.
- `format2-draft.schema.json` is downstream TS package work. Upstream gxformat2 should not block on packaging that artifact.
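To make the sentinel shape concrete, here is a minimal Python sketch of a matcher for the pattern listed above. The names `TODO_SENTINEL_PATTERN` and `is_todo_sentinel` are hypothetical and chosen for illustration; they are not claimed to exist in gxformat2.

```python
import re

# Pattern copied from the modeling decisions. The constant and function
# names are hypothetical, not part of any existing gxformat2 module.
TODO_SENTINEL_PATTERN = r"^TODO(_[a-z0-9_]+)?$"
_TODO_SENTINEL_RE = re.compile(TODO_SENTINEL_PATTERN)


def is_todo_sentinel(value: object) -> bool:
    """Return True if value is a string with the draft sentinel shape."""
    # fullmatch keeps a trailing newline from slipping past the `$` anchor.
    return isinstance(value, str) and _TODO_SENTINEL_RE.fullmatch(value) is not None
```

Under this sketch, bare `TODO` and `TODO_input_reads` match, while `TODO_` (empty hint), `todo`, and `TODO_Reads` (uppercase hint) do not.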
Plan
- Confirm codegen support for `TodoSentinel`.
  - Add a local spike type in schema-salad that attempts to express a string with pattern `^TODO(_[a-z0-9_]+)?$`.
  - Run the pydantic and TS schema generation paths.
  - Inspect whether generated Python, strict Python, and TS artifacts preserve the constraint or erase it to `string`.
  - Record the outcome in this subplan before doing the final schema edit.
  - Outcome 2026-05-22: schema-salad rejects a named primitive `type: string` with `pattern`, and schema-salad-plus-pydantic emits references to such a type without defining it. `TodoSentinel` cannot currently be represented as a true schema-salad constrained string through the existing codegen path. Upstream now owns the sentinel contract as metadata/constants; downstream draft-checks enforce the pattern semantically.
- Add explicit `_plan_*` fields to `WorkflowStep`.
  - Edit `schema/v19_09/workflow.yml`.
  - Use optional string types.
  - Document that these fields are draft-only and must be stripped before runnable workflow validation/import.
  - Keep them on `WorkflowStep`, not a tool-step-only subtype, because v1 allows planning notes on non-tool steps.
  - Outcome 2026-05-22: direct leading-underscore schema field names generate invalid Pydantic fields unless patched after codegen. The upstream schema now uses the real serialized names (`_plan_state`, `_plan_context`, `_plan_in`, `_plan_out`). `scripts/patch_generated_pydantic.py` rewrites the generated Pydantic attributes to Python-safe names (`plan_state`, `plan_context`, `plan_in`, `plan_out`) with aliases preserving the serialized keys. The generated Effect schema exposes the draft keys with leading underscores.
- Model TODO-bearing fields where codegen can preserve useful structure.
  - `tool_id` and `tool_version` live in `schema/common/common.yml` under `ReferencesTool`.
  - `out[].id` comes from `WorkflowStepOutput` extending `cwl:Identified`.
  - `outputSource` is `WorkflowOutputParameter.outputSource`.
  - `in:` keys are record keys via the `mapSubject: id` shape; if schema-salad cannot constrain keys here, document the intended sentinel shape and leave key validation to downstream draft-checks.
  - Do not make the base schema reject normal strings in any of these positions. Draft sentinels are additional allowed values, not a replacement for concrete gxformat2 values.
- Regenerate upstream schema artifacts.
  - Run `SKIP_JAVA=1 SKIP_TYPESCRIPT=1 bash build_schema.sh` for the fast pydantic-only path while iterating.
  - Run the full build path before landing if the environment supports Java and TS codegen.
  - Verify regenerated `gxformat2/schema/gxformat2.py` and `gxformat2/schema/gxformat2_strict.py` contain `_plan_*` fields.
  - Verify strict pydantic validation no longer rejects `_plan_*` on `WorkflowStep`.
- Add upstream fixtures/tests.
  - Positive: draft tool step with `tool_id: TODO`, TODO input key, TODO output id, and full `_plan_*` block validates structurally.
  - Positive: `_plan_*` on a subworkflow step validates structurally.
  - Positive: fully concrete workflow without any draft markers still validates exactly as before.
  - Negative: `_plan_*` outside `WorkflowStep` is rejected by strict validation if strict schema can enforce unknown-field rejection at that location.
  - If `TodoSentinel` constraints survive codegen, add negative cases for malformed sentinel spellings. If they do not survive, leave malformed sentinel tests to downstream draft-checks.
- Expose sentinel metadata for downstream sync if feasible.
  - Prefer a generated or source-owned constant for the pattern so TS draft-checks can import rather than redeclare it.
  - If the schema/codegen path cannot expose a constant cleanly, document the pattern in a stable upstream module or schema metadata location and let the TS agent decide whether importing it is practical.
- Version and release coordination.
  - Add a `HISTORY.rst` entry describing draft workflow schema fields.
  - Bump gxformat2 according to existing release practice.
  - After upstream lands, run the TS monorepo `make sync` and verify the generated Effect schema includes `_plan_*`.
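The plan requires that `_plan_*` fields be stripped before runnable workflow validation/import. A minimal sketch of such a stripping pass over a plain workflow dict follows. The function names are hypothetical, and the sketch assumes steps appear either as a label-keyed map or as a list, with embedded subworkflows under `run`; it is not the upstream implementation.

```python
from typing import Any

# The four draft-only keys named in this plan.
PLAN_FIELDS = ("_plan_state", "_plan_context", "_plan_in", "_plan_out")


def strip_plan_fields(workflow: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a workflow dict with _plan_* keys removed from steps."""
    cleaned = dict(workflow)
    steps = workflow.get("steps")
    if isinstance(steps, dict):
        # Assumed map form: step label -> step body.
        cleaned["steps"] = {label: _strip_step(step) for label, step in steps.items()}
    elif isinstance(steps, list):
        # Assumed list form: each entry is a step body.
        cleaned["steps"] = [_strip_step(step) for step in steps]
    return cleaned


def _strip_step(step: Any) -> Any:
    """Drop draft-only keys from one step, recursing into an embedded run."""
    if not isinstance(step, dict):
        return step
    cleaned = {key: value for key, value in step.items() if key not in PLAN_FIELDS}
    run = cleaned.get("run")
    if isinstance(run, dict):
        # Assumption: an embedded subworkflow is a nested workflow dict.
        cleaned["run"] = strip_plan_fields(run)
    return cleaned
```

The pass returns new dicts rather than mutating its input, so a draft document can be kept alongside its stripped, runnable counterpart.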
Acceptance Criteria
- `WorkflowStep` has explicit optional `_plan_state`, `_plan_context`, `_plan_in`, and `_plan_out` fields in schema-salad source.
- Regenerated strict Python pydantic models accept `_plan_*` on workflow steps.
- Existing concrete gxformat2 fixture tests still pass.
- New draft schema fixtures prove the upstream model accepts the v1 draft markers it owns.
- The subplan records whether `TodoSentinel` survived codegen as a constrained string or had to be enforced downstream.
- No upstream task claims ownership of publishing `format2-draft.schema.json`.
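As an illustration of what a positive draft fixture might contain, the sketch below defines a hypothetical draft tool step and a helper that lists the draft markers it carries. The fixture values, `collect_draft_markers`, and the assumption that `in` is a map and `out` is a list are all illustrative. The helper is a stand-in for test authoring, not a replacement for validating against the generated pydantic models.

```python
import re
from typing import Any

SENTINEL_RE = re.compile(r"^TODO(_[a-z0-9_]+)?$")
PLAN_FIELDS = ("_plan_state", "_plan_context", "_plan_in", "_plan_out")

# Hypothetical fixture: a draft tool step using every v1 draft marker.
DRAFT_TOOL_STEP: dict[str, Any] = {
    "tool_id": "TODO",
    "tool_version": "TODO",
    "in": {"TODO_reads": "input_reads"},
    "out": [{"id": "TODO_alignments"}],
    "_plan_state": "tool not yet chosen",
    "_plan_context": "align reads to the reference",
    "_plan_in": "paired-end reads",
    "_plan_out": "sorted alignments",
}


def _is_sentinel(value: Any) -> bool:
    return isinstance(value, str) and SENTINEL_RE.fullmatch(value) is not None


def collect_draft_markers(step: dict[str, Any]) -> list[str]:
    """List the draft markers found on one step, in a fixed order."""
    markers = [f for f in ("tool_id", "tool_version") if _is_sentinel(step.get(f))]
    # Assumes `in` is in map form, so iterating yields port-name keys.
    markers += [f"in.{key}" for key in (step.get("in") or {}) if _is_sentinel(key)]
    for output in step.get("out") or []:
        # Assumes `out` entries are either {"id": ...} records or bare strings.
        output_id = output.get("id") if isinstance(output, dict) else output
        if _is_sentinel(output_id):
            markers.append(f"out.{output_id}")
    markers += [f for f in PLAN_FIELDS if f in step]
    return markers
```

A fully concrete step should yield an empty list, which gives a quick guard that concrete fixtures have not picked up draft markers by accident.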
Risks
- Schema-salad may not express regex-constrained string subtypes in a way that survives pydantic and TS codegen.
- `in:` keys are map keys, so the schema layer may not be able to represent `TodoSentinel` there directly.
- `outputSource` is a compound string (`step_label/port`), so a pure `TodoSentinel` type cannot model only the port half without a separate downstream syntactic validator.
- The generated artifacts may accept `_plan_*` structurally but still provide no useful way to export the sentinel regex as a constant. That is acceptable as long as it is documented and downstream draft-checks own enforcement.
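The `outputSource` risk can be made concrete with a small syntactic check of the kind a downstream draft-check might perform. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes the port is whatever follows the last `/` in the value. Sentinel ports cannot contain `/`, so the split point does not affect sentinel detection.

```python
import re

SENTINEL_RE = re.compile(r"^TODO(_[a-z0-9_]+)?$")


def output_source_port_is_sentinel(output_source: str) -> bool:
    """Return True if the port half of `step_label/port` is a draft sentinel."""
    _step_label, separator, port = output_source.rpartition("/")
    if not separator:
        # Assumption: a value without "/" has no port half to inspect.
        return False
    return SENTINEL_RE.fullmatch(port) is not None
```

For example, `align/TODO_bam` and `align/TODO` are reported as draft references, while `align/out_bam` and a bare `TODO` with no step label are not.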