TREE_SPLIT_PLAN

Plan: Split Single-File and Tree CLI Commands

Date: 2026-03-29
Branch: wf_tool_state_applications (builds on wf_tool_state)
Status: DRAFT

Problem

Every gxwf-*-stateful command currently auto-detects os.path.isdir(workflow_path) and branches into single-file vs tree mode. This creates several concrete problems:

P1. Output semantics are irreconcilable

Single-file commands naturally write converted content to stdout (pipe-friendly, composable). Tree commands need to write N files — stdout is nonsensical. Today this tension manifests as:

P2. Result model wrapping hack

Single-file results (SingleValidationReport, SingleCleanReport) must be manually wrapped into TreeValidationReport / TreeCleanReport via wrap_single_validation() / wrap_single_clean() just to reuse the markdown formatter. This exists because we have one formatter contract but two result shapes.

P3. Duplicated is_dir branching in every module

Each module reimplements the same pattern:

is_dir = os.path.isdir(options.workflow_path)
if is_dir:
    return _run_tree(...)
else:
    return _run_single(...)

This appears in: validate.py:700, clean.py:627, roundtrip.py:1281, validate.py:622 (json-schema mode). Each branch has different error handling, precheck behavior, and output logic.

P4. Connection validation is bolted on

_run_connection_validation() in validate.py:535 has its own is_dir check and its own tree-walking loop, completely separate from the main validation pipeline. The TODO at line 702 acknowledges this:

# TODO: This feels like asymmetric - we should do the single or multiple dance
# between state and connection validation in some symmetric way.

lint_stateful.py also calls _run_connection_validation as a bolt-on at lines 200-202, importing it from validate.py and inheriting the same is_dir branching.

P5. Report format inconsistency

Single-file JSON output uses SingleValidationReport (flat step list). Tree JSON uses TreeValidationReport (nested by workflow). Users switching from single-file to directory get a different JSON schema with no migration path.

P6. Precheck handling diverges

P7. discover_workflows() imported inside domain functions

Every tree function does a late from .workflow_tree import discover_workflows inside the function body. Discovery is a cross-cutting concern that should be orchestrated from outside, not embedded in domain logic.

P8. lint_stateful.py inherits all validate.py pain points

lint_stateful.py imports validate_workflow_cli, _run_connection_validation, wrap_single_validation, and format_tree_markdown directly from validate.py. It inherits the wrapping hack (P2) and the bolt-on connection validation (P4). A tree variant would need to duplicate all of this again.


Design: -tree Command Variants

Split each command into two entry points with clear contracts:

| Single-file command | Tree command |
|---|---|
| gxwf-state-validate | gxwf-state-validate-tree |
| gxwf-state-clean | gxwf-state-clean-tree |
| gxwf-roundtrip-validate | gxwf-roundtrip-validate-tree |
| gxwf-to-format2-stateful | gxwf-to-format2-stateful-tree |
| gxwf-to-native-stateful | gxwf-to-native-stateful-tree |
| gxwf-lint-stateful | gxwf-lint-stateful-tree |

Single-file contract

Tree contract

Shared contract (both)


Implementation Steps

Phase 0: Extract TreeOrchestrator — shared tree-walking infrastructure

New file: _tree_orchestrator.py

A generic driver that handles the discover->load->process->aggregate->report loop:

from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, TypeVar

# Per-workflow result type produced by a command's process_one callback.
T = TypeVar("T")

@dataclass
class TreeContext:
    root: str
    tool_info: GetToolInfo
    output_dir: Optional[str] = None

def run_tree(
    ctx: TreeContext,
    process_one: Callable[[WorkflowInfo, dict], T],
    # Results are Optional[T] because a load failure is recorded as None.
    aggregate: Callable[[str, List[Tuple[WorkflowInfo, Optional[T]]]], TreeReport],
    format_text: Callable[[TreeReport], str],
    format_markdown: Callable[[TreeReport], str],
    report_options: HasReportDests,
    include_format2: bool = True,
) -> int:
    workflows = discover_workflows(ctx.root, include_format2=include_format2)
    results: List[Tuple[WorkflowInfo, Optional[T]]] = []
    for info in workflows:
        wf_dict = load_workflow_safe(info)
        if wf_dict is None:
            results.append((info, None))  # load failure
            continue
        result = process_one(info, wf_dict)
        results.append((info, result))
    report = aggregate(ctx.root, results)
    # ... emit reports, compute exit code

This eliminates P3 and P7 — the tree loop is written once, discovery is at the orchestration layer, and each command just provides process_one + aggregate.
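To illustrate how a command might plug into this loop, here is a minimal, self-contained sketch. It is not the planned implementation: discovery and loading are replaced by a pre-built list, and WorkflowInfo, StepCountReport, run_tree_sketch, count_steps, and aggregate_counts are hypothetical stand-ins invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class WorkflowInfo:
    """Stand-in for the real WorkflowInfo; the single field is an assumption."""
    path: str


@dataclass
class StepCountReport:
    """Hypothetical aggregate report used only in this sketch."""
    root: str
    step_counts: Dict[str, int]
    load_failures: List[str]


def run_tree_sketch(
    root: str,
    loaded: List[Tuple[WorkflowInfo, Optional[dict]]],
    process_one: Callable[[WorkflowInfo, dict], T],
    aggregate: Callable[[str, List[Tuple[WorkflowInfo, Optional[T]]]], StepCountReport],
) -> StepCountReport:
    """Simplified tree loop: discovery and loading are passed in pre-computed."""
    results: List[Tuple[WorkflowInfo, Optional[T]]] = []
    for info, wf_dict in loaded:
        if wf_dict is None:
            # Load failure: record it, but do not call process_one.
            results.append((info, None))
            continue
        results.append((info, process_one(info, wf_dict)))
    return aggregate(root, results)


def count_steps(info: WorkflowInfo, wf_dict: dict) -> int:
    """Example process_one: count the steps in one workflow."""
    return len(wf_dict.get("steps", {}))


def aggregate_counts(
    root: str, results: List[Tuple[WorkflowInfo, Optional[int]]]
) -> StepCountReport:
    """Example aggregate: split successful results from load failures."""
    counts = {info.path: r for info, r in results if r is not None}
    failures = [info.path for info, r in results if r is None]
    return StepCountReport(root=root, step_counts=counts, load_failures=failures)


report = run_tree_sketch(
    "workflows/",
    [
        (WorkflowInfo("a.ga"), {"steps": {"0": {}, "1": {}}}),
        (WorkflowInfo("broken.ga"), None),
    ],
    count_steps,
    aggregate_counts,
)
```

The command-specific code is confined to the two callbacks, and a load failure reaches the aggregator as None rather than being handled separately in each module.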

Changes:

populate_cache in tree mode: setup_tool_info in _cli_common.py already calls populate_cache(tool_info, options.workflow_path) and populate_cache already handles directories via _populate_cache_for_tree. This means cache population happens before run_tree() starts iterating, which is the right behavior — gather all tool defs upfront, then process. No changes needed; the existing populate_cache(path) with its own is_dir check is fine because it’s a cache-priming concern, not a workflow-processing concern. The one improvement: TreeOrchestrator could collect the unique tool set from discovered workflows and pass it to a new populate_cache_for_tools(tool_set) function to avoid the double-discovery. But this is an optimization we can defer.
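The deferred optimization could be sketched roughly as below. This assumes each workflow step carries tool_id and tool_version keys, and that steps may be either a dict or a list. collect_tool_set is a hypothetical helper, and populate_cache_for_tools is only proposed above, so its eventual signature may differ. Subworkflows are not traversed here.

```python
from typing import Iterable, Optional, Set, Tuple

ToolKey = Tuple[str, Optional[str]]


def collect_tool_set(workflow_dicts: Iterable[dict]) -> Set[ToolKey]:
    """Collect unique (tool_id, tool_version) pairs across loaded workflows.

    Hypothetical helper: the step keys used here are assumptions about the
    workflow dict layout, not a confirmed contract.
    """
    tools: Set[ToolKey] = set()
    for wf in workflow_dicts:
        steps = wf.get("steps", {})
        # Accept steps stored either as a mapping or as a list.
        step_iter = steps.values() if isinstance(steps, dict) else steps
        for step in step_iter:
            tool_id = step.get("tool_id")
            if tool_id:  # skip input steps and other non-tool steps
                tools.add((tool_id, step.get("tool_version")))
    return tools
```

The resulting set could then be handed to the proposed populate_cache_for_tools(tool_set), so that each tool definition is fetched once regardless of how many workflows reference it.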

Phase 1: Split gxwf-state-validate / gxwf-state-validate-tree

Why first: Most complex, has the most pain points (P3, P4, P5), good proving ground.

Single-file changes (validate.py)

Connection validation integration

New tree script (scripts/workflow_validate_tree.py)

Fixes: P1, P2, P3, P4, P5, P6, P7

Phase 2: Split gxwf-state-clean / gxwf-state-clean-tree

Single-file changes (clean.py)

New tree script

Fixes: P1 (clean dry-run clarity), P3, P6

Phase 3: Split gxwf-roundtrip-validate / gxwf-roundtrip-validate-tree

Single-file changes (roundtrip.py)

New tree script

Phase 4: Add gxwf-to-format2-stateful-tree (new)

Currently export_format2.py has no tree support, even though the tree-walking infrastructure already exists. Rather than bolting it on, implement it cleanly as a tree command from the start.

Single-file changes (export_format2.py)

New tree script

Phase 5: Add gxwf-to-native-stateful-tree (new)

to_native_stateful.py is single-file only today (stdout-by-default, -o FILE). Tree conversion is a clear use case: convert an entire format2 project to native.

Single-file changes (to_native_stateful.py)

New tree script

Phase 6: Add gxwf-lint-stateful-tree (new)

lint_stateful.py composes structural lint + stateful validation for a single file. It currently inherits all the wrapping hacks (P2, P8) and bolt-on connection validation (P4) from validate.py.

Single-file changes (lint_stateful.py)

New tree script

Why this matters

Lint is the highest-value tree operation — it’s what CI would run against an IWC-style repo. Having gxwf-lint-stateful-tree path/to/iwc/workflows/ --report-json results.json as a single command is the end goal for CI integration.

Phase 7: Clean up report models

With tree/single split, we can simplify:


What stays shared


Migration

Since nothing has been released, no backward compatibility is needed. If a single-file command receives a directory, it prints a clear error:

Error: got directory, use gxwf-state-validate-tree for batch validation
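A guard along these lines could produce that error in each single-file command. This is a sketch only: reject_directory and its parameters are hypothetical names, and writing to stderr with exit code 2 is an assumption rather than something the plan specifies.

```python
import os
import sys


def reject_directory(workflow_path: str, tree_command: str) -> None:
    """Exit with an error if a single-file command is given a directory.

    Hypothetical helper; the exit code (2) is an assumption.
    """
    if os.path.isdir(workflow_path):
        # Message text follows the example in the plan.
        print(
            f"Error: got directory, use {tree_command} for batch validation",
            file=sys.stderr,
        )
        raise SystemExit(2)
```

A single-file entry point would call, for example, reject_directory(options.workflow_path, "gxwf-state-validate-tree") before doing any other work, so the is_dir check stays a one-line precondition instead of a branch into tree mode.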

Testing Strategy


File inventory (new/modified)

| File | Action |
|---|---|
| _tree_orchestrator.py | NEW: shared tree loop |
| scripts/workflow_validate_tree.py | NEW: CLI entry point |
| scripts/workflow_clean_tree.py | NEW: CLI entry point |
| scripts/workflow_roundtrip_validate_tree.py | NEW: CLI entry point |
| scripts/workflow_export_format2_tree.py | NEW: CLI entry point |
| scripts/workflow_to_native_stateful_tree.py | NEW: CLI entry point |
| scripts/workflow_lint_stateful_tree.py | NEW: CLI entry point |
| validate.py | MODIFY: remove tree paths, simplify, integrate connections |
| clean.py | MODIFY: remove tree paths, simplify |
| roundtrip.py | MODIFY: remove tree paths, simplify |
| export_format2.py | MODIFY: add directory rejection |
| to_native_stateful.py | MODIFY: add directory rejection |
| lint_stateful.py | MODIFY: remove wrapping, integrate connections, add dir rejection |
| _report_models.py | MODIFY: remove wrapping helpers |
| setup.cfg | MODIFY: register 6 new console_scripts |

Implementation Order

| Phase | What | Effort |
|---|---|---|
| 0 | _tree_orchestrator.py | Small-Medium |
| 1 | validate split (proving ground) | Medium |
| 2 | clean split | Medium |
| 3 | roundtrip split | Small |
| 4 | export-format2 tree (new) | Small |
| 5 | to-native-stateful tree (new) | Small |
| 6 | lint-stateful tree (new, highest CI value) | Medium |
| 7 | Report model cleanup | Small |

Phases 4, 5, 6 can proceed in any order after Phase 0. Phases 1-3 remove existing tree code and depend on Phase 0.