Plan: Structured Diff Classification for Roundtrip Validation
Goal
Replace plain string diffs with structured objects that classify each mismatch by type and severity. Known-benign representation artifacts (all-None sections dropped, empty repeats dropped, connection-only sections dropped) are distinguished from real data loss. Reporting surfaces the distinction in text, JSON, and summary output.
Data Model
class DiffType(Enum):
VALUE_MISMATCH = "value_mismatch"
MISSING_IN_ROUNDTRIP = "missing_in_roundtrip"
MISSING_IN_ORIGINAL = "missing_in_original"
CONNECTION_MISMATCH = "connection_mismatch"
POSITION_MISMATCH = "position_mismatch"
LABEL_MISMATCH = "label_mismatch"
ANNOTATION_MISMATCH = "annotation_mismatch"
COMMENT_MISMATCH = "comment_mismatch"
class DiffSeverity(Enum):
ERROR = "error"
BENIGN = "benign"
@dataclass
class StepDiff:
step_path: str # "step 6" or "step 22:subworkflow//step 3"
key_path: str # "single_paired.global_trimming_options"
diff_type: DiffType
severity: DiffSeverity
description: str # human-readable summary
original_value: Optional[Any] = None
roundtrip_value: Optional[Any] = None
benign_reason: Optional[str] = None # why it's benign, if severity=BENIGN
Benign Classification
Three patterns, classified at construction time in compare_tool_state:
1. All-None section dropped
MISSING_IN_ROUNDTRIP where original value is a dict with every leaf None/"null".
def _is_all_none_dict(d):
if not isinstance(d, dict):
return False
for v in d.values():
if isinstance(v, dict):
if not _is_all_none_dict(v):
return False
elif v not in (None, "null"):
return False
return True
Benign reason: "all-None section omitted by format2 export"
2. Empty repeat/list dropped
MISSING_IN_ROUNDTRIP where original value is [], or a dict containing only [] values and None/"null".
def _is_empty_container_dict(d):
if not isinstance(d, dict):
return False
for v in d.values():
if isinstance(v, list) and len(v) == 0:
continue
if isinstance(v, dict):
if not _is_empty_container_dict(v):
return False
elif v not in (None, "null"):
return False
return True
Benign reason: "empty repeat/list omitted by format2 export"
3. Connection-only section dropped
MISSING_IN_ROUNDTRIP where original value is a dict with every leaf a ConnectedValue/RuntimeValue marker or None.
def _is_connection_only_dict(d):
if not isinstance(d, dict):
return False
for v in d.values():
if isinstance(v, dict):
if _is_connection_marker(v):
continue
if not _is_connection_only_dict(v):
return False
elif v not in (None, "null"):
return False
return True
Benign reason: "connection-only section omitted by format2 export (connections preserved in 'in' block)"
Everything else: severity=ERROR.
Changes to Comparison Functions
All comparison functions return list[StepDiff] instead of list[str]:
compare_tool_state(orig, after, path, step_path)→list[StepDiff]_compare_list_state(orig, after, path, step_path)→list[StepDiff]_compare_step_visual(orig, after, step_path)→list[StepDiff]compare_connections(orig, after, step_path)→list[StepDiff]_compare_connections_with_id_mapping(orig, after, id_mapping, step_path)→list[StepDiff]compare_comments(orig, after, path)→list[StepDiff]compare_steps(orig, after, step_path)→list[StepDiff]_compare_steps_with_id_mapping(orig, after, id_mapping, step_path)→list[StepDiff]compare_workflow_steps(orig, after, path)→list[StepDiff]
The step_path parameter threads through from compare_workflow_steps so each StepDiff has its full path at construction.
Changes to RoundTripValidationResult
@dataclass
class RoundTripValidationResult:
...
diffs: Optional[list[StepDiff]] = None
@property
def error_diffs(self) -> list[StepDiff]:
return [d for d in (self.diffs or []) if d.severity == DiffSeverity.ERROR]
@property
def benign_diffs(self) -> list[StepDiff]:
return [d for d in (self.diffs or []) if d.severity == DiffSeverity.BENIGN]
@property
def ok(self) -> bool:
if self.error:
return False
if self.conversion_result and not self.conversion_result.success:
return False
if self.diffs is None:
return False
return len(self.error_diffs) == 0
@property
def status(self) -> str:
if self.error:
return "error"
if self.conversion_result and not self.conversion_result.success:
return "conversion_fail"
if self.diffs is None:
return "error"
if len(self.error_diffs) > 0:
return "roundtrip_mismatch"
return "ok"
@property
def summary_line(self) -> str:
status = self.status
name = os.path.basename(self.workflow_path)
n_steps = len(self.conversion_result.step_results) if self.conversion_result else 0
if status == "ok":
benign = len(self.benign_diffs)
if benign:
return f"{name}: OK ({n_steps} steps, {benign} benign diff(s))"
return f"{name}: OK ({n_steps} steps)"
elif status == "conversion_fail":
... # unchanged
elif status == "roundtrip_mismatch":
errors = len(self.error_diffs)
benign = len(self.benign_diffs)
parts = f"{errors} error(s)"
if benign:
parts += f", {benign} benign"
return f"{name}: MISMATCH ({parts})"
else:
... # unchanged
CLI Changes
--strict flag
Add strict: bool = False to RoundTripValidateOptions. When strict:
oktreats benign diffs as errors (all diffs are errors)- Summary counts benign diffs as failures
Implementation: add a strict param to RoundTripValidationResult.ok or check in the CLI when computing exit code.
Simpler approach — keep the model severity-aware, handle strict in the CLI/runner:
def _is_passing(result: RoundTripValidationResult, strict: bool) -> bool:
if strict:
return result.diffs is not None and len(result.diffs) == 0
return result.ok
Text output (format_validation_text)
Verbose mode shows all diffs with severity tag:
chipseq-sr.ga: OK (13 steps, 1 benign diff(s))
[benign] step 6: single_paired.global_trimming_options — all-None section omitted by format2 export
gromacs-dctmd.ga: OK (13 steps, 2 benign diff(s))
[benign] step 28: inps — connection-only section omitted (connections in 'in' block)
[benign] step 34: inps — connection-only section omitted (connections in 'in' block)
Non-verbose: benign diffs counted but not listed. Error diffs always listed.
JSON output
Each StepDiff serialized:
{
"step_path": "step 6",
"key_path": "single_paired.global_trimming_options",
"diff_type": "missing_in_roundtrip",
"severity": "benign",
"description": "present in original ({'trim_front1': None, ...}), missing in roundtripped",
"original_value": {"trim_front1": null, "trim_tail1": null},
"roundtrip_value": null,
"benign_reason": "all-None section omitted by format2 export"
}
Summary line
Summary: 71 OK (30 clean, 41 with benign diffs), 41 FAIL (total 112 workflows)
With --strict:
Summary: 30 OK, 82 FAIL (total 112 workflows)
Implementation Order
- Add
DiffType,DiffSeverity,StepDiffto roundtrip.py - Add
_is_all_none_dict,_is_empty_container_dict,_is_connection_only_dictclassifiers - Refactor
compare_tool_state→ returnlist[StepDiff], classify at construction - Refactor visual/connection/comment comparison functions similarly
- Refactor
compare_workflow_stepsto threadstep_paththrough - Update
RoundTripValidationResultproperties - Update
format_validation_textfor severity-aware output - Add
--strictflag to CLI - Run IWC suite — expect ~41 current MISMATCH to become OK with benign diffs
- Tests — verify benign classification for each artifact type
Unresolved Questions
None.