CORPUS_MEASUREMENT

IWC Corpus Measurement — gxwf validate/roundtrip

Captured 2026-06-22. Audit trail for the manuscript line-86 numbers + Axis-4 finding.

Provenance

Validation (default mode, tool-state)

Round-trip (native → format2 → native)

Modal finding — tools behind the 38 (cached-schema post-conversion failures)

count (step-level)tool
43compose_text_param
10macs2_callpeak
6busco
4calculate_numeric_param
3scanpy_filter
4meryl / meryl_count_kmers
~4qiime2 (dada2, diversity)

Modal tool: compose_text_param (IUC), via its components repeat + param_type conditional.

Worked example (verbatim)

computational-chemistry/fragment-based-docking-scoring/fragment-based-docking-scoring.ga, step 9, iuc/compose_text_param/0.1.1:

state failed post-conversion validation:
  components.1.param_type.component_value: is missing;
  components.1.param_type.select_param_type: Expected "text", actual "float";
  components.1.param_type: Expected undefined, actual {"select_param_type":"float"};
  components: Expected undefined, actual [{"param_type":{"select_param_type":"text",
    "component_value":"$SuCOS_Score >= "}},{"param_type":{"select_param_type":"float"}}]
diffs: []

This is the cascading-conditional-selector pattern (the figS2 caveat): the float case of select_param_type is reported as unexpected (validator appears to model only the text case) and the cascade collapses into parent-object mismatches.

INTERPRETATION — needs triage (do NOT assert a cause in the manuscript yet)

diffs: [] + errorDiffs: 0 ⇒ the round-trip preserves state; this is not conversion data-loss. The open question is why the reimported state fails post-conversion schema validation while forward validate-tree (default) reported 0 failures. Candidate causes:

  1. Validator/schema gap — the parsed tool schema doesn’t model the float (or other) conditional case, so legal state is rejected. (If so: a real validator bug to file; weakens nothing about conversion, but the depth claim’s floor.)
  2. Validation-profile asymmetry — post-conversion validation enforces required-param presence; forward default mode does not. Benign; reframe as “strict surfaces latent state gaps.”
  3. Genuinely stale source state — the published workflow omits a now-required param. The manuscript’s exact “latent inconsistency the round-trip surfaces” story.

Resolving (1) vs (2)/(3) requires inspecting compose_text_param’s parsed schema cases vs the tool XML. This is the highest-value next analysis and an author/dev judgment call.

Known measurement gap