WRONG_NO_HANDLING_ISSUE

gxwf (Python): unquoted YAML-1.1 reserved words in format2 tool_state corrupt string values (use_guide: no → bool False)

Handoff doc for a Galaxy/Python gxwf agent. Prepared by Claude (AI assistant) on jmchilton’s behalf, 2026-06-08. Self-contained: repro, tested evidence, root cause, code pointers, fix, tests.

TL;DR

Python gxwf reads format2 (.gxwf.yml) with a YAML 1.1 loader (PyYAML SafeLoader). YAML 1.1 coerces the plain scalars yes/no/on/off (and their case variants) to booleans. The 1.1 spec also lists y/n, but PyYAML's resolver does not implement those single-letter forms. Galaxy tool_state select params store such option values as strings (e.g. StringTie’s guide.use_guide = "no"). When such a value is written unquoted in format2 (use_guide: no), Python loads it back as the bool False, which then fails the tool’s conditional discriminator. The TypeScript gxwf reads the same file with YAML 1.2 and gets the correct string "no", so this is a Python-side correctness and data-integrity bug. It affects every Python format2 read path (validate, convert, clean, roundtrip, lint), not just validate.

There is also a contributing emitter bug: Python’s format2 writer (ruamel YAML() path) emits the string "no" unquoted, so it produces files that its own reader then misreads.

Reproduction

Workflow: IWC transcriptomics/brew3r/BREW3R, step assembl with StringTie (toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.3+galaxy0). Its tool_state conditional:

guide:
  use_guide: no        # <-- intended string "no", select option / __current_case__: 0

Native .ga source stores it correctly as a string:

"guide": {"use_guide": "no", "__current_case__": 0}

Run Python validate on the format2 file:

$ gxwf validate BREW3R.gxwf.yml
Step 2: .../stringtie/2.2.3+galaxy0 ... FAIL
  1 validation error for DynamicModelForTool
  guide
    Input tag 'false' found using model_x_discriminator() does not match any of
    the expected tags: 'no', '__absent__', 'yes' [type=union_tag_invalid, ...]
    input_value={'use_guide': False}

The discriminator expects 'no' and receives boolean False. TypeScript gxwf validate on the same file does not report this — it reads "no" as a string and the branch matches.
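The discriminator failure above can be reproduced without Galaxy or gxwf. The following is a minimal sketch that assumes only that PyYAML is installed. It loads the conditional the way PyYAML does, then checks the value against the option tags listed in the error message. Those tags are copied from the message, not read from the StringTie tool XML.

```python
import yaml

# Option tags the discriminator expects, copied from the validation error above.
USE_GUIDE_OPTIONS = {"no", "yes", "__absent__"}

FORMAT2_SNIPPET = """\
guide:
  use_guide: no
"""

state = yaml.safe_load(FORMAT2_SNIPPET)  # PyYAML SafeLoader applies YAML 1.1 rules
value = state["guide"]["use_guide"]

# The plain scalar `no` became the bool False, so no select branch matches.
print(repr(value), value in USE_GUIDE_OPTIONS)  # False False
```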

Tested evidence (root cause)

YAML 1.1 vs 1.2 read behavior:

import yaml
yaml.safe_load("use_guide: no")["use_guide"]   # -> False   (PyYAML = YAML 1.1)
yaml.safe_load("use_guide: yes")["use_guide"]  # -> True
yaml.safe_load("use_guide: off")["use_guide"]  # -> False

from ruamel.yaml import YAML
y = YAML(typ="safe", pure=True)
y.version = (1, 2)
y.load("use_guide: no")["use_guide"]            # -> 'no'    (YAML 1.2, correct)

Emitter behavior (same string "no"):

import yaml
yaml.safe_dump({"use_guide": "no"})            # -> "use_guide: 'no'\n"   (PyYAML quotes — safe)

import io
from ruamel.yaml import YAML
y = YAML()
s = io.StringIO()
y.dump({"use_guide": "no"}, s)
s.getvalue()                                    # -> "use_guide: no\n"     (ruamel does NOT quote — unsafe)

So the reader corrupts the value on load (YAML 1.1), and the ruamel writer path emits unquoted output that a YAML 1.2 reader decodes correctly but a YAML 1.1 reader does not.

Why this is broader than one field

The YAML 1.1 spec’s implicit booleans are y|Y|n|N|yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF. PyYAML implements all of these except the single-letter y/Y/n/N. The same mechanism also turns ~/null into None. Numeric-looking scalars are a separate axis (see “Out of scope”). Any Galaxy select param whose string option value is one of the affected words is corrupted on read. use_guide is just the first one this surfaced on (StringTie suite, widely used in IWC). Because load_workflow is shared by all commands, a convert or roundtrip of an affected file silently rewrites "no" → False (emitted as false). That corrupts the workflow itself, not just a validation report.

Code pointers (galaxy fork: worktrees/galaxy/branch/wf_tool_state)

Read side (the primary bug):

Write side (contributing):

Fix

1. Read side (primary, robust). Read format2 with YAML 1.2 semantics so no/yes/on/off/y/n plain scalars decode as strings. Two options:

This is the robust fix because it also reads existing corpus files correctly: the IWC format2 corpus, produced by the TS converter, already contains unquoted no. An emitter-only fix would not help there.

Caveat to verify: confirm nothing in the workflow_state / tool_state models relies on yes/no/on/off decoding to bool. Galaxy tool_state booleans are normally stored as real bools and emitted as true/false (still booleans under YAML 1.2), while yes/no/on/off appear only as select-option string values — so YAML 1.2 should be correct for this domain. Validate against the parameter_specification corpus and the IWC sweep.
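If replacing PyYAML in the reader is unattractive, a narrower alternative keeps SafeLoader and removes only the 1.1 boolean words. This is an assumption here, not a vetted fix. The sketch also addresses the caveat above: true/false still decode as real bools, while yes/no/on/off come back as strings. It is not a full YAML 1.2 loader, because other 1.1-only resolutions (e.g. sexagesimal integers) are left untouched. Yaml12BoolSafeLoader is a hypothetical name.

```python
import re

import yaml

BOOL_TAG = "tag:yaml.org,2002:bool"

class Yaml12BoolSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant: only true/false are implicit booleans."""

# Give the subclass its own resolver table without the YAML 1.1 bool resolver.
# yaml.SafeLoader's table is copied, not mutated.
Yaml12BoolSafeLoader.yaml_implicit_resolvers = {
    first_char: [(tag, rx) for tag, rx in resolvers if tag != BOOL_TAG]
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Re-register booleans with the YAML 1.2 core-schema spellings only.
Yaml12BoolSafeLoader.add_implicit_resolver(
    BOOL_TAG,
    re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$"),
    list("tTfF"),
)

doc = "guide:\n  use_guide: no\nflag: true\nmode: off\n"
state = yaml.load(doc, Loader=Yaml12BoolSafeLoader)
print(state)  # {'guide': {'use_guide': 'no'}, 'flag': True, 'mode': 'off'}
```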

2. Write side (defense-in-depth). Make export_format2.format_yaml quote string scalars that any YAML 1.1 reader would misinterpret (booleans, null, numerics). Either configure the ruamel dumper to do so or route through a representer that quotes such strings. Prevents Python from emitting files that its own (or any 1.1) reader corrupts.

Do both: 1.2 read fixes correctness everywhere; safe-quote write keeps output portable to any consumer.

Suggested tests (red → green)

Out of scope (do not conflate)

Cross-impl note

This is the inverse of the TS situation: on this BREW3R file, Python reports a failure on use_guide (its YAML 1.1 loader produced a bool that fails the discriminator) while TS passes (YAML 1.2 reads the string). The correct end state is that the value is the string "no" everywhere. That means Python should stop coercing on read (and stop emitting unquoted on write), converging with the TS reader.