CWL_CONDITIONALS_PICK_VALUE_COMPARE

pickValue Implementation: Approach Comparison

Executive Summaries

Approach A — Synthetic Tool Steps: During CWL import, inject Galaxy’s bundled pick_value expression tool as a synthetic workflow step for each workflow output that uses pickValue. The tool already handles first_non_null (via first_or_error) and the_only_non_null (via only). Parser-only change — no model migrations, no runtime changes. Does NOT cover all_non_null (tool can’t return arrays/collections). Estimated 1-2 files touched in parser.py.

Approach B — Native Framework Support: Add a pick_value column to the WorkflowOutput model, create duplicate-label WorkflowOutput objects across source steps, then post-process pickValue semantics in run.py after all steps complete. Covers all three modes including all_non_null. Requires DB migration, model changes, parser changes, import changes, runtime changes, and export changes. Estimated 5-7 files touched.

Pros/Cons

Approach A: Synthetic Tool Steps

ProsCons
Parser-only change, no model/runtime/migrationall_non_null not supported (tool returns scalar)
Reuses battle-tested pick_value toolSynthetic steps visible in workflow editor
should_fail tests handled by tool’s error modesNo round-trip CWL re-export fidelity
Low risk — no changes to execution engineFile[] output via all_non_null impossible
Small diff, fast to implementScatter+conditional pattern not addressed
Tool already tested in Galaxy workflow suitecond-with-defaults (linkMerge+pickValue) unclear

Approach B: Native Framework Support

ProsCons
All 3 pickValue modes supportedDB migration required
Clean semantic model — pickValue is first-classNull detection (skipped vs empty) is hard
Benefits Galaxy-native workflows long-termDuplicate-label WorkflowOutputs may confuse editor
Scatter+conditional pattern addressable5-7 files, medium-large change
Correct CWL export round-trip possibleall_non_null returning list vs HDCA is unresolved
No synthetic steps polluting workflow graphHigher regression risk across workflow subsystem

Coverage Analysis (29 RED Tests)

By pickValue mode

ModeTestsPatternTool (A)Framework (B)
first_non_null8multi-sourceYESYES
pass_through_required_{false,true}_when x2 (+nojs)multi-sourceYESYES
first_non_null_{first,second}_non_null x2 (+nojs)multi-sourceYESYES
the_only_non_null4multi-sourceYESYES
pass_through_required_the_only_non_null (+nojs)multi-sourceYESYES
the_only_non_null_single_true (+nojs)multi-sourceYESYES
all_non_null6multi-sourceNOYES
all_non_null_{all_null,one,multi}_non_null x3 (+nojs)multi-sourceNOYES
scatter+conditional7scatterNOPARTIAL
condifional_scatter_on_nonscattered_{false,true_nojs} x3scatter+pickValueNOYES (Phase 6)
scatter_on_scattered_conditional (+nojs)scatter+pickValueNOYES (Phase 6)
conditionals_nested_cross_scatter (+nojs)nested scatterNOMAYBE
conditionals_multi_scatter (+nojs)hybrid multi+scatterNOMAYBE
Complex2multi+linkMergeNOPARTIAL
cond-with-defaults-{1,2}linkMerge+pickValue+File[]NOPARTIAL

Summary

Tool (A)Framework (B)
Covered12/29 (41%)18-24/29 (62-83%)
first_non_null + the_only_non_null12/1212/12
all_non_null (multi-source)0/66/6
scatter+conditional0/94-9/9 (phases)
cond-with-defaults0/20-2/2 (depends on linkMerge)

Implementation Effort

DimensionTool (A)Framework (B)
Files touched1 (parser.py)5-7 (model, migration, parser, import, run, export)
Lines of code~100-150~300-500
DB migrationNoYes
Runtime changesNoYes (run.py post-processing)
Regression riskLow (parser only)Medium-High (execution path)
Time estimate1-2 days5-8 days
ReviewabilityEasy — self-containedHarder — cross-cutting
Hardest sub-problemType mapping CWL->param_typeNull detection (skipped vs empty)

Recommendation

Pursue hybrid: Tool (A) first, Framework (B) later.

Rationale:

  1. Goal is CWL conformance, not Galaxy UX. Synthetic tool steps are invisible to CWL users — they never see the Galaxy workflow graph. 12 tests going green immediately is significant.
  2. Tool approach is low-risk and fast. Parser-only change, no migration, testable in 1-2 days.
  3. Framework approach has unsolved hard problems. Null detection, duplicate-label editor behavior, and all_non_null return type are each individually tricky. Stacking them makes the PR risky.
  4. The 12 easiest tests are the same 12 for both approaches. No wasted work — the parser’s get_outputs_for_label() skip logic for pickValue outputs (needed by A) is compatible with later adding framework support (B) for the remaining tests.
  5. all_non_null and scatter patterns can wait. They’re harder regardless of approach and may need the pick_value tool extended anyway (for expression/scalar types).

Phase 1 (this PR): Implement Approach A for first_non_null + the_only_non_null. Target: 12 tests RED->GREEN.

Phase 2 (future PR): Either extend pick_value tool with all_non_null mode (for string[] types) OR implement Framework support. Decision deferred until Phase 1 ships and scatter+conditional patterns are better understood.

Unresolved Questions