
gxformat2 workflow inputs

Conceptual model, current aliases, and schema gaps for gxformat2 workflow inputs.

Revised 2026-05-06 · Rev 2 · component


Use this note when translating or authoring the top-level inputs: section of a gxformat2 workflow. gxformat2-schema gives the closed vocabulary, but not the authoring posture: which aliases are current, which fields drive Galaxy runtime behavior, and which schema gaps should be annotated upstream.

Conceptual model

A gxformat2 workflow input is a top-level workflow interface item. Galaxy imports it as one of three native input step families:

| gxformat2 type | Native Galaxy step | Meaning | Authoring guidance |
|---|---|---|---|
| data, File | data_input | One Galaxy dataset input. | Prefer data. Treat File as a CWL-friendly alias accepted for compatibility. |
| collection | data_collection_input | One Galaxy dataset collection input. | Pair with collection_type; if absent, Galaxy defaults to list. |
| string, int, float, boolean | parameter_input | Scalar workflow parameter exposed at invocation. | Prefer current gxformat2 spellings: string, int, float, boolean. |
| [<scalar-type>] | parameter_input with multiple: true in native state | Multiple primitive values. | Supported by conversion code for simple arrays. |

The current conversion code makes the alias split explicit. Native Galaxy stores parameter inputs as text and integer, but gxformat2 normalized export converts them to string and int. Import accepts both spellings and converts string -> text, int -> integer for native state.
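The round trip described above can be sketched in a few lines. This is a minimal illustration, not the gxformat2 conversion code: the dictionary and function names are hypothetical, and only the four spellings named in this note are modeled.

```python
# Sketch only: these dictionaries restate the spellings named in this note.
# The names below are hypothetical and are not part of the gxformat2 API.
NATIVE_TO_GXFORMAT2 = {"text": "string", "integer": "int"}
GXFORMAT2_TO_NATIVE = {"string": "text", "int": "integer"}


def export_parameter_type(native_type: str) -> str:
    """Spelling emitted on normalized export (text -> string, integer -> int)."""
    return NATIVE_TO_GXFORMAT2.get(native_type, native_type)


def import_parameter_type(declared_type: str) -> str:
    """Spelling stored in native state; both spellings are accepted on import."""
    return GXFORMAT2_TO_NATIVE.get(declared_type, declared_type)


for declared in ("string", "text", "int", "integer", "float", "boolean"):
    native = import_parameter_type(declared)
    exported = export_parameter_type(native)
    print(f"{declared:8} -> native {native:8} -> export {exported}")
```

Under this model, a workflow authored with text imports to native text and comes back out of normalized export as string, which is why string is the spelling to author.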

Current vs compatibility aliases

Author new gxformat2 with the current normalized export vocabulary:

| Use in new gxformat2 | Also accepted | Evidence |
|---|---|---|
| data | File, data_input | SALAD documents File as a data alias; normalization maps File and data_input to data. |
| collection | data_collection, data_collection_input | Native input steps are not gxformat2 steps; SALAD says native input step types should be represented under inputs; normalization maps native aliases to collection. |
| string | text | SALAD says text aliases string because Galaxy tools use text; export emits string. |
| int | integer | SALAD says integer aliases int because Galaxy tools use integer; export emits int. |
| float | double in parts of primitive vocabulary | Galaxy native workflow parameter inputs expose float. double is treated as numeric by gxformat2 lint default validation, but should not be presented as preferred authoring vocabulary. |
| boolean | none meaningful | Native and gxformat2 agree. |

The generated structural JSON Schema includes null, long, double, integer, text, and File because it flattens primitive/SALAD vocabulary. That enum is permissive vocabulary, not a current-authoring recommendation.
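One way to apply this distinction while reviewing a workflow is a small vocabulary check. The sketch below is an assumption-laden illustration: the three sets restate the alias table and the schema enum listed in this note, review_input_type is a hypothetical helper rather than a gxformat2 lint rule, and long, double, and null get no suggested replacement because this note leaves their mapping open.

```python
# Sketch only: vocabulary copied from this note, helper name is hypothetical.
PREFERRED = {"data", "collection", "string", "int", "float", "boolean"}

COMPATIBILITY_ALIASES = {
    "File": "data",
    "data_input": "data",
    "data_collection": "collection",
    "data_collection_input": "collection",
    "text": "string",
    "integer": "int",
}

# Present in the permissive schema enum, but with no preferred spelling here.
PERMISSIVE_ONLY = {"null", "long", "double"}


def review_input_type(type_name: str):
    """Return (status, suggested_spelling) for one declared input type."""
    if type_name in PREFERRED:
        return ("preferred", type_name)
    if type_name in COMPATIBILITY_ALIASES:
        return ("alias", COMPATIBILITY_ALIASES[type_name])
    if type_name in PERMISSIVE_ONLY:
        return ("permissive", None)
    return ("unknown", None)


for declared in ("data", "File", "integer", "double", "fastq"):
    print(declared, review_input_type(declared))
```

A status of permissive means the structural schema would accept the spelling, not that it is a recommended authoring choice.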

Cross-cut fields

optional and default

optional controls whether Galaxy requires the workflow input at invocation. It defaults to false in SALAD and in native modules. default is inherited from the CWL-ish InputParameter base and is applied when the input object is missing or null.

Runtime behavior differs by input family:

  • Dataset and collection inputs read default only when no invocation value is supplied; the default is converted through raw_to_galaxy.
  • Parameter inputs read default when no invocation value is supplied; non-dict defaults are wrapped as {value: <default>} before extracting the value.
  • The Galaxy parameter-input editor carries a backwards-compatibility conditional around defaults, but the code comment says defaults can now be set for optional and required parameters.
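The parameter-input rule above can be modeled as follows. This is a simplified sketch, not Galaxy's implementation: resolve_parameter_input is a hypothetical name, a supplied value of None stands in for "missing or null", and the raw_to_galaxy conversion used for dataset and collection defaults is not modeled.

```python
def resolve_parameter_input(supplied, default=None):
    """Pick the effective value for a parameter input (simplified model).

    The default is consulted only when no invocation value is supplied.
    Non-dict defaults are wrapped as {"value": default} before the value
    is extracted, mirroring the rule described in this note.
    """
    if supplied is not None:
        return supplied
    if default is None:
        return None
    wrapped = default if isinstance(default, dict) else {"value": default}
    return wrapped.get("value")


print(resolve_parameter_input(None, 0.05))            # default used
print(resolve_parameter_input(0.01, 0.05))            # supplied value wins
print(resolve_parameter_input(None, {"value": "x"}))  # dict default unwrapped
print(resolve_parameter_input(False, True))           # False counts as supplied
```

The explicit is not None check matters: a supplied false or 0 is a real value and must not fall through to the default.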

IWC Corpus Shape:

| Shape | Count |
|---|---|
| required, no default | 520 |
| required, default | 67 |
| optional, default | 84 |
| optional, no default | 46 |

Guidance: do not infer optional: true merely because default exists. IWC has required parameters with defaults, especially thresholds and numeric settings. Use optional: true when omission is semantically acceptable; use default when Galaxy should supply a value if the user omits or nulls the input.
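To check a workflow against this guidance, the four shapes can be tallied directly from an inputs mapping. The sketch below makes several assumptions: the input names and values are invented for illustration, inputs is taken to be a mapping of name to definition, a bare string value is treated as shorthand for the type, and "has a default" means the default key is present.

```python
from collections import Counter


def input_shape(definition) -> str:
    """Classify one input as '<required|optional>, <default|no default>'."""
    if isinstance(definition, str):
        # Assumed shorthand: `name: data` is read as `name: {type: data}`.
        definition = {"type": definition}
    requirement = "optional" if definition.get("optional", False) else "required"
    default_state = "default" if "default" in definition else "no default"
    return f"{requirement}, {default_state}"


# Hypothetical inputs section, written as the dict a YAML loader would return.
inputs = {
    "reads": {"type": "data", "format": "fastqsanger"},
    "min_quality": {"type": "int", "default": 20},
    "adapter": {"type": "string", "optional": True},
    "trim": {"type": "boolean", "optional": True, "default": False},
}

shape_counts = Counter(input_shape(d) for d in inputs.values())
for shape, count in sorted(shape_counts.items()):
    print(f"{shape}: {count}")
```

Here min_quality is the "required, default" shape: it stays required even though Galaxy can fill in a value, so nothing about the default implies optional: true.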

format

format is optional and applies to dataset and collection inputs. Galaxy uses it to filter selectable datasets by datatype extension. Setting it is good hygiene when the author is confident about the extension, but omitting format is better than encoding a weak guess. Take the valid extension vocabulary from galaxy-datatypes-conf.

collection_type

collection_type applies to type: collection. SALAD documents default list and colon-separated nested types. galaxy-collection-semantics is the broader authority for valid Galaxy collection shapes.
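The default and the colon-separated nesting can be illustrated with a small helper. This is a sketch under stated assumptions: collection_ranks is a hypothetical name, the first element is read as the outermost level, and the helper only splits the string. It does not check that each rank is a valid Galaxy collection shape, which remains the job of galaxy-collection-semantics.

```python
def collection_ranks(input_definition: dict) -> list:
    """Split collection_type into its nesting levels, outermost first.

    Falls back to "list" when collection_type is absent or empty, matching
    the documented default. Rank names are not validated here.
    """
    collection_type = input_definition.get("collection_type") or "list"
    return collection_type.split(":")


print(collection_ranks({"type": "collection"}))
print(collection_ranks({"type": "collection", "collection_type": "paired"}))
print(collection_ranks({"type": "collection", "collection_type": "list:paired"}))
```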

restrictions, suggestions, and restrictOnConnections

These fields are current Galaxy behavior but are not declared in the SALAD input records or the generated JSON Schema.

restrictions is a static closed option list for text inputs. Galaxy turns a text parameter with restrictions into a select input at runtime.

suggestions is a static open suggestion list for text inputs. Galaxy passes suggestions as options without switching the parameter type to select.

restrictOnConnections asks Galaxy to derive a text input’s valid choices from connected tool/subworkflow select options at runtime.

Option item shape is either a scalar value or an object with value and optional label; Galaxy converts both into runtime options and serializes colon-delimited editor state back into the source shape.
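The two accepted item shapes can be brought to one form with a short normalizer. The sketch below is illustrative only: normalize_option and the example input are hypothetical, and reusing the value as the label when no label is given is an assumption of this sketch rather than documented Galaxy behavior.

```python
def normalize_option(item) -> dict:
    """Return {"value": ..., "label": ...} for a scalar or object option item."""
    if isinstance(item, dict):
        value = item["value"]
        # Assumption: fall back to the value when no label is provided.
        label = item.get("label", value)
    else:
        value = label = item
    return {"value": value, "label": label}


# Hypothetical text input with a closed option list, as a loaded YAML dict.
genome_input = {
    "type": "string",
    "restrictions": ["hg38", {"value": "mm10", "label": "Mouse (mm10)"}],
}

options = [normalize_option(item) for item in genome_input["restrictions"]]
for option in options:
    print(option)
```

The same normalizer would apply to suggestions, since the note describes one item shape for both fields; only the runtime treatment (closed select versus open suggestion list) differs.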

Input tags

Native Galaxy data and collection input modules expose a tag field used as a runtime input filter. The generated gxformat2 conversion currently does not copy tag from native input steps into top-level gxformat2 inputs.

Corpus status: cleaned native IWC workflows have tag present on 266 data/collection input step states, but every value is an empty string or null. No non-empty workflow-input tag filter was observed. Generated gxformat2 workflows still contain many unrelated tool-state tag fields and output tags fields, so a whole-file search for tag: is not evidence of top-level input tags.

Guidance: treat input tag as real native Galaxy behavior but not yet gxformat2 interface vocabulary. If gxformat2 should preserve it, add it deliberately to the SALAD data/collection input records and conversion key lists.

Open questions

  • Should long and double get explicit native mappings (long -> integer, double -> float) or remain permissive primitive vocabulary outside the preferred gxformat2 authoring set?
  • Should tag be added to gxformat2 inputs now, or deferred until a non-empty corpus example or user need appears?
