gxformat2 workflow inputs
Use this note when translating or authoring the top-level inputs: section of a gxformat2 workflow. gxformat2-schema gives the closed vocabulary, but not the authoring posture: which aliases are current, which fields drive Galaxy runtime behavior, and which schema gaps should be annotated upstream.
Conceptual model
A gxformat2 workflow input is a top-level workflow interface item. Galaxy imports it as one of three native input step families:
| gxformat2 type | Native Galaxy step | Meaning | Authoring guidance |
|---|---|---|---|
data, File | data_input | One Galaxy dataset input. | Prefer data. Treat File as CWL-friendly alias accepted for compatibility. |
collection | data_collection_input | One Galaxy dataset collection input. | Pair with collection_type; if absent, Galaxy defaults to list. |
string, int, float, boolean | parameter_input | Scalar workflow parameter exposed at invocation. | Prefer current gxformat2 spellings: string, int, float, boolean. |
[<scalar-type>] | parameter_input with multiple: true in native state | Multiple primitive values. | Supported by conversion code for simple arrays. |
The current conversion code makes the alias split explicit. Native Galaxy stores parameter inputs as text and integer, but gxformat2 normalized export converts them to string and int. Import accepts both spellings and converts string -> text, int -> integer for native state.
Current vs compatibility aliases
Author new gxformat2 with the current normalized export vocabulary:
| Use in new gxformat2 | Also accepted | Evidence |
|---|---|---|
data | File, data_input | SALAD documents File as a data alias; normalization maps File and data_input to data. |
collection | data_collection, data_collection_input | Native input steps are not gxformat2 steps; SALAD says native input step types should be represented under inputs; normalization maps native aliases to collection. |
string | text | SALAD says text aliases string because Galaxy tools use text; export emits string. |
int | integer | SALAD says integer aliases int because Galaxy tools use integer; export emits int. |
float | double in parts of primitive vocabulary | Galaxy native workflow parameter inputs expose float. double is treated as numeric by gxformat2 lint default validation, but should not be presented as preferred authoring vocabulary. |
boolean | none meaningful | Native and gxformat2 agree. |
The generated structural JSON Schema includes null, long, double, integer, text, and File because it flattens primitive/SALAD vocabulary. That enum is permissive vocabulary, not a current-authoring recommendation.
Cross-cut fields
optional and default
optional controls whether Galaxy requires the workflow input at invocation. It defaults to false in SALAD and in native modules. default is inherited from the CWL-ish InputParameter base and is applied when the input object is missing or null.
Runtime behavior differs by input family:
- Dataset and collection inputs read
defaultonly when no invocation value is supplied; the default is converted throughraw_to_galaxy. - Parameter inputs read
defaultwhen no invocation value is supplied; non-dict defaults are wrapped as{value: <default>}before extracting the value. - The Galaxy parameter-input editor carries a backwards-compatibility conditional around defaults, but the code comment says defaults can now be set for optional and required parameters.
IWC Corpus Shape:
| Shape | Count |
|---|---|
| required, no default | 520 |
| required, default | 67 |
| optional, default | 84 |
| optional, no default | 46 |
Guidance: do not infer optional: true merely because default exists. IWC has required parameters with defaults, especially thresholds and numeric settings. Use optional: true when omission is semantically acceptable; use default when Galaxy should supply a value if the user omits or nulls the input.
format
format is optional and applies to dataset and collection inputs. Galaxy uses it as datatype-extension filtering for valid datasets. It is good hygiene and should be encouraged when the author is confident about the datatype extension, but it is better to omit format than to encode a weak guess. Valid extension vocabulary should cite galaxy-datatypes-conf.
collection_type
collection_type applies to type: collection. SALAD documents default list and colon-separated nested types. galaxy-collection-semantics is the broader authority for valid Galaxy collection shapes.
restrictions, suggestions, and restrictOnConnections
These fields are current Galaxy behavior but are not declared in the SALAD input records or the generated JSON Schema.
restrictions is a static closed option list for text inputs. Galaxy turns a text parameter with restrictions into a select input at runtime.
suggestions is a static open suggestion list for text inputs. Galaxy passes suggestions as options without switching the parameter type to select.
restrictOnConnections asks Galaxy to derive a text input’s valid choices from connected tool/subworkflow select options at runtime.
Option item shape is either a scalar value or an object with value and optional label; Galaxy converts both into runtime options and serializes colon-delimited editor state back into the source shape.
Input tags
Native Galaxy data and collection input modules expose a tag field used as a runtime input filter. The generated gxformat2 conversion currently does not copy tag from native input steps into top-level gxformat2 inputs.
Corpus status: native cleaned IWC workflows have tag present on 266 data/collection input step states, but every value is empty string or null. No non-empty workflow-input tag filter was observed. Generated gxformat2 workflows still contain many unrelated tool-state tag and output tags fields, so searches for tag: in whole files are not evidence for top-level input tags.
Guidance: treat input tag as real native Galaxy behavior but not yet gxformat2 interface vocabulary. If gxformat2 should preserve it, add it deliberately to the SALAD data/collection input records and conversion key lists.
Open questions
- Should
longanddoubleget explicit native mappings (long -> integer,double -> float) or remain permissive primitive vocabulary outside the preferred gxformat2 authoring set? - Should
tagbe added to gxformat2 inputs now, or deferred until a non-empty corpus example or user need appears?