
Dependency Gxformat2

Bidirectional conversion between native .ga and Format2 YAML with linting and visualization

Revised: 2026-04-22
Revision: 2
Related Notes: Component - Format2 Workflows (gxformat2), Component - Workflow Format (.ga), Component - Workflow Format Differences, Component - gxformat2 Parsing and Syntax

gxformat2 — Implementation Overview & API Reference

Library for bidirectional conversion between Galaxy’s native .ga workflow format and the human-readable Format 2 (YAML-based) representation. Also provides linting, abstract CWL export, and visualization.

Public API

Exported from gxformat2:

from gxformat2 import (
    python_to_workflow,          # Format2 dict → native Galaxy dict
    from_galaxy_native,          # native Galaxy dict → Format2 dict
    convert_and_import_workflow,  # convert + POST to Galaxy via BioBlend
    ImporterGalaxyInterface,     # ABC for Galaxy API interactions
    ImportOptions,               # conversion options (e.g. deduplicate_subworkflows)
)

python_to_workflow(as_python, galaxy_interface, workflow_directory=None, import_options=None) → dict

Core Format2→native conversion. Takes a Format2 workflow as a Python dict, returns a native .ga-style dict. galaxy_interface can be None for offline conversion (only needed if the workflow uses run: GalaxyTool steps). workflow_directory resolves @import references for subworkflows.

from_galaxy_native(native_workflow_dict, tool_interface=None, json_wrapper=False) → dict

Reverse conversion: native .ga dict → Format2 dict. Marked “highly experimental” in the source. If json_wrapper=True, returns {"yaml_content": <yaml_string>} instead.

convert_and_import_workflow(has_workflow, **kwds) → dict

High-level: convert + import into a running Galaxy instance. Key kwargs:

kwarg | description
galaxy_interface | ImporterGalaxyInterface instance (or constructs BioBlendImporterGalaxyInterface from remaining kwargs)
source_type | "path" to load from filesystem
workflow_directory | base dir for resolving @import
convert | True (default) to convert Format2→native before import
name | override workflow name
publish | publish on import
exact_tools | require exact tool versions
url / admin_key / user_key | passed to BioBlendImporterGalaxyInterface

ImporterGalaxyInterface (ABC)

import abc

class ImporterGalaxyInterface(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def import_workflow(self, workflow, **kwds): ...  # must be overridden
    def import_tool(self, tool): ...  # optional; raises NotImplementedError by default

BioBlendImporterGalaxyInterface is the concrete implementation — wraps bioblend.GalaxyInstance.workflows.import_workflow_json(). Constructor accepts admin_gi/user_gi (BioBlend instances) or url+admin_key/user_key.
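
For tests or dry runs, a custom interface can stand in for a live Galaxy. The sketch below redefines a stand-in for the ABC so that it runs without gxformat2 installed; RecordingGalaxyInterface and the shape of its return value are hypothetical, not part of the library.

```python
import abc


class ImporterGalaxyInterface(metaclass=abc.ABCMeta):
    """Stand-in mirroring the ABC described above (not imported from gxformat2)."""

    @abc.abstractmethod
    def import_workflow(self, workflow, **kwds):
        """Import a native workflow dict and return a description of it."""

    def import_tool(self, tool):
        raise NotImplementedError()


class RecordingGalaxyInterface(ImporterGalaxyInterface):
    """Hypothetical test double: records workflows instead of calling Galaxy."""

    def __init__(self):
        self.imported = []

    def import_workflow(self, workflow, **kwds):
        self.imported.append((workflow, kwds))
        # The return shape is an assumption; a real Galaxy returns richer metadata.
        return {"id": f"fake-{len(self.imported)}", "name": workflow.get("name")}


interface = RecordingGalaxyInterface()
result = interface.import_workflow({"name": "demo", "steps": {}}, publish=True)
```

Only import_workflow has to be overridden; import_tool keeps its default behavior of raising NotImplementedError.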

ImportOptions

class ImportOptions:
    deduplicate_subworkflows: bool = False  # share subworkflow definitions via $graph

Format2 vs Native Galaxy Format

Aspect | Format 2 | Native (.ga)
Serialization | YAML | JSON
Marker | class: GalaxyWorkflow | a_galaxy_workflow: "true"
Steps | dict (keyed by label) or list | dict keyed by integer order_index strings
Inputs | top-level inputs: dict | steps with type: data_input etc.
Connections | CWL-style in: {x: {source: step/output}} | input_connections: {x: [{id: N, output_name: "..."}]}
Outputs | top-level outputs: with outputSource | workflow_outputs embedded in steps

Minimal Format2 example:

class: GalaxyWorkflow
inputs:
  the_input: data
steps:
  the_step:
    tool_id: cat1
    in:
      input1:
        source: the_input
outputs:
  the_output:
    outputSource: the_step/out_file1

Module Architecture

converter.py — Format2 → Native

Entry: python_to_workflow(). Flow:

  1. _preprocess_graphs() — handle $graph multi-workflow documents, register subworkflows
  2. convert_inputs_to_steps() — transform top-level inputs: into native input step dicts, prepend to step list
  3. Per-step dispatch via transform_{step_type}():
    • transform_tool() — resolves state/runtime_inputs/tool_state, builds post_job_actions from out specs
    • transform_subworkflow() — recursive conversion via run_workflow_to_step()
    • transform_data_input(), transform_data_collection_input(), transform_parameter_input() — input step scaffolding
    • transform_pause() — pause/review step
  4. _populate_input_connections() — converts CWL-style in/connect to native input_connections with numeric step IDs
  5. Output processing: maps each outputs[].outputSource to a workflow_outputs entry on the target step

ConversionContext tracks label→step_id mappings and subworkflow state. SubworkflowConversionContext delegates graph-level state to parent.
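
The flow above can be illustrated with a toy converter. This is a minimal sketch, not the library's implementation: it handles only data inputs and tool steps, the name convert_sketch is hypothetical, and it assumes that a source without a "/" refers to an output named "output".

```python
def convert_sketch(format2):
    """Toy Format2 -> native-like conversion: inputs, step IDs, connections, outputs."""
    ordered = []
    # Top-level inputs become input steps and are placed first.
    for label in format2.get("inputs", {}):
        ordered.append({"label": label, "type": "data_input"})
    for label, step in format2.get("steps", {}).items():
        ordered.append({"label": label, "type": "tool", **step})
    # Labels are mapped to numeric step IDs by position.
    label_to_id = {step["label"]: index for index, step in enumerate(ordered)}

    def split_source(source):
        # Assumption: a source without "/" refers to an output named "output".
        step_label, _, output_name = source.partition("/")
        return label_to_id[step_label], output_name or "output"

    native_steps = {}
    for index, step in enumerate(ordered):
        connections = {}
        for input_name, spec in step.pop("in", {}).items():
            source_id, output_name = split_source(spec["source"])
            connections[input_name] = [{"id": source_id, "output_name": output_name}]
        native_steps[str(index)] = {"id": index, "input_connections": connections, **step}
    # Workflow outputs are attached to the step that produces them.
    for label, output in format2.get("outputs", {}).items():
        source_id, output_name = split_source(output["outputSource"])
        native_steps[str(source_id)].setdefault("workflow_outputs", []).append(
            {"label": label, "output_name": output_name}
        )
    return {"a_galaxy_workflow": "true", "steps": native_steps}


minimal = {
    "class": "GalaxyWorkflow",
    "inputs": {"the_input": "data"},
    "steps": {"the_step": {"tool_id": "cat1", "in": {"input1": {"source": "the_input"}}}},
    "outputs": {"the_output": {"outputSource": "the_step/out_file1"}},
}
native = convert_sketch(minimal)
```

The real converter also builds tool_state, post-job actions, and subworkflows; the sketch only shows how labels turn into numeric IDs and how connections and outputs are rewritten against them.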

Post-Job Actions

Format2 out dict keys map to Galaxy PJA classes:

POST_JOB_ACTIONS = {
    'hide':         HideDatasetAction,
    'rename':       RenameDatasetAction,
    'change_datatype': ChangeDatatypeAction,
    'set_columns':  ColumnSetAction,
    'add_tags':     TagDatasetAction,
    'remove_tags':  RemoveTagDatasetAction,
    'delete_intermediate_datasets': DeleteIntermediatesAction,
}
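
As a rough illustration of how an out spec could become native post_job_actions entries, the sketch below covers three of the keys. The class names come from the mapping above, but the entry layout (a key made of action type plus output name, and an action_arguments dict) and the argument names newname and newtype are assumptions about Galaxy's native format; out_to_post_job_actions is a hypothetical helper.

```python
ACTION_TYPES = {
    "hide": "HideDatasetAction",
    "rename": "RenameDatasetAction",
    "change_datatype": "ChangeDatatypeAction",
}
# Assumed argument names inside action_arguments.
ARGUMENT_NAMES = {"rename": "newname", "change_datatype": "newtype"}


def out_to_post_job_actions(out):
    """Translate a Format2-style out dict into native-style post_job_actions."""
    actions = {}
    for output_name, spec in out.items():
        for key, value in spec.items():
            # Skip keys this sketch does not cover and flags set to False.
            if key not in ACTION_TYPES or value is False:
                continue
            action_type = ACTION_TYPES[key]
            arguments = {ARGUMENT_NAMES[key]: value} if key in ARGUMENT_NAMES else {}
            actions[action_type + output_name] = {
                "action_type": action_type,
                "output_name": output_name,
                "action_arguments": arguments,
            }
    return actions


pja = out_to_post_job_actions({"out_file1": {"hide": True, "rename": "merged"}})
```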

export.py — Native → Format2

Entry: from_galaxy_native(). Iterates native steps, dispatches by module_type:

  • data_input/data_collection_input/parameter_input → Format2 inputs: entries (using native_input_to_format2_type())
  • tool → step dict with tool_id, tool_version, recovered tool_state (parsed from JSON string)
  • subworkflow → recursive from_galaxy_native() call, result embedded as run:
  • pause → step with type: pause

_convert_input_connections() reverses native connections to CWL-style in: dicts. _convert_post_job_actions() reverses PJAs back to out: specs.
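
The reversal of connections can be sketched as follows. This is an illustration under assumptions, not the body of _convert_input_connections(): the helper name and the id_to_label argument are hypothetical, and the sketch always emits the full label/output form.

```python
def native_connections_to_in(input_connections, id_to_label):
    """Turn native input_connections into a CWL-style `in` dict."""
    format2_in = {}
    for input_name, connections in input_connections.items():
        # Native workflows may store a single connection as a bare dict.
        if isinstance(connections, dict):
            connections = [connections]
        sources = [f"{id_to_label[c['id']]}/{c['output_name']}" for c in connections]
        # One source stays a string; several become a list.
        format2_in[input_name] = {"source": sources[0] if len(sources) == 1 else sources}
    return format2_in


recovered = native_connections_to_in(
    {"input1": [{"id": 1, "output_name": "out_file1"}]},
    {0: "the_input", 1: "the_step"},
)
```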

model.py — Core Abstractions

Type system and shared utilities:

NativeGalaxyStepType = Literal["subworkflow", "data_input", "data_collection_input", "tool", "pause", "parameter_input"]

STEP_TYPE_ALIASES = {
    'input': 'data_input',
    'input_collection': 'data_collection_input',
    'parameter': 'parameter_input',
}

Key functions:

  • get_native_step_type(step_dict) — infer native type from Format2 step, resolving aliases; defaults to "tool" (or "subworkflow" if run present)
  • pop_connect_from_step_dict(step) — merge in and connect keys into unified connection dict, separating connections from defaults
  • setup_connected_values(value, key, append_to) — recursively walk tool state, replace {"$link": "step/output"} with {"__class__": "ConnectedValue"} and collect connections
  • inputs_as_native_steps(workflow_dict) — convert Format2 inputs: to native step dicts with proper type, parameter_type, tool_state etc.
  • steps_as_list(format2_workflow, add_ids, inputs_offset, mutate) — normalize steps from dict-or-list to list, optionally embedding IDs
  • convert_dict_to_id_list_if_needed(dict_or_list) — convert {key: value} to [{id: key, ...value}]
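
Two of these helpers are small enough to sketch from their descriptions. The code below is a reimplementation for illustration, not the library source; wrapping a bare string value as {"type": value} is an assumption.

```python
STEP_TYPE_ALIASES = {
    "input": "data_input",
    "input_collection": "data_collection_input",
    "parameter": "parameter_input",
}


def get_native_step_type(step):
    """Infer the native step type, resolving Format2 aliases."""
    step_type = step.get("type")
    if step_type is None:
        # No explicit type: a `run` key marks a subworkflow, otherwise a tool.
        step_type = "subworkflow" if "run" in step else "tool"
    return STEP_TYPE_ALIASES.get(step_type, step_type)


def convert_dict_to_id_list_if_needed(dict_or_list):
    """Convert {key: value} into [{"id": key, **value}]; lists pass through."""
    if isinstance(dict_or_list, list):
        return dict_or_list
    items = []
    for key, value in dict_or_list.items():
        # Assumption: shorthand such as `the_input: data` means {"type": "data"}.
        if not isinstance(value, dict):
            value = {"type": value}
        items.append({"id": key, **value})
    return items


as_list = convert_dict_to_id_list_if_needed({"the_input": "data", "count": {"type": "int"}})
```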

Input type mapping (inputs_as_native_steps):

Format2 type | Native step type | Notes
data, File | data_input |
collection, data_collection | data_collection_input |
int, integer | parameter_input | parameter_type: "integer"
string, text | parameter_input | parameter_type: "text"
float, color, boolean | parameter_input | parameter_type matches
[int] (array) | parameter_input | multiple: true in tool_state
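
The table can be expressed as a small lookup. This is a sketch that mirrors the table rather than inputs_as_native_steps() itself; native_input_step is a hypothetical name, and placing parameter_type inside tool_state is an assumption.

```python
DATA_TYPES = {"data", "File"}
COLLECTION_TYPES = {"collection", "data_collection"}
PARAMETER_TYPES = {
    "int": "integer",
    "integer": "integer",
    "string": "text",
    "text": "text",
    "float": "float",
    "color": "color",
    "boolean": "boolean",
}


def native_input_step(format2_type):
    """Map a Format2 input type to a native step type plus tool_state."""
    multiple = isinstance(format2_type, list)
    if multiple:
        (format2_type,) = format2_type  # e.g. ["int"] -> "int"
    if not multiple and format2_type in DATA_TYPES:
        return {"type": "data_input"}
    if not multiple and format2_type in COLLECTION_TYPES:
        return {"type": "data_collection_input"}
    if format2_type not in PARAMETER_TYPES:
        raise ValueError(f"unsupported input type: {format2_type!r}")
    tool_state = {"parameter_type": PARAMETER_TYPES[format2_type]}
    if multiple:
        tool_state["multiple"] = True
    return {"type": "parameter_input", "tool_state": tool_state}


array_step = native_input_step(["int"])
```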

normalize.py — Cross-Format Normalization

Provides format-agnostic views of workflows. Auto-converts native to Format2 via ensure_format2().

  • steps_normalized(workflow_dict=None, workflow_path=None) — returns all steps (inputs + tool/subworkflow steps) as normalized list
  • inputs_normalized(**kwd) — just the input steps
  • outputs_normalized(**kwd) — just the outputs
  • NormalizedWorkflow(input_workflow) — deep-copies and normalizes: replaces anonymous output references, ensures implicit out dicts
  • walk_id_list_or_dict(dict_or_list) — yields (key, value) regardless of dict or list representation
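
A helper like walk_id_list_or_dict lets callers ignore whether steps or inputs arrived as a dict or as a list. The following is a minimal sketch of that idea, assuming list items carry their key under "id".

```python
def walk_id_list_or_dict(dict_or_list):
    """Yield (key, value) pairs from either representation."""
    if isinstance(dict_or_list, list):
        for item in dict_or_list:
            # Assumption: list items identify themselves through an "id" key.
            yield item["id"], item
    else:
        yield from dict_or_list.items()


from_list = list(walk_id_list_or_dict([{"id": "a", "tool_id": "cat1"}]))
from_dict = list(walk_id_list_or_dict({"a": {"tool_id": "cat1"}}))
```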

lint.py + linting.py — Validation

from gxformat2.lint import lint_format2, lint_ga
from gxformat2.linting import LintContext

ctx = LintContext()
lint_format2(ctx, workflow_dict, path="/path/to/workflow.gxwf.yml")
# or
lint_ga(ctx, workflow_dict)

ctx.print_messages()
# ctx.found_errors, ctx.found_warns, ctx.error_messages, ctx.warn_messages

lint_format2() validates against the schema-salad v19.09 schema; path is required because the document is loaded through a file:// URI. lint_ga() validates native format structure.

Both check:

  • Structural correctness (required keys, types)
  • Workflow outputs exist and have labels
  • Report markdown validity (validate_galaxy_markdown)
  • Test tool shed references (warn)
  • Training topic tags (if training_topic set on LintContext)

Exit codes (from main()): 0 success, 1 warnings, 2 errors, 3 parse failure.
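
The exit-code convention can be sketched as a function of the lint results. The constant names and the precedence order (parse failure, then errors, then warnings) are assumptions made for illustration.

```python
# Hypothetical constant names; the numeric values are the ones listed above.
EXIT_SUCCESS = 0
EXIT_WARNINGS = 1
EXIT_ERRORS = 2
EXIT_PARSE_FAILURE = 3


def exit_code_for(found_errors, found_warns, parse_failed=False):
    """Pick a process exit code from lint results."""
    if parse_failed:
        return EXIT_PARSE_FAILURE
    if found_errors:
        return EXIT_ERRORS
    if found_warns:
        return EXIT_WARNINGS
    return EXIT_SUCCESS


code = exit_code_for(found_errors=False, found_warns=True)
```

A CI job could treat 1 as a soft failure and anything higher as a hard failure.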

abstract.py — CWL Export

from gxformat2.abstract import from_dict

cwl_dict = from_dict(workflow_dict)  # accepts either format, auto-converts

Produces CWL v1.2 abstract representation. Tool steps become Operation classes (non-executable). Subworkflows become nested Workflow classes. Uses NormalizedWorkflow to resolve anonymous outputs and ensure out dicts before export.

Type mapping: data → File, collection → File[], optional types get ? suffix.
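
The type mapping amounts to a lookup plus an optional suffix. In this sketch the function name is hypothetical, and passing other type names through unchanged is an assumption.

```python
CWL_TYPES = {"data": "File", "collection": "File[]"}


def cwl_type(format2_type, optional=False):
    """Map a Format2 input type to an abstract CWL type string."""
    # Assumption: types outside the table are passed through unchanged.
    base = CWL_TYPES.get(format2_type, format2_type)
    return base + "?" if optional else base


optional_collection = cwl_type("collection", optional=True)
```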

cytoscape.py — Visualization

from gxformat2.cytoscape import to_cytoscape

to_cytoscape("workflow.ga", "output.html")  # or .json for raw elements

Produces Cytoscape.js nodes/edges from normalized step representation. HTML output embeds the data in an interactive visualization template.
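
To give a feel for the output, the sketch below builds Cytoscape.js-style elements from a list of labeled steps. The element layout follows the Cytoscape.js elements JSON convention (group plus data with id, source, target); the choice of data fields and the edge ID scheme are assumptions, not what to_cytoscape() emits.

```python
def to_elements(steps):
    """Build node and edge elements from steps with `label` and `in` keys."""
    elements = []
    for step in steps:
        elements.append({"group": "nodes", "data": {"id": step["label"], "label": step["label"]}})
    for step in steps:
        for input_name, spec in step.get("in", {}).items():
            # Only the step part of "step/output" identifies the source node.
            source_label = spec["source"].partition("/")[0]
            elements.append(
                {
                    "group": "edges",
                    "data": {
                        "id": f"{source_label}->{step['label']}:{input_name}",
                        "source": source_label,
                        "target": step["label"],
                    },
                }
            )
    return elements


elements = to_elements(
    [
        {"label": "the_input"},
        {"label": "the_step", "in": {"input1": {"source": "the_input"}}},
    ]
)
```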

_scripts.py — Format Detection

from gxformat2._scripts import ensure_format2

format2_dict = ensure_format2(some_dict)  # converts from native if a_galaxy_workflow == "true"
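
Format detection relies on the markers listed in the format comparison. A minimal sketch, with detect_format as a hypothetical helper that is not part of gxformat2:

```python
def detect_format(workflow):
    """Classify a workflow dict by its format marker."""
    if workflow.get("a_galaxy_workflow") == "true":
        return "native"
    if workflow.get("class") == "GalaxyWorkflow":
        return "format2"
    raise ValueError("unrecognized workflow format")


kind = detect_format({"a_galaxy_workflow": "true", "steps": {}})
```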

yaml.py — YAML Utilities

from gxformat2.yaml import ordered_load, ordered_dump, ordered_load_path, ordered_dump_to_path

Safe YAML load/dump preserving dict ordering.


Connection Syntax

Format2 supports two connection styles in steps:

CWL-style in dict (preferred):

in:
  input_name:
    source: other_step/output_name

state with $link (for tool parameters):

state:
  param_name:
    $link: other_step/output_name
  nested:
    deep_param:
      $link: step2/result

$link values are replaced with {"__class__": "ConnectedValue"} in the native tool_state, and the connection is recorded in input_connections. The pipe-delimited key path (e.g. nested|deep_param) maps to Galaxy’s parameter addressing.
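
The replacement can be sketched as a recursive walk. This is an illustration of the idea behind setup_connected_values(), not its source: replace_links is a hypothetical name, and only nested dicts are handled (lists and repeats are out of scope).

```python
def replace_links(value, key="", connections=None):
    """Replace $link dicts with ConnectedValue markers and collect connections."""
    if connections is None:
        connections = {}
    if isinstance(value, dict):
        if "$link" in value:
            connections.setdefault(key, []).append(value["$link"])
            return {"__class__": "ConnectedValue"}, connections
        new_value = {}
        for child_key, child in value.items():
            # Nested keys are joined with "|" to form the parameter path.
            path = f"{key}|{child_key}" if key else child_key
            new_value[child_key], _ = replace_links(child, path, connections)
        return new_value, connections
    return value, connections


state = {
    "param_name": {"$link": "other_step/output_name"},
    "nested": {"deep_param": {"$link": "step2/result"}},
    "plain": 5,
}
tool_state, links = replace_links(state)
```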

Legacy # syntax (deprecated, opt-in via GXFORMAT2_SUPPORT_LEGACY_CONNECTIONS=1):

source: step#output  # → step/output
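
A sketch of how the opt-in rewrite might look. The environment variable name comes from the text above; the helper name and the rule of rewriting only the first "#" are assumptions.

```python
import os


def normalize_source(source, environ=None):
    """Rewrite legacy step#output to step/output when the opt-in flag is set."""
    environ = os.environ if environ is None else environ
    legacy_enabled = environ.get("GXFORMAT2_SUPPORT_LEGACY_CONNECTIONS") == "1"
    if legacy_enabled and "#" in source:
        return source.replace("#", "/", 1)
    return source


rewritten = normalize_source("step#output", {"GXFORMAT2_SUPPORT_LEGACY_CONNECTIONS": "1"})
```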

Subworkflow Handling

Three mechanisms:

  1. Inline run: — subworkflow dict embedded directly in the step
  2. @import (run: {"@import": "path/to/subworkflow.gxwf.yml"}), resolved relative to workflow_directory
  3. $graph — multi-workflow document with id-keyed entries; main is the entry point, others referenced by #graph_id. With ImportOptions.deduplicate_subworkflows=True, shared subworkflows are stored once in converted["subworkflows"]
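
The three mechanisms can be told apart by the shape of the run value. The sketch below is a hypothetical classifier, assuming that $graph references are strings starting with "#" and that @import paths are joined onto workflow_directory.

```python
import os


def classify_run(run, workflow_directory="."):
    """Return (kind, detail) for a step's `run` value."""
    if isinstance(run, str) and run.startswith("#"):
        # Reference to another entry of a $graph document.
        return "graph", run[1:]
    if isinstance(run, dict) and "@import" in run:
        return "import", os.path.join(workflow_directory, run["@import"])
    if isinstance(run, dict):
        # Subworkflow embedded directly in the step.
        return "inline", run
    raise ValueError(f"unsupported run value: {run!r}")


kind, detail = classify_run({"@import": "sub.gxwf.yml"}, workflow_directory="workflows")
```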

Schema Validation

gxformat2.schema.v19_09 is auto-generated from schema-salad definitions (via build_schema.sh). lint_format2() calls load_document("file://" + path) for structural validation. The schema defines the Format2 vocabulary: GalaxyWorkflow, step types, input types, output definitions, etc.


Dependencies

Package | Usage
pyyaml | YAML parse/dump
schema-salad >= 8.7 | Format2 schema validation, code generation
bioblend | Galaxy API interaction (BioBlendImporterGalaxyInterface)
typing_extensions | Literal type hints
