
Dependency Cwl Utils

CWL document parser and transformer with autogenerated dataclasses for v1.0, v1.1, v1.2

Revised: 2026-04-22
Revision: 2
Related Notes:
Component - CWL Ephemeral Collections, Component - CWL Workflow State, Component - Galaxy Workflow Expression Context, Dependency - CWL Conformance Tests

cwl-utils Dependency White Paper

Package: cwl-utils (v0.41)
License: Apache 2.0
Python: 3.10 to 3.14
Repository: https://github.com/common-workflow-language/cwl-utils
Docs: https://cwl-utils.readthedocs.io/

Overview

cwl-utils is the official Python utility library for loading, parsing, manipulating, and transforming Common Workflow Language (CWL) documents. It provides autogenerated Python dataclasses for CWL v1.0, v1.1, and v1.2, a version-dispatching parser, static type checking of workflow connections, document packing/splitting, expression handling, and several CLI tools for common CWL operations.

It is not a CWL executor. It operates purely at the document/schema level: reading, validating, and transforming CWL definitions.

CWL Version Support

| Version | Parser module | Status |
|---|---|---|
| v1.0 | cwl_utils.parser.cwl_v1_0 | Stable |
| v1.1 | cwl_utils.parser.cwl_v1_1 | Stable |
| v1.2 | cwl_utils.parser.cwl_v1_2 | Stable (latest, default) |

Each version has an autogenerated parser (~25-30k lines of Python dataclasses generated from the CWL schema-salad definitions), version-specific utilities, and version-specific expression refactoring support.

Core Python API

Document Loading

All loaders live in cwl_utils.parser and auto-detect cwlVersion to dispatch to the correct version-specific parser.

from cwl_utils.parser import load_document_by_uri, load_document, save

# Load from file path or URL
process = load_document_by_uri("workflow.cwl")

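# Load from a YAML string or an already-parsed dict
process = load_document(
    {"cwlVersion": "v1.2", "class": "CommandLineTool",
     "baseCommand": "echo", "inputs": [], "outputs": []}
)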

# Serialize back to JSON/YAML-compatible dict
saved = save(process)
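The auto-detection described above comes down to reading the top-level cwlVersion and routing the document to a matching parser. Below is a minimal, stdlib-only sketch of that dispatch pattern. The _parse_v1_x functions and dispatch_load are hypothetical stand-ins that only tag the document; they are not the real cwl_utils.parser.cwl_v1_x loaders.

```python
from typing import Any, Callable

# Hypothetical stand-ins for the version-specific loaders (cwl_v1_0, cwl_v1_1, cwl_v1_2).
def _parse_v1_0(doc: dict[str, Any]) -> dict[str, Any]:
    return {"parsed_by": "v1.0", **doc}

def _parse_v1_1(doc: dict[str, Any]) -> dict[str, Any]:
    return {"parsed_by": "v1.1", **doc}

def _parse_v1_2(doc: dict[str, Any]) -> dict[str, Any]:
    return {"parsed_by": "v1.2", **doc}

# One entry per supported cwlVersion.
PARSERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "v1.0": _parse_v1_0,
    "v1.1": _parse_v1_1,
    "v1.2": _parse_v1_2,
}

def dispatch_load(doc: dict[str, Any]) -> dict[str, Any]:
    """Route a parsed CWL mapping to the parser matching its cwlVersion."""
    version = doc.get("cwlVersion")
    if version is None:
        raise ValueError("document has no top-level cwlVersion")
    try:
        parser = PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported cwlVersion: {version!r}") from None
    return parser(doc)
```

In the real library the dispatch targets return typed dataclasses, and its error types may differ from the ValueError used here.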

Key loading functions:

| Function | Input | Description |
|---|---|---|
| load_document_by_uri(path) | File path, file:// URI, or http(s):// URL | Primary entry point |
| load_document(doc) | YAML string or parsed dict | Load from in-memory data |
| load_document_by_string(string, uri) | Raw YAML string | Parse the string, then load |
| load_document_by_yaml(yaml, uri) | Pre-parsed YAML object | Load from ruamel.yaml output |

All loaders also accept an optional loadingOptions argument (for custom fetch behavior) and load_all=True, which loads every entry of a $graph document instead of just #main.

Returned objects are typed Python dataclass instances (e.g. cwl_v1_2.Workflow, cwl_v1_2.CommandLineTool) with full attribute access to all CWL fields.

Version-Agnostic Type Aliases

cwl_utils.parser exports union type aliases spanning all three CWL versions, enabling version-agnostic code:

from cwl_utils.parser import (
    Process,              # Workflow | CommandLineTool | ExpressionTool | Operation
    Workflow,             # v1.0 | v1.1 | v1.2 Workflow
    CommandLineTool,      # v1.0 | v1.1 | v1.2 CommandLineTool
    ExpressionTool,       # v1.0 | v1.1 | v1.2 ExpressionTool
    WorkflowStep,         # v1.0 | v1.1 | v1.2 WorkflowStep
    WorkflowInputParameter,
    WorkflowOutputParameter,
    CommandInputParameter,
    CommandOutputParameter,
    DockerRequirement,
    SoftwareRequirement,
    InputArraySchema,
    InputEnumSchema,
    InputRecordSchema,
    File,
    Directory,
    SecondaryFileSchema,  # v1.1+ only
)

Runtime type-checking tuples are also available (e.g. WorkflowTypes, CommandLineToolTypes, DockerRequirementTypes) for use with isinstance().
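The alias-plus-tuple pattern lets one function accept any version's class and still check the type at runtime. The sketch below reproduces the pattern with three hypothetical per-version dataclasses (WorkflowV10, WorkflowV11, WorkflowV12). The real aliases in cwl_utils.parser cover the autogenerated classes instead.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical per-version classes standing in for cwl_v1_0/1_1/1_2.Workflow.
@dataclass
class WorkflowV10:
    id: str

@dataclass
class WorkflowV11:
    id: str

@dataclass
class WorkflowV12:
    id: str

# Static alias: used in type annotations.
Workflow = Union[WorkflowV10, WorkflowV11, WorkflowV12]
# Runtime tuple: passed to isinstance(), kept alongside the alias.
WorkflowTypes = (WorkflowV10, WorkflowV11, WorkflowV12)

def workflow_ids(objs: list[object]) -> list[str]:
    """Collect the ids of every workflow, regardless of CWL version."""
    return [o.id for o in objs if isinstance(o, WorkflowTypes)]
```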

Serialization

from cwl_utils.parser import save

# Convert Python objects back to dicts suitable for YAML/JSON output
result = save(process, top=True, base_url="", relative_uris=True)

When given a list of processes, save() automatically wraps them in a $graph document whose cwlVersion is the newest version found among them.

Utility Functions

from cwl_utils.parser import cwl_version, is_process, version_split

cwl_version(yaml_dict)    # -> "v1.2" | None
is_process(obj)           # -> bool
version_split("v1.2")     # -> [1, 2]

Parser Utilities (cwl_utils.parser.utils)

Higher-level operations on parsed CWL objects.

Static Type Checking

from cwl_utils.parser.utils import static_checker

static_checker(workflow)
# Raises ValidationException with detailed source-line errors
# if any workflow step source/sink types are incompatible.

Validates all step input sources against their declared types, checks linkMerge compatibility (merge_nested, merge_flattened), and verifies pickValue semantics (first_non_null, only_non_null, all_non_null).
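The linkMerge and pickValue modifiers that static_checker validates are defined by the CWL v1.2 specification. The sketch below applies them to runtime values rather than types, which makes their semantics easy to see. It is an illustrative re-implementation, not the checker itself, and link_merge and pick_value are hypothetical names.

```python
from typing import Any, Optional

def link_merge(values: list[Any], method: Optional[str] = None) -> list[Any]:
    """Combine the values of several sources feeding one sink.

    Assumes multiple sources; merge_nested is the default in that case.
    """
    if method in (None, "merge_nested"):
        # One item per source; list-valued sources stay nested.
        return list(values)
    if method == "merge_flattened":
        merged: list[Any] = []
        for v in values:
            # Array-valued sources are spliced in; scalars are appended.
            merged.extend(v if isinstance(v, list) else [v])
        return merged
    raise ValueError(f"unknown linkMerge: {method}")

def pick_value(values: list[Any], method: str) -> Any:
    """Apply pickValue to the (already merged) list of source values."""
    non_null = [v for v in values if v is not None]
    if method == "first_non_null":
        if not non_null:
            raise ValueError("first_non_null: all sources are null")
        return non_null[0]
    if method == "only_non_null":
        if len(non_null) != 1:
            raise ValueError(f"only_non_null: expected 1 non-null value, got {len(non_null)}")
        return non_null[0]
    if method == "all_non_null":
        return non_null  # may be an empty list
    raise ValueError(f"unknown pickValue: {method}")
```

At the type level the same rules decide whether a sink declared as, say, File[] can accept the combined sources.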

Type Inference

from cwl_utils.parser.utils import (
    type_for_source,
    type_for_step_input,
    type_for_step_output,
    param_for_source_id,
)

These functions resolve the actual CWL type flowing through workflow connections, accounting for scatter, linkMerge, and pickValue modifiers.
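Scatter is the modifier that most often changes the type flowing out of a step, because each scatter level wraps the tool's declared output type in an array. The sketch below follows the CWL specification's scatterMethod rules. Types are modeled as plain strings and dicts, and scattered_output_type is a hypothetical name.

```python
from typing import Any, Optional

def scattered_output_type(
    out_type: Any, scatter: list[str], method: Optional[str] = None
) -> Any:
    """Wrap a step output type for the given scatter inputs and scatterMethod."""
    if not scatter:
        return out_type  # no scatter: type is unchanged
    if len(scatter) > 1 and method is None:
        raise ValueError("scatterMethod is required when scattering over several inputs")
    # nested_crossproduct adds one array level per scattered input;
    # dotproduct, flat_crossproduct and single-input scatter add exactly one.
    depth = len(scatter) if method == "nested_crossproduct" else 1
    for _ in range(depth):
        out_type = {"type": "array", "items": out_type}
    return out_type
```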

Step Loading & Conversion

from cwl_utils.parser.utils import load_step, convert_stdstreams_to_files

# Resolve a step's `run` field (handles both inline and URI references)
step_process = load_step(workflow_step)

# Normalize stdin/stdout/stderr shortcuts into File objects
convert_stdstreams_to_files(command_line_tool)
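Per the CWL specification, an output of type stdout is shorthand for a streamable File output. Its glob matches the tool's stdout file name, and a random name is chosen if stdout is unset; stderr works the same way. An input of type stdin is shorthand for a File input whose path becomes the tool's stdin.

The dict-level sketch below shows that rewrite. convert_stdstreams is a hypothetical function on plain mappings, whereas convert_stdstreams_to_files operates on the parsed objects and may generate file names differently.

```python
import uuid
from typing import Any

def convert_stdstreams(tool: dict[str, Any]) -> dict[str, Any]:
    """Rewrite stdin/stdout/stderr shorthand types into explicit File parameters (in place)."""
    for out in tool.get("outputs", []):
        for stream in ("stdout", "stderr"):
            if out.get("type") == stream:
                # Reuse the tool's declared file name, or invent a random one.
                filename = tool.setdefault(stream, uuid.uuid4().hex)
                out["type"] = "File"
                out["streamable"] = True
                out["outputBinding"] = {"glob": filename}
    for inp in tool.get("inputs", []):
        if inp.get("type") == "stdin":
            inp["type"] = "File"
            # The File's path is fed to the tool's standard input.
            tool["stdin"] = f"$(inputs.{inp['id']}.path)"
    return tool
```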

Input File Loading

from cwl_utils.parser.utils import load_inputfile_by_uri, load_inputfile

# Load CWL input/job files (the YAML files that provide runtime values)
inputs = load_inputfile_by_uri("v1.2", "inputs.yml")

Document Packing (cwl_utils.pack)

Consolidates multi-file CWL workflows (with $include, $import, run: references) into a single self-contained document.

from cwl_utils.pack import pack_process

packed = pack_process(cwl_dict, base_url, cwl_version)

Handles:

  • Inlining of run: references to external tool definitions
  • $include / $import resolution
  • SchemaDefRequirement user-defined type inlining
  • Local and remote (HTTP) file fetching
  • GitHub symbolic link detection
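At its core, packing replaces each step's run: string with the document it points to, recursing into nested workflows. The sketch below performs only that step, against an in-memory dict of hypothetical file contents keyed by relative path; inline_runs is a hypothetical name. Real packing (pack_process) also resolves $import/$include, user-defined types, fragment identifiers and remote fetches.

```python
import copy
import posixpath
from typing import Any

def inline_runs(doc: dict[str, Any], path: str, files: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of doc with every string-valued step `run` replaced by its target."""
    packed = copy.deepcopy(doc)
    steps = packed.get("steps", [])
    # Steps may be written as a list or as a map keyed by step id.
    for step in (steps.values() if isinstance(steps, dict) else steps):
        run = step.get("run")
        if isinstance(run, str):
            # Resolve the reference relative to the referring file, then recurse.
            target = posixpath.normpath(posixpath.join(posixpath.dirname(path), run))
            step["run"] = inline_runs(files[target], target, files)
    return packed
```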

Expression Handling (cwl_utils.expression)

Parses and evaluates CWL expressions ($(...) parameter references and ${...} JavaScript blocks).

from cwl_utils.expression import scanner

# Find the boundaries of the first expression in a string
result = scanner("prefix_$(inputs.name)_suffix")
# Returns a (start, end) index tuple, or None if no expression is found

Supports:

  • Parameter reference syntax: $(inputs.file.path)
  • JavaScript expression syntax: ${return inputs.x + 1}
  • Nested quoting (single/double quotes, backslash escapes)
  • Configurable JavaScript engine sandboxing (cwl_utils.sandboxjs)
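Finding an expression's boundaries is a small bracket-matching state machine. It locates $( or ${, then tracks nesting depth while skipping quoted strings and backslash escapes. The sketch below illustrates the idea. It is a simplified stand-in for cwl_utils.expression.scanner, which has its own states and error handling, and find_expression is a hypothetical name.

```python
from typing import Optional

def find_expression(text: str) -> Optional[tuple[int, int]]:
    """Return (start, end) of the first $(...) or ${...} in text, or None."""
    closers = {"(": ")", "{": "}", "[": "]"}
    i = 0
    while i < len(text) - 1:
        if text[i] == "\\":
            i += 2  # escaped character, e.g. a literal \$(
            continue
        if text[i] == "$" and text[i + 1] in "({":
            stack = [closers[text[i + 1]]]  # expected closing brackets
            quote: Optional[str] = None
            j = i + 2
            while j < len(text):
                ch = text[j]
                if quote:
                    if ch == "\\":
                        j += 1  # skip the escaped character inside a string
                    elif ch == quote:
                        quote = None
                elif ch in "'\"":
                    quote = ch
                elif ch in closers:
                    stack.append(closers[ch])
                elif ch == stack[-1]:
                    stack.pop()
                    if not stack:
                        return (i, j + 1)
                j += 1
            raise ValueError(f"unterminated expression starting at {i}")
        i += 1
    return None
```

The returned pair is slice-compatible: text[start:end] yields the full expression, including the leading $.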

File Format Validation (cwl_utils.file_formats)

Validates CWL file format annotations against ontologies (typically EDAM).

from cwl_utils.file_formats import check_format, formatSubclassOf

# Validate that a file's format matches allowed input formats
check_format(file_dict, allowed_formats, ontology_graph)

# Check ontology subclass relationships
formatSubclassOf(fmt_uri, class_uri, ontology_graph, visited=set())

Uses rdflib to traverse rdfs:subClassOf and owl:equivalentClass relationships.
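formatSubclassOf answers "is this file's format the required format, or a descendant of it?" by walking subclass and equivalence edges. The sketch below performs the same kind of breadth-first walk over a hypothetical in-memory edge map instead of an rdflib Graph, so the ex: URIs and the is_format_subclass name are illustrative only.

```python
from collections import deque

# Hypothetical ontology: child -> parents via rdfs:subClassOf.
SUBCLASS_OF = {
    "ex:fastq_sanger": {"ex:fastq"},
    "ex:fastq": {"ex:sequence_format"},
    "ex:fasta": {"ex:sequence_format"},
}
# Hypothetical owl:equivalentClass links, stored in both directions.
EQUIVALENT = {"ex:fq": {"ex:fastq"}, "ex:fastq": {"ex:fq"}}

def is_format_subclass(fmt: str, cls: str) -> bool:
    """Breadth-first search from fmt along subClassOf and equivalentClass edges."""
    seen = {fmt}
    queue = deque([fmt])
    while queue:
        current = queue.popleft()
        if current == cls:
            return True
        for nxt in SUBCLASS_OF.get(current, set()) | EQUIVALENT.get(current, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

The seen set plays the role of the visited argument, which keeps cyclic equivalence links from looping forever.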

Container Image Handling (cwl_utils.image_puller)

Abstract ImagePuller base class with concrete implementations:

| Class | Engine | Output |
|---|---|---|
| DockerImagePuller | Docker / Podman | .tar tarball |
| SingularityImagePuller | Singularity 2.6+ / 3.x+ | .img or .sif |

Both support force-pull and configurable save directories.
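The class layout above follows a template-method pattern. The base class decides whether a pull is needed and where the image is saved, and each engine supplies its own commands. The sketch uses hypothetical class names and only builds the command lines; it does not run them.

The docker pull/save and singularity pull invocations are standard CLI usage, but the exact commands cwl-utils issues may differ.

```python
from abc import ABC, abstractmethod
from pathlib import Path

class SketchPuller(ABC):
    """Hypothetical base class: decides if a pull is needed; subclasses build commands."""

    suffix = ""

    def __init__(self, image: str, save_directory: str, force_pull: bool = False) -> None:
        self.image = image
        self.save_directory = Path(save_directory)
        self.force_pull = force_pull

    def output_path(self) -> Path:
        # Turn e.g. "registry.example/org/tool:1.0" into a file-system-safe name.
        safe = self.image.replace("/", "_").replace(":", "_")
        return self.save_directory / f"{safe}{self.suffix}"

    def plan(self) -> list[list[str]]:
        """Commands to run, or [] if a saved image exists and force_pull is off."""
        if self.output_path().exists() and not self.force_pull:
            return []
        return self.commands()

    @abstractmethod
    def commands(self) -> list[list[str]]: ...

class SketchDockerPuller(SketchPuller):
    suffix = ".tar"

    def commands(self) -> list[list[str]]:
        return [
            ["docker", "pull", self.image],
            ["docker", "save", "-o", str(self.output_path()), self.image],
        ]

class SketchSingularityPuller(SketchPuller):
    suffix = ".sif"

    def commands(self) -> list[list[str]]:
        force = ["--force"] if self.force_pull else []
        return [["singularity", "pull", *force, str(self.output_path()), f"docker://{self.image}"]]
```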

CWL Value Types (cwl_utils.types)

TypedDict definitions and type aliases for CWL runtime values:

| Type | Description |
|---|---|
| CWLFileType | File object (location, basename, checksum, size, secondaryFiles, format, contents) |
| CWLDirectoryType | Directory object (location, basename, listing) |
| CWLOutputType | Union of all CWL output value types (primitives, File, Directory, arrays, records) |
| CWLObjectType | MutableMapping[str, CWLOutputType] |
| CWLParameterContext | {inputs, self, runtime} context for expression evaluation |
| CWLRuntimeParameterContext | Runtime context (outdir, tmpdir, cores, ram, exitCode, etc.) |

Type guard functions: is_file(), is_directory(), is_file_or_directory().

Built-in CWL type names: null, boolean, int, long, float, double, string, File, Directory, stdin, stdout, stderr, Any.

Schema Definition Handling (cwl_utils.schemadef)

Resolves SchemaDefRequirement user-defined types in CWL documents. Builds a type dictionary from inline and $import-ed schema definitions for use during packing and validation.
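In practice, this means indexing every type declared under SchemaDefRequirement.types by name. Later references such as type: "#Sample" can then be replaced with the full definition. The minimal sketch below works over plain dicts and rests on three assumptions: list-form requirements, a single-file "#Name" convention, and the hypothetical names build_type_dict and resolve_type. The real module also follows $import-ed schema files.

```python
from typing import Any

def build_type_dict(process: dict[str, Any]) -> dict[str, Any]:
    """Index user-defined types from SchemaDefRequirement by their name."""
    types: dict[str, Any] = {}
    for req in process.get("requirements", []):
        if req.get("class") == "SchemaDefRequirement":
            for t in req.get("types", []):
                types[t["name"].lstrip("#")] = t
    return types

def resolve_type(type_ref: Any, types: dict[str, Any]) -> Any:
    """Replace '#Name' references with their definitions; leave other types alone."""
    if isinstance(type_ref, str) and type_ref.startswith("#"):
        return types[type_ref[1:]]
    if isinstance(type_ref, list):  # unions such as ["null", "#Name"]
        return [resolve_type(t, types) for t in type_ref]
    return type_ref
```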

General Utilities (cwl_utils.utils)

| Function | Description |
|---|---|
| load_linked_file() | Fetch and parse imported CWL files (local or remote) |
| normalize_to_map() / normalize_to_list() | Convert between dict and list representations of CWL fields |
| resolved_path() | Resolve relative paths against base URIs |
| bytes2str_in_dicts() | Recursively decode byte strings in nested structures |
| yaml_dumps() | Serialize to a YAML string |
| sanitise_schema_field() | Normalize CWL type shorthand (e.g. File? -> optional File) |
| to_pascal_case() | String case conversion |
| is_uri() / is_local_uri() / get_value_from_uri() | URI detection and parsing |
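Two of these helpers deal with CWL syntactic sugar:

- list-versus-map forms of fields keyed by id;
- the type shorthands T? (optional) and T[] (array).

The sketch below re-implements those rules from the CWL specification under hypothetical names (to_map, to_list, expand_type). The real helpers' signatures and edge-case handling may differ.

```python
from typing import Any

def to_map(items: list[dict[str, Any]], key: str = "id") -> dict[str, dict[str, Any]]:
    """[{'id': 'a', ...}] -> {'a': {...}}, the map form of inputs, outputs or steps."""
    return {item[key]: {k: v for k, v in item.items() if k != key} for item in items}

def to_list(mapping: dict[str, Any], key: str = "id") -> list[dict[str, Any]]:
    """{'a': {...}} -> [{'id': 'a', ...}]; a bare string value is taken as the type."""
    return [
        {key: name, **(value if isinstance(value, dict) else {"type": value})}
        for name, value in mapping.items()
    ]

def expand_type(shorthand: Any) -> Any:
    """'File?' -> ['null', 'File']; 'string[]' -> array schema; suffixes may combine."""
    if not isinstance(shorthand, str):
        return shorthand
    if shorthand.endswith("?"):
        return ["null", expand_type(shorthand[:-1])]
    if shorthand.endswith("[]"):
        return {"type": "array", "items": expand_type(shorthand[:-2])}
    return shorthand
```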

CLI Tools

Six command-line tools are installed as console scripts:

cwl-cite-extract

Extract software citations/requirements from CWL documents. Traverses workflows recursively to find all SoftwareRequirement entries with package names and versions.
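The recursive traversal can be sketched over plain dicts. It collects SoftwareRequirement packages from a process's requirements and hints, then descends into inline run: processes. This hypothetical illustration (software_packages is not a cwl-utils function) assumes list-form packages. The real tool works on parsed objects and also has to handle run: references that point to other files.

```python
from typing import Any, Iterator

def software_packages(process: dict[str, Any]) -> Iterator[tuple[str, list[str]]]:
    """Yield (package, versions) for every SoftwareRequirement, recursing into inline steps."""
    for section in ("requirements", "hints"):
        for req in process.get(section, []):
            if req.get("class") == "SoftwareRequirement":
                for pkg in req.get("packages", []):
                    yield pkg["package"], pkg.get("version", [])
    steps = process.get("steps", [])
    for step in (steps.values() if isinstance(steps, dict) else steps):
        if isinstance(step.get("run"), dict):  # inline processes only in this sketch
            yield from software_packages(step["run"])
```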

cwl-docker-extract

Pull and cache all container images referenced in CWL documents via DockerRequirement. Supports Docker, Podman, Singularity, and udocker backends.

cwl-expression-refactor

Refactor inline CWL expressions ($(...) / ${...}) into standalone ExpressionTool or CommandLineTool steps, producing expression-free workflows.

cwl-graph-split

Unpack $graph-style CWL documents (multiple processes in one file) into separate files, one per process.
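Splitting is essentially the inverse of packing. Each $graph entry becomes its own document, inheriting the top-level cwlVersion, and run: "#id" references are rewritten to point at the new files. The sketch below is hypothetical and works over dicts; its "<id>.cwl" file-naming scheme is an assumption, not necessarily what cwl-graph-split uses.

```python
import copy
from typing import Any

def split_graph(doc: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map '<id>.cwl' -> standalone process for every entry in a $graph document."""
    out: dict[str, dict[str, Any]] = {}
    for entry in doc["$graph"]:
        proc = copy.deepcopy(entry)
        name = proc["id"].lstrip("#")
        proc["id"] = name
        proc.setdefault("cwlVersion", doc["cwlVersion"])
        steps = proc.get("steps", [])
        for step in (steps.values() if isinstance(steps, dict) else steps):
            run = step.get("run")
            if isinstance(run, str) and run.startswith("#"):
                step["run"] = f"{run[1:]}.cwl"  # point at the sibling file
        out[f"{name}.cwl"] = proc
    return out
```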

cwl-normalizer

Normalize CWL documents: upgrade to v1.2 (via cwl-upgrader), pack into a single document, and optionally refactor expressions. Outputs JSON or YAML.

cwl-inputs-schema-gen

Generate a JSON Schema from CWL workflow/tool input definitions. Useful for building input validation forms or generating documentation.
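The core of such a generator is a recursive mapping from CWL types to JSON Schema. The hypothetical sketch below covers primitives, unions (including optional null unions), arrays, enums and File/Directory objects. The JSON Schema shapes chosen here are assumptions and need not match what cwl-inputs-schema-gen emits.

```python
from typing import Any

PRIMITIVES = {
    "string": {"type": "string"},
    "boolean": {"type": "boolean"},
    "int": {"type": "integer"},
    "long": {"type": "integer"},
    "float": {"type": "number"},
    "double": {"type": "number"},
    "null": {"type": "null"},
}

def cwl_type_to_schema(t: Any) -> dict[str, Any]:
    """Translate one CWL type expression into a JSON Schema fragment."""
    if isinstance(t, list):  # union, e.g. ["null", "File"]
        return {"anyOf": [cwl_type_to_schema(x) for x in t]}
    if isinstance(t, dict):
        if t["type"] == "array":
            return {"type": "array", "items": cwl_type_to_schema(t["items"])}
        if t["type"] == "enum":
            return {"type": "string", "enum": list(t["symbols"])}
        raise ValueError(f"unsupported type: {t}")
    if t in ("File", "Directory"):
        return {
            "type": "object",
            "properties": {"class": {"const": t}, "location": {"type": "string"}},
            "required": ["class"],
        }
    return dict(PRIMITIVES[t])

def inputs_schema(inputs: list[dict[str, Any]]) -> dict[str, Any]:
    """Build an object schema; inputs that admit null or have a default are optional."""
    props = {i["id"]: cwl_type_to_schema(i["type"]) for i in inputs}
    required = [
        i["id"] for i in inputs
        if not (isinstance(i["type"], list) and "null" in i["type"]) and "default" not in i
    ]
    return {"type": "object", "properties": props, "required": required}
```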

Error Hierarchy

BaseException
  ArrayMissingItems
  MissingKeyField
  MissingTypeName
  RecordMissingFields
Exception
  JavascriptException
  SubstitutionError
  WorkflowException
    GraphTargetMissingException
schema_salad.exceptions.ValidationException  (from schema-salad; raised by the loaders and static_checker)
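The split between BaseException and Exception matters when catching errors. The four schema errors derive from BaseException, so a generic `except Exception` handler will not catch them. The sketch below mirrors part of the hierarchy listed above with local classes (not imports from cwl_utils.errors) to demonstrate the effect.

```python
# Local mirror of part of the hierarchy listed above, for demonstration only.
class MissingTypeName(BaseException):
    pass

class WorkflowException(Exception):
    pass

class GraphTargetMissingException(WorkflowException):
    pass

def classify(exc: BaseException) -> str:
    """Report which handler catches exc when it is raised."""
    try:
        try:
            raise exc
        except Exception:
            return "caught by except Exception"
    except BaseException:
        return "escaped except Exception; needs except BaseException"
```

In calling code, the practical consequence is to name the BaseException-derived errors explicitly in an except clause rather than relying on a blanket `except Exception`.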

Dependencies

Required

| Package | Version constraint | Purpose |
|---|---|---|
| schema-salad | >= 8.8, < 9 | Schema validation framework, YAML loading, source-line tracking |
| ruamel.yaml | >= 0.17.6, < 0.20 | YAML parsing with round-trip fidelity |
| rdflib | any | RDF graph handling for file format ontology validation |
| requests | any | HTTP client for fetching remote CWL documents |
| cwl-upgrader | >= 1.2.3 | CWL version upgrading (v1.0/v1.1 -> v1.2) |
| packaging | any | Version string comparison |
| typing_extensions | >= 4.10.0 | Backported typing features (TypeIs, etc.) |

Optional

| Extra | Packages | Purpose |
|---|---|---|
| pretty | cwlformat | Pretty-printing CWL output |
| testing | pytest, pytest-cov, pytest-xdist, jsonschema, udocker, cwltool | Test suite |

Integration Patterns

As a parsing library

Load CWL documents into typed Python objects for analysis, transformation, or code generation.

As a validation layer

Use static_checker() to type-check workflow connections before execution. Use check_format() to validate file format ontology compatibility.

As a transformation pipeline

Chain: load -> modify Python objects -> save -> write. Or use CLI tools: cwl-normalizer (upgrade + pack), cwl-expression-refactor (simplify expressions), cwl-graph-split (decompose).

As a metadata extractor

Extract software requirements (cwl-cite-extract), container images (cwl-docker-extract), or input schemas (cwl-inputs-schema-gen) from CWL documents.
