cwl-utils Dependency White Paper
Package: cwl-utils (v0.41)
License: Apache 2.0
Python: 3.10 - 3.14
Repository: https://github.com/common-workflow-language/cwl-utils
Docs: https://cwl-utils.readthedocs.io/
Overview
cwl-utils is the official Python utility library for loading, parsing, manipulating, and transforming Common Workflow Language (CWL) documents. It provides autogenerated Python dataclasses for CWL v1.0, v1.1, and v1.2, a version-dispatching parser, static type checking of workflow connections, document packing/splitting, expression handling, and several CLI tools for common CWL operations.
cwl-utils is not a CWL executor: it operates purely at the document and schema level, reading, validating, and transforming CWL definitions.
CWL Version Support
| Version | Parser Module | Status |
|---|---|---|
| v1.0 | cwl_utils.parser.cwl_v1_0 | Stable |
| v1.1 | cwl_utils.parser.cwl_v1_1 | Stable |
| v1.2 | cwl_utils.parser.cwl_v1_2 | Stable (latest, default) |
Each version has an autogenerated parser (~25-30k lines of Python dataclasses generated from the CWL schema-salad definitions), version-specific utilities, and version-specific expression refactoring support.
Core Python API
Document Loading
All loaders live in cwl_utils.parser and auto-detect cwlVersion to dispatch to the correct version-specific parser.
from cwl_utils.parser import load_document_by_uri, load_document, save
# Load from file path or URL
process = load_document_by_uri("workflow.cwl")
# Load from YAML string or object
process = load_document(yaml_string_or_dict)
# Serialize back to JSON/YAML-compatible dict
saved = save(process)
Key loading functions:
| Function | Input | Description |
|---|---|---|
| load_document_by_uri(path) | file path, file:// URI, or http(s):// URL | Primary entry point |
| load_document(doc) | YAML string or parsed dict | Load from in-memory data |
| load_document_by_string(string, uri) | Raw YAML string | Parse string then load |
| load_document_by_yaml(yaml, uri) | Pre-parsed YAML object | Load from ruamel.yaml output |
All loaders accept optional loadingOptions (for custom fetch behavior) and load_all=True to load all entries from $graph documents instead of just #main.
Returned objects are typed Python dataclass instances (e.g. cwl_v1_2.Workflow, cwl_v1_2.CommandLineTool) with full attribute access to all CWL fields.
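The version dispatch described above can be pictured with a small standard-library sketch. It is not the cwl-utils implementation: the loader functions and the `dispatch_load` name are hypothetical stand-ins, and a real loader returns typed objects rather than strings.

```python
from typing import Any, Callable, Dict


def _make_loader(version: str) -> Callable[[Dict[str, Any]], str]:
    """Build a hypothetical stand-in for one version-specific parser."""
    def loader(doc: Dict[str, Any]) -> str:
        return f"{version} parser loaded {doc['class']}"
    return loader


# One loader per supported cwlVersion (stand-ins, not the real parser modules).
LOADERS = {v: _make_loader(v) for v in ("v1.0", "v1.1", "v1.2")}


def dispatch_load(doc: Dict[str, Any]) -> str:
    """Pick a loader from the document's own cwlVersion field."""
    version = doc.get("cwlVersion")
    if version not in LOADERS:
        raise ValueError(f"Unsupported or missing cwlVersion: {version!r}")
    return LOADERS[version](doc)


print(dispatch_load({"cwlVersion": "v1.1", "class": "CommandLineTool"}))
```

Keying the dispatch on the document's own field is what lets calling code stay version-agnostic.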
Version-Agnostic Type Aliases
cwl_utils.parser exports union type aliases spanning all three CWL versions, enabling version-agnostic code:
from cwl_utils.parser import (
    Process, # Workflow | CommandLineTool | ExpressionTool | Operation
    Workflow, # v1.0 | v1.1 | v1.2 Workflow
    CommandLineTool, # v1.0 | v1.1 | v1.2 CommandLineTool
    ExpressionTool, # v1.0 | v1.1 | v1.2 ExpressionTool
    WorkflowStep, # v1.0 | v1.1 | v1.2 WorkflowStep
    WorkflowInputParameter,
    WorkflowOutputParameter,
    CommandInputParameter,
    CommandOutputParameter,
    DockerRequirement,
    SoftwareRequirement,
    InputArraySchema,
    InputEnumSchema,
    InputRecordSchema,
    File,
    Directory,
    SecondaryFileSchema, # v1.1+ only
)
Runtime type-checking tuples are also available (e.g. WorkflowTypes, CommandLineToolTypes, DockerRequirementTypes) for use with isinstance().
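The difference between a union alias (for annotations) and a type tuple (for `isinstance()`) can be shown with a minimal sketch. The classes below are hypothetical stand-ins for the per-version parser classes; only the pattern is meant to carry over.

```python
from typing import Union


# Hypothetical stand-ins for the per-version Workflow classes.
class WorkflowV1_0: ...
class WorkflowV1_1: ...
class WorkflowV1_2: ...
class ToolV1_2: ...


# Alias used in type annotations (checked statically).
AnyWorkflow = Union[WorkflowV1_0, WorkflowV1_1, WorkflowV1_2]

# Tuple used at runtime, playing the role of a "...Types" tuple.
WORKFLOW_TYPES = (WorkflowV1_0, WorkflowV1_1, WorkflowV1_2)


def describe(process: object) -> str:
    """Branch on the runtime type without naming a specific CWL version."""
    if isinstance(process, WORKFLOW_TYPES):
        return "workflow"
    return "other"


print(describe(WorkflowV1_1()), describe(ToolV1_2()))
```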
Serialization
from cwl_utils.parser import save
# Convert Python objects back to dicts suitable for YAML/JSON output
result = save(process, top=True, base_url="", relative_uris=True)
When saving a list of processes, save() automatically wraps them in a $graph document with the latest cwlVersion.
Utility Functions
from cwl_utils.parser import cwl_version, is_process, version_split
cwl_version(yaml_dict) # -> "v1.2" | None
is_process(obj) # -> bool
version_split("v1.2") # -> [1, 2]
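Splitting a version string into integers matters because plain string comparison orders versions incorrectly once a component reaches two digits. The helpers below are a standard-library sketch of that idea; `split_version` and `latest_version` are illustrative names, not part of cwl-utils.

```python
from typing import List


def split_version(version: str) -> List[int]:
    """Turn 'v1.2' into [1, 2]; assumes a leading 'v' and dot-separated integers."""
    return [int(part) for part in version.removeprefix("v").split(".")]


def latest_version(versions: List[str]) -> str:
    """Pick the numerically highest version string."""
    return max(versions, key=split_version)


print(split_version("v1.2"))
print(latest_version(["v1.0", "v1.2", "v1.1"]))
# A hypothetical two-digit minor version shows why integers are compared:
print(split_version("v1.10") > split_version("v1.9"), "v1.10" > "v1.9")
```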
Parser Utilities (cwl_utils.parser.utils)
Higher-level operations on parsed CWL objects.
Static Type Checking
from cwl_utils.parser.utils import static_checker
static_checker(workflow)
# Raises ValidationException with detailed source-line errors
# if any workflow step source/sink types are incompatible.
Validates all step input sources against their declared types, checks linkMerge compatibility (merge_nested, merge_flattened), and verifies pickValue semantics (first_non_null, only_non_null, all_non_null).
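To make the source/sink check concrete, here is a deliberately simplified sketch on plain CWL type structures (strings, unions as lists, arrays as dicts). It is not the static_checker algorithm: it is stricter (every source alternative must be accepted), it ignores records, enums, Any, and pickValue, and all function names are hypothetical.

```python
from typing import Any, List


def _alternatives(cwl_type: Any) -> List[Any]:
    """Treat a non-list type as a one-element union."""
    return cwl_type if isinstance(cwl_type, list) else [cwl_type]


def _accepted(src: Any, sink_alts: List[Any]) -> bool:
    for snk in sink_alts:
        if isinstance(src, dict) and isinstance(snk, dict):
            both_arrays = src.get("type") == snk.get("type") == "array"
            if both_arrays and can_assign(src["items"], snk["items"]):
                return True
        elif src == snk:
            return True
    return False


def can_assign(source: Any, sink: Any) -> bool:
    """True if every alternative of the source type is accepted by the sink."""
    sink_alts = _alternatives(sink)
    return all(_accepted(src, sink_alts) for src in _alternatives(source))


def merged_source_type(source_types: List[Any], link_merge: str) -> Any:
    """Type seen by the sink after merging several sources."""
    if link_merge == "merge_nested":
        items = list(source_types)  # one array element per source
    elif link_merge == "merge_flattened":
        # Array sources contribute their items, other sources themselves.
        items = [t["items"] if isinstance(t, dict) and t.get("type") == "array" else t
                 for t in source_types]
    else:
        raise ValueError(f"unknown linkMerge: {link_merge}")
    return {"type": "array", "items": items}


sources = [{"type": "array", "items": "File"}, "File"]
sink = {"type": "array", "items": "File"}
print(can_assign(merged_source_type(sources, "merge_flattened"), sink))
print(can_assign(merged_source_type(sources, "merge_nested"), sink))
```

The same two sources are compatible with a File[] sink under merge_flattened but not under merge_nested, which is the kind of mismatch a static check is meant to surface before execution.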
Type Inference
from cwl_utils.parser.utils import (
    type_for_source,
    type_for_step_input,
    type_for_step_output,
    param_for_source_id,
)
These functions resolve the actual CWL type flowing through workflow connections, accounting for scatter, linkMerge, and pickValue modifiers.
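As an illustration of how scatter changes the type flowing out of a step, the sketch below applies the rule from the CWL Workflow specification: a scattered step output becomes an array, and nested_crossproduct adds one array level per scattered input. The function name is hypothetical and the sketch does not cover linkMerge or pickValue.

```python
from typing import Any, List, Optional


def scattered_output_type(output_type: Any, scatter: List[str],
                          scatter_method: Optional[str] = None) -> Any:
    """Wrap a step output type in the array levels introduced by scatter."""
    if not scatter:
        return output_type
    # dotproduct and flat_crossproduct yield a single flat array;
    # nested_crossproduct nests one array per scattered input.
    if len(scatter) > 1 and scatter_method == "nested_crossproduct":
        depth = len(scatter)
    else:
        depth = 1
    for _ in range(depth):
        output_type = {"type": "array", "items": output_type}
    return output_type


print(scattered_output_type("File", ["sample"]))
print(scattered_output_type("File", ["sample", "ref"], "nested_crossproduct"))
```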
Step Loading & Conversion
from cwl_utils.parser.utils import load_step, convert_stdstreams_to_files
# Resolve a step's `run` field (handles both inline and URI references)
step_process = load_step(workflow_step)
# Normalize stdin/stdout/stderr shortcuts into File objects
convert_stdstreams_to_files(command_line_tool)
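The stdout shortcut normalization can be pictured on a plain dict. The sketch follows the equivalence given in the CWL CommandLineTool specification (type: stdout is a streamable File globbed from the tool's stdout file name); it is not the library's code, it covers only stdout, it assumes outputs are in list form, and `expand_stdout_shortcut` is a hypothetical name.

```python
import copy
import uuid
from typing import Any, Dict


def expand_stdout_shortcut(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the tool with 'type: stdout' outputs spelled out."""
    tool = copy.deepcopy(tool)
    for output in tool.get("outputs", []):
        if output.get("type") == "stdout":
            # Reuse the declared stdout file name, or invent a random one.
            filename = tool.setdefault("stdout", uuid.uuid4().hex)
            output["type"] = "File"
            output["streamable"] = True
            output["outputBinding"] = {"glob": filename}
    return tool


tool = {"class": "CommandLineTool", "baseCommand": "echo",
        "stdout": "log.txt", "outputs": [{"id": "out", "type": "stdout"}]}
print(expand_stdout_shortcut(tool)["outputs"][0])
```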
Input File Loading
from cwl_utils.parser.utils import load_inputfile_by_uri, load_inputfile
# Load CWL input/job files (the YAML files that provide runtime values)
inputs = load_inputfile_by_uri("v1.2", "inputs.yml")
Document Packing (cwl_utils.pack)
Consolidates multi-file CWL workflows (with $include, $import, run: references) into a single self-contained document.
from cwl_utils.pack import pack_process
packed = pack_process(cwl_dict, base_url, cwl_version)
Handles:
- Inlining of run: references to external tool definitions
- $include/$import resolution
- SchemaDefRequirement user-defined type inlining
- Local and remote (HTTP) file fetching
- GitHub symbolic link detection
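The core of run: inlining can be sketched without any file or network access by treating already-loaded documents as a dict keyed by their reference. This is an assumption-laden illustration, not pack_process itself: `inline_runs` and the `documents` mapping are hypothetical, $include/$import and SchemaDefRequirement handling are omitted, and cyclic references are not detected.

```python
from typing import Any, Dict


def inline_runs(process: Dict[str, Any],
                documents: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy where each step's string 'run' is replaced by its document."""
    packed = dict(process)
    if packed.get("class") != "Workflow":
        return packed
    new_steps = []
    for step in packed.get("steps", []):
        step = dict(step)
        run = step["run"]
        target = documents[run] if isinstance(run, str) else run
        step["run"] = inline_runs(target, documents)  # recurse into sub-workflows
        new_steps.append(step)
    packed["steps"] = new_steps
    return packed


documents = {
    "tool.cwl": {"class": "CommandLineTool", "baseCommand": "echo"},
    "sub.cwl": {"class": "Workflow", "steps": [{"id": "inner", "run": "tool.cwl"}]},
}
workflow = {"class": "Workflow", "steps": [{"id": "outer", "run": "sub.cwl"}]}
print(inline_runs(workflow, documents))
```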
Expression Handling (cwl_utils.expression)
Parses and evaluates CWL expressions ($(...) parameter references and ${...} JavaScript blocks).
from cwl_utils.expression import scanner
# Find the boundaries of the first expression in a string
result = scanner("prefix_$(inputs.name)_suffix")
# Returns a (start, end) tuple for the expression, or None if there is none
Supports:
- Parameter reference syntax: $(inputs.file.path)
- JavaScript expression syntax: ${return inputs.x + 1}
- Nested quoting (single/double quotes, backslash escapes)
- Configurable JavaScript engine sandboxing (cwl_utils.sandboxjs)
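To show why quoting has to be tracked when locating an expression, here is a small standalone scanner for $(...) parameter references. It is a sketch of the general technique (bracket matching with quote and escape awareness), not the cwl_utils.expression.scanner implementation; `find_param_ref` is a hypothetical name and ${...} blocks are not handled.

```python
from typing import Optional, Tuple


def find_param_ref(text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first $(...) span, end exclusive, or None."""
    start = text.find("$(")
    if start == -1:
        return None
    depth = 0
    quote = None  # the quote character we are currently inside, if any
    i = start + 1
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return start, i + 1
        i += 1
    return None  # unbalanced expression


text = "prefix_$(inputs.name)_suffix"
span = find_param_ref(text)
print(span, text[span[0]:span[1]])
```

Without the quote tracking, a closing parenthesis inside a string literal such as `$(inputs["x)"])` would end the match too early.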
File Format Validation (cwl_utils.file_formats)
Validates CWL file format annotations against ontologies (typically EDAM).
from cwl_utils.file_formats import check_format, formatSubclassOf
# Validate that a file's format matches allowed input formats
check_format(file_dict, allowed_formats, ontology_graph)
# Check ontology subclass relationships
formatSubclassOf(fmt_uri, class_uri, ontology_graph, visited=set())
Uses rdflib to traverse rdfs:subClassOf and owl:equivalentClass relationships.
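The traversal can be sketched without rdflib by modelling the ontology as two dictionaries of edges. The format labels are invented for illustration (they are not EDAM identifiers), and `is_subclass` is a hypothetical stand-in for the graph walk, with the visited set guarding against cycles introduced by equivalence links.

```python
from typing import Dict, Optional, Set

# Hypothetical ontology: child -> parents, plus symmetric equivalences.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "fmt:fastq-sanger": {"fmt:fastq"},
    "fmt:fastq": {"fmt:sequence"},
}
EQUIVALENT: Dict[str, Set[str]] = {
    "fmt:sequence": {"fmt:seq"},
    "fmt:seq": {"fmt:sequence"},
}


def is_subclass(fmt: str, cls: str, visited: Optional[Set[str]] = None) -> bool:
    """True if fmt equals cls or reaches it via subclass/equivalence edges."""
    visited = set() if visited is None else visited
    if fmt == cls:
        return True
    if fmt in visited:
        return False  # already explored; avoids looping on equivalences
    visited.add(fmt)
    neighbours = SUBCLASS_OF.get(fmt, set()) | EQUIVALENT.get(fmt, set())
    return any(is_subclass(n, cls, visited) for n in neighbours)


print(is_subclass("fmt:fastq-sanger", "fmt:seq"))
print(is_subclass("fmt:seq", "fmt:fastq"))
```

Defaulting `visited` to None and creating the set inside the function avoids sharing one mutable set across unrelated calls.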
Container Image Handling (cwl_utils.image_puller)
Abstract ImagePuller base class with concrete implementations:
| Class | Engine | Output |
|---|---|---|
| DockerImagePuller | Docker / Podman | .tar tarball |
| SingularityImagePuller | Singularity 2.6+ / 3.x+ | .img or .sif |
Both support force-pull and configurable save directories.
CWL Value Types (cwl_utils.types)
TypedDict definitions for CWL runtime values:
| Type | Description |
|---|---|
| CWLFileType | File object (location, basename, checksum, size, secondaryFiles, format, contents) |
| CWLDirectoryType | Directory object (location, basename, listing) |
| CWLOutputType | Union of all CWL output value types (primitives, File, Directory, arrays, records) |
| CWLObjectType | MutableMapping[str, CWLOutputType] |
| CWLParameterContext | {inputs, self, runtime} context for expression evaluation |
| CWLRuntimeParameterContext | Runtime context (outdir, tmpdir, cores, ram, exitCode, etc.) |
Type guard functions: is_file(), is_directory(), is_file_or_directory().
Built-in CWL type names: null, boolean, int, long, float, double, string, File, Directory, stdin, stdout, stderr, Any.
Schema Definition Handling (cwl_utils.schemadef)
Resolves SchemaDefRequirement user-defined types in CWL documents. Builds a type dictionary from inline and $import-ed schema definitions for use during packing and validation.
General Utilities (cwl_utils.utils)
| Function | Description |
|---|---|
| load_linked_file() | Fetch and parse imported CWL files (local or remote) |
| normalize_to_map() / normalize_to_list() | Convert between dict and list representations of CWL fields |
| resolved_path() | Resolve relative paths against base URIs |
| bytes2str_in_dicts() | Recursively decode byte strings in nested structures |
| yaml_dumps() | Serialize to YAML string |
| sanitise_schema_field() | Normalize CWL type shorthand (e.g. File? -> optional File) |
| to_pascal_case() | String case conversion |
| is_uri() / is_local_uri() / get_value_from_uri() | URI detection and parsing |
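Two of the conversions in this table are easy to picture on plain data: expanding the `?` and `[]` type suffixes defined by the CWL specification, and switching a field between its list and map forms. The helpers below are illustrative sketches with hypothetical names and signatures, not the cwl_utils.utils functions; the map form is assumed to hold dict values only.

```python
from typing import Any, Dict, List


def expand_shorthand(cwl_type: str) -> Any:
    """Expand 'T?' to a union with null and 'T[]' to an array schema."""
    if cwl_type.endswith("?"):
        return ["null", expand_shorthand(cwl_type[:-1])]
    if cwl_type.endswith("[]"):
        return {"type": "array", "items": expand_shorthand(cwl_type[:-2])}
    return cwl_type


def list_to_map(items: List[Dict[str, Any]], key_field: str = "id") -> Dict[str, Any]:
    """Key each entry by key_field and drop that field from the value."""
    return {item[key_field]: {k: v for k, v in item.items() if k != key_field}
            for item in items}


def map_to_list(mapping: Dict[str, Dict[str, Any]], key_field: str = "id") -> List[Dict[str, Any]]:
    """Inverse of list_to_map for dict-valued maps."""
    return [{key_field: key, **value} for key, value in mapping.items()]


print(expand_shorthand("string[]?"))
print(list_to_map([{"id": "reads", "type": "File"}]))
```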
CLI Tools
Six command-line tools are installed as console scripts:
cwl-cite-extract
Extract software citations/requirements from CWL documents. Traverses workflows recursively to find all SoftwareRequirement entries with package names and versions.
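The recursive traversal can be sketched on plain dicts. This is an illustration of the idea only: it assumes requirements, hints, and steps are in list form, it only descends into inline run entries, and `iter_software_packages` is a hypothetical name rather than part of the CLI tool.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_software_packages(process: Dict[str, Any]) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (package, versions) pairs from this process and nested steps."""
    for req in process.get("requirements", []) + process.get("hints", []):
        if req.get("class") == "SoftwareRequirement":
            for pkg in req.get("packages", []):
                yield pkg["package"], tuple(pkg.get("version", []))
    for step in process.get("steps", []):
        run = step.get("run")
        if isinstance(run, dict):  # external references would need loading first
            yield from iter_software_packages(run)


workflow = {
    "class": "Workflow",
    "hints": [{"class": "SoftwareRequirement",
               "packages": [{"package": "samtools", "version": ["1.19"]}]}],
    "steps": [{"id": "align", "run": {
        "class": "CommandLineTool",
        "requirements": [{"class": "SoftwareRequirement",
                          "packages": [{"package": "bwa"}]}]}}],
}
print(list(iter_software_packages(workflow)))
```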
cwl-docker-extract
Pull and cache all container images referenced in CWL documents via DockerRequirement. Supports Docker, Podman, Singularity, and udocker backends.
cwl-expression-refactor
Refactor inline CWL expressions ($(...) / ${...}) into standalone ExpressionTool or CommandLineTool steps, producing expression-free workflows.
cwl-graph-split
Unpack $graph-style CWL documents (multiple processes in one file) into separate files, one per process.
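Conceptually, splitting a $graph document means giving each process its own file and rewriting internal #id references to point at those files. The sketch below shows that on a parsed dict; the file naming scheme and both function names are assumptions made for illustration, not the behaviour of cwl-graph-split.

```python
from typing import Any, Dict


def _relink(step: Dict[str, Any]) -> Dict[str, Any]:
    """Point an internal '#name' run reference at the split-out file."""
    run = step.get("run")
    if isinstance(run, str) and run.startswith("#"):
        return {**step, "run": run[1:] + ".cwl"}
    return step


def split_graph(document: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map a hypothetical file name to each standalone process document."""
    version = document["cwlVersion"]
    files: Dict[str, Dict[str, Any]] = {}
    for process in document["$graph"]:
        entry = {"cwlVersion": version}
        entry.update({k: v for k, v in process.items() if k != "id"})
        if "steps" in entry:
            entry["steps"] = [_relink(step) for step in entry["steps"]]
        files[process["id"].lstrip("#") + ".cwl"] = entry
    return files


packed = {"cwlVersion": "v1.2", "$graph": [
    {"id": "#main", "class": "Workflow", "steps": [{"id": "say", "run": "#echo"}]},
    {"id": "#echo", "class": "CommandLineTool", "baseCommand": "echo"},
]}
for name, doc in split_graph(packed).items():
    print(name, doc)
```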
cwl-normalizer
Normalize CWL documents: upgrade to v1.2 (via cwl-upgrader), pack into a single document, and optionally refactor expressions. Outputs JSON or YAML.
cwl-inputs-schema-gen
Generate a JSON Schema from CWL workflow/tool input definitions. Useful for building input validation forms or generating documentation.
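A rough picture of the mapping from CWL inputs to JSON Schema, restricted to primitive types in the `name: type` map form. The type table and the `inputs_to_json_schema` name are assumptions for illustration; the schema actually produced by cwl-inputs-schema-gen may be structured differently and covers far more types.

```python
from typing import Any, Dict

# Assumed mapping from CWL primitive type names to JSON Schema types.
PRIMITIVES = {"boolean": "boolean", "int": "integer", "long": "integer",
              "float": "number", "double": "number", "string": "string"}


def inputs_to_json_schema(inputs: Dict[str, str]) -> Dict[str, Any]:
    """Build an object schema; inputs without a trailing '?' are required."""
    properties: Dict[str, Any] = {}
    required = []
    for name, cwl_type in inputs.items():
        optional = cwl_type.endswith("?")
        base = cwl_type[:-1] if optional else cwl_type
        properties[name] = {"type": PRIMITIVES[base]}
        if not optional:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


print(inputs_to_json_schema({"threads": "int", "label": "string?"}))
```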
Error Hierarchy
BaseException
    ArrayMissingItems
    MissingKeyField
    MissingTypeName
    RecordMissingFields
Exception
    JavascriptException
    SubstitutionError
    WorkflowException
        GraphTargetMissingException
schema_salad.exceptions.ValidationException (used extensively)
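Reading the listing as a tree in which the schema-related errors derive directly from BaseException, a practical consequence is that a plain `except Exception` handler will not catch them. The classes below are local stand-ins that mirror that reading; they are not imported from cwl-utils.

```python
from typing import Type


# Stand-ins mirroring the assumed hierarchy.
class ArrayMissingItems(BaseException): ...
class WorkflowException(Exception): ...
class GraphTargetMissingException(WorkflowException): ...


def classify(exc_type: Type[BaseException]) -> str:
    """Report which handler ends up catching an exception of this type."""
    try:
        raise exc_type("boom")
    except Exception:
        return "caught by except Exception"
    except BaseException:
        return "needs except BaseException"


print(classify(GraphTargetMissingException))
print(classify(ArrayMissingItems))
```

Code that must handle every cwl-utils error would, under this reading, need to name the BaseException-derived classes explicitly rather than rely on a broad `except Exception`.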
Dependencies
Required
| Package | Version Constraint | Purpose |
|---|---|---|
| schema-salad | >= 8.8, < 9 | Schema validation framework, YAML loading, source-line tracking |
| ruamel.yaml | >= 0.17.6, < 0.20 | YAML parsing with round-trip fidelity |
| rdflib | any | RDF graph handling for file format ontology validation |
| requests | any | HTTP client for fetching remote CWL documents |
| cwl-upgrader | >= 1.2.3 | CWL version upgrading (v1.0/v1.1 -> v1.2) |
| packaging | any | Version string comparison |
| typing_extensions | >= 4.10.0 | Backported typing features (TypeIs, etc.) |
Optional
| Extra | Packages | Purpose |
|---|---|---|
| pretty | cwlformat | Pretty-printing CWL output |
| testing | pytest, pytest-cov, pytest-xdist, jsonschema, udocker, cwltool | Test suite |
Integration Patterns
As a parsing library
Load CWL documents into typed Python objects for analysis, transformation, or code generation.
As a validation layer
Use static_checker() to type-check workflow connections before execution. Use check_format() to validate file format ontology compatibility.
As a transformation pipeline
Chain: load -> modify Python objects -> save -> write. Or use CLI tools: cwl-normalizer (upgrade + pack), cwl-expression-refactor (simplify expressions), cwl-graph-split (decompose).
As a metadata extractor
Extract software requirements (cwl-cite-extract), container images (cwl-docker-extract), or input schemas (cwl-inputs-schema-gen) from CWL documents.