COLLECTION_SEMANTICS_PLAN_GRAPHVIZ_DIAGRAMS

Plan: Auto-Generate Graphviz Collection Structure Diagrams

Overview

Add diagram generation to semantics.py that converts the YAML spec’s CollectionDefinition declarations into Graphviz SVG diagrams using the gx-collection-graphviz library, embedding them inline in the Sphinx documentation.

Current State

Add generate_diagrams() to semantics.py to convert CollectionDefinition objects into gx-collection-graphviz Collection objects and render them to SVG. This keeps gx-collection-graphviz a pure rendering library, with all YAML-specific logic in semantics.py.

YAML-to-Diagram Bridge

Conversion Logic

YAML format:

C: [paired, {forward: d_f, reverse: d_r}]
C: ["list:paired", {el1: {forward: d_f, reverse: d_r}}]

Equivalent gx-collection-graphviz objects (for the list:paired example):

Collection(name="C", collection_type="list:paired", elements=[
    Element(element_identifier="el1", collection_type="paired", elements=[
        Element(element_identifier="forward"),
        Element(element_identifier="reverse"),
    ])
])
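The example above relies on one rule: a nested Element's collection_type is the parent's type with its outermost rank removed, so the elements of a list:paired collection are paired collections, and the elements of a paired collection are datasets. A minimal sketch of that rule, assuming collection types are colon-separated strings as in the YAML; child_collection_type is a hypothetical helper name, not an existing function:

```python
from typing import Optional


def child_collection_type(collection_type: str) -> Optional[str]:
    """Return the collection type of the elements one level down.

    Returns None when the elements are datasets (leaves) rather than
    sub-collections. Hypothetical helper, not part of semantics.py or
    gx-collection-graphviz.
    """
    # Split off the outermost rank: "list:paired" -> ("list", ":", "paired").
    _outer, sep, rest = collection_type.partition(":")
    return rest if sep else None


for collection_type in ("list:paired", "list:list:paired", "paired"):
    print(collection_type, "->", child_collection_type(collection_type))
```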

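Key mapping:

  1. The YAML key (C) becomes Collection.name, and the type string becomes Collection.collection_type.
  2. Each key of the elements mapping becomes an Element.element_identifier.
  3. A nested mapping becomes an Element with its own elements; its collection_type is the parent type with the outermost rank removed (list:paired -> paired).
  4. A leaf value (a dataset name such as d_f) becomes an Element with only an identifier.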

Unique Structures (~7)

Collection Type | Elements
--- | ---
paired | {forward, reverse}
paired_or_unpaired (paired) | {forward, reverse}
paired_or_unpaired (unpaired) | {unpaired}
list | {i1, ..., in}
list:list | {o1: {inner}, ..., on: {inner}}
list:paired | {el1: {forward, reverse}}
list:paired_or_unpaired | {el1: {forward, reverse}}

Implementation Phases

Phase 1: Add gx-collection-graphviz as a dev dependency

Phase 2: Build conversion function

In semantics.py: collection_definition_to_collection(name, coll_def) -> Collection
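A minimal sketch of this conversion, under two assumptions: (a) coll_def has the same shape as the parsed YAML, a [collection_type, elements] pair whose elements map identifiers to dataset names or nested mappings (the real CollectionDefinition fields may differ), and (b) Collection and Element accept the keyword arguments used in the Conversion Logic example. The dataclasses are local stand-ins so the sketch runs without gx-collection-graphviz; in semantics.py they would be imported from the library instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


# Stand-ins mirroring the fields used in the plan's example (assumed, not
# imported from gx-collection-graphviz).
@dataclass
class Element:
    element_identifier: str
    collection_type: Optional[str] = None
    elements: list[Element] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    collection_type: str
    elements: list[Element] = field(default_factory=list)


def _convert_elements(elements: dict[str, Any], collection_type: str) -> list[Element]:
    # Elements of "list:paired" are "paired"; elements of "paired" are datasets.
    _outer, sep, child_type = collection_type.partition(":")
    converted = []
    for identifier, value in elements.items():
        if isinstance(value, dict):
            if not sep:
                raise ValueError(
                    f"{collection_type!r} cannot contain nested collection {identifier!r}"
                )
            converted.append(
                Element(
                    element_identifier=identifier,
                    collection_type=child_type,
                    elements=_convert_elements(value, child_type),
                )
            )
        else:
            # Leaf dataset: the dataset name (d_f, d_r, ...) is not kept.
            converted.append(Element(element_identifier=identifier))
    return converted


def collection_definition_to_collection(name: str, coll_def: list[Any]) -> Collection:
    collection_type, elements = coll_def
    return Collection(
        name=name,
        collection_type=collection_type,
        elements=_convert_elements(elements, collection_type),
    )
```

Called with the list:paired YAML example, this should produce the same structure as the Collection(...) expression in Conversion Logic; deep nesting such as list:list:paired falls out of the recursion.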

Phase 3: Generate SVG files
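This plan does not pin down gx-collection-graphviz's rendering entry point, so one way to keep this phase testable is to inject the renderer. A sketch under that assumption: render_svg is a hypothetical callable returning SVG text for one collection, and both this generate_diagrams() signature and the {name}.svg file naming are assumptions rather than the final design.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Mapping, TypeVar

C = TypeVar("C")


def generate_diagrams(
    collections: Mapping[str, C],
    output_dir: Path,
    render_svg: Callable[[C], str],
) -> list[Path]:
    """Write one SVG per collection into output_dir and return the paths written.

    render_svg stands in for whatever gx-collection-graphviz call produces
    SVG text; its name and signature are assumptions.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sorting by name makes the returned list deterministic regardless of input order.
    for name, collection in sorted(collections.items()):
        path = output_dir / f"{name}.svg"
        path.write_text(render_svg(collection), encoding="utf-8")
        written.append(path)
    return written
```

Injecting the renderer also lets the red-to-green SVG test run with a fake renderer and no system Graphviz.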

Phase 4: Embed in generated Markdown
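Embedding likely needs only a relative image reference per diagram. A sketch assuming the generated page sits next to doc/source/dev/collection_diagrams/ (so the relative path is collection_diagrams/<name>.svg) and that the Markdown renderer accepts standard image syntax; both function names are hypothetical.

```python
from typing import Iterable, Optional


def diagram_reference(
    name: str, alt: Optional[str] = None, diagram_dir: str = "collection_diagrams"
) -> str:
    """Return a Markdown image reference for one generated SVG."""
    return f"![{alt or name}]({diagram_dir}/{name}.svg)"


def diagram_block(names: Iterable[str], diagram_dir: str = "collection_diagrams") -> str:
    """Join references for several collections, one image per paragraph."""
    return "\n\n".join(diagram_reference(name, diagram_dir=diagram_dir) for name in names)
```

generate_docs() could append diagram_block(...) after each example's YAML; if diagrams are deduplicated, the names passed in would be the canonical structure names.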

Phase 5: Deduplicate
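If diagrams are made per unique structure (unresolved question 3), collections can be grouped by a structural key. A sketch assuming the same [collection_type, elements] shape as the YAML and generic leaf labels (unresolved question 5): the key keeps the collection type and element identifiers but ignores dataset names, so two paired collections over different datasets share one diagram. If leaves end up labeled with dataset names, those names would have to join the key. Function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Hashable, Mapping
from typing import Any


def structure_key(collection_type: str, elements: Mapping[str, Any]) -> Hashable:
    """Hashable key describing a collection's shape, ignoring dataset names."""

    def shape(value: Any) -> Hashable:
        if isinstance(value, Mapping):
            # Keep identifiers (forward, el1, ...) and their order; recurse into sub-collections.
            return tuple((identifier, shape(child)) for identifier, child in value.items())
        return None  # leaf dataset: its name (d_f, d_r, ...) does not affect the diagram

    return (collection_type, shape(elements))


def deduplicate(definitions: Mapping[str, Any]) -> dict[str, str]:
    """Map each collection name to the first name seen with the same structure."""
    canonical: dict[Hashable, str] = {}
    mapping: dict[str, str] = {}
    for name, (collection_type, elements) in definitions.items():
        key = structure_key(collection_type, elements)
        mapping[name] = canonical.setdefault(key, name)
    return mapping
```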

Phase 6: Testing (Red-to-Green)

Test file: test/unit/model/dataset_collections/test_semantics_diagrams.py

  1. Write conversion tests (fail) -> implement collection_definition_to_collection() -> pass
  2. Write SVG generation test (fail) -> implement generate_diagrams() -> pass
  3. Write markdown reference test (fail) -> modify generate_docs() -> pass

Key test cases: paired, nested list:paired, ellipsis, deep nesting (list:list:paired).

Phase 7 (Future): Transform Diagrams

Parse the results of “then” expressions to show the input and output collections side by side. This requires a parser for collection<list:paired, {...}> patterns.
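A first step toward that parser could split an expression into its collection type and its element body. A minimal regex sketch, assuming expressions have the form collection<TYPE, BODY> with a colon-separated TYPE; parsing BODY itself is left out, and parse_collection_expression is a hypothetical name.

```python
import re
from typing import Tuple

# collection<TYPE, BODY>: TYPE is colon-separated names such as list:paired;
# BODY is everything up to the final closing ">" (greedy match).
_COLLECTION_EXPR = re.compile(
    r"^\s*collection<\s*([A-Za-z_]+(?::[A-Za-z_]+)*)\s*,\s*(.*)>\s*$",
    re.DOTALL,
)


def parse_collection_expression(text: str) -> Tuple[str, str]:
    """Return (collection_type, raw_body) for a collection<...> expression."""
    match = _COLLECTION_EXPR.match(text)
    if match is None:
        raise ValueError(f"not a collection<...> expression: {text!r}")
    return match.group(1), match.group(2).strip()
```

If the bodies turn out to be YAML flow mappings, the raw body could then be handed to a YAML loader and fed through the same conversion used for input collections.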

Phase 8 (Future): Sphinx Extension

A custom Sphinx directive that generates the diagrams at build time. Lower priority, since pre-generating the SVGs already works.

File Changes

File | Change
--- | ---
lib/galaxy/model/dataset_collections/types/semantics.py | Add generate_diagrams(), conversion functions, SVG refs in generate_docs()
doc/source/dev/collection_diagrams/ | New directory for generated SVGs
Galaxy dev dependencies | Add gx-collection-graphviz
test/unit/model/dataset_collections/test_semantics_diagrams.py | New test file
Makefile (optional) | Add make collection-diagrams target

Unresolved Questions

  1. Pre-generate and commit the SVGs, or generate them at doc-build time? (Committing avoids a system Graphviz dependency at build time.)
  2. Publish gx-collection-graphviz to PyPI first, or use a git dependency? Alternatively, vendor its ~150 lines directly.
  3. Diagram per example, or per unique structure (deduplication)?
  4. Should is_valid: false examples get diagrams? (The structure itself is valid even when the operation isn’t.)
  5. Label leaves with dataset names (d_f, d_r) or generic “dataset”?
  6. SVG or PNG? (SVG recommended)