CONNECTION_VALIDATION_TESTING_AND_TRACKING_PLAN

Connection Validation: Testing, Reporting, and Tracking

Replaces Phase 4 reporting and Phase 5 of CONNECTION_VALIDATION.md with a unified approach: fixture-based testing with sidecar expectations, Pydantic report models integrated into the existing validate.py infrastructure, and collection_semantics.yml tracking via fixture filenames.


1. Pydantic Report Models

The current connection validation uses dataclass result objects (ConnectionValidationResult, StepConnectionResult, WorkflowConnectionResult) that are separate from the existing _report_models.py Pydantic hierarchy. This section adds Pydantic models that:

1.1 New Models in _report_models.py

ConnectionStatus = Literal["ok", "invalid", "skip"]

class ConnectionResult(BaseModel):
    """Single connection between two steps.

    status and mapping are orthogonal:
    - status: whether the connection is valid (ok/invalid/skip)
    - mapping: what collection type is being mapped over (None = direct match)
    A connection can be ok with mapping (valid map-over) or ok without (direct).
    """
    source_step: str
    source_output: str
    target_step: str
    target_input: str
    status: ConnectionStatus
    mapping: Optional[str] = None  # collection type being mapped over, None = direct
    errors: List[str] = []

class ResolvedOutputType(BaseModel):
    """Resolved collection type for a step output."""
    name: str
    collection_type: Optional[str] = None  # None = plain dataset

class ConnectionStepResult(StepResultBase):
    """Connection validation result for one step."""
    step_type: str = "tool"
    map_over: Optional[str] = None
    connections: List[ConnectionResult] = []
    resolved_outputs: List[ResolvedOutputType] = []
    errors: List[str] = []

class ConnectionValidationReport(BaseModel):
    """Connection validation results for one workflow."""
    valid: bool
    step_results: List[ConnectionStepResult] = []
    summary: Dict[str, int] = {}  # {ok, invalid, skip}

1.2 Extend WorkflowValidationResult

Add an optional connection_report field to the existing workflow result:

class WorkflowValidationResult(WorkflowResultBase):
    step_results: List[ValidationStepResult] = Field(default=[], serialization_alias="results")
    connection_report: Optional[ConnectionValidationReport] = None
    # ... existing summary computed_field

When --connections is enabled, connection_report is populated. When disabled, it’s None and excluded from JSON output via model_dump(exclude_none=True). This keeps the existing --report-json shape stable for consumers that don’t care about connections, while enriching it for those that do.

1.3 Resolved Output Types

The validation engine currently builds resolved_output_types: Dict[str, Dict[str, CollectionTypeOrSentinel]] during traversal but discards it. Change validate_connection_graph() to return this alongside the result, and convert it to ResolvedOutputType entries on each ConnectionStepResult.

This is the key data that enables testing “what collection type does step X’s output Y resolve to after propagation?“

1.4 JSON Output Shape (example)

{
  "path": "workflow.ga",
  "results": [ ... tool_state validation ... ],
  "connection_report": {
    "valid": true,
    "step_results": [
      {
        "step": "0",
        "step_type": "data_collection_input",
        "resolved_outputs": [
          {"name": "output", "collection_type": "list:paired"}
        ]
      },
      {
        "step": "1",
        "tool_id": "collection_paired_test",
        "step_type": "tool",
        "map_over": "list",
        "connections": [
          {
            "source_step": "0",
            "source_output": "output",
            "target_step": "1",
            "target_input": "f1",
            "status": "ok",
            "mapping": "list"
          }
        ],
        "resolved_outputs": [
          {"name": "out1", "collection_type": "list"}
        ]
      }
    ],
    "summary": {"ok": 1, "invalid": 0, "skip": 0}
  }
}

1.5 Formatters

Add text and markdown formatters for ConnectionValidationReport following the existing patterns in validate.py. The text formatter replaces the current format_connection_text(). The markdown formatter adds a “Connection Validation” section to the existing markdown report.


2. Fixture-Based Test Infrastructure

2.1 Directory Layout

test/unit/tool_util/workflow_state/
    connection_workflows/
        ok_dataset_to_dataset.gxwf.yml
        ok_paired_to_paired.gxwf.yml
        ok_paired_to_paired_or_unpaired.gxwf.yml
        ok_list_paired_to_paired.gxwf.yml
        ok_list_to_multi_data.gxwf.yml
        ok_collection_type_source_passthrough.gxwf.yml
        ok_structured_like_passthrough.gxwf.yml
        ok_two_inputs_compatible_map_over.gxwf.yml
        ok_chain_map_over_propagates.gxwf.yml
        ok_collection_output_with_map_over.gxwf.yml
        ok_sample_sheet_to_list.gxwf.yml
        fail_list_to_paired.gxwf.yml
        fail_incompatible_map_over.gxwf.yml
        fail_paired_or_unpaired_to_paired.gxwf.yml
        ...
    connection_workflows/expected/
        ok_list_paired_to_paired.yml
        ok_chain_map_over_propagates.yml
        fail_incompatible_map_over.yml
        ...

2.2 Fixture Workflow Format

class: GalaxyWorkflow
inputs:
  input_collection:
    type: collection
    collection_type: list:paired
steps:
  step1:
    tool_id: collection_paired_test
    in:
      f1: input_collection

Tools referenced by tool_id are resolved via functional_test_tool_sourceparse_tool. The functional test tools in test/functional/tools/ already cover the needed archetypes:

ToolInputOutputUse For
collection_paired_testf1: pairedout1: datapaired consumption
collection_paired_or_unpairedf1: paired_or_unpairedout1: datapou consumption
collection_list_paired_or_unpairedf1: list:paired_or_unpairedout1: datacompound pou
collection_type_sourceinput_collect: collection(any)list_output: collection(type_source=input_collect)dynamic output types
collection_paired_structured_likeinput1: pairedlist_output: paired(structured_like)structured_like output
collection_creates_pairinput1: datapaired_output: pairedpaired creation
multi_data_paramf1: data(multiple)out1: datamulti-data reduction

Missing tool: A simple data → data tool is needed for basic mapping tests (e.g., ok_list_to_dataset.gxwf.yml). None of the existing functional test tools have a single data input and single data output — collection_creates_pair takes data but outputs a paired collection. Add a minimal simple_data_tool.xml to test/functional/tools/ with one data input and one data output.

2.3 FunctionalGetToolInfo

Adapter that bridges functional_test_tool_source to the GetToolInfo protocol:

class FunctionalGetToolInfo:
    """GetToolInfo backed by functional test tool XMLs."""

    def __init__(self):
        self._cache: Dict[str, Optional[ParsedTool]] = {}

    def get_tool_info(self, tool_id: str, tool_version: Optional[str]) -> Optional[ParsedTool]:
        if tool_id not in self._cache:
            try:
                ts = functional_test_tool_source(tool_id)
                self._cache[tool_id] = parse_tool(ts)
            except Exception:
                self._cache[tool_id] = None
        return self._cache[tool_id]

Shared as a module-level or session-scoped pytest fixture to avoid re-parsing tools per test.

2.4 Test Runner: test_connection_workflows.py

import pytest
import yaml
from pathlib import Path
from gxformat2.converter import python_to_workflow

WORKFLOW_DIR = Path(__file__).parent / "connection_workflows"
EXPECTED_DIR = WORKFLOW_DIR / "expected"

def discover_workflows():
    return sorted(WORKFLOW_DIR.glob("*.gxwf.yml"))

def workflow_ids():
    return [p.stem for p in discover_workflows()]

@pytest.fixture(scope="module")
def tool_info():
    return FunctionalGetToolInfo()

@pytest.mark.parametrize("workflow_path", discover_workflows(), ids=workflow_ids())
def test_connection_workflow(workflow_path, tool_info):
    """Validate fixture workflow and check ok_/fail_ expectation + sidecar assertions."""
    wf_dict = yaml.safe_load(workflow_path.read_text())
    native = python_to_workflow(wf_dict, MinimalGalaxyInterface())
    result = validate_connections(native, tool_info)
    report = to_connection_validation_report(result)

    stem = workflow_path.stem
    if stem.startswith("ok_"):
        assert report.valid, f"Expected valid but got errors: {_collect_errors(report)}"
    elif stem.startswith("fail_"):
        assert not report.valid, f"Expected invalid but workflow validated"
    else:
        pytest.skip(f"Unknown prefix for {stem}")

    # Check sidecar expectations if present
    expected_path = EXPECTED_DIR / f"{stem}.yml"
    if expected_path.exists():
        raw = yaml.safe_load(expected_path.read_text())
        expectations = [(e["target"], e["value"]) for e in raw]
        report_dict = report.model_dump(by_alias=True)
        dict_verify_each(report_dict, expectations)

The parametrized test auto-discovers workflows. Adding a new test case = dropping a YAML file.

2.5 Sidecar Expectation Files

Sidecar .yml files are hand-written lists of [json_path, expected_value] pairs — the same pattern as dict_verify_each in test/unit/tool_util/util.py. Each assertion is a deliberate human/agent judgment, not a snapshot.

Example for ok_list_paired_to_paired.yml:

- target: [valid]
  value: true
- target: [step_results, 1, map_over]
  value: "list"
- target: [step_results, 1, connections, 0, status]
  value: "ok"
- target: [step_results, 1, connections, 0, mapping]
  value: "list"
- target: [step_results, 1, resolved_outputs, 0, collection_type]
  value: "list"

Example for fail_incompatible_map_over.yml:

- target: [valid]
  value: false
- target: [step_results, 2, errors, 0]
  value: "Incompatible map-over types: list vs paired"

Properties:


3. Relationship to Existing Tests

3.1 What Moves to Fixtures

The following tests from test_connection_validation.py become fixture workflows:

Current TestFixture
test_dataset_to_dataset_okok_dataset_to_dataset.gxwf.yml
test_collection_to_matching_collection_okok_paired_to_paired.gxwf.yml
test_incompatible_collection_types_invalidfail_list_to_paired.gxwf.yml
test_paired_to_paired_or_unpaired_okok_paired_to_paired_or_unpaired.gxwf.yml
test_list_to_any_collection_okok_any_collection_accepts_list.gxwf.yml
test_list_paired_over_pairedok_list_paired_to_paired.gxwf.yml + expectations
test_list_over_datasetok_list_to_dataset.gxwf.yml + expectations
test_list_to_multi_data_okok_list_to_multi_data.gxwf.yml
test_paired_to_multi_data_not_reductionok_paired_maps_over_multi_data.gxwf.yml + expectations
test_incompatible_map_over_types_errorfail_incompatible_map_over.gxwf.yml + expectations
test_map_over_propagates_through_chainok_chain_map_over_propagates.gxwf.yml + expectations
test_collection_output_with_map_overok_collection_output_with_map_over.gxwf.yml + expectations
test_collection_type_source_resolutionok_collection_type_source.gxwf.yml + expectations
test_structured_like_resolutionok_structured_like.gxwf.yml + expectations

3.2 What Stays Programmatic

3.3 Migration Path

  1. Build the Pydantic report models and wire resolved outputs
  2. Build FunctionalGetToolInfo and the test runner
  3. Write fixture workflows one at a time, verifying against existing programmatic test assertions
  4. Write sidecar expectation files for fixtures needing detailed assertions
  5. Remove corresponding programmatic tests as fixtures take over
  6. Keep test_connection_types.py and test_connection_graph.py as-is

4. collection_semantics.yml Tracking

4.1 New Test Category

Add workflow_format_validation to each example’s tests: block. The value is the fixture workflow filename (without .gxwf.yml):

- example:
    label: BASIC_MAPPING_PAIRED
    # ... existing content ...
    tests:
        tool_runtime:
            api_test: "test_tool_execute.py::test_map_over_collection"
        workflow_runtime:
            framework_test: "collection_semantics_cat_0"
        workflow_editor: "accepts paired data -> data connection"
        workflow_format_validation: "ok_list_paired_to_paired"

The entry IS the filename. The test runner discovers the file by name, validates it, and the ok/fail status maps directly to the collection_semantics.yml example’s expected behavior.

4.2 Fixture ↔ Example Mapping

Each collection_semantics.yml example that describes a connection scenario gets a fixture workflow. The fixture filename should be traceable to the example label:

Example LabelFixtureExpected
BASIC_MAPPING_PAIREDok_basic_mapping_paired.gxwf.ymlvalid, map_over=paired
BASIC_MAPPING_LISTok_basic_mapping_list.gxwf.ymlvalid, map_over=list
COLLECTION_INPUT_PAIREDok_collection_input_paired.gxwf.ymlvalid, direct match
COLLECTION_INPUT_PAIRED_NOT_CONSUMES_LISTfail_paired_not_consumes_list.gxwf.ymlinvalid
PAIRED_OR_UNPAIRED_CONSUMES_PAIREDok_pou_consumes_paired.gxwf.ymlvalid, direct match
PAIRED_OR_UNPAIRED_NOT_CONSUMED_BY_PAIREDfail_pou_not_consumed_by_paired.gxwf.ymlinvalid
MAPPING_LIST_PAIRED_OVER_PAIREDok_list_paired_to_paired.gxwf.ymlvalid, map_over=list
LIST_REDUCTIONok_list_to_multi_data.gxwf.ymlvalid, reduction
PAIRED_REDUCTION_INVALIDfail_paired_reduction.gxwf.ymlinvalid
SAMPLE_SHEET_MATCHES_LISTok_sample_sheet_to_list.gxwf.ymlvalid
LIST_NOT_MATCHES_SAMPLE_SHEETfail_list_not_matches_sample_sheet.gxwf.ymlinvalid

Not every example maps 1:1 — some examples describe runtime behavior (job execution, element identity) that connection validation can’t test. Those get workflow_format_validation: ~ (null) to explicitly mark them as not applicable.

4.3 Coverage Verification

A script or test that loads collection_semantics.yml, extracts all examples with workflow_format_validation entries, and verifies that the referenced fixture file exists and passes/fails as expected. This prevents drift between the tracking document and the test fixtures.

def test_collection_semantics_coverage():
    """Verify all collection_semantics.yml entries have matching fixtures."""
    examples = load_collection_semantics_examples()
    for ex in examples:
        fixture_name = ex.tests.get("workflow_format_validation")
        if fixture_name is None:
            continue  # Not applicable for this example
        fixture_path = WORKFLOW_DIR / f"{fixture_name}.gxwf.yml"
        assert fixture_path.exists(), f"Missing fixture for {ex.label}: {fixture_name}"

4.4 Example Entries Without Fixture Workflows

Some examples describe behaviors that can’t be tested by connection validation alone:

These get workflow_format_validation: ~ with a comment explaining why.


5. Implementation Order

StepDeliversDepends On
5.1Pydantic report models (ConnectionResult, ConnectionStepResult, ConnectionValidationReport)Existing dataclass models
5.2Wire resolved output types into validation result5.1
5.3Extend WorkflowValidationResult with connection_report5.1
5.4Update validate.py to use Pydantic reports (replace format_connection_text, integrate into emit_reports)5.1, 5.3
5.5Text + markdown formatters for connection report5.4
5.6FunctionalGetToolInfo adapter + gxformat2 conversion helper
5.7Test runner (test_connection_workflows.py) with fixture discovery + dict_verify_each sidecar loading5.6
5.8First batch of fixture workflows (10-15 covering the main cases) + hand-written expectation files5.7
5.9Remove migrated programmatic tests from test_connection_validation.py5.8
5.10collection_semantics.yml entries for all fixture workflows5.8
5.11Coverage verification test5.10

6. Unresolved Questions