Galaxy Testing

Learning Questions

Where should I put a new test?
How do I write an API test?
When should I use an integration test vs an API test?
How do I test code that requires special Galaxy configuration?
How do I write Selenium/Playwright tests?

Learning Objectives

Use the decision tree to select appropriate test type
Write Python unit tests for isolated components
Write API tests using populators and assertions
Write integration tests with custom Galaxy configuration
Write Selenium tests using the smart component system
Understand CI workflows for each test type

Writing Tests for Galaxy

Tests are essential for Galaxy development:

Prevent regressions as code evolves
Document behavior through executable examples
Enable refactoring with confidence
Run in CI on every pull request

Galaxy has a comprehensive test suite covering everything from isolated Python functions to full-stack browser automation. This topic guides you through the different test types, when to use each, and how to write effective tests for your Galaxy contributions.

Other Resources

./run_tests.sh --help - Command-line options
GTN Writing Tests Tutorial - Hands-on exercises
client/README.md - Client-side testing details
Galaxy Architecture Slides - CI overview

The run_tests.sh script is the primary entry point for running Galaxy’s Python tests. Its help output provides the most up-to-date documentation on available test suites and command-line flags. The GTN tutorial provides hands-on exercises covering API tests, unit tests, and testability patterns.

Quick Reference

Test Type	Location	Run Command
Unit (Python)	`test/unit/`	`./run_tests.sh -unit`
Unit (Client)	`client/src/`	`make client-test`
API	`lib/galaxy_test/api/`	`./run_tests.sh -api`
Integration	`test/integration/`	`./run_tests.sh -integration`
Framework	`test/functional/tools/`	`./run_tests.sh -framework`
Workflow Framework	`lib/galaxy_test/workflow/`	`./run_tests.sh -framework-workflows`
Selenium	`lib/galaxy_test/selenium/`	`./run_tests.sh -selenium`
Playwright	`lib/galaxy_test/selenium/`	`./run_tests.sh -playwright`
Selenium Integration	`test/integration_selenium/`	`./run_tests.sh -selenium`

This table provides a quick reference for finding and running each test type. Note that Selenium and Playwright tests share the same test files in lib/galaxy_test/selenium/ but use different browser automation backends. The run commands shown are the most common invocations; see run_tests.sh --help for additional options.

Which Test Type?

Test Type Decision Tree

The decision tree helps you choose the right test type for your needs. The primary questions are: Does the test need a running Galaxy server? Does it need a web browser? Does it need custom Galaxy configuration? Your answers guide you to the appropriate test suite.

Decision Tree Walkthrough

No running server needed? → Unit test

Python backend → test/unit/
ES6/Vue client → client/src/

Server needed, no browser?

Standard config → API test
Custom config → Integration test
Tool/workflow only → Framework test

Browser needed? → Selenium/Playwright

Start by asking whether your test needs a running Galaxy server. If not, it’s a unit test. If yes, consider whether you need browser automation. Most backend tests work best as API tests since they’re faster and more reliable than browser tests. Only use integration tests when you need custom Galaxy configuration, and only use Selenium tests when testing the actual UI.

Python Unit Tests

Location: test/unit/

When to use:

Component can be tested in isolation
No database or web server needed
Complex logic worth testing independently

Run: ./run_tests.sh -unit

Python unit tests are located in test/unit/ and run via pytest. They’re ideal for components that are well-architected for isolation - those that shield complexity from consumers and have minimal external dependencies. Unit tests are fast and reliable since they don’t require Galaxy’s infrastructure.

Doctests Guidance

Doctests are executable examples embedded in docstrings - Python runs them to verify documentation stays accurate.

Doctests are more brittle and more restrictive.

Use doctests when:

Tests serve as documentation (definitely)
Tests are simple and isolated (maybe)

Use standalone tests when:

Tests are complex
Tests need fixtures or mocking
Tests verify edge cases

Doctests can be useful when the test itself serves as high-quality documentation for the component. However, they’re harder to maintain and more restrictive than standalone test files. For most cases, prefer writing tests in test/unit/ where you have full access to pytest fixtures and standard testing patterns.

External Dependency Tests

Tests in test/unit/tool_util/ are “slow” tests that interact with external services:

from .util import external_dependency_management

@external_dependency_management
def test_conda_install(tmp_path):
    # ... test conda operations

Run separately:

tox -e mulled
# or:
pytest -m external_dependency_management test/unit/tool_util/

Some unit tests interact with external services like Conda, container registries, or BioContainers. These are marked with the @external_dependency_management decorator and excluded from normal unit test runs since they’re slow and depend on network access. Run them explicitly with tox -e mulled when needed.

Python Unit Test CI

CI Platform: GitHub Actions

Workflow: .github/workflows/unit.yaml

Characteristics:

Runs on every pull request
Tests multiple Python versions (3.9, 3.14)
Moderately prone to flaky failures
If tests fail unrelated to your PR, request a re-run

Python unit tests run automatically on every pull request via GitHub Actions. The test suite can occasionally fail due to transient issues unrelated to your changes. If you see failures that don’t seem related to your PR, ping the Galaxy committers to request a re-run.

Frontend Unit Tests

Galaxy’s client unit tests test the ES6/TypeScript/Vue frontend components.

Technologies:

Vitest - Test framework
Vue Test Utils - Component testing
MSW - API mocking

New client-side code submissions should include accompanying unit tests for the developer-facing API. Vitest provides the test framework, Vue Test Utils enables component testing, and MSW (Mock Service Worker) provides type-safe API mocking.

Test File Structure

Place tests adjacent to code with .test.ts extension (older tests may use .test.js):

src/components/MyComponent/
├── MyComponent.vue
├── MyComponent.test.ts
└── test-utils.ts        # optional shared utilities

Standard imports:

import { createTestingPinia } from "@pinia/testing";
import { getLocalVue } from "@tests/vitest/helpers";
import { shallowMount } from "@vue/test-utils";
import { useServerMock } from "@/api/client/__mocks__";
import { beforeEach, describe, expect, it, vi } from "vitest";

Client tests are colocated with the source code they test, following the *.test.ts naming convention. This makes it easy to find tests for a given component and keeps related code together. The standard imports pattern shows the common utilities you’ll use in most tests.

Galaxy Testing Infrastructure

LocalVue Setup - configures BootstrapVue, Pinia, localization:

const localVue = getLocalVue();
// or with localization testing:
const localVue = getLocalVue(true);

Test Data Factories - consistent test data:

import { getFakeRegisteredUser } from "@tests/test-data";
const user = getFakeRegisteredUser({ id: "custom-id", is_admin: true });

Galaxy provides testing infrastructure to simplify component setup. getLocalVue() returns a Vue instance configured with common plugins. Test data factories like getFakeRegisteredUser() provide consistent mock data. Helper functions suppress known console warnings from third-party libraries that would otherwise clutter test output.

Test Data Factories

Galaxy’s @tests/test-data module provides factory functions for creating consistent test fixtures:

import {
    getFakeRegisteredUser,
    getFakeHistory,
    getFakeHistoryDataset,
} from "@tests/test-data";

// Create with defaults
const user = getFakeRegisteredUser();

// Override specific fields
const adminUser = getFakeRegisteredUser({ is_admin: true });
const history = getFakeHistory({ name: "Test History", size: 1024 });

When to use factories vs inline data:

Use factories when you need realistic, complete objects with all required fields
Use inline data for minimal test cases focused on specific fields
Extend factories when tests need consistent variations (e.g., admin users)

Available factories include user types (getFakeRegisteredUser, getFakeAnonymousUser), histories, datasets, and other common Galaxy objects. Check client/tests/test-data/ for the complete list.

API Mocking with MSW

Galaxy uses MSW with OpenAPI-MSW for type-safe API mocking:

import { useServerMock } from "@/api/client/__mocks__";

const { server, http } = useServerMock();

beforeEach(() => {
    server.use(
        http.get("/api/histories/{history_id}", ({ response }) => {
            return response(200).json({
                id: "history-id", name: "Test History",
            });
        }),
    );
});

MSW intercepts API requests at the network level, providing realistic API mocking. The useServerMock() helper sets up MSW with Galaxy’s OpenAPI spec, giving you type-safe request handlers. This is the preferred approach over axios-mock-adapter for new tests.

shallowMount vs mount

Prefer shallowMount for client unit tests.

`shallowMount` (preferred)	`mount`
Stubs child components	Renders full tree
Tests component in isolation	Tests integration
Faster, fewer mocks needed	Slower, more setup

// Preferred: isolated unit test
const wrapper = shallowMount(MyComponent, { localVue, pinia });

Integration testing → use Selenium/Playwright instead.

Client-side tests should be focused unit tests that verify individual component behavior. shallowMount stubs all child components, keeping tests isolated and fast. For testing parent-child interactions across the full component tree, Galaxy’s Selenium/Playwright tests are better suited since they test the real application in a browser.

Mount Wrapper Factories

Create reusable mount functions for complex setup:

async function mountMyComponent(propsData = {}, options = {}) {
    const pinia = createTestingPinia({ createSpy: vi.fn });

    const wrapper = shallowMount(MyComponent, {
        localVue,
        propsData: { defaultProp: "value", ...propsData },
        pinia,
        ...options,
    });

    await flushPromises();
    return wrapper;
}

For components with complex setup requirements, create a factory function that handles Pinia setup, default props, and waiting for promises. This keeps individual tests focused on behavior rather than boilerplate setup.

Selector Constants & Events

Define selectors as constants:

const SELECTORS = {
    SUBMIT_BUTTON: "[data-description='submit button']",
    ERROR_MESSAGE: "[data-description='error message']",
};
expect(wrapper.find(SELECTORS.ERROR_MESSAGE).exists()).toBe(true);

Testing emitted events:

await wrapper.find("input").setValue("new value");

expect(wrapper.emitted()["update:value"]).toBeTruthy();
expect(wrapper.emitted()["update:value"][0][0]).toBe("new value");

Define selectors as constants for maintainability - when component structure changes, update selectors in one place. For events, use wrapper.emitted() to verify components emit the right events with correct payloads.

Pinia Store Testing

In component tests:

const pinia = createTestingPinia({ createSpy: vi.fn, stubActions: false });
setActivePinia(pinia);

const wrapper = shallowMount(MyComponent, { localVue, pinia });

const userStore = useUserStore();
userStore.currentUser = getFakeRegisteredUser();

Isolated store tests:

beforeEach(() => setActivePinia(createPinia()));

it("updates state correctly", () => {
    const store = useMyStore();
    store.doAction();
    expect(store.someState).toBe("expected");
});

For component tests, use createTestingPinia() which provides a mock Pinia instance with spied actions. For isolated store unit tests, use a real Pinia instance to test store logic directly.

Async Operations

Use flushPromises() after API calls:

const wrapper = shallowMount(MyComponent, { localVue, pinia });
await flushPromises(); // Wait for mounted() API calls

Use nextTick() for Vue reactivity:

await wrapper.setProps({ value: "new" });
await nextTick();
expect(wrapper.text()).toContain("new");

Many component operations are asynchronous. Use flushPromises() to wait for all pending promises (API calls, async setup). Use Vue’s nextTick() when waiting for reactive updates to propagate to the DOM.

Testing Best Practices

Test behavior, not implementation
Avoid wrapper.vm directly - test through template
One behavior per test
Descriptive names: "displays error when API returns 500"
Clean up in beforeEach/afterEach
Mock external services, not component logic
Test edge cases: errors, empty data, boundaries

Focus on what users see and experience, not internal implementation details. Keep tests focused on a single behavior with clear names. Mock at appropriate boundaries - external services yes, component logic no. Don’t forget edge cases like error states and empty data.

Good vs Bad Test Examples

GOOD: Test user-visible behavior

test('displays error message when API fails', async () => {
    server.use(http.get("/api/data", ({ response }) => response(500).json({})));
    const wrapper = shallowMount(MyComponent, { localVue, pinia });
    await flushPromises();
    expect(wrapper.text()).toContain('Error loading data');
});

BAD: Test implementation details

test('calls fetchData method', () => {
    const fetchDataSpy = vi.spyOn(wrapper.vm, 'fetchData');
    wrapper.vm.loadData();
    expect(fetchDataSpy).toHaveBeenCalled();
});

Running Client Tests

Full test run (CI):

make client-test

Watch mode (development):

yarn test:watch
yarn test:watch MyModule      # Filter by name
yarn test:watch workflow/run  # Filter by path

The make client-test command runs the complete client test suite, which is what CI executes. During development, use watch mode for fast feedback - tests rerun automatically on file changes. Filter to specific tests by passing a pattern.

Client Test CI

CI Platform: GitHub Actions

Workflow: .github/workflows/client-unit.yaml

Linting: Run before submitting PRs:

make client-lint     # Check for issues
make client-format   # Auto-fix formatting

Client unit tests run automatically on every pull request via GitHub Actions. Prettier enforces code style, so run make client-lint and make client-format before submitting. If tests fail unrelated to your changes, request a re-run.

Tool Framework Tests Overview

Location: test/functional/tools/

What they test:

Galaxy tool wrapper definitions (XML)
Complex tool internals via actual tool execution
Legacy behavior compatibility

When to use:

Testing tool XML features
Verifying tool test assertions work correctly
No need to write Python - just XML

A surprising amount of Galaxy’s complexity is in tool wrapper definitions. The Tool Framework Tests (“Framework Tests”) exercise these internals by running actual tool tests. This suite uses sample tools in test/functional/tools/ to verify everything from simple parameter handling to complex output discovery. Adding a test here is often simpler than writing a Python API test.

Adding a Tool Test

Option 1: Add test block to existing tool

<!-- In test/functional/tools/some_tool.xml -->
<tests>
    <test>
        <param name="input1" value="test.txt"/>
        <output name="out_file1" file="expected.txt"/>
    </test>
</tests>

Option 2: Add new tool to sample_tool_conf.xml

<tool file="my_new_tool.xml" />

Run: ./run_tests.sh -framework

To add a framework test, either add a <test> block to an existing sample tool or create a new tool XML file and register it in test/functional/tools/sample_tool_conf.xml. The test framework handles server startup, tool execution, and output verification automatically. See Planemo’s Test-Driven Development documentation for detailed guidance on writing Galaxy tool tests.

Workflow Framework Tests

Location: lib/galaxy_test/workflow/

What they test:

Workflow evaluation engine
Input handling and connections
Output assertions

Structure: Each test has two files:

*.gxwf.yml - Workflow definition (Format2 YAML)
*.gxwf-tests.yml - Test cases with assertions

Run: ./run_tests.sh -framework-workflows

Workflow Framework Tests verify Galaxy’s workflow evaluation by running workflows and checking outputs. Unlike tool tests which use embedded <tests> blocks, workflow tests use separate definition and test files. The workflow is defined in Galaxy’s Format2 YAML syntax, and test cases specify inputs and expected outputs.

Workflow Framework Example

Workflow (default_values.gxwf.yml):

class: GalaxyWorkflow
inputs:
  input_int:
    type: int
    default: 1
outputs:
  out:
    outputSource: my_tool/out_file1
steps:
  my_tool:
    tool_id: integer_default
    in:
      input1: { source: input_int }

Tests (default_values.gxwf-tests.yml):

- doc: Test default value works
  job: {}
  outputs:
    out:
      asserts:
      - that: has_text
        text: "1"

This example shows a workflow that uses a default value for an integer input. The test case provides an empty job (no inputs), expecting the default to be used. The asserts section verifies the output contains “1”. You can also use expect_failure: true to verify workflows fail correctly for invalid inputs.

Framework Test CI

Tool Framework:

Workflow: .github/workflows/framework_tools.yaml
Stable, rarely flaky

Workflow Framework:

Workflow: .github/workflows/framework_workflows.yaml
Stable, rarely flaky

Both run on every PR and are maintained in GitHub Actions.

Framework tests run automatically on every pull request via GitHub Actions. Both tool and workflow framework tests are among the most stable CI jobs, typically not experiencing transient failures unrelated to the changes under test. Failures usually indicate a real problem with your changes.

API Tests Overview

Location: lib/galaxy_test/api/

What they test:

Galaxy backend via HTTP API
Standard Galaxy configuration
Most backend functionality

When to use:

Testing API endpoints
Backend logic accessible via API
No custom Galaxy config needed

Run: ./run_tests.sh -api

API tests are Python tests that exercise Galaxy’s backend by making HTTP requests to the Galaxy API. They run against a Galaxy server with default configuration and are generally preferred over integration tests because they’re faster (shared server) and can be run against external Galaxy instances for deployment testing.

Test Class Structure

from galaxy_test.base.populators import DatasetPopulator
from ._framework import ApiTestCase

class TestMyFeatureApi(ApiTestCase):
    dataset_populator: DatasetPopulator

    def setUp(self):
        super().setUp()
        self.dataset_populator = DatasetPopulator(self.galaxy_interactor)

    def test_something(self):
        history_id = self.dataset_populator.new_history()
        response = self._get(f"histories/{history_id}")
        self._assert_status_code_is(response, 200)

API tests inherit from ApiTestCase which provides HTTP methods, server setup, and Celery task handling. The setUp method initializes populators for creating test data. The galaxy_interactor handles authentication and request routing.

HTTP Methods

# GET request
response = self._get("histories")

# POST with data
response = self._post("histories", data={"name": "Test"})

# PUT, PATCH, DELETE
response = self._put(f"histories/{history_id}", data=payload)
response = self._patch(f"histories/{history_id}", data=updates)
response = self._delete(f"histories/{history_id}")

# Admin operations
response = self._get("users", admin=True)

# Run as different user (requires admin)
response = self._post("histories", data=data,
                      headers={"run-as": other_user_id}, admin=True)

The base class provides wrapped HTTP methods that handle authentication and URL routing. Use admin=True for admin-only endpoints. The run-as header lets admin users execute requests on behalf of other users.

Populators Concept

What: Abstractions over Galaxy API for test data creation

Why use them:

Simpler than raw HTTP requests
Handle waiting for async operations
Encapsulate common patterns
Maintain consistency across tests

Three main populators:

DatasetPopulator - datasets, histories, tools
WorkflowPopulator - workflows
DatasetCollectionPopulator - collections

Populators are the primary abstraction for creating test fixtures in Galaxy tests. Rather than making raw API calls and handling async operations manually, populators provide convenient methods that handle the complexity. Despite its name, DatasetPopulator has become a hub for many Galaxy operations beyond just datasets.

DatasetPopulator

self.dataset_populator = DatasetPopulator(self.galaxy_interactor)

# Create history and dataset
history_id = self.dataset_populator.new_history("Test History")
hda = self.dataset_populator.new_dataset(history_id, content="data", wait=True)

# Run a tool
result = self.dataset_populator.run_tool(
    tool_id="cat1",
    inputs={"input1": {"src": "hda", "id": hda["id"]}},
    history_id=history_id
)
self.dataset_populator.wait_for_tool_run(history_id, result, assert_ok=True)

# Get dataset content
content = self.dataset_populator.get_history_dataset_content(history_id)

DatasetPopulator is the most commonly used populator. It creates histories and datasets, runs tools, and waits for jobs to complete. The wait=True parameter blocks until the operation finishes. Use assert_ok=True to fail the test if jobs don’t complete successfully.

DatasetPopulator Advanced

Getting content - multiple ways to specify dataset:

# Most recent dataset in history
content = self.dataset_populator.get_history_dataset_content(history_id)
# By position (hid)
content = self.dataset_populator.get_history_dataset_content(history_id, hid=7)
# By dataset ID
content = self.dataset_populator.get_history_dataset_content(history_id, dataset_id=hda["id"])

The _raw pattern - for testing error responses:

# Convenience: returns parsed dict, asserts success
result = self.dataset_populator.run_tool("cat1", inputs, history_id)

# Raw: returns Response for testing edge cases
response = self.dataset_populator.run_tool_raw("cat1", inputs, history_id)
assert_status_code_is(response, 400)  # Test error handling

Many populator methods have _raw variants that return the raw Response object instead of parsed JSON. Use raw methods when testing error responses, status codes, or API edge cases. The convenience methods are better for readable tests focused on functionality rather than API details.

WorkflowPopulator

self.workflow_populator = WorkflowPopulator(self.galaxy_interactor)

# Create a simple workflow
workflow_id = self.workflow_populator.simple_workflow("Test Workflow")

# Upload workflow from YAML
workflow_id = self.workflow_populator.upload_yaml_workflow("""
class: GalaxyWorkflow
inputs:
  input1: data
steps:
  step1:
    tool_id: cat1
    in:
      input1: input1
""")

# Wait for invocation
self.workflow_populator.wait_for_invocation(workflow_id, invocation_id)

WorkflowPopulator handles workflow creation and execution. Use simple_workflow() for basic tests or upload_yaml_workflow() with Format2 YAML for more complex workflows. The populator also provides methods to invoke workflows and wait for their completion.

DatasetCollectionPopulator

self.dataset_collection_populator = DatasetCollectionPopulator(
    self.galaxy_interactor
)

# Create a list collection
hdca = self.dataset_collection_populator.create_list_in_history(
    history_id,
    contents=["data1", "data2", "data3"],
    wait=True
)

# Create a paired collection
pair = self.dataset_collection_populator.create_pair_in_history(
    history_id,
    contents=[("forward", "ACGT"), ("reverse", "TGCA")],
    wait=True
)

# Create nested collections (list:paired)
identifiers = self.dataset_collection_populator.nested_collection_identifiers(
    history_id, "list:paired"
)

DatasetCollectionPopulator creates dataset collections for testing collection-aware tools and workflows. It supports lists, pairs, and nested structures like list:paired. The contents parameter can be simple strings or tuples for paired data.

API Test Assertions

from galaxy_test.base.api_asserts import (
    assert_status_code_is,
    assert_status_code_is_ok,
    assert_has_keys,
    assert_error_code_is,
    assert_error_message_contains,
)

# Check HTTP status codes
response = self._get("histories")
assert_status_code_is(response, 200)
assert_status_code_is_ok(response)  # Any 2XX

# Check response structure
data = response.json()
assert_has_keys(data[0], "id", "name", "state")

# Check Galaxy error codes
assert_error_code_is(response, error_codes.USER_REQUEST_INVALID_PARAMETER)
assert_error_message_contains(response, "required field")

The api_asserts module provides assertion helpers for common API test patterns. These give better error messages than plain asserts. The test class also provides wrapper methods like self._assert_status_code_is() for convenience.

Test Decorators

from galaxy_test.base.decorators import (
    requires_admin,
    requires_new_user,
    requires_new_history,
)
from galaxy_test.base.populators import skip_without_tool

class TestMyApi(ApiTestCase):
    @requires_admin
    def test_admin_only_endpoint(self):
        # Test runs only with admin user
        ...

    @requires_new_user
    def test_fresh_user(self):
        # Creates new user for test isolation
        ...

    @skip_without_tool("cat1")
    def test_cat_tool(self):
        # Skips if cat1 tool not installed
        ...

Decorators provide conditional test execution. @requires_admin ensures the test runs as an admin user. @requires_new_user creates a fresh user for isolation. @skip_without_tool skips tests when required tools aren’t available (useful for tests that depend on specific tool installations).

Context Managers

User switching:

def test_permissions(self):
    # Create resource as default user
    history_id = self.dataset_populator.new_history()

    # Test access as different user
    with self._different_user("other@example.com"):
        response = self._get(f"histories/{history_id}")
        self._assert_status_code_is(response, 403)

    # Test anonymous access
    with self._different_user(anon=True):
        response = self._get("histories")
        # Verify anonymous behavior

Context managers enable testing with different user contexts within the same test. Use _different_user() to verify permission checks and access controls. The anon=True parameter tests unauthenticated access.

Async & Celery

# Wait for history jobs to complete
self.dataset_populator.wait_for_history(history_id, assert_ok=True)

# Wait for specific job
job_id = result["jobs"][0]["id"]
self.dataset_populator.wait_for_job(job_id, assert_ok=True)

# Wait for workflow invocation
self.workflow_populator.wait_for_invocation(workflow_id, invocation_id)

# Wait for async task (Celery)
self.dataset_populator.wait_on_task(async_response)

ApiTestCase includes UsesCeleryTasks - Celery is auto-configured.

Galaxy uses Celery for background tasks. Many operations (tool execution, exports) return immediately with a task that must be awaited. The populator wait methods handle polling and timeout logic. ApiTestCase automatically configures Celery for testing.

API Test CI

CI Platform: GitHub Actions

Workflow: .github/workflows/api.yaml

Characteristics:

Fairly stable, rarely flaky
Split into chunks for parallelization
Uses PostgreSQL (not SQLite)
Runs on every PR

Failures usually indicate real issues with your changes.

API tests run on every pull request via GitHub Actions. The test suite is split into chunks that run in parallel for faster feedback. Unlike some other test suites, API tests are quite stable and failures typically indicate actual problems with the code under test.

Integration Tests Overview

Location: test/integration/

When to use instead of API tests:

Need custom Galaxy configuration
Need direct database access
Need Galaxy app internals (self._app)

Trade-off: Each test class spins up its own Galaxy server (slower)

Run: ./run_tests.sh -integration

Integration tests have more power than API tests - they control Galaxy’s configuration and access internals. However, this comes at a cost: each test case spins up its own Galaxy server, making them slower. API tests are generally preferred; use integration tests only when an API test won’t work.

Example: test_quota.py

from galaxy_test.base.populators import DatasetPopulator
from galaxy_test.driver import integration_util

class TestQuotaIntegration(integration_util.IntegrationTestCase):
    dataset_populator: DatasetPopulator
    require_admin_user = True

    @classmethod
    def handle_galaxy_config_kwds(cls, config):
        super().handle_galaxy_config_kwds(config)
        config["enable_quotas"] = True

    def setUp(self):
        super().setUp()
        self.dataset_populator = DatasetPopulator(self.galaxy_interactor)

    def test_create(self):
        # ... test quota API

This example shows the key integration test patterns: inherit from IntegrationTestCase, override handle_galaxy_config_kwds to modify configuration, and use class attributes like require_admin_user. The test has full access to populators and HTTP methods just like API tests.

Class Attributes

class TestMyFeature(integration_util.IntegrationTestCase):
    # Require the default API user to be an admin
    require_admin_user = True

    # Include Galaxy's sample tools and datatypes
    framework_tool_and_types = True

Attribute	Default	Purpose
`require_admin_user`	False	API user must be admin
`framework_tool_and_types`	False	Include sample tools/datatypes

Class attributes control test setup. require_admin_user ensures the test user has admin privileges. framework_tool_and_types loads the sample tools and datatypes used by framework tests, useful when your integration test needs to run tools.

Direct Config Options

The simplest pattern - set config values directly:

@classmethod
def handle_galaxy_config_kwds(cls, config):
    super().handle_galaxy_config_kwds(config)
    config["enable_quotas"] = True
    config["metadata_strategy"] = "extended"
    config["allow_path_paste"] = True
    config["ftp_upload_dir"] = "/tmp/ftp"

The config dict corresponds to galaxy.yml options.

The handle_galaxy_config_kwds class method receives the Galaxy configuration dictionary before the server starts. You can set any option that would normally go in galaxy.yml. Always call super().handle_galaxy_config_kwds(config) first to preserve base configuration.

External Config Files

For complex configs (job runners, object stores), use external files:

import os
SCRIPT_DIRECTORY = os.path.dirname(__file__)
JOB_CONFIG_FILE = os.path.join(SCRIPT_DIRECTORY, "my_job_conf.yml")

class TestCustomRunner(integration_util.IntegrationTestCase):
    @classmethod
    def handle_galaxy_config_kwds(cls, config):
        super().handle_galaxy_config_kwds(config)
        config["job_config_file"] = JOB_CONFIG_FILE

Common: job_config_file, object_store_config_file, file_sources_config_file

Complex configurations like job runners and object stores are easier to maintain as separate YAML or XML files. Point to them with the appropriate config option. Store these config files alongside your test file in the test/integration/ directory.

Dynamic Config Templates

For configs needing runtime values (temp dirs, ports):

import string

OBJECT_STORE_TEMPLATE = string.Template("""
<object_store type="disk">
    <files_dir path="${temp_directory}/files"/>
</object_store>
""")

@classmethod
def handle_galaxy_config_kwds(cls, config):
    super().handle_galaxy_config_kwds(config)
    temp_dir = cls._test_driver.mkdtemp()
    config_content = OBJECT_STORE_TEMPLATE.safe_substitute(
        temp_directory=temp_dir
    )
    config_path = os.path.join(temp_dir, "object_store_conf.xml")
    with open(config_path, "w") as f:
        f.write(config_content)
    config["object_store_config_file"] = config_path

When your config needs runtime values like temporary directories or dynamically allocated ports, use string.Template to substitute values. Use cls._test_driver.mkdtemp() to get a managed temporary directory that’s cleaned up after tests.

Configuration Mixins

Mixin classes simplify common configuration patterns:

class TestWithObjectStore(
    integration_util.ConfiguresObjectStores,
    integration_util.IntegrationTestCase
):
    @classmethod
    def handle_galaxy_config_kwds(cls, config):
        cls._configure_object_store(STORE_TEMPLATE, config)

Mixin	Purpose
`ConfiguresObjectStores`	Object store setup
`ConfiguresDatabaseVault`	Encrypted secrets
`PosixFileSourceSetup`	File upload sources

Galaxy provides mixin classes for common configuration patterns. Use multiple inheritance to compose capabilities. These mixins encapsulate boilerplate configuration code, making tests more readable and reducing duplication.

Accessing Galaxy Internals

Integration tests can access Galaxy’s app object via self._app:

from galaxy.model import StoredWorkflow
from sqlalchemy import select

def test_workflow_storage(self):
    # Query database directly
    stmt = select(StoredWorkflow).order_by(StoredWorkflow.id.desc())
    workflow = self._app.model.session.execute(stmt).scalar_one()

    # Access application services
    table = self._app.tool_data_tables.get("all_fasta")

    # Get managed temp directory
    temp_dir = self._test_driver.mkdtemp()

Direct app access enables testing internal state not exposed via API. You can query the database with SQLAlchemy, access tool data tables, or inspect any Galaxy service. Use this sparingly - prefer API-based assertions when possible as they’re more representative of real usage.

Skip Decorators & External Services

from galaxy_test.driver import integration_util

@integration_util.skip_unless_docker()
def test_docker_feature(self):
    ...

@integration_util.skip_unless_kubernetes()
def test_k8s_feature(self):
    ...

Decorator	Skips Unless
`skip_unless_docker()`	Docker available
`skip_unless_kubernetes()`	kubectl configured
`skip_unless_postgres()`	Using PostgreSQL
`skip_unless_amqp()`	AMQP URL configured

Skip decorators allow tests to gracefully skip when required services aren’t available. CI provides PostgreSQL, RabbitMQ, Minikube, and Apptainer. Some tests start their own Docker containers in setUpClass and clean up in tearDownClass.

Integration Test CI

CI Platform: GitHub Actions

Workflow: .github/workflows/integration.yaml

CI provides:

PostgreSQL database
RabbitMQ message queue
Minikube (Kubernetes)
Apptainer/Singularity

Stability: Moderately prone to flaky failures

Integration tests run on every pull request via GitHub Actions. The CI environment provides external services like PostgreSQL and RabbitMQ. This suite is more prone to transient failures than API tests. If you see failures unrelated to your changes, request a re-run from the Galaxy committers.

Selenium Tests Overview

Location: lib/galaxy_test/selenium/

What they test:

Full-stack UI with real browsers
User workflows through the interface
Visual correctness and accessibility

Technologies:

Selenium WebDriver (browser automation)
Smart component system (navigation.yml)

Run: ./run_tests.sh -selenium

Selenium tests are full-stack tests that drive Galaxy’s UI with real browsers. They inherit all API test infrastructure, so you can use populators for setup while testing actual UI interactions. These tests are slower but catch issues that API tests miss.

API vs UI Methods

Use API methods for setup (faster, more reliable):

self.dataset_populator.new_dataset(self.history_id, content="data")

Use UI methods when testing the UI itself:

self.perform_upload(self.get_filename("1.sam"))

Scenario	Use	Method
Need dataset for other test	API	`dataset_populator.new_dataset()`
Testing upload form	UI	`perform_upload()`
Need workflow for test	API	`workflow_populator.run_workflow()`
Testing workflow editor	UI	`workflow_run_open_workflow()`

Selenium tests should use API methods for test setup and UI methods only for what you’re actually testing. This makes tests faster and avoids false failures from unrelated UI bugs. If you’re testing dataset details, create the dataset via API, not through the upload form.

Test Class Structure

from .framework import (
    managed_history,
    selenium_test,
    SeleniumTestCase,
)

class TestMyFeature(SeleniumTestCase):
    ensure_registered = True  # Auto-login before each test

    @selenium_test
    @managed_history
    def test_something(self):
        self.perform_upload(self.get_filename("1.sam"))
        self.history_panel_wait_for_hid_ok(1)

Attribute	Purpose
`ensure_registered`	Auto-login before each test
`run_as_admin`	Login as admin user instead

Selenium tests inherit from SeleniumTestCase which provides browser automation plus all API test infrastructure. Set ensure_registered = True to automatically login before each test. Use run_as_admin = True for tests that need admin privileges.

Test Decorators

@selenium_test
@managed_history
def test_upload(self):
    ...

Decorator	Purpose
`@selenium_test`	Required - handles retries, debug dumps, accessibility
`@managed_history`	Creates isolated history, auto-cleanup
`@selenium_only(reason)`	Skip if using Playwright backend
`@playwright_only(reason)`	Skip if using Selenium backend

The @selenium_test decorator is required for all Selenium tests. It handles automatic retries (via GALAXY_TEST_SELENIUM_RETRIES), debug dumps on failure, and baseline accessibility checks. Use @managed_history to get an isolated history per test.

Smart Component System

Access UI elements via self.components (defined in navigation.yml):

# Access nested components
editor = self.components.workflow_editor
save_button = editor.save_button

# SmartTarget methods wait and interact
save_button.wait_for_visible()
save_button.wait_for_and_click()
save_button.assert_disabled()

# Parameterized selectors
self.components.history_panel.item(hid=1).wait_for_visible()

The smart component system wraps UI selectors with driver-aware methods. Components are defined in client/src/utils/navigation/navigation.yml and accessed via self.components. This provides a clean API and automatic waiting, reducing test flakiness.

SmartTarget Methods

Method	Purpose
`wait_for_visible()`	Wait for visibility, return element
`wait_for_and_click()`	Wait then click
`wait_for_text()`	Wait, return `.text`
`wait_for_value()`	Wait, return input value
`wait_for_absent_or_hidden()`	Wait for element to disappear
`assert_absent_or_hidden()`	Fail if element visible
`assert_disabled()`	Verify disabled state
`all()`	Return list of all matching elements

SmartTarget methods handle the common pattern of waiting for an element and then interacting with it. They include built-in timeouts and retry logic, making tests more reliable than raw Selenium calls.

History & Workflow Operations

File uploads:

self.perform_upload(self.get_filename("1.sam"))
self.perform_upload(self.get_filename("1.sam"), ext="txt", genome="hg18")

History panel:

self.history_panel_wait_for_hid_ok(1)
self.history_panel_click_item_title(hid=1)
self.wait_for_history()

Workflows (via RunsWorkflows mixin):

self.workflow_run_open_workflow(WORKFLOW_YAML)
self.workflow_run_submit()
self.workflow_run_wait_for_ok(hid=2)

The test framework provides specialized methods for common Galaxy operations. File uploads, history panel interactions, and workflow execution all have dedicated helper methods that handle waiting and error checking.

Accessibility Testing

@selenium_test automatically runs axe-core accessibility checks.

Component-level checks:

login = self.components.login
login.form.assert_no_axe_violations_with_impact_of_at_least("moderate")

# With known violations excluded
EXCEPTIONS = ["heading-order", "label"]
self.components.history_panel._.assert_no_axe_violations_with_impact_of_at_least(
    "moderate", EXCEPTIONS
)

Impact levels: "minor", "moderate", "serious", "critical"

Galaxy’s Selenium tests include automatic accessibility testing using axe-core. The @selenium_test decorator runs baseline checks, and you can add component-level assertions with specific impact thresholds. This helps ensure Galaxy remains accessible to all users.

Shared State Tests

For expensive one-time setup, use SharedStateSeleniumTestCase:

class TestPublishedPages(SharedStateSeleniumTestCase):
    @selenium_test
    def test_index(self):
        self.navigate_to_pages()
        # ... test using shared state

    def setup_shared_state(self):
        # Called once before first test in class
        self.user1_email = self._get_random_email("test1")
        self.register(self.user1_email)
        self.new_public_page()
        self.logout_if_needed()

SharedStateSeleniumTestCase runs setup_shared_state() once per class rather than per test. Use this for expensive setup like creating multiple users or published resources. State persists across all test methods in the class.

Selenium Test CI

CI Platform: GitHub Actions

Workflow: .github/workflows/selenium.yaml

Features:

Split into 3 chunks for parallelization
Auto-retry on failure (GALAXY_TEST_SELENIUM_RETRIES=1)
Debug artifacts uploaded on failure
PostgreSQL backend

Stability: More prone to flaky failures than API tests

Selenium tests run on every pull request via GitHub Actions. The CI automatically retries failed tests once and uploads debug artifacts (screenshots, HTML dumps) on failure. These tests are more prone to transient failures; request a re-run if failures seem unrelated.

Playwright Tests

Same test files as Selenium (lib/galaxy_test/selenium/)

Why Playwright?

Faster execution
Better reliability
Modern browser automation

Run:

./run_tests.sh -playwright

Install browser:

playwright install chromium --with-deps

Playwright is a modern alternative to Selenium that uses the same test files. Tests work with both backends; use @selenium_only or @playwright_only decorators for backend-specific tests. Playwright tends to be faster and more reliable.

Playwright CI

CI Platform: GitHub Actions

Workflow: .github/workflows/playwright.yaml

Differences from Selenium CI:

Installs Playwright via playwright install chromium
Uses headless mode (GALAXY_TEST_SELENIUM_HEADLESS=1)
Same test splitting (3 chunks)

Both Selenium and Playwright CI run on every PR.

Playwright tests run in a separate CI workflow from Selenium tests. This provides redundancy and helps catch browser-specific issues. Both workflows run on every pull request, testing the same code with different browser automation backends.

Selenium Integration Tests

Location: test/integration_selenium/

Combines:

Selenium browser automation
Integration test config hooks (handle_galaxy_config_kwds)

When to use:

UI testing that needs custom Galaxy configuration
Testing UI features behind config flags

Run: ./run_tests.sh -integration test/integration_selenium

Selenium Integration tests combine browser automation with integration test capabilities. Use them when you need to test UI behavior with custom Galaxy configuration - for example, testing an admin feature that requires specific config options enabled.

Selenium Integration CI

Example: test/integration_selenium/test_upload_ftp.py

Tests FTP upload UI
Requires ftp_upload_dir configuration

CI Platform: GitHub Actions

Workflow: .github/workflows/integration_selenium.yaml

Runs on every PR, similar stability to regular Selenium tests.

Selenium Integration tests have their own CI workflow. Like regular Selenium tests, they can be flaky due to browser timing issues. The CI uploads debug artifacts on failure to help diagnose issues.

Handling Flaky Tests

Some tests fail intermittently due to:

Race conditions
Timing issues
External dependencies

Galaxy’s approach:

Track via GitHub issues with transient-test-error label
Mark tests with @transient_failure decorator
Modified error messages help reviewers identify non-blocking failures

Flaky tests are tests that sometimes pass and sometimes fail without code changes. Galaxy provides infrastructure to track and manage these tests so they don’t block legitimate pull requests while still maintaining visibility into the underlying issues.

@transient_failure Decorator

from galaxy.util.unittest_utils import transient_failure

@transient_failure(issue=21224)
@selenium_test
def test_sharing_private_history(self):
    # Test that sometimes fails due to race condition
    ...

Parameters:

issue - GitHub issue number tracking this failure
potentially_fixed=True - Indicates fix was implemented

When test fails, error message includes issue link and tracking info.

The @transient_failure decorator wraps test failures with additional context linking to a tracking issue. This helps CI reviewers quickly identify known flaky tests vs. real failures.

Flaky Test Workflow

Identify - Test fails intermittently in CI
Track - Create GitHub issue with transient-test-error label
Mark - Add @transient_failure(issue=XXXXX) decorator
Fix - When implemented, set potentially_fixed=True

@transient_failure(issue=21242, potentially_fixed=True)
def test_delete_job_with_message(self, history_id):
    ...

Close - If no failures for ~1 month, remove decorator and close issue

The workflow ensures flaky tests are tracked and eventually fixed rather than being ignored. The potentially_fixed flag triggers reviewers to report any subsequent failures, helping determine if fixes are effective.

Running Tests Reference

Quick reference:

./run_tests.sh --help          # Full documentation

# Common patterns
./run_tests.sh -unit           # Python unit tests
./run_tests.sh -api            # API tests
./run_tests.sh -integration    # Integration tests
./run_tests.sh -selenium       # Selenium tests
./run_tests.sh -framework      # Tool framework tests

Client tests:

make client-test               # All client tests
yarn --cwd client test:watch   # Watch mode

The run_tests.sh script is the primary entry point for running Galaxy’s test suites. Use --help for complete documentation including all flags, environment variables, and test selection options.

Key Takeaways

Unit tests for isolated components (no server needed)
API tests for backend behavior via Galaxy API
Integration tests for custom Galaxy configurations
Framework tests for tool/workflow XML validation
Selenium/Playwright tests for UI with browser automation
Populators simplify test data creation
Each test type has a dedicated CI workflow