# PR #21335: Implement GA4GH WES API

## Summary

Adds a GA4GH Workflow Execution Service (WES) v1.0.0 API to Galaxy so external clients can submit, monitor, cancel, and list Galaxy workflow runs over the standard WES wire protocol. WES "runs" map onto Galaxy `WorkflowInvocation`s and WES "tasks" map onto invocation steps/jobs. Pydantic models are code-generated from the GA4GH WES OpenAPI spec; a thin FastAPI CBV router delegates to a large service layer that loads workflows (URL, attachment, or `gxworkflow://` DB reference), creates a history, invokes the workflow, maps state, and decorates outputs with DRS URIs. The PR also factors shared GA4GH service-info construction out of DRS into a reusable utility, adds an opaque-token keyset-pagination helper, and adds plain-text job stdout/stderr endpoints that WES task logs link to.

Merged 2026-01-27 (author jmchilton, merged by mvdbeek). Labels: kind/enhancement, area/API, area/workflows. No parent issue referenced in the body; follow-up `4afff175ea` fixes issue #22347.

## Changes

### WES API router (new)

`lib/galaxy/webapps/galaxy/api/wes.py` — FastAPI CBV class `WesApi`, `router = Router(tags=["wes"])`. Auto-registered via `include_all_package_routers`. **8 route decorators** (the PR body's "6 endpoints" counts the 2 task-log routes separately):

- `GET  /ga4gh/wes/v1/service-info` (public) — `service_info`
- `POST /ga4gh/wes/v1/runs` (multipart/form-data) — `submit_run`
- `GET  /ga4gh/wes/v1/runs` — `list_runs`
- `GET  /ga4gh/wes/v1/runs/{run_id}` — `get_run`
- `GET  /ga4gh/wes/v1/runs/{run_id}/status` — `get_run_status`
- `POST /ga4gh/wes/v1/runs/{run_id}/cancel` — `cancel_run`
- `GET  /ga4gh/wes/v1/runs/{run_id}/tasks` — `get_run_tasks`
- `GET  /ga4gh/wes/v1/runs/{run_id}/tasks/{task_id}` — `get_run_task`
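
A client-side sketch of how a submission to `POST /ga4gh/wes/v1/runs` could be assembled. The form field names come from the WES spec, which sends `workflow_params` and `workflow_engine_parameters` as JSON-encoded strings. The helper names (`build_run_request_fields`, `run_url`) and the `workflow_type_version` value shown in the usage note are illustrative assumptions, not taken from the PR.

```python
import json
from typing import Any, Dict, Optional

WES_BASE_PATH = "/ga4gh/wes/v1"


def build_run_request_fields(
    workflow_url: str,
    workflow_type: str,
    workflow_type_version: str,
    workflow_params: Optional[Dict[str, Any]] = None,
    engine_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Build multipart form fields for a WES run submission (hypothetical helper)."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES carries these two objects as JSON strings inside the form.
        "workflow_params": json.dumps(workflow_params or {}),
        "workflow_engine_parameters": json.dumps(engine_parameters or {}),
    }


def run_url(base_url: str, run_id: str, suffix: str = "") -> str:
    """Join a Galaxy base URL with a run path such as /status or /cancel."""
    return f"{base_url.rstrip('/')}{WES_BASE_PATH}/runs/{run_id}{suffix}"
```

For example, `build_run_request_fields("gxworkflow://abc123", "gx_workflow_ga", "v1", engine_parameters={"history_id": "h1"})` yields a dict that an HTTP client can post as form data; the id `abc123`, the version string `"v1"`, and the history id are placeholders.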

A WES `run_id` is an encoded Galaxy invocation ID (decoded via `DecodedDatabaseIdField`). A `task_id` is a string `order_index` or `order_index.job_index` (e.g. `"0"`, `"2.5"`).
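
The `task_id` format can be illustrated with a small parser. This is a minimal sketch under the format stated above; `TaskRef` and `parse_task_id` are hypothetical names, and the service's actual validation and error type may differ.

```python
from typing import NamedTuple, Optional


class TaskRef(NamedTuple):
    order_index: int
    job_index: Optional[int]  # None when the id names a whole step


def parse_task_id(task_id: str) -> TaskRef:
    """Split 'order_index' or 'order_index.job_index' into integers."""
    parts = task_id.split(".")
    valid = len(parts) in (1, 2) and all(p.isascii() and p.isdigit() for p in parts)
    if not valid:
        raise ValueError(f"Invalid task_id: {task_id!r}")
    order_index = int(parts[0])
    job_index = int(parts[1]) if len(parts) == 2 else None
    return TaskRef(order_index, job_index)
```

With this sketch, `"0"` parses to step 0 with no job index and `"2.5"` parses to step 2, job 5; inputs such as `"2."` or `"a.b"` raise `ValueError`.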

### WES service layer (new)

`lib/galaxy/webapps/galaxy/services/wes.py` (~1170 lines) — `WesService(ServiceBase)` depends on `WorkflowsService`, app config, `IdEncodingHelper`, and instantiates `KeysetPagination`.

- **submit_run pipeline** — rejects anonymous + checks user activation; loads workflow content (`gxworkflow://` DB ref, or creates a workflow from raw description); parses engine params; gets/creates a history; builds `InvokeWorkflowPayload`; calls `WorkflowsService.invoke_workflow`; returns `RunId`. Batch invocations are rejected.
- **`gxworkflow://` parsing** — `_parse_gxworkflow_uri`, format `gxworkflow://<encoded_id>[?instance=true|false]`. Default `instance=False` loads the `StoredWorkflow`; `instance=True` loads a `Workflow` instance.
- **Workflow type detection** — `_determine_workflow_type`: `class == "GalaxyWorkflow"` → `gx_workflow_format2`; presence of `steps`/`workflow` key → `gx_workflow_ga`; else `MessageException`.
- **History** — `_get_or_create_history` reuses an engine-param `history_id` or creates one (default name `"WES Run"`). Includes an explicit `sa_session.commit()` flagged "Postgres tests in CI fail without this commit."
- **Task listing** — a UNION query over single-job steps, collection-mapping jobs (expanded via `ImplicitCollectionJobsJobAssociation`), and no-job steps, with composite `(step_order, job_index)` keyset pagination (`TaskKeysetToken`).
- **State mapping** — `GALAXY_TO_WES_STATE` (`new`→QUEUED, `ready`→INITIALIZING, `scheduled`→RUNNING, `failed`→EXECUTOR_ERROR, `cancelled`→CANCELED, `cancelling`→CANCELING); unmapped → `UNKNOWN`. **No terminal/COMPLETE mapping** — a finished Galaxy invocation (`scheduled`) reports WES `RUNNING` (see Unresolved Questions).
- **RunLog** — `_invocation_to_run_log` decorates HDA outputs with DRS URIs (`drs://drs.{netloc}/hda-{drs-encoded-id}`), builds `task_logs_url`, leaves deprecated `task_logs=None`, and cannot recover the original `RunRequest` (always `None`).
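
Three of the steps above (URI parsing, workflow type detection, and state mapping) can be sketched with the standard library alone. The function names are hypothetical stand-ins for the private service methods, `ValueError` stands in for Galaxy's `MessageException`, and the handling of a repeated `instance` query parameter is an assumption. The state table is copied from the mapping listed above.

```python
from typing import Any, Dict, Tuple
from urllib.parse import parse_qs, urlsplit

GALAXY_TO_WES_STATE = {
    "new": "QUEUED",
    "ready": "INITIALIZING",
    "scheduled": "RUNNING",
    "failed": "EXECUTOR_ERROR",
    "cancelled": "CANCELED",
    "cancelling": "CANCELING",
}


def parse_gxworkflow_uri(uri: str) -> Tuple[str, bool]:
    """Return (encoded_id, instance) for gxworkflow://<encoded_id>[?instance=...]."""
    parts = urlsplit(uri)
    if parts.scheme != "gxworkflow" or not parts.netloc:
        raise ValueError(f"Not a gxworkflow URI: {uri!r}")
    # Default is instance=False, i.e. load the StoredWorkflow.
    values = parse_qs(parts.query).get("instance", ["false"])
    return parts.netloc, values[-1].lower() == "true"


def determine_workflow_type(workflow: Dict[str, Any]) -> str:
    """Classify a parsed workflow description by its top-level keys."""
    # Check the class marker first: the key-based test below is broader.
    if workflow.get("class") == "GalaxyWorkflow":
        return "gx_workflow_format2"
    if "steps" in workflow or "workflow" in workflow:
        return "gx_workflow_ga"
    raise ValueError("Unrecognized workflow description")


def to_wes_state(galaxy_state: str) -> str:
    """Map a Galaxy invocation state to a WES state; unmapped states are UNKNOWN."""
    return GALAXY_TO_WES_STATE.get(galaxy_state, "UNKNOWN")
```

Because the table has no entry that produces `COMPLETE`, `to_wes_state("scheduled")` returns `"RUNNING"` whether or not the invocation's jobs have finished, which is the behavior flagged under Unresolved Questions.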

### Generated Pydantic models (new)

`lib/galaxy/schema/wes/__init__.py` is generated by `datamodel-codegen` (pydantic v2) from the GA4GH WES OpenAPI spec (develop branch); 18 classes (`State` enum, `RunRequest`, `RunLog`, `TaskLog`, `ServiceInfo`, etc.). `lib/galaxy/schema/wes/gen.sh` regenerates with **no Galaxy-specific post-processing**, so hand edits are lost on regen, and upstream is a moving target.

### Shared GA4GH utility + DRS refactor

`lib/galaxy/webapps/galaxy/services/ga4gh.py` (new) — `build_service_info(config, request_url, artifact, service_name, service_description, artifact_version)` returns a `galaxy.schema.drs.Service`, computing the org id from the reversed hostname and honoring `ga4gh_service_id` / `organization_*` / `ga4gh_service_environment` config. `api/drs.py` (`DrsApi.service_info`) drops ~30 lines of inline org/service building and calls the shared helper (`artifact="drs"`, version `1.2.0`). WES is the second consumer.
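
The reversed-hostname derivation is easy to show in isolation. The sketch below is not the `build_service_info` implementation: `reversed_hostname` and `sketch_service_identity` are hypothetical names, the `<org id>.<artifact>` default id scheme is an assumption, and dropping the port (via `urlsplit(...).hostname`) is an assumption as well. The `type` object with `group`, `artifact`, and `version` follows the GA4GH service-info schema.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def reversed_hostname(request_url: str) -> str:
    """Turn https://usegalaxy.org/... into 'org.usegalaxy'."""
    hostname = urlsplit(request_url).hostname  # lowercased, port stripped
    if not hostname:
        raise ValueError(f"No hostname in URL: {request_url!r}")
    return ".".join(reversed(hostname.split(".")))


def sketch_service_identity(
    request_url: str,
    artifact: str,
    artifact_version: str,
    configured_service_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the id/type portion of a service-info payload (illustrative only)."""
    org_id = reversed_hostname(request_url)
    return {
        # Assumed default: a configured id wins, else derive one from the host.
        "id": configured_service_id or f"{org_id}.{artifact}",
        "type": {"group": "org.ga4gh", "artifact": artifact, "version": artifact_version},
    }
```

Keeping this logic in one helper means DRS and WES differ only in the `artifact` and version arguments they pass.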

### Job stdout/stderr endpoints (new)

`lib/galaxy/webapps/galaxy/api/jobs.py` — two `@router.get` handlers (`get_job_stdout` / `get_job_stderr`), `response_class=PlainTextResponse`, returning `job.stdout or ""` / `job.stderr or ""`. WES `TaskLog` stdout/stderr fields link to `/api/jobs/{job_id}/stdout` and `/stderr`.

### Keyset pagination helper (new)

`lib/galaxy/model/keyset_token_pagination.py` — `KeysetToken` Protocol, `SingleKeysetToken` dataclass, and `KeysetPagination` with `encode_token` (base64 of JSON values) / `decode_token` (raises `MessageException` on bad tokens). WES task pagination supplies its own `TaskKeysetToken` implementing the same protocol.
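
A minimal sketch of opaque-token keyset pagination over a composite `(step_order, job_index)` key, using an in-memory list in place of the SQL query. The method names `to_values` / `from_values`, the URL-safe base64 variant, and the use of `ValueError` instead of `MessageException` are assumptions made for illustration; the real protocol's method names may differ.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TaskToken:
    step_order: int
    job_index: int

    def to_values(self) -> List[int]:
        return [self.step_order, self.job_index]

    @classmethod
    def from_values(cls, values: List[int]) -> "TaskToken":
        step_order, job_index = values
        return cls(int(step_order), int(job_index))


def encode_token(token: TaskToken) -> str:
    """Serialize the key values as JSON, then base64 so clients treat it as opaque."""
    raw = json.dumps(token.to_values()).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(encoded: str) -> TaskToken:
    """Reverse encode_token; any malformed input becomes a single error type."""
    try:
        values = json.loads(base64.urlsafe_b64decode(encoded.encode("ascii")))
        return TaskToken.from_values(values)
    except (ValueError, TypeError) as exc:
        raise ValueError("Invalid page token") from exc


def page_after(
    rows: List[Tuple[int, int]], token: Optional[str], limit: int
) -> Tuple[List[Tuple[int, int]], Optional[str]]:
    """Return up to `limit` rows strictly after the token, plus the next token."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(rows)
    if token is not None:
        last = decode_token(token)
        # Tuple comparison gives the same ordering as a composite SQL key.
        ordered = [r for r in ordered if r > (last.step_order, last.job_index)]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_token = encode_token(TaskToken(*page[-1])) if has_more else None
    return page, next_token
```

Unlike offset pagination, the token records the last key seen rather than a row count, so rows inserted or removed before the cursor do not shift later pages.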

### Request URL abstraction

`lib/galaxy/work/context.py` adds an abstract `url` property to `GalaxyAbstractRequest` (returns a starlette `URL`); `api/__init__.py` implements `GalaxyASGIRequest.url`. Needed so services can build absolute service-info URLs.

### Docs

`doc/source/admin/ga4gh.md` (new, +189) added to `doc/source/admin/index.rst` — covers DRS + WES admin/config with a `service-info` curl example.

## Changes since PR

Recent PR; minimal drift (verified `ad39d779..origin/dev`):

- `4afff175ea` (mvdbeek, 2026-04-01, fixes #22347) — "Require authentication for WES list_runs endpoint." Anonymous requests crashed with `AttributeError` on `trans.user.id`; adds an `AuthenticationRequired` guard at the top of `list_runs`.
- `afdbddf0ca` (mvdbeek, 2026-01-27, merge day) — "Fix newly failing WES test." A cancel test now waits via `wait_for_invocation_and_completion(...)` and asserts invocation state `"completed"` instead of `"scheduled"`.
- No follow-ups to `api/wes.py`, `services/ga4gh.py`, `keyset_token_pagination.py`, `schema/wes/__init__.py`, or `api/drs.py` — identical to PR-era code at HEAD.
- `api/jobs.py` and `work/context.py` were later touched by unrelated commits; this PR's stdout/stderr endpoints and `url` property are intact.

No files touched by this PR have moved or been renamed since merge.

## Tests

- `lib/galaxy_test/api/test_wes.py` (new, +929) — 27 test functions: service-info, run submission via URL / attachment / `gxworkflow://`, status, cancel, list, task listing + detail, pagination.
- `test/unit/model/test_keyset_token_pagination.py` (new, +92) — unit coverage for the pagination helper; unchanged since PR.
- `lib/galaxy_test/base/populators.py` (+5/-2) — adds `new_dataset_from_test_data(...)` (refactors `new_bam_dataset` onto it) and widens `wait_for_invocation_and_jobs` `workflow_id` to `Optional[str]`.
- `lib/galaxy_test/api/test_workflows.py` (+11/-5) — hoists `_show_workflow` to `BaseWorkflowsApiTestCase` and adds a `_latest_instance_id(...)` helper.

## Notes

PR body cross-checks against live code (all verified unless noted):

- "6 WES API endpoints" — the router actually defines **8** routes (6 core + 2 task-log); the body splits the count across two feature lists.
- The body's "Files Added" list is **incomplete** — it omits `keyset_token_pagination.py`, `schema/wes/gen.sh`, and both new test files.
- `gxworkflow://` scheme, endpoint paths, `/api/jobs/{id}/stdout|stderr`, `gx_workflow_ga` + `gx_workflow_format2`, codegen models, CBV router, and shared GA4GH utility — all verified.

WES is the second GA4GH surface covered in this vault; the first is the partial TRS implementation noted in [[Component - Tool Shed Search and Indexing]]. DRS predates WES in the codebase (this PR refactors `api/drs.py`) but has no dedicated vault note, so this note carries the GA4GH/DRS-URI context for now. WES sits on top of the same invocation machinery as [[Component - Workflow API]] and reuses the import/normalize path from [[Component - Workflow Import]]; its async submit-then-poll shape parallels [[PR 20935 - Tool Request API]].

## Unresolved Questions

- No WES `COMPLETE` mapping — a finished Galaxy invocation reports `RUNNING`. Is the state contract intentional, or is terminal/job-failure mapping deferred? Load-bearing for any client polling for completion.
- `service-info` ships `auth_instructions_url` hardcoded to `"TODO"`.
- `_get_or_create_history` falls back to treating a decode failure as an already-decoded raw id — does this let a caller bypass id encoding?
- Output asymmetry: HDA outputs get DRS URIs, but collection (HDCA) outputs get only encoded ids. Intentional?
- Anonymous handling was inconsistent at merge (`submit_run` rejected anon, `list_runs` did not until #22347). Are `get_run` / `_get_invocation` (which read `trans.user.id`) anon-safe?
- `gen.sh` has no post-gen patching; regenerating against the upstream WES `develop` branch could silently change the shipped models.
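
To make the first question concrete, here is a sketch of the kind of polling loop a generic WES client might run. The terminal-state set comes from the WES `State` enum; `wait_for_run` and its parameters are hypothetical, and the state source is injected as a callable so the sketch does not depend on any HTTP library.

```python
import time
from typing import Callable

# WES states after which a run will not change again.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(
    get_state: Callable[[], str],
    max_polls: int = 10,
    interval: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal state or the poll budget runs out."""
    state = "UNKNOWN"
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval)
    raise TimeoutError(f"Run still {state} after {max_polls} polls")
```

If a finished invocation keeps reporting `RUNNING`, a loop like this only exits through the timeout, so clients would need a Galaxy-specific completion check until the mapping is settled.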