TOOL_SOURCE_STORAGE_PLAN

Tool Source Storage

Date: 2026-04-28
Repo: galaxy-tool-util-ts
Depends on: MERGED_CACHE_UI_PLAN.md (phases 1-5 landed). Adds the storage backing for the tool_source slot reserved in §2.1.


1. Goal

Make GET /api/tools/{tool_id}/versions/{tool_version}/tool_source return the raw tool wrapper XML (or YAML / CWL when applicable) instead of 501. The slot, the openapi route, and the disabled UI tab already exist — this plan fills in the storage / fetch path.

Driver: an external project consumes raw tool source. Storage policy choices should optimize for cheap, accurate raw bytes for that consumer over UI ergonomics.

2. Non-goals

3. What “tool source” means here

For Galaxy-style tools, “tool source” is the wrapper XML — the file containing <tool id=...>. Most tools also reference one or more sibling files (macros, requirements XML, test data layout). The contract:

Mapping at the route:

4. Where source comes from

ToolInfoService.getToolInfo already loads tools from configured sources to produce ParsedTool. The same fetcher knows the bytes — we just don’t currently keep them. Two upstream shapes today:

Decision needed: for type: toolshed, do we (a) mirror Galaxy’s raw_tool_source semantics by fetching from a configured Galaxy server, or (b) fetch raw bytes directly from the ToolShed (the source of truth, but a different code path)? Most likely (b): serving the ToolShed is what tool-cache-proxy is for, so there is no need to require a Galaxy server alongside it.

5. Storage policy — three options

Option A — Eager + write-through. getToolInfo fetches and stores raw source whenever it parses. Source available immediately for any cached tool. Cost: every fetch pays the bandwidth + ~12 KB extra disk.

Option B — Lazy fetch on demand. Cache stays lean; the tool_source handler fetches from upstream on each request and returns the bytes directly. No disk cost, but every call pays upstream latency, including repeated calls for the same tool.

Option C — Lazy fetch + write-through. The first request for a tool fetches from upstream and stores the bytes; subsequent requests are served from disk. Best for the actual consumer (read-heavy, repeat requests). Cost is amortized over use, not paid upfront.

Recommendation: C. It is strictly better than B for any tool the consumer touches twice, and unlike A it pays nothing for tools the consumer never views. Implementation cost over B is one extra cache.saveToolSource(...) call.

Decision needed: confirm C, or pick differently if the external consumer’s access pattern argues otherwise.
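
A minimal sketch of Option C, to make the control flow concrete. loadToolSource and saveToolSource are the cache methods named in the rollout; their signatures here, the ToolSourceCache interface, the FetchUpstream type, the key format, and the in-memory backend are all assumptions for illustration. The refresh parameter anticipates the open ?refresh=true question in §6 and is likewise hypothetical.

```typescript
// Hypothetical shapes: the real cache and fetcher interfaces may differ.
interface ToolSourceCache {
  loadToolSource(key: string): Promise<Uint8Array | undefined>;
  saveToolSource(key: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical upstream fetcher (ToolShed or Galaxy, depending on source type).
type FetchUpstream = (toolId: string, toolVersion: string) => Promise<Uint8Array>;

// Option C: serve from cache when present, otherwise fetch upstream and store.
async function getToolSourceLazy(
  cache: ToolSourceCache,
  fetchUpstream: FetchUpstream,
  toolId: string,
  toolVersion: string,
  refresh = false,
): Promise<Uint8Array> {
  // Assumed key format; the real cache may key on something else.
  const key = `${toolId}/${toolVersion}`;
  if (!refresh) {
    const cached = await cache.loadToolSource(key);
    if (cached !== undefined) return cached;
  }
  const bytes = await fetchUpstream(toolId, toolVersion);
  // The one extra call that distinguishes Option C from Option B.
  await cache.saveToolSource(key, bytes);
  return bytes;
}

// In-memory stand-in for the filesystem backend, for demonstration only.
class MemoryToolSourceCache implements ToolSourceCache {
  private readonly store = new Map<string, Uint8Array>();

  async loadToolSource(key: string): Promise<Uint8Array | undefined> {
    return this.store.get(key);
  }

  async saveToolSource(key: string, bytes: Uint8Array): Promise<void> {
    this.store.set(key, bytes);
  }
}
```

With this shape, two consecutive calls for the same tool and version reach upstream once, and a call with refresh set to true reaches it again and overwrites the stored bytes.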

6. Cache layer changes

In packages/core/src/cache/:

In packages/core/src/cache-http/handlers.ts:

Open: does getToolSource need a ?refresh=true flag to bypass cache and re-fetch upstream? Probably yes — mirrors the existing refetch pattern for parsed tools.

7. Adapter changes (gxwf-web + tool-cache-proxy)

dispatchCacheRoute currently returns unknown and the adapter calls json(res, 200, result). Tool source returns bytes, not JSON — needs a different return shape:

type CacheResult =
  | { kind: "json"; body: unknown }
  | { kind: "bytes"; body: Uint8Array; contentType: string; headers?: Record<string, string> };

dispatchCacheRoute returns CacheResult; adapters branch on kind. Trivial change in both adapters.
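
As a sketch of what branching on kind could look like in an adapter: the CacheResult type below is copied from above, while ResponseLike and writeCacheResult are hypothetical names introduced for illustration. ResponseLike mirrors the writeHead/end subset of Node's http.ServerResponse; the existing json(res, 200, result) helper would presumably take the place of the "json" branch.

```typescript
type CacheResult =
  | { kind: "json"; body: unknown }
  | { kind: "bytes"; body: Uint8Array; contentType: string; headers?: Record<string, string> };

// Hypothetical minimal response surface, shaped after Node's http.ServerResponse.
interface ResponseLike {
  writeHead(status: number, headers: Record<string, string>): unknown;
  end(body: string | Uint8Array): unknown;
}

// Hypothetical adapter helper: writes either JSON or raw bytes based on kind.
function writeCacheResult(res: ResponseLike, result: CacheResult): void {
  switch (result.kind) {
    case "json":
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify(result.body));
      return;
    case "bytes":
      // Extra headers first, so contentType always wins for content-type.
      res.writeHead(200, { ...result.headers, "content-type": result.contentType });
      res.end(result.body);
      return;
  }
}
```

Because CacheResult is a discriminated union, the compiler narrows result inside each case, so the "bytes" branch can read contentType and headers without casts.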

8. UI

9. OpenAPI

Both specs already declare the route. Update the 200 response in both:

Regenerate api-types.ts in both servers.

10. Tests

Unit (packages/core/test/):

Integration (each server’s router test):

11. Rollout

  1. Cache layer: loadToolSource / saveToolSource + filesystem backend. Tests.
  2. infoService.fetchToolSource for type: toolshed and type: galaxy. Tests.
  3. getToolSource handler + CacheResult shape change in dispatchCacheRoute. Adapter byte-writing path in both servers. Tests.
  4. OpenAPI updates + codegen.
  5. UI: enable the disabled tab + optional table badge.
  6. Changeset (minor on core, gxwf-web, tool-cache-proxy; patch on cache-ui).

12. Open questions