STRETCH_TOOLSHED

Stretch plan: Tool Shed and adjacent improvements that would help find-shed-tool

These are not blockers for the CLI/skill plan in PLAN_SEARCH_CLI.md — each is a multi-week-or-longer effort against a separate codebase (Galaxy / Tool Shed) or against the search package’s still-evolving surface. Each entry: what, why it helps a discovery skill, rough effort, blast radius.

Source for diagnoses: vault/research/Component - Tool Shed Search and Indexing.md.


A. Tool Shed–side fixes (Galaxy monorepo, lib/tool_shed/)

A1. Carry version and changeset_revision on tool-search hits

tool_search.py:74-88 builds the per-hit dict from the Whoosh document but drops version (already in the schema) and never indexes changeset _revision. Result: every search hit needs a follow-up TRS call to become installable. The search package already declares both fields as ?optional with a comment pointing at this patch (packages/search/src/models/toolshed-search.ts:27-34).

A2. Index EDAM operations and topics

Tool XML <edam_operations> / <edam_topics> are parsed by Galaxy’s metadata pipeline (already in RepositoryMetadata.metadata["tools"]) but never put in the Whoosh tool_schema (tool_search.py:21-31). EDAM fields are indexed by Galaxy’s internal toolbox search; only the Tool Shed’s is missing them.

A3. Auto-reindex on upload

upload_tar_and_set_metadata writes RepositoryMetadata rows (managers/repositories.py:741-803) but does not update the Whoosh indexes. Freshness depends on cron / a manual PUT /api/tools/build_search_ index. Documented in research §6 as a primary rough edge.

A4. Implement GET /api/ga4gh/trs/v2/tools (currently stubbed [])

api2/tools.py:116-121 returns an empty list with a TODO. With it implemented, agents can enumerate the catalog (paginated, filtered) using a spec-conforming protocol — useful for offline indexes and for tools that cross-reference multiple TRS-compliant registries (Dockstore, BioContainers).

A5. Fix the dead approved boost (or remove it)

RepoWeighting.final doubles BM25 when approved == "yes" (repo_search.py:66), but approved is always written as "no" (shed_index.py:161). The boost is dead code today.

A6. Lowercase categories consistently for filter parity

The repo index lowercases category names at write time (shed_index.py:101-105) but the reserved category:'Climate Analysis' filter passes the user’s casing through verbatim (repo_search.py:187). Silent zero-result misses.

A7. Stop deduping TRS versions by version string

get_repository_metadata_by_tool_version collapses duplicate version strings across changesets (managers/trs.py:96, versions[v] = metadata overwrite). Two changesets can publish the same XML version with different content; only the last wins.


B. New surfaces that would unblock skill quality without changing the Tool Shed

B1. Local mirror index built from the TRS catalog

If A4 (GET /tools) lands, build a local Whoosh-or-MeiliSearch-or-Tantivy index from the catalog snapshot. Skill queries hit the local index (milliseconds, fully filterable, EDAM enabled) and only fall through to the live Tool Shed for installation-time confirmation.

Falls back gracefully: if the local index is missing or stale, gxwf search calls the live Tool Shed (current Stage-1 behaviour).

B2. Curated tool catalog with EDAM and signal — independent of Tool Shed

Skip the Tool Shed for discovery entirely; consume an upstream curated list (the Galaxy training material EDAM annotations, the bio.tools registry, BioContainers index, IUC’s own tools-iuc tree). Cross-reference to Tool Shed only when a candidate is selected.

The wildcard-wrapped BM25 ordering is coarse. After Stage 1 ships, run a small embedding model (locally or via the Anthropic API) over hit name + description and the user’s query, and rerank.

B4. Galaxy-instance-installed-tools cross-check

Optional, off by default. Given a Galaxy URL, mark each search hit with whether the exact (tool_id, version) is installed there. Useful when the user has a target instance and wants to know what’s already available without leaving the discovery flow.

B5. Skill memory of past picks

When the skill picks iuc/fastqc over devteam/fastqc for a given user or project, persist that choice to the agent’s memory system so future ambiguous queries default the same way without re-prompting.

B6. Workflow-aware search ranking

When the skill is invoked inside a workflow-authoring flow (the parent skill knows the upstream/downstream tools), bias ranking toward tools whose declared input/output datatypes are compatible with the workflow’s existing data flow.


Suggested ordering across all of A and B

If forced to pick a near-term sequence:

  1. A1 (carry version on hits) — small change, removes a per-hit round-trip from the skill.
  2. A6 (lowercase category filter) — trivial, fixes silent miss.
  3. B3 (embedding reranker) — improves quality without waiting on upstream.
  4. A2 (EDAM in shed indexes) — unlocks semantic recall server-side.
  5. A3 (auto-reindex on upload) — freshness; biggest user-visible improvement of the cron-driven status quo.
  6. A4 (TRS GET /tools) — unlocks B1 / B2.
  7. B1 / B2 (local or curated index) — only worth the effort once A4 exists, or commit to scraping the SQL backend directly.
  8. Everything else as opportunity arises.

Unresolved questions