basic.py Parameter Hierarchy — Unified Diagnostic
lib/galaxy/tools/parameters/basic.py (3,191 lines, ~30 classes) is the
legacy ToolParameter hierarchy. It has been the central choke point for tool
parameters in Galaxy for over a decade. This note collects, in one place, the
structural reasons it has resisted decomposition. It is not a refactor plan —
it is the dossier that should accompany every plan that touches this file.
The file’s problems are not specific to any one downstream consumer. Workflow extraction, post-hoc state divergence, the new tool form, the workflow editor, the job display panel, and the YAML runtime all suffer for related but not identical reasons. This dossier tries to describe the file on its own terms; the consumer-specific symptom catalogues live in the cross-referenced notes.
Cross-references in this note point at file paths in the working tree at
lib/galaxy/.... Line numbers reference the state at the time of writing
(2026-05-21, branch refactor-data-options-builder).
Executive summary
-
Two products glued together. The file is both the parameter description authority (parses XML
input_source, ownsto_dict/get_initial_value/validate/get_optionsfor the form) and the value codec for arbitrary persistedJobParameterrows (to_json/to_python/value_to_basic/value_from_basicfor every consumer that re-reads a finished job). The two roles have incompatible contracts. Every method ends up serving both at once. -
The polymorphic surface is a mirage. ~30
isinstance(..., XxxToolParameter)sites acrosslib/galaxy/reach past the base class to discriminate on concrete subclasses. Direct construction ofBooleanToolParameter(None, …),DataToolParameter(None, …, self.trans),IntegerToolParameter(None, …)happens inworkflow/modules.pyat:1005, 1021, 1159, 1173, 1225, 1289, 1356, 1366, 1388, 1419, 1451, 1459, 1474, 1483, 1493, 1502. The base class is a method bag. -
Pervasive type erasure. Base-class signatures are effectively
Any → Any.value_from_basic(:284–301) dispatches onMutableMapping["__class__"]sentinels (UnvalidatedValue,NoReplacement),is_runtime_value, falls back toto_python, optionally swallows exceptions viaignore_errors. There is no coherent state machine — every consumer re-discriminates the shape. -
The data parameters are god classes.
DataToolParameter(~500 lines,:2180–2683) andDataCollectionToolParameter(~250 lines,:2684–2933) bundle XML parsing, input coercion, validation, paginated SQL option building, workflow converter-safety analysis, dynamic-options filter machinery, UI default seeding, and DB-row-committing default-object materialization (raw_to_galaxyat:3048–3133, whereapp.model.session.commit()lives inside what reads as a value coercion path). -
The Pydantic side is a parallel reimplementation with zero shared code.
lib/galaxy/tool_util_models/parameters.pyandlib/galaxy/tools/runtime.pyre-define the same taxonomy in Pydantic and do not importbasic.py(confirmed via grep — no hits).basic.pydoes not import them either. They are kept aligned by convention. -
A
HistoryItemRefADT is trying to escape from the helpers at the bottom.src_id_to_item(:2133),src_id_to_item_collection(:2171),history_item_dict_to_python(:3159),history_item_to_json(:3174),raw_to_galaxy(:3048) do the polymorphic dispatch over{HDA, HDCA, DCE, LDDA, CollectionAdapter}that the class hierarchy refuses to model. They are called from four parameter classes and fromTool.to_json. The src whitelist is duplicated at:3162and:2151. -
Inheritance leaks, MRO hazards, and Liskov violations are out in the open. The file contains in-source confessions —
# why does Integer and Float subclass this :_((:429),# skip SelectToolParameter (the immediate parent) bc we need to get options in a different way here(:1276, 1854duplicate),As with all hidden parameters, this is a HACK.(:2937).DrillDownSelectToolParameterreturns an incompatible list type fromget_options, and the supertype annotation was widened to absorb the violation (:1009).
1. The two roles, in detail
Forward direction — parameter description authority
ToolParameter.__init__ parses XML input_source (or the YAML-derived adapter
view). The forward methods (get_options, get_initial_value, to_dict,
get_legal_values, validate against fresh form submissions) consume the spec
and a live trans/history context and produce form-renderable JSON or a
validated incoming value. The relevant consumers are the tool form
(Tool.to_json in tools/__init__.py), the workflow editor
(managers/workflows.py, workflow/modules.py), and live execution
(tools/parameters/__init__.py:306 in check_param).
Backward direction — post-hoc value codec
The same classes are called to decode arbitrary JobParameter rows for any
tool, including YAML tools that the class hierarchy was never designed to
describe. BaseDataToolParameter.to_python (:2039–2070) carries a comment
labelling four branches as “Handle legacy string values potentially stored in
databases”:
- comma-separated id strings (
:2054–2059) "__collection_reduce__|<encoded_id>"(:2060–2064)"dce:N"/"hdca:N"(:2065–2068)- the canonical
{"values":[…]}envelope (the modern shape)
The same method body has to serve both directions because there is no separate
codec type. DataToolParameter.from_json (:2245–2401, ~150 lines) is the
clearest case: it dispatches over MutableMapping, list, str,
__collection_reduce__|, HDA/HDCA/DCE/LDDA, mutates
value_to_check.implicit_conversion (:2354) as a side effect, and includes
a 16-character-length heuristic at :2294–2298 to guess whether an id is
encoded (a 16-digit integer ≥ 1e15 would defeat it; the code logs a warning
then decodes anyway).
This is the deep structural sin. Every other problem on this page is downstream of it.
2. The polymorphic mirage
ToolParameter is polymorphic on paper. In practice ~30 callsites reach for
concrete subclasses via isinstance:
tools/__init__.py:168–179— imports 10 concrete classes, 7isinstancesites (:1788, 1891, 1958, 2543, 2557, 2580, 3261, 3273).tools/evaluation.py:62–66— 3 imports, 6isinstancesites (:418, 457, 492, 505, 518, 530, 659).tools/actions/__init__.py:64–67— 7isinstancesites.managers/workflows.py:91–94— 7isinstancesites.tools/wrappers.py:38–41— 4 imports.tools/parameters/wrapped.py:11–16— 4isinstancesites.workflow/modules.py:95–107— 9 imports, bothisinstanceand direct construction at:1005, 1021, 1159, 1173, 1225, 1289, 1356, 1366, 1388, 1419, 1451, 1459, 1474, 1483, 1493, 1502.tool_shed/tools/tool_validator.py:34—isinstanceagainstSelectToolParameterfor dynamic-options detection.
Direct construction with None as the tool argument
(DataToolParameter(None, data_src, self.trans) at workflow/modules.py:1159)
demonstrates that the class can be torn from its supposed XML-spec context —
which it can, because nothing in the data path uses the spec context
faithfully past initialization.
Exactly one subclass exists outside basic.py: ConditionalStepWhen
(workflow/modules.py:162), a BooleanToolParameter for workflow conditional
step semantics. Everything else lives in basic.py.
3. Inheritance leaks and MRO hazards
IntegerToolParameter(TextToolParameter), FloatToolParameter(TextToolParameter) (:465, 538)
In-source confession at :429: # why does Integer and Float subclass this :_(.
Both numeric classes carry datalist, area, optionality_inferred, and
wrapper_default attributes they have no semantic use for.
IntegerToolParameter.__init__ (:484) immediately re-validates the parent’s
self.value as int, contradicting the parent’s text contract. The
optionality_inferred flag at :433 only fires when self.type == "text",
exposing a runtime type discriminator inside an inheritance hierarchy that is
supposed to obviate it. FloatToolParameter.from_json (:578) is a
copy-paste of the integer version, including an error message that still says
“integer” (:588) when reporting failures on a float input.
BaseURLToolParameter(HiddenToolParameter) (:902)
URL building is conceptually unrelated to hiding; the inheritance exists
purely for self.hidden = True and to skip label rendering. Reuse-via-
inheritance for two booleans.
The Select hierarchy
GenomeBuildParameter, SelectTagParameter, ColumnListParameter, and
DrillDownSelectToolParameter all inherit from SelectToolParameter and then
skip the immediate parent in to_dict (:1276, 1854 — same comment
duplicated: “skip SelectToolParameter (the immediate parent) bc we need to
get options in a different way here”).
DrillDownSelectToolParameter.get_options returns
list[DrillDownOptionsDict] (:1700), while the parent’s annotation is
Sequence[Union[ParameterOption, DrillDownOptionsDict]] (:1009) — the
supertype was widened to admit the subtype, enshrining the Liskov violation in
the parent. The class calls ToolParameter.__init__(self, tool, input_source)
directly (:1672), bypassing its declared parent.
ColumnListParameter does file I/O inside get_options (:1560:
open(dataset.get_file_name()) to read header rows). An option-builder
performing disk I/O is a clear smell; it ties the form-rendering path to
dataset filesystem state.
HiddenDataToolParameter(HiddenToolParameter, DataToolParameter) (:2937)
Declares MRO Hidden-first, but __init__ calls
DataToolParameter.__init__(self, tool, elem) directly (:2944) to defeat
the declared order. Currently safe because HiddenToolParameter defines no
to_json/from_json overrides — one future override would silently break
every caller that depends on the data semantics. Docstring at :2938:
“As with all hidden parameters, this is a HACK.” :2950–2954 rejects
not self.optional with a comment noting the only known user is the cufflinks
tool. An entire class lives in this file to support one historical tool.
4. Method-level conflation
value_from_basic (:284–301)
Five different jobs in one method body: dispatch on is_runtime_value,
MutableMapping["__class__"] == "UnvalidatedValue", "NoReplacement",
delegate to to_python, optionally swallow exceptions. The ignore_errors
toggle inverts the contract.
SelectToolParameter._select_from_json (:1047–1131)
Legal-value lookup, name-based fallback when value lookup fails, runtime-
context detection, multiple-value list splitting on whitespace, a
Version(profile) < "18.09" workaround (:1081–1085 — eight years of carried
legacy), and finally raises. Six concerns, one method.
DataToolParameter.to_dict (:2472–2505)
Executes paginated SQL, builds match dictionaries, dedups by HID, and
serialises — alongside computing extensions, edam, multiple, min,
max, tag (all static info). The recent split into
_fill_to_dict_static/_page_hda_matches/_pin_live_hda_inputs/
_carry_unresolved_inputs/_page_hdca_matches re-packs but does not
re-distribute the responsibilities; they still all live on the parameter
class.
from_json everywhere
Coercion + workflow-parameter pass-through + optional/empty handling + two
different error messages depending on workflow_building_mode, in a method
ostensibly named for parsing.
get_initial_value is overloaded
BaseDataToolParameter.get_initial_value (:1981–2027) serves both as
UI default for /api/tools/{tool_id}/build (pre-populate the form when the
user hasn’t picked anything) and as validation/from_json fallback when an
existing from_json returns None and the param isn’t optional. A
post-hoc reader cannot tell whether a returned value came from the user’s
persisted choice or from a “pick first matching HDA” heuristic.
5. Type uncertainty / union proliferation
:2119–2130: explicitItemFromSrcAny = Union[DCE, HDA, HDCA, LDDA, CollectionAdapter]andItemFromSrcCollection = Union[DCE, HDCA, CollectionAdapter]. Every consumer needsisinstancechains.DataToolParameter.from_jsonreturnsOptional[Union[HistoryItem, list[HistoryItem]]]depending onself.multiple; the same return slot also carriesbatch_wrapperHDCAs (:2374–2383) for a different mode entirely.SelectToolParameter.get_options:Sequence[Union[ParameterOption, DrillDownOptionsDict]](:1009) — union exists only to absorb the Drilldown LSP violation.ColumnListParameter.get_legal_values(:1577) returnsset[str]of column numbers but extends with raw user values when the file is empty (:1587).SelectToolParameter.get_initial_valuereturnsOptional[Union[str, list[str]]], shape determined by combinations ofself.optional,self.multiple, and the count of selected options.FileToolParameter.to_json(:734) returnsOptional[Union[str, None]]from inputs that may bestr,MutableMapping,cgi_FieldStorage, or raises.
The “flag combinations” pattern is everywhere: multiple, optional,
is_dynamic, refresh_on_change, default_object, accept_default,
default_value, usecolnames, batch_wrapper, workflow_building_mode —
each pair compounds the shape of inputs and outputs without type narrowing.
6. Data parameters as god classes
DataToolParameter (:2180–2683) and DataCollectionToolParameter
(:2684–2933) each combine, in one class:
- Source parsing —
__init__readsload_contents,format,multiple,min/max,tag, conversions,default_object,allow_uri_if_protocol, plus comma-typedcollection_typefor the collection variant. - Input coercion —
from_json, withDataToolParameterweighing in at ~150 lines ofisinstancedispatch. - Validation — inherited from
BaseDataToolParameterplus inline validation cascades infrom_json. - Paginated DB option building —
_page_hda_matches,_pin_live_hda_inputs,_carry_unresolved_inputs,_page_hdca_matches,_classify_hdca,_history_query. Roughly half of each class. - Workflow converter-safety analysis —
converter_safe(:2427–2451) walks other tool inputs. - Dynamic-options filter attribute machinery —
get_options_filter_attribute(:2453–2470), preceded by a multi-line HACK comment. - DB-row materialization of default objects via
raw_to_galaxy(:3048–3133). This commits sessions (app.model.session.commit()at:3099) inside what reads as a value coercion path. Layering wart. - UI default seeding —
get_initial_valuedoubling as UI default and validation fallback (see §4).
DataCollectionToolParameter._classify_hdca (:2899–2934) returns 0, 1, or
2 entries encoding two semantic flavours ("direct", "multirun") in a
tagged tuple because the class can’t represent two distinct match kinds.
match_collections / match_multirun_collections on
DataCollectionToolParameter (:2721, 2737) have zero callers outside the
class itself. The codebase’s actual collection matching goes through
trans.app.dataset_collection_manager.match_collections(...)
(workflow/modules.py:597, tools/parameters/meta.py:259,391,
managers/collections.py:730). Two parallel collection-matching surfaces;
the unused one lives on the parameter.
Request-context leaks into parameter description
BaseDataToolParameter.__init__ (:1880) takes a trans parameter that no
other ToolParameter.__init__ takes — leaking request context into XML
parsing. The cached self._acceptable_extensions_cache (:1950) is
per-instance state mutated during to_dict, which means tool parameter
objects are not safely reusable across requests with different
datatypes_registry instances. This is a latent footgun for any caller that
treats tool.inputs as immutable shared state.
7. The Pydantic schism
lib/galaxy/tool_util_models/parameters.py and lib/galaxy/tools/runtime.py
implement the same parameter taxonomy in Pydantic, validated end-to-end at the
API boundary. They do not import basic.py (confirmed via
rg -n "parameters\.basic" lib/galaxy/tool_util_models/ lib/galaxy/tools/runtime.py
— no hits). basic.py does not import the Pydantic side either. They are two
implementations of the same concept, kept aligned by code review and by the
YAML-driven state-representation test suite (see
Component - Tool State Specification).
Consequences:
- No shared validator. A YAML tool defines its parameters once; basic.py
parses an XML adapter view (
tool_util/parser/yaml.py:340–342withvalue_state_representation = "test_case_json"), Pydantic parses the same source independently. - No shared codec.
from_json/to_jsonin basic.py and theRequestToolStatevalidators are independent code paths. - The 30+
isinstanceconsumers are the load-bearing constraint on any refactor: replacing concrete class identities (DataToolParameter,SelectToolParameter, etc.) would require rewriting every dispatch site. Delegating basic.py down to the Pydantic models, or wrapping basic.py with Pydantic adapters, are the two realistic directions. Continuing the parallel-by-convention stance is the third option and continues to widen the gap as YAML-era features land.
8. The hidden HistoryItemRef ADT
Five module-level helpers do the polymorphic dispatch over
{HDA, HDCA, DCE, LDDA, CollectionAdapter} that the class hierarchy refuses
to model:
src_id_to_item(sa_session, security, value) → ItemFromSrcAny(:2133–2170)src_id_to_item_collection(...) → ItemFromSrcCollection(:2171–2179)history_item_dict_to_python(value, app, name)(:3159–3171)history_item_to_json(value, app, use_security)(:3174–3198)raw_to_galaxy(app, history, value)(:3048–3133)
They are called from at least four parameter classes
(SelectToolParameter.to_json/to_python,
BaseDataToolParameter.to_json/to_python, DataToolParameter.from_json,
DataCollectionToolParameter.from_json) and from Tool.to_json. The src
whitelist is duplicated at :3162 and :2151 — two sources of truth for
the same set.
Type aliases at :2119–2130 spell out the union explicitly:
ItemFromSrcAny = Union[DCE, HDA, HDCA, LDDA, CollectionAdapter]
ItemFromSrcCollection = Union[DCE, HDCA, CollectionAdapter]
These helpers want to be HistoryItemRef — a value object with
{src, id, extra_params, encode/decode} and an opt-in resolve method. The
absence of the value object is what forces every data-param method into
isinstance chains. Promoting it would also expose two distinct concerns
currently tangled in raw_to_galaxy: (a) parsing/encoding refs (pure), and
(b) materializing DB rows from defaults (side-effecting). The helper at the
bottom of the file is doing both.
history_item_to_json is, in particular, the lone bottleneck for the wire
format of persisted data values:
def history_item_to_json(value, app, use_security):
if isinstance(value, CollectionAdapter):
return value.to_adapter_model().model_dump()
if isinstance(value, MutableMapping) and "src" in value and "id" in value:
return value
elif isinstance(value, DatasetCollectionElement): src = "dce"
elif isinstance(value, HistoryDatasetCollectionAssociation): src = "hdca"
elif isinstance(value, LibraryDatasetDatasetAssociation): src = "ldda"
elif isinstance(value, HistoryDatasetAssociation): src = "hda"
object_id = cached_id(value)
return {"id": app.security.encode_id(object_id) if use_security else object_id,
"src": src}
Whatever it doesn’t emit, the persisted layer can’t see. That is the structural
weakness behind Problem - YAML Tool Post-Hoc State Divergence and the
HID-trace fragility documented in Workflow Extraction Issues (#14423,
#21789, #21788, #9161 etc.), but the failure mode is more general: any new
collection-typed feature that lands on the Pydantic side and is not also
reflected in history_item_to_json is silently invisible to every post-hoc
consumer.
9. Identifier encoding bugs hiding in the codec
DataToolParameter.from_json16-char heuristic (:2294–2298) — if a value string is exactly 16 chars long, assume encoded id. A 16-digit integer ≥1e15 fails this test. The code logs"Encoded ID where unencoded ID expected"and decodes anyway.BaseDataToolParameter.to_python"dce:N"/"hdca:N"parsers (:2065–2068) and the symmetric site inDataCollectionToolParameter.from_json(:2787–2792) callint(value[N:])without consultingapp.security. Any pathway that produced an encoded id in this legacy format would fail to load.DataToolParameter.from_jsonplain-int fallback (:2321) callsint(value)directly — encoded id without digits would raiseValueError.BaseDataToolParameter.to_python(:2063) is itself branchy:if not decoded_id.isdigit(): decoded_id = app.security.decode_id(decoded_id).
use_security toggles encoding on write but reads infer from shape. There is
no schema and no validation.
10. Downstream symptoms (one paragraph each)
These are consequences, not new problems — the underlying causes are §1–§9.
Post-hoc state divergence (YAML tools). PRs 20935 / 21828 / 21842
introduced a Pydantic-validated Job.tool_state for YAML tools, but
Tool.get_param_values(job) (tools/__init__.py:2668–2674) — used by rerun,
job display, workflow extraction, and history export — still routes through
Job.raw_param_dict() → params_from_strings → per-class to_python in
basic.py. Two parallel representations exist, only one is validated, nothing
reconciles them. Full treatment in
Problem - YAML Tool Post-Hoc State Divergence.
Workflow extraction crashes and miswires. extract.py:435–492
(__cleanup_param_values) assumes recovered values are ORM objects with
.hid, but to_python’s six-shape acceptance and the legacy string formats
can leak dicts/envelopes through. This produces AttributeError: 'dict' object has no attribute 'hid' (#14423). Separately, the {src,id} wire
shape drops parent-HDCA / element identifier / column metadata, which
contributes to the broader class of extraction failures catalogued in
Workflow Extraction Issues. These are symptoms — the extraction
subsystem also has independent fragilities (HID-based reconnection, copied-
dataset provenance walks) that would exist even with a better codec.
Workflow editor coupling. workflow/modules.py directly constructs
parameter classes with tool=None and a synthetic source at :1005, 1021, 1159, 1173, 1225, 1289, 1356, 1366, 1388, 1419, 1451, 1459, 1474, 1483, 1493, 1502. This tightly couples the editor to the basic.py class identities; any
refactor must preserve them or rewrite the editor.
Job display. managers/jobs.py:2040 calls input.value_to_display_text
on each param — the only polymorphic point of use here. Everything else in
the display path duck-types on input.type, input.test_param, input.label.
A display-side fidelity gap exists for any structured value the codec can’t
round-trip.
11. Vestigial / legacy / dead code inventory
In-source confessions and TODOs:
:429# why does Integer and Float subclass this :_(:1081–1085Version(profile) < "18.09"workaround (8 years of carried legacy):1189, 1831# FIXME: Currently only translating values back to labels if they are not dynamic(duplicate):1276, 1854# skip SelectToolParameter (the immediate parent) bc we need to get options in a different way here(duplicate):1298–1302# Hack for unit tests, since we have no tool— production branches on test context:1322, 1440# Legacy style default value specification.../# Newer style...parallel paths:1862–1870_carried_state_labelemits UI text ("deleted","hidden","not in current history") that downstream code prefixes onto names at:2622, 2628. UI strings in the model layer.:1909# can be None if self.tool.app is a ValidationContext:1930# TODO: Enhance dynamic options for DataToolParameters...:1936# TODO: Abstract away XML handling here.:2047–2067Four-way legacy string-format dispatch into_python:2181, 2189named-developer TODOs (# TODO, Nate: ...):2291–2295Encoded-vs-decoded id confusion withlog.warningin the production data path:2308–2314__collection_reduce__|parser; same prefix at:2057:2454Multi-line# HACK to get around current hardcoded limitation... this behavior needs to be entirely reworked (in a backwards compatible manner):2699self.multiple = False # Accessed on DataToolParameter a lot, may want in future:2703_parse_options(input_source) # TODO: Review and test.:2937As with all hidden parameters, this is a HACK.:2950–2954hidden_datarequiresoptional="true"(cufflinks-only class):3069# TODO: Convert md5 -> MD5 during tool parsing.:3185–3186# hasattr 'id' fires a query on persistent objects after a flush so better to do the isinstance check. Not sure we need the hasattr check anymore - it'd be nice to drop it.
Other smells
- Live SQLAlchemy mappings in a doctest.
ColumnListParameter(:1417–1422) constructs SQLA-mapped fixtures inline in its doctest — heavy fixture work entangled with the unit test. - File I/O in an option-builder.
ColumnListParameter.get_optionsopens the dataset file at:1560to read header rows. - Polymorphic
to_texthas zero external callers. Consumers usevalue_to_display_textinstead. The polymorphic-looking surface is dead at the base class. match_collectionsandmatch_multirun_collectionsonDataCollectionToolParameter(:2721, 2737) — zero external callers._carried_state_labelemits UI prose from the model layer (:1862–1870).
12. What would let basic.py shrink
In order of leverage, all of which are orthogonal to any specific consumer problem:
-
Promote the
HistoryItemRefADT. Extractsrc_id_to_item/history_item_dict_to_python/history_item_to_jsoninto a typed value-object module. Make every data-param method consumeHistoryItemRefinstead of dispatching on raw dicts. Keepraw_to_galaxyseparate — it’s a materializer (DB-side-effecting), not a codec. -
Define a
parse_persistedboundary. Funnelto_python’s six legacy input shapes through a single normalizer that returns either alist[HistoryItemRef]or raises a typed error. This collapses §9, makes the data-param post-hoc readers safe to type-narrow, and is a prerequisite for the structured-state work in Problem - YAML Tool Post-Hoc State Divergence. -
Split parameter description from value codec. The forward and backward roles (§1) have incompatible contracts. The forward side stays in
basic.py(or moves into a Pydantic-paired description module). The backward side becomes a small set of codecs overHistoryItemRefand primitive types, callable without aToolinstance. -
Split the data-param god classes. A
DataParameterOptionsProvider(or similar) takes a parameter spec, a history, and a matcher, and produces the form option dict.DataToolParametershrinks to “what the parameter is.”_classify_hdca’s"direct"/"multirun"flavours become explicit match types. The recenttool_form_optionsextraction is a first step. -
Decommission legacy string formats.
__collection_reduce__|,dce:,hdca:, comma-separated ids: deprecate, audit production DBs for remaining rows, migrate. -
Fix the inheritance smells with the lowest behavioural risk. Move
IntegerToolParameterandFloatToolParameterout from underTextToolParameter. DetachBaseURLToolParameterfromHiddenToolParameter. DeclareHiddenDataToolParameterMRO as(DataToolParameter, HiddenToolParameter)to match what__init__already does. MoveColumnListParameter’s file I/O out ofget_options. -
Pick a Pydantic delegation direction. Either basic.py delegates to
tool_util_models/parameters.py, or the Pydantic models extend/wrap basic.py, or the parallel-by-convention stance is formally accepted with a cross-validation test that runs on every CI build. Today there is no such validation.
Methodology note
This dossier was produced by four parallel subagents on 2026-05-21, each covering a distinct angle: internal structural / code smells, callsite / API surface map, state-representation divergence, and data/collection parameters’ interaction with workflow extraction. Two of the four prompts leaned into extraction-specific concerns; revision 2 of this note rebalanced toward file-internal problems and demoted extraction to one of several downstream symptoms (§10). Findings were synthesized into this note; the four source reports are not preserved as separate artifacts but the citations carry the receipts.
Unresolved questions
- How much production data still encodes data values as
"__collection_reduce__|...","dce:N", or comma-int-strings? A read-only scan ofJobParameter.valuepatterns would tell us whether the legacy decode branches can be deleted. - Does any in-tree caller depend on
extra_paramskeys being passed throughhistory_item_to_jsonon the input-dict short-circuit (:3178–3179)? Or is the input-dict path effectively dead for fresh API submissions? - Is there a path to making
Tool.to_json(job=…)(the rerun pre-fill) useToolRequest.requestwhen present, independent of any basic.py refactor? - For
HiddenDataToolParameter: any callers depending on the declared Hidden-first MRO ordering, or is theDataToolParameter.__init__direct call sufficient evidence that nobody does? - Is the per-instance
_acceptable_extensions_cache(:1950) actually observed to cause issues with multi-tenantdatatypes_registryswaps, or is it latent only? - Does the
match_collections/match_multirun_collectionszero-caller surface predate thedataset_collection_manageralternative, and can it simply be deleted?