Pydantic v2 Dynamic Models Reference
Reference document covering Pydantic v2 dynamic model features used in Galaxy’s tool state system and related patterns.
1. create_model() API
Dynamically creates a BaseModel subclass at runtime.
from pydantic import BaseModel, Field, create_model
# Signature
create_model(
    __model_name: str,
    *,
    __config__: ConfigDict | None = None,
    __doc__: str | None = None,
    __base__: type[BaseModel] | tuple[type[BaseModel], ...] | None = None,
    __module__: str = __name__,
    __validators__: dict[str, classmethod] | None = None,
    __cls_kwargs__: dict[str, Any] | None = None,
    __qualname__: str | None = None,
    **field_definitions,  # field_name=(type, default) or field_name=type
) -> type[BaseModel]
Field Definition Syntax
Fields are passed as keyword arguments. Four forms:
from typing import Annotated
# Form 1: Bare type (required field, no default)
M = create_model('M', name=str)
# Form 2: Tuple of (type, default_value)
M = create_model('M', name=(str, "default"))
# Form 3: Tuple of (type, FieldInfo)
M = create_model('M', name=(str, Field(description="Name field")))
# Form 4: Annotated type carrying FieldInfo metadata; ... marks the field required
# (constraints must suit the type: gt applies to numbers, min_length to strings)
M = create_model('M', name=(Annotated[str, Field(min_length=1)], ...))
The ... (Ellipsis) as default means the field is required:
M = create_model('M',
    required_field=(int, ...),          # required
    optional_field=(int | None, None),  # optional, defaults to None (type must admit None)
    defaulted_field=(int, 42),          # optional, defaults to 42
)
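The forms above can be combined in one call. The following is a minimal sketch; the `Item` model and its field names are illustrative assumptions rather than anything from Galaxy.

```python
from typing import Annotated, Optional

from pydantic import Field, ValidationError, create_model

# Hypothetical model: a required field, a constrained field with a default,
# and an optional field that defaults to None.
Item = create_model(
    "Item",
    name=(str, ...),
    count=(Annotated[int, Field(gt=0)], 1),
    note=(Optional[str], None),
)

# Only the required field has to be supplied.
defaults = Item(name="widget").model_dump()

try:
    Item(count=0)  # name is missing and count violates gt=0
    error_fields = []
except ValidationError as exc:
    # Collect the names of the fields that failed validation.
    error_fields = sorted(str(err["loc"][0]) for err in exc.errors())
```

Both problems are reported in a single `ValidationError`, one entry per failing field.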
Base Class Inheritance
from pydantic import BaseModel, ConfigDict, create_model
class MyBase(BaseModel):
    model_config = ConfigDict(extra="forbid")
    common_field: str = "shared"
DerivedModel = create_model(
    'DerivedModel',
    __base__=MyBase,
    extra_field=(int, ...),
)
# DerivedModel inherits MyBase's config, validators, and fields
Multiple bases:
DerivedModel = create_model('D', __base__=(Base1, Base2), ...)
Config with __config__
from pydantic import ConfigDict, create_model
model_config = ConfigDict(
    extra="forbid",           # reject unknown fields
    protected_namespaces=(),  # allow model_* field names
    populate_by_name=True,    # allow field name or alias
)
M = create_model('M', __config__=model_config, name=(str, ...))
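__config__ and __base__ cannot be combined: Pydantic v2 raises PydanticUserError if both are passed. To change the configuration of a derived model, set model_config on the base class (or on an intermediate subclass) and pass that class as __base__.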
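A short sketch of how `__config__` and `__base__` interact in Pydantic v2, where passing both to `create_model()` raises `PydanticUserError`, and of the usual workaround of putting the extra settings on an intermediate base class. The class names are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, PydanticUserError, create_model


class StrictBase(BaseModel):
    model_config = ConfigDict(extra="forbid")


# Passing both __base__ and __config__ is rejected at model-creation time.
try:
    create_model(
        "Bad",
        __base__=StrictBase,
        __config__=ConfigDict(str_strip_whitespace=True),
        x=(str, ...),
    )
    combined_ok = True
except PydanticUserError:
    combined_ok = False


# Workaround: extend the config in a subclass; class-level configs are merged.
class StrippingStrictBase(StrictBase):
    model_config = ConfigDict(str_strip_whitespace=True)


Good = create_model("Good", __base__=StrippingStrictBase, x=(str, ...))
```

`Good` keeps `extra="forbid"` from `StrictBase` and adds whitespace stripping from the intermediate class.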
Validators with __validators__
from pydantic import create_model, field_validator
def check_positive(cls, v):
    assert v > 0, "must be positive"
    return v
M = create_model(
    'M',
    __validators__={
        "check_value": field_validator("value")(check_positive),
    },
    value=(int, ...),
)
Validator keys must be unique across the model. When assembling from multiple parameters,
namespace them: f"{field_name}_{validator_name}".
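The namespacing convention might look like the following when validators are assembled from several parameters. This is a sketch under assumptions: the `minimums` mapping, `make_min_check`, and the `Box` model are hypothetical.

```python
from pydantic import ValidationError, create_model, field_validator


def make_min_check(minimum: int):
    """Return a validator function enforcing a per-field minimum."""

    def check(cls, v):
        if v < minimum:
            raise ValueError(f"must be >= {minimum}")
        return v

    return check


# Hypothetical per-field constraints.
minimums = {"width": 1, "height": 10}

# One validator per field; keys are namespaced by field name so they never collide.
validators = {
    f"{name}_min_check": field_validator(name)(make_min_check(minimum))
    for name, minimum in minimums.items()
}
fields = {name: (int, ...) for name in minimums}
Box = create_model("Box", __validators__=validators, **fields)


def is_rejected(**data) -> bool:
    """True if Box refuses the given data."""
    try:
        Box(**data)
    except ValidationError:
        return True
    return False
```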
2. Discriminated Unions
String Discriminator (Simple)
All union members share a field with Literal values:
from typing import Literal, Union
from pydantic import BaseModel, Field
class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int
class Dog(BaseModel):
    pet_type: Literal['dog']
    barks: float
class Model(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator='pet_type')
Pydantic reads pet_type from the input dict and routes validation to the matching model.
Fast O(1) dispatch. Clear error messages.
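To illustrate the routing and the error reporting, here is a small self-contained sketch. The `Owner` wrapper name is an assumption chosen for the example.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Owner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


# pet_type selects the Dog branch; Cat is never tried.
owner = Owner.model_validate({"pet": {"pet_type": "dog", "barks": 3.5}})

try:
    Owner.model_validate({"pet": {"pet_type": "fish"}})
    error_type = None
except ValidationError as exc:
    # A single error about the tag, not one error per union member.
    error_type = exc.errors()[0]["type"]
```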
Callable Discriminator with Tag
When union members have different field structures (no shared discriminator field):
from typing import Annotated, Any, Union
from pydantic import BaseModel, Discriminator, Tag
class Apple(BaseModel):
    fruit: str
class Pumpkin(BaseModel):
    filling: str
def pie_discriminator(v: Any) -> str:
    if isinstance(v, dict):
        if 'fruit' in v:
            return 'apple'
        if 'filling' in v:
            return 'pumpkin'
    # Handle model instances too (used during serialization)
    if hasattr(v, 'fruit'):
        return 'apple'
    return 'pumpkin'
Pie = Annotated[
    Union[
        Annotated[Apple, Tag('apple')],
        Annotated[Pumpkin, Tag('pumpkin')],
    ],
    Discriminator(pie_discriminator),
]
Key points:
- The callable receives a raw dict during validation and model instances during serialization
- Always handle both cases (isinstance(v, dict) and getattr)
- Return value must match one of the Tag(...) strings
- Returning a non-matching string or None causes a ValidationError
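The key points above can be exercised with a `TypeAdapter`. The sketch below is a variant of the pie example that returns `None` for unrecognized input instead of falling through to a default tag; that change is an illustrative choice, not the only valid design.

```python
from typing import Annotated, Any, Optional, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter, ValidationError


class Apple(BaseModel):
    fruit: str


class Pumpkin(BaseModel):
    filling: str


def pie_discriminator(v: Any) -> Optional[str]:
    # Validation path: raw dict input.
    if isinstance(v, dict):
        if "fruit" in v:
            return "apple"
        if "filling" in v:
            return "pumpkin"
        return None
    # Serialization path: model instances.
    if hasattr(v, "fruit"):
        return "apple"
    if hasattr(v, "filling"):
        return "pumpkin"
    return None


Pie = Annotated[
    Union[Annotated[Apple, Tag("apple")], Annotated[Pumpkin, Tag("pumpkin")]],
    Discriminator(pie_discriminator),
]
adapter = TypeAdapter(Pie)

validated = adapter.validate_python({"filling": "pumpkin spice"})
dumped = adapter.dump_python(Apple(fruit="granny smith"))

try:
    adapter.validate_python({"crust": "flaky"})  # discriminator returns None
    unmatched_rejected = False
except ValidationError:
    unmatched_rejected = True
```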
Custom Error Messages
Discriminator(
    my_discriminator_func,
    custom_error_type='invalid_union_member',
    custom_error_message='Input does not match any expected type',
    custom_error_context={'discriminator': 'type_field'},
)
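A sketch of how the custom error surfaces when no branch matches. The models and the `species` discriminator function are hypothetical; only the `Discriminator` keyword arguments come from the text above.

```python
from typing import Annotated, Any, Literal, Optional, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter, ValidationError


class Cat(BaseModel):
    species: Literal["cat"]
    meows: int


class Dog(BaseModel):
    species: Literal["dog"]
    barks: float


def species(v: Any) -> Optional[str]:
    # Handle both raw dicts (validation) and model instances (serialization).
    if isinstance(v, dict):
        return v.get("species")
    return getattr(v, "species", None)


Pet = Annotated[
    Union[Annotated[Cat, Tag("cat")], Annotated[Dog, Tag("dog")]],
    Discriminator(
        species,
        custom_error_type="invalid_union_member",
        custom_error_message="Input does not match any expected type",
        custom_error_context={"discriminator": "species"},
    ),
]

try:
    TypeAdapter(Pet).validate_python({"species": "fish"})
    error = None
except ValidationError as exc:
    error = exc.errors()[0]
```

The custom type and message replace the default tag errors, which can make API responses easier to interpret for callers who do not know Pydantic's built-in error codes.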
Nested Discriminators
Stack discriminators for hierarchical type selection:
Cat = Annotated[
    Union[Annotated[BlackCat, Tag('black')], Annotated[WhiteCat, Tag('white')]],
    Discriminator(color_discriminator),
]
Pet = Annotated[
    Union[Annotated[Cat, Tag('cat')], Annotated[Dog, Tag('dog')]],
    Discriminator(species_discriminator),
]
Dynamic Discriminated Union Construction
Build unions at runtime (Galaxy’s primary pattern):
from pydantic import Discriminator, Tag, create_model
from typing import Annotated, Union
def build_discriminated_union(branches: dict[str, type]) -> type:
    """branches maps tag_string -> model_class"""
    tagged = [Annotated[model, Tag(tag)] for tag, model in branches.items()]
    def discriminate(v):
        if isinstance(v, dict):
            return v.get('type', '')
        return getattr(v, 'type', '')
    return Annotated[Union[tuple(tagged)], Discriminator(discriminate)]
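A possible end-to-end use of such a builder, with branch models that are themselves created dynamically. The `IntParam` and `TextParam` models are assumptions made for this sketch and do not reflect Galaxy's actual parameter models.

```python
from typing import Annotated, Any, Literal, Union

from pydantic import Discriminator, Tag, TypeAdapter, create_model


def build_discriminated_union(branches: dict[str, type]) -> Any:
    """branches maps tag_string -> model_class (two or more branches assumed)."""
    tagged = [Annotated[model, Tag(tag)] for tag, model in branches.items()]

    def discriminate(v: Any) -> str:
        if isinstance(v, dict):
            return v.get("type", "")
        return getattr(v, "type", "")

    return Annotated[Union[tuple(tagged)], Discriminator(discriminate)]


# Hypothetical branch models, each carrying its own tag in a "type" field.
branches = {
    "int": create_model("IntParam", type=(Literal["int"], ...), value=(int, ...)),
    "text": create_model("TextParam", type=(Literal["text"], ...), value=(str, ...)),
}
ParamUnion = build_discriminated_union(branches)
adapter = TypeAdapter(ParamUnion)

parsed = adapter.validate_python({"type": "int", "value": "5"})
dumped = adapter.dump_python(parsed)
```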
3. model_rebuild() for Forward References
When It’s Needed
Pydantic resolves type annotations when the model class is created. If a model references
a type that isn’t defined yet (forward reference), schema generation is deferred. Call
model_rebuild() after all referenced types are defined.
from __future__ import annotations # makes all annotations strings (forward refs)
from pydantic import BaseModel
class Node(BaseModel):
    value: int
    children: list[Node]  # self-reference
Node.model_rebuild()  # resolve the forward reference to Node
API
@classmethod
def model_rebuild(
    cls,
    *,
    force: bool = False,        # rebuild even if already complete
    raise_errors: bool = True,  # raise on resolution failure
    _parent_namespace_depth: int = 2,
    _types_namespace: MappingNamespace | None = None,  # custom namespace for resolution
) -> bool | None
Returns:
- None if the schema was already complete and force=False
- True if the rebuild succeeded
- False if the rebuild failed and raise_errors=False
Mutual References
class Parent(BaseModel):
    children: list['Child']
class Child(BaseModel):
    parent: 'Parent'
# Must rebuild after BOTH are defined
Parent.model_rebuild()
Child.model_rebuild()
Cascading Rebuilds
Rebuilding a model may cascade to rebuild its dependencies. If those dependencies
have unresolved references, errors can occur. Define all types before calling model_rebuild().
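The documented return values can be observed on a pair of mutually referencing models. This is a sketch with defaults added to the fields so the models are easy to instantiate; it assumes the models are defined at module level so the forward references can be found.

```python
from typing import Optional

from pydantic import BaseModel


class Parent(BaseModel):
    children: list["Child"] = []  # Child is not defined yet


class Child(BaseModel):
    name: str
    parent: Optional["Parent"] = None


# Parent could not be completed at class creation, so the first call rebuilds it.
first = Parent.model_rebuild()
# The schema is now complete; without force nothing is done.
second = Parent.model_rebuild()
# force=True recompiles even though the schema is already complete.
forced = Parent.model_rebuild(force=True)

family = Parent.model_validate({"children": [{"name": "a"}]})
```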
Module-Level Pattern
Standard pattern — define all models, then rebuild at module level:
from typing import Optional
from pydantic import BaseModel
class A(BaseModel):
    b: Optional['B'] = None
class B(BaseModel):
    a: Optional['A'] = None
# After all definitions
A.model_rebuild()
B.model_rebuild()
4. JSON Schema Generation
model_json_schema()
schema = MyModel.model_json_schema(
    mode='validation',                    # or 'serialization'
    schema_generator=GenerateJsonSchema,  # custom generator class
    ref_template='#/$defs/{model}',       # reference template
)
Custom Schema Generator
from pydantic.json_schema import GenerateJsonSchema
class CustomSchema(GenerateJsonSchema):
    def generate(self, schema, mode='validation'):
        json_schema = super().generate(schema, mode=mode)
        json_schema['$schema'] = self.schema_dialect  # add $schema field
        return json_schema
schema = Model.model_json_schema(schema_generator=CustomSchema)
Dynamic Models and JSON Schema
Dynamic models created via create_model() support model_json_schema() just like static models.
Unions with a string discriminator produce oneOf plus discriminator metadata (propertyName and mapping) in the schema.
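As a quick check of both statements, the sketch below builds every model with `create_model()` and inspects the schema of the union field. The model names are illustrative.

```python
from typing import Literal, Union

from pydantic import Field, create_model

# Branch models and the wrapper are all created dynamically.
Cat = create_model("Cat", pet_type=(Literal["cat"], ...), meows=(int, ...))
Dog = create_model("Dog", pet_type=(Literal["dog"], ...), barks=(float, ...))
Owner = create_model(
    "Owner",
    pet=(Union[Cat, Dog], Field(discriminator="pet_type")),
)

schema = Owner.model_json_schema()
pet_schema = schema["properties"]["pet"]
# pet_schema contains "oneOf" with one $ref per branch, and a "discriminator"
# object naming the property and mapping each tag to its $ref.
```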
TypeAdapter for Non-Model Types
from pydantic import TypeAdapter
adapter = TypeAdapter(Union[Cat, Dog])
schema = adapter.json_schema(mode='validation')
5. __get_pydantic_core_schema__ Custom Types
On the Type Itself
from pydantic_core import CoreSchema, core_schema
from pydantic import GetCoreSchemaHandler
class PositiveInt(int):
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: type, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        return core_schema.int_schema(gt=0)
As Annotation Metadata
from dataclasses import dataclass
@dataclass(frozen=True)
class Uppercase:
    def __get_pydantic_core_schema__(self, source_type, handler):
        schema = handler(source_type)
        return core_schema.no_info_after_validator_function(
            lambda v: v.upper(), schema
        )
# Usage: Annotated[str, Uppercase()]
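A self-contained version of the metadata pattern, used both on a dynamically created model and through a `TypeAdapter`. The `Tag_` model name is a hypothetical choice for this sketch.

```python
from dataclasses import dataclass
from typing import Annotated, Any

from pydantic import GetCoreSchemaHandler, TypeAdapter, create_model
from pydantic_core import CoreSchema, core_schema


@dataclass(frozen=True)
class Uppercase:
    """Annotation metadata that upper-cases the value after normal validation."""

    def __get_pydantic_core_schema__(
        self, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        # Build the default schema for the annotated type (str here) ...
        schema = handler(source_type)
        # ... then run the transformation once that validation has succeeded.
        return core_schema.no_info_after_validator_function(
            lambda v: v.upper(), schema
        )


UpperStr = Annotated[str, Uppercase()]

Tag_ = create_model("Tag_", label=(UpperStr, ...))
label = Tag_(label="galaxy").label
direct = TypeAdapter(UpperStr).validate_python("tool")
```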
Handler Methods
- handler(type): call the next handler in the chain, or Pydantic's default for type
- handler.generate_schema(type): generate a schema for an unrelated type
- handler.field_name: access the current field name (v2.4+)
Common core_schema Functions
core_schema.str_schema()
core_schema.int_schema(gt=0)
core_schema.no_info_after_validator_function(fn, inner_schema)
core_schema.to_string_ser_schema()
core_schema.union_schema([schema1, schema2])
core_schema.tagged_union_schema(choices, discriminator)  # choices: dict mapping tag -> schema
6. RootModel for Single-Value Models
from typing import Union
from pydantic import RootModel, Field
class MyUnion(RootModel):
    root: Union[Cat, Dog] = Field(discriminator='pet_type')  # the Literal field shared by Cat and Dog
Useful for:
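- Wrapping discriminated unions as standalone models
- Creating list/container models: root: List[SomeType]
- Using as field types in create_model (the field holds the RootModel instance, the wrapped value is available as .root, and serialization emits the root value without a wrapper key)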
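The sketch below shows a `RootModel` wrapping a discriminated union and then used as a field type in `create_model()`. It uses the generic `RootModel[...]` spelling with `Annotated`, and the `Pet` and `Holder` names are assumptions for illustration.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, RootModel, create_model


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Pet(RootModel[Annotated[Union[Cat, Dog], Field(discriminator="pet_type")]]):
    """Standalone model whose whole value is the discriminated union."""


# Standalone validation: the input is the union value itself, no wrapper key.
pet = Pet.model_validate({"pet_type": "cat", "meows": 2})

# As a field type in a dynamic model.
Holder = create_model("Holder", pet=(Pet, ...))
holder = Holder.model_validate({"pet": {"pet_type": "dog", "barks": 1.5}})
dumped = holder.model_dump()
```

Attribute access yields the `Pet` wrapper, so the concrete `Dog` is reached through `holder.pet.root`, while `model_dump()` writes the union value directly under `pet`.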
7. TypeAdapter for Non-Model Validation
from pydantic import TypeAdapter
# Validate complex union types without a model wrapper
adapter = TypeAdapter(Annotated[Union[A, B], Field(discriminator='src')])
result = adapter.validate_python({'src': 'a', 'value': 1})
Module-level TypeAdapter instances are created once and reused (no lru_cache needed).
8. Caching Patterns for Dynamic Models
Problem
create_model() is relatively expensive — it generates pydantic-core schemas, compiles
validators, and creates a new class. Calling it per-request is wasteful.
lru_cache on Factory Functions
from functools import lru_cache
@lru_cache(maxsize=256)
def get_model_for_tool(tool_id: str, state_repr: str) -> type[BaseModel]:
    bundle = load_tool_parameters(tool_id)
    return create_field_model(bundle.parameters, f"Model_{tool_id}", state_repr)
Requirements for caching:
- Arguments must be hashable (strings, tuples, frozensets; not lists or dicts)
- The tool parameter definition must be immutable for the cache to be valid
- Choose maxsize based on the number of unique tools × state representations
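A runnable sketch of the cached-factory idea. `TOOL_FIELDS` is a hypothetical stand-in for a tool parameter registry, and the factory is deliberately simplified; it is not Galaxy's loader.

```python
from functools import lru_cache

from pydantic import BaseModel, create_model

# Hypothetical, immutable parameter definitions keyed by tool id.
# Tuples are used so the definitions cannot be mutated after caching.
TOOL_FIELDS = {
    "cat1": (("input", str), ("lines", int)),
}


@lru_cache(maxsize=256)
def get_model_for_tool(tool_id: str) -> type[BaseModel]:
    """Build the model once per tool_id; later calls return the cached class."""
    fields = {name: (tp, ...) for name, tp in TOOL_FIELDS[tool_id]}
    return create_model(f"Model_{tool_id}", **fields)


first = get_model_for_tool("cat1")
second = get_model_for_tool("cat1")  # served from the cache
info = get_model_for_tool.cache_info()
```

Because the cache returns the same class object, `isinstance` checks and schema references stay consistent across calls.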
Galaxy’s Approach
Galaxy currently does NOT cache dynamic models. Models are created fresh on each validation call. This is acceptable because:
- Tool definitions are read at startup and rarely change
- Validation calls are relatively infrequent (per-job, not per-request in hot paths)
- The model creation overhead is small compared to actual job execution
If caching were needed, the natural cache key would be
(tool_id, tool_version, state_representation).
9. Common Pitfalls
1. Forgetting model_rebuild() for Forward References
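Symptom: PydanticUserError (the model "is not fully defined") when the model is first instantiated or validated, or PydanticUndefinedAnnotation raised from model_rebuild() when a name still cannot be resolved.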
Fix: Call model_rebuild() after all referenced types are defined.
2. Callable Discriminator Not Handling Model Instances
Pydantic calls the discriminator function during serialization too, where v is a model instance
not a dict:
# WRONG
def discriminate(v):
    return v.get('type')  # fails on model instances
# CORRECT
def discriminate(v):
    if isinstance(v, dict):
        return v.get('type', '')
    return getattr(v, 'type', '')
3. Non-Unique __validators__ Keys
All validator keys must be unique within a model. When combining validators from multiple parameters, namespace them:
# WRONG: a dict literal keeps only the last "check" entry, so the validator for "a" is silently dropped
validators = {"check": field_validator("a")(fn), "check": field_validator("b")(fn)}
# CORRECT
validators = {"a_check": field_validator("a")(fn), "b_check": field_validator("b")(fn)}
4. extra="forbid" with Aliases
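By default, a field that declares an alias accepts only the alias as input. With extra="forbid", input that uses the field name instead is rejected as an extra field rather than silently ignored. Set populate_by_name=True to accept either the alias or the field name: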
ConfigDict(extra="forbid", populate_by_name=True)
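The difference can be seen side by side in this sketch. The model names and the `toolId` alias are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class AliasOnly(BaseModel):
    model_config = ConfigDict(extra="forbid")
    tool_id: str = Field(alias="toolId")


class EitherName(BaseModel):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)
    tool_id: str = Field(alias="toolId")


try:
    AliasOnly.model_validate({"tool_id": "cat1"})
    error_types = []
except ValidationError as exc:
    # The alias is reported missing and the field name is reported as extra.
    error_types = sorted(err["type"] for err in exc.errors())

by_name = EitherName.model_validate({"tool_id": "cat1"})
by_alias = EitherName.model_validate({"toolId": "cat1"})
```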
5. Dynamic Literal with Empty Options
Union[()] (empty tuple) is invalid. Handle the zero-options case:
# union_type is assumed here to be a helper that builds Union[...] from a list of types
if len(options) > 0:
    py_type = union_type([Literal[o] for o in options])
else:
    py_type = type(None)  # no valid options -> only None accepted
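The same guard can be written with the standard library alone. This sketch swaps the union of single-value `Literal` types for one multi-value `Literal`, which accepts the same inputs; `literal_type_for` and the model names are hypothetical.

```python
from typing import Any, Literal

from pydantic import ValidationError, create_model


def literal_type_for(options: list[str]) -> Any:
    """Return a Literal over the options, or NoneType when there are none."""
    if options:
        # Subscripting with a tuple is equivalent to Literal["a", "b", ...].
        return Literal[tuple(options)]
    return type(None)  # no valid options -> only None accepted


Select = create_model("Select", value=(literal_type_for(["a", "b"]), ...))
Empty = create_model("Empty", value=(literal_type_for([]), None))


def accepts(model: type, value: Any) -> bool:
    """True if the model validates the given value."""
    try:
        model(value=value)
    except ValidationError:
        return False
    return True
```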
6. protected_namespaces Warning
Pydantic warns if fields start with model_. Suppress with:
ConfigDict(protected_namespaces=())
7. Mutating Types After Model Creation
create_model() captures types at call time. Mutating the type objects after creation
has no effect on already-created models.
10. Performance Notes
- create_model() cost is dominated by pydantic-core schema compilation
- Pydantic v2 internally caches some schema generation results
- Discriminated unions with string discriminators are faster than callable discriminators (direct dict lookup vs function call)
- TypeAdapter instances should be created once at module level, not per-call
- For dynamic models that are created frequently with the same structure, lru_cache on the factory function provides significant speedup
- model_rebuild() triggers a full schema recompilation; avoid calling it more than once per model unless force=True is needed