ADR-006: Use Pydantic for all data models with mypy strict mode

Status

ACCEPTED

Context and Problem Statement

orchcore has approximately 20 data structures that cross component boundaries: AgentConfig, AgentResult, StreamEvent, Phase, PhaseResult, PipelineResult, ToolExecution, AgentMonitorSnapshot, RetryPolicy, OutputExtraction, and others. These models carry data between the registry and runner, between the stream pipeline and UICallback, between the phase engine and consuming projects, and between the recovery module and the runner.

In multi-agent orchestration, data integrity at component boundaries is critical. A misconfigured AgentConfig (wrong type for stall_timeout, missing binary path) causes a subprocess launch failure that manifests far from the misconfiguration. A malformed StreamEvent (wrong enum value, missing timestamp) causes the AgentMonitor state machine to enter an undefined state. A PhaseResult with an invalid status causes the PipelineRunner to make incorrect dependency decisions.

In the four source systems: - Planora (Python): Uses Pydantic v2 for some models, but not consistently — some data flows use raw dicts - Articles/Finvault/Raven (Bash): No type safety — all data is untyped strings passed between functions

The lack of consistent type safety across the source systems has been a recurring source of bugs: wrong field names in dicts, missing fields silently defaulting to None, numeric strings not converted to integers, and invalid enum values passing through unchecked.

Business Context

orchcore is critical infrastructure — bugs propagate to all consuming projects
Multi-agent orchestration has subtle failure modes that are hard to debug without type safety
The Python ecosystem has converged on Pydantic v2 as the standard for data validation
mypy strict mode provides compile-time checking that catches type errors before runtime

Decision Drivers

Driver	Priority	Why It Matters
Runtime validation at component boundaries	Critical	Catches misconfiguration, malformed data, and type errors at the point of entry, not at the point of failure
Static type checking (mypy strict)	Critical	Catches type errors during development, before code runs in production
Serialization (to/from JSON, dict, TOML)	High	AgentResult, PhaseResult, and timing records need to be serialized for archival and resume
IDE autocompletion and documentation	High	Engineers working with orchcore need discoverability of model fields and types
Immutability for safety	Medium	Models like AgentConfig and Phase should not be mutated after creation
Compatibility with pydantic-settings	Medium	Configuration system (ADR-005) uses pydantic-settings, which requires Pydantic models

Considered Options

Option 1: Pydantic BaseModel for all data structures with mypy strict (CHOSEN)

Overview: Every data structure that crosses a component boundary is a Pydantic BaseModel (or StrEnum). All fields have explicit type annotations. The entire package runs under mypy strict mode with zero errors. Validation happens at model construction time.

Pros: - Runtime validation at construction: AgentConfig(stall_timeout="abc") raises ValidationError immediately - Static type checking: mypy strict catches result.exit_cod (typo) at development time - Automatic serialization: .model_dump(), .model_dump_json(), .model_validate() for JSON and dict conversion - IDE integration: autocompletion, type hints, field documentation via Field(description=...) - Immutability: model_config = ConfigDict(frozen=True) prevents accidental mutation - Schema generation: .model_json_schema() produces JSON Schema for documentation - pydantic-settings compatibility: configuration system (ADR-005) directly uses Pydantic's type system - Ecosystem standard: most Python libraries understand and integrate with Pydantic models

Cons: - Pydantic v2 is a required dependency (~2MB installed size) - Construction overhead: Pydantic validation adds ~1-5 microseconds per model instance (negligible for orchcore's throughput) - Learning curve: developers must understand Pydantic v2's validation semantics (validators, model_config, etc.) - mypy strict mode is demanding: requires explicit return types, no implicit Any, no untyped function signatures

Risk Assessment:

Risk Type	Level	Detail
Technical	Low	Pydantic v2 is stable and battle-tested
Schedule	Low	Planora already uses Pydantic — patterns are established
Ecosystem	Low	Pydantic is the most widely used Python data library; not at risk of abandonment

Trade-offs: - We gain runtime validation, static type checking, and serialization at every component boundary, accepting Pydantic as a dependency and the discipline of mypy strict mode

Option 2: Python dataclasses (stdlib)

Overview: Use @dataclass for all data structures. Type hints provide documentation and mypy checking, but no runtime validation.

Pros: - No additional dependency (stdlib) - Lighter weight than Pydantic (~10x faster construction) - Familiar to all Python developers - mypy strict works with dataclasses

Cons: - No runtime validation: AgentConfig(stall_timeout="abc") silently stores a string where int is expected - No automatic serialization: must write custom to_dict(), from_dict() methods for each model - No schema generation - Less expressive validation: custom __post_init__ is fragile and verbose - Not compatible with pydantic-settings without wrapping

Why not chosen: - The absence of runtime validation is unacceptable for a library that processes untrusted data from agent CLI output and user-provided configuration. A malformed StreamEvent from a parser bug would propagate silently through the system until it causes a failure far from the source. Pydantic catches this at construction time.

Option 3: attrs

Overview: Use the attrs library for data classes with optional validation via @attr.s(slots=True) and validators.

Pros: - Lighter weight than Pydantic - Good performance (slotted classes) - Validators available (opt-in) - mypy plugin provides good type checking

Cons: - Validation is less expressive than Pydantic (no complex types, no automatic coercion) - No automatic JSON/dict serialization (need cattrs as companion) - Smaller ecosystem integration (fewer libraries understand attrs natively) - Not compatible with pydantic-settings - Two data libraries (attrs + cattrs) instead of one (pydantic)

Why not chosen: - attrs + cattrs provides a subset of Pydantic's functionality with more dependencies and less ecosystem integration. Since orchcore already needs pydantic-settings for configuration (ADR-005), using Pydantic for all models is the natural, consistent choice.

Option 4: TypedDict (stdlib)

Overview: Use TypedDict for structural typing of dictionaries. This provides type hints without requiring class instances.

Pros: - No dependency (stdlib) - Zero overhead (plain dicts) - mypy checks field access on TypedDict

Cons: - No runtime validation (TypedDict is purely a static analysis construct) - Mutable by default (no frozen option) - No methods (no .model_dump(), no .snapshot()) - No inheritance with proper field merging - Awkward syntax for optional fields and defaults - Not compatible with pydantic-settings

Why not chosen: - TypedDict provides static type hints but no runtime validation, no serialization, and no behavior (methods). orchcore models need runtime validation (parsing agent output), serialization (archiving results), and behavior (AgentMonitor.snapshot(), AgentRegistry.get()). TypedDict is insufficient for these requirements.

Decision

We have decided to use Pydantic BaseModel (>= 2.10) for all data structures that cross component boundaries, with mypy strict mode enforced across the entire orchcore package.

Implementation Details

All models use pydantic.BaseModel with explicit type annotations for all fields
Enums use StrEnum for string-based enumerations (AgentState, StreamEventType, PhaseStatus, StreamFormat, AgentMode)
Immutable models use model_config = ConfigDict(frozen=True) for models that should not be mutated after creation (AgentConfig, Phase, StreamEvent)
Mutable models (AgentMonitor internals, WorkspaceManager state) use regular BaseModel

mypy configuration in pyproject.toml:

[tool.mypy]
strict = true
plugins = ["pydantic.mypy"]

[tool.pydantic-mypy]
init_forbid_extra = true
init_typed = true
warn_required_dynamic_aliases = true

Validation errors produce clear messages referencing the field name, expected type, and received value
Serialization uses .model_dump() for dict output and .model_dump_json() for JSON output
Deserialization uses .model_validate() for dict input and .model_validate_json() for JSON input

When to Revisit This Decision

If Pydantic v3 introduces breaking changes that require significant migration effort
If a new Python stdlib mechanism provides runtime validation without external dependencies
If construction overhead becomes measurable (> 1ms per model in profiling) — extremely unlikely given orchcore's throughput
If mypy strict mode causes excessive friction for contributors (no sign of this with a single-developer project)

Consequences

Positive

Runtime validation catches misconfiguration and malformed data at construction time, not at the point of failure
mypy strict mode catches type errors during development with zero runtime cost
Automatic serialization enables archival, resume, and debugging without custom code
IDE autocompletion makes orchcore's data model discoverable
Frozen models prevent accidental mutation of AgentConfig, Phase, and StreamEvent
Consistent model pattern across all 20+ data structures reduces cognitive load
pydantic-settings integration provides type-safe configuration with the same validation system

Negative

Pydantic v2 is a required dependency (mitigated: most Python projects already use Pydantic)
mypy strict mode requires explicit type annotations everywhere (mitigated: this is a feature, not a bug — it forces clear API design)
Minor construction overhead (mitigated: ~1-5 microseconds per instance, negligible vs. subprocess execution time)

Neutral

Pydantic is the de facto standard for Python data models — using it aligns with community expectations
mypy strict mode is increasingly common in production Python packages

Validation and Monitoring

Success Metric	Target	How to Measure
mypy strict compliance	Zero errors on every commit	CI check: `mypy --strict src/orchcore`
Runtime validation coverage	All cross-boundary data structures are Pydantic models	Code review: no raw dicts crossing module boundaries
Serialization round-trip	All models survive `.model_dump()` / `.model_validate()` round-trip	Unit test for each model
Validation error clarity	Error messages include field name, expected type, and received value	Manual review of ValidationError output

Review Schedule: - On each pydantic release: check for breaking changes or deprecations - Annually: Reassess Pydantic vs. alternatives

ADR-001: Extract reusable orchestration core — type safety is a key motivation for the Python package
ADR-003: Protocol-based UI decoupling — UICallback parameters are Pydantic models
ADR-005: Multi-source layered configuration — uses pydantic-settings (Pydantic ecosystem)

References

Pydantic v2 documentation
Pydantic mypy plugin
mypy strict mode
PEP 557 — Data Classes (rejected alternative)

Document History

Version	Date	Author	Changes
1.0	2026-03-25	Abdelaziz Abdelrasol	Initial version (ACCEPTED)