
Software Design Document

Project: orchcore | Author: Abdelaziz Abdelrasol | Date: 2026-03-25 | Status: DRAFT


TL;DR

orchcore extracts the common orchestration infrastructure from four production AI agent orchestration systems (Planora, Articles, Finvault, Raven/Ralph) into a standalone, reusable Python package. It provides 12 components — agent registry, subprocess runner, stream processing pipeline, phase engine, rate-limit recovery, workspace management, configuration, prompt templating, logging, UI protocol, signal handling, and observability — unified by a UICallback protocol that decouples the engine from presentation. The extraction eliminates the estimated 60-70% of orchestration code currently duplicated across the four systems.

Problem Statement

Current Situation

  • Four production orchestration systems exist — Planora (Python), Articles (Bash), Finvault (Bash), and Raven/Ralph (Go) — each independently implementing the same infrastructure: subprocess launching, JSONL stream parsing, rate-limit detection, workspace management, and configuration handling.
  • An estimated 60-70% of the orchestration code across these four systems is functionally identical infrastructure.
  • Bug fixes in one system must be manually ported to the other three.
  • Each new agent CLI (e.g., Gemini, Copilot) must be implemented separately in all four systems: four implementations for every one agent.

Opportunity

  • A shared package eliminates duplication — improvements benefit all consumers immediately.
  • Dramatically lowers the barrier to building new AI agent orchestration systems.
  • The Python ecosystem lacks a purpose-built library for orchestrating CLI-based AI coding agents (existing tools like LangChain operate at the API level, not the subprocess level).

Goals

  • Provide a pip-installable Python package with 12 documented, tested components
  • Zero code changes to add new agent CLI support — TOML configuration only
  • Sequential and parallel phase execution with configurable partial failure semantics
  • Process JSONL from 5 agent CLI formats into a unified StreamEvent model
  • Automatic rate-limit recovery with timezone-aware reset parsing and exponential backoff
  • mypy strict mode with zero errors
  • Each component usable independently (no all-or-nothing dependency)
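
The TOML-only extension point in the goals above might look like the following sketch. The table and key names (`agents`, `command`, `args`, `stream_format`, `modes`) are illustrative assumptions, not orchcore's actual schema:

```toml
# Hypothetical custom-agent entry; key names are illustrative only.
[agents.gemini]
command = "gemini"
args = ["--output-format", "jsonl"]
stream_format = "gemini-jsonl"

# Mode-specific flags layered on top of the base invocation.
[agents.gemini.modes.PLAN]
extra_flags = ["--plan"]
```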

Non-Goals

  • orchcore is not an AI agent — it does not make API calls or generate code
  • orchcore does not include a TUI framework — it provides hooks that a TUI implements
  • orchcore does not handle API keys — agent CLIs manage their own credentials
  • Windows is not a first-class platform in v1.0

Success Metrics

| Metric | Current | Target |
|---|---|---|
| Code duplication across systems | ~60-70% | < 5% |
| Time to add a new agent | 2-4 hours × 4 systems | < 30 minutes (one TOML entry) |
| Test coverage | Varies (20-80%) | > 90% |
| mypy strict compliance | Partial | 100% (zero errors) |

Requirements

Functional Requirements

| Component | Key Requirements |
|---|---|
| Agent Registry | Built-in configs for 5 agents; custom agents via TOML; mode-specific flags (PLAN, FIX, AUDIT, REVIEW) |
| Subprocess Runner | Async launch with stream capture; concurrency via Semaphore; structured AgentResult |
| Stream Processing | Pre-parse filtering (~95% noise reduction); 5-format parsing; 9-state machine; stall detection |
| Tool Assignment | Per-phase ToolSet; per-agent overrides; layered resolution order; permission levels |
| Pipeline Engine | Sequential/parallel phases; dependency ordering; resume, skip, only-phase options |
| Recovery | Regex rate-limit detection; timezone-aware reset parsing; exponential backoff; git dirty-tree recovery |
| Workspace | Active directories; timestamped archives; gzip compression; "latest" symlink |
| Configuration | 7-level priority chain; named profiles; per-agent overrides; extensible via subclassing |
| Prompt Templating | Jinja2 rendering; frontmatter stripping; configurable template directories |
| Signal Handling | SIGINT/SIGTERM trap; cooperative shutdown_requested flag; PhaseRunner owns subprocess cleanup and 30s grace period |
| UI Protocol | 15 callback methods; NullCallback and LoggingCallback built-in |
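
The pre-parse filtering requirement above can be sketched as a generator that rejects non-JSON lines before paying the cost of `json.loads`. This is a minimal sketch: the `StreamEvent` stand-in is simplified, and its field names are assumptions rather than orchcore's actual model:

```python
import json
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class StreamEvent:
    """Simplified stand-in for orchcore's unified event model."""
    kind: str
    text: str


def parse_stream(lines: Iterable[str]) -> Iterator[StreamEvent]:
    """Yield unified events from a mixed JSONL/noise stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # cheap pre-parse rejection of non-JSON noise
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed JSON is treated as noise, not an error
        yield StreamEvent(kind=obj.get("type", "unknown"), text=obj.get("text", ""))
```

The cheap `startswith("{")` check is what makes the ~95% noise reduction possible before any parser is invoked.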

Non-Functional Requirements

| Category | Target |
|---|---|
| Performance | < 5ms per event, < 100ms subprocess launch |
| Memory | < 50MB per concurrent agent (line-by-line streaming) |
| Extensibility | New agent via TOML only |
| Composability | Each component usable standalone |
| Type Safety | mypy strict, zero errors |
| Testing | > 90% line coverage |

Key Interfaces

UICallback Protocol

```python
from collections.abc import Sequence
from typing import Protocol

# Phase, PhaseResult, PipelineResult, StreamEvent, and AgentResult are
# orchcore data types.
class UICallback(Protocol):
    def on_pipeline_start(self, phases: Sequence[Phase]) -> None: ...
    def on_pipeline_complete(self, result: PipelineResult) -> None: ...
    def on_phase_start(self, phase: Phase) -> None: ...
    def on_phase_end(self, phase: Phase, result: PhaseResult) -> None: ...
    def on_phase_skip(self, phase: Phase, reason: str) -> None: ...
    def on_agent_start(self, agent_name: str, phase: str) -> None: ...
    def on_agent_event(self, event: StreamEvent) -> None: ...
    def on_agent_complete(self, agent_name: str, result: AgentResult) -> None: ...
    def on_agent_error(self, agent_name: str, error: str) -> None: ...
    def on_stall_detected(self, agent_name: str, duration: float) -> None: ...
    def on_rate_limit(self, agent_name: str, message: str) -> None: ...
    def on_rate_limit_wait(self, agent_name: str, wait_seconds: float) -> None: ...
    def on_retry(self, agent_name: str, attempt: int, max_attempts: int) -> None: ...
    def on_git_recovery(self, action: str, detail: str) -> None: ...
    def on_shutdown(self, reason: str) -> None: ...
```
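
Because UICallback is a structural Protocol, a consumer only needs matching method signatures; in practice it would subclass the built-in NullCallback to inherit no-op defaults for unused hooks. A minimal sketch, with a simplified stand-in `StreamEvent` and only two of the 15 hooks shown:

```python
from dataclasses import dataclass


@dataclass
class StreamEvent:
    """Simplified stand-in; orchcore defines the real event model."""
    agent: str
    text: str


class RecordingCallback:
    """Records a transcript of pipeline activity for display or testing."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def on_agent_start(self, agent_name: str, phase: str) -> None:
        self.log.append(f"{agent_name} started in {phase}")

    def on_agent_event(self, event: StreamEvent) -> None:
        self.log.append(event.text)
```

No registration or inheritance from orchcore is required; the engine accepts anything whose methods structurally match the Protocol.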

AgentRunner

```python
import asyncio
from collections.abc import Callable
from pathlib import Path

# AgentConfig, AgentMode, StreamEvent, AgentMonitorSnapshot, ToolSet, and
# AgentResult are orchcore data types.
class AgentRunner:
    async def run(
        self,
        agent: AgentConfig,
        prompt: str,
        output_path: Path,
        mode: AgentMode = AgentMode.PLAN,
        dry_run: bool = False,
        on_event: Callable[[StreamEvent], None] | None = None,
        on_snapshot: Callable[[AgentMonitorSnapshot], None] | None = None,
        snapshot_interval: float | None = None,
        stall_check_interval: float = 5.0,
        on_process_start: Callable[[asyncio.subprocess.Process], None] | None = None,
        on_process_end: Callable[[asyncio.subprocess.Process], None] | None = None,
        toolset: ToolSet | None = None,
        on_stall: Callable[[str, float], None] | None = None,
    ) -> AgentResult: ...
```
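
The runner's "concurrency via Semaphore" requirement reduces to a standard asyncio pattern. A minimal sketch, where the `sleep` stands in for the subprocess lifetime and the state dict exists only to demonstrate the bound:

```python
import asyncio


async def run_agent(name: str, sem: asyncio.Semaphore, state: dict[str, int]) -> None:
    """Stand-in for AgentRunner.run; the semaphore bounds parallelism."""
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # placeholder for the subprocess lifetime
        state["active"] -= 1


async def run_all(names: list[str], max_parallel: int) -> int:
    """Launch every agent, but never more than max_parallel at once."""
    sem = asyncio.Semaphore(max_parallel)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(run_agent(n, sem, state) for n in names))
    return state["peak"]
```

All tasks are created eagerly; the semaphore, not task creation order, enforces the concurrency ceiling.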

PipelineRunner

```python
class PipelineRunner:
    async def run_pipeline(
        self,
        phases: list[Phase],
        prompts: dict[str, str],
        ui_callback: UICallback,
        mode: AgentMode | None = None,
        resume_from: str | None = None,
    ) -> PipelineResult: ...
```

Alternatives Considered

| Alternative | Why Not |
|---|---|
| LangChain / CrewAI | Operate at the API level, not the CLI subprocess level |
| Shared Bash library | Not feasible for complex async orchestration |
| Monorepo with shared code | Tight coupling, hard to version independently |
| Per-system copy-paste | Current state — 60-70% duplication, maintenance burden |