npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516)

package/vault/wiki/concepts/lifecycle-hooks.md ADDED

@@ -0,0 +1,94 @@
+---
+type: concept
+title: "Lifecycle Hook System"
+aliases: ["lifecycle hooks", "tool-level hooks", "deterministic hooks"]
+created: 2026-05-01
+tags: [concept, harness-design, hooks, deterministic-policy, claude-code]
+status: developing
+related:
+  - "[[harness-implementation-plan]]"
+  - "[[agentic-harness]]"
+  - "[[feedforward-feedback-harness]]"
+  - "[[harness-wiki-skill-mapping]]"
+sources:
+  - "[[claude-code-architecture-karaxai-2026]]"
+  - "[[claude-code-security-architecture-penligent-2026]]"
+updated: 2026-05-02
+---
+# Lifecycle Hook System
+Deterministic policy enforcement at the tool lifecycle level through hook events with exit-code semantics. The key insight: **CLAUDE.md achieves ~92% compliance. Hooks achieve 100% compliance** for conditions they match (Source: [[claude-code-architecture-karaxai-2026]]).
+## Hook vs Prompt
+| Mechanism | Compliance | Method | When it fails |
+|---|---|---|---|
+| CLAUDE.md rules | ~92% | Prompt injection | Model ignores, forgets under load, context anxiety |
+| Hooks | 100% (when matched) | Shell command with exit code | Only if hook script has bugs |
+The roughly 8% gap between CLAUDE.md and hooks covers the cases where the model "knew the rule but didn't follow it." Hooks close this gap by moving policy enforcement outside the model.
+## Key Hook Events
+### PreToolUse (Most Critical)
+Fires before tool execution. Can: allow, deny, ask (prompt user), defer (for batch processing), modify tool input, inject context. Exit code 2 blocks the tool. JSON output allows finer control.
+**Use cases**:
+- Block dangerous shell commands (`rm -rf`, `curl` to unknown hosts)
+- Auto-approve safe commands (`npm test`, `git status`)
+- Modify tool input (sanitize paths, add safety flags)
+- Inject environment context ("current branch: main, CI is red")
+- Defend against prompt injection in tool calls
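A minimal sketch of the kind of check a `command`-style PreToolUse hook could perform, assuming the hook has already parsed its input into an object. The `PreToolUseInput` shape, the `DENY_PATTERNS` list, and `evaluatePreToolUse` are hypothetical illustrations, not the Claude Code hook schema; only the exit-code meanings (0 allows, 2 blocks) come from the note itself. The `curl` rule is simplified to block every `curl` call rather than only unknown hosts.

```typescript
// Hypothetical input shape; the real hook payload may differ.
interface PreToolUseInput {
  toolName: string;
  command: string;
}

interface HookResult {
  exitCode: 0 | 2; // 0 = allow, 2 = block
  stderr: string;  // returned to the model when the call is blocked
}

// Illustrative deny patterns only; a real policy would be more careful.
const DENY_PATTERNS: RegExp[] = [/\brm\s+-rf\b/, /\bcurl\b/];

function evaluatePreToolUse(input: PreToolUseInput): HookResult {
  // Only shell commands are inspected in this sketch.
  if (input.toolName !== "bash") {
    return { exitCode: 0, stderr: "" };
  }
  for (const pattern of DENY_PATTERNS) {
    if (pattern.test(input.command)) {
      return { exitCode: 2, stderr: "Blocked by policy: " + pattern.source };
    }
  }
  return { exitCode: 0, stderr: "" };
}
```

A wrapper script would read the tool call, run this function, write `stderr`, and exit with `exitCode`.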
+### PostToolUse
+Fires after tool succeeds. Can: audit output, auto-format files, run tests asynchronously, replace tool output (redact secrets), inject context. Cannot block (tool already ran).
+**Use cases**:
+- Run linter after every `Write`/`Edit`
+- Redact secrets from tool output before model sees them
+- Log all file modifications for audit trail
+- Trigger async test suite after code changes
+### Stop / SubagentStop
+Fires when agent finishes. Can: block stopping, re-invoke agent with feedback. Exit code 2 or JSON `decision: "block"` prevents stop.
+**Use cases**:
+- "Tests must pass before you stop"
+- "All lint errors must be resolved"
+- "Task list must be fully checked off"
+- "Build must succeed before claiming completion"
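The Stop use cases above can be sketched as a single gate function, assuming the harness has already collected test, lint, and task status. `CompletionState` and `evaluateStop` are hypothetical names; only the `decision: "block"` value is taken from the note, and the `"allow"` variant is an assumption added to make the return type explicit.

```typescript
// Hypothetical summary of the checks a Stop hook might run.
interface CompletionState {
  testsPassed: boolean;
  lintErrors: number;
  openTasks: number;
}

type StopDecision =
  | { decision: "block"; reason: string }
  | { decision: "allow" }; // assumption: not named in the note

function evaluateStop(state: CompletionState): StopDecision {
  const problems: string[] = [];
  if (!state.testsPassed) problems.push("tests are failing");
  if (state.lintErrors > 0) problems.push(state.lintErrors + " lint error(s) remain");
  if (state.openTasks > 0) problems.push(state.openTasks + " task(s) unchecked");

  if (problems.length > 0) {
    // The reason is what the agent would see when it is re-invoked.
    return { decision: "block", reason: problems.join("; ") };
  }
  return { decision: "allow" };
}
```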
+### PermissionRequest
+Fires when permission dialog would appear. Can: programmatically allow/deny, modify input, update permission rules. Replaces user-facing approval dialogs.
+**Use cases**:
+- Auto-approve based on environment (CI vs local)
+- Implement time-based policies ("no deploys after 5 PM")
+- Apply team-wide permission rules dynamically
+## Five Hook Types
+| Type | Mechanism | Blocking | Use |
+|---|---|---|---|
+| `command` | Shell script, exit codes + JSON stdout | Yes (exit 2) | Deterministic checks, validation |
+| `http` | POST to URL, JSON response | Yes (2xx with block) | Webhook integrations, external services |
+| `mcp_tool` | Call MCP server tool | Depends on tool | Reuse existing MCP infrastructure |
+| `prompt` | Single-turn LLM evaluation | Yes (JSON ok:false) | Semantic checks, policy evaluation |
+| `agent` | Multi-turn subagent with tools | Yes (JSON ok:false) | Complex verification needing codebase access |
+## Integration with Our Harness
+Current state: `extensions/harness-*.ts` provides layer-level hooks (before/after each pipeline stage). Missing: tool-level hooks with deterministic exit-code semantics.
+**Proposed**: Add a `lib/harness-hooks.ts` module that:
+1. Registers hook events at tool lifecycle boundaries (PreToolUse, PostToolUse, etc.)
+2. Supports command-based and prompt-based hook types
+3. Uses exit-code semantics for deterministic allow/deny
+4. Integrates with existing layer-level extension hooks (they fire before/after tool-level hooks)
+5. Hooks configured in `.pi/harness/hooks.json` with the same scoping model (project, user, managed)
+## First Principles
+- **Hooks determine correctness, not prompts.** If a rule must never be broken, use a hook. If a rule should usually be followed, use prompts.
+- **Exit codes are the contract.** 0 = allow, 2 = deny (feed stderr to model), other = non-blocking error.
+- **Hooks are external to the model.** They don't consume context. They can't be ignored, forgotten, or bypassed through clever prompting.
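The exit-code contract stated above can be written down as a small interpreter on the harness side. This is a sketch under the assumption that the harness has the hook's exit code and stderr as plain values; `HookOutcome` and `interpretExit` are hypothetical names, not part of any existing module.

```typescript
type HookOutcome =
  | { kind: "allow" }
  | { kind: "deny"; feedback: string }
  | { kind: "non-blocking-error"; detail: string };

function interpretExit(exitCode: number, stderr: string): HookOutcome {
  if (exitCode === 0) return { kind: "allow" };
  // Exit code 2 denies and feeds stderr back to the model.
  if (exitCode === 2) return { kind: "deny", feedback: stderr };
  // Any other code is an error in the hook itself and does not block the tool.
  return { kind: "non-blocking-error", detail: "exit " + exitCode + ": " + stderr };
}
```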

package/vault/wiki/concepts/mcp-tool-routing.md ADDED

@@ -0,0 +1,102 @@
+---
+type: concept
+status: developing
+created: 2026-04-30
+updated: 2026-05-01
+tags:
+  - mcp
+  - agent-tools
+  - routing
+  - architecture
+  - typescript-execution-layer
+related:
+  - "[[agent-search-enforcement]]"
+  - "[[ck-tool]]"
+  - "[[Research: semantic code search tools]]"
+  - "[[ts-execution-layer]]"
+  - "[[harness-implementation-plan]]"
+title: "MCP tool routing"
+---
+# MCP tool routing
+Using the Model Context Protocol (MCP) to register semantic code search as a first-class agent tool, then routing code-exploration queries through it instead of generic shell tools.
+## MCP Architecture for Code Search
+```
+┌─────────────────┐     MCP Protocol      ┌──────────────────┐
+│  AI Agent       │ ◄──────────────────► │  ck MCP Server    │
+│  (Claude Code,  │   tools/list          │  (ck --serve)     │
+│   Cursor, etc.) │   tools/call          │                   │
+│                 │   resources/read      │  ck_search()      │
+│  Native Tools:  │                       │  ck_get()         │
+│  - bash         │                       │  ck_info()        │
+│  - read         │                       │  ck_reindex()     │
+│  - write        │                       │                   │
+└─────────────────┘                       └──────────────────┘
+```
+## ck MCP Tools
+| Tool | Description | Parameters |
+|------|-------------|------------|
+| `ck_search` | Semantic/hybrid search | query, path, mode (sem/hybrid/regex), limit, threshold |
+| `ck_get` | Get file content with context | file_path, start_line, end_line |
+| `ck_info` | Get index statistics | path |
+| `ck_reindex` | Force re-index | path, model |
+## Registration
+```bash
+# Claude Code
+claude mcp add ck-search -s user -- ck --serve
+# Claude Desktop (claude_desktop_config.json)
+{
+  "mcpServers": {
+    "ck-search": {
+      "command": "ck",
+      "args": ["--serve"]
+    }
+  }
+}
+# Cursor (.cursor/mcp.json)
+{
+  "mcpServers": {
+    "ck-search": {
+      "command": "ck",
+      "args": ["--serve"]
+    }
+  }
+}
+```
+## Routing Logic
+The agent decides which tool to use. MCP tools appear alongside native tools. To influence routing:
+1. **Tool descriptions matter**: The MCP tool description is what the agent sees. Make it specific:
+   ```
+   "ck_search: Semantic code search using embeddings. Use for conceptual
+    queries like 'error handling', 'authentication flow', 'retry logic'.
+    For exact string matching, use grep instead."
+   ```
+2. **System prompt priority**: Tell the agent to prefer MCP tools for code exploration.
+3. **Naming conventions**: Name tools intuitively. `ck_search` is clearer than `tool_1`.
+## Limitations
+- **No priority/weight system in MCP**: All tools are equal. No way to mark a tool as "preferred."
+- **Agent may still choose bash**: If bash grep works, inertia favors it.
+- **Tool discovery overhead**: Agent must query `tools/list` to discover MCP tools. Some agents cache this.
+- **No "replace native tool" mechanism**: MCP tools are additive, not substitutive. Can't disable bash grep.
+## Alternatives to MCP Routing
+- **Custom agent framework**: Build your own tool router that intercepts all tool calls and rewrites them.
+- **Proxy MCP server**: An MCP server that wraps both native tools and ck, making routing decisions centrally.
+- **Shell function aliases**: `function grep() { ck --hybrid "$@" || command grep "$@"; }` — simpler but less controlled.
+- **[[ts-execution-layer|TypeScript Execution Layer]]**: Replace MCP tool routing entirely. Instead of routing individual tool calls, expose all tools as a typed TypeScript API. Agent writes TS code; sandbox executes; tool calls dispatch via RPC. 3-4x context reduction and ~20% higher multi-tool success rate vs flat tool calling. Validated by CodeAct, Cloudflare Code Mode, Executor.
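For the proxy-server alternative, the central routing decision could look something like the sketch below. The heuristic is an assumption made for illustration (quoted strings, regex metacharacters, and single identifiers go to exact matching; everything else goes to semantic search), and `chooseSearchTool` is a hypothetical name. It mirrors the split suggested in the tool description above, not any behavior of `ck` itself.

```typescript
type SearchRoute = "ck_search" | "grep";

// Hypothetical heuristic: decide whether a query looks like an exact-match
// lookup (grep) or a conceptual question (semantic search).
function chooseSearchTool(query: string): SearchRoute {
  const trimmed = query.trim();
  const isQuoted = /^(["']).*\1$/.test(trimmed);
  const hasRegexMeta = /[\\^$*+?()[\]{}|]/.test(trimmed);
  const isSingleIdentifier = /^[A-Za-z_][A-Za-z0-9_.]*$/.test(trimmed);
  return isQuoted || hasRegexMeta || isSingleIdentifier ? "grep" : "ck_search";
}
```

A deterministic rule like this keeps the routing step free of LLM calls, at the cost of misrouting queries that do not fit the heuristic.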

package/vault/wiki/concepts/memory-system-of-record-vs-ephemeral-cache.md ADDED

@@ -0,0 +1,47 @@
+---
+type: concept
+title: "Memory System-of-Record vs Ephemeral Cache"
+aliases: ["memory layering", "canonical memory vs fast memory"]
+created: 2026-05-05
+updated: 2026-05-05
+tags: [concept, memory, harness, architecture]
+status: developing
+related:
+  - "[[persistent-memory]]"
+  - "[[adr-009]]"
+  - "[[lifecycle-hooks]]"
+  - "[[Research: claude-mem over Obsidian for Harness Layer]]"
+sources:
+  - "[[adr-009]]"
+  - "[[persistent-memory]]"
+  - "[[codex-harness-innovations]]"
+---
+# Memory System-of-Record vs Ephemeral Cache
+## Definition
+Split agent memory into two layers:
+- **System-of-record memory**: durable, auditable, human-reviewable, citation-ready.
+- **Ephemeral cache memory**: fast, local, convenience-oriented, non-authoritative.
+## Why This Pattern
+- Harness decisions need traceability and contradiction handling.
+- Fast memory helps turn-level latency and continuity.
+- Mixing both into one store causes drift: quick notes get mistaken for validated decisions.
+## Harness Mapping
+| Layer | Role | Example in this repo |
+|---|---|---|
+| System-of-record | Canonical truth | `vault/wiki/` with `index.md`, `log.md`, `hot.md` |
+| Ephemeral cache | Speed and recall hints | Optional local auto-memory tool |
+## Guardrails
+- Any decision, architecture change, or policy update must be written to wiki.
+- Ephemeral memory can suggest; wiki must confirm.
+- When cache conflicts with wiki, wiki wins.
+- Completion hooks should fail tasks that changed architecture without a wiki write.
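The guardrails can be sketched as two small functions: a completion gate and a conflict resolver. `TaskResult`, `completionGate`, and `resolveFact` are hypothetical names, and the flags are assumed to be supplied by the harness; nothing here reflects an existing implementation.

```typescript
// Hypothetical task summary produced by the harness.
interface TaskResult {
  changedArchitecture: boolean;
  wikiPagesWritten: string[]; // e.g. paths under vault/wiki/
}

function completionGate(result: TaskResult): { ok: boolean; reason?: string } {
  if (result.changedArchitecture && result.wikiPagesWritten.length === 0) {
    return { ok: false, reason: "architecture changed without a wiki write" };
  }
  return { ok: true };
}

interface ResolvedFact {
  value: string;
  authoritative: boolean; // true only when the value comes from the wiki
}

// Wiki wins on conflict; a cache-only value is returned as a suggestion.
function resolveFact(wiki?: string, cache?: string): ResolvedFact | undefined {
  if (wiki !== undefined) return { value: wiki, authoritative: true };
  if (cache !== undefined) return { value: cache, authoritative: false };
  return undefined;
}
```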
+## Tradeoff
+- Pure wiki: higher reliability, more manual writing.
+- Pure auto-memory: lower friction, weaker audit/provenance.
+- Hybrid: best practical balance for agentic harnesses.

package/vault/wiki/concepts/meta-agent-context-pruning.md ADDED

@@ -0,0 +1,151 @@
+---
+aliases: ["meta-agent pruning", "context drift meta-agent", "stuck-agent recovery"]
+type: concept
+title: "Meta-Agent Context Pruning"
+created: 2026-04-30
+status: developing
+tags:
+  - concept
+  - meta-agent
+  - context-pruning
+  - agent-reliability
+  - harness-design
+related:
+  - "[[Research: Meta-Agent Context Drift Detection]]"
+  - "[[context-drift-in-agents]]"
+  - "[[agent-loop-detection-patterns]]"
+  - "[[guardian-agent-pattern]]"
+  - "[[ironclaw-drift-monitor]]"
+  - "[[harness-configuration-layers]]"
+  - "[[grounding-checkpoints]]"
+updated: 2026-05-02
+---
+# Meta-Agent Context Pruning
+A proposed system: a separate observer (meta-agent) monitors the primary coding agent for stuck behavior, detects context drift, prunes irrelevant history from the context window, and restarts the agent with clean context. This is a **novel synthesis** — each component exists independently in literature and practice, but the full pipeline (detect → identify dead-ends → prune → restart) has not been published as a single system.
+## Architecture
+```
+┌─────────────────────────────────────────────────┐
+│               META-AGENT (Observer)               │
+│                                                   │
+│  ┌──────────┐   ┌──────────┐   ┌──────────────┐  │
+│  │ DETECT   │ → │IDENTIFY  │ → │  PRUNE +     │  │
+│  │ stuck    │   │ dead-end │   │  RESTART     │  │
+│  │ pattern  │   │ entries  │   │              │  │
+│  └──────────┘   └──────────┘   └──────────────┘  │
+│       ↑                               │           │
+│       │ monitors                      │ injects   │
+│       │                               ↓           │
+│  ┌──────────────────────────────────────────┐    │
+│  │        PRIMARY AGENT (Coding Agent)       │    │
+│  │  tool calls → errors → retries → context  │    │
+│  │  fills with noise → gets more lost       │    │
+│  └──────────────────────────────────────────┘    │
+└─────────────────────────────────────────────────┘
+```
+## Pipeline
+### Phase 1: Detection
+Rule-based pattern matching on tool call history. Zero LLM overhead.
+| Pattern | Method | Threshold |
+|---------|--------|-----------|
+| Repetition | Hash(tool + args), count in window | 3+ in 10 calls |
+| Failure spiral | Consecutive error count | 4+ |
+| Tool cycling | Sequence pattern A-B-A-B-A-B | 6 calls |
+| Silence drift | Iterations since last text response | 15+ |
+| Rework churn | Same file path written repeatedly | 3+ |
+| Excessive searching | ls/find/grep calls without code edits | 5+ |
+Optionally: LLM-based semantic drift check every N steps for higher precision on nuanced stuckness.
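Three of the rule-based detectors in the table (repetition, failure spiral, tool cycling) can be sketched as follows, using the thresholds listed above. The `ToolCall` shape and `detectStuck` are hypothetical; a plain string key stands in for `Hash(tool + args)`, and the failure spiral is measured at the end of the history, which is an assumption about how "consecutive" is meant.

```typescript
interface ToolCall {
  tool: string;
  args: string; // hypothetical: arguments serialized to a string
  isError: boolean;
}

type StuckPattern = "repetition" | "failure-spiral" | "tool-cycling";

function detectStuck(history: ToolCall[]): StuckPattern[] {
  const found: StuckPattern[] = [];

  // Repetition: the same tool+args appears 3+ times in the last 10 calls.
  const counts: Record<string, number> = {};
  for (const call of history.slice(-10)) {
    const key = call.tool + "\u0000" + call.args;
    counts[key] = (counts[key] || 0) + 1;
  }
  if (Object.keys(counts).some((k) => counts[k] >= 3)) found.push("repetition");

  // Failure spiral: 4+ consecutive errors at the end of the history.
  let trailingErrors = 0;
  for (let i = history.length - 1; i >= 0 && history[i].isError; i--) {
    trailingErrors++;
  }
  if (trailingErrors >= 4) found.push("failure-spiral");

  // Tool cycling: the last 6 calls alternate A-B-A-B-A-B with A != B.
  const last6 = history.slice(-6).map((c) => c.tool);
  const alternates = last6.every((t, i) => t === last6[i % 2]);
  if (last6.length === 6 && last6[0] !== last6[1] && alternates) {
    found.push("tool-cycling");
  }

  return found;
}
```

Everything here is counters and string comparison, which is why the overhead table can list rule-based detection at zero tokens.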
+### Phase 2: Identify Dead-End Entries
+Classify each context entry as keep or prune:
+**Keep**: Error led to different approach on next attempt. Output contained new information. User explicitly requested. Established a constraint.
+**Prune**: Identical call returned same result. Pure noise (boilerplate errors). Agent forgot about it entirely.
+Conservative default: when uncertain, keep. False-positive pruning is worse than false-negative (keeping noise).
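The keep/prune rules and the conservative default could be encoded as below, assuming each context entry has already been annotated with boolean signals. The `ContextEntry` fields and `classifyEntry` are hypothetical; how those signals are computed is left open, and the "agent forgot about it entirely" criterion is omitted because it has no obvious mechanical test.

```typescript
// Hypothetical annotations attached to one context entry.
interface ContextEntry {
  ledToNewApproach: boolean;
  containsNewInformation: boolean;
  userRequested: boolean;
  establishedConstraint: boolean;
  duplicateOfEarlierResult: boolean;
  boilerplateError: boolean;
}

function classifyEntry(entry: ContextEntry): "keep" | "prune" {
  const keepSignal =
    entry.ledToNewApproach ||
    entry.containsNewInformation ||
    entry.userRequested ||
    entry.establishedConstraint;
  if (keepSignal) return "keep";

  const pruneSignal = entry.duplicateOfEarlierResult || entry.boilerplateError;
  // Conservative default: with no signal either way, keep the entry.
  return pruneSignal ? "prune" : "keep";
}
```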
+### Phase 3: Prune + Restart
+Two implementation strategies:
+**Strategy A — In-place editing** (if API supports message deletion from middle of history): Keep original task, key decisions, constraints discovered, last successful state. Remove dead-end entries between them. Inject correction message.
+**Strategy B — Session restart** (portable, always works): Terminate current session. Start new session with: original system prompt + task + pruned history summary + correction message.
+### Phase 4: Correction Injection
+Escalation model:
+1. **Soft**: "You've called [tool] with same args 3 times. Summarize what you know and try a different approach."
+2. **Strong**: "You're stuck. Here's a summary of what you've accomplished. Start fresh from here." (includes pruned context summary)
+3. **Forced restart**: Terminate, prune, restart with clean context.
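The escalation ladder can be sketched as a mapping from intervention count to level. Tying the level to the number of prior interventions is an assumption made for illustration; `escalationFor` and `correctionMessage` are hypothetical names, and the restart message text is invented for the sketch, while the soft and strong messages are adapted from the list above.

```typescript
type Escalation = "soft" | "strong" | "forced-restart";

// Assumption: the Nth intervention in a session selects the level.
function escalationFor(interventionCount: number): Escalation {
  if (interventionCount <= 1) return "soft";
  if (interventionCount === 2) return "strong";
  return "forced-restart";
}

function correctionMessage(level: Escalation, tool: string, summary: string): string {
  if (level === "soft") {
    return (
      "You've called " + tool + " with the same args 3 times. " +
      "Summarize what you know and try a different approach."
    );
  }
  if (level === "strong") {
    return (
      "You're stuck. Here's a summary of what you've accomplished. " +
      "Start fresh from here.\n" + summary
    );
  }
  // Forced restart: the message opens the new session (wording is illustrative).
  return "Session restarted with pruned context.\n" + summary;
}
```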
+## Feasibility
+**High**. Each component is individually validated:
+- Detection: Production-proven (ironclaw DriftMonitor, LangSight loop detection)
+- Pruning: Conceptually similar to context compaction (Anthropic Claude Code) and code-level pruning (SWE-Pruner)
+- Restart: Standard pattern in multi-agent systems (sub-agent isolation)
+**Novelty**: The composition. No existing system combines all three phases.
+## Overhead
+| Component | Tokens | Notes |
+|-----------|--------|-------|
+| Rule-based detection | 0 | Hash comparison + counters |
+| LLM-based detection (optional) | ~500/check | Every 10-15 steps |
+| Pruning | 0 | Metadata operation |
+| Correction injection | ~150/injection | Max 3 injections |
+| Session restart | 1 API call | Cache miss cost |
+| **Total** | **~1,500-3,000** | Per 50-step session |
+**Net savings**: 5-10x token reduction when stuck sessions are common. Breakeven after 1-2 interventions.
+## Edge Cases
+- **Polling agents**: Whitelist polling tools. Use time-based windows, not count-based.
+- **Retry-heavy workflows**: Increase thresholds (5-7). Some tools legitimately fail transiently.
+- **Exploratory searching**: Distinguish by whether code edits follow the searches.
+- **Mid-reasoning interruption**: Pruning while the model is mid-chain-of-thought may lose coherence. Needs testing.
+## Harness Integration
+Proposed as **Layer 2.5 — Runtime Drift Monitor** in the ultimate-pi harness:
+```
+L1 (Spec Hardening) → L2 (Structured Planning) → L2.5 (Drift Monitor) → L3 (Execution + Grounding)
+```
+Components:
+- `lib/harness-drift-monitor.ts` — Detection engine, pruning logic, correction injection
+- `extensions/harness-drift-monitor.ts` — Hooks into before_llm_call / after_tool_call
+- `.pi/harness/drift-monitor.json` — Config: thresholds, escalation, whitelists, model profile
+Model-adaptive: Rule-based every step for GPT, rule-based every 10 steps for Gemini, LLM-based every 15 steps for Opus.
+## Open Questions
+- Can context be pruned in-place or must it always restart? API support varies.
+- What is the minimum context that must survive pruning?
+- Does pruning break chain-of-thought coherence?
+- How does pruning interact with prompt caching (cache invalidation)?
+- Can a small model (Haiku/Flash) serve as the meta-agent detector?
+- Does the meta-agent itself need drift monitoring? (Infinite regress risk)
+## See Also
+- [[context-drift-in-agents]] — The problem this solves
+- [[agent-loop-detection-patterns]] — Detection code and patterns
+- [[guardian-agent-pattern]] — Complementary proactive approach
+- [[Research: Meta-Agent Context Drift Detection]] — Full research synthesis
+- [[harness-configuration-layers]] — Where this fits in the four-layer harness model

package/vault/wiki/concepts/model-adaptive-harness.md ADDED

@@ -0,0 +1,122 @@
+---
+type: concept
+title: "Model-Adaptive Agent Harness"
+aliases: ["adaptive harness", "model-aware harness"]
+created: 2026-04-30
+updated: 2026-05-01
+tags: [concept, agents, harness-design, model-awareness]
+status: redesign
+related:
+  - "[[provider-native-prompting]]"
+  - "[[harness-configuration-layers]]"
+  - "[[Research: Model-Specific Prompting Guides]]"
+  - "[[forgecode-gpt5-agent-improvements]]"
+  - "[[harness-implementation-plan]]"
+  - "[[codex-harness-innovations]]"
+  - "[[codex-open-source-agent-2026]]"
+  - "[[openai-prompt-guidance]]"
+  - "[[anthropic-prompt-best-practices]]"
+  - "[[gemini-3-prompting-guide]]"
+sources:
+  - "[[openai-prompt-guidance]]"
+  - "[[anthropic-prompt-best-practices]]"
+  - "[[gemini-3-prompting-guide]]"
+  - "[[forgecode-gpt5-agent-improvements]]"
+---
+# Model-Adaptive Agent Harness
+An agent harness that generates **provider-native prompts** optimized for each model's official prompting conventions — not a single canonical format with strictness relaxations.
+> [!important] REDESIGN: May 2026 — The original design ("write once for strictest, relax for forgiving") has been replaced. Official provider guidance shows that models need fundamentally different prompt formats, not just different strictness levels. See [[provider-native-prompting]] and [[Research: Model-Specific Prompting Guides]].
+## Why This Exists
+Forge Code demonstrated that GPT 5.4 and Opus 4.6 reached identical benchmark scores (81.8% on TermBench 2.0) only after the harness was adapted to each model. This proved that adaptation matters.
+But Forge Code's approach was empirical: observe failure modes, then compensate. Each provider now publishes OFFICIAL guidance on how to prompt their models correctly. These guides should be the PRIMARY source for harness adaptations.
+## Design Principle (v2 — May 2026)
+**Generate provider-native prompts from a provider-agnostic semantic specification. Never generate a single canonical prompt and relax it.**
+The harness's internal representation is a semantic spec (WHAT must be communicated). The prompt renderer generates actual prompt text according to the target model's provider conventions (HOW it's communicated).
+See [[provider-native-prompting]] for the full architecture and renderer design.
+## Provider Profiles (Official Guidance)
+### OpenAI GPT-5.x
+- **Structure**: XML-like sections, constraints-first ordering
+- **Density**: Outcome-first, concise. Describe destination, not journey
+- **Verification**: Pre-flight/post-flight action safety blocks
+- **Thinking**: reasoning_effort parameter (none/low/medium/high/xhigh)
+- **Tools**: apply_patch native, shell_command, update_plan
+- **Key rule**: Contradictory instructions actively harm GPT-5+ reasoning
+- **Source**: [[openai-prompt-guidance]]
+### Anthropic Claude 4.x
+- **Structure**: XML tags, long content at TOP, query at BOTTOM
+- **Density**: General instructions over prescriptive steps
+- **Verification**: Self-check at end, role setting critical
+- **Thinking**: Adaptive thinking with effort parameter (max/xhigh/high/medium/low)
+- **Tools**: Explicit tool direction, text_editor, bash
+- **Key rule**: Be explicit; do not rely on the model inferring intent from vague prompts
+- **Source**: [[anthropic-prompt-best-practices]]
+### Google Gemini 3
+- **Structure**: Plain text, constraints at END (not beginning)
+- **Density**: Concise by default, must explicitly steer for verbosity
+- **Verification**: Split-step: verify capability → then generate
+- **Thinking**: thinking level LOW/HIGH, system instructions
+- **Temperature**: **1.0 MANDATORY** — never change
+- **Key rule**: Persona definitions are binding; model treats them seriously
+- **Source**: [[gemini-3-prompting-guide]]
+> [!gap] Empirical failure mode data (from Forge Code) should be layered ON TOP of official guidance, not used as the foundation. The old design had this order reversed.
+## What Never Adapts
+Core invariants across all profiles — enforced by pipeline structure, not model-specific instructions:
+- Pipeline phase ordering and layer requirements
+- Quality standards and source attribution requirements
+- Confidence labeling
+- Budget constraints (max rounds, max tokens, max pages)
+- Verification requirements (what must be checked, even if how varies)
+- Read-first/write-after wiki contract
+- No-skip rule (verification is mandatory)
+## Application to Harness Pipeline
+Each pipeline layer generates a fragment of the semantic spec. The renderer produces the actual prompt:
+- **L1 Spec Hardening**: Task definition, acceptance criteria → rendered per provider conventions
+- **L2 Structured Planning**: Task DAG, dependencies → constraint ordering per provider
+- **L2.5 Drift Monitor**: Detection strategy → split-step (Gemini), self-check (Claude), loop (GPT)
+- **L3 Grounding Checkpoints**: Verification steps → grounding mechanism per provider
+- **L4 Adversarial Verification**: Attack vectors → verification workflow per provider
+- **L5-L8**: Observability, memory, orchestration, query → rendered per provider
+## Implementation
+New module: **Prompt Renderer** (Phase P22b in [[harness-implementation-plan]])
+```
+Semantic Spec → Prompt Renderer → Provider-Native Prompt
+                ├── openai-renderer
+                ├── anthropic-renderer
+                └── google-renderer
+```
+- `lib/renderers/openai.ts` — XML-like sections, constraints-first, preambles
+- `lib/renderers/anthropic.ts` — XML tags, long-content-top, role setting
+- `lib/renderers/google.ts` — Plain text, constraints-last, grounding statements
+- `lib/renderers/fallback.ts` — Conservative markdown for unknown models
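A minimal sketch of the renderer idea, assuming a semantic spec with just a task and a list of constraints. `SemanticSpec`, `Provider`, `renderPrompt`, and the tag names are hypothetical; the only parts taken from the provider profiles are the ordering rules (constraints first for OpenAI, query at the bottom for Anthropic, constraints last in plain text for Google, conservative markdown as the fallback).

```typescript
// Hypothetical, heavily reduced semantic spec.
interface SemanticSpec {
  task: string;
  constraints: string[];
}

type Provider = "openai" | "anthropic" | "google" | "unknown";

function renderPrompt(spec: SemanticSpec, provider: Provider): string {
  const list = spec.constraints.map((c) => "- " + c).join("\n");

  if (provider === "openai") {
    // XML-like sections, constraints first.
    return "<constraints>\n" + list + "\n</constraints>\n<task>\n" + spec.task + "\n</task>";
  }
  if (provider === "anthropic") {
    // XML tags for supporting content at the top, query at the bottom.
    return "<constraints>\n" + list + "\n</constraints>\n\n" + spec.task;
  }
  if (provider === "google") {
    // Plain text, constraints at the end.
    return spec.task + "\n\nConstraints:\n" + list;
  }
  // Conservative markdown for unknown models.
  return "## Task\n" + spec.task + "\n\n## Constraints\n" + list;
}
```

The same spec yields differently ordered prompts per provider, which is the point of rendering from a semantic representation instead of relaxing one canonical prompt.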
+## Sources
+- [[openai-prompt-guidance]] — OpenAI official, 2026
+- [[anthropic-prompt-best-practices]] — Anthropic official, 2026
+- [[gemini-3-prompting-guide]] — Google Cloud official, 2026-04-29
+- [[forgecode-gpt5-agent-improvements]] — Forge Code empirical, 2026

package/vault/wiki/concepts/model-routing-agents.md ADDED

@@ -0,0 +1,101 @@
+---
+type: concept
+title: "Model Routing Agents"
+created: 2026-04-30
+updated: 2026-04-30
+tags:
+  - agent-architecture
+  - token-reduction
+  - cost-optimization
+related:
+  - "[[wozcode]]"
+  - "[[agentic-harness]]"
+  - "[[research-wozcode-token-reduction]]"
+  - "[[wiki-query-interface]]"
+status: developing
+---
+# Model Routing Agents
+Model routing is an agent architecture pattern where different operation types are dispatched to different AI models based on capability requirements and cost. Read-only exploration work goes to the cheapest capable model; code generation stays on the frontier model.
+## WOZCODE's Pattern
+WOZCODE uses a two-agent split (Source: [[wozcode]]):
+| Agent | Model | Cost vs Opus | Operations |
+|-------|-------|-------------|------------|
+| `woz:code` | User's frontier (Opus/Sonnet) | 1× | Write/edit code, full tool access |
+| `woz:explore` | Haiku | ~15× cheaper | Read-only: search, explore, summarize |
+~40% of coding work is exploration → automatically routed to Haiku → ~70% savings on exploration calls.
+## Why This Works
+- **Exploration is read-only**: No risk of Haiku writing bad code because it only reads
+- **Exploration is high-volume**: Finding the right files, understanding architecture, searching for patterns — these are many small calls
+- **Exploration is low-creativity**: "Find the file that defines X" doesn't need frontier reasoning
+- **Summaries keep context lean**: Haiku returns summaries, main agent stays focused
+## Our Harness Integration Points
+### L8: Wiki Query Interface
+Current: LLM-native search via claude-obsidian skills.
+Change: Route wiki queries through Haiku when:
+- Query is read-only knowledge retrieval
+- Query is exploratory (not code generation)
+- Query result is a summary, not executable code
+### L2: Structured Planning
+Current: Task DAG generation uses the main model.
+Potential: Route plan review/refinement to Haiku. Frontier model generates the initial plan; Haiku validates, cross-references specs, and checks for missing dependencies.
+### NEW: Model Router Component
+A new cross-cutting component that sits between the Archon orchestrator (L7) and tool invocations:
+```
+User Request → L1 (Spec) → L2 (Plan) → L3 (Execute) → L4 (Critics) → L5 (Observe) → L6 (Memory)
+                                        ↓
+                              [Model Router]
+                              /              \
+                        woz:code         woz:explore
+                      (Frontier)         (Haiku)
+```
+The router decides per-operation:
+- **Route to Haiku**: Read tool, wiki query, search tool, summarize operations
+- **Route to Frontier**: Edit tool, write tool, bash tool, code generation
+- **Route to Frontier (always)**: Adversarial verification (L4), spec hardening (L1)
+## Router Decision Rules
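+```typescript
+// Supporting types (illustrative; not defined in the original note).
+type ModelTarget = 'haiku' | 'frontier';
+interface ToolOperation { tool: string; phase?: 'post-edit-verify' }
+interface RoutingRule {
+  tools: string[];
+  operation?: 'explore' | 'mutate' | 'critical';
+  when?: 'post-edit-verify';
+  model: ModelTarget;
+}
+interface ModelRouter {
+  route(operation: ToolOperation): ModelTarget;
+}
+// Default rules. Assuming first-match-wins evaluation, the conditional
+// post-edit rule must precede the general read rule to take effect.
+const DEFAULT_RULES: RoutingRule[] = [
+  { tools: ['read'], when: 'post-edit-verify', model: 'frontier' }, // verification reads stay on frontier
+  { tools: ['read', 'wiki-query', 'search', 'grep'], operation: 'explore', model: 'haiku' },
+  { tools: ['edit', 'write', 'bash'], operation: 'mutate', model: 'frontier' },
+  { tools: ['harden-spec', 'attack', 'verify'], operation: 'critical', model: 'frontier' },
+];
+```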
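One way the rules could be evaluated is a deterministic first-match scan, sketched below as a self-contained example. The type shapes, the `phase` field, the rule ordering, and the fall-back to the frontier model for unknown tools are all assumptions for illustration; the note does not specify an evaluation order.

```typescript
type Target = "haiku" | "frontier";

interface Operation {
  tool: string;
  phase?: string; // hypothetical: e.g. "post-edit-verify"
}

interface Rule {
  tools: string[];
  when?: string;
  model: Target;
}

// Conditional rules come first so they are not shadowed by general ones.
const RULES: Rule[] = [
  { tools: ["read"], when: "post-edit-verify", model: "frontier" },
  { tools: ["read", "wiki-query", "search", "grep"], model: "haiku" },
  { tools: ["edit", "write", "bash"], model: "frontier" },
  { tools: ["harden-spec", "attack", "verify"], model: "frontier" },
];

function route(op: Operation, rules: Rule[] = RULES): Target {
  for (const rule of rules) {
    const toolMatches = rule.tools.indexOf(op.tool) !== -1;
    const phaseMatches = rule.when === undefined || rule.when === op.phase;
    if (toolMatches && phaseMatches) return rule.model;
  }
  // Assumption: anything unrecognized goes to the more capable model.
  return "frontier";
}
```

Because the scan uses no model call, it matches the mitigation listed under Risks: the routing decision itself stays deterministic.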
+## Risks
+- **Haiku hallucination in summaries**: If Haiku misreads code, the summary is wrong, and the main agent acts on bad information. Mitigation: confidence scoring on summaries, critical reads always on frontier.
+- **Latency overhead**: Routing adds a decision step. Mitigation: deterministic routing rules, no AI-in-the-loop for the routing decision itself.
+- **Context coherence**: Summaries may lose detail that matters. Mitigation: progressive disclosure — Haiku returns L1 (signatures), main agent can request L2/L3 if needed.
+## Cost Model
+For a typical 5-subtask plan:
+- Without routing: all operations on frontier model
+- With routing: ~40% of operations on Haiku (15× cheaper)
+- Expected savings on exploration: ~70%
+- Expected overall savings: ~15-25% (consistent with WOZCODE's reported range)
+These are projections. Actual savings must be measured from API usage fields (Source: [[wozcode]] methodology).
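The projection is simple arithmetic and can be checked with a sketch like the one below. It assumes overall savings equal the share of total spend that is exploration multiplied by the savings on that share; `overallSavings` and `requiredCostShare` are hypothetical helper names. Under that assumption, 40% of spend at 70% savings gives 28%, so the 15-25% range corresponds to exploration being roughly 21-36% of spend, not 40%; operation count and cost share need not coincide.

```typescript
// Overall savings = (share of spend that is exploration) x (savings on that share).
function overallSavings(explorationCostShare: number, explorationSavings: number): number {
  return explorationCostShare * explorationSavings;
}

// Inverse: the cost share needed to reach a target overall saving.
function requiredCostShare(targetOverall: number, explorationSavings: number): number {
  return targetOverall / explorationSavings;
}

const upperBound = overallSavings(0.4, 0.7);  // about 0.28
const lowShare = requiredCostShare(0.15, 0.7); // about 0.214
const highShare = requiredCostShare(0.25, 0.7); // about 0.357
```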

package/vault/wiki/concepts/monorepo-architecture.md ADDED

@@ -0,0 +1,45 @@
+---
+type: concept
+status: developing
+tags:
+  - typescript
+  - monorepo
+  - turborepo
+  - architecture
+related:
+  - "[[ts-monorepo-koerselman]]"
+  - "[[Research: TypeScript Best Practices and Codebase Structure]]"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Monorepo Architecture (TypeScript)
+A monorepo stores multiple related packages (apps, libraries, services) in a single version-controlled repository. TypeScript monorepos add the complexity of shared types, build ordering, and module resolution across packages.
+## Key Tools
+- **Turborepo**: Build orchestration with caching. Defines task dependencies (`dependsOn`), inputs/outputs per task. Optional remote cloud cache.
+- **Nx**: Similar to Turborepo but with more integrated code generation and dependency graph visualization.
+- **pnpm workspaces**: Fast, disk-efficient package management with strict dependency isolation.
+## Internal Package Strategies
+### Built-Package (Recommended by Koerselman)
+Build TS → JS (with bundler). Point `main` to compiled output. Benefits: efficient caching, path aliases work, ESM output clean. Requires more config.
+### Internal-Packages (Source-only)
+Point `main` directly to TS source. Benefits: simple setup, live code updates. Downsides: no build caching, slower type-checking in consumers, path aliases conflict across packages.
+## ESM in Monorepos
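+- CJS **cannot** synchronously `require()` ESM in older Node.js versions, because ESM loading may be asynchronous; use dynamic `import()` from CJS instead (newer Node.js releases can `require()` ESM that has no top-level `await`)
+- Node's ESM resolver requires file extensions on relative imports (write `.js`, even when the source file is `.ts`); `moduleResolution: "bundler"` lets TypeScript accept extensionless imports when a bundler resolves them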
+- Bundlers eliminate the need for extensions by combining all code into single output files
+## IDE Integration
+Use `declarationMap: true` + `tsc --emitDeclarationOnly` to generate declaration map files (`.d.ts.map`). This enables go-to-definition from consuming packages back to the original TS source.
+## Deployment Isolation
+Tools like `isolate-package` extract a single package + its internal dependencies into a self-contained directory for deployment (solves Firebase monorepo issues).