npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516)

package/vault/wiki/modules/schema-orchestration.md ADDED

@@ -0,0 +1,68 @@
+---
+type: module
+title: Schema-Based Orchestration
+status: developing
+created: 2026-04-28
+updated: 2026-04-28
+tags: [harness, orchestration, archon, dag, layer-7]
+layer: "7"
+sources:
+  - "[[harness-implementation-plan]]"
+related:
+  - "[[agentic-harness]]"
+  - "[[persistent-memory]]"
+  - "[[structured-planning]]"
+---
+# Schema-Based Orchestration via Archon
+Layer 7 of the [[agentic-harness]]. Uses Archon's YAML workflow engine for DAG execution, loop nodes, human approval gates, worktree isolation, and run persistence. No custom orchestration code.
+## Architecture
+| Need | Archon provides | What we would build without it |
+|------|-----------------|----------------------|
+| DAG execution | YAML workflow nodes | Custom task graph executor |
+| Loop nodes | `loop: { until: CONDITION }` | Custom rework loop logic |
+| Human approval gates | `loop: { until: APPROVED, interactive: true }` | Custom approval UI |
+| Worktree isolation | Auto git worktree per run | Custom branch management |
+| Run persistence | SQLite/PostgreSQL | Custom state storage |
+| Parallel nodes | Concurrent independent nodes | Custom parallel dispatch |
+pi.dev extensions implement **intelligence**. Archon implements **orchestration**.
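The parallel-nodes row and the "wave tracking" in `lib/harness-orchestrator.ts` both rest on grouping a DAG into waves of nodes whose dependencies are already satisfied. A minimal TypeScript sketch, assuming a hypothetical `{ id, dependsOn }` node shape rather than Archon's actual YAML schema:

```typescript
// Hypothetical node shape; Archon's real workflow schema may differ.
interface WorkflowNode {
  id: string;
  dependsOn: string[];
}

// Group nodes into waves: each node in wave k depends only on nodes in earlier
// waves, so every node inside one wave can run concurrently.
function computeWaves(nodes: WorkflowNode[]): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = nodes.slice();
  const waves: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((n) => n.dependsOn.every((d) => done[d]));
    if (ready.length === 0) {
      // Nothing can start: a cycle, or a dependency on an unknown node id.
      throw new Error("Unresolvable dependencies: " + remaining.map((n) => n.id).join(", "));
    }
    waves.push(ready.map((n) => n.id));
    for (const n of ready) done[n.id] = true;
    remaining = remaining.filter((n) => !done[n.id]);
  }
  return waves;
}

// Example: two independent checks, then a review that needs both.
const waves = computeWaves([
  { id: "lint", dependsOn: [] },
  { id: "test", dependsOn: [] },
  { id: "review", dependsOn: ["lint", "test"] },
]);
console.log(JSON.stringify(waves)); // [["lint","test"],["review"]]
```

A wave that cannot make progress is reported as an error rather than silently skipped, which matches the idea that an unresolved dependency should block the run.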
+## Primary Workflow: harness-pipeline.yaml
+1. **harden-spec** → Spec hardening
+2. **resolve-ambiguities** → Loop until no blocking ambiguities
+3. **create-plan** → Plan from hardened spec
+4. **review-plan** → Adversarial review
+5. **approve-plan** → Loop until approved (interactive)
+6. **execute-plan** → Loop until all tasks complete (max 100 iterations)
+7. **capture-memory** → Store results via wiki-ingest skill
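Steps 2, 5, and 6 are loop nodes: they repeat until a condition holds or an iteration cap is hit. The sketch below models that control flow in plain TypeScript; `PipelineStep` and `runPipeline` are hypothetical names, not Archon's engine, and the interactive approval is reduced to a callback.

```typescript
// Hypothetical model of the pipeline above; not Archon's engine or YAML schema.
type StepResult = { done: boolean };

interface PipelineStep {
  name: string;
  run: (iteration: number) => StepResult; // one attempt of the step
  maxIterations: number;                  // 1 for plain nodes, >1 for loop nodes
}

type PipelineOutcome = { status: "completed" | "failed"; failedAt?: string; log: string[] };

// Runs steps in order; a loop step repeats until it reports done or hits its cap.
function runPipeline(steps: PipelineStep[]): PipelineOutcome {
  const log: string[] = [];
  for (const step of steps) {
    let finished = false;
    for (let i = 1; i <= step.maxIterations && !finished; i++) {
      const result = step.run(i);
      log.push(`${step.name}#${i}: ${result.done ? "done" : "again"}`);
      finished = result.done;
    }
    // Exhausting the cap maps to the `failed` terminal state.
    if (!finished) return { status: "failed", failedAt: step.name, log };
  }
  return { status: "completed", log };
}

// Example: ambiguity resolution needs two passes; execution finishes on iteration 5.
let openAmbiguities = 2;
const outcome = runPipeline([
  { name: "harden-spec", maxIterations: 1, run: () => ({ done: true }) },
  { name: "resolve-ambiguities", maxIterations: 3, run: () => ({ done: --openAmbiguities === 0 }) },
  { name: "execute-plan", maxIterations: 100, run: (i) => ({ done: i === 5 }) },
]);
console.log(outcome.status); // completed
```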
+## Terminal States
+| State | Meaning |
+|-------|---------|
+| `completed` | All control objectives passed |
+| `blocked` | Mandatory gate/dependency unresolved |
+| `replan_required` | Drift, failed critics, or spec change |
+| `cancelled` | Precondition not met |
+| `failed` | Retries/limits exhausted |
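One way to make these states explicit in harness code is a string union plus a resolver. The `RunSummary` fields and the precedence order below are assumptions for illustration; the table does not say which state wins when several conditions hold at once.

```typescript
// The five terminal states from the table.
type TerminalState = "completed" | "blocked" | "replan_required" | "cancelled" | "failed";

// Hypothetical run summary; field names are illustrative only.
interface RunSummary {
  preconditionsMet: boolean;
  limitsExhausted: boolean;   // retries or iteration limits used up
  unresolvedGate: boolean;    // a mandatory gate or dependency is still open
  driftOrSpecChange: boolean; // drift, failed critics, or a spec change
  objectivesPassed: boolean;  // all control objectives passed
}

// Assumed precedence: preconditions, then exhausted limits, then gates, then replanning.
function resolveTerminalState(r: RunSummary): TerminalState {
  if (!r.preconditionsMet) return "cancelled";
  if (r.limitsExhausted) return "failed";
  if (r.unresolvedGate) return "blocked";
  if (r.driftOrSpecChange) return "replan_required";
  // Objectives not passed with no other explanation is treated as a failure (assumption).
  return r.objectivesPassed ? "completed" : "failed";
}
```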
+## Extension Interface
+| Type | Name |
+|------|------|
+| Tool | `orchestrate-plan` |
+| Tool | `register-agent-capability` |
+| Command | `/harness-orchestration-status` |
+## Files
+- `lib/harness-orchestrator.ts` — Orchestrator class, schema validation, wave tracking
+- `extensions/harness-orchestrator.ts` — pi.dev extension registration
+- `.archon/workflows/harness-pipeline.yaml`
+- `.archon/workflows/harness-fix-issue.yaml`
+- `.archon/workflows/harness-quick-review.yaml`

package/vault/wiki/modules/skills.md ADDED

@@ -0,0 +1,27 @@
+---
+type: module
+path: ".pi/skills/"
+status: active
+language: markdown
+purpose: "Core capability plugins for the ultimate-pi agent."
+maintainer: "aryaniyaps"
+last_updated: "2026-04-28"
+linked_issues: []
+depends_on: ["lean-ctx"]
+used_by: ["pi"]
+tags: [module, skills]
+created: "2026-04-28"
+updated: "2026-04-28"
+title: "Agent Skills"
+---
+# Agent Skills
+## Description
+The `skills/` directory contains individual skill definitions that extend the capabilities of the agent. These are structured as Obsidian-flavored markdown documents that provide instructions and context routing.
+## Key Skills
+- `wiki`, `wiki-ingest`, `wiki-query`, `wiki-lint`
+- `lean-ctx` (core operations)
+- `caveman`
+- `compress`
+- `firecrawl`

package/vault/wiki/modules/spec-hardening.md ADDED

@@ -0,0 +1,58 @@
+---
+type: module
+title: Spec Hardening
+status: developing
+created: 2026-04-28
+updated: 2026-04-28
+tags: [harness, spec, layer-1, quality]
+layer: "1"
+sources:
+  - "[[harness-implementation-plan]]"
+related:
+  - "[[agentic-harness]]"
+  - "[[structured-planning]]"
+  - "[[adversarial-verification]]"
+---
+# Spec Hardening
+Layer 1 of the [[agentic-harness]]. Blocks execution until every underspecified component is resolved. Ambiguity is a bug — if you can't write a test for it, it's not specified.
+## Flow
+1. User request → `SpecHardener.harden()` → **HardenedSpec**
+2. Count blocking ambiguities → if > 0, loop back to user (max 3 retries)
+3. Store in `.pi/harness/specs/<id>.json`
+4. Emit `spec_hardened` → Layer 2
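The gate above amounts to a bounded loop. A sketch, assuming hypothetical `harden` and `askUser` callbacks in place of `SpecHardener.harden()` and the real round-trip to the user:

```typescript
// Minimal, hypothetical types; the full HardenedSpec has more fields.
interface AmbiguityFlag { question: string; severity: "blocking" | "warning" }
interface DraftSpec { intent_summary: string; ambiguity_flags: AmbiguityFlag[] }

type GateResult =
  | { status: "hardened"; spec: DraftSpec; rounds: number }
  | { status: "blocked"; open: AmbiguityFlag[]; rounds: number };

// Count blocking ambiguities; while any remain, ask the user and re-harden,
// giving up after `maxRetries` clarification rounds.
function hardeningGate(
  harden: (request: string, answers: string[]) => DraftSpec, // stands in for SpecHardener.harden()
  askUser: (open: AmbiguityFlag[]) => string,                // returns the user's clarification
  request: string,
  maxRetries = 3,
): GateResult {
  const answers: string[] = [];
  let spec = harden(request, answers);
  let rounds = 0;
  for (;;) {
    const blocking = spec.ambiguity_flags.filter((f) => f.severity === "blocking");
    if (blocking.length === 0) return { status: "hardened", spec, rounds };
    if (rounds >= maxRetries) return { status: "blocked", open: blocking, rounds };
    answers.push(askUser(blocking));
    rounds++;
    spec = harden(request, answers);
  }
}
```

Warning-severity flags do not hold the gate here; whether they are auto-resolved is left to configuration.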
+## HardenedSpec Data Contract
+| Field | Purpose |
+|-------|---------|
+| `intent_summary` | What the user actually wants |
+| `success_criteria` | Each must be testable |
+| `anti_criteria` | What the solution MUST NOT do |
+| `ambiguity_flags` | Blocking or warning severity |
+| `definition_of_done` | Single boolean expression |
+| `scope_boundary` | Explicit in/out of scope |
+| `constraints` | Technical or domain constraints |
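The contract can be written as a TypeScript interface with a check run before `spec_hardened` is emitted. Field types, the `scope_boundary` sub-fields, and the `checkSpec` helper are assumptions, not the contents of `lib/harness-spec.ts`:

```typescript
// TypeScript rendering of the HardenedSpec table; field types are assumptions.
interface HardenedSpec {
  intent_summary: string;
  success_criteria: string[];
  anti_criteria: string[];
  ambiguity_flags: { question: string; severity: "blocking" | "warning" }[];
  definition_of_done: string; // a single boolean expression, e.g. "tests_pass && lint_clean"
  scope_boundary: { in_scope: string[]; out_of_scope: string[] };
  constraints: string[];
}

// Returns a list of problems; an empty list means the spec may leave Layer 1.
function checkSpec(spec: HardenedSpec): string[] {
  const problems: string[] = [];
  if (spec.intent_summary.trim() === "") problems.push("intent_summary is empty");
  if (spec.success_criteria.length === 0) problems.push("no success_criteria");
  const blocking = spec.ambiguity_flags.filter((f) => f.severity === "blocking").length;
  if (blocking > 0) problems.push(`${blocking} blocking ambiguity flag(s) open`);
  if (spec.definition_of_done.trim() === "") problems.push("definition_of_done is empty");
  return problems;
}
```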
+## Extension Interface
+| Type | Name |
+|------|------|
+| Tool | `harden-spec` |
+| Tool | `resolve-ambiguity` |
+| Tool | `approve-spec` (human override) |
+| Command | `/harness-spec-status` |
+## Config
+```json
+{ "spec_hardening": { "max_ambiguity_retries": 3, "auto_resolve_warning": true } }
+```
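A sketch of loading that block with missing keys filled in. Treating the example values as defaults, and the `loadSpecHardeningConfig` name, are assumptions:

```typescript
// Shape of the "spec_hardening" block shown above.
interface SpecHardeningConfig {
  max_ambiguity_retries: number;
  auto_resolve_warning: boolean;
}

// Defaults mirror the example values (assumption).
const DEFAULTS: SpecHardeningConfig = { max_ambiguity_retries: 3, auto_resolve_warning: true };

// Parse JSON, merge over DEFAULTS, and reject a retry count that is not a non-negative integer.
function loadSpecHardeningConfig(json: string): SpecHardeningConfig {
  const raw = JSON.parse(json) as { spec_hardening?: Partial<SpecHardeningConfig> };
  const merged: SpecHardeningConfig = { ...DEFAULTS, ...(raw.spec_hardening ?? {}) };
  if (!Number.isInteger(merged.max_ambiguity_retries) || merged.max_ambiguity_retries < 0) {
    throw new Error("max_ambiguity_retries must be a non-negative integer");
  }
  return merged;
}
```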
+## Files
+- `lib/harness-spec.ts` — SpecHardener class, AI prompt construction
+- `extensions/harness-spec.ts` — Extension: intercepts requests, runs hardening gate

package/vault/wiki/modules/structured-planning.md ADDED

@@ -0,0 +1,53 @@
+---
+type: module
+title: Structured Planning
+status: developing
+created: 2026-04-28
+updated: 2026-04-28
+tags: [harness, planning, dag, layer-2, quality]
+layer: "2"
+sources:
+  - "[[harness-implementation-plan]]"
+related:
+  - "[[agentic-harness]]"
+  - "[[spec-hardening]]"
+  - "[[grounding-checkpoints]]"
+  - "[[schema-orchestration]]"
+---
+# Structured Planning
+Layer 2 of the [[agentic-harness]]. Produces a machine-readable task DAG reviewed before code begins. No code without a plan.
+## Flow
+1. `spec_hardened` event → `Planner.createPlan(spec)` → **ExecutionPlan**
+2. DAG validation: cycle detection, orphan detection, spec coverage
+3. If invalid → regenerate (max 3 revisions)
+4. Plan review gate: adversarial critic review OR human approval
+5. Store in `.pi/harness/plans/<id>.json`
+6. Emit `plan_approved` → Layer 7 (Archon)
+## ExecutionPlan Data Contract
+Each **PlanNode**: `task_id`, `title`, `description`, `inputs`/`outputs`, `dependencies`, `risk_surface`, `verification`, `status`.
+## Validation Checks
+- **Cycle detection** — no circular dependencies
+- **Orphan detection** — no disconnected nodes
+- **Spec coverage** — every success criterion maps to at least one task
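The three checks map onto standard graph routines. A sketch over a reduced, hypothetical `PlanNode`; the `covers` field linking tasks to success criteria is assumed, since the contract above does not name one. Cycle detection uses Kahn's algorithm, and the orphan check only flags nodes with no edges at all (not separate connected components):

```typescript
// Hypothetical subset of PlanNode; `covers` lists the success criteria a task satisfies.
interface PlanNode {
  task_id: string;
  dependencies: string[];
  covers: string[];
}

// Kahn's algorithm: if a topological order cannot include every node, there is a cycle.
function hasCycle(nodes: PlanNode[]): boolean {
  const indegree: Record<string, number> = {};
  const dependents: Record<string, string[]> = {};
  for (const n of nodes) { indegree[n.task_id] = 0; dependents[n.task_id] = []; }
  for (const n of nodes) {
    for (const d of n.dependencies) {
      if (!(d in indegree)) continue; // unknown ids are a separate error, not a cycle
      indegree[n.task_id]++;
      dependents[d].push(n.task_id);
    }
  }
  const queue = nodes.filter((n) => indegree[n.task_id] === 0).map((n) => n.task_id);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift() as string;
    visited++;
    for (const next of dependents[id]) {
      if (--indegree[next] === 0) queue.push(next);
    }
  }
  return visited < nodes.length;
}

// Orphans: in a plan with 2+ nodes, any node with no incoming or outgoing edge.
function findOrphans(nodes: PlanNode[]): string[] {
  const linked: Record<string, boolean> = {};
  for (const n of nodes) {
    for (const d of n.dependencies) { linked[d] = true; linked[n.task_id] = true; }
  }
  return nodes.length < 2 ? [] : nodes.filter((n) => !linked[n.task_id]).map((n) => n.task_id);
}

// Spec coverage: success criteria that no task claims to cover.
function uncoveredCriteria(nodes: PlanNode[], criteria: string[]): string[] {
  return criteria.filter((c) => !nodes.some((n) => n.covers.indexOf(c) >= 0));
}
```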
+## Extension Interface
+| Type | Name |
+|------|------|
+| Tool | `create-plan` |
+| Tool | `review-plan` |
+| Tool | `approve-plan` |
+| Command | `/harness-plan-status` |
+## Files
+- `lib/harness-planner.ts` — Planner class, DAG generation, validation
+- `extensions/harness-planner.ts` — Extension for spec_hardened events

package/vault/wiki/modules/think-in-code-enforcement.md ADDED

@@ -0,0 +1,153 @@
+---
+type: module
+title: "Think-in-Code Enforcement (L3)"
+status: developing
+created: 2026-04-30
+updated: 2026-04-30
+tags: [harness, think-in-code, context-optimization, layer-3, enforcement]
+layer: "3"
+sources:
+  - "[[think-in-code-blog]]"
+  - "[[context-mode-website]]"
+  - "[[Research: context-mode vs lean-ctx]]"
+related:
+  - "[[think-in-code]]"
+  - "[[agentic-harness-context-enforcement]]"
+  - "[[grounding-checkpoints]]"
+  - "[[harness-implementation-plan]]"
+  - "[[lean-ctx]]"
+---
+# Think-in-Code Enforcement (L3 Tool Layer)
+A mandatory paradigm enforced at the L3 tool layer of the harness. Agents MUST write code to process data instead of reading raw data into the context window for mental processing. This is not a suggestion — it is enforced through system prompt injection, tool interception, and post-tool compression.
+## First Principles
+1. **Reading raw data into context is wasteful**: An agent reading 47 files to count errors consumes 700KB of context. A script doing the same analysis outputs 3.6KB, a reduction of roughly 200× (700 / 3.6 ≈ 194).
+2. **Agents are bad at mental computation**: Counting, filtering, comparing, parsing — these are CPU tasks. Agents should delegate to CPU.
+3. **Context is the scarcest resource**: Every token of raw data is a token not used for reasoning. The context budget must be protected.
+4. **The agent won't do this voluntarily**: Under pressure (context filling, task complexity), agents revert to read-everything patterns. Enforcement is mandatory.
+## Enforcement Architecture
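+Enforcement happens in three layers, ordered by when each acts: the system prompt shapes the agent's choices, interception stops a wasteful call before it runs, and compression trims output that still gets through.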
+### Layer 1: System Prompt Injection (zero cost)
+AGENTS.md rule:
+```markdown
+## Think in Code (MANDATORY)
+When you need to analyze, count, filter, compare, or process data,
+write code (JavaScript/Python) that does the work. Output only the
+answer. Do NOT read raw data into context for mental processing.
+Use built-ins only. No package installs. Always try/catch.
+Use ctx_execute() for sandboxed execution.
+```
+Cost: 0 tokens beyond the rule text. Reliability: depends on agent compliance.
+### Layer 2: PreToolUse Interception (medium cost)
+Intercept `Read()`, `Bash()`, `WebFetch()` calls at L3 executor hooks. Detect data-analysis patterns:
+- Sequential reads of 3+ files without edits between them
+- grep/find on large result sets (>100 lines)
+- WebFetch of large API responses (>5KB)
+Instead, route the work to the `ctx_execute()` sandbox provided by pi-lean-ctx.
+Cost: ~0-50 tokens per intercepted call (check logic). Reliability: high — prevents wasteful calls before they happen.
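A sketch of the three detection rules as a pure function over the call history. The `ToolCall` shape is hypothetical, and because output size is unknown before a call runs, `estimatedLines` and `estimatedBytes` are assumed to come from a dry run or heuristic; 5KB is taken as 5 × 1024 bytes:

```typescript
// Hypothetical tool-call records; pi's real hook payloads look different.
type ToolCall =
  | { tool: "Read"; path: string }
  | { tool: "Edit"; path: string }
  | { tool: "Bash"; command: string; estimatedLines: number }
  | { tool: "WebFetch"; url: string; estimatedBytes: number };

// Thresholds from the rules above.
const READ_RUN_THRESHOLD = 3;      // 3+ reads with no edit in between
const SEARCH_LINE_LIMIT = 100;     // grep/find returning more than 100 lines
const FETCH_BYTE_LIMIT = 5 * 1024; // responses larger than 5KB

// Returns why `next` should be rerouted to ctx_execute(), or null to let it run.
function shouldReroute(history: ToolCall[], next: ToolCall): string | null {
  if (next.tool === "Read") {
    let reads = 1; // `next` itself
    // Walk back to the most recent edit, counting reads along the way.
    for (let i = history.length - 1; i >= 0; i--) {
      const call = history[i];
      if (call.tool === "Edit") break;
      if (call.tool === "Read") reads++;
    }
    if (reads >= READ_RUN_THRESHOLD) return `${reads} file reads without an edit`;
  }
  if (next.tool === "Bash" && /\b(grep|find)\b/.test(next.command) && next.estimatedLines > SEARCH_LINE_LIMIT) {
    return `search would return ~${next.estimatedLines} lines`;
  }
  if (next.tool === "WebFetch" && next.estimatedBytes > FETCH_BYTE_LIMIT) {
    return `response would be ~${next.estimatedBytes} bytes`;
  }
  return null;
}
```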
+### Layer 3: PostToolUse Compression (medium cost)
+When large output enters context despite interception, lean-ctx's 90+ shell pattern matchers auto-compress:
+- Strip filler/boilerplate
+- Keep only signal (errors, results, key data)
+- Store raw output in searchable index (FTS5 equivalent)
+Cost: 0 tokens (lean-ctx shell hook pattern matching). Reliability: medium — compresses what got through, doesn't prevent.
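A minimal illustration of the compress-and-store idea, not lean-ctx's pattern matchers: keep the first and last few lines plus lines that look like errors or warnings, and park the raw text under an id. The `SIGNAL` regex and the in-memory `rawStore` (standing in for the searchable index) are assumptions:

```typescript
// Lines matching this are treated as signal; everything else as filler (assumption).
const SIGNAL = /\b(error|fail(ed|ure)?|warn(ing)?|exception)\b/i;

// In-memory stand-in for the searchable raw-output index.
const rawStore: Record<string, string> = {};
let nextId = 0;

// Keep the first/last `keepEdges` non-empty lines plus any signal lines,
// and store the full raw output so it can be retrieved later by id.
function compressOutput(raw: string, keepEdges = 2): string {
  const id = `out-${nextId++}`;
  rawStore[id] = raw;
  const lines = raw.split("\n").filter((l) => l.trim() !== "");
  const kept = lines.filter(
    (l, i) => i < keepEdges || i >= lines.length - keepEdges || SIGNAL.test(l),
  );
  const dropped = lines.length - kept.length;
  return dropped > 0
    ? `${kept.join("\n")}\n[${dropped} lines omitted; raw stored as ${id}]`
    : kept.join("\n");
}
```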
+---
+## Execution Sandbox: ctx_execute()
+pi-lean-ctx provides `ctx_execute()` — a sandboxed code execution tool:
+- **What runs**: JavaScript/TypeScript (Node.js built-ins only, no npm)
+- **What returns**: Only `console.log()` output enters the conversation
+- **Sandbox**: Isolated subprocess, no filesystem access outside working directory
+- **Timeout**: Configurable (default: 30s)
+### Example: Before vs After
+**Before** (without Think in Code):
+```
+Agent: Read(file1) → Read(file2) → ... → Read(file47)
+       → mentally count errors → report
+Context: 700KB consumed. 47 tool calls. 20+ turns.
+```
+**After** (with Think in Code enforced):
+```
+Agent: ctx_execute(`
+  const fs = require('fs');
+  const path = require('path');
+  try {
+    const files = fs.readdirSync('./logs');
+    let errors = 0;
+    for (const f of files) {
+      // Plain string path: a nested template literal would end the outer one
+      const content = fs.readFileSync(path.join('./logs', f), 'utf8');
+      errors += (content.match(/ERROR/g) || []).length;
+    }
+    console.log(JSON.stringify({total_errors: errors, files_scanned: files.length}));
+  } catch (e) {
+    console.log(JSON.stringify({error: e.message}));
+  }
+`)
+→ Output: {"total_errors":127,"files_scanned":47}
+Context: 3.6KB consumed. 1 tool call. 1 turn.
+```
+---
+## What Gets Routed to Think-in-Code
+| Pattern | Detection | Redirect |
+|---------|-----------|----------|
+| Sequential file reads (3+) without edits | Layer 2 (interception) | `ctx_execute()` batch script |
+| grep/find with >100 results | Layer 2 (interception) | `ctx_execute()` with filtered output |
+| WebFetch with >5KB response | Layer 2 (interception) | `ctx_execute()` with `JSON.parse()` |
+| "Count how many...", "Find all..." | Layer 1 (system prompt) | Agent self-routes |
+| "Compare X and Y..." | Layer 1 (system prompt) | Agent self-routes |
+| "Summarize the errors..." | Layer 3 (compression) | lean-ctx auto-compresses |
+---
+## Efficiency Gains
+| Scenario | Before | After | Reduction |
+|----------|--------|-------|-----------|
+| Multi-file data analysis | 47 Read() calls = 700KB | 1 ctx_execute() = 3.6KB | ~200× |
+| Error log scanning | 20 tool calls = 600KB | 1 execute = 20KB | 30× |
+| API response parsing | 5 WebFetch + Read = 500KB | 1 execute = 1KB | 500× |
+| Config comparison across files | 10 Read() = 200KB | 1 execute = 5KB | 40× |
+---
+## Integration with L3 Grounding Checkpoints
+Think-in-Code enforcement runs as a pre-execution hook within L3:
+```
+L3 Grounding Checkpoint:
+  1. Pre-execution: spec grounding check
+  2. Pre-execution: Think-in-Code enforcement check (is the agent about to do data analysis via raw reads?)
+  3. Execute subtask
+  4. Post-execution: spec grounding check
+  5. Post-execution: context usage audit (did we exceed budget?)
+```
+If an agent tries to bypass Think-in-Code (for example, by reading 47 files sequentially), the L2.5 runtime drift monitor detects "excessive searching" and triggers a soft nudge.
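The checkpoint sequence reads naturally as a short-circuiting runner. A sketch with hypothetical `Check` and `Checkpoint` types, where any failing pre-check stops the subtask before it executes:

```typescript
// Hypothetical shape of the L3 checkpoint sequence listed above.
type Check = { name: string; run: () => boolean };

interface Checkpoint {
  pre: Check[];        // e.g. spec grounding, Think-in-Code enforcement
  execute: () => void; // the subtask itself
  post: Check[];       // e.g. spec grounding, context usage audit
}

// Runs pre-checks, the subtask, then post-checks; stops at the first failing check.
function runCheckpoint(cp: Checkpoint): { ok: boolean; failed?: string } {
  for (const c of cp.pre) if (!c.run()) return { ok: false, failed: `pre:${c.name}` };
  cp.execute();
  for (const c of cp.post) if (!c.run()) return { ok: false, failed: `post:${c.name}` };
  return { ok: true };
}
```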
+---
+## Files
+- `lib/harness-think-in-code.ts` — Enforcement logic, pattern detection, `ctx_execute()` routing
+- Update `lib/harness-executor.ts` — Add Think-in-Code hook to pre-execution phase
+- Update AGENTS.md — Add mandatory Think in Code rule

package/vault/wiki/modules/wiki-query-interface.md ADDED

@@ -0,0 +1,64 @@
+---
+type: module
+title: Wiki Query Interface
+status: developing
+created: 2026-04-28
+updated: 2026-04-28
+tags: [harness, wiki, search, claude-obsidian, layer-8, query]
+layer: "8"
+sources:
+  - "[[harness-implementation-plan]]"
+related:
+  - "[[agentic-harness]]"
+  - "[[persistent-memory]]"
+---
+# Wiki Query Interface (claude-obsidian Skills)
+Layer 8 of the [[agentic-harness]]. The query interface to the wiki. Uses claude-obsidian skills in GitHub Mode B — LLM-native search via hot.md → index.md → pages. See [[adr-009]].
+## Architecture
+```
+Agent / Human
+  ├── wiki-query (read)  ──→ wiki/hot.md → index.md → pages
+  ├── wiki-ingest (write) ──→ wiki/ (create/update pages)
+  └── wiki-lint (health)  ──→ orphan/contradiction checks
+```
+## Query Operations
+### Three Depth Modes
+| Mode | Code | Reads | Tokens |
+|------|------|-------|--------|
+| **Quick** | `query quick:` | hot.md + index.md | ~1,500 |
+| **Standard** | default | hot.md → index → 3-5 pages | ~3,000 |
+| **Deep** | `query deep:` | Full wiki + optional web | ~8,000+ |
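A sketch of routing a query to a depth mode by its prefix. The exact prefix syntax the skills accept is assumed from the table, and the token numbers are the approximate budgets listed there:

```typescript
type Depth = "quick" | "standard" | "deep";

// Mode table from above; token figures are approximate budgets.
const MODES: Record<Depth, { reads: string; approxTokens: number }> = {
  quick: { reads: "hot.md + index.md", approxTokens: 1500 },
  standard: { reads: "hot.md → index.md → 3-5 pages", approxTokens: 3000 },
  deep: { reads: "full wiki + optional web", approxTokens: 8000 },
};

// Split a raw query into depth mode and question text; no prefix means standard.
function parseQuery(input: string): { depth: Depth; question: string } {
  const m = /^\s*query\s+(quick|deep):\s*([\s\S]*)$/i.exec(input);
  if (m) return { depth: m[1].toLowerCase() as Depth, question: m[2].trim() };
  return { depth: "standard", question: input.replace(/^\s*query\s+/i, "").trim() };
}
```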
+## Ingest Operations
+| Harness Event | Wiki Write | Frontmatter |
+|--------------|-----------|-------------|
+| `spec_hardened` | `decisions/ADR-<N>.md` | `type: decision` |
+| `plan_approved` | `flows/PLAN-<id>.md` | `type: flow` |
+| `subtask_completed` | Append to `log.md` | Operation log entry |
+| `subtask_verified` | `modules/<name>.md` | `type: module` |
+| `subtask_failed` | `modules/<name>.md` | `> [!contradiction]` |
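The ingest table is a total mapping from event type to write target, which a discriminated union expresses directly. Event payload fields and the `WikiWrite` shape are hypothetical; paths are relative to the wiki root, as in the table:

```typescript
// Harness events from the table; payload fields are hypothetical.
type HarnessEvent =
  | { type: "spec_hardened"; adrNumber: number }
  | { type: "plan_approved"; planId: string }
  | { type: "subtask_completed"; summary: string }
  | { type: "subtask_verified"; module: string }
  | { type: "subtask_failed"; module: string; reason: string };

interface WikiWrite {
  path: string;
  mode: "create_or_update" | "append";
  frontmatterType?: string;
  body?: string;
}

// Map each event to the wiki write described in the table.
function toWikiWrite(e: HarnessEvent): WikiWrite {
  switch (e.type) {
    case "spec_hardened":
      return { path: `decisions/ADR-${e.adrNumber}.md`, mode: "create_or_update", frontmatterType: "decision" };
    case "plan_approved":
      return { path: `flows/PLAN-${e.planId}.md`, mode: "create_or_update", frontmatterType: "flow" };
    case "subtask_completed":
      return { path: "log.md", mode: "append", body: `- ${e.summary}` };
    case "subtask_verified":
      return { path: `modules/${e.module}.md`, mode: "create_or_update", frontmatterType: "module" };
    case "subtask_failed":
      return { path: `modules/${e.module}.md`, mode: "append", body: `> [!contradiction]\n> ${e.reason}` };
  }
}
```

Because the switch is exhaustive over the union, adding a new event type without a mapping becomes a compile-time error.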
+## Lint Operations (after every 10-15 writes)
+1. Orphan pages
+2. Dead links
+3. Stale claims
+4. Missing pages
+5. Frontmatter gaps
+6. Empty sections
+7. Stale index entries
+Output: `wiki/meta/lint-report-YYYY-MM-DD.md`
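Orphan pages and dead links (checks 1 and 2) can be computed from `[[wikilink]]` targets alone. A sketch over an in-memory `{ pageName: markdown }` map; exempting `index` from the orphan list is an assumption based on its role as the catalog:

```typescript
// Checks 1 and 2 over an in-memory page map keyed by page name.
// "[[page|alias]]" and "[[page#heading]]" both resolve to "page".
function lintLinks(pages: Record<string, string>): { orphans: string[]; deadLinks: string[] } {
  const names = Object.keys(pages);
  const inbound: Record<string, number> = {};
  const dead: string[] = [];
  for (const name of names) {
    const re = /\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(pages[name])) !== null) {
      const target = m[1].trim();
      if (Object.prototype.hasOwnProperty.call(pages, target)) {
        // Self-links do not rescue a page from being an orphan.
        if (target !== name) inbound[target] = (inbound[target] || 0) + 1;
      } else {
        dead.push(`${name} -> ${target}`);
      }
    }
  }
  const orphans = names.filter((n) => n !== "index" && !inbound[n]);
  return { orphans, deadLinks: dead };
}
```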
+## Dependencies
+- 24 obsidian-wiki skills (`npx skills add Ar9av/obsidian-wiki --yes`)
+- 5 obsidian-skills (`npx skills add kepano/obsidian-skills --yes`)

package/vault/wiki/overview.md ADDED

@@ -0,0 +1,51 @@
+---
+type: overview
+title: "Ultimate-PI Harness Architecture Overview"
+created: 2026-04-30
+updated: 2026-04-30
+status: active
+tags: [meta, overview, harness, architecture]
+---
+# Ultimate-PI Harness Architecture Overview
+## What This Is
+The **ultimate-pi agentic harness** is a mandatory 8-layer pipeline with drift monitoring, cross-cutting tool enhancements, and persistent wiki-based memory. Every AI coding task flows through all layers. Verification is mandatory — agent confidence is not evidence.
+## Architecture At a Glance
+```
+L1: Spec Hardening    → L2: Structured Planning  → L2.5: Runtime Drift Monitor
+  ↓                         ↓                           ↓ (3 paradigms: tool-call, spec, implementation)
+L3: Grounding Checkpoints  → L4: Adversarial Verification → Phase 16: Lint+Format Gate
+  ↓ (with Think-in-Code, AST truncation, fuzzy edits,
+     inline syntax validation, ck semantic search, Gitingest)
+L5: Automated Observability → L6: Persistent Memory (Wiki) → L7: Archon Orchestration → L8: Wiki Query
+```
+## Key Numbers
+- **~15,000-16,000 tokens/subtask** pipeline overhead (with all enhancements)
+- **27 build phases** (P0-P27) + 3 future phases (F1-F3)
+- **4 new tools**: ck (semantic search), Gitingest (bulk ingestion), pi-messenger (debate transport), pi-lean-ctx (compression+governance)
+- **3 control frameworks**: H=(E,T,C,S,L,V), Feedforward-Feedback, Generator-Evaluator
+- **3 drift detection paradigms**: Tool-call (L2.5), Spec (L3), Implementation (L4)
+- **Model-adaptive**: 4 profiles (opus/gpt/gemini/strict) × 4 configuration layers
+## Authoritative Pages
+| Page | Role |
+|------|------|
+| [[harness-implementation-plan]] | Master plan: phases, token budget, architecture |
+| [[harness]] | Pipeline overview with layer descriptions |
+| [[harness-control-frameworks]] | Unified formal models |
+| [[drift-detection-unified]] | Three complementary drift paradigms |
+| [[index]] | Master catalog of all wiki pages |
+## Key Decisions
+- [[adr-008]] — Spec-Only Black-Box QA
+- [[adr-009]] — claude-obsidian Mode B persistent memory
+- [[adr-010]] — Wiki tight-coupling contract (read-first, write-after)
+- [[adr-011]] — Consensus debate with selective routing (iMAD)

package/vault/wiki/questions/Research-pi-vs-claude-code-agentic-orchestration-pipeline.md ADDED

@@ -0,0 +1,87 @@
+---
+type: synthesis
+title: "Research: pi-vs-claude-code Agentic Orchestration Pipeline"
+created: 2026-05-03
+updated: 2026-05-03
+tags:
+  - research
+  - agentic-orchestration
+  - pi-agent
+  - harness
+status: developing
+related:
+  - "[[concepts/agentic-orchestration-pipeline]]"
+  - "[[concepts/agent-harness-architecture]]"
+  - "[[concepts/multi-agent-specialization]]"
+  - "[[concepts/context-engineering]]"
+  - "[[concepts/safety-defense-in-depth]]"
+  - "[[entities/pi-coding-agent]]"
+  - "[[entities/disler-indydevdan]]"
+  - "[[entities/opendev]]"
+sources:
+  - "[[sources/disler-pi-vs-claude-code]]"
+  - "[[sources/opendev-arxiv-2603.05344v1]]"
+  - "[[sources/martin-fowler-harness-engineering]]"
+  - "[[sources/mindstudio-four-agent-types]]"
+  - "[[sources/anthropic-effective-harnesses]]"
+---
+# Research: pi-vs-claude-code Agentic Orchestration Pipeline
+## Overview
+The `disler/pi-vs-claude-code` repository demonstrates that Pi Coding Agent's extension system can implement production-grade multi-agent orchestration entirely in user-space TypeScript. Three orchestration patterns emerge — subagent delegation, team dispatch, and sequential chaining — each with distinct use cases and implementation strategies. These patterns can be ported to our harness as `.pi/skills/` extensions backed by YAML configuration files. The broader research reveals that harness engineering (context management, safety, feedback loops) is as critical as orchestration itself, and mature systems like OpenDev provide reference architectures for both.
+## Key Findings
+1. **Pi extensions can implement full orchestration without core changes** (Source: [[sources/disler-pi-vs-claude-code]]). The three orchestration extensions (subagent-widget, agent-team, agent-chain) are clean TypeScript files that hook Pi's event system. Our harness can adopt identical patterns.
+2. **Three orchestration patterns cover the design space** (Source: [[sources/disler-pi-vs-claude-code]], [[sources/mindstudio-four-agent-types]]):
+   - **Subagent delegation** (fan-out): Spawn isolated agents for parallel subtasks. Best for exploration, analysis, background work.
+   - **Team dispatch** (specialist routing): Dispatcher selects specialist from roster. Best for domain-specific work.
+   - **Sequential chaining** (pipeline): Agents execute in order with `$INPUT` passing. Best for multi-phase workflows.
+3. **Schema-level isolation is more robust than runtime checks** (Source: [[sources/opendev-arxiv-2603.05344v1]]). Removing tools from a subagent's schema makes dangerous operations structurally impossible. The model cannot argue for capabilities it doesn't know exist. This should be our default safety strategy.
+4. **Context engineering is a first-class concern, not an afterthought** (Source: [[sources/opendev-arxiv-2603.05344v1]], [[sources/martin-fowler-harness-engineering]]). Staged compaction (5 graduated thresholds), event-driven reminders (24 templates, user-role injection), and dual-memory architecture (episodic + working) are proven techniques. Our harness lacks all three.
+5. **Harness = Guides + Sensors + Steering Loop** (Source: [[sources/martin-fowler-harness-engineering]]). Feedforward guides steer before action; feedback sensors observe after. The human iterates on both. Our `.pi/skills/` are feedforward; `wiki-lint` and `posthog-analyst` are feedback. We need more computational sensors.
+6. **Multi-model pipelines beat single-model agents** (Source: [[sources/mindstudio-four-agent-types]], industry pattern). Different pipeline stages benefit from different models (Opus for planning, Sonnet for building, Haiku for reviewing). Our harness should support per-stage model selection.
+7. **Safety requires defense-in-depth, not single-point checks** (Source: [[sources/opendev-arxiv-2603.05344v1]], [[sources/disler-pi-vs-claude-code]]). Five independent layers: prompt guardrails → schema gating → runtime approval → tool validation → lifecycle hooks. Our harness has none of these in structured form.
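The three patterns in finding 2 differ mainly in control flow. A sketch that models an agent as a hypothetical async function; the reference repo spawns real pi processes and hooks pi's event system, which this does not reproduce:

```typescript
// Hypothetical stand-in for a spawned agent: task in, output out.
type Agent = (task: string) => Promise<string>;

// Subagent delegation (fan-out): run isolated agents on independent subtasks concurrently.
function fanOut(agent: Agent, subtasks: string[]): Promise<string[]> {
  return Promise.all(subtasks.map((t) => agent(t)));
}

// Team dispatch: a router picks one specialist from the roster by name.
function dispatch(roster: Record<string, Agent>, pick: (task: string) => string, task: string): Promise<string> {
  const name = pick(task);
  const specialist = roster[name];
  if (!specialist) return Promise.reject(new Error(`No specialist named ${name}`));
  return specialist(task);
}

// Sequential chaining: each step's prompt receives the previous output in place of $INPUT.
async function chain(steps: { agent: Agent; prompt: string }[], input: string): Promise<string> {
  let current = input;
  for (const step of steps) {
    current = await step.agent(step.prompt.split("$INPUT").join(current));
  }
  return current;
}
```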
+## Key Entities
+- **[[entities/pi-coding-agent]]**: The foundation — open-source terminal coding agent with TypeScript extension API. Our harness platform.
+- **[[entities/disler-indydevdan]]**: Created the reference implementation of Pi orchestration extensions. Primary source of patterns.
+- **[[entities/opendev]]**: Most comprehensive reference architecture for terminal coding agents. Source of context engineering and safety patterns.
+## Key Concepts
+- **[[concepts/agentic-orchestration-pipeline]]**: Three orchestration patterns (subagent, team, chain) with design principles for implementation.
+- **[[concepts/agent-harness-architecture]]**: Scaffolding + Harness model. Feedforward guides + Feedback sensors in a steering loop.
+- **[[concepts/multi-agent-specialization]]**: Specialization by role, model, and tool set. Team composition via YAML config.
+- **[[concepts/context-engineering]]**: Staged compaction, dual-memory, event-driven reminders, lazy tool discovery, prompt caching.
+- **[[concepts/safety-defense-in-depth]]**: Five-layer architecture with schema gating as the primary strategy.
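Schema gating means building each subagent's tool list from an allowlist, so excluded tools never appear in its schema; a runtime guard remains as a second, independent layer. A sketch with a hypothetical `ToolSchema` shape and illustrative tool names:

```typescript
// Hypothetical tool schema entry; real pi tool definitions carry more fields.
interface ToolSchema { name: string; description: string }

const ALL_TOOLS: ToolSchema[] = [
  { name: "read", description: "Read a file" },
  { name: "grep", description: "Search files" },
  { name: "edit", description: "Edit a file" },
  { name: "bash", description: "Run a shell command" },
];

// Schema gating: tools outside the allowlist are never described to the model.
function toolsFor(allowlist: string[]): ToolSchema[] {
  return ALL_TOOLS.filter((t) => allowlist.indexOf(t.name) >= 0);
}

// Defense in depth: still reject any call to a tool missing from the agent's schema.
function guardCall(schema: ToolSchema[], toolName: string): void {
  if (!schema.some((t) => t.name === toolName)) {
    throw new Error(`Tool "${toolName}" is not available to this agent`);
  }
}
```

A read-only reviewer built with `toolsFor(["read", "grep"])` never sees `bash` or `edit`, so it has nothing to argue for; `guardCall` only matters if a call slips through anyway.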
+## Contradictions
+- **Single agent vs Multi-agent overhead**: [[sources/mindstudio-four-agent-types]] warns that orchestration adds overhead and should only be used when a single agent demonstrably fails. [[sources/disler-pi-vs-claude-code]] shows orchestration as a default pattern. Resolution: Start with a single agent; add orchestration when context limits or specialization needs are clear. This aligns with our current approach where the `Agent` tool is used selectively.
+## Open Questions
+- How to implement event-driven system reminders in Pi's extension API? Pi's event system supports `tool_call` and `turn_end` events — these could drive reminder injection.
+- What's the right compaction strategy for our context window? Pi doesn't expose token counts to extensions — we may need to approximate or request API changes.
+- How to persist approval rules across sessions? Pi's extension lifecycle includes `session_start` and `session_shutdown` — rules could be loaded/saved in these hooks.
+- Can we implement 9-pass fuzzy editing in Pi's `edit` tool handler? Pi's extension API exposes `tool_call` events — we could intercept edit failures and retry with fuzzy matching.
+- What's the performance impact of context isolation per subagent? Spawning new Pi processes per subagent may be expensive. Thread-based subagents (like OpenDev) would be lighter.
+- How to implement the steering loop? Need a mechanism for humans to review harness performance and update guides/sensors. Our `wiki` + `posthog-analyst` pipeline is a start.
+## Sources
+- [[sources/disler-pi-vs-claude-code]]: disler, Feb 2026 — Reference implementation of Pi orchestration extensions
+- [[sources/opendev-arxiv-2603.05344v1]]: Nghi D. Q. Bui, Mar 2026 — Comprehensive terminal agent architecture paper
+- [[sources/martin-fowler-harness-engineering]]: Birgitta Böckeler, Apr 2026 — Harness engineering mental model and framework
+- [[sources/mindstudio-four-agent-types]]: MindStudio, Apr 2026 — Taxonomy of agent types and architecture decisions
+- [[sources/anthropic-effective-harnesses]]: Justin Young (Anthropic), 2025 — Authoritative harness definition