npm - ultimate-pi - Versions diffs - 0.1.7 → 0.2.2 - Mend

ultimate-pi 0.1.7 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (524)

package/vault/wiki/decisions/adr-012.md DELETED

@@ -1,102 +0,0 @@
----
-type: decision
-title: "ADR-012: Extension-Based Harness Orchestrator — Leveraging Pi's Native Event System"
-status: accepted
-priority: 1
-date: "2026-05-02"
-updated: "2026-05-04"
-tags: [adr, harness, integration, extensions, orchestrator, pi-extension-api]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[dotenv-loader]]"
-  - "[[custom-footer]]"
-related:
-  - "[[adr-010]]"
-  - "[[adr-011]]"
-  - "[[adr-026-one-thing]]"
-supersedes:
-created: 2026-05-02
----
-# ADR-012: Extension-Based Harness Orchestrator — Leveraging Pi's Native Event System
-**UPDATED 2026-05-04**: Original ADR incorrectly stated pi has only 5 native events. Pi v0.70.2 provides 30+ event types via `ExtensionAPI.on()`. The decision — extension-based, no fork — remains correct. Updated to reflect actual pi capabilities.
-## Context
-The HARNESS-PRD specifies an 8-layer mandatory execution pipeline. The pi coding agent provides an `ExtensionAPI` with **30+ native event types**: `session_start`, `session_before_compact`, `session_compact`, `session_shutdown`, `before_agent_start`, `agent_start`, `agent_end`, `turn_start`, `turn_end`, `message_start`, `message_update`, `message_end`, `tool_execution_start`, `tool_execution_update`, `tool_execution_end`, `tool_call` (per-tool types: bash, read, edit, write, grep, find, ls, custom), `tool_result` (per-tool types), `context`, `before_provider_request`, `after_provider_response`, `model_select`, `input`, `user_bash`, `resources_discover`, and more.
-Three integration paths were considered:
-- **A) Fork pi to add hook points.** Maintenance burden. Blocks on upstream.
-- **B) Wrap pi at process boundary.** Fragile, breaks on pi updates.
-- **C) Build a harness orchestrator extension that listens to pi's native events and routes them through a skill pipeline.** Zero pi changes. Full control within extensions.
-## Decision
-**Use an extension-based harness orchestrator. No fork. No process wrapping. No custom event bus needed.**
-A single `harness-orchestrator` extension subscribes to pi's native events directly — no intermediate event bus layer. State machine tracks pipeline position (L1→L2→L3→L4→L5-L8). Phase transitions detected via tool result patterns and turn boundaries. Skills are activated via `pi.sendMessage()` with `deliverAs: "steer"` to inject steering prompts at the right pipeline phase.
-### Event-to-Pipeline Mapping
-Pi's native events map directly to harness pipeline phases without translation:
-| Pi Native Event | Harness Pipeline Action |
-|---|---|
-| `turn_start` | Initialize phase context. Inject L1 spec-hardening steer if entering L1. |
-| `tool_call` | Track which tools the agent invokes. Detect phase transitions (e.g., write tool = entering L3 execution). |
-| `tool_result` | Route to drift monitor during L3 execution. Detect gate conditions (compile failures, lint errors). |
-| `turn_end` | Trigger L4 critic if in verification phase. Accumulate token budget. |
-| `before_agent_start` | Inject harness state into system prompt. Reinject after compaction. |
-| `agent_end` | Finalize pipeline. Trigger L5 observability + L6 memory writes. |
-| `session_start` | Bootstrap harness state. Load config. Warm wiki cache. |
-| `session_compact` | Persist harness state. Reinject after compaction completes. |
-| `session_shutdown` | Flush observations. Write keep-rate samples. |
-### Enforcement Model — Updated
-With `tool_call` events, the orchestrator gains new enforcement capabilities not possible with only 5 events:
-- **Pre-execution tool blocking**: `tool_call` handlers can return `{ block: true, reason: "..." }` to prevent tool execution. This enables: blocking edits when spec isn't hardened, blocking writes when drift detected, blocking bash when sandbox isn't configured.
-- **Result mutation**: `tool_result` handlers can modify content/details/isError. This enables: injecting warnings into results, marking results as errors based on drift detection, adding structural analysis annotations.
-- **Context injection**: `before_agent_start` can replace the system prompt entirely. This enables: switching between "spec hardening mode" and "execution mode" and "verification mode" prompts.
-Not 100% software-enforced for all layers, but `tool_call` blocking + `tool_result` mutation + `before_agent_start` prompt injection achieves high compliance. L7 orchestration (later phase) may add process-level enforcement via `pi.exec()`-based gate scripts.
-## Rationale
-- **Zero pi dependency**: Works with pi v0.70.2 today. No upstream PRs, no fork maintenance.
-- **All harness logic in one extension**: The orchestrator is a single file, `.pi/extensions/harness-orchestrator.ts`, with no intermediate event bus layer.
-- **Proven pattern**: `custom-footer.ts` already demonstrates using `turn_start`, `context`, `model_select`, and `session_start` events. The harness orchestrator is the same pattern, scaled.
-- **Upgrade path**: If pi adds new native events, the orchestrator can subscribe to them without architectural changes.
-- **Tool call blocking is new**: pi's `tool_call` event supports `{ block: true }` return — this is a hard enforcement mechanism not available in the original 5-event assumption.
-## Consequences
-### Positive
-- Ships immediately. No external dependencies beyond pi.
-- Harness is self-contained in `.pi/extensions/`.
-- `tool_call` blocking provides hard enforcement for critical gates.
-- ~150 lines, not ~290 (no intermediate event bus).
-### Negative
-- Still ~95% compliance for skill-level steering (LLM can ignore steering prompts).
-- Pattern detection in `tool_result` is heuristic, not guaranteed.
-- `pi.sendMessage()` steering behavior unverified — need to test skill activation.
-### Mitigations
-- `tool_call` blocking for critical safety gates (no edit without spec, no write with drift).
-- Multiple defense layers (L1 hardening + L2.5 drift + L4 adversarial + P20 deterministic) mean a single-layer bypass is caught downstream.
-- Compliance monitoring in L5 observability tracks bypass rates per layer.
-- L7 orchestration (P23) adds process-level enforcement via `pi.exec()` gate scripts.
-## Correction from Original (2026-05-04)
-| | Original ADR-012 | Corrected |
-|---|---|---|
-| Pi native events | "5 native events" | **30+ native events** |
-| Architecture | Event bus layer on top of 5 events | **Orchestrator listens to 30+ events directly** |
-| File | `harness-event-bus.ts` | **`harness-orchestrator.ts`** |
-| Lines | ~200+ | **~100** (thinner — no bus layer) |
-| Tool blocking | Not possible (assumed) | **Possible via `tool_call` event** |
-| Pre-execution gates | Prompt-only | **Prompt OR `tool_call` blocking** |
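The `tool_call` enforcement model in ADR-012 can be sketched as follows. This is a minimal sketch under stated assumptions. `FakePi`, the event payload shape, the phase names, and the block reasons are all hypothetical stand-ins; they are not pi's real `ExtensionAPI` types. Only the `{ block: true, reason }` return convention comes from the ADR text.

```typescript
// Hypothetical stand-ins: these are NOT pi's ExtensionAPI types.
type Phase = "L1" | "L2" | "L3" | "L4" | "L5-L8";
interface ToolCallEvent {
  toolName: string;
}
interface BlockResult {
  block: true;
  reason: string;
}
type ToolCallHandler = (e: ToolCallEvent) => BlockResult | undefined;

// Minimal emitter that mimics an `on("tool_call", handler)` subscription.
class FakePi {
  private readonly handlers: ToolCallHandler[] = [];

  on(eventName: "tool_call", handler: ToolCallHandler): void {
    this.handlers.push(handler);
  }

  // Runs handlers in registration order; the first block result wins.
  emitToolCall(e: ToolCallEvent): BlockResult | undefined {
    for (const handler of this.handlers) {
      const result = handler(e);
      if (result) return result;
    }
    return undefined;
  }
}

// Pipeline state tracked by the orchestrator (hypothetical fields).
class HarnessState {
  phase: Phase = "L1";
  specHardened = false;
  driftDetected = false;
}

const MUTATING_TOOLS = new Set(["edit", "write"]);

function registerOrchestrator(pi: FakePi, state: HarnessState): void {
  pi.on("tool_call", (e) => {
    // Gate 1: no edits or writes until L1 spec hardening is complete.
    if (MUTATING_TOOLS.has(e.toolName) && !state.specHardened) {
      return { block: true, reason: "Spec not hardened (L1 incomplete)" };
    }
    // Gate 2: no writes while the drift monitor reports drift.
    if (e.toolName === "write" && state.driftDetected) {
      return { block: true, reason: "Drift detected (L2.5)" };
    }
    // Phase transition: the first mutating call after planning enters L3 execution.
    if (MUTATING_TOOLS.has(e.toolName) && state.phase === "L2") state.phase = "L3";
    return undefined;
  });
}

// Usage
const pi = new FakePi();
const state = new HarnessState();
registerOrchestrator(pi, state);
console.log(pi.emitToolCall({ toolName: "edit" })?.reason); // prints: Spec not hardened (L1 incomplete)
state.specHardened = true;
state.phase = "L2";
pi.emitToolCall({ toolName: "edit" }); // allowed; moves the pipeline into L3
console.log(state.phase); // prints: L3
```

A single handler that consults shared state keeps every hard gate in one place. This matches the ADR's "no intermediate event bus" choice.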

package/vault/wiki/decisions/adr-013.md DELETED

@@ -1,59 +0,0 @@
----
-type: decision
-title: "ADR-013: Biome for Phase 16 Deterministic Quality Gate"
-status: accepted
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, phase-16, linting, formatting, biome, deterministic-gate]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[package.json]]"
-  - "[[biome.json]]"
-related:
-  - "[[adr-012]]"
-supersedes: "PRD Q5 (ESLint+Prettier recommendation)"
-created: 2026-05-02
-updated: 2026-05-02
----
-# ADR-013: Biome for Phase 16 Deterministic Quality Gate
-## Context
-PRD Q5 originally resolved Phase 16 gate to "ESLint + Prettier." The project already uses Biome 2.0.6 (`package.json`: `"lint": "biome check"`, `"format": "biome format --write"`) with lefthook pre-commit integration. Adding ESLint+Prettier as new dependencies would duplicate existing tooling.
-The original concern was Biome's type-aware linting gap. With Biome 2.0.6 + TypeScript 6.0.3 the gap is narrower, since Biome 2 infers types for a subset of type-aware rules. Full type checking is covered by `tsc --noEmit` as a separate deterministic step.
-## Decision
-**Use Biome for lint + format in Phase 16. Replace `ESLint + Prettier` with `biome check --write` + `tsc --noEmit` + `fallow audit`.**
-Phase 16 gate runs three deterministic steps, 0 LLM tokens:
-1. `biome check --write` — lint + format in one pass (Biome 2 removed the older `--apply` flag in favor of `--write`)
-2. `tsc --noEmit` — full type checking, which Biome does not perform (note that `tsc` reports type errors only; it does not flag floating promises, which is a lint rule)
-3. `fallow audit --changed-since main` — dead code, duplication, complexity
-All three are pure CLI tools with exit codes. No LLM involvement.
-## Rationale
-- **Already configured**: Biome is installed, configured (`biome.json`), and integrated with lefthook. Zero setup cost.
-- **Single tool for lint+format**: Biome replaces both ESLint and Prettier. One dependency instead of two.
-- **TypeScript type-checking via `tsc`**: Covers what Biome can't. `tsc --noEmit` is already in `package.json` scripts (`"check:ts"`).
-- **Zero incremental dependencies**: No ESLint, no Prettier, no eslint-config-prettier, no @typescript-eslint packages.
-## Consequences
-### Positive
-- Fewer dependencies. Lower maintenance.
-- Matches existing project conventions.
-- lefthook integration already works.
-### Negative
-- Some ESLint rules have no Biome equivalent (rare edge cases).
-- `tsc --noEmit` is slower than Biome's native linting (but acceptable as a separate gate step).
-### Mitigations
-- If a specific ESLint-only rule is needed, evaluate case-by-case. Most are cosmetic — Biome's defaults are sufficient for a deterministic quality gate.
-- `tsc --noEmit` can be run with `--skipLibCheck` (which skips type-checking `.d.ts` declaration files) for speed.
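The ADR-013 Phase 16 gate is three CLI steps, and each one passes or fails on its exit code. A runner for it could look like the sketch below. Several details are assumptions: `biome`, `tsc`, and `fallow` are taken to be on `PATH` (for example via `node_modules/.bin`), the step labels and the fail-fast ordering are illustrative choices, and the `fallow audit --changed-since main` arguments are copied from the ADR without verification. The executor can be injected so the gate logic can be tested without spawning processes.

```typescript
import { spawnSync } from "node:child_process";

interface GateStep {
  label: string;
  cmd: string;
  args: string[];
}
interface GateResult {
  passed: boolean;
  failedStep?: string;
}
// Returns the process exit code for one command.
type Exec = (cmd: string, args: string[]) => number;

// The three deterministic steps (uses `--write`, Biome 2's flag for applying fixes).
const PHASE16_STEPS: GateStep[] = [
  { label: "biome", cmd: "biome", args: ["check", "--write"] },
  { label: "tsc", cmd: "tsc", args: ["--noEmit"] },
  { label: "fallow", cmd: "fallow", args: ["audit", "--changed-since", "main"] },
];

const realExec: Exec = (cmd, args) => {
  const r = spawnSync(cmd, args, { stdio: "inherit" });
  // A null status (spawn error or termination by signal) counts as failure.
  return r.status ?? 1;
};

// Runs steps in order and stops at the first non-zero exit code. No LLM involvement.
function runGate(steps: GateStep[], exec: Exec = realExec): GateResult {
  for (const step of steps) {
    if (exec(step.cmd, step.args) !== 0) return { passed: false, failedStep: step.label };
  }
  return { passed: true };
}
```

Calling `runGate(PHASE16_STEPS)` in a project with the tools installed runs the real commands. In tests, a fake `exec` makes the fail-fast behavior observable.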

package/vault/wiki/decisions/adr-014.md DELETED

@@ -1,73 +0,0 @@
----
-type: decision
-title: "ADR-014: isolated-vm for P43 TypeScript Execution Sandbox"
-status: accepted
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, p43, typescript-execution, sandbox, isolated-vm, security]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[adr-012]]"
-related:
-  - "[[adr-012]]"
-supersedes:
-created: 2026-05-02
-updated: 2026-05-02
----
-# ADR-014: isolated-vm for P43 TypeScript Execution Sandbox
-## Context
-P43 TypeScript Execution Layer replaces flat tool calling with a single `write_ts` tool backed by a sandboxed runtime. Agent writes TypeScript orchestrating tools; runtime executes the code.
-Three sandbox options evaluated:
-| | Node.js VM (`node:vm`) | Deno subprocess | `isolated-vm` |
-|---|---|---|---|
-| Isolation | Weak — same process, `process.exit()` kills harness | Strong — separate OS process | Strong — V8 isolate, separate heap |
-| Performance | Fastest — no IPC | IPC overhead per tool call | Fast — in-process but isolated |
-| Setup | Zero deps | Install Deno (new runtime dep) | Native addon (C++ compilation) |
-| Node compat | Full | Partial (Deno APIs differ) | Full |
-| Security | Low — not a security mechanism per the Node.js docs; untrusted code can escape the context and reach the host `process` | Medium — `--allow-*` flags | High — no `require` unless granted |
-| Maturity | Built-in | Young | Mature (7K+ stars, Fly.io, Netlify) |
-Pi runs on Node.js. Adding Deno as a dependency for just the sandbox is heavy. `node:vm` is too weak — `process.exit()` kills the harness. PRD P38 (OS-level sandbox with bubblewrap/Seatbelt) is a separate phase and won't be ready when P43 ships.
-## Decision
-**Use `isolated-vm` as the P43 sandbox runtime.**
-- Separate V8 isolate with its own heap. Cannot crash the harness.
-- No `require` access unless explicitly granted via the sandbox API.
-- Tool functions (`read`, `edit`, `bash`, `find`, `grep`, `ck_search`) are exposed via explicit host function registration, not via Node.js module resolution.
-- TypeScript agent code is compiled to plain JS (`tsc` type-strips, esbuild bundles) before injection into the isolate.
-- P38 bubblewrap/Seatbelt adds defense-in-depth later. `isolated-vm` is the inner sandbox; P38 is the outer sandbox.
-### Fallback
-If `isolated-vm` native addon compilation fails in a given environment, fall back to `node:vm` + P38 bubblewrap as the outer enforcement layer. The fallback is less secure but functional.
-## Rationale
-- **Security**: The agent writes arbitrary TypeScript. We cannot trust it. `isolated-vm` limits blast radius to the isolate.
-- **Performance**: In-process. No IPC overhead. Tool calls dispatch via typed host functions.
-- **Maturity**: 7K+ GitHub stars. Used by Fly.io for customer code execution and Netlify for edge functions. Battle-tested.
-- **Composability**: P43 sandbox serves double duty as P15b pre-verification sandbox. Same isolate, different execution context.
-## Consequences
-### Positive
-- Strong isolation without process overhead.
-- Reuses same sandbox for P15b pre-verification.
-- Explicit host function registration = auditable tool surface.
-### Negative
-- Native addon requires C++ build toolchain (`node-gyp`). Adds `dev` setup step.
-- Not available in all environments (e.g., some CI runners without C++ toolchain). Fallback needed.
-- Learning curve — `isolated-vm` API differs from `node:vm`.
-### Mitigations
-- Document `isolated-vm` build requirements in README.
-- Implement `node:vm` fallback path from day one.
-- P38 OS-level sandbox provides outer defense for fallback mode.
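ADR-014 names "explicit host function registration = auditable tool surface" as a benefit. The sketch below shows that allowlist pattern on its own. It is hypothetical and does not use the `isolated-vm` API. In a real sandbox, each registered function would be bridged into the isolate (for example through `isolated-vm`'s `Reference` objects). The class name, the string-only signatures, and the mode labels are assumptions made for illustration.

```typescript
// Host functions take and return strings here only to keep the sketch small.
type HostFn = (...args: string[]) => string;

// Hypothetical allowlist: agent code can reach only what is registered here.
class HostSurface {
  private readonly fns = new Map<string, HostFn>();

  register(name: string, fn: HostFn): this {
    if (this.fns.has(name)) throw new Error(`duplicate host function: ${name}`);
    this.fns.set(name, fn);
    return this;
  }

  // Single dispatch point, so every sandbox-to-host call can be logged and audited.
  call(name: string, args: string[]): string {
    const fn = this.fns.get(name);
    if (!fn) throw new Error(`host function not granted: ${name}`);
    return fn(...args);
  }

  names(): string[] {
    return Array.from(this.fns.keys()).sort();
  }
}

type SandboxMode = "isolated-vm" | "node-vm+os-sandbox";

// Fallback rule from the ADR: if the native addon cannot load, use node:vm behind P38.
function chooseSandbox(isolatedVmLoaded: boolean): SandboxMode {
  return isolatedVmLoaded ? "isolated-vm" : "node-vm+os-sandbox";
}

// Usage: only `read` and `grep` are granted; `bash` is not.
const surface = new HostSurface()
  .register("read", (path) => `contents of ${path}`)
  .register("grep", (pattern) => `matches for ${pattern}`);
console.log(surface.call("read", ["src/a.ts"])); // prints: contents of src/a.ts
```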

package/vault/wiki/decisions/adr-015.md DELETED

@@ -1,81 +0,0 @@
----
-type: decision
-title: "ADR-015: Pipeline-First Build Order"
-status: accepted
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, build-order, mvp, incremental-delivery]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[adr-012]]"
-  - "[[adr-014]]"
-related:
-  - "[[adr-012]]"
-  - "[[adr-014]]"
-supersedes: "PRD Section 16.1 (original 10-group build order)"
-created: 2026-05-02
-updated: 2026-05-02
----
-# ADR-015: Pipeline-First Build Order
-## Context
-The original PRD specified 10 build groups with P43 (TypeScript Execution Layer) in Group 6 — after L1, L2, L3, L2.5, L4, and Post-Verification. Two competing strategies emerged:
-- **Option A (P43-first)**: Foundation → L1/L2 → P43 → L3 survivors → L2.5 → L4. Front-load the biggest context reduction.
-- **Option B (Pipeline-first)**: Foundation → L1/L2 → L2.5 → L4 → P43 + L3 survivors → Post-Verification. Validate quality gates before investing in execution layer.
-Initial preference was Option A to avoid rebuilding L3 integration. Re-evaluated: L2.5 (drift monitor) and L4 (adversarial verification) do not depend on P43. They work with pi's existing flat tool calling. Validating the full L1→L2→L2.5→L4 pipeline before P43 means we prove the gate model works before committing to the execution layer.
-## Decision
-**Pipeline-first (Option B). Validate gates before P43 investment.**
-### New Build Order
-```
-Group 1: Foundation (F0) + L1 Spec Hardening + L2 Structured Planning
-Group 2: L2.5 Runtime Drift Monitor (rule-based, works with pi's existing tool calling)
-Group 3: L4 Adversarial Verification (critic agents, selective debate)
-Group 4: P43 TypeScript Execution Layer + L3 survivors (P8/P9/P11/P13/P15)
-Group 5: Post-Verification (P20-P24: lint gate, observability, memory, orchestration, wiki query)
-Group 6: Cross-Cutting Capabilities (P25-P42: router, anxiety guard, error class, browser, hooks, compaction, permissions, etc.)
-Group 7: Self-Evolving Infrastructure (P45-P48: auto-optimize, behaviour harness, auto-learn, sandbox infra)
-```
-### L3 Survivors Absorbed into P43
-P8 (grounding checkpoints), P9 (AST truncation), P11 (inline validation), P12 (post-edit hooks), P13 (ck search), and P15 (gitingest) are implemented inside P43's `isolated-vm` runtime, not on top of flat tools. P14 (Think-in-Code) is absorbed by P43 — P43 IS think-in-code. P10 (fuzzy edit matching) moves into P43's `edit()` host function. P15b (pre-verification sandbox) reuses the same `isolated-vm` isolate.
-### Incremental Delivery
-1. **After Group 1**: Harness blocks ambiguous tasks. Specs hardened. Plans structured.
-2. **After Group 2**: Agent stuckness detected and auto-corrected. Drift spirals prevented.
-3. **After Group 3**: Every change passes critic attack. Consensus debates filed.
-4. **After Group 4**: 3-4x context reduction on all tool workflows.
-5. **After Group 5**: Keep Rate tracked. Memory persists. Pipeline orchestrated.
-6. **After Group 6**: Full SOTA harness feature set.
-7. **After Group 7**: Harness self-evolves.
-## Rationale
-- **Risk reduction**: P43 is the hardest single phase (CodeAct-level complexity). Validating the simpler gate pipeline (L1/L2/L2.5/L4) first proves the architecture before committing to the execution layer.
-- **No throwaway work**: L2.5 and L4 work with any tool-calling mechanism. When P43 arrives, drift monitor and critics monitor P43 tool calls the same way they monitor flat tool calls — through `tool_result` events.
-- **Faster to first working pipeline**: Groups 1-3 produce an end-to-end harness (harden → plan → monitor drift → verify) in ~7 weeks. Users get value before P43.
-- **Parallelizable**: Group 4 (P43) can begin in parallel with Groups 2-3 if multiple developers/agents are available.
-## Consequences
-### Positive
-- Gates proven before execution layer.
-- Earlier user value.
-- P43 benefits from lessons learned in Groups 2-3 about tool calling patterns.
-### Negative
-- More total time to 3-4x context reduction (P43 at Group 4 vs Group 2).
-- L3 survivors (P8/P9/P11) delayed until P43 ships — grounding checkpoints not available in Groups 1-3.
-### Mitigations
-- If P43 proves simpler than expected, Group 4 can be fast-tracked.
-- Drift monitor (Group 2) provides partial grounding — catches context loops even without formal checkpoints.
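ADR-015's core argument is a dependency claim: L2.5 and L4 do not depend on P43, so P43 can run in parallel with them. That claim can be checked mechanically. The edges below are hypothetical; they are read off the ADR prose, not taken from the PRD. Each phase's earliest level is `1 + max(level of its dependencies)`. A second helper flags any order that schedules a phase before one of its dependencies.

```typescript
// Hypothetical dependency edges, inferred from ADR-015's prose.
const DEPS: Record<string, string[]> = {
  F0: [],
  L1: ["F0"],
  L2: ["L1"],
  "L2.5": ["L2"], // drift monitor works with flat tool calling: no P43 edge
  L4: ["L2"], // critics likewise do not need P43
  P43: ["L2"],
  "L3-survivors": ["P43"], // P8/P9/P11/... are implemented inside P43
  "Post-Verification": ["L4", "L3-survivors"],
};

// Earliest level per phase: 0 with no deps, else 1 + max(dep levels). Throws on cycles.
function earliestLevels(deps: Record<string, string[]>): Map<string, number> {
  const levels = new Map<string, number>();
  const visiting = new Set<string>();
  const visit = (node: string): number => {
    const known = levels.get(node);
    if (known !== undefined) return known;
    if (visiting.has(node)) throw new Error(`cycle at ${node}`);
    visiting.add(node);
    const ds = deps[node] ?? [];
    const level = ds.length === 0 ? 0 : 1 + Math.max(...ds.map((d) => visit(d)));
    visiting.delete(node);
    levels.set(node, level);
    return level;
  };
  Object.keys(deps).forEach((node) => visit(node));
  return levels;
}

// Lists every phase in `order` that appears before (or without) one of its dependencies.
function orderViolations(order: string[], deps: Record<string, string[]>): string[] {
  const pos = new Map<string, number>(order.map((p, i) => [p, i] as [string, number]));
  const out: string[] = [];
  for (const [i, phase] of order.entries()) {
    for (const dep of deps[phase] ?? []) {
      const depPos = pos.get(dep);
      if (depPos === undefined || depPos > i) out.push(`${phase} scheduled before ${dep}`);
    }
  }
  return out;
}

// Usage: P43, L2.5 and L4 share a level, so they can proceed in parallel.
const levels = earliestLevels(DEPS);
console.log(levels.get("P43"), levels.get("L2.5"), levels.get("L4")); // prints: 3 3 3
```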

package/vault/wiki/decisions/adr-016.md DELETED

@@ -1,91 +0,0 @@
----
-type: decision
-title: "ADR-016: @tintinweb/pi-subagents for L4 Critic and Sub-Agent Infrastructure"
-status: accepted
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, l4, subagents, critic, pi-subagents, tintinweb]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[adr-011]]"
-  - "[[adr-012]]"
-related:
-  - "[[adr-011]]"
-  - "[[adr-012]]"
-supersedes:
-created: 2026-05-02
-updated: 2026-05-02
----
-# ADR-016: @tintinweb/pi-subagents for L4 Critic and Sub-Agent Infrastructure
-## Context
-L4 Adversarial Verification requires a separate agent process (critic) with its own context window, system prompt, and tool set. ADR-011 specifies multi-agent debate with separate sessions. ADR-012 specifies extension-based integration without forking pi.
-Two existing pi subagent packages were evaluated:
-- `@tintinweb/pi-subagents` (v0.6.3, 26 versions) — full-featured: RPC, event bus, custom agents, worktree isolation, memory, graceful turn limits
-- `@mjakl/pi-subagent` (v1.4.1) — minimal: depth guards, cycle prevention, spawn/fork modes. No RPC or event bus.
-Pi's philosophy: "No sub-agents built in. Build your own with extensions, or install a package." Both packages follow this model — they are pi extensions that spawn sub-agents as separate pi processes.
-## Decision
-**Use `@tintinweb/pi-subagents` as the sub-agent infrastructure. Define L4 critic as a custom agent type.**
-### Critic Agent Definition
-`.pi/agents/critic.md`:
-```yaml
----
-description: Adversarial code reviewer — attacks code changes with hard-threshold pass/fail criteria
-tools: read, grep, find, ls, bash
-model: inherit
-thinking: high
-max_turns: 15
-prompt_mode: replace
----
-```
-Critic runs with `prompt_mode: replace` — standalone system prompt, no parent context inheritance. This ensures true generator-evaluator separation (FP #8). The critic's system prompt contains hard-threshold pass/fail criteria extracted from the sprint contract.
-### Harness Integration
-The harness extension uses the package's cross-extension RPC to spawn and manage critics:
-1. Harness writes critic prompt to `.pi/harness/critics/<spec-hash>.md` (spec, diff, criteria)
-2. Harness emits `subagents:rpc:spawn` with `type: "critic"`, `prompt: "@.pi/harness/critics/<hash>.md"`
-3. Critic runs in separate pi process (`prompt_mode: replace` = clean context)
-4. Harness listens for `subagents:completed` event to get verdict
-5. Harness files consensus to `wiki/consensus/` (ADR-011)
-### Multi-Round Debate
-For selective multi-round debate (ADR-011), the harness can spawn multiple critic agents with different attack angles and use `steer_subagent` via RPC to inject counter-arguments.
-## Rationale
-- **Event bus + RPC**: The cross-extension RPC (`subagents:rpc:spawn`, `subagents:rpc:stop`) is essential for programmatic harness integration. `@mjakl/pi-subagent` lacks this.
-- **Separate pi processes**: Each sub-agent gets its own context window, model, and tool set. True adversarial separation.
-- **Mature**: 26 versions, active maintenance, ~4.6K monthly downloads.
-- **Custom agent types via `.md` files**: Clean, declarative. No code changes to define new agent roles.
-- **Graceful turn limits**: Critic won't spin forever. Gets wrap-up warning before abort.
-- **Compatibility**: Pi-native. Uses pi's session management, tool system, and extension API. No external LLM SDK needed.
-## Consequences
-### Positive
-- L4 critic runs in isolated context. No generator-evaluator contamination.
-- Extensible to other sub-agent roles (P25 specialization router, P30 browser agent).
-- Event bus enables other extensions to react to sub-agent lifecycle.
-### Negative
-- New dependency: `@tintinweb/pi-subagents`. Must be installed via `pi install npm:@tintinweb/pi-subagents`.
-- Sub-agent token cost is additive (critic tokens + proposer tokens). Mitigated by selective debate routing (ADR-011).
-- Relies on third-party package maintenance. If abandoned, fallback to direct pi SDK usage.
-### Mitigations
-- Package is MIT licensed. Can be forked and maintained if needed.
-- Fallback: direct `createAgentSession()` SDK usage if the package becomes unavailable.
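The ADR-016 harness integration steps (write a prompt file keyed by spec hash, emit `subagents:rpc:spawn`, wait for `subagents:completed`) could be wired as follows. This is a minimal sketch and makes several assumptions. `FakeBus` stands in for the package's cross-extension event bus. The `requestId` correlation field and the `output` field are hypothetical, not documented payload fields. The 12-character SHA-256 prefix is an arbitrary choice. Writing the prompt file to disk is omitted.

```typescript
import { createHash } from "node:crypto";

// Hypothetical payload shapes; only `type` and `prompt` come from the ADR.
interface SpawnRequest {
  type: "critic";
  prompt: string;
  requestId: string;
}
interface CompletedEvent {
  requestId: string;
  output: string;
}

// In-process stand-in for the cross-extension event bus.
class FakeBus {
  private readonly listeners = new Map<string, Array<(payload: unknown) => void>>();

  on(channel: string, fn: (payload: unknown) => void): void {
    const list = this.listeners.get(channel) ?? [];
    list.push(fn);
    this.listeners.set(channel, list);
  }

  emit(channel: string, payload: unknown): void {
    for (const fn of this.listeners.get(channel) ?? []) fn(payload);
  }
}

// Deterministic prompt location: same spec, same file.
function criticPromptPath(spec: string): string {
  const hash = createHash("sha256").update(spec).digest("hex").slice(0, 12);
  return `.pi/harness/critics/${hash}.md`;
}

// Spawns a critic and resolves with its output. A real integration would also unsubscribe.
function spawnCritic(bus: FakeBus, spec: string, requestId: string): Promise<string> {
  return new Promise((resolve) => {
    bus.on("subagents:completed", (payload) => {
      const done = payload as CompletedEvent;
      if (done.requestId === requestId) resolve(done.output);
    });
    const req: SpawnRequest = { type: "critic", prompt: `@${criticPromptPath(spec)}`, requestId };
    bus.emit("subagents:rpc:spawn", req);
  });
}

// Usage: a fake critic that answers every spawn request immediately.
const bus = new FakeBus();
bus.on("subagents:rpc:spawn", (payload) => {
  const req = payload as SpawnRequest;
  const done: CompletedEvent = { requestId: req.requestId, output: `PASS for ${req.prompt}` };
  bus.emit("subagents:completed", done);
});
spawnCritic(bus, "spec + diff + criteria", "req-1").then((verdict) => console.log(verdict));
```

The completion listener is registered before the spawn is emitted, so a critic that replies synchronously is not missed.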

package/vault/wiki/decisions/adr-017.md DELETED

@@ -1,79 +0,0 @@
----
-type: decision
-title: "ADR-017: Harness Project Structure — src/harness/ Library + Extension Wiring"
-status: superseded
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, project-structure, foundation, f0]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[adr-012]]"
-related:
-  - "[[adr-012]]"
-  - "[[skill-first-architecture]]"
-supersedes: "PRD Section 17 (original lib/ file structure)"
-superseded_by: "Pi built-in event bus (2026-05-04) — custom event bus no longer needed"
-created: 2026-05-02
-updated: 2026-05-04
----
-# ADR-017: Harness Project Structure
-## Context
-The PRD specified a `lib/` directory with ~30 TypeScript files for harness logic. The project is a pi package with `.pi/extensions/` and `.pi/skills/`, not a standalone Node.js library. The integration model (ADR-012) is extension-based — harness logic wires into pi's `ExtensionAPI`.
-> [!warning] Superseded (2026-05-04)
-> Pi's latest version ships a built-in event bus, making the custom `events.ts` and `harness-event-bus.ts` wiring layer redundant. The code layer now consists of 3 files: `types.ts`, `config.ts`, `drift-monitor.ts`. Skills register directly with pi's native event bus. See [[skill-first-architecture]] for the updated architecture.
-Three structures considered:
-- ~~**Monolithic extension**: all logic in `.pi/extensions/harness-event-bus.ts`~~ (event bus removed)
-- **Multiple extensions**: one per layer (`.pi/extensions/harness-l1.ts`, etc.)
-- ~~**Library + wiring**: `src/harness/` for pure logic, `.pi/extensions/harness-event-bus.ts` for pi integration~~ (event bus removed)
-## Decision
-**Use `src/harness/` as the harness library. Skills register with pi's built-in event bus directly (no custom event bus needed).**
-```
-src/harness/
-  types.ts           # All harness types (Spec, Plan, DriftEvent, CriticVerdict, Config)
-  config.ts          # Load .pi/harness/config.json with code defaults
-  drift-monitor.ts   # L2.5: LLM-first drift detection + rule pre-filter
-.pi/extensions/
-  wiki-hooks.ts         # Existing (unchanged)
-  dotenv-loader.ts      # Existing (unchanged)
-.pi/agents/
-  critic.md             # L4 critic agent definition (ADR-016)
-```
-### Rules
-- `src/harness/` modules are **pure TypeScript**. No pi imports (`ExtensionAPI`, etc.). Testable without pi runtime.
-- Skills register event handlers directly with pi's built-in event bus — no custom wiring extension needed.
-- Shared state between harness modules uses pi's native event bus and typed interfaces in `types.ts`.
-## Rationale
-- **Separation of concerns**: Harness logic (spec hardening, drift detection, critic management) is independent of pi's API. Can be tested with plain vitest.
-- **Preserves PRD modularity**: The 30-file structure condenses into `src/harness/` modules but maintains the same logical separation.
-- **Single extension load**: pi loads one harness extension. No startup ordering issues.
-- **Minimal pi surface**: Pi's built-in event bus handles all Event API calls. Skills register directly with pi's native events.
-## Consequences
-### Positive
-- Testable without pi runtime.
-- Clean dependency direction: `pi native event bus → skills → src/harness/ → nothing external`.
-- Fits standard TypeScript project structure.
-- Fewer files: 3 code files vs 4 (event bus removed).
-### Negative
-- `src/harness/` modules must avoid importing from `@mariozechner/pi-coding-agent`. Type-only imports are OK.
-- Skills must correctly register with pi's built-in event bus API (pi's responsibility, not ours).
-### Mitigations
-- Pi extensions are TypeScript natively — pi runs them via `tsx`. No build step needed for development.
-- Type-only imports from the pi SDK are safe (`import type { ExtensionAPI } from "@mariozechner/pi-coding-agent"`) because they are erased at compile time.
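The ADR-017 rule that `src/harness/` modules stay pure TypeScript means their logic can be unit-tested without the pi runtime. The module below sketches that idea. It is hypothetical: it is only the rule pre-filter for one drift pattern, and the record shape and function name are illustrative. The default threshold of 3 mirrors `repetitionLoops.threshold` in the ADR-018 example config.

```typescript
// Hypothetical src/harness/drift-monitor.ts: pure TypeScript, no runtime pi imports.
// If pi types are needed, a type-only import is erased at compile time, e.g.:
//   import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

export interface ToolCallRecord {
  tool: string;
  argsKey: string; // normalized arguments, e.g. a JSON string
}

export interface DriftEvent {
  pattern: "repetitionLoop";
  tool: string;
  count: number;
}

// Rule pre-filter: flag when the latest tool call repeats, with identical
// arguments, at least `threshold` times in a row at the end of the history.
export function detectRepetitionLoop(history: ToolCallRecord[], threshold = 3): DriftEvent | null {
  if (history.length === 0) return null;
  const last = history[history.length - 1];
  let count = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const h = history[i];
    if (h.tool !== last.tool || h.argsKey !== last.argsKey) break;
    count++;
  }
  return count >= threshold ? { pattern: "repetitionLoop", tool: last.tool, count } : null;
}
```

Because nothing here touches pi, the same function can be exercised with plain vitest, as the ADR intends.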

package/vault/wiki/decisions/adr-018.md DELETED

@@ -1,100 +0,0 @@
----
-type: decision
-title: "ADR-018: Single Harness Config File — .pi/harness/config.json"
-status: accepted
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, config, foundation, f0]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[adr-017]]"
-related:
-  - "[[adr-017]]"
-supersedes: "PRD Section 17 (multiple config files)"
-created: 2026-05-02
-updated: 2026-05-02
----
-# ADR-018: Single Harness Config File
-## Context
-The PRD specified multiple harness config files (`.pi/harness/drift-monitor.json`, `.pi/harness/ts-exec.json`, `.pi/harness/fallow-gate.json`, etc.). This fragments configuration and adds cognitive overhead. Pi already has `.pi/settings.json` for its own config.
-Three approaches considered:
-- **Separate files per subsystem** — original PRD approach
-- **Extend `.pi/settings.json`** with a `harness` key — mixes harness config with pi config
-- **Single `.pi/harness/config.json`** with all harness settings
-## Decision
-**Use a single `.pi/harness/config.json` file. Project-local. No cascade. Defaults in code.**
-```json
-{
-  "driftMonitor": {
-    "enabled": true,
-    "patterns": {
-      "repetitionLoops": { "threshold": 3 },
-      "failureSpirals": { "threshold": 3 },
-      "toolCycling": { "threshold": 5 },
-      "silenceBatching": { "threshold": 6 },
-      "rework": { "threshold": 2 },
-      "excessiveSearch": { "threshold": 8 }
-    },
-    "escalation": {
-      "softNudgeAfter": 2,
-      "strongNudgeAfter": 4,
-      "restartAfter": 6
-    }
-  },
-  "critics": {
-    "maxRounds": 3,
-    "maxTokensPerRound": 6000,
-    "model": "inherit"
-  },
-  "specs": {
-    "storagePath": ".pi/harness/specs",
-    "maxClarificationRounds": 3
-  },
-  "debate": {
-    "enabled": true,
-    "gatingMode": "imad",
-    "budget": {
-      "l1MaxTokens": 6000,
-      "l2MaxTokens": 10000,
-      "l4MaxTokens": 8000
-    }
-  },
-  "phase16": {
-    "biome": true,
-    "tsc": true,
-    "fallow": false
-  }
-}
-```
-### Rules
-- All keys have defaults in `src/harness/config.ts`. User config merges on top.
-- File is project-local only (`.pi/harness/config.json`). No global cascade.
-- User creates from `harness.example.json` or edits by hand.
-- Missing file = all defaults. No error.
-## Rationale
-- **Single source of truth**: One file to understand and edit. No hunting across multiple files.
-- **Defaults in code**: Sensible defaults ship with the harness. Users only override what they need.
-- **No cascade complexity**: Project-local only. Avoids implementing a separate cascade system when pi already has one for its settings.
-- **Flat structure**: Top-level keys correspond to harness subsystems. Clear ownership.
-## Consequences
-### Positive
-- Simple. One file to read, one file to write.
-- Discoverable — single `harness.example.json` shows all options.
-- Merge from code defaults means config file can be minimal.
-### Negative
-- File grows as subsystems are added. Mitigated by flat top-level keys.
-- No per-user global defaults. Users who want the same harness config across projects must copy the file.
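The ADR-018 rules (defaults in code, user config merged on top, a missing file yielding all defaults) could be implemented with a recursive merge like the one below. This is a minimal sketch under stated assumptions. `DEFAULTS` is a subset of the example config. The choice that arrays and scalars from the user file replace the defaults rather than merge with them is an assumption, as are the helper names.

```typescript
import { existsSync, readFileSync } from "node:fs";

type Json = { [key: string]: unknown };

// Subset of the ADR-018 defaults; the full set would live in src/harness/config.ts.
const DEFAULTS: Json = {
  driftMonitor: { enabled: true, escalation: { softNudgeAfter: 2, strongNudgeAfter: 4, restartAfter: 6 } },
  critics: { maxRounds: 3, maxTokensPerRound: 6000, model: "inherit" },
  specs: { storagePath: ".pi/harness/specs", maxClarificationRounds: 3 },
  phase16: { biome: true, tsc: true, fallow: false },
};

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Overlays `override` onto `base` recursively without mutating either input.
// Arrays and scalars from the override replace the base value outright.
function deepMerge(base: Json, override: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const current = out[key];
    out[key] = isPlainObject(current) && isPlainObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

// Project-local only, no cascade. A missing file means all defaults, with no error.
function loadHarnessConfig(path = ".pi/harness/config.json"): Json {
  if (!existsSync(path)) return DEFAULTS;
  const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
  if (!isPlainObject(parsed)) throw new Error(`${path}: expected a JSON object`);
  return deepMerge(DEFAULTS, parsed);
}
```

With this merge, a user file containing only `{ "critics": { "maxRounds": 5 } }` changes that one value and keeps every other default.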

package/vault/wiki/decisions/adr-019.md DELETED

@@ -1,75 +0,0 @@
----
-type: decision
-title: "ADR-019: Tool-Based Q&A for L1 Spec Clarification"
-status: accepted
-priority: 1
-date: "2026-05-02"
-tags: [adr, harness, l1, spec-hardening, qa, tool]
-sources:
-  - "[[HARNESS-PRD]]"
-  - "[[adr-012]]"
-  - "[[adr-017]]"
-related:
-  - "[[adr-012]]"
-  - "[[adr-017]]"
-supersedes:
-created: 2026-05-02
-updated: 2026-05-02
----
-# ADR-019: Tool-Based Q&A for L1 Spec Clarification
-## Context
-L1 spec hardening may detect unresolved ambiguities. When automatic resolution fails (clarification loop exhausts), the harness must surface structured questions to the user. The harness extension has no direct UI — it must communicate through the LLM or via registered tools.
-Three approaches considered:
-- **System prompt injection**: LLM rephrases and asks user. Fragile — harness must parse LLM's rephrasing.
-- **Tool-based Q&A**: Harness registers a `harness_ask` tool. LLM calls it with structured questions. Tool handles user interaction via pi's TUI.
-- **Pre-execution gate**: Block before LLM sees the task. Poor UX in pi's conversation model.
-## Decision
-**Register a `harness_ask` tool that the LLM calls when L1 requires user clarification.**
-### Flow
-1. L1 ambiguity detector finds unresolved decisions in user request
-2. Harness injects system prompt: "Call `harness_ask` to clarify these ambiguities before proceeding"
-3. LLM calls `harness_ask({ questions: [{ id, question, options? }] })`
-4. Tool presents structured questions in pi's TUI (using `ctx.ui` API, same pattern as `wiki-hooks.ts` notifications)
-5. User answers via structured input (select from options, free text)
-6. Tool returns `{ answers: [{ id, answer }] }` to LLM
-7. Harness re-checks spec hardness. If resolved, proceed. If not, loop.
-### Fallback
-If `harness_ask` tool registration fails or pi's TUI API is insufficient, fall back to system prompt injection: "ASK THE USER THESE EXACT QUESTIONS: ...". The LLM becomes the intermediary.
-### Constraints
-- Maximum 3 clarification rounds per spec (configurable in `.pi/harness/config.json` → `specs.maxClarificationRounds`)
-- Questions must be multiple-choice when possible (reduces user effort, prevents LLM reinterpretation)
-- User can skip individual questions (allow partial resolution)
-## Rationale
-- **Structured**: Harness formats questions. LLM doesn't re-interpret. Answers are typed.
-- **Natural UX**: LLM mediates the conversation but harness controls the questions.
-- **Proven pattern**: `@tintinweb/pi-subagents` uses pi's TUI for agent widgets and conversation viewers. Tool-based UI is established in pi's extension model.
-- **Extensible**: Same `harness_ask` tool can be used by L2 (plan clarification) and L4 (critic follow-up questions).
-## Consequences
-### Positive
-- Structured Q&A prevents LLM from rephrasing or skipping questions.
-- Reusable across pipeline layers.
-- User sees clear, intentional questions — not LLM-generated ambiguity.
-### Negative
-- Requires pi TUI API support. If insufficient, falls back to system prompt injection (less reliable).
-- Adds latency — tool call round-trip for every clarification round.
-### Mitigations
-- Multiple questions batched in a single `harness_ask` call.
-- `maxClarificationRounds: 3` prevents infinite loops.
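The ADR-019 `harness_ask` flow (batched questions, optional multiple choice, skippable answers, and a round cap) can be sketched as a small session object. The class and method names are hypothetical, and the `prompt` callback stands in for pi's TUI (`ctx.ui`), whose actual API is not shown here. Rejecting answers outside the offered options is an assumption that goes slightly beyond the ADR text.

```typescript
interface Question {
  id: string;
  question: string;
  options?: string[];
}
interface Answer {
  id: string;
  answer: string | null; // null = skipped by the user
}
// Stand-in for the TUI interaction: returns the user's answer, or null to skip.
type Prompter = (q: Question) => string | null;

// Hypothetical round tracker enforcing specs.maxClarificationRounds.
class ClarificationSession {
  private rounds = 0;
  private readonly maxRounds: number;

  constructor(maxRounds = 3) {
    this.maxRounds = maxRounds;
  }

  // Handles one batched harness_ask call and returns typed answers to the LLM.
  ask(questions: Question[], prompt: Prompter): { answers: Answer[] } {
    if (this.rounds >= this.maxRounds) {
      throw new Error(`clarification limit (${this.maxRounds}) reached`);
    }
    this.rounds++;
    const answers = questions.map((q): Answer => {
      const raw = prompt(q);
      // Multiple-choice questions only accept one of the offered options.
      if (raw !== null && q.options && !q.options.includes(raw)) {
        throw new Error(`answer for ${q.id} is not one of the offered options`);
      }
      return { id: q.id, answer: raw };
    });
    return { answers };
  }

  // IDs still unresolved after a round; the harness re-checks spec hardness with these.
  unresolved(answers: Answer[]): string[] {
    return answers.filter((a) => a.answer === null).map((a) => a.id);
  }
}

// Usage: one batched round with a multiple-choice question and a skipped free-text one.
const session = new ClarificationSession(3);
const round = session.ask(
  [
    { id: "storage", question: "Where should specs be stored?", options: [".pi/harness/specs", "docs/specs"] },
    { id: "scope", question: "Any modules to exclude?" },
  ],
  (q) => (q.options ? q.options[0] : null),
);
console.log(session.unresolved(round.answers)); // prints: [ 'scope' ]
```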