npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.3 - Mend

ultimate-pi 0.1.2 → 0.1.3


Files changed (516)

package/vault/wiki/questions/mvp-implementation-blueprint.md ADDED

@@ -0,0 +1,552 @@
+---
+type: question
+title: "MVP Implementation Blueprint for Agentic Coding Harness (Skill-First v2)"
+question: "Rethought from first principles: MVP implementation plan where spec-hardening and harness layers are markdown-based skills, only drift monitoring remains as code. Event bus handled by pi's built-in system."
+answer_quality: solid
+created: 2026-05-03
+updated: 2026-05-03
+tags: [question, harness, mvp, implementation, build-plan, skill-first, v2]
+related:
+  - "[[Research: Skill-First Harness Architecture]]"
+  - "[[skill-first-architecture]]"
+  - "[[harness-implementation-plan]]"
+  - "[[HARNESS-PRD]]"
+  - "[[adr-012]]"
+  - "[[adr-015]]"
+  - "[[adr-017]]"
+  - "[[adr-021]]"
+  - "[[adr-018]]"
+  - "[[adr-019]]"
+  - "[[adr-020]]"
+  - "[[adr-022]]"
+  - "[[adr-025]]"
+  - "[[agent-skills-pattern]]"
+  - "[[drift-detection-unified]]"
+sources:
+  - "[[Source: SwirlAI Agent Skills Progressive Disclosure]]"
+  - "[[Source: Claude API Agent Skills Overview]]"
+  - "[[Source: Blake Crosley Agent Architecture Guide]]"
+status: developing
+---
+# MVP Implementation Blueprint: Skill-First Harness (v2)
+## Rethought from First Principles
+**Previous plan (v1)**: 15 TypeScript files in `src/harness/`. Every pipeline layer implemented as code.
+**New plan (v2)**: 3 TypeScript files (drift monitor, types, config). Six markdown-based skills for pipeline layers. Event bus handled by pi's built-in system. Progressive disclosure keeps context lean. Zero-compile iteration for harness behavior changes.
+**Core insight**: The harness is NOT a code pipeline — it's a skill coordination layer. The model is better at evaluation (spec quality, plan correctness, code review) than imperative code. Code is for determinism: the drift monitor MUST pattern-match on every `tool_result`. Pi's built-in event bus handles routing — no custom event bus needed. Everything else is probabilistic evaluation and SHOULD be a skill.
+## MVP Scope
+Per ADR-015 (pipeline-first build order), the MVP is **Groups 1-3 + P20 gate from Group 5** — the full quality pipeline.
+**MVP pipeline**: `/harness "task"` → L1 Spec Hardening (skill) → L2 Structured Planning (skill) → L2.5 Drift Monitor (code) → L3 Agent Execution (flat tools) → L4 Adversarial Verification (skill + agent) → P20 Deterministic Gate (skill + bash) → trace/memory writes.
+| MVP Phase | Implementation | What Gets Built |
+|-----------|---------------|-----------------|
+| **F0** | CODE | Types, config (event bus handled by pi's built-in system) |
+| **P1** | SKILL | L1 Spec Hardening: `harness-spec/SKILL.md` |
+| **P2** | SKILL | L2 Structured Planning: `harness-plan/SKILL.md` |
+| **P3-P7** | CODE | L2.5 Drift Monitor: LLM-first + rule pre-filter + escalation |
+| **P16-P19b** | SKILL + AGENT | L4 Adversarial: `harness-critic/SKILL.md` + `.pi/agents/critic.md` + consensus filing |
+| **P20** | SKILL + BASH | Deterministic gate: `harness-gate/SKILL.md` (biome + tsc + fallow) |
+| **P21-P24** | SKILL + WIKI | L5 Observability + L6 Memory + L7 Orchestration + L8 Wiki |
+---
+## 1. File Structure (Code Layer)
+### What Stays Code (3 files)
+```
+src/harness/
+├── types.ts                # All harness types (Spec, Plan, DriftVerdict, CriticVerdict, Config)
+├── config.ts               # Load .pi/harness/config.json, merge with code defaults
+└── drift-monitor.ts        # L2.5: LLM-first drift detection + 6-rule pre-filter + escalation
+```
+> [!note] Event bus removed
+> Pi's latest version ships a built-in event bus. Skills register directly with pi's native events — no custom `events.ts` or `harness-event-bus.ts` wiring needed.
+### What Becomes Skills (6 directories)
+```
+.pi/skills/
+├── harness-spec/
+│   ├── SKILL.md            # L1: Ambiguity detection, spec hardening, harness_ask tool
+│   └── reference.md        # Ambiguity categories, hardening patterns
+├── harness-plan/
+│   ├── SKILL.md            # L2: YAML task DAG generation, sprint contracts
+│   └── reference.md        # Plan templates, DAG patterns, sprint contract examples
+├── harness-critic/
+│   ├── SKILL.md            # L4: Adversarial attack patterns, debate protocol
+│   └── reference.md        # Attack angle catalog, failure pattern taxonomy
+├── harness-observe/
+│   ├── SKILL.md            # L5: Keep Rate tracking, LLM-as-Judge, satisfaction metrics
+│   └── reference.md        # Metric definitions, sampling strategies
+├── harness-gate/
+│   ├── SKILL.md            # P20: Deterministic gate instructions
+│   └── reference.md        # Gate configuration, baseline management
+└── harness-memory/
+    ├── SKILL.md            # L6: Read-first/write-after wiki contract
+    └── reference.md        # Wiki page templates, staleness rules
+```
+### Extension Wiring
+```
+.pi/
+├── extensions/
+│   ├── wiki-hooks.ts            # Existing (unchanged)
+│   └── dotenv-loader.ts         # Existing (unchanged)
+├── harness/
+│   ├── config.json             # Single config file (ADR-018)
+│   ├── plans/                  # <spec-hash>.yaml plan files (generated by L2 skill)
+│   └── critics/                # Critic temp files (generated by L4 skill)
+├── agents/
+│   └── critic.md               # L4 critic agent definition (ADR-016) — invoked by skill
+└── skills/
+    └── gitingest/SKILL.md      # Bulk repo ingestion (unchanged)
+.github/
+└── ISSUE_TEMPLATE/
+    └── harness-spec.yml        # GitHub Issue template for specs (ADR-025)
+```
+**Key rule**: `src/harness/` modules are pure TypeScript and hold only what must be deterministic (types, config, drift monitor). All other harness logic lives in skills: markdown, not code. Pi's built-in event bus handles event routing.
+---
+## 2. Foundation (F0) — Phase 0 (CODE)
+### 2.1 Harness Types (`src/harness/types.ts`)
+```typescript
+// Spec after hardening (generated by L1 skill)
+export interface HardenedSpec {
+  request: string;
+  intent: string;
+  acceptanceCriteria: {
+    deterministic: DeterministicCriterion[];
+    freeform: string[];
+  };
+  constraints: string[];
+  context: { files: string[]; wiki: string[]; git: { branch: string } };
+  specHash: string;             // SHA256(intent + criteria)
+  clarifiedQuestions: { q: string; a: string }[];
+  createdAt: string;
+}
+// L2 plan (generated by L2 skill)
+export interface SprintPlan {
+  specHash: string;
+  tasks: TaskNode[];
+  generated: string;
+  model: string;
+}
+export interface TaskNode {
+  id: string;
+  description: string;
+  dependsOn: string[];
+  doneCriteria: DoneCriterion[];
+  estimatedTokens: number;
+  checkpoint: boolean;
+}
+export type DoneCriterion =
+  | { type: 'tests_pass'; pattern: string }
+  | { type: 'lint_passes' }
+  | { type: 'typescript_compiles' }
+  | { type: 'no_regression'; baseline: string }
+  | { type: 'spec_requirement'; requirement: string }
+  | { type: 'no_new_dead_code' };
+// L2.5 drift verdict (generated by code)
+export interface DriftVerdict {
+  drifted: boolean;
+  patterns: string[];
+  confidence: number;
+  action: 'continue' | 'nudge' | 'restart';
+}
+// L4 critic verdict (generated by L4 skill)
+export interface CriticVerdict {
+  pass: boolean;
+  failures: { criteria: string; explanation: string }[];
+  score: number;
+  rounds: number;
+  debateResult: 'CONSENSUS_REACHED' | 'DEADLOCK' | 'BUDGET_EXHAUSTED' | 'TIMEOUT';
+}
+export type PipelinePhase = 'idle' | 'l1-spec' | 'l2-plan' | 'l3-execute' | 'l4-verify' | 'p20-gate' | 'l5-l8-trace';
+export interface PipelineState {
+  phase: PipelinePhase;
+  spec?: HardenedSpec;
+  plan?: SprintPlan;
+  currentTaskId?: string;
+  driftHistory: DriftVerdict[];
+  criticVerdict?: CriticVerdict;
+  turnCount: number;
+}
+```
+### 2.2 Config (`src/harness/config.ts` + `.pi/harness/config.json`)
+ADR-018: single file, code defaults, project-local only. Same structure as v1 — unchanged.
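A minimal sketch of the "code defaults merged with a project-local file" behavior described above. The `HarnessConfig` field names are hypothetical; they only mirror limits mentioned elsewhere in this blueprint (3 clarification rounds, a drift check every 8 turns, 3 critic rounds, optional fallow) and are not the actual config schema.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical config shape: field names are illustrative, not the real schema.
interface HarnessConfig {
  maxClarificationRounds: number; // L1 Q&A rounds
  driftCheckEveryTurns: number;   // L2.5 LLM check cadence
  maxCriticRounds: number;        // L4 retry limit
  fallowEnabled: boolean;         // P20 optional fallow audit (default is an assumption)
}

const DEFAULTS: HarnessConfig = {
  maxClarificationRounds: 3,
  driftCheckEveryTurns: 8,
  maxCriticRounds: 3,
  fallowEnabled: false,
};

// Overlay project-local values on top of the code defaults (shallow merge).
function mergeConfig(overrides: Partial<HarnessConfig>): HarnessConfig {
  return { ...DEFAULTS, ...overrides };
}

// Read the single config file if it exists; otherwise fall back to defaults.
function loadConfig(path = ".pi/harness/config.json"): HarnessConfig {
  if (!existsSync(path)) return { ...DEFAULTS };
  const parsed = JSON.parse(readFileSync(path, "utf8")) as Partial<HarnessConfig>;
  return mergeConfig(parsed);
}
```

A shallow merge is enough for a flat config like this one; a nested schema would need a deep merge instead.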
+### 2.3 Event Routing (Pi's Built-in Event Bus)
+Pi's latest version ships a built-in event bus. Skills register directly with pi's native events — no custom `events.ts` or `harness-event-bus.ts` needed.
+```
+Pi Native Event          → Skill Action
+─────────────────────────────────────────────
+session_start            → Load harness config, set state.idle
+before_agent_start       → If /harness command detected:
+                           → Phase L1: activate harness-spec skill
+                           → Phase L2: activate harness-plan skill
+                           → Phase L4: activate harness-critic skill + spawn critic agent
+                           → Phase P20: activate harness-gate skill
+                           → Phase L5-L8: activate harness-observe + harness-memory skills
+tool_result              → Drift monitor (code): pattern-match, check every N turns
+session_compact          → Persist pipeline state for reinjection
+session_shutdown         → Flush consensus, record Keep Rate sample
+```
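The routing table implies a fixed phase order. The sketch below shows one way to step through it using the `PipelinePhase` union from `types.ts`. Returning to `idle` after the trace phase is an assumption, and `nextPhase` is a hypothetical helper, not part of pi's event API.

```typescript
type PipelinePhase =
  | "idle" | "l1-spec" | "l2-plan" | "l3-execute"
  | "l4-verify" | "p20-gate" | "l5-l8-trace";

// Order taken from the MVP pipeline description.
const PHASE_ORDER: PipelinePhase[] = [
  "idle", "l1-spec", "l2-plan", "l3-execute", "l4-verify", "p20-gate", "l5-l8-trace",
];

// Advance to the following phase; wraps back to "idle" after the last one (assumption).
function nextPhase(current: PipelinePhase): PipelinePhase {
  const i = PHASE_ORDER.indexOf(current);
  return PHASE_ORDER[(i + 1) % PHASE_ORDER.length] ?? "idle";
}
```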
+---
+## 3. L1: Spec Hardening — `harness-spec/SKILL.md`
+### Skill Frontmatter
+```yaml
+---
+name: harness-spec
+description: >
+  Hardens user task descriptions into structured specifications.
+  Detects ambiguity, resolves through Q&A, generates spec hash,
+  stores in GitHub Issues. Activates on /harness command.
+  Use when user invokes the harness pipeline.
+allowed-tools: Read, Grep, Glob, Bash, harness_ask
+---
+```
+### SKILL.md Body (Core Instructions)
+The skill body contains step-by-step instructions for the LLM:
+1. **Ambiguity Scan**: Read the user's task description. Identify unresolved decisions: missing bug numbers, unspecified file scopes, absent acceptance criteria, ambiguous constraints.
+2. **Clarification Round** (max 3, configurable): Call `harness_ask` tool with structured questions for each ambiguity. Question format: `{ id, question, options? }`. User answers via TUI.
+3. **Spec Generation**: Once clarified, generate the hardened spec as YAML:
+   - `intent`: disambiguated goal
+   - `acceptanceCriteria`: deterministic (testable) + freeform (L4 critic judges)
+   - `constraints`: hard limits (files, dependencies, performance)
+   - `context`: relevant files, wiki pages, git branch
+4. **Spec Hash**: Compute SHA256(intent + JSON.stringify(criteria)).slice(0, 16).
+5. **GitHub Issue Storage**: Create GitHub Issue using template `.github/ISSUE_TEMPLATE/harness-spec.yml`. Labels: `harness`, `spec`, `in-progress`. Return issue number as spec ID.
+6. **Wiki Write**: Write spec summary to wiki for cross-session traceability.
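Step 4 above can be sketched with Node's built-in `crypto` module. The `Criteria` shape is a simplified stand-in for `acceptanceCriteria` in `types.ts`, and `computeSpecHash` is a hypothetical helper name.

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for HardenedSpec["acceptanceCriteria"].
interface Criteria {
  deterministic: unknown[];
  freeform: string[];
}

// SHA-256 over intent + serialized criteria, truncated to 16 hex characters.
function computeSpecHash(intent: string, criteria: Criteria): string {
  return createHash("sha256")
    .update(intent + JSON.stringify(criteria))
    .digest("hex")
    .slice(0, 16);
}
```

Note that `JSON.stringify` is sensitive to key order, so the same criteria serialized with keys in a different order would produce a different hash.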
+### Supporting File: `reference.md`
+Contains:
+- Ambiguity pattern catalog (missing scope, vague acceptance criteria, conflicting constraints)
+- Spec hardening anti-patterns (overly broad, untestable criteria)
+- Example hardened specs for common task types
+---
+## 4. L2: Structured Planning — `harness-plan/SKILL.md`
+### Skill Frontmatter
+```yaml
+---
+name: harness-plan
+description: >
+  Generates machine-readable YAML task DAG with sprint contracts
+  from a hardened spec. Every task has doneCriteria. Checkpoint
+  tasks are grounding points. Activates after L1 spec hardening completes.
+allowed-tools: Read, Write, Bash
+---
+```
+### SKILL.md Body
+1. **Read Hardened Spec**: Read the GitHub Issue or `.pi/harness/specs/<hash>.yaml`.
+2. **Decompose into Tasks**: Break the spec into sequential + parallel subtasks. Each task has: `id`, `description`, `dependsOn`, `doneCriteria`, `estimatedTokens`, `checkpoint`.
+3. **Sprint Contracts**: Every task gets `doneCriteria` — mix of deterministic (auto-verified: `tests_pass`, `lint_passes`, `typescript_compiles`) and `spec_requirement` (L4 critic judges).
+4. **Checkpoint Marking**: Mark tasks that produce a verifiable state change as `checkpoint: true`. These are MVC grounding points.
+5. **Write Plan YAML**: Store at `.pi/harness/plans/<spec-hash>.yaml`.
+6. **Plan Summary Injection**: Generate 3-5 line plan summary. Event bus injects into system prompt.
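Because the plan is a DAG expressed through `dependsOn`, a consumer needs to check that it is acyclic and derive an execution order. The sketch below does this with Kahn's algorithm; the blueprint does not specify a validation method, so treat this as one possible approach. `TaskRef` is a reduced view of `TaskNode`.

```typescript
// Minimal slice of TaskNode from types.ts: only the fields the ordering needs.
interface TaskRef {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns task ids in an executable order, or throws
// if a dependency is unknown or the graph contains a cycle.
function topoOrder(tasks: TaskRef[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t.id, 0);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`Task ${t.id} depends on unknown task ${dep}`);
      list.push(t.id);
      indegree.set(t.id, (indegree.get(t.id) ?? 0) + 1);
    }
  }
  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== tasks.length) throw new Error("Plan contains a dependency cycle");
  return order;
}
```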
+### Supporting File: `reference.md`
+Contains:
+- DAG templates for common task types (bug fix, feature add, refactor, test)
+- Sprint contract examples with varying doneCriteria
+- Plan complexity heuristics (when to split, when to combine tasks)
+---
+## 5. L2.5: Runtime Drift Monitor (CODE — `drift-monitor.ts`)
+**This is the ONLY complex logic that stays as code.** The drift monitor runs on every `tool_result` event. It needs a sub-millisecond rule-based pre-filter and deterministic escalation. The LLM-based primary detection (Haiku 4.5 every 8 turns) is invoked FROM code, but the monitor itself is code.
+Architecture unchanged from v1:
+- Rule-based pre-filter (6 patterns, 0 tokens, <1ms)
+- Structured drift context builder (~700 tokens)
+- Haiku 4.5 invocation every 8 turns
+- Escalation ladder (soft nudge → strong nudge → forced restart)
+See [[drift-detection-unified]] and [[adr-022]] for full specification.
+**Why this must be code**: Skills are probabilistic — the model decides when to activate them. Drift detection must fire deterministically on every `tool_result` event with zero exceptions. A skill cannot guarantee this. Code can.
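To make the deterministic part concrete, here is a rough sketch of a monitor that runs rules on every call, flags when the periodic LLM check is due, and climbs an escalation ladder. The two rules and the three-strike threshold are hypothetical placeholders; the actual patterns and thresholds are defined in [[drift-detection-unified]] and [[adr-022]].

```typescript
type Action = "continue" | "nudge" | "restart";

// Hypothetical pre-filter rules, for illustration only.
const RULES: { name: string; test: (output: string) => boolean }[] = [
  { name: "repeated-failure", test: (o) => /error|failed/i.test(o) },
  { name: "empty-result", test: (o) => o.trim().length === 0 },
];

class DriftMonitor {
  private turn = 0;
  private strikes = 0; // consecutive flagged turns drive the escalation ladder
  private readonly llmEveryTurns: number;

  constructor(llmEveryTurns = 8) {
    this.llmEveryTurns = llmEveryTurns;
  }

  // Called for every tool_result; the rule pass costs no tokens.
  onToolResult(output: string): { patterns: string[]; llmCheckDue: boolean; action: Action } {
    this.turn += 1;
    const patterns = RULES.filter((r) => r.test(output)).map((r) => r.name);
    this.strikes = patterns.length > 0 ? this.strikes + 1 : 0;
    const llmCheckDue = this.turn % this.llmEveryTurns === 0;
    // Ladder (assumed thresholds): strikes 1-2 nudge (soft, then strong), 3+ restarts.
    const action: Action =
      this.strikes === 0 ? "continue" : this.strikes < 3 ? "nudge" : "restart";
    return { patterns, llmCheckDue, action };
  }
}
```

The caller would invoke the LLM check whenever `llmCheckDue` is true; that call is left out here because its prompt and model wiring are not specified in this section.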
+---
+## 6. L4: Adversarial Verification — `harness-critic/SKILL.md` + `.pi/agents/critic.md`
+### Skill Frontmatter
+```yaml
+---
+name: harness-critic
+description: >
+  Performs adversarial code review with hard-threshold pass/fail criteria.
+  Spawns critic sub-agent via pi-subagents RPC. Activates after code changes
+  complete in L3 execution.
+allowed-tools: Read, Grep, Glob, Bash
+hooks:
+  PostToolUse:
+    - matcher: "Task"
+      hooks:
+        - type: "agent"
+          prompt: "Verify all critic criteria pass. Re-read spec. $ARGUMENTS"
+---
+```
+### SKILL.md Body
+1. **Read Diff + Spec**: Read the spec's acceptance criteria. Read the git diff of changes.
+2. **Prepare Critic Prompt**: Write to `.pi/harness/critics/<spec-hash>.md` — spec, diff, sprint contract doneCriteria, attack angles.
+3. **Spawn Critic Sub-agent**: Via pi-subagents RPC: `subagents:rpc:spawn` with `type: "critic"`, `prompt: "@.pi/harness/critics/<hash>.md"`.
+4. **Monitor + iMAD Gate**: Pre-debate classifier: high-confidence tasks skip multi-round debate (92% token savings). Ambiguous tasks trigger multi-round debate with budget caps.
+5. **Evaluate Verdict**: `{ pass: bool, failures: [...], score: number, debateResult: ... }`.
+6. **Fix or File**: If pass → proceed to P20 gate. If fail → inject failures into agent prompt, retry (max 3 rounds).
+7. **Consensus Filing** (P19b): Every debate verdict writes to `wiki/consensus/<layer>-<topic-slug>.md`.
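The "Fix or File" step can be read as a small routing decision. The sketch below is one interpretation: `routeVerdict` and `NextStep` are hypothetical names, and the `halt` branch is an assumption, since the steps above do not say what happens once the third round still fails.

```typescript
// Reduced view of CriticVerdict from types.ts.
interface Verdict {
  pass: boolean;
  failures: { criteria: string; explanation: string }[];
}

type NextStep =
  | { kind: "gate" }                      // proceed to P20
  | { kind: "retry"; feedback: string }   // inject failures into the agent prompt
  | { kind: "halt"; reason: string };     // assumption: behavior after the last round

function routeVerdict(verdict: Verdict, round: number, maxRounds = 3): NextStep {
  if (verdict.pass) return { kind: "gate" };
  if (round >= maxRounds) {
    return { kind: "halt", reason: `critic still failing after ${round} rounds` };
  }
  const feedback = verdict.failures
    .map((f) => `- ${f.criteria}: ${f.explanation}`)
    .join("\n");
  return { kind: "retry", feedback };
}
```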
+### Critic Agent Definition (`.pi/agents/critic.md`)
+```yaml
+---
+description: Adversarial code reviewer — attacks code changes with hard-threshold pass/fail criteria
+tools: read, grep, find, ls, bash
+model: inherit
+thinking: high
+max_turns: 15
+prompt_mode: replace
+---
+```
+### Supporting File: `reference.md`
+Contains:
+- Attack angle catalog (security, performance, edge cases, regression, spec compliance)
+- Failure pattern taxonomy
+- Debate protocol reference
+- Consensus filing templates
+---
+## 7. P20: Deterministic Gate — `harness-gate/SKILL.md`
+### Skill Frontmatter
+```yaml
+---
+name: harness-gate
+description: >
+  Runs deterministic quality gates: biome lint+format, tsc type-check,
+  fallow dead code/duplication audit. Zero LLM tokens. Activates after
+  L4 adversarial verification passes.
+allowed-tools: Bash
+---
+```
+### SKILL.md Body
+Three-step gate, 0 LLM tokens, <10s:
+1. `biome check --apply` — lint + format in one pass
+2. `tsc --noEmit --skipLibCheck` — type-checking
+3. `fallow audit --changed-since main --gate all` — dead code, duplication, complexity (optional, config-controlled)
+All three are pure CLI tools with exit codes. The skill provides instructions on which commands to run and how to interpret results. The event bus reads exit codes.
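For illustration, the exit-code logic amounts to running the commands in order and stopping at the first non-zero status. The sketch below uses the three commands exactly as listed; `runGate`, `Runner`, and the `includeOptional` switch are hypothetical names, and the runner is injected so the logic can be exercised without the tools installed.

```typescript
import { spawnSync } from "node:child_process";

// A runner executes one command and returns its exit code.
type Runner = (cmd: string, args: string[]) => number;

const GATE_STEPS: { name: string; cmd: string; args: string[]; optional: boolean }[] = [
  { name: "biome", cmd: "biome", args: ["check", "--apply"], optional: false },
  { name: "tsc", cmd: "tsc", args: ["--noEmit", "--skipLibCheck"], optional: false },
  { name: "fallow", cmd: "fallow", args: ["audit", "--changed-since", "main", "--gate", "all"], optional: true },
];

// Real runner: a missing exit status (e.g. spawn failure) counts as failure.
const shellRunner: Runner = (cmd, args) =>
  spawnSync(cmd, args, { stdio: "inherit" }).status ?? 1;

// Run steps in order; the first non-zero exit code fails the gate.
function runGate(run: Runner, includeOptional: boolean): { passed: boolean; failedStep?: string } {
  for (const step of GATE_STEPS) {
    if (step.optional && !includeOptional) continue;
    if (run(step.cmd, step.args) !== 0) return { passed: false, failedStep: step.name };
  }
  return { passed: true };
}
```

In a real project the call would be `runGate(shellRunner, true)` or `runGate(shellRunner, false)` depending on whether the fallow audit is enabled in config.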
+**Why a skill and not code**: The logic is trivial — three bash commands. The value is in the instructions: which flags to use, how to handle warnings vs errors, when to gate vs warn. A markdown skill captures this knowledge perfectly. If a project needs different gate commands, they edit the skill — no code change needed.
+### Supporting File: `reference.md`
+Contains:
+- Gate pass/fail/warn semantics
+- Baseline management for legacy codebases (fallow)
+- Environment-specific configurations
+---
+## 8. L5-L8: Trace, Observability, Memory
+### L5: `harness-observe/SKILL.md`
+Tracks Keep Rate (agent-generated code survival at 1-day, 1-week, 1-month). LLM-as-Judge satisfaction metrics. Writes samples to wiki.
+### L6: `harness-memory/SKILL.md`
+Read-first/write-after wiki contract (ADR-010). Hot cache management. Wiki staleness rules. Already partially implemented via claude-obsidian skills. This skill formalizes the harness-specific contract.
+### L7: Schema Orchestration
+Archon workflow DAG — unchanged from v1. Already YAML-based, not code. Enforces pipeline ordering and consensus filing compliance.
+### L8: Wiki Query Interface
+Already operational via claude-obsidian skills. No change needed.
+---
+## 9. Dependencies
+| Dependency | Purpose | Install |
+|-----------|---------|---------|
+| `@tintinweb/pi-subagents` v0.6.3 | L4 critic sub-agent infrastructure (ADR-016) | `pi install npm:@tintinweb/pi-subagents` |
+| `js-yaml` | YAML parsing for plans + specs | Already in ecosystem |
+| `biome` | P20 lint+format gate | Already configured |
+| `fallow` (optional) | P20 dead code/duplication audit | `npm install -D fallow` |
+---
+## 10. Token Budget (MVP — Skill-First)
+| Layer | Tokens | Mechanism |
+|-------|--------|-----------|
+| L1 Spec Hardening (skill) | ~2,500 | Skill activation ~2,000 tokens + spec generation + Q&A |
+| L2 Planning (skill) | ~4,500 | Skill activation + plan generation + sprint contracts |
+| L2.5 Drift Monitor (code) | ~1,500-2,200 | Haiku 4.5 every 8 turns (~$0.0002/check). Rule pre-filter: 0 tokens. |
+| L3 Agent Execution | variable | Flat tools — unchanged from v1 |
+| L4 Adversarial (skill + agent) | ~4,500 | Skill activation + critic sub-agent + selective debate |
+| P20 Gate (skill + bash) | 0 | Deterministic tools — skill only provides instructions |
+| L5-L8 Trace (skills) | ~4,000 | Observability + memory writes |
+| **Total overhead** | **~17,000-17,700/subtask** | Similar to v1. Skill activation replaces code module loading — comparable token cost but better isolation. |
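The total row can be checked by summing the fixed-cost rows; L3 is excluded because its cost is variable. The figures below are copied from the table, and the variable names are illustrative.

```typescript
// Per-layer overhead from the table (min/max differ only for the drift monitor).
const budget: { layer: string; min: number; max: number }[] = [
  { layer: "L1 Spec Hardening", min: 2500, max: 2500 },
  { layer: "L2 Planning", min: 4500, max: 4500 },
  { layer: "L2.5 Drift Monitor", min: 1500, max: 2200 },
  { layer: "L4 Adversarial", min: 4500, max: 4500 },
  { layer: "P20 Gate", min: 0, max: 0 },
  { layer: "L5-L8 Trace", min: 4000, max: 4000 },
];

const totalMin = budget.reduce((sum, b) => sum + b.min, 0); // 17000
const totalMax = budget.reduce((sum, b) => sum + b.max, 0); // 17700
```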
+### Savings from Skill-First Architecture
+| Mechanism | Savings |
+|-----------|---------|
+| Progressive disclosure (skills load on demand) | Code modules always loaded → skills only loaded when pipeline phase active |
+| Zero compile iteration | No TypeScript compilation for harness logic changes — edit markdown, agent picks up next activation |
+| User-editable harness | No TypeScript knowledge needed to customize spec hardening, planning, critic behavior |
+| Selective debate routing (iMAD) | 92% token savings on ~80% of debate-invoked tasks (unchanged) |
+| ck search routing | Deterministic nudge toward semantic search (unchanged) |
+---
+## 11. Build Order & Delivery Increments
+Same ADR-015 pipeline-first order. Implementation method changes from code to skills.
+| Step | Deliverable | Implementation | Est. Time |
+|------|-------------|---------------|-----------|
+| **1. F0** | Types, config (event bus handled by pi's built-in system) | CODE: `types.ts`, `config.ts` | ~1 week |
+| **2. L1+L2** | Spec hardening + planning skills | SKILLS: `harness-spec/`, `harness-plan/` | ~1 week |
+| **3. L2.5** | LLM-first drift monitor + rule pre-filter + escalation | CODE: `drift-monitor.ts` | ~2 weeks |
+| **4. L4** | Critic skill + agent + iMAD gating + consensus filing | SKILL + AGENT: `harness-critic/`, `critic.md` | ~2 weeks |
+| **5. P20+L5-L8** | Gate skill + observe skill + memory skill | SKILLS: `harness-gate/`, `harness-observe/`, `harness-memory/` | ~2 weeks |
+**Total MVP**: ~8 weeks (down from ~9 weeks in v1 — skill iteration is faster than code, no custom event bus to build).
+---
+## 12. What's Deferred (Post-MVP)
+Same as v1. P43 TS Execution Layer, P22b Prompt Renderer, P25 Subagent Router, P27-P48 deferred to Groups 4-9.
+---
+## 13. Files to Create (MVP — Skill-First v2)
+### Code (3 files)
+1. `src/harness/types.ts` — All type definitions
+2. `src/harness/config.ts` — Config loader with code defaults
+3. `src/harness/drift-monitor.ts` — LLM-first drift detection + 6-rule pre-filter + escalation
+> Event bus handled by pi's built-in system — no custom `events.ts` needed.
+### Skills (12 files)
+4. `.pi/skills/harness-spec/SKILL.md`: L1 ambiguity detection, spec hardening, harness_ask
+5. `.pi/skills/harness-spec/reference.md`: Ambiguity patterns, hardening examples
+6. `.pi/skills/harness-plan/SKILL.md`: L2 YAML DAG generation, sprint contracts
+7. `.pi/skills/harness-plan/reference.md`: Plan templates, DAG patterns
+8. `.pi/skills/harness-critic/SKILL.md`: L4 adversarial attack patterns, debate protocol
+9. `.pi/skills/harness-critic/reference.md`: Attack angle catalog, failure taxonomy
+10. `.pi/skills/harness-gate/SKILL.md`: P20 deterministic gate instructions
+11. `.pi/skills/harness-gate/reference.md`: Gate configuration, baseline management
+12. `.pi/skills/harness-observe/SKILL.md`: L5 Keep Rate, LLM-as-Judge
+13. `.pi/skills/harness-observe/reference.md`: Metric definitions, sampling
+14. `.pi/skills/harness-memory/SKILL.md`: L6 wiki read/write contract
+15. `.pi/skills/harness-memory/reference.md`: Wiki templates, staleness rules
+### Config + Wiring
+16. `.pi/harness/config.json`: Single config file
+17. `.pi/agents/critic.md`: L4 critic agent definition
+18. `.github/ISSUE_TEMPLATE/harness-spec.yml`: GitHub Issues spec template
+**Total**: 3 code files + 12 skill files + 3 config and wiring files = 18 files. TypeScript shrinks from 15 code files (v1) to 3; the rest shifts from code to configuration.
+---
+## 14. Key Architecture Decisions (ADRs Governing MVP)
+All ADRs from v1 remain valid. New emphasis:
+| ADR | Decision | Skill-First Impact |
+|-----|----------|-------------------|
+| **ADR-012** | Extension-based harness. No fork. | Pi's built-in event bus handles routing — skills register directly. |
+| **ADR-015** | Pipeline-first build order. Validate gates before P43. | Unchanged. Skills deliver same pipeline, same order. |
+| **ADR-017** | ~~`src/harness/` library + thin extension wiring~~. Superseded. | Now truly thin: 3 files vs 15. Business logic extracted to skills. Event bus removed. |
+| **ADR-021** | Explicit `/harness` command. | Event bus detects command → activates harness-spec skill. |
+| **ADR-018** | Single `.pi/harness/config.json`. | Unchanged. |
+| **ADR-019** | `harness_ask` tool for L1 clarification. | Invoked by harness-spec skill, not code. |
+| **ADR-020** | YAML task DAG + sprint contracts. | Generated by harness-plan skill, not code. |
+| **ADR-022** | 7 drift patterns with turn-dependent thresholds. | Implemented in drift-monitor.ts (code — must be deterministic). |
+| **ADR-025** | GitHub Issues as sole spec storage. | Used by harness-spec skill. Event bus passes issue number. |
+| **ADR-013** | Biome + tsc + fallow for P20 gate. | Invoked by harness-gate skill (bash commands). |
+| **ADR-014** | isolated-vm for P43 sandbox (deferred). | Unchanged. |
+| **ADR-016** | @tintinweb/pi-subagents for L4 critic. | Invoked by harness-critic skill. |
+---
+## 15. Skill-First vs Code-First Comparison
+| Dimension | v1 (Code-First) | v2 (Skill-First) |
+|-----------|----------------|-------------------|
+| TS source files | 15 | 3 |
+| TS lines of code | ~2,500 | ~600 |
+| Skill markdown files | 0 | 12 (6 SKILL.md + 6 reference.md) |
+| Compilation required per change | Every logic change | Only when types/drift change |
+| User-edit harness behavior | Edit TS, recompile, restart | Edit markdown, agent picks up next activation |
+| Context cost of loaded logic | All 15 modules loaded as tool definitions (~15K tokens) | Skills loaded progressively: discovery (~480 tokens), activation only when phase active |
+| Cross-platform portability | Pi-specific TypeScript | SKILL.md standard portable to Codex, Cursor, Copilot |
+| Deterministic guarantees | All code (guaranteed) | Code for critical path (drift monitor), pi event bus for routing, skills for evaluation (probabilistic) |
+| Iteration speed | Minutes (edit → compile → restart) | Seconds (edit markdown → agent picks up) |
+> [!gap] Pi skill system integration details need verification. Can pi skills invoke other pi skills? Can pi skills write to `.pi/harness/` directories? Can pi skills use `harness_ask` tool? These determine exact event bus sequencing logic.