npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4


Files changed (516)

package/vault/wiki/concepts/multi-agent-specialization.md ADDED

@@ -0,0 +1,61 @@
+---
+type: concept
+tags:
+  - multi-agent
+  - specialization
+  - team-composition
+  - model-routing
+related:
+  - "[[Agentic Orchestration Pipeline]]"
+  - "[[Agent Harness Architecture]]"
+  - "[[sources/disler-pi-vs-claude-code]]"
+  - "[[sources/mindstudio-four-agent-types]]"
+---
+# Multi-Agent Specialization
+The practice of assigning different agents to different cognitive roles based on their specialized system prompts, tool sets, and underlying models. Specialization matters more than the raw capability ceiling of any single model.
+## Specialization Axes
+### By Role (What the agent does)
+- **Planner**: Explores codebase, analyzes patterns, produces structured plans. Read-only tools.
+- **Builder**: Implements code changes, writes files, runs commands. Full read-write tools.
+- **Reviewer**: Analyzes diffs, finds issues, suggests improvements. Read + analysis tools.
+- **Fixer**: Addresses review feedback, applies targeted fixes. Write tools.
+- **Gatekeeper**: Final verification, test execution, merge approval. Read + test tools.
+- **Narrator**: Produces human-readable summaries, PR descriptions, documentation.
+### By Model (Which LLM powers it)
+- **Opus/Strong models**: Planning, reviewing, complex reasoning (high quality, high cost)
+- **Sonnet/Mid models**: Building, fixing, standard coding tasks (balanced quality/cost)
+- **Haiku/Fast models**: Spec-checking, PR descriptions, summarization (low cost, fast)
+- **Task routing**: Different pipeline stages use different models optimized for that stage
+### By Tool Set (What capabilities it has)
+- **Read-only**: Code explorer, reviewer (cannot modify files)
+- **Read-write**: Builder, fixer (can create/edit files)
+- **Shell access**: Builder, fixer (can run commands)
+- **Web access**: Researcher, web-cloner (can fetch URLs)
+- **User interaction**: Ask-user agents (structured surveys)
+## Team Composition
+Teams are defined in a YAML config that maps domain names to agent rosters:
+```yaml
+frontend: [planner, builder, reviewer]
+backend: [architect, implementer, tester]
+security: [auditor, remediator]
+```
+The dispatcher (orchestrator) selects the most appropriate team and specialist based on the user's request.
+## Model-Agnostic Design
+Specialization should be model-agnostic: switching providers requires only configuration changes, not code changes. Agents should declare model preferences per role but gracefully degrade to defaults.
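The per-role preference with graceful fallback described above could be sketched as follows. This is a minimal illustration, not ultimate-pi's actual code: the `ModelRouting` shape, the `resolveModel` helper, and the model names are all hypothetical placeholders.

```typescript
// Hypothetical config shape: a role may declare a preferred model;
// anything undeclared or unavailable falls back to the default.
type Role = "planner" | "builder" | "reviewer" | "fixer" | "gatekeeper" | "narrator";

interface ModelRouting {
  defaultModel: string;
  preferences: Partial<Record<Role, string>>;
}

// Returns the role's preferred model when the provider offers it,
// otherwise degrades to the configured default.
function resolveModel(role: Role, routing: ModelRouting, available: ReadonlySet<string>): string {
  const preferred = routing.preferences[role];
  if (preferred !== undefined && available.has(preferred)) {
    return preferred;
  }
  return routing.defaultModel;
}

// Placeholder model names, used only for illustration.
const routing: ModelRouting = {
  defaultModel: "mid-model",
  preferences: { planner: "strong-model", narrator: "fast-model" },
};
const available = new Set(["mid-model", "fast-model"]);

const plannerModel = resolveModel("planner", routing, available);   // preferred model is unavailable
const narratorModel = resolveModel("narrator", routing, available); // preferred model is available
const builderModel = resolveModel("builder", routing, available);   // no preference declared
```

Because the routing lives in data rather than in agent code, switching providers in this sketch means editing `routing` and `available` only.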
+## Key Insight
+> "Multi-model pipelines beat single-model agents for non-trivial work. Specialization matters more than the raw capability ceiling of any one model." — Hamza L. (4CO-OP contributor)
+The 4CO-OP project demonstrates this with 9 specialized agents across Claude Code and Codex: Planner, Builder, Spec Checker, Escalation, Reviewer, Fixer, Gatekeeper, PR Writer, Narrator. Each runs the model best at that specific job.

package/vault/wiki/concepts/permission-subsystem.md ADDED

@@ -0,0 +1,16 @@
+---
+type: concept
+status: stub
+created: 2026-05-02
+updated: 2026-05-02
+tags: [concept, permissions, security]
+---
+# Permission Subsystem
+Security subsystem that gates tool execution based on permissions. Claude Code uses an ML classifier to determine whether a tool call requires user approval, balancing safety with autonomy.
+## References
+- [[claude-code-security-architecture-penligent-2026]]
+- [[policy-engine-pattern]]

package/vault/wiki/concepts/pi-messenger-analysis.md ADDED

@@ -0,0 +1,243 @@
+---
+type: analysis
+title: "pi-messenger Analysis for Harness Integration"
+created: 2026-04-30
+updated: 2026-04-30
+tags: [pi-messenger, analysis, harness, multi-agent, communication, transport]
+status: developing
+sources:
+  - "https://github.com/nicobailon/pi-messenger"
+related:
+  - "[[adr-011]]"
+  - "[[consensus-debate]]"
+  - "[[agentic-harness]]"
+---
+# pi-messenger Analysis: What We Adopt, What We Strip
+pi-messenger is a multi-agent communication extension for the pi coding agent. 532 stars. File-based: no daemon, no server, just files.
+## Architecture Summary
+### Core Mechanism: File-Based Agent Mesh
+Agents write JSON registration files to a shared `.pi/messenger/registry/` directory. Each file contains: name, PID, sessionId, cwd, model, git branch, reservations, activity timestamps.
+Messages are delivered as JSON files to per-agent inbox directories (`.pi/messenger/inbox/<agentName>/`). Recipients detect new messages via `fs.watch()` on their inbox directory.
+**Key insight**: This is peer-to-peer, not parent-child. Agents communicate directly. No orchestrator mediates messages. This enables genuine multi-agent conversation — the kind needed for consensus debates.
+### Components
+| Component | What it does | Keep? |
+|-----------|-------------|-------|
+| **Registry** | Agent registration/discovery. Files in `registry/`. PID-based liveness. Stale cleanup. | ✅ YES |
+| **Inbox** | Per-agent message delivery. `fs.watch` for real-time. Debounced to 50ms. | ✅ YES |
+| **Messaging** | DM + broadcast. JSON message format `{id, from, to, text, timestamp, replyTo}`. Atomic file writes. | ✅ YES (adapted) |
+| **Presence** | Active/idle/away/stuck detection. Status indicators. Auto-generated status messages ("on fire", "debugging...") | ❌ STRIP |
+| **File Reservations** | Claim files/directories. Other agents blocked with conflict message. | ⚠️ KEEP (useful for parallel harness waves) |
+| **Activity Feed** | Timeline of edits, commits, tests, messages. Human-facing. | ❌ STRIP |
+| **Chat Overlay** | `/messenger` interactive UI. Agent list, activity feed, chat tabs. | ❌ STRIP |
+| **Crew Orchestration** | Planner→Worker→Reviewer DAG. Wave-based execution. Task dependency graph. | ❌ STRIP (L7 handles this) |
+| **Swarm** | Spec-based claim/complete. File-based locking. | ❌ STRIP (L7 handles task tracking) |
+| **Crew Skills** | On-demand skill loading for workers. | ❌ STRIP (harness has own skill system) |
+| **Human as Participant** | Interactive pi session appears in agent list. | ❌ STRIP |
+### Message Delivery Flow
+```
+Agent A                          Filesystem                       Agent B
+  │                                 │                                │
+  ├─ send( to: "B", text: "..." )   │                                │
+  │  └─ write inbox/B/<ts>.json ────►                                │
+  │                                 │                                │
+  │                                 │  fs.watch fires                │
+  │                                 │  debounce 50ms                 │
+  │                                 │  processAllPendingMessages()   │
+  │                                 │  └─ read + parse inbox files   │
+  │                                 │     └─ deliverFn(msg) ────────►│ B sees message
+  │                                 │     └─ unlink file             │
+```
+### Registry Format
+```json
+{
+  "name": "SwiftRaven",
+  "pid": 12345,
+  "sessionId": "abc-123",
+  "cwd": "/path/to/project",
+  "model": "claude-sonnet-4-6",
+  "startedAt": "2026-04-30T10:00:00Z",
+  "gitBranch": "main",
+  "isHuman": false,
+  "session": { "toolCalls": 42, "tokens": 15000, "filesModified": ["src/auth.ts"] },
+  "activity": { "lastActivityAt": "2026-04-30T10:05:00Z" },
+  "reservations": [{ "pattern": "src/auth/", "reason": "Refactoring", "since": "..." }]
+}
+```
+### Message Format
+```json
+{
+  "id": "uuid",
+  "from": "SwiftRaven",
+  "to": "GoldFalcon",
+  "text": "auth module is done, your turn",
+  "timestamp": "2026-04-30T10:05:00Z",
+  "replyTo": "prev-msg-uuid"
+}
+```
+## What We Adopt
+### 1. Agent Registry (`.pi/messenger/registry/`)
+Adapted for harness:
+- Each harness layer spawns debate participants that register
+- Registry tracks which agents are alive (PID check)
+- Stale cleanup on crash/exit
+- Name generation for debuggability
+### 2. Per-Agent Inboxes (`.pi/messenger/inbox/<agent>/`)
+Adapted for consensus:
+- Debate rounds delivered as messages to inboxes
+- `fs.watch` triggers message processing
+- Debounce prevents thundering herd on rapid messages
+- Atomic file writes (write temp, rename) prevent partial reads
+### 3. Message Format
+Extended for consensus:
+```json
+{
+  "id": "uuid",
+  "from": "agent-name",
+  "to": "agent-name",
+  "type": "debate_turn",
+  "debate_id": "debate-uuid",
+  "round": 2,
+  "role": "attacker",
+  "position": "Succinct claim",
+  "counter_to": "Previous claim being countered",
+  "evidence_refs": ["wiki/page", "src/auth.ts:142"],
+  "confidence_change": -1,
+  "timestamp": "...",
+  "replyTo": "prev-turn-uuid"
+}
+```
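The extended format above can be mirrored as a type with a runtime guard, so that malformed inbox files are rejected before they reach the debate logic. This is a sketch under assumptions: the field names come from the JSON example, but the `isDebateMessage` guard and the choice to treat `counter_to` and `replyTo` as optional are hypothetical.

```typescript
// Field names follow the extended message format; optionality is an assumption.
interface DebateMessage {
  id: string;
  from: string;
  to: string;
  type: "debate_turn";
  debate_id: string;
  round: number;
  role: string;
  position: string;
  counter_to?: string;
  evidence_refs: string[];
  confidence_change: number;
  timestamp: string;
  replyTo?: string;
}

// Hypothetical runtime guard for JSON parsed from an inbox file.
function isDebateMessage(value: unknown): value is DebateMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const refs = v["evidence_refs"];
  const requiredStrings = ["id", "from", "to", "debate_id", "role", "position", "timestamp"];
  return (
    v["type"] === "debate_turn" &&
    requiredStrings.every((key) => typeof v[key] === "string") &&
    Number.isInteger(v["round"]) &&
    typeof v["confidence_change"] === "number" &&
    Array.isArray(refs) &&
    refs.every((ref) => typeof ref === "string")
  );
}

// Illustrative values only.
const sampleTurn: unknown = {
  id: "turn-2",
  from: "SwiftRaven",
  to: "GoldFalcon",
  type: "debate_turn",
  debate_id: "debate-1",
  round: 2,
  role: "attacker",
  position: "Succinct claim",
  evidence_refs: ["src/auth.ts:142"],
  confidence_change: -1,
  timestamp: "2026-04-30T10:05:00Z",
};

const acceptsValidTurn = isDebateMessage(sampleTurn);
const rejectsPlainMail = isDebateMessage({ id: "m1", from: "A", to: "B", text: "hello" });
```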
+### 4. Atomic Patterns
+- Temp file write + `fs.renameSync` for race-free writes
+- Swarm lock pattern (`.lock` file with PID, staleness detection)
+- Retry with exponential backoff on watcher failures
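The temp-file-plus-rename pattern in the first bullet can be sketched with Node's built-in `fs` module. This is an illustration of the general technique, not pi-messenger's code: the `writeJsonAtomic` helper and the temp-file naming scheme are assumptions.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Writes JSON to a temp file in the same directory, then renames it into place.
// A rename within one filesystem replaces the target in a single step on POSIX,
// so a reader sees either no file or the complete file, never a partial one.
function writeJsonAtomic(dir: string, fileName: string, payload: unknown): string {
  const finalPath = join(dir, fileName);
  const tempPath = join(dir, `.${fileName}.${process.pid}.tmp`); // hypothetical naming scheme
  writeFileSync(tempPath, JSON.stringify(payload), "utf8");
  renameSync(tempPath, finalPath);
  return finalPath;
}

// Illustration only: a throwaway directory standing in for an agent inbox.
const inboxDir = mkdtempSync(join(tmpdir(), "inbox-"));
const message = { id: "msg-1", from: "SwiftRaven", to: "GoldFalcon", text: "auth module is done, your turn" };
const writtenPath = writeJsonAtomic(inboxDir, "msg-1.json", message);
const roundTrip = JSON.parse(readFileSync(writtenPath, "utf8"));
```

Keeping the temp file in the same directory as the target avoids a cross-filesystem rename. A directory watcher would also see the temp name, so in this sketch the consumer is assumed to skip files ending in `.tmp`.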
+## What We Strip
+### Chat Overlay (`overlay.ts`, `overlay-*.ts`)
+Human-facing TUI. Not needed for agent-to-agent debate.
+### Status Bar Indicators
+"SwiftRaven (2 peers) ●3", "on fire 🔥", "debugging...". Human-facing presence display.
+### Activity Feed (`feed.ts`)
+Human-facing timeline of events. Debate transcripts serve as the "feed" for agents.
+### Crew Orchestration (`crew/`)
+Planner→Worker→Reviewer DAG, wave execution, autonomous mode, task dependency graph, review cycles. All of this is handled by L7 (Schema Orchestration via Archon). The harness already has a workflow engine with loop nodes, approval gates, and worktree isolation.
+### Swarm (`swarm` actions, `store.ts` claim/complete)
+Task claiming with file-based locks. L7's Archon workflows handle task assignment and state tracking.
+### Crew Skills (`crew/skills/`)
+On-demand skill loading for workers. The harness has its own skill system via `.pi/skills/`.
+### Human as Participant
+Interactive pi session as a peer agent. Not needed — debates are agent-to-agent.
+### Message Budgets (per-coordination-level)
+`{ none: 0, minimal: 2, moderate: 5, chatty: 10 }`. Replaced by ConsensusBudget which is per-debate, not per-coordination-level.
+## Integration Architecture
+```
+┌─────────────────────────────────────────────────────────┐
+│                   HARNESS PIPELINE (L7)                   │
+│                                                           │
+│  ┌──────┐   ┌──────┐   ┌──────┐   ┌──────┐             │
+│  │  L1  │   │  L2  │   │  L3  │   │  L4  │   ...       │
+│  │ Spec │   │ Plan │   │ Exec │   │Critic│              │
+│  └──┬───┘   └──┬───┘   └──────┘   └──┬───┘             │
+│     │          │                      │                  │
+│     │  spawns  │  spawns              │  spawns          │
+│     ▼          ▼                      ▼                  │
+│  ┌──────────────────────────────────────┐              │
+│  │     CONSENSUS PROTOCOL LAYER         │              │
+│  │  DebateSession, ConsensusBudget,     │              │
+│  │  Turn protocol, Convergence detect   │              │
+│  └──────────┬───────────────────────────┘              │
+│             │                                            │
+│             ▼                                            │
+│  ┌──────────────────────────────────────┐              │
+│  │   pi-messenger TRANSPORT LAYER       │              │
+│  │  Registry, Inboxes, fs.watch,        │              │
+│  │  Atomic writes, Stale cleanup         │              │
+│  └──────────┬───────────────────────────┘              │
+│             │                                            │
+│             ▼                                            │
+│     .pi/messenger/registry/                              │
+│     .pi/messenger/inbox/<agent>/                         │
+│     .pi/messenger/debates/<debate-id>/                   │
+└─────────────────────────────────────────────────────────┘
+```
+## Key Differences from pi-messenger's Crew
+| Aspect | pi-messenger Crew | Harness Consensus |
+|--------|-------------------|-------------------|
+| **Orchestration** | Built-in DAG executor | L7 (Archon) handles orchestration |
+| **Purpose** | Execute coding tasks in parallel | Debate design decisions |
+| **Agent roles** | Planner, Worker, Reviewer | Attacker, Defender (per debate) |
+| **Outcome** | Code changes + review verdict (SHIP/NEEDS_WORK) | Consensus verdict (CONSENSUS/DEADLOCK/BUDGET_EXHAUSTED) |
+| **Parallelism** | Workers run in parallel waves | Debate is turn-based (sequential rounds) |
+| **Persistence** | Crew state files, planning-progress.md | Debate transcripts stored as wiki artifacts |
+| **Model routing** | Config per role (planner/worker/reviewer) | Per-debate model selection (both sides can differ) |
+## Files We Will Adapt
+| pi-messenger file | Harness equivalent | Changes |
+|-------------------|-------------------|---------|
+| `store.ts` (registry, inbox, messaging) | `lib/harness-messenger.ts` | Strip swarm, feed, reservations; keep registry + inbox + messaging |
+| `lib.ts` (types, status) | `lib/harness-schemas.ts` | Keep AgentRegistration, AgentMailMessage; add DebateMessage |
+| `handlers.ts` (join/leave/send/list) | `lib/harness-messenger.ts` | Keep join/leave/send; strip overlay actions, swarm, crew |
+| — | `lib/harness-debate.ts` | NEW: DebateSession, ConsensusBudget, convergence, verdict |
+| `crew/state*.ts` | — | STRIP entirely |
+| `feed.ts` | — | STRIP entirely |
+| `overlay*.ts` | — | STRIP entirely |
+| `config-overlay.ts` | — | STRIP entirely |
+## Dependency
+```json
+// package.json
+{
+  "dependencies": {
+    "pi-messenger": "latest"
+  }
+}
+```
+We use pi-messenger as a library — import its transport primitives (registry, inbox, watcher) directly. We do NOT use its CLI, overlay, or crew features.
+## Risk Assessment
+| Risk | Likelihood | Mitigation |
+|------|-----------|------------|
+| pi-messenger API breaks on update | Medium | Pin version; we only use stable file-based primitives |
+| fs.watch reliability on different OS | Low | pi-messenger already handles macOS/Linux; WSL tested |
+| Race conditions in multi-agent file ops | Low | pi-messenger has lock patterns; we add debate-level idempotency |
+| Token cost of debates exceeds budget | Medium | Hard ConsensusBudget enforcement; debate is opt-in per layer |
+| Debate quality varies with model quality | Medium | Model routing per debate; Haiku for spec critic, Sonnet for code critic |

package/vault/wiki/concepts/pi-vscode-extension-landscape.md ADDED

@@ -0,0 +1,37 @@
+---
+type: concept
+status: developing
+created: 2026-05-05
+tags:
+  - pi-agent
+  - vscode
+  - extension
+  - ecosystem
+related:
+  - "[[pi-vscode-marketplace]]"
+  - "[[pi-vscode-model-provider-marketplace]]"
+  - "[[vscode-pi-community-extension]]"
+  - "[[pi-coding-agent]]"
+---
+# Pi VS Code Extension Landscape
+## Definition
+The Pi ecosystem now has three practical VS Code extension patterns:
+1. **Official terminal bridge** (`pi0.pi-vscode`)
+2. **Model provider bridge** (`tintinweb.vscode-pi-model-chat-provider`)
+3. **Community full chat UI** (`cdervis.vscode-pi`)
+## Pattern Differences
+| Pattern | Primary UX | Best For | Tradeoff |
+|---|---|---|---|
+| Official terminal bridge | Pi terminal + IDE bridge tools | Native Pi workflow with VS Code context | Less integrated chat-first UI |
+| Model provider bridge | Copilot Chat model picker | Reuse Pi models across LM API ecosystem | Less direct Pi session UX |
+| Community full UI | Sidebar + chat UX | Rich local workflow and controls | Unofficial / prerelease risk |
+## Key Insight
+"Pi extension for VS Code" is no longer a single product. It is a small ecosystem with different architectural bets. Teams must choose by workflow shape: terminal-first, model-provider-first, or chat-sidebar-first.

package/vault/wiki/concepts/policy-engine-pattern.md ADDED

@@ -0,0 +1,78 @@
+---
+type: concept
+title: "Policy Engine Pattern (Pre-Execution Gates)"
+created: 2026-05-01
+updated: 2026-05-01
+status: developing
+tags:
+  - harness
+  - policy
+  - security
+  - gemini-cli
+related:
+  - "[[harness-engineering-first-principles]]"
+  - "[[gemini-cli-architecture]]"
+sources:
+  - "[[Source: Gemini CLI Changelogs]]"
+  - "[[Source: Martin Fowler - Harness Engineering]]"
+  - "[[Source: Augment - Harness Engineering for AI Coding Agents]]"
+---
+# Policy Engine Pattern: Pre-Execution Gates
+## What It Is
+A Policy Engine is a harness component that enforces **deterministic pre-execution constraints** on agent tool calls. Unlike drift detection (post-hoc), policy gates fire _before_ a tool executes, rejecting calls that violate architectural invariants, security boundaries, or operational policies.
+## Why It Matters
+> "Telling an agent 'follow our coding standards' in a prompt is fundamentally different from wiring a linter that blocks the PR when standards are violated. The first approach relies on probabilistic compliance; the second enforces deterministic constraints."
+> — [[Source: Augment - Harness Engineering for AI Coding Agents]]
+Pre-execution gates prevent failures before they occur. Post-hoc drift detection catches them after. Both are needed, but prevention is cheaper.
+## Gemini CLI Implementation (v0.18+)
+- **v0.18 (Nov 2025)**: Experimental policy engine. Fine-grained policy for tool calls.
+- **v0.20**: Persistent "Always Allow" policies for tool executions
+- **v0.24**: Default folder trust set to untrusted. Granular shell command allowlisting.
+- **v0.30**: `--policy` flag for user-defined policies, strict seatbelt profiles
+- **v0.31**: Project-level policies, MCP server wildcards, tool annotation matching
+- **v0.38**: Context-aware persistent approvals for tool execution
+## Policy Dimensions
+1. **Tool-level**: Which tools can be used? Under what conditions?
+2. **Path-level**: Which files/directories can be read? Written?
+3. **Command-level**: Which shell commands are allowed? With what arguments?
+4. **Network-level**: Which domains can be accessed? Ports?
+5. **Context-level**: Policies that vary by project, user, or session state
+6. **Temporal-level**: Time-of-day, session duration, rate limits
+## Pre-Execution Gate Types (Augment PEV)
+1. **Known tool check**: Is this a recognized tool in the registry?
+2. **Argument validation**: Are the arguments valid for this tool?
+3. **User approval check**: Does this action require user confirmation?
+4. **Workspace boundary check**: Is the requested path inside the workspace?
+5. **Plan alignment check**: Does this action match the approved plan?
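Two of the gate types listed above (known tool check and workspace boundary check) can be sketched as an ordered chain of deterministic checks that runs before any tool executes. This is a minimal illustration, not the Augment or Gemini CLI implementation: the `ToolCall` shape, the tool names, and the workspace root are hypothetical.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

interface ToolCall {
  tool: string;
  path?: string; // target path, when the tool touches the filesystem
}

type Verdict = { allowed: true } | { allowed: false; gate: string; reason: string };

// A gate returns null to pass, or a reason string to reject.
type Gate = { name: string; check: (call: ToolCall) => string | null };

// Hypothetical tool registry and workspace root, for illustration only.
const knownTools = new Set(["read_file", "write_file"]);
const workspaceRoot = resolve("/workspace/project");

const gates: Gate[] = [
  {
    name: "known-tool",
    check: (call) => (knownTools.has(call.tool) ? null : `unknown tool: ${call.tool}`),
  },
  {
    name: "workspace-boundary",
    check: (call) => {
      if (call.path === undefined) return null;
      const rel = relative(workspaceRoot, resolve(workspaceRoot, call.path));
      const escapes = rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel);
      return escapes ? `path escapes workspace: ${call.path}` : null;
    },
  },
];

// Runs the gates in order before execution; the first rejection wins.
function evaluate(call: ToolCall): Verdict {
  for (const gate of gates) {
    const reason = gate.check(call);
    if (reason !== null) return { allowed: false, gate: gate.name, reason };
  }
  return { allowed: true };
}

const insideCall = evaluate({ tool: "read_file", path: "src/auth.ts" });
const unknownCall = evaluate({ tool: "format_disk" });
const escapingCall = evaluate({ tool: "write_file", path: "../../etc/passwd" });
```

Each check is a pure function of the call, which keeps the chain computational rather than inferential: the same call always yields the same verdict.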
+## Ultimate-PI Current State
+We have **post-hoc drift detection** (L2.5, L3, L4) but no pre-execution policy gates. Our drift detection catches violations after they occur; a policy engine would prevent them.
+## Integration Path (P-F1)
+Add as L2.7 (between L2 Structured Planning and L3 Grounding):
+1. Define policy schema: tool allowlists, path restrictions, command policies, network policies
+2. Implement policy evaluation engine that checks every tool call before execution
+3. Add user approval flow for policy violations (override prompt)
+4. Invert existing drift detection rules into policy rules where applicable
+5. Add policy effectiveness metrics: false positive rate, blocked violations, agent retry success
+## Relationship to Other Harness Primitives
+- **Feedforward + Feedback**: Policy engine is feedforward (prevent). Drift detection is feedback (detect).
+- **Computational > Inferential**: Policy engine should be computational (deterministic, fast).
+- **Steering Loop**: When policy is too restrictive → false positives → human loosens. When too loose → drift → human tightens.
+- **Keep Quality Left**: Policy engine runs before any tool execution — as far left as possible.

package/vault/wiki/concepts/progressive-disclosure-agents.md ADDED

@@ -0,0 +1,53 @@
+---
+type: concept
+title: "Progressive Disclosure for Agents"
+created: 2026-04-30
+updated: 2026-04-30
+tags:
+  - agent-architecture
+  - context-window
+  - codebase-exploration
+related:
+  - "[[agent-codebase-interface]]"
+  - "[[repo-map-ranking]]"
+  - "[[aider-repomap-tree-sitter]]"
+status: developing
+---
+# Progressive Disclosure for Agents
+A strategy for presenting codebase information to agents in layers of increasing detail, matching the agent's navigation pattern.
+## Why It Matters
+Humans use progressive disclosure naturally: they scan file names, open a file, skim headers, drill into functions. Agents need this structured explicitly because they can't "skim" — every byte they read consumes context window and costs tokens.
+## Layers
+### L0: Project Map (always available, minimal tokens)
+- Directory structure
+- Key entry points (main files, config files)
+- Build system and dependencies
+- ~200-500 tokens
+### L1: Symbol Map (on-demand, medium tokens)
+- All top-level symbols (classes, functions, types) with signatures
+- Cross-reference counts per symbol
+- File-to-symbol mapping
+- ~1K-4K tokens (ranked subset for large repos)
+### L2: File Context (on request)
+- Full file contents for specific files
+- Selected by agent based on L0/L1 information
+### L3: Deep Context (on explicit request)
+- Call graphs for specific functions
+- Data flow diagrams
+- Test coverage maps
+## Implementation
+The agent should:
+1. Always receive L0 (free, cached)
+2. Query L1 for relevant symbols based on the task
+3. Request L2 for specific files identified from L1
+4. Request L3 only when stuck or verifying complex interactions
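One way to picture the layered loading above is a selector that always includes L0 and then adds deeper layers in order until a token budget is reached. The budget rule is an assumption added for illustration (the source describes on-demand requests, not a fixed budget), and the `ContextChunk` shape and token counts are hypothetical.

```typescript
// Hypothetical chunk description; token counts are supplied by the caller.
interface ContextChunk {
  layer: 0 | 1 | 2 | 3;
  label: string;
  tokens: number;
}

// Adds chunks from the shallowest layer to the deepest.
// L0 is always included; deeper layers stop at the first chunk that would exceed the budget.
function assembleContext(chunks: ContextChunk[], budget: number): ContextChunk[] {
  const ordered = [...chunks].sort((a, b) => a.layer - b.layer);
  const selected: ContextChunk[] = [];
  let used = 0;
  for (const chunk of ordered) {
    if (chunk.layer > 0 && used + chunk.tokens > budget) break;
    selected.push(chunk);
    used += chunk.tokens;
  }
  return selected;
}

// Illustrative sizes only.
const chunks: ContextChunk[] = [
  { layer: 2, label: "file:src/auth.ts", tokens: 5000 },
  { layer: 0, label: "project-map", tokens: 300 },
  { layer: 3, label: "call-graph:login", tokens: 1000 },
  { layer: 1, label: "symbol-map", tokens: 2000 },
];

const selectedLabels = assembleContext(chunks, 4000).map((chunk) => chunk.label);
```

With these numbers the project map and symbol map fit, while the full file does not, so the deeper layers stay unloaded until the agent explicitly asks for them.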

package/vault/wiki/concepts/progressive-skill-disclosure.md ADDED

@@ -0,0 +1,17 @@
+---
+type: concept
+status: stub
+created: 2026-05-02
+updated: 2026-05-02
+tags: [concept, skills, progressive-disclosure]
+---
+# Progressive Skill Disclosure
+Pattern of loading agent skills on-demand rather than including all available skills in every prompt. Reduces context token usage by keeping only relevant skills active.
+Related to [[agent-skills-pattern]] — the broader pattern of skills as on-demand capability plugins.
+## References
+- [[agent-skills-pattern]]