npm - ultimate-pi - Versions diffs - 0.1.7 → 0.2.2 - Mend

ultimate-pi 0.1.7 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (524)

package/vault/wiki/concepts/agentic-harness-context-enforcement.md DELETED

@@ -1,91 +0,0 @@
----
-type: concept
-title: Agentic Harness Context Enforcement
-created: 2026-04-30
-updated: 2026-04-30
-tags:
-  - agentic-harness
-  - context-optimization
-  - enforcement
-status: developing
-related:
-  - "[[think-in-code]]"
-  - "[[context-mode]]"
-  - "[[lean-ctx]]"
-sources:
-  - "[[Research: context-mode vs lean-ctx]]"
----
-# Agentic Harness Context Enforcement
-How to enforce context-efficient behavior ("think in code") in an agentic harness — the orchestration layer that manages AI coding agents.
-## Problem
-AI agents are profligate with context. They call `Read()` on 47 files when 1 script would suffice. They produce verbose pleasantries. They forget what they already read. The harness must enforce discipline because the agent won't do it voluntarily.
-## Enforcement Layers
-### Layer 1: System Prompt / Instructions (cheapest, least reliable)
-- Inject "Think in Code" rules into AGENTS.md or system prompt
-- Works with any agent without custom tools
-- Relies on agent compliance — can be ignored under pressure
-- Examples: context-mode injects rules into 14 platform configs
-### Layer 2: PreToolUse Interception (medium cost, high reliability)
-- Intercept tool calls before execution
-- Route large reads to sandbox execution instead
-- Block dangerous commands (curl, wget, rm -rf)
-- Requires MCP or hook support in the harness
-- Example: context-mode PreToolUse hook
-### Layer 3: PostToolUse Compression (medium cost, medium reliability)
-- After tool output enters context, compress it
-- Strip noise, keep signal
-- Store raw data in searchable index (FTS5)
-- Example: lean-ctx shell hook patterns
-### Layer 4: Tool Replacement (highest cost, highest reliability)
-- Replace native `Read()`, `Bash()`, `WebFetch()` with optimized versions
-- AST-based file reading (signatures only)
-- Shell output compression (pattern-matched)
-- Cached re-reads
-- Example: lean-ctx's 46 MCP tools
-### Layer 5: Governance & Monitoring (supplemental)
-- Profiles define what each agent can do
-- Budgets limit token/cost/shell usage
-- SLOs trigger throttling
-- Anomaly detection for runaway consumption
-- Analytics dashboard for human oversight
-- Example: lean-ctx governance features
-### Layer 6: TypeScript Execution Layer (emerging, high potential)
-- Replace ALL individual tool calls with a single "write TypeScript" tool
-- Agent writes TS code that orchestrates tools via typed API
-- Code executes in sandboxed runtime (Node.js VM, Deno, or Worker isolate)
-- Tool calls dispatch via typed RPC to harness for permission gating
-- Intermediate results stay in sandbox — only final output enters LLM context
-- 3-4x context reduction vs flat tool calling
-- ~20% higher multi-tool success rate (CodeAct, ICML 2024)
-- Validated by: Apple CodeAct, Cloudflare Code Mode, Executor (1.3K stars)
-- See [[ts-execution-layer]] and [[harness-implementation-plan]] (P43)
-## Recommendation for ultimate-pi Harness
-**Current state**: lean-ctx installed as MCP server + shell hook.
-**Gap**: No "Think in Code" enforcement. The harness relies on AGENTS.md rules (Layer 1 only).
-**Recommended additions**:
-1. **Add Think in Code to system prompt** (zero cost, immediate). Update AGENTS.md with the mandatory rule from context-mode's playbook.
-2. **Verify lean-ctx `ctx_execute` works** — lean-ctx has execution capabilities. Test if agent can write and run analysis scripts through lean-ctx tools.
-3. **Consider context-mode as complement** — the two tools solve different halves: context-mode excels at sandbox enforcement + Think in Code paradigm; lean-ctx excels at compression + governance. They could coexist if the MCP namespace doesn't conflict.
-4. **Add output compression rules** — context-mode's output compression (strip filler, fragments OK, short synonyms) can be added to AGENTS.md regardless of tool choice.
-5. **Monitor context usage** — lean-ctx's `gain` dashboard and `wrapped` reports provide visibility. Use them to measure effectiveness of any new enforcement.
-6. **Plan TypeScript Execution Layer (P43)** — the logical extension of Think-in-Code. Instead of enforcing code-over-data for analysis tasks, replace the entire flat tool list with a typed TypeScript API + sandboxed runtime. Agent writes TS code; sandbox executes; only results enter context. 3-4x context reduction, ~20% higher success rate. See [[ts-execution-layer]] and [[harness-implementation-plan]].
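
Annotation (not part of the deleted file): the Layer 2 bullets describe intercepting a tool call before it runs, blocking dangerous commands, and routing large reads to a sandbox. A minimal sketch of that decision step might look like the following. Every name here (`ToolCall`, `Decision`, `pre_tool_use`, `MAX_READ_BYTES`) is hypothetical, the size threshold is an assumption, and nothing below reflects the actual context-mode or lean-ctx hook APIs.

```python
from dataclasses import dataclass

# Hypothetical names throughout: ToolCall, Decision and pre_tool_use are
# illustrative only and do not come from context-mode, lean-ctx or ultimate-pi.

BLOCKED_COMMANDS = ("curl", "wget", "rm -rf")  # the examples listed under Layer 2
MAX_READ_BYTES = 20_000  # assumed cut-off for what counts as a "large" read


@dataclass
class ToolCall:
    tool: str            # e.g. "Bash" or "Read"
    argument: str        # shell command or file path
    size_bytes: int = 0  # size of the read target, if known


@dataclass
class Decision:
    action: str  # "allow", "block" or "sandbox"
    reason: str


def pre_tool_use(call: ToolCall) -> Decision:
    """Decide what happens to a tool call before it executes."""
    if call.tool == "Bash":
        # Naive substring match; a real hook would tokenize the command.
        for blocked in BLOCKED_COMMANDS:
            if blocked in call.argument:
                return Decision("block", f"command contains '{blocked}'")
    if call.tool == "Read" and call.size_bytes > MAX_READ_BYTES:
        return Decision("sandbox", "large read routed to sandbox execution")
    return Decision("allow", "no rule matched")
```

The point of the sketch is that the decision is made by the harness rather than the agent, which is why the page rates this layer as more reliable than prompt-only rules.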

package/vault/wiki/concepts/agentic-harness.md DELETED

@@ -1,34 +0,0 @@
----
-type: concept
-title: "Agentic Harness"
-created: 2026-04-30
-updated: 2026-04-30
-status: seed
-tags: [#concept, #harness]
-related:
-  - "[[harness]]"
-  - "[[harness-implementation-plan]]"
-  - "[[harness-wiki-skill-mapping]]"
----
-# Agentic Harness
-> [!stub] This is a stub page. See [[harness]] for the full module documentation.
-The agentic harness is the central execution pipeline in the ultimate-pi architecture. It enforces an 8-layer mandatory workflow where every task must flow through all layers without skipping.
-## What it does
-- Enforces structured execution (no ad-hoc coding)
-- Runs adversarial verification (critic agents attack, not review)
-- Maintains persistent memory via the wiki vault
-- Orchestrates multi-step plans with grounding checkpoints
-## Key pages
-- [[harness]] — full module documentation
-- [[harness-implementation-plan]] — build phases and token budgets
-- [[harness-wiki-pipeline]] — data flow between harness and wiki
-- [[adr-008]] — Spec-Only Black-Box QA decision
-- [[adr-009]] — Mode B persistent memory decision
-- [[adr-010]] — Harness-wiki tight-coupling contract

package/vault/wiki/concepts/agentic-orchestration-pipeline.md DELETED

@@ -1,56 +0,0 @@
----
-type: concept
-tags:
-  - orchestration
-  - multi-agent
-  - pipeline
-  - agent-architecture
-related:
-  - "[[Agent Harness Architecture]]"
-  - "[[Multi-Agent Specialization]]"
-  - "[[sources/disler-pi-vs-claude-code]]"
-  - "[[sources/opendev-arxiv-2603.05344v1]]"
----
-# Agentic Orchestration Pipeline
-A structured workflow where multiple specialized AI agents coordinate to complete complex software engineering tasks. The orchestrator decomposes work, routes to specialists, and assembles results.
-## Three Orchestration Patterns
-### 1. Subagent Delegation (Fan-out)
-A primary agent spawns isolated subagents for independent subtasks. Each subagent runs in its own context window with filtered tool access. Results are collected and synthesized by the primary agent.
-**Implementation**: Pi's `subagent-widget` extension (`/sub <task>`), OpenDev's `spawn_subagent` tool.
-**Best for**: Parallel exploration, isolated analysis, background tasks.
-### 2. Team Dispatch (Specialist Routing)
-A dispatcher agent reviews user requests and selects the most appropriate specialist from a predefined roster. Each specialist has a domain-specific system prompt and tool set.
-**Implementation**: Pi's `agent-team` extension, configured via `.pi/agents/teams.yaml`. The dispatcher uses a `dispatch_agent` tool.
-**Best for**: Work that benefits from domain expertise (frontend vs backend, planning vs execution).
-### 3. Sequential Chaining (Pipeline)
-Multiple agents execute in sequence where each step's output feeds into the next step's prompt. The `$INPUT` variable carries the previous step's output; `$ORIGINAL` always contains the initial user prompt.
-**Implementation**: Pi's `agent-chain` extension, defined in `.pi/agents/agent-chain.yaml` as a list of `steps` with `agent` and `prompt` fields.
-**Best for**: Multi-phase workflows (plan → build → review → fix → verify).
-## Design Principles
-1. **Schema-level isolation**: Subagents receive filtered tool schemas — they can't attempt actions they shouldn't perform. More robust than runtime permission checks.
-2. **Context isolation**: Each subagent runs with an independent conversation history. Only summaries return to the parent, preventing context pollution.
-3. **Explicit termination**: Subagents have clear stop conditions to prevent over-exploration.
-4. **Parallel execution**: Independent subagent calls auto-parallelize via thread pools.
-5. **Model specialization**: Different pipeline stages can use different models (e.g., Opus for planning, Sonnet for building, Haiku for reviewing).
-## Harness Implementation Path
-Our harness can adopt all three patterns as Pi extensions:
-1. Extend existing `Agent` tool with team dispatch via YAML config
-2. Add chain orchestration with `$INPUT` variable injection
-3. Implement context isolation per subagent (fresh conversation per spawn)
-4. Add progress dashboards (grid for teams, step tracker for chains)
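
Annotation (not part of the deleted file): the sequential chaining pattern above passes each step's output to the next step through `$INPUT`, while `$ORIGINAL` always holds the initial user prompt. A rough sketch of that substitution loop follows. `run_chain`, `render` and the `run_agent` callback are hypothetical names, and treating the user prompt as `$INPUT` for the first step is an assumption; this is not the `agent-chain` extension's implementation.

```python
import re
from typing import Callable

# Hypothetical sketch: run_chain, render and run_agent are illustrative names,
# not the agent-chain extension's actual code.

Step = dict[str, str]  # {"agent": ..., "prompt": ...}, mirroring the `steps` fields


def render(template: str, previous: str, original: str) -> str:
    """Substitute $INPUT and $ORIGINAL in one pass so inserted text is never re-scanned."""
    values = {"INPUT": previous, "ORIGINAL": original}
    return re.sub(r"\$(INPUT|ORIGINAL)", lambda m: values[m.group(1)], template)


def run_chain(steps: list[Step], original: str,
              run_agent: Callable[[str, str], str]) -> str:
    """Run each step in order, feeding the previous output into the next prompt."""
    previous = original  # assumption: the first step sees the user prompt as $INPUT
    for step in steps:
        prompt = render(step["prompt"], previous, original)
        previous = run_agent(step["agent"], prompt)
    return previous
```

A single-pass substitution is used so that an agent output that happens to contain the literal text `$ORIGINAL` is not expanded a second time.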

package/vault/wiki/concepts/agentic-search-no-embeddings.md DELETED

@@ -1,18 +0,0 @@
----
-type: concept
-status: stub
-created: 2026-05-02
-updated: 2026-05-02
-tags: [concept, search, agents]
----
-# Agentic Search Without Embeddings
-Pattern used by Claude Code: agents search codebases by reading files directly (grep, find, AST traversal) rather than relying on pre-built embedding indexes. No vector database required.
-Contrasts with [[Semantic Codebase Indexing]] and [[hybrid-code-search]]. Relevant to the embedding-vs-agentic-search design tension in harness architecture.
-## References
-- [[claude-code-architecture-vila-lab-2026]]
-- [[agent-search-enforcement]]

package/vault/wiki/concepts/anthropic-context-engineering.md DELETED

@@ -1,13 +0,0 @@
----
-type: concept
-status: stub
-created: 2026-05-02
-updated: 2026-05-02
-tags: [concept, context]
----
-# Anthropic Context Engineering
-Anthropic's approach to context engineering for Claude agents. Encompasses prompt design, context window management, and tool output formatting.
-Referenced in: [[Research: Meta-Agent Context Drift Detection]]

package/vault/wiki/concepts/antigravity-agent-first-architecture.md DELETED

@@ -1,61 +0,0 @@
----
-type: concept
-title: "Antigravity Agent-First Architecture"
-status: developing
-created: 2026-05-01
-updated: 2026-05-01
-tags:
-  - antigravity
-  - agent-architecture
-  - harness-design
-aliases: ["agent-first IDE", "Antigravity architecture"]
-related:
-  - "[[agentic-harness]]"
-  - "[[model-adaptive-harness]]"
-  - "[[harness-implementation-plan]]"
-sources:
-  - "[[google-antigravity-official-blog]]"
-  - "[[google-antigravity-wikipedia]]"
-  - "[[cursor-vs-antigravity-2026]]"
----
-# Antigravity Agent-First Architecture
-Google Antigravity's foundational architectural shift: the IDE is not an AI-enhanced editor. It is a **control plane for autonomous coding agents**.
-## The Two-View Architecture
-### Editor View
-Traditional IDE interface (VS Code fork). Agent sidebar. Tab completions, inline commands. For hands-on synchronous workflows.
-### Manager View ("Mission Control")
-Dedicated orchestration interface. Spawn, supervise, and redirect multiple agents working asynchronously across different workspaces. The human shifts from coder to architect.
-## Core Innovation: The Inversion
-```
-Traditional: Human → IDE → Agent (agent as assistant in sidebar)
-Antigravity: Human → Manager View → Multiple Agents → Editor/Browser/Terminal
-```
-The Manager View inverts the relationship. The interface is embedded in the agent, not the other way around. Agents have direct access to editor, terminal, and browser as equal tool surfaces.
-## What This Means for Harness Design
-Our 8-layer harness is a **pipeline** (sequential, mandatory layers). Antigravity's is a **control plane** (parallel agents, asynchronous execution).
-These are complementary architectures:
-- **Pipeline**: Best for quality enforcement, correctness guarantees, drift detection
-- **Control Plane**: Best for parallelism, task delegation, human oversight
-The harness should adopt the control-plane model for its L7 orchestration layer while keeping the pipeline model for L1-L4 quality enforcement.
-## Four Design Tenets
-1. **Trust**: Artifacts replace raw tool logs. Agents prove work via verifiable deliverables.
-2. **Autonomy**: Agents have full control of multiple surfaces. No constant human prompts.
-3. **Feedback**: Google Docs-style commenting on artifacts. Asynchronous. No restart needed.
-4. **Self-Improvement**: Agents learn from past work. Knowledge base persists across projects.
-## Our Gap
-The harness has no Manager View equivalent. L7 (Schema Orchestration) is DAG-based sequential orchestration, not parallel agent dispatch. This is a design gap — but may be intentional: our harness targets CLI-level enforcement, not IDE-level.

package/vault/wiki/concepts/ast-compression.md DELETED

@@ -1,19 +0,0 @@
----
-type: concept
-title: "ast-compression"
-created: 2026-04-30
-updated: 2026-04-30
-status: seed
-tags: [#concept, #lean-ctx, #context-optimization]
-related:
-  - "[[lean-ctx]]"
-  - "[[ast-truncation]]"
----
-# AST Compression
-> [!stub] See also: [[ast-truncation]] for the harness-specific implementation.
-lean-ctx's approach to code compression: use tree-sitter to parse code in 18 languages, extract only signatures, types, and logic bodies, and strip comments, whitespace, and non-essential syntax. Achieves 60-95% token reduction on source files.
-Differs from [[ast-truncation]] (which stubs function bodies) in that AST compression preserves logic but strips non-semantic elements, while AST truncation removes function bodies entirely for high-level structural views.

package/vault/wiki/concepts/ast-truncation.md DELETED

@@ -1,66 +0,0 @@
----
-type: concept
-title: "AST Truncation"
-created: 2026-04-30
-updated: 2026-04-30
-tags:
-  - agent-context
-  - token-reduction
-  - tree-sitter
-  - context-window
-related:
-  - "[[repo-map-ranking]]"
-  - "[[progressive-disclosure-agents]]"
-  - "[[wozcode]]"
-  - "[[research-wozcode-token-reduction]]"
-status: developing
----
-# AST Truncation
-AST truncation is a technique for reducing LLM input tokens during code exploration by returning function/method signatures while stubbing their bodies. Unlike file-level selection (choose which files to show), AST truncation operates at the syntax level: show the interface, hide the implementation.
-## How It Works
-1. Parse a source file with tree-sitter to produce a concrete syntax tree
-2. Identify all definition nodes: functions, methods, classes, type definitions
-3. For each definition: return the signature (name, parameters, return type, docstring)
-4. Replace the body with a stub: `{ /* ... N lines truncated ... */ }`
-5. The model can request full body expansion for specific definitions
-## Token Savings
-- A typical function signature is 3-10 lines; its body may be 50-500 lines
-- For files with many functions, AST truncation can reduce context by 70-90%
-- The model still sees the "map" (what exists, how things connect) without the "territory" (full implementation)
-## Relationship to Repo-Map Ranking
-[[repo-map-ranking]] selects *which files* to include. AST truncation selects *how much* of each file to include. Combined:
-| Level | Technique | What's Shown |
-|-------|-----------|-------------|
-| L0 | File list | Filenames only |
-| L1 | AST truncation | Signatures + stubs |
-| L2 | AST truncation + imports | Signatures, imports, cross-references |
-| L3 | Full content | Everything (on demand) |
-This maps to and extends our existing [[progressive-disclosure-agents]] model.
-## WOZCODE Implementation
-WOZCODE uses AST truncation as its primary input-reduction lever (Source: [[wozcode]]). Combined with ranked search results (not full-file grep dumps), it reduces input tokens on code exploration calls. Their architecture returns "what the model needs" rather than everything found.
-## Limitations
-- **Dynamic languages**: Python, JavaScript, Ruby — tree-sitter can parse syntax but not always resolve types or call targets statically. Truncation may hide important runtime behavior.
-- **Decorators/metaprogramming**: Code generation patterns (Python decorators, Ruby method_missing, JS proxies) create behavior not visible in AST signatures.
-- **Test files**: Often rely on implicit context (fixtures, before/after hooks). Truncation may hide critical setup.
-- **Parser availability**: Requires tree-sitter grammar for each language in the codebase.
-## Implementation Path for Our Harness
-1. Leverage existing [[repo-map-ranking]] tree-sitter infrastructure
-2. Add a `--truncate` flag to the `read` tool (L8 wiki-query-interface)
-3. Implement progressive expansion: model requests `read --expand funcName`
-4. Integrate with [[grounding-checkpoints]] (L3) for verification reads
-5. Language coverage: start with TypeScript/JavaScript, Python, then extend

package/vault/wiki/concepts/barrel-files.md DELETED

@@ -1,37 +0,0 @@
----
-type: concept
-status: developing
-tags:
-  - typescript
-  - barrel-files
-  - code-organization
-  - performance
-related:
-  - "[[barrel-files-tkdodo]]"
-  - "[[Research: TypeScript Best Practices and Codebase Structure]]"
-created: 2026-05-02
-updated: 2026-05-02
----
-# Barrel Files
-A barrel file is a module (typically `index.ts`) that does nothing but re-export symbols from other files in the same directory. It provides a single import entry point for consumers.
-## The Debate
-**Pro-barrel** (traditional view): Clean imports (`import { X, Y } from '@/dir'`), hides internal structure, simplifies refactoring.
-**Anti-barrel** (emerging consensus, 2024+): Causes circular imports, slows development servers, blocks bundler optimizations.
-## Known Problems
-1. **Circular imports**: When a module inside a directory imports from its own barrel, a circular dependency forms.
-2. **Dev server slowdown**: JavaScript loads and parses every module in the barrel synchronously. Real-world case: 11K → 3.5K modules (68% reduction) by removing barrels, cutting startup from 5-10 seconds.
-3. **Blocks `optimizePackageImports`**: Next.js optimization only works on "pure" re-export barrels with no side-effect code.
-## Current Best Practice (2024+)
-**Application code**: Avoid barrel files. Import directly from source files.
-**Library code**: Barrel files are appropriate as the public API entry point (specified in `package.json` `main` field).
-**Linting**: Enable `import/no-cycle` ESLint rule to catch circular imports from barrels.

package/vault/wiki/concepts/browser-harness-agent.md DELETED

@@ -1,41 +0,0 @@
----
-type: concept
-title: "browser-harness — Self-Healing CDP Harness"
-status: developing
-created: 2026-05-02
-updated: 2026-05-02
-tags:
-  - browser-automation
-  - cdp
-  - headless-browser
-  - browser-harness
-aliases: ["browser-harness", "CDP harness"]
-related:
-  - "[[browser-subagent-visual-verification]]"
-  - "[[harness-implementation-plan]]"
-  - "[[Source: browser-harness CDP Harness]]"
-sources:
-  - "[[Source: browser-harness CDP Harness]]"
----
-# browser-harness — Self-Healing CDP Harness
-A thin, state-of-the-art CDP (Chrome DevTools Protocol) harness by browser-use (9.4K GitHub stars, MIT, Python). It connects LLMs directly to Chrome over a single WebSocket, with nothing in between. Self-healing: the agent writes missing helper functions mid-execution.
-## Core Idea
-No Puppeteer. No Playwright. No pre-baked helpers. Just raw Chrome DevTools Protocol over a WebSocket. The agent calls `session.Page.navigate()`, `session.Input.dispatchMouseEvent()` — exactly what CDP provides, nothing hidden.
-When the agent encounters a missing interaction pattern, it writes the helper itself in `agent-workspace/agent_helpers.py`. The harness improves itself every run.
-## Architecture
-- **browser-harness** (Python, 9.4K stars): ~592 lines of core. Agent-editable workspace + domain skills.
-- **browser-harness-js** (TypeScript, 428 stars): 652 typed CDP methods. Bun-native REPL. `npx skills add` install.
-## Key Properties
-- **Minimal**: ~592 lines of Python. One WebSocket to Chrome.
-- **Self-healing**: Agent writes missing helpers mid-task.
-- **CDP-native**: 56+ domains, 652+ methods — no wrappers, no abstraction.
-- **Agent-editable**: `agent_helpers.py` and `domain-skills/` designed for agent modification.
-- **No version drift**: Auto-generated from Chrome protocol JSON.

package/vault/wiki/concepts/browser-subagent-visual-verification.md DELETED

@@ -1,82 +0,0 @@
----
-type: concept
-title: "Browser Subagent for Visual Verification"
-status: developing
-created: 2026-05-01
-updated: 2026-05-02
-tags:
-  - antigravity
-  - browser-automation
-  - visual-verification
-  - tools
-  - agent-browser
-aliases: ["headless browser agent", "visual verification subagent"]
-related:
-  - "[[agentic-harness]]"
-  - "[[harness-implementation-plan]]"
-  - "[[grounding-checkpoints]]"
-  - "[[agent-browser-browser-automation]]"
-sources:
-  - "[[cursor-vs-antigravity-2026]]"
-  - "[[google-antigravity-official-blog]]"
-  - "[[Source: Vercel Labs agent-browser]]"
----
-# Browser Subagent for Visual Verification
-Antigravity's most distinctive technical capability: an agent subprocess that drives a headless Chromium browser to visually verify UI changes.
-## How It Works
-1. Agent makes a code change (e.g., CSS fix)
-2. Agent spins up local dev server
-3. Browser subagent opens headless Chrome
-4. Subagent navigates to the affected page
-5. Takes before/after screenshots
-6. Uses vision-optimized models to analyze pixel differences
-7. Verifies the fix worked visually
-8. Reports results with screenshot evidence
-## Why This Is Revolutionary
-Traditional coding agents are **blind**. They reason about code as text but cannot see what it produces. A CSS change that "looks right" to the model may look completely wrong in the browser. The browser subagent closes this loop.
-## Use Cases
-- **CSS/UI fixes**: Agent sees if padding/margins/layout actually work
-- **Visual regression testing**: Before/after screenshots as verifiable artifacts
-- **Cross-device verification**: Test at different viewport sizes
-- **Form interaction testing**: Click buttons, fill forms, verify behavior
-- **Login flow testing**: Automate auth flows end-to-end
-## Gap in Our Harness
-Our harness has **no browser control capability**. All verification is:
-- **Syntax-level** (P11 inline validation, P20 lint/format)
-- **Semantic-level** (L4 adversarial critic)
-- **Observability-level** (L5 metrics)
-None of this can verify that a UI change actually produced the correct visual result.
-## Proposed Integration: Phase P30
-Add a **Browser Subagent** to the tool registry:
-- `lib/harness-browser.ts` — agent-browser driving headless Chrome via Rust daemon
-- `extensions/harness-browser.ts` — Extension hook: after UI-related edits, optionally trigger visual verification
-- Configurable: `.pi/harness/browser.json` — enable/disable, screenshot directories, viewport configs
-The browser subagent operates as a specialized subagent (P25 router dispatches UI tasks to it). It reports results as artifacts (P31).
-> [!update] May 2026: Replaced browser-harness (9.4K stars, Python) with **Vercel Labs agent-browser** (31.4K stars, Apache 2.0, Rust-native). agent-browser provides richer AI agent integration: snapshot + refs workflow, annotated screenshots, structured diff, React introspection, Web Vitals, batch mode, and built-in skills system. See [[agent-browser-browser-automation]] and [[Source: Vercel Labs agent-browser]].
-### Why agent-browser over browser-harness
-| Feature | browser-harness | agent-browser |
-|---------|----------------|---------------|
-| **Ecosystem** | 9.4K stars, Python | 31.4K stars, Rust-native binary |
-| **Agent workflow** | Raw CDP — agent writes helpers | Snapshot + @eN refs — purpose-built |
-| **Visual diff** | None | `diff screenshot --baseline before.png` |
-| **Annotated screenshots** | None | `--annotate` with numbered labels |
-| **Skills system** | None | `skills get core`, `npx skills add` |
-| **Batch mode** | None | Multi-command single invocation |
-| **Install** | `uv add browser-harness` (Python dep) | `npm install -g agent-browser` (single binary) |