npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4

Files changed (516)

package/vault/wiki/sources/Source: TianPan Prompt Caching Architecture.md ADDED

@@ -0,0 +1,89 @@
+---
+type: source
+status: ingested
+source_type: engineering-blog
+title: "Prompt Caching: The Optimization That Cuts LLM Costs by 90%"
+author: "Tian Pan"
+date_published: 2026-04-07
+url: "https://tianpan.co/blog/2025-10-13-prompt-caching-cut-llm-costs"
+confidence: high
+tags:
+  - prompt-caching
+  - cost-optimization
+  - multi-model
+  - cache-architecture
+related:
+  - "[[Research: Prompt Renderer for Multi-Model Agent Harness]]"
+  - "[[Source: Arxiv — Don't Break the Cache]]"
+key_claims:
+  - "Most teams overpay 60-90% by reprocessing the same tokens on every request"
+  - "Multi-tier caching: Semantic cache (100% savings) → Prefix cache (50-90% savings) → Full inference (0% savings)"
+  - "Golden rule: static content first, dynamic content last — injecting timestamps/user IDs breaks the cache"
+  - "Parallel execution trap: firing parallel requests before cache warms → 4% hit rate. Fix: dedicated warmup call"
+  - "Anthropic: cache write 25% premium, cache read 90% discount — break-even at 1.4 cache hits"
+  - "OpenAI: auto-caching, 50% discount, no write premium"
+  - "Monitor: cache hit rate = cache_read_input_tokens / total_input_tokens, target 70%+"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Multi-Tier Prompt Caching Architecture
+## Three-Tier Stack
+```
+Request
+→ Semantic cache (exact/near-duplicate queries) → 100% savings
+→ Prefix cache (shared static context)           → 50-90% savings
+→ Full inference                                  → 0% savings
+```
+A well-tuned system routes 70-80% of tokens through the caching layers.
+## Prompt Structure IS Cache Architecture
+The golden rule: **static content first, dynamic content last.**
+```
+[System prompt — stable across all requests]         ← CACHED
+[Retrieved documents — stable for a given session]   ← CACHED
+[Conversation history — grows per turn]              ← PARTIAL
+[Current user message — always new]                  ← NEVER
+```
+Cache-breaking anti-patterns:
+- Timestamps in system prompts
+- User IDs in static sections
+- Request IDs injected early
+- Document content that varies slightly across requests
+## Provider Differences
+| Provider | Cache Control | Write Cost | Read Cost | TTL |
+|----------|-------------|-----------|-----------|-----|
+| Anthropic | Explicit `cache_control` markers | +25% premium | 90% discount | 5min (extends to 1h) |
+| OpenAI | Automatic | None | 50% discount | 5min |
+| Google | Explicit context cache | Storage cost | Guaranteed discount | Configurable |
+| vLLM (self-host) | Automatic prefix caching (APC) | None | 14-24x throughput | Hash-table KV blocks |
+## The Parallel Execution Trap
+**Problem**: Firing 10 parallel requests before the cache is written produces 10 cache writes and 0 reads, at 5-10x the expected cost.
+**Fix**: Issue a dedicated warmup call with `max_tokens=1` before parallel processing.
+Cost comparison for a 30K-token document with 3 parallel questions: $0.34 without warming vs $0.14 with warming, a 59% reduction.
+## When Caching Hurts
+- One-shot workflows: everything is unique, you're paying write premiums for zero reads
+- Dynamic system prompts: personalization undermines prefix caching
+- Short prompts: below 1,024-token threshold, caching doesn't engage
+- Cold starts: freshly deployed services, cache TTL expiry at low-traffic hours
+## Relevance to ultimate-pi Prompt Renderer
+The caching layer in the prompt renderer should:
+1. **Hash-based cache keys**: hash the base spec + variables → deterministic cache lookup
+2. **Pre-compiled prompts shipped in npm**: eliminates cache warmup entirely — prompts are pre-rendered at build time
+3. **Output caching for rendered prompts**: if same spec+model+vars produces the same output, return cached result
+4. **Monitoring**: track renderer cache hit rate (prompts served from pre-compiled vs runtime-rendered)
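
The renderer-side caching points listed above (hash-based keys, output caching, hit-rate monitoring) can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `cache_key`, `RenderCache`, and `demo_render` are hypothetical names, and the cache is a plain in-memory dict.

```python
import hashlib
import json


def cache_key(spec, model, variables):
    """Deterministic key: the same spec + model + variables always hash alike."""
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(
        {"spec": spec, "model": model, "vars": variables},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RenderCache:
    """In-memory cache of rendered prompts that also counts hits and misses."""

    def __init__(self, render):
        self._render = render  # hypothetical renderer: (spec, model, vars) -> str
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, spec, model, variables):
        key = cache_key(spec, model, variables)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._render(spec, model, variables)
        return self._store[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def demo_render(spec, model, variables):
    # Stand-in renderer (assumption): static spec first, dynamic variables last.
    return f"{spec}\n{json.dumps(variables, sort_keys=True)}"


cache = RenderCache(demo_render)
first = cache.get("system spec", "model-a", {"a": 1, "b": 2})
second = cache.get("system spec", "model-a", {"b": 2, "a": 1})  # same vars, other order
```

Serializing with sorted keys is what makes the lookup deterministic: two calls that differ only in variable ordering resolve to the same entry, so the second call above counts as a hit.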

package/vault/wiki/sources/Source: Vercel Labs agent-browser.md ADDED

@@ -0,0 +1,155 @@
+---
+type: source
+status: ingested
+source_type: official-repo
+title: "agent-browser — Browser Automation CLI for AI Agents by Vercel Labs"
+author: "Vercel Labs"
+date_published: 2026-04-16
+url: "https://github.com/vercel-labs/agent-browser"
+confidence: high
+tags:
+  - browser-automation
+  - ai-agents
+  - vercel-labs
+  - rust
+  - cdp
+  - headless-browser
+related:
+  - "[[agent-browser-browser-automation]]"
+  - "[[browser-subagent-visual-verification]]"
+  - "[[harness-implementation-plan]]"
+key_claims:
+  - "31.4K GitHub stars, 1.9K forks, 568 commits, Apache 2.0 — Rust-native browser automation CLI for AI agents"
+  - "Native Rust CLI + daemon — single binary, no Node.js required after install"
+  - "npm install -g agent-browser for global install. Also Homebrew, Cargo."
+  - "Snapshot + refs workflow optimized for LLMs: snapshot -i → click @e2 → fill @e3"
+  - "React introspection: react tree, react inspect, react renders, react suspense"
+  - "Web Vitals: LCP/CLS/TTFB/FCP/INP with React hydration phases"
+  - "Annotated screenshots: --annotate overlays numbered labels matching @eN refs"
+  - "Diff: structural snapshot diff + visual pixel diff between pages/states"
+  - "Multi-provider: Chrome local, Browserless, Browserbase, Browser Use, Kernel, AgentCore, iOS"
+  - "Security: domain allowlist, action policy, content boundaries, action confirmation"
+  - "Skills system: agent-browser skills get core — 420-line usage guide, npx skills add"
+  - "Dashboard: local web dashboard with live viewport, activity feed, AI chat"
+  - "batch mode: multi-command single invocation; session persistence; auth vault"
+  - "112 contributors, 81 releases, Rust 85% + TypeScript 12.4%"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# agent-browser — Browser Automation CLI for AI Agents
+**Repository**: https://github.com/vercel-labs/agent-browser
+**Stars**: 31.4K | **Forks**: 1.9K | **Commits**: 568 | **License**: Apache 2.0
+**Language**: Rust 85% + TypeScript 12.4% | **Status**: Active (v0.26.0, Apr 16, 2026)
+## What It Is
+agent-browser is a **Rust-native browser automation CLI purpose-built for AI agents**. Unlike traditional browser automation tools (Puppeteer, Playwright, Selenium) designed for human scripting, agent-browser provides an agent-first interface: snapshot-based element refs (`@e1`, `@e2`), JSON output mode, annotated screenshots, and structured diff commands.
+**Core philosophy**: Give AI agents a CLI that speaks their language — refs, snapshots, JSON — not a scripting API.
+## Architecture
+```
+AI Agent → agent-browser CLI → Rust Daemon → Chrome DevTools Protocol → Chrome
+              ↑                        ↑
+              skills/                   agent-browser.json (config)
+              (SKILL.md discovery)      .agent-browser/ (sessions, auth)
+```
+- **Rust CLI**: Parses commands, communicates with daemon via IPC
+- **Rust Daemon**: Pure Rust daemon using direct CDP. No Node.js required.
+- **Client-daemon model**: Daemon auto-starts, persists between commands for speed
+- **Multi-provider backend**: Local Chrome, Browserless, Browserbase, Browser Use, Kernel, AgentCore, iOS
+## Key Properties
+| Property | Description |
+|----------|-------------|
+| **Agent-native** | Snapshot with refs (`@e1`), JSON output, annotated screenshots with matching labels |
+| **Rust-native** | Single binary, sub-second startup. 85% Rust, 12.4% TypeScript |
+| **Full CLI** | 80+ commands: navigate, interact, snapshot, screenshot, diff, react, network, auth |
+| **Skills system** | `agent-browser skills get core` — 420-line usage guide. `npx skills add vercel-labs/agent-browser` |
+| **Security-first** | Domain allowlist, action policy, content boundaries, auth vault with AES-256-GCM encryption |
+| **Multi-provider** | Local Chrome + 6 cloud providers (Browserless, Browserbase, Browser Use, Kernel, AgentCore, iOS) |
+| **Dashboard** | Local web dashboard (port 4848) with live viewport, activity feed, AI chat |
+| **Batch mode** | Multiple commands in single CLI invocation, JSON stdin mode |
+| **Diff** | Structural snapshot diff + visual pixel diff between before/after states |
+| **React introspection** | React component tree, fiber inspection, suspense boundaries, render profiling |
+| **Web Vitals** | LCP/CLS/TTFB/FCP/INP with React hydration phase breakdown |
+## Why agent-browser Replaces browser-harness for P30
+| Aspect | browser-harness | agent-browser |
+|--------|----------------|---------------|
+| **Stars** | 9.4K | 31.4K (3.3× larger) |
+| **Language** | Python (~592 lines core) | Rust (native binary, sub-second) |
+| **AI agent workflow** | Raw CDP — agent writes helpers mid-execution | Snapshot + refs — purpose-built for LLMs |
+| **Skill system** | None | Built-in: `skills get core`, `npx skills add` |
+| **Diff/verify** | None | Structural + visual diff between states |
+| **Annotated screenshots** | None | `--annotate` with numbered labels → `@eN` refs |
+| **React/Vitals** | None | `react tree`, `react renders`, `vitals` |
+| **Security** | None | Domain allowlist, action policy, boundaries, auth vault |
+| **Cloud providers** | None | 6 providers (Browserless, Browserbase, Browser Use, Kernel, AgentCore, iOS) |
+| **Dashboard** | None | Live viewport + activity feed + AI chat |
+| **Install** | `uv add browser-harness` (Python) | `npm install -g agent-browser` (single binary) |
+| **Maturity** | 253 commits, 1 main contributor | 568 commits, 112 contributors, 81 releases |
+| **License** | MIT | Apache 2.0 |
+## Integration with ultimate-pi Harness (P30)
+```
+P25 Subagent Router → P30 Browser Subagent
+    ↓
+agent-browser CLI (Rust binary, sub-second startup)
+    ↓
+Chrome DevTools Protocol (Rust daemon)
+    ↓
+Chrome (headless or headed)
+    ↓
+Visual verification: agent-browser snapshot -i → click @e2 → screenshot --annotate
+    ↓
+Diff: agent-browser diff screenshot --baseline before.png
+```
+### Harness Config
+```json
+// .pi/harness/browser.json
+{
+  "engine": "agent-browser",
+  "mode": "headless",
+  "screenshot_dir": ".pi/harness/screenshots/",
+  "viewport": {"width": 1280, "height": 720},
+  "timeout_ms": 25000
+}
+```
+### Key Commands for Harness P30
+```bash
+# Navigate and snapshot
+agent-browser open <url> && agent-browser snapshot -i --json
+# Interact via refs
+agent-browser click @e2
+agent-browser fill @e3 "text"
+# Visual verification
+agent-browser screenshot --annotate before.png
+# ... code change ...
+agent-browser reload
+agent-browser screenshot --annotate after.png
+agent-browser diff screenshot --baseline before.png -o diff.png
+# Structural diff
+agent-browser diff snapshot --baseline before-snapshot.txt
+```
+## What We Deliberately Do NOT Adopt
+- **Dashboard UI**: CLI harness only. Dashboard is nice-to-have for debugging but not integrated.
+- **AI Chat feature**: Uses Vercel AI Gateway. Our agent IS the chat. Not needed.
+- **Cloud providers**: Local Chrome only for harness. Cloud providers add latency and cost. Available as opt-in.
+- **iOS Simulator**: Out of scope for web-focused harness. Available as opt-in.
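
The before/after verification flow from the "Key Commands" block above could be assembled by a harness along these lines. This is a sketch under assumptions: `verification_commands` and `render` are hypothetical helpers, the subcommands and flags are copied from the commands quoted in the note, and nothing here runs the CLI.

```python
import shlex


def verification_commands(url, before="before.png", after="after.png", diff_out="diff.png"):
    """Return the before/after visual-check sequence as argv lists (nothing is executed)."""
    cli = "agent-browser"
    return {
        # Run before the code change: open the page, snapshot refs, capture a baseline.
        "before": [
            [cli, "open", url],
            [cli, "snapshot", "-i", "--json"],
            [cli, "screenshot", "--annotate", before],
        ],
        # Run after the code change: reload, capture again, diff against the baseline.
        "after": [
            [cli, "reload"],
            [cli, "screenshot", "--annotate", after],
            [cli, "diff", "screenshot", "--baseline", before, "-o", diff_out],
        ],
    }


def render(command):
    """Shell-quoted form of one argv list, useful for logging."""
    return shlex.join(command)


commands = verification_commands("http://localhost:3000")
for phase in ("before", "after"):
    for command in commands[phase]:
        print(phase, render(command))
```

Keeping each command as an argv list rather than a shell string means a caller could hand it to a process runner without extra quoting, which matters once URLs or file names contain spaces.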

package/vault/wiki/sources/Source: browser-harness CDP Harness.md ADDED

@@ -0,0 +1,126 @@
+---
+type: source
+status: ingested
+source_type: official-repo
+title: "browser-harness — Self-Healing CDP Harness by browser-use"
+author: "browser-use"
+date_published: 2026-04-17
+url: "https://github.com/browser-use/browser-harness"
+confidence: high
+tags:
+  - browser-automation
+  - cdp
+  - headless-browser
+  - browser-harness
+  - self-healing
+related:
+  - "[[Research: Google Antigravity Harness Integration]]"
+  - "[[browser-subagent-visual-verification]]"
+  - "[[browser-harness-agent]]"
+key_claims:
+  - "9.4K GitHub stars, 855 forks, 253 commits, MIT license — thin CDP harness for LLM browser control"
+  - "~592 lines of Python core — connects LLM directly to Chrome via one WebSocket, nothing between"
+  - "Self-healing: agent writes missing helper functions mid-task during execution"
+  - "TypeScript version available: browser-harness-js (428 stars, Bun-native, 652 typed CDP wrappers)"
+  - "No pre-baked helpers — raw CDP protocol. Agent calls session.Domain.method() directly"
+  - "One WebSocket to Chrome, zero abstraction layers. The protocol IS the API"
+  - "agent-workspace: agent-editable helper code + domain-skills/ for reusable per-site playbooks"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# browser-harness — Self-Healing CDP Harness
+**Repository**: https://github.com/browser-use/browser-harness
+**Stars**: 9.4K | **Forks**: 855 | **Commits**: 253 | **License**: MIT
+**Language**: Python 100% | **Status**: Active (commits today — May 2, 2026)
+## What It Is
+browser-harness is a **minimal, self-healing CDP harness** that connects LLMs directly to Chrome via the Chrome DevTools Protocol. Unlike Puppeteer/Playwright (which wrap CDP with high-level helper APIs), browser-harness gives the LLM **direct CDP access** — the agent writes what's missing during execution.
+**Core philosophy**: "One WebSocket to Chrome, nothing between. The agent writes what's missing during execution. The harness improves itself every run."
+## Architecture
+```
+LLM Agent → browser-harness → Chrome DevTools Protocol (CDP) → Chrome
+              ↑
+              agent-workspace/agent_helpers.py (agent edits this!)
+              agent-workspace/domain-skills/ (reusable per-site playbooks)
+```
+## Key Properties
+| Property | Description |
+|----------|-------------|
+| **Minimal** | ~592 lines of Python core. No Puppeteer, no Playwright, no Selenium. |
+| **Self-healing** | Agent encounters missing helper → agent writes it mid-task → harness works next time. |
+| **CDP-native** | Direct `session.Page.navigate()`, `session.Input.dispatchMouseEvent()` — no wrappers. |
+| **Thin** | One WebSocket to Chrome. Nothing between the LLM and the browser. |
+| **Agent-editable** | `agent-workspace/agent_helpers.py` is designed for the agent to edit during execution. |
+| **Domain skills** | `agent-workspace/domain-skills/` — reusable playbooks per site (GitHub, LinkedIn, Amazon…). |
+## TypeScript Version: browser-harness-js
+**Repository**: https://github.com/browser-use/browser-harness-js
+**Stars**: 428 | **License**: MIT | **Language**: TypeScript 99.4%
+- 56 CDP domains, 652 typed wrappers — auto-generated from protocol JSON
+- `npx skills add https://github.com/browser-use/browser-harness-js`
+- Bun-native REPL server. CLI forwards snippets to running session.
+- **No helpers at all** — "The protocol IS the API. If Chrome can do it, you can call it."
+- Pure CDP recipes in `interaction-skills/`
+## Why browser-harness Replaces Puppeteer for P30
+| Aspect | Puppeteer | browser-harness |
+|--------|-----------|-----------------|
+| **Abstraction level** | High-level helpers (page.click, page.type) | Raw CDP (session.Input.dispatchMouseEvent) |
+| **LLM-native** | Designed for human scripting | Designed for LLMs to write CDP calls directly |
+| **Self-healing** | No — fix scripts manually | Yes — agent writes missing helpers mid-execution |
+| **Weight** | Heavy npm package + Chromium download | ~592 lines of Python or ~650 typed CDP wrappers in TS |
+| **Freedom** | Limited to pre-built helper API | Complete CDP freedom — all 56+ domains accessible |
+| **Version drift** | Puppeteer must update for new Chrome features | Auto-generated from CDP protocol JSON — always current |
+| **Deployment** | `npm install puppeteer` | `uv init && uv add browser-harness` (Python) or `npx skills add` (JS) |
+## Integration with ultimate-pi Harness (P30)
+```
+P25 Subagent Router → P30 Browser Subagent
+    ↓
+browser-harness (thin CDP harness)
+    ↓
+Chrome DevTools Protocol (one WebSocket)
+    ↓
+Chrome (headless or headed)
+    ↓
+Visual verification: screenshots via CDP Page.captureScreenshot
+    ↓
+Self-healing: agent writes missing interaction helpers in agent_helpers.py
+```
+### TypeScript Stack Preference
+For our TypeScript harness, **browser-harness-js** is the natural fit:
+- TypeScript-native (99.4% TS)
+- 652 typed CDP methods auto-generated from protocol JSON
+- Installed via `npx skills add` — no Python dependency
+- Bun REPL server for persistent sessions across agent turns
+For maximum capability (domain skills, mature agent-workspace), **browser-harness** (Python) provides more features. Hybrid approach: use browser-harness-js for the core CDP bridge, borrow the domain-skills pattern from browser-harness.
+### Config
+```json
+// .pi/harness/browser.json
+{
+  "engine": "browser-harness",
+  "variant": "browser-harness-js",
+  "mode": "headless",
+  "cdp_url": "http://localhost:9222",
+  "screenshot_dir": ".pi/harness/screenshots/",
+  "agent_workspace": ".pi/harness/browser-workspace/"
+}
+```
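
To make "the protocol IS the API" concrete, the sketch below builds the raw JSON messages that a thin harness would send over its single WebSocket. It is an illustration only: `cdp_message` and `decode_screenshot` are hypothetical helpers, not part of browser-harness, and the WebSocket transport itself is left out.

```python
import base64
import itertools
import json

_ids = itertools.count(1)  # CDP pairs each response with its command through this id


def cdp_message(method, **params):
    """Serialize one CDP command as {"id": n, "method": "Domain.method", "params": {...}}."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def decode_screenshot(response_text):
    """Extract PNG bytes from a Page.captureScreenshot response (base64 in result.data)."""
    response = json.loads(response_text)
    return base64.b64decode(response["result"]["data"])


navigate = cdp_message("Page.navigate", url="https://example.com")
screenshot = cdp_message("Page.captureScreenshot", format="png")

# A fabricated response, shaped like a real one, to exercise the decoder offline.
fake_response = json.dumps(
    {"id": 2, "result": {"data": base64.b64encode(b"not-a-real-png").decode("ascii")}}
)
image_bytes = decode_screenshot(fake_response)
```

Each string would be written to the browser's debugging WebSocket as-is, and the decoded bytes could be saved under the `screenshot_dir` from the config above.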

package/vault/wiki/sources/agent-drift-academic-paper.md ADDED

@@ -0,0 +1,79 @@
+---
+type: source
+status: ingested
+source_type: academic-paper
+title: "Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions"
+author: Abhishek Rath
+date_published: 2026-01-07
+url: https://arxiv.org/abs/2601.04170
+confidence: high
+key_claims:
+  - "Agent drift: progressive degradation in behavior, decision quality, and inter-agent coherence"
+  - "42% task success rate reduction, 3.2x human intervention increase in drifted systems"
+  - "ASI (Agent Stability Index): composite metric across 12 behavioral dimensions"
+  - "Three drift types: semantic, coordination, behavioral"
+  - "Combined mitigation strategies achieve 81.5% drift reduction"
+tags:
+  - source
+  - academic
+  - agent-drift
+  - multi-agent
+  - reliability
+related:
+  - "[[Research: Meta-Agent Context Drift Detection]]"
+  - "[[context-drift-in-agents]]"
+  - "[[agent-loop-detection-patterns]]"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Agent Drift: Academic Paper
+## Summary
+Foundational academic paper establishing agent drift as a measurable, quantifiable phenomenon in multi-agent LLM systems. Introduces the Agent Stability Index (ASI) — a composite metric across 12 dimensions in 4 categories. Demonstrates through simulation that unchecked drift causes 42% task success reduction and 3.2x human intervention increase.
+## What It Contributes
+Provides the academic foundation for agent drift as a real problem (not just anecdotal). The ASI framework gives a rigorous measurement methodology. The mitigation strategies (EMC, DAR, ABA) validate that drift can be controlled. Establishes that context window pollution is a primary mechanism — directly supporting the case for context pruning.
+## Three Drift Types
+1. **Semantic drift**: Agent outputs progressively deviate from original task intent while remaining syntactically valid
+2. **Coordination drift**: Multi-agent consensus mechanisms degrade, leading to conflicts, redundant work
+3. **Behavioral drift**: Agents develop novel strategies not present in initial interactions
+## Agent Stability Index (ASI)
+Composite metric across 12 dimensions in 4 categories:
+1. **Response Consistency** (weight: 0.30): Output semantic similarity, decision pathway stability, confidence calibration
+2. **Tool Usage Patterns** (weight: 0.25): Tool selection stability, tool sequencing consistency, parameterization drift
+3. **Inter-Agent Coordination** (weight: 0.25): Consensus agreement rate, handoff efficiency, role adherence
+4. **Behavioral Boundaries** (weight: 0.20): Output length stability, error pattern emergence, human intervention rate
+ASI is computed over rolling 50-interaction windows. Drift is detected when ASI < 0.75 for 3 consecutive windows.
+## Key Findings
+- Drift emerges after median 73 interactions (far earlier than expected)
+- Drift accelerates: 0.08 ASI decline per 50 interactions (early) → 0.19 per 50 (late)
+- Financial analysis agents drift fastest (53.2% by 500 interactions) due to task ambiguity
+- Two-level hierarchies (router + specialists) are most drift-resistant
+- External memory systems (vector DBs, structured logs) provide "behavioral anchors"
+## Three Mitigation Strategies
+1. **Episodic Memory Consolidation (EMC)**: Periodic compression of agent interaction histories → 51.9% drift reduction
+2. **Drift-Aware Routing (DAR)**: Router uses agent stability scores in delegation, resets drifting agents → 63.0% reduction
+3. **Adaptive Behavioral Anchoring (ABA)**: Few-shot prompt augmentation with baseline exemplars → 70.4% reduction
+Combining all three strategies yields an 81.5% drift reduction at 23% computational overhead.
+## Three Causal Mechanisms
+1. **Context window pollution**: Interaction histories fill with irrelevant information, diluting signal-to-noise
+2. **Distributional shift**: Agents encounter input distributions increasingly divergent from training data
+3. **Reinforcement through autoregression**: Small errors compound through feedback loops in shared memory
+## Relevance to Meta-Agent Concept
+This paper validates that context window pollution is a primary causal mechanism of agent drift. Context pruning directly addresses this mechanism. The ASI framework provides metrics for evaluating whether pruning is effective. The finding that drift emerges after ~73 interactions sets a natural checkpoint frequency for meta-agent monitoring.
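
The ASI detection rule summarized above (four weighted categories, 50-interaction windows, drift flagged after three consecutive windows below 0.75) can be sketched as follows. Several details are assumptions rather than the paper's method: category scores are taken to lie in [0, 1], windows are non-overlapping, and the dictionary keys are hypothetical names.

```python
from statistics import mean

# Category weights as listed in the note; the key names are illustrative.
WEIGHTS = {
    "response_consistency": 0.30,
    "tool_usage": 0.25,
    "coordination": 0.25,
    "behavioral_boundaries": 0.20,
}
WINDOW = 50        # interactions per window
THRESHOLD = 0.75   # ASI below this counts as a degraded window
CONSECUTIVE = 3    # degraded windows in a row that signal drift


def asi(category_scores):
    """Weighted composite of the four category scores for one interaction."""
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


def window_scores(interactions, window=WINDOW):
    """Mean ASI per window; a trailing partial window is ignored (assumption)."""
    full = len(interactions) - len(interactions) % window
    return [
        mean(asi(item) for item in interactions[start:start + window])
        for start in range(0, full, window)
    ]


def drift_detected(scores, threshold=THRESHOLD, consecutive=CONSECUTIVE):
    """True once `consecutive` windows in a row fall below `threshold`."""
    run = 0
    for score in scores:
        run = run + 1 if score < threshold else 0
        if run >= consecutive:
            return True
    return False


healthy = {name: 0.9 for name in WEIGHTS}
degraded = {name: 0.6 for name in WEIGHTS}
history = [healthy] * 50 + [degraded] * 150
flagged = drift_detected(window_scores(history))
```

A meta-agent could run this check at each window boundary and trigger pruning or a reset when `flagged` becomes true.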

package/vault/wiki/sources/aider-repomap-tree-sitter.md ADDED

@@ -0,0 +1,42 @@
+---
+type: source
+source_type: blog
+title: "Building a better repository map with tree-sitter"
+author: "Aider (Paul Gauthier)"
+date_published: 2023-10-22
+url: "https://aider.chat/2023/10/22/repomap.html"
+confidence: high
+key_claims:
+  - "Repo maps provide GPT with a concise view of the entire codebase: files + key symbols with signatures"
+  - "tree-sitter parses source into AST to extract definitions and cross-references"
+  - "Graph ranking algorithm selects most important portions that fit within token budget (default 1k tokens)"
+  - "GPT can use the map to autonomously decide which files to inspect further"
+  - "Sending whole files wastes context window; repo map is a compressed representation"
+  - "Most important identifiers are those most referenced by other portions of code"
+status: ingested
+tags:
+  - agent-context
+  - tree-sitter
+  - repo-map
+  - context-window
+created: 2023-10-22
+updated: 2026-04-30
+---
+# Building a better repository map with tree-sitter
+This post describes Aider's approach to the "code context" problem for LLMs. When an LLM needs to make changes in a large codebase, it must understand how the target code relates to the rest of the codebase. To provide that, Aider sends a concise repository map built via tree-sitter AST parsing.
+## Core Technique
+1. **tree-sitter parsing**: Extract all symbol definitions (classes, functions, methods, variables, types) from every source file
+2. **Reference tracking**: Identify where each symbol is used across the codebase
+3. **Graph ranking**: Build a dependency graph (files = nodes, dependencies = edges). Rank nodes by importance — most-referenced symbols are most important.
+4. **Token budget**: Select the top-ranked nodes that fit within a configurable token budget (default 1k tokens)
+5. **Dynamic adjustment**: Map expands when no files are in chat (need full context) and contracts when working on specific files
+## Why This Works for Agents
+- GPT sees call signatures and class structures across the entire repo
+- Can autonomously decide which files to request for deeper inspection
+- Compressed representation — doesn't waste context window on implementation details
+- Tree-sitter is language-aware, producing structured, accurate symbol extraction
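
The rank-then-fit step described above can be illustrated with a toy version. This sketch is a simplification and not Aider's algorithm: it ranks symbols by raw reference count instead of a graph ranking, estimates tokens by whitespace splitting, and takes already-extracted definitions as input rather than parsing with tree-sitter. `repo_map` and its parameters are hypothetical.

```python
from collections import Counter


def repo_map(definitions, references, token_budget=1000):
    """Pick the most-referenced definitions that fit within a token budget.

    definitions: {symbol: (file_path, signature)}
    references:  iterable of symbol names, one entry per use site
    """
    counts = Counter(symbol for symbol in references if symbol in definitions)
    # Most-referenced first; ties broken alphabetically so output is stable.
    ranked = sorted(definitions, key=lambda symbol: (-counts[symbol], symbol))

    selected, used = [], 0
    for symbol in ranked:
        path, signature = definitions[symbol]
        cost = len(signature.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; a smaller entry may still fit
        selected.append(f"{path}: {signature}")
        used += cost
    return selected


definitions = {
    "parse": ("a.py", "def parse(src: str) -> Tree"),
    "emit": ("b.py", "def emit(tree: Tree) -> str"),
    "main": ("c.py", "def main() -> None"),
}
references = ["parse", "parse", "emit", "parse"]
compact_map = repo_map(definitions, references, token_budget=10)
```

With a budget of 10 estimated tokens, only the two referenced functions fit; the unreferenced `main` is dropped, mirroring how the map keeps signatures the rest of the code depends on.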

package/vault/wiki/sources/anthropic-compaction-api.md ADDED

@@ -0,0 +1,58 @@
+---
+type: source
+source_type: official-docs
+title: "Anthropic Context Compaction API (Beta)"
+author: Anthropic
+date_published: 2026-01-12
+date_accessed: 2026-05-05
+url: "https://docs.anthropic.com/en/docs/build-with-claude/compaction"
+confidence: high
+tags:
+  - anthropic
+  - claude
+  - compaction
+  - api
+  - context-management
+key_claims:
+  - "Server-side automatic summarization when input tokens exceed threshold"
+  - "Beta, launched January 2026, header: compact-2026-01-12"
+  - "Supported models: Claude Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6"
+  - "Creates compaction block, drops all prior messages on next request"
+  - "ZDR (Zero Data Retention) eligible"
+  - "Context Folding available as first-class API primitive in context-management"
+---
+# Anthropic Context Compaction API
+## Summary
+Anthropic released a server-side context compaction API in beta (January 2026). When enabled, the API automatically detects when input tokens exceed a configurable threshold, generates a summary, creates a `compaction` block, and drops all prior messages on the next request.
+## How It Works
+1. Add `compact_20260112` to `context_management.edits` in Messages API request
+2. Include beta header `compact-2026-01-12`
+3. API detects when tokens exceed trigger threshold
+4. Generates summary → creates compaction block → continues response
+5. Subsequent requests automatically drop all pre-compaction messages
+## Supported Models
+- Claude Mythos Preview
+- Claude Opus 4.7
+- Claude Opus 4.6
+- Claude Sonnet 4.6
+## Ideal Use Cases
+- Long-running chat conversations
+- Tool-heavy agentic workflows
+- Multi-turn conversations exceeding context limits
+## Relevance to pi-vcc
+This is Anthropic's official take on compaction — LLM-based, server-side, automatic. It validates that compaction is a first-class concern. However, it has all the failure modes pi-vcc avoids: non-deterministic, no recall, LLM cost. Pi could theoretically use this API as a backend for its compaction, but pi-vcc's deterministic approach remains architecturally distinct.
+## Context Folding
+Context Folding (arXiv 2510.11967) is now available as a first-class API primitive in Anthropic's beta context-management. Agents can branch/return sub-trajectories, with intermediate steps "folded" away.

package/vault/wiki/sources/anthropic-effective-harnesses.md ADDED

@@ -0,0 +1,42 @@
+---
+type: source
+source_type: engineering-blog
+author: "Justin Young (Anthropic)"
+date_published: 2025
+url: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
+confidence: high
+tags:
+  - anthropic
+  - agent-harness
+  - long-running-agents
+  - context-windows
+key_claims:
+  - "A harness is the runtime framework that coordinates tool dispatch, context lifecycle, progress tracking, and clean handoffs between context windows"
+  - "Long-running agents need structured handoffs between context windows"
+  - "The harness must manage context as a finite resource across extended timeframes"
+---
+# Effective Harnesses for Long-Running Agents
+Anthropic Engineering Blog — 2025. By Justin Young.
+## Core Definition
+A harness is the runtime orchestration layer that wraps the core reasoning loop and coordinates:
+- Tool dispatch
+- Context lifecycle management
+- Safety enforcement
+- Session persistence
+- Progress tracking
+- Clean handoffs between context windows
+## Key Principles
+1. **Context windows are finite resources** — the harness must manage them explicitly across long timeframes
+2. **Structured handoffs** — when context fills, the harness must summarize and transfer state to a fresh window
+3. **Progress tracking** — agents must maintain awareness of what's been done across context boundaries
+4. **Safety invariants** — the harness enforces constraints that persist across context resets
+## Relevance
+This is the authoritative definition of "harness" as used in the agent engineering community. It maps directly to disler's Pi extensions (subagent-widget, agent-team, agent-chain) and OpenDev's four-layer architecture. Our harness implementation should treat context as a managed resource with explicit handoff protocols.
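
The key principles above (finite context, structured handoffs, progress tracking, persistent safety invariants) might translate into harness state roughly like this. Everything here is hypothetical: the class names, the 80% handoff ratio, and the reset to zero used tokens are illustrative choices, not taken from the Anthropic post or from ultimate-pi.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """State carried from a full context window into a fresh one."""
    summary: str
    completed: list
    invariants: list


@dataclass
class Harness:
    token_limit: int
    invariants: list
    handoff_ratio: float = 0.8  # assumption: hand off at 80% of the window
    used_tokens: int = 0
    completed: list = field(default_factory=list)

    def record(self, step, tokens):
        """Track progress and context usage for one finished step."""
        self.completed.append(step)
        self.used_tokens += tokens

    def needs_handoff(self):
        return self.used_tokens >= self.handoff_ratio * self.token_limit

    def hand_off(self, summarize):
        """Summarize progress, carry the invariants forward, and start a fresh window."""
        state = Handoff(
            summary=summarize(self.completed),
            completed=list(self.completed),
            invariants=list(self.invariants),
        )
        self.used_tokens = 0  # simplification: ignores the tokens the summary itself uses
        return state


harness = Harness(token_limit=1000, invariants=["never force-push"])
harness.record("read files", 500)
early = harness.needs_handoff()
harness.record("edit module", 400)
state = harness.hand_off("; ".join) if harness.needs_handoff() else None
```

Copying the invariants into every `Handoff` is the point of the design: constraints survive a context reset even when the summary drops detail.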