ultimate-pi 0.1.2 → 0.1.3
- package/.agents/skills/ck-search/SKILL.md +99 -0
- package/.agents/skills/defuddle/SKILL.md +90 -0
- package/.agents/skills/find-skills/SKILL.md +142 -0
- package/.agents/skills/firecrawl/SKILL.md +150 -0
- package/.agents/skills/firecrawl/rules/install.md +82 -0
- package/.agents/skills/firecrawl/rules/security.md +26 -0
- package/.agents/skills/firecrawl-agent/SKILL.md +57 -0
- package/.agents/skills/firecrawl-build-interact/SKILL.md +67 -0
- package/.agents/skills/firecrawl-build-onboarding/SKILL.md +102 -0
- package/.agents/skills/firecrawl-build-onboarding/references/auth-flow.md +39 -0
- package/.agents/skills/firecrawl-build-onboarding/references/project-setup.md +20 -0
- package/.agents/skills/firecrawl-build-onboarding/references/sdk-installation.md +17 -0
- package/.agents/skills/firecrawl-build-scrape/SKILL.md +68 -0
- package/.agents/skills/firecrawl-build-search/SKILL.md +68 -0
- package/.agents/skills/firecrawl-crawl/SKILL.md +58 -0
- package/.agents/skills/firecrawl-download/SKILL.md +69 -0
- package/.agents/skills/firecrawl-interact/SKILL.md +83 -0
- package/.agents/skills/firecrawl-map/SKILL.md +50 -0
- package/.agents/skills/firecrawl-parse/SKILL.md +61 -0
- package/.agents/skills/firecrawl-scrape/SKILL.md +68 -0
- package/.agents/skills/firecrawl-search/SKILL.md +59 -0
- package/.agents/skills/obsidian-bases/SKILL.md +299 -0
- package/.agents/skills/obsidian-markdown/SKILL.md +237 -0
- package/.agents/skills/posthog-analyst/SKILL.md +306 -0
- package/.agents/skills/posthog-analyst/evals/evals.json +23 -0
- package/.agents/skills/wiki/SKILL.md +215 -0
- package/.agents/skills/wiki/references/css-snippets.md +122 -0
- package/.agents/skills/wiki/references/frontmatter.md +107 -0
- package/.agents/skills/wiki/references/git-setup.md +58 -0
- package/.agents/skills/wiki/references/mcp-setup.md +149 -0
- package/.agents/skills/wiki/references/modes.md +259 -0
- package/.agents/skills/wiki/references/plugins.md +96 -0
- package/.agents/skills/wiki/references/rest-api.md +124 -0
- package/.agents/skills/wiki-autoresearch/SKILL.md +211 -0
- package/.agents/skills/wiki-autoresearch/references/program.md +75 -0
- package/.agents/skills/wiki-fold/SKILL.md +204 -0
- package/.agents/skills/wiki-fold/references/fold-template.md +133 -0
- package/.agents/skills/wiki-ingest/SKILL.md +288 -0
- package/.agents/skills/wiki-lint/SKILL.md +183 -0
- package/.agents/skills/wiki-query/SKILL.md +176 -0
- package/.agents/skills/wiki-save/SKILL.md +128 -0
- package/.ckignore +41 -0
- package/.env.example +9 -0
- package/.github/workflows/lint.yml +33 -0
- package/.github/workflows/publish-github-packages.yml +35 -0
- package/.github/workflows/publish-npm.yml +1 -1
- package/.pi/SYSTEM.md +107 -40
- package/.pi/agents/pi-pi/agent-expert.md +205 -0
- package/.pi/agents/pi-pi/cli-expert.md +47 -0
- package/.pi/agents/pi-pi/config-expert.md +67 -0
- package/.pi/agents/pi-pi/ext-expert.md +53 -0
- package/.pi/agents/pi-pi/keybinding-expert.md +123 -0
- package/.pi/agents/pi-pi/pi-orchestrator.md +103 -0
- package/.pi/agents/pi-pi/prompt-expert.md +83 -0
- package/.pi/agents/pi-pi/skill-expert.md +52 -0
- package/.pi/agents/pi-pi/theme-expert.md +46 -0
- package/.pi/agents/pi-pi/tui-expert.md +100 -0
- package/.pi/agents/rethink.md +140 -0
- package/.pi/agents/wiki-ingest.md +67 -0
- package/.pi/agents/wiki-lint.md +75 -0
- package/.pi/auto-commit.json +20 -0
- package/.pi/extensions/banner.png +0 -0
- package/.pi/extensions/ck-enforce.ts +216 -0
- package/.pi/extensions/custom-footer.ts +308 -0
- package/.pi/extensions/custom-header.ts +116 -0
- package/.pi/extensions/dotenv-loader.ts +170 -0
- package/.pi/internal/cursor-sdk-transcript-parser.ts +59 -0
- package/.pi/model-router.json +95 -0
- package/.pi/npm/.gitignore +2 -0
- package/.pi/prompts/git-sync.md +124 -0
- package/.pi/prompts/harness-setup.md +509 -0
- package/.pi/prompts/save.md +16 -0
- package/.pi/prompts/wiki-autoresearch.md +19 -0
- package/.pi/prompts/wiki.md +23 -0
- package/.pi/providers/cursor-sdk-provider.test.mjs +476 -0
- package/.pi/providers/cursor-sdk-provider.ts +1085 -0
- package/.pi/settings.json +14 -4
- package/.pi/skills/agent-router/SKILL.md +174 -0
- package/.pi/sounds/alert/1-kaching-track.mp3 +0 -0
- package/.pi/sounds/error/1-ksi-wth-track.mp3 +0 -0
- package/.pi/sounds/error/2-smash-track.mp3 +0 -0
- package/.pi/sounds/error/3-buzzer-track.mp3 +0 -0
- package/.pi/sounds/notification/1-soft-notification-track.mp3 +0 -0
- package/.pi/sounds/project-sounds.json +25 -0
- package/.pi/sounds/reminder/1-soft-notification-track.mp3 +0 -0
- package/.pi/sounds/success/1-tada-track.mp3 +0 -0
- package/.pi/sounds/success/2-jobs-done-track.mp3 +0 -0
- package/.pi/sounds/success/3-yay-track.mp3 +0 -0
- package/CONTRIBUTING.md +116 -0
- package/README.md +32 -39
- package/biome.json +34 -0
- package/firecrawl/.env.template +58 -0
- package/firecrawl/README.md +49 -0
- package/firecrawl/docker-compose.yaml +201 -0
- package/firecrawl/searxng/searxng.env +3 -0
- package/firecrawl/searxng/settings.yml +85 -0
- package/lefthook.yml +8 -0
- package/package.json +55 -24
- package/vault/AGENTS.md +37 -0
- package/vault/wiki/_templates/comparison.md +39 -0
- package/vault/wiki/_templates/concept.md +40 -0
- package/vault/wiki/_templates/decision.md +21 -0
- package/vault/wiki/_templates/entity.md +32 -0
- package/vault/wiki/_templates/flow.md +14 -0
- package/vault/wiki/_templates/module.md +18 -0
- package/vault/wiki/_templates/question.md +31 -0
- package/vault/wiki/_templates/source.md +39 -0
- package/vault/wiki/concepts/AST-Aware Code Chunking.md +44 -0
- package/vault/wiki/concepts/Build-Time Prompt Compilation.md +107 -0
- package/vault/wiki/concepts/Context Engine (AI Coding).md +47 -0
- package/vault/wiki/concepts/Context-Aware System Reminders.md +61 -0
- package/vault/wiki/concepts/Contextualized Text Embedding.md +42 -0
- package/vault/wiki/concepts/Contractor vs Employee AI Model.md +55 -0
- package/vault/wiki/concepts/Dual-Model Agent Architecture.md +65 -0
- package/vault/wiki/concepts/Late Chunking vs Early Chunking.md +43 -0
- package/vault/wiki/concepts/Majority Vote Ensembling.md +68 -0
- package/vault/wiki/concepts/Meta-Harness.md +16 -0
- package/vault/wiki/concepts/Multi-Agent AI Coding Architecture.md +75 -0
- package/vault/wiki/concepts/Prompt Enhancement.md +90 -0
- package/vault/wiki/concepts/Prompt Renderer.md +89 -0
- package/vault/wiki/concepts/Semantic Codebase Indexing.md +67 -0
- package/vault/wiki/concepts/additive-config-hierarchy.md +16 -0
- package/vault/wiki/concepts/agent-artifacts-verifiable-deliverables.md +71 -0
- package/vault/wiki/concepts/agent-browser-browser-automation.md +99 -0
- package/vault/wiki/concepts/agent-codebase-interface.md +43 -0
- package/vault/wiki/concepts/agent-harness-architecture.md +67 -0
- package/vault/wiki/concepts/agent-loop-detection-patterns.md +133 -0
- package/vault/wiki/concepts/agent-search-enforcement.md +126 -0
- package/vault/wiki/concepts/agent-skills-ecosystem.md +74 -0
- package/vault/wiki/concepts/agent-skills-pattern.md +68 -0
- package/vault/wiki/concepts/agentic-harness-context-enforcement.md +91 -0
- package/vault/wiki/concepts/agentic-harness.md +34 -0
- package/vault/wiki/concepts/agentic-orchestration-pipeline.md +56 -0
- package/vault/wiki/concepts/agentic-search-no-embeddings.md +18 -0
- package/vault/wiki/concepts/anthropic-context-engineering.md +13 -0
- package/vault/wiki/concepts/antigravity-agent-first-architecture.md +61 -0
- package/vault/wiki/concepts/ast-compression.md +19 -0
- package/vault/wiki/concepts/ast-truncation.md +66 -0
- package/vault/wiki/concepts/barrel-files.md +37 -0
- package/vault/wiki/concepts/browser-harness-agent.md +41 -0
- package/vault/wiki/concepts/browser-subagent-visual-verification.md +82 -0
- package/vault/wiki/concepts/codebase-intelligence-ecosystem-comparison.md +192 -0
- package/vault/wiki/concepts/codebase-intelligence-harness-integration.md +161 -0
- package/vault/wiki/concepts/codebase-to-context-ingestion.md +46 -0
- package/vault/wiki/concepts/codex-harness-innovations.md +147 -0
- package/vault/wiki/concepts/consensus-debate-flow.md +17 -0
- package/vault/wiki/concepts/consensus-debate.md +206 -0
- package/vault/wiki/concepts/content-addressed-spec-identity.md +166 -0
- package/vault/wiki/concepts/context-anxiety.md +57 -0
- package/vault/wiki/concepts/context-compression-techniques.md +19 -0
- package/vault/wiki/concepts/context-continuity.md +22 -0
- package/vault/wiki/concepts/context-drift-in-agents.md +106 -0
- package/vault/wiki/concepts/context-engineering.md +62 -0
- package/vault/wiki/concepts/context-folding.md +67 -0
- package/vault/wiki/concepts/context-mode.md +38 -0
- package/vault/wiki/concepts/cursor-harness-innovations.md +107 -0
- package/vault/wiki/concepts/deterministic-session-compaction.md +79 -0
- package/vault/wiki/concepts/drift-detection-unified.md +296 -0
- package/vault/wiki/concepts/execution-feedback-loop.md +46 -0
- package/vault/wiki/concepts/feedforward-feedback-harness.md +60 -0
- package/vault/wiki/concepts/five-root-cause-metrics-sentrux.md +40 -0
- package/vault/wiki/concepts/fork-safe-spec-storage.md +89 -0
- package/vault/wiki/concepts/fts5-sandbox.md +19 -0
- package/vault/wiki/concepts/fuzzy-edit-matching.md +71 -0
- package/vault/wiki/concepts/gemini-cli-architecture.md +104 -0
- package/vault/wiki/concepts/generator-evaluator-architecture.md +64 -0
- package/vault/wiki/concepts/guardian-agent-pattern.md +67 -0
- package/vault/wiki/concepts/harness-configuration-layers.md +89 -0
- package/vault/wiki/concepts/harness-control-frameworks.md +155 -0
- package/vault/wiki/concepts/harness-engineering-first-principles.md +90 -0
- package/vault/wiki/concepts/harness-h-formalism.md +53 -0
- package/vault/wiki/concepts/hybrid-code-search.md +61 -0
- package/vault/wiki/concepts/inline-post-edit-validation.md +112 -0
- package/vault/wiki/concepts/legendary-engineering-patterns-harness.md +110 -0
- package/vault/wiki/concepts/lifecycle-hooks.md +94 -0
- package/vault/wiki/concepts/mcp-tool-routing.md +102 -0
- package/vault/wiki/concepts/memory-system-of-record-vs-ephemeral-cache.md +47 -0
- package/vault/wiki/concepts/meta-agent-context-pruning.md +151 -0
- package/vault/wiki/concepts/model-adaptive-harness.md +122 -0
- package/vault/wiki/concepts/model-routing-agents.md +101 -0
- package/vault/wiki/concepts/monorepo-architecture.md +45 -0
- package/vault/wiki/concepts/multi-agent-specialization.md +61 -0
- package/vault/wiki/concepts/permission-subsystem.md +16 -0
- package/vault/wiki/concepts/pi-messenger-analysis.md +243 -0
- package/vault/wiki/concepts/pi-vscode-extension-landscape.md +37 -0
- package/vault/wiki/concepts/policy-engine-pattern.md +78 -0
- package/vault/wiki/concepts/progressive-disclosure-agents.md +53 -0
- package/vault/wiki/concepts/progressive-skill-disclosure.md +17 -0
- package/vault/wiki/concepts/provider-native-prompting.md +203 -0
- package/vault/wiki/concepts/quality-signal-sentrux.md +37 -0
- package/vault/wiki/concepts/repo-map-ranking.md +42 -0
- package/vault/wiki/concepts/result-monad-error-handling.md +47 -0
- package/vault/wiki/concepts/safety-defense-in-depth.md +83 -0
- package/vault/wiki/concepts/sandbox-os-enforcement.md +18 -0
- package/vault/wiki/concepts/selective-debate-routing.md +70 -0
- package/vault/wiki/concepts/self-evolving-harness.md +60 -0
- package/vault/wiki/concepts/sentrux-mcp-integration.md +36 -0
- package/vault/wiki/concepts/sentrux-rules-engine.md +49 -0
- package/vault/wiki/concepts/shell-pattern-compression.md +24 -0
- package/vault/wiki/concepts/skill-first-architecture.md +166 -0
- package/vault/wiki/concepts/structured-compaction.md +78 -0
- package/vault/wiki/concepts/subagent-orchestration.md +17 -0
- package/vault/wiki/concepts/subagent-worktree-isolation.md +68 -0
- package/vault/wiki/concepts/superpowers-methodology.md +78 -0
- package/vault/wiki/concepts/think-in-code.md +73 -0
- package/vault/wiki/concepts/ts-execution-layer.md +100 -0
- package/vault/wiki/concepts/typescript-strict-mode.md +37 -0
- package/vault/wiki/concepts/vcc-conversation-compaction-for-pi.md +51 -0
- package/vault/wiki/concepts/verification-drift-detection.md +19 -0
- package/vault/wiki/consensus/consensus-records.md +58 -0
- package/vault/wiki/decisions/2026-04-30-pi-lean-ctx-native.md +122 -0
- package/vault/wiki/decisions/adr-008.md +40 -0
- package/vault/wiki/decisions/adr-009.md +46 -0
- package/vault/wiki/decisions/adr-010.md +55 -0
- package/vault/wiki/decisions/adr-011.md +165 -0
- package/vault/wiki/decisions/adr-012.md +102 -0
- package/vault/wiki/decisions/adr-013.md +59 -0
- package/vault/wiki/decisions/adr-014.md +73 -0
- package/vault/wiki/decisions/adr-015.md +81 -0
- package/vault/wiki/decisions/adr-016.md +91 -0
- package/vault/wiki/decisions/adr-017.md +79 -0
- package/vault/wiki/decisions/adr-018.md +100 -0
- package/vault/wiki/decisions/adr-019.md +75 -0
- package/vault/wiki/decisions/adr-020.md +106 -0
- package/vault/wiki/decisions/adr-021.md +86 -0
- package/vault/wiki/decisions/adr-022.md +113 -0
- package/vault/wiki/decisions/adr-023.md +113 -0
- package/vault/wiki/decisions/adr-024.md +73 -0
- package/vault/wiki/decisions/adr-025.md +130 -0
- package/vault/wiki/decisions/adr-026.md +56 -0
- package/vault/wiki/decisions/colocate-wiki.md +34 -0
- package/vault/wiki/entities/Anders Hejlsberg.md +29 -0
- package/vault/wiki/entities/Anthropic.md +17 -0
- package/vault/wiki/entities/Augment Code.md +49 -0
- package/vault/wiki/entities/Bjarne Stroustrup.md +26 -0
- package/vault/wiki/entities/Bolt.new (StackBlitz).md +39 -0
- package/vault/wiki/entities/Boris Cherny.md +11 -0
- package/vault/wiki/entities/Claude Code.md +19 -0
- package/vault/wiki/entities/Dennis Ritchie.md +26 -0
- package/vault/wiki/entities/Emergent Labs.md +32 -0
- package/vault/wiki/entities/Google Cloud.md +16 -0
- package/vault/wiki/entities/Guido van Rossum.md +28 -0
- package/vault/wiki/entities/Ken Thompson.md +28 -0
- package/vault/wiki/entities/Lee et al.md +16 -0
- package/vault/wiki/entities/Linus Torvalds.md +28 -0
- package/vault/wiki/entities/Lovable (company).md +40 -0
- package/vault/wiki/entities/Martin Fowler.md +16 -0
- package/vault/wiki/entities/Meng et al.md +16 -0
- package/vault/wiki/entities/OpenAI.md +16 -0
- package/vault/wiki/entities/Rocket.new.md +38 -0
- package/vault/wiki/entities/VILA-Lab.md +15 -0
- package/vault/wiki/entities/autodev-codebase.md +18 -0
- package/vault/wiki/entities/ck-tool.md +59 -0
- package/vault/wiki/entities/codesearch.md +18 -0
- package/vault/wiki/entities/disler-indydevdan.md +33 -0
- package/vault/wiki/entities/gsd-get-shit-done.md +56 -0
- package/vault/wiki/entities/javascript-runtimes.md +48 -0
- package/vault/wiki/entities/jesse-vincent.md +38 -0
- package/vault/wiki/entities/lean-ctx.md +32 -0
- package/vault/wiki/entities/opendev.md +41 -0
- package/vault/wiki/entities/ops-codegraph-tool.md +18 -0
- package/vault/wiki/entities/pi-coding-agent.md +53 -0
- package/vault/wiki/entities/sentrux.md +54 -0
- package/vault/wiki/entities/vgrep-tool.md +57 -0
- package/vault/wiki/entities/vitest.md +41 -0
- package/vault/wiki/flows/harness-wiki-pipeline.md +204 -0
- package/vault/wiki/hot.md +932 -0
- package/vault/wiki/index.md +437 -0
- package/vault/wiki/log.md +418 -0
- package/vault/wiki/meta/dashboard.md +30 -0
- package/vault/wiki/meta/lint-report-2026-04-30.md +86 -0
- package/vault/wiki/meta/lint-report-2026-05-02.md +251 -0
- package/vault/wiki/meta/overview.canvas +43 -0
- package/vault/wiki/modules/adversarial-verification.md +57 -0
- package/vault/wiki/modules/automated-observability.md +54 -0
- package/vault/wiki/modules/bench.md +20 -0
- package/vault/wiki/modules/extensions.md +23 -0
- package/vault/wiki/modules/grounding-checkpoints.md +62 -0
- package/vault/wiki/modules/harness-implementation-plan.md +345 -0
- package/vault/wiki/modules/harness-wiki-skill-mapping.md +135 -0
- package/vault/wiki/modules/harness.md +86 -0
- package/vault/wiki/modules/persistent-memory.md +85 -0
- package/vault/wiki/modules/schema-orchestration.md +68 -0
- package/vault/wiki/modules/skills.md +27 -0
- package/vault/wiki/modules/spec-hardening.md +58 -0
- package/vault/wiki/modules/structured-planning.md +53 -0
- package/vault/wiki/modules/think-in-code-enforcement.md +153 -0
- package/vault/wiki/modules/wiki-query-interface.md +64 -0
- package/vault/wiki/overview.md +51 -0
- package/vault/wiki/questions/Research-pi-vs-claude-code-agentic-orchestration-pipeline.md +87 -0
- package/vault/wiki/questions/Research-sentrux-dev.md +123 -0
- package/vault/wiki/questions/Research-superpowers-skill-for-agentic-coding-agents.md +164 -0
- package/vault/wiki/questions/Research: Augment Code Context Engine.md +244 -0
- package/vault/wiki/questions/Research: Automating Software Engineering - Lovable, Bolt, Emergent, Rocket.md +112 -0
- package/vault/wiki/questions/Research: Claude Code State-of-the-Art Harness Improvements.md +209 -0
- package/vault/wiki/questions/Research: Codex State-of-the-Art Harness Improvements.md +99 -0
- package/vault/wiki/questions/Research: Engineering Workflows of Legendary Programmers and AI Harness Mapping.md +107 -0
- package/vault/wiki/questions/Research: Fallow Codebase Intelligence Harness Integration.md +72 -0
- package/vault/wiki/questions/Research: Gemini CLI SOTA Harness Integration.md +166 -0
- package/vault/wiki/questions/Research: GitHub Issues as Harness Spec Storage.md +188 -0
- package/vault/wiki/questions/Research: Google Antigravity Harness Integration.md +120 -0
- package/vault/wiki/questions/Research: Meta-Agent Context Drift Detection.md +236 -0
- package/vault/wiki/questions/Research: Model-Adaptive Agent Harness Design.md +95 -0
- package/vault/wiki/questions/Research: Model-Specific Prompting Guides.md +165 -0
- package/vault/wiki/questions/Research: Prompt Renderer for Multi-Model Agent Harness.md +216 -0
- package/vault/wiki/questions/Research: Skill-First Harness Architecture.md +91 -0
- package/vault/wiki/questions/Research: TypeScript Best Practices and Codebase Structure.md +88 -0
- package/vault/wiki/questions/Research: TypeScript Execution Layer for Agent Tool Calling.md +81 -0
- package/vault/wiki/questions/Research: claude-mem over Obsidian for Harness Layer.md +71 -0
- package/vault/wiki/questions/Research: claude-mem over obsidian wiki as the knowledge base for our agentic harness pipeline. think from first principles. does this replace or complement our current setup? no hard feelings about previous decisions. gimme accurate points.md +80 -0
- package/vault/wiki/questions/Research: context-mode vs lean-ctx.md +72 -0
- package/vault/wiki/questions/Research: cursor.sh Harness Innovations.md +92 -0
- package/vault/wiki/questions/Research: executor.sh Harness Integration.md +170 -0
- package/vault/wiki/questions/Research: how GSD fits into our coding harness setup.md +97 -0
- package/vault/wiki/questions/Research: how claude-mem fits into our workflow. and whether it should replace obsidian in the codebase. no hard feelings about previous actions, rethink from first principles always.md +80 -0
- package/vault/wiki/questions/Research: pi-vcc.md +113 -0
- package/vault/wiki/questions/Research: semantic code search tools.md +69 -0
- package/vault/wiki/questions/Research: vcc extension for pi coding agent.md +73 -0
- package/vault/wiki/questions/how-to-enable-semantic-code-search-now.md +111 -0
- package/vault/wiki/questions/mvp-implementation-blueprint.md +552 -0
- package/vault/wiki/questions/research-agent-first-codebase-exploration.md +199 -0
- package/vault/wiki/questions/research-agentic-coding-harness-latest-papers.md +142 -0
- package/vault/wiki/questions/research-gitingest-gitreverse-integration.md +100 -0
- package/vault/wiki/questions/research-wozcode-token-reduction.md +67 -0
- package/vault/wiki/questions/resolved-context-pruning-inplace-vs-restart.md +95 -0
- package/vault/wiki/questions/resolved-context-window-economics.md +167 -0
- package/vault/wiki/questions/resolved-imad-debate-gating-transfer.md +126 -0
- package/vault/wiki/questions/resolved-mcp-tool-preference.md +112 -0
- package/vault/wiki/questions/resolved-small-model-meta-agents.md +107 -0
- package/vault/wiki/questions/resolved-treesitter-dynamic-languages.md +95 -0
- package/vault/wiki/sources/Auggie Context MCP Server.md +63 -0
- package/vault/wiki/sources/Augment Code Codacy AI Giants.md +61 -0
- package/vault/wiki/sources/Augment Code MCP SiliconAngle.md +49 -0
- package/vault/wiki/sources/Augment Code WorkOS ERC 2025.md +55 -0
- package/vault/wiki/sources/Augment Context Engine Official.md +71 -0
- package/vault/wiki/sources/Augment SWE-bench Agent GitHub.md +74 -0
- package/vault/wiki/sources/Augment SWE-bench Pro Blog.md +58 -0
- package/vault/wiki/sources/Source: AgentBus Jinja2 Prompt Pipelines.md +75 -0
- package/vault/wiki/sources/Source: Arxiv — Don't Break the Cache.md +85 -0
- package/vault/wiki/sources/Source: Augment - Harness Engineering for AI Coding Agents.md +58 -0
- package/vault/wiki/sources/Source: Blake Crosley Agent Architecture Guide.md +100 -0
- package/vault/wiki/sources/Source: Bolt.new Architecture & Case Study.md +75 -0
- package/vault/wiki/sources/Source: Build-Time Prompt Compilation Architecture.md +107 -0
- package/vault/wiki/sources/Source: Claude API Agent Skills Overview.md +70 -0
- package/vault/wiki/sources/Source: Gemini CLI Changelogs.md +88 -0
- package/vault/wiki/sources/Source: Google Blog - Gemini CLI Announcement.md +57 -0
- package/vault/wiki/sources/Source: Google Gemini CLI Architecture Docs.md +53 -0
- package/vault/wiki/sources/Source: LangChain - Anatomy of Agent Harness.md +65 -0
- package/vault/wiki/sources/Source: Lovable Architecture & Clone Analysis.md +83 -0
- package/vault/wiki/sources/Source: Martin Fowler - Harness Engineering.md +70 -0
- package/vault/wiki/sources/Source: OpenAI Harness Engineering Five Principles.md +58 -0
- package/vault/wiki/sources/Source: OpenAI Harness Engineering — 0 Lines of Human Code.md +101 -0
- package/vault/wiki/sources/Source: OpenDev — Building AI Coding Agents for the Terminal.md +100 -0
- package/vault/wiki/sources/Source: Render AI Coding Agents Benchmark 2025.md +53 -0
- package/vault/wiki/sources/Source: Rocket.new — Vibe Solutioning Platform.md +70 -0
- package/vault/wiki/sources/Source: SwirlAI Agent Skills Progressive Disclosure.md +71 -0
- package/vault/wiki/sources/Source: TianPan Prompt Caching Architecture.md +89 -0
- package/vault/wiki/sources/Source: Vercel Labs agent-browser.md +155 -0
- package/vault/wiki/sources/Source: browser-harness CDP Harness.md +126 -0
- package/vault/wiki/sources/agent-drift-academic-paper.md +79 -0
- package/vault/wiki/sources/aider-repomap-tree-sitter.md +42 -0
- package/vault/wiki/sources/anthropic-compaction-api.md +58 -0
- package/vault/wiki/sources/anthropic-effective-harnesses.md +42 -0
- package/vault/wiki/sources/anthropic-prompt-best-practices.md +100 -0
- package/vault/wiki/sources/anthropic2026-harness-design.md +63 -0
- package/vault/wiki/sources/barrel-files-tkdodo.md +38 -0
- package/vault/wiki/sources/birth-of-unix-kernighan-interview.md +57 -0
- package/vault/wiki/sources/bockeler2026-harness-engineering.md +69 -0
- package/vault/wiki/sources/cast-code-chunking-paper.md +50 -0
- package/vault/wiki/sources/ck-semantic-search.md +78 -0
- package/vault/wiki/sources/claude-code-architecture-karaxai-2026.md +71 -0
- package/vault/wiki/sources/claude-code-architecture-qubytes-2026.md +50 -0
- package/vault/wiki/sources/claude-code-architecture-vila-lab-2026.md +64 -0
- package/vault/wiki/sources/claude-code-security-architecture-penligent-2026.md +70 -0
- package/vault/wiki/sources/claude-context-editing-docs.md +13 -0
- package/vault/wiki/sources/cloudflare-codemode.md +63 -0
- package/vault/wiki/sources/code-chunk-library-supermemory.md +63 -0
- package/vault/wiki/sources/codeact-apple-2024.md +62 -0
- package/vault/wiki/sources/codex-dsc-rfc-8573.md +41 -0
- package/vault/wiki/sources/codex-open-source-agent-2026.md +110 -0
- package/vault/wiki/sources/coir-code-retrieval-benchmark.md +51 -0
- package/vault/wiki/sources/colinmcnamara-context-optimization-codemode.md +48 -0
- package/vault/wiki/sources/context-folding-paper.md +61 -0
- package/vault/wiki/sources/context-mode-website.md +63 -0
- package/vault/wiki/sources/cursor-agent-best-practices-2026.md +62 -0
- package/vault/wiki/sources/cursor-fork-29b-2025.md +50 -0
- package/vault/wiki/sources/cursor-harness-april-2026.md +76 -0
- package/vault/wiki/sources/cursor-instant-apply-2024.md +45 -0
- package/vault/wiki/sources/cursor-shadow-workspace-2024.md +52 -0
- package/vault/wiki/sources/cursor-shipped-coding-agent-2026.md +53 -0
- package/vault/wiki/sources/cursor-vs-antigravity-2026.md +51 -0
- package/vault/wiki/sources/disler-pi-vs-claude-code.md +69 -0
- package/vault/wiki/sources/distill-deterministic-context-compression.md +53 -0
- package/vault/wiki/sources/embedding-models-benchmark-supermemory-2025.md +48 -0
- package/vault/wiki/sources/executor-rhyssullivan.md +122 -0
- package/vault/wiki/sources/fallow-rs-codebase-intelligence.md +125 -0
- package/vault/wiki/sources/fan2025-imad.md +60 -0
- package/vault/wiki/sources/forgecode-gpt5-agent-improvements.md +63 -0
- package/vault/wiki/sources/gemini-3-prompting-guide.md +78 -0
- package/vault/wiki/sources/gh-cli-sub-issue-rfc.md +50 -0
- package/vault/wiki/sources/gh-sub-issue-extension.md +72 -0
- package/vault/wiki/sources/github-fork-issues-discussion.md +44 -0
- package/vault/wiki/sources/github-issue-dependencies-docs.md +49 -0
- package/vault/wiki/sources/github-sub-issues-docs.md +51 -0
- package/vault/wiki/sources/gitingest.md +91 -0
- package/vault/wiki/sources/gitreverse.md +63 -0
- package/vault/wiki/sources/google-antigravity-official-blog.md +47 -0
- package/vault/wiki/sources/google-antigravity-wikipedia.md +53 -0
- package/vault/wiki/sources/gsd-codecentric-deep-dive.md +57 -0
- package/vault/wiki/sources/gsd-github-repo.md +51 -0
- package/vault/wiki/sources/gsd-hn-discussion.md +59 -0
- package/vault/wiki/sources/guido-python-design-philosophy.md +56 -0
- package/vault/wiki/sources/hejlsberg-7-learnings.md +48 -0
- package/vault/wiki/sources/ironclaw-drift-monitor.md +80 -0
- package/vault/wiki/sources/langsight-loop-detection.md +80 -0
- package/vault/wiki/sources/leanctx-website.md +69 -0
- package/vault/wiki/sources/lee2026-meta-harness.md +59 -0
- package/vault/wiki/sources/linux-kernel-coding-workflow.md +50 -0
- package/vault/wiki/sources/lou2026-autoharness.md +53 -0
- package/vault/wiki/sources/martin-fowler-harness-engineering.md +73 -0
- package/vault/wiki/sources/mcp-architecture-docs.md +13 -0
- package/vault/wiki/sources/meng2026-agent-harness-survey.md +79 -0
- package/vault/wiki/sources/mindstudio-four-agent-types.md +68 -0
- package/vault/wiki/sources/ms-chat-history-management.md +13 -0
- package/vault/wiki/sources/openai-prompt-guidance.md +104 -0
- package/vault/wiki/sources/openclaw-session-pruning.md +13 -0
- package/vault/wiki/sources/opencode-dcp.md +13 -0
- package/vault/wiki/sources/opendev-arxiv-2603.05344v1.md +79 -0
- package/vault/wiki/sources/openhands-platform.md +39 -0
- package/vault/wiki/sources/oss-guide-codebase-exploration.md +53 -0
- package/vault/wiki/sources/pi-compaction-extensions-ecosystem.md +102 -0
- package/vault/wiki/sources/pi-context-prune-github-repo.md +38 -0
- package/vault/wiki/sources/pi-mono-compaction-docs.md +38 -0
- package/vault/wiki/sources/pi-omni-compact-github-repo.md +50 -0
- package/vault/wiki/sources/pi-rtk-optimizer-github-repo.md +45 -0
- package/vault/wiki/sources/pi-vcc-github-repo.md +69 -0
- package/vault/wiki/sources/pi-vscode-marketplace.md +41 -0
- package/vault/wiki/sources/pi-vscode-model-provider-marketplace.md +39 -0
- package/vault/wiki/sources/py-tree-sitter.md +13 -0
- package/vault/wiki/sources/sentrux-dev-landing.md +40 -0
- package/vault/wiki/sources/sentrux-docs-pro-architecture.md +75 -0
- package/vault/wiki/sources/sentrux-docs-quality-signal.md +46 -0
- package/vault/wiki/sources/sentrux-docs-root-cause-metrics.md +57 -0
- package/vault/wiki/sources/sentrux-docs-rules-engine.md +58 -0
- package/vault/wiki/sources/sentrux-github-repo.md +56 -0
- package/vault/wiki/sources/superpowers-github-repo.md +56 -0
- package/vault/wiki/sources/superpowers-release-blog.md +54 -0
- package/vault/wiki/sources/superpowers-termdock-analysis.md +45 -0
- package/vault/wiki/sources/swe-agent-aci.md +42 -0
- package/vault/wiki/sources/swe-bench.md +45 -0
- package/vault/wiki/sources/swe-pruner-context-pruning.md +13 -0
- package/vault/wiki/sources/think-in-code-blog.md +48 -0
- package/vault/wiki/sources/tree-sitter-docs.md +13 -0
- package/vault/wiki/sources/ts-best-practices-2025-devto.md +42 -0
- package/vault/wiki/sources/ts-folder-structure-mingyang.md +58 -0
- package/vault/wiki/sources/ts-monorepo-koerselman.md +44 -0
- package/vault/wiki/sources/ts-result-error-handling-kkalamarski.md +52 -0
- package/vault/wiki/sources/ts-runtimes-comparison-betterstack.md +42 -0
- package/vault/wiki/sources/ts-strict-mode-rishikc.md +43 -0
- package/vault/wiki/sources/unix-philosophy.md +48 -0
- package/vault/wiki/sources/vectara-chunking-vs-embedding-naacl2025.md +39 -0
- package/vault/wiki/sources/vectara-guardian-agents.md +79 -0
- package/vault/wiki/sources/vgrep-semantic-search.md +76 -0
- package/vault/wiki/sources/vitest-official.md +41 -0
- package/vault/wiki/sources/vscode-pi-community-extension.md +40 -0
- package/vault/wiki/sources/wozcode.md +79 -0
- package/.agents/skills/compress/SKILL.md +0 -111
- package/.agents/skills/compress/scripts/__init__.py +0 -9
- package/.agents/skills/compress/scripts/__main__.py +0 -3
- package/.agents/skills/compress/scripts/benchmark.py +0 -78
- package/.agents/skills/compress/scripts/cli.py +0 -73
- package/.agents/skills/compress/scripts/compress.py +0 -227
- package/.agents/skills/compress/scripts/detect.py +0 -121
- package/.agents/skills/compress/scripts/validate.py +0 -189
- package/.agents/skills/emil-design-eng/SKILL.md +0 -679
- package/.agents/skills/lean-ctx/SKILL.md +0 -149
- package/.agents/skills/lean-ctx/scripts/install.sh +0 -95
- package/.agents/skills/scrapling-official/LICENSE.txt +0 -28
- package/.agents/skills/scrapling-official/SKILL.md +0 -390
- package/.agents/skills/scrapling-official/examples/01_fetcher_session.py +0 -26
- package/.agents/skills/scrapling-official/examples/02_dynamic_session.py +0 -26
- package/.agents/skills/scrapling-official/examples/03_stealthy_session.py +0 -26
- package/.agents/skills/scrapling-official/examples/04_spider.py +0 -58
- package/.agents/skills/scrapling-official/examples/README.md +0 -45
- package/.agents/skills/scrapling-official/references/fetching/choosing.md +0 -78
- package/.agents/skills/scrapling-official/references/fetching/dynamic.md +0 -352
- package/.agents/skills/scrapling-official/references/fetching/static.md +0 -432
- package/.agents/skills/scrapling-official/references/fetching/stealthy.md +0 -255
- package/.agents/skills/scrapling-official/references/mcp-server.md +0 -214
- package/.agents/skills/scrapling-official/references/migrating_from_beautifulsoup.md +0 -86
- package/.agents/skills/scrapling-official/references/parsing/adaptive.md +0 -212
- package/.agents/skills/scrapling-official/references/parsing/main_classes.md +0 -586
- package/.agents/skills/scrapling-official/references/parsing/selection.md +0 -494
- package/.agents/skills/scrapling-official/references/spiders/advanced.md +0 -344
- package/.agents/skills/scrapling-official/references/spiders/architecture.md +0 -94
- package/.agents/skills/scrapling-official/references/spiders/getting-started.md +0 -164
- package/.agents/skills/scrapling-official/references/spiders/proxy-blocking.md +0 -235
- package/.agents/skills/scrapling-official/references/spiders/requests-responses.md +0 -196
- package/.agents/skills/scrapling-official/references/spiders/sessions.md +0 -205
- package/PLAN.md +0 -11
- package/extensions/lean-ctx-enforce.ts +0 -166
- package/skills-lock.json +0 -35
- package/wiki/README.md +0 -19
- package/wiki/decisions/0001-establish-project-wiki-and-decision-record-format.md +0 -25
- package/wiki/decisions/0002-add-project-banner-to-readme.md +0 -26
- package/wiki/decisions/0003-remove-redundant-readme-title-heading.md +0 -26
- package/wiki/decisions/0004-publish-package-to-npm-as-ultimate-pi.md +0 -26
- package/wiki/decisions/0005-automate-npm-publish-with-github-actions.md +0 -27
- package/wiki/decisions/0006-switch-to-npm-trusted-publishing.md +0 -26
- package/wiki/decisions/0007-use-absolute-banner-url-for-npm-readme-rendering.md +0 -26
- package/wiki/decisions/0008-rename-banner-asset-for-cache-busting.md +0 -26
- package/wiki/decisions/0009-force-oidc-path-by-clearing-node-auth-token-in-publish-step.md +0 -25
- package/wiki/decisions/0010-simplify-setup-node-for-npm-trusted-publishing.md +0 -26
- package/wiki/decisions/0011-add-noop-workflow-change-to-force-fresh-publish-run.md +0 -25
- package/wiki/decisions/0012-align-workflow-runtime-with-npm-trusted-publishing-requirements.md +0 -26
- package/wiki/decisions/0013-add-package-repository-url-for-provenance-validation.md +0 -25
@@ -0,0 +1,62 @@
---
type: source
status: ingested
source_type: engineering-blog
title: "Best Practices for Coding with Agents"
author: "Lee Robinson (Cursor/Anysphere)"
date_published: 2026-01-09
url: "https://cursor.com/blog/agent-best-practices"
confidence: high
tags: [cursor, agent-best-practices, plan-mode, hooks, skills, context-management]
key_claims:
  - "Agent harness = Instructions + Tools + Model, tuned per model family"
  - "Plan Mode: research codebase → clarify → plan → approve → build"
  - "Context management: let agent find context dynamically; don't pre-load everything"
  - "Rules (.cursor/rules/): static always-on context. Skills (SKILL.md): dynamic on-demand capabilities"
  - "Long-running agent hooks: stop hooks that re-invoke agent until goal achieved"
  - "Git worktree isolation for parallel agents"
  - "Multi-model parallel execution with judging"
  - "Context anxiety: models start refusing work as context fills up"
created: 2026-05-02
updated: 2026-05-02
---

# Best Practices for Coding with Agents

Cursor's official guide (Lee Robinson, Jan 2026) covering agent harness design, Plan Mode, context management strategies, the Rules/Skills system, long-running agent hooks, parallel agents via git worktrees, and workflow patterns.

## Harness Components

1. **Instructions**: System prompt + rules guiding agent behavior
2. **Tools**: File editing, codebase search, terminal execution
3. **Model**: The agent model for the task

Cursor tunes instructions and tools specifically for every frontier model based on internal evals and external benchmarks.

## Plan Mode

`Shift+Tab` toggles Plan Mode. The agent:
1. Researches the codebase for relevant files
2. Asks clarifying questions
3. Creates a detailed implementation plan with file paths
4. Waits for approval before building

Plans open as editable Markdown. Save them to `.cursor/plans/` for documentation and future agent context.

## Context Management

- Let the agent find context via grep + semantic search — don't pre-tag every file
- Start a new conversation per task; continue for iterations on the same feature
- `@Past Chats` to reference previous work selectively
- Long conversations cause context noise → the agent loses focus

## Long-Running Agent Hooks

Stop hooks in `.cursor/hooks.json` re-invoke the agent via `followup_message` until a DONE condition is met (checked against a scratchpad), with a max-iteration guard. Pattern: run tests, fix until they pass.
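A minimal sketch of what such a stop hook could look like. The `stop` event name and the `command`/`followup_message`/`max_iterations` fields are assumptions reconstructed from the description above, not Cursor's documented schema:

```json
{
  "hooks": {
    "stop": [
      {
        "command": "test -f scratchpad.md && grep -q DONE scratchpad.md",
        "followup_message": "Tests are not passing yet. Keep fixing, then write DONE to scratchpad.md.",
        "max_iterations": 10
      }
    ]
  }
}
```

While the command exits non-zero, the follow-up message re-invokes the agent; the iteration cap bounds the loop.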

## Parallel Agents

Git worktrees provide isolated workspaces per agent. Multiple models can run the same prompt simultaneously; Cursor judges which solution is best, and applying merges the chosen result back.

## Relevance to Harness

Directly validates our: L2 Structured Planning (Plan Mode), SKILL.md system (Skills), context drift concerns (context anxiety), and model-adaptive harness design. Long-running agent hooks are an elegant alternative to our drift monitor's stop-only approach — we need both.
@@ -0,0 +1,50 @@
---
type: source
status: ingested
source_type: analysis
title: "Cursor: How Forking VS Code Built a $29B Company"
author: "MMNTM Research"
date_published: 2025-12-15
url: "https://www.mmntm.net/articles/cursor-deep-dive"
confidence: medium
tags: [cursor, vs-code-fork, vertical-agents, architecture, business]
key_claims:
  - "Forking VS Code = root access to developer workflow. Plugins cannot replicate this"
  - "Extension API constraints: limited UI, process isolation, context blindness"
  - "Shadow Workspace, native diffs, terminal interception, cursor teleportation all require fork"
  - "Model agnosticism as competitive moat vs Copilot's OpenAI lock-in"
  - "Vertical agent thesis: interface and intelligence cannot be decoupled"
  - "The fork tax: constant upstream VS Code merges required"
created: 2026-05-02
updated: 2026-05-02
---

# Cursor: How Forking VS Code Built a $29B Company

MMNTM Research analysis (Dec 2025) of Cursor's architectural strategy and business model.

## The Extension Trap

VS Code Extension API constraints:
- Limited UI control (no inline diff rendering)
- Process isolation (Extension Host separate from Renderer/Main)
- Context blindness (can't cheaply access full editor state)

Copilot operates within these constraints. Cursor bypasses them by forking VS Code entirely.

## The Fork = Root Access

Forking under the MIT license gave Anysphere access to the editor's C++ and TypeScript internals, enabling:
- Shadow Workspace (hidden parallel editor instances)
- Native diff rendering (inline color-coded overlays)
- Terminal interception (read output, inject commands)
- Tab teleportation (predict and animate cursor position)

The tax: monthly VS Code upstream merges, with a dedicated team for "keeping the lights on."

## Vertical Agent Thesis

"The interface and the intelligence cannot be decoupled." Winners aren't building the best models — they're building the best environments for models. Harvey (legal), Abridge (clinical), Cursor (coding). The pattern repeats across domains.

## Relevance to Harness

Meta-lesson: architectural control matters more than model access. Our .pi/ harness architecture is our "fork" — we intercept tool calls, enforce pipeline stages, and control the agent's environment. The question is whether we have enough control points to match what Cursor achieves with editor-level access. We do: tool interception hooks give us equivalent leverage in a CLI/agent context.
@@ -0,0 +1,76 @@
---
type: source
status: ingested
source_type: engineering-blog
title: "Continually Improving Our Agent Harness"
author: "Stefan Heule & Jediah Katz (Cursor/Anysphere)"
date_published: 2026-04-30
url: "https://cursor.com/blog/continually-improving-agent-harness"
confidence: high
tags: [cursor, agent-harness, model-adaptive, context-window, error-classification, keep-rate]
key_claims:
  - "Moved from static guardrails + pre-loaded context to dynamic context discovery"
  - "Keep Rate metric: fraction of agent code still in codebase after time intervals"
  - "LLM-as-judge for user satisfaction from response semantics"
  - "Per-tool per-model error baselines with anomaly detection alerts"
  - "Weekly automated Cloud Agent for bug triage from log analysis"
  - "Model-specific tool provisioning: patch format for OpenAI, string replace for Anthropic"
  - "Mid-chat model switching with conversation summarization"
  - "Context anxiety: one model started refusing work as context window filled"
  - "Subagent pattern: fresh context window per specialized task"
  - "Future: multi-agent orchestration where system dispatches to specialized subagents"
created: 2026-05-02
updated: 2026-05-02
---

# Continually Improving Our Agent Harness

Cursor's April 30, 2026 engineering blog detailing their harness evolution philosophy, measurement systems, error classification, and model-adaptive customization. The most directly relevant source for our harness plan.

## Dynamic Context Evolution

Early Cursor (2024): static context pre-loaded (folder layout, semantic snippets, compressed files) + guardrails (lint surfacing, read rewriting, tool call limits).

Current Cursor (2026): guardrails removed as models improved. Dynamic context is fetched by the agent on demand, with more ways for the agent to pull context and interact with the world.

## Measurement: Keep Rate + LLM-as-Judge

**Keep Rate**: For agent-proposed code changes, track what fraction remains in the codebase after fixed time intervals (1hr, 1day, 1week). High keep rate = the agent did good work.
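The metric can be sketched as a line-survival check. Matching at line granularity is our assumption; the post does not specify how "still in the codebase" is measured:

```python
def keep_rate(agent_lines, current_lines):
    """Fraction of agent-written lines still present in the file at
    measurement time (e.g. 1hr/1day/1week after the change)."""
    if not agent_lines:
        return 1.0  # nothing to keep
    current = set(current_lines)
    kept = sum(1 for line in agent_lines if line in current)
    return kept / len(agent_lines)
```

In practice this would run over version-control snapshots at each interval and aggregate per agent session.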

**LLM-as-Judge**: A language model reads the user's responses to agent output to determine satisfaction semantically. Moving on to the next feature = good. Pasting a stack trace = bad.

A/B testing of harness variants on real usage. One experiment: a more expensive model for context summarization made negligible difference.

## Error Classification System

Every tool-call error is classified:

| Error Type | Meaning |
|---|---|
| `InvalidArguments` | Model mistake in tool call |
| `UnexpectedEnvironment` | Contradictions in context window |
| `ProviderError` | Vendor outages |
| `UserAborted` | User cancelled |
| `Timeout` | Tool call timed out |
| Unknown | Always a bug |

Alerts fire when unknown errors cross a threshold. Anomaly detection compares expected-error rates against per-tool, per-model baselines.
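The classify-then-alert flow can be sketched as follows; the threshold and baseline values are placeholders, not Cursor's numbers:

```python
from collections import Counter

KNOWN = {"InvalidArguments", "UnexpectedEnvironment", "ProviderError",
         "UserAborted", "Timeout"}

def classify(error_type: str) -> str:
    # Anything outside the known taxonomy is a harness bug by definition.
    return error_type if error_type in KNOWN else "Unknown"

def should_alert(counts: Counter, baselines: dict, unknown_threshold: int = 5) -> bool:
    if counts["Unknown"] >= unknown_threshold:
        return True  # unknown errors are always bugs
    # Expected errors alert only when they spike past the per-tool/per-model baseline.
    return any(counts[t] > baselines.get(t, float("inf")) for t in KNOWN)
```

A real system would key `counts` and `baselines` by (tool, model) pairs rather than globally.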

Weekly Cloud Agent automation: searches logs, surfaces new/spiked issues, creates/updates tickets with investigation.

## Model-Adaptive Customization

- OpenAI models: patch-based edit format
- Anthropic models: string replacement format
- Custom prompting per provider AND per model version
- Mid-chat model switching: auto-switch harness, summarize conversation, warn about tool set differences
- Subagent pattern: fresh context window per specialized task (planning, editing, debugging)

## Context Anxiety

One model developed "context anxiety": as its context window filled, it started refusing work, hedging that tasks seemed too big. Mitigated through prompt adjustments. Independent validation of our P27 Context Anxiety Guard concept.

## Relevance to Harness

**Directly validates**: model-adaptive harness design, provider-native prompting, context anxiety guard (P27), L5 observability need, drift monitor need.

**New gaps identified**: Keep Rate metric missing from L5, per-tool per-model error classification missing, subagent specialization beyond cost routing missing, autonomous harness self-repair (Cloud Agent for harness bugs) missing.
@@ -0,0 +1,45 @@
---
type: source
status: ingested
source_type: engineering-blog
title: "Editing Files at 1000 Tokens per Second"
author: "Aman Sanger (Cursor/Anysphere)"
date_published: 2024-05-14
url: "https://cursor.com/blog/instant-apply"
confidence: high
tags: [cursor, speculative-edits, fast-apply, diff-models, code-editing, latency]
key_claims:
  - "Fast Apply: custom model trained for full-file rewrites, not diff generation"
  - "Speculative edits: deterministic speculation using existing code as draft tokens. 9-13x speedup"
  - "Diffs fail because: fewer thinking tokens, out-of-distribution, line number hallucination"
  - "Search/replace diff format (Aider-inspired) eliminates line numbers but most models still fail"
  - "Fine-tuned Llama-3-70b + speculative edits outperforms GPT-4o on accuracy and speed"
  - "~1000 tokens/sec on 70B model, deployed with Fireworks AI inference engine"
created: 2026-05-02
updated: 2026-05-02
---

# Editing Files at 1000 Tokens per Second

Cursor's May 2024 technical post on their Fast Apply model and speculative edits algorithm.

## Why Full-File Rewrites Beat Diffs

1. **Fewer thinking tokens**: Diffs constrain output tokens, giving the model fewer forward passes
2. **Out of distribution**: Models see more full files than diffs in training
3. **Line number hallucination**: Tokenizers treat multi-digit numbers as single tokens; the model must commit on the first token

Cursor tested an Aider-inspired search/replace diff format (no line numbers, redundant +/- markers). Only Claude Opus could output accurate diffs; most models fail badly.

## Speculative Edits Algorithm

Unlike standard speculative decoding (a draft model proposes, the target verifies), Cursor's **speculative edits** uses the *existing code as draft tokens*. Since code edits reuse 80-90% of existing lines, the current file contents serve as high-quality draft predictions. The target model verifies which spans to keep vs replace.

This is deterministic speculation — no draft model needed. Deployed on Fireworks AI's custom inference engine.
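A toy sketch of the verification step, assuming one decode step per call (`next_token` is a stand-in for the target model; real implementations batch the whole draft into a single forward pass):

```python
def accept_draft(prefix, draft, next_token):
    """Accept the longest prefix of `draft` that the target model would
    itself generate after `prefix`. With deterministic speculation the
    draft is simply the existing file's tokens, so unchanged spans are
    accepted wholesale instead of being decoded one token at a time."""
    accepted = []
    for tok in draft:
        if next_token(prefix + accepted) != tok:
            break  # first divergence: the edit actually changes code here
        accepted.append(tok)
    return accepted
```

Whenever the draft diverges, the model decodes normally until it re-synchronizes with a later span of the old file.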

## Training Pipeline

Synthetic data from cmd-k prompts → GPT-4 produces a chat response → an LM "applies" the change → mix with real apply data (80/20). Downsampled: small files, repeated filenames, no-op edits. Best model: fine-tuned Llama-3-70b.

## Relevance to Harness

Our P10 fuzzy edit matching addresses the same "diff problem" from the tool side. Cursor solves it from the model side (train the model to output full rewritten files, not diffs). We should consider: for our edit tool, could we accept full-file rewrites and diff them server-side? This would be more model-friendly.
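The server-side-diff idea is easy to prototype with the standard library; the file names below are placeholders:

```python
import difflib

def rewrite_to_diff(original: str, rewritten: str) -> str:
    """Accept a full-file rewrite from the model and derive the diff
    server-side, so the model never has to emit line numbers or +/- markers."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        rewritten.splitlines(keepends=True),
        fromfile="a/example.py", tofile="b/example.py",
    ))
```

The tool then applies or displays the derived diff, keeping the model's output format in-distribution.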
@@ -0,0 +1,52 @@
---
type: source
status: ingested
source_type: engineering-blog
title: "Iterating with Shadow Workspaces"
author: "Arvid Lunnemark (Cursor/Anysphere)"
date_published: 2024-09-01
url: "https://cursor.com/blog/shadow-workspace"
confidence: high
tags: [cursor, shadow-workspace, lsp, pre-verification, agent-harness]
key_claims:
  - "Shadow workspace = hidden Electron window for AI code iteration with full LSP access"
  - "AI iterates invisibly until lints pass; user only sees valid code"
  - "Implemented as hidden window with gRPC IPC, auto-killed after 15min idle"
  - "Concurrency via interleaving: AIs paused/resumed like CPU processes"
  - "Future: kernel-level folder proxy (FUSE) for runnability + disk isolation"
  - "Rust-analyzer broken because it needs on-disk files; macOS FUSE blocked by Apple walled garden"
created: 2026-05-02
updated: 2026-05-02
---

# Iterating with Shadow Workspaces

Cursor's engineering blog post describing the **shadow workspace** — a hidden Electron window that lets AI agents iterate on code with full Language Server Protocol (LSP) access, independently of the user's coding experience.

## Design Criteria

1. **LSP-usability**: AIs see lints, go-to-definitions, full LSP interaction
2. **Runnability**: AIs run code and see output (future goal)
3. **Independence**: User's coding experience unaffected
4. **Privacy**: Code stays local
5. **Concurrency**: Multiple AIs work concurrently
6. **Universality**: Works for all languages and workspace setups
7. **Maintainability**: Minimal, isolatable code
8. **Speed**: No minute-long delays; throughput for hundreds of AI branches

## Current Implementation

A hidden Electron window is spawned with `show: false`. Edits are sent via gRPC IPC between extension hosts. The shadow window runs a full VS Code environment with LSP plugins. The AI iterates on lints invisibly; only then is valid code presented to the user.
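The core loop (revise a hidden copy until the linter is satisfied) can be sketched editor-independently; `propose_fix` and `lint` are hypothetical stand-ins for the model call and the LSP diagnostics:

```python
def iterate_until_clean(code, propose_fix, lint, max_rounds=5):
    """Shadow-workspace pattern without the editor: let the AI revise a
    hidden copy of the code until lints pass; only the final, clean
    version is ever shown to the user."""
    for _ in range(max_rounds):
        diagnostics = lint(code)
        if not diagnostics:
            return code  # lints pass: surface the result
        code = propose_fix(code, diagnostics)
    raise RuntimeError("no lint-clean revision within the round budget")
```

The round budget plays the same role as the 15-minute idle kill: it bounds how long a hidden iteration can run.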

Concurrency: AI edits are interleaved like CPU processes — AI A runs, pauses, AI B runs, then A resumes. The AIs don't notice the elapsed time.

## Open Questions

1. Kernel-level folder proxy without a kernel extension?
2. Windows equivalent of FUSE?
3. DriverKit for a fake USB proxy folder?
4. Network-level isolation for microservice testing?
5. Cloud-based remote workspace with auto-inferred Docker?

## Relevance to Harness

The shadow workspace is the **pre-verification isolation** pattern. It proves that validating code before the user sees it is the single biggest UX differentiator in agentic coding. Our harness should implement an analogous "pre-commit validation sandbox" between L3 and L4.
@@ -0,0 +1,53 @@
---
type: source
status: ingested
source_type: engineering-blog
title: "How Cursor Shipped its Coding Agent to Production"
author: "Lee Robinson (Cursor) + ByteByteGo"
date_published: 2026-01-26
url: "https://blog.bytebytego.com/p/how-cursor-shipped-its-coding-agent"
confidence: high
tags: [cursor, composer, coding-agent, latency, sandboxing, speculative-decoding, context-compaction]
key_claims:
  - "Coding agent ≠ agentic model. Model is brain, agent is body with tools + loop + context retrieval"
  - "Composer: MoE architecture, 4x faster than similarly intelligent models"
  - "Three latency strategies: MoE (per-call cost), speculative decoding (generation time), context compaction (prompt processing)"
  - "Diff Problem: models struggle with edit tasks. Solved via training on (original, edit_cmd, final) triples"
  - "Search and replace tools are hardest to teach; training data has high volume of these trajectories"
  - "Sandboxing: custom VM scheduler for bursty demand. Sandboxes are core serving infrastructure"
  - "Three production lessons: tool use baked into model, adoption is ultimate metric, speed is product"
created: 2026-05-02
updated: 2026-05-02
---

# How Cursor Shipped its Coding Agent to Production

ByteByteGo deep dive (Jan 2026) written with Lee Robinson at Cursor. Covers the full architecture of Cursor's coding agent system, Composer model training, and three production challenges.

## System Architecture

| Component | Purpose |
|---|---|
| **Router** | Auto mode: analyzes request complexity, picks the best model |
| **LLM (agentic model)** | Trained on trajectories (action sequences), not just text |
| **Tools** | 10+ tools: search, read, write, apply edits, terminal |
| **Context Retrieval** | Pulls relevant snippets/docs/definitions for the current step |
| **Orchestrator** | ReAct loop: model decides → tool executes → result collected → rebuild context → repeat |
| **Sandbox** | Isolated execution for builds/tests/linters with strict guardrails |

## Three Production Challenges

### 1. The Diff Problem
Models trained on text generation struggle with code editing. Solution: train on (original_code, edit_command, final_code) triples. Search+replace tools are the hardest to teach — they require a high volume of tool-specific trajectories. Composer was trained on tens of thousands of GPUs.

### 2. Latency Compounds
Three techniques:
- **MoE Architecture**: Conditional expert routing, fewer active params per token, better quality at similar latency
- **Speculative Decoding**: A small draft model proposes tokens, the large model verifies in parallel. Code structure is predictable (imports, brackets, syntax) → high acceptance rate
- **Context Compaction**: Summarize working state. Keep failing test names, error types, key stack frames. Drop stale context, deduplicate repeats.

### 3. Sandboxing at Scale
Custom VM scheduler for bursty demand. Fast provisioning + aggressive recycling. Sandboxes are treated as core serving infrastructure, not just containers. During training: hundreds of thousands of concurrent sandboxed environments.

## Relevance to Harness

Validates our: inline syntax validation (P11-P12), edit tool fuzziness (P10), Haiku router (P25), sandbox execution. New gaps: a context compaction strategy more sophisticated than our drift pruning; speculative editing is a model-level optimization we can't replicate but can learn from conceptually.
@@ -0,0 +1,51 @@
---
type: source
source_type: secondary
title: "Cursor vs Antigravity 2026: Which AI Agent Actually Wins?"
author: "Vishnu (MeshWorld)"
date_published: 2026-03-18
url: "https://meshworld.in/blog/ai/comparisons/cursor-vs-antigravity/"
confidence: medium
status: ingested
created: 2026-05-01
updated: 2026-05-01
tags:
  - antigravity
  - cursor
  - comparison
  - harness-design
key_claims:
  - "Antigravity has 1M token context window vs Cursor's RAG-based indexing"
  - "Browser subagent with visual verification is Antigravity's killer feature"
  - "Cursor = Centaur model (you-first). Antigravity = Manager model (agent-first)"
  - "Antigravity Ultra at $249.99/mo criticized for high agentic loop costs"
  - "Cursor v2.6 adds JetBrains support; Antigravity is VS Code only"
  - "Antigravity v1.20.5 powered by Gemini 3.1 Pro"
---

# Cursor vs Antigravity 2026

Technical comparison published March 18, 2026 by MeshWorld.

## Core Distinction

- **Cursor: "Centaur" model** — AI amplifies human typing. You stay in flow.
- **Antigravity: "Manager" model** — AI does the work. You review artifacts and steer.

## Key Antigravity Features

1. **1M Token Context Window**: Ingests entire repos into active memory. Understands cross-file dependencies natively. No RAG needed.
2. **Browser Subagent**: Drives headless Chrome. Takes screenshots, analyzes pixels, verifies UI changes.
3. **Nano Banana**: Built-in image generator for UI assets.

## Benchmark Notes

- Cursor (Claude Opus 4.6) is better at pure logic and bug fixing
- Antigravity (Gemini 3.1 Pro) is undefeated for UI, vision, and multi-step reasoning

## Pricing Gap

Antigravity Ultra costs $249.99/mo. Token-heavy agentic loops burn through quotas fast; Pro users report multi-day lockouts after intensive sessions.

## Relevance to Harness

Validates that different agent architectures excel at different task types. The 1M context window vs RAG debate is central to our context strategy. The browser subagent reveals a gap in our tool registry.
@@ -0,0 +1,69 @@
---
type: source
source_type: github-repo
author: disler (IndyDevDan)
date_published: 2026-02-23
url: https://github.com/disler/pi-vs-claude-code
confidence: high
tags:
  - pi-agent
  - claude-code
  - agentic-coding
  - multi-agent
  - extensions
key_claims:
  - "Pi Coding Agent is the only real open-source competitor to Claude Code"
  - "Pi's extension system enables UI customization, agent orchestration, safety auditing, and cross-agent integrations"
  - "Extensions compose via multiple -e flags: subagent-widget, agent-team, agent-chain, damage-control, pi-pi"
  - "Pi supports every major AI model provider (OpenAI, Anthropic, Google, OpenRouter)"
  - "Agent teams dispatch work to specialists via teams.yaml; agent chains pipeline steps sequentially via agent-chain.yaml"
---

# disler/pi-vs-claude-code

GitHub repository by IndyDevDan (disler) — 928 stars, 244 forks. A collection of customized Pi Coding Agent instances demonstrating how to hedge against Claude Code in the agentic coding market.

## What It Provides

**15+ production extensions** covering the full agent lifecycle:

### Multi-Agent Orchestration (3 extensions)
- **subagent-widget**: `/sub <task>` spawns background Pi subagents with live-progress widgets
- **agent-team**: Dispatcher-only orchestrator — the primary agent delegates to named specialists via the `dispatch_agent` tool and shows a grid dashboard
- **agent-chain**: Sequential pipeline orchestrator — chains agents where output feeds into the next step (`$INPUT`, `$ORIGINAL` variables). Example: a `plan-build-review` pipeline

### Safety & Control (2 extensions)
- **damage-control**: Real-time safety auditing — intercepts dangerous bash patterns via regex and enforces path-based access controls from `.pi/damage-control-rules.yaml`. Block levels: Zero Access, Read-Only, No-Delete, Dangerous Commands (some with `ask: true` confirmation)
- **purpose-gate**: Session intent declaration on startup; blocks prompts until answered

### UI & DX (7 extensions)
- **pure-focus**: Distraction-free mode (no footer/status)
- **minimal**: Compact footer with model name + 10-block context meter
- **tool-counter**: Rich two-line footer (model, context, tokens, cost + cwd/branch, per-tool tally)
- **tool-counter-widget**: Live-updating above-editor per-tool call counts
- **session-replay**: Scrollable timeline overlay of session history
- **theme-cycler**: Keyboard shortcuts to cycle custom themes
- **system-select**: `/system` command to switch between agent personas from `.pi/agents/`

### Meta & Cross-Agent (2 extensions)
- **cross-agent**: Scans `.claude/`, `.gemini/`, `.codex/` dirs for commands/skills/agents and registers them in Pi
- **pi-pi**: Meta-agent that builds Pi agents using parallel research experts (ext-expert, theme-expert, tui-expert)

## Key Architecture Insights

**Agent Teams** configured in `.pi/agents/teams.yaml`:
```yaml
frontend: [planner, builder, bowser]
backend: [architect, implementer, tester]
```
Individual agent personas live as `.md` files in `.pi/agents/`.

**Agent Chains** defined in `.pi/agents/agent-chain.yaml` as sequential steps with `$INPUT` injection.
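A plausible shape for such a chain definition; the step names and prompts are hypothetical, reconstructed from the `$INPUT`/`$ORIGINAL` description rather than copied from the repo:

```yaml
# .pi/agents/agent-chain.yaml (hypothetical sketch)
plan-build-review:
  - agent: planner
    prompt: "Write an implementation plan for: $ORIGINAL"
  - agent: builder
    prompt: "Implement this plan: $INPUT"
  - agent: reviewer
    prompt: "Review the result and list issues: $INPUT"
```

Each step's output is injected as `$INPUT` to the next, while `$ORIGINAL` preserves the user's initial request.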

**Damage Control Rules** in `.pi/damage-control-rules.yaml` with four path policies (Zero Access, Read-Only, No-Delete, Dangerous Commands).
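The four policies might look like this in practice; every path and pattern below is illustrative, not taken from the repo:

```yaml
# .pi/damage-control-rules.yaml (hypothetical sketch)
zero_access:
  - ".env"
  - "secrets/"
read_only:
  - ".git/"
no_delete:
  - "src/"
dangerous_commands:
  - pattern: "rm -rf"
    ask: true   # confirm instead of hard-blocking
  - pattern: "git push --force"
```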

**Stacking**: Extensions compose — `pi -e extensions/minimal.ts -e extensions/cross-agent.ts`.

## Relevance to Our Harness

The repo demonstrates that Pi's extension system can implement the full set of orchestration patterns (subagent delegation, team dispatch, sequential chaining) entirely in user-space TypeScript, without modifying the core agent. This means our harness can adopt these patterns as `.pi/skills/` extensions rather than core code changes.
@@ -0,0 +1,53 @@
|
|
|
1
|
+
---
|
|
2
|
+
type: source
|
|
3
|
+
source_type: github-repo
|
|
4
|
+
title: "Siddhant-K-code/distill"
|
|
5
|
+
author: "Siddhant Khare"
|
|
6
|
+
date_published: 2026-02-24
|
|
7
|
+
date_accessed: 2026-05-05
|
|
8
|
+
url: "https://github.com/Siddhant-K-code/distill"
|
|
9
|
+
confidence: medium
|
|
10
|
+
tags:
|
|
11
|
+
- compaction
|
|
12
|
+
- context-engineering
|
|
13
|
+
- deterministic
|
|
14
|
+
- deduplication
|
|
15
|
+
key_claims:
|
|
16
|
+
- "4-layer deterministic context compression: Cluster, Select, Rerank, Compress"
|
|
17
|
+
- "~12ms overhead vs ~500ms for LLM compression"
|
|
18
|
+
- "~$0.0001/call vs $0.01+ for LLM compression"
|
|
19
|
+
- "Semantic deduplication removes 30-40% redundant context from multiple sources"
|
|
20
|
+
- "Session-based context window management with token budgets (v0.4.0)"
|
|
21
|
+
- "Persistent context memory with write-time deduplication and hierarchical decay"
|
|
22
|
+
- "143 GitHub stars, v0.4.0 (Feb 2026)"
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
# Distill — Deterministic Context Compression for LLM Agents
|
|
26
|
+
|
|
27
|
+
## Summary
|
|
28
|
+
|
|
29
|
+
Distill is a general-purpose context optimization tool that preprocesses context from multiple sources (RAG, tools, memory, docs) before sending to LLMs. It operates as a reliability layer, not a session compactor — its scope is broader but shallower than pi-vcc.

## Key Details

- **Repo**: Siddhant-K-code/distill (143 stars, MIT)
- **Version**: v0.4.0 (Feb 2026)
- **Algorithm**: Agglomerative clustering + Maximal Marginal Relevance (MMR) re-ranking
- **Pipeline**: Over-fetch → Cluster → Select → MMR re-rank → Compress
- **Scope**: Context preprocessing layer (any LLM workflow), not session-specific compaction
- **Observability**: Prometheus metrics + OpenTelemetry tracing
- **Config**: `distill.yaml` file
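
The MMR re-rank step in the pipeline above can be sketched in a few lines. This is a minimal pure-Python illustration of Maximal Marginal Relevance, not Distill's actual code; the function name and the `lam` trade-off parameter are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_rerank(query_vec, chunks, k, lam=0.7):
    """Greedily pick k chunks, trading query relevance against redundancy
    with already-selected chunks (lam=1.0 is pure relevance).
    chunks: list of (chunk_id, embedding) pairs."""
    selected, remaining = [], list(chunks)
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for cid, vec in remaining:
            relevance = cosine(query_vec, vec)
            redundancy = max(
                (cosine(vec, sel_vec) for _, sel_vec in selected), default=0.0
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = (cid, vec), score
        selected.append(best)
        remaining.remove(best)
    return [cid for cid, _ in selected]
```

Lowering `lam` pushes selection away from near-duplicates of chunks already chosen, which is what lets a pipeline like this drop redundant context deterministically instead of asking an LLM to summarize it.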

## How It Differs from pi-vcc

| Dimension | Distill | pi-vcc |
|-----------|---------|--------|
| Scope | Multi-source context preprocessing | Session conversation compaction |
| Input | RAG chunks, tool outputs, docs, memory | Pi session transcript |
| Output | Deduplicated, ranked context | Brief transcript + 5 semantic sections |
| Recall | No lineage recall | Full JSONL lineage recall |
| Integration | General LLM middleware | Pi `session_before_compact` hook |

## Why This Matters

Distill validates the deterministic-over-LLM pattern but operates at a different layer than pi-vcc: Distill preprocesses what goes INTO the context window, while pi-vcc compresses what has ALREADY accumulated in the session. The two are complementary, not competing.
@@ -0,0 +1,48 @@
---
type: source
status: ingested
source_type: benchmark-report
author: Naman Bansal / Supermemory AI
date_published: 2025-06-27
url: https://supermemory.ai/blog/best-open-source-embedding-models-benchmarked-and-ranked/
confidence: high
key_claims:
- "MiniLM-L6-v2: 78.1% top-5 retrieval, 14.7ms/1K tokens, 68ms latency, 1.2GB GPU"
- "E5-Base-v2: 83.5% top-5 retrieval, 20.2ms/1K tokens, 79ms latency, 2.0GB GPU"
- "BGE-Base-v1.5: 84.7% top-5 retrieval, 22.5ms/1K tokens, 82ms latency, 2.1GB GPU"
- "Nomic Embed v1: 86.2% top-5 retrieval, 41.9ms/1K tokens, 110ms latency, 4.8GB GPU"
- "MiniLM-L6-v2 is 5-8% lower accuracy than larger models but 3x faster"
tags:
- embedding-models
- benchmark
- minilm
- bge
- e5
- nomic
created: 2026-05-02
updated: 2026-05-02
---

# Best Open-Source Embedding Models Benchmarked and Ranked (2025)

## Summary

Comprehensive benchmark of four leading open-source embedding models on the BEIR TREC-COVID dataset using a FAISS flat L2 index. Provides accuracy, latency, and compute cost trade-offs.

## Benchmark Results

| Model | Embed Time (ms/1K tok) | Latency (ms) | Top-5 Accuracy | GPU Memory |
|-------|------------------------|--------------|----------------|------------|
| MiniLM-L6-v2 | 14.7 | 68 | 78.1% | ~1.2 GB |
| E5-Base-v2 | 20.2 | 79 | 83.5% | ~2.0 GB |
| BGE-Base-v1.5 | 22.5 | 82 | 84.7% | ~2.1 GB |
| Nomic Embed v1 | 41.9 | 110 | 86.2% | ~4.8 GB |
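
The benchmark's retrieval setup (FAISS flat L2 index, top-5 accuracy) amounts to exhaustive nearest-neighbor search. A stdlib-only Python sketch of the same brute-force top-k query, with toy vectors standing in for real embeddings, mirrors what a FAISS flat L2 index computes (FAISS reports squared distances, which yields the same ranking):

```python
import heapq
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_l2_topk(query, index, k=5):
    """Exhaustive ('flat') nearest-neighbor search: score every stored
    vector against the query and return the k closest doc ids.
    index: list of (doc_id, embedding) pairs."""
    hits = heapq.nsmallest(k, index, key=lambda item: l2(query, item[1]))
    return [doc_id for doc_id, _ in hits]
```

Because a flat index scans every vector, query latency grows linearly with corpus size; the per-query latencies in the table are dominated by the embedding step, not the scan, at benchmark scale.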

## Trade-off Analysis

- **Speed-first**: MiniLM-L6-v2 — best for high-volume, low-latency, edge deployments
- **Balanced**: E5-Base-v2 or BGE-Base-v1.5 — strong accuracy at reasonable latency
- **Accuracy-first**: Nomic Embed v1 — best precision but 2x slower, GPU-dependent

## Relevance to Our Implementation

MiniLM-L6-v2's 78.1% vs Nomic's 86.2% is an 8.1 percentage point gap on general text retrieval. For code retrieval, the gap is likely wider, since MiniLM was trained on general text, not code. However, with AST-aware chunking + contextualized text, the effective gap narrows significantly, because the chunking quality improvement (per Vectara NAACL 2025) can outweigh the embedding model choice.