npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.3 - Mend

ultimate-pi 0.1.2 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516)

package/vault/wiki/questions/Research-sentrux-dev.md ADDED

@@ -0,0 +1,123 @@
+---
+type: synthesis
+title: "Research: sentrux.dev"
+created: 2026-05-03
+updated: 2026-05-03
+tags:
+  - research
+  - sentrux
+  - code-quality
+  - ai-coding
+status: developing
+related:
+  - "[[sentrux (tool)]]"
+  - "[[Quality Signal (sentrux)]]"
+  - "[[Five Root Cause Metrics (sentrux)]]"
+  - "[[sentrux Rules Engine]]"
+  - "[[sentrux MCP Integration]]"
+  - "[[harness-implementation-plan]]"
+  - "[[harness]]"
+sources:
+  - "[[sentrux-github-repo]]"
+  - "[[sentrux-dev-landing]]"
+  - "[[sentrux-docs-quality-signal]]"
+  - "[[sentrux-docs-root-cause-metrics]]"
+  - "[[sentrux-docs-rules-engine]]"
+  - "[[sentrux-docs-pro-architecture]]"
+---
+# Research: sentrux.dev
+## Overview
+sentrux is a real-time architectural sensor for AI-agent-written code. Built in Rust (MIT licensed), it computes a single Quality Signal score (0–10,000) from 5 graph-theoretic root cause metrics, visualizes the codebase as an interactive treemap, and integrates with AI coding agents via MCP (Model Context Protocol). First released March 11, 2026; already at v0.5.7 with 1.9k GitHub stars.
+## Key Findings
+- **Unique positioning:** sentrux positions itself as the missing feedback loop in AI-assisted development — the "sensor" that observes architectural reality while AI agents are the "actuator" making changes (Source: [[sentrux-dev-landing]])
+- **Mathematically grounded scoring:** Uses geometric mean of 5 independent graph-theoretic dimensions, justified by Nash Social Welfare theorem (1950). Claims to be "ungameable by design" (Source: [[sentrux-docs-quality-signal]])
+- **5 root cause metrics cover complete structural space:** 3 edge properties (modularity, acyclicity, depth) + 2 node properties (equality, redundancy) — each grounded in peer-reviewed theory (Newman 2004, Martin 2003, Lakos 1996, Gini 1912, Kolmogorov 1963) (Source: [[sentrux-docs-root-cause-metrics]])
+- **MCP-first AI agent integration:** 9 MCP tools allow agents to scan, baseline, check rules, and detect quality degradation per session — closing the feedback loop automatically (Source: [[sentrux-github-repo]])
+- **Pro tier via runtime plugin model:** Pro features ($15/month) live in a separately downloaded dylib, not in the free binary. Ed25519 license keys with offline validation and per-user watermarking for anti-piracy (Source: [[sentrux-docs-pro-architecture]])
+- **Rapid development velocity:** From initial commit to v0.5.7 with Pro architecture, Claude Code plugin, universal resolver, and 316+ tests in approximately one week (Source: [[sentrux-github-repo]])
+- **52 languages via tree-sitter:** Zero language-specific code in the Rust binary. All language knowledge in plugin.toml + tags.scm query files. New languages require zero Rust code (Source: [[sentrux-github-repo]])
+## Key Entities
+- [[sentrux (tool)]]: The open-source architectural sensor and quality governance tool
+- **yjing:** Primary author; GitHub user "sentrux"; appears as "yjing@sentrux.dev" in license keys
+- **claude:** Contributor account — likely represents AI-generated code contributions (the tool was partially built by Claude)
+## Key Concepts
+- [[Quality Signal (sentrux)]]: Single scalar score 0–10,000 via geometric mean of 5 normalized metrics
+- [[Five Root Cause Metrics (sentrux)]]: modularity, acyclicity, depth, equality, redundancy
+- [[sentrux Rules Engine]]: TOML-based architectural constraint system for CI and MCP
+- [[sentrux MCP Integration]]: 9-tool MCP server for AI agent feedback loops
+- **Feedback Loop (cybernetic):** sensor (sentrux) → signal → controller (AI agent) → actuator (code changes) → system (codebase) → loop
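The Quality Signal listed above is a geometric mean of five normalized metrics scaled to 0–10,000. A minimal sketch of that aggregation follows, assuming each metric has already been normalized to [0, 1]. The normalization step and the name `quality_signal` are assumptions, not sentrux's actual API.

```python
import math

METRICS = ("modularity", "acyclicity", "depth", "equality", "redundancy")


def quality_signal(scores: dict[str, float]) -> int:
    """Geometric mean of the five normalized metrics, scaled to 0..10,000."""
    values = [scores[name] for name in METRICS]
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each metric must be normalized to [0, 1]")
    if min(values) == 0.0:
        return 0  # one zero dimension collapses a geometric mean
    # Average in log space, then map back: exp(mean(log(v))).
    log_mean = sum(math.log(v) for v in values) / len(values)
    return round(10_000 * math.exp(log_mean))
```

Because the mean is multiplicative, one weak dimension pulls the score down further than it would under an arithmetic mean. That is a property of the aggregation only; it says nothing about whether an individual metric can be gamed.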
+## Contradictions
+- **Self-assessment gap:** The sentrux repo gives itself a "D" rating when analyzed by its own tool (Source: Reddit comment by ron3090). This raises questions about either the tool's calibration or the repo's code quality — both problematic for credibility.
+- **Rapid release pace vs stability:** 17 releases in a single day during launch. Reddit community flagged this as potentially "vibe-coded" with insufficient review and testing. Creator responded to specific feedback (swapped `dirs` crate for `std::env::home_dir()`) suggesting responsiveness but also rapid, reactive development.
+- **Conceptual strength vs practical utility:** Community feedback split between praising the concept (human-in-the-loop, feedback loop) and questioning practical usefulness — "it looks okay visually, but doesn't actually show anything useful" and metrics are "meaningless noise" without actionable drill-down.
+- **"Ungameable" claim:** The Nash Social Welfare theorem guarantees aggregation properties, not that individual metrics can't be gamed. The claim conflates mathematical properties of aggregation with practical impossibility of gaming.
+## Open Questions
+- **Production adoption unknown:** No evidence found of production usage beyond the creator. Tool is <2 months old. Listed as "developing" status.
+- **No independent reviews found:** No blog posts, technical analyses, or third-party evaluations beyond the Reddit launch thread comments. All documentation is from the creator.
+- **Accuracy of metrics across languages:** 52 languages supported via tree-sitter but accuracy of dependency graph extraction likely varies significantly by language maturity of tree-sitter grammars.
+- **Scalability limits unknown:** No data on performance with large codebases (100K+ files, monorepos). Treemap rendering with dependency edges at scale unverified.
+- **Pro plugin security model:** Runtime dylib loading has inherent security risks. The anti-piracy posture explicitly accepts that binary patching defeats all protections — what about malicious plugin substitution?
+- **Comparison with existing tools absent:** No positioning relative to SonarQube, CodeClimate, CodeScene, or other established code quality tools. sentrux claims uniqueness via MCP integration and geometric mean, but doesn't benchmark against alternatives.
+## Sources
+- [[sentrux-github-repo]]: GitHub repository, primary development source, MIT licensed
+- [[sentrux-dev-landing]]: Official marketing website (sentrux.dev)
+- [[sentrux-docs-quality-signal]]: Quality Signal scoring methodology documentation
+- [[sentrux-docs-root-cause-metrics]]: Detailed mathematical definitions of all 5 metrics
+- [[sentrux-docs-rules-engine]]: Rules engine TOML configuration and enforcement documentation
+- [[sentrux-docs-pro-architecture]]: Pro tier architecture, license system, business model
+## Harness Integration Map
+sentrux maps onto the ultimate-pi agentic harness at 5 integration points. See [[harness-implementation-plan]] for the updated build plan.
+| Harness Layer | What sentrux Provides | Replaces |
+|--------------|----------------------|----------|
+| **L2.5 Drift Monitor** | `session_start()`/`session_end()` — structural health baseline + degradation detection per agent session | Augments behavioral drift with structural drift signals |
+| **L3 Grounding** | `scan()` + `check_rules()` — agent gets real-time structural awareness before/after edits; verifies architectural constraints before committing | Adds architectural grounding to existing manual checkpoints |
+| **P20 Gate** | `sentrux check .` — CI-friendly exit 0/1: modularity, acyclicity, depth, equality, redundancy | Joins biome + tsc + fallow (dead code) as fourth deterministic gate |
+| **L5 Observability** | Quality Signal trending via `evolution` tool — continuous metric trackable across sessions | Adds structural dimension to Keep Rate + LLM-as-Judge |
+| **P44 Structural Gate** | Full Fallow replacement: dead code (redundancy), coupling (modularity), cycle detection (acyclicity), god files (equality), depth analysis — all in one tool with MCP + session diff + rules engine | **Replaces Fallow entirely** for P44a-g. Fallow retained only for dead code detection in P20 gate (complementary). |
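The per-session drift check in the L2.5 row can be sketched as a comparison between a baseline snapshot and an end-of-session snapshot of the five metrics. The function name, the `tolerance` parameter, and the plain-dict payload are hypothetical; this note does not document what `session_start()`/`session_end()` actually return.

```python
def detect_degradation(
    before: dict[str, float],
    after: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return the metrics that dropped by more than `tolerance` during a session.

    Both dicts map metric name -> normalized value where higher is better.
    A metric missing from `after` is treated as having dropped to 0.
    """
    regressed = []
    for name, old in before.items():
        new = after.get(name, 0.0)
        if old - new > tolerance:
            regressed.append(name)
    return sorted(regressed)
```

A drift monitor built this way could surface the returned names to the agent at session end, so the structural signal sits alongside the behavioral one.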
+### What sentrux Does NOT Replace
+- **L1 Spec Hardening** — specification analysis (LLM evaluation)
+- **L2 Structured Planning** — task planning (LLM evaluation)
+- **L4 Adversarial Verification** — semantic code review, critic agents (LLM evaluation)
+- **P13 Semantic Code Search (ck)** — BM25+embeddings grep (different concern: semantic vs structural)
+- **P14 Think-in-Code** — coding paradigm enforcement (different concern: process vs structure)
+- **P15 Gitingest** — bulk external repo ingestion (different concern: ingestion vs analysis)
+- **P30 Browser Subagent** — visual UI verification (different domain)
+- **P43 TS Execution Layer** — TypeScript sandbox (different concern: execution vs analysis)
+- **L7 Schema Orchestration** — workflow DAG (different concern: orchestration vs analysis)
+- **L8 Wiki Query** — knowledge base search (different concern: retrieval vs analysis)
+### Token Budget Impact
+sentrux MCP calls add **0 LLM tokens** to the pipeline budget. All 9 tools are deterministic Rust computations — structural analysis happens outside the LLM context window. This replaces ~500-1,000 tokens of LLM-based structural review that Fallow required for interpretation of its JSON output.
+### Why sentrux Over Fallow
+| Capability | Fallow | sentrux |
+|-----------|--------|--------|
+| Dead code detection | Yes | Yes (redundancy metric) |
+| Duplication detection | Yes | Via redundancy |
+| Complexity (cyclomatic) | Yes | Via equality (Gini) — god file detection |
+| Boundary/coupling analysis | Yes | Yes (modularity + acyclicity) |
+| Dependency depth | No | Yes (Lakos 1996 levelization) |
+| Modularity (Newman 2004) | No | Yes |
+| Single scalar 0-10,000 score | No | Yes (Quality Signal) |
+| MCP server | Yes | Yes (9 tools vs Fallow's tool set) |
+| Session baseline/diff | No | Yes |
+| Rules engine (TOML constraints) | No | Yes |
+| Language coverage | TS/JS only | 52 languages |
+| Open source license | MIT | MIT (free binary) |
+**Decision:** sentrux is the primary structural quality tool. Fallow is retained only for TypeScript-specific dead code detection in the P20 gate (complementary, not competitive).
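The P20 gate treats each tool as a pass/fail command keyed on its exit code. A minimal sketch of such a runner follows. Only `sentrux check .` is taken from the text above; `run_gates` is a hypothetical helper, and the other command in the usage comment is an illustrative guess at how the gate might be wired.

```python
import subprocess
from typing import Sequence


def run_gates(gates: Sequence[Sequence[str]]) -> tuple[bool, list[str]]:
    """Run every gate command; a gate passes only if it exits with code 0."""
    failed = []
    for cmd in gates:
        try:
            code = subprocess.run(list(cmd), capture_output=True, text=True).returncode
        except FileNotFoundError:
            code = 127  # shell convention for "command not found"
        if code != 0:
            failed.append(" ".join(cmd))
    return (not failed, failed)


# Example wiring (the tsc invocation is an assumption about the gate setup):
# ok, failed = run_gates([["tsc", "--noEmit"], ["sentrux", "check", "."]])
```

Running every gate instead of stopping at the first failure gives the agent the complete list of failing checks in one pass.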

package/vault/wiki/questions/Research-superpowers-skill-for-agentic-coding-agents.md ADDED

@@ -0,0 +1,164 @@
+---
+type: synthesis
+title: "Research: Superpowers Skill for Agentic Coding Agents"
+created: 2026-05-05
+updated: 2026-05-05
+tags:
+  - research
+  - agent-skills
+  - superpowers
+  - harness
+  - methodology
+status: developing
+related:
+  - "[[superpowers-methodology]]"
+  - "[[agent-skills-ecosystem]]"
+  - "[[jesse-vincent]]"
+  - "[[superpowers-github-repo]]"
+  - "[[superpowers-release-blog]]"
+  - "[[superpowers-termdock-analysis]]"
+  - "[[skill-first-architecture]]"
+  - "[[agent-skills-pattern]]"
+  - "[[policy-engine-pattern]]"
+  - "[[agentic-orchestration-pipeline]]"
+  - "[[harness-implementation-plan]]"
+sources:
+  - "[[superpowers-github-repo]]"
+  - "[[superpowers-release-blog]]"
+  - "[[superpowers-termdock-analysis]]"
+  - "[[superpowers-angle1.json]]"
+  - "[[superpowers-angle2.json]]"
+---
+# Research: Superpowers Skill for Agentic Coding Agents
+## Overview
+Superpowers (`obra/superpowers`) by Jesse Vincent is the most-adopted agentic skills framework for AI coding agents — 179K GitHub stars, 15.9K forks, MIT license, v5.1.0 (May 2026). It is a complete software development methodology expressed as composable SKILL.md files that enforce disciplined engineering practices through hard gates. Superpowers does not improve the model — it enforces process. And process, not intelligence, is the real bottleneck for AI coding agents.
+The framework is deeply relevant to our harness pipeline: Superpowers validates our skill-first architecture, provides a battle-tested pattern for hard-gate enforcement, and can be directly integrated as a `.pi/skills/` skill set. But Superpowers cannot replace our code-level enforcement (drift monitor) — its enforcement is probabilistic (agent compliance with skill instructions), while our harness requires deterministic gates.
+## Key Findings
+### 1. Superpowers is process-as-discipline, not model improvement (Source: [[superpowers-github-repo]])
+Superpowers ships 14+ composable skills organized into a complete development workflow: brainstorming → git-worktrees → writing-plans → subagent-driven-development → TDD → code-review → branch-cleanup. Skills trigger automatically. They are mandatory, not advisory. The agent checks for relevant skills before any task.
+### 2. Hard gates beat suggestions every time (Source: [[superpowers-termdock-analysis]])
+"Always write tests first" in CLAUDE.md is a suggestion ignored under pressure. "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. Write code before the test? Delete it. Start over." is a gate that cannot be bypassed. Superpowers' core insight: AI agents respond to structure with explicit consequences.
+### 3. Subagent-driven development is the architectural innovation (Source: [[superpowers-release-blog]])
+Each task dispatches a fresh subagent with only task description and relevant context — not full conversation history. Two-stage review: spec compliance first, then code quality. Result: Claude can work autonomously for hours without deviating from plan. This prevents context pollution and drift.
+### 4. Cross-agent portability via plain Markdown (Source: [[agent-skills-ecosystem]])
+Skills are SKILL.md files — not platform-specific plugins. Work across Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode, GitHub Copilot, and Factory Droid. The Agent Skills open standard was adopted by all major platforms within weeks of release. 490K+ skills exist across marketplaces as of March 2026.
+### 5. TDD enforcement with real results (Source: [[superpowers-termdock-analysis]])
+chardet 7.0.0 shipped using Superpowers: 41x faster, 96.8% accuracy, dozens of longstanding issues fixed. Test suite covering 2,161 files across 99 encodings was a direct product of enforced TDD.
+### 6. Skill creation is meta — TDD for skills (Source: [[superpowers-release-blog]])
+Superpowers includes a `writing-skills` skill. Skills are tested by running subagents through realistic pressure scenarios (time pressure, sunk cost). After each failure, skill instructions are strengthened. This is "TDD for skills" — a recursive self-improvement loop.
+### 7. Persuasion principles work on LLMs (Source: [[superpowers-release-blog]])
+Cialdini's Influence principles (authority, commitment, scarcity, social proof) measurably affect LLM behavior. A Wharton study co-authored by Dan Shapiro put scientific rigor behind this. Superpowers uses these principles in skill design — not to jailbreak agents, but to make them MORE reliable and disciplined.
+## How Superpowers Fits Into / Helps Our Workflow
+### Direct Integration Path
+Superpowers can be integrated as a skill set within our `.pi/skills/` system:
+```
+.pi/skills/superpowers/
+├── brainstorming/SKILL.md
+├── writing-plans/SKILL.md
+├── test-driven-development/SKILL.md
+├── systematic-debugging/SKILL.md
+├── subagent-driven-development/SKILL.md
+├── requesting-code-review/SKILL.md
+├── using-git-worktrees/SKILL.md
+└── finishing-a-development-branch/SKILL.md
+```
+These skills would activate automatically through our existing progressive disclosure mechanism. The agent self-selects relevant skills based on description matching — same as Superpowers' native mechanism.
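Progressive disclosure as described here keeps only each skill's name and description in context and reads the full SKILL.md once a task matches. In practice the agent's model does the matching. The sketch below substitutes naive keyword overlap purely to show the control flow; the `Skill` type, `select_skills`, and the example descriptions are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    path: str  # the full SKILL.md is read only after a match


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_skills(task: str, skills: list[Skill], min_overlap: int = 2) -> list[Skill]:
    """Return skills whose description shares at least `min_overlap` words with the task."""
    task_words = _words(task)
    scored = [(len(task_words & _words(s.description)), s) for s in skills]
    matches = [(n, s) for n, s in scored if n >= min_overlap]
    matches.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [s for _, s in matches]
```

Only the name and description need to be loaded up front, which is what keeps the context cost of an installed skill set low until a skill actually fires.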
+### Mapping to Our Harness Layers
+| Superpowers Skill | Harness Layer | How It Helps |
+|------------------|---------------|--------------|
+| brainstorming | L1 (Spec Hardening) | Enforces design-before-code; asks clarifying questions; produces signed-off spec |
+| writing-plans | L2 (Structured Planning) | Breaks spec into 2-5 min tasks with exact file paths and verification steps |
+| test-driven-development | L3 (Execute) | Enforces RED-GREEN-REFACTOR as hard gate; deletes code written before tests |
+| systematic-debugging | L3 (Execute) | Four-phase root cause investigation; no fixes without diagnosis |
+| requesting-code-review | L4 (Adversarial) | Fresh subagent reviews against plan; critical issues block progress |
+| using-git-worktrees | L3 (Execute) | Isolated workspace per task; clean baseline verification |
+| subagent-driven-development | L7 (Orchestration) | Dispatches fresh subagent per task; two-stage review; context isolation |
+### What Superpowers Validates About Our Architecture
+1. **Skill-first architecture** — Superpowers proves that markdown-based skills are the right primitive for agent discipline. Our May 2026 redesign to skill-first architecture is independently validated by the most-adopted framework.
+2. **Progressive disclosure** — Superpowers' trigger mechanism (load name+description, activate on match) is identical to our `.pi/skills/` design. Cross-validation increases confidence.
+3. **Hard-gate enforcement** — Superpowers' "Iron Laws" (delete code written before tests, no fixes without investigation) map directly to our pre-execution policy gates (P-F1 from Gemini CLI research). Both approaches recognize that suggestions fail under pressure.
+4. **Subagent isolation** — Superpowers' fresh-context-per-task pattern maps to our subagent worktree isolation (P25). Both prevent context pollution.
+5. **Cross-agent portability** — Superpowers' SKILL.md format works across all major agents. Our `.pi/skills/` format should consider SKILL.md compatibility for ecosystem reuse.
+### What Superpowers Does NOT Replace
+Superpowers cannot replace our code-level enforcement:
+- **Drift Monitor (L2.5)** — Superpowers has no runtime drift detection. It relies on the agent following skill instructions. Our LLM-first drift monitor (Haiku 4.5 every 8 turns) catches semantic drift that skills cannot prevent.
+- **Pre-execution Policy Gates** — Superpowers' enforcement is probabilistic (model compliance). Our harness needs deterministic gates (exit-code semantics, tool interception) for safety-critical operations.
+- **Pipeline Orchestration (L7)** — Superpowers chains skills implicitly via agent behavior. Our harness needs explicit DAG-based orchestration with contracts between layers.
+- **Persistent Memory (L6)** — Superpowers has no cross-session memory beyond the agent's immediate context. Our Obsidian wiki provides canonical memory with ADRs, consensus records, and hot cache.
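The contrast between probabilistic skill compliance and deterministic gating can be made concrete with a tool-call interceptor that blocks matching commands regardless of what the model intends. This is a sketch under assumptions: the `ToolCall` shape, the deny patterns, and `check_tool_call` are illustrative, not the harness's actual policy engine.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str     # e.g. "bash"
    command: str  # raw command line the agent wants to run


# Hypothetical deny list for destructive shell operations.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r"),
    re.compile(r"\bgit\s+push\s+.*--force\b"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
]


def check_tool_call(call: ToolCall) -> int:
    """Return 0 to allow the call and 1 to block it (exit-code semantics)."""
    if call.tool != "bash":
        return 0  # this sketch only gates shell commands
    for pattern in DENY_PATTERNS:
        if pattern.search(call.command):
            return 1
    return 0
```

The check runs outside the model, so a skill instruction that the agent ignores under pressure cannot bypass it.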
+### Recommended Integration Strategy
+1. **Adopt Superpowers as a skill set** — Add Superpowers skills to `.pi/skills/superpowers/`. The agent loads them on demand via progressive disclosure. Zero code changes needed.
+2. **Layer deterministic enforcement on top** — Superpowers provides the methodology (probabilistic). Our harness provides the enforcement (deterministic). The drift monitor catches when the agent fails to follow the methodology. The policy engine gates destructive operations.
+3. **Customize for our workflow** — Fork and customize Superpowers skills to match our harness layers. Add references to wiki (read ADRs before planning, file consensus after decisions). Add allowed-tools restrictions.
+4. **Maintain SKILL.md compatibility** — Ensure our `.pi/skills/` format remains compatible with the Agent Skills open standard. This enables reuse of the 490K+ ecosystem and cross-agent portability.
+## Key Entities
+- [[jesse-vincent]]: Creator of Superpowers, Request Tracker, K-9 Mail. Founder of Prime Radiant.
+## Key Concepts
+- [[superpowers-methodology]]: The complete discipline framework — hard gates, composable workflow, two enforcement types
+- [[agent-skills-ecosystem]]: The 490K+ skill marketplace, open standard, cross-agent support, security risks
+## Contradictions
+- **Superpowers skills vs CLAUDE.md**: Some HN commenters argue that SKILL.md files are "just elaborate CLAUDE.md files" and no different from good prompts. The key difference: Superpowers uses hard gates with explicit consequences (delete code, start over), not just instructions. CLAUDE.md is informational; SKILL.md is enforceable. But this distinction is probabilistic — compliance depends on the model following instructions.
+- **Community reception**: HN thread on Superpowers launch had mixed reception. Enthusiasts (Simon Willison: "wildly more ambitious") vs skeptics ("voodoo nonsense," "no benchmarks"). The absence of rigorous A/B testing remains a valid criticism.
+- **Security risk**: 36.8% of skills have security flaws. Superpowers itself is safe (MIT, open source, from trusted author), but the broader ecosystem carries supply-chain risk.
+## Open Questions
+- Can Superpowers' hard-gate enforcement be made deterministic? (Currently probabilistic — depends on model compliance with skill instructions)
+- How well does Superpowers work with non-Claude models? (Claude Code has deepest integration; other agents get reduced functionality)
+- What is the token cost of full Superpowers workflow? (Not documented; subagent-driven development likely has high token consumption)
+- Should our `.pi/skills/` format adopt SKILL.md compatibility? (Enables ecosystem reuse but reduces our ability to add harness-specific extensions)
+- How do we test skill effectiveness? (Superpowers uses "TDD for skills" with pressure scenarios — can we adopt this for our harness skills?)
+## Sources
+- [[superpowers-github-repo]]: Primary source — repo README, architecture, skills library, philosophy
+- [[superpowers-release-blog]]: Original release announcement with development methodology, persuasion principles, memory system
+- [[superpowers-termdock-analysis]]: Third-party deep dive with skill-by-skill analysis, philosophy, practical guidance

package/vault/wiki/questions/Research: Augment Code Context Engine.md ADDED

@@ -0,0 +1,244 @@
+---
+type: synthesis
+title: "Research: Augment Code Context Engine"
+created: 2026-04-30
+updated: 2026-04-30
+tags:
+  - research
+  - context-engine
+  - augment-code
+  - agent-architecture
+  - semantic-search
+  - chunking
+  - embeddings
+status: developing
+related:
+  - "[[Context Engine (AI Coding)]]"
+  - "[[Semantic Codebase Indexing]]"
+  - "[[Prompt Enhancement]]"
+  - "[[Dual-Model Agent Architecture]]"
+  - "[[Majority Vote Ensembling]]"
+  - "[[Contractor vs Employee AI Model]]"
+  - "[[Augment Code]]"
+  - "[[AST-Aware Code Chunking]]"
+  - "[[Contextualized Text Embedding]]"
+  - "[[Late Chunking vs Early Chunking]]"
+sources:
+  - "[[Augment Context Engine Official]]"
+  - "[[Augment SWE-bench Agent GitHub]]"
+  - "[[Augment SWE-bench Pro Blog]]"
+  - "[[Augment Code WorkOS ERC 2025]]"
+  - "[[Augment Code Codacy AI Giants]]"
+  - "[[Augment Code MCP SiliconAngle]]"
+  - "[[Auggie Context MCP Server]]"
+  - "[[cast-code-chunking-paper]]"
+  - "[[vectara-chunking-vs-embedding-naacl2025]]"
+  - "[[coir-code-retrieval-benchmark]]"
+  - "[[code-chunk-library-supermemory]]"
+  - "[[embedding-models-benchmark-supermemory-2025]]"
+---
+# Research: Augment Code Context Engine
+## Overview
+Augment Code's Context Engine is a semantic search engine for codebases that provides AI coding agents with deep understanding of architecture, dependencies, and team patterns. It is the primary differentiator behind Augment's #1 SWE-bench Pro score (51.80%). The core insight: **context quality determines code quality more than model intelligence** — the same model (Claude Opus 4.5) scores about 2 points higher in Auggie than in Claude Code (51.80% vs 49.75%), a gap attributed to context retrieval.
+## Key Findings
+### 1. Context Engine Architecture
+- Semantic indexing of entire codebase (1M+ files) using custom embedding models trained in pairs (Source: [[Augment Context Engine Official]], [[Augment Code Codacy AI Giants]]).
+- Real-time knowledge graph mapping relationships between files, services, dependencies (Source: [[Augment Context Engine Official]]).
+- Intelligent context curation: retrieves only what matters, compresses context, ranks by relevance (Source: [[Augment Context Engine Official]]).
+- Multi-source: code + commit history + team patterns + external docs + tribal knowledge (Source: [[Augment Context Engine Official]]).
+### 2. Benchmark Performance
+- 65.4% on SWE-bench Verified — #1 open-source implementation (Source: [[Augment SWE-bench Agent GitHub]]).
+- 51.80% on SWE-bench Pro — #1 among tested agents (Source: [[Augment SWE-bench Pro Blog]]).
+- Same model (Claude Opus 4.5): Auggie 51.80%, Cursor 50.21%, Claude Code 49.75% — context retrieval explains the gap (Source: [[Augment SWE-bench Pro Blog]]).
+- As context provider for other agents: 30-80% quality improvement (Source: [[Augment Code MCP SiliconAngle]]).
+### 3. Agent Architecture
+- **Dual-model**: Claude Sonnet 3.7 as core driver + OpenAI o1 as ensembler (Source: [[Augment SWE-bench Agent GitHub]]).
+- **Majority vote ensembling**: Generate 8 candidate solutions, o1 selects best (Source: [[Augment SWE-bench Agent GitHub]]).
+- **Sequential thinking tool**: Complex problem decomposition (Source: [[Augment SWE-bench Agent GitHub]]).
+- **Parallel execution**: Sharding across machines, 80 agents in parallel (Source: [[Augment SWE-bench Agent GitHub]]).
+### 4. Prompt Enhancement
+- Automatically enriches user queries with relevant codebase context before LLM sees them (Source: [[Augment Code WorkOS ERC 2025]]).
+- Detects existing utilities/libraries to encourage reuse (Source: [[Augment Code WorkOS ERC 2025]]).
+- "Good code is often no new code at all" (Source: [[Augment Code WorkOS ERC 2025]]).
+### 5. Context as API/MCP
+- Context Engine launched as MCP server (Feb 2026) — any agent can use it (Source: [[Augment Code MCP SiliconAngle]]).
+- Community MCP wrapper: auggie-context-mcp on npm (Source: [[Auggie Context MCP Server]]).
+- Less powerful model + Augment context > more powerful model + poor context (Source: [[Augment Code MCP SiliconAngle]]).
+### 6. Real-World Impact (claimed)
+- Onboarding: 18 months → 2 weeks on legacy Java monolith (Source: [[Augment Context Engine Official]]).
+- Refactoring: 6-month estimate → 1 week (Source: [[Augment Context Engine Official]]).
+- Code review: 7 min → 3 min per PR; 60-80% acceptance rate (Source: [[Augment Context Engine Official]], [[Augment Code Codacy AI Giants]]).
+- Test coverage: 45% → 80% in one quarter (Source: [[Augment Context Engine Official]]).
+## Key Entities
+- [[Augment Code]]: Company building the Context Engine and Auggie agent.
+## Key Concepts
+- [[Context Engine (AI Coding)]]: Semantic search engine providing deep codebase understanding.
+- [[Semantic Codebase Indexing]]: Converting code to vector embeddings for similarity search.
+- [[Dual-Model Agent Architecture]]: Fast model for iteration + deliberative model for selection.
+- [[Prompt Enhancement]]: Pre-processing queries with retrieved context.
+- [[Majority Vote Ensembling]]: Generating multiple solutions and selecting best via LLM.
+- [[Contractor vs Employee AI Model]]: Context makes the difference, not intelligence.
+## Implementation Plan: Integration into Our Agentic Coding Harness
+### Module 1: Semantic Codebase Indexer
+**What**: Embedding-based indexing of all project files.
+**How**:
+- Use sentence-transformers (all-MiniLM-L6-v2) for local embeddings.
+- Chunk code via tree-sitter AST (already available via lean-ctx).
+- Store in LanceDB (embedded, zero-config).
+- Real-time sync via watchdog.
+- Build dependency graph via tree-sitter AST analysis.
+**Integration with harness**: New `pi_semantic_index` module. Exposes `semantic_search(query, top_k)` API. Complements lean-ctx's exact search with semantic search.
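AST-aware chunking splits source at syntactic boundaries instead of at fixed character counts. The plan above uses tree-sitter; the sketch below substitutes Python's standard `ast` module so that it runs without extra dependencies, which also means it handles Python source only. `chunk_python_source` is a hypothetical name.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Emit one chunk per top-level statement (function, class, import, ...)."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        # Decorators sit above the `def`/`class` line, so start from the first one.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        chunks.append({
            "kind": type(node).__name__,
            "name": getattr(node, "name", None),
            "start_line": start,
            "end_line": node.end_lineno,
            "text": "\n".join(lines[start - 1:node.end_lineno]),
        })
    return chunks
```

Each chunk keeps its line range, so a retrieval hit can be mapped back to an exact location in the file before embedding or display.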
+### Module 2: Context Retrieval Engine
+**What**: Given a task, retrieve semantically relevant code, patterns, and knowledge.
+**How**:
+- Hybrid search: keyword (BM25) + semantic (cosine similarity).
+- Multi-source: code files + wiki pages + git history + ctx_knowledge.
+- Ranking: relevance × recency × relationship proximity.
+- Context compression: summarize large chunks to fit token budget.
+**Integration with harness**: New `pi_context_retrieval` module. Exposes `retrieve_context(query, max_tokens)` API. Used by prompt enhancer and agent loop.
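The ranking rule above (relevance × recency × relationship proximity) applied to a hybrid keyword-plus-semantic score could look like the sketch below. The blend weight `alpha`, the 30-day half-life, and the `1 / (1 + hops)` proximity term are assumptions chosen for illustration, and both input scores are taken as already normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    keyword_score: float   # normalized BM25 score, 0..1
    semantic_score: float  # cosine similarity, 0..1
    age_days: float        # time since the file last changed
    graph_hops: int        # dependency-graph distance from the task's focus file


def rank(
    candidates: list[Candidate],
    alpha: float = 0.5,
    half_life_days: float = 30.0,
) -> list[Candidate]:
    """Sort candidates by relevance * recency * proximity, best first."""

    def score(c: Candidate) -> float:
        relevance = alpha * c.keyword_score + (1 - alpha) * c.semantic_score
        recency = 0.5 ** (c.age_days / half_life_days)  # halves every half-life
        proximity = 1.0 / (1 + c.graph_hops)            # 1.0 for the focus file itself
        return relevance * recency * proximity

    return sorted(candidates, key=score, reverse=True)
```

A multiplicative score means a stale or distant file needs a much stronger textual match to outrank a fresh, nearby one.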
+### Module 3: Prompt Enhancer
+**What**: Pre-process user queries by injecting retrieved context.
+**How**:
+- Query → Context Retrieval Engine → Build augmented prompt.
+- Include: relevant code, existing patterns, related utilities, wiki knowledge.
+- Detect reuse opportunities (existing libraries/utilities).
+- Compress to fit model's context window.
+**Integration with harness**: New `pi_prompt_enhancer` module. Sits between user input and LLM call. Configurable via harness config.
+### Module 4: MCP Context Server
+**What**: Expose context retrieval as MCP tool for any AI agent.
+**How**:
+- MCP server providing `query_codebase` tool.
+- Read-only — no file modification.
+- Uses Module 2 (Context Retrieval Engine) under the hood.
+- Supports Claude Desktop, Cursor, and any MCP-compatible agent.
+**Integration with harness**: New `pi_mcp_context` module. Runs as separate MCP server process. Our own agent can use it, or external agents can.
+### Module 5: Dual-Model Agent Loop
+**What**: Agent architecture using fast model for iteration + deliberative model for verification.
+**How**:
+- Primary model (Claude Sonnet/Opus) for the main agent loop.
+- Ensembler model (GPT-5/o1) for solution verification and selection.
+- Generate N candidate solutions, ensembler picks best.
+- Configurable: single-model mode for cost-sensitive runs.
+**Integration with harness**: Enhancement to existing agent loop. Model selection strategy becomes configurable. Adds `ensemble` execution mode.
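Candidate selection in the `ensemble` mode can be approximated without a second model by plain majority voting over normalized candidates. Augment's agent has an LLM choose among candidates, so the sketch below is a simpler stand-in; `normalize`, `select_by_majority`, and the whitespace-based normalization are assumptions.

```python
from collections import Counter


def normalize(patch: str) -> str:
    """Collapse whitespace so trivially different patches vote together."""
    return "\n".join(
        " ".join(line.split()) for line in patch.strip().splitlines() if line.strip()
    )


def select_by_majority(candidates: list[str]) -> tuple[str, int]:
    """Return the first candidate of the largest group and that group's vote count."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    votes = Counter(normalize(c) for c in candidates)
    winner_key, count = votes.most_common(1)[0]
    winner = next(c for c in candidates if normalize(c) == winner_key)
    return winner, count
```

A low winning count is itself a useful signal: when no two candidates agree, the run could fall back to the ensembler model or to single-model mode.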
+### Module 6: Multi-Source Context Aggregation
+**What**: Unify all context sources available in our harness.
+**How**:
+- Code: lean-ctx (exact) + semantic index (new).
+- Knowledge: wiki vault (existing) + ctx_knowledge (existing).
+- History: git log integration (new).
+- Patterns: extracted from codebase conventions (new).
+- Session: ctx_session cross-session memory (existing).
+**Integration with harness**: New `pi_context_aggregator` module. Single unified API: `get_full_context(query)` returns merged, ranked, deduplicated context from all sources.
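The merge, rank, and deduplicate behaviour intended for `get_full_context` can be sketched as follows. The `ContextItem` type, the helper name `merge_context`, content-hash deduplication, and keeping the highest-scoring duplicate are all assumptions; the plan above fixes only the API name and its intent.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str   # e.g. "lean-ctx", "semantic-index", "wiki", "git"
    text: str
    score: float  # assumed to be comparable across sources


def merge_context(results: dict[str, list[ContextItem]], limit: int = 10) -> list[ContextItem]:
    """Merge per-source results, drop duplicate text, and return the top `limit` items."""
    best: dict[str, ContextItem] = {}
    for items in results.values():
        for item in items:
            key = hashlib.sha256(item.text.strip().encode("utf-8")).hexdigest()
            # Keep only the highest-scoring copy of identical content.
            if key not in best or item.score > best[key].score:
                best[key] = item
    return sorted(best.values(), key=lambda i: i.score, reverse=True)[:limit]
```

Making scores comparable across exact search, semantic search, and wiki retrieval is the hard part in practice and is left out of this sketch.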
+## Architecture Diagram
+```
+┌─────────────────────────────────────────────────────┐
+│                  User Query                          │
+└──────────────────────┬──────────────────────────────┘
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│              Prompt Enhancer (Module 3)              │
+│  Original query + retrieved context + patterns       │
+└──────────────────────┬──────────────────────────────┘
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│           Context Aggregator (Module 6)              │
+│  Merges: code + wiki + git + patterns + session      │
+└──┬──────────┬──────────┬──────────┬─────────────────┘
+   ▼          ▼          ▼          ▼
+┌────────┐ ┌────────┐ ┌───────┐ ┌────────────┐
+│Semantic│ │lean-ctx│ │ Wiki  │ │Git/Patterns│
+│ Index  │ │(exact) │ │ Vault │ │   (new)    │
+│ (new)  │ │        │ │       │ │            │
+└────────┘ └────────┘ └───────┘ └────────────┘
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│           Agent Loop (Module 5)                      │
+│  Primary model (iterative) + Ensembler (selection)   │
+└──────────────────────┬──────────────────────────────┘
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│           MCP Context Server (Module 4)              │
+│  Exposes context retrieval to external agents        │
+└─────────────────────────────────────────────────────┘
+```
+## Contradictions
+- **Benchmark scores vary by benchmark type**: SWE-bench Verified scores are higher (65.4% for Augment) than SWE-bench Pro (51.80%). This reflects Pro's greater difficulty (multi-file, multi-language, real task diversity). Claims of "80.9% SWE-bench Verified" from Claude Opus 4.5 come from different evaluation setups. (Source: [[Augment SWE-bench Pro Blog]], cross-referenced with leaderboard listings).
+- **Self-reported metrics**: Augment's onboarding, refactoring, and velocity claims come from its own marketing and case studies. No independent verification was found. Confidence: medium for impact claims.
+- **Community MCP vs Official MCP**: The community auggie-context-mcp exists, but Augment has since released an official Context Engine MCP. The community version may be deprecated.
+## Resolved Questions
+### Q1: What embedding model and vector DB does Augment use?
+**Status: Partially resolved by inference.** No public disclosure exists. Augment states "custom embedding and retrieval models trained in pairs" (Source: [[Augment Context Engine Official]]). Based on the CoIR code retrieval benchmark (Source: [[coir-code-retrieval-benchmark]]), the top code embedding models as of 2025-2026 are Voyage-code-3, Salesforce SFR-Embedding-Code-2B_R, BGE-code-v1, Jina-embeddings-v4, and Qwen3-Embedding. Augment likely uses a custom variant fine-tuned on their proprietary code corpus. For the vector DB: given 1M+ files with millisecond sync, candidates include Pinecone serverless, Weaviate, Milvus, or a custom sharded FAISS deployment. **No way to confirm without Augment disclosure.**
+### Q2: What is Augment's chunking strategy and compression algorithm?
+**Status: Resolved by inference from latest research.** While Augment's exact strategy is undisclosed, the state of the art in code chunking as of 2025-2026 is:
+- **AST-aware chunking** (cAST paper, June 2025): Splits code at syntactic boundaries via tree-sitter AST. Improves Recall@5 by 4.3 points on RepoEval retrieval and Pass@1 by 2.67 points on SWE-bench generation (Source: [[cast-code-chunking-paper]]).
+- **Contextualized text**: Prepending file path, scope chain, signatures, and imports to each chunk before embedding bridges the gap between code syntax and embedding models trained on natural language (Source: [[code-chunk-library-supermemory]]).
+- **Chunking matters more than embedding model**: Vectara NAACL 2025 study across 25 chunking configs × 48 embedding models found chunking strategy equals or exceeds embedding model choice in retrieval quality impact (Source: [[vectara-chunking-vs-embedding-naacl2025]]).
+- **Contextual retrieval** (not full late chunking) is the sweet spot: preserves semantic coherence at moderate compute cost (Source: [[Late Chunking vs Early Chunking]]).
+**Augment almost certainly uses AST-aware chunking with contextualized text**, given their stated focus on "understanding relationships between files" and "retrieving only what matters."
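+The two techniques above can be sketched together. The cAST paper works on tree-sitter ASTs; this sketch swaps in Python's standard-library `ast` module to stay dependency-free, so it handles only Python sources. The `chunk_python_source` name and the comment-style header format are assumptions made for illustration, and nothing here reflects Augment's undisclosed pipeline.
```python
import ast
from typing import List


def chunk_python_source(source: str, path: str) -> List[str]:
    """Split a module at function boundaries and prepend context to each chunk."""
    tree = ast.parse(source)
    # Collect top-level import statements once; they are prepended to every chunk.
    imports = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    chunks: List[str] = []

    def visit(node: ast.AST, scope: List[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                header = [
                    f"# file: {path}",
                    f"# scope: {'.'.join(scope + [child.name])}",
                    f"# imports: {'; '.join(imports)}",
                ]
                body = ast.get_source_segment(source, child)
                chunks.append("\n".join(header + [body]))
            elif isinstance(child, ast.ClassDef):
                # Recurse so each method chunk carries its class name.
                visit(child, scope + [child.name])

    visit(tree, [])
    return chunks


SAMPLE = '''import math

class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2

def unit_circle():
    return Circle(1)
'''

chunks = chunk_python_source(SAMPLE, "shapes/circle.py")
```
+Each chunk is a whole function or method, so no chunk is cut mid-statement, and the header gives a general-text embedding model the file path and scope that the raw code lacks. The sketch skips module-level statements and does not enforce a maximum chunk size, both of which a production chunker would need.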
+### Q3: Can local embeddings (all-MiniLM-L6-v2) approach comparable quality?
+**Status: Resolved — viable with right chunking strategy, but needs empirical validation.**
+Benchmark data (Source: [[embedding-models-benchmark-supermemory-2025]]):
+- MiniLM-L6-v2: 78.1% top-5 retrieval accuracy on general text, 14.7ms/1K tokens, 1.2GB GPU
+- BGE-Base-v1.5: 84.7% accuracy, 22.5ms/1K tokens, 2.1GB GPU
+- Nomic Embed v1: 86.2% accuracy, 41.9ms/1K tokens, 4.8GB GPU
+**Key insight**: The 6.6 to 8.1 percentage-point accuracy gap between MiniLM and the larger models above can be partially closed by:
+1. **AST-aware chunking** (higher leverage than model upgrade per Vectara NAACL 2025)
+2. **Contextualized text prepending** (compensates for MiniLM's lack of code-specific training)
+3. **Hybrid search** (BM25 + vector) to catch exact matches that semantic search misses
+**Code-specific gap is wider**: MiniLM-L6-v2 was trained on general text, not code. Code-specific models (Voyage-code-3, BGE-code-v1) have a larger advantage on code retrieval than the general-text benchmark suggests. Qdrant's code search tutorial notes MiniLM requires "preprocessing code to resemble natural language" while Jina embeddings natively support code (Source: Qdrant docs).
+**Recommendation**: Start with MiniLM-L6-v2 + AST-aware chunking + contextualized text. Run a CoIR benchmark evaluation and compare the results against the leaderboard to quantify the gap. If retrieval quality is insufficient, upgrade to BGE-code-v1 (code-native; the 2.1GB GPU figure above was measured for BGE-Base-v1.5, so its footprint needs separate measurement) or all-MiniLM-L12-v2 (same 384-dim but 12-layer, better quality at moderate cost).
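+A rough sketch of the hybrid search idea (BM25 + vector), under stated assumptions: the text names only the two retrievers, so the fusion method here, reciprocal rank fusion, is one common choice rather than a documented decision. The vector ranking is hard-coded as a stand-in for a real embedding search, and the function names and toy documents are hypothetical.
```python
import math
from collections import Counter
from typing import Dict, List


def bm25_ranking(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Rank document ids by Okapi BM25 score over whitespace tokens."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores: Dict[str, float] = {}
    for doc_id, terms in tokens.items():
        counts = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (len(tokens) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(terms) / avg_len))
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse rankings: each list contributes 1 / (k + rank) per document."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


DOCS = {
    "a": "def load_config path read yaml settings",
    "b": "parse_user_token validates jwt signature",
    "c": "helper that reads configuration files from disk",
}
keyword_ranking = bm25_ranking("parse_user_token", DOCS)
# Stand-in for an embedding search that ranked the exact-match document second.
vector_ranking = ["c", "b", "a"]
hybrid_ranking = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```
+In this toy case the identifier `parse_user_token` matches only document `b`, so the keyword side lifts it above `c` in the fused list even though the assumed vector search preferred `c`.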
+## Remaining Open Questions
+- [ ] How does real-time sync work at scale (1M+ files)? "Millisecond-level sync" — implementation detail not available.
+- [ ] How does context compression work without losing critical information? Black box.
+- [ ] What is the actual retrieval pipeline (candidate generation → re-ranking)? Partial information only.
+- [ ] Empirical CoIR benchmark validation needed: MiniLM-L6-v2 + AST chunking vs BGE-code-v1 vs Voyage-code-3 on our actual codebase.
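+For the empirical comparison in the last item, a small metric helper may be a useful starting point. This sketch computes top-k retrieval accuracy, the kind of figure quoted in the Q3 benchmark data; it is not CoIR's own scoring code, and the query ids, file names, and rankings below are made-up placeholders.
```python
from typing import Dict, List, Set


def top_k_accuracy(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1
        for query, ranked in rankings.items()
        if relevant[query] & set(ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical labeled queries and one system's ranked results.
RELEVANT = {"q1": {"auth.py"}, "q2": {"db.py"}, "q3": {"cache.py"}, "q4": {"cli.py"}}
RANKINGS = {
    "q1": ["auth.py", "util.py"],
    "q2": ["util.py", "db.py"],
    "q3": ["util.py", "auth.py"],
    "q4": ["cli.py", "db.py"],
}
acc_at_1 = top_k_accuracy(RANKINGS, RELEVANT, k=1)
acc_at_2 = top_k_accuracy(RANKINGS, RELEVANT, k=2)
```
+Running the same labeled queries through each embedding and chunking configuration and comparing the resulting numbers would give a like-for-like view of the gap on the actual codebase.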
+## Sources
+- [[Augment Context Engine Official]]: Official product page, 2026.
+- [[Augment SWE-bench Agent GitHub]]: Open-source agent, 2025.
+- [[Augment SWE-bench Pro Blog]]: Benchmark results, Feb 2026.
+- [[Augment Code WorkOS ERC 2025]]: Conference demo recap, Oct 2025.
+- [[Augment Code Codacy AI Giants]]: Engineering interview, Mar 2026.
+- [[Augment Code MCP SiliconAngle]]: MCP launch coverage, Feb 2026.
+- [[Auggie Context MCP Server]]: Community MCP wrapper, 2026.