npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516)

package/vault/wiki/sources/Auggie Context MCP Server.md ADDED

@@ -0,0 +1,63 @@
+---
+type: source
+status: ingested
+source_type: github-repository
+author: aj47 (community)
+date_published: 2026
+url: https://github.com/aj47/auggie-context-mcp
+confidence: medium
+key_claims:
+  - "MCP server wrapping Auggie CLI for codebase context retrieval"
+  - "Single tool: query_codebase — intelligent Q&A over repositories"
+  - "Pure TypeScript/Node.js, read-only"
+  - "Architecture: AI Agent → MCP Protocol → auggie-context-mcp → Auggie CLI → Augment Context Engine"
+  - "34 stars, 6 forks"
+  - "Official Augment MCP now available at docs.augmentcode.com/context-services/mcp/overview"
+tags:
+  - mcp
+  - augment-code
+  - context-engine
+  - open-source
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Auggie Context MCP Server (Community)
+## Summary
+A community-built MCP server that exposes Auggie CLI for codebase context retrieval via the Model Context Protocol. Allows AI agents (Claude Desktop, Cursor) to query codebases using Augment's context engine.
+**Note**: Augment Code has since released an official Context Engine MCP.
+## Architecture
+```
+AI Agent (Claude, Cursor)
+    │ MCP Protocol (stdio)
+    ▼
+auggie-context-mcp (TypeScript/Node.js)
+    │ subprocess
+    ▼
+Auggie CLI (--print --quiet)
+    │
+    ▼
+Augment Context Engine
+```
+## Available Tool: query_codebase
+Parameters:
+- `query` (required): Question about the codebase.
+- `workspace_root` (optional): Path to repository root.
+- `model` (optional): Model ID to use.
+- `rules_path` (optional): Path to additional rules file.
+- `timeout_sec` (optional): Query timeout, default 240s.
+- `output_format` (optional): `text` or `json`.
+## Setup
+Runs via `npx -y auggie-context-mcp@latest`, so the server itself needs no separate installation. The Auggie CLI must be installed and authenticated (`auggie login`). The server entry is added to the Claude Desktop or Cursor MCP config JSON.
+## Relevance to Implementation
+Demonstrates the pattern of wrapping a context retrieval engine as an MCP tool. Our own context engine could be exposed similarly — an MCP server that provides `query_codebase` using our semantic index + wiki knowledge.
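
A minimal sketch of the wrapping pattern this note describes, not code from the package or from auggie-context-mcp. It validates a `query_codebase` call against the parameters listed above and hands it to an injected backend. The names `QueryCodebaseArgs`, `parse_args`, `handle_query_codebase`, and the `backend` callable are hypothetical, and the `text` default for `output_format` is an assumption (the note lists the two values without naming a default).

```python
from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass(frozen=True)
class QueryCodebaseArgs:
    """Parameters listed in the note; only `query` is required."""
    query: str
    workspace_root: Optional[str] = None
    model: Optional[str] = None
    rules_path: Optional[str] = None
    timeout_sec: int = 240       # default taken from the note
    output_format: str = "text"  # assumed default, see lead-in


def parse_args(raw: dict) -> QueryCodebaseArgs:
    """Validate a raw tool-call payload before it reaches the backend."""
    unknown = set(raw) - {f.name for f in fields(QueryCodebaseArgs)}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    query = raw.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("`query` is required and must be a non-empty string")
    args = QueryCodebaseArgs(**raw)
    if args.output_format not in ("text", "json"):
        raise ValueError("`output_format` must be 'text' or 'json'")
    if not isinstance(args.timeout_sec, int) or args.timeout_sec <= 0:
        raise ValueError("`timeout_sec` must be a positive integer")
    return args


def handle_query_codebase(raw: dict, backend: Callable[[QueryCodebaseArgs], str]) -> dict:
    """Read-only tool handler: validate, delegate to the backend, wrap the answer."""
    args = parse_args(raw)
    return {"format": args.output_format, "answer": backend(args)}
```

Injecting the backend keeps the handler testable without a real context engine; a semantic index plus wiki lookup could be plugged in behind the same signature.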

package/vault/wiki/sources/Augment Code Codacy AI Giants.md ADDED

@@ -0,0 +1,61 @@
+---
+type: source
+status: ingested
+source_type: podcast-recap
+author: Codacy (interview with Vinay Perneti, VP Engineering at Augment Code)
+date_published: 2026-03-30
+url: https://blog.codacy.com/ai-giants-how-augment-code-solved-the-large-codebase-problem
+confidence: high
+key_claims:
+  - "Custom embedding and retrieval models trained in pairs for maximum quality"
+  - "New engineer ran evals in week one, shipped models in week two"
+  - "Context as API: allowing companies to add their own context sources programmatically"
+  - "60-80% code review acceptance rate"
+  - "Roadmaps are dead — plan in quarters not years"
+  - "Only 3 model choices vs competitors' 20+"
+  - "Contractor vs employee model: contractors have intelligence, employees have context"
+tags:
+  - augment-code
+  - context-engine
+  - engineering-interview
+  - enterprise
+created: 2026-05-02
+updated: 2026-05-02
+---
+# How Augment Code Solved the Large Codebase Problem (Codacy AI Giants)
+## Summary
+Codacy CEO Jaime Jorge interviewed Vinay Perneti, VP of Engineering at Augment Code, about building AI tools that work in the real world of legacy code, technical debt, and million-line repositories.
+## Key Engineering Insights
+### Three-Pronged Context Strategy
+1. **Research-driven embeddings**: Custom embedding and retrieval models trained in pairs for maximum quality.
+2. **Expanding context sources**: Recently added PR history because "adding a feature flag touches like 20 different places."
+3. **Context as an API**: Soon allowing companies to add their own context sources programmatically.
+### The "Contractor vs Employee" Model
+- Contractors borrow intelligence but lack context.
+- Full-time employees have both intelligence and context.
+- Augment provides context (like an FTE), using best-in-class models for intelligence.
+- Result: only 3 model choices offered vs competitors' 20+.
+### Onboarding Revolution
+- Traditional: 4-5 months to become productive on large codebases.
+- With Augment: 6 weeks to ship complex PRs touching wide ranges of codebase.
+- New engineering manager: ran evals week one, shipped models week two.
+### Code Review Evolution
+- 60-80% acceptance rate on AI suggestions (higher than many human reviews).
+- AI handles technical details, humans focus on direction and architecture.
+- "Automation bias is cultural, not technical — the person pushing the PR owns the code."
+### Pricing Evolution
+- Moved from per-message pricing to usage-based pricing.
+- Prompt enhancer made each message do exponentially more work, breaking old pricing model.
+- Transparent, compute-cost-aligned pricing.
+### Planning Philosophy
+- "Roadmaps are dead" — threw out entire 2025 roadmap on January 25th.
+- Now plan in quarters, not years.

package/vault/wiki/sources/Augment Code MCP SiliconAngle.md ADDED

@@ -0,0 +1,49 @@
+---
+type: source
+status: ingested
+source_type: news-article
+author: Kyt Dotson, SiliconAngle
+date_published: 2026-02-06
+url: https://siliconangle.com/2026/02/06/augment-code-makes-semantic-coding-capability-available-ai-agent/
+confidence: high
+key_claims:
+  - "30-80% quality improvement when Augment's context engine is used as context provider for other agents"
+  - "Cursor + Claude Opus 4.5: 71% improvement"
+  - "Claude Code + Opus 4.5: 80% improvement"
+  - "Cursor + Composer-1: 30% improvement"
+  - "Less powerful model with good context outperforms larger model with poor context"
+  - "Context Engine MCP launched February 2026"
+tags:
+  - augment-code
+  - context-engine
+  - mcp
+  - siliconangle
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Augment Code Makes Context Engine Available for Any AI Agent (SiliconAngle)
+## Summary
+Feb 2026: Augment Code launched MCP support for their Context Engine, enabling any AI coding agent or platform to use their semantic codebase understanding. The Context Engine can be plugged into Claude Code, Cursor, Codex, or any MCP-compatible agent.
+## Performance Gains When Used as Context Provider
+When Augment's Context Engine was used to provide context to other agents:
+| Agent + Model | Improvement |
+|--------------|-------------|
+| Claude Code + Opus 4.5 | 80% |
+| Cursor + Claude Opus 4.5 | 71% |
+| Cursor + Composer-1 | 30% |
+## Key Argument
+Less powerful model + high-quality context > more powerful model + poor context.
+The Context Engine reduces search failures through deeper semantic understanding: it is both accurate and selective, supplying what the task needs while leaving out irrelevant context. This lowers cost and speeds up operations.
+## MCP Integration
+- Model Context Protocol allows Augment's agents to connect to IDEs, CLIs, LLMs, and other agents.
+- Any MCP-compatible platform can integrate as of February 2026.
+- The augmentation occurs before the LLM sees the prompt, improving context quality at the retrieval layer.

package/vault/wiki/sources/Augment Code WorkOS ERC 2025.md ADDED

@@ -0,0 +1,55 @@
+---
+type: source
+status: ingested
+source_type: conference-recap
+author: Zack Proser, WorkOS
+date_published: 2025-10-29
+url: https://workos.com/blog/augment-code-context-is-the-new-compiler
+confidence: high
+key_claims:
+  - "Augment's context engine uses vector search to understand how code actually behaves"
+  - "Maps relationships and patterns across a project semantically, not by tokens or grep"
+  - "Automatically enriches prompts with relevant context from existing code"
+  - "Detected reusable Git library within codebase instead of shelling out to git"
+  - "Good code is often no new code at all"
+  - "Context engines act as institutional memory for large teams"
+tags:
+  - augment-code
+  - context-engine
+  - semantic-search
+  - erc-2025
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Augment Code: Context Is the New Compiler (WorkOS ERC 2025)
+## Summary
+At the 2025 Enterprise Ready Conference, Chris Kelly from Augment Code demonstrated their CLI and argued that AI coding tools have been missing context — the hard-won understanding that separates senior engineering from code generation.
+## Key Arguments
+### Why AI Coding Feels Junior
+- Seasoned engineers recall patterns, reference internal libraries, and respect constraints from years of debugging.
+- AI assistants are good at syntax, weak at understanding intent.
+- Asking AI to "just write it" asks it to act without grounding in the codebase reality.
+### Augment's Answer: Context Engines
+- Deeply indexes codebase semantically, not by tokens or grep.
+- Uses vector search to understand how code actually behaves.
+- When a feature is requested, the system automatically enriches the prompt with relevant context.
+- Pulls in established patterns, libraries, and internal utilities — choosing reuse over reinvention.
+### Live Demo: Git Branch Status Bar
+- Simple task: customize status bar to include current Git branch.
+- Other assistant: shelled out to git in a new process.
+- Augment: detected a reusable Git library already in the company codebase and built on top of it.
+- "Good code is often no new code at all."
+### Beyond Code Generation
+- Vectorized index acts as institutional memory for large teams.
+- Surfaces prior art and avoids duplication.
+- AI that learns organizational idioms, team patterns, and project scars.
+## Relevance to Implementation
+The "prompt enhancer" concept is directly implementable: pre-process user queries by querying a semantic index of the codebase, then inject retrieved context into the prompt before sending to the LLM.
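
A minimal sketch of the prompt-enhancer idea from the note above, not Augment's implementation. A bag-of-words cosine similarity stands in for a learned embedding model (an explicit simplification), and `embed`, `cosine`, `enhance_prompt`, and the sample index are hypothetical names and data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def enhance_prompt(query: str, index: dict[str, str], top_k: int = 2) -> str:
    """Retrieve the top_k most similar snippets and inject them ahead of the task."""
    q = embed(query)
    ranked = sorted(index.items(), key=lambda kv: cosine(q, embed(kv[1])), reverse=True)
    context = "\n".join(f"# {path}\n{snippet}" for path, snippet in ranked[:top_k])
    return f"Relevant code from this repository:\n{context}\n\nTask: {query}"


# Hypothetical index: path -> snippet text.
index = {
    "lib/git.py": "def current_branch(repo): return the current git branch name",
    "ui/status_bar.py": "def render_status_bar(items): draw status bar items",
    "billing/invoice.py": "def total(invoice): sum line items",
}
prompt = enhance_prompt("show the current git branch in the status bar", index)
```

With this sample index the existing Git helper and the status-bar module are injected while the unrelated billing code is left out, which mirrors the reuse-over-reinvention behavior described in the live demo.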

package/vault/wiki/sources/Augment Context Engine Official.md ADDED

@@ -0,0 +1,71 @@
+---
+type: source
+status: ingested
+source_type: product-page
+author: Augment Code
+date_published: 2026
+url: https://www.augmentcode.com/context-engine
+confidence: high
+key_claims:
+  - "Context Engine semantically indexes and maps code, understanding relationships between hundreds of thousands of files"
+  - "Not grep or keyword matching — a full search engine for code"
+  - "Indexes 1M+ files with real-time knowledge graph"
+  - "Retrieves only what matters, compresses context, ranks by relevance"
+  - "60-80% code review acceptance rate"
+  - "Onboarding reduced from 18 months to 2 weeks on legacy Java monolith"
+  - "Refactoring: 6-month estimate completed in 1 week"
+  - "Test coverage increased from 45% to 80% in one quarter"
+tags:
+  - context-engine
+  - augment-code
+  - semantic-search
+  - codebase-indexing
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Augment Context Engine Official Page
+## Summary
+Augment Code's Context Engine is a semantic search engine for codebases that maintains a live understanding of the entire stack — across repos, services, and history. It semantically indexes code, understanding relationships between files rather than relying on grep or keyword matching.
+## Core Capabilities
+### Semantic Indexing
+- Indexes 1M+ files with a real-time knowledge graph.
+- Understands what's active vs deprecated.
+- Maps how services connect and depend on each other.
+- Tracks what developers are working on in their IDE.
+### Intelligent Context Curation
+- Does not dump the entire codebase into the prompt.
+- Retrieves only what matters for the request.
+- Compresses context without losing critical information.
+- Ranks and prioritizes based on relevance.
+- Respects access permissions with proof of possession.
+### Beyond Code
+- **Commit history**: Why changes were made, not just what changed.
+- **Codebase patterns**: How the team actually builds, not generic best practices.
+- **External sources**: Docs, tickets, design decisions via integrations and MCP.
+- **Tribal knowledge**: Edge cases and team conventions discovered through deep analysis.
+## Benchmarked Results
+### Blind Study on Elasticsearch Repository (3.6M Java LOC, 2,187 contributors)
+Comparing 500 agent-generated PRs to human-written code:
+- **Augment Code**: +12.8 overall (outperformed humans)
+- **Cursor**: -11.8 (underperformed)
+- **Claude Code**: -13.9 (underperformed)
+### Sub-scores (Augment vs Cursor vs Claude Code):
+- Correctness: +14.8 vs -9.3 vs -11.8
+- Completeness: +18.2 vs -12.0 vs -12.4
+- Code Reuse: -4.4 vs -9.3 vs -15.8
+- Best Practice: +12.4 vs -10.5 vs -16.4
+## Team Impact Claims
+- 18-month onboarding → 2 weeks on legacy Java monolith.
+- 6-month refactoring → 1 week with full test coverage.
+- PR review time: 7 min → 3 min.
+- Test coverage: 45% → 80% in one quarter.

package/vault/wiki/sources/Augment SWE-bench Agent GitHub.md ADDED

@@ -0,0 +1,74 @@
+---
+type: source
+status: ingested
+source_type: github-repository
+author: Augment Code
+date_published: 2025
+url: https://github.com/augmentcode/augment-swebench-agent
+confidence: high
+key_claims:
+  - "65.4% success rate on first SWE-bench Verified submission"
+  - "#1 open-source SWE-bench Verified implementation"
+  - "Uses Claude Sonnet 3.7 as core driver + OpenAI o1 as ensembler"
+  - "Forked agent system architecture from Anthropic's SWE-bench blog post"
+  - "Majority vote ensembler for selecting best solution from candidates"
+  - "Supports parallel execution via sharding across machines"
+  - "870 stars, 154 forks"
+tags:
+  - swe-bench
+  - augment-code
+  - coding-agent
+  - open-source
+  - claude
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Augment SWE-bench Verified Agent (GitHub)
+## Summary
+Open-source implementation of Augment Code's SWE-bench agent. Achieved 65.4% on SWE-bench Verified, the #1 open-source implementation. Combines Claude Sonnet 3.7 for core reasoning with OpenAI o1 for solution ensembling.
+## Architecture
+### High-Level Component Structure
+| Layer | Components | Purpose |
+|-------|-----------|---------|
+| Entry Points | cli.py, run_agent_on_swebench_problem.py | User interfaces |
+| Agent Core | tools/agent.py, utils/common.py | Orchestration and dialog management |
+| LLM Integration | utils/llm_client.py | Abstracted LLM communication |
+| Tool Ecosystem | tools/*.py | Executable agent capabilities |
+| Infrastructure | utils/docker_utils.py, workspace_manager.py | Environment management |
+| Ensembling | majority_vote_ensembler.py | Solution selection |
+### Key Tools
+- **BashTool**: Command execution in workspace.
+- **StrReplaceTool**: File content manipulation and editing.
+- **SequentialThinkingTool**: Complex reasoning and problem decomposition.
+- **CompleteTool**: Task completion and result finalization.
+### Technology Stack
+- Python 3.x, Docker, uv (package management)
+- Anthropic Claude Sonnet 3.7 (core agent)
+- OpenAI o1-2024-12-17 (ensembling)
+- SWE-bench evaluation harness
+## Execution Modes
+### Interactive Mode (cli.py)
+- Personal coding assistant.
+- Single agent instance.
+- Session-based interaction.
+### SWE-bench Mode (run_agent_on_swebench_problem.py)
+- Automated benchmark evaluation.
+- Multiple parallel processes (8 per machine recommended).
+- Sharding across machines (80 parallel agents in their setup).
+- Majority vote ensembling post-generation.
+## Majority Vote Ensembler
+- Takes multiple candidate solutions per problem.
+- Presents all candidates to OpenAI o1.
+- o1 analyzes and selects the most common/best solution.
+- Parallel processing with configurable worker threads.
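
A simplified stand-in for the ensembling step described above. The repository presents the candidates to OpenAI o1 and lets the model choose; this sketch swaps that for a plain deterministic majority vote over whitespace-normalized patches, so it shows the selection idea only. `normalize` and `majority_vote` are hypothetical names.

```python
from collections import Counter


def normalize(patch: str) -> str:
    """Drop trailing whitespace so cosmetically different patches compare equal."""
    return "\n".join(line.rstrip() for line in patch.strip().splitlines())


def majority_vote(candidates: list[str]) -> str:
    """Return the candidate whose normalized form occurs most often."""
    if not candidates:
        raise ValueError("need at least one candidate")
    counts = Counter(normalize(c) for c in candidates)
    # max() returns the first maximal item, so ties go to the earliest candidate.
    return max(candidates, key=lambda c: counts[normalize(c)])


winner = majority_vote(["fix A\n", "fix B", "fix A  "])
```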

package/vault/wiki/sources/Augment SWE-bench Pro Blog.md ADDED

@@ -0,0 +1,58 @@
+---
+type: source
+status: ingested
+source_type: blog-post
+author: Arash (AJ) Joobandi, Augment Code
+date_published: 2026-02-04
+url: https://www.augmentcode.com/blog/auggie-tops-swe-bench-pro
+confidence: high
+key_claims:
+  - "Auggie scored 51.80% on SWE-bench Pro, highest of any agent tested"
+  - "Same model (Claude Opus 4.5), different results: Auggie 51.80%, Cursor 50.21%, Claude Code 49.75%"
+  - "Auggie beat SWE-Agent baseline (45.89%) by nearly 6 points with same model"
+  - "Context retrieval quality is the difference, not model intelligence"
+  - "SWE-bench Pro problems require multi-file understanding (avg 4.1 files, 107 lines changed)"
+tags:
+  - swe-bench-pro
+  - augment-code
+  - benchmark
+  - context-engine
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Auggie Tops SWE-Bench Pro (Official Blog)
+## Summary
+Augment Code ran their agent (Auggie) on Scale AI's SWE-bench Pro benchmark and scored 51.80%, the highest among all tested agents. Crucially, Auggie, Cursor, and Claude Code all used the same underlying model (Claude Opus 4.5), yet Auggie solved 15-17 more problems out of 731.
+## Benchmark Results
+| Agent | Model | Score |
+|-------|-------|-------|
+| Auggie | Claude Opus 4.5 | 51.80% |
+| Cursor | Claude Opus 4.5 | 50.21% |
+| Claude Code | Claude Opus 4.5 | 49.75% |
+| Codex | GPT-5.2-codex | 46.47% |
+| SWE-Agent | Claude Opus 4.5 (Scale baseline) | 45.89% |
+## Key Insight: Context > Model Intelligence
+The gap between agents using the same model comes from **context retrieval quality**. SWE-bench Pro problems require understanding code that isn't in the immediate file. Finding the right code in a large repository is a retrieval problem.
+### Example: BCrypt Handling in Ansible
+- Relevant code spans several layers (high-level filters → low-level utilities).
+- Grep finds top-level APIs easily but misses the actual fix location.
+- Augment's Context Engine found the low-level utility because it understands semantic relationships, not just keyword matching.
+## What Is SWE-bench Pro?
+Released by Scale AI in late 2025 to address SWE-bench Verified saturation:
+- Multi-file edits (avg 4.1 files, 107 lines changed).
+- Multiple languages (Python, Go, TypeScript, JavaScript).
+- Real task diversity (bug fixes, features, security, performance, UI).
+- When launched, best models dropped from 70%+ to ~23%.
+## Context Engine as MCP
+Augment launched their Context Engine as an MCP server, making it available for any AI agent to use for codebase context retrieval.

package/vault/wiki/sources/Source: AgentBus Jinja2 Prompt Pipelines.md ADDED

@@ -0,0 +1,75 @@
+---
+type: source
+status: ingested
+source_type: engineering-blog
+title: "How to Build Prompt Pipelines with Jinja2 Templating"
+author: "Qasim"
+date_published: 2026-02-15
+url: "https://agentbus.sh/posts/how-to-build-prompt-pipelines-with-jinja2-templating/"
+confidence: high
+tags:
+  - prompt-templating
+  - jinja2
+  - prompt-pipelines
+  - multi-model
+related:
+  - "[[Research: Prompt Renderer for Multi-Model Agent Harness]]"
+key_claims:
+  - "Jinja2 provides variables, conditionals, loops, and template inheritance — the same tools that power web frameworks applied to prompt engineering"
+  - "Template inheritance (base → child) enables layered prompt systems where adding a new step means creating one .j2 file"
+  - "Pipeline runner: define pipelines as data, not code — `inputs_from` mapping connects step outputs to next step's template vars"
+  - "Common errors: undefined variables (use defaults), template path issues (use absolute paths), whitespace (use trim_blocks/lstrip_blocks)"
+  - "One template handles both zero-shot and few-shot — you control behavior through data, not separate prompt strings"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Prompts as Jinja2 Pipelines
+## Core Pattern
+```python
+from jinja2 import Environment, FileSystemLoader
+env = Environment(loader=FileSystemLoader("templates"))
+template = env.get_template("classify.j2")
+prompt = template.render(role="classifier", categories=["pos","neg"], text="...")
+```
+## Template Inheritance for Prompt Chains
+Base template (`base_prompt.j2`):
+```jinja2
+You are a {{ role | default("helpful assistant") }}.
+{% block task %}{% endblock %}
+{% block format_instructions %}Respond in plain text.{% endblock %}
+```
+Child template (`summarize.j2`):
+```jinja2
+{% extends "base_prompt.j2" %}
+{% block task %}
+Summarize in {{ num_sentences }} sentences.
+Document: {{ document }}
+{% endblock %}
+```
+## Reusable Pipeline Runner
+```python
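+# run_prompt(template_name, **kwargs) is assumed to be defined elsewhere:
+# it renders the named template and returns the model's reply as a string.
+def run_pipeline(steps: list[dict]) -> dict[str, str]:
+    results: dict[str, str] = {}
+    for step in steps:
+        # Copy so the step definition is not mutated between runs.
+        kwargs = dict(step.get("kwargs", {}))
+        # Wire earlier step outputs into this step's template variables.
+        for key, ref in step.get("inputs_from", {}).items():
+            kwargs[key] = results[ref]
+        results[step["name"]] = run_prompt(step["template"], **kwargs)
+    return results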
+```
+## Key Takeaways for ultimate-pi
+1. **FileSystemLoader + .j2 files** = version-controlled, reviewable prompt templates separate from app logic
+2. **Template inheritance** = base prompt defined once, per-model variants extend/override blocks
+3. **Conditionals + loops** = dynamic few-shot examples, variable-length context injection
+4. **Pipeline as data** = declarative step definitions with explicit data flow
+5. **Common pitfalls**: undefined vars → use `| default()`; whitespace → `trim_blocks=True`; special chars → never concatenate user input into template strings
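
A small usage sketch for the "pipeline as data" takeaway, showing how `inputs_from` routes one step's output into the next step's variables. To stay dependency-free it uses `string.Template` and an in-memory `TEMPLATES` table in place of Jinja2 `.j2` files, and `run_prompt` is a stub that returns the rendered prompt instead of calling a model. All of these are assumptions made for illustration.

```python
from string import Template

# Hypothetical in-memory templates; the note keeps these as .j2 files on disk.
TEMPLATES = {
    "summarize": Template("Summarize in $num_sentences sentences: $document"),
    "classify": Template("Classify this summary as pos or neg: $text"),
}


def run_prompt(template: str, **kwargs: str) -> str:
    """Stub: render the prompt and return it in place of a model reply."""
    return TEMPLATES[template].substitute(**kwargs)


def run_pipeline(steps: list[dict]) -> dict[str, str]:
    """Run steps in order, feeding named earlier results into later templates."""
    results: dict[str, str] = {}
    for step in steps:
        kwargs = dict(step.get("kwargs", {}))
        for key, ref in step.get("inputs_from", {}).items():
            kwargs[key] = results[ref]
        results[step["name"]] = run_prompt(step["template"], **kwargs)
    return results


# The pipeline itself is plain data: adding a step means adding a dict.
steps = [
    {"name": "summary", "template": "summarize",
     "kwargs": {"num_sentences": "2", "document": "Great product."}},
    {"name": "label", "template": "classify", "inputs_from": {"text": "summary"}},
]
results = run_pipeline(steps)
```

Because the second step names `summary` in `inputs_from`, its `text` variable is filled from the first step's result without any step-specific code.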

package/vault/wiki/sources/Source: Arxiv — Don't Break the Cache.md ADDED

@@ -0,0 +1,85 @@
+---
+type: source
+status: ingested
+source_type: academic-paper
+title: "Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks"
+author: "Elias Lumer, Faheem Nizar, Akshaya Jangiti, et al. (PricewaterhouseCoopers U.S.)"
+date_published: 2026-01-31
+url: "https://arxiv.org/html/2601.06007v2"
+confidence: high
+tags:
+  - prompt-caching
+  - agentic-workloads
+  - multi-provider-evaluation
+  - cache-strategies
+related:
+  - "[[Research: Prompt Renderer for Multi-Model Agent Harness]]"
+  - "[[Source: TianPan Prompt Caching Architecture]]"
+key_claims:
+  - "First comprehensive evaluation of prompt caching for agentic workloads across OpenAI, Anthropic, and Google"
+  - "Prompt caching reduces API costs by 41-80% and improves TTFT by 13-31% across providers"
+  - "Strategic cache boundary control (system prompt only) outperforms naive full-context caching"
+  - "Full context caching can paradoxically increase latency — dynamic tool calls trigger cache writes for non-reusable content"
+  - "System prompt only caching provides the most consistent benefits across both cost and latency dimensions"
+  - "Cost savings scale linearly with prompt size (54-89% at 50K tokens), stable across tool counts"
+  - "Evaluated on DeepResearch Bench: 500 agent sessions, 10K-token system prompts, 4 flagship models"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Academic Validation of Prompt Caching Strategies
+## Experimental Design
+- **3 providers**: OpenAI (GPT-5.2, GPT-4o), Anthropic (Claude Sonnet 4.5), Google (Gemini 2.5 Pro)
+- **4 cache strategies**:
+  1. **No Cache**: UUID prepended to break all prefix matching
+  2. **Full Context Caching**: No UUIDs, automatic caching
+  3. **System Prompt Only**: UUID appended after system prompt — only static system prompt cached
+  4. **Exclude Tool Results**: UUIDs after system prompt AND after each tool result
+- **Benchmark**: DeepResearch Bench — 100 PhD-level research tasks, agents autonomously execute web search tool calls
+## Key Results
+| Model | Best Mode | Cost ↓ | TTFT ↓ |
+|-------|-----------|--------|--------|
+| GPT-5.2 | Excl. Tool Results | 79.6% | 13.0% |
+| Claude Sonnet 4.5 | System Prompt | 78.5% | 22.9% |
+| Gemini 2.5 Pro | System Prompt | 41.4% | 6.1% |
+| GPT-4o | System Prompt | 45.9% | 30.9% |
+## Cache Strategy Comparison
+1. **System prompt only caching** = most consistent benefits across cost AND latency
+2. **Full context caching** = similar cost savings BUT paradoxically can increase latency (GPT-4o: -8.8% TTFT regression)
+3. **Exclude tool results** = best for models with high tool-call overhead (GPT-5.2)
+4. **Cost savings driven by system prompt size**, not tool count — focus on maximizing cacheable prefix
+## Strategic Cache Boundary Control
+The key insight: **providers abstract the caching mechanism, automatically triggering cache creation when token thresholds are exceeded. Without explicit boundary control, this can cache dynamic, session-specific content.**
+Implementation: use UUIDs to explicitly break the cache at boundary points:
+- Prepending UUID → breaks ALL caching (baseline)
+- UUID after system prompt → ONLY system prompt cached
+- UUID after system prompt + after each tool result → excludes tool results from cache
+## Minimum Token Thresholds
+| Provider | Model | Min Tokens |
+|----------|-------|-----------|
+| OpenAI | GPT-4o, GPT-5.2 | 1,024 |
+| Anthropic | Claude Sonnet 4.5 | 1,024 |
+| Google | Gemini 2.5 Pro | 4,096 |
+## Ablation Findings
+- **Cost savings scale linearly with prompt size**: 10-45% at 500 tokens → 54-89% at 50K tokens
+- **Tool count has minimal impact**: cost savings stable across 3-50 tool calls
+- **Below threshold**: TTFT regressions of 10-18% at 500 tokens (caching cannot activate)
+## Relevance to ultimate-pi Prompt Renderer
+1. **Compile-time caching**: Pre-rendered prompts shipped in npm → no runtime cache warmup, no threshold concerns
+2. **Static-first structure**: The renderer must place all static/model-agnostic content FIRST, variables/dynamic content LAST
+3. **System prompt caching is sufficient**: 75-81% of savings come from caching the system prompt alone — focus rendering optimization there
+4. **Per-model threshold awareness**: Minimum 1,024 tokens for caching to engage (OpenAI/Anthropic), 4,096 for Google — renderer should ensure compiled prompts exceed these
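
A minimal sketch of the four cache strategies and the threshold check summarized in this note. The prompt is modeled as an ordered list of text segments, and a fresh UUID marks where prefix matching should stop; how the paper actually embeds the UUID in provider requests is not stated here, so the segment layout, the marker text, and the names `build_prompt` and `cacheable` are assumptions. The threshold values come from the table above.

```python
import uuid

# Minimum cacheable prefix sizes, copied from the note's threshold table.
MIN_CACHE_TOKENS = {"openai": 1024, "anthropic": 1024, "google": 4096}


def build_prompt(system_prompt: str, tool_results: list[str], strategy: str) -> list[str]:
    """Return prompt segments in order, with UUID markers placed per strategy."""

    def marker() -> str:
        # A fresh UUID never repeats, so nothing after it can match a cached prefix.
        return f"[cache-break {uuid.uuid4()}]"

    if strategy == "no_cache":
        return [marker(), system_prompt, *tool_results]
    if strategy == "full_context":
        return [system_prompt, *tool_results]
    if strategy == "system_prompt_only":
        return [system_prompt, marker(), *tool_results]
    if strategy == "exclude_tool_results":
        segments = [system_prompt, marker()]
        for result in tool_results:
            segments += [result, marker()]
        return segments
    raise ValueError(f"unknown strategy: {strategy}")


def cacheable(system_prompt_tokens: int, provider: str) -> bool:
    """True when the static prefix is long enough for the provider to cache it."""
    return system_prompt_tokens >= MIN_CACHE_TOKENS[provider]
```

Keeping the static system prompt as the first segment in every strategy except `no_cache` is the "static-first" rule from the takeaways: the shared prefix stays byte-identical across sessions while everything after the marker varies.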

package/vault/wiki/sources/Source: Augment - Harness Engineering for AI Coding Agents.md ADDED

@@ -0,0 +1,58 @@
+---
+type: source
+status: ingested
+source_type: engineering-blog
+author: Molisha Shah (Augment Code)
+date_published: 2026-04-16
+date_accessed: 2026-05-01
+url: https://www.augmentcode.com/guides/harness-engineering-ai-coding-agents
+confidence: medium
+key_claims:
+  - Harness engineering shifts engineers from writing code to designing systems that govern how agents write code
+  - Three harness layers: Constraints (feedforward), Feedback Loops (corrective), Quality Gates (enforcement)
+  - PEV Loop (Plan-Execute-Verify) as structured harness pattern with gates at every phase transition
+  - Rules files are one layer of harness, not complete solution — must combine with deterministic enforcement
+  - Key metrics: Task Resolution Rate, Code Churn Rate, Verification Tax, Defect Escape Rate
+  - DORA report: higher AI adoption correlates with increased throughput AND instability
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# Augment: Harness Engineering for AI Coding Agents
+## What It Is
+Practical guide to harness engineering from Augment Code, published April 16, 2026. Focuses on constraint design, PEV loops, and measurement.
+## Origin of "Harness Engineering"
+| Term | Attribution | Date |
+|------|-------------|------|
+| Harness engineering | Mitchell Hashimoto (per secondary reports) | Early Feb 2026 |
+| Formal definition | OpenAI / Ryan Lopopolo | Feb 11, 2026 |
+| Agent = Model + Harness | LangChain | Feb-Mar 2026 |
+| Context engineering | Andrej Karpathy | Dec 19, 2025 |
+| Agentic engineering | Andrej Karpathy | Feb 2026 |
+## Three Harness Layers
+1. **Constraint Harnesses (Feedforward)**: Reduce solution space before generation. Rules files, architectural lint configs, type systems. OpenAI enforces "taste invariants" as hard CI failures.
+2. **Feedback Loops (Corrective)**: Structured error signals back to agent. Critical detail: the lint message *becomes* a prompt, so it should say what to do. "Use `logger.info({event: 'name'})` instead of `console.log`" gives the agent a fix; "violation detected" does not. Disallow inline disable comments so agents cannot suppress violations.
+3. **Quality Gates (Enforcement)**: Prevent non-compliant code from merging. Standard CI insufficient — agents introduce problems conventional checks miss.
+## PEV Loop (Plan-Execute-Verify)
+Architectural pattern with gates at every transition:
+- **Pre-execution gates**: Is this a known tool? Valid arguments? Requires user approval? Inside workspace?
+- **Plan alignment gate**: Did agent use existing auth middleware or create new one? Architectural questions invisible to test runners.
+- **Verification timing**: Pre-execution + runtime + post-execution + plan alignment
+## Measurement
+Key metrics: Task Resolution Rate, Code Churn Rate, Verification Tax (time-to-approval minus time-to-first-commit), Harness Constraint Effect (success rate constrained vs unconstrained), Defect Escape Rate.
+## Relevance to Ultimate-PI
+PEV Loop maps directly to our L2 (Plan) → L3 (Execute/Ground) → L4 (Verify). Our L2.5 Drift Monitor + Phase 16 Lint Gate are quality gates. Gap: we lack pre-execution gates (known tool check, argument validation) — this is P-F1 in integration plan. The "lint message as prompt" concept validates our approach to making drift detection messages actionable rather than just flagging.