npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.3 - Mend

ultimate-pi 0.1.2 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516)

package/vault/wiki/sources/Source: Blake Crosley Agent Architecture Guide.md ADDED

@@ -0,0 +1,100 @@
+---
+type: source
+source_type: engineering-blog
+title: "Blake Crosley — Agent Architecture: Building AI-Powered Development Harnesses"
+author: "Blake Crosley"
+date_published: 2026-04-29
+url: "https://blakecrosley.com/guides/agent-architecture"
+confidence: high
+key_claims:
+  - "The harness is a programmable runtime with an LLM kernel — not a chat box with file access"
+  - "Hooks guarantee execution (exit code 2 blocks). Prompts achieve ~80% compliance."
+  - "Skills encode domain expertise that auto-activates via LLM reasoning"
+  - "Subagents prevent context bloat — isolated context windows for exploration"
+  - "Memory lives in the filesystem — files persist across context boundaries"
+  - "Multi-agent deliberation catches blind spots that single agents cannot"
+  - "The harness pattern is the system — CLAUDE.md, hooks, skills, agents, memory compose into a deterministic layer"
+  - "Code is cheap now; verification is the expensive part"
+  - "Single-agent systems have a structural blind spot: they cannot challenge their own assumptions"
+  - "Fresh-context iteration (Ralph Loop) beats long conversations for quality beyond 90 minutes"
+  - "Production harness reduced false completion rate from 35% to 4% and blocked 7 credential leaks"
+tags: [source, agent-architecture, harness, hooks, skills, multi-agent]
+related:
+  - "[[harness-engineering-first-principles]]"
+  - "[[agent-skills-pattern]]"
+  - "[[lifecycle-hooks]]"
+  - "[[consensus-debate]]"
+  - "[[skill-first-architecture]]"
+---
+# Blake Crosley — Agent Architecture Guide
+## Summary
+Comprehensive 12,783-word guide on building production AI agent harnesses, published April 29, 2026 and current through Google Cloud Next 2026 (April 22-24). It covers the complete stack (hooks, skills, subagents, multi-agent orchestration, memory, and production patterns) and draws on the author's own implementation: 84 hooks, 48 skills, 19 agents, and ~15,000 lines of orchestration code.
+## Key Contributions
+### The Harness Pattern
+The harness is not a framework — it's a pattern: a composable set of files, scripts, and conventions that wrap an AI coding agent in deterministic infrastructure. Four layers:
+1. **Instruction Layer**: CLAUDE.md + rules directories — what the agent knows
+2. **Extension Layer**: Skills (domain expertise), Hooks (deterministic gates), Memory (persistent state), Agents (specialized subagents)
+3. **Orchestration Layer**: Multi-agent patterns, spawn budgets, consensus validation
+4. **Core Layer**: Main conversation context (LLM)
+> "Most users work entirely in the Core Layer, watching context bloat and costs climb. Power users configure the Instruction and Extension layers."
+### Hooks vs Skills vs Subagents Decision Framework
+| Problem | Use | Why |
+|---------|-----|-----|
+| Format code after every edit | PostToolUse hook | Must happen every time, deterministically |
+| Block dangerous bash commands | PreToolUse hook | Must block before execution, exit code 2 |
+| Apply security review patterns | Skill | Domain expertise that auto-activates |
+| Explore codebase without polluting context | Explore subagent | Isolated context, returns summary only |
+| Run experimental refactoring safely | Worktree-isolated subagent | Changes can be discarded |
+| Review code from multiple perspectives | Parallel subagents | Independent evaluation |
+| Decide on irreversible architecture | Multi-agent deliberation | Confidence trigger + consensus |
+### The Distinction That Matters
+> "Hooks guarantee execution; prompts do not. Use hooks for linting, formatting, security checks, and anything that must run every time regardless of model behavior. Exit code 2 blocks actions. Exit code 1 only warns."
+> "Skills are model-invoked extensions. Claude discovers and applies them automatically based on context, without you explicitly calling them. The moment you catch yourself re-explaining the same context across sessions is the moment you should build a skill."
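Because a hook is just an executable whose exit code the harness inspects, the "block dangerous bash commands" row can be sketched as a pure decision function. This is a minimal sketch, not the guide's code: it assumes a payload with `tool_name` and `tool_input.command` (modelled on Claude Code's PreToolUse input), and `evaluatePreToolUse` plus its deny-list are hypothetical.

```typescript
// Minimal PreToolUse-style gate (a sketch, not the guide's implementation).
// Assumed payload shape, modelled on Claude Code's hook input:
// { "tool_name": "Bash", "tool_input": { "command": "..." } }
interface HookPayload {
  tool_name: string;
  tool_input: { command?: string };
}

interface HookDecision {
  exitCode: 0 | 2; // 0 = allow; 2 = block (the harness surfaces stderr to the agent)
  stderr: string;
}

// Hypothetical deny-list; a production gate would be broader and tested.
const BLOCKED_COMMANDS: RegExp[] = [
  /\bgit\s+push\b.*\s(--force|-f)(\s|$)/, // force-push (but not --force-with-lease)
  /\brm\s+-rf\s+\/(\s|$)/, // rm -rf on the filesystem root
];

function evaluatePreToolUse(payload: HookPayload): HookDecision {
  // Only shell commands are inspected; every other tool passes through.
  if (payload.tool_name !== "Bash") return { exitCode: 0, stderr: "" };
  const command = payload.tool_input.command ?? "";
  for (const pattern of BLOCKED_COMMANDS) {
    if (pattern.test(command)) {
      return { exitCode: 2, stderr: `Blocked by harness policy: ${pattern.source}` };
    }
  }
  return { exitCode: 0, stderr: "" };
}
```

A thin wrapper would read the payload from stdin, write `stderr` to standard error, and call `process.exit(decision.exitCode)`. Keeping the decision pure makes the gate unit-testable, which matters for anything that must hold on every run.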
+### Production Results
+A production harness processed 12 PRDs (47 stories) across 8 overnight sessions:
+| Metric | Minimal Harness | Full Harness |
+|--------|----------------|--------------|
+| False completion rate | 35% | 4% |
+| Credential leaks | 2 leaked | 7 blocked |
+| Destructive commands | 1 force-push | 4 blocked |
+| Revision rounds/story | 2.1 | 0.8 |
+| Token overhead | 0% | ~3.2% |
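The headline numbers are easier to compare as relative reductions. A small worked calculation using only the figures in the table above:

```typescript
// Relative reduction: the share of the baseline that the change removed.
function relativeReduction(before: number, after: number): number {
  if (before === 0) throw new Error("baseline must be non-zero");
  return (before - after) / before;
}

const falseCompletionDrop = relativeReduction(35, 4); // ≈ 0.886
const revisionRoundDrop = relativeReduction(2.1, 0.8); // ≈ 0.619

console.log(`false completions: -${(falseCompletionDrop * 100).toFixed(1)}%`);
console.log(`revision rounds/story: -${(revisionRoundDrop * 100).toFixed(1)}%`);
```

By this measure the full harness removed roughly 89% of false completions and about 62% of revision rounds per story, at ~3.2% token overhead.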
+### Context Degradation Research
+A Microsoft Research and Salesforce study of 15 LLMs across 200,000+ conversations found a 39% average performance drop from single-turn to multi-turn interactions, with degradation appearing in as few as two turns. Fresh-context iteration (Ralph Loop) beats long conversations for quality beyond 60-90 minutes.
+### Multi-Agent Deliberation Findings
+Free-form debate rounds produced 7,500 tokens of debate with rounds 2-3 just restating positions. Structured dimension scoring replaced free-form debate, dropping cost by 60% while improving ranking quality. Independence is critical — two agents with visibility into each other's findings converged to similar scores (0.45 vs 0.48). Without visibility: 0.45 vs 0.72 — the gap is the cost of herding.
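Structured dimension scoring with independent reviewers can be sketched as per-dimension scores plus a spread check. The dimension names, the [0, 1] scale, and the 0.2 disagreement threshold are illustrative assumptions, not values from the guide; the essential property is that each reviewer scores without seeing the others, so a large spread reflects real disagreement rather than herding.

```typescript
// Structured dimension scoring (sketch). Each reviewer scores the same fixed
// dimensions in [0, 1] without seeing the other reviewers' scores.
type Scores = Record<string, number>;

interface Aggregate {
  mean: Scores;
  spread: Scores; // max - min across reviewers
  disputed: string[]; // dimensions whose spread exceeds the threshold
}

function aggregate(reviews: Scores[], dimensions: string[], spreadThreshold = 0.2): Aggregate {
  if (reviews.length === 0) throw new Error("need at least one review");
  const mean: Scores = {};
  const spread: Scores = {};
  const disputed: string[] = [];
  for (const dim of dimensions) {
    const values = reviews.map((review) => {
      const v = review[dim];
      if (typeof v !== "number") throw new Error(`missing score for ${dim}`);
      return v;
    });
    const dimSpread = Math.max(...values) - Math.min(...values);
    mean[dim] = values.reduce((a, b) => a + b, 0) / values.length;
    spread[dim] = dimSpread;
    if (dimSpread > spreadThreshold) disputed.push(dim);
  }
  return { mean, spread, disputed };
}
```

Plugging in the guide's independent-run scores (0.45 vs 0.72) flags the dimension as disputed; the herded pair (0.45 vs 0.48) would not be, which is exactly the signal that visibility between agents destroys.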
+## What We Adopt
+- The hook/skill/agent differentiation as the primary architectural decision framework
+- "Code is for determinism, skills are for expertise" as the first-principles dividing line
+- Filesystem as memory (our wiki vault IS this pattern)
+- Fresh-context iteration via subagents (we have pi-subagents for this)
+- Production metrics as evidence that harness infrastructure compounds (~3.2% token overhead cut the false completion rate from 35% to 4%)
+- Structured dimension scoring over free-form debate (we already adopted iMAD selective routing)
+## What We Deliberately Do NOT Adopt
+- Full Claude Code dependency: our harness runs on pi, not Claude Code. But the architectural principles transfer.
+- 84 hooks / 48 skills scale: excessive for an MVP. Start with 4-6 skills.
+- Agent Teams (Claude Code proprietary): use pi-subagents for equivalent isolation.

package/vault/wiki/sources/Source: Bolt.new Architecture & Case Study.md ADDED

@@ -0,0 +1,75 @@
+---
+type: source
+source_type: case_study
+title: "Bolt.new Architecture & Case Study"
+author: "DeepWiki, Evil Martians (Victoria Melnikova, Travis Turner)"
+date_published: 2024-12-02
+url:
+  - "https://deepwiki.com/stackblitz/bolt.new/1.2-architecture"
+  - "https://evilmartians.com/chronicles/bolt-new-from-stackblitz-how-they-surfed-the-ai-wave-with-no-wipeouts"
+  - "https://github.com/stackblitz/bolt.new"
+confidence: high
+key_claims:
+  - "WebContainers give AI complete control over filesystem, node server, package manager, terminal, browser console"
+  - "Claude 3.5 Sonnet was the enabling technology — zero-shot code gen without RAG infrastructure"
+  - "0 to $4M ARR in 4 weeks — usage doubling daily"
+  - "AI-generated code is immediately executable and editable in-browser"
+  - "Bolt.new is open source, built on Remix + React + WebContainers"
+  - "Rails powers the backend (users, permissions, billing)"
+tags:
+  - bolt
+  - webcontainers
+  - claude
+  - remix
+  - stackblitz
+created: 2026-05-03
+updated: 2026-05-03
+status: ingested
+---
+# Bolt.new Architecture & Case Study
+Bolt.new is an AI-powered full-stack web development platform by StackBlitz that runs entirely in the browser. Users prompt, AI builds, code executes instantly in WebContainers.
+## Architecture (DeepWiki)
+### Core Components
+**Frontend**: Remix framework + React. UI libraries: Radix UI, Framer Motion, UnoCSS. CodeMirror editor + XTerm.js terminal + app preview.
+**WebContainer System**: In-browser Node.js runtime. Filesystem, package manager, terminal, browser console — all in browser sandbox via WebAssembly.
+**AI Integration**: Anthropic Claude API. AI agent interprets prompts → controls dev environment → generates code → installs dependencies → runs dev server → deploys.
+**Deployment**: Cloudflare Pages. One-click production deploy.
+### Interaction Flow
+```
+User → Submit prompt → AI Agent → Generate code → Create files
+     → Install deps → Run dev server → Display preview
+     → Request changes → Modify code → Update preview
+     → Request deploy → Deploy → Share URL
+```
+## Evil Martians Case Study
+### The Breakthrough
+StackBlitz had WebContainers since 2021. The missing piece was a model capable of zero-shot code generation without RAG infrastructure. Claude 3.5 Sonnet changed everything: "There's an order of magnitude difference in the LLM's required infrastructure to make it functional versus zero shot."
+### Key Product Decisions
+- Code executes instantly — no waiting for cloud VMs
+- AI-generated code is **malleable** — editable in-browser
+- Streaming interface shows real-time results
+- Complex environments spin up in milliseconds
+- One-click deploy to Netlify
+### Results
+- 0→$4M ARR in 4 weeks
+- 99% reduction in development costs for users
+- Tens of thousands of new customers, usage doubling daily
+- Supabase signups surged after bolt.new integration
+### Lessons for AI Coding Harness
+1. **Environment control is the moat.** If the agent can't run code, it can't verify its own output. Bolt's WebContainers, OpenAI Codex's Chrome DevTools integration, and Playwright MCP browser automation for Claude all converge on this.
+2. **Past a capability threshold, the model matters more than prompt engineering.** Before Claude 3.5 Sonnet, the same WebContainer technology wasn't enough. Find the model that makes your harness viable.
+3. **Keep generated code editable by users.** Don't lock users into a black-box AI output. This reduces trust barriers and enables human-in-the-loop refinement.
+4. **Rails backend for non-AI concerns** — users, permissions, billing. Don't reinvent infrastructure; use proven tech for everything outside the AI path.
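Lesson 1 boils down to a loop in which generated code only counts once it has actually run. A minimal sketch: `generate` and `run` are hypothetical injected stand-ins for a model call and a sandboxed runtime such as a WebContainer, and the three-attempt budget is an arbitrary assumption.

```typescript
// Generate -> execute -> feed errors back (sketch). Both dependencies are
// injected so the loop itself stays independent of any model or runtime.
interface RunResult {
  ok: boolean;
  output: string;
}

type Generate = (prompt: string, feedback?: string) => string;
type Run = (code: string) => RunResult;

function buildAndVerify(
  prompt: string,
  generate: Generate,
  run: Run,
  maxAttempts = 3,
): { code: string; attempts: number } {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(prompt, feedback);
    const result = run(code);
    if (result.ok) return { code, attempts: attempt };
    // The runtime's error output becomes context for the next generation.
    feedback = result.output;
  }
  throw new Error(`no runnable code after ${maxAttempts} attempts`);
}
```

Feeding the runtime's error output back into the next generation is what turns execution into verification.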

package/vault/wiki/sources/Source: Build-Time Prompt Compilation Architecture.md ADDED

@@ -0,0 +1,107 @@
+---
+type: source
+status: ingested
+source_type: architecture-analysis
+title: "Build-Time Prompt Compilation Architecture"
+author: "Synthesis of multiple real sources"
+date_published: 2026-05-02
+url: "https://github.com/microsoft/prompt-engine"
+confidence: high
+tags:
+  - prompt-compilation
+  - build-tools
+  - yaml-to-json
+  - template-engine
+related:
+  - "[[Research: Prompt Renderer for Multi-Model Agent Harness]]"
+key_claims:
+  - "Build-time prompt compilation is a valid architectural pattern but no mature off-the-shelf npm package exists"
+  - "Microsoft prompt-engine (2.8K stars, MIT) validates the YAML-based prompt management pattern"
+  - "PromptWeaver (@iqai/prompt-weaver, MIT, Dec 2025) provides template compilation + Zod validation for production use"
+  - "The DIY approach (js-yaml + @iqai/prompt-weaver + per-model renderer plugins) is the correct implementation path"
+  - "Deterministic builds: same spec + same renderer version → identical output with hash verification"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Build-Time Prompt Compilation — Real Tools & Architecture
+> [!correction] Previous research cited "PromptKit PackC" (npm, v1.4.6, 48 versions) which does not exist. This page documents the real tools and architecture.
+## What Exists
+### Microsoft prompt-engine
+- **Package**: `prompt-engine` (npm)
+- **Stars**: 2.8K | **License**: MIT | **Language**: TypeScript
+- **Status**: Last updated Oct 2022 (abandoned)
+- **What it does**: YAML-based prompt management with description + examples + dialog pattern. Builds prompts programmatically from YAML specs.
+- **Relevance**: Validates the YAML→prompt pattern with two engines: a Code engine (NL→Code) and a Chat engine (dialogs). The pattern works, but the project is dormant.
+- **URL**: https://github.com/microsoft/prompt-engine
+### PromptWeaver
+- **Package**: `@iqai/prompt-weaver` (npm)
+- **Stars**: 4 | **License**: MIT | **Language**: TypeScript 100%
+- **Status**: Active (Dec 2025, v1.1.1, 104 commits, 7 releases)
+- **What it does**: Handlebars-based template engine with Zod/Valibot/ArkType validation schema support. Ships 60+ built-in transformers (dates, currency, strings, collections). Supports template compilation caching, reusable partials, a Fluent Builder API, and composition.
+- **Relevance**: Production-ready template engine for prompts. Handlebars syntax for control flow (loops, conditionals, switch/case). Template compilation caching gets us 90% of the way to "build-time compilation." The Fluent Builder API enables dynamic prompt construction.
+- **URL**: https://github.com/IQAIcom/prompt-weaver
+### What Does NOT Exist
+- No npm package called "PromptKit PackC" exists
+- No npm package called `@altairalabs/packc` exists
+- No npm package `prompt-kit` exists (only `promptkit@0.0.1` — unrelated template scaffolding tool)
+- No mature, maintained build-time YAML→JSON prompt compiler exists on npm
+## Recommended Implementation
+### DIY Build Pipeline
+The architecture is sound. Instead of looking for a mythical off-the-shelf package, build the compiler ourselves:
+```
+prompts/*.yaml (base specs)
+    ↓ js-yaml (parse)
+SpecConfig[] (validated)
+    ↓ @iqai/prompt-weaver (template engine)
+    ↓ Per-model renderer plugins (apply provider conventions)
+    ↓ zod (validate schema)
+compiled prompts: dist/prompts/{gpt,claude,gemini}/*.json
+    ↓ SHA-256 (hash)
+manifest.json (deterministic build record)
+```
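The render stage of this pipeline can be sketched without any dependencies. Assumptions: the spec has already been parsed (for example by `js-yaml`) and validated, plain template literals stand in for `@iqai/prompt-weaver`, and the three renderers encode the provider conventions only as summarised on this page (OpenAI constraints-first, Anthropic XML tags, Google constraints-last). `PromptSpec`, `renderers`, and `compile` are hypothetical names.

```typescript
// Render stage only (sketch). Template literals stand in for prompt-weaver,
// and each renderer is a simplified reading of the provider conventions.
interface PromptSpec {
  id: string;
  role: string;
  constraints: string[];
  task: string;
}

type Provider = "gpt" | "claude" | "gemini";
type Renderer = (spec: PromptSpec) => string;

const bullets = (items: string[]): string => items.map((item) => `- ${item}`).join("\n");

const renderers: Record<Provider, Renderer> = {
  // OpenAI: constraints first
  gpt: (s) => `Constraints:\n${bullets(s.constraints)}\n\n${s.role}\n\n${s.task}`,
  // Anthropic: sections wrapped in XML tags
  claude: (s) =>
    `<role>${s.role}</role>\n<constraints>\n${bullets(s.constraints)}\n</constraints>\n<task>${s.task}</task>`,
  // Google: constraints last
  gemini: (s) => `${s.role}\n\n${s.task}\n\nConstraints:\n${bullets(s.constraints)}`,
};

// Returns output path -> compiled JSON, mirroring dist/prompts/{model}/*.json.
function compile(spec: PromptSpec): Map<string, string> {
  const files = new Map<string, string>();
  for (const provider of Object.keys(renderers) as Provider[]) {
    const prompt = renderers[provider](spec);
    files.set(`dist/prompts/${provider}/${spec.id}.json`, JSON.stringify({ id: spec.id, provider, prompt }));
  }
  return files;
}
```

Each renderer is a pure function of the spec, which is what makes hashing the build output meaningful.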
+### Stack
+| Component | Library | Purpose |
+|-----------|---------|---------|
+| YAML parsing | `js-yaml` (mature, 2.7K stars) | Parse base spec YAML files |
+| Template engine | `@iqai/prompt-weaver` | Handlebars-based template compilation with Zod validation |
+| Schema validation | `zod` | Runtime spec validation with statically inferred TypeScript types |
+| Deterministic builds | `crypto.createHash('sha256')` | Hash source specs + renderer version for reproducibility |
+| Per-model renderers | Custom TypeScript plugins | Apply each provider's official conventions |
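The deterministic-builds row can be made concrete with Node's built-in `crypto` module: hash a canonical serialization of the parsed spec together with the renderer version. This is a sketch; the recursive key-sorting rule and the NUL separator between the two inputs are assumptions, and `canonicalize`/`buildHash` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Canonical form: object keys sorted recursively, so key order in the source
// YAML cannot change the hash. Arrays keep their order (it is meaningful).
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

// SHA-256 over (canonical spec, NUL, renderer version) as a hex digest.
function buildHash(spec: unknown, rendererVersion: string): string {
  return createHash("sha256")
    .update(JSON.stringify(canonicalize(spec)))
    .update("\0")
    .update(rendererVersion)
    .digest("hex");
}
```

Sorting keys means reordering fields in the source YAML does not change the hash, while bumping the renderer version does.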
+### Why Not Microsoft prompt-engine Directly?
+- Abandoned since 2022 (80 commits total)
+- No per-model rendering support
+- Limited to Code/Chat engines — not general-purpose prompt specs
+- Pattern is valid; codebase is stale
+### Why PromptWeaver?
+- Active development (Dec 2025)
+- Handlebars → familiar syntax for template authors
+- Zod integration → type-safe, validated prompts
+- Template compilation caching → same spec = cached compiled output
+- Reusable partials → DRY prompt fragments
+- Fluent Builder API → dynamic prompt construction when needed
+## Relevance to ultimate-pi Prompt Renderer
+The build-time compilation architecture should:
+1. **Accept a base prompt spec (YAML)** as input: `prompts/base/system.yaml`
+2. **Use PromptWeaver as the template engine**: Handlebars syntax, Zod validation, template caching
+3. **Apply per-model renderer plugins**: Each plugin knows its provider's official conventions (OpenAI constraints-first, Anthropic XML tags, Google constraints-last)
+4. **Compile at build time** via `npm run compile-prompts` → outputs `dist/prompts/{model}/*.json`
+5. **Ship compiled JSON in npm package** — no template engine at runtime
+6. **Runtime just does JSON.parse + string replace**: `__VAR_name__` placeholders for runtime variables
+7. **Deterministic builds**: Same YAML + same renderer version → identical compiled output (hash-verified)
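Point 6 leaves very little runtime code. A minimal sketch, assuming compiled files of the shape `{ "prompt": "..." }` and placeholder names made of letters, digits, and underscores; `renderAtRuntime` is a hypothetical name.

```typescript
// Runtime rendering (sketch): JSON.parse plus placeholder substitution,
// with no template engine shipped in the package.
interface CompiledPrompt {
  prompt: string;
}

function renderAtRuntime(json: string, vars: Record<string, string>): string {
  const compiled = JSON.parse(json) as CompiledPrompt;
  // Lazy match so names may contain underscores, e.g. __VAR_repo_name__.
  return compiled.prompt.replace(/__VAR_([A-Za-z0-9_]+?)__/g, (_match: string, name: string) => {
    const value: string | undefined = vars[name];
    if (value === undefined) throw new Error(`missing runtime variable: ${name}`);
    return value;
  });
}
```

Throwing on a missing variable avoids shipping a literal placeholder to the model, and the replacer function inserts values verbatim, so a value containing `$1` is not treated as a replacement pattern.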

package/vault/wiki/sources/Source: Claude API Agent Skills Overview.md ADDED

@@ -0,0 +1,70 @@
+---
+type: source
+source_type: official-docs
+title: "Claude API — Agent Skills Overview"
+author: "Anthropic"
+date_published: 2026
+url: "https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview"
+confidence: high
+key_claims:
+  - "Skills are reusable, filesystem-based resources that provide Claude with domain-specific expertise"
+  - "Three levels of loading: Metadata (always, ~100 tokens), Instructions (when triggered, <5k tokens), Resources (as needed, effectively unlimited)"
+  - "No practical limit on bundled content — files don't consume context until accessed"
+  - "Skills run in a code execution environment where Claude has filesystem access, bash commands, and code execution"
+  - "Script execution: code never enters context — only output consumes tokens"
+  - "Custom Skills: create as directories with SKILL.md files"
+tags: [source, skills, claude, anthropic, progressive-disclosure]
+related:
+  - "[[agent-skills-pattern]]"
+  - "[[progressive-disclosure-agents]]"
+  - "[[skill-first-architecture]]"
+---
+# Claude API — Agent Skills Overview
+## Summary
+Official Anthropic documentation for the Agent Skills system. Covers architecture, loading model, security considerations, and cross-surface availability (Claude API, Claude Code, claude.ai).
+## Key Contributions
+### Filesystem-Based Architecture
+Skills exist as directories on a virtual machine. Claude interacts with them using bash commands — reading SKILL.md, running scripts, accessing reference files. This filesystem-based architecture enables progressive disclosure: Claude loads information in stages.
+### Three Content Types, Three Loading Levels
+| Level | Content | When Loaded | Token Cost |
+|-------|---------|-------------|------------|
+| Level 1 | Metadata (YAML frontmatter: name + description) | Always (at startup) | ~100 tokens per skill |
+| Level 2 | Instructions (SKILL.md body) | When skill is triggered | Under 5,000 tokens |
+| Level 3+ | Resources (additional .md, scripts, templates) | As needed | Effectively unlimited |
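The three levels can be sketched from the harness side, with a `Map` standing in for the skill directory. Assumptions: the `skills/<name>/SKILL.md` layout, the naive line-based frontmatter parser (a real implementation would use a YAML parser), and all function names are illustrative. Only `readMetadata` runs at startup; the other two run when a skill is triggered or a file is needed.

```typescript
// Three-level loading (sketch). A Map stands in for the skill directory.
type Files = Map<string, string>;

interface SkillMeta {
  name: string;
  description: string;
  dir: string;
}

// Level 1 (startup): read only `name` and `description` from the frontmatter.
// Naive line-based parsing; a real loader would use a YAML parser.
function readMetadata(files: Files, dir: string): SkillMeta {
  const text = files.get(`${dir}/SKILL.md`) ?? "";
  const front = /^---\n([\s\S]*?)\n---/.exec(text)?.[1] ?? "";
  const field = (key: string): string =>
    new RegExp(`^${key}:\\s*(.*)$`, "m").exec(front)?.[1]?.trim() ?? "";
  return { name: field("name"), description: field("description"), dir };
}

// Level 2 (on trigger): the SKILL.md body without its frontmatter.
function loadInstructions(files: Files, meta: SkillMeta): string {
  const text = files.get(`${meta.dir}/SKILL.md`) ?? "";
  return text.replace(/^---\n[\s\S]*?\n---\n?/, "");
}

// Level 3 (as needed): one bundled file at a time; unread files cost nothing.
function loadResource(files: Files, meta: SkillMeta, relPath: string): string {
  const content = files.get(`${meta.dir}/${relPath}`);
  if (content === undefined) throw new Error(`no such resource: ${relPath}`);
  return content;
}
```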
+### On-Demand File Access
+Claude reads only files needed for each specific task. A skill can include dozens of reference files — if a task only needs the sales schema, Claude loads just that one file. The rest consume zero tokens.
+### Efficient Script Execution
+When Claude runs `validate_form.py`, the script's code never loads into context. Only the script's output consumes tokens. This makes scripts far more efficient than generating equivalent code on the fly.
+### No Practical Limit on Bundled Content
+Because files don't consume context until accessed, skills can include comprehensive API documentation, large datasets, extensive examples, or any reference materials. Zero context penalty for unused bundled content.
+### Security Model
+Skills should only come from trusted sources. A malicious skill can direct Claude to invoke tools or execute code in harmful ways. Recommendations: audit thoroughly, treat like installing software, be especially careful in production systems.
+## What We Adopt
+- Filesystem-based skill architecture as the model for harness skills
+- Three-tier loading model for progressive disclosure
+- Scripts-as-executables pattern (code never enters context)
+- No practical limit on bundled reference material — enables comprehensive attack pattern catalogs, plan templates, etc.
+## What We Note
+- Cross-surface availability: Skills don't sync across Claude API, Claude Code, and claude.ai — each surface requires separate management
+- Runtime constraints vary: Claude API has no network access and no runtime package installation; Claude Code has full network access
+- Our harness skills are pi-specific but follow the open standard — portable to any platform that supports SKILL.md

package/vault/wiki/sources/Source: Gemini CLI Changelogs.md ADDED

@@ -0,0 +1,88 @@
+---
+type: source
+status: ingested
+source_type: official-changelog
+author: Google
+date_published: 2026-04-30
+date_accessed: 2026-05-01
+url: https://geminicli.com/docs/changelogs/
+confidence: high
+key_claims:
+  - v0.40 (Apr 2026): Offline search (bundled ripgrep), four-tier memory system, Gemma local model support
+  - v0.39 (Apr): /memory inbox for skill review, ContextManager architecture, memory leak fixes
+  - v0.38 (Apr): Chapters narrative flow, Context Compression Service, persistent policy approvals
+  - v0.37 (Apr): Dynamic sandbox expansion, git worktrees, browser agent enhancements
+  - v0.36 (Apr): Multi-registry architecture, native macOS Seatbelt/Windows sandboxing, git worktrees
+  - v0.34 (Mar): Plan Mode enabled by default, gVisor/LXC sandboxing
+  - v0.32 (Mar): Generalist agent for task routing, model steering, Plan Mode external editor
+  - v0.29 (Feb): Plan Mode introduced, Gemini 3 default
+  - v0.27 (Feb): Event-driven scheduler, /rewind command, queued tool confirmations
+  - v0.26 (Jan): skill-creator skill, agent skills enabled by default, generalist agent
+  - v0.23 (Jan): Experimental Agent Skills support (agentskills.io)
+  - v0.12 (Oct 2025): Codebase investigator subagent, model routing, model selection
+  - Launch: Jun 25, 2025
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# Gemini CLI Changelogs (v0.4 — v0.40)
+## What It Is
+Complete release history of Gemini CLI from launch (June 2025) through v0.40 (April 2026). Tracks feature evolution across 40+ weekly releases.
+## Feature Evolution Timeline
+### Phase 1: Foundation (Jun–Sep 2025, v0.4–v0.9)
+- v0.4: Edit tool, CloudRun/Security extensions, prompt completion, citations
+- v0.5: FastMCP integration, positional prompts, tool output truncation
+- v0.6: JSON output mode, chat sharing, prompt search, A2A protocol RFC
+- v0.7: IDE plugin spec, Flutter/nanobanana extensions, experimental todos
+- v0.8: **Extensions ecosystem launch** (20+ partners), new homepage/docs
+- v0.9: **Interactive Shell** (vim, git rebase), OpenTelemetry GenAI metrics
+### Phase 2: Intelligence (Oct–Dec 2025, v0.10–v0.22)
+- v0.10: Polish + bug fixing investment
+- v0.11: Jules extension (remote workers), stream-json output
+- v0.12: **Codebase investigator subagent**, model routing, model selection
+- v0.15: **Todo planning**, scrollable UI + mouse support
+- v0.16: **Gemini 3 launch**
+- v0.18: Policy engine (experimental), Google Workspace extension
+- v0.20: Multi-file drag-drop, persistent "Always Allow" policies
+- v0.21: Gemini 3 Flash, Rill/Browserbase extensions
+- v0.22: Free tier gets Gemini 3, Conductor extension (planning++)
+### Phase 3: Agent Architecture (Jan–Apr 2026, v0.23–v0.40)
+- v0.23: **Agent Skills support** (agentskills.io), gemini-wrapped
+- v0.24: Built-in agent skills, `/agents refresh`, `/skills install/uninstall`
+- v0.25: `activate_skill` tool, `pr-creator` skill, skills enabled by default
+- v0.26: `skill-creator` skill, agent skills by default, generalist agent
+- v0.27: **Event-driven scheduler**, `/rewind`, queued tool confirmations
+- v0.28: Positron IDE, custom themes, OAuth improvements
+- v0.29: **Plan Mode**, Gemini 3 default for all
+- v0.30: SDK package, custom skills, policy engine `--policy` flag
+- v0.31: Gemini 3.1 Pro Preview, experimental browser agent
+- v0.32: **Generalist agent enabled**, model steering, Plan Mode external editor
+- v0.33: A2A remote agents, Plan Mode research subagents
+- v0.34: **Plan Mode default**, gVisor/LXC sandboxing
+- v0.35: Customizable keyboard shortcuts, vim improvements, JIT context discovery
+- v0.36: **Multi-registry architecture**, macOS Seatbelt/Windows sandboxing, git worktrees
+- v0.37: Dynamic sandbox expansion, Chapters narrative, browser persistent sessions
+- v0.38: **Chapters narrative flow**, Context Compression Service, persistent policy approvals
+- v0.39: `/memory inbox`, ContextManager architecture decoupling
+- v0.40: Offline search (bundled ripgrep), four-tier memory, Gemma local model support
+## Key Patterns
+1. **Rapid iteration**: Weekly releases, 6,005 commits in ~10 months
+2. **Progressive disclosure**: Features gated behind experimental flags → preview → stable → default
+3. **Ecosystem first**: Extensions launched v0.8, Skills v0.23 — both designed for community contribution
+4. **Security layered in**: Policy engine (v0.18), sandboxing (v0.34), worktrees (v0.36) — not bolted on
+5. **Model-adaptive**: Model routing (v0.12), model steering (v0.32), Gemma local (v0.40)
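The staged rollout in point 2 can be expressed as a small feature gate. This is a sketch of the general pattern, not Gemini CLI's implementation; the opt-in set, the `preview` default for `minStage`, and the promotion rule are assumptions.

```typescript
// Staged feature gate (sketch of the pattern, not Gemini CLI's code).
const STAGES = ["experimental", "preview", "stable", "default"] as const;
type Stage = (typeof STAGES)[number];

interface Feature {
  name: string;
  stage: Stage;
}

// "default" features are always on; anything earlier needs an explicit
// opt-in and must have reached at least `minStage`.
function isEnabled(feature: Feature, optIn: Set<string>, minStage: Stage = "preview"): boolean {
  if (feature.stage === "default") return true;
  const mature = STAGES.indexOf(feature.stage) >= STAGES.indexOf(minStage);
  return mature && optIn.has(feature.name);
}

// Advance one stage; "default" is terminal.
function promote(feature: Feature): Feature {
  const nextIndex = Math.min(STAGES.indexOf(feature.stage) + 1, STAGES.length - 1);
  return { ...feature, stage: STAGES[nextIndex] ?? feature.stage };
}
```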
+## Relevance to Ultimate-PI
+Gemini CLI's evolution pattern validates our phased approach: foundation → intelligence → agent architecture. Their rapid iteration (weekly releases, experimental → preview → stable → default) is a model for how we should deploy harness improvements. Their "ecosystem first" approach (extensions, skills registries) suggests we should design our tool system for community contribution from the start.

package/vault/wiki/sources/Source: Google Blog - Gemini CLI Announcement.md ADDED

@@ -0,0 +1,57 @@
+---
+type: source
+status: ingested
+source_type: official-announcement
+author: Taylor Mullen, Ryan J. Salva (Google)
+date_published: 2025-06-25
+date_accessed: 2026-05-01
+url: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemini-cli-open-source-ai-agent/
+confidence: high
+key_claims:
+  - Gemini CLI: open-source AI agent (Apache 2.0) bringing Gemini to terminal
+  - Free tier: 60 req/min, 1,000 req/day with personal Google account (industry's largest allowance)
+  - Access to Gemini 2.5 Pro with 1M token context window
+  - Built-in tools: Google Search grounding, MCP support, bundled extensions, customizable prompts
+  - Non-interactive mode for script automation
+  - Shares technology with Gemini Code Assist (VS Code + terminal)
+  - Open source: global community contribution expected
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# Google Official Blog: Gemini CLI Announcement
+## What It Is
+Official launch announcement for Gemini CLI, published June 25, 2025 on Google's blog (The Keyword). Authored by Taylor Mullen (Senior Staff Software Engineer, creator of Gemini CLI) and Ryan J. Salva (Senior Director, Product Management).
+## Key Announcements
+### Free Tier (Unprecedented)
+- 60 model requests per minute
+- 1,000 model requests per day
+- Access to Gemini 2.5 Pro with 1M token context window
+- Requires only personal Google account
+- Marketed as "industry's largest allowance"
+### Core Capabilities
+- Code understanding, file manipulation, command execution, dynamic troubleshooting
+- Ground prompts with Google Search for real-time web context
+- Extend via MCP (Model Context Protocol) or bundled extensions
+- Customize prompts and instructions for specific workflows
+- Automate tasks via non-interactive script invocation
+### Open Source
+- Apache 2.0 license
+- Full source on GitHub: github.com/google-gemini/gemini-cli
+- Community contribution expected (bugs, features, security, code)
+- Emerging standards: MCP, system prompts (GEMINI.md), settings
+### Gemini Code Assist Integration
+- Shares technology with Code Assist (VS Code)
+- Agent mode in Code Assist: multi-step planning, auto-recovery from failures
+- Available on all plans (free, Standard, Enterprise)
+## Relevance to Ultimate-PI
+The free tier economics (60 req/min, 1,000 req/day) make Gemini CLI viable as a *model provider* within our multi-model harness. The 1M token window + Google Search grounding directly complement our L3 Grounding layer. The open-source model (Apache 2.0) means we can study and adapt their harness patterns without license concerns.
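Using the free tier as a model provider means respecting both limits at once. A minimal client-side guard, assuming rolling 60-second and 24-hour windows (the announcement does not state the reset semantics) and an injected clock for testability; `QuotaGuard` is a hypothetical name.

```typescript
// Client-side guard for the quoted free-tier limits (sketch). Rolling windows
// are an assumption; the server's own quota accounting remains authoritative.
class QuotaGuard {
  private minuteWindow: number[] = [];
  private dayWindow: number[] = [];
  private readonly perMinute: number;
  private readonly perDay: number;

  constructor(perMinute = 60, perDay = 1000) {
    this.perMinute = perMinute;
    this.perDay = perDay;
  }

  // Records the request and returns true only if both limits allow it.
  tryAcquire(nowMs: number): boolean {
    this.minuteWindow = this.minuteWindow.filter((t) => nowMs - t < 60_000);
    this.dayWindow = this.dayWindow.filter((t) => nowMs - t < 86_400_000);
    if (this.minuteWindow.length >= this.perMinute || this.dayWindow.length >= this.perDay) {
      return false;
    }
    this.minuteWindow.push(nowMs);
    this.dayWindow.push(nowMs);
    return true;
  }
}
```

The daily cap binds first under sustained load: at the full 60 requests per minute, 1,000 requests last about 17 minutes (1,000 / 60 ≈ 16.7).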

package/vault/wiki/sources/Source: Google Gemini CLI Architecture Docs.md ADDED

@@ -0,0 +1,53 @@
+---
+type: source
+status: ingested
+source_type: official-documentation
+author: Google
+date_published: 2025-06-25
+date_accessed: 2026-05-01
+url: https://google-gemini.github.io/gemini-cli/docs/architecture.html
+confidence: high
+key_claims:
+  - Gemini CLI is composed of CLI package (frontend) and Core package (backend)
+  - Core receives requests, orchestrates Gemini API, manages tool execution
+  - Tools are individual modules for filesystem, shell, web fetch, search
+  - ReAct loop: user input → CLI → Core → Gemini API → tool execution → final response
+  - Key design principles: modularity, extensibility, user experience
+  - Read-only ops may not require user confirmation; write ops always do
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# Gemini CLI Architecture (Official Docs)
+## What It Is
+The official architecture documentation for Google's Gemini CLI, an open-source AI agent (Apache 2.0) that brings Gemini models directly into the terminal.
+## Core Components
+1. **CLI package (`packages/cli`)**: User-facing — input processing, history management, display rendering, theme/UI customization, CLI configuration.
+2. **Core package (`packages/core`)**: Backend — API client for Gemini API, prompt construction/management, tool registration/execution, state management, server-side configuration.
+3. **Tools (`packages/core/src/tools/`)**: Individual modules extending Gemini model capabilities — filesystem operations, shell commands, web fetching, Google Search grounding, multi-file read, memory, MCP server bridge.
+## Interaction Flow
+1. User types prompt → CLI package
+2. CLI sends to Core package
+3. Core constructs prompt (history + tool definitions), sends to Gemini API
+4. Gemini API returns response (direct answer OR tool request)
+5. If the response is a tool request, Core prepares the execution, requests user approval for write/shell ops, executes the tool, and sends the result back to the API
+6. Core sends final response back to CLI
+7. CLI displays to user
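Steps 3 to 6 amount to a ReAct loop with an approval gate. A minimal sketch: the model, tools, and approver are injected stubs, and only the read-only vs write/shell distinction comes from the docs; `runTurn`, `Tool`, and the string-based history are hypothetical.

```typescript
// Tool loop with an approval gate (sketch, not Gemini CLI's Core code).
type ModelReply =
  | { kind: "answer"; text: string }
  | { kind: "tool"; name: string; args: string };

interface Tool {
  readOnly: boolean;
  run: (args: string) => string;
}

type Model = (history: string[]) => ModelReply;
type Approve = (toolName: string, args: string) => boolean;

function runTurn(prompt: string, model: Model, tools: Record<string, Tool>, approve: Approve, maxSteps = 10): string {
  const history = [`user: ${prompt}`];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);
    if (reply.kind === "answer") return reply.text;
    const tool = tools[reply.name];
    if (!tool) {
      history.push(`error: unknown tool ${reply.name}`);
      continue;
    }
    // Read-only tools run directly; write/shell tools need explicit approval.
    if (!tool.readOnly && !approve(reply.name, reply.args)) {
      history.push(`denied: ${reply.name}`);
      continue;
    }
    history.push(`tool ${reply.name}: ${tool.run(reply.args)}`);
  }
  throw new Error("step budget exhausted");
}
```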
+## Design Principles
+- **Modularity**: Separating frontend from backend enables independent development and alternative frontends
+- **Extensibility**: Tool system designed for adding new capabilities
+- **User Experience**: Rich interactive terminal experience via CLI package
+## Relevance to Ultimate-PI
+The two-package architecture (CLI/Core) maps to our L1-L4 (Core/Harness) vs L5-L8 (Observability/Memory/Orchestration) separation. Their tool registration + execution logic parallels our tool definitions. Their ReAct loop with approval gates parallels our planned pre-execution policy gates (P-F1).

package/vault/wiki/sources/Source: LangChain - Anatomy of Agent Harness.md ADDED

@@ -0,0 +1,65 @@
+---
+type: source
+status: ingested
+source_type: engineering-blog
+author: Vivek Trivedy (LangChain)
+date_published: 2026-03-10
+date_accessed: 2026-05-01
+url: https://blog.langchain.com/the-anatomy-of-an-agent-harness/
+confidence: high
+key_claims:
+  - Agent = Model + Harness. "If you're not the model, you're the harness."
+  - Harness includes: system prompts, tools/skills/MCPs, bundled infrastructure, orchestration logic, hooks/middleware
+  - Filesystem is most foundational harness primitive (durable state, collaboration surface, git versioning)
+  - Bash + code exec as general-purpose tool (avoid pre-designing every tool)
+  - Sandboxes for safe execution environments with good default tooling
+  - Context Rot management: compaction, tool call offloading, progressive disclosure (Skills)
+  - Ralph Loop: intercept model exit, reinject original prompt in clean context window
+  - Model-harness co-evolution creates overfitting — best harness for task may NOT be what model was trained with
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# LangChain: The Anatomy of an Agent Harness
+## What It Is
+Comprehensive analysis of harness engineering from LangChain, published March 10, 2026. Defines harness primitives by working backwards from desired agent behavior.
+## Core Definition
+**Agent = Model + Harness.** If you're not the model, you're the harness. A harness is every piece of code, configuration, and execution logic that isn't the model itself.
+Concrete harness components: system prompts, tools/skills/MCPs + descriptions, bundled infrastructure (filesystem, sandbox, browser), orchestration logic (subagent spawning, handoffs, model routing), hooks/middleware (compaction, continuation, lint checks).
+## Key Harness Primitives
+### Filesystem
+Most foundational primitive. Unlocks: workspace for reading data/code/docs, incremental work offloading, state persistence across sessions, collaboration surface (multiple agents + humans coordinate through shared files). Git adds versioning.
+### Bash + Code Execution
+General-purpose tool. Instead of forcing users to build tools for every action, give agents a computer. Model can design its own tools on the fly via code. Still ship other tools, but code exec is default strategy for autonomous problem solving.
+### Sandboxes
+Safe operating environments with good default tooling. Pre-installed runtimes, CLIs, browsers. Enable scale: create on demand, fan out, tear down.
+### Context Rot Management
+- **Compaction**: Offloads/summarizes context near window limit.
+- **Tool call offloading**: Keeps head + tail tokens of large outputs; full output on filesystem.
+- **Progressive disclosure (Skills)**: Too many tools at startup degrades performance. Skills solve via on-demand loading.
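Tool call offloading can be sketched in a few lines. Assumptions: character counts stand in for tokens, a `Map` stands in for the filesystem, and the `.harness/tool-outputs/` path and the 200/200 defaults are hypothetical.

```typescript
// Tool-call offloading (sketch): keep head + tail in context, store the full
// output out of band, and leave a pointer to it in the truncated text.
function offloadToolOutput(
  output: string,
  store: Map<string, string>,
  key: string,
  head = 200,
  tail = 200,
): string {
  if (output.length <= head + tail) return output;
  const path = `.harness/tool-outputs/${key}.txt`; // hypothetical location
  store.set(path, output);
  const omitted = output.length - head - tail;
  return `${output.slice(0, head)}\n[... ${omitted} chars omitted; full output at ${path} ...]\n${output.slice(output.length - tail)}`;
}
```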
+### Long-Horizon Execution
+- **Ralph Loop**: Intercepts model exit attempt, reinjects original prompt in clean context. Filesystem makes this possible (fresh context reads state from previous iteration).
+- **Planning + Self-Verification**: Plan files on filesystem, verification via test suites, hooks that loop back on failure.
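The Ralph Loop's two moves, intercepting the exit and restarting from a clean context, can be sketched with a `Map` as the filesystem. LangChain describes the mechanism, not this code: the `PROGRESS.md`/`STATUS` files, the `DONE` marker, and `agentStep` are assumptions.

```typescript
// Ralph Loop (sketch). Assumptions: a Map stands in for the filesystem, and
// the agent signals completion by writing "DONE" to a STATUS file.
type Fs = Map<string, string>;
type AgentStep = (context: string, fs: Fs) => { wantsToExit: boolean };

function ralphLoop(originalPrompt: string, agentStep: AgentStep, fs: Fs, maxIterations = 20): number {
  for (let i = 1; i <= maxIterations; i++) {
    // Fresh context each time: original prompt + state read back from disk,
    // never the transcript of earlier iterations.
    const progress = fs.get("PROGRESS.md") ?? "(no progress yet)";
    const context = `${originalPrompt}\n\n## Progress so far\n${progress}`;
    const { wantsToExit } = agentStep(context, fs);
    // Intercept the exit attempt: accept it only when on-disk state says done.
    if (wantsToExit && fs.get("STATUS") === "DONE") return i;
  }
  throw new Error("iteration budget exhausted");
}
```

Nothing from an earlier iteration's transcript survives; continuity comes entirely from files, which is why the filesystem is what makes the loop possible.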
+## Model-Harness Co-Evolution (Critical Insight)
+Models post-trained with harness in the loop → overfitting to specific tool logic. Example: Codex's `apply_patch` tool — changing patch methods leads to worse model performance despite model intelligence.
+**Counter-intuitive finding**: Terminal Bench 2.0 shows Opus 4.6 scores far lower in Claude Code than in other harnesses. LangChain improved their agent from Top 30 to Top 5 by only changing the harness. **"Best harness for your task is NOT necessarily the one a model was post-trained with."**
+## Relevance to Ultimate-PI
+Validates our multi-model approach (4 profiles). Each model may need a different harness configuration — we should test model-harness combinations rather than assuming one harness fits all. The Ralph Loop concept could enhance our L2 Structured Planning by adding continuation hooks. Context rot management (compaction, offloading, progressive disclosure) directly validates our pi-lean-ctx + skills architecture.