npm - ultimate-pi - Versions diffs - 0.1.7 → 0.2.2 - Mend

ultimate-pi 0.1.7 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (524)

package/vault/wiki/concepts/agent-browser-browser-automation.md DELETED

@@ -1,99 +0,0 @@
----
-type: concept
-title: "agent-browser — Rust-Native Browser Automation for AI Agents"
-status: developing
-created: 2026-05-02
-updated: 2026-05-02
-tags:
-  - browser-automation
-  - ai-agents
-  - vercel-labs
-  - rust
-  - cdp
-  - headless-browser
-aliases: ["agent-browser", "Vercel Labs agent-browser"]
-related:
-  - "[[browser-subagent-visual-verification]]"
-  - "[[harness-implementation-plan]]"
-  - "[[Source: Vercel Labs agent-browser]]"
-sources:
-  - "[[Source: Vercel Labs agent-browser]]"
----
-# agent-browser — Rust-Native Browser Automation for AI Agents
-Vercel Labs agent-browser (31.4K GitHub stars, Apache 2.0, v0.26.0) is the leading open-source browser automation CLI built specifically for AI agents. Rust-native single binary, 112 contributors, 81 releases, 568 commits.
-**Supersedes**: [[browser-harness-agent]] (9.4K stars, MIT, Python), replaced in May 2026 for P30. agent-browser has about 3.3× as many stars, richer AI agent integration, and Rust-native performance.
-## Core Design
-Unlike Puppeteer/Playwright (human scripting APIs) and browser-harness (raw CDP with self-healing), agent-browser provides an **agent-native interface**: snapshot-based element refs (`@e1`, `@e2`), JSON output, annotated screenshots, structured diff, and a built-in skills system. The AI agent thinks in terms of refs from snapshots — not CSS selectors, not CDP method calls.
-## Key Innovations for AI Agents
-### 1. Snapshot + Refs Workflow
-```
-agent-browser snapshot -i --json
-→ Returns: {"refs": {"e1": {"role":"button","name":"Submit"}, "e2": {"role":"textbox","name":"Email"}}}
-agent-browser click @e1          # deterministic, no DOM re-query
-agent-browser fill @e2 "text"    # refs survive page changes until re-snapshot
-```
-### 2. Annotated Screenshots
-```
-agent-browser screenshot --annotate
-→ Screenshot with numbered labels [1], [2], [3] matching @e1, @e2, @e3 refs
-→ Multimodal models can reason about visual layout + refs simultaneously
-```
-### 3. Structured Diff
-```
-agent-browser diff screenshot --baseline before.png -o diff.png
-agent-browser diff snapshot --baseline before-snapshot.txt
-→ Structural + visual diff for verifying UI changes
-```
-### 4. React Introspection
-```
-agent-browser open --enable react-devtools <url>
-agent-browser react tree           # full component tree
-agent-browser react suspense       # suspense boundaries + classifier
-agent-browser vitals               # LCP/CLS/TTFB/FCP/INP + React hydration
-```
-### 5. Batch Mode
-```
-agent-browser batch "open url" "snapshot -i" "click @e1" "screenshot"
-→ Multiple commands in single CLI invocation, reduces process startup overhead
-```
-### 6. Built-in Skills
-```
-agent-browser skills get core      # 420-line usage guide for agents
-npx skills add vercel-labs/agent-browser  # install skill stub
-```
-## Architecture
-- **Rust CLI** + **Rust Daemon**: Single binary. Daemon auto-starts, persists between commands
-- **Client-daemon**: Fast subsequent commands (no browser restart)
-- **Direct CDP**: Like browser-harness — raw DevTools Protocol, no Puppeteer wrappers
-- **Multi-provider**: Local Chrome + 6 cloud providers (Browserless, Browserbase, Browser Use, Kernel, AgentCore, iOS)
-## Integration with P30
-P30 Browser Subagent dispatches via P25 router for UI tasks. Harness invokes `agent-browser` CLI as a subprocess (or via batch mode for multi-step workflows). Config at `.pi/harness/browser.json`.
-**What we use**:
-- Snapshot + refs for element interaction
-- Annotated screenshots for visual verification
-- Diff for before/after comparison
-- Batch mode for multi-step agent workflows
-- `--json` for structured output parsing
-**What we skip**:
-- Dashboard (CLI harness only)
-- AI Chat (our agent IS the chat)
-- Cloud providers (local Chrome only; opt-in for serverless)
-- iOS Simulator (web-focused; opt-in)
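
Reviewer sketch for the snapshot + refs workflow described in the deleted `agent-browser-browser-automation.md`: a harness could parse the `--json` snapshot and resolve an element to its ref before issuing a `click`. The JSON shape is copied from the example in that file and has not been checked against the real CLI output. `find_ref`, `build_click_command`, and `SAMPLE_SNAPSHOT` are hypothetical names introduced here.

```python
import json
from typing import List, Optional

# Sample payload mirroring the shape shown in the deleted file (assumed, not verified).
SAMPLE_SNAPSHOT = (
    '{"refs": {"e1": {"role": "button", "name": "Submit"}, '
    '"e2": {"role": "textbox", "name": "Email"}}}'
)


def find_ref(snapshot_json: str, role: str, name: str) -> Optional[str]:
    """Return '@<ref>' for the first element matching role and name, else None."""
    refs = json.loads(snapshot_json).get("refs", {})
    for ref_id, element in refs.items():
        if element.get("role") == role and element.get("name") == name:
            return f"@{ref_id}"
    return None


def build_click_command(snapshot_json: str, role: str, name: str) -> List[str]:
    """Build the argv for `agent-browser click @ref` without executing it."""
    ref = find_ref(snapshot_json, role, name)
    if ref is None:
        raise LookupError(f"no element with role={role!r} name={name!r} in snapshot")
    return ["agent-browser", "click", ref]
```

The returned list can be handed to `subprocess.run` by the caller; keeping command construction separate from execution makes the ref lookup testable without a browser.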

package/vault/wiki/concepts/agent-codebase-interface.md DELETED

@@ -1,43 +0,0 @@
----
-type: concept
-title: "Agent-Codebase Interface (ACI)"
-created: 2026-04-30
-updated: 2026-04-30
-tags:
-  - agent-architecture
-  - codebase-exploration
-  - interface-design
-related:
-  - "[[swe-agent-aci]]"
-  - "[[research-agent-first-codebase-exploration]]"
-status: developing
----
-# Agent-Codebase Interface (ACI)
-The design of tool interfaces specifically for AI agents — not humans — to interact with codebases. Extends the SWE-agent concept of Agent-Computer Interfaces to codebase exploration specifically.
-## Core Principle
-Agents process information differently from humans. They have:
-- **Fixed context windows** (not infinite working memory)
-- **Token-based costs** (every byte of context has a cost)
-- **No visual cortex** (can't "see" code structure, need explicit representations)
-- **No intuition** (can't form mental models from partial exposure)
-- **Perfect recall within context** (but zero recall outside it)
-Therefore, the interface must:
-1. Maximize information density per token
-2. Present structured, machine-parseable representations
-3. Support progressive disclosure (drill down on demand)
-4. Enable autonomous navigation decisions
-## Contrast with Human Interfaces
-| Human Interface | Agent Interface |
-|----------------|-----------------|
-| Syntax highlighting, file trees | AST symbol maps, dependency graphs |
-| Scroll through files | Fetch specific symbol definitions |
-| Visual pattern recognition | Semantic search + structured queries |
-| Gradual immersion ("Paper Cuts") | Bulk ingestion + ranking algorithms |
-| IDE debugging (step-through) | Execution feedback loops (run tests, check output) |
-| "Use the project" to learn | "Map the project" to learn |
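
Reviewer sketch for the "AST symbol maps" and "fetch specific symbol definitions" rows of the table in the deleted `agent-codebase-interface.md`: a compact map is shown first, and a full definition is fetched only on demand (progressive disclosure). This uses Python's standard `ast` module on Python source only; `symbol_map`, `fetch_symbol`, and `EXAMPLE_SOURCE` are hypothetical names, not part of the package.

```python
import ast
from typing import Dict, Optional

_DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

EXAMPLE_SOURCE = '''import os

def load(path):
    return open(path).read()

class Cache:
    def get(self, key):
        return None
'''


def symbol_map(source: str) -> Dict[str, int]:
    """Map each top-level function or class name to its starting line number."""
    tree = ast.parse(source)
    return {node.name: node.lineno for node in tree.body if isinstance(node, _DEFINITIONS)}


def fetch_symbol(source: str, name: str) -> Optional[str]:
    """Return the source text of one top-level definition, or None if absent."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, _DEFINITIONS) and node.name == name:
            return ast.get_source_segment(source, node)
    return None
```

The map costs a few tokens per symbol, while `fetch_symbol` spends tokens only on the definition the agent decides to read.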

package/vault/wiki/concepts/agent-harness-architecture.md DELETED

@@ -1,67 +0,0 @@
----
-type: concept
-tags:
-  - harness
-  - architecture
-  - context-engineering
-  - safety
-related:
-  - "[[Agentic Orchestration Pipeline]]"
-  - "[[Context Engineering]]"
-  - "[[Safety Defense-in-Depth]]"
-  - "[[sources/martin-fowler-harness-engineering]]"
-  - "[[sources/opendev-arxiv-2603.05344v1]]"
----
-# Agent Harness Architecture
-The harness is everything in an AI coding agent except the model itself: the runtime orchestration layer that wraps the reasoning loop and coordinates tool dispatch, context management, safety enforcement, and session persistence. Defined as: **Agent = Model + Harness**.
-## Two-Phase Model
-### Scaffolding (Pre-Runtime)
-Runs once before the first prompt. Assembles the agent:
-- System prompt compilation (conditional, priority-ordered sections)
-- Tool schema building (from registry, MCP discovery, subagent schemas)
-- Subagent registration and initialization
-### Harness (Runtime)
-Operates continuously during execution:
-- Tool dispatch with safety gating
-- Context lifecycle management (compaction, reminders, memory)
-- Approval workflows (Manual/Semi-Auto/Auto)
-- Session persistence and undo tracking
-## Feedforward + Feedback Model
-| Direction | Type | Examples |
-|-----------|------|----------|
-| **Feedforward (Guides)** | Steer before action | System prompts, AGENTS.md, Skills, coding conventions, architecture docs |
-| **Feedback (Sensors)** | Observe after action | Linters, tests, review agents, type checkers, structural analysis |
-Two execution modes:
-- **Computational**: Deterministic, fast — tests, linters, type checkers
-- **Inferential**: LLM-based, semantic — AI code reviews, "LLM as judge"
-## The Steering Loop
-Human developers iterate on the harness: whenever an issue occurs repeatedly, improve feedforward guides or feedback sensors. Agents can help build harness components (write tests, generate linter rules, create documentation).
-## Harness Layers (OpenDev Reference)
-1. **Prompt Composition**: Conditional sections sorted by priority, provider-specific variants, ${VAR} substitution, two-part caching
-2. **Context Engineering**: Staged compaction, event-driven reminders, dual-memory architecture, tool result optimization
-3. **Tool System**: Registry with handler categories, lazy MCP discovery, batch execution, 9-pass fuzzy edit matching
-4. **Safety System**: 5-layer defense-in-depth (prompt → schema → approval → validation → hooks)
-5. **Persistence**: Session storage, operation log/undo, configuration hierarchy, provider cache
-## Harness Templates
-For common topologies (CRUD APIs, event processors, dashboards), a harness template bundles guides + sensors as a reusable package. Teams select tech stacks partly based on available harnesses.
-## Relevance to Our Harness
-Our current harness architecture:
-- **Scaffolding**: `.pi/skills/` system, agent prompt engineering, wiki as knowledge base
-- **Runtime**: `lean-ctx` for tool routing, `Agent` for subagent spawning, `wiki-autoresearch` for research
-- **Gaps**: No safety defense-in-depth, no staged compaction, no event-driven reminders, no team dispatch, no sequential chaining
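
Reviewer sketch for the "Prompt Composition" layer listed in the deleted `agent-harness-architecture.md` (conditional sections sorted by priority, `${VAR}` substitution). This is an illustration built on the standard library's `string.Template`, not OpenDev's implementation; `PromptSection`, `compose_prompt`, and the "lower number comes first" ordering are assumptions made here.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List


@dataclass
class PromptSection:
    priority: int  # assumed convention: lower number appears earlier in the prompt
    template: str  # may contain ${VAR} placeholders
    condition: Callable[[Dict[str, str]], bool] = lambda variables: True


def compose_prompt(sections: List[PromptSection], variables: Dict[str, str]) -> str:
    """Keep sections whose condition holds, order by priority, substitute ${VAR}."""
    active = [section for section in sections if section.condition(variables)]
    active.sort(key=lambda section: section.priority)
    # substitute() raises KeyError if an active section needs a missing variable.
    return "\n\n".join(Template(s.template).substitute(variables) for s in active)
```

Gating a section on the presence of its variables, as the `condition` callable allows, avoids emitting half-filled prompt sections.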

package/vault/wiki/concepts/agent-loop-detection-patterns.md DELETED

@@ -1,133 +0,0 @@
----
-aliases: ["agent loop patterns", "stuck agent detection", "tool call loops"]
-type: concept
-title: "Agent Loop Detection Patterns"
-created: 2026-04-30
-status: developing
-tags:
-  - concept
-  - loop-detection
-  - agent-reliability
-  - production
-related:
-  - "[[Research: Meta-Agent Context Drift Detection]]"
-  - "[[context-drift-in-agents]]"
-  - "[[meta-agent-context-pruning]]"
-  - "[[langsight-loop-detection]]"
-  - "[[ironclaw-drift-monitor]]"
-updated: 2026-05-02
----
-# Agent Loop Detection Patterns
-Production-grade detection patterns for identifying when an AI agent is stuck in a non-productive loop. Based on LangSight's production experience and ironclaw's DriftMonitor proposal.
-## Three Loop Types
-### 1. Direct Repetition
-Same tool called with identical arguments multiple times in a row. Most common pattern.
-**Cause**: Tool returns error or unexpected result. LLM's retry logic doesn't distinguish "transient failure, retry" from "structural failure, give up."
-**Real-world example**: Support agent called `crm-mcp/lookup_customer` 89 times with identical arguments. CRM returned slightly malformed response. Agent decided it needed more data, called same tool, got same malformed response, repeated. Cost: $214.
-**Detection**: `SHA256(tool_name + normalized_args)[:16]`. If same hash appears ≥3 times in session window, flag as loop.
-### 2. Ping-Pong Between Tools
-Two tools called alternately without state change between calls.
-**Example**: Agent calls CRM → gets customer → calls Billing → gets invoices → calls CRM again with same args → calls Billing again.
-**Detection**: Sequence pattern matching on last 6 calls. A-B-A-B-A-B pattern triggers detection.
-### 3. Retry-Without-Progress
-Tool call succeeds (no error) but response doesn't satisfy agent's internal goal. Agent keeps calling with minor argument variations.
-**Detection**: Semantic similarity of consecutive reasoning outputs >0.95 cosine across multiple steps. Computationally expensive.
-## Detection Approaches
-### Approach 1: Argument Hash (Recommended)
-```python
-import hashlib, json
-from collections import Counter
-def compute_call_hash(tool_name: str, args: dict) -> str:
-    payload = f"{tool_name}:{json.dumps(args, sort_keys=True)}"
-    return hashlib.sha256(payload.encode()).hexdigest()[:16]
-class LoopDetector:
-    def __init__(self, threshold: int = 3):
-        self.threshold = threshold
-        self.call_counts = Counter()
-    def record_call(self, tool_name: str, args: dict) -> bool:
-        call_hash = compute_call_hash(tool_name, args)
-        self.call_counts[call_hash] += 1
-        return self.call_counts[call_hash] >= self.threshold
-```
-Catches >90% of real-world loops with zero false positives at threshold 3.
-### Approach 2: Sliding Window Rate
-Count calls per tool regardless of argument variation. If a tool is called at least N times within M seconds, flag it.
-```python
-from collections import deque
-from datetime import datetime, timedelta, timezone
-class RateLoopDetector:
-    def __init__(self, max_calls: int = 10, window_seconds: int = 60):
-        self.max_calls = max_calls
-        self.window = timedelta(seconds=window_seconds)
-        self.call_times: dict[str, deque] = {}  # tool name -> recent call timestamps
-    def record_call(self, tool_name: str) -> bool:
-        now = datetime.now(timezone.utc)  # timezone-aware; utcnow() is deprecated since Python 3.12
-        if tool_name not in self.call_times:
-            self.call_times[tool_name] = deque()
-        times = self.call_times[tool_name]
-        # Drop timestamps that have aged out of the window
-        while times and now - times[0] > self.window:
-            times.popleft()
-        times.append(now)
-        return len(times) >= self.max_calls
-```
-### Approach 3: LLM Similarity
-Compare semantic similarity between consecutive reasoning outputs. Most sophisticated but computationally expensive. Usually overkill — Approaches 1+2 catch >90%.
-## Intervention Strategies
-| Strategy | When | Risk |
-|----------|------|------|
-| **Warn + continue** | Early monitoring, unsure about thresholds | No false-termination risk, but loops continue |
-| **Terminate session** | Production, confident in thresholds | False termination loses partial work |
-| **Inject recovery** | Want agent to self-correct | Agent may ignore or loop again |
-| **Prune + restart** | Proposed meta-agent pattern | Pruning may remove useful context |
-## Threshold Tuning
-- **Default**: 3 identical calls. Works for most agents.
-- **Polling agents**: Use time-based windows (Approach 2), not count-based.
-- **Retry-heavy workflows**: Increase to 5-7.
-- **Sub-agents**: Each sub-agent gets own detector. Parent calling same sub-agent multiple times is not a loop.
-- **Start with warn, switch to terminate**: Monitor for a week, then enforce.
-## Always Combine With Budget Guardrails
-Loop detection catches known patterns. Budget guardrails catch unknown patterns:
-- Max cost per session ($1 default)
-- Max steps (25 default)
-- Max wall time (120s default)
-- Soft alert at 80% of budget
-## See Also
-- [[meta-agent-context-pruning]] — Extends detection with pruning + restart
-- [[langsight-loop-detection]] — Source: production deployment guide
-- [[ironclaw-drift-monitor]] — Source: 5-rule DriftMonitor proposal
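
Reviewer sketch for the "Ping-Pong Between Tools" pattern in the deleted `agent-loop-detection-patterns.md`, which gives only a prose rule (an A-B-A-B-A-B pattern over the last 6 calls). The class below is one possible reading of that rule, not LangSight's or ironclaw's code. `PingPongDetector` is a hypothetical name, and it expects a per-call key such as the argument hash from Approach 1.

```python
from collections import deque


class PingPongDetector:
    """Flags strict alternation between two distinct calls over a fixed window."""

    def __init__(self, window: int = 6):
        if window < 4 or window % 2:
            raise ValueError("window must be an even number >= 4")
        self.recent = deque(maxlen=window)  # keeps only the last `window` call keys

    def record_call(self, call_key: str) -> bool:
        """Record one call; return True once the window is a full A-B-A-B... run."""
        self.recent.append(call_key)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough history yet
        calls = list(self.recent)
        first, second = calls[0], calls[1]
        if first == second:
            return False  # direct repetition is handled by the argument-hash detector
        return all(
            key == (first if index % 2 == 0 else second)
            for index, key in enumerate(calls)
        )
```

Keying on tool name plus arguments, instead of tool name alone, keeps a legitimate CRM, Billing, CRM sequence with changing arguments from being flagged.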

package/vault/wiki/concepts/agent-search-enforcement.md DELETED

@@ -1,126 +0,0 @@
----
-type: concept
-status: developing
-created: 2026-04-30
-updated: 2026-04-30
-tags:
-  - agentic-harness
-  - tool-enforcement
-  - semantic-search
-  - mcp
-related:
-  - "[[ck-tool]]"
-  - "[[mcp-tool-routing]]"
-  - "[[agentic-harness-context-enforcement]]"
-  - "[[Research: semantic code search tools]]"
-title: "agent search enforcement"
----
-# agent search enforcement
-Strategies to force AI coding agents to use semantic code search tools (ck, vgrep) instead of raw `grep`, `cat`, and pipe commands.
-## Problem
-AI coding agents default to shell tools: `grep -r "pattern" .`, `cat file | grep foo`, `find . -name "*.py" | xargs grep bar`. These are:
-- **Lexical-only**: Miss conceptual matches, require exact keyword knowledge
-- **Noisy**: Return too many or too few results
-- **Token-inefficient**: Raw grep output wastes context window on irrelevant matches
-- **Non-indexed**: Every query scans the entire codebase (slow on large repos)
-Semantic tools (ck --sem) solve these problems but agents don't use them by default because they're not native tools.
-## Enforcement Strategies
-### 1. System Prompt Rules (Weak)
-Add to agent system prompt / CLAUDE.md:
-```markdown
-## Search Policy
-- NEVER use raw `grep` for codebase exploration.
-- ALWAYS use `ck --sem` or `ck --hybrid` for conceptual searches.
-- `grep` is permitted ONLY for exact literal string matching (e.g., finding a specific error message).
-- Before any grep, consider: "Can I express this as a ck query?"
-```
-**Effectiveness**: Low-Medium. Depends on model compliance. Claude 4 Opus follows rules well; smaller models may ignore them. Costs zero infrastructure.
-### 2. MCP Tool Registration (Medium)
-Register ck as an MCP tool:
-```bash
-claude mcp add ck-search -s user -- ck --serve
-```
-The agent sees `ck_search`, `ck_get`, `ck_info`, `ck_reindex` as first-class tools alongside `bash` and `read`. If the prompt emphasizes preferring MCP tools, the agent may route code searches through ck.
-**Effectiveness**: Medium. Agent still has `bash` available. Needs prompt reinforcement. Best when combined with Strategy 1.
-### 3. Shell Wrapper Interception (Medium-Strong)
-Create a wrapper script that intercepts grep and routes semantic-looking queries to ck:
-```bash
-#!/bin/bash
-# ~/bin/grep (wrapper for agent's PATH)
-# Route to ck if query looks conceptual (multi-word, no obvious regex)
-if [[ "$*" =~ [[:space:]] ]] && [[ ! "$*" =~ [\^\$\.\*\[\]\\] ]]; then
-  if command -v ck &>/dev/null; then
-    ck --hybrid "$@" 2>/dev/null && exit 0  # without exec, a failing ck falls through to the real grep below
-  fi
-fi
-exec /usr/bin/grep "$@"
-```
-Place this in the agent's PATH before `/usr/bin`.
-**Risks**:
-- False positives: `grep "TODO: fix this"` gets intercepted but should be lexical
-- Breaks scripts that parse grep output format
-- Adding `--hybrid` changes output format (score fields, different line format)
-- Hard to distinguish "the agent wants grep" from "the agent typed something that looks semantic"
-**Mitigation**: Only wrap for known agent users, not system-wide. Use an explicit env var: `CK_ENFORCE=1 grep ...`
-### 4. Harness-Level Tool Routing (Strong)
-Modify the agent harness (e.g., lean-ctx bash tool) to inspect every bash command before execution:
-```python
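-import re
-def pre_exec_hook(command: str) -> str:
-    """Rewrite conceptual-looking grep commands to ck; pass everything else through."""
-    if re.match(r'^(grep|/usr/bin/grep|/bin/grep)\s', command):
-        # Extract pattern and path (accept the same grep spellings as the check above)
-        match = re.match(r'^(?:grep|/usr/bin/grep|/bin/grep)\s+(?:-[a-zA-Z]+\s+)*["\']?([^"\']+)["\']?\s+(.*)', command)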
-        if match:
-            pattern, path = match.groups()
-            # If pattern is multi-word (conceptual), route to ck
-            if ' ' in pattern and not re.search(r'[\^\$\.\*\[\]\\]', pattern):
-                return f'ck --hybrid "{pattern}" {path}'
-    return command  # pass through unchanged
-```
-**Effectiveness**: Strong. Catches all grep invocations. Can log/report non-compliance. Requires modifying harness code.
-### 5. Post-Hoc Validation (Weak)
-A checker that scans agent action logs and flags grep usage. Reactive — doesn't prevent the bad behavior, only reports it.
-```bash
-# Check agent logs for grep usage
-grep -c '"command": "grep' agent-session.log
-```
-## Recommended Approach
-**Three-layer defense for the ultimate-pi harness:**
-1. **Layer 1 (immediate)**: System prompt rules in AGENTS.md + install ck + register MCP
-2. **Layer 2 (medium-term)**: Add pre-exec hook to lean-ctx bash tool that warns/logs grep usage and suggests ck
-3. **Layer 3 (optional)**: Shell wrapper for known agent sessions with `CK_ENFORCE` env var
-## Open Questions
-- [ ] How does Claude Code's native `Grep` tool interact with custom MCP tools? Does it prefer its own?
-- [ ] Can MCP tools be marked as "preferred" or given higher priority?
-- [ ] What's the false-positive rate of shell interception on real-world agent queries?
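
Reviewer sketch for the "Post-Hoc Validation" strategy in the deleted `agent-search-enforcement.md`, which shows only a `grep -c` one-liner. The Python version below assumes the session log is JSON Lines with a string `command` field, a format inferred from that one-liner and not confirmed by the package. `find_grep_commands` is a hypothetical helper.

```python
import json
import shlex
from typing import Iterable, List


def find_grep_commands(log_lines: Iterable[str]) -> List[str]:
    """Return logged commands whose first word is grep (bare or path-qualified).

    Only the leading command is inspected, so `cat f | grep x` is not flagged.
    """
    flagged = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON records
        command = entry.get("command") if isinstance(entry, dict) else None
        if not isinstance(command, str):
            continue
        try:
            tokens = shlex.split(command)
        except ValueError:
            tokens = command.split()  # unbalanced quotes: fall back to naive split
        if tokens and tokens[0].rsplit("/", 1)[-1] == "grep":
            flagged.append(command)
    return flagged
```

Parsing each record avoids the one-liner's false hits on lines that merely mention grep inside another field.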

package/vault/wiki/concepts/agent-skills-ecosystem.md DELETED

@@ -1,74 +0,0 @@
----
-type: concept
-status: developing
-created: 2026-05-05
-tags:
-  - agent-skills
-  - ecosystem
-  - open-standard
-  - progressive-disclosure
-related:
-  - "[[superpowers-methodology]]"
-  - "[[agent-skills-pattern]]"
-  - "[[skill-first-architecture]]"
-  - "[[policy-engine-pattern]]"
----
-# Agent Skills Ecosystem
-## Definition
-The Agent Skills ecosystem is the open-standard marketplace and format for packaging reusable AI agent expertise as SKILL.md files. Originally developed by Anthropic, released as an open standard in October 2025, and adopted by all major agent platforms within weeks. As of May 2026: 490K+ skills across multiple marketplaces.
-## The SKILL.md Open Standard
-Every skill is a directory containing a `SKILL.md` file with:
-- **YAML frontmatter**: `name` (lowercase-hyphenated, ≤64 chars), `description` (≤1024 chars — the trigger), optional `allowed-tools`, `metadata`, `license`
-- **Markdown instructions**: What the agent should do when the skill activates
-Progressive disclosure architecture:
-1. **Discovery** (always loaded): Name + description only (~100 tokens per skill)
-2. **Activation** (on-demand): Full SKILL.md body loaded when task matches description
-3. **Execution** (on-demand): Scripts, reference files, templates loaded as needed
-## Marketplaces
-| Marketplace | Skills | Key Differentiator |
-|-------------|--------|-------------------|
-| **Skills.sh** (Vercel) | 83K+ | Curated quality, CLI-native install, Snyk security scanning, leaderboard |
-| **SkillsMP** | 400K+ | Volume leader, GitHub crawl, AI-powered semantic search |
-| **ClawHub** (OpenClaw) | ~10K+ | Open platform, hit by ClawHavoc malware campaign |
-## Installation
-Universal: `npx skills add owner/repo`
-Per-agent paths:
-- Claude Code: `.claude/skills/` (project) or `~/.claude/skills/` (personal)
-- Codex CLI: `.agents/skills/` or `.codex/skills/`
-- Cursor: `.cursor/skills/`
-- Gemini CLI: `.gemini/skills/`
-- GitHub Copilot: `.github/skills/`
-- Windsurf: `.windsurf/skills/`
-## Two Skill Types
-1. **Capability Uplift** — Gives agent abilities it doesn't have. Before the skill, agent can't do the task. Examples: Firecrawl (web scraping), Document Skills (PDF/DOCX creation), Webapp Testing (Playwright).
-2. **Encoded Preference** — Agent already knows how, but the skill encodes your team's specific way. Examples: Code review checklists, commit message formats, API conventions.
-## Security Risks
-Snyk's ToxicSkills study (Feb 2026) scanned 3,984 skills:
-- 36.8% had at least one security flaw
-- 13.4% contained critical-level issues
-- 76 skills carried confirmed malicious payloads
-- 91% of malicious skills combined prompt injection with traditional malware
-The ClawHavoc campaign (Jan-Feb 2026): 341 malicious skills on ClawHub distributing Atomic macOS Stealer.
-## Ecosystem Trajectory
-Zero to 490K skills in six months (Oct 2025 – Mar 2026). All major platforms adopted within weeks. The format's simplicity (anyone who can write Markdown can create a skill) drove adoption. Network effects accelerating: more skills → more agent users → more skill authors.
-## Relevance to Harness
-Our `.pi/skills/` system uses the same progressive disclosure pattern. The Agent Skills ecosystem validates that markdown-based skills are the right primitive — and that cross-agent portability is the winning strategy. We should consider SKILL.md compatibility for maximum reuse of the 490K+ ecosystem.
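
Reviewer sketch for the frontmatter limits stated in the deleted `agent-skills-ecosystem.md` (`name` lowercase-hyphenated and at most 64 chars, `description` at most 1024 chars). It validates an already-parsed frontmatter mapping so that no YAML library is needed. The exact character set allowed in `name` (digits are accepted here) is an assumption, and `validate_frontmatter` is a hypothetical helper, not part of any skills tooling.

```python
import re
from typing import Any, List, Mapping

MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 1024
# Assumed reading of "lowercase-hyphenated": lowercase alphanumeric words joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_frontmatter(meta: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    name = meta.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name is required")
    else:
        if len(name) > MAX_NAME_LENGTH:
            errors.append(f"name exceeds {MAX_NAME_LENGTH} characters")
        if not NAME_PATTERN.fullmatch(name):
            errors.append("name must be lowercase words separated by hyphens")
    description = meta.get("description")
    if not isinstance(description, str) or not description.strip():
        errors.append("description is required")
    elif len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return errors
```

Returning every problem at once, instead of raising on the first, suits a pre-install check that reports to a human reviewer.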

package/vault/wiki/concepts/agent-skills-pattern.md DELETED

@@ -1,68 +0,0 @@
----
-type: concept
-title: "Agent Skills Pattern (Progressive Disclosure)"
-created: 2026-05-01
-updated: 2026-05-01
-status: developing
-tags:
-  - harness
-  - skills
-  - context-engineering
-  - gemini-cli
-related:
-  - "[[harness-engineering-first-principles]]"
-  - "[[gemini-cli-architecture]]"
-sources:
-  - "[[Source: Gemini CLI Changelogs]]"
-  - "[[Source: LangChain - Anatomy of Agent Harness]]"
----
-# Agent Skills Pattern: Progressive Disclosure
-## What It Is
-Agent Skills is a harness-level primitive for **progressive disclosure**: skills are loaded on-demand via an activation mechanism rather than all at context start. This prevents context rot — the observed degradation in model performance as the context window fills with irrelevant tool definitions and instructions.
-## Why It Matters
-Too many tools or MCP servers loaded into context on agent start degrades performance _before_ the agent can start working. Skills solve this by loading only when needed:
-1. Agent starts with minimal context (core tools + system prompt)
-2. Agent analyzes task, determines which skills are relevant
-3. Agent calls `activate_skill` tool to load specific skill's instructions + tools
-4. Skill's context injected into current conversation
-5. Agent uses skill, then moves on (skill context may persist or be compacted)
-## Gemini CLI Implementation (v0.23+)
-- **v0.23 (Jan 2026)**: Experimental Agent Skills support via agentskills.io
-- **v0.24**: Built-in agent skills, `/skills install/uninstall`, `/agents refresh`
-- **v0.25**: `activate_skill` tool formalized, `pr-creator` skill, skills enabled by default
-- **v0.26**: `skill-creator` meta-skill (skills that create skills)
-- **v0.30**: SDK package enabling custom skills with dynamic system instructions
-- **v0.39**: `/memory inbox` for reviewing and patching skills extracted during sessions
-## Key Design Decisions
-1. **Frontmatter metadata**: Each skill has structured metadata describing when to activate
-2. **Activation tool**: Model decides when to call `activate_skill` based on task analysis
-3. **Skill inbox**: Extracted skills don't auto-install — human reviews first via `/memory inbox`
-4. **Skill-creator**: Meta-skill enables agent to create new skills from observed patterns
-## Ultimate-PI Current State
-We have a `.pi/skills/` directory with 16+ skills, but they all load at context start (no progressive disclosure). This follows the "delivery mechanism for context engineering" pattern but lacks the activation mechanism that prevents context rot.
-## Integration Path (P-F2)
-1. Add frontmatter to each skill: `activation_triggers`, `required_capabilities`, `token_budget`
-2. Add `activate_skill` tool to tool registry
-3. Implement skill registry that loads skills on-demand
-4. Add `/memory inbox` for reviewing AI-extracted patterns before they become permanent skills
-5. Implement skill-creator meta-skill for autonomous skill generation from observed failures
-## Relationship to Other Harness Primitives
-- **Context Compression**: Skills reduce the _need_ for compression by keeping context lean
-- **Subagents**: Skills can be loaded into subagents independently, each with relevant context
-- **Policy Engine**: Skill activation can be gated by policy (e.g., "never activate browser skill on production")
-- **Memory Systems**: Skills extracted from sessions feed into persistent memory (wiki in our case)
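
Reviewer sketch for the on-demand loading flow described in the deleted `agent-skills-pattern.md`: only names and descriptions are present at start, and a skill body enters the context when `activate_skill` is called. This is a minimal in-memory illustration, not Gemini CLI's or ultimate-pi's implementation. `Skill`, `SkillRegistry`, and `discovery_context` are hypothetical names; `activate_skill` borrows the tool name used in that file.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Skill:
    name: str
    description: str  # short trigger text, always visible to the model
    body: str         # full instructions, loaded only on activation


class SkillRegistry:
    def __init__(self, skills: Iterable[Skill]):
        self._skills: Dict[str, Skill] = {skill.name: skill for skill in skills}
        self.active: List[str] = []  # names of skills whose bodies are in context

    def discovery_context(self) -> str:
        """Cheap listing placed in the initial context: one line per skill."""
        return "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())

    def activate_skill(self, name: str) -> str:
        """Return the full body for injection; raises KeyError for unknown skills."""
        skill = self._skills[name]
        if name not in self.active:
            self.active.append(name)
        return skill.body
```

A policy gate, as mentioned in the file's last section, would fit as a check at the top of `activate_skill` before the body is returned.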