npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516) hide show

package/vault/wiki/sources/anthropic-prompt-best-practices.md ADDED Viewed

@@ -0,0 +1,100 @@
+---
+type: source
+status: ingested
+source_type: official-documentation
+title: "Anthropic Prompt Engineering Best Practices (Claude Opus 4.7 through Haiku 4.5)"
+author: "Anthropic"
+date_published: 2026-04-01
+date_fetched: 2026-05-01
+url: "https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-prompting-best-practices"
+confidence: high
+key_claims:
+  - "Claude Opus 4.7 interprets prompts more literally and explicitly than Opus 4.6"
+  - "Effort parameter (max/xhigh/high/medium/low) is the primary control knob replacing budget_tokens"
+  - "Adaptive thinking dynamically calibrates reasoning depth per step"
+  - "XML tags are the recommended structure format for complex prompts"
+  - "Long content at top + query at bottom improves performance up to 30%"
+  - "Claude Opus 4.7 has stronger default design aesthetic with specific house style"
+  - "Code review harnesses need explicit lowering of the reporting bar for Opus 4.7"
+tags:
+  - prompting
+  - anthropic
+  - claude
+  - model-specific
+  - harness-design
+created: 2026-05-02
+updated: 2026-05-02
+---# Anthropic Prompt Engineering Best Practices
+Official comprehensive prompt engineering guide for Claude's latest models (Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5). Single reference covering foundational techniques, output control, tool use, thinking, and agentic systems.
+## Model-Specific Key Findings
+### Claude Opus 4.7
+- **More literal instruction following**: Will not silently generalize instructions; precision over thrash
+- **Response length calibrated to task complexity**: Shorter on lookups, longer on analysis
+- **Effort parameter critical**: `xhigh` for coding/agentic, `high` minimum for intelligence-sensitive
+- **Tool use triggering**: Uses tools LESS than Opus 4.6; needs explicit guidance or higher effort
+- **User-facing progress updates**: Better native updates; remove scaffolding that forces interim messages
+- **Tone shift**: More direct/opinionated, less validation-forward, fewer emoji than Opus 4.6
+- **Subagent spawning**: Tends to spawn FEWER subagents by default
+- **Default frontend aesthetic**: Warm cream (~`#F4F1EA`), serif display type, terracotta/amber accent
+- **Code review**: Better at finding bugs but follows "only high-severity" filters too faithfully
+- **Design steering**: "Propose 4 directions first" pattern breaks the default
+### Claude Opus 4.6
+- **Adaptive thinking**: Replaces budget_tokens; effort controls depth
+- **Overthinking risk**: Excessive upfront exploration; may gather extensive context without prompting
+- **Subagent predilection**: Strong tendency to spawn subagents; may overuse
+- **Better vision**: Improved multi-image processing, computer use
+- **Prefilled responses deprecated**: Last assistant turn prefill returns 400 on Mythos Preview
+### Claude Sonnet 4.6
+- **Effort default**: `high` (was no effort parameter in Sonnet 4.5)
+- **Recommended settings**: `medium` for most apps, `low` for latency-sensitive
+- **Adaptive thinking**: Best for autonomous multi-step agents, computer use, bimodal workloads
+- **64k max_tokens recommended** at medium/high effort
+## General Principles (All Models)
+### Prompt Structure
+- **XML tags preferred**: `<instructions>`, `<context>`, `<examples>`, `<input>` — unambiguous parsing
+- **Long content at top, query at bottom**: Up to 30% quality improvement
+- **Role setting**: Even single sentence makes difference
+- **Examples in `<example>` tags**: 3-5 examples; diverse, structured
+- **Be clear and direct**: "Golden rule — show prompt to colleague; if they'd be confused, Claude will be too"
+- **Provide context/why**: Explaining motivation helps understanding
+- **Prefer general instructions over prescriptive steps**: Claude's reasoning exceeds human-prescribed steps
+### Tool Use
+- **Explicit direction needed**: "can you suggest" vs "implement" distinction
+- **Proactive action default**: `<default_to_action>` block for autonomous behavior
+- **Conservative action default**: `<do_not_act_before_instructions>` for safety-critical
+- **Parallel tool calling**: Maximize by default, steerable
+- **Older aggressive prompts cause overtriggering**: Dial back "CRITICAL: You MUST" language
+### Thinking & Reasoning
+- **Adaptive thinking**: Dynamic calibration; higher effort = more thinking
+- **Steerable**: "Thinking adds latency; only use when it will meaningfully improve quality"
+- **Self-check**: "Before you finish, verify your answer against [criteria]"
+- **Multishot with `<thinking>` tags**: Show reasoning pattern in examples
+### Agentic Systems
+- **Context awareness**: Model tracks remaining context window tokens
+- **State tracking across windows**: Save progress to files, use git, structured formats
+- **Multi-window workflows**: First window sets up framework, future windows iterate
+- **Balancing autonomy/safety**: Reversible actions OK, destructive actions need confirmation
+- **Research mode**: Competing hypotheses, confidence tracking, hypothesis trees
+### Output Control
+- **Tell what to do, not what not to do**: "Write in flowing prose" not "Don't use markdown"
+- **Match prompt style to desired output**: Remove markdown from prompt to reduce markdown in output
+- **XML format indicators**: `<smoothly_flowing_prose_paragraphs>` tags
+- **Avoid overengineering**: Don't add features, refactors, or abstractions beyond what was asked
+- **Minimize hallucinations**: `<investigate_before_answering>` block
+### Frontend Design
+- **Don't settle for "AI slop"**: Distinctive typography, cohesive themes, purposeful motion
+- **Opus 4.7 default**: Warm cream + serif + terracotta; steer via concrete specs or option proposal
+- **Frontend aesthetics block**: Explicit guidance against generic patterns

package/vault/wiki/sources/anthropic2026-harness-design.md ADDED Viewed

@@ -0,0 +1,63 @@
+---
+type: source
+source_type: blog
+title: "Harness Design for Long-Running Application Development"
+author: "Prithvi Rajasekaran, Anthropic Engineering"
+date_published: 2026-03-24
+url: "https://www.anthropic.com/engineering/harness-design-long-running-apps"
+confidence: high
+key_claims:
+  - "Self-evaluation is fundamentally broken: agents praise their own mediocre work"
+  - "Separating generator from evaluator (GAN-inspired) dramatically improves output quality"
+  - "Sprint contracts: agree on 'done' before writing code"
+  - "Harness simplification: as models improve, remove non-load-bearing components"
+  - "Cost: 20x more expensive but dramatically better quality"
+tags:
+  - harness
+  - anthropic
+  - multi-agent
+  - evaluator
+  - generator
+created: 2026-04-30
+updated: 2026-04-30
+status: ingested
+---# Harness Design for Long-Running Application Development
+Anthropic Engineering, March 2026. Prithvi Rajasekaran.
+## Three-Agent Architecture
+**Planner**: Takes 1-4 sentence prompt → full product spec. Stays at product-context level, avoids granular technical details to prevent cascading errors.
+**Generator**: Implements one feature at a time. Self-evaluates after each sprint before handing off.
+**Evaluator**: Uses Playwright MCP to interactively test the running application. Grades against explicit criteria (design quality, originality, craft, functionality). Each criterion has a hard threshold — if any falls below, sprint fails.
+## Critical Findings
+### Self-Evaluation Is Broken
+When asked to evaluate their own work, agents "respond by confidently praising the work — even when, to a human observer, the quality is obviously mediocre." Separating generator from evaluator is essential.
+### Context Anxiety
+Models start wrapping up prematurely when approaching context limit. Compaction alone insufficient — context resets with structured handoffs required for Sonnet 4.5. Opus 4.5+ largely fixed this.
+### Evaluator Tuning
+Claude is "a poor QA agent out of the box" — identifies flaws then talks itself out of flagging them. Needs explicit tuning to be skeptical. Multiple rounds of development loop required.
+### Sprint Contracts
+Before each sprint, generator and evaluator negotiate a contract defining what "done" looks like. Generator proposes, evaluator reviews. They iterate until agreement. Communication via files.
+### Harness Simplification Principle
+"Every component in a harness encodes an assumption about what the model can't do on its own, and those assumptions are worth stress testing." When Opus 4.6 arrived, sprint construct was removed (model handled decomposition natively). Evaluator became conditional — worth the cost only when task sits beyond what the model does reliably solo.
+## Results
+| Harness | Duration | Cost | Quality |
+|---------|----------|------|---------|
+| Solo | 20 min | $9 | Core feature broken |
+| Full harness | 6 hr | $200 | All features working, AI integration |
+## Key Takeaway
+"The space of interesting harness combinations doesn't shrink as models improve. Instead, it moves."

package/vault/wiki/sources/barrel-files-tkdodo.md ADDED Viewed

@@ -0,0 +1,38 @@
+---
+type: source
+status: ingested
+source_type: article
+author: "Dominik Dorfmeister (TkDodo)"
+date_published: 2024-07-26
+url: "https://tkdodo.eu/blog/please-stop-using-barrel-files"
+confidence: high
+key_claims:
+  - "Barrel files cause circular imports when internal modules import from the barrel"
+  - "Next.js projects saw 11K → 3.5K module load reduction (68%) by removing barrels"
+  - "Barrel files slow development server startup by 5-10 seconds in large projects"
+  - "Barrels are appropriate only for library entry points (package.json `main` field)"
+tags:
+  - typescript
+  - barrel-files
+  - code-organization
+  - performance
+created: 2026-05-02
+updated: 2026-05-02
+---# Please Stop Using Barrel Files
+Source: TkDodo's blog (Dominik Dorfmeister), July 2024. Author of TanStack Query.
+## Summary
+Argues against the widespread practice of using `index.ts` barrel files to re-export from directories. Documents real-world performance problems and circular import issues caused by barrel files in production Next.js applications.
+## Key Arguments
+**Circular imports**: When a module inside a directory imports from its own barrel (`import { X } from '@/dir'`), it creates a circular dependency. ESLint `import/no-cycle` can catch some but not all cases.
+**Development speed**: Barrel files force JavaScript to load and parse every module in the barrel synchronously, even if only one export is needed. A real Next.js project saw module count drop from 11K to 3.5K (68% reduction) after removing barrels, cutting startup time from 5-10 seconds down significantly.
+**Next.js `optimizePackageImports`**: Automatically transforms barrel imports to direct module paths, but only works if the barrel is a "pure" re-export file with no other code.
+**When barrels are OK**: Library entry points only (the `main` field in `package.json`). For application code, direct imports are preferred.

package/vault/wiki/sources/birth-of-unix-kernighan-interview.md ADDED Viewed

@@ -0,0 +1,57 @@
+---
+type: source
+source_type: interview-podcast
+title: "The Birth of UNIX — Brian Kernighan on Bell Labs"
+author: "Brian Kernighan (interviewed by Adam Gordon Bell)"
+date_published: 2020-11-01
+url: "https://corecursive.com/brian-kernighan-unix-bell-labs1/"
+confidence: high
+key_claims:
+  - "Ken Thompson built the first working Unix in 3 weeks"
+  - "Bell Labs culture: shared machine, shared source tree, everyone on same filesystem"
+  - "The Unix Room as collaborative physical space"
+  - "Ken Thompson reversed-engineered a typesetter in hours: disassembler, assembler, B interpreter"
+  - "Pipes enabled Cambrian explosion of composable tools"
+  - "Community built around shared machine and shared source — `who` command as social tool"
+  - "The only rule: you changed it last, it's yours"
+tags: [unix, bell-labs, ken-thompson, brian-kernighan, history]
+---
+# The Birth of UNIX — Brian Kernighan Interview
+## Ken Thompson's Productivity
+- Built first working Unix in 3 weeks while wife was on vacation.
+- Reverse-engineered a typesetter in hours: wrote a disassembler for an unfamiliar CPU from binary code, then an assembler, then a B language interpreter — all in about a day.
+- Brian Kernighan: "For Ken, it was just like breathing. Oh, okay, done. Next."
+## The Unix Room Culture
+- Physical shared space on Bell Labs 6th floor with a PDP-11 and teletypes.
+- "If you wanted, you could go sit in your office and think deep thoughts... then come back to the common space when you wanted to."
+- Shared machine + shared filesystem: everyone could see everyone's source code.
+- "The only real rule: you changed it last, it's yours."
+- `who` command as community builder — showed who was logged in and when they last acted.
+- 10-kilo chocolate bars on the table, Private Eye magazine from Dennis Ritchie.
+## The Pipes Breakthrough
+- Doug McIlroy pushed for program composition for years.
+- Ken Thompson implemented it. The pipe symbol `|` "just clicked instantly."
+- Within days: "frenzy of fixing up programs so that they would work properly in pipelines."
+- Sort was repackaged to read stdin/write stdout — pattern used daily by millions since.
+## Kernighan on Modern Programming
+- "I found it easier to program when I was trying to figure out the logic for myself rather than trying to figure out where in the infinite stack of documentation was the function I needed."
+- "Too much of today's programming is more like looking it up."
+## Richard Hamming's Influence
+- "He would reserve Friday afternoons for thinking great thoughts."
+- Asked chemists: "Could your work lead to a Nobel Prize? If not, why are you working on it?"
+- But the Unix work itself didn't seem important at the time — it was just making programming easier for themselves.
+## Kernighan's Thesis as Tool-Building Metaphor
+- His PhD thesis formatting program: the first 500 cards were the program, the remaining 5,500 were the thesis. "It's building tools that let you do things, and the tools are often some kind of specialized language."

package/vault/wiki/sources/bockeler2026-harness-engineering.md ADDED Viewed

@@ -0,0 +1,69 @@
+---
+type: source
+source_type: blog
+title: "Harness Engineering for Coding Agent Users"
+author: "Birgitta Böckeler, Martin Fowler"
+date_published: 2026-04-02
+url: "https://martinfowler.com/articles/harness-engineering.html"
+confidence: high
+key_claims:
+  - "Feedforward (guides) + Feedback (sensors) = harness control framework"
+  - "Computational controls: deterministic, fast (tests, linters, type checkers)"
+  - "Inferential controls: semantic, probabilistic (AI code review, LLM-as-judge)"
+  - "Three regulation categories: Maintainability, Architecture Fitness, Behaviour"
+  - "Behavioural harness (functional correctness) remains unsolved"
+  - "Ashby's Law: harness must match system variety; topologies reduce variety"
+tags:
+  - harness
+  - feedforward
+  - feedback
+  - martin-fowler
+  - maintainability
+created: 2026-04-30
+updated: 2026-04-30
+status: ingested
+---# Harness Engineering for Coding Agent Users
+Birgitta Böckeler, Martin Fowler. April 2026.
+## The Framework
+### Feedforward Controls (Guides)
+Anticipate agent behavior, steer BEFORE it acts:
+- AGENTS.md, skills, rules, how-to guides
+- Language servers, CLIs, scripts, codemods
+### Feedback Controls (Sensors)
+Observe AFTER agent acts, enable self-correction:
+- AI code review agents
+- Static analysis, linters, logs, browser testing
+### Computational vs Inferential
+| Type | Speed | Reliability | Examples |
+|------|-------|-------------|----------|
+| Computational | ms-sec | Deterministic | Tests, linters, type checkers, structural analysis |
+| Inferential | sec-min | Probabilistic | AI code review, LLM-as-judge, semantic analysis |
+## Three Regulation Categories
+1. **Maintainability Harness**: Internal code quality. Computational sensors catch structural issues reliably. LLMs partially address semantic issues but expensively.
+2. **Architecture Fitness Harness**: Architecture characteristics. Fitness functions + observability standards.
+3. **Behaviour Harness**: Functional correctness. **THE UNSOLVED PROBLEM.** Current approach (AI-generated tests + manual testing) insufficient.
+## Harnessability
+Not every codebase is equally harnessable. Strongly typed languages, clear module boundaries, framework abstractions increase harnessability. "Ambient affordances" — structural properties that make the environment legible to agents.
+## Harness Templates
+Pre-bundled guides + sensors for service topologies (CRUD, event processor, data dashboard). Ashby's Law: topology narrows the solution space, making comprehensive harnesses achievable.
+## Key Insight
+> "The human's job is to STEER the agent by iterating on the harness. Whenever an issue happens multiple times, the feedforward and feedback controls should be improved."
+Harness engineering is an ongoing practice, not a one-time configuration.

package/vault/wiki/sources/cast-code-chunking-paper.md ADDED Viewed

@@ -0,0 +1,50 @@
+---
+type: source
+status: ingested
+source_type: research-paper
+author: Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu (CMU)
+date_published: 2025-06-18
+url: https://arxiv.org/abs/2506.15655
+confidence: high
+key_claims:
+  - "AST-based chunking (cAST) boosts Recall@5 by 4.3 points on RepoEval retrieval and Pass@1 by 2.67 on SWE-bench generation"
+  - "Existing line-based chunking heuristics break semantic structures, splitting functions or merging unrelated code"
+  - "cAST recursively breaks large AST nodes into smaller chunks and merges sibling nodes while respecting size limits"
+  - "Structure-aware chunking generates self-contained, semantically coherent units across programming languages"
+tags:
+  - chunking
+  - AST
+  - code-rag
+  - embedding
+  - arxiv
+created: 2026-05-02
+updated: 2026-05-02
+---# cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
+## Summary
+Peer-reviewed paper (arXiv:2506.15655, June 2025) from CMU researchers proposing AST-based chunking for code RAG pipelines. The core insight: line-based chunking breaks semantic structures, splitting functions mid-body or merging unrelated code. cAST parses code into ASTs and uses recursive split-then-merge to create self-contained, semantically coherent chunks.
+## Key Details
+### Problem
+- RAG pipelines split documents into retrievable units (chunks)
+- Line-based heuristics often break semantic structures
+- Splitting functions or merging unrelated code degrades generation quality
+### Solution: cAST
+- Parse code into Abstract Syntax Tree
+- Recursively break large AST nodes into smaller chunks
+- Merge sibling nodes while respecting size limits
+- Uses non-whitespace character count (not line count) for sizing
+- Greedy window assignment with merge-adjacent optimization
+### Results
+- Recall@5: +4.3 points on RepoEval retrieval
+- Pass@1: +2.67 on SWE-bench generation
+- Works across programming languages
+## Relevance to Our Implementation
+This is the foundational paper for AST-aware chunking. The `code-chunk` library (supermemoryai) implements this algorithm in production. We should adopt AST-aware chunking via tree-sitter (already available in lean-ctx) rather than naive text splitting.

package/vault/wiki/sources/ck-semantic-search.md ADDED Viewed

@@ -0,0 +1,78 @@
+---
+type: source
+source_type: official-documentation
+title: "ck: Hybrid Code Search"
+author: BeaconBay
+date_published: 2025-08-30
+url: https://beaconbay.github.io/ck/
+repo: https://github.com/BeaconBay/ck
+confidence: high
+key_claims:
+  - "ck is a grep-compatible hybrid code search tool combining BM25 lexical search with embedding-based semantic search"
+  - "~1M LOC indexed in under 2 minutes, sub-500ms queries"
+  - "Completely offline: no code or queries sent to external services"
+  - "Built-in MCP server for AI agent integration (ck --serve)"
+  - "Supports 80+ languages via tree-sitter chunking"
+tags:
+  - code-search
+  - semantic-search
+  - grep
+  - mcp
+  - rust
+related:
+  - "[[ck-tool]]"
+  - "[[hybrid-code-search]]"
+  - "[[Research: semantic code search tools]]"
+created: 2025-08-30
+updated: 2026-04-30
+status: ingested
+---# ck (seek): Hybrid Code Search
+## Summary
+ck is a Rust-based hybrid code search tool that fuses lexical (BM25/grep) precision with embedding-based semantic recall, then re-ranks results using Reciprocal Rank Fusion (RRF). It positions itself as a drop-in grep replacement with added semantic capabilities.
+## What It Contributes
+**Primary contribution to AI coding agents**: ck provides a grep-compatible CLI that agents can use directly (`ck --sem "error handling" src/`) while also serving as an MCP server for deeper integration. The MCP tools (`ck_search`, `ck_get`, `ck_info`, `ck_reindex`) give agents first-class access to semantic code search without parsing CLI output.
+## Key Capabilities
+| Capability | Details |
+|---|---|
+| **Lexical Search** | BM25-based, grep-compatible flags (-n, -A, -B, -C, -r, -l, -i, -w) |
+| **Semantic Search** | `ck --sem "query"` — embedding-based, finds by concept not keywords |
+| **Hybrid Search** | `ck --hybrid "query"` — RRF fusion of lexical + semantic results |
+| **TUI Mode** | `ck-tui` — interactive terminal interface with live results |
+| **Editor Integration** | VSCode/Cursor extension (`code --install-extension ck-search`) |
+| **MCP Server** | `ck --serve` — Model Context Protocol for AI agent integration |
+| **Incremental Indexing** | Chunk-level re-indexing: only re-embeds changed files |
+## Installation
+```bash
+# From NPM (recommended)
+npm install -g @beaconbay/ck-search
+# From crates.io
+cargo install ck-search
+# MCP setup for Claude Code
+claude mcp add ck-search -s user -- ck --serve
+```
+## Limitations (Documented)
+1. **No code-aware embeddings**: Uses generic text embeddings (fastembed), not code-specialized models. Structural patterns may be missed.
+2. **80 language max**: Tree-sitter chunking covers 80 languages. Unsupported languages fall back to line-based chunking.
+3. **No custom model training**: Pre-trained models only. Cannot fine-tune for domain-specific codebases.
+4. **HuggingFace cache control**: Cache location controlled by HF env vars (`$HF_HOME`), no ck-specific config.
+5. **Memory**: 4-8GB RAM recommended for large codebases (10M+ LOC).
+6. **Result pagination**: Max 100 results per page. Exhaustive search requires cursor-based pagination.
+7. **No team/cloud sync**: Local-only indexes. No shared or remote indexes.
+8. **No AST-level understanding**: Chunking is tree-sitter-based, but embeddings are text, not AST-aware.
+## Confidence Assessment
+**High confidence** for feature claims — all verified against official documentation and GitHub repo. The limitations section is unusually thorough for a young project (transparency is a good signal). The tool is actively maintained (last commit within days as of April 2026). Stars growth: 1,572 in ~8 months suggests strong community validation.

package/vault/wiki/sources/claude-code-architecture-karaxai-2026.md ADDED Viewed

@@ -0,0 +1,71 @@
+---
+type: source
+status: ingested
+source_type: blog
+title: "How Claude Code Actually Works: A Systems-Level Deep Dive"
+author: "KaraxAI"
+date_published: 2026-03-19
+url: "https://karaxai.com/posts/how-claude-code-works-systems-deep-dive/"
+confidence: medium
+tags: [claude-code, architecture, CLAUDE.md, agent-loop, skills, plugins, MCP, subagents, hooks]
+key_claims:
+  - "Claude Code has 82,000+ GitHub stars and handles millions of coding sessions"
+  - "CLAUDE.md is injected into user messages in <system-reminder> tags, every turn — not in system prompt"
+  - "96% compliance with 5 conditional rule files (30 lines each) vs 92% with single 150-line CLAUDE.md"
+  - "Three memory systems: CLAUDE.md (reliable), auto-memory (200-line limit, lossy), session memory (lossy)"
+  - "Auto-compaction at ~83.5% of 200K window, ~85% payload reduction"
+  - "Skills use progressive disclosure: 100 tokens at startup, full body on-demand"
+  - "Subagents get fresh 200K context, only summary returns, cannot spawn own subagents"
+  - "Hooks achieve 100% compliance; CLAUDE.md rules achieve ~92%"
+  - "Deliberately no embeddings: 'agentic search generally works better' — Boris Cherny"
+  - "The model is the commodity; the agent is the product"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Claude Code Systems Deep-Dive (KaraxAI, 2026)
+## Source Summary
+Comprehensive technical walkthrough published March 2026. Covers the full stack: context assembly, agentic loop, MCP, plugins. Based on reverse-engineered internals from mitmproxy interception, npm tarball analysis, and systematic prompt extraction. Notable for providing specific compliance numbers and the direct quote from Claude Code's creator about rejecting embeddings.
+## CLAUDE.md Loading Hierarchy
+```
+Global (~/.claude/CLAUDE.md) → Enterprise → Project → Local → Notebook (cursor rules)
+```
+All tiers are additive. When instructions conflict, more specific (local) wins. Conditional rules via YAML frontmatter (`match: "*.test.ts"`) since v1.0.16.
+## Agentic Loop
+Single-threaded. Model receives context → produces response → if tool calls, execute, append to history, call again → if `stop_reason === "end_turn"`, stop. Between iterations: permission enforcement (hooks → deny rules → allow rules → ask rules → permission mode), context monitoring (auto-compaction at ~83.5%), state re-injection (CLAUDE.md re-sent every turn), mid-task steering (async dual-buffer queue).
+## Context Compression
+At ~167K/200K tokens, auto-compaction triggers. Summary in `<summary>` tags. All prior messages dropped. ~85% reduction (167K → ~25K). Lossy: old file contents, tool outputs lost; new summary + last 5 messages + CLAUDE.md survive.
+## Skills
+Progressive disclosure: scans `.claude/skills/` and `~/.claude/skills/`, loads only `name` + `description` (~100 tokens each) into `<available_skills>` block. Full content loads on invocation via Skill tool. Skills can include supporting files, restrict tools, spawn subagents. Built-in skills (`/simplify`, `/review`, `/batch`, `/loop`, `/debug`) are prompt-based, not hardcoded.
+## Plugins
+Directory with `.claude-plugin/plugin.json` manifest. Bundles any combination of: skills, agents, hooks, MCP servers, commands, CLAUDE.md. Namespacing: `/my-plugin:hello`. Agent override: plugin can replace main agent's system prompt. 9,000+ plugins across registries. Official marketplace ships built-in.
+## No Embeddings
+> "Early versions used RAG + a local vector db, but we found pretty quickly that agentic search generally works better." — Boris Cherny, Claude Code creator
+Search hierarchy: Glob (file path matching, near-zero token cost) → Grep (ripgrep, regex-powered) → Read (full file load, reserved for confirmed-relevant files). Explore subagent on Haiku for deep exploration.
+## Hooks
+Deterministic escape hatch. Shell commands fire on lifecycle events. Exit codes: 0 = allow, 2 = block (stderr fed to Claude), other = non-blocking error. CLAUDE.md ~92% compliance. Hooks 100% for matched conditions.
+## Key Quotes
+> "The model is the commodity; the agent is the product."
+> "Agentic search generally works better." — Boris Cherny
+> "CLAUDE.md content is injected into user messages, wrapped in <system-reminder> XML tags. Every turn. Not once at session start — every single API call re-sends it."

package/vault/wiki/sources/claude-code-architecture-qubytes-2026.md ADDED Viewed

@@ -0,0 +1,50 @@
+---
+type: source
+status: ingested
+source_type: blog
+title: "Inside Claude Code: The Architecture That Makes AI Actually Do the Work"
+author: "Vijendra (The Neural Blueprint / Qubytes)"
+date_published: 2026-04-30
+url: "https://qubytes.substack.com/p/claude-code-architecture-explained"
+confidence: medium
+tags: [claude-code, architecture, agent-loop, compaction, hooks, subagents]
+key_claims:
+  - "Claude Code is a while-loop surrounded by serious infrastructure"
+  - "Five critical subsystems: Agent Loop, Permission System, Tools & Execution Environment, State & Persistence, Compaction Pipeline"
+  - "Compaction pipeline is the most underappreciated component — five layers, forked subagent, structured summary"
+  - "Hooks are the enterprise integration surface"
+  - "Subagents enable horizontal scaling of reasoning"
+  - "Safety as a subsystem, not an afterthought"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Claude Code Architecture (Qubytes, 2026)
+## Source Summary
+Technical deep-dive by Vijendra (The Neural Blueprint) analyzing Claude Code as a layered architecture. Published April 30, 2026 — same day as Cursor's harness evolution blog. Synthesized from the leaked source code and official documentation.
+## Five Subsystems
+### 1. Agent Loop
+The heart. Orchestrates everything: assembles context window, dispatches requests, routes tool-use responses, commits state. Feedback controller, not a pipeline. Non-deterministic iterations driven by task complexity.
+### 2. Permission System
+First-class architectural concern. Sits between agent loop and tool execution. ML-based auto classifier with 7 permission modes. Diamond-shaped decision node: deny sends feedback to loop, accept lets execution proceed.
+### 3. Tools & Execution Environment
+Built-in tools (file read/write, bash, grep, glob) + MCP extensions. All tool execution runs through Shell Sandbox. Remote execution backends (local/cloud/remote).
+### 4. State & Persistence
+Append-oriented session transcript. Not just logging — substrate for resume, fork, rewind. CLAUDE.md + memory inject persistent project context. Sidechain transcripts for subagent interactions, preventing context pollution.
+### 5. Compaction Pipeline
+Five layers: forked subagent produces ~6,500 token structured summary. Preserves: last 5 file attachments, active skills, plan state, tool deltas. "Structured extraction followed by selective reconstruction — not summarization."
+## Key Quotes
+> "The core agent loop — assemble context, call the model, receive a tool request, execute it, repeat — is conceptually simple. The real engineering genius lives in everything around that loop."
+> "Context is a managed resource, not an infinite buffer."
+> "If you can't answer: How does your permission system work? What's your compaction strategy? Can I hook into the lifecycle? How does subagent delegation handle context isolation? — you're not looking at a production-ready agentic system."

package/vault/wiki/sources/claude-code-architecture-vila-lab-2026.md ADDED Viewed

@@ -0,0 +1,64 @@
+---
+type: source
+status: ingested
+source_type: academic-paper
+title: "Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems"
+author: "Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen"
+date_published: 2026-04-14
+url: "https://arxiv.org/abs/2604.14228"
+confidence: high
+tags: [claude-code, agent-architecture, source-code-analysis, design-principles]
+key_claims:
+  - "Claude Code architecture centers on a simple while-loop with four surrounding subsystems"
+  - "Five human values motivate the architecture: human decision authority, safety, reliable execution, capability amplification, contextual adaptability"
+  - "Thirteen design principles trace from values to specific implementation choices"
+  - "Core subsystems: permission system with ML classifier, five-layer compaction pipeline, four extensibility mechanisms, subagent delegation with worktree isolation"
+  - "Comparison with OpenClaw reveals how same design questions produce different answers under different deployment contexts"
+  - "Six open design directions identified for future agent systems"
+created: 2026-05-02
+updated: 2026-05-02
+---
+# Dive into Claude Code (VILA-Lab, 2026)
+## Source Summary
+Academic paper by researchers at VILA-Lab analyzing Claude Code's TypeScript source code (publicly available after accidental leak). Reverse-engineers the complete architecture from 510K+ lines of TypeScript. The most comprehensive architectural analysis of Claude Code available.
+## Architecture Components
+### Core Loop
+The center is a simple `while`-loop: assemble context → call model → receive tool request → execute → repeat. Most of the 510K lines lives in systems around this loop, not in the loop itself.
+### Five Human Values → 13 Design Principles
+1. **Human Decision Authority**: Permission system, plan mode, manual approval gates
+2. **Safety and Security**: ML-based auto classifier, sandboxing, permission modes
+3. **Reliable Execution**: Compaction for long sessions, checkpointing, error recovery
+4. **Capability Amplification**: MCP, subagents, skills, plugins
+5. **Contextual Adaptability**: CLAUDE.md hierarchy, conditional rules, dynamic context loading
+### Four Extensibility Mechanisms
+1. **MCP** (Model Context Protocol): Open standard for tool connections. JSON-RPC 2.0, stdio and HTTP transports. Donated to Linux Foundation Dec 2025. Adopted by OpenAI, Google, GitHub, JetBrains.
+2. **Plugins**: Distribution layer bundling skills + agents + hooks + MCP. 9,000+ ecosystem. Namespaced, versioned.
+3. **Skills**: Progressive disclosure. Name+description at startup, full body on demand.
+4. **Hooks**: Deterministic lifecycle events. Exit-code semantics for allow/deny.
+### Comparison with OpenClaw
+OpenClaw is a multi-channel personal assistant gateway. Same design questions, different answers due to different deployment context: per-action classification vs perimeter-level access control, single CLI loop vs embedded runtime in gateway control plane, context-window extensions vs gateway-wide capability registration.
+## Six Open Design Directions
+1. Cross-agent state sharing
+2. Long-horizon task decomposition
+3. Agent-to-agent negotiation protocols
+4. Formal verification of agent safety
+5. Energy-aware scheduling
+6. Multi-modal grounding in software engineering
+## Relevance to Our Harness
+This paper provides the foundational framework for understanding Claude Code as a harness architecture. The "five human values → 13 design principles → implementation choices" methodology is directly applicable to our own harness design documentation. The comparison with OpenClaw validates our multi-source research approach (Cursor vs Antigravity vs Claude Code — different deployment contexts surface different design answers).
+## Key Quotes
+> "The core of the system is a simple while-loop that calls the model, runs tools, and repeats. Most of the code, however, lives in the systems around this loop."
+> "Our analysis identifies five human values, philosophies, and needs that motivate the architecture and traces them through thirteen design principles to specific implementation choices."