ultimate-pi 0.1.0 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (509) hide show
  1. package/.agents/skills/ck-search/SKILL.md +99 -0
  2. package/.agents/skills/defuddle/SKILL.md +90 -0
  3. package/.agents/skills/find-skills/SKILL.md +142 -0
  4. package/.agents/skills/firecrawl/SKILL.md +150 -0
  5. package/.agents/skills/firecrawl/rules/install.md +82 -0
  6. package/.agents/skills/firecrawl/rules/security.md +26 -0
  7. package/.agents/skills/firecrawl-agent/SKILL.md +57 -0
  8. package/.agents/skills/firecrawl-build-interact/SKILL.md +67 -0
  9. package/.agents/skills/firecrawl-build-onboarding/SKILL.md +102 -0
  10. package/.agents/skills/firecrawl-build-onboarding/references/auth-flow.md +39 -0
  11. package/.agents/skills/firecrawl-build-onboarding/references/project-setup.md +20 -0
  12. package/.agents/skills/firecrawl-build-onboarding/references/sdk-installation.md +17 -0
  13. package/.agents/skills/firecrawl-build-scrape/SKILL.md +68 -0
  14. package/.agents/skills/firecrawl-build-search/SKILL.md +68 -0
  15. package/.agents/skills/firecrawl-crawl/SKILL.md +58 -0
  16. package/.agents/skills/firecrawl-download/SKILL.md +69 -0
  17. package/.agents/skills/firecrawl-interact/SKILL.md +83 -0
  18. package/.agents/skills/firecrawl-map/SKILL.md +50 -0
  19. package/.agents/skills/firecrawl-parse/SKILL.md +61 -0
  20. package/.agents/skills/firecrawl-scrape/SKILL.md +68 -0
  21. package/.agents/skills/firecrawl-search/SKILL.md +59 -0
  22. package/.agents/skills/obsidian-bases/SKILL.md +299 -0
  23. package/.agents/skills/obsidian-markdown/SKILL.md +237 -0
  24. package/.agents/skills/posthog-analyst/SKILL.md +306 -0
  25. package/.agents/skills/posthog-analyst/evals/evals.json +23 -0
  26. package/.agents/skills/wiki/SKILL.md +215 -0
  27. package/.agents/skills/wiki/references/css-snippets.md +122 -0
  28. package/.agents/skills/wiki/references/frontmatter.md +107 -0
  29. package/.agents/skills/wiki/references/git-setup.md +58 -0
  30. package/.agents/skills/wiki/references/mcp-setup.md +149 -0
  31. package/.agents/skills/wiki/references/modes.md +259 -0
  32. package/.agents/skills/wiki/references/plugins.md +96 -0
  33. package/.agents/skills/wiki/references/rest-api.md +124 -0
  34. package/.agents/skills/wiki-autoresearch/SKILL.md +211 -0
  35. package/.agents/skills/wiki-autoresearch/references/program.md +75 -0
  36. package/.agents/skills/wiki-fold/SKILL.md +204 -0
  37. package/.agents/skills/wiki-fold/references/fold-template.md +133 -0
  38. package/.agents/skills/wiki-ingest/SKILL.md +288 -0
  39. package/.agents/skills/wiki-lint/SKILL.md +183 -0
  40. package/.agents/skills/wiki-query/SKILL.md +176 -0
  41. package/.agents/skills/wiki-save/SKILL.md +128 -0
  42. package/.ckignore +41 -0
  43. package/.env.example +9 -0
  44. package/.github/banner-v2.png +0 -0
  45. package/.github/workflows/lint.yml +33 -0
  46. package/.github/workflows/publish-github-packages.yml +35 -0
  47. package/.github/workflows/publish-npm.yml +32 -0
  48. package/.pi/SYSTEM.md +107 -40
  49. package/.pi/agents/pi-pi/agent-expert.md +205 -0
  50. package/.pi/agents/pi-pi/cli-expert.md +47 -0
  51. package/.pi/agents/pi-pi/config-expert.md +67 -0
  52. package/.pi/agents/pi-pi/ext-expert.md +53 -0
  53. package/.pi/agents/pi-pi/keybinding-expert.md +123 -0
  54. package/.pi/agents/pi-pi/pi-orchestrator.md +103 -0
  55. package/.pi/agents/pi-pi/prompt-expert.md +83 -0
  56. package/.pi/agents/pi-pi/skill-expert.md +52 -0
  57. package/.pi/agents/pi-pi/theme-expert.md +46 -0
  58. package/.pi/agents/pi-pi/tui-expert.md +100 -0
  59. package/.pi/agents/rethink.md +140 -0
  60. package/.pi/agents/wiki-ingest.md +67 -0
  61. package/.pi/agents/wiki-lint.md +75 -0
  62. package/.pi/auto-commit.json +20 -0
  63. package/.pi/extensions/banner.png +0 -0
  64. package/.pi/extensions/ck-enforce.ts +216 -0
  65. package/.pi/extensions/custom-footer.ts +308 -0
  66. package/.pi/extensions/custom-header.ts +116 -0
  67. package/.pi/extensions/dotenv-loader.ts +170 -0
  68. package/.pi/internal/cursor-sdk-transcript-parser.ts +59 -0
  69. package/.pi/model-router.json +95 -0
  70. package/.pi/npm/.gitignore +2 -0
  71. package/.pi/prompts/git-sync.md +124 -0
  72. package/.pi/prompts/harness-setup.md +509 -0
  73. package/.pi/prompts/save.md +16 -0
  74. package/.pi/prompts/wiki-autoresearch.md +19 -0
  75. package/.pi/prompts/wiki.md +23 -0
  76. package/.pi/providers/cursor-sdk-provider.test.mjs +476 -0
  77. package/.pi/providers/cursor-sdk-provider.ts +1085 -0
  78. package/.pi/settings.json +14 -4
  79. package/.pi/skills/agent-router/SKILL.md +174 -0
  80. package/.pi/sounds/alert/1-kaching-track.mp3 +0 -0
  81. package/.pi/sounds/error/1-ksi-wth-track.mp3 +0 -0
  82. package/.pi/sounds/error/2-smash-track.mp3 +0 -0
  83. package/.pi/sounds/error/3-buzzer-track.mp3 +0 -0
  84. package/.pi/sounds/notification/1-soft-notification-track.mp3 +0 -0
  85. package/.pi/sounds/project-sounds.json +25 -0
  86. package/.pi/sounds/reminder/1-soft-notification-track.mp3 +0 -0
  87. package/.pi/sounds/success/1-tada-track.mp3 +0 -0
  88. package/.pi/sounds/success/2-jobs-done-track.mp3 +0 -0
  89. package/.pi/sounds/success/3-yay-track.mp3 +0 -0
  90. package/CONTRIBUTING.md +116 -0
  91. package/README.md +33 -40
  92. package/biome.json +34 -0
  93. package/firecrawl/.env.template +58 -0
  94. package/firecrawl/README.md +49 -0
  95. package/firecrawl/docker-compose.yaml +201 -0
  96. package/firecrawl/searxng/searxng.env +3 -0
  97. package/firecrawl/searxng/settings.yml +85 -0
  98. package/lefthook.yml +8 -0
  99. package/package.json +55 -16
  100. package/vault/AGENTS.md +37 -0
  101. package/vault/wiki/_templates/comparison.md +39 -0
  102. package/vault/wiki/_templates/concept.md +40 -0
  103. package/vault/wiki/_templates/decision.md +21 -0
  104. package/vault/wiki/_templates/entity.md +32 -0
  105. package/vault/wiki/_templates/flow.md +14 -0
  106. package/vault/wiki/_templates/module.md +18 -0
  107. package/vault/wiki/_templates/question.md +31 -0
  108. package/vault/wiki/_templates/source.md +39 -0
  109. package/vault/wiki/concepts/AST-Aware Code Chunking.md +44 -0
  110. package/vault/wiki/concepts/Build-Time Prompt Compilation.md +107 -0
  111. package/vault/wiki/concepts/Context Engine (AI Coding).md +47 -0
  112. package/vault/wiki/concepts/Context-Aware System Reminders.md +61 -0
  113. package/vault/wiki/concepts/Contextualized Text Embedding.md +42 -0
  114. package/vault/wiki/concepts/Contractor vs Employee AI Model.md +55 -0
  115. package/vault/wiki/concepts/Dual-Model Agent Architecture.md +65 -0
  116. package/vault/wiki/concepts/Late Chunking vs Early Chunking.md +43 -0
  117. package/vault/wiki/concepts/Majority Vote Ensembling.md +68 -0
  118. package/vault/wiki/concepts/Meta-Harness.md +16 -0
  119. package/vault/wiki/concepts/Multi-Agent AI Coding Architecture.md +75 -0
  120. package/vault/wiki/concepts/Prompt Enhancement.md +90 -0
  121. package/vault/wiki/concepts/Prompt Renderer.md +89 -0
  122. package/vault/wiki/concepts/Semantic Codebase Indexing.md +67 -0
  123. package/vault/wiki/concepts/additive-config-hierarchy.md +16 -0
  124. package/vault/wiki/concepts/agent-artifacts-verifiable-deliverables.md +71 -0
  125. package/vault/wiki/concepts/agent-browser-browser-automation.md +99 -0
  126. package/vault/wiki/concepts/agent-codebase-interface.md +43 -0
  127. package/vault/wiki/concepts/agent-harness-architecture.md +67 -0
  128. package/vault/wiki/concepts/agent-loop-detection-patterns.md +133 -0
  129. package/vault/wiki/concepts/agent-search-enforcement.md +126 -0
  130. package/vault/wiki/concepts/agent-skills-ecosystem.md +74 -0
  131. package/vault/wiki/concepts/agent-skills-pattern.md +68 -0
  132. package/vault/wiki/concepts/agentic-harness-context-enforcement.md +91 -0
  133. package/vault/wiki/concepts/agentic-harness.md +34 -0
  134. package/vault/wiki/concepts/agentic-orchestration-pipeline.md +56 -0
  135. package/vault/wiki/concepts/agentic-search-no-embeddings.md +18 -0
  136. package/vault/wiki/concepts/anthropic-context-engineering.md +13 -0
  137. package/vault/wiki/concepts/antigravity-agent-first-architecture.md +61 -0
  138. package/vault/wiki/concepts/ast-compression.md +19 -0
  139. package/vault/wiki/concepts/ast-truncation.md +66 -0
  140. package/vault/wiki/concepts/barrel-files.md +37 -0
  141. package/vault/wiki/concepts/browser-harness-agent.md +41 -0
  142. package/vault/wiki/concepts/browser-subagent-visual-verification.md +82 -0
  143. package/vault/wiki/concepts/codebase-intelligence-ecosystem-comparison.md +192 -0
  144. package/vault/wiki/concepts/codebase-intelligence-harness-integration.md +161 -0
  145. package/vault/wiki/concepts/codebase-to-context-ingestion.md +46 -0
  146. package/vault/wiki/concepts/codex-harness-innovations.md +147 -0
  147. package/vault/wiki/concepts/consensus-debate-flow.md +17 -0
  148. package/vault/wiki/concepts/consensus-debate.md +206 -0
  149. package/vault/wiki/concepts/content-addressed-spec-identity.md +166 -0
  150. package/vault/wiki/concepts/context-anxiety.md +57 -0
  151. package/vault/wiki/concepts/context-compression-techniques.md +19 -0
  152. package/vault/wiki/concepts/context-continuity.md +22 -0
  153. package/vault/wiki/concepts/context-drift-in-agents.md +106 -0
  154. package/vault/wiki/concepts/context-engineering.md +62 -0
  155. package/vault/wiki/concepts/context-folding.md +67 -0
  156. package/vault/wiki/concepts/context-mode.md +38 -0
  157. package/vault/wiki/concepts/cursor-harness-innovations.md +107 -0
  158. package/vault/wiki/concepts/deterministic-session-compaction.md +79 -0
  159. package/vault/wiki/concepts/drift-detection-unified.md +296 -0
  160. package/vault/wiki/concepts/execution-feedback-loop.md +46 -0
  161. package/vault/wiki/concepts/feedforward-feedback-harness.md +60 -0
  162. package/vault/wiki/concepts/five-root-cause-metrics-sentrux.md +40 -0
  163. package/vault/wiki/concepts/fork-safe-spec-storage.md +89 -0
  164. package/vault/wiki/concepts/fts5-sandbox.md +19 -0
  165. package/vault/wiki/concepts/fuzzy-edit-matching.md +71 -0
  166. package/vault/wiki/concepts/gemini-cli-architecture.md +104 -0
  167. package/vault/wiki/concepts/generator-evaluator-architecture.md +64 -0
  168. package/vault/wiki/concepts/guardian-agent-pattern.md +67 -0
  169. package/vault/wiki/concepts/harness-configuration-layers.md +89 -0
  170. package/vault/wiki/concepts/harness-control-frameworks.md +155 -0
  171. package/vault/wiki/concepts/harness-engineering-first-principles.md +90 -0
  172. package/vault/wiki/concepts/harness-h-formalism.md +53 -0
  173. package/vault/wiki/concepts/hybrid-code-search.md +61 -0
  174. package/vault/wiki/concepts/inline-post-edit-validation.md +112 -0
  175. package/vault/wiki/concepts/legendary-engineering-patterns-harness.md +110 -0
  176. package/vault/wiki/concepts/lifecycle-hooks.md +94 -0
  177. package/vault/wiki/concepts/mcp-tool-routing.md +102 -0
  178. package/vault/wiki/concepts/memory-system-of-record-vs-ephemeral-cache.md +47 -0
  179. package/vault/wiki/concepts/meta-agent-context-pruning.md +151 -0
  180. package/vault/wiki/concepts/model-adaptive-harness.md +122 -0
  181. package/vault/wiki/concepts/model-routing-agents.md +101 -0
  182. package/vault/wiki/concepts/monorepo-architecture.md +45 -0
  183. package/vault/wiki/concepts/multi-agent-specialization.md +61 -0
  184. package/vault/wiki/concepts/permission-subsystem.md +16 -0
  185. package/vault/wiki/concepts/pi-messenger-analysis.md +243 -0
  186. package/vault/wiki/concepts/pi-vscode-extension-landscape.md +37 -0
  187. package/vault/wiki/concepts/policy-engine-pattern.md +78 -0
  188. package/vault/wiki/concepts/progressive-disclosure-agents.md +53 -0
  189. package/vault/wiki/concepts/progressive-skill-disclosure.md +17 -0
  190. package/vault/wiki/concepts/provider-native-prompting.md +203 -0
  191. package/vault/wiki/concepts/quality-signal-sentrux.md +37 -0
  192. package/vault/wiki/concepts/repo-map-ranking.md +42 -0
  193. package/vault/wiki/concepts/result-monad-error-handling.md +47 -0
  194. package/vault/wiki/concepts/safety-defense-in-depth.md +83 -0
  195. package/vault/wiki/concepts/sandbox-os-enforcement.md +18 -0
  196. package/vault/wiki/concepts/selective-debate-routing.md +70 -0
  197. package/vault/wiki/concepts/self-evolving-harness.md +60 -0
  198. package/vault/wiki/concepts/sentrux-mcp-integration.md +36 -0
  199. package/vault/wiki/concepts/sentrux-rules-engine.md +49 -0
  200. package/vault/wiki/concepts/shell-pattern-compression.md +24 -0
  201. package/vault/wiki/concepts/skill-first-architecture.md +166 -0
  202. package/vault/wiki/concepts/structured-compaction.md +78 -0
  203. package/vault/wiki/concepts/subagent-orchestration.md +17 -0
  204. package/vault/wiki/concepts/subagent-worktree-isolation.md +68 -0
  205. package/vault/wiki/concepts/superpowers-methodology.md +78 -0
  206. package/vault/wiki/concepts/think-in-code.md +73 -0
  207. package/vault/wiki/concepts/ts-execution-layer.md +100 -0
  208. package/vault/wiki/concepts/typescript-strict-mode.md +37 -0
  209. package/vault/wiki/concepts/vcc-conversation-compaction-for-pi.md +51 -0
  210. package/vault/wiki/concepts/verification-drift-detection.md +19 -0
  211. package/vault/wiki/consensus/consensus-records.md +58 -0
  212. package/vault/wiki/decisions/2026-04-30-pi-lean-ctx-native.md +122 -0
  213. package/vault/wiki/decisions/adr-008.md +40 -0
  214. package/vault/wiki/decisions/adr-009.md +46 -0
  215. package/vault/wiki/decisions/adr-010.md +55 -0
  216. package/vault/wiki/decisions/adr-011.md +165 -0
  217. package/vault/wiki/decisions/adr-012.md +102 -0
  218. package/vault/wiki/decisions/adr-013.md +59 -0
  219. package/vault/wiki/decisions/adr-014.md +73 -0
  220. package/vault/wiki/decisions/adr-015.md +81 -0
  221. package/vault/wiki/decisions/adr-016.md +91 -0
  222. package/vault/wiki/decisions/adr-017.md +79 -0
  223. package/vault/wiki/decisions/adr-018.md +100 -0
  224. package/vault/wiki/decisions/adr-019.md +75 -0
  225. package/vault/wiki/decisions/adr-020.md +106 -0
  226. package/vault/wiki/decisions/adr-021.md +86 -0
  227. package/vault/wiki/decisions/adr-022.md +113 -0
  228. package/vault/wiki/decisions/adr-023.md +113 -0
  229. package/vault/wiki/decisions/adr-024.md +73 -0
  230. package/vault/wiki/decisions/adr-025.md +130 -0
  231. package/vault/wiki/decisions/adr-026.md +56 -0
  232. package/vault/wiki/decisions/colocate-wiki.md +34 -0
  233. package/vault/wiki/entities/Anders Hejlsberg.md +29 -0
  234. package/vault/wiki/entities/Anthropic.md +17 -0
  235. package/vault/wiki/entities/Augment Code.md +49 -0
  236. package/vault/wiki/entities/Bjarne Stroustrup.md +26 -0
  237. package/vault/wiki/entities/Bolt.new (StackBlitz).md +39 -0
  238. package/vault/wiki/entities/Boris Cherny.md +11 -0
  239. package/vault/wiki/entities/Claude Code.md +19 -0
  240. package/vault/wiki/entities/Dennis Ritchie.md +26 -0
  241. package/vault/wiki/entities/Emergent Labs.md +32 -0
  242. package/vault/wiki/entities/Google Cloud.md +16 -0
  243. package/vault/wiki/entities/Guido van Rossum.md +28 -0
  244. package/vault/wiki/entities/Ken Thompson.md +28 -0
  245. package/vault/wiki/entities/Lee et al.md +16 -0
  246. package/vault/wiki/entities/Linus Torvalds.md +28 -0
  247. package/vault/wiki/entities/Lovable (company).md +40 -0
  248. package/vault/wiki/entities/Martin Fowler.md +16 -0
  249. package/vault/wiki/entities/Meng et al.md +16 -0
  250. package/vault/wiki/entities/OpenAI.md +16 -0
  251. package/vault/wiki/entities/Rocket.new.md +38 -0
  252. package/vault/wiki/entities/VILA-Lab.md +15 -0
  253. package/vault/wiki/entities/autodev-codebase.md +18 -0
  254. package/vault/wiki/entities/ck-tool.md +59 -0
  255. package/vault/wiki/entities/codesearch.md +18 -0
  256. package/vault/wiki/entities/disler-indydevdan.md +33 -0
  257. package/vault/wiki/entities/gsd-get-shit-done.md +56 -0
  258. package/vault/wiki/entities/javascript-runtimes.md +48 -0
  259. package/vault/wiki/entities/jesse-vincent.md +38 -0
  260. package/vault/wiki/entities/lean-ctx.md +32 -0
  261. package/vault/wiki/entities/opendev.md +41 -0
  262. package/vault/wiki/entities/ops-codegraph-tool.md +18 -0
  263. package/vault/wiki/entities/pi-coding-agent.md +53 -0
  264. package/vault/wiki/entities/sentrux.md +54 -0
  265. package/vault/wiki/entities/vgrep-tool.md +57 -0
  266. package/vault/wiki/entities/vitest.md +41 -0
  267. package/vault/wiki/flows/harness-wiki-pipeline.md +204 -0
  268. package/vault/wiki/hot.md +932 -0
  269. package/vault/wiki/index.md +437 -0
  270. package/vault/wiki/log.md +418 -0
  271. package/vault/wiki/meta/dashboard.md +30 -0
  272. package/vault/wiki/meta/lint-report-2026-04-30.md +86 -0
  273. package/vault/wiki/meta/lint-report-2026-05-02.md +251 -0
  274. package/vault/wiki/meta/overview.canvas +43 -0
  275. package/vault/wiki/modules/adversarial-verification.md +57 -0
  276. package/vault/wiki/modules/automated-observability.md +54 -0
  277. package/vault/wiki/modules/bench.md +20 -0
  278. package/vault/wiki/modules/extensions.md +23 -0
  279. package/vault/wiki/modules/grounding-checkpoints.md +62 -0
  280. package/vault/wiki/modules/harness-implementation-plan.md +345 -0
  281. package/vault/wiki/modules/harness-wiki-skill-mapping.md +135 -0
  282. package/vault/wiki/modules/harness.md +86 -0
  283. package/vault/wiki/modules/persistent-memory.md +85 -0
  284. package/vault/wiki/modules/schema-orchestration.md +68 -0
  285. package/vault/wiki/modules/skills.md +27 -0
  286. package/vault/wiki/modules/spec-hardening.md +58 -0
  287. package/vault/wiki/modules/structured-planning.md +53 -0
  288. package/vault/wiki/modules/think-in-code-enforcement.md +153 -0
  289. package/vault/wiki/modules/wiki-query-interface.md +64 -0
  290. package/vault/wiki/overview.md +51 -0
  291. package/vault/wiki/questions/Research-pi-vs-claude-code-agentic-orchestration-pipeline.md +87 -0
  292. package/vault/wiki/questions/Research-sentrux-dev.md +123 -0
  293. package/vault/wiki/questions/Research-superpowers-skill-for-agentic-coding-agents.md +164 -0
  294. package/vault/wiki/questions/Research: Augment Code Context Engine.md +244 -0
  295. package/vault/wiki/questions/Research: Automating Software Engineering - Lovable, Bolt, Emergent, Rocket.md +112 -0
  296. package/vault/wiki/questions/Research: Claude Code State-of-the-Art Harness Improvements.md +209 -0
  297. package/vault/wiki/questions/Research: Codex State-of-the-Art Harness Improvements.md +99 -0
  298. package/vault/wiki/questions/Research: Engineering Workflows of Legendary Programmers and AI Harness Mapping.md +107 -0
  299. package/vault/wiki/questions/Research: Fallow Codebase Intelligence Harness Integration.md +72 -0
  300. package/vault/wiki/questions/Research: Gemini CLI SOTA Harness Integration.md +166 -0
  301. package/vault/wiki/questions/Research: GitHub Issues as Harness Spec Storage.md +188 -0
  302. package/vault/wiki/questions/Research: Google Antigravity Harness Integration.md +120 -0
  303. package/vault/wiki/questions/Research: Meta-Agent Context Drift Detection.md +236 -0
  304. package/vault/wiki/questions/Research: Model-Adaptive Agent Harness Design.md +95 -0
  305. package/vault/wiki/questions/Research: Model-Specific Prompting Guides.md +165 -0
  306. package/vault/wiki/questions/Research: Prompt Renderer for Multi-Model Agent Harness.md +216 -0
  307. package/vault/wiki/questions/Research: Skill-First Harness Architecture.md +91 -0
  308. package/vault/wiki/questions/Research: TypeScript Best Practices and Codebase Structure.md +88 -0
  309. package/vault/wiki/questions/Research: TypeScript Execution Layer for Agent Tool Calling.md +81 -0
  310. package/vault/wiki/questions/Research: claude-mem over Obsidian for Harness Layer.md +71 -0
  311. package/vault/wiki/questions/Research: claude-mem over obsidian wiki as the knowledge base for our agentic harness pipeline. think from first principles. does this replace or complement our current setup? no hard feelings about previous decisions. gimme accurate points.md +80 -0
  312. package/vault/wiki/questions/Research: context-mode vs lean-ctx.md +72 -0
  313. package/vault/wiki/questions/Research: cursor.sh Harness Innovations.md +92 -0
  314. package/vault/wiki/questions/Research: executor.sh Harness Integration.md +170 -0
  315. package/vault/wiki/questions/Research: how GSD fits into our coding harness setup.md +97 -0
  316. package/vault/wiki/questions/Research: how claude-mem fits into our workflow. and whether it should replace obsidian in the codebase. no hard feelings about previous actions, rethink from first principles always.md +80 -0
  317. package/vault/wiki/questions/Research: pi-vcc.md +113 -0
  318. package/vault/wiki/questions/Research: semantic code search tools.md +69 -0
  319. package/vault/wiki/questions/Research: vcc extension for pi coding agent.md +73 -0
  320. package/vault/wiki/questions/how-to-enable-semantic-code-search-now.md +111 -0
  321. package/vault/wiki/questions/mvp-implementation-blueprint.md +552 -0
  322. package/vault/wiki/questions/research-agent-first-codebase-exploration.md +199 -0
  323. package/vault/wiki/questions/research-agentic-coding-harness-latest-papers.md +142 -0
  324. package/vault/wiki/questions/research-gitingest-gitreverse-integration.md +100 -0
  325. package/vault/wiki/questions/research-wozcode-token-reduction.md +67 -0
  326. package/vault/wiki/questions/resolved-context-pruning-inplace-vs-restart.md +95 -0
  327. package/vault/wiki/questions/resolved-context-window-economics.md +167 -0
  328. package/vault/wiki/questions/resolved-imad-debate-gating-transfer.md +126 -0
  329. package/vault/wiki/questions/resolved-mcp-tool-preference.md +112 -0
  330. package/vault/wiki/questions/resolved-small-model-meta-agents.md +107 -0
  331. package/vault/wiki/questions/resolved-treesitter-dynamic-languages.md +95 -0
  332. package/vault/wiki/sources/Auggie Context MCP Server.md +63 -0
  333. package/vault/wiki/sources/Augment Code Codacy AI Giants.md +61 -0
  334. package/vault/wiki/sources/Augment Code MCP SiliconAngle.md +49 -0
  335. package/vault/wiki/sources/Augment Code WorkOS ERC 2025.md +55 -0
  336. package/vault/wiki/sources/Augment Context Engine Official.md +71 -0
  337. package/vault/wiki/sources/Augment SWE-bench Agent GitHub.md +74 -0
  338. package/vault/wiki/sources/Augment SWE-bench Pro Blog.md +58 -0
  339. package/vault/wiki/sources/Source: AgentBus Jinja2 Prompt Pipelines.md +75 -0
  340. package/vault/wiki/sources/Source: Arxiv /342/200/224 Don't Break the Cache.md" +85 -0
  341. package/vault/wiki/sources/Source: Augment - Harness Engineering for AI Coding Agents.md +58 -0
  342. package/vault/wiki/sources/Source: Blake Crosley Agent Architecture Guide.md +100 -0
  343. package/vault/wiki/sources/Source: Bolt.new Architecture & Case Study.md +75 -0
  344. package/vault/wiki/sources/Source: Build-Time Prompt Compilation Architecture.md +107 -0
  345. package/vault/wiki/sources/Source: Claude API Agent Skills Overview.md +70 -0
  346. package/vault/wiki/sources/Source: Gemini CLI Changelogs.md +88 -0
  347. package/vault/wiki/sources/Source: Google Blog - Gemini CLI Announcement.md +57 -0
  348. package/vault/wiki/sources/Source: Google Gemini CLI Architecture Docs.md +53 -0
  349. package/vault/wiki/sources/Source: LangChain - Anatomy of Agent Harness.md +65 -0
  350. package/vault/wiki/sources/Source: Lovable Architecture & Clone Analysis.md +83 -0
  351. package/vault/wiki/sources/Source: Martin Fowler - Harness Engineering.md +70 -0
  352. package/vault/wiki/sources/Source: OpenAI Harness Engineering Five Principles.md +58 -0
  353. package/vault/wiki/sources/Source: OpenAI Harness Engineering /342/200/224 0 Lines of Human Code.md" +101 -0
  354. package/vault/wiki/sources/Source: OpenDev /342/200/224 Building AI Coding Agents for the Terminal.md" +100 -0
  355. package/vault/wiki/sources/Source: Render AI Coding Agents Benchmark 2025.md +53 -0
  356. package/vault/wiki/sources/Source: Rocket.new /342/200/224 Vibe Solutioning Platform.md" +70 -0
  357. package/vault/wiki/sources/Source: SwirlAI Agent Skills Progressive Disclosure.md +71 -0
  358. package/vault/wiki/sources/Source: TianPan Prompt Caching Architecture.md +89 -0
  359. package/vault/wiki/sources/Source: Vercel Labs agent-browser.md +155 -0
  360. package/vault/wiki/sources/Source: browser-harness CDP Harness.md +126 -0
  361. package/vault/wiki/sources/agent-drift-academic-paper.md +79 -0
  362. package/vault/wiki/sources/aider-repomap-tree-sitter.md +42 -0
  363. package/vault/wiki/sources/anthropic-compaction-api.md +58 -0
  364. package/vault/wiki/sources/anthropic-effective-harnesses.md +42 -0
  365. package/vault/wiki/sources/anthropic-prompt-best-practices.md +100 -0
  366. package/vault/wiki/sources/anthropic2026-harness-design.md +63 -0
  367. package/vault/wiki/sources/barrel-files-tkdodo.md +38 -0
  368. package/vault/wiki/sources/birth-of-unix-kernighan-interview.md +57 -0
  369. package/vault/wiki/sources/bockeler2026-harness-engineering.md +69 -0
  370. package/vault/wiki/sources/cast-code-chunking-paper.md +50 -0
  371. package/vault/wiki/sources/ck-semantic-search.md +78 -0
  372. package/vault/wiki/sources/claude-code-architecture-karaxai-2026.md +71 -0
  373. package/vault/wiki/sources/claude-code-architecture-qubytes-2026.md +50 -0
  374. package/vault/wiki/sources/claude-code-architecture-vila-lab-2026.md +64 -0
  375. package/vault/wiki/sources/claude-code-security-architecture-penligent-2026.md +70 -0
  376. package/vault/wiki/sources/claude-context-editing-docs.md +13 -0
  377. package/vault/wiki/sources/cloudflare-codemode.md +63 -0
  378. package/vault/wiki/sources/code-chunk-library-supermemory.md +63 -0
  379. package/vault/wiki/sources/codeact-apple-2024.md +62 -0
  380. package/vault/wiki/sources/codex-dsc-rfc-8573.md +41 -0
  381. package/vault/wiki/sources/codex-open-source-agent-2026.md +110 -0
  382. package/vault/wiki/sources/coir-code-retrieval-benchmark.md +51 -0
  383. package/vault/wiki/sources/colinmcnamara-context-optimization-codemode.md +48 -0
  384. package/vault/wiki/sources/context-folding-paper.md +61 -0
  385. package/vault/wiki/sources/context-mode-website.md +63 -0
  386. package/vault/wiki/sources/cursor-agent-best-practices-2026.md +62 -0
  387. package/vault/wiki/sources/cursor-fork-29b-2025.md +50 -0
  388. package/vault/wiki/sources/cursor-harness-april-2026.md +76 -0
  389. package/vault/wiki/sources/cursor-instant-apply-2024.md +45 -0
  390. package/vault/wiki/sources/cursor-shadow-workspace-2024.md +52 -0
  391. package/vault/wiki/sources/cursor-shipped-coding-agent-2026.md +53 -0
  392. package/vault/wiki/sources/cursor-vs-antigravity-2026.md +51 -0
  393. package/vault/wiki/sources/disler-pi-vs-claude-code.md +69 -0
  394. package/vault/wiki/sources/distill-deterministic-context-compression.md +53 -0
  395. package/vault/wiki/sources/embedding-models-benchmark-supermemory-2025.md +48 -0
  396. package/vault/wiki/sources/executor-rhyssullivan.md +122 -0
  397. package/vault/wiki/sources/fallow-rs-codebase-intelligence.md +125 -0
  398. package/vault/wiki/sources/fan2025-imad.md +60 -0
  399. package/vault/wiki/sources/forgecode-gpt5-agent-improvements.md +63 -0
  400. package/vault/wiki/sources/gemini-3-prompting-guide.md +78 -0
  401. package/vault/wiki/sources/gh-cli-sub-issue-rfc.md +50 -0
  402. package/vault/wiki/sources/gh-sub-issue-extension.md +72 -0
  403. package/vault/wiki/sources/github-fork-issues-discussion.md +44 -0
  404. package/vault/wiki/sources/github-issue-dependencies-docs.md +49 -0
  405. package/vault/wiki/sources/github-sub-issues-docs.md +51 -0
  406. package/vault/wiki/sources/gitingest.md +91 -0
  407. package/vault/wiki/sources/gitreverse.md +63 -0
  408. package/vault/wiki/sources/google-antigravity-official-blog.md +47 -0
  409. package/vault/wiki/sources/google-antigravity-wikipedia.md +53 -0
  410. package/vault/wiki/sources/gsd-codecentric-deep-dive.md +57 -0
  411. package/vault/wiki/sources/gsd-github-repo.md +51 -0
  412. package/vault/wiki/sources/gsd-hn-discussion.md +59 -0
  413. package/vault/wiki/sources/guido-python-design-philosophy.md +56 -0
  414. package/vault/wiki/sources/hejlsberg-7-learnings.md +48 -0
  415. package/vault/wiki/sources/ironclaw-drift-monitor.md +80 -0
  416. package/vault/wiki/sources/langsight-loop-detection.md +80 -0
  417. package/vault/wiki/sources/leanctx-website.md +69 -0
  418. package/vault/wiki/sources/lee2026-meta-harness.md +59 -0
  419. package/vault/wiki/sources/linux-kernel-coding-workflow.md +50 -0
  420. package/vault/wiki/sources/lou2026-autoharness.md +53 -0
  421. package/vault/wiki/sources/martin-fowler-harness-engineering.md +73 -0
  422. package/vault/wiki/sources/mcp-architecture-docs.md +13 -0
  423. package/vault/wiki/sources/meng2026-agent-harness-survey.md +79 -0
  424. package/vault/wiki/sources/mindstudio-four-agent-types.md +68 -0
  425. package/vault/wiki/sources/ms-chat-history-management.md +13 -0
  426. package/vault/wiki/sources/openai-prompt-guidance.md +104 -0
  427. package/vault/wiki/sources/openclaw-session-pruning.md +13 -0
  428. package/vault/wiki/sources/opencode-dcp.md +13 -0
  429. package/vault/wiki/sources/opendev-arxiv-2603.05344v1.md +79 -0
  430. package/vault/wiki/sources/openhands-platform.md +39 -0
  431. package/vault/wiki/sources/oss-guide-codebase-exploration.md +53 -0
  432. package/vault/wiki/sources/pi-compaction-extensions-ecosystem.md +102 -0
  433. package/vault/wiki/sources/pi-context-prune-github-repo.md +38 -0
  434. package/vault/wiki/sources/pi-mono-compaction-docs.md +38 -0
  435. package/vault/wiki/sources/pi-omni-compact-github-repo.md +50 -0
  436. package/vault/wiki/sources/pi-rtk-optimizer-github-repo.md +45 -0
  437. package/vault/wiki/sources/pi-vcc-github-repo.md +69 -0
  438. package/vault/wiki/sources/pi-vscode-marketplace.md +41 -0
  439. package/vault/wiki/sources/pi-vscode-model-provider-marketplace.md +39 -0
  440. package/vault/wiki/sources/py-tree-sitter.md +13 -0
  441. package/vault/wiki/sources/sentrux-dev-landing.md +40 -0
  442. package/vault/wiki/sources/sentrux-docs-pro-architecture.md +75 -0
  443. package/vault/wiki/sources/sentrux-docs-quality-signal.md +46 -0
  444. package/vault/wiki/sources/sentrux-docs-root-cause-metrics.md +57 -0
  445. package/vault/wiki/sources/sentrux-docs-rules-engine.md +58 -0
  446. package/vault/wiki/sources/sentrux-github-repo.md +56 -0
  447. package/vault/wiki/sources/superpowers-github-repo.md +56 -0
  448. package/vault/wiki/sources/superpowers-release-blog.md +54 -0
  449. package/vault/wiki/sources/superpowers-termdock-analysis.md +45 -0
  450. package/vault/wiki/sources/swe-agent-aci.md +42 -0
  451. package/vault/wiki/sources/swe-bench.md +45 -0
  452. package/vault/wiki/sources/swe-pruner-context-pruning.md +13 -0
  453. package/vault/wiki/sources/think-in-code-blog.md +48 -0
  454. package/vault/wiki/sources/tree-sitter-docs.md +13 -0
  455. package/vault/wiki/sources/ts-best-practices-2025-devto.md +42 -0
  456. package/vault/wiki/sources/ts-folder-structure-mingyang.md +58 -0
  457. package/vault/wiki/sources/ts-monorepo-koerselman.md +44 -0
  458. package/vault/wiki/sources/ts-result-error-handling-kkalamarski.md +52 -0
  459. package/vault/wiki/sources/ts-runtimes-comparison-betterstack.md +42 -0
  460. package/vault/wiki/sources/ts-strict-mode-rishikc.md +43 -0
  461. package/vault/wiki/sources/unix-philosophy.md +48 -0
  462. package/vault/wiki/sources/vectara-chunking-vs-embedding-naacl2025.md +39 -0
  463. package/vault/wiki/sources/vectara-guardian-agents.md +79 -0
  464. package/vault/wiki/sources/vgrep-semantic-search.md +76 -0
  465. package/vault/wiki/sources/vitest-official.md +41 -0
  466. package/vault/wiki/sources/vscode-pi-community-extension.md +40 -0
  467. package/vault/wiki/sources/wozcode.md +79 -0
  468. package/.agents/skills/compress/SKILL.md +0 -111
  469. package/.agents/skills/compress/scripts/__init__.py +0 -9
  470. package/.agents/skills/compress/scripts/__main__.py +0 -3
  471. package/.agents/skills/compress/scripts/benchmark.py +0 -78
  472. package/.agents/skills/compress/scripts/cli.py +0 -73
  473. package/.agents/skills/compress/scripts/compress.py +0 -227
  474. package/.agents/skills/compress/scripts/detect.py +0 -121
  475. package/.agents/skills/compress/scripts/validate.py +0 -189
  476. package/.agents/skills/emil-design-eng/SKILL.md +0 -679
  477. package/.agents/skills/lean-ctx/SKILL.md +0 -149
  478. package/.agents/skills/lean-ctx/scripts/install.sh +0 -95
  479. package/.agents/skills/scrapling-official/LICENSE.txt +0 -28
  480. package/.agents/skills/scrapling-official/SKILL.md +0 -390
  481. package/.agents/skills/scrapling-official/examples/01_fetcher_session.py +0 -26
  482. package/.agents/skills/scrapling-official/examples/02_dynamic_session.py +0 -26
  483. package/.agents/skills/scrapling-official/examples/03_stealthy_session.py +0 -26
  484. package/.agents/skills/scrapling-official/examples/04_spider.py +0 -58
  485. package/.agents/skills/scrapling-official/examples/README.md +0 -45
  486. package/.agents/skills/scrapling-official/references/fetching/choosing.md +0 -78
  487. package/.agents/skills/scrapling-official/references/fetching/dynamic.md +0 -352
  488. package/.agents/skills/scrapling-official/references/fetching/static.md +0 -432
  489. package/.agents/skills/scrapling-official/references/fetching/stealthy.md +0 -255
  490. package/.agents/skills/scrapling-official/references/mcp-server.md +0 -214
  491. package/.agents/skills/scrapling-official/references/migrating_from_beautifulsoup.md +0 -86
  492. package/.agents/skills/scrapling-official/references/parsing/adaptive.md +0 -212
  493. package/.agents/skills/scrapling-official/references/parsing/main_classes.md +0 -586
  494. package/.agents/skills/scrapling-official/references/parsing/selection.md +0 -494
  495. package/.agents/skills/scrapling-official/references/spiders/advanced.md +0 -344
  496. package/.agents/skills/scrapling-official/references/spiders/architecture.md +0 -94
  497. package/.agents/skills/scrapling-official/references/spiders/getting-started.md +0 -164
  498. package/.agents/skills/scrapling-official/references/spiders/proxy-blocking.md +0 -235
  499. package/.agents/skills/scrapling-official/references/spiders/requests-responses.md +0 -196
  500. package/.agents/skills/scrapling-official/references/spiders/sessions.md +0 -205
  501. package/.github/banner.png +0 -0
  502. package/PLAN.md +0 -11
  503. package/extensions/lean-ctx-enforce.ts +0 -166
  504. package/skills-lock.json +0 -35
  505. package/wiki/README.md +0 -10
  506. package/wiki/decisions/0001-establish-project-wiki-and-decision-record-format.md +0 -25
  507. package/wiki/decisions/0002-add-project-banner-to-readme.md +0 -26
  508. package/wiki/decisions/0003-remove-redundant-readme-title-heading.md +0 -26
  509. package/wiki/decisions/0004-publish-package-to-npm-as-ultimate-pi.md +0 -26
@@ -0,0 +1,100 @@
1
+ ---
2
+ type: source
3
+ source_type: engineering-blog
4
+ title: "Blake Crosley — Agent Architecture: Building AI-Powered Development Harnesses"
5
+ author: "Blake Crosley"
6
+ date_published: 2026-04-29
7
+ url: "https://blakecrosley.com/guides/agent-architecture"
8
+ confidence: high
9
+ key_claims:
10
+ - "The harness is a programmable runtime with an LLM kernel — not a chat box with file access"
11
+ - "Hooks guarantee execution (exit code 2 blocks). Prompts achieve ~80% compliance."
12
+ - "Skills encode domain expertise that auto-activates via LLM reasoning"
13
+ - "Subagents prevent context bloat — isolated context windows for exploration"
14
+ - "Memory lives in the filesystem — files persist across context boundaries"
15
+ - "Multi-agent deliberation catches blind spots that single agents cannot"
16
+ - "The harness pattern is the system — CLAUDE.md, hooks, skills, agents, memory compose into a deterministic layer"
17
+ - "Code is cheap now; verification is the expensive part"
18
+ - "Single-agent systems have a structural blind spot: they cannot challenge their own assumptions"
19
+ - "Fresh-context iteration (Ralph Loop) beats long conversations for quality beyond 90 minutes"
20
+ - "Production harness reduced false completion rate from 35% to 4% and blocked 7 credential leaks"
21
+ tags: [source, agent-architecture, harness, hooks, skills, multi-agent]
22
+ related:
23
+ - "[[harness-engineering-first-principles]]"
24
+ - "[[agent-skills-pattern]]"
25
+ - "[[lifecycle-hooks]]"
26
+ - "[[consensus-debate]]"
27
+ - "[[skill-first-architecture]]"
28
+ ---
29
+
30
+ # Blake Crosley — Agent Architecture Guide
31
+
32
+ ## Summary
33
+
34
+ Comprehensive 12,783-word guide on building production AI agent harnesses. Published April 29, 2026, updated through Google Cloud Next 2026 (April 22-24). Covers the complete stack: hooks, skills, subagents, multi-agent orchestration, memory, and production patterns. Based on the author's implementation: 84 hooks, 48 skills, 19 agents, ~15,000 lines of orchestration.
35
+
36
+ ## Key Contributions
37
+
38
+ ### The Harness Pattern
39
+
40
+ The harness is not a framework — it's a pattern: a composable set of files, scripts, and conventions that wrap an AI coding agent in deterministic infrastructure. Four layers:
41
+
42
+ 1. **Instruction Layer**: CLAUDE.md + rules directories — what the agent knows
43
+ 2. **Extension Layer**: Skills (domain expertise), Hooks (deterministic gates), Memory (persistent state), Agents (specialized subagents)
44
+ 3. **Orchestration Layer**: Multi-agent patterns, spawn budgets, consensus validation
45
+ 4. **Core Layer**: Main conversation context (LLM)
46
+
47
+ > "Most users work entirely in the Core Layer, watching context bloat and costs climb. Power users configure the Instruction and Extension layers."
48
+
49
+ ### Hooks vs Skills vs Subagents Decision Framework
50
+
51
+ | Problem | Use | Why |
52
+ |---------|-----|-----|
53
+ | Format code after every edit | PostToolUse hook | Must happen every time, deterministically |
54
+ | Block dangerous bash commands | PreToolUse hook | Must block before execution, exit code 2 |
55
+ | Apply security review patterns | Skill | Domain expertise that auto-activates |
56
+ | Explore codebase without polluting context | Explore subagent | Isolated context, returns summary only |
57
+ | Run experimental refactoring safely | Worktree-isolated subagent | Changes can be discarded |
58
+ | Review code from multiple perspectives | Parallel subagents | Independent evaluation |
59
+ | Decide on irreversible architecture | Multi-agent deliberation | Confidence trigger + consensus |
60
+
61
+ ### The Distinction That Matters
62
+
63
+ > "Hooks guarantee execution; prompts do not. Use hooks for linting, formatting, security checks, and anything that must run every time regardless of model behavior. Exit code 2 blocks actions. Exit code 1 only warns."
64
+
65
+ > "Skills are model-invoked extensions. Claude discovers and applies them automatically based on context, without you explicitly calling them. The moment you catch yourself re-explaining the same context across sessions is the moment you should build a skill."
66
+
67
+ ### Production Results
68
+
69
+ A production harness processed 12 PRDs (47 stories) across 8 overnight sessions:
70
+
71
+ | Metric | Minimal Harness | Full Harness |
72
+ |--------|----------------|--------------|
73
+ | False completion rate | 35% | 4% |
74
+ | Credential leaks | 2 leaked | 7 blocked |
75
+ | Destructive commands | 1 force-push | 4 blocked |
76
+ | Revision rounds/story | 2.1 | 0.8 |
77
+ | Token overhead | 0% | ~3.2% |
78
+
79
+ ### Context Degradation Research
80
+
81
+ Microsoft Research + Salesforce: 15 LLMs, 200,000+ conversations, 39% average performance drop from single-turn to multi-turn. The degradation starts in as few as two turns. Fresh-context iteration (Ralph Loop) beats long conversations for quality beyond 60-90 minutes.
82
+
83
+ ### Multi-Agent Deliberation Findings
84
+
85
+ Free-form debate rounds produced 7,500 tokens of debate with rounds 2-3 just restating positions. Structured dimension scoring replaced free-form debate, dropping cost by 60% while improving ranking quality. Independence is critical — two agents with visibility into each other's findings converged to similar scores (0.45 vs 0.48). Without visibility: 0.45 vs 0.72 — the gap is the cost of herding.
86
+
87
+ ## What We Adopt
88
+
89
+ - The hook/skill/agent differentiation as the primary architectural decision framework
90
+ - "Code is for determinism, skills are for expertise" as the first-principles dividing line
91
+ - Filesystem as memory (our wiki vault IS this pattern)
92
+ - Fresh-context iteration via subagents (we have pi-subagents for this)
93
+ - Production metrics as evidence that harness infrastructure compounds (3.2% overhead prevents 35% false completion)
94
+ - Structured dimension scoring over free-form debate (we already adopted iMAD selective routing)
95
+
96
+ ## What We Deliberately Do NOT Adopt
97
+
98
+ - Full Claude Code dependency: our harness runs on pi, not Claude Code. But the architectural principles transfer.
99
+ - 84 hooks / 48 skills scale: excessive for an MVP. Start with 4-6 skills.
100
+ - Agent Teams (Claude Code proprietary): use pi-subagents for equivalent isolation.
@@ -0,0 +1,75 @@
1
+ ---
2
+ type: source
3
+ source_type: case_study
4
+ title: "Bolt.new Architecture & Case Study"
5
+ author: "DeepWiki, Evil Martians (Victoria Melnikova, Travis Turner)"
6
+ date_published: 2024-12-02
7
+ url:
8
+ - "https://deepwiki.com/stackblitz/bolt.new/1.2-architecture"
9
+ - "https://evilmartians.com/chronicles/bolt-new-from-stackblitz-how-they-surfed-the-ai-wave-with-no-wipeouts"
10
+ - "https://github.com/stackblitz/bolt.new"
11
+ confidence: high
12
+ key_claims:
13
+ - "WebContainers give AI complete control over filesystem, node server, package manager, terminal, browser console"
14
+ - "Claude 3.5 Sonnet was the enabling technology — zero-shot code gen without RAG infrastructure"
15
+ - "0 to $4M ARR in 4 weeks — usage doubling daily"
16
+ - "AI-generated code is immediately executable and editable in-browser"
17
+ - "Bolt.new is open source, built on Remix + React + WebContainers"
18
+ - "Rails powers the backend (users, permissions, billing)"
19
+ tags:
20
+ - bolt
21
+ - webcontainers
22
+ - claude
23
+ - remix
24
+ - stackblitz
25
+ created: 2026-05-03
26
+ updated: 2026-05-03
27
+ status: ingested
28
+
29
+ ---# Bolt.new Architecture & Case Study
30
+
31
+ Bolt.new is an AI-powered full-stack web development platform by StackBlitz that runs entirely in the browser. Users prompt, AI builds, code executes instantly in WebContainers.
32
+
33
+ ## Architecture (DeepWiki)
34
+
35
+ ### Core Components
36
+
37
+ **Frontend**: Remix framework + React. UI libraries: Radix UI, Framer Motion, UnoCSS. CodeMirror editor + XTerm.js terminal + app preview.
38
+
39
+ **WebContainer System**: In-browser Node.js runtime. Filesystem, package manager, terminal, browser console — all in browser sandbox via WebAssembly.
40
+
41
+ **AI Integration**: Anthropic Claude API. AI agent interprets prompts → controls dev environment → generates code → installs dependencies → runs dev server → deploys.
42
+
43
+ **Deployment**: Cloudflare Pages. One-click production deploy.
44
+
45
+ ### Interaction Flow
46
+ ```
47
+ User → Submit prompt → AI Agent → Generate code → Create files
48
+ → Install deps → Run dev server → Display preview
49
+ → Request changes → Modify code → Update preview
50
+ → Request deploy → Deploy → Share URL
51
+ ```
52
+
53
+ ## Evil Martians Case Study
54
+
55
+ ### The Breakthrough
56
+ StackBlitz had WebContainers since 2021. The missing piece was a model capable of zero-shot code generation without RAG infrastructure. Claude 3.5 Sonnet changed everything: "There's an order of magnitude difference in the LLM's required infrastructure to make it functional versus zero shot."
57
+
58
+ ### Key Product Decisions
59
+ - Code executes instantly — no waiting for cloud VMs
60
+ - AI-generated code is **malleable** — editable in-browser
61
+ - Streaming interface shows real-time results
62
+ - Complex environments spin up in milliseconds
63
+ - One-click deploy to Netlify
64
+
65
+ ### Results
66
+ - 0→$4M ARR in 4 weeks
67
+ - 99% reduction in development costs for users
68
+ - Tens of thousands of new customers, usage doubling daily
69
+ - Supabase signups surged after bolt.new integration
70
+
71
+ ### Lessons for AI Coding Harness
72
+ 1. **Environment control is the moat.** If the agent can't run code, it can't verify its own output. Bolt's WebContainers + OpenAI Codex's Chrome DevTools integration + Anthropic's Playwright MCP all converge on this.
73
+ 2. **Model capability matters more than prompt engineering** for certain thresholds. Before Claude 3.5 Sonnet, the same WebContainer technology wasn't enough. Find the model that makes your harness viable.
74
+ 3. **Keep generated code editable by users.** Don't lock users into a black-box AI output. This reduces trust barriers and enables human-in-the-loop refinement.
75
+ 4. **Rails backend for non-AI concerns** — users, permissions, billing. Don't reinvent infrastructure; use proven tech for everything outside the AI path.
@@ -0,0 +1,107 @@
1
+ ---
2
+ type: source
3
+ status: ingested
4
+ source_type: architecture-analysis
5
+ title: "Build-Time Prompt Compilation Architecture"
6
+ author: "Synthesis of multiple real sources"
7
+ date_published: 2026-05-02
8
+ url: "https://github.com/microsoft/prompt-engine"
9
+ confidence: high
10
+ tags:
11
+ - prompt-compilation
12
+ - build-tools
13
+ - yaml-to-json
14
+ - template-engine
15
+ related:
16
+ - "[[Research: Prompt Renderer for Multi-Model Agent Harness]]"
17
+ key_claims:
18
+ - "Build-time prompt compilation is a valid architectural pattern but no mature off-the-shelf npm package exists"
19
+ - "Microsoft prompt-engine (2.8K stars, MIT) validates the YAML-based prompt management pattern"
20
+ - "PromptWeaver (@iqai/prompt-weaver, MIT, Dec 2025) provides template compilation + Zod validation for production use"
21
+ - "The DIY approach (js-yaml + @iqai/prompt-weaver + per-model renderer plugins) is the correct implementation path"
22
+ - "Deterministic builds: same spec + same renderer version → identical output with hash verification"
23
+ created: 2026-05-02
24
+ updated: 2026-05-02
25
+
26
+ ---
27
+
28
+ # Build-Time Prompt Compilation — Real Tools & Architecture
29
+
30
+ > [!correction] Previous research cited "PromptKit PackC" (npm, v1.4.6, 48 versions) which does not exist. This page documents the real tools and architecture.
31
+
32
+ ## What Exists
33
+
34
+ ### Microsoft prompt-engine
35
+ - **Package**: `prompt-engine` (npm)
36
+ - **Stars**: 2.8K | **License**: MIT | **Language**: TypeScript
37
+ - **Status**: Last updated Oct 2022 (abandoned)
38
+ - **What it does**: YAML-based prompt management with description + examples + dialog pattern. Builds prompts programmatically from YAML specs.
39
+ - **Relevance**: Validates the YAML→prompt pattern. Code engine (NL→Code) and Chat engine (dialogs). Shows the pattern works but project is dormant.
40
+ - **URL**: https://github.com/microsoft/prompt-engine
41
+
42
+ ### PromptWeaver
43
+ - **Package**: `@iqai/prompt-weaver` (npm)
44
+ - **Stars**: 4 | **License**: MIT | **Language**: TypeScript 100%
45
+ - **Status**: Active (Dec 2025, v1.1.1, 104 commits, 7 releases)
46
+ - **What it does**: Handlebars-based template engine with Zod/Valibot/ArkType validation schema support. Built-in 60+ transformers (dates, currency, strings, collections). Supports template compilation caching, reusable partials, Fluent Builder API, and composition.
47
+ - **Relevance**: Production-ready template engine for prompts. Handlebars syntax for control flow (loops, conditionals, switch/case). Template compilation caching gets us 90% of the way to "build-time compilation." The Fluent Builder API enables dynamic prompt construction.
48
+ - **URL**: https://github.com/IQAIcom/prompt-weaver
49
+
50
+ ### What Does NOT Exist
51
+ - No npm package called "PromptKit PackC" exists
52
+ - No npm package called `@altairalabs/packc` exists
53
+ - No npm package `prompt-kit` exists (only `promptkit@0.0.1` — unrelated template scaffolding tool)
54
+ - No mature, maintained build-time YAML→JSON prompt compiler exists on npm
55
+
56
+ ## Recommended Implementation
57
+
58
+ ### DIY Build Pipeline
59
+
60
+ The architecture is sound. Instead of looking for a mythical off-the-shelf package, build the compiler ourselves:
61
+
62
+ ```
63
+ prompts/*.yaml (base specs)
64
+ ↓ js-yaml (parse)
65
+ SpecConfig[] (validated)
66
+ ↓ @iqai/prompt-weaver (template engine)
67
+ ↓ Per-model renderer plugins (apply provider conventions)
68
+ ↓ zod (validate schema)
69
+ compiled prompts: dist/prompts/{gpt,claude,gemini}/*.json
70
+ ↓ SHA-256 (hash)
71
+ manifest.json (deterministic build record)
72
+ ```
73
+
74
+ ### Stack
75
+ | Component | Library | Purpose |
76
+ |-----------|---------|---------|
77
+ | YAML parsing | `js-yaml` (mature, 2.7K stars) | Parse base spec YAML files |
78
+ | Template engine | `@iqai/prompt-weaver` | Handlebars-based template compilation with Zod validation |
79
+ | Schema validation | `zod` | Type-safe spec validation, compile-time checking |
80
+ | Deterministic builds | `crypto.createHash('sha256')` | Hash source specs + renderer version for reproducibility |
81
+ | Per-model renderers | Custom TypeScript plugins | Apply each provider's official conventions |
82
+
83
+ ### Why Not Microsoft prompt-engine Directly?
84
+ - Abandoned since 2022 (80 commits total)
85
+ - No per-model rendering support
86
+ - Limited to Code/Chat engines — not general-purpose prompt specs
87
+ - Pattern is valid; codebase is stale
88
+
89
+ ### Why PromptWeaver?
90
+ - Active development (Dec 2025)
91
+ - Handlebars → familiar syntax for template authors
92
+ - Zod integration → type-safe, validated prompts
93
+ - Template compilation caching → same spec = cached compiled output
94
+ - Reusable partials → DRY prompt fragments
95
+ - Fluent Builder API → dynamic prompt construction when needed
96
+
97
+ ## Relevance to ultimate-pi Prompt Renderer
98
+
99
+ The build-time compilation architecture should:
100
+
101
+ 1. **Accept a base prompt spec (YAML)** as input: `prompts/base/system.yaml`
102
+ 2. **Use PromptWeaver as the template engine**: Handlebars syntax, Zod validation, template caching
103
+ 3. **Apply per-model renderer plugins**: Each plugin knows its provider's official conventions (OpenAI constraints-first, Anthropic XML tags, Google constraints-last)
104
+ 4. **Compile at build time** via `npm run compile-prompts` → outputs `dist/prompts/{model}/*.json`
105
+ 5. **Ship compiled JSON in npm package** — no template engine at runtime
106
+ 6. **Runtime just does JSON.parse + string replace**: `__VAR_name__` placeholders for runtime variables
107
+ 7. **Deterministic builds**: Same YAML + same renderer version → identical compiled output (hash-verified)
@@ -0,0 +1,70 @@
1
+ ---
2
+ type: source
3
+ source_type: official-docs
4
+ title: "Claude API — Agent Skills Overview"
5
+ author: "Anthropic"
6
+ date_published: 2026
7
+ url: "https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview"
8
+ confidence: high
9
+ key_claims:
10
+ - "Skills are reusable, filesystem-based resources that provide Claude with domain-specific expertise"
11
+ - "Three levels of loading: Metadata (always, ~100 tokens), Instructions (when triggered, <5k tokens), Resources (as needed, effectively unlimited)"
12
+ - "No practical limit on bundled content — files don't consume context until accessed"
13
+ - "Skills run in a code execution environment where Claude has filesystem access, bash commands, and code execution"
14
+ - "Script execution: code never enters context — only output consumes tokens"
15
+ - "Custom Skills: create as directories with SKILL.md files"
16
+ tags: [source, skills, claude, anthropic, progressive-disclosure]
17
+ related:
18
+ - "[[agent-skills-pattern]]"
19
+ - "[[progressive-disclosure-agents]]"
20
+ - "[[skill-first-architecture]]"
21
+ ---
22
+
23
+ # Claude API — Agent Skills Overview
24
+
25
+ ## Summary
26
+
27
+ Official Anthropic documentation for the Agent Skills system. Covers architecture, loading model, security considerations, and cross-surface availability (Claude API, Claude Code, claude.ai).
28
+
29
+ ## Key Contributions
30
+
31
+ ### Filesystem-Based Architecture
32
+
33
+ Skills exist as directories on a virtual machine. Claude interacts with them using bash commands — reading SKILL.md, running scripts, accessing reference files. This filesystem-based architecture enables progressive disclosure: Claude loads information in stages.
34
+
35
+ ### Three Content Types, Three Loading Levels
36
+
37
+ | Level | Content | When Loaded | Token Cost |
38
+ |-------|---------|-------------|------------|
39
+ | Level 1 | Metadata (YAML frontmatter: name + description) | Always (at startup) | ~100 tokens per skill |
40
+ | Level 2 | Instructions (SKILL.md body) | When skill is triggered | Under 5,000 tokens |
41
+ | Level 3+ | Resources (additional .md, scripts, templates) | As needed | Effectively unlimited |
42
+
43
+ ### On-Demand File Access
44
+
45
+ Claude reads only files needed for each specific task. A skill can include dozens of reference files — if a task only needs the sales schema, Claude loads just that one file. The rest consume zero tokens.
46
+
47
+ ### Efficient Script Execution
48
+
49
+ When Claude runs `validate_form.py`, the script's code never loads into context. Only the script's output consumes tokens. This makes scripts far more efficient than generating equivalent code on the fly.
50
+
51
+ ### No Practical Limit on Bundled Content
52
+
53
+ Because files don't consume context until accessed, skills can include comprehensive API documentation, large datasets, extensive examples, or any reference materials. Zero context penalty for unused bundled content.
54
+
55
+ ### Security Model
56
+
57
+ Skills should only come from trusted sources. A malicious skill can direct Claude to invoke tools or execute code in harmful ways. Recommendations: audit thoroughly, treat like installing software, be especially careful in production systems.
58
+
59
+ ## What We Adopt
60
+
61
+ - Filesystem-based skill architecture as the model for harness skills
62
+ - Three-tier loading model for progressive disclosure
63
+ - Scripts-as-executables pattern (code never enters context)
64
+ - No practical limit on bundled reference material — enables comprehensive attack pattern catalogs, plan templates, etc.
65
+
66
+ ## What We Note
67
+
68
+ - Cross-surface availability: Skills don't sync across Claude API, Claude Code, and claude.ai — each surface requires separate management
69
+ - Runtime constraints vary: Claude API has no network access and no runtime package installation; Claude Code has full network access
70
+ - Our harness skills are pi-specific but follow the open standard — portable to any platform that supports SKILL.md
@@ -0,0 +1,88 @@
1
+ ---
2
+ type: source
3
+ status: ingested
4
+ source_type: official-changelog
5
+ author: Google
6
+ date_published: 2026-04-30
7
+ date_accessed: 2026-05-01
8
+ url: https://geminicli.com/docs/changelogs/
9
+ confidence: high
10
+ key_claims:
11
+ - v0.40 (Apr 2026): Offline search (bundled ripgrep), four-tier memory system, Gemma local model support
12
+ - v0.39 (Apr): /memory inbox for skill review, ContextManager architecture, memory leak fixes
13
+ - v0.38 (Apr): Chapters narrative flow, Context Compression Service, persistent policy approvals
14
+ - v0.37 (Apr): Dynamic sandbox expansion, git worktrees, browser agent enhancements
15
+ - v0.36 (Apr): Multi-registry architecture, native macOS Seatbelt/Windows sandboxing, git worktrees
16
+ - v0.34 (Mar): Plan Mode enabled by default, gVisor/LXC sandboxing
17
+ - v0.32 (Mar): Generalist agent for task routing, model steering, Plan Mode external editor
18
+ - v0.29 (Feb): Plan Mode introduced, Gemini 3 default
19
+ - v0.27 (Feb): Event-driven scheduler, /rewind command, queued tool confirmations
20
+ - v0.26 (Jan): skill-creator skill, agent skills enabled by default, generalist agent
21
+ - v0.23 (Jan): Experimental Agent Skills support (agentskills.io)
22
+ - v0.12 (Oct 2025): Codebase investigator subagent, model routing, model selection
23
+ - Launch: Jun 25, 2025
24
+ created: 2026-05-02
25
+ updated: 2026-05-02
26
+ tags: [source]
27
+ ---
28
+ # Gemini CLI Changelogs (v0.4 — v0.40)
29
+
30
+ ## What It Is
31
+
32
+ Complete release history of Gemini CLI from launch (June 2025) through v0.40 (April 2026). Tracks feature evolution across 40+ weekly releases.
33
+
34
+ ## Feature Evolution Timeline
35
+
36
+ ### Phase 1: Foundation (Jun–Sep 2025, v0.4–v0.9)
37
+
38
+ - v0.4: Edit tool, CloudRun/Security extensions, prompt completion, citations
39
+ - v0.5: FastMCP integration, positional prompts, tool output truncation
40
+ - v0.6: JSON output mode, chat sharing, prompt search, A2A protocol RFC
41
+ - v0.7: IDE plugin spec, Flutter/nanobanana extensions, experimental todos
42
+ - v0.8: **Extensions ecosystem launch** (20+ partners), new homepage/docs
43
+ - v0.9: **Interactive Shell** (vim, git rebase), OpenTelemetry GenAI metrics
44
+
45
+ ### Phase 2: Intelligence (Oct–Dec 2025, v0.10–v0.22)
46
+
47
+ - v0.10: Polish + bug fixing investment
48
+ - v0.11: Jules extension (remote workers), stream-json output
49
+ - v0.12: **Codebase investigator subagent**, model routing, model selection
50
+ - v0.15: **Todo planning**, scrollable UI + mouse support
51
+ - v0.16: **Gemini 3 launch**
52
+ - v0.18: Policy engine (experimental), Google Workspace extension
53
+ - v0.20: Multi-file drag-drop, persistent "Always Allow" policies
54
+ - v0.21: Gemini 3 Flash, Rill/Browserbase extensions
55
+ - v0.22: Free tier gets Gemini 3, Conductor extension (planning++)
56
+
57
+ ### Phase 3: Agent Architecture (Jan–Apr 2026, v0.23–v0.40)
58
+
59
+ - v0.23: **Agent Skills support** (agentskills.io), gemini-wrapped
60
+ - v0.24: Built-in agent skills, `/agents refresh`, `/skills install/uninstall`
61
+ - v0.25: `activate_skill` tool, `pr-creator` skill, skills enabled by default
62
+ - v0.26: `skill-creator` skill, agent skills by default, generalist agent
63
+ - v0.27: **Event-driven scheduler**, `/rewind`, queued tool confirmations
64
+ - v0.28: Positron IDE, custom themes, OAuth improvements
65
+ - v0.29: **Plan Mode**, Gemini 3 default for all
66
+ - v0.30: SDK package, custom skills, policy engine `--policy` flag
67
+ - v0.31: Gemini 3.1 Pro Preview, experimental browser agent
68
+ - v0.32: **Generalist agent enabled**, model steering, Plan Mode external editor
69
+ - v0.33: A2A remote agents, Plan Mode research subagents
70
+ - v0.34: **Plan Mode default**, gVisor/LXC sandboxing
71
+ - v0.35: Customizable keyboard shortcuts, vim improvements, JIT context discovery
72
+ - v0.36: **Multi-registry architecture**, macOS Seatbelt/Windows sandboxing, git worktrees
73
+ - v0.37: Dynamic sandbox expansion, Chapters narrative, browser persistent sessions
74
+ - v0.38: **Chapters narrative flow**, Context Compression Service, persistent policy approvals
75
+ - v0.39: `/memory inbox`, ContextManager architecture decoupling
76
+ - v0.40: Offline search (bundled ripgrep), four-tier memory, Gemma local model support
77
+
78
+ ## Key Patterns
79
+
80
+ 1. **Rapid iteration**: Weekly releases, 6,005 commits in ~10 months
81
+ 2. **Progressive disclosure**: Features gated behind experimental flags → preview → stable → default
82
+ 3. **Ecosystem first**: Extensions launched v0.8, Skills v0.23 — both designed for community contribution
83
+ 4. **Security layered in**: Policy engine (v0.18), sandboxing (v0.34), worktrees (v0.36) — not bolted on
84
+ 5. **Model-adaptive**: Model routing (v0.12), model steering (v0.32), Gemma local (v0.40)
85
+
86
+ ## Relevance to Ultimate-PI
87
+
88
+ Gemini CLI's evolution pattern validates our phased approach: foundation → intelligence → agent architecture. Their rapid iteration (weekly releases, experimental → preview → stable → default) is a model for how we should deploy harness improvements. Their "ecosystem first" approach (extensions, skills registries) suggests we should design our tool system for community contribution from the start.
@@ -0,0 +1,57 @@
1
+ ---
2
+ type: source
3
+ status: ingested
4
+ source_type: official-announcement
5
+ author: Taylor Mullen, Ryan J. Salva (Google)
6
+ date_published: 2025-06-25
7
+ date_accessed: 2026-05-01
8
+ url: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemini-cli-open-source-ai-agent/
9
+ confidence: high
10
+ key_claims:
11
+ - Gemini CLI: open-source AI agent (Apache 2.0) bringing Gemini to terminal
12
+ - Free tier: 60 req/min, 1,000 req/day with personal Google account (industry's largest allowance)
13
+ - Access to Gemini 2.5 Pro with 1M token context window
14
+ - Built-in tools: Google Search grounding, MCP support, bundled extensions, customizable prompts
15
+ - Non-interactive mode for script automation
16
+ - Shares technology with Gemini Code Assist (VS Code + terminal)
17
+ - Open source: global community contribution expected
18
+ created: 2026-05-02
19
+ updated: 2026-05-02
20
+ tags: [source]
21
+ ---
22
+ # Google Official Blog: Gemini CLI Announcement
23
+
24
+ ## What It Is
25
+
26
+ Official launch announcement for Gemini CLI, published June 25, 2025 on Google's blog (The Keyword). Authored by Taylor Mullen (Senior Staff Software Engineer, creator of Gemini CLI) and Ryan J. Salva (Senior Director, Product Management).
27
+
28
+ ## Key Announcements
29
+
30
+ ### Free Tier (Unprecedented)
31
+ - 60 model requests per minute
32
+ - 1,000 model requests per day
33
+ - Access to Gemini 2.5 Pro with 1M token context window
34
+ - Requires only personal Google account
35
+ - Marketed as "industry's largest allowance"
36
+
37
+ ### Core Capabilities
38
+ - Code understanding, file manipulation, command execution, dynamic troubleshooting
39
+ - Ground prompts with Google Search for real-time web context
40
+ - Extend via MCP (Model Context Protocol) or bundled extensions
41
+ - Customize prompts and instructions for specific workflows
42
+ - Automate tasks via non-interactive script invocation
43
+
44
+ ### Open Source
45
+ - Apache 2.0 license
46
+ - Full source on GitHub: github.com/google-gemini/gemini-cli
47
+ - Community contribution expected (bugs, features, security, code)
48
+ - Emerging standards: MCP, system prompts (GEMINI.md), settings
49
+
50
+ ### Gemini Code Assist Integration
51
+ - Shares technology with Code Assist (VS Code)
52
+ - Agent mode in Code Assist: multi-step planning, auto-recovery from failures
53
+ - Available on all plans (free, Standard, Enterprise)
54
+
55
+ ## Relevance to Ultimate-PI
56
+
57
+ The free tier economics (60 req/min, 1,000 req/day) make Gemini CLI viable as a *model provider* within our multi-model harness. The 1M token window + Google Search grounding directly complement our L3 Grounding layer. The open-source model (Apache 2.0) means we can study and adapt their harness patterns without license concerns.
@@ -0,0 +1,53 @@
1
+ ---
2
+ type: source
3
+ status: ingested
4
+ source_type: official-documentation
5
+ author: Google
6
+ date_published: 2025-06-25
7
+ date_accessed: 2026-05-01
8
+ url: https://google-gemini.github.io/gemini-cli/docs/architecture.html
9
+ confidence: high
10
+ key_claims:
11
+ - Gemini CLI is composed of CLI package (frontend) and Core package (backend)
12
+ - Core receives requests, orchestrates Gemini API, manages tool execution
13
+ - Tools are individual modules for filesystem, shell, web fetch, search
14
+ - ReAct loop: user input → CLI → Core → Gemini API → tool execution → final response
15
+ - Key design principles: modularity, extensibility, user experience
16
+ - Read-only ops may not require user confirmation; write ops always do
17
+ created: 2026-05-02
18
+ updated: 2026-05-02
19
+ tags: [source]
20
+ ---
21
+ # Gemini CLI Architecture (Official Docs)
22
+
23
+ ## What It Is
24
+
25
+ The official architecture documentation for Google's Gemini CLI, an open-source AI agent (Apache 2.0) that brings Gemini models directly into the terminal.
26
+
27
+ ## Core Components
28
+
29
+ 1. **CLI package (`packages/cli`)**: User-facing — input processing, history management, display rendering, theme/UI customization, CLI configuration.
30
+
31
+ 2. **Core package (`packages/core`)**: Backend — API client for Gemini API, prompt construction/management, tool registration/execution, state management, server-side configuration.
32
+
33
+ 3. **Tools (`packages/core/src/tools/`)**: Individual modules extending Gemini model capabilities — filesystem operations, shell commands, web fetching, Google Search grounding, multi-file read, memory, MCP server bridge.
34
+
35
+ ## Interaction Flow
36
+
37
+ 1. User types prompt → CLI package
38
+ 2. CLI sends to Core package
39
+ 3. Core constructs prompt (history + tool definitions), sends to Gemini API
40
+ 4. Gemini API returns response (direct answer OR tool request)
41
+ 5. If tool: Core prepares execution, requests user approval for write/shell ops, executes, sends result back to API
42
+ 6. Core sends final response back to CLI
43
+ 7. CLI displays to user
44
+
45
+ ## Design Principles
46
+
47
+ - **Modularity**: Separating frontend from backend enables independent development and alternative frontends
48
+ - **Extensibility**: Tool system designed for adding new capabilities
49
+ - **User Experience**: Rich interactive terminal experience via CLI package
50
+
51
+ ## Relevance to Ultimate-PI
52
+
53
+ The two-package architecture (CLI/Core) maps to our L1-L4 (Core/Harness) vs L5-L8 (Observability/Memory/Orchestration) separation. Their tool registration + execution logic parallels our tool definitions. Their ReAct loop with approval gates parallels our planned pre-execution policy gates (P-F1).
@@ -0,0 +1,65 @@
1
+ ---
2
+ type: source
3
+ status: ingested
4
+ source_type: engineering-blog
5
+ author: Vivek Trivedy (LangChain)
6
+ date_published: 2026-03-10
7
+ date_accessed: 2026-05-01
8
+ url: https://blog.langchain.com/the-anatomy-of-an-agent-harness/
9
+ confidence: high
10
+ key_claims:
11
+ - Agent = Model + Harness. "If you're not the model, you're the harness."
12
+ - Harness includes: system prompts, tools/skills/MCPs, bundled infrastructure, orchestration logic, hooks/middleware
13
+ - Filesystem is most foundational harness primitive (durable state, collaboration surface, git versioning)
14
+ - Bash + code exec as general-purpose tool (avoid pre-designing every tool)
15
+ - Sandboxes for safe execution environments with good default tooling
16
+ - Context Rot management: compaction, tool call offloading, progressive disclosure (Skills)
17
+ - Ralph Loop: intercept model exit, reinject original prompt in clean context window
18
+ - Model-harness co-evolution creates overfitting — best harness for task may NOT be what model was trained with
19
+ created: 2026-05-02
20
+ updated: 2026-05-02
21
+ tags: [source]
22
+ ---
23
+ # LangChain: The Anatomy of an Agent Harness
24
+
25
+ ## What It Is
26
+
27
+ Comprehensive analysis of harness engineering from LangChain, published March 10, 2026. Defines harness primitives by working backwards from desired agent behavior.
28
+
29
+ ## Core Definition
30
+
31
+ **Agent = Model + Harness.** If you're not the model, you're the harness. A harness is every piece of code, configuration, and execution logic that isn't the model itself.
32
+
33
+ Concrete harness components: system prompts, tools/skills/MCPs + descriptions, bundled infrastructure (filesystem, sandbox, browser), orchestration logic (subagent spawning, handoffs, model routing), hooks/middleware (compaction, continuation, lint checks).
34
+
35
+ ## Key Harness Primitives
36
+
37
+ ### Filesystem
38
+ Most foundational primitive. Unlocks: workspace for reading data/code/docs, incremental work offloading, state persistence across sessions, collaboration surface (multiple agents + humans coordinate through shared files). Git adds versioning.
39
+
40
+ ### Bash + Code Execution
41
+ General-purpose tool. Instead of forcing users to build tools for every action, give agents a computer. Model can design its own tools on the fly via code. Still ship other tools, but code exec is default strategy for autonomous problem solving.
42
+
43
+ ### Sandboxes
44
+ Safe operating environments with good default tooling. Pre-installed runtimes, CLIs, browsers. Enable scale: create on demand, fan out, tear down.
45
+
46
+ ### Context Rot Management
47
+
48
+ - **Compaction**: Offloads/summarizes context near window limit.
49
+ - **Tool call offloading**: Keeps head + tail tokens of large outputs; full output on filesystem.
50
+ - **Progressive disclosure (Skills)**: Too many tools at startup degrades performance. Skills solve via on-demand loading.
51
+
52
+ ### Long-Horizon Execution
53
+
54
+ - **Ralph Loop**: Intercepts model exit attempt, reinjects original prompt in clean context. Filesystem makes this possible (fresh context reads state from previous iteration).
55
+ - **Planning + Self-Verification**: Plan files on filesystem, verification via test suites, hooks that loop back on failure.
56
+
57
+ ## Model-Harness Co-Evolution (Critical Insight)
58
+
59
+ Models post-trained with harness in the loop → overfitting to specific tool logic. Example: Codex's `apply_patch` tool — changing patch methods leads to worse model performance despite model intelligence.
60
+
61
+ **Counter-intuitive finding**: Terminal Bench 2.0 shows Opus 4.6 scores far lower in Claude Code than in other harnesses. LangChain improved their agent from Top 30 to Top 5 by only changing the harness. **"Best harness for your task is NOT necessarily the one a model was post-trained with."**
62
+
63
+ ## Relevance to Ultimate-PI
64
+
65
+ Validates our multi-model approach (4 profiles). Each model may need a different harness configuration — we should test model-harness combinations rather than assuming one harness fits all. The Ralph Loop concept could enhance our L2 Structured Planning by adding continuation hooks. Context rot management (compaction, offloading, progressive disclosure) directly validates our pi-lean-ctx + skills architecture.