ultimate-pi 0.1.2 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (516)
  1. package/.agents/skills/ck-search/SKILL.md +99 -0
  2. package/.agents/skills/defuddle/SKILL.md +90 -0
  3. package/.agents/skills/find-skills/SKILL.md +142 -0
  4. package/.agents/skills/firecrawl/SKILL.md +150 -0
  5. package/.agents/skills/firecrawl/rules/install.md +82 -0
  6. package/.agents/skills/firecrawl/rules/security.md +26 -0
  7. package/.agents/skills/firecrawl-agent/SKILL.md +57 -0
  8. package/.agents/skills/firecrawl-build-interact/SKILL.md +67 -0
  9. package/.agents/skills/firecrawl-build-onboarding/SKILL.md +102 -0
  10. package/.agents/skills/firecrawl-build-onboarding/references/auth-flow.md +39 -0
  11. package/.agents/skills/firecrawl-build-onboarding/references/project-setup.md +20 -0
  12. package/.agents/skills/firecrawl-build-onboarding/references/sdk-installation.md +17 -0
  13. package/.agents/skills/firecrawl-build-scrape/SKILL.md +68 -0
  14. package/.agents/skills/firecrawl-build-search/SKILL.md +68 -0
  15. package/.agents/skills/firecrawl-crawl/SKILL.md +58 -0
  16. package/.agents/skills/firecrawl-download/SKILL.md +69 -0
  17. package/.agents/skills/firecrawl-interact/SKILL.md +83 -0
  18. package/.agents/skills/firecrawl-map/SKILL.md +50 -0
  19. package/.agents/skills/firecrawl-parse/SKILL.md +61 -0
  20. package/.agents/skills/firecrawl-scrape/SKILL.md +68 -0
  21. package/.agents/skills/firecrawl-search/SKILL.md +59 -0
  22. package/.agents/skills/obsidian-bases/SKILL.md +299 -0
  23. package/.agents/skills/obsidian-markdown/SKILL.md +237 -0
  24. package/.agents/skills/posthog-analyst/SKILL.md +306 -0
  25. package/.agents/skills/posthog-analyst/evals/evals.json +23 -0
  26. package/.agents/skills/wiki/SKILL.md +215 -0
  27. package/.agents/skills/wiki/references/css-snippets.md +122 -0
  28. package/.agents/skills/wiki/references/frontmatter.md +107 -0
  29. package/.agents/skills/wiki/references/git-setup.md +58 -0
  30. package/.agents/skills/wiki/references/mcp-setup.md +149 -0
  31. package/.agents/skills/wiki/references/modes.md +259 -0
  32. package/.agents/skills/wiki/references/plugins.md +96 -0
  33. package/.agents/skills/wiki/references/rest-api.md +124 -0
  34. package/.agents/skills/wiki-autoresearch/SKILL.md +211 -0
  35. package/.agents/skills/wiki-autoresearch/references/program.md +75 -0
  36. package/.agents/skills/wiki-fold/SKILL.md +204 -0
  37. package/.agents/skills/wiki-fold/references/fold-template.md +133 -0
  38. package/.agents/skills/wiki-ingest/SKILL.md +288 -0
  39. package/.agents/skills/wiki-lint/SKILL.md +183 -0
  40. package/.agents/skills/wiki-query/SKILL.md +176 -0
  41. package/.agents/skills/wiki-save/SKILL.md +128 -0
  42. package/.ckignore +41 -0
  43. package/.env.example +9 -0
  44. package/.github/workflows/lint.yml +33 -0
  45. package/.github/workflows/publish-github-packages.yml +35 -0
  46. package/.github/workflows/publish-npm.yml +1 -1
  47. package/.pi/SYSTEM.md +107 -40
  48. package/.pi/agents/pi-pi/agent-expert.md +205 -0
  49. package/.pi/agents/pi-pi/cli-expert.md +47 -0
  50. package/.pi/agents/pi-pi/config-expert.md +67 -0
  51. package/.pi/agents/pi-pi/ext-expert.md +53 -0
  52. package/.pi/agents/pi-pi/keybinding-expert.md +123 -0
  53. package/.pi/agents/pi-pi/pi-orchestrator.md +103 -0
  54. package/.pi/agents/pi-pi/prompt-expert.md +83 -0
  55. package/.pi/agents/pi-pi/skill-expert.md +52 -0
  56. package/.pi/agents/pi-pi/theme-expert.md +46 -0
  57. package/.pi/agents/pi-pi/tui-expert.md +100 -0
  58. package/.pi/agents/rethink.md +140 -0
  59. package/.pi/agents/wiki-ingest.md +67 -0
  60. package/.pi/agents/wiki-lint.md +75 -0
  61. package/.pi/auto-commit.json +20 -0
  62. package/.pi/extensions/banner.png +0 -0
  63. package/.pi/extensions/ck-enforce.ts +216 -0
  64. package/.pi/extensions/custom-footer.ts +308 -0
  65. package/.pi/extensions/custom-header.ts +116 -0
  66. package/.pi/extensions/dotenv-loader.ts +170 -0
  67. package/.pi/internal/cursor-sdk-transcript-parser.ts +59 -0
  68. package/.pi/model-router.json +95 -0
  69. package/.pi/npm/.gitignore +2 -0
  70. package/.pi/prompts/git-sync.md +124 -0
  71. package/.pi/prompts/harness-setup.md +509 -0
  72. package/.pi/prompts/save.md +16 -0
  73. package/.pi/prompts/wiki-autoresearch.md +19 -0
  74. package/.pi/prompts/wiki.md +23 -0
  75. package/.pi/providers/cursor-sdk-provider.test.mjs +476 -0
  76. package/.pi/providers/cursor-sdk-provider.ts +1085 -0
  77. package/.pi/settings.json +14 -4
  78. package/.pi/skills/agent-router/SKILL.md +174 -0
  79. package/.pi/sounds/alert/1-kaching-track.mp3 +0 -0
  80. package/.pi/sounds/error/1-ksi-wth-track.mp3 +0 -0
  81. package/.pi/sounds/error/2-smash-track.mp3 +0 -0
  82. package/.pi/sounds/error/3-buzzer-track.mp3 +0 -0
  83. package/.pi/sounds/notification/1-soft-notification-track.mp3 +0 -0
  84. package/.pi/sounds/project-sounds.json +25 -0
  85. package/.pi/sounds/reminder/1-soft-notification-track.mp3 +0 -0
  86. package/.pi/sounds/success/1-tada-track.mp3 +0 -0
  87. package/.pi/sounds/success/2-jobs-done-track.mp3 +0 -0
  88. package/.pi/sounds/success/3-yay-track.mp3 +0 -0
  89. package/CONTRIBUTING.md +116 -0
  90. package/README.md +32 -39
  91. package/biome.json +34 -0
  92. package/firecrawl/.env.template +58 -0
  93. package/firecrawl/README.md +49 -0
  94. package/firecrawl/docker-compose.yaml +201 -0
  95. package/firecrawl/searxng/searxng.env +3 -0
  96. package/firecrawl/searxng/settings.yml +85 -0
  97. package/lefthook.yml +8 -0
  98. package/package.json +55 -24
  99. package/vault/AGENTS.md +37 -0
  100. package/vault/wiki/_templates/comparison.md +39 -0
  101. package/vault/wiki/_templates/concept.md +40 -0
  102. package/vault/wiki/_templates/decision.md +21 -0
  103. package/vault/wiki/_templates/entity.md +32 -0
  104. package/vault/wiki/_templates/flow.md +14 -0
  105. package/vault/wiki/_templates/module.md +18 -0
  106. package/vault/wiki/_templates/question.md +31 -0
  107. package/vault/wiki/_templates/source.md +39 -0
  108. package/vault/wiki/concepts/AST-Aware Code Chunking.md +44 -0
  109. package/vault/wiki/concepts/Build-Time Prompt Compilation.md +107 -0
  110. package/vault/wiki/concepts/Context Engine (AI Coding).md +47 -0
  111. package/vault/wiki/concepts/Context-Aware System Reminders.md +61 -0
  112. package/vault/wiki/concepts/Contextualized Text Embedding.md +42 -0
  113. package/vault/wiki/concepts/Contractor vs Employee AI Model.md +55 -0
  114. package/vault/wiki/concepts/Dual-Model Agent Architecture.md +65 -0
  115. package/vault/wiki/concepts/Late Chunking vs Early Chunking.md +43 -0
  116. package/vault/wiki/concepts/Majority Vote Ensembling.md +68 -0
  117. package/vault/wiki/concepts/Meta-Harness.md +16 -0
  118. package/vault/wiki/concepts/Multi-Agent AI Coding Architecture.md +75 -0
  119. package/vault/wiki/concepts/Prompt Enhancement.md +90 -0
  120. package/vault/wiki/concepts/Prompt Renderer.md +89 -0
  121. package/vault/wiki/concepts/Semantic Codebase Indexing.md +67 -0
  122. package/vault/wiki/concepts/additive-config-hierarchy.md +16 -0
  123. package/vault/wiki/concepts/agent-artifacts-verifiable-deliverables.md +71 -0
  124. package/vault/wiki/concepts/agent-browser-browser-automation.md +99 -0
  125. package/vault/wiki/concepts/agent-codebase-interface.md +43 -0
  126. package/vault/wiki/concepts/agent-harness-architecture.md +67 -0
  127. package/vault/wiki/concepts/agent-loop-detection-patterns.md +133 -0
  128. package/vault/wiki/concepts/agent-search-enforcement.md +126 -0
  129. package/vault/wiki/concepts/agent-skills-ecosystem.md +74 -0
  130. package/vault/wiki/concepts/agent-skills-pattern.md +68 -0
  131. package/vault/wiki/concepts/agentic-harness-context-enforcement.md +91 -0
  132. package/vault/wiki/concepts/agentic-harness.md +34 -0
  133. package/vault/wiki/concepts/agentic-orchestration-pipeline.md +56 -0
  134. package/vault/wiki/concepts/agentic-search-no-embeddings.md +18 -0
  135. package/vault/wiki/concepts/anthropic-context-engineering.md +13 -0
  136. package/vault/wiki/concepts/antigravity-agent-first-architecture.md +61 -0
  137. package/vault/wiki/concepts/ast-compression.md +19 -0
  138. package/vault/wiki/concepts/ast-truncation.md +66 -0
  139. package/vault/wiki/concepts/barrel-files.md +37 -0
  140. package/vault/wiki/concepts/browser-harness-agent.md +41 -0
  141. package/vault/wiki/concepts/browser-subagent-visual-verification.md +82 -0
  142. package/vault/wiki/concepts/codebase-intelligence-ecosystem-comparison.md +192 -0
  143. package/vault/wiki/concepts/codebase-intelligence-harness-integration.md +161 -0
  144. package/vault/wiki/concepts/codebase-to-context-ingestion.md +46 -0
  145. package/vault/wiki/concepts/codex-harness-innovations.md +147 -0
  146. package/vault/wiki/concepts/consensus-debate-flow.md +17 -0
  147. package/vault/wiki/concepts/consensus-debate.md +206 -0
  148. package/vault/wiki/concepts/content-addressed-spec-identity.md +166 -0
  149. package/vault/wiki/concepts/context-anxiety.md +57 -0
  150. package/vault/wiki/concepts/context-compression-techniques.md +19 -0
  151. package/vault/wiki/concepts/context-continuity.md +22 -0
  152. package/vault/wiki/concepts/context-drift-in-agents.md +106 -0
  153. package/vault/wiki/concepts/context-engineering.md +62 -0
  154. package/vault/wiki/concepts/context-folding.md +67 -0
  155. package/vault/wiki/concepts/context-mode.md +38 -0
  156. package/vault/wiki/concepts/cursor-harness-innovations.md +107 -0
  157. package/vault/wiki/concepts/deterministic-session-compaction.md +79 -0
  158. package/vault/wiki/concepts/drift-detection-unified.md +296 -0
  159. package/vault/wiki/concepts/execution-feedback-loop.md +46 -0
  160. package/vault/wiki/concepts/feedforward-feedback-harness.md +60 -0
  161. package/vault/wiki/concepts/five-root-cause-metrics-sentrux.md +40 -0
  162. package/vault/wiki/concepts/fork-safe-spec-storage.md +89 -0
  163. package/vault/wiki/concepts/fts5-sandbox.md +19 -0
  164. package/vault/wiki/concepts/fuzzy-edit-matching.md +71 -0
  165. package/vault/wiki/concepts/gemini-cli-architecture.md +104 -0
  166. package/vault/wiki/concepts/generator-evaluator-architecture.md +64 -0
  167. package/vault/wiki/concepts/guardian-agent-pattern.md +67 -0
  168. package/vault/wiki/concepts/harness-configuration-layers.md +89 -0
  169. package/vault/wiki/concepts/harness-control-frameworks.md +155 -0
  170. package/vault/wiki/concepts/harness-engineering-first-principles.md +90 -0
  171. package/vault/wiki/concepts/harness-h-formalism.md +53 -0
  172. package/vault/wiki/concepts/hybrid-code-search.md +61 -0
  173. package/vault/wiki/concepts/inline-post-edit-validation.md +112 -0
  174. package/vault/wiki/concepts/legendary-engineering-patterns-harness.md +110 -0
  175. package/vault/wiki/concepts/lifecycle-hooks.md +94 -0
  176. package/vault/wiki/concepts/mcp-tool-routing.md +102 -0
  177. package/vault/wiki/concepts/memory-system-of-record-vs-ephemeral-cache.md +47 -0
  178. package/vault/wiki/concepts/meta-agent-context-pruning.md +151 -0
  179. package/vault/wiki/concepts/model-adaptive-harness.md +122 -0
  180. package/vault/wiki/concepts/model-routing-agents.md +101 -0
  181. package/vault/wiki/concepts/monorepo-architecture.md +45 -0
  182. package/vault/wiki/concepts/multi-agent-specialization.md +61 -0
  183. package/vault/wiki/concepts/permission-subsystem.md +16 -0
  184. package/vault/wiki/concepts/pi-messenger-analysis.md +243 -0
  185. package/vault/wiki/concepts/pi-vscode-extension-landscape.md +37 -0
  186. package/vault/wiki/concepts/policy-engine-pattern.md +78 -0
  187. package/vault/wiki/concepts/progressive-disclosure-agents.md +53 -0
  188. package/vault/wiki/concepts/progressive-skill-disclosure.md +17 -0
  189. package/vault/wiki/concepts/provider-native-prompting.md +203 -0
  190. package/vault/wiki/concepts/quality-signal-sentrux.md +37 -0
  191. package/vault/wiki/concepts/repo-map-ranking.md +42 -0
  192. package/vault/wiki/concepts/result-monad-error-handling.md +47 -0
  193. package/vault/wiki/concepts/safety-defense-in-depth.md +83 -0
  194. package/vault/wiki/concepts/sandbox-os-enforcement.md +18 -0
  195. package/vault/wiki/concepts/selective-debate-routing.md +70 -0
  196. package/vault/wiki/concepts/self-evolving-harness.md +60 -0
  197. package/vault/wiki/concepts/sentrux-mcp-integration.md +36 -0
  198. package/vault/wiki/concepts/sentrux-rules-engine.md +49 -0
  199. package/vault/wiki/concepts/shell-pattern-compression.md +24 -0
  200. package/vault/wiki/concepts/skill-first-architecture.md +166 -0
  201. package/vault/wiki/concepts/structured-compaction.md +78 -0
  202. package/vault/wiki/concepts/subagent-orchestration.md +17 -0
  203. package/vault/wiki/concepts/subagent-worktree-isolation.md +68 -0
  204. package/vault/wiki/concepts/superpowers-methodology.md +78 -0
  205. package/vault/wiki/concepts/think-in-code.md +73 -0
  206. package/vault/wiki/concepts/ts-execution-layer.md +100 -0
  207. package/vault/wiki/concepts/typescript-strict-mode.md +37 -0
  208. package/vault/wiki/concepts/vcc-conversation-compaction-for-pi.md +51 -0
  209. package/vault/wiki/concepts/verification-drift-detection.md +19 -0
  210. package/vault/wiki/consensus/consensus-records.md +58 -0
  211. package/vault/wiki/decisions/2026-04-30-pi-lean-ctx-native.md +122 -0
  212. package/vault/wiki/decisions/adr-008.md +40 -0
  213. package/vault/wiki/decisions/adr-009.md +46 -0
  214. package/vault/wiki/decisions/adr-010.md +55 -0
  215. package/vault/wiki/decisions/adr-011.md +165 -0
  216. package/vault/wiki/decisions/adr-012.md +102 -0
  217. package/vault/wiki/decisions/adr-013.md +59 -0
  218. package/vault/wiki/decisions/adr-014.md +73 -0
  219. package/vault/wiki/decisions/adr-015.md +81 -0
  220. package/vault/wiki/decisions/adr-016.md +91 -0
  221. package/vault/wiki/decisions/adr-017.md +79 -0
  222. package/vault/wiki/decisions/adr-018.md +100 -0
  223. package/vault/wiki/decisions/adr-019.md +75 -0
  224. package/vault/wiki/decisions/adr-020.md +106 -0
  225. package/vault/wiki/decisions/adr-021.md +86 -0
  226. package/vault/wiki/decisions/adr-022.md +113 -0
  227. package/vault/wiki/decisions/adr-023.md +113 -0
  228. package/vault/wiki/decisions/adr-024.md +73 -0
  229. package/vault/wiki/decisions/adr-025.md +130 -0
  230. package/vault/wiki/decisions/adr-026.md +56 -0
  231. package/vault/wiki/decisions/colocate-wiki.md +34 -0
  232. package/vault/wiki/entities/Anders Hejlsberg.md +29 -0
  233. package/vault/wiki/entities/Anthropic.md +17 -0
  234. package/vault/wiki/entities/Augment Code.md +49 -0
  235. package/vault/wiki/entities/Bjarne Stroustrup.md +26 -0
  236. package/vault/wiki/entities/Bolt.new (StackBlitz).md +39 -0
  237. package/vault/wiki/entities/Boris Cherny.md +11 -0
  238. package/vault/wiki/entities/Claude Code.md +19 -0
  239. package/vault/wiki/entities/Dennis Ritchie.md +26 -0
  240. package/vault/wiki/entities/Emergent Labs.md +32 -0
  241. package/vault/wiki/entities/Google Cloud.md +16 -0
  242. package/vault/wiki/entities/Guido van Rossum.md +28 -0
  243. package/vault/wiki/entities/Ken Thompson.md +28 -0
  244. package/vault/wiki/entities/Lee et al.md +16 -0
  245. package/vault/wiki/entities/Linus Torvalds.md +28 -0
  246. package/vault/wiki/entities/Lovable (company).md +40 -0
  247. package/vault/wiki/entities/Martin Fowler.md +16 -0
  248. package/vault/wiki/entities/Meng et al.md +16 -0
  249. package/vault/wiki/entities/OpenAI.md +16 -0
  250. package/vault/wiki/entities/Rocket.new.md +38 -0
  251. package/vault/wiki/entities/VILA-Lab.md +15 -0
  252. package/vault/wiki/entities/autodev-codebase.md +18 -0
  253. package/vault/wiki/entities/ck-tool.md +59 -0
  254. package/vault/wiki/entities/codesearch.md +18 -0
  255. package/vault/wiki/entities/disler-indydevdan.md +33 -0
  256. package/vault/wiki/entities/gsd-get-shit-done.md +56 -0
  257. package/vault/wiki/entities/javascript-runtimes.md +48 -0
  258. package/vault/wiki/entities/jesse-vincent.md +38 -0
  259. package/vault/wiki/entities/lean-ctx.md +32 -0
  260. package/vault/wiki/entities/opendev.md +41 -0
  261. package/vault/wiki/entities/ops-codegraph-tool.md +18 -0
  262. package/vault/wiki/entities/pi-coding-agent.md +53 -0
  263. package/vault/wiki/entities/sentrux.md +54 -0
  264. package/vault/wiki/entities/vgrep-tool.md +57 -0
  265. package/vault/wiki/entities/vitest.md +41 -0
  266. package/vault/wiki/flows/harness-wiki-pipeline.md +204 -0
  267. package/vault/wiki/hot.md +932 -0
  268. package/vault/wiki/index.md +437 -0
  269. package/vault/wiki/log.md +418 -0
  270. package/vault/wiki/meta/dashboard.md +30 -0
  271. package/vault/wiki/meta/lint-report-2026-04-30.md +86 -0
  272. package/vault/wiki/meta/lint-report-2026-05-02.md +251 -0
  273. package/vault/wiki/meta/overview.canvas +43 -0
  274. package/vault/wiki/modules/adversarial-verification.md +57 -0
  275. package/vault/wiki/modules/automated-observability.md +54 -0
  276. package/vault/wiki/modules/bench.md +20 -0
  277. package/vault/wiki/modules/extensions.md +23 -0
  278. package/vault/wiki/modules/grounding-checkpoints.md +62 -0
  279. package/vault/wiki/modules/harness-implementation-plan.md +345 -0
  280. package/vault/wiki/modules/harness-wiki-skill-mapping.md +135 -0
  281. package/vault/wiki/modules/harness.md +86 -0
  282. package/vault/wiki/modules/persistent-memory.md +85 -0
  283. package/vault/wiki/modules/schema-orchestration.md +68 -0
  284. package/vault/wiki/modules/skills.md +27 -0
  285. package/vault/wiki/modules/spec-hardening.md +58 -0
  286. package/vault/wiki/modules/structured-planning.md +53 -0
  287. package/vault/wiki/modules/think-in-code-enforcement.md +153 -0
  288. package/vault/wiki/modules/wiki-query-interface.md +64 -0
  289. package/vault/wiki/overview.md +51 -0
  290. package/vault/wiki/questions/Research-pi-vs-claude-code-agentic-orchestration-pipeline.md +87 -0
  291. package/vault/wiki/questions/Research-sentrux-dev.md +123 -0
  292. package/vault/wiki/questions/Research-superpowers-skill-for-agentic-coding-agents.md +164 -0
  293. package/vault/wiki/questions/Research: Augment Code Context Engine.md +244 -0
  294. package/vault/wiki/questions/Research: Automating Software Engineering - Lovable, Bolt, Emergent, Rocket.md +112 -0
  295. package/vault/wiki/questions/Research: Claude Code State-of-the-Art Harness Improvements.md +209 -0
  296. package/vault/wiki/questions/Research: Codex State-of-the-Art Harness Improvements.md +99 -0
  297. package/vault/wiki/questions/Research: Engineering Workflows of Legendary Programmers and AI Harness Mapping.md +107 -0
  298. package/vault/wiki/questions/Research: Fallow Codebase Intelligence Harness Integration.md +72 -0
  299. package/vault/wiki/questions/Research: Gemini CLI SOTA Harness Integration.md +166 -0
  300. package/vault/wiki/questions/Research: GitHub Issues as Harness Spec Storage.md +188 -0
  301. package/vault/wiki/questions/Research: Google Antigravity Harness Integration.md +120 -0
  302. package/vault/wiki/questions/Research: Meta-Agent Context Drift Detection.md +236 -0
  303. package/vault/wiki/questions/Research: Model-Adaptive Agent Harness Design.md +95 -0
  304. package/vault/wiki/questions/Research: Model-Specific Prompting Guides.md +165 -0
  305. package/vault/wiki/questions/Research: Prompt Renderer for Multi-Model Agent Harness.md +216 -0
  306. package/vault/wiki/questions/Research: Skill-First Harness Architecture.md +91 -0
  307. package/vault/wiki/questions/Research: TypeScript Best Practices and Codebase Structure.md +88 -0
  308. package/vault/wiki/questions/Research: TypeScript Execution Layer for Agent Tool Calling.md +81 -0
  309. package/vault/wiki/questions/Research: claude-mem over Obsidian for Harness Layer.md +71 -0
  310. package/vault/wiki/questions/Research: claude-mem over obsidian wiki as the knowledge base for our agentic harness pipeline. think from first principles. does this replace or complement our current setup? no hard feelings about previous decisions. gimme accurate points.md +80 -0
  311. package/vault/wiki/questions/Research: context-mode vs lean-ctx.md +72 -0
  312. package/vault/wiki/questions/Research: cursor.sh Harness Innovations.md +92 -0
  313. package/vault/wiki/questions/Research: executor.sh Harness Integration.md +170 -0
  314. package/vault/wiki/questions/Research: how GSD fits into our coding harness setup.md +97 -0
  315. package/vault/wiki/questions/Research: how claude-mem fits into our workflow. and whether it should replace obsidian in the codebase. no hard feelings about previous actions, rethink from first principles always.md +80 -0
  316. package/vault/wiki/questions/Research: pi-vcc.md +113 -0
  317. package/vault/wiki/questions/Research: semantic code search tools.md +69 -0
  318. package/vault/wiki/questions/Research: vcc extension for pi coding agent.md +73 -0
  319. package/vault/wiki/questions/how-to-enable-semantic-code-search-now.md +111 -0
  320. package/vault/wiki/questions/mvp-implementation-blueprint.md +552 -0
  321. package/vault/wiki/questions/research-agent-first-codebase-exploration.md +199 -0
  322. package/vault/wiki/questions/research-agentic-coding-harness-latest-papers.md +142 -0
  323. package/vault/wiki/questions/research-gitingest-gitreverse-integration.md +100 -0
  324. package/vault/wiki/questions/research-wozcode-token-reduction.md +67 -0
  325. package/vault/wiki/questions/resolved-context-pruning-inplace-vs-restart.md +95 -0
  326. package/vault/wiki/questions/resolved-context-window-economics.md +167 -0
  327. package/vault/wiki/questions/resolved-imad-debate-gating-transfer.md +126 -0
  328. package/vault/wiki/questions/resolved-mcp-tool-preference.md +112 -0
  329. package/vault/wiki/questions/resolved-small-model-meta-agents.md +107 -0
  330. package/vault/wiki/questions/resolved-treesitter-dynamic-languages.md +95 -0
  331. package/vault/wiki/sources/Auggie Context MCP Server.md +63 -0
  332. package/vault/wiki/sources/Augment Code Codacy AI Giants.md +61 -0
  333. package/vault/wiki/sources/Augment Code MCP SiliconAngle.md +49 -0
  334. package/vault/wiki/sources/Augment Code WorkOS ERC 2025.md +55 -0
  335. package/vault/wiki/sources/Augment Context Engine Official.md +71 -0
  336. package/vault/wiki/sources/Augment SWE-bench Agent GitHub.md +74 -0
  337. package/vault/wiki/sources/Augment SWE-bench Pro Blog.md +58 -0
  338. package/vault/wiki/sources/Source: AgentBus Jinja2 Prompt Pipelines.md +75 -0
  339. package/vault/wiki/sources/Source: Arxiv — Don't Break the Cache.md +85 -0
  340. package/vault/wiki/sources/Source: Augment - Harness Engineering for AI Coding Agents.md +58 -0
  341. package/vault/wiki/sources/Source: Blake Crosley Agent Architecture Guide.md +100 -0
  342. package/vault/wiki/sources/Source: Bolt.new Architecture & Case Study.md +75 -0
  343. package/vault/wiki/sources/Source: Build-Time Prompt Compilation Architecture.md +107 -0
  344. package/vault/wiki/sources/Source: Claude API Agent Skills Overview.md +70 -0
  345. package/vault/wiki/sources/Source: Gemini CLI Changelogs.md +88 -0
  346. package/vault/wiki/sources/Source: Google Blog - Gemini CLI Announcement.md +57 -0
  347. package/vault/wiki/sources/Source: Google Gemini CLI Architecture Docs.md +53 -0
  348. package/vault/wiki/sources/Source: LangChain - Anatomy of Agent Harness.md +65 -0
  349. package/vault/wiki/sources/Source: Lovable Architecture & Clone Analysis.md +83 -0
  350. package/vault/wiki/sources/Source: Martin Fowler - Harness Engineering.md +70 -0
  351. package/vault/wiki/sources/Source: OpenAI Harness Engineering Five Principles.md +58 -0
  352. package/vault/wiki/sources/Source: OpenAI Harness Engineering — 0 Lines of Human Code.md +101 -0
  353. package/vault/wiki/sources/Source: OpenDev — Building AI Coding Agents for the Terminal.md +100 -0
  354. package/vault/wiki/sources/Source: Render AI Coding Agents Benchmark 2025.md +53 -0
  355. package/vault/wiki/sources/Source: Rocket.new — Vibe Solutioning Platform.md +70 -0
  356. package/vault/wiki/sources/Source: SwirlAI Agent Skills Progressive Disclosure.md +71 -0
  357. package/vault/wiki/sources/Source: TianPan Prompt Caching Architecture.md +89 -0
  358. package/vault/wiki/sources/Source: Vercel Labs agent-browser.md +155 -0
  359. package/vault/wiki/sources/Source: browser-harness CDP Harness.md +126 -0
  360. package/vault/wiki/sources/agent-drift-academic-paper.md +79 -0
  361. package/vault/wiki/sources/aider-repomap-tree-sitter.md +42 -0
  362. package/vault/wiki/sources/anthropic-compaction-api.md +58 -0
  363. package/vault/wiki/sources/anthropic-effective-harnesses.md +42 -0
  364. package/vault/wiki/sources/anthropic-prompt-best-practices.md +100 -0
  365. package/vault/wiki/sources/anthropic2026-harness-design.md +63 -0
  366. package/vault/wiki/sources/barrel-files-tkdodo.md +38 -0
  367. package/vault/wiki/sources/birth-of-unix-kernighan-interview.md +57 -0
  368. package/vault/wiki/sources/bockeler2026-harness-engineering.md +69 -0
  369. package/vault/wiki/sources/cast-code-chunking-paper.md +50 -0
  370. package/vault/wiki/sources/ck-semantic-search.md +78 -0
  371. package/vault/wiki/sources/claude-code-architecture-karaxai-2026.md +71 -0
  372. package/vault/wiki/sources/claude-code-architecture-qubytes-2026.md +50 -0
  373. package/vault/wiki/sources/claude-code-architecture-vila-lab-2026.md +64 -0
  374. package/vault/wiki/sources/claude-code-security-architecture-penligent-2026.md +70 -0
  375. package/vault/wiki/sources/claude-context-editing-docs.md +13 -0
  376. package/vault/wiki/sources/cloudflare-codemode.md +63 -0
  377. package/vault/wiki/sources/code-chunk-library-supermemory.md +63 -0
  378. package/vault/wiki/sources/codeact-apple-2024.md +62 -0
  379. package/vault/wiki/sources/codex-dsc-rfc-8573.md +41 -0
  380. package/vault/wiki/sources/codex-open-source-agent-2026.md +110 -0
  381. package/vault/wiki/sources/coir-code-retrieval-benchmark.md +51 -0
  382. package/vault/wiki/sources/colinmcnamara-context-optimization-codemode.md +48 -0
  383. package/vault/wiki/sources/context-folding-paper.md +61 -0
  384. package/vault/wiki/sources/context-mode-website.md +63 -0
  385. package/vault/wiki/sources/cursor-agent-best-practices-2026.md +62 -0
  386. package/vault/wiki/sources/cursor-fork-29b-2025.md +50 -0
  387. package/vault/wiki/sources/cursor-harness-april-2026.md +76 -0
  388. package/vault/wiki/sources/cursor-instant-apply-2024.md +45 -0
  389. package/vault/wiki/sources/cursor-shadow-workspace-2024.md +52 -0
  390. package/vault/wiki/sources/cursor-shipped-coding-agent-2026.md +53 -0
  391. package/vault/wiki/sources/cursor-vs-antigravity-2026.md +51 -0
  392. package/vault/wiki/sources/disler-pi-vs-claude-code.md +69 -0
  393. package/vault/wiki/sources/distill-deterministic-context-compression.md +53 -0
  394. package/vault/wiki/sources/embedding-models-benchmark-supermemory-2025.md +48 -0
  395. package/vault/wiki/sources/executor-rhyssullivan.md +122 -0
  396. package/vault/wiki/sources/fallow-rs-codebase-intelligence.md +125 -0
  397. package/vault/wiki/sources/fan2025-imad.md +60 -0
  398. package/vault/wiki/sources/forgecode-gpt5-agent-improvements.md +63 -0
  399. package/vault/wiki/sources/gemini-3-prompting-guide.md +78 -0
  400. package/vault/wiki/sources/gh-cli-sub-issue-rfc.md +50 -0
  401. package/vault/wiki/sources/gh-sub-issue-extension.md +72 -0
  402. package/vault/wiki/sources/github-fork-issues-discussion.md +44 -0
  403. package/vault/wiki/sources/github-issue-dependencies-docs.md +49 -0
  404. package/vault/wiki/sources/github-sub-issues-docs.md +51 -0
  405. package/vault/wiki/sources/gitingest.md +91 -0
  406. package/vault/wiki/sources/gitreverse.md +63 -0
  407. package/vault/wiki/sources/google-antigravity-official-blog.md +47 -0
  408. package/vault/wiki/sources/google-antigravity-wikipedia.md +53 -0
  409. package/vault/wiki/sources/gsd-codecentric-deep-dive.md +57 -0
  410. package/vault/wiki/sources/gsd-github-repo.md +51 -0
  411. package/vault/wiki/sources/gsd-hn-discussion.md +59 -0
  412. package/vault/wiki/sources/guido-python-design-philosophy.md +56 -0
  413. package/vault/wiki/sources/hejlsberg-7-learnings.md +48 -0
  414. package/vault/wiki/sources/ironclaw-drift-monitor.md +80 -0
  415. package/vault/wiki/sources/langsight-loop-detection.md +80 -0
  416. package/vault/wiki/sources/leanctx-website.md +69 -0
  417. package/vault/wiki/sources/lee2026-meta-harness.md +59 -0
  418. package/vault/wiki/sources/linux-kernel-coding-workflow.md +50 -0
  419. package/vault/wiki/sources/lou2026-autoharness.md +53 -0
  420. package/vault/wiki/sources/martin-fowler-harness-engineering.md +73 -0
  421. package/vault/wiki/sources/mcp-architecture-docs.md +13 -0
  422. package/vault/wiki/sources/meng2026-agent-harness-survey.md +79 -0
  423. package/vault/wiki/sources/mindstudio-four-agent-types.md +68 -0
  424. package/vault/wiki/sources/ms-chat-history-management.md +13 -0
  425. package/vault/wiki/sources/openai-prompt-guidance.md +104 -0
  426. package/vault/wiki/sources/openclaw-session-pruning.md +13 -0
  427. package/vault/wiki/sources/opencode-dcp.md +13 -0
  428. package/vault/wiki/sources/opendev-arxiv-2603.05344v1.md +79 -0
  429. package/vault/wiki/sources/openhands-platform.md +39 -0
  430. package/vault/wiki/sources/oss-guide-codebase-exploration.md +53 -0
  431. package/vault/wiki/sources/pi-compaction-extensions-ecosystem.md +102 -0
  432. package/vault/wiki/sources/pi-context-prune-github-repo.md +38 -0
  433. package/vault/wiki/sources/pi-mono-compaction-docs.md +38 -0
  434. package/vault/wiki/sources/pi-omni-compact-github-repo.md +50 -0
  435. package/vault/wiki/sources/pi-rtk-optimizer-github-repo.md +45 -0
  436. package/vault/wiki/sources/pi-vcc-github-repo.md +69 -0
  437. package/vault/wiki/sources/pi-vscode-marketplace.md +41 -0
  438. package/vault/wiki/sources/pi-vscode-model-provider-marketplace.md +39 -0
  439. package/vault/wiki/sources/py-tree-sitter.md +13 -0
  440. package/vault/wiki/sources/sentrux-dev-landing.md +40 -0
  441. package/vault/wiki/sources/sentrux-docs-pro-architecture.md +75 -0
  442. package/vault/wiki/sources/sentrux-docs-quality-signal.md +46 -0
  443. package/vault/wiki/sources/sentrux-docs-root-cause-metrics.md +57 -0
  444. package/vault/wiki/sources/sentrux-docs-rules-engine.md +58 -0
  445. package/vault/wiki/sources/sentrux-github-repo.md +56 -0
  446. package/vault/wiki/sources/superpowers-github-repo.md +56 -0
  447. package/vault/wiki/sources/superpowers-release-blog.md +54 -0
  448. package/vault/wiki/sources/superpowers-termdock-analysis.md +45 -0
  449. package/vault/wiki/sources/swe-agent-aci.md +42 -0
  450. package/vault/wiki/sources/swe-bench.md +45 -0
  451. package/vault/wiki/sources/swe-pruner-context-pruning.md +13 -0
  452. package/vault/wiki/sources/think-in-code-blog.md +48 -0
  453. package/vault/wiki/sources/tree-sitter-docs.md +13 -0
  454. package/vault/wiki/sources/ts-best-practices-2025-devto.md +42 -0
  455. package/vault/wiki/sources/ts-folder-structure-mingyang.md +58 -0
  456. package/vault/wiki/sources/ts-monorepo-koerselman.md +44 -0
  457. package/vault/wiki/sources/ts-result-error-handling-kkalamarski.md +52 -0
  458. package/vault/wiki/sources/ts-runtimes-comparison-betterstack.md +42 -0
  459. package/vault/wiki/sources/ts-strict-mode-rishikc.md +43 -0
  460. package/vault/wiki/sources/unix-philosophy.md +48 -0
  461. package/vault/wiki/sources/vectara-chunking-vs-embedding-naacl2025.md +39 -0
  462. package/vault/wiki/sources/vectara-guardian-agents.md +79 -0
  463. package/vault/wiki/sources/vgrep-semantic-search.md +76 -0
  464. package/vault/wiki/sources/vitest-official.md +41 -0
  465. package/vault/wiki/sources/vscode-pi-community-extension.md +40 -0
  466. package/vault/wiki/sources/wozcode.md +79 -0
  467. package/.agents/skills/compress/SKILL.md +0 -111
  468. package/.agents/skills/compress/scripts/__init__.py +0 -9
  469. package/.agents/skills/compress/scripts/__main__.py +0 -3
  470. package/.agents/skills/compress/scripts/benchmark.py +0 -78
  471. package/.agents/skills/compress/scripts/cli.py +0 -73
  472. package/.agents/skills/compress/scripts/compress.py +0 -227
  473. package/.agents/skills/compress/scripts/detect.py +0 -121
  474. package/.agents/skills/compress/scripts/validate.py +0 -189
  475. package/.agents/skills/emil-design-eng/SKILL.md +0 -679
  476. package/.agents/skills/lean-ctx/SKILL.md +0 -149
  477. package/.agents/skills/lean-ctx/scripts/install.sh +0 -95
  478. package/.agents/skills/scrapling-official/LICENSE.txt +0 -28
  479. package/.agents/skills/scrapling-official/SKILL.md +0 -390
  480. package/.agents/skills/scrapling-official/examples/01_fetcher_session.py +0 -26
  481. package/.agents/skills/scrapling-official/examples/02_dynamic_session.py +0 -26
  482. package/.agents/skills/scrapling-official/examples/03_stealthy_session.py +0 -26
  483. package/.agents/skills/scrapling-official/examples/04_spider.py +0 -58
  484. package/.agents/skills/scrapling-official/examples/README.md +0 -45
  485. package/.agents/skills/scrapling-official/references/fetching/choosing.md +0 -78
  486. package/.agents/skills/scrapling-official/references/fetching/dynamic.md +0 -352
  487. package/.agents/skills/scrapling-official/references/fetching/static.md +0 -432
  488. package/.agents/skills/scrapling-official/references/fetching/stealthy.md +0 -255
  489. package/.agents/skills/scrapling-official/references/mcp-server.md +0 -214
  490. package/.agents/skills/scrapling-official/references/migrating_from_beautifulsoup.md +0 -86
  491. package/.agents/skills/scrapling-official/references/parsing/adaptive.md +0 -212
  492. package/.agents/skills/scrapling-official/references/parsing/main_classes.md +0 -586
  493. package/.agents/skills/scrapling-official/references/parsing/selection.md +0 -494
  494. package/.agents/skills/scrapling-official/references/spiders/advanced.md +0 -344
  495. package/.agents/skills/scrapling-official/references/spiders/architecture.md +0 -94
  496. package/.agents/skills/scrapling-official/references/spiders/getting-started.md +0 -164
  497. package/.agents/skills/scrapling-official/references/spiders/proxy-blocking.md +0 -235
  498. package/.agents/skills/scrapling-official/references/spiders/requests-responses.md +0 -196
  499. package/.agents/skills/scrapling-official/references/spiders/sessions.md +0 -205
  500. package/PLAN.md +0 -11
  501. package/extensions/lean-ctx-enforce.ts +0 -166
  502. package/skills-lock.json +0 -35
  503. package/wiki/README.md +0 -19
  504. package/wiki/decisions/0001-establish-project-wiki-and-decision-record-format.md +0 -25
  505. package/wiki/decisions/0002-add-project-banner-to-readme.md +0 -26
  506. package/wiki/decisions/0003-remove-redundant-readme-title-heading.md +0 -26
  507. package/wiki/decisions/0004-publish-package-to-npm-as-ultimate-pi.md +0 -26
  508. package/wiki/decisions/0005-automate-npm-publish-with-github-actions.md +0 -27
  509. package/wiki/decisions/0006-switch-to-npm-trusted-publishing.md +0 -26
  510. package/wiki/decisions/0007-use-absolute-banner-url-for-npm-readme-rendering.md +0 -26
  511. package/wiki/decisions/0008-rename-banner-asset-for-cache-busting.md +0 -26
  512. package/wiki/decisions/0009-force-oidc-path-by-clearing-node-auth-token-in-publish-step.md +0 -25
  513. package/wiki/decisions/0010-simplify-setup-node-for-npm-trusted-publishing.md +0 -26
  514. package/wiki/decisions/0011-add-noop-workflow-change-to-force-fresh-publish-run.md +0 -25
  515. package/wiki/decisions/0012-align-workflow-runtime-with-npm-trusted-publishing-requirements.md +0 -26
  516. package/wiki/decisions/0013-add-package-repository-url-for-provenance-validation.md +0 -25
@@ -0,0 +1,932 @@
1
+ ---
2
+ type: meta
3
+ title: "Hot Cache"
4
+ updated: 2026-05-05T20:20:00
5
+ created: 2026-04-30
6
+ tags: []
7
+ status: active
8
+ ---
9
+
10
+ # Recent Context
11
+
12
+ ## Last Updated
13
+ 2026-05-05. Deep research: [[Research: pi-vcc]] (ecosystem expansion 4→7, Anthropic compaction API, Context Folding SOTA).
14
+
15
+ ## pi-vcc Research — Ecosystem Expansion (May 2026)
16
+
17
+ ### Key Finding
18
+ **Pi compaction ecosystem grew from 4 to 7 extensions operating at three distinct layers: prevention (rtk-optimizer), mid-session pruning (context-prune), and boundary compaction (vcc + 4 others). Anthropic launched official server-side compaction API (beta Jan 2026). Context Folding (arXiv 2510.11967) achieves 10x context reduction via RL-trained branch/return. pi-vcc remains the only zero-LLM option across all 7 extensions and the official API. Tool-calling accuracy collapses ~40% past 80K tokens — a hard cliff, not gradual decline.**
19
+
20
+ ### Ten Key Findings
21
+ 1. **Ecosystem grew 4→7 extensions**: Three new extensions since April: pi-omni-compact (large-context model subprocess), pi-context-prune (tool-call batch summarization), pi-rtk-optimizer (upstream command rewriting + output compaction).
22
+ 2. **Three-layer token management emerged**: Prevention (rtk-optimizer) → mid-session pruning (context-prune) → boundary compaction (vcc/others). Maps directly to our harness's layered context engineering.
23
+ 3. **pi-vcc still uniquely deterministic**: Across all 7 extensions plus Anthropic's official API, pi-vcc is the only zero-LLM option. Every other approach uses LLM summarization.
24
+ 4. **Anthropic launched official compaction API**: Beta since January 2026. Server-side automatic summarization for Claude Opus 4.7/4.6, Sonnet 4.6. Validates compaction as first-class platform concern.
25
+ 5. **Context Folding achieves 10x reduction**: arXiv 2510.11967 (ByteDance Seed/CMU/Stanford). FoldGRPO RL framework learns branch/return sub-trajectories. 62% BrowseComp-Plus, 58% SWE-Bench Verified at only 32K tokens.
26
+ 6. **80K token accuracy cliff confirmed**: Past ~80K effective-context tokens, tool-calling accuracy collapses ~40%. Hard cliff. Context windows beyond 80K misleading for agentic workloads.
27
+ 7. **pi-omni-compact: opposing philosophy**: Spawns 1M+ context model for highest-fidelity summaries. Exactly opposite of pi-vcc: more compute for quality vs zero compute for determinism.
28
+ 8. **Recall still unique**: No new extension or API offers searchable pre-compaction history. `vcc_recall` over raw JSONL remains pi-vcc's killer differentiator.
29
+ 9. **Pi ecosystem reached 2,808+ resources**: 1,183+ extensions, 1,459 active projects. Compaction among highest-activity categories.
30
+ 10. **65% of enterprise AI failures from context drift**: Compaction quality directly determines agent reliability. Not optional.
31
+
32
+ ### Pages Created
33
+ [[pi-rtk-optimizer-github-repo]], [[pi-omni-compact-github-repo]], [[pi-context-prune-github-repo]], [[anthropic-compaction-api]], [[context-folding-paper]] (sources), [[context-folding]] (concept)
34
+
35
+ ### Pages Updated
36
+ [[pi-compaction-extensions-ecosystem]] (4→7 extensions), [[deterministic-session-compaction]] (context folding + 3-layer model), [[Research: pi-vcc]] (10 findings, expanded landscape)
37
+
38
+ ## VCC Extension Research (May 2026)
39
+
40
+ ### Key Finding
41
+ **Topic has two valid meanings. For Pi, "VCC" can mean VS Code integration extensions OR literal VCC compaction (`pi-vcc`). VS Code side currently has three patterns: official bridge (`pi0.pi-vscode`), LM provider bridge (`tintinweb.vscode-pi-model-chat-provider`), and community full chat UX (`cdervis.vscode-pi`). Literal VCC side is `pi-vcc`, a deterministic compaction/recall Pi package, not a VS Code UI extension.**
42
+
43
+ ### Five Key Findings
44
+ 1. **Official extension exists and is active**: `pi0.pi-vscode` (1,348 installs, v0.0.9) keeps terminal-first flow, adds `@pi` chat participant, and ships rich VS Code bridge tools.
45
+ 2. **Model provider path exists**: `tintinweb.vscode-pi-model-chat-provider` (350 installs) exposes Pi via `vscode.lm.*` so Copilot Chat and LM API extensions can use Pi models.
46
+ 3. **Community full UX option exists**: `cdervis.vscode-pi` (352 installs, v0.13.0) is unofficial/prerelease but offers rich sidebar + RPC workflow and rapid iteration.
47
+ 4. **Literal VCC is compaction tech**: `sting8k/pi-vcc` uses deterministic no-LLM compaction, high token reduction, and recall over session lineage.
48
+ 5. **Naming collision causes confusion**: User phrase "vcc extension" is ambiguous unless clarified as IDE extension vs compaction extension.
49
+
50
+ ### Pages Created
51
+ [[pi-vscode-marketplace]], [[pi-vscode-model-provider-marketplace]], [[vscode-pi-community-extension]], [[pi-vcc-github-repo]] (sources), [[pi-vscode-extension-landscape]], [[vcc-conversation-compaction-for-pi]] (concepts), [[Research: vcc extension for pi coding agent]] (synthesis)
52
+
53
+ ## GSD Integration Research (May 2026)
54
+
55
+ ### Key Finding
56
+ **GSD (60K stars) is a downstream application-building pipeline (discuss→plan→execute→verify→ship). Our harness is an upstream behavior-control pipeline (spec→plan→drift→grounding→adversarial→observe→memory→orchestrate→query). They address fundamentally different layers. Complementary, not competitive.**
57
+
58
+ ### Six Key Findings
59
+ 1. **GSD is downstream; we are upstream.** GSD receives user ideas and produces apps. Our harness governs how agents reason, verify, and maintain state during any coding task.
60
+ 2. **GSD lacks adversarial verification — our L4 fills that gap.** GSD's quality gates are mechanical (lint, test, type-check). No agent reads another agent's code to assess spec-vs-intent alignment.
61
+ 3. **Both systems share skill-first architecture.** Since May 2026, our harness uses markdown skills. GSD has always been markdown-first. Both use progressive disclosure.
62
+ 4. **GSD's context engineering complements our L3 grounding.** Fresh subagent contexts, file-based state, XML plans — same techniques, different abstraction levels.
63
+ 5. **GSD's state files are a narrower version of our L6 memory.** `.planning/` is project memory. Our wiki is universal knowledge across all harness layers.
64
+ 6. **GSD's community limitations validate our harness design.** Token-heavy, degrades at scale, verification uses lexical tools — exactly the failure modes our L1-L4 pipeline prevents.
65
+
66
+ ### Patterns Worth Adopting
67
+ - **Namespace routing**: 6 meta-skills reducing 86→6 skill listings (saves ~2K tokens/turn)
68
+ - **Deterministic CLI helper**: `gsd-tools.cjs` — script for operations LLMs do unreliably
69
+ - **Wave execution tracking**: Dependency-aware parallel execution with per-task summaries
70
+
71
+ ### Integration Opportunities
72
+ - **Immediate**: Adopt namespace routing + deterministic helper pattern
73
+ - **Medium-term**: Run GSD inside harness-controlled pi sessions with harness drift monitoring
74
+ - **Long-term**: Unified skill marketplace if GSD-2 converges with pi
75
+
76
+ ### Pages Created
77
+ [[gsd-github-repo]], [[gsd-codecentric-deep-dive]], [[gsd-hn-discussion]] (sources), [[gsd-get-shit-done]] (entity), [[Research: how GSD fits into our coding harness setup]] (synthesis)
78
+
79
+ ---
80
+
81
+ ## Key Recent Facts
82
+ - Superpowers (`obra/superpowers`) by Jesse Vincent: 179K GitHub stars, 15.9K forks, MIT license, v5.1.0 (May 2026).
83
+ - Complete software development methodology as composable SKILL.md skills with hard-gate enforcement.
84
+ - Validates our skill-first architecture — both use progressive disclosure, markdown-based skills.
85
+ - Can be directly integrated as `.pi/skills/superpowers/` skill set. Adds methodology layer.
86
+ - Cannot replace our deterministic code-level enforcement (drift monitor). Best approach: Superpowers as methodology (probabilistic) + harness as enforcement (deterministic).
87
+ - Cross-agent SKILL.md portability: works across Claude Code, Codex, Cursor, Gemini CLI, OpenCode, Copilot.
88
+ - 490K+ skills ecosystem exists across Skills.sh (83K+), SkillsMP (400K+), ClawHub (~10K).
89
+
90
+ ## Active Threads
91
+ - User evaluating Superpowers for harness integration.
92
+ - Key decision: adopt Superpowers as skill set vs build custom methodology skills.
93
+ - Security consideration: 13.4% of marketplace skills have critical issues (Snyk study).
94
+
95
+ ---
96
+
97
+ Previous research still active. See below.
98
+
99
+ ---
100
+
101
+ ## pi-vs-claude-code — Agentic Orchestration Pipeline for Our Harness (May 2026)
102
+
103
+ ### Key Finding
104
+ **Pi's TypeScript extension system can implement full multi-agent orchestration — subagent delegation, team dispatch, and sequential chaining — entirely in user-space without core agent changes.** disler/pi-vs-claude-code (928 stars) provides production-grade reference implementations of all three patterns. Combined with OpenDev's comprehensive architecture paper (arXiv:2603.05344v1) and Martin Fowler's harness engineering framework, clear implementation paths emerge for our harness.
105
+
106
+ ### 7 Key Findings
107
+ 1. **Three orchestration patterns cover the design space** (Source: [[sources/disler-pi-vs-claude-code]]): Subagent delegation (`/sub`), team dispatch (`/team` via `teams.yaml`), sequential chaining (`/chain` via `agent-chain.yaml`). All implementable as `.pi/skills/` extensions.
108
+ 2. **Schema-level isolation > runtime permission checks** (Source: [[sources/opendev-arxiv-2603.05344v1]]): Removing tools from subagent schemas makes dangerous operations structurally impossible. The model cannot argue for capabilities it doesn't know exist.
109
+ 3. **Context engineering is a first-class concern** (Source: [[sources/opendev-arxiv-2603.05344v1]]): Staged compaction (5 thresholds), event-driven reminders (24 templates, `role: user` injection), dual-memory architecture. Our harness lacks all three.
110
+ 4. **Harness = Guides + Sensors + Steering Loop** (Source: [[sources/martin-fowler-harness-engineering]]): Feedforward guides steer before action; feedback sensors observe after. Human iterates on both.
111
+ 5. **Multi-model pipelines beat single-model agents** (Source: [[sources/mindstudio-four-agent-types]]): Different pipeline stages benefit from different models (Opus for planning, Sonnet for building, Haiku for reviewing).
112
+ 6. **Safety requires defense-in-depth** (Source: [[sources/opendev-arxiv-2603.05344v1]]): Five independent layers — prompt → schema → approval → validation → hooks. Our harness has none in structured form.
113
+ 7. **Agent types must match task architecture** (Source: [[sources/mindstudio-four-agent-types]]): Coding Harness, Dark Factory, Auto Research, Orchestration — each optimized for different work. Mismatch = failure.
114
+
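Finding 2 (schema-level isolation) can be illustrated with a minimal sketch. The `ToolSpec` shape, the `schemaForSubagent` helper, and the tool names are hypothetical and are not taken from Pi's extension API or the OpenDev paper; the point is only that a tool filtered out of the schema is never described to the subagent.

```typescript
// Hypothetical tool definition shape; Pi's real extension API may differ.
interface ToolSpec {
  name: string;
  description: string;
}

// Build the tool schema a subagent will see by keeping only allowlisted tools.
// Tools left out are never described to the model, so it cannot request them.
function schemaForSubagent(allTools: ToolSpec[], allowed: string[]): ToolSpec[] {
  return allTools.filter((tool) => allowed.indexOf(tool.name) !== -1);
}

const allTools: ToolSpec[] = [
  { name: "read", description: "Read a file" },
  { name: "edit", description: "Edit a file" },
  { name: "bash", description: "Run a shell command" },
];

// A read-only reviewer subagent gets no edit or bash tool at all.
const reviewerTools = schemaForSubagent(allTools, ["read"]);
```

Compared with a runtime permission check, there is no denial path for the model to negotiate with here: the restricted tools are simply absent from what it is shown.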
115
+ ### Concepts Created (5)
116
+ [[concepts/agentic-orchestration-pipeline]], [[concepts/agent-harness-architecture]], [[concepts/multi-agent-specialization]], [[concepts/context-engineering]], [[concepts/safety-defense-in-depth]]
117
+
118
+ ### Entities Created (3)
119
+ [[entities/pi-coding-agent]], [[entities/disler-indydevdan]], [[entities/opendev]]
120
+
121
+ ### Sources (5)
122
+ [[sources/disler-pi-vs-claude-code]], [[sources/opendev-arxiv-2603.05344v1]], [[sources/martin-fowler-harness-engineering]], [[sources/mindstudio-four-agent-types]], [[sources/anthropic-effective-harnesses]]
123
+
124
+ ### Synthesis
125
+ [[questions/Research-pi-vs-claude-code-agentic-orchestration-pipeline]] — Full research: 7 findings, 1 contradiction, 6 open questions
126
+
127
+ ### Implementation Path
128
+ 1. Extend existing `Agent` tool with team dispatch via YAML config (`.pi/agents/teams.yaml`)
129
+ 2. Add chain orchestration with `$INPUT` variable injection (`.pi/agents/agent-chain.yaml`)
130
+ 3. Implement context isolation per subagent (fresh conversation per spawn)
131
+ 4. Add progress dashboards (grid for teams, step tracker for chains)
132
+ 5. Build safety defense-in-depth (schema gating → approval → validation → hooks)
133
+ 6. Implement staged compaction (5 thresholds) and event-driven reminders
134
+
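Step 2 of the implementation path (chain orchestration with `$INPUT` injection) could look roughly like the sketch below. The `ChainStep` shape, the `RunAgent` callback, and the agent names are assumptions for illustration; the real `agent-chain.yaml` format and subagent spawning mechanism are not shown in this file.

```typescript
// Hypothetical shape of one step, as it might look after parsing agent-chain.yaml.
interface ChainStep {
  agent: string;
  prompt: string; // may contain the literal placeholder $INPUT
}

// Stand-in for spawning a subagent with a fresh conversation.
type RunAgent = (agent: string, prompt: string) => string;

// Run steps in order, substituting the previous output for every $INPUT.
function runChain(steps: ChainStep[], initialInput: string, runAgent: RunAgent): string {
  let input = initialInput;
  for (const step of steps) {
    const prompt = step.prompt.split("$INPUT").join(input);
    input = runAgent(step.agent, prompt);
  }
  return input;
}

// Example with a fake runner that just tags the prompt with the agent name.
const chainSteps: ChainStep[] = [
  { agent: "planner", prompt: "Plan: $INPUT" },
  { agent: "builder", prompt: "Build from plan: $INPUT" },
];
const fakeRunner: RunAgent = (agent, prompt) => "[" + agent + "] " + prompt;
const chainResult = runChain(chainSteps, "add login page", fakeRunner);
```

Passing the runner in as a function keeps the chaining logic testable without a live agent; each call receives only the rendered prompt, which matches the fresh-conversation-per-spawn idea in step 3.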
135
+ ### Open Questions
136
+ - Event-driven system reminders in Pi's extension API?
137
+ - Right compaction strategy for our context window?
138
+ - Persistent approval rules across sessions?
139
+ - 9-pass fuzzy editing in Pi's `edit` tool?
140
+ - Performance impact of context isolation per subagent?
141
+ - Steering loop mechanism (human review → update guides/sensors)?
142
+
143
+ ---
144
+
145
+ ## sentrux.dev — Architectural Sensor for AI Coding Agents (May 2026)
146
+
147
+ ### Key Finding
148
+ **sentrux closes the feedback loop for AI-agent-written code — a real-time architectural sensor that computes a single Quality Signal (0–10,000) from 5 independent graph-theoretic metrics, visualizes the codebase as an interactive treemap, and integrates with AI coding agents via MCP.** Built in Rust (MIT), 52 languages via tree-sitter. First released March 11, 2026 (v0.5.7 as of March 18). 1.9k GitHub stars, 168 forks. Pro tier ($15/month) via runtime dylib plugin with Ed25519 license keys.
149
+
150
+ ### 7 Key Findings
151
+ 1. **Unique positioning as the missing feedback loop**: sensor (sentrux) → signal → controller (AI agent) → actuator (code changes) → system (codebase) → loop. Classical cybernetics (Wiener 1948, Tsien 1954).
152
+ 2. **Mathematically grounded scoring**: Quality Signal = (modularity × acyclicity × depth × equality × redundancy)^(1/5) × 10,000, i.e. the geometric mean of the five metrics scaled to 0–10,000. Justified by Nash Social Welfare theorem (1950). Claims "ungameable by design."
153
+ 3. **5 metrics cover complete structural space**: 3 edge properties (modularity Newman 2004, acyclicity Martin 2003, depth Lakos 1996) + 2 node properties (equality Gini 1912, redundancy Kolmogorov 1963).
154
+ 4. **MCP-first AI agent integration**: 9 tools (scan, health, session_start, session_end, rescan, check_rules, evolution, dsm, test_gaps). Agent detects quality degradation per session.
155
+ 5. **Pro tier via runtime plugin**: Pro features ($15/mo) in separately downloaded dylib. Ed25519 license keys with offline validation. Per-user watermarking. Team tier at $40/month/seat.
156
+ 6. **Rapid development velocity**: Initial commit to v0.5.7 with Pro architecture in ~1 week. 318 commits, 37 releases.
157
+ 7. **52 languages via tree-sitter**: Zero language-specific Rust code. Adding language = plugin.toml + tags.scm only.
158
+
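The Quality Signal formula in finding 2 can be written out as a short sketch. The five metric names come from the text; the assumption that each metric is already normalized to the range [0, 1] is mine and is not confirmed by the sentrux documentation summarized here.

```typescript
// Metric names follow the text; the [0, 1] normalization is an assumption.
interface RootCauseMetrics {
  modularity: number;
  acyclicity: number;
  depth: number;
  equality: number;
  redundancy: number;
}

// Geometric mean of the five metrics, scaled to 0-10,000.
function qualitySignal(m: RootCauseMetrics): number {
  const values = [m.modularity, m.acyclicity, m.depth, m.equality, m.redundancy];
  for (const v of values) {
    if (v < 0 || v > 1) {
      throw new RangeError("each metric must be in [0, 1]");
    }
  }
  const product = values.reduce((acc, v) => acc * v, 1);
  return Math.round(Math.pow(product, 1 / values.length) * 10000);
}

const balanced = qualitySignal({ modularity: 0.5, acyclicity: 0.5, depth: 0.5, equality: 0.5, redundancy: 0.5 });
const oneWeak = qualitySignal({ modularity: 1, acyclicity: 1, depth: 1, equality: 1, redundancy: 0 });
```

Because the score is a product under a root, a single metric at zero drives the whole signal to zero, whereas an arithmetic mean would still report 8,000 for the second example. That property is presumably what the "ungameable" claim leans on.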
159
+ ### Contradictions
160
+ - **Self-assessment gap**: sentrux gives own repo a "D" rating — problematic for credibility.
161
+ - **Rapid release pace vs stability**: 17 releases in one day during launch. Reddit commenters flagged the project as "vibe-coded."
162
+ - **Conceptual strength vs practical utility**: Community split between praising feedback loop concept and questioning usefulness.
163
+
164
+ ### Sources (7)
165
+ [[sentrux-github-repo]], [[sentrux-dev-landing]], [[sentrux-docs-quality-signal]], [[sentrux-docs-root-cause-metrics]], [[sentrux-docs-rules-engine]], [[sentrux-docs-pro-architecture]], Reddit r/rust launch post
166
+
167
+ ### Concepts Created (4)
168
+ [[Quality Signal (sentrux)]], [[Five Root Cause Metrics (sentrux)]], [[sentrux Rules Engine]], [[sentrux MCP Integration]]
169
+
170
+ ### Entity Created
171
+ [[sentrux (tool)]]
172
+
173
+ ### Synthesis
174
+ [[Research: sentrux.dev]] — Full research: 7 findings, 4 contradictions, 6 open questions
175
+
176
+ ### Open Questions
177
+ - Production adoption unknown (tool <2 months old)
178
+ - No independent reviews beyond Reddit launch thread
179
+ - Accuracy varies by tree-sitter grammar maturity
180
+ - Scalability with large codebases unverified
181
+ - Pro plugin security: runtime dylib loading risks
182
+ - No benchmarking against SonarQube, CodeClimate, etc.
183
+
184
+ ---
185
+
186
+ Previous (2026-05-03): **Automating Software Engineering**, **Legendary Engineering Patterns**, and **Skill-First Architecture** research also active. See below.
187
+
188
+ ---
189
+
190
+ ## Automating Software Engineering — Platform Patterns for Harness (May 2026)
191
+
192
+ ### Key Finding
193
+ **Multi-agent architecture (Planner→Architect→Coder/Generator→Evaluator) is universal across all successful AI coding platforms.** Combined with deep engineering reports from OpenAI (Codex, 0 human-written lines) and Anthropic/OpenDev (context engineering, system reminders), clear first-principles patterns emerge for our harness.
194
+
195
+ ### 10 Key Findings
196
+ 1. **Multi-agent decomposition is universal**: Lovable (Planner→Architect→Coder), Anthropic (Planner→Generator→Evaluator), OpenAI (agent-to-agent review loops), OpenDev (dual-agent thinking/action separation).
197
+ 2. **Environment control is the moat**: Bolt's WebContainers ($4M ARR in 4 weeks), OpenAI's Chrome DevTools MCP, Anthropic's Playwright MCP. Agents must run and verify code, not just generate it.
198
+ 3. **Structured outputs prevent chaos**: Pydantic-typed handoffs between agents. "Structured data transforms AI from demo to production" (Lovable clone).
199
+ 4. **Context engineering is the central constraint**: Progressive disclosure (AGENTS.md as ToC, not encyclopedia), adaptive compaction (5 stages, 54% reduction), system reminders at decision points (OpenDev).
200
+ 5. **Repository as system of record**: "What Codex can't see doesn't exist." All knowledge in repo as versioned markdown. Doc-gardening agents scan for staleness (OpenAI).
201
+ 6. **Enforce architecture mechanically**: Custom linters + structural tests, not prompts. "Constraints become multipliers with agents" (OpenAI).
202
+ 7. **Generator-evaluator loop (GAN-inspired)**: Separate evaluator with hard-threshold criteria. Sprint contracts (agree on "done" before coding). Evaluator uses Playwright to actually click through apps.
203
+ 8. **"Code generation is a commodity"** (Rocket.new thesis): Pre-build strategy + competitive intelligence is the frontier. $15M seed, 1.5M users.
204
+ 9. **0 lines of human code**: OpenAI built production product (~1M lines, ~1.5K PRs) with Codex. Humans steer, agents execute. 3.5 PRs/engineer/day.
205
+ 10. **Garbage collection for AI slop**: Continuous background cleanup agents. "Golden principles" encoded mechanically. Technical debt = high-interest loan.
206
+
207
+ ### Sources (6 new)
208
+ [[Source: Lovable Architecture & Clone Analysis]], [[Source: Bolt.new Architecture & Case Study]], [[Source: Rocket.new — Vibe Solutioning Platform]], [[Source: OpenAI Harness Engineering — 0 Lines of Human Code]], [[Source: OpenDev — Building AI Coding Agents for the Terminal]], [[anthropic2026-harness-design]]
209
+
210
+ ### Concepts Created (2 new)
211
+ [[Context-Aware System Reminders]] — Event-driven behavioral guidance at decision points
212
+ [[Multi-Agent AI Coding Architecture]] — Universal pattern with structured handoffs
213
+
214
+ ### Entities Created (4)
215
+ [[Lovable (company)]], [[Bolt.new (StackBlitz)]], [[Rocket.new]], [[Emergent Labs]]
216
+
217
+ ### Synthesis
218
+ [[Research: Automating Software Engineering - Lovable, Bolt, Emergent, Rocket]] — Full research: 10 findings, 3 contradictions, 5 open questions
219
+
220
+ ### Page Updated
221
+ [[context-engineering]] (stub → developing, enriched with 8 first principles from OpenDev/OpenAI/Anthropic)
222
+
223
+ ---
224
+
225
+ ## Legendary Engineering Patterns for AI Harness (May 2026)
226
+
227
+ ### Key Finding
228
+ **10 cross-cutting engineering patterns from 6 legendary programmers — Torvalds, Thompson, Ritchie, Stroustrup, Hejlsberg, van Rossum — map directly to AI coding harness design.** The core insight: the same principles that produced the world's most durable software must constrain AI-generated code. Deterministic guardrails (type systems, linters, tests) become more important with AI, not less.
229
+
230
+ ### 10 Patterns → Harness Map
231
+ | Pattern | Source | Harness Map |
232
+ |---------|--------|-------------|
233
+ | Fast feedback loops | Hejlsberg, Torvalds | Instant lint/build on AI output; pre-execution type checking |
234
+ | Composability over monoliths | Thompson, Ritchie, McIlroy | Agent composition: specialized sub-agent pipeline stages |
235
+ | Chain of trust | Torvalds | Tiered verification: lint → type-check → test → critic → human |
236
+ | Subtractive design | Thompson, McIlroy | Harness "suggest deletion" mode — NOT YET IMPLEMENTED |
237
+ | Behavioral compatibility | Hejlsberg, Stroustrup, Torvalds | Fidelity gates: preserve existing test behavior |
238
+ | Pragmatism over perfection | van Rossum | "Correct enough" over "provably correct" |
239
+ | Readability first | Torvalds, van Rossum, Kernighan | Post-generation lint/style enforcement |
240
+ | Deep understanding → leverage | Thompson | Semantic codebase indexing prerequisite to generation |
241
+ | Type systems as guardrails | Hejlsberg, Stroustrup | Mandatory deterministic checks before human review |
242
+ | Shared context/community | Thompson, Ritchie, Kernighan | Wiki as digital Unix Room — all decisions visible, searchable |
243
+
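The "Chain of trust" row (lint → type-check → test → critic → human) amounts to an ordered pipeline that stops at the first failing tier. The sketch below shows that control flow only; the `Tier` shape and the placeholder checks are hypothetical and stand in for real lint, type-check, and test runners.

```typescript
// A verification tier: a name plus a check that returns true on pass.
// The check bodies below are placeholders, not real lint or test runners.
interface Tier {
  name: string;
  check: (change: string) => boolean;
}

interface TierResult {
  passed: boolean;
  stoppedAt: string | null; // name of the first failing tier, if any
  ran: string[];            // tiers that actually executed
}

// Run tiers in order and stop at the first failure, so expensive later
// tiers (critic, human review) only see changes that passed the cheap ones.
function runTiers(tiers: Tier[], change: string): TierResult {
  const ran: string[] = [];
  for (const tier of tiers) {
    ran.push(tier.name);
    if (!tier.check(change)) {
      return { passed: false, stoppedAt: tier.name, ran: ran };
    }
  }
  return { passed: true, stoppedAt: null, ran: ran };
}

const demoTiers: Tier[] = [
  { name: "lint", check: (c) => c.indexOf("\t") === -1 },
  { name: "type-check", check: (c) => c.indexOf(": any") === -1 },
  { name: "test", check: () => true },
];
const tierResult = runTiers(demoTiers, "let x: any = 1;");
```

In this example the change passes the lint placeholder and fails the type-check placeholder, so the test tier never runs.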
244
+ ### Critical Consensus
245
+ - **All six programmers oppose "vibe coding"** — human architectural control is non-negotiable
246
+ - **Hejlsberg + Stroustrup converge**: type systems are THE AI guardrail
247
+ - **Hejlsberg + van Rossum converge**: type hints above ~10K lines, dynamic below
248
+ - **van Rossum**: "We stay in control where it comes to architecture and API design"
249
+ - **Torvalds**: vibe coding "horrible for production, fine for learning" (2026)
250
+
251
+ ### Sources (5)
252
+ [[linux-kernel-coding-workflow]], [[unix-philosophy]], [[birth-of-unix-kernighan-interview]], [[hejlsberg-7-learnings]], [[guido-python-design-philosophy]]
253
+
254
+ ### Concept Created
255
+ [[legendary-engineering-patterns-harness]] — Full 10-pattern mapping with harness integration details
256
+
257
+ ### Entities Created (6)
258
+ [[Linus Torvalds]], [[Ken Thompson]], [[Dennis Ritchie]], [[Anders Hejlsberg]], [[Guido van Rossum]], [[Bjarne Stroustrup]]
259
+
260
+ ### Synthesis
261
+ [[Research: Engineering Workflows of Legendary Programmers and AI Harness Mapping]] — Full research synthesis: 10 findings, 5 contradictions resolved, 5 open questions
262
+
263
+ ### Open Questions
264
+ - Subtractive design in AI harness — not yet designed in any layer
265
+ - Thompson-level codebase understanding for AI agents — benchmark needed
266
+ - Balance fast feedback vs thorough verification per change type
267
+ - Harness equivalent of "don't break userspace"
268
+ - Van Rossum's 10K-line typing threshold — empirical validation needed
269
+
270
+ ---
271
+
272
+ **Skill-First Architecture**: Harness layers as markdown skills (`.pi/skills/harness-*/SKILL.md`) — only drift monitor remains as TypeScript code. Event bus handled by pi's built-in system. 3 code files vs 15. Progressive disclosure keeps context lean. See below.
273
+
274
+ ## Skill-First Harness Architecture (May 2026)
275
+
276
+ ### Key Finding
277
+ **Harness layers should be markdown-based skills, not TypeScript code modules.** The core insight: the harness is a skill coordination layer, not a code pipeline. Only deterministic infrastructure needs code — the drift monitor (real-time pattern matching on every `tool_result` event), shared types, and config. Event routing is handled by pi's built-in event bus. Everything else — spec hardening, planning, adversarial verification, observability, memory — is probabilistic LLM evaluation and should be a skill.
278
+
279
+ ### Architecture
280
+ ```
281
+ CODE LAYER (3 TS files — deterministic, always-on):
282
+ drift-monitor.ts (pattern matching) | types.ts | config.ts
283
+ EVENT BUS: pi's built-in native event system
284
+
285
+ SKILL LAYER (6 SKILL.md directories — probabilistic, on-demand):
286
+ harness-spec/ | harness-plan/ | harness-critic/ | harness-observe/ | harness-gate/ | harness-memory/
287
+
288
+ WIKI LAYER (Obsidian — persistent, cross-session):
289
+ ADRs, specs, plans, consensus, hot cache, index
290
+ ```
291
+
292
+ ### Why Skills Over Code
293
+ 1. **Better at evaluation**: An LLM judges spec quality, plan correctness, and code review better than imperative code can.
294
+ 2. **Progressive disclosure**: 6 harness skills cost ~480 tokens at discovery vs 15 code modules always loaded (~15K tokens).
295
+ 3. **Zero-compile iteration**: Edit markdown → agent picks up next activation. No TypeScript compilation for harness logic changes.
296
+ 4. **User-editable**: PMs/domain experts can edit SKILL.md without TypeScript knowledge.
297
+ 5. **Industry convergence**: Anthropic, OpenAI, Google, GitHub, Cursor all adopted SKILL.md open standard within weeks.
298
+
299
+ ### Why Some Things Stay Code
300
+ **Drift monitor MUST be code**: Runs deterministically on every `tool_result` event with sub-millisecond rule-based pre-filter. Skills are probabilistic — the model decides when to activate them. Drift detection cannot be skipped.
301
+
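A rule-based pre-filter of the kind described for the drift monitor might look like the sketch below. The `ToolResultEvent` shape, the two rules shown (repetition and failure spiral), and both thresholds are illustrative assumptions; they are not Pi's actual `tool_result` payload or the harness's real rule set.

```typescript
// Hypothetical event shape; Pi's actual tool_result payload may differ.
interface ToolResultEvent {
  tool: string;
  args: string; // serialized arguments
  isError: boolean;
}

type DriftSignal = "ok" | "repetition" | "failure-spiral";

// Thresholds are illustrative assumptions, not values from the harness.
const REPEAT_LIMIT = 3;
const FAILURE_LIMIT = 4;

// Pure rule-based check over recent events: no model call, no I/O.
function preFilter(history: ToolResultEvent[]): DriftSignal {
  const recentFailures = history.slice(-FAILURE_LIMIT);
  if (recentFailures.length === FAILURE_LIMIT && recentFailures.every((e) => e.isError)) {
    return "failure-spiral";
  }
  const keys = history.slice(-REPEAT_LIMIT).map((e) => e.tool + "\u0000" + e.args);
  if (keys.length === REPEAT_LIMIT && keys.every((k) => k === keys[0])) {
    return "repetition";
  }
  return "ok";
}
```

Since the function is deterministic and touches only a short slice of history, it can run on every event without the activation uncertainty that applies to skills.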
302
+ **Event bus is pi's built-in**: Pi's latest version ships a native event bus — no custom `events.ts` or `harness-event-bus.ts` needed. Skills register directly with pi's native events.
303
+
304
+ ### File Count
305
+ | Old (v1 Code-First) | New (v2 Skill-First) |
306
+ |---------------------|----------------------|
307
+ | 15 TypeScript files (~2,500 lines) | 3 TypeScript files (~600 lines) |
308
+ | 0 skill files | 12 skill files (6 SKILL.md + 6 reference.md) |
309
+ | Compilation required for every logic change | Compilation only when types/drift change |
310
+ | ~9 weeks to MVP | ~8 weeks to MVP |
311
+
312
+ ### Sources (3 new)
313
+ [[Source: SwirlAI Agent Skills Progressive Disclosure]] — Three-tier architecture, ecosystem adoption speed (Mar 2026)
314
+ [[Source: Claude API Agent Skills Overview]] — Filesystem-based skill architecture, loading levels, security
315
+ [[Source: Blake Crosley Agent Architecture Guide]] — Complete harness pattern: hooks, skills, subagents, production results (Apr 2026)
316
+
317
+ ### Concept Created
318
+ [[skill-first-architecture]] — Full architecture derivation, first principles, when skills vs when code
319
+
320
+ ### Synthesis
321
+ [[Research: Skill-First Harness Architecture]] — Full research synthesis: architecture comparison, contradictions, open questions
322
+
323
+ ### Plans Rewritten
324
+ [[mvp-implementation-blueprint]] — Skill-First v2: 19 files, 3 code + 12 skill + 4 config. Event bus handled by pi built-in. All skill bodies documented.
325
+ [[harness-implementation-plan]] — Skill-First v2: every phase now specifies implementation method (SKILL or CODE).
326
+
327
+ ---
328
+
329
+ Previous (2026-05-02): **P30 browser engine replaced**: Vercel Labs agent-browser (31.4K stars, Apache 2.0, Rust-native) replaces browser-harness (9.4K stars, MIT, Python). **L2.5 drift detection rethought from first principles**: LLM-first (Haiku 4.5) with structured drift context replaces rule-based primary.
330
+
331
+ ## P30 Browser Engine: agent-browser (May 2026)
332
+
333
+ ### browser-harness → Vercel Labs agent-browser
334
+ **Upgrade for maturity and AI agent integration.** agent-browser (31.4K GitHub stars, Apache 2.0, v0.26.0, 81 releases, 112 contributors) has roughly 3.3× the GitHub stars of browser-harness (9.4K stars, MIT, Python) and provides richer AI agent primitives:
335
+ - Snapshot + refs workflow (snapshot -i → click @e2 → fill @e3)
336
+ - Annotated screenshots with numbered labels matching @eN refs
337
+ - Structured diff (snapshot diff + visual pixel diff)
338
+ - React introspection (react tree, react renders, react suspense)
339
+ - Web Vitals (LCP/CLS/TTFB/FCP/INP)
340
+ - Batch mode (multi-command single invocation)
341
+ - Built-in skills system (skills get core, npx skills add)
342
+ - Rust-native single binary (no Python dependency)
343
+
344
+ Install: `npm install -g agent-browser`. Config: `.pi/harness/browser.json`.
345
+
346
+ See [[Source: Vercel Labs agent-browser]] and [[agent-browser-browser-automation]].
347
+
348
+ ### Files updated
349
+ HARNESS-PRD.md (P30, L2.5, deps, references, token budget, resolved Q19), wiki/modules/harness-implementation-plan.md (P3-P7, P30, L2.5, token budget, tools table), wiki/concepts/drift-detection-unified.md (full LLM-first rewrite), wiki/concepts/browser-subagent-visual-verification.md (agent-browser), wiki/concepts/agent-browser-browser-automation.md (new), wiki/sources/Source: Vercel Labs agent-browser.md (new), wiki/index.md, wiki/hot.md.
350
+
351
+ ## L2.5 Drift Detection: LLM-First v2 (May 2026)
352
+
353
+ ### First-Principles Rethink
354
+
355
+ **Problem**: Rule-based detection (6 patterns: repetition, failure spiral, etc.) catches ~80% of stuck sessions but MISSES semantic drift — agent making "progress" but heading in the wrong direction. FP #6 states drift is a positive feedback loop; the 20% that slip through are the most dangerous.
356
+
357
+ **Solution**: LLM-based primary detection with structured context input + very cheap model.
358
+
359
+ ```
360
+ Every 8 turns:
361
+ 1. Rule-based pre-filter (0 tokens, <1ms): if CLEAR stuck → escalate immediately
362
+ 2. Build structured drift context (~700 tokens):
363
+ { task, subtask, last_12_tool_calls_summary, files_modified, errors, turn }
364
+ 3. Send to Haiku 4.5: "Is agent making progress? Reply JSON."
365
+ 4. Act on verdict: continue | nudge | restart
366
+ ```
367
+
368
+ **Why LLM-first**:
369
+ - LLM has semantic understanding of task + plan — catches direction-drift that rule-based can't
370
+ - Structured context (not full history) keeps cost negligible: ~700 tokens × $0.25/M = ~$0.0002/check
371
+ - Classification task (not generation) is ideal for cheap models
372
+ - Rule-based is now the cost-saving pre-filter, not the authority
373
+ - Net positive: ~$0.001-0.005/session prevents 5,000-50,000 token stuck sessions
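The check loop above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the harness's actual implementation: `DriftContext` mirrors the field names in the structured context, while `isClearlyStuck`, `shouldCheck`, `parseVerdict`, `costPerCheck`, and the `{ "verdict": ... }` reply shape are hypothetical names chosen for illustration. The model call itself is left out.

```typescript
// Hypothetical shape; field names follow the structured drift context above.
interface DriftContext {
  task: string;
  subtask: string;
  last_12_tool_calls_summary: string[];
  files_modified: string[];
  errors: string[];
  turn: number;
}

type Verdict = "continue" | "nudge" | "restart";

const CHECK_INTERVAL = 8; // "Every 8 turns"

// Only run the drift check on every 8th turn.
function shouldCheck(turn: number): boolean {
  return turn > 0 && turn % CHECK_INTERVAL === 0;
}

// Rule-based pre-filter with one illustrative pattern (assumed, not from the source):
// the same tool call repeated `repeats` times in a row counts as clearly stuck.
function isClearlyStuck(toolCalls: string[], repeats = 4): boolean {
  if (toolCalls.length < repeats) return false;
  const tail = toolCalls.slice(-repeats);
  return tail.every((call) => call === tail[0]);
}

// Parse the model's JSON reply; fall back to "continue" on malformed output.
function parseVerdict(reply: string): Verdict {
  try {
    const parsed: unknown = JSON.parse(reply);
    if (typeof parsed === "object" && parsed !== null && "verdict" in parsed) {
      const v = (parsed as { verdict: unknown }).verdict;
      if (v === "continue" || v === "nudge" || v === "restart") return v;
    }
  } catch {
    // Malformed JSON: treated as "no signal" below.
  }
  return "continue";
}

// Cost of one check in USD, given a per-million-token input price.
function costPerCheck(tokens: number, usdPerMillionTokens: number): number {
  return (tokens * usdPerMillionTokens) / 1_000_000;
}
```

With the figures quoted above, `costPerCheck(700, 0.25)` works out to 0.000175 USD, which rounds to the ~$0.0002/check stated in the list.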
374
+
375
+ Token budget impact: L2.5 goes from ~0-150 → ~1,500-2,200/session. Total per-subtask: ~17,500-19,000 (up from 16,000-17,500).
376
+
377
+ See [[drift-detection-unified]] for full first-principles derivation.
378
+
379
+ ---
380
+
381
+ ## Tech Stack Corrections (2026-05-02)
382
+
383
+ ### PromptKit PackC → Real Tools
384
+ **Fabrication detected & corrected.** "PromptKit PackC" (npm, v1.4.6, 48 versions) does NOT exist — it was an LLM hallucination. The build-time prompt compilation architecture IS valid. No mature off-the-shelf npm package exists.
385
+
386
+ **Real alternatives**:
387
+ - **Microsoft prompt-engine** (2.8K stars, MIT): YAML-based prompt management. Validates the pattern. Abandoned 2022.
388
+ - **PromptWeaver** (`@iqai/prompt-weaver`, MIT, Dec 2025): Handlebars template compilation + Zod validation. Active. Production-ready.
389
+ - **DIY pipeline**: `js-yaml` (parsing) + `@iqai/prompt-weaver` (templates) + custom per-model renderer plugins. See [[Source: Build-Time Prompt Compilation Architecture]].
390
+
391
+ ### Puppeteer → browser-harness → agent-browser (P30)
392
+ **Evolution**: Puppeteer → browser-harness (May 1) → Vercel Labs agent-browser (May 2). agent-browser (31.4K stars, Apache 2.0, Rust-native) provides richer AI agent primitives: snapshot + refs, annotated screenshots, structured diff, React introspection, batch mode, skills system. Replaces browser-harness (9.4K stars, MIT, Python) for P30. See [[Source: Vercel Labs agent-browser]] and [[agent-browser-browser-automation]].
393
+
394
+ ---
395
+
396
+ ## TypeScript Best Practices and Codebase Structure (2026-05-02)
397
+
400
+ ### Key Finding
401
+ **TypeScript ecosystem has matured significantly.** Strict mode is consensus default. Barrel files are now discouraged for app code. Monorepo tooling (Turborepo, Nx) is production-ready. Vitest has replaced Jest for new projects. Bun is fastest runtime but Node.js remains safest for production. tRPC eliminates 89-98% of API bugs through compile-time type safety (separate research available).
402
+
403
+ ### Eight Key Findings
404
+ 1. **Enable `strict: true` by default.** `strictNullChecks` alone catches a large class of null-reference bugs at compile time. Migrate incrementally.
405
+ 2. **Avoid barrel files in app code.** They cause circular imports and 68% module bloat in real production measurements.
406
+ 3. **Node.js remains safest runtime.** Bun is 4× faster, but Node.js is backporting Bun/Deno innovations.
407
+ 4. **Built-package strategy preferred for TS monorepos.** Build with bundler, use Turborepo for orchestration, generate `.d.ts.map` for IDE support.
408
+ 5. **Vitest replaced Jest** as default test runner for Vite/TypeScript projects.
409
+ 6. **Name backend folders by technical capability** (controllers, services, repositories), not feature.
410
+ 7. **Result monad enables declarative error handling.** `Result<Ok, Err>` with map/flatMap/match. Errors are values, not exceptions.
411
+ 8. **ESLint + strict mode = defense-in-depth.** `@typescript-eslint/recommended-type-checked` catches what strict mode misses (floating promises).
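Finding 7 can be illustrated with a small sketch of a `Result` type with `map`, `flatMap`, and `match`. This is one possible encoding as a discriminated union, not necessarily the one used in the cited source; `ok`, `err`, `parsePort`, and `describePort` are hypothetical helper names.

```typescript
// A Result is either a success carrying a value or a failure carrying an error.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Transform the success value; failures pass through untouched.
function map<T, U, E>(r: Result<T, E>, f: (value: T) => U): Result<U, E> {
  return r.ok ? ok(f(r.value)) : r;
}

// Chain a step that can itself fail.
function flatMap<T, U, E>(r: Result<T, E>, f: (value: T) => Result<U, E>): Result<U, E> {
  return r.ok ? f(r.value) : r;
}

// Handle both branches explicitly; the compiler enforces that neither is forgotten.
function match<T, E, R>(
  r: Result<T, E>,
  handlers: { ok: (value: T) => R; err: (error: E) => R },
): R {
  return r.ok ? handlers.ok(r.value) : handlers.err(r.error);
}

// Example: parsing returns an error value instead of throwing.
function parsePort(raw: string): Result<number, string> {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 && n < 65536 ? ok(n) : err(`invalid port: ${raw}`);
}

const describePort = (raw: string): string =>
  match(map(parsePort(raw), (p) => p + 1), {
    ok: (p) => `next port ${p}`,
    err: (e) => e,
  });
```

Because the error is part of the return type, callers cannot reach the value without first checking `ok` or going through `match`.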
412
+
413
+ ### Sources (8)
414
+ [[ts-strict-mode-rishikc]], [[ts-runtimes-comparison-betterstack]], [[barrel-files-tkdodo]], [[ts-monorepo-koerselman]], [[vitest-official]], [[ts-folder-structure-mingyang]], [[ts-best-practices-2025-devto]], [[ts-result-error-handling-kkalamarski]]
415
+
416
+ ### Concepts Created
417
+ [[typescript-strict-mode]], [[barrel-files]], [[monorepo-architecture]], [[result-monad-error-handling]]
418
+
419
+ ### Entities Created
420
+ [[javascript-runtimes]], [[vitest]]
421
+
422
+ ### Synthesis
423
+ [[Research: TypeScript Best Practices and Codebase Structure]] — Full synthesis with key findings, contradictions, open questions.
424
+
425
+ ### Contradictions Found
426
+ - Barrel files: traditional wisdom vs TkDodo's performance data (resolution: libraries only)
427
+ - Folder structure: technical-capability vs feature-based (resolution: technical for backend, feature for frontend)
428
+ - Built vs source-only packages: depends on team/project size
429
+
430
+ ### Open Questions
431
+ - tRPC adoption rate vs REST in non-TypeScript environments
432
+ - Biome vs ESLint+Prettier adoption in 2026
433
+ - `isolatedModules: true` performance in large monorepos
434
+ - Oxc vs SWC vs ESBuild for type stripping benchmarks
435
+
436
+ ---
437
+
438
+ ## Prompt Renderer for Multi-Model Agent Harness (2026-05-02)
439
+
440
+ ### Key Finding
441
+ **Build-time prompt compilation is a PROVEN PATTERN — but no mature off-the-shelf npm package exists.** The architecture: base prompt spec (YAML) → per-model renderer plugins (GPT/Claude/Gemini) → compiled JSON shipped in npm → runtime just does string substitution for variables. Implementation: DIY build pipeline using `js-yaml` (parsing) + `@iqai/prompt-weaver` (Handlebars templates + Zod validation) + custom per-model renderer plugins. Microsoft prompt-engine (2.8K stars, MIT) validates the YAML→prompt pattern but is abandoned (last update 2022). PromptWeaver (MIT, Dec 2025) provides the active template compilation layer. This eliminates runtime template engines, cache warmup latency, and parallel-execution traps entirely. See [[Source: Build-Time Prompt Compilation Architecture]] for full correction.
442
+
443
+ ### Architecture
444
+ ```
445
+ BUILD TIME: Base Spec (YAML) → Compiler → GPT.json + Claude.json + Gemini.json → npm package
446
+ RUNTIME: Load {spec, model}.json → substitute runtime vars → send to LLM
447
+ ```
448
+
449
+ ### Core Design Decisions
450
+ 1. **Build-time, not runtime**: Compiler runs during `npm run build`. Compiled prompts are static JSON assets in `dist/prompts/`. No template engine shipped.
451
+ 2. **Per-model renderers are plugins**: Each provider's official conventions applied at compile time. GPT (constraints-first, flat), Claude (XML tags, long-form), Gemini (constraints-last, plain text).
452
+ 3. **Two-phase variable system**: Compile-time vars produce multiple compiled variants. Runtime vars are `__VAR_name__` placeholders for simple string replace (no template engine needed).
453
+ 4. **Caching is built-in**: Compiled prompts ARE the cache. Incremental builds only recompile changed specs (hash-based). No runtime cache warming, no parallel-execution trap, no TTL concerns.
454
+ 5. **Deterministic builds**: Same spec + same compiler version → identical output. Hash-verified via build manifest.
455
+
456
+ ### Multi-Tier Caching Context
457
+ - **Semantic cache** (100% savings): intercept near-duplicate queries before API call
458
+ - **Prefix cache** (50-90% savings): static system prompts cached by provider API
459
+ - **Build cache**: compiled prompts shipped in package — no runtime prefix caching needed
460
+ - Arxiv-validated (500 agent sessions, 4 models): system prompt only caching = 41-80% cost reduction, 13-31% TTFT improvement
461
+
462
+ ### Per-Model Rendering Rules
463
+ | Provider | Structure | Instruction Order | Cache | Best Practice Source |
464
+ |----------|-----------|------------------|-------|---------------------|
465
+ | OpenAI GPT | Flat, constraints-first | Outcome→Constraints→Context | Auto | platform.openai.com/docs/guides/prompt-engineering |
466
+ | Anthropic Claude | XML tags, nesting OK | Role→Context→Task→XML | Explicit cache_control | docs.anthropic.com + interactive tutorial |
467
+ | Google Gemini | Plain text, constraints-last | Context→Task→Constraints | Explicit context cache | cloud.google.com/vertex-ai/docs |
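The per-model ordering in the table can be sketched as renderer plugins that a compiler runs once per spec. This is an illustrative sketch only: the `PromptSpec` fields, the `renderers` map, and `compileAll` are hypothetical, "Outcome" is treated here as the task text, and the placement of constraints inside the Claude output is an assumption the table does not specify.

```typescript
// Hypothetical base spec; in the described pipeline this would come from YAML.
interface PromptSpec {
  role: string;
  context: string;
  task: string;
  constraints: string[];
}

type Provider = "gpt" | "claude" | "gemini";
type Renderer = (spec: PromptSpec) => string;

const bullets = (items: string[]): string => items.map((c) => `- ${c}`).join("\n");

// One renderer per provider; only the ordering and wrapping differ.
const renderers: Record<Provider, Renderer> = {
  // Flat text, constraints before context.
  gpt: (s) => [s.task, bullets(s.constraints), s.context].join("\n\n"),
  // XML tags: role, then context, then task.
  claude: (s) =>
    [
      `<role>${s.role}</role>`,
      `<context>${s.context}</context>`,
      `<task>${s.task}</task>`,
      `<constraints>\n${bullets(s.constraints)}\n</constraints>`,
    ].join("\n"),
  // Plain text, constraints last.
  gemini: (s) => [s.context, s.task, bullets(s.constraints)].join("\n\n"),
};

// Build-time step: produce every provider variant from a single spec.
function compileAll(spec: PromptSpec): Record<Provider, string> {
  return {
    gpt: renderers.gpt(spec),
    claude: renderers.claude(spec),
    gemini: renderers.gemini(spec),
  };
}
```

Keeping renderers as plain functions in a map means a new provider is one more entry rather than a change to the compiler.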
468
+
469
+ ### Integration with Harness
470
+ Extends [[provider-native-prompting]] (Phase P22b from prior research). New compiler module: `scripts/compile-prompts.ts`. Compiled output: `dist/prompts/{gpt,claude,gemini}/*.json`. Runtime loader: `loadPrompt(specName, model, runtimeVars)` — zero-dependency, just `JSON.parse` + string replace.
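The runtime half (`JSON.parse` plus string replace) might look like the sketch below. It is an assumption-laden illustration, not the harness's loader: the `CompiledPrompt` JSON shape and the names `substituteRuntimeVars` and `loadPromptFromJson` are hypothetical, and file reading from `dist/prompts/` is omitted so the example stays dependency-free.

```typescript
// Assumed shape of one compiled prompt file.
interface CompiledPrompt {
  spec: string;
  model: string;
  text: string;
}

// Replace every __VAR_name__ placeholder with its runtime value.
// Throws on a missing variable so an unfilled placeholder never reaches the LLM.
function substituteRuntimeVars(text: string, vars: Record<string, string>): string {
  return text.replace(/__VAR_(\w+?)__/g, (_placeholder: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
    if (value === undefined) {
      throw new Error(`missing runtime var: ${name}`);
    }
    return value;
  });
}

// Parse the compiled JSON and fill in runtime variables. No template engine involved.
function loadPromptFromJson(json: string, runtimeVars: Record<string, string>): string {
  const compiled = JSON.parse(json) as CompiledPrompt;
  return substituteRuntimeVars(compiled.text, runtimeVars);
}
```

A real loader would wrap this with a file read keyed on spec name and model; failing loudly on a missing variable is a design choice made here for safety, not something the source prescribes.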
471
+
472
+ ### Sources (4 new)
473
+ [[Source: Build-Time Prompt Compilation Architecture]] — Real tools: DIY pipeline (js-yaml + PromptWeaver + per-model renderers)
474
+ [[Source: AgentBus Jinja2 Prompt Pipelines]] — Jinja2 templating patterns adapted to build-time
475
+ [[Source: TianPan Prompt Caching Architecture]] — Multi-tier caching, 60-90% savings, cache boundary control
476
+ [[Source: Arxiv — Don't Break the Cache]] — Academic validation: 41-80% cost reduction across providers
477
+
478
+ ### Synthesis
479
+ [[Research: Prompt Renderer for Multi-Model Agent Harness]] — Full architecture, 5-phase implementation plan, 7 open questions
480
+
481
+ ### New Concept Pages
482
+ [[Prompt Renderer]] — Build-time compilation: base spec → per-model prompts via pluggable renderers
483
+ [[Build-Time Prompt Compilation]] — Compile at build time, ship as static JSON in npm, zero runtime cost
484
+
485
+ ---
486
+
487
+ ## executor.sh Harness Integration (2026-05-01)
488
+
489
+ ### Key Finding
490
+ **executor.sh (RhysSullivan/executor) is an integration layer — a unified tool catalog + auth + policy + execution runtime — not just a TypeScript execution layer.** Our existing wiki classified it alongside CodeAct and Cloudflare Code Mode under P43 (TS Execution Layer). This research finds Executor belongs in a broader category: the agent integration/runtime layer.
491
+
492
+ ### Three Gaps Revealed
493
+ 1. **No tool catalog with intent-based discovery**: `tools.discover({ query, limit })` lets agents search tools by intent without loading all schemas → new **P43b**
494
+ 2. **No shared auth for external tools**: Executor centralizes OAuth/keychain across agents → gap for post-v1
495
+ 3. **No execution pause/resume**: Stateful execution lifecycle with `waiting_for_interaction` state → new **P43c**
496
+
497
+ ### Five Pillars (from executor.sh landing)
498
+ 1. Unified catalog with intent-based discovery
499
+ 2. Shared auth (sign in once, all agents share)
500
+ 3. Policy engine (auto-approve reads, pause on writes, wildcards)
501
+ 4. Any agent via MCP (single MCP endpoint)
502
+ 5. Local-first (secrets in keychain, nothing leaves machine)
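Pillar 3 (auto-approve reads, pause on writes, wildcards) can be illustrated with a first-match rule list. This is a hypothetical sketch and does not reflect Executor's actual API or rule syntax: `PolicyRule`, `matchesPattern`, `decide`, and the dotted tool names are all invented for the example.

```typescript
type Action = "approve" | "pause";

interface PolicyRule {
  pattern: string; // "*" matches any run of characters
  action: Action;
}

// Turn a wildcard pattern into an anchored regular expression.
function matchesPattern(pattern: string, toolName: string): boolean {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`).test(toolName);
}

// First matching rule wins; anything unmatched pauses for human approval.
function decide(rules: PolicyRule[], toolName: string): Action {
  for (const rule of rules) {
    if (matchesPattern(rule.pattern, toolName)) return rule.action;
  }
  return "pause";
}

// Illustrative rule set: read-style calls pass, everything else on GitHub pauses.
const exampleRules: PolicyRule[] = [
  { pattern: "*.read", action: "approve" },
  { pattern: "*.list", action: "approve" },
  { pattern: "github.*", action: "pause" },
];
```

Defaulting to `pause` keeps unknown tools on the safe side, which fits the pause-on-writes intent described above.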
503
+
504
+ ### Build vs Integrate Decision
505
+ - **Harness-native L3 tools (P43)**: Build custom TS runtime with borrowed catalog/discovery/policy patterns
506
+ - **External API integration (GitHub, Slack, Stripe)**: Use Executor as MCP dependency (post-v1)
507
+
508
+ ### Source Updated
509
+ [[executor-rhyssullivan]] — major rewrite with product positioning, architecture, policy engine, execution lifecycle
510
+
511
+ ### Synthesis
512
+ [[Research: executor.sh Harness Integration]]
513
+
514
+ ### Plan Updated
515
+ [[harness-implementation-plan]] — P43b (Tool Catalog with Discovery), P43c (Policy-Aware Execution)
516
+
517
+ ---
518
+
519
+ ## Fallow Codebase Intelligence Harness Integration (2026-05-01)
520
+
521
+ ### Key Finding
522
+ **Fallow (fallow-rs/fallow, 1.7K stars, MIT, Rust-native) is the ONLY codebase intelligence tool across TS/JS, Python, Go, Rust, Elixir that provides dead code + duplication + complexity + boundaries in one sub-second package.** Purpose-built for AI agents: MCP server, JSON `actions` array, `auto_fixable` flags. Beats knip 2-13x, beats jscpd 8-26x.
523
+
524
+ ### Seven Integration Points (P44a-P44g)
525
+ | Phase | Where | What |
526
+ |-------|-------|------|
527
+ | P44a | L3 tools | MCP tool registration. Agent calls fallow for real-time feedback |
528
+ | P44b | P15b sandbox | `fallow audit --changed-since main` scoped pre-verify |
529
+ | P44c | Phase 16 gate | `fallow audit --gate all` deterministic pass/warn/fail |
530
+ | P44d | L5 observability | Health score snapshots as Keep Rate proxy |
531
+ | P44e | P29 errors | Per-issue rule/severity/actions taxonomy mapping |
532
+ | P44f | L6 memory | Git-committed baselines in `.fallow-baselines/` |
533
+ | P44g | P42 automations | Cron-style weekly health sweeps + daily dead code |
534
+
535
+ ### Ecosystem Gap
536
+ **No ecosystem has a fallow-equivalent single-tool.** Python needs Vulture + Skylos + Ruff + Radon. Go needs golangci-lint + deadcode + gocyclo. Rust needs clippy + cargo-udeps + rust-code-analysis. Elixir needs dialyxir + credo.
537
+
538
+ ### Sources
539
+ [[fallow-rs-codebase-intelligence]]
540
+
541
+ ### Synthesis
542
+ [[Research: Fallow Codebase Intelligence Harness Integration]]
543
+
544
+ ### Plan Updated
545
+ [[harness-implementation-plan]] — New P44 phases, Fallow Validation section, New Tools table.
546
+
547
+ ---
548
+
549
+ ## TypeScript Execution Layer Research (2026-05-01)
550
+
551
+ ### Key Finding
552
+ **Three independent systems converge on the same architecture: replace flat tool calling with a typed TypeScript API + sandboxed runtime.** Apple CodeAct (ICML 2024: +20% success rate, -30% interaction turns), Cloudflare Code Mode (production: 3-4x context reduction), Executor (open-source: 1.3K stars, local-first TS runtime). Core insight: LLMs have seen millions of lines of code in pretraining but only contrived tool-calling examples — code is a more natural interface.
553
+
554
+ ### Three Systems Converging
555
+ | System | Type | Key Metric | Sandbox |
556
+ |--------|------|-----------|---------|
557
+ | **CodeAct** (Apple, ICML 2024) | Academic paper | +20% multi-tool success | Python interpreter |
558
+ | **Cloudflare Code Mode** (2025) | Production SDK | 3-4x context reduction | V8 Worker isolates |
559
+ | **Executor** (RhysSullivan, 2026) | Open-source | 1.3K stars | Local Node.js |
560
+
561
+ ### New Harness Phase: P43
562
+ **TypeScript Execution Layer** — single `write_ts` tool replaces 8-15 individual L3 tools. All tools (read, bash, edit, grep, find, ck_search, ctx_execute) exposed as typed TS API via auto-generated type defs. Agent writes TypeScript code; runtime executes in sandboxed Node.js VM or Deno subprocess. Tool calls dispatch via typed RPC back to harness. Permission subsystem (P35) gates all tool calls. Extends P14 (Think-in-Code) from data analysis to full tool orchestration.
563
+
564
+ ### New First Principle
565
+ **FP #19**: Code is a better tool-calling interface than JSON. LLMs have seen millions of lines of code in pretraining but only contrived tool-calling examples. A single "write TypeScript" tool + sandboxed runtime achieves 3-4x context reduction and ~20% higher success rate on multi-tool tasks.
566
+
567
+ ### What We Do NOT Adopt
568
+ - Cloudflare Workers dependency (our sandbox: local Node.js VM or Deno)
569
+ - Python interpreter / CodeAct (our stack is TypeScript)
570
+ - Web UI for tool config / Executor (our harness is CLI-only)
571
+
572
+ ### Sources (4)
573
+ [[codeact-apple-2024]], [[cloudflare-codemode]], [[executor-rhyssullivan]], [[colinmcnamara-context-optimization-codemode]]
574
+
575
+ ### Synthesis
576
+ [[Research: TypeScript Execution Layer for Agent Tool Calling]]
577
+
578
+ ### Plan Updated
579
+ [[harness-implementation-plan]] — New FP #19, P43 phase, TS Execution Layer Validation section, updated savings.
580
+
581
+ ---
582
+
583
+ ## Gemini CLI SOTA + Harness Integration (2026-05-01)
584
+
585
+ ### Key Finding
586
+ **Gemini CLI (103k GitHub stars, 6,005 commits, 40+ weekly releases) introduced 15 SOTA harness primitives — most already independently validated by other agents (Codex, Claude Code, Cursor, Antigravity), but Gemini CLI provides the most complete refactoring-oriented harness.** Seven integration priorities derived from first principles (not feature-copying).
587
+
588
+ ### 15 SOTA Innovations Cataloged
589
+ 1. **Agent Skills** (v0.23+): Progressive disclosure with activation mechanism
590
+ 2. **Plan Mode** (v0.29+): Structured decomposition with research subagents
591
+ 3. **Codebase Investigator** (v0.12+): JIT context discovery subagent
592
+ 4. **Context Compression** (v0.38+): Advanced conversation history distillation
593
+ 5. **Chapters Narrative** (v0.38+): Session grouping by intent (novel concept)
594
+ 6. **Policy Engine** (v0.18+): Pre-execution tool gates, persistent approvals
595
+ 7. **Subagents + Remote** (v0.32+): A2A protocol, generalist router
596
+ 8. **Event-Driven Hooks** (v0.27+): MessageBus architecture, queued confirmations
597
+ 9. **Four-Tier Memory** (v0.39+): Prompt-driven, /memory inbox
598
+ 10. **Multi-Registry** (v0.36+): Extensions, skills, MCP all registries
599
+ 11. **Browser Agent** (v0.31+): CDP access, persistent sessions
600
+ 12. **Model Routing** (v0.12+): Auto-select Flash vs Pro
601
+ 13. **Sandboxing Stack** (v0.34+): Docker, gVisor, LXC, Seatbelt
602
+ 14. **Git Worktrees** (v0.36+): Isolated parallel sessions
603
+ 15. **Extensions** (v0.8+): 20+ partners, A2A, SDK
604
+
605
+ ### Seven Integration Priorities (from First Principles)
606
+ | ID | Priority | Rationale |
+ |----|----------|-----------|
+ | P-F1 | Pre-Execution Policy Gates | Mechanical enforcement over documentation (FP #3) |
607
+ | P-F2 | Skills Activation Mechanism | Progressive disclosure prevents context rot (FP #9) |
608
+ | P-F3 | Research Subagents for L2 | Ask what capability is missing (FP #5) |
609
+ | P-F4 | Event-Driven Hooks Middleware | Steering loop after every action (FP #10) |
610
+ | P-F5 | Git Worktree Sessions | Give the agent isolated space (FP #6) |
611
+ | P-F6 | Chapters Narrative for Sessions | A map not a manual (FP #7) |
612
+ | P-F7 | Browser Agent for Visual Verification | Give the agent eyes (FP #6) |
613
+
614
+ ### First-Principles Synthesis
615
+ **12 first principles synthesized** from Fowler (Feedforward+Feedback, Keep Quality Left), OpenAI (Visibility, Capability-Gap, Enforcement, Eyes, Map), LangChain (Progressive Disclosure, Model-Harness Independence, Filesystem as Universal Primitive), Augment (PEV Loop). See [[harness-engineering-first-principles]].
616
+
617
+ ### Benchmark Context
618
+ Render benchmark (Aug 2025): Gemini CLI 6.8/10. **Context: 9/10 (best).** Excels at editing existing codebases, weak at greenfield. Validates our grounding-heavy approach.
619
+
620
+ ### Source Pages (8)
621
+ [[Source: Google Gemini CLI Architecture Docs]], [[Source: Google Blog - Gemini CLI Announcement]], [[Source: Render AI Coding Agents Benchmark 2025]], [[Source: Martin Fowler - Harness Engineering]], [[Source: LangChain - Anatomy of Agent Harness]], [[Source: OpenAI Harness Engineering Five Principles]], [[Source: Augment - Harness Engineering for AI Coding Agents]], [[Source: Gemini CLI Changelogs]]
622
+
623
+ ### Concept Pages (4)
624
+ [[harness-engineering-first-principles]], [[agent-skills-pattern]], [[policy-engine-pattern]], [[gemini-cli-architecture]]
625
+
626
+ ### Synthesis
627
+ [[Research: Gemini CLI SOTA Harness Integration]] — Full synthesis with 15 innovations, 7 integration priorities, gap analysis, contradictions, open questions.
628
+
629
+ ---
630
+
631
+ ## Codex Open-Source Architecture Research (2026-05-01)
632
+
633
+ ### Key Finding
634
+ **Codex (79.2K GitHub stars, open-source Apache 2.0, Rust 96.3%) independently validated 7 of our planned features and revealed 5 critical gaps.** Codex is uniquely valuable because its architecture is transparent (not reverse-engineered). Three novel architectural patterns challenge our first principles.
635
+
636
+ ### Seven Validations
637
+ Model-adaptive (per-agent model selection), skills system (agentskills.io standard), lifecycle hooks (6 events, JSON I/O), subagent specialization (parallel dispatch + summary returns), pre-verification isolation (sandbox tiers), persistent memory (Memories + Chronicle), subagent worktree isolation (git worktrees).
638
+
639
+ ### Five Gaps → New Phases
640
+ | Phase | Name | Description |
+ |-------|------|-------------|
+ | P38 | OS-Level Sandbox Enforcement | bubblewrap/Seatbelt integration |
641
+ | P39 | Harness as MCP Server | Expose pipeline stages as MCP tools |
642
+ | P40 | Skills Ecosystem Tooling | `$skill-creator`, `$skill-installer`, agentskills.io |
643
+ | P41 | Implicit Memory Capture | Chronicle-style screen-context capture |
644
+ | P42 | Scheduled Agent Automations | cron-style recurring harness runs |
645
+
646
+ ### Three Novel Patterns
647
+ 1. **Multi-surface agent**: Single logic runs CLI+IDE+App+Web via App Server
648
+ 2. **Rust-native as first-principles**: Systems language for zero-dependency install + OS sandbox
649
+ 3. **Bidirectional MCP**: Codex IS an MCP server — agents can use Codex as a tool
650
+
651
+ ### New First Principles
652
+ - FP #16: Sandbox = foundation, permissions = policy (not the reverse). OS-level enforcement.
653
+ - FP #17: Agent should be composable — consumer AND provider of tools (MCP server).
654
+ - FP #18: Implicit memory complements explicit memory (Chronicle + wiki).
655
+
656
+ ### Source
657
+ [[codex-open-source-agent-2026]] — GitHub repo + official docs
658
+
659
+ ### Synthesis
660
+ [[Research: Codex State-of-the-Art Harness Improvements]]
661
+
662
+ ---
663
+
664
+ ## Claude Code Architecture Research (2026-05-01)
665
+
666
+ ### Key Finding
667
+ **Claude Code (510K-line TypeScript, 82K+ GitHub stars) is the most sophisticated production agent harness analyzed.** Six gaps → phases P33-P37, four new first principles, three independent validations.
668
+
669
+ ### Six Gaps → New Phases
670
+ | Phase | Capability |
+ |-------|------------|
+ | P33 | Lifecycle Hooks (30+ events, 100% compliance) |
+ | P34 | Structured Compaction (~85% reduction) |
+ | P35 | Permission Subsystem (7 modes) |
+ | P36 | Session Storage + Checkpoints |
+ | P37 | CLAUDE.md Entrypoint (96% compliance) |
671
+
672
+ ### Three Validations
673
+ FP #1 (harness>model), model-adaptive harness, skills system.
674
+
675
+ ### Design Tensions
676
+ Embeddings (P13) vs Agentic Search. Pipeline vs Loop.
677
+
678
+ ### Sources (4)
679
+ [[claude-code-architecture-vila-lab-2026]], [[claude-code-architecture-qubytes-2026]], [[claude-code-architecture-karaxai-2026]], [[claude-code-security-architecture-penligent-2026]]
680
+
681
+ ### Synthesis
682
+ [[Research: Claude Code State-of-the-Art Harness Improvements]]
683
+
684
+ ---
685
+
686
+ ## Google Antigravity Harness Research (2026-05-01)
687
+
690
+ ### Key Finding
691
+ **Google Antigravity's agent-first IDE independently validated 7 of our planned features and revealed 3 critical gaps.** Antigravity (launched Nov 2025, Windsurf $2.4B acq) is the first IDE built from ground up as a control plane for autonomous coding agents — not an AI plugin on an old editor.
692
+
693
+ ### SOTA Innovations Identified
694
+
695
+ 1. **Agent-First Dual-View Architecture**: Editor View + Manager View (mission control for multi-agent orchestration). Human shifts from coder to architect.
696
+ 2. **1M Token Context Window**: Ingests entire repos into active memory. No RAG needed. But expensive ($249.99/mo Ultra).
697
+ 3. **Browser Subagent**: Headless Chromium driver. Visual verification via screenshot pixel analysis. KILLER feature for UI work.
698
+ 4. **Artifact System**: Human-reviewable deliverables (screenshots, recordings, plans) replace raw tool logs. Google Docs-style async commenting.
699
+ 5. **Cross-Project Learning KB**: Agents save successful strategies across projects. Self-improvement as core primitive.
700
+ 6. **SKILL.md Progressive Disclosure**: Same pattern as our `.pi/skills/` system. Community ecosystem ported from Claude Code.
701
+ 7. **Four Design Tenets**: Trust (artifacts), Autonomy (multi-surface agent control), Feedback (async artifact comments), Self-Improvement (learning KB).
702
+
703
+ ### What Antigravity Validates from Our Plan
704
+ - Model-adaptive harness (multi-model support matching task strengths)
705
+ - Pre-verification isolation P15b (browser subagent visual verification)
706
+ - Subagent specialization P25 (Manager View multi-agent orchestration)
707
+ - Self-evolving harness F1 (cross-project learning KB)
708
+ - Skills system F0 (identical progressive disclosure pattern)
709
+ - Adversarial verification L4 (complementary: artifacts prove right, critic proves wrong)
710
+
711
+ ### New Phases Added
712
+ | Phase | Name | Description |
+ |-------|------|-------------|
+ | P30 | Browser Subagent | Headless browser for visual UI verification |
713
+ | P31 | Artifact Generation Layer | Human-reviewable deliverables after L4 verification |
714
+ | P32 | Cross-Project Learning KB | Multi-project knowledge transfer for agents |
715
+
716
+ ### Deliberate Non-Adoptions
717
+ - 1M token context window (too expensive for CLI harness. Selective context is OUR advantage)
718
+ - Full IDE integration (we are CLI-level harness)
719
+ - Google Cloud lock-in (we stay platform-agnostic)
720
+ - $249.99/mo pricing (our token budget optimization wins)
721
+
722
+ ### Sources
723
+ [[google-antigravity-official-blog]], [[google-antigravity-wikipedia]], [[cursor-vs-antigravity-2026]]
724
+
725
+ ### Synthesis
726
+ [[Research: Google Antigravity Harness Integration]] — Full synthesis with first-principles rethinking, gap analysis, contradictions.
727
+
728
+ ### Plan Updated
729
+ [[harness-implementation-plan]] — New First Principle #11, Antigravity Validation section, P30-P32 phases, updated token budget.
730
+
731
+ ---
732
+
733
+ ## cursor.sh Harness Innovations (2026-05-01)
734
+
735
+ ### Key Finding
736
+ **Cursor's production harness ($1B ARR, 400M+ daily requests) independently validated 5 of our planned features** (model-adaptive, dynamic context, P27 context anxiety, F1 self-evolving, P10 fuzzy edits) and revealed **4 critical gaps** now incorporated: pre-verification isolation (P15b), positive loops (P28), error classification (P29), subagent specialization (P25 evolved).
737
+
738
+ ### Validations (Cursor confirmed our designs before we built them)
739
+ - Model-adaptive harness: Cursor provisions different tool formats per model (patches vs string replace)
740
+ - Dynamic context: Cursor removed static pre-loaded context in favor of agent-driven context discovery
741
+ - Context anxiety: Cursor independently discovered models refusing work as context fills (validates our P27)
742
+ - Self-evolving harness: Cursor's 90-min RL loop on user accept/reject data (validates our F1)
743
+ - Edit quality bottleneck: Cursor's "Diff Problem" is their hardest engineering challenge (validates P10)
744
+
745
+ ### New Phases Added
746
+ | Phase | Name | Description |
+ |-------|------|-------------|
+ | P15b | Pre-Verification Isolation Sandbox | Shadow Workspace pattern: validate in isolated temp workspace before showing results |
747
+ | P28 | Positive Agent Loop Hooks | Counterpart to drift monitor: keep agent running until DONE, not just stop when stuck |
748
+ | P29 | Tool Error Classification | Per-tool per-model error types + anomaly detection baselines (enables self-healing) |
749
+ | P25 | Subagent Specialization Router | Evolved from cost router: dispatch by task type (plan/edit/debug), fresh context per subagent |
750
+ | P21 | Keep Rate + LLM-as-Judge | Extended L5 to track code persistence over time + semantic satisfaction signals |
751
+
752
+ ### First Principles from Cursor
753
+ 1. **Pre-verification > post-verification.** Validate before user sees failure. Shadow Workspace pattern.
754
+ 2. **Keep Rate > benchmark scores.** Fraction of agent code still in codebase after time intervals is ultimate metric.
755
+ 3. **Error classification enables self-healing.** Can't fix what you can't categorize. Cursor classifies every tool error.
756
+ 4. **Positive loops as important as negative loops.** Hooks that keep agent running are counterpart to drift detection.
757
+ 5. **Subagent specialization > cost routing.** Dispatch by capability, not just price.
758
+ 6. **Context anxiety is real and cross-model.** Prepare proactively.
759
+ 7. **Architectural control matters more than model access.** Our .pi/ tool interception is our "fork."
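Principle 2 describes Keep Rate as the fraction of agent-written code still present after some interval. One simple way to approximate it is sketched below. The exact-line matching rule and the `keepRate` function are assumptions made for illustration; the source does not say how Cursor actually measures it.

```typescript
// Fraction of agent-written lines still present in the current file contents.
// Matching is by exact line text, with multiplicity: each current line can
// "keep" at most one agent line. This is an assumed, deliberately crude metric.
function keepRate(agentLines: string[], currentLines: string[]): number {
  if (agentLines.length === 0) return 1; // nothing was written, so nothing was lost

  // Count how many times each line appears in the current file.
  const remaining = new Map<string, number>();
  for (const line of currentLines) {
    remaining.set(line, (remaining.get(line) ?? 0) + 1);
  }

  let kept = 0;
  for (const line of agentLines) {
    const count = remaining.get(line) ?? 0;
    if (count > 0) {
      kept += 1;
      remaining.set(line, count - 1);
    }
  }
  return kept / agentLines.length;
}
```

Sampling this at several intervals (for example after a day and after a week) would give the persistence-over-time signal the principle describes. A production version would likely need to tolerate whitespace changes and moved code.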
760
+
761
+ ### Sources (7)
762
+ [[cursor-shadow-workspace-2024]], [[cursor-agent-best-practices-2026]], [[cursor-harness-april-2026]], [[cursor-shipped-coding-agent-2026]], [[cursor-instant-apply-2024]], [[cursor-fork-29b-2025]], [[cursor-harness-innovations]]
763
+
764
+ ### Synthesis
765
+ [[Research: cursor.sh Harness Innovations]] — Full synthesis with gap analysis, contradictions, open questions.
766
+
767
+ ### Plan Updated
768
+ [[harness-implementation-plan]] — New First Principles #8-10, Cursor Validation section, new/extended phases, updated token budget.
769
+
770
+ ---
771
+
772
+ ## Model-Specific Prompting Guides — Harness Redesign (2026-05-01)
773
+
774
+ ### Key Finding
775
+ **Every major model provider publishes OFFICIAL prompting guidance.** The current harness design ("write once for strictest model, relax for forgiving") is WRONG. Each provider specifies fundamentally DIFFERENT prompting conventions — not different strictness levels.
776
+
777
+ ### Critical Contradictions Found
778
+ - **Constraint ordering**: OpenAI says FIRST. Google says LAST. Can't reconcile in one format.
779
+ - **Prompt density**: OpenAI (GPT-5.5+) says SHORTER, outcome-first. Harness generates verbose constraint-heavy prompts.
780
+ - **Structure format**: Anthropic mandates XML tags. Google uses plain text. OpenAI uses XML-like sections.
781
+ - **Temperature**: Google mandates 1.0. Others unspecified. Harness needs model-specific temp.
782
+ - **Verification**: Google = split-step (verify→generate). Anthropic = self-check at end. OpenAI = verification loop.
783
+
784
+ ### Redesign: Provider-Native Prompting
785
+ New module: **Prompt Renderer** (Phase P22b). Generates provider-native prompts from a semantic spec.
786
+
787
+ ```
788
+ Semantic Spec → Prompt Renderer → Provider-Native Prompt
789
+ ├── openai-renderer (XML-like, constraints-first, preambles)
790
+ ├── anthropic-renderer (XML tags, long-content-top, role)
791
+ └── google-renderer (plain text, constraints-LAST, grounding)
792
+ ```
793
+
794
+ ### Pages Created/Updated
795
+ - Created: [[openai-prompt-guidance]], [[anthropic-prompt-best-practices]], [[gemini-3-prompting-guide]] (sources)
796
+ - Created: [[provider-native-prompting]] (concept)
797
+ - Created: [[Research: Model-Specific Prompting Guides]] (synthesis)
798
+ - Rewritten: [[model-adaptive-harness]] (v2 redesign — retired old principle)
799
+ - Rewritten: [[harness-configuration-layers]] (added Gemini, provider-native dimensions)
800
+ - Updated: [[index]], [[log]], [[hot]]
801
+
802
+ ### Provider Profiles (Summary)
803
+ | Provider | Structure | Constraint Order | Verification | Thinking |
804
+ |----------|-----------|-----------------|--------------|----------|
805
+ | OpenAI | XML-like sections | FIRST | Pre-flight/post-flight loop | reasoning_effort |
806
+ | Anthropic | XML tags | Flexible (top) | Self-check at end | effort + adaptive |
807
+ | Google | Plain text | LAST | Split-step verify→generate | thinking level |
808
+
809
+ ## Augment Code Context Engine Research (2026-04-30)
810
+
811
+ ### Key Findings
812
+ - Context Engine: semantic codebase indexing (1M+ files), real-time knowledge graph, not grep/keyword search.
813
+ - #1 SWE-bench Pro (51.80%) — same model (Opus 4.5) beats Cursor by 1.59%, Claude Code by 2.05%.
814
+ - #1 open-source SWE-bench Verified agent (65.4%) — dual-model: Claude Sonnet 3.7 + OpenAI o1 ensembler.
815
+ - Prompt Enhancer: auto-enriches queries with codebase context before LLM sees them.
816
+ - Context as MCP: launched Feb 2026 — 30-80% improvement when used as context provider for other agents.
817
+ - "Contractor vs Employee" model: context is the bottleneck, not intelligence.
818
+
819
+ ### Integration Plan (6 Modules)
820
+ 1. **Semantic Codebase Indexer** — embeddings via sentence-transformers, LanceDB storage, tree-sitter chunking, watchdog sync.
821
+ 2. **Context Retrieval Engine** — hybrid BM25 + semantic search, multi-source (code + wiki + git + knowledge).
822
+ 3. **Prompt Enhancer** — pre-process queries, inject context, detect reuse opportunities.
823
+ 4. **MCP Context Server** — expose `query_codebase` tool, read-only.
824
+ 5. **Dual-Model Agent Loop** — primary model (Claude) for iteration + ensembler (GPT-5/o1) for selection.
825
+ 6. **Multi-Source Context Aggregator** — unify lean-ctx + semantic index + wiki + ctx_knowledge + git history.
826
+
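For module 2, the notes do not say how BM25 and semantic rankings would be merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the planned method. The function name, the `k=60` constant, and the file names are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) pass and an embedding pass.
bm25_hits = ["auth.py", "login.py", "utils.py"]
semantic_hits = ["login.py", "session.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)  # ['login.py', 'auth.py', 'session.py', 'utils.py']
```

RRF only needs ranks, not comparable scores, which makes it convenient when one source produces BM25 scores and the other cosine similarities. A weighted score blend would be an equally valid alternative.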
827
+ ### Pages Created (15)
828
+ Sources: [[Augment Context Engine Official]], [[Augment SWE-bench Agent GitHub]], [[Augment SWE-bench Pro Blog]], [[Augment Code WorkOS ERC 2025]], [[Augment Code Codacy AI Giants]], [[Augment Code MCP SiliconAngle]], [[Auggie Context MCP Server]]
829
+ Concepts: [[Context Engine (AI Coding)]], [[Semantic Codebase Indexing]], [[Dual-Model Agent Architecture]], [[Prompt Enhancement]], [[Majority Vote Ensembling]], [[Contractor vs Employee AI Model]]
830
+ Entity: [[Augment Code]]
831
+ Synthesis: [[Research: Augment Code Context Engine]]
832
+
833
+ ### Open Questions → Q2 and Q3 Resolved, Q1 Still Open (2026-04-30 follow-up research)
834
+ - **Q1: Augment's embedding model & vector DB** — Still undisclosed. Inferred: likely custom variant of Voyage-code-3 / BGE-code-v1 / SFR-Embedding-Code fine-tuned on proprietary corpus. Vector DB candidates: Pinecone serverless, Weaviate, or custom sharded FAISS. See [[coir-code-retrieval-benchmark]] for top code embedding models.
835
+ - **Q2: Chunking strategy & compression** — Resolved. State of the art is AST-aware chunking (cAST paper, June 2025) + contextualized text prepending. Chunking matters MORE than embedding model (Vectara NAACL 2025). Augment almost certainly uses this approach. See [[cast-code-chunking-paper]], [[AST-Aware Code Chunking]].
836
+ - **Q3: MiniLM-L6-v2 vs larger models** — Resolved. MiniLM-L6-v2 is 5-8% less accurate than larger models (78.1% vs 86.2% top-5 on general text, gap wider for code). But gap can be partially closed by AST-aware chunking + contextualized text + hybrid search. Start with MiniLM + good chunking, upgrade to BGE-code-v1 if CoIR benchmark shows insufficient quality. See [[embedding-models-benchmark-supermemory-2025]], [[code-chunk-library-supermemory]].
837
+
838
+ ### New Sources (5)
839
+ [[cast-code-chunking-paper]], [[vectara-chunking-vs-embedding-naacl2025]], [[coir-code-retrieval-benchmark]], [[code-chunk-library-supermemory]], [[embedding-models-benchmark-supermemory-2025]]
840
+
841
+ ### New Concepts (3)
842
+ [[AST-Aware Code Chunking]], [[Contextualized Text Embedding]], [[Late Chunking vs Early Chunking]]
843
+
844
+ ### Remaining Open Questions
845
+ - Real-time sync at scale (1M+ files) — implementation detail not available.
846
+ - Context compression algorithm — black box.
847
+ - Retrieval pipeline (candidate generation → re-ranking) — partial information only.
848
+ - Empirical CoIR benchmark validation needed for our setup.
849
+
850
+ ---
851
+
852
+ **46 open questions resolved across 6 themes — see [[resolved-context-pruning-inplace-vs-restart]] and 5 other resolution pages.**
853
+
854
+ ## Consensus-to-Wiki Filing Rule (2026-04-30)
855
+
856
+ **Mandatory**: Winning consensus from any agent debate MUST be filed in `wiki/consensus/`. All 4 verdict types file (CONSENSUS_REACHED, DEADLOCK, BUDGET_EXHAUSTED, TIMEOUT). Purpose: permanent agent alignment — future agents query before forming positions, harness blocks contradictions.
857
+
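A minimal sketch of the filing rule, assuming a date-plus-slug filename and a simple markdown body. Neither is specified above; only the `wiki/consensus/` directory and the four verdict names come from the rule. `DebateOutcome` and `consensus_record` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Verdict(Enum):
    # The four verdict types named in the filing rule; all of them get filed.
    CONSENSUS_REACHED = "CONSENSUS_REACHED"
    DEADLOCK = "DEADLOCK"
    BUDGET_EXHAUSTED = "BUDGET_EXHAUSTED"
    TIMEOUT = "TIMEOUT"


@dataclass(frozen=True)
class DebateOutcome:
    topic_slug: str   # hypothetical: short identifier for the debated topic
    verdict: Verdict
    summary: str
    date: str         # ISO date string, e.g. "2026-04-30"


def consensus_record(outcome):
    """Return (path, body) for the wiki page that records this outcome."""
    # Assumed naming scheme: <date>-<slug>.md under wiki/consensus/.
    path = PurePosixPath("wiki/consensus") / f"{outcome.date}-{outcome.topic_slug}.md"
    body = (
        f"# {outcome.topic_slug}\n\n"
        f"verdict: {outcome.verdict.value}\n"
        f"date: {outcome.date}\n\n"
        f"{outcome.summary}\n"
    )
    return path, body


path, body = consensus_record(
    DebateOutcome("retry-policy", Verdict.DEADLOCK, "No agreement on backoff.", "2026-04-30")
)
print(path)
```

Returning the path and body instead of writing directly keeps the sketch side-effect free; the caller decides when to touch the wiki.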
858
+ Updated: [[consensus-debate]], [[harness-implementation-plan]] (new First Principle #7, phase P19b, Consensus Filing Contract), [[adr-011]], [[selective-debate-routing]], [[harness]]. Created: [[consensus-records]] (directory + template).
859
+
860
+ ## Consolidation Summary (2026-04-30)
861
+
862
+ **Completed**: Full first-principles consolidation of ALL April 2026 research into the harness pipeline.
863
+
864
+ ### New Pages Created
865
+
866
+ - [[harness-control-frameworks]] — Unified view: H-Formalism + Feedforward-Feedback + Generator-Evaluator as orthogonal dimensions
867
+ - [[drift-detection-unified]] — Three complementary drift paradigms (L2.5 tool-call, L3 spec, L4 implementation) with clear boundaries
868
+ - [[think-in-code-enforcement]] — Formal L3 module for mandatory code-over-data paradigm with 3-layer enforcement architecture
869
+
870
+ ### Pages Significantly Updated
871
+
872
+ - [[harness-implementation-plan]] — Complete rewrite: 27 properly-numbered build phases (P0-P27 + F1-F3 future), single authoritative token budget (~15K-16K/subtask), all tools/research integrated, proper phase-to-layer mapping
873
+ - [[harness]] — Updated to reflect L2.5 drift monitor, cross-cutting tool enhancements, formal models, token budget
874
+ - [[index]] — Full reorganization: harness pipeline section, formal models, concepts grouped by domain (execution/drift, context/search, agent architecture), all 30+ concepts listed
875
+ - [[adr-011]] — Updated status to "accepted", integrated iMAD selective routing findings, revised token budget (always-debate ~13K → selective ~3K avg), pre-debate gating mechanism
876
+ - [[model-adaptive-harness]] — Restructured as canonical entry point with pointer to [[harness-configuration-layers]] for detailed tables. Added Gemini column. Removed redundancy.
877
+
878
+ ### Duplication/Redundancy Resolved
879
+
880
+ 1. **Layer numbering**: Old Phase 1-19 numbering replaced with P0-P27 mapped to layers. L2.5 properly placed. Phase 12 no longer collides with layer L3.
881
+ 2. **Drift detection**: Three overlapping concepts (L3 grounding, L2.5 meta-agent, L4 adversarial) unified in [[drift-detection-unified]] with clear "why three" justification.
882
+ 3. **Token budget**: Scattered across 4+ pages → single table in [[harness-implementation-plan]].
883
+ 4. **Model profiles**: [[model-adaptive-harness]] and [[harness-configuration-layers]] de-duplicated — former is entry point, latter is detailed tables.
884
+ 5. **Control frameworks**: H-formalism, feedforward-feedback, generator-evaluator unified in [[harness-control-frameworks]] as orthogonal dimensions.
885
+ 6. **ADR-011 staleness**: Updated from always-debate to selective routing per iMAD findings.
886
+ 7. **Index freshness**: All ~30 concept pages now listed. Previously missing ~7.
887
+
888
+ ### New Tools in Pipeline
889
+
890
+ | Tool | Phase | Status |
891
+ |------|-------|--------|
892
+ | ck (semantic code search) | P13 | Planned — MCP integration + 3-layer enforcement |
893
+ | Gitingest (bulk ingestion) | P15 | Planned — `/gitingest` skill |
894
+ | pi-messenger (stripped) | P17 | Planned — debate transport layer |
895
+ | pi-lean-ctx (native) | F0 | Done — [[2026-04-30-pi-lean-ctx-native]] |
896
+
897
+ ### Key New Paradigms
898
+
899
+ - **Think-in-Code enforcement** now has its own L3 module with 3-layer architecture (system prompt → interception → compression)
900
+ - **Selective debate routing** (iMAD) reduces consensus debate cost by ~92% on high-confidence tasks
901
+ - **Context drift as positive feedback loop** — each failed attempt accelerates failure. Meta-agent breaks the loop (detect → prune → restart).
902
+ - **Three quality concerns, three timings**: Syntax (inline, blocks progress), Semantics (L4, needs LLM), Style (Phase 16 final gate, deterministic)
903
+
904
+ ### Token Budget (Unified, Per Subtask)
905
+
906
+ - ~15,000-16,000 total pipeline overhead (down from ~17,500 baseline)
907
+ - Savings: AST truncation (30-50%), fuzzy edits (5-15%), inline validation (10-20%), Haiku router (15-25%), selective debate (92% on ~80% tasks), Think-in-Code (30-200× on analysis)
908
+
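The selective-debate figures can be sanity-checked with a short calculation. This reads "92% on ~80% tasks" as: roughly 80% of subtasks take the cheap path at 92% lower debate cost, and the rest pay the full always-debate cost of ~13K tokens quoted in this summary. That reading is an assumption, and the function name is illustrative.

```python
def average_debate_cost(always_cost, routed_share, reduction):
    """Expected debate tokens per subtask under selective routing.

    always_cost  -- tokens spent when a full debate runs
    routed_share -- fraction of subtasks that take the cheap path
    reduction    -- fractional saving on the cheap path
    """
    cheap = routed_share * always_cost * (1 - reduction)
    full = (1 - routed_share) * always_cost
    return cheap + full


avg = average_debate_cost(always_cost=13_000, routed_share=0.80, reduction=0.92)
print(round(avg))
```

Under these assumptions the average lands near 3,400 tokens, which is in line with the "~3K avg" figure recorded for the selective routing update.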
909
+ ### Active Architecture
910
+
911
+ ```
912
+ L1: Spec → L2: Plan → L2.5: Drift Monitor → L3: Execute (+TiC, +AST, +Fuzzy, +Inline, +ck, +Gitingest)
913
+ → L4: Adversarial (+selective debate) → Phase 16: Lint+Format → L5: Observe → L6: Memory → L7: Orch → L8: Query
914
+ ```
915
+
916
+ Formal models: H=(E,T,C,S,L,V) + Feedforward-Feedback + Generator-Evaluator. All mapped to our pipeline in [[harness-control-frameworks]].
917
+
918
+ ### GitHub Issues as Harness Spec Storage (2026-04-30)
919
+
920
+ Research: [[Research: GitHub Issues as Harness Spec Storage]] — GitHub Issues can serve as cloud-persistent spec storage with native sub-issues (parent-child hierarchies, April 2025) and issue dependencies (blocked-by/blocking).
921
+
922
+ Key architecture: Dual-tier — local `.pi/harness/specs/<id>.json` for speed + GitHub Issue for cross-session ledger. Not every micro-step creates an issue; only major state transitions (spec creation, plan creation, phase completion).
923
+
924
+ Toolchain: `gh issue create/edit/comment/list/view` for CRUD, `gh-sub-issue` extension (yahsan2, 110 stars, MIT) for parent-child management until `gh` CLI gains native support (cli/cli#10298). GitHub Projects v2 for optional kanban/roadmap visualization.
925
+
926
+ Labels encode machine-readable state: `harness-spec`, `layer-{n}`, `status:{status}`. Issue comments serve as immutable execution audit trail.
927
+
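As a rough illustration of the label scheme, the helper below builds the three labels and an argument list for `gh issue create`. The function names and the example status value are hypothetical. The `--title`, `--body`, and `--label` flags are standard `gh issue create` options. The sketch only constructs the command and does not run it.

```python
def spec_labels(layer, status):
    """Labels that encode machine-readable spec state."""
    return ["harness-spec", f"layer-{layer}", f"status:{status}"]


def gh_issue_create_argv(title, body, labels):
    """Build the argv for `gh issue create`, one --label flag per label."""
    argv = ["gh", "issue", "create", "--title", title, "--body", body]
    for label in labels:
        argv += ["--label", label]
    return argv


# Hypothetical spec at layer 1 with an assumed status value.
argv = gh_issue_create_argv(
    title="Spec: add retry policy",
    body="Intent, criteria, and done conditions go here.",
    labels=spec_labels(layer=1, status="planned"),
)
print(argv)
```

Passing the list to `subprocess.run(argv, check=True)` would execute it. Building the list explicitly avoids shell quoting problems with titles and bodies.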
928
+ **Fork safety**: `.pi/harness/specs/` is gitignored — never committed, never forked. `ultimate-pi harness init` bootstraps a fork's own issue tracker (enable issues, create labels, set `gh` repo context). Zero upstream spec leakage into forks.
929
+
930
+ Init flow: detect fork → enable issues → create labels → set repo → gitignore cache → ready. The flow is idempotent: re-runs are no-ops.
931
+
932
+ **Content-addressed spec identity**: Every HardenedSpec carries a `SHA256(intent + criteria + done)` fingerprint embedded in the issue body (`<!-- spec-fp: <hash> -->`). The harness resolves specs by searching for the hash across repos, not by brittle issue numbers. When a fork merges upstream, `ultimate-pi harness migrate` transfers specs via `gh issue transfer` and carries over the labels. The migration is idempotent. Estimated implementation effort: ~2-3 days.
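The fingerprint scheme could look something like the sketch below. The notes only give `SHA256(intent + criteria + done)` and the comment marker, so the field types and the canonical JSON serialization used here are assumptions, as are the function names.

```python
import hashlib
import json
import re

# Matches the marker format described above: <!-- spec-fp: <hash> -->
_FP_PATTERN = re.compile(r"<!-- spec-fp: ([0-9a-f]{64}) -->")


def spec_fingerprint(intent, criteria, done):
    """SHA-256 over the three spec fields.

    Assumption: fields are serialized as canonical JSON so that field
    boundaries are unambiguous; the source does not specify this.
    """
    payload = json.dumps(
        {"intent": intent, "criteria": criteria, "done": done},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def embed_fingerprint(issue_body, fingerprint):
    """Append the fingerprint marker to an issue body."""
    return f"{issue_body}\n\n<!-- spec-fp: {fingerprint} -->"


def extract_fingerprint(issue_body):
    """Return the embedded fingerprint, or None if the body has no marker."""
    match = _FP_PATTERN.search(issue_body)
    return match.group(1) if match else None


fp = spec_fingerprint(
    intent="Add retry policy",
    criteria=["retries at most 3 times", "uses exponential backoff"],
    done="integration test passes",
)
body = embed_fingerprint("Spec details here.", fp)
print(extract_fingerprint(body) == fp)  # True
```

Because the hash depends only on spec content, the same spec yields the same marker in any repo, so a lookup by hash keeps working after an issue is transferred and renumbered.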