ultimate-pi 0.1.2 → 0.1.4

Files changed (516)
  1. package/.agents/skills/ck-search/SKILL.md +99 -0
  2. package/.agents/skills/defuddle/SKILL.md +90 -0
  3. package/.agents/skills/find-skills/SKILL.md +142 -0
  4. package/.agents/skills/firecrawl/SKILL.md +150 -0
  5. package/.agents/skills/firecrawl/rules/install.md +82 -0
  6. package/.agents/skills/firecrawl/rules/security.md +26 -0
  7. package/.agents/skills/firecrawl-agent/SKILL.md +57 -0
  8. package/.agents/skills/firecrawl-build-interact/SKILL.md +67 -0
  9. package/.agents/skills/firecrawl-build-onboarding/SKILL.md +102 -0
  10. package/.agents/skills/firecrawl-build-onboarding/references/auth-flow.md +39 -0
  11. package/.agents/skills/firecrawl-build-onboarding/references/project-setup.md +20 -0
  12. package/.agents/skills/firecrawl-build-onboarding/references/sdk-installation.md +17 -0
  13. package/.agents/skills/firecrawl-build-scrape/SKILL.md +68 -0
  14. package/.agents/skills/firecrawl-build-search/SKILL.md +68 -0
  15. package/.agents/skills/firecrawl-crawl/SKILL.md +58 -0
  16. package/.agents/skills/firecrawl-download/SKILL.md +69 -0
  17. package/.agents/skills/firecrawl-interact/SKILL.md +83 -0
  18. package/.agents/skills/firecrawl-map/SKILL.md +50 -0
  19. package/.agents/skills/firecrawl-parse/SKILL.md +61 -0
  20. package/.agents/skills/firecrawl-scrape/SKILL.md +68 -0
  21. package/.agents/skills/firecrawl-search/SKILL.md +59 -0
  22. package/.agents/skills/obsidian-bases/SKILL.md +299 -0
  23. package/.agents/skills/obsidian-markdown/SKILL.md +237 -0
  24. package/.agents/skills/posthog-analyst/SKILL.md +306 -0
  25. package/.agents/skills/posthog-analyst/evals/evals.json +23 -0
  26. package/.agents/skills/wiki/SKILL.md +215 -0
  27. package/.agents/skills/wiki/references/css-snippets.md +122 -0
  28. package/.agents/skills/wiki/references/frontmatter.md +107 -0
  29. package/.agents/skills/wiki/references/git-setup.md +58 -0
  30. package/.agents/skills/wiki/references/mcp-setup.md +149 -0
  31. package/.agents/skills/wiki/references/modes.md +259 -0
  32. package/.agents/skills/wiki/references/plugins.md +96 -0
  33. package/.agents/skills/wiki/references/rest-api.md +124 -0
  34. package/.agents/skills/wiki-autoresearch/SKILL.md +211 -0
  35. package/.agents/skills/wiki-autoresearch/references/program.md +75 -0
  36. package/.agents/skills/wiki-fold/SKILL.md +204 -0
  37. package/.agents/skills/wiki-fold/references/fold-template.md +133 -0
  38. package/.agents/skills/wiki-ingest/SKILL.md +288 -0
  39. package/.agents/skills/wiki-lint/SKILL.md +183 -0
  40. package/.agents/skills/wiki-query/SKILL.md +176 -0
  41. package/.agents/skills/wiki-save/SKILL.md +128 -0
  42. package/.ckignore +41 -0
  43. package/.env.example +9 -0
  44. package/.github/workflows/lint.yml +33 -0
  45. package/.github/workflows/publish-github-packages.yml +35 -0
  46. package/.github/workflows/publish-npm.yml +1 -1
  47. package/.pi/SYSTEM.md +107 -40
  48. package/.pi/agents/pi-pi/agent-expert.md +205 -0
  49. package/.pi/agents/pi-pi/cli-expert.md +47 -0
  50. package/.pi/agents/pi-pi/config-expert.md +67 -0
  51. package/.pi/agents/pi-pi/ext-expert.md +53 -0
  52. package/.pi/agents/pi-pi/keybinding-expert.md +123 -0
  53. package/.pi/agents/pi-pi/pi-orchestrator.md +103 -0
  54. package/.pi/agents/pi-pi/prompt-expert.md +83 -0
  55. package/.pi/agents/pi-pi/skill-expert.md +52 -0
  56. package/.pi/agents/pi-pi/theme-expert.md +46 -0
  57. package/.pi/agents/pi-pi/tui-expert.md +100 -0
  58. package/.pi/agents/rethink.md +140 -0
  59. package/.pi/agents/wiki-ingest.md +67 -0
  60. package/.pi/agents/wiki-lint.md +75 -0
  61. package/.pi/auto-commit.json +20 -0
  62. package/.pi/extensions/banner.png +0 -0
  63. package/.pi/extensions/ck-enforce.ts +216 -0
  64. package/.pi/extensions/custom-footer.ts +308 -0
  65. package/.pi/extensions/custom-header.ts +116 -0
  66. package/.pi/extensions/dotenv-loader.ts +170 -0
  67. package/.pi/internal/cursor-sdk-transcript-parser.ts +59 -0
  68. package/.pi/model-router.json +95 -0
  69. package/.pi/npm/.gitignore +2 -0
  70. package/.pi/prompts/git-sync.md +124 -0
  71. package/.pi/prompts/harness-setup.md +509 -0
  72. package/.pi/prompts/save.md +16 -0
  73. package/.pi/prompts/wiki-autoresearch.md +19 -0
  74. package/.pi/prompts/wiki.md +23 -0
  75. package/.pi/providers/cursor-sdk-provider.test.mjs +476 -0
  76. package/.pi/providers/cursor-sdk-provider.ts +1085 -0
  77. package/.pi/settings.json +14 -4
  78. package/.pi/skills/agent-router/SKILL.md +174 -0
  79. package/.pi/sounds/alert/1-kaching-track.mp3 +0 -0
  80. package/.pi/sounds/error/1-ksi-wth-track.mp3 +0 -0
  81. package/.pi/sounds/error/2-smash-track.mp3 +0 -0
  82. package/.pi/sounds/error/3-buzzer-track.mp3 +0 -0
  83. package/.pi/sounds/notification/1-soft-notification-track.mp3 +0 -0
  84. package/.pi/sounds/project-sounds.json +25 -0
  85. package/.pi/sounds/reminder/1-soft-notification-track.mp3 +0 -0
  86. package/.pi/sounds/success/1-tada-track.mp3 +0 -0
  87. package/.pi/sounds/success/2-jobs-done-track.mp3 +0 -0
  88. package/.pi/sounds/success/3-yay-track.mp3 +0 -0
  89. package/CONTRIBUTING.md +116 -0
  90. package/README.md +32 -39
  91. package/biome.json +34 -0
  92. package/firecrawl/.env.template +58 -0
  93. package/firecrawl/README.md +49 -0
  94. package/firecrawl/docker-compose.yaml +201 -0
  95. package/firecrawl/searxng/searxng.env +3 -0
  96. package/firecrawl/searxng/settings.yml +85 -0
  97. package/lefthook.yml +8 -0
  98. package/package.json +55 -24
  99. package/vault/AGENTS.md +37 -0
  100. package/vault/wiki/_templates/comparison.md +39 -0
  101. package/vault/wiki/_templates/concept.md +40 -0
  102. package/vault/wiki/_templates/decision.md +21 -0
  103. package/vault/wiki/_templates/entity.md +32 -0
  104. package/vault/wiki/_templates/flow.md +14 -0
  105. package/vault/wiki/_templates/module.md +18 -0
  106. package/vault/wiki/_templates/question.md +31 -0
  107. package/vault/wiki/_templates/source.md +39 -0
  108. package/vault/wiki/concepts/AST-Aware Code Chunking.md +44 -0
  109. package/vault/wiki/concepts/Build-Time Prompt Compilation.md +107 -0
  110. package/vault/wiki/concepts/Context Engine (AI Coding).md +47 -0
  111. package/vault/wiki/concepts/Context-Aware System Reminders.md +61 -0
  112. package/vault/wiki/concepts/Contextualized Text Embedding.md +42 -0
  113. package/vault/wiki/concepts/Contractor vs Employee AI Model.md +55 -0
  114. package/vault/wiki/concepts/Dual-Model Agent Architecture.md +65 -0
  115. package/vault/wiki/concepts/Late Chunking vs Early Chunking.md +43 -0
  116. package/vault/wiki/concepts/Majority Vote Ensembling.md +68 -0
  117. package/vault/wiki/concepts/Meta-Harness.md +16 -0
  118. package/vault/wiki/concepts/Multi-Agent AI Coding Architecture.md +75 -0
  119. package/vault/wiki/concepts/Prompt Enhancement.md +90 -0
  120. package/vault/wiki/concepts/Prompt Renderer.md +89 -0
  121. package/vault/wiki/concepts/Semantic Codebase Indexing.md +67 -0
  122. package/vault/wiki/concepts/additive-config-hierarchy.md +16 -0
  123. package/vault/wiki/concepts/agent-artifacts-verifiable-deliverables.md +71 -0
  124. package/vault/wiki/concepts/agent-browser-browser-automation.md +99 -0
  125. package/vault/wiki/concepts/agent-codebase-interface.md +43 -0
  126. package/vault/wiki/concepts/agent-harness-architecture.md +67 -0
  127. package/vault/wiki/concepts/agent-loop-detection-patterns.md +133 -0
  128. package/vault/wiki/concepts/agent-search-enforcement.md +126 -0
  129. package/vault/wiki/concepts/agent-skills-ecosystem.md +74 -0
  130. package/vault/wiki/concepts/agent-skills-pattern.md +68 -0
  131. package/vault/wiki/concepts/agentic-harness-context-enforcement.md +91 -0
  132. package/vault/wiki/concepts/agentic-harness.md +34 -0
  133. package/vault/wiki/concepts/agentic-orchestration-pipeline.md +56 -0
  134. package/vault/wiki/concepts/agentic-search-no-embeddings.md +18 -0
  135. package/vault/wiki/concepts/anthropic-context-engineering.md +13 -0
  136. package/vault/wiki/concepts/antigravity-agent-first-architecture.md +61 -0
  137. package/vault/wiki/concepts/ast-compression.md +19 -0
  138. package/vault/wiki/concepts/ast-truncation.md +66 -0
  139. package/vault/wiki/concepts/barrel-files.md +37 -0
  140. package/vault/wiki/concepts/browser-harness-agent.md +41 -0
  141. package/vault/wiki/concepts/browser-subagent-visual-verification.md +82 -0
  142. package/vault/wiki/concepts/codebase-intelligence-ecosystem-comparison.md +192 -0
  143. package/vault/wiki/concepts/codebase-intelligence-harness-integration.md +161 -0
  144. package/vault/wiki/concepts/codebase-to-context-ingestion.md +46 -0
  145. package/vault/wiki/concepts/codex-harness-innovations.md +147 -0
  146. package/vault/wiki/concepts/consensus-debate-flow.md +17 -0
  147. package/vault/wiki/concepts/consensus-debate.md +206 -0
  148. package/vault/wiki/concepts/content-addressed-spec-identity.md +166 -0
  149. package/vault/wiki/concepts/context-anxiety.md +57 -0
  150. package/vault/wiki/concepts/context-compression-techniques.md +19 -0
  151. package/vault/wiki/concepts/context-continuity.md +22 -0
  152. package/vault/wiki/concepts/context-drift-in-agents.md +106 -0
  153. package/vault/wiki/concepts/context-engineering.md +62 -0
  154. package/vault/wiki/concepts/context-folding.md +67 -0
  155. package/vault/wiki/concepts/context-mode.md +38 -0
  156. package/vault/wiki/concepts/cursor-harness-innovations.md +107 -0
  157. package/vault/wiki/concepts/deterministic-session-compaction.md +79 -0
  158. package/vault/wiki/concepts/drift-detection-unified.md +296 -0
  159. package/vault/wiki/concepts/execution-feedback-loop.md +46 -0
  160. package/vault/wiki/concepts/feedforward-feedback-harness.md +60 -0
  161. package/vault/wiki/concepts/five-root-cause-metrics-sentrux.md +40 -0
  162. package/vault/wiki/concepts/fork-safe-spec-storage.md +89 -0
  163. package/vault/wiki/concepts/fts5-sandbox.md +19 -0
  164. package/vault/wiki/concepts/fuzzy-edit-matching.md +71 -0
  165. package/vault/wiki/concepts/gemini-cli-architecture.md +104 -0
  166. package/vault/wiki/concepts/generator-evaluator-architecture.md +64 -0
  167. package/vault/wiki/concepts/guardian-agent-pattern.md +67 -0
  168. package/vault/wiki/concepts/harness-configuration-layers.md +89 -0
  169. package/vault/wiki/concepts/harness-control-frameworks.md +155 -0
  170. package/vault/wiki/concepts/harness-engineering-first-principles.md +90 -0
  171. package/vault/wiki/concepts/harness-h-formalism.md +53 -0
  172. package/vault/wiki/concepts/hybrid-code-search.md +61 -0
  173. package/vault/wiki/concepts/inline-post-edit-validation.md +112 -0
  174. package/vault/wiki/concepts/legendary-engineering-patterns-harness.md +110 -0
  175. package/vault/wiki/concepts/lifecycle-hooks.md +94 -0
  176. package/vault/wiki/concepts/mcp-tool-routing.md +102 -0
  177. package/vault/wiki/concepts/memory-system-of-record-vs-ephemeral-cache.md +47 -0
  178. package/vault/wiki/concepts/meta-agent-context-pruning.md +151 -0
  179. package/vault/wiki/concepts/model-adaptive-harness.md +122 -0
  180. package/vault/wiki/concepts/model-routing-agents.md +101 -0
  181. package/vault/wiki/concepts/monorepo-architecture.md +45 -0
  182. package/vault/wiki/concepts/multi-agent-specialization.md +61 -0
  183. package/vault/wiki/concepts/permission-subsystem.md +16 -0
  184. package/vault/wiki/concepts/pi-messenger-analysis.md +243 -0
  185. package/vault/wiki/concepts/pi-vscode-extension-landscape.md +37 -0
  186. package/vault/wiki/concepts/policy-engine-pattern.md +78 -0
  187. package/vault/wiki/concepts/progressive-disclosure-agents.md +53 -0
  188. package/vault/wiki/concepts/progressive-skill-disclosure.md +17 -0
  189. package/vault/wiki/concepts/provider-native-prompting.md +203 -0
  190. package/vault/wiki/concepts/quality-signal-sentrux.md +37 -0
  191. package/vault/wiki/concepts/repo-map-ranking.md +42 -0
  192. package/vault/wiki/concepts/result-monad-error-handling.md +47 -0
  193. package/vault/wiki/concepts/safety-defense-in-depth.md +83 -0
  194. package/vault/wiki/concepts/sandbox-os-enforcement.md +18 -0
  195. package/vault/wiki/concepts/selective-debate-routing.md +70 -0
  196. package/vault/wiki/concepts/self-evolving-harness.md +60 -0
  197. package/vault/wiki/concepts/sentrux-mcp-integration.md +36 -0
  198. package/vault/wiki/concepts/sentrux-rules-engine.md +49 -0
  199. package/vault/wiki/concepts/shell-pattern-compression.md +24 -0
  200. package/vault/wiki/concepts/skill-first-architecture.md +166 -0
  201. package/vault/wiki/concepts/structured-compaction.md +78 -0
  202. package/vault/wiki/concepts/subagent-orchestration.md +17 -0
  203. package/vault/wiki/concepts/subagent-worktree-isolation.md +68 -0
  204. package/vault/wiki/concepts/superpowers-methodology.md +78 -0
  205. package/vault/wiki/concepts/think-in-code.md +73 -0
  206. package/vault/wiki/concepts/ts-execution-layer.md +100 -0
  207. package/vault/wiki/concepts/typescript-strict-mode.md +37 -0
  208. package/vault/wiki/concepts/vcc-conversation-compaction-for-pi.md +51 -0
  209. package/vault/wiki/concepts/verification-drift-detection.md +19 -0
  210. package/vault/wiki/consensus/consensus-records.md +58 -0
  211. package/vault/wiki/decisions/2026-04-30-pi-lean-ctx-native.md +122 -0
  212. package/vault/wiki/decisions/adr-008.md +40 -0
  213. package/vault/wiki/decisions/adr-009.md +46 -0
  214. package/vault/wiki/decisions/adr-010.md +55 -0
  215. package/vault/wiki/decisions/adr-011.md +165 -0
  216. package/vault/wiki/decisions/adr-012.md +102 -0
  217. package/vault/wiki/decisions/adr-013.md +59 -0
  218. package/vault/wiki/decisions/adr-014.md +73 -0
  219. package/vault/wiki/decisions/adr-015.md +81 -0
  220. package/vault/wiki/decisions/adr-016.md +91 -0
  221. package/vault/wiki/decisions/adr-017.md +79 -0
  222. package/vault/wiki/decisions/adr-018.md +100 -0
  223. package/vault/wiki/decisions/adr-019.md +75 -0
  224. package/vault/wiki/decisions/adr-020.md +106 -0
  225. package/vault/wiki/decisions/adr-021.md +86 -0
  226. package/vault/wiki/decisions/adr-022.md +113 -0
  227. package/vault/wiki/decisions/adr-023.md +113 -0
  228. package/vault/wiki/decisions/adr-024.md +73 -0
  229. package/vault/wiki/decisions/adr-025.md +130 -0
  230. package/vault/wiki/decisions/adr-026.md +56 -0
  231. package/vault/wiki/decisions/colocate-wiki.md +34 -0
  232. package/vault/wiki/entities/Anders Hejlsberg.md +29 -0
  233. package/vault/wiki/entities/Anthropic.md +17 -0
  234. package/vault/wiki/entities/Augment Code.md +49 -0
  235. package/vault/wiki/entities/Bjarne Stroustrup.md +26 -0
  236. package/vault/wiki/entities/Bolt.new (StackBlitz).md +39 -0
  237. package/vault/wiki/entities/Boris Cherny.md +11 -0
  238. package/vault/wiki/entities/Claude Code.md +19 -0
  239. package/vault/wiki/entities/Dennis Ritchie.md +26 -0
  240. package/vault/wiki/entities/Emergent Labs.md +32 -0
  241. package/vault/wiki/entities/Google Cloud.md +16 -0
  242. package/vault/wiki/entities/Guido van Rossum.md +28 -0
  243. package/vault/wiki/entities/Ken Thompson.md +28 -0
  244. package/vault/wiki/entities/Lee et al.md +16 -0
  245. package/vault/wiki/entities/Linus Torvalds.md +28 -0
  246. package/vault/wiki/entities/Lovable (company).md +40 -0
  247. package/vault/wiki/entities/Martin Fowler.md +16 -0
  248. package/vault/wiki/entities/Meng et al.md +16 -0
  249. package/vault/wiki/entities/OpenAI.md +16 -0
  250. package/vault/wiki/entities/Rocket.new.md +38 -0
  251. package/vault/wiki/entities/VILA-Lab.md +15 -0
  252. package/vault/wiki/entities/autodev-codebase.md +18 -0
  253. package/vault/wiki/entities/ck-tool.md +59 -0
  254. package/vault/wiki/entities/codesearch.md +18 -0
  255. package/vault/wiki/entities/disler-indydevdan.md +33 -0
  256. package/vault/wiki/entities/gsd-get-shit-done.md +56 -0
  257. package/vault/wiki/entities/javascript-runtimes.md +48 -0
  258. package/vault/wiki/entities/jesse-vincent.md +38 -0
  259. package/vault/wiki/entities/lean-ctx.md +32 -0
  260. package/vault/wiki/entities/opendev.md +41 -0
  261. package/vault/wiki/entities/ops-codegraph-tool.md +18 -0
  262. package/vault/wiki/entities/pi-coding-agent.md +53 -0
  263. package/vault/wiki/entities/sentrux.md +54 -0
  264. package/vault/wiki/entities/vgrep-tool.md +57 -0
  265. package/vault/wiki/entities/vitest.md +41 -0
  266. package/vault/wiki/flows/harness-wiki-pipeline.md +204 -0
  267. package/vault/wiki/hot.md +932 -0
  268. package/vault/wiki/index.md +437 -0
  269. package/vault/wiki/log.md +418 -0
  270. package/vault/wiki/meta/dashboard.md +30 -0
  271. package/vault/wiki/meta/lint-report-2026-04-30.md +86 -0
  272. package/vault/wiki/meta/lint-report-2026-05-02.md +251 -0
  273. package/vault/wiki/meta/overview.canvas +43 -0
  274. package/vault/wiki/modules/adversarial-verification.md +57 -0
  275. package/vault/wiki/modules/automated-observability.md +54 -0
  276. package/vault/wiki/modules/bench.md +20 -0
  277. package/vault/wiki/modules/extensions.md +23 -0
  278. package/vault/wiki/modules/grounding-checkpoints.md +62 -0
  279. package/vault/wiki/modules/harness-implementation-plan.md +345 -0
  280. package/vault/wiki/modules/harness-wiki-skill-mapping.md +135 -0
  281. package/vault/wiki/modules/harness.md +86 -0
  282. package/vault/wiki/modules/persistent-memory.md +85 -0
  283. package/vault/wiki/modules/schema-orchestration.md +68 -0
  284. package/vault/wiki/modules/skills.md +27 -0
  285. package/vault/wiki/modules/spec-hardening.md +58 -0
  286. package/vault/wiki/modules/structured-planning.md +53 -0
  287. package/vault/wiki/modules/think-in-code-enforcement.md +153 -0
  288. package/vault/wiki/modules/wiki-query-interface.md +64 -0
  289. package/vault/wiki/overview.md +51 -0
  290. package/vault/wiki/questions/Research-pi-vs-claude-code-agentic-orchestration-pipeline.md +87 -0
  291. package/vault/wiki/questions/Research-sentrux-dev.md +123 -0
  292. package/vault/wiki/questions/Research-superpowers-skill-for-agentic-coding-agents.md +164 -0
  293. package/vault/wiki/questions/Research: Augment Code Context Engine.md +244 -0
  294. package/vault/wiki/questions/Research: Automating Software Engineering - Lovable, Bolt, Emergent, Rocket.md +112 -0
  295. package/vault/wiki/questions/Research: Claude Code State-of-the-Art Harness Improvements.md +209 -0
  296. package/vault/wiki/questions/Research: Codex State-of-the-Art Harness Improvements.md +99 -0
  297. package/vault/wiki/questions/Research: Engineering Workflows of Legendary Programmers and AI Harness Mapping.md +107 -0
  298. package/vault/wiki/questions/Research: Fallow Codebase Intelligence Harness Integration.md +72 -0
  299. package/vault/wiki/questions/Research: Gemini CLI SOTA Harness Integration.md +166 -0
  300. package/vault/wiki/questions/Research: GitHub Issues as Harness Spec Storage.md +188 -0
  301. package/vault/wiki/questions/Research: Google Antigravity Harness Integration.md +120 -0
  302. package/vault/wiki/questions/Research: Meta-Agent Context Drift Detection.md +236 -0
  303. package/vault/wiki/questions/Research: Model-Adaptive Agent Harness Design.md +95 -0
  304. package/vault/wiki/questions/Research: Model-Specific Prompting Guides.md +165 -0
  305. package/vault/wiki/questions/Research: Prompt Renderer for Multi-Model Agent Harness.md +216 -0
  306. package/vault/wiki/questions/Research: Skill-First Harness Architecture.md +91 -0
  307. package/vault/wiki/questions/Research: TypeScript Best Practices and Codebase Structure.md +88 -0
  308. package/vault/wiki/questions/Research: TypeScript Execution Layer for Agent Tool Calling.md +81 -0
  309. package/vault/wiki/questions/Research: claude-mem over Obsidian for Harness Layer.md +71 -0
  310. package/vault/wiki/questions/Research: claude-mem over obsidian wiki as the knowledge base for our agentic harness pipeline. think from first principles. does this replace or complement our current setup? no hard feelings about previous decisions. gimme accurate points.md +80 -0
  311. package/vault/wiki/questions/Research: context-mode vs lean-ctx.md +72 -0
  312. package/vault/wiki/questions/Research: cursor.sh Harness Innovations.md +92 -0
  313. package/vault/wiki/questions/Research: executor.sh Harness Integration.md +170 -0
  314. package/vault/wiki/questions/Research: how GSD fits into our coding harness setup.md +97 -0
  315. package/vault/wiki/questions/Research: how claude-mem fits into our workflow. and whether it should replace obsidian in the codebase. no hard feelings about previous actions, rethink from first principles always.md +80 -0
  316. package/vault/wiki/questions/Research: pi-vcc.md +113 -0
  317. package/vault/wiki/questions/Research: semantic code search tools.md +69 -0
  318. package/vault/wiki/questions/Research: vcc extension for pi coding agent.md +73 -0
  319. package/vault/wiki/questions/how-to-enable-semantic-code-search-now.md +111 -0
  320. package/vault/wiki/questions/mvp-implementation-blueprint.md +552 -0
  321. package/vault/wiki/questions/research-agent-first-codebase-exploration.md +199 -0
  322. package/vault/wiki/questions/research-agentic-coding-harness-latest-papers.md +142 -0
  323. package/vault/wiki/questions/research-gitingest-gitreverse-integration.md +100 -0
  324. package/vault/wiki/questions/research-wozcode-token-reduction.md +67 -0
  325. package/vault/wiki/questions/resolved-context-pruning-inplace-vs-restart.md +95 -0
  326. package/vault/wiki/questions/resolved-context-window-economics.md +167 -0
  327. package/vault/wiki/questions/resolved-imad-debate-gating-transfer.md +126 -0
  328. package/vault/wiki/questions/resolved-mcp-tool-preference.md +112 -0
  329. package/vault/wiki/questions/resolved-small-model-meta-agents.md +107 -0
  330. package/vault/wiki/questions/resolved-treesitter-dynamic-languages.md +95 -0
  331. package/vault/wiki/sources/Auggie Context MCP Server.md +63 -0
  332. package/vault/wiki/sources/Augment Code Codacy AI Giants.md +61 -0
  333. package/vault/wiki/sources/Augment Code MCP SiliconAngle.md +49 -0
  334. package/vault/wiki/sources/Augment Code WorkOS ERC 2025.md +55 -0
  335. package/vault/wiki/sources/Augment Context Engine Official.md +71 -0
  336. package/vault/wiki/sources/Augment SWE-bench Agent GitHub.md +74 -0
  337. package/vault/wiki/sources/Augment SWE-bench Pro Blog.md +58 -0
  338. package/vault/wiki/sources/Source: AgentBus Jinja2 Prompt Pipelines.md +75 -0
  339. package/vault/wiki/sources/Source: Arxiv — Don't Break the Cache.md +85 -0
  340. package/vault/wiki/sources/Source: Augment - Harness Engineering for AI Coding Agents.md +58 -0
  341. package/vault/wiki/sources/Source: Blake Crosley Agent Architecture Guide.md +100 -0
  342. package/vault/wiki/sources/Source: Bolt.new Architecture & Case Study.md +75 -0
  343. package/vault/wiki/sources/Source: Build-Time Prompt Compilation Architecture.md +107 -0
  344. package/vault/wiki/sources/Source: Claude API Agent Skills Overview.md +70 -0
  345. package/vault/wiki/sources/Source: Gemini CLI Changelogs.md +88 -0
  346. package/vault/wiki/sources/Source: Google Blog - Gemini CLI Announcement.md +57 -0
  347. package/vault/wiki/sources/Source: Google Gemini CLI Architecture Docs.md +53 -0
  348. package/vault/wiki/sources/Source: LangChain - Anatomy of Agent Harness.md +65 -0
  349. package/vault/wiki/sources/Source: Lovable Architecture & Clone Analysis.md +83 -0
  350. package/vault/wiki/sources/Source: Martin Fowler - Harness Engineering.md +70 -0
  351. package/vault/wiki/sources/Source: OpenAI Harness Engineering Five Principles.md +58 -0
  352. package/vault/wiki/sources/Source: OpenAI Harness Engineering — 0 Lines of Human Code.md +101 -0
  353. package/vault/wiki/sources/Source: OpenDev — Building AI Coding Agents for the Terminal.md +100 -0
  354. package/vault/wiki/sources/Source: Render AI Coding Agents Benchmark 2025.md +53 -0
  355. package/vault/wiki/sources/Source: Rocket.new — Vibe Solutioning Platform.md +70 -0
  356. package/vault/wiki/sources/Source: SwirlAI Agent Skills Progressive Disclosure.md +71 -0
  357. package/vault/wiki/sources/Source: TianPan Prompt Caching Architecture.md +89 -0
  358. package/vault/wiki/sources/Source: Vercel Labs agent-browser.md +155 -0
  359. package/vault/wiki/sources/Source: browser-harness CDP Harness.md +126 -0
  360. package/vault/wiki/sources/agent-drift-academic-paper.md +79 -0
  361. package/vault/wiki/sources/aider-repomap-tree-sitter.md +42 -0
  362. package/vault/wiki/sources/anthropic-compaction-api.md +58 -0
  363. package/vault/wiki/sources/anthropic-effective-harnesses.md +42 -0
  364. package/vault/wiki/sources/anthropic-prompt-best-practices.md +100 -0
  365. package/vault/wiki/sources/anthropic2026-harness-design.md +63 -0
  366. package/vault/wiki/sources/barrel-files-tkdodo.md +38 -0
  367. package/vault/wiki/sources/birth-of-unix-kernighan-interview.md +57 -0
  368. package/vault/wiki/sources/bockeler2026-harness-engineering.md +69 -0
  369. package/vault/wiki/sources/cast-code-chunking-paper.md +50 -0
  370. package/vault/wiki/sources/ck-semantic-search.md +78 -0
  371. package/vault/wiki/sources/claude-code-architecture-karaxai-2026.md +71 -0
  372. package/vault/wiki/sources/claude-code-architecture-qubytes-2026.md +50 -0
  373. package/vault/wiki/sources/claude-code-architecture-vila-lab-2026.md +64 -0
  374. package/vault/wiki/sources/claude-code-security-architecture-penligent-2026.md +70 -0
  375. package/vault/wiki/sources/claude-context-editing-docs.md +13 -0
  376. package/vault/wiki/sources/cloudflare-codemode.md +63 -0
  377. package/vault/wiki/sources/code-chunk-library-supermemory.md +63 -0
  378. package/vault/wiki/sources/codeact-apple-2024.md +62 -0
  379. package/vault/wiki/sources/codex-dsc-rfc-8573.md +41 -0
  380. package/vault/wiki/sources/codex-open-source-agent-2026.md +110 -0
  381. package/vault/wiki/sources/coir-code-retrieval-benchmark.md +51 -0
  382. package/vault/wiki/sources/colinmcnamara-context-optimization-codemode.md +48 -0
  383. package/vault/wiki/sources/context-folding-paper.md +61 -0
  384. package/vault/wiki/sources/context-mode-website.md +63 -0
  385. package/vault/wiki/sources/cursor-agent-best-practices-2026.md +62 -0
  386. package/vault/wiki/sources/cursor-fork-29b-2025.md +50 -0
  387. package/vault/wiki/sources/cursor-harness-april-2026.md +76 -0
  388. package/vault/wiki/sources/cursor-instant-apply-2024.md +45 -0
  389. package/vault/wiki/sources/cursor-shadow-workspace-2024.md +52 -0
  390. package/vault/wiki/sources/cursor-shipped-coding-agent-2026.md +53 -0
  391. package/vault/wiki/sources/cursor-vs-antigravity-2026.md +51 -0
  392. package/vault/wiki/sources/disler-pi-vs-claude-code.md +69 -0
  393. package/vault/wiki/sources/distill-deterministic-context-compression.md +53 -0
  394. package/vault/wiki/sources/embedding-models-benchmark-supermemory-2025.md +48 -0
  395. package/vault/wiki/sources/executor-rhyssullivan.md +122 -0
  396. package/vault/wiki/sources/fallow-rs-codebase-intelligence.md +125 -0
  397. package/vault/wiki/sources/fan2025-imad.md +60 -0
  398. package/vault/wiki/sources/forgecode-gpt5-agent-improvements.md +63 -0
  399. package/vault/wiki/sources/gemini-3-prompting-guide.md +78 -0
  400. package/vault/wiki/sources/gh-cli-sub-issue-rfc.md +50 -0
  401. package/vault/wiki/sources/gh-sub-issue-extension.md +72 -0
  402. package/vault/wiki/sources/github-fork-issues-discussion.md +44 -0
  403. package/vault/wiki/sources/github-issue-dependencies-docs.md +49 -0
  404. package/vault/wiki/sources/github-sub-issues-docs.md +51 -0
  405. package/vault/wiki/sources/gitingest.md +91 -0
  406. package/vault/wiki/sources/gitreverse.md +63 -0
  407. package/vault/wiki/sources/google-antigravity-official-blog.md +47 -0
  408. package/vault/wiki/sources/google-antigravity-wikipedia.md +53 -0
  409. package/vault/wiki/sources/gsd-codecentric-deep-dive.md +57 -0
  410. package/vault/wiki/sources/gsd-github-repo.md +51 -0
  411. package/vault/wiki/sources/gsd-hn-discussion.md +59 -0
  412. package/vault/wiki/sources/guido-python-design-philosophy.md +56 -0
  413. package/vault/wiki/sources/hejlsberg-7-learnings.md +48 -0
  414. package/vault/wiki/sources/ironclaw-drift-monitor.md +80 -0
  415. package/vault/wiki/sources/langsight-loop-detection.md +80 -0
  416. package/vault/wiki/sources/leanctx-website.md +69 -0
  417. package/vault/wiki/sources/lee2026-meta-harness.md +59 -0
  418. package/vault/wiki/sources/linux-kernel-coding-workflow.md +50 -0
  419. package/vault/wiki/sources/lou2026-autoharness.md +53 -0
  420. package/vault/wiki/sources/martin-fowler-harness-engineering.md +73 -0
  421. package/vault/wiki/sources/mcp-architecture-docs.md +13 -0
  422. package/vault/wiki/sources/meng2026-agent-harness-survey.md +79 -0
  423. package/vault/wiki/sources/mindstudio-four-agent-types.md +68 -0
  424. package/vault/wiki/sources/ms-chat-history-management.md +13 -0
  425. package/vault/wiki/sources/openai-prompt-guidance.md +104 -0
  426. package/vault/wiki/sources/openclaw-session-pruning.md +13 -0
  427. package/vault/wiki/sources/opencode-dcp.md +13 -0
  428. package/vault/wiki/sources/opendev-arxiv-2603.05344v1.md +79 -0
  429. package/vault/wiki/sources/openhands-platform.md +39 -0
  430. package/vault/wiki/sources/oss-guide-codebase-exploration.md +53 -0
  431. package/vault/wiki/sources/pi-compaction-extensions-ecosystem.md +102 -0
  432. package/vault/wiki/sources/pi-context-prune-github-repo.md +38 -0
  433. package/vault/wiki/sources/pi-mono-compaction-docs.md +38 -0
  434. package/vault/wiki/sources/pi-omni-compact-github-repo.md +50 -0
  435. package/vault/wiki/sources/pi-rtk-optimizer-github-repo.md +45 -0
  436. package/vault/wiki/sources/pi-vcc-github-repo.md +69 -0
  437. package/vault/wiki/sources/pi-vscode-marketplace.md +41 -0
  438. package/vault/wiki/sources/pi-vscode-model-provider-marketplace.md +39 -0
  439. package/vault/wiki/sources/py-tree-sitter.md +13 -0
  440. package/vault/wiki/sources/sentrux-dev-landing.md +40 -0
  441. package/vault/wiki/sources/sentrux-docs-pro-architecture.md +75 -0
  442. package/vault/wiki/sources/sentrux-docs-quality-signal.md +46 -0
  443. package/vault/wiki/sources/sentrux-docs-root-cause-metrics.md +57 -0
  444. package/vault/wiki/sources/sentrux-docs-rules-engine.md +58 -0
  445. package/vault/wiki/sources/sentrux-github-repo.md +56 -0
  446. package/vault/wiki/sources/superpowers-github-repo.md +56 -0
  447. package/vault/wiki/sources/superpowers-release-blog.md +54 -0
  448. package/vault/wiki/sources/superpowers-termdock-analysis.md +45 -0
  449. package/vault/wiki/sources/swe-agent-aci.md +42 -0
  450. package/vault/wiki/sources/swe-bench.md +45 -0
  451. package/vault/wiki/sources/swe-pruner-context-pruning.md +13 -0
  452. package/vault/wiki/sources/think-in-code-blog.md +48 -0
  453. package/vault/wiki/sources/tree-sitter-docs.md +13 -0
  454. package/vault/wiki/sources/ts-best-practices-2025-devto.md +42 -0
  455. package/vault/wiki/sources/ts-folder-structure-mingyang.md +58 -0
  456. package/vault/wiki/sources/ts-monorepo-koerselman.md +44 -0
  457. package/vault/wiki/sources/ts-result-error-handling-kkalamarski.md +52 -0
  458. package/vault/wiki/sources/ts-runtimes-comparison-betterstack.md +42 -0
  459. package/vault/wiki/sources/ts-strict-mode-rishikc.md +43 -0
  460. package/vault/wiki/sources/unix-philosophy.md +48 -0
  461. package/vault/wiki/sources/vectara-chunking-vs-embedding-naacl2025.md +39 -0
  462. package/vault/wiki/sources/vectara-guardian-agents.md +79 -0
  463. package/vault/wiki/sources/vgrep-semantic-search.md +76 -0
  464. package/vault/wiki/sources/vitest-official.md +41 -0
  465. package/vault/wiki/sources/vscode-pi-community-extension.md +40 -0
  466. package/vault/wiki/sources/wozcode.md +79 -0
  467. package/.agents/skills/compress/SKILL.md +0 -111
  468. package/.agents/skills/compress/scripts/__init__.py +0 -9
  469. package/.agents/skills/compress/scripts/__main__.py +0 -3
  470. package/.agents/skills/compress/scripts/benchmark.py +0 -78
  471. package/.agents/skills/compress/scripts/cli.py +0 -73
  472. package/.agents/skills/compress/scripts/compress.py +0 -227
  473. package/.agents/skills/compress/scripts/detect.py +0 -121
  474. package/.agents/skills/compress/scripts/validate.py +0 -189
  475. package/.agents/skills/emil-design-eng/SKILL.md +0 -679
  476. package/.agents/skills/lean-ctx/SKILL.md +0 -149
  477. package/.agents/skills/lean-ctx/scripts/install.sh +0 -95
  478. package/.agents/skills/scrapling-official/LICENSE.txt +0 -28
  479. package/.agents/skills/scrapling-official/SKILL.md +0 -390
  480. package/.agents/skills/scrapling-official/examples/01_fetcher_session.py +0 -26
  481. package/.agents/skills/scrapling-official/examples/02_dynamic_session.py +0 -26
  482. package/.agents/skills/scrapling-official/examples/03_stealthy_session.py +0 -26
  483. package/.agents/skills/scrapling-official/examples/04_spider.py +0 -58
  484. package/.agents/skills/scrapling-official/examples/README.md +0 -45
  485. package/.agents/skills/scrapling-official/references/fetching/choosing.md +0 -78
  486. package/.agents/skills/scrapling-official/references/fetching/dynamic.md +0 -352
  487. package/.agents/skills/scrapling-official/references/fetching/static.md +0 -432
  488. package/.agents/skills/scrapling-official/references/fetching/stealthy.md +0 -255
  489. package/.agents/skills/scrapling-official/references/mcp-server.md +0 -214
  490. package/.agents/skills/scrapling-official/references/migrating_from_beautifulsoup.md +0 -86
  491. package/.agents/skills/scrapling-official/references/parsing/adaptive.md +0 -212
  492. package/.agents/skills/scrapling-official/references/parsing/main_classes.md +0 -586
  493. package/.agents/skills/scrapling-official/references/parsing/selection.md +0 -494
  494. package/.agents/skills/scrapling-official/references/spiders/advanced.md +0 -344
  495. package/.agents/skills/scrapling-official/references/spiders/architecture.md +0 -94
  496. package/.agents/skills/scrapling-official/references/spiders/getting-started.md +0 -164
  497. package/.agents/skills/scrapling-official/references/spiders/proxy-blocking.md +0 -235
  498. package/.agents/skills/scrapling-official/references/spiders/requests-responses.md +0 -196
  499. package/.agents/skills/scrapling-official/references/spiders/sessions.md +0 -205
  500. package/PLAN.md +0 -11
  501. package/extensions/lean-ctx-enforce.ts +0 -166
  502. package/skills-lock.json +0 -35
  503. package/wiki/README.md +0 -19
  504. package/wiki/decisions/0001-establish-project-wiki-and-decision-record-format.md +0 -25
  505. package/wiki/decisions/0002-add-project-banner-to-readme.md +0 -26
  506. package/wiki/decisions/0003-remove-redundant-readme-title-heading.md +0 -26
  507. package/wiki/decisions/0004-publish-package-to-npm-as-ultimate-pi.md +0 -26
  508. package/wiki/decisions/0005-automate-npm-publish-with-github-actions.md +0 -27
  509. package/wiki/decisions/0006-switch-to-npm-trusted-publishing.md +0 -26
  510. package/wiki/decisions/0007-use-absolute-banner-url-for-npm-readme-rendering.md +0 -26
  511. package/wiki/decisions/0008-rename-banner-asset-for-cache-busting.md +0 -26
  512. package/wiki/decisions/0009-force-oidc-path-by-clearing-node-auth-token-in-publish-step.md +0 -25
  513. package/wiki/decisions/0010-simplify-setup-node-for-npm-trusted-publishing.md +0 -26
  514. package/wiki/decisions/0011-add-noop-workflow-change-to-force-fresh-publish-run.md +0 -25
  515. package/wiki/decisions/0012-align-workflow-runtime-with-npm-trusted-publishing-requirements.md +0 -26
  516. package/wiki/decisions/0013-add-package-repository-url-for-provenance-validation.md +0 -25
@@ -0,0 +1,216 @@
1
+ ---
2
+ type: synthesis
3
+ title: "Research: Prompt Renderer for Multi-Model Agent Harness"
4
+ created: 2026-05-02
5
+ updated: 2026-05-02
6
+ tags:
7
+ - research
8
+ - prompt-renderer
9
+ - multi-model
10
+ - build-time-compilation
11
+ - caching
12
+ - harness
13
+ status: developing
14
+ related:
15
+ - "[[Prompt Renderer]]"
16
+ - "[[Build-Time Prompt Compilation]]"
17
+ - "[[provider-native-prompting]]"
18
+ - "[[model-adaptive-harness]]"
19
+ - "[[harness-configuration-layers]]"
20
+ - "[[harness-implementation-plan]]"
21
+ sources:
22
+ - "[[Source: Build-Time Prompt Compilation Architecture]]"
23
+ - "[[Source: AgentBus Jinja2 Prompt Pipelines]]"
24
+ - "[[Source: TianPan Prompt Caching Architecture]]"
25
+ - "[[Source: Arxiv — Don't Break the Cache]]"
26
+ - "[[openai-prompt-guidance]]"
27
+ - "[[anthropic-prompt-best-practices]]"
28
+ - "[[gemini-3-prompting-guide]]"
29
+
30
+ ---
+
+ # Research: Prompt Renderer for Multi-Model Agent Harness
31
+
32
+ ## Overview
33
+
34
+ Design a custom prompt renderer for the ultimate-pi agentic harness that takes a **base prompt spec** (model-agnostic), applies **per-model prompting best practices**, substitutes variables, uses a **caching layer** for cost optimization, and compiles rendered prompts at **build time** (not runtime) — shipped as compiled assets inside the npm library. This extends the existing [[provider-native-prompting]] concept with compilation, caching, and a two-phase variable system.
35
+
36
+ ## Key Findings
37
+
38
+ 1. **Build-time compilation is a proven architectural pattern but no mature off-the-shelf npm package exists.** The pattern is validated by Microsoft prompt-engine (2.8K stars, MIT — YAML-based prompt management, abandoned 2022) and PromptWeaver (`@iqai/prompt-weaver`, MIT, active Dec 2025 — Handlebars template compilation with Zod validation). The implementation is a DIY build pipeline: `js-yaml` (parse specs) + `@iqai/prompt-weaver` (template engine) + per-model renderer plugins → compiled JSON shipped in npm. No runtime template engine needed.
39
+
40
+ 2. **Strategic cache boundary control is essential** (Source: [[Source: Arxiv — Don't Break the Cache]]). Across 500 agent sessions and 4 flagship models, caching only the system prompt provides the most consistent benefits (41-80% cost reduction, 13-31% TTFT improvement), while full-context caching can paradoxically increase latency. The golden rule: static content first, dynamic content last. Compile-time rendering makes this trivial: all static content sits in the compiled prompt, and runtime vars are appended at the end.
41
+
42
+ 3. **Multi-tier caching architecture is well understood** (Source: [[Source: TianPan Prompt Caching Architecture]]). Three tiers: Semantic cache (100% savings for exact/near-duplicate queries), Prefix cache (50-90% savings for shared static context), Full inference (0% savings). Build-time compilation eliminates the need for runtime prefix caching entirely — compiled prompts ARE the cache. The "parallel execution trap" (4% cache hit rate without warming) is irrelevant when prompts are pre-compiled.
43
+
44
+ 4. **Each model has fundamentally different prompting conventions** (Sources: [[openai-prompt-guidance]], [[anthropic-prompt-best-practices]], [[gemini-3-prompting-guide]]). OpenAI says constraints-first and outcome-first. Anthropic mandates XML tags and long-form reasoning. Google says constraints-LAST with plain text. A single canonical prompt relaxed per model is WRONG — each model needs a purpose-built renderer that applies its official conventions from a shared semantic spec.
45
+
46
+ 5. **Jinja2 template patterns are production-ready but runtime-only** (Source: [[Source: AgentBus Jinja2 Prompt Pipelines]]). The Jinja2 pattern (FileSystemLoader, template inheritance, conditionals, loops, pipeline runner) is excellent for prompt structure but designed for runtime. We adapt the pattern to build-time: templates are compiled to static JSON with placeholders for runtime variables, not rendered at request time.
47
+
48
+ ## Architecture
49
+
50
+ ```
51
+ ┌──────────────────────────────────────────────────────┐
52
+ │ BUILD TIME │
53
+ │ │
54
+ │ Base Prompt Spec (prompts/*.yaml) │
55
+ │ │ │
56
+ │ ▼ │
57
+ │ ┌─────────────────┐ │
58
+ │ │ Prompt Compiler │ ← TypeScript build script │
59
+ │ │ │ │
60
+ │ │ • Parse YAML │ │
61
+ │ │ • Validate spec │ │
62
+ │ │ • Per-model │ ← Renderer plugins │
63
+ │ │ renderers │ (GPT, Claude, Gemini) │
64
+ │ │ • Substitute │ │
65
+ │ │ compile vars │ │
66
+ │ │ • Hash + cache │ │
67
+ │ └──────┬───────────┘ │
68
+ │ │ │
69
+ │ ▼ │
70
+ │ Compiled Prompts (dist/prompts/*.json) │
71
+ │ ✓ Per-model variants │
72
+ │ ✓ Syntax-validated │
73
+ │ ✓ Token-count checked │
74
+ │ ✓ Hash-verified │
75
+ │ ✓ Shipped in npm package │
76
+ │ │
77
+ └──────────────────────────────────────────────────────┘
78
+
79
+
80
+ ┌──────────────────────────────────────────────────────┐
81
+ │ RUNTIME │
82
+ │ │
83
+ │ Load compiled prompt by {spec, model} │
84
+ │ │ │
85
+ │ ▼ │
86
+ │ Substitute runtime variables │
87
+ │ (user_query, context, etc.) │
88
+ │ │ │
89
+ │ ▼ │
90
+ │ Send to LLM API │
91
+ │ (no template engine, no compilation, no cache warmup)│
92
+ └──────────────────────────────────────────────────────┘
93
+ ```
94
+
95
+ ## Caching Layer Design
96
+
97
+ ### Build Cache (incremental compilation)
98
+ ```
99
+ cache/
100
+ └── compile-cache.json # { spec_hash → compiled_output_hash }
101
+ ```
102
+ Only recompile prompts whose spec hash changed since last build.
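The incremental rule above can be sketched as follows. This is a minimal illustration, assuming the cache is a plain object keyed by spec hash as in the `compile-cache.json` comment; `sha256`, `findStaleSpecs`, and the in-memory `specs` map are hypothetical names, not part of the package.

```typescript
import { createHash } from "node:crypto";

// Assumed cache shape: spec hash -> compiled output hash (as in compile-cache.json).
type CompileCache = Record<string, string>;

// Hex-encoded SHA-256 of a spec's source text.
function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Return the names of specs whose current hash has no entry in the cache,
// i.e. the only specs that need to be recompiled in this build.
function findStaleSpecs(specs: Record<string, string>, cache: CompileCache): string[] {
  return Object.entries(specs)
    .filter(([, content]) => !Object.prototype.hasOwnProperty.call(cache, sha256(content)))
    .map(([name]) => name);
}
```

A build script would read each YAML spec from disk, call `findStaleSpecs`, recompile only the returned names, and then write the new hashes back to the cache file.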
103
+
104
+ ### Output Cache (compiled prompts)
105
+ ```
106
+ dist/prompts/manifest.json # { spec → { model → { hash, path, build_time } } }
107
+ ```
108
+ Each compiled prompt is content-hashed for deterministic verification.
109
+
110
+ ### Runtime Cache (not needed)
111
+ No runtime cache required — compiled prompts are static files loaded directly. Zero compilation latency, zero cache warming, zero parallel-execution traps.
112
+
113
+ ## Per-Model Rendering Rules
114
+
115
+ | Dimension | GPT (OpenAI) | Claude (Anthropic) | Gemini (Google) |
116
+ |-----------|-------------|-------------------|-----------------|
117
+ | **System prompt** | `messages[0].role="system"` | `system` parameter | `systemInstruction` config |
118
+ | **Structure** | Flat, constraints-first | XML tags (`<instructions>`) | Plain text, constraints-last |
119
+ | **Instruction ordering** | Outcome → Constraints → Context | Role → Context → Task → XML | Context → Task → Constraints |
120
+ | **Output format** | Function calling / JSON mode | Structured output API | Controlled generation / JSON |
121
+ | **Cache mechanism** | Auto (prefix match) | `cache_control: {type: "ephemeral"}` | Explicit context cache |
122
+ | **Best practice source** | platform.openai.com/docs/guides/prompt-engineering | docs.anthropic.com + interactive tutorial | cloud.google.com/vertex-ai/docs |
123
+ | **Examples preference** | Few-shot inline | Few-shot with XML wrappers | Few-shot with clear separation |
124
+ | **Token threshold** | 1,024 (cache min) | 1,024 (cache min) | 4,096 (cache min) |
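One way to read the structure and ordering rows of the table is as a set of renderer plugins over a shared spec. The sketch below is illustrative only: `PromptSpec`, its field names, and the exact XML tag names other than `<instructions>` are assumptions, and real renderers would also handle output format and cache markers.

```typescript
// Hypothetical model-agnostic spec. Field names are assumptions for this sketch.
interface PromptSpec {
  role: string;
  outcome: string;
  context: string;
  task: string;
  constraints: string[];
}

type ModelFamily = "gpt" | "claude" | "gemini";
type Renderer = (spec: PromptSpec) => string;

const bullets = (items: string[]): string => items.map((item) => `- ${item}`).join("\n");

const renderers: Record<ModelFamily, Renderer> = {
  // GPT: flat text, Outcome -> Constraints -> Context.
  gpt: (s) => [s.outcome, bullets(s.constraints), s.context].join("\n\n"),
  // Claude: Role -> Context -> Task, each section wrapped in XML tags.
  claude: (s) =>
    [
      `<role>${s.role}</role>`,
      `<context>${s.context}</context>`,
      `<instructions>\n${s.task}\n${bullets(s.constraints)}\n</instructions>`,
    ].join("\n"),
  // Gemini: plain text, Context -> Task -> Constraints (constraints last).
  gemini: (s) => [s.context, s.task, bullets(s.constraints)].join("\n\n"),
};

// Render one spec into a per-model variant map, as a build step might.
function renderAll(spec: PromptSpec): Record<ModelFamily, string> {
  return {
    gpt: renderers.gpt(spec),
    claude: renderers.claude(spec),
    gemini: renderers.gemini(spec),
  };
}
```

Keeping each renderer a pure function of the spec is what makes the compiled output deterministic and hashable.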
125
+
126
+ ## Variable System
127
+
128
+ Two-phase variable resolution:
129
+
130
+ ```typescript
131
+ interface PromptVariable {
132
+ name: string;
133
+ type: 'string' | 'number' | 'boolean' | 'json';
134
+ phase: 'compile' | 'runtime';
135
+ default?: unknown;
136
+ required: boolean;
137
+ }
138
+ ```
139
+
140
+ - **Compile-time vars** (`phase: 'compile'`): Resolved at build time. Multiple values produce multiple compiled variants. Example: `model_name: [gpt-5.2, claude-sonnet-4.5]` → 2 compiled prompts.
141
+ - **Runtime vars** (`phase: 'runtime'`): Resolved at call time. Left as placeholders in the compiled output (the implementation plan below specifies the `__VAR_name__` format) and substituted by a lightweight runtime function. No template engine is needed; a simple string replace suffices.
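The runtime half of the two-phase system could look roughly like this. It is a sketch under assumptions: it uses the `__VAR_name__` placeholder format named in Phase 3 of the implementation plan, and `CompiledPrompt`, `RuntimeVars`, and `substituteRuntimeVars` are hypothetical names.

```typescript
// Hypothetical shape of one compiled prompt; the field names are assumptions.
interface CompiledPrompt {
  model: string;
  text: string; // static content with __VAR_name__ placeholders left in place
}

type RuntimeVars = Record<string, string | number | boolean>;

// Replace every __VAR_name__ placeholder with its runtime value.
// Throws on a missing value so an unresolved placeholder never reaches the model.
function substituteRuntimeVars(prompt: CompiledPrompt, vars: RuntimeVars): string {
  return prompt.text.replace(/__VAR_(\w+?)__/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Missing runtime variable: ${name}`);
    }
    return String(vars[name]);
  });
}
```

Failing loudly on a missing variable is a design choice for this sketch; a real loader might instead fall back to the `default` declared on the `PromptVariable`.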
142
+
143
+ ## npm Package Structure
144
+
145
+ ```
146
+ @ultimate-pi/harness/
147
+ ├── dist/
148
+ │ ├── prompts/
149
+ │ │ ├── gpt/
150
+ │ │ │ ├── system.json # Compiled system prompt for GPT
151
+ │ │ │ ├── spec-hardening.json
152
+ │ │ │ └── verify.json
153
+ │ │ ├── claude/
154
+ │ │ │ ├── system.json # Compiled system prompt for Claude
155
+ │ │ │ └── ...
156
+ │ │ └── gemini/
157
+ │ │ └── ...
158
+ │ ├── manifest.json # Build manifest
159
+ │ └── renderers/
160
+ │ ├── gpt-renderer.js # Renderer plugins (only if runtime rendering needed)
161
+ │ └── ...
162
+ ├── prompts/ # Source specs (for development)
163
+ │ ├── base/
164
+ │ │ ├── system.yaml
165
+ │ │ └── verify.yaml
166
+ │ └── fragments/
167
+ │ └── common.yaml
168
+ ├── scripts/
169
+ │ └── compile-prompts.ts # Build script
170
+ └── src/
171
+ └── runtime/
172
+ └── prompt-loader.ts # Runtime loader (reads compiled JSON)
173
+ ```
174
+
175
+ ## Implementation Plan (integrated into harness)
176
+
177
+ ### Phase 1: Compiler Core
178
+ - TypeScript build script that reads YAML specs → validates → applies per-model renderers → outputs compiled JSON
179
+ - Supported models: GPT-5.2, Claude Sonnet 4.5, Gemini 2.5 Pro (extensible plugin system)
180
+ - Deterministic builds with SHA-256 manifest
181
+ - Integration: `npm run compile-prompts` as build step
182
+
183
+ ### Phase 2: Per-Model Renderers
184
+ - GPT renderer: constraints-first, flat structure, outcome-first ordering, system role message
185
+ - Claude renderer: XML tags, long-form structure, cache_control markers, system parameter
186
+ - Gemini renderer: constraints-last, plain text, systemInstruction, context cache config
187
+
188
+ ### Phase 3: Variable System
189
+ - Two-phase variable resolution with type checking
190
+ - Compile-time multi-value expansion
191
+ - Runtime placeholder format: `__VAR_name__` (avoid collision with any template syntax)
192
+
193
+ ### Phase 4: Caching
194
+ - Incremental build cache (recompile only changed specs)
195
+ - Compiled prompts shipped as static JSON in npm (no runtime compilation)
196
+ - Content-hash verification for deterministic builds
197
+
198
+ ### Phase 5: Runtime Integration
199
+ - `loadPrompt(specName, model, runtimeVars)` function
200
+ - Zero-dependency runtime (just JSON.parse + string replace)
201
+ - Type-safe with TypeScript types for all compiled prompts
202
+
203
+ ## Contradictions
204
+
205
+ - [[Source: AgentBus Jinja2 Prompt Pipelines]] advocates runtime template rendering with Jinja2 (Python). Our design deliberately avoids this — pre-compiling at build time eliminates template engine dependency, reduces runtime overhead to zero, and makes prompts auditable static assets. The contradiction is resolved by recognizing that Jinja2's patterns (inheritance, blocks, pipelines) are excellent for prompt STRUCTURE but should be resolved at build time, not runtime.
206
+ - [[Source: TianPan Prompt Caching Architecture]] describes runtime prefix caching with cache warming. Our design makes this mostly irrelevant — when prompts are pre-compiled and shipped in npm, there is no runtime prefix to cache. However, the multi-tier caching insight (semantic → prefix → full) remains valuable for the broader harness caching strategy beyond prompt rendering.
207
+
208
+ ## Open Questions
209
+
210
+ 1. **What template syntax for base specs?** YAML with JSON Schema validation is the practical choice. PromptWeaver's Handlebars syntax provides the template layer. Microsoft prompt-engine validated the YAML pattern. JSON Schema (or Zod, integrated with PromptWeaver) provides better validation than raw YAML parsing alone. YAML stays human-friendly for spec authors.
211
+ 2. **How to handle prompt versioning across npm releases?** Compiled prompts must be versioned with the harness. Semantic versioning for prompts: major = breaking spec change, minor = new prompt added, patch = rendering tweak. The build manifest provides traceability.
212
+ 3. **What about custom/fine-tuned models?** The renderer plugin system should support user-defined renderers for custom models. Default: fall back to "generic" renderer that produces a neutral format.
213
+ 4. **How to test compiled prompts?** Each compiled variant needs automated testing. PromptWeaver's Zod schema validation checks structure and types at compile time. Token thresholds checked against model-specific limits. Semantic testing (does the compiled prompt produce expected behavior?) requires sending to the target model — this is a separate integration test concern.
214
+ 5. **What happens when a provider changes its API format?** Compiled prompts for that provider become stale. The build manifest tracks renderer version — recompilation produces updated prompts. A CI check should flag prompts compiled with outdated renderer versions.
215
+ 6. **Where does token budget allocation fit?** The base spec should declare expected token budgets. The compiler validates that compiled prompts don't exceed model limits. Budget allocation is a prompt design concern, not a renderer concern — but the renderer enforces it.
216
+ 7. **Does the renderer need to support chat message arrays (multi-turn)?** Yes — the base spec should support defining multi-message prompts (system + examples + user template). The renderer compiles the full message array structure per model's expected format.
@@ -0,0 +1,91 @@
1
+ ---
2
+ type: synthesis
3
+ title: "Research: Skill-First MVP & Harness Implementation Architecture"
4
+ created: 2026-05-03
5
+ updated: 2026-05-03
6
+ tags:
7
+ - research
8
+ - harness
9
+ - mvp
10
+ - skills
11
+ - architecture
12
+ - first-principles
13
+ status: developing
14
+ related:
15
+ - "[[skill-first-architecture]]"
16
+ - "[[harness-implementation-plan]]"
17
+ - "[[mvp-implementation-blueprint]]"
18
+ - "[[agent-skills-pattern]]"
19
+ - "[[drift-detection-unified]]"
20
+ - "[[harness-engineering-first-principles]]"
21
+ - "[[adr-015]]"
22
+ sources:
23
+ - "[[Source: SwirlAI Agent Skills Progressive Disclosure]]"
24
+ - "[[Source: Claude API Agent Skills Overview]]"
25
+ - "[[Source: Blake Crosley Agent Architecture Guide]]"
26
+ ---
27
+
28
+ # Research: Skill-First MVP & Harness Implementation Architecture
29
+
30
+ ## Overview
31
+
32
+ Rethought the entire MVP and harness implementation plans from first principles. The core insight: the harness is NOT a code pipeline; it is a skill coordination layer. 80% of harness functionality can be markdown-based skills (`SKILL.md` files) loaded on-demand via progressive disclosure. Only deterministic infrastructure needs code: the drift monitor (real-time pattern matching), shared types, and config. Event routing is left to pi's built-in event bus. Everything else (spec hardening, planning, adversarial verification, observability, memory) is an LLM-invoked skill. This cuts the code surface from ~15 TypeScript files to 3, while gaining auto-activation, progressive disclosure, and zero-compile iteration speed.
33
+
34
+ ## Key Findings
35
+
36
+ - **Skills are the atomic unit of harness behavior.** Validated by Anthropic's open standard (Dec 2025), adopted by OpenAI, Google, GitHub, Cursor within weeks. Skills use three-tier progressive disclosure: Discovery (~80 tokens/skill), Activation (~2,000 tokens), Execution (unlimited supporting files). (Source: [[Source: SwirlAI Agent Skills Progressive Disclosure]])
37
+ - **Code is for determinism, not logic.** Hooks guarantee execution (exit code 2 blocks). Skills are probabilistic (model decides when to activate). The drift monitor MUST be code because it runs deterministic pattern matching on every `tool_result` event with sliding windows. Everything else is LLM evaluation and SHOULD be a skill. (Source: [[Source: Blake Crosley Agent Architecture Guide]])
38
+ - **The harness pattern is hooks→skills→agents→workflows.** Claude Code's architecture (22+ lifecycle events, markdown skills, subagents, filesystem memory) validates that the harness is "a programmable runtime with an LLM kernel" — not a TypeScript codebase. (Source: [[Source: Blake Crosley Agent Architecture Guide]])
39
+ - **Skills compose with hooks.** Skills can define their own hooks in YAML frontmatter that activate only while the skill runs. This creates domain-specific deterministic behavior without polluting other sessions. (Source: [[Source: Blake Crosley Agent Architecture Guide]])
40
+ - **Markdown skills ARE the spec.** No separate spec files. The SKILL.md body is simultaneously the specification, the implementation instructions, and the documentation. Supporting files (reference.md, scripts/) provide execution-layer resources. (Source: [[Source: Claude API Agent Skills Overview]])
41
+ - **Pi's built-in event bus handles routing.** No custom event bus needed — pi's native event system wires events to skill invocations. Pipeline ordering is enforced by skill activation sequence, not imperative code.
42
+ - **File count drops from 15 to 3 TypeScript files.** `src/harness/drift-monitor.ts` (drift detection), `src/harness/types.ts` (shared types), `src/harness/config.ts` (config loader). All other functionality becomes `.pi/skills/harness-*/SKILL.md` files.
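To make the "deterministic pattern matching with sliding windows" point concrete, here is a minimal sketch of a windowed failure counter. It is not the actual `drift-monitor.ts`; the `ToolResultEvent` shape, the class name, and the failure-count rule are all assumptions chosen for illustration.

```typescript
// Hypothetical event shape; pi's real tool_result payload may differ.
interface ToolResultEvent {
  tool: string;
  ok: boolean;
}

class SlidingWindowDriftMonitor {
  private readonly size: number;
  private readonly maxFailures: number;
  private window: ToolResultEvent[] = [];

  constructor(size: number, maxFailures: number) {
    this.size = size;
    this.maxFailures = maxFailures;
  }

  // Record one event, keep only the most recent `size` events, and report
  // whether the number of failures in the window exceeds the threshold.
  observe(event: ToolResultEvent): boolean {
    this.window.push(event);
    if (this.window.length > this.size) {
      this.window.shift();
    }
    const failures = this.window.filter((e) => !e.ok).length;
    return failures > this.maxFailures;
  }
}
```

Because the check is a pure function of the last few events, the same event sequence always produces the same verdict, which is the property that motivates keeping this piece as code.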
43
+
44
+ ## Key Entities
45
+
46
+ - [[Anthropic]]: Released Agent Skills open standard Dec 18, 2025. Three-tier progressive disclosure. Adopted industry-wide within weeks.
47
+ - [[OpenAI]]: Adopted skills for Codex CLI and ChatGPT.
48
+ - [[Google]]: Added skills to Gemini CLI.
49
+ - [[GitHub Copilot]]: Launched skills support same day as standard.
50
+ - [[Cursor]]: Integrated skills alongside Rules system.
51
+
52
+ ## Key Concepts
53
+
54
+ - [[skill-first-architecture]]: Harness layers as markdown-based skills instead of TypeScript code modules. Only deterministic infrastructure remains as code.
55
+ - [[progressive-disclosure-agents]]: Three-tier loading: metadata (always, ~80 tokens/skill) → full SKILL.md (when relevant, ~2,000 tokens) → supporting files (on demand, unlimited).
56
+ - [[agent-skills-pattern]]: Progressive disclosure as a system design pattern. Context windows are finite and lossy — skills keep context lean.
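The context savings implied by the three tiers can be estimated with simple arithmetic. The sketch below uses the approximate per-tier token figures quoted above (~80 for metadata, ~2,000 for a full `SKILL.md`); the function name and the "eager" baseline of loading every skill in full are assumptions for comparison.

```typescript
// Approximate per-skill token costs quoted in the research notes.
const DISCOVERY_TOKENS = 80;
const ACTIVATION_TOKENS = 2000;

interface ContextCost {
  progressive: number; // metadata for every skill + full body for active ones
  eager: number; // hypothetical baseline: every SKILL.md loaded in full
}

// Estimate context usage for `installed` skills of which `active` are activated.
function estimateContextCost(installed: number, active: number): ContextCost {
  if (active > installed) {
    throw new Error("active skills cannot exceed installed skills");
  }
  return {
    progressive: installed * DISCOVERY_TOKENS + active * ACTIVATION_TOKENS,
    eager: installed * ACTIVATION_TOKENS,
  };
}
```

For the six harness skills in this plan with one active at a time, the estimate is 6 × 80 + 2,000 = 2,480 tokens under progressive disclosure versus 12,000 if every skill were loaded in full.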
57
+
58
+ ## Architecture Comparison
59
+
60
+ | Dimension | Old Plan (Code-First) | New Plan (Skill-First) |
61
+ |-----------|----------------------|------------------------|
62
+ | L1 Spec Hardening | `src/harness/l1-spec.ts` (~300 lines TS) | `.pi/skills/harness-spec/SKILL.md` (markdown) |
63
+ | L2 Planning | `src/harness/l2-planner.ts` (~400 lines TS) | `.pi/skills/harness-plan/SKILL.md` (markdown) |
64
+ | L2.5 Drift Monitor | `src/harness/l2.5-drift.ts` (~500 lines TS) | **KEPT AS CODE** — deterministic pattern matching |
65
+ | L4 Adversarial | `src/harness/l4-critics.ts` (~300 lines TS) | `.pi/skills/harness-critic/SKILL.md` + `.pi/agents/critic.md` |
66
+ | P20 Gate | `src/harness/p20-gate.ts` (~100 lines TS) | `.pi/skills/harness-gate/SKILL.md` (bash commands) |
67
+ | L5 Observability | `src/harness/l5-observability.ts` (~200 lines TS) | `.pi/skills/harness-observe/SKILL.md` (markdown) |
68
+ | L6 Memory | `src/harness/l6-memory.ts` (~150 lines TS) | Already wiki-based (claude-obsidian skills) |
69
+ | Event Bus | ~~`src/harness/events.ts`~~ (~200 lines TS) | **REMOVED** — pi's built-in event bus handles routing (2026-05-04) |
70
+ | Types + Config | `src/harness/types.ts` + `config.ts` (~300 lines) | **KEPT AS CODE** — shared infrastructure |
71
+ | **Total TS files** | **~15 files, ~2,500 lines** | **~3 files, ~600 lines** |
72
+ | **Total skill files** | **0** | **6 SKILL.md files + supporting** |
73
+
74
+ ## Contradictions
75
+
76
+ - None identified. All three sources converge on the same architecture: skills for domain expertise, hooks for deterministic enforcement, code only when determinism is required. The skill-first approach is independently validated by Anthropic, Microsoft, OpenAI, and Google within a 4-month window (Dec 2025–Mar 2026).
77
+
78
+ ## Open Questions
79
+
80
+ - **How does pi's skill system handle skill-to-skill invocation?** Can a harness skill invoke the next pipeline skill programmatically, or does pi's built-in event bus need to sequence them? Pi's event bus likely sequences — each skill returns, pi fires next hook.
81
+ - **Can pi skills define hooks in frontmatter?** Claude Code skills can. If pi doesn't support this, hooks must remain in pi's event system or `.pi/settings.json`.
82
+ - **What is the skill context budget in pi?** Claude Code uses 2% of context window with 16,000 char fallback. Pi's budget is unknown.
83
+ - **Skill caching behavior.** Smarter implementations cache recently used skills. Does pi reload SKILL.md from disk every activation or cache? This affects drift monitor → spec hardening reinvocation performance.
84
+ - **How are skill-generated artifacts stored?** L2 planning generates YAML plan files. Can skills write to `.pi/harness/plans/` directly, or does pi's event system broker file writes?
85
+ - **Skill version pinning across releases.** When harness skills ship in the pi package, should they be compiled prompts or live markdown? The research shows build-time compilation is valid, but it adds complexity compared with live markdown that users can edit.
86
+
87
+ ## Sources
88
+
89
+ - [[Source: SwirlAI Agent Skills Progressive Disclosure]]: Mar 11, 2026. Three-tier architecture, ecosystem adoption speed, progressive disclosure as system design pattern.
90
+ - [[Source: Claude API Agent Skills Overview]]: Official docs. Filesystem-based skill architecture, three loading levels, security considerations.
91
+ - [[Source: Blake Crosley Agent Architecture Guide]]: Apr 29, 2026. Complete harness pattern: hooks, skills, subagents, multi-agent orchestration, memory, production patterns.
@@ -0,0 +1,88 @@
1
+ ---
2
+ type: synthesis
3
+ title: "Research: TypeScript Best Practices and Codebase Structure"
4
+ created: 2026-05-02
5
+ updated: 2026-05-02
6
+ tags:
7
+ - research
8
+ - typescript
9
+ - best-practices
10
+ - codebase-structure
11
+ status: developing
12
+ related:
13
+ - "[[ts-strict-mode-rishikc]]"
14
+ - "[[ts-runtimes-comparison-betterstack]]"
15
+ - "[[barrel-files-tkdodo]]"
16
+ - "[[ts-monorepo-koerselman]]"
17
+ - "[[vitest-official]]"
18
+ - "[[ts-folder-structure-mingyang]]"
19
+ - "[[ts-best-practices-2025-devto]]"
20
+ - "[[ts-result-error-handling-kkalamarski]]"
21
+ - "[[typescript-strict-mode]]"
22
+ - "[[barrel-files]]"
23
+ - "[[monorepo-architecture]]"
24
+ - "[[result-monad-error-handling]]"
25
+ - "[[javascript-runtimes]]"
26
+ - "[[vitest]]"
27
+ sources:
28
+ - "[[ts-strict-mode-rishikc]]"
29
+ - "[[ts-runtimes-comparison-betterstack]]"
30
+ - "[[barrel-files-tkdodo]]"
31
+ - "[[ts-monorepo-koerselman]]"
32
+ - "[[vitest-official]]"
33
+ - "[[ts-folder-structure-mingyang]]"
34
+ - "[[ts-best-practices-2025-devto]]"
35
+ - "[[ts-result-error-handling-kkalamarski]]"
36
+
37
+ ---
+
+ # Research: TypeScript Best Practices and Codebase Structure
38
+
39
+ ## Overview
40
+
41
+ Research across 8 authoritative sources covering TypeScript compiler configuration, runtime selection, code organization patterns, monorepo strategies, testing frameworks, and error handling approaches. The ecosystem has matured significantly: strict mode is the default, barrel files are discouraged, monorepo tooling is production-ready, and type-safe API patterns (tRPC) are gaining adoption.
42
+
43
+ ## Key Findings
44
+
45
+ - **Enable `strict: true` by default** for all new TypeScript projects. `strictNullChecks` alone eliminates a major class of null-reference production bugs. Migrate existing codebases incrementally — one strict flag at a time. (Source: [[ts-strict-mode-rishikc]])
46
+ - **Avoid barrel files (`index.ts` re-exports) in application code**. Barrel files cause circular imports and bloat the module graph that dev servers must load (68% in a real production measurement). Libraries are the only valid use case. (Source: [[barrel-files-tkdodo]])
47
+ - **Bun is the fastest runtime** (52K req/s vs Node 13K), but Node.js remains the safe choice for production due to ecosystem maturity and backporting of Bun/Deno features. (Source: [[ts-runtimes-comparison-betterstack]])
48
+ - **Built-package strategy with Turborepo** is preferred for TypeScript monorepos. Build packages to JS with a bundler (TSUP, RsLib), use TypeScript project references, and generate `.d.ts.map` files for IDE go-to-definition. (Source: [[ts-monorepo-koerselman]])
49
+ - **Vitest has replaced Jest** as the default test runner for new TypeScript projects. Vite-native, Jest-compatible API, smart watch mode. (Source: [[vitest-official]])
50
+ - **Name backend folders by technical capability** (controllers, services, repositories), not by business feature. Feature-based structure works better for frontend. Separate database logic from business logic. (Source: [[ts-folder-structure-mingyang]])
51
+ - **`Result<Ok, Err>` monad pattern** enables declarative error handling — errors are values, not exceptions. Wrap early, unwrap late. Gaining adoption via libraries like neverthrow and effect-ts. (Source: [[ts-result-error-handling-kkalamarski]])
52
+ - **ESLint `@typescript-eslint/recommended-type-checked`** pairs with strict mode for defense-in-depth. Strict mode catches type issues; ESLint catches floating promises and behavioral bugs. (Sources: [[ts-strict-mode-rishikc]], [[ts-best-practices-2025-devto]])
53
+
54
+ ## Key Entities
55
+
56
+ - [[vitest]]: Vite-native test framework, Jest-compatible, v4.1.5 (2026)
57
+ - [[javascript-runtimes]]: Node.js (stable, mature), Deno (secure, tooling-rich), Bun (fast, drop-in Node.js replacement)
58
+
59
+ ## Key Concepts
60
+
61
+ - [[typescript-strict-mode]]: The `"strict": true` compiler flag enables 8+ sub-checks
62
+ - [[barrel-files]]: Re-export files — useful for libraries, harmful for app code
63
+ - [[monorepo-architecture]]: Single repo, multiple packages — built-package vs internal-packages strategies
64
+ - [[result-monad-error-handling]]: Functional error handling — `Result<Ok, Err>` with map/flatMap/match
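The `Result<Ok, Err>` pattern with map/flatMap/match can be sketched in a few lines. This is a hand-rolled illustration, not the API of neverthrow or effect-ts; the helper names and the `parsePort` example are assumptions.

```typescript
// A Result is either a success carrying a value or a failure carrying an error.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Transform the success value; failures pass through untouched.
function map<T, U, E>(r: Result<T, E>, f: (value: T) => U): Result<U, E> {
  return r.ok ? ok(f(r.value)) : r;
}

// Chain a step that can itself fail; the first failure short-circuits.
function flatMap<T, U, E, F>(r: Result<T, E>, f: (value: T) => Result<U, F>): Result<U, E | F> {
  return r.ok ? f(r.value) : r;
}

// Unwrap at the edge by handling both cases explicitly.
function match<T, E, R>(r: Result<T, E>, onOk: (value: T) => R, onErr: (error: E) => R): R {
  return r.ok ? onOk(r.value) : onErr(r.error);
}

// Hypothetical example: wrap early at the parsing boundary.
function parsePort(input: string): Result<number, string> {
  const n = Number(input);
  return Number.isInteger(n) && n > 0 && n < 65536 ? ok(n) : err(`invalid port: ${input}`);
}

// Unwrap late: errors travel as values until this final match.
const message = match(
  map(parsePort("8080"), (port) => port + 1),
  (port) => `port ${port}`,
  (error) => error,
);
```

The discriminated union on `ok` lets the compiler enforce that `value` is only read on the success branch, which is what makes the "errors are values" style type-safe.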
65
+
66
+ ## Contradictions
67
+
68
+ - **Barrel files**: Traditional advice says barrel files clean up imports; TkDodo (2024) demonstrates they cause circular imports and 68% module bloat. Consensus is shifting toward direct imports for app code. Resolution: Use barrels only for library entry points. (Sources: [[barrel-files-tkdodo]] vs common practice)
69
+ - **Folder structure**: Mingyang Li argues for technical-capability folders on backend (Clean Architecture). Vertical Slice advocates argue feature-based folders reduce context switching. Resolution: Technical structure for backend stability, feature structure for frontend adaptability. (Source: [[ts-folder-structure-mingyang]])
70
+ - **Built vs source-only packages**: Koerselman prefers building packages with bundlers for caching and ESM compatibility. Turborepo team's blog argues source-only is simpler and often sufficient. Resolution: Depends on project size. Small teams: source-only. Large teams with CI/CD: built-package. (Source: [[ts-monorepo-koerselman]])
71
+
72
+ ## Open Questions
73
+
74
+ - How does tRPC compare to traditional REST in non-TypeScript environments? (Research focused on TS-TS stacks)
75
+ - What is the adoption rate of Biome (Rust-based linter/formatter) vs ESLint+Prettier in 2026?
76
+ - Are there published benchmarks for `isolatedModules: true` performance impact in large monorepos?
77
+ - How does the Oxc-based TypeScript transpiler (used by Vitest) compare to SWC and ESBuild for type stripping?
78
+
79
+ ## Sources
80
+
81
+ - [[ts-strict-mode-rishikc]]: Rishi Kumar Chawda, 2021/2026 — comprehensive strict mode guide
82
+ - [[ts-runtimes-comparison-betterstack]]: Stanley Ulili, 2026 — Node.js vs Deno vs Bun comparison with benchmarks
83
+ - [[barrel-files-tkdodo]]: Dominik Dorfmeister, 2024 — argument against barrel files with performance data
84
+ - [[ts-monorepo-koerselman]]: Thijs Koerselman, 2023/2026 — deep dive into TS monorepo patterns
85
+ - [[vitest-official]]: Vitest contributors, 2026 — official testing framework documentation
86
+ - [[ts-folder-structure-mingyang]]: Mingyang Li, 2024 — production-grade Node.js/TS folder structure
87
+ - [[ts-best-practices-2025-devto]]: Mitu M, 2025 — broad overview of 2025 best practices
88
+ - [[ts-result-error-handling-kkalamarski]]: Krzysztof Kalamarski, 2022 — Result monad pattern implementation
@@ -0,0 +1,81 @@
1
+ ---
2
+ type: synthesis
3
+ title: "Research: TypeScript Execution Layer for Agent Tool Calling"
4
+ created: 2026-05-01
5
+ updated: 2026-05-01
6
+ tags:
7
+ - research
8
+ - agent-tools
9
+ - typescript-execution-layer
10
+ - harness
11
+ status: developing
12
+ related:
13
+ - "[[ts-execution-layer]]"
14
+ - "[[mcp-tool-routing]]"
15
+ - "[[agentic-harness-context-enforcement]]"
16
+ - "[[think-in-code-enforcement]]"
17
+ - "[[harness-implementation-plan]]"
18
+ sources:
19
+ - "[[codeact-apple-2024]]"
20
+ - "[[cloudflare-codemode]]"
21
+ - "[[executor-rhyssullivan]]"
22
+ - "[[colinmcnamara-context-optimization-codemode]]"
23
+
24
+ ---
+ # Research: TypeScript Execution Layer for Agent Tool Calling
25
+
26
+ ## Overview
27
+
28
+ The TypeScript execution layer pattern replaces flat tool calling with a single "write code" tool plus a sandboxed TypeScript runtime. Research across 4 sources (1 academic paper, 2 production systems, 1 analysis) indicates this pattern reduces context by 3-4x, improves multi-tool success rates by up to 20%, and reduces interaction turns by up to 30%. The pattern is validated at production scale by Cloudflare (Code Mode) and the open-source Executor project (1.3K stars). It directly addresses the **tool context bloat problem** identified in MCP-heavy agent architectures and complements our existing Think-in-Code enforcement (P14).
29
+
30
+ ## Key Findings
31
+
32
+ - **Up to 20% higher success rate on multi-tool tasks** when agents write executable Python code instead of JSON tool calls (CodeAct, ICML 2024, tested on 17 LLMs). This is a capability improvement, not just a context optimization. (Source: [[codeact-apple-2024]])
33
+ - **~3-4x context reduction**: Code Mode uses ~3,100 tokens vs ~10,500+ tokens for traditional tool calling per interaction. The LLM only sees type definitions and final results — intermediate tool call/response pairs stay in the sandbox. (Source: [[cloudflare-codemode]], [[colinmcnamara-context-optimization-codemode]])
34
+ - **Up to 30% fewer interaction turns**: Multi-step workflows that required 5-10 round-trips become one code generation turn. Fewer round-trips mean fewer opportunities for error propagation. (Source: [[codeact-apple-2024]])
35
+ - **Python interpreter provides zero-cost error signals**: Wrong intermediate calculations raise exceptions immediately. Agent sees traceback and revises without a separate critique step. This complements our L4 adversarial verification by catching syntax/semantic errors before they reach the critic agent. (Source: [[codeact-apple-2024]])
36
+ - **TypeScript preferred over Python for agent code**: LLMs have seen millions of TS/JS repos in training data. The type system provides natural guardrails: type definitions steer generation, and malformed API calls can be rejected by the type checker before execution instead of failing at runtime. (Source: [[cloudflare-codemode]], [[executor-rhyssullivan]])
37
+ - **Tool discovery without context load**: Executor's `tools.discover({ query, limit })` pattern lets agents discover tools dynamically without loading all definitions into context. This is a fundamental improvement over MCP's `tools/list` which returns everything. (Source: [[executor-rhyssullivan]])
38
+ - **Cross-agent tool sharing**: Executor's MCP server mode enables a single tool catalog shared across Cursor, Claude Code, OpenCode, and other agents. Aligns with our P39 (Harness as MCP Server). (Source: [[executor-rhyssullivan]])
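
To make the tool-discovery finding concrete, here is a minimal sketch of a queryable catalog. Only the call shape `discover({ query, limit })` comes from the finding above. The `ToolCatalog` class, the keyword-overlap scoring, the returned shape, and the tool descriptions are assumptions for illustration and do not describe Executor's actual implementation.

```typescript
// Hypothetical in-memory catalog; not Executor's real API beyond the call shape.
interface ToolSummary {
  name: string;
  description: string;
}

class ToolCatalog {
  private readonly tools: ToolSummary[] = [];

  register(tool: ToolSummary): void {
    this.tools.push(tool);
  }

  // Return at most `limit` tools whose name or description mentions a query term.
  discover(opts: { query: string; limit: number }): ToolSummary[] {
    const terms = opts.query
      .toLowerCase()
      .split(/\s+/)
      .filter((t) => t.length > 0);
    return this.tools
      .map((tool) => {
        const haystack = `${tool.name} ${tool.description}`.toLowerCase();
        const score = terms.filter((t) => haystack.includes(t)).length;
        return { tool, score };
      })
      .filter((s) => s.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, opts.limit)
      .map((s) => s.tool);
  }
}

const catalog = new ToolCatalog();
catalog.register({ name: "read", description: "Read a file from disk" });
catalog.register({ name: "bash", description: "Run a shell command" });
catalog.register({ name: "edit", description: "Edit a file in place" });
catalog.register({ name: "ck_search", description: "Semantic code search over the repository" });

// Only the matching definitions would need to enter the model's context.
const searchTools = catalog.discover({ query: "search code", limit: 1 });
const fileTools = catalog.discover({ query: "file", limit: 5 });
```

The point of the design is that context cost scales with `limit`, not with the size of the catalog.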
39
+
40
+ ## Key Entities
41
+
42
+ - **Apple Machine Learning Research**: Published CodeAct (ICML 2024), the foundational paper establishing code-as-unified-action-space
43
+ - **Cloudflare**: Production implementation of TypeScript execution layer via `@cloudflare/codemode` (Workers-based sandbox)
44
+ - **Rhys Sullivan**: Creator of Executor, the leading open-source local-first TypeScript runtime for agents (1.3K stars)
45
+ - **Kenton Varda & Sunil Pai**: Cloudflare engineers who articulated "LLMs are better at writing code than making tool calls"
46
+
47
+ ## Key Concepts
48
+
49
+ - **[[ts-execution-layer]]**: The pattern of replacing flat tool lists with a typed TypeScript runtime. Core concept page.
50
+ - **CodeAct paradigm**: LLM actions expressed as executable code rather than JSON/text. Foundation for all code execution layer systems.
51
+ - **Tool catalog**: Single discovery point for all tools, replacing per-agent tool loading. Queryable by intent.
52
+ - **Deterministic bridge**: LLM (non-deterministic) generates code → runtime (deterministic) executes it → predictable results. Contrasts with sub-agent pattern (non-deterministic all the way down).
53
+ - **Network isolation**: Executed code has no network access by default. All external interaction flows through tool dispatch mechanism. Enforced at runtime level.
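
The deterministic-bridge and dispatch-only ideas above can be sketched as follows. This is an illustration under stated assumptions, not a sandbox: `runInBridge`, the `Dispatch` type, and the fake `read` backend are hypothetical names, and real isolation would still need a runtime-level mechanism. The sketch only shows that the program reaches the outside world through one dispatch function and that intermediate tool results never leave the bridge.

```typescript
type ToolFn = (arg: string) => string;
type Dispatch = (tool: string, arg: string) => string;

interface RunResult {
  finalResult: string; // the only value handed back to the model
  hiddenCalls: number; // tool round-trips that stayed inside the bridge
}

// Run a program (stand-in for LLM-generated code) with dispatch as its only capability.
function runInBridge(
  program: (call: Dispatch) => string,
  backends: Map<string, ToolFn>,
): RunResult {
  let hiddenCalls = 0;
  const call: Dispatch = (tool, arg) => {
    const fn = backends.get(tool);
    if (fn === undefined) throw new Error(`unknown tool: ${tool}`);
    hiddenCalls += 1;
    return fn(arg);
  };
  const finalResult = program(call);
  return { finalResult, hiddenCalls };
}

// Fake file system standing in for a real `read` tool.
const files = new Map<string, string>([
  ["a.txt", "one\ntwo"],
  ["b.txt", "three"],
]);
const backends = new Map<string, ToolFn>([
  ["read", (path) => files.get(path) ?? ""],
]);

// Two tool calls happen inside one "code turn"; only the summary comes back.
const bridgeRun = runInBridge((call) => {
  const total = ["a.txt", "b.txt"]
    .map((path) => call("read", path).split("\n").length)
    .reduce((sum, n) => sum + n, 0);
  return `total lines: ${total}`;
}, backends);
```

With flat tool calling, both file contents would have been appended to the conversation; here only `finalResult` would be.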
54
+
55
+ ## Contradictions
56
+
57
+ - **Python vs TypeScript**: CodeAct uses Python; Cloudflare and Executor use TypeScript. CodeAct argues Python's interpreter errors are the key mechanism; TypeScript advocates argue type definitions provide similar guardrails at generation time. Both valid — the language choice depends on sandbox infrastructure. TypeScript is the better fit for our Node.js harness.
58
+ - **Cloud vs Local**: Cloudflare Code Mode requires Cloudflare Workers; Executor runs locally. For our CLI harness, local execution is mandatory. Executor's architecture (local daemon, tool catalog, typed runtime) is the closer reference implementation.
59
+
60
+ ## What the Harness Does NOT Need from These Systems
61
+
62
+ - **Cloudflare Workers dependency**: Our sandbox uses local Node.js VM or Deno — not CF infrastructure. The `Executor` interface is minimal and we implement our own backend.
63
+ - **Python interpreter sandbox**: TypeScript is our harness language. CodeAct's Python research validates the paradigm but we implement in TypeScript.
64
+ - **Web UI for tool configuration**: Executor has a browser UI. Our harness is CLI-only. Configuration via `.pi/harness/tool-catalog.json` and CLI commands.
65
+ - **Multi-agent tool sharing**: Nice-to-have but not in scope for Phase 0. P39 (Harness as MCP Server) covers this eventually.
66
+
67
+ ## Open Questions
68
+
69
+ - **Sandbox security model**: What's the minimum viable sandbox for TypeScript code that calls our tools? Node.js `vm` module? Deno subprocess? Bubblewrap? Each has different security/performance tradeoffs. Needs a dedicated spike.
70
+ - **Type generation from our tool schemas**: Our tools (read, bash, edit, ctx_execute, ck_search) need TypeScript type definitions auto-generated. Cloudflare's `generateTypes()` is CF-specific. Executor's type generation is tied to its plugin system. We need a harness-specific solution.
71
+ - **Model compatibility**: Which models are good enough at TypeScript generation to use this pattern? Frontier GPT and Claude models are strong. Smaller models (e.g. Claude Haiku, Gemini Flash) may struggle. Need model-adaptive routing per [[model-adaptive-harness]].
72
+ - **Permission gating inside sandbox**: If the LLM-generated code calls `tools.bash("rm -rf /")`, how does the permission subsystem intercept? The tool dispatch mechanism must route through P35 (Permission Subsystem) before execution.
73
+ - **Error handling UX**: When generated TypeScript has syntax errors or type mismatches, what does the agent see? Traceback? Auto-fix attempt? The error feedback loop design is critical.
74
+ - **Benchmark against direct tool calling**: Before committing to this phase, benchmark our harness with direct tool calling vs TS execution layer on real tasks (not just M3ToolEval). Measure context usage, success rate, and wall-clock time.
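
One possible shape for the permission-gating question above, as a sketch only. The `Policy` and `Decision` types, the `gated` wrapper, and the deny rule are hypothetical; the actual P35 interface is not specified in these notes. The idea is that every tool exposed to generated code is wrapped, so the policy check runs before the backend does.

```typescript
type Decision = { allowed: true } | { allowed: false; reason: string };
type Policy = (tool: string, arg: string) => Decision;

// Wrap a tool so the policy is consulted before the backend ever runs.
function gated(
  tool: string,
  run: (arg: string) => string,
  policy: Policy,
): (arg: string) => string {
  return (arg) => {
    const decision = policy(tool, arg);
    if (decision.allowed === false) {
      throw new Error(`permission denied for ${tool}: ${decision.reason}`);
    }
    return run(arg);
  };
}

// Fake backend that records what actually executed (no real shell involved).
const executed: string[] = [];
const fakeBash = (cmd: string): string => {
  executed.push(cmd);
  return `ran: ${cmd}`;
};

// Illustrative rule only; a real policy would be far more thorough.
const denyDestructive: Policy = (tool, arg) =>
  tool === "bash" && /\brm\s+-rf\b/.test(arg)
    ? { allowed: false, reason: "destructive command" }
    : { allowed: true };

const bash = gated("bash", fakeBash, denyDestructive);

// Helper that turns a denial into a message the agent could be shown.
function attempt(cmd: string): string {
  try {
    return bash(cmd);
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
}
```

Because the generated code only ever receives the wrapped `bash`, it has no path to the backend that skips the check.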
75
+
76
+ ## Sources
77
+
78
+ - [[codeact-apple-2024]]: Wang et al., ICML 2024 — 20% improvement, 30% fewer turns
79
+ - [[cloudflare-codemode]]: Cloudflare official docs — production implementation, 3-4x context reduction
80
+ - [[executor-rhyssullivan]]: RhysSullivan/executor — open-source local TS runtime, 1.3K stars
81
+ - [[colinmcnamara-context-optimization-codemode]]: Colin McNamara analysis — context efficiency comparison, sub-agent vs Code Mode
@@ -0,0 +1,71 @@
1
+ ---
2
+ type: synthesis
3
+ title: "Research: claude-mem over Obsidian for Harness Layer"
4
+ created: 2026-05-05
5
+ updated: 2026-05-05
6
+ tags:
7
+ - research
8
+ - memory
9
+ - harness
10
+ - claude-mem
11
+ - obsidian
12
+ status: developing
13
+ related:
14
+ - "[[persistent-memory]]"
15
+ - "[[adr-009]]"
16
+ - "[[Research: Claude Code State-of-the-Art Harness Improvements]]"
17
+ - "[[lifecycle-hooks]]"
18
+ - "[[Codex Harness Innovations (OpenAI)]]"
19
+ - "[[memory-system-of-record-vs-ephemeral-cache]]"
20
+ sources:
21
+ - "[[adr-009]]"
22
+ - "[[persistent-memory]]"
23
+ - "[[Research: Claude Code State-of-the-Art Harness Improvements]]"
24
+ - "[[claude-code-architecture-karaxai-2026]]"
25
+ - "[[codex-harness-innovations]]"
26
+ ---
27
+
28
+ # Research: claude-mem over Obsidian for Harness Layer
29
+
30
+ ## Overview
31
+ Current harness memory decision is explicit and stable: Obsidian wiki is Layer 6 system of record via ADR-009. Local corpus has no direct `claude-mem` implementation details, so replacement recommendation cannot be high confidence. Based on current evidence, full replacement is not advised; safest path is Obsidian as source of truth plus optional local auto-memory cache if needed.
32
+
33
+ ## Key Findings
34
+ - **System-of-record already chosen (high)**: ADR-009 explicitly replaces vector-memory stack with claude-obsidian Mode B, with `hot.md -> index.md -> pages` retrieval and lower dependency surface (Source: [[adr-009]]).
35
+ - **Harness contracts depend on wiki structure (high)**: Layer write/read hooks, auditability, and cross-layer memory mapping are built around `wiki/index.md`, `wiki/log.md`, and `wiki/hot.md` (Source: [[persistent-memory]]).
36
+ - **Prompt memory is weaker than deterministic controls (high)**: Claude Code architecture notes report ~92% compliance with instructions placed in CLAUDE.md, whereas hooks enforce deterministically once configured (Source: [[lifecycle-hooks]], [[claude-code-architecture-karaxai-2026]]).
37
+ - **Automatic memory and explicit wiki solve different problems (medium)**: Codex-style implicit memories help fast context recovery, but explicit wiki remains better for durable decisions, provenance, and human review (Source: [[codex-harness-innovations]]).
38
+ - **No verified claude-mem data in current wiki (low)**: No local source page or benchmark for claude-mem behavior, storage model, recall quality, or failure modes in this repo context.
39
+
40
+ ## Key Entities
41
+ - [[Claude Code]]: demonstrates multi-memory architecture and deterministic hook enforcement.
42
+ - [[Codex Harness Innovations (OpenAI)]]: demonstrates automatic/implicit memory pattern.
43
+
44
+ ## Key Concepts
45
+ - [[persistent-memory]]: current Layer 6 design.
46
+ - [[lifecycle-hooks]]: reliability boundary between prompt-following and enforcement.
47
+ - [[memory-system-of-record-vs-ephemeral-cache]]: recommended split architecture.
48
+
49
+ ## Contradictions
50
+ - [[adr-009]] optimizes for explicit wiki memory with human-readable provenance; [[codex-harness-innovations]] shows value in implicit auto-memory capture. Best reconciliation: keep explicit wiki as canonical and use implicit memory only as non-authoritative accelerator.
51
+
52
+ ## Recommendation
53
+ - Do **not** replace Obsidian with claude-mem as primary harness memory right now.
54
+ - If you want claude-mem, run it as a **secondary cache layer** only:
55
+ - read order: claude-mem quick hints -> wiki `hot.md` -> wiki `index.md` -> linked pages
56
+ - write order: all accepted decisions and patterns must land in wiki files
57
+ - enforcement: hooks block "done" unless wiki write completed for decision-bearing tasks
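
The read order in the recommendation above can be sketched as an ordered fall-through. Everything here is hypothetical: the `Layer` interface, the in-memory maps, and the sample keys stand in for claude-mem and the wiki files, whose real access APIs are not documented in this vault. Each hit carries an `authoritative` flag so a cache hit is never mistaken for a canonical decision.

```typescript
interface Layer {
  name: string;
  authoritative: boolean; // false for the cache, true for wiki files
  lookup: (key: string) => string | undefined;
}

interface Hit {
  layer: string;
  authoritative: boolean;
  value: string;
}

// Build a layer from an in-memory map (stand-in for a real store).
function fromMap(name: string, authoritative: boolean, entries: Map<string, string>): Layer {
  return { name, authoritative, lookup: (key) => entries.get(key) };
}

// Walk layers in order and return the first hit, tagged with its authority.
function recall(key: string, layers: Layer[]): Hit | undefined {
  for (const layer of layers) {
    const value = layer.lookup(key);
    if (value !== undefined) {
      return { layer: layer.name, authoritative: layer.authoritative, value };
    }
  }
  return undefined;
}

const readOrder: Layer[] = [
  fromMap("claude-mem hints", false, new Map([["test-runner", "vitest (hint)"]])),
  fromMap("hot.md", true, new Map([["memory-layer", "Obsidian wiki per ADR-009"]])),
  fromMap("index.md", true, new Map([["sandbox", "see ts-execution-layer"]])),
  fromMap("linked pages", true, new Map<string, string>()),
];

const cacheHit = recall("test-runner", readOrder);
const wikiHit = recall("memory-layer", readOrder);
const miss = recall("unknown-topic", readOrder);
```

A caller that is about to act on a decision can refuse any hit where `authoritative` is false and continue down the wiki layers.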
58
+
59
+ ## Open Questions
60
+ - What is claude-mem persistence format and durability under compaction/restart?
61
+ - Can claude-mem expose provenance links equivalent to wiki page references?
62
+ - What are precision/recall metrics on this repo versus wiki query flow?
63
+ - How should conflict resolution work when claude-mem memory disagrees with wiki decisions?
64
+ - What token/cost/latency delta appears in real harness runs with hybrid mode?
65
+
66
+ ## Sources
67
+ - [[adr-009]]: persistent memory ADR, 2026-04-28.
68
+ - [[persistent-memory]]: Layer 6 module contract.
69
+ - [[Research: Claude Code State-of-the-Art Harness Improvements]]: memory and control architecture synthesis.
70
+ - [[claude-code-architecture-karaxai-2026]]: CLAUDE.md memory and compliance behavior.
71
+ - [[codex-harness-innovations]]: implicit/automatic memory pattern comparison.
@@ -0,0 +1,80 @@
1
+ ---
2
+ type: synthesis
3
+ title: "Research: claude-mem over obsidian wiki as the knowledge base for our agentic harness pipeline. think from first principles. does this replace or complement our current setup? no hard feelings about previous decisions. gimme accurate points"
4
+ created: 2026-05-05
5
+ updated: 2026-05-05
6
+ tags:
7
+ - research
8
+ - memory
9
+ - claude-mem
10
+ - obsidian
11
+ - harness
12
+ - first-principles
13
+ status: developing
14
+ related:
15
+ - "[[adr-009]]"
16
+ - "[[persistent-memory]]"
17
+ - "[[lifecycle-hooks]]"
18
+ - "[[memory-system-of-record-vs-ephemeral-cache]]"
19
+ - "[[Research: how claude-mem fits into our workflow. and whether it should replace obsidian in the codebase. no hard feelings about previous actions, rethink from first principles always]]"
20
+ sources:
21
+ - "[[adr-009]]"
22
+ - "[[persistent-memory]]"
23
+ - "[[anthropic-effective-harnesses]]"
24
+ - "[[claude-code-architecture-qubytes-2026]]"
25
+ - "[[codex-open-source-agent-2026]]"
26
+ ---
27
+
28
+ # Research: claude-mem over obsidian wiki as the knowledge base for our agentic harness pipeline. think from first principles. does this replace or complement our current setup? no hard feelings about previous decisions. gimme accurate points
29
+
30
+ ## Overview
31
+ First-principles answer: `claude-mem` does not replace Obsidian wiki in current harness design. It complements wiki as a fast recall cache. Canonical decision memory, provenance, and enforcement still belong in wiki system-of-record.
32
+
33
+ ## Research Method
34
+ - Round 1 (broad): evaluate harness-memory requirements, existing ADRs, and architecture contracts.
35
+ - Round 2 (gap fill): validate whether vault contains direct `claude-mem` primary evidence.
36
+ - Round 3: skipped; confidence ceiling reached because direct `claude-mem` source evidence is still missing.
37
+ - Constraint note: web tool access blocked in this run, so findings rely on existing filed sources.
38
+
39
+ ## First-Principles Criteria
40
+ 1. Canonical memory must be durable across sessions.
41
+ 2. Memory claims must be auditable and human-inspectable.
42
+ 3. Decision provenance must be linkable and conflict-resolvable.
43
+ 4. Completion gates must enforce write-back deterministically.
44
+ 5. Fast recall is useful, but cannot override canonical truth.
45
+
46
+ ## Key Findings
47
+ - **(high)** Current architecture already sets canonical memory to wiki via `[[adr-009]]`, with read order and durable files (`hot.md`, `index.md`, linked pages). (Source: [[adr-009]], [[persistent-memory]])
48
+ - **(high)** Harness integration points and lifecycle write patterns are coupled to wiki artifacts; replacing wiki implies re-architecting memory contracts across layers. (Source: [[persistent-memory]])
49
+ - **(high)** Deterministic hooks and gates are required for policy reliability; prompt memory alone drifts under long-running workloads. (Source: [[anthropic-effective-harnesses]], [[claude-code-architecture-qubytes-2026]])
50
+ - **(medium)** Auto-memory pattern is useful for convenience and continuity, but strongest in non-authoritative role alongside explicit durable memory. (Source: [[codex-open-source-agent-2026]])
51
+ - **(low)** Vault still lacks direct `claude-mem` benchmark evidence (storage semantics, provenance fidelity, recall precision, conflict behavior), so full replacement claim is unproven.
52
+
53
+ ## Replace vs Complement
54
+ ### Replace
55
+ Not supported by current evidence. Fails criteria 2-4 unless additional mechanisms are proven.
56
+
57
+ ### Complement
58
+ Supported. Use `claude-mem` as optional acceleration cache; keep wiki as source-of-record.
59
+
60
+ ## Recommended Operating Model
61
+ 1. Read path: `claude-mem` hints -> `[[hot]]` -> `[[index]]` -> canonical linked pages.
62
+ 2. Write path: all decision-bearing outputs must land in wiki first.
63
+ 3. Conflict rule: cache and wiki disagree -> wiki wins.
64
+ 4. Enforcement: stop-hook blocks completion when required wiki filing is missing.
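
Rules 3 and 4 of the operating model above reduce to two small pure functions. This is a sketch under assumptions: `resolveMemory`, `TaskState`, and `stopHook` are hypothetical names, and the real stop-hook contract of the harness is not shown in these notes.

```typescript
// Rule 3: when cache and wiki both have an answer, the wiki value wins.
function resolveMemory(
  cacheValue: string | undefined,
  wikiValue: string | undefined,
): string | undefined {
  return wikiValue !== undefined ? wikiValue : cacheValue;
}

interface TaskState {
  decisionBearing: boolean; // did the task produce a decision or pattern?
  wikiFiled: boolean; // has the required wiki write completed?
}

interface HookVerdict {
  allow: boolean;
  reason: string;
}

// Rule 4: block completion when a decision-bearing task has no wiki filing.
function stopHook(task: TaskState): HookVerdict {
  if (task.decisionBearing && !task.wikiFiled) {
    return { allow: false, reason: "wiki filing missing for decision-bearing task" };
  }
  return { allow: true, reason: "ok" };
}

const conflict = resolveMemory("use vector store", "use Obsidian wiki");
const cacheOnly = resolveMemory("prefers pnpm", undefined);
const blocked = stopHook({ decisionBearing: true, wikiFiled: false });
const passed = stopHook({ decisionBearing: true, wikiFiled: true });
```

Keeping both rules as pure functions means they can be unit-tested without a running agent, which fits the requirement that gates be deterministic.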
65
+
66
+ ## Contradictions
67
+ - Fast-memory systems optimize latency; wiki optimizes auditability and governance. One layer cannot optimize both without tradeoffs. Use two-layer memory model.
68
+
69
+ ## Open Questions
70
+ - What are exact retention and deletion semantics for `claude-mem` in team workflows?
71
+ - Can `claude-mem` produce source-level provenance links compatible with wikilinks?
72
+ - What measured latency/token gain appears in this repo with cache+wiki mode?
73
+ - What precision/recall benchmark should define acceptable cache quality?
74
+
75
+ ## Sources
76
+ - [[adr-009]]: canonical memory ADR.
77
+ - [[persistent-memory]]: layer contract and write/read patterns.
78
+ - [[anthropic-effective-harnesses]]: long-running harness constraints.
79
+ - [[claude-code-architecture-qubytes-2026]]: practical persistence/hook architecture notes.
80
+ - [[codex-open-source-agent-2026]]: implicit memory as complementary pattern.