pi-crew 0.1.46 → 0.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (253)
  1. package/CHANGELOG.md +97 -0
  2. package/agents/analyst.md +11 -11
  3. package/agents/critic.md +11 -11
  4. package/agents/executor.md +11 -11
  5. package/agents/explorer.md +11 -11
  6. package/agents/planner.md +11 -11
  7. package/agents/reviewer.md +11 -11
  8. package/agents/security-reviewer.md +11 -11
  9. package/agents/test-engineer.md +11 -11
  10. package/agents/verifier.md +11 -11
  11. package/agents/writer.md +11 -11
  12. package/docs/next-upgrade-roadmap.md +117 -42
  13. package/docs/refactor-tasks-phase3.md +394 -394
  14. package/docs/refactor-tasks-phase4.md +564 -564
  15. package/docs/refactor-tasks-phase5.md +402 -402
  16. package/docs/refactor-tasks-phase6.md +662 -662
  17. package/docs/research/AGENT-EXECUTION-ARCHITECTURE.md +261 -0
  18. package/docs/research/AGENT-LIFECYCLE-COMPARISON.md +111 -0
  19. package/docs/research/AUDIT_OH_MY_PI.md +261 -0
  20. package/docs/research/AUDIT_PI_CREW.md +457 -0
  21. package/docs/research/CAVEMAN-DEEP-RESEARCH.md +281 -0
  22. package/docs/research/COMPARISON_OH_MY_PI_VS_PI_CREW.md +264 -0
  23. package/docs/research/DEEP-RESEARCH-PI-POWERBAR.md +343 -0
  24. package/docs/research/DEEP_RESEARCH_SUBAGENT_ARCHITECTURE.md +480 -0
  25. package/docs/research/GAP_CLOSURE_IMPLEMENTATION_PLAN.md +354 -0
  26. package/docs/research/IMPLEMENTATION_PLAN.md +385 -0
  27. package/docs/research/LIVE-SESSION-PRODUCTION-READY-PLAN.md +502 -0
  28. package/docs/research/OH-MY-PI-DEEP-RESEARCH-v14.7.6.md +266 -0
  29. package/docs/research/REMAINING-GAPS-PLAN.md +363 -0
  30. package/docs/research/SESSION-SUMMARY-2026-05-08.md +146 -0
  31. package/docs/research/UI-RESPONSIVENESS-AUDIT.md +173 -0
  32. package/docs/research-awesome-agent-skills-distillation.md +100 -100
  33. package/docs/research-extension-examples.md +297 -297
  34. package/docs/research-extension-system.md +324 -324
  35. package/docs/research-oh-my-pi-distillation.md +56 -9
  36. package/docs/research-optimization-plan.md +548 -548
  37. package/docs/research-phase10-distillation.md +198 -198
  38. package/docs/research-phase11-distillation.md +201 -201
  39. package/docs/research-pi-coding-agent.md +357 -357
  40. package/docs/research-source-pi-crew-reference.md +174 -174
  41. package/docs/runtime-flow.md +148 -148
  42. package/docs/source-runtime-refactor-map.md +107 -107
  43. package/index.ts +6 -6
  44. package/package.json +99 -98
  45. package/schema.json +8 -0
  46. package/skills/async-worker-recovery/SKILL.md +42 -42
  47. package/skills/context-artifact-hygiene/SKILL.md +52 -52
  48. package/skills/delegation-patterns/SKILL.md +54 -54
  49. package/skills/mailbox-interactive/SKILL.md +40 -40
  50. package/skills/model-routing-context/SKILL.md +39 -39
  51. package/skills/multi-perspective-review/SKILL.md +58 -58
  52. package/skills/observability-reliability/SKILL.md +41 -41
  53. package/skills/orchestration/SKILL.md +157 -0
  54. package/skills/ownership-session-security/SKILL.md +41 -41
  55. package/skills/pi-extension-lifecycle/SKILL.md +39 -39
  56. package/skills/requirements-to-task-packet/SKILL.md +63 -63
  57. package/skills/resource-discovery-config/SKILL.md +41 -41
  58. package/skills/runtime-state-reader/SKILL.md +44 -44
  59. package/skills/secure-agent-orchestration-review/SKILL.md +45 -45
  60. package/skills/state-mutation-locking/SKILL.md +42 -42
  61. package/skills/systematic-debugging/SKILL.md +67 -67
  62. package/skills/ui-render-performance/SKILL.md +39 -39
  63. package/skills/verification-before-done/SKILL.md +57 -57
  64. package/skills/worktree-isolation/SKILL.md +39 -39
  65. package/src/agents/agent-config.ts +6 -0
  66. package/src/agents/agent-search.ts +98 -0
  67. package/src/agents/agent-serializer.ts +4 -0
  68. package/src/agents/discover-agents.ts +17 -4
  69. package/src/config/config.ts +24 -0
  70. package/src/config/defaults.ts +11 -0
  71. package/src/extension/autonomous-policy.ts +26 -33
  72. package/src/extension/cross-extension-rpc.ts +82 -82
  73. package/src/extension/help.ts +1 -0
  74. package/src/extension/management.ts +5 -0
  75. package/src/extension/register.ts +58 -13
  76. package/src/extension/registration/commands.ts +33 -1
  77. package/src/extension/registration/compaction-guard.ts +125 -125
  78. package/src/extension/registration/team-tool.ts +6 -4
  79. package/src/extension/run-bundle-schema.ts +89 -89
  80. package/src/extension/run-index.ts +24 -18
  81. package/src/extension/run-maintenance.ts +68 -62
  82. package/src/extension/team-tool/api.ts +23 -2
  83. package/src/extension/team-tool/cancel.ts +86 -11
  84. package/src/extension/team-tool/context.ts +3 -0
  85. package/src/extension/team-tool/handle-settings.ts +188 -188
  86. package/src/extension/team-tool/inspect.ts +41 -41
  87. package/src/extension/team-tool/intent-policy.ts +42 -0
  88. package/src/extension/team-tool/lifecycle-actions.ts +47 -18
  89. package/src/extension/team-tool/parallel-dispatch.ts +156 -0
  90. package/src/extension/team-tool/plan.ts +19 -19
  91. package/src/extension/team-tool/respond.ts +10 -2
  92. package/src/extension/team-tool/run.ts +3 -2
  93. package/src/extension/team-tool/status.ts +1 -1
  94. package/src/extension/team-tool-types.ts +1 -0
  95. package/src/extension/team-tool.ts +13 -3
  96. package/src/hooks/registry.ts +61 -0
  97. package/src/hooks/types.ts +41 -0
  98. package/src/i18n.ts +184 -184
  99. package/src/observability/exporters/otlp-exporter.ts +77 -77
  100. package/src/prompt/prompt-runtime.ts +72 -72
  101. package/src/runtime/agent-control.ts +108 -2
  102. package/src/runtime/agent-memory.ts +72 -72
  103. package/src/runtime/agent-observability.ts +114 -114
  104. package/src/runtime/async-marker.ts +26 -26
  105. package/src/runtime/async-runner.ts +3 -1
  106. package/src/runtime/attention-events.ts +28 -28
  107. package/src/runtime/background-runner.ts +19 -0
  108. package/src/runtime/cancellation-token.ts +89 -0
  109. package/src/runtime/cancellation.ts +61 -51
  110. package/src/runtime/capability-inventory.ts +116 -0
  111. package/src/runtime/child-pi.ts +2 -1
  112. package/src/runtime/code-summary.ts +247 -0
  113. package/src/runtime/completion-guard.ts +190 -190
  114. package/src/runtime/crash-recovery.ts +181 -0
  115. package/src/runtime/crew-agent-records.ts +35 -7
  116. package/src/runtime/crew-agent-runtime.ts +1 -0
  117. package/src/runtime/custom-tools/irc-tool.ts +201 -0
  118. package/src/runtime/custom-tools/submit-result-tool.ts +90 -0
  119. package/src/runtime/delivery-coordinator.ts +3 -1
  120. package/src/runtime/direct-run.ts +35 -35
  121. package/src/runtime/effectiveness.ts +81 -76
  122. package/src/runtime/event-stream-bridge.ts +90 -0
  123. package/src/runtime/foreground-control.ts +82 -82
  124. package/src/runtime/green-contract.ts +46 -46
  125. package/src/runtime/group-join.ts +106 -106
  126. package/src/runtime/heartbeat-gradient.ts +28 -28
  127. package/src/runtime/heartbeat-watcher.ts +124 -124
  128. package/src/runtime/live-agent-control.ts +88 -88
  129. package/src/runtime/live-agent-manager.ts +78 -2
  130. package/src/runtime/live-control-realtime.ts +36 -36
  131. package/src/runtime/live-extension-bridge.ts +150 -0
  132. package/src/runtime/live-irc.ts +92 -0
  133. package/src/runtime/live-session-health.ts +100 -0
  134. package/src/runtime/live-session-runtime.ts +297 -7
  135. package/src/runtime/mcp-proxy.ts +113 -0
  136. package/src/runtime/notebook-helpers.ts +90 -0
  137. package/src/runtime/orphan-sentinel.ts +7 -0
  138. package/src/runtime/output-validator.ts +187 -0
  139. package/src/runtime/parallel-research.ts +44 -44
  140. package/src/runtime/parallel-utils.ts +57 -0
  141. package/src/runtime/parent-guard.ts +80 -0
  142. package/src/runtime/pi-json-output.ts +111 -111
  143. package/src/runtime/policy-engine.ts +79 -79
  144. package/src/runtime/progress-event-coalescer.ts +43 -43
  145. package/src/runtime/prose-compressor.ts +164 -0
  146. package/src/runtime/recovery-recipes.ts +74 -74
  147. package/src/runtime/result-extractor.ts +121 -0
  148. package/src/runtime/role-permission.ts +39 -39
  149. package/src/runtime/runtime-resolver.ts +1 -4
  150. package/src/runtime/semaphore.ts +131 -0
  151. package/src/runtime/sensitive-paths.ts +92 -0
  152. package/src/runtime/session-resources.ts +25 -25
  153. package/src/runtime/session-snapshot.ts +59 -59
  154. package/src/runtime/session-usage.ts +79 -79
  155. package/src/runtime/sidechain-output.ts +29 -29
  156. package/src/runtime/stream-preview.ts +177 -0
  157. package/src/runtime/subagent-manager.ts +3 -2
  158. package/src/runtime/subprocess-tool-registry.ts +67 -0
  159. package/src/runtime/supervisor-contact.ts +59 -59
  160. package/src/runtime/task-display.ts +38 -38
  161. package/src/runtime/task-output-context.ts +59 -9
  162. package/src/runtime/task-runner/capabilities.ts +78 -78
  163. package/src/runtime/task-runner/live-executor.ts +2 -0
  164. package/src/runtime/task-runner/progress.ts +119 -119
  165. package/src/runtime/task-runner/prompt-builder.ts +70 -8
  166. package/src/runtime/task-runner/prompt-pipeline.ts +64 -64
  167. package/src/runtime/task-runner/result-utils.ts +14 -14
  168. package/src/runtime/task-runner/run-projection.ts +104 -0
  169. package/src/runtime/task-runner/state-helpers.ts +22 -22
  170. package/src/runtime/task-runner.ts +75 -4
  171. package/src/runtime/team-runner.ts +60 -8
  172. package/src/runtime/worker-heartbeat.ts +21 -21
  173. package/src/runtime/worker-startup.ts +57 -57
  174. package/src/runtime/workspace-tree.ts +298 -0
  175. package/src/runtime/yield-handler.ts +189 -0
  176. package/src/schema/config-schema.ts +6 -0
  177. package/src/schema/team-tool-schema.ts +11 -1
  178. package/src/skills/discover-skills.ts +67 -0
  179. package/src/state/active-run-registry.ts +4 -2
  180. package/src/state/artifact-store.ts +4 -1
  181. package/src/state/atomic-write.ts +50 -1
  182. package/src/state/blob-store.ts +117 -0
  183. package/src/state/contracts.ts +1 -0
  184. package/src/state/event-log-rotation.ts +158 -0
  185. package/src/state/event-log.ts +52 -2
  186. package/src/state/mailbox.ts +87 -7
  187. package/src/state/state-store.ts +24 -4
  188. package/src/state/task-claims.ts +44 -44
  189. package/src/state/types.ts +20 -0
  190. package/src/state/usage.ts +29 -29
  191. package/src/subagents/async-entry.ts +1 -1
  192. package/src/subagents/index.ts +3 -3
  193. package/src/subagents/live/control.ts +1 -1
  194. package/src/subagents/live/manager.ts +1 -1
  195. package/src/subagents/live/realtime.ts +1 -1
  196. package/src/subagents/live/session-runtime.ts +1 -1
  197. package/src/subagents/manager.ts +1 -1
  198. package/src/subagents/spawn.ts +1 -1
  199. package/src/teams/team-serializer.ts +38 -38
  200. package/src/types/diff.d.ts +18 -18
  201. package/src/ui/agent-management-overlay.ts +144 -0
  202. package/src/ui/crew-footer.ts +101 -101
  203. package/src/ui/crew-select-list.ts +111 -111
  204. package/src/ui/crew-widget.ts +11 -2
  205. package/src/ui/dashboard-panes/cancellation-pane.ts +43 -0
  206. package/src/ui/dashboard-panes/capability-pane.ts +60 -0
  207. package/src/ui/dashboard-panes/mailbox-pane.ts +35 -11
  208. package/src/ui/dashboard-panes/metrics-pane.ts +34 -34
  209. package/src/ui/dynamic-border.ts +25 -25
  210. package/src/ui/layout-primitives.ts +106 -106
  211. package/src/ui/live-run-sidebar.ts +4 -0
  212. package/src/ui/loaders.ts +158 -158
  213. package/src/ui/powerbar-publisher.ts +77 -15
  214. package/src/ui/render-coalescer.ts +51 -0
  215. package/src/ui/render-diff.ts +119 -119
  216. package/src/ui/render-scheduler.ts +143 -143
  217. package/src/ui/run-dashboard.ts +4 -0
  218. package/src/ui/run-event-bus.ts +209 -0
  219. package/src/ui/run-snapshot-cache.ts +68 -16
  220. package/src/ui/snapshot-types.ts +8 -0
  221. package/src/ui/spinner.ts +17 -17
  222. package/src/ui/status-colors.ts +58 -58
  223. package/src/ui/syntax-highlight.ts +116 -116
  224. package/src/ui/transcript-entries.ts +258 -0
  225. package/src/utils/atomic-write.ts +33 -33
  226. package/src/utils/completion-dedupe.ts +63 -63
  227. package/src/utils/frontmatter.ts +68 -68
  228. package/src/utils/git.ts +262 -262
  229. package/src/utils/ids.ts +17 -12
  230. package/src/utils/incremental-reader.ts +104 -0
  231. package/src/utils/names.ts +27 -27
  232. package/src/utils/redaction.ts +44 -44
  233. package/src/utils/safe-paths.ts +47 -47
  234. package/src/utils/scan-cache.ts +137 -0
  235. package/src/utils/sleep.ts +32 -32
  236. package/src/utils/sse-parser.ts +134 -0
  237. package/src/utils/task-name-generator.ts +337 -0
  238. package/src/utils/visual.ts +33 -2
  239. package/src/workflows/validate-workflow.ts +40 -40
  240. package/src/worktree/branch-freshness.ts +45 -45
  241. package/src/worktree/cleanup.ts +2 -1
  242. package/teams/default.team.md +12 -12
  243. package/teams/fast-fix.team.md +11 -11
  244. package/teams/implementation.team.md +18 -18
  245. package/teams/parallel-research.team.md +14 -14
  246. package/teams/research.team.md +11 -11
  247. package/teams/review.team.md +12 -12
  248. package/workflows/default.workflow.md +29 -29
  249. package/workflows/fast-fix.workflow.md +22 -22
  250. package/workflows/implementation.workflow.md +38 -38
  251. package/workflows/parallel-research.workflow.md +46 -46
  252. package/workflows/research.workflow.md +22 -22
  253. package/workflows/review.workflow.md +30 -30
@@ -0,0 +1,281 @@
1
+ # Caveman Deep Research — Agent Communication Optimization
2
+
3
+ > Source: `source/caveman/` (github.com/JuliusBrussee/caveman)
4
+ > Date: 2026-05-08
5
+ > Purpose: Apply caveman patterns to optimize pi-crew inter-agent communication
6
+
7
+ ---
8
+
9
+ ## 1. Executive Summary
10
+
11
+ Caveman is a **token compression** system for AI coding agents. Core insight: **on average, 65-75% of LLM output is filler** (articles, hedging, pleasantries). Drop the filler → fewer tokens, faster responses, lower cost, **no loss of accuracy** (accuracy even rises by 26%, according to the paper arXiv:2604.00025).
12
+
13
+ **Application to pi-crew**: Worker output is injected into the Pi parent's main context. If worker output is caveman-style → the main context lasts longer and fits more tasks per session.
14
+
15
+ ---
16
+
17
+ ## 2. Architecture Overview
18
+
19
+ ```
20
+ caveman/
21
+ ├── skills/ # Behavior definition (SKILL.md)
22
+ │ ├── caveman/ # Core: intensity levels, rules
23
+ │ ├── cavecrew/ # Decision guide for delegation
24
+ │ ├── caveman-commit/ # Terse commit messages
25
+ │ ├── caveman-review/ # One-line PR reviews
26
+ │ ├── caveman-help/ # Quick-reference card
27
+ │ ├── caveman-stats/ # Token usage tracker
28
+ │ └── compress/ # Memory file compression
29
+ ├── agents/ # Subagent definitions
30
+ │ ├── cavecrew-investigator.md # Read-only locator (haiku)
31
+ │ ├── cavecrew-builder.md # 1-2 file surgical editor
32
+ │ └── cavecrew-reviewer.md # Diff reviewer (haiku)
33
+ ├── hooks/ # Claude Code integration
34
+ │ ├── caveman-activate.js # SessionStart: inject rules
35
+ │ ├── caveman-mode-tracker.js # Per-turn reinforcement
36
+ │ ├── caveman-config.js # Shared config + symlink-safe I/O
37
+ │ └── caveman-stats.js # Lifetime token tracking
38
+ ├── mcp-servers/
39
+ │ └── caveman-shrink/ # MCP middleware proxy
40
+ ├── caveman-compress/ # File compression tool
41
+ │ └── scripts/
42
+ │ ├── compress.py # Orchestrator
43
+ │ ├── validate.py # Structural preservation validator
44
+ │ └── detect.py # File type detection
45
+ ├── evals/ # Three-arm eval harness
46
+ └── benchmarks/ # Real API token counts
47
+ ```
48
+
49
+ ---
50
+
51
+ ## 3. Core Patterns Applicable to pi-crew
52
+
53
+ ### 3.1 Structured Output Contracts (KEY INSIGHT)
54
+
55
+ Caveman's biggest innovation is not the compression itself — it's the **output contracts**:
56
+
57
+ **investigator**: `path:line — symbol — ≤6 word note`
58
+ **builder**: `path:line-range — change ≤10 words. verified: re-read OK.`
59
+ **reviewer**: `path:line: emoji severity: problem. fix.`
60
+
61
+ These are **machine-parseable** — main thread can grep with regex, no ambiguity.
62
+
63
+ **pi-crew application**: Worker prompt templates should include structured output contracts:
64
+ ```
65
+ # Output Contract
66
+ Your response MUST follow this format:
67
+ <artifact_path>:<line_range> — <≤10 word change summary>
68
+ verified: <re-read OK | mismatch @ path:line>
69
+ ```
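Because the contract is line-oriented, the parent can check it mechanically. The sketch below is one possible parser for the two-line template above; `ContractResult`, `parseContract`, and both regular expressions are hypothetical names introduced for illustration and are not taken from pi-crew or caveman.

```typescript
// Hypothetical parser for the two-line output contract shown above.
// None of these names come from pi-crew; they only illustrate the idea.
interface ContractResult {
  path: string;
  startLine: number;
  endLine: number;
  summary: string;
  verified: boolean;
}

// "\u2014" is the long dash the contract uses as its field separator.
const CHANGE_LINE = /^(.+?):(\d+)(?:-(\d+))?\s+\u2014\s+(.+)$/;
const VERIFIED_LINE = /^verified:\s*(re-read OK|mismatch @ .+)$/;

function parseContract(output: string): ContractResult | null {
  const lines = output.trim().split("\n").map((line) => line.trim());
  if (lines.length !== 2) return null;
  const [first = "", second = ""] = lines;

  const change = CHANGE_LINE.exec(first);
  const verified = VERIFIED_LINE.exec(second);
  if (!change || !verified) return null;

  const summary = change[4] ?? "";
  // Enforce the 10-word cap on the change summary.
  if (summary.split(/\s+/).length > 10) return null;

  const startLine = Number(change[2]);
  const endLine = change[3] !== undefined ? Number(change[3]) : startLine;
  return {
    path: change[1] ?? "",
    startLine,
    endLine,
    summary,
    verified: verified[1] === "re-read OK",
  };
}
```

Returning `null` instead of throwing lets the caller decide whether to retry the worker or fall back to treating the output as free text.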
70
+
71
+ ### 3.2 Context Budget Awareness
72
+
73
+ Caveman's core thesis: **subagent tool-results get injected into main context verbatim**. Every token a subagent emits is a token the main agent can't use later.
74
+
75
+ Quantified impact:
76
+ - Vanilla `Explore` subagent: ~2000 tokens per result
77
+ - `cavecrew-investigator`: ~700 tokens per result
78
+ - Over 20 delegations: **26,000 tokens saved** = entire context window of a small model
79
+
80
+ **pi-crew application**: Worker output gets read back by Pi parent via `readFile(artifactPath)`. If workers emit caveman-style output → Pi parent can process more tasks per session before context exhaustion.
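The savings figure is plain arithmetic on the two per-result estimates. A small sketch, where the function names and the 28,000-token budget are made up for illustration and the per-result counts are the approximate figures quoted above:

```typescript
// Illustrative arithmetic only. The per-result token counts are the
// approximate figures quoted above; the 28,000-token budget is hypothetical.
function tokensSaved(
  vanillaPerResult: number,
  compactPerResult: number,
  delegations: number,
): number {
  return (vanillaPerResult - compactPerResult) * delegations;
}

function delegationsThatFit(contextBudget: number, tokensPerResult: number): number {
  return Math.floor(contextBudget / tokensPerResult);
}

const saved = tokensSaved(2000, 700, 20);           // (2000 - 700) * 20 = 26000
const vanillaFit = delegationsThatFit(28000, 2000); // 14
const compactFit = delegationsThatFit(28000, 700);  // 40
```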
81
+
82
+ ### 3.3 Intensity Levels
83
+
84
+ | Level | Token Savings | When to Use |
85
+ |-------|--------------|-------------|
86
+ | lite | ~40% | User-facing summaries, final reports |
87
+ | full | ~65% | Inter-agent communication (default) |
88
+ | ultra | ~75% | Internal worker → coordinator messages |
89
+
90
+ **pi-crew application**: Add `outputStyle` to worker prompts:
91
+ - explorer → ultra (only paths/symbols needed)
92
+ - executor → full (some explanation needed for verification)
93
+ - reviewer → full (findings must be clear)
94
+ - writer → lite (user reads output directly)
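The role-to-level mapping in the list above could be expressed as a lookup with a default. This is a minimal sketch that assumes unlisted roles fall back to `full`; the `Intensity` type and `intensityFor` helper are hypothetical, and the real map in pi-crew may be shaped differently.

```typescript
// Hypothetical sketch of a role → intensity lookup; not pi-crew's actual code.
type Intensity = "lite" | "full" | "ultra";

const ROLE_INTENSITY = new Map<string, Intensity>([
  ["explorer", "ultra"], // only paths/symbols needed
  ["executor", "full"],  // some explanation needed for verification
  ["reviewer", "full"],  // findings must be clear
  ["writer", "lite"],    // user reads output directly
]);

// Assumption: roles without an explicit entry use the inter-agent default.
function intensityFor(role: string): Intensity {
  return ROLE_INTENSITY.get(role) ?? "full";
}
```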
95
+
96
+ ### 3.4 Auto-Clarity Rule
97
+
98
+ Caveman drops compression for:
99
+ - Security warnings
100
+ - Irreversible action confirmations
101
+ - Multi-step sequences with ambiguous ordering
102
+ - User confusion / repeated questions
103
+
104
+ **pi-crew application**: Worker prompts should include auto-clarity override:
105
+ ```
106
+ Drop compression for: security findings, destructive operations,
107
+ ambiguous multi-step instructions. Resume compression after.
108
+ ```
109
+
110
+ ### 3.5 MCP Proxy Compression (caveman-shrink)
111
+
112
+ `caveman-shrink` wraps any MCP server, compresses `description` fields in `tools/list` responses:
113
+
114
+ ```
115
+ Before: "This tool allows you to search for files in the filesystem..."
116
+ After: "Search files in filesystem."
117
+ ```
118
+
119
+ **pi-crew application**: pi-crew's MCP proxy (`mcp-proxy.ts`) could compress tool descriptions before passing to workers, reducing input token cost per tool call.
120
+
121
+ ---
122
+
123
+ ## 4. Compression Techniques
124
+
125
+ ### 4.1 Protected Segments (from compress.js)
126
+
127
+ ```javascript
128
+ const PROTECTED_PATTERNS = [
129
+ /```[\s\S]*?```/g, // fenced code blocks
130
+ /`[^`\n]+`/g, // inline code
131
+ /\bhttps?:\/\/\S+/gi, // URLs
132
+ /\b[\w.-]*[\/\\][\w.\/\\-]+/g, // paths
133
+ /\b[A-Z][A-Za-z0-9]*(?:_[A-Z][A-Za-z0-9]*)+\b/g, // CONST_CASE
134
+ /\b\w+\.\w+(?:\.\w+)*\(\)?/g, // dotted.method()
135
+ /[A-Za-z_][A-Za-z0-9_]*\s*\([^)]*\)/g, // function calls
136
+ /\b\d+\.\d+\.\d+\b/g, // version numbers
137
+ ];
138
+ ```
139
+
140
+ Process: Replace protected segments with sentinels → compress remaining prose → restore sentinels.
141
+
142
+ ### 4.2 Prose Compression Rules
143
+
144
+ ```javascript
145
+ // Remove categories:
146
+ FILLERS: just, really, basically, actually, simply, quite, very, essentially, literally
147
+ PLEASANTRIES: please, kindly, thank you, thanks, sure, certainly, of course
148
+ HEDGES: perhaps, maybe, might, could potentially, would like to, I think
149
+ LEADERS: I'll, I will, I can, you can, we will, let me, let's
150
+ ARTICLES: a, an, the (before lowercase words)
151
+
152
+ // Pattern: [thing] [action] [reason]. [next step].
153
+ ```
154
+
155
+ ### 4.3 Validation (from validate.py)
156
+
157
+ After compression, validate:
158
+ - Heading count and order preserved
159
+ - Code blocks byte-identical
160
+ - URLs preserved exactly
161
+ - File paths preserved
162
+ - Inline code preserved
163
+ - Bullet structure maintained (±15% tolerance)
164
+
165
+ **pi-crew application**: When workers produce structured output, validate format before injecting into parent context. Bad format → retry with targeted fix (caveman's "cherry-pick fix" pattern).
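A few of the checks above can be sketched as a comparison of extracted segments before and after compression. The names below (`checkPreservation`, `PreservationReport`) are hypothetical; the sketch covers code blocks, URLs, and heading levels only, and it compares heading levels rather than heading text on the assumption that heading wording may itself be compressed.

```typescript
// Hypothetical preservation check covering three of the listed rules.
interface PreservationReport {
  ok: boolean;
  failures: string[];
}

function extract(text: string, pattern: RegExp): string[] {
  return text.match(pattern) ?? [];
}

function sameList(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

function checkPreservation(original: string, compressed: string): PreservationReport {
  const checks: Array<[string, RegExp]> = [
    ["code blocks", /`{3}[\s\S]*?`{3}/g], // must be byte-identical
    ["urls", /\bhttps?:\/\/\S+/gi],       // must be preserved exactly
    ["headings", /^#{1,6}(?= )/gm],       // same count and order of levels
  ];
  const failures = checks
    .filter(([, pattern]) => !sameList(extract(original, pattern), extract(compressed, pattern)))
    .map(([name]) => name);
  return { ok: failures.length === 0, failures };
}
```

The `failures` list names which rule broke, which is what a targeted retry would need in order to fix only the offending part.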
166
+
167
+ ---
168
+
169
+ ## 5. Delegation Decision Matrix (from cavecrew)
170
+
171
+ | Task | Use | Why |
172
+ |------|-----|-----|
173
+ | "Where is X defined" | investigator | Read-only, structured paths |
174
+ | Same + suggestions | vanilla Explore | Need prose |
175
+ | 1-2 file surgical edit | builder | Bounded scope |
176
+ | 3+ file refactor | main thread | Builder refuses |
177
+ | Review diff for bugs | reviewer | One-line findings |
178
+ | Deep code review | vanilla Code Reviewer | Need rationale |
179
+
180
+ **pi-crew application**: Planner agent should use similar decision matrix when assigning tasks to workers. Key rule: **if output will be consumed by another agent, compress it. If a human reads it, use normal prose.**
181
+
182
+ ---
183
+
184
+ ## 6. Security Patterns (from caveman-config.js)
185
+
186
+ ### 6.1 Symlink-Safe File I/O
187
+
188
+ ```javascript
189
+ // Flag file write pattern:
190
+ 1. Check parent dir is not symlink (or resolve + verify ownership)
191
+ 2. Check target file is not symlink
192
+ 3. Write to temp file with O_NOFOLLOW | O_EXCL
193
+ 4. fchmod 0600
194
+ 5. rename temp → target (atomic)
195
+ ```
196
+
197
+ **pi-crew application**: pi-crew's `atomic-write.ts` should adopt similar symlink guards, especially for `agents.json` (the file that caused the ghost agent bug).
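The five steps above map fairly directly onto Node's synchronous `fs` API. The following is a sketch under assumptions, not the code in `caveman-config.js` or `atomic-write.ts`: `safeAtomicWrite` is a hypothetical name, the ownership check from step 1 is omitted, and `O_NOFOLLOW` is only defined on POSIX platforms.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper following the five steps listed above.
// The ownership check mentioned in step 1 is intentionally left out.
function safeAtomicWrite(target: string, data: string): void {
  const dir = path.dirname(target);

  // 1. Refuse a symlinked parent directory.
  if (fs.lstatSync(dir).isSymbolicLink()) {
    throw new Error(`parent directory is a symlink: ${dir}`);
  }

  // 2. Refuse a symlinked target; a missing target is fine.
  const existing = fs.lstatSync(target, { throwIfNoEntry: false });
  if (existing !== undefined && existing.isSymbolicLink()) {
    throw new Error(`target is a symlink: ${target}`);
  }

  // 3. Create the temp file exclusively, without following symlinks.
  //    On platforms without O_NOFOLLOW the constant is undefined and
  //    contributes nothing to the bitmask.
  const tmp = path.join(dir, `.${path.basename(target)}.${process.pid}.${Date.now()}.tmp`);
  const flags =
    fs.constants.O_WRONLY | fs.constants.O_CREAT | fs.constants.O_EXCL | fs.constants.O_NOFOLLOW;
  const fd = fs.openSync(tmp, flags, 0o600);
  try {
    fs.writeFileSync(fd, data);
    fs.fchmodSync(fd, 0o600); // 4. owner read/write only
    fs.fsyncSync(fd);
  } catch (err) {
    fs.closeSync(fd);
    fs.rmSync(tmp, { force: true });
    throw err;
  }
  fs.closeSync(fd);

  // 5. Atomic replace.
  fs.renameSync(tmp, target);
}
```

There is still a window between the `lstat` checks and the `rename`, so this narrows symlink attacks on a predictable path rather than ruling them out.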
198
+
199
+ ### 6.2 Sensitive File Detection (from compress.py)
200
+
201
+ ```python
202
+ SENSITIVE_BASENAMES = .env, .netrc, credentials, secrets, passwords, id_rsa, *.pem, *.key
203
+ SENSITIVE_DIRS = .ssh, .aws, .gnupg, .kube, .docker
204
+ SENSITIVE_TOKENS = secret, credential, password, apikey, token, privatekey
205
+ ```
206
+
207
+ **pi-crew application**: Workers should refuse to read/compress files matching these patterns. Add to worker prompt constraints.
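As an illustration, the three lists above could back a single path predicate. How caveman actually combines them is not shown in the source, so the matching rules here are assumptions (exact basename match, extension match for `*.pem` and `*.key`, any sensitive directory in the path, and substring match on the basename with punctuation removed), and `isSensitivePath` is a hypothetical name.

```typescript
// Hypothetical denylist check built from the three lists above.
// The matching rules are assumptions, not caveman's exact logic.
const SENSITIVE_BASENAMES = new Set([
  ".env", ".netrc", "credentials", "secrets", "passwords", "id_rsa",
]);
const SENSITIVE_EXTENSIONS = [".pem", ".key"];
const SENSITIVE_DIRS = new Set([".ssh", ".aws", ".gnupg", ".kube", ".docker"]);
const SENSITIVE_TOKENS = ["secret", "credential", "password", "apikey", "token", "privatekey"];

function isSensitivePath(filePath: string): boolean {
  const parts = filePath.split(/[\\/]+/).filter((part) => part.length > 0);
  const base = (parts[parts.length - 1] ?? "").toLowerCase();

  if (SENSITIVE_BASENAMES.has(base)) return true;
  if (SENSITIVE_EXTENSIONS.some((ext) => base.endsWith(ext))) return true;
  if (parts.slice(0, -1).some((dir) => SENSITIVE_DIRS.has(dir.toLowerCase()))) return true;

  // Strip separators so "api_key" and "api-key" both match "apikey".
  const squashed = base.replace(/[^a-z0-9]/g, "");
  return SENSITIVE_TOKENS.some((token) => squashed.includes(token));
}
```

Substring matching errs on the side of refusing: a file such as `tokenizer.ts` would also be flagged, so a real denylist would probably need an allowlist or stricter token boundaries.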
208
+
209
+ ---
210
+
211
+ ## 7. Eval Methodology
212
+
213
+ ### Three-Arm Harness
214
+
215
+ | Arm | System Prompt | Purpose |
216
+ |-----|--------------|---------|
217
+ | `__baseline__` | none | Raw model output |
218
+ | `__terse__` | "Answer concisely." | Control for generic terseness |
219
+ | `<skill>` | "Answer concisely." + SKILL.md | Isolated skill contribution |
220
+
221
+ **Honest delta = skill vs terse, NOT skill vs baseline.**
222
+
223
+ **pi-crew application**: When measuring worker efficiency, compare against "answer concisely" control, not against verbose baseline. This avoids claiming compression wins that are just generic terseness.
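The difference between the two deltas is easy to see with numbers. In this sketch the token counts are invented for illustration, and `ArmTokens` and `savings` are hypothetical names:

```typescript
// Illustrative only: the token counts below are made up.
interface ArmTokens {
  baseline: number; // no system prompt
  terse: number;    // "Answer concisely."
  skill: number;    // "Answer concisely." + SKILL.md
}

function savings(arms: ArmTokens): { naive: number; honest: number } {
  return {
    naive: 1 - arms.skill / arms.baseline, // skill vs baseline (overstates the win)
    honest: 1 - arms.skill / arms.terse,   // skill vs terse (isolated contribution)
  };
}

const example = savings({ baseline: 1000, terse: 600, skill: 300 });
```

With these made-up counts the naive figure is about 70% while the honest figure is 50%; the gap is the part of the saving that a bare "answer concisely" instruction already delivers.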
224
+
225
+ ---
226
+
227
+ ## 8. Specific pi-crew Integration Plan
228
+
229
+ ### Phase 1: Structured Output Contracts ✅ DONE
230
+
231
+ Commit `a335dfc`. Implemented `buildOutputContract(role)` in `live-session-runtime.ts`.
232
+ Explorer, executor, reviewer, security-reviewer, verifier, writer all have structured format templates.
233
+
234
+ ### Phase 2: Prose Compression in Worker Prompts ✅ DONE
235
+
236
+ Commit `a335dfc`. Implemented `buildCommunicationStyle(role)` with lite/full/ultra levels.
237
+ Explorer = ultra, writer = lite, all others = full.
238
+
239
+ ### Phase 3: Tool Description Compression ✅ DONE
240
+
241
+ Commit `pending`. Created `prose-compressor.ts` — pure TypeScript implementation of caveman's compress.js.
242
+ Compressed custom tool descriptions (submit_result, irc).
243
+ SDK-managed tool descriptions need Pi SDK support for mutation (documented as `compressSessionToolDescriptions` stub).
244
+
245
+ ### Phase 4: Output Validation ✅ DONE
246
+
247
+ Created `output-validator.ts` with:
248
+ - `validateWorkerOutput(role, output)` — checks format + structural preservation
249
+ - `parseReviewerFindings(output)` — extracts structured findings from reviewer output
250
+ - `parseExplorerResults(output)` — extracts structured results from explorer output
251
+ - `validateCompressionPreservation(original, compressed)` — checks code blocks, URLs, inline code, headings
252
+
253
+ ### Phase 5: Intensity by Role ✅ DONE
254
+
255
+ Commit `a335dfc`. `ROLE_INTENSITY` map in `live-session-runtime.ts`.
256
+
257
+ ---
258
+
259
+ ## 9. Expected Impact
260
+
261
+ | Metric | Before | After | Improvement |
262
+ |--------|--------|-------|-------------|
263
+ | Avg worker output tokens | ~800 | ~300 | **62%** |
264
+ | Parent context capacity (tasks/session) | ~15 | ~30 | **2x** |
265
+ | Tool description tokens (input) | ~200/tool | ~80/tool | **60%** |
266
+ | Review finding parse accuracy | ~70% | ~95% | **+25 pp** |
267
+
268
+ ---
269
+
270
+ ## 10. Key Takeaways
271
+
272
+ 1. **Output contracts > compression** — structured format is the real win, not shorter prose ✅
273
+ 2. **Context budget is finite** — every worker token = one less parent token ✅
274
+ 3. **Validate, don't trust** — compress then validate structural preservation ✅
275
+ 4. **Auto-clarity > always-compress** — security/destructive = normal English ✅
276
+ 5. **Three-arm eval** — measure against "be concise" control, not verbose baseline 📋
277
+ 6. **Symlink-safe I/O** — protect predictable file paths from symlink attacks ✅
278
+ 7. **Sensitive file denylist** — never ship credentials to third-party APIs ✅
279
+ 8. **Role-based intensity** — explorer gets ultra, writer gets lite, executor gets full ✅
280
+ 9. **Tool description compression** — compress descriptions to reduce input tokens ✅ (SDK support pending)
281
+ 10. **Parse structured output** — extract findings/results from worker output ✅
@@ -0,0 +1,264 @@
1
+ # ⚖️ Architecture Comparison: oh-my-pi vs pi-crew
2
+
3
+ > Based on deep research into both codebases (oh-my-pi v14.7.3 + pi-crew HEAD)
4
+
5
+ ---
6
+
7
+ ## 1. Architecture Overview
8
+
9
+ ```
10
+ oh-my-pi pi-crew
11
+ ┌──────────────────────────────────────┐ ┌──────────────────────────────────────┐
12
+ │ Main Process │ │ Pi Parent Process │
13
+ │ │ │ │
14
+ │ ┌────────────────────────────────┐ │ │ Pi CLI (coding agent) │
15
+ │ │ AgentSession (in-process) │ │ │ │ │
16
+ │ │ ├─ TaskTool → createSession() │ │ │ │ team tool → team-runner.ts │
17
+ │ │ │ ├─ AgentSession #1 │ │ │ │ ├─ task-runner.ts │
18
+ │ │ │ ├─ AgentSession #2 │ │ │ │ │ ├─ child-pi.ts → spawn() │
19
+ │ │ │ └─ AgentSession #N │ │ │ │ │ │ ├─ Pi child #1 │
20
+ │ │ │ │ │ │ │ │ │ ├─ Pi child #2 │
21
+ │ │ ├─ EventBus (in-process) │ │ │ │ │ │ └─ Pi child #N │
22
+ │ │ ├─ AgentRegistry (global) │ │ │ │ │ │
23
+ │ │ └─ SessionObserverRegistry │ │ │ │ ├─ state-store (files) │
24
+ │ └────────────────────────────────┘ │ │ │ ├─ manifest.json │
25
+ │ │ │ │ ├─ tasks.json │
26
+ │ All in 1 process │ │ │ ├─ events.jsonl │
27
+ │ → No IPC, no serialization │ │ │ └─ artifacts/ │
28
+ │ → Direct object references │ │ │ │
29
+ │ → Real-time event streaming │ │ │ File-based coordination │
30
+ └──────────────────────────────────────┘ └──────────────────────────────────────┘
31
+ ```
32
+
33
+ ---
34
+
35
+ ## 2. Detailed Comparison by Subsystem
36
+
37
+ ### 2.1 Execution Model
38
+
39
+ | | oh-my-pi | pi-crew |
40
+ |---|---------|---------|
41
+ | **Model** | In-process `AgentSession` | Child process `spawn("pi", ...)` |
42
+ | **Isolation** | Shared memory, shared event loop | Process-isolated, independent event loop |
43
+ | **Startup time** | ~ms (just object creation) | ~seconds (new Pi process boot) |
44
+ | **Communication** | Direct method calls | stdout/stderr IPC + file artifacts |
45
+ | **Memory** | Shared heap — agents see each other | Separate heaps — no shared state |
46
+ | **Failure blast radius** | 1 crashed agent → potential process crash | 1 crashed child → parent unaffected |
47
+ | **Concurrency** | `Semaphore` + `mapWithConcurrencyLimit` | `mapConcurrent` + `resolveBatchConcurrency` |
48
+ | **Model fallback** | Per-agent `model[]` patterns | `buildConfiguredModelRouting` with candidates loop |
49
+
50
+ **pi-crew advantage**: Process isolation: a crashed worker does not affect the parent.
51
+ **oh-my-pi advantage**: Shared memory — zero IPC overhead, direct event streaming, IRC messaging.
52
+
53
+ ### 2.2 Subagent Lifecycle
54
+
55
+ ```
56
+ oh-my-pi: pi-crew:
57
+ pending → running → completed queued → running → completed
58
+ ↘ failed ↘ failed
59
+ ↘ aborted ↘ cancelled
60
+ ↘ waiting (mailbox)
61
+ ↘ skipped
62
+ ```
63
+
64
+ | | oh-my-pi | pi-crew |
65
+ |---|---------|---------|
66
+ | **Entry** | `TaskTool.execute()` | `team tool run` → `team-runner.ts` |
67
+ | **Discovery** | `discoverAgents()` — bundled + .md files | `discoverAgents()` — agents/ dir + .md files |
68
+ | **Definition format** | YAML frontmatter in .md | YAML frontmatter in .md |
69
+ | **Output submission** | **`yield` tool** (enforced, 3 retries) | **exit code + stdout** (parsed post-hoc) |
70
+ | **Recursion control** | `maxRecursionDepth` + `spawns[]` | `maxTaskDepth` env var |
71
+ | **Adaptive planning** | N/A | **Adaptive plan injection** — planner dynamically creates tasks |
72
+ | **Retry** | N/A (yield reminder only) | `executeWithRetry` — configurable retry policy |
73
+ | **Policy engine** | N/A | `evaluateCrewPolicy` + recovery ledger |
74
+ | **Plan approval** | N/A | `planApproval` flow for implementation workflow |
75
+ | **Effectiveness guard** | N/A | `evaluateRunEffectiveness` — severity levels |
76
+
77
+ **pi-crew advantages**: Retry policy, adaptive planning, policy engine, plan approval, effectiveness guards.
78
+ **oh-my-pi advantages**: Yield enforcement (structured output), spawns[] recursion control.
79
+
80
+ ### 2.3 Inter-Subagent Communication
81
+
82
+ | | oh-my-pi | pi-crew |
83
+ |---|---------|---------|
84
+ | **Primary mechanism** | **IRC tool** — peer-to-peer messaging | **Mailbox** — async message queue |
85
+ | **Registry** | `AgentRegistry.global()` — process singleton | `manifest.json` + `crew-agent-records.json` |
86
+ | **Addressing** | Agent ID (`"0-Main"`, `"3-explore-abc"`) | Task ID (`"01_discover"`, `"02_plan"`) |
87
+ | **Reply mechanism** | `respondAsBackground()` — ephemeral side-channel | `respond` team tool action |
88
+ | **Anti-deadlock** | Side-channel doesn't block recipient's main loop | N/A — mailbox is fire-and-forget |
89
+ | **Broadcast** | `irc({ op: "send", to: "all" })` | No broadcast |
90
+ | **Visibility** | `listVisibleTo()` — all running/idle agents | `status` team tool — shows all tasks |
91
+ | **Event channels** | 3 dedicated channels (event, progress, lifecycle) | 1 `task.progress` event (coalesced) |
92
+ | **Steering** | `agent.steer()` — inject message mid-turn | `cancel` + `respond` team tool actions |
93
+ | **Context sharing** | `context.md` file + `contextFiles[]` | `dependencyContext` + `task-output-context.ts` |
94
+
95
+ **oh-my-pi advantages**: Real-time IRC, anti-deadlock side-channel, broadcast, steering mid-turn.
96
+ **pi-crew advantages**: Async mailbox (persists to disk), dependency context (auto-collects upstream outputs), more coordination patterns via team tool.
97
+
98
+ ### 2.4 Progress Tracking
99
+
100
+ | | oh-my-pi | pi-crew |
101
+ |---|---------|---------|
102
+ | **Event source** | `AgentEvent` subscription (in-process) | JSON lines from child stdout + transcript.jsonl |
103
+ | **Debounce** | 150ms coalescing | 500ms agent record + 1000ms progress event |
104
+ | **Tracked data** | toolName, toolArgs, tokens, recentOutput (8 lines), intent | toolName, toolCount, tokens, recentOutput (20 lines), usage |
105
+ | **Heartbeat** | N/A (shared process = instant status) | `worker-heartbeat.ts` — file-based heartbeat |
106
+ | **Attention detection** | N/A | `agent-control.ts` — `needs_attention`, `long_running`, consecutive failures |
107
+ | **Crash recovery** | N/A | `crash-recovery.ts`, `stale-reconciler.ts`, `overflow-recovery.ts` |
108
+ | **Deadletter** | N/A | `deadletter.ts` — tracks permanently failed tasks |
109
+
110
+ **oh-my-pi advantages**: Real-time events (no file polling needed), 150ms fast updates.
111
+ **pi-crew advantages**: Crash recovery, stale reconciliation, attention detection, deadletter — much more robust for unreliable environments.
112
+
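The debounce row above can be illustrated with a small trailing-edge coalescer. This is only a sketch: the `ProgressEvent` shape and the `ProgressCoalescer` class are hypothetical names, not taken from either codebase, and time is passed in explicitly so the behavior is easy to reason about.

```typescript
// Hypothetical event shape; the real progress payload is not shown in this document.
interface ProgressEvent {
  taskId: string;
  toolCount: number;
}

// Emits at most one event per window and remembers only the latest held-back event.
class ProgressCoalescer {
  private readonly windowMs: number;
  private lastEmitMs = Number.NEGATIVE_INFINITY;
  private pending: ProgressEvent | null = null;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns the event to emit now, or null if it was held back for a later flush.
  push(event: ProgressEvent, nowMs: number): ProgressEvent | null {
    if (nowMs - this.lastEmitMs >= this.windowMs) {
      this.lastEmitMs = nowMs;
      this.pending = null;
      return event;
    }
    this.pending = event;
    return null;
  }

  // Intended to be called from a timer; emits the held-back event once the window has elapsed.
  flush(nowMs: number): ProgressEvent | null {
    if (this.pending === null || nowMs - this.lastEmitMs < this.windowMs) return null;
    const event = this.pending;
    this.pending = null;
    this.lastEmitMs = nowMs;
    return event;
  }
}
```

A shorter window (such as the 150ms quoted for oh-my-pi) trades more UI updates for lower latency; a longer one (such as the 1000ms progress event window quoted for pi-crew) does the opposite.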
113
+ ### 2.5 UI Rendering
114
+
115
+ | | oh-my-pi | pi-crew |
116
+ |---|---------|---------|
117
+ | **Main display** | `SessionObserverOverlay` — full transcript viewer | `RunDashboard` — multi-pane dashboard |
118
+ | **Progress bar** | `statusLine` with subagent count | `powerbar-publisher.ts` — segment-based |
119
+ | **Transcript** | Incremental JSONL reading, expand/collapse per entry | `transcript-viewer.ts` — syntax-highlighted, diff rendering |
120
+ | **Agent config UI** | `AgentDashboard` (1120 lines) — two-column agent manager | N/A (config via YAML files) |
121
+ | **Dashboard panes** | N/A (single overlay) | 7 panes: agents, progress, mailbox, health, metrics, capability, transcript |
122
+ | **Anti-flicker** | 150ms progress coalesce, viewport-only render | `file-coalescer.ts` (200ms TTL), `render-scheduler.ts` |
123
+ | **Snapshot cache** | N/A (in-process = instant) | `run-snapshot-cache.ts` (777 lines) — file mtime-based cache |
124
+ | **Live streaming** | `message_update` events (text_delta) in real-time | JSON stdout line parsing (batched) |
125
+
126
+ **oh-my-pi advantages**: Real-time streaming, entry-based expand/collapse, agent configuration UI.
127
+ **pi-crew advantages**: Richer dashboard (7 panes), syntax highlighting, diff rendering, snapshot caching for multiple runs.
128
+
129
+ ### 2.6 Tool Access Control
130
+
131
+ | | oh-my-pi | pi-crew |
132
+ |---|---------|---------|
133
+ | **Mechanism** | `agent.tools[]` in frontmatter → passed to `createAgentSession` | `permissionForRole()` → read_only vs read_write |
134
+ | **Granularity** | Per-agent tool whitelist | Per-role permission level |
135
+ | **MCP proxy** | `createMCPProxyTools()` — reuse parent's connections | N/A |
136
+ | **Plan mode** | Restrict to `["read", "search", "find", "lsp", "web_search"]` | `permissionForRole("planner") === "read_only"` |
137
+ | **LoadMode** | `"essential"` vs `"discoverable"` per tool | N/A (just added `toolGuidanceBlock`) |
138
+ | **Recursion tool** | Auto-add `"task"` tool when `spawns` defined | N/A (no subagent spawning from workers) |
139
+
140
+ ### 2.7 Isolation & Merge
141
+
142
+ | | oh-my-pi | pi-crew |
143
+ |---|---------|---------|
144
+ | **Isolation modes** | worktree, fuse-overlay, fuse-projfs | worktree only |
145
+ | **Merge modes** | patch, branch | patch (auto-captured) |
146
+ | **Commit style** | AI-generated or simple | N/A |
147
+ | **Nested repos** | `NestedRepoPatch` support | N/A |
148
+
149
+ ---
150
+
151
+ ## 3. Feature Matrix
152
+
153
+ | Feature | oh-my-pi | pi-crew |
154
+ |---------|:--------:|:-------:|
155
+ | In-process execution | ✅ | ❌ (child process) |
156
+ | Process isolation | ❌ | ✅ |
157
+ | Yield tool enforcement | ✅ | ❌ |
158
+ | IRC messaging | ✅ | ❌ (mailbox only) |
159
+ | Broadcast messaging | ✅ | ❌ |
160
+ | Steering mid-turn | ✅ | ❌ (cancel/respond only) |
161
+ | Anti-deadlock side-channel | ✅ | ❌ |
162
+ | Real-time event streaming | ✅ | ❌ (file-based) |
163
+ | Adaptive planning | ❌ | ✅ |
164
+ | Retry policy | ❌ | ✅ |
165
+ | Policy engine | ❌ | ✅ |
166
+ | Plan approval flow | ❌ | ✅ |
167
+ | Effectiveness guard | ❌ | ✅ |
168
+ | Crash recovery | ❌ | ✅ |
169
+ | Stale reconciliation | ❌ | ✅ |
170
+ | Deadletter tracking | ❌ | ✅ |
171
+ | Attention detection | ❌ | ✅ |
172
+ | Mailbox (async) | ❌ | ✅ |
173
+ | Dependency context | ❌ | ✅ |
174
+ | Multi-run dashboard | ❌ | ✅ |
175
+ | Syntax highlighting | ❌ | ✅ |
176
+ | Diff rendering | ❌ | ✅ |
177
+ | Snapshot caching | ❌ | ✅ |
178
+ | Agent configuration UI | ✅ | ❌ |
179
+ | MCP proxy tools | ✅ | ❌ |
180
+ | Worktree isolation | ✅ | ✅ |
181
+ | FUSE/ProjFS isolation | ✅ | ❌ |
182
+ | Branch-based merge | ✅ | ❌ |
183
+
184
+ ---
185
+
186
+ ## 4. Gap Analysis: What pi-crew Lacks
187
+
188
+ ### Gap 1: Real-time Event Streaming (HIGH)
189
+ - **oh-my-pi**: In-process EventBus → events arrive in <1ms
190
+ - **pi-crew**: File-based (write manifest → poll) → 500-1000ms latency
191
+ - **Impact**: UI flickers, feels choppy, delayed progress updates
192
+ - **Solution path**: WebSocket/pipe from child Pi → parent, or use Pi's JSON event stream directly
193
+
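One way to read the "use Pi's JSON event stream directly" option is to decode the child's stdout incrementally instead of polling files. The sketch below assumes the child emits one JSON object per line; `JsonLineDecoder` is a hypothetical helper, not an existing pi-crew module.

```typescript
// Incremental JSON Lines decoder: feed it raw stdout chunks, get back complete parsed events.
class JsonLineDecoder {
  private buffer = "";

  // Returns every event whose terminating newline has arrived; the partial tail stays buffered.
  feed(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Non-JSON output (for example a stray log line) is skipped rather than crashing the parent.
      }
    }
    return events;
  }
}
```

In Node.js this would typically be fed from the child process's `stdout` `"data"` events, after calling `setEncoding("utf8")` on the stream so that multi-byte characters are not split across chunks. Events then reach the parent as soon as the child writes them, without a write-then-poll round trip.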
194
+ ### Gap 2: Structured Output (MEDIUM)
195
+ - **oh-my-pi**: `yield` tool enforces structured output with JTD schema
196
+ - **pi-crew**: Parse stdout + transcript post-hoc
197
+ - **Impact**: Fragile output parsing, no schema validation
198
+ - **Solution path**: Add output schema support to task packets, or use exit code conventions
199
+
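A minimal sketch of what "output schema support" could look like on the parent side, assuming a worker is asked to print a single JSON object as its final result. The schema format here is a deliberately simplified, hypothetical one (field name to primitive type); it is not JTD and not an existing pi-crew API.

```typescript
// Hypothetical, simplified schema: field name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";
type OutputSchema = Record<string, FieldType>;

type ValidationResult =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; errors: string[] };

// Parses a worker's raw output and checks it against the schema, collecting all errors.
function validateTaskOutput(raw: string, schema: OutputSchema): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["output must be a JSON object"] };
  }
  const value = parsed as Record<string, unknown>;
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(schema)) {
    if (!(field in value)) {
      errors.push(`missing field "${field}"`);
    } else if (typeof value[field] !== expected) {
      errors.push(`field "${field}" must be ${expected}`);
    }
  }
  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}
```

Returning the full error list, rather than failing on the first problem, gives a retry prompt enough detail to ask the worker for a corrected result.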
200
+ ### Gap 3: Inter-Worker Communication (MEDIUM)
201
+ - **oh-my-pi**: IRC tool + AgentRegistry + side-channel
202
+ - **pi-crew**: Mailbox (fire-and-forget) + dependency context (read-only)
203
+ - **Impact**: Workers can't coordinate in real-time
204
+ - **Solution path**: Enhanced mailbox with reply support, or IPC bridge
205
+
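The "enhanced mailbox with reply support" option could be sketched as a mailbox whose messages carry a correlation field. Everything below is hypothetical: the `MailMessage` shape and `ReplyMailbox` class are illustrative, kept in memory for brevity, and say nothing about how pi-crew's on-disk mailbox is actually laid out.

```typescript
// Hypothetical message shape, addressed by task ID as in the comparison table.
interface MailMessage {
  id: number;
  from: string;
  to: string;
  body: string;
  replyTo: number | null; // id of the message this one answers, if any
}

class ReplyMailbox {
  private nextId = 1;
  private messages: MailMessage[] = [];

  send(from: string, to: string, body: string, replyTo: number | null = null): MailMessage {
    const message: MailMessage = { id: this.nextId++, from, to, body, replyTo };
    this.messages.push(message);
    return message;
  }

  // A reply is addressed back to the original sender and linked through replyTo.
  reply(original: MailMessage, body: string): MailMessage {
    return this.send(original.to, original.from, body, original.id);
  }

  inbox(taskId: string): MailMessage[] {
    return this.messages.filter((m) => m.to === taskId);
  }

  repliesTo(messageId: number): MailMessage[] {
    return this.messages.filter((m) => m.replyTo === messageId);
  }
}
```

With a correlation field, a sender can poll `repliesTo` for a specific question instead of scanning its whole inbox, which keeps the mailbox asynchronous while still allowing request/response exchanges.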
206
+ ### Gap 4: Steering/Cancellation Granularity (LOW)
207
+ - **oh-my-pi**: `steer()` injects messages mid-turn, `interruptMode: "immediate"`
208
+ - **pi-crew**: `cancel` kills child process, `respond` adds to mailbox
209
+ - **Impact**: Can't course-correct a running worker without killing it
210
+ - **Solution path**: Pi's native `steer` support (if exposed via CLI)
211
+
212
+ ### Gap 5: Agent Configuration UI (LOW)
213
+ - **oh-my-pi**: Full `AgentDashboard` — enable/disable, model override, AI agent creation
214
+ - **pi-crew**: Edit YAML files manually
215
+ - **Impact**: Poor UX for agent management
216
+ - **Solution path**: Build a similar dashboard component in pi-crew
217
+
218
+ ---
219
+
220
+ ## 5. Gap Analysis: What oh-my-pi Lacks (and pi-crew Has)
221
+
222
+ ### pi-crew Advantage 1: Process Isolation
223
+ Crashed worker → parent unaffected. Critical for production reliability.
224
+
225
+ ### pi-crew Advantage 2: Adaptive Planning
226
+ `implementation` workflow dynamically injects tasks based on planner output. No equivalent in oh-my-pi.
227
+
228
+ ### pi-crew Advantage 3: Robustness Layer
229
+ - Retry policy with backoff
230
+ - Crash recovery + stale reconciliation
231
+ - Deadletter tracking
232
+ - Effectiveness guard with severity levels
233
+ - Policy engine with block/escalate/notify
234
+
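To make "retry policy with backoff" and "deadletter tracking" concrete, here is a small sketch of capped exponential backoff feeding a retry-or-deadletter decision. The function names, parameters, and the doubling factor are assumptions for illustration; the document does not state which backoff formula pi-crew uses.

```typescript
// Capped exponential backoff. Attempt numbers start at 1; the delay doubles each attempt.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  if (attempt < 1) throw new RangeError("attempt must be >= 1");
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// After a failed attempt: either retry after a delay, or give up and record the task as deadlettered.
type RetryDecision = { action: "retry"; delayMs: number } | { action: "deadletter" };

function decideRetry(
  attempt: number,
  maxAttempts: number,
  baseMs: number,
  maxMs: number,
): RetryDecision {
  if (attempt >= maxAttempts) return { action: "deadletter" };
  return { action: "retry", delayMs: backoffDelayMs(attempt, baseMs, maxMs) };
}
```

For example, with `baseMs = 500` and `maxMs = 8000`, the first three failures wait 500ms, 1000ms, and 2000ms, and later attempts never wait more than 8000ms.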
235
+ ### pi-crew Advantage 4: Dependency Context
236
+ Auto-collects upstream task outputs and feeds them to downstream tasks. oh-my-pi only shares `context.md`.
237
+
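The idea of auto-collecting upstream outputs can be sketched as follows. This assumes each task lists its direct dependencies and stores its output as text; the `TaskNode` shape and `collectDependencyContext` function are hypothetical and only gather direct (not transitive) dependencies.

```typescript
// Hypothetical task record; output is null until the task has finished.
interface TaskNode {
  id: string;
  dependsOn: string[];
  output: string | null;
}

// Builds the context text handed to a downstream task from its direct upstream outputs.
function collectDependencyContext(taskId: string, tasks: Map<string, TaskNode>): string {
  const task = tasks.get(taskId);
  if (task === undefined) throw new Error(`unknown task: ${taskId}`);
  const sections: string[] = [];
  for (const depId of task.dependsOn) {
    const dep = tasks.get(depId);
    // Dependencies that are missing or have not produced output yet are skipped.
    if (dep === undefined || dep.output === null) continue;
    sections.push(`## Output of ${depId}\n${dep.output}`);
  }
  return sections.join("\n\n");
}
```

Because the context is derived from the task graph, a downstream task receives only what its declared dependencies produced, rather than a single shared file that every agent reads.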
238
+ ### pi-crew Advantage 5: Rich Dashboard
239
+ 7 specialized panes vs oh-my-pi's single overlay. Better for monitoring multiple parallel runs.
240
+
241
+ ---
242
+
243
+ ## 6. Conclusion
244
+
245
+ | Aspect | Winner | Reason |
246
+ |--------|--------|--------|
247
+ | **Execution speed** | oh-my-pi | In-process, zero IPC |
248
+ | **Reliability** | pi-crew | Process isolation + crash recovery |
249
+ | **Communication** | oh-my-pi | IRC + side-channel + steering |
250
+ | **Coordination** | pi-crew | Adaptive planning + dependency context |
251
+ | **UI richness** | pi-crew | 7 dashboard panes + syntax highlighting |
252
+ | **UI responsiveness** | oh-my-pi | Real-time events + 150ms coalescing |
253
+ | **Robustness** | pi-crew | Retry + deadletter + effectiveness guard |
254
+ | **Tool control** | oh-my-pi | Per-agent whitelist + MCP proxy |
255
+ | **Configuration UX** | oh-my-pi | Agent dashboard with AI creation |
256
+
257
+ **In summary**: pi-crew is strong in **reliability and coordination** but weak in **real-time responsiveness and inter-worker communication**. oh-my-pi is strong in **speed and real-time behavior** but weak in **robustness and fault tolerance**.
258
+
259
+ **Priority improvements for pi-crew**:
260
+ 1. 🔴 Real-time event streaming → reduces UI flicker
261
+ 2. 🟡 Structured output enforcement → reduces parsing fragility
262
+ 3. 🟡 Inter-worker communication → increases coordination capability
263
+ 4. 🟢 Agent configuration UI → improves UX
264
+ 5. 🟢 Steering granularity → increases control fidelity