npm - pi-crew - Versions diffs - 0.1.46 → 0.1.49 - Mend

pi-crew 0.1.46 → 0.1.49

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (253)

package/CHANGELOG.md +97 -0
package/agents/analyst.md +11 -11
package/agents/critic.md +11 -11
package/agents/executor.md +11 -11
package/agents/explorer.md +11 -11
package/agents/planner.md +11 -11
package/agents/reviewer.md +11 -11
package/agents/security-reviewer.md +11 -11
package/agents/test-engineer.md +11 -11
package/agents/verifier.md +11 -11
package/agents/writer.md +11 -11
package/docs/next-upgrade-roadmap.md +117 -42
package/docs/refactor-tasks-phase3.md +394 -394
package/docs/refactor-tasks-phase4.md +564 -564
package/docs/refactor-tasks-phase5.md +402 -402
package/docs/refactor-tasks-phase6.md +662 -662
package/docs/research/AGENT-EXECUTION-ARCHITECTURE.md +261 -0
package/docs/research/AGENT-LIFECYCLE-COMPARISON.md +111 -0
package/docs/research/AUDIT_OH_MY_PI.md +261 -0
package/docs/research/AUDIT_PI_CREW.md +457 -0
package/docs/research/CAVEMAN-DEEP-RESEARCH.md +281 -0
package/docs/research/COMPARISON_OH_MY_PI_VS_PI_CREW.md +264 -0
package/docs/research/DEEP-RESEARCH-PI-POWERBAR.md +343 -0
package/docs/research/DEEP_RESEARCH_SUBAGENT_ARCHITECTURE.md +480 -0
package/docs/research/GAP_CLOSURE_IMPLEMENTATION_PLAN.md +354 -0
package/docs/research/IMPLEMENTATION_PLAN.md +385 -0
package/docs/research/LIVE-SESSION-PRODUCTION-READY-PLAN.md +502 -0
package/docs/research/OH-MY-PI-DEEP-RESEARCH-v14.7.6.md +266 -0
package/docs/research/REMAINING-GAPS-PLAN.md +363 -0
package/docs/research/SESSION-SUMMARY-2026-05-08.md +146 -0
package/docs/research/UI-RESPONSIVENESS-AUDIT.md +173 -0
package/docs/research-awesome-agent-skills-distillation.md +100 -100
package/docs/research-extension-examples.md +297 -297
package/docs/research-extension-system.md +324 -324
package/docs/research-oh-my-pi-distillation.md +56 -9
package/docs/research-optimization-plan.md +548 -548
package/docs/research-phase10-distillation.md +198 -198
package/docs/research-phase11-distillation.md +201 -201
package/docs/research-pi-coding-agent.md +357 -357
package/docs/research-source-pi-crew-reference.md +174 -174
package/docs/runtime-flow.md +148 -148
package/docs/source-runtime-refactor-map.md +107 -107
package/index.ts +6 -6
package/package.json +99 -98
package/schema.json +8 -0
package/skills/async-worker-recovery/SKILL.md +42 -42
package/skills/context-artifact-hygiene/SKILL.md +52 -52
package/skills/delegation-patterns/SKILL.md +54 -54
package/skills/mailbox-interactive/SKILL.md +40 -40
package/skills/model-routing-context/SKILL.md +39 -39
package/skills/multi-perspective-review/SKILL.md +58 -58
package/skills/observability-reliability/SKILL.md +41 -41
package/skills/orchestration/SKILL.md +157 -0
package/skills/ownership-session-security/SKILL.md +41 -41
package/skills/pi-extension-lifecycle/SKILL.md +39 -39
package/skills/requirements-to-task-packet/SKILL.md +63 -63
package/skills/resource-discovery-config/SKILL.md +41 -41
package/skills/runtime-state-reader/SKILL.md +44 -44
package/skills/secure-agent-orchestration-review/SKILL.md +45 -45
package/skills/state-mutation-locking/SKILL.md +42 -42
package/skills/systematic-debugging/SKILL.md +67 -67
package/skills/ui-render-performance/SKILL.md +39 -39
package/skills/verification-before-done/SKILL.md +57 -57
package/skills/worktree-isolation/SKILL.md +39 -39
package/src/agents/agent-config.ts +6 -0
package/src/agents/agent-search.ts +98 -0
package/src/agents/agent-serializer.ts +4 -0
package/src/agents/discover-agents.ts +17 -4
package/src/config/config.ts +24 -0
package/src/config/defaults.ts +11 -0
package/src/extension/autonomous-policy.ts +26 -33
package/src/extension/cross-extension-rpc.ts +82 -82
package/src/extension/help.ts +1 -0
package/src/extension/management.ts +5 -0
package/src/extension/register.ts +58 -13
package/src/extension/registration/commands.ts +33 -1
package/src/extension/registration/compaction-guard.ts +125 -125
package/src/extension/registration/team-tool.ts +6 -4
package/src/extension/run-bundle-schema.ts +89 -89
package/src/extension/run-index.ts +24 -18
package/src/extension/run-maintenance.ts +68 -62
package/src/extension/team-tool/api.ts +23 -2
package/src/extension/team-tool/cancel.ts +86 -11
package/src/extension/team-tool/context.ts +3 -0
package/src/extension/team-tool/handle-settings.ts +188 -188
package/src/extension/team-tool/inspect.ts +41 -41
package/src/extension/team-tool/intent-policy.ts +42 -0
package/src/extension/team-tool/lifecycle-actions.ts +47 -18
package/src/extension/team-tool/parallel-dispatch.ts +156 -0
package/src/extension/team-tool/plan.ts +19 -19
package/src/extension/team-tool/respond.ts +10 -2
package/src/extension/team-tool/run.ts +3 -2
package/src/extension/team-tool/status.ts +1 -1
package/src/extension/team-tool-types.ts +1 -0
package/src/extension/team-tool.ts +13 -3
package/src/hooks/registry.ts +61 -0
package/src/hooks/types.ts +41 -0
package/src/i18n.ts +184 -184
package/src/observability/exporters/otlp-exporter.ts +77 -77
package/src/prompt/prompt-runtime.ts +72 -72
package/src/runtime/agent-control.ts +108 -2
package/src/runtime/agent-memory.ts +72 -72
package/src/runtime/agent-observability.ts +114 -114
package/src/runtime/async-marker.ts +26 -26
package/src/runtime/async-runner.ts +3 -1
package/src/runtime/attention-events.ts +28 -28
package/src/runtime/background-runner.ts +19 -0
package/src/runtime/cancellation-token.ts +89 -0
package/src/runtime/cancellation.ts +61 -51
package/src/runtime/capability-inventory.ts +116 -0
package/src/runtime/child-pi.ts +2 -1
package/src/runtime/code-summary.ts +247 -0
package/src/runtime/completion-guard.ts +190 -190
package/src/runtime/crash-recovery.ts +181 -0
package/src/runtime/crew-agent-records.ts +35 -7
package/src/runtime/crew-agent-runtime.ts +1 -0
package/src/runtime/custom-tools/irc-tool.ts +201 -0
package/src/runtime/custom-tools/submit-result-tool.ts +90 -0
package/src/runtime/delivery-coordinator.ts +3 -1
package/src/runtime/direct-run.ts +35 -35
package/src/runtime/effectiveness.ts +81 -76
package/src/runtime/event-stream-bridge.ts +90 -0
package/src/runtime/foreground-control.ts +82 -82
package/src/runtime/green-contract.ts +46 -46
package/src/runtime/group-join.ts +106 -106
package/src/runtime/heartbeat-gradient.ts +28 -28
package/src/runtime/heartbeat-watcher.ts +124 -124
package/src/runtime/live-agent-control.ts +88 -88
package/src/runtime/live-agent-manager.ts +78 -2
package/src/runtime/live-control-realtime.ts +36 -36
package/src/runtime/live-extension-bridge.ts +150 -0
package/src/runtime/live-irc.ts +92 -0
package/src/runtime/live-session-health.ts +100 -0
package/src/runtime/live-session-runtime.ts +297 -7
package/src/runtime/mcp-proxy.ts +113 -0
package/src/runtime/notebook-helpers.ts +90 -0
package/src/runtime/orphan-sentinel.ts +7 -0
package/src/runtime/output-validator.ts +187 -0
package/src/runtime/parallel-research.ts +44 -44
package/src/runtime/parallel-utils.ts +57 -0
package/src/runtime/parent-guard.ts +80 -0
package/src/runtime/pi-json-output.ts +111 -111
package/src/runtime/policy-engine.ts +79 -79
package/src/runtime/progress-event-coalescer.ts +43 -43
package/src/runtime/prose-compressor.ts +164 -0
package/src/runtime/recovery-recipes.ts +74 -74
package/src/runtime/result-extractor.ts +121 -0
package/src/runtime/role-permission.ts +39 -39
package/src/runtime/runtime-resolver.ts +1 -4
package/src/runtime/semaphore.ts +131 -0
package/src/runtime/sensitive-paths.ts +92 -0
package/src/runtime/session-resources.ts +25 -25
package/src/runtime/session-snapshot.ts +59 -59
package/src/runtime/session-usage.ts +79 -79
package/src/runtime/sidechain-output.ts +29 -29
package/src/runtime/stream-preview.ts +177 -0
package/src/runtime/subagent-manager.ts +3 -2
package/src/runtime/subprocess-tool-registry.ts +67 -0
package/src/runtime/supervisor-contact.ts +59 -59
package/src/runtime/task-display.ts +38 -38
package/src/runtime/task-output-context.ts +59 -9
package/src/runtime/task-runner/capabilities.ts +78 -78
package/src/runtime/task-runner/live-executor.ts +2 -0
package/src/runtime/task-runner/progress.ts +119 -119
package/src/runtime/task-runner/prompt-builder.ts +70 -8
package/src/runtime/task-runner/prompt-pipeline.ts +64 -64
package/src/runtime/task-runner/result-utils.ts +14 -14
package/src/runtime/task-runner/run-projection.ts +104 -0
package/src/runtime/task-runner/state-helpers.ts +22 -22
package/src/runtime/task-runner.ts +75 -4
package/src/runtime/team-runner.ts +60 -8
package/src/runtime/worker-heartbeat.ts +21 -21
package/src/runtime/worker-startup.ts +57 -57
package/src/runtime/workspace-tree.ts +298 -0
package/src/runtime/yield-handler.ts +189 -0
package/src/schema/config-schema.ts +6 -0
package/src/schema/team-tool-schema.ts +11 -1
package/src/skills/discover-skills.ts +67 -0
package/src/state/active-run-registry.ts +4 -2
package/src/state/artifact-store.ts +4 -1
package/src/state/atomic-write.ts +50 -1
package/src/state/blob-store.ts +117 -0
package/src/state/contracts.ts +1 -0
package/src/state/event-log-rotation.ts +158 -0
package/src/state/event-log.ts +52 -2
package/src/state/mailbox.ts +87 -7
package/src/state/state-store.ts +24 -4
package/src/state/task-claims.ts +44 -44
package/src/state/types.ts +20 -0
package/src/state/usage.ts +29 -29
package/src/subagents/async-entry.ts +1 -1
package/src/subagents/index.ts +3 -3
package/src/subagents/live/control.ts +1 -1
package/src/subagents/live/manager.ts +1 -1
package/src/subagents/live/realtime.ts +1 -1
package/src/subagents/live/session-runtime.ts +1 -1
package/src/subagents/manager.ts +1 -1
package/src/subagents/spawn.ts +1 -1
package/src/teams/team-serializer.ts +38 -38
package/src/types/diff.d.ts +18 -18
package/src/ui/agent-management-overlay.ts +144 -0
package/src/ui/crew-footer.ts +101 -101
package/src/ui/crew-select-list.ts +111 -111
package/src/ui/crew-widget.ts +11 -2
package/src/ui/dashboard-panes/cancellation-pane.ts +43 -0
package/src/ui/dashboard-panes/capability-pane.ts +60 -0
package/src/ui/dashboard-panes/mailbox-pane.ts +35 -11
package/src/ui/dashboard-panes/metrics-pane.ts +34 -34
package/src/ui/dynamic-border.ts +25 -25
package/src/ui/layout-primitives.ts +106 -106
package/src/ui/live-run-sidebar.ts +4 -0
package/src/ui/loaders.ts +158 -158
package/src/ui/powerbar-publisher.ts +77 -15
package/src/ui/render-coalescer.ts +51 -0
package/src/ui/render-diff.ts +119 -119
package/src/ui/render-scheduler.ts +143 -143
package/src/ui/run-dashboard.ts +4 -0
package/src/ui/run-event-bus.ts +209 -0
package/src/ui/run-snapshot-cache.ts +68 -16
package/src/ui/snapshot-types.ts +8 -0
package/src/ui/spinner.ts +17 -17
package/src/ui/status-colors.ts +58 -58
package/src/ui/syntax-highlight.ts +116 -116
package/src/ui/transcript-entries.ts +258 -0
package/src/utils/atomic-write.ts +33 -33
package/src/utils/completion-dedupe.ts +63 -63
package/src/utils/frontmatter.ts +68 -68
package/src/utils/git.ts +262 -262
package/src/utils/ids.ts +17 -12
package/src/utils/incremental-reader.ts +104 -0
package/src/utils/names.ts +27 -27
package/src/utils/redaction.ts +44 -44
package/src/utils/safe-paths.ts +47 -47
package/src/utils/scan-cache.ts +137 -0
package/src/utils/sleep.ts +32 -32
package/src/utils/sse-parser.ts +134 -0
package/src/utils/task-name-generator.ts +337 -0
package/src/utils/visual.ts +33 -2
package/src/workflows/validate-workflow.ts +40 -40
package/src/worktree/branch-freshness.ts +45 -45
package/src/worktree/cleanup.ts +2 -1
package/teams/default.team.md +12 -12
package/teams/fast-fix.team.md +11 -11
package/teams/implementation.team.md +18 -18
package/teams/parallel-research.team.md +14 -14
package/teams/research.team.md +11 -11
package/teams/review.team.md +12 -12
package/workflows/default.workflow.md +29 -29
package/workflows/fast-fix.workflow.md +22 -22
package/workflows/implementation.workflow.md +38 -38
package/workflows/parallel-research.workflow.md +46 -46
package/workflows/research.workflow.md +22 -22
package/workflows/review.workflow.md +30 -30

package/docs/research/CAVEMAN-DEEP-RESEARCH.md ADDED

@@ -0,0 +1,281 @@
+# Caveman Deep Research — Agent Communication Optimization
+> Source: `source/caveman/` (github.com/JuliusBrussee/caveman)
+> Date: 2026-05-08
+> Purpose: Apply caveman patterns to optimize pi-crew inter-agent communication
+---
+## 1. Executive Summary
+Caveman is a **token compression** system for AI coding agents. Core insight: **on average, 65-75% of LLM output is filler** (articles, hedging, pleasantries). Dropping the filler reduces tokens, latency, and cost **without losing accuracy** (accuracy even improves by 26%, according to the paper arXiv:2604.00025).
+**Application to pi-crew**: Worker output is injected into the Pi parent's main context. If worker output uses caveman style, the main context lasts longer and fits more tasks per session.
+---
+## 2. Architecture Overview
+```
+caveman/
+├── skills/           # Behavior definition (SKILL.md)
+│   ├── caveman/      # Core: intensity levels, rules
+│   ├── cavecrew/     # Decision guide for delegation
+│   ├── caveman-commit/   # Terse commit messages
+│   ├── caveman-review/   # One-line PR reviews
+│   ├── caveman-help/     # Quick-reference card
+│   ├── caveman-stats/    # Token usage tracker
+│   └── compress/     # Memory file compression
+├── agents/           # Subagent definitions
+│   ├── cavecrew-investigator.md  # Read-only locator (haiku)
+│   ├── cavecrew-builder.md       # 1-2 file surgical editor
+│   └── cavecrew-reviewer.md      # Diff reviewer (haiku)
+├── hooks/            # Claude Code integration
+│   ├── caveman-activate.js       # SessionStart: inject rules
+│   ├── caveman-mode-tracker.js   # Per-turn reinforcement
+│   ├── caveman-config.js         # Shared config + symlink-safe I/O
+│   └── caveman-stats.js          # Lifetime token tracking
+├── mcp-servers/
+│   └── caveman-shrink/           # MCP middleware proxy
+├── caveman-compress/             # File compression tool
+│   └── scripts/
+│       ├── compress.py   # Orchestrator
+│       ├── validate.py   # Structural preservation validator
+│       └── detect.py     # File type detection
+├── evals/            # Three-arm eval harness
+└── benchmarks/       # Real API token counts
+```
+---
+## 3. Core Patterns Applicable to pi-crew
+### 3.1 Structured Output Contracts (KEY INSIGHT)
+Caveman's biggest innovation is not the compression itself — it's the **output contracts**:
+**investigator**: `path:line — symbol — ≤6 word note`
+**builder**: `path:line-range — change ≤10 words. verified: re-read OK.`
+**reviewer**: `path:line: emoji severity: problem. fix.`
+These are **machine-parseable** — main thread can grep with regex, no ambiguity.
+**pi-crew application**: Worker prompt templates should include structured output contracts:
+```
+# Output Contract
+Your response MUST follow this format:
+<artifact_path>:<line_range> — <≤10 word change summary>
+verified: <re-read OK | mismatch @ path:line>
+```
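Because the contract above is line-oriented, a consumer can parse it with a single regular expression. The following is a minimal sketch under stated assumptions: `ContractLine` and `parseContractLine` are hypothetical names for illustration and are not taken from pi-crew's source, and the separator is assumed to be the dash character (U+2014) used in the template.

```typescript
// Hypothetical shape of one parsed contract line (illustration only).
interface ContractLine {
  path: string;
  startLine: number;
  endLine: number;
  summary: string;
}

// Matches "<path>:<line>[-<line>] <U+2014> <summary>".
const CONTRACT_LINE = /^(.+?):(\d+)(?:-(\d+))?\s+\u2014\s+(.+)$/;

function parseContractLine(line: string): ContractLine | null {
  const m = CONTRACT_LINE.exec(line.trim());
  if (!m) return null; // not in contract format: caller can retry or reject
  const startLine = Number(m[2]);
  // A single line number means a one-line range.
  const endLine = m[3] !== undefined ? Number(m[3]) : startLine;
  return { path: m[1], startLine, endLine, summary: m[4] };
}
```

Returning `null` for non-conforming lines lets the caller decide whether to retry the worker or fall back to treating the output as free text.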
+### 3.2 Context Budget Awareness
+Caveman's core thesis: **subagent tool-results get injected into main context verbatim**. Every token a subagent emits is a token the main agent can't use later.
+Quantified impact:
+- Vanilla `Explore` subagent: ~2000 tokens per result
+- `cavecrew-investigator`: ~700 tokens per result
+- Over 20 delegations: **26,000 tokens saved** = entire context window of a small model
+**pi-crew application**: Worker output gets read back by Pi parent via `readFile(artifactPath)`. If workers emit caveman-style output → Pi parent can process more tasks per session before context exhaustion.
+### 3.3 Intensity Levels
+| Level | Token Savings | When to Use |
+|-------|--------------|-------------|
+| lite | ~40% | User-facing summaries, final reports |
+| full | ~65% | Inter-agent communication (default) |
+| ultra | ~75% | Internal worker → coordinator messages |
+**pi-crew application**: Add `outputStyle` to worker prompts:
+- explorer → ultra (only paths/symbols needed)
+- executor → full (some explanation needed for verification)
+- reviewer → full (findings must be clear)
+- writer → lite (user reads output directly)
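The role-to-intensity assignment above could be expressed as a small lookup table. This is a sketch only: `INTENSITY_BY_ROLE` and `intensityForRole` are hypothetical names, and falling back to `full` for unlisted roles is an assumption based on the table marking `full` as the inter-agent default.

```typescript
type Intensity = "lite" | "full" | "ultra";

// Illustrative mapping that follows the list above; the map shipped in pi-crew may differ.
const INTENSITY_BY_ROLE: Record<string, Intensity> = {
  explorer: "ultra", // only paths/symbols needed
  executor: "full",  // some explanation needed for verification
  reviewer: "full",  // findings must be clear
  writer: "lite",    // user reads output directly
};

function intensityForRole(role: string): Intensity {
  // Assumption: roles not listed use "full", the inter-agent default.
  return INTENSITY_BY_ROLE[role] ?? "full";
}
```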
+### 3.4 Auto-Clarity Rule
+Caveman drops compression for:
+- Security warnings
+- Irreversible action confirmations
+- Multi-step sequences with ambiguous ordering
+- User confusion / repeated questions
+**pi-crew application**: Worker prompts should include auto-clarity override:
+```
+Drop compression for: security findings, destructive operations,
+ambiguous multi-step instructions. Resume compression after.
+```
+### 3.5 MCP Proxy Compression (caveman-shrink)
+`caveman-shrink` wraps any MCP server, compresses `description` fields in `tools/list` responses:
+```
+Before: "This tool allows you to search for files in the filesystem..."
+After:  "Search files in filesystem."
+```
+**pi-crew application**: pi-crew's MCP proxy (`mcp-proxy.ts`) could compress tool descriptions before passing to workers, reducing input token cost per tool call.
+---
+## 4. Compression Techniques
+### 4.1 Protected Segments (from compress.js)
+```javascript
+const PROTECTED_PATTERNS = [
+  /```[\s\S]*?```/g,           // fenced code blocks
+  /`[^`\n]+`/g,                // inline code
+  /\bhttps?:\/\/\S+/gi,        // URLs
+  /\b[\w.-]*[\/\\][\w.\/\\-]+/g, // paths
+  /\b[A-Z][A-Za-z0-9]*(?:_[A-Z][A-Za-z0-9]*)+\b/g, // CONST_CASE
+  /\b\w+\.\w+(?:\.\w+)*\(\)?/g,   // dotted.method()
+  /[A-Za-z_][A-Za-z0-9_]*\s*\([^)]*\)/g, // function calls
+  /\b\d+\.\d+\.\d+\b/g,       // version numbers
+];
+```
+Process: Replace protected segments with sentinels → compress remaining prose → restore sentinels.
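The replace, compress, restore process can be sketched as follows. This is not caveman's or pi-crew's implementation: it uses only three of the protected patterns, and the sentinel format (control characters around an index) and the function name `compressWithProtection` are assumptions made for illustration.

```typescript
// Simplified subset of the protected patterns (assumption: three are enough to show the idea).
const PROTECTED: RegExp[] = [
  new RegExp("`{3}[\\s\\S]*?`{3}", "g"), // fenced code blocks
  /`[^`\n]+`/g,                           // inline code
  /\bhttps?:\/\/\S+/gi,                   // URLs
];

// Hypothetical sentinel format: control characters that should not occur in prose.
const sentinel = (i: number): string => `\u0001${i}\u0002`;

function compressWithProtection(text: string, compress: (prose: string) => string): string {
  const saved: string[] = [];
  let masked = text;
  for (const pattern of PROTECTED) {
    masked = masked.replace(pattern, (match: string) => {
      saved.push(match);
      return sentinel(saved.length - 1);
    });
  }
  let result = compress(masked);
  // Restore newest-first so a sentinel captured inside a later match is restored too.
  for (let i = saved.length - 1; i >= 0; i--) {
    result = result.split(sentinel(i)).join(saved[i]);
  }
  return result;
}
```

The `compress` callback only ever sees prose with sentinels in place of protected segments, so it cannot alter code, inline code, or URLs as long as it leaves the sentinels intact.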
+### 4.2 Prose Compression Rules
+```javascript
+// Word categories removed from prose:
+const FILLERS = ["just", "really", "basically", "actually", "simply", "quite", "very", "essentially", "literally"];
+const PLEASANTRIES = ["please", "kindly", "thank you", "thanks", "sure", "certainly", "of course"];
+const HEDGES = ["perhaps", "maybe", "might", "could potentially", "would like to", "I think"];
+const LEADERS = ["I'll", "I will", "I can", "you can", "we will", "let me", "let's"];
+const ARTICLES = ["a", "an", "the"]; // removed only before lowercase words
+// Target sentence pattern: [thing] [action] [reason]. [next step].
+```
+### 4.3 Validation (from validate.py)
+After compression, validate:
+- Heading count and order preserved
+- Code blocks byte-identical
+- URLs preserved exactly
+- File paths preserved
+- Inline code preserved
+- Bullet structure maintained (±15% tolerance)
+**pi-crew application**: When workers produce structured output, validate format before injecting into parent context. Bad format → retry with targeted fix (caveman's "cherry-pick fix" pattern).
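A structural preservation check along these lines might look like the sketch below. It covers only headings, fenced code blocks, and URLs from the checklist, compares heading text verbatim (an assumption; the checklist only requires count and order), and uses the hypothetical name `checkPreservation`, which is not the validator shipped in the package.

```typescript
interface PreservationReport {
  ok: boolean;
  problems: string[];
}

function extract(text: string, pattern: RegExp): string[] {
  return text.match(pattern) ?? [];
}

function sameList(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

function checkPreservation(original: string, compressed: string): PreservationReport {
  // Each check extracts the same kind of segment from both texts and compares in order.
  const checks: Array<[string, RegExp]> = [
    ["headings", /^#{1,6} .*$/gm],
    ["code blocks", new RegExp("`{3}[\\s\\S]*?`{3}", "g")],
    ["URLs", /\bhttps?:\/\/\S+/gi],
  ];
  const problems = checks
    .filter(([, pattern]) => !sameList(extract(original, pattern), extract(compressed, pattern)))
    .map(([name]) => `${name} changed`);
  return { ok: problems.length === 0, problems };
}
```

The `problems` list names the failed category, which is what a targeted retry would need in order to fix only the damaged part.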
+---
+## 5. Delegation Decision Matrix (from cavecrew)
+| Task | Use | Why |
+|------|-----|-----|
+| "Where is X defined" | investigator | Read-only, structured paths |
+| Same + suggestions | vanilla Explore | Need prose |
+| 1-2 file surgical edit | builder | Bounded scope |
+| 3+ file refactor | main thread | Builder refuses |
+| Review diff for bugs | reviewer | One-line findings |
+| Deep code review | vanilla Code Reviewer | Need rationale |
+**pi-crew application**: Planner agent should use similar decision matrix when assigning tasks to workers. Key rule: **if output will be consumed by another agent, compress it. If a human reads it, use normal prose.**
+---
+## 6. Security Patterns (from caveman-config.js)
+### 6.1 Symlink-Safe File I/O
+```javascript
+// Flag file write pattern:
+// 1. Check parent dir is not symlink (or resolve + verify ownership)
+// 2. Check target file is not symlink
+// 3. Write to temp file with O_NOFOLLOW | O_EXCL
+// 4. fchmod 0600
+// 5. rename temp → target (atomic)
+```
+**pi-crew application**: pi-crew's `atomic-write.ts` should adopt similar symlink guards, especially for `agents.json` (the file that caused the ghost agent bug).
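The five steps can be sketched with Node's synchronous `fs` API. This is an illustration under stated assumptions, not the code in caveman or in pi-crew's `atomic-write.ts`: `writeFileNoFollow` is a hypothetical name, only the simpler "parent is not a symlink" variant of step 1 is implemented, and `O_NOFOLLOW` is treated as optional because it is not defined on every platform.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

function writeFileNoFollow(target: string, data: string): void {
  const dir = path.dirname(target);
  // 1. Refuse a symlinked parent directory.
  if (fs.lstatSync(dir).isSymbolicLink()) throw new Error(`parent is a symlink: ${dir}`);
  // 2. Refuse a symlinked target (a missing target is fine).
  const existing = fs.lstatSync(target, { throwIfNoEntry: false });
  if (existing?.isSymbolicLink()) throw new Error(`target is a symlink: ${target}`);
  // 3. Create the temp file exclusively, without following links.
  const tmp = path.join(dir, `.${path.basename(target)}.${process.pid}.${Date.now()}.tmp`);
  const flags =
    fs.constants.O_WRONLY | fs.constants.O_CREAT | fs.constants.O_EXCL | (fs.constants.O_NOFOLLOW ?? 0);
  const fd = fs.openSync(tmp, flags, 0o600);
  let written = false;
  try {
    fs.fchmodSync(fd, 0o600); // 4. owner-only permissions regardless of umask
    fs.writeSync(fd, data);
    fs.fsyncSync(fd);
    written = true;
  } finally {
    fs.closeSync(fd);
    if (!written) fs.rmSync(tmp, { force: true }); // do not leave a partial temp file behind
  }
  // 5. Atomic replace (temp file and target share a directory, hence a filesystem).
  fs.renameSync(tmp, target);
}
```

There is still a window between the `lstat` checks and the `rename`; the checks narrow the attack surface rather than eliminate races entirely.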
+### 6.2 Sensitive File Detection (from compress.py)
+```python
+SENSITIVE_BASENAMES = [".env", ".netrc", "credentials", "secrets", "passwords", "id_rsa", "*.pem", "*.key"]
+SENSITIVE_DIRS = [".ssh", ".aws", ".gnupg", ".kube", ".docker"]
+SENSITIVE_TOKENS = ["secret", "credential", "password", "apikey", "token", "privatekey"]
+```
+**pi-crew application**: Workers should refuse to read/compress files matching these patterns. Add to worker prompt constraints.
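A denylist check over these three lists could be sketched as below. The function name `isSensitivePath` and the matching rules are assumptions for illustration: in particular, tokens are matched as substrings of the lowercased basename with `-`, `_`, and `.` removed, which is deliberately broad and will produce some false positives.

```typescript
const SENSITIVE_BASENAMES = [".env", ".netrc", "credentials", "secrets", "passwords", "id_rsa"];
const SENSITIVE_EXTENSIONS = [".pem", ".key"]; // stands in for the *.pem and *.key globs
const SENSITIVE_DIRS = [".ssh", ".aws", ".gnupg", ".kube", ".docker"];
const SENSITIVE_TOKENS = ["secret", "credential", "password", "apikey", "token", "privatekey"];

function isSensitivePath(filePath: string): boolean {
  // Split on both separator styles so Windows and POSIX paths behave the same.
  const parts = filePath.split(/[\\/]+/).filter((p) => p.length > 0);
  const base = (parts[parts.length - 1] ?? "").toLowerCase();
  const dirs = parts.slice(0, -1).map((p) => p.toLowerCase());
  if (dirs.some((d) => SENSITIVE_DIRS.includes(d))) return true;
  if (SENSITIVE_BASENAMES.includes(base)) return true;
  if (SENSITIVE_EXTENSIONS.some((ext) => base.endsWith(ext))) return true;
  // Assumption: tokens match as substrings once separators are stripped.
  const normalized = base.replace(/[-_.]/g, "");
  return SENSITIVE_TOKENS.some((token) => normalized.includes(token));
}
```

Erring on the side of refusal fits the stated goal: a worker that wrongly skips a harmless file costs little, while one that reads a credential file does not.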
+---
+## 7. Eval Methodology
+### Three-Arm Harness
+| Arm | System Prompt | Purpose |
+|-----|--------------|---------|
+| `__baseline__` | none | Raw model output |
+| `__terse__` | "Answer concisely." | Control for generic terseness |
+| `<skill>` | "Answer concisely." + SKILL.md | Isolated skill contribution |
+**Honest delta = skill vs terse, NOT skill vs baseline.**
+**pi-crew application**: When measuring worker efficiency, compare against "answer concisely" control, not against verbose baseline. This avoids claiming compression wins that are just generic terseness.
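The difference between the two comparisons is simple arithmetic, sketched here. `ArmTokens` and `compareArms` are hypothetical names, and the token counts in the usage note are invented for illustration, not measurements.

```typescript
// Token counts for one prompt under each arm of the harness.
interface ArmTokens {
  baseline: number; // no system prompt
  terse: number;    // "Answer concisely."
  skill: number;    // "Answer concisely." + SKILL.md
}

// Fraction of tokens saved when going from `from` to `to`.
function savings(from: number, to: number): number {
  return (from - to) / from;
}

function compareArms(tokens: ArmTokens): { naiveDelta: number; honestDelta: number } {
  return {
    naiveDelta: savings(tokens.baseline, tokens.skill), // skill vs verbose baseline
    honestDelta: savings(tokens.terse, tokens.skill),   // skill vs generic-terseness control
  };
}
```

With invented counts of 1000 (baseline), 600 (terse), and 300 (skill), the naive delta is 70% while the honest delta is 50%; the 20-point gap is the share that generic terseness alone would have delivered.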
+---
+## 8. Specific pi-crew Integration Plan
+### Phase 1: Structured Output Contracts ✅ DONE
+Commit `a335dfc`. Implemented `buildOutputContract(role)` in `live-session-runtime.ts`.
+Explorer, executor, reviewer, security-reviewer, verifier, writer all have structured format templates.
+### Phase 2: Prose Compression in Worker Prompts ✅ DONE
+Commit `a335dfc`. Implemented `buildCommunicationStyle(role)` with lite/full/ultra levels.
+Explorer = ultra, writer = lite, all others = full.
+### Phase 3: Tool Description Compression ✅ DONE
+Commit `pending`. Created `prose-compressor.ts` — pure TypeScript implementation of caveman's compress.js.
+Compressed custom tool descriptions (submit_result, irc).
+SDK-managed tool descriptions need Pi SDK support for mutation (documented as `compressSessionToolDescriptions` stub).
+### Phase 4: Output Validation ✅ DONE
+Created `output-validator.ts` with:
+- `validateWorkerOutput(role, output)` — checks format + structural preservation
+- `parseReviewerFindings(output)` — extracts structured findings from reviewer output
+- `parseExplorerResults(output)` — extracts structured results from explorer output
+- `validateCompressionPreservation(original, compressed)` — checks code blocks, URLs, inline code, headings
+### Phase 5: Intensity by Role ✅ DONE
+Commit `a335dfc`. `ROLE_INTENSITY` map in `live-session-runtime.ts`.
+---
+## 9. Expected Impact
+| Metric | Before | After | Improvement |
+|--------|--------|-------|-------------|
+| Avg worker output tokens | ~800 | ~300 | **62%** |
+| Parent context capacity (tasks/session) | ~15 | ~30 | **2x** |
+| Tool description tokens (input) | ~200/tool | ~80/tool | **60%** |
+| Review finding parse accuracy | ~70% | ~95% | **+25 points** |
+---
+## 10. Key Takeaways
+1. **Output contracts > compression** — structured format is the real win, not shorter prose ✅
+2. **Context budget is finite** — every worker token = one less parent token ✅
+3. **Validate, don't trust** — compress then validate structural preservation ✅
+4. **Auto-clarity > always-compress** — security/destructive = normal English ✅
+5. **Three-arm eval** — measure against "be concise" control, not verbose baseline 📋
+6. **Symlink-safe I/O** — protect predictable file paths from symlink attacks ✅
+7. **Sensitive file denylist** — never ship credentials to third-party APIs ✅
+8. **Role-based intensity** — explorer gets ultra, writer gets lite, executor gets full ✅
+9. **Tool description compression** — compress descriptions to reduce input tokens ✅ (SDK support pending)
+10. **Parse structured output** — extract findings/results from worker output ✅

package/docs/research/COMPARISON_OH_MY_PI_VS_PI_CREW.md ADDED

@@ -0,0 +1,264 @@
+# ⚖️ Architecture comparison: oh-my-pi vs pi-crew
+> Based on deep research of both codebases (oh-my-pi v14.7.3 + pi-crew HEAD)
+---
+## 1. Architecture overview
+```
+                        oh-my-pi                              pi-crew
+┌──────────────────────────────────────┐  ┌──────────────────────────────────────┐
+│          Main Process                │  │       Pi Parent Process              │
+│                                      │  │                                      │
+│  ┌────────────────────────────────┐  │  │  Pi CLI (coding agent)               │
+│  │  AgentSession (in-process)     │  │  │    │                                 │
+│  │  ├─ TaskTool → createSession() │  │  │    │ team tool → team-runner.ts       │
+│  │  │   ├─ AgentSession #1        │  │  │    │   ├─ task-runner.ts             │
+│  │  │   ├─ AgentSession #2        │  │  │    │   │   ├─ child-pi.ts → spawn() │
+│  │  │   └─ AgentSession #N        │  │  │    │   │   │   ├─ Pi child #1     │
+│  │  │                             │  │  │    │   │   │   ├─ Pi child #2     │
+│  │  ├─ EventBus (in-process)      │  │  │    │   │   │   └─ Pi child #N     │
+│  │  ├─ AgentRegistry (global)     │  │  │    │   │                              │
+│  │  └─ SessionObserverRegistry    │  │  │    │   ├─ state-store (files)        │
+│  └────────────────────────────────┘  │  │    │   ├─ manifest.json              │
+│                                      │  │    │   ├─ tasks.json                 │
+│  Everything in 1 process             │  │    │   ├─ events.jsonl               │
+│  → No IPC, no serialization          │  │    │   └─ artifacts/                 │
+│  → Direct object references          │  │    │                                  │
+│  → Real-time event streaming         │  │    │ File-based coordination           │
+└──────────────────────────────────────┘  └──────────────────────────────────────┘
+```
+---
+## 2. Detailed comparison by subsystem
+### 2.1 Execution Model
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Model** | In-process `AgentSession` | Child process `spawn("pi", ...)` |
+| **Isolation** | Shared memory, shared event loop | Process-isolated, independent event loop |
+| **Startup time** | ~ms (just object creation) | ~seconds (new Pi process boot) |
+| **Communication** | Direct method calls | stdout/stderr IPC + file artifacts |
+| **Memory** | Shared heap — agents see each other | Separate heaps — no shared state |
+| **Failure blast radius** | 1 crashed agent → potential process crash | 1 crashed child → parent unaffected |
+| **Concurrency** | `Semaphore` + `mapWithConcurrencyLimit` | `mapConcurrent` + `resolveBatchConcurrency` |
+| **Model fallback** | Per-agent `model[]` patterns | `buildConfiguredModelRouting` with candidates loop |
+**pi-crew advantage**: Process isolation; a crashed worker does not affect the parent.
+**oh-my-pi advantage**: Shared memory — zero IPC overhead, direct event streaming, IRC messaging.
+### 2.2 Subagent Lifecycle
+```
+oh-my-pi:                           pi-crew:
+pending → running → completed       queued → running → completed
+                    ↘ failed                         ↘ failed
+                    ↘ aborted                         ↘ cancelled
+                                                      ↘ waiting (mailbox)
+                                                      ↘ skipped
+```
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Entry** | `TaskTool.execute()` | `team tool run` → `team-runner.ts` |
+| **Discovery** | `discoverAgents()` — bundled + .md files | `discoverAgents()` — agents/ dir + .md files |
+| **Definition format** | YAML frontmatter in .md | YAML frontmatter in .md |
+| **Output submission** | **`yield` tool** (enforced, 3 retries) | **exit code + stdout** (parsed post-hoc) |
+| **Recursion control** | `maxRecursionDepth` + `spawns[]` | `maxTaskDepth` env var |
+| **Adaptive planning** | N/A | **Adaptive plan injection** — planner dynamically creates tasks |
+| **Retry** | N/A (yield reminder only) | `executeWithRetry` — configurable retry policy |
+| **Policy engine** | N/A | `evaluateCrewPolicy` + recovery ledger |
+| **Plan approval** | N/A | `planApproval` flow for implementation workflow |
+| **Effectiveness guard** | N/A | `evaluateRunEffectiveness` — severity levels |
+**pi-crew advantages**: Retry policy, adaptive planning, policy engine, plan approval, effectiveness guards.
+**oh-my-pi advantages**: Yield enforcement (structured output), spawns[] recursion control.
+### 2.3 Inter-Subagent Communication
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Primary mechanism** | **IRC tool** — peer-to-peer messaging | **Mailbox** — async message queue |
+| **Registry** | `AgentRegistry.global()` — process singleton | `manifest.json` + `crew-agent-records.json` |
+| **Addressing** | Agent ID (`"0-Main"`, `"3-explore-abc"`) | Task ID (`"01_discover"`, `"02_plan"`) |
+| **Reply mechanism** | `respondAsBackground()` — ephemeral side-channel | `respond` team tool action |
+| **Anti-deadlock** | Side-channel doesn't block recipient's main loop | N/A — mailbox is fire-and-forget |
+| **Broadcast** | `irc({ op: "send", to: "all" })` | No broadcast |
+| **Visibility** | `listVisibleTo()` — all running/idle agents | `status` team tool — shows all tasks |
+| **Event channels** | 3 dedicated channels (event, progress, lifecycle) | 1 `task.progress` event (coalesced) |
+| **Steering** | `agent.steer()` — inject message mid-turn | `cancel` + `respond` team tool actions |
+| **Context sharing** | `context.md` file + `contextFiles[]` | `dependencyContext` + `task-output-context.ts` |
+**oh-my-pi advantages**: Real-time IRC, anti-deadlock side-channel, broadcast, steering mid-turn.
+**pi-crew advantages**: Async mailbox (persists to disk), dependency context (auto-collects upstream outputs), more coordination patterns via team tool.
+### 2.4 Progress Tracking
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Event source** | `AgentEvent` subscription (in-process) | JSON lines from child stdout + transcript.jsonl |
+| **Debounce** | 150ms coalescing | 500ms agent record + 1000ms progress event |
+| **Tracked data** | toolName, toolArgs, tokens, recentOutput (8 lines), intent | toolName, toolCount, tokens, recentOutput (20 lines), usage |
+| **Heartbeat** | N/A (shared process = instant status) | `worker-heartbeat.ts` — file-based heartbeat |
+| **Attention detection** | N/A | `agent-control.ts` — `needs_attention`, `long_running`, consecutive failures |
+| **Crash recovery** | N/A | `crash-recovery.ts`, `stale-reconciler.ts`, `overflow-recovery.ts` |
+| **Deadletter** | N/A | `deadletter.ts` — tracks permanently failed tasks |
+**oh-my-pi advantages**: Real-time events (no file polling needed), 150ms fast updates.
+**pi-crew advantages**: Crash recovery, stale reconciliation, attention detection, deadletter — much more robust for unreliable environments.
+### 2.5 UI Rendering
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Main display** | `SessionObserverOverlay` — full transcript viewer | `RunDashboard` — multi-pane dashboard |
+| **Progress bar** | `statusLine` with subagent count | `powerbar-publisher.ts` — segment-based |
+| **Transcript** | Incremental JSONL reading, expand/collapse per entry | `transcript-viewer.ts` — syntax-highlighted, diff rendering |
+| **Agent config UI** | `AgentDashboard` (1120 lines) — two-column agent manager | N/A (config via YAML files) |
+| **Dashboard panes** | N/A (single overlay) | 7 panes: agents, progress, mailbox, health, metrics, capability, transcript |
+| **Anti-flicker** | 150ms progress coalesce, viewport-only render | `file-coalescer.ts` (200ms TTL), `render-scheduler.ts` |
+| **Snapshot cache** | N/A (in-process = instant) | `run-snapshot-cache.ts` (777 lines) — file mtime-based cache |
+| **Live streaming** | `message_update` events (text_delta) in real-time | JSON stdout line parsing (batched) |
+**oh-my-pi advantages**: Real-time streaming, entry-based expand/collapse, agent configuration UI.
+**pi-crew advantages**: Richer dashboard (7 panes), syntax highlighting, diff rendering, snapshot caching for multiple runs.
+### 2.6 Tool Access Control
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Mechanism** | `agent.tools[]` in frontmatter → passed to `createAgentSession` | `permissionForRole()` → read_only vs read_write |
+| **Granularity** | Per-agent tool whitelist | Per-role permission level |
+| **MCP proxy** | `createMCPProxyTools()` — reuse parent's connections | N/A |
+| **Plan mode** | Restrict to `["read", "search", "find", "lsp", "web_search"]` | `permissionForRole("planner") === "read_only"` |
+| **LoadMode** | `"essential"` vs `"discoverable"` per tool | N/A (just added `toolGuidanceBlock`) |
+| **Recursion tool** | Auto-add `"task"` tool when `spawns` defined | N/A (no subagent spawning from workers) |
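The difference in granularity can be sketched side by side. This is an illustration, not either project's code. Only the fact that the planner role maps to `"read_only"` and the plan-mode tool list come from the table; the other read-only roles, the `AgentDef` shape, and both function names are assumptions.

```typescript
type Permission = "read_only" | "read_write";

// Per-role model (pi-crew style): a role maps to one coarse permission level.
// Only "planner" is confirmed by the table; the other entries are assumed.
const READ_ONLY_ROLES: string[] = ["planner", "reviewer", "explorer"];

function permissionForRoleSketch(role: string): Permission {
  return READ_ONLY_ROLES.indexOf(role) !== -1 ? "read_only" : "read_write";
}

// Per-agent model (oh-my-pi style): each agent lists the exact tools it may use.
interface AgentDef {
  name: string;
  tools: string[];
}

function isToolAllowed(agent: AgentDef, tool: string): boolean {
  return agent.tools.indexOf(tool) !== -1;
}

// Plan mode restricted to the tool list quoted in the table above.
const planModeAgent: AgentDef = {
  name: "planner",
  tools: ["read", "search", "find", "lsp", "web_search"],
};
```

The whitelist can express "may use `lsp` but not `web_search`", which a two-level permission cannot.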
+### 2.7 Isolation & Merge
+| | oh-my-pi | pi-crew |
+|---|---------|---------|
+| **Isolation modes** | worktree, fuse-overlay, fuse-projfs | worktree only |
+| **Merge modes** | patch, branch | patch (auto-captured) |
+| **Commit style** | AI-generated or simple | N/A |
+| **Nested repos** | `NestedRepoPatch` support | N/A |
+---
+## 3. Feature Matrix
+| Feature | oh-my-pi | pi-crew |
+|---------|:--------:|:-------:|
+| In-process execution | ✅ | ❌ (child process) |
+| Process isolation | ❌ | ✅ |
+| Yield tool enforcement | ✅ | ❌ |
+| IRC messaging | ✅ | ❌ (mailbox only) |
+| Broadcast messaging | ✅ | ❌ |
+| Steering mid-turn | ✅ | ❌ (cancel/respond only) |
+| Anti-deadlock side-channel | ✅ | ❌ |
+| Real-time event streaming | ✅ | ❌ (file-based) |
+| Adaptive planning | ❌ | ✅ |
+| Retry policy | ❌ | ✅ |
+| Policy engine | ❌ | ✅ |
+| Plan approval flow | ❌ | ✅ |
+| Effectiveness guard | ❌ | ✅ |
+| Crash recovery | ❌ | ✅ |
+| Stale reconciliation | ❌ | ✅ |
+| Deadletter tracking | ❌ | ✅ |
+| Attention detection | ❌ | ✅ |
+| Mailbox (async) | ❌ | ✅ |
+| Dependency context | ❌ | ✅ |
+| Multi-run dashboard | ❌ | ✅ |
+| Syntax highlighting | ❌ | ✅ |
+| Diff rendering | ❌ | ✅ |
+| Snapshot caching | ❌ | ✅ |
+| Agent configuration UI | ✅ | ❌ |
+| MCP proxy tools | ✅ | ❌ |
+| Worktree isolation | ✅ | ✅ |
+| FUSE/ProjFS isolation | ✅ | ❌ |
+| Branch-based merge | ✅ | ❌ |
+---
+## 4. Gap Analysis: What pi-crew Lacks
+### Gap 1: Real-time Event Streaming (HIGH)
+- **oh-my-pi**: In-process EventBus → events arrive in <1ms
+- **pi-crew**: File-based (write manifest → poll) → 500-1000ms latency
+- **Impact**: UI flickers and feels choppy; progress updates arrive late
+- **Solution path**: WebSocket/pipe from child Pi → parent, or use Pi's JSON event stream directly
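Consuming a JSON event stream directly from the child's stdout mostly comes down to buffering partial lines correctly, because a chunk can end in the middle of an event. The sketch below shows that buffering under stated assumptions: `JsonLineParser` and the `WorkerEvent` shape are hypothetical, and the real schema of Pi's events is not specified in this document.

```typescript
// Hypothetical event shape; only the presence of a string `type` is assumed.
interface WorkerEvent {
  type: string;
  [key: string]: unknown;
}

// Accumulates stdout chunks and emits one parsed event per complete line.
class JsonLineParser {
  private buffer = "";

  // Feed a raw chunk; returns every event completed by this chunk.
  push(chunk: string): WorkerEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line, or "" if the chunk ended in "\n".
    this.buffer = lines.pop() ?? "";
    const events: WorkerEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        const parsed: unknown = JSON.parse(trimmed);
        if (
          typeof parsed === "object" &&
          parsed !== null &&
          typeof (parsed as { type?: unknown }).type === "string"
        ) {
          events.push(parsed as WorkerEvent);
        }
      } catch {
        // Non-JSON output (for example plain log lines) is skipped, not fatal.
      }
    }
    return events;
  }
}
```

Each call to `push` can be forwarded to the UI immediately, which removes the write-manifest-then-poll round trip described above.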
+### Gap 2: Structured Output (MEDIUM)
+- **oh-my-pi**: `yield` tool enforces structured output with JTD schema
+- **pi-crew**: Parse stdout + transcript post-hoc
+- **Impact**: Fragile output parsing, no schema validation
+- **Solution path**: Add output schema support to task packets, or use exit code conventions
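Adding output schema support to task packets could start as small as the check below. It is a sketch under stated assumptions: the `OutputSchema` format is invented for illustration and is far simpler than JTD, and `validateOutput` is a hypothetical helper, not an existing pi-crew function.

```typescript
type FieldType = "string" | "number" | "boolean";

// Hypothetical, deliberately tiny schema: field name -> required primitive type.
type OutputSchema = { [field: string]: FieldType };

interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Parse a worker's final output and check it against the schema.
function validateOutput(raw: string, schema: OutputSchema): ValidationResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, errors: ["output must be a JSON object"] };
  }
  const record = value as { [key: string]: unknown };
  const errors: string[] = [];
  for (const field of Object.keys(schema)) {
    const actual = typeof record[field];
    if (actual !== schema[field]) {
      errors.push(`${field}: expected ${schema[field]}, got ${actual}`);
    }
  }
  return { ok: errors.length === 0, errors };
}
```

A failed validation gives the orchestrator a concrete reason to retry or escalate, instead of discovering a malformed result downstream.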
+### Gap 3: Inter-Worker Communication (MEDIUM)
+- **oh-my-pi**: IRC tool + AgentRegistry + side-channel
+- **pi-crew**: Mailbox (fire-and-forget) + dependency context (read-only)
+- **Impact**: Workers can't coordinate in real-time
+- **Solution path**: Enhanced mailbox with reply support, or IPC bridge
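A mailbox with reply support needs little more than a message id and a `replyTo` link. The in-memory sketch below illustrates the data model only. The `Mailbox` class and `MailMessage` shape are assumptions, and pi-crew's actual mailbox is described elsewhere in this document as fire-and-forget.

```typescript
interface MailMessage {
  id: number;
  from: string;
  to: string;
  body: string;
  replyTo?: number; // id of the message this one answers, if any
}

// Hypothetical in-memory mailbox; a real one would persist messages.
class Mailbox {
  private messages: MailMessage[] = [];
  private nextId = 1;

  send(from: string, to: string, body: string, replyTo?: number): MailMessage {
    const msg: MailMessage = { id: this.nextId++, from, to, body };
    if (replyTo !== undefined) msg.replyTo = replyTo;
    this.messages.push(msg);
    return msg;
  }

  // Reply to an existing message; the reply is addressed back to its sender.
  reply(originalId: number, body: string): MailMessage {
    const original = this.messages.filter((m) => m.id === originalId)[0];
    if (!original) throw new Error(`unknown message ${originalId}`);
    return this.send(original.to, original.from, body, original.id);
  }

  inbox(worker: string): MailMessage[] {
    return this.messages.filter((m) => m.to === worker);
  }

  repliesTo(id: number): MailMessage[] {
    return this.messages.filter((m) => m.replyTo === id);
  }
}
```

With `repliesTo`, a worker that asked a question can poll for its answer instead of relying on one-way delivery.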
+### Gap 4: Steering/Cancellation Granularity (LOW)
+- **oh-my-pi**: `steer()` injects messages mid-turn, `interruptMode: "immediate"`
+- **pi-crew**: `cancel` kills child process, `respond` adds to mailbox
+- **Impact**: Can't course-correct a running worker without killing it
+- **Solution path**: Pi's native `steer` support (if exposed via CLI)
+### Gap 5: Agent Configuration UI (LOW)
+- **oh-my-pi**: Full `AgentDashboard` — enable/disable, model override, AI agent creation
+- **pi-crew**: Edit YAML files manually
+- **Impact**: Poor UX for agent management
+- **Solution path**: Build a similar dashboard component in pi-crew
+---
+## 5. Gap Analysis: What oh-my-pi Lacks (and pi-crew Has)
+### pi-crew Advantage 1: Process Isolation
+A crashed worker leaves the parent process unaffected, which is critical for production reliability.
+### pi-crew Advantage 2: Adaptive Planning
+`implementation` workflow dynamically injects tasks based on planner output. No equivalent in oh-my-pi.
+### pi-crew Advantage 3: Robustness Layer
+- Retry policy with backoff
+- Crash recovery + stale reconciliation
+- Deadletter tracking
+- Effectiveness guard with severity levels
+- Policy engine with block/escalate/notify
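How retry with backoff and deadletter tracking fit together can be sketched as a single decision function. This is an assumption-laden illustration: the doubling schedule, the default limits, and the names `backoffDelayMs` and `decideRetry` are hypothetical, and pi-crew's real retry policy is not specified in this document.

```typescript
// Delay before retry number `attempt` (0-based), doubling each time up to a cap.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "deadletter" };

// `failures` is how many times the task has failed so far (1 or more).
// Once it exceeds `maxRetries`, the task is treated as permanently failed.
function decideRetry(failures: number, maxRetries: number = 3): RetryDecision {
  if (failures > maxRetries) return { action: "deadletter" };
  return { action: "retry", delayMs: backoffDelayMs(failures - 1) };
}
```

Under these defaults a task is retried after 500ms, 1000ms, and 2000ms, and moves to the deadletter list on its fourth failure.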
+### pi-crew Advantage 4: Dependency Context
+Auto-collects upstream task outputs and feeds them to downstream tasks. oh-my-pi only shares `context.md`.
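Collecting upstream outputs for a downstream task can be sketched as a lookup over the task graph. The `TaskResult` shape, the `collectDependencyContext` function, and the section heading format are assumptions for illustration, not pi-crew's actual implementation.

```typescript
// Hypothetical task record for illustration.
interface TaskResult {
  id: string;
  dependsOn: string[];
  output?: string; // set once the task has finished
}

// Build the context text for one task from the outputs of its direct dependencies.
function collectDependencyContext(taskId: string, tasks: TaskResult[]): string {
  // Object.create(null) avoids accidental hits on inherited keys such as "constructor".
  const byId: { [id: string]: TaskResult } = Object.create(null);
  for (const t of tasks) byId[t.id] = t;

  const task = byId[taskId];
  if (!task) throw new Error(`unknown task ${taskId}`);

  const sections: string[] = [];
  for (const depId of task.dependsOn) {
    const dep = byId[depId];
    // Dependencies that are missing or unfinished contribute nothing.
    if (dep && dep.output !== undefined) {
      sections.push(`## Output of ${depId}\n${dep.output}`);
    }
  }
  return sections.join("\n\n");
}
```

Only direct dependencies are included here; a transitive variant would walk `dependsOn` recursively and need cycle protection.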
+### pi-crew Advantage 5: Rich Dashboard
+7 specialized panes vs oh-my-pi's single overlay. Better for monitoring multiple parallel runs.
+---
+## 6. Conclusion
+| Aspect | Winner | Reason |
+|--------|--------|--------|
+| **Execution speed** | oh-my-pi | In-process, zero IPC |
+| **Reliability** | pi-crew | Process isolation + crash recovery |
+| **Communication** | oh-my-pi | IRC + side-channel + steering |
+| **Coordination** | pi-crew | Adaptive planning + dependency context |
+| **UI richness** | pi-crew | 7 dashboard panes + syntax highlighting |
+| **UI responsiveness** | oh-my-pi | Real-time events + 150ms coalescing |
+| **Robustness** | pi-crew | Retry + deadletter + effectiveness guard |
+| **Tool control** | oh-my-pi | Per-agent whitelist + MCP proxy |
+| **Configuration UX** | oh-my-pi | Agent dashboard with AI creation |
+**In summary**: pi-crew is strong on **reliability and coordination** and weak on **real-time responsiveness and inter-worker communication**. oh-my-pi is strong on **speed and real-time behavior** and weak on **robustness and fault tolerance**.
+**Priority improvements for pi-crew**:
+1. 🔴 Real-time event streaming → reduces UI flicker
+2. 🟡 Structured output enforcement → reduces parsing fragility
+3. 🟡 Inter-worker communication → increases coordination capability
+4. 🟢 Agent configuration UI → improves UX
+5. 🟢 Steering granularity → increases control fidelity