joonecli 0.1.0 → 0.2.0

Files changed (184)
  1. package/README.md +12 -12
  2. package/dist/__tests__/optimizations.test.js.map +1 -1
  3. package/dist/__tests__/promptBuilder.test.js +14 -20
  4. package/dist/__tests__/promptBuilder.test.js.map +1 -1
  5. package/dist/agents/agentRegistry.d.ts +37 -0
  6. package/dist/agents/agentRegistry.js +58 -0
  7. package/dist/agents/agentRegistry.js.map +1 -0
  8. package/dist/agents/agentSpec.d.ts +54 -0
  9. package/dist/agents/agentSpec.js +9 -0
  10. package/dist/agents/agentSpec.js.map +1 -0
  11. package/dist/agents/builtinAgents.d.ts +20 -0
  12. package/{src/agents/builtinAgents.ts → dist/agents/builtinAgents.js} +84 -101
  13. package/dist/agents/builtinAgents.js.map +1 -0
  14. package/dist/cli/config.d.ts +4 -0
  15. package/dist/cli/config.js.map +1 -1
  16. package/dist/cli/index.js +29 -2
  17. package/dist/cli/index.js.map +1 -1
  18. package/dist/cli/postinstall.d.ts +2 -0
  19. package/dist/cli/postinstall.js +25 -0
  20. package/dist/cli/postinstall.js.map +1 -0
  21. package/dist/commands/builtinCommands.d.ts +21 -0
  22. package/dist/commands/builtinCommands.js +241 -0
  23. package/dist/commands/builtinCommands.js.map +1 -0
  24. package/dist/commands/commandRegistry.d.ts +92 -0
  25. package/dist/commands/commandRegistry.js +128 -0
  26. package/dist/commands/commandRegistry.js.map +1 -0
  27. package/dist/core/agentLoop.d.ts +7 -2
  28. package/dist/core/agentLoop.js +35 -13
  29. package/dist/core/agentLoop.js.map +1 -1
  30. package/dist/core/autoSave.d.ts +41 -0
  31. package/dist/core/autoSave.js +69 -0
  32. package/dist/core/autoSave.js.map +1 -0
  33. package/dist/core/compactor.d.ts +66 -0
  34. package/dist/core/compactor.js +170 -0
  35. package/dist/core/compactor.js.map +1 -0
  36. package/dist/core/contextGuard.d.ts +38 -0
  37. package/dist/core/contextGuard.js +122 -0
  38. package/dist/core/contextGuard.js.map +1 -0
  39. package/dist/core/events.d.ts +45 -0
  40. package/dist/core/events.js +8 -0
  41. package/dist/core/events.js.map +1 -0
  42. package/dist/core/promptBuilder.d.ts +16 -1
  43. package/dist/core/promptBuilder.js +27 -14
  44. package/dist/core/promptBuilder.js.map +1 -1
  45. package/dist/core/sessionResumer.js +3 -3
  46. package/dist/core/sessionResumer.js.map +1 -1
  47. package/dist/core/sessionStore.js +3 -2
  48. package/dist/core/sessionStore.js.map +1 -1
  49. package/dist/core/subAgent.d.ts +56 -0
  50. package/dist/core/subAgent.js +240 -0
  51. package/dist/core/subAgent.js.map +1 -0
  52. package/dist/core/tokenCounter.d.ts +8 -1
  53. package/dist/core/tokenCounter.js +28 -0
  54. package/dist/core/tokenCounter.js.map +1 -1
  55. package/dist/debug_google.d.ts +1 -0
  56. package/dist/debug_google.js +23 -0
  57. package/dist/debug_google.js.map +1 -0
  58. package/dist/middleware/permission.js +1 -0
  59. package/dist/middleware/permission.js.map +1 -1
  60. package/dist/test_google.d.ts +1 -0
  61. package/dist/test_google.js +32 -89
  62. package/dist/test_google.js.map +1 -0
  63. package/dist/tools/browser.js +4 -1
  64. package/dist/tools/browser.js.map +1 -1
  65. package/dist/tools/index.d.ts +2 -1
  66. package/dist/tools/index.js +11 -3
  67. package/dist/tools/index.js.map +1 -1
  68. package/dist/tools/installHostDeps.d.ts +2 -0
  69. package/dist/tools/installHostDeps.js +37 -0
  70. package/dist/tools/installHostDeps.js.map +1 -0
  71. package/dist/tools/router.js +3 -0
  72. package/dist/tools/router.js.map +1 -1
  73. package/dist/tools/spawnAgent.d.ts +19 -0
  74. package/dist/tools/spawnAgent.js +132 -0
  75. package/dist/tools/spawnAgent.js.map +1 -0
  76. package/dist/tracing/sessionTracer.d.ts +1 -0
  77. package/dist/tracing/sessionTracer.js +4 -1
  78. package/dist/tracing/sessionTracer.js.map +1 -1
  79. package/dist/ui/App.js +94 -6
  80. package/dist/ui/App.js.map +1 -1
  81. package/dist/ui/components/ActionLog.d.ts +7 -0
  82. package/dist/ui/components/ActionLog.js +63 -0
  83. package/dist/ui/components/ActionLog.js.map +1 -0
  84. package/dist/ui/components/FileBrowser.d.ts +2 -0
  85. package/dist/ui/components/FileBrowser.js +41 -0
  86. package/dist/ui/components/FileBrowser.js.map +1 -0
  87. package/package.json +5 -6
  88. package/AGENTS.md +0 -56
  89. package/Handover.md +0 -115
  90. package/PROGRESS.md +0 -160
  91. package/docs/01_insights_and_patterns.md +0 -27
  92. package/docs/02_edge_cases_and_mitigations.md +0 -143
  93. package/docs/03_initial_implementation_plan.md +0 -66
  94. package/docs/04_tech_stack_proposal.md +0 -20
  95. package/docs/05_prd.md +0 -87
  96. package/docs/06_user_stories.md +0 -72
  97. package/docs/07_system_architecture.md +0 -138
  98. package/docs/08_roadmap.md +0 -200
  99. package/e2b/Dockerfile +0 -26
  100. package/src/__tests__/bootstrap.test.ts +0 -111
  101. package/src/__tests__/config.test.ts +0 -97
  102. package/src/__tests__/m55.test.ts +0 -238
  103. package/src/__tests__/middleware.test.ts +0 -219
  104. package/src/__tests__/modelFactory.test.ts +0 -63
  105. package/src/__tests__/optimizations.test.ts +0 -201
  106. package/src/__tests__/promptBuilder.test.ts +0 -141
  107. package/src/__tests__/sandbox.test.ts +0 -102
  108. package/src/__tests__/security.test.ts +0 -122
  109. package/src/__tests__/streaming.test.ts +0 -82
  110. package/src/__tests__/toolRouter.test.ts +0 -52
  111. package/src/__tests__/tools.test.ts +0 -146
  112. package/src/__tests__/tracing.test.ts +0 -196
  113. package/src/agents/agentRegistry.ts +0 -69
  114. package/src/agents/agentSpec.ts +0 -67
  115. package/src/cli/config.ts +0 -124
  116. package/src/cli/index.ts +0 -730
  117. package/src/cli/modelFactory.ts +0 -174
  118. package/src/cli/providers.ts +0 -107
  119. package/src/commands/builtinCommands.ts +0 -293
  120. package/src/commands/commandRegistry.ts +0 -194
  121. package/src/core/agentLoop.d.ts.map +0 -1
  122. package/src/core/agentLoop.ts +0 -312
  123. package/src/core/autoSave.ts +0 -95
  124. package/src/core/compactor.ts +0 -252
  125. package/src/core/contextGuard.ts +0 -129
  126. package/src/core/errors.ts +0 -202
  127. package/src/core/promptBuilder.d.ts.map +0 -1
  128. package/src/core/promptBuilder.ts +0 -139
  129. package/src/core/reasoningRouter.ts +0 -121
  130. package/src/core/retry.ts +0 -75
  131. package/src/core/sessionResumer.ts +0 -90
  132. package/src/core/sessionStore.ts +0 -215
  133. package/src/core/subAgent.ts +0 -339
  134. package/src/core/tokenCounter.ts +0 -64
  135. package/src/evals/dataset.ts +0 -67
  136. package/src/evals/evaluator.ts +0 -81
  137. package/src/hitl/bridge.ts +0 -160
  138. package/src/middleware/commandSanitizer.ts +0 -60
  139. package/src/middleware/loopDetection.ts +0 -63
  140. package/src/middleware/permission.ts +0 -72
  141. package/src/middleware/pipeline.ts +0 -75
  142. package/src/middleware/preCompletion.ts +0 -94
  143. package/src/middleware/types.ts +0 -45
  144. package/src/sandbox/bootstrap.ts +0 -121
  145. package/src/sandbox/manager.ts +0 -239
  146. package/src/sandbox/sync.ts +0 -157
  147. package/src/skills/loader.ts +0 -143
  148. package/src/skills/tools.ts +0 -99
  149. package/src/skills/types.ts +0 -13
  150. package/src/test_cache.ts +0 -72
  151. package/src/test_google.js +0 -40
  152. package/src/test_google.ts +0 -40
  153. package/src/tools/askUser.ts +0 -47
  154. package/src/tools/browser.ts +0 -137
  155. package/src/tools/index.d.ts.map +0 -1
  156. package/src/tools/index.ts +0 -237
  157. package/src/tools/registry.ts +0 -198
  158. package/src/tools/router.ts +0 -78
  159. package/src/tools/security.ts +0 -220
  160. package/src/tools/spawnAgent.ts +0 -158
  161. package/src/tools/webSearch.ts +0 -142
  162. package/src/tracing/analyzer.ts +0 -265
  163. package/src/tracing/langsmith.ts +0 -63
  164. package/src/tracing/sessionTracer.ts +0 -202
  165. package/src/tracing/types.ts +0 -49
  166. package/src/types/valyu.d.ts +0 -37
  167. package/src/ui/App.tsx +0 -404
  168. package/src/ui/components/HITLPrompt.tsx +0 -119
  169. package/src/ui/components/Header.tsx +0 -51
  170. package/src/ui/components/MessageBubble.tsx +0 -46
  171. package/src/ui/components/StatusBar.tsx +0 -138
  172. package/src/ui/components/StreamingText.tsx +0 -48
  173. package/src/ui/components/ToolCallPanel.tsx +0 -80
  174. package/tests/commands/commands.test.ts +0 -356
  175. package/tests/core/compactor.test.ts +0 -217
  176. package/tests/core/retryAndErrors.test.ts +0 -164
  177. package/tests/core/sessionResumer.test.ts +0 -95
  178. package/tests/core/sessionStore.test.ts +0 -84
  179. package/tests/core/stability.test.ts +0 -165
  180. package/tests/core/subAgent.test.ts +0 -238
  181. package/tests/hitl/hitlBridge.test.ts +0 -115
  182. package/tsconfig.json +0 -16
  183. package/vitest.config.ts +0 -10
  184. package/vitest.out +0 -48
package/package.json CHANGED
@@ -1,18 +1,17 @@
  {
    "name": "joonecli",
-   "version": "0.1.0",
+   "version": "0.2.0",
    "description": "An autonomous coding agent",
+   "files": ["dist"],
    "main": "dist/cli/index.js",
    "bin": {
-     "joone": "./dist/cli/index.js"
-   },
-   "directories": {
-     "doc": "docs"
+     "joone": "dist/cli/index.js"
    },
    "scripts": {
      "build": "tsc",
      "test": "vitest run",
-     "test:watch": "vitest"
+     "test:watch": "vitest",
+     "postinstall": "node ./dist/cli/postinstall.js"
    },
    "repository": {
      "type": "git",
package/AGENTS.md DELETED
@@ -1,56 +0,0 @@
- # Welcome to the Agentic Coding Project
-
- You are an autonomous AI Agent contributing to this codebase. To ensure consistency with the established architectural patterns (Prompt Caching + Harness Engineering), you **MUST** review the foundational documents in the `docs/` directory before proposing new features or modifying the core loop.
-
- ## Required Reading List
-
- Before tackling complex tasks related to the context engine, middleware, or tools, please reference:
-
- - `Handover.md`: **MUST READ FIRST if this is a new session.** Contains all key architectural decisions and current project state.
- - `docs/01_insights_and_patterns.md`: The core thesis of the project (Prefix caching rules, Middleware hooks).
- - `docs/02_edge_cases_and_mitigations.md`: What _not_ to do (e.g., Leaky timestamps, Mid-session model switches).
- - `docs/07_system_architecture.md`: The REPL execution graph.
-
- ## Development Process: Red-Green-Refactor TDD
-
- > **CRITICAL:** This project follows a strict **Test-Driven Development (TDD)** workflow using the **Red-Green-Refactor** cycle.
-
- ### TDD Skills (MUST READ)
-
- Before writing **any** production code, you **MUST** load and follow the TDD skill instructions:
-
- 1. **Primary:** `C:\Users\Lenovo\.agents\skills\tdd\SKILL.md` — Covers vertical slicing (tracer bullets), behavior-driven testing, and anti-patterns.
- 2. **Extended:** `C:\Users\Lenovo\.agents\skills\test-driven-development\SKILL.md` — The "Iron Law": no production code without a failing test first. Includes rationalizations to watch for and a verification checklist.
-
- ### The Cycle
-
- 1. **RED** — Write a failing test first. The test defines expected behavior.
- 2. **GREEN** — Write the minimum production code to make the failing test pass.
- 3. **REFACTOR** — Clean up both test and production code while keeping all tests green.
-
- ### Rules
-
- - **Vertical slices only.** One test → one implementation → repeat. Never write all tests first.
- - **No production code without a failing test.** Code written before the test? Delete it. Start over.
- - **Never refactor while RED.** Get to GREEN first.
-
- **Test Runner:** Vitest (`npx vitest` or `npm test`).
-
- ## Tracking Progress
-
- Any time you complete a significant milestone, you **must**:
-
- 1. Append a summary of your actions and the current state of the project to `PROGRESS.md` in the project root.
- 2. Update `Handover.md` to reflect any new architectural decisions, tool additions, or shifts in the project state.
-
- This ensures the next agent or human developer knows exactly where to pick up and why decisions were made.
-
- ## API Key Management
-
- > **CRITICAL:** When implementing a **new tool** that requires an API key or token, you **must**:
- >
- > 1. Add the key field to `JooneConfig` in `src/cli/config.ts`.
- > 2. Add a password prompt for it in the `joone config` onboarding flow in `src/cli/index.ts` (under the "Optional Service Keys" section).
- > 3. Ensure the key is included in the `newConfig` object that gets saved.
- >
- > All service keys (except the primary LLM provider key) should be **optional** — the user can press Enter to skip. Tools should gracefully degrade when their API key is not configured.
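The API key rule in the deleted `AGENTS.md` (optional service keys, graceful degradation) can be illustrated with a minimal sketch. This is not the package's code: `JooneConfigSketch` and `webSearchSketch` are hypothetical names, and only the `valyuApiKey` field name comes from the deleted `PROGRESS.md`.

```typescript
// Hypothetical sketch of "optional service key + graceful degradation".
// The real JooneConfig lived in src/cli/config.ts, which this release no longer ships.
interface JooneConfigSketch {
  provider: string;
  apiKey: string;       // primary LLM provider key (required)
  valyuApiKey?: string; // optional service key; the user may skip it during onboarding
}

// A tool that needs the optional key returns a readable message instead of throwing
// when the key is missing, so the agent can carry on without that tool.
function webSearchSketch(config: JooneConfigSketch, query: string): string {
  if (!config.valyuApiKey) {
    return "web_search unavailable: no Valyu API key configured (run 'joone config' to add one).";
  }
  // Placeholder for the real search call, which is out of scope for this sketch.
  return "would search for: " + query;
}
```

Returning a message rather than raising an error keeps a missing optional key from aborting the agent loop, which appears to be the intent of the rule.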
package/Handover.md DELETED
@@ -1,115 +0,0 @@
- # Joone Agent: Handover & Architecture Document
-
- This document serves as a comprehensive handover note for future agents or engineering sessions. It captures the core architectural decisions, the current state of the project, and the rationale behind the implementation of the `joone` AI coding assistant.
-
- ## 1. Project Overview & Tech Stack
-
- **Goal:** Build a secure, terminal-based AI coding assistant that executes code in an isolated sandbox while manipulating local project files, providing a premium developer experience.
-
- - **Runtime:** Node.js (v20+)
- - **Module System:** ESM (`"type": "module"`)
- - **Language:** TypeScript (`NodeNext` resolution)
- - **Terminal UI (TUI):** Ink v6 (React for CLI) + Clack (prompts/onboarding)
- - **Primary Sandbox:** E2B
- - **Fallback Sandbox:** OpenSandbox (Local Docker)
- - **AI SDKs:** `@langchain/anthropic`, `@langchain/openai`, `@google/genai` (planned/flexible via factory)
- - **Testing:** Vitest (TDD methodology strictly enforced)
- - **Testing:** Vitest (TDD methodology strictly enforced)
-
- ---
-
- ## 2. Key Architectural Decisions
-
- ### 2.1 The "Hybrid" Execution Model
-
- - **Decision:** Split tool execution between the **Host** machine and the **Sandbox**.
- - **Rationale:** We want the user to see file changes happen live in their IDE (Host), but we strictly do not want to run untrusted shell commands or install random dependencies on the user's machine (Sandbox).
- - **Implementation (`src/tools/router.ts` & `src/sandbox/manager.ts`):**
- - **Host Routing (`HOST_TOOLS`):** `read_file`, `write_file`, `search_tools`.
- - **Sandbox Routing (`SANDBOX_TOOLS`):** `bash`, `run_tests`, `install_deps`, `security_scan`, `dep_scan`.
- - **Wrapper Architecture:** The `SandboxManager` uses an `ISandboxWrapper`. It attempts to connect to a primary **E2B** cloud sandbox. If initialization fails (e.g., API key error, network timeout), it gracefully defaults to a robust local **OpenSandbox** deployment (`localhost:8080`).
- - _Safe-by-default logic:_ Any unknown tool request is routed to the sandbox.
-
- ### 2.2 ESM Migration for the TUI
-
- - **Decision:** Migrated the entire codebase from CommonJS to ESM.
- - **Rationale:** The chosen TUI framework (Ink v6) and modern utility libraries (like Clack) are ESM-only. A premium UI requires modern tooling.
- - **Implementation:** Updated `package.json` (`type: module`), modified TSConfig (`module: NodeNext`), and enforced `.js` extensions on all relative local imports.
-
- ### 2.3 Upload-on-Execute File Synchronization
-
- - **Decision:** Sync files from host to sandbox _just-in-time_ before command execution.
- - **Rationale:** Constant bidirectional syncing is slow and error-prone. Instead, when the agent uses `write_file` (on the host), the file is marked as "dirty".
- - **Implementation (`src/sandbox/sync.ts`):** Before `SandboxManager.exec()` runs a bash command, the `FileSync` layer checks the dirty queue and uploads only the modified files to the E2B sandbox.
-
- ### 2.4 File Size & Context Guardrails
-
- - **Decision:** Prevent the LLM from reading massive files that would blow up the context window.
- - **Rationale:** Reading generic log files or compiled assets breaks token limits, leading to expensive failures.
- - **Implementation (`src/tools/index.ts` -> `ReadFileTool`):**
- - Strict 512 KB file size hard-limit.
- - Soft 2,000-line truncation limit.
- - Added `startLine`/`endLine` arguments for specific chunk reading.
- - Suggests `grep` or `head` via `bash` when limits are hit.
-
- ### 2.5 Config-Driven Sandbox Strategy (Security Scanning)
-
- - **Decision:** Support both a zero-startup-cost Production environment and a flexible Development environment.
- - **Rationale:** Installing the Gemini CLI and OSV-Scanner inside the sandbox takes ~15 seconds, which ruins the UX if done on every session start.
- - **Implementation (`src/sandbox/bootstrap.ts`):**
- - **Dev Mode (Default):** `LazyInstaller` installs tools on-demand _only_ when the user actually invokes `security_scan` or `dep_scan`. Install state is cached per session.
- - **Prod Mode (`sandboxTemplate: "joone-base"`):** Uses a pre-baked E2B template defined in `e2b/Dockerfile`. The installer detects this and skips the install phase entirely (0s startup).
-
- ### 2.6 Skills System (Multi-Directory Discovery)
-
- - **Decision:** Skills are discovered from multiple directories with project-level overriding user-level.
- - **Rationale:** Users may have personal skills (e.g., `~/.agents/skills/`) and project-specific skills. Project skills should take priority to allow per-project customization.
- - **Implementation (`src/skills/loader.ts`):**
- - Discovery paths: `./skills/`, `./.agents/skills/`, `./.agent/skills/` (project), `~/.joone/skills/`, `~/.agents/skills/` (user)
- - YAML frontmatter parsing for `name` and `description` fields
- - Deduplication by name; project-level wins on conflict
-
- ---
-
- ## 3. Current Project State
-
- All development follows strict TDD. Currently, **95 out of 95 tests are GREEN** across 13 test suites. TypeScript compiles cleanly.
-
- ### Completed Milestones
-
- - ✅ **M1: Core Setup:** CLI scaffolding, config manager, dynamic Model Factory.
- - ✅ **M2: TUI & Core Loop:** Clack onboarding, Ink REPL interface, tool buffering.
- - ✅ **M3: Hybrid Sandbox:** E2B `SandboxManager`, `FileSync`, `ToolRouter`, core tools.
- - ✅ **M3.5: Security Tools:** `LazyInstaller`, `SecurityScanTool`, `DepScanTool`, E2B Dockerfile.
- - ✅ **M4: Harness Engineering:** `MiddlewarePipeline`, `LoopDetectionMiddleware`, `CommandSanitizerMiddleware`, `PreCompletionMiddleware`.
- - ✅ **M5: Advanced Optimizations:** Enhanced registry (fuzzy search + `activateTool`), `TokenCounter`, improved `compactHistory`, `ReasoningRouter`.
- - ✅ **M5.5: Browser, Search & Skills:** `BrowserTool` (agent-browser), `WebSearchTool` (@valyu/ai-sdk), `SkillLoader` + `search_skills`/`load_skill` tools.
- - ✅ **M6: Tracing & Refinement:** `SessionTracer` (metrics routing), `TraceAnalyzer` (offline insights), LangSmith env integration, `joone analyze` CLI command.
- - ✅ **M8: OpenSandbox Fallback:** `ISandboxWrapper`, local docker degradation at `localhost:8080`, and documented NFRs (Rate Limits & Budgets).
- - ✅ **M9: Persistent Sessions:** `SessionStore` (JSONL), `SessionResumer` (host drift detection), `joone sessions` dashboard, `joone start --resume <id>`.
- - ✅ **M10: Retry, HITL, Skills Sync:** `JooneError` hierarchy + `retryWithBackoff`, `HITLBridge` + `AskUserQuestionTool` + `PermissionMiddleware`, user-level skills sandbox sync.
- - ✅ **M11: Slash Command System:** `CommandRegistry` + 10 built-in `/commands` (`/help`, `/model`, `/clear`, `/compact`, `/tokens`, `/status`, `/exit`, `/history`, `/undo`) intercepted in TUI before agent loop (zero LLM cost).
-
- - ✅ **M12: LLM-Powered Compaction:** LLM-driven `ConversationCompactor`, fast model mapping (`FAST_MODEL_DEFAULTS`), and seamless handoff prompts post-compaction.
- - ✅ **M13: Sub-Agent Orchestration:** `AgentRegistry`, isolated sync/async `SubAgentManager`, and safe `spawn_agent` + `check_agent` tools (Depth-1 limits).
- - ✅ **M14: Stability & Reliability:** `ContextGuard` (80% auto-compact, 95% emergency truncation), `AutoSave` (debounced JSONL persistence), and atomic TUI `SIGINT/SIGTERM` handling.
-
- ### Tool Routing Summary
-
- | HOST (safe, runs on user machine) | SANDBOX (isolated, runs in E2B) |
- | --------------------------------- | ----------------------------------- |
- | `read_file`, `write_file` | `bash`, `run_tests`, `install_deps` |
- | `search_tools`, `activate_tool` | `security_scan`, `dep_scan` |
- | `web_search` (API call) | `browser` (agent-browser CLI) |
- | `search_skills`, `load_skill` | Unknown tools (safe-by-default) |
- | `ask_user_question` | |
- | `/commands` (TUI-only, no LLM) | |
- | `spawn_agent`, `check_agent` | |
-
- ### Pending Next Steps (Where to resume)
-
- **Continue with Milestone 15:**
-
- 1. **M15: MCP Client Integration** — `@modelcontextprotocol/sdk`, stdio/HTTP transport, namespaced MCP tools.
-
- _Reference `docs/08_roadmap.md` and the implementation plan artifact for the full checklist._
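The host/sandbox split and the safe-by-default rule described in the deleted `Handover.md` (section 2.1) can be sketched roughly as follows. The tool names are taken from that document; `routeTool`, `isKnownTool` and the `ToolTarget` type are hypothetical and are not claimed to match the removed `src/tools/router.ts`.

```typescript
// Hypothetical sketch of safe-by-default tool routing; not the package's actual router.
type ToolTarget = "host" | "sandbox";

// Tools that touch the user's project files directly (run on the host machine).
const HOST_TOOLS = new Set<string>(["read_file", "write_file", "search_tools"]);

// Tools that execute commands (run in the isolated sandbox).
const SANDBOX_TOOLS = new Set<string>([
  "bash",
  "run_tests",
  "install_deps",
  "security_scan",
  "dep_scan",
]);

// True when the tool appears in either routing table.
function isKnownTool(toolName: string): boolean {
  return HOST_TOOLS.has(toolName) || SANDBOX_TOOLS.has(toolName);
}

// Only explicitly listed host tools run on the host. Known sandbox tools and
// unknown tools both fall through to the sandbox.
function routeTool(toolName: string): ToolTarget {
  return HOST_TOOLS.has(toolName) ? "host" : "sandbox";
}
```

Making the sandbox the fallback means that a newly added or misspelled tool name never runs on the user's machine by accident.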
package/PROGRESS.md DELETED
@@ -1,160 +0,0 @@
1
- # Project Progress & Status
2
-
3
- _This document serves as a living changelog and status board. Any human or agent picking up this directory should read this first to understand the current state of the implementation._
4
-
5
- ---
6
-
7
- ## Current Status
8
-
9
- - [x] Milestone 6: Tracing & Refinement
10
- - [x] Milestone 7: Testing & Evaluations (TDD - Ongoing)
11
- - [x] Milestone 8: OpenSandbox Fallback & NFRs
12
- - [x] Milestone 9: Persistent Sessions
13
-
14
- ## Next Steps
15
-
16
- 1. **Milestone 7 Evals**: Hook LangSmith datasets to ExecutionHarness for regression testing.
17
- 2. **Dataset CI**: Build `joone eval` CLI command to assert Cache Hit Rate > 90% and Cost < $X.
18
- 3. **Security Tier 2 & 3 (Planned)**: OS Keychain and encrypted config.
19
-
20
- ---
21
-
22
- ## Changelog
23
-
24
- ### 2026-02-22: Project Initialization & Context Engine
25
-
26
- - **Docs Setup**: Extracted key insights from Harness Engineering and Prompt Caching research into the `docs/` folder (`01_insights...` through `08_roadmap...`).
27
- - **Tech Stack**: Finalized TypeScript + Node + LangChain + Zod + LangSmith architecture.
28
- - **Project Scaffold**: Initialized `package.json`, installed dependencies, configured strict `tsconfig.json`.
29
- - **Phase 1 Complete**: Implemented `CacheOptimizedPromptBuilder` (`src/core/promptBuilder.ts`) to strictly enforce the static-to-dynamic prefix caching rules via LangChain message formatting.
30
- - **Phase 1 Complete**: Implemented the base `ExecutionHarness` (`src/core/agentLoop.ts`) combining the LLM query and naive tool execution block.
31
- - **Phase 2 Started**: Created the `DeferredToolsDB` and mock `SearchToolsTool` (`src/tools/registry.ts`) to support lazy loading of tools for cache preservation.
32
- - **Testing & Sandbox Strategy**: Created `src/test_cache.ts` to empirically test Anthropic prompt caching locally. Outlined the architecture to use **E2B (e2b.dev)** or Docker for secure sandboxed code execution, isolating the agent's OS interactions from the host environment.
33
- - **Governance**: Created `AGENTS.md` and this `PROGRESS.md` file.
34
-
35
- ### 2026-02-25: Architecture Refinements & Doc Overhaul
36
-
37
- - **Provider Abstraction**: Refactored `ExecutionHarness` to accept any LangChain `BaseChatModel | Runnable` instead of hardcoding `ChatAnthropic`. Model selection now happens at the call site (`src/index.ts`).
38
- - **AGENTS.md**: Added mandatory Red-Green-Refactor TDD workflow instructions; added reminder to use `tdd` skill if available.
39
- - **PRD**: Added CLI packaging (`npx joone`), provider/model selection feature, and E2B sandbox execution as core features.
40
- - **User Stories**: Added new Epics for CLI/Config (Epic 1) and E2B Sandbox Execution (Epic 3).
41
- - **System Architecture**: Updated mermaid diagram to show CLI config layer routing to provider selection and E2B sandbox replacing local OS execution.
42
- - **Roadmap**: Restructured milestones to include CLI Packaging (M2), E2B Sandbox Integration (M3), and made Testing & Evaluations an ongoing parallel milestone (M7) driven by TDD.
43
-
44
- ### 2026-02-25: Milestone 7 — TDD Setup & First GREEN
45
-
46
- - **TDD Skills**: Located and verified both `tdd` and `test-driven-development` skills at `C:\Users\Lenovo\.agents\skills\`. Updated `AGENTS.md` with exact paths and instructions.
47
- - **Vitest Installed**: Added `vitest` as a dev dependency.
48
- - **PromptBuilder Tests (5/5 GREEN)**: Wrote 5 behavior-driven tests covering: strict prefix ordering, history appending after prefix, prefix stability across calls, `<system-reminder>` injection, and compaction. All passing.
49
-
50
- ### 2026-02-26: Milestone 2 — Detailed Planning Complete
51
-
52
- - **PRD Updated**: Added streaming, expanded provider list (9+), tiered API key security (plain → keychain → encrypted), and dynamic provider loading.
53
- - **User Stories Updated**: Added Epic 2 (Streaming), expanded Epic 1 with masked input and planned keychain onboarding (US 1.6).
54
- - **System Architecture Updated**: Added Stream Handler component, Model Factory component, full provider table, and security tier roadmap.
55
- - **Roadmap Updated**: Detailed Milestone 2 into 5 sub-sections (2a–2e) with a 9-step TDD vertical slice test plan.
56
- - **Pending tracked items**: OS Keychain (Security Tier 2) and AES-256 encrypted config (Security Tier 3) tracked as planned items for future onboarding enhancement.
57
-
58
- ### 2026-02-27: Milestone 2 — CLI Packaging & Provider Selection (COMPLETE)
59
-
60
- - **Config Manager** (`src/cli/config.ts`): `JooneConfig` interface, `loadConfig` (with env var fallback for 8 providers), `saveConfig` (with `chmod 600`), `DEFAULT_CONFIG`, `getProviderEnvVar`.
61
- - **Model Factory** (`src/cli/modelFactory.ts`): Dynamic `import()` for Anthropic and OpenAI. API key validation, missing package detection with install instructions. Support for 9+ providers planned.
62
- - **CLI Entry Point** (`src/cli/index.ts`): Commander.js with `joone` (default start) and `joone config` (interactive setup). Masked API key input via `@inquirer/prompts`. 9 provider choices + model lists.
63
- - **Streaming Support** (`src/core/agentLoop.ts`): `streamStep()` method on `ExecutionHarness` — text tokens emitted via `onToken` callback, tool call JSON chunks buffered until complete.
64
- - **Security Tier 1**: `saveConfig` writes with `mode: 0o600`, directory with `mode: 0o700`. Masked input in CLI. Env var fallback for API keys.
65
- - **Package.json**: Updated with `"bin"`, `"build"`, `"test"`, `"test:watch"` scripts. Version bumped to `0.1.0`.
66
- - **vitest.config.ts**: Created with test env vars to prevent Anthropic API key errors during testing.
67
- - **Bug fix**: Deleted stale compiled `.js`/`.d.ts` files that were shadowing `.ts` sources, causing `streamStep` not to be found at runtime.
68
- - **Tests**: 14/14 GREEN across 4 suites (config: 3, modelFactory: 4, promptBuilder: 5, streaming: 2).
69
-
70
- ### 2026-02-28: Milestone 2.5 — Terminal UI (Ink + Clack) (COMPLETE)
71
-
72
- - **ESM Migration**: `package.json` → `"type": "module"`, `tsconfig.json` → `"module": "NodeNext"`, `"jsx": "react-jsx"`. All 17 relative imports updated with `.js` extensions. 14/14 tests GREEN after migration.
73
- - **Dependencies**: Added `ink`, `react`, `@types/react`, `@clack/prompts`, `chalk`, `ink-spinner`. Removed `@inquirer/prompts`.
74
- - **Clack Onboarding** (`src/cli/index.ts`): `joone config` rewritten with `intro()`, `outro()`, `spinner()`, `select()`, `password()`, `confirm()`, `cancel()`. Full cancellation handling with `isCancel()`. Chalk-styled terminal output for `joone start`.
75
- - **Ink Components** (`src/ui/`):
76
- - `App.tsx`: Main REPL layout with header, message history, streaming text, tool call panel, status bar, keyboard input (Ctrl+C to exit), elapsed time timer.
77
- - `Header.tsx`: Bordered box showing provider, model, streaming status with cyan accent.
78
- - `MessageBubble.tsx`: Role-based styling (user=cyan, agent=green, system=yellow).
79
- - `StreamingText.tsx`: Token-by-token rendering with blinking cursor during streaming.
80
- - `ToolCallPanel.tsx`: Status-colored bordered box (yellow=running, green=success, red=error) with spinner, args display, truncated result.
81
- - `StatusBar.tsx`: Persistent footer with token count, cache hit rate, tool calls, elapsed time.
82
-
83
- ### 2026-02-28: Milestone 3 — Hybrid Sandbox Integration (COMPLETE)
84
-
85
- - **SandboxManager** (`src/sandbox/manager.ts`): E2B SDK wrapper with `create()`, `destroy()`, `exec(cmd)`, `uploadFile(path, content)`, and `isActive()` lifecycle management.
86
- - **FileSync** (`src/sandbox/sync.ts`): Host → sandbox file sync with `markDirty()`, `syncToSandbox()`, and `initialSync()`. Excludes `node_modules`, `.git`, `dist` on initial sync.
87
- - **ToolRouter** (`src/tools/router.ts`): Routes tools to HOST (`write_file`, `read_file`, `search_tools`) or SANDBOX (`bash`, `run_tests`, `install_deps`). Unknown tools default to sandbox for safety.
88
- - **Tests**: 26/26 GREEN across 6 suites (sandbox: 5, toolRouter: 7, plus existing 14).
89
-
90
- ### 2026-02-28: Milestone 3.5 — Security Scanning Tool (COMPLETE)
91
-
92
- - **Config**: Added `sandboxTemplate?: string` to `JooneConfig` — config-driven switching between dev (lazy install) and prod (pre-baked template).
93
- - **LazyInstaller** (`src/sandbox/bootstrap.ts`): Handles on-demand tool installation inside the sandbox. Caches install state per session. Skips entirely when using a custom E2B template.
94
- - **SecurityScanTool** (`src/tools/security.ts`): Runs `gemini -x security:analyze` in sandbox. Supports targets: `"changes"`, `"file"`, `"deps"`. Handles CLI unavailability gracefully.
95
- - **DepScanTool** (`src/tools/security.ts`): Runs OSV-Scanner with `npm audit` fallback. Supports JSON and summary output.
96
- - **ToolRouter**: Added `security_scan` and `dep_scan` to `SANDBOX_TOOLS`.
97
- - **E2B Dockerfile** (`e2b/Dockerfile`): Pre-baked production template with Gemini CLI + security extension + OSV-Scanner.
98
- - **Tests**: 43/43 GREEN across 9 suites (bootstrap: 5, security: 5, plus existing 33).
99
-
100
- ### 2026-02-28: Milestone 4 — Harness Engineering & Middlewares (COMPLETE)
101
-
102
- - **Middleware Types** (`src/middleware/types.ts`): `ToolCallContext` and `ToolMiddleware` interface with before/after hooks.
103
- - **MiddlewarePipeline** (`src/middleware/pipeline.ts`): Chains before-hooks in order, executes tool, chains after-hooks in reverse. Short-circuits on rejection.
104
- - **LoopDetectionMiddleware** (`src/middleware/loopDetection.ts`): Blocks after N identical consecutive tool calls (default: 3). Anti-doom-loop.
105
- - **CommandSanitizerMiddleware** (`src/middleware/commandSanitizer.ts`): Blocks destructive (`rm -rf /`, fork bombs), interactive (`vim`, `top`), and pipe-to-shell commands.
106
- - **PreCompletionMiddleware** (`src/middleware/preCompletion.ts`): Tracks test execution and blocks task completion until tests have been run.
107
- - **Integration**: `ExecutionHarness.executeToolCalls()` now routes through `MiddlewarePipeline`.
108
- - **Tests**: 55/55 GREEN across 10 suites (middleware: 12, plus existing 43).
109
-
110
- ### 2026-02-28: Milestone 5 — Advanced Optimizations (COMPLETE)
111
-
112
- - **Enhanced Registry** (`src/tools/registry.ts`): Fuzzy search by name/description, `activateTool()` for dynamic mid-session tool loading, `ActivateToolTool`. Expanded stubs: git_diff, git_log, grep_search, list_dir.
113
- - **Token Counter** (`src/core/tokenCounter.ts`): Character-based heuristic (~4 chars/token). `estimateTokens()`, `countMessageTokens()`, `isNearCapacity()`.
114
- - **Context Compaction**: Enhanced `compactHistory()` with `keepLastN` parameter (preserves recent messages). Added `shouldCompact()` to `CacheOptimizedPromptBuilder`.
115
- - **Reasoning Router** (`src/core/reasoningRouter.ts`): HIGH/MEDIUM reasoning levels. HIGH for planning + error recovery, MEDIUM for tool-heavy turns. Temperature-only adjustment (preserves cache prefix).
116
- - **Tests**: 69/69 GREEN across 11 suites.
117
-
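The character-based heuristic described for the Token Counter can be sketched as follows. This is a minimal illustration, not the code in `src/core/tokenCounter.ts`: the function names come from the changelog entry, but the signatures, the `Message` shape, and the default threshold are assumptions.

```typescript
// Sketch of a ~4 chars/token heuristic.
// Signatures, the Message shape, and the 0.8 default are assumptions.
const CHARS_PER_TOKEN = 4;

interface Message {
  role: string;
  content: string;
}

// Rough token estimate: one token per 4 characters, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Sum of the estimates for every message body in the history.
function countMessageTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}

// True once usage reaches the given fraction of the context window.
function isNearCapacity(usedTokens: number, maxTokens: number, threshold = 0.8): boolean {
  return usedTokens >= maxTokens * threshold;
}
```

A heuristic like this trades accuracy for speed: it needs no tokenizer dependency, at the cost of drifting for code-heavy or non-English text.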
118
- ### 2026-02-28: Milestone 5.5 — Browser, Web Search & Skills (COMPLETE)
119
-
120
- - **Browser Tool** (`src/tools/browser.ts`): Wraps `agent-browser` CLI. Actions: navigate, snapshot, click, type, screenshot, scroll. Runs in sandbox.
121
- - **Web Search Tool** (`src/tools/webSearch.ts`): Wraps `@valyu/ai-sdk`. Sources: web, papers, finance, patents, SEC, companies. Dynamic import, type stub in `src/types/valyu.d.ts`.
122
- - **Skills System**: `SkillLoader` (`src/skills/loader.ts`) discovers skills from project root (`./skills/`, `./.agents/skills/`) and user home (`~/.joone/skills/`, `~/.agents/skills/`). YAML frontmatter parsing, project-overrides-user deduplication.
123
- - **Skills Tools** (`src/skills/tools.ts`): `search_skills` + `load_skill` tools for agent runtime use.
124
- - **Config**: Added `valyuApiKey` to `JooneConfig`. Updated `ToolRouter` with browser/web_search/skills routing.
125
-
126
- ### 2026-02-28: Milestone 6 — Tracing & Refinement (COMPLETE)
127
-
128
- - **SessionTracer** (`src/tracing/sessionTracer.ts`): Records LLM events (prompt/completion tokens), tool runs (name/args/duration/success), and errors. Saves traces to `~/.joone/traces/{id}.json`.
129
- - **Harness Integration** (`src/core/agentLoop.ts`): Wired `ExecutionHarness` to automatically emit tracing events natively through `SessionTracer` during `step()`, `streamStep()`, and `executeToolCalls()`.
130
- - **Trace Analyzer** (`src/tracing/analyzer.ts`): Analyzes a saved `SessionTrace` to detect doom-loops, cost hotspots (>20% total tokens), low cache efficiency (<70%), and error clusters. Generates actionable recommendations.
131
- - **LangSmith Integration** (`src/tracing/langsmith.ts`): Injects configured `LANGCHAIN_TRACING_V2` environment variables from `JooneConfig` natively on CLI startup.
132
- - **CLI Command** (`src/cli/index.ts`): Added `joone analyze [sessionId]` to read trace files and print the offline analysis report beautifully.
133
- - **Tests**: 91/91 GREEN across 13 suites.
134
-
135
- ### 2026-03-01: Milestone 8 & Milestone 9 Completed!
136
-
137
- The agent now supports robust **Persistent Sessions** allowing users to pause/resume tasks. It uses highly optimized JSONL appending and automatically detects Host File System Drift when waking up. Furthermore, it supports automatic fallbacks to OpenSandbox when the primary cloud sandbox (E2B) is unavailable!
138
-
139
- - **SandboxManager (`ISandboxWrapper`)**: Refactored the core sandbox execution system to support multiple backends securely. Created `E2BSandboxWrapper` and `OpenSandboxWrapper`.
140
- - **Graceful Degradation**: If E2B fails to initialize (e.g. from a network error or bad API key), the agent automatically catches the error and degrades instantly to a local Docker `OpenSandbox` container on `localhost:8080`.
141
- - **SessionStore & SessionResumer**: Implemented a highly optimized JSONL-based `SessionStore` (`src/core/sessionStore.ts`) for persistent session logging and `SessionResumer` (`src/core/sessionResumer.ts`) for rehydrating agent state.
142
- - **Host File System Drift Detection**: `SessionResumer` now automatically detects changes in the host file system since the last session save and prompts the user for reconciliation.
143
- - **Config & CLI**: Updated `JooneConfig`, `loadConfig`/`saveConfig`, and the `joone config` Clack onboarding wizard to optionally prompt for `OpenSandbox API key` and `Domain`.
144
- - **NFRs Documented**: Formally established architectural standards in `docs/05_prd.md` for Error Handling (Fallback), Rate Limiting (Budgets & Loop Breakers), Authentication (CLI keys), and Telemetry Data Retention (Local JSONs rotated at 30 days — 100% private).
145
- - **Tests**: 95/95 GREEN tests ensuring the sandbox layer abstraction natively handles API mappings without breaking `BashTool`.
146
-
147
- ### 2026-03-04: Milestone 10 — Retry, HITL, and Skills Sync (COMPLETE)
148
-
149
- - **Error Hierarchy** (`src/core/errors.ts`): `JooneError` base class with `LLMApiError`, `SandboxError`, `ToolExecutionError` subclasses. Each carries `category`, `retryable` flag, structured `context`, and `toRecoveryHint()` for self-healing. `wrapLLMError()` auto-classifies raw provider errors.
150
- - **Retry** (`src/core/retry.ts`): `retryWithBackoff<T>()` generic utility with exponential backoff (1s→2s→4s + jitter). Respects `JooneError.retryable` flag. Non-retryable errors (401/403) fail immediately.
151
- - **Self-Recovery** (`src/core/agentLoop.ts`): On exhausted retries, `ExecutionHarness` injects the error's `toRecoveryHint()` as a `SystemMessage` into conversation history instead of crashing. Tool errors now wrapped in `ToolExecutionError`.
152
- - **HITLBridge** (`src/hitl/bridge.ts`): EventEmitter-based singleton with `askUser()` and `requestPermission()`. Configurable timeout (default 5 min) with auto-deny/auto-no-response.
153
- - **AskUserQuestionTool** (`src/tools/askUser.ts`): Agent-callable tool for mid-turn clarification, preference gathering, and plan approval.
154
- - **PermissionMiddleware** (`src/middleware/permission.ts`): `ToolMiddleware` implementation with 3 modes (`auto`, `ask_dangerous`, `ask_all`). Hardcoded `SAFE_TOOLS` whitelist. Uses `HITLBridge.requestPermission()` for dangerous tools.
155
- - **HITLPrompt** (`src/ui/components/HITLPrompt.tsx`): Ink TUI component rendering question/permission prompts with `TextInput` capture.
156
- - **Skills Sync** (`src/sandbox/sync.ts`): `syncSkillsToSandbox()` uploads user-level skill directories into `/workspace/.joone/skills/` in the sandbox.
157
- - **System Prompt**: Updated `globalSystemInstructions` with `ask_user_question` awareness, permission system notice, and skills discovery instructions.
158
- - **Config**: Added `permissionMode` to `JooneConfig` (default: `"auto"`).
159
- - **Edge Cases**: Added 8 new scenarios covering retry/self-recovery, HITL timeouts, permission misconfiguration, and skills sync.
160
- - **Tests**: 24 new tests (14 retry/errors + 10 HITL/permission) all GREEN. TypeScript build clean.
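The retry behaviour described for Milestone 10 (exponential backoff of 1s, 2s, 4s plus jitter, with non-retryable errors failing immediately) might look roughly like the sketch below. It is not the contents of `src/core/retry.ts`: the parameter list, the `sleep` injection point, and the 25% jitter ceiling are assumptions made for illustration.

```typescript
// Sketch of exponential backoff with jitter.
// The jitter size (up to 25% of the base delay) is an assumption.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt; // attempt 0, 1, 2 -> 1s, 2s, 4s
  const jitter = random() * baseMs * 0.25;
  return Math.round(exponential + jitter);
}

// An error is retried only when it carries `retryable === true`.
function isRetryable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { retryable?: unknown }).retryable === true
  );
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors (e.g. auth failures) and exhausted budgets propagate.
      if (!isRetryable(err) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Injecting `random` and `sleep` keeps the delay schedule deterministic in tests without waiting on real timers.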
@@ -1,27 +0,0 @@
1
- # Actionable Insights, Patterns, and Best Practices
2
-
3
- Derived from recent research on Harness Engineering and Prompt Caching for Agentic Coding.
4
-
5
- ## 1. The Cache-Optimized Context Prefix (Prompt Caching)
6
-
7
- - **Prefix Matching Rule:** LLM APIs cache everything from the start of a prompt up to a `cache_control` breakpoint. Any dynamic change in the middle invalidates the rest of the cache.
8
- - **Order Matters (Static to Dynamic):**
9
- 1. Base System Instructions & Tool Definitions (Globally Cached)
10
- 2. Project/Workspace memory (e.g., `CLAUDE.md`) (Cached per project)
11
- 3. Session State (Environment variables, rules) (Cached per session)
12
- 4. Conversation Messages (Grows iteratively)
13
- - **Immutability within a Session:** Never add/remove tools mid-conversation, and never swap models (e.g., from Opus to Haiku) mid-session, as this breaks the cache prefix.
14
- - **The `<system-reminder>` Pattern:** If you need to update agent behavior or state, do **not** edit the system prompt. Instead, insert a `<system-reminder>` tag inside the next simulated User Message or Tool Result.
15
-
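The static-to-dynamic ordering and the `<system-reminder>` pattern above can be illustrated with a small sketch. The `PromptLayers` type and all function names here are hypothetical and are not taken from `CacheOptimizedPromptBuilder`; the point is only that a change in a later layer leaves the earlier segments, and therefore the cacheable prefix, untouched.

```typescript
// Hypothetical layer model: fields are ordered from most static to most dynamic.
interface PromptLayers {
  systemInstructions: string; // globally cached
  toolDefinitions: string; // globally cached
  projectMemory: string; // cached per project
  sessionState: string; // cached per session
}

// Emit the prefix segments in a fixed static-to-dynamic order.
function buildStaticPrefix(layers: PromptLayers): string[] {
  return [
    layers.systemInstructions,
    layers.toolDefinitions,
    layers.projectMemory,
    layers.sessionState,
  ];
}

// Number of leading segments two prefixes share (the part a cache could reuse).
function sharedPrefixLength(a: string[], b: string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Carry a behaviour update in the next message instead of editing the system prompt.
function withSystemReminder(userText: string, reminder: string): string {
  return `<system-reminder>${reminder}</system-reminder>\n${userText}`;
}
```

With this layout, two sessions in the same project that differ only in `sessionState` still share the first three segments.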
16
- ## 2. Harness Engineering & Middleware
17
-
18
- - **Control via Harness, Not Just Prompts:** Mold the agent's behavior by building programmatic wrappers (middleware) around the LLM reasoning step rather than just asking the LLM nicely.
19
- - **Anti-Doom-Loop Middleware:** Track per-file edits in the harness. If an agent edits the same file N times without success, inject a message forcing it to reconsider its approach.
20
- - **Forced Self-Verification:** Agents tend to write code and immediately stop without testing. Implement a `PreCompletionChecklistMiddleware` that intercepts the agent's attempt to exit, forcing it to run local tests and read the full output before concluding.
21
- - **Local Context Injection:** Automatically discover and map the working directory and available binaries (e.g., Python, Node) into the prompt upon startup.
22
-
23
- ## 3. Agent Execution Strategy
24
-
25
- - **The Reasoning Sandwich:** Adjust the amount of compute/reasoning dynamically. Use heavy reasoning for Planning, Discovery, and Final Verification, but use medium reasoning for straightforward code implementations to save time and tokens.
26
- - **Lazy Tool Loading (Searchable Tools):** Instead of stuffing every possible schema into the prompt, provide "stubs" (tool names and descriptions). Allow the agent to search for advanced tools, deferring the loading of full schemas to preserve prefix caching.
27
- - **Trace-Driven Improvement:** Treat tracing (e.g., LangSmith) as a first-class feature. Route raw text-space traces to a designated "Trace Analyzer Subagent" to find where the agent frequently fails, allowing you to patch the harness without blindly guessing.
@@ -1,143 +0,0 @@
1
- # Edge Cases & Mitigations
2
-
3
- When building a coding agent with Prompt Caching + Middlewares, these are the primary edge cases to design around:
4
-
5
- ## 1. Prompt Caching Edge Cases (Cost & Latency Traps)
6
-
7
- - **The "Leaky Timestamp" Cache Breaker:**
8
- - _The Edge Case:_ If you inject dynamic data (like the current time, memory usage, or random UUIDs) into your Base System Prompt, you will achieve a **0% cache hit rate**. The cache relies on exact prefix matching.
9
- - _Mitigation:_ Put all static, immutable instructions at the top. Any dynamic state must be injected via a `<system-reminder>` inside the _Messages_ array (which sits at the end of the context).
10
- - **The Mid-Session Model Switch:**
11
- - _The Edge Case:_ Switching models mid-thread (e.g., cheap model for summarizing, smart model for coding) means the new model has an empty cache and must re-process the entire prompt prefix from scratch.
12
- - _Mitigation:_ Avoid swapping models in the same thread. Spawn a "Sub-agent" thread and pass only the minimum necessary context.
13
- - **Context Window Compaction (Amnesia):**
14
- - _The Edge Case:_ Summarizing a long conversation and starting a new prompt causes you to lose your cached prefix AND the agent forgets specific constraints.
15
- - _Mitigation:_ Implement **Cache-Safe Forking**. Keep the exact same System Prompt and Tool definitions. Start a new thread by passing the summary of the previous history as the first few messages, followed by the new task.
16
-
17
- ## 2. Harness & Middleware Edge Cases (Logic Traps)
18
-
19
- - **The "Massive File" Blunder:**
20
- - _The Edge Case:_ The agent reads a 10,000-line minified file. This floods the context window, pushes out important instructions, and ruins the session cache.
21
- - _Mitigation:_ Harness-level Guardrails. Restrict `read_file` to return chunks or force the agent to use `grep_search` / `view_file_outline`.
22
- - **The "Blind Retry" Doom Loop:**
23
- - _The Edge Case:_ The agent misses a space in a search-and-replace, fails, and tries the exact same edit endlessly.
24
- - _Mitigation:_ Use `LoopDetectionMiddleware`. If the agent emits identical tool calls 3 times, intercept and inject: _"You have failed this 3 times. Stop trying this approach."_
25
- - **The "Fake Success" Verification:**
26
- - _The Edge Case:_ The agent runs tests, they fail, but the agent hallucinates that the failure is acceptable and marks the task as Done. Older approaches relied on fragile string parsing (e.g., matching "failed" in output), which could easily be bypassed or confused by test output.
27
- - _Mitigation:_ The harness must programmatically parse terminal exit codes. By explicitly surfacing structured tool metadata (e.g., `ToolResult.metadata.exitCode`) from execution sandboxes, the `PreCompletionMiddleware` reliably blocks the agent from exiting if tests don't pass (`exitCode !== 0`).
28
- - **Tool Schema Amnesia (with Lazy Loading):**
29
- - _The Edge Case:_ An agent loads a complex tool lazily, uses it once, and then later forgets how to format its JSON schema.
30
- - _Mitigation:_ If a tool is "discovered", it must remain in the "Messages" context as a system reminder so the schema is preserved.
31
- - **The "Ghost Tool Call" (Context Desync):**
32
- - _The Edge Case:_ A model emits a tool call but occasionally forgets to attach an internal `tool_call_id` (this breaks the strict `AIMessage[tool_calls] -> ToolMessage[tool_call_id]` sequencing rules required by modern LangChain/Anthropic/OpenAI APIs). If you forge a fake ID or cast it as a string, the LLM rejects the context on the next turn.
33
- - _Mitigation:_ The "Soft Fail" approach. Intercept the malformed tool call in the `ExecutionHarness`. Do not execute the tool and do not emit a `ToolMessage`. Instead, emit a corrective `HumanMessage` stating: _"You attempted to call tool X, but didn't provide a tool_call_id. Please try again."_ This prevents context poisoning.
34
-
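The "Blind Retry" mitigation above amounts to counting identical consecutive tool calls. The sketch below shows one way to do that; the `LoopDetector` class and its `check()` method are hypothetical and do not reproduce `LoopDetectionMiddleware`. It also assumes that arguments serialize deterministically with `JSON.stringify`, which holds only when key order is stable.

```typescript
// Hypothetical detector: blocks the Nth identical consecutive tool call.
class LoopDetector {
  private lastKey: string | null = null;
  private count = 0;
  private readonly maxRepeats: number;

  constructor(maxRepeats = 3) {
    this.maxRepeats = maxRepeats;
  }

  // Returns a corrective message when the call should be blocked, else null.
  check(toolName: string, args: unknown): string | null {
    const key = `${toolName}:${JSON.stringify(args)}`;
    if (key === this.lastKey) {
      this.count++;
    } else {
      // A different call breaks the streak.
      this.lastKey = key;
      this.count = 1;
    }
    if (this.count >= this.maxRepeats) {
      return `You have failed this ${this.count} times. Stop trying this approach.`;
    }
    return null;
  }
}
```

Returning a message rather than throwing lets the harness inject the text into the conversation and keep the session alive.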
35
- ## 3. Security & Execution Edge Cases (Tool Exploits)
36
-
37
- - **Command Injection via Malicious Interpolation:**
38
- - _The Edge Case:_ Passing user-provided arguments directly into shell commands (e.g., `agent-browser --url "${args.url}"` or `gemini --file "${args.path}"`) allows attackers to escape quotes and execute arbitrary commands in the sandbox (e.g., `url = '"; cat /etc/passwd; "'`).
39
- - _Mitigation:_ Use strict Bash parameter escaping. All dynamic strings passed to shell commands are wrapped in single quotes, and any internal single quotes are escaped (`'\\''`).
40
- - **Host Filesystem Path Traversal (The "Escaped Workspace" Vulnerability):**
41
- - _The Edge Case:_ Because `read_file` and `write_file` execute on the host machine to support live IDE syncing, a malicious prompt could instruct the agent to write to `~/.bashrc`, `C:\Windows\System32`, or `~/.ssh/id_rsa`, compromising the user's host machine.
42
- - _Mitigation:_ Implement strict Workspace Jail boundaries. Before any host I/O operation, the resolved path is evaluated against `process.cwd()`. If the path attempts to escape the root workspace, the tool immediately rejects the call returning a permissions error.
43
- - **Silently Swallowed CLI Errors:**
44
- - _The Edge Case:_ A CLI tool (like OSV-Scanner) crashes due to a configuration error (exit code > 1) and prints an error to `stderr`. If the orchestration layer checks only `stdout` and swallows non-zero exit codes, silently falling back to another tool, the critical error trace is lost.
45
- - _Mitigation:_ Enforce strict exit code verification (e.g., `exitCode === 1` means vulnerabilities found) and emit clear warnings with the full `stderr` trace before attempting any fallback strategies.
46
- - **The "Over-Eager Doom Loop" Reporter:**
47
- - _The Edge Case:_ When detecting a doom loop (calling the same tool with identical args continuously), firing an alert during the active iteration causes redundant, spammy issue reports (e.g., reporting loop counts 3, 4, and 5 as separate critical issues).
48
- - _Mitigation:_ Track the loop state continuously but defer pushing the `AnalysisIssue` to the report array until the loop is visibly broken by a differing action, or the trace ends.
49
- - **The "Parallel Tool Expansion" Bug (TUI Memory Corruption):**
50
- - _The Edge Case:_ In a Terminal UI rendering loop, executing an array of tool calls _inside_ the UI rendering iteration causes the generated `ToolMessage` array to be appended to the conversation history $N$ times (for $N$ tools), massively inflating context usage with duplicated data.
51
-
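The single-quote escaping described under "Command Injection via Malicious Interpolation" can be sketched as below. The helper name `shellQuote` is hypothetical, and the `agent-browser --url` invocation simply mirrors the example in the text rather than a verified command line.

```typescript
// Hypothetical helper: wrap a value in single quotes for a POSIX shell.
// Inside single quotes nothing is interpreted, so the only character that
// needs handling is the single quote itself: close, emit \', reopen.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Build a command with an untrusted URL argument.
function browserCommand(url: string): string {
  return `agent-browser --url ${shellQuote(url)}`;
}
```

With this quoting, the injection payload from the text (`"; cat /etc/passwd; "`) reaches the program as one literal argument instead of terminating the command.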
52
- ## 4. Persistent Session Edge Cases (State Management)
53
-
54
- - **File System Drift (Host Desync):**
55
- - _The Edge Case:_ The agent edits a file, the session is paused. A human edits the file externally before the session is resumed. The agent resumes, unaware of the external edits, and attempts a line-based replacement that corrupts the file.
56
- - _Mitigation:_ `SessionResumer` explicitly logs `mtime` file stats. Upon resumption, it flags recently modified workspace files and injects a "Wakeup Prompt" forcing the LLM to diff or re-read the file before acting.
57
- - **Sandbox Ephemerality (The Amnesia Problem):**
58
- - _The Edge Case:_ A session running a background Express server in a cloud sandbox on Friday is resumed on Monday. The cloud provider killed the idle VM. The new VM lacks the running server, but the LLM’s context history believes it is still running.
59
- - _Mitigation:_ Sandboxes are treated as strictly stateless. Upon session resumption, the agent is injected with a system message stating that the sandbox was recycled and that it must manually restart required daemons/dev-servers.
60
- - **"Mid-Breath" Interruption State (Corrupt Serialization):**
61
- - _The Edge Case:_ A forced exit (`SIGINT`/Power Loss) occurs exactly while the agent stream is halfway through emitting a JSON tool call chunk, serializing a broken `AIMessage` into history.
62
- - _Mitigation:_ The `SessionStore` must only trigger a `saveSession()` at strict execution boundaries (e.g. after a complete LLM generation cycle or successfully parsed CLI execution), guaranteeing invalid mid-stream JSON chunks never touch the disk.
63
- - **Context Overflow (The Infinite Chat Log):**
64
- - _The Edge Case:_ A persistent session spanning weeks scales the context past 200k tokens, hitting API limits and exponentially inflating the per-turn token costs.
65
- - _Mitigation:_ Compaction is forced _before_ disk serialization. The session stringifies and compresses turns older than $N$ iterations into a dense system summary block before writing to `.jsonl`.
66
- - **Provider/Model Switching Mid-Task:**
67
- - _The Edge Case:_ Starting a complex reasoning loop with Opus, pausing, and resuming with a lightweight local model like Llama 3 8B. The history is filled with complex schema usages that confuse the smaller model.
68
- - _Mitigation:_ Serialize the `.jsonl` lines with `provider/model` metadata blocks. Upon resumption, the CLI explicitly warns if a provider downgrade is detected.
69
-
70
- ## 5. Error Recovery & Retry Edge Cases
71
-
72
- - **Transient LLM API Failure (429/5xx):**
73
- - _The Edge Case:_ The LLM provider returns a rate-limit (429) or server error (500/502/503) mid-turn, crashing the entire session.
74
- - _Mitigation:_ `retryWithBackoff()` wraps all LLM calls with exponential backoff (1s→2s→4s + jitter). Only `JooneError` instances with `retryable === true` trigger retries; auth failures (401/403) propagate immediately.
75
- - **Exhausted Retries (Self-Recovery):**
76
- - _The Edge Case:_ After 3 retry attempts, the LLM API is still down. The session crashes and the user loses all progress.
77
- - _Mitigation:_ Instead of crashing, `ExecutionHarness` injects the error's `toRecoveryHint()` as a `SystemMessage` into the conversation, returning a synthetic `AIMessage`. The agent can observe the error context and adapt (e.g., wait, simplify, or ask the user).
78
- - **Unclassified Provider Errors:**
79
- - _The Edge Case:_ A new LLM provider throws a non-standard error with no HTTP status code, bypassing the retry classification.
80
- - _Mitigation:_ `wrapLLMError()` inspects `.status`, `.statusCode`, `.code`, and `.response.status` on raw errors, covering the common patterns of LangChain, Axios, and native `fetch` errors.
81
-
82
- ## 6. Human-in-the-Loop Edge Cases
83
-
84
- - **Permission Timeout (User Away):**
85
- - _The Edge Case:_ The agent calls a dangerous tool (`bash`, `write_file`) while the user is away from the terminal. The agent blocks indefinitely waiting for permission.
86
- - _Mitigation:_ `HITLBridge.requestPermission()` has a configurable timeout (default 5 minutes) that auto-denies and returns a short-circuit string, letting the agent try an alternative.
87
- - **Ask Question Timeout:**
88
- - _The Edge Case:_ The agent asks the user a clarifying question via `ask_user_question`, but the user doesn't respond.
89
- - _Mitigation:_ `HITLBridge.askUser()` resolves with `"[No response]"` after timeout, so the agent can proceed with a default assumption.
90
- - **Permission Mode Misconfiguration:**
91
- - _The Edge Case:_ The user sets `"permissionMode": "ask_all"` and then every tool call — including harmless reads — triggers a prompt, making the agent unusable.
92
- - _Mitigation:_ `PermissionMiddleware` maintains a hardcoded `SAFE_TOOLS` whitelist (`read_file`, `search_skills`, `ask_user_question`, etc.) that bypasses approval even in `ask_all` mode.
93
-
94
- ## 7. Skills Sync Edge Cases
95
-
96
- - **Missing User Skills Directory:**
97
- - _The Edge Case:_ `~/.joone/skills/` doesn't exist on the user's machine. The sync crashes trying to walk a nonexistent path.
98
- - _Mitigation:_ `syncSkillsToSandbox()` checks `fs.existsSync()` before walking each skill directory and silently skips missing paths.
99
- - **Skill Name Collision (Project vs. User):**
100
- - _The Edge Case:_ A user-level skill and a project-level skill have the same name. Both get synced to the sandbox, creating confusion.
101
- - _Mitigation:_ `SkillLoader.discoverSkills()` deduplicates by name with project-level priority. `syncSkillsToSandbox()` only uploads `source: "user"` skills since project-level skills are already inside `projectRoot`.
102
-
103
- ## 8. Slash Command Edge Cases (M11)
104
-
105
- - **Command Typos & Frustration:**
106
- - _The Edge Case:_ User types `/modle` instead of `/model` and the agent treats it as a prompt, wasting LLM tokens and failing to switch the model.
107
- - _Mitigation:_ Levenshtein distance check in `CommandRegistry`. If an unknown command is `< 3` edits away from a known command, the TUI intercepts it and suggests the correct command without calling the LLM.
108
- - **State Mutation While Processing:**
109
- - _The Edge Case:_ User runs `/exit` or `/clear` while the agent is midway through generating a sequence of ToolCalls.
110
- - _Mitigation:_ App-level UI blocks input while `isProcessing === true`. The commands are disabled.
111
- - **Model Switch to Non-Existent Model:**
112
- - _The Edge Case:_ User runs `/model nonexistent`.
113
- - _Mitigation:_ The command validates the model string against `ConfigManager`'s available models and securely rejects it before updating internal state.
114
-
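The typo check described for slash commands relies on Levenshtein distance. A minimal sketch follows; `suggestCommand` is a hypothetical name, and only the "fewer than 3 edits" threshold is taken from the text above, not from `CommandRegistry` itself.

```typescript
// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // d[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // d[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical helper: closest known command within `maxEdits - 1` edits, else null.
function suggestCommand(input: string, known: string[], maxEdits = 3): string | null {
  let best: string | null = null;
  let bestDistance = maxEdits; // only distances strictly below this qualify
  for (const candidate of known) {
    const distance = levenshtein(input, candidate);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```

Note that plain Levenshtein counts a transposition such as `/modle` as two edits, which still falls under the threshold of three.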
115
- ## 9. LLM-Powered Compaction Edge Cases (M12)
116
-
117
- - **Compaction Data Loss (Amnesia 2.0):**
118
- - _The Edge Case:_ The LLM summarizes a 50-turn conversation but drops explicit file paths or tool choices, leaving the main agent blind when resuming.
119
- - _Mitigation:_ The built-in Compact Prompt explicitly mandates a structured format: `Files Modified`, `Decisions Made`, `Tools Used`. A handoff prompt (`[CONTEXT HANDOFF]`) is injected into the bottom of the history to glue the summary back to the agent's persona.
120
- - **Double Compaction Fidelity Loss:**
121
- - _The Edge Case:_ A session exists so long it must be compacted twice. A "summary of a summary" loses critical resolution.
122
- - _Mitigation:_ `ConversationCompactor` detects prior summaries and includes them entirely in the eviction block, prompting the LLM to unify the old summary with the new evicted messages.
123
-
124
- ## 10. Sub-Agent Orchestration Edge Cases (M13)
125
-
126
- - **The Sub-Agent Recursion Bomb:**
127
- - _The Edge Case:_ A sub-agent uses the `spawn_agent` tool to spawn another sub-agent, creating an infinite nesting loop.
128
- - _Mitigation:_ Hardcoded Depth-1 limit. Pre-configured sub-agents in `AgentRegistry` never include `spawn_agent` or `check_agent` in their allowed toolsets.
129
- - **Async Resource Contention:**
130
- - _The Edge Case:_ The main agent loops over a directory and spawns 50 async `test_runner` agents concurrently.
131
- - _Mitigation:_ `SubAgentManager` maintains a hard cap of 3 concurrent async tasks. Further spawn requests are queued or rejected with a backpressure error tool response.
132
- - **Stale Files in Sandbox:**
133
- - _The Edge Case:_ The main agent edits a file on the host, then immediately spawns a `bash` sub-agent. The sub-agent runs in the sandbox before the new host file is synced.
134
- - _Mitigation:_ The `SubAgentManager` shares the main harness's `FileSync` instance and always forces a `syncToSandbox()` pass _before_ the sub-agent takes its first step.
135
-
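The hard cap on concurrent async sub-agents can be sketched as a simple counter that rejects once it is full. `SpawnLimiter` is a hypothetical class and does not reproduce `SubAgentManager`; it shows only the rejection path, not the queueing alternative mentioned above.

```typescript
// Hypothetical limiter: at most `maxConcurrent` tasks may be active at once.
class SpawnLimiter {
  private active = 0;
  private readonly maxConcurrent: number;

  constructor(maxConcurrent = 3) {
    this.maxConcurrent = maxConcurrent;
  }

  // Reserve a slot; returns false (backpressure) when the cap is reached.
  tryAcquire(): boolean {
    if (this.active >= this.maxConcurrent) return false;
    this.active++;
    return true;
  }

  // Free a slot when a sub-agent finishes or fails.
  release(): void {
    if (this.active > 0) this.active--;
  }

  get activeCount(): number {
    return this.active;
  }
}
```

A caller would turn a `false` result into an error tool response so the main agent learns to wait instead of spawning more tasks.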
136
- ## 11. Stability & Reliability Edge Cases (M14)
137
-
138
- - **Context Window Overflows (Instant Death):**
139
- - _The Edge Case:_ Despite compaction thresholds, a single `read_file` returns a 120k-token string, instantly blowing past the 100% capacity mark. Compaction fails because the context is already overflowing.
140
- - _Mitigation:_ `ContextGuard` has a 95% "Emergency Truncation" threshold. Before hitting the API, if tokens > 95%, it _bypasses_ LLM compaction and brutally slices all but the last 4 messages, inserting a loud warning message directly into the stream, guaranteeing survival.
141
- - **Process Death Serialization Tearing:**
142
- - _The Edge Case:_ The `AutoSave` triggers at the exact millisecond the user presses `Ctrl+C`. The Node process terminates while `fs.writeFileSync` is mid-chunk, corrupting the JSONL session file irreversibly.
143
- - _Mitigation:_ Atomic saves. `SessionStore.saveSession()` writes to an intermediate staging stream. On `process.on('SIGINT')`, a synchronous `forceSave()` is fired to cleanly flush state _before_ `process.exit(0)`.
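The 95% "Emergency Truncation" rule described for `ContextGuard` can be sketched as a pure function. The names, the message shape, the warning text, and the 4 chars/token estimate are assumptions for illustration; only the 95% threshold and the "keep the last 4 messages" rule come from the text above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough estimate, assuming ~4 characters per token.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical guard: above the threshold, keep only the newest messages
// and prepend a warning, without calling the LLM for a summary.
function emergencyTruncate(
  messages: ChatMessage[],
  maxTokens: number,
  keepLast = 4,
  threshold = 0.95,
): ChatMessage[] {
  const total = messages.reduce((sum, m) => sum + approxTokens(m.content), 0);
  if (total <= maxTokens * threshold) return messages;

  const warning: ChatMessage = {
    role: "system",
    content: "[EMERGENCY TRUNCATION] Older context was dropped to avoid overflow.",
  };
  const tail = keepLast > 0 ? messages.slice(-keepLast) : [];
  return [warning, ...tail];
}
```

If one of the retained messages is itself larger than the window, a guard like this would still need to trim that message's content, which the sketch does not attempt.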
@@ -1,66 +0,0 @@
1
- # Initial Implementation Plan
2
-
3
- ## Phase 1: Context Engine & Caching Layer
4
-
5
- Build a structured Prompt Builder that strictly enforces the Prefix Matching patterns so every task in a session enjoys a >90% cache hit rate.
6
-
7
- ```mermaid
8
- graph TD
9
- A[Base System Prompt] -->|Static Prefix| B
10
- B[Tool Schemas] -->|Static Prefix| C
11
- C[Project Memory e.g., README] -->|Project Prefix| D
12
- D[Session Context e.g., OS Info] -->|Session Prefix| E
13
- E[Conversation History] -->|Dynamic Appends| F[New User/Tool Message]
14
-
15
- style A fill:#1e4620,stroke:#2b662e,color:#fff
16
- style B fill:#1e4620,stroke:#2b662e,color:#fff
17
- style C fill:#1e4620,stroke:#2b662e,color:#fff
18
- style D fill:#2b465e,stroke:#3b6282,color:#fff
19
- style E fill:#4a3219,stroke:#664422,color:#fff
20
-
21
- subgraph Fully Cached Prefix
22
- A
23
- B
24
- C
25
- end
26
- ```
27
-
28
- ## Phase 2: Interoperable Tooling & Lazy Loading
29
-
30
- Implement tools as immutable objects for the session. Implement "Plan Mode" to alter agent rules without unloading tool schemas.
31
-
32
- - Define core tools: `read_file`, `write_file`, `bash_command`.
33
- - Implement dummy/stub tools for complex integrations.
34
- - Implement "Cache-Safe Forking" for compaction.
35
-
36
- ## Phase 3: The Middleware Harness
37
-
38
- Implement pre-completion checks and loop detection via a middleware pipeline.
39
-
40
- ```mermaid
41
- sequenceDiagram
42
- participant Agent as LLM Agent
43
- participant Harness as Execution Harness
44
- participant Middle as Middleware Pipeline
45
- participant Env as Environment (Bash/FS)
46
-
47
- Agent->>Harness: Request: Edit target_file.py
48
- Harness->>Middle: Emit: 'pre_tool_call'
49
- Middle-->>Harness: Check LoopDetection (Fail if > 4 tries)
50
- Harness->>Env: Execute Edit
51
- Env-->>Harness: Return File Diff
52
- Harness->>Agent: Send Tool Result
53
-
54
- Agent->>Harness: Request: Submit/Exit
55
- Harness->>Middle: Emit: 'pre_submit'
56
- Middle->>Harness: Inject 'PreCompletionChecklist' (Wait, did you run tests?)
57
- Harness->>Agent: System Reminder: "Please run tests to verify."
58
- Agent->>Harness: Request: Run `pytest`
59
- ```
60
-
61
- ## Phase 4: Tracing & Feedback Loop
62
-
63
- Build an automated pipeline that sends JSON traces of failed agent runs into an evaluation database.
64
-
65
- - Hook LLM API calls to save traces.
66
- - Implement `TraceAnalyzer` subagent to review failures.