joonecli 0.1.0
- package/AGENTS.md +56 -0
- package/Handover.md +115 -0
- package/LICENSE +201 -0
- package/PROGRESS.md +160 -0
- package/README.md +114 -0
- package/dist/__tests__/bootstrap.test.d.ts +1 -0
- package/dist/__tests__/bootstrap.test.js +76 -0
- package/dist/__tests__/bootstrap.test.js.map +1 -0
- package/dist/__tests__/config.test.d.ts +1 -0
- package/dist/__tests__/config.test.js +84 -0
- package/dist/__tests__/config.test.js.map +1 -0
- package/dist/__tests__/m55.test.d.ts +1 -0
- package/dist/__tests__/m55.test.js +160 -0
- package/dist/__tests__/m55.test.js.map +1 -0
- package/dist/__tests__/middleware.test.d.ts +1 -0
- package/dist/__tests__/middleware.test.js +169 -0
- package/dist/__tests__/middleware.test.js.map +1 -0
- package/dist/__tests__/modelFactory.test.d.ts +1 -0
- package/dist/__tests__/modelFactory.test.js +50 -0
- package/dist/__tests__/modelFactory.test.js.map +1 -0
- package/dist/__tests__/optimizations.test.d.ts +1 -0
- package/dist/__tests__/optimizations.test.js +136 -0
- package/dist/__tests__/optimizations.test.js.map +1 -0
- package/dist/__tests__/promptBuilder.test.d.ts +1 -0
- package/dist/__tests__/promptBuilder.test.js +108 -0
- package/dist/__tests__/promptBuilder.test.js.map +1 -0
- package/dist/__tests__/sandbox.test.d.ts +1 -0
- package/dist/__tests__/sandbox.test.js +78 -0
- package/dist/__tests__/sandbox.test.js.map +1 -0
- package/dist/__tests__/security.test.d.ts +1 -0
- package/dist/__tests__/security.test.js +86 -0
- package/dist/__tests__/security.test.js.map +1 -0
- package/dist/__tests__/streaming.test.d.ts +1 -0
- package/dist/__tests__/streaming.test.js +71 -0
- package/dist/__tests__/streaming.test.js.map +1 -0
- package/dist/__tests__/toolRouter.test.d.ts +1 -0
- package/dist/__tests__/toolRouter.test.js +37 -0
- package/dist/__tests__/toolRouter.test.js.map +1 -0
- package/dist/__tests__/tools.test.d.ts +1 -0
- package/dist/__tests__/tools.test.js +112 -0
- package/dist/__tests__/tools.test.js.map +1 -0
- package/dist/__tests__/tracing.test.d.ts +1 -0
- package/dist/__tests__/tracing.test.js +147 -0
- package/dist/__tests__/tracing.test.js.map +1 -0
- package/dist/cli/config.d.ts +49 -0
- package/dist/cli/config.js +86 -0
- package/dist/cli/config.js.map +1 -0
- package/dist/cli/index.d.ts +2 -0
- package/dist/cli/index.js +625 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/cli/modelFactory.d.ts +9 -0
- package/dist/cli/modelFactory.js +154 -0
- package/dist/cli/modelFactory.js.map +1 -0
- package/dist/cli/providers.d.ts +18 -0
- package/dist/cli/providers.js +94 -0
- package/dist/cli/providers.js.map +1 -0
- package/dist/core/agentLoop.d.ts +43 -0
- package/dist/core/agentLoop.js +245 -0
- package/dist/core/agentLoop.js.map +1 -0
- package/dist/core/errors.d.ts +62 -0
- package/dist/core/errors.js +139 -0
- package/dist/core/errors.js.map +1 -0
- package/dist/core/promptBuilder.d.ts +49 -0
- package/dist/core/promptBuilder.js +84 -0
- package/dist/core/promptBuilder.js.map +1 -0
- package/dist/core/reasoningRouter.d.ts +62 -0
- package/dist/core/reasoningRouter.js +102 -0
- package/dist/core/reasoningRouter.js.map +1 -0
- package/dist/core/retry.d.ts +25 -0
- package/dist/core/retry.js +49 -0
- package/dist/core/retry.js.map +1 -0
- package/dist/core/sessionResumer.d.ts +17 -0
- package/dist/core/sessionResumer.js +78 -0
- package/dist/core/sessionResumer.js.map +1 -0
- package/dist/core/sessionStore.d.ts +45 -0
- package/dist/core/sessionStore.js +167 -0
- package/dist/core/sessionStore.js.map +1 -0
- package/dist/core/tokenCounter.d.ts +17 -0
- package/dist/core/tokenCounter.js +54 -0
- package/dist/core/tokenCounter.js.map +1 -0
- package/dist/evals/dataset.d.ts +4 -0
- package/dist/evals/dataset.js +61 -0
- package/dist/evals/dataset.js.map +1 -0
- package/dist/evals/evaluator.d.ts +21 -0
- package/dist/evals/evaluator.js +68 -0
- package/dist/evals/evaluator.js.map +1 -0
- package/dist/hitl/bridge.d.ts +65 -0
- package/dist/hitl/bridge.js +120 -0
- package/dist/hitl/bridge.js.map +1 -0
- package/dist/middleware/commandSanitizer.d.ts +18 -0
- package/dist/middleware/commandSanitizer.js +50 -0
- package/dist/middleware/commandSanitizer.js.map +1 -0
- package/dist/middleware/loopDetection.d.ts +28 -0
- package/dist/middleware/loopDetection.js +49 -0
- package/dist/middleware/loopDetection.js.map +1 -0
- package/dist/middleware/permission.d.ts +17 -0
- package/dist/middleware/permission.js +59 -0
- package/dist/middleware/permission.js.map +1 -0
- package/dist/middleware/pipeline.d.ts +31 -0
- package/dist/middleware/pipeline.js +62 -0
- package/dist/middleware/pipeline.js.map +1 -0
- package/dist/middleware/preCompletion.d.ts +29 -0
- package/dist/middleware/preCompletion.js +82 -0
- package/dist/middleware/preCompletion.js.map +1 -0
- package/dist/middleware/types.d.ts +40 -0
- package/dist/middleware/types.js +8 -0
- package/dist/middleware/types.js.map +1 -0
- package/dist/sandbox/bootstrap.d.ts +38 -0
- package/dist/sandbox/bootstrap.js +107 -0
- package/dist/sandbox/bootstrap.js.map +1 -0
- package/dist/sandbox/manager.d.ts +72 -0
- package/dist/sandbox/manager.js +180 -0
- package/dist/sandbox/manager.js.map +1 -0
- package/dist/sandbox/sync.d.ts +55 -0
- package/dist/sandbox/sync.js +135 -0
- package/dist/sandbox/sync.js.map +1 -0
- package/dist/skills/loader.d.ts +55 -0
- package/dist/skills/loader.js +132 -0
- package/dist/skills/loader.js.map +1 -0
- package/dist/skills/tools.d.ts +5 -0
- package/dist/skills/tools.js +78 -0
- package/dist/skills/tools.js.map +1 -0
- package/dist/skills/types.d.ts +13 -0
- package/dist/skills/types.js +2 -0
- package/dist/skills/types.js.map +1 -0
- package/dist/test_cache.d.ts +1 -0
- package/dist/test_cache.js +55 -0
- package/dist/test_cache.js.map +1 -0
- package/dist/test_google.js +93 -0
- package/dist/tools/askUser.d.ts +10 -0
- package/dist/tools/askUser.js +42 -0
- package/dist/tools/askUser.js.map +1 -0
- package/dist/tools/browser.d.ts +19 -0
- package/dist/tools/browser.js +111 -0
- package/dist/tools/browser.js.map +1 -0
- package/dist/tools/index.d.ts +27 -0
- package/dist/tools/index.js +184 -0
- package/dist/tools/index.js.map +1 -0
- package/dist/tools/registry.d.ts +31 -0
- package/dist/tools/registry.js +168 -0
- package/dist/tools/registry.js.map +1 -0
- package/dist/tools/router.d.ts +34 -0
- package/dist/tools/router.js +73 -0
- package/dist/tools/router.js.map +1 -0
- package/dist/tools/security.d.ts +28 -0
- package/dist/tools/security.js +183 -0
- package/dist/tools/security.js.map +1 -0
- package/dist/tools/webSearch.d.ts +6 -0
- package/dist/tools/webSearch.js +120 -0
- package/dist/tools/webSearch.js.map +1 -0
- package/dist/tracing/analyzer.d.ts +58 -0
- package/dist/tracing/analyzer.js +190 -0
- package/dist/tracing/analyzer.js.map +1 -0
- package/dist/tracing/langsmith.d.ts +38 -0
- package/dist/tracing/langsmith.js +50 -0
- package/dist/tracing/langsmith.js.map +1 -0
- package/dist/tracing/sessionTracer.d.ts +73 -0
- package/dist/tracing/sessionTracer.js +157 -0
- package/dist/tracing/sessionTracer.js.map +1 -0
- package/dist/tracing/types.d.ts +46 -0
- package/dist/tracing/types.js +5 -0
- package/dist/tracing/types.js.map +1 -0
- package/dist/ui/App.d.ts +24 -0
- package/dist/ui/App.js +172 -0
- package/dist/ui/App.js.map +1 -0
- package/dist/ui/components/HITLPrompt.d.ts +15 -0
- package/dist/ui/components/HITLPrompt.js +35 -0
- package/dist/ui/components/HITLPrompt.js.map +1 -0
- package/dist/ui/components/Header.d.ts +8 -0
- package/dist/ui/components/Header.js +6 -0
- package/dist/ui/components/Header.js.map +1 -0
- package/dist/ui/components/MessageBubble.d.ts +13 -0
- package/dist/ui/components/MessageBubble.js +17 -0
- package/dist/ui/components/MessageBubble.js.map +1 -0
- package/dist/ui/components/StatusBar.d.ts +21 -0
- package/dist/ui/components/StatusBar.js +34 -0
- package/dist/ui/components/StatusBar.js.map +1 -0
- package/dist/ui/components/StreamingText.d.ts +13 -0
- package/dist/ui/components/StreamingText.js +24 -0
- package/dist/ui/components/StreamingText.js.map +1 -0
- package/dist/ui/components/ToolCallPanel.d.ts +15 -0
- package/dist/ui/components/ToolCallPanel.js +18 -0
- package/dist/ui/components/ToolCallPanel.js.map +1 -0
- package/docs/01_insights_and_patterns.md +27 -0
- package/docs/02_edge_cases_and_mitigations.md +143 -0
- package/docs/03_initial_implementation_plan.md +66 -0
- package/docs/04_tech_stack_proposal.md +20 -0
- package/docs/05_prd.md +87 -0
- package/docs/06_user_stories.md +72 -0
- package/docs/07_system_architecture.md +138 -0
- package/docs/08_roadmap.md +200 -0
- package/e2b/Dockerfile +26 -0
- package/package.json +57 -0
- package/src/__tests__/bootstrap.test.ts +111 -0
- package/src/__tests__/config.test.ts +97 -0
- package/src/__tests__/m55.test.ts +238 -0
- package/src/__tests__/middleware.test.ts +219 -0
- package/src/__tests__/modelFactory.test.ts +63 -0
- package/src/__tests__/optimizations.test.ts +201 -0
- package/src/__tests__/promptBuilder.test.ts +141 -0
- package/src/__tests__/sandbox.test.ts +102 -0
- package/src/__tests__/security.test.ts +122 -0
- package/src/__tests__/streaming.test.ts +82 -0
- package/src/__tests__/toolRouter.test.ts +52 -0
- package/src/__tests__/tools.test.ts +146 -0
- package/src/__tests__/tracing.test.ts +196 -0
- package/src/agents/agentRegistry.ts +69 -0
- package/src/agents/agentSpec.ts +67 -0
- package/src/agents/builtinAgents.ts +142 -0
- package/src/cli/config.ts +124 -0
- package/src/cli/index.ts +730 -0
- package/src/cli/modelFactory.ts +174 -0
- package/src/cli/providers.ts +107 -0
- package/src/commands/builtinCommands.ts +293 -0
- package/src/commands/commandRegistry.ts +194 -0
- package/src/core/agentLoop.d.ts.map +1 -0
- package/src/core/agentLoop.ts +312 -0
- package/src/core/autoSave.ts +95 -0
- package/src/core/compactor.ts +252 -0
- package/src/core/contextGuard.ts +129 -0
- package/src/core/errors.ts +202 -0
- package/src/core/promptBuilder.d.ts.map +1 -0
- package/src/core/promptBuilder.ts +139 -0
- package/src/core/reasoningRouter.ts +121 -0
- package/src/core/retry.ts +75 -0
- package/src/core/sessionResumer.ts +90 -0
- package/src/core/sessionStore.ts +215 -0
- package/src/core/subAgent.ts +339 -0
- package/src/core/tokenCounter.ts +64 -0
- package/src/evals/dataset.ts +67 -0
- package/src/evals/evaluator.ts +81 -0
- package/src/hitl/bridge.ts +160 -0
- package/src/middleware/commandSanitizer.ts +60 -0
- package/src/middleware/loopDetection.ts +63 -0
- package/src/middleware/permission.ts +72 -0
- package/src/middleware/pipeline.ts +75 -0
- package/src/middleware/preCompletion.ts +94 -0
- package/src/middleware/types.ts +45 -0
- package/src/sandbox/bootstrap.ts +121 -0
- package/src/sandbox/manager.ts +239 -0
- package/src/sandbox/sync.ts +157 -0
- package/src/skills/loader.ts +143 -0
- package/src/skills/tools.ts +99 -0
- package/src/skills/types.ts +13 -0
- package/src/test_cache.ts +72 -0
- package/src/test_google.js +40 -0
- package/src/test_google.ts +40 -0
- package/src/tools/askUser.ts +47 -0
- package/src/tools/browser.ts +137 -0
- package/src/tools/index.d.ts.map +1 -0
- package/src/tools/index.ts +237 -0
- package/src/tools/registry.ts +198 -0
- package/src/tools/router.ts +78 -0
- package/src/tools/security.ts +220 -0
- package/src/tools/spawnAgent.ts +158 -0
- package/src/tools/webSearch.ts +142 -0
- package/src/tracing/analyzer.ts +265 -0
- package/src/tracing/langsmith.ts +63 -0
- package/src/tracing/sessionTracer.ts +202 -0
- package/src/tracing/types.ts +49 -0
- package/src/types/valyu.d.ts +37 -0
- package/src/ui/App.tsx +404 -0
- package/src/ui/components/HITLPrompt.tsx +119 -0
- package/src/ui/components/Header.tsx +51 -0
- package/src/ui/components/MessageBubble.tsx +46 -0
- package/src/ui/components/StatusBar.tsx +138 -0
- package/src/ui/components/StreamingText.tsx +48 -0
- package/src/ui/components/ToolCallPanel.tsx +80 -0
- package/tests/commands/commands.test.ts +356 -0
- package/tests/core/compactor.test.ts +217 -0
- package/tests/core/retryAndErrors.test.ts +164 -0
- package/tests/core/sessionResumer.test.ts +95 -0
- package/tests/core/sessionStore.test.ts +84 -0
- package/tests/core/stability.test.ts +165 -0
- package/tests/core/subAgent.test.ts +238 -0
- package/tests/hitl/hitlBridge.test.ts +115 -0
- package/tsconfig.json +16 -0
- package/vitest.config.ts +10 -0
- package/vitest.out +48 -0
@@ -0,0 +1,13 @@
+import React from "react";
+type MessageRole = "user" | "agent" | "system";
+interface MessageBubbleProps {
+    role: MessageRole;
+    content: string;
+}
+/**
+ * Renders a single conversation message with role-based styling.
+ * - User messages: cyan accent, labeled "you"
+ * - Agent messages: green accent, labeled "joone"
+ * - System messages: centered, dim yellow
+ */ export declare const MessageBubble: React.FC<MessageBubbleProps>;
+export {};
@@ -0,0 +1,17 @@
+import { jsxs as _jsxs, jsx as _jsx } from "react/jsx-runtime";
+import { Box, Text } from "ink";
+/**
+ * Renders a single conversation message with role-based styling.
+ * - User messages: cyan accent, labeled "you"
+ * - Agent messages: green accent, labeled "joone"
+ * - System messages: centered, dim yellow
+ */ export const MessageBubble = ({ role, content, }) => {
+    if (role === "system") {
+        return (_jsx(Box, { paddingX: 1, justifyContent: "center", children: _jsxs(Text, { dimColor: true, italic: true, color: "yellow", children: ["\u2699 ", content] }) }));
+    }
+    const isUser = role === "user";
+    const accentColor = isUser ? "cyan" : "green";
+    const label = isUser ? "you" : "joone";
+    return (_jsxs(Box, { flexDirection: "column", paddingX: 1, children: [_jsx(Text, { bold: true, color: accentColor, children: label }), _jsx(Box, { marginLeft: 2, children: _jsx(Text, { color: "white", wrap: "wrap", children: content }) })] }));
+};
+//# sourceMappingURL=MessageBubble.js.map
@@ -0,0 +1 @@
+{"version":3,"file":"MessageBubble.js","sourceRoot":"","sources":["../../../src/ui/components/MessageBubble.tsx"],"names":[],"mappings":";AACA,OAAO,EAAE,GAAG,EAAE,IAAI,EAAE,MAAM,KAAK,CAAC;AAShC;;;;;GAKG,CAAC,MAAM,CAAC,MAAM,aAAa,GAAiC,CAAC,EAC9D,IAAI,EACJ,OAAO,GACR,EAAE,EAAE;IACH,IAAI,IAAI,KAAK,QAAQ,EAAE,CAAC;QACtB,OAAO,CACL,KAAC,GAAG,IAAC,QAAQ,EAAE,CAAC,EAAE,cAAc,EAAC,QAAQ,YACvC,MAAC,IAAI,IAAC,QAAQ,QAAC,MAAM,QAAC,KAAK,EAAC,QAAQ,wBAC/B,OAAO,IACL,GACH,CACP,CAAC;IACJ,CAAC;IAED,MAAM,MAAM,GAAG,IAAI,KAAK,MAAM,CAAC;IAC/B,MAAM,WAAW,GAAG,MAAM,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,OAAO,CAAC;IAC9C,MAAM,KAAK,GAAG,MAAM,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,OAAO,CAAC;IAEvC,OAAO,CACL,MAAC,GAAG,IAAC,aAAa,EAAC,QAAQ,EAAC,QAAQ,EAAE,CAAC,aACrC,KAAC,IAAI,IAAC,IAAI,QAAC,KAAK,EAAE,WAAW,YAC1B,KAAK,GACD,EACP,KAAC,GAAG,IAAC,UAAU,EAAE,CAAC,YAChB,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,EAAC,IAAI,EAAC,MAAM,YAC5B,OAAO,GACH,GACH,IACF,CACP,CAAC;AACJ,CAAC,CAAC"}
@@ -0,0 +1,21 @@
+import React from "react";
+interface StatusBarProps {
+    /** Current tokens used in the context window (estimated). */
+    contextTokens?: number;
+    /** Maximum context window size for the model. */
+    maxContextTokens?: number;
+    /** Total tokens consumed across all LLM calls (prompt + completion). */
+    totalTokens?: number;
+    /** Cache hit rate (0–1). */
+    cacheHitRate?: number;
+    /** Elapsed session time. */
+    elapsed?: string;
+    /** Total tool calls executed. */
+    toolCalls?: number;
+    /** Number of LLM turns. */
+    turns?: number;
+    /** Estimated cost in USD. */
+    cost?: number;
+}
+export declare const StatusBar: React.FC<StatusBarProps>;
+export {};
@@ -0,0 +1,34 @@
+import { jsx as _jsx, jsxs as _jsxs } from "react/jsx-runtime";
+import { Box, Text } from "ink";
+/**
+ * Renders a visual capacity bar for the context window.
+ *
+ * Example: ▓▓▓▓▓▓▓▓░░░░░░░ 52%
+ */
+function ContextBar({ used, max, width = 16, }) {
+    const ratio = max > 0 ? Math.min(used / max, 1) : 0;
+    const filled = Math.round(ratio * width);
+    const empty = width - filled;
+    const pct = Math.round(ratio * 100);
+    // Color: green < 60%, yellow 60-80%, red > 80%
+    const barColor = ratio < 0.6 ? "green" : ratio < 0.8 ? "yellow" : "red";
+    return (_jsxs(Text, { children: [_jsx(Text, { color: barColor, children: "▓".repeat(filled) }), _jsx(Text, { dimColor: true, children: "░".repeat(empty) }), _jsx(Text, { children: " " }), _jsxs(Text, { color: barColor, children: [pct, "%"] })] }));
+}
+/**
+ * Formats a token count for display (e.g., 3241 → "3.2K", 128000 → "128K").
+ */
+function formatTokens(n) {
+    if (n >= 1_000_000)
+        return `${(n / 1_000_000).toFixed(1)}M`;
+    if (n >= 1_000)
+        return `${(n / 1_000).toFixed(1)}K`;
+    return `${n}`;
+}
+export const StatusBar = ({ contextTokens = 0, maxContextTokens = 200_000, totalTokens = 0, cacheHitRate, elapsed = "0s", toolCalls = 0, turns = 0, cost, }) => {
+    return (_jsxs(Box, { flexDirection: "column", borderStyle: "single", borderColor: "gray", paddingX: 1, children: [_jsxs(Box, { justifyContent: "space-between", children: [_jsxs(Box, { children: [_jsx(Text, { dimColor: true, children: "ctx " }), _jsx(ContextBar, { used: contextTokens, max: maxContextTokens }), _jsxs(Text, { dimColor: true, children: [" ", formatTokens(contextTokens), "/", formatTokens(maxContextTokens)] })] }), cacheHitRate !== undefined && (_jsxs(Text, { children: [_jsx(Text, { dimColor: true, children: "cache " }), _jsxs(Text, { color: cacheHitRate > 0.8
+                                    ? "green"
+                                    : cacheHitRate > 0.5
+                                        ? "yellow"
+                                        : "red", children: [(cacheHitRate * 100).toFixed(0), "%"] })] }))] }), _jsxs(Box, { justifyContent: "space-between", children: [_jsxs(Text, { children: [_jsx(Text, { dimColor: true, children: "tokens " }), _jsx(Text, { color: "white", children: formatTokens(totalTokens) })] }), _jsxs(Text, { children: [_jsx(Text, { dimColor: true, children: "turns " }), _jsx(Text, { color: "white", children: turns })] }), _jsxs(Text, { children: [_jsx(Text, { dimColor: true, children: "tools " }), _jsx(Text, { color: "white", children: toolCalls })] }), cost !== undefined && cost > 0 && (_jsxs(Text, { children: [_jsx(Text, { dimColor: true, children: "cost " }), _jsxs(Text, { color: "white", children: ["$", cost < 0.01 ? cost.toFixed(4) : cost.toFixed(2)] })] })), _jsxs(Text, { children: [_jsx(Text, { dimColor: true, children: "elapsed " }), _jsx(Text, { color: "white", children: elapsed })] })] })] }));
+};
+//# sourceMappingURL=StatusBar.js.map
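The two helpers in the StatusBar hunk above are pure functions, so their behavior is easy to check outside Ink. The following is a minimal sketch, assuming standalone TypeScript with explicit parameter types; `contextBar` and its plain-object return value are hypothetical stand-ins for the `ContextBar` component and are not part of the package. It suggests two small mismatches between the doc comments and the code: `toFixed(1)` formats 128000 as "128.0K" rather than "128K", and the default width of 16 yields 8 filled plus 8 empty cells at 52%, one more cell than the comment draws.

```typescript
// Standalone re-statement of the StatusBar helpers for experimentation.
// Assumption: `contextBar` returns plain data instead of Ink <Text> nodes.

type BarColor = "green" | "yellow" | "red";

interface BarView {
  bar: string;     // filled cells followed by empty cells
  pct: number;     // rounded percentage of the context window in use
  color: BarColor; // green below 60%, yellow below 80%, red from 80% up
}

function formatTokens(n: number): string {
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}K`;
  return `${n}`;
}

function contextBar(used: number, max: number, width = 16): BarView {
  // Clamp to 1 so an over-full context never overflows the bar.
  const ratio = max > 0 ? Math.min(used / max, 1) : 0;
  const filled = Math.round(ratio * width);
  const pct = Math.round(ratio * 100);
  const color: BarColor = ratio < 0.6 ? "green" : ratio < 0.8 ? "yellow" : "red";
  return { bar: "▓".repeat(filled) + "░".repeat(width - filled), pct, color };
}

console.log(formatTokens(3241));           // "3.2K"
console.log(formatTokens(128000));         // "128.0K"
console.log(contextBar(104_000, 200_000)); // 8 filled cells, 8 empty cells, pct 52, green
```

Because the color thresholds use strict `<` comparisons, a ratio of exactly 0.8 falls into the red band.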
@@ -0,0 +1 @@
+{"version":3,"file":"StatusBar.js","sourceRoot":"","sources":["../../../src/ui/components/StatusBar.tsx"],"names":[],"mappings":";AACA,OAAO,EAAE,GAAG,EAAE,IAAI,EAAE,MAAM,KAAK,CAAC;AAqBhC;;;;GAIG;AACH,SAAS,UAAU,CAAC,EAClB,IAAI,EACJ,GAAG,EACH,KAAK,GAAG,EAAE,GAKX;IACC,MAAM,KAAK,GAAG,GAAG,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,IAAI,GAAG,GAAG,EAAE,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC;IACpD,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,KAAK,GAAG,KAAK,CAAC,CAAC;IACzC,MAAM,KAAK,GAAG,KAAK,GAAG,MAAM,CAAC;IAC7B,MAAM,GAAG,GAAG,IAAI,CAAC,KAAK,CAAC,KAAK,GAAG,GAAG,CAAC,CAAC;IAEpC,+CAA+C;IAC/C,MAAM,QAAQ,GAAG,KAAK,GAAG,GAAG,CAAC,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,GAAG,GAAG,CAAC,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,KAAK,CAAC;IAExE,OAAO,CACL,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,KAAK,EAAE,QAAQ,YAAG,GAAG,CAAC,MAAM,CAAC,MAAM,CAAC,GAAQ,EAClD,KAAC,IAAI,IAAC,QAAQ,kBAAE,GAAG,CAAC,MAAM,CAAC,KAAK,CAAC,GAAQ,EACzC,KAAC,IAAI,oBAAS,EACd,MAAC,IAAI,IAAC,KAAK,EAAE,QAAQ,aAAG,GAAG,SAAS,IAC/B,CACR,CAAC;AACJ,CAAC;AAED;;GAEG;AACH,SAAS,YAAY,CAAC,CAAS;IAC7B,IAAI,CAAC,IAAI,SAAS;QAAE,OAAO,GAAG,CAAC,CAAC,GAAG,SAAS,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC;IAC5D,IAAI,CAAC,IAAI,KAAK;QAAE,OAAO,GAAG,CAAC,CAAC,GAAG,KAAK,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC;IACpD,OAAO,GAAG,CAAC,EAAE,CAAC;AAChB,CAAC;AAED,MAAM,CAAC,MAAM,SAAS,GAA6B,CAAC,EAClD,aAAa,GAAG,CAAC,EACjB,gBAAgB,GAAG,OAAO,EAC1B,WAAW,GAAG,CAAC,EACf,YAAY,EACZ,OAAO,GAAG,IAAI,EACd,SAAS,GAAG,CAAC,EACb,KAAK,GAAG,CAAC,EACT,IAAI,GACL,EAAE,EAAE;IACH,OAAO,CACL,MAAC,GAAG,IACF,aAAa,EAAC,QAAQ,EACtB,WAAW,EAAC,QAAQ,EACpB,WAAW,EAAC,MAAM,EAClB,QAAQ,EAAE,CAAC,aAGX,MAAC,GAAG,IAAC,cAAc,EAAC,eAAe,aACjC,MAAC,GAAG,eACF,KAAC,IAAI,IAAC,QAAQ,2BAAY,EAC1B,KAAC,UAAU,IAAC,IAAI,EAAE,aAAa,EAAE,GAAG,EAAE,gBAAgB,GAAI,EAC1D,MAAC,IAAI,IAAC,QAAQ,mBACX,GAAG,EACH,YAAY,CAAC,aAAa,CAAC,OAAG,YAAY,CAAC,gBAAgB,CAAC,IACxD,IACH,EACL,YAAY,KAAK,SAAS,IAAI,CAC7B,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,QAAQ,6BAAc,EAC5B,MAAC,IAAI,IACH,KAAK,EACH,YAAY,GAAG,GAAG;oCAChB,CAAC,CAAC,OAAO;oCACT,CAAC,CAAC,YAAY,GAAG,GAAG;wCAClB,CAAC,CAAC,QAAQ;wCACV,CAAC,CAAC,KAAK,aAGZ,CAAC,YAAY,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,SAC3B,IACF,CACR,IACG,EAGN,MAAC,GAAG,IAAC,cAAc,EAAC,eAAe,aACjC,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,QAAQ,8BAAe,EAC7B,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,YAAE,YAAY,CAAC,WAAW,CAAC,GAAQ,IACjD,EACP,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,QAAQ,6BAAc,EAC5B,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,YAAE,KAAK,GAAQ,IAC7B,EACP,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,QAAQ,6BAAc,EAC5B,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,YAAE,SAAS,GAAQ,IACjC,EACN,IAAI,KAAK,SAAS,IAAI,IAAI,GAAG,CAAC,IAAI,CACjC,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,QAAQ,4BAAa,EAC3B,MAAC,IAAI,IAAC,KAAK,EAAC,OAAO,kBACf,IAAI,GAAG,IAAI,CAAC,CAAC,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC,CAAC,IAC5C,IACF,CACR,EACD,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,QAAQ,+BAAgB,EAC9B,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,YAAE,OAAO,GAAQ,IAC/B,IACH,IACF,CACP,CAAC;AACJ,CAAC,CAAC"}
@@ -0,0 +1,13 @@
+import React from "react";
+interface StreamingTextProps {
+    /** Array of tokens that have been received so far. */
+    tokens: string[];
+    /** Whether the stream is still active. */
+    isStreaming: boolean;
+}
+/**
+ * StreamingText renders incoming tokens with a blinking cursor
+ * while the stream is active. Once streaming stops, the cursor disappears.
+ */
+export declare const StreamingText: React.FC<StreamingTextProps>;
+export {};
@@ -0,0 +1,24 @@
+import { jsx as _jsx, jsxs as _jsxs } from "react/jsx-runtime";
+import { useState, useEffect } from "react";
+import { Text } from "ink";
+/**
+ * StreamingText renders incoming tokens with a blinking cursor
+ * while the stream is active. Once streaming stops, the cursor disappears.
+ */
+export const StreamingText = ({ tokens, isStreaming, }) => {
+    const [cursorVisible, setCursorVisible] = useState(true);
+    useEffect(() => {
+        if (!isStreaming) {
+            setCursorVisible(false);
+            return;
+        }
+        setCursorVisible(true);
+        const interval = setInterval(() => {
+            setCursorVisible((v) => !v);
+        }, 500);
+        return () => clearInterval(interval);
+    }, [isStreaming]);
+    const fullText = tokens.join("");
+    return (_jsxs(Text, { children: [_jsx(Text, { color: "white", children: fullText }), isStreaming && (_jsx(Text, { color: "cyan", bold: true, children: cursorVisible ? "▊" : " " }))] }));
+};
+//# sourceMappingURL=StreamingText.js.map
@@ -0,0 +1 @@
+{"version":3,"file":"StreamingText.js","sourceRoot":"","sources":["../../../src/ui/components/StreamingText.tsx"],"names":[],"mappings":";AAAA,OAAc,EAAE,QAAQ,EAAE,SAAS,EAAE,MAAM,OAAO,CAAC;AACnD,OAAO,EAAE,IAAI,EAAE,MAAM,KAAK,CAAC;AAS3B;;;GAGG;AACH,MAAM,CAAC,MAAM,aAAa,GAAiC,CAAC,EAC1D,MAAM,EACN,WAAW,GACZ,EAAE,EAAE;IACH,MAAM,CAAC,aAAa,EAAE,gBAAgB,CAAC,GAAG,QAAQ,CAAC,IAAI,CAAC,CAAC;IAEzD,SAAS,CAAC,GAAG,EAAE;QACb,IAAI,CAAC,WAAW,EAAE,CAAC;YACjB,gBAAgB,CAAC,KAAK,CAAC,CAAC;YACxB,OAAO;QACT,CAAC;QAED,gBAAgB,CAAC,IAAI,CAAC,CAAC;QAEvB,MAAM,QAAQ,GAAG,WAAW,CAAC,GAAG,EAAE;YAChC,gBAAgB,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,CAAC,CAAC;QAC9B,CAAC,EAAE,GAAG,CAAC,CAAC;QAER,OAAO,GAAG,EAAE,CAAC,aAAa,CAAC,QAAQ,CAAC,CAAC;IACvC,CAAC,EAAE,CAAC,WAAW,CAAC,CAAC,CAAC;IAElB,MAAM,QAAQ,GAAG,MAAM,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;IAEjC,OAAO,CACL,MAAC,IAAI,eACH,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,YAAE,QAAQ,GAAQ,EACpC,WAAW,IAAI,CACd,KAAC,IAAI,IAAC,KAAK,EAAC,MAAM,EAAC,IAAI,kBACpB,aAAa,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,GAAG,GACrB,CACR,IACI,CACR,CAAC;AACJ,CAAC,CAAC"}
@@ -0,0 +1,15 @@
+import React from "react";
+export type ToolCallStatus = "running" | "success" | "error";
+interface ToolCallPanelProps {
+    toolName: string;
+    args?: Record<string, unknown>;
+    status: ToolCallStatus;
+    result?: string;
+}
+/**
+ * Styled panel that appears when the agent invokes a tool.
+ * Shows the tool name, arguments, a spinner while executing,
+ * and the result once complete.
+ */
+export declare const ToolCallPanel: React.FC<ToolCallPanelProps>;
+export {};
@@ -0,0 +1,18 @@
+import { jsx as _jsx, jsxs as _jsxs } from "react/jsx-runtime";
+import { Box, Text } from "ink";
+import Spinner from "ink-spinner";
+/**
+ * Styled panel that appears when the agent invokes a tool.
+ * Shows the tool name, arguments, a spinner while executing,
+ * and the result once complete.
+ */
+export const ToolCallPanel = ({ toolName, args, status, result, }) => {
+    const borderColor = status === "running" ? "yellow" : status === "success" ? "green" : "red";
+    const statusIcon = status === "running" ? (_jsx(Text, { color: "yellow", children: _jsx(Spinner, { type: "dots" }) })) : status === "success" ? (_jsx(Text, { color: "green", children: "\u2713" })) : (_jsx(Text, { color: "red", children: "\u2717" }));
+    return (_jsxs(Box, { flexDirection: "column", borderStyle: "round", borderColor: borderColor, paddingX: 1, marginY: 0, children: [_jsxs(Box, { gap: 1, children: [statusIcon, _jsx(Text, { bold: true, color: borderColor, children: toolName })] }), args && Object.keys(args).length > 0 && (_jsx(Box, { marginLeft: 2, flexDirection: "column", children: Object.entries(args).map(([key, value]) => (_jsxs(Text, { children: [_jsxs(Text, { dimColor: true, children: [key, ":"] }), " ", _jsx(Text, { color: "white", children: typeof value === "string"
+                                ? value.length > 80
+                                    ? value.slice(0, 77) + "..."
+                                    : value
+                                : JSON.stringify(value) })] }, key))) })), result && (_jsx(Box, { marginTop: 0, marginLeft: 2, children: _jsx(Text, { dimColor: true, children: result.length > 120 ? result.slice(0, 117) + "..." : result }) }))] }));
+};
+//# sourceMappingURL=ToolCallPanel.js.map
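The truncation rules in the ToolCallPanel hunk above are inlined in the JSX: string arguments are cut to 80 characters and results to 120, each with a three-character ellipsis. A minimal sketch of the same rule as a reusable function follows; `truncate`, `formatArg`, and `formatResult` are hypothetical names introduced for illustration and do not appear in the package.

```typescript
// Hypothetical helpers that mirror the inline truncation in ToolCallPanel.
// The limits (80 for arguments, 120 for results) come from the compiled output.

function truncate(text: string, max: number): string {
  // Reserve three characters for "..." so the result is exactly `max` long.
  return text.length > max ? text.slice(0, max - 3) + "..." : text;
}

function formatArg(value: unknown): string {
  // As in the original, only strings are truncated; other values are
  // serialized with JSON.stringify and shown at full length.
  return typeof value === "string" ? truncate(value, 80) : JSON.stringify(value);
}

function formatResult(result: string): string {
  return truncate(result, 120);
}

console.log(formatArg("x".repeat(200)).length); // 80
console.log(formatArg({ path: "src/index.ts" })); // {"path":"src/index.ts"}
console.log(formatResult("ok")); // ok
```

One consequence visible in this form is that a large object or array argument is not capped, since only the string branch is truncated.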
@@ -0,0 +1 @@
+{"version":3,"file":"ToolCallPanel.js","sourceRoot":"","sources":["../../../src/ui/components/ToolCallPanel.tsx"],"names":[],"mappings":";AACA,OAAO,EAAE,GAAG,EAAE,IAAI,EAAE,MAAM,KAAK,CAAC;AAChC,OAAO,OAAO,MAAM,aAAa,CAAC;AAWlC;;;;GAIG;AACH,MAAM,CAAC,MAAM,aAAa,GAAiC,CAAC,EAC1D,QAAQ,EACR,IAAI,EACJ,MAAM,EACN,MAAM,GACP,EAAE,EAAE;IACH,MAAM,WAAW,GACf,MAAM,KAAK,SAAS,CAAC,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,MAAM,KAAK,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,KAAK,CAAC;IAE3E,MAAM,UAAU,GACd,MAAM,KAAK,SAAS,CAAC,CAAC,CAAC,CACrB,KAAC,IAAI,IAAC,KAAK,EAAC,QAAQ,YAClB,KAAC,OAAO,IAAC,IAAI,EAAC,MAAM,GAAG,GAClB,CACR,CAAC,CAAC,CAAC,MAAM,KAAK,SAAS,CAAC,CAAC,CAAC,CACzB,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,uBAAS,CAC7B,CAAC,CAAC,CAAC,CACF,KAAC,IAAI,IAAC,KAAK,EAAC,KAAK,uBAAS,CAC3B,CAAC;IAEJ,OAAO,CACL,MAAC,GAAG,IACF,aAAa,EAAC,QAAQ,EACtB,WAAW,EAAC,OAAO,EACnB,WAAW,EAAE,WAAW,EACxB,QAAQ,EAAE,CAAC,EACX,OAAO,EAAE,CAAC,aAEV,MAAC,GAAG,IAAC,GAAG,EAAE,CAAC,aACR,UAAU,EACX,KAAC,IAAI,IAAC,IAAI,QAAC,KAAK,EAAE,WAAW,YAC1B,QAAQ,GACJ,IACH,EAEL,IAAI,IAAI,MAAM,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,MAAM,GAAG,CAAC,IAAI,CACvC,KAAC,GAAG,IAAC,UAAU,EAAE,CAAC,EAAE,aAAa,EAAC,QAAQ,YACvC,MAAM,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,GAAG,EAAE,KAAK,CAAC,EAAE,EAAE,CAAC,CAC1C,MAAC,IAAI,eACH,MAAC,IAAI,IAAC,QAAQ,mBAAE,GAAG,SAAS,EAAC,GAAG,EAChC,KAAC,IAAI,IAAC,KAAK,EAAC,OAAO,YAChB,OAAO,KAAK,KAAK,QAAQ;gCACxB,CAAC,CAAC,KAAK,CAAC,MAAM,GAAG,EAAE;oCACjB,CAAC,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,EAAE,CAAC,GAAG,KAAK;oCAC5B,CAAC,CAAC,KAAK;gCACT,CAAC,CAAC,IAAI,CAAC,SAAS,CAAC,KAAK,CAAC,GACpB,KARE,GAAG,CASP,CACR,CAAC,GACE,CACP,EAEA,MAAM,IAAI,CACT,KAAC,GAAG,IAAC,SAAS,EAAE,CAAC,EAAE,UAAU,EAAE,CAAC,YAC9B,KAAC,IAAI,IAAC,QAAQ,kBACX,MAAM,CAAC,MAAM,GAAG,GAAG,CAAC,CAAC,CAAC,MAAM,CAAC,KAAK,CAAC,CAAC,EAAE,GAAG,CAAC,GAAG,KAAK,CAAC,CAAC,CAAC,MAAM,GACvD,GACH,CACP,IACG,CACP,CAAC;AACJ,CAAC,CAAC"}
@@ -0,0 +1,27 @@
|
|
|
1
|
+
# Actionable Insights, Patterns, and Best Practices
|
|
2
|
+
|
|
3
|
+
Derived from recent research on Harness Engineering and Prompt Caching for Agentic Coding.
|
|
4
|
+
|
|
5
|
+
## 1. The Cache-Optimized Context Prefix (Prompt Caching)
|
|
6
|
+
|
|
7
|
+
- **Prefix Matching Rule:** LLM APIs cache everything from the start of a prompt up to a `cache_control` breakpoint. Any dynamic change in the middle invalidates the rest of the cache.
|
|
8
|
+
- **Order Matters (Static to Dynamic):**
|
|
9
|
+
1. Base System Instructions & Tool Definitions (Globally Cached)
|
|
10
|
+
2. Project/Workspace memory (e.g., `CLAUDE.md`) (Cached per project)
|
|
11
|
+
3. Session State (Environment variables, rules) (Cached per session)
|
|
12
|
+
4. Conversation Messages (Grows iteratively)
|
|
13
|
+
- **Immutability within a Session:** Never add/remove tools mid-conversation, and never swap models (e.g., from Opus to Haiku) mid-session, as this breaks the cache prefix.
|
|
14
|
+
- **The `<system-reminder>` Pattern:** If you need to update agent behavior or state, do **not** edit the system prompt. Instead, insert a `<system-reminder>` tag inside the next simulated User Message or Tool Result.
|
|
15
|
+
|
|
16
|
+
## 2. Harness Engineering & Middleware
|
|
17
|
+
|
|
18
|
+
- **Control via Harness, Not Just Prompts:** Mold the agent's behavior by building programmatic wrappers (middleware) around the LLM reasoning step rather than just asking the LLM nicely.
|
|
19
|
+
- **Anti-Doom-Loop Middleware:** Track per-file edits in the harness. If an agent edits the same file N times without success, inject a message forcing it to reconsider its approach.
|
|
20
|
+
- **Forced Self-Verification:** Agents tend to write code and immediately stop without testing. Implement a `PreCompletionChecklistMiddleware` that intercepts the agent's attempt to exit, forcing it to run local tests and read the full output before concluding.
|
|
21
|
+
- **Local Context Injection:** Automatically discover and map the working directory and available binaries (e.g., Python, Node) into the prompt upon startup.
|
|
22
|
+
|
|
23
|
+
## 3. Agent Execution Strategy
|
|
24
|
+
|
|
25
|
+
- **The Reasoning Sandwich:** Adjust the amount of compute/reasoning dynamically. Use heavy reasoning for Planning, Discovery, and Final Verification, but use medium reasoning for straightforward code implementations to save time and tokens.
|
|
26
|
+
- **Lazy Tool Loading (Searchable Tools):** Instead of stuffing every possible schema into the prompt, provide "stubs" (tool names and descriptions). Allow the agent to search for advanced tools, deferring the loading of full schemas to preserve prefix caching.
|
|
27
|
+
- **Trace-Driven Improvement:** Treat tracing (e.g., LangSmith) as a first-class feature. Route raw text-space traces to a designated "Trace Analyzer Subagent" to find where the agent frequently fails, allowing you to patch the harness without blindly guessing.
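
Lazy tool loading, as described in this section, might look like the following registry. It is a sketch under assumptions: the `ToolStub`, `FullTool`, and `LazyToolRegistry` names and the plain-object schema shape are hypothetical and not taken from the package.

```typescript
// Hypothetical types: a stub is what the cached prompt prefix sees,
// a full tool also carries the (potentially large) schema.
interface ToolStub {
  name: string;
  description: string;
}
interface FullTool extends ToolStub {
  schema: Record<string, unknown>;
}

class LazyToolRegistry {
  private tools = new Map<string, FullTool>();
  private loaded = new Set<string>();

  register(tool: FullTool): void {
    this.tools.set(tool.name, tool);
  }

  // Only names and descriptions are exposed up front.
  stubs(): ToolStub[] {
    return Array.from(this.tools.values()).map((t) => ({
      name: t.name,
      description: t.description,
    }));
  }

  // Case-insensitive keyword search over names and descriptions.
  search(query: string): ToolStub[] {
    const q = query.toLowerCase();
    return this.stubs().filter(
      (s) => s.name.toLowerCase().includes(q) || s.description.toLowerCase().includes(q)
    );
  }

  // The full schema is handed out on demand and remembered as loaded.
  load(name: string): FullTool | undefined {
    const tool = this.tools.get(name);
    if (tool !== undefined) this.loaded.add(name);
    return tool;
  }

  loadedNames(): string[] {
    return Array.from(this.loaded);
  }
}
```

Because the stubs never change during a session, the prompt prefix stays byte-identical; a loaded schema would be delivered through the message history instead.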
|
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# Edge Cases & Mitigations
|
|
2
|
+
|
|
3
|
+
When building a coding agent with Prompt Caching + Middlewares, these are the primary edge cases to design around:
|
|
4
|
+
|
|
5
|
+
## 1. Prompt Caching Edge Cases (Cost & Latency Traps)
|
|
6
|
+
|
|
7
|
+
- **The "Leaky Timestamp" Cache Breaker:**
|
|
8
|
+
- _The Edge Case:_ If you inject dynamic data (like the current time, memory usage, or random UUIDs) into your Base System Prompt, you will achieve a **0% cache hit rate**. The cache relies on exact prefix matching.
|
|
9
|
+
- _Mitigation:_ Put all static, immutable instructions at the top. Any dynamic state must be injected via a `<system-reminder>` inside the _Messages_ array (which sits at the end of the context).
|
|
10
|
+
- **The Mid-Session Model Switch:**
|
|
11
|
+
- _The Edge Case:_ Switching models mid-thread (e.g., cheap model for summarizing, smart model for coding) means the new model has an empty cache and must re-process the entire prompt prefix from scratch.
|
|
12
|
+
- _Mitigation:_ Avoid swapping models in the same thread. Spawn a "Sub-agent" thread and pass only the minimum necessary context.
|
|
13
|
+
- **Context Window Compaction (Amnesia):**
|
|
14
|
+
- _The Edge Case:_ Summarizing a long conversation and starting a new prompt causes you to lose your cached prefix AND the agent forgets specific constraints.
|
|
15
|
+
- _Mitigation:_ Implement **Cache-Safe Forking**. Keep the exact same System Prompt and Tool definitions. Start a new thread by passing the summary of the previous history as the first few messages, followed by the new task.
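
The prefix-ordering rule behind these mitigations can be illustrated with a small prompt assembler. This is a sketch, not the package's Prompt Builder: the `PromptParts` fields and the function names are hypothetical, and real provider SDKs use their own message types.

```typescript
type Role = "system" | "user" | "assistant";
interface Message {
  role: Role;
  content: string;
}

// Hypothetical container for the static layers, in cache order.
interface PromptParts {
  baseSystemPrompt: string;
  toolSchemas: string;
  projectMemory: string;
  sessionState: string;
}

// The prefix is built in a fixed order and contains no per-turn data.
function buildPrefix(parts: PromptParts): string {
  return [parts.baseSystemPrompt, parts.toolSchemas, parts.projectMemory, parts.sessionState].join("\n\n");
}

// Dynamic state is appended to the history as a tagged message
// instead of being written into the prefix.
function withReminder(history: Message[], reminder: string): Message[] {
  return [...history, { role: "user", content: `<system-reminder>${reminder}</system-reminder>` }];
}

function buildRequest(parts: PromptParts, history: Message[]): Message[] {
  return [{ role: "system", content: buildPrefix(parts) }, ...history];
}
```

Two turns built from the same `PromptParts` share an identical first message even when the reminders differ, which is the property exact prefix matching depends on.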
|
|
16
|
+
|
|
17
|
+
## 2. Harness & Middleware Edge Cases (Logic Traps)
|
|
18
|
+
|
|
19
|
+
- **The "Massive File" Blunder:**
|
|
20
|
+
- _The Edge Case:_ The agent reads a 10,000-line minified file. This floods the context window, pushes out important instructions, and ruins the session cache.
|
|
21
|
+
- _Mitigation:_ Harness-level Guardrails. Restrict `read_file` to return chunks or force the agent to use `grep_search` / `view_file_outline`.
|
|
22
|
+
- **The "Blind Retry" Doom Loop:**
|
|
23
|
+
- _The Edge Case:_ The agent misses a space in a search-and-replace, fails, and tries the exact same edit endlessly.
|
|
24
|
+
- _Mitigation:_ Use `LoopDetectionMiddleware`. If the agent emits identical tool calls 3 times, intercept and inject: _"You have failed this 3 times. Stop trying this approach."_
|
|
25
|
+
- **The "Fake Success" Verification:**
|
|
26
|
+
- _The Edge Case:_ The agent runs tests, they fail, but the agent hallucinates that the failure is acceptable and marks the task as Done. Older approaches relied on fragile string parsing (e.g., matching "failed" in output), which could easily be bypassed or confused by test output.
|
|
27
|
+
- _Mitigation:_ The harness must programmatically parse terminal exit codes. By explicitly surfacing structured tool metadata (e.g., `ToolResult.metadata.exitCode`) from execution sandboxes, the `PreCompletionMiddleware` reliably blocks the agent from exiting if tests don't pass (`exitCode !== 0`).
|
|
28
|
+
- **Tool Schema Amnesia (with Lazy Loading):**
|
|
29
|
+
- _The Edge Case:_ An agent loads a complex tool lazily, uses it once, and then later forgets how to format its JSON schema.
|
|
30
|
+
- _Mitigation:_ If a tool is "discovered", it must remain in the "Messages" context as a system reminder so the schema is preserved.
|
|
31
|
+
- **The "Ghost Tool Call" (Context Desync):**
|
|
32
|
+
- _The Edge Case:_ A model emits a tool call but occasionally forgets to attach an internal `tool_call_id` (this breaks the strict `AIMessage[tool_calls] -> ToolMessage[tool_call_id]` sequencing rules required by modern LangChain/Anthropic/OpenAI APIs). If you forge a fake ID or cast it as a string, the LLM rejects the context on the next turn.
|
|
33
|
+
- _Mitigation:_ The "Soft Fail" approach. Intercept the malformed tool call in the `ExecutionHarness`. Do not execute the tool and do not emit a `ToolMessage`. Instead, emit a corrective `HumanMessage` stating: _"You attempted to call tool X, but didn't provide a tool_call_id. Please try again."_ This prevents context poisoning.
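
The exit-code check from the "Fake Success" mitigation above can be sketched as a pure decision function. Only `ToolResult.metadata.exitCode` comes from the text; `checkCompletion`, `CompletionDecision`, and the reminder strings are hypothetical.

```typescript
interface ToolResult {
  output: string;
  metadata: { exitCode?: number };
}

type CompletionDecision =
  | { allow: true }
  | { allow: false; reminder: string };

// Decide whether the agent may finish, based only on the exit code of
// the most recent test run. The textual output is deliberately ignored.
function checkCompletion(lastTestRun: ToolResult | undefined): CompletionDecision {
  if (lastTestRun === undefined || lastTestRun.metadata.exitCode === undefined) {
    return { allow: false, reminder: "Run the test suite and read the full output before finishing." };
  }
  if (lastTestRun.metadata.exitCode !== 0) {
    return {
      allow: false,
      reminder: `Tests exited with code ${lastTestRun.metadata.exitCode}. Fix the failures before finishing.`,
    };
  }
  return { allow: true };
}
```

A run whose output happens to contain "0 failed" but whose exit code is 1 is still blocked, which is the case that string matching tends to get wrong.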
|
|
34
|
+
|
|
35
|
+
## 3. Security & Execution Edge Cases (Tool Exploits)
|
|
36
|
+
|
|
37
|
+
- **Command Injection via Malicious Interpolation:**
|
|
38
|
+
- _The Edge Case:_ Passing user-provided arguments directly into shell commands (e.g., `agent-browser --url "${args.url}"` or `gemini --file "${args.path}"`) allows attackers to escape quotes and execute arbitrary commands in the sandbox (e.g., `url = '"; cat /etc/passwd; "'`).
|
|
39
|
+
- _Mitigation:_ Use strict Bash parameter escaping. All dynamic strings passed to shell commands are wrapped in single quotes, and any embedded single quote is replaced with `'\''` (written `'\\''` inside a JavaScript string literal).
|
|
40
|
+
- **Host Filesystem Path Traversal (The "Escaped Workspace" Vulnerability):**
|
|
41
|
+
- _The Edge Case:_ Because `read_file` and `write_file` execute on the host machine to support live IDE syncing, a malicious prompt could instruct the agent to write to `~/.bashrc`, `C:\Windows\System32`, or `~/.ssh/id_rsa`, compromising the user's host machine.
|
|
42
|
+
- _Mitigation:_ Implement strict Workspace Jail boundaries. Before any host I/O operation, the resolved path is evaluated against `process.cwd()`. If the path attempts to escape the root workspace, the tool immediately rejects the call returning a permissions error.
|
|
43
|
+
- **Silently Swallowed CLI Errors:**
|
|
44
|
+
- _The Edge Case:_ A CLI tool (like OSV-Scanner) crashes due to a configuration error (exit code > 1) and prints an error to `stderr`. If the orchestration layer only checks `stdout` and swallows non-zero exit codes, silently falling back to another tool, the critical error trace is lost.
|
|
45
|
+
- _Mitigation:_ Enforce strict exit code verification (e.g., `exitCode === 1` means vulnerabilities found) and emit clear warnings with the full `stderr` trace before attempting any fallback strategies.
|
|
46
|
+
- **The "Over-Eager Doom Loop" Reporter:**
|
|
47
|
+
- _The Edge Case:_ When detecting a doom loop (calling the same tool with identical args continuously), firing an alert during the active iteration causes redundant, spammy issue reports (e.g., reporting loop counts 3, 4, and 5 as separate critical issues).
|
|
48
|
+
- _Mitigation:_ Track the loop state continuously but defer pushing the `AnalysisIssue` to the report array until the loop is visibly broken by a differing action, or the trace ends.
|
|
49
|
+
- **The "Parallel Tool Expansion" Bug (TUI Memory Corruption):**
|
|
50
|
+
- _The Edge Case:_ In a Terminal UI rendering loop, executing an array of tool calls _inside_ the UI rendering iteration causes the generated `ToolMessage` array to be appended to the conversation history $N$ times (for $N$ tools), massively inflating context usage with duplicated data.
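
Two of the mitigations in this section, single-quote shell escaping and the workspace jail, can be sketched as below. The function names are hypothetical, and this is an illustration under assumptions rather than the package's code. It uses only `path.resolve`, `path.relative`, `path.isAbsolute`, and `path.sep` from Node's `path` module.

```typescript
import * as path from "node:path";

// Wrap a value in single quotes for a POSIX shell. Each embedded single
// quote becomes '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// True when `candidate`, resolved against `root`, stays inside `root`.
// Note: this does not expand "~" and does not follow symlinks.
function isInsideWorkspace(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}
```

For example, `shellQuote('"; cat /etc/passwd; "')` yields a single inert argument, and `isInsideWorkspace(process.cwd(), "../../etc/passwd")` returns `false`. A production jail would also need to resolve symlinks (for instance with `fs.realpathSync`) and expand `~` before checking.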
|
|
51
|
+
|
|
52
|
+
## 4. Persistent Session Edge Cases (State Management)
|
|
53
|
+
|
|
54
|
+
- **File System Drift (Host Desync):**
|
|
55
|
+
- _The Edge Case:_ The agent edits a file and the session is paused. A human edits the file externally before the session is resumed. The agent resumes, unaware of the external edits, and attempts a line-based replacement that corrupts the file.
|
|
56
|
+
- _Mitigation:_ `SessionResumer` explicitly logs `mtime` file stats. Upon resumption, it flags recently modified workspace files and injects a "Wakeup Prompt" forcing the LLM to diff or re-read the file before acting.
|
|
57
|
+
- **Sandbox Ephemerality (The Amnesia Problem):**
|
|
58
|
+
- _The Edge Case:_ A session running a background Express server in a cloud sandbox on Friday is resumed on Monday. The cloud provider killed the idle VM. The new VM lacks the running server, but the LLM’s context history believes it is still running.
|
|
59
|
+
- _Mitigation:_ Sandboxes are treated as strictly stateless. Upon session resumption, the agent is injected with a system message stating that the sandbox was recycled and that it must manually restart required daemons/dev-servers.
|
|
60
|
+
- **"Mid-Breath" Interruption State (Corrupt Serialization):**
|
|
61
|
+
- _The Edge Case:_ A forced exit (`SIGINT`/Power Loss) occurs exactly while the agent stream is halfway through emitting a JSON tool call chunk, serializing a broken `AIMessage` into history.
|
|
62
|
+
- _Mitigation:_ The `SessionStore` must only trigger a `saveSession()` at strict execution boundaries (e.g. after a complete LLM generation cycle or successfully parsed CLI execution), guaranteeing invalid mid-stream JSON chunks never touch the disk.
|
|
63
|
+
- **Context Overflow (The Infinite Chat Log):**
|
|
64
|
+
- _The Edge Case:_ A persistent session spanning weeks grows the context past 200k tokens, hitting API limits and inflating the token cost of every subsequent turn.
|
|
65
|
+
- _Mitigation:_ Compaction is forced _before_ disk serialization. The session stringifies and compresses turns older than $N$ iterations into a dense system summary block before writing to `.jsonl`.
|
|
66
|
+
- **Provider/Model Switching Mid-Task:**
|
|
67
|
+
- _The Edge Case:_ Starting a complex reasoning loop with Opus, pausing, and resuming with a lightweight local model like Llama 3 8B. The history is filled with complex schema usages that confuse the smaller model.
|
|
68
|
+
- _Mitigation:_ Serialize the `.jsonl` lines with `provider/model` metadata blocks. Upon resumption, the CLI explicitly warns if a provider downgrade is detected.
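
The file-drift check from this section can be sketched as a comparison of two `mtime` snapshots. This is one possible reading of "flags recently modified workspace files": the snapshot shape and the function names `findDriftedFiles` and `buildWakeupPrompt` are assumptions, and gathering the timestamps (for example with `fs.statSync`) is left to the caller.

```typescript
// Hypothetical snapshot: file path -> modification time in milliseconds.
type MtimeSnapshot = Map<string, number>;

// Files whose mtime differs from the saved snapshot, or that vanished.
function findDriftedFiles(saved: MtimeSnapshot, current: MtimeSnapshot): string[] {
  const drifted: string[] = [];
  saved.forEach((savedMtime, file) => {
    const currentMtime = current.get(file);
    if (currentMtime === undefined || currentMtime !== savedMtime) {
      drifted.push(file);
    }
  });
  return drifted.sort();
}

// Build the "Wakeup Prompt" text, or null when nothing changed.
function buildWakeupPrompt(drifted: string[]): string | null {
  if (drifted.length === 0) return null;
  return (
    "<system-reminder>These files changed while the session was paused. " +
    `Re-read them before editing: ${drifted.join(", ")}</system-reminder>`
  );
}
```

On resume, the harness would append the returned prompt to the history so the agent re-reads the listed files before attempting any line-based edit.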
|
|
69
|
+
|
|
70
|
+
## 5. Error Recovery & Retry Edge Cases
|
|
71
|
+
|
|
72
|
+
- **Transient LLM API Failure (429/5xx):**
|
|
73
|
+
- _The Edge Case:_ The LLM provider returns a rate-limit (429) or server error (500/502/503) mid-turn, crashing the entire session.
|
|
74
|
+
- _Mitigation:_ `retryWithBackoff()` wraps all LLM calls with exponential backoff (1s→2s→4s + jitter). Only `JooneError` instances with `retryable === true` trigger retries; auth failures (401/403) propagate immediately.
|
|
75
|
+
- **Exhausted Retries (Self-Recovery):**
|
|
76
|
+
- _The Edge Case:_ After 3 retry attempts, the LLM API is still down. The session crashes and the user loses all progress.
|
|
77
|
+
- _Mitigation:_ Instead of crashing, `ExecutionHarness` injects the error's `toRecoveryHint()` as a `SystemMessage` into the conversation, returning a synthetic `AIMessage`. The agent can observe the error context and adapt (e.g., wait, simplify, or ask the user).
|
|
78
|
+
- **Unclassified Provider Errors:**
|
|
79
|
+
- _The Edge Case:_ A new LLM provider throws a non-standard error with no HTTP status code, bypassing the retry classification.
|
|
80
|
+
- _Mitigation:_ `wrapLLMError()` inspects `.status`, `.statusCode`, `.code`, and `.response.status` on raw errors, covering the common patterns of LangChain, Axios, and native `fetch` errors.
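
The retry classification described in this section can be sketched as follows. This is not the package's `retryWithBackoff()` or `wrapLLMError()`: the helper names are hypothetical, the 250 ms jitter bound is an arbitrary choice, and treating errors without a numeric status as non-retryable is an assumption.

```typescript
// Look for an HTTP-like numeric status on the common error shapes.
function extractStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as {
    status?: unknown;
    statusCode?: unknown;
    code?: unknown;
    response?: { status?: unknown };
  };
  const candidates = [e.status, e.statusCode, e.code, e.response?.status];
  for (const c of candidates) {
    if (typeof c === "number") return c;
  }
  return undefined;
}

// 429 and 5xx are transient; everything else (including 401/403) is not.
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

// Attempt 0, 1, 2 -> 1000, 2000, 4000 ms, plus optional jitter.
function backoffDelayMs(attempt: number, jitterMs: number = 0): number {
  return 1000 * 2 ** attempt + jitterMs;
}

async function retryWithBackoffSketch<T>(fn: () => Promise<T>, maxRetries: number = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or when retries are exhausted.
      if (attempt >= maxRetries || !isRetryable(extractStatus(err))) throw err;
      const delay = backoffDelayMs(attempt, Math.random() * 250);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

When the final `throw` is reached, the caller can convert the error into a recovery hint for the agent rather than crashing the session, as the "Exhausted Retries" item describes.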
|
|
81
|
+
|
|
82
|
+
## 6. Human-in-the-Loop Edge Cases
|
|
83
|
+
|
|
84
|
+
- **Permission Timeout (User Away):**
|
|
85
|
+
- _The Edge Case:_ The agent calls a dangerous tool (`bash`, `write_file`) while the user is away from the terminal. The agent blocks indefinitely waiting for permission.
|
|
86
|
+
- _Mitigation:_ `HITLBridge.requestPermission()` has a configurable timeout (default 5 minutes) that auto-denies and returns a short-circuit string, letting the agent try an alternative.
|
|
87
|
+
- **Ask Question Timeout:**
|
|
88
|
+
- _The Edge Case:_ The agent asks the user a clarifying question via `ask_user_question`, but the user doesn't respond.
|
|
89
|
+
- _Mitigation:_ `HITLBridge.askUser()` resolves with `"[No response]"` after timeout, so the agent can proceed with a default assumption.
|
|
90
|
+
- **Permission Mode Misconfiguration:**
|
|
91
|
+
- _The Edge Case:_ The user sets `"permissionMode": "ask_all"` and then every tool call — including harmless reads — triggers a prompt, making the agent unusable.
|
|
92
|
+
- _Mitigation:_ `PermissionMiddleware` maintains a hardcoded `SAFE_TOOLS` whitelist (`read_file`, `search_skills`, `ask_user_question`, etc.) that bypasses approval even in `ask_all` mode.
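
The permission rules and timeouts in this section can be sketched with two small helpers. Only `ask_all` and the three listed safe tools come from the text; the other mode names, `DANGEROUS_TOOLS`, `needsApproval`, and `withTimeout` are hypothetical.

```typescript
// "ask_all" is from the text; the other two modes are assumed for illustration.
type PermissionMode = "auto" | "ask_dangerous" | "ask_all";

const SAFE_TOOLS: ReadonlySet<string> = new Set(["read_file", "search_skills", "ask_user_question"]);
const DANGEROUS_TOOLS: ReadonlySet<string> = new Set(["bash", "write_file"]);

// Safe tools never prompt, even in "ask_all" mode.
function needsApproval(tool: string, mode: PermissionMode): boolean {
  if (SAFE_TOOLS.has(tool)) return false;
  if (mode === "ask_all") return true;
  if (mode === "ask_dangerous") return DANGEROUS_TOOLS.has(tool);
  return false;
}

// Resolve with `fallback` if `pending` has not settled within `timeoutMs`.
function withTimeout<T>(pending: Promise<T>, timeoutMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    pending.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

A question prompt could then be wrapped as `withTimeout(userAnswer, 300000, "[No response]")`, where `userAnswer` stands for whatever promise the UI layer returns.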
|
|
93
|
+
|
|
94
|
+
## 7. Skills Sync Edge Cases
|
|
95
|
+
|
|
96
|
+
- **Missing User Skills Directory:**
|
|
97
|
+
- _The Edge Case:_ `~/.joone/skills/` doesn't exist on the user's machine. The sync crashes trying to walk a nonexistent path.
|
|
98
|
+
- _Mitigation:_ `syncSkillsToSandbox()` checks `fs.existsSync()` before walking each skill directory and silently skips missing paths.
|
|
99
|
+
- **Skill Name Collision (Project vs. User):**
|
|
100
|
+
- _The Edge Case:_ A user-level skill and a project-level skill have the same name. Both get synced to the sandbox, creating confusion.
|
|
101
|
+
- _Mitigation:_ `SkillLoader.discoverSkills()` deduplicates by name with project-level priority. `syncSkillsToSandbox()` only uploads `source: "user"` skills since project-level skills are already inside `projectRoot`.
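
The collision rule above (project-level wins, only user-level skills are uploaded) can be sketched as two pure functions. The `Skill` shape and function names are hypothetical; only the `source: "user"` field and the priority rule come from the text.

```typescript
interface Skill {
  name: string;
  source: "project" | "user";
  path: string;
}

// Keep one skill per name; a project-level skill replaces a user-level one.
function dedupeSkills(skills: Skill[]): Skill[] {
  const byName = new Map<string, Skill>();
  for (const skill of skills) {
    const existing = byName.get(skill.name);
    if (existing === undefined || (existing.source === "user" && skill.source === "project")) {
      byName.set(skill.name, skill);
    }
  }
  return Array.from(byName.values());
}

// Only user-level skills need uploading; project skills already live
// inside the project root that is synced to the sandbox.
function skillsToUpload(skills: Skill[]): Skill[] {
  return dedupeSkills(skills).filter((s) => s.source === "user");
}
```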
|
|
102
|
+
|
|
103
|
+
## 8. Slash Command Edge Cases (M11)
|
|
104
|
+
|
|
105
|
+
- **Command Typos & Frustration:**
|
|
106
|
+
- _The Edge Case:_ User types `/modle` instead of `/model` and the agent treats it as a prompt, wasting LLM tokens and failing to switch the model.
|
|
107
|
+
- _Mitigation:_ Levenshtein distance check in `CommandRegistry`. If an unknown command is `< 3` edits away from a known command, the TUI intercepts it and suggests the correct command without calling the LLM.
|
|
108
|
+
- **State Mutation While Processing:**
|
|
109
|
+
- _The Edge Case:_ User runs `/exit` or `/clear` while the agent is midway through generating a sequence of ToolCalls.
|
|
110
|
+
- _Mitigation:_ App-level UI blocks input while `isProcessing === true`. The commands are disabled.
|
|
111
|
+
- **Model Switch to Non-Existent Model:**
|
|
112
|
+
- _The Edge Case:_ User runs `/model nonexistent`.
|
|
113
|
+
- _Mitigation:_ The command validates the model string against `ConfigManager`'s available models and rejects it before updating internal state.
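
The typo check from this section can be sketched with a standard Levenshtein distance. The `suggestCommand` name and its signature are hypothetical; the "fewer than 3 edits" threshold follows the text.

```typescript
// Classic dynamic-programming edit distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest known command if it is fewer than `maxDistance`
// edits away, otherwise null.
function suggestCommand(input: string, known: string[], maxDistance: number = 3): string | null {
  let best: string | null = null;
  let bestDistance = maxDistance;
  for (const cmd of known) {
    const d = levenshtein(input, cmd);
    if (d < bestDistance) {
      best = cmd;
      bestDistance = d;
    }
  }
  return best;
}
```

With the commands `/model`, `/exit`, and `/clear` registered, `/modle` is two substitutions away from `/model`, so it is caught before any LLM call is made.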
|
|
114
|
+
|
|
115
|
+
## 9. LLM-Powered Compaction Edge Cases (M12)
|
|
116
|
+
|
|
117
|
+
- **Compaction Data Loss (Amnesia 2.0):**
|
|
118
|
+
- _The Edge Case:_ The LLM summarizes a 50-turn conversation but drops explicit file paths or tool choices, leaving the main agent blind when resuming.
|
|
119
|
+
- _Mitigation:_ The built-in Compact Prompt explicitly mandates a structured format: `Files Modified`, `Decisions Made`, `Tools Used`. A handoff prompt (`[CONTEXT HANDOFF]`) is injected into the bottom of the history to glue the summary back to the agent's persona.
|
|
120
|
+
- **Double Compaction Fidelity Loss:**
|
|
121
|
+
- _The Edge Case:_ A session runs so long that it must be compacted twice. A "summary of a summary" loses critical resolution.
|
|
122
|
+
- _Mitigation:_ `ConversationCompactor` detects prior summaries and includes them entirely in the eviction block, prompting the LLM to unify the old summary with the new evicted messages.
|
|
123
|
+
|
|
124
|
+
## 10. Sub-Agent Orchestration Edge Cases (M13)
|
|
125
|
+
|
|
126
|
+
- **The Sub-Agent Recursion Bomb:**
|
|
127
|
+
- _The Edge Case:_ A sub-agent uses the `spawn_agent` tool to spawn another sub-agent, creating an infinite nesting loop.
|
|
128
|
+
- _Mitigation:_ Hardcoded Depth-1 limit. Pre-configured sub-agents in `AgentRegistry` never include `spawn_agent` or `check_agent` in their allowed toolsets.
|
|
129
|
+
- **Async Resource Contention:**
|
|
130
|
+
- _The Edge Case:_ The main agent loops over a directory and spawns 50 async `test_runner` agents concurrently.
|
|
131
|
+
- _Mitigation:_ `SubAgentManager` maintains a hard cap of 3 concurrent async tasks. Further spawn requests are queued or rejected with a backpressure error tool response.
|
|
132
|
+
- **Stale Files in Sandbox:**
|
|
133
|
+
- _The Edge Case:_ The main agent edits a file on the host, then immediately spawns a `bash` sub-agent. The sub-agent runs in the sandbox before the new host file is synced.
|
|
134
|
+
- _Mitigation:_ The `SubAgentManager` shares the main harness's `FileSync` instance and always forces a `syncToSandbox()` pass _before_ the sub-agent takes its first step.
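
The concurrency cap from this section can be sketched as a counting limiter. `SpawnLimiter` is a hypothetical name, and this version only rejects; queueing, which the text also allows, is left out.

```typescript
// Hypothetical slot counter enforcing a hard cap on concurrent sub-agents.
class SpawnLimiter {
  private active = 0;
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number = 3) {
    this.maxConcurrent = maxConcurrent;
  }

  // Claim a slot. Returns false when the cap is already reached.
  tryAcquire(): boolean {
    if (this.active >= this.maxConcurrent) return false;
    this.active++;
    return true;
  }

  // Free a slot when a sub-agent finishes (never drops below zero).
  release(): void {
    if (this.active > 0) this.active--;
  }

  activeCount(): number {
    return this.active;
  }
}
```

When `tryAcquire()` returns `false`, the spawn tool would return a backpressure error as its tool result so the main agent can wait and retry instead of launching dozens of tasks at once.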
|
|
135
|
+
|
|
136
|
+
## 11. Stability & Reliability Edge Cases (M14)
|
|
137
|
+
|
|
138
|
+
- **Context Window Overflows (Instant Death):**
|
|
139
|
+
- _The Edge Case:_ Despite compaction thresholds, a single `read_file` returns a 120k-token string, instantly blowing past the 100% capacity mark. Compaction fails because the context is already overflowing.
|
|
140
|
+
- _Mitigation:_ `ContextGuard` has a 95% "Emergency Truncation" threshold. Before hitting the API, if tokens > 95%, it _bypasses_ LLM compaction and brutally slices all but the last 4 messages, inserting a loud warning message directly into the stream, guaranteeing survival.
|
|
141
|
+
- **Process Death Serialization Tearing:**
|
|
142
|
+
- _The Edge Case:_ The `AutoSave` triggers at the exact millisecond the user presses `Ctrl+C`. The Node process terminates while `fs.writeFileSync` is mid-chunk, corrupting the JSONL session file irreversibly.
|
|
143
|
+
- _Mitigation:_ Atomic saves. `SessionStore.saveSession()` writes to an intermediate staging stream. On `process.on('SIGINT')`, a synchronous `forceSave()` is fired to cleanly flush state _before_ `process.exit(0)`.
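
The emergency truncation described for `ContextGuard` can be sketched as a pure function. The signature, the `ChatMessage` shape, and the warning text are hypothetical; the 95% threshold and the "last 4 messages" rule follow the text.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// If usage exceeds the threshold, keep only the last `keepLast` messages
// and prepend a warning. Otherwise return the history unchanged.
function emergencyTruncate(
  messages: ChatMessage[],
  usedTokens: number,
  maxTokens: number,
  keepLast: number = 4,
  threshold: number = 0.95
): ChatMessage[] {
  if (usedTokens <= maxTokens * threshold) return messages;
  const kept = messages.slice(-keepLast);
  const dropped = messages.length - kept.length;
  const warning: ChatMessage = {
    role: "user",
    content:
      `<system-reminder>WARNING: context usage exceeded ${Math.round(threshold * 100)}%. ` +
      `${dropped} older messages were dropped without summarization.</system-reminder>`,
  };
  return [warning, ...kept];
}
```

A naive slice can separate a tool result from the assistant message that requested it, so a real guard would likely need to adjust the cut point to keep such pairs together.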
|
|
@@ -0,0 +1,66 @@
|
|
|
1
|
+
# Initial Implementation Plan
|
|
2
|
+
|
|
3
|
+
## Phase 1: Context Engine & Caching Layer
|
|
4
|
+
|
|
5
|
+
Build a structured Prompt Builder that strictly enforces the Prefix Matching patterns so every task in a session enjoys a >90% cache hit rate.
|
|
6
|
+
|
|
7
|
+
```mermaid
graph TD
    A[Base System Prompt] -->|Static Prefix| B
    B[Tool Schemas] -->|Static Prefix| C
    C[Project Memory e.g., README] -->|Project Prefix| D
    D[Session Context e.g., OS Info] -->|Session Prefix| E
    E[Conversation History] -->|Dynamic Appends| F[New User/Tool Message]

    style A fill:#1e4620,stroke:#2b662e,color:#fff
    style B fill:#1e4620,stroke:#2b662e,color:#fff
    style C fill:#1e4620,stroke:#2b662e,color:#fff
    style D fill:#2b465e,stroke:#3b6282,color:#fff
    style E fill:#4a3219,stroke:#664422,color:#fff

    subgraph Fully Cached Prefix
        A
        B
        C
    end
```
|
|
27
|
+
|
|
28
|
+
## Phase 2: Interoperable Tooling & Lazy Loading
|
|
29
|
+
|
|
30
|
+
Implement tools as immutable objects for the session. Implement "Plan Mode" to alter agent rules without unloading tool schemas.
|
|
31
|
+
|
|
32
|
+
- Define core tools: `read_file`, `write_file`, `bash_command`.
|
|
33
|
+
- Implement dummy/stub tools for complex integrations.
|
|
34
|
+
- Implement "Cache-Safe Forking" for compaction.
|
|
35
|
+
|
|
36
|
+
## Phase 3: The Middleware Harness
|
|
37
|
+
|
|
38
|
+
Implement pre-completion checks and loop detection via a middleware pipeline.
|
|
39
|
+
|
|
40
|
+
```mermaid
sequenceDiagram
    participant Agent as LLM Agent
    participant Harness as Execution Harness
    participant Middle as Middleware Pipeline
    participant Env as Environment (Bash/FS)

    Agent->>Harness: Request: Edit target_file.py
    Harness->>Middle: Emit: 'pre_tool_call'
    Middle-->>Harness: Check LoopDetection (Fail if > 4 tries)
    Harness->>Env: Execute Edit
    Env-->>Harness: Return File Diff
    Harness->>Agent: Send Tool Result

    Agent->>Harness: Request: Submit/Exit
    Harness->>Middle: Emit: 'pre_submit'
    Middle->>Harness: Inject 'PreCompletionChecklist' (Wait, did you run tests?)
    Harness->>Agent: System Reminder: "Please run tests to verify."
    Agent->>Harness: Request: Run `pytest`
```
|
|
60
|
+
|
|
61
|
+
## Phase 4: Tracing & Feedback Loop
|
|
62
|
+
|
|
63
|
+
Build an automated pipeline that sends JSON traces of failed agent runs into an evaluation database.
|
|
64
|
+
|
|
65
|
+
- Hook LLM API calls to save traces.
|
|
66
|
+
- Implement `TraceAnalyzer` subagent to review failures.
|
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
# Tech Stack
|
|
2
|
+
|
|
3
|
+
The technology stack has been finalized. We are moving forward with a combination of the strong typing of TypeScript and the robust AI orchestration ecosystem of LangChain.
|
|
4
|
+
|
|
5
|
+
## The Final Stack
|
|
6
|
+
|
|
7
|
+
- **Language:** TypeScript (Node.js)
|
|
8
|
+
- Provides end-to-end type safety, especially crucial for tool schemas (Zod) and avoiding runtime errors in the execution loop.
|
|
9
|
+
- **Orchestration / LLM Framework:** LangChain.js / LangGraph.js
|
|
10
|
+
- Using the TypeScript SDK for LangChain allows us to build complex, cyclical agent workflows (like Middlewares and self-correction loops) via LangGraph.
|
|
11
|
+
- **Typing / Tool Schemas:** Zod
|
|
12
|
+
- Seamless integration with LangChain for structural output parsing and strict tool definition.
|
|
13
|
+
- **Tracing:** LangSmith
|
|
14
|
+
- First-party integration with LangChain, providing deep visibility into token usage, prompt construction, and latency. Essential for debugging cache hit rates.
|
|
15
|
+
- **CLI Framework (Optional):** Commander.js / Ink
|
|
16
|
+
- To be used if we build a robust terminal interface for the agent.
|
|
17
|
+
|
|
18
|
+
## Why this combination?
|
|
19
|
+
|
|
20
|
+
This marries the best of both originally proposed worlds. It gives us the frontend/backend interoperability and strict compile-time checks of TypeScript, while retaining the mature, graph-based agent orchestration and high-fidelity trace analysis typically dominated by Python's LangChain ecosystem.
|
package/docs/05_prd.md
ADDED
|
@@ -0,0 +1,87 @@
|
|
|
1
|
+
# Product Requirements Document (PRD)
|
|
2
|
+
|
|
3
|
+
## 1. Product Overview
|
|
4
|
+
|
|
5
|
+
**Joone** is a CLI-based autonomous coding agent that leverages **Prompt Caching** and **Harness Engineering** to achieve high autonomy and robustness in complex coding tasks while minimizing token cost and latency. It executes all generated code inside **E2B sandboxed microVMs**, isolating the host machine from any destructive operations.
|
|
6
|
+
|
|
7
|
+
## 2. Target Audience
|
|
8
|
+
|
|
9
|
+
- Software Engineers looking for an autonomous pair-programmer.
|
|
10
|
+
- DevOps engineers looking to automate script fixes and verifications.
|
|
11
|
+
- AI Researchers running benchmarks (e.g., Terminal Bench 2.0).
|
|
12
|
+
|
|
13
|
+
## 3. Core Features
|
|
14
|
+
|
|
15
|
+
### 3.1. CLI Interface & Provider Selection
|
|
16
|
+
|
|
17
|
+
- **Installable CLI**: Packaged as an npm global binary (`npx joone` or `npm i -g joone`).
|
|
18
|
+
- **Provider/Model Selection**: On first run (or via `joone config`), the user interactively selects their LLM provider and model. Stored at `~/.joone/config.json`.
|
|
19
|
+
- **Supported Providers**: Anthropic, OpenAI, Google, Mistral, Groq, DeepSeek, Fireworks, Together AI, Ollama (local).
|
|
20
|
+
- **Dynamic Provider Loading**: Provider packages are loaded on demand. If a package isn't installed, the CLI prints a helpful install command.
|
|
21
|
+
- **Streaming Output**: Token-by-token streaming enabled by default for all providers. Tool calls are buffered until complete before execution.
|
|
22
|
+
|
|
23
|
+
### 3.2. API Key Security (Tiered)
|
|
24
|
+
|
|
25
|
+
- **Tier 1 (Default)**: API keys stored in `~/.joone/config.json` with restrictive file permissions (`chmod 600`). Masked input during `joone config`.
|
|
26
|
+
- **Tier 2 (Planned)**: OS Keychain integration (Windows Credential Manager / macOS Keychain) via `keytar`.
|
|
27
|
+
- **Tier 3 (Planned)**: AES-256 encrypted config file with machine-derived key.
|
|
28
|
+
- During onboarding, the user will eventually be able to choose their preferred security tier.
|
|
29
|
+
|
|
30
|
+
### 3.3. Cache-Optimized Context Engine
|
|
31
|
+
|
|
32
|
+
- **Strict Prefix Ordering**: Separates static system instructions, tool definitions, project memory, and conversation history to align with LLM `cache_control` behaviors.
|
|
33
|
+
- **`<system-reminder>` Injection**: Updates agent state natively via standard messages rather than system prompt overwrites, preserving the cache.
|
|
34
|
+
- **Cache-Safe Compaction**: Forks and summarizes contexts seamlessly without full cache eviction.
|
|
35
|
+
|
|
36
|
+
### 3.4. Hybrid Sandbox Execution
|
|
37
|
+
|
|
38
|
+
- **Architecture**: The agent uses a **Hybrid** model — file operations (`write_file`, `read_file`) run on the **host machine** so the user sees changes live in their IDE, while all **code execution** (`bash`, tests, scripts) runs inside an [E2B](https://e2b.dev) cloud microVM sandbox.
|
|
39
|
+
- **File Sync Mechanism**: Before each sandbox execution, changed files are synced from host → sandbox.
|
|
40
|
+
- _Tracking:_ Modifications are tracked via a "dirty paths" memory array. The `write_file` host tool explicitly marks paths as dirty upon successful write.
|
|
41
|
+
- _Concurrency:_ Concurrent modifications are prevented by the `ExecutionHarness`, which executes agent tool calls sequentially and blocks the LLM loop until file I/O and sandbox syncs are fully complete.
|
|
42
|
+
- _Conflict Resolution:_ The Host machine is the absolute source of truth. The sandbox filesystem is ephemeral and overwritten by the Host's dirty files before any command runs. Modifications made manually in the sandbox bypass the host and are lost when the sandbox is destroyed.
|
|
43
|
+
- **Security**: The host machine is never exposed to agent-executed commands. Only file read/write touches the host.
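
The dirty-path tracking described under File Sync Mechanism can be sketched as below. The text mentions a "dirty paths" memory array; this sketch uses a `Set` so repeated writes to one file are synced once. `DirtyPathTracker` and its method names are hypothetical.

```typescript
// Hypothetical tracker: write_file marks paths, the sync step drains them.
class DirtyPathTracker {
  private dirty = new Set<string>();

  // Called by the host-side write tool after a successful write.
  markDirty(filePath: string): void {
    this.dirty.add(filePath);
  }

  // Called before each sandbox execution: returns the paths to upload
  // (sorted for determinism) and clears the pending set.
  flush(): string[] {
    const paths = Array.from(this.dirty).sort();
    this.dirty.clear();
    return paths;
  }

  hasPending(): boolean {
    return this.dirty.size > 0;
  }
}
```

Because the harness runs tool calls sequentially, `flush()` is never called while a write is in progress, so no locking is shown here.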
|
|
44
|
+
|
|
45
|
+
### 3.5. Middleware Harness
|
|
46
|
+
|
|
47
|
+
- **Loop Detection (Anti-Doom Loop)**: Tracks agent action duplication and injects corrective context to break the loop.
|
|
48
|
+
- **Pre-Completion Checklist**: Intercepts task submission to force a self-verification/testing phase.
|
|
49
|
+
- **Guardrails for Scale**: Prevents loading oversized files (>1MB) entirely into memory; enforces chunked reads.
|
|
50
|
+
|
|
51
|
+
### 3.6. Lazy & Interoperable Tooling
|
|
52
|
+
|
|
53
|
+
- **Immutable Tool Definition**: Prevents mid-session tool swapping to preserve cache.
|
|
54
|
+
- **Tool Search**: Implements "stub" tools, allowing dynamic loading of complex tools only when actively requested.
|
|
55
|
+
|
|
56
|
+
### 3.7. Trace Analytics (V2)
|
|
57
|
+
|
|
58
|
+
- Logs reasoning loops and tool execution traces to analyze points of failure.
|
|
59
|
+
- Trace analyzer sub-agent that periodically reviews failures to suggest harness improvements.
|
|
60
|
+
|
|
61
|
+
## 4. Non-Functional Requirements
|
|
62
|
+
|
|
63
|
+
- **Latency:** High cache hit rates (>80% for long sessions) leading to sub-second Time-To-First-Token.
|
|
64
|
+
- **Cost:** Minimize redundant processing of prefix tokens.
|
|
65
|
+
- **Extensibility:** Middleware pipeline should make it trivial to add new guardrails.
|
|
66
|
+
- **Development Process:** Strict Red-Green-Refactor TDD for all new features.
|
|
67
|
+
|
|
68
|
+
### 4.1 Error Handling & Degraded Modes
|
|
69
|
+
|
|
70
|
+
- **Sandbox Failures:** The `SandboxManager` is architected with an `ISandboxWrapper` interface. If the primary cloud **E2B** sandbox fails to initialize (e.g., due to network drops, API outages, or invalid keys), the manager will automatically print a warning and gracefully degrade to a local **OpenSandbox** deployment (`localhost:8080`) to ensure execution continuity.
|
|
71
|
+
- **LLM Failures (Planned):** If the remote LLM provider API goes down, the Model Factory should seamlessly fall back to a local Ollama instance if available.
|
|
72
|
+
|
|
73
|
+
### 4.2 Rate Limiting & Cost Controls (Planned)
|
|
74
|
+
|
|
75
|
+
- **Session Budgets:** Users can configure a maximum token-spend budget (e.g., $5.00) per session. The `ExecutionHarness` tracks usage via the `SessionTracer` and will forcefully halt the agent if the threshold is reached.
|
|
76
|
+
- **Loop Circuit Breakers:** The `LoopDetectionMiddleware` acts as a behavioral rate limit, preventing the agent from infinitely burning tokens on failed bash commands.
|
|
77
|
+
|
|
78
|
+
### 4.3 Sandbox Authentication Management
|
|
79
|
+
|
|
80
|
+
- **E2B:** Authentication is managed via the `E2B_API_KEY` environment variable or securely stored in `~/.joone/config.json`.
|
|
81
|
+
- **OpenSandbox (Fallback):** Handled via `OPENSANDBOX_API_KEY` and `OPENSANDBOX_DOMAIN` properties in the config, falling back to a local Docker `localhost:8080` endpoint.
|
|
82
|
+
|
|
83
|
+
### 4.4 Telemetry, Privacy & Data Retention
|
|
84
|
+
|
|
85
|
+
- **Local-First Tracing:** All session reasoning and tool execution traces are logged to `~/.joone/traces/` as local JSON files for offline analysis using `TraceAnalyzer`.
|
|
86
|
+
- **Data Retention:** Local traces are automatically rotated/deleted after 30 days to prevent infinite disk bloat.
|
|
87
|
+
- **Privacy:** Project source code is _never_ sent to third-party telemetry providers unless the user intentionally enables the optional `LANGSMITH_API_KEY` for advanced dashboard debugging.
|