mini-coder 0.0.8 → 0.0.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/codex-lazy-fix.md +76 -0
- package/dist/mc.js +98 -40
- package/hanging-bug.md +78 -0
- package/package.json +1 -1
- package/plan-code-health.md +169 -0
package/codex-lazy-fix.md
ADDED
@@ -0,0 +1,76 @@
# Codex Autonomy Issues & Fix Analysis

## Behaviours

When using `zen/gpt-5.3-codex` as the agent, the model consistently exhibits "lazy" or permission-seeking behaviour. Specifically:

1. **Initial Compliance**: It starts by reading files or globbing the directory.
2. **Immediate Stall**: Instead of executing edits or implementing the plan, it outputs a multi-paragraph text explaining what it *plans* to do and ends the turn.
3. **Permission Seeking**: It explicitly asks the user for permission (e.g., "Reply **'proceed'** and I'll start implementing batch 1").
4. **Ralph Mode Incompatibility**: In `/ralph` mode, the agent loops continuously. Because it restarts with a fresh context on each loop and stalls after gathering context, it never actually writes any files. It just repeats the same read-and-plan phase until it hits the max iteration limit.
5. **Model Differences**: Neither Claude nor Gemini models exhibit this behaviour; they are not subjected to the same conversational RLHF that pushes the model to ask the user to double-check its work.

## Root Cause Analysis

An analysis of both OpenAI's open-source `codex-rs` client and the `opencode` source code reveals that Codex models (like `gpt-5.3-codex`) are heavily RLHF-tuned for safety and collaborative pair-programming. By default, the model prefers to break tasks into chunks and explicitly ask for sign-off.

To override this, the model requires three things which `mini-coder` was failing to provide correctly:

### 1. Dual-Anchored System Prompts (`system` + `instructions`)

`mini-coder` implemented a `useInstructions` check that placed the system prompt into the `instructions` field of the `/v1/responses` API payload. However, doing so stripped the `system` role message from the conversation context (the `input` array).

Both `opencode` and `codex-rs` ensure that the context array *also* contains the system prompt:

- `opencode` maps its environment variables and system instructions to `role: "system"` (or `role: "developer"`) inside `input.messages`, **while also** passing behavioral instructions to the `instructions` field in the API payload.
- `codex-rs` directly injects `role: "developer"` into the message list (as seen in `codex-rs/core/src/compact.rs` and their memory tracing implementations).

Without the `system` / `developer` message anchored at the start of the `input` array, the AI SDK and the model deprioritized the standalone `instructions` field, allowing the model's base permission-seeking behaviours to take over.
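The dual-anchoring shape both clients converge on can be sketched as a small payload builder. This is a hypothetical helper, not mini-coder's code; only the field shapes match the payloads captured in the Evidence section:

```typescript
// Sketch: build a /v1/responses payload where the system prompt is BOTH the
// standalone `instructions` field AND a `developer` message anchored at the
// head of the `input` array. `buildResponsesPayload` is illustrative only.
function buildResponsesPayload(systemPrompt: string, userText: string) {
  return {
    model: "gpt-5.3-codex",
    instructions: systemPrompt, // standalone behavioural field
    input: [
      { role: "developer", content: systemPrompt }, // anchored copy
      { role: "user", content: [{ type: "input_text", text: userText }] },
    ],
    store: false,
  };
}

const payload = buildResponsesPayload("You are a test agent.", "hello");
console.log(payload.input[0].role); // "developer"
```

The point is that the anchored `developer` message and the `instructions` field carry the same text, so neither path can be deprioritized on its own.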

### 2. Explicit "Do Not Ask" Directives

Both `opencode` and `codex-rs` employ heavy anti-permission prompts.

- **Opencode** (`session/prompt/codex_header.txt`):
  > "- Default: do the work without asking questions... Never ask permission questions like 'Should I proceed?' or 'Do you want me to run tests?'; proceed with the most reasonable option and mention what you did."
- **Codex-RS** (`core/templates/model_instructions/gpt-5.2-codex_instructions_template.md`):
  > "Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you."

`mini-coder` introduced `CODEX_AUTONOMY` in a previous commit, but because of Issue #1, it was never adequately anchored in the `input` array.

## Evidence & Tests

We introduced a fetch wrapper interceptor in `src/llm-api/providers.ts` that logs the full outbound API requests to `~/.config/mini-coder/api.log`.
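An interceptor of this shape is one way to do it (a sketch: the log path, helper name, and wiring are illustrative, not the shipped `providers.ts` code):

```typescript
import { appendFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Illustrative log path; the real interceptor writes to ~/.config/mini-coder/api.log.
const LOG_PATH = join(tmpdir(), "mini-coder-api.log");

// Sketch of a fetch wrapper that records every outbound request body before
// delegating to the real fetch implementation.
function loggingFetch(realFetch: typeof fetch): typeof fetch {
  return async (url, init) => {
    const entry = {
      ts: new Date().toISOString(),
      url: String(url),
      body: typeof init?.body === "string" ? JSON.parse(init.body) : null,
    };
    appendFileSync(LOG_PATH, JSON.stringify(entry) + "\n");
    return realFetch(url, init);
  };
}
```

Assuming the provider accepts a custom `fetch` option (AI SDK providers generally do), the wrapper is passed in at provider construction time, so every payload the SDK generates gets captured.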

A test script `test-turn.ts` running a dummy turn showed the exact payload generated by the AI SDK before our fix:

```json
"body": {
  "model": "gpt-5.3-codex",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "hello" }
      ]
    }
  ],
  "store": false,
  "instructions": "You are a test agent.",
  ...
```

After the fix, the same turn produced:

```json
"body": {
  "model": "gpt-5.3-codex",
  "input": [
    {
      "role": "developer",
      "content": "You are mini-coder, a small and fast CLI coding agent... [CODEX_AUTONOMY directives]"
    },
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "hello" }
      ]
    }
  ],
  "instructions": "You are mini-coder, a small and fast CLI coding agent... [CODEX_AUTONOMY directives]"
}
```

This perfectly mirrors the behavior seen in `opencode` and `codex-rs`.

## Actions Taken

1. Added an `api.log` request interceptor in `providers.ts` to capture and inspect the exact JSON payloads sent to the OpenAI/AI SDK endpoints.
2. Cloned and analyzed both the `opencode` and `codex` repos to observe how they communicate with `gpt-5.*` codex endpoints.
3. Updated `src/llm-api/turn.ts` so `system: systemPrompt` is *always* passed to the AI SDK, guaranteeing a `developer` message anchors the `input` array, even when `instructions` is also used.
package/dist/mc.js
CHANGED
@@ -94,7 +94,7 @@ function zenEndpointFor(modelId) {
   if (modelId.startsWith("claude-"))
     return zenAnthropic()(modelId);
   if (modelId.startsWith("gpt-"))
-    return zenOpenAI()(modelId);
+    return zenOpenAI().responses(modelId);
   if (modelId.startsWith("gemini-"))
     return zenGoogle()(modelId);
   return zenCompat()(modelId);
@@ -176,6 +176,15 @@ function directGoogle() {
   }
   return _directGoogle;
 }
+function parseModelString(modelString) {
+  const slashIdx = modelString.indexOf("/");
+  if (slashIdx === -1)
+    return { provider: modelString, modelId: "" };
+  return {
+    provider: modelString.slice(0, slashIdx),
+    modelId: modelString.slice(slashIdx + 1)
+  };
+}
 var CONTEXT_WINDOW_TABLE = [
   [/^claude-/, 200000],
   [/^gemini-/, 1e6],
@@ -187,7 +196,7 @@ var CONTEXT_WINDOW_TABLE = [
   [/^qwen3-/, 131000]
 ];
 function getContextWindow(modelString) {
-  const modelId =
+  const { modelId } = parseModelString(modelString);
   for (const [pattern, tokens] of CONTEXT_WINDOW_TABLE) {
     if (pattern.test(modelId))
       return tokens;
@@ -208,7 +217,7 @@ function resolveModel(modelString) {
     case "anthropic":
       return directAnthropic()(modelId);
     case "openai":
-      return directOpenAI()(modelId);
+      return modelId.startsWith("gpt-") ? directOpenAI().responses(modelId) : directOpenAI()(modelId);
     case "google":
       return directGoogle()(modelId);
     case "ollama": {
@@ -775,7 +784,7 @@ function renderChunk(text, inFence) {

 // src/cli/output.ts
 var HOME = homedir4();
-var PACKAGE_VERSION = "0.0.
+var PACKAGE_VERSION = "0.0.8";
 function tildePath(p) {
   return p.startsWith(HOME) ? `~${p.slice(HOME.length)}` : p;
 }
@@ -2401,30 +2410,51 @@ var MAX_STEPS = 50;
 function isZodSchema(s) {
   return s !== null && typeof s === "object" && "_def" in s;
 }
-function toCoreTool(def) {
+function toCoreTool(def, claimWarning) {
   const schema = isZodSchema(def.schema) ? def.schema : jsonSchema(def.schema);
   return dynamicTool({
     description: def.description,
     inputSchema: schema,
     execute: async (input) => {
       try {
-
+        const result = await def.execute(input);
+        if (claimWarning()) {
+          const warning = `
+
+<system-message>You have reached the maximum number of tool calls. ` + "No more tools will be available after this result. " + "Respond with a status update and list what still needs to be done.</system-message>";
+          const str = typeof result === "string" ? result : JSON.stringify(result);
+          return str + warning;
+        }
+        return result;
       } catch (err) {
         throw err instanceof Error ? err : new Error(String(err));
       }
     }
   });
 }
+function isOpenAIGPT(modelString) {
+  const { provider, modelId } = parseModelString(modelString);
+  return (provider === "openai" || provider === "zen") && modelId.startsWith("gpt-");
+}
 async function* runTurn(options) {
-  const { model, messages, tools, systemPrompt, signal } = options;
+  const { model, modelString, messages, tools, systemPrompt, signal } = options;
+  let stepCount = 0;
+  let warningClaimed = false;
+  function claimWarning() {
+    if (stepCount !== MAX_STEPS - 2 || warningClaimed)
+      return false;
+    warningClaimed = true;
+    return true;
+  }
   const toolSet = {};
   for (const def of tools) {
-    toolSet[def.name] = toCoreTool(def);
+    toolSet[def.name] = toCoreTool(def, claimWarning);
   }
   let inputTokens = 0;
   let outputTokens = 0;
   let contextTokens = 0;
   try {
+    const useInstructions = systemPrompt !== undefined && isOpenAIGPT(modelString);
     const streamOpts = {
       model,
       messages,
@@ -2434,8 +2464,24 @@ async function* runTurn(options) {
         inputTokens += step.usage?.inputTokens ?? 0;
         outputTokens += step.usage?.outputTokens ?? 0;
         contextTokens = step.usage?.inputTokens ?? contextTokens;
+        stepCount++;
+        warningClaimed = false;
       },
-
+      prepareStep: ({ stepNumber }) => {
+        if (stepNumber >= MAX_STEPS - 1) {
+          return { activeTools: [] };
+        }
+        return;
+      },
+      ...systemPrompt && !useInstructions ? { system: systemPrompt } : {},
+      ...useInstructions ? {
+        providerOptions: {
+          openai: {
+            instructions: systemPrompt,
+            store: false
+          }
+        }
+      } : {},
       ...signal ? { abortSignal: signal } : {}
     };
     const result = streamText(streamOpts);
@@ -3694,7 +3740,19 @@ function loadContextFile(cwd) {
   }
   return null;
 }
-function buildSystemPrompt(cwd) {
+var CODEX_AUTONOMY = `
+# Autonomy and persistence
+- You are an autonomous senior engineer. Once given a direction, proactively gather context, implement, test, and refine without waiting for additional prompts at each step.
+- Persist until the task is fully handled end-to-end within the current turn: do not stop at analysis or partial work; carry changes through to implementation and verification.
+- Bias to action: default to implementing with reasonable assumptions. Do not end your turn with clarifications or requests to "proceed" unless you are truly blocked on information only the user can provide.
+- Do NOT output an upfront plan, preamble, or status update before working. Start making tool calls immediately.
+- Do NOT ask "shall I proceed?", "shall I start?", "reply X to continue", or any equivalent. Just start.
+- If something is ambiguous, pick the most reasonable interpretation, implement it, and note the assumption at the end.`;
+function isCodexModel(modelString) {
+  const { modelId } = parseModelString(modelString);
+  return modelId.includes("codex");
+}
+function buildSystemPrompt(cwd, modelString) {
   const contextFile = loadContextFile(cwd);
   const cwdDisplay = tildePath(cwd);
   const now = new Date().toLocaleString(undefined, { hour12: false });
@@ -3709,8 +3767,10 @@ Guidelines:
 - Prefer small, targeted edits over large rewrites.
 - Always read a file before editing it.
 - Use glob to discover files, grep to find patterns, read to inspect contents.
-- Use shell for tests, builds, and git operations
-
+- Use shell for tests, builds, and git operations.`;
+  if (modelString && isCodexModel(modelString)) {
+    prompt += CODEX_AUTONOMY;
+  }
   if (contextFile) {
     prompt += `

@@ -3769,7 +3829,7 @@ async function runAgent(opts) {
     throw new Error(`Unknown agent "${agentName}". Available agents: ${[...allAgents.keys()].join(", ") || "(none)"}`);
   }
   const model = modelOverride ?? agentConfig?.model ?? currentModel;
-  const systemPrompt = agentConfig?.systemPrompt ?? buildSystemPrompt(cwd);
+  const systemPrompt = agentConfig?.systemPrompt ?? buildSystemPrompt(cwd, model);
   const subMessages = [{ role: "user", content: prompt }];
   const laneId = nextLaneId++;
   activeLanes.add(laneId);
@@ -3788,6 +3848,7 @@ async function runAgent(opts) {
   let outputTokens = 0;
   const events = runTurn({
     model: subLlm,
+    modelString: model,
     messages: subMessages,
     tools: subTools,
     systemPrompt
@@ -4003,7 +4064,7 @@ ${out}

 <system-message>PLAN MODE ACTIVE: Help the user gather context for the plan -- READ ONLY</system-message>` : ralphMode ? `${resolvedText}

-<system-message>RALPH MODE: You are in an autonomous loop.
+<system-message>RALPH MODE: You are in an autonomous loop. You MUST make actual file changes (create, edit, or write files) to complete the requested task before outputting \`/ralph\`. Reading files, running tests, or exploring the codebase does NOT count as doing the work. Only output \`/ralph\` as your final message after all requested changes are implemented and tests pass.</system-message>` : resolvedText;
   const userMsg = allImages.length > 0 ? {
     role: "user",
     content: [
@@ -4017,10 +4078,7 @@ ${out}
   } : { role: "user", content: coreContent };
   if (wasAborted) {
     stopWatcher();
-    const stubMsg = {
-      role: "assistant",
-      content: "[interrupted]"
-    };
+    const stubMsg = makeInterruptMessage("user");
     session.messages.push(userMsg, stubMsg);
     saveMessages(session.id, [userMsg, stubMsg], thisTurn);
     coreHistory.push(userMsg, stubMsg);
@@ -4032,26 +4090,15 @@ ${out}
   saveMessages(session.id, [userMsg], thisTurn);
   coreHistory.push(userMsg);
   const llm = resolveModel(currentModel);
-  const systemPrompt = buildSystemPrompt(cwd);
+  const systemPrompt = buildSystemPrompt(cwd, currentModel);
   let lastAssistantText = "";
-  let
-  const rollbackTurn = () => {
-    if (turnRolledBack)
-      return;
-    turnRolledBack = true;
-    coreHistory.pop();
-    session.messages.pop();
-    deleteLastTurn(session.id, thisTurn);
-    if (snapped)
-      deleteSnapshot(session.id, thisTurn);
-    snapshotStack.pop();
-    turnIndex--;
-  };
+  let errorStubSaved = false;
   try {
     snapshotStack.push(snapped ? thisTurn : null);
     spinner.start("thinking");
     const events = runTurn({
       model: llm,
+      modelString: currentModel,
       messages: coreHistory,
       tools: planMode ? [...buildReadOnlyToolSet({ cwd }), ...mcpTools] : tools,
       systemPrompt,
@@ -4062,16 +4109,17 @@ ${out}
       coreHistory.push(...newMessages);
       session.messages.push(...newMessages);
       saveMessages(session.id, newMessages, thisTurn);
-
-
-
-
-
+      if (wasAborted) {
+        const note = makeInterruptMessage("user");
+        coreHistory.push(note);
+        session.messages.push(note);
+        saveMessages(session.id, [note], thisTurn);
+      }
+    } else {
+      const stubMsg = makeInterruptMessage("user");
       coreHistory.push(stubMsg);
       session.messages.push(stubMsg);
       saveMessages(session.id, [stubMsg], thisTurn);
-    } else {
-      rollbackTurn();
     }
     lastAssistantText = extractAssistantText(newMessages);
     totalIn += inputTokens;
@@ -4079,7 +4127,13 @@ ${out}
     lastContextTokens = contextTokens;
     touchActiveSession(session);
   } catch (err) {
-
+    if (!errorStubSaved) {
+      errorStubSaved = true;
+      const stubMsg = makeInterruptMessage("error");
+      coreHistory.push(stubMsg);
+      session.messages.push(stubMsg);
+      saveMessages(session.id, [stubMsg], thisTurn);
+    }
     throw err;
   } finally {
     stopWatcher();
@@ -4107,6 +4161,10 @@ ${out}
     });
   }
 }
+function makeInterruptMessage(reason) {
+  const text = reason === "user" ? "<system-message>Response was interrupted by the user.</system-message>" : "<system-message>Response was interrupted due to an error.</system-message>";
+  return { role: "assistant", content: text };
+}
 function extractAssistantText(newMessages) {
   const parts = [];
   for (const msg of newMessages) {
package/hanging-bug.md
ADDED
@@ -0,0 +1,78 @@
# Hanging Bug Investigation

## Symptoms

After a shell tool call completes successfully, the app **sometimes** hangs indefinitely instead of returning to the prompt. Two observed states:

```
$ $ rm src/session/db.ts && bun run test
⠇ shell
```

It stayed spinning with the shell label for well over the timeout. I had to kill the app.

Another example with a different shell command:

```
$ $ git diff
✔ 0
│ diff --git a/src/cli/tool-render.ts b/src/cli/tool-render.ts
│ index f8d33af..e224932 100644
│ --- a/src/cli/tool-render.ts
│ +++ b/src/cli/tool-render.ts
│ @@ -148,21 +148,25 @@ export function renderToolResultInline(
│  }
│
│  if (toolName === "glob") {
│ -  const r = result as { files: string[]; truncated: boolean };
│ -  const n = r.files.length;
│ -  writeln(
│ -    `${indent}${G.info} ${c.dim(n === 0 ? "no matches" : `${n} file${n === 1 ? "" : "s"}${r.truncated ? " (capped)" : ""}`)}`,
│ -  );
│ -  return;
│ +  const r = result as { files?: string[]; truncated?: boolean };
│ +  if (Array.isArray(r.files)) {
│ +    const n = r.files.length;
│ +    writeln(
│ +      `${indent}${G.info} ${c.dim(n === 0 ? "no matches" : `${n} file${n === 1 ? "" : "s"}${r.truncated ? " (capped)" : ""}`)}`,
│ +    );
│ … +150 lines
⠦ thinking
```

The user also had to kill the app.

Only seen with Gemini/Google models.
Only happens after shell tool calls.
Other tool calls worked just fine during the same session with the same model and mini-coder version.
There are no shell-tool-related hooks configured anywhere.

The expectation was for the turn to continue as normal, since the tool calls succeeded.

Find the root cause, show proof.

## Findings

Here is the concrete proof I just ran, no edits:

- I executed `renderTurn` with an event stream that does:
  1) tool-call-start(shell)
  2) tool-result(shell success)
  3) then never yields again
- Result: the promise did not resolve within 700ms (timed_out_700ms).

Then I ran the same sequence but added turn-complete:

- Result: it resolved immediately (resolved).

So this is now proven in your codepath:

- `renderTurn` waits forever when the upstream async stream goes silent after a shell tool result.
- This exactly matches the visible hang states.

### Root Cause 1: Hangs spinning on `"shell"`

**Proof in code:** `src/tools/shell.ts`

* When a command times out, `proc.kill("SIGTERM")` only kills the parent process (e.g., `bash`). Any child processes (e.g., `bun`) become orphaned but stay alive, holding the write end of the `stdout`/`stderr` pipes open.
* Because the pipe never closes, `await reader.read()` inside `collectStream()` hangs indefinitely.
* Because `collectStream()` never resolves, the tool execution never finishes, `tool-result` is never yielded, and the stream goes completely silent while the spinner stays stuck on "shell".
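A common remedy for this class of bug (a sketch of the technique, not necessarily the fix mini-coder shipped; POSIX only) is to spawn the command in its own process group with `detached: true` and, on timeout, signal the whole group via a negative PID so orphaned children cannot outlive the shell and hold the pipes open:

```typescript
import { spawn } from "node:child_process";

// Sketch: run a shell command in its own process group so that, on timeout,
// SIGTERM reaches the entire group (negative PID), not just the immediate
// `sh` process. With the group dead, any stdout/stderr pipes would also
// close, letting a stream reader finish instead of hanging.
function runWithTimeout(cmd: string, timeoutMs: number): Promise<number | null> {
  return new Promise((resolve) => {
    const proc = spawn("sh", ["-c", cmd], { detached: true, stdio: "ignore" });
    const timer = setTimeout(() => {
      if (proc.pid !== undefined) {
        try {
          process.kill(-proc.pid, "SIGTERM"); // negative PID = whole group
        } catch {
          // group already gone
        }
      }
    }, timeoutMs);
    proc.on("exit", (code) => {
      clearTimeout(timer);
      resolve(code); // null when killed by a signal
    });
  });
}
```

The design point: `proc.kill()` signals one PID; `process.kill(-pid)` signals the process group created by `detached: true`, which is what prevents orphans.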

### Root Cause 2: Hangs spinning on `"thinking"`

**Proof in code:** `src/llm-api/turn.ts`

* After `git diff` completes, the tool resolves and `renderTurn` switches the spinner to `"thinking"`.
* The AI SDK automatically makes a new HTTP request to the Gemini API containing the tool result to generate the next step.
* Gemini's API occasionally hangs indefinitely or silently drops connections when receiving certain payloads (such as large tool outputs or ANSI color codes, which `git diff` emits).
* Because there is no timeout configured on the `streamText` call in `runTurn`, the underlying fetch request waits forever unless the user manually aborts. The `result.fullStream` never yields the next chunk, but also never closes or errors.
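One defensive pattern for this (a sketch; the helper name and idle window are assumptions, not mini-coder's code) is to race each chunk read against an idle timer, so a silently dead stream surfaces as an error instead of a permanent hang:

```typescript
// Sketch: wrap an async iterable so that if no chunk arrives within `idleMs`,
// iteration rejects instead of waiting forever. The timer resets on every
// chunk, so slow-but-alive streams are unaffected.
async function* withIdleTimeout<T>(
  stream: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = stream[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`stream idle for ${idleMs}ms`)),
        idleMs,
      );
    });
    const result = await Promise.race([it.next(), timeout]).finally(() =>
      clearTimeout(timer),
    );
    if (result.done) return;
    yield result.value;
  }
}
```

Wrapping `result.fullStream` this way would convert a silent Gemini connection into a catchable error that the turn loop can recover from.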
package/plan-code-health.md
ADDED
@@ -0,0 +1,169 @@
# Code Health Remediation Plan

## Goal
Address maintainability and reliability issues identified in `code-health.md` with low-risk, incremental refactors that keep behavior stable.

## Constraints
- Keep `mini-coder-idea.md` and `README.md` unchanged.
- Prefer small PR-sized changes with passing tests after each step.
- Preserve current CLI behavior while improving structure.

## Workstreams

### 1) Decompose `src/agent/agent.ts` (High)
**Outcome:** `runAgent` remains the orchestration entrypoint; responsibilities split into focused modules.

**Steps:**
1. Add `src/agent/reporter.ts` interface (narrow surface for output/status/tool events).
2. Extract session lifecycle + turn loop into `src/agent/session-runner.ts`.
3. Extract subagent execution into `src/agent/subagent-runner.ts`.
4. Extract snapshot/undo helpers into `src/agent/undo-snapshot.ts`.
5. Extract user input processing into `src/agent/input-loop.ts`.
6. Keep `agent.ts` as a composition/wiring file only.

**Checks:**
- Add/adjust unit tests around orchestration boundaries.
- Ensure no behavior regressions in interrupts, resume, and tool-call flows.

---

### 2) Decompose `src/cli/output.ts` (High)
**Outcome:** Rendering responsibilities isolated and testable.

**Target modules:**
- `src/cli/spinner.ts`
- `src/cli/tool-render.ts`
- `src/cli/stream-render.ts`
- `src/cli/status-bar.ts`
- `src/cli/error-render.ts`
- `src/cli/output.ts` as facade

**Steps:**
1. Extract pure formatting helpers first (no IO).
2. Extract spinner lifecycle module.
3. Extract stream queue/tick/flush behavior.
4. Keep compatibility exports in `output.ts` to avoid broad callsite churn.

**Checks:**
- Add focused tests for formatting + stream behavior.
- Verify terminal rendering remains stable manually.

---

### 3) Introduce `TerminalIO` abstraction (Medium)
**Outcome:** Centralized process/TTY interactions and signal lifecycle.

**Steps:**
1. Create `src/cli/terminal-io.ts` with methods for stdout/stderr writes, raw mode, and signal subscriptions.
2. Replace direct `process.*` use in the output/input stack with an injected `TerminalIO`.
3. Centralize signal registration/unregistration in one lifecycle owner.

**Checks:**
- Add unit tests for signal registration cleanup semantics.
- Confirm no stuck raw-mode edge cases.

---

### 4) Split DB layer by domain (Medium)
**Outcome:** Reduced blast radius and clearer data ownership.

**Target modules:**
- `src/session/db/connection.ts`
- `src/session/db/session-repo.ts`
- `src/session/db/message-repo.ts`
- `src/session/db/settings-repo.ts`
- `src/session/db/mcp-repo.ts`
- `src/session/db/snapshot-repo.ts`
- `src/session/db/index.ts` (facade exports)

**Steps:**
1. Move code without behavior changes.
2. Keep SQL and schema unchanged initially.
3. Replace direct `JSON.parse` in message loading with a guarded parser:
   - skip malformed rows
   - emit a diagnostic via the logger/reporter

**Checks:**
- Add tests for malformed payload handling.
- Validate existing DB tests still pass.
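The guarded parser described above could look like this (a sketch; the row shape and the `log` callback are assumptions, not the actual schema):

```typescript
// Hypothetical row shape for illustration.
interface MessageRow {
  id: number;
  payload: string;
}

// Sketch: parse message rows defensively, skipping malformed JSON instead of
// letting one corrupt row crash session loading. `log` stands in for the
// real logger/reporter.
function loadMessages(
  rows: MessageRow[],
  log: (msg: string) => void = console.error,
): unknown[] {
  const messages: unknown[] = [];
  for (const row of rows) {
    try {
      messages.push(JSON.parse(row.payload));
    } catch (err) {
      log(`skipping malformed message row ${row.id}: ${String(err)}`);
    }
  }
  return messages;
}
```

The trade-off is silent data loss per row versus a hard crash per session; surfacing the skip through the reporter keeps it diagnosable.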

---

### 5) Shared markdown config loader (Medium)
**Outcome:** Remove duplication across agents/skills/custom-commands.

**Steps:**
1. Create `src/cli/load-markdown-configs.ts` with a parameterized layout strategy.
2. Migrate:
   - `src/cli/agents.ts`
   - `src/cli/skills.ts`
   - `src/cli/custom-commands.ts`
3. Keep precedence rules identical (built-in/user/project).
4. Preserve existing frontmatter semantics.

**Checks:**
- Reuse/expand existing loader tests to cover parity.
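The built-in/user/project precedence rule can be sketched as a generic layer merge (entry shape and names are illustrative, not the planned API):

```typescript
// Hypothetical entry shape for a markdown-config item.
interface ConfigEntry {
  name: string;
  body: string;
}

// Sketch: merge config layers in precedence order. Later layers (user, then
// project) override earlier ones (built-in) by name, matching the
// "built-in/user/project" rule above.
function mergeLayers(...layers: ConfigEntry[][]): Map<string, ConfigEntry> {
  const merged = new Map<string, ConfigEntry>();
  for (const layer of layers) {
    for (const entry of layer) {
      merged.set(entry.name, entry); // later layers win
    }
  }
  return merged;
}
```

A shared loader parameterized this way lets agents, skills, and custom commands differ only in directory layout while sharing one precedence implementation.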

---

### 6) Runtime/UI decoupling via reporter boundary (Medium)
**Outcome:** Core runtime no longer depends directly on terminal rendering.

**Steps:**
1. Define domain events or a reporter interface in `src/agent/reporter.ts`.
2. Implement a CLI reporter adapter in `src/cli/output-reporter.ts`.
3. Replace direct output calls in the agent runtime with reporter calls.

**Checks:**
- Add tests using a test reporter to assert emitted events.
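A minimal shape for that boundary, plus the test double the check describes (method names are illustrative, not the final interface):

```typescript
// Sketch of a narrow reporter interface the agent runtime would emit into.
interface Reporter {
  status(text: string): void;
  toolStart(name: string): void;
  toolResult(name: string, ok: boolean): void;
  assistantText(chunk: string): void;
}

// Test double: records events so tests can assert on the runtime's output
// without touching the terminal.
class RecordingReporter implements Reporter {
  events: string[] = [];
  status(text: string) {
    this.events.push(`status:${text}`);
  }
  toolStart(name: string) {
    this.events.push(`tool-start:${name}`);
  }
  toolResult(name: string, ok: boolean) {
    this.events.push(`tool-result:${name}:${ok}`);
  }
  assistantText(chunk: string) {
    this.events.push(`text:${chunk}`);
  }
}
```

The CLI adapter would implement the same interface over the real output stack, so the runtime stays ignorant of spinners and ANSI codes.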

---

### 7) Error observability and silent catches (Medium)
**Outcome:** Non-fatal failures become diagnosable without crashing.

**Steps:**
1. Find empty/broad catches in agent/output/loaders.
2. Add debug-level diagnostics with contextual metadata.
3. Keep user-facing behavior unchanged unless critical.

**Checks:**
- Validate noisy paths are still quiet at normal verbosity.

---

### 8) Startup FS sync usage (Low/Deferred)
**Outcome:** Optional responsiveness improvement if startup cost grows.

**Steps:**
1. Measure startup and config-loading time first.
2. If needed, move high-volume file scanning to async or cache results with invalidation.

---

### 9) Test hygiene cleanup (Low)
**Outcome:** Cleaner CI output.

**Steps:**
1. Remove `console.log` skip notices in `src/tools/shell.test.ts`.
2. Use test-framework-native skip annotations/helpers.

---

## Execution Order (recommended)
1. Reporter interface (foundation for later decoupling).
2. `agent.ts` decomposition.
3. `output.ts` decomposition.
4. Shared config loader extraction.
5. DB module split + safe JSON parsing.
6. TerminalIO + centralized signals.
7. Silent catch diagnostics.
8. Test hygiene and any deferred FS optimization.

## Definition of Done
- `bun run typecheck && bun run format && bun run lint && bun test` passes.
- No behavior regressions in interactive CLI flows.
- `agent.ts` and `output.ts` materially reduced in size/responsibility.
- Config loader duplication removed.
- Message loading resilient to malformed JSON rows.
- New abstractions documented in code comments where non-obvious.