@ducci/jarvis 1.0.31 → 1.0.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,142 @@
+ # Finding 015: Failed Runs Leave Tool History in Context (Context Bloat Death Spiral)
+
+ **Date:** 2026-03-02
+ **Severity:** High — caused 3 consecutive `model_error: Empty choices array` failures; session unusable
+ **Status:** Fixed
+
+ ---
+
+ ## Observed Session
+
+ Session `6123209d-ce5a-44d0-be12-29aac58b4cf3`. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User requested a ZAP security scanning project.
+
+ | Entry | Trigger | Status | messageCount at failure | toolCalls |
+ |-------|---------|--------|------------------------|-----------|
+ | 1 | "hi all good?" | ok | — | 0 |
+ | 2 | ZAP task (run 1) | checkpoint_reached | — | 10 |
+ | 3 | handoff resume (run 2) | checkpoint_reached | — | 10 |
+ | 4 | handoff resume (run 3) | model_error (empty choices, iter 7) | 22 | 26 |
+ | 5 | "Why I get Model returned an empty response?" | model_error (empty choices, iter 3) | 27 | 2 |
+ | 6 | "Why I get Model returned an empty response again?!!" | model_error (empty choices, iter 5) | 37 | 4 |
+
+ The session ended without producing any result. The user received `Model returned an empty response.` three times.
+
+ ---
+
+ ## Root Cause 1: Failed runs leave tool call history in session
+
+ ### What happened
+
+ The handoff loop strips tool call messages for `checkpoint_reached` runs:
+
+ ```js
+ session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
+ ```
+
+ Runs that ended with `model_error` or `format_error` received **no strip**: every tool call message from the failed run (assistant+tool pairs, nudge injections) remained in `session.messages`, with only a synthetic error note appended afterward.
+
+ Run 3 had 26 tool calls across 7 iterations — approximately 13 messages added to the session. These were preserved verbatim, so each subsequent user turn started with more context than the last.
+
+ ### Message count growth
+
+ - Before run 3: ~8 messages (runs 1 and 2 were both checkpoint_reached and stripped correctly)
+ - After entry 4 (model_error, no strip): 21 messages + synthetic note = 22
+ - After entry 5 (model_error, no strip): 27 messages + synthetic note = 28
+ - At entry 6: 37 messages in context
+
+ The free model returns `choices: []` when the context exceeds what it can handle. Each failure added more context, making the next failure more likely: a **positive feedback death spiral**.
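This growth can be reproduced with a toy simulation (a minimal sketch; the message shapes, counts, and the `failedTurn` helper are illustrative, not the real agent code):

```javascript
// Toy model of the spiral: every failed turn appends its tool-call
// messages plus one synthetic error note, and nothing is ever removed,
// so each run starts with strictly more context than the last.
const session = Array.from({ length: 8 }, () => ({ role: 'user', content: '...' }));

function failedTurn(messages, toolMessageCount) {
  for (let i = 0; i < toolMessageCount; i++) {
    messages.push({ role: 'tool', content: 'zap output' });
  }
  messages.push({ role: 'assistant', content: '[System: Previous run failed (model_error)]' });
  return messages.length;
}

// Three failed runs adding 13, 5, and 9 tool messages respectively.
const sizes = [13, 5, 9].map(n => failedTurn(session, n));
// sizes === [22, 28, 38] — monotone growth; the next run inherits all prior bloat
```

The first value matches entry 4 in the table above (8 pre-run messages + 13 tool messages + 1 note = 22); the later values are illustrative.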
+
+ ### Fix
+
+ Apply the same splice that checkpoint runs already use:
+
+ ```js
+ if (finalStatus === 'model_error' || finalStatus === 'format_error') {
+   session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
+   // then push synthetic error note as before
+ }
+ ```
+
+ The strip runs before the synthetic error note is pushed, returning the session to its pre-run state plus one concise note. The JSONL log preserves all tool results for retrospective inspection via `read_session_log`.
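That invariant (pre-run state plus exactly one note) can be checked on a toy session (a sketch; the message shapes and contents are illustrative):

```javascript
// Sketch of the fixed failure path: capture runStartIndex before the run,
// strip everything the run added, then append one synthetic note.
const session = { messages: [{ role: 'user', content: 'set up a ZAP scan' }] };
const runStartIndex = session.messages.length;

// The failed run adds assistant/tool pairs and nudge injections...
session.messages.push(
  { role: 'assistant', tool_calls: [{ id: 'call_1' }] },
  { role: 'tool', content: 'zap output' }
);

// ...which the fix strips before pushing the concise note.
session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
session.messages.push({
  role: 'assistant',
  content: '[System: Previous run failed (model_error). Empty choices array.]',
});

// session.messages.length === 2: the pre-run message plus one note
```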
+
+ **File**: `src/server/agent.js` — `_runHandleChat`, non-checkpoint break path
+
+ ---
+
+ ## Root Cause 2: No detection or escalation for consecutive model_errors
+
+ ### What happened
+
+ After two consecutive `model_error: Empty choices array` entries (4 and 5), no protective mechanism fired. The system continued accepting new user messages and spawning new runs indefinitely.
+
+ Existing protection mechanisms all missed this case:
+ - `maxHandoffs` — only applies to `checkpoint_reached` runs
+ - `consecutiveFailures` — tracks tool failures within a single run
+ - Zero-progress detection — only applies to `checkpoint_reached` runs
+
+ ### Fix
+
+ Detect the pattern structurally in `session.messages` before starting each new run: if the last two assistant messages are both synthetic `model_error` notes, the session is in a confirmed failure loop. Escalate to `intervention_required` without running another agent loop.
+
+ ```js
+ function hasConsecutiveModelErrors(messages) {
+   const assistantTail = messages.filter(m => m.role === 'assistant').slice(-2);
+   return (
+     assistantTail.length === 2 &&
+     assistantTail.every(
+       m =>
+         typeof m.content === 'string' &&
+         m.content.startsWith('[System: Previous run failed (model_error)')
+     )
+   );
+ }
+ ```
+
+ This requires no additional state: it reads the session history directly, so old sessions are handled correctly. One failure is tolerated (transient errors do happen); two consecutive failures mean the session cannot self-recover.
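A usage sketch (the function reproduced as above; the sample message contents are illustrative):

```javascript
function hasConsecutiveModelErrors(messages) {
  const assistantTail = messages.filter(m => m.role === 'assistant').slice(-2);
  return (
    assistantTail.length === 2 &&
    assistantTail.every(
      m =>
        typeof m.content === 'string' &&
        m.content.startsWith('[System: Previous run failed (model_error)')
    )
  );
}

const note = { role: 'assistant', content: '[System: Previous run failed (model_error). Empty choices array.]' };
const user = { role: 'user', content: 'Why I get Model returned an empty response?' };
const ok = { role: 'assistant', content: 'Scan configured.' };

hasConsecutiveModelErrors([ok, user, note]);   // false — a single failure is tolerated
hasConsecutiveModelErrors([note, user, note]); // true — two in a row triggers escalation
hasConsecutiveModelErrors([note]);             // false — fewer than two assistant messages
```

Note that user turns between the failures do not reset detection: only assistant messages are inspected.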
+
+ Combined with Fix 1, consecutive model_errors in this session would have played out as:
+ 1. Entry 4 (run 3): model_error → strip → synthetic note. Session back to 9 messages.
+ 2. Entry 5 (user "Why?"): run 4 starts with 10 messages. If it still fails → strip → synthetic note. Two model_error notes now in session.
+ 3. Entry 6 (user "Why again?!"): `hasConsecutiveModelErrors` fires → `intervention_required` returned immediately. User gets a clear message: start a new session or switch model.
+
+ **File**: `src/server/agent.js` — `hasConsecutiveModelErrors` function + check at top of handoff loop
+
+ ---
+
+ ## Root Cause 3: Empty choices error message provides no actionable guidance
+
+ ### What happened
+
+ The `choices.length === 0` path returned:
+
+ ```
+ Model returned an empty response.
+ ```
+
+ When the user asked "why?", the agent — with ZAP tool call context still present — continued the ZAP investigation instead of explaining the API failure. The opaque error and the polluted context compounded: the model had no clear signal about what went wrong and no guidance on how to recover.
+
+ ### Fix
+
+ Include the context size and recovery guidance in the response:
+
+ ```js
+ response: `Model returned an empty response (${preparedMessages.length} messages in context). This typically happens when the conversation is too long for the model. Try starting a new session or switching to a model with a larger context window.`,
+ ```
+
+ **File**: `src/server/agent.js` — `runAgentLoop`, empty choices early return
+
+ ---
+
+ ## Why Fix 1 is Primary
+
+ Fix 1 is the root fix. With context stripped after each failure, the model operates on a tiny session (~10 messages) on subsequent turns, which the free model handles easily. Fix 2 is a safety net for persistent non-context failures. Fix 3 improves the user-facing error message for the residual cases that slip through.
+
+ ---
+
+ ## Files Changed
+
+ | File | Change |
+ |------|--------|
+ | `src/server/agent.js` | Strip tool history on `model_error`/`format_error` (same as checkpoint) |
+ | `src/server/agent.js` | `hasConsecutiveModelErrors` function + check before each run in handoff loop |
+ | `src/server/agent.js` | Include message count in empty choices response |
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ducci/jarvis",
-   "version": "1.0.31",
+   "version": "1.0.32",
    "description": "A fully automated agent system that lives on a server.",
    "main": "./src/index.js",
    "type": "module",
@@ -69,6 +69,25 @@ async function callModelWithFallback(client, config, messages, tools) {
    }
  }

+ /**
+  * Returns true if the last two assistant messages in the session are both
+  * synthetic model_error notes, indicating a confirmed failure loop that cannot
+  * self-resolve (e.g. persistent empty choices from context overflow).
+  */
+ function hasConsecutiveModelErrors(messages) {
+   const assistantTail = messages
+     .filter(m => m.role === 'assistant')
+     .slice(-2);
+   return (
+     assistantTail.length === 2 &&
+     assistantTail.every(
+       m =>
+         typeof m.content === 'string' &&
+         m.content.startsWith('[System: Previous run failed (model_error)')
+     )
+   );
+ }
+
  /**
   * Runs a single agent loop up to maxIterations.
   * Returns { iteration, response, logSummary, status, runToolCalls, checkpoint }.
@@ -112,7 +131,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
    if (!modelResult.choices || modelResult.choices.length === 0) {
      return {
        iteration,
-       response: 'Model returned an empty response.',
+       response: `Model returned an empty response (${preparedMessages.length} messages in context). This typically happens when the conversation is too long for the model. Try starting a new session or switching to a model with a larger context window.`,
        logSummary: `Model error on iteration ${iteration}: Empty choices array.`,
        status: 'model_error',
        runToolCalls,
@@ -482,6 +501,25 @@ async function _runHandleChat(config, sessionId, userMessage) {
    try {
      // Handoff loop
      while (true) {
+       // Safety check: if the last two assistant messages are both model_error
+       // synthetic notes, we are in a confirmed failure loop. Escalate immediately
+       // rather than burning more iterations on a stuck session.
+       if (hasConsecutiveModelErrors(session.messages)) {
+         finalResponse = 'The model has failed twice in a row. This is likely due to the conversation being too long for the model to process. Please start a new session or switch to a model with a larger context window.';
+         finalLogSummary = 'Consecutive model_error detected: session escalated to intervention_required without running another agent loop.';
+         finalStatus = 'intervention_required';
+         await appendLog(sessionId, {
+           iteration: 0,
+           model: config.selectedModel,
+           userInput: userMessage,
+           toolCalls: [],
+           response: finalResponse,
+           logSummary: finalLogSummary,
+           status: 'intervention_required',
+         });
+         break;
+       }
+
        const runStartIndex = session.messages.length;
        const run = await runAgentLoop(client, config, session, prepareMessages);
        allToolCalls.push(...run.runToolCalls);
@@ -505,8 +543,14 @@ async function _runHandleChat(config, sessionId, userMessage) {
      if (run.rawResponse) logEntry.rawResponse = run.rawResponse;
      await appendLog(sessionId, logEntry);

-     // Inject synthetic error note so the model has context on the next user turn
+     // Inject synthetic error note so the model has context on the next user turn.
+     // For failed runs, also strip the tool call history — keeping it would bloat
+     // the context and create a positive-feedback death spiral where each failure
+     // makes the next one more likely (especially on free models with small context
+     // windows). The synthetic note is sufficient context; tool results are preserved
+     // in the JSONL log and accessible via read_session_log.
      if (finalStatus === 'model_error' || finalStatus === 'format_error') {
+       session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
        const errorDetail = run.errorDetail ? ` Error detail: ${JSON.stringify(run.errorDetail)}` : '';
        session.messages.push({
          role: 'assistant',