@ducci/jarvis 1.0.31 → 1.0.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,142 @@
+ # Finding 015: Failed Runs Leave Tool History in Context (Context Bloat Death Spiral)
+
+ **Date:** 2026-03-02
+ **Severity:** High — caused 3 consecutive `model_error: Empty choices array` failures; session unusable
+ **Status:** Fixed
+
+ ---
+
+ ## Observed Session
+
+ Session `6123209d-ce5a-44d0-be12-29aac58b4cf3`. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User requested a ZAP security scanning project.
+
+ | Entry | Trigger | Status | messageCount at failure | toolCalls |
+ |-------|---------|--------|------------------------|-----------|
+ | 1 | "hi all good?" | ok | — | 0 |
+ | 2 | ZAP task (run 1) | checkpoint_reached | — | 10 |
+ | 3 | handoff resume (run 2) | checkpoint_reached | — | 10 |
+ | 4 | handoff resume (run 3) | model_error (empty choices, iter 7) | 22 | 26 |
+ | 5 | "Why I get Model returned an empty response?" | model_error (empty choices, iter 3) | 27 | 2 |
+ | 6 | "Why I get Model returned an empty response again?!!" | model_error (empty choices, iter 5) | 37 | 4 |
+
+ The session ended without producing any result. The user received `Model returned an empty response.` three times.
+
+ ---
+
+ ## Root Cause 1: Failed runs leave tool call history in session
+
+ ### What happened
+
+ The handoff loop strips tool call messages for `checkpoint_reached` runs:
+
+ ```js
+ session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
+ ```
+
+ Runs that ended with `model_error` or `format_error` received **no strip**: every tool call message from the failed run (assistant+tool pairs, nudge injections) remained in `session.messages`, with only a synthetic error note appended afterward.
+
+ Run 3 had 26 tool calls across 7 iterations — approximately 13 messages added to the session. These were preserved verbatim, so each subsequent user turn started with more context than the last.
+
+ ### Message count growth
+
+ - Before run 3: ~8 messages (runs 1 and 2 were both checkpoint_reached and stripped correctly)
+ - After entry 4 (model_error, no strip): 21 messages + synthetic note = 22
+ - After entry 5 (model_error, no strip): 27 messages + synthetic note = 28
+ - At entry 6: 37 messages in context
+
+ The free model returns `choices: []` when the context exceeds what it can handle. Each failure added more context, making the next failure more likely: a **positive feedback death spiral**.
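This growth can be reproduced with a toy simulation (a minimal sketch; the message shapes, counts, and the `failedTurn` helper are illustrative, not the real agent code):

```javascript
// Toy model of the spiral: every failed turn appends its tool-call
// messages plus one synthetic error note, and nothing is ever removed,
// so each run starts with strictly more context than the last.
const session = Array.from({ length: 8 }, () => ({ role: 'user', content: '...' }));

function failedTurn(messages, toolMessageCount) {
  for (let i = 0; i < toolMessageCount; i++) {
    messages.push({ role: 'tool', content: 'zap output' });
  }
  messages.push({ role: 'assistant', content: '[System: Previous run failed (model_error)]' });
  return messages.length;
}

// Three failed runs adding 13, 5, and 9 tool messages respectively.
const sizes = [13, 5, 9].map(n => failedTurn(session, n));
// sizes === [22, 28, 38] — monotone growth; the next run inherits all prior bloat
```

The first value matches entry 4 in the table above (8 pre-run messages + 13 tool messages + 1 note = 22); the later values are illustrative.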
+
+ ### Fix
+
+ Apply the same splice that checkpoint runs already use:
+
+ ```js
+ if (finalStatus === 'model_error' || finalStatus === 'format_error') {
+   session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
+   // then push synthetic error note as before
+ }
+ ```
+
+ The strip runs before the synthetic error note is pushed, returning the session to its pre-run state plus one concise note. The JSONL log preserves all tool results for retrospective inspection via `read_session_log`.
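That invariant (pre-run state plus exactly one note) can be checked on a toy session (a sketch; the message shapes and contents are illustrative):

```javascript
// Sketch of the fixed failure path: capture runStartIndex before the run,
// strip everything the run added, then append one synthetic note.
const session = { messages: [{ role: 'user', content: 'set up a ZAP scan' }] };
const runStartIndex = session.messages.length;

// The failed run adds assistant/tool pairs and nudge injections...
session.messages.push(
  { role: 'assistant', tool_calls: [{ id: 'call_1' }] },
  { role: 'tool', content: 'zap output' }
);

// ...which the fix strips before pushing the concise note.
session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
session.messages.push({
  role: 'assistant',
  content: '[System: Previous run failed (model_error). Empty choices array.]',
});

// session.messages.length === 2: the pre-run message plus one note
```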
+
+ **File**: `src/server/agent.js` — `_runHandleChat`, non-checkpoint break path
+
+ ---
+
+ ## Root Cause 2: No detection or escalation for consecutive model_errors
+
+ ### What happened
+
+ After two consecutive `model_error: Empty choices array` entries (4 and 5), no protective mechanism fired. The system continued accepting new user messages and spawning new runs indefinitely.
+
+ Existing protection mechanisms all missed this case:
+ - `maxHandoffs` — only applies to `checkpoint_reached` runs
+ - `consecutiveFailures` — tracks tool failures within a single run
+ - Zero-progress detection — only applies to `checkpoint_reached` runs
+
+ ### Fix
+
+ Detect the pattern structurally in `session.messages` before starting each new run: if the last two assistant messages are both synthetic `model_error` notes, the session is in a confirmed failure loop. Escalate to `intervention_required` without running another agent loop.
+
+ ```js
+ function hasConsecutiveModelErrors(messages) {
+   const assistantTail = messages.filter(m => m.role === 'assistant').slice(-2);
+   return (
+     assistantTail.length === 2 &&
+     assistantTail.every(
+       m =>
+         typeof m.content === 'string' &&
+         m.content.startsWith('[System: Previous run failed (model_error)')
+     )
+   );
+ }
+ ```
+
+ This requires no additional state: it reads the session history directly, so old sessions are handled correctly. One failure is tolerated (transient errors do happen); two consecutive failures mean the session cannot self-recover.
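A usage sketch (the function reproduced as above; the sample message contents are illustrative):

```javascript
function hasConsecutiveModelErrors(messages) {
  const assistantTail = messages.filter(m => m.role === 'assistant').slice(-2);
  return (
    assistantTail.length === 2 &&
    assistantTail.every(
      m =>
        typeof m.content === 'string' &&
        m.content.startsWith('[System: Previous run failed (model_error)')
    )
  );
}

const note = { role: 'assistant', content: '[System: Previous run failed (model_error). Empty choices array.]' };
const user = { role: 'user', content: 'Why I get Model returned an empty response?' };
const ok = { role: 'assistant', content: 'Scan configured.' };

hasConsecutiveModelErrors([ok, user, note]);   // false — a single failure is tolerated
hasConsecutiveModelErrors([note, user, note]); // true — two in a row triggers escalation
hasConsecutiveModelErrors([note]);             // false — fewer than two assistant messages
```

Note that user turns between the failures do not reset detection: only assistant messages are inspected.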
+
+ Combined with Fix 1, consecutive model_errors in this session would have played out as:
+ 1. Entry 4 (run 3): model_error → strip → synthetic note. Session back to 9 messages.
+ 2. Entry 5 (user "Why?"): run 4 starts with 10 messages. If it still fails → strip → synthetic note. Two model_error notes now in session.
+ 3. Entry 6 (user "Why again?!"): `hasConsecutiveModelErrors` fires → `intervention_required` returned immediately. User gets a clear message: start a new session or switch model.
+
+ **File**: `src/server/agent.js` — `hasConsecutiveModelErrors` function + check at top of handoff loop
+
+ ---
+
+ ## Root Cause 3: Empty choices error message provides no actionable guidance
+
+ ### What happened
+
+ The `choices.length === 0` path returned:
+
+ ```
+ Model returned an empty response.
+ ```
+
+ When the user asked "why?", the agent — with ZAP tool call context still present — continued the ZAP investigation instead of explaining the API failure. The opaque error and the polluted context compounded: the model had no clear signal about what went wrong and no guidance on how to recover.
+
+ ### Fix
+
+ Include the context size and recovery guidance in the response:
+
+ ```js
+ response: `Model returned an empty response (${preparedMessages.length} messages in context). This typically happens when the conversation is too long for the model. Try starting a new session or switching to a model with a larger context window.`,
+ ```
+
+ **File**: `src/server/agent.js` — `runAgentLoop`, empty choices early return
+
+ ---
+
+ ## Why Fix 1 is Primary
+
+ Fix 1 is the root fix. With context stripped after each failure, the model operates on a tiny session (~10 messages) on subsequent turns, which the free model handles easily. Fix 2 is a safety net for persistent non-context failures. Fix 3 improves the user-facing error message for the residual cases that slip through.
+
+ ---
+
+ ## Files Changed
+
+ | File | Change |
+ |------|--------|
+ | `src/server/agent.js` | Strip tool history on `model_error`/`format_error` (same as checkpoint) |
+ | `src/server/agent.js` | `hasConsecutiveModelErrors` function + check before each run in handoff loop |
+ | `src/server/agent.js` | Include message count in empty choices response |
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ducci/jarvis",
-   "version": "1.0.31",
+   "version": "1.0.32",
    "description": "A fully automated agent system that lives on a server.",
    "main": "./src/index.js",
    "type": "module",
@@ -69,6 +69,25 @@ async function callModelWithFallback(client, config, messages, tools) {
    }
  }

+ /**
+  * Returns true if the last two assistant messages in the session are both
+  * synthetic model_error notes, indicating a confirmed failure loop that cannot
+  * self-resolve (e.g. persistent empty choices from context overflow).
+  */
+ function hasConsecutiveModelErrors(messages) {
+   const assistantTail = messages
+     .filter(m => m.role === 'assistant')
+     .slice(-2);
+   return (
+     assistantTail.length === 2 &&
+     assistantTail.every(
+       m =>
+         typeof m.content === 'string' &&
+         m.content.startsWith('[System: Previous run failed (model_error)')
+     )
+   );
+ }
+
  /**
   * Runs a single agent loop up to maxIterations.
   * Returns { iteration, response, logSummary, status, runToolCalls, checkpoint }.
@@ -112,7 +131,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
    if (!modelResult.choices || modelResult.choices.length === 0) {
      return {
        iteration,
-       response: 'Model returned an empty response.',
+       response: `Model returned an empty response (${preparedMessages.length} messages in context). This typically happens when the conversation is too long for the model. Try starting a new session or switching to a model with a larger context window.`,
        logSummary: `Model error on iteration ${iteration}: Empty choices array.`,
        status: 'model_error',
        runToolCalls,
@@ -482,6 +501,25 @@ async function _runHandleChat(config, sessionId, userMessage) {
    try {
      // Handoff loop
      while (true) {
+       // Safety check: if the last two assistant messages are both model_error
+       // synthetic notes, we are in a confirmed failure loop. Escalate immediately
+       // rather than burning more iterations on a stuck session.
+       if (hasConsecutiveModelErrors(session.messages)) {
+         finalResponse = 'The model has failed twice in a row. This is likely due to the conversation being too long for the model to process. Please start a new session or switch to a model with a larger context window.';
+         finalLogSummary = 'Consecutive model_error detected: session escalated to intervention_required without running another agent loop.';
+         finalStatus = 'intervention_required';
+         await appendLog(sessionId, {
+           iteration: 0,
+           model: config.selectedModel,
+           userInput: userMessage,
+           toolCalls: [],
+           response: finalResponse,
+           logSummary: finalLogSummary,
+           status: 'intervention_required',
+         });
+         break;
+       }
+
        const runStartIndex = session.messages.length;
        const run = await runAgentLoop(client, config, session, prepareMessages);
        allToolCalls.push(...run.runToolCalls);
@@ -505,8 +543,14 @@ async function _runHandleChat(config, sessionId, userMessage) {
      if (run.rawResponse) logEntry.rawResponse = run.rawResponse;
      await appendLog(sessionId, logEntry);

-     // Inject synthetic error note so the model has context on the next user turn
+     // Inject synthetic error note so the model has context on the next user turn.
+     // For failed runs, also strip the tool call history — keeping it would bloat
+     // the context and create a positive-feedback death spiral where each failure
+     // makes the next one more likely (especially on free models with small context
+     // windows). The synthetic note is sufficient context; tool results are preserved
+     // in the JSONL log and accessible via read_session_log.
      if (finalStatus === 'model_error' || finalStatus === 'format_error') {
+       session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
        const errorDetail = run.errorDetail ? ` Error detail: ${JSON.stringify(run.errorDetail)}` : '';
        session.messages.push({
          role: 'assistant',