npm - @ducci/jarvis - Versions diffs - 1.0.29 → 1.0.30 - Mend

@ducci/jarvis 1.0.29 → 1.0.30

Files changed (3)

package/docs/findings/013-stderr-visibility-and-truncation.md +59 -0
package/package.json +1 -1
package/src/server/agent.js +12 -1

package/docs/findings/013-stderr-visibility-and-truncation.md ADDED

@@ -0,0 +1,59 @@
+# Finding 013 — stderr Visibility and Output Truncation
+## Observed Behaviour
+During a multi-run ZAP security scan session, the agent repeatedly failed to diagnose and fix the root cause of scan failures. It issued `pkill`/`kill` variants dozens of times, burned through 6 iteration limits and 2 handoffs, and ultimately gave up without producing results.
+Post-mortem analysis of the debug session logs revealed two compounding problems.
+## Root Causes
+### 1. Head-only truncation buried errors at the end of output
+`MAX_TOOL_RESULT = 4000` was applied as a simple head slice: `resultStr.slice(0, 4000)`. ZAP produces verbose startup logs (JVM init, add-on loading, database migration) that easily exceed 4000 characters. Five separate ZAP exec results were truncated exactly at the limit, cutting off during database migration messages. Any errors that appeared later in the output — after the verbose preamble — were silently dropped before the model ever saw them.
+### 2. Model ignored stderr even when it was visible
+In one un-truncated result (810 chars), the critical error was plainly present:
+```
+g_module_open() failed for libpixbufloader-tiff.so: libtiff.so.5: cannot open shared object file: No such file or directory
+```
+The model's subsequent response ignored it entirely and concluded only that "no results were found in the output directory." There was no mechanism forcing the model to re-examine stderr before forming its conclusion or retrying.
+Similarly, all `pkill`/`kill` commands returned `exitCode: 1` with clear stderr — yet the model continued issuing variations of the same commands without diagnosing why process termination was failing.
+## Fixes
+### Fix 1 — Head + tail truncation (`agent.js`)
+Replace head-only truncation with a head+tail strategy:
+```js
+// Before
+resultStr.slice(0, MAX_TOOL_RESULT) + '\n[...truncated]'
+// After
+resultStr.slice(0, 2000) + `\n[...${resultStr.length - 4000} chars truncated...]\n` + resultStr.slice(-2000)
+```
+The first 2000 chars preserve startup context; the last 2000 chars preserve the diagnostic tail where errors typically appear. The marker in the middle shows how much was dropped. Total budget stays at 4000 chars.
+### Fix 2 — Stderr nudge injection (`agent.js`)
+After each iteration's tool-call loop, if any tool failed (`status === 'error'`) with non-empty `stderr`, inject a system message:
+```
+[System: A command failed and produced stderr output. Examine the stderr field in the tool result carefully — it likely describes the root cause of the failure. Do not retry the same command without first addressing what stderr reports.]
+```
+This creates an active forcing function — the model cannot continue to the next iteration without the nudge appearing in its context. The nudge is suppressed if loop detection already fired (to avoid contradictory instructions).
+## Known Gap
+The stderr nudge only fires when `toolFailed` is true (i.e., `exec` returned `status: 'error'`). Commands that return `exitCode: 0` but still emit meaningful errors to stderr (e.g., a shell script that succeeds but a subprocess inside it fails) will not trigger the nudge. Catching that case without generating noise from normal stderr usage (npm warnings, apt-get progress) requires more context than is available at this level. Documented as a known limitation.
+## Files Changed
+- `src/server/agent.js` — truncation strategy + stderr nudge injection
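
The head+tail truncation described in Fix 1 can be sketched as a standalone function. This is a minimal illustration, not code from the package: `truncateHeadTail` and its `limit` parameter are hypothetical names, and the sketch derives the head and tail sizes from the limit instead of hard-coding 2000 and 4000 as the diff does.

```javascript
// Hypothetical helper for illustration; not part of @ducci/jarvis.
const MAX_TOOL_RESULT = 4000; // same value the finding quotes

function truncateHeadTail(text, limit = MAX_TOOL_RESULT) {
  // Short results pass through untouched.
  if (text.length <= limit) return text;

  const head = Math.ceil(limit / 2); // chars kept from the start
  const tail = limit - head;         // chars kept from the end
  const dropped = text.length - limit;

  // slice(-0) would return the whole string, so guard the zero case.
  const tailPart = tail > 0 ? text.slice(-tail) : '';

  return text.slice(0, head) + `\n[...${dropped} chars truncated...]\n` + tailPart;
}

// A long, log-like string whose error sits at the very end.
const log = 'startup line\n'.repeat(1000) + 'ERROR: libtiff.so.5 not found';
const shown = truncateHeadTail(log);
console.log(shown.endsWith('ERROR: libtiff.so.5 not found')); // the tail survives
```

In this sketch the kept text is exactly `limit` characters, but the returned string is slightly longer because the marker is added on top of the head and tail.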

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ducci/jarvis",
-  "version": "1.0.29",
+  "version": "1.0.30",
   "description": "A fully automated agent system that lives on a server.",
   "main": "./src/index.js",
   "type": "module",

package/src/server/agent.js CHANGED

@@ -139,6 +139,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
       });
       let toolsModified = false;
+      let stderrErrorInIteration = false;
       for (const toolCall of assistantMessage.tool_calls) {
         const toolName = toolCall.function.name;
         let toolArgs;
@@ -165,6 +166,9 @@ async function runAgentLoop(client, config, session, prepareMessages) {
         const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
         if (toolFailed) {
           consecutiveFailures++;
+          if (resultObj && resultObj.stderr) {
+            stderrErrorInIteration = true;
+          }
         } else {
           consecutiveFailures = 0;
         }
@@ -173,7 +177,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
         runToolCalls.push({ name: toolName, args: toolArgs, status: toolStatus, result: resultStr });
         const sessionContent = resultStr.length > MAX_TOOL_RESULT
-          ? resultStr.slice(0, MAX_TOOL_RESULT) + '\n[...truncated]'
+          ? resultStr.slice(0, 2000) + `\n[...${resultStr.length - 4000} chars truncated...]\n` + resultStr.slice(-2000)
           : resultStr;
         session.messages.push({
           role: 'tool',
@@ -201,6 +205,13 @@ async function runAgentLoop(client, config, session, prepareMessages) {
         });
       }
+      if (stderrErrorInIteration && !loopDetected) {
+        session.messages.push({
+          role: 'user',
+          content: '[System: A command failed and produced stderr output. Examine the stderr field in the tool result carefully — it likely describes the root cause of the failure. Do not retry the same command without first addressing what stderr reports.]',
+        });
+      }
       // Reload tools if any were created/updated this iteration
       if (toolsModified) {
         tools = await loadTools();
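
The nudge condition added in this hunk can be read as a small pure function, which also makes the finding's Known Gap easy to see. The sketch below is an assumption-laden illustration, not the package's code: `stderrNudgeFor`, the `toolResults` array shape, and the abbreviated `NUDGE_TEXT` are hypothetical, and unlike the diff (which checks only that `stderr` is truthy) it also ignores whitespace-only stderr.

```javascript
// Hypothetical names throughout; the real logic lives inline in runAgentLoop.
// Abbreviated stand-in for the full nudge message shown in the diff.
const NUDGE_TEXT =
  '[System: A command failed and produced stderr output. Examine the stderr field before retrying.]';

// toolResults: assumed array of parsed tool results like { status, stderr }.
// Returns the message to push onto the session, or null when no nudge applies.
function stderrNudgeFor(toolResults, loopDetected) {
  // Loop detection takes precedence, to avoid contradictory instructions.
  if (loopDetected) return null;

  const failedWithStderr = toolResults.some(
    (r) =>
      r != null &&
      r.status === 'error' &&
      typeof r.stderr === 'string' &&
      r.stderr.trim() !== ''
  );

  // The diff injects the nudge with role 'user', so the sketch does the same.
  return failedWithStderr ? { role: 'user', content: NUDGE_TEXT } : null;
}

// Known Gap: a result that reports success but wrote to stderr yields no nudge.
console.log(stderrNudgeFor([{ status: 'ok', stderr: 'subprocess failed' }], false));
```

Keeping the decision separate from the message push would let the gap case be tested directly, although whether that refactoring suits `runAgentLoop` is a judgment call for the package author.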