@ducci/jarvis 1.0.28 → 1.0.30
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,121 @@

# Finding 012: Empty-Content Nudge Includes Tools and Loses Recovery Text

**Date:** 2026-03-02

**Severity:** Medium — user sees a generic error when the model produces a partial recovery response

**Status:** Fixed

---

## Observed Session

Session `21fb43a7-2b11-4208-99fb-e6b54fddc07b`, entry 9 in session.jsonl:

```
status=format_error
model=nvidia/nemotron-3-nano-30b-a3b:free
iteration=3
userInput='Ok. Read the results folder. Is there anything?'
logSummary='Model returned non-JSON final response after recovery attempts.'
response='The model did not produce a response. Please try again.'
```

The user received: **"The model did not produce a response. Please try again."**

---

## What Happened

1. The agent executed two tool calls:
   - `list_dir /root/.jarvis/projects/cybersecurity/results` → success
   - `exec "list_dir /root/.jarvis/projects/cybersecurity/results/dviet.de"` → exit 127 (`list_dir: not found`)
   - The model confused the jarvis `list_dir` tool with a shell command

2. After the failed exec, the model returned `assistantMessage.content = null` with no `tool_calls` — it went silent

3. Finding 011's empty-content nudge was triggered

4. The nudge **also failed** — no valid JSON response was produced

5. The agent fell through to `format_error` with the fallback message

---

## Bug Chain

### Bug 1 — toolDefs included in empty nudge

```js
const nudgeResult = await callModelWithFallback(client, config, emptyNudge, toolDefs);
```

When the model is confused after a tool failure, it may respond to the nudge with **another tool call** instead of text. If it does:

```
nudgeResult.choices[0].message.content = null
nudgeContent = ''
JSON.parse('') → throws
catch: // Give up — content stays ''
```

The model had an opportunity to call more tools instead of producing a text response — the wrong behavior for a recovery nudge.
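
With `toolDefs` attached, a tool-call reply is a legal completion, so the nudge can yield no text at all. A minimal sketch (the response object here is hypothetical, mirroring the OpenAI-style shape in the trace above) shows what the handler then sees:

```javascript
// Hypothetical nudge reply: the confused model answers with another tool
// call, so message.content is null and there is no text to recover.
const toolCallReply = {
  choices: [
    { message: { content: null, tool_calls: [{ function: { name: 'list_dir' } }] } },
  ],
};

// Same extraction as the nudge handler: null coalesces to ''.
const nudgeContent = toolCallReply.choices[0]?.message?.content || '';
console.log(nudgeContent.length); // 0 — nothing to parse, nothing to show the user
```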

### Bug 2 — content assigned after parse

```js
const nudgeContent = nudgeResult.choices[0]?.message?.content || '';
parsed = JSON.parse(nudgeContent); // ← throws on non-JSON or empty
content = nudgeContent; // ← only reached if parse succeeded
```

If the model responds to the nudge with non-empty but non-JSON text (e.g. a plain-English answer), `JSON.parse` throws and `content` is **never updated**. The non-JSON text is discarded. The `!parsed` handler then shows the fallback message instead of the model's actual text.
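
Both orderings can be reproduced in isolation. The sketch below uses hypothetical helper names (`recoverBuggy`, `recoverFixed`) and a fabricated response object; it is not the agent code, just the assign-after-parse pattern versus the persist-before-parse pattern:

```javascript
// Bug 2 in miniature: content is only assigned if JSON.parse succeeds.
function recoverBuggy(nudgeResult) {
  let content = '';
  let parsed = null;
  const nudgeContent = nudgeResult.choices[0]?.message?.content || '';
  try {
    parsed = JSON.parse(nudgeContent); // throws on plain English
    content = nudgeContent;            // never reached when parse throws
  } catch {}
  return { content, parsed };
}

// The fix: persist the best-effort text first, then attempt the parse.
function recoverFixed(nudgeResult) {
  let content = '';
  let parsed = null;
  const nudgeContent = nudgeResult.choices[0]?.message?.content || '';
  if (nudgeContent.trim()) {
    content = nudgeContent; // survives even if JSON.parse throws below
  }
  try {
    parsed = JSON.parse(nudgeContent);
  } catch {}
  return { content, parsed };
}

const plainText = {
  choices: [{ message: { content: 'The results folder is empty.' } }],
};
console.log(recoverBuggy(plainText).content); // ''
console.log(recoverFixed(plainText).content); // 'The results folder is empty.'
```

In the buggy ordering the user gets the fallback message; in the fixed ordering the `!parsed` handler can surface the model's actual answer.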

---

## Difference from Finding 011

| Finding | Problem | Trigger |
|---------|---------|---------|
| 011 | Empty model response propagates to Telegram | Initial empty content, no recovery chain |
| 012 | Recovery nudge discards best-effort text; model can respond with a tool call | Recovery nudge called with toolDefs; content assigned after parse |

Finding 012 is a refinement of the recovery path introduced in Finding 011.

---

## Fix

### `src/server/agent.js` — empty-content nudge block

**Before:**

```js
const nudgeResult = await callModelWithFallback(client, config, emptyNudge, toolDefs);
const nudgeContent = nudgeResult.choices[0]?.message?.content || '';
parsed = JSON.parse(nudgeContent);
content = nudgeContent;
```

**After:**

```js
// No tools: force a text response, prevent the model from calling another tool
const nudgeResult = await callModelWithFallback(client, config, emptyNudge, []);
const nudgeContent = nudgeResult.choices[0]?.message?.content || '';
// Persist before parsing — if JSON parse throws, content still carries the
// model's best-effort text so the !parsed handler can show it to the user
if (nudgeContent.trim()) {
  content = nudgeContent;
}
parsed = JSON.parse(nudgeContent);
```

---

## Outcome

| Nudge response | Before | After |
|---|---|---|
| Valid JSON | Clean recovery | Clean recovery (no change) |
| Non-JSON text | Text discarded, fallback shown | Text shown to user |
| Tool call (no content) | content='', fallback shown | Less likely; content='', fallback shown |
| Empty again | content='', fallback shown | content='', fallback shown (no change) |

The user in the observed session would have received the model's best-effort text about the results folder contents, rather than "The model did not produce a response. Please try again."
@@ -0,0 +1,59 @@

# Finding 013 — stderr Visibility and Output Truncation

## Observed Behaviour

During a multi-run ZAP security scan session, the agent repeatedly failed to diagnose and fix the root cause of scan failures. It issued `pkill`/`kill` variants dozens of times, burned through 6 iteration limits and 2 handoffs, and ultimately gave up without producing results.

Post-mortem analysis of the debug session logs revealed two compounding problems.

## Root Causes

### 1. Head-only truncation buried errors at the end of output

`MAX_TOOL_RESULT = 4000` was applied as a simple head slice: `resultStr.slice(0, 4000)`. ZAP produces verbose startup logs (JVM init, add-on loading, database migration) that easily exceed 4000 characters. Five separate ZAP exec results were truncated exactly at the limit, cutting off during database migration messages. Any errors that appeared later in the output — after the verbose preamble — were silently dropped before the model ever saw them.

### 2. Model ignored stderr even when it was visible

In one un-truncated result (810 chars), the critical error was plainly present:

```
g_module_open() failed for libpixbufloader-tiff.so: libtiff.so.5: cannot open shared object file: No such file or directory
```

The model's subsequent response ignored it entirely and concluded only that "no results were found in the output directory." There was no mechanism forcing the model to re-examine stderr before forming its conclusion or retrying.

Similarly, all `pkill`/`kill` commands returned `exitCode: 1` with clear stderr — yet the model continued issuing variations of the same commands without diagnosing why process termination was failing.

## Fixes

### Fix 1 — Head + tail truncation (`agent.js`)

Replace head-only truncation with a head+tail strategy:

```js
// Before
resultStr.slice(0, MAX_TOOL_RESULT) + '\n[...truncated]'

// After
resultStr.slice(0, 2000) + `\n[...${resultStr.length - 4000} chars truncated...]\n` + resultStr.slice(-2000)
```

The first 2000 chars preserve startup context; the last 2000 chars preserve the diagnostic tail where errors typically appear. The marker in the middle shows how much was dropped. The total budget stays at 4000 chars.
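
As a self-contained sketch (the helper name `truncateToolResult` is illustrative; the constants mirror the values quoted above), the strategy looks like this:

```javascript
const MAX_TOOL_RESULT = 4000;

// Head+tail truncation: keep the first and last 2000 chars, with a marker
// in the middle recording how many chars were dropped.
function truncateToolResult(resultStr) {
  if (resultStr.length <= MAX_TOOL_RESULT) return resultStr;
  return (
    resultStr.slice(0, 2000) +
    `\n[...${resultStr.length - 4000} chars truncated...]\n` +
    resultStr.slice(-2000)
  );
}

// A ZAP-like result: long verbose preamble, error at the very end.
const verbose = 'J'.repeat(6000) + 'ERROR: libtiff.so.5 missing';
const out = truncateToolResult(verbose);
console.log(out.endsWith('ERROR: libtiff.so.5 missing')); // true — the tail survives
```

With the old head-only slice, the trailing error would have fallen entirely outside the first 4000 characters.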

### Fix 2 — Stderr nudge injection (`agent.js`)

After each iteration's tool-call loop, if any tool failed (`status === 'error'`) with non-empty `stderr`, inject a system message:

```
[System: A command failed and produced stderr output. Examine the stderr field in the tool result carefully — it likely describes the root cause of the failure. Do not retry the same command without first addressing what stderr reports.]
```

This creates an active forcing function — the model cannot continue to the next iteration without the nudge appearing in its context. The nudge is suppressed if loop detection has already fired, to avoid contradictory instructions.
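
The injection condition can be sketched as a small predicate (the helper name and the `{ status, stderr }` result shape are assumptions for illustration; the agent tracks the same condition with a per-iteration flag):

```javascript
// Returns true when the stderr nudge should be injected for this iteration:
// at least one tool result failed with non-empty stderr, and loop detection
// has not already fired (which would issue its own, conflicting instruction).
function stderrNudgeNeeded(toolResults, loopDetected) {
  const stderrError = toolResults.some(
    (r) => r.status === 'error' && typeof r.stderr === 'string' && r.stderr.trim()
  );
  return stderrError && !loopDetected;
}

console.log(stderrNudgeNeeded([{ status: 'error', stderr: 'pkill: not permitted' }], false)); // true
console.log(stderrNudgeNeeded([{ status: 'error', stderr: 'x' }], true)); // false — loop detection wins
```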

## Known Gap

The stderr nudge only fires when `toolFailed` is true (i.e., `exec` returned `status: 'error'`). Commands that return `exitCode: 0` but still emit meaningful errors to stderr (e.g., a shell script that succeeds while a subprocess inside it fails) will not trigger the nudge. Catching that case without generating noise from normal stderr usage (npm warnings, apt-get progress) requires more context than is available at this level. Documented as a known limitation.

## Files Changed

- `src/server/agent.js` — truncation strategy + stderr nudge injection

package/package.json
CHANGED
package/src/server/agent.js
CHANGED

@@ -139,6 +139,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
   });

   let toolsModified = false;
+  let stderrErrorInIteration = false;
   for (const toolCall of assistantMessage.tool_calls) {
     const toolName = toolCall.function.name;
     let toolArgs;
@@ -165,6 +166,9 @@ async function runAgentLoop(client, config, session, prepareMessages) {
     const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
     if (toolFailed) {
       consecutiveFailures++;
+      if (resultObj && resultObj.stderr) {
+        stderrErrorInIteration = true;
+      }
     } else {
       consecutiveFailures = 0;
     }
@@ -173,7 +177,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
     runToolCalls.push({ name: toolName, args: toolArgs, status: toolStatus, result: resultStr });

     const sessionContent = resultStr.length > MAX_TOOL_RESULT
-      ? resultStr.slice(0, MAX_TOOL_RESULT) + '\n[...truncated]'
+      ? resultStr.slice(0, 2000) + `\n[...${resultStr.length - 4000} chars truncated...]\n` + resultStr.slice(-2000)
       : resultStr;
     session.messages.push({
       role: 'tool',
@@ -201,6 +205,13 @@ async function runAgentLoop(client, config, session, prepareMessages) {
     });
   }

+  if (stderrErrorInIteration && !loopDetected) {
+    session.messages.push({
+      role: 'user',
+      content: '[System: A command failed and produced stderr output. Examine the stderr field in the tool result carefully — it likely describes the root cause of the failure. Do not retry the same command without first addressing what stderr reports.]',
+    });
+  }
+
   // Reload tools if any were created/updated this iteration
   if (toolsModified) {
     tools = await loadTools();
@@ -218,17 +229,24 @@ async function runAgentLoop(client, config, session, prepareMessages) {
   if (!content.trim()) {
     // Model returned no content at all — use a targeted nudge instead of the
     // standard JSON recovery chain (designed for non-empty non-JSON responses).
+    // Send with no tools so the model cannot respond with another tool call,
+    // which would leave content empty and discard any recovery text.
     try {
       const emptyNudge = [
         ...preparedMessages,
         { role: 'user', content: 'You returned an empty response. ' + FORMAT_NUDGE },
       ];
-      const nudgeResult = await callModelWithFallback(client, config, emptyNudge, toolDefs);
+      const nudgeResult = await callModelWithFallback(client, config, emptyNudge, []);
       const nudgeContent = nudgeResult.choices[0]?.message?.content || '';
+      // Persist nudge text before parsing — if JSON parse throws, content still
+      // carries the model's best-effort text so the !parsed handler can show it
+      // rather than falling back to "The model did not produce a response."
+      if (nudgeContent.trim()) {
+        content = nudgeContent;
+      }
       parsed = JSON.parse(nudgeContent);
-      content = nudgeContent;
     } catch {
-      //
+      // Fall through to !parsed handler; content may now carry the nudge text
     }
   } else {
     try {