@ducci/jarvis 1.0.32 → 1.0.34
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
# Finding 016: File Writing Corruption, Misleading Stderr Nudge, and Repeated-Error Loop
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-03-02
|
|
4
|
+
**Severity:** High — agent burned 10 iterations on the wrong diagnosis; task abandoned
|
|
5
|
+
**Status:** Fixed
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Observed Session
|
|
10
|
+
|
|
11
|
+
Session `a25fd973-3e92-4902-a96d-536ef0eb3005`. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User asked to run a ZAP security scanner script.
|
|
12
|
+
|
|
13
|
+
The script (`scan.sh`) failed immediately with `$ZAP_CMD: command not found`. The agent used all 10 iterations investigating PATH issues and gave up without solving the problem or producing a handoff checkpoint.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Root Cause 1: Shell Script Written with Escaped Dollar Signs
|
|
18
|
+
|
|
19
|
+
### What happened
|
|
20
|
+
|
|
21
|
+
`scan.sh` was written in a prior session by the agent using `exec` with a shell command (echo or heredoc). Multi-layered escaping — JS string → JSON encoding → bash variable expansion — caused every `$` in the script to be written as `\$` in the file.
|
|
22
|
+
|
|
23
|
+
In bash, `"\$VAR"` (backslash-dollar in double quotes) suppresses variable expansion and produces the literal string `$VAR`. The script ran but nothing expanded:
|
|
24
|
+
|
|
25
|
+
```
bash -x scan.sh http://testphp.vulnweb.com:

+ DOMAIN='$1'                          ← should be 'http://testphp.vulnweb.com'
+ OUTPUT_DIR='/path/results/$DOMAIN'   ← should be '/path/results/http://...'
+ '$ZAP_CMD' -cmd ...                  ← tries to run a command literally named '$ZAP_CMD'
scan.sh: line 27: $ZAP_CMD: command not found
```
|
|
33
|
+
|
|
34
|
+
Secondary confirmation: the project directory contained folders literally named `$OUTPUT_DIR` and `$RESULTS_DIR`, created by a prior run of the broken script.
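
A quick way to confirm this class of corruption is to scan the script text for backslash-escaped dollar signs. The sketch below is a heuristic only: `findEscapedDollars` is a hypothetical helper, not part of the package, and the two sample scripts are made up for illustration. A script may legitimately contain `\$`, so a hit is a prompt to look closer rather than proof of a bug.

```javascript
// Hypothetical diagnostic helper, not part of @ducci/jarvis.
// Reports the lines of a shell script that contain a backslash-escaped dollar sign.
function findEscapedDollars(scriptText) {
  const hits = [];
  scriptText.split('\n').forEach((line, index) => {
    // Skip comment lines (including the shebang); a literal "\$" there is harmless.
    if (line.trim().startsWith('#')) return;
    // In JavaScript source, '\\$' is the two-character sequence backslash + dollar.
    if (line.includes('\\$')) {
      hits.push({ line: index + 1, text: line });
    }
  });
  return hits;
}

// Illustrative samples: what was intended versus what landed on disk.
const intended = '#!/bin/bash\nDOMAIN="$1"\n"$ZAP_CMD" -cmd';
const corrupted = '#!/bin/bash\nDOMAIN="\\$1"\n"\\$ZAP_CMD" -cmd';

console.log(findEscapedDollars(intended).length);
console.log(findEscapedDollars(corrupted).map((hit) => hit.line));
```

Running a check like this on `scan.sh` would have pointed at the file content rather than at PATH.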
|
|
35
|
+
|
|
36
|
+
### Fix
|
|
37
|
+
|
|
38
|
+
Added `write_file` as a seed tool. It calls `fs.promises.writeFile` directly — content arrives as a JSON string and is written to disk verbatim. No shell is involved, so no escaping layer can corrupt dollar signs or backslashes.
|
|
39
|
+
|
|
40
|
+
Added an optional `mode` parameter (e.g. `"755"`) to make scripts executable in the same call.
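
The behavior described above can be sketched outside the agent as a standalone function. This is a simplified, synchronous stand-in written for illustration: the shipped tool uses `fs.promises`, and the name `writeFileVerbatim`, the scratch directory, and the sample script body are assumptions made for this example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Synchronous stand-in for the write_file tool body (the shipped tool is async).
function writeFileVerbatim(filePath, content, mode) {
  const targetPath = path.resolve(filePath);
  // Create missing parent directories before writing.
  fs.mkdirSync(path.dirname(targetPath), { recursive: true });
  // No shell is involved, so "$" and "\" reach the disk unchanged.
  fs.writeFileSync(targetPath, content, 'utf8');
  if (mode) {
    // "755" is an octal string, so parse it with radix 8.
    fs.chmodSync(targetPath, parseInt(mode, 8));
  }
  return { path: targetPath, bytes: Buffer.byteLength(content, 'utf8') };
}

// Scratch location and script body chosen for this example only.
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'write-file-demo-'));
const scriptPath = path.join(scratchDir, 'nested', 'scan.sh');
const scriptBody = '#!/bin/bash\nDOMAIN="$1"\necho "scanning $DOMAIN"\n';

const result = writeFileVerbatim(scriptPath, scriptBody, '755');
const roundTrip = fs.readFileSync(scriptPath, 'utf8');
console.log(roundTrip === scriptBody, result.bytes);
```

Reading the file back and comparing it to the input is the property that matters here: the bytes on disk are the bytes the model supplied.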
|
|
41
|
+
|
|
42
|
+
Updated the system prompt with a dedicated "Writing Files" section (peer-level to "exec Safety") stating: use `write_file` for all file creation — never `exec` with `echo`, `printf`, or heredoc.
|
|
43
|
+
|
|
44
|
+
**Files changed:**
|
|
45
|
+
- `src/server/tools.js` — `write_file` added to `SEED_TOOLS`
|
|
46
|
+
- `docs/system-prompt.md` — new "Writing Files" section; removed the buried `echo -e` bullet from exec Safety
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## Root Cause 2: Stderr Nudge Misdirected the Model
|
|
51
|
+
|
|
52
|
+
### What happened
|
|
53
|
+
|
|
54
|
+
After every failed tool call with stderr output, the system injected:
|
|
55
|
+
|
|
56
|
+
> *"Examine the stderr field in the tool result carefully — it likely describes the root cause of the failure."*
|
|
57
|
+
|
|
58
|
+
The stderr was `$ZAP_CMD: command not found`, which looks like a PATH problem. The real diagnostic clue was in the **stdout** of `bash -x`: `DOMAIN='$1'` (variable not expanded). By directing the model to stderr alone, the nudge steered its attention away from the evidence.
|
|
59
|
+
|
|
60
|
+
The model then spent iterations on: explicit PATH overrides, `command -v` checks, `sed -n l`, `cat -A`, `which bash` — all stderr-adjacent investigation — and never examined what the `-x` trace was actually showing.
|
|
61
|
+
|
|
62
|
+
### Fix
|
|
63
|
+
|
|
64
|
+
Changed the stderr nudge to cover both stdout and stderr, with an explicit callout to `bash -x` as a debug tool whose key output is in stdout:
|
|
65
|
+
|
|
66
|
+
```
[System: A command failed. Examine both the stdout AND stderr fields in the tool result —
stderr names the error, but stdout (especially from debug commands like bash -x) often shows
the root cause. Do not retry without first understanding what the full output is telling you.]
```
|
|
71
|
+
|
|
72
|
+
**File changed:** `src/server/agent.js` — `stderrErrorInIteration` nudge
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## Root Cause 3: No Detection for the Same Error Repeating Across Different Commands
|
|
77
|
+
|
|
78
|
+
### What happened
|
|
79
|
+
|
|
80
|
+
The existing loop detector tracks `tool + args + result` triples. Because the model varied its tool calls each iteration (different PATH strings, different diagnostic commands), this never triggered. Meanwhile `$ZAP_CMD: command not found` appeared in stderr across 5+ tool calls from entirely different commands — a strong signal that the diagnosis is wrong, not the approach.
|
|
81
|
+
|
|
82
|
+
Existing detectors that missed this:
|
|
83
|
+
- `loopTracker` — requires identical tool + args + result; missed because args varied
|
|
84
|
+
- `consecutiveFailures` — tracks back-to-back failures; partially reset when some calls succeeded
|
|
85
|
+
- `maxHandoffs` / zero-progress — apply only to checkpoint-reached runs
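
To see why an exact-match detector stays silent here, consider a minimal sketch. The real `loopTracker` implementation is not shown in this finding, so `loopKey`, `countRepeats`, and the three example commands are assumptions used only to illustrate the idea of keying on the full `tool + args + result` triple.

```javascript
// Hypothetical reconstruction of an exact-match loop key; not the package's code.
function loopKey(call) {
  return JSON.stringify([call.tool, call.args, call.result]);
}

// Returns the highest number of times any identical triple was seen.
function countRepeats(calls) {
  const seen = new Map();
  for (const call of calls) {
    const key = loopKey(call);
    seen.set(key, (seen.get(key) || 0) + 1);
  }
  return Math.max(...seen.values());
}

const sameError = 'scan.sh: line 27: $ZAP_CMD: command not found';

// Illustrative calls: the command differs each time, the error does not.
const calls = [
  { tool: 'exec', args: { cmd: './scan.sh http://testphp.vulnweb.com' }, result: { stderr: sameError } },
  { tool: 'exec', args: { cmd: 'PATH=/snap/bin:$PATH ./scan.sh http://testphp.vulnweb.com' }, result: { stderr: sameError } },
  { tool: 'exec', args: { cmd: 'bash -x scan.sh http://testphp.vulnweb.com' }, result: { stderr: sameError } },
];

const maxExactRepeats = countRepeats(calls);
const distinctErrors = new Set(calls.map((call) => call.result.stderr)).size;
console.log(maxExactRepeats, distinctErrors);
```

Every triple is unique, so the exact-match count never rises above one, even though only a single distinct error was ever produced.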
|
|
86
|
+
|
|
87
|
+
### Fix
|
|
88
|
+
|
|
89
|
+
Added `stderrTracker = new Map()` in `runAgentLoop` (parallel to `loopTracker`). After each failed tool call, the first line of stderr is extracted as the key and its count incremented.
|
|
90
|
+
|
|
91
|
+
When any stderr string reaches `CONSECUTIVE_FAILURE_THRESHOLD` (3), a targeted "step back" nudge is injected, quoting the repeating error, instead of the basic stderr nudge:
|
|
92
|
+
|
|
93
|
+
```
[System: The error "$ZAP_CMD: command not found" has now appeared 3 times across different
commands. You are repeatedly diagnosing the wrong thing. Stop, step back, and reconsider
from scratch — what is this error fundamentally telling you about the state of the system?]
```
|
|
98
|
+
|
|
99
|
+
Using only the first line of stderr (not the full message) makes the tracker robust to verbose multi-line output where later lines may contain timestamps or variable content.
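
The tracking logic can be sketched in isolation as follows. The helper names `recordStderr` and `findRepeatedStderr` are hypothetical, and the timestamped second lines in the sample stderr are invented to show why only the first line is used as the key; the threshold value of 3 is the one stated above.

```javascript
const CONSECUTIVE_FAILURE_THRESHOLD = 3; // value stated in the finding

// Counts occurrences of the first stderr line; later lines are ignored.
function recordStderr(tracker, stderrText) {
  const firstLine = String(stderrText).split('\n')[0].trim();
  if (firstLine) {
    tracker.set(firstLine, (tracker.get(firstLine) || 0) + 1);
  }
  return firstLine;
}

// Returns the first tracked line that reached the threshold, or null.
function findRepeatedStderr(tracker, threshold = CONSECUTIVE_FAILURE_THRESHOLD) {
  for (const [line, count] of tracker) {
    if (count >= threshold) return { line, count };
  }
  return null;
}

const tracker = new Map();
// The second line differs on every call (invented timestamps), the first does not.
recordStderr(tracker, 'scan.sh: line 27: $ZAP_CMD: command not found\n[12:00:01] exit 127');
recordStderr(tracker, 'scan.sh: line 27: $ZAP_CMD: command not found\n[12:00:09] exit 127');
const beforeThreshold = findRepeatedStderr(tracker);
recordStderr(tracker, 'scan.sh: line 27: $ZAP_CMD: command not found\n[12:00:17] exit 127');
const afterThreshold = findRepeatedStderr(tracker);

console.log(beforeThreshold, afterThreshold);
```

Had the full stderr text been used as the key, the varying second line would have produced three separate entries and the threshold would never have been reached.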
|
|
100
|
+
|
|
101
|
+
**File changed:** `src/server/agent.js` — `stderrTracker`, `firstStderrLine` extraction, `repeatedStderr` check replacing basic stderr nudge when threshold reached
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Why the Session Never Produced a Checkpoint
|
|
106
|
+
|
|
107
|
+
The model produced a final text response on iteration 10 (it did not time out; it gave up). Because the run ended with a final response, the `!done` condition checked after the while loop was false, so the wrap-up / checkpoint path never ran. The model exhausted its budget investigating PATH and on the last iteration concluded it could not solve the problem.
|
|
108
|
+
|
|
109
|
+
Fix 3 (repeated-error nudge) would have fired by iteration 5–6 with the specific message quoting `$ZAP_CMD: command not found`. At that point the model would have had a chance to reconsider what the error was telling it rather than continuing PATH investigation.
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## Files Changed
|
|
114
|
+
|
|
115
|
+
| File | Change |
|------|--------|
| `src/server/tools.js` | Added `write_file` seed tool |
| `docs/system-prompt.md` | Added "Writing Files" section; removed `echo -e` bullet from exec Safety |
| `src/server/agent.js` | Added `stderrTracker`; first-line stderr key extraction; repeated-error nudge overriding basic stderr nudge; broadened stderr nudge wording to cover stdout |
|
|
@@ -0,0 +1,110 @@
|
|
|
1
|
+
# Finding 017: Looping Intervention and Lossy Checkpoint
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-03-04
|
|
4
|
+
**Severity:** High — agent burned 40 iterations (4 full runs) on a structurally impossible task without escaping; concrete facts like file paths regressed between handoff runs
|
|
5
|
+
**Status:** Fixed
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Observed Session
|
|
10
|
+
|
|
11
|
+
Session from a remote server. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User requested a ZAP security scanning workflow (create project dir, write README + scan.sh, run a test scan).
|
|
12
|
+
|
|
13
|
+
| Entry | Trigger | Status | Iterations | Notes |
|-------|---------|--------|------------|-------|
| 1 | "hello" | ok | 0 | greeting |
| 2 | ZAP task | checkpoint_reached | 10 | ZAP daemon lock blocked scan |
| 3 | handoff resume | checkpoint_reached | 10 | same lock, same result |
| 4 | zero-progress | intervention_required | 0 | correctly detected |
| 5 | "Ok can you do it?" | checkpoint_reached | 10 | loop restarted, same result |
| 6 | handoff resume | checkpoint_reached | 10 | same lock again |
| 7 | zero-progress | intervention_required | 0 | detected again, session abandoned |
|
|
22
|
+
|
|
23
|
+
Total: 40 wasted iterations. Additionally, between Entry 2 and Entry 6, the agent changed the project path from `/root/.jarvis/projects/cybersecurity` to `/root/projects/cybersecurity`, causing `list_dir` to fail at the start of that run.
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## Root Cause 1: Zero-Progress Detection Resets Across User Messages
|
|
28
|
+
|
|
29
|
+
### What happened
|
|
30
|
+
|
|
31
|
+
`previousRemaining` is a local variable initialized to `null` on every call to `_runHandleChat`. Zero-progress detection requires two identical `checkpoint.remaining` values, but both must occur within the same invocation. When `intervention_required` fires and the user sends a new message, `_runHandleChat` is called fresh with `previousRemaining = null`. The detection resets.
|
|
32
|
+
|
|
33
|
+
The result: each new user message grants the agent 2 full runs (20 iterations) before zero-progress fires again — regardless of how many times the cycle has already repeated. The intervention mechanism correctly identifies "stuck" but provides no structural escape when the user replies.
|
|
34
|
+
|
|
35
|
+
### Fix
|
|
36
|
+
|
|
37
|
+
1. When zero-progress fires, persist `session.metadata.lastCheckpointRemaining = currentRemaining`.
|
|
38
|
+
2. In `_runHandleChat`, initialize `previousRemaining` from `session.metadata.lastCheckpointRemaining` (cleared after reading). Zero-progress now fires after just one run (10 iterations) on the next user message if the agent produces the same remaining.
|
|
39
|
+
3. Inject a note into `userMessageWithContext` when `lastCheckpointRemaining` was set:
|
|
40
|
+
|
|
41
|
+
```
[System: This task previously hit zero-progress and required intervention. If the user has given new direction or clarification, follow it. Otherwise, immediately explain what specific obstacle is blocking progress — do not resume the same failing approach.]
```
|
|
44
|
+
|
|
45
|
+
This gives the agent explicit guidance to respond to what the user actually asked instead of blindly resuming.
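
The effect of persisting the value across user messages can be sketched with a trimmed-down model of the session. Everything except the `lastCheckpointRemaining` field name is an assumption for illustration: `startChatTurn` and `finishRun` are hypothetical helpers, the `remaining` text is invented, and strict string equality is assumed as the comparison.

```javascript
// Hypothetical, trimmed-down session handling; only lastCheckpointRemaining is from the finding.
function startChatTurn(session) {
  const previousRemaining = session.metadata.lastCheckpointRemaining || null;
  session.metadata.lastCheckpointRemaining = null; // cleared after reading
  return previousRemaining;
}

// Compares this run's remaining text with the previous one and persists it when stuck.
function finishRun(session, previousRemaining, currentRemaining) {
  if (previousRemaining !== null && previousRemaining === currentRemaining) {
    session.metadata.lastCheckpointRemaining = currentRemaining;
    return { status: 'intervention_required', previousRemaining };
  }
  return { status: 'checkpoint_reached', previousRemaining: currentRemaining };
}

const session = { metadata: { lastCheckpointRemaining: null } };
const remaining = 'Run the test scan once the ZAP daemon lock is released.';

// First user message: two identical runs are needed before detection fires.
let previous = startChatTurn(session);
const run1 = finishRun(session, previous, remaining);
const run2 = finishRun(session, run1.previousRemaining, remaining);

// Next user message: the persisted value makes detection fire after a single run.
previous = startChatTurn(session);
const run3 = finishRun(session, previous, remaining);

console.log(run1.status, run2.status, run3.status);
```

In this model the second user message costs one run instead of two, which matches the reduction from 20 iterations to 10 described in step 2.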
|
|
46
|
+
|
|
47
|
+
**File:** `src/server/agent.js` — `_runHandleChat`
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## Root Cause 2: Checkpoint Loses Concrete Facts Between Runs
|
|
52
|
+
|
|
53
|
+
### What happened
|
|
54
|
+
|
|
55
|
+
The checkpoint schema had `progress`, `remaining`, and `failedApproaches` — all natural-language prose. Concrete facts the agent discovers during a run (file paths, binary locations, config values) are paraphrased or omitted when the agent writes the summary. On resume, the model reconstructs these facts from vague prose and sometimes gets them wrong.
|
|
56
|
+
|
|
57
|
+
In this session, the project directory `/root/.jarvis/projects/cybersecurity` was written correctly in the earlier runs but was reconstructed as `/root/projects/cybersecurity` in Entry 6, the fourth full run. The first action of that run was `list_dir` on the wrong path, which failed.
|
|
58
|
+
|
|
59
|
+
### Fix
|
|
60
|
+
|
|
61
|
+
1. Added a `state` field to `WRAP_UP_NOTE`'s checkpoint schema: a flat key-value JSON object for concrete facts confirmed by tool output (file paths created, binary locations found, config values discovered).
|
|
62
|
+
|
|
63
|
+
2. After each handoff, merge `run.checkpoint.state` into `session.metadata.checkpointState` (later runs overwrite earlier values for the same key).
|
|
64
|
+
|
|
65
|
+
3. When building `resumeContent` for the next handoff run, inject the accumulated state:
|
|
66
|
+
|
|
67
|
+
```
[System: Known facts from previous runs:
- projectDir: /root/.jarvis/projects/cybersecurity
- zapBinary: /snap/bin/zaproxy
- scanScriptPath: /root/.jarvis/projects/cybersecurity/scan.sh]
```
|
|
73
|
+
|
|
74
|
+
4. When re-entering after zero-progress (via `wasZeroProgress`), also include the state in the `userMessageWithContext` injection so facts are available immediately on the first run.
|
|
75
|
+
|
|
76
|
+
5. `session.metadata.checkpointState` is reset on each new user message (same lifecycle as `failedApproaches`) to avoid stale facts from previous tasks leaking into new ones.
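
The normalize, merge, and inject steps listed above can be sketched as three small functions. The function names are hypothetical and the sample values are taken from the example injection in step 3; the message wording follows the format shown there.

```javascript
// Treats anything that is not a plain object as "no state".
function normalizeState(state) {
  const isPlainObject = typeof state === 'object' && state !== null && !Array.isArray(state);
  return isPlainObject ? state : {};
}

// Later runs overwrite earlier values for the same key.
function mergeState(accumulated, runState) {
  return { ...normalizeState(accumulated), ...normalizeState(runState) };
}

// Renders the accumulated facts as a system note, or an empty string if there are none.
function renderKnownFacts(state) {
  const lines = Object.entries(state).map(([key, value]) => `- ${key}: ${value}`);
  if (lines.length === 0) return '';
  return `[System: Known facts from previous runs:\n${lines.join('\n')}]`;
}

let checkpointState = {};
checkpointState = mergeState(checkpointState, { projectDir: '/root/.jarvis/projects/cybersecurity' });
checkpointState = mergeState(checkpointState, 'not an object'); // malformed state is ignored
checkpointState = mergeState(checkpointState, { zapBinary: '/snap/bin/zaproxy' });

const facts = renderKnownFacts(checkpointState);
console.log(facts);
```

Because the values are carried as data rather than prose, the path injected into the next run is the exact string a tool reported, not a paraphrase of it.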
|
|
77
|
+
|
|
78
|
+
**File:** `src/server/agent.js` — `WRAP_UP_NOTE`, checkpoint normalization, `_runHandleChat`
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Interaction Between the Two Fixes
|
|
83
|
+
|
|
84
|
+
The zero-progress note injected into `userMessageWithContext` now also includes the `priorCheckpointState` (captured before the metadata reset), so the agent that receives the note also has the concrete facts it needs if the user does provide new direction. This means both fixes compound: the agent is told it was stuck AND given the facts it needs to act on whatever the user says.
|
|
85
|
+
|
|
86
|
+
---
|
|
87
|
+
|
|
88
|
+
## What Was Not Changed
|
|
89
|
+
|
|
90
|
+
- The existing exact-match loop detector (`loopTracker`) — unchanged
|
|
91
|
+
- The consecutive failure detector — unchanged
|
|
92
|
+
- The `maxHandoffs` cap — unchanged
|
|
93
|
+
- The `hasConsecutiveModelErrors` escalation — unchanged
|
|
94
|
+
- The context strip on checkpoint/intervention — unchanged
|
|
95
|
+
|
|
96
|
+
---
|
|
97
|
+
|
|
98
|
+
## Files Changed
|
|
99
|
+
|
|
100
|
+
| File | Change |
|------|--------|
| `src/server/agent.js` | `WRAP_UP_NOTE`: added `state` field to checkpoint schema |
| `src/server/agent.js` | Checkpoint normalization: normalize `cp.state` to `{}` if missing/malformed |
| `src/server/agent.js` | `_runHandleChat`: capture `wasZeroProgress`, `priorCheckpointRemaining`, `priorCheckpointState` before metadata reset |
| `src/server/agent.js` | `_runHandleChat`: inject zero-progress note + prior state into `userMessageWithContext` when applicable |
| `src/server/agent.js` | `_runHandleChat`: reset `lastCheckpointRemaining` and `checkpointState` on new user message |
| `src/server/agent.js` | `_runHandleChat`: initialize `previousRemaining` from `priorCheckpointRemaining` |
| `src/server/agent.js` | Zero-progress block: persist `session.metadata.lastCheckpointRemaining` |
| `src/server/agent.js` | Handoff accumulation: merge `run.checkpoint.state` into `session.metadata.checkpointState` |
| `src/server/agent.js` | `resumeContent`: inject accumulated `checkpointState` as known facts |
|
package/docs/system-prompt.md
CHANGED
|
@@ -54,7 +54,15 @@ The `exec` tool runs real shell commands on the server. Use it responsibly:
|
|
|
54
54
|
- **Use known paths.** Prefer `process.cwd()`, `$HOME`, or paths you already know over broad searches. Use `which <binary>` to locate executables.
|
|
55
55
|
- **Prefer targeted reads.** Use `grep`, `head`, or `tail` instead of `cat` on files you haven't seen before. Large file output is truncated anyway — a targeted command gives you better signal.
|
|
56
56
|
- **Avoid commands with unbounded runtime.** If a command could run indefinitely or scan an unknown-size tree, scope it first.
|
|
57
|
-
|
|
57
|
+
|
|
58
|
+
## Writing Files
|
|
59
|
+
|
|
60
|
+
Use the `write_file` tool to create or overwrite any file. Never use `exec` with `echo`, `printf`, or heredoc to write files.
|
|
61
|
+
|
|
62
|
+
Shell escaping through `exec` silently corrupts file content: dollar signs become `\$`, backslashes double up, and the resulting file looks correct when printed but is broken at runtime (variables never expand, scripts fail with "command not found"). `write_file` bypasses all shell interpretation — content arrives as a JSON string and lands in the file exactly as written.
|
|
63
|
+
|
|
64
|
+
- For shell scripts: pass `mode: "755"` to make the file executable in the same call.
|
|
65
|
+
- For any other file: omit `mode` or use `"644"`.
|
|
58
66
|
|
|
59
67
|
## Execution Timeouts
|
|
60
68
|
|
package/package.json
CHANGED
package/src/server/agent.js
CHANGED
|
@@ -20,11 +20,12 @@ Respond with your normal JSON, but add a checkpoint field:
|
|
|
20
20
|
"checkpoint": {
|
|
21
21
|
"progress": "What has been fully completed — only include items confirmed by tool output (e.g., successful exec with exit code 0, or verified by ls/cat). Do not report planned steps as completed.",
|
|
22
22
|
"remaining": "What still needs to be done to finish the task — as a plain text string, never an array or object.",
|
|
23
|
-
"failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."]
|
|
23
|
+
"failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."],
|
|
24
|
+
"state": {"factKey": "factValue — concrete facts confirmed by tool output this run: file paths created, binary locations found, config values discovered. Use short stable keys, e.g. projectDir, zapBinary, scanScriptPath. Omit or use {} if nothing concrete was discovered."}
|
|
24
25
|
}
|
|
25
26
|
}
|
|
26
27
|
|
|
27
|
-
The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed. remaining must be a plain text string. failedApproaches must be a JSON array of strings.]`;
|
|
28
|
+
The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed. state is injected verbatim so the next run does not need to rediscover file paths or binary locations. remaining must be a plain text string. failedApproaches must be a JSON array of strings. state must be a flat JSON object.]`;
|
|
28
29
|
|
|
29
30
|
// Serializes concurrent requests for the same session. Maps sessionId to the
|
|
30
31
|
// tail of the current request chain (a Promise that resolves when the last
|
|
@@ -103,6 +104,7 @@ async function runAgentLoop(client, config, session, prepareMessages) {
|
|
|
103
104
|
let logSummary = '';
|
|
104
105
|
let status = 'ok';
|
|
105
106
|
let consecutiveFailures = 0;
|
|
107
|
+
const stderrTracker = new Map();
|
|
106
108
|
|
|
107
109
|
while (iteration < config.maxIterations) {
|
|
108
110
|
iteration++;
|
|
@@ -199,6 +201,10 @@ async function runAgentLoop(client, config, session, prepareMessages) {
|
|
|
199
201
|
consecutiveFailures++;
|
|
200
202
|
if (resultObj && resultObj.stderr) {
|
|
201
203
|
stderrErrorInIteration = true;
|
|
204
|
+
const firstStderrLine = resultObj.stderr.split('\n')[0].trim();
|
|
205
|
+
if (firstStderrLine) {
|
|
206
|
+
stderrTracker.set(firstStderrLine, (stderrTracker.get(firstStderrLine) || 0) + 1);
|
|
207
|
+
}
|
|
202
208
|
}
|
|
203
209
|
} else {
|
|
204
210
|
consecutiveFailures = 0;
|
|
@@ -236,10 +242,16 @@ async function runAgentLoop(client, config, session, prepareMessages) {
|
|
|
236
242
|
});
|
|
237
243
|
}
|
|
238
244
|
|
|
239
|
-
|
|
245
|
+
const repeatedStderr = [...stderrTracker.entries()].find(([, count]) => count >= CONSECUTIVE_FAILURE_THRESHOLD);
|
|
246
|
+
if (repeatedStderr && !loopDetected) {
|
|
240
247
|
session.messages.push({
|
|
241
248
|
role: 'user',
|
|
242
|
-
content:
|
|
249
|
+
content: `[System: The error "${repeatedStderr[0].slice(0, 200)}" has now appeared ${repeatedStderr[1]} times across different commands. You are repeatedly diagnosing the wrong thing. Stop, step back, and reconsider from scratch — what is this error fundamentally telling you about the state of the system?]`,
|
|
250
|
+
});
|
|
251
|
+
} else if (stderrErrorInIteration && !loopDetected) {
|
|
252
|
+
session.messages.push({
|
|
253
|
+
role: 'user',
|
|
254
|
+
content: '[System: A command failed. Examine both the stdout AND stderr fields in the tool result — stderr names the error, but stdout (especially from debug commands like bash -x) often shows the root cause. Do not retry without first understanding what the full output is telling you.]',
|
|
243
255
|
});
|
|
244
256
|
}
|
|
245
257
|
|
|
@@ -403,6 +415,9 @@ async function runAgentLoop(client, config, session, prepareMessages) {
|
|
|
403
415
|
typeof item === 'string' ? item : JSON.stringify(item)
|
|
404
416
|
);
|
|
405
417
|
}
|
|
418
|
+
if (typeof cp.state !== 'object' || cp.state === null || Array.isArray(cp.state)) {
|
|
419
|
+
cp.state = {};
|
|
420
|
+
}
|
|
406
421
|
return {
|
|
407
422
|
iteration,
|
|
408
423
|
response,
|
|
@@ -469,6 +484,11 @@ async function _runHandleChat(config, sessionId, userMessage) {
|
|
|
469
484
|
session = createSession(systemPromptTemplate);
|
|
470
485
|
}
|
|
471
486
|
|
|
487
|
+
// Capture persisted state BEFORE resetting metadata so we can inject it below.
|
|
488
|
+
const wasZeroProgress = !!session.metadata.lastCheckpointRemaining;
|
|
489
|
+
const priorCheckpointRemaining = session.metadata.lastCheckpointRemaining || null;
|
|
490
|
+
const priorCheckpointState = session.metadata.checkpointState || {};
|
|
491
|
+
|
|
472
492
|
// Preserve accumulated failedApproaches in conversation history before resetting
|
|
473
493
|
// so the model retains knowledge of what failed in the previous batch of handoff runs.
|
|
474
494
|
let userMessageWithContext = userMessage;
|
|
@@ -476,10 +496,24 @@ async function _runHandleChat(config, sessionId, userMessage) {
|
|
|
476
496
|
userMessageWithContext += `\n\n[System: The following approaches were tried and failed in previous runs — consider them exhausted:\n${session.metadata.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
|
|
477
497
|
}
|
|
478
498
|
|
|
499
|
+
// If this message follows a zero-progress intervention, tell the agent explicitly so
|
|
500
|
+
// it responds to the user's input instead of blindly resuming the same failing approach.
|
|
501
|
+
if (wasZeroProgress) {
|
|
502
|
+
const stateLines = Object.entries(priorCheckpointState).map(([k, v]) => `- ${k}: ${v}`);
|
|
503
|
+
let note = `\n\n[System: This task previously hit zero-progress and required intervention. If the user has given new direction or clarification, follow it. Otherwise, immediately explain what specific obstacle is blocking progress — do not resume the same failing approach.`;
|
|
504
|
+
if (stateLines.length > 0) {
|
|
505
|
+
note += `\n\nKnown facts from previous run:\n${stateLines.join('\n')}`;
|
|
506
|
+
}
|
|
507
|
+
note += `]`;
|
|
508
|
+
userMessageWithContext += note;
|
|
509
|
+
}
|
|
510
|
+
|
|
479
511
|
// Append user message and reset handoff state
|
|
480
512
|
session.messages.push({ role: 'user', content: userMessageWithContext });
|
|
481
513
|
session.metadata.handoffCount = 0;
|
|
482
514
|
session.metadata.failedApproaches = [];
|
|
515
|
+
session.metadata.lastCheckpointRemaining = null;
|
|
516
|
+
session.metadata.checkpointState = {};
|
|
483
517
|
|
|
484
518
|
// Resolves {{user_info}} in system prompt at runtime (never persisted)
|
|
485
519
|
function prepareMessages(messages) {
|
|
@@ -495,8 +529,11 @@ async function _runHandleChat(config, sessionId, userMessage) {
|
|
|
495
529
|
let finalResponse = '';
|
|
496
530
|
let finalLogSummary = '';
|
|
497
531
|
let finalStatus = 'ok';
|
|
498
|
-
// Tracks checkpoint.remaining from the previous handoff run to detect zero progress
|
|
499
|
-
|
|
532
|
+
// Tracks checkpoint.remaining from the previous handoff run to detect zero progress.
|
|
533
|
+
// Initialized from persisted metadata so detection works across user messages too —
|
|
534
|
+
// if the agent was stuck before and produces the same remaining again on the next
|
|
535
|
+
// user turn, zero-progress fires after just one run instead of two.
|
|
536
|
+
let previousRemaining = priorCheckpointRemaining;
|
|
500
537
|
|
|
501
538
|
try {
|
|
502
539
|
// Handoff loop
|
|
@@ -579,6 +616,15 @@ async function _runHandleChat(config, sessionId, userMessage) {
|
|
|
579
616
|
session.metadata.failedApproaches.push(...run.checkpoint.failedApproaches);
|
|
580
617
|
}
|
|
581
618
|
|
|
619
|
+
// Merge concrete facts from this run's checkpoint.state into session metadata.
|
|
620
|
+
// Later runs overwrite earlier values for the same key (newer discoveries win).
|
|
621
|
+
if (run.checkpoint.state && Object.keys(run.checkpoint.state).length > 0) {
|
|
622
|
+
session.metadata.checkpointState = {
|
|
623
|
+
...(session.metadata.checkpointState || {}),
|
|
624
|
+
...run.checkpoint.state,
|
|
625
|
+
};
|
|
626
|
+
}
|
|
627
|
+
|
|
582
628
|
// Zero-progress detection: if checkpoint.remaining is identical to the previous
|
|
583
629
|
// handoff's remaining, the agent completed a full run without making any progress.
|
|
584
630
|
// Stop immediately rather than burning more iterations on a stuck task.
|
|
@@ -588,6 +634,10 @@ async function _runHandleChat(config, sessionId, userMessage) {
|
|
|
588
634
|
finalLogSummary = 'Zero progress detected: task state unchanged after a full run. Human intervention required.';
|
|
589
635
|
finalStatus = 'intervention_required';
|
|
590
636
|
|
|
637
|
+
// Persist so that the next user message initializes previousRemaining from this
|
|
638
|
+
// value — zero-progress will then fire after just one run instead of two.
|
|
639
|
+
session.metadata.lastCheckpointRemaining = currentRemaining;
|
|
640
|
+
|
|
591
641
|
await appendLog(sessionId, {
|
|
592
642
|
iteration: 0,
|
|
593
643
|
model: config.selectedModel,
|
|
@@ -631,13 +681,17 @@ async function _runHandleChat(config, sessionId, userMessage) {
|
|
|
631
681
|
|
|
632
682
|
// Resume with checkpoint.remaining as new prompt.
|
|
633
683
|
// Guard against null/undefined in case the model omitted the field.
|
|
634
|
-
//
|
|
635
|
-
//
|
|
684
|
+
// Inject the full accumulated failedApproaches and concrete state so the agent
|
|
685
|
+
// has complete memory of what failed and what was already discovered.
|
|
636
686
|
let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
|
|
637
687
|
const allFailedApproaches = session.metadata.failedApproaches || [];
|
|
638
688
|
if (allFailedApproaches.length > 0) {
|
|
639
689
|
resumeContent += `\n\n[System: The following approaches were tried and failed in previous runs — do not repeat them:\n${allFailedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
|
|
640
690
|
}
|
|
691
|
+
const stateToInject = session.metadata.checkpointState || {};
|
|
692
|
+
if (Object.keys(stateToInject).length > 0) {
|
|
693
|
+
resumeContent += `\n\n[System: Known facts from previous runs:\n${Object.entries(stateToInject).map(([k, v]) => `- ${k}: ${v}`).join('\n')}]`;
|
|
694
|
+
}
|
|
641
695
|
session.messages.push({ role: 'user', content: resumeContent });
|
|
642
696
|
}
|
|
643
697
|
} catch (e) {
|
package/src/server/tools.js
CHANGED
|
@@ -347,6 +347,43 @@ const SEED_TOOLS = {
|
|
|
347
347
|
}
|
|
348
348
|
`,
|
|
349
349
|
},
|
|
350
|
+
write_file: {
|
|
351
|
+
definition: {
|
|
352
|
+
type: 'function',
|
|
353
|
+
function: {
|
|
354
|
+
name: 'write_file',
|
|
355
|
+
description: 'Write content directly to a file on the filesystem, bypassing all shell escaping. Use this to create or overwrite any file — shell scripts, config files, code, etc. Content is written exactly as provided: dollar signs, backslashes, and special characters are preserved without modification. Always prefer this over exec+echo, exec+printf, or exec+heredoc for writing files. For shell scripts, pass mode: "755" to make the file executable. Example: write_file({ path: "/path/to/scan.sh", content: "#!/bin/bash\\nDOMAIN=$1\\n...", mode: "755" })',
|
|
356
|
+
parameters: {
|
|
357
|
+
type: 'object',
|
|
358
|
+
properties: {
|
|
359
|
+
path: {
|
|
360
|
+
type: 'string',
|
|
361
|
+
description: 'Absolute or relative path to the file to write. Parent directories are created automatically.',
|
|
362
|
+
},
|
|
363
|
+
content: {
|
|
364
|
+
type: 'string',
|
|
365
|
+
description: 'The content to write to the file. Written as-is — no shell interpretation occurs.',
|
|
366
|
+
},
|
|
367
|
+
mode: {
|
|
368
|
+
type: 'string',
|
|
369
|
+
description: 'Optional Unix file mode in octal string form, e.g. "755" for executable scripts, "644" for regular files. Defaults to "644".',
|
|
370
|
+
},
|
|
371
|
+
},
|
|
372
|
+
required: ['path', 'content'],
|
|
373
|
+
},
|
|
374
|
+
},
|
|
375
|
+
},
|
|
376
|
+
code: `
|
|
377
|
+
const targetPath = path.resolve(args.path);
|
|
378
|
+
await fs.promises.mkdir(path.dirname(targetPath), { recursive: true });
|
|
379
|
+
await fs.promises.writeFile(targetPath, args.content, 'utf8');
|
|
380
|
+
if (args.mode) {
|
|
381
|
+
await fs.promises.chmod(targetPath, parseInt(args.mode, 8));
|
|
382
|
+
}
|
|
383
|
+
const bytes = Buffer.byteLength(args.content, 'utf8');
|
|
384
|
+
return { status: 'ok', path: targetPath, bytes, mode: args.mode || '644' };
|
|
385
|
+
`,
|
|
386
|
+
},
|
|
350
387
|
get_recent_sessions: {
|
|
351
388
|
definition: {
|
|
352
389
|
type: 'function',
|