@ducci/jarvis 1.0.33 → 1.0.34

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,110 @@
+ # Finding 017: Looping Intervention and Lossy Checkpoint
+
+ **Date:** 2026-03-04
+ **Severity:** High. The agent burned 40 iterations (4 full runs) on a structurally impossible task without escaping, and concrete facts such as file paths regressed between handoff runs.
+ **Status:** Fixed
+
+ ---
+
+ ## Observed Session
+
+ Session recorded on a remote server. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. The user requested a ZAP security scanning workflow: create a project directory, write a README and `scan.sh`, and run a test scan.
+
+ | Entry | Trigger | Status | Iterations | Notes |
+ |-------|---------|--------|------------|-------|
+ | 1 | "hello" | ok | 0 | greeting |
+ | 2 | ZAP task | checkpoint_reached | 10 | ZAP daemon lock blocked scan |
+ | 3 | handoff resume | checkpoint_reached | 10 | same lock, same result |
+ | 4 | zero-progress | intervention_required | 0 | correctly detected |
+ | 5 | "Ok can you do it?" | checkpoint_reached | 10 | loop restarted, same result |
+ | 6 | handoff resume | checkpoint_reached | 10 | same lock again |
+ | 7 | zero-progress | intervention_required | 0 | detected again, session abandoned |
+
+ Total: 40 wasted iterations (Entries 2, 3, 5, and 6). In addition, between Entry 2 and Entry 6 the agent changed the project path from `/root/.jarvis/projects/cybersecurity` to `/root/projects/cybersecurity`, which caused `list_dir` to fail at the start of the Entry 6 run.
+
+ ---
+
+ ## Root Cause 1: Zero-Progress Detection Resets Across User Messages
+
+ ### What happened
+
+ `previousRemaining` is a local variable initialized to `null` on every call to `_runHandleChat`. Zero-progress detection fires only when two consecutive handoff runs produce identical `checkpoint.remaining` values, and both runs must occur within the same invocation. When `intervention_required` fires and the user sends a new message, `_runHandleChat` is called fresh with `previousRemaining = null`, so the detection resets.
+
+ The result: each new user message grants the agent 2 full runs (20 iterations) before zero-progress fires again, regardless of how many times the cycle has already repeated. The intervention mechanism correctly identifies that the agent is stuck but provides no structural escape when the user replies.
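The reset can be reproduced with a small simulation. This is a sketch, not the agent.js code. `stuckRun` is a hypothetical stand-in for a handoff run that ends with an unchanged `checkpoint.remaining`. The comparison assumes detection triggers when a non-null previous value equals the current one. Because `previousRemaining` lives only inside the function, every call starts from `null` and needs two runs before detection fires.

```javascript
// Hypothetical stand-in for a handoff run that makes no progress:
// it always reports the same checkpoint.remaining.
function stuckRun() {
  return { remaining: 'Run the ZAP test scan' };
}

// Pre-fix behavior (sketch): previousRemaining is a local reset to null on
// every call. Returns how many runs were spent before zero-progress fired.
function handleChatBeforeFix(maxHandoffs = 5) {
  let previousRemaining = null; // reset on every user message
  for (let run = 1; run <= maxHandoffs; run++) {
    const currentRemaining = stuckRun().remaining;
    if (previousRemaining !== null && currentRemaining === previousRemaining) {
      return run; // zero-progress detected after `run` runs
    }
    previousRemaining = currentRemaining;
  }
  return maxHandoffs;
}

// Three user messages in a row on the same stuck task.
const runsPerMessage = [handleChatBeforeFix(), handleChatBeforeFix(), handleChatBeforeFix()];
console.log(runsPerMessage); // [ 2, 2, 2 ]
```

At 10 iterations per run, each message costs the 20 iterations described above, no matter how many messages came before.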
+
+ ### Fix
+
+ 1. When zero-progress fires, persist `session.metadata.lastCheckpointRemaining = currentRemaining`.
+ 2. In `_runHandleChat`, initialize `previousRemaining` from `session.metadata.lastCheckpointRemaining`. The metadata field is cleared once the new user message has been appended. If the agent produces the same `remaining` again, zero-progress now fires after just one run (10 iterations) on the next user message.
+ 3. Inject a note into `userMessageWithContext` when `lastCheckpointRemaining` was set:
+
+ ```
+ [System: This task previously hit zero-progress and required intervention. If the user has given new direction or clarification, follow it. Otherwise, immediately explain what specific obstacle is blocking progress — do not resume the same failing approach.]
+ ```
+
+ This gives the agent explicit guidance to respond to what the user actually asked instead of blindly resuming the failing approach.
+
+ **File:** `src/server/agent.js` — `_runHandleChat`
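A minimal sketch of the fixed flow follows. It assumes a plain-object session with a `metadata` bag; the real session object in agent.js may carry more fields. `stuckRun` is a hypothetical stand-in for a run that makes no progress. Persisting `lastCheckpointRemaining` when zero-progress fires, and seeding `previousRemaining` from it on the next call, makes the first message cost two runs and every follow-up cost one.

```javascript
// Hypothetical stand-in for a handoff run that makes no progress.
function stuckRun() {
  return { remaining: 'Run the ZAP test scan' };
}

// Post-fix behavior (sketch): detection state survives across calls via
// session.metadata. Returns the number of runs before zero-progress fired.
function handleChatAfterFix(session, maxHandoffs = 5) {
  // Capture the persisted value, then clear it for the new user message.
  const priorCheckpointRemaining = session.metadata.lastCheckpointRemaining || null;
  session.metadata.lastCheckpointRemaining = null;

  // Seed detection from the previous invocation instead of null.
  let previousRemaining = priorCheckpointRemaining;
  for (let run = 1; run <= maxHandoffs; run++) {
    const currentRemaining = stuckRun().remaining;
    if (previousRemaining !== null && currentRemaining === previousRemaining) {
      // Persist so the next user message starts already primed.
      session.metadata.lastCheckpointRemaining = currentRemaining;
      return run;
    }
    previousRemaining = currentRemaining;
  }
  return maxHandoffs;
}

const session = { metadata: {} };
const runsPerMessage = [
  handleChatAfterFix(session),
  handleChatAfterFix(session),
  handleChatAfterFix(session),
];
console.log(runsPerMessage); // [ 2, 1, 1 ]
```

The first message still needs two runs because nothing has been persisted yet. After that, each user reply on the same stuck task costs a single run (10 iterations).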
+
+ ---
+
+ ## Root Cause 2: Checkpoint Loses Concrete Facts Between Runs
+
+ ### What happened
+
+ The checkpoint schema had `progress`, `remaining`, and `failedApproaches`, all written as natural-language prose. Concrete facts the agent discovers during a run (file paths, binary locations, config values) are paraphrased or omitted when the agent writes the summary. On resume, the model reconstructs these facts from vague prose and sometimes gets them wrong.
+
+ In this session, the project directory `/root/.jarvis/projects/cybersecurity` was referenced correctly in the first three runs (Entries 2, 3, and 5) but reconstructed as `/root/projects/cybersecurity` in the fourth run (Entry 6). The first action of that run was `list_dir` on the wrong path, which failed.
+
+ ### Fix
+
+ 1. Added a `state` field to the checkpoint schema in `WRAP_UP_NOTE`: a flat key-value JSON object for concrete facts confirmed by tool output (file paths created, binary locations found, config values discovered).
+
+ 2. After each handoff, merge `run.checkpoint.state` into `session.metadata.checkpointState`. Later runs overwrite earlier values for the same key.
+
+ 3. When building `resumeContent` for the next handoff run, inject the accumulated state:
+
+ ```
+ [System: Known facts from previous runs:
+ - projectDir: /root/.jarvis/projects/cybersecurity
+ - zapBinary: /snap/bin/zaproxy
+ - scanScriptPath: /root/.jarvis/projects/cybersecurity/scan.sh]
+ ```
+
+ 4. When re-entering after zero-progress (detected via `wasZeroProgress`), also include the state in the `userMessageWithContext` injection so the facts are available immediately on the first run.
+
+ 5. `session.metadata.checkpointState` is reset on each new user message (the same lifecycle as `failedApproaches`) so that stale facts from previous tasks do not leak into new ones.
+
+ **File:** `src/server/agent.js` — `WRAP_UP_NOTE`, checkpoint normalization, `_runHandleChat`
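The merge in step 2 relies on object spread order. Keys from the newer run are spread last, so they win on collision, and a run with an empty `state` changes nothing. `mergeCheckpointState` is a hypothetical helper name, since agent.js performs the merge inline. The `/usr/bin/zaproxy` value is made up to show an overwrite.

```javascript
// Hypothetical helper mirroring the inline merge in agent.js.
function mergeCheckpointState(accumulated, runState) {
  // A run that confirmed nothing (missing or empty state) leaves earlier facts untouched.
  if (!runState || Object.keys(runState).length === 0) return accumulated;
  // Spread order puts the newer run's values last, so they win on key collisions.
  return { ...(accumulated || {}), ...runState };
}

let checkpointState = {};
checkpointState = mergeCheckpointState(checkpointState, {
  projectDir: '/root/.jarvis/projects/cybersecurity',
});
checkpointState = mergeCheckpointState(checkpointState, {}); // nothing new this run
checkpointState = mergeCheckpointState(checkpointState, { zapBinary: '/usr/bin/zaproxy' }); // illustrative
checkpointState = mergeCheckpointState(checkpointState, { zapBinary: '/snap/bin/zaproxy' }); // newer value wins

console.log(checkpointState.projectDir); // /root/.jarvis/projects/cybersecurity
console.log(checkpointState.zapBinary); // /snap/bin/zaproxy
```

Newest-wins has one consequence: a later run that misreports a fact would overwrite a correct value from an earlier run. The merge accepts that risk so that genuine corrections can replace stale values.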
+
+ ---
+
+ ## Interaction Between the Two Fixes
+
+ The zero-progress note injected into `userMessageWithContext` now also includes `priorCheckpointState` (captured before the metadata reset). The agent that receives the note therefore also has the concrete facts it needs if the user does provide new direction. The two fixes compound: the agent is told it was stuck AND is given the facts it needs to act on whatever the user says.
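The ordering matters here. The persisted values must be read into locals before metadata is reset; otherwise the note would see an empty state. The sketch below assumes a plain-object session and uses hypothetical helper names (`buildZeroProgressNote`, `prepareUserMessage`). Its note text is shortened from the one in agent.js.

```javascript
// Builds the zero-progress note from previously captured state (text abbreviated).
function buildZeroProgressNote(priorCheckpointState) {
  const stateLines = Object.entries(priorCheckpointState).map(([k, v]) => `- ${k}: ${v}`);
  let note = '\n\n[System: This task previously hit zero-progress and required intervention.';
  if (stateLines.length > 0) {
    note += `\n\nKnown facts from previous run:\n${stateLines.join('\n')}`;
  }
  return note + ']';
}

// Sketch of the capture -> build -> reset sequence for a new user message.
function prepareUserMessage(session, userMessage) {
  // 1. Capture persisted values before anything is reset.
  const wasZeroProgress = !!session.metadata.lastCheckpointRemaining;
  const priorCheckpointState = session.metadata.checkpointState || {};

  // 2. Build the outgoing message from the captured values.
  let content = userMessage;
  if (wasZeroProgress) content += buildZeroProgressNote(priorCheckpointState);

  // 3. Reset per-task metadata for the new user turn.
  session.metadata.lastCheckpointRemaining = null;
  session.metadata.checkpointState = {};
  return content;
}

const session = {
  metadata: {
    lastCheckpointRemaining: 'Run the ZAP test scan',
    checkpointState: { projectDir: '/root/.jarvis/projects/cybersecurity' },
  },
};
const content = prepareUserMessage(session, 'Ok can you do it?');
console.log(content.includes('- projectDir: /root/.jarvis/projects/cybersecurity')); // true
console.log(Object.keys(session.metadata.checkpointState).length); // 0
```

In both this sketch and agent.js, the note is built before the reset. The reset also assigns a fresh `{}` rather than mutating the old object, so the captured reference would remain intact even if it were read later.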
+
+ ---
+
+ ## What Was Not Changed
+
+ - The existing exact-match loop detector (`loopTracker`) — unchanged
+ - The consecutive failure detector — unchanged
+ - The `maxHandoffs` cap — unchanged
+ - The `hasConsecutiveModelErrors` escalation — unchanged
+ - The context strip on checkpoint/intervention — unchanged
+
+ ---
+
+ ## Files Changed
+
+ | File | Change |
+ |------|--------|
+ | `src/server/agent.js` | `WRAP_UP_NOTE` — added `state` field to checkpoint schema |
+ | `src/server/agent.js` | Checkpoint normalization — normalize `cp.state` to `{}` if missing/malformed |
+ | `src/server/agent.js` | `_runHandleChat` — capture `wasZeroProgress`, `priorCheckpointRemaining`, `priorCheckpointState` before metadata reset |
+ | `src/server/agent.js` | `_runHandleChat` — inject zero-progress note + prior state into `userMessageWithContext` when applicable |
+ | `src/server/agent.js` | `_runHandleChat` — reset `lastCheckpointRemaining` and `checkpointState` on new user message |
+ | `src/server/agent.js` | `_runHandleChat` — initialize `previousRemaining` from `priorCheckpointRemaining` |
+ | `src/server/agent.js` | Zero-progress block — persist `session.metadata.lastCheckpointRemaining` |
+ | `src/server/agent.js` | Handoff accumulation — merge `run.checkpoint.state` into `session.metadata.checkpointState` |
+ | `src/server/agent.js` | `resumeContent` — inject accumulated `checkpointState` as known facts |
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@ducci/jarvis",
- "version": "1.0.33",
+ "version": "1.0.34",
  "description": "A fully automated agent system that lives on a server.",
  "main": "./src/index.js",
  "type": "module",
@@ -20,11 +20,12 @@ Respond with your normal JSON, but add a checkpoint field:
  "checkpoint": {
  "progress": "What has been fully completed — only include items confirmed by tool output (e.g., successful exec with exit code 0, or verified by ls/cat). Do not report planned steps as completed.",
  "remaining": "What still needs to be done to finish the task — as a plain text string, never an array or object.",
- "failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."]
+ "failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."],
+ "state": {"factKey": "factValue — concrete facts confirmed by tool output this run: file paths created, binary locations found, config values discovered. Use short stable keys, e.g. projectDir, zapBinary, scanScriptPath. Omit or use {} if nothing concrete was discovered."}
  }
  }
 
- The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed. remaining must be a plain text string. failedApproaches must be a JSON array of strings.]`;
+ The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed. state is injected verbatim so the next run does not need to rediscover file paths or binary locations. remaining must be a plain text string. failedApproaches must be a JSON array of strings. state must be a flat JSON object.]`;
 
  // Serializes concurrent requests for the same session. Maps sessionId to the
  // tail of the current request chain (a Promise that resolves when the last
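The updated prompt asks for `remaining` as a string, `failedApproaches` as an array of strings, and `state` as a flat object. agent.js only normalizes `cp.state` to `{}` when it is missing or not a plain object. The stricter `checkpointProblems` validator below is a hypothetical illustration of the full contract, and the sample checkpoint values are made up for the example.

```javascript
// Hypothetical validator for a checkpoint shaped like the WRAP_UP_NOTE schema.
// Returns a list of human-readable problems (empty when the checkpoint conforms).
function checkpointProblems(cp) {
  const problems = [];
  if (typeof cp.remaining !== 'string') problems.push('remaining must be a string');
  if (!Array.isArray(cp.failedApproaches) || !cp.failedApproaches.every((a) => typeof a === 'string')) {
    problems.push('failedApproaches must be an array of strings');
  }
  const s = cp.state;
  if (s !== undefined) {
    if (typeof s !== 'object' || s === null || Array.isArray(s)) {
      problems.push('state must be an object');
    } else if (Object.values(s).some((v) => v !== null && typeof v === 'object')) {
      // "Flat" here means no nested objects or arrays as values.
      problems.push('state must be flat (no nested objects or arrays)');
    }
  }
  return problems;
}

// Illustrative checkpoint as a model might emit it.
const checkpoint = JSON.parse(`{
  "progress": "Created the project directory with README.md and scan.sh",
  "remaining": "Run a test scan once the ZAP daemon lock is released",
  "failedApproaches": ["starting a scan while the ZAP daemon lock was held"],
  "state": {
    "projectDir": "/root/.jarvis/projects/cybersecurity",
    "scanScriptPath": "/root/.jarvis/projects/cybersecurity/scan.sh"
  }
}`);
console.log(checkpointProblems(checkpoint)); // []

const bad = checkpointProblems({ remaining: ['a'], failedApproaches: [], state: { paths: ['x'] } });
console.log(bad.length); // 2
```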
@@ -414,6 +415,9 @@ async function runAgentLoop(client, config, session, prepareMessages) {
  typeof item === 'string' ? item : JSON.stringify(item)
  );
  }
+ if (typeof cp.state !== 'object' || cp.state === null || Array.isArray(cp.state)) {
+ cp.state = {};
+ }
  return {
  iteration,
  response,
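The `cp.state` normalization can be exercised on its own. `normalizeState` and `formatFacts` are hypothetical wrappers around that check and around the `- key: value` formatting used when facts are injected. The check rejects `null`, arrays, and primitives. It does not enforce flatness, so a nested value passes through and is rendered by the template literal as `[object Object]`.

```javascript
// Stand-alone copy of the cp.state check, wrapped so it can be tested in isolation.
function normalizeState(cp) {
  if (typeof cp.state !== 'object' || cp.state === null || Array.isArray(cp.state)) {
    cp.state = {};
  }
  return cp;
}

// Same "- key: value" formatting used for injected known facts.
const formatFacts = (state) => Object.entries(state).map(([k, v]) => `- ${k}: ${v}`);

console.log(normalizeState({ state: null }).state); // {}
console.log(normalizeState({ state: ['a', 'b'] }).state); // {}
console.log(normalizeState({ state: 'projectDir=/tmp' }).state); // {}
console.log(normalizeState({}).state); // {}

// Any non-array object passes, so nested values reach the template literal as-is:
console.log(formatFacts({ ports: { http: 8080 } })); // [ '- ports: [object Object]' ]
```

If nested values turned out to matter, one possible hardening would be to stringify non-primitive values with `JSON.stringify` before formatting. This release does not do that.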
@@ -480,6 +484,11 @@ async function _runHandleChat(config, sessionId, userMessage) {
  session = createSession(systemPromptTemplate);
  }
 
+ // Capture persisted state BEFORE resetting metadata so we can inject it below.
+ const wasZeroProgress = !!session.metadata.lastCheckpointRemaining;
+ const priorCheckpointRemaining = session.metadata.lastCheckpointRemaining || null;
+ const priorCheckpointState = session.metadata.checkpointState || {};
+
  // Preserve accumulated failedApproaches in conversation history before resetting
  // so the model retains knowledge of what failed in the previous batch of handoff runs.
  let userMessageWithContext = userMessage;
@@ -487,10 +496,24 @@ async function _runHandleChat(config, sessionId, userMessage) {
  userMessageWithContext += `\n\n[System: The following approaches were tried and failed in previous runs — consider them exhausted:\n${session.metadata.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
  }
 
+ // If this message follows a zero-progress intervention, tell the agent explicitly so
+ // it responds to the user's input instead of blindly resuming the same failing approach.
+ if (wasZeroProgress) {
+ const stateLines = Object.entries(priorCheckpointState).map(([k, v]) => `- ${k}: ${v}`);
+ let note = `\n\n[System: This task previously hit zero-progress and required intervention. If the user has given new direction or clarification, follow it. Otherwise, immediately explain what specific obstacle is blocking progress — do not resume the same failing approach.`;
+ if (stateLines.length > 0) {
+ note += `\n\nKnown facts from previous run:\n${stateLines.join('\n')}`;
+ }
+ note += `]`;
+ userMessageWithContext += note;
+ }
+
  // Append user message and reset handoff state
  session.messages.push({ role: 'user', content: userMessageWithContext });
  session.metadata.handoffCount = 0;
  session.metadata.failedApproaches = [];
+ session.metadata.lastCheckpointRemaining = null;
+ session.metadata.checkpointState = {};
 
  // Resolves {{user_info}} in system prompt at runtime (never persisted)
  function prepareMessages(messages) {
@@ -506,8 +529,11 @@ async function _runHandleChat(config, sessionId, userMessage) {
  let finalResponse = '';
  let finalLogSummary = '';
  let finalStatus = 'ok';
- // Tracks checkpoint.remaining from the previous handoff run to detect zero progress
- let previousRemaining = null;
+ // Tracks checkpoint.remaining from the previous handoff run to detect zero progress.
+ // Initialized from persisted metadata so detection works across user messages too —
+ // if the agent was stuck before and produces the same remaining again on the next
+ // user turn, zero-progress fires after just one run instead of two.
+ let previousRemaining = priorCheckpointRemaining;
 
  try {
  // Handoff loop
@@ -590,6 +616,15 @@ async function _runHandleChat(config, sessionId, userMessage) {
  session.metadata.failedApproaches.push(...run.checkpoint.failedApproaches);
  }
 
+ // Merge concrete facts from this run's checkpoint.state into session metadata.
+ // Later runs overwrite earlier values for the same key (newer discoveries win).
+ if (run.checkpoint.state && Object.keys(run.checkpoint.state).length > 0) {
+ session.metadata.checkpointState = {
+ ...(session.metadata.checkpointState || {}),
+ ...run.checkpoint.state,
+ };
+ }
+
  // Zero-progress detection: if checkpoint.remaining is identical to the previous
  // handoff's remaining, the agent completed a full run without making any progress.
  // Stop immediately rather than burning more iterations on a stuck task.
@@ -599,6 +634,10 @@ async function _runHandleChat(config, sessionId, userMessage) {
  finalLogSummary = 'Zero progress detected: task state unchanged after a full run. Human intervention required.';
  finalStatus = 'intervention_required';
 
+ // Persist so that the next user message initializes previousRemaining from this
+ // value — zero-progress will then fire after just one run instead of two.
+ session.metadata.lastCheckpointRemaining = currentRemaining;
+
  await appendLog(sessionId, {
  iteration: 0,
  model: config.selectedModel,
@@ -642,13 +681,17 @@ async function _runHandleChat(config, sessionId, userMessage) {
 
  // Resume with checkpoint.remaining as new prompt.
  // Guard against null/undefined in case the model omitted the field.
- // Use the full accumulated failedApproaches list across ALL handoff runs so the
- // agent has complete memory of what has already been tried and failed.
+ // Inject the full accumulated failedApproaches and concrete state so the agent
+ // has complete memory of what failed and what was already discovered.
  let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
  const allFailedApproaches = session.metadata.failedApproaches || [];
  if (allFailedApproaches.length > 0) {
  resumeContent += `\n\n[System: The following approaches were tried and failed in previous runs — do not repeat them:\n${allFailedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
  }
+ const stateToInject = session.metadata.checkpointState || {};
+ if (Object.keys(stateToInject).length > 0) {
+ resumeContent += `\n\n[System: Known facts from previous runs:\n${Object.entries(stateToInject).map(([k, v]) => `- ${k}: ${v}`).join('\n')}]`;
+ }
  session.messages.push({ role: 'user', content: resumeContent });
  }
  } catch (e) {
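Pulled out of the handoff loop, the resume prompt assembly is a pure function of the checkpoint and the accumulated metadata. `buildResumeContent` is a hypothetical name, and the sample inputs are illustrative. The message templates are copied from the diff. Failed approaches are appended before known facts, matching the order in agent.js.

```javascript
// Hypothetical pure-function version of the resumeContent assembly above.
function buildResumeContent(checkpoint, metadata) {
  // Fall back when the model omitted `remaining`.
  let resumeContent = checkpoint.remaining || 'Continue with the task.';

  // Every failed approach accumulated across handoff runs, numbered from 1.
  const allFailedApproaches = metadata.failedApproaches || [];
  if (allFailedApproaches.length > 0) {
    resumeContent += `\n\n[System: The following approaches were tried and failed in previous runs — do not repeat them:\n${allFailedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
  }

  // Concrete facts merged from earlier checkpoints, one "- key: value" line each.
  const stateToInject = metadata.checkpointState || {};
  if (Object.keys(stateToInject).length > 0) {
    resumeContent += `\n\n[System: Known facts from previous runs:\n${Object.entries(stateToInject).map(([k, v]) => `- ${k}: ${v}`).join('\n')}]`;
  }
  return resumeContent;
}

const content = buildResumeContent(
  { remaining: 'Run a test scan with scan.sh' },
  {
    failedApproaches: ['starting a scan while the ZAP daemon lock was held'],
    checkpointState: { projectDir: '/root/.jarvis/projects/cybersecurity' },
  },
);
console.log(content);
```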