@ducci/jarvis 1.0.38 → 1.0.39
- package/docs/agent.md +43 -4
- package/docs/crons.md +100 -0
- package/docs/identity.md +38 -0
- package/docs/skills.md +77 -0
- package/docs/system-prompt.md +25 -13
- package/docs/telegram.md +19 -0
- package/package.json +2 -1
- package/src/server/agent.js +44 -14
- package/src/server/app.js +125 -2
- package/src/server/config.js +43 -0
- package/src/server/cron-scheduler.js +35 -0
- package/src/server/crons.js +106 -0
- package/src/server/tools.js +192 -71
- package/docs/findings/001-context-explosion.md +0 -116
- package/docs/findings/002-handoff-edge-cases.md +0 -84
- package/docs/findings/003-event-loop-blocking-and-reliability.md +0 -120
- package/docs/findings/004-agent-reliability-improvements.md +0 -162
- package/docs/findings/005-installation-timeout.md +0 -128
- package/docs/findings/006-malformed-tool-schema.md +0 -118
- package/docs/findings/007-telegram-errors-and-handoff-stalling.md +0 -271
- package/docs/findings/008-exec-timeout-architecture.md +0 -118
- package/docs/findings/009-non-string-response-field.md +0 -153
- package/docs/findings/010-checkpoint-field-type-safety.md +0 -121
- package/docs/findings/011-empty-model-response.md +0 -157
- package/docs/findings/012-empty-nudge-loses-recovery-text.md +0 -121
- package/docs/findings/013-stderr-visibility-and-truncation.md +0 -59
- package/docs/findings/014-exec-stderr-artifact-and-malformed-tool-args.md +0 -202
- package/docs/findings/015-failed-run-context-strip.md +0 -142
- package/docs/findings/016-file-writing-corruption-and-stderr-loop.md +0 -119
- package/docs/findings/017-looping-intervention-and-lossy-checkpoint.md +0 -110
- package/docs/findings/018-anthropic-oauth-token-support.md +0 -72
@@ -1,142 +0,0 @@
# Finding 015: Failed Runs Leave Tool History in Context (Context Bloat Death Spiral)

**Date:** 2026-03-02
**Severity:** High (caused 3 consecutive `model_error: Empty choices array` failures; session unusable)
**Status:** Fixed

---

## Observed Session

Session `6123209d-ce5a-44d0-be12-29aac58b4cf3`. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User requested a ZAP security scanning project.

| Entry | Trigger | Status | messageCount at failure | toolCalls |
|-------|---------|--------|------------------------|-----------|
| 1 | "hi all good?" | ok | n/a | 0 |
| 2 | ZAP task (run 1) | checkpoint_reached | n/a | 10 |
| 3 | handoff resume (run 2) | checkpoint_reached | n/a | 10 |
| 4 | handoff resume (run 3) | model_error (empty choices, iter 7) | 22 | 26 |
| 5 | "Why I get Model returned an empty response?" | model_error (empty choices, iter 3) | 27 | 2 |
| 6 | "Why I get Model returned an empty response again?!!" | model_error (empty choices, iter 5) | 37 | 4 |

The session ended without producing any result. The user received `'Model returned an empty response.'` three times.

---

## Root Cause 1: Failed runs leave tool call history in session

### What happened

The handoff loop strips tool call messages for `checkpoint_reached` runs:

```js
session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
```

Runs that ended with `model_error` or `format_error` were **not stripped**. Every tool call message (assistant+tool pairs, nudge injections) from the failed run remained in `session.messages`, with only a synthetic error note appended afterward.

Run 3 had 26 tool calls across 7 iterations, adding approximately 13 messages to the session. These were preserved verbatim. Each subsequent user turn started with more context than the last.

### Message count growth

- Before run 3: ~8 messages (runs 1 and 2 were both `checkpoint_reached` and stripped correctly)
- After entry 4 (`model_error`, no strip): 21 messages + synthetic note = 22
- After entry 5 (`model_error`, no strip): 27 messages + synthetic note = 28
- At entry 6: 37 messages in context

The free model returns `choices: []` when the context exceeds what it can handle. Each failure added more context, making the next failure more likely: a **positive feedback death spiral**.

### Fix

Apply the same kind of splice that checkpoint runs already use, here removing every message from `runStartIndex` onward:

```js
if (finalStatus === 'model_error' || finalStatus === 'format_error') {
  // Drop everything the failed run appended to the session.
  session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
  // then push synthetic error note as before
}
```

The strip runs before the synthetic error note is pushed, returning the session to its pre-run state plus one concise note. The JSONL log preserves all tool results for retrospective inspection via `read_session_log`.

**File**: `src/server/agent.js` (`_runHandleChat`, non-checkpoint break path)
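
To make the effect of the strip concrete, here is a minimal sketch on a toy session. The helper name `stripFailedRun`, the message shapes, and the text of the note after its `[System: Previous run failed (...)` prefix are assumptions for illustration, not the package's actual code.

```js
// Hypothetical helper: removes everything a failed run appended, then
// records one synthetic note in its place.
function stripFailedRun(session, runStartIndex, finalStatus, errorText) {
  if (finalStatus !== 'model_error' && finalStatus !== 'format_error') {
    return false; // other statuses keep their own handling
  }
  // Remove every message added since the run started.
  session.messages.splice(runStartIndex, session.messages.length - runStartIndex);
  session.messages.push({
    role: 'assistant',
    content: `[System: Previous run failed (${finalStatus}): ${errorText}]`,
  });
  return true;
}

const session = {
  messages: [
    { role: 'system', content: 'prompt' },
    { role: 'user', content: 'run the scan' },
  ],
};
const runStartIndex = session.messages.length; // index where the run begins

// Simulate one assistant+tool pair appended by the failing run.
session.messages.push(
  { role: 'assistant', content: null, tool_calls: [{ id: 'call_1' }] },
  { role: 'tool', tool_call_id: 'call_1', content: '{"status":"error"}' },
);

stripFailedRun(session, runStartIndex, 'model_error', 'Empty choices array');
console.log(session.messages.length); // 3: the two pre-run messages plus the note
```

The session grows by exactly one message per failed run under this scheme, regardless of how many tool calls the run made.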

---

## Root Cause 2: No detection or escalation for consecutive model_errors

### What happened

After two consecutive `model_error: Empty choices array` entries (4 and 5), no protective mechanism fired. The system continued accepting new user messages and spawning new runs indefinitely.

Existing protection mechanisms all missed this case:
- `maxHandoffs`: only applies to `checkpoint_reached` runs
- `consecutiveFailures`: tracks tool failures within a single run
- Zero-progress detection: only applies to `checkpoint_reached` runs

### Fix

Detect the pattern structurally in `session.messages` before starting each new run: if the last two assistant messages are both synthetic `model_error` notes, the session is in a confirmed failure loop. Escalate to `intervention_required` without running another agent loop.

```js
function hasConsecutiveModelErrors(messages) {
  const assistantTail = messages.filter(m => m.role === 'assistant').slice(-2);
  return (
    assistantTail.length === 2 &&
    assistantTail.every(
      m =>
        typeof m.content === 'string' &&
        m.content.startsWith('[System: Previous run failed (model_error)')
    )
  );
}
```

This uses no additional state: it reads the session history directly. Old sessions are handled correctly. One failure is allowed (transient errors are real); two consecutive failures mean the session cannot self-recover.

Combined with Fix 1, consecutive model_errors in this session would have played out as:
1. Entry 4 (run 3): model_error → strip → synthetic note. Session back to 9 messages.
2. Entry 5 (user "Why?"): run 4 starts with 10 messages. If it still fails → strip → synthetic note. Two model_error notes now in session.
3. Entry 6 (user "Why again?!"): `hasConsecutiveModelErrors` fires → `intervention_required` returned immediately. User gets a clear message: start a new session or switch model.

**File**: `src/server/agent.js` (`hasConsecutiveModelErrors` function + check at top of handoff loop)
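
As a rough illustration of the one-failure-allowed rule, the sketch below runs the detector on two toy histories. The detector is copied from above so the block stands alone; the guard `preRunStatus`, the sample messages, and the note text after the prefix are assumptions.

```js
function hasConsecutiveModelErrors(messages) {
  const assistantTail = messages.filter(m => m.role === 'assistant').slice(-2);
  return (
    assistantTail.length === 2 &&
    assistantTail.every(
      m =>
        typeof m.content === 'string' &&
        m.content.startsWith('[System: Previous run failed (model_error)')
    )
  );
}

// Assumed note text; only the prefix is taken from the finding.
const NOTE = '[System: Previous run failed (model_error): Empty choices array]';

const oneFailure = [
  { role: 'user', content: 'hi all good?' },
  { role: 'assistant', content: 'All good.' },
  { role: 'user', content: 'run the scan' },
  { role: 'assistant', content: NOTE },
];
const twoFailures = [
  ...oneFailure,
  { role: 'user', content: 'Why?' },
  { role: 'assistant', content: NOTE },
];

// Hypothetical guard evaluated before each new run.
function preRunStatus(messages) {
  return hasConsecutiveModelErrors(messages) ? 'intervention_required' : 'run';
}

console.log(preRunStatus(oneFailure));  // run
console.log(preRunStatus(twoFailures)); // intervention_required
```

Because the check filters to assistant messages first, user turns between the two notes do not reset it.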

---

## Root Cause 3: Empty choices error message provides no actionable guidance

### What happened

The `choices.length === 0` path returned:

```
Model returned an empty response.
```

When the user asked "why?", the agent, with ZAP tool call context still present, continued ZAP investigation instead of explaining the API failure. The opaque error and the polluted context compounded: the model had no clear signal about what went wrong and no guidance on how to recover.

### Fix

Include the context size and recovery guidance in the response:

```js
response: `Model returned an empty response (${preparedMessages.length} messages in context). This typically happens when the conversation is too long for the model. Try starting a new session or switching to a model with a larger context window.`,
```

**File**: `src/server/agent.js` (`runAgentLoop`, empty choices early return)

---

## Why Fix 1 is Primary

Fix 1 is the root fix. With context stripped after each failure, the model operates on a tiny session (~10 messages) on subsequent turns. The free model handles this easily. Fix 2 is a safety net for persistent non-context failures. Fix 3 improves user-facing error messages for the residual cases that slip through.

---

## Files Changed

| File | Change |
|------|--------|
| `src/server/agent.js` | Strip tool history on `model_error`/`format_error` (same as checkpoint) |
| `src/server/agent.js` | `hasConsecutiveModelErrors` function + check before each run in handoff loop |
| `src/server/agent.js` | Include message count in empty choices response |
@@ -1,119 +0,0 @@
# Finding 016: File Writing Corruption, Misleading Stderr Nudge, and Repeated-Error Loop

**Date:** 2026-03-02
**Severity:** High (agent burned 10 iterations on the wrong diagnosis; task abandoned)
**Status:** Fixed

---

## Observed Session

Session `a25fd973-3e92-4902-a96d-536ef0eb3005`. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User asked to run a ZAP security scanner script.

The script (`scan.sh`) failed immediately with `$ZAP_CMD: command not found`. The agent used all 10 iterations investigating PATH issues and gave up without solving the problem or producing a handoff checkpoint.

---

## Root Cause 1: Shell Script Written with Escaped Dollar Signs

### What happened

`scan.sh` was written in a prior session by the agent using `exec` with a shell command (echo or heredoc). Multi-layered escaping (JS string → JSON encoding → bash variable expansion) caused every `$` in the script to be written as `\$` in the file.

In bash, `"\$VAR"` (backslash-dollar in double quotes) suppresses variable expansion and produces the literal string `$VAR`. The script ran but nothing expanded:

```
bash -x scan.sh http://testphp.vulnweb.com:

+ DOMAIN='$1'                          ← should be 'http://testphp.vulnweb.com'
+ OUTPUT_DIR='/path/results/$DOMAIN'   ← should be '/path/results/http://...'
+ '$ZAP_CMD' -cmd ...                  ← tries to run a command literally named '$ZAP_CMD'
scan.sh: line 27: $ZAP_CMD: command not found
```

Secondary confirmation: the project directory contained folders literally named `$OUTPUT_DIR` and `$RESULTS_DIR`, created by a prior run of the broken script.

### Fix

Added `write_file` as a seed tool. It calls `fs.promises.writeFile` directly: content arrives as a JSON string and is written to disk verbatim. No shell is involved, so no escaping layer can corrupt dollar signs or backslashes.

Added an optional `mode` parameter (e.g. `"755"`) to make scripts executable in the same call.

Updated the system prompt with a dedicated "Writing Files" section (peer-level to "exec Safety") stating: use `write_file` for all file creation, never `exec` with `echo`, `printf`, or heredoc.

**Files changed:**
- `src/server/tools.js`: `write_file` added to `SEED_TOOLS`
- `docs/system-prompt.md`: new "Writing Files" section; removed the buried `echo -e` bullet from exec Safety
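
The following is a minimal sketch of what a shell-free `write_file` handler could look like, assuming a parameter object with `filePath`, `content`, and an optional octal `mode` string. The function name, parameter names, and return shape are hypothetical; only the use of `fs.promises.writeFile` and the `"755"`-style mode come from the finding.

```js
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical handler shape; the real tool's parameter names may differ.
async function writeFileTool({ filePath, content, mode }) {
  // The content goes straight to disk: no shell, so no escaping layer.
  await fs.promises.writeFile(filePath, content, 'utf8');
  if (mode !== undefined) {
    // "755" is an octal string, so parse it in base 8 before chmod.
    await fs.promises.chmod(filePath, parseInt(mode, 8));
  }
  return { status: 'ok', bytes: Buffer.byteLength(content, 'utf8') };
}

// Round-trip a script containing shell variables through a temp directory.
async function demo() {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'write-file-demo-'));
  const target = path.join(dir, 'scan.sh');
  const script = '#!/bin/bash\nDOMAIN="$1"\necho "Scanning $DOMAIN"\n';
  await writeFileTool({ filePath: target, content: script, mode: '755' });
  const onDisk = await fs.promises.readFile(target, 'utf8');
  return { identical: onDisk === script, hasEscapedDollar: onDisk.includes('\\$') };
}

demo().then((result) => console.log(result));
```

The round trip shows the property the fix relies on: what the model sends is byte-for-byte what lands in the file, so `$1` stays `$1`.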

---

## Root Cause 2: Stderr Nudge Misdirected the Model

### What happened

After every failed tool call with stderr output, the system injected:

> *"Examine the stderr field in the tool result carefully — it likely describes the root cause of the failure."*

The stderr was `$ZAP_CMD: command not found`, which looks like a PATH problem. The real diagnostic clue was in the **stdout** of `bash -x`: `DOMAIN='$1'` (variable not expanded). By directing the model to stderr, the nudge steered its attention away from the evidence.

The model then spent iterations on explicit PATH overrides, `command -v` checks, `sed -n l`, `cat -A`, and `which bash` (all stderr-adjacent investigation) and never examined what the `-x` trace was actually showing.

### Fix

Changed the stderr nudge to cover both stdout and stderr, with an explicit callout to `bash -x` as a debug tool whose key output is in stdout:

```
[System: A command failed. Examine both the stdout AND stderr fields in the tool result —
stderr names the error, but stdout (especially from debug commands like bash -x) often shows
the root cause. Do not retry without first understanding what the full output is telling you.]
```

**File changed:** `src/server/agent.js` (`stderrErrorInIteration` nudge)

---

## Root Cause 3: No Detection for the Same Error Repeating Across Different Commands

### What happened

The existing loop detector tracks `tool + args + result` triples. Because the model varied its tool calls each iteration (different PATH strings, different diagnostic commands), this never triggered. Meanwhile `$ZAP_CMD: command not found` appeared in stderr across 5+ tool calls from entirely different commands, a strong signal that the diagnosis is wrong, not the approach.

Existing detectors that missed this:
- `loopTracker`: requires identical tool + args + result; missed because args varied
- `consecutiveFailures`: tracks back-to-back failures; partially reset when some calls succeeded
- `maxHandoffs` / zero-progress: apply only to checkpoint-reached runs

### Fix

Added `stderrTracker = new Map()` in `runAgentLoop` (parallel to `loopTracker`). After each failed tool call, the first line of stderr is extracted as the key and its count incremented.

When any stderr string reaches `CONSECUTIVE_FAILURE_THRESHOLD` (3), a targeted "step back" nudge is injected, quoting the repeating error, instead of the basic stderr nudge:

```
[System: The error "$ZAP_CMD: command not found" has now appeared 3 times across different
commands. You are repeatedly diagnosing the wrong thing. Stop, step back, and reconsider
from scratch — what is this error fundamentally telling you about the state of the system?]
```

Using only the first line of stderr (not the full message) makes the tracker robust to verbose multi-line output where later lines may contain timestamps or variable content.

**File changed:** `src/server/agent.js` (`stderrTracker`, `firstStderrLine` extraction, `repeatedStderr` check replacing basic stderr nudge when threshold reached)
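
The tracker described above can be sketched as follows. This is an illustration under assumptions: the helper names and signatures are hypothetical, the nudge text is abbreviated, and skipping blank leading lines is a choice made here rather than something the finding specifies.

```js
const CONSECUTIVE_FAILURE_THRESHOLD = 3; // value stated in the finding

// Hypothetical helper: first non-empty line of stderr, trimmed.
function firstStderrLine(stderr) {
  const line = String(stderr ?? '')
    .split('\n')
    .map((l) => l.trim())
    .find((l) => l.length > 0);
  return line ?? '';
}

// Counts the failure and returns a "step back" nudge once the same first
// line has been seen often enough; returns null otherwise.
function recordStderrFailure(stderrTracker, stderr) {
  const key = firstStderrLine(stderr);
  if (!key) return null;
  const count = (stderrTracker.get(key) ?? 0) + 1;
  stderrTracker.set(key, count);
  if (count < CONSECUTIVE_FAILURE_THRESHOLD) return null;
  // Abbreviated wording; the real nudge text is longer.
  return `[System: The error "${key}" has now appeared ${count} times across different commands. Stop, step back, and reconsider from scratch.]`;
}

// Three different commands fail with the same first line but different tails.
const stderrTracker = new Map();
const outputs = [
  'scan.sh: line 27: $ZAP_CMD: command not found\n',
  'scan.sh: line 27: $ZAP_CMD: command not found\nexit status 127',
  'scan.sh: line 27: $ZAP_CMD: command not found\n(took 12ms)',
];
const nudges = outputs.map((out) => recordStderrFailure(stderrTracker, out));
console.log(nudges.map((n) => n !== null)); // [ false, false, true ]
```

Keying on the first line is what lets the three outputs collapse into one counter even though their trailing lines differ.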

---

## Why the Session Never Produced a Checkpoint

The model produced a final text response on iteration 10 (it didn't time out; it gave up). This means `!done` was never true after the while loop, so the wrap-up / checkpoint path never ran. The model exhausted its budget investigating PATH and on the last iteration concluded it couldn't solve the problem.

Fix 3 (repeated-error nudge) would have fired by iteration 5–6 with the specific message quoting `$ZAP_CMD: command not found`. At that point the model would have had a chance to reconsider what the error was telling it rather than continuing PATH investigation.

---

## Files Changed

| File | Change |
|------|--------|
| `src/server/tools.js` | Added `write_file` seed tool |
| `docs/system-prompt.md` | Added "Writing Files" section; removed `echo -e` bullet from exec Safety |
| `src/server/agent.js` | Added `stderrTracker`; first-line stderr key extraction; repeated-error nudge overriding basic stderr nudge; broadened stderr nudge wording to cover stdout |
@@ -1,110 +0,0 @@
# Finding 017: Looping Intervention and Lossy Checkpoint

**Date:** 2026-03-04
**Severity:** High (agent burned 40 iterations, i.e. 4 full runs, on a structurally impossible task without escaping; concrete facts like file paths regressed between handoff runs)
**Status:** Fixed

---

## Observed Session

Session from a remote server. Model: `nvidia/nemotron-3-nano-30b-a3b:free`. User requested a ZAP security scanning workflow (create project dir, write README + scan.sh, run a test scan).

| Entry | Trigger | Status | Iterations | Notes |
|-------|---------|--------|------------|-------|
| 1 | "hello" | ok | 0 | greeting |
| 2 | ZAP task | checkpoint_reached | 10 | ZAP daemon lock blocked scan |
| 3 | handoff resume | checkpoint_reached | 10 | same lock, same result |
| 4 | zero-progress | intervention_required | 0 | correctly detected |
| 5 | "Ok can you do it?" | checkpoint_reached | 10 | loop restarted, same result |
| 6 | handoff resume | checkpoint_reached | 10 | same lock again |
| 7 | zero-progress | intervention_required | 0 | detected again, session abandoned |

Total: 40 wasted iterations. Additionally, between Entry 2 and Entry 6, the agent changed the project path from `/root/.jarvis/projects/cybersecurity` to `/root/projects/cybersecurity`, causing `list_dir` to fail at the start of that run.

---

## Root Cause 1: Zero-Progress Detection Resets Across User Messages

### What happened

`previousRemaining` is a local variable initialized to `null` on every call to `_runHandleChat`. Zero-progress detection requires two identical `checkpoint.remaining` values, but both must occur within the same invocation. When `intervention_required` fires and the user sends a new message, `_runHandleChat` is called fresh with `previousRemaining = null`. The detection resets.

The result: each new user message grants the agent 2 full runs (20 iterations) before zero-progress fires again, regardless of how many times the cycle has already repeated. The intervention mechanism correctly identifies "stuck" but provides no structural escape when the user replies.

### Fix

1. When zero-progress fires, persist `session.metadata.lastCheckpointRemaining = currentRemaining`.
2. In `_runHandleChat`, initialize `previousRemaining` from `session.metadata.lastCheckpointRemaining` (cleared after reading). Zero-progress now fires after just one run (10 iterations) on the next user message if the agent produces the same remaining.
3. Inject a note into `userMessageWithContext` when `lastCheckpointRemaining` was set:

```
[System: This task previously hit zero-progress and required intervention. If the user has given new direction or clarification, follow it. Otherwise, immediately explain what specific obstacle is blocking progress — do not resume the same failing approach.]
```

This gives the agent explicit guidance to respond to what the user actually asked instead of blindly resuming.

**File:** `src/server/agent.js` (`_runHandleChat`)
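
A reduced sketch of steps 1 and 2 is shown below, assuming `checkpoint.remaining` is compared as a plain string. The helper names `takePriorRemaining` and `checkZeroProgress` are hypothetical; only `previousRemaining` and `session.metadata.lastCheckpointRemaining` come from the finding.

```js
// Reads the persisted value once and clears it, so it only seeds the
// invocation that directly follows an intervention.
function takePriorRemaining(session) {
  const prior = session.metadata.lastCheckpointRemaining ?? null;
  delete session.metadata.lastCheckpointRemaining;
  return prior;
}

// Returns true when two checkpoints in a row report the same remaining work,
// and persists the value so the next user message starts "warm".
function checkZeroProgress(session, previousRemaining, currentRemaining) {
  if (previousRemaining !== null && previousRemaining === currentRemaining) {
    session.metadata.lastCheckpointRemaining = currentRemaining;
    return true;
  }
  return false;
}

const session = { metadata: {} };
const stuck = 'Run the test scan';

// First user message: nothing persisted yet, so two runs are needed.
let previousRemaining = takePriorRemaining(session);
const afterRun1 = checkZeroProgress(session, previousRemaining, stuck);
previousRemaining = stuck;
const afterRun2 = checkZeroProgress(session, previousRemaining, stuck);

// Next user message: the persisted value seeds previousRemaining,
// so the very first run with the same remaining work trips the check.
previousRemaining = takePriorRemaining(session);
const afterRun3 = checkZeroProgress(session, previousRemaining, stuck);

console.log(afterRun1, afterRun2, afterRun3); // false true true
```

The second invocation needs only one run to detect the stall, which is the halving from 20 to 10 iterations that the fix describes.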

---

## Root Cause 2: Checkpoint Loses Concrete Facts Between Runs

### What happened

The checkpoint schema had `progress`, `remaining`, and `failedApproaches`, all natural-language prose. Concrete facts the agent discovers during a run (file paths, binary locations, config values) are paraphrased or omitted when the agent writes the summary. On resume, the model reconstructs these facts from vague prose and sometimes gets them wrong.

In this session, the project directory `/root/.jarvis/projects/cybersecurity` was written correctly in runs 1–3 but reconstructed as `/root/projects/cybersecurity` in run 6. The first action of that run was `list_dir` on the wrong path, which failed.

### Fix

1. Added a `state` field to `WRAP_UP_NOTE`'s checkpoint schema: a flat key-value JSON object for concrete facts confirmed by tool output (file paths created, binary locations found, config values discovered).

2. After each handoff, merge `run.checkpoint.state` into `session.metadata.checkpointState` (later runs overwrite earlier values for the same key).

3. When building `resumeContent` for the next handoff run, inject the accumulated state:

```
[System: Known facts from previous runs:
- projectDir: /root/.jarvis/projects/cybersecurity
- zapBinary: /snap/bin/zaproxy
- scanScriptPath: /root/.jarvis/projects/cybersecurity/scan.sh]
```

4. When re-entering after zero-progress (via `wasZeroProgress`), also include the state in the `userMessageWithContext` injection so facts are available immediately on the first run.

5. `session.metadata.checkpointState` is reset on each new user message (same lifecycle as `failedApproaches`) to avoid stale facts from previous tasks leaking into new ones.

**File:** `src/server/agent.js` (`WRAP_UP_NOTE`, checkpoint normalization, `_runHandleChat`)
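
Steps 2 and 3 can be sketched roughly as below. The function names are hypothetical, and the normalization rule (anything that is not a plain object becomes `{}`) is an assumption based on the description of `cp.state` handling; the output format follows the example above.

```js
// Hypothetical normalizer: a missing or malformed state becomes {}.
function normalizeState(value) {
  const isPlainObject =
    typeof value === 'object' && value !== null && !Array.isArray(value);
  return isPlainObject ? value : {};
}

// Later runs overwrite earlier values for the same key.
function mergeCheckpointState(session, runState) {
  session.metadata.checkpointState = {
    ...normalizeState(session.metadata.checkpointState),
    ...normalizeState(runState),
  };
}

// Renders the accumulated facts in the "Known facts" note format.
function formatKnownFacts(state) {
  const entries = Object.entries(normalizeState(state));
  if (entries.length === 0) return '';
  const lines = entries.map(([key, value]) => `- ${key}: ${value}`);
  return `[System: Known facts from previous runs:\n${lines.join('\n')}]`;
}

const session = { metadata: {} };
mergeCheckpointState(session, { projectDir: '/root/.jarvis/projects/cybersecurity' });
mergeCheckpointState(session, { zapBinary: '/snap/bin/zaproxy' });
mergeCheckpointState(session, 'not an object'); // malformed state is ignored

console.log(formatKnownFacts(session.metadata.checkpointState));
```

Carrying the path as a literal value means the next run reads `/root/.jarvis/projects/cybersecurity` directly instead of reconstructing it from prose.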

---

## Interaction Between the Two Fixes

The zero-progress note injected into `userMessageWithContext` now also includes the `priorCheckpointState` (captured before the metadata reset), so the agent that receives the note also has the concrete facts it needs if the user does provide new direction. This means both fixes compound: the agent is told it was stuck AND given the facts it needs to act on whatever the user says.

---

## What Was Not Changed

- The existing exact-match loop detector (`loopTracker`): unchanged
- The consecutive failure detector: unchanged
- The `maxHandoffs` cap: unchanged
- The `hasConsecutiveModelErrors` escalation: unchanged
- The context strip on checkpoint/intervention: unchanged

---

## Files Changed

| File | Change |
|------|--------|
| `src/server/agent.js` | `WRAP_UP_NOTE`: added `state` field to checkpoint schema |
| `src/server/agent.js` | Checkpoint normalization: normalize `cp.state` to `{}` if missing/malformed |
| `src/server/agent.js` | `_runHandleChat`: capture `wasZeroProgress`, `priorCheckpointRemaining`, `priorCheckpointState` before metadata reset |
| `src/server/agent.js` | `_runHandleChat`: inject zero-progress note + prior state into `userMessageWithContext` when applicable |
| `src/server/agent.js` | `_runHandleChat`: reset `lastCheckpointRemaining` and `checkpointState` on new user message |
| `src/server/agent.js` | `_runHandleChat`: initialize `previousRemaining` from `priorCheckpointRemaining` |
| `src/server/agent.js` | Zero-progress block: persist `session.metadata.lastCheckpointRemaining` |
| `src/server/agent.js` | Handoff accumulation: merge `run.checkpoint.state` into `session.metadata.checkpointState` |
| `src/server/agent.js` | `resumeContent`: inject accumulated `checkpointState` as known facts |
@@ -1,72 +0,0 @@
# Finding 018: Anthropic OAuth Token Not Supported

**Date:** 2026-03-06
**Severity:** High (every request fails with 401; server completely non-functional with OAuth credentials)
**Status:** Fixed

---

## Observed Session

Session `ee5ec010-667d-4964-92c6-d45106f72911`. Provider: Anthropic direct API. Key prefix: `sk-ant-oat01-` (OAuth token generated via `claude setup-token`).

| Entry | Trigger | Status | Iterations | Notes |
|-------|---------|--------|------------|-------|
| 1 | `/start` | model_error | 1 | 401 invalid x-api-key |
| 2 | `Hi` | model_error | 1 | 401 invalid x-api-key |

Both runs fail on iteration 1 before any tool calls. Zero useful work done.

---

## Root Cause

`createAnthropicClient` in `src/server/provider.js` always instantiated the Anthropic SDK with `{ apiKey }`:

```js
const anthropic = new Anthropic({ apiKey });
```

The SDK maps `apiKey` to the `x-api-key` request header. The Anthropic REST API rejects OAuth tokens on this header with:

```json
{"type":"authentication_error","message":"invalid x-api-key"}
```

OAuth tokens (`sk-ant-oat*`) are generated by `claude setup-token` for use with Claude Pro/Max subscriptions. They require a different auth path:

- Header: `Authorization: Bearer <token>` (not `x-api-key`)
- Beta header: `anthropic-beta: oauth-2025-04-20`

The `oauth-2025-04-20` beta header is used internally by the official Claude Code CLI. Without it, the API returns `"OAuth authentication is currently not supported."` even with the correct `Authorization: Bearer` header.

---

## Fix

Detect the token type by prefix and instantiate the SDK accordingly:

```js
// OAuth tokens need Bearer auth plus the beta header; API keys use x-api-key.
const isOAuthToken = apiKey.startsWith('sk-ant-oat');
const anthropic = isOAuthToken
  ? new Anthropic({ authToken: apiKey, defaultHeaders: { 'anthropic-beta': 'oauth-2025-04-20' } })
  : new Anthropic({ apiKey });
```

The Anthropic SDK supports `authToken` natively: it sends `Authorization: Bearer <token>` instead of `x-api-key`. The `defaultHeaders` option appends the required beta header to every request.

No changes were needed to the adapter interface or anywhere else in the codebase. Both paths produce an identical output shape.
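
One way to keep the prefix detection testable without constructing a client is to split out the option building, as in this sketch. The function `anthropicClientOptions` is hypothetical and the tokens are placeholders, not real credentials; the result is meant to be passed to `new Anthropic(options)`.

```js
// Hypothetical helper: returns SDK constructor options for a credential.
function anthropicClientOptions(apiKey) {
  const isOAuthToken = apiKey.startsWith('sk-ant-oat');
  if (isOAuthToken) {
    return {
      authToken: apiKey, // sent as Authorization: Bearer <token>
      defaultHeaders: { 'anthropic-beta': 'oauth-2025-04-20' },
    };
  }
  return { apiKey }; // sent as x-api-key
}

// Placeholder credentials for illustration only.
const oauthOptions = anthropicClientOptions('sk-ant-oat01-placeholder');
const apiKeyOptions = anthropicClientOptions('sk-ant-api03-placeholder');

console.log(Object.keys(oauthOptions));  // [ 'authToken', 'defaultHeaders' ]
console.log(Object.keys(apiKeyOptions)); // [ 'apiKey' ]
```

Keeping the two option shapes disjoint makes it harder to send an OAuth token on the `x-api-key` header by accident.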

---

## Background

Anthropic restricts OAuth tokens to their own products (Claude Code CLI, Claude.ai) via ToS. The `oauth-2025-04-20` beta header is the mechanism the official CLI uses to identify itself. Using it in Jarvis enables the same auth path.

---

## Files Changed

| File | Change |
|------|--------|
| `src/server/provider.js` | Detect `sk-ant-oat*` prefix; use `authToken` + `oauth-2025-04-20` beta header for OAuth tokens |