@ducci/jarvis 1.0.38 → 1.0.40
- package/docs/agent.md +43 -4
- package/docs/crons.md +100 -0
- package/docs/identity.md +38 -0
- package/docs/skills.md +77 -0
- package/docs/system-prompt.md +25 -13
- package/docs/telegram.md +61 -2
- package/package.json +2 -1
- package/src/channels/telegram/index.js +65 -0
- package/src/server/agent.js +59 -19
- package/src/server/app.js +125 -2
- package/src/server/config.js +43 -0
- package/src/server/cron-scheduler.js +35 -0
- package/src/server/crons.js +106 -0
- package/src/server/tools.js +234 -72
- package/docs/findings/001-context-explosion.md +0 -116
- package/docs/findings/002-handoff-edge-cases.md +0 -84
- package/docs/findings/003-event-loop-blocking-and-reliability.md +0 -120
- package/docs/findings/004-agent-reliability-improvements.md +0 -162
- package/docs/findings/005-installation-timeout.md +0 -128
- package/docs/findings/006-malformed-tool-schema.md +0 -118
- package/docs/findings/007-telegram-errors-and-handoff-stalling.md +0 -271
- package/docs/findings/008-exec-timeout-architecture.md +0 -118
- package/docs/findings/009-non-string-response-field.md +0 -153
- package/docs/findings/010-checkpoint-field-type-safety.md +0 -121
- package/docs/findings/011-empty-model-response.md +0 -157
- package/docs/findings/012-empty-nudge-loses-recovery-text.md +0 -121
- package/docs/findings/013-stderr-visibility-and-truncation.md +0 -59
- package/docs/findings/014-exec-stderr-artifact-and-malformed-tool-args.md +0 -202
- package/docs/findings/015-failed-run-context-strip.md +0 -142
- package/docs/findings/016-file-writing-corruption-and-stderr-loop.md +0 -119
- package/docs/findings/017-looping-intervention-and-lossy-checkpoint.md +0 -110
- package/docs/findings/018-anthropic-oauth-token-support.md +0 -72

@@ -1,162 +0,0 @@

# Finding 004: Agent Reliability — Failure Loops, Checkpoint Memory, and Iteration Awareness

**Date:** 2026-02-27
**Severity:** High — caused observed session failure with 54 tool calls, `format_error` crash, and no useful output after 42 minutes
**Status:** Fixed

---

## What Happened

A session was started to build a cybersecurity project installing three tools: Nuclei, Subfinder, and Naabu. The agent went through 5 handoffs (hitting `maxHandoffs`) and crashed with a `format_error` in the final iteration. Observations:

- **181 exec calls** across 19 agent iterations
- **21 perplexity_search calls**, including 11+ in the final iteration alone
- The agent oscillated between download strategies (Docker → `go install` → tarball → direct binary) without memory of what had already failed
- The existing loop detector never fired because each failed command had slightly different arguments (different URLs, different flags), producing different `callKey` values
- Each handoff resumed with only `checkpoint.remaining` — no record of what approaches had already been exhausted
- The model degraded and eventually produced a non-JSON response (`format_error`), crashing the session

---

## Root Causes

### 1. Loop detection only caught exact-match repetition

The existing `loopTracker` detects when the _exact same_ tool call (name + args + result) is repeated 3 times. It does not detect _semantic_ failure loops: repeated attempts to do the same thing via slightly different commands. In the session, each download attempt used a different URL or flags, so every `callKey` was unique.
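
The evasion is easy to see in a minimal sketch. The helper names (`callKey`, `recordCall`) are illustrative, not the actual `loopTracker` code:

```javascript
// Minimal sketch of exact-match loop detection as described above —
// hypothetical helper names, not the real loopTracker implementation.
const counts = new Map();

// Keyed on the exact name + args + result, so ANY variation — a different
// URL, flag, or error message — produces a brand-new key.
function callKey(name, args, result) {
  return `${name}:${JSON.stringify(args)}:${JSON.stringify(result)}`;
}

function recordCall(name, args, result) {
  const key = callKey(name, args, result);
  const n = (counts.get(key) || 0) + 1;
  counts.set(key, n);
  return n >= 3; // fires only on the 3rd identical repetition
}

// Identical retries trip the detector on the third attempt…
const fail = { status: 'error', error: 'connection reset' };
recordCall('exec', { cmd: 'curl -sL https://host/a.tgz' }, fail);
recordCall('exec', { cmd: 'curl -sL https://host/a.tgz' }, fail);
console.log(recordCall('exec', { cmd: 'curl -sL https://host/a.tgz' }, fail)); // true

// …but the same strategy retried with a different URL never does.
console.log(recordCall('exec', { cmd: 'curl -sL https://host/b.tgz' }, fail)); // false
```

Any detector keyed on exact values shares this blind spot, which is why the fix adds a coarser consecutive-failure signal.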

### 2. Checkpoint carried no memory of failed strategies

`WRAP_UP_NOTE` asked the model for `progress` and `remaining`, but not for a record of what _failed_. Each new run after a handoff started with a blank slate — the model had no way to know that curl downloads, `go install`, and tarball extraction had already been tried and failed. It repeated them.

### 3. No iteration budget awareness

The model had no visibility into how many iterations remained in the current run. It kept taking exploratory steps as if the budget were unlimited, then was surprised by the wrap-up call. A model that knows it has 2 iterations left will consolidate and report; one that doesn't will keep digging.

### 4. `perplexity_search` used without restraint

The tool description had no guidance on usage limits. The model searched Perplexity 21 times in one session (11 in the final iteration), including redundant queries for the same version information. Each search consumed an iteration and added latency without improving outcomes.

### 5. System prompt had no "give up" rule

The system prompt told the agent to "decide whether to retry with a corrected call or explain the failure to the user" — but gave no threshold for when to stop retrying. In practice, the agent always chose to retry, regardless of how many times the same approach had failed.

---

## Fixes

### 1. Consecutive failure detection (`src/server/agent.js`)

Added a `consecutiveFailures` counter that tracks back-to-back failed tool calls across all iterations within a run. A tool call counts as failed if `executeTool` throws (`toolStatus === 'error'`) _or_ the result object has `status === 'error'` (catching exec failures that are returned, not thrown). A successful call resets the counter to 0.

After each iteration's tool calls, if `consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD` (3), a system break message is injected into the session and the counter resets:

```js
const resultObj = typeof result === 'object' && result !== null ? result : null;
const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
if (toolFailed) {
  consecutiveFailures++;
} else {
  consecutiveFailures = 0;
}
```

```js
if (consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD) {
  session.messages.push({
    role: 'user',
    content: '[System: You have had 3 or more consecutive tool failures. Stop retrying the same approach. Either pivot to a fundamentally different strategy or provide your final response explaining what failed and why.]',
  });
  consecutiveFailures = 0;
}
```

This complements the existing exact-match loop detector — together they cover both identical repetition and semantic failure loops.

### 2. `failedApproaches` in checkpoint schema (`src/server/agent.js`)

Updated `WRAP_UP_NOTE` to request a `failedApproaches` array alongside `progress` and `remaining`:

```json
{
  "checkpoint": {
    "progress": "...",
    "remaining": "...",
    "failedApproaches": ["downloading subfinder via curl from GitHub releases — connection reset", "..."]
  }
}
```

When resuming after a handoff, the agent loop now appends a system note to the resume message listing every failed approach from the previous run:

```js
let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
  resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — do not repeat them:\n${run.checkpoint.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
}
```

The next run starts with concrete knowledge of what not to try, rather than repeating the same failed strategies.

### 3. Iteration budget awareness (`src/server/agent.js`)

Each model request now includes the remaining iteration count when 5 or fewer iterations are left in the run. The count is appended to the prepared messages (never stored in `session.messages`):

```js
const iterationsLeft = config.maxIterations - iteration + 1;
const preparedMessages = iterationsLeft <= 5
  ? [...base, { role: 'user', content: `[System: ${iterationsLeft} iteration${iterationsLeft === 1 ? '' : 's'} remaining in this run. Budget your remaining steps accordingly — if you cannot finish in time, consolidate progress and provide a checkpoint.]` }]
  : base;
```

This gives the model enough warning to consolidate and produce a clean checkpoint rather than being cut off mid-task.

### 4. `perplexity_search` description updated (`src/server/tools.js`)

Added explicit usage guidance to the tool description:

> Use sparingly — at most 3 searches per topic. Do not repeat the same query with minor variations; if an initial search does not yield what you need, switch to a different approach or verify locally with exec.

### 5. "Failure Recovery" section added to system prompt (`docs/system-prompt.md`)

Added a dedicated `## Failure Recovery` section making the give-up rule explicit:

- Retry at most once with a meaningfully different approach; if it fails again, report to the user
- Never repeat a failed strategy with minor variations
- Use `perplexity_search` at most 3 times per topic
- Escalate cleanly with a useful failure report rather than looping

---

## What Was Not Changed

- The existing exact-match loop detector (`loopTracker`) — it remains and now works alongside the new consecutive failure detector
- The checkpoint/handoff system, `maxHandoffs` limit, and tool history strip logic — unchanged
- The `format_error` recovery path (fallback model + nudge retry) — unchanged
- No per-session Perplexity call counter was added server-side; the guidance is model-facing

---

## Note: Homebrew as an Alternative for Binary Tool Installation

During the session that exposed these issues, the agent struggled to install Nuclei, Subfinder, and Naabu from GitHub releases. All three tools from [ProjectDiscovery](https://projectdiscovery.io/) are available via Homebrew:

```
brew install nuclei
brew install subfinder
brew install naabu
```

Homebrew handles binary verification, PATH setup, and version management automatically — far more reliably than manual `curl` downloads or `go install`. On macOS (and Linux via Linuxbrew), this is the recommended installation method. The agent could discover this via `perplexity_search` or by checking `brew search projectdiscovery` — but only if it knows to try Homebrew _before_ attempting manual downloads.

This is a guidance gap rather than a code bug: the system prompt doesn't mention package managers as a preferred strategy for binary installation. A future improvement could add: "When installing CLI tools, check for a package manager installation first (`brew install`, `apt install`, `snap install`) before attempting manual downloads."

---

## Outcome

- Failure loops are now intercepted after 3 consecutive failures instead of running to exhaustion
- Handoff runs start with knowledge of what has already failed, enabling genuine strategy pivots
- The model receives iteration budget warnings before hitting the wrap-up call
- Perplexity search overuse is constrained by both the tool description and system prompt guidance
- The system prompt now has an explicit rule for when to stop retrying and report

@@ -1,128 +0,0 @@

# Finding 005: 60-Second Exec Timeout Breaks Package Installation

**Date:** 2026-02-27
**Severity:** High — any package installation via exec will time out, regardless of which package manager is used
**Status:** Fixed

---

## What Happened

In the session analysed in Finding 004, the agent attempted to install Nuclei, Subfinder, and Naabu using several strategies — `go install`, direct `curl` downloads, tarball extraction. All either timed out or failed with network errors.

The natural follow-up question was: would switching to a proper package manager (`brew`, `apt-get`) solve the problem? The answer is no — not without fixing the timeout first.

---

## Root Cause

### The `exec` tool has a hard 60-second timeout

```js
// exec seed tool
const { stdout, stderr } = await execAsync(args.cmd, {
  encoding: 'utf8',
  timeout: 60000, // ← 60 seconds
  maxBuffer: 2 * 1024 * 1024,
});
```

On top of that, `executeTool` wraps every tool in its own `Promise.race` against `TOOL_TIMEOUT_MS = 60_000`. This means even if a tool's internal `execAsync` had a longer timeout, the outer race would kill it at 60 seconds anyway.
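
The stacking is easy to illustrate with a minimal sketch — a hypothetical `withTimeout` helper; the real `executeTool` builds the race inline:

```javascript
// Illustrative sketch of the outer guard described above — hypothetical
// helper name, not the actual executeTool implementation.
const TOOL_TIMEOUT_MS = 60_000;

function withTimeout(promise, ms, name) {
  // Whichever settles first wins the race; the loser is simply ignored.
  // (Timer cleanup is omitted for brevity.)
  const timeout = new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`Tool '${name}' timed out after ${ms / 1000}s`)), ms)
  );
  return Promise.race([promise, timeout]);
}

// A 5-minute inner operation still fails at the outer 60-second limit:
//   await withTimeout(execAsync(cmd, { timeout: 300_000 }), TOOL_TIMEOUT_MS, 'exec')
// rejects after ~60s regardless of the longer inner timeout.
```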

Package installation routinely takes longer than 60 seconds:

| Operation | Typical duration |
|---|---|
| `apt-get update` | 15–60s (varies with server speed, mirror load) |
| `apt-get install nuclei` | 10–60s (binary download + extraction) |
| `apt-get update && apt-get install nuclei` | 30–120s combined |
| `brew install nuclei` | 20–90s |
| `go install github.com/...` | 60–180s (compilation) |

So `go install` was almost guaranteed to time out. `apt-get` with an update step would regularly exceed 60s too. Swapping the install method without fixing the timeout would not have solved the problem.

### The `npm_install` tool has the same problem

`npm_install` also uses a 60-second timeout — it would fail for large packages or slow networks for the same reason.

### No per-tool timeout mechanism existed

All tools shared the same `TOOL_TIMEOUT_MS` constant. There was no way to declare that a specific tool legitimately needed more time.

---

## Fix

### 1. Per-tool timeout override (`src/server/tools.js`)

`executeTool` now checks for a `timeout` property on the tool definition and uses it instead of the global `TOOL_TIMEOUT_MS`:

```js
const timeoutMs = tool.timeout || TOOL_TIMEOUT_MS;

const timeout = new Promise((_, reject) =>
  setTimeout(
    () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
    timeoutMs
  )
);
```

Tools that don't declare a timeout continue to use the 60s default — no behaviour change for existing tools.

### 2. `system_install` seed tool with 5-minute timeout (`src/server/tools.js`)

A new built-in tool handles system binary installation:

```js
system_install: {
  timeout: 300_000, // 5 minutes
  ...
}
```

Behaviour:
- **Already-installed check**: runs `which <package>` first. If the binary is already on PATH, returns immediately without installing — no wasted time.
- **Auto-detection**: tries `brew`, `apt-get`, `snap` in order. Uses the first one found. Can be overridden with an explicit `packageManager` argument.
- **apt-get**: always runs `apt-get update -qq` before `apt-get install -y` to avoid stale package list failures. Uses `DEBIAN_FRONTEND=noninteractive` to suppress interactive prompts.
- **Inner timeout**: 4.5 minutes on the internal `execAsync`, leaving headroom before the outer 5-minute tool timeout fires.
- **Structured errors**: returns `exitCode`, `stdout`, `stderr` on failure so the agent can read the actual error without guessing.

### 3. System prompt updated (`docs/system-prompt.md`)

Added explicit guidance in the "Tool Creation" section:

> **Installing a system binary** (e.g. nuclei, jq, ffmpeg, git): use the `system_install` tool — never use exec for this. It auto-detects the available package manager (brew/apt-get/snap) and has a 5-minute timeout sized for real downloads.

This mirrors the existing `npm_install` guidance and gives the model an explicit directive to reach for `system_install` over `exec`.

---

## Why `system_install` Instead of Increasing the Global Timeout

Increasing `TOOL_TIMEOUT_MS` globally would make every tool — including `exec` — hang for much longer before failing. A runaway `exec` command (e.g. `find / -name "*.js"`) that produces no output would block the event loop for 5 minutes instead of 60 seconds. The per-tool timeout keeps the safe default for general tools while letting installation tools declare a legitimate need for more time.

---

## Notes on Package Availability

Not all tools are available in every package manager. For the ProjectDiscovery tools (nuclei, subfinder, naabu) specifically:

- **macOS (brew)**: `brew install nuclei`, `brew install subfinder`, `brew install naabu` — all available via the official Homebrew formulae
- **Linux (apt-get)**: ProjectDiscovery does not maintain an official apt repository. `apt-get install nuclei` will likely fail with "package not found"
- **Linux (snap)**: `snap install nuclei` is available on Ubuntu/Debian

On Linux, if `system_install` fails because the package isn't in the apt repository, the agent should try snap, or fall back to a direct download from the ProjectDiscovery GitHub releases:

```sh
curl -sL https://github.com/projectdiscovery/nuclei/releases/latest/download/nuclei_linux_amd64.zip -o nuclei.zip
```

This would still use `exec` for the download, but with the failure recovery guidance from Finding 004 the agent should report the failure rather than looping.

---

## Outcome

- Package installation no longer races against a 60-second wall
- The agent has a clear, named tool to reach for when installing system binaries
- The system prompt actively directs the agent away from `exec` for this use case
- Per-tool timeouts are now supported for any future tools that need them

@@ -1,118 +0,0 @@

# Finding 006: Malformed Tool Schema Poisons Every Subsequent Request

**Date:** 2026-02-27
**Severity:** High — permanently breaks all model calls for the session until tools.json is manually repaired
**Status:** Fixed

---

## What Happened

During a session installing security tools (Nuclei, Subfinder, Naabu), the agent called `save_tool` to create a custom `scan` tool. The model passed the `parameters` field as a **JSON-serialized string** instead of a JSON object:

```json
"parameters": "{\"type\":\"object\",\"properties\":{\"domain\":{\"type\":\"string\",...}}}"
```

`save_tool` stored this verbatim — no validation occurred. The malformed tool definition was written to `~/.jarvis/data/tools/tools.json` as tool index 12.

On the very next model call (within the same run), Jarvis reloaded tools after `save_tool` completed (`toolsModified = true`) and sent all tool definitions to the provider. OpenRouter's provider API returned:

```
400 Provider returned error
[{'type': 'dict_type', 'loc': ('body', 'tools', 12, 'function', 'parameters'),
'msg': 'Input should be a valid dictionary', 'input': '{"type":"object",...}'}]
```

Every subsequent user message — including trivial ones like "Was ist schief gegangen?" ("What went wrong?") and "Wie ist deine session id" ("What is your session id") — also failed with the same 400. The malformed tool was permanently in `tools.json` and included in every model request.

---

## Root Cause

### 1. `save_tool` did not validate `parameters`

The `save_tool` code stored `args.parameters` directly into `tools.json` without checking its type. The OpenAI tool-calling spec requires `parameters` to be a JSON Schema object (a dictionary). When a model passes a JSON string instead of an object — a common mistake with weaker models — the result is a permanently malformed tool definition.

### 2. `getToolDefinitions` sent all tools to the provider without validation

`getToolDefinitions` returned all tool definitions unconditionally. A single malformed tool poisoned every request that included the tools list — which is every request, since tool definitions are always sent.

These two gaps compound each other: the first allows bad data in, the second ensures it breaks everything downstream.

---

## Why It Persists Across All Subsequent Messages

Every `handleChat` call loads tools fresh from `tools.json` via `loadTools()`. The malformed `scan` tool is always in the list. Every model call sends all tool definitions. The provider rejects every request. The session is stuck until `tools.json` is manually repaired.

---

## Intermittent Behaviour

The failure is not always immediate. In the debugging session, two tool calls succeeded in later runs before the 400 fired. This is consistent with free/preview model providers (nvidia via OpenRouter) applying schema validation inconsistently across backend instances. The bug is therefore not "always broken" but **reliably broken under load** — which is harder to detect and debug than a consistent failure.

---

## Fix

Two targeted changes to `src/server/tools.js`.

### 1. `save_tool` validates and auto-corrects `parameters`

Before writing to `tools.json`, the `save_tool` code now:

- If `parameters` is a string, attempts `JSON.parse()` — models commonly serialize the object when they should pass it directly. If parsing succeeds and yields an object, the corrected value is used silently.
- If `parameters` is still not a plain object after the attempted parse, returns an error immediately with a clear message. Nothing is written to disk.

```js
let parameters = args.parameters;
if (typeof parameters === 'string') {
  try {
    parameters = JSON.parse(parameters);
  } catch {
    return { status: 'error', error: 'parameters must be a JSON Schema object, not a string. Pass the object directly, not as a JSON-serialized string.' };
  }
}
if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) {
  return { status: 'error', error: 'parameters must be a JSON Schema object (e.g. { type: "object", properties: {...} }).' };
}
```

### 2. `getToolDefinitions` filters out malformed tools

`getToolDefinitions` now validates `parameters` on each tool before including it in the definitions sent to the provider. A tool with a non-object `parameters` is skipped with a `console.warn`, not thrown — this is a defence-in-depth guard, not a primary error path.

```js
export function getToolDefinitions(tools) {
  const defs = [];
  for (const [name, t] of Object.entries(tools)) {
    const params = t.definition?.function?.parameters;
    if (typeof params !== 'object' || params === null || Array.isArray(params)) {
      console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
      continue;
    }
    defs.push(t.definition);
  }
  return defs;
}
```

Together: Fix 1 prevents malformed tools from entering `tools.json`. Fix 2 ensures that even if a malformed tool somehow ends up in `tools.json` (e.g. from an older version, a manual edit, or a bug that slips through), it is silently excluded from every model request rather than poisoning the session.
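
A small self-contained sketch makes the guard concrete. The function body repeats the fix (minus the `export`), and the two tool entries are hypothetical — one valid, one with the string-typed `parameters` that triggered this finding:

```javascript
// Copy of the getToolDefinitions guard from the fix, without the export.
function getToolDefinitions(tools) {
  const defs = [];
  for (const [name, t] of Object.entries(tools)) {
    const params = t.definition?.function?.parameters;
    if (typeof params !== 'object' || params === null || Array.isArray(params)) {
      console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
      continue;
    }
    defs.push(t.definition);
  }
  return defs;
}

// Hypothetical tools map: `echo` is well-formed, `scan` carries the
// JSON-serialized-string parameters from the original failure.
const tools = {
  echo: {
    definition: {
      type: 'function',
      function: {
        name: 'echo',
        parameters: { type: 'object', properties: { text: { type: 'string' } } },
      },
    },
  },
  scan: {
    definition: {
      type: 'function',
      function: {
        name: 'scan',
        parameters: '{"type":"object","properties":{}}', // malformed: string, not object
      },
    },
  },
};

const defs = getToolDefinitions(tools);
console.log(defs.length); // 1 — only `echo` survives; `scan` is skipped with a warning
```

The provider never sees the malformed entry, so the 400 from the original session cannot recur from a stale `tools.json`.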

---

## Secondary Issue: Agent Hallucinated Successful Action

In the run that preceded the `save_tool` call, the agent responded with:

> "The scanning script has been created at /root/.jarvis/projects/cybersecurity/scan.sh."

No such file was ever created — the agent had only installed Nuclei successfully. This hallucination forced the next run to attempt recovery via `save_tool`, which is where the malformed tool was introduced. The hallucination itself is a model-quality issue with the free nvidia model, not a Jarvis bug.

---

## Outcome

- `save_tool` auto-corrects the common case (string instead of object) and rejects the rest with a clear error before writing to disk
- A pre-existing malformed tool in `tools.json` no longer poisons model requests — it is silently skipped per call
- Sessions are no longer permanently broken by a single bad `save_tool` call