@ducci/jarvis 1.0.22 → 1.0.24
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/docs/findings/006-malformed-tool-schema.md +118 -0
- package/docs/findings/007-telegram-errors-and-handoff-stalling.md +271 -0
- package/docs/system-prompt.md +1 -0
- package/package.json +1 -1
- package/src/channels/telegram/index.js +9 -4
- package/src/server/agent.js +39 -4
- package/src/server/sessions.js +1 -0
- package/src/server/tools.js +11 -2
@@ -0,0 +1,118 @@
# Finding 006: Malformed Tool Schema Poisons Every Subsequent Request

**Date:** 2026-02-27
**Severity:** High — permanently breaks all model calls for the session until tools.json is manually repaired
**Status:** Fixed

---

## What Happened

During a session installing security tools (Nuclei, Subfinder, Naabu), the agent called `save_tool` to create a custom `scan` tool. The model passed the `parameters` field as a **JSON-serialized string** instead of a JSON object:

```json
"parameters": "{\"type\":\"object\",\"properties\":{\"domain\":{\"type\":\"string\",...}}}"
```

`save_tool` stored this verbatim — no validation occurred. The malformed tool definition was written to `~/.jarvis/data/tools/tools.json` as tool index 12.

On the very next model call (within the same run), Jarvis reloaded tools after `save_tool` completed (`toolsModified = true`) and sent all tool definitions to the provider. OpenRouter's provider API returned:

```
400 Provider returned error
[{'type': 'dict_type', 'loc': ('body', 'tools', 12, 'function', 'parameters'),
 'msg': 'Input should be a valid dictionary', 'input': '{"type":"object",...}'}]
```

+
Every subsequent user message — including trivial ones like "Was ist schief gegangen?" ("What went wrong?") and "Wie ist deine session id" ("What is your session id") — also failed with the same 400. The malformed tool was permanently in `tools.json` and included in every model request.

---

## Root Cause

### 1. `save_tool` did not validate `parameters`

The `save_tool` code stored `args.parameters` directly into `tools.json` without checking its type. The OpenAI tool-calling spec requires `parameters` to be a JSON Schema object (a dictionary). When a model passes a JSON string instead of an object — a common mistake with weaker models — the result is a permanently malformed tool definition.

### 2. `getToolDefinitions` sent all tools to the provider without validation

`getToolDefinitions` returned all tool definitions unconditionally. A single malformed tool poisoned every request that included the tools list — which is every request, since tool definitions are always sent.

These two gaps compound each other: the first allows bad data in, the second ensures it breaks everything downstream.

---

## Why It Persists Across All Subsequent Messages

Every `handleChat` call loads tools fresh from `tools.json` via `loadTools()`. The malformed `scan` tool is always in the list. Every model call sends all tool definitions. The provider rejects every request. The session is stuck until `tools.json` is manually repaired.

---

## Intermittent Behavior

The failure is not always immediate. In the debugging session, two tool calls succeeded in later runs before the 400 fired. This is consistent with free/preview model providers (nvidia via OpenRouter) applying schema validation inconsistently across backend instances. The bug is therefore not "always broken" but **reliably broken under load** — which is harder to detect and debug than a consistent failure.

---

## Fix

Two targeted changes to `src/server/tools.js`.

### 1. `save_tool` validates and auto-corrects `parameters`

Before writing to `tools.json`, the `save_tool` code now:

- If `parameters` is a string, attempts `JSON.parse()` — models commonly serialize the object when they should pass it directly. If parsing succeeds and yields an object, the corrected value is used silently.
- If `parameters` is still not a plain object after the attempted parse, returns an error immediately with a clear message. Nothing is written to disk.

```js
let parameters = args.parameters;
if (typeof parameters === 'string') {
  try {
    parameters = JSON.parse(parameters);
  } catch {
    return { status: 'error', error: 'parameters must be a JSON Schema object, not a string. Pass the object directly, not as a JSON-serialized string.' };
  }
}
if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) {
  return { status: 'error', error: 'parameters must be a JSON Schema object (e.g. { type: "object", properties: {...} }).' };
}
```
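
The same validation logic can be exercised in isolation. A minimal sketch — the `normalizeParameters` helper name is illustrative; the shipped fix inlines this logic in the `save_tool` code string:

```javascript
// Hypothetical standalone helper mirroring the save_tool validation logic.
function normalizeParameters(parameters) {
  // Auto-correct the common failure: a JSON-serialized string instead of an object.
  if (typeof parameters === 'string') {
    try {
      parameters = JSON.parse(parameters);
    } catch {
      return null; // unparseable string → reject
    }
  }
  // Reject anything that is not a plain object (null and arrays included).
  if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) {
    return null;
  }
  return parameters;
}

console.log(normalizeParameters('{"type":"object","properties":{}}')); // auto-corrected to an object
console.log(normalizeParameters('[1,2]')); // null — an array is not a schema object
console.log(normalizeParameters({ type: 'object' })); // passed through unchanged
```

Note that the auto-correct path is deliberately lenient: a stringified-but-valid schema is silently repaired rather than rejected, since rejecting it would just make weak models retry with the same mistake.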

### 2. `getToolDefinitions` filters out malformed tools

`getToolDefinitions` now validates `parameters` on each tool before including it in the definitions sent to the provider. A tool with non-object `parameters` is skipped with a `console.warn`, not thrown — this is a defence-in-depth guard, not a primary error path.

```js
export function getToolDefinitions(tools) {
  const defs = [];
  for (const [name, t] of Object.entries(tools)) {
    const params = t.definition?.function?.parameters;
    if (typeof params !== 'object' || params === null || Array.isArray(params)) {
      console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
      continue;
    }
    defs.push(t.definition);
  }
  return defs;
}
```
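
Applied to a tools map containing a malformed entry like the one from this finding, the guard drops only the bad tool. A sketch with sample data (the `get_time` tool and these definitions are illustrative, not the actual `tools.json` contents):

```javascript
// getToolDefinitions repeated here so the snippet runs standalone (same body as above).
function getToolDefinitions(tools) {
  const defs = [];
  for (const [name, t] of Object.entries(tools)) {
    const params = t.definition?.function?.parameters;
    if (typeof params !== 'object' || params === null || Array.isArray(params)) {
      console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
      continue;
    }
    defs.push(t.definition);
  }
  return defs;
}

// One valid tool and one with string parameters — the exact bug from this finding.
const tools = {
  get_time: { definition: { type: 'function', function: { name: 'get_time', parameters: { type: 'object', properties: {} } } } },
  scan: { definition: { type: 'function', function: { name: 'scan', parameters: '{"type":"object"}' } } },
};

console.log(getToolDefinitions(tools).map((d) => d.function.name)); // → [ 'get_time' ]
```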

Together: Fix 1 prevents malformed tools from entering `tools.json`. Fix 2 ensures that even if a malformed tool somehow ends up in `tools.json` (e.g. from an older version, a manual edit, or a bug that slips through), it is silently excluded from every model request rather than poisoning the session.

---

## Secondary Issue: Agent Hallucinated Successful Action

In the run that preceded the `save_tool` call, the agent responded with:

> "The scanning script has been created at /root/.jarvis/projects/cybersecurity/scan.sh."

No such file was ever created — the agent had only installed Nuclei successfully. This hallucination forced the next run to attempt recovery via `save_tool`, which is where the malformed tool was introduced. The hallucination itself is a model-quality issue with the free nvidia model, not a Jarvis bug.

---

## Outcome

- `save_tool` auto-corrects the common case (string instead of object) and rejects the rest with a clear error before writing to disk
- A pre-existing malformed tool in `tools.json` no longer poisons model requests — it is silently skipped per call
- Sessions are no longer permanently broken by a single bad `save_tool` call
@@ -0,0 +1,271 @@

# Finding 007: Telegram Error Opacity, Empty Responses, and Handoff Stalling

**Date:** 2026-02-28
**Severity:** High — caused "Sorry, something went wrong" with no context, silent empty responses, and 40+ wasted iterations on a stuck task
**Status:** Fixed

---

## Observed Session

The session (`fdb3fb46`) ran on a Linux server using `nvidia/nemotron-3-nano-30b-a3b:free`. The user asked Jarvis to implement a cybersecurity scanning project and run a scan against `https://dviet.de`. The session produced 10 agent runs over approximately 2 hours, including 4 consecutive handoff runs that made no real progress, two "Sorry, something went wrong" errors in Telegram, and one completely silent (empty) Telegram message.

---

## What Happened — Full Run Sequence

| Run | Trigger | Status | Telegram received |
|-----|---------|--------|-------------------|
| 1 | "Hi" | ok | "Hello Duc! 👋" |
| 2 | "Weißt du wo das cybersecurity Projekt liegt?" ("Do you know where the cybersecurity project is?") | ok (9 iterations) | Location found |
| 3 | "Kannst du das readme lesen..." ("Can you read the readme...") | ok | README analysis |
| 4 | "Yes implement the missing pieces..." | **format_error** | **Empty message (silent)** |
| 5 | "What exactly went wrong?" (handoff 1) | checkpoint_reached | — (internal handoff loop) |
| 6 | handoff 2 | checkpoint_reached | — (internal) |
| 7 | handoff 3 | checkpoint_reached | — (internal) |
| 8 | handoff 4 | **model_error** | **"Sorry, something went wrong: ..."** |
| 9 | "What is your session id" | ok | Session ID |
| 10 | "Do exec ls" | **model_error** | **"Sorry, something went wrong: ..."** |

---

## Issue 1: Generic "Sorry, something went wrong" with no context

### What happened

Runs 8 and 10 failed with `model_error: Empty choices array` — the nvidia free model returned a response with `choices: []`, producing no content at all. When `handleChat` threw an error, the Telegram channel's catch block sent:

```
Sorry, something went wrong. Please try again.
```

This is maximally unhelpful. The user has no idea whether the model failed, whether a tool crashed, whether the session is broken, or whether retrying will help.

### Root cause

The catch block in `src/channels/telegram/index.js` used a hardcoded string regardless of the actual error:

```js
} catch (e) {
  console.error(`[telegram] agent error chat_id=${chatId}: ${e.message}`);
  await ctx.reply('Sorry, something went wrong. Please try again.');
  clearInterval(typingInterval);
  return;
}
```

The error message was logged to `console.error` (only visible in server logs, not to the user) and discarded.

### Fix (`src/channels/telegram/index.js`)

Pass `e.message` to the user reply:

```js
const errText = e.message
  ? `Sorry, something went wrong: ${e.message}`
  : 'Sorry, something went wrong. Please try again.';
await ctx.reply(errText).catch(() => {});
```

The `.catch(() => {})` guards against a second failure when the Telegram API itself is unreachable — without it, a failed `ctx.reply` inside the catch block would throw an unhandled rejection.

---

## Issue 2: Empty Telegram message on `format_error`

### What happened

Run 4 ended with `format_error` — the model produced a non-JSON final response and all three recovery attempts (fallback model, nudge retry) also failed. The agent returned with `response: ""` (empty string). The Telegram handler then called:

```js
const text = result.response; // ""
await ctx.reply(text); // ctx.reply("") — Telegram silently rejects empty messages
```

The user saw nothing. No error, no confirmation, no indication that anything had happened. From their perspective the message was sent and never received a reply.

### Root cause

The delivery block in `src/channels/telegram/index.js` used `result.response` directly without guarding against empty or null values. When `format_error` returns an empty string, `ctx.reply("")` is called and Telegram's API rejects it silently (HTTP 400 from Telegram, swallowed by grammy).

Additionally, the error log in the delivery catch block used `result.response.length`, which would throw a `TypeError` if `result.response` was `null` rather than `""`.

### Fix (`src/channels/telegram/index.js`)

Guard with `?.trim()` and a fallback message:

```js
const text = result.response?.trim()
  || 'The agent encountered an error and could not produce a response. Please try again.';
```
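
The one-expression guard covers all three failure shapes. A sketch (`pick` is an illustrative name; the fix uses the expression inline):

```javascript
const FALLBACK = 'The agent encountered an error and could not produce a response. Please try again.';

// Same expression as the fix: optional chaining handles null/undefined,
// trim() makes whitespace-only strings falsy, || supplies the fallback.
const pick = (response) => response?.trim() || FALLBACK;

console.log(pick(null) === FALLBACK);  // true — null short-circuits to undefined
console.log(pick('') === FALLBACK);    // true — '' is falsy after trim
console.log(pick('   ') === FALLBACK); // true — whitespace-only trims to ''
console.log(pick(' done '));           // → 'done'
```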

Also updated the delivery catch block to not reference `result.response.length` (which crashes on null).

---

## Issue 3: `failedApproaches` memory lost across handoff runs

### What happened

Runs 5, 6, 7, and 8 were all consecutive handoff runs triggered by a single user message ("What exactly went wrong?"). Each run correctly produced a `failedApproaches` array in its checkpoint — for example, "nuclei scan command timed out". However, the resume message for the next run was built using only the **current run's** `failedApproaches`:

```js
if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
  resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — ...]`
  // ↑ only the last run's failures, not all previous runs
}
```

Run 6 started knowing about run 5's failures. Run 7 started knowing about run 6's failures — but had forgotten run 5's. Run 8 started knowing about run 7's failures — but had forgotten runs 5 and 6. The model could only see one run of history at a time and kept rediscovering and re-attempting strategies it had already tried.

The session JSONL log shows runs 5, 6, 7, and 8 all executing nearly identical tool call sequences:

- `nuclei -update-templates`
- `nuclei -h`
- `mkdir -p results/dviet.de`
- `which node`
- `ls -al /usr/local/share/nuclei` → error (directory doesn't exist)
- `echo -e '#!/bin/bash...'` → broken scan.sh
- `nuclei -silent ... -u https://dviet.de` → **60s timeout**
- `nmap -sV -p 80,443 dviet.de` → host down
- `nuclei -silent -t http-title ...` → **60s timeout**

32 tool calls in run 8 alone, across only 3 model iterations — the model was dumping the same 10-call batch per iteration, learning nothing.

### Fix (`src/server/agent.js`, `src/server/sessions.js`)

**1.** Added `failedApproaches: []` to `session.metadata` in `createSession()`. This gives old sessions a graceful upgrade path (the field will be initialized to `[]` on the first new user message that resets handoff state).

**2.** On every new user message, `failedApproaches` is reset alongside `handoffCount`:

```js
session.metadata.handoffCount = 0;
session.metadata.failedApproaches = [];
```

**3.** After each `checkpoint_reached` run, the current run's failures are pushed onto the session-level accumulator:

```js
if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
  if (!session.metadata.failedApproaches) session.metadata.failedApproaches = [];
  session.metadata.failedApproaches.push(...run.checkpoint.failedApproaches);
}
```

**4.** The resume message uses the full accumulated list instead of just the last run's:

```js
const allFailedApproaches = session.metadata.failedApproaches || [];
if (allFailedApproaches.length > 0) {
  resumeContent += `\n\n[System: The following approaches were tried and failed in previous runs — do not repeat them:\n${allFailedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
}
```

The message changed from "in the **previous run**" to "in **previous runs**" to accurately reflect the scope.
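
The difference between the two scopes can be seen in a small simulation (sample failure strings; the run objects mimic the checkpoint shape):

```javascript
// Checkpoints from three consecutive handoff runs (sample data).
const runs = [
  { checkpoint: { failedApproaches: ['nuclei scan command timed out'] } },
  { checkpoint: { failedApproaches: ['nmap reported host down'] } },
  { checkpoint: { failedApproaches: [] } },
];

// Session-level accumulator (the fix): every run's failures survive.
const session = { metadata: { failedApproaches: [] } };
for (const run of runs) {
  const failed = run.checkpoint.failedApproaches || [];
  if (failed.length > 0) session.metadata.failedApproaches.push(...failed);
}

console.log(session.metadata.failedApproaches.length); // → 2, both earlier failures retained

// Per-run view (the bug): each resume only ever saw the last run's single failure.
console.log(runs[1].checkpoint.failedApproaches.length); // → 1
```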

---

## Issue 4: Zero-progress handoffs not detected

### What happened

Runs 5, 6, and 7 all hit `checkpoint_reached` with nearly identical `remaining` fields. Each run used 10 full iterations yet made no real progress — the `remaining` list after run 7 was essentially the same as after run 5. The handoff loop continued spawning new runs until `maxHandoffs` was hit, burning 30 more iterations and about 90 minutes of wall time.

The existing `maxHandoffs` limit (default 5) is the only backstop, but it doesn't distinguish between runs that make real progress and runs that achieve nothing. It allows up to 5 useless handoffs before stopping.

### Root cause

No comparison was made between consecutive `checkpoint.remaining` values. The handoff loop always continued as long as `handoffCount <= maxHandoffs`, with no check that the agent had actually made forward progress.

### Fix (`src/server/agent.js`)

Introduced a `previousRemaining` variable in the handoff loop. Before each handoff continuation, the current `checkpoint.remaining` (trimmed) is compared against the previous run's value. If they are identical, the session is stopped immediately with `intervention_required`:

```js
let previousRemaining = null;

// ... inside handoff loop, after checkpoint_reached:
const currentRemaining = (run.checkpoint.remaining || '').trim();
if (previousRemaining !== null && currentRemaining === previousRemaining) {
  finalStatus = 'intervention_required';
  finalLogSummary = 'Zero progress detected: task state unchanged after a full run. Human intervention required.';
  // log, strip, break
}
previousRemaining = currentRemaining;
```

This fires on the **second** handoff with identical remaining — one repeat run is allowed because the model may be working on the same items in a different order. Identical remaining on two consecutive runs means it is genuinely stuck.
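
A hypothetical simulation of the check across a sequence of `remaining` values (`detectStall` is an illustrative name; the real code runs inline in the handoff loop):

```javascript
// Returns the run index at which zero progress is detected, or -1 if none.
function detectStall(remainings) {
  let previousRemaining = null;
  for (let i = 0; i < remainings.length; i++) {
    const currentRemaining = (remainings[i] || '').trim();
    if (previousRemaining !== null && currentRemaining === previousRemaining) {
      return i; // identical to the previous run after trimming → stuck
    }
    previousRemaining = currentRemaining;
  }
  return -1;
}

console.log(detectStall(['run the scan', 'write report']));     // → -1, progress each run
console.log(detectStall(['run the scan', ' run the scan '])); // → 1, second run changed nothing
```

The trim matters: a checkpoint that differs only by whitespace should still count as zero progress.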

**Effect on the debugging session**: instead of running 4 useless handoffs (runs 5–8), the agent would have stopped at run 6 with `intervention_required`, saving 20 iterations and about 60 minutes.

---

## Issue 5: `echo -e` creates broken shell scripts on Ubuntu

### What happened

The model attempted to create `scan.sh` using:

```sh
echo -e '#!/bin/bash\n# Simple wrapper...\n...'
```

On Ubuntu, the default shell for non-interactive commands is `/bin/dash`, not `/bin/bash`. `dash` does not support `echo -e` — it treats `-e` as a literal argument, so the file started with:

```
-e #!/bin/bash
# Simple wrapper...
```

Every attempt to run `scan.sh` failed with:

```
/root/.jarvis/projects/cybersecurity/scan.sh: 1: -e: not found
```

The model detected the error on the first try, but each subsequent handoff run repeated the exact same broken creation command, presumably because the `echo -e` pattern was deep in the model's training distribution for "create a shell script".

### Root cause

The system prompt's `## exec Safety` section gave guidance about filesystem scans and `cat` vs `grep`, but said nothing about portable file creation. The `echo -e` pattern works on bash but not on sh/dash, and this distinction is invisible to a model working through `exec`.

### Fix (`docs/system-prompt.md`)

Added a bullet to `## exec Safety`:

```
- **Writing multi-line files**: use `printf '...'` or a heredoc (`cat <<'EOF' > file`) instead of
  `echo -e`. The `-e` flag is not portable — on Ubuntu `/bin/sh` it is treated as literal text,
  corrupting the file.
```
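
The recommended patterns in action — a quoted heredoc and `printf` both produce byte-exact files under dash and bash alike (`scan.sh` here is a throwaway example, not the session's actual script):

```shell
# Quoted heredoc: no escape interpretation, identical behavior in sh/dash/bash.
cat <<'EOF' > scan.sh
#!/bin/bash
# Simple wrapper around nuclei
nuclei -u "$1"
EOF

# printf interprets \n portably (unlike echo -e, which dash takes literally).
printf '#!/bin/bash\nnuclei -u "$1"\n' > scan2.sh

head -n 1 scan.sh   # prints: #!/bin/bash
head -n 1 scan2.sh  # prints: #!/bin/bash
```

Quoting the heredoc delimiter (`<<'EOF'`) is deliberate: it also suppresses `$variable` expansion, so the script body lands on disk exactly as written.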

---

## What Was Not Changed

- The `model_error: Empty choices array` path in `runAgentLoop` — this is a provider-side failure, no server-side fix can prevent the model from returning nothing. The fix is at the Telegram delivery layer (surfacing the error to the user), not at the agent layer.
- The `maxHandoffs` limit — it remains as a hard cap. Zero-progress detection fires before `maxHandoffs` in the case of a truly stuck task, so the two mechanisms are complementary.
- The `consecutiveFailures` detector and exact-match loop tracker — unchanged; they work alongside the new zero-progress check.
- No per-session Perplexity call counter was added server-side.

---

## Secondary Observation: nuclei templates were present but never found

Nuclei was installed at `/usr/local/bin/nuclei`. Its templates were at `/root/nuclei-templates` (visible in the `list_dir /root` output from run 2). However, the model never connected these two facts — it searched `/usr/local/share/nuclei`, `/root/.nuclei`, and various `/usr` subdirectories, missing the templates that were right there in `/root`.

Running `nuclei -u https://dviet.de -t /root/nuclei-templates/...` (or just `nuclei -u https://dviet.de` with the templates auto-discovered from `~/nuclei-templates`) would likely have worked. The scan also failed because `nmap` reported "host seems down" for `dviet.de` — the host was either rate-limiting ICMP or the port scan was blocked. Using `nmap -Pn` to skip host discovery would have bypassed this.

Neither of these is a Jarvis bug — they are model reasoning failures specific to this session and this free model. A more capable model would have spotted the template path and the nmap flag.

---

## Outcome

| Fix | Files changed |
|-----|--------------|
| Better error text in Telegram catch block | `src/channels/telegram/index.js` |
| Guard against empty response before `ctx.reply()` | `src/channels/telegram/index.js` |
| Accumulate `failedApproaches` across all handoffs in `session.metadata` | `src/server/agent.js`, `src/server/sessions.js` |
| Zero-progress handoff detection via `previousRemaining` comparison | `src/server/agent.js` |
| Shell script writing guidance (`printf`/heredoc over `echo -e`) | `docs/system-prompt.md` |
package/docs/system-prompt.md
CHANGED

```diff
@@ -52,6 +52,7 @@ The `exec` tool runs real shell commands on the server. Use it responsibly:
 - **Use known paths.** Prefer `process.cwd()`, `$HOME`, or paths you already know over broad searches. Use `which <binary>` to locate executables.
 - **Prefer targeted reads.** Use `grep`, `head`, or `tail` instead of `cat` on files you haven't seen before. Large file output is truncated anyway — a targeted command gives you better signal.
 - **Avoid commands with unbounded runtime.** If a command could run indefinitely or scan an unknown-size tree, scope it first.
+- **Writing multi-line files**: use `printf '...'` or a heredoc (`cat <<'EOF' > file`) instead of `echo -e`. The `-e` flag is not portable — on Ubuntu `/bin/sh` it is treated as literal text, corrupting the file.
 
 ## Failure Recovery
```
package/src/channels/telegram/index.js
CHANGED

```diff
@@ -50,7 +50,10 @@ export async function startTelegramChannel(config) {
       result = await handleChat(config, sessionId, ctx.message.text);
     } catch (e) {
       console.error(`[telegram] agent error chat_id=${chatId}: ${e.message}`);
-      await ctx.reply('Sorry, something went wrong. Please try again.');
+      const errText = e.message
+        ? `Sorry, something went wrong: ${e.message}`
+        : 'Sorry, something went wrong. Please try again.';
+      await ctx.reply(errText).catch(() => {});
       clearInterval(typingInterval);
       return;
     }
@@ -64,7 +67,9 @@ export async function startTelegramChannel(config) {
 
     try {
       const MAX_TG = 4096;
-      const text = result.response;
+      // Guard against empty response (e.g. format_error returns empty string)
+      const text = result.response?.trim()
+        || 'The agent encountered an error and could not produce a response. Please try again.';
       if (text.length <= MAX_TG) {
         await ctx.reply(text);
       } else {
@@ -74,8 +79,8 @@ export async function startTelegramChannel(config) {
       }
       console.log(`[telegram] response sent chat_id=${chatId} length=${text.length}`);
     } catch (e) {
-      console.error(`[telegram] delivery error chat_id=${chatId}
-      await ctx.reply('Sorry, something went wrong. Please try again.');
+      console.error(`[telegram] delivery error chat_id=${chatId}: ${e.message}`);
+      await ctx.reply('Sorry, something went wrong sending the response. Please try again.').catch(() => {});
     } finally {
       clearInterval(typingInterval);
     }
```
|
package/src/server/agent.js
CHANGED

```diff
@@ -381,9 +381,10 @@ async function _runHandleChat(config, sessionId, userMessage) {
     session = createSession(systemPromptTemplate);
   }
 
-  // Append user message and reset handoff
+  // Append user message and reset handoff state
   session.messages.push({ role: 'user', content: userMessage });
   session.metadata.handoffCount = 0;
+  session.metadata.failedApproaches = [];
 
   // Resolves {{user_info}} in system prompt at runtime (never persisted)
   function prepareMessages(messages) {
@@ -399,6 +400,8 @@ async function _runHandleChat(config, sessionId, userMessage) {
   let finalResponse = '';
   let finalLogSummary = '';
   let finalStatus = 'ok';
+  // Tracks checkpoint.remaining from the previous handoff run to detect zero progress
+  let previousRemaining = null;
 
   try {
     // Handoff loop
@@ -449,6 +452,36 @@ async function _runHandleChat(config, sessionId, userMessage) {
         status: 'checkpoint_reached',
       });
 
+      // Accumulate failedApproaches from this run into session metadata so the
+      // full history of failed strategies is available across all handoff runs.
+      if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
+        if (!session.metadata.failedApproaches) session.metadata.failedApproaches = [];
+        session.metadata.failedApproaches.push(...run.checkpoint.failedApproaches);
+      }
+
+      // Zero-progress detection: if checkpoint.remaining is identical to the previous
+      // handoff's remaining, the agent completed a full run without making any progress.
+      // Stop immediately rather than burning more iterations on a stuck task.
+      const currentRemaining = (run.checkpoint.remaining || '').trim();
+      if (previousRemaining !== null && currentRemaining === previousRemaining) {
+        finalResponse = run.response;
+        finalLogSummary = 'Zero progress detected: task state unchanged after a full run. Human intervention required.';
+        finalStatus = 'intervention_required';
+
+        await appendLog(sessionId, {
+          iteration: 0,
+          model: config.selectedModel,
+          userInput: userMessage,
+          toolCalls: [],
+          response: finalResponse,
+          logSummary: finalLogSummary,
+          status: 'intervention_required',
+        });
+        session.messages.splice(runStartIndex, session.messages.length - runStartIndex - 1);
+        break;
+      }
+      previousRemaining = currentRemaining;
+
       // Check handoff limit
       session.metadata.handoffCount++;
       if (session.metadata.handoffCount > config.maxHandoffs) {
@@ -478,10 +511,12 @@ async function _runHandleChat(config, sessionId, userMessage) {
 
       // Resume with checkpoint.remaining as new prompt.
       // Guard against null/undefined in case the model omitted the field.
-      //
+      // Use the full accumulated failedApproaches list across ALL handoff runs so the
+      // agent has complete memory of what has already been tried and failed.
       let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
-      if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
-        resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — ...]`
+      const allFailedApproaches = session.metadata.failedApproaches || [];
+      if (allFailedApproaches.length > 0) {
+        resumeContent += `\n\n[System: The following approaches were tried and failed in previous runs — do not repeat them:\n${allFailedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
       }
       session.messages.push({ role: 'user', content: resumeContent });
     }
```
package/src/server/sessions.js
CHANGED
package/src/server/tools.js
CHANGED

```diff
@@ -149,7 +149,7 @@ const SEED_TOOLS = {
         },
       },
     },
-    code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); tools[args.name] = { definition: { type: 'function', function: { name: args.name, description: args.description, parameters
+    code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); let parameters = args.parameters; if (typeof parameters === 'string') { try { parameters = JSON.parse(parameters); } catch { return { status: 'error', error: 'parameters must be a JSON Schema object, not a string. Pass the object directly, not as a JSON-serialized string.' }; } } if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) { return { status: 'error', error: 'parameters must be a JSON Schema object (e.g. { type: "object", properties: {...} }).' }; } tools[args.name] = { definition: { type: 'function', function: { name: args.name, description: args.description, parameters } }, code: args.code }; await fs.promises.writeFile(toolsFile, JSON.stringify(tools, null, 2), 'utf8'); return { status: 'ok', saved: args.name };`,
   },
   get_tool: {
     definition: {
@@ -424,7 +424,16 @@ export async function loadTools() {
 }
 
 export function getToolDefinitions(tools) {
-
+  const defs = [];
+  for (const [name, t] of Object.entries(tools)) {
+    const params = t.definition?.function?.parameters;
+    if (typeof params !== 'object' || params === null || Array.isArray(params)) {
+      console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
+      continue;
+    }
+    defs.push(t.definition);
+  }
+  return defs;
 }
 
 export async function executeTool(tools, name, toolArgs) {
```