@ducci/jarvis 1.0.20 → 1.0.22
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/docs/findings/004-agent-reliability-improvements.md +162 -0
- package/docs/findings/005-installation-timeout.md +128 -0
- package/docs/system-prompt.md +10 -0
- package/package.json +1 -1
- package/src/scripts/onboarding.js +38 -0
- package/src/server/agent.js +32 -4
- package/src/server/tools.js +127 -2
|
@@ -0,0 +1,162 @@
# Finding 004: Agent Reliability — Failure Loops, Checkpoint Memory, and Iteration Awareness

**Date:** 2026-02-27
**Severity:** High — caused an observed session failure with 54 tool calls, a `format_error` crash, and no useful output after 42 minutes
**Status:** Fixed

---

## What Happened

A session was started to build a cybersecurity project installing three tools: Nuclei, Subfinder, and Naabu. The agent went through 5 handoffs (hitting `maxHandoffs`) and crashed with a `format_error` in the final iteration. Observations:

- **181 exec calls** across 19 agent iterations
- **21 `perplexity_search` calls**, including 11+ in the final iteration alone
- The agent oscillated between download strategies (Docker → `go install` → tarball → direct binary) with no memory of what had already failed
- The existing loop detector never fired because each failed command had slightly different arguments (different URLs, different flags), producing different `callKey` values
- Each handoff resumed with only `checkpoint.remaining` — no record of which approaches had already been exhausted
- The model degraded and eventually produced a non-JSON response (`format_error`), crashing the session

---

## Root Causes

### 1. Loop detection only caught exact-match repetition

The existing `loopTracker` detects when the _exact same_ tool call (name + args + result) is repeated 3 times. It does not detect _semantic_ failure loops: repeated attempts to do the same thing via slightly different commands. In the session, each download attempt used a different URL or flags, so every `callKey` was unique.
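The gap is easy to see in isolation. Below is a minimal sketch of the exact-match mechanism, reconstructed from the description above (the `loopTracker`/`callKey` names follow this doc; the key format and the standalone function are assumptions for illustration):

```js
const LOOP_DETECTION_THRESHOLD = 3;

// Exact-match key: any change to name, args, or result yields a new key.
function makeCallKey(name, args, result) {
  return JSON.stringify([name, args, result]);
}

// Returns true only if some identical (name, args, result) triple
// occurred at least LOOP_DETECTION_THRESHOLD times.
function detectLoop(calls) {
  const loopTracker = new Map();
  for (const { name, args, result } of calls) {
    const key = makeCallKey(name, args, result);
    loopTracker.set(key, (loopTracker.get(key) || 0) + 1);
  }
  return [...loopTracker.values()].some(count => count >= LOOP_DETECTION_THRESHOLD);
}
```

Three identical failed curl calls trip the detector; three failed curl calls with different URLs do not — exactly the blind spot the session hit.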
### 2. Checkpoint carried no memory of failed strategies

`WRAP_UP_NOTE` asked the model for `progress` and `remaining`, but not for a record of what _failed_. Each new run after a handoff started with a blank slate — the model had no way to know that curl downloads, `go install`, and tarball extraction had already been tried and failed. It repeated them.

### 3. No iteration budget awareness

The model had no visibility into how many iterations remained in the current run. It kept taking exploratory steps as if the budget were unlimited, then was surprised by the wrap-up call. A model that knows it has 2 iterations left will consolidate and report; one that doesn't will keep digging.

### 4. `perplexity_search` used without restraint

The tool description had no guidance on usage limits. The model searched Perplexity 21 times in one session (11 in the final iteration), including redundant queries for the same version information. Each search consumed an iteration and added latency without improving outcomes.

### 5. System prompt had no "give up" rule

The system prompt told the agent to "decide whether to retry with a corrected call or explain the failure to the user" — but gave no threshold for when to stop retrying. In practice, the agent always chose to retry, regardless of how many times the same approach had failed.

---

## Fixes

### 1. Consecutive failure detection (`src/server/agent.js`)

Added a `consecutiveFailures` counter that tracks back-to-back failed tool calls across all iterations within a run. A tool call counts as failed if `executeTool` throws (`toolStatus === 'error'`) _or_ the result object has `status === 'error'` (catching exec failures that are returned rather than thrown). A successful call resets the counter to 0.

After each iteration's tool calls, if `consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD` (3), a system break message is injected into the session and the counter resets:

```js
const resultObj = typeof result === 'object' && result !== null ? result : null;
const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
if (toolFailed) {
  consecutiveFailures++;
} else {
  consecutiveFailures = 0;
}
```

```js
if (consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD) {
  session.messages.push({
    role: 'user',
    content: '[System: You have had 3 or more consecutive tool failures. Stop retrying the same approach. Either pivot to a fundamentally different strategy or provide your final response explaining what failed and why.]',
  });
  consecutiveFailures = 0;
}
```

This complements the existing exact-match loop detector — together they cover both identical repetition and semantic failure loops.
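The streak logic above can be exercised in isolation. A simplified, self-contained sketch (one boolean outcome per tool call rather than the real per-iteration check; the function name is hypothetical):

```js
const CONSECUTIVE_FAILURE_THRESHOLD = 3;

// Returns the break messages a run would inject for a sequence of tool
// outcomes, where each outcome is true (failed) or false (succeeded).
function simulateFailureBreaks(outcomes) {
  const injected = [];
  let consecutiveFailures = 0;
  for (const failed of outcomes) {
    consecutiveFailures = failed ? consecutiveFailures + 1 : 0;
    if (consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD) {
      injected.push('[System: 3+ consecutive tool failures — pivot or report.]');
      consecutiveFailures = 0; // reset so the break fires once per streak
    }
  }
  return injected;
}
```

Two failures followed by a success inject nothing; three in a row inject one break, and the reset means a six-failure streak injects exactly two.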
### 2. `failedApproaches` in checkpoint schema (`src/server/agent.js`)

Updated `WRAP_UP_NOTE` to request a `failedApproaches` array alongside `progress` and `remaining`:

```json
{
  "checkpoint": {
    "progress": "...",
    "remaining": "...",
    "failedApproaches": ["downloading subfinder via curl from GitHub releases — connection reset", "..."]
  }
}
```

When resuming after a handoff, the agent loop now appends a system note to the resume message listing every failed approach from the previous run:

```js
let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
  resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — do not repeat them:\n${run.checkpoint.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
}
```

The next run starts with concrete knowledge of what not to try, rather than repeating the same failed strategies.

### 3. Iteration budget awareness (`src/server/agent.js`)

Each model request now includes the remaining iteration count when 5 or fewer iterations are left in the run. The count is appended to the prepared messages (never stored in `session.messages`):

```js
const iterationsLeft = config.maxIterations - iteration + 1;
const preparedMessages = iterationsLeft <= 5
  ? [...base, { role: 'user', content: `[System: ${iterationsLeft} iteration${iterationsLeft === 1 ? '' : 's'} remaining in this run. Budget your remaining steps accordingly — if you cannot finish in time, consolidate progress and provide a checkpoint.]` }]
  : base;
```

This gives the model enough warning to consolidate and produce a clean checkpoint rather than being cut off mid-task.
### 4. `perplexity_search` description updated (`src/server/tools.js`)

Added explicit usage guidance to the tool description:

> Use sparingly — at most 3 searches per topic. Do not repeat the same query with minor variations; if an initial search does not yield what you need, switch to a different approach or verify locally with exec.

### 5. "Failure Recovery" section added to system prompt (`docs/system-prompt.md`)

Added a dedicated `## Failure Recovery` section making the give-up rule explicit:

- Retry at most once with a meaningfully different approach; if it fails again, report to the user
- Never repeat a failed strategy with minor variations
- Use `perplexity_search` at most 3 times per topic
- Escalate cleanly with a useful failure report rather than looping

---

## What Was Not Changed

- The existing exact-match loop detector (`loopTracker`) — it remains and now works alongside the new consecutive-failure detector
- The checkpoint/handoff system, the `maxHandoffs` limit, and the tool-history strip logic — unchanged
- The `format_error` recovery path (fallback model + nudge retry) — unchanged
- No per-session Perplexity call counter was added server-side; the guidance is model-facing

---

## Note: Homebrew as an Alternative for Binary Tool Installation

During the session that exposed these issues, the agent struggled to install Nuclei, Subfinder, and Naabu from GitHub releases. All three tools from [ProjectDiscovery](https://projectdiscovery.io/) are available via Homebrew:

```
brew install nuclei
brew install subfinder
brew install naabu
```

Homebrew handles binary verification, PATH setup, and version management automatically — far more reliably than manual `curl` downloads or `go install`. On macOS (and Linux via Linuxbrew), this is the recommended installation method. The agent could discover this via `perplexity_search` or by checking `brew search projectdiscovery` — but only if it knows to try Homebrew _before_ attempting manual downloads.

This is a guidance gap rather than a code bug: the system prompt doesn't mention package managers as a preferred strategy for binary installation. A future improvement could add: "When installing CLI tools, check for a package manager installation first (`brew install`, `apt install`, `snap install`) before attempting manual downloads."

---

## Outcome

- Failure loops are now intercepted after 3 consecutive failures instead of running to exhaustion
- Handoff runs start with knowledge of what has already failed, enabling genuine strategy pivots
- The model receives iteration budget warnings before hitting the wrap-up call
- Perplexity search overuse is constrained by both the tool description and the system prompt guidance
- The system prompt now has an explicit rule for when to stop retrying and report
|
@@ -0,0 +1,128 @@
# Finding 005: 60-Second Exec Timeout Breaks Package Installation

**Date:** 2026-02-27
**Severity:** High — any package installation via exec will time out, regardless of which package manager is used
**Status:** Fixed

---

## What Happened

In the session analysed in Finding 004, the agent attempted to install Nuclei, Subfinder, and Naabu using several strategies — `go install`, direct `curl` downloads, tarball extraction. All either timed out or failed with network errors.

The natural follow-up question was: would switching to a proper package manager (`brew`, `apt-get`) solve the problem? The answer is no — not without fixing the timeout first.

---

## Root Cause

### The `exec` tool has a hard 60-second timeout

```js
// exec seed tool
const { stdout, stderr } = await execAsync(args.cmd, {
  encoding: 'utf8',
  timeout: 60000, // ← 60 seconds
  maxBuffer: 2 * 1024 * 1024,
});
```

On top of that, `executeTool` wraps every tool in its own `Promise.race` against `TOOL_TIMEOUT_MS = 60_000`. This means that even if a tool's internal `execAsync` had a longer timeout, the outer race would kill it at 60 seconds anyway.
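The outer race is the standard reject-on-timer pattern. A self-contained sketch (the helper name is hypothetical; note that `Promise.race` only abandons the losing promise — it does not cancel the underlying work, so the tool's process keeps running after the rejection):

```js
// Race a tool's promise against a rejecting timer, mirroring the
// error message format used by executeTool.
function withTimeout(promise, timeoutMs, name) {
  const timeout = new Promise((_, reject) =>
    setTimeout(
      () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
      timeoutMs
    )
  );
  return Promise.race([promise, timeout]);
}
```

A tool that resolves before the timer wins the race; one that takes longer rejects with the timeout error regardless of its own internal limits.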
Package installation routinely takes longer than 60 seconds:

| Operation | Typical duration |
|---|---|
| `apt-get update` | 15–60s (varies with server speed, mirror load) |
| `apt-get install nuclei` | 10–60s (binary download + extraction) |
| `apt-get update && apt-get install nuclei` | 30–120s combined |
| `brew install nuclei` | 20–90s |
| `go install github.com/...` | 60–180s (compilation) |

So `go install` was almost guaranteed to time out, and `apt-get` with an update step would regularly exceed 60s too. Swapping the install method without fixing the timeout would not have solved the problem.
### The `npm_install` tool has the same problem

`npm_install` also uses a 60-second timeout — it would fail for large packages or slow networks for the same reason.

### No per-tool timeout mechanism existed

All tools shared the same `TOOL_TIMEOUT_MS` constant. There was no way to declare that a specific tool legitimately needed more time.

---

## Fix

### 1. Per-tool timeout override (`src/server/tools.js`)

`executeTool` now checks for a `timeout` property on the tool definition and uses it instead of the global `TOOL_TIMEOUT_MS`:

```js
const timeoutMs = tool.timeout || TOOL_TIMEOUT_MS;

const timeout = new Promise((_, reject) =>
  setTimeout(
    () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
    timeoutMs
  )
);
```

Tools that don't declare a timeout continue to use the 60s default — no behaviour change for existing tools.

### 2. `system_install` seed tool with 5-minute timeout (`src/server/tools.js`)

A new built-in tool handles system binary installation:

```js
system_install: {
  timeout: 300_000, // 5 minutes
  ...
}
```

Behaviour:

- **Already-installed check**: runs `which <package>` first. If the binary is already on PATH, it returns immediately without installing — no wasted time.
- **Auto-detection**: tries `brew`, `apt-get`, `snap` in order and uses the first one found. Can be overridden with an explicit `packageManager` argument.
- **apt-get**: always runs `apt-get update -qq` before `apt-get install -y` to avoid stale-package-list failures. Uses `DEBIAN_FRONTEND=noninteractive` to suppress interactive prompts.
- **Inner timeout**: 4.5 minutes on the internal `execAsync`, leaving headroom before the outer 5-minute tool timeout fires.
- **Structured errors**: returns `exitCode`, `stdout`, and `stderr` on failure so the agent can read the actual error without guessing.

### 3. System prompt updated (`docs/system-prompt.md`)

Added explicit guidance in the "Tool Creation" section:

> **Installing a system binary** (e.g. nuclei, jq, ffmpeg, git): use the `system_install` tool — never use exec for this. It auto-detects the available package manager (brew/apt-get/snap) and has a 5-minute timeout sized for real downloads.

This mirrors the existing `npm_install` guidance and gives the model an explicit directive to reach for `system_install` over `exec`.

---

## Why `system_install` Instead of Increasing the Global Timeout

Increasing `TOOL_TIMEOUT_MS` globally would make every tool — including `exec` — hang for much longer before failing. A runaway `exec` command (e.g. `find / -name "*.js"`) that produces no output would hang for 5 minutes instead of 60 seconds before being rejected. The per-tool timeout keeps the safe default for general tools while letting installation tools declare a legitimate need for more time.

---

## Notes on Package Availability

Not all tools are available in every package manager. For the ProjectDiscovery tools (nuclei, subfinder, naabu) specifically:

- **macOS (brew)**: `brew install nuclei`, `brew install subfinder`, `brew install naabu` — all available via the official Homebrew formulae
- **Linux (apt-get)**: ProjectDiscovery does not maintain an official apt repository. `apt-get install nuclei` will likely fail with "package not found"
- **Linux (snap)**: `snap install nuclei` is available on Ubuntu/Debian

On Linux, if `system_install` fails because the package isn't in the apt repository, the agent should try snap, or fall back to downloading the release binary directly:

```sh
curl -sL https://github.com/projectdiscovery/nuclei/releases/latest/download/nuclei_linux_amd64.zip -o nuclei.zip
```

This would still use `exec` for the download, but with the failure-recovery guidance from Finding 004 the agent should report a failure rather than looping.

---

## Outcome

- Package installation no longer races against a 60-second wall
- The agent has a clear, named tool to reach for when installing system binaries
- The system prompt actively directs the agent away from `exec` for this use case
- Per-tool timeouts are now supported for any future tools that need them
package/docs/system-prompt.md
CHANGED
@@ -53,12 +53,22 @@ The `exec` tool runs real shell commands on the server. Use it responsibly:
 - **Prefer targeted reads.** Use `grep`, `head`, or `tail` instead of `cat` on files you haven't seen before. Large file output is truncated anyway — a targeted command gives you better signal.
 - **Avoid commands with unbounded runtime.** If a command could run indefinitely or scan an unknown-size tree, scope it first.
 
+## Failure Recovery
+
+When a tool or command fails:
+
+- **Retry at most once** with a meaningfully different approach (different command, different source, different strategy). If it fails a second time, stop and report the failure to the user — do not keep trying variations.
+- **Do not repeat a failed strategy.** If one download method fails, do not re-run it with minor changes. Try an entirely different installation method (e.g. package manager instead of curl), or explain the failure to the user.
+- **Use `perplexity_search` sparingly.** At most 3 searches per topic per session. If the first search didn't give you what you need, try a different query angle once — then stop searching and work with what you have or report the gap.
+- **Escalate cleanly.** If you cannot make progress after two distinct approaches, give the user a clear explanation of what was attempted, what failed, and what they can do manually. A useful failure report is better than an infinite retry loop.
+
 ## Tool Creation
 
 When building a custom tool with `save_tool`:
 
 - **Prefer npm packages** over reimplementing functionality from scratch. If a well-known package exists for the task (e.g. an API SDK, a parser, a utility library), use it.
 - **Installing an npm package**: use the `npm_install` tool — it handles the correct install directory automatically. Then create the tool with `save_tool`. The tool code can `require('<package-name>')` directly.
+- **Installing a system binary** (e.g. nuclei, jq, ffmpeg, git): use the `system_install` tool — never use exec for this. It auto-detects the available package manager (brew/apt-get/snap) and has a 5-minute timeout sized for real downloads.
 - **Available bindings in tool code**: `args`, `fs`, `path`, `process`, `require`, `__jarvisDir` (absolute path to the jarvis server directory).
 
 ## logSummary Guidelines
package/src/scripts/onboarding.js
CHANGED
@@ -300,6 +300,44 @@ async function run() {
     }
   }
 
+  // --- PERPLEXITY STEP (OPTIONAL) ---
+  const existingPerplexityKey = loadEnvVar('PERPLEXITY_API_KEY');
+  const { configurePerplexity } = await inquirer.prompt([
+    {
+      type: 'confirm',
+      name: 'configurePerplexity',
+      message: 'Do you want to configure Perplexity web search?',
+      default: !!existingPerplexityKey
+    }
+  ]);
+
+  if (configurePerplexity) {
+    let keepPerplexityKey = false;
+    if (existingPerplexityKey) {
+      const { keep } = await inquirer.prompt([
+        {
+          type: 'confirm',
+          name: 'keep',
+          message: 'A PERPLEXITY_API_KEY is already configured. Do you want to keep it?',
+          default: true
+        }
+      ]);
+      keepPerplexityKey = keep;
+    }
+    if (!keepPerplexityKey) {
+      const { perplexityKey } = await inquirer.prompt([
+        {
+          type: 'password',
+          name: 'perplexityKey',
+          message: 'Enter your Perplexity API key (from perplexity.ai/settings/api):',
+          validate: (input) => input.trim().length > 0 || 'API key cannot be empty.'
+        }
+      ]);
+      saveEnvVar('PERPLEXITY_API_KEY', perplexityKey.trim());
+      console.log(chalk.green('Perplexity API key saved.'));
+    }
+  }
+
   console.log(chalk.green.bold('\nSetup complete!'));
 }
 
package/src/server/agent.js
CHANGED
@@ -8,6 +8,7 @@ import chalk from 'chalk';
 
 const FORMAT_NUDGE = 'Your previous response was not valid JSON. Respond only with the required JSON object: {"response": "...", "logSummary": "..."}';
 const LOOP_DETECTION_THRESHOLD = 3;
+const CONSECUTIVE_FAILURE_THRESHOLD = 3;
 const MAX_TOOL_RESULT = 4000;
 
 const WRAP_UP_NOTE = `[System: You have reached the iteration limit. This is your final response for this run.
@@ -18,11 +19,12 @@ Respond with your normal JSON, but add a checkpoint field:
   "logSummary": "Human-readable summary of what happened in this run.",
   "checkpoint": {
     "progress": "What has been fully completed so far.",
-    "remaining": "What still needs to be done to finish the task."
+    "remaining": "What still needs to be done to finish the task.",
+    "failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."]
   }
 }
 
-The checkpoint field will be used to automatically resume the task in the next run.]`;
+The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed.]`;
 
 // Serializes concurrent requests for the same session. Maps sessionId to the
 // tail of the current request chain (a Promise that resolves when the last
@@ -81,12 +83,17 @@ async function runAgentLoop(client, config, session, prepareMessages) {
   let response = '';
   let logSummary = '';
   let status = 'ok';
+  let consecutiveFailures = 0;
 
   while (iteration < config.maxIterations) {
     iteration++;
 
     let modelResult;
-    const preparedMessages = prepareMessages(session.messages);
+    const iterationsLeft = config.maxIterations - iteration + 1;
+    const base = prepareMessages(session.messages);
+    const preparedMessages = iterationsLeft <= 5
+      ? [...base, { role: 'user', content: `[System: ${iterationsLeft} iteration${iterationsLeft === 1 ? '' : 's'} remaining in this run. Budget your remaining steps accordingly — if you cannot finish in time, consolidate progress and provide a checkpoint.]` }]
+      : base;
     try {
       modelResult = await callModelWithFallback(client, config, preparedMessages, toolDefs);
     } catch (e) {
@@ -154,6 +161,14 @@ async function runAgentLoop(client, config, session, prepareMessages) {
        toolsModified = true;
      }
 
+      const resultObj = typeof result === 'object' && result !== null ? result : null;
+      const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
+      if (toolFailed) {
+        consecutiveFailures++;
+      } else {
+        consecutiveFailures = 0;
+      }
+
      const resultStr = typeof result === 'string' ? result : JSON.stringify(result);
      runToolCalls.push({ name: toolName, args: toolArgs, status: toolStatus, result: resultStr });
 
@@ -170,6 +185,14 @@ async function runAgentLoop(client, config, session, prepareMessages) {
      loopTracker.set(callKey, (loopTracker.get(callKey) || 0) + 1);
    }
 
+    if (consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD) {
+      session.messages.push({
+        role: 'user',
+        content: '[System: You have had 3 or more consecutive tool failures. Stop retrying the same approach. Either pivot to a fundamentally different strategy or provide your final response explaining what failed and why.]',
+      });
+      consecutiveFailures = 0;
+    }
+
    const loopDetected = [...loopTracker.values()].some(count => count >= LOOP_DETECTION_THRESHOLD);
    if (loopDetected) {
      session.messages.push({
@@ -455,7 +478,12 @@ async function _runHandleChat(config, sessionId, userMessage) {
 
      // Resume with checkpoint.remaining as new prompt.
      // Guard against null/undefined in case the model omitted the field.
-      session.messages.push({ role: 'user', content: run.checkpoint.remaining || 'Continue with the task.' });
+      // Prepend any failed approaches so the next run doesn't repeat them.
+      let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
+      if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
+        resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — do not repeat them:\n${run.checkpoint.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
+      }
+      session.messages.push({ role: 'user', content: resumeContent });
    }
  } catch (e) {
    await appendLog(sessionId, {
package/src/server/tools.js
CHANGED
@@ -221,6 +221,127 @@ const SEED_TOOLS = {
    }
  }
 `,
   },
+  perplexity_search: {
+    definition: {
+      type: 'function',
+      function: {
+        name: 'perplexity_search',
+        description: 'Search the web using Perplexity AI. Returns an answer grounded in real-time web results with citations. Use this for current events, factual lookups, or research questions. Use sparingly — at most 3 searches per topic. Do not repeat the same query with minor variations; if an initial search does not yield what you need, switch to a different approach or verify locally with exec.',
+        parameters: {
+          type: 'object',
+          properties: {
+            query: {
+              type: 'string',
+              description: 'The search query or question.',
+            },
+            model: {
+              type: 'string',
+              enum: ['sonar', 'sonar-pro', 'sonar-deep-research'],
+              description: 'Search model to use. sonar: fast and cheap, good for simple lookups. sonar-pro: deeper multi-step search, more citations, better for complex questions. sonar-deep-research: long-form research reports. Defaults to sonar.',
+            },
+            search_recency_filter: {
+              type: 'string',
+              enum: ['hour', 'day', 'week', 'month', 'year'],
+              description: 'Optional time filter to restrict results to recent content.',
+            },
+          },
+          required: ['query'],
+        },
+      },
+    },
+    code: `
+const OpenAI = require('openai');
+const client = new OpenAI({
+  apiKey: process.env.PERPLEXITY_API_KEY,
+  baseURL: 'https://api.perplexity.ai',
+});
+const params = {
+  model: args.model || 'sonar',
+  messages: [{ role: 'user', content: args.query }],
+};
+if (args.search_recency_filter) params.search_recency_filter = args.search_recency_filter;
+const response = await client.chat.completions.create(params);
+const answer = response.choices[0].message.content;
+const citations = response.citations || [];
+return { answer, citations };
+`,
+  },
+  system_install: {
+    timeout: 300_000, // 5 minutes — package downloads and installs routinely exceed 60s
+    definition: {
+      type: 'function',
+      function: {
+        name: 'system_install',
+        description: 'Install a system binary using the available package manager (brew on macOS, apt-get on Debian/Ubuntu, snap as fallback). Always use this instead of exec for installing system packages — it auto-detects the package manager and has a 5-minute timeout sized for real downloads. If the binary is already on PATH it returns immediately without installing. Examples: nuclei, subfinder, naabu, jq, curl, git.',
+        parameters: {
+          type: 'object',
+          properties: {
+            package: {
+              type: 'string',
+              description: 'The package name to install, e.g. "nuclei", "jq", "curl".',
+            },
+            packageManager: {
+              type: 'string',
+              enum: ['brew', 'apt-get', 'snap'],
+              description: 'Optional. Force a specific package manager instead of auto-detecting.',
+            },
+          },
+          required: ['package'],
+        },
+      },
+    },
+    code: `
+const { exec } = require('child_process');
+const { promisify } = require('util');
+const execAsync = promisify(exec);
+
+// If the binary is already installed, return immediately.
+try {
+  const { stdout: whichOut } = await execAsync('which ' + args.package, { timeout: 5000 });
+  if (whichOut.trim()) {
+    return { status: 'ok', alreadyInstalled: true, package: args.package, path: whichOut.trim() };
+  }
+} catch {}
+
+// Auto-detect package manager if not specified.
+let pm = args.packageManager;
+if (!pm) {
+  for (const candidate of ['brew', 'apt-get', 'snap']) {
+    try {
+      await execAsync('which ' + candidate, { timeout: 5000 });
+      pm = candidate;
+      break;
+    } catch {}
+  }
+}
+
+if (!pm) {
+  return { status: 'error', package: args.package, error: 'No supported package manager found (brew, apt-get, snap). Install one first or use exec to install manually.' };
+}
+
+// Build install command. apt-get always runs update first to avoid stale
+// package lists causing "package not found" errors.
+let cmd;
+if (pm === 'apt-get') {
+  cmd = 'DEBIAN_FRONTEND=noninteractive apt-get update -qq && apt-get install -y ' + args.package;
+} else if (pm === 'brew') {
+  cmd = 'brew install ' + args.package;
+} else {
+  cmd = pm + ' install ' + args.package;
+}
+
+try {
+  const { stdout, stderr } = await execAsync(cmd, {
+    encoding: 'utf8',
+    timeout: 270000, // 4.5 min — leaves headroom before the outer 5-min tool timeout
+    maxBuffer: 2 * 1024 * 1024,
+  });
+  return { status: 'ok', packageManager: pm, package: args.package, stdout, stderr };
+} catch (e) {
+  return { status: 'error', packageManager: pm, package: args.package, exitCode: e.code || 1, stdout: e.stdout || '', stderr: e.stderr || e.message };
+}
+`,
+  },
   get_recent_sessions: {
     definition: {
       type: 'function',
@@ -314,10 +435,14 @@ export async function executeTool(tools, name, toolArgs) {
 
   const fn = new AsyncFunction('args', 'fs', 'path', 'process', 'require', '__jarvisDir', tool.code);
 
+  // Tools can declare their own timeout (e.g. system_install needs 5 min).
+  // Falls back to the global default of 60s.
+  const timeoutMs = tool.timeout || TOOL_TIMEOUT_MS;
+
   const timeout = new Promise((_, reject) =>
     setTimeout(
-      () => reject(new Error(`Tool '${name}' timed out after ${TOOL_TIMEOUT_MS / 1000}s`)),
-      TOOL_TIMEOUT_MS
+      () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
+      timeoutMs
     )
   );
 