@ducci/jarvis 1.0.21 → 1.0.23

@@ -0,0 +1,162 @@
1
+ # Finding 004: Agent Reliability — Failure Loops, Checkpoint Memory, and Iteration Awareness
2
+
3
+ **Date:** 2026-02-27
4
+ **Severity:** High — caused observed session failure with 54 tool calls, `format_error` crash, and no useful output after 42 minutes
5
+ **Status:** Fixed
6
+
7
+ ---
8
+
9
+ ## What Happened
10
+
11
+ A session was started to build a cybersecurity project installing three tools: Nuclei, Subfinder, and Naabu. The agent went through 5 handoffs (hitting `maxHandoffs`) and crashed with a `format_error` in the final iteration. Observations:
12
+
13
+ - **181 exec calls** across 19 agent iterations
14
+ - **21 perplexity_search calls**, including 11+ in the final iteration alone
15
+ - The agent oscillated between download strategies (Docker → `go install` → tarball → direct binary) without memory of what had already failed
16
+ - The existing loop detector never fired because each failed command had slightly different arguments (different URLs, different flags), producing different `callKey` values
17
+ - Each handoff resumed with only `checkpoint.remaining` — no record of what approaches had already been exhausted
18
+ - The model degraded and eventually produced a non-JSON response (`format_error`), crashing the session
19
+
20
+ ---
21
+
22
+ ## Root Causes
23
+
24
+ ### 1. Loop detection only caught exact-match repetition
25
+
26
+ The existing `loopTracker` detects when the _exact same_ tool call (name + args + result) is repeated 3 times. It does not detect _semantic_ failure loops: repeated attempts to do the same thing via slightly different commands. In the session, each download attempt used a different URL or flags, so every `callKey` was unique.
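The evasion is easy to see in a minimal sketch. Assuming the tracker keys on the serialized (name + args + result) triple described above (the exact serialization is an assumption, not the codebase's), two failed downloads that differ only in URL never share a key:

```js
// Sketch (assumed serialization): the loop detector keys on the exact call
// signature, so any change in args — a different URL or flag — yields a new
// key and semantically identical retries are never counted together.
function callKey(name, args, result) {
  return JSON.stringify([name, args, result]);
}

const tracker = new Map();
function record(name, args, result) {
  const key = callKey(name, args, result);
  tracker.set(key, (tracker.get(key) || 0) + 1);
  return tracker.get(key);
}

// Two failed download attempts with different URLs produce distinct keys,
// so neither count ever approaches the repetition threshold of 3:
const a = record('exec', { cmd: 'curl -sL https://example.com/v1.zip' }, 'error');
const b = record('exec', { cmd: 'curl -sL https://example.com/v2.zip' }, 'error');
```

This is why the consecutive-failure counter described in the fixes below counts failures regardless of key, instead of relying on exact repetition.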
27
+
28
+ ### 2. Checkpoint carried no memory of failed strategies
29
+
30
+ `WRAP_UP_NOTE` asked the model for `progress` and `remaining`, but not for a record of what _failed_. Each new run after a handoff started with a blank slate — the model had no way to know that curl downloads, `go install`, and tarball extraction had already been tried and failed. It repeated them.
31
+
32
+ ### 3. No iteration budget awareness
33
+
34
+ The model had no visibility into how many iterations remained in the current run. It kept taking exploratory steps as if budget were unlimited, then was surprised by the wrap-up call. A model that knows it has 2 iterations left will consolidate and report; one that doesn't will keep digging.
35
+
36
+ ### 4. `perplexity_search` used without restraint
37
+
38
+ The tool description had no guidance on usage limits. The model searched Perplexity 21 times in one session (11 in the final iteration), including redundant queries for the same version information. Each search added latency and filled the context window without improving outcomes.
39
+
40
+ ### 5. System prompt had no "give up" rule
41
+
42
+ The system prompt told the agent to "decide whether to retry with a corrected call or explain the failure to the user" — but gave no threshold for when to stop retrying. In practice, the agent always chose to retry, regardless of how many times the same approach had failed.
43
+
44
+ ---
45
+
46
+ ## Fixes
47
+
48
+ ### 1. Consecutive failure detection (`src/server/agent.js`)
49
+
50
+ Added a `consecutiveFailures` counter that tracks back-to-back failed tool calls across all iterations within a run. A tool call counts as failed if `executeTool` throws (`toolStatus === 'error'`) _or_ the result object has `status === 'error'` (catching exec failures that are returned, not thrown). A successful call resets the counter to 0.
51
+
52
+ After each iteration's tool calls, if `consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD` (3), a system break message is injected into the session and the counter resets:
53
+
54
+ ```js
55
+ const resultObj = typeof result === 'object' && result !== null ? result : null;
56
+ const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
57
+ if (toolFailed) {
58
+ consecutiveFailures++;
59
+ } else {
60
+ consecutiveFailures = 0;
61
+ }
62
+ ```
63
+
64
+ ```js
65
+ if (consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD) {
66
+ session.messages.push({
67
+ role: 'user',
68
+ content: '[System: You have had 3 or more consecutive tool failures. Stop retrying the same approach. Either pivot to a fundamentally different strategy or provide your final response explaining what failed and why.]',
69
+ });
70
+ consecutiveFailures = 0;
71
+ }
72
+ ```
73
+
74
+ This complements the existing exact-match loop detector — together they cover both identical repetition and semantic failure loops.
75
+
76
+ ### 2. `failedApproaches` in checkpoint schema (`src/server/agent.js`)
77
+
78
+ Updated `WRAP_UP_NOTE` to request a `failedApproaches` array alongside `progress` and `remaining`:
79
+
80
+ ```json
81
+ {
82
+ "checkpoint": {
83
+ "progress": "...",
84
+ "remaining": "...",
85
+ "failedApproaches": ["downloading subfinder via curl from GitHub releases — connection reset", "..."]
86
+ }
87
+ }
88
+ ```
89
+
90
+ When resuming after a handoff, the agent loop now appends a system note to the resume message listing every failed approach from the previous run:
91
+
92
+ ```js
93
+ let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
94
+ if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
95
+ resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — do not repeat them:\n${run.checkpoint.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
96
+ }
97
+ ```
98
+
99
+ The next run starts with concrete knowledge of what not to try, rather than repeating the same failed strategies.
100
+
101
+ ### 3. Iteration budget awareness (`src/server/agent.js`)
102
+
103
+ Each model request now includes the remaining iteration count when 5 or fewer iterations are left in the run. The count is appended to the prepared messages (never stored in `session.messages`):
104
+
105
+ ```js
106
+ const iterationsLeft = config.maxIterations - iteration + 1;
107
+ const preparedMessages = iterationsLeft <= 5
108
+ ? [...base, { role: 'user', content: `[System: ${iterationsLeft} iteration${iterationsLeft === 1 ? '' : 's'} remaining in this run. Budget your remaining steps accordingly — if you cannot finish in time, consolidate progress and provide a checkpoint.]` }]
109
+ : base;
110
+ ```
111
+
112
+ This gives the model enough warning to consolidate and produce a clean checkpoint rather than being cut off mid-task.
113
+
114
+ ### 4. `perplexity_search` description updated (`src/server/tools.js`)
115
+
116
+ Added explicit usage guidance to the tool description:
117
+
118
+ > Use sparingly — at most 3 searches per topic. Do not repeat the same query with minor variations; if an initial search does not yield what you need, switch to a different approach or verify locally with exec.
119
+
120
+ ### 5. "Failure Recovery" section added to system prompt (`docs/system-prompt.md`)
121
+
122
+ Added a dedicated `## Failure Recovery` section making the give-up rule explicit:
123
+
124
+ - Retry at most once with a meaningfully different approach; if it fails again, report to the user
125
+ - Never repeat a failed strategy with minor variations
126
+ - Use `perplexity_search` at most 3 times per topic
127
+ - Escalate cleanly with a useful failure report rather than looping
128
+
129
+ ---
130
+
131
+ ## What Was Not Changed
132
+
133
+ - The existing exact-match loop detector (`loopTracker`) — it remains and now works alongside the new consecutive failure detector
134
+ - The checkpoint/handoff system, `maxHandoffs` limit, or tool history strip logic — unchanged
135
+ - The `format_error` recovery path (fallback model + nudge retry) — unchanged
136
+ - No per-session Perplexity call counter was added server-side; the guidance is model-facing
137
+
138
+ ---
139
+
140
+ ## Note: Homebrew as an Alternative for Binary Tool Installation
141
+
142
+ During the session that exposed these issues, the agent struggled to install Nuclei, Subfinder, and Naabu from GitHub releases. All three tools from [ProjectDiscovery](https://projectdiscovery.io/) are available via Homebrew:
143
+
144
+ ```sh
145
+ brew install nuclei
146
+ brew install subfinder
147
+ brew install naabu
148
+ ```
149
+
150
+ Homebrew handles binary verification, PATH setup, and version management automatically — far more reliably than manual `curl` downloads or `go install`. On macOS (and Linux via Linuxbrew), this is the recommended installation method. The agent could discover this via `perplexity_search` or by checking `brew search projectdiscovery` — but only if it knows to try Homebrew _before_ attempting manual downloads.
151
+
152
+ This is a guidance gap rather than a code bug: the system prompt doesn't mention package managers as a preferred strategy for binary installation. A future improvement could add: "When installing CLI tools, check for a package manager installation first (`brew install`, `apt install`, `snap install`) before attempting manual downloads."
153
+
154
+ ---
155
+
156
+ ## Outcome
157
+
158
+ - Failure loops are now intercepted after 3 consecutive failures instead of running to exhaustion
159
+ - Handoff runs start with knowledge of what has already failed, enabling genuine strategy pivots
160
+ - The model receives iteration budget warnings before hitting the wrap-up call
161
+ - Perplexity search overuse is constrained by both tool description and system prompt guidance
162
+ - The system prompt now has an explicit rule for when to stop retrying and report
@@ -0,0 +1,128 @@
1
+ # Finding 005: 60-Second Exec Timeout Breaks Package Installation
2
+
3
+ **Date:** 2026-02-27
4
+ **Severity:** High — any package installation via exec will time out, regardless of which package manager is used
5
+ **Status:** Fixed
6
+
7
+ ---
8
+
9
+ ## What Happened
10
+
11
+ In the session analysed in Finding 004, the agent attempted to install Nuclei, Subfinder, and Naabu using several strategies — `go install`, direct `curl` downloads, tarball extraction. All either timed out or failed with network errors.
12
+
13
+ The natural follow-up question was: would switching to a proper package manager (`brew`, `apt-get`) solve the problem? The answer is no — not without fixing the timeout first.
14
+
15
+ ---
16
+
17
+ ## Root Cause
18
+
19
+ ### The `exec` tool has a hard 60-second timeout
20
+
21
+ ```js
22
+ // exec seed tool
23
+ const { stdout, stderr } = await execAsync(args.cmd, {
24
+ encoding: 'utf8',
25
+ timeout: 60000, // ← 60 seconds
26
+ maxBuffer: 2 * 1024 * 1024,
27
+ });
28
+ ```
29
+
30
+ On top of that, `executeTool` wraps every tool in its own `Promise.race` against `TOOL_TIMEOUT_MS = 60_000`. This means even if a tool's internal `execAsync` had a longer timeout, the outer race would kill it at 60 seconds anyway.
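The outer guard can be sketched as a `Promise.race` between the tool's own promise and a rejection timer (a minimal sketch; the helper name `withTimeout` is illustrative, not from the codebase):

```js
// Sketch of the pre-fix behaviour: every tool races a fixed 60-second timer.
// Even if a tool's internal execAsync allowed more time, the outer race
// rejects first, so the longer inner timeout is unreachable.
const TOOL_TIMEOUT_MS = 60_000;

function withTimeout(promise, name, timeoutMs = TOOL_TIMEOUT_MS) {
  const timeout = new Promise((_, reject) =>
    setTimeout(
      () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
      timeoutMs
    )
  );
  return Promise.race([promise, timeout]);
}
```

Whichever promise settles first wins the race, so a tool still running at the deadline surfaces as a timeout error rather than completing.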
31
+
32
+ Package installation routinely takes longer than 60 seconds:
33
+
34
+ | Operation | Typical duration |
35
+ |---|---|
36
+ | `apt-get update` | 15–60s (varies with server speed, mirror load) |
37
+ | `apt-get install nuclei` | 10–60s (binary download + extraction) |
38
+ | `apt-get update && apt-get install nuclei` | 30–120s combined |
39
+ | `brew install nuclei` | 20–90s |
40
+ | `go install github.com/...` | 60–180s (compilation) |
41
+
42
+ So `go install` was almost guaranteed to time out. `apt-get` with an update step would regularly exceed 60s too. Swapping the install method without fixing the timeout would not have solved the problem.
43
+
44
+ ### The `npm_install` tool has the same problem
45
+
46
+ `npm_install` also uses a 60-second timeout, so it fails the same way for large packages or on slow networks.
47
+
48
+ ### No per-tool timeout mechanism existed
49
+
50
+ All tools shared the same `TOOL_TIMEOUT_MS` constant. There was no way to declare that a specific tool legitimately needed more time.
51
+
52
+ ---
53
+
54
+ ## Fix
55
+
56
+ ### 1. Per-tool timeout override (`src/server/tools.js`)
57
+
58
+ `executeTool` now checks for a `timeout` property on the tool definition and uses it instead of the global `TOOL_TIMEOUT_MS`:
59
+
60
+ ```js
61
+ const timeoutMs = tool.timeout || TOOL_TIMEOUT_MS;
62
+
63
+ const timeout = new Promise((_, reject) =>
64
+ setTimeout(
65
+ () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
66
+ timeoutMs
67
+ )
68
+ );
69
+ ```
70
+
71
+ Tools that don't declare a timeout continue to use the 60s default — no behaviour change for existing tools.
72
+
73
+ ### 2. `system_install` seed tool with 5-minute timeout (`src/server/tools.js`)
74
+
75
+ A new built-in tool handles system binary installation:
76
+
77
+ ```js
78
+ system_install: {
79
+ timeout: 300_000, // 5 minutes
80
+ ...
81
+ }
82
+ ```
83
+
84
+ Behaviour:
85
+ - **Already installed check**: runs `which <package>` first. If the binary is already on PATH, returns immediately without installing — no wasted time.
86
+ - **Auto-detection**: tries `brew`, `apt-get`, `snap` in order. Uses the first one found. Can be overridden with an explicit `packageManager` argument.
87
+ - **apt-get**: always runs `apt-get update -qq` before `apt-get install -y` to avoid stale package list failures. Uses `DEBIAN_FRONTEND=noninteractive` to suppress interactive prompts.
88
+ - **Inner timeout**: 4.5 minutes on the internal `execAsync`, leaving headroom before the outer 5-minute tool timeout fires.
89
+ - **Structured errors**: returns `exitCode`, `stdout`, `stderr` on failure so the agent can read the actual error without guessing.
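The detection order in the bullets above can be sketched as follows (the helper names `pickPackageManager` and `installCommand` are illustrative assumptions; the real tool shells out for its PATH checks and composes equivalent commands):

```js
// Sketch of the auto-detection described above: prefer brew, then apt-get,
// then snap. An explicit override wins; otherwise the first manager found
// on PATH is used. isOnPath is injected so the logic is testable.
const DETECTION_ORDER = ['brew', 'apt-get', 'snap'];

function pickPackageManager(isOnPath, override) {
  if (override) return override; // explicit packageManager argument
  return DETECTION_ORDER.find(pm => isOnPath(pm)) || null;
}

function installCommand(pm, pkg) {
  switch (pm) {
    case 'brew':
      return `brew install ${pkg}`;
    case 'apt-get':
      // update first to avoid stale package lists; noninteractive to suppress prompts
      return `DEBIAN_FRONTEND=noninteractive apt-get update -qq && apt-get install -y ${pkg}`;
    case 'snap':
      return `snap install ${pkg}`;
    default:
      return null;
  }
}
```

If no manager is found, the tool can return a structured error instead of attempting a manual download.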
90
+
91
+ ### 3. System prompt updated (`docs/system-prompt.md`)
92
+
93
+ Added explicit guidance in the "Tool Creation" section:
94
+
95
+ > **Installing a system binary** (e.g. nuclei, jq, ffmpeg, git): use the `system_install` tool — never use exec for this. It auto-detects the available package manager (brew/apt-get/snap) and has a 5-minute timeout sized for real downloads.
96
+
97
+ This mirrors the existing `npm_install` guidance and gives the model an explicit directive to reach for `system_install` over `exec`.
98
+
99
+ ---
100
+
101
+ ## Why `system_install` Instead of Increasing the Global Timeout
102
+
103
+ Increasing `TOOL_TIMEOUT_MS` globally would make every tool — including `exec` — hang for much longer before failing. A runaway `exec` command (e.g. `find / -name "*.js"`) that produces no output would stall the agent for 5 minutes instead of 60 seconds before surfacing an error. The per-tool timeout keeps the safe default for general tools while letting installation tools declare a legitimate need for more time.
104
+
105
+ ---
106
+
107
+ ## Notes on Package Availability
108
+
109
+ Not all tools are available in every package manager. For ProjectDiscovery tools (nuclei, subfinder, naabu) specifically:
110
+
111
+ - **macOS (brew)**: `brew install nuclei`, `brew install subfinder`, `brew install naabu` — all available via the official Homebrew formulae
112
+ - **Linux (apt-get)**: ProjectDiscovery does not maintain an official apt repository. `apt-get install nuclei` will likely fail with "package not found"
113
+ - **Linux (snap)**: `snap install nuclei` is available on Ubuntu/Debian
114
+
115
+ On Linux, if `system_install` fails because the package isn't in the apt repository, the agent should try snap, or fall back to a direct binary download from the ProjectDiscovery GitHub releases:
116
+ ```sh
117
+ curl -sL https://github.com/projectdiscovery/nuclei/releases/latest/download/nuclei_linux_amd64.zip -o nuclei.zip
118
+ ```
119
+ This would still use `exec` for the download, but with the failure recovery guidance from Finding 004 the agent should report the failure rather than looping.
120
+
121
+ ---
122
+
123
+ ## Outcome
124
+
125
+ - Package installation no longer races against a 60-second wall
126
+ - The agent has a clear, named tool to reach for when installing system binaries
127
+ - The system prompt actively directs the agent away from `exec` for this use case
128
+ - Per-tool timeouts are now supported for any future tools that need them
@@ -0,0 +1,118 @@
1
+ # Finding 006: Malformed Tool Schema Poisons Every Subsequent Request
2
+
3
+ **Date:** 2026-02-27
4
+ **Severity:** High — permanently breaks all model calls for the session until tools.json is manually repaired
5
+ **Status:** Fixed
6
+
7
+ ---
8
+
9
+ ## What Happened
10
+
11
+ During a session installing security tools (Nuclei, Subfinder, Naabu), the agent called `save_tool` to create a custom `scan` tool. The model passed the `parameters` field as a **JSON-serialized string** instead of a JSON object:
12
+
13
+ ```json
14
+ "parameters": "{\"type\":\"object\",\"properties\":{\"domain\":{\"type\":\"string\",...}}}"
15
+ ```
16
+
17
+ `save_tool` stored this verbatim — no validation occurred. The malformed tool definition was written to `~/.jarvis/data/tools/tools.json` as tool index 12.
18
+
19
+ On the very next model call (within the same run), Jarvis reloaded tools after `save_tool` completed (`toolsModified = true`) and sent all tool definitions to the provider. OpenRouter's provider API returned:
20
+
21
+ ```
22
+ 400 Provider returned error
23
+ [{'type': 'dict_type', 'loc': ('body', 'tools', 12, 'function', 'parameters'),
24
+ 'msg': 'Input should be a valid dictionary', 'input': '{"type":"object",...}'}]
25
+ ```
26
+
27
+ Every subsequent user message — including trivial ones like "Was ist schief gegangen?" ("What went wrong?") and "Wie ist deine session id" ("What is your session id") — also failed with the same 400. The malformed tool was permanently in `tools.json` and included in every model request.
28
+
29
+ ---
30
+
31
+ ## Root Cause
32
+
33
+ ### 1. `save_tool` did not validate `parameters`
34
+
35
+ The `save_tool` code stored `args.parameters` directly into `tools.json` without checking its type. The OpenAI tool-calling spec requires `parameters` to be a JSON Schema object (a dictionary). When a model passes a JSON string instead of an object — a common mistake with weaker models — the result is a permanently malformed tool definition.
36
+
37
+ ### 2. `getToolDefinitions` sent all tools to the provider without validation
38
+
39
+ `getToolDefinitions` returned all tool definitions unconditionally. A single malformed tool poisoned every request that included the tools list — which is every request, since tool definitions are always sent.
40
+
41
+ These two gaps compound each other: the first allows bad data in, the second ensures it breaks everything downstream.
42
+
43
+ ---
44
+
45
+ ## Why it Persists Across All Subsequent Messages
46
+
47
+ Every `handleChat` call loads tools fresh from `tools.json` via `loadTools()`. The malformed `scan` tool is always in the list. Every model call sends all tool definitions. The provider rejects every request. The session is stuck until `tools.json` is manually repaired.
48
+
49
+ ---
50
+
51
+ ## Intermittent Behavior
52
+
53
+ The failure is not always immediate. In the debugging session, two tool calls succeeded in later runs before the 400 fired. This is consistent with free/preview model providers (nvidia via OpenRouter) applying schema validation inconsistently across backend instances. The bug is therefore not "always broken" but **reliably broken under load** — which is harder to detect and debug than a consistent failure.
54
+
55
+ ---
56
+
57
+ ## Fix
58
+
59
+ Two targeted changes to `src/server/tools.js`.
60
+
61
+ ### 1. `save_tool` validates and auto-corrects `parameters`
62
+
63
+ Before writing to `tools.json`, the `save_tool` code now:
64
+ - If `parameters` is a string, attempts `JSON.parse()` — models commonly serialize the object when they should pass it directly. If parsing succeeds and yields an object, the corrected value is used silently.
65
+ - If `parameters` is still not a plain object after the attempted parse, returns an error immediately with a clear message. Nothing is written to disk.
66
+
67
+ ```js
68
+ let parameters = args.parameters;
69
+ if (typeof parameters === 'string') {
70
+ try {
71
+ parameters = JSON.parse(parameters);
72
+ } catch {
73
+ return { status: 'error', error: 'parameters must be a JSON Schema object, not a string. Pass the object directly, not as a JSON-serialized string.' };
74
+ }
75
+ }
76
+ if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) {
77
+ return { status: 'error', error: 'parameters must be a JSON Schema object (e.g. { type: "object", properties: {...} }).' };
78
+ }
79
+ ```
80
+
81
+ ### 2. `getToolDefinitions` filters out malformed tools
82
+
83
+ `getToolDefinitions` now validates `parameters` on each tool before including it in the definitions sent to the provider. A tool with a non-object `parameters` is skipped with a `console.warn`, not thrown — this is a defence-in-depth guard, not a primary error path.
84
+
85
+ ```js
86
+ export function getToolDefinitions(tools) {
87
+ const defs = [];
88
+ for (const [name, t] of Object.entries(tools)) {
89
+ const params = t.definition?.function?.parameters;
90
+ if (typeof params !== 'object' || params === null || Array.isArray(params)) {
91
+ console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
92
+ continue;
93
+ }
94
+ defs.push(t.definition);
95
+ }
96
+ return defs;
97
+ }
98
+ ```
99
+
100
+ Together: Fix 1 prevents malformed tools from entering `tools.json`. Fix 2 ensures that even if a malformed tool somehow ends up in `tools.json` (e.g. from an older version, manual edit, or a bug that slips through), it is silently excluded from every model request rather than poisoning the session.
101
+
102
+ ---
103
+
104
+ ## Secondary Issue: Agent Hallucinated Successful Action
105
+
106
+ In the run that preceded the `save_tool` call, the agent responded with:
107
+
108
+ > "The scanning script has been created at /root/.jarvis/projects/cybersecurity/scan.sh."
109
+
110
+ No such file was ever created — the agent had only installed Nuclei successfully. This hallucination forced the next run to attempt recovery via `save_tool`, which is where the malformed tool was introduced. The hallucination itself is a model-quality issue with the free nvidia model, not a Jarvis bug.
111
+
112
+ ---
113
+
114
+ ## Outcome
115
+
116
+ - `save_tool` auto-corrects the common case (string instead of object) and rejects the rest with a clear error before writing to disk
117
+ - A pre-existing malformed tool in `tools.json` no longer poisons model requests — it is silently skipped per call
118
+ - Sessions are no longer permanently broken by a single bad `save_tool` call
@@ -53,12 +53,22 @@ The `exec` tool runs real shell commands on the server. Use it responsibly:
53
53
  - **Prefer targeted reads.** Use `grep`, `head`, or `tail` instead of `cat` on files you haven't seen before. Large file output is truncated anyway — a targeted command gives you better signal.
54
54
  - **Avoid commands with unbounded runtime.** If a command could run indefinitely or scan an unknown-size tree, scope it first.
55
55
 
56
+ ## Failure Recovery
57
+
58
+ When a tool or command fails:
59
+
60
+ - **Retry at most once** with a meaningfully different approach (different command, different source, different strategy). If it fails a second time, stop and report the failure to the user — do not keep trying variations.
61
+ - **Do not repeat a failed strategy.** If one download method fails, do not re-run it with minor changes. Try an entirely different installation method (e.g. package manager instead of curl), or explain the failure to the user.
62
+ - **Use `perplexity_search` sparingly.** At most 3 searches per topic per session. If the first search didn't give you what you need, try a different query angle once — then stop searching and work with what you have or report the gap.
63
+ - **Escalate cleanly.** If you cannot make progress after two distinct approaches, give the user a clear explanation of what was attempted, what failed, and what they can do manually. A useful failure report is better than an infinite retry loop.
64
+
56
65
  ## Tool Creation
57
66
 
58
67
  When building a custom tool with `save_tool`:
59
68
 
60
69
  - **Prefer npm packages** over reimplementing functionality from scratch. If a well-known package exists for the task (e.g. an API SDK, a parser, a utility library), use it.
61
70
  - **Installing an npm package**: use the `npm_install` tool — it handles the correct install directory automatically. Then create the tool with `save_tool`. The tool code can `require('<package-name>')` directly.
71
+ - **Installing a system binary** (e.g. nuclei, jq, ffmpeg, git): use the `system_install` tool — never use exec for this. It auto-detects the available package manager (brew/apt-get/snap) and has a 5-minute timeout sized for real downloads.
62
72
  - **Available bindings in tool code**: `args`, `fs`, `path`, `process`, `require`, `__jarvisDir` (absolute path to the jarvis server directory).
63
73
 
64
74
  ## logSummary Guidelines
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ducci/jarvis",
3
- "version": "1.0.21",
3
+ "version": "1.0.23",
4
4
  "description": "A fully automated agent system that lives on a server.",
5
5
  "main": "./src/index.js",
6
6
  "type": "module",
@@ -8,6 +8,7 @@ import chalk from 'chalk';
8
8
 
9
9
  const FORMAT_NUDGE = 'Your previous response was not valid JSON. Respond only with the required JSON object: {"response": "...", "logSummary": "..."}';
10
10
  const LOOP_DETECTION_THRESHOLD = 3;
11
+ const CONSECUTIVE_FAILURE_THRESHOLD = 3;
11
12
  const MAX_TOOL_RESULT = 4000;
12
13
 
13
14
  const WRAP_UP_NOTE = `[System: You have reached the iteration limit. This is your final response for this run.
@@ -18,11 +19,12 @@ Respond with your normal JSON, but add a checkpoint field:
18
19
  "logSummary": "Human-readable summary of what happened in this run.",
19
20
  "checkpoint": {
20
21
  "progress": "What has been fully completed so far.",
21
- "remaining": "What still needs to be done to finish the task."
22
+ "remaining": "What still needs to be done to finish the task.",
23
+ "failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."]
22
24
  }
23
25
  }
24
26
 
25
- The checkpoint field will be used to automatically resume the task in the next run.]`;
27
+ The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed.]`;
26
28
 
27
29
  // Serializes concurrent requests for the same session. Maps sessionId to the
28
30
  // tail of the current request chain (a Promise that resolves when the last
@@ -81,12 +83,17 @@ async function runAgentLoop(client, config, session, prepareMessages) {
81
83
  let response = '';
82
84
  let logSummary = '';
83
85
  let status = 'ok';
86
+ let consecutiveFailures = 0;
84
87
 
85
88
  while (iteration < config.maxIterations) {
86
89
  iteration++;
87
90
 
88
91
  let modelResult;
89
- const preparedMessages = prepareMessages(session.messages);
92
+ const iterationsLeft = config.maxIterations - iteration + 1;
93
+ const base = prepareMessages(session.messages);
94
+ const preparedMessages = iterationsLeft <= 5
95
+ ? [...base, { role: 'user', content: `[System: ${iterationsLeft} iteration${iterationsLeft === 1 ? '' : 's'} remaining in this run. Budget your remaining steps accordingly — if you cannot finish in time, consolidate progress and provide a checkpoint.]` }]
96
+ : base;
90
97
  try {
91
98
  modelResult = await callModelWithFallback(client, config, preparedMessages, toolDefs);
92
99
  } catch (e) {
@@ -154,6 +161,14 @@ async function runAgentLoop(client, config, session, prepareMessages) {
154
161
  toolsModified = true;
155
162
  }
156
163
 
164
+ const resultObj = typeof result === 'object' && result !== null ? result : null;
165
+ const toolFailed = toolStatus === 'error' || (resultObj && resultObj.status === 'error');
166
+ if (toolFailed) {
167
+ consecutiveFailures++;
168
+ } else {
169
+ consecutiveFailures = 0;
170
+ }
171
+
157
172
  const resultStr = typeof result === 'string' ? result : JSON.stringify(result);
158
173
  runToolCalls.push({ name: toolName, args: toolArgs, status: toolStatus, result: resultStr });
159
174
 
@@ -170,6 +185,14 @@ async function runAgentLoop(client, config, session, prepareMessages) {
170
185
  loopTracker.set(callKey, (loopTracker.get(callKey) || 0) + 1);
171
186
  }
172
187
 
188
+ if (consecutiveFailures >= CONSECUTIVE_FAILURE_THRESHOLD) {
189
+ session.messages.push({
190
+ role: 'user',
191
+ content: '[System: You have had 3 or more consecutive tool failures. Stop retrying the same approach. Either pivot to a fundamentally different strategy or provide your final response explaining what failed and why.]',
192
+ });
193
+ consecutiveFailures = 0;
194
+ }
195
+
173
196
  const loopDetected = [...loopTracker.values()].some(count => count >= LOOP_DETECTION_THRESHOLD);
174
197
  if (loopDetected) {
175
198
  session.messages.push({
@@ -455,7 +478,12 @@ async function _runHandleChat(config, sessionId, userMessage) {
455
478
 
456
479
  // Resume with checkpoint.remaining as new prompt.
457
480
  // Guard against null/undefined in case the model omitted the field.
458
- session.messages.push({ role: 'user', content: run.checkpoint.remaining || 'Continue with the task.' });
481
+ // Prepend any failed approaches so the next run doesn't repeat them.
482
+ let resumeContent = run.checkpoint.remaining || 'Continue with the task.';
483
+ if (run.checkpoint.failedApproaches && run.checkpoint.failedApproaches.length > 0) {
484
+ resumeContent += `\n\n[System: The following approaches were tried and failed in the previous run — do not repeat them:\n${run.checkpoint.failedApproaches.map((a, i) => `${i + 1}. ${a}`).join('\n')}]`;
485
+ }
486
+ session.messages.push({ role: 'user', content: resumeContent });
459
487
  }
460
488
  } catch (e) {
461
489
  await appendLog(sessionId, {
@@ -149,7 +149,7 @@ const SEED_TOOLS = {
         },
       },
     },
-    code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); tools[args.name] = { definition: { type: 'function', function: { name: args.name, description: args.description, parameters: args.parameters } }, code: args.code }; await fs.promises.writeFile(toolsFile, JSON.stringify(tools, null, 2), 'utf8'); return { status: 'ok', saved: args.name };`,
+    code: `const toolsFile = path.join(process.env.HOME, '.jarvis/data/tools/tools.json'); const raw = await fs.promises.readFile(toolsFile, 'utf8').catch(() => '{}'); const tools = JSON.parse(raw); let parameters = args.parameters; if (typeof parameters === 'string') { try { parameters = JSON.parse(parameters); } catch { return { status: 'error', error: 'parameters must be a JSON Schema object, not a string. Pass the object directly, not as a JSON-serialized string.' }; } } if (typeof parameters !== 'object' || parameters === null || Array.isArray(parameters)) { return { status: 'error', error: 'parameters must be a JSON Schema object (e.g. { type: "object", properties: {...} }).' }; } tools[args.name] = { definition: { type: 'function', function: { name: args.name, description: args.description, parameters } }, code: args.code }; await fs.promises.writeFile(toolsFile, JSON.stringify(tools, null, 2), 'utf8'); return { status: 'ok', saved: args.name };`,
   },
   get_tool: {
     definition: {
@@ -226,7 +226,7 @@ const SEED_TOOLS = {
       type: 'function',
       function: {
         name: 'perplexity_search',
-        description: 'Search the web using Perplexity AI. Returns an answer grounded in real-time web results with citations. Use this for current events, factual lookups, or research questions.',
+        description: 'Search the web using Perplexity AI. Returns an answer grounded in real-time web results with citations. Use this for current events, factual lookups, or research questions. Use sparingly — at most 3 searches per topic. Do not repeat the same query with minor variations; if an initial search does not yield what you need, switch to a different approach or verify locally with exec.',
         parameters: {
           type: 'object',
           properties: {
@@ -266,6 +266,82 @@ const SEED_TOOLS = {
       return { answer, citations };
     `,
   },
+  system_install: {
+    timeout: 300_000, // 5 minutes — package downloads and installs routinely exceed 60s
+    definition: {
+      type: 'function',
+      function: {
+        name: 'system_install',
+        description: 'Install a system binary using the available package manager (brew on macOS, apt-get on Debian/Ubuntu, snap as fallback). Always use this instead of exec for installing system packages — it auto-detects the package manager and has a 5-minute timeout sized for real downloads. If the binary is already on PATH it returns immediately without installing. Examples: nuclei, subfinder, naabu, jq, curl, git.',
+        parameters: {
+          type: 'object',
+          properties: {
+            package: {
+              type: 'string',
+              description: 'The package name to install, e.g. "nuclei", "jq", "curl".',
+            },
+            packageManager: {
+              type: 'string',
+              enum: ['brew', 'apt-get', 'snap'],
+              description: 'Optional. Force a specific package manager instead of auto-detecting.',
+            },
+          },
+          required: ['package'],
+        },
+      },
+    },
+    code: `
+      const { exec } = require('child_process');
+      const { promisify } = require('util');
+      const execAsync = promisify(exec);
+
+      // If the binary is already installed, return immediately.
+      try {
+        const { stdout: whichOut } = await execAsync('which ' + args.package, { timeout: 5000 });
+        if (whichOut.trim()) {
+          return { status: 'ok', alreadyInstalled: true, package: args.package, path: whichOut.trim() };
+        }
+      } catch {}
+
+      // Auto-detect package manager if not specified.
+      let pm = args.packageManager;
+      if (!pm) {
+        for (const candidate of ['brew', 'apt-get', 'snap']) {
+          try {
+            await execAsync('which ' + candidate, { timeout: 5000 });
+            pm = candidate;
+            break;
+          } catch {}
+        }
+      }
+
+      if (!pm) {
+        return { status: 'error', package: args.package, error: 'No supported package manager found (brew, apt-get, snap). Install one first or use exec to install manually.' };
+      }
+
+      // Build install command. apt-get always runs update first to avoid stale
+      // package lists causing "package not found" errors.
+      let cmd;
+      if (pm === 'apt-get') {
+        cmd = 'DEBIAN_FRONTEND=noninteractive apt-get update -qq && apt-get install -y ' + args.package;
+      } else if (pm === 'brew') {
+        cmd = 'brew install ' + args.package;
+      } else {
+        cmd = pm + ' install ' + args.package;
+      }
+
+      try {
+        const { stdout, stderr } = await execAsync(cmd, {
+          encoding: 'utf8',
+          timeout: 270000, // 4.5 min — leaves headroom before the outer 5-min tool timeout
+          maxBuffer: 2 * 1024 * 1024,
+        });
+        return { status: 'ok', packageManager: pm, package: args.package, stdout, stderr };
+      } catch (e) {
+        return { status: 'error', packageManager: pm, package: args.package, exitCode: e.code || 1, stdout: e.stdout || '', stderr: e.stderr || e.message };
+      }
+    `,
+  },
   get_recent_sessions: {
     definition: {
       type: 'function',
@@ -348,7 +424,16 @@ export async function loadTools() {
 }
 
 export function getToolDefinitions(tools) {
-  return Object.values(tools).map(t => t.definition);
+  const defs = [];
+  for (const [name, t] of Object.entries(tools)) {
+    const params = t.definition?.function?.parameters;
+    if (typeof params !== 'object' || params === null || Array.isArray(params)) {
+      console.warn(`[tools] Skipping tool '${name}': parameters is not a valid object (got ${typeof params})`);
+      continue;
+    }
+    defs.push(t.definition);
+  }
+  return defs;
 }
 
 export async function executeTool(tools, name, toolArgs) {
@@ -359,10 +444,14 @@ export async function executeTool(tools, name, toolArgs) {
 
   const fn = new AsyncFunction('args', 'fs', 'path', 'process', 'require', '__jarvisDir', tool.code);
 
+  // Tools can declare their own timeout (e.g. system_install needs 5 min).
+  // Falls back to the global default of 60s.
+  const timeoutMs = tool.timeout || TOOL_TIMEOUT_MS;
+
   const timeout = new Promise((_, reject) =>
     setTimeout(
-      () => reject(new Error(`Tool '${name}' timed out after ${TOOL_TIMEOUT_MS / 1000}s`)),
-      TOOL_TIMEOUT_MS
+      () => reject(new Error(`Tool '${name}' timed out after ${timeoutMs / 1000}s`)),
+      timeoutMs
     )
   );