npm - @ducci/jarvis - Versions diffs - 1.0.25 → 1.0.27 - Mend

Files changed (6)

package/docs/findings/009-non-string-response-field.md +153 -0
package/docs/findings/010-checkpoint-field-type-safety.md +121 -0
package/docs/system-prompt.md +2 -0
package/package.json +1 -1
package/src/channels/telegram/index.js +5 -2
package/src/server/agent.js +24 -4

package/docs/findings/009-non-string-response-field.md ADDED

@@ -0,0 +1,153 @@
+# Finding 009: Non-String `response` Field Crashes Telegram Delivery
+**Date:** 2026-03-01
+**Severity:** High — caused "Sorry, something went wrong sending the response" with no useful information for the user
+**Status:** Fixed
+---
+## Observed Session
+The session ran 19 agent runs, all completing successfully (`ok` or `checkpoint_reached`). The crash occurred on run 19. The user asked:
+> "List me all tool calls you did In this session. Tool name and args are enough to display for each entry."
+The model returned valid JSON but placed the list of tool calls as a JSON **array** (not a string) in the `response` field:
+```json
+{
+  "response": [
+    { "tool": "exec", "args": { "cmd": "find ..." } },
+    ...16 entries...
+  ],
+  "logSummary": "Enumerated every tool call made during the session..."
+}
+```
+The Telegram user received: **"Sorry, something went wrong sending the response. Please try again."**
+---
+## Bug Chain
+### Step 1 — Agent parses valid JSON, stores non-string response
+`runAgentLoop` in `src/server/agent.js` successfully parsed the model's response JSON. The extraction logic had no type check:
+```js
+response = parsed.response || content;
+```
+`parsed.response` was an array (truthy) → `response` was set to the array. No validation. The array propagated through `finalResponse` all the way to the return value of `handleChat`.
+### Step 2 — Telegram handler crashes calling `.trim()` on an array
+In `src/channels/telegram/index.js`:
+```js
+const text = result.response?.trim()
+  || 'The agent encountered an error...';
+```
+`?.` guards against `null` and `undefined` only — not against wrong types. Arrays do not have a `.trim()` method. This threw:
+```
+TypeError: result.response.trim is not a function
+```
+### Step 3 — Delivery catch block sends the generic error
+The TypeError was caught by the outer delivery try/catch, which replied:
+```
+Sorry, something went wrong sending the response. Please try again.
+```
+The user had no idea what failed. The agent had completed successfully — only the delivery step crashed.
+---
+## Root Causes
+**Primary**: `agent.js` never validates that `parsed.response` is a string after JSON parsing. The response contract ("Your message to the user, in plain text.") is documented but never enforced. Any JSON value — array, object, number, null — passes through silently.
+**Secondary**: `telegram/index.js` assumed `result.response` would always be a string or null/undefined, and called `.trim()` without type-guarding.
+The same primary bug exists in the wrap-up path (line ~315):
+```js
+response = parsedWrapUp.response || '';
+```
+This would fail identically if the wrap-up model returned a non-string `response`.
+---
+## What Was Not Caught Earlier
+- The JSONL log stored `response: [array]` but the run status was `ok` — nothing flagged as an error on the agent side.
+- The error only surfaces in the Telegram delivery layer, which has no visibility into the JSONL log.
+- The model had valid intent (listing tool calls as a structured data type) — it just put the data in the wrong JSON field type.
+---
+## Fix
+### 1. `src/server/agent.js` — normalize response to string at both sites
+**Main response path:**
+```js
+// Before:
+response = parsed.response || content;
+// After:
+response = typeof parsed.response === 'string'
+  ? parsed.response
+  : JSON.stringify(parsed.response, null, 2);
+```
+**Wrap-up path:**
+```js
+// Before:
+response = parsedWrapUp.response || '';
+// After:
+response = typeof parsedWrapUp.response === 'string'
+  ? parsedWrapUp.response
+  : parsedWrapUp.response != null ? JSON.stringify(parsedWrapUp.response, null, 2) : '';
+```
+When the model returns a non-string (array, object), it is JSON-stringified with 2-space indentation. The user gets a readable representation of the intended content rather than a crash. This preserves the model's intent while enforcing the string contract.
+### 2. `src/channels/telegram/index.js` — defense-in-depth type guard
+```js
+// Before:
+const text = result.response?.trim()
+  || 'The agent encountered an error and could not produce a response. Please try again.';
+// After:
+const rawResponse = typeof result.response === 'string'
+  ? result.response
+  : result.response != null ? JSON.stringify(result.response, null, 2) : '';
+const text = rawResponse.trim()
+  || 'The agent encountered an error and could not produce a response. Please try again.';
+```
+### 3. `docs/system-prompt.md` — explicit type constraint on `response`
+Added one sentence to the `## Response Format` section:
+```
+The `response` value must be a plain text string — never an array or object. If you need to present structured data (e.g. a list of items), format it as text within the string value.
+```
+---
+## Outcome
+| Fix | Files changed |
+|-----|--------------|
+| Coerce `parsed.response` to string in main and wrap-up paths | `src/server/agent.js` |
+| Type guard before `.trim()` call | `src/channels/telegram/index.js` |
+| Explicit type constraint on `response` field | `docs/system-prompt.md` |
+**Effect on the debugging session**: instead of "Sorry, something went wrong sending the response", the user would have received the tool call list formatted as a readable JSON string.
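
Reviewer sketch (not part of the package): the coercion described in finding 009 can be expressed as one helper, shown here as a minimal illustration. The names `toText` and `unsafeTrim` are hypothetical. One detail worth noting: `JSON.stringify(undefined)` returns `undefined` and `JSON.stringify(null)` returns the string `"null"`, so the main response path as shown in the diff no longer falls back to `content` when `response` is missing or empty. The sketch assumes a fallback is still wanted in those cases.

```javascript
// Hypothetical helper: coerce any parsed JSON value to a string, using the
// fallback when the field is missing, null, or an empty string.
function toText(value, fallback = '') {
  if (typeof value === 'string') {
    return value || fallback;
  }
  // JSON.stringify(undefined) returns undefined (not a string) and
  // JSON.stringify(null) returns "null", so handle both before stringifying.
  if (value == null) {
    return fallback;
  }
  return JSON.stringify(value, null, 2);
}

// Reproduces the original crash: optional chaining only short-circuits on
// null/undefined, so an array still reaches the missing .trim method.
function unsafeTrim(result) {
  return result.response?.trim();
}

const listed = toText([{ tool: 'exec' }], 'raw content');
const missing = toText(undefined, 'raw content');

let crashed = false;
try {
  unsafeTrim({ response: [] });
} catch (err) {
  crashed = err instanceof TypeError;
}

console.log(listed);
console.log(missing, crashed);
```

Using a single helper at both parse sites and in the delivery layer would keep the three coercions from drifting apart, though the package applies the logic inline.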

package/docs/findings/010-checkpoint-field-type-safety.md ADDED

@@ -0,0 +1,121 @@
+# Finding 010: Non-String `checkpoint.remaining` Crashes Zero-Progress Detection
+**Date:** 2026-03-01
+**Severity:** High — caused "Sorry, something went wrong" in Telegram with no useful context; crashed the handoff loop mid-run
+**Status:** Fixed
+---
+## Observed Session
+The session ran 13+ agent runs working on OWASP ZAP installation. Runs 8–13 were consecutive `checkpoint_reached` handoffs. On entry 14 (immediately after entry 13), the server logged:
+```
+status: error
+response: "An unexpected server error occurred: (run.checkpoint.remaining || "").trim is not a function"
+```
+The Telegram user received:
+```
+Sorry, something went wrong: (run.checkpoint.remaining || "").trim is not a function
+```
+---
+## Bug Chain
+### Step 1 — Wrap-up call returns non-string `remaining`
+At iteration limit, `runAgentLoop` sends the `WRAP_UP_NOTE` and parses the model's JSON response. The model returned `checkpoint.remaining` as a non-string value (array or object) instead of a plain text string. `parsedWrapUp.checkpoint` was stored and returned with no type validation.
+### Step 2 — Zero-progress detection crashes on `.trim()`
+In `_runHandleChat`, finding 007 introduced zero-progress detection:
+```js
+const currentRemaining = (run.checkpoint.remaining || '').trim();
+```
+The `|| ''` guard only catches falsy values (null, undefined). A truthy non-string (array, object) passes through the `||` and `.trim()` is called on a non-string:
+```
+TypeError: (run.checkpoint.remaining || "").trim is not a function
+```
+### Step 3 — Outer catch logs the error and re-throws
+The `try/catch` at the top of the handoff loop caught the TypeError, wrote an `error` status log entry, and re-threw. The Telegram handler surfaced the raw error message.
+---
+## Secondary Issues
+**`resumeContent` (line 520)**: `run.checkpoint.remaining || 'Continue with the task.'` — if `remaining` is a truthy non-string, it would be pushed directly into `session.messages` as the next user message content. The message API expects a string, so this would produce a malformed conversation message.
+**`failedApproaches` spread (lines 461–463)**: If the model returns `failedApproaches` as a non-array (string, object), `push(...value)` would spread wrong data. A string spreads individual characters; an object spreads its enumerable values.
+---
+## Root Cause
+Same class of bug as finding 009 (non-string `response` field). Finding 009 hardened `response` and `logSummary` extraction, but the `checkpoint` sub-object fields were not included in that hardening pass. Models — especially smaller/free models under iteration-limit pressure — sometimes return structured data (arrays, objects) in fields the system prompt specifies as plain text strings.
+---
+## Fix
+### `src/server/agent.js` — normalize checkpoint fields at source
+Added a normalization block immediately inside the `if (parsedWrapUp.checkpoint)` branch, before any checkpoint field is accessed downstream:
+```js
+const cp = parsedWrapUp.checkpoint;
+// remaining must be a string — used as the next run's resume prompt
+if (typeof cp.remaining !== 'string') {
+  cp.remaining = Array.isArray(cp.remaining)
+    ? cp.remaining.map(String).join('\n')
+    : cp.remaining != null ? JSON.stringify(cp.remaining) : '';
+}
+// failedApproaches must be an array of strings — spread into session metadata
+if (!Array.isArray(cp.failedApproaches)) {
+  cp.failedApproaches = [];
+} else {
+  cp.failedApproaches = cp.failedApproaches.map(item =>
+    typeof item === 'string' ? item : JSON.stringify(item)
+  );
+}
+```
+**Array coercion for `remaining`**: when the model returns an array (e.g., `["install Java", "create symlink"]`), elements are joined with newlines rather than JSON-stringified — producing a natural readable resume prompt rather than raw JSON syntax.
+**Centralized normalization**: fixing at source (right after parse) rather than at each use site means lines 469 and 520 need no change. Any future use of `checkpoint.remaining` or `checkpoint.failedApproaches` is automatically safe.
+### `src/server/agent.js` — update `WRAP_UP_NOTE`
+Added explicit type constraints to the `remaining` field description and a trailing instruction:
+```
+"remaining": "What still needs to be done — as a plain text string, never an array or object."
+...
+remaining must be a plain text string. failedApproaches must be a JSON array of strings.
+```
+---
+## What Was Not Changed
+- `agent.js` lines 469 and 520 — no changes needed; normalization at source makes them safe
+- `src/channels/telegram/index.js` — finding 007 and 009 already added `.catch(() => {})` and type guards on delivery
+- `sessions.js`, `tools.js` — no changes needed
+---
+## Outcome
+| Fix | Files changed |
+|-----|--------------|
+| Normalize `checkpoint.remaining` to string and `checkpoint.failedApproaches` to string array at source | `src/server/agent.js` |
+| Add explicit type constraints to WRAP_UP_NOTE | `src/server/agent.js` |
+**Effect**: instead of a `TypeError` crash mid-handoff-loop, the model's non-string `remaining` value is coerced to a readable string and used as the resume prompt. The session continues normally.
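
Reviewer sketch (not part of the package): the normalization from finding 010, pulled into a standalone function so its behavior can be checked in isolation. The function name `normalizeCheckpoint` and the sample inputs are hypothetical; the package applies the same logic inline to `parsedWrapUp.checkpoint`. The second half shows what an unnormalized spread does. In standard JavaScript, spreading a string into call arguments yields its characters, while spreading a plain object throws a `TypeError` because plain objects are not iterable.

```javascript
// Hypothetical standalone version of the inline normalization block.
function normalizeCheckpoint(cp) {
  // remaining must be a string: arrays are joined with newlines, other
  // non-null values are JSON-stringified, null/undefined become ''.
  if (typeof cp.remaining !== 'string') {
    cp.remaining = Array.isArray(cp.remaining)
      ? cp.remaining.map(String).join('\n')
      : cp.remaining != null ? JSON.stringify(cp.remaining) : '';
  }
  // failedApproaches must be an array of strings; a non-array is discarded.
  cp.failedApproaches = Array.isArray(cp.failedApproaches)
    ? cp.failedApproaches.map(item =>
        typeof item === 'string' ? item : JSON.stringify(item))
    : [];
  return cp;
}

const cp = normalizeCheckpoint({
  remaining: ['install Java', 'create symlink'],
  failedApproaches: 'curl download failed',
});

// After normalization, the zero-progress check can call .trim() safely.
const currentRemaining = (cp.remaining || '').trim();

// What spreading unnormalized values would do:
const chars = [];
chars.push(...'ab'); // a string spreads into individual characters

let objectSpreadThrows = false;
try {
  [].push(...{ a: 1 }); // a plain object is not iterable
} catch (err) {
  objectSpreadThrows = err instanceof TypeError;
}

console.log(currentRemaining);
console.log(cp.failedApproaches, chars, objectSpreadThrows);
```

Note that a string passed as `failedApproaches` is dropped rather than wrapped in an array; that mirrors the diff, and wrapping it instead would be a reasonable alternative.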

package/docs/system-prompt.md CHANGED

@@ -30,6 +30,8 @@ There are two types of responses depending on whether you need to use tools:
   "logSummary": "A concise explanation of what you did and why, written for a human reading the logs."
 }
+The `response` value must be a plain text string — never an array or object. If you need to present structured data (e.g. a list of items), format it as text within the string value.
 Never include markdown code fences, preamble, or any text outside this JSON object. If you cannot complete a task, explain why in the `response` field — still as valid JSON.
 ## Tool Use

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ducci/jarvis",
-  "version": "1.0.25",
+  "version": "1.0.27",
   "description": "A fully automated agent system that lives on a server.",
   "main": "./src/index.js",
   "type": "module",

package/src/channels/telegram/index.js CHANGED

@@ -67,8 +67,11 @@ export async function startTelegramChannel(config) {
     try {
       const MAX_TG = 4096;
-      // Guard against empty response (e.g. format_error returns empty string)
-      const text = result.response?.trim()
+      // Guard against empty or non-string response (e.g. model returns array instead of string)
+      const rawResponse = typeof result.response === 'string'
+        ? result.response
+        : result.response != null ? JSON.stringify(result.response, null, 2) : '';
+      const text = rawResponse.trim()
         || 'The agent encountered an error and could not produce a response. Please try again.';
       if (text.length <= MAX_TG) {
         await ctx.reply(text);

package/src/server/agent.js CHANGED

@@ -19,12 +19,12 @@ Respond with your normal JSON, but add a checkpoint field:
   "logSummary": "Human-readable summary of what happened in this run.",
   "checkpoint": {
     "progress": "What has been fully completed so far.",
-    "remaining": "What still needs to be done to finish the task.",
+    "remaining": "What still needs to be done to finish the task — as a plain text string, never an array or object.",
     "failedApproaches": ["Concise description of each approach that was tried and failed, e.g. 'downloading subfinder via curl from GitHub releases — connection reset'. Omit array entries for things that succeeded. Leave as empty array if nothing failed."]
   }
 }
-The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed.]`;
+The checkpoint field will be used to automatically resume the task in the next run. failedApproaches is injected into the next run so the agent does not waste iterations repeating strategies that already failed. remaining must be a plain text string. failedApproaches must be a JSON array of strings.]`;
 // Serializes concurrent requests for the same session. Maps sessionId to the
 // tail of the current request chain (a Promise that resolves when the last
@@ -247,7 +247,9 @@ async function runAgentLoop(client, config, session, prepareMessages) {
     }
     session.messages.push({ role: 'assistant', content });
-    response = parsed.response || content;
+    response = typeof parsed.response === 'string'
+      ? parsed.response
+      : JSON.stringify(parsed.response, null, 2);
     logSummary = parsed.logSummary || '';
     done = true;
@@ -312,9 +314,27 @@ async function runAgentLoop(client, config, session, prepareMessages) {
     session.messages.push({ role: 'assistant', content: wrapUpContent });
     if (parsedWrapUp) {
-      response = parsedWrapUp.response || '';
+      response = typeof parsedWrapUp.response === 'string'
+        ? parsedWrapUp.response
+        : parsedWrapUp.response != null ? JSON.stringify(parsedWrapUp.response, null, 2) : '';
       logSummary = parsedWrapUp.logSummary || '';
       if (parsedWrapUp.checkpoint) {
+        // Normalize checkpoint fields to their expected types. Models sometimes
+        // return arrays or objects in fields that must be strings — the same class
+        // of bug fixed for `response` in finding 009.
+        const cp = parsedWrapUp.checkpoint;
+        if (typeof cp.remaining !== 'string') {
+          cp.remaining = Array.isArray(cp.remaining)
+            ? cp.remaining.map(String).join('\n')
+            : cp.remaining != null ? JSON.stringify(cp.remaining) : '';
+        }
+        if (!Array.isArray(cp.failedApproaches)) {
+          cp.failedApproaches = [];
+        } else {
+          cp.failedApproaches = cp.failedApproaches.map(item =>
+            typeof item === 'string' ? item : JSON.stringify(item)
+          );
+        }
         return {
           iteration,
           response,