npm - open-agents-ai - Versions diffs - 0.187.478 → 0.187.480 - Mend

open-agents-ai 0.187.478 → 0.187.480

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/dist/index.js +171 -22
package/npm-shrinkwrap.json +2 -2
package/package.json +1 -1
package/prompts/agentic/system-large.md +7 -7
package/prompts/agentic/system-medium.md +8 -13

package/dist/index.js CHANGED

@@ -512194,17 +512194,18 @@ function buildStagnationDiagnostic(signals) {
     ``,
     `2. STATE A HYPOTHESIS in writing — what specifically is wrong? "I think X is failing because Y." Be concrete. Do NOT propose a fix yet.`,
     ``,
-    `3. VERIFY ONE ASSUMPTION — pick the ONE thing you most BELIEVE to be true and test it with the smallest possible command:`,
-    `     • If you think a package is installed: ls node_modules/<name>/package.json`,
-    `     • If you think an env var is set: printenv <NAME>`,
-    `     • If you think a file imports correctly: head -5 <file>`,
-    `     • If you don't know what an error means: web_search("<exact error string>")`,
+    `3. VERIFY ONE ASSUMPTION — pick the ONE thing you most BELIEVE to be true and test it with the smallest possible command native to whatever ecosystem you're in. Examples of the *shape* (not the exact commands):`,
+    `     • Is this artifact present on disk? (one read of the path)`,
+    `     • Does this import / reference resolve? (read 5 lines around it)`,
+    `     • Is this environment value set? (one query)`,
+    `     • Is this binary on PATH? (one which/where)`,
+    `     • Don't know what an error means? web_search("<exact error string>")`,
     ``,
-    `4. CHECK SILENT FAILURES — npm install reporting "added N packages" does NOT mean ALL declared deps installed; npm sometimes drops packages with peer-dep conflicts without erroring. Verify each expected dep individually.`,
+    `4. CHECK SILENT FAILURES — package managers and build systems frequently report "success" while silently dropping artifacts you needed. Don't trust summary output ("added N", "build complete") without verifying the SPECIFIC artifact exists.`,
     ``,
     `DO NOT in your next response:`,
     `  • Try another version, flag, or variant of any command in the list above`,
-    `  • Wipe node_modules / re-install — that hides the original error`,
+    `  • Wipe caches / re-install / re-build — that hides the original error`,
     `  • Call task_complete — being stuck on a debug problem is NEVER grounds for task_complete`,
     ``,
     `task_complete is ONLY for actual completion or unrecoverable hardware/permission errors. You are stuck on a fixable problem; diagnose it.`
@@ -512228,6 +512229,87 @@ var init_critic = __esm({
   }
 });
+// packages/orchestrator/dist/reflection.js
+function categorizeError(errorText) {
+  if (!errorText)
+    return "unknown";
+  for (const { category, re } of CATEGORY_PATTERNS) {
+    if (re.test(errorText))
+      return category;
+  }
+  return "unknown";
+}
+function buildStem(toolName, args) {
+  if (!args || Object.keys(args).length === 0)
+    return toolName;
+  const entries = Object.entries(args).sort(([a2], [b]) => a2.localeCompare(b));
+  const first2 = entries[0];
+  const v = typeof first2[1] === "string" ? first2[1] : JSON.stringify(first2[1]);
+  return `${toolName}:${first2[0]}=${v.slice(0, 60)}`;
+}
+function firstSignalLine(errorText) {
+  if (!errorText)
+    return "";
+  const lines = errorText.split(/\r?\n/);
+  for (const raw of lines) {
+    const line = raw.trim();
+    if (!line)
+      continue;
+    if (line === "Error:" || line === "error:")
+      continue;
+    return line.slice(0, 200);
+  }
+  return errorText.slice(0, 200);
+}
+function synthesizeReflection(input) {
+  const category = categorizeError(input.errorText);
+  const stem = buildStem(input.toolName, input.args);
+  const argPreview = JSON.stringify(input.args ?? {}).slice(0, 120);
+  return {
+    stem,
+    attempted: `${input.toolName}(${argPreview})`,
+    wentWrong: firstSignalLine(input.errorText),
+    hypothesis: HYPOTHESES[category],
+    turn: input.turn,
+    attempts: (input.priorAttempts ?? 0) + 1
+  };
+}
+function renderReflectionMessage(r2) {
+  return [
+    `[REFLECTION — your last attempt of \`${r2.attempted}\` failed (turn ${r2.turn}, ${r2.attempts} attempt${r2.attempts === 1 ? "" : "s"} so far).`,
+    `Last error: "${r2.wentWrong}"`,
+    `Hypothesis: ${r2.hypothesis}`,
+    `VERIFY this hypothesis with a single small command BEFORE retrying the same tool. If you retry without verifying, you will likely fail the same way.]`
+  ].join("\n");
+}
+var CATEGORY_PATTERNS, HYPOTHESES;
+var init_reflection = __esm({
+  "packages/orchestrator/dist/reflection.js"() {
+    "use strict";
+    CATEGORY_PATTERNS = [
+      { category: "permission_denied", re: /\b(permission denied|eacces|access denied|operation not permitted|forbidden)\b/i },
+      { category: "type_or_reference_error", re: /\b(type error|cannot find module|cannot find name|is not (a function|defined|assignable)|undefined reference|unresolved (import|reference)|missing required)\b/i },
+      { category: "connection_refused", re: /\b(connection refused|econnrefused|connection reset|econnreset|host unreachable|getaddrinfo|enotfound)\b/i },
+      { category: "timeout", re: /\b(timeout|timed out|etimedout|deadline exceeded)\b/i },
+      { category: "syntax_error", re: /\b(syntax error|parse error|unexpected token|unexpected end of (input|json)|malformed)\b/i },
+      { category: "not_found", re: /\b(not found|enoent|no such file|cannot find|does not exist|404)\b/i },
+      // Use [1-9]\d* so multi-digit non-zero codes (e.g. "return code 127") match —
+      // the prior [^0] only matched a single character and failed on multi-digit.
+      { category: "nonzero_exit", re: /\b(exit code [1-9]\d*|exit status [1-9]\d*|command failed|exit code: ?[1-9]\d*|return code [1-9]\d*)\b/i }
+    ];
+    HYPOTHESES = {
+      permission_denied: "permissions issue — check ownership and mode of the target; you may need to operate on a writeable location",
+      not_found: "the named resource doesn't exist at the expected location — verify the path/name with a single-line list before retrying",
+      connection_refused: "remote service is unreachable — verify it's running and reachable before retrying with the same address",
+      timeout: "operation took too long — reduce scope (smaller batch, fewer items) or verify the service is healthy",
+      syntax_error: "malformed input — re-read the surrounding context; the input you produced doesn't match what the consumer expects",
+      type_or_reference_error: "a name, type, or import doesn't resolve — verify the reference matches what's defined; do not guess at the symbol",
+      nonzero_exit: "the command exited with a failure code — read the FULL error output and verify args + prerequisites before retrying",
+      unknown: "re-read the full error message and identify the most likely cause; verify your assumption with a single small command before retrying"
+    };
+  }
+});
 // packages/orchestrator/dist/pressure-gate.js
 function detectPressure(message2) {
   const hasProfanity = PRESSURE_SIGNALS.test(message2);
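
The new reflection module in the hunk above classifies an error by scanning `CATEGORY_PATTERNS` in order and keys each stored reflection by a "stem" built from the tool name and its alphabetically first argument. The following is a minimal sketch of those two behaviours. It copies `buildStem` and `categorizeError` from the diff but uses only a two-entry subset of the pattern table, and the sample error strings and tool arguments are invented for illustration.

```javascript
// Two entries from CATEGORY_PATTERNS, kept in the same relative order as the diff.
const CATEGORY_PATTERNS = [
  { category: "type_or_reference_error", re: /\b(cannot find module|is not (a function|defined))\b/i },
  { category: "not_found", re: /\b(not found|enoent|no such file|cannot find)\b/i },
];

// First matching pattern wins, so table order decides ambiguous cases.
function categorizeError(errorText) {
  if (!errorText) return "unknown";
  for (const { category, re } of CATEGORY_PATTERNS) {
    if (re.test(errorText)) return category;
  }
  return "unknown";
}

// Stem = tool name + the alphabetically first argument, value truncated to 60 chars.
function buildStem(toolName, args) {
  if (!args || Object.keys(args).length === 0) return toolName;
  const entries = Object.entries(args).sort(([a], [b]) => a.localeCompare(b));
  const [key, value] = entries[0];
  const v = typeof value === "string" ? value : JSON.stringify(value);
  return `${toolName}:${key}=${v.slice(0, 60)}`;
}

// "Cannot find module" also contains "cannot find", but the earlier entry claims it.
const moduleError = categorizeError("Error: Cannot find module 'left-pad'");
const fileError = categorizeError("ENOENT: no such file or directory, open 'a.txt'");

// Only the first sorted argument contributes, so a differing timeout is ignored.
const stemA = buildStem("shell", { timeout: 30, command: "npm test" });
const stemB = buildStem("shell", { command: "npm test", timeout: 60 });

console.log(moduleError, fileError, stemA, stemB);
```

Because only one argument enters the stem, two calls that differ in a later argument share a reflection entry. That appears to be intentional (a retry with a tweaked flag still surfaces the earlier failure), though the diff itself does not state the rationale.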
@@ -518337,10 +518419,8 @@ function detectTaskMode(task) {
     return true;
   if (/(\/[\w.-]+){2,}/.test(task.slice(0, 2e3)))
     return true;
-  if (/\b(implement|build|create|refactor|write|fix|migrate|deploy|generate|setup|set up|develop|design|integrate)\b/.test(head)) {
-    if (/\b(spec|file|module|component|api|endpoint|database|schema|test|build|next\.js|typescript|react|prisma|tailwind|sql|python|rust|go)\b/.test(head)) {
-      return true;
-    }
+  if (/\b(implement|build|create|refactor|rewrite|fix|migrate|deploy|generate|setup|set up|develop|design|integrate|configure|install|debug|port|extend|add)\b/.test(head)) {
+    return true;
   }
   return false;
 }
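
The `detectTaskMode` hunk above replaces a two-stage check (action verb plus technical noun) with a single verb check, and it also swaps `write` for `rewrite` in the verb list. A small sketch comparing only this final gate before and after the change is shown below. The regexes are copied from the diff; the `oldGate` and `newGate` helper names and the sample strings are hypothetical, and the earlier checks in the function (not visible in this hunk) are left out.

```javascript
// Regexes copied from the removed (-) and added (+) lines of the hunk.
const OLD_VERBS = /\b(implement|build|create|refactor|write|fix|migrate|deploy|generate|setup|set up|develop|design|integrate)\b/;
const OLD_NOUNS = /\b(spec|file|module|component|api|endpoint|database|schema|test|build|next\.js|typescript|react|prisma|tailwind|sql|python|rust|go)\b/;
const NEW_VERBS = /\b(implement|build|create|refactor|rewrite|fix|migrate|deploy|generate|setup|set up|develop|design|integrate|configure|install|debug|port|extend|add)\b/;

// Old gate: verb AND technical noun. New gate: verb alone.
const oldGate = (head) => OLD_VERBS.test(head) && OLD_NOUNS.test(head);
const newGate = (head) => NEW_VERBS.test(head);

const samples = [
  "fix the login flow",          // verb without a listed noun
  "write a test for the parser", // "write" was dropped from the verb list
  "add two numbers",             // "add" is newly listed
];

const comparison = samples.map((head) => ({ head, old: oldGate(head), new: newGate(head) }));
console.log(comparison);
```

Under these assumptions the gate becomes more permissive for most imperative requests, while a prompt whose only trigger is `write` no longer passes this particular check unless an earlier branch of the function already returned `true`.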
@@ -518461,6 +518541,7 @@ var init_agenticRunner = __esm({
     init_personality();
     init_promptLoader();
     init_critic();
+    init_reflection();
     init_pressure_gate();
     init_dist5();
     init_dist7();
@@ -518587,6 +518668,14 @@ var init_agenticRunner = __esm({
       _errorPatterns = /* @__PURE__ */ new Map();
       _errorGuidanceInjected = /* @__PURE__ */ new Set();
       // prevent duplicate injection per turn
+      // REG-26 (Patch C): Reflexion-style structured failure memory. Indexed by
+      // fingerprint stem (tool + first arg, truncated). When the agent retries a
+      // tool with a stem matching a stored reflection, surface "what was tried,
+      // what went wrong, hypothesis to verify" as a system message before the
+      // dispatch — generic across all stacks. See packages/orchestrator/src/reflection.ts.
+      _failureReflections = /* @__PURE__ */ new Map();
+      _reflectionsInjectedThisTurn = /* @__PURE__ */ new Set();
+      // prevent duplicate inject per turn
       // ── WO-AM-01/04/10: Associative memory stores ──
       // Episode store: every tool call → persistent episode with importance + decay
       // Temporal KG: entities + relations with temporal validity (valid_from/valid_until)
@@ -520730,6 +520819,7 @@ TASK: ${task}` : task;
             break;
           }
           injectionsThisTurn = 0;
+          this._reflectionsInjectedThisTurn.clear();
           while (deferredSoftInjections.length > 0 && injectionsThisTurn < INJECTION_BUDGET_SOFT) {
             const next = deferredSoftInjections.shift();
             messages2.push({ role: next.role, content: next.content });
@@ -521599,6 +521689,16 @@ ${memoryLines.join("\n")}`
               if (observerRedundantBlock) {
                 this._littlemanRedundantBlocks.delete(toolFingerprint);
               }
+              {
+                const _reflStem = buildStem(tc.name, tc.arguments ?? {});
+                if (!this._reflectionsInjectedThisTurn.has(_reflStem)) {
+                  const _reflEntry = this._failureReflections.get(_reflStem);
+                  if (_reflEntry) {
+                    this._reflectionsInjectedThisTurn.add(_reflStem);
+                    pushSoftInjection("system", renderReflectionMessage(_reflEntry));
+                  }
+                }
+              }
               const criticDecision = evaluate({
                 proposedCall: { tool: tc.name, args: tc.arguments ?? {} },
                 fingerprint: toolFingerprint,
@@ -521625,6 +521725,11 @@ ${criticDecision.cachedResult.slice(0, 500)}` : `[BLOCKED — the observer confi
               }
               if (criticDecision.decision === "force_progress_block") {
                 dedupHitCount.set(toolFingerprint, criticDecision.hitNumber);
+                const _existingFp = recentToolResults.get(toolFingerprint);
+                if (_existingFp !== void 0) {
+                  recentToolResults.delete(toolFingerprint);
+                  recentToolResults.set(toolFingerprint, _existingFp);
+                }
                 this.emit({ type: "tool_call", toolName: tc.name, toolArgs: tc.arguments, turn, timestamp: (/* @__PURE__ */ new Date()).toISOString() });
                 this.emit({
                   type: "tool_result",
@@ -521638,6 +521743,11 @@ ${criticDecision.cachedResult.slice(0, 500)}` : `[BLOCKED — the observer confi
               }
               if (criticDecision.decision === "serve_cached") {
                 dedupHitCount.set(toolFingerprint, criticDecision.hitNumber);
+                const _existingFp = recentToolResults.get(toolFingerprint);
+                if (_existingFp !== void 0) {
+                  recentToolResults.delete(toolFingerprint);
+                  recentToolResults.set(toolFingerprint, _existingFp);
+                }
                 this.emit({
                   type: "tool_call",
                   toolName: tc.name,
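
Both critic branches above now delete and re-insert the fingerprint in `recentToolResults`. A JavaScript `Map` iterates in insertion order, so this moves the key to the end without changing its value. The sketch below shows that idiom on its own; the `touch` helper name and the sample keys are hypothetical, and any eviction code that relies on this ordering is not part of this diff.

```javascript
const recent = new Map([["a", 1], ["b", 2], ["c", 3]]);

// Re-insert an existing key so it becomes the most recently inserted entry.
function touch(map, key) {
  const existing = map.get(key);
  if (existing !== undefined) {
    map.delete(key);
    map.set(key, existing);
  }
}

touch(recent, "a");
touch(recent, "missing"); // unknown keys are left alone

const orderAfterTouch = [...recent.keys()];
console.log(orderAfterTouch); // [ 'b', 'c', 'a' ]
```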
@@ -522063,6 +522173,8 @@ ${criticDecision.cachedResult.slice(0, 500)}` : `[BLOCKED — the observer confi
               }
               if (result.success) {
                 this._recentFailures = this._recentFailures.filter((f2) => f2.fingerprint !== toolFingerprint);
+                const _stem = buildStem(tc.name, tc.arguments ?? {});
+                this._failureReflections.delete(_stem);
               }
               if (!result.success) {
                 this._recentFailures.push({
@@ -522076,6 +522188,22 @@ ${criticDecision.cachedResult.slice(0, 500)}` : `[BLOCKED — the observer confi
                 if (this._recentFailures.length > 8) {
                   this._recentFailures = this._recentFailures.slice(-8);
                 }
+                const _refStem = buildStem(tc.name, tc.arguments ?? {});
+                const _prior = this._failureReflections.get(_refStem);
+                const _refErr = (result.error ?? result.output ?? "").toString();
+                const _entry = synthesizeReflection({
+                  toolName: tc.name,
+                  args: tc.arguments ?? {},
+                  errorText: _refErr,
+                  turn,
+                  priorAttempts: _prior?.attempts ?? 0
+                });
+                this._failureReflections.set(_refStem, _entry);
+                if (this._failureReflections.size > 32) {
+                  const oldestKey = this._failureReflections.keys().next().value;
+                  if (oldestKey !== void 0)
+                    this._failureReflections.delete(oldestKey);
+                }
               }
               if (!result.success && tc.name === "shell" && /\[PERMISSION_ERROR\]/.test(result.error ?? "")) {
                 this.emit({
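
The failure branch above stores one reflection per stem, carries the attempt count forward, and caps the map at 32 entries by deleting the first key the iterator yields. One consequence worth noting: `Map.prototype.set` on an existing key updates the value in place and keeps the key's original position, so eviction follows first-insertion order rather than last-failure order. A reduced sketch follows, with a hypothetical cap of 3 instead of 32 and a hypothetical `recordFailure` helper standing in for the inline code.

```javascript
const CAP = 3; // hypothetical small cap for the demo; the diff uses 32
const reflections = new Map();

function recordFailure(stem, turn) {
  const prior = reflections.get(stem);
  // Overwriting an existing key does not move it to the end of the Map.
  reflections.set(stem, { turn, attempts: (prior?.attempts ?? 0) + 1 });
  if (reflections.size > CAP) {
    const oldestKey = reflections.keys().next().value;
    if (oldestKey !== undefined) reflections.delete(oldestKey);
  }
}

recordFailure("shell:command=a", 1);
recordFailure("shell:command=b", 2);
recordFailure("shell:command=c", 3);
recordFailure("shell:command=a", 4); // second failure for "a": attempts becomes 2
const attemptsForA = reflections.get("shell:command=a").attempts;
recordFailure("shell:command=d", 5); // size exceeds CAP, first-inserted key is dropped

const remaining = [...reflections.keys()];
console.log(attemptsForA, remaining);
```

In this run the stem that failed most recently before the overflow (`a`) is the one evicted, because it was also the first one inserted. Whether that matters in practice at a cap of 32 is a judgment call the diff does not address.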
@@ -522341,9 +522469,35 @@ ${sr.result.output}`;
               for (const batch2 of batches) {
                 if (this.aborted)
                   break;
+                const batchFingerprintFirstId = /* @__PURE__ */ new Map();
+                const batchInFlight = /* @__PURE__ */ new Map();
+                const buildBatchFp = (call) => {
+                  const args = call.args ?? {};
+                  const argsKey = Object.entries(args).sort(([a2], [b]) => a2.localeCompare(b)).map(([k, v]) => `${k}=${typeof v === "string" ? v.slice(0, 160) : JSON.stringify(v).slice(0, 160)}`).join(",");
+                  return `${call.name}:${argsKey}`;
+                };
+                for (const call of batch2.calls) {
+                  const fp = buildBatchFp(call);
+                  if (!batchFingerprintFirstId.has(fp)) {
+                    batchFingerprintFirstId.set(fp, call.id);
+                  }
+                }
                 const results = await executeBatch(batch2, async (call) => {
                   const originalTc = rawToolCalls.find((tc) => tc.id === call.id);
-                  return executeSingle(originalTc);
+                  const fp = buildBatchFp(call);
+                  const firstId = batchFingerprintFirstId.get(fp);
+                  if (firstId !== void 0 && call.id !== void 0 && firstId !== call.id) {
+                    const inflight = batchInFlight.get(fp);
+                    if (inflight) {
+                      const cloned = await inflight;
+                      if (!cloned)
+                        return null;
+                      return { tc: { ...cloned.tc, id: call.id }, output: cloned.output };
+                    }
+                  }
+                  const promise = executeSingle(originalTc);
+                  batchInFlight.set(fp, promise);
+                  return promise;
                 }, 5);
                 for (const r2 of results) {
                   if (r2) {
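
The batch hunk above deduplicates identical tool calls within one batch: the first call with a given fingerprint executes, and later duplicates await that same promise and re-label the result with their own id. The sketch below reproduces the pattern under simplifying assumptions. `runBatch` and `fakeExecute` are hypothetical names, `Promise.all` stands in for the package's `executeBatch` (whose concurrency limit of 5 is not modelled), and the result shape is flattened to `{ id, output }`.

```javascript
async function runBatch(calls, executeSingle) {
  const firstIdByFp = new Map(); // fingerprint -> id of the first call that has it
  const inFlight = new Map();    // fingerprint -> promise of the executing call

  // Sort argument entries so key order does not change the fingerprint.
  const fingerprint = (call) =>
    `${call.name}:${JSON.stringify(Object.entries(call.args ?? {}).sort(([a], [b]) => a.localeCompare(b)))}`;

  for (const call of calls) {
    const fp = fingerprint(call);
    if (!firstIdByFp.has(fp)) firstIdByFp.set(fp, call.id);
  }

  return Promise.all(calls.map(async (call) => {
    const fp = fingerprint(call);
    const pending = inFlight.get(fp);
    if (firstIdByFp.get(fp) !== call.id && pending) {
      // Duplicate: reuse the first call's result under this call's id.
      const first = await pending;
      return first && { ...first, id: call.id };
    }
    // First occurrence (or no promise registered yet): execute for real.
    const promise = executeSingle(call);
    inFlight.set(fp, promise);
    return promise;
  }));
}

let executions = 0;
const fakeExecute = async (call) => {
  executions += 1;
  return { id: call.id, output: `ran ${call.name}` };
};

const batchResult = runBatch(
  [
    { id: "1", name: "file_read", args: { path: "a.txt" } },
    { id: "2", name: "file_read", args: { path: "a.txt" } },
    { id: "3", name: "file_read", args: { path: "b.txt" } },
  ],
  fakeExecute,
);

batchResult.then((results) => console.log(executions, results.map((r) => r.id)));
```

As in the diff, a duplicate that finds no in-flight promise simply falls through and executes on its own, so the deduplication is best-effort rather than a guarantee.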
@@ -525000,7 +525154,11 @@ ${transcript}`
           /\buse\s+(\w+)/g,
           /\bcall\s+(\w+)/g,
           /`([a-z_]+)`/g,
-          /\btool[:\s]+(\w+)/g
+          /\btool[:\s]+(\w+)/g,
+          // Function-call syntax: `name(args)` is the strongest call-site signal
+          // a name can have. A bare identifier mention isn't enough — that catches
+          // filesystem nouns like `node_modules` or `package_lock`.
+          /\b([a-z][a-z0-9_]*[a-z0-9])\s*\(/g
         ];
         const contextualNames = /* @__PURE__ */ new Set();
         for (const pat of TOOL_CONTEXT_PATTERNS) {
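
The pattern added above treats `name(` as a call-site signal so that bare snake_case nouns are no longer picked up on their own. A short sketch of what the regex captures is given below. The regex is copied from the diff; the sample prompt strings and variable names are invented, and how the surrounding loop filters these captures is not visible in this hunk.

```javascript
// Copied from the added line: lowercase identifier (2+ chars) followed by "(".
const CALL_SITE = /\b([a-z][a-z0-9_]*[a-z0-9])\s*\(/g;

const prompt = 'Run file_read("a.txt") first; never delete node_modules or package_lock.';
const names = [...prompt.matchAll(CALL_SITE)].map((m) => m[1]);

// Caveat: \s* allows whitespace, so an ordinary word before a parenthesis also matches.
const prose = "See the note (below).";
const proseNames = [...prose.matchAll(CALL_SITE)].map((m) => m[1]);

console.log(names, proseNames); // [ 'file_read' ] [ 'note' ]
```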
@@ -525011,14 +525169,6 @@ ${transcript}`
               contextualNames.add(name10);
           }
         }
-        const nameCounts = /* @__PURE__ */ new Map();
-        for (const name10 of referenced) {
-          const escaped = name10.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
-          const occurrences = (systemPrompt.match(new RegExp(`\\b${escaped}\\b`, "g")) ?? []).length;
-          nameCounts.set(name10, occurrences);
-          if (occurrences >= 2)
-            contextualNames.add(name10);
-        }
         const IGNORE_LIST = /* @__PURE__ */ new Set([
           "tool_use",
           "tool_call",
@@ -525063,7 +525213,6 @@ ${transcript}`
           // reserved status for partial-done todos
           "not_started",
           // alternative status phrasing
-          // Shell/bash idioms that look like snake_case
           "ctrl_c",
           "ctrl_d"
         ]);

package/npm-shrinkwrap.json CHANGED

@@ -1,12 +1,12 @@
 {
   "name": "open-agents-ai",
-  "version": "0.187.478",
+  "version": "0.187.480",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "open-agents-ai",
-      "version": "0.187.478",
+      "version": "0.187.480",
       "hasInstallScript": true,
       "license": "CC-BY-NC-4.0",
       "dependencies": {

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "open-agents-ai",
-  "version": "0.187.478",
+  "version": "0.187.480",
   "description": "AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop",
   "type": "module",
   "main": "./dist/index.js",

package/prompts/agentic/system-large.md CHANGED

@@ -169,17 +169,17 @@ If you have tried 2+ approaches to the same blocker and both failed, **STOP atte
 **The diagnostic loop (one cycle per turn, NOT batched):**
-1. **READ THE FULL ERROR** — re-read the most recent failure output ENTIRELY. Don't skim the first 200 chars. If output is in a log packet, use `log_explore` with `op="errors"`, then `op="lines"` for context.
-2. **VERIFY ONE ASSUMPTION** — pick ONE thing you BELIEVE to be true and test it with the smallest possible command (e.g. "I think tailwindcss is installed" → `ls node_modules/tailwindcss/package.json`).
-3. **STATE A HYPOTHESIS in writing** before your next action. Then design ONE experiment that would CONFIRM or REFUTE it (not fix it — verify it first).
+1. **READ THE FULL ERROR** — re-read the most recent failure output ENTIRELY. If it's in a log packet, query `op="errors"` then `op="lines"` for context.
+2. **VERIFY ONE ASSUMPTION** — pick ONE thing you BELIEVE to be true and test it with the smallest possible command native to your ecosystem. Examples of the shape: "is this artifact present?", "does this import resolve?", "is this env var set?". One read, one fact verified.
+3. **STATE A HYPOTHESIS in writing** before your next action. Then design ONE experiment that CONFIRMS or REFUTES it — verify, do NOT fix yet.
 4. **WEB SEARCH the exact error message** if you don't know what it means. A 30-second lookup beats 10 retry attempts.
-5. **CHECK THE OBVIOUS** — silent failures are common. `npm install` reporting "added 141 packages" doesn't mean ALL declared deps installed; npm sometimes drops packages with peer-dep conflicts without erroring. Verify each expected dep with `ls node_modules/<name>/package.json`.
+5. **CHECK THE OBVIOUS** — package managers and build systems frequently report "success" while silently dropping artifacts. Don't trust summary output ("added N", "build complete") without verifying the SPECIFIC artifact you needed actually exists.
 6. Only AFTER root cause is verified, attempt ONE fix targeting that cause. If the fix fails, return to step 1 with the new error.
 **What diagnostic mode is NOT:**
-- Trying another version (`tailwindcss@3.4.19` after `tailwindcss@4.0.0`) — that's variant-fatigue, not diagnosis.
-- Adding `--force` or `--legacy-peer-deps` — those mask root causes.
-- Wiping node_modules and re-installing — hides the original error.
+- Trying another version of the same dependency after one failed — variant-fatigue, not diagnosis.
+- Adding force/override flags that suppress warnings — masks root causes.
+- Wiping caches/dependencies and reinstalling — hides the original error.
 - Calling task_complete to escape — task_complete is NEVER the answer to a stuck debugging session.
 - Use grep_search and find_files for efficient exploration (don't dump entire directories)
 - Use file_edit for small changes instead of rewriting entire files

package/prompts/agentic/system-medium.md CHANGED

@@ -102,27 +102,22 @@ If you have tried 2+ approaches to the same blocker and both failed, **STOP atte
 **The diagnostic loop (one cycle per turn, NOT batched):**
-1. **READ THE FULL ERROR** — re-read the most recent failure output ENTIRELY. Don't skim the first 200 chars. If the output is in a log packet, use `log_explore` with `op="errors"` to see every marker, then `op="lines"` for surrounding context.
+1. **READ THE FULL ERROR** — re-read the most recent failure output ENTIRELY. Don't skim the first 200 chars. If the output is in a log packet, query it with `op="errors"` then `op="lines"` for surrounding context.
-2. **VERIFY ONE ASSUMPTION** — pick ONE thing you BELIEVE to be true and test it with the smallest possible command:
-   - "I think tailwindcss is installed" → `ls node_modules/tailwindcss/package.json` (one line)
-   - "I think the import path is right" → `cat src/lib/x.ts | head -5`
-   - "I think the env var is set" → `printenv VAR_NAME`
+2. **VERIFY ONE ASSUMPTION** — pick ONE thing you BELIEVE to be true and test it with the smallest possible command native to whatever ecosystem you're in. Examples of the *shape* (not the exact commands): "is this artifact present on disk?", "does this import resolve?", "is this environment variable set?", "does this binary exist on PATH?". One read, one fact verified.
-3. **STATE A HYPOTHESIS in writing** before your next action:
-   - "Hypothesis: tailwindcss didn't install because @tailwindcss/postcss has a peer-dep conflict with autoprefixer."
-   - Then design ONE experiment that would CONFIRM or REFUTE it (not fix it — verify it first).
+3. **STATE A HYPOTHESIS in writing** before your next action — "I think X is failing because Y." Be concrete. Then design ONE experiment that would CONFIRM or REFUTE it (verify it first; do NOT fix yet).
-4. **WEB SEARCH the exact error message** if you don't know what it means. `web_search("exact error string from terminal")`. A 30-second lookup beats 10 retry attempts.
+4. **WEB SEARCH the exact error message** if you don't know what it means. Quote the exact error string. A 30-second lookup beats 10 retry attempts.
-5. **CHECK THE OBVIOUS** — silent failures are common. `npm install` saying "added 141 packages" doesn't mean ALL declared deps installed; npm sometimes drops packages with peer-dep conflicts without erroring. Verify each expected dep with `ls node_modules/<name>/package.json`.
+5. **CHECK THE OBVIOUS** — package managers and build systems frequently report "success" while silently dropping artifacts. Don't trust a summary like "added N packages" or "build complete" without verifying the SPECIFIC artifact you needed actually exists. Check each expected output explicitly.
 6. Only AFTER root cause is verified, attempt ONE fix targeting that cause. If the fix fails, return to step 1 with the new error.
 **What diagnostic mode is NOT:**
-- Trying another version (`tailwindcss@3.4.19` after `tailwindcss@4.0.0` failed) — that's variant-fatigue, not diagnosis.
-- Adding `--force` or `--legacy-peer-deps` — those mask root causes, they don't reveal them.
-- Wiping node_modules and re-installing — that just hides the original error.
+- Trying a different version of the same dependency after one failed — that's variant-fatigue, not diagnosis.
+- Adding force/override flags that suppress warnings — those mask root causes, they don't reveal them.
+- Wiping caches/dependencies and reinstalling — that hides the original error.
 - Calling task_complete to escape — task_complete is NEVER the answer to a stuck debugging session.
 - Do NOT output long explanations. Focus on tool calls.
 - If file_read/list_directory returns ENOENT, use list_directory on the project root — do NOT guess parent paths