open-agents-ai 0.187.474 → 0.187.475

package/dist/index.js CHANGED
@@ -520548,6 +520548,14 @@ TASK: ${task}` : task;
  for (const [tool, budget] of Object.entries(toolBudgets)) {
  toolCallBudget.set(tool, budget);
  }
+ const stagnationWindow = [];
+ let stagnationCooldownUntilTurn = -1;
+ const STAG_WINDOW_TURNS = 40;
+ const STAG_WINDOW_MS = 10 * 60 * 1e3;
+ const STAG_MIN_SAMPLES = 30;
+ const STAG_FAILURE_THRESHOLD = 5;
+ const STAG_VARIANT_THRESHOLD = 4;
+ const STAG_FILES_DELTA_MIN = 3;
  for (let turn = 0; turn < this.options.maxTurns; turn++) {
  clearTurnState(this._appState);
  this._maybeApplyThinkGuard();
@@ -520562,6 +520570,70 @@ TASK: ${task}` : task;
  this.emit({ type: "error", content: "Task aborted by user", timestamp: (/* @__PURE__ */ new Date()).toISOString() });
  break;
  }
+ if (turn > stagnationCooldownUntilTurn && stagnationWindow.length >= STAG_MIN_SAMPLES) {
+ const cutoffTurn = turn - STAG_WINDOW_TURNS;
+ const cutoffTs = Date.now() - STAG_WINDOW_MS;
+ while (stagnationWindow.length && (stagnationWindow[0].turn < cutoffTurn || stagnationWindow[0].ts < cutoffTs)) {
+ stagnationWindow.shift();
+ }
+ if (stagnationWindow.length >= STAG_MIN_SAMPLES) {
+ const completedDelta = stagnationWindow[stagnationWindow.length - 1].completedTodos - stagnationWindow[0].completedTodos;
+ const fileSet = /* @__PURE__ */ new Set();
+ for (const s2 of stagnationWindow)
+ for (const p2 of s2.filesTouchedThisTurn)
+ fileSet.add(p2);
+ const filesDelta = fileSet.size;
+ const failureSum = stagnationWindow.reduce((a2, s2) => a2 + s2.failuresThisTurn, 0);
+ const variantSet = /* @__PURE__ */ new Set();
+ for (const s2 of stagnationWindow)
+ for (const p2 of s2.shellPrefixesThisTurn)
+ variantSet.add(p2);
+ const variantCount = variantSet.size;
+ if (completedDelta === 0 && filesDelta < STAG_FILES_DELTA_MIN && failureSum >= STAG_FAILURE_THRESHOLD && variantCount >= STAG_VARIANT_THRESHOLD) {
+ const variantList = [...variantSet].slice(0, 8).map((v) => ` • ${v}`).join("\n");
+ const stagMsg = [
+ `[STAGNATION DETECTED — DIAGNOSTIC MODE REQUIRED]`,
+ ``,
+ `Over the last ${stagnationWindow.length} turns you have:`,
+ ` • Completed 0 new todos`,
+ ` • Written/edited only ${filesDelta} unique file(s) (need ≥${STAG_FILES_DELTA_MIN} for healthy progress)`,
+ ` • Accumulated ${failureSum} failures`,
+ ` • Tried ${variantCount} different shell-command variants:`,
+ variantList,
+ ``,
+ `You are not making progress — you are trying surface-level variants of the same approach without diagnosing root cause. This is the failure mode that prevents real completion.`,
+ ``,
+ `MANDATORY NEXT ACTIONS (do NOT call task_complete; do NOT try another variant):`,
+ ``,
+ `1. READ THE FULL ERROR — re-read your most recent failure output ENTIRELY. If it's in a log packet, call log_explore({op:"errors"}) then log_explore({op:"lines", start:..., end:...}) for context. Do not skim.`,
+ ``,
+ `2. STATE A HYPOTHESIS in writing — what specifically is wrong? "I think X is failing because Y." Be concrete. Do NOT propose a fix yet.`,
+ ``,
+ `3. VERIFY ONE ASSUMPTION — pick the ONE thing you most BELIEVE to be true and test it with the smallest possible command:`,
+ ` • If you think a package is installed: ls node_modules/<name>/package.json`,
+ ` • If you think an env var is set: printenv <NAME>`,
+ ` • If you think a file imports correctly: head -5 <file>`,
+ ` • If you don't know what an error means: web_search("<exact error string>")`,
+ ``,
+ `4. CHECK SILENT FAILURES — npm install reporting "added N packages" does NOT mean ALL declared deps installed; npm sometimes drops packages with peer-dep conflicts without erroring. Verify each expected dep individually.`,
+ ``,
+ `DO NOT in your next response:`,
+ ` • Try another version, flag, or variant of any command in the list above`,
+ ` • Wipe node_modules / re-install — that hides the original error`,
+ ` • Call task_complete — being stuck on a debug problem is NEVER grounds for task_complete`,
+ ``,
+ `task_complete is ONLY for actual completion or unrecoverable hardware/permission errors. You are stuck on a fixable problem; diagnose it.`
+ ].join("\n");
+ messages2.push({ role: "system", content: stagMsg });
+ stagnationCooldownUntilTurn = turn + 5;
+ this.emit({
+ type: "status",
+ content: `STAGNATION DETECTED — injected diagnostic mode at turn ${turn} (${variantCount} variants, ${failureSum} failures, ${filesDelta} files in window)`,
+ timestamp: (/* @__PURE__ */ new Date()).toISOString()
+ });
+ }
+ }
+ }
  if (pendingConstraintWarnings.length > 0) {
  const warningMsg = "<constraint-recall>\n" + pendingConstraintWarnings.join("\n") + "\n</constraint-recall>";
  messages2.push({ role: "system", content: warningMsg });
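
Review note on the hunk above: the diagnostic prompt is injected only when four conditions hold at once over a sliding window (no new completed todos, fewer than three unique files touched, at least five failures, at least four distinct shell prefixes). The sketch below restates that gate as standalone functions so the thresholds can be exercised in isolation. It assumes samples shaped like the objects pushed onto `stagnationWindow`; the names `pruneWindow`, `isStagnant`, and `makeSample` are hypothetical and do not appear in the package.

```javascript
// Thresholds copied from the diff; everything else is an illustrative sketch.
const STAG_WINDOW_TURNS = 40;
const STAG_WINDOW_MS = 10 * 60 * 1e3;
const STAG_MIN_SAMPLES = 30;
const STAG_FAILURE_THRESHOLD = 5;
const STAG_VARIANT_THRESHOLD = 4;
const STAG_FILES_DELTA_MIN = 3;

// Drop samples from the front that are too old by turn count or by wall clock.
// Mutates and returns the same array, like the while/shift loop in the diff.
function pruneWindow(window, turn, now) {
  const cutoffTurn = turn - STAG_WINDOW_TURNS;
  const cutoffTs = now - STAG_WINDOW_MS;
  while (window.length && (window[0].turn < cutoffTurn || window[0].ts < cutoffTs)) {
    window.shift();
  }
  return window;
}

// True only when all four stagnation conditions hold over the window.
function isStagnant(window) {
  if (window.length < STAG_MIN_SAMPLES) return false;
  const completedDelta =
    window[window.length - 1].completedTodos - window[0].completedTodos;
  const files = new Set();
  const variants = new Set();
  let failures = 0;
  for (const sample of window) {
    for (const p of sample.filesTouchedThisTurn) files.add(p);
    for (const p of sample.shellPrefixesThisTurn) variants.add(p);
    failures += sample.failuresThisTurn;
  }
  return (
    completedDelta === 0 &&
    files.size < STAG_FILES_DELTA_MIN &&
    failures >= STAG_FAILURE_THRESHOLD &&
    variants.size >= STAG_VARIANT_THRESHOLD
  );
}

// Hypothetical sample factory, used only for this example.
function makeSample(turn, ts, overrides = {}) {
  return {
    turn,
    ts,
    completedTodos: 0,
    filesTouchedThisTurn: new Set(),
    failuresThisTurn: 0,
    shellPrefixesThisTurn: new Set(),
    ...overrides,
  };
}

// 30 turns that keep failing while cycling through 4 install variants.
const stuck = Array.from({ length: 30 }, (_, i) =>
  makeSample(i, 1000 + i, {
    failuresThisTurn: 1,
    filesTouchedThisTurn: new Set(["package.json"]),
    shellPrefixesThisTurn: new Set([`npm install pkg@${i % 4}`]),
  })
);

// Same history, except the last turn completed a todo.
const progressing = stuck.map((s, i) =>
  i === stuck.length - 1 ? { ...s, completedTodos: 1 } : s
);

console.log(isStagnant(stuck)); // true
console.log(isStagnant(progressing)); // false
```

Because the conditions are joined with `&&`, a single completed todo anywhere between the first and last sample is enough to suppress the prompt, and after a trigger the diff additionally skips the check for the next five turns via `stagnationCooldownUntilTurn`.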
@@ -522354,6 +522426,39 @@ Your most recent tool calls SUCCEEDED. If the task is complete, call task_comple
  });
  }
  }
+ try {
+ const turnLogTail = toolCallLog.filter((t2) => t2.turn === turn || t2.turn === void 0);
+ const filesTouched = /* @__PURE__ */ new Set();
+ const shellPrefixes = /* @__PURE__ */ new Set();
+ let failuresThisTurn = 0;
+ for (const tc of turnLogTail) {
+ if (tc.success === false)
+ failuresThisTurn++;
+ if (["file_write", "file_edit", "batch_edit", "file_patch"].includes(tc.name)) {
+ const m2 = tc.argsKey?.match(/path=([^,]+)/);
+ if (m2 && m2[1])
+ filesTouched.add(m2[1]);
+ }
+ if (tc.name === "shell") {
+ const cmdMatch = tc.argsKey?.match(/command=([^,]{0,200})/);
+ const cmd = cmdMatch?.[1] ?? "";
+ const prefix = cmd.replace(/^cd\s+\S+\s*&&\s*/, "").split(/\s+/).slice(0, 3).join(" ");
+ if (prefix)
+ shellPrefixes.add(prefix);
+ }
+ }
+ const todosNow = this.readSessionTodos() || [];
+ const completedNow = todosNow.filter((t2) => t2.status === "completed").length;
+ stagnationWindow.push({
+ turn,
+ ts: Date.now(),
+ completedTodos: completedNow,
+ filesTouchedThisTurn: filesTouched,
+ failuresThisTurn,
+ shellPrefixesThisTurn: shellPrefixes
+ });
+ } catch {
+ }
  }
  let prevCycleToolCalls = toolCallCount;
  while (!completed && !this.aborted && this.options.bruteForce && bruteForceCycle < this.options.bruteForceMaxCycles) {
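
Review note on the hunk above: what counts as a "variant" depends entirely on the prefix expression, which strips one leading `cd <dir> &&` and keeps the first three whitespace-separated tokens. The sketch below isolates that expression to show which commands collapse into the same variant. The helper name `shellPrefix` and the sample commands are hypothetical; only the regular expression and the split/slice/join chain are taken from the diff.

```javascript
// Hypothetical helper mirroring the prefix expression in the hunk above.
function shellPrefix(command) {
  return command
    .replace(/^cd\s+\S+\s*&&\s*/, "") // drop a single leading "cd <dir> &&"
    .split(/\s+/)                      // tokenize on whitespace
    .slice(0, 3)                       // keep at most the first three tokens
    .join(" ");
}

// Illustrative commands, not taken from any real session.
const attempts = [
  "cd app && npm install tailwindcss@4.0.0",
  "npm install tailwindcss@3.4.19 --force",
  "cd app && npm install tailwindcss@4.0.0 --legacy-peer-deps",
  "npm run build -- --verbose",
];

const prefixes = new Set(attempts.map(shellPrefix));
console.log([...prefixes]);
// [ 'npm install tailwindcss@4.0.0',
//   'npm install tailwindcss@3.4.19',
//   'npm run build' ]
```

Under this normalization a changed package version in the third token registers as a new variant, while extra flags after the third token (such as `--legacy-peer-deps`) do not. Since the command is captured with `[^,]{0,200}`, anything after the first comma in the logged arguments is also ignored.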
@@ -1,12 +1,12 @@
  {
  "name": "open-agents-ai",
- "version": "0.187.474",
+ "version": "0.187.475",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
  "": {
  "name": "open-agents-ai",
- "version": "0.187.474",
+ "version": "0.187.475",
  "hasInstallScript": true,
  "license": "CC-BY-NC-4.0",
  "dependencies": {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "open-agents-ai",
- "version": "0.187.474",
+ "version": "0.187.475",
  "description": "AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop",
  "type": "module",
  "main": "./dist/index.js",
@@ -161,6 +161,26 @@ When you discover image files (png, jpg, gif, svg, webp, bmp) during codebase ex
  - ALWAYS run validation (tests, build, lint) after making changes
  - If tests fail, read the FULL error output. Fix the exact failing assertion or error.
  - Do NOT give up after a failure. Iterate: fix → test → fix → test until it passes.
+ - task_complete is ONLY for actual completion or unrecoverable hardware/permission errors. Being stuck on a code/config problem is NEVER grounds for task_complete — use DIAGNOSTIC MODE below.
+
+ ### DIAGNOSTIC MODE — When You ARE Stuck, Slow Down and Investigate
+
+ If you have tried 2+ approaches to the same blocker and both failed, **STOP attempting fixes** and enter diagnostic mode. Repeating fix-attempts on a misunderstood problem just wastes turns. Diagnose ROOT CAUSE first.
+
+ **The diagnostic loop (one cycle per turn, NOT batched):**
+
+ 1. **READ THE FULL ERROR** — re-read the most recent failure output ENTIRELY. Don't skim the first 200 chars. If output is in a log packet, use `log_explore` with `op="errors"`, then `op="lines"` for context.
+ 2. **VERIFY ONE ASSUMPTION** — pick ONE thing you BELIEVE to be true and test it with the smallest possible command (e.g. "I think tailwindcss is installed" → `ls node_modules/tailwindcss/package.json`).
+ 3. **STATE A HYPOTHESIS in writing** before your next action. Then design ONE experiment that would CONFIRM or REFUTE it (not fix it — verify it first).
+ 4. **WEB SEARCH the exact error message** if you don't know what it means. A 30-second lookup beats 10 retry attempts.
+ 5. **CHECK THE OBVIOUS** — silent failures are common. `npm install` reporting "added 141 packages" doesn't mean ALL declared deps installed; npm sometimes drops packages with peer-dep conflicts without erroring. Verify each expected dep with `ls node_modules/<name>/package.json`.
+ 6. Only AFTER root cause is verified, attempt ONE fix targeting that cause. If the fix fails, return to step 1 with the new error.
+
+ **What diagnostic mode is NOT:**
+ - Trying another version (`tailwindcss@3.4.19` after `tailwindcss@4.0.0`) — that's variant-fatigue, not diagnosis.
+ - Adding `--force` or `--legacy-peer-deps` — those mask root causes.
+ - Wiping node_modules and re-installing — hides the original error.
+ - Calling task_complete to escape — task_complete is NEVER the answer to a stuck debugging session.
  - Use grep_search and find_files for efficient exploration (don't dump entire directories)
  - Use file_edit for small changes instead of rewriting entire files
  - Keep tool calls focused — read only what you need
@@ -94,6 +94,36 @@ NEVER write the entire document in ONE file_write call. DECOMPOSE:
  - Do NOT give up after failure. Iterate until it passes.
  - Use file_edit for small changes, not full file rewrites
  - You MUST call task_complete when done — when you have enough information from web tools, STOP fetching and call task_complete with a summary. Do not keep browsing after you have the answer.
+ - task_complete is ONLY for actual completion or unrecoverable hardware/permission errors. Being stuck on a code/config problem is NEVER grounds for task_complete — use DIAGNOSTIC MODE below.
+
+ ### DIAGNOSTIC MODE — When You ARE Stuck, Slow Down and Investigate
+
+ If you have tried 2+ approaches to the same blocker and both failed, **STOP attempting fixes** and enter diagnostic mode. Repeating fix-attempts on a misunderstood problem just wastes turns. Diagnose ROOT CAUSE first.
+
+ **The diagnostic loop (one cycle per turn, NOT batched):**
+
+ 1. **READ THE FULL ERROR** — re-read the most recent failure output ENTIRELY. Don't skim the first 200 chars. If the output is in a log packet, use `log_explore` with `op="errors"` to see every marker, then `op="lines"` for surrounding context.
+
+ 2. **VERIFY ONE ASSUMPTION** — pick ONE thing you BELIEVE to be true and test it with the smallest possible command:
+ - "I think tailwindcss is installed" → `ls node_modules/tailwindcss/package.json` (one line)
+ - "I think the import path is right" → `cat src/lib/x.ts | head -5`
+ - "I think the env var is set" → `printenv VAR_NAME`
+
+ 3. **STATE A HYPOTHESIS in writing** before your next action:
+ - "Hypothesis: tailwindcss didn't install because @tailwindcss/postcss has a peer-dep conflict with autoprefixer."
+ - Then design ONE experiment that would CONFIRM or REFUTE it (not fix it — verify it first).
+
+ 4. **WEB SEARCH the exact error message** if you don't know what it means. `web_search("exact error string from terminal")`. A 30-second lookup beats 10 retry attempts.
+
+ 5. **CHECK THE OBVIOUS** — silent failures are common. `npm install` saying "added 141 packages" doesn't mean ALL declared deps installed; npm sometimes drops packages with peer-dep conflicts without erroring. Verify each expected dep with `ls node_modules/<name>/package.json`.
+
+ 6. Only AFTER root cause is verified, attempt ONE fix targeting that cause. If the fix fails, return to step 1 with the new error.
+
+ **What diagnostic mode is NOT:**
+ - Trying another version (`tailwindcss@3.4.19` after `tailwindcss@4.0.0` failed) — that's variant-fatigue, not diagnosis.
+ - Adding `--force` or `--legacy-peer-deps` — those mask root causes, they don't reveal them.
+ - Wiping node_modules and re-installing — that just hides the original error.
+ - Calling task_complete to escape — task_complete is NEVER the answer to a stuck debugging session.
  - Do NOT output long explanations. Focus on tool calls.
  - If file_read/list_directory returns ENOENT, use list_directory on the project root — do NOT guess parent paths
  - Directory listing entries are RELATIVE to the listed directory. If you list "parent/" and see "child", the full path is "parent/child" — NOT ".child" or just "child"
@@ -99,10 +99,12 @@ Complex tasks (5+ steps) — DECOMPOSE before acting:
  1. Call todo_write with the checklist. Mark item 1 "in_progress".
  2. Execute ONE STEP AT A TIME. After each, update todo_write status.
  3. After each file edit, VERIFY: file_read or shell test.
- 4. If stuck after 2 attempts, try a DIFFERENT approach do not repeat the same tool call.
+ 4. If stuck after 2 attempts: STOP. Enter DIAGNOSTIC MODE — read the FULL error output, state a hypothesis in writing, verify ONE assumption with the smallest test command, web_search the exact error string. Only fix AFTER you've confirmed root cause. Do NOT keep trying variants of the same approach.
  5. For multi-file changes: read ALL relevant files first, then edit in dependency order.
  6. Final todo_write marks all items "completed", then call task_complete.

+ task_complete is ONLY for ACTUAL completion. Being stuck on a code/config problem is NEVER grounds for task_complete — diagnose, do not exit.
+
  CRITICAL — NEVER repeat a tool call with the same arguments. If you already read a file, use the data you have. If you already ran a command, use the output. Calling the same tool twice with identical arguments wastes turns and produces the same result.

  Long document generation (reports, SOWs, proposals, contracts):