npm - @link-assistant/hive-mind - Versions diffs - 1.76.1 → 1.77.0 - Mend

@link-assistant/hive-mind 1.76.1 → 1.77.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

package/CHANGELOG.md +70 -0
package/package.json +1 -1
package/src/anthropic-cost-accumulator.lib.mjs +107 -0
package/src/claude.budget-stats.lib.mjs +29 -6
package/src/claude.lib.mjs +38 -7
package/src/solve.auto-continue.lib.mjs +17 -0
package/src/solve.config.lib.mjs +70 -0
package/src/solve.escalate.lib.mjs +505 -0
package/src/solve.mjs +3 -3

package/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,75 @@
 # @link-assistant/hive-mind
+## 1.77.0
+### Minor Changes
+- a50d201: feat(solve): experimental `--escalate` mode (#1885)
+  Add an experimental `solve` option family that solves a task cheaply first and
+  escalates to a more capable (more expensive) model only while unfinished work
+  remains. The model ladder, cheapest → most capable, is `haiku < sonnet < opus <
+fable`.
+  - `--escalate` (bare) → the default range `sonnet-fable`.
+  - `--escalate sonnet-opus` → an explicit `<lower>-<upper>` range (`-` delimits the
+    bounds; only the short ladder names are allowed inside a range).
+  - `--escalate-from haiku` → shortcut for `--escalate haiku-fable` (aliases such as
+    `opus-4-8` accepted here, since a single value is unambiguous).
+  - `--escalate-steps N` (default 1) → keep each tier for N working sessions before
+    escalating (e.g. `2` → two sonnet sessions, then two opus, then two fable).
+  The first regular solve session runs on the range's lower bound (unless `--model`
+  is explicitly pinned). After it finishes, the escalate loop re-scans the pull
+  request for deferred/unfinished-work indicators — reusing the detector from issue
+  #1883 — and escalates to the next tier only if work remains; otherwise it stops
+  early so the expensive tiers are never invoked. Restarts are capped at 3
+  consecutive errors and stop on a usage limit. Escalate is Claude-only and runs
+  before `--finalize` / `--keep-working`.
+  Pure parsing/planning helpers live in a network-free module
+  (`src/solve.escalate.lib.mjs`) with full unit-test coverage
+  (`tests/test-escalate-1885.mjs`); a deep case study is compiled under
+  `docs/case-studies/issue-1885/`.
+- 53a0544: Update Hive Mind Docker images to `konard/box` and `konard/box-dind` 2.3.1 so Docker-in-Docker deployments can use the upstream host-image passthrough allowlist.
+## 1.76.2
+### Patch Changes
+- 5d8d6c1: fix(cost): accumulate Anthropic cost across limit-reset resumes (#1886)
+  The session cost summary could report a large negative "Difference" (e.g.
+  `$-11.422796 (-31.66%)`) between the public pricing estimate and the Anthropic
+  figure. Root cause: the public estimate is computed from the session JSONL,
+  which accumulates the **entire** session across every limit-reset resume, while
+  the Anthropic `total_cost_usd` from the stream-json `result` event is scoped to a
+  **single** Claude process (only the resumed run). Comparing a full-session
+  estimate against a single-process figure produced a misleading gap even though
+  both numbers were individually correct.
+  The per-token math (`calculateModelCost`) was audited and is correct; this is a
+  scope mismatch, not a pricing error.
+  Fix:
+  - New `src/anthropic-cost-accumulator.lib.mjs` keeps a model-agnostic running
+    total of Anthropic's per-process `total_cost_usd` (it sums dollars, never
+    inspecting per-token prices, so it is correct for all models).
+  - `runClaude` seeds from and returns the cumulative total on every terminal path;
+    the cross-process limit-reset resume threads it via a new hidden
+    `--previous-anthropic-cost` option (`autoContinueWhenLimitResets`).
+  - A usage-limit hit ends as `is_error` with no `success` result event, so its
+    cost was previously discarded. The cost from a non-success terminal `result`
+    event is now kept as a fallback and folded into the accumulator, closing the
+    gap in the reported scenario.
+  - `displayCostComparison` / `displaySessionTokenUsage` print a verbose
+    accumulation breakdown ("cumulative across resume iterations: this run … +
+    carried forward … = …") so the figure is never mysterious again.
+  A deep case study (timeline, proven root causes, exact reproduced numbers, online
+  prior art incl. `anthropics/claude-code#13088`) is compiled under
+  `docs/case-studies/issue-1886/`.
 ## 1.76.1
 ### Patch Changes

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@link-assistant/hive-mind",
-  "version": "1.76.1",
+  "version": "1.77.0",
   "description": "AI-powered issue solver and hive mind for collaborative problem solving",
   "main": "src/hive.mjs",
   "type": "module",

package/src/anthropic-cost-accumulator.lib.mjs ADDED Viewed

@@ -0,0 +1,107 @@
+#!/usr/bin/env node
+/**
+ * Issue #1886: Cumulative Anthropic cost across resume iterations.
+ *
+ * Background
+ * ----------
+ * The "Token Usage Summary" compares two numbers:
+ *   - "Public pricing estimate" — computed locally from the session JSONL by
+ *     `calculateSessionTokens`. The JSONL accumulates the *entire* logical
+ *     session: when a run hits a usage limit and is resumed (either in-process
+ *     via the auto-merge loop, or cross-process via
+ *     `autoContinueWhenLimitResets` spawning a fresh `solve` with `--resume`),
+ *     Claude Code appends to the *same* `<session-id>.jsonl`, so every run's
+ *     tokens are present.
+ *   - "Calculated by Anthropic" — taken from the `result` event's
+ *     `total_cost_usd`. That figure is scoped to a *single* Claude process: it
+ *     only covers the tokens that process produced, NOT the tokens inherited
+ *     from a previous run that was interrupted by a limit reset.
+ *
+ * The result is a scope mismatch, not a pricing bug. In issue #1886 the public
+ * estimate ($36.085016, full session) was compared against Anthropic's
+ * per-process figure ($24.662220, the resumed run only), yielding a misleading
+ * "-31.66%" difference even though both numbers are individually correct.
+ *
+ * The fix
+ * -------
+ * Accumulate Anthropic's reported cost across resume iterations so the figure
+ * shown next to the full-session public estimate covers the same scope. This
+ * module is the single source of truth for that running total:
+ *
+ *   - Each `solve` process seeds the accumulator once from
+ *     `--previous-anthropic-cost` (0 for the first run; the carried-forward
+ *     total for an auto-resumed run).
+ *   - Every finished Claude process adds its own `total_cost_usd` via
+ *     `addAnthropicRunCost`, which also covers the in-process auto-merge loop
+ *     (each iteration is a separate Claude process in the same node process).
+ *   - The display and the cross-process spawn both read the cumulative total,
+ *     so "Calculated by Anthropic" tracks the full session.
+ *
+ * The accumulation is model-agnostic: it sums dollar figures and never inspects
+ * per-token prices, so it is correct for Fable 5, Opus, Sonnet, Haiku, and any
+ * future model. See docs/case-studies/issue-1886/ for the full analysis.
+ */
+// Module-level singleton: the cumulative Anthropic cost for the current logical
+// session (this node process plus everything seeded from prior processes).
+let cumulativeAnthropicCostUSD = 0;
+// Seeding must happen exactly once per node process. The auto-merge loop calls
+// runClaude (and therefore the seed helper) repeatedly within a single process;
+// re-seeding from the same CLI flag each time would wipe out accumulation.
+let seeded = false;
+/**
+ * Coerce an arbitrary value to a non-negative finite USD amount.
+ * @param {*} value
+ * @returns {number} the sanitized amount, or 0 when not a positive finite number
+ */
+const toCostAmount = value => {
+  const n = typeof value === 'number' ? value : Number(value);
+  return Number.isFinite(n) && n > 0 ? n : 0;
+};
+/**
+ * Seed the accumulator from the carried-forward previous-run cost, exactly once
+ * per node process. Subsequent calls are no-ops so the in-process auto-merge
+ * loop does not reset the running total.
+ * @param {number|string|null|undefined} previousAnthropicCostUSD
+ * @returns {number} the cumulative total after seeding
+ */
+export const seedCumulativeAnthropicCost = previousAnthropicCostUSD => {
+  if (seeded) return cumulativeAnthropicCostUSD;
+  cumulativeAnthropicCostUSD = toCostAmount(previousAnthropicCostUSD);
+  seeded = true;
+  return cumulativeAnthropicCostUSD;
+};
+/**
+ * Add a single Claude process's reported cost to the running total.
+ * Non-positive / non-finite inputs (e.g. a null cost when a run was interrupted
+ * by a limit before emitting a success result) contribute nothing.
+ * @param {number|string|null|undefined} runCostUSD
+ * @returns {number} the cumulative total after adding
+ */
+export const addAnthropicRunCost = runCostUSD => {
+  cumulativeAnthropicCostUSD += toCostAmount(runCostUSD);
+  return cumulativeAnthropicCostUSD;
+};
+/**
+ * @returns {number} the cumulative Anthropic cost for the current logical session
+ */
+export const getCumulativeAnthropicCost = () => cumulativeAnthropicCostUSD;
+/**
+ * @returns {boolean} true once a positive cost has been seeded or accumulated
+ */
+export const hasCumulativeAnthropicCost = () => cumulativeAnthropicCostUSD > 0;
+/**
+ * Reset the accumulator. Intended for tests — production code seeds once and
+ * accumulates for the lifetime of the process.
+ */
+export const resetCumulativeAnthropicCost = () => {
+  cumulativeAnthropicCostUSD = 0;
+  seeded = false;
+};

package/src/claude.budget-stats.lib.mjs CHANGED Viewed

@@ -138,19 +138,38 @@ export const displayModelUsage = async (usage, log) => {
 /**
  * Display cost comparison between public pricing and Anthropic's official cost
  * Issue #1557: Show simplified format when costs match, remove USD suffix
- * @param {number|null} publicCost - Public pricing estimate
- * @param {number|null} anthropicCost - Anthropic's official cost
+ * Issue #1886: `anthropicCost` is the cumulative Anthropic cost across every
+ *   resume iteration (the session JSONL — and therefore `publicCost` — spans
+ *   the full session, so the Anthropic figure must too). The optional
+ *   `previousAnthropicCost` is the portion carried in from earlier runs; when
+ *   non-zero we show a verbose breakdown so the accumulation is auditable.
+ * @param {number|null} publicCost - Public pricing estimate (full session)
+ * @param {number|null} anthropicCost - Anthropic's cumulative official cost (full session)
  * @param {Function} log - Logging function
+ * @param {Object} [options]
+ * @param {number} [options.previousAnthropicCost=0] - cost carried in from earlier resume iterations
  */
-export const displayCostComparison = async (publicCost, anthropicCost, log) => {
+export const displayCostComparison = async (publicCost, anthropicCost, log, options = {}) => {
+  const previousAnthropicCost = options.previousAnthropicCost || 0;
   const hasPublic = publicCost !== null && publicCost !== undefined;
   const hasAnthropic = anthropicCost !== null && anthropicCost !== undefined;
   const publicDec = hasPublic ? new Decimal(publicCost) : null;
   const anthropicDec = hasAnthropic ? new Decimal(anthropicCost) : null;
+  // Issue #1886: when the Anthropic figure was accumulated across resumes,
+  // expose the breakdown in verbose mode so "this run + carried forward = total"
+  // is auditable from the saved log.
+  const logAccumulationBreakdown = async () => {
+    if (previousAnthropicCost > 0 && anthropicDec) {
+      const thisRun = anthropicDec.minus(new Decimal(previousAnthropicCost));
+      await log(`      ↳ Anthropic cost is cumulative across resume iterations (issue #1886):`, { verbose: true });
+      await log(`         this run: $${thisRun.toFixed(6)} + carried forward: $${new Decimal(previousAnthropicCost).toFixed(6)} = $${anthropicDec.toFixed(6)}`, { verbose: true });
+    }
+  };
   // Issue #1703: also collapse to the short form when the rounded difference is below display precision,
   // so reports like "Difference: $-0.000000 (-0.00%)" no longer waste two extra lines.
   if (publicDec && anthropicDec && anthropicDec.minus(publicDec).abs().toFixed(6) === '0.000000') {
     await log(`\n   💰 Cost: $${anthropicDec.toFixed(6)}`);
+    await logAccumulationBreakdown();
     return;
   }
   await log('\n   💰 Cost estimation:');
@@ -163,6 +182,7 @@ export const displayCostComparison = async (publicCost, anthropicCost, log) => {
   } else {
     await log('      Difference:              unknown');
   }
+  await logAccumulationBreakdown();
 };
 /**
@@ -316,11 +336,12 @@ export const displayBudgetStats = async (usage, tokenUsage, log) => {
  * @param {string} params.sessionId - Claude session id (skips when falsy)
  * @param {string} params.tempDir - Working directory containing the session JSONL (skips when falsy)
  * @param {Object|null} params.resultModelUsage - Authoritative per-model usage from the result JSON event
- * @param {number} params.anthropicTotalCostUSD - Anthropic's official total cost (for the comparison line)
+ * @param {number} params.anthropicTotalCostUSD - Anthropic's cumulative official cost across resume iterations (issue #1886)
+ * @param {number} [params.previousAnthropicCostUSD=0] - portion of anthropicTotalCostUSD carried in from earlier resume iterations (issue #1886)
  * @param {Object} params.argv - Parsed CLI args (reads argv.tokensBudgetStats)
  * @param {Function} params.log - Logger
  */
-export const displaySessionTokenUsage = async ({ sessionId, tempDir, resultModelUsage, anthropicTotalCostUSD, argv, log }) => {
+export const displaySessionTokenUsage = async ({ sessionId, tempDir, resultModelUsage, anthropicTotalCostUSD, previousAnthropicCostUSD = 0, argv, log }) => {
   if (!sessionId || !tempDir) return;
   try {
     const tokenUsage = await calculateSessionTokens(sessionId, tempDir, resultModelUsage);
@@ -355,7 +376,9 @@ export const displaySessionTokenUsage = async ({ sessionId, tempDir, resultModel
         await log('\n   📈 Total across all models:');
       }
       // Show cost comparison (for both single and multiple models)
-      await displayCostComparison(tokenUsage.totalCostUSD, anthropicTotalCostUSD, log);
+      // Issue #1886: anthropicTotalCostUSD is cumulative across resume iterations
+      // so it shares the full-session scope of the JSONL-based public estimate.
+      await displayCostComparison(tokenUsage.totalCostUSD, anthropicTotalCostUSD, log, { previousAnthropicCost: previousAnthropicCostUSD });
       // Show total tokens for single model only
       if (modelIds.length === 1) {
         await log(`      Total tokens: ${formatNumber(tokenUsage.totalTokens)}`);

package/src/claude.lib.mjs CHANGED Viewed

@@ -17,6 +17,7 @@ import { sanitizeObjectStrings } from './unicode-sanitization.lib.mjs';
 import Decimal from 'decimal.js-light';
 import { createEmptySubSessionUsage, accumulateModelUsage, mergeResultModelUsage, createSubAgentCallEntry, accumulateSubAgentUsage, getRawRequestInputTokens, displaySessionTokenUsage } from './claude.budget-stats.lib.mjs';
 import { buildClaudeResumeCommand, buildClaudeAutonomousResumeCommand } from './claude.command-builder.lib.mjs';
+import { seedCumulativeAnthropicCost, addAnthropicRunCost } from './anthropic-cost-accumulator.lib.mjs'; // Issue #1886
 import { buildSolveResumeCommand } from './solve.resume-command.lib.mjs'; // Issue #942
 import { SESSION_FORCE_KILLED_MARKER, postTrackedComment } from './tool-comments.lib.mjs'; // Issue #1625
 import { handleClaudeRuntimeSwitch } from './claude.runtime-switch.lib.mjs'; // see issue #1141
@@ -662,6 +663,9 @@ export const executeClaudeCommand = async params => {
     let stderrErrors = [];
     let resultSuccessReceived = false;
     let anthropicTotalCostUSD = null;
+    // Issue #1886: a usage-limit hit ends as is_error (no success result). Keep
+    // the latest cost from ANY result event as a fallback for the failure path.
+    let anthropicCostFromAnyResult = null;
     let errorDuringExecution = false;
     let resultSummary = null;
     let resultModelUsage = null;
@@ -942,7 +946,9 @@ export const executeClaudeCommand = async params => {
                   anthropicTotalCostUSD = data.total_cost_usd;
                   await log(`💰 Anthropic official cost captured from success result: $${anthropicTotalCostUSD.toFixed(6)}`, { verbose: true });
                 } else if (data.total_cost_usd !== undefined && data.total_cost_usd !== null) {
-                  await log(`💰 Anthropic cost from ${data.subtype || 'unknown'} result ignored: $${data.total_cost_usd.toFixed(6)}`, { verbose: true });
+                  // Issue #1886: non-success terminal (e.g. usage-limit hit) still reports this process's cost — keep as accumulation fallback.
+                  anthropicCostFromAnyResult = data.total_cost_usd;
+                  await log(`💰 Anthropic cost from ${data.subtype || 'unknown'} result kept as fallback for accumulation: $${data.total_cost_usd.toFixed(6)}`, { verbose: true });
                 }
                 // Issue #1263: Extract result summary (AI's summary of work done) for --attach-solution-summary
                 if (data.subtype === 'success' && data.result && typeof data.result === 'string') {
@@ -1106,6 +1112,10 @@ export const executeClaudeCommand = async params => {
           await log(JSON.stringify(data, null, 2));
           if (data.type === 'result' && data.subtype === 'success' && data.total_cost_usd != null) {
             anthropicTotalCostUSD = data.total_cost_usd;
+          } else if (data.type === 'result' && data.total_cost_usd != null) {
+            // Issue #1886: keep a non-success terminal result's cost as a fallback
+            // for accumulation (see the streaming branch above).
+            anthropicCostFromAnyResult = data.total_cost_usd;
           }
           // Issue #1472: Forward remaining buffer event to interactive handler (was previously missed)
           if (interactiveHandler) {
@@ -1187,6 +1197,9 @@ export const executeClaudeCommand = async params => {
           await log(`\n\n❌ API explicitly marked error as not retryable (x-should-retry: false) and session made no progress (num_turns=${resultNumTurns}) after ${retryCount} attempt(s)`, { level: 'error' });
           await log(`   This error is not recoverable. Failing fast to avoid a stuck retry loop (Issue #1437).`, { level: 'error' });
           await log(`   Check https://status.anthropic.com/ for API status.`, { level: 'error' });
+          // Issue #1886: fold captured cost so a cross-process resume's carried-forward cost is not dropped here.
+          seedCumulativeAnthropicCost(argv.previousAnthropicCost);
+          const cumulativeAnthropicCostUSDOnStuckRetry = addAnthropicRunCost(anthropicTotalCostUSD ?? anthropicCostFromAnyResult);
           return {
             success: false,
             sessionId,
@@ -1196,7 +1209,7 @@ export const executeClaudeCommand = async params => {
             messageCount,
             toolUseCount,
             is503Error,
-            anthropicTotalCostUSD,
+            anthropicTotalCostUSD: cumulativeAnthropicCostUSDOnStuckRetry, // Issue #1104/#1886
             resultSummary,
             // Issue #1845: surface the actual error so callers can show it to users
             errorInfo: { message: lastMessage || 'API explicitly marked error as not retryable', exitCode },
@@ -1233,6 +1246,9 @@ export const executeClaudeCommand = async params => {
           return await executeWithRetry();
         } else {
           await log(`\n\n❌ Transient API error persisted after ${maxRetries} retries\n   Please try again later or check https://status.anthropic.com/`, { level: 'error' });
+          // Issue #1886: fold captured cost so the carried-forward cost survives this retries-exhausted path.
+          seedCumulativeAnthropicCost(argv.previousAnthropicCost);
+          const cumulativeAnthropicCostUSDOnRetriesExhausted = addAnthropicRunCost(anthropicTotalCostUSD ?? anthropicCostFromAnyResult);
           return {
             success: false,
             sessionId,
@@ -1242,7 +1258,7 @@ export const executeClaudeCommand = async params => {
             messageCount,
             toolUseCount,
             is503Error, // preserve for callers that check this
-            anthropicTotalCostUSD, // Issue #1104: Include cost even on failure
+            anthropicTotalCostUSD: cumulativeAnthropicCostUSDOnRetriesExhausted, // Issue #1104/#1886: Include cumulative cost even on failure
             resultSummary, // Issue #1263: Include result summary
             // Issue #1845: surface the actual error so callers can show it to users
             errorInfo: { message: lastMessage || `Transient API error persisted after ${maxRetries} retries`, exitCode },
@@ -1294,6 +1310,12 @@ export const executeClaudeCommand = async params => {
         await log(`   Memory: ${resourcesAfter.memory.split('\n')[1]}`, { verbose: true });
         await log(`   Load: ${resourcesAfter.load}`, { verbose: true });
         await showResumeCommand(sessionId, tempDir, claudePath, argv.model, log, argv);
+        // Issue #1886: on failure (usually a usage-limit hit → auto-resume) fold
+        // the captured cost into the cumulative total so autoContinueWhenLimitResets
+        // carries it forward. A limit hit ends as is_error → fall back to the
+        // non-success result cost.
+        seedCumulativeAnthropicCost(argv.previousAnthropicCost);
+        const cumulativeAnthropicCostUSDOnFailure = addAnthropicRunCost(anthropicTotalCostUSD ?? anthropicCostFromAnyResult);
         return {
           success: false,
           sessionId,
@@ -1303,7 +1325,7 @@ export const executeClaudeCommand = async params => {
           messageCount,
           toolUseCount,
           errorDuringExecution,
-          anthropicTotalCostUSD, // Issue #1104: Include cost even on failure
+          anthropicTotalCostUSD: cumulativeAnthropicCostUSDOnFailure, // Issue #1104/#1886: cumulative cost even on failure
           resultSummary, // Issue #1263: Include result summary
           // Issue #1845: surface the core error (e.g. "API Error: Output blocked by content
           // filtering policy") so users see what actually went wrong, not just a generic message.
@@ -1322,7 +1344,13 @@ export const executeClaudeCommand = async params => {
       await log(`📊 Total messages: ${messageCount}, Tool uses: ${toolUseCount}`);
       // Calculate and display total token usage from session JSONL file.
       // Extracted to claude.budget-stats.lib.mjs to keep this file under the line limit (Issue #1834).
-      await displaySessionTokenUsage({ sessionId, tempDir, resultModelUsage, anthropicTotalCostUSD, argv, log });
+      // Issue #1886: the JSONL spans every resume iteration but each result
+      // event's total_cost_usd covers only this process; seed the carried-forward
+      // cost + add this process's so the cumulative total shares the JSONL scope.
+      seedCumulativeAnthropicCost(argv.previousAnthropicCost);
+      const cumulativeAnthropicCostUSD = addAnthropicRunCost(anthropicTotalCostUSD);
+      const previousAnthropicCostUSD = cumulativeAnthropicCostUSD - (anthropicTotalCostUSD || 0);
+      await displaySessionTokenUsage({ sessionId, tempDir, resultModelUsage, anthropicTotalCostUSD: cumulativeAnthropicCostUSD, previousAnthropicCostUSD, argv, log });
       await showResumeCommand(sessionId, tempDir, claudePath, argv.model, log, argv);
       return {
         success: true,
@@ -1332,7 +1360,7 @@ export const executeClaudeCommand = async params => {
         limitTimezone,
         messageCount,
         toolUseCount,
-        anthropicTotalCostUSD, // Pass Anthropic's official total cost
+        anthropicTotalCostUSD: cumulativeAnthropicCostUSD, // Issue #1104/#1886: cumulative Anthropic cost across resume iterations
         errorDuringExecution, // Issue #1088: Track if error_during_execution subtype occurred
         resultSummary, // Issue #1263: Include result summary for --attach-solution-summary
         resultModelUsage, // Issue #1454
@@ -1378,6 +1406,9 @@ export const executeClaudeCommand = async params => {
         }
       }
       await log(`\n\n❌ Error executing Claude command: ${error.message}`, { level: 'error' });
+      // Issue #1886: fold captured cost so the carried-forward cost survives this exception path too.
+      seedCumulativeAnthropicCost(argv.previousAnthropicCost);
+      const cumulativeAnthropicCostUSDOnException = addAnthropicRunCost(anthropicTotalCostUSD ?? anthropicCostFromAnyResult);
       return {
         success: false,
         sessionId,
@@ -1386,7 +1417,7 @@ export const executeClaudeCommand = async params => {
         limitTimezone: null,
         messageCount,
         toolUseCount,
-        anthropicTotalCostUSD, // Issue #1104: Include cost even on failure
+        anthropicTotalCostUSD: cumulativeAnthropicCostUSDOnException, // Issue #1104/#1886: Include cumulative cost even on failure
         resultSummary, // Issue #1263: Include result summary
         // Issue #1845: surface the actual exception message so callers can show it to users
         errorInfo: { message: error.message || error.toString() },

package/src/solve.auto-continue.lib.mjs CHANGED Viewed

@@ -54,6 +54,9 @@ import { formatAutoIterationLimit, hasReachedAutoIterationLimit, normalizeAutoIt
 // Issue #1574: Interruptible sleep so CTRL+C is never blocked by a lingering timer
 const { interruptibleSleep } = await import('./interruptible-sleep.lib.mjs');
+// Issue #1886: cumulative Anthropic cost carried across cross-process resumes
+const { getCumulativeAnthropicCost } = await import('./anthropic-cost-accumulator.lib.mjs');
 const { calculateWaitTime } = validation;
 /**
@@ -168,6 +171,20 @@ export const autoContinueWhenLimitResets = async (issueUrl, sessionId, argv, sho
     resumeArgs.push('--auto-resume-iteration', String(nextAutoResumeIteration));
     resumeArgs.push('--auto-resume-max-iterations', String(maxAutoResumeIterations));
+    // Issue #1886: carry the cumulative Anthropic cost into the resumed process.
+    // The resumed run reads the same session JSONL (full session, all runs), so
+    // its public-pricing estimate spans every run; without this the resumed
+    // run's "Calculated by Anthropic" figure would cover only its own process
+    // and disagree with the full-session public estimate (the -31.66% gap in
+    // issue #1886). getCumulativeAnthropicCost() already includes this run's
+    // cost folded in at the runClaude return, plus anything carried from prior
+    // iterations via --previous-anthropic-cost.
+    const carriedAnthropicCost = getCumulativeAnthropicCost();
+    if (carriedAnthropicCost > 0) {
+      resumeArgs.push('--previous-anthropic-cost', String(carriedAnthropicCost));
+      await log(`💰 Carrying forward cumulative Anthropic cost: $${carriedAnthropicCost.toFixed(6)} (issue #1886)`, { verbose: true });
+    }
     // Pass session type for proper comment differentiation
     // See: https://github.com/link-assistant/hive-mind/issues/1152
     const sessionType = isRestart ? 'auto-restart' : 'auto-resume';

package/src/solve.config.lib.mjs CHANGED Viewed

@@ -10,6 +10,7 @@
 import { enhanceErrorMessage, detectMalformedFlags } from './option-suggestions.lib.mjs';
 import { defaultModels, buildModelOptionDescription, resolveDefaultFallbackModel, resolveRuntimeDefaultModel } from './models/index.mjs';
 import { validateBranchName } from './solve.branch.lib.mjs';
+import { resolveEscalationConfig, isEscalateEnabled, DEFAULT_ESCALATE_RANGE } from './solve.escalate.lib.mjs';
 import { getLinoYargsFactory, hideBin, parseCliArgumentsWithLino } from './cli-arguments.lib.mjs';
 // Re-export for use by telegram-bot.mjs (avoids extra import lines there)
@@ -218,6 +219,19 @@ export const SOLVE_OPTION_DEFINITIONS = {
     default: 0,
     hidden: true,
   },
+  // Issue #1886: carried-forward Anthropic cost from previous resume iterations.
+  // The session JSONL accumulates the full session across limit-reset resumes,
+  // but Anthropic's result-event total_cost_usd is scoped to a single process.
+  // Threading the previous total here lets the resumed run display the
+  // full-session Anthropic cost alongside the full-session public estimate,
+  // instead of a misleading per-run figure. Internal/hidden: set automatically
+  // by autoContinueWhenLimitResets when spawning the resumed solve process.
+  'previous-anthropic-cost': {
+    type: 'number',
+    description: 'Internal: cumulative Anthropic total_cost_usd carried forward from previous resume iterations (issue #1886)',
+    default: 0,
+    hidden: true,
+  },
   'auto-merge': {
     type: 'boolean',
     description: 'Automatically merge the pull request when the working session is finished and all CI/CD statuses pass and PR is mergeable. Implies --auto-restart-until-mergeable.',
@@ -597,6 +611,21 @@ export const SOLVE_OPTION_DEFINITIONS = {
     alias: ['keep-going-until-all-requirements-are-fully-done', 'keep-working', 'keep-going'],
     default: undefined,
   },
+  escalate: {
+    type: 'string',
+    description: '[EXPERIMENTAL] Start solving with a cheaper/lower-tier model and automatically escalate to a more capable (more expensive) model while unfinished work remains. Accepts a model range "<lower>-<upper>" using short Claude tier names (ladder: haiku < sonnet < opus < fable), e.g. "sonnet-opus". A single name (e.g. "opus") means just that tier. Bare flag means "sonnet-fable". The idea: iterate cheaply first so expensive models do more reading and less writing.',
+    default: undefined,
+  },
+  'escalate-from': {
+    type: 'string',
+    description: '[EXPERIMENTAL] Shortcut for --escalate <model>-fable: start solving from the given model (haiku/sonnet/opus/fable, aliases accepted) and escalate up to the top of the ladder while unfinished work remains. Takes precedence over --escalate when both are given.',
+    default: undefined,
+  },
+  'escalate-steps': {
+    type: 'number',
+    description: '[EXPERIMENTAL] How many working sessions to keep each model tier before escalating to the next one (default: 1). For example 2 keeps the lower tier for two working sessions, then the next tier for two, and so on. Only used with --escalate / --escalate-from.',
+    default: 1,
+  },
   'working-session-live-progress': {
     type: 'string',
     description: '[EXPERIMENTAL] Enable live progress monitoring. Accepts "comment" (default, updates a per-session PR comment) or "pr" (updates PR description). Plain --working-session-live-progress means "comment". Works with or without --interactive-mode.',
@@ -870,6 +899,34 @@ export const parseArguments = async (yargs = getLinoYargsFactory(), hideBinFn =
     }
   }
+  // --escalate / --escalate-from / --escalate-steps normalization (issue #1885)
+  // The bare `--escalate` flag is a string-typed option, so yargs yields `true`
+  // (or an empty string) for a value-less flag. Canonicalize that to the default
+  // range so downstream parsing in solve.escalate.lib.mjs sees a meaningful
+  // value. We also validate the range/steps eagerly here so misuse fails fast at
+  // config time rather than mid-solve.
+  {
+    const escalateProvided = hasRawOption(rawArgs, '--escalate');
+    if (escalateProvided) {
+      const current = argv.escalate;
+      if (current === true || current === '' || current === undefined || current === null) {
+        argv.escalate = DEFAULT_ESCALATE_RANGE;
+      } else if (typeof current === 'string') {
+        argv.escalate = current.trim().toLowerCase();
+      }
+    } else if (argv.escalate === undefined) {
+      argv.escalate = undefined;
+    }
+    if (typeof argv.escalateFrom === 'string') {
+      argv.escalateFrom = argv.escalateFrom.trim().toLowerCase();
+    }
+    // Validate eagerly (throws on invalid range / from / steps). resolveEscalationConfig
+    // is a no-op (returns null) when the feature is disabled.
+    if (isEscalateEnabled(argv)) {
+      resolveEscalationConfig(argv);
+    }
+  }
   // --working-session-live-progress normalization
   // When passed as --working-session-live-progress (no value), yargs gives true for string type
   // Normalize: true → "comment", validate known values
@@ -898,6 +955,19 @@ export const parseArguments = async (yargs = getLinoYargsFactory(), hideBinFn =
     argv.model = await resolveRuntimeDefaultModel(argv.tool);
   }
+  // Escalate mode (issue #1885): when enabled and the user did not explicitly
+  // pin a model, the very first regular solve session should run on the cheapest
+  // tier in the plan (the range's lower bound). The restart loop in
+  // solve.escalate.lib.mjs then escalates upward while unfinished work remains.
+  // An explicit --model always wins (the user pinned the worker model on
+  // purpose), so only override the resolved default.
+  if (isEscalateEnabled(argv) && !modelExplicitlyProvided && (argv.tool || 'claude') === 'claude') {
+    const escalationConfig = resolveEscalationConfig(argv);
+    if (escalationConfig && escalationConfig.plan.length > 0) {
+      argv.model = escalationConfig.plan[0];
+    }
+  }
   if (argv.tool && !fallbackModelExplicitlyProvided) {
     const defaultFallbackModel = resolveDefaultFallbackModel(argv.tool, argv.model);
     argv.fallbackModel = defaultFallbackModel || undefined;

package/src/solve.escalate.lib.mjs ADDED Viewed

@@ -0,0 +1,505 @@
+#!/usr/bin/env node
+/**
+ * Escalate-mode module for solve.mjs
+ *
+ * [EXPERIMENTAL] `--escalate` makes the solver try to solve a task fast and
+ * cheap first (with a lower-tier model), and only escalate to a more capable
+ * (and more expensive) model when the cheaper model did not finish the job.
+ *
+ * The intuition (from issue #1885): small models often get *most* of the work
+ * right, but not quite right. By iterating cheaply first, the more expensive
+ * models spend their budget reading and refining an existing draft rather than
+ * writing everything from scratch.
+ *
+ * Model ladder (Claude), cheapest → most capable:
+ *
+ *     haiku  →  sonnet  →  opus  →  fable
+ *
+ * Options:
+ *   --escalate [lower-upper]   Enable escalate mode. Bare flag means the default
+ *                              range `sonnet-fable`. `sonnet-opus` sets the lower
+ *                              and upper bound (delimiter is `-`). A single name
+ *                              (e.g. `opus`) means just that one tier.
+ *   --escalate-from <model>    Shortcut for `--escalate <model>-fable` (escalate
+ *                              from <model> up to the top of the ladder).
+ *   --escalate-steps <n>       How many working sessions to keep each tier before
+ *                              escalating to the next one (default: 1). For
+ *                              example `2` keeps the lower tier for two working
+ *                              sessions, then the next tier for two, and so on.
+ *
+ * The pure parsing/planning helpers in this module are network-free so they can
+ * be unit-tested in isolation. The `runEscalation` orchestrator restarts the AI
+ * tool with the escalated model, reusing the same deferred-work detection that
+ * powers `--keep-working-until-all-requirements-are-fully-done` (issue #1883) as
+ * the "did the cheaper model finish?" signal.
+ *
+ * @see https://github.com/link-assistant/hive-mind/issues/1885
+ */
+// ───────────────────────────── Pure helpers ──────────────────────────────────
+// Everything above the `runEscalation` orchestrator is pure (no I/O, no network)
+// so it can be imported and unit-tested without side effects.
+/**
+ * Ordered Claude model ladder used by escalate mode, cheapest → most capable.
+ * These are the short canonical tier names; aliases (e.g. `opus-4-8`,
+ * `claude-fable-5`) are normalized to these by {@link canonicalTier}.
+ */
+export const MODEL_ESCALATION_ORDER = ['haiku', 'sonnet', 'opus', 'fable'];
+/** Default lower bound when `--escalate` is given without an explicit range. */
+export const DEFAULT_ESCALATE_LOWER = 'sonnet';
+/** Default upper bound (top of the ladder). */
+export const DEFAULT_ESCALATE_UPPER = 'fable';
+/** Default range used when `--escalate` is given as a bare flag. */
+export const DEFAULT_ESCALATE_RANGE = `${DEFAULT_ESCALATE_LOWER}-${DEFAULT_ESCALATE_UPPER}`;
+/** Default number of working sessions to keep each tier before escalating. */
+export const DEFAULT_ESCALATE_STEPS = 1;
+/**
+ * Mapping of known model aliases → canonical tier name. Lets users pass either
+ * the short tier name (`opus`) or a more specific alias (`opus-4-8`,
+ * `claude-opus-4-8`) wherever a single model is accepted (e.g. --escalate-from).
+ */
+const TIER_ALIASES = {
+  haiku: 'haiku',
+  'haiku-4-5': 'haiku',
+  'claude-haiku-4-5': 'haiku',
+  'claude-haiku-4-5-20251001': 'haiku',
+  sonnet: 'sonnet',
+  'sonnet-4-6': 'sonnet',
+  'sonnet-4-5': 'sonnet',
+  'claude-sonnet-4-6': 'sonnet',
+  'claude-sonnet-4-5': 'sonnet',
+  opus: 'opus',
+  'opus-4-8': 'opus',
+  'opus-4-7': 'opus',
+  'opus-4-6': 'opus',
+  'opus-4-5': 'opus',
+  'claude-opus-4-8': 'opus',
+  'claude-opus-4-7': 'opus',
+  fable: 'fable',
+  'fable-5': 'fable',
+  'claude-fable-5': 'fable',
+};
+/**
+ * Normalize a model name/alias to its canonical escalate-ladder tier.
+ * @param {string} name
+ * @returns {string|null} canonical tier name, or null if not a known tier.
+ */
+export const canonicalTier = name => {
+  if (typeof name !== 'string') return null;
+  const key = name.trim().toLowerCase();
+  if (!key) return null;
+  return TIER_ALIASES[key] || null;
+};
+/**
+ * Parse a `--escalate` range value into { from, to } canonical tier names.
+ *
+ * Accepted forms:
+ *   - true / '' / undefined  → the default range (`sonnet-fable`)
+ *   - `sonnet`               → { from: 'sonnet', to: 'sonnet' }
+ *   - `sonnet-fable`         → { from: 'sonnet', to: 'fable' }
+ *
+ * The delimiter is `-`. Only the short ladder names (haiku|sonnet|opus|fable)
+ * are accepted inside a range to avoid ambiguity with dashed aliases such as
+ * `opus-4-8` (use --escalate-from for those).
+ *
+ * @param {string|boolean|undefined} value
+ * @returns {{ from: string, to: string }}
+ * @throws {Error} on an unparseable / invalid range.
+ */
+export const parseEscalateRange = value => {
+  let raw = value;
+  if (raw === true || raw === undefined || raw === null) {
+    raw = DEFAULT_ESCALATE_RANGE;
+  }
+  if (typeof raw !== 'string') {
+    throw new Error(`Invalid --escalate value: ${JSON.stringify(value)}. Expected a model range like "sonnet-fable".`);
+  }
+  const trimmed = raw.trim().toLowerCase();
+  if (trimmed === '') {
+    raw = DEFAULT_ESCALATE_RANGE;
+  }
+  const parts = (trimmed === '' ? DEFAULT_ESCALATE_RANGE : trimmed).split('-');
+  const order = MODEL_ESCALATION_ORDER;
+  const requireLadderName = part => {
+    if (!order.includes(part)) {
+      throw new Error(`Invalid --escalate model "${part}". Expected one of: ${order.join(', ')} (range form: "${DEFAULT_ESCALATE_RANGE}").`);
+    }
+    return part;
+  };
+  let from;
+  let to;
+  if (parts.length === 1) {
+    from = requireLadderName(parts[0]);
+    to = from;
+  } else if (parts.length === 2) {
+    from = requireLadderName(parts[0]);
+    to = requireLadderName(parts[1]);
+  } else {
+    throw new Error(`Invalid --escalate range "${trimmed}". Expected "<lower>-<upper>" with short model names (e.g. "${DEFAULT_ESCALATE_RANGE}").`);
+  }
+  if (order.indexOf(from) > order.indexOf(to)) {
+    throw new Error(`Invalid --escalate range "${trimmed}": lower bound "${from}" is more capable than upper bound "${to}". Order is ${order.join(' < ')}.`);
+  }
+  return { from, to };
+};
+/**
+ * Parse a `--escalate-from` value into { from, to } where `to` is the top of
+ * the ladder. Accepts canonical names and aliases (e.g. `opus-4-8`).
+ * @param {string} value
+ * @returns {{ from: string, to: string }}
+ * @throws {Error} on an invalid model name.
+ */
+export const parseEscalateFrom = value => {
+  const from = canonicalTier(value);
+  if (!from) {
+    throw new Error(`Invalid --escalate-from model ${JSON.stringify(value)}. Expected one of: ${MODEL_ESCALATION_ORDER.join(', ')}.`);
+  }
+  return { from, to: DEFAULT_ESCALATE_UPPER };
+};
+/**
+ * Normalize the `--escalate-steps` value into a positive integer (default 1).
+ * @param {string|number|undefined} value
+ * @returns {number}
+ * @throws {Error} on a non-positive / non-numeric value.
+ */
+export const normalizeEscalateSteps = value => {
+  if (value === undefined || value === null || value === true || value === '') {
+    return DEFAULT_ESCALATE_STEPS;
+  }
+  const n = typeof value === 'number' ? value : Number(String(value).trim());
+  if (!Number.isFinite(n) || !Number.isInteger(n) || n < 1) {
+    throw new Error(`Invalid --escalate-steps value: ${JSON.stringify(value)}. Expected a positive integer (>= 1).`);
+  }
+  return n;
+};
+/**
+ * Build the ordered list of models (the "escalation plan"), where each tier
+ * between `from` and `to` (inclusive) is repeated `steps` times.
+ *
+ * Example: { from: 'sonnet', to: 'fable', steps: 2 } →
+ *   ['sonnet', 'sonnet', 'opus', 'opus', 'fable', 'fable']
+ *
+ * @param {{ from: string, to: string, steps?: number }} params
+ * @returns {string[]}
+ */
+export const buildEscalationPlan = ({ from, to, steps = DEFAULT_ESCALATE_STEPS }) => {
+  const order = MODEL_ESCALATION_ORDER;
+  const fromIdx = order.indexOf(from);
+  const toIdx = order.indexOf(to);
+  if (fromIdx === -1 || toIdx === -1 || fromIdx > toIdx) {
+    throw new Error(`Invalid escalation bounds: from="${from}", to="${to}".`);
+  }
+  const tiers = order.slice(fromIdx, toIdx + 1);
+  const plan = [];
+  for (const tier of tiers) {
+    for (let i = 0; i < steps; i++) {
+      plan.push(tier);
+    }
+  }
+  return plan;
+};
+/**
+ * Resolve the model to use for a given 0-based working-session index. Indexes
+ * past the end of the plan clamp to the last (most capable) model so the loop
+ * never reaches outside the ladder.
+ * @param {string[]} plan
+ * @param {number} sessionIndex
+ * @returns {string}
+ */
+export const resolveEscalationModel = (plan, sessionIndex) => {
+  if (!Array.isArray(plan) || plan.length === 0) return undefined;
+  const idx = Math.max(0, Math.min(sessionIndex, plan.length - 1));
+  return plan[idx];
+};
+/**
+ * Whether escalate mode is enabled given parsed argv.
+ * @param {object} argv
+ * @returns {boolean}
+ */
+export const isEscalateEnabled = argv => {
+  if (!argv) return false;
+  return Boolean(argv.escalate) || Boolean(argv.escalateFrom);
+};
+/**
+ * Resolve the full escalation configuration from argv. Returns null when the
+ * feature is disabled.
+ *
+ * `--escalate-from` takes precedence over `--escalate` when both are given.
+ *
+ * @param {object} argv
+ * @returns {{ enabled: boolean, from: string, to: string, steps: number, plan: string[] }|null}
+ */
+export const resolveEscalationConfig = argv => {
+  if (!isEscalateEnabled(argv)) return null;
+  const { from, to } = argv.escalateFrom ? parseEscalateFrom(argv.escalateFrom) : parseEscalateRange(argv.escalate);
+  const steps = normalizeEscalateSteps(argv.escalateSteps);
+  const plan = buildEscalationPlan({ from, to, steps });
+  return { enabled: true, from, to, steps, plan };
+};
+/**
+ * Human-readable one-line description of an escalation plan, collapsing
+ * consecutive repeats into "model×N".
+ * @param {string[]} plan
+ * @returns {string}
+ */
+export const formatEscalationPlan = plan => {
+  if (!Array.isArray(plan) || plan.length === 0) return '(empty)';
+  const groups = [];
+  for (const model of plan) {
+    const last = groups[groups.length - 1];
+    if (last && last.model === model) {
+      last.count++;
+    } else {
+      groups.push({ model, count: 1 });
+    }
+  }
+  return groups.map(({ model, count }) => (count > 1 ? `${model}×${count}` : model)).join(' → ');
+};
+// ─────────────────────────── Orchestrator (I/O) ──────────────────────────────
+// Lazy module bindings are set up inside runEscalation so that importing this
+// module for its pure helpers (e.g. in tests) does not pull in command-stream,
+// the network bootstrap, or other heavy dependencies.
+/**
+ * Runs escalate restart iterations after the main solve.
+ *
+ * The first regular solve session already ran with the lowest tier in the plan
+ * (see the config-time model override in solve.config.lib.mjs), so escalation
+ * continues from plan index 1 onward. Before each restart it re-scans for
+ * deferred / unfinished work (the same detector used by keep-working). If no
+ * unfinished-work indicators remain, the cheaper model is considered to have
+ * succeeded and escalation stops early — we do not waste the more expensive
+ * models.
+ *
+ * @param {object} params
+ * @param {string} params.issueUrl
+ * @param {string} params.owner
+ * @param {string} params.repo
+ * @param {string|number} params.issueNumber
+ * @param {string|number} params.prNumber
+ * @param {string} params.branchName
+ * @param {string} params.tempDir
+ * @param {string} [params.workspaceTmpDir]
+ * @param {object} params.argv - CLI arguments
+ * @param {function} params.cleanupClaudeFile - cleanup function
+ * @param {string} [params.resultSummary] - AI solution summary from the last session
+ * @returns {Promise<{sessionId, anthropicTotalCostUSD, publicPricingEstimate, pricingInfo}|null>}
+ */
+export const runEscalation = async ({ issueUrl, owner, repo, issueNumber, prNumber, branchName, tempDir, workspaceTmpDir, argv, cleanupClaudeFile, resultSummary }) => {
+  const config = resolveEscalationConfig(argv);
+  if (!config || !prNumber) {
+    return null;
+  }
+  // Import shared library functions lazily (network bootstrap lives here).
+  const lib = await import('./lib.mjs');
+  const { log, cleanErrorMessage } = lib;
+  // Escalate mode only makes sense for the Claude model ladder. For other tools
+  // we skip with a clear message rather than misusing the ladder names.
+  const tool = argv.tool || 'claude';
+  if (tool !== 'claude') {
+    await log(`ℹ️  ESCALATE: --escalate is only supported with --tool claude (current tool: ${tool}). Skipping.`, { level: 'warning' });
+    return null;
+  }
+  if (typeof globalThis.use === 'undefined') {
+    globalThis.use = (await eval(await (await fetch('https://unpkg.com/use-m/use.js')).text())).use;
+  }
+  const use = globalThis.use;
+  const { $: __rawDollar$ } = await use('command-stream');
+  const { wrapDollarWithGhRetry } = await import('./github-rate-limit.lib.mjs');
+  const $ = wrapDollarWithGhRetry(__rawDollar$);
+  const restartShared = await import('./solve.restart-shared.lib.mjs');
+  const { executeToolIteration, isApiError, isUsageLimitReached } = restartShared;
+  const keepWorkingLib = await import('./solve.keep-working.lib.mjs');
+  const { collectDeferredWorkSources } = keepWorkingLib;
+  const detectLib = await import('./solve.keep-working.detect.lib.mjs');
+  const { detectDeferredWorkInSources } = detectLib;
+  const { resolveDefaultFallbackModel } = await import('./models/index.mjs');
+  const sentryLib = await import('./sentry.lib.mjs');
+  const { reportError } = sentryLib;
+  const { plan } = config;
+  await log('');
+  await log(`🆙 ESCALATE: ${config.from} → ${config.to} (steps: ${config.steps} working session(s) per tier)`);
+  await log(`   Plan: ${formatEscalationPlan(plan)}`);
+  await log('   Strategy: solve cheaply first; escalate to a more capable model only while unfinished work remains.');
+  await log('');
+  // Get PR merge state status for the iterations
+  let currentMergeStateStatus = null;
+  try {
+    // `$` is wrapped via wrapDollarWithGhRetry above; the lazy import keeps this module
+    // network-free for tests, so the lint rule (which only detects top-level rebinds) can't see it.
+    // eslint-disable-next-line gh-rate-limit/no-direct-gh-exec -- $ is rate-limit-safe (wrapDollarWithGhRetry), rebound lazily on line 334.
+    const prStateResult = await $`gh api repos/${owner}/${repo}/pulls/${prNumber} --jq '.mergeStateStatus'`;
+    if (prStateResult.code === 0) {
+      currentMergeStateStatus = prStateResult.stdout.toString().trim();
+    }
+  } catch {
+    // Ignore errors getting merge state
+  }
+  let sessionId;
+  let anthropicTotalCostUSD;
+  let publicPricingEstimate;
+  let pricingInfo;
+  let lastResultSummary = resultSummary;
+  let consecutiveErrors = 0;
+  const MAX_CONSECUTIVE_ERRORS = 3;
+  let restartsRun = 0;
+  // The first regular solve session = plan index 0. Continue escalating from 1.
+  for (let sessionIndex = 1; sessionIndex < plan.length; sessionIndex++) {
+    const model = resolveEscalationModel(plan, sessionIndex);
+    const previousModel = resolveEscalationModel(plan, sessionIndex - 1);
+    // Decide whether the cheaper model already finished. Re-scan the PR
+    // description, AI solution summary and changed markdown documents for
+    // deferred/unfinished-work indicators (same signal as keep-working).
+    let sources = [];
+    try {
+      sources = await collectDeferredWorkSources({ owner, repo, prNumber, resultSummary: lastResultSummary });
+    } catch (error) {
+      reportError(error, { context: 'escalate_collect_sources', owner, repo, prNumber, operation: 'collect_sources' });
+      await log(`⚠️  ESCALATE: Could not collect sources to evaluate completion: ${cleanErrorMessage(error)}`, { level: 'warning' });
+    }
+    const detections = detectDeferredWorkInSources(sources);
+    if (detections.length === 0) {
+      await log(`✅ ESCALATE: No unfinished-work indicators after ${previousModel} session(s). Stopping before escalating to ${model}.`);
+      break;
+    }
+    await log('');
+    await log(`🆙 ESCALATE: ${detections.length} unfinished-work indicator(s) remain after ${previousModel}; escalating to ${model} (session ${sessionIndex + 1}/${plan.length}).`);
+    for (const detection of detections.slice(0, 10)) {
+      await log(`   • [${detection.label}] in ${detection.source}: "${detection.snippet}"`);
+    }
+    // Sync local branch with remote before each iteration (issue #1572 pattern).
+    try {
+      const pullResult = await $({ cwd: tempDir })`git pull origin ${branchName} 2>&1`;
+      if (pullResult.code === 0) {
+        await log(`   Synced local branch ${branchName} from remote`, { verbose: true });
+      } else {
+        await log(`   Warning: git pull failed (code ${pullResult.code}); continuing with local state`, { level: 'warning' });
+      }
+    } catch (error) {
+      reportError(error, { context: 'escalate_git_pull', branchName, operation: 'git_pull' });
+      await log(`   Warning: git pull error: ${cleanErrorMessage(error)}`, { level: 'warning' });
+    }
+    const feedbackLines = ['', '='.repeat(60), `🆙 ESCALATE MODE — now running on a more capable model (${model}):`, '='.repeat(60), '', `The previous working session(s) used "${previousModel}" but left unfinished work. You are a more capable model. Carefully review what has already been done, then finish every remaining requirement in this single pull request — do not defer, delay, or mark anything as out of scope. Ensure all changes are correct, consistent, validated, tested and that all CI/CD checks pass.`, ''];
+    const fallbackModel = resolveDefaultFallbackModel(tool, model) || undefined;
+    const iterationResult = await executeToolIteration({
+      issueUrl,
+      owner,
+      repo,
+      issueNumber,
+      prNumber,
+      branchName,
+      tempDir,
+      workspaceTmpDir,
+      mergeStateStatus: currentMergeStateStatus,
+      feedbackLines,
+      argv: {
+        ...argv,
+        // Escalate to the next tier for this iteration.
+        model,
+        fallbackModel,
+        // Reinforce the "finish everything now" guidance in the system prompt.
+        promptEnsureAllRequirementsAreMet: true,
+        // Prevent recursive escalation inside the restart iteration.
+        escalate: undefined,
+        escalateFrom: undefined,
+      },
+    });
+    restartsRun++;
+    if (iterationResult) {
+      if (iterationResult.sessionId) sessionId = iterationResult.sessionId;
+      if (iterationResult.anthropicTotalCostUSD) anthropicTotalCostUSD = iterationResult.anthropicTotalCostUSD;
+      if (iterationResult.publicPricingEstimate) publicPricingEstimate = iterationResult.publicPricingEstimate;
+      if (iterationResult.pricingInfo) pricingInfo = iterationResult.pricingInfo;
+      if (iterationResult.result) lastResultSummary = iterationResult.result;
+    }
+    if (isUsageLimitReached(iterationResult)) {
+      await log('🛑 ESCALATE: Usage limit reached during restart. Stopping escalate loop.');
+      break;
+    }
+    if (isApiError(iterationResult)) {
+      consecutiveErrors++;
+      await log(`⚠️  ESCALATE: API error during ${model} restart (${consecutiveErrors}/${MAX_CONSECUTIVE_ERRORS} consecutive).`, { level: 'warning' });
+      if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
+        await log('🛑 ESCALATE: Too many consecutive errors. Stopping escalate loop.');
+        break;
+      }
+    } else {
+      consecutiveErrors = 0;
+    }
+    await log(`✅ ESCALATE: ${model} session complete (${sessionIndex + 1}/${plan.length})`);
+    await log('');
+  }
+  // Clean up CLAUDE.md/.gitkeep after restarts
+  try {
+    await cleanupClaudeFile(tempDir, branchName, null, argv);
+  } catch (error) {
+    reportError(error, { context: 'escalate_cleanup', branchName, operation: 'cleanup_claude_file' });
+  }
+  if (restartsRun === 0) return null;
+  return { sessionId, anthropicTotalCostUSD, publicPricingEstimate, pricingInfo };
+};
+export default {
+  MODEL_ESCALATION_ORDER,
+  DEFAULT_ESCALATE_LOWER,
+  DEFAULT_ESCALATE_UPPER,
+  DEFAULT_ESCALATE_RANGE,
+  DEFAULT_ESCALATE_STEPS,
+  canonicalTier,
+  parseEscalateRange,
+  parseEscalateFrom,
+  normalizeEscalateSteps,
+  buildEscalationPlan,
+  resolveEscalationModel,
+  isEscalateEnabled,
+  resolveEscalationConfig,
+  formatEscalationPlan,
+  runEscalation,
+};

package/src/solve.mjs CHANGED Viewed

@@ -46,6 +46,7 @@ const { startWatchMode } = watchLib;
 const { startAutoRestartUntilMergeable } = await import('./solve.auto-merge.lib.mjs');
 const { runAutoEnsureRequirements } = await import('./solve.auto-ensure.lib.mjs');
 const { runKeepWorkingUntilDone } = await import('./solve.keep-working.lib.mjs');
+const { runEscalation } = await import('./solve.escalate.lib.mjs');
 const exitHandler = await import('./exit-handler.lib.mjs');
 const { initializeExitHandler, installGlobalExitHandlers, safeExit, logActiveHandles } = exitHandler;
 const { createInterruptWrapper } = await import('./solve.interrupt.lib.mjs');
@@ -1270,10 +1271,9 @@ try {
       await log('⚠️  PR title/description still not updated after restart');
     }
   }
-  // Issue #1383: --finalize
+  // Post-solve restart loops (escalate #1885 first, then finalize #1383, then keep-working #1883):
+  applyRestartResult(await runEscalation({ issueUrl, owner, repo, issueNumber, prNumber, branchName, tempDir, workspaceTmpDir, argv, cleanupClaudeFile, resultSummary }));
   applyRestartResult(await runAutoEnsureRequirements({ issueUrl, owner, repo, issueNumber, prNumber, branchName, tempDir, argv, cleanupClaudeFile }));
-  // Issue #1883: --keep-working-until-all-requirements-are-fully-done (detect deferred work and auto-restart until done)
   applyRestartResult(await runKeepWorkingUntilDone({ issueUrl, owner, repo, issueNumber, prNumber, branchName, tempDir, workspaceTmpDir, argv, cleanupClaudeFile, resultSummary }));
   // Start watch mode if enabled OR if we need to handle uncommitted changes