claudish 2.2.1 → 2.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -600,6 +600,24 @@ claudish --model minimax/minimax-m2 "task C"
600
600
 
601
601
  **NEW in v1.1.0**: Claudish now fully supports models with extended thinking/reasoning capabilities (Grok, o1, etc.) with complete Anthropic Messages API protocol compliance.
602
602
 
603
+ ### Thinking Translation Model (v1.5.0)
604
+
605
+ Claudish includes a **Thinking Translation Model** that translates Claude Code's native thinking budget into the reasoning parameters each major AI provider expects.
606
+
607
+ When you set a thinking budget in Claude (e.g., `budget: 16000`), Claudish automatically translates it:
608
+
609
+ | Provider | Model | Translation Logic |
610
+ | :--- | :--- | :--- |
611
+ | **OpenAI** | o1, o3 | Maps budget to `reasoning_effort` (minimal/low/medium/high) |
612
+ | **Google** | Gemini 3 | Maps to `thinking_level` (low/high) |
613
+ | **Google** | Gemini 2.x | Passes exact `thinking_budget` (capped at 24k) |
614
+ | **xAI** | Grok 3 Mini | Maps to `reasoning_effort` (low/high) |
615
+ | **Qwen** | Qwen 2.5 | Enables `enable_thinking` + exact budget |
616
+ | **MiniMax** | M2 | Enables `reasoning_split` (interleaved thinking) |
617
+ | **DeepSeek** | R1 | Automatically manages reasoning (params stripped for safety) |
618
+
619
+ This means you can use standard Claude Code thinking controls with **ANY** supported model, without worrying about provider-specific API details.
620
+
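If you want to see what this translation looks like concretely, here is a minimal TypeScript sketch. Only the OpenAI thresholds (<16k → `low`, ≥32k → `high`) mirror the bundled proxy code; the function name `translateThinkingBudget` and the other branches are illustrative assumptions based on the table above, not excerpts from Claudish itself.

```typescript
// Sketch only: maps an Anthropic-style thinking budget to provider params.
type ProviderParams = Record<string, unknown>;

function translateThinkingBudget(modelId: string, budgetTokens: number): ProviderParams {
  // OpenAI o1/o3: collapse the token budget into a coarse reasoning_effort level
  // (thresholds match the bundled proxy: <16k -> low, >=32k -> high, else medium).
  if (modelId.startsWith("openai/") || modelId.includes("o1") || modelId.includes("o3")) {
    const effort = budgetTokens < 16_000 ? "low" : budgetTokens >= 32_000 ? "high" : "medium";
    return { reasoning_effort: effort };
  }
  // Gemini 2.x: pass the exact budget, capped at 24k (per the table above).
  if (modelId.includes("gemini-2")) {
    return { thinking_budget: Math.min(budgetTokens, 24_000) };
  }
  // Qwen: enable thinking and forward the exact budget.
  if (modelId.startsWith("qwen/")) {
    return { enable_thinking: true, thinking_budget: budgetTokens };
  }
  // DeepSeek R1 manages its own reasoning, so thinking params are stripped.
  if (modelId.includes("deepseek-r1")) {
    return {};
  }
  // Default: forward the Anthropic-style thinking block unchanged.
  return { thinking: { type: "enabled", budget_tokens: budgetTokens } };
}

// Example: a 16k budget on an OpenAI reasoning model becomes "medium" effort.
console.log(translateThinkingBudget("openai/o3", 16_000)); // { reasoning_effort: "medium" }
```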
603
621
  ### What is Extended Thinking?
604
622
 
605
623
  Some AI models (like Grok and OpenAI's o1) can show their internal reasoning process before providing the final answer. This "thinking" content helps you understand how the model arrived at its conclusion.
@@ -678,6 +696,61 @@ For complete protocol documentation, see:
678
696
  - [COMPREHENSIVE_UX_ISSUE_ANALYSIS.md](./COMPREHENSIVE_UX_ISSUE_ANALYSIS.md) - Technical analysis
679
697
  - [THINKING_BLOCKS_IMPLEMENTATION.md](./THINKING_BLOCKS_IMPLEMENTATION.md) - Implementation summary
680
698
 
699
+ ## Dynamic Reasoning Support (NEW in v1.4.0)
700
+
701
+ **Claudish now intelligently adapts to ANY reasoning model!**
702
+
703
+ No more hardcoded lists or manual flags. Claudish dynamically queries OpenRouter metadata to enable thinking capabilities for any model that supports them.
704
+
705
+ ### 🧠 Dynamic Thinking Features
706
+
707
+ 1. **Auto-Detection**:
708
+ - Automatically checks model capabilities at startup (see the sketch after this list)
709
+ - Enables Extended Thinking UI *only* when supported
710
+ - Future-proof: Works instantly with new models (e.g., `deepseek-r1` or `minimax-m2`)
711
+
712
+ 2. **Smart Parameter Mapping**:
713
+ - **Claude**: Passes token budget directly (e.g., 16k tokens)
714
+ - **OpenAI (o1/o3)**: Translates budget to `reasoning_effort`
715
+ - "ultrathink" (≥32k) → `high`
716
+ - "think hard" (16k-32k) → `medium`
717
+ - "think" (<16k) → `low`
718
+ - **Gemini & Grok**: Preserves thought signatures and XML traces automatically
719
+
720
+ 3. **Universal Compatibility**:
721
+ - Use "ultrathink" or "think hard" prompts with ANY supported model
722
+ - Claudish handles the translation layer for you
723
+
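As a rough illustration of the auto-detection step referenced in the list above, here is a TypeScript sketch. The OpenRouter `/api/v1/models` endpoint and the `context_length` field appear in the bundled code; the `supported_parameters` field and the helper name are assumptions made for this example.

```typescript
// Sketch only: decide whether to enable the Extended Thinking UI for a model.
interface OpenRouterModel {
  id: string;
  context_length?: number;
  supported_parameters?: string[]; // assumed field name, not taken from the bundle
}

async function modelSupportsReasoning(modelId: string): Promise<boolean> {
  const res = await fetch("https://openrouter.ai/api/v1/models");
  if (!res.ok) return false; // fail closed: no thinking UI if metadata is unavailable
  const { data } = (await res.json()) as { data: OpenRouterModel[] };
  const model = data.find((m) => m.id === modelId);
  return model?.supported_parameters?.includes("reasoning") ?? false;
}

// Usage: only wire up thinking blocks when the metadata says the model can reason.
// const thinkingEnabled = await modelSupportsReasoning("deepseek/deepseek-r1");
```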
724
+ ## Context Scaling & Auto-Compaction
725
+
726
+ **NEW in v1.2.0**: Claudish now intelligently manages token counting to support ANY context window size (from 128k to 2M+) while preserving Claude Code's native auto-compaction behavior.
727
+
728
+ ### The Challenge
729
+
730
+ Claude Code assumes a fixed context window (typically 200k tokens for Sonnet):
731
+ - **Small Models (e.g., Grok 128k)**: Claude may overrun the context window and crash.
732
+ - **Massive Models (e.g., Gemini 2M)**: Claude would compact far too early (at ~10% of the real window), wasting the model's potential.
733
+
734
+ ### The Solution: Token Scaling
735
+
736
+ Claudish implements a "Dual-Accounting" system (a code sketch follows this list):
737
+
738
+ 1. **Internal Scaling (For Claude):**
739
+ - We fetch the *real* context limit from OpenRouter (e.g., 1M tokens).
740
+ - We scale the token usage reported to Claude so that the real window (e.g., 1M tokens) maps onto the 200k window Claude expects.
741
+ - **Result:** Auto-compaction triggers at the correct *percentage* of usage (e.g., 90% full), regardless of the actual limit.
742
+
743
+ 2. **Accurate Reporting (For You):**
744
+ - The status line displays the **real (unscaled) usage** and the **real context %**.
745
+ - You see the real costs and limits, while Claude remains blissfully unaware and stable.
746
+
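The arithmetic behind the dual accounting is small enough to show inline. The constants and formulas below mirror the bundled proxy code; the helper names are illustrative.

```typescript
// Sketch of the dual accounting: scaled numbers for Claude, real numbers for you.
const CLAUDE_INTERNAL_CONTEXT_MAX = 200_000; // the window Claude Code assumes it has

function scaledForClaude(realTokens: number, realContextLimit: number): number {
  // Usage reported to Claude is rescaled so 100% of the real window
  // lands on 100% of Claude's assumed 200k window.
  const scaleFactor = CLAUDE_INTERNAL_CONTEXT_MAX / realContextLimit;
  return Math.ceil(realTokens * scaleFactor);
}

function realContextLeftPercent(realTokens: number, realContextLimit: number): number {
  // What the status line shows: unscaled usage against the real limit.
  const pct = Math.round(((realContextLimit - realTokens) / realContextLimit) * 100);
  return Math.max(0, Math.min(100, pct));
}

// Example: a 1M-token Gemini window with 500k tokens used.
// Claude sees 100k of its assumed 200k (50% used) and compacts at the right
// percentage, while the status line reports the real 50% remaining.
console.log(scaledForClaude(500_000, 1_000_000));        // 100000
console.log(realContextLeftPercent(500_000, 1_000_000)); // 50
```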
747
+ **Benefits:**
748
+ - ✅ **Works with ANY model** size (128k, 1M, 2M, etc.)
749
+ - ✅ **Unlocks massive context** windows (Claude Code can use up to 10x more context with Gemini!)
750
+ - ✅ **Prevents crashes** on smaller models (Grok)
751
+ - ✅ **Native behavior** (compaction just works)
752
+
753
+
681
754
  ## Development
682
755
 
683
756
  ### Project Structure
package/dist/index.js CHANGED
@@ -141,20 +141,11 @@ function createTempSettingsFile(modelDisplay, port) {
141
141
  const DIM = "\\033[2m";
142
142
  const RESET = "\\033[0m";
143
143
  const BOLD = "\\033[1m";
144
- const MODEL_CONTEXT = {
145
- "x-ai/grok-code-fast-1": 256000,
146
- "openai/gpt-5-codex": 400000,
147
- "minimax/minimax-m2": 204800,
148
- "z-ai/glm-4.6": 200000,
149
- "qwen/qwen3-vl-235b-a22b-instruct": 256000,
150
- "anthropic/claude-sonnet-4.5": 200000
151
- };
152
- const maxTokens = MODEL_CONTEXT[modelDisplay] || 1e5;
153
144
  const tokenFilePath = `/tmp/claudish-tokens-${port}.json`;
154
145
  const settings = {
155
146
  statusLine: {
156
147
  type: "command",
157
- command: `JSON=$(cat) && DIR=$(basename "$(pwd)") && [ \${#DIR} -gt 15 ] && DIR="\${DIR:0:12}..." || true && COST=$(echo "$JSON" | grep -o '"total_cost_usd":[0-9.]*' | cut -d: -f2) && [ -z "$COST" ] && COST="0" || true && if [ -f "${tokenFilePath}" ]; then TOKENS=$(cat "${tokenFilePath}" 2>/dev/null) && INPUT=$(echo "$TOKENS" | grep -o '"input_tokens":[0-9]*' | grep -o '[0-9]*') && OUTPUT=$(echo "$TOKENS" | grep -o '"output_tokens":[0-9]*' | grep -o '[0-9]*') && TOTAL=$((INPUT + OUTPUT)) && CTX=$(echo "scale=0; (${maxTokens} - $TOTAL) * 100 / ${maxTokens}" | bc 2>/dev/null); else INPUT=0 && OUTPUT=0 && CTX=100; fi && [ -z "$CTX" ] && CTX="100" || true && printf "${CYAN}${BOLD}%s${RESET} ${DIM}•${RESET} ${YELLOW}%s${RESET} ${DIM}•${RESET} ${GREEN}\\$%.3f${RESET} ${DIM}•${RESET} ${MAGENTA}%s%%${RESET}\\n" "$DIR" "$CLAUDISH_ACTIVE_MODEL_NAME" "$COST" "$CTX"`,
148
+ command: `JSON=$(cat) && DIR=$(basename "$(pwd)") && [ \${#DIR} -gt 15 ] && DIR="\${DIR:0:12}..." || true && CTX=100 && COST="0" && if [ -f "${tokenFilePath}" ]; then TOKENS=$(cat "${tokenFilePath}" 2>/dev/null) && REAL_COST=$(echo "$TOKENS" | grep -o '"total_cost":[0-9.]*' | cut -d: -f2) && REAL_CTX=$(echo "$TOKENS" | grep -o '"context_left_percent":[0-9]*' | grep -o '[0-9]*') && if [ ! -z "$REAL_COST" ]; then COST="$REAL_COST"; else COST=$(echo "$JSON" | grep -o '"total_cost_usd":[0-9.]*' | cut -d: -f2); fi && if [ ! -z "$REAL_CTX" ]; then CTX="$REAL_CTX"; fi; else COST=$(echo "$JSON" | grep -o '"total_cost_usd":[0-9.]*' | cut -d: -f2); fi && [ -z "$COST" ] && COST="0" || true && printf "${CYAN}${BOLD}%s${RESET} ${DIM}•${RESET} ${YELLOW}%s${RESET} ${DIM}•${RESET} ${GREEN}\\$%.3f${RESET} ${DIM}•${RESET} ${MAGENTA}%s%%${RESET}\\n" "$DIR" "$CLAUDISH_ACTIVE_MODEL_NAME" "$COST" "$CTX"`,
158
149
  padding: 0
159
150
  }
160
151
  };
@@ -336,11 +327,90 @@ function getAvailableModels() {
336
327
  _cachedModelIds = [...OPENROUTER_MODELS2];
337
328
  return [...OPENROUTER_MODELS2];
338
329
  }
330
+ var _cachedOpenRouterModels = null;
331
+ async function fetchModelContextWindow(modelId) {
332
+ if (_cachedOpenRouterModels) {
333
+ const model = _cachedOpenRouterModels.find((m) => m.id === modelId);
334
+ if (model) {
335
+ return model.context_length || model.top_provider?.context_length || 128000;
336
+ }
337
+ }
338
+ try {
339
+ const response = await fetch("https://openrouter.ai/api/v1/models");
340
+ if (response.ok) {
341
+ const data = await response.json();
342
+ _cachedOpenRouterModels = data.data;
343
+ const model = _cachedOpenRouterModels?.find((m) => m.id === modelId);
344
+ if (model) {
345
+ return model.context_length || model.top_provider?.context_length || 128000;
346
+ }
347
+ }
348
+ } catch (error) {}
349
+ try {
350
+ const modelMetadata = loadModelInfo();
351
+ } catch (e) {}
352
+ const jsonPath = join2(__dirname2, "../recommended-models.json");
353
+ if (existsSync(jsonPath)) {
354
+ try {
355
+ const jsonContent = readFileSync(jsonPath, "utf-8");
356
+ const data = JSON.parse(jsonContent);
357
+ const model = data.models.find((m) => m.id === modelId);
358
+ if (model && model.context) {
359
+ const ctxStr = model.context.toUpperCase();
360
+ if (ctxStr.includes("K"))
361
+ return parseFloat(ctxStr.replace("K", "")) * 1024;
362
+ if (ctxStr.includes("M"))
363
+ return parseFloat(ctxStr.replace("M", "")) * 1e6;
364
+ const val = parseInt(ctxStr);
365
+ if (!isNaN(val))
366
+ return val;
367
+ }
368
+ } catch (e) {}
369
+ }
370
+ return 200000;
371
+ }
339
372
 
340
373
  // src/cli.ts
341
374
  import { readFileSync as readFileSync2, writeFileSync as writeFileSync3, existsSync as existsSync2, mkdirSync, copyFileSync } from "node:fs";
342
375
  import { fileURLToPath as fileURLToPath2 } from "node:url";
343
376
  import { dirname as dirname2, join as join3 } from "node:path";
377
+
378
+ // src/utils.ts
379
+ function fuzzyScore(text, query) {
380
+ if (!text || !query)
381
+ return 0;
382
+ const t = text.toLowerCase();
383
+ const q = query.toLowerCase();
384
+ if (t === q)
385
+ return 1;
386
+ if (t.startsWith(q))
387
+ return 0.9;
388
+ if (t.includes(` ${q}`) || t.includes(`-${q}`) || t.includes(`/${q}`))
389
+ return 0.8;
390
+ if (t.includes(q))
391
+ return 0.6;
392
+ let score = 0;
393
+ let tIdx = 0;
394
+ let qIdx = 0;
395
+ let consecutive = 0;
396
+ while (tIdx < t.length && qIdx < q.length) {
397
+ if (t[tIdx] === q[qIdx]) {
398
+ score += 1 + consecutive * 0.5;
399
+ consecutive++;
400
+ qIdx++;
401
+ } else {
402
+ consecutive = 0;
403
+ }
404
+ tIdx++;
405
+ }
406
+ if (qIdx === q.length) {
407
+ const compactness = q.length / (tIdx + 1);
408
+ return 0.1 + 0.4 * compactness * (score / (q.length * 2));
409
+ }
410
+ return 0;
411
+ }
412
+
413
+ // src/cli.ts
344
414
  var __filename3 = fileURLToPath2(import.meta.url);
345
415
  var __dirname3 = dirname2(__filename3);
346
416
  var packageJson = JSON.parse(readFileSync2(join3(__dirname3, "../package.json"), "utf-8"));
@@ -452,6 +522,15 @@ async function parseArgs(args) {
452
522
  printAvailableModels();
453
523
  }
454
524
  process.exit(0);
525
+ } else if (arg === "--search" || arg === "-s") {
526
+ const query = args[++i];
527
+ if (!query) {
528
+ console.error("--search requires a search term");
529
+ process.exit(1);
530
+ }
531
+ const forceUpdate = args.includes("--force-update");
532
+ await searchAndPrintModels(query, forceUpdate);
533
+ process.exit(0);
455
534
  } else {
456
535
  config.claudeArgs = args.slice(i);
457
536
  break;
@@ -500,6 +579,75 @@ async function parseArgs(args) {
500
579
  }
501
580
  var CACHE_MAX_AGE_DAYS = 2;
502
581
  var MODELS_JSON_PATH = join3(__dirname3, "../recommended-models.json");
582
+ var ALL_MODELS_JSON_PATH = join3(__dirname3, "../all-models.json");
583
+ async function searchAndPrintModels(query, forceUpdate) {
584
+ let models = [];
585
+ if (!forceUpdate && existsSync2(ALL_MODELS_JSON_PATH)) {
586
+ try {
587
+ const cacheData = JSON.parse(readFileSync2(ALL_MODELS_JSON_PATH, "utf-8"));
588
+ const lastUpdated = new Date(cacheData.lastUpdated);
589
+ const now = new Date;
590
+ const ageInDays = (now.getTime() - lastUpdated.getTime()) / (1000 * 60 * 60 * 24);
591
+ if (ageInDays <= CACHE_MAX_AGE_DAYS) {
592
+ models = cacheData.models;
593
+ }
594
+ } catch (e) {}
595
+ }
596
+ if (models.length === 0) {
597
+ console.error("\uD83D\uDD04 Fetching all models from OpenRouter (this may take a moment)...");
598
+ try {
599
+ const response = await fetch("https://openrouter.ai/api/v1/models");
600
+ if (!response.ok)
601
+ throw new Error(`API returned ${response.status}`);
602
+ const data = await response.json();
603
+ models = data.data;
604
+ writeFileSync3(ALL_MODELS_JSON_PATH, JSON.stringify({
605
+ lastUpdated: new Date().toISOString(),
606
+ models
607
+ }), "utf-8");
608
+ console.error(`✅ Cached ${models.length} models`);
609
+ } catch (error) {
610
+ console.error(`❌ Failed to fetch models: ${error}`);
611
+ process.exit(1);
612
+ }
613
+ }
614
+ const results = models.map((model) => {
615
+ const nameScore = fuzzyScore(model.name || "", query);
616
+ const idScore = fuzzyScore(model.id || "", query);
617
+ const descScore = fuzzyScore(model.description || "", query) * 0.5;
618
+ return {
619
+ model,
620
+ score: Math.max(nameScore, idScore, descScore)
621
+ };
622
+ }).filter((item) => item.score > 0.2).sort((a, b) => b.score - a.score).slice(0, 20);
623
+ if (results.length === 0) {
624
+ console.log(`No models found matching "${query}"`);
625
+ return;
626
+ }
627
+ console.log(`
628
+ Found ${results.length} matching models:
629
+ `);
630
+ console.log(" Model Provider Pricing Context Score");
631
+ console.log(" " + "─".repeat(80));
632
+ for (const { model, score } of results) {
633
+ const modelId = model.id.length > 30 ? model.id.substring(0, 27) + "..." : model.id;
634
+ const modelIdPadded = modelId.padEnd(30);
635
+ const providerName = model.id.split("/")[0];
636
+ const provider = providerName.length > 10 ? providerName.substring(0, 7) + "..." : providerName;
637
+ const providerPadded = provider.padEnd(10);
638
+ const promptPrice = parseFloat(model.pricing?.prompt || "0") * 1e6;
639
+ const completionPrice = parseFloat(model.pricing?.completion || "0") * 1e6;
640
+ const avg = (promptPrice + completionPrice) / 2;
641
+ const pricing = avg === 0 ? "FREE" : `$${avg.toFixed(2)}/1M`;
642
+ const pricingPadded = pricing.padEnd(10);
643
+ const contextLen = model.context_length || model.top_provider?.context_length || 0;
644
+ const context = contextLen > 0 ? `${Math.round(contextLen / 1000)}K` : "N/A";
645
+ const contextPadded = context.padEnd(7);
646
+ console.log(` ${modelIdPadded} ${providerPadded} ${pricingPadded} ${contextPadded} ${(score * 100).toFixed(0)}%`);
647
+ }
648
+ console.log("");
649
+ console.log("Use a model: claudish --model <model-id>");
650
+ }
503
651
  function isCacheStale() {
504
652
  if (!existsSync2(MODELS_JSON_PATH)) {
505
653
  return true;
@@ -522,6 +670,8 @@ async function updateModelsFromOpenRouter() {
522
670
  console.error("\uD83D\uDD04 Updating model recommendations from OpenRouter...");
523
671
  try {
524
672
  const topWeeklyProgrammingModels = [
673
+ "google/gemini-3-pro-preview",
674
+ "openai/gpt-5.1-codex",
525
675
  "x-ai/grok-code-fast-1",
526
676
  "anthropic/claude-sonnet-4.5",
527
677
  "google/gemini-2.5-flash",
@@ -555,29 +705,7 @@ async function updateModelsFromOpenRouter() {
555
705
  }
556
706
  const model = modelMap.get(modelId);
557
707
  if (!model) {
558
- console.error(`⚠️ Model ${modelId} not found in OpenRouter API (including with limited metadata)`);
559
- recommendations.push({
560
- id: modelId,
561
- name: modelId.split("/")[1].replace(/-/g, " ").replace(/\b\w/g, (l) => l.toUpperCase()),
562
- description: `${modelId} (metadata pending - not yet available in API)`,
563
- provider: provider.charAt(0).toUpperCase() + provider.slice(1),
564
- category: "programming",
565
- priority: recommendations.length + 1,
566
- pricing: {
567
- input: "N/A",
568
- output: "N/A",
569
- average: "N/A"
570
- },
571
- context: "N/A",
572
- maxOutputTokens: null,
573
- modality: "text->text",
574
- supportsTools: false,
575
- supportsReasoning: false,
576
- supportsVision: false,
577
- isModerated: false,
578
- recommended: true
579
- });
580
- providers.add(provider);
708
+ console.error(`⚠️ Model ${modelId} not found in OpenRouter API - skipping`);
581
709
  continue;
582
710
  }
583
711
  const name = model.name || modelId;
@@ -3541,12 +3669,58 @@ class GrokAdapter extends BaseModelAdapter {
3541
3669
  }
3542
3670
  }
3543
3671
 
3672
+ // src/adapters/gemini-adapter.ts
3673
+ class GeminiAdapter extends BaseModelAdapter {
3674
+ thoughtSignatures = new Map;
3675
+ processTextContent(textContent, accumulatedText) {
3676
+ return {
3677
+ cleanedText: textContent,
3678
+ extractedToolCalls: [],
3679
+ wasTransformed: false
3680
+ };
3681
+ }
3682
+ extractThoughtSignaturesFromReasoningDetails(reasoningDetails) {
3683
+ const extracted = new Map;
3684
+ if (!reasoningDetails || !Array.isArray(reasoningDetails)) {
3685
+ return extracted;
3686
+ }
3687
+ for (const detail of reasoningDetails) {
3688
+ if (detail && detail.type === "reasoning.encrypted" && detail.id && detail.data) {
3689
+ this.thoughtSignatures.set(detail.id, detail.data);
3690
+ extracted.set(detail.id, detail.data);
3691
+ }
3692
+ }
3693
+ return extracted;
3694
+ }
3695
+ getThoughtSignature(toolCallId) {
3696
+ return this.thoughtSignatures.get(toolCallId);
3697
+ }
3698
+ hasThoughtSignature(toolCallId) {
3699
+ return this.thoughtSignatures.has(toolCallId);
3700
+ }
3701
+ getAllThoughtSignatures() {
3702
+ return new Map(this.thoughtSignatures);
3703
+ }
3704
+ reset() {
3705
+ this.thoughtSignatures.clear();
3706
+ }
3707
+ shouldHandle(modelId) {
3708
+ return modelId.includes("gemini") || modelId.includes("google/");
3709
+ }
3710
+ getName() {
3711
+ return "GeminiAdapter";
3712
+ }
3713
+ }
3714
+
3544
3715
  // src/adapters/adapter-manager.ts
3545
3716
  class AdapterManager {
3546
3717
  adapters;
3547
3718
  defaultAdapter;
3548
3719
  constructor(modelId) {
3549
- this.adapters = [new GrokAdapter(modelId)];
3720
+ this.adapters = [
3721
+ new GrokAdapter(modelId),
3722
+ new GeminiAdapter(modelId)
3723
+ ];
3550
3724
  this.defaultAdapter = new DefaultAdapter(modelId);
3551
3725
  }
3552
3726
  getAdapter() {
@@ -3562,6 +3736,261 @@ class AdapterManager {
3562
3736
  }
3563
3737
  }
3564
3738
 
3739
+ // src/middleware/manager.ts
3740
+ class MiddlewareManager {
3741
+ middlewares = [];
3742
+ initialized = false;
3743
+ register(middleware) {
3744
+ this.middlewares.push(middleware);
3745
+ if (isLoggingEnabled()) {
3746
+ logStructured("Middleware Registered", {
3747
+ name: middleware.name,
3748
+ total: this.middlewares.length
3749
+ });
3750
+ }
3751
+ }
3752
+ async initialize() {
3753
+ if (this.initialized) {
3754
+ log("[Middleware] Already initialized, skipping");
3755
+ return;
3756
+ }
3757
+ log(`[Middleware] Initializing ${this.middlewares.length} middleware(s)...`);
3758
+ for (const middleware of this.middlewares) {
3759
+ if (middleware.onInit) {
3760
+ try {
3761
+ await middleware.onInit();
3762
+ log(`[Middleware] ${middleware.name} initialized`);
3763
+ } catch (error) {
3764
+ log(`[Middleware] ERROR: ${middleware.name} initialization failed: ${error}`);
3765
+ }
3766
+ }
3767
+ }
3768
+ this.initialized = true;
3769
+ log("[Middleware] Initialization complete");
3770
+ }
3771
+ getActiveMiddlewares(modelId) {
3772
+ return this.middlewares.filter((m) => m.shouldHandle(modelId));
3773
+ }
3774
+ async beforeRequest(context) {
3775
+ const active = this.getActiveMiddlewares(context.modelId);
3776
+ if (active.length === 0) {
3777
+ return;
3778
+ }
3779
+ if (isLoggingEnabled()) {
3780
+ logStructured("Middleware Chain (beforeRequest)", {
3781
+ modelId: context.modelId,
3782
+ middlewares: active.map((m) => m.name),
3783
+ messageCount: context.messages.length
3784
+ });
3785
+ }
3786
+ for (const middleware of active) {
3787
+ try {
3788
+ await middleware.beforeRequest(context);
3789
+ } catch (error) {
3790
+ log(`[Middleware] ERROR in ${middleware.name}.beforeRequest: ${error}`);
3791
+ }
3792
+ }
3793
+ }
3794
+ async afterResponse(context) {
3795
+ const active = this.getActiveMiddlewares(context.modelId);
3796
+ if (active.length === 0) {
3797
+ return;
3798
+ }
3799
+ if (isLoggingEnabled()) {
3800
+ logStructured("Middleware Chain (afterResponse)", {
3801
+ modelId: context.modelId,
3802
+ middlewares: active.map((m) => m.name)
3803
+ });
3804
+ }
3805
+ for (const middleware of active) {
3806
+ if (middleware.afterResponse) {
3807
+ try {
3808
+ await middleware.afterResponse(context);
3809
+ } catch (error) {
3810
+ log(`[Middleware] ERROR in ${middleware.name}.afterResponse: ${error}`);
3811
+ }
3812
+ }
3813
+ }
3814
+ }
3815
+ async afterStreamChunk(context) {
3816
+ const active = this.getActiveMiddlewares(context.modelId);
3817
+ if (active.length === 0) {
3818
+ return;
3819
+ }
3820
+ if (isLoggingEnabled() && !context.metadata.has("_middlewareLogged")) {
3821
+ logStructured("Middleware Chain (afterStreamChunk)", {
3822
+ modelId: context.modelId,
3823
+ middlewares: active.map((m) => m.name)
3824
+ });
3825
+ context.metadata.set("_middlewareLogged", true);
3826
+ }
3827
+ for (const middleware of active) {
3828
+ if (middleware.afterStreamChunk) {
3829
+ try {
3830
+ await middleware.afterStreamChunk(context);
3831
+ } catch (error) {
3832
+ log(`[Middleware] ERROR in ${middleware.name}.afterStreamChunk: ${error}`);
3833
+ }
3834
+ }
3835
+ }
3836
+ }
3837
+ async afterStreamComplete(modelId, metadata) {
3838
+ const active = this.getActiveMiddlewares(modelId);
3839
+ if (active.length === 0) {
3840
+ return;
3841
+ }
3842
+ for (const middleware of active) {
3843
+ if (middleware.afterStreamComplete) {
3844
+ try {
3845
+ await middleware.afterStreamComplete(metadata);
3846
+ } catch (error) {
3847
+ log(`[Middleware] ERROR in ${middleware.name}.afterStreamComplete: ${error}`);
3848
+ }
3849
+ }
3850
+ }
3851
+ }
3852
+ }
3853
+ // src/middleware/gemini-thought-signature.ts
3854
+ class GeminiThoughtSignatureMiddleware {
3855
+ name = "GeminiThoughtSignature";
3856
+ persistentReasoningDetails = new Map;
3857
+ shouldHandle(modelId) {
3858
+ return modelId.includes("gemini") || modelId.includes("google/");
3859
+ }
3860
+ onInit() {
3861
+ log("[Gemini] Thought signature middleware initialized");
3862
+ }
3863
+ beforeRequest(context) {
3864
+ if (this.persistentReasoningDetails.size === 0) {
3865
+ return;
3866
+ }
3867
+ if (isLoggingEnabled()) {
3868
+ logStructured("[Gemini] Injecting reasoning_details", {
3869
+ cacheSize: this.persistentReasoningDetails.size,
3870
+ messageCount: context.messages.length
3871
+ });
3872
+ }
3873
+ let injected = 0;
3874
+ for (const msg of context.messages) {
3875
+ if (msg.role === "assistant" && msg.tool_calls) {
3876
+ for (const [msgId, cached] of this.persistentReasoningDetails.entries()) {
3877
+ const hasMatchingToolCall = msg.tool_calls.some((tc) => cached.tool_call_ids.has(tc.id));
3878
+ if (hasMatchingToolCall) {
3879
+ msg.reasoning_details = cached.reasoning_details;
3880
+ injected++;
3881
+ if (isLoggingEnabled()) {
3882
+ logStructured("[Gemini] Reasoning details added to assistant message", {
3883
+ message_id: msgId,
3884
+ reasoning_blocks: cached.reasoning_details.length,
3885
+ tool_calls: msg.tool_calls.length
3886
+ });
3887
+ }
3888
+ break;
3889
+ }
3890
+ }
3891
+ if (!msg.reasoning_details && isLoggingEnabled()) {
3892
+ log(`[Gemini] WARNING: No reasoning_details found for assistant message with tool_calls`);
3893
+ log(`[Gemini] Tool call IDs: ${msg.tool_calls.map((tc) => tc.id).join(", ")}`);
3894
+ }
3895
+ }
3896
+ }
3897
+ if (isLoggingEnabled() && injected > 0) {
3898
+ logStructured("[Gemini] Signature injection complete", {
3899
+ injected,
3900
+ cacheSize: this.persistentReasoningDetails.size
3901
+ });
3902
+ log("[Gemini] DEBUG: Messages after injection:");
3903
+ for (let i = 0;i < context.messages.length; i++) {
3904
+ const msg = context.messages[i];
3905
+ log(`[Gemini] Message ${i}: role=${msg.role}, has_content=${!!msg.content}, has_tool_calls=${!!msg.tool_calls}, tool_call_id=${msg.tool_call_id || "N/A"}`);
3906
+ if (msg.role === "assistant" && msg.tool_calls) {
3907
+ log(` - Assistant has ${msg.tool_calls.length} tool call(s), content="${msg.content}"`);
3908
+ for (const tc of msg.tool_calls) {
3909
+ log(` * Tool call: ${tc.id}, function=${tc.function?.name}, has extra_content: ${!!tc.extra_content}, has thought_signature: ${!!tc.extra_content?.google?.thought_signature}`);
3910
+ if (tc.extra_content) {
3911
+ log(` extra_content keys: ${Object.keys(tc.extra_content).join(", ")}`);
3912
+ if (tc.extra_content.google) {
3913
+ log(` google keys: ${Object.keys(tc.extra_content.google).join(", ")}`);
3914
+ log(` thought_signature length: ${tc.extra_content.google.thought_signature?.length || 0}`);
3915
+ }
3916
+ }
3917
+ }
3918
+ } else if (msg.role === "tool") {
3919
+ log(` - Tool result: tool_call_id=${msg.tool_call_id}, has extra_content: ${!!msg.extra_content}`);
3920
+ }
3921
+ }
3922
+ }
3923
+ }
3924
+ afterResponse(context) {
3925
+ const response = context.response;
3926
+ const message = response?.choices?.[0]?.message;
3927
+ if (!message) {
3928
+ return;
3929
+ }
3930
+ const reasoningDetails = message.reasoning_details || [];
3931
+ const toolCalls = message.tool_calls || [];
3932
+ if (reasoningDetails.length > 0 && toolCalls.length > 0) {
3933
+ const messageId = `msg_${Date.now()}_${Math.random().toString(36).slice(2)}`;
3934
+ const toolCallIds = new Set(toolCalls.map((tc) => tc.id).filter(Boolean));
3935
+ this.persistentReasoningDetails.set(messageId, {
3936
+ reasoning_details: reasoningDetails,
3937
+ tool_call_ids: toolCallIds
3938
+ });
3939
+ logStructured("[Gemini] Reasoning details saved (non-streaming)", {
3940
+ message_id: messageId,
3941
+ reasoning_blocks: reasoningDetails.length,
3942
+ tool_calls: toolCallIds.size,
3943
+ total_cached_messages: this.persistentReasoningDetails.size
3944
+ });
3945
+ }
3946
+ }
3947
+ afterStreamChunk(context) {
3948
+ const delta = context.delta;
3949
+ if (!delta)
3950
+ return;
3951
+ if (delta.reasoning_details && delta.reasoning_details.length > 0) {
3952
+ if (!context.metadata.has("reasoning_details")) {
3953
+ context.metadata.set("reasoning_details", []);
3954
+ }
3955
+ const accumulated = context.metadata.get("reasoning_details");
3956
+ accumulated.push(...delta.reasoning_details);
3957
+ if (isLoggingEnabled()) {
3958
+ logStructured("[Gemini] Reasoning details accumulated", {
3959
+ chunk_blocks: delta.reasoning_details.length,
3960
+ total_blocks: accumulated.length
3961
+ });
3962
+ }
3963
+ }
3964
+ if (delta.tool_calls) {
3965
+ if (!context.metadata.has("tool_call_ids")) {
3966
+ context.metadata.set("tool_call_ids", new Set);
3967
+ }
3968
+ const toolCallIds = context.metadata.get("tool_call_ids");
3969
+ for (const tc of delta.tool_calls) {
3970
+ if (tc.id) {
3971
+ toolCallIds.add(tc.id);
3972
+ }
3973
+ }
3974
+ }
3975
+ }
3976
+ afterStreamComplete(metadata) {
3977
+ const reasoningDetails = metadata.get("reasoning_details") || [];
3978
+ const toolCallIds = metadata.get("tool_call_ids") || new Set;
3979
+ if (reasoningDetails.length > 0 && toolCallIds.size > 0) {
3980
+ const messageId = `msg_${Date.now()}_${Math.random().toString(36).slice(2)}`;
3981
+ this.persistentReasoningDetails.set(messageId, {
3982
+ reasoning_details: reasoningDetails,
3983
+ tool_call_ids: toolCallIds
3984
+ });
3985
+ logStructured("[Gemini] Streaming complete - reasoning details saved", {
3986
+ message_id: messageId,
3987
+ reasoning_blocks: reasoningDetails.length,
3988
+ tool_calls: toolCallIds.size,
3989
+ total_cached_messages: this.persistentReasoningDetails.size
3990
+ });
3991
+ }
3992
+ }
3993
+ }
3565
3994
  // src/proxy-server.ts
3566
3995
  async function createProxyServer(port, openrouterApiKey, model, monitorMode = false, anthropicApiKey) {
3567
3996
  const OPENROUTER_API_URL2 = "https://openrouter.ai/api/v1/chat/completions";
@@ -3571,6 +4000,30 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3571
4000
  };
3572
4001
  const ANTHROPIC_API_URL = "https://api.anthropic.com/v1/messages";
3573
4002
  const ANTHROPIC_COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens";
4003
+ const middlewareManager = new MiddlewareManager;
4004
+ middlewareManager.register(new GeminiThoughtSignatureMiddleware);
4005
+ middlewareManager.initialize().catch((error) => {
4006
+ log(`[Proxy] Middleware initialization error: ${error}`);
4007
+ });
4008
+ let sessionTotalCost = 0;
4009
+ let contextWindowLimit = 200000;
4010
+ const CLAUDE_INTERNAL_CONTEXT_MAX = 200000;
4011
+ const getTokenScaleFactor = () => {
4012
+ if (contextWindowLimit === 0)
4013
+ return 1;
4014
+ return CLAUDE_INTERNAL_CONTEXT_MAX / contextWindowLimit;
4015
+ };
4016
+ if (model && !monitorMode) {
4017
+ fetchModelContextWindow(model).then((limit) => {
4018
+ contextWindowLimit = limit;
4019
+ if (isLoggingEnabled()) {
4020
+ log(`[Proxy] Context window limit updated to ${limit} tokens for model ${model}`);
4021
+ log(`[Proxy] Token scaling factor: ${getTokenScaleFactor().toFixed(2)}x (Map ${limit} → ${CLAUDE_INTERNAL_CONTEXT_MAX})`);
4022
+ }
4023
+ }).catch((err) => {
4024
+ log(`[Proxy] Failed to fetch context window limit: ${err}`);
4025
+ });
4026
+ }
3574
4027
  const app = new Hono2;
3575
4028
  app.use("*", cors());
3576
4029
  app.get("/", (c) => {
@@ -3641,8 +4094,9 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3641
4094
  return acc + Math.ceil(content.length / 4);
3642
4095
  }, 0) : 0;
3643
4096
  const totalTokens = systemTokens + messageTokens;
4097
+ const scaleFactor = getTokenScaleFactor();
3644
4098
  return c.json({
3645
- input_tokens: totalTokens
4099
+ input_tokens: Math.ceil(totalTokens * scaleFactor)
3646
4100
  });
3647
4101
  } catch (error) {
3648
4102
  log(`[Proxy] Token counting error: ${error}`);
@@ -3777,6 +4231,14 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3777
4231
  });
3778
4232
  const { claudeRequest, droppedParams } = transformOpenAIToClaude(claudePayload);
3779
4233
  const messages = [];
4234
+ const adapterManager = new AdapterManager(model || "");
4235
+ const adapter = adapterManager.getAdapter();
4236
+ if (typeof adapter.reset === "function") {
4237
+ adapter.reset();
4238
+ }
4239
+ if (isLoggingEnabled()) {
4240
+ log(`[Proxy] Using adapter: ${adapter.getName()}`);
4241
+ }
3780
4242
  if (claudeRequest.system) {
3781
4243
  let systemContent;
3782
4244
  if (typeof claudeRequest.system === "string") {
@@ -3806,37 +4268,40 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3806
4268
  for (const msg of claudeRequest.messages) {
3807
4269
  if (msg.role === "user") {
3808
4270
  if (Array.isArray(msg.content)) {
3809
- const textParts = [];
4271
+ const contentParts = [];
3810
4272
  const toolResults = [];
3811
4273
  const seenToolResultIds = new Set;
3812
4274
  for (const block of msg.content) {
3813
4275
  if (block.type === "text") {
3814
- textParts.push(block.text);
4276
+ contentParts.push({ type: "text", text: block.text });
4277
+ } else if (block.type === "image") {
4278
+ contentParts.push({
4279
+ type: "image_url",
4280
+ image_url: {
4281
+ url: `data:${block.source.media_type};base64,${block.source.data}`
4282
+ }
4283
+ });
3815
4284
  } else if (block.type === "tool_result") {
3816
4285
  if (seenToolResultIds.has(block.tool_use_id)) {
3817
4286
  log(`[Proxy] Skipping duplicate tool_result with tool_use_id: ${block.tool_use_id}`);
3818
4287
  continue;
3819
4288
  }
3820
4289
  seenToolResultIds.add(block.tool_use_id);
3821
- toolResults.push({
4290
+ const toolResultMsg = {
3822
4291
  role: "tool",
3823
4292
  content: typeof block.content === "string" ? block.content : JSON.stringify(block.content),
3824
4293
  tool_call_id: block.tool_use_id
3825
- });
4294
+ };
4295
+ toolResults.push(toolResultMsg);
3826
4296
  }
3827
4297
  }
3828
4298
  if (toolResults.length > 0) {
3829
4299
  messages.push(...toolResults);
3830
- if (textParts.length > 0) {
3831
- messages.push({
3832
- role: "user",
3833
- content: textParts.join(" ")
3834
- });
3835
- }
3836
- } else if (textParts.length > 0) {
4300
+ }
4301
+ if (contentParts.length > 0) {
3837
4302
  messages.push({
3838
4303
  role: "user",
3839
- content: textParts.join(" ")
4304
+ content: contentParts
3840
4305
  });
3841
4306
  }
3842
4307
  } else if (typeof msg.content === "string") {
@@ -3918,8 +4383,27 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
3918
4383
  model,
3919
4384
  messages,
3920
4385
  temperature: claudeRequest.temperature !== undefined ? claudeRequest.temperature : 1,
3921
- stream: true
4386
+ stream: true,
4387
+ include_reasoning: true
3922
4388
  };
4389
+ if (claudeRequest.thinking) {
4390
+ const { budget_tokens } = claudeRequest.thinking;
4391
+ log(`[Proxy] Thinking mode requested with budget: ${budget_tokens} tokens`);
4392
+ openrouterPayload.thinking = claudeRequest.thinking;
4393
+ let effort = "medium";
4394
+ if (budget_tokens < 16000)
4395
+ effort = "low";
4396
+ else if (budget_tokens >= 32000)
4397
+ effort = "high";
4398
+ if (model && (model.includes("o1") || model.includes("o3") || model.startsWith("openai/"))) {
4399
+ openrouterPayload.reasoning_effort = effort;
4400
+ log(`[Proxy] Mapped budget ${budget_tokens} -> reasoning_effort: ${effort}`);
4401
+ }
4402
+ }
4403
+ if (!openrouterPayload.stream_options) {
4404
+ openrouterPayload.stream_options = {};
4405
+ }
4406
+ openrouterPayload.stream_options.include_usage = true;
3923
4407
  if (claudeRequest.max_tokens) {
3924
4408
  openrouterPayload.max_tokens = claudeRequest.max_tokens;
3925
4409
  }
@@ -3938,6 +4422,12 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
3938
4422
  maxTokens: openrouterPayload.max_tokens,
3939
4423
  stream: openrouterPayload.stream
3940
4424
  });
4425
+ await middlewareManager.beforeRequest({
4426
+ modelId: model || "",
4427
+ messages,
4428
+ tools,
4429
+ stream: openrouterPayload.stream
4430
+ });
3941
4431
  const headers = {
3942
4432
  "Content-Type": "application/json",
3943
4433
  Authorization: `Bearer ${openrouterApiKey}`,
@@ -3975,6 +4465,10 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
3975
4465
  if (data.error) {
3976
4466
  return c.json({ error: data.error.message || "Unknown error" }, 500);
3977
4467
  }
4468
+ await middlewareManager.afterResponse({
4469
+ modelId: model || "",
4470
+ response: data
4471
+ });
3978
4472
  const choice = data.choices[0];
3979
4473
  const openaiMessage = choice.message;
3980
4474
  const content = [];
@@ -4002,8 +4496,8 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
4002
4496
  stop_reason: mapStopReason(choice.finish_reason),
4003
4497
  stop_sequence: null,
4004
4498
  usage: {
4005
- input_tokens: data.usage?.prompt_tokens || 0,
4006
- output_tokens: data.usage?.completion_tokens || 0
4499
+ input_tokens: Math.ceil((data.usage?.prompt_tokens || 0) * getTokenScaleFactor()),
4500
+ output_tokens: Math.ceil((data.usage?.completion_tokens || 0) * getTokenScaleFactor())
4007
4501
  }
4008
4502
  };
4009
4503
  log("[Proxy] Translated to Claude format:");
@@ -4113,7 +4607,7 @@ data: ${JSON.stringify(data)}
4113
4607
  stop_sequence: null
4114
4608
  },
4115
4609
  usage: {
4116
- output_tokens: outputTokens
4610
+ output_tokens: Math.ceil(outputTokens * getTokenScaleFactor())
4117
4611
  }
4118
4612
  });
4119
4613
  sendSSE("message_stop", {
@@ -4137,9 +4631,13 @@ data: ${JSON.stringify(data)}
4137
4631
  clearInterval(pingInterval);
4138
4632
  }
4139
4633
  log(`[Proxy] Stream closed (reason: ${reason})`);
4634
+ middlewareManager.afterStreamComplete(model || "", streamMetadata).catch((error) => {
4635
+ log(`[Middleware] Error in afterStreamComplete: ${error}`);
4636
+ });
4140
4637
  }
4141
4638
  };
4142
4639
  let usage = null;
4640
+ const streamMetadata = new Map;
4143
4641
  let currentBlockIndex = 0;
4144
4642
  let textBlockIndex = -1;
4145
4643
  let textBlockStarted = false;
@@ -4149,13 +4647,23 @@ data: ${JSON.stringify(data)}
4149
4647
  let streamFinalized = false;
4150
4648
  let cumulativeInputTokens = 0;
4151
4649
  let cumulativeOutputTokens = 0;
4650
+ let currentRequestCost = 0;
4152
4651
  const tokenFilePath = `/tmp/claudish-tokens-${port}.json`;
4153
4652
  const writeTokenFile = () => {
4154
4653
  try {
4654
+ const totalTokens = cumulativeInputTokens + cumulativeOutputTokens;
4655
+ let contextLeftPercent = 100;
4656
+ if (contextWindowLimit > 0) {
4657
+ contextLeftPercent = Math.round((contextWindowLimit - totalTokens) / contextWindowLimit * 100);
4658
+ contextLeftPercent = Math.max(0, Math.min(100, contextLeftPercent));
4659
+ }
4155
4660
  const tokenData = {
4156
4661
  input_tokens: cumulativeInputTokens,
4157
4662
  output_tokens: cumulativeOutputTokens,
4158
- total_tokens: cumulativeInputTokens + cumulativeOutputTokens,
4663
+ total_tokens: totalTokens,
4664
+ total_cost: sessionTotalCost,
4665
+ context_window: contextWindowLimit,
4666
+ context_left_percent: contextLeftPercent,
4159
4667
  updated_at: Date.now()
4160
4668
  };
4161
4669
  writeFileSync5(tokenFilePath, JSON.stringify(tokenData), "utf-8");
@@ -4167,19 +4675,14 @@ data: ${JSON.stringify(data)}
4167
4675
  };
4168
4676
  const toolCalls = new Map;
4169
4677
  const toolCallIds = new Set;
4170
- const adapterManager = new AdapterManager(model || "");
4171
- const adapter = adapterManager.getAdapter();
4172
- if (typeof adapter.reset === "function") {
4173
- adapter.reset();
4174
- }
4175
4678
  let accumulatedTextLength = 0;
4176
- log(`[Proxy] Using adapter: ${adapter.getName()}`);
4177
4679
  const hasToolResults = claudeRequest.messages?.some((msg) => Array.isArray(msg.content) && msg.content.some((block) => block.type === "tool_result"));
4178
4680
  const isFirstTurn = !hasToolResults;
4179
4681
  const estimateTokens = (text) => Math.ceil(text.length / 4);
4180
4682
  const requestJson = JSON.stringify(claudeRequest);
4181
4683
  const estimatedInputTokens = estimateTokens(requestJson);
4182
4684
  const estimatedCacheTokens = isFirstTurn ? Math.floor(estimatedInputTokens * 0.8) : 0;
4685
+ const scaleFactor = getTokenScaleFactor();
4183
4686
  sendSSE("message_start", {
4184
4687
  type: "message_start",
4185
4688
  message: {
@@ -4191,23 +4694,13 @@ data: ${JSON.stringify(data)}
4191
4694
  stop_reason: null,
4192
4695
  stop_sequence: null,
4193
4696
  usage: {
4194
- input_tokens: estimatedInputTokens - estimatedCacheTokens,
4195
- cache_creation_input_tokens: isFirstTurn ? estimatedCacheTokens : 0,
4196
- cache_read_input_tokens: isFirstTurn ? 0 : estimatedCacheTokens,
4697
+ input_tokens: Math.ceil((estimatedInputTokens - estimatedCacheTokens) * scaleFactor),
4698
+ cache_creation_input_tokens: isFirstTurn ? Math.ceil(estimatedCacheTokens * scaleFactor) : 0,
4699
+ cache_read_input_tokens: isFirstTurn ? 0 : Math.ceil(estimatedCacheTokens * scaleFactor),
4197
4700
  output_tokens: 1
4198
4701
  }
4199
4702
  }
4200
4703
  });
4201
- textBlockIndex = currentBlockIndex++;
4202
- sendSSE("content_block_start", {
4203
- type: "content_block_start",
4204
- index: textBlockIndex,
4205
- content_block: {
4206
- type: "text",
4207
- text: ""
4208
- }
4209
- });
4210
- textBlockStarted = true;
4211
4704
  sendSSE("ping", {
4212
4705
  type: "ping"
4213
4706
  });
@@ -4264,9 +4757,35 @@ data: ${JSON.stringify(data)}
4264
4757
  finishReason: chunk.choices?.[0]?.finish_reason,
4265
4758
  hasUsage: !!chunk.usage
4266
4759
  });
4760
+ const delta2 = chunk.choices?.[0]?.delta;
4761
+ if (delta2?.tool_calls) {
4762
+ for (const toolCall of delta2.tool_calls) {
4763
+ if (toolCall.extra_content) {
4764
+ logStructured("DEBUG: Found extra_content in tool_call", {
4765
+ tool_call_id: toolCall.id,
4766
+ has_extra_content: true,
4767
+ extra_content_keys: Object.keys(toolCall.extra_content),
4768
+ has_google: !!toolCall.extra_content.google
4769
+ });
4770
+ }
4771
+ }
4772
+ }
4773
+ if (delta2?.tool_calls && dataStr.includes("tool_calls")) {
4774
+ logStructured("DEBUG: Raw chunk JSON (tool_calls)", {
4775
+ has_extra_content_in_raw: dataStr.includes("extra_content"),
4776
+ raw_snippet: dataStr.substring(0, 500)
4777
+ });
4778
+ }
4267
4779
  }
4268
4780
  if (chunk.usage) {
4269
4781
  usage = chunk.usage;
4782
+ if (typeof usage.cost === "number") {
4783
+ const costDiff = usage.cost - currentRequestCost;
4784
+ if (costDiff > 0) {
4785
+ sessionTotalCost += costDiff;
4786
+ currentRequestCost = usage.cost;
4787
+ }
4788
+ }
4270
4789
  if (usage.prompt_tokens) {
4271
4790
  cumulativeInputTokens = usage.prompt_tokens;
4272
4791
  }
@@ -4277,6 +4796,14 @@ data: ${JSON.stringify(data)}
4277
4796
  }
4278
4797
  const choice = chunk.choices?.[0];
4279
4798
  const delta = choice?.delta;
4799
+ if (delta) {
4800
+ await middlewareManager.afterStreamChunk({
4801
+ modelId: model || "",
4802
+ chunk,
4803
+ delta,
4804
+ metadata: streamMetadata
4805
+ });
4806
+ }
4280
4807
  const hasReasoning = !!delta?.reasoning;
4281
4808
  const hasContent = !!delta?.content;
4282
4809
  const reasoningText = delta?.reasoning || "";
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claudish",
3
- "version": "2.2.1",
3
+ "version": "2.4.0",
4
4
  "description": "CLI tool to run Claude Code with any OpenRouter model (Grok, GPT-5, MiniMax, etc.) via local Anthropic API-compatible proxy",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",
@@ -1,26 +1,26 @@
1
1
  {
2
2
  "version": "2.1.0",
3
- "lastUpdated": "2025-11-20",
3
+ "lastUpdated": "2025-11-24",
4
4
  "source": "https://openrouter.ai/models?categories=programming&fmt=cards&order=top-weekly",
5
5
  "models": [
6
6
  {
7
- "id": "x-ai/grok-code-fast-1",
8
- "name": "xAI: Grok Code Fast 1",
9
- "description": "Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality work flows.",
10
- "provider": "X-ai",
11
- "category": "reasoning",
7
+ "id": "google/gemini-3-pro-preview",
8
+ "name": "Google: Gemini 3 Pro Preview",
9
+ "description": "Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses.\n\nBuilt for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.",
10
+ "provider": "Google",
11
+ "category": "vision",
12
12
  "priority": 1,
13
13
  "pricing": {
14
- "input": "$0.20/1M",
15
- "output": "$1.50/1M",
16
- "average": "$0.85/1M"
14
+ "input": "$2.00/1M",
15
+ "output": "$12.00/1M",
16
+ "average": "$7.00/1M"
17
17
  },
18
- "context": "256K",
19
- "maxOutputTokens": 10000,
20
- "modality": "text->text",
18
+ "context": "1048K",
19
+ "maxOutputTokens": 65536,
20
+ "modality": "text+image->text",
21
21
  "supportsTools": true,
22
22
  "supportsReasoning": true,
23
- "supportsVision": false,
23
+ "supportsVision": true,
24
24
  "isModerated": false,
25
25
  "recommended": true
26
26
  },
@@ -46,19 +46,19 @@
46
46
  "recommended": true
47
47
  },
48
48
  {
49
- "id": "moonshotai/kimi-k2-thinking",
50
- "name": "MoonshotAI: Kimi K2 Thinking",
51
- "description": "Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256 k-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.\n\nIt sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks.",
52
- "provider": "Moonshotai",
49
+ "id": "x-ai/grok-code-fast-1",
50
+ "name": "xAI: Grok Code Fast 1",
51
+ "description": "Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality work flows.",
52
+ "provider": "X-ai",
53
53
  "category": "reasoning",
54
54
  "priority": 3,
55
55
  "pricing": {
56
- "input": "$0.50/1M",
57
- "output": "$2.50/1M",
58
- "average": "$1.50/1M"
56
+ "input": "$0.20/1M",
57
+ "output": "$1.50/1M",
58
+ "average": "$0.85/1M"
59
59
  },
60
- "context": "262K",
61
- "maxOutputTokens": 262144,
60
+ "context": "256K",
61
+ "maxOutputTokens": 10000,
62
62
  "modality": "text->text",
63
63
  "supportsTools": true,
64
64
  "supportsReasoning": true,
@@ -66,38 +66,17 @@
66
66
  "isModerated": false,
67
67
  "recommended": true
68
68
  },
69
- {
70
- "id": "google/gemini-2.5-flash",
71
- "name": "Google: Gemini 2.5 Flash",
72
- "description": "Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).",
73
- "provider": "Google",
74
- "category": "reasoning",
75
- "priority": 4,
76
- "pricing": {
77
- "input": "$0.30/1M",
78
- "output": "$2.50/1M",
79
- "average": "$1.40/1M"
80
- },
81
- "context": "1048K",
82
- "maxOutputTokens": 65535,
83
- "modality": "text+image->text",
84
- "supportsTools": true,
85
- "supportsReasoning": true,
86
- "supportsVision": true,
87
- "isModerated": false,
88
- "recommended": true
89
- },
90
69
  {
91
70
  "id": "minimax/minimax-m2",
92
71
  "name": "MiniMax: MiniMax M2",
93
72
  "description": "MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency.\n\nThe model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors.\n\nBenchmarked by [Artificial Analysis](https://artificialanalysis.ai/models/minimax-m2), MiniMax-M2 ranks among the top open-source models for composite intelligence, spanning mathematics, science, and instruction-following. Its small activation footprint enables fast inference, high concurrency, and improved unit economics, making it well-suited for large-scale agents, developer assistants, and reasoning-driven applications that require responsiveness and cost efficiency.\n\nTo avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks).",
94
73
  "provider": "Minimax",
95
74
  "category": "reasoning",
96
- "priority": 5,
75
+ "priority": 4,
97
76
  "pricing": {
98
- "input": "$0.26/1M",
99
- "output": "$1.02/1M",
100
- "average": "$0.64/1M"
77
+ "input": "$0.24/1M",
78
+ "output": "$0.96/1M",
79
+ "average": "$0.60/1M"
101
80
  },
102
81
  "context": "204K",
103
82
  "maxOutputTokens": 131072,
@@ -114,7 +93,7 @@
114
93
  "description": "Compared with GLM-4.5, this generation brings several key improvements:\n\nLonger context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.\nSuperior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages.\nAdvanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.\nMore capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks.\nRefined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.",
115
94
  "provider": "Z-ai",
116
95
  "category": "reasoning",
117
- "priority": 6,
96
+ "priority": 5,
118
97
  "pricing": {
119
98
  "input": "$0.40/1M",
120
99
  "output": "$1.75/1M",
@@ -129,55 +108,13 @@
129
108
  "isModerated": false,
130
109
  "recommended": true
131
110
  },
132
- {
133
- "id": "openai/gpt-5",
134
- "name": "OpenAI: GPT-5",
135
- "description": "GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like \"think hard about this.\" Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.",
136
- "provider": "Openai",
137
- "category": "reasoning",
138
- "priority": 7,
139
- "pricing": {
140
- "input": "$1.25/1M",
141
- "output": "$10.00/1M",
142
- "average": "$5.63/1M"
143
- },
144
- "context": "400K",
145
- "maxOutputTokens": 128000,
146
- "modality": "text+image->text",
147
- "supportsTools": true,
148
- "supportsReasoning": true,
149
- "supportsVision": true,
150
- "isModerated": true,
151
- "recommended": true
152
- },
153
- {
154
- "id": "google/gemini-3-pro-preview",
155
- "name": "Google: Gemini 3 Pro Preview",
156
- "description": "Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses.\n\nBuilt for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.",
157
- "provider": "Google",
158
- "category": "vision",
159
- "priority": 8,
160
- "pricing": {
161
- "input": "$2.00/1M",
162
- "output": "$12.00/1M",
163
- "average": "$7.00/1M"
164
- },
165
- "context": "1048K",
166
- "maxOutputTokens": 65536,
167
- "modality": "text+image->text",
168
- "supportsTools": true,
169
- "supportsReasoning": true,
170
- "supportsVision": true,
171
- "isModerated": false,
172
- "recommended": true
173
- },
174
111
  {
175
112
  "id": "qwen/qwen3-vl-235b-a22b-instruct",
176
113
  "name": "Qwen: Qwen3 VL 235B A22B Instruct",
177
114
  "description": "Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning.\n\nBeyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.",
178
115
  "provider": "Qwen",
179
116
  "category": "vision",
180
- "priority": 9,
117
+ "priority": 6,
181
118
  "pricing": {
182
119
  "input": "$0.21/1M",
183
120
  "output": "$1.90/1M",
@@ -191,27 +128,6 @@
191
128
  "supportsVision": true,
192
129
  "isModerated": false,
193
130
  "recommended": true
194
- },
195
- {
196
- "id": "openrouter/polaris-alpha",
197
- "name": "Polaris Alpha",
198
- "description": "openrouter/polaris-alpha (metadata pending - not yet available in API)",
199
- "provider": "Openrouter",
200
- "category": "programming",
201
- "priority": 10,
202
- "pricing": {
203
- "input": "N/A",
204
- "output": "N/A",
205
- "average": "N/A"
206
- },
207
- "context": "N/A",
208
- "maxOutputTokens": null,
209
- "modality": "text->text",
210
- "supportsTools": false,
211
- "supportsReasoning": false,
212
- "supportsVision": false,
213
- "isModerated": false,
214
- "recommended": true
215
131
  }
216
132
  ]
217
133
  }