claudish 2.2.1 → 2.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -600,6 +600,24 @@ claudish --model minimax/minimax-m2 "task C"
600
600
 
601
601
  **NEW in v1.1.0**: Claudish now fully supports models with extended thinking/reasoning capabilities (Grok, o1, etc.) with complete Anthropic Messages API protocol compliance.
602
602
 
603
+ ### Thinking Translation Model (v1.5.0)
604
+
605
+ Claudish includes a **Thinking Translation Model** that translates Claude Code's native thinking budget into the reasoning parameters each major AI provider expects.
606
+
607
+ When you set a thinking budget in Claude (e.g., `budget: 16000`), Claudish automatically translates it:
608
+
609
+ | Provider | Model | Translation Logic |
610
+ | :--- | :--- | :--- |
611
+ | **OpenAI** | o1, o3 | Maps budget to `reasoning_effort` (minimal/low/medium/high) |
612
+ | **Google** | Gemini 3 | Maps to `thinking_level` (low/high) |
613
+ | **Google** | Gemini 2.x | Passes exact `thinking_budget` (capped at 24k) |
614
+ | **xAI** | Grok 3 Mini | Maps to `reasoning_effort` (low/high) |
615
+ | **Qwen** | Qwen 2.5 | Enables `enable_thinking` + exact budget |
616
+ | **MiniMax** | M2 | Enables `reasoning_split` (interleaved thinking) |
617
+ | **DeepSeek** | R1 | Automatically manages reasoning (params stripped for safety) |
618
+
619
+ This means you can use standard Claude Code thinking controls with **ANY** supported model, without worrying about provider-specific API details.
620
+
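If you want to see what this translation looks like concretely, here is a minimal TypeScript sketch. Only the OpenAI thresholds (<16k → `low`, ≥32k → `high`) mirror the bundled proxy code; the function name `translateThinkingBudget` and the other branches are illustrative assumptions based on the table above, not excerpts from Claudish itself.

```typescript
// Sketch only: maps an Anthropic-style thinking budget to provider params.
type ProviderParams = Record<string, unknown>;

function translateThinkingBudget(modelId: string, budgetTokens: number): ProviderParams {
  // OpenAI o1/o3: collapse the token budget into a coarse reasoning_effort level
  // (thresholds match the bundled proxy: <16k -> low, >=32k -> high, else medium).
  if (modelId.startsWith("openai/") || modelId.includes("o1") || modelId.includes("o3")) {
    const effort = budgetTokens < 16_000 ? "low" : budgetTokens >= 32_000 ? "high" : "medium";
    return { reasoning_effort: effort };
  }
  // Gemini 2.x: pass the exact budget, capped at 24k (per the table above).
  if (modelId.includes("gemini-2")) {
    return { thinking_budget: Math.min(budgetTokens, 24_000) };
  }
  // Qwen: enable thinking and forward the exact budget.
  if (modelId.startsWith("qwen/")) {
    return { enable_thinking: true, thinking_budget: budgetTokens };
  }
  // DeepSeek R1 manages its own reasoning, so thinking params are stripped.
  if (modelId.includes("deepseek-r1")) {
    return {};
  }
  // Default: forward the Anthropic-style thinking block unchanged.
  return { thinking: { type: "enabled", budget_tokens: budgetTokens } };
}

// Example: a 16k budget on an OpenAI reasoning model becomes "medium" effort.
console.log(translateThinkingBudget("openai/o3", 16_000)); // { reasoning_effort: "medium" }
```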
603
621
  ### What is Extended Thinking?
604
622
 
605
623
  Some AI models (like Grok and OpenAI's o1) can show their internal reasoning process before providing the final answer. This "thinking" content helps you understand how the model arrived at its conclusion.
@@ -678,6 +696,61 @@ For complete protocol documentation, see:
678
696
  - [COMPREHENSIVE_UX_ISSUE_ANALYSIS.md](./COMPREHENSIVE_UX_ISSUE_ANALYSIS.md) - Technical analysis
679
697
  - [THINKING_BLOCKS_IMPLEMENTATION.md](./THINKING_BLOCKS_IMPLEMENTATION.md) - Implementation summary
680
698
 
699
+ ## Dynamic Reasoning Support (NEW in v1.4.0)
700
+
701
+ **Claudish now intelligently adapts to ANY reasoning model!**
702
+
703
+ No more hardcoded lists or manual flags. Claudish dynamically queries OpenRouter metadata to enable thinking capabilities for any model that supports them.
704
+
705
+ ### 🧠 Dynamic Thinking Features
706
+
707
+ 1. **Auto-Detection**:
708
+ - Automatically checks model capabilities at startup (see the sketch after this list)
709
+ - Enables Extended Thinking UI *only* when supported
710
+ - Future-proof: Works instantly with new models (e.g., `deepseek-r1` or `minimax-m2`)
711
+
712
+ 2. **Smart Parameter Mapping**:
713
+ - **Claude**: Passes token budget directly (e.g., 16k tokens)
714
+ - **OpenAI (o1/o3)**: Translates budget to `reasoning_effort`
715
+ - "ultrathink" (≥32k) → `high`
716
+ - "think hard" (16k-32k) → `medium`
717
+ - "think" (<16k) → `low`
718
+ - **Gemini & Grok**: Preserves thought signatures and XML traces automatically
719
+
720
+ 3. **Universal Compatibility**:
721
+ - Use "ultrathink" or "think hard" prompts with ANY supported model
722
+ - Claudish handles the translation layer for you
723
+
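As a rough illustration of the auto-detection step referenced in the list above, here is a TypeScript sketch. The OpenRouter `/api/v1/models` endpoint and the `context_length` field appear in the bundled code; the `supported_parameters` field and the helper name are assumptions made for this example.

```typescript
// Sketch only: decide whether to enable the Extended Thinking UI for a model.
interface OpenRouterModel {
  id: string;
  context_length?: number;
  supported_parameters?: string[]; // assumed field name, not taken from the bundle
}

async function modelSupportsReasoning(modelId: string): Promise<boolean> {
  const res = await fetch("https://openrouter.ai/api/v1/models");
  if (!res.ok) return false; // fail closed: no thinking UI if metadata is unavailable
  const { data } = (await res.json()) as { data: OpenRouterModel[] };
  const model = data.find((m) => m.id === modelId);
  return model?.supported_parameters?.includes("reasoning") ?? false;
}

// Usage: only wire up thinking blocks when the metadata says the model can reason.
// const thinkingEnabled = await modelSupportsReasoning("deepseek/deepseek-r1");
```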
724
+ ## Context Scaling & Auto-Compaction
725
+
726
+ **NEW in v1.2.0**: Claudish now intelligently manages token counting to support ANY context window size (from 128k to 2M+) while preserving Claude Code's native auto-compaction behavior.
727
+
728
+ ### The Challenge
729
+
730
+ Claude Code assumes a fixed context window (typically 200k tokens for Sonnet):
731
+ - **Small Models (e.g., Grok 128k)**: Claude may overrun the context window and crash.
732
+ - **Massive Models (e.g., Gemini 2M)**: Claude would compact far too early (at ~10% of the real window), wasting the model's potential.
733
+
734
+ ### The Solution: Token Scaling
735
+
736
+ Claudish implements a "Dual-Accounting" system (a code sketch follows this list):
737
+
738
+ 1. **Internal Scaling (For Claude):**
739
+ - We fetch the *real* context limit from OpenRouter (e.g., 1M tokens).
740
+ - We scale the token usage reported to Claude so that the real window (e.g., 1M tokens) maps onto the 200k window Claude expects.
741
+ - **Result:** Auto-compaction triggers at the correct *percentage* of usage (e.g., 90% full), regardless of the actual limit.
742
+
743
+ 2. **Accurate Reporting (For You):**
744
+ - The status line displays the **real (unscaled) usage** and the **real context %**.
745
+ - You see the real costs and limits, while Claude remains blissfully unaware and stable.
746
+
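The arithmetic behind the dual accounting is small enough to show inline. The constants and formulas below mirror the bundled proxy code; the helper names are illustrative.

```typescript
// Sketch of the dual accounting: scaled numbers for Claude, real numbers for you.
const CLAUDE_INTERNAL_CONTEXT_MAX = 200_000; // the window Claude Code assumes it has

function scaledForClaude(realTokens: number, realContextLimit: number): number {
  // Usage reported to Claude is rescaled so 100% of the real window
  // lands on 100% of Claude's assumed 200k window.
  const scaleFactor = CLAUDE_INTERNAL_CONTEXT_MAX / realContextLimit;
  return Math.ceil(realTokens * scaleFactor);
}

function realContextLeftPercent(realTokens: number, realContextLimit: number): number {
  // What the status line shows: unscaled usage against the real limit.
  const pct = Math.round(((realContextLimit - realTokens) / realContextLimit) * 100);
  return Math.max(0, Math.min(100, pct));
}

// Example: a 1M-token Gemini window with 500k tokens used.
// Claude sees 100k of its assumed 200k (50% used) and compacts at the right
// percentage, while the status line reports the real 50% remaining.
console.log(scaledForClaude(500_000, 1_000_000));        // 100000
console.log(realContextLeftPercent(500_000, 1_000_000)); // 50
```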
747
+ **Benefits:**
748
+ - ✅ **Works with ANY model** size (128k, 1M, 2M, etc.)
749
+ - ✅ **Unlocks massive context** windows (Claude Code can use up to 10x more context with Gemini!)
750
+ - ✅ **Prevents crashes** on smaller models (Grok)
751
+ - ✅ **Native behavior** (compaction just works)
752
+
753
+
681
754
  ## Development
682
755
 
683
756
  ### Project Structure
package/dist/index.js CHANGED
@@ -141,20 +141,11 @@ function createTempSettingsFile(modelDisplay, port) {
141
141
  const DIM = "\\033[2m";
142
142
  const RESET = "\\033[0m";
143
143
  const BOLD = "\\033[1m";
144
- const MODEL_CONTEXT = {
145
- "x-ai/grok-code-fast-1": 256000,
146
- "openai/gpt-5-codex": 400000,
147
- "minimax/minimax-m2": 204800,
148
- "z-ai/glm-4.6": 200000,
149
- "qwen/qwen3-vl-235b-a22b-instruct": 256000,
150
- "anthropic/claude-sonnet-4.5": 200000
151
- };
152
- const maxTokens = MODEL_CONTEXT[modelDisplay] || 1e5;
153
144
  const tokenFilePath = `/tmp/claudish-tokens-${port}.json`;
154
145
  const settings = {
155
146
  statusLine: {
156
147
  type: "command",
157
- command: `JSON=$(cat) && DIR=$(basename "$(pwd)") && [ \${#DIR} -gt 15 ] && DIR="\${DIR:0:12}..." || true && COST=$(echo "$JSON" | grep -o '"total_cost_usd":[0-9.]*' | cut -d: -f2) && [ -z "$COST" ] && COST="0" || true && if [ -f "${tokenFilePath}" ]; then TOKENS=$(cat "${tokenFilePath}" 2>/dev/null) && INPUT=$(echo "$TOKENS" | grep -o '"input_tokens":[0-9]*' | grep -o '[0-9]*') && OUTPUT=$(echo "$TOKENS" | grep -o '"output_tokens":[0-9]*' | grep -o '[0-9]*') && TOTAL=$((INPUT + OUTPUT)) && CTX=$(echo "scale=0; (${maxTokens} - $TOTAL) * 100 / ${maxTokens}" | bc 2>/dev/null); else INPUT=0 && OUTPUT=0 && CTX=100; fi && [ -z "$CTX" ] && CTX="100" || true && printf "${CYAN}${BOLD}%s${RESET} ${DIM}•${RESET} ${YELLOW}%s${RESET} ${DIM}•${RESET} ${GREEN}\\$%.3f${RESET} ${DIM}•${RESET} ${MAGENTA}%s%%${RESET}\\n" "$DIR" "$CLAUDISH_ACTIVE_MODEL_NAME" "$COST" "$CTX"`,
148
+ command: `JSON=$(cat) && DIR=$(basename "$(pwd)") && [ \${#DIR} -gt 15 ] && DIR="\${DIR:0:12}..." || true && CTX=100 && COST="0" && if [ -f "${tokenFilePath}" ]; then TOKENS=$(cat "${tokenFilePath}" 2>/dev/null) && REAL_COST=$(echo "$TOKENS" | grep -o '"total_cost":[0-9.]*' | cut -d: -f2) && REAL_CTX=$(echo "$TOKENS" | grep -o '"context_left_percent":[0-9]*' | grep -o '[0-9]*') && if [ ! -z "$REAL_COST" ]; then COST="$REAL_COST"; else COST=$(echo "$JSON" | grep -o '"total_cost_usd":[0-9.]*' | cut -d: -f2); fi && if [ ! -z "$REAL_CTX" ]; then CTX="$REAL_CTX"; fi; else COST=$(echo "$JSON" | grep -o '"total_cost_usd":[0-9.]*' | cut -d: -f2); fi && [ -z "$COST" ] && COST="0" || true && printf "${CYAN}${BOLD}%s${RESET} ${DIM}•${RESET} ${YELLOW}%s${RESET} ${DIM}•${RESET} ${GREEN}\\$%.3f${RESET} ${DIM}•${RESET} ${MAGENTA}%s%%${RESET}\\n" "$DIR" "$CLAUDISH_ACTIVE_MODEL_NAME" "$COST" "$CTX"`,
158
149
  padding: 0
159
150
  }
160
151
  };
@@ -336,11 +327,90 @@ function getAvailableModels() {
336
327
  _cachedModelIds = [...OPENROUTER_MODELS2];
337
328
  return [...OPENROUTER_MODELS2];
338
329
  }
330
+ var _cachedOpenRouterModels = null;
331
+ async function fetchModelContextWindow(modelId) {
332
+ if (_cachedOpenRouterModels) {
333
+ const model = _cachedOpenRouterModels.find((m) => m.id === modelId);
334
+ if (model) {
335
+ return model.context_length || model.top_provider?.context_length || 128000;
336
+ }
337
+ }
338
+ try {
339
+ const response = await fetch("https://openrouter.ai/api/v1/models");
340
+ if (response.ok) {
341
+ const data = await response.json();
342
+ _cachedOpenRouterModels = data.data;
343
+ const model = _cachedOpenRouterModels?.find((m) => m.id === modelId);
344
+ if (model) {
345
+ return model.context_length || model.top_provider?.context_length || 128000;
346
+ }
347
+ }
348
+ } catch (error) {}
349
+ try {
350
+ const modelMetadata = loadModelInfo();
351
+ } catch (e) {}
352
+ const jsonPath = join2(__dirname2, "../recommended-models.json");
353
+ if (existsSync(jsonPath)) {
354
+ try {
355
+ const jsonContent = readFileSync(jsonPath, "utf-8");
356
+ const data = JSON.parse(jsonContent);
357
+ const model = data.models.find((m) => m.id === modelId);
358
+ if (model && model.context) {
359
+ const ctxStr = model.context.toUpperCase();
360
+ if (ctxStr.includes("K"))
361
+ return parseFloat(ctxStr.replace("K", "")) * 1024;
362
+ if (ctxStr.includes("M"))
363
+ return parseFloat(ctxStr.replace("M", "")) * 1e6;
364
+ const val = parseInt(ctxStr);
365
+ if (!isNaN(val))
366
+ return val;
367
+ }
368
+ } catch (e) {}
369
+ }
370
+ return 200000;
371
+ }
339
372
 
340
373
  // src/cli.ts
341
374
  import { readFileSync as readFileSync2, writeFileSync as writeFileSync3, existsSync as existsSync2, mkdirSync, copyFileSync } from "node:fs";
342
375
  import { fileURLToPath as fileURLToPath2 } from "node:url";
343
376
  import { dirname as dirname2, join as join3 } from "node:path";
377
+
378
+ // src/utils.ts
379
+ function fuzzyScore(text, query) {
380
+ if (!text || !query)
381
+ return 0;
382
+ const t = text.toLowerCase();
383
+ const q = query.toLowerCase();
384
+ if (t === q)
385
+ return 1;
386
+ if (t.startsWith(q))
387
+ return 0.9;
388
+ if (t.includes(` ${q}`) || t.includes(`-${q}`) || t.includes(`/${q}`))
389
+ return 0.8;
390
+ if (t.includes(q))
391
+ return 0.6;
392
+ let score = 0;
393
+ let tIdx = 0;
394
+ let qIdx = 0;
395
+ let consecutive = 0;
396
+ while (tIdx < t.length && qIdx < q.length) {
397
+ if (t[tIdx] === q[qIdx]) {
398
+ score += 1 + consecutive * 0.5;
399
+ consecutive++;
400
+ qIdx++;
401
+ } else {
402
+ consecutive = 0;
403
+ }
404
+ tIdx++;
405
+ }
406
+ if (qIdx === q.length) {
407
+ const compactness = q.length / (tIdx + 1);
408
+ return 0.1 + 0.4 * compactness * (score / (q.length * 2));
409
+ }
410
+ return 0;
411
+ }
412
+
413
+ // src/cli.ts
344
414
  var __filename3 = fileURLToPath2(import.meta.url);
345
415
  var __dirname3 = dirname2(__filename3);
346
416
  var packageJson = JSON.parse(readFileSync2(join3(__dirname3, "../package.json"), "utf-8"));
@@ -452,6 +522,15 @@ async function parseArgs(args) {
452
522
  printAvailableModels();
453
523
  }
454
524
  process.exit(0);
525
+ } else if (arg === "--search" || arg === "-s") {
526
+ const query = args[++i];
527
+ if (!query) {
528
+ console.error("--search requires a search term");
529
+ process.exit(1);
530
+ }
531
+ const forceUpdate = args.includes("--force-update");
532
+ await searchAndPrintModels(query, forceUpdate);
533
+ process.exit(0);
455
534
  } else {
456
535
  config.claudeArgs = args.slice(i);
457
536
  break;
@@ -500,6 +579,75 @@ async function parseArgs(args) {
500
579
  }
501
580
  var CACHE_MAX_AGE_DAYS = 2;
502
581
  var MODELS_JSON_PATH = join3(__dirname3, "../recommended-models.json");
582
+ var ALL_MODELS_JSON_PATH = join3(__dirname3, "../all-models.json");
583
+ async function searchAndPrintModels(query, forceUpdate) {
584
+ let models = [];
585
+ if (!forceUpdate && existsSync2(ALL_MODELS_JSON_PATH)) {
586
+ try {
587
+ const cacheData = JSON.parse(readFileSync2(ALL_MODELS_JSON_PATH, "utf-8"));
588
+ const lastUpdated = new Date(cacheData.lastUpdated);
589
+ const now = new Date;
590
+ const ageInDays = (now.getTime() - lastUpdated.getTime()) / (1000 * 60 * 60 * 24);
591
+ if (ageInDays <= CACHE_MAX_AGE_DAYS) {
592
+ models = cacheData.models;
593
+ }
594
+ } catch (e) {}
595
+ }
596
+ if (models.length === 0) {
597
+ console.error("\uD83D\uDD04 Fetching all models from OpenRouter (this may take a moment)...");
598
+ try {
599
+ const response = await fetch("https://openrouter.ai/api/v1/models");
600
+ if (!response.ok)
601
+ throw new Error(`API returned ${response.status}`);
602
+ const data = await response.json();
603
+ models = data.data;
604
+ writeFileSync3(ALL_MODELS_JSON_PATH, JSON.stringify({
605
+ lastUpdated: new Date().toISOString(),
606
+ models
607
+ }), "utf-8");
608
+ console.error(`✅ Cached ${models.length} models`);
609
+ } catch (error) {
610
+ console.error(`❌ Failed to fetch models: ${error}`);
611
+ process.exit(1);
612
+ }
613
+ }
614
+ const results = models.map((model) => {
615
+ const nameScore = fuzzyScore(model.name || "", query);
616
+ const idScore = fuzzyScore(model.id || "", query);
617
+ const descScore = fuzzyScore(model.description || "", query) * 0.5;
618
+ return {
619
+ model,
620
+ score: Math.max(nameScore, idScore, descScore)
621
+ };
622
+ }).filter((item) => item.score > 0.2).sort((a, b) => b.score - a.score).slice(0, 20);
623
+ if (results.length === 0) {
624
+ console.log(`No models found matching "${query}"`);
625
+ return;
626
+ }
627
+ console.log(`
628
+ Found ${results.length} matching models:
629
+ `);
630
+ console.log(" Model Provider Pricing Context Score");
631
+ console.log(" " + "─".repeat(80));
632
+ for (const { model, score } of results) {
633
+ const modelId = model.id.length > 30 ? model.id.substring(0, 27) + "..." : model.id;
634
+ const modelIdPadded = modelId.padEnd(30);
635
+ const providerName = model.id.split("/")[0];
636
+ const provider = providerName.length > 10 ? providerName.substring(0, 7) + "..." : providerName;
637
+ const providerPadded = provider.padEnd(10);
638
+ const promptPrice = parseFloat(model.pricing?.prompt || "0") * 1e6;
639
+ const completionPrice = parseFloat(model.pricing?.completion || "0") * 1e6;
640
+ const avg = (promptPrice + completionPrice) / 2;
641
+ const pricing = avg === 0 ? "FREE" : `$${avg.toFixed(2)}/1M`;
642
+ const pricingPadded = pricing.padEnd(10);
643
+ const contextLen = model.context_length || model.top_provider?.context_length || 0;
644
+ const context = contextLen > 0 ? `${Math.round(contextLen / 1000)}K` : "N/A";
645
+ const contextPadded = context.padEnd(7);
646
+ console.log(` ${modelIdPadded} ${providerPadded} ${pricingPadded} ${contextPadded} ${(score * 100).toFixed(0)}%`);
647
+ }
648
+ console.log("");
649
+ console.log("Use a model: claudish --model <model-id>");
650
+ }
503
651
  function isCacheStale() {
504
652
  if (!existsSync2(MODELS_JSON_PATH)) {
505
653
  return true;
@@ -522,6 +670,8 @@ async function updateModelsFromOpenRouter() {
522
670
  console.error("\uD83D\uDD04 Updating model recommendations from OpenRouter...");
523
671
  try {
524
672
  const topWeeklyProgrammingModels = [
673
+ "google/gemini-3-pro-preview",
674
+ "openai/gpt-5.1-codex",
525
675
  "x-ai/grok-code-fast-1",
526
676
  "anthropic/claude-sonnet-4.5",
527
677
  "google/gemini-2.5-flash",
@@ -555,29 +705,7 @@ async function updateModelsFromOpenRouter() {
555
705
  }
556
706
  const model = modelMap.get(modelId);
557
707
  if (!model) {
558
- console.error(`⚠️ Model ${modelId} not found in OpenRouter API (including with limited metadata)`);
559
- recommendations.push({
560
- id: modelId,
561
- name: modelId.split("/")[1].replace(/-/g, " ").replace(/\b\w/g, (l) => l.toUpperCase()),
562
- description: `${modelId} (metadata pending - not yet available in API)`,
563
- provider: provider.charAt(0).toUpperCase() + provider.slice(1),
564
- category: "programming",
565
- priority: recommendations.length + 1,
566
- pricing: {
567
- input: "N/A",
568
- output: "N/A",
569
- average: "N/A"
570
- },
571
- context: "N/A",
572
- maxOutputTokens: null,
573
- modality: "text->text",
574
- supportsTools: false,
575
- supportsReasoning: false,
576
- supportsVision: false,
577
- isModerated: false,
578
- recommended: true
579
- });
580
- providers.add(provider);
708
+ console.error(`⚠️ Model ${modelId} not found in OpenRouter API - skipping`);
581
709
  continue;
582
710
  }
583
711
  const name = model.name || modelId;
@@ -3541,12 +3669,58 @@ class GrokAdapter extends BaseModelAdapter {
3541
3669
  }
3542
3670
  }
3543
3671
 
3672
+ // src/adapters/gemini-adapter.ts
3673
+ class GeminiAdapter extends BaseModelAdapter {
3674
+ thoughtSignatures = new Map;
3675
+ processTextContent(textContent, accumulatedText) {
3676
+ return {
3677
+ cleanedText: textContent,
3678
+ extractedToolCalls: [],
3679
+ wasTransformed: false
3680
+ };
3681
+ }
3682
+ extractThoughtSignaturesFromReasoningDetails(reasoningDetails) {
3683
+ const extracted = new Map;
3684
+ if (!reasoningDetails || !Array.isArray(reasoningDetails)) {
3685
+ return extracted;
3686
+ }
3687
+ for (const detail of reasoningDetails) {
3688
+ if (detail && detail.type === "reasoning.encrypted" && detail.id && detail.data) {
3689
+ this.thoughtSignatures.set(detail.id, detail.data);
3690
+ extracted.set(detail.id, detail.data);
3691
+ }
3692
+ }
3693
+ return extracted;
3694
+ }
3695
+ getThoughtSignature(toolCallId) {
3696
+ return this.thoughtSignatures.get(toolCallId);
3697
+ }
3698
+ hasThoughtSignature(toolCallId) {
3699
+ return this.thoughtSignatures.has(toolCallId);
3700
+ }
3701
+ getAllThoughtSignatures() {
3702
+ return new Map(this.thoughtSignatures);
3703
+ }
3704
+ reset() {
3705
+ this.thoughtSignatures.clear();
3706
+ }
3707
+ shouldHandle(modelId) {
3708
+ return modelId.includes("gemini") || modelId.includes("google/");
3709
+ }
3710
+ getName() {
3711
+ return "GeminiAdapter";
3712
+ }
3713
+ }
3714
+
3544
3715
  // src/adapters/adapter-manager.ts
3545
3716
  class AdapterManager {
3546
3717
  adapters;
3547
3718
  defaultAdapter;
3548
3719
  constructor(modelId) {
3549
- this.adapters = [new GrokAdapter(modelId)];
3720
+ this.adapters = [
3721
+ new GrokAdapter(modelId),
3722
+ new GeminiAdapter(modelId)
3723
+ ];
3550
3724
  this.defaultAdapter = new DefaultAdapter(modelId);
3551
3725
  }
3552
3726
  getAdapter() {
@@ -3562,6 +3736,261 @@ class AdapterManager {
3562
3736
  }
3563
3737
  }
3564
3738
 
3739
+ // src/middleware/manager.ts
3740
+ class MiddlewareManager {
3741
+ middlewares = [];
3742
+ initialized = false;
3743
+ register(middleware) {
3744
+ this.middlewares.push(middleware);
3745
+ if (isLoggingEnabled()) {
3746
+ logStructured("Middleware Registered", {
3747
+ name: middleware.name,
3748
+ total: this.middlewares.length
3749
+ });
3750
+ }
3751
+ }
3752
+ async initialize() {
3753
+ if (this.initialized) {
3754
+ log("[Middleware] Already initialized, skipping");
3755
+ return;
3756
+ }
3757
+ log(`[Middleware] Initializing ${this.middlewares.length} middleware(s)...`);
3758
+ for (const middleware of this.middlewares) {
3759
+ if (middleware.onInit) {
3760
+ try {
3761
+ await middleware.onInit();
3762
+ log(`[Middleware] ${middleware.name} initialized`);
3763
+ } catch (error) {
3764
+ log(`[Middleware] ERROR: ${middleware.name} initialization failed: ${error}`);
3765
+ }
3766
+ }
3767
+ }
3768
+ this.initialized = true;
3769
+ log("[Middleware] Initialization complete");
3770
+ }
3771
+ getActiveMiddlewares(modelId) {
3772
+ return this.middlewares.filter((m) => m.shouldHandle(modelId));
3773
+ }
3774
+ async beforeRequest(context) {
3775
+ const active = this.getActiveMiddlewares(context.modelId);
3776
+ if (active.length === 0) {
3777
+ return;
3778
+ }
3779
+ if (isLoggingEnabled()) {
3780
+ logStructured("Middleware Chain (beforeRequest)", {
3781
+ modelId: context.modelId,
3782
+ middlewares: active.map((m) => m.name),
3783
+ messageCount: context.messages.length
3784
+ });
3785
+ }
3786
+ for (const middleware of active) {
3787
+ try {
3788
+ await middleware.beforeRequest(context);
3789
+ } catch (error) {
3790
+ log(`[Middleware] ERROR in ${middleware.name}.beforeRequest: ${error}`);
3791
+ }
3792
+ }
3793
+ }
3794
+ async afterResponse(context) {
3795
+ const active = this.getActiveMiddlewares(context.modelId);
3796
+ if (active.length === 0) {
3797
+ return;
3798
+ }
3799
+ if (isLoggingEnabled()) {
3800
+ logStructured("Middleware Chain (afterResponse)", {
3801
+ modelId: context.modelId,
3802
+ middlewares: active.map((m) => m.name)
3803
+ });
3804
+ }
3805
+ for (const middleware of active) {
3806
+ if (middleware.afterResponse) {
3807
+ try {
3808
+ await middleware.afterResponse(context);
3809
+ } catch (error) {
3810
+ log(`[Middleware] ERROR in ${middleware.name}.afterResponse: ${error}`);
3811
+ }
3812
+ }
3813
+ }
3814
+ }
3815
+ async afterStreamChunk(context) {
3816
+ const active = this.getActiveMiddlewares(context.modelId);
3817
+ if (active.length === 0) {
3818
+ return;
3819
+ }
3820
+ if (isLoggingEnabled() && !context.metadata.has("_middlewareLogged")) {
3821
+ logStructured("Middleware Chain (afterStreamChunk)", {
3822
+ modelId: context.modelId,
3823
+ middlewares: active.map((m) => m.name)
3824
+ });
3825
+ context.metadata.set("_middlewareLogged", true);
3826
+ }
3827
+ for (const middleware of active) {
3828
+ if (middleware.afterStreamChunk) {
3829
+ try {
3830
+ await middleware.afterStreamChunk(context);
3831
+ } catch (error) {
3832
+ log(`[Middleware] ERROR in ${middleware.name}.afterStreamChunk: ${error}`);
3833
+ }
3834
+ }
3835
+ }
3836
+ }
3837
+ async afterStreamComplete(modelId, metadata) {
3838
+ const active = this.getActiveMiddlewares(modelId);
3839
+ if (active.length === 0) {
3840
+ return;
3841
+ }
3842
+ for (const middleware of active) {
3843
+ if (middleware.afterStreamComplete) {
3844
+ try {
3845
+ await middleware.afterStreamComplete(metadata);
3846
+ } catch (error) {
3847
+ log(`[Middleware] ERROR in ${middleware.name}.afterStreamComplete: ${error}`);
3848
+ }
3849
+ }
3850
+ }
3851
+ }
3852
+ }
3853
+ // src/middleware/gemini-thought-signature.ts
3854
+ class GeminiThoughtSignatureMiddleware {
3855
+ name = "GeminiThoughtSignature";
3856
+ persistentReasoningDetails = new Map;
3857
+ shouldHandle(modelId) {
3858
+ return modelId.includes("gemini") || modelId.includes("google/");
3859
+ }
3860
+ onInit() {
3861
+ log("[Gemini] Thought signature middleware initialized");
3862
+ }
3863
+ beforeRequest(context) {
3864
+ if (this.persistentReasoningDetails.size === 0) {
3865
+ return;
3866
+ }
3867
+ if (isLoggingEnabled()) {
3868
+ logStructured("[Gemini] Injecting reasoning_details", {
3869
+ cacheSize: this.persistentReasoningDetails.size,
3870
+ messageCount: context.messages.length
3871
+ });
3872
+ }
3873
+ let injected = 0;
3874
+ for (const msg of context.messages) {
3875
+ if (msg.role === "assistant" && msg.tool_calls) {
3876
+ for (const [msgId, cached] of this.persistentReasoningDetails.entries()) {
3877
+ const hasMatchingToolCall = msg.tool_calls.some((tc) => cached.tool_call_ids.has(tc.id));
3878
+ if (hasMatchingToolCall) {
3879
+ msg.reasoning_details = cached.reasoning_details;
3880
+ injected++;
3881
+ if (isLoggingEnabled()) {
3882
+ logStructured("[Gemini] Reasoning details added to assistant message", {
3883
+ message_id: msgId,
3884
+ reasoning_blocks: cached.reasoning_details.length,
3885
+ tool_calls: msg.tool_calls.length
3886
+ });
3887
+ }
3888
+ break;
3889
+ }
3890
+ }
3891
+ if (!msg.reasoning_details && isLoggingEnabled()) {
3892
+ log(`[Gemini] WARNING: No reasoning_details found for assistant message with tool_calls`);
3893
+ log(`[Gemini] Tool call IDs: ${msg.tool_calls.map((tc) => tc.id).join(", ")}`);
3894
+ }
3895
+ }
3896
+ }
3897
+ if (isLoggingEnabled() && injected > 0) {
3898
+ logStructured("[Gemini] Signature injection complete", {
3899
+ injected,
3900
+ cacheSize: this.persistentReasoningDetails.size
3901
+ });
3902
+ log("[Gemini] DEBUG: Messages after injection:");
3903
+ for (let i = 0;i < context.messages.length; i++) {
3904
+ const msg = context.messages[i];
3905
+ log(`[Gemini] Message ${i}: role=${msg.role}, has_content=${!!msg.content}, has_tool_calls=${!!msg.tool_calls}, tool_call_id=${msg.tool_call_id || "N/A"}`);
3906
+ if (msg.role === "assistant" && msg.tool_calls) {
3907
+ log(` - Assistant has ${msg.tool_calls.length} tool call(s), content="${msg.content}"`);
3908
+ for (const tc of msg.tool_calls) {
3909
+ log(` * Tool call: ${tc.id}, function=${tc.function?.name}, has extra_content: ${!!tc.extra_content}, has thought_signature: ${!!tc.extra_content?.google?.thought_signature}`);
3910
+ if (tc.extra_content) {
3911
+ log(` extra_content keys: ${Object.keys(tc.extra_content).join(", ")}`);
3912
+ if (tc.extra_content.google) {
3913
+ log(` google keys: ${Object.keys(tc.extra_content.google).join(", ")}`);
3914
+ log(` thought_signature length: ${tc.extra_content.google.thought_signature?.length || 0}`);
3915
+ }
3916
+ }
3917
+ }
3918
+ } else if (msg.role === "tool") {
3919
+ log(` - Tool result: tool_call_id=${msg.tool_call_id}, has extra_content: ${!!msg.extra_content}`);
3920
+ }
3921
+ }
3922
+ }
3923
+ }
3924
+ afterResponse(context) {
3925
+ const response = context.response;
3926
+ const message = response?.choices?.[0]?.message;
3927
+ if (!message) {
3928
+ return;
3929
+ }
3930
+ const reasoningDetails = message.reasoning_details || [];
3931
+ const toolCalls = message.tool_calls || [];
3932
+ if (reasoningDetails.length > 0 && toolCalls.length > 0) {
3933
+ const messageId = `msg_${Date.now()}_${Math.random().toString(36).slice(2)}`;
3934
+ const toolCallIds = new Set(toolCalls.map((tc) => tc.id).filter(Boolean));
3935
+ this.persistentReasoningDetails.set(messageId, {
3936
+ reasoning_details: reasoningDetails,
3937
+ tool_call_ids: toolCallIds
3938
+ });
3939
+ logStructured("[Gemini] Reasoning details saved (non-streaming)", {
3940
+ message_id: messageId,
3941
+ reasoning_blocks: reasoningDetails.length,
3942
+ tool_calls: toolCallIds.size,
3943
+ total_cached_messages: this.persistentReasoningDetails.size
3944
+ });
3945
+ }
3946
+ }
3947
+ afterStreamChunk(context) {
3948
+ const delta = context.delta;
3949
+ if (!delta)
3950
+ return;
3951
+ if (delta.reasoning_details && delta.reasoning_details.length > 0) {
3952
+ if (!context.metadata.has("reasoning_details")) {
3953
+ context.metadata.set("reasoning_details", []);
3954
+ }
3955
+ const accumulated = context.metadata.get("reasoning_details");
3956
+ accumulated.push(...delta.reasoning_details);
3957
+ if (isLoggingEnabled()) {
3958
+ logStructured("[Gemini] Reasoning details accumulated", {
3959
+ chunk_blocks: delta.reasoning_details.length,
3960
+ total_blocks: accumulated.length
3961
+ });
3962
+ }
3963
+ }
3964
+ if (delta.tool_calls) {
3965
+ if (!context.metadata.has("tool_call_ids")) {
3966
+ context.metadata.set("tool_call_ids", new Set);
3967
+ }
3968
+ const toolCallIds = context.metadata.get("tool_call_ids");
3969
+ for (const tc of delta.tool_calls) {
3970
+ if (tc.id) {
3971
+ toolCallIds.add(tc.id);
3972
+ }
3973
+ }
3974
+ }
3975
+ }
3976
+ afterStreamComplete(metadata) {
3977
+ const reasoningDetails = metadata.get("reasoning_details") || [];
3978
+ const toolCallIds = metadata.get("tool_call_ids") || new Set;
3979
+ if (reasoningDetails.length > 0 && toolCallIds.size > 0) {
3980
+ const messageId = `msg_${Date.now()}_${Math.random().toString(36).slice(2)}`;
3981
+ this.persistentReasoningDetails.set(messageId, {
3982
+ reasoning_details: reasoningDetails,
3983
+ tool_call_ids: toolCallIds
3984
+ });
3985
+ logStructured("[Gemini] Streaming complete - reasoning details saved", {
3986
+ message_id: messageId,
3987
+ reasoning_blocks: reasoningDetails.length,
3988
+ tool_calls: toolCallIds.size,
3989
+ total_cached_messages: this.persistentReasoningDetails.size
3990
+ });
3991
+ }
3992
+ }
3993
+ }
3565
3994
  // src/proxy-server.ts
3566
3995
  async function createProxyServer(port, openrouterApiKey, model, monitorMode = false, anthropicApiKey) {
3567
3996
  const OPENROUTER_API_URL2 = "https://openrouter.ai/api/v1/chat/completions";
@@ -3571,6 +4000,30 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3571
4000
  };
3572
4001
  const ANTHROPIC_API_URL = "https://api.anthropic.com/v1/messages";
3573
4002
  const ANTHROPIC_COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens";
4003
+ const middlewareManager = new MiddlewareManager;
4004
+ middlewareManager.register(new GeminiThoughtSignatureMiddleware);
4005
+ middlewareManager.initialize().catch((error) => {
4006
+ log(`[Proxy] Middleware initialization error: ${error}`);
4007
+ });
4008
+ let sessionTotalCost = 0;
4009
+ let contextWindowLimit = 200000;
4010
+ const CLAUDE_INTERNAL_CONTEXT_MAX = 200000;
4011
+ const getTokenScaleFactor = () => {
4012
+ if (contextWindowLimit === 0)
4013
+ return 1;
4014
+ return CLAUDE_INTERNAL_CONTEXT_MAX / contextWindowLimit;
4015
+ };
4016
+ if (model && !monitorMode) {
4017
+ fetchModelContextWindow(model).then((limit) => {
4018
+ contextWindowLimit = limit;
4019
+ if (isLoggingEnabled()) {
4020
+ log(`[Proxy] Context window limit updated to ${limit} tokens for model ${model}`);
4021
+ log(`[Proxy] Token scaling factor: ${getTokenScaleFactor().toFixed(2)}x (Map ${limit} → ${CLAUDE_INTERNAL_CONTEXT_MAX})`);
4022
+ }
4023
+ }).catch((err) => {
4024
+ log(`[Proxy] Failed to fetch context window limit: ${err}`);
4025
+ });
4026
+ }
3574
4027
  const app = new Hono2;
3575
4028
  app.use("*", cors());
3576
4029
  app.get("/", (c) => {
@@ -3641,8 +4094,9 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3641
4094
  return acc + Math.ceil(content.length / 4);
3642
4095
  }, 0) : 0;
3643
4096
  const totalTokens = systemTokens + messageTokens;
4097
+ const scaleFactor = getTokenScaleFactor();
3644
4098
  return c.json({
3645
- input_tokens: totalTokens
4099
+ input_tokens: Math.ceil(totalTokens * scaleFactor)
3646
4100
  });
3647
4101
  } catch (error) {
3648
4102
  log(`[Proxy] Token counting error: ${error}`);
@@ -3777,6 +4231,14 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3777
4231
  });
3778
4232
  const { claudeRequest, droppedParams } = transformOpenAIToClaude(claudePayload);
3779
4233
  const messages = [];
4234
+ const adapterManager = new AdapterManager(model || "");
4235
+ const adapter = adapterManager.getAdapter();
4236
+ if (typeof adapter.reset === "function") {
4237
+ adapter.reset();
4238
+ }
4239
+ if (isLoggingEnabled()) {
4240
+ log(`[Proxy] Using adapter: ${adapter.getName()}`);
4241
+ }
3780
4242
  if (claudeRequest.system) {
3781
4243
  let systemContent;
3782
4244
  if (typeof claudeRequest.system === "string") {
@@ -3806,37 +4268,40 @@ async function createProxyServer(port, openrouterApiKey, model, monitorMode = fa
3806
4268
  for (const msg of claudeRequest.messages) {
3807
4269
  if (msg.role === "user") {
3808
4270
  if (Array.isArray(msg.content)) {
3809
- const textParts = [];
4271
+ const contentParts = [];
3810
4272
  const toolResults = [];
3811
4273
  const seenToolResultIds = new Set;
3812
4274
  for (const block of msg.content) {
3813
4275
  if (block.type === "text") {
3814
- textParts.push(block.text);
4276
+ contentParts.push({ type: "text", text: block.text });
4277
+ } else if (block.type === "image") {
4278
+ contentParts.push({
4279
+ type: "image_url",
4280
+ image_url: {
4281
+ url: `data:${block.source.media_type};base64,${block.source.data}`
4282
+ }
4283
+ });
3815
4284
  } else if (block.type === "tool_result") {
3816
4285
  if (seenToolResultIds.has(block.tool_use_id)) {
3817
4286
  log(`[Proxy] Skipping duplicate tool_result with tool_use_id: ${block.tool_use_id}`);
3818
4287
  continue;
3819
4288
  }
3820
4289
  seenToolResultIds.add(block.tool_use_id);
3821
- toolResults.push({
4290
+ const toolResultMsg = {
3822
4291
  role: "tool",
3823
4292
  content: typeof block.content === "string" ? block.content : JSON.stringify(block.content),
3824
4293
  tool_call_id: block.tool_use_id
3825
- });
4294
+ };
4295
+ toolResults.push(toolResultMsg);
3826
4296
  }
3827
4297
  }
3828
4298
  if (toolResults.length > 0) {
3829
4299
  messages.push(...toolResults);
3830
- if (textParts.length > 0) {
3831
- messages.push({
3832
- role: "user",
3833
- content: textParts.join(" ")
3834
- });
3835
- }
3836
- } else if (textParts.length > 0) {
4300
+ }
4301
+ if (contentParts.length > 0) {
3837
4302
  messages.push({
3838
4303
  role: "user",
3839
- content: textParts.join(" ")
4304
+ content: contentParts
3840
4305
  });
3841
4306
  }
3842
4307
  } else if (typeof msg.content === "string") {
@@ -3918,8 +4383,27 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
3918
4383
  model,
3919
4384
  messages,
3920
4385
  temperature: claudeRequest.temperature !== undefined ? claudeRequest.temperature : 1,
3921
- stream: true
4386
+ stream: true,
4387
+ include_reasoning: true
3922
4388
  };
4389
+ if (claudeRequest.thinking) {
4390
+ const { budget_tokens } = claudeRequest.thinking;
4391
+ log(`[Proxy] Thinking mode requested with budget: ${budget_tokens} tokens`);
4392
+ openrouterPayload.thinking = claudeRequest.thinking;
4393
+ let effort = "medium";
4394
+ if (budget_tokens < 16000)
4395
+ effort = "low";
4396
+ else if (budget_tokens >= 32000)
4397
+ effort = "high";
4398
+ if (model && (model.includes("o1") || model.includes("o3") || model.startsWith("openai/"))) {
4399
+ openrouterPayload.reasoning_effort = effort;
4400
+ log(`[Proxy] Mapped budget ${budget_tokens} -> reasoning_effort: ${effort}`);
4401
+ }
4402
+ }
4403
+ if (!openrouterPayload.stream_options) {
4404
+ openrouterPayload.stream_options = {};
4405
+ }
4406
+ openrouterPayload.stream_options.include_usage = true;
3923
4407
  if (claudeRequest.max_tokens) {
3924
4408
  openrouterPayload.max_tokens = claudeRequest.max_tokens;
3925
4409
  }
@@ -3938,6 +4422,12 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
3938
4422
  maxTokens: openrouterPayload.max_tokens,
3939
4423
  stream: openrouterPayload.stream
3940
4424
  });
4425
+ await middlewareManager.beforeRequest({
4426
+ modelId: model || "",
4427
+ messages,
4428
+ tools,
4429
+ stream: openrouterPayload.stream
4430
+ });
3941
4431
  const headers = {
3942
4432
  "Content-Type": "application/json",
3943
4433
  Authorization: `Bearer ${openrouterApiKey}`,
@@ -3975,6 +4465,10 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
3975
4465
  if (data.error) {
3976
4466
  return c.json({ error: data.error.message || "Unknown error" }, 500);
3977
4467
  }
4468
+ await middlewareManager.afterResponse({
4469
+ modelId: model || "",
4470
+ response: data
4471
+ });
3978
4472
  const choice = data.choices[0];
3979
4473
  const openaiMessage = choice.message;
3980
4474
  const content = [];
@@ -4002,8 +4496,8 @@ IMPORTANT: When calling tools, you MUST use the OpenAI tool_calls format with JS
4002
4496
  stop_reason: mapStopReason(choice.finish_reason),
4003
4497
  stop_sequence: null,
4004
4498
  usage: {
4005
- input_tokens: data.usage?.prompt_tokens || 0,
4006
- output_tokens: data.usage?.completion_tokens || 0
4499
+ input_tokens: Math.ceil((data.usage?.prompt_tokens || 0) * getTokenScaleFactor()),
4500
+ output_tokens: Math.ceil((data.usage?.completion_tokens || 0) * getTokenScaleFactor())
4007
4501
  }
4008
4502
  };
4009
4503
  log("[Proxy] Translated to Claude format:");
@@ -4113,7 +4607,7 @@ data: ${JSON.stringify(data)}
4113
4607
  stop_sequence: null
4114
4608
  },
4115
4609
  usage: {
4116
- output_tokens: outputTokens
4610
+ output_tokens: Math.ceil(outputTokens * getTokenScaleFactor())
4117
4611
  }
4118
4612
  });
4119
4613
  sendSSE("message_stop", {
@@ -4137,9 +4631,13 @@ data: ${JSON.stringify(data)}
4137
4631
  clearInterval(pingInterval);
4138
4632
  }
4139
4633
  log(`[Proxy] Stream closed (reason: ${reason})`);
4634
+ middlewareManager.afterStreamComplete(model || "", streamMetadata).catch((error) => {
4635
+ log(`[Middleware] Error in afterStreamComplete: ${error}`);
4636
+ });
4140
4637
  }
4141
4638
  };
4142
4639
  let usage = null;
4640
+ const streamMetadata = new Map;
4143
4641
  let currentBlockIndex = 0;
4144
4642
  let textBlockIndex = -1;
4145
4643
  let textBlockStarted = false;
@@ -4149,13 +4647,23 @@ data: ${JSON.stringify(data)}
4149
4647
  let streamFinalized = false;
4150
4648
  let cumulativeInputTokens = 0;
4151
4649
  let cumulativeOutputTokens = 0;
4650
+ let currentRequestCost = 0;
4152
4651
  const tokenFilePath = `/tmp/claudish-tokens-${port}.json`;
4153
4652
  const writeTokenFile = () => {
4154
4653
  try {
4654
+ const totalTokens = cumulativeInputTokens + cumulativeOutputTokens;
4655
+ let contextLeftPercent = 100;
4656
+ if (contextWindowLimit > 0) {
4657
+ contextLeftPercent = Math.round((contextWindowLimit - totalTokens) / contextWindowLimit * 100);
4658
+ contextLeftPercent = Math.max(0, Math.min(100, contextLeftPercent));
4659
+ }
4155
4660
  const tokenData = {
4156
4661
  input_tokens: cumulativeInputTokens,
4157
4662
  output_tokens: cumulativeOutputTokens,
4158
- total_tokens: cumulativeInputTokens + cumulativeOutputTokens,
4663
+ total_tokens: totalTokens,
4664
+ total_cost: sessionTotalCost,
4665
+ context_window: contextWindowLimit,
4666
+ context_left_percent: contextLeftPercent,
4159
4667
  updated_at: Date.now()
4160
4668
  };
4161
4669
  writeFileSync5(tokenFilePath, JSON.stringify(tokenData), "utf-8");
@@ -4167,19 +4675,14 @@ data: ${JSON.stringify(data)}
4167
4675
  };
4168
4676
  const toolCalls = new Map;
4169
4677
  const toolCallIds = new Set;
4170
- const adapterManager = new AdapterManager(model || "");
4171
- const adapter = adapterManager.getAdapter();
4172
- if (typeof adapter.reset === "function") {
4173
- adapter.reset();
4174
- }
4175
4678
  let accumulatedTextLength = 0;
4176
- log(`[Proxy] Using adapter: ${adapter.getName()}`);
4177
4679
  const hasToolResults = claudeRequest.messages?.some((msg) => Array.isArray(msg.content) && msg.content.some((block) => block.type === "tool_result"));
4178
4680
  const isFirstTurn = !hasToolResults;
4179
4681
  const estimateTokens = (text) => Math.ceil(text.length / 4);
4180
4682
  const requestJson = JSON.stringify(claudeRequest);
4181
4683
  const estimatedInputTokens = estimateTokens(requestJson);
4182
4684
  const estimatedCacheTokens = isFirstTurn ? Math.floor(estimatedInputTokens * 0.8) : 0;
4685
+ const scaleFactor = getTokenScaleFactor();
4183
4686
  sendSSE("message_start", {
4184
4687
  type: "message_start",
4185
4688
  message: {
@@ -4191,23 +4694,13 @@ data: ${JSON.stringify(data)}
4191
4694
  stop_reason: null,
4192
4695
  stop_sequence: null,
4193
4696
  usage: {
4194
- input_tokens: estimatedInputTokens - estimatedCacheTokens,
4195
- cache_creation_input_tokens: isFirstTurn ? estimatedCacheTokens : 0,
4196
- cache_read_input_tokens: isFirstTurn ? 0 : estimatedCacheTokens,
4697
+ input_tokens: Math.ceil((estimatedInputTokens - estimatedCacheTokens) * scaleFactor),
4698
+ cache_creation_input_tokens: isFirstTurn ? Math.ceil(estimatedCacheTokens * scaleFactor) : 0,
4699
+ cache_read_input_tokens: isFirstTurn ? 0 : Math.ceil(estimatedCacheTokens * scaleFactor),
4197
4700
  output_tokens: 1
4198
4701
  }
4199
4702
  }
4200
4703
  });
4201
- textBlockIndex = currentBlockIndex++;
4202
- sendSSE("content_block_start", {
4203
- type: "content_block_start",
4204
- index: textBlockIndex,
4205
- content_block: {
4206
- type: "text",
4207
- text: ""
4208
- }
4209
- });
4210
- textBlockStarted = true;
4211
4704
  sendSSE("ping", {
4212
4705
  type: "ping"
4213
4706
  });
@@ -4264,9 +4757,35 @@ data: ${JSON.stringify(data)}
4264
4757
  finishReason: chunk.choices?.[0]?.finish_reason,
4265
4758
  hasUsage: !!chunk.usage
4266
4759
  });
4760
+ const delta2 = chunk.choices?.[0]?.delta;
4761
+ if (delta2?.tool_calls) {
4762
+ for (const toolCall of delta2.tool_calls) {
4763
+ if (toolCall.extra_content) {
4764
+ logStructured("DEBUG: Found extra_content in tool_call", {
4765
+ tool_call_id: toolCall.id,
4766
+ has_extra_content: true,
4767
+ extra_content_keys: Object.keys(toolCall.extra_content),
4768
+ has_google: !!toolCall.extra_content.google
4769
+ });
4770
+ }
4771
+ }
4772
+ }
4773
+ if (delta2?.tool_calls && dataStr.includes("tool_calls")) {
4774
+ logStructured("DEBUG: Raw chunk JSON (tool_calls)", {
4775
+ has_extra_content_in_raw: dataStr.includes("extra_content"),
4776
+ raw_snippet: dataStr.substring(0, 500)
4777
+ });
4778
+ }
4267
4779
  }
4268
4780
  if (chunk.usage) {
4269
4781
  usage = chunk.usage;
4782
+ if (typeof usage.cost === "number") {
4783
+ const costDiff = usage.cost - currentRequestCost;
4784
+ if (costDiff > 0) {
4785
+ sessionTotalCost += costDiff;
4786
+ currentRequestCost = usage.cost;
4787
+ }
4788
+ }
4270
4789
  if (usage.prompt_tokens) {
4271
4790
  cumulativeInputTokens = usage.prompt_tokens;
4272
4791
  }
@@ -4277,6 +4796,14 @@ data: ${JSON.stringify(data)}
4277
4796
  }
4278
4797
  const choice = chunk.choices?.[0];
4279
4798
  const delta = choice?.delta;
4799
+ if (delta) {
4800
+ await middlewareManager.afterStreamChunk({
4801
+ modelId: model || "",
4802
+ chunk,
4803
+ delta,
4804
+ metadata: streamMetadata
4805
+ });
4806
+ }
4280
4807
  const hasReasoning = !!delta?.reasoning;
4281
4808
  const hasContent = !!delta?.content;
4282
4809
  const reasoningText = delta?.reasoning || "";
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claudish",
3
- "version": "2.2.1",
3
+ "version": "2.4.0",
4
4
  "description": "CLI tool to run Claude Code with any OpenRouter model (Grok, GPT-5, MiniMax, etc.) via local Anthropic API-compatible proxy",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",
@@ -1,26 +1,26 @@
1
1
  {
2
2
  "version": "2.1.0",
3
- "lastUpdated": "2025-11-20",
3
+ "lastUpdated": "2025-11-24",
4
4
  "source": "https://openrouter.ai/models?categories=programming&fmt=cards&order=top-weekly",
5
5
  "models": [
6
6
  {
7
- "id": "x-ai/grok-code-fast-1",
8
- "name": "xAI: Grok Code Fast 1",
9
- "description": "Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality work flows.",
10
- "provider": "X-ai",
11
- "category": "reasoning",
7
+ "id": "google/gemini-3-pro-preview",
8
+ "name": "Google: Gemini 3 Pro Preview",
9
+ "description": "Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses.\n\nBuilt for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.",
10
+ "provider": "Google",
11
+ "category": "vision",
12
12
  "priority": 1,
13
13
  "pricing": {
14
- "input": "$0.20/1M",
15
- "output": "$1.50/1M",
16
- "average": "$0.85/1M"
14
+ "input": "$2.00/1M",
15
+ "output": "$12.00/1M",
16
+ "average": "$7.00/1M"
17
17
  },
18
- "context": "256K",
19
- "maxOutputTokens": 10000,
20
- "modality": "text->text",
18
+ "context": "1048K",
19
+ "maxOutputTokens": 65536,
20
+ "modality": "text+image->text",
21
21
  "supportsTools": true,
22
22
  "supportsReasoning": true,
23
- "supportsVision": false,
23
+ "supportsVision": true,
24
24
  "isModerated": false,
25
25
  "recommended": true
26
26
  },
@@ -46,19 +46,19 @@
46
46
  "recommended": true
47
47
  },
48
48
  {
49
- "id": "moonshotai/kimi-k2-thinking",
50
- "name": "MoonshotAI: Kimi K2 Thinking",
51
- "description": "Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256 k-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.\n\nIt sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks.",
52
- "provider": "Moonshotai",
49
+ "id": "x-ai/grok-code-fast-1",
50
+ "name": "xAI: Grok Code Fast 1",
51
+ "description": "Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality work flows.",
52
+ "provider": "X-ai",
53
53
  "category": "reasoning",
54
54
  "priority": 3,
55
55
  "pricing": {
56
- "input": "$0.50/1M",
57
- "output": "$2.50/1M",
58
- "average": "$1.50/1M"
56
+ "input": "$0.20/1M",
57
+ "output": "$1.50/1M",
58
+ "average": "$0.85/1M"
59
59
  },
60
- "context": "262K",
61
- "maxOutputTokens": 262144,
60
+ "context": "256K",
61
+ "maxOutputTokens": 10000,
62
62
  "modality": "text->text",
63
63
  "supportsTools": true,
64
64
  "supportsReasoning": true,
@@ -66,38 +66,17 @@
66
66
  "isModerated": false,
67
67
  "recommended": true
68
68
  },
69
- {
70
- "id": "google/gemini-2.5-flash",
71
- "name": "Google: Gemini 2.5 Flash",
72
- "description": "Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).",
73
- "provider": "Google",
74
- "category": "reasoning",
75
- "priority": 4,
76
- "pricing": {
77
- "input": "$0.30/1M",
78
- "output": "$2.50/1M",
79
- "average": "$1.40/1M"
80
- },
81
- "context": "1048K",
82
- "maxOutputTokens": 65535,
83
- "modality": "text+image->text",
84
- "supportsTools": true,
85
- "supportsReasoning": true,
86
- "supportsVision": true,
87
- "isModerated": false,
88
- "recommended": true
89
- },
90
69
  {
91
70
  "id": "minimax/minimax-m2",
92
71
  "name": "MiniMax: MiniMax M2",
93
72
  "description": "MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning, tool use, and multi-step task execution while maintaining low latency and deployment efficiency.\n\nThe model excels in code generation, multi-file editing, compile-run-fix loops, and test-validated repair, showing strong results on SWE-Bench Verified, Multi-SWE-Bench, and Terminal-Bench. It also performs competitively in agentic evaluations such as BrowseComp and GAIA, effectively handling long-horizon planning, retrieval, and recovery from execution errors.\n\nBenchmarked by [Artificial Analysis](https://artificialanalysis.ai/models/minimax-m2), MiniMax-M2 ranks among the top open-source models for composite intelligence, spanning mathematics, science, and instruction-following. Its small activation footprint enables fast inference, high concurrency, and improved unit economics, making it well-suited for large-scale agents, developer assistants, and reasoning-driven applications that require responsiveness and cost efficiency.\n\nTo avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks).",
94
73
  "provider": "Minimax",
95
74
  "category": "reasoning",
96
- "priority": 5,
75
+ "priority": 4,
97
76
  "pricing": {
98
- "input": "$0.26/1M",
99
- "output": "$1.02/1M",
100
- "average": "$0.64/1M"
77
+ "input": "$0.24/1M",
78
+ "output": "$0.96/1M",
79
+ "average": "$0.60/1M"
101
80
  },
102
81
  "context": "204K",
103
82
  "maxOutputTokens": 131072,
@@ -114,7 +93,7 @@
114
93
  "description": "Compared with GLM-4.5, this generation brings several key improvements:\n\nLonger context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.\nSuperior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages.\nAdvanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.\nMore capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks.\nRefined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.",
115
94
  "provider": "Z-ai",
116
95
  "category": "reasoning",
117
- "priority": 6,
96
+ "priority": 5,
118
97
  "pricing": {
119
98
  "input": "$0.40/1M",
120
99
  "output": "$1.75/1M",
@@ -129,55 +108,13 @@
129
108
  "isModerated": false,
130
109
  "recommended": true
131
110
  },
132
- {
133
- "id": "openai/gpt-5",
134
- "name": "OpenAI: GPT-5",
135
- "description": "GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like \"think hard about this.\" Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.",
136
- "provider": "Openai",
137
- "category": "reasoning",
138
- "priority": 7,
139
- "pricing": {
140
- "input": "$1.25/1M",
141
- "output": "$10.00/1M",
142
- "average": "$5.63/1M"
143
- },
144
- "context": "400K",
145
- "maxOutputTokens": 128000,
146
- "modality": "text+image->text",
147
- "supportsTools": true,
148
- "supportsReasoning": true,
149
- "supportsVision": true,
150
- "isModerated": true,
151
- "recommended": true
152
- },
153
- {
154
- "id": "google/gemini-3-pro-preview",
155
- "name": "Google: Gemini 3 Pro Preview",
156
- "description": "Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses.\n\nBuilt for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.",
157
- "provider": "Google",
158
- "category": "vision",
159
- "priority": 8,
160
- "pricing": {
161
- "input": "$2.00/1M",
162
- "output": "$12.00/1M",
163
- "average": "$7.00/1M"
164
- },
165
- "context": "1048K",
166
- "maxOutputTokens": 65536,
167
- "modality": "text+image->text",
168
- "supportsTools": true,
169
- "supportsReasoning": true,
170
- "supportsVision": true,
171
- "isModerated": false,
172
- "recommended": true
173
- },
174
111
  {
175
112
  "id": "qwen/qwen3-vl-235b-a22b-instruct",
176
113
  "name": "Qwen: Qwen3 VL 235B A22B Instruct",
177
114
  "description": "Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning.\n\nBeyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.",
178
115
  "provider": "Qwen",
179
116
  "category": "vision",
180
- "priority": 9,
117
+ "priority": 6,
181
118
  "pricing": {
182
119
  "input": "$0.21/1M",
183
120
  "output": "$1.90/1M",
@@ -191,27 +128,6 @@
191
128
  "supportsVision": true,
192
129
  "isModerated": false,
193
130
  "recommended": true
194
- },
195
- {
196
- "id": "openrouter/polaris-alpha",
197
- "name": "Polaris Alpha",
198
- "description": "openrouter/polaris-alpha (metadata pending - not yet available in API)",
199
- "provider": "Openrouter",
200
- "category": "programming",
201
- "priority": 10,
202
- "pricing": {
203
- "input": "N/A",
204
- "output": "N/A",
205
- "average": "N/A"
206
- },
207
- "context": "N/A",
208
- "maxOutputTokens": null,
209
- "modality": "text->text",
210
- "supportsTools": false,
211
- "supportsReasoning": false,
212
- "supportsVision": false,
213
- "isModerated": false,
214
- "recommended": true
215
131
  }
216
132
  ]
217
133
  }