npm - lynkr - Versions diffs - 9.4.6 → 9.5.0 - Mend

lynkr 9.4.6 → 9.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20) hide show

package/README.md +46 -14
package/install.sh +21 -5
package/package.json +4 -2
package/public/dashboard.html +13 -1
package/scripts/check-native.js +97 -0
package/src/clients/databricks.js +80 -3
package/src/clients/openrouter-utils.js +15 -0
package/src/config/index.js +9 -0
package/src/context/caveman.js +94 -0
package/src/context/tool-dedup.js +95 -0
package/src/context/tool-result-compressor.js +106 -0
package/src/dashboard/api.js +69 -18
package/src/orchestrator/bypass.js +135 -0
package/src/orchestrator/index.js +33 -2
package/src/routing/index.js +39 -0
package/src/routing/model-registry.js +89 -26
package/src/routing/risk-analyzer.js +7 -2
package/src/routing/session-affinity.js +96 -0
package/src/routing/telemetry.js +16 -3
package/.impeccable/live/config.json +0 -8

package/README.md CHANGED Viewed

@@ -545,6 +545,28 @@ TOOL_INJECTION_ENABLED=false
 CODE_MODE_ENABLED=true
 ```
+Always-on (no config): **smart tool selection** (server mode), **RTK tool-result
+compression** (test/git/grep/lint/build/JSON output), **MCP tool dedup** (drops
+built-in WebSearch/WebFetch when an Exa/Tavily MCP tool is present), and
+**request bypass** (Claude CLI Warmup / title-extraction calls are answered
+locally, never hitting a provider).
+Optional **terse-output mode** to cut *output* tokens:
+```bash
+CAVEMAN_ENABLED=true        # off by default — nudges the model to be concise
+CAVEMAN_LEVEL=lite          # lite | full | ultra
+```
+### Cost tracking & model pricing
+Per-request cost is computed from a model-pricing registry (LiteLLM → models.dev,
+cached 24h) and recorded in telemetry. Models the registry doesn't know record
+`cost_usd=null` (logged once) rather than a fabricated price. Pin prices for
+unknown models:
+```bash
+# Per-1M-token USD prices, JSON keyed by model name
+MODEL_PRICE_OVERRIDES={"my-model":{"input":0.5,"output":1.5}}
+```
 ### Memory System (Titans-inspired)
 ```bash
 MEMORY_ENABLED=true
@@ -652,35 +674,45 @@ npm start
 ## Benchmark Results
-Measured on real agentic coding workloads (Claude Code / Cursor sessions) with Ollama, Moonshot, and Azure OpenAI backends. Run with `node benchmark-tier-routing.js`.
+Head-to-head against **LiteLLM** on the **same backends** (Ollama `minimax-m2.5`, Moonshot, Azure OpenAI), 9 scenarios across 4 feature categories. Apples-to-apples comparison is Lynkr vs LiteLLM **billed tokens on the same scenario**. Run with `node benchmark-tier-routing.js`.
-### Token compression
+> _Run: June 5, 2026 · Lynkr v9.3.2 · LiteLLM v1.87.1 · macOS, Apple Silicon._
-| Scenario | Tokens without Lynkr | Tokens with Lynkr | Reduction |
+### Token reduction (vs LiteLLM, same model & prompt)
+| Mechanism | Lynkr | LiteLLM | Result |
 |---|---|---|---|
-| 14-tool request (read task) | 1,042 | **547** | **47%** |
-| 14-tool request (write task) | 1,043 | **412** | **60%** |
-| Large JSON grep result (60 items) | 3,458 | **427** | **87.6%** |
+| Smart tool selection (14 tools) | **959** tokens · $0.0044 | 2,085 tokens · $0.0091 | **53% fewer tokens, 52% cheaper** |
+| TOON compression (60-item grep JSON) | **427** tokens · $0.009 | 3,458 tokens · $0.018 | **87.6% fewer tokens, 50% cheaper** |
-Lynkr strips irrelevant tool schemas before forwarding (smart tool selection) and binary-compresses large JSON tool results (TOON) — both happen in-process with no added latency.
+Lynkr strips irrelevant tool schemas (smart tool selection) and binary-compresses large JSON tool results (TOON) — both in-process, no added latency.
 ### Semantic cache
 | | Tokens billed | Response time |
 |---|---|---|
 | First call (cold) | 2,857 | 1,891ms |
-| **Second call — paraphrased, cache hit** | **0** | **171ms** |
+| **Second call — paraphrased, cache hit** | **0** (served from cache) | **171ms (11× faster)** |
-Near-identical prompts return cached responses in 171ms. Zero tokens billed on a cache hit.
+Near-identical prompts return cached responses in 171ms. Zero model tokens billed on a cache hit.
 ### Tier routing
-| Request | Routed to |
-|---|---|
-| "What does git stash do?" | SIMPLE → local model (free) |
-| JWT vs cookies security analysis | COMPLEX → cloud model (correct) |
+| Request | Lynkr routes to | LiteLLM routes to |
+|---|---|---|
+| "What does git stash do?" | `minimax-m2.5` (local, free) | Ollama (local) |
+| JWT vs cookies security analysis | `moonshot` (cloud — correct) | **Ollama (local — wrong call)** |
+Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and escalates automatically. LiteLLM's `cost-based-routing` sends everything to the cheapest model regardless of complexity.
+### Cost projection (100,000 requests/month, same backend)
+| | Monthly cost | vs LiteLLM |
+|---|---|---|
+| LiteLLM | ~$818 | baseline |
+| **Lynkr** | **~$409** | **~50% cheaper** |
-Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and routes automatically. No caller changes needed.
+_Based on a tool-heavy agentic session (TOON scenario). On equal footing — same provider, same model — Lynkr is cheaper due to token optimization._
 → [Full benchmark report with methodology](BENCHMARK_REPORT.md)

package/install.sh CHANGED Viewed

@@ -108,8 +108,24 @@ clone_or_update() {
 install_dependencies() {
     print_info "Installing dependencies..."
     cd "$INSTALL_DIR"
-    npm install --production
+    # --omit=dev keeps optionalDependencies (better-sqlite3, hnswlib-node,
+    # tree-sitter) which back telemetry, the memory store and routing ML.
+    # The postinstall hook (scripts/check-native.js) verifies the native ABI
+    # and rebuilds if Node was upgraded — best-effort, never fails the install.
+    npm install --omit=dev
     print_success "Dependencies installed"
+    # Native optional modules need a C/C++ toolchain only if no prebuilt binary
+    # is available for this platform. They degrade gracefully if absent.
+    if ! node -e "const D=require('better-sqlite3'); new D(':memory:').close()" >/dev/null 2>&1; then
+        print_warning "Native module 'better-sqlite3' is not loadable."
+        echo "     Telemetry, the memory store and sessions need it. To enable:"
+        echo "       - Ensure a build toolchain is present (Xcode CLT on macOS, build-essential + python3 on Linux), then:"
+        echo "       - ${BLUE}cd $INSTALL_DIR && npm run rebuild-native${NC}"
+        echo "     Lynkr still runs without it (those features stay disabled)."
+    else
+        print_success "Native modules OK (telemetry, memory, sessions enabled)"
+    fi
 }
 # Create default .env file
@@ -131,7 +147,7 @@ create_env_file() {
 MODEL_PROVIDER=ollama
 # Server Configuration
-PORT=8080
+PORT=8081
 # Ollama Configuration (default for local development)
 OLLAMA_MODEL=qwen2.5-coder:7b
@@ -161,7 +177,7 @@ EOF
         print_info "📝 Configuration ready! Key settings:"
         echo "     • Default provider: Ollama (local, offline)"
         echo "     • Memory system: Enabled (learns from conversations)"
-        echo "     • Port: 8080"
+        echo "     • Port: 8081"
         echo ""
         print_warning "To use cloud providers (Databricks/OpenAI/Azure):"
         echo "     Edit: ${BLUE}nano $INSTALL_DIR/.env${NC}"
@@ -220,7 +236,7 @@ print_next_steps() {
     echo "     ${BLUE}lynkr${NC}"
     echo ""
     echo "  3. Configure Claude Code CLI:"
-    echo "     ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8080${NC}"
+    echo "     ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8081${NC}"
     echo "     ${BLUE}claude${NC}"
     echo ""
     echo "  ${YELLOW}Option B: Use Cloud Providers (Databricks/OpenAI/Azure)${NC}"
@@ -238,7 +254,7 @@ print_next_steps() {
     echo "     ${BLUE}lynkr${NC}"
     echo ""
     echo "  3. Configure Claude Code CLI:"
-    echo "     ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8080${NC}"
+    echo "     ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8081${NC}"
     echo "     ${BLUE}export ANTHROPIC_API_KEY=any-non-empty-value${NC}  ${GREEN}← Placeholder${NC}"
     echo "     ${BLUE}claude${NC}"
     echo ""

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "lynkr",
-  "version": "9.4.6",
+  "version": "9.5.0",
   "description": "Self-hosted LLM gateway and tier-routing proxy for Claude Code, Cursor, and Codex. Routes across Ollama, AWS Bedrock, OpenRouter, Databricks, Azure OpenAI, llama.cpp, and LM Studio with prompt caching, MCP tools, and 60-80% cost savings.",
   "main": "index.js",
   "bin": {
@@ -8,13 +8,15 @@
     "lynkr-setup": "scripts/setup.js"
   },
   "scripts": {
+    "postinstall": "node scripts/check-native.js",
+    "rebuild-native": "node scripts/check-native.js",
     "prestart": "node -e \"if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}\" && docker compose --profile headroom up -d --build headroom 2>/dev/null || echo 'Headroom skipped (disabled or Docker not running)'",
     "start": "node index.js 2>&1 | npx pino-pretty --sync",
     "stop": "node -e \"if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}\" && docker compose --profile headroom down || echo 'Headroom skipped (disabled or Docker not running)'",
     "dev": "nodemon index.js",
     "lint": "eslint src index.js",
     "test": "npm run test:unit && npm run test:performance",
-    "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js test/openai-integration.test.js test/toon-compression.test.js test/llamacpp-integration.test.js test/resilience.test.js test/telemetry-routing.test.js test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js test/distill.test.js test/large-payload.test.js test/code-mode.test.js test/prompt-cache-injection.test.js test/risk-analyzer.test.js test/interaction-block.test.js test/preflight.test.js",
+    "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js test/openai-integration.test.js test/toon-compression.test.js test/llamacpp-integration.test.js test/resilience.test.js test/telemetry-routing.test.js test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js test/distill.test.js test/large-payload.test.js test/code-mode.test.js test/prompt-cache-injection.test.js test/risk-analyzer.test.js test/interaction-block.test.js test/preflight.test.js test/token-reduction.test.js test/session-affinity.test.js test/model-registry-cost.test.js",
     "test:memory": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js",
     "test:new-features": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js",
     "test:performance": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/hybrid-routing-performance.test.js && DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-tests.js",

package/public/dashboard.html CHANGED Viewed

@@ -244,6 +244,7 @@ const App = {
     const t = d.today;
     const s = d.stats;
+    const tierLabel = t => t === 'default' ? 'default' : String(t).toLowerCase();
     const providerCards = d.providers.length === 0
       ? `<p class="text-slate-500 text-sm">No providers configured</p>`
       : d.providers.map(p => `
@@ -251,10 +252,21 @@ const App = {
           <div class="flex items-center gap-2">
             <span class="status-dot ${providerDot(p.type)}"></span>
             <span class="text-sm font-medium text-slate-200">${p.name}</span>
+            ${(p.tiers || []).map(t => `<span class="badge bg-slate-600/60 text-slate-300">${tierLabel(t)}</span>`).join('')}
           </div>
           <span class="text-xs ${p.type === 'local' ? 'text-green-400' : 'text-blue-400'}">${p.type}</span>
         </div>`).join('');
+    const providerWarnings = (d.providerWarnings || []).map(w => `
+        <div class="flex items-center justify-between bg-amber-500/10 border border-amber-500/30 rounded-lg px-4 py-3">
+          <div class="flex items-center gap-2">
+            <span class="text-amber-400 text-sm">⚠</span>
+            <span class="text-sm font-medium text-amber-200">${w.name}</span>
+            ${(w.tiers || []).map(t => `<span class="badge bg-amber-500/20 text-amber-300">${tierLabel(t)}</span>`).join('')}
+          </div>
+          <span class="text-xs text-amber-400">no credentials</span>
+        </div>`).join('');
     const recentRows = (d.recentRequests || []).map(r => `
       <tr class="table-row border-b border-slate-700/50">
         <td class="py-2 px-3 text-xs text-slate-500">${fmt.ago(r.timestamp)}</td>
@@ -279,7 +291,7 @@ const App = {
         <!-- Providers -->
         ${card(`
           <h3 class="text-sm font-semibold text-slate-300 mb-3">Configured Providers</h3>
-          <div class="flex flex-col gap-2">${providerCards}</div>
+          <div class="flex flex-col gap-2">${providerCards}${providerWarnings}</div>
         `)}
         <!-- 24h Stats -->

package/scripts/check-native.js ADDED Viewed

@@ -0,0 +1,97 @@
+#!/usr/bin/env node
+/**
+ * Native module ABI guard (postinstall).
+ *
+ * better-sqlite3 (and the other native optionalDependencies) are compiled
+ * against a specific Node ABI. When Node is upgraded, the prebuilt/compiled
+ * binary stops loading with:
+ *
+ *   "was compiled against a different Node.js version using
+ *    NODE_MODULE_VERSION 115. This version of Node.js requires
+ *    NODE_MODULE_VERSION 141."
+ *
+ * The failure is silent at runtime — telemetry, request logs, and the memory
+ * store all sit behind try/catch and simply go empty. This probe detects the
+ * mismatch and rebuilds the native modules so it self-heals on `npm install`.
+ *
+ * It is intentionally best-effort: it NEVER exits non-zero, so it can't break
+ * `npm install` on machines without a build toolchain (the modules are
+ * optional and the app degrades gracefully without them).
+ */
+const { execSync } = require("child_process");
+// Native optionalDependencies that are ABI-sensitive. If Node changed, all of
+// them are stale, so we rebuild the set in one pass.
+const NATIVE_DEPS = [
+  "better-sqlite3",
+  "hnswlib-node",
+  "tree-sitter",
+  "tree-sitter-javascript",
+  "tree-sitter-python",
+  "tree-sitter-typescript",
+];
+function log(msg) {
+  console.log(`[check-native] ${msg}`);
+}
+/**
+ * Probe better-sqlite3 — the canary. `require()` alone is not enough: the
+ * native addon only loads when a Database is instantiated.
+ * @returns {"ok"|"absent"|"mismatch"}
+ */
+function probe() {
+  let Database;
+  try {
+    Database = require("better-sqlite3");
+  } catch (err) {
+    if (err && err.code === "MODULE_NOT_FOUND") return "absent";
+    return "mismatch";
+  }
+  try {
+    const db = new Database(":memory:");
+    db.close();
+    return "ok";
+  } catch (err) {
+    if (/NODE_MODULE_VERSION|different Node\.js version|invalid ELF|dlopen|\.node/i.test(err.message || "")) {
+      return "mismatch";
+    }
+    // Some other instantiation error — not an ABI issue we can fix by rebuild.
+    return "ok";
+  }
+}
+function main() {
+  const status = probe();
+  if (status === "absent") {
+    // Optional dependency not installed (e.g. build skipped). Nothing to do.
+    return;
+  }
+  if (status === "ok") {
+    return;
+  }
+  log("native module ABI mismatch detected (Node was likely upgraded). Rebuilding native modules…");
+  try {
+    execSync(`npm rebuild ${NATIVE_DEPS.join(" ")}`, { stdio: "inherit" });
+  } catch {
+    log("rebuild did not complete (a build toolchain may be missing). Continuing — native features will be disabled until you run: npm rebuild better-sqlite3");
+    return;
+  }
+  // Re-probe to report the outcome.
+  if (probe() === "ok") {
+    log("native modules rebuilt successfully.");
+  } else {
+    log("native modules still not loadable after rebuild. Run `npm rebuild better-sqlite3` manually.");
+  }
+}
+try {
+  main();
+} catch (err) {
+  // Never fail the install.
+  log(`skipped (${err.message})`);
+}

package/src/clients/databricks.js CHANGED Viewed

@@ -1506,10 +1506,16 @@ async function invokeMoonshot(body) {
     "claude-haiku-4-5-20251001": "kimi-k2-turbo-preview",
     "claude-haiku-4-5": "kimi-k2-turbo-preview",
     "claude-3-haiku": "kimi-k2-turbo-preview",
+    // moonshot-v1-auto 400s with "tokenization failed" (its server-side auto
+    // context-size pass fails on large tool-bearing payloads). Remap to a
+    // fixed model that's broadly available on api.moonshot.ai.
+    "moonshot-v1-auto": "moonshot-v1-128k",
   };
   const requestedModel = body._tierModel || body.model || config.moonshot.model;
-  const mappedModel = modelMap[requestedModel] || config.moonshot.model || "kimi-k2-turbo-preview";
+  let mappedModel = modelMap[requestedModel] || config.moonshot.model || "kimi-k2-turbo-preview";
+  // Guard against the deprecated auto model arriving via config too.
+  if (mappedModel === "moonshot-v1-auto") mappedModel = "moonshot-v1-128k";
   // Convert messages using existing utility
   const messages = convertAnthropicMessagesToOpenRouter(body.messages || []);
@@ -1522,12 +1528,18 @@ async function invokeMoonshot(body) {
     messages.unshift({ role: "system", content: systemContent });
   }
+  // kimi-k2.x (k2.5 / k2.6 …) are thinking models that only accept
+  // temperature: 1 — any other value 400s with "invalid temperature".
+  const isKimiThinking = /^kimi-k2/i.test(mappedModel);
   const moonshotBody = {
     model: mappedModel,
     messages,
     max_tokens: body.max_tokens || 16384,
-    temperature: body.temperature ?? 0.7,
-    top_p: body.top_p ?? 1.0,
+    // kimi-k2.x thinking models pin sampling params: temperature must be 1
+    // and top_p must be 0.95 — any other value 400s.
+    temperature: isKimiThinking ? 1 : (body.temperature ?? 0.7),
+    top_p: isKimiThinking ? 0.95 : (body.top_p ?? 1.0),
     stream: false,  // Force non-streaming - OpenAI SSE to Anthropic SSE conversion not implemented
   };
@@ -2027,6 +2039,65 @@ async function invokeCodex(body) {
   };
 }
+/**
+ * Compute request cost in USD from model pricing × token usage.
+ * Registry returns per-1M-token prices ({ input, output }); returns null when
+ * pricing is unknown so we don't record misleading zeros.
+ */
+const _unknownCostWarned = new Set();
+function computeCostUsd(model, inputTokens, outputTokens) {
+  try {
+    const { getModelRegistrySync } = require("../routing/model-registry");
+    const reg = getModelRegistrySync && getModelRegistrySync();
+    const cost = reg?.getCost?.(model);
+    if (!cost) return null;
+    // Unknown model → record null (not a fabricated default), warn once so the
+    // gap is visible and can be fixed via MODEL_PRICE_OVERRIDES.
+    if (cost.unknown) {
+      if (model && !_unknownCostWarned.has(model)) {
+        _unknownCostWarned.add(model);
+        logger.warn({ model }, "[Cost] No pricing for model — recording cost_usd=null. Set MODEL_PRICE_OVERRIDES to fix.");
+      }
+      return null;
+    }
+    if (cost.input == null && cost.output == null) return null;
+    const inUsd = ((inputTokens || 0) / 1e6) * (cost.input || 0);
+    const outUsd = ((outputTokens || 0) / 1e6) * (cost.output || 0);
+    return Number((inUsd + outUsd).toFixed(6));
+  } catch {
+    return null;
+  }
+}
+// Telemetry prompt/response text is always captured (truncated) to build the
+// routing ML training corpus. Stored locally in .lynkr/telemetry.db only.
+const TELEMETRY_TEXT_MAXLEN = 2000;
+/** Flatten the latest user message to plain text (for telemetry capture). */
+function captureRequestText(body) {
+  const messages = body?.messages;
+  if (!Array.isArray(messages)) return null;
+  for (let i = messages.length - 1; i >= 0; i--) {
+    const m = messages[i];
+    if (m?.role !== "user") continue;
+    let text = "";
+    if (typeof m.content === "string") text = m.content;
+    else if (Array.isArray(m.content)) {
+      text = m.content.filter((b) => b?.type === "text").map((b) => b.text || "").join(" ");
+    }
+    if (text) return text.slice(0, TELEMETRY_TEXT_MAXLEN);
+  }
+  return null;
+}
+/** Flatten an Anthropic response's text blocks to plain text (for telemetry). */
+function captureResponseText(resultJson) {
+  const content = resultJson?.content;
+  if (!Array.isArray(content)) return null;
+  const text = content.filter((b) => b?.type === "text").map((b) => b.text || "").join(" ");
+  return text ? text.slice(0, TELEMETRY_TEXT_MAXLEN) : null;
+}
 async function invokeModel(body, options = {}) {
   const { determineProviderSmart, isFallbackEnabled, getFallbackProvider } = require("./routing");
   const metricsCollector = getMetricsCollector();
@@ -2233,6 +2304,9 @@ async function invokeModel(body, options = {}) {
       circuit_breaker_state: breaker.state,
       quality_score: qualityScore,
       tokens_per_second: outputTokens && latency > 0 ? outputTokens / (latency / 1000) : null,
+      cost_usd: computeCostUsd(routingDecision.model || body._tierModel, inputTokens, outputTokens),
+      request_text: captureRequestText(body),
+      response_text: captureResponseText(result.json),
     });
     // Return result with provider info and routing decision for headers
@@ -2394,6 +2468,9 @@ async function invokeModel(body, options = {}) {
           { status_code: 200, output_tokens: fbOutputTokens, tool_calls_made: fbToolCalls, was_fallback: true, retry_count: 0, latency_ms: Date.now() - startTime }
         ),
         tokens_per_second: fbOutputTokens && fallbackLatency > 0 ? fbOutputTokens / (fallbackLatency / 1000) : null,
+        cost_usd: computeCostUsd(routingDecision.model || body._tierModel, fbInputTokens, fbOutputTokens),
+        request_text: captureRequestText(body),
+        response_text: captureResponseText(fallbackResult.json),
       });
       // Return result with actual provider used (fallback provider) and routing decision

package/src/clients/openrouter-utils.js CHANGED Viewed

@@ -176,6 +176,21 @@ function convertAnthropicMessagesToOpenRouter(anthropicMessages) {
     }
   }
+  // Kimi/Moonshot (and some OpenAI-compatible APIs) reject a message whose
+  // content is an empty string with "Invalid request: tokenization failed".
+  // This happens when a turn had only non-text blocks (thinking / image /
+  // stripped content) and flattened to "". Replace empty/whitespace-only
+  // content with a single space — but never touch an assistant message that
+  // carries tool_calls, where content: null is intentional and required.
+  for (const m of converted) {
+    if (m.role === 'tool') continue;
+    const hasToolCalls = Array.isArray(m.tool_calls) && m.tool_calls.length > 0;
+    if (hasToolCalls) continue;
+    if (typeof m.content !== 'string' || m.content.trim() === '') {
+      m.content = ' ';
+    }
+  }
   // Log the converted messages for debugging
   logger.debug({
     inputCount: anthropicMessages.length,

package/src/config/index.js CHANGED Viewed

@@ -208,6 +208,11 @@ const tokenBudgetWarning = Number.parseInt(process.env.TOKEN_BUDGET_WARNING ?? "
 const tokenBudgetMax = Number.parseInt(process.env.TOKEN_BUDGET_MAX ?? "180000", 10);
 const tokenBudgetEnforcement = process.env.TOKEN_BUDGET_ENFORCEMENT !== "false"; // default true
+// Caveman terse-output injection (opt-in, off by default)
+const cavemanEnabled = process.env.CAVEMAN_ENABLED === "true";
+const cavemanLevel = (process.env.CAVEMAN_LEVEL ?? "lite").toLowerCase();
 // TOON payload compression (opt-in)
 const toonEnabled = process.env.TOON_ENABLED === "true"; // default false
 const toonMinBytes = Number.parseInt(process.env.TOON_MIN_BYTES ?? "4096", 10);
@@ -641,6 +646,10 @@ var config = {
   toolResultCompression: {
     enabled: true,
   },
+  caveman: {
+    enabled: cavemanEnabled,
+    level: cavemanLevel,
+  },
   server: {
     jsonLimit: process.env.REQUEST_JSON_LIMIT ?? "1gb",
   },

package/src/context/caveman.js ADDED Viewed

@@ -0,0 +1,94 @@
+/**
+ * Caveman Terse-Output Injector
+ *
+ * Appends a brevity instruction to the system prompt so the model produces
+ * terser responses, reducing OUTPUT tokens. Opt-in and off by default — it
+ * changes model behavior, so it's only applied when explicitly enabled.
+ *
+ * Enable with CAVEMAN_ENABLED=true. Level via CAVEMAN_LEVEL=lite|full|ultra
+ * (default: lite). Adapted from 9router's caveman injector / the caveman skill
+ * (https://github.com/JuliusBrussee/caveman).
+ *
+ * @module context/caveman
+ */
+const config = require("../config");
+const logger = require("../logger");
+const LEVELS = ["lite", "full", "ultra"];
+// Shared guardrails so brevity never corrupts the substance that matters.
+const BOUNDARIES =
+  "Code blocks, file paths, commands, errors, URLs: keep exact. " +
+  "Security warnings, irreversible-action confirmations, and multi-step ordered " +
+  "sequences: write in full normal prose. Resume terse style afterward.";
+const EXAMPLES =
+  'Not: "Sure! I\'d be happy to help. The issue is likely caused by..." ' +
+  'Yes: "Bug in auth middleware. Token expiry uses `<` not `<=`. Fix:"';
+const PERSISTENCE = "Apply this to every response unless a guardrail above applies.";
+const PROMPTS = {
+  lite: [
+    "Respond tersely. Keep grammar and full sentences but drop filler, hedging, and pleasantries (just/really/basically/sure/of course/I'd be happy to).",
+    "Pattern: state the thing, the action, the reason. Then the next step.",
+    EXAMPLES,
+    BOUNDARIES,
+    PERSISTENCE,
+  ].join(" "),
+  full: [
+    "Respond like a terse caveman. All technical substance stays exact; only fluff dies.",
+    "Drop articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries, and hedging. Fragments OK. Prefer short synonyms (big not extensive, fix not implement a solution for).",
+    "Pattern: [thing] [action] [reason]. [next step].",
+    EXAMPLES,
+    BOUNDARIES,
+    PERSISTENCE,
+  ].join(" "),
+  ultra: [
+    "Respond ultra-terse. Maximum compression. Telegraphic.",
+    "Abbreviate (DB/auth/config/req/res/fn/impl), strip conjunctions, use arrows for causality (X → Y). One word when one word is enough.",
+    "Pattern: [thing] → [result]. [fix].",
+    EXAMPLES,
+    BOUNDARIES,
+    PERSISTENCE,
+  ].join(" "),
+};
+const MARKER = "[brevity]";
+/** Resolve the configured level, falling back to "lite". */
+function resolveLevel(level) {
+  const l = String(level || config.caveman?.level || "lite").toLowerCase();
+  return LEVELS.includes(l) ? l : "lite";
+}
+/**
+ * Append the brevity instruction to a system prompt string.
+ * Idempotent — won't double-inject if the marker is already present.
+ *
+ * @param {string} system - Existing system prompt (may be empty).
+ * @param {object} [opts]
+ * @param {boolean} [opts.enabled] - Override config enablement.
+ * @param {string} [opts.level] - Override level.
+ * @returns {string} system prompt, possibly with brevity instruction appended.
+ */
+function injectCaveman(system, opts = {}) {
+  const enabled = opts.enabled ?? config.caveman?.enabled === true;
+  if (!enabled) return system || "";
+  const base = system || "";
+  if (base.includes(MARKER)) return base;
+  const level = resolveLevel(opts.level);
+  const instruction = `\n\n${MARKER} ${PROMPTS[level]}`;
+  logger.debug({ level }, "[Caveman] Injected brevity instruction into system prompt");
+  return base + instruction;
+}
+module.exports = {
+  injectCaveman,
+  LEVELS,
+};