lynkr 9.4.6 → 9.5.0
- package/README.md +46 -14
- package/install.sh +21 -5
- package/package.json +4 -2
- package/public/dashboard.html +13 -1
- package/scripts/check-native.js +97 -0
- package/src/clients/databricks.js +80 -3
- package/src/clients/openrouter-utils.js +15 -0
- package/src/config/index.js +9 -0
- package/src/context/caveman.js +94 -0
- package/src/context/tool-dedup.js +95 -0
- package/src/context/tool-result-compressor.js +106 -0
- package/src/dashboard/api.js +69 -18
- package/src/orchestrator/bypass.js +135 -0
- package/src/orchestrator/index.js +33 -2
- package/src/routing/index.js +39 -0
- package/src/routing/model-registry.js +89 -26
- package/src/routing/risk-analyzer.js +7 -2
- package/src/routing/session-affinity.js +96 -0
- package/src/routing/telemetry.js +16 -3
- package/.impeccable/live/config.json +0 -8
package/README.md
CHANGED
@@ -545,6 +545,28 @@ TOOL_INJECTION_ENABLED=false
 CODE_MODE_ENABLED=true
 ```
 
+Always-on (no config): **smart tool selection** (server mode), **RTK tool-result
+compression** (test/git/grep/lint/build/JSON output), **MCP tool dedup** (drops
+built-in WebSearch/WebFetch when an Exa/Tavily MCP tool is present), and
+**request bypass** (Claude CLI Warmup / title-extraction calls are answered
+locally, never hitting a provider).
+
+Optional **terse-output mode** to cut *output* tokens:
+```bash
+CAVEMAN_ENABLED=true # off by default — nudges the model to be concise
+CAVEMAN_LEVEL=lite # lite | full | ultra
+```
+
+### Cost tracking & model pricing
+Per-request cost is computed from a model-pricing registry (LiteLLM → models.dev,
+cached 24h) and recorded in telemetry. Models the registry doesn't know record
+`cost_usd=null` (logged once) rather than a fabricated price. Pin prices for
+unknown models:
+```bash
+# Per-1M-token USD prices, JSON keyed by model name
+MODEL_PRICE_OVERRIDES={"my-model":{"input":0.5,"output":1.5}}
+```
+
 ### Memory System (Titans-inspired)
 ```bash
 MEMORY_ENABLED=true
@@ -652,35 +674,45 @@ npm start
 
 ## Benchmark Results
 
-
+Head-to-head against **LiteLLM** on the **same backends** (Ollama `minimax-m2.5`, Moonshot, Azure OpenAI), 9 scenarios across 4 feature categories. Apples-to-apples comparison is Lynkr vs LiteLLM **billed tokens on the same scenario**. Run with `node benchmark-tier-routing.js`.
 
-
+> _Run: June 5, 2026 · Lynkr v9.3.2 · LiteLLM v1.87.1 · macOS, Apple Silicon._
 
-
+### Token reduction (vs LiteLLM, same model & prompt)
+
+| Mechanism | Lynkr | LiteLLM | Result |
 |---|---|---|---|
-
-
-| Large JSON grep result (60 items) | 3,458 | **427** | **87.6%** |
+| Smart tool selection (14 tools) | **959** tokens · $0.0044 | 2,085 tokens · $0.0091 | **53% fewer tokens, 52% cheaper** |
+| TOON compression (60-item grep JSON) | **427** tokens · $0.009 | 3,458 tokens · $0.018 | **87.6% fewer tokens, 50% cheaper** |
 
-Lynkr strips irrelevant tool schemas
+Lynkr strips irrelevant tool schemas (smart tool selection) and binary-compresses large JSON tool results (TOON) — both in-process, no added latency.
 
 ### Semantic cache
 
 | | Tokens billed | Response time |
 |---|---|---|
 | First call (cold) | 2,857 | 1,891ms |
-| **Second call — paraphrased, cache hit** | **0** | **171ms** |
+| **Second call — paraphrased, cache hit** | **0** (served from cache) | **171ms (11× faster)** |
 
-Near-identical prompts return cached responses in 171ms. Zero tokens billed on a cache hit.
+Near-identical prompts return cached responses in 171ms. Zero model tokens billed on a cache hit.
 
 ### Tier routing
 
-| Request |
-
-| "What does git stash do?" |
-| JWT vs cookies security analysis |
+| Request | Lynkr routes to | LiteLLM routes to |
+|---|---|---|
+| "What does git stash do?" | `minimax-m2.5` (local, free) | Ollama (local) |
+| JWT vs cookies security analysis | `moonshot` (cloud — correct) | **Ollama (local — wrong call)** |
+
+Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and escalates automatically. LiteLLM's `cost-based-routing` sends everything to the cheapest model regardless of complexity.
+
+### Cost projection (100,000 requests/month, same backend)
+
+| | Monthly cost | vs LiteLLM |
+|---|---|---|
+| LiteLLM | ~$818 | baseline |
+| **Lynkr** | **~$409** | **~50% cheaper** |
 
-
+_Based on a tool-heavy agentic session (TOON scenario). On equal footing — same provider, same model — Lynkr is cheaper due to token optimization._
 
 → [Full benchmark report with methodology](BENCHMARK_REPORT.md)
 
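The README text above describes pricing overrides as per-1M-token USD prices keyed by model name, with unknown models recording `cost_usd=null`. A minimal sketch of that arithmetic follows. The helper names `parsePriceOverrides` and `estimateCostUsd` are hypothetical and are not part of the package; only the `MODEL_PRICE_OVERRIDES` shape is taken from the README.

```javascript
// Hypothetical helpers for illustration only; not the package's API.
// Assumes the override shape shown in the README:
//   {"my-model": {"input": 0.5, "output": 1.5}}  (USD per 1M tokens)

function parsePriceOverrides(raw) {
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    const isPlainObject =
      parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
    return isPlainObject ? parsed : {};
  } catch {
    return {}; // malformed JSON is treated as "no overrides"
  }
}

function estimateCostUsd(overrides, model, inputTokens, outputTokens) {
  const known = Object.prototype.hasOwnProperty.call(overrides, model);
  if (!known) return null; // unknown model: null rather than a made-up price
  const price = overrides[model];
  const inUsd = ((inputTokens || 0) / 1e6) * (price.input || 0);
  const outUsd = ((outputTokens || 0) / 1e6) * (price.output || 0);
  return Number((inUsd + outUsd).toFixed(6));
}

const overrides = parsePriceOverrides('{"my-model":{"input":0.5,"output":1.5}}');
console.log(estimateCostUsd(overrides, "my-model", 2000, 1000)); // prints 0.0025
console.log(estimateCostUsd(overrides, "other-model", 2000, 1000)); // prints null
```

Returning `null` for an unpriced model keeps a missing price distinguishable from a genuinely free request, which a default of `0` would not.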
package/install.sh
CHANGED
@@ -108,8 +108,24 @@ clone_or_update() {
 install_dependencies() {
     print_info "Installing dependencies..."
     cd "$INSTALL_DIR"
-
+    # --omit=dev keeps optionalDependencies (better-sqlite3, hnswlib-node,
+    # tree-sitter) which back telemetry, the memory store and routing ML.
+    # The postinstall hook (scripts/check-native.js) verifies the native ABI
+    # and rebuilds if Node was upgraded — best-effort, never fails the install.
+    npm install --omit=dev
     print_success "Dependencies installed"
+
+    # Native optional modules need a C/C++ toolchain only if no prebuilt binary
+    # is available for this platform. They degrade gracefully if absent.
+    if ! node -e "const D=require('better-sqlite3'); new D(':memory:').close()" >/dev/null 2>&1; then
+        print_warning "Native module 'better-sqlite3' is not loadable."
+        echo " Telemetry, the memory store and sessions need it. To enable:"
+        echo " - Ensure a build toolchain is present (Xcode CLT on macOS, build-essential + python3 on Linux), then:"
+        echo " - ${BLUE}cd $INSTALL_DIR && npm run rebuild-native${NC}"
+        echo " Lynkr still runs without it (those features stay disabled)."
+    else
+        print_success "Native modules OK (telemetry, memory, sessions enabled)"
+    fi
 }
 
 # Create default .env file
@@ -131,7 +147,7 @@ create_env_file() {
 MODEL_PROVIDER=ollama
 
 # Server Configuration
-PORT=
+PORT=8081
 
 # Ollama Configuration (default for local development)
 OLLAMA_MODEL=qwen2.5-coder:7b
@@ -161,7 +177,7 @@ EOF
     print_info "📝 Configuration ready! Key settings:"
     echo " • Default provider: Ollama (local, offline)"
     echo " • Memory system: Enabled (learns from conversations)"
-    echo " • Port:
+    echo " • Port: 8081"
     echo ""
     print_warning "To use cloud providers (Databricks/OpenAI/Azure):"
     echo " Edit: ${BLUE}nano $INSTALL_DIR/.env${NC}"
@@ -220,7 +236,7 @@ print_next_steps() {
     echo " ${BLUE}lynkr${NC}"
     echo ""
     echo " 3. Configure Claude Code CLI:"
-    echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:
+    echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8081${NC}"
     echo " ${BLUE}claude${NC}"
     echo ""
     echo " ${YELLOW}Option B: Use Cloud Providers (Databricks/OpenAI/Azure)${NC}"
@@ -238,7 +254,7 @@ print_next_steps() {
     echo " ${BLUE}lynkr${NC}"
     echo ""
     echo " 3. Configure Claude Code CLI:"
-    echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:
+    echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8081${NC}"
     echo " ${BLUE}export ANTHROPIC_API_KEY=any-non-empty-value${NC} ${GREEN}← Placeholder${NC}"
     echo " ${BLUE}claude${NC}"
     echo ""
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "lynkr",
-  "version": "9.
+  "version": "9.5.0",
   "description": "Self-hosted LLM gateway and tier-routing proxy for Claude Code, Cursor, and Codex. Routes across Ollama, AWS Bedrock, OpenRouter, Databricks, Azure OpenAI, llama.cpp, and LM Studio with prompt caching, MCP tools, and 60-80% cost savings.",
   "main": "index.js",
   "bin": {
@@ -8,13 +8,15 @@
     "lynkr-setup": "scripts/setup.js"
   },
   "scripts": {
+    "postinstall": "node scripts/check-native.js",
+    "rebuild-native": "node scripts/check-native.js",
     "prestart": "node -e \"if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}\" && docker compose --profile headroom up -d --build headroom 2>/dev/null || echo 'Headroom skipped (disabled or Docker not running)'",
     "start": "node index.js 2>&1 | npx pino-pretty --sync",
     "stop": "node -e \"if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}\" && docker compose --profile headroom down || echo 'Headroom skipped (disabled or Docker not running)'",
     "dev": "nodemon index.js",
     "lint": "eslint src index.js",
     "test": "npm run test:unit && npm run test:performance",
-    "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js test/openai-integration.test.js test/toon-compression.test.js test/llamacpp-integration.test.js test/resilience.test.js test/telemetry-routing.test.js test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js test/distill.test.js test/large-payload.test.js test/code-mode.test.js test/prompt-cache-injection.test.js test/risk-analyzer.test.js test/interaction-block.test.js test/preflight.test.js",
+    "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js test/openai-integration.test.js test/toon-compression.test.js test/llamacpp-integration.test.js test/resilience.test.js test/telemetry-routing.test.js test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js test/distill.test.js test/large-payload.test.js test/code-mode.test.js test/prompt-cache-injection.test.js test/risk-analyzer.test.js test/interaction-block.test.js test/preflight.test.js test/token-reduction.test.js test/session-affinity.test.js test/model-registry-cost.test.js",
     "test:memory": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js",
     "test:new-features": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js",
     "test:performance": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/hybrid-routing-performance.test.js && DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-tests.js",
package/public/dashboard.html
CHANGED
@@ -244,6 +244,7 @@ const App = {
 const t = d.today;
 const s = d.stats;
 
+const tierLabel = t => t === 'default' ? 'default' : String(t).toLowerCase();
 const providerCards = d.providers.length === 0
   ? `<p class="text-slate-500 text-sm">No providers configured</p>`
   : d.providers.map(p => `
@@ -251,10 +252,21 @@ const App = {
     <div class="flex items-center gap-2">
       <span class="status-dot ${providerDot(p.type)}"></span>
       <span class="text-sm font-medium text-slate-200">${p.name}</span>
+      ${(p.tiers || []).map(t => `<span class="badge bg-slate-600/60 text-slate-300">${tierLabel(t)}</span>`).join('')}
     </div>
     <span class="text-xs ${p.type === 'local' ? 'text-green-400' : 'text-blue-400'}">${p.type}</span>
   </div>`).join('');
 
+const providerWarnings = (d.providerWarnings || []).map(w => `
+  <div class="flex items-center justify-between bg-amber-500/10 border border-amber-500/30 rounded-lg px-4 py-3">
+    <div class="flex items-center gap-2">
+      <span class="text-amber-400 text-sm">⚠</span>
+      <span class="text-sm font-medium text-amber-200">${w.name}</span>
+      ${(w.tiers || []).map(t => `<span class="badge bg-amber-500/20 text-amber-300">${tierLabel(t)}</span>`).join('')}
+    </div>
+    <span class="text-xs text-amber-400">no credentials</span>
+  </div>`).join('');
+
 const recentRows = (d.recentRequests || []).map(r => `
   <tr class="table-row border-b border-slate-700/50">
     <td class="py-2 px-3 text-xs text-slate-500">${fmt.ago(r.timestamp)}</td>
@@ -279,7 +291,7 @@ const App = {
 <!-- Providers -->
 ${card(`
   <h3 class="text-sm font-semibold text-slate-300 mb-3">Configured Providers</h3>
-  <div class="flex flex-col gap-2">${providerCards}</div>
+  <div class="flex flex-col gap-2">${providerCards}${providerWarnings}</div>
 `)}
 
 <!-- 24h Stats -->
package/scripts/check-native.js
ADDED
@@ -0,0 +1,97 @@
+#!/usr/bin/env node
+/**
+ * Native module ABI guard (postinstall).
+ *
+ * better-sqlite3 (and the other native optionalDependencies) are compiled
+ * against a specific Node ABI. When Node is upgraded, the prebuilt/compiled
+ * binary stops loading with:
+ *
+ * "was compiled against a different Node.js version using
+ * NODE_MODULE_VERSION 115. This version of Node.js requires
+ * NODE_MODULE_VERSION 141."
+ *
+ * The failure is silent at runtime — telemetry, request logs, and the memory
+ * store all sit behind try/catch and simply go empty. This probe detects the
+ * mismatch and rebuilds the native modules so it self-heals on `npm install`.
+ *
+ * It is intentionally best-effort: it NEVER exits non-zero, so it can't break
+ * `npm install` on machines without a build toolchain (the modules are
+ * optional and the app degrades gracefully without them).
+ */
+
+const { execSync } = require("child_process");
+
+// Native optionalDependencies that are ABI-sensitive. If Node changed, all of
+// them are stale, so we rebuild the set in one pass.
+const NATIVE_DEPS = [
+  "better-sqlite3",
+  "hnswlib-node",
+  "tree-sitter",
+  "tree-sitter-javascript",
+  "tree-sitter-python",
+  "tree-sitter-typescript",
+];
+
+function log(msg) {
+  console.log(`[check-native] ${msg}`);
+}
+
+/**
+ * Probe better-sqlite3 — the canary. `require()` alone is not enough: the
+ * native addon only loads when a Database is instantiated.
+ * @returns {"ok"|"absent"|"mismatch"}
+ */
+function probe() {
+  let Database;
+  try {
+    Database = require("better-sqlite3");
+  } catch (err) {
+    if (err && err.code === "MODULE_NOT_FOUND") return "absent";
+    return "mismatch";
+  }
+  try {
+    const db = new Database(":memory:");
+    db.close();
+    return "ok";
+  } catch (err) {
+    if (/NODE_MODULE_VERSION|different Node\.js version|invalid ELF|dlopen|\.node/i.test(err.message || "")) {
+      return "mismatch";
+    }
+    // Some other instantiation error — not an ABI issue we can fix by rebuild.
+    return "ok";
+  }
+}
+
+function main() {
+  const status = probe();
+
+  if (status === "absent") {
+    // Optional dependency not installed (e.g. build skipped). Nothing to do.
+    return;
+  }
+  if (status === "ok") {
+    return;
+  }
+
+  log("native module ABI mismatch detected (Node was likely upgraded). Rebuilding native modules…");
+  try {
+    execSync(`npm rebuild ${NATIVE_DEPS.join(" ")}`, { stdio: "inherit" });
+  } catch {
+    log("rebuild did not complete (a build toolchain may be missing). Continuing — native features will be disabled until you run: npm rebuild better-sqlite3");
+    return;
+  }
+
+  // Re-probe to report the outcome.
+  if (probe() === "ok") {
+    log("native modules rebuilt successfully.");
+  } else {
+    log("native modules still not loadable after rebuild. Run `npm rebuild better-sqlite3` manually.");
+  }
+}
+
+try {
+  main();
+} catch (err) {
+  // Never fail the install.
+  log(`skipped (${err.message})`);
+}
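The probe above separates a missing optional module from an ABI mismatch by inspecting the thrown error. The classification step can be sketched on its own so it is testable without a native addon installed. `classifyLoadError` is a hypothetical name, and the sketch simplifies the script by adding an explicit "other" bucket instead of returning "ok" or "mismatch" depending on which call threw; the regular expression is copied from the script.

```javascript
// Illustrative sketch, not the package's code. The pattern is the one used in
// scripts/check-native.js; classifyLoadError and the "other" result are
// assumptions made for this example.
const ABI_PATTERN =
  /NODE_MODULE_VERSION|different Node\.js version|invalid ELF|dlopen|\.node/i;

function classifyLoadError(err) {
  // A module that was never installed is not a rebuild candidate.
  if (err && err.code === "MODULE_NOT_FOUND") return "absent";
  const message = (err && err.message) || "";
  // Loader and ABI failures mention the module version or the .node binary.
  return ABI_PATTERN.test(message) ? "mismatch" : "other";
}

const abiError = new Error(
  "was compiled against a different Node.js version using NODE_MODULE_VERSION 115."
);
console.log(classifyLoadError({ code: "MODULE_NOT_FOUND" })); // prints "absent"
console.log(classifyLoadError(abiError)); // prints "mismatch"
console.log(classifyLoadError(new Error("disk full"))); // prints "other"
```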
package/src/clients/databricks.js
CHANGED
@@ -1506,10 +1506,16 @@ async function invokeMoonshot(body) {
     "claude-haiku-4-5-20251001": "kimi-k2-turbo-preview",
     "claude-haiku-4-5": "kimi-k2-turbo-preview",
     "claude-3-haiku": "kimi-k2-turbo-preview",
+    // moonshot-v1-auto 400s with "tokenization failed" (its server-side auto
+    // context-size pass fails on large tool-bearing payloads). Remap to a
+    // fixed model that's broadly available on api.moonshot.ai.
+    "moonshot-v1-auto": "moonshot-v1-128k",
   };
 
   const requestedModel = body._tierModel || body.model || config.moonshot.model;
-
+  let mappedModel = modelMap[requestedModel] || config.moonshot.model || "kimi-k2-turbo-preview";
+  // Guard against the deprecated auto model arriving via config too.
+  if (mappedModel === "moonshot-v1-auto") mappedModel = "moonshot-v1-128k";
 
   // Convert messages using existing utility
   const messages = convertAnthropicMessagesToOpenRouter(body.messages || []);
@@ -1522,12 +1528,18 @@ async function invokeMoonshot(body) {
     messages.unshift({ role: "system", content: systemContent });
   }
 
+  // kimi-k2.x (k2.5 / k2.6 …) are thinking models that only accept
+  // temperature: 1 — any other value 400s with "invalid temperature".
+  const isKimiThinking = /^kimi-k2/i.test(mappedModel);
+
   const moonshotBody = {
     model: mappedModel,
     messages,
     max_tokens: body.max_tokens || 16384,
-
-    top_p
+    // kimi-k2.x thinking models pin sampling params: temperature must be 1
+    // and top_p must be 0.95 — any other value 400s.
+    temperature: isKimiThinking ? 1 : (body.temperature ?? 0.7),
+    top_p: isKimiThinking ? 0.95 : (body.top_p ?? 1.0),
     stream: false, // Force non-streaming - OpenAI SSE to Anthropic SSE conversion not implemented
   };
 
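The two hunks above remap `moonshot-v1-auto` to a fixed model and pin sampling parameters for `kimi-k2*` models. A minimal sketch of that decision as a pure function follows. `resolveMoonshotSampling` and `MODEL_REMAP` are hypothetical names, and the sketch passes unmapped models through unchanged, whereas the diff falls back to `config.moonshot.model`.

```javascript
// Illustrative sketch under stated assumptions; not the package's code.
// Defaults (0.7, 1.0) and pinned values (1, 0.95) are taken from the diff.
const MODEL_REMAP = { "moonshot-v1-auto": "moonshot-v1-128k" };

function resolveMoonshotSampling(requestedModel, body = {}) {
  // Assumption: unmapped models pass through as requested.
  const model = MODEL_REMAP[requestedModel] || requestedModel;
  const pinned = /^kimi-k2/i.test(model);
  return {
    model,
    // Pinned models ignore caller-supplied sampling values entirely.
    temperature: pinned ? 1 : (body.temperature ?? 0.7),
    top_p: pinned ? 0.95 : (body.top_p ?? 1.0),
  };
}

console.log(resolveMoonshotSampling("moonshot-v1-auto", { temperature: 0.2 }));
console.log(resolveMoonshotSampling("kimi-k2.5", { temperature: 0.2, top_p: 0.5 }));
```

The pattern `/^kimi-k2/i` matches any model name beginning with `kimi-k2`, so `kimi-k2-turbo-preview` is pinned as well as the `k2.5` and `k2.6` names mentioned in the comment.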
@@ -2027,6 +2039,65 @@ async function invokeCodex(body) {
   };
 }
 
+/**
+ * Compute request cost in USD from model pricing × token usage.
+ * Registry returns per-1M-token prices ({ input, output }); returns null when
+ * pricing is unknown so we don't record misleading zeros.
+ */
+const _unknownCostWarned = new Set();
+function computeCostUsd(model, inputTokens, outputTokens) {
+  try {
+    const { getModelRegistrySync } = require("../routing/model-registry");
+    const reg = getModelRegistrySync && getModelRegistrySync();
+    const cost = reg?.getCost?.(model);
+    if (!cost) return null;
+    // Unknown model → record null (not a fabricated default), warn once so the
+    // gap is visible and can be fixed via MODEL_PRICE_OVERRIDES.
+    if (cost.unknown) {
+      if (model && !_unknownCostWarned.has(model)) {
+        _unknownCostWarned.add(model);
+        logger.warn({ model }, "[Cost] No pricing for model — recording cost_usd=null. Set MODEL_PRICE_OVERRIDES to fix.");
+      }
+      return null;
+    }
+    if (cost.input == null && cost.output == null) return null;
+    const inUsd = ((inputTokens || 0) / 1e6) * (cost.input || 0);
+    const outUsd = ((outputTokens || 0) / 1e6) * (cost.output || 0);
+    return Number((inUsd + outUsd).toFixed(6));
+  } catch {
+    return null;
+  }
+}
+
+// Telemetry prompt/response text is always captured (truncated) to build the
+// routing ML training corpus. Stored locally in .lynkr/telemetry.db only.
+const TELEMETRY_TEXT_MAXLEN = 2000;
+
+/** Flatten the latest user message to plain text (for telemetry capture). */
+function captureRequestText(body) {
+  const messages = body?.messages;
+  if (!Array.isArray(messages)) return null;
+  for (let i = messages.length - 1; i >= 0; i--) {
+    const m = messages[i];
+    if (m?.role !== "user") continue;
+    let text = "";
+    if (typeof m.content === "string") text = m.content;
+    else if (Array.isArray(m.content)) {
+      text = m.content.filter((b) => b?.type === "text").map((b) => b.text || "").join(" ");
+    }
+    if (text) return text.slice(0, TELEMETRY_TEXT_MAXLEN);
+  }
+  return null;
+}
+
+/** Flatten an Anthropic response's text blocks to plain text (for telemetry). */
+function captureResponseText(resultJson) {
+  const content = resultJson?.content;
+  if (!Array.isArray(content)) return null;
+  const text = content.filter((b) => b?.type === "text").map((b) => b.text || "").join(" ");
+  return text ? text.slice(0, TELEMETRY_TEXT_MAXLEN) : null;
+}
+
 async function invokeModel(body, options = {}) {
   const { determineProviderSmart, isFallbackEnabled, getFallbackProvider } = require("./routing");
   const metricsCollector = getMetricsCollector();
@@ -2233,6 +2304,9 @@ async function invokeModel(body, options = {}) {
       circuit_breaker_state: breaker.state,
       quality_score: qualityScore,
       tokens_per_second: outputTokens && latency > 0 ? outputTokens / (latency / 1000) : null,
+      cost_usd: computeCostUsd(routingDecision.model || body._tierModel, inputTokens, outputTokens),
+      request_text: captureRequestText(body),
+      response_text: captureResponseText(result.json),
     });
 
     // Return result with provider info and routing decision for headers
@@ -2394,6 +2468,9 @@ async function invokeModel(body, options = {}) {
           { status_code: 200, output_tokens: fbOutputTokens, tool_calls_made: fbToolCalls, was_fallback: true, retry_count: 0, latency_ms: Date.now() - startTime }
         ),
         tokens_per_second: fbOutputTokens && fallbackLatency > 0 ? fbOutputTokens / (fallbackLatency / 1000) : null,
+        cost_usd: computeCostUsd(routingDecision.model || body._tierModel, fbInputTokens, fbOutputTokens),
+        request_text: captureRequestText(body),
+        response_text: captureResponseText(fallbackResult.json),
       });
 
       // Return result with actual provider used (fallback provider) and routing decision
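The telemetry capture added above flattens the most recent user message that contains text and truncates it to 2000 characters. The behaviour is easiest to see on a small conversation, so here is a standalone sketch. `flattenText` and `latestUserText` are hypothetical names; the Anthropic-style message shape (string content or an array of typed blocks) is assumed from the diff.

```javascript
// Illustrative sketch, not the package's code.
const TEXT_MAXLEN = 2000; // same limit as TELEMETRY_TEXT_MAXLEN in the diff

// Join the text blocks of a message; non-text blocks contribute nothing.
function flattenText(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .filter((block) => block && block.type === "text")
    .map((block) => block.text || "")
    .join(" ");
}

// Walk backwards and return the newest user message that has any text.
function latestUserText(messages) {
  if (!Array.isArray(messages)) return null;
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (!message || message.role !== "user") continue;
    const text = flattenText(message.content);
    if (text) return text.slice(0, TEXT_MAXLEN);
  }
  return null;
}

const sample = [
  { role: "user", content: "first question" },
  { role: "assistant", content: [{ type: "text", text: "answer" }] },
  { role: "user", content: [{ type: "image" }] },
];
console.log(latestUserText(sample)); // prints "first question"
```

Because a user turn with no text blocks is skipped rather than recorded as empty, the captured text can come from an earlier turn than the one that triggered the request.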
package/src/clients/openrouter-utils.js
CHANGED
@@ -176,6 +176,21 @@ function convertAnthropicMessagesToOpenRouter(anthropicMessages) {
     }
   }
 
+  // Kimi/Moonshot (and some OpenAI-compatible APIs) reject a message whose
+  // content is an empty string with "Invalid request: tokenization failed".
+  // This happens when a turn had only non-text blocks (thinking / image /
+  // stripped content) and flattened to "". Replace empty/whitespace-only
+  // content with a single space — but never touch an assistant message that
+  // carries tool_calls, where content: null is intentional and required.
+  for (const m of converted) {
+    if (m.role === 'tool') continue;
+    const hasToolCalls = Array.isArray(m.tool_calls) && m.tool_calls.length > 0;
+    if (hasToolCalls) continue;
+    if (typeof m.content !== 'string' || m.content.trim() === '') {
+      m.content = ' ';
+    }
+  }
+
   // Log the converted messages for debugging
   logger.debug({
     inputCount: anthropicMessages.length,
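The loop added above pads empty message content while leaving tool results and tool-calling assistant turns alone. A standalone sketch of the same rule makes the three cases concrete. `padEmptyContent` is a hypothetical name, and the OpenAI-style message shape (`role`, `content`, `tool_calls`) is assumed from the diff.

```javascript
// Illustrative sketch, not the package's code. Mutates and returns the array,
// mirroring the in-place loop in the diff.
function padEmptyContent(messages) {
  for (const message of messages) {
    if (message.role === "tool") continue; // tool results are left untouched
    const hasToolCalls =
      Array.isArray(message.tool_calls) && message.tool_calls.length > 0;
    if (hasToolCalls) continue; // content: null is valid alongside tool_calls
    if (typeof message.content !== "string" || message.content.trim() === "") {
      message.content = " "; // a single space avoids the empty-string rejection
    }
  }
  return messages;
}

const padded = padEmptyContent([
  { role: "user", content: "" },
  { role: "assistant", content: null, tool_calls: [{ id: "call_1" }] },
  { role: "tool", content: "" },
  { role: "assistant", content: "done" },
]);
console.log(padded.map((m) => JSON.stringify(m.content)).join(" "));
```

As written, the `typeof` check also replaces any non-string content with a space, so the rule only preserves meaning if the converter always emits string content for these roles.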
package/src/config/index.js
CHANGED
@@ -208,6 +208,11 @@ const tokenBudgetWarning = Number.parseInt(process.env.TOKEN_BUDGET_WARNING ?? "
 const tokenBudgetMax = Number.parseInt(process.env.TOKEN_BUDGET_MAX ?? "180000", 10);
 const tokenBudgetEnforcement = process.env.TOKEN_BUDGET_ENFORCEMENT !== "false"; // default true
 
+// Caveman terse-output injection (opt-in, off by default)
+const cavemanEnabled = process.env.CAVEMAN_ENABLED === "true";
+const cavemanLevel = (process.env.CAVEMAN_LEVEL ?? "lite").toLowerCase();
+
+
 // TOON payload compression (opt-in)
 const toonEnabled = process.env.TOON_ENABLED === "true"; // default false
 const toonMinBytes = Number.parseInt(process.env.TOON_MIN_BYTES ?? "4096", 10);
@@ -641,6 +646,10 @@ var config = {
   toolResultCompression: {
     enabled: true,
   },
+  caveman: {
+    enabled: cavemanEnabled,
+    level: cavemanLevel,
+  },
   server: {
     jsonLimit: process.env.REQUEST_JSON_LIMIT ?? "1gb",
   },
package/src/context/caveman.js
ADDED
@@ -0,0 +1,94 @@
+/**
+ * Caveman Terse-Output Injector
+ *
+ * Appends a brevity instruction to the system prompt so the model produces
+ * terser responses, reducing OUTPUT tokens. Opt-in and off by default — it
+ * changes model behavior, so it's only applied when explicitly enabled.
+ *
+ * Enable with CAVEMAN_ENABLED=true. Level via CAVEMAN_LEVEL=lite|full|ultra
+ * (default: lite). Adapted from 9router's caveman injector / the caveman skill
+ * (https://github.com/JuliusBrussee/caveman).
+ *
+ * @module context/caveman
+ */
+
+const config = require("../config");
+const logger = require("../logger");
+
+const LEVELS = ["lite", "full", "ultra"];
+
+// Shared guardrails so brevity never corrupts the substance that matters.
+const BOUNDARIES =
+  "Code blocks, file paths, commands, errors, URLs: keep exact. " +
+  "Security warnings, irreversible-action confirmations, and multi-step ordered " +
+  "sequences: write in full normal prose. Resume terse style afterward.";
+
+const EXAMPLES =
+  'Not: "Sure! I\'d be happy to help. The issue is likely caused by..." ' +
+  'Yes: "Bug in auth middleware. Token expiry uses `<` not `<=`. Fix:"';
+
+const PERSISTENCE = "Apply this to every response unless a guardrail above applies.";
+
+const PROMPTS = {
+  lite: [
+    "Respond tersely. Keep grammar and full sentences but drop filler, hedging, and pleasantries (just/really/basically/sure/of course/I'd be happy to).",
+    "Pattern: state the thing, the action, the reason. Then the next step.",
+    EXAMPLES,
+    BOUNDARIES,
+    PERSISTENCE,
+  ].join(" "),
+
+  full: [
+    "Respond like a terse caveman. All technical substance stays exact; only fluff dies.",
+    "Drop articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries, and hedging. Fragments OK. Prefer short synonyms (big not extensive, fix not implement a solution for).",
+    "Pattern: [thing] [action] [reason]. [next step].",
+    EXAMPLES,
+    BOUNDARIES,
+    PERSISTENCE,
+  ].join(" "),
+
+  ultra: [
+    "Respond ultra-terse. Maximum compression. Telegraphic.",
+    "Abbreviate (DB/auth/config/req/res/fn/impl), strip conjunctions, use arrows for causality (X → Y). One word when one word is enough.",
+    "Pattern: [thing] → [result]. [fix].",
+    EXAMPLES,
+    BOUNDARIES,
+    PERSISTENCE,
+  ].join(" "),
+};
+
+const MARKER = "[brevity]";
+
+/** Resolve the configured level, falling back to "lite". */
+function resolveLevel(level) {
+  const l = String(level || config.caveman?.level || "lite").toLowerCase();
+  return LEVELS.includes(l) ? l : "lite";
+}
+
+/**
+ * Append the brevity instruction to a system prompt string.
+ * Idempotent — won't double-inject if the marker is already present.
+ *
+ * @param {string} system - Existing system prompt (may be empty).
+ * @param {object} [opts]
+ * @param {boolean} [opts.enabled] - Override config enablement.
+ * @param {string} [opts.level] - Override level.
+ * @returns {string} system prompt, possibly with brevity instruction appended.
+ */
+function injectCaveman(system, opts = {}) {
+  const enabled = opts.enabled ?? config.caveman?.enabled === true;
+  if (!enabled) return system || "";
+
+  const base = system || "";
+  if (base.includes(MARKER)) return base;
+
+  const level = resolveLevel(opts.level);
+  const instruction = `\n\n${MARKER} ${PROMPTS[level]}`;
+  logger.debug({ level }, "[Caveman] Injected brevity instruction into system prompt");
+  return base + instruction;
+}
+
+module.exports = {
+  injectCaveman,
+  LEVELS,
+};
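The injector above relies on a marker string to stay idempotent and on a level fallback to tolerate bad configuration. Both properties can be exercised without the package's `config` and `logger` modules. The sketch below is a cut-down stand-in: `injectBrevity` is a hypothetical name and the prompt texts are shortened placeholders, not the full prompts from the diff.

```javascript
// Illustrative sketch, not the package's code. The marker and level names come
// from the diff; the prompt strings are abbreviated placeholders.
const MARKER = "[brevity]";
const LEVELS = ["lite", "full", "ultra"];
const PROMPTS = {
  lite: "Respond tersely.",
  full: "Respond like a terse caveman.",
  ultra: "Respond ultra-terse.",
};

// Unknown or missing levels fall back to "lite".
function resolveLevel(level) {
  const normalized = String(level || "lite").toLowerCase();
  return LEVELS.includes(normalized) ? normalized : "lite";
}

function injectBrevity(system, { enabled = false, level } = {}) {
  const base = system || "";
  // Skip when disabled or when a previous pass already added the marker.
  if (!enabled || base.includes(MARKER)) return base;
  return `${base}\n\n${MARKER} ${PROMPTS[resolveLevel(level)]}`;
}

const once = injectBrevity("You are a coding assistant.", { enabled: true, level: "ULTRA" });
const twice = injectBrevity(once, { enabled: true, level: "full" });
console.log(once === twice); // prints true
console.log(resolveLevel("max")); // prints "lite"
```

One consequence of marker-based idempotence is visible in the example: a second call with a different level leaves the first instruction in place rather than replacing it.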