npm - claw-llm-router - Versions diffs - 1.0.0 - Mend

claw-llm-router 1.0.0

Files changed (20)

package/LICENSE +21 -0
package/README.md +336 -0
package/classifier.ts +516 -0
package/docs/ARCHITECTURE.md +82 -0
package/docs/CLASSIFIER.md +146 -0
package/docs/PROVIDERS.md +228 -0
package/index.ts +602 -0
package/models.ts +104 -0
package/openclaw.plugin.json +55 -0
package/package.json +52 -0
package/provider.ts +30 -0
package/providers/anthropic.ts +332 -0
package/providers/gateway.ts +128 -0
package/providers/index.ts +135 -0
package/providers/model-override.ts +81 -0
package/providers/openai-compatible.ts +126 -0
package/providers/types.ts +29 -0
package/proxy.ts +282 -0
package/router-logger.ts +101 -0
package/tier-config.ts +288 -0

package/docs/CLASSIFIER.md ADDED

@@ -0,0 +1,146 @@
+# Prompt Classifier
+The classifier determines which tier (SIMPLE, MEDIUM, COMPLEX, REASONING) a user prompt belongs to, so the router can pick the most cost-effective model.
+## Architecture
+```
+User prompt
+    │
+    ▼
+┌──────────────────┐
+│  Rule-based       │
+│  (15 dimensions)  │
+└──────────────────┘
+    │
+    ▼
+  Tier assigned
+```
+The classifier is 100% local: a 15-dimension weighted scoring pass that runs in <1ms with no API calls. Ambiguous prompts (those scoring near a tier boundary) keep their rule-based tier instead of being verified by an external LLM. Such prompts tend to land in MEDIUM, which is cheap enough to be a safe default.
+### Why no LLM fallback?
+An earlier version used a hybrid approach: when rule-based confidence was below 0.70, it called a cheap LLM to verify. This was removed because:
+1. **Net cost increase**: The LLM classifier correctly downgraded ~35% of ambiguous prompts to SIMPLE (saving ~$0.004 each), but upgraded ~25% to COMPLEX/REASONING (costing ~$0.03-0.04 each). The upgrades dominated, making the LLM fallback a net cost of $3.50-$74/month depending on traffic.
+2. **Latency**: Added 100-500ms per LLM call on ~33% of messages.
+3. **ClawRouter precedent**: ClawRouter uses 100% local classification with no LLM fallback and reports 70-80% cost savings. The savings come from the non-ambiguous prompts (both approaches classify these identically).
+4. **MEDIUM is a safe default**: Cheap enough to not waste money, capable enough to handle most tasks.
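As a rough check on point 1, the per-prompt arithmetic can be worked through directly. The sketch below uses only the percentages and dollar figures quoted in the list above. Taking the midpoint of the $0.03-0.04 range is an assumption, and the cost of the verification call itself is left out.

```typescript
// Figures quoted in the list above. The midpoint for upgrades is an assumption.
const downgradeShare = 0.35; // share of ambiguous prompts downgraded to SIMPLE
const savingPerDowngrade = 0.004; // USD saved per downgrade
const upgradeShare = 0.25; // share upgraded to COMPLEX/REASONING
const costPerUpgrade = 0.035; // USD, assumed midpoint of $0.03-0.04

// Expected change in spend per ambiguous prompt (positive means more expensive).
function netCostPerAmbiguousPrompt(): number {
  return upgradeShare * costPerUpgrade - downgradeShare * savingPerDowngrade;
}

console.log(netCostPerAmbiguousPrompt().toFixed(5));
```

Under these assumptions each ambiguous prompt costs roughly $0.007 more with the LLM fallback than without it, before counting the fallback call.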
+## Rule-Based Classifier
+### 15 Scoring Dimensions
+Each dimension scores the prompt on a scale (typically -1.0 to 1.0). The weighted sum determines the tier.
+| #   | Dimension             | Weight | What it detects                                                                |
+| --- | --------------------- | ------ | ------------------------------------------------------------------------------ |
+| 1   | `tokenCount`          | 0.08   | Prompt length (<50 tokens = -1.0, >500 = 1.0)                                  |
+| 2   | `codePresence`        | 0.14   | Code keywords: `function`, `class`, `import`, ` ``` `, etc.                    |
+| 3   | `reasoningMarkers`    | 0.17   | `prove`, `theorem`, `step by step`, `chain of thought`, etc.                   |
+| 4   | `technicalTerms`      | 0.09   | `algorithm`, `kubernetes`, `distributed`, `architecture`, etc.                 |
+| 5   | `creativeMarkers`     | 0.05   | `story`, `poem`, `brainstorm`, `write a`, etc.                                 |
+| 6   | `simpleIndicators`    | 0.11   | `what is`, `define`, `hello`, `capital of` → scores -1.0 (pulls toward SIMPLE) |
+| 7   | `multiStepPatterns`   | 0.11   | Regex: `first.*then`, `step \d`, numbered lists                                |
+| 8   | `questionComplexity`  | 0.04   | 4+ question marks in the prompt                                                |
+| 9   | `imperativeVerbs`     | 0.03   | `build`, `create`, `implement`, `deploy`, etc.                                 |
+| 10  | `constraintCount`     | 0.04   | `at most`, `within`, `maximum`, `budget`, etc.                                 |
+| 11  | `outputFormat`        | 0.03   | `json`, `yaml`, `table`, `format as`, etc.                                     |
+| 12  | `referenceComplexity` | 0.02   | `the docs`, `the api`, `attached`, `above`, etc.                               |
+| 13  | `negationComplexity`  | 0.01   | `don't`, `avoid`, `without`, `except`, etc.                                    |
+| 14  | `domainSpecificity`   | 0.02   | `quantum`, `fpga`, `genomics`, `zero-knowledge`, etc.                          |
+| 15  | `agenticTask`         | 0.06   | `read file`, `edit`, `deploy`, `fix`, `debug`, `step 1`, etc.                  |
+Weights sum to 1.0 and are aligned with [ClawRouter](https://github.com/claw-project/claw-router)'s 14-dimension scheme, scaled to accommodate our 15th dimension (`agenticTask`).
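A minimal sketch of the weighted sum, assuming each dimension has already produced a score in the -1.0 to 1.0 range. The weights are copied from the table above; the function name `weightedScore` and the shape of its input are illustrative, not taken from `classifier.ts`.

```typescript
// Weights copied from the table above.
const WEIGHTS: Record<string, number> = {
  tokenCount: 0.08,
  codePresence: 0.14,
  reasoningMarkers: 0.17,
  technicalTerms: 0.09,
  creativeMarkers: 0.05,
  simpleIndicators: 0.11,
  multiStepPatterns: 0.11,
  questionComplexity: 0.04,
  imperativeVerbs: 0.03,
  constraintCount: 0.04,
  outputFormat: 0.03,
  referenceComplexity: 0.02,
  negationComplexity: 0.01,
  domainSpecificity: 0.02,
  agenticTask: 0.06,
};

// Hypothetical helper: dimensions missing from the input count as 0.
function weightedScore(dimensionScores: Record<string, number>): number {
  let total = 0;
  for (const name of Object.keys(WEIGHTS)) {
    total += WEIGHTS[name] * (dimensionScores[name] ?? 0);
  }
  return total;
}
```

For example, a short greeting that scores -1.0 on both `tokenCount` and `simpleIndicators` and 0 elsewhere sums to -0.19, which falls below zero.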
+### Tier Boundaries
+The weighted sum maps to a tier via fixed boundaries:
+| Score range | Tier      | Band width |
+| ----------- | --------- | ---------- |
+| < 0.00      | SIMPLE    | —          |
+| 0.00 – 0.30 | MEDIUM    | 0.30       |
+| 0.30 – 0.50 | COMPLEX   | 0.20       |
+| >= 0.50     | REASONING | —          |
+These boundaries match [ClawRouter](https://github.com/claw-project/claw-router)'s production-proven values. The MEDIUM band is intentionally wide (0.30) so that ambiguous prompts — which tend to cluster around boundaries — land confidently within MEDIUM rather than triggering expensive misrouting. With steepness=12.0, a score at the center of MEDIUM (0.15) has distance 0.15 to the nearest boundary, yielding confidence ~0.86.
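The boundary table can be read as a simple threshold cascade. In this sketch the function name `scoreToTier` is illustrative, and treating each upper bound as exclusive (so exactly 0.30 is COMPLEX) is an assumption, since the table does not say which side the shared endpoints belong to.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Maps a weighted sum to a tier using the fixed boundaries 0.00, 0.30, 0.50.
// Assumption: upper bounds are exclusive.
function scoreToTier(score: number): Tier {
  if (score < 0.0) return "SIMPLE";
  if (score < 0.3) return "MEDIUM";
  if (score < 0.5) return "COMPLEX";
  return "REASONING";
}
```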
+### Special Overrides
+These override the score-based mapping regardless of the weighted sum:
+| Condition                                                                                | Forced tier | Min confidence |
+| ---------------------------------------------------------------------------------------- | ----------- | -------------- |
+| >100k estimated tokens                                                                   | COMPLEX     | 0.95           |
+| 2+ reasoning keywords                                                                    | REASONING   | 0.85           |
+| 4+ complexity signals (technical + imperative + agentic) AND (multi-step OR long prompt) | COMPLEX     | 0.85           |
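One way to picture the override table is as a short chain of checks that runs before the score-based mapping. Everything about the input shape here is hypothetical (field names, how signals are counted, what counts as a long prompt), and checking the rows in table order is also an assumption.

```typescript
// Hypothetical input: counts and flags the classifier would have gathered.
interface OverrideInput {
  estimatedTokens: number;
  reasoningKeywords: number;
  complexitySignals: number; // technical + imperative + agentic matches
  multiStep: boolean;
  longPrompt: boolean;
}

interface TierOverride {
  tier: "COMPLEX" | "REASONING";
  minConfidence: number;
}

// Returns the forced tier, or null when no override applies.
// Assumption: rows are checked in the order the table lists them.
function findOverride(input: OverrideInput): TierOverride | null {
  if (input.estimatedTokens > 100000) {
    return { tier: "COMPLEX", minConfidence: 0.95 };
  }
  if (input.reasoningKeywords >= 2) {
    return { tier: "REASONING", minConfidence: 0.85 };
  }
  if (input.complexitySignals >= 4 && (input.multiStep || input.longPrompt)) {
    return { tier: "COMPLEX", minConfidence: 0.85 };
  }
  return null;
}
```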
+### Confidence Calculation
+Confidence measures how far the score is from the nearest tier boundary, using a sigmoid function:
+```
+confidence = sigmoid(distance_to_nearest_boundary)
+sigmoid(x) = 1 / (1 + exp(-12.0 * x))
+```
+Higher confidence means the score is well within a tier's range. Lower confidence means it's near a boundary. Either way, the rule-based tier is used directly.
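The formula above can be sketched in a few lines. The boundaries (0.00, 0.30, 0.50) and the steepness (12.0) come from this document; the function name `confidence` and computing the distance as the minimum absolute gap to any boundary are assumptions about how the real code does it.

```typescript
const BOUNDARIES = [0.0, 0.3, 0.5];
const STEEPNESS = 12.0;

// Confidence = sigmoid of the distance to the nearest tier boundary.
function confidence(score: number): number {
  const distance = Math.min(...BOUNDARIES.map((b) => Math.abs(score - b)));
  return 1 / (1 + Math.exp(-STEEPNESS * distance));
}

console.log(confidence(0.15).toFixed(2)); // center of MEDIUM
console.log(confidence(0.3).toFixed(2)); // exactly on a boundary
```

Because the distance is never negative, confidence bottoms out at 0.5 on a boundary and rises toward 1.0 as the score moves away from it. At the center of MEDIUM the distance is 0.15, so the value is sigmoid(1.8), about 0.86.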
+### Signals
+The classifier returns human-readable signal strings that explain why it chose a tier. Examples:
+- `short (3 tokens)` — prompt is very short
+- `simple (what is)` — matched a simple-indicator keyword
+- `code (function, class)` — matched code keywords
+- `reasoning (step by step, prove)` — matched reasoning markers
+These signals appear in the router logs for debugging.
+## Prompt Extraction
+Before classification, the proxy extracts the actual user text from potentially wrapped messages. This prevents system prompt keywords from polluting the classification.
+### Three extraction cases
+1. **Packed context** — OpenClaw group chats/subagents wrap history + current message:
+   ```
+   [Chat messages since your last reply - for context]
+   user: earlier message
+   assistant: earlier reply
+   [Current message - respond to this]
+   What is 2+2?
+   ```
+   The classifier only sees `What is 2+2?`.
+2. **Embedded system prompt** — Some OpenClaw paths (webchat) prepend the system prompt to the user message instead of sending it as a separate system-role message. If the system prompt text is found inside the user message, it's stripped before classification.
+3. **Long message without system role** — If there's no separate system message and the user message is >500 chars, the system prompt is likely embedded. The classifier takes the text after the last `\n\n` break as the actual user input, provided that trailing text is <500 chars.
+These extraction steps are critical — without them, system prompt keywords like `json`, `function`, or `code` cause misclassification (e.g., "3+1" classified as MEDIUM instead of SIMPLE because the system prompt mentioned JSON formatting).
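The three cases can be sketched as one function that tries each in turn. This is an illustration under stated assumptions, not the code in `proxy.ts`: the function name, its parameters, and the order in which the cases are checked are all hypothetical. Only the marker string and the 500-character thresholds come from the text above.

```typescript
const CURRENT_MARKER = "[Current message - respond to this]";
const MAX_USER_CHARS = 500;

function extractUserPrompt(
  userText: string,
  systemPrompt?: string, // known system prompt text, if any (hypothetical)
  hasSystemMessage = false, // true when a separate system-role message exists
): string {
  // Case 1: packed context. Keep only what follows the marker.
  const markerAt = userText.lastIndexOf(CURRENT_MARKER);
  if (markerAt !== -1) {
    return userText.slice(markerAt + CURRENT_MARKER.length).trim();
  }
  // Case 2: system prompt embedded in the user message. Strip it.
  if (systemPrompt && userText.includes(systemPrompt)) {
    return userText.replace(systemPrompt, "").trim();
  }
  // Case 3: long message, no system role. Take the text after the last blank line.
  if (!hasSystemMessage && userText.length > MAX_USER_CHARS) {
    const breakAt = userText.lastIndexOf("\n\n");
    if (breakAt !== -1) {
      const tail = userText.slice(breakAt + 2).trim();
      if (tail.length > 0 && tail.length < MAX_USER_CHARS) return tail;
    }
  }
  return userText;
}
```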
+## Forced Tier Override
+Users can bypass the classifier entirely by using tier-specific model IDs:
+| Model ID                                   | Tier      |
+| ------------------------------------------ | --------- |
+| `simple` or `claw-llm-router/simple`       | SIMPLE    |
+| `medium` or `claw-llm-router/medium`       | MEDIUM    |
+| `complex` or `claw-llm-router/complex`     | COMPLEX   |
+| `reasoning` or `claw-llm-router/reasoning` | REASONING |
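A small sketch of how the model ID table might be applied: strip the optional `claw-llm-router/` prefix, then match the remainder against the four tier names. The function name is illustrative, and returning `null` for any other model ID (meaning the classifier should run) is an assumption.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

const ROUTER_PREFIX = "claw-llm-router/";

// Returns the forced tier for a tier-specific model ID, or null otherwise.
function forcedTierFromModelId(modelId: string): Tier | null {
  const bare = modelId.startsWith(ROUTER_PREFIX)
    ? modelId.slice(ROUTER_PREFIX.length)
    : modelId;
  switch (bare) {
    case "simple":
      return "SIMPLE";
    case "medium":
      return "MEDIUM";
    case "complex":
      return "COMPLEX";
    case "reasoning":
      return "REASONING";
    default:
      return null;
  }
}
```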
+## Fallback Chain
+If a provider fails, the router retries on progressively higher tiers, following the chain for the tier it started on:
+| Starting tier | Fallback chain            |
+| ------------- | ------------------------- |
+| SIMPLE        | SIMPLE → MEDIUM → COMPLEX |
+| MEDIUM        | MEDIUM → COMPLEX          |
+| COMPLEX       | COMPLEX → REASONING       |
+| REASONING     | REASONING (no fallback)   |
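The fallback table lends itself to a lookup plus a retry loop. The sketch below copies the chains from the table; `callWithFallback` and its `attempt` callback are hypothetical names standing in for whatever the proxy actually uses to call a provider.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Chains copied from the table above.
const FALLBACK_CHAIN: Record<Tier, Tier[]> = {
  SIMPLE: ["SIMPLE", "MEDIUM", "COMPLEX"],
  MEDIUM: ["MEDIUM", "COMPLEX"],
  COMPLEX: ["COMPLEX", "REASONING"],
  REASONING: ["REASONING"],
};

// Tries each tier in the chain until one attempt succeeds.
// If every tier fails, the last error is rethrown.
async function callWithFallback<T>(
  start: Tier,
  attempt: (tier: Tier) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("no tiers to try");
  for (const tier of FALLBACK_CHAIN[start]) {
    try {
      return await attempt(tier);
    } catch (err) {
      lastError = err; // remember the failure and move up the chain
    }
  }
  throw lastError;
}
```

This relies on providers signalling failure by throwing, which matches the provider contract described in `PROVIDERS.md`.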

package/docs/PROVIDERS.md ADDED

@@ -0,0 +1,228 @@
+# Adding a New Provider
+This guide explains how to add a new LLM provider to the claw-llm-router.
+## Overview
+The router uses a **Strategy pattern** — each provider implements the `LLMProvider` interface and handles its own request/response format. The provider registry (`providers/index.ts`) picks the right provider based on the model spec.
+```mermaid
+flowchart TD
+    CALL[callProvider] --> RESOLVE[resolveProvider]
+    RESOLVE --> OAUTH{spec.isOAuth?}
+    OAUTH -->|Yes| PRIMARY{Router is primary?}
+    PRIMARY -->|No| GW[GatewayProvider]
+    PRIMARY -->|Yes| GWO[gateway-with-override]
+    OAUTH -->|No| CHECK{spec.isAnthropic?}
+    CHECK -->|Yes| ANT[AnthropicProvider]
+    CHECK -->|No| OAI[OpenAICompatibleProvider]
+```
+## The `LLMProvider` Interface
+Every provider implements this contract from `providers/types.ts`:
+```typescript
+interface LLMProvider {
+  readonly name: string;
+  chatCompletion(
+    body: Record<string, unknown>, // Original OpenAI-format request
+    spec: { modelId: string; apiKey: string; baseUrl: string },
+    stream: boolean,
+    res: ServerResponse,
+    log: PluginLogger,
+  ): Promise<void>;
+}
+```
+### Parameters
+- `body` — The original request body in OpenAI chat completions format
+- `spec` — Provider details (model ID, API key, base URL) resolved from tier config
+- `stream` — Whether the client requested streaming SSE
+- `res` — Node.js `ServerResponse` to write the response to
+- `log` — Logger with `info()`, `warn()`, `error()` methods
+### Contract
+- **Non-streaming**: Write a complete JSON response to `res` with `Content-Type: application/json`
+- **Streaming**: Write SSE events to `res` with `Content-Type: text/event-stream`, ending with `data: [DONE]\n\n`
+- **Errors**: Throw an `Error` — the proxy's fallback chain will catch it and try the next tier
+- **Response format**: Must be OpenAI chat completions format (the proxy expects it)
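To make the contract concrete, here is a minimal sketch of the two write paths. `ResponseLike` is a hypothetical stand-in that mirrors only the few `ServerResponse` methods used here, so the example stays self-contained. The helper names are illustrative and are not taken from the package.

```typescript
// Hypothetical subset of Node's ServerResponse, enough for this sketch.
interface ResponseLike {
  setHeader(name: string, value: string): void;
  write(chunk: string): void;
  end(chunk?: string): void;
}

// Non-streaming: one complete JSON body.
function writeJson(res: ResponseLike, completion: object): void {
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify(completion));
}

// Streaming: one SSE event per chunk, then the [DONE] sentinel.
function writeSse(res: ResponseLike, chunks: object[]): void {
  res.setHeader("Content-Type", "text/event-stream");
  for (const chunk of chunks) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.end("data: [DONE]\n\n");
}
```

Each SSE event is terminated by a blank line (`\n\n`), including the final `[DONE]` marker.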
+## Step-by-Step Guide
+### 1. Create `providers/my-provider.ts`
+```typescript
+import type { ServerResponse } from "node:http";
+import type { LLMProvider, PluginLogger } from "./types.js";
+export class MyProvider implements LLMProvider {
+  readonly name = "my-provider";
+  async chatCompletion(
+    body: Record<string, unknown>,
+    spec: { modelId: string; apiKey: string; baseUrl: string },
+    stream: boolean,
+    res: ServerResponse,
+    log: PluginLogger,
+  ): Promise<void> {
+    // 1. Convert request if needed (OpenAI format → provider format)
+    // 2. Make the API call with fetch()
+    // 3. Convert response back to OpenAI format if needed
+    // 4. Write to res (JSON for non-streaming, SSE for streaming)
+  }
+}
+```
+### 2. Add to `providers/index.ts`
+```typescript
+import { MyProvider } from "./my-provider.js";
+const myProvider = new MyProvider();
+export function resolveProvider(spec: TierModelSpec): LLMProvider {
+  if (spec.provider === "my-provider") {
+    return myProvider;
+  }
+  // ... existing logic
+}
+```
+### 3. Add well-known base URL to `tier-config.ts`
+In the `WELL_KNOWN_BASE_URLS` map:
+```typescript
+const WELL_KNOWN_BASE_URLS: Record<string, string> = {
+  // ... existing entries
+  "my-provider": "https://api.my-provider.com/v1",
+};
+```
+### 4. Add env var mapping (if non-standard)
+If the API key env var isn't `MY_PROVIDER_API_KEY`, add to `ENV_VAR_OVERRIDES`:
+```typescript
+const ENV_VAR_OVERRIDES: Record<string, string> = {
+  google: "GEMINI_API_KEY",
+  "my-provider": "MY_CUSTOM_KEY_VAR",
+};
+```
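The step above implies a default naming rule with an override map on top. The sketch below is a guess at that rule, inferred from the `my-provider` and `MY_PROVIDER_API_KEY` pairing: uppercase the provider name, replace non-alphanumeric characters with underscores, and append `_API_KEY`. The function name `apiKeyEnvVar` is hypothetical, and the real logic in `tier-config.ts` may differ.

```typescript
// Override entry taken from the snippet above.
const ENV_VAR_OVERRIDES: Record<string, string> = {
  google: "GEMINI_API_KEY",
};

// Assumed default: "my-provider" becomes "MY_PROVIDER_API_KEY".
function apiKeyEnvVar(provider: string): string {
  const override: string | undefined = ENV_VAR_OVERRIDES[provider];
  if (override !== undefined) return override;
  return provider.toUpperCase().replace(/[^A-Z0-9]/g, "_") + "_API_KEY";
}
```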
+### 5. Test
+```bash
+# Configure a tier to use your provider
+/router set SIMPLE my-provider/model-id
+# Test with curl
+curl -s http://127.0.0.1:8401/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"simple","messages":[{"role":"user","content":"hello"}],"max_tokens":50}'
+```
+### 6. Write tests
+Create `tests/providers/my-provider.test.ts` — see existing tests for patterns (mock `fetch()`, test request/response conversion).
+## Request/Response Format
+### Input (OpenAI Chat Completions)
+```json
+{
+  "model": "my-model",
+  "messages": [
+    { "role": "system", "content": "You are helpful." },
+    { "role": "user", "content": "Hello" }
+  ],
+  "max_tokens": 100,
+  "temperature": 0.7,
+  "stream": false
+}
+```
+### Output (Non-Streaming)
+```json
+{
+  "id": "chatcmpl-...",
+  "object": "chat.completion",
+  "created": 1234567890,
+  "model": "my-model",
+  "choices": [
+    {
+      "index": 0,
+      "message": { "role": "assistant", "content": "Hi!" },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 10,
+    "completion_tokens": 5,
+    "total_tokens": 15
+  }
+}
+```
+### Output (Streaming SSE)
+```
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"my-model","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"my-model","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"my-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
+data: [DONE]
+```
+## Request Body Sanitization
+The `OpenAICompatibleProvider` strips non-standard fields from the request body before forwarding. Fields like `store` and `metadata` (added by OpenClaw internally) cause 400 errors on providers like Google Gemini. Only standard OpenAI chat completion parameters are forwarded:
+`messages`, `model`, `stream`, `max_tokens`, `max_completion_tokens`, `temperature`, `top_p`, `n`, `stop`, `presence_penalty`, `frequency_penalty`, `logit_bias`, `logprobs`, `top_logprobs`, `response_format`, `seed`, `tools`, `tool_choice`, `parallel_tool_calls`, `user`, `stream_options`, `service_tier`
+If your provider accepts additional parameters, either add them to the allowlist in `openai-compatible.ts` or handle them in your custom provider.
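An allowlist filter of this kind can be sketched as follows. The parameter names are copied from the list above; the function name `sanitizeBody` is illustrative and may not match `openai-compatible.ts`.

```typescript
// Standard chat completion parameters, copied from the list above.
const ALLOWED_PARAMS = new Set<string>([
  "messages", "model", "stream", "max_tokens", "max_completion_tokens",
  "temperature", "top_p", "n", "stop", "presence_penalty",
  "frequency_penalty", "logit_bias", "logprobs", "top_logprobs",
  "response_format", "seed", "tools", "tool_choice",
  "parallel_tool_calls", "user", "stream_options", "service_tier",
]);

// Returns a copy of the body containing only allowlisted keys.
function sanitizeBody(body: Record<string, unknown>): Record<string, unknown> {
  const clean: Record<string, unknown> = {};
  for (const key of Object.keys(body)) {
    if (ALLOWED_PARAMS.has(key)) clean[key] = body[key];
  }
  return clean;
}
```

Filtering by allowlist rather than by blocklist means any new internal field is dropped by default instead of reaching a provider that might reject it.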
+## Auth
+API keys are resolved by `tier-config.ts` in this priority order:
+1. Environment variable (e.g., `MY_PROVIDER_API_KEY`)
+2. `auth-profiles.json` (OpenClaw's canonical credential store)
+3. `auth.json` (runtime cache)
+4. `openclaw.json` `env.vars` section
+The key is passed to your provider via `spec.apiKey`. Your provider should use it in the appropriate header (e.g., `Authorization: Bearer {apiKey}` or `x-api-key: {apiKey}`).
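The priority order can be modeled as an ordered list of lookups where the first non-empty result wins. The on-disk formats of `auth-profiles.json`, `auth.json`, and `openclaw.json` are not described here, so the sketch takes each source as an injected function. All names (`KeySource`, `resolveApiKey`, `authHeader`) are hypothetical.

```typescript
// One credential source; lookup returns undefined when it has no key.
interface KeySource {
  name: string;
  lookup: () => string | undefined;
}

// Returns the first key found, together with the source it came from.
function resolveApiKey(
  sources: KeySource[],
): { key: string; from: string } | undefined {
  for (const source of sources) {
    const key = source.lookup();
    if (key) return { key, from: source.name };
  }
  return undefined;
}

// Builds the header for the two auth styles mentioned above.
function authHeader(
  style: "bearer" | "x-api-key",
  apiKey: string,
): Record<string, string> {
  return style === "bearer"
    ? { Authorization: `Bearer ${apiKey}` }
    : { "x-api-key": apiKey };
}
```

A caller would pass the sources in the documented order: environment variable first, then `auth-profiles.json`, `auth.json`, and finally the `env.vars` section of `openclaw.json`.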
+## Existing Providers Reference
+| Provider                   | File                   | Auth Header                             | API Format         | Notes                                                     |
+| -------------------------- | ---------------------- | --------------------------------------- | ------------------ | --------------------------------------------------------- |
+| `OpenAICompatibleProvider` | `openai-compatible.ts` | `Authorization: Bearer`                 | OpenAI             | Sanitizes non-standard fields                             |
+| `AnthropicProvider`        | `anthropic.ts`         | `x-api-key`                             | Anthropic Messages | Full format conversion (request + response + streaming)   |
+| `GatewayProvider`          | `gateway.ts`           | `Authorization: Bearer` (gateway token) | OpenAI             | Fallback for OAuth tokens                                 |
+| `gateway-with-override`    | `index.ts` (inline)    | Same as Gateway                         | OpenAI             | Sets `before_model_resolve` override to prevent recursion |
+### Supported OpenAI-Compatible Providers
+| Provider   | Base URL                                                  | Env Var              | Example Models                           |
+| ---------- | --------------------------------------------------------- | -------------------- | ---------------------------------------- |
+| Google     | `https://generativelanguage.googleapis.com/v1beta/openai` | `GEMINI_API_KEY`     | `gemini-2.5-flash`                       |
+| OpenAI     | `https://api.openai.com/v1`                               | `OPENAI_API_KEY`     | `gpt-4o`, `gpt-4o-mini`                  |
+| Groq       | `https://api.groq.com/openai/v1`                          | `GROQ_API_KEY`       | `llama-3.3-70b-versatile`                |
+| Mistral    | `https://api.mistral.ai/v1`                               | `MISTRAL_API_KEY`    | `mistral-large-latest`                   |
+| DeepSeek   | `https://api.deepseek.com/v1`                             | `DEEPSEEK_API_KEY`   | `deepseek-chat`                          |
+| Together   | `https://api.together.xyz/v1`                             | `TOGETHER_API_KEY`   | `meta-llama/Llama-3-70b`                 |
+| Fireworks  | `https://api.fireworks.ai/inference/v1`                   | `FIREWORKS_API_KEY`  | `accounts/fireworks/models/llama-v3-70b` |
+| Perplexity | `https://api.perplexity.ai`                               | `PERPLEXITY_API_KEY` | `sonar-pro`                              |
+| xAI        | `https://api.x.ai/v1`                                     | `XAI_API_KEY`        | `grok-3`, `grok-beta`                    |
+| MiniMax    | `https://api.minimax.io/v1`                               | `MINIMAX_API_KEY`    | `MiniMax-M1`                             |
+| MoonShot   | `https://api.moonshot.ai/v1`                              | `MOONSHOT_API_KEY`   | `kimi-k2.5`                              |
+**Note:** MiniMax supports both direct API key and OAuth authentication. With OAuth (via OpenClaw auth-profiles), requests route through the gateway, which handles token refresh and API format conversion. The router auto-detects OAuth credentials from the `minimax-portal` auth profile.