@relayplane/proxy 0.1.9 → 0.1.10

package/README.md CHANGED
@@ -4,6 +4,19 @@
 
 Intelligent AI model routing that cuts costs by 50-80% while maintaining quality.
 
+> **Note:** Designed for standard API key users (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`). MAX subscription OAuth is not currently supported — MAX users should continue using their provider directly.
+
+> ⚠️ **Cost Monitoring Required**
+>
+> RelayPlane routes requests to LLM providers using your API keys. **This incurs real costs.**
+>
+> - Set up billing alerts with your providers (Anthropic, OpenAI, etc.)
+> - Monitor usage through your provider's dashboard
+> - Use `/relayplane stats` or `curl localhost:3001/control/stats` to track usage
+> - Start with test requests to understand routing behavior
+>
+> RelayPlane provides cost *optimization*, not cost *elimination*. You are responsible for monitoring your actual spending.
+
 [![CI](https://github.com/RelayPlane/proxy/actions/workflows/ci.yml/badge.svg)](https://github.com/RelayPlane/proxy/actions/workflows/ci.yml)
 [![npm version](https://img.shields.io/npm/v/@relayplane/proxy)](https://www.npmjs.com/package/@relayplane/proxy)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -39,6 +52,23 @@ npx @relayplane/proxy stats --days 30
 npx @relayplane/proxy --help
 ```
 
+## OpenClaw Slash Commands
+
+If you're using OpenClaw, these chat commands are available:
+
+| Command | Description |
+|---------|-------------|
+| `/relayplane stats` | Show usage statistics and cost savings |
+| `/relayplane status` | Show proxy health and configuration |
+| `/relayplane switch <mode>` | Change routing mode (auto\|cost\|fast\|quality) |
+| `/relayplane models` | List available routing models |
+
+Example:
+```
+/relayplane stats
+/relayplane switch cost
+```
+
 ## Quick Start
 
 ### 1. Set your API keys
@@ -110,11 +140,11 @@ Unlike static routing rules, RelayPlane adapts to **your** usage patterns.
 
 | Provider | Models | Streaming | Tools |
 |----------|--------|-----------|-------|
-| **Anthropic** | Claude 4.5 (Opus, Sonnet, Haiku) | ✓ | ✓ |
-| **OpenAI** | GPT-5.2, GPT-5.2-Codex, o1, o3 | ✓ | ✓ |
-| **Google** | Gemini 2.0 Flash, 2.0 Pro | ✓ | ✓ |
-| **xAI** | Grok-3, Grok-3-mini | ✓ | ✓ |
-| **Moonshot** | v1-8k, v1-32k, v1-128k | ✓ | ✓ |
+| **Anthropic** | Claude 3.5 Haiku, Sonnet 4, Opus 4.5 | ✓ | ✓ |
+| **OpenAI** | GPT-4o, GPT-4o-mini, GPT-4.1, o1, o3 | ✓ | ✓ |
+| **Google** | Gemini 2.0 Flash, Gemini Pro | ✓ | ✓ |
+| **xAI** | Grok (grok-*) | ✓ | ✓ |
+| **Moonshot** | Moonshot v1 (8k, 32k, 128k) | ✓ | ✓ |
 
 ## Routing Modes
 
@@ -178,74 +208,71 @@ Options:
 
 ## REST API
 
-The proxy exposes endpoints for stats and monitoring:
+The proxy exposes control endpoints for stats and monitoring:
 
-### `GET /health`
+### `GET /control/status`
 
-Server health and version info.
+Proxy status and current configuration.
 
 ```bash
-curl http://localhost:3001/health
+curl http://localhost:3001/control/status
 ```
 
 ```json
 {
-  "status": "ok",
-  "version": "0.1.7",
-  "uptime": "2h 15m 30s",
-  "providers": { "anthropic": true, "openai": true, "google": false },
-  "totalRuns": 142
+  "enabled": true,
+  "mode": "cascade",
+  "modelOverrides": {}
 }
 ```
 
-### `GET /stats`
+### `GET /control/stats`
 
-Aggregated statistics and cost savings.
+Aggregated statistics and routing counts.
 
 ```bash
-curl http://localhost:3001/stats
+curl http://localhost:3001/control/stats
 ```
 
 ```json
 {
-  "totalRuns": 142,
-  "savings": {
-    "estimatedSavingsPercent": "73.2%",
-    "actualCostUsd": "0.0234",
-    "baselineCostUsd": "0.0873",
-    "savedUsd": "0.0639"
+  "uptimeMs": 3600000,
+  "uptimeFormatted": "60m 0s",
+  "totalRequests": 142,
+  "successfulRequests": 138,
+  "failedRequests": 4,
+  "successRate": "97.2%",
+  "avgLatencyMs": 1203,
+  "escalations": 12,
+  "routingCounts": {
+    "auto": 100,
+    "cost": 30,
+    "passthrough": 12
   },
-  "modelDistribution": {
-    "anthropic/claude-3-5-haiku-latest": { "count": 98, "percentage": "69.0%" },
-    "anthropic/claude-sonnet-4-20250514": { "count": 44, "percentage": "31.0%" }
+  "modelCounts": {
+    "anthropic/claude-3-5-haiku-latest": 98,
+    "anthropic/claude-sonnet-4-20250514": 44
   }
 }
 ```
 
-### `GET /runs`
+### `POST /control/enable` / `POST /control/disable`
 
-Recent routing decisions.
+Enable or disable routing (passthrough mode when disabled).
 
 ```bash
-curl "http://localhost:3001/runs?limit=10"
+curl -X POST http://localhost:3001/control/enable
+curl -X POST http://localhost:3001/control/disable
 ```
 
-```json
-{
-  "runs": [
-    {
-      "runId": "abc123",
-      "timestamp": "2026-02-03T13:26:03Z",
-      "model": "anthropic/claude-3-5-haiku-latest",
-      "taskType": "code_generation",
-      "confidence": 0.92,
-      "mode": "auto",
-      "durationMs": 1203,
-      "promptPreview": "Write a function that..."
-    }
-  ],
-  "total": 142
-}
+### `POST /control/config`
+
+Update configuration (hot-reload, merges with existing).
+
+```bash
+curl -X POST http://localhost:3001/control/config \
+  -H "Content-Type: application/json" \
+  -d '{"routing": {"mode": "cascade"}}'
 ```
 
 ## Configuration
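The "merges with existing" behavior of `POST /control/config` can be pictured as a recursive merge: nested sections are merged key by key, while scalars and new keys replace wholesale. A minimal sketch of those assumed semantics, not the proxy's actual implementation (`mergeConfig` is a hypothetical name):

```javascript
// Hypothetical sketch of "merge with existing" config semantics.
// Assumption: plain objects merge recursively; everything else replaces.
function mergeConfig(current, patch) {
  const out = { ...current };
  for (const [key, value] of Object.entries(patch)) {
    const bothObjects =
      value && typeof value === "object" && !Array.isArray(value) &&
      out[key] && typeof out[key] === "object" && !Array.isArray(out[key]);
    if (bothObjects) {
      out[key] = mergeConfig(out[key], value); // recurse into nested sections
    } else {
      out[key] = value; // scalars, arrays, and new keys replace wholesale
    }
  }
  return out;
}

// A POST of {"routing": {"mode": "cascade"}} leaves unrelated keys untouched:
const current = { enabled: true, routing: { mode: "standard", cascade: { enabled: true } } };
const updated = mergeConfig(current, { routing: { mode: "cascade" } });
```

Under this reading, a partial PATCH-style body never wipes out sibling settings such as `routing.cascade`.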
@@ -254,46 +281,71 @@ RelayPlane creates a config file on first run at `~/.relayplane/config.json`:
 
 ```json
 {
-  "strategies": {
-    "code_review": { "model": "anthropic:claude-sonnet-4-20250514" },
-    "code_generation": { "model": "anthropic:claude-3-5-haiku-latest" },
-    "analysis": { "model": "anthropic:claude-sonnet-4-20250514" },
-    "summarization": { "model": "anthropic:claude-3-5-haiku-latest" },
-    "creative_writing": { "model": "anthropic:claude-sonnet-4-20250514" },
-    "data_extraction": { "model": "anthropic:claude-3-5-haiku-latest" },
-    "translation": { "model": "anthropic:claude-3-5-haiku-latest" },
-    "question_answering": { "model": "anthropic:claude-3-5-haiku-latest" },
-    "general": { "model": "anthropic:claude-3-5-haiku-latest" }
+  "enabled": true,
+  "routing": {
+    "mode": "cascade",
+    "cascade": {
+      "enabled": true,
+      "models": [
+        "claude-3-haiku-20240307",
+        "claude-3-5-sonnet-20241022",
+        "claude-3-opus-20240229"
+      ],
+      "escalateOn": "uncertainty",
+      "maxEscalations": 1
+    },
+    "complexity": {
+      "enabled": true,
+      "simple": "claude-3-haiku-20240307",
+      "moderate": "claude-3-5-sonnet-20241022",
+      "complex": "claude-3-opus-20240229"
+    }
   },
-  "defaults": {
-    "qualityModel": "claude-sonnet-4-20250514",
-    "costModel": "claude-3-5-haiku-latest"
-  }
+  "reliability": {
+    "cooldowns": {
+      "enabled": true,
+      "allowedFails": 3,
+      "windowSeconds": 60,
+      "cooldownSeconds": 120
+    }
+  },
+  "modelOverrides": {}
 }
 ```
 
 **Edit and save — changes apply instantly** (hot-reload, no restart needed).
 
-### Strategy Options
+### Configuration Options
 
 | Field | Description |
 |-------|-------------|
-| `model` | Provider and model in format `provider:model` |
-| `minConfidence` | Optional. Only use this strategy if confidence >= threshold |
-| `fallback` | Optional. Fallback model if primary fails |
+| `enabled` | Enable/disable routing (false = passthrough mode) |
+| `routing.mode` | `"cascade"` or `"standard"` |
+| `routing.cascade.models` | Ordered list of models to try (cheapest first) |
+| `routing.cascade.escalateOn` | When to escalate: `"uncertainty"`, `"refusal"`, or `"error"` |
+| `routing.complexity.simple/moderate/complex` | Models for each complexity level |
+| `reliability.cooldowns` | Auto-disable failing providers temporarily |
+| `modelOverrides` | Map input model names to different targets |
 
 ### Examples
 
-Route all analysis tasks to GPT-4o:
+Use GPT-4o for complex tasks:
 ```json
-"analysis": { "model": "openai:gpt-4o" }
+{
+  "routing": {
+    "complexity": {
+      "complex": "gpt-4o"
+    }
+  }
+}
 ```
 
-Use Opus for code review with fallback:
+Override a specific model:
 ```json
-"code_review": {
-  "model": "anthropic:claude-opus-4-5-20250514",
-  "fallback": "anthropic:claude-sonnet-4-20250514"
+{
+  "modelOverrides": {
+    "claude-3-opus": "claude-3-5-sonnet-20241022"
+  }
 }
 ```
 
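The cascade settings in the new config read naturally as "try the cheapest model first, escalate when the condition fires, at most `maxEscalations` times". A sketch of that loop, assuming each result exposes a boolean keyed by `escalateOn` (`cascadeRoute` and `callModel` are illustrative names, not proxy APIs):

```javascript
// Illustrative sketch of cascade routing: walk the model list cheapest-first,
// stepping up only while the escalation condition holds and budget remains.
// `callModel` is a stand-in for a real provider call.
async function cascadeRoute(cascade, callModel) {
  let escalations = 0;
  let i = 0;
  while (true) {
    const result = await callModel(cascade.models[i]);
    // escalateOn: "uncertainty" | "refusal" | "error", modeled here as a flag
    const shouldEscalate = result[cascade.escalateOn] === true;
    const canEscalate =
      escalations < cascade.maxEscalations && i < cascade.models.length - 1;
    if (!shouldEscalate || !canEscalate) {
      return { model: cascade.models[i], escalations, result };
    }
    escalations++;
    i++;
  }
}
```

With `maxEscalations: 1`, an uncertain answer from the first model triggers exactly one retry on the next model in the list; a confident answer is returned immediately.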
@@ -311,9 +363,9 @@ sqlite3 ~/.relayplane/data.db "SELECT * FROM routing_rules"
 
 ## Links
 
-- [Documentation](https://relayplane.com/integrations/openclaw)
+- [RelayPlane Proxy](https://relayplane.com/integrations/openclaw)
 - [GitHub](https://github.com/RelayPlane/proxy)
-- [RelayPlane SDK](https://github.com/RelayPlane/sdk)
+- [RelayPlane](https://relayplane.com/)
 
 ## License
 
package/dist/cli.js CHANGED
@@ -1710,6 +1710,63 @@ var VERSION = "0.1.9";
 var recentRuns = [];
 var MAX_RECENT_RUNS = 100;
 var modelCounts = {};
+var tokenStats = {};
+var MODEL_PRICING2 = {
+  // Anthropic
+  "claude-3-haiku-20240307": { input: 0.25, output: 1.25 },
+  "claude-3-5-haiku-20241022": { input: 1, output: 5 },
+  "claude-3-5-haiku-latest": { input: 1, output: 5 },
+  "claude-3-5-sonnet-20241022": { input: 3, output: 15 },
+  "claude-sonnet-4-20250514": { input: 3, output: 15 },
+  "claude-3-opus-20240229": { input: 15, output: 75 },
+  "claude-opus-4-5-20250514": { input: 15, output: 75 },
+  // OpenAI
+  "gpt-4o": { input: 2.5, output: 10 },
+  "gpt-4o-mini": { input: 0.15, output: 0.6 },
+  "gpt-4-turbo": { input: 10, output: 30 },
+  // Defaults for unknown models
+  "default-cheap": { input: 1, output: 5 },
+  "default-expensive": { input: 15, output: 75 }
+};
+function trackTokens(model, inputTokens, outputTokens) {
+  if (!tokenStats[model]) {
+    tokenStats[model] = { inputTokens: 0, outputTokens: 0, requests: 0 };
+  }
+  tokenStats[model].inputTokens += inputTokens;
+  tokenStats[model].outputTokens += outputTokens;
+  tokenStats[model].requests += 1;
+}
+function calculateCosts() {
+  let totalInputTokens = 0;
+  let totalOutputTokens = 0;
+  let actualCostUsd = 0;
+  const byModel = {};
+  for (const [model, stats] of Object.entries(tokenStats)) {
+    totalInputTokens += stats.inputTokens;
+    totalOutputTokens += stats.outputTokens;
+    const pricing = MODEL_PRICING2[model] || MODEL_PRICING2["default-cheap"];
+    const cost = stats.inputTokens / 1e6 * pricing.input + stats.outputTokens / 1e6 * pricing.output;
+    actualCostUsd += cost;
+    byModel[model] = {
+      inputTokens: stats.inputTokens,
+      outputTokens: stats.outputTokens,
+      costUsd: parseFloat(cost.toFixed(4))
+    };
+  }
+  const opusPricing = MODEL_PRICING2["claude-opus-4-5-20250514"];
+  const opusCostUsd = totalInputTokens / 1e6 * opusPricing.input + totalOutputTokens / 1e6 * opusPricing.output;
+  const savingsUsd = opusCostUsd - actualCostUsd;
+  const savingsPercent = opusCostUsd > 0 ? (savingsUsd / opusCostUsd * 100).toFixed(1) + "%" : "0%";
+  return {
+    totalInputTokens,
+    totalOutputTokens,
+    actualCostUsd: parseFloat(actualCostUsd.toFixed(4)),
+    opusCostUsd: parseFloat(opusCostUsd.toFixed(4)),
+    savingsUsd: parseFloat(savingsUsd.toFixed(4)),
+    savingsPercent,
+    byModel
+  };
+}
 var serverStartTime = 0;
 var currentConfig = loadConfig();
 var DEFAULT_ENDPOINTS = {
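The new `calculateCosts` prices tokens in dollars per million and compares the blended cost against an all-Opus baseline. A worked example of that arithmetic with made-up token counts, using the same rates as `MODEL_PRICING2`:

```javascript
// Worked example of the pricing math in calculateCosts, at the same
// $-per-million-token rates. The token counts are invented for illustration.
const price = {
  haiku: { input: 1, output: 5 },   // claude-3-5-haiku rates
  opus: { input: 15, output: 75 }   // claude-opus baseline rates
};
const inputTokens = 200000;   // 0.2M input tokens
const outputTokens = 50000;   // 0.05M output tokens

// Actual spend if everything routed to the cheap model:
const actualCostUsd = inputTokens / 1e6 * price.haiku.input + outputTokens / 1e6 * price.haiku.output;
// Baseline: the same tokens priced at Opus rates:
const opusCostUsd = inputTokens / 1e6 * price.opus.input + outputTokens / 1e6 * price.opus.output;
const savingsUsd = opusCostUsd - actualCostUsd;
const savingsPercent = (savingsUsd / opusCostUsd * 100).toFixed(1) + "%";
// actualCostUsd → 0.45, opusCostUsd → 6.75, savingsPercent → "93.3%"
```

Note the baseline is hypothetical by construction: it assumes every request would otherwise have gone to Opus, which is why the README frames the number as an estimate, not billing data.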
@@ -2337,6 +2394,7 @@ function convertAnthropicStreamEvent(eventType, eventData, messageId, model, too
     return null;
   }
 }
+var lastStreamingUsage = null;
 async function* convertAnthropicStream(response, model) {
   const reader = response.body?.getReader();
   if (!reader) {
@@ -2349,6 +2407,8 @@ async function* convertAnthropicStream(response, model) {
     currentToolIndex: 0,
     tools: /* @__PURE__ */ new Map()
   };
+  let streamInputTokens = 0;
+  let streamOutputTokens = 0;
   try {
     while (true) {
       const { done, value } = await reader.read();
@@ -2366,6 +2426,17 @@ async function* convertAnthropicStream(response, model) {
       } else if (line === "" && eventType && eventData) {
         try {
           const parsed = JSON.parse(eventData);
+          if (eventType === "message_start") {
+            const msg = parsed["message"];
+            if (msg?.usage?.input_tokens) {
+              streamInputTokens = msg.usage.input_tokens;
+            }
+          } else if (eventType === "message_delta") {
+            const usage = parsed["usage"];
+            if (usage?.output_tokens) {
+              streamOutputTokens = usage.output_tokens;
+            }
+          }
           const converted = convertAnthropicStreamEvent(eventType, parsed, messageId, model, toolState);
           if (converted) {
             yield converted;
@@ -2377,6 +2448,7 @@
         }
       }
     }
+    lastStreamingUsage = { inputTokens: streamInputTokens, outputTokens: streamOutputTokens };
   } finally {
     reader.releaseLock();
   }
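The streaming changes above recover usage from Anthropic-style SSE events: `message_start` carries `input_tokens`, while `message_delta` carries a cumulative `output_tokens` count, so only the latest value matters. The same extraction, sketched over pre-parsed events (`usageFromEvents` is an illustrative helper, not part of the package):

```javascript
// Sketch of the usage extraction done inside convertAnthropicStream, applied
// to pre-parsed SSE events for brevity. input_tokens arrives once at
// message_start; output_tokens is cumulative, so the last delta wins.
function usageFromEvents(events) {
  let inputTokens = 0;
  let outputTokens = 0;
  for (const { type, data } of events) {
    if (type === "message_start" && data.message?.usage?.input_tokens) {
      inputTokens = data.message.usage.input_tokens;
    } else if (type === "message_delta" && data.usage?.output_tokens) {
      outputTokens = data.usage.output_tokens; // keep latest cumulative value
    }
  }
  return { inputTokens, outputTokens };
}
```

This explains why the diff overwrites `streamOutputTokens` instead of summing deltas: accumulating cumulative counts would double-count output tokens.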
@@ -2474,23 +2546,32 @@ async function startProxy(config = {}) {
     }
     if (req.method === "GET" && pathname === "/stats") {
       const stats = relay.stats();
-      const savings = relay.savingsReport(30);
+      const costs = calculateCosts();
       const totalRuns = Object.values(modelCounts).reduce((a, b) => a + b, 0);
       const modelDistribution = {};
       for (const [model, count] of Object.entries(modelCounts)) {
+        const modelName = model.split("/")[1] || model;
+        const tokenData = costs.byModel[modelName];
         modelDistribution[model] = {
           count,
-          percentage: totalRuns > 0 ? (count / totalRuns * 100).toFixed(1) + "%" : "0%"
+          percentage: totalRuns > 0 ? (count / totalRuns * 100).toFixed(1) + "%" : "0%",
+          tokens: tokenData ? { input: tokenData.inputTokens, output: tokenData.outputTokens } : void 0,
+          costUsd: tokenData?.costUsd
         };
       }
       res.writeHead(200, { "Content-Type": "application/json" });
       res.end(JSON.stringify({
         totalRuns,
-        savings: {
-          estimatedSavingsPercent: savings.savingsPercent.toFixed(1) + "%",
-          actualCostUsd: savings.actualCost.toFixed(4),
-          baselineCostUsd: savings.baselineCost.toFixed(4),
-          savedUsd: savings.savings.toFixed(4)
+        tokens: {
+          input: costs.totalInputTokens,
+          output: costs.totalOutputTokens,
+          total: costs.totalInputTokens + costs.totalOutputTokens
+        },
+        costs: {
+          actualUsd: costs.actualCostUsd,
+          opusBaselineUsd: costs.opusCostUsd,
+          savingsUsd: costs.savingsUsd,
+          savingsPercent: costs.savingsPercent
         },
         modelDistribution,
         byTaskType: stats.byTaskType,
@@ -2746,6 +2827,11 @@ async function handleStreamingRequest(res, request, targetProvider, targetModel,
   const durationMs = Date.now() - startTime;
   const modelKey = `${targetProvider}/${targetModel}`;
   modelCounts[modelKey] = (modelCounts[modelKey] || 0) + 1;
+  if (lastStreamingUsage && (lastStreamingUsage.inputTokens > 0 || lastStreamingUsage.outputTokens > 0)) {
+    trackTokens(targetModel, lastStreamingUsage.inputTokens, lastStreamingUsage.outputTokens);
+    log(`Tokens: ${lastStreamingUsage.inputTokens} in, ${lastStreamingUsage.outputTokens} out`);
+    lastStreamingUsage = null;
+  }
   relay.run({
     prompt: promptText.slice(0, 500),
     taskType,
@@ -2837,6 +2923,11 @@ async function handleNonStreamingRequest(res, request, targetProvider, targetMod
   const durationMs = Date.now() - startTime;
   const modelKey = `${targetProvider}/${targetModel}`;
   modelCounts[modelKey] = (modelCounts[modelKey] || 0) + 1;
+  const usage = responseData["usage"];
+  if (usage?.prompt_tokens || usage?.completion_tokens) {
+    trackTokens(targetModel, usage.prompt_tokens ?? 0, usage.completion_tokens ?? 0);
+    log(`Tokens: ${usage.prompt_tokens ?? 0} in, ${usage.completion_tokens ?? 0} out`);
+  }
   try {
     const runResult = await relay.run({
       prompt: promptText.slice(0, 500),