@relayplane/proxy 0.2.1 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,185 +1,286 @@
  # @relayplane/proxy

- Intelligent AI model routing proxy for cost optimization and observability.
+ Local LLM proxy server for RelayPlane - route requests through multiple AI providers.
+
+ ## What's New in 1.1
+
+ - 🩺 **Health Endpoint** — `GET /health` with uptime, stats, and provider status
+ - ⚠️ **Usage Warnings** — Console and header warnings at 80%, 90%, and 100% of limits
+ - 📊 **Response Headers** — `X-RelayPlane-Daily-Usage`, `X-RelayPlane-Monthly-Usage`, `X-RelayPlane-Usage-Warning`
+ - 💰 **Spending Limits** — Configure `limits.daily` and `limits.monthly`; requests beyond a limit get HTTP 429
+ - 🏷️ **Model Aliases** — `rp:fast`, `rp:cheap`, `rp:best`, `rp:balanced` shortcuts
+
+ ## Features
+
+ - **OpenAI-compatible API** - Drop-in replacement for the OpenAI SDK
+ - **Multi-provider routing** - Automatically routes to OpenAI, Anthropic, Groq, Together, and OpenRouter
+ - **Model aliases** - `rp:fast`, `rp:cheap`, `rp:best` shortcuts
+ - **Dry-run mode** - Test routing without making API calls
+ - **Usage tracking** - Track tokens, cost, and latency
+ - **Spending limits** - Daily/monthly cost limits with warnings
+ - **Health endpoint** - `/health` for monitoring and uptime checks

  ## Installation

  ```bash
- npm install -g @relayplane/proxy
+ npm install @relayplane/proxy
  ```

- ## Quick Start
+ Or run it via the CLI:

  ```bash
- # Set your API keys
- export ANTHROPIC_API_KEY=your-key
- export OPENAI_API_KEY=your-key
+ npm install -g @relayplane/cli
+ relayplane proxy start
+ ```

- # Start the proxy
- relayplane-proxy
+ ## Quick Start

- # Configure your tools to use the proxy
- export ANTHROPIC_BASE_URL=http://localhost:3001
- export OPENAI_BASE_URL=http://localhost:3001
+ ```bash
+ # Set API keys
+ export OPENAI_API_KEY=sk-...
+ export ANTHROPIC_API_KEY=sk-ant-...

- # Run your AI tools (Claude Code, Cursor, Aider, etc.)
+ # Start the proxy
+ npx @relayplane/proxy
+
+ # Make requests
+ curl http://localhost:8787/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "gpt-4o",
+     "messages": [{"role": "user", "content": "Hello!"}]
+   }'
  ```

- ## Features
+ ## Endpoints

- - **Intelligent Routing**: Routes requests to the optimal model based on task type
- - **Cost Tracking**: Tracks and reports API costs across all providers
- - **Provider Agnostic**: Works with Anthropic, OpenAI, Gemini, xAI, and more
- - **Local Learning**: Learns from your usage patterns to improve routing
- - **Privacy First**: Never sees your prompts or responses
+ ### `GET /health`

- ## CLI Options
+ Health check endpoint for monitoring.

  ```bash
- relayplane-proxy [command] [options]
-
- Commands:
-   (default)                  Start the proxy server
-   telemetry [on|off|status]  Manage telemetry settings
-   stats                      Show usage statistics
-   config                     Show configuration
-
- Options:
-   --port <number>   Port to listen on (default: 3001)
-   --host <string>   Host to bind to (default: 127.0.0.1)
-   --offline         Disable all network calls except LLM endpoints
-   --audit           Show telemetry payloads before sending
-   -v, --verbose     Enable verbose logging
-   -h, --help        Show this help message
-   --version         Show version
+ curl http://localhost:8787/health
  ```

- ## Telemetry
-
- RelayPlane collects anonymous telemetry to improve model routing. This data helps us understand usage patterns and optimize routing decisions.
-
- ### What We Collect (Exact Schema)
-
+ Response:
  ```json
  {
-   "device_id": "anon_8f3a...",
-   "task_type": "code_review",
-   "model": "claude-3-5-haiku",
-   "tokens_in": 1847,
-   "tokens_out": 423,
-   "latency_ms": 2341,
-   "success": true,
-   "cost_usd": 0.02
+   "status": "ok",
+   "uptime": 3600,
+   "version": "1.1.0",
+   "providers": {
+     "openai": "configured",
+     "anthropic": "configured",
+     "groq": "not_configured",
+     "together": "not_configured",
+     "openrouter": "not_configured"
+   },
+   "requestsHandled": 150,
+   "requestsSuccessful": 148,
+   "requestsFailed": 2,
+   "dailyCost": 1.25,
+   "dailyLimit": 10.00,
+   "monthlyCost": 25.50,
+   "monthlyLimit": 100.00,
+   "usage": {
+     "inputTokens": 50000,
+     "outputTokens": 25000,
+     "totalCost": 1.25
+   }
  }
  ```
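The `/health` payload above is easy to consume from a monitoring script. A minimal client-side sketch (the helper names are hypothetical; only the response fields come from the README):

```python
import json
import urllib.request

def summarize_health(health: dict) -> dict:
    """Distill a /health payload into an ops-friendly summary."""
    configured = [provider for provider, state in health["providers"].items()
                  if state == "configured"]
    daily_pct = 100 * health["dailyCost"] / health["dailyLimit"]
    return {
        "ok": health["status"] == "ok",
        "configured": configured,
        "daily_pct": round(daily_pct, 1),
    }

def check_health(base_url: str = "http://localhost:8787") -> dict:
    """Fetch GET /health from a running proxy and summarize it."""
    with urllib.request.urlopen(f"{base_url}/health") as resp:
        return summarize_health(json.load(resp))
```

`check_health()` assumes the proxy is running on the default port 8787; `summarize_health` is pure, so it can be reused against stored health snapshots.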

- ### Field Descriptions
+ ### `GET /v1/models`

- | Field | Type | Description |
- |-------|------|-------------|
- | `device_id` | string | Anonymous random ID (not fingerprintable) |
- | `task_type` | string | Inferred from token patterns, NOT prompt content |
- | `model` | string | The model that handled the request |
- | `tokens_in` | number | Input token count |
- | `tokens_out` | number | Output token count |
- | `latency_ms` | number | Request latency in milliseconds |
- | `success` | boolean | Whether the request succeeded |
- | `cost_usd` | number | Estimated cost in USD |
+ List available models, including aliases.

- ### Task Types
+ ```bash
+ curl http://localhost:8787/v1/models
+ ```

- Task types are inferred from request characteristics (token counts, ratios, etc.) - never from prompt content:
+ ### `POST /v1/chat/completions`

- - `quick_task` - Short input/output (< 500 tokens each)
- - `code_review` - Medium-long input, medium output
- - `generation` - High output/input ratio
- - `classification` - Low output/input ratio, short output
- - `long_context` - Input > 10,000 tokens
- - `content_generation` - Output > 1,000 tokens
- - `tool_use` - Request includes tool calls
- - `general` - Default classification
+ OpenAI-compatible chat completions.

- ### What We NEVER Collect
+ ```bash
+ curl http://localhost:8787/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "rp:best",
+     "messages": [{"role": "user", "content": "Hello!"}]
+   }'
+ ```

- - ❌ Your prompts
- - ❌ Model responses
- - ❌ File paths or contents
- - ❌ Anything that could identify you or your project
+ ## Model Aliases

- ### Verification
+ | Alias | Resolves To | Provider | Use Case |
+ |-------|-------------|----------|----------|
+ | `rp:fast` | llama-3.1-8b-instant | Groq | Lowest latency |
+ | `rp:cheap` | llama-3.1-8b-instant | Groq | Lowest cost |
+ | `rp:best` | claude-3-5-sonnet-20241022 | Anthropic | Highest quality |
+ | `rp:balanced` | gpt-4o-mini | OpenAI | Good balance |

- You can verify exactly what data is collected:
+ ## Dry-Run Mode

- ```bash
- # See telemetry payloads before they're sent
- relayplane-proxy --audit
+ Test routing logic without making API calls:

- # Disable all telemetry transmission
- relayplane-proxy --offline
+ ```bash
+ curl http://localhost:8787/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -H "X-Dry-Run: true" \
+   -d '{
+     "model": "gpt-4o",
+     "messages": [{"role": "user", "content": "Hello!"}]
+   }'
+ ```

- # View the source code
- # https://github.com/RelayPlane/proxy
+ Response:
+ ```json
+ {
+   "dry_run": true,
+   "routing": {
+     "model": "gpt-4o",
+     "provider": "openai",
+     "endpoint": "https://api.openai.com/v1/chat/completions"
+   },
+   "estimate": {
+     "inputTokens": 10,
+     "expectedOutputTokens": 500,
+     "estimatedCost": 0.0125,
+     "currency": "USD"
+   },
+   "limits": {
+     "daily": 10.00,
+     "dailyUsed": 1.25,
+     "dailyRemaining": 8.75,
+     "monthly": 100.00,
+     "monthlyUsed": 25.50,
+     "monthlyRemaining": 74.50
+   }
+ }
  ```
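The dry-run response lends itself to a budget preflight: estimate the cost of a request, and only send it for real when it fits the remaining limits. A hedged sketch (the `preflight`/`can_afford` helpers are illustrative, not part of the package; the header and response fields are from the README):

```python
import json
import urllib.request

def can_afford(dry_run_response: dict) -> bool:
    """True if the estimated cost fits inside both remaining budgets."""
    cost = dry_run_response["estimate"]["estimatedCost"]
    limits = dry_run_response["limits"]
    return cost <= limits["dailyRemaining"] and cost <= limits["monthlyRemaining"]

def preflight(payload: dict, base_url: str = "http://localhost:8787") -> bool:
    """POST with X-Dry-Run to get a routing/cost estimate without spending."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-Dry-Run": "true"},
    )
    with urllib.request.urlopen(req) as resp:
        return can_afford(json.load(resp))
```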

- ### Opt-Out
+ ## Response Headers

- To disable telemetry completely:
+ The proxy adds usage information to response headers:

- ```bash
- relayplane-proxy telemetry off
+ | Header | Description |
+ |--------|-------------|
+ | `X-RelayPlane-Cost` | Cost of this request |
+ | `X-RelayPlane-Latency` | Request latency in ms |
+ | `X-RelayPlane-Daily-Usage` | Daily usage (e.g., "1.25/10.00") |
+ | `X-RelayPlane-Monthly-Usage` | Monthly usage (e.g., "25.50/100.00") |
+ | `X-RelayPlane-Usage-Warning` | Warning when approaching limits (80%, 90%, 100%) |
+
+ Example warning header:
+ ```
+ X-RelayPlane-Usage-Warning: ⚠️ You've used $8.50 of your $10 daily limit
  ```

- To re-enable:
+ Console warnings are also logged when approaching limits:
+ ```
+ ⚠️ Daily spending at 80%: $8.00 / $10
+ ⚠️ Daily spending at 90%: $9.00 / $10
+ ⚠️ DAILY LIMIT REACHED: $10.00 / $10 (100%)
+ ```
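Clients can mirror the proxy's thresholds by parsing the usage headers above. A small sketch (helper names are hypothetical; the "used/limit" header format and the 80%/90%/100% thresholds come from the README):

```python
def parse_usage(header_value: str) -> float:
    """Parse an X-RelayPlane-*-Usage header value ("used/limit") into a fraction."""
    used, limit = (float(part) for part in header_value.split("/"))
    return used / limit

def warn_level(fraction: float):
    """Return the highest crossed threshold (80%, 90%, 100%), or None."""
    for threshold in (1.0, 0.9, 0.8):
        if fraction >= threshold:
            return threshold
    return None
```

For example, `warn_level(parse_usage("8.50/10.00"))` reports the 80% threshold from the warning-header example above.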

- ```bash
- relayplane-proxy telemetry on
+ ## Spending Limits
+
+ Configure limits in `~/.relayplane/config.json`:
+
+ ```json
+ {
+   "limits": {
+     "daily": 10.00,
+     "monthly": 100.00
+   }
+ }
  ```

- Check current status:
+ When limits are reached, the proxy returns HTTP `429 Too Many Requests`:

- ```bash
- relayplane-proxy telemetry status
+ ```json
+ {
+   "error": {
+     "message": "Daily spending limit reached ($10.00 / $10.00)",
+     "code": "spending_limit_exceeded",
+     "type": "rate_limit_error"
+   }
+ }
  ```

- ## Configuration
+ Headers included with the 429 response:
+ - `Retry-After: 86400` (seconds until the daily reset)
+ - `X-RelayPlane-Daily-Usage: 10.00/10.00`
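Since the 429 response carries `Retry-After`, a client can back off until the limit resets. A minimal sketch (the helper and the one-hour fallback are assumptions, not proxy behavior):

```python
def retry_delay(status: int, headers: dict):
    """Seconds to wait before retrying a limited request, or None if not limited."""
    if status != 429:
        return None
    # Fall back to an hour if the proxy omitted Retry-After (assumption).
    return int(headers.get("Retry-After", 3600))
```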

- Configuration is stored in `~/.relayplane/config.json`.
+ ## Usage Tracking

- ### Set API Key (Pro Features)
+ Usage is logged to `~/.relayplane/usage.jsonl`:

- ```bash
- relayplane-proxy config set-key your-api-key
+ ```jsonl
+ {"timestamp":"2024-01-15T12:00:00Z","model":"gpt-4o","provider":"openai","inputTokens":100,"outputTokens":50,"cost":0.00125,"latencyMs":1500,"success":true}
  ```

- ### View Configuration
+ Daily totals are tracked in `~/.relayplane/daily-usage.json`:

- ```bash
- relayplane-proxy config
+ ```json
+ {
+   "date": "2024-01-15",
+   "cost": 1.25,
+   "requests": 50
+ }
  ```

- ## Usage Statistics
-
- View your usage statistics:
+ Monthly totals are tracked in `~/.relayplane/monthly-usage.json`:

- ```bash
- relayplane-proxy stats
+ ```json
+ {
+   "month": "2024-01",
+   "cost": 25.50,
+   "requests": 1200
+ }
  ```
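Because `usage.jsonl` is append-only JSON Lines, per-model reports are a few lines of scripting. A sketch (the `summarize_usage` helper is illustrative; the record fields are those shown above):

```python
import json
from collections import defaultdict

def summarize_usage(lines):
    """Aggregate usage.jsonl records into per-model cost/token/request totals."""
    totals = defaultdict(lambda: {"cost": 0.0, "tokens": 0, "requests": 0})
    for line in lines:
        record = json.loads(line)
        entry = totals[record["model"]]
        entry["cost"] += record["cost"]
        entry["tokens"] += record["inputTokens"] + record["outputTokens"]
        entry["requests"] += 1
    return dict(totals)
```

Run it over the log file directly, e.g. `summarize_usage(open(os.path.expanduser("~/.relayplane/usage.jsonl")))`.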

- This shows:
- - Total requests and cost
- - Success rate
- - Breakdown by model
- - Breakdown by task type
-
  ## Environment Variables

- | Variable | Description |
- |----------|-------------|
- | `ANTHROPIC_API_KEY` | Anthropic API key |
- | `OPENAI_API_KEY` | OpenAI API key |
- | `GEMINI_API_KEY` | Google Gemini API key |
- | `XAI_API_KEY` | xAI/Grok API key |
- | `MOONSHOT_API_KEY` | Moonshot API key |
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `RELAYPLANE_PROXY_PORT` | 8787 | Port to listen on |
+ | `RELAYPLANE_PROXY_HOST` | 127.0.0.1 | Host to bind to |
+ | `RELAYPLANE_CONFIG_DIR` | ~/.relayplane | Config directory |
+ | `OPENAI_API_KEY` | - | OpenAI API key |
+ | `ANTHROPIC_API_KEY` | - | Anthropic API key |
+ | `GROQ_API_KEY` | - | Groq API key |
+ | `TOGETHER_API_KEY` | - | Together AI API key |
+ | `OPENROUTER_API_KEY` | - | OpenRouter API key |
+
+ ## Provider Detection
+
+ Models are automatically routed to the correct provider:
+
+ | Pattern | Provider |
+ |---------|----------|
+ | `gpt-*`, `o1-*` | OpenAI |
+ | `claude-*` | Anthropic |
+ | `llama-*`, `mixtral-*` | Groq |
+ | `meta-llama/*`, `mistralai/*` | Together |
+ | Contains `/` | OpenRouter |
+
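For illustration, the routing table above can be sketched as a prefix matcher. This is a hypothetical reimplementation of the table, not the package's actual routing code; note the `meta-llama/*`/`mistralai/*` checks must run before the generic contains-`/` fallback:

```python
def detect_provider(model: str) -> str:
    """Map a model name to a provider, following the routing table above."""
    if model.startswith(("gpt-", "o1-")):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith(("llama-", "mixtral-")):
        return "groq"
    if model.startswith(("meta-llama/", "mistralai/")):
        return "together"
    if "/" in model:
        # Any other org/model path falls through to OpenRouter.
        return "openrouter"
    raise ValueError(f"unrecognized model: {model}")
```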
+ ## Using with OpenAI SDK
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(
+     base_url="http://localhost:8787/v1",
+     api_key="not-needed"  # API keys are configured on the proxy
+ )
+
+ response = client.chat.completions.create(
+     model="rp:best",  # Uses Claude 3.5 Sonnet
+     messages=[{"role": "user", "content": "Hello!"}]
+ )
+ ```

  ## License

@@ -0,0 +1,13 @@
+ #!/usr/bin/env node
+ /**
+  * RelayPlane Local LLM Proxy Server
+  *
+  * Routes OpenAI-compatible requests to multiple providers.
+  * Features:
+  * - /health endpoint for monitoring
+  * - Usage tracking with spending warnings
+  * - Model aliases (rp:fast, rp:cheap, rp:best)
+  * - Dry-run mode for testing
+  */
+ export {};
+ //# sourceMappingURL=server.d.ts.map
@@ -0,0 +1 @@
+ {"version":3,"file":"server.d.ts","sourceRoot":"","sources":["../src/server.ts"],"names":[],"mappings":";AACA;;;;;;;;;GASG"}