compress-lightreach 1.0.6 → 1.0.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +116 -87
- package/dist/api-client.d.ts +66 -23
- package/dist/api-client.js +69 -18
- package/dist/cli.js +0 -0
- package/dist/core.d.ts +14 -0
- package/dist/core.js +0 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.js +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
@@ -1,6 +1,6 @@
 # Compress Light Reach
 
-**
+**OpenAI-compatible LLM routing + compression SDK (superset responses with LightReach metadata)**
 
 [](https://badge.fury.io/js/compress-lightreach)
 [](https://nodejs.org/)
@@ -10,7 +10,7 @@ Compress Light Reach is a Node.js/TypeScript SDK that provides intelligent model
 
 ## Features
 
-- **Intelligent Model Routing**: Automatically selects optimal model based on quality
+- **Intelligent Model Routing**: Automatically selects the optimal model based on admin-configured quality settings and available provider keys
 - **Token-aware Compression**: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
 - **Lossless**: Perfect decompression guaranteed
 - **Output Compression**: Optional model output compression support
@@ -37,7 +37,7 @@ The SDK uses **intelligent model routing** and targets `POST /api/v2/complete`.
 
 - Authenticate with your **LightReach API key** (env var `PCOMPRESLR_API_KEY` or `LIGHTREACH_API_KEY`)
 - Manage **provider keys** (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
-- System automatically selects optimal model based on
+- System automatically selects the optimal model based on admin-configured quality settings
 
 ```typescript
 import { PcompresslrAPIClient } from 'compress-lightreach';
@@ -49,10 +49,10 @@ const result = await client.complete({
     { role: 'system', content: 'You are a helpful assistant.' },
     { role: 'user', content: 'Explain quantum computing in simple terms.' },
   ],
-
+  tags: { team: 'backend', environment: 'production' },
 });
 
-console.log(result.
+console.log(result.choices[0].message.content);
 console.log(`Selected: ${result.routing_info?.selected_model}`);
 console.log(`Token savings: ${result.compression_stats.token_savings}`);
 ```
@@ -61,15 +61,15 @@ console.log(`Token savings: ${result.compression_stats.token_savings}`);
 
 LightReach also exposes a **strict OpenAI-compatible** surface (including streaming SSE) so you can use standard OpenAI tooling without changing your app.
 
-- **Cursor base URL**: `https://compress.lightreach.io/v1/cursor`
-- **Generic OpenAI-compatible base URL**: `https://compress.lightreach.io/v1`
+- **Cursor base URL**: `https://api.compress.lightreach.io/v1/cursor`
+- **Generic OpenAI-compatible base URL**: `https://api.compress.lightreach.io/v1`
 - **Endpoints**: `GET /models`, `POST /chat/completions`
 - **Model id**: `lightreach`
 
 Example (cURL):
 
 ```bash
-curl -sS https://compress.lightreach.io/v1/chat/completions \
+curl -sS https://api.compress.lightreach.io/v1/chat/completions \
   -H "Authorization: Bearer lr_your_lightreach_key" \
   -H "Content-Type: application/json" \
   -d '{
@@ -79,84 +79,95 @@ curl -sS https://compress.lightreach.io/v1/chat/completions \
 }'
 ```
 
-
+## Tags
+
+Tags provide **cost attribution** and enable **admin-controlled quality ceilings** per tag. The system supports three tag categories that you can set on requests:
+
+| Tag Key | Description | Example Values |
+|---------|-------------|----------------|
+| `team` | Your team or group | `"backend"`, `"ml-platform"`, `"marketing"` |
+| `environment` | Deployment environment | `"development"`, `"staging"`, `"production"` |
+| `feature` | Feature or use case | `"search"`, `"chat"`, `"summarization"` |
+
+Tags are validated server-side. Your workspace admin can configure allowed values for each tag category via the dashboard. If a tag value is not in the allowed list, the request may be warned or rejected depending on your workspace's enforcement mode.
 
 ```typescript
 const result = await client.complete({
-  messages: [{ role: 'user', content: '
-
-
+  messages: [{ role: 'user', content: 'Summarize this document...' }],
+  tags: {
+    team: 'backend',
+    environment: 'production',
+    feature: 'summarization',
+  },
 });
-
-console.log(result.decompressed_response);
 ```
 
-
+> **Note:** The `integration` tag is reserved for system use (e.g., Cursor, Claude Code) and should not be set manually. The `project` tag is also available for workspace-level project attribution — see your dashboard for configuration.
+
+## Intelligent Model Routing
+
+Model routing is fully managed by your workspace admin via the dashboard. The system uses **HLE (Humanity's Last Exam)** scores — a standardized benchmark — to determine model quality. Admins configure quality ceilings at three levels:
 
-
+- **Global ceiling**: Set via the HLE slider in the dashboard. Applies to all requests.
+- **Tag-level ceilings**: Set per tag (e.g., `environment=development` gets a lower ceiling to save costs).
+- **Integration-level ceilings**: Set per integration (e.g., Cursor, Claude Code).
+
+The routing engine picks the **cheapest model** whose HLE score meets the effective ceiling. HLE scores are maintained server-side and cannot be overridden by SDK callers.
 
 ```typescript
 import { PcompresslrAPIClient } from 'compress-lightreach';
 
 const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
-// Cross-provider optimization: system picks cheapest model meeting your quality bar
 const result = await client.complete({
   messages: [{ role: 'user', content: 'Explain quantum computing' }],
-
+  tags: { team: 'backend', environment: 'production' },
 });
 
-// Check what was selected
 console.log(result.routing_info?.selected_model); // e.g., "gpt-4o-mini"
 console.log(result.routing_info?.selected_provider); // e.g., "openai"
 console.log(result.routing_info?.model_hle); // e.g., 32.5
 console.log(result.routing_info?.model_price_per_million); // e.g., 0.15
 ```
 
+### Routing Response
+
+Every `complete()` response includes `routing_info` with full transparency into the routing decision:
+
+```typescript
+const info = result.routing_info;
+console.log(`Model: ${info?.selected_model}`);
+console.log(`Provider: ${info?.selected_provider}`);
+console.log(`Model HLE: ${info?.model_hle}`);
+console.log(`Effective HLE ceiling: ${info?.effective_hle}`);
+console.log(`Ceiling source: ${info?.hle_source}`); // "tag", "global", or "none"
+```
+
 ### Provider-Constrained Routing
 
 Optionally constrain to a specific provider:
 
 ```typescript
-// Only use OpenAI models, but pick the cheapest one meeting HLE 35
 const result = await client.complete({
   messages: [{ role: 'user', content: 'Write a poem' }],
-  llm_provider: 'openai',
-  desired_hle: 35,
+  llm_provider: 'anthropic',
 });
 ```
 
-###
-
-Admins can set quality **ceilings** via the dashboard (global or per-tag) to control costs. Your `desired_hle` is a preference; if it exceeds an admin-set ceiling, the request will **silently clamp** to the ceiling and proceed.
+### With Output Compression
 
 ```typescript
-// Admin set global HLE ceiling to 30%
-// Requesting above the ceiling will be clamped to 30 (no error)
-const result = await client.complete({
-  messages: [{ role: 'user', content: 'Process payment' }],
-  desired_hle: 35, // Will be clamped down to 30
-  tags: { env: 'production' },
-});
-
-// Correct usage: request within ceiling
 const result = await client.complete({
-  messages: [{ role: 'user', content: '
-
-  tags: { env: 'production' },
+  messages: [{ role: 'user', content: 'Generate a long report...' }],
+  compress_output: true,
 });
 
-
-if (result.routing_info?.hle_clamped) {
-  console.log(`HLE lowered from ${result.routing_info.requested_hle} ` +
-              `to ${result.routing_info.effective_hle} ` +
-              `by ${result.routing_info.hle_source}-level ceiling`);
-}
+console.log(result.choices[0].message.content);
 ```
 
 ### With Compression Config
 
-
+Control which message roles get compressed:
 
 ```typescript
 import { PcompresslrAPIClient } from 'compress-lightreach';
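The routing rule described in the README above (pick the cheapest model whose HLE score meets the effective ceiling) can be sketched in a few lines. This is an illustrative re-implementation, not the server's actual code; the catalog, scores, and prices below are invented for the example.

```typescript
// Illustrative sketch of the routing rule: among models whose HLE score
// meets the effective ceiling, pick the cheapest. Hypothetical catalog,
// not the server's actual model data.
interface CandidateModel {
  name: string;
  provider: string;
  hle: number;             // benchmark score, maintained server-side
  pricePerMillion: number; // USD per 1M input tokens
}

function pickCheapestMeetingCeiling(
  models: CandidateModel[],
  effectiveHle: number | null,
): CandidateModel | undefined {
  // With no ceiling configured, every model qualifies.
  const qualifying = models.filter(
    (m) => effectiveHle === null || m.hle >= effectiveHle,
  );
  // Cheapest qualifying model wins.
  return qualifying.sort((a, b) => a.pricePerMillion - b.pricePerMillion)[0];
}

const catalog: CandidateModel[] = [
  { name: 'gpt-4o-mini', provider: 'openai', hle: 32.5, pricePerMillion: 0.15 },
  { name: 'big-model', provider: 'openai', hle: 45.0, pricePerMillion: 2.5 },
  { name: 'tiny-model', provider: 'google', hle: 20.0, pricePerMillion: 0.05 },
];

console.log(pickCheapestMeetingCeiling(catalog, 30)?.name); // "gpt-4o-mini"
```

A provider constraint (`llm_provider`) would simply filter the catalog by provider before applying the same rule.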
@@ -165,7 +176,6 @@ const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
 const result = await client.complete({
   messages: [{ role: 'user', content: 'Hello!' }],
-  desired_hle: 30,
   compress: true,
   compress_output: false,
   compression_config: {
@@ -176,14 +186,13 @@ const result = await client.complete({
   },
   temperature: 0.7,
   max_tokens: 1000,
-  tags: {
+  tags: { team: 'backend', environment: 'production' },
 });
 
-console.log(result.
+console.log(result.choices[0].message.content);
 console.log(`Model used: ${result.routing_info?.selected_model}`);
 ```
 
-
 ### Compression Only (No LLM Call)
 
 ```typescript
@@ -191,11 +200,10 @@ import { PcompresslrAPIClient } from 'compress-lightreach';
 
 const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
-// Compress text without making an LLM call
 const compressed = await client.compress(
   "Your text with repeated content here...",
-  "gpt-4",
-  {
+  "gpt-4",
+  { team: 'backend' },
 );
 
 console.log(compressed.llm_format);
@@ -209,10 +217,8 @@ console.log(decompressed.decompressed);
 ### Command Line Interface
 
 ```bash
-# Set your API key
 export PCOMPRESLR_API_KEY=your-api-key
 
-# Compress a prompt
 npx pcompresslr "Your prompt with repeated text here..."
 ```
 
@@ -237,7 +243,9 @@ new PcompresslrAPIClient(apiKey?: string, apiUrl?: string, timeout?: number)
 
 ##### `complete(request: CompleteV2Request): Promise<CompleteResponse>`
 
-Messages-first completion with intelligent routing (POST `/api/v2/complete`).
+Messages-first completion with intelligent routing. Uses async job processing (enqueue + poll) for production reliability.
+
+For direct synchronous calls, use `completeSync()` instead.
 
 **Request Parameters (`CompleteV2Request`):**
 
@@ -245,13 +253,12 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
 |-----------|------|---------|-------------|
 | `messages` | `Message[]` | required | Conversation history with `role` and `content` |
 | `llm_provider` | `'openai' \| 'anthropic' \| 'google' \| 'deepseek' \| 'moonshot'` | — | Optional provider constraint. Omit for cross-provider optimization |
-| `desired_hle` | `number` | — | Quality ceiling (0-100). If above an admin ceiling, it is clamped down |
 | `compress` | `boolean` | `true` | Whether to compress messages |
 | `compress_output` | `boolean` | `false` | Whether to request compressed output from LLM |
 | `compression_config` | `object` | — | Per-role compression settings (see below) |
 | `temperature` | `number` | — | LLM temperature parameter |
 | `max_tokens` | `number` | — | Maximum tokens to generate |
-| `tags` | `Record<string, string>` | — | Tags for cost attribution and
+| `tags` | `Record<string, string>` | — | Tags for cost attribution and quality ceilings. Use `team`, `environment`, and/or `feature` keys |
 | `max_history_messages` | `number` | — | Limit conversation history length |
 
 **`compression_config` options:**
@@ -269,7 +276,21 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
 
 ```typescript
 {
-
+  id: string;                  // OpenAI-style completion id
+  object: "chat.completion";
+  created: number;             // Unix timestamp
+  model: string;
+  choices: Array<{
+    index: number;
+    message: { role: "assistant"; content: string | null; tool_calls?: any[] };
+    finish_reason: string | null;
+  }>;
+  usage: {
+    prompt_tokens: number;
+    completion_tokens: number;
+    total_tokens: number;
+  };
+  content: string;             // Alias of choices[0].message.content
   compression_stats: {
     compression_enabled: boolean;
     original_tokens: number;
@@ -293,33 +314,48 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
     selected_model: string;       // Model chosen by system
     selected_provider: string;    // Provider chosen by system
     selected_model_id: string;
-    model_hle: number;            // HLE score of selected model
+    model_hle: number;            // HLE score of selected model (server-computed)
     model_price_per_million: number;
-
-
-    hle_source: 'request' | 'tag' | 'global' | 'none';
-    hle_clamped: boolean;         // true if admin ceiling lowered your desired_hle
+    effective_hle: number | null; // The quality ceiling that was applied
+    hle_source: 'tag' | 'global' | 'none';
   };
   warnings?: string[];
-
+  lightreach?: {                  // Namespaced LightReach metadata extension
+    compression_stats?: object;
+    llm_stats?: object;
+    routing_info?: object;
+    latency_ms?: number | null;
+  };
+
   // Convenience aliases
-
-
-
-
+  tokens_saved?: number;
+  tokens_used?: number;
+  compression_ratio?: number;
+  cost_estimate?: number | null;
+  savings_estimate?: number | null;
 }
 ```
 
-##### `
+##### `completeSync(request: CompleteV2Request): Promise<CompleteResponse>`
+
+Direct synchronous call to POST `/api/v2/complete`. Best for small/interactive usage. For production reliability, prefer `complete()` (async job + polling).
+
+##### `completeAsync(request, opts?): Promise<CompleteResponse>`
+
+Explicit async job flow with configurable polling. Called internally by `complete()`.
 
-
+**Options:**
+- `pollIntervalMs` (number, default: 1000): Polling interval in milliseconds
+- `maxWaitMs` (number, default: timeout): Maximum wait time
+- `idempotencyKey` (string, optional): Idempotency key for job creation
+
+##### `compress(prompt, model?, tags?): Promise<CompressResponse>`
 
 Compression-only (POST `/api/v1/compress`).
 
 **Parameters:**
 - `prompt` (string, required): Text to compress
 - `model` (string, optional): Model for tokenization. Default: `'gpt-4'`
-- `algorithm` (`"greedy"`, optional): Legacy-only parameter. Only `"greedy"` is supported.
 - `tags` (`Record<string, string>`, optional): Tags for attribution
 
 **Response (`CompressResponse`):**
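The `completeAsync` options documented above describe a standard enqueue-and-poll loop. A generic sketch of that flow, with `fetchStatus` standing in for the SDK's job-status call (the real client uses its own endpoints and error classes):

```typescript
// Generic enqueue + poll sketch of the async job flow. `fetchStatus` is a
// stand-in for the SDK's job-status request; timings mirror the documented
// pollIntervalMs / maxWaitMs options.
type JobStatus<T> =
  | { status: 'queued' | 'running' }
  | { status: 'succeeded'; result: T }
  | { status: 'failed'; error: string };

async function pollJob<T>(
  fetchStatus: () => Promise<JobStatus<T>>,
  opts: { pollIntervalMs?: number; maxWaitMs?: number } = {},
): Promise<T> {
  const pollIntervalMs = opts.pollIntervalMs ?? 1000;
  const maxWaitMs = opts.maxWaitMs ?? 60_000;
  const deadline = Date.now() + maxWaitMs;
  for (;;) {
    const st = await fetchStatus();
    if (st.status === 'succeeded') return st.result;
    if (st.status === 'failed') throw new Error(st.error);
    if (Date.now() >= deadline) throw new Error('job timed out');
    await new Promise((r) => setTimeout(r, pollIntervalMs));
  }
}

// Demo with a fake job that succeeds on the third poll.
let calls = 0;
const fake = async (): Promise<JobStatus<string>> =>
  ++calls < 3 ? { status: 'running' } : { status: 'succeeded', result: 'done' };

pollJob(fake, { pollIntervalMs: 10 }).then((r) => console.log(r)); // "done"
```

An `idempotencyKey` would be sent once at job creation so a retried enqueue does not start a duplicate job; it does not affect the polling loop itself.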
@@ -366,7 +402,6 @@ Check API health status (GET `/health`).
 }
 ```
 
-
 ### Message Types
 
 ```typescript
@@ -393,7 +428,7 @@ interface Message {
 | `PcompresslrAPIError` | Base exception class |
 | `APIKeyError` | Invalid or missing API key |
 | `RateLimitError` | Rate limit exceeded |
-| `APIRequestError` | General API errors (including routing failures) |
+| `APIRequestError` | General API errors (including routing failures, tag validation errors) |
 
 ```typescript
 import { APIKeyError, RateLimitError, APIRequestError } from 'compress-lightreach';
@@ -413,15 +448,10 @@ try {
 
 ## How It Works
 
-
-
-
-
-2. Calculates token savings for each potential replacement
-3. Selects optimal replacements that reduce total token count
-4. Intelligently routes to the best model based on your quality requirements
-5. Formats the result for easy LLM consumption
-6. Provides perfect decompression
+1. **Compression**: Identifies repeated substrings using efficient algorithms and replaces them with shorter placeholders, reducing token count
+2. **Routing**: Selects the cheapest model that meets the admin-configured quality ceiling (global, tag-level, or integration-level)
+3. **LLM Call**: Sends the compressed prompt to the selected model via your BYOK provider keys
+4. **Decompression**: Losslessly restores the model's response if output compression was enabled
 
 ## Examples
 
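The compression step in the pipeline above can be illustrated with a toy character-level version of the replace-repeats idea. The real service is token-aware and runs server-side; this sketch only demonstrates placeholder substitution and the lossless round trip.

```typescript
// Toy character-level sketch of greedy repeated-substring compression.
// The real SDK is token-aware and server-side; this only illustrates the
// placeholder-substitution idea and losslessness.
function compressGreedy(text: string): { compressed: string; dict: Record<string, string> } {
  const dict: Record<string, string> = {};
  let compressed = text;
  let nextId = 0;
  for (let len = 20; len >= 8; len--) {      // try longer substrings first
    for (let i = 0; i + len <= compressed.length; i++) {
      const sub = compressed.slice(i, i + len);
      if (sub.includes('§')) continue;       // never nest placeholders
      const parts = compressed.split(sub);
      const placeholder = `§${nextId}§`;
      // Replace only if sub occurs 2+ times (non-overlapping).
      if (parts.length > 2 && placeholder.length < sub.length) {
        dict[placeholder] = sub;
        compressed = parts.join(placeholder);
        nextId++;
      }
    }
  }
  return { compressed, dict };
}

function decompress(compressed: string, dict: Record<string, string>): string {
  let out = compressed;
  for (const [ph, sub] of Object.entries(dict)) out = out.split(ph).join(sub);
  return out;
}

const input = 'the quick brown fox; the quick brown fox; the quick brown fox';
const { compressed, dict } = compressGreedy(input);
console.log(decompress(compressed, dict) === input); // true
```

Because substituted substrings are never allowed to contain a placeholder marker, replacements cannot nest and the round trip is exact regardless of substitution order.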
@@ -440,10 +470,10 @@ Write a story about a bird. The bird is very friendly.
 
 const result = await client.complete({
   messages: [{ role: "user", content: prompt }],
-
+  tags: { team: 'content', environment: 'production' },
 });
 
-console.log(result.
+console.log(result.choices[0].message.content);
 console.log(`Model used: ${result.routing_info?.selected_model}`);
 console.log(`Token savings: ${result.compression_stats.token_savings} tokens`);
 console.log(`Compression ratio: ${(result.compression_stats.compression_ratio * 100).toFixed(2)}%`);
@@ -458,11 +488,10 @@ const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
 const result = await client.complete({
   messages: [{ role: "user", content: "Generate a long report with repeated sections..." }],
-  desired_hle: 35,
   compress_output: true,
 });
 
-console.log(result.
+console.log(result.choices[0].message.content);
 ```
 
 ### Example 3: Multi-turn Conversation
@@ -479,13 +508,13 @@ const result = await client.complete({
     { role: "assistant", content: "You can use open() with a context manager..." },
     { role: "user", content: "How about writing to a file?" },
   ],
-  desired_hle: 30,
   compression_config: {
     compress_system: false,
     compress_user: true,
     compress_assistant: false,
-    compress_only_last_n_user: 2,
+    compress_only_last_n_user: 2,
   },
+  tags: { team: 'engineering', feature: 'code-assistant' },
 });
 ```
 
package/dist/api-client.d.ts
CHANGED
@@ -28,7 +28,27 @@ export interface DecompressResponse {
     processing_time_ms: number;
 }
 export interface CompleteResponse {
-
+    id: string;
+    object: 'chat.completion';
+    created: number;
+    model: string;
+    choices: Array<{
+        index: number;
+        message: {
+            role: 'assistant';
+            content: string | null;
+            tool_calls?: Array<Record<string, any>>;
+        };
+        finish_reason: string | null;
+    }>;
+    usage: {
+        prompt_tokens: number;
+        completion_tokens: number;
+        total_tokens: number;
+    };
+    content: string;
+    decompressed_response?: string;
+    text?: string;
     compression_stats: {
         compression_enabled: boolean;
         original_tokens: number;
@@ -53,19 +73,29 @@ export interface CompleteResponse {
         selected_model: string;
         selected_provider: string;
         selected_model_id: string;
+        /** HLE score of the selected model (server-computed). */
         model_hle: number;
         model_price_per_million: number;
-
+        input_price_per_million?: number | null;
+        output_price_per_million?: number | null;
+        /** @deprecated Present for backward compatibility. */
+        requested_hle?: number | null;
+        /** The quality ceiling that was applied (from global, tag, or integration settings). */
         effective_hle: number | null;
+        /** Where the effective HLE ceiling came from. */
         hle_source: 'request' | 'tag' | 'global' | 'none';
-
+        /** @deprecated Present for backward compatibility. */
+        hle_clamped?: boolean;
     };
-    text?: string;
     tokens_saved?: number;
     tokens_used?: number;
     compression_ratio?: number;
     cost_estimate?: number | null;
     savings_estimate?: number | null;
+    model_hle?: number | null;
+    input_price_per_million?: number | null;
+    output_price_per_million?: number | null;
+    lightreach?: Record<string, any>;
 }
 export type MessageRole = 'system' | 'developer' | 'user' | 'assistant';
 export interface Message {
@@ -74,7 +104,13 @@ export interface Message {
 }
 export interface CompleteV2Request {
     messages: Message[];
+    /** Optional provider constraint. Omit for cross-provider cost optimization. */
     llm_provider?: 'openai' | 'anthropic' | 'google' | 'deepseek' | 'moonshot';
+    /**
+     * @deprecated Quality routing is now fully managed by admin-configured ceilings
+     * (global, tag-level, integration-level) in the dashboard. This parameter is
+     * accepted for backward compatibility but should not be used in new code.
+     */
     desired_hle?: number;
     compress?: boolean;
     compression_config?: {
@@ -87,12 +123,25 @@ export interface CompleteV2Request {
     algorithm?: 'greedy';
     temperature?: number;
     max_tokens?: number;
+    /**
+     * Tags for cost attribution and tag-level quality ceilings.
+     * Supported keys: 'team', 'environment', 'feature'.
+     * Values are validated server-side against your workspace's allowed list.
+     * The 'integration' tag is reserved for system use and should not be set manually.
+     *
+     * @example { team: 'backend', environment: 'production', feature: 'search' }
+     */
     tags?: Record<string, string>;
     max_history_messages?: number;
+    /** @deprecated System selects model automatically. */
     model?: string;
+    /** @deprecated Use desired_hle instead. */
    hle_target_percent?: number;
+    /** @deprecated Use desired_hle instead. */
    min_hle_score?: number;
+    /** @deprecated Always auto-selects now. */
    auto_select_by_hle?: boolean;
+    /** @deprecated Use llm_provider instead. */
    same_provider_only?: boolean;
 }
 export interface HealthCheckResponse {
@@ -124,6 +173,7 @@ export declare class PcompresslrAPIClient {
     private session;
     constructor(apiKey?: string, apiUrl?: string, timeout?: number);
     private makeRequest;
+    private toOpenAISupersetResponse;
     /**
      * Create async /complete job (POST /api/v1/complete/jobs).
      */
@@ -145,9 +195,11 @@ export declare class PcompresslrAPIClient {
     /**
      * Compress text without making an LLM call (POST /api/v1/compress).
      *
-     *
-     * -
-     * -
+     * @param prompt - Text to compress
+     * @param model - Model for tokenization (default: 'gpt-4')
+     * @param tags - Tags for attribution. Supported keys: 'team', 'environment', 'feature'.
+     *
+     * Also supports a legacy call shape: compress(prompt, model, 'greedy', tags?)
      */
     compress(prompt: string, model?: string, tags?: Record<string, string>): Promise<CompressResponse>;
     compress(prompt: string, model: string, algorithm: 'greedy', tags?: Record<string, string>): Promise<CompressResponse>;
@@ -161,32 +213,23 @@ export declare class PcompresslrAPIClient {
      */
     completeSync(request: CompleteV2Request): Promise<CompleteResponse>;
     /**
-     * Messages-first complete with intelligent model selection
+     * Messages-first complete with intelligent model selection.
      *
-     *
-     *
+     * Uses async job processing (enqueue + poll) for production reliability.
+     * Model routing is managed by admin-configured quality ceilings (global,
+     * tag-level, integration-level) in the dashboard. The system selects the
+     * cheapest model that meets the effective ceiling.
      *
      * Provider API keys must be stored in your account (BYOK via dashboard).
      *
      * @example
-     * // Basic usage (cross-provider optimization)
-     * const response = await client.complete({
-     *   messages: [{role: 'user', content: 'Hello'}],
-     *   desired_hle: 30,
-     * });
-     *
-     * // Constrained to specific provider
      * const response = await client.complete({
      *   messages: [{role: 'user', content: 'Hello'}],
-     *
-     *   desired_hle: 35,
+     *   tags: { team: 'backend', environment: 'production' },
      * });
      *
-     * // Access routing info
      * console.log(response.routing_info?.selected_model);
-     *
-     * console.log('Admin ceiling lowered your desired HLE');
-     * }
+     * console.log(response.routing_info?.effective_hle);
      */
     complete(request: CompleteV2Request): Promise<CompleteResponse>;
 }
package/dist/api-client.js
CHANGED
|
@@ -141,6 +141,65 @@ class PcompresslrAPIClient {
|
|
|
141
141
|
throw new APIRequestError(`Request failed: ${errorMessage}`);
|
|
142
142
|
}
|
|
143
143
|
}
|
|
144
|
+
toOpenAISupersetResponse(raw) {
|
|
145
|
+
const response = (raw && typeof raw === 'object') ? { ...raw } : {};
|
|
146
|
+
const llmStats = (response.llm_stats && typeof response.llm_stats === 'object') ? response.llm_stats : {};
|
|
147
|
+
const routingInfo = (response.routing_info && typeof response.routing_info === 'object') ? response.routing_info : {};
|
|
148
|
+
const content = (typeof response.content === 'string' && response.content) ||
|
|
149
|
+
(typeof response.decompressed_response === 'string' && response.decompressed_response) ||
|
|
150
|
+
(typeof response.text === 'string' && response.text) ||
|
|
151
|
+
'';
|
|
152
|
+
const model = response.model ||
|
|
153
|
+
response.model_used ||
|
|
154
|
+
routingInfo.selected_model ||
|
|
155
|
+
llmStats.model ||
|
|
156
|
+
'lightreach';
|
|
157
|
+
const promptTokens = Number(llmStats.input_tokens ?? 0) || 0;
|
|
158
|
+
const completionTokens = Number(llmStats.output_tokens ?? 0) || 0;
|
|
159
|
+
const totalTokens = Number(llmStats.total_tokens ?? (promptTokens + completionTokens)) || (promptTokens + completionTokens);
|
|
160
|
+
const finishReason = llmStats.finish_reason ?? 'stop';
|
|
161
|
+
const message = { role: 'assistant', content };
|
|
162
|
+
if (Array.isArray(response.tool_calls) && response.tool_calls.length > 0) {
|
|
163
|
+
message.tool_calls = response.tool_calls;
|
|
164
|
+
if (!content)
|
|
165
|
+
message.content = null;
|
|
166
|
+
}
|
|
167
|
+
response.id = String(response.id || `chatcmpl-${Math.random().toString(16).slice(2)}${Date.now().toString(16)}`);
|
|
168
|
+
response.object = 'chat.completion';
|
|
169
|
+
response.created = Number(response.created || Math.floor(Date.now() / 1000));
|
|
170
|
+
response.model = String(model);
|
|
171
|
+
response.choices = Array.isArray(response.choices)
|
|
172
|
+
? response.choices
|
|
173
|
+
: [{ index: 0, message, finish_reason: finishReason }];
|
|
174
|
+
response.usage = (response.usage && typeof response.usage === 'object')
|
|
175
|
+
? response.usage
|
|
176
|
+
: {
|
|
177
|
+
prompt_tokens: promptTokens,
|
|
178
|
+
completion_tokens: completionTokens,
|
|
179
|
+
total_tokens: totalTokens,
|
|
180
|
+
};
|
|
181
|
+
response.content = content;
|
|
182
|
+
if (response.decompressed_response === undefined)
|
|
183
|
+
response.decompressed_response = content;
|
|
184
|
+
if (response.text === undefined)
|
|
185
|
+
response.text = content;
|
|
186
|
+
response.lightreach = {
|
|
187
|
+
content,
|
|
188
|
+
compression_stats: response.compression_stats,
|
|
189
|
+
llm_stats: response.llm_stats,
|
|
190
|
+
routing_info: response.routing_info,
|
|
191
|
+
warnings: response.warnings,
|
|
192
|
+
tokens_saved: response.tokens_saved,
|
|
193
|
+
tokens_used: response.tokens_used,
|
|
194
|
+
compression_ratio: response.compression_ratio,
|
|
195
|
+
cost_estimate: response.cost_estimate,
|
|
196
|
+
savings_estimate: response.savings_estimate,
|
|
197
|
+
provider_used: response.provider_used,
|
|
198
|
+
model_used: response.model_used,
|
|
199
|
+
latency_ms: llmStats.latency_ms ?? null,
|
|
200
|
+
};
|
|
201
|
+
return response;
|
|
202
|
+
}
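The normalization above can be condensed into a standalone sketch. This is not the SDK's actual implementation — the function name `normalize` is illustrative and the shape is trimmed to the fields exercised here — but the field names (`decompressed_response`, `llm_stats.input_tokens`, `choices`, `usage`) are taken from the diff:

```javascript
// Sketch of the superset normalization: raw LightReach fields are mirrored
// into OpenAI-compatible ones so both access styles work on one object.
function normalize(response, model) {
  const llmStats = response.llm_stats ?? {};
  const content = response.decompressed_response ?? response.content ?? '';
  const promptTokens = Number(llmStats.input_tokens ?? 0) || 0;
  const completionTokens = Number(llmStats.output_tokens ?? 0) || 0;

  response.object = 'chat.completion';
  response.model = String(model);
  response.choices = Array.isArray(response.choices)
    ? response.choices
    : [{
        index: 0,
        message: { role: 'assistant', content },
        finish_reason: llmStats.finish_reason ?? 'stop',
      }];
  response.usage = response.usage ?? {
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    total_tokens: promptTokens + completionTokens,
  };
  return response;
}

const out = normalize(
  { decompressed_response: 'Hi', llm_stats: { input_tokens: 3, output_tokens: 1 } },
  'gpt-4o-mini',
);
console.log(out.choices[0].message.content); // 'Hi'
console.log(out.usage.total_tokens);         // 4
```

Callers can therefore read either `response.choices[0].message.content` (OpenAI style) or the LightReach-specific fields without branching on response shape.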
|
|
144
203
|
/**
|
|
145
204
|
* Create async /complete job (POST /api/v1/complete/jobs).
|
|
146
205
|
*/
|
|
@@ -193,7 +252,7 @@ class PcompresslrAPIClient {
|
|
|
193
252
|
const st = await this.getCompleteJob(jobId);
|
|
194
253
|
if (st.status === 'succeeded') {
|
|
195
254
|
if (st.result)
|
|
196
|
-
return st.result;
|
|
255
|
+
return this.toOpenAISupersetResponse(st.result);
|
|
197
256
|
throw new APIRequestError('Async job succeeded but result was missing.');
|
|
198
257
|
}
|
|
199
258
|
if (st.status === 'failed' || st.status === 'canceled') {
|
|
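The enqueue-and-poll flow this hunk feeds into can be sketched generically. `pollUntilDone` and its `getStatus` callback are illustrative stand-ins (the real client method is `getCompleteJob`, and the real code additionally normalizes the result via `toOpenAISupersetResponse`):

```javascript
// Poll a job until it reaches a terminal state, mirroring the status
// handling in the hunk above: 'succeeded' returns the result, 'failed'
// and 'canceled' throw, anything else waits and retries.
async function pollUntilDone(getStatus, { intervalMs = 500, timeoutMs = 60_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const st = await getStatus();
    if (st.status === 'succeeded') {
      if (!st.result) throw new Error('Async job succeeded but result was missing.');
      return st.result;
    }
    if (st.status === 'failed' || st.status === 'canceled') {
      throw new Error(`Job ${st.status}: ${st.error ?? 'unknown error'}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for job.');
}

// Illustrative usage against the SDK (assumes an initialized client):
// const result = await pollUntilDone(() => client.getCompleteJob(jobId));
```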
@@ -285,35 +344,27 @@ class PcompresslrAPIClient {
|
|
|
285
344
|
data.auto_select_by_hle = request.auto_select_by_hle;
|
|
286
345
|
if (request.same_provider_only !== undefined)
|
|
287
346
|
data.same_provider_only = request.same_provider_only;
|
|
288
|
-
|
|
347
|
+
const raw = await this.makeRequest('/api/v2/complete', data, 'POST');
|
|
348
|
+
return this.toOpenAISupersetResponse(raw);
|
|
289
349
|
}
|
|
290
350
|
/**
|
|
291
|
-
* Messages-first complete with intelligent model selection
|
|
351
|
+
* Messages-first complete with intelligent model selection.
|
|
292
352
|
*
|
|
293
|
-
*
|
|
294
|
-
*
|
|
353
|
+
* Uses async job processing (enqueue + poll) for production reliability.
|
|
354
|
+
* Model routing is managed by admin-configured quality ceilings (global,
|
|
355
|
+
* tag-level, integration-level) in the dashboard. The system selects the
|
|
356
|
+
* cheapest model that meets the effective ceiling.
|
|
295
357
|
*
|
|
296
358
|
* Provider API keys must be stored in your account (BYOK via dashboard).
|
|
297
359
|
*
|
|
298
360
|
* @example
|
|
299
|
-
* // Basic usage (cross-provider optimization)
|
|
300
|
-
* const response = await client.complete({
|
|
301
|
-
* messages: [{role: 'user', content: 'Hello'}],
|
|
302
|
-
* desired_hle: 30,
|
|
303
|
-
* });
|
|
304
|
-
*
|
|
305
|
-
* // Constrained to specific provider
|
|
306
361
|
* const response = await client.complete({
|
|
307
362
|
* messages: [{role: 'user', content: 'Hello'}],
|
|
308
|
-
*
|
|
309
|
-
* desired_hle: 35,
|
|
363
|
+
* tags: { team: 'backend', environment: 'production' },
|
|
310
364
|
* });
|
|
311
365
|
*
|
|
312
|
-
* // Access routing info
|
|
313
366
|
* console.log(response.routing_info?.selected_model);
|
|
314
|
-
*
|
|
315
|
-
* console.log('Admin ceiling lowered your desired HLE');
|
|
316
|
-
* }
|
|
367
|
+
* console.log(response.routing_info?.effective_hle);
|
|
317
368
|
*/
|
|
318
369
|
async complete(request) {
|
|
319
370
|
// Warn about deprecated parameters
|
package/dist/cli.js
CHANGED
|
File without changes
|
package/dist/core.d.ts
CHANGED
|
@@ -17,19 +17,33 @@ export interface CompressionConfig {
|
|
|
17
17
|
}
|
|
18
18
|
export interface CompleteOptions {
|
|
19
19
|
messages: Message[];
|
|
20
|
+
/** @deprecated System selects model automatically. */
|
|
20
21
|
model?: string;
|
|
21
22
|
provider?: 'openai' | 'anthropic' | 'google';
|
|
23
|
+
/**
|
|
24
|
+
* @deprecated Quality routing is now fully managed by admin-configured ceilings
|
|
25
|
+
* in the dashboard. Accepted for backward compatibility.
|
|
26
|
+
*/
|
|
22
27
|
desiredHle?: number;
|
|
23
28
|
compress?: boolean;
|
|
24
29
|
compressionConfig?: CompressionConfig;
|
|
25
30
|
compressOutput?: boolean;
|
|
26
31
|
mode?: 'async' | 'sync';
|
|
32
|
+
/** @deprecated Use desiredHle instead. */
|
|
27
33
|
hleTargetPercent?: number;
|
|
34
|
+
/** @deprecated Use desiredHle instead. */
|
|
28
35
|
minHleScore?: number;
|
|
36
|
+
/** @deprecated Always auto-selects now. */
|
|
29
37
|
autoSelectByHle?: boolean;
|
|
38
|
+
/** @deprecated Use provider instead. */
|
|
30
39
|
sameProviderOnly?: boolean;
|
|
31
40
|
temperature?: number;
|
|
32
41
|
maxTokens?: number;
|
|
42
|
+
/**
|
|
43
|
+
* Tags for cost attribution and quality ceilings.
|
|
44
|
+
* Supported keys: 'team', 'environment', 'feature'.
|
|
45
|
+
* The 'integration' tag is reserved for system use.
|
|
46
|
+
*/
|
|
33
47
|
tags?: Record<string, string>;
|
|
34
48
|
maxHistoryMessages?: number;
|
|
35
49
|
}
|
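The options shape above, with the deprecated fields omitted, can be exercised in a short standalone sketch. The trimmed `CompleteOptions` interface here is a local copy for illustration (the real one lives in `dist/core.d.ts`), and the commented `client.complete` call assumes an initialized SDK client:

```typescript
// Trimmed copy of CompleteOptions, keeping only the non-deprecated fields
// exercised below.
interface CompleteOptions {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  compress?: boolean;
  temperature?: number;
  maxTokens?: number;
  /** Tags for cost attribution; 'integration' is reserved for system use. */
  tags?: Record<string, string>;
}

const options: CompleteOptions = {
  messages: [{ role: 'user', content: 'Hello' }],
  compress: true,
  tags: { team: 'backend', environment: 'production' },
};

// const response = await client.complete(options);
console.log(Object.keys(options.tags ?? {}).length); // 2
```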
package/dist/core.js
CHANGED
|
@@ -51,7 +51,6 @@ class LightReach {
|
|
|
51
51
|
// We do NOT fabricate cost estimates here since the API response does not include pricing data.
|
|
52
52
|
return {
|
|
53
53
|
...resp,
|
|
54
|
-
text: resp.text ?? resp.decompressed_response,
|
|
55
54
|
tokens_saved: resp.tokens_saved ?? resp.compression_stats?.token_savings,
|
|
56
55
|
tokens_used: resp.tokens_used ?? resp.llm_stats?.total_tokens,
|
|
57
56
|
compression_ratio: resp.compression_ratio ?? resp.compression_stats?.compression_ratio,
|
package/dist/index.d.ts
CHANGED
package/dist/index.js
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
"use strict";
|
|
2
2
|
/**
|
|
3
|
-
* Compress Light Reach -
|
|
3
|
+
* Compress Light Reach - OpenAI-compatible routing + compression SDK.
|
|
4
4
|
*/
|
|
5
5
|
Object.defineProperty(exports, "__esModule", { value: true });
|
|
6
6
|
exports.PcompresslrAPIError = exports.APIRequestError = exports.RateLimitError = exports.APIKeyError = exports.PcompresslrAPIClient = exports.Pcompresslr = exports.LightReach = exports.__version__ = void 0;
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "compress-lightreach",
|
|
3
|
-
"version": "1.0.
|
|
4
|
-
"description": "
|
|
3
|
+
"version": "1.0.9",
|
|
4
|
+
"description": "OpenAI-compatible LLM routing and compression SDK with LightReach metadata extensions",
|
|
5
5
|
"main": "dist/index.js",
|
|
6
6
|
"types": "dist/index.d.ts",
|
|
7
7
|
"bin": {
|