compress-lightreach 1.0.6 → 1.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # Compress Light Reach
 
- **AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking**
+ **OpenAI-compatible LLM routing + compression SDK (superset responses with LightReach metadata)**
 
  [![npm version](https://badge.fury.io/js/compress-lightreach.svg)](https://badge.fury.io/js/compress-lightreach)
  [![Node.js 14+](https://img.shields.io/badge/node-14+-blue.svg)](https://nodejs.org/)
@@ -10,7 +10,7 @@ Compress Light Reach is a Node.js/TypeScript SDK that provides intelligent model
 
  ## Features
 
- - **Intelligent Model Routing**: Automatically selects optimal model based on quality requirements (HLE) and available provider keys
+ - **Intelligent Model Routing**: Automatically selects the optimal model based on admin-configured quality settings and available provider keys
  - **Token-aware Compression**: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
  - **Lossless**: Perfect decompression guaranteed
  - **Output Compression**: Optional model output compression support
@@ -37,7 +37,7 @@ The SDK uses **intelligent model routing** and targets `POST /api/v2/complete`.
 
  - Authenticate with your **LightReach API key** (env var `PCOMPRESLR_API_KEY` or `LIGHTREACH_API_KEY`)
  - Manage **provider keys** (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- - System automatically selects optimal model based on your requirements
+ - System automatically selects the optimal model based on admin-configured quality settings
 
  ```typescript
  import { PcompresslrAPIClient } from 'compress-lightreach';
@@ -49,10 +49,10 @@ const result = await client.complete({
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Explain quantum computing in simple terms.' },
  ],
- desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
+ tags: { team: 'backend', environment: 'production' },
  });
 
- console.log(result.decompressed_response);
+ console.log(result.choices[0].message.content);
  console.log(`Selected: ${result.routing_info?.selected_model}`);
  console.log(`Token savings: ${result.compression_stats.token_savings}`);
  ```
@@ -61,15 +61,15 @@ console.log(`Token savings: ${result.compression_stats.token_savings}`);
 
  LightReach also exposes a **strict OpenAI-compatible** surface (including streaming SSE) so you can use standard OpenAI tooling without changing your app.
 
- - **Cursor base URL**: `https://compress.lightreach.io/v1/cursor`
- - **Generic OpenAI-compatible base URL**: `https://compress.lightreach.io/v1`
+ - **Cursor base URL**: `https://api.compress.lightreach.io/v1/cursor`
+ - **Generic OpenAI-compatible base URL**: `https://api.compress.lightreach.io/v1`
  - **Endpoints**: `GET /models`, `POST /chat/completions`
  - **Model id**: `lightreach`
 
  Example (cURL):
 
  ```bash
- curl -sS https://compress.lightreach.io/v1/chat/completions \
+ curl -sS https://api.compress.lightreach.io/v1/chat/completions \
  -H "Authorization: Bearer lr_your_lightreach_key" \
  -H "Content-Type: application/json" \
  -d '{
@@ -79,84 +79,95 @@ curl -sS https://compress.lightreach.io/v1/chat/completions \
  }'
  ```
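Because this surface is strictly OpenAI-compatible, the same call also works through the official `openai` npm client. A minimal sketch, using the base URL and `lightreach` model id from the bullets above (the `openai` package is standard third-party tooling, not part of this SDK):

```typescript
import OpenAI from 'openai';

// Standard OpenAI client pointed at the LightReach-compatible endpoint.
const openai = new OpenAI({
  apiKey: process.env.LIGHTREACH_API_KEY, // your lr_... LightReach key
  baseURL: 'https://api.compress.lightreach.io/v1',
});

const completion = await openai.chat.completions.create({
  model: 'lightreach',
  messages: [{ role: 'user', content: 'Explain quantum computing in simple terms.' }],
});

console.log(completion.choices[0].message.content);
```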
 
- ### With Output Compression
+ ## Tags
+
+ Tags provide **cost attribution** and enable **admin-controlled quality ceilings** per tag. The system supports three tag categories that you can set on requests:
+
+ | Tag Key | Description | Example Values |
+ |---------|-------------|----------------|
+ | `team` | Your team or group | `"backend"`, `"ml-platform"`, `"marketing"` |
+ | `environment` | Deployment environment | `"development"`, `"staging"`, `"production"` |
+ | `feature` | Feature or use case | `"search"`, `"chat"`, `"summarization"` |
+
+ Tags are validated server-side. Your workspace admin can configure allowed values for each tag category via the dashboard. If a tag value is not in the allowed list, the request may trigger a warning or be rejected, depending on your workspace's enforcement mode.
 
  ```typescript
  const result = await client.complete({
- messages: [{ role: 'user', content: 'Generate a long report...' }],
- desired_hle: 25,
- compress_output: true,
+ messages: [{ role: 'user', content: 'Summarize this document...' }],
+ tags: {
+ team: 'backend',
+ environment: 'production',
+ feature: 'summarization',
+ },
  });
-
- console.log(result.decompressed_response);
  ```
 
- ### Intelligent Model Routing
+ > **Note:** The `integration` tag is reserved for system use (e.g., Cursor, Claude Code) and should not be set manually. The `project` tag is also available for workspace-level project attribution — see your dashboard for configuration.
+
+ ## Intelligent Model Routing
+
+ Model routing is fully managed by your workspace admin via the dashboard. The system uses **HLE (Humanity's Last Exam)** scores — a standardized benchmark — to determine model quality. Admins configure quality ceilings at three levels:
 
- The system automatically selects the optimal model based on quality requirements and your available provider keys:
+ - **Global ceiling**: Set via the HLE slider in the dashboard. Applies to all requests.
+ - **Tag-level ceilings**: Set per tag (e.g., `environment=development` gets a lower ceiling to save costs).
+ - **Integration-level ceilings**: Set per integration (e.g., Cursor, Claude Code).
+
+ The routing engine picks the **cheapest model** whose HLE score meets the effective ceiling. HLE scores are maintained server-side and cannot be overridden by SDK callers.
 
  ```typescript
  import { PcompresslrAPIClient } from 'compress-lightreach';
 
  const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
- // Cross-provider optimization: system picks cheapest model meeting your quality bar
  const result = await client.complete({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
- desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
+ tags: { team: 'backend', environment: 'production' },
  });
 
- // Check what was selected
  console.log(result.routing_info?.selected_model); // e.g., "gpt-4o-mini"
  console.log(result.routing_info?.selected_provider); // e.g., "openai"
  console.log(result.routing_info?.model_hle); // e.g., 32.5
  console.log(result.routing_info?.model_price_per_million); // e.g., 0.15
  ```
 
+ ### Routing Response
+
+ Every `complete()` response includes `routing_info` with full transparency into the routing decision:
+
+ ```typescript
+ const info = result.routing_info;
+ console.log(`Model: ${info?.selected_model}`);
+ console.log(`Provider: ${info?.selected_provider}`);
+ console.log(`Model HLE: ${info?.model_hle}`);
+ console.log(`Effective HLE ceiling: ${info?.effective_hle}`);
+ console.log(`Ceiling source: ${info?.hle_source}`); // "tag", "global", or "none"
+ ```
+
  ### Provider-Constrained Routing
 
  Optionally constrain to a specific provider:
 
  ```typescript
- // Only use OpenAI models, but pick the cheapest one meeting HLE 35
  const result = await client.complete({
  messages: [{ role: 'user', content: 'Write a poem' }],
- llm_provider: 'openai', // Optional: constrain to one provider
- desired_hle: 35,
+ llm_provider: 'anthropic',
  });
  ```
 
- ### HLE Cascading with Admin Controls
-
- Admins can set quality **ceilings** via the dashboard (global or per-tag) to control costs. Your `desired_hle` is a preference; if it exceeds an admin-set ceiling, the request will **silently clamp** to the ceiling and proceed.
+ ### With Output Compression
 
  ```typescript
- // Admin set global HLE ceiling to 30%
- // Requesting above the ceiling will be clamped to 30 (no error)
- const result = await client.complete({
- messages: [{ role: 'user', content: 'Process payment' }],
- desired_hle: 35, // Will be clamped down to 30
- tags: { env: 'production' },
- });
-
- // Correct usage: request within ceiling
  const result = await client.complete({
- messages: [{ role: 'user', content: 'Process payment' }],
- desired_hle: 25, // OK: below ceiling of 30
- tags: { env: 'production' },
+ messages: [{ role: 'user', content: 'Generate a long report...' }],
+ compress_output: true,
  });
 
- // Check if your HLE was lowered by an admin ceiling
- if (result.routing_info?.hle_clamped) {
- console.log(`HLE lowered from ${result.routing_info.requested_hle} ` +
- `to ${result.routing_info.effective_hle} ` +
- `by ${result.routing_info.hle_source}-level ceiling`);
- }
+ console.log(result.choices[0].message.content);
  ```
 
  ### With Compression Config
 
- Configure per-role compression settings:
+ Control which message roles get compressed:
 
  ```typescript
  import { PcompresslrAPIClient } from 'compress-lightreach';
@@ -165,7 +176,6 @@ const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
  const result = await client.complete({
  messages: [{ role: 'user', content: 'Hello!' }],
- desired_hle: 30,
  compress: true,
  compress_output: false,
  compression_config: {
@@ -176,14 +186,13 @@ const result = await client.complete({
  },
  temperature: 0.7,
  max_tokens: 1000,
- tags: { env: 'production' },
+ tags: { team: 'backend', environment: 'production' },
  });
 
- console.log(result.decompressed_response);
+ console.log(result.choices[0].message.content);
  console.log(`Model used: ${result.routing_info?.selected_model}`);
  ```
 
-
  ### Compression Only (No LLM Call)
 
  ```
@@ -191,11 +200,10 @@ import { PcompresslrAPIClient } from 'compress-lightreach';
 
  const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
- // Compress text without making an LLM call
  const compressed = await client.compress(
  "Your text with repeated content here...",
- "gpt-4", // Model for tokenization
- { env: 'dev' } // Optional tags
+ "gpt-4",
+ { team: 'backend' },
  );
 
  console.log(compressed.llm_format);
@@ -209,10 +217,8 @@ console.log(decompressed.decompressed);
  ### Command Line Interface
 
  ```bash
- # Set your API key
  export PCOMPRESLR_API_KEY=your-api-key
 
- # Compress a prompt
  npx pcompresslr "Your prompt with repeated text here..."
  ```
 
@@ -237,7 +243,9 @@ new PcompresslrAPIClient(apiKey?: string, apiUrl?: string, timeout?: number)
 
  ##### `complete(request: CompleteV2Request): Promise<CompleteResponse>`
 
- Messages-first completion with intelligent routing (POST `/api/v2/complete`).
+ Messages-first completion with intelligent routing. Uses async job processing (enqueue + poll) for production reliability.
+
+ For direct synchronous calls, use `completeSync()` instead.
 
  **Request Parameters (`CompleteV2Request`):**
 
@@ -245,13 +253,12 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
  |-----------|------|---------|-------------|
  | `messages` | `Message[]` | required | Conversation history with `role` and `content` |
  | `llm_provider` | `'openai' \| 'anthropic' \| 'google' \| 'deepseek' \| 'moonshot'` | — | Optional provider constraint. Omit for cross-provider optimization |
- | `desired_hle` | `number` | — | Quality ceiling (0-100). If above an admin ceiling, it is clamped down |
  | `compress` | `boolean` | `true` | Whether to compress messages |
  | `compress_output` | `boolean` | `false` | Whether to request compressed output from LLM |
  | `compression_config` | `object` | — | Per-role compression settings (see below) |
  | `temperature` | `number` | — | LLM temperature parameter |
  | `max_tokens` | `number` | — | Maximum tokens to generate |
- | `tags` | `Record<string, string>` | — | Tags for cost attribution and tag-level HLE ceilings |
+ | `tags` | `Record<string, string>` | — | Tags for cost attribution and quality ceilings. Use `team`, `environment`, and/or `feature` keys |
  | `max_history_messages` | `number` | — | Limit conversation history length |
 
  **`compression_config` options:**
@@ -269,7 +276,21 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
 
  ```typescript
  {
- decompressed_response: string; // Final decompressed LLM response
+ id: string; // OpenAI-style completion id
+ object: "chat.completion";
+ created: number; // Unix timestamp
+ model: string;
+ choices: Array<{
+ index: number;
+ message: { role: "assistant"; content: string | null; tool_calls?: any[] };
+ finish_reason: string | null;
+ }>;
+ usage: {
+ prompt_tokens: number;
+ completion_tokens: number;
+ total_tokens: number;
+ };
+ content: string; // Alias of choices[0].message.content
  compression_stats: {
  compression_enabled: boolean;
  original_tokens: number;
@@ -293,33 +314,48 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
  selected_model: string; // Model chosen by system
  selected_provider: string; // Provider chosen by system
  selected_model_id: string;
- model_hle: number; // HLE score of selected model
+ model_hle: number; // HLE score of selected model (server-computed)
  model_price_per_million: number;
- requested_hle: number | null;
- effective_hle: number | null; // Effective HLE after admin ceilings
- hle_source: 'request' | 'tag' | 'global' | 'none';
- hle_clamped: boolean; // true if admin ceiling lowered your desired_hle
+ effective_hle: number | null; // The quality ceiling that was applied
+ hle_source: 'tag' | 'global' | 'none';
  };
  warnings?: string[];
-
+ lightreach?: { // Namespaced LightReach metadata extension
+ compression_stats?: object;
+ llm_stats?: object;
+ routing_info?: object;
+ latency_ms?: number | null;
+ };
+
  // Convenience aliases
- text?: string; // Alias for decompressed_response
- tokens_saved?: number; // Alias for compression_stats.token_savings
- tokens_used?: number; // Alias for llm_stats.total_tokens
- compression_ratio?: number; // Alias for compression_stats.compression_ratio
+ tokens_saved?: number;
+ tokens_used?: number;
+ compression_ratio?: number;
+ cost_estimate?: number | null;
+ savings_estimate?: number | null;
  }
  ```
 
- ##### `compress(prompt, model?, tags?): Promise<CompressResponse>`
+ ##### `completeSync(request: CompleteV2Request): Promise<CompleteResponse>`
+
+ Direct synchronous call to POST `/api/v2/complete`. Best for small/interactive usage. For production reliability, prefer `complete()` (async job + polling).
+
+ ##### `completeAsync(request, opts?): Promise<CompleteResponse>`
+
+ Explicit async job flow with configurable polling. Called internally by `complete()`.
 
- Also supports a legacy call shape: `compress(prompt, model, algorithm, tags?)` (only `"greedy"` is supported).
+ **Options:**
+ - `pollIntervalMs` (number, default: 1000): Polling interval in milliseconds
+ - `maxWaitMs` (number, default: timeout): Maximum wait time
+ - `idempotencyKey` (string, optional): Idempotency key for job creation
+
+ ##### `compress(prompt, model?, tags?): Promise<CompressResponse>`
 
  Compression-only (POST `/api/v1/compress`).
 
  **Parameters:**
  - `prompt` (string, required): Text to compress
  - `model` (string, optional): Model for tokenization. Default: `'gpt-4'`
- - `algorithm` (`"greedy"`, optional): Legacy-only parameter. Only `"greedy"` is supported.
  - `tags` (`Record<string, string>`, optional): Tags for attribution
 
  **Response (`CompressResponse`):**
@@ -366,7 +402,6 @@ Check API health status (GET `/health`).
  }
  ```
 
-
  ### Message Types
 
  ```typescript
@@ -393,7 +428,7 @@ interface Message {
  | `PcompresslrAPIError` | Base exception class |
  | `APIKeyError` | Invalid or missing API key |
  | `RateLimitError` | Rate limit exceeded |
- | `APIRequestError` | General API errors (including routing failures) |
+ | `APIRequestError` | General API errors (including routing failures, tag validation errors) |
 
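Tag validation failures surface as `APIRequestError` per the table above. A minimal handling sketch (the tag values and recovery logic are illustrative assumptions; the error classes are the ones this package exports):

```typescript
import { PcompresslrAPIClient, RateLimitError, APIRequestError } from 'compress-lightreach';

const client = new PcompresslrAPIClient(process.env.LIGHTREACH_API_KEY);

try {
  const result = await client.complete({
    messages: [{ role: 'user', content: 'Hello' }],
    tags: { team: 'backend', environment: 'prod' }, // 'prod' may not be on the allowed list
  });
  console.log(result.choices[0].message.content);
} catch (err) {
  if (err instanceof RateLimitError) {
    // Back off and retry later
  } else if (err instanceof APIRequestError) {
    // Covers routing failures and tag validation rejections
    console.error(err.message);
  }
}
```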
  ```typescript
  import { APIKeyError, RateLimitError, APIRequestError } from 'compress-lightreach';
@@ -413,15 +448,10 @@ try {
 
  ## How It Works
 
- Compress Light Reach uses intelligent algorithms to identify repeated substrings in your prompts and replace them with shorter placeholders.
-
- The library:
- 1. Identifies repeated substrings using efficient suffix array algorithms
- 2. Calculates token savings for each potential replacement
- 3. Selects optimal replacements that reduce total token count
- 4. Intelligently routes to the best model based on your quality requirements
- 5. Formats the result for easy LLM consumption
- 6. Provides perfect decompression
+ 1. **Compression**: Identifies repeated substrings using efficient algorithms and replaces them with shorter placeholders, reducing token count
+ 2. **Routing**: Selects the cheapest model that meets the admin-configured quality ceiling (global, tag-level, or integration-level)
+ 3. **LLM Call**: Sends the compressed prompt to the selected model via your BYOK provider keys
+ 4. **Decompression**: Losslessly restores the model's response if output compression was enabled
 
  ## Examples
 
@@ -440,10 +470,10 @@ Write a story about a bird. The bird is very friendly.
 
  const result = await client.complete({
  messages: [{ role: "user", content: prompt }],
- desired_hle: 30,
+ tags: { team: 'content', environment: 'production' },
  });
 
- console.log(result.decompressed_response);
+ console.log(result.choices[0].message.content);
  console.log(`Model used: ${result.routing_info?.selected_model}`);
  console.log(`Token savings: ${result.compression_stats.token_savings} tokens`);
  console.log(`Compression ratio: ${(result.compression_stats.compression_ratio * 100).toFixed(2)}%`);
@@ -458,11 +488,10 @@ const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
  const result = await client.complete({
  messages: [{ role: "user", content: "Generate a long report with repeated sections..." }],
- desired_hle: 35,
  compress_output: true,
  });
 
- console.log(result.decompressed_response);
+ console.log(result.choices[0].message.content);
  ```
 
  ### Example 3: Multi-turn Conversation
@@ -479,13 +508,13 @@ const result = await client.complete({
  { role: "assistant", content: "You can use open() with a context manager..." },
  { role: "user", content: "How about writing to a file?" },
  ],
- desired_hle: 30,
  compression_config: {
  compress_system: false,
  compress_user: true,
  compress_assistant: false,
- compress_only_last_n_user: 2, // Only compress last 2 user messages
+ compress_only_last_n_user: 2,
  },
+ tags: { team: 'engineering', feature: 'code-assistant' },
  });
  ```
 
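Taken together, the README changes above boil down to a small migration for SDK callers: `desired_hle` leaves the request (deprecated but still accepted, per the type declarations below), and responses become OpenAI-shaped supersets. A before/after sketch, relying only on fields shown in this diff:

```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient(process.env.LIGHTREACH_API_KEY);

// 1.0.6 style: still accepted via deprecated parameters and back-compat aliases
const before = await client.complete({
  messages: [{ role: 'user', content: 'Hello' }],
  desired_hle: 30, // deprecated; admin-configured ceilings now control quality
});
console.log(before.decompressed_response); // retained as an optional alias

// 1.0.9 style: tags for attribution, OpenAI-shaped response
const after = await client.complete({
  messages: [{ role: 'user', content: 'Hello' }],
  tags: { team: 'backend', environment: 'production' },
});
console.log(after.choices[0].message.content);
```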
@@ -28,7 +28,27 @@ export interface DecompressResponse {
  processing_time_ms: number;
  }
  export interface CompleteResponse {
- decompressed_response: string;
+ id: string;
+ object: 'chat.completion';
+ created: number;
+ model: string;
+ choices: Array<{
+ index: number;
+ message: {
+ role: 'assistant';
+ content: string | null;
+ tool_calls?: Array<Record<string, any>>;
+ };
+ finish_reason: string | null;
+ }>;
+ usage: {
+ prompt_tokens: number;
+ completion_tokens: number;
+ total_tokens: number;
+ };
+ content: string;
+ decompressed_response?: string;
+ text?: string;
  compression_stats: {
  compression_enabled: boolean;
  original_tokens: number;
@@ -53,19 +73,29 @@ export interface CompleteResponse {
  selected_model: string;
  selected_provider: string;
  selected_model_id: string;
+ /** HLE score of the selected model (server-computed). */
  model_hle: number;
  model_price_per_million: number;
- requested_hle: number | null;
+ input_price_per_million?: number | null;
+ output_price_per_million?: number | null;
+ /** @deprecated Present for backward compatibility. */
+ requested_hle?: number | null;
+ /** The quality ceiling that was applied (from global, tag, or integration settings). */
  effective_hle: number | null;
+ /** Where the effective HLE ceiling came from. */
  hle_source: 'request' | 'tag' | 'global' | 'none';
- hle_clamped: boolean;
+ /** @deprecated Present for backward compatibility. */
+ hle_clamped?: boolean;
  };
- text?: string;
  tokens_saved?: number;
  tokens_used?: number;
  compression_ratio?: number;
  cost_estimate?: number | null;
  savings_estimate?: number | null;
+ model_hle?: number | null;
+ input_price_per_million?: number | null;
+ output_price_per_million?: number | null;
+ lightreach?: Record<string, any>;
  }
  export type MessageRole = 'system' | 'developer' | 'user' | 'assistant';
  export interface Message {
@@ -74,7 +104,13 @@ export interface Message {
  }
  export interface CompleteV2Request {
  messages: Message[];
+ /** Optional provider constraint. Omit for cross-provider cost optimization. */
  llm_provider?: 'openai' | 'anthropic' | 'google' | 'deepseek' | 'moonshot';
+ /**
+ * @deprecated Quality routing is now fully managed by admin-configured ceilings
+ * (global, tag-level, integration-level) in the dashboard. This parameter is
+ * accepted for backward compatibility but should not be used in new code.
+ */
  desired_hle?: number;
  compress?: boolean;
  compression_config?: {
@@ -87,12 +123,25 @@ export interface CompleteV2Request {
  algorithm?: 'greedy';
  temperature?: number;
  max_tokens?: number;
+ /**
+ * Tags for cost attribution and tag-level quality ceilings.
+ * Supported keys: 'team', 'environment', 'feature'.
+ * Values are validated server-side against your workspace's allowed list.
+ * The 'integration' tag is reserved for system use and should not be set manually.
+ *
+ * @example { team: 'backend', environment: 'production', feature: 'search' }
+ */
  tags?: Record<string, string>;
  max_history_messages?: number;
+ /** @deprecated System selects model automatically. */
  model?: string;
+ /** @deprecated Use desired_hle instead. */
  hle_target_percent?: number;
+ /** @deprecated Use desired_hle instead. */
  min_hle_score?: number;
+ /** @deprecated Always auto-selects now. */
  auto_select_by_hle?: boolean;
+ /** @deprecated Use llm_provider instead. */
  same_provider_only?: boolean;
  }
  export interface HealthCheckResponse {
@@ -124,6 +173,7 @@ export declare class PcompresslrAPIClient {
  private session;
  constructor(apiKey?: string, apiUrl?: string, timeout?: number);
  private makeRequest;
+ private toOpenAISupersetResponse;
  /**
  * Create async /complete job (POST /api/v1/complete/jobs).
  */
@@ -145,9 +195,11 @@ export declare class PcompresslrAPIClient {
  /**
  * Compress text without making an LLM call (POST /api/v1/compress).
  *
- * Supported call shapes:
- * - compress(prompt, model?, tags?)
- * - compress(prompt, model, algorithm, tags?) (back-compat; only "greedy" is supported)
+ * @param prompt - Text to compress
+ * @param model - Model for tokenization (default: 'gpt-4')
+ * @param tags - Tags for attribution. Supported keys: 'team', 'environment', 'feature'.
+ *
+ * Also supports a legacy call shape: compress(prompt, model, 'greedy', tags?)
  */
  compress(prompt: string, model?: string, tags?: Record<string, string>): Promise<CompressResponse>;
  compress(prompt: string, model: string, algorithm: 'greedy', tags?: Record<string, string>): Promise<CompressResponse>;
@@ -161,32 +213,23 @@ export declare class PcompresslrAPIClient {
  */
  completeSync(request: CompleteV2Request): Promise<CompleteResponse>;
  /**
- * Messages-first complete with intelligent model selection (POST /api/v2/complete).
+ * Messages-first complete with intelligent model selection.
  *
- * v1.0.0: System automatically selects optimal model based on your provider keys,
- * desired HLE, and admin's global/tag-level HLE ceilings.
+ * Uses async job processing (enqueue + poll) for production reliability.
+ * Model routing is managed by admin-configured quality ceilings (global,
+ * tag-level, integration-level) in the dashboard. The system selects the
+ * cheapest model that meets the effective ceiling.
  *
  * Provider API keys must be stored in your account (BYOK via dashboard).
  *
  * @example
- * // Basic usage (cross-provider optimization)
- * const response = await client.complete({
- * messages: [{role: 'user', content: 'Hello'}],
- * desired_hle: 30,
- * });
- *
- * // Constrained to specific provider
  * const response = await client.complete({
  * messages: [{role: 'user', content: 'Hello'}],
- * llm_provider: 'openai',
- * desired_hle: 35,
+ * tags: { team: 'backend', environment: 'production' },
  * });
  *
- * // Access routing info
  * console.log(response.routing_info?.selected_model);
- * if (response.routing_info?.hle_clamped) {
- * console.log('Admin ceiling lowered your desired HLE');
- * }
+ * console.log(response.routing_info?.effective_hle);
  */
  complete(request: CompleteV2Request): Promise<CompleteResponse>;
  }
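The declarations above distinguish three entry points with the same request and response types. A usage sketch based on these signatures and the polling options documented in the README portion of this diff (the option values are illustrative):

```typescript
import { PcompresslrAPIClient } from 'compress-lightreach';

const client = new PcompresslrAPIClient(process.env.LIGHTREACH_API_KEY);
const request = {
  messages: [{ role: 'user' as const, content: 'Hello' }],
  tags: { team: 'backend', environment: 'production' },
};

// Default path: enqueues a job and polls until it completes.
const viaJob = await client.complete(request);

// One-shot synchronous call to POST /api/v2/complete.
const direct = await client.completeSync(request);

// Explicit job flow with custom polling behavior.
const tuned = await client.completeAsync(request, {
  pollIntervalMs: 500, // illustrative: poll twice per second
  maxWaitMs: 60_000, // illustrative: give up after one minute
  idempotencyKey: 'req-123', // illustrative key
});
```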
@@ -141,6 +141,65 @@ class PcompresslrAPIClient {
  throw new APIRequestError(`Request failed: ${errorMessage}`);
  }
  }
+ toOpenAISupersetResponse(raw) {
+ const response = (raw && typeof raw === 'object') ? { ...raw } : {};
+ const llmStats = (response.llm_stats && typeof response.llm_stats === 'object') ? response.llm_stats : {};
+ const routingInfo = (response.routing_info && typeof response.routing_info === 'object') ? response.routing_info : {};
+ const content = (typeof response.content === 'string' && response.content) ||
+ (typeof response.decompressed_response === 'string' && response.decompressed_response) ||
+ (typeof response.text === 'string' && response.text) ||
+ '';
+ const model = response.model ||
+ response.model_used ||
+ routingInfo.selected_model ||
+ llmStats.model ||
+ 'lightreach';
+ const promptTokens = Number(llmStats.input_tokens ?? 0) || 0;
+ const completionTokens = Number(llmStats.output_tokens ?? 0) || 0;
+ const totalTokens = Number(llmStats.total_tokens ?? (promptTokens + completionTokens)) || (promptTokens + completionTokens);
+ const finishReason = llmStats.finish_reason ?? 'stop';
+ const message = { role: 'assistant', content };
+ if (Array.isArray(response.tool_calls) && response.tool_calls.length > 0) {
+ message.tool_calls = response.tool_calls;
+ if (!content)
+ message.content = null;
+ }
+ response.id = String(response.id || `chatcmpl-${Math.random().toString(16).slice(2)}${Date.now().toString(16)}`);
+ response.object = 'chat.completion';
+ response.created = Number(response.created || Math.floor(Date.now() / 1000));
+ response.model = String(model);
+ response.choices = Array.isArray(response.choices)
+ ? response.choices
+ : [{ index: 0, message, finish_reason: finishReason }];
+ response.usage = (response.usage && typeof response.usage === 'object')
+ ? response.usage
+ : {
+ prompt_tokens: promptTokens,
+ completion_tokens: completionTokens,
+ total_tokens: totalTokens,
+ };
+ response.content = content;
+ if (response.decompressed_response === undefined)
+ response.decompressed_response = content;
+ if (response.text === undefined)
+ response.text = content;
+ response.lightreach = {
+ content,
+ compression_stats: response.compression_stats,
+ llm_stats: response.llm_stats,
+ routing_info: response.routing_info,
+ warnings: response.warnings,
+ tokens_saved: response.tokens_saved,
+ tokens_used: response.tokens_used,
+ compression_ratio: response.compression_ratio,
+ cost_estimate: response.cost_estimate,
+ savings_estimate: response.savings_estimate,
+ provider_used: response.provider_used,
+ model_used: response.model_used,
+ latency_ms: llmStats.latency_ms ?? null,
+ };
+ return response;
+ }
  /**
  * Create async /complete job (POST /api/v1/complete/jobs).
  */
@@ -193,7 +252,7 @@ class PcompresslrAPIClient {
  const st = await this.getCompleteJob(jobId);
  if (st.status === 'succeeded') {
  if (st.result)
- return st.result;
+ return this.toOpenAISupersetResponse(st.result);
  throw new APIRequestError('Async job succeeded but result was missing.');
  }
  if (st.status === 'failed' || st.status === 'canceled') {
@@ -285,35 +344,27 @@ class PcompresslrAPIClient {
  data.auto_select_by_hle = request.auto_select_by_hle;
  if (request.same_provider_only !== undefined)
  data.same_provider_only = request.same_provider_only;
- return this.makeRequest('/api/v2/complete', data, 'POST');
+ const raw = await this.makeRequest('/api/v2/complete', data, 'POST');
+ return this.toOpenAISupersetResponse(raw);
  }
  /**
- * Messages-first complete with intelligent model selection (POST /api/v2/complete).
+ * Messages-first complete with intelligent model selection.
  *
- * v1.0.0: System automatically selects optimal model based on your provider keys,
- * desired HLE, and admin's global/tag-level HLE ceilings.
+ * Uses async job processing (enqueue + poll) for production reliability.
+ * Model routing is managed by admin-configured quality ceilings (global,
+ * tag-level, integration-level) in the dashboard. The system selects the
+ * cheapest model that meets the effective ceiling.
  *
  * Provider API keys must be stored in your account (BYOK via dashboard).
  *
  * @example
- * // Basic usage (cross-provider optimization)
- * const response = await client.complete({
- * messages: [{role: 'user', content: 'Hello'}],
- * desired_hle: 30,
- * });
- *
- * // Constrained to specific provider
  * const response = await client.complete({
  * messages: [{role: 'user', content: 'Hello'}],
- * llm_provider: 'openai',
- * desired_hle: 35,
+ * tags: { team: 'backend', environment: 'production' },
  * });
  *
- * // Access routing info
  * console.log(response.routing_info?.selected_model);
- * if (response.routing_info?.hle_clamped) {
- * console.log('Admin ceiling lowered your desired HLE');
- * }
+ * console.log(response.routing_info?.effective_hle);
  */
  async complete(request) {
  // Warn about deprecated parameters
package/dist/cli.js CHANGED
File without changes
package/dist/core.d.ts CHANGED
@@ -17,19 +17,33 @@ export interface CompressionConfig {
  }
  export interface CompleteOptions {
  messages: Message[];
+ /** @deprecated System selects model automatically. */
  model?: string;
  provider?: 'openai' | 'anthropic' | 'google';
+ /**
+ * @deprecated Quality routing is now fully managed by admin-configured ceilings
+ * in the dashboard. Accepted for backward compatibility.
+ */
  desiredHle?: number;
  compress?: boolean;
  compressionConfig?: CompressionConfig;
  compressOutput?: boolean;
  mode?: 'async' | 'sync';
+ /** @deprecated Use desiredHle instead. */
  hleTargetPercent?: number;
+ /** @deprecated Use desiredHle instead. */
  minHleScore?: number;
+ /** @deprecated Always auto-selects now. */
  autoSelectByHle?: boolean;
+ /** @deprecated Use provider instead. */
  sameProviderOnly?: boolean;
  temperature?: number;
  maxTokens?: number;
+ /**
+ * Tags for cost attribution and quality ceilings.
+ * Supported keys: 'team', 'environment', 'feature'.
+ * The 'integration' tag is reserved for system use.
+ */
  tags?: Record<string, string>;
  maxHistoryMessages?: number;
  }
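For reference, a value conforming to the camelCase `CompleteOptions` shape declared above, in contrast to the snake_case `CompleteV2Request` used by the low-level API client. The import path is an assumption (this diff does not show how the type is re-exported):

```typescript
import type { CompleteOptions } from 'compress-lightreach/dist/core'; // path assumed

const options: CompleteOptions = {
  messages: [{ role: 'user', content: 'Hello' }],
  provider: 'openai',
  compress: true,
  compressOutput: false,
  mode: 'async',
  maxTokens: 500,
  tags: { team: 'backend', environment: 'production' },
};
```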
package/dist/core.js CHANGED
@@ -51,7 +51,6 @@ class LightReach {
  // We do NOT fabricate cost estimates here since the API response does not include pricing data.
  return {
  ...resp,
- text: resp.text ?? resp.decompressed_response,
  tokens_saved: resp.tokens_saved ?? resp.compression_stats?.token_savings,
  tokens_used: resp.tokens_used ?? resp.llm_stats?.total_tokens,
  compression_ratio: resp.compression_ratio ?? resp.compression_stats?.compression_ratio,
package/dist/index.d.ts CHANGED
@@ -1,5 +1,5 @@
  /**
- * Compress Light Reach - Intelligent compression algorithms for LLM prompts.
+ * Compress Light Reach - OpenAI-compatible routing + compression SDK.
  */
  export { __version__ } from './version';
  export { LightReach, Pcompresslr } from './core';
package/dist/index.js CHANGED
@@ -1,6 +1,6 @@
  "use strict";
  /**
- * Compress Light Reach - Intelligent compression algorithms for LLM prompts.
+ * Compress Light Reach - OpenAI-compatible routing + compression SDK.
  */
  Object.defineProperty(exports, "__esModule", { value: true });
  exports.PcompresslrAPIError = exports.APIRequestError = exports.RateLimitError = exports.APIKeyError = exports.PcompresslrAPIClient = exports.Pcompresslr = exports.LightReach = exports.__version__ = void 0;
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "compress-lightreach",
- "version": "1.0.6",
- "description": "AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking",
+ "version": "1.0.9",
+ "description": "OpenAI-compatible LLM routing and compression SDK with LightReach metadata extensions",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "bin": {