compress-lightreach 1.0.5 → 1.0.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -10,11 +10,8 @@ Compress Light Reach is a Node.js/TypeScript SDK that provides intelligent model
 
  ## Features
 
- - **Intelligent Model Routing**: Automatically selects optimal model based on quality requirements (HLE) and available provider keys
- - **Token-aware Compression**: Replaces repeated substrings with shorter placeholders
- - **Dual Algorithms**:
-   - Fast greedy (~99% optimal) for daily use
-   - Optimal DP (O(n²)) for critical prompts
+ - **Intelligent Model Routing**: Automatically selects the optimal model based on admin-configured quality settings and available provider keys
+ - **Token-aware Compression**: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
  - **Lossless**: Perfect decompression guaranteed
  - **Output Compression**: Optional model output compression support
  - **Cloud API**: Uses Light Reach's cloud service for compression and routing
@@ -40,7 +37,7 @@ The SDK uses **intelligent model routing** and targets `POST /api/v2/complete`.
 
  - Authenticate with your **LightReach API key** (env var `PCOMPRESLR_API_KEY` or `LIGHTREACH_API_KEY`)
  - Manage **provider keys** (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- - System automatically selects optimal model based on your requirements
+ - System automatically selects the optimal model based on admin-configured quality settings
 
  ```typescript
  import { PcompresslrAPIClient } from 'compress-lightreach';
@@ -52,7 +49,7 @@ const result = await client.complete({
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain quantum computing in simple terms.' },
    ],
-   desired_hle: 30, // Quality preference (0-40, where 40 is SOTA)
+   tags: { team: 'backend', environment: 'production' },
  });
 
  console.log(result.decompressed_response);
@@ -60,84 +57,117 @@ console.log(`Selected: ${result.routing_info?.selected_model}`);
  console.log(`Token savings: ${result.compression_stats.token_savings}`);
  ```
 
- ### With Output Compression
+ ## OpenAI-compatible API (Cursor / OpenAI SDKs)
+
+ LightReach also exposes a **strict OpenAI-compatible** surface (including streaming SSE), so you can use standard OpenAI tooling without changing your app.
+
+ - **Cursor base URL**: `https://api.compress.lightreach.io/v1/cursor`
+ - **Generic OpenAI-compatible base URL**: `https://api.compress.lightreach.io/v1`
+ - **Endpoints**: `GET /models`, `POST /chat/completions`
+ - **Model id**: `lightreach`
+
+ Example (cURL):
+
+ ```bash
+ curl -sS https://api.compress.lightreach.io/v1/chat/completions \
+   -H "Authorization: Bearer lr_your_lightreach_key" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "lightreach",
+     "messages": [{"role":"user","content":"Say hello"}],
+     "stream": true
+   }'
+ ```
+
+ ## Tags
+
+ Tags provide **cost attribution** and enable **admin-controlled quality ceilings** per tag. The system supports three tag categories that you can set on requests:
+
+ | Tag Key | Description | Example Values |
+ |---------|-------------|----------------|
+ | `team` | Your team or group | `"backend"`, `"ml-platform"`, `"marketing"` |
+ | `environment` | Deployment environment | `"development"`, `"staging"`, `"production"` |
+ | `feature` | Feature or use case | `"search"`, `"chat"`, `"summarization"` |
+
+ Tags are validated server-side. Your workspace admin can configure the allowed values for each tag category via the dashboard. If a tag value is not in the allowed list, the request either receives a warning or is rejected, depending on your workspace's enforcement mode.
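
The validation flow just described can be mirrored client-side to fail fast before a request is sent. A minimal sketch; `validateTags` and the allowed-value lists below are hypothetical stand-ins for your workspace's dashboard configuration:

```typescript
// Hypothetical pre-flight check mirroring the server-side tag validation.
// The tag keys come from the table above; the allowed values are examples
// only. Your workspace's real lists are configured in the dashboard.
const ALLOWED_TAG_VALUES: Record<string, string[]> = {
  team: ["backend", "ml-platform", "marketing"],
  environment: ["development", "staging", "production"],
  feature: ["search", "chat", "summarization"],
};

export function validateTags(tags: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const [key, value] of Object.entries(tags)) {
    if (key === "integration") {
      problems.push(`tag "integration" is reserved for system use`);
    } else if (!(key in ALLOWED_TAG_VALUES)) {
      problems.push(`unknown tag key "${key}"`);
    } else if (!ALLOWED_TAG_VALUES[key].includes(value)) {
      problems.push(`value "${value}" is not allowed for tag "${key}"`);
    }
  }
  return problems;
}
```

Whether the server treats keys outside these three categories as errors depends on your workspace's enforcement mode; this sketch simply flags anything outside the documented set.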
 
  ```typescript
  const result = await client.complete({
-   messages: [{ role: 'user', content: 'Generate a long report...' }],
-   desired_hle: 25,
-   compress_output: true,
+   messages: [{ role: 'user', content: 'Summarize this document...' }],
+   tags: {
+     team: 'backend',
+     environment: 'production',
+     feature: 'summarization',
+   },
  });
-
- console.log(result.decompressed_response);
  ```
 
- ### Intelligent Model Routing
+ > **Note:** The `integration` tag is reserved for system use (e.g., Cursor, Claude Code) and should not be set manually. The `project` tag is also available for workspace-level project attribution — see your dashboard for configuration.
+
+ ## Intelligent Model Routing
+
+ Model routing is fully managed by your workspace admin via the dashboard. The system uses **HLE (Humanity's Last Exam)** scores — a standardized benchmark — to determine model quality. Admins configure quality ceilings at three levels:
 
- The system automatically selects the optimal model based on quality requirements and your available provider keys:
+ - **Global ceiling**: Set via the HLE slider in the dashboard. Applies to all requests.
+ - **Tag-level ceilings**: Set per tag (e.g., `environment=development` gets a lower ceiling to save costs).
+ - **Integration-level ceilings**: Set per integration (e.g., Cursor, Claude Code).
+
+ The routing engine picks the **cheapest model** whose HLE score meets the effective ceiling. HLE scores are maintained server-side and cannot be overridden by SDK callers.
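
As a toy model of that rule: filter candidates by the effective ceiling, then take the cheapest survivor. `pickModel`, the candidate list, and the reading of "meets the ceiling" as "HLE at least the effective value" are illustrative assumptions, not the server implementation:

```typescript
// Toy sketch of the routing rule. Real HLE scores and prices are
// maintained server-side; these numbers are made up for illustration.
interface Candidate {
  id: string;
  hle: number;             // benchmark quality score
  pricePerMillion: number; // USD per 1M input tokens
}

export function pickModel(
  candidates: Candidate[],
  effectiveHle: number,
): Candidate | null {
  // Keep only models whose quality meets the effective value,
  // then take the cheapest of them.
  const eligible = candidates.filter((m) => m.hle >= effectiveHle);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, m) =>
    m.pricePerMillion < best.pricePerMillion ? m : best,
  );
}
```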
 
  ```typescript
  import { PcompresslrAPIClient } from 'compress-lightreach';
 
  const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
- // Cross-provider optimization: system picks cheapest model meeting your quality bar
  const result = await client.complete({
    messages: [{ role: 'user', content: 'Explain quantum computing' }],
-   desired_hle: 30, // Quality preference (0-40, where 40 is SOTA)
+   tags: { team: 'backend', environment: 'production' },
  });
 
- // Check what was selected
  console.log(result.routing_info?.selected_model); // e.g., "gpt-4o-mini"
  console.log(result.routing_info?.selected_provider); // e.g., "openai"
  console.log(result.routing_info?.model_hle); // e.g., 32.5
  console.log(result.routing_info?.model_price_per_million); // e.g., 0.15
  ```
 
+ ### Routing Response
+
+ Every `complete()` response includes `routing_info` with full transparency into the routing decision:
+
+ ```typescript
+ const info = result.routing_info;
+ console.log(`Model: ${info?.selected_model}`);
+ console.log(`Provider: ${info?.selected_provider}`);
+ console.log(`Model HLE: ${info?.model_hle}`);
+ console.log(`Effective HLE ceiling: ${info?.effective_hle}`);
+ console.log(`Ceiling source: ${info?.hle_source}`); // "tag", "global", or "none"
+ ```
+
  ### Provider-Constrained Routing
 
  Optionally constrain to a specific provider:
 
  ```typescript
- // Only use OpenAI models, but pick the cheapest one meeting HLE 35
  const result = await client.complete({
    messages: [{ role: 'user', content: 'Write a poem' }],
-   llm_provider: 'openai', // Optional: constrain to one provider
-   desired_hle: 35,
+   llm_provider: 'anthropic',
  });
  ```
 
- ### HLE Cascading with Admin Controls
-
- Admins can set quality **ceilings** via the dashboard (global or per-tag) to control costs. Your `desired_hle` is a preference; if it exceeds an admin-set ceiling, the request will **silently clamp** to the ceiling and proceed.
+ ### With Output Compression
 
  ```typescript
- // Admin set global HLE ceiling to 30%
- // Requesting above the ceiling will be clamped to 30 (no error)
- const result = await client.complete({
-   messages: [{ role: 'user', content: 'Process payment' }],
-   desired_hle: 35, // Will be clamped down to 30
-   tags: { env: 'production' },
- });
-
- // Correct usage: request within ceiling
  const result = await client.complete({
-   messages: [{ role: 'user', content: 'Process payment' }],
-   desired_hle: 25, // OK: below ceiling of 30
-   tags: { env: 'production' },
+   messages: [{ role: 'user', content: 'Generate a long report...' }],
+   compress_output: true,
  });
 
- // Check if your HLE was lowered by an admin ceiling
- if (result.routing_info?.hle_clamped) {
-   console.log(`HLE lowered from ${result.routing_info.requested_hle} ` +
-     `to ${result.routing_info.effective_hle} ` +
-     `by ${result.routing_info.hle_source}-level ceiling`);
- }
+ console.log(result.decompressed_response);
  ```
 
  ### With Compression Config
 
- Configure per-role compression settings:
+ Control which message roles get compressed:
 
  ```typescript
  import { PcompresslrAPIClient } from 'compress-lightreach';
@@ -146,7 +176,6 @@ const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
  const result = await client.complete({
    messages: [{ role: 'user', content: 'Hello!' }],
-   desired_hle: 30,
    compress: true,
    compress_output: false,
    compression_config: {
@@ -157,14 +186,13 @@ const result = await client.complete({
    },
    temperature: 0.7,
    max_tokens: 1000,
-   tags: { env: 'production' },
+   tags: { team: 'backend', environment: 'production' },
  });
 
  console.log(result.decompressed_response);
  console.log(`Model used: ${result.routing_info?.selected_model}`);
  ```
 
-
  ### Compression Only (No LLM Call)
 
  ```typescript
@@ -172,12 +200,10 @@ import { PcompresslrAPIClient } from 'compress-lightreach';
 
  const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
- // Compress text without making an LLM call
  const compressed = await client.compress(
    "Your text with repeated content here...",
-   "gpt-4", // Model for tokenization
-   "greedy", // Algorithm: 'greedy' or 'optimal'
-   { env: 'dev' } // Optional tags
+   "gpt-4",
+   { team: 'backend' },
  );
 
  console.log(compressed.llm_format);
@@ -191,17 +217,9 @@ console.log(decompressed.decompressed);
  ### Command Line Interface
 
  ```bash
- # Set your API key
  export PCOMPRESLR_API_KEY=your-api-key
 
- # Compress a prompt
  npx pcompresslr "Your prompt with repeated text here..."
-
- # Use optimal algorithm only
- npx pcompresslr "Your prompt here" --optimal-only
-
- # Use greedy algorithm only
- npx pcompresslr "Your prompt here" --greedy-only
  ```
 
  ## API Reference
@@ -219,13 +237,15 @@ new PcompresslrAPIClient(apiKey?: string, apiUrl?: string, timeout?: number)
  **Parameters:**
  - `apiKey` (string, optional): LightReach API key. Falls back to `LIGHTREACH_API_KEY` or `PCOMPRESLR_API_KEY` env vars.
  - `apiUrl` (string, optional): Override base API URL. Falls back to `PCOMPRESLR_API_URL` env var. Default: `https://api.compress.lightreach.io`
- - `timeout` (number, optional): Request timeout in milliseconds. Default: `120000` (2 minutes)
+ - `timeout` (number, optional): Request timeout in milliseconds. Default: `900000` (15 minutes)
 
  #### Methods
 
  ##### `complete(request: CompleteV2Request): Promise<CompleteResponse>`
 
- Messages-first completion with intelligent routing (POST `/api/v2/complete`).
+ Messages-first completion with intelligent routing. Uses async job processing (enqueue + poll) for production reliability.
+
+ For direct synchronous calls, use `completeSync()` instead.
 
  **Request Parameters (`CompleteV2Request`):**
 
@@ -233,14 +253,12 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
  |-----------|------|---------|-------------|
  | `messages` | `Message[]` | required | Conversation history with `role` and `content` |
  | `llm_provider` | `'openai' \| 'anthropic' \| 'google' \| 'deepseek' \| 'moonshot'` | — | Optional provider constraint. Omit for cross-provider optimization |
- | `desired_hle` | `number` | — | Quality preference (0-40, where 40 is SOTA). If above an admin ceiling, it is clamped down |
  | `compress` | `boolean` | `true` | Whether to compress messages |
  | `compress_output` | `boolean` | `false` | Whether to request compressed output from LLM |
- | `algorithm` | `'greedy' \| 'optimal'` | `'greedy'` | Compression algorithm |
  | `compression_config` | `object` | — | Per-role compression settings (see below) |
  | `temperature` | `number` | — | LLM temperature parameter |
  | `max_tokens` | `number` | — | Maximum tokens to generate |
- | `tags` | `Record<string, string>` | — | Tags for cost attribution and tag-level HLE ceilings |
+ | `tags` | `Record<string, string>` | — | Tags for cost attribution and quality ceilings. Use `team`, `environment`, and/or `feature` keys |
  | `max_history_messages` | `number` | — | Limit conversation history length |
 
  **`compression_config` options:**
@@ -258,51 +276,67 @@ Messages-first completion with intelligent routing (POST `/api/v2/complete`).
 
  ```typescript
  {
+   content: string; // Final response content
    decompressed_response: string; // Final decompressed LLM response
    compression_stats: {
-     original_size_chars: number;
-     compressed_size_chars: number;
+     compression_enabled: boolean;
      original_tokens: number;
      compressed_tokens: number;
-     compression_ratio: number;
      token_savings: number;
-     token_savings_percent: number;
+     compression_ratio: number;
+     token_count_exact?: boolean;
+     token_count_source?: string;
+     token_accounting_note?: string;
      processing_time_ms?: number;
    };
    llm_stats: {
-     prompt_tokens: number;
-     completion_tokens: number;
+     provider?: string;
+     model?: string;
+     input_tokens: number;
+     output_tokens: number;
      total_tokens: number;
+     finish_reason?: string | null;
    };
    routing_info?: {
      selected_model: string; // Model chosen by system
      selected_provider: string; // Provider chosen by system
      selected_model_id: string;
-     model_hle: number; // HLE score of selected model
+     model_hle: number; // HLE score of selected model (server-computed)
      model_price_per_million: number;
-     requested_hle: number | null;
-     effective_hle: number | null; // Effective HLE after admin ceilings
-     hle_source: 'request' | 'tag' | 'global' | 'none';
-     hle_clamped: boolean; // true if admin ceiling lowered your desired_hle
+     effective_hle: number | null; // The quality ceiling that was applied
+     hle_source: 'tag' | 'global' | 'none';
    };
    warnings?: string[];
-
+
    // Convenience aliases
-   text?: string; // Alias for decompressed_response
-   tokens_saved?: number; // Alias for compression_stats.token_savings
-   tokens_used?: number; // Alias for llm_stats.total_tokens
-   compression_ratio?: number; // Alias for compression_stats.compression_ratio
+   tokens_saved?: number;
+   tokens_used?: number;
+   compression_ratio?: number;
+   cost_estimate?: number | null;
+   savings_estimate?: number | null;
  }
  ```
 
- ##### `compress(prompt, model?, algorithm?, tags?): Promise<CompressResponse>`
+ ##### `completeSync(request: CompleteV2Request): Promise<CompleteResponse>`
+
+ Direct synchronous call to POST `/api/v2/complete`. Best for small/interactive usage. For production reliability, prefer `complete()` (async job + polling).
+
+ ##### `completeAsync(request, opts?): Promise<CompleteResponse>`
+
+ Explicit async job flow with configurable polling. Called internally by `complete()`.
+
+ **Options:**
+ - `pollIntervalMs` (number, default: 1000): Polling interval in milliseconds
+ - `maxWaitMs` (number, default: timeout): Maximum wait time
+ - `idempotencyKey` (string, optional): Idempotency key for job creation
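
The enqueue-and-poll flow can be sketched as below. `fetchJob` is a hypothetical stand-in for the real job-status call; the 1.2x backoff capped at 2 seconds mirrors the polling loop in the client code elsewhere in this diff:

```typescript
// Sketch of an enqueue-and-poll loop with gentle backoff.
// fetchJob is a placeholder for the real job-status request.
export async function pollUntilDone<T>(
  fetchJob: () => Promise<{ done: boolean; result?: T }>,
  pollIntervalMs = 1000,
  maxWaitMs = 900_000,
): Promise<T> {
  const deadline = Date.now() + maxWaitMs;
  let interval = pollIntervalMs;
  while (Date.now() < deadline) {
    const job = await fetchJob();
    if (job.done) return job.result as T;
    await new Promise((resolve) => setTimeout(resolve, interval));
    interval = Math.min(Math.floor(interval * 1.2), 2000); // backoff, capped at 2s
  }
  throw new Error("Timed out waiting for completion job");
}
```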
+
+ ##### `compress(prompt, model?, tags?): Promise<CompressResponse>`
 
  Compression-only (POST `/api/v1/compress`).
 
  **Parameters:**
  - `prompt` (string, required): Text to compress
  - `model` (string, optional): Model for tokenization. Default: `'gpt-4'`
- - `algorithm` (`'greedy' | 'optimal'`, optional): Compression algorithm. Default: `'greedy'`
  - `tags` (`Record<string, string>`, optional): Tags for attribution
 
  **Response (`CompressResponse`):**
@@ -349,7 +383,6 @@ Check API health status (GET `/health`).
  }
  ```
 
-
  ### Message Types
 
  ```typescript
@@ -376,7 +409,7 @@ interface Message {
  | `PcompresslrAPIError` | Base exception class |
  | `APIKeyError` | Invalid or missing API key |
  | `RateLimitError` | Rate limit exceeded |
- | `APIRequestError` | General API errors (including routing failures) |
+ | `APIRequestError` | General API errors (including routing failures, tag validation errors) |
 
  ```typescript
  import { APIKeyError, RateLimitError, APIRequestError } from 'compress-lightreach';
@@ -396,15 +429,10 @@ try {
 
  ## How It Works
 
- Compress Light Reach uses intelligent algorithms to identify repeated substrings in your prompts and replace them with shorter placeholders.
-
- The library:
- 1. Identifies repeated substrings using efficient suffix array algorithms
- 2. Calculates token savings for each potential replacement
- 3. Selects optimal replacements that reduce total token count
- 4. Intelligently routes to the best model based on your quality requirements
- 5. Formats the result for easy LLM consumption
- 6. Provides perfect decompression
+ 1. **Compression**: Identifies repeated substrings using efficient algorithms and replaces them with shorter placeholders, reducing token count
+ 2. **Routing**: Selects the cheapest model that meets the admin-configured quality ceiling (global, tag-level, or integration-level)
+ 3. **LLM Call**: Sends the compressed prompt to the selected model via your BYOK provider keys
+ 4. **Decompression**: Losslessly restores the model's response if output compression was enabled
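
Steps 1 and 4 can be illustrated with a toy dictionary substitution. The real SDK is token-aware and finds repeats automatically; `toyCompress`/`toyDecompress` below are only a sketch of the lossless round trip:

```typescript
// Toy lossless compression: replace every occurrence of a known repeated
// phrase with a short placeholder, and keep a dictionary to undo it.
export function toyCompress(
  text: string,
  phrase: string,
  placeholder = "§1",
): { compressed: string; dict: Record<string, string> } {
  return {
    compressed: text.split(phrase).join(placeholder),
    dict: { [placeholder]: phrase },
  };
}

export function toyDecompress(
  compressed: string,
  dict: Record<string, string>,
): string {
  let out = compressed;
  for (const [placeholder, phrase] of Object.entries(dict)) {
    out = out.split(placeholder).join(phrase);
  }
  return out;
}
```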
 
  ## Examples
 
@@ -423,7 +451,7 @@ Write a story about a bird. The bird is very friendly.
 
  const result = await client.complete({
    messages: [{ role: "user", content: prompt }],
-   desired_hle: 30,
+   tags: { team: 'content', environment: 'production' },
  });
 
  console.log(result.decompressed_response);
@@ -441,7 +469,6 @@ const client = new PcompresslrAPIClient("your-lightreach-api-key");
 
  const result = await client.complete({
    messages: [{ role: "user", content: "Generate a long report with repeated sections..." }],
-   desired_hle: 35,
    compress_output: true,
  });
 
@@ -462,13 +489,13 @@ const result = await client.complete({
      { role: "assistant", content: "You can use open() with a context manager..." },
      { role: "user", content: "How about writing to a file?" },
    ],
-   desired_hle: 30,
    compression_config: {
      compress_system: false,
      compress_user: true,
      compress_assistant: false,
-     compress_only_last_n_user: 2, // Only compress last 2 user messages
+     compress_only_last_n_user: 2,
    },
+   tags: { team: 'engineering', feature: 'code-assistant' },
  });
  ```
 
@@ -28,40 +28,53 @@ export interface DecompressResponse {
    processing_time_ms: number;
  }
  export interface CompleteResponse {
-   decompressed_response: string;
+   content: string;
    compression_stats: {
-     original_size_chars: number;
-     compressed_size_chars: number;
+     compression_enabled: boolean;
      original_tokens: number;
      compressed_tokens: number;
-     compression_ratio: number;
      token_savings: number;
-     token_savings_percent: number;
+     compression_ratio: number;
+     token_count_exact?: boolean;
+     token_count_source?: string;
+     token_accounting_note?: string;
      processing_time_ms?: number;
    };
    llm_stats: {
-     prompt_tokens: number;
-     completion_tokens: number;
+     provider?: string;
+     model?: string;
+     input_tokens: number;
+     output_tokens: number;
      total_tokens: number;
+     finish_reason?: string | null;
    };
    warnings?: string[];
    routing_info?: {
      selected_model: string;
      selected_provider: string;
      selected_model_id: string;
+     /** HLE score of the selected model (server-computed). */
      model_hle: number;
      model_price_per_million: number;
-     requested_hle: number | null;
+     input_price_per_million?: number | null;
+     output_price_per_million?: number | null;
+     /** @deprecated Present for backward compatibility. */
+     requested_hle?: number | null;
+     /** The quality ceiling that was applied (from global, tag, or integration settings). */
      effective_hle: number | null;
+     /** Where the effective HLE ceiling came from. */
      hle_source: 'request' | 'tag' | 'global' | 'none';
-     hle_clamped: boolean;
+     /** @deprecated Present for backward compatibility. */
+     hle_clamped?: boolean;
    };
-   text?: string;
    tokens_saved?: number;
    tokens_used?: number;
    compression_ratio?: number;
    cost_estimate?: number | null;
    savings_estimate?: number | null;
+   model_hle?: number | null;
+   input_price_per_million?: number | null;
+   output_price_per_million?: number | null;
  }
  export type MessageRole = 'system' | 'developer' | 'user' | 'assistant';
  export interface Message {
@@ -70,7 +83,13 @@ export interface Message {
  }
  export interface CompleteV2Request {
    messages: Message[];
+   /** Optional provider constraint. Omit for cross-provider cost optimization. */
    llm_provider?: 'openai' | 'anthropic' | 'google' | 'deepseek' | 'moonshot';
+   /**
+    * @deprecated Quality routing is now fully managed by admin-configured ceilings
+    * (global, tag-level, integration-level) in the dashboard. This parameter is
+    * accepted for backward compatibility but should not be used in new code.
+    */
    desired_hle?: number;
    compress?: boolean;
    compression_config?: {
@@ -80,15 +99,28 @@ export interface CompleteV2Request {
      compress_only_last_n_user?: number | null;
    };
    compress_output?: boolean;
-   algorithm?: 'greedy' | 'optimal';
+   algorithm?: 'greedy';
    temperature?: number;
    max_tokens?: number;
+   /**
+    * Tags for cost attribution and tag-level quality ceilings.
+    * Supported keys: 'team', 'environment', 'feature'.
+    * Values are validated server-side against your workspace's allowed list.
+    * The 'integration' tag is reserved for system use and should not be set manually.
+    *
+    * @example { team: 'backend', environment: 'production', feature: 'search' }
+    */
    tags?: Record<string, string>;
    max_history_messages?: number;
+   /** @deprecated System selects model automatically. */
    model?: string;
+   /** @deprecated Use desired_hle instead. */
    hle_target_percent?: number;
+   /** @deprecated Use desired_hle instead. */
    min_hle_score?: number;
+   /** @deprecated Always auto-selects now. */
    auto_select_by_hle?: boolean;
+   /** @deprecated Use llm_provider instead. */
    same_provider_only?: boolean;
  }
  export interface HealthCheckResponse {
@@ -138,7 +170,17 @@ export declare class PcompresslrAPIClient {
      maxWaitMs?: number;
      idempotencyKey?: string;
    }): Promise<CompleteResponse>;
-   compress(prompt: string, model?: string, algorithm?: "greedy" | "optimal", tags?: Record<string, string>): Promise<CompressResponse>;
+   /**
+    * Compress text without making an LLM call (POST /api/v1/compress).
+    *
+    * @param prompt - Text to compress
+    * @param model - Model for tokenization (default: 'gpt-4')
+    * @param tags - Tags for attribution. Supported keys: 'team', 'environment', 'feature'.
+    *
+    * Also supports a legacy call shape: compress(prompt, model, 'greedy', tags?)
+    */
+   compress(prompt: string, model?: string, tags?: Record<string, string>): Promise<CompressResponse>;
+   compress(prompt: string, model: string, algorithm: 'greedy', tags?: Record<string, string>): Promise<CompressResponse>;
    decompress(llmFormat: string): Promise<DecompressResponse>;
    healthCheck(): Promise<HealthCheckResponse>;
    /**
@@ -149,32 +191,23 @@ export declare class PcompresslrAPIClient {
     */
    completeSync(request: CompleteV2Request): Promise<CompleteResponse>;
    /**
-    * Messages-first complete with intelligent model selection (POST /api/v2/complete).
+    * Messages-first complete with intelligent model selection.
     *
-    * v1.0.0: System automatically selects optimal model based on your provider keys,
-    * desired HLE, and admin's global/tag-level HLE ceilings.
+    * Uses async job processing (enqueue + poll) for production reliability.
+    * Model routing is managed by admin-configured quality ceilings (global,
+    * tag-level, integration-level) in the dashboard. The system selects the
+    * cheapest model that meets the effective ceiling.
     *
     * Provider API keys must be stored in your account (BYOK via dashboard).
     *
     * @example
-    * // Basic usage (cross-provider optimization)
-    * const response = await client.complete({
-    *   messages: [{role: 'user', content: 'Hello'}],
-    *   desired_hle: 30,
-    * });
-    *
-    * // Constrained to specific provider
     * const response = await client.complete({
     *   messages: [{role: 'user', content: 'Hello'}],
-    *   llm_provider: 'openai',
-    *   desired_hle: 35,
+    *   tags: { team: 'backend', environment: 'production' },
     * });
     *
-    * // Access routing info
     * console.log(response.routing_info?.selected_model);
-    * if (response.routing_info?.hle_clamped) {
-    *   console.log('Admin ceiling lowered your desired HLE');
-    * }
+    * console.log(response.routing_info?.effective_hle);
     */
    complete(request: CompleteV2Request): Promise<CompleteResponse>;
  }
@@ -206,15 +206,22 @@ class PcompresslrAPIClient {
            interval = Math.min(Math.floor(interval * 1.2), 2000);
        }
    }
-   async compress(prompt, model = "gpt-4", algorithm = "greedy", tags) {
-       const data = {
-           prompt,
-           model,
-           algorithm
-       };
-       if (tags) {
-           data.tags = tags;
+   async compress(prompt, model = "gpt-4", algorithmOrTags, maybeTags) {
+       let algorithm = 'greedy';
+       let tags;
+       if (typeof algorithmOrTags === 'string') {
+           if (algorithmOrTags !== 'greedy') {
+               throw new APIRequestError(`Invalid algorithm "${algorithmOrTags}". Only "greedy" is supported.`);
+           }
+           algorithm = 'greedy';
+           tags = maybeTags;
        }
+       else if (algorithmOrTags && typeof algorithmOrTags === 'object') {
+           tags = algorithmOrTags;
+       }
+       const data = { prompt, model, algorithm };
+       if (tags)
+           data.tags = tags;
        return this.makeRequest("/api/v1/compress", data);
    }
    async decompress(llmFormat) {
@@ -281,32 +288,23 @@ class PcompresslrAPIClient {
        return this.makeRequest('/api/v2/complete', data, 'POST');
    }
    /**
-    * Messages-first complete with intelligent model selection (POST /api/v2/complete).
+    * Messages-first complete with intelligent model selection.
     *
-    * v1.0.0: System automatically selects optimal model based on your provider keys,
-    * desired HLE, and admin's global/tag-level HLE ceilings.
+    * Uses async job processing (enqueue + poll) for production reliability.
+    * Model routing is managed by admin-configured quality ceilings (global,
+    * tag-level, integration-level) in the dashboard. The system selects the
+    * cheapest model that meets the effective ceiling.
     *
     * Provider API keys must be stored in your account (BYOK via dashboard).
     *
     * @example
-    * // Basic usage (cross-provider optimization)
-    * const response = await client.complete({
-    *   messages: [{role: 'user', content: 'Hello'}],
-    *   desired_hle: 30,
-    * });
-    *
-    * // Constrained to specific provider
     * const response = await client.complete({
     *   messages: [{role: 'user', content: 'Hello'}],
-    *   llm_provider: 'openai',
-    *   desired_hle: 35,
+    *   tags: { team: 'backend', environment: 'production' },
     * });
     *
-    * // Access routing info
     * console.log(response.routing_info?.selected_model);
-    * if (response.routing_info?.hle_clamped) {
-    *   console.log('Admin ceiling lowered your desired HLE');
-    * }
+    * console.log(response.routing_info?.effective_hle);
     */
    async complete(request) {
        // Warn about deprecated parameters
@@ -321,27 +319,6 @@ class PcompresslrAPIClient {
            console.warn('[compress-lightreach v1.0.0] HLE parameters have changed. ' +
                'Use "desired_hle" and optional "llm_provider" instead.');
        }
-       const data = {
-           messages: request.messages,
-           compress: request.compress ?? true,
-           compress_output: request.compress_output ?? false,
-           algorithm: request.algorithm ?? 'greedy',
-       };
-       // v1.0.0 parameters
-       if (request.llm_provider !== undefined)
-           data.llm_provider = request.llm_provider;
-       if (request.desired_hle !== undefined)
-           data.desired_hle = request.desired_hle;
-       if (request.compression_config)
-           data.compression_config = request.compression_config;
-       if (request.temperature !== undefined)
-           data.temperature = request.temperature;
-       if (request.max_tokens !== undefined)
-           data.max_tokens = request.max_tokens;
-       if (request.tags !== undefined)
-           data.tags = request.tags;
-       if (request.max_history_messages !== undefined)
-           data.max_history_messages = request.max_history_messages;
        // Prefer async jobs for production reliability; sync remains available via /api/v2/complete
        // by calling makeRequest directly if needed.
        return this.completeAsync(request);
package/dist/cli.js CHANGED
@@ -8,29 +8,17 @@ const api_client_1 = require("./api-client");
  async function main() {
  const args = process.argv.slice(2);
  if (args.length === 0) {
- console.log("Usage: pcompresslr <prompt> [--greedy-only|--optimal-only]");
+ console.log("Usage: pcompresslr <prompt>");
  console.log("\nExample:");
  console.log(' pcompresslr "hello world hello world hello world"');
- console.log(' pcompresslr "your prompt here" --greedy-only # Only greedy');
- console.log(' pcompresslr "your prompt here" --optimal-only # Only optimal');
  console.log("\nNote: Requires PCOMPRESLR_API_KEY environment variable");
  process.exit(0);
  }
- let prompt = args.join(" ");
- let showGreedy = true;
- let showOptimal = true;
- if (prompt.endsWith("--greedy-only")) {
- prompt = args.slice(0, -1).join(" ");
- showOptimal = false;
- }
- else if (prompt.endsWith("--optimal-only")) {
- prompt = args.slice(0, -1).join(" ");
- showGreedy = false;
- }
+ const prompt = args.join(" ");
  // Get API key from environment
  const apiKey = process.env.PCOMPRESLR_API_KEY;
  if (!apiKey) {
- console.error("Error: PCOMPRESLR_API_KEY environment variable is required.");
+ console.error("Error: PCOMPRESLR_API_KEY environment variable is required.");
  console.error("\nTo get an API key, visit https://compress.lightreach.io");
  console.error("Then set it with: export PCOMPRESLR_API_KEY=your-key-here");
  process.exit(1);
@@ -42,7 +30,7 @@ async function main() {
  }
  catch (error) {
  if (error instanceof api_client_1.APIKeyError) {
- console.error(`❌ Error: ${error.message}`);
+ console.error(`Error: ${error.message}`);
  process.exit(1);
  }
  throw error;
@@ -50,116 +38,38 @@ async function main() {
  console.log(`Original prompt: ${JSON.stringify(prompt)}`);
  console.log(`Length: ${prompt.length} characters\n`);
  console.log("=".repeat(80));
- // Run both compressors and compare
- const results = {};
- if (showGreedy) {
- console.log("\n🔹 GREEDY COMPRESSOR (Fast, ~99% optimal)");
- console.log("-".repeat(80));
- try {
- const resultGreedy = await client.compress(prompt, "gpt-4", "greedy");
- const compressedGreedy = resultGreedy.compressed;
- const dictGreedy = resultGreedy.dictionary;
- const ratioGreedy = resultGreedy.compression_ratio;
- const llmFormatGreedy = resultGreedy.llm_format;
- // Verify decompression
- const decompressResult = await client.decompress(llmFormatGreedy);
- const decompressedGreedy = decompressResult.decompressed;
- results['greedy'] = {
- compressed: compressedGreedy,
- dict: dictGreedy,
- ratio: ratioGreedy,
- llm_format: llmFormatGreedy,
- decompressed: decompressedGreedy
- };
- console.log(`Compressed: ${JSON.stringify(compressedGreedy)}`);
- console.log(`Dictionary: ${JSON.stringify(dictGreedy)}`);
- console.log(`Compression ratio: ${(ratioGreedy * 100).toFixed(2)}%`);
- console.log(`LLM-ready format length: ${llmFormatGreedy.length} chars`);
- console.log(`Processing time: ${resultGreedy.processing_time_ms.toFixed(2)}ms`);
- if (decompressedGreedy === prompt) {
- console.log("✅ Decompression verified");
- }
- else {
- console.log("❌ Decompression failed");
- }
- }
- catch (error) {
- if (error instanceof api_client_1.RateLimitError) {
- console.error(`❌ Rate limit exceeded: ${error.message}`);
- }
- else if (error instanceof api_client_1.APIRequestError) {
- console.error(`❌ API error: ${error.message}`);
- }
- else {
- throw error;
- }
- }
- }
- if (showOptimal) {
- console.log("\n🔸 OPTIMAL COMPRESSOR (DP, O(n²), globally optimal)");
- console.log("-".repeat(80));
- try {
- const resultOptimal = await client.compress(prompt, "gpt-4", "optimal");
- const compressedOptimal = resultOptimal.compressed;
- const dictOptimal = resultOptimal.dictionary;
- const ratioOptimal = resultOptimal.compression_ratio;
- const llmFormatOptimal = resultOptimal.llm_format;
- // Verify decompression
- const decompressResult = await client.decompress(llmFormatOptimal);
- const decompressedOptimal = decompressResult.decompressed;
- results['optimal'] = {
- compressed: compressedOptimal,
- dict: dictOptimal,
- ratio: ratioOptimal,
- llm_format: llmFormatOptimal,
- decompressed: decompressedOptimal
- };
- console.log(`Compressed: ${JSON.stringify(compressedOptimal)}`);
- console.log(`Dictionary: ${JSON.stringify(dictOptimal)}`);
- console.log(`Compression ratio: ${(ratioOptimal * 100).toFixed(2)}%`);
- console.log(`LLM-ready format length: ${llmFormatOptimal.length} chars`);
- console.log(`Processing time: ${resultOptimal.processing_time_ms.toFixed(2)}ms`);
- if (decompressedOptimal === prompt) {
- console.log("✅ Decompression verified");
- }
- else {
- console.log("❌ Decompression failed");
- }
+ console.log("\nGREEDY COMPRESSOR");
+ console.log("-".repeat(80));
+ try {
+ const result = await client.compress(prompt, "gpt-4");
+ const compressed = result.compressed;
+ const dictionary = result.dictionary;
+ const ratio = result.compression_ratio;
+ const llmFormat = result.llm_format;
+ // Verify decompression
+ const decompressResult = await client.decompress(llmFormat);
+ const decompressed = decompressResult.decompressed;
+ console.log(`Compressed: ${JSON.stringify(compressed)}`);
+ console.log(`Dictionary: ${JSON.stringify(dictionary)}`);
+ console.log(`Compression ratio: ${(ratio * 100).toFixed(2)}%`);
+ console.log(`LLM-ready format length: ${llmFormat.length} chars`);
+ console.log(`Processing time: ${result.processing_time_ms.toFixed(2)}ms`);
+ if (decompressed === prompt) {
+ console.log("Decompression verified");
  }
- catch (error) {
- if (error instanceof api_client_1.RateLimitError) {
- console.error(`❌ Rate limit exceeded: ${error.message}`);
- }
- else if (error instanceof api_client_1.APIRequestError) {
- console.error(`❌ API error: ${error.message}`);
- }
- else {
- throw error;
- }
+ else {
+ console.log("Decompression failed");
  }
  }
- // Comparison if both were run
- if (showGreedy && showOptimal && results['greedy'] && results['optimal']) {
- console.log("\n" + "=".repeat(80));
- console.log("📊 COMPARISON");
- console.log("-".repeat(80));
- const ratioDiff = results['optimal'].ratio - results['greedy'].ratio;
- if (ratioDiff < 0) {
- console.log(`✅ Optimal is ${Math.abs(ratioDiff * 100).toFixed(2)}% better (smaller ratio)`);
+ catch (error) {
+ if (error instanceof api_client_1.RateLimitError) {
+ console.error(`Rate limit exceeded: ${error.message}`);
  }
- else if (ratioDiff > 0) {
- console.log(`✅ Greedy is ${(ratioDiff * 100).toFixed(2)}% better (smaller ratio)`);
+ else if (error instanceof api_client_1.APIRequestError) {
+ console.error(`API error: ${error.message}`);
  }
  else {
- console.log("✅ Both produce identical compression ratios");
- }
- console.log(`\nGreedy ratio: ${(results['greedy'].ratio * 100).toFixed(2)}%`);
- console.log(`Optimal ratio: ${(results['optimal'].ratio * 100).toFixed(2)}%`);
- console.log(`Difference: ${(ratioDiff * 100).toFixed(2)}%`);
- const greedyDictSize = Object.keys(results['greedy'].dict).length;
- const optimalDictSize = Object.keys(results['optimal'].dict).length;
- if (greedyDictSize !== optimalDictSize) {
- console.log(`\nDictionary size: Greedy=${greedyDictSize}, Optimal=${optimalDictSize}`);
+ throw error;
  }
  }
  }
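The CLI above verifies losslessness by checking `decompressed === prompt` after a roundtrip. The README describes the technique as replacing repeated substrings with shorter placeholders via a greedy pass. The actual compression runs server-side; as a purely illustrative sketch of the idea (not the service's algorithm), a toy greedy dictionary compressor with a matching decompressor might look like:

```typescript
// Toy greedy compressor: repeatedly replace the longest repeated substring
// with a short placeholder token, as long as the replacement saves characters.
// Illustrative only — the real algorithm lives in the LightReach cloud service.
function compressGreedy(text: string): { compressed: string; dictionary: Record<string, string> } {
  const dictionary: Record<string, string> = {};
  let compressed = text;
  let nextId = 0;
  for (let len = Math.floor(compressed.length / 2); len >= 4; len--) {
    for (let i = 0; i + len <= compressed.length; i++) {
      const candidate = compressed.slice(i, i + len);
      const count = compressed.split(candidate).length - 1;
      const token = `§${nextId}`;
      // Rough accounting: chars removed minus tokens inserted and dictionary overhead.
      const saving = count * candidate.length - (count * token.length + candidate.length + token.length);
      if (count >= 2 && saving > 0 && !candidate.includes("§")) {
        dictionary[token] = candidate;
        compressed = compressed.split(candidate).join(token);
        nextId++;
        i = -1; // rescan from the start at this length
      }
    }
  }
  return { compressed, dictionary };
}

// Expand every placeholder token back to its original substring.
// Dictionary values never contain tokens here, so expansion order is irrelevant.
function decompress(compressed: string, dictionary: Record<string, string>): string {
  let text = compressed;
  for (const [token, original] of Object.entries(dictionary)) {
    text = text.split(token).join(original);
  }
  return text;
}
```

On the CLI's own example prompt, `"hello world hello world hello world"`, this sketch produces a shorter string plus a dictionary, and the roundtrip check passes.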
package/dist/core.d.ts CHANGED
@@ -17,20 +17,33 @@ export interface CompressionConfig {
  }
  export interface CompleteOptions {
  messages: Message[];
+ /** @deprecated System selects model automatically. */
  model?: string;
  provider?: 'openai' | 'anthropic' | 'google';
+ /**
+ * @deprecated Quality routing is now fully managed by admin-configured ceilings
+ * in the dashboard. Accepted for backward compatibility.
+ */
  desiredHle?: number;
  compress?: boolean;
  compressionConfig?: CompressionConfig;
  compressOutput?: boolean;
- useOptimal?: boolean;
  mode?: 'async' | 'sync';
+ /** @deprecated Use desiredHle instead. */
  hleTargetPercent?: number;
+ /** @deprecated Use desiredHle instead. */
  minHleScore?: number;
+ /** @deprecated Always auto-selects now. */
  autoSelectByHle?: boolean;
+ /** @deprecated Use provider instead. */
  sameProviderOnly?: boolean;
  temperature?: number;
  maxTokens?: number;
+ /**
+ * Tags for cost attribution and quality ceilings.
+ * Supported keys: 'team', 'environment', 'feature'.
+ * The 'integration' tag is reserved for system use.
+ */
  tags?: Record<string, string>;
  maxHistoryMessages?: number;
  }
@@ -38,13 +51,11 @@ export declare class LightReach {
  private apiClient;
  private defaultModel;
  private defaultProvider;
- private useOptimal;
  constructor(options?: {
  apiKey?: string;
  apiUrl?: string;
  defaultModel?: string;
  defaultProvider?: 'openai' | 'anthropic' | 'google';
- useOptimal?: boolean;
  });
  complete(options: CompleteOptions): Promise<CompleteResponse>;
  /**
@@ -52,7 +63,6 @@ export declare class LightReach {
  */
  compress(text: string, options?: {
  model?: string;
- algorithm?: 'greedy' | 'optimal';
  tags?: Record<string, string>;
  }): Promise<CompressResponse>;
  }
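The new JSDoc on `tags` names three supported keys ('team', 'environment', 'feature') and reserves 'integration' for system use. A hedged client-side guard based on that doc comment could look like this (the `validateTags` helper is hypothetical, not part of the SDK):

```typescript
// Tag keys documented as supported in CompleteOptions.tags.
const SUPPORTED_TAG_KEYS = new Set(["team", "environment", "feature"]);

// Hypothetical guard: reject the reserved 'integration' key and drop
// any keys the documentation does not list as supported.
function validateTags(tags: Record<string, string>): Record<string, string> {
  if ("integration" in tags) {
    throw new Error("The 'integration' tag is reserved for system use.");
  }
  const valid: Record<string, string> = {};
  for (const [key, value] of Object.entries(tags)) {
    if (SUPPORTED_TAG_KEYS.has(key)) {
      valid[key] = value;
    }
  }
  return valid;
}
```

Whether the server silently ignores or rejects unsupported keys is not stated in the diff; this sketch simply drops them before sending.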
package/dist/core.js CHANGED
@@ -13,11 +13,9 @@ class LightReach {
  constructor(options = {}) {
  this.defaultModel = options.defaultModel ?? 'gpt-4';
  this.defaultProvider = options.defaultProvider ?? 'openai';
- this.useOptimal = options.useOptimal ?? false;
  this.apiClient = new api_client_1.PcompresslrAPIClient(options.apiKey, options.apiUrl);
  }
  async complete(options) {
- const algorithm = (options.useOptimal ?? this.useOptimal) ? 'optimal' : 'greedy';
  const cfg = options.compressionConfig
  ? {
  compress_system: options.compressionConfig.compressSystem ?? false,
@@ -34,7 +32,6 @@ class LightReach {
  compress: options.compress ?? true,
  compression_config: cfg,
  compress_output: options.compressOutput ?? false,
- algorithm,
  hle_target_percent: options.hleTargetPercent,
  min_hle_score: options.minHleScore,
  auto_select_by_hle: options.autoSelectByHle,
@@ -54,7 +51,6 @@ class LightReach {
  // We do NOT fabricate cost estimates here since the API response does not include pricing data.
  return {
  ...resp,
- text: resp.text ?? resp.decompressed_response,
  tokens_saved: resp.tokens_saved ?? resp.compression_stats?.token_savings,
  tokens_used: resp.tokens_used ?? resp.llm_stats?.total_tokens,
  compression_ratio: resp.compression_ratio ?? resp.compression_stats?.compression_ratio,
@@ -85,7 +81,7 @@ class LightReach {
  * Compress text without making an LLM call (POST /api/v1/compress).
  */
  async compress(text, options) {
- return await this.apiClient.compress(text, options?.model ?? this.defaultModel, options?.tags);
+ return await this.apiClient.compress(text, options?.model ?? this.defaultModel, options?.tags);
  }
  }
  exports.LightReach = LightReach;
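The retained hunk in `complete()` normalizes the response by falling back from flat fields to nested stats objects with `??`. The pattern in isolation, using only the field names visible in the diff (the trimmed `RawResponse` interface is illustrative):

```typescript
// Trimmed-down view of the response fields the fallback chain touches.
interface RawResponse {
  tokens_saved?: number;
  compression_stats?: { token_savings?: number };
  tokens_used?: number;
  llm_stats?: { total_tokens?: number };
}

// Mirrors the `??` chain in complete(): prefer the flat field when present,
// otherwise read the nested stats object, otherwise leave undefined.
function normalize(resp: RawResponse) {
  return {
    tokens_saved: resp.tokens_saved ?? resp.compression_stats?.token_savings,
    tokens_used: resp.tokens_used ?? resp.llm_stats?.total_tokens,
  };
}
```

Note `??` only falls through on `null`/`undefined`, so a legitimate `0` in the flat field is kept rather than overridden by the nested value.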
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "compress-lightreach",
- "version": "1.0.5",
+ "version": "1.0.8",
  "description": "AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",