rehydra 0.3.4 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (76)
  1. package/README.md +170 -760
  2. package/dist/index.d.ts +2 -0
  3. package/dist/index.d.ts.map +1 -1
  4. package/dist/index.js +4 -0
  5. package/dist/index.js.map +1 -1
  6. package/dist/proxy/index.d.ts +12 -0
  7. package/dist/proxy/index.d.ts.map +1 -0
  8. package/dist/proxy/index.js +11 -0
  9. package/dist/proxy/index.js.map +1 -0
  10. package/dist/proxy/providers/anthropic.d.ts +17 -0
  11. package/dist/proxy/providers/anthropic.d.ts.map +1 -0
  12. package/dist/proxy/providers/anthropic.js +117 -0
  13. package/dist/proxy/providers/anthropic.js.map +1 -0
  14. package/dist/proxy/providers/index.d.ts +19 -0
  15. package/dist/proxy/providers/index.d.ts.map +1 -0
  16. package/dist/proxy/providers/index.js +40 -0
  17. package/dist/proxy/providers/index.js.map +1 -0
  18. package/dist/proxy/providers/openai.d.ts +17 -0
  19. package/dist/proxy/providers/openai.d.ts.map +1 -0
  20. package/dist/proxy/providers/openai.js +92 -0
  21. package/dist/proxy/providers/openai.js.map +1 -0
  22. package/dist/proxy/providers/types.d.ts +29 -0
  23. package/dist/proxy/providers/types.d.ts.map +1 -0
  24. package/dist/proxy/providers/types.js +6 -0
  25. package/dist/proxy/providers/types.js.map +1 -0
  26. package/dist/proxy/proxy-server.d.ts +53 -0
  27. package/dist/proxy/proxy-server.d.ts.map +1 -0
  28. package/dist/proxy/proxy-server.js +146 -0
  29. package/dist/proxy/proxy-server.js.map +1 -0
  30. package/dist/proxy/rehydra-fetch.d.ts +35 -0
  31. package/dist/proxy/rehydra-fetch.d.ts.map +1 -0
  32. package/dist/proxy/rehydra-fetch.js +217 -0
  33. package/dist/proxy/rehydra-fetch.js.map +1 -0
  34. package/dist/proxy/rehydra-proxy.d.ts +40 -0
  35. package/dist/proxy/rehydra-proxy.d.ts.map +1 -0
  36. package/dist/proxy/rehydra-proxy.js +82 -0
  37. package/dist/proxy/rehydra-proxy.js.map +1 -0
  38. package/dist/proxy/sse-parser.d.ts +59 -0
  39. package/dist/proxy/sse-parser.d.ts.map +1 -0
  40. package/dist/proxy/sse-parser.js +112 -0
  41. package/dist/proxy/sse-parser.js.map +1 -0
  42. package/dist/proxy/types.d.ts +49 -0
  43. package/dist/proxy/types.d.ts.map +1 -0
  44. package/dist/proxy/types.js +5 -0
  45. package/dist/proxy/types.js.map +1 -0
  46. package/dist/proxy/wrap-client.d.ts +47 -0
  47. package/dist/proxy/wrap-client.d.ts.map +1 -0
  48. package/dist/proxy/wrap-client.js +70 -0
  49. package/dist/proxy/wrap-client.js.map +1 -0
  50. package/dist/storage/session.d.ts +3 -0
  51. package/dist/storage/session.d.ts.map +1 -1
  52. package/dist/storage/session.js +16 -0
  53. package/dist/storage/session.js.map +1 -1
  54. package/dist/storage/types.d.ts +16 -0
  55. package/dist/storage/types.d.ts.map +1 -1
  56. package/dist/streaming/anonymizer-stream.d.ts +63 -0
  57. package/dist/streaming/anonymizer-stream.d.ts.map +1 -0
  58. package/dist/streaming/anonymizer-stream.js +184 -0
  59. package/dist/streaming/anonymizer-stream.js.map +1 -0
  60. package/dist/streaming/index.d.ts +9 -0
  61. package/dist/streaming/index.d.ts.map +1 -0
  62. package/dist/streaming/index.js +8 -0
  63. package/dist/streaming/index.js.map +1 -0
  64. package/dist/streaming/sentence-buffer.d.ts +78 -0
  65. package/dist/streaming/sentence-buffer.d.ts.map +1 -0
  66. package/dist/streaming/sentence-buffer.js +238 -0
  67. package/dist/streaming/sentence-buffer.js.map +1 -0
  68. package/dist/streaming/stream-factory.d.ts +38 -0
  69. package/dist/streaming/stream-factory.d.ts.map +1 -0
  70. package/dist/streaming/stream-factory.js +69 -0
  71. package/dist/streaming/stream-factory.js.map +1 -0
  72. package/dist/streaming/types.d.ts +121 -0
  73. package/dist/streaming/types.d.ts.map +1 -0
  74. package/dist/streaming/types.js +5 -0
  75. package/dist/streaming/types.js.map +1 -0
  76. package/package.json +18 -1
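The README diff below describes rehydration as swapping each `<PII .../>` placeholder tag back for its original value. As a rough illustration of that step, here is a TypeScript sketch. It is not rehydra's implementation. It assumes the decrypted PII map is a `Map<string, string>` keyed `"TYPE:id"` (for example `"PERSON:1"` mapped to `"John Smith"`), which is how the 0.3.4 README documents the return value of `decryptPIIMap`. It also assumes that tags follow the shapes shown in the examples, where optional semantic attributes such as `gender` or `scope` sit between `type` and `id`. The name `rehydrateSketch` is hypothetical.

```typescript
// Hypothetical illustration only; not rehydra's own rehydrate().
// Assumes the decrypted PII map is keyed "TYPE:id", e.g. "PERSON:1" -> "John Smith".

// Matches self-closing placeholder tags such as:
//   <PII type="EMAIL" id="1"/>
//   <PII type="PERSON" gender="female" id="1"/>
// Any extra lowercase attributes (gender, scope) may appear between type and id.
const PII_TAG = /<PII\s+type="([A-Z_]+)"(?:\s+[a-z]+="[^"]*")*?\s+id="(\d+)"\s*\/>/g;

function rehydrateSketch(text: string, piiMap: Map<string, string>): string {
  // A replacer function is used so "$" in original values is not treated as a pattern.
  return text.replace(PII_TAG, (tag: string, type: string, id: string): string => {
    // Leave placeholders the map does not cover untouched instead of dropping them.
    return piiMap.get(`${type}:${id}`) ?? tag;
  });
}

// Usage with the placeholder output from the new Quick Start example:
const piiMap = new Map<string, string>([
  ["EMAIL:1", "john.smith@acme-corp.com"],
  ["PERSON:2", "John"],
  ["PHONE:3", "+41 79 123 45 67"],
]);

console.log(
  rehydrateSketch(
    'Email <PII type="EMAIL" id="1"/> or call <PII type="PERSON" id="2"/> at <PII type="PHONE" id="3"/>',
    piiMap,
  ),
);
// prints "Email john.smith@acme-corp.com or call John at +41 79 123 45 67"
```

Leaving unmatched tags in place keeps a missing map entry visible in the output rather than silently deleting text; the real library may handle that case differently.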
package/README.md CHANGED
@@ -6,898 +6,308 @@
6
6
  ![Issues](https://img.shields.io/github/issues/rehydra-ai/rehydra)
7
7
  [![codecov](https://codecov.io/github/rehydra-ai/rehydra/graph/badge.svg?token=WX5RI0ZZJG)](https://codecov.io/github/rehydra-ai/rehydra)
8
8
 
9
-
10
- On-device PII anonymization module for high-privacy AI workflows. Detects and replaces Personally Identifiable Information (PII) with semantically valuable placeholder tags while maintaining an encrypted mapping for rehydration.
11
-
12
- ```bash
13
- npm install rehydra
14
- ```
15
-
16
- **Works in Node.js, Bun, and browsers**
17
-
18
- ## Features
19
-
20
- - **Structured PII Detection**: Regex-based detection for emails, phones, IBANs, credit cards, IPs, URLs
21
- - **Soft PII Detection**: ONNX-powered NER model for names, organizations, locations (auto-downloads on first use if enabled)
22
- - **Semantic Enrichment**: AI/MT-friendly tags with gender/location attributes
23
- - **Secure PII Mapping**: AES-256-GCM encrypted storage of original PII values
24
- - **Cross-Platform**: Works identically in Node.js, Bun, and browsers
25
- - **Configurable Policies**: Customizable detection rules, thresholds, and allowlists
26
- - **Validation & Leak Scanning**: Built-in validation and optional leak detection
27
-
28
- ## Installation
29
-
30
- ### Node.js
9
+ On-device PII anonymization for AI workflows. Detects names, emails, phones, IBANs, and more, replaces them with placeholder tags (keeping the originals in an encrypted PII map), and rehydrates them after processing.
31
10
 
32
11
  ```bash
33
12
  npm install rehydra
34
13
  ```
35
- For bun support see [Bun Support](#bun-support)
36
-
37
- ### Browser (with bundler)
38
-
39
- ```bash
40
- npm install rehydra onnxruntime-web
41
- ```
42
14
 
43
- ### Browser (without bundler)
44
-
45
- ```html
46
- <script type="module">
47
- // Import directly from your dist folder or CDN
48
- import { createAnonymizer } from './node_modules/rehydra/dist/index.js';
49
-
50
- // onnxruntime-web is automatically loaded from CDN when needed
51
- </script>
52
- ```
15
+ **Works in Node.js, Bun, and browsers.** Detection runs on-device, so only anonymized text needs to leave your machine.
53
16
 
54
17
  ## Quick Start
55
18
 
56
- ### Full pipeline (Anonymize → LLM → Rehydrate)
57
-
58
- The full workflow for privacy-preserving LLM workflows:
59
-
60
19
  ```typescript
61
- import {
62
- createAnonymizer,
63
- decryptPIIMap,
64
- rehydrate,
65
- InMemoryKeyProvider
66
- } from 'rehydra';
20
+ import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';
67
21
 
68
- // 1. Create a key provider (required to decrypt later)
69
22
  const keyProvider = new InMemoryKeyProvider();
70
-
71
- // 2. Create anonymizer with key provider
72
23
  const anonymizer = createAnonymizer({
73
- ner: { mode: 'quantized' },
74
- semantic: { enabled: true },
75
- keyProvider: keyProvider
24
+ ner: { mode: 'quantized' }, // ~280 MB model, auto-downloads on first use
25
+ keyProvider,
76
26
  });
77
27
 
78
- await anonymizer.initialize();
79
-
80
- // 3. Anonymize before translation
81
- const original = 'Hello John Smith from Acme Corp in Berlin!';
82
- const result = await anonymizer.anonymize(original);
28
+ const result = await anonymizer.anonymize(
29
+ 'Email john.smith@acme-corp.com or call John at +41 79 123 45 67'
30
+ );
83
31
 
84
32
  console.log(result.anonymizedText);
85
- // "Hello <PII type="PERSON" gender="male" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>!"
86
-
87
- // 4. Translate (or do other AI workloads that preserve placeholders)
88
- const translated = await yourAIWorkflow(result.anonymizedText, { from: 'en', to: 'de' });
89
- // "Hallo <PII type="PERSON" gender="male" id="1"/> von <PII type="ORG" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>!"
33
+ // "Email <PII type="EMAIL" id="1"/> or call <PII type="PERSON" id="2"/> at <PII type="PHONE" id="3"/>"
90
34
 
91
- // 5. Decrypt the PII map using the same key
92
- const encryptionKey = await keyProvider.getKey();
93
- const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
94
-
95
- // 6. Rehydrate - replace placeholders with original values
96
- const rehydrated = rehydrate(translated, piiMap);
97
- // "Hallo John Smith von Acme Corp in Berlin!"
35
+ // Rehydrate after translation or other processing
36
+ const key = await keyProvider.getKey();
37
+ const piiMap = await decryptPIIMap(result.piiMap!, key);
38
+ const original = rehydrate(result.anonymizedText, piiMap);
39
+ // "Email john.smith@acme-corp.com or call John at +41 79 123 45 67"
98
40
 
99
- // 7. Clean up
100
41
  await anonymizer.dispose();
101
42
  ```
102
43
 
103
- ### Regex-Only Mode (No Downloads Required)
104
-
105
- For structured PII like emails, phones, IBANs, credit cards:
106
-
107
- ```typescript
108
- import { anonymizeRegexOnly } from 'rehydra';
109
-
110
- const result = await anonymizeRegexOnly(
111
- 'Contact john@example.com or call +49 30 123456. IBAN: DE89370400440532013000'
112
- );
113
-
114
- console.log(result.anonymizedText);
115
- // "Contact <PII type="EMAIL" id="1"/> or call <PII type="PHONE" id="2"/>. IBAN: <PII type="IBAN" id="3"/>"
116
- ```
44
+ ## LLM Proxy
117
45
 
118
- ### Full Mode with NER (Detects Names, Organizations, Locations)
46
+ Drop-in middleware that anonymizes prompts before they leave your machine and rehydrates responses. Works with OpenAI, Anthropic, and any OpenAI-compatible API.
119
47
 
120
- The NER model is automatically downloaded on first use (~280 MB for quantized):
48
+ ### Wrap any fetch-based client
121
49
 
122
50
  ```typescript
123
- import { createAnonymizer } from 'rehydra';
124
-
125
- const anonymizer = createAnonymizer({
126
- ner: {
127
- mode: 'quantized', // or 'standard' for full model (~1.1 GB)
128
- onStatus: (status) => console.log(status),
129
- }
51
+ import OpenAI from 'openai';
52
+ import { createRehydraFetch, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
53
+
54
+ const openai = new OpenAI({
55
+ fetch: createRehydraFetch({
56
+ anonymizer: { ner: { mode: 'quantized' } },
57
+ keyProvider: new InMemoryKeyProvider(),
58
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
59
+ }),
130
60
  });
131
61
 
132
- await anonymizer.initialize(); // Downloads model if needed
133
-
134
- const result = await anonymizer.anonymize(
135
- 'Hello John Smith from Acme Corp in Berlin!'
136
- );
137
-
138
- console.log(result.anonymizedText);
139
- // "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
140
-
141
- // Clean up when done
142
- await anonymizer.dispose();
62
+ // PII is anonymized before the request leaves your machine; the response is rehydrated automatically
63
+ const response = await openai.chat.completions.create({
64
+ model: 'gpt-4o',
65
+ messages: [{ role: 'user', content: 'Draft a reply to john@example.com about the meeting' }],
66
+ });
143
67
  ```
144
68
 
145
- ### With Semantic Enrichment
146
-
147
- Add gender and location scope for better machine translation:
69
+ ### Or use wrapLLMClient for even less code
148
70
 
149
71
  ```typescript
150
- import { createAnonymizer } from 'rehydra';
72
+ import OpenAI from 'openai';
73
+ import { wrapLLMClient, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
151
74
 
152
- const anonymizer = createAnonymizer({
153
- ner: { mode: 'quantized' },
154
- semantic: {
155
- enabled: true, // Downloads ~12 MB of semantic data on first use
156
- onStatus: (status) => console.log(status),
157
- }
75
+ const openai = wrapLLMClient(new OpenAI(), {
76
+ keyProvider: new InMemoryKeyProvider(),
77
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
158
78
  });
159
-
160
- await anonymizer.initialize();
161
-
162
- const result = await anonymizer.anonymize(
163
- 'Hello Maria Schmidt from Berlin!'
164
- );
165
-
166
- console.log(result.anonymizedText);
167
- // "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
168
79
  ```
169
80
 
170
- ## API Reference
171
-
172
- Full documentation on [https://docs.rehydra.ai](https://docs.rehydra.ai).
81
+ ### Standalone proxy server
173
82
 
174
- ### Configuration Options
83
+ Point any LLM client at a local proxy; only the client's base URL needs to change:
175
84
 
176
85
  ```typescript
177
- import { createAnonymizer, InMemoryKeyProvider } from 'rehydra';
86
+ import { createRehydraProxyServer, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
178
87
 
179
- const anonymizer = createAnonymizer({
180
- // NER configuration
181
- ner: {
182
- mode: 'quantized', // 'standard' | 'quantized' | 'disabled' | 'custom'
183
- backend: 'local', // 'local' (default) | 'inference-server'
184
- autoDownload: true, // Auto-download model if not present
185
- onStatus: (status) => {}, // Status messages callback
186
- onDownloadProgress: (progress) => {
187
- console.log(`${progress.file}: ${progress.percent}%`);
188
- },
189
-
190
- // For 'inference-server' backend:
191
- inferenceServerUrl: 'http://localhost:8080',
192
-
193
- // For 'custom' mode only:
194
- modelPath: './my-model.onnx',
195
- vocabPath: './vocab.txt',
196
- },
197
-
198
- // Semantic enrichment (adds gender/scope attributes)
199
- semantic: {
200
- enabled: true, // Enable MT-friendly attributes
201
- autoDownload: true, // Auto-download semantic data (~12 MB)
202
- onStatus: (status) => {},
203
- onDownloadProgress: (progress) => {},
204
- },
205
-
206
- // Encryption key provider
88
+ const proxy = await createRehydraProxyServer({
89
+ port: 8080,
90
+ upstream: 'https://api.openai.com',
207
91
  keyProvider: new InMemoryKeyProvider(),
208
-
209
- // Custom policy (optional)
210
- defaultPolicy: { /* see Policy section */ },
92
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
211
93
  });
212
94
 
213
- await anonymizer.initialize();
95
+ // Point your client at the proxy (requires: import OpenAI from 'openai')
96
+ const openai = new OpenAI({ baseURL: 'http://localhost:8080/v1' });
214
97
  ```
215
98
 
216
- ### NER Modes
217
-
218
- | Mode | Description | Size | Auto-Download |
219
- |------|-------------|------|---------------|
220
- | `'disabled'` | No NER, regex only | 0 | N/A |
221
- | `'quantized'` | Smaller model, ~95% accuracy | ~280 MB | Yes |
222
- | `'standard'` | Full model, best accuracy | ~1.1 GB | Yes |
223
- | `'custom'` | Your own ONNX model | Varies | No |
99
+ Supports non-streaming and streaming (SSE) responses for both OpenAI and Anthropic APIs.
224
100
 
225
- ### ONNX Session Options
101
+ ## Streaming
226
102
 
227
- Fine-tune ONNX Runtime performance with session options:
103
+ Process text chunk-by-chunk with constant memory. Works as a Node.js Transform stream.
228
104
 
229
105
  ```typescript
230
- const anonymizer = createAnonymizer({
231
- ner: {
232
- mode: 'quantized',
233
- sessionOptions: {
234
- // Graph optimization level: 'disabled' | 'basic' | 'extended' | 'all'
235
- graphOptimizationLevel: 'all', // default
236
-
237
- // Threading (Node.js only)
238
- intraOpNumThreads: 4, // threads within operators
239
- interOpNumThreads: 1, // threads between operators
240
-
241
- // Memory optimization
242
- enableCpuMemArena: true,
243
- enableMemPattern: true,
244
- }
245
- }
246
- });
247
- ```
248
-
249
- #### Execution Providers
250
-
251
- By default, Rehydra uses:
252
- - **Node.js**: CPU (fastest for quantized models)
253
- - **Browsers**: CPU (WASM)
106
+ import { createReadStream, createWriteStream } from 'fs';
107
+ import { createAnonymizerStream, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
254
108
 
255
-
256
- > For NVIDIA GPU acceleration with CUDA/TensorRT, use the inference server backend (see [GPU Acceleration](#gpu-acceleration-enterprise)).
257
-
258
- ### GPU Acceleration (Enterprise)
259
-
260
- For high-throughput production deployments, Rehydra supports GPU-accelerated inference via a dedicated inference server. This is useful for large documents.
261
-
262
- ```typescript
263
- const anonymizer = createAnonymizer({
264
- ner: {
265
- backend: 'inference-server',
266
- inferenceServerUrl: 'http://localhost:8080',
267
- }
109
+ const stream = await createAnonymizerStream({
110
+ anonymizer: { ner: { mode: 'quantized' } },
111
+ keyProvider: new InMemoryKeyProvider(),
112
+ sessionId: 'batch-job-001',
113
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
268
114
  });
269
115
 
270
- await anonymizer.initialize();
116
+ createReadStream('input.txt').pipe(stream).pipe(createWriteStream('anonymized.txt'));
271
117
  ```
272
118
 
273
- **Performance Comparison:**
274
-
275
- | Text Size | CPU (local) | GPU (server) |
276
- |-----------|-------------|--------------|
277
- | Short (~40 chars) | 4.3ms | 62ms |
278
- | Medium (~500 chars) | 26ms | 73ms |
279
- | Long (~2000 chars) | 93ms | 117ms |
280
- | Entity-dense | 13ms | 68ms |
281
-
282
- Local CPU faster for most use cases due to network overhead. GPU is beneficial for batch processing and large documents.
283
-
284
- **Backend Options:**
285
-
286
- | Backend | Description | Latency (2K chars) |
287
- |---------|-------------|-------------------|
288
- | `'local'` | CPU inference (default) | ~4,300ms |
289
- | `'inference-server'` | GPU server (enterprise) | ~117ms |
290
-
291
-
292
- ### Main Functions
293
-
294
- #### `createAnonymizer(config?)`
119
+ ### Low-latency mode for LLM token streams
295
120
 
296
- Creates a reusable anonymizer instance:
121
+ Regex-only detection, smaller buffers, and aggressive flushing, designed for real-time LLM token streams:
297
122
 
298
123
  ```typescript
299
- const anonymizer = createAnonymizer({
300
- ner: { mode: 'quantized' }
124
+ const stream = await createAnonymizerStream({
125
+ buffer: { lowLatency: true },
301
126
  });
302
127
 
303
- await anonymizer.initialize();
304
- const result = await anonymizer.anonymize('text');
305
- await anonymizer.dispose();
306
- ```
307
-
308
- #### `anonymize(text, locale?, policy?)`
309
-
310
- One-off anonymization (regex-only by default):
311
-
312
- ```typescript
313
- import { anonymize } from 'rehydra';
314
-
315
- const result = await anonymize('Contact test@example.com');
316
- ```
317
-
318
- #### `anonymizeWithNER(text, nerConfig, policy?)`
319
-
320
- One-off anonymization with NER:
321
-
322
- ```typescript
323
- import { anonymizeWithNER } from 'rehydra';
324
-
325
- const result = await anonymizeWithNER(
326
- 'Hello John Smith',
327
- { mode: 'quantized' }
328
- );
329
- ```
330
-
331
- #### `anonymizeRegexOnly(text, policy?)`
332
-
333
- Fast regex-only anonymization:
334
-
335
- ```typescript
336
- import { anonymizeRegexOnly } from 'rehydra';
337
-
338
- const result = await anonymizeRegexOnly('Card: 4111111111111111');
339
- ```
340
-
341
- ### Rehydration Functions
342
-
343
- #### `decryptPIIMap(encryptedMap, key)`
344
-
345
- Decrypts the PII map for rehydration:
346
-
347
- ```typescript
348
- import { decryptPIIMap } from 'rehydra';
349
-
350
- const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
351
- // Returns Map<string, string> where key is "PERSON:1" and value is "John Smith"
352
- ```
353
-
354
- #### `rehydrate(text, piiMap)`
355
-
356
- Replaces placeholders with original values:
357
-
358
- ```typescript
359
- import { rehydrate } from 'rehydra';
360
-
361
- const original = rehydrate(translatedText, piiMap);
362
- ```
363
-
364
- ### Result Structure
365
-
366
- ```typescript
367
- interface AnonymizationResult {
368
- // Text with PII replaced by placeholder tags
369
- anonymizedText: string;
370
-
371
- // Detected entities (without original text for safety)
372
- entities: Array<{
373
- type: PIIType;
374
- id: number;
375
- start: number;
376
- end: number;
377
- confidence: number;
378
- source: 'REGEX' | 'NER';
379
- }>;
380
-
381
- // Encrypted PII mapping (for later rehydration)
382
- piiMap: {
383
- ciphertext: string; // Base64
384
- iv: string; // Base64
385
- authTag: string; // Base64
386
- };
387
-
388
- // Processing statistics
389
- stats: {
390
- countsByType: Record<PIIType, number>;
391
- totalEntities: number;
392
- processingTimeMs: number;
393
- modelVersion: string;
394
- leakScanPassed?: boolean;
395
- };
396
- }
397
- ```
398
-
399
- ## Supported PII Types
400
-
401
- | Type | Description | Detection | Semantic Attributes |
402
- |------|-------------|-----------|---------------------|
403
- | `EMAIL` | Email addresses | Regex | - |
404
- | `PHONE` | Phone numbers (international) | Regex | - |
405
- | `IBAN` | International Bank Account Numbers | Regex + Checksum | - |
406
- | `BIC_SWIFT` | Bank Identifier Codes | Regex | - |
407
- | `CREDIT_CARD` | Credit card numbers | Regex + Luhn | - |
408
- | `IP_ADDRESS` | IPv4 and IPv6 addresses | Regex | - |
409
- | `URL` | Web URLs | Regex | - |
410
- | `CASE_ID` | Case/ticket numbers | Regex (configurable) | - |
411
- | `CUSTOMER_ID` | Customer identifiers | Regex (configurable) | - |
412
- | `PERSON` | Person names | NER | `gender` (male/female/neutral) |
413
- | `ORG` | Organization names | NER | - |
414
- | `LOCATION` | Location/place names | NER | `scope` (city/country/region) |
415
- | `ADDRESS` | Physical addresses | NER | - |
416
- | `DATE_OF_BIRTH` | Dates of birth | NER | - |
417
-
418
- ## Configuration
419
-
420
- ### Anonymization Policy
421
-
422
- ```typescript
423
- import { createAnonymizer, PIIType } from 'rehydra';
424
-
425
- const anonymizer = createAnonymizer({
426
- ner: { mode: 'quantized' },
427
- defaultPolicy: {
428
- // Which PII types to detect
429
- enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
430
-
431
- // Confidence thresholds per type (0.0 - 1.0)
432
- confidenceThresholds: new Map([
433
- [PIIType.PERSON, 0.8],
434
- [PIIType.EMAIL, 0.5],
435
- ]),
436
-
437
- // Terms to never treat as PII
438
- allowlistTerms: new Set(['Customer Service', 'Help Desk']),
439
-
440
- // Enable semantic enrichment (gender/scope)
441
- enableSemanticMasking: true,
442
-
443
- // Enable leak scanning on output
444
- enableLeakScan: true,
445
- },
128
+ llmTokenStream.pipe(stream).on('data', (chunk) => {
129
+ ws.send(chunk.toString());
446
130
  });
447
131
  ```
448
132
 
449
- ### Custom Recognizers
450
-
451
- Add domain-specific patterns:
133
+ ### Stream from a session
452
134
 
453
135
  ```typescript
454
- import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';
455
-
456
- const customRecognizer = createCustomIdRecognizer([
457
- {
458
- name: 'Order Number',
459
- pattern: /\bORD-[A-Z0-9]{8}\b/g,
460
- type: PIIType.CASE_ID,
461
- },
462
- ]);
463
-
464
- const anonymizer = createAnonymizer();
465
- anonymizer.getRegistry().register(customRecognizer);
136
+ const session = anonymizer.session('chat-123');
137
+ const stream = await session.createStream();
138
+ input.pipe(stream).pipe(output);
466
139
  ```
467
140
 
468
- ## Data & Model Storage
469
-
470
- Models and semantic data are cached locally for offline use.
141
+ ## Sessions
471
142
 
472
- ### Node.js Cache Locations
473
-
474
- | Data | macOS | Linux | Windows |
475
- |------|-------|-------|---------|
476
- | NER Models | `~/Library/Caches/rehydra/models/` | `~/.cache/rehydra/models/` | `%LOCALAPPDATA%/rehydra/models/` |
477
- | Semantic Data | `~/Library/Caches/rehydra/semantic-data/` | `~/.cache/rehydra/semantic-data/` | `%LOCALAPPDATA%/rehydra/semantic-data/` |
478
-
479
- ### Browser Cache
480
-
481
- In browsers, data is stored using:
482
- - **IndexedDB**: For semantic data and smaller files
483
- - **Origin Private File System (OPFS)**: For large model files (~280 MB)
484
-
485
- Data persists across page reloads and browser sessions.
486
-
487
- ### Manual Data Management
143
+ For multi-message conversations where PII IDs need to stay consistent and PII maps need to persist:
488
144
 
489
145
  ```typescript
490
- import {
491
- // Model management
492
- isModelDownloaded,
493
- downloadModel,
494
- clearModelCache,
495
- listDownloadedModels,
496
-
497
- // Semantic data management
498
- isSemanticDataDownloaded,
499
- downloadSemanticData,
500
- clearSemanticDataCache,
146
+ import {
147
+ createAnonymizer,
148
+ InMemoryKeyProvider,
149
+ SQLitePIIStorageProvider, // or InMemoryPIIStorageProvider, IndexedDBPIIStorageProvider
501
150
  } from 'rehydra';
502
151
 
503
- // Check if model is downloaded
504
- const hasModel = await isModelDownloaded('quantized');
505
-
506
- // Manually download model with progress
507
- await downloadModel('quantized', (progress) => {
508
- console.log(`${progress.file}: ${progress.percent}%`);
152
+ const anonymizer = createAnonymizer({
153
+ ner: { mode: 'quantized' },
154
+ keyProvider: new InMemoryKeyProvider(),
155
+ piiStorageProvider: new SQLitePIIStorageProvider('./pii.db'),
509
156
  });
510
157
 
511
- // Check semantic data
512
- const hasSemanticData = await isSemanticDataDownloaded();
513
-
514
- // List downloaded models
515
- const models = await listDownloadedModels();
516
-
517
- // Clear caches
518
- await clearModelCache('quantized'); // or clearModelCache() for all
519
- await clearSemanticDataCache();
520
- ```
521
-
522
- ## Encryption & Security
523
-
524
- The PII map is encrypted using **AES-256-GCM** via the Web Crypto API (works in both Node.js and browsers).
158
+ const session = anonymizer.session('chat-123');
525
159
 
526
- ### Key Providers
160
+ // Message 1
161
+ await session.anonymize('Contact me at user@example.com');
162
+ // → "Contact me at <PII type="EMAIL" id="1"/>"
527
163
 
528
- ```typescript
529
- import {
530
- InMemoryKeyProvider, // For development/testing
531
- ConfigKeyProvider, // For production with pre-configured key
532
- KeyProvider, // Interface for custom implementations
533
- generateKey,
534
- } from 'rehydra';
164
+ // Message 2 — same email gets the same ID
165
+ await session.anonymize('CC: user@example.com and admin@example.com');
166
+ // → "CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>"
535
167
 
536
- // Development: In-memory key (generates random key, lost on page refresh)
537
- const devKeyProvider = new InMemoryKeyProvider();
538
-
539
- // Production: Pre-configured key
540
- // Generate key: openssl rand -base64 32
541
- const keyBase64 = process.env.PII_ENCRYPTION_KEY; // or read from config
542
- const prodKeyProvider = new ConfigKeyProvider(keyBase64);
543
-
544
- // Custom: Implement KeyProvider interface
545
- class SecureKeyProvider implements KeyProvider {
546
- async getKey(): Promise<Uint8Array> {
547
- // Retrieve from secure storage, HSM, keychain, etc.
548
- return await getKeyFromSecureStorage();
549
- }
550
- }
168
+ // Rehydrating any message auto-loads the PII map from storage
169
+ const original = await session.rehydrate(translatedText);
551
170
  ```
552
171
 
553
- ### Security Best Practices
554
-
555
- - **Never log the raw PII map** - Always use encrypted storage
556
- - **Persist the encryption key securely** - Use platform keystores (iOS Keychain, Android Keystore, etc.)
557
- - **Rotate keys** - Implement key rotation for long-running applications
558
- - **Enable leak scanning** - Catch any missed PII in output
559
-
560
- ## PII Map Storage
561
-
562
- For applications that need to persist encrypted PII maps (e.g., chat applications where you need to rehydrate later), use sessions with built-in storage providers.
563
-
564
172
  ### Storage Providers
565
173
 
566
- | Provider | Environment | Persistence | Use Case |
567
- |----------|-------------|-------------|----------|
568
- | `InMemoryPIIStorageProvider` | All | None (lost on restart) | Development, testing |
569
- | `SQLitePIIStorageProvider` | Node.js, Bun only* | File-based | Server-side applications |
570
- | `IndexedDBPIIStorageProvider` | Browser | Browser storage | Client-side applications |
174
+ | Provider | Environment | Persistence |
175
+ |----------|-------------|-------------|
176
+ | `InMemoryPIIStorageProvider` | All | None (lost on restart) |
177
+ | `SQLitePIIStorageProvider` | Node.js, Bun | File-based (`better-sqlite3` on Node, `bun:sqlite` on Bun) |
178
+ | `IndexedDBPIIStorageProvider` | Browser | Browser storage |
179
+
180
+ ## Supported PII Types
571
181
 
572
- *\*Not available in browser builds. Use `IndexedDBPIIStorageProvider` for browser applications.*
182
+ | Type | Detection | Notes |
183
+ |------|-----------|-------|
184
+ | `PERSON` | NER | Names, with optional `gender` attribute |
185
+ | `ORG` | NER | Organization names |
186
+ | `LOCATION` | NER | Places, with optional `scope` attribute (city/country/region) |
187
+ | `ADDRESS` | NER | Physical addresses |
188
+ | `DATE_OF_BIRTH` | NER | Dates of birth |
189
+ | `EMAIL` | Regex | Email addresses |
190
+ | `PHONE` | Regex | International phone numbers |
191
+ | `IBAN` | Regex + checksum | International Bank Account Numbers |
192
+ | `BIC_SWIFT` | Regex | Bank Identifier Codes |
193
+ | `CREDIT_CARD` | Regex + Luhn | Credit card numbers |
194
+ | `IP_ADDRESS` | Regex | IPv4 and IPv6 |
195
+ | `URL` | Regex | Web URLs |
196
+ | `CASE_ID` | Regex | Configurable case/ticket patterns |
197
+ | `CUSTOMER_ID` | Regex | Configurable customer ID patterns |
573
198
 
574
- ### Important: Storage Only Works with Sessions
199
+ ## Configuration
575
200
 
576
- > **Note:** The `piiStorageProvider` is only used when you call `anonymizer.session()`.
577
- > Calling `anonymizer.anonymize()` directly does NOT save to storage - the encrypted PII map
578
- > is only returned in the result for you to handle manually.
201
+ ### NER Modes
579
202
 
580
- ```typescript
581
- // ❌ Storage NOT used - you must handle the PII map yourself
582
- const result = await anonymizer.anonymize('Hello John!');
583
- // result.piiMap is returned but NOT saved to storage
584
-
585
- // Storage IS used - auto-saves and auto-loads
586
- const session = anonymizer.session('conversation-123');
587
- const result = await session.anonymize('Hello John!');
588
- // result.piiMap is automatically saved to storage
589
- ```
203
+ | Mode | Size | Description |
204
+ |------|------|-------------|
205
+ | `'disabled'` | 0 | Regex only — no model download |
206
+ | `'quantized'` | ~280 MB | Recommended: good accuracy, smaller download |
207
+ | `'standard'` | ~1.1 GB | Best accuracy |
208
+ | `'custom'` | Varies | Bring your own ONNX model |
590
209
 
591
- ### Example: Without Storage (Simple One-Off Usage)
210
+ ### Semantic Enrichment
592
211
 
593
- For simple use cases where you don't need persistence:
212
+ Adds gender/scope attributes for better machine translation:
594
213
 
595
214
  ```typescript
596
- import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';
597
-
598
- const keyProvider = new InMemoryKeyProvider();
599
215
  const anonymizer = createAnonymizer({
600
216
  ner: { mode: 'quantized' },
601
- keyProvider,
217
+ semantic: { enabled: true }, // Downloads ~12 MB of name/location data
602
218
  });
603
- await anonymizer.initialize();
604
-
605
- // Anonymize
606
- const result = await anonymizer.anonymize('Hello John Smith!');
607
219
 
608
- // Translate (or other processing)
609
- const translated = await translateAPI(result.anonymizedText);
610
-
611
- // Rehydrate manually using the returned PII map
612
- const key = await keyProvider.getKey();
613
- const piiMap = await decryptPIIMap(result.piiMap, key);
614
- const original = rehydrate(translated, piiMap);
220
+ // anonymize('Hello Maria Schmidt from Berlin!') → "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
615
221
  ```
616
222
 
- ### Example: With Storage (Persistent Sessions)
-
- For applications that need to persist PII maps across requests/restarts:
+ ### Anonymization Policy
 
  ```typescript
- import {
-   createAnonymizer,
-   InMemoryKeyProvider,
-   SQLitePIIStorageProvider,
- } from 'rehydra';
-
- // 1. Set up storage (once at app start)
- const storage = new SQLitePIIStorageProvider('./pii-maps.db');
- await storage.initialize();
-
- // 2. Create anonymizer with storage and key provider
  const anonymizer = createAnonymizer({
    ner: { mode: 'quantized' },
-   keyProvider: new InMemoryKeyProvider(),
-   piiStorageProvider: storage,
+   defaultPolicy: {
+     enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
+     confidenceThresholds: new Map([[PIIType.PERSON, 0.8]]),
+     allowlistTerms: new Set(['Customer Service']),
+     enableLeakScan: true,
+   },
  });
- await anonymizer.initialize();
-
- // 3. Create a session for each conversation
- const session = anonymizer.session('conversation-123');
-
- // 4. Anonymize - auto-saves to storage
- const result = await session.anonymize('Hello John Smith from Acme Corp!');
- console.log(result.anonymizedText);
- // "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="1"/>!"
-
- // 5. Later (even after app restart): rehydrate - auto-loads and decrypts
- const translated = await translateAPI(result.anonymizedText);
- const original = await session.rehydrate(translated);
- console.log(original);
- // "Hello John Smith from Acme Corp!"
-
- // 6. Optional: check existence or delete
- await session.exists(); // true
- await session.delete(); // removes from storage
  ```
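To make the policy fields concrete, here is a hypothetical model of how they could combine when filtering detections. The following are all assumptions for illustration and are not documented rehydra behaviour:

- the `Detection` and `PolicySketch` types;
- the order of the checks;
- the rule that a type without a threshold keeps every detection.

`enableLeakScan` is not modeled.

```typescript
// Hypothetical model of the defaultPolicy fields above; not rehydra's implementation.
type PIITypeName = 'EMAIL' | 'PHONE' | 'PERSON' | 'ORG';

interface Detection {
  type: PIITypeName;
  text: string;
  confidence: number; // 0..1, e.g. from the NER model
}

interface PolicySketch {
  enabledTypes: Set<PIITypeName>;
  confidenceThresholds: Map<PIITypeName, number>;
  allowlistTerms: Set<string>;
}

// Returns the detections that would be replaced by placeholders.
function applyPolicy(detections: Detection[], policy: PolicySketch): Detection[] {
  return detections.filter((d) => {
    if (!policy.enabledTypes.has(d.type)) return false; // type switched off
    if (policy.allowlistTerms.has(d.text)) return false; // allowlisted term stays in the text
    const minConfidence = policy.confidenceThresholds.get(d.type) ?? 0; // assumed: no threshold keeps everything
    return d.confidence >= minConfidence;
  });
}

const policy: PolicySketch = {
  enabledTypes: new Set<PIITypeName>(['EMAIL', 'PHONE', 'PERSON']),
  confidenceThresholds: new Map<PIITypeName, number>([['PERSON', 0.8]]),
  allowlistTerms: new Set(['Customer Service']),
};

const kept = applyPolicy(
  [
    { type: 'PERSON', text: 'John Smith', confidence: 0.95 },
    { type: 'PERSON', text: 'Jordan', confidence: 0.6 },
    { type: 'PERSON', text: 'Customer Service', confidence: 0.9 },
    { type: 'ORG', text: 'Acme Corp', confidence: 0.99 },
    { type: 'EMAIL', text: 'user@example.com', confidence: 1 },
  ],
  policy,
);
console.log(kept.map((d) => d.text)); // [ 'John Smith', 'user@example.com' ]
```

Under this model, three of the sample detections are not anonymized:

- the 0.6-confidence `PERSON` match falls below its threshold;
- `Customer Service` is allowlisted;
- the `ORG` match is ignored because `ORG` is not in `enabledTypes`.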
 
- ### Example: Multiple Conversations
-
- Each session ID maps to a separate stored PII map:
+ ### Anonymization Modes
 
  ```typescript
- // Different chat sessions
- const chat1 = anonymizer.session('user-alice-chat');
- const chat2 = anonymizer.session('user-bob-chat');
-
- await chat1.anonymize('Alice: Contact me at alice@example.com');
- await chat2.anonymize('Bob: My number is +49 30 123456');
+ // Pseudonymize (default): reversible, returns encrypted PII map
+ const anonymizer = createAnonymizer({ mode: 'pseudonymize' });
 
- // Each session has independent storage
- await chat1.rehydrate(translatedText1); // Uses Alice's PII map
- await chat2.rehydrate(translatedText2); // Uses Bob's PII map
+ // Anonymize: irreversible, no PII map returned
+ const anonymizer = createAnonymizer({ mode: 'anonymize' });
  ```
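The two modes differ in whether the placeholder-to-value mapping is kept. Below is a minimal sketch of that idea. It makes the following assumptions:

- a toy email regex stands in for real detection;
- a plain `Map` stands in for rehydra's encrypted PII map;
- `pseudonymizeEmails`, `anonymizeEmails`, and `rehydrateSketch` are hypothetical names;
- reusing IDs for repeated values is not modeled.

```typescript
// Toy email pattern for illustration only; real PII detection is far more thorough.
const EMAIL_RE = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;

interface PseudonymizedSketch {
  text: string;
  map: Map<string, string>; // placeholder -> original value (plain here; rehydra's map is encrypted)
}

// Reversible: every replacement is recorded, so it can be undone later.
function pseudonymizeEmails(input: string): PseudonymizedSketch {
  const map = new Map<string, string>();
  let nextId = 1;
  const text = input.replace(EMAIL_RE, (value) => {
    const placeholder = `<PII type="EMAIL" id="${nextId++}"/>`;
    map.set(placeholder, value);
    return placeholder;
  });
  return { text, map };
}

// Irreversible: same replacement, but the map is thrown away.
function anonymizeEmails(input: string): string {
  return pseudonymizeEmails(input).text;
}

// Undo pseudonymization by swapping each placeholder back for its original value.
function rehydrateSketch(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, value] of map) {
    out = out.split(placeholder).join(value);
  }
  return out;
}

const { text, map } = pseudonymizeEmails('Mail alice@example.com or bob@test.org.');
console.log(text); // Mail <PII type="EMAIL" id="1"/> or <PII type="EMAIL" id="2"/>.
console.log(rehydrateSketch(text, map)); // Mail alice@example.com or bob@test.org.
```

The anonymize variant never keeps the map, so nothing is left to decrypt or rehydrate later. That is what makes it irreversible.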
 
- ### Multi-Message Conversations
-
- Within a session, entity IDs are consistent across multiple `anonymize()` calls:
-
- ```typescript
- const session = anonymizer.session('chat-123');
-
- // Message 1: User provides contact info
- const msg1 = await session.anonymize('Contact me at user@example.com');
- // → "Contact me at <PII type="EMAIL" id="1"/>"
-
- // Message 2: References same email + new one
- const msg2 = await session.anonymize('CC: user@example.com and admin@example.com');
- // → "CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>"
- // ↑ Same ID (reused) ↑ New ID
-
- // Message 3: No PII
- await session.anonymize('Please translate to German');
- // Previous PII preserved
-
- // All messages can be rehydrated correctly
- await session.rehydrate(msg1.anonymizedText); // ✓
- await session.rehydrate(msg2.anonymizedText); // ✓
- ```
-
- This ensures that follow-up messages referencing the same PII produce consistent placeholders, and rehydration works correctly across the entire conversation.
-
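The ID reuse described above can be sketched as a per-session lookup table. This illustrates the idea under stated assumptions and does not reflect rehydra's internals; `SessionIdAllocator` is hypothetical. The README's own examples number IDs differently. `PERSON` and `ORG` both get `id="1"` in one example, while `PERSON` gets 1 and `LOCATION` gets 2 in another. The per-type counter used here is therefore an assumption, and a single counter shared across types would give the same consistency.

```typescript
// Hypothetical sketch of per-session placeholder IDs (not rehydra's implementation).
class SessionIdAllocator {
  // PII type -> (original value -> assigned ID)
  private readonly ids = new Map<string, Map<string, number>>();

  idFor(type: string, value: string): number {
    let byValue = this.ids.get(type);
    if (byValue === undefined) {
      byValue = new Map<string, number>();
      this.ids.set(type, byValue);
    }
    const existing = byValue.get(value);
    if (existing !== undefined) {
      return existing; // value already seen in this session: reuse its ID
    }
    const id = byValue.size + 1; // entries are never removed, so size + 1 is always unused
    byValue.set(value, id);
    return id;
  }
}

const session = new SessionIdAllocator();
console.log(session.idFor('EMAIL', 'user@example.com')); // 1 (message 1)
console.log(session.idFor('EMAIL', 'user@example.com')); // 1 (message 2, same email)
console.log(session.idFor('EMAIL', 'admin@example.com')); // 2 (message 2, new email)
```

The table is kept for the lifetime of the session. That is what makes a repeated value map to the same placeholder in every message.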
- ### SQLite Provider (Node.js + Bun only)
-
- The SQLite provider works on both Node.js and Bun with automatic runtime detection.
-
- > **Note:** `SQLitePIIStorageProvider` is **not available in browser builds**. When bundling for the browser with Vite/webpack, use `IndexedDBPIIStorageProvider` instead. The browser-safe build automatically excludes SQLite to avoid bundling Node.js dependencies.
+ ### Custom Recognizers
 
  ```typescript
- // Node.js / Bun only
- import { SQLitePIIStorageProvider } from 'rehydra';
- // Or explicitly: import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';
+ import { createCustomIdRecognizer, PIIType } from 'rehydra';
 
- // File-based database
- const storage = new SQLitePIIStorageProvider('./data/pii-maps.db');
- await storage.initialize();
+ const recognizer = createCustomIdRecognizer([{
+   name: 'Order Number',
+   pattern: /\bORD-[A-Z0-9]{8}\b/g,
+   type: PIIType.CASE_ID,
+ }]);
 
- // Or in-memory for testing
- const testStorage = new SQLitePIIStorageProvider(':memory:');
- await testStorage.initialize();
+ anonymizer.getRegistry().register(recognizer);
  ```
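Before you register a pattern, it helps to check it against sample strings in plain TypeScript. The snippet below exercises only the `ORD-` regex from the example. It uses standard `RegExp` behaviour and no rehydra APIs; `findOrderNumbers` is a hypothetical helper.

```typescript
// Same pattern as the Order Number recognizer above.
const ORDER_NUMBER = /\bORD-[A-Z0-9]{8}\b/g;

function findOrderNumbers(text: string): string[] {
  // matchAll works on an internal copy of the /g regex, so repeated calls are independent
  return Array.from(text.matchAll(ORDER_NUMBER), (m) => m[0]);
}

console.log(findOrderNumbers('Refund ORD-AB12CD34 and ORD-ZZ99ZZ99.'));
// [ 'ORD-AB12CD34', 'ORD-ZZ99ZZ99' ]
console.log(findOrderNumbers('ord-ab12cd34 ORD-AB12CD345 XORD-AB12CD34'));
// [] (lowercase, nine characters, no word boundary before "ORD")

// Pitfall: test() on a /g regex is stateful because it advances lastIndex.
console.log(ORDER_NUMBER.test('ORD-AB12CD34')); // true
console.log(ORDER_NUMBER.test('ORD-AB12CD34')); // false: the search resumed at lastIndex 12
```

The stateful `test()` behaviour is a general JavaScript property of global regexes. It is worth keeping in mind whenever the same `/g` pattern object is reused for ad hoc checks.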
 
- **Dependencies:**
- - **Bun**: Uses built-in `bun:sqlite` (no additional install needed)
- - **Node.js**: Requires `better-sqlite3`:
+ ### GPU Acceleration
 
- ```bash
- npm install better-sqlite3
- ```
-
- ### IndexedDB Provider (Browser)
+ For high-throughput batch processing, use a remote inference server with a GPU:
 
  ```typescript
- import {
-   createAnonymizer,
-   InMemoryKeyProvider,
-   IndexedDBPIIStorageProvider,
- } from 'rehydra';
-
- // Custom database name (defaults to 'rehydra-pii-storage')
- const storage = new IndexedDBPIIStorageProvider('my-app-pii');
-
  const anonymizer = createAnonymizer({
-   ner: { mode: 'quantized' },
-   keyProvider: new InMemoryKeyProvider(),
-   piiStorageProvider: storage,
+   ner: {
+     backend: 'inference-server',
+     inferenceServerUrl: 'http://localhost:8080',
+   },
  });
- await anonymizer.initialize();
-
- // Use sessions as usual
- const session = anonymizer.session('browser-chat-123');
- const result = await session.anonymize('Hello John!');
- const original = await session.rehydrate(result.anonymizedText);
  ```
 
- ### Session Interface
+ ## Encryption
 
- The session object provides these methods:
+ PII maps are encrypted with **AES-256-GCM** via the Web Crypto API.
 
  ```typescript
- interface AnonymizerSession {
-   readonly sessionId: string;
-   anonymize(text: string, locale?: string, policy?: Partial<AnonymizationPolicy>): Promise<AnonymizationResult>;
-   rehydrate(text: string): Promise<string>;
-   load(): Promise<StoredPIIMap | null>;
-   delete(): Promise<boolean>;
-   exists(): Promise<boolean>;
- }
- ```
-
- ### Data Retention
-
- **Entries persist forever by default.** Use `cleanup()` on the storage provider to remove old entries:
-
- ```typescript
- // Delete entries older than 7 days
- const count = await storage.cleanup(new Date(Date.now() - 7 * 24 * 60 * 60 * 1000));
-
- // Or delete specific sessions
- await session.delete();
-
- // List all stored sessions
- const sessionIds = await storage.list();
- ```
-
- ## Browser Usage
-
- The library works seamlessly in browsers without any special configuration.
-
- ### Browser Notes
-
- - **First-use downloads**: NER model (~280 MB) and semantic data (~12 MB) are downloaded on first use
- - **ONNX runtime**: Automatically loaded from CDN if not bundled
- - **Offline support**: After initial download, everything works offline
- - **Storage**: Uses IndexedDB and OPFS - data persists across sessions
-
- ### Bundler Support (Vite, webpack, esbuild)
-
- The package uses [conditional exports](https://nodejs.org/api/packages.html#conditional-exports) to automatically provide a browser-safe build when bundling for the web. This means:
-
- - **Automatic**: Vite, webpack, esbuild, and other modern bundlers will automatically use `dist/browser.js`
- - **No Node.js modules**: The browser build excludes `SQLitePIIStorageProvider` and other Node.js-specific code
- - **Tree-shakable**: Only the code you use is included in your bundle
-
- ```json
- // package.json exports (simplified)
- {
-   "exports": {
-     ".": {
-       "browser": "./dist/browser.js",
-       "node": "./dist/index.js",
-       "default": "./dist/index.js"
-     }
-   }
- }
- ```
-
- ## Bun Support
-
- This library works with [Bun](https://bun.sh). Since `onnxruntime-node` is a native Node.js addon, Bun uses `onnxruntime-web`:
+ // Development: random key, lost on restart
+ const keyProvider = new InMemoryKeyProvider();
 
- ```bash
- bun add rehydra onnxruntime-web
+ // Production: persistent key (generate with: openssl rand -base64 32)
+ const keyProvider = new ConfigKeyProvider(process.env.PII_ENCRYPTION_KEY!);
  ```
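For readers unfamiliar with AES-256-GCM, the round trip below shows the Web Crypto calls involved. It is a sketch for Node.js that uses `webcrypto` from `node:crypto`; browsers expose the same API as `globalThis.crypto`. It is not rehydra's storage format. The helper names, the JSON serialization, and returning the IV alongside the ciphertext are all assumptions. The key is 32 random bytes encoded in base64, which is the shape that `openssl rand -base64 32` produces.

```typescript
import { webcrypto } from 'node:crypto';

const subtle = webcrypto.subtle;

// Hypothetical helper: import a base64-encoded 256-bit key for AES-GCM.
async function importAesKey(base64Key: string): Promise<webcrypto.CryptoKey> {
  const raw = new Uint8Array(Buffer.from(base64Key, 'base64'));
  if (raw.length !== 32) {
    throw new Error(`AES-256 needs a 32-byte key, got ${raw.length} bytes`);
  }
  return subtle.importKey('raw', raw, { name: 'AES-GCM' }, false, ['encrypt', 'decrypt']);
}

// Encrypt a JSON-serializable value. GCM requires a unique IV for every
// encryption under the same key; 12 random bytes is the conventional size.
async function encryptJson(key: webcrypto.CryptoKey, value: unknown) {
  const iv = webcrypto.getRandomValues(new Uint8Array(12));
  const plaintext = new TextEncoder().encode(JSON.stringify(value));
  const ciphertext = new Uint8Array(await subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext));
  return { iv, ciphertext }; // ciphertext ends with the 16-byte authentication tag
}

// Decryption verifies the tag and rejects if the ciphertext, IV, or key is wrong.
async function decryptJson<T>(key: webcrypto.CryptoKey, iv: Uint8Array, ciphertext: Uint8Array): Promise<T> {
  const plaintext = await subtle.decrypt({ name: 'AES-GCM', iv }, key, ciphertext);
  return JSON.parse(new TextDecoder().decode(plaintext)) as T;
}
```

As with the `ConfigKeyProvider` line above, the key must stay the same across restarts. Data encrypted under a lost key cannot be decrypted.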
 
- Usage is identical: the library auto-detects the runtime.
-
- ## Performance
-
- Benchmarks on Apple M-series (CPU) and NVIDIA T4 (GPU). Run `npm run benchmark:compare` to measure on your hardware.
-
- ### Backend Comparison
-
- | Backend | Short (~40 chars) | Medium (~500 chars) | Long (~2K chars) | Entity-dense |
- |---------|-------------------|---------------------|------------------|--------------|
- | **Regex-only** | 0.38 ms | 0.50 ms | 0.91 ms | 0.35 ms |
- | **NER CPU** | 4.3 ms | 26 ms | 93 ms | 13 ms |
- | **NER GPU** | 62 ms | 73 ms | 117 ms | 68 ms |
-
- Local CPU inference is faster than the GPU backend for typical workloads because every GPU request also pays the network round trip to the inference server. GPU servers pay off for high-throughput batch processing, where many requests can run in parallel.
-
- ### Throughput (ops/sec)
-
- | Backend | Short | Medium | Long |
- |---------|-------|--------|------|
- | **Regex-only** | ~2,640 | ~2,017 | ~1,096 |
- | **NER CPU** | ~234 | ~38 | ~11 |
- | **NER GPU** | ~16 | ~14 | ~9 |
-
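The two tables line up with one-request-at-a-time processing, in which throughput is about 1000 divided by the mean latency in milliseconds. The check below recomputes throughput from the Backend Comparison latencies. Treating requests as strictly sequential is our assumption.

```typescript
// Mean latencies (ms) for Short / Medium / Long inputs, copied from the Backend Comparison table.
const latencyMs: Record<string, number[]> = {
  'Regex-only': [0.38, 0.5, 0.91],
  'NER CPU': [4.3, 26, 93],
  'NER GPU': [62, 73, 117],
};

// One request at a time: ops/sec = 1000 ms / latency in ms.
function sequentialOpsPerSec(ms: number): number {
  return 1000 / ms;
}

for (const [backend, values] of Object.entries(latencyMs)) {
  const ops = values.map((ms) => Math.round(sequentialOpsPerSec(ms)));
  console.log(backend, ops.join(' / '));
}
// Regex-only 2632 / 2000 / 1099
// NER CPU 233 / 38 / 11
// NER GPU 16 / 14 / 9
```

The recomputed values agree with the measured throughput table to within about 1%, which is consistent with sequential measurement. Under that model, the GPU column is limited by its per-request round trip. Only running many requests in parallel, as the batch-processing recommendation suggests, would let a GPU server exceed it.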
- ### Model Downloads
-
- | Model | Size | First-Use Download Time |
- |-------|------|-------------------------|
- | Quantized NER | ~265 MB | ~30 s on a fast connection |
- | Standard NER | ~1.1 GB | ~2 min on a fast connection |
- | Semantic Data | ~12 MB | ~5 s on a fast connection |
-
- ### Recommendations
-
- | Use Case | Recommended Backend |
- |----------|---------------------|
- | Structured PII only (email, phone, IBAN) | Regex-only |
- | General use with name/org/location detection | **NER CPU (default)** |
- | High-throughput batch processing (1000s of docs) | NER GPU |
- | Privacy-sensitive / zero-knowledge required | NER CPU (data never leaves device) |
-
- > **Note:** Local CPU inference now outperforms the GPU backend for most use cases because it avoids network overhead. The trie-based tokenizer provides O(token_length) lookups instead of O(vocab_size), making local inference practical for production use.
-
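The trie remark can be illustrated with a small vocabulary trie. It finds the longest matching piece by walking one character at a time, so the cost depends on the match length rather than on how many entries the vocabulary holds. This is a generic WordPiece-style sketch and not rehydra's tokenizer. The following are all assumptions:

- the `VocabTrie` class;
- the separate continuation trie, which stands in for WordPiece's `##` prefix;
- the toy vocabulary.

```typescript
class TrieNode {
  readonly children = new Map<string, TrieNode>();
  tokenId: number | null = null; // set when the path from the root spells a vocab entry
}

class VocabTrie {
  private readonly root = new TrieNode();

  insert(token: string, id: number): void {
    let node = this.root;
    for (let i = 0; i < token.length; i++) {
      let child = node.children.get(token[i]);
      if (child === undefined) {
        child = new TrieNode();
        node.children.set(token[i], child);
      }
      node = child;
    }
    node.tokenId = id;
  }

  // Longest vocab entry starting at text[start]; visits at most one node per matched character.
  longestMatch(text: string, start: number): { id: number; end: number } | null {
    let node = this.root;
    let best: { id: number; end: number } | null = null;
    for (let i = start; i < text.length; i++) {
      const next = node.children.get(text[i]);
      if (next === undefined) break;
      node = next;
      if (node.tokenId !== null) best = { id: node.tokenId, end: i + 1 };
    }
    return best;
  }
}

// Greedy longest-match split of one word: the first piece comes from `starts`,
// later pieces from `continuations` (WordPiece marks these with "##").
function greedyTokenize(starts: VocabTrie, continuations: VocabTrie, word: string): number[] | null {
  const ids: number[] = [];
  let pos = 0;
  while (pos < word.length) {
    const match = (pos === 0 ? starts : continuations).longestMatch(word, pos);
    if (match === null) return null; // no piece fits; a real tokenizer would emit [UNK]
    ids.push(match.id);
    pos = match.end;
  }
  return ids;
}

const starts = new VocabTrie();
starts.insert('p', 1);
starts.insert('play', 2);
const continuations = new VocabTrie();
continuations.insert('i', 10);
continuations.insert('ing', 11);
continuations.insert('s', 12);

console.log(greedyTokenize(starts, continuations, 'playing')); // [ 2, 11 ]
console.log(greedyTokenize(starts, continuations, 'plays')); // [ 2, 12 ]
console.log(greedyTokenize(starts, continuations, 'xyz')); // null
```

A linear scan that compared the input against every vocabulary entry would instead grow with the vocabulary size. For BERT-style models, that is typically tens of thousands of entries.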
- ## Requirements
+ ## Platform Support
 
  | Environment | Version | Notes |
  |-------------|---------|-------|
  | Node.js | >= 18.0.0 | Uses native `onnxruntime-node` |
- | Bun | >= 1.0.0 | Requires `onnxruntime-web` |
- | Browsers | Chrome 86+, Firefox 89+, Safari 15.4+, Edge 86+ | Uses OPFS for model storage |
+ | Bun | >= 1.0.0 | Install `onnxruntime-web`: `bun add rehydra onnxruntime-web` |
+ | Browsers | Chrome 86+, Firefox 89+, Safari 15.4+ | Uses OPFS for model storage |
 
- ## Development
+ The browser build (`rehydra/browser`) automatically excludes Node.js dependencies. Modern bundlers (Vite, webpack, esbuild) select the right entry point via conditional exports.
 
- ```bash
- # Install dependencies
- npm install
-
- # Run tests
- npm test
-
- # Build
- npm run build
-
- # Lint
- npm run lint
- ```
-
- ### Building Custom Models
-
- For development or custom models:
+ ## Development
 
  ```bash
- # Requires Python 3.8+
- npm run setup:ner # Standard model
- npm run setup:ner:quantized # Quantized model
+ npm install # Install dependencies
+ npm run build # Compile TypeScript
+ npm test # Run tests (watch mode)
+ npm run test:run # Run tests once
+ npm run lint # ESLint
+ npm run setup:ner # Pre-download NER model (~280 MB)
+ npm run benchmark # Run benchmarks
+
+ # Integration tests (the proxy tests require API keys)
+ npm run test:streaming # No API key needed
+ OPENAI_API_KEY=... npm run test:proxy:openai -- --ner # OpenAI with NER
+ ANTHROPIC_API_KEY=... npm run test:proxy:anthropic -- --ner # Anthropic with NER
  ```
 
  ## License