rehydra 0.3.3 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (83)
  1. package/README.md +173 -873
  2. package/dist/core/anonymizer.d.ts +9 -1
  3. package/dist/core/anonymizer.d.ts.map +1 -1
  4. package/dist/core/anonymizer.js +29 -7
  5. package/dist/core/anonymizer.js.map +1 -1
  6. package/dist/index.d.ts +2 -0
  7. package/dist/index.d.ts.map +1 -1
  8. package/dist/index.js +4 -0
  9. package/dist/index.js.map +1 -1
  10. package/dist/proxy/index.d.ts +12 -0
  11. package/dist/proxy/index.d.ts.map +1 -0
  12. package/dist/proxy/index.js +11 -0
  13. package/dist/proxy/index.js.map +1 -0
  14. package/dist/proxy/providers/anthropic.d.ts +17 -0
  15. package/dist/proxy/providers/anthropic.d.ts.map +1 -0
  16. package/dist/proxy/providers/anthropic.js +117 -0
  17. package/dist/proxy/providers/anthropic.js.map +1 -0
  18. package/dist/proxy/providers/index.d.ts +19 -0
  19. package/dist/proxy/providers/index.d.ts.map +1 -0
  20. package/dist/proxy/providers/index.js +40 -0
  21. package/dist/proxy/providers/index.js.map +1 -0
  22. package/dist/proxy/providers/openai.d.ts +17 -0
  23. package/dist/proxy/providers/openai.d.ts.map +1 -0
  24. package/dist/proxy/providers/openai.js +92 -0
  25. package/dist/proxy/providers/openai.js.map +1 -0
  26. package/dist/proxy/providers/types.d.ts +29 -0
  27. package/dist/proxy/providers/types.d.ts.map +1 -0
  28. package/dist/proxy/providers/types.js +6 -0
  29. package/dist/proxy/providers/types.js.map +1 -0
  30. package/dist/proxy/proxy-server.d.ts +53 -0
  31. package/dist/proxy/proxy-server.d.ts.map +1 -0
  32. package/dist/proxy/proxy-server.js +146 -0
  33. package/dist/proxy/proxy-server.js.map +1 -0
  34. package/dist/proxy/rehydra-fetch.d.ts +35 -0
  35. package/dist/proxy/rehydra-fetch.d.ts.map +1 -0
  36. package/dist/proxy/rehydra-fetch.js +217 -0
  37. package/dist/proxy/rehydra-fetch.js.map +1 -0
  38. package/dist/proxy/rehydra-proxy.d.ts +40 -0
  39. package/dist/proxy/rehydra-proxy.d.ts.map +1 -0
  40. package/dist/proxy/rehydra-proxy.js +82 -0
  41. package/dist/proxy/rehydra-proxy.js.map +1 -0
  42. package/dist/proxy/sse-parser.d.ts +59 -0
  43. package/dist/proxy/sse-parser.d.ts.map +1 -0
  44. package/dist/proxy/sse-parser.js +112 -0
  45. package/dist/proxy/sse-parser.js.map +1 -0
  46. package/dist/proxy/types.d.ts +49 -0
  47. package/dist/proxy/types.d.ts.map +1 -0
  48. package/dist/proxy/types.js +5 -0
  49. package/dist/proxy/types.js.map +1 -0
  50. package/dist/proxy/wrap-client.d.ts +47 -0
  51. package/dist/proxy/wrap-client.d.ts.map +1 -0
  52. package/dist/proxy/wrap-client.js +70 -0
  53. package/dist/proxy/wrap-client.js.map +1 -0
  54. package/dist/storage/session.d.ts +3 -0
  55. package/dist/storage/session.d.ts.map +1 -1
  56. package/dist/storage/session.js +24 -1
  57. package/dist/storage/session.js.map +1 -1
  58. package/dist/storage/types.d.ts +16 -0
  59. package/dist/storage/types.d.ts.map +1 -1
  60. package/dist/streaming/anonymizer-stream.d.ts +63 -0
  61. package/dist/streaming/anonymizer-stream.d.ts.map +1 -0
  62. package/dist/streaming/anonymizer-stream.js +184 -0
  63. package/dist/streaming/anonymizer-stream.js.map +1 -0
  64. package/dist/streaming/index.d.ts +9 -0
  65. package/dist/streaming/index.d.ts.map +1 -0
  66. package/dist/streaming/index.js +8 -0
  67. package/dist/streaming/index.js.map +1 -0
  68. package/dist/streaming/sentence-buffer.d.ts +78 -0
  69. package/dist/streaming/sentence-buffer.d.ts.map +1 -0
  70. package/dist/streaming/sentence-buffer.js +238 -0
  71. package/dist/streaming/sentence-buffer.js.map +1 -0
  72. package/dist/streaming/stream-factory.d.ts +38 -0
  73. package/dist/streaming/stream-factory.d.ts.map +1 -0
  74. package/dist/streaming/stream-factory.js +69 -0
  75. package/dist/streaming/stream-factory.js.map +1 -0
  76. package/dist/streaming/types.d.ts +121 -0
  77. package/dist/streaming/types.d.ts.map +1 -0
  78. package/dist/streaming/types.js +5 -0
  79. package/dist/streaming/types.js.map +1 -0
  80. package/dist/types/index.d.ts +8 -2
  81. package/dist/types/index.d.ts.map +1 -1
  82. package/dist/types/index.js.map +1 -1
  83. package/package.json +19 -2
package/README.md CHANGED
@@ -1,1013 +1,313 @@
1
1
  # Rehydra
2
2
 
3
+ ![Rehydra Logo](https://framerusercontent.com/images/NT4xWP34VHd2Wlrk197je5BUi4.png?scale-down-to=512&width=6516&height=1752)
4
+
3
5
  ![License](https://img.shields.io/github/license/rehydra-ai/rehydra)
4
6
  ![Issues](https://img.shields.io/github/issues/rehydra-ai/rehydra)
5
7
  [![codecov](https://codecov.io/github/rehydra-ai/rehydra/graph/badge.svg?token=WX5RI0ZZJG)](https://codecov.io/github/rehydra-ai/rehydra)
6
8
 
7
- On-device PII anonymization module for high-privacy AI workflows. Detects and replaces Personally Identifiable Information (PII) with placeholder tags while maintaining an encrypted mapping for later rehydration.
8
-
9
- ```bash
10
- npm install rehydra
11
- ```
12
-
13
- **Works in Node.js, Bun, and browsers**
14
-
15
- ## Features
16
-
17
- - **Structured PII Detection**: Regex-based detection for emails, phones, IBANs, credit cards, IPs, URLs
18
- - **Soft PII Detection**: ONNX-powered NER model for names, organizations, locations (auto-downloads on first use if enabled)
19
- - **Semantic Enrichment**: AI/MT-friendly tags with gender/location attributes for better translations
20
- - **Secure PII Mapping**: AES-256-GCM encrypted storage of original PII values
21
- - **Cross-Platform**: Works identically in Node.js, Bun, and browsers
22
- - **Configurable Policies**: Customizable detection rules, thresholds, and allowlists
23
- - **Validation & Leak Scanning**: Built-in validation and optional leak detection
24
-
25
- ## Installation
26
-
27
- ### Node.js / Bun
9
+ On-device PII anonymization for AI workflows. Detects names, emails, phones, IBANs, and more replaces them with encrypted placeholder tags and rehydrates them back after processing.
28
10
 
29
11
  ```bash
30
12
  npm install rehydra
31
13
  ```
32
14
 
33
- ### Browser (with bundler)
34
-
35
- ```bash
36
- npm install rehydra onnxruntime-web
37
- ```
38
-
39
- When using Vite, webpack, or other bundlers, the browser-safe entry point is automatically selected via [conditional exports](https://nodejs.org/api/packages.html#conditional-exports). This entry point excludes Node.js-specific modules like SQLite storage.
40
-
41
- ### Browser (without bundler)
42
-
43
- ```html
44
- <script type="module">
45
- // Import directly from your dist folder or CDN
46
- import { createAnonymizer } from './node_modules/rehydra/dist/index.js';
47
-
48
- // onnxruntime-web is automatically loaded from CDN when needed
49
- </script>
50
- ```
15
+ **Works in Node.js, Bun, and browsers.** No data leaves your machine.
51
16
 
52
17
  ## Quick Start
53
18
 
54
- ### Regex-Only Mode (No Downloads Required)
55
-
56
- For structured PII like emails, phones, IBANs, credit cards:
57
-
58
- ```typescript
59
- import { anonymizeRegexOnly } from 'rehydra';
60
-
61
- const result = await anonymizeRegexOnly(
62
- 'Contact john@example.com or call +49 30 123456. IBAN: DE89370400440532013000'
63
- );
64
-
65
- console.log(result.anonymizedText);
66
- // "Contact <PII type="EMAIL" id="1"/> or call <PII type="PHONE" id="2"/>. IBAN: <PII type="IBAN" id="3"/>"
67
- ```
68
-
69
- ### Full Mode with NER (Detects Names, Organizations, Locations)
70
-
71
- The NER model is automatically downloaded on first use (~280 MB for quantized):
72
-
73
- ```typescript
74
- import { createAnonymizer } from 'rehydra';
75
-
76
- const anonymizer = createAnonymizer({
77
- ner: {
78
- mode: 'quantized', // or 'standard' for full model (~1.1 GB)
79
- onStatus: (status) => console.log(status),
80
- }
81
- });
82
-
83
- await anonymizer.initialize(); // Downloads model if needed
84
-
85
- const result = await anonymizer.anonymize(
86
- 'Hello John Smith from Acme Corp in Berlin!'
87
- );
88
-
89
- console.log(result.anonymizedText);
90
- // "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
91
-
92
- // Clean up when done
93
- await anonymizer.dispose();
94
- ```
95
-
96
- ### With Semantic Enrichment
97
-
98
- Add gender and location scope for better machine translation:
99
-
100
19
  ```typescript
101
- import { createAnonymizer } from 'rehydra';
20
+ import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';
102
21
 
22
+ const keyProvider = new InMemoryKeyProvider();
103
23
  const anonymizer = createAnonymizer({
104
- ner: { mode: 'quantized' },
105
- semantic: {
106
- enabled: true, // Downloads ~12 MB of semantic data on first use
107
- onStatus: (status) => console.log(status),
108
- }
24
+ ner: { mode: 'quantized' }, // ~280 MB model, auto-downloads on first use
25
+ keyProvider,
109
26
  });
110
27
 
111
- await anonymizer.initialize();
112
-
113
28
  const result = await anonymizer.anonymize(
114
- 'Hello Maria Schmidt from Berlin!'
29
+ 'Email john.smith@acme-corp.com or call John at +41 79 123 45 67'
115
30
  );
116
31
 
117
32
  console.log(result.anonymizedText);
118
- // "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
119
- ```
120
-
121
- ## Example: Translation Workflow (Anonymize → Translate → Rehydrate)
122
-
123
- The full workflow for privacy-preserving translation:
124
-
125
- ```typescript
126
- import {
127
- createAnonymizer,
128
- decryptPIIMap,
129
- rehydrate,
130
- InMemoryKeyProvider
131
- } from 'rehydra';
33
+ // "Email <PII type="EMAIL" id="1"/> or call <PII type="PERSON" id="2"/> at <PII type="PHONE" id="3"/>"
132
34
 
133
- // 1. Create a key provider (required to decrypt later)
134
- const keyProvider = new InMemoryKeyProvider();
135
-
136
- // 2. Create anonymizer with key provider
137
- const anonymizer = createAnonymizer({
138
- ner: { mode: 'quantized' },
139
- keyProvider: keyProvider
140
- });
141
-
142
- await anonymizer.initialize();
143
-
144
- // 3. Anonymize before translation
145
- const original = 'Hello John Smith from Acme Corp in Berlin!';
146
- const result = await anonymizer.anonymize(original);
147
-
148
- console.log(result.anonymizedText);
149
- // "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
150
-
151
- // 4. Translate (or do other AI workloads that preserve placeholders)
152
- const translated = await yourAIWorkflow(result.anonymizedText, { from: 'en', to: 'de' });
153
- // "Hallo <PII type="PERSON" id="1"/> von <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
154
-
155
- // 5. Decrypt the PII map using the same key
156
- const encryptionKey = await keyProvider.getKey();
157
- const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
158
-
159
- // 6. Rehydrate - replace placeholders with original values
160
- const rehydrated = rehydrate(translated, piiMap);
161
-
162
- console.log(rehydrated);
163
- // "Hallo John Smith von Acme Corp in Berlin!"
35
+ // Rehydrate after translation or other processing
36
+ const key = await keyProvider.getKey();
37
+ const piiMap = await decryptPIIMap(result.piiMap!, key);
38
+ const original = rehydrate(result.anonymizedText, piiMap);
39
+ // "Email john.smith@acme-corp.com or call John at +41 79 123 45 67"
164
40
 
165
- // 7. Clean up
166
41
  await anonymizer.dispose();
167
42
  ```
168
43
 
169
- ### Key Points
170
-
171
- - **Save the encryption key** - You need the same key to decrypt the PII map
172
- - **Placeholders are XML-like** - Most translation services preserve them automatically
173
- - **PII stays local** - Original values never leave your system during translation
44
+ ## LLM Proxy
174
45
 
175
- ## API Reference
46
+ Drop-in middleware that anonymizes prompts before they leave your machine and rehydrates responses. Works with OpenAI, Anthropic, and any OpenAI-compatible API.
176
47
 
177
- ### Configuration Options
48
+ ### Wrap any fetch-based client
178
49
 
179
50
  ```typescript
180
- import { createAnonymizer, InMemoryKeyProvider } from 'rehydra';
181
-
182
- const anonymizer = createAnonymizer({
183
- // NER configuration
184
- ner: {
185
- mode: 'quantized', // 'standard' | 'quantized' | 'disabled' | 'custom'
186
- backend: 'local', // 'local' (default) | 'inference-server'
187
- autoDownload: true, // Auto-download model if not present
188
- onStatus: (status) => {}, // Status messages callback
189
- onDownloadProgress: (progress) => {
190
- console.log(`${progress.file}: ${progress.percent}%`);
191
- },
192
-
193
- // For 'inference-server' backend:
194
- inferenceServerUrl: 'http://localhost:8080',
195
-
196
- // For 'custom' mode only:
197
- modelPath: './my-model.onnx',
198
- vocabPath: './vocab.txt',
199
- },
200
-
201
- // Semantic enrichment (adds gender/scope attributes)
202
- semantic: {
203
- enabled: true, // Enable MT-friendly attributes
204
- autoDownload: true, // Auto-download semantic data (~12 MB)
205
- onStatus: (status) => {},
206
- onDownloadProgress: (progress) => {},
207
- },
208
-
209
- // Encryption key provider
210
- keyProvider: new InMemoryKeyProvider(),
211
-
212
- // Custom policy (optional)
213
- defaultPolicy: { /* see Policy section */ },
51
+ import OpenAI from 'openai';
52
+ import { createRehydraFetch, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
53
+
54
+ const openai = new OpenAI({
55
+ fetch: createRehydraFetch({
56
+ anonymizer: { ner: { mode: 'quantized' } },
57
+ keyProvider: new InMemoryKeyProvider(),
58
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
59
+ }),
214
60
  });
215
61
 
216
- await anonymizer.initialize();
217
- ```
218
-
219
- ### NER Modes
220
-
221
- | Mode | Description | Size | Auto-Download |
222
- |------|-------------|------|---------------|
223
- | `'disabled'` | No NER, regex only | 0 | N/A |
224
- | `'quantized'` | Smaller model, ~95% accuracy | ~280 MB | Yes |
225
- | `'standard'` | Full model, best accuracy | ~1.1 GB | Yes |
226
- | `'custom'` | Your own ONNX model | Varies | No |
227
-
228
- ### ONNX Session Options
229
-
230
- Fine-tune ONNX Runtime performance with session options:
231
-
232
- ```typescript
233
- const anonymizer = createAnonymizer({
234
- ner: {
235
- mode: 'quantized',
236
- sessionOptions: {
237
- // Graph optimization level: 'disabled' | 'basic' | 'extended' | 'all'
238
- graphOptimizationLevel: 'all', // default
239
-
240
- // Threading (Node.js only)
241
- intraOpNumThreads: 4, // threads within operators
242
- interOpNumThreads: 1, // threads between operators
243
-
244
- // Memory optimization
245
- enableCpuMemArena: true,
246
- enableMemPattern: true,
247
- }
248
- }
62
+ // PII is anonymized before leaving your machine, response is rehydrated automatically
63
+ const response = await openai.chat.completions.create({
64
+ model: 'gpt-4o',
65
+ messages: [{ role: 'user', content: 'Draft a reply to john@example.com about the meeting' }],
249
66
  });
250
67
  ```
251
68
 
252
- #### Execution Providers
253
-
254
- By default, Rehydra uses:
255
- - **Node.js**: CPU (fastest for quantized models)
256
- - **Browsers**: WebGPU with WASM fallback
257
-
258
- To enable **CoreML on macOS** (for non-quantized models):
69
+ ### Or use wrapLLMClient for even less code
259
70
 
260
71
  ```typescript
261
- const anonymizer = createAnonymizer({
262
- ner: {
263
- mode: 'standard', // CoreML works better with FP32 models
264
- sessionOptions: {
265
- executionProviders: ['coreml', 'cpu'],
266
- }
267
- }
268
- });
269
- ```
270
-
271
- > **Note:** CoreML provides minimal speedup for quantized (INT8) models since they're already optimized for CPU. Use CoreML with the standard FP32 model for best results.
272
-
273
- Available execution providers (local inference):
274
- | Provider | Platform | Best For |
275
- |----------|----------|----------|
276
- | `'cpu'` | All | Quantized models (default) |
277
- | `'coreml'` | macOS | Standard (FP32) models on Apple Silicon |
278
- | `'webgpu'` | Browsers | GPU acceleration in Chrome 113+ |
279
- | `'wasm'` | Browsers | Fallback for all browsers |
280
-
281
- > **Note:** For NVIDIA GPU acceleration with CUDA/TensorRT, use the inference server backend (see [GPU Acceleration](#gpu-acceleration-enterprise)).
282
-
283
- ### GPU Acceleration (Enterprise)
284
-
285
- For high-throughput production deployments, Rehydra supports GPU-accelerated inference via a dedicated inference server. This provides **10-37× speedup** over CPU inference.
286
-
287
- ```typescript
288
- const anonymizer = createAnonymizer({
289
- ner: {
290
- backend: 'inference-server',
291
- inferenceServerUrl: 'http://localhost:8080',
292
- }
293
- });
294
-
295
- await anonymizer.initialize();
296
- ```
297
-
298
- **Performance Comparison:**
299
-
300
- | Text Size | CPU (local) | GPU (server) | Winner |
301
- |-----------|-------------|--------------|--------|
302
- | Short (~40 chars) | 4.3ms | 62ms | **CPU 14× faster** |
303
- | Medium (~500 chars) | 26ms | 73ms | **CPU 2.8× faster** |
304
- | Long (~2000 chars) | 93ms | 117ms | **CPU 1.3× faster** |
305
- | Entity-dense | 13ms | 68ms | **CPU 5× faster** |
306
-
307
- Local CPU faster for most use cases due to network overhead. GPU is beneficial for batch processing and large documents.
308
-
309
-
310
-
311
- **Backend Options:**
312
-
313
- | Backend | Description | Latency (2K chars) |
314
- |---------|-------------|-------------------|
315
- | `'local'` | CPU inference (default) | ~4,300ms |
316
- | `'inference-server'` | GPU server (enterprise) | ~117ms |
317
-
318
- > **Note:** The GPU inference server is available as part of Rehydra Enterprise. Contact us for deployment options including Docker containers and Kubernetes helm charts.
319
-
320
- ### Main Functions
72
+ import OpenAI from 'openai';
73
+ import { wrapLLMClient, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
321
74
 
322
- #### `createAnonymizer(config?)`
323
-
324
- Creates a reusable anonymizer instance:
325
-
326
- ```typescript
327
- const anonymizer = createAnonymizer({
328
- ner: { mode: 'quantized' }
75
+ const openai = wrapLLMClient(new OpenAI(), {
76
+ keyProvider: new InMemoryKeyProvider(),
77
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
329
78
  });
330
-
331
- await anonymizer.initialize();
332
- const result = await anonymizer.anonymize('text');
333
- await anonymizer.dispose();
334
- ```
335
-
336
- #### `anonymize(text, locale?, policy?)`
337
-
338
- One-off anonymization (regex-only by default):
339
-
340
- ```typescript
341
- import { anonymize } from 'rehydra';
342
-
343
- const result = await anonymize('Contact test@example.com');
344
- ```
345
-
346
- #### `anonymizeWithNER(text, nerConfig, policy?)`
347
-
348
- One-off anonymization with NER:
349
-
350
- ```typescript
351
- import { anonymizeWithNER } from 'rehydra';
352
-
353
- const result = await anonymizeWithNER(
354
- 'Hello John Smith',
355
- { mode: 'quantized' }
356
- );
357
- ```
358
-
359
- #### `anonymizeRegexOnly(text, policy?)`
360
-
361
- Fast regex-only anonymization:
362
-
363
- ```typescript
364
- import { anonymizeRegexOnly } from 'rehydra';
365
-
366
- const result = await anonymizeRegexOnly('Card: 4111111111111111');
367
79
  ```
368
80
 
369
- ### Rehydration Functions
370
-
371
- #### `decryptPIIMap(encryptedMap, key)`
81
+ ### Standalone proxy server
372
82
 
373
- Decrypts the PII map for rehydration:
83
+ Point any LLM client at a local proxy — zero code changes needed:
374
84
 
375
85
  ```typescript
376
- import { decryptPIIMap } from 'rehydra';
377
-
378
- const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
379
- // Returns Map<string, string> where key is "PERSON:1" and value is "John Smith"
380
- ```
86
+ import { createRehydraProxyServer, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
381
87
 
382
- #### `rehydrate(text, piiMap)`
383
-
384
- Replaces placeholders with original values:
385
-
386
- ```typescript
387
- import { rehydrate } from 'rehydra';
388
-
389
- const original = rehydrate(translatedText, piiMap);
390
- ```
391
-
392
- ### Result Structure
393
-
394
- ```typescript
395
- interface AnonymizationResult {
396
- // Text with PII replaced by placeholder tags
397
- anonymizedText: string;
398
-
399
- // Detected entities (without original text for safety)
400
- entities: Array<{
401
- type: PIIType;
402
- id: number;
403
- start: number;
404
- end: number;
405
- confidence: number;
406
- source: 'REGEX' | 'NER';
407
- }>;
408
-
409
- // Encrypted PII mapping (for later rehydration)
410
- piiMap: {
411
- ciphertext: string; // Base64
412
- iv: string; // Base64
413
- authTag: string; // Base64
414
- };
415
-
416
- // Processing statistics
417
- stats: {
418
- countsByType: Record<PIIType, number>;
419
- totalEntities: number;
420
- processingTimeMs: number;
421
- modelVersion: string;
422
- leakScanPassed?: boolean;
423
- };
424
- }
425
- ```
426
-
427
- ## Supported PII Types
428
-
429
- | Type | Description | Detection | Semantic Attributes |
430
- |------|-------------|-----------|---------------------|
431
- | `EMAIL` | Email addresses | Regex | - |
432
- | `PHONE` | Phone numbers (international) | Regex | - |
433
- | `IBAN` | International Bank Account Numbers | Regex + Checksum | - |
434
- | `BIC_SWIFT` | Bank Identifier Codes | Regex | - |
435
- | `CREDIT_CARD` | Credit card numbers | Regex + Luhn | - |
436
- | `IP_ADDRESS` | IPv4 and IPv6 addresses | Regex | - |
437
- | `URL` | Web URLs | Regex | - |
438
- | `CASE_ID` | Case/ticket numbers | Regex (configurable) | - |
439
- | `CUSTOMER_ID` | Customer identifiers | Regex (configurable) | - |
440
- | `PERSON` | Person names | NER | `gender` (male/female/neutral) |
441
- | `ORG` | Organization names | NER | - |
442
- | `LOCATION` | Location/place names | NER | `scope` (city/country/region) |
443
- | `ADDRESS` | Physical addresses | NER | - |
444
- | `DATE_OF_BIRTH` | Dates of birth | NER | - |
445
-
446
- ## Configuration
447
-
448
- ### Anonymization Policy
449
-
450
- ```typescript
451
- import { createAnonymizer, PIIType } from 'rehydra';
452
-
453
- const anonymizer = createAnonymizer({
454
- ner: { mode: 'quantized' },
455
- defaultPolicy: {
456
- // Which PII types to detect
457
- enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
458
-
459
- // Confidence thresholds per type (0.0 - 1.0)
460
- confidenceThresholds: new Map([
461
- [PIIType.PERSON, 0.8],
462
- [PIIType.EMAIL, 0.5],
463
- ]),
464
-
465
- // Terms to never treat as PII
466
- allowlistTerms: new Set(['Customer Service', 'Help Desk']),
467
-
468
- // Enable semantic enrichment (gender/scope)
469
- enableSemanticMasking: true,
470
-
471
- // Enable leak scanning on output
472
- enableLeakScan: true,
473
- },
88
+ const proxy = await createRehydraProxyServer({
89
+ port: 8080,
90
+ upstream: 'https://api.openai.com',
91
+ keyProvider: new InMemoryKeyProvider(),
92
+ piiStorageProvider: new InMemoryPIIStorageProvider(),
474
93
  });
475
- ```
476
-
477
- ### Custom Recognizers
478
94
 
479
- Add domain-specific patterns:
480
-
481
- ```typescript
482
- import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';
483
-
484
- const customRecognizer = createCustomIdRecognizer([
485
- {
486
- name: 'Order Number',
487
- pattern: /\bORD-[A-Z0-9]{8}\b/g,
488
- type: PIIType.CASE_ID,
489
- },
490
- ]);
491
-
492
- const anonymizer = createAnonymizer();
493
- anonymizer.getRegistry().register(customRecognizer);
95
+ // Point your client at the proxy
96
+ const openai = new OpenAI({ baseURL: 'http://localhost:8080/v1' });
494
97
  ```
495
98
 
496
- ## Data & Model Storage
497
-
498
- Models and semantic data are cached locally for offline use.
499
-
500
- ### Node.js Cache Locations
501
-
502
- | Data | macOS | Linux | Windows |
503
- |------|-------|-------|---------|
504
- | NER Models | `~/Library/Caches/rehydra/models/` | `~/.cache/rehydra/models/` | `%LOCALAPPDATA%/rehydra/models/` |
505
- | Semantic Data | `~/Library/Caches/rehydra/semantic-data/` | `~/.cache/rehydra/semantic-data/` | `%LOCALAPPDATA%/rehydra/semantic-data/` |
506
-
507
- ### Browser Cache
508
-
509
- In browsers, data is stored using:
510
- - **IndexedDB**: For semantic data and smaller files
511
- - **Origin Private File System (OPFS)**: For large model files (~280 MB)
99
+ Supports non-streaming and streaming (SSE) responses for both OpenAI and Anthropic APIs.
512
100
 
513
- Data persists across page reloads and browser sessions.
101
+ ## Streaming
514
102
 
515
- ### Manual Data Management
103
+ Process text chunk-by-chunk with constant memory. Works as a Node.js Transform stream.
516
104
 
517
105
  ```typescript
518
- import {
519
- // Model management
520
- isModelDownloaded,
521
- downloadModel,
522
- clearModelCache,
523
- listDownloadedModels,
524
-
525
- // Semantic data management
526
- isSemanticDataDownloaded,
527
- downloadSemanticData,
528
- clearSemanticDataCache,
529
- } from 'rehydra';
530
-
531
- // Check if model is downloaded
532
- const hasModel = await isModelDownloaded('quantized');
106
+ import { createReadStream, createWriteStream } from 'fs';
107
+ import { createAnonymizerStream, InMemoryKeyProvider } from 'rehydra';
533
108
 
534
- // Manually download model with progress
535
- await downloadModel('quantized', (progress) => {
536
- console.log(`${progress.file}: ${progress.percent}%`);
109
+ const stream = await createAnonymizerStream({
110
+ anonymizer: { ner: { mode: 'quantized' } },
111
+ keyProvider: new InMemoryKeyProvider(),
112
+ sessionId: 'batch-job-001',
113
+ piiStorageProvider: storage,
537
114
  });
538
115
 
539
- // Check semantic data
540
- const hasSemanticData = await isSemanticDataDownloaded();
541
-
542
- // List downloaded models
543
- const models = await listDownloadedModels();
544
-
545
- // Clear caches
546
- await clearModelCache('quantized'); // or clearModelCache() for all
547
- await clearSemanticDataCache();
116
+ createReadStream('input.txt').pipe(stream).pipe(createWriteStream('anonymized.txt'));
548
117
  ```
549
118
 
550
- ## Encryption & Security
551
-
552
- The PII map is encrypted using **AES-256-GCM** via the Web Crypto API (works in both Node.js and browsers).
119
+ ### Low-latency mode for LLM token streams
553
120
 
554
- ### Key Providers
121
+ Regex-only, smaller buffers, flushes aggressively — designed for real-time token streams:
555
122
 
556
123
  ```typescript
557
- import {
558
- InMemoryKeyProvider, // For development/testing
559
- ConfigKeyProvider, // For production with pre-configured key
560
- KeyProvider, // Interface for custom implementations
561
- generateKey,
562
- } from 'rehydra';
563
-
564
- // Development: In-memory key (generates random key, lost on page refresh)
565
- const devKeyProvider = new InMemoryKeyProvider();
566
-
567
- // Production: Pre-configured key
568
- // Generate key: openssl rand -base64 32
569
- const keyBase64 = process.env.PII_ENCRYPTION_KEY; // or read from config
570
- const prodKeyProvider = new ConfigKeyProvider(keyBase64);
571
-
572
- // Custom: Implement KeyProvider interface
573
- class SecureKeyProvider implements KeyProvider {
574
- async getKey(): Promise<Uint8Array> {
575
- // Retrieve from secure storage, HSM, keychain, etc.
576
- return await getKeyFromSecureStorage();
577
- }
578
- }
579
- ```
580
-
581
- ### Security Best Practices
582
-
583
- - **Never log the raw PII map** - Always use encrypted storage
584
- - **Persist the encryption key securely** - Use platform keystores (iOS Keychain, Android Keystore, etc.)
585
- - **Rotate keys** - Implement key rotation for long-running applications
586
- - **Enable leak scanning** - Catch any missed PII in output
587
-
588
- ## PII Map Storage
589
-
590
- For applications that need to persist encrypted PII maps (e.g., chat applications where you need to rehydrate later), use sessions with built-in storage providers.
591
-
592
- ### Storage Providers
593
-
594
- | Provider | Environment | Persistence | Use Case |
595
- |----------|-------------|-------------|----------|
596
- | `InMemoryPIIStorageProvider` | All | None (lost on restart) | Development, testing |
597
- | `SQLitePIIStorageProvider` | Node.js, Bun only* | File-based | Server-side applications |
598
- | `IndexedDBPIIStorageProvider` | Browser | Browser storage | Client-side applications |
599
-
600
- *\*Not available in browser builds. Use `IndexedDBPIIStorageProvider` for browser applications.*
601
-
602
- ### Important: Storage Only Works with Sessions
603
-
604
- > **Note:** The `piiStorageProvider` is only used when you call `anonymizer.session()`.
605
- > Calling `anonymizer.anonymize()` directly does NOT save to storage - the encrypted PII map
606
- > is only returned in the result for you to handle manually.
124
+ const stream = await createAnonymizerStream({
125
+ buffer: { lowLatency: true },
126
+ });
607
127
 
608
- ```typescript
609
- // ❌ Storage NOT used - you must handle the PII map yourself
610
- const result = await anonymizer.anonymize('Hello John!');
611
- // result.piiMap is returned but NOT saved to storage
612
-
613
- // ✅ Storage IS used - auto-saves and auto-loads
614
- const session = anonymizer.session('conversation-123');
615
- const result = await session.anonymize('Hello John!');
616
- // result.piiMap is automatically saved to storage
128
+ llmTokenStream.pipe(stream).on('data', (chunk) => {
129
+ ws.send(chunk.toString());
130
+ });
617
131
  ```
618
132
 
619
- ### Example: Without Storage (Simple One-Off Usage)
620
-
621
- For simple use cases where you don't need persistence:
133
+ ### Stream from a session
622
134
 
623
135
  ```typescript
624
- import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';
625
-
626
- const keyProvider = new InMemoryKeyProvider();
627
- const anonymizer = createAnonymizer({
628
- ner: { mode: 'quantized' },
629
- keyProvider,
630
- });
631
- await anonymizer.initialize();
632
-
633
- // Anonymize
634
- const result = await anonymizer.anonymize('Hello John Smith!');
635
-
636
- // Translate (or other processing)
637
- const translated = await translateAPI(result.anonymizedText);
638
-
639
- // Rehydrate manually using the returned PII map
640
- const key = await keyProvider.getKey();
641
- const piiMap = await decryptPIIMap(result.piiMap, key);
642
- const original = rehydrate(translated, piiMap);
136
+ const session = anonymizer.session('chat-123');
137
+ const stream = await session.createStream();
138
+ input.pipe(stream).pipe(output);
643
139
  ```
644
140
 
645
- ### Example: With Storage (Persistent Sessions)
141
+ ## Sessions
646
142
 
647
- For applications that need to persist PII maps across requests/restarts:
143
+ For multi-message conversations where PII IDs need to stay consistent and PII maps need to persist:
648
144
 
649
145
  ```typescript
650
- import {
146
+ import {
651
147
  createAnonymizer,
652
148
  InMemoryKeyProvider,
653
- SQLitePIIStorageProvider,
149
+ SQLitePIIStorageProvider, // or InMemoryPIIStorageProvider, IndexedDBPIIStorageProvider
654
150
  } from 'rehydra';
655
151
 
656
- // 1. Setup storage (once at app start)
657
- const storage = new SQLitePIIStorageProvider('./pii-maps.db');
658
- await storage.initialize();
659
-
660
- // 2. Create anonymizer with storage and key provider
661
152
  const anonymizer = createAnonymizer({
662
153
  ner: { mode: 'quantized' },
663
154
  keyProvider: new InMemoryKeyProvider(),
664
- piiStorageProvider: storage,
155
+ piiStorageProvider: new SQLitePIIStorageProvider('./pii.db'),
665
156
  });
666
- await anonymizer.initialize();
667
-
668
- // 3. Create a session for each conversation
669
- const session = anonymizer.session('conversation-123');
670
-
671
- // 4. Anonymize - auto-saves to storage
672
- const result = await session.anonymize('Hello John Smith from Acme Corp!');
673
- console.log(result.anonymizedText);
674
- // "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="1"/>!"
675
-
676
- // 5. Later (even after app restart): rehydrate - auto-loads and decrypts
677
- const translated = await translateAPI(result.anonymizedText);
678
- const original = await session.rehydrate(translated);
679
- console.log(original);
680
- // "Hello John Smith from Acme Corp!"
681
-
682
- // 6. Optional: check existence or delete
683
- await session.exists(); // true
684
- await session.delete(); // removes from storage
685
- ```
686
-
687
- ### Example: Multiple Conversations
688
-
689
- Each session ID maps to a separate stored PII map:
690
-
691
- ```typescript
692
- // Different chat sessions
693
- const chat1 = anonymizer.session('user-alice-chat');
694
- const chat2 = anonymizer.session('user-bob-chat');
695
-
696
- await chat1.anonymize('Alice: Contact me at alice@example.com');
697
- await chat2.anonymize('Bob: My number is +49 30 123456');
698
-
699
- // Each session has independent storage
700
- await chat1.rehydrate(translatedText1); // Uses Alice's PII map
701
- await chat2.rehydrate(translatedText2); // Uses Bob's PII map
702
- ```
703
-
704
- ### Multi-Message Conversations
705
157
 
706
- Within a session, entity IDs are consistent across multiple `anonymize()` calls:
707
-
708
- ```typescript
709
158
  const session = anonymizer.session('chat-123');
710
159
 
711
- // Message 1: User provides contact info
712
- const msg1 = await session.anonymize('Contact me at user@example.com');
160
+ // Message 1
161
+ await session.anonymize('Contact me at user@example.com');
713
162
  // → "Contact me at <PII type="EMAIL" id="1"/>"
714
163
 
715
- // Message 2: References same email + new one
716
- const msg2 = await session.anonymize('CC: user@example.com and admin@example.com');
164
+ // Message 2: the same email gets the same ID
165
+ await session.anonymize('CC: user@example.com and admin@example.com');
717
166
  // → "CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>"
718
- // ↑ Same ID (reused) ↑ New ID
719
-
720
- // Message 3: No PII
721
- await session.anonymize('Please translate to German');
722
- // Previous PII preserved
723
167
 
724
- // All messages can be rehydrated correctly
725
- await session.rehydrate(msg1.anonymizedText); // ✓
726
- await session.rehydrate(msg2.anonymizedText); // ✓
168
+ // Rehydrate any message; the PII map is auto-loaded from storage
169
+ const original = await session.rehydrate(translatedText);
727
170
  ```
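
The ID consistency described above can be illustrated with a small, self-contained sketch. This is not rehydra's implementation: `SketchSession`, its email regex, and the single in-memory counter are assumptions made purely for illustration, and nothing here is encrypted or persisted.

```typescript
// Hypothetical sketch: shows why a repeated value keeps its placeholder ID
// within one session. Not the library's internals.
class SketchSession {
  private ids = new Map<string, number>();    // "EMAIL:user@example.com" -> 1
  private values = new Map<string, string>(); // "EMAIL:1" -> "user@example.com"
  private nextId = 1;

  // Return the existing ID for a value, or allocate the next one.
  private idFor(type: string, value: string): number {
    const key = `${type}:${value}`;
    const existing = this.ids.get(key);
    if (existing !== undefined) return existing;
    const id = this.nextId++;
    this.ids.set(key, id);
    this.values.set(`${type}:${id}`, value);
    return id;
  }

  // Replace every email with a placeholder tag (simplified email regex).
  anonymize(text: string): string {
    return text.replace(
      /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
      (email) => `<PII type="EMAIL" id="${this.idFor('EMAIL', email)}"/>`,
    );
  }

  // Swap tags back for their original values; unknown tags are left as-is.
  rehydrate(text: string): string {
    return text.replace(
      /<PII type="(\w+)"[^>]*? id="(\d+)"\/>/g,
      (tag, type, id) => this.values.get(`${type}:${id}`) ?? tag,
    );
  }
}

const sketchSession = new SketchSession();
const firstMessage = sketchSession.anonymize('Contact me at user@example.com');
const secondMessage = sketchSession.anonymize('CC: user@example.com and admin@example.com');
const restoredMessage = sketchSession.rehydrate(secondMessage);
```

In this sketch `secondMessage` reuses ID 1 for the address already seen in `firstMessage` and allocates ID 2 for the new one, so a tag from any earlier message can still be resolved later in the same session.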
728
171
 
729
- This ensures that follow-up messages referencing the same PII produce consistent placeholders, and rehydration works correctly across the entire conversation.
730
-
731
- ### SQLite Provider (Node.js + Bun only)
172
+ ### Storage Providers
732
173
 
733
- The SQLite provider works on both Node.js and Bun with automatic runtime detection.
174
+ | Provider | Environment | Persistence |
175
+ |----------|-------------|-------------|
176
+ | `InMemoryPIIStorageProvider` | All | None (lost on restart) |
177
+ | `SQLitePIIStorageProvider` | Node.js, Bun | File-based (`better-sqlite3` on Node, `bun:sqlite` on Bun) |
178
+ | `IndexedDBPIIStorageProvider` | Browser | Browser storage |
734
179
 
735
- > **Note:** `SQLitePIIStorageProvider` is **not available in browser builds**. When bundling for browser with Vite/webpack, use `IndexedDBPIIStorageProvider` instead. The browser-safe build automatically excludes SQLite to avoid bundling Node.js dependencies.
180
+ ## Supported PII Types
736
181
 
737
- ```typescript
738
- // Node.js / Bun only
739
- import { SQLitePIIStorageProvider } from 'rehydra';
740
- // Or explicitly: import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';
182
+ | Type | Detection | Notes |
183
+ |------|-----------|-------|
184
+ | `PERSON` | NER | Names, with optional `gender` attribute |
185
+ | `ORG` | NER | Organization names |
186
+ | `LOCATION` | NER | Places, with optional `scope` attribute (city/country/region) |
187
+ | `ADDRESS` | NER | Physical addresses |
188
+ | `DATE_OF_BIRTH` | NER | Dates of birth |
189
+ | `EMAIL` | Regex | Email addresses |
190
+ | `PHONE` | Regex | International phone numbers |
191
+ | `IBAN` | Regex + checksum | International Bank Account Numbers |
192
+ | `BIC_SWIFT` | Regex | Bank Identifier Codes |
193
+ | `CREDIT_CARD` | Regex + Luhn | Credit card numbers |
194
+ | `IP_ADDRESS` | Regex | IPv4 and IPv6 |
195
+ | `URL` | Regex | Web URLs |
196
+ | `CASE_ID` | Regex | Configurable case/ticket patterns |
197
+ | `CUSTOMER_ID` | Regex | Configurable customer ID patterns |
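
The "Regex + checksum" and "Regex + Luhn" entries in the table mean that a regex match is only accepted if it also passes a checksum, which cuts down on false positives from arbitrary digit runs. The two standard checks can be sketched as below; the function names and length limits are illustrative assumptions, not the library's own code.

```typescript
// Luhn check: starting from the rightmost digit, double every second digit,
// subtract 9 from results above 9, and require the total to be divisible by 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false; // assumed length range
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// IBAN check: move the first four characters to the end, map letters to
// 10..35, and require the resulting number mod 97 to equal 1.
function passesIbanChecksum(candidate: string): boolean {
  const iban = candidate.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const ch = rearranged[i];
    const value = ch >= 'A' ? ch.charCodeAt(0) - 55 : Number(ch);
    // Letters contribute two decimal digits, digits contribute one.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}

const cardLooksValid = passesLuhn('4111 1111 1111 1111');
const ibanLooksValid = passesIbanChecksum('GB82 WEST 1234 5698 7654 32');
```

Computing the remainder incrementally keeps the IBAN check within ordinary number precision, so no big-integer arithmetic is needed.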
741
198
 
742
- // File-based database
743
- const storage = new SQLitePIIStorageProvider('./data/pii-maps.db');
744
- await storage.initialize();
199
+ ## Configuration
745
200
 
746
- // Or in-memory for testing
747
- const testStorage = new SQLitePIIStorageProvider(':memory:');
748
- await testStorage.initialize();
749
- ```
201
+ ### NER Modes
750
202
 
751
- **Dependencies:**
752
- - **Bun**: Uses built-in `bun:sqlite` (no additional install needed)
753
- - **Node.js**: Requires `better-sqlite3`:
203
+ | Mode | Size | Description |
204
+ |------|------|-------------|
205
+ | `'disabled'` | 0 | Regex only — no model download |
206
+ | `'quantized'` | ~280 MB | Recommended — good accuracy, smaller download |
207
+ | `'standard'` | ~1.1 GB | Best accuracy |
208
+ | `'custom'` | Varies | Bring your own ONNX model |
754
209
 
755
- ```bash
756
- npm install better-sqlite3
757
- ```
210
+ ### Semantic Enrichment
758
211
 
759
- ### IndexedDB Provider (Browser)
212
+ Adds gender/scope attributes for better machine translation:
760
213
 
761
214
  ```typescript
762
- import {
763
- createAnonymizer,
764
- InMemoryKeyProvider,
765
- IndexedDBPIIStorageProvider,
766
- } from 'rehydra';
767
-
768
- // Custom database name (defaults to 'rehydra-pii-storage')
769
- const storage = new IndexedDBPIIStorageProvider('my-app-pii');
770
-
771
215
  const anonymizer = createAnonymizer({
772
216
  ner: { mode: 'quantized' },
773
- keyProvider: new InMemoryKeyProvider(),
774
- piiStorageProvider: storage,
217
+ semantic: { enabled: true }, // Downloads ~12 MB of name/location data
775
218
  });
776
- await anonymizer.initialize();
777
219
 
778
- // Use sessions as usual
779
- const session = anonymizer.session('browser-chat-123');
780
- const result = await session.anonymize('Hello John!');
781
- const original = await session.rehydrate(result.anonymizedText);
220
+ // "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
782
221
  ```
783
222
 
784
- ### Session Interface
785
-
786
- The session object provides these methods:
223
+ ### Anonymization Policy
787
224
 
788
225
  ```typescript
789
- interface AnonymizerSession {
790
- readonly sessionId: string;
791
- anonymize(text: string, locale?: string, policy?: Partial<AnonymizationPolicy>): Promise<AnonymizationResult>;
792
- rehydrate(text: string): Promise<string>;
793
- load(): Promise<StoredPIIMap | null>;
794
- delete(): Promise<boolean>;
795
- exists(): Promise<boolean>;
796
- }
226
+ const anonymizer = createAnonymizer({
227
+ ner: { mode: 'quantized' },
228
+ defaultPolicy: {
229
+ enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
230
+ confidenceThresholds: new Map([[PIIType.PERSON, 0.8]]),
231
+ allowlistTerms: new Set(['Customer Service']),
232
+ enableLeakScan: true,
233
+ },
234
+ });
797
235
  ```
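
To make the policy fields above concrete, here is a rough sketch of how such a policy might be applied to a list of detected entities. The types, the `applyPolicy` helper, and the default threshold of 0.5 are all hypothetical; the README does not document how the library evaluates these options internally.

```typescript
// Hypothetical shapes for illustration only.
type SketchType = 'EMAIL' | 'PHONE' | 'PERSON' | 'ORG';

interface SketchEntity {
  type: SketchType;
  text: string;
  confidence: number; // 0..1, as reported by a detector
}

interface SketchPolicy {
  enabledTypes: Set<SketchType>;
  confidenceThresholds: Map<SketchType, number>;
  allowlistTerms: Set<string>;
}

// Keep an entity only if its type is enabled, its text is not allowlisted,
// and its confidence meets the per-type threshold (or an assumed default).
function applyPolicy(
  entities: SketchEntity[],
  policy: SketchPolicy,
  defaultThreshold = 0.5,
): SketchEntity[] {
  return entities.filter((entity) => {
    if (!policy.enabledTypes.has(entity.type)) return false;
    if (policy.allowlistTerms.has(entity.text)) return false;
    const threshold = policy.confidenceThresholds.get(entity.type) ?? defaultThreshold;
    return entity.confidence >= threshold;
  });
}

const sketchPolicy: SketchPolicy = {
  enabledTypes: new Set<SketchType>(['EMAIL', 'PHONE', 'PERSON']),
  confidenceThresholds: new Map<SketchType, number>([['PERSON', 0.8]]),
  allowlistTerms: new Set(['Customer Service']),
};

const keptEntities = applyPolicy(
  [
    { type: 'PERSON', text: 'John Smith', confidence: 0.92 },
    { type: 'PERSON', text: 'Customer Service', confidence: 0.85 }, // allowlisted
    { type: 'PERSON', text: 'Mark', confidence: 0.6 },              // below 0.8
    { type: 'ORG', text: 'Acme Corp', confidence: 0.99 },           // type not enabled
    { type: 'EMAIL', text: 'a@example.com', confidence: 1 },
  ],
  sketchPolicy,
);
```

Under these assumptions only the `John Smith` and `a@example.com` entities survive the filter.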
798
236
 
799
- ### Data Retention
800
-
801
- **Entries persist forever by default.** Use `cleanup()` on the storage provider to remove old entries:
237
+ ### Anonymization Modes
802
238
 
803
239
  ```typescript
804
- // Delete entries older than 7 days
805
- const count = await storage.cleanup(new Date(Date.now() - 7 * 24 * 60 * 60 * 1000));
240
+ // Pseudonymize (default): reversible, returns encrypted PII map
241
+ const anonymizer = createAnonymizer({ mode: 'pseudonymize' });
806
242
 
807
- // Or delete specific sessions
808
- await session.delete();
809
-
810
- // List all stored sessions
811
- const sessionIds = await storage.list();
812
- ```
813
-
814
- ## Browser Usage
815
-
816
- The library works seamlessly in browsers without any special configuration.
817
-
818
- ### Basic Browser Example
819
-
820
- ```html
821
- <!DOCTYPE html>
822
- <html>
823
- <head>
824
- <title>PII Anonymization</title>
825
- </head>
826
- <body>
827
- <script type="module">
828
- import {
829
- createAnonymizer,
830
- InMemoryKeyProvider,
831
- decryptPIIMap,
832
- rehydrate
833
- } from './node_modules/rehydra/dist/index.js';
834
-
835
- async function demo() {
836
- // Create anonymizer
837
- const keyProvider = new InMemoryKeyProvider();
838
- const anonymizer = createAnonymizer({
839
- ner: {
840
- mode: 'quantized',
841
- onStatus: (s) => console.log('NER:', s),
842
- onDownloadProgress: (p) => console.log(`Download: ${p.percent}%`)
843
- },
844
- semantic: { enabled: true },
845
- keyProvider
846
- });
847
-
848
- // Initialize (downloads models on first use)
849
- await anonymizer.initialize();
850
-
851
- // Anonymize
852
- const result = await anonymizer.anonymize(
853
- 'Contact Maria Schmidt at maria@example.com in Berlin.'
854
- );
855
-
856
- console.log('Anonymized:', result.anonymizedText);
857
- // "Contact <PII type="PERSON" gender="female" id="1"/> at <PII type="EMAIL" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>."
858
-
859
- // Rehydrate
860
- const key = await keyProvider.getKey();
861
- const piiMap = await decryptPIIMap(result.piiMap, key);
862
- const original = rehydrate(result.anonymizedText, piiMap);
863
-
864
- console.log('Rehydrated:', original);
865
-
866
- await anonymizer.dispose();
867
- }
868
-
869
- demo().catch(console.error);
870
- </script>
871
- </body>
872
- </html>
243
+ // Anonymize: irreversible, no PII map returned
244
+ const anonymizer = createAnonymizer({ mode: 'anonymize' });
873
245
  ```
874
246
 
875
- ### Browser Notes
876
-
877
- - **First-use downloads**: NER model (~280 MB) and semantic data (~12 MB) are downloaded on first use
878
- - **ONNX runtime**: Automatically loaded from CDN if not bundled
879
- - **Offline support**: After initial download, everything works offline
880
- - **Storage**: Uses IndexedDB and OPFS - data persists across sessions
881
-
882
- ### Bundler Support (Vite, webpack, esbuild)
883
-
884
- The package uses [conditional exports](https://nodejs.org/api/packages.html#conditional-exports) to automatically provide a browser-safe build when bundling for the web. This means:
885
-
886
- - **Automatic**: Vite, webpack, esbuild, and other modern bundlers will automatically use `dist/browser.js`
887
- - **No Node.js modules**: The browser build excludes `SQLitePIIStorageProvider` and other Node.js-specific code
888
- - **Tree-shakable**: Only the code you use is included in your bundle
889
-
890
- ```json
891
- // package.json exports (simplified)
892
- {
893
- "exports": {
894
- ".": {
895
- "browser": "./dist/browser.js",
896
- "node": "./dist/index.js",
897
- "default": "./dist/index.js"
898
- }
899
- }
900
- }
901
- ```
902
-
903
- **Explicit imports** (if needed):
247
+ ### Custom Recognizers
904
248
 
905
249
  ```typescript
906
- // Browser-only build (excludes SQLite, Node.js fs, etc.)
907
- import { createAnonymizer } from 'rehydra/browser';
250
+ import { createCustomIdRecognizer, PIIType } from 'rehydra';
908
251
 
909
- // Node.js build (includes everything)
910
- import { createAnonymizer, SQLitePIIStorageProvider } from 'rehydra/node';
252
+ const recognizer = createCustomIdRecognizer([{
253
+ name: 'Order Number',
254
+ pattern: /\bORD-[A-Z0-9]{8}\b/g,
255
+ type: PIIType.CASE_ID,
256
+ }]);
911
257
 
912
- // SQLite storage only (Node.js only)
913
- import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';
258
+ anonymizer.getRegistry().register(recognizer);
914
259
  ```
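
It can help to check a pattern against sample text before registering it. The sketch below uses only standard `RegExp` behaviour to list what the order-number pattern above would match; `findMatches` and `SketchMatch` are hypothetical helpers for illustration and are not part of rehydra.

```typescript
interface SketchMatch {
  type: string;
  text: string;
  start: number; // inclusive offset into the input
  end: number;   // exclusive offset into the input
}

// Collect every match of `pattern` in `text`, with character offsets.
function findMatches(text: string, pattern: RegExp, type: string): SketchMatch[] {
  // Work on a copy with the global flag so the caller's lastIndex is never mutated.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const matches: SketchMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // avoid an infinite loop on zero-length matches
      continue;
    }
    matches.push({ type, text: m[0], start: m.index, end: m.index + m[0].length });
  }
  return matches;
}

const orderMatches = findMatches(
  'Refund ORD-12AB34CD, not ord-12ab34cd or ORD-123.',
  /\bORD-[A-Z0-9]{8}\b/g,
  'CASE_ID',
);
```

Here only `ORD-12AB34CD` matches: the lowercase variant fails because the pattern is case-sensitive, and `ORD-123` is too short for the `{8}` quantifier.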
915
260
 
916
- **Browser build excludes:**
917
- - `SQLitePIIStorageProvider` (use `IndexedDBPIIStorageProvider` instead)
918
- - Node.js `fs`, `path`, `os` modules
919
-
920
- **Browser build includes:**
921
- - All recognizers (email, phone, IBAN, etc.)
922
- - NER model support (with `onnxruntime-web`)
923
- - Semantic enrichment
924
- - `InMemoryPIIStorageProvider`
925
- - `IndexedDBPIIStorageProvider`
926
- - All crypto utilities
261
+ ### GPU Acceleration
927
262
 
928
- ## Bun Support
263
+ For high-throughput batch processing, use a remote inference server with a GPU:
929
264
 
930
- This library works with [Bun](https://bun.sh). Since `onnxruntime-node` is a native Node.js addon, Bun uses `onnxruntime-web`:
931
-
932
- ```bash
933
- bun add rehydra onnxruntime-web
265
+ ```typescript
266
+ const anonymizer = createAnonymizer({
267
+ ner: {
268
+ backend: 'inference-server',
269
+ inferenceServerUrl: 'http://localhost:8080',
270
+ },
271
+ });
934
272
  ```
935
273
 
936
- Usage is identical - the library auto-detects the runtime.
937
-
938
- ## Performance
939
-
940
- Benchmarks on Apple M-series (CPU) and NVIDIA T4 (GPU). Run `npm run benchmark:compare` to measure on your hardware.
274
+ ## Encryption
941
275
 
942
- ### Backend Comparison
276
+ PII maps are encrypted with **AES-256-GCM** via the Web Crypto API.
943
277
 
944
- | Backend | Short (~40 chars) | Medium (~500 chars) | Long (~2K chars) | Entity-dense |
945
- |---------|-------------------|---------------------|------------------|--------------|
946
- | **Regex-only** | 0.38 ms | 0.50 ms | 0.91 ms | 0.35 ms |
947
- | **NER CPU** | 4.3 ms | 26 ms | 93 ms | 13 ms |
948
- | **NER GPU** | 62 ms | 73 ms | 117 ms | 68 ms |
949
-
950
- Local CPU inference is faster than GPU for typical workloads due to network overhead. GPU servers are beneficial for high-throughput batch processing where many requests can be parallelized.
951
-
952
- ### Throughput (ops/sec)
953
-
954
- | Backend | Short | Medium | Long |
955
- |---------|-------|--------|------|
956
- | **Regex-only** | ~2,640 | ~2,017 | ~1,096 |
957
- | **NER CPU** | ~234 | ~38 | ~11 |
958
- | **NER GPU** | ~16 | ~14 | ~9 |
959
-
960
- ### Model Downloads
961
-
962
- | Model | Size | First-Use Download |
963
- |-------|------|-------------------|
964
- | Quantized NER | ~265 MB | ~30s on fast connection |
965
- | Standard NER | ~1.1 GB | ~2min on fast connection |
966
- | Semantic Data | ~12 MB | ~5s on fast connection |
967
-
968
- ### Recommendations
969
-
970
- | Use Case | Recommended Backend |
971
- |----------|---------------------|
972
- | Structured PII only (email, phone, IBAN) | Regex-only |
973
- | General use with name/org/location detection | **NER CPU (default)** |
974
- | High-throughput batch processing (1000s of docs) | NER GPU |
975
- | Privacy-sensitive / zero-knowledge required | NER CPU (data never leaves device) |
278
+ ```typescript
279
+ // Development: random key, lost on restart
280
+ const keyProvider = new InMemoryKeyProvider();
976
281
 
977
- > **Note:** Local CPU inference now outperforms GPU for most use cases due to network overhead elimination. The trie-based tokenizer provides O(token_length) lookups instead of O(vocab_size), making local inference practical for production use.
282
+ // Production: persistent key (generate with: openssl rand -base64 32)
283
+ const keyProvider = new ConfigKeyProvider(process.env.PII_ENCRYPTION_KEY!);
284
+ ```
978
285
 
979
- ## Requirements
286
+ ## Platform Support
980
287
 
981
288
  | Environment | Version | Notes |
982
289
  |-------------|---------|-------|
983
290
  | Node.js | >= 18.0.0 | Uses native `onnxruntime-node` |
984
- | Bun | >= 1.0.0 | Requires `onnxruntime-web` |
985
- | Browsers | Chrome 86+, Firefox 89+, Safari 15.4+, Edge 86+ | Uses OPFS for model storage |
291
+ | Bun | >= 1.0.0 | Install `onnxruntime-web`: `bun add rehydra onnxruntime-web` |
292
+ | Browsers | Chrome 86+, Firefox 89+, Safari 15.4+ | Uses OPFS for model storage |
986
293
 
987
- ## Development
988
-
989
- ```bash
990
- # Install dependencies
991
- npm install
992
-
993
- # Run tests
994
- npm test
995
-
996
- # Build
997
- npm run build
294
+ The browser build (`rehydra/browser`) automatically excludes Node.js dependencies. Modern bundlers (Vite, webpack, esbuild) select the right entry point via conditional exports.
998
295
 
999
- # Lint
1000
- npm run lint
1001
- ```
1002
-
1003
- ### Building Custom Models
1004
-
1005
- For development or custom models:
296
+ ## Development
1006
297
 
1007
298
  ```bash
1008
- # Requires Python 3.8+
1009
- npm run setup:ner # Standard model
1010
- npm run setup:ner:quantized # Quantized model
299
+ npm install # Install dependencies
300
+ npm run build # Compile TypeScript
301
+ npm test # Run tests (watch mode)
302
+ npm run test:run # Run tests once
303
+ npm run lint # ESLint
304
+ npm run setup:ner # Pre-download NER model (~280 MB)
305
+ npm run benchmark # Run benchmarks
306
+
307
+ # Integration tests (the proxy tests require API keys)
308
+ npm run test:streaming # No API key needed
309
+ OPENAI_API_KEY=... npm run test:proxy:openai -- --ner # OpenAI with NER
310
+ ANTHROPIC_API_KEY=... npm run test:proxy:anthropic -- --ner # Anthropic with NER
1011
311
  ```
1012
312
 
1013
313
  ## License