rehydra 0.3.3 → 0.4.0
- package/README.md +173 -873
- package/dist/core/anonymizer.d.ts +9 -1
- package/dist/core/anonymizer.d.ts.map +1 -1
- package/dist/core/anonymizer.js +29 -7
- package/dist/core/anonymizer.js.map +1 -1
- package/dist/index.d.ts +2 -0
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +4 -0
- package/dist/index.js.map +1 -1
- package/dist/proxy/index.d.ts +12 -0
- package/dist/proxy/index.d.ts.map +1 -0
- package/dist/proxy/index.js +11 -0
- package/dist/proxy/index.js.map +1 -0
- package/dist/proxy/providers/anthropic.d.ts +17 -0
- package/dist/proxy/providers/anthropic.d.ts.map +1 -0
- package/dist/proxy/providers/anthropic.js +117 -0
- package/dist/proxy/providers/anthropic.js.map +1 -0
- package/dist/proxy/providers/index.d.ts +19 -0
- package/dist/proxy/providers/index.d.ts.map +1 -0
- package/dist/proxy/providers/index.js +40 -0
- package/dist/proxy/providers/index.js.map +1 -0
- package/dist/proxy/providers/openai.d.ts +17 -0
- package/dist/proxy/providers/openai.d.ts.map +1 -0
- package/dist/proxy/providers/openai.js +92 -0
- package/dist/proxy/providers/openai.js.map +1 -0
- package/dist/proxy/providers/types.d.ts +29 -0
- package/dist/proxy/providers/types.d.ts.map +1 -0
- package/dist/proxy/providers/types.js +6 -0
- package/dist/proxy/providers/types.js.map +1 -0
- package/dist/proxy/proxy-server.d.ts +53 -0
- package/dist/proxy/proxy-server.d.ts.map +1 -0
- package/dist/proxy/proxy-server.js +146 -0
- package/dist/proxy/proxy-server.js.map +1 -0
- package/dist/proxy/rehydra-fetch.d.ts +35 -0
- package/dist/proxy/rehydra-fetch.d.ts.map +1 -0
- package/dist/proxy/rehydra-fetch.js +217 -0
- package/dist/proxy/rehydra-fetch.js.map +1 -0
- package/dist/proxy/rehydra-proxy.d.ts +40 -0
- package/dist/proxy/rehydra-proxy.d.ts.map +1 -0
- package/dist/proxy/rehydra-proxy.js +82 -0
- package/dist/proxy/rehydra-proxy.js.map +1 -0
- package/dist/proxy/sse-parser.d.ts +59 -0
- package/dist/proxy/sse-parser.d.ts.map +1 -0
- package/dist/proxy/sse-parser.js +112 -0
- package/dist/proxy/sse-parser.js.map +1 -0
- package/dist/proxy/types.d.ts +49 -0
- package/dist/proxy/types.d.ts.map +1 -0
- package/dist/proxy/types.js +5 -0
- package/dist/proxy/types.js.map +1 -0
- package/dist/proxy/wrap-client.d.ts +47 -0
- package/dist/proxy/wrap-client.d.ts.map +1 -0
- package/dist/proxy/wrap-client.js +70 -0
- package/dist/proxy/wrap-client.js.map +1 -0
- package/dist/storage/session.d.ts +3 -0
- package/dist/storage/session.d.ts.map +1 -1
- package/dist/storage/session.js +24 -1
- package/dist/storage/session.js.map +1 -1
- package/dist/storage/types.d.ts +16 -0
- package/dist/storage/types.d.ts.map +1 -1
- package/dist/streaming/anonymizer-stream.d.ts +63 -0
- package/dist/streaming/anonymizer-stream.d.ts.map +1 -0
- package/dist/streaming/anonymizer-stream.js +184 -0
- package/dist/streaming/anonymizer-stream.js.map +1 -0
- package/dist/streaming/index.d.ts +9 -0
- package/dist/streaming/index.d.ts.map +1 -0
- package/dist/streaming/index.js +8 -0
- package/dist/streaming/index.js.map +1 -0
- package/dist/streaming/sentence-buffer.d.ts +78 -0
- package/dist/streaming/sentence-buffer.d.ts.map +1 -0
- package/dist/streaming/sentence-buffer.js +238 -0
- package/dist/streaming/sentence-buffer.js.map +1 -0
- package/dist/streaming/stream-factory.d.ts +38 -0
- package/dist/streaming/stream-factory.d.ts.map +1 -0
- package/dist/streaming/stream-factory.js +69 -0
- package/dist/streaming/stream-factory.js.map +1 -0
- package/dist/streaming/types.d.ts +121 -0
- package/dist/streaming/types.d.ts.map +1 -0
- package/dist/streaming/types.js +5 -0
- package/dist/streaming/types.js.map +1 -0
- package/dist/types/index.d.ts +8 -2
- package/dist/types/index.d.ts.map +1 -1
- package/dist/types/index.js.map +1 -1
- package/package.json +19 -2
package/README.md
CHANGED
|
@@ -1,1013 +1,313 @@
|
|
|
1
1
|
# Rehydra
|
|
2
2
|
|
|
3
|
+

|
|
4
|
+
|
|
3
5
|

|
|
4
6
|

|
|
5
7
|
[](https://codecov.io/github/rehydra-ai/rehydra)
|
|
6
8
|
|
|
7
|
-
On-device PII anonymization
|
|
8
|
-
|
|
9
|
-
```bash
|
|
10
|
-
npm install rehydra
|
|
11
|
-
```
|
|
12
|
-
|
|
13
|
-
**Works in Node.js, Bun, and browsers**
|
|
14
|
-
|
|
15
|
-
## Features
|
|
16
|
-
|
|
17
|
-
- **Structured PII Detection**: Regex-based detection for emails, phones, IBANs, credit cards, IPs, URLs
|
|
18
|
-
- **Soft PII Detection**: ONNX-powered NER model for names, organizations, locations (auto-downloads on first use if enabled)
|
|
19
|
-
- **Semantic Enrichment**: AI/MT-friendly tags with gender/location attributes for better translations
|
|
20
|
-
- **Secure PII Mapping**: AES-256-GCM encrypted storage of original PII values
|
|
21
|
-
- **Cross-Platform**: Works identically in Node.js, Bun, and browsers
|
|
22
|
-
- **Configurable Policies**: Customizable detection rules, thresholds, and allowlists
|
|
23
|
-
- **Validation & Leak Scanning**: Built-in validation and optional leak detection
|
|
24
|
-
|
|
25
|
-
## Installation
|
|
26
|
-
|
|
27
|
-
### Node.js / Bun
|
|
9
|
+
On-device PII anonymization for AI workflows. Detects names, emails, phones, IBANs, and more — replaces them with encrypted placeholder tags — and rehydrates them back after processing.
|
|
28
10
|
|
|
29
11
|
```bash
|
|
30
12
|
npm install rehydra
|
|
31
13
|
```
|
|
32
14
|
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
```bash
|
|
36
|
-
npm install rehydra onnxruntime-web
|
|
37
|
-
```
|
|
38
|
-
|
|
39
|
-
When using Vite, webpack, or other bundlers, the browser-safe entry point is automatically selected via [conditional exports](https://nodejs.org/api/packages.html#conditional-exports). This entry point excludes Node.js-specific modules like SQLite storage.
|
|
40
|
-
|
|
41
|
-
### Browser (without bundler)
|
|
42
|
-
|
|
43
|
-
```html
|
|
44
|
-
<script type="module">
|
|
45
|
-
// Import directly from your dist folder or CDN
|
|
46
|
-
import { createAnonymizer } from './node_modules/rehydra/dist/index.js';
|
|
47
|
-
|
|
48
|
-
// onnxruntime-web is automatically loaded from CDN when needed
|
|
49
|
-
</script>
|
|
50
|
-
```
|
|
15
|
+
**Works in Node.js, Bun, and browsers.** No data leaves your machine.
|
|
51
16
|
|
|
52
17
|
## Quick Start
|
|
53
18
|
|
|
54
|
-
### Regex-Only Mode (No Downloads Required)
|
|
55
|
-
|
|
56
|
-
For structured PII like emails, phones, IBANs, credit cards:
|
|
57
|
-
|
|
58
|
-
```typescript
|
|
59
|
-
import { anonymizeRegexOnly } from 'rehydra';
|
|
60
|
-
|
|
61
|
-
const result = await anonymizeRegexOnly(
|
|
62
|
-
'Contact john@example.com or call +49 30 123456. IBAN: DE89370400440532013000'
|
|
63
|
-
);
|
|
64
|
-
|
|
65
|
-
console.log(result.anonymizedText);
|
|
66
|
-
// "Contact <PII type="EMAIL" id="1"/> or call <PII type="PHONE" id="2"/>. IBAN: <PII type="IBAN" id="3"/>"
|
|
67
|
-
```
|
|
68
|
-
|
|
69
|
-
### Full Mode with NER (Detects Names, Organizations, Locations)
|
|
70
|
-
|
|
71
|
-
The NER model is automatically downloaded on first use (~280 MB for quantized):
|
|
72
|
-
|
|
73
|
-
```typescript
|
|
74
|
-
import { createAnonymizer } from 'rehydra';
|
|
75
|
-
|
|
76
|
-
const anonymizer = createAnonymizer({
|
|
77
|
-
ner: {
|
|
78
|
-
mode: 'quantized', // or 'standard' for full model (~1.1 GB)
|
|
79
|
-
onStatus: (status) => console.log(status),
|
|
80
|
-
}
|
|
81
|
-
});
|
|
82
|
-
|
|
83
|
-
await anonymizer.initialize(); // Downloads model if needed
|
|
84
|
-
|
|
85
|
-
const result = await anonymizer.anonymize(
|
|
86
|
-
'Hello John Smith from Acme Corp in Berlin!'
|
|
87
|
-
);
|
|
88
|
-
|
|
89
|
-
console.log(result.anonymizedText);
|
|
90
|
-
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
|
|
91
|
-
|
|
92
|
-
// Clean up when done
|
|
93
|
-
await anonymizer.dispose();
|
|
94
|
-
```
|
|
95
|
-
|
|
96
|
-
### With Semantic Enrichment
|
|
97
|
-
|
|
98
|
-
Add gender and location scope for better machine translation:
|
|
99
|
-
|
|
100
19
|
```typescript
|
|
101
|
-
import { createAnonymizer } from 'rehydra';
|
|
20
|
+
import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';
|
|
102
21
|
|
|
22
|
+
const keyProvider = new InMemoryKeyProvider();
|
|
103
23
|
const anonymizer = createAnonymizer({
|
|
104
|
-
ner: { mode: 'quantized' },
|
|
105
|
-
|
|
106
|
-
enabled: true, // Downloads ~12 MB of semantic data on first use
|
|
107
|
-
onStatus: (status) => console.log(status),
|
|
108
|
-
}
|
|
24
|
+
ner: { mode: 'quantized' }, // ~280 MB model, auto-downloads on first use
|
|
25
|
+
keyProvider,
|
|
109
26
|
});
|
|
110
27
|
|
|
111
|
-
await anonymizer.initialize();
|
|
112
|
-
|
|
113
28
|
const result = await anonymizer.anonymize(
|
|
114
|
-
'
|
|
29
|
+
'Email john.smith@acme-corp.com or call John at +41 79 123 45 67'
|
|
115
30
|
);
|
|
116
31
|
|
|
117
32
|
console.log(result.anonymizedText);
|
|
118
|
-
// "
|
|
119
|
-
```
|
|
120
|
-
|
|
121
|
-
## Example: Translation Workflow (Anonymize → Translate → Rehydrate)
|
|
122
|
-
|
|
123
|
-
The full workflow for privacy-preserving translation:
|
|
124
|
-
|
|
125
|
-
```typescript
|
|
126
|
-
import {
|
|
127
|
-
createAnonymizer,
|
|
128
|
-
decryptPIIMap,
|
|
129
|
-
rehydrate,
|
|
130
|
-
InMemoryKeyProvider
|
|
131
|
-
} from 'rehydra';
|
|
33
|
+
// "Email <PII type="EMAIL" id="1"/> or call <PII type="PERSON" id="2"/> at <PII type="PHONE" id="3"/>"
|
|
132
34
|
|
|
133
|
-
//
|
|
134
|
-
const
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
ner: { mode: 'quantized' },
|
|
139
|
-
keyProvider: keyProvider
|
|
140
|
-
});
|
|
141
|
-
|
|
142
|
-
await anonymizer.initialize();
|
|
143
|
-
|
|
144
|
-
// 3. Anonymize before translation
|
|
145
|
-
const original = 'Hello John Smith from Acme Corp in Berlin!';
|
|
146
|
-
const result = await anonymizer.anonymize(original);
|
|
147
|
-
|
|
148
|
-
console.log(result.anonymizedText);
|
|
149
|
-
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
|
|
150
|
-
|
|
151
|
-
// 4. Translate (or do other AI workloads that preserve placeholders)
|
|
152
|
-
const translated = await yourAIWorkflow(result.anonymizedText, { from: 'en', to: 'de' });
|
|
153
|
-
// "Hallo <PII type="PERSON" id="1"/> von <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
|
|
154
|
-
|
|
155
|
-
// 5. Decrypt the PII map using the same key
|
|
156
|
-
const encryptionKey = await keyProvider.getKey();
|
|
157
|
-
const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
|
|
158
|
-
|
|
159
|
-
// 6. Rehydrate - replace placeholders with original values
|
|
160
|
-
const rehydrated = rehydrate(translated, piiMap);
|
|
161
|
-
|
|
162
|
-
console.log(rehydrated);
|
|
163
|
-
// "Hallo John Smith von Acme Corp in Berlin!"
|
|
35
|
+
// Rehydrate after translation or other processing
|
|
36
|
+
const key = await keyProvider.getKey();
|
|
37
|
+
const piiMap = await decryptPIIMap(result.piiMap!, key);
|
|
38
|
+
const original = rehydrate(result.anonymizedText, piiMap);
|
|
39
|
+
// "Email john.smith@acme-corp.com or call John at +41 79 123 45 67"
|
|
164
40
|
|
|
165
|
-
// 7. Clean up
|
|
166
41
|
await anonymizer.dispose();
|
|
167
42
|
```
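The rehydration step can be pictured as a tag-for-value substitution. The sketch below is an illustration only, not rehydra's implementation: `rehydrateSketch`, `PII_TAG`, and `demoMap` are hypothetical names, and the `"TYPE:id"` key format is taken from the 0.3.3 README's description of the decrypted map (`"PERSON:1"` → `"John Smith"`). It also assumes tags carry only `type` and `id`; tags with extra semantic attributes would need a more tolerant pattern.

```typescript
// Hypothetical sketch: not rehydra's implementation.
// Assumes a decrypted PII map keyed by "TYPE:id" (for example "PERSON:2").
const PII_TAG = /<PII\s+type="([A-Z_]+)"\s+id="(\d+)"\s*\/>/g;

function rehydrateSketch(text: string, piiMap: Map<string, string>): string {
  // Swap each placeholder tag for its stored value; keep unknown tags as-is.
  return text.replace(PII_TAG, (tag: string, type: string, id: string) => {
    const stored = piiMap.get(`${type}:${id}`);
    return stored !== undefined ? stored : tag;
  });
}

const demoMap = new Map<string, string>([
  ['EMAIL:1', 'john.smith@acme-corp.com'],
  ['PERSON:2', 'John'],
  ['PHONE:3', '+41 79 123 45 67'],
]);

const anonymized =
  'Email <PII type="EMAIL" id="1"/> or call <PII type="PERSON" id="2"/> at <PII type="PHONE" id="3"/>';

const restored = rehydrateSketch(anonymized, demoMap);
console.log(restored);
// "Email john.smith@acme-corp.com or call John at +41 79 123 45 67"
```

Leaving unknown tags untouched is a design choice of this sketch: a tag the map cannot resolve stays visible in the output instead of silently disappearing.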
|
|
168
43
|
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
- **Save the encryption key** - You need the same key to decrypt the PII map
|
|
172
|
-
- **Placeholders are XML-like** - Most translation services preserve them automatically
|
|
173
|
-
- **PII stays local** - Original values never leave your system during translation
|
|
44
|
+
## LLM Proxy
|
|
174
45
|
|
|
175
|
-
|
|
46
|
+
Drop-in middleware that anonymizes prompts before they leave your machine and rehydrates responses. Works with OpenAI, Anthropic, and any OpenAI-compatible API.
|
|
176
47
|
|
|
177
|
-
###
|
|
48
|
+
### Wrap any fetch-based client
|
|
178
49
|
|
|
179
50
|
```typescript
|
|
180
|
-
import
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
onDownloadProgress: (progress) => {
|
|
190
|
-
console.log(`${progress.file}: ${progress.percent}%`);
|
|
191
|
-
},
|
|
192
|
-
|
|
193
|
-
// For 'inference-server' backend:
|
|
194
|
-
inferenceServerUrl: 'http://localhost:8080',
|
|
195
|
-
|
|
196
|
-
// For 'custom' mode only:
|
|
197
|
-
modelPath: './my-model.onnx',
|
|
198
|
-
vocabPath: './vocab.txt',
|
|
199
|
-
},
|
|
200
|
-
|
|
201
|
-
// Semantic enrichment (adds gender/scope attributes)
|
|
202
|
-
semantic: {
|
|
203
|
-
enabled: true, // Enable MT-friendly attributes
|
|
204
|
-
autoDownload: true, // Auto-download semantic data (~12 MB)
|
|
205
|
-
onStatus: (status) => {},
|
|
206
|
-
onDownloadProgress: (progress) => {},
|
|
207
|
-
},
|
|
208
|
-
|
|
209
|
-
// Encryption key provider
|
|
210
|
-
keyProvider: new InMemoryKeyProvider(),
|
|
211
|
-
|
|
212
|
-
// Custom policy (optional)
|
|
213
|
-
defaultPolicy: { /* see Policy section */ },
|
|
51
|
+
import OpenAI from 'openai';
|
|
52
|
+
import { createRehydraFetch, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
|
|
53
|
+
|
|
54
|
+
const openai = new OpenAI({
|
|
55
|
+
fetch: createRehydraFetch({
|
|
56
|
+
anonymizer: { ner: { mode: 'quantized' } },
|
|
57
|
+
keyProvider: new InMemoryKeyProvider(),
|
|
58
|
+
piiStorageProvider: new InMemoryPIIStorageProvider(),
|
|
59
|
+
}),
|
|
214
60
|
});
|
|
215
61
|
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
220
|
-
|
|
221
|
-
| Mode | Description | Size | Auto-Download |
|
|
222
|
-
|------|-------------|------|---------------|
|
|
223
|
-
| `'disabled'` | No NER, regex only | 0 | N/A |
|
|
224
|
-
| `'quantized'` | Smaller model, ~95% accuracy | ~280 MB | Yes |
|
|
225
|
-
| `'standard'` | Full model, best accuracy | ~1.1 GB | Yes |
|
|
226
|
-
| `'custom'` | Your own ONNX model | Varies | No |
|
|
227
|
-
|
|
228
|
-
### ONNX Session Options
|
|
229
|
-
|
|
230
|
-
Fine-tune ONNX Runtime performance with session options:
|
|
231
|
-
|
|
232
|
-
```typescript
|
|
233
|
-
const anonymizer = createAnonymizer({
|
|
234
|
-
ner: {
|
|
235
|
-
mode: 'quantized',
|
|
236
|
-
sessionOptions: {
|
|
237
|
-
// Graph optimization level: 'disabled' | 'basic' | 'extended' | 'all'
|
|
238
|
-
graphOptimizationLevel: 'all', // default
|
|
239
|
-
|
|
240
|
-
// Threading (Node.js only)
|
|
241
|
-
intraOpNumThreads: 4, // threads within operators
|
|
242
|
-
interOpNumThreads: 1, // threads between operators
|
|
243
|
-
|
|
244
|
-
// Memory optimization
|
|
245
|
-
enableCpuMemArena: true,
|
|
246
|
-
enableMemPattern: true,
|
|
247
|
-
}
|
|
248
|
-
}
|
|
62
|
+
// PII is anonymized before leaving your machine, response is rehydrated automatically
|
|
63
|
+
const response = await openai.chat.completions.create({
|
|
64
|
+
model: 'gpt-4o',
|
|
65
|
+
messages: [{ role: 'user', content: 'Draft a reply to john@example.com about the meeting' }],
|
|
249
66
|
});
|
|
250
67
|
```
|
|
251
68
|
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
By default, Rehydra uses:
|
|
255
|
-
- **Node.js**: CPU (fastest for quantized models)
|
|
256
|
-
- **Browsers**: WebGPU with WASM fallback
|
|
257
|
-
|
|
258
|
-
To enable **CoreML on macOS** (for non-quantized models):
|
|
69
|
+
### Or use wrapLLMClient for even less code
|
|
259
70
|
|
|
260
71
|
```typescript
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
mode: 'standard', // CoreML works better with FP32 models
|
|
264
|
-
sessionOptions: {
|
|
265
|
-
executionProviders: ['coreml', 'cpu'],
|
|
266
|
-
}
|
|
267
|
-
}
|
|
268
|
-
});
|
|
269
|
-
```
|
|
270
|
-
|
|
271
|
-
> **Note:** CoreML provides minimal speedup for quantized (INT8) models since they're already optimized for CPU. Use CoreML with the standard FP32 model for best results.
|
|
272
|
-
|
|
273
|
-
Available execution providers (local inference):
|
|
274
|
-
| Provider | Platform | Best For |
|
|
275
|
-
|----------|----------|----------|
|
|
276
|
-
| `'cpu'` | All | Quantized models (default) |
|
|
277
|
-
| `'coreml'` | macOS | Standard (FP32) models on Apple Silicon |
|
|
278
|
-
| `'webgpu'` | Browsers | GPU acceleration in Chrome 113+ |
|
|
279
|
-
| `'wasm'` | Browsers | Fallback for all browsers |
|
|
280
|
-
|
|
281
|
-
> **Note:** For NVIDIA GPU acceleration with CUDA/TensorRT, use the inference server backend (see [GPU Acceleration](#gpu-acceleration-enterprise)).
|
|
282
|
-
|
|
283
|
-
### GPU Acceleration (Enterprise)
|
|
284
|
-
|
|
285
|
-
For high-throughput production deployments, Rehydra supports GPU-accelerated inference via a dedicated inference server. This provides **10-37× speedup** over CPU inference.
|
|
286
|
-
|
|
287
|
-
```typescript
|
|
288
|
-
const anonymizer = createAnonymizer({
|
|
289
|
-
ner: {
|
|
290
|
-
backend: 'inference-server',
|
|
291
|
-
inferenceServerUrl: 'http://localhost:8080',
|
|
292
|
-
}
|
|
293
|
-
});
|
|
294
|
-
|
|
295
|
-
await anonymizer.initialize();
|
|
296
|
-
```
|
|
297
|
-
|
|
298
|
-
**Performance Comparison:**
|
|
299
|
-
|
|
300
|
-
| Text Size | CPU (local) | GPU (server) | Winner |
|
|
301
|
-
|-----------|-------------|--------------|--------|
|
|
302
|
-
| Short (~40 chars) | 4.3ms | 62ms | **CPU 14× faster** |
|
|
303
|
-
| Medium (~500 chars) | 26ms | 73ms | **CPU 2.8× faster** |
|
|
304
|
-
| Long (~2000 chars) | 93ms | 117ms | **CPU 1.3× faster** |
|
|
305
|
-
| Entity-dense | 13ms | 68ms | **CPU 5× faster** |
|
|
306
|
-
|
|
307
|
-
Local CPU is faster for most use cases due to network overhead. GPU is beneficial for batch processing and large documents.
|
|
308
|
-
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
**Backend Options:**
|
|
312
|
-
|
|
313
|
-
| Backend | Description | Latency (2K chars) |
|
|
314
|
-
|---------|-------------|-------------------|
|
|
315
|
-
| `'local'` | CPU inference (default) | ~4,300ms |
|
|
316
|
-
| `'inference-server'` | GPU server (enterprise) | ~117ms |
|
|
317
|
-
|
|
318
|
-
> **Note:** The GPU inference server is available as part of Rehydra Enterprise. Contact us for deployment options including Docker containers and Kubernetes helm charts.
|
|
319
|
-
|
|
320
|
-
### Main Functions
|
|
72
|
+
import OpenAI from 'openai';
|
|
73
|
+
import { wrapLLMClient, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
|
|
321
74
|
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
|
|
326
|
-
```typescript
|
|
327
|
-
const anonymizer = createAnonymizer({
|
|
328
|
-
ner: { mode: 'quantized' }
|
|
75
|
+
const openai = wrapLLMClient(new OpenAI(), {
|
|
76
|
+
keyProvider: new InMemoryKeyProvider(),
|
|
77
|
+
piiStorageProvider: new InMemoryPIIStorageProvider(),
|
|
329
78
|
});
|
|
330
|
-
|
|
331
|
-
await anonymizer.initialize();
|
|
332
|
-
const result = await anonymizer.anonymize('text');
|
|
333
|
-
await anonymizer.dispose();
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
#### `anonymize(text, locale?, policy?)`
|
|
337
|
-
|
|
338
|
-
One-off anonymization (regex-only by default):
|
|
339
|
-
|
|
340
|
-
```typescript
|
|
341
|
-
import { anonymize } from 'rehydra';
|
|
342
|
-
|
|
343
|
-
const result = await anonymize('Contact test@example.com');
|
|
344
|
-
```
|
|
345
|
-
|
|
346
|
-
#### `anonymizeWithNER(text, nerConfig, policy?)`
|
|
347
|
-
|
|
348
|
-
One-off anonymization with NER:
|
|
349
|
-
|
|
350
|
-
```typescript
|
|
351
|
-
import { anonymizeWithNER } from 'rehydra';
|
|
352
|
-
|
|
353
|
-
const result = await anonymizeWithNER(
|
|
354
|
-
'Hello John Smith',
|
|
355
|
-
{ mode: 'quantized' }
|
|
356
|
-
);
|
|
357
|
-
```
|
|
358
|
-
|
|
359
|
-
#### `anonymizeRegexOnly(text, policy?)`
|
|
360
|
-
|
|
361
|
-
Fast regex-only anonymization:
|
|
362
|
-
|
|
363
|
-
```typescript
|
|
364
|
-
import { anonymizeRegexOnly } from 'rehydra';
|
|
365
|
-
|
|
366
|
-
const result = await anonymizeRegexOnly('Card: 4111111111111111');
|
|
367
79
|
```
|
|
368
80
|
|
|
369
|
-
###
|
|
370
|
-
|
|
371
|
-
#### `decryptPIIMap(encryptedMap, key)`
|
|
81
|
+
### Standalone proxy server
|
|
372
82
|
|
|
373
|
-
|
|
83
|
+
Point any LLM client at a local proxy — zero code changes needed:
|
|
374
84
|
|
|
375
85
|
```typescript
|
|
376
|
-
import {
|
|
377
|
-
|
|
378
|
-
const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
|
|
379
|
-
// Returns Map<string, string> where key is "PERSON:1" and value is "John Smith"
|
|
380
|
-
```
|
|
86
|
+
import { createRehydraProxyServer, InMemoryKeyProvider, InMemoryPIIStorageProvider } from 'rehydra';
|
|
381
87
|
|
|
382
|
-
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
|
|
386
|
-
|
|
387
|
-
import { rehydrate } from 'rehydra';
|
|
388
|
-
|
|
389
|
-
const original = rehydrate(translatedText, piiMap);
|
|
390
|
-
```
|
|
391
|
-
|
|
392
|
-
### Result Structure
|
|
393
|
-
|
|
394
|
-
```typescript
|
|
395
|
-
interface AnonymizationResult {
|
|
396
|
-
// Text with PII replaced by placeholder tags
|
|
397
|
-
anonymizedText: string;
|
|
398
|
-
|
|
399
|
-
// Detected entities (without original text for safety)
|
|
400
|
-
entities: Array<{
|
|
401
|
-
type: PIIType;
|
|
402
|
-
id: number;
|
|
403
|
-
start: number;
|
|
404
|
-
end: number;
|
|
405
|
-
confidence: number;
|
|
406
|
-
source: 'REGEX' | 'NER';
|
|
407
|
-
}>;
|
|
408
|
-
|
|
409
|
-
// Encrypted PII mapping (for later rehydration)
|
|
410
|
-
piiMap: {
|
|
411
|
-
ciphertext: string; // Base64
|
|
412
|
-
iv: string; // Base64
|
|
413
|
-
authTag: string; // Base64
|
|
414
|
-
};
|
|
415
|
-
|
|
416
|
-
// Processing statistics
|
|
417
|
-
stats: {
|
|
418
|
-
countsByType: Record<PIIType, number>;
|
|
419
|
-
totalEntities: number;
|
|
420
|
-
processingTimeMs: number;
|
|
421
|
-
modelVersion: string;
|
|
422
|
-
leakScanPassed?: boolean;
|
|
423
|
-
};
|
|
424
|
-
}
|
|
425
|
-
```
|
|
426
|
-
|
|
427
|
-
## Supported PII Types
|
|
428
|
-
|
|
429
|
-
| Type | Description | Detection | Semantic Attributes |
|
|
430
|
-
|------|-------------|-----------|---------------------|
|
|
431
|
-
| `EMAIL` | Email addresses | Regex | - |
|
|
432
|
-
| `PHONE` | Phone numbers (international) | Regex | - |
|
|
433
|
-
| `IBAN` | International Bank Account Numbers | Regex + Checksum | - |
|
|
434
|
-
| `BIC_SWIFT` | Bank Identifier Codes | Regex | - |
|
|
435
|
-
| `CREDIT_CARD` | Credit card numbers | Regex + Luhn | - |
|
|
436
|
-
| `IP_ADDRESS` | IPv4 and IPv6 addresses | Regex | - |
|
|
437
|
-
| `URL` | Web URLs | Regex | - |
|
|
438
|
-
| `CASE_ID` | Case/ticket numbers | Regex (configurable) | - |
|
|
439
|
-
| `CUSTOMER_ID` | Customer identifiers | Regex (configurable) | - |
|
|
440
|
-
| `PERSON` | Person names | NER | `gender` (male/female/neutral) |
|
|
441
|
-
| `ORG` | Organization names | NER | - |
|
|
442
|
-
| `LOCATION` | Location/place names | NER | `scope` (city/country/region) |
|
|
443
|
-
| `ADDRESS` | Physical addresses | NER | - |
|
|
444
|
-
| `DATE_OF_BIRTH` | Dates of birth | NER | - |
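The table lists `IBAN` as "Regex + Checksum" and `CREDIT_CARD` as "Regex + Luhn". As a rough illustration of what such a second-stage check does after a regex match, here is a minimal sketch of the standard Luhn and IBAN mod-97 algorithms. The function names `passesLuhn` and `passesIbanMod97` and the length limits are assumptions made for this example; they are not rehydra exports, and the library's own validation may differ.

```typescript
// Illustrative checksum helpers; names are hypothetical, not rehydra exports.

// Luhn: double every second digit from the right, sum all digits,
// and require the total to be a multiple of 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false; // assumed length range
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// IBAN mod-97: move the first four characters to the end, map letters
// to 10..35, and require the resulting number to equal 1 modulo 97.
function passesIbanMod97(candidate: string): boolean {
  const iban = candidate.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48;
    // A letter stands for two decimal digits, so shift by 100 instead of 10.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}

console.log(passesLuhn('4111111111111111')); // true
console.log(passesLuhn('4111111111111112')); // false
console.log(passesIbanMod97('DE89370400440532013000')); // true
console.log(passesIbanMod97('DE89370400440532013001')); // false
```

The two sample values are the card number and IBAN used in the README's own examples. A checksum stage like this mainly reduces false positives: a 16-digit order number that happens to match the card regex will usually fail Luhn.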
|
|
445
|
-
|
|
446
|
-
## Configuration
|
|
447
|
-
|
|
448
|
-
### Anonymization Policy
|
|
449
|
-
|
|
450
|
-
```typescript
|
|
451
|
-
import { createAnonymizer, PIIType } from 'rehydra';
|
|
452
|
-
|
|
453
|
-
const anonymizer = createAnonymizer({
|
|
454
|
-
ner: { mode: 'quantized' },
|
|
455
|
-
defaultPolicy: {
|
|
456
|
-
// Which PII types to detect
|
|
457
|
-
enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
|
|
458
|
-
|
|
459
|
-
// Confidence thresholds per type (0.0 - 1.0)
|
|
460
|
-
confidenceThresholds: new Map([
|
|
461
|
-
[PIIType.PERSON, 0.8],
|
|
462
|
-
[PIIType.EMAIL, 0.5],
|
|
463
|
-
]),
|
|
464
|
-
|
|
465
|
-
// Terms to never treat as PII
|
|
466
|
-
allowlistTerms: new Set(['Customer Service', 'Help Desk']),
|
|
467
|
-
|
|
468
|
-
// Enable semantic enrichment (gender/scope)
|
|
469
|
-
enableSemanticMasking: true,
|
|
470
|
-
|
|
471
|
-
// Enable leak scanning on output
|
|
472
|
-
enableLeakScan: true,
|
|
473
|
-
},
|
|
88
|
+
const proxy = await createRehydraProxyServer({
|
|
89
|
+
port: 8080,
|
|
90
|
+
upstream: 'https://api.openai.com',
|
|
91
|
+
keyProvider: new InMemoryKeyProvider(),
|
|
92
|
+
piiStorageProvider: new InMemoryPIIStorageProvider(),
|
|
474
93
|
});
|
|
475
|
-
```
|
|
476
|
-
|
|
477
|
-
### Custom Recognizers
|
|
478
94
|
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
```typescript
|
|
482
|
-
import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';
|
|
483
|
-
|
|
484
|
-
const customRecognizer = createCustomIdRecognizer([
|
|
485
|
-
{
|
|
486
|
-
name: 'Order Number',
|
|
487
|
-
pattern: /\bORD-[A-Z0-9]{8}\b/g,
|
|
488
|
-
type: PIIType.CASE_ID,
|
|
489
|
-
},
|
|
490
|
-
]);
|
|
491
|
-
|
|
492
|
-
const anonymizer = createAnonymizer();
|
|
493
|
-
anonymizer.getRegistry().register(customRecognizer);
|
|
95
|
+
// Point your client at the proxy
|
|
96
|
+
const openai = new OpenAI({ baseURL: 'http://localhost:8080/v1' });
|
|
494
97
|
```
|
|
495
98
|
|
|
496
|
-
|
|
497
|
-
|
|
498
|
-
Models and semantic data are cached locally for offline use.
|
|
499
|
-
|
|
500
|
-
### Node.js Cache Locations
|
|
501
|
-
|
|
502
|
-
| Data | macOS | Linux | Windows |
|
|
503
|
-
|------|-------|-------|---------|
|
|
504
|
-
| NER Models | `~/Library/Caches/rehydra/models/` | `~/.cache/rehydra/models/` | `%LOCALAPPDATA%/rehydra/models/` |
|
|
505
|
-
| Semantic Data | `~/Library/Caches/rehydra/semantic-data/` | `~/.cache/rehydra/semantic-data/` | `%LOCALAPPDATA%/rehydra/semantic-data/` |
|
|
506
|
-
|
|
507
|
-
### Browser Cache
|
|
508
|
-
|
|
509
|
-
In browsers, data is stored using:
|
|
510
|
-
- **IndexedDB**: For semantic data and smaller files
|
|
511
|
-
- **Origin Private File System (OPFS)**: For large model files (~280 MB)
|
|
99
|
+
Supports non-streaming and streaming (SSE) responses for both OpenAI and Anthropic APIs.
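For readers unfamiliar with SSE framing, the following is a minimal sketch of how a streamed response body can be split into events before any rehydration happens. It follows the general server-sent events format (blank-line-separated blocks of `field: value` lines) and is not the API of the package's `sse-parser` module; `SseChunkParser`, `SseEvent`, and `parseBlock` are hypothetical names, and lone `\r` line endings are not handled.

```typescript
// Minimal SSE parsing sketch; names are hypothetical, not rehydra's API.
interface SseEvent {
  event: string; // "message" when the block has no event field
  data: string;
}

// Parse one blank-line-delimited block into an event, or null if it has no data.
function parseBlock(block: string): SseEvent | null {
  let event = 'message';
  const dataLines: string[] = [];
  for (const line of block.split('\n')) {
    if (line === '' || line.charAt(0) === ':') continue; // comment or keep-alive
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.charAt(0) === ' ') value = value.slice(1); // one optional space
    if (field === 'event') event = value;
    else if (field === 'data') dataLines.push(value);
  }
  return dataLines.length > 0 ? { event, data: dataLines.join('\n') } : null;
}

class SseChunkParser {
  private pending = '';

  // Feed one network chunk; returns every event that chunk completed.
  push(chunk: string): SseEvent[] {
    this.pending = (this.pending + chunk).replace(/\r\n/g, '\n');
    const events: SseEvent[] = [];
    let boundary = this.pending.indexOf('\n\n');
    while (boundary !== -1) {
      const parsed = parseBlock(this.pending.slice(0, boundary));
      this.pending = this.pending.slice(boundary + 2);
      if (parsed !== null) events.push(parsed);
      boundary = this.pending.indexOf('\n\n');
    }
    return events;
  }
}

const parser = new SseChunkParser();
const first = parser.push('data: {"delta":"Hel');
const second = parser.push('lo"}\n\ndata: [DONE]\n\n');
console.log(first.length); // 0
console.log(second.map((e) => e.data)); // [ '{"delta":"Hello"}', '[DONE]' ]
```

Network chunks do not align with event boundaries, which is why the sketch keeps a `pending` buffer. The same concern presumably applies one level up: a placeholder tag can be split across several streamed deltas, so a streaming rehydrator has to buffer partial tags as well.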
|
|
512
100
|
|
|
513
|
-
|
|
101
|
+
## Streaming
|
|
514
102
|
|
|
515
|
-
|
|
103
|
+
Process text chunk-by-chunk with constant memory. Works as a Node.js Transform stream.
|
|
516
104
|
|
|
517
105
|
```typescript
|
|
518
|
-
import {
|
|
519
|
-
|
|
520
|
-
isModelDownloaded,
|
|
521
|
-
downloadModel,
|
|
522
|
-
clearModelCache,
|
|
523
|
-
listDownloadedModels,
|
|
524
|
-
|
|
525
|
-
// Semantic data management
|
|
526
|
-
isSemanticDataDownloaded,
|
|
527
|
-
downloadSemanticData,
|
|
528
|
-
clearSemanticDataCache,
|
|
529
|
-
} from 'rehydra';
|
|
530
|
-
|
|
531
|
-
// Check if model is downloaded
|
|
532
|
-
const hasModel = await isModelDownloaded('quantized');
|
|
106
|
+
import { createReadStream, createWriteStream } from 'fs';
|
|
107
|
+
import { createAnonymizerStream, InMemoryKeyProvider } from 'rehydra';
|
|
533
108
|
|
|
534
|
-
|
|
535
|
-
|
|
536
|
-
|
|
109
|
+
const stream = await createAnonymizerStream({
|
|
110
|
+
anonymizer: { ner: { mode: 'quantized' } },
|
|
111
|
+
keyProvider: new InMemoryKeyProvider(),
|
|
112
|
+
sessionId: 'batch-job-001',
|
|
113
|
+
piiStorageProvider: storage,
|
|
537
114
|
});
|
|
538
115
|
|
|
539
|
-
|
|
540
|
-
const hasSemanticData = await isSemanticDataDownloaded();
|
|
541
|
-
|
|
542
|
-
// List downloaded models
|
|
543
|
-
const models = await listDownloadedModels();
|
|
544
|
-
|
|
545
|
-
// Clear caches
|
|
546
|
-
await clearModelCache('quantized'); // or clearModelCache() for all
|
|
547
|
-
await clearSemanticDataCache();
|
|
116
|
+
createReadStream('input.txt').pipe(stream).pipe(createWriteStream('anonymized.txt'));
|
|
548
117
|
```
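One reason a streaming anonymizer needs a buffer at all is that PII can straddle a chunk boundary. The sketch below illustrates the idea with sentence-level buffering and a simple email regex. It is a simplified assumption of how such buffering might work, not the package's `sentence-buffer` or `anonymizer-stream` code; `SentenceChunker` and `EMAIL` are hypothetical names, and memory stays bounded only as long as sentences do.

```typescript
// Hypothetical sketch of boundary-aware chunk processing; not rehydra's
// implementation. Only emails are masked, using a deliberately simple regex.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

class SentenceChunker {
  private pending = '';
  private nextId = 1;

  // Accept one chunk; return masked text for every sentence completed so far.
  write(chunk: string): string {
    this.pending += chunk;
    // Cut after the last sentence terminator that is followed by whitespace.
    const match = /^[\s\S]*[.!?]\s/.exec(this.pending);
    if (match === null) return '';
    const ready = match[0];
    this.pending = this.pending.slice(ready.length);
    return this.mask(ready);
  }

  // Flush whatever is still buffered when the input ends.
  end(): string {
    const rest = this.pending;
    this.pending = '';
    return this.mask(rest);
  }

  private mask(text: string): string {
    return text.replace(EMAIL, () => `<PII type="EMAIL" id="${this.nextId++}"/>`);
  }
}

const chunker = new SentenceChunker();
const out =
  chunker.write('Contact john@ex') +
  chunker.write('ample.com today. Bye') +
  chunker.end();
console.log(out);
// 'Contact <PII type="EMAIL" id="1"/> today. Bye'
```

Masking each chunk independently would miss the address here, because neither `john@ex` nor `ample.com` matches the pattern on its own. The trade-off is latency: output is held back until a sentence boundary arrives, which is the kind of delay the low-latency mode described next is meant to shorten.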
|
|
549
118
|
|
|
550
|
-
|
|
551
|
-
|
|
552
|
-
The PII map is encrypted using **AES-256-GCM** via the Web Crypto API (works in both Node.js and browsers).
|
|
119
|
+
### Low-latency mode for LLM token streams
|
|
553
120
|
|
|
554
|
-
|
|
121
|
+
Regex-only, smaller buffers, flushes aggressively — designed for real-time token streams:
|
|
555
122
|
|
|
556
123
|
```typescript
|
|
557
|
-
|
|
558
|
-
|
|
559
|
-
|
|
560
|
-
KeyProvider, // Interface for custom implementations
|
|
561
|
-
generateKey,
|
|
562
|
-
} from 'rehydra';
|
|
563
|
-
|
|
564
|
-
// Development: In-memory key (generates random key, lost on page refresh)
|
|
565
|
-
const devKeyProvider = new InMemoryKeyProvider();
|
|
566
|
-
|
|
567
|
-
// Production: Pre-configured key
|
|
568
|
-
// Generate key: openssl rand -base64 32
|
|
569
|
-
const keyBase64 = process.env.PII_ENCRYPTION_KEY; // or read from config
|
|
570
|
-
const prodKeyProvider = new ConfigKeyProvider(keyBase64);
|
|
571
|
-
|
|
572
|
-
// Custom: Implement KeyProvider interface
|
|
573
|
-
class SecureKeyProvider implements KeyProvider {
|
|
574
|
-
async getKey(): Promise<Uint8Array> {
|
|
575
|
-
// Retrieve from secure storage, HSM, keychain, etc.
|
|
576
|
-
return await getKeyFromSecureStorage();
|
|
577
|
-
}
|
|
578
|
-
}
|
|
579
|
-
```
|
|
580
|
-
|
|
581
|
-
### Security Best Practices
|
|
582
|
-
|
|
583
|
-
- **Never log the raw PII map** - Always use encrypted storage
|
|
584
|
-
- **Persist the encryption key securely** - Use platform keystores (iOS Keychain, Android Keystore, etc.)
|
|
585
|
-
- **Rotate keys** - Implement key rotation for long-running applications
|
|
586
|
-
- **Enable leak scanning** - Catch any missed PII in output
|
|
587
|
-
|
|
588
|
-
## PII Map Storage
|
|
589
|
-
|
|
590
|
-
For applications that need to persist encrypted PII maps (e.g., chat applications where you need to rehydrate later), use sessions with built-in storage providers.
|
|
591
|
-
|
|
592
|
-
### Storage Providers
|
|
593
|
-
|
|
594
|
-
| Provider | Environment | Persistence | Use Case |
|
|
595
|
-
|----------|-------------|-------------|----------|
|
|
596
|
-
| `InMemoryPIIStorageProvider` | All | None (lost on restart) | Development, testing |
|
|
597
|
-
| `SQLitePIIStorageProvider` | Node.js, Bun only* | File-based | Server-side applications |
|
|
598
|
-
| `IndexedDBPIIStorageProvider` | Browser | Browser storage | Client-side applications |
|
|
599
|
-
|
|
600
|
-
*\*Not available in browser builds. Use `IndexedDBPIIStorageProvider` for browser applications.*
|
|
601
|
-
|
|
602
|
-
### Important: Storage Only Works with Sessions
|
|
603
|
-
|
|
604
|
-
> **Note:** The `piiStorageProvider` is only used when you call `anonymizer.session()`.
|
|
605
|
-
> Calling `anonymizer.anonymize()` directly does NOT save to storage - the encrypted PII map
|
|
606
|
-
> is only returned in the result for you to handle manually.
|
|
124
|
+
const stream = await createAnonymizerStream({
|
|
125
|
+
buffer: { lowLatency: true },
|
|
126
|
+
});
|
|
607
127
|
|
|
608
|
-
|
|
609
|
-
|
|
610
|
-
|
|
611
|
-
// result.piiMap is returned but NOT saved to storage
|
|
612
|
-
|
|
613
|
-
// ✅ Storage IS used - auto-saves and auto-loads
|
|
614
|
-
const session = anonymizer.session('conversation-123');
|
|
615
|
-
const result = await session.anonymize('Hello John!');
|
|
616
|
-
// result.piiMap is automatically saved to storage
|
|
128
|
+
llmTokenStream.pipe(stream).on('data', (chunk) => {
|
|
129
|
+
ws.send(chunk.toString());
|
|
130
|
+
});
|
|
617
131
|
```
|
|
618
132
|
|
|
619
|
-
###
|
|
620
|
-
|
|
621
|
-
For simple use cases where you don't need persistence:
|
|
133
|
+
### Stream from a session
|
|
622
134
|
|
|
623
135
|
```typescript
|
|
624
|
-
|
|
625
|
-
|
|
626
|
-
|
|
627
|
-
const anonymizer = createAnonymizer({
|
|
628
|
-
ner: { mode: 'quantized' },
|
|
629
|
-
keyProvider,
|
|
630
|
-
});
|
|
631
|
-
await anonymizer.initialize();
|
|
632
|
-
|
|
633
|
-
// Anonymize
|
|
634
|
-
const result = await anonymizer.anonymize('Hello John Smith!');
|
|
635
|
-
|
|
636
|
-
// Translate (or other processing)
|
|
637
|
-
const translated = await translateAPI(result.anonymizedText);
|
|
638
|
-
|
|
639
|
-
// Rehydrate manually using the returned PII map
|
|
640
|
-
const key = await keyProvider.getKey();
|
|
641
|
-
const piiMap = await decryptPIIMap(result.piiMap, key);
|
|
642
|
-
const original = rehydrate(translated, piiMap);
|
|
136
|
+
const session = anonymizer.session('chat-123');
|
|
137
|
+
const stream = await session.createStream();
|
|
138
|
+
input.pipe(stream).pipe(output);
|
|
643
139
|
```
|
|
644
140
|
|
|
645
|
-
|
|
141
|
+
## Sessions
|
|
646
142
|
|
|
647
|
-
For
|
|
143
|
+
For multi-message conversations where PII IDs need to stay consistent and PII maps need to persist:
|
|
648
144
|
|
|
649
145
|
```typescript
|
|
650
|
-
import {
|
|
146
|
+
import {
|
|
651
147
|
createAnonymizer,
|
|
652
148
|
InMemoryKeyProvider,
|
|
653
|
-
SQLitePIIStorageProvider,
|
|
149
|
+
SQLitePIIStorageProvider, // or InMemoryPIIStorageProvider, IndexedDBPIIStorageProvider
|
|
654
150
|
} from 'rehydra';
|
|
655
151
|
|
|
656
|
-
// 1. Setup storage (once at app start)
|
|
657
|
-
const storage = new SQLitePIIStorageProvider('./pii-maps.db');
|
|
658
|
-
await storage.initialize();
|
|
659
|
-
|
|
660
|
-
// 2. Create anonymizer with storage and key provider
|
|
661
152
|
const anonymizer = createAnonymizer({
|
|
662
153
|
ner: { mode: 'quantized' },
|
|
663
154
|
keyProvider: new InMemoryKeyProvider(),
|
|
664
|
-
piiStorageProvider:
|
|
155
|
+
piiStorageProvider: new SQLitePIIStorageProvider('./pii.db'),
|
|
665
156
|
});
|
|
666
|
-
await anonymizer.initialize();
|
|
667
|
-
|
|
668
|
-
// 3. Create a session for each conversation
|
|
669
|
-
const session = anonymizer.session('conversation-123');
|
|
670
|
-
|
|
671
|
-
// 4. Anonymize - auto-saves to storage
|
|
672
|
-
const result = await session.anonymize('Hello John Smith from Acme Corp!');
|
|
673
|
-
console.log(result.anonymizedText);
|
|
674
|
-
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="1"/>!"
|
|
675
|
-
|
|
676
|
-
// 5. Later (even after app restart): rehydrate - auto-loads and decrypts
|
|
677
|
-
const translated = await translateAPI(result.anonymizedText);
|
|
678
|
-
const original = await session.rehydrate(translated);
|
|
679
|
-
console.log(original);
|
|
680
|
-
// "Hello John Smith from Acme Corp!"
|
|
681
|
-
|
|
682
|
-
// 6. Optional: check existence or delete
|
|
683
|
-
await session.exists(); // true
|
|
684
|
-
await session.delete(); // removes from storage
|
|
685
|
-
```
|
|
686
|
-
|
|
687
|
-
### Example: Multiple Conversations
|
|
688
|
-
|
|
689
|
-
Each session ID maps to a separate stored PII map:
|
|
690
|
-
|
|
691
|
-
```typescript
|
|
692
|
-
// Different chat sessions
|
|
693
|
-
const chat1 = anonymizer.session('user-alice-chat');
|
|
694
|
-
const chat2 = anonymizer.session('user-bob-chat');
|
|
695
|
-
|
|
696
|
-
await chat1.anonymize('Alice: Contact me at alice@example.com');
|
|
697
|
-
await chat2.anonymize('Bob: My number is +49 30 123456');
|
|
698
|
-
|
|
699
|
-
// Each session has independent storage
|
|
700
|
-
await chat1.rehydrate(translatedText1); // Uses Alice's PII map
|
|
701
|
-
await chat2.rehydrate(translatedText2); // Uses Bob's PII map
|
|
702
|
-
```
|
|
703
|
-
|
|
704
|
-
### Multi-Message Conversations
|
|
705
157
|
|
|
706
|
-
Within a session, entity IDs are consistent across multiple `anonymize()` calls:
|
|
707
|
-
|
|
708
|
-
```typescript
|
|
709
158
|
const session = anonymizer.session('chat-123');
|
|
710
159
|
|
|
711
|
-
// Message 1
|
|
712
|
-
|
|
160
|
+
// Message 1
|
|
161
|
+
await session.anonymize('Contact me at user@example.com');
|
|
713
162
|
// → "Contact me at <PII type="EMAIL" id="1"/>"
|
|
714
163
|
|
|
715
|
-
// Message 2
|
|
716
|
-
|
|
164
|
+
// Message 2 — same email gets the same ID
|
|
165
|
+
await session.anonymize('CC: user@example.com and admin@example.com');
|
|
717
166
|
// → "CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>"
|
|
718
|
-
// ↑ Same ID (reused) ↑ New ID
|
|
719
|
-
|
|
720
|
-
// Message 3: No PII
|
|
721
|
-
await session.anonymize('Please translate to German');
|
|
722
|
-
// Previous PII preserved
|
|
723
167
|
|
|
724
|
-
//
|
|
725
|
-
await session.rehydrate(
|
|
726
|
-
await session.rehydrate(msg2.anonymizedText); // ✓
|
|
168
|
+
// Rehydrate any message — auto-loads the PII map from storage
|
|
169
|
+
const original = await session.rehydrate(translatedText);
|
|
727
170
|
```
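
The reason the same email keeps `id="1"` across both messages can be shown with a toy session. This is only a sketch of the idea, not rehydra's implementation: `ToySession` and its methods are hypothetical names, detection is reduced to a simple email regex, and a real session also encrypts and persists the map.

```typescript
// Hypothetical sketch: stable placeholder IDs within one session.
// Not rehydra's internals; detection is a deliberately simple email regex.
const EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;

class ToySession {
  // value -> id and id -> value, kept for the lifetime of the session
  private idsByValue = new Map<string, number>();
  private valuesById = new Map<number, string>();

  anonymize(text: string): string {
    return text.replace(EMAIL, (match) => {
      let id = this.idsByValue.get(match);
      if (id === undefined) {
        id = this.idsByValue.size + 1; // next free ID
        this.idsByValue.set(match, id);
        this.valuesById.set(id, match);
      }
      return `<PII type="EMAIL" id="${id}"/>`;
    });
  }

  rehydrate(text: string): string {
    // Unknown IDs are left as tags instead of being guessed.
    return text.replace(/<PII type="EMAIL" id="(\d+)"\/>/g, (tag, id) => {
      const value = this.valuesById.get(Number(id));
      return value === undefined ? tag : value;
    });
  }
}
```

Because the lookup is keyed by the original value, a repeated email reuses its ID and a new email takes the next free one, which is the behavior the two messages above rely on.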
|
|
728
171
|
|
|
729
|
-
|
|
730
|
-
|
|
731
|
-
### SQLite Provider (Node.js + Bun only)
|
|
172
|
+
### Storage Providers
|
|
732
173
|
|
|
733
|
-
|
|
174
|
+
| Provider | Environment | Persistence |
|
|
175
|
+
|----------|-------------|-------------|
|
|
176
|
+
| `InMemoryPIIStorageProvider` | All | None (lost on restart) |
|
|
177
|
+
| `SQLitePIIStorageProvider` | Node.js, Bun | File-based (`better-sqlite3` on Node, `bun:sqlite` on Bun) |
|
|
178
|
+
| `IndexedDBPIIStorageProvider` | Browser | Browser storage |
|
|
734
179
|
|
|
735
|
-
|
|
180
|
+
## Supported PII Types
|
|
736
181
|
|
|
737
|
-
|
|
738
|
-
|
|
739
|
-
|
|
740
|
-
|
|
182
|
+
| Type | Detection | Notes |
|
|
183
|
+
|------|-----------|-------|
|
|
184
|
+
| `PERSON` | NER | Names, with optional `gender` attribute |
|
|
185
|
+
| `ORG` | NER | Organization names |
|
|
186
|
+
| `LOCATION` | NER | Places, with optional `scope` attribute (city/country/region) |
|
|
187
|
+
| `ADDRESS` | NER | Physical addresses |
|
|
188
|
+
| `DATE_OF_BIRTH` | NER | Dates of birth |
|
|
189
|
+
| `EMAIL` | Regex | Email addresses |
|
|
190
|
+
| `PHONE` | Regex | International phone numbers |
|
|
191
|
+
| `IBAN` | Regex + checksum | International Bank Account Numbers |
|
|
192
|
+
| `BIC_SWIFT` | Regex | Bank Identifier Codes |
|
|
193
|
+
| `CREDIT_CARD` | Regex + Luhn | Credit card numbers |
|
|
194
|
+
| `IP_ADDRESS` | Regex | IPv4 and IPv6 |
|
|
195
|
+
| `URL` | Regex | Web URLs |
|
|
196
|
+
| `CASE_ID` | Regex | Configurable case/ticket patterns |
|
|
197
|
+
| `CUSTOMER_ID` | Regex | Configurable customer ID patterns |
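
The table lists a checksum step for `IBAN` and a Luhn step for `CREDIT_CARD`. The sketch below shows the two standard checks (Luhn and the IBAN mod-97 test) as they are commonly implemented. It is not taken from rehydra's source; `luhnValid` and `ibanValid` are hypothetical helper names.

```typescript
// Luhn check: double every second digit from the right, sum, test mod 10.
function luhnValid(input: string): boolean {
  const digits = input.replace(/[\s-]/g, '');
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let doubleIt = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// IBAN check: move the first four characters to the end, map A..Z to 10..35,
// and require the resulting number to be 1 modulo 97.
function ibanValid(input: string): boolean {
  const iban = input.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48; // 'A' -> 10, '0' -> 0
    // Letters contribute two decimal digits, so shift by 100 instead of 10.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}
```

Running a checksum after the regex match is what keeps arbitrary 16-digit strings or IBAN-shaped tokens from being flagged as PII.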
|
|
741
198
|
|
|
742
|
-
|
|
743
|
-
const storage = new SQLitePIIStorageProvider('./data/pii-maps.db');
|
|
744
|
-
await storage.initialize();
|
|
199
|
+
## Configuration
|
|
745
200
|
|
|
746
|
-
|
|
747
|
-
const testStorage = new SQLitePIIStorageProvider(':memory:');
|
|
748
|
-
await testStorage.initialize();
|
|
749
|
-
```
|
|
201
|
+
### NER Modes
|
|
750
202
|
|
|
751
|
-
|
|
752
|
-
|
|
753
|
-
|
|
203
|
+
| Mode | Size | Description |
|
|
204
|
+
|------|------|-------------|
|
|
205
|
+
| `'disabled'` | 0 | Regex only — no model download |
|
|
206
|
+
| `'quantized'` | ~280 MB | Recommended — good accuracy, smaller download |
|
|
207
|
+
| `'standard'` | ~1.1 GB | Best accuracy |
|
|
208
|
+
| `'custom'` | Varies | Bring your own ONNX model |
|
|
754
209
|
|
|
755
|
-
|
|
756
|
-
npm install better-sqlite3
|
|
757
|
-
```
|
|
210
|
+
### Semantic Enrichment
|
|
758
211
|
|
|
759
|
-
|
|
212
|
+
Adds gender/scope attributes for better machine translation:
|
|
760
213
|
|
|
761
214
|
```typescript
|
|
762
|
-
import {
|
|
763
|
-
createAnonymizer,
|
|
764
|
-
InMemoryKeyProvider,
|
|
765
|
-
IndexedDBPIIStorageProvider,
|
|
766
|
-
} from 'rehydra';
|
|
767
|
-
|
|
768
|
-
// Custom database name (defaults to 'rehydra-pii-storage')
|
|
769
|
-
const storage = new IndexedDBPIIStorageProvider('my-app-pii');
|
|
770
|
-
|
|
771
215
|
const anonymizer = createAnonymizer({
|
|
772
216
|
ner: { mode: 'quantized' },
|
|
773
|
-
|
|
774
|
-
piiStorageProvider: storage,
|
|
217
|
+
semantic: { enabled: true }, // Downloads ~12 MB of name/location data
|
|
775
218
|
});
|
|
776
|
-
await anonymizer.initialize();
|
|
777
219
|
|
|
778
|
-
//
|
|
779
|
-
const session = anonymizer.session('browser-chat-123');
|
|
780
|
-
const result = await session.anonymize('Hello John!');
|
|
781
|
-
const original = await session.rehydrate(result.anonymizedText);
|
|
220
|
+
// "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
|
|
782
221
|
```
|
|
783
222
|
|
|
784
|
-
###
|
|
785
|
-
|
|
786
|
-
The session object provides these methods:
|
|
223
|
+
### Anonymization Policy
|
|
787
224
|
|
|
788
225
|
```typescript
|
|
789
|
-
|
|
790
|
-
|
|
791
|
-
|
|
792
|
-
|
|
793
|
-
|
|
794
|
-
|
|
795
|
-
|
|
796
|
-
}
|
|
226
|
+
import { createAnonymizer, PIIType } from 'rehydra';

const anonymizer = createAnonymizer({
|
|
227
|
+
ner: { mode: 'quantized' },
|
|
228
|
+
defaultPolicy: {
|
|
229
|
+
enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
|
|
230
|
+
confidenceThresholds: new Map([[PIIType.PERSON, 0.8]]),
|
|
231
|
+
allowlistTerms: new Set(['Customer Service']),
|
|
232
|
+
enableLeakScan: true,
|
|
233
|
+
},
|
|
234
|
+
});
|
|
797
235
|
```
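
One plausible reading of how these policy fields interact is sketched below. The semantics are assumptions made for illustration (disabled types are ignored, allowlisted terms are kept as-is, and a detection must meet its type's threshold), and `Detection`, `Policy`, and `applyPolicy` are hypothetical names rather than rehydra's types.

```typescript
// Hypothetical sketch of policy filtering; not rehydra's implementation.
type PiiType = 'EMAIL' | 'PHONE' | 'PERSON' | 'ORG';

interface Detection {
  type: PiiType;
  text: string;
  confidence: number; // 0..1, as a recognizer or NER model might report it
}

interface Policy {
  enabledTypes: Set<PiiType>;
  confidenceThresholds: Map<PiiType, number>;
  allowlistTerms: Set<string>;
}

function applyPolicy(detections: Detection[], policy: Policy): Detection[] {
  return detections.filter((d) => {
    if (!policy.enabledTypes.has(d.type)) return false; // type switched off
    if (policy.allowlistTerms.has(d.text)) return false; // explicitly allowed term
    const threshold = policy.confidenceThresholds.get(d.type);
    // Assumption: types without a configured threshold accept any confidence.
    return threshold === undefined || d.confidence >= threshold;
  });
}
```

Under these assumptions, a `PERSON` hit with confidence 0.7 would be dropped by the 0.8 threshold, and "Customer Service" would pass through untouched even if the model tagged it as a name.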
|
|
798
236
|
|
|
799
|
-
###
|
|
800
|
-
|
|
801
|
-
**Entries persist forever by default.** Use `cleanup()` on the storage provider to remove old entries:
|
|
237
|
+
### Anonymization Modes
|
|
802
238
|
|
|
803
239
|
```typescript
|
|
804
|
-
//
|
|
805
|
-
const
|
|
240
|
+
// Pseudonymize (default): reversible, returns encrypted PII map
|
|
241
|
+
const anonymizer = createAnonymizer({ mode: 'pseudonymize' });
|
|
806
242
|
|
|
807
|
-
//
|
|
808
|
-
|
|
809
|
-
|
|
810
|
-
// List all stored sessions
|
|
811
|
-
const sessionIds = await storage.list();
|
|
812
|
-
```
|
|
813
|
-
|
|
814
|
-
## Browser Usage
|
|
815
|
-
|
|
816
|
-
The library works seamlessly in browsers without any special configuration.
|
|
817
|
-
|
|
818
|
-
### Basic Browser Example
|
|
819
|
-
|
|
820
|
-
```html
|
|
821
|
-
<!DOCTYPE html>
|
|
822
|
-
<html>
|
|
823
|
-
<head>
|
|
824
|
-
<title>PII Anonymization</title>
|
|
825
|
-
</head>
|
|
826
|
-
<body>
|
|
827
|
-
<script type="module">
|
|
828
|
-
import {
|
|
829
|
-
createAnonymizer,
|
|
830
|
-
InMemoryKeyProvider,
|
|
831
|
-
decryptPIIMap,
|
|
832
|
-
rehydrate
|
|
833
|
-
} from './node_modules/rehydra/dist/index.js';
|
|
834
|
-
|
|
835
|
-
async function demo() {
|
|
836
|
-
// Create anonymizer
|
|
837
|
-
const keyProvider = new InMemoryKeyProvider();
|
|
838
|
-
const anonymizer = createAnonymizer({
|
|
839
|
-
ner: {
|
|
840
|
-
mode: 'quantized',
|
|
841
|
-
onStatus: (s) => console.log('NER:', s),
|
|
842
|
-
onDownloadProgress: (p) => console.log(`Download: ${p.percent}%`)
|
|
843
|
-
},
|
|
844
|
-
semantic: { enabled: true },
|
|
845
|
-
keyProvider
|
|
846
|
-
});
|
|
847
|
-
|
|
848
|
-
// Initialize (downloads models on first use)
|
|
849
|
-
await anonymizer.initialize();
|
|
850
|
-
|
|
851
|
-
// Anonymize
|
|
852
|
-
const result = await anonymizer.anonymize(
|
|
853
|
-
'Contact Maria Schmidt at maria@example.com in Berlin.'
|
|
854
|
-
);
|
|
855
|
-
|
|
856
|
-
console.log('Anonymized:', result.anonymizedText);
|
|
857
|
-
// "Contact <PII type="PERSON" gender="female" id="1"/> at <PII type="EMAIL" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>."
|
|
858
|
-
|
|
859
|
-
// Rehydrate
|
|
860
|
-
const key = await keyProvider.getKey();
|
|
861
|
-
const piiMap = await decryptPIIMap(result.piiMap, key);
|
|
862
|
-
const original = rehydrate(result.anonymizedText, piiMap);
|
|
863
|
-
|
|
864
|
-
console.log('Rehydrated:', original);
|
|
865
|
-
|
|
866
|
-
await anonymizer.dispose();
|
|
867
|
-
}
|
|
868
|
-
|
|
869
|
-
demo().catch(console.error);
|
|
870
|
-
</script>
|
|
871
|
-
</body>
|
|
872
|
-
</html>
|
|
243
|
+
// Anonymize: irreversible, no PII map returned
|
|
244
|
+
const anonymizer = createAnonymizer({ mode: 'anonymize' });
|
|
873
245
|
```
|
|
874
246
|
|
|
875
|
-
###
|
|
876
|
-
|
|
877
|
-
- **First-use downloads**: NER model (~280 MB) and semantic data (~12 MB) are downloaded on first use
|
|
878
|
-
- **ONNX runtime**: Automatically loaded from CDN if not bundled
|
|
879
|
-
- **Offline support**: After initial download, everything works offline
|
|
880
|
-
- **Storage**: Uses IndexedDB and OPFS - data persists across sessions
|
|
881
|
-
|
|
882
|
-
### Bundler Support (Vite, webpack, esbuild)
|
|
883
|
-
|
|
884
|
-
The package uses [conditional exports](https://nodejs.org/api/packages.html#conditional-exports) to automatically provide a browser-safe build when bundling for the web. This means:
|
|
885
|
-
|
|
886
|
-
- **Automatic**: Vite, webpack, esbuild, and other modern bundlers will automatically use `dist/browser.js`
|
|
887
|
-
- **No Node.js modules**: The browser build excludes `SQLitePIIStorageProvider` and other Node.js-specific code
|
|
888
|
-
- **Tree-shakable**: Only the code you use is included in your bundle
|
|
889
|
-
|
|
890
|
-
```json
|
|
891
|
-
// package.json exports (simplified)
|
|
892
|
-
{
|
|
893
|
-
"exports": {
|
|
894
|
-
".": {
|
|
895
|
-
"browser": "./dist/browser.js",
|
|
896
|
-
"node": "./dist/index.js",
|
|
897
|
-
"default": "./dist/index.js"
|
|
898
|
-
}
|
|
899
|
-
}
|
|
900
|
-
}
|
|
901
|
-
```
|
|
902
|
-
|
|
903
|
-
**Explicit imports** (if needed):
|
|
247
|
+
### Custom Recognizers
|
|
904
248
|
|
|
905
249
|
```typescript
|
|
906
|
-
|
|
907
|
-
import { createAnonymizer } from 'rehydra/browser';
|
|
250
|
+
import { createCustomIdRecognizer, PIIType } from 'rehydra';
|
|
908
251
|
|
|
909
|
-
|
|
910
|
-
|
|
252
|
+
const recognizer = createCustomIdRecognizer([{
|
|
253
|
+
name: 'Order Number',
|
|
254
|
+
pattern: /\bORD-[A-Z0-9]{8}\b/g,
|
|
255
|
+
type: PIIType.CASE_ID,
|
|
256
|
+
}]);
|
|
911
257
|
|
|
912
|
-
|
|
913
|
-
import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';
|
|
258
|
+
anonymizer.getRegistry().register(recognizer);
|
|
914
259
|
```
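
Before registering a pattern, it can help to check what it actually matches. The sketch below runs the same regular expression with plain `RegExp.prototype.exec`, independent of rehydra; `findOrderNumbers` is a hypothetical helper used only for this check.

```typescript
// Standalone check of the order-number pattern from the example above.
function findOrderNumbers(text: string): { value: string; start: number }[] {
  // A fresh regex per call: the /g flag makes exec() stateful via lastIndex.
  const pattern = /\bORD-[A-Z0-9]{8}\b/g;
  const hits: { value: string; start: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    hits.push({ value: m[0], start: m.index });
  }
  return hits;
}

const sample = 'Orders ORD-1A2B3C4D and ORD-12345 shipped; see ORD-ZZZZ9999X.';
console.log(findOrderNumbers(sample));
// Only ORD-1A2B3C4D matches: ORD-12345 is too short, and the trailing \b
// rejects ORD-ZZZZ9999X because a ninth word character follows.
```

The word boundaries and the fixed `{8}` length are what keep near-misses out, so a pattern without them would tag more text than intended.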
|
|
915
260
|
|
|
916
|
-
|
|
917
|
-
- `SQLitePIIStorageProvider` (use `IndexedDBPIIStorageProvider` instead)
|
|
918
|
-
- Node.js `fs`, `path`, `os` modules
|
|
919
|
-
|
|
920
|
-
**Browser build includes:**
|
|
921
|
-
- All recognizers (email, phone, IBAN, etc.)
|
|
922
|
-
- NER model support (with `onnxruntime-web`)
|
|
923
|
-
- Semantic enrichment
|
|
924
|
-
- `InMemoryPIIStorageProvider`
|
|
925
|
-
- `IndexedDBPIIStorageProvider`
|
|
926
|
-
- All crypto utilities
|
|
261
|
+
### GPU Acceleration
|
|
927
262
|
|
|
928
|
-
|
|
263
|
+
For high-throughput batch processing, use a remote inference server with a GPU:
|
|
929
264
|
|
|
930
|
-
|
|
931
|
-
|
|
932
|
-
|
|
933
|
-
|
|
265
|
+
```typescript
|
|
266
|
+
const anonymizer = createAnonymizer({
|
|
267
|
+
ner: {
|
|
268
|
+
backend: 'inference-server',
|
|
269
|
+
inferenceServerUrl: 'http://localhost:8080',
|
|
270
|
+
},
|
|
271
|
+
});
|
|
934
272
|
```
|
|
935
273
|
|
|
936
|
-
|
|
937
|
-
|
|
938
|
-
## Performance
|
|
939
|
-
|
|
940
|
-
Benchmarks on Apple M-series (CPU) and NVIDIA T4 (GPU). Run `npm run benchmark:compare` to measure on your hardware.
|
|
274
|
+
## Encryption
|
|
941
275
|
|
|
942
|
-
|
|
276
|
+
PII maps are encrypted with **AES-256-GCM** via the Web Crypto API.
|
|
943
277
|
|
|
944
|
-
|
|
945
|
-
|
|
946
|
-
|
|
947
|
-
| **NER CPU** | 4.3 ms | 26 ms | 93 ms | 13 ms |
|
|
948
|
-
| **NER GPU** | 62 ms | 73 ms | 117 ms | 68 ms |
|
|
949
|
-
|
|
950
|
-
Local CPU inference is faster than GPU for typical workloads due to network overhead. GPU servers are beneficial for high-throughput batch processing where many requests can be parallelized.
|
|
951
|
-
|
|
952
|
-
### Throughput (ops/sec)
|
|
953
|
-
|
|
954
|
-
| Backend | Short | Medium | Long |
|
|
955
|
-
|---------|-------|--------|------|
|
|
956
|
-
| **Regex-only** | ~2,640 | ~2,017 | ~1,096 |
|
|
957
|
-
| **NER CPU** | ~234 | ~38 | ~11 |
|
|
958
|
-
| **NER GPU** | ~16 | ~14 | ~9 |
|
|
959
|
-
|
|
960
|
-
### Model Downloads
|
|
961
|
-
|
|
962
|
-
| Model | Size | First-Use Download |
|
|
963
|
-
|-------|------|-------------------|
|
|
964
|
-
| Quantized NER | ~265 MB | ~30s on fast connection |
|
|
965
|
-
| Standard NER | ~1.1 GB | ~2min on fast connection |
|
|
966
|
-
| Semantic Data | ~12 MB | ~5s on fast connection |
|
|
967
|
-
|
|
968
|
-
### Recommendations
|
|
969
|
-
|
|
970
|
-
| Use Case | Recommended Backend |
|
|
971
|
-
|----------|---------------------|
|
|
972
|
-
| Structured PII only (email, phone, IBAN) | Regex-only |
|
|
973
|
-
| General use with name/org/location detection | **NER CPU (default)** |
|
|
974
|
-
| High-throughput batch processing (1000s of docs) | NER GPU |
|
|
975
|
-
| Privacy-sensitive / zero-knowledge required | NER CPU (data never leaves device) |
|
|
278
|
+
```typescript
|
|
279
|
+
// Development: random key, lost on restart
|
|
280
|
+
const keyProvider = new InMemoryKeyProvider();
|
|
976
281
|
|
|
977
|
-
|
|
282
|
+
// Production: persistent key (generate with: openssl rand -base64 32)
|
|
283
|
+
const keyProvider = new ConfigKeyProvider(process.env.PII_ENCRYPTION_KEY!);
|
|
284
|
+
```
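
For readers unfamiliar with AES-256-GCM in the Web Crypto API, the following sketch shows a round trip with a base64 key such as the one produced by `openssl rand -base64 32`. It illustrates the primitive only and is not rehydra's encryption code: `importKey`, `encryptJson`, and `decryptJson` are hypothetical helpers, and the `{ iv, ciphertext }` shape is an assumption.

```typescript
import { webcrypto } from 'node:crypto'; // in browsers, use globalThis.crypto instead

const subtle = webcrypto.subtle;

interface SealedBox {
  iv: Uint8Array; // 96-bit nonce, must be unique per encryption under one key
  ciphertext: Uint8Array; // includes the GCM authentication tag
}

async function importKey(base64Key: string): Promise<webcrypto.CryptoKey> {
  const raw = Buffer.from(base64Key, 'base64');
  if (raw.length !== 32) throw new Error('AES-256 needs a 32-byte key');
  return subtle.importKey('raw', raw, { name: 'AES-GCM' }, false, ['encrypt', 'decrypt']);
}

async function encryptJson(key: webcrypto.CryptoKey, value: unknown): Promise<SealedBox> {
  const iv = webcrypto.getRandomValues(new Uint8Array(12)); // fresh IV per message
  const plaintext = new TextEncoder().encode(JSON.stringify(value));
  const ciphertext = await subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext);
  return { iv, ciphertext: new Uint8Array(ciphertext) };
}

async function decryptJson(key: webcrypto.CryptoKey, box: SealedBox): Promise<any> {
  // decrypt() rejects if the ciphertext or tag was modified.
  const plaintext = await subtle.decrypt({ name: 'AES-GCM', iv: box.iv }, key, box.ciphertext);
  return JSON.parse(new TextDecoder().decode(plaintext));
}
```

GCM is an authenticated mode, so a tampered map fails to decrypt instead of yielding corrupted values. A key that is lost, as with an in-memory key after a restart, leaves previously stored maps unreadable.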
|
|
978
285
|
|
|
979
|
-
##
|
|
286
|
+
## Platform Support
|
|
980
287
|
|
|
981
288
|
| Environment | Version | Notes |
|
|
982
289
|
|-------------|---------|-------|
|
|
983
290
|
| Node.js | >= 18.0.0 | Uses native `onnxruntime-node` |
|
|
984
|
-
| Bun | >= 1.0.0 |
|
|
985
|
-
| Browsers | Chrome 86+, Firefox 89+, Safari 15.4
|
|
291
|
+
| Bun | >= 1.0.0 | Install `onnxruntime-web`: `bun add rehydra onnxruntime-web` |
|
|
292
|
+
| Browsers | Chrome 86+, Firefox 89+, Safari 15.4+ | Uses OPFS for model storage |
|
|
986
293
|
|
|
987
|
-
|
|
988
|
-
|
|
989
|
-
```bash
|
|
990
|
-
# Install dependencies
|
|
991
|
-
npm install
|
|
992
|
-
|
|
993
|
-
# Run tests
|
|
994
|
-
npm test
|
|
995
|
-
|
|
996
|
-
# Build
|
|
997
|
-
npm run build
|
|
294
|
+
The browser build (`rehydra/browser`) automatically excludes Node.js dependencies. Modern bundlers (Vite, webpack, esbuild) select the right entry point via conditional exports.
|
|
998
295
|
|
|
999
|
-
|
|
1000
|
-
npm run lint
|
|
1001
|
-
```
|
|
1002
|
-
|
|
1003
|
-
### Building Custom Models
|
|
1004
|
-
|
|
1005
|
-
For development or custom models:
|
|
296
|
+
## Development
|
|
1006
297
|
|
|
1007
298
|
```bash
|
|
1008
|
-
#
|
|
1009
|
-
npm run
|
|
1010
|
-
npm
|
|
299
|
+
npm install # Install dependencies
|
|
300
|
+
npm run build # Compile TypeScript
|
|
301
|
+
npm test # Run tests (watch mode)
|
|
302
|
+
npm run test:run # Run tests once
|
|
303
|
+
npm run lint # ESLint
|
|
304
|
+
npm run setup:ner # Pre-download NER model (~280 MB)
|
|
305
|
+
npm run benchmark # Run benchmarks
|
|
306
|
+
|
|
307
|
+
# Integration tests (the proxy tests require API keys)
|
|
308
|
+
npm run test:streaming # No API key needed
|
|
309
|
+
OPENAI_API_KEY=... npm run test:proxy:openai -- --ner # OpenAI with NER
|
|
310
|
+
ANTHROPIC_API_KEY=... npm run test:proxy:anthropic -- --ner # Anthropic with NER
|
|
1011
311
|
```
|
|
1012
312
|
|
|
1013
313
|
## License
|