pseudonym-mcp 0.7.2 → 0.7.3
- package/README.md +86 -66
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,21 +1,23 @@
 # pseudonym-mcp
 
-Local
+Local pseudonymisation layer for LLM workflows — replaces detected PII with opaque tokens before the prompt reaches the cloud, then restores it on the way back.
 
 [](https://www.npmjs.com/package/pseudonym-mcp)
 [](LICENSE)
 [](#)
-[](#gdpr--ai-compliance)
+[](#)
 [](#)
 
-Sits between your application and any cloud LLM (Claude, GPT-4, Gemini…).
+Sits between your application and any cloud LLM (Claude, GPT-4, Gemini…). Detects PII locally and replaces it with opaque tokens before the prompt leaves your machine, then restores original values in the response — so users never see the tags.
+
+It is a **defense-in-depth measure**, not a compliance silver bullet. Read the [Limitations](#limitations) and [GDPR & AI Compliance](#gdpr--ai-compliance) sections before assuming this stack does more than it does.
 
 ## What you get
 
 - **Multi-language PII detection**: Built-in support for English (SSN, credit cards, US phone) and Polish (PESEL, IBAN, Polish phone). New **heuristic language detection** (`detectLanguage()`) infers the language from text content — `--lang` remains the authoritative override but is no longer the only input.
-- **Hybrid NER engine**: Regex for structured PII (SSN, credit cards, IBAN, email, phone) + local Ollama LLM for unstructured entities (names,
-- **
+- **Hybrid NER engine**: Regex for structured PII (SSN, credit cards, IBAN, email, phone) + local Ollama LLM for unstructured entities (names, organisations).
+- **Local-detection architecture**: Detection and substitution happen on your machine. The cloud LLM call still happens (that's the point) — but it sees tokens instead of detected PII.
 - **Session-keyed mapping store**: Tokens like `[PERSON:1]` map back to originals in an isolated, per-request session. Multiple round-trips preserve token coherence.
 - **Auto-unmask**: Optional mode that automatically restores tokens in the LLM's response before returning it to the user.
 - **Flexible engines**: Run `regex` only (no Ollama required), `llm` only, or `hybrid` (default).
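Reviewer note on the "Session-keyed mapping store" bullet in the hunk above: the behaviour it describes (one token per distinct value, isolated per session, reversible locally) can be pictured with a small sketch. This is an illustration only. `MappingStore` appears in the README's diagram, but the method names, the `storeFor` helper, and the internal maps below are assumptions, not the package's actual API.

```typescript
// Hypothetical sketch: method names and internals are illustrative,
// not taken from the pseudonym-mcp source.
class MappingStore {
  private readonly valueToToken = new Map<string, string>();
  private readonly tokenToValue = new Map<string, string>();
  private readonly counters = new Map<string, number>();

  // Return the existing token for (tag, value), or mint the next [TAG:N].
  tokenFor(tag: string, value: string): string {
    const key = `${tag}\u0000${value}`;
    const existing = this.valueToToken.get(key);
    if (existing !== undefined) return existing;

    const next = (this.counters.get(tag) ?? 0) + 1;
    this.counters.set(tag, next);
    const token = `[${tag}:${next}]`;
    this.valueToToken.set(key, token);
    this.tokenToValue.set(token, value);
    return token;
  }

  // Replace every known token with its original; unknown tokens stay as-is.
  unmask(text: string): string {
    return text.replace(/\[[A-Z_]+:\d+\]/g, (token) => this.tokenToValue.get(token) ?? token);
  }
}

// One store per session id keeps sessions isolated from each other.
const sessions = new Map<string, MappingStore>();

function storeFor(sessionId: string): MappingStore {
  let store = sessions.get(sessionId);
  if (store === undefined) {
    store = new MappingStore();
    sessions.set(sessionId, store);
  }
  return store;
}
```

Keying the lookup on the tag plus the value is what makes a repeated name map to the same token, and keeping the maps in memory matches the README's statement that mappings are not written to disk.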
@@ -27,53 +29,57 @@ Sits between your application and any cloud LLM (Claude, GPT-4, Gemini…). Repl
 
 ❌ **Without pseudonym-mcp:**
 
-- Prompt: `"John Smith, SSN 123-45-6789, card 4111 1111 1111 1111"` → sent verbatim to
-- Every name, ID number, and credit card in your prompt is processed and potentially logged by the
-- A
-- Sending personal data to a
+- Prompt: `"John Smith, SSN 123-45-6789, card 4111 1111 1111 1111"` → sent verbatim to the LLM provider
+- Every name, ID number, and credit card in your prompt is processed and potentially logged by the provider
+- A breach at the provider's end exposes those values in cleartext
+- Sending personal data to a non-EU LLM provider without further safeguards raises GDPR Article 44 questions you'll need to answer
 
 ✅ **With pseudonym-mcp:**
 
 - The same prompt becomes `"[PERSON:1], SSN [SSN:1], card [CREDIT_CARD:1]"` before it leaves your machine
-- The LLM reasons about
-- The response is
--
+- The LLM reasons about structure and content without seeing those detected values in cleartext
+- The response is locally de-tokenised before reaching the user
+- Detected direct identifiers are no longer shipped upstream — though structure, dates, indirect references, and any missed PII still are
+
+This is a meaningful reduction in cleartext PII exposure. It is **not** "no personal data leaves your machine" — see [Limitations](#limitations).
 
 ## GDPR & AI Compliance
 
-pseudonym-mcp
+pseudonym-mcp is relevant to compliance work, but it is a **technical control**, not a compliance product. Whether you are compliant with any specific regulation depends on your full stack, your role (controller/processor), your contracts, your DPIA, and your jurisdiction.
 
 ### Why this matters
 
-The EU **General Data Protection Regulation (GDPR)** classifies names, national ID numbers (like SSN or PESEL), bank account numbers (IBAN), email addresses, credit card numbers, and phone numbers as **personal data** under Article 4(1). Sending this data to a cloud LLM provider constitutes **processing** under Article 4(2)
+The EU **General Data Protection Regulation (GDPR)** classifies names, national ID numbers (like SSN or PESEL), bank account numbers (IBAN), email addresses, credit card numbers, and phone numbers as **personal data** under Article 4(1). Sending this data to a cloud LLM provider constitutes **processing** under Article 4(2). Pseudonymisation is explicitly recognised under Art. 4(5) as a risk-reduction measure — but, critically, **pseudonymised data is still personal data** (Recital 26).
 
-| GDPR Article | Obligation |
-| ------------ | -------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
-| Art. 5(1)(c) | **Data minimisation**
-| Art. 25 | **Privacy by design and by default** |
-| Art. 32 | **Security of processing**
-| Art. 44 | **Transfers to third countries**
-| Art. 4(5) | **Pseudonymisation**
+| GDPR Article | Obligation | Where pseudonym-mcp helps | Where it doesn't |
+| ------------ | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------- |
+| Art. 5(1)(c) | **Data minimisation** | Strips detected direct identifiers before transmission | Doesn't minimise context, structure, or undetected PII |
+| Art. 25 | **Privacy by design and by default** | Provides a technical layer that fits into a privacy-by-design architecture | Architecture and policy decisions are still your responsibility |
+| Art. 32 | **Security of processing** | Recognised technical measure under Recital 83 (pseudonymisation) | One control among many; doesn't replace access control, logging, encryption |
+| Art. 44 | **Transfers to third countries** | Reduces the cleartext PII you transfer | Pseudonymised personal data is still personal data — transfer rules still apply |
+| Art. 4(5) | **Pseudonymisation** definition | The mapping store is opaque to the cloud LLM; re-identification requires the local session | Re-identification is possible from context for anyone with side knowledge |
 
-> **
+> **The honest bottom line:** pseudonymisation under GDPR Art. 4(5) is **not** anonymisation. The data remains personal data in your system, and Art. 44 transfer obligations are not switched off just because you tokenised the name field.
 
 ### AI Act alignment
 
-The EU **AI Act**
+The EU **AI Act** places additional requirements on high-risk AI systems that process personal data. Using pseudonym-mcp as an intermediary layer can:
 
--
--
--
+- Support data minimisation in your AI system's data flows.
+- Help document a technical control for transparency and human-oversight requirements.
+- Align with the principle of **technical robustness and safety** (Art. 15) by limiting cleartext PII exposure.
+
+It does not change your AI Act risk classification on its own — classification is a function of use-case and deployment context, not of the masking step in front of the model.
 
 ### US & international applicability
 
-
+The tool is also relevant outside the EU, with the same caveats:
 
-- **CCPA / CPRA** (California) —
-- **HIPAA** (US healthcare) — PHI
-- **PCI DSS** (payment industry) —
-- **SOC 2** —
-- **PIPEDA** (Canada), **LGPD** (Brazil), **POPIA** (South Africa) — all require appropriate safeguards for cross-border personal data transfers.
+- **CCPA / CPRA** (California) — reduces personal information sent to third-party processors; doesn't change controller/business obligations or consumer rights.
+- **HIPAA** (US healthcare) — pseudonymised PHI is still PHI under HIPAA. Using this tool does **not** eliminate the need for a BAA with your cloud LLM provider if you're a covered entity or business associate. It can be part of a defensible safeguard posture; it cannot substitute for one.
+- **PCI DSS** (payment industry) — Luhn-validated detection reduces the chance card numbers ride in cleartext to an LLM. It is one control; PCI scope, segmentation, and storage rules are separate concerns.
+- **SOC 2** — useful evidence of a technical control limiting PII exposure. Auditors will look at the full picture, not just this layer.
+- **PIPEDA** (Canada), **LGPD** (Brazil), **POPIA** (South Africa) — all require appropriate safeguards for cross-border personal data transfers. This tool is a relevant safeguard, not a substitute for the legal basis of the transfer.
 
 ### Sector-specific applicability
 
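Reviewer note on the "Luhn-validated detection" wording in the PCI DSS bullet of the hunk above: a Luhn check is the standard mod-10 checksum used to filter out digit runs that cannot be card numbers. The sketch below shows the general technique. The function name and the 13 to 19 digit length bound are assumptions for illustration, not details confirmed by the package source.

```typescript
// Illustrative Luhn (mod 10) check; `passesLuhn` is a hypothetical name.
function passesLuhn(candidate: string): boolean {
  // Accept the common "4111 1111 1111 1111" and "4111-1111-..." spellings.
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false; // assumed length bound

  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

A candidate that fails this check would be left unmasked under the strict validation the README describes, which trades a few missed malformed numbers for fewer false positives on arbitrary digit sequences.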
@@ -82,11 +88,13 @@ While GDPR originates in the EU, pseudonym-mcp is equally relevant for:
 | Healthcare | GDPR + HIPAA + national health data laws | Patient names, SSN, diagnoses |
 | Banking & Finance | GDPR + PCI DSS + PSD2 + DORA | Credit cards, IBAN, SSN, PESEL |
 | HR & Recruitment | GDPR Art. 9 (special categories) | Names, national IDs, contact details |
-| Legal | GDPR + attorney
+| Legal | GDPR + attorney–client privilege | Names, case numbers, personal details |
 | Insurance | GDPR + Solvency II | Personal identifiers, health data |
 | Public Sector (US) | CCPA + state privacy laws | SSN, driver's license numbers |
 | Public Sector (PL) | GDPR + UODO + KRI | PESEL, NIP, REGON |
 
+In every row of this table, pseudonym-mcp is a useful **building block**. None of those regimes can be satisfied by a masking tool alone.
+
 ## How it works
 
 ```
@@ -102,7 +110,7 @@ Your App / Claude Desktop
 │ Phase 2: Ollama NER │ ← PERSON, ORG (local LLM)
 │ MappingStore (session) │ ← [TAG:N] ↔ original value
 └────────────┬────────────┘
-│
+│ sanitised prompt (detected PII → tokens)
 ▼
 Cloud LLM API
 (Claude / GPT-4 / Gemini)
@@ -154,17 +162,17 @@ We discussed a contract for 45 000 zł. Contact: jan.kowalski@acme.pl
 In Claude Code you type:
 
 ```
-Use mask_text on this note, then
+Use mask_text on this note, then summarise the key points of the meeting.
 ```
 
-**pseudonym-mcp replaces PII locally before
+**pseudonym-mcp replaces detected PII locally before the prompt goes upstream:**
 
 ```
 Meeting with [PERSON:1] ([PESEL:1]) from [ORG:1].
 We discussed a contract for 45 000 zł. Contact: [EMAIL:1]
 ```
 
-**Claude responds (
+**Claude responds (working from tokens):**
 
 ```
 Meeting with [PERSON:1] from [ORG:1] covered a contract
@@ -178,7 +186,7 @@ Meeting with Jan Kowalski from Acme sp. z o.o. covered
 a contract for 45 000 zł. Follow up via jan.kowalski@acme.pl
 ```
 
-
+The cloud provider saw the structure of the meeting and the amount — but not the detected name, PESEL, organisation, or email in cleartext. The swap happens on your machine.
 
 ### Obsidian vault with `session_id`
 
@@ -187,13 +195,13 @@ Anthropic / OpenAI never saw any real data. The entire swap happens on your mach
 Use mask_text on my notes — remember the session_id
 
 # ask Claude anything across multiple prompts
-
+Summarise all meetings from Q1
 
 # Claude replies with tokens; restore originals
 Use unmask_text with session_id abc123 on the response
 ```
 
-The `session_id` keeps the token map alive for the
+The `session_id` keeps the token map alive for the session — the same `[PERSON:1]` always refers to the same person across notes. That consistency is what makes cross-note reasoning possible; it is also what makes a masked corpus potentially re-identifiable to anyone with side knowledge of your work. Use long-lived sessions deliberately.
 
 ## MCP Prompt Templates
 
@@ -207,28 +215,28 @@ pseudonym-mcp ships two built-in prompt templates that chain masking, an LLM tas
 
 What happens:
 
-1. pseudonym-mcp masks PII locally → `[PERSON:1]`, `[PESEL:1]`
-2. Claude processes the
+1. pseudonym-mcp masks detected PII locally → `[PERSON:1]`, `[PESEL:1]`
+2. Claude processes the masked text
 3. pseudonym-mcp restores originals in the response
 
 Optional `lang` argument: `en` (default) or `pl`.
 
 ### `privacy_scan_file` — file / PDF (macOS only)
 
-> **Requires [macos-vision-mcp](https://github.com/woladi/macos-vision-mcp)** — a separate MCP server that uses Apple's Vision framework to extract text from PDFs and images. macOS only.
+> **Requires [macos-vision-mcp](https://github.com/woladi/macos-vision-mcp)** — a separate MCP server that uses Apple's Vision framework to extract text from PDFs and images on-device. macOS only.
 
 ```
-/privacy_scan_file filePath="/Users/me/contracts/nda.pdf" task="
+/privacy_scan_file filePath="/Users/me/contracts/nda.pdf" task="Summarise obligations and deadlines"
 ```
 
 What happens:
 
-1. macos-vision-mcp extracts text from the file
-2. pseudonym-mcp masks
-3. Claude processes the
+1. macos-vision-mcp extracts text from the file on-device
+2. pseudonym-mcp masks detected PII locally
+3. Claude processes the masked content
 4. pseudonym-mcp restores originals before the response is shown
 
-Optional arguments: `task` (default:
+Optional arguments: `task` (default: _summarise the key points_), `lang` (`en` or `pl`).
 
 ## Quick Start
 
@@ -244,7 +252,7 @@ claude mcp add pseudonym-mcp -- npx -y pseudonym-mcp --engines hybrid
 ollama pull llama3
 ```
 
-Skip this step if you only need regex-based masking (`--engines regex`).
+Skip this step if you only need regex-based masking (`--engines regex`). Without Ollama, you'll catch structured identifiers (SSN, IBAN, cards, email, phone, PESEL) but not free-form names and organisations.
 
 > **Global install** — if you prefer `npm install -g pseudonym-mcp`, replace `npx -y pseudonym-mcp` with `pseudonym-mcp` in all snippets below.
 
@@ -254,7 +262,7 @@ Restart your client. The `mask_text` and `unmask_text` tools appear automaticall
 
 | Tool | What it does | Example prompt |
 | ------------- | -------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
-| `mask_text` |
+| `mask_text` | Pseudonymise detected PII in text. Returns `masked_text` + `session_id`. | _"Use mask_text on this customer letter before summarising it"_ |
 | `unmask_text` | Restore original values from a session. Pass the `session_id` returned by `mask_text`. | _"Use unmask_text with session_id X to restore the response"_ |
 
 ### `mask_text` input
@@ -327,7 +335,7 @@ pseudonym-mcp --lang en --engines regex --ollama-model llama3 --auto-unmask
 | `--ollama-model` | Ollama model to use for NER |
 | `--ollama-base-url` | Ollama base URL |
 | `--config` | Path to a custom JSON config file |
-| `--auto-unmask` | Enable automatic response de-
+| `--auto-unmask` | Enable automatic response de-tokenisation |
 | `--custom-literals` | Comma-separated strings to always redact, e.g. `"Jan Kowalski,78091512345"` |
 
 ### Claude Code
@@ -368,6 +376,8 @@ Add to `~/.cursor/mcp.json`:
 
 ## Supported PII types
 
+Detection is best-effort. The patterns below are what the tool **looks for** — not a guarantee of what it will always catch. See [Limitations](#limitations) for known gaps.
+
 ### Custom literals
 
 | Tag | Detection | Match |
@@ -396,7 +406,7 @@ Custom literals are applied after the regex phase and before LLM NER, regardless
 | `PHONE` | `+1 (XXX) XXX-XXXX`, `XXX-XXX-XXXX`, `XXX.XXX.XXXX` | Format match |
 | `ZIP_CODE` | `XXXXX` or `XXXXX-XXXX` (paranoid mode only) | Format match |
 | `PERSON` | Full names | Ollama NER (hybrid / llm engines) |
-| `ORG` | Company /
+| `ORG` | Company / organisation names | Ollama NER (hybrid / llm engines) |
 
 ### Polish (`--lang pl`)
 
@@ -409,7 +419,7 @@ Custom literals are applied after the regex phase and before LLM NER, regardless
 | `NIP` | 10-digit tax ID (strict / paranoid modes) | Checksum (weights `[6,5,7,2,3,4,5,6,7]`) |
 | `POSTAL_CODE` | `XX-XXX` (paranoid mode only) | Format match |
 | `PERSON` | Full names | Ollama NER (hybrid / llm engines) |
-| `ORG` | Company /
+| `ORG` | Company / organisation names | Ollama NER (hybrid / llm engines) |
 
 ## Language Detection
 
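Reviewer note on the `NIP` row in the hunk above: the table names the checksum weights but not the rule. Under the standard NIP scheme, the first nine digits are multiplied by those weights, the sum is taken modulo 11, and the result must equal the tenth digit (a remainder of 10 can never be valid). A minimal sketch follows, assuming a hypothetical function name and assuming separators are stripped first; neither is confirmed by the package source.

```typescript
// Weights as listed in the README's NIP row.
const NIP_WEIGHTS = [6, 5, 7, 2, 3, 4, 5, 6, 7];

// Hypothetical helper, not the package's API.
function isValidNip(candidate: string): boolean {
  // Tolerate the common "123-456-32-18" spelling (an assumption).
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{10}$/.test(digits)) return false;

  const sum = NIP_WEIGHTS.reduce((acc, weight, i) => acc + weight * Number(digits[i]), 0);
  const check = sum % 11;
  // A remainder of 10 cannot be written as a single check digit.
  return check !== 10 && check === Number(digits[9]);
}
```

The same weighted-sum shape applies to the other checksum-validated identifiers in the table, each with its own weights and modulus.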
@@ -432,7 +442,7 @@ detectLanguage('Hello')
 | `confidence` | Score 0–1 from franc, or `null` when franc was not called |
 
 Texts shorter than 20 characters or with low confidence return `detected: 'unknown'`.
-The detector does not affect the current
+The detector does not affect the current pseudonymisation pipeline — `--lang` config remains authoritative.
 It is a building block for future multi-language and auto-select modes.
 
 ## Engine modes
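Reviewer note on the gating rule in the hunk above ("shorter than 20 characters or with low confidence"): the rule can be sketched as a thin wrapper around a scoring function. Everything named here is an assumption made for illustration: `gateDetection`, the `Scorer` type, and the `0.5` confidence threshold are hypothetical, and the scorer is injected so the sketch does not depend on franc's real API. Only the 20-character floor, the `'unknown'` result, and the `null` confidence come from the README text.

```typescript
type Lang = "en" | "pl";

interface Detection {
  detected: Lang | "unknown";
  confidence: number | null; // null when the scorer was never called
}

// Stand-in for the real language scorer; hypothetical signature.
type Scorer = (text: string) => { lang: Lang; score: number };

function gateDetection(
  text: string,
  scorer: Scorer,
  minLength = 20, // from the README
  minConfidence = 0.5, // assumed threshold, not documented
): Detection {
  // Too short to classify reliably: skip the scorer entirely.
  if (text.length < minLength) return { detected: "unknown", confidence: null };

  const { lang, score } = scorer(text);
  if (score < minConfidence) return { detected: "unknown", confidence: score };
  return { detected: lang, confidence: score };
}
```

Returning `null` rather than `0` for short inputs keeps "not measured" distinguishable from "measured and low", which matches the `confidence` row in the table.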
@@ -443,27 +453,37 @@ It is a building block for future multi-language and auto-select modes.
 | `llm` | Yes | No | Yes |
 | `hybrid` (default) | Yes (graceful fallback) | Yes | Yes |
 
-In `hybrid` mode, Ollama runs after the regex pass so the LLM never sees already-
+In `hybrid` mode, Ollama runs after the regex pass so the LLM never sees already-tokenised values. If Ollama is unreachable, the server logs a warning to stderr and returns the regex-only masked text — no crash, no hang.
 
 ## Privacy & Security notes
 
-
-
-- **
-- **
+Calibrated claims:
+
+- **No telemetry from the tool itself.** pseudonym-mcp makes no network requests except to your local Ollama instance and (optionally) the MCP stdio transport.
+- **In-memory mapping by default.** The mapping store is not written to disk. Sessions are scoped to the server process lifetime.
+- **Idempotent tokens within a session.** The same original value always maps to the same token (`[PERSON:1]` will not become `[PERSON:2]` for the same name on a second occurrence), preserving semantic coherence in LLM reasoning.
+- **No model training.** The local Ollama model operates offline. Your data is not used to train any model by this tool.
 - **Strict validation by default.** Invalid SSNs (area 000/666/900+), failed-Luhn credit card numbers, and invalid-checksum PESELs are not masked, preventing false positives from OCR errors or random digit sequences.
 
+What this does **not** guarantee:
+
+- That all PII in your input is detected.
+- That tokenised text is unlinkable to real people — re-identification from context is possible.
+- That the cloud provider can't learn sensitive things from structure, timing, or content.
+- Compliance with any specific regulation — that's a system-level property, not a tool-level one.
+
 ## Limitations
 
 pseudonym-mcp is a technical privacy control, not a legal guarantee of compliance.
 
-- Detection is best-effort
--
--
--
-- Re-identification is possible for anyone with access to the local mapping store
+- **Detection is best-effort.** False negatives and false positives are both possible. Indirect references (e.g. _"the tall guy from accounting"_, _"my landlord"_, _"the place near the bridge"_) are not detected. Nicknames, initials, and partial names are typically missed.
+- **Structure still travels.** Dates, amounts, relationships between tokens, narrative content, and any PII the detector missed all reach the cloud LLM. Tokenisation hides _who_, not _what kind of situation_.
+- **Pre-mask logging is your problem.** If your application logs plaintext before passing it to `mask_text`, this tool cannot help you.
+- **Process-local mapping.** Restarting the server ends the session and discards mappings. This is intentional.
+- **Re-identification is possible** for anyone with access to the local mapping store, and may be possible from context alone for anyone with side knowledge. This is pseudonymisation under GDPR Art. 4(5), not anonymisation.
+- **No legal advice.** Nothing in this README constitutes legal advice. Compliance is a system-level property — talk to your DPO, your compliance team, and your lawyers about your specific deployment.
 
-> Under GDPR Art. 4(5),
+> Under GDPR Art. 4(5) and Recital 26, pseudonymised data is still personal data. pseudonym-mcp substantially reduces cleartext PII exposure but does not eliminate your legal obligations.
 
 ## Development
 
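Reviewer note on the "Strict validation by default" bullet in the hunk above: two of the rules it names are easy to illustrate. The SSN rule is taken directly from the README (areas 000, 666, and 900 and above are rejected). The PESEL weights are the standard published ones; the README mentions the checksum but does not list them, so treat them as background knowledge rather than something read from the package source. Function names are hypothetical.

```typescript
// Standard PESEL weights for the first ten digits (not stated in the README).
const PESEL_WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3];

// Hypothetical helper: true when the 11th digit matches the weighted checksum.
function hasValidPeselChecksum(candidate: string): boolean {
  if (!/^\d{11}$/.test(candidate)) return false;
  const sum = PESEL_WEIGHTS.reduce((acc, weight, i) => acc + weight * Number(candidate[i]), 0);
  const control = (10 - (sum % 10)) % 10;
  return control === Number(candidate[10]);
}

// Hypothetical helper: applies only the area rule quoted in the README.
function hasPlausibleSsnArea(candidate: string): boolean {
  const match = /^(\d{3})-(\d{2})-(\d{4})$/.exec(candidate);
  if (match === null) return false;
  const area = Number(match[1]);
  return area !== 0 && area !== 666 && area < 900;
}
```

Candidates that fail these checks would be left in the text untouched, which is the trade-off the bullet describes: fewer false positives from OCR noise, at the cost of not masking malformed identifiers.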
package/package.json
CHANGED