shroud-privacy 2.3.1 → 2.5.0
- package/README.md +49 -4
- package/dist/config-manager.js +13 -0
- package/dist/config.js +25 -0
- package/dist/detectors/regex.js +245 -0
- package/dist/field-scope.d.ts +25 -0
- package/dist/field-scope.js +92 -0
- package/dist/generators/codes.d.ts +65 -0
- package/dist/generators/codes.js +435 -8
- package/dist/hooks.js +59 -7
- package/dist/obfuscator.d.ts +1 -1
- package/dist/obfuscator.js +7 -1
- package/dist/types.d.ts +37 -0
- package/dist/types.js +12 -0
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -17,7 +17,7 @@
|
|
|
17
17
|
<a href="CHANGELOG.md">Changelog</a>
|
|
18
18
|
</p>
|
|
19
19
|
|
|
20
|
-
> Apache 2.0 · Zero runtime dependencies · Anthropic + OpenAI + Google supported · Prompt-caching friendly · Works with [OpenClaw](https://openclaw.ai) or any agent via [APP](#agent-privacy-protocol-app)
|
|
20
|
+
> Apache 2.0 · Zero runtime dependencies · Anthropic + OpenAI + Google supported · Prompt-caching friendly · Works with [OpenClaw](https://openclaw.ai), [Hermes Agent](https://github.com/nousresearch/hermes-agent), or any agent via [APP](#agent-privacy-protocol-app)
|
|
21
21
|
|
|
22
22
|
---
|
|
23
23
|
|
|
@@ -32,7 +32,7 @@ Frontier LLMs are transformative for infrastructure operations — network troub
|
|
|
32
32
|
- **Customer PII** — emails, phone numbers, national IDs, credit cards, physical addresses
|
|
33
33
|
- **Internal URLs** — wiki pages, Jira tickets, admin portals, API endpoints
|
|
34
34
|
|
|
35
|
-
Shroud sits between your agent and the LLM. It detects all of the above (
|
|
35
|
+
Shroud sits between your agent and the LLM. It detects all of the above (130+ entity types), replaces each with a deterministic format-preserving fake, and reverses the mapping on the way back. The LLM reasons over realistic-looking data. Your real infrastructure stays private.
|
|
36
36
|
|
|
37
37
|
### Who needs this
|
|
38
38
|
|
|
@@ -56,7 +56,7 @@ Shroud does not guarantee compliance — regex-based detection has limitations (
|
|
|
56
56
|
|
|
57
57
|
## What it does
|
|
58
58
|
|
|
59
|
-
1. **Detects**
|
|
59
|
+
1. **Detects** 130+ entity types: emails, IPs, phones, API keys, hostnames, SNMP communities, BGP ASNs, credit cards, SSNs, file paths, URLs, person/org/location names, VLANs, route-maps, ACLs, OSPF IDs, IBANs, JWTs, PEM certs, GPS coordinates, ICS/SCADA identifiers, dates of birth, medical record numbers (MRN/NPI/DEA), bank accounts (routing/sort code/SWIFT), tax IDs (EIN/UTR), passport numbers, driver's licenses, court case/docket/patent numbers, cryptocurrency addresses (Ethereum/Bitcoin), AWS ARNs, vendor-specific secrets (Cisco, Juniper, Palo Alto, Check Point, Fortinet, F5, Arista), and custom regex patterns.
|
|
60
60
|
2. **Replaces** each value with a deterministic fake (same input + key = same fake every time). Fakes are format-preserving: IPv4 stays in CGNAT range (`100.64.0.0/10`), IPv6 uses ULA range (`fd00::/8`), emails keep `@domain` structure, credit cards pass Luhn, etc.
|
|
61
61
|
3. **Passes through public URLs** — external URLs (arxiv.org, docs.stripe.com, etc.) are not obfuscated. Shroud resolves FQDNs via DNS: public IPs pass through, RFC 1918 / NXDOMAIN / internal IPs are obfuscated. Well-known platforms (GitHub, YouTube, Wikipedia, etc.) are always passed through.
|
|
62
62
|
4. **Deobfuscates** LLM responses and tool parameters so the user sees real values and tools receive real arguments.
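Item 2 in the list above (same input + key = same fake, with IPv4 kept inside `100.64.0.0/10`) can be illustrated with a minimal sketch. This is not Shroud's implementation: the FNV-1a hash and the names `fnv1a` and `fakeIpv4` are assumptions chosen to keep the example dependency-free, and a real implementation would presumably use a keyed cryptographic hash instead.

```javascript
// Hypothetical sketch: deterministic, format-preserving IPv4 replacement.
// FNV-1a is a stand-in for whatever keyed hash the plugin really uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function fakeIpv4(realIp, secretKey) {
  const seed = fnv1a(`${secretKey}|${realIp}`);
  // 100.64.0.0/10 leaves 22 host bits, so keep only the low 22 bits of the seed.
  const host = seed & 0x3fffff;
  const base = 100 * 2 ** 24 + 64 * 2 ** 16; // numeric value of 100.64.0.0
  const addr = base + host;
  return [
    Math.floor(addr / 2 ** 24) % 256,
    Math.floor(addr / 2 ** 16) % 256,
    Math.floor(addr / 2 ** 8) % 256,
    addr % 256,
  ].join(".");
}
```

Calling `fakeIpv4("10.1.2.3", "key")` twice returns the same address, and every result has a first octet of 100 and a second octet between 64 and 127, which is the property the README describes.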
|
|
@@ -102,6 +102,23 @@ openclaw plugins install shroud-privacy
|
|
|
102
102
|
|
|
103
103
|
Configure in `~/.openclaw/openclaw.json` under `plugins.entries."shroud-privacy".config`. No OpenClaw file modifications needed — Shroud uses runtime interception only.
|
|
104
104
|
|
|
105
|
+
### Hermes Agent
|
|
106
|
+
|
|
107
|
+
```bash
|
|
108
|
+
hermes plugins install wkeything/shroud
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
That's it. The plugin auto-builds on first session start (requires Node.js). All LLM traffic is obfuscated transparently — no Hermes configuration changes needed.
|
|
112
|
+
|
|
113
|
+
Per-tool field scoping is enabled by default, reducing false positives on structural fields (IDs, hashes, timestamps). Works with all Hermes-supported providers (OpenRouter, Anthropic, OpenAI, z.ai, local models).
|
|
114
|
+
|
|
115
|
+
**Config-as-code** is supported — edit `~/.shroud/shroud.config.json` to customize detection rules, field scoping, and confidence thresholds. Changes hot-reload within 2 seconds, no restart needed. The config file is shared with OpenClaw — edits apply to both platforms.
|
|
116
|
+
|
|
117
|
+
Verify after a conversation:
|
|
118
|
+
```bash
|
|
119
|
+
cat ~/.hermes/shroud-stats.json | python3 -m json.tool
|
|
120
|
+
```
|
|
121
|
+
|
|
105
122
|
### Any agent (via APP)
|
|
106
123
|
|
|
107
124
|
The **Agent Privacy Protocol** (APP) lets any AI agent add privacy and infrastructure protection — no OpenClaw required. Shroud ships with an APP server and a Python client.
|
|
@@ -230,9 +247,37 @@ Out of the box, Shroud:
|
|
|
230
247
|
| `redactionLevel` | `"full"` \| `"masked"` \| `"stats"` | `"full"` | Output mode: fake values, partial masking, or category placeholders |
|
|
231
248
|
| `dryRun` | boolean | `false` | Detect entities but don't replace (testing mode) |
|
|
232
249
|
| `maxStoreMappings` | number | `0` | Max mapping store size with LRU eviction (0 = unlimited) |
|
|
250
|
+
| `fieldScoping` | object | — | Per-tool field scoping and per-agent category exemptions (see below) |
|
|
233
251
|
|
|
234
252
|
> **Env var overrides:** `SHROUD_SECRET_KEY` and `SHROUD_PERSISTENT_SALT` override their respective config keys (priority: env var > plugin config > default).
|
|
235
253
|
|
|
254
|
+
### Per-tool field scoping
|
|
255
|
+
|
|
256
|
+
By default Shroud scans every string field in every message. This catches everything but produces false positives — file paths agents need, config values, UUIDs matching credit card patterns.
|
|
257
|
+
|
|
258
|
+
Field scoping narrows what gets scanned. Add a `fieldScoping` block to `shroud.config.json`:
|
|
259
|
+
|
|
260
|
+
```jsonc
|
|
261
|
+
{
|
|
262
|
+
"fieldScoping": {
|
|
263
|
+
"toolFields": {
|
|
264
|
+
"Read": { "scanFields": ["content", "text"] },
|
|
265
|
+
"Bash": { "scanFields": ["output", "stdout", "stderr"] },
|
|
266
|
+
"gmail_*": { "scanFields": ["subject", "body", "snippet", "from", "to"] },
|
|
267
|
+
"github_*": { "scanFields": ["title", "body", "description", "comment"] }
|
|
268
|
+
},
|
|
269
|
+
"neverScanFields": ["id", "created_at", "updated_at", "sha", "hash", "ref", "type", "status"],
|
|
270
|
+
"defaultScanFields": []
|
|
271
|
+
}
|
|
272
|
+
}
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
**`toolFields`** maps tool name patterns (wildcards `*` `?` supported) to the fields that should be scanned in their results. Unmatched tools fall back to `defaultScanFields` — set it to `[]` to scan everything for unknown tools (safe default).
|
|
276
|
+
|
|
277
|
+
**`neverScanFields`** lists structural fields that never contain user-generated content. These are skipped regardless of tool.
|
|
278
|
+
|
|
279
|
+
Hot-reloadable. No config = scan everything (backward compatible).
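The rules described above can be sketched as a small function. This is an illustration only: `fieldsToScan` is a hypothetical helper, it looks at top-level fields only, and how Shroud walks nested tool results is not shown in this diff.

```javascript
// Hypothetical helper: list which top-level fields of a tool result would be
// scanned under a fieldScoping block shaped like the README example.
function fieldsToScan(toolName, result, scoping) {
  // Wildcard pattern -> case-insensitive anchored regex (* = any run, ? = one char).
  const toRegex = (pattern) =>
    new RegExp(
      "^" +
        pattern
          .replace(/[.+^${}()|[\]\\]/g, "\\$&")
          .replace(/\*/g, ".*")
          .replace(/\?/g, ".") +
        "$",
      "i"
    );
  const never = new Set(scoping.neverScanFields ?? []);
  // The first matching tool pattern wins.
  const match = Object.entries(scoping.toolFields ?? {}).find(([pattern]) =>
    toRegex(pattern).test(toolName)
  );
  const selected = match ? match[1].scanFields : scoping.defaultScanFields ?? [];
  // No matching tool and an empty default list means "scan everything".
  const scanAll = !match && selected.length === 0;
  return Object.keys(result).filter(
    (field) => !never.has(field) && (scanAll || selected.includes(field))
  );
}

const scoping = {
  toolFields: {
    "gmail_*": { scanFields: ["subject", "body", "snippet", "from", "to"] },
  },
  neverScanFields: ["id", "created_at", "updated_at", "sha", "hash", "ref", "type", "status"],
  defaultScanFields: [],
};
const message = { id: "m-1", subject: "Hi", body: "...", labels: "inbox" };

fieldsToScan("gmail_search", message, scoping); // ["subject", "body"]
fieldsToScan("unknown_tool", message, scoping); // ["subject", "body", "labels"]
```

Note that `neverScanFields` applies in both cases: `id` is skipped even for the unmatched tool that otherwise scans everything.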
|
|
280
|
+
|
|
236
281
|
### Detection rules as code (hot-reload)
|
|
237
282
|
|
|
238
283
|
Shroud auto-generates a JSONC config file on first run containing every built-in detection rule:
|
|
@@ -409,7 +454,7 @@ shroud-stats --test "Contact john@acme.com" # test detection
|
|
|
409
454
|
|
|
410
455
|
## Entity categories
|
|
411
456
|
|
|
412
|
-
`person_name`, `email`, `phone`, `ip_address`, `api_key`, `url`, `org_name`, `location`, `file_path`, `credit_card`, `ssn`, `mac_address`, `hostname`, `snmp_community`, `bgp_asn`, `network_credential`, `vlan_id`, `interface_desc`, `route_map`, `ospf_id`, `acl_name`, `iban`, `national_id`, `jwt`, `ics_identifier`, `gps_coordinate`, `certificate`, `custom`
|
|
457
|
+
`person_name`, `email`, `phone`, `ip_address`, `api_key`, `url`, `org_name`, `location`, `file_path`, `credit_card`, `ssn`, `mac_address`, `hostname`, `snmp_community`, `bgp_asn`, `network_credential`, `vlan_id`, `interface_desc`, `route_map`, `ospf_id`, `acl_name`, `iban`, `national_id`, `jwt`, `ics_identifier`, `gps_coordinate`, `certificate`, `date_of_birth`, `medical_record_number`, `bank_account_number`, `tax_id`, `passport_number`, `drivers_license`, `case_number`, `cryptocurrency_address`, `aws_arn`, `custom`
|
|
413
458
|
|
|
414
459
|
---
|
|
415
460
|
|
package/dist/config-manager.js
CHANGED
|
@@ -154,6 +154,19 @@ export class ConfigManager {
|
|
|
154
154
|
" // Edit rules here. Changes hot-reload within 2 seconds (no restart needed).",
|
|
155
155
|
" // Priority: env vars > this file > plugin config > defaults.",
|
|
156
156
|
" //",
|
|
157
|
+
" // Per-tool field scoping — controls which fields get scanned for PII.",
|
|
158
|
+
" // Reduces false positives from structural fields (IDs, hashes, timestamps).",
|
|
159
|
+
' "fieldScoping": {',
|
|
160
|
+
' "toolFields": {',
|
|
161
|
+
' "Read": { "scanFields": ["content", "text"] },',
|
|
162
|
+
' "read": { "scanFields": ["content", "text"] },',
|
|
163
|
+
' "Bash": { "scanFields": ["output", "stdout", "stderr"] },',
|
|
164
|
+
' "exec": { "scanFields": ["output", "stdout", "stderr"] }',
|
|
165
|
+
" },",
|
|
166
|
+
' "neverScanFields": ["id", "created_at", "updated_at", "sha", "hash", "ref", "type", "status", "state", "mode"],',
|
|
167
|
+
' "defaultScanFields": []',
|
|
168
|
+
" },",
|
|
169
|
+
" //",
|
|
157
170
|
" // Rule format:",
|
|
158
171
|
' // "rule_name": {',
|
|
159
172
|
' // "pattern": "regex string", // override or define the detection regex',
|
package/dist/config.js
CHANGED
|
@@ -82,6 +82,31 @@ export function resolveConfig(pluginConfig) {
|
|
|
82
82
|
dryRun: typeof raw.dryRun === "boolean" ? raw.dryRun : false,
|
|
83
83
|
// LRU store eviction (0 = unlimited)
|
|
84
84
|
maxStoreMappings: typeof raw.maxStoreMappings === "number" ? raw.maxStoreMappings : 0,
|
|
85
|
+
// Field scoping (optional, backward compatible)
|
|
86
|
+
fieldScoping: (() => {
|
|
87
|
+
const fs = raw.fieldScoping;
|
|
88
|
+
if (!fs || typeof fs !== "object")
|
|
89
|
+
return undefined;
|
|
90
|
+
const fsc = fs;
|
|
91
|
+
const toolFields = {};
|
|
92
|
+
if (fsc.toolFields && typeof fsc.toolFields === "object") {
|
|
93
|
+
for (const [pattern, rule] of Object.entries(fsc.toolFields)) {
|
|
94
|
+
if (rule && typeof rule === "object" && Array.isArray(rule.scanFields)) {
|
|
95
|
+
toolFields[pattern] = { scanFields: rule.scanFields.filter((f) => typeof f === "string") };
|
|
96
|
+
}
|
|
97
|
+
}
|
|
98
|
+
}
|
|
99
|
+
return {
|
|
100
|
+
toolFields,
|
|
101
|
+
neverScanFields: Array.isArray(fsc.neverScanFields)
|
|
102
|
+
? fsc.neverScanFields.filter((f) => typeof f === "string")
|
|
103
|
+
: [],
|
|
104
|
+
defaultScanFields: Array.isArray(fsc.defaultScanFields)
|
|
105
|
+
? fsc.defaultScanFields.filter((f) => typeof f === "string")
|
|
106
|
+
: [],
|
|
107
|
+
useContractExemptions: typeof fsc.useContractExemptions === "boolean" ? fsc.useContractExemptions : false,
|
|
108
|
+
};
|
|
109
|
+
})(),
|
|
85
110
|
};
|
|
86
111
|
return config;
|
|
87
112
|
}
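The normalization added to `resolveConfig` silently drops malformed entries rather than rejecting the config. A standalone sketch that mirrors the hunk above makes this easier to see; `normalizeFieldScoping` and `onlyStrings` are hypothetical names introduced for illustration, not exports of the package.

```javascript
// Sketch mirroring the fieldScoping normalization in resolveConfig above.
function normalizeFieldScoping(raw) {
  // Absent or non-object input disables field scoping entirely.
  if (!raw || typeof raw !== "object") return undefined;
  const onlyStrings = (value) =>
    Array.isArray(value) ? value.filter((f) => typeof f === "string") : [];
  const toolFields = {};
  if (raw.toolFields && typeof raw.toolFields === "object") {
    for (const [pattern, rule] of Object.entries(raw.toolFields)) {
      // Rules without a scanFields array are skipped, not reported.
      if (rule && typeof rule === "object" && Array.isArray(rule.scanFields)) {
        toolFields[pattern] = { scanFields: onlyStrings(rule.scanFields) };
      }
    }
  }
  return {
    toolFields,
    neverScanFields: onlyStrings(raw.neverScanFields),
    defaultScanFields: onlyStrings(raw.defaultScanFields),
    useContractExemptions:
      typeof raw.useContractExemptions === "boolean" ? raw.useContractExemptions : false,
  };
}

const normalized = normalizeFieldScoping({
  toolFields: { Read: { scanFields: ["content", 42] }, Bad: "oops" },
  neverScanFields: "id", // not an array, so it becomes []
});
// normalized.toolFields   -> { Read: { scanFields: ["content"] } }
// normalized.neverScanFields -> []
```

One consequence worth keeping in mind when editing the config by hand: a typo such as a string where an array is expected does not raise an error, it just widens or narrows what gets scanned.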
|
package/dist/detectors/regex.js
CHANGED
|
@@ -1179,6 +1179,251 @@ export const BUILTIN_PATTERNS = [
|
|
|
1179
1179
|
category: Category.ACL_NAME,
|
|
1180
1180
|
confidence: 0.85,
|
|
1181
1181
|
},
|
|
1182
|
+
// ===================================================================
|
|
1183
|
+
// Healthcare — HIPAA-relevant identifiers
|
|
1184
|
+
// ===================================================================
|
|
1185
|
+
// --- Date of birth (context-triggered) ---
|
|
1186
|
+
{
|
|
1187
|
+
// "DOB: 03/15/1987" or "DOB 1987-03-15" or "date of birth: 03/15/1987"
|
|
1188
|
+
name: "dob_keyword",
|
|
1189
|
+
pattern: /(?:DOB|date\s+of\s+birth|birthdate|born\s+on|birth\s+date|geburtsdatum)\s*[:=]?\s*(\d{1,2}[\/\-.]\d{1,2}[\/\-.]\d{2,4})/gi,
|
|
1190
|
+
category: Category.DATE_OF_BIRTH,
|
|
1191
|
+
confidence: 0.9,
|
|
1192
|
+
},
|
|
1193
|
+
{
|
|
1194
|
+
// ISO date after DOB keyword: "DOB: 1987-03-15"
|
|
1195
|
+
name: "dob_iso",
|
|
1196
|
+
pattern: /(?:DOB|date\s+of\s+birth|birthdate|birth\s+date)\s*[:=]?\s*(\d{4}-\d{2}-\d{2})/gi,
|
|
1197
|
+
category: Category.DATE_OF_BIRTH,
|
|
1198
|
+
confidence: 0.9,
|
|
1199
|
+
},
|
|
1200
|
+
{
|
|
1201
|
+
// Written month: "born on March 15, 1987" or "DOB: January 3, 1990"
|
|
1202
|
+
name: "dob_written",
|
|
1203
|
+
pattern: /(?:DOB|date\s+of\s+birth|birthdate|born\s+on|birth\s+date)\s*[:=]?\s*((?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4})/gi,
|
|
1204
|
+
category: Category.DATE_OF_BIRTH,
|
|
1205
|
+
confidence: 0.9,
|
|
1206
|
+
},
|
|
1207
|
+
// --- Medical record number (MRN) ---
|
|
1208
|
+
{
|
|
1209
|
+
// "MRN: 12345678" or "medical record number: ABC-123456"
|
|
1210
|
+
name: "mrn_keyword",
|
|
1211
|
+
pattern: /(?:MRN|medical\s*record\s*(?:number|no|#)?|patient\s*(?:id|number|#))\s*[:=#]?\s*([A-Z0-9][\w\-]{3,19})/gi,
|
|
1212
|
+
category: Category.MEDICAL_RECORD_NUMBER,
|
|
1213
|
+
confidence: 0.9,
|
|
1214
|
+
},
|
|
1215
|
+
{
|
|
1216
|
+
// US NPI (National Provider Identifier): 10-digit number with "NPI" context
|
|
1217
|
+
name: "npi_number",
|
|
1218
|
+
pattern: /(?:NPI|national\s*provider)\s*[:=#]?\s*(\d{10})\b/gi,
|
|
1219
|
+
category: Category.MEDICAL_RECORD_NUMBER,
|
|
1220
|
+
confidence: 0.95,
|
|
1221
|
+
},
|
|
1222
|
+
{
|
|
1223
|
+
// US DEA number: 2 letters + 7 digits (e.g., "AB1234567")
|
|
1224
|
+
name: "dea_number",
|
|
1225
|
+
pattern: /(?:DEA|drug\s*enforcement)\s*[:=#]?\s*([A-Z][A-Z9]\d{7})\b/gi,
|
|
1226
|
+
category: Category.MEDICAL_RECORD_NUMBER,
|
|
1227
|
+
confidence: 0.95,
|
|
1228
|
+
},
|
|
1229
|
+
{
|
|
1230
|
+
// Health insurance policy/member/subscriber ID
|
|
1231
|
+
name: "health_insurance_id",
|
|
1232
|
+
pattern: /(?:(?:health\s*)?(?:insurance|policy|member|subscriber|group)\s*(?:id|number|no|#))\s*[:=#]?\s*([A-Z0-9][\w\-]{3,19})/gi,
|
|
1233
|
+
category: Category.MEDICAL_RECORD_NUMBER,
|
|
1234
|
+
confidence: 0.85,
|
|
1235
|
+
},
|
|
1236
|
+
// ===================================================================
|
|
1237
|
+
// Finance — bank accounts, routing numbers, tax IDs
|
|
1238
|
+
// ===================================================================
|
|
1239
|
+
// --- Bank account numbers (context-triggered) ---
|
|
1240
|
+
{
|
|
1241
|
+
// US routing number (ABA/transit): 9 digits with context
|
|
1242
|
+
name: "us_routing_number",
|
|
1243
|
+
pattern: /(?:routing|ABA|transit)\s*(?:number|no|#)?\s*[:=#]?\s*(\d{9})\b/gi,
|
|
1244
|
+
category: Category.BANK_ACCOUNT_NUMBER,
|
|
1245
|
+
confidence: 0.9,
|
|
1246
|
+
},
|
|
1247
|
+
{
|
|
1248
|
+
// Bank account number with context keyword
|
|
1249
|
+
name: "bank_account_keyword",
|
|
1250
|
+
pattern: /(?:(?:bank\s*)?account|acct)\s*(?:number|no|#)\s*[:=#]?\s*(\d{4,17})\b/gi,
|
|
1251
|
+
category: Category.BANK_ACCOUNT_NUMBER,
|
|
1252
|
+
confidence: 0.85,
|
|
1253
|
+
},
|
|
1254
|
+
{
|
|
1255
|
+
// UK sort code: XX-XX-XX with context
|
|
1256
|
+
name: "uk_sort_code",
|
|
1257
|
+
pattern: /(?:sort\s*code)\s*[:=#]?\s*(\d{2}-\d{2}-\d{2})\b/gi,
|
|
1258
|
+
category: Category.BANK_ACCOUNT_NUMBER,
|
|
1259
|
+
confidence: 0.9,
|
|
1260
|
+
},
|
|
1261
|
+
{
|
|
1262
|
+
// SWIFT/BIC code: 8 or 11 alphanumeric characters (e.g., DEUTDEFF, BOFAUS3NXXX)
|
|
1263
|
+
name: "swift_bic",
|
|
1264
|
+
pattern: /(?:SWIFT|BIC|SWIFT\/BIC)\s*[:=#]?\s*([A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?)\b/gi,
|
|
1265
|
+
category: Category.BANK_ACCOUNT_NUMBER,
|
|
1266
|
+
confidence: 0.9,
|
|
1267
|
+
},
|
|
1268
|
+
{
|
|
1269
|
+
// Standalone SWIFT/BIC pattern (distinctive 8/11-char format)
|
|
1270
|
+
name: "swift_bic_standalone",
|
|
1271
|
+
pattern: /\b([A-Z]{4}(?:AT|DE|CH|GB|US|FR|IT|ES|NL|BE|AU|CA|JP|CN|IN|BR|MX|ZA|SE|NO|DK|FI|IE|PT|CZ|PL|HU|RO|BG|HR|SK|SI|LT|LV|EE|MT|CY|LU|GR)[A-Z0-9]{2}(?:[A-Z0-9]{3})?)\b/g,
|
|
1272
|
+
category: Category.BANK_ACCOUNT_NUMBER,
|
|
1273
|
+
confidence: 0.85,
|
|
1274
|
+
},
|
|
1275
|
+
// --- Tax IDs ---
|
|
1276
|
+
{
|
|
1277
|
+
// US EIN (Employer Identification Number): XX-XXXXXXX with context
|
|
1278
|
+
name: "us_ein",
|
|
1279
|
+
pattern: /(?:EIN|employer\s*(?:identification|id)\s*(?:number|no|#)?|tax\s*(?:id|identification)\s*(?:number|no|#)?)\s*[:=#]?\s*(\d{2}-\d{7})\b/gi,
|
|
1280
|
+
category: Category.TAX_ID,
|
|
1281
|
+
confidence: 0.9,
|
|
1282
|
+
},
|
|
1283
|
+
{
|
|
1284
|
+
// TIN/tax number generic context-triggered
|
|
1285
|
+
name: "tax_id_generic",
|
|
1286
|
+
pattern: /(?:TIN|tax\s*(?:id|number|identification)|taxpayer\s*(?:id|number))\s*[:=#]?\s*(\d[\d\-]{5,12})\b/gi,
|
|
1287
|
+
category: Category.TAX_ID,
|
|
1288
|
+
confidence: 0.85,
|
|
1289
|
+
},
|
|
1290
|
+
{
|
|
1291
|
+
// UK UTR (Unique Taxpayer Reference): 10 digits with context
|
|
1292
|
+
name: "uk_utr",
|
|
1293
|
+
pattern: /(?:UTR|unique\s*taxpayer\s*(?:reference|ref))\s*[:=#]?\s*(\d{10})\b/gi,
|
|
1294
|
+
category: Category.TAX_ID,
|
|
1295
|
+
confidence: 0.9,
|
|
1296
|
+
},
|
|
1297
|
+
// ===================================================================
|
|
1298
|
+
// Legal / Identity Documents
|
|
1299
|
+
// ===================================================================
|
|
1300
|
+
// --- Passport numbers (context-triggered) ---
|
|
1301
|
+
{
|
|
1302
|
+
// Generic context: "passport no: P12345678" or "passport number: 123456789"
|
|
1303
|
+
name: "passport_keyword",
|
|
1304
|
+
pattern: /(?:passport)\s*(?:number|no|#)?\s*[:=#]?\s*([A-Z0-9]{6,12})\b/gi,
|
|
1305
|
+
category: Category.PASSPORT_NUMBER,
|
|
1306
|
+
confidence: 0.9,
|
|
1307
|
+
},
|
|
1308
|
+
{
|
|
1309
|
+
// German Reisepass: "Reisepass: C01X00T47"
|
|
1310
|
+
name: "passport_de",
|
|
1311
|
+
pattern: /(?:Reisepass|Personalausweis)\s*(?:Nr|number|no|#)?\.?\s*[:=#]?\s*([CFGHJKLMNPRTVWXYZ0-9]{9})\b/gi,
|
|
1312
|
+
category: Category.PASSPORT_NUMBER,
|
|
1313
|
+
confidence: 0.9,
|
|
1314
|
+
},
|
|
1315
|
+
{
|
|
1316
|
+
// Travel document context: "travel document: XX1234567"
|
|
1317
|
+
name: "travel_document",
|
|
1318
|
+
pattern: /(?:travel\s*document)\s*(?:number|no|#)?\s*[:=#]?\s*([A-Z0-9]{6,12})\b/gi,
|
|
1319
|
+
category: Category.PASSPORT_NUMBER,
|
|
1320
|
+
confidence: 0.85,
|
|
1321
|
+
},
|
|
1322
|
+
// --- Driver's license (context-triggered) ---
|
|
1323
|
+
{
|
|
1324
|
+
// Generic: "driver's license: A1234567" or "DL: 12345678"
|
|
1325
|
+
name: "drivers_license_keyword",
|
|
1326
|
+
pattern: /(?:driver'?s?\s*licen[sc]e|DL|driving\s*licen[sc]e|F[üu]hrerschein)\s*(?:number|no|#|Nr)?\.?\s*[:=#]?\s*([A-Z0-9][\w\-]{3,19})\b/gi,
|
|
1327
|
+
category: Category.DRIVERS_LICENSE,
|
|
1328
|
+
confidence: 0.9,
|
|
1329
|
+
},
|
|
1330
|
+
{
|
|
1331
|
+
// US California format: letter + 7 digits with context
|
|
1332
|
+
name: "dl_california",
|
|
1333
|
+
pattern: /(?:driver'?s?\s*licen[sc]e|DL)\s*(?:number|no|#)?\s*[:=#]?\s*([A-Z]\d{7})\b/gi,
|
|
1334
|
+
category: Category.DRIVERS_LICENSE,
|
|
1335
|
+
confidence: 0.9,
|
|
1336
|
+
},
|
|
1337
|
+
{
|
|
1338
|
+
// License plate / vehicle registration (context-triggered, requires : or = separator)
|
|
1339
|
+
name: "license_plate",
|
|
1340
|
+
pattern: /(?:licen[sc]e\s*plate|registration\s*(?:number|no|#|plate)|vehicle\s*(?:plate|reg|registration)|Kennzeichen|plaque)\s*[:=#]\s*([A-Z][A-Z0-9\-]{1,11}[A-Z0-9])\b/gi,
|
|
1341
|
+
category: Category.DRIVERS_LICENSE,
|
|
1342
|
+
confidence: 0.85,
|
|
1343
|
+
},
|
|
1344
|
+
// --- Court case / docket numbers ---
|
|
1345
|
+
{
|
|
1346
|
+
// US federal court: "1:23-cv-01234" or "2:24-cr-00567"
|
|
1347
|
+
name: "us_federal_case",
|
|
1348
|
+
pattern: /\b(\d{1,2}:\d{2}-[a-z]{2}-\d{3,6})\b/g,
|
|
1349
|
+
category: Category.CASE_NUMBER,
|
|
1350
|
+
confidence: 0.95,
|
|
1351
|
+
},
|
|
1352
|
+
{
|
|
1353
|
+
// Generic case/docket number with context keyword
|
|
1354
|
+
name: "case_number_keyword",
|
|
1355
|
+
pattern: /(?:case|docket|cause|filing)\s*(?:number|no|#)?\s*[:=#]?\s*([A-Z0-9][\w\-\/]{3,24})\b/gi,
|
|
1356
|
+
category: Category.CASE_NUMBER,
|
|
1357
|
+
confidence: 0.85,
|
|
1358
|
+
},
|
|
1359
|
+
{
|
|
1360
|
+
// Patent number: "US12345678" or "EP1234567" or "WO2024123456"
|
|
1361
|
+
name: "patent_number",
|
|
1362
|
+
pattern: /\b((?:US|EP|WO|JP|CN|KR|AU|CA)\s?\d{4,12}(?:\s?[AB]\d?)?)\b/g,
|
|
1363
|
+
category: Category.CASE_NUMBER,
|
|
1364
|
+
confidence: 0.85,
|
|
1365
|
+
},
|
|
1366
|
+
{
|
|
1367
|
+
// Aktenzeichen (German file reference): "Az.: 1 BvR 123/45" or "Az. 12 O 456/23"
|
|
1368
|
+
name: "aktenzeichen",
|
|
1369
|
+
pattern: /(?:Az\.?|Aktenzeichen)\s*[:=]?\s*(\d{1,3}\s+[A-Z][a-z]*\s+\d{1,5}\/\d{2,4})\b/gi,
|
|
1370
|
+
category: Category.CASE_NUMBER,
|
|
1371
|
+
confidence: 0.9,
|
|
1372
|
+
},
|
|
1373
|
+
// ===================================================================
|
|
1374
|
+
// Cloud / Crypto
|
|
1375
|
+
// ===================================================================
|
|
1376
|
+
// --- Cryptocurrency addresses (standalone — distinctive formats) ---
|
|
1377
|
+
{
|
|
1378
|
+
// Ethereum address: 0x + 40 hex chars
|
|
1379
|
+
name: "crypto_ethereum",
|
|
1380
|
+
pattern: /\b(0x[0-9a-fA-F]{40})\b/g,
|
|
1381
|
+
category: Category.CRYPTOCURRENCY_ADDRESS,
|
|
1382
|
+
confidence: 0.95,
|
|
1383
|
+
},
|
|
1384
|
+
{
|
|
1385
|
+
// Bitcoin P2PKH: starts with 1, 25-34 base58 chars
|
|
1386
|
+
name: "crypto_bitcoin_p2pkh",
|
|
1387
|
+
pattern: /\b(1[a-km-zA-HJ-NP-Z1-9]{25,34})\b/g,
|
|
1388
|
+
category: Category.CRYPTOCURRENCY_ADDRESS,
|
|
1389
|
+
confidence: 0.9,
|
|
1390
|
+
},
|
|
1391
|
+
{
|
|
1392
|
+
// Bitcoin P2SH: starts with 3, 25-34 base58 chars
|
|
1393
|
+
name: "crypto_bitcoin_p2sh",
|
|
1394
|
+
pattern: /\b(3[a-km-zA-HJ-NP-Z1-9]{25,34})\b/g,
|
|
1395
|
+
category: Category.CRYPTOCURRENCY_ADDRESS,
|
|
1396
|
+
confidence: 0.9,
|
|
1397
|
+
},
|
|
1398
|
+
{
|
|
1399
|
+
// Bitcoin Bech32: bc1 + 39-59 lowercase alphanum
|
|
1400
|
+
name: "crypto_bitcoin_bech32",
|
|
1401
|
+
pattern: /\b(bc1[a-z0-9]{39,59})\b/g,
|
|
1402
|
+
category: Category.CRYPTOCURRENCY_ADDRESS,
|
|
1403
|
+
confidence: 0.95,
|
|
1404
|
+
},
|
|
1405
|
+
{
|
|
1406
|
+
// Crypto wallet/address with context keyword
|
|
1407
|
+
name: "crypto_wallet_keyword",
|
|
1408
|
+
pattern: /(?:wallet|crypto|bitcoin|ethereum|eth|btc)\s*(?:address|addr|id)?\s*[:=#]?\s*([a-zA-Z0-9]{26,64})\b/gi,
|
|
1409
|
+
category: Category.CRYPTOCURRENCY_ADDRESS,
|
|
1410
|
+
confidence: 0.85,
|
|
1411
|
+
},
|
|
1412
|
+
// --- AWS ARN (standalone — distinctive format) ---
|
|
1413
|
+
{
|
|
1414
|
+
// Full ARN: arn:aws:service:region:account-id:resource (account may be empty for S3)
|
|
1415
|
+
name: "aws_arn",
|
|
1416
|
+
pattern: /\b(arn:aws[a-z\-]*:[a-z0-9\-]+:[a-z0-9\-]*:\d{0,12}:[^\s"']{1,128})\b/g,
|
|
1417
|
+
category: Category.AWS_ARN,
|
|
1418
|
+
confidence: 0.95,
|
|
1419
|
+
},
|
|
1420
|
+
{
|
|
1421
|
+
// AWS account ID with context: 12-digit number after "account" keyword
|
|
1422
|
+
name: "aws_account_id",
|
|
1423
|
+
pattern: /(?:(?:aws|amazon)\s*)?account\s*(?:id|#|number)?\s*[:=#]?\s*(\d{12})\b/gi,
|
|
1424
|
+
category: Category.AWS_ARN,
|
|
1425
|
+
confidence: 0.85,
|
|
1426
|
+
},
|
|
1182
1427
|
];
|
|
1183
1428
|
/**
|
|
1184
1429
|
* Tracks occupied text spans and answers overlap queries in O(log n)
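A quick way to sanity-check the new patterns is to run them against sample text. The sketch below copies two patterns verbatim from the hunk above; the `captures` helper is hypothetical. It shows that the standalone federal case pattern is quite specific, while the keyword-triggered pattern, whose separator and `number|no|#` suffix are both optional, can also fire on ordinary prose. That is presumably one of the false-positive sources that confidence values and field scoping are meant to mitigate.

```javascript
// Two patterns copied verbatim from BUILTIN_PATTERNS in the hunk above.
const caseNumberKeyword =
  /(?:case|docket|cause|filing)\s*(?:number|no|#)?\s*[:=#]?\s*([A-Z0-9][\w\-\/]{3,24})\b/gi;
const usFederalCase = /\b(\d{1,2}:\d{2}-[a-z]{2}-\d{3,6})\b/g;

// Hypothetical helper: return the first capture group of every match.
function captures(pattern, text) {
  return [...text.matchAll(pattern)].map((m) => m[1]);
}

captures(usFederalCase, "Filed as 1:23-cv-01234 last week."); // ["1:23-cv-01234"]
captures(caseNumberKeyword, "Docket no: 2024-AB-0099");       // ["2024-AB-0099"]
captures(caseNumberKeyword, "The case closed yesterday.");    // ["closed"]
```

The last call matches because the `i` flag lets `[A-Z0-9]` accept a lowercase letter and nothing between the keyword and the captured token is mandatory.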
|
|
package/dist/field-scope.d.ts
@@ -0,0 +1,25 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Per-tool field scoping and per-agent category exemptions for obfuscation.
|
|
3
|
+
*
|
|
4
|
+
* Reduces false positives by only scanning relevant fields per tool and
|
|
5
|
+
* exempting entity categories that the agent's contract allows.
|
|
6
|
+
*/
|
|
7
|
+
import type { FieldScopingConfig, ScopeDecision } from "./types.js";
|
|
8
|
+
export declare class FieldScopeResolver {
|
|
9
|
+
private readonly toolPatterns;
|
|
10
|
+
private readonly neverScanFields;
|
|
11
|
+
private readonly defaultScanFields;
|
|
12
|
+
private readonly useContractExemptions;
|
|
13
|
+
private readonly enabled;
|
|
14
|
+
constructor(config?: FieldScopingConfig);
|
|
15
|
+
/** Resolve which fields to scan for a given tool name. */
|
|
16
|
+
resolveToolScope(toolName: string): ScopeDecision;
|
|
17
|
+
/**
|
|
18
|
+
* Resolve which entity categories are exempt from obfuscation for an agent.
|
|
19
|
+
* Uses the agent contract's allowedDataClasses — those categories are data
|
|
20
|
+
* the agent is trusted to handle, so obfuscating them is counterproductive.
|
|
21
|
+
*/
|
|
22
|
+
resolveAgentExemptions(agentLabel: string, role: string): Set<string>;
|
|
23
|
+
/** Check if a field should be scanned given the resolved scope. */
|
|
24
|
+
shouldScanField(fieldName: string, scope: ScopeDecision): boolean;
|
|
25
|
+
}
|
|
package/dist/field-scope.js
@@ -0,0 +1,92 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Per-tool field scoping and per-agent category exemptions for obfuscation.
|
|
3
|
+
*
|
|
4
|
+
* Reduces false positives by only scanning relevant fields per tool and
|
|
5
|
+
* exempting entity categories that the agent's contract allows.
|
|
6
|
+
*/
|
|
7
|
+
/** Simple wildcard matching (supports * and ?). */
|
|
8
|
+
function wildcardMatch(value, pattern) {
|
|
9
|
+
const escaped = pattern
|
|
10
|
+
.replace(/[.+^${}()|[\]\\]/g, "\\$&")
|
|
11
|
+
.replace(/\*/g, ".*")
|
|
12
|
+
.replace(/\?/g, ".");
|
|
13
|
+
return new RegExp(`^${escaped}$`, "i").test(value);
|
|
14
|
+
}
|
|
15
|
+
export class FieldScopeResolver {
|
|
16
|
+
toolPatterns;
|
|
17
|
+
neverScanFields;
|
|
18
|
+
defaultScanFields;
|
|
19
|
+
useContractExemptions;
|
|
20
|
+
enabled;
|
|
21
|
+
constructor(config) {
|
|
22
|
+
if (!config) {
|
|
23
|
+
this.enabled = false;
|
|
24
|
+
this.toolPatterns = [];
|
|
25
|
+
this.neverScanFields = new Set();
|
|
26
|
+
this.defaultScanFields = new Set();
|
|
27
|
+
this.useContractExemptions = false;
|
|
28
|
+
return;
|
|
29
|
+
}
|
|
30
|
+
this.enabled = true;
|
|
31
|
+
this.neverScanFields = new Set(config.neverScanFields ?? []);
|
|
32
|
+
this.defaultScanFields = new Set(config.defaultScanFields ?? []);
|
|
33
|
+
this.useContractExemptions = config.useContractExemptions ?? false;
|
|
34
|
+
this.toolPatterns = [];
|
|
35
|
+
for (const [pattern, rule] of Object.entries(config.toolFields ?? {})) {
|
|
36
|
+
this.toolPatterns.push({ pattern, scanFields: new Set(rule.scanFields) });
|
|
37
|
+
}
|
|
38
|
+
}
|
|
39
|
+
/** Resolve which fields to scan for a given tool name. */
|
|
40
|
+
resolveToolScope(toolName) {
|
|
41
|
+
if (!this.enabled) {
|
|
42
|
+
return { mode: "all", scanFields: new Set(), neverScanFields: new Set() };
|
|
43
|
+
}
|
|
44
|
+
// Find first matching tool pattern
|
|
45
|
+
for (const { pattern, scanFields } of this.toolPatterns) {
|
|
46
|
+
if (wildcardMatch(toolName, pattern)) {
|
|
47
|
+
return {
|
|
48
|
+
mode: "selected",
|
|
49
|
+
scanFields,
|
|
50
|
+
neverScanFields: this.neverScanFields,
|
|
51
|
+
};
|
|
52
|
+
}
|
|
53
|
+
}
|
|
54
|
+
// No match — use default
|
|
55
|
+
if (this.defaultScanFields.size > 0) {
|
|
56
|
+
return {
|
|
57
|
+
mode: "selected",
|
|
58
|
+
scanFields: this.defaultScanFields,
|
|
59
|
+
neverScanFields: this.neverScanFields,
|
|
60
|
+
};
|
|
61
|
+
}
|
|
62
|
+
// Empty defaultScanFields = scan everything (backward compatible)
|
|
63
|
+
return { mode: "all", scanFields: new Set(), neverScanFields: this.neverScanFields };
|
|
64
|
+
}
|
|
65
|
+
/**
|
|
66
|
+
* Resolve which entity categories are exempt from obfuscation for an agent.
|
|
67
|
+
* Uses the agent contract's allowedDataClasses — those categories are data
|
|
68
|
+
* the agent is trusted to handle, so obfuscating them is counterproductive.
|
|
69
|
+
*/
|
|
70
|
+
resolveAgentExemptions(agentLabel, role) {
|
|
71
|
+
if (!this.enabled || !this.useContractExemptions) {
|
|
72
|
+
return new Set();
|
|
73
|
+
}
|
|
74
|
+
// contracts.ts only exists on feature/transformer — graceful fallback
|
|
75
|
+
try {
|
|
76
|
+
const { resolveAgentContract } = require("./contracts.js");
|
|
77
|
+
const contract = resolveAgentContract(agentLabel, role);
|
|
78
|
+
return new Set(contract.allowedDataClasses);
|
|
79
|
+
}
|
|
80
|
+
catch {
|
|
81
|
+
return new Set();
|
|
82
|
+
}
|
|
83
|
+
}
|
|
84
|
+
/** Check if a field should be scanned given the resolved scope. */
|
|
85
|
+
shouldScanField(fieldName, scope) {
|
|
86
|
+
if (scope.neverScanFields.has(fieldName))
|
|
87
|
+
return false;
|
|
88
|
+
if (scope.mode === "all")
|
|
89
|
+
return true;
|
|
90
|
+
return scope.scanFields.has(fieldName);
|
|
91
|
+
}
|
|
92
|
+
}
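Two properties of `resolveToolScope` follow from the code above and are easy to miss when writing a config: matching is case-insensitive, and the first matching pattern wins. The sketch below copies `wildcardMatch` from the file; `firstMatchingPattern` and the sample `toolFields` object are hypothetical and exist only to demonstrate the behavior.

```javascript
// wildcardMatch copied from field-scope.js above.
function wildcardMatch(value, pattern) {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp(`^${escaped}$`, "i").test(value);
}

// Hypothetical helper: which configured pattern would be picked for a tool?
// Object.keys preserves insertion order for these string keys.
function firstMatchingPattern(toolName, toolFields) {
  return Object.keys(toolFields).find((p) => wildcardMatch(toolName, p));
}

const toolFields = {
  "Read": { scanFields: ["content", "text"] },
  "read": { scanFields: ["content", "text"] },
  "github_*": { scanFields: ["title", "body"] },
  "github_issue_?": { scanFields: ["comment"] },
};

firstMatchingPattern("read", toolFields);           // "Read"
firstMatchingPattern("GitHub_issue_1", toolFields); // "github_*"
firstMatchingPattern("Bash", toolFields);           // undefined
```

Under these rules the separate `"Read"` and `"read"` entries in the generated config behave identically, and a more specific pattern such as `github_issue_?` only takes effect if it is listed before the broader `github_*`.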
|
|
package/dist/generators/codes.d.ts
CHANGED
|
@@ -14,7 +14,72 @@ export declare class CodeGenerator implements BaseGenerator {
|
|
|
14
14
|
_fakeSsn(seed: number): string;
|
|
15
15
|
_fakePhone(seed: number, original: string): string;
|
|
16
16
|
_fakeIban(seed: number, original: string): string;
|
|
17
|
+
/**
|
|
18
|
+
* Format-aware national ID generator.
|
|
19
|
+
* Preserves structure per detected sub-type rather than generic zero-padding.
|
|
20
|
+
*/
|
|
17
21
|
_fakeNationalId(seed: number, original: string): string;
|
|
18
22
|
_fakeJwt(seed: number): string;
|
|
23
|
+
/**
|
|
24
|
+
* GPS coordinate generator — distributes across plausible world locations
|
|
25
|
+
* instead of clustering near null island (0,0).
|
|
26
|
+
*/
|
|
19
27
|
_fakeGps(seed: number): string;
|
|
28
|
+
/**
|
|
29
|
+
* ICS/SCADA identifier — format-aware per sub-type.
|
|
30
|
+
* Preserves structure for OPC UA endpoints, Modbus addresses, BACnet IDs, etc.
|
|
31
|
+
*/
|
|
32
|
+
_fakeIcsId(seed: number, original: string): string;
|
|
33
|
+
/**
|
|
34
|
+
* Certificate generator — produces structurally valid PEM-like blocks
|
|
35
|
+
* instead of [REDACTED-CERT-XXXX] placeholders.
|
|
36
|
+
*/
|
|
37
|
+
_fakeCertificate(seed: number, original: string): string;
|
|
38
|
+
/**
|
|
39
|
+
* Format-preserving fake date of birth.
|
|
40
|
+
* Shifts the date by a deterministic offset (30-300 days) derived from seed,
|
|
41
|
+
* preserving the original format (MM/DD/YYYY, YYYY-MM-DD, DD.MM.YYYY, written month).
|
|
42
|
+
*/
|
|
43
|
+
_fakeDob(seed: number, original: string): string;
|
|
44
|
+
/**
|
|
45
|
+
* Fake medical record / provider ID.
|
|
46
|
+
* Preserves format structure (letter+digits, pure digits, with dashes).
|
|
47
|
+
*/
|
|
48
|
+
_fakeMedicalId(seed: number, original: string): string;
|
|
49
|
+
/**
|
|
50
|
+
* Fake bank account / routing number.
|
|
51
|
+
* Preserves format (digit count, dashes, sort code format).
|
|
52
|
+
*/
|
|
53
|
+
_fakeBankAccount(seed: number, original: string): string;
|
|
54
|
+
/**
|
|
55
|
+
* Fake tax ID / EIN.
|
|
56
|
+
* Preserves format (XX-XXXXXXX for EIN, pure digits for others).
|
|
57
|
+
*/
|
|
58
|
+
_fakeTaxId(seed: number, original: string): string;
|
|
59
|
+
/**
|
|
60
|
+
* Fake passport number.
|
|
61
|
+
* Preserves format: letter prefix + digit count, or pure alphanumeric.
|
|
62
|
+
*/
|
|
63
|
+
_fakePassport(seed: number, original: string): string;
|
|
64
|
+
/**
|
|
65
|
+
* Fake driver's license / license plate.
|
|
66
|
+
* Preserves format structure (letter/digit positions, dashes, spaces).
|
|
67
|
+
*/
|
|
68
|
+
_fakeDriversLicense(seed: number, original: string): string;
|
|
69
|
+
/**
|
|
70
|
+
* Fake case / docket / patent number.
|
|
71
|
+
* Preserves format: digit:digit-letters-digits, or prefix + digits.
|
|
72
|
+
*/
|
|
73
|
+
_fakeCaseNumber(seed: number, original: string): string;
|
|
74
|
+
/**
|
|
75
|
+
* Fake cryptocurrency address.
|
|
76
|
+
* Preserves format: Ethereum (0x + 40 hex), Bitcoin P2PKH/P2SH (base58),
|
|
77
|
+
* Bitcoin Bech32 (bc1 + lowercase alphanum).
|
|
78
|
+
*/
|
|
79
|
+
_fakeCryptoAddress(seed: number, original: string): string;
|
|
80
|
+
/**
|
|
81
|
+
* Fake AWS ARN.
|
|
82
|
+
* Preserves service and resource type, replaces account ID and resource name.
|
|
83
|
+
*/
|
|
84
|
+
_fakeAwsArn(seed: number, original: string): string;
|
|
20
85
|
}
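The declarations above only document intent, since the matching `codes.js` hunk is not shown here. As a rough illustration of what the `_fakeDob` comment describes (a deterministic 30 to 300 day shift that keeps the original format), here is a minimal sketch for the ISO format only. `fakeDobIso` is a hypothetical name, and the seed-to-offset mapping, the direction of the shift, and the assumption of a non-negative integer seed are all guesses rather than the package's actual logic.

```javascript
// Hypothetical sketch of the behaviour documented for _fakeDob (ISO dates only).
// Assumes `seed` is a non-negative integer.
function fakeDobIso(seed, original) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(original);
  if (!match) return original; // other formats are out of scope for this sketch
  const offsetDays = 30 + (seed % 271); // always in the range 30..300
  // Date.UTC normalizes out-of-range day values, so subtracting is safe.
  const shifted = new Date(
    Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]) - offsetDays)
  );
  return shifted.toISOString().slice(0, 10); // keep the YYYY-MM-DD shape
}

fakeDobIso(0, "1987-03-15"); // "1987-02-13" (shifted back by 30 days)
```

The same seed and input always yield the same output, and the result still parses as an ISO date, which is the format-preserving property the generator comments emphasize.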
|