shroud-privacy 2.2.6 → 2.2.8
- package/README.md +132 -225
- package/dist/detectors/regex.js +47 -0
- package/dist/dns-cache.d.ts +61 -0
- package/dist/dns-cache.js +224 -0
- package/dist/hooks.js +81 -0
- package/openclaw.plugin.json +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -2,27 +2,72 @@
|
|
|
2
2
|
<img src="logo.png" alt="Shroud" width="160" height="160">
|
|
3
3
|
</p>
|
|
4
4
|
|
|
5
|
-
<h1 align="center">Shroud
|
|
5
|
+
<h1 align="center">Shroud</h1>
|
|
6
6
|
|
|
7
7
|
<p align="center">
|
|
8
|
-
Privacy
|
|
8
|
+
<strong>Privacy and infrastructure protection for AI agents.</strong><br>
|
|
9
|
+
Prevents sensitive data from reaching LLMs — PII, network topology, credentials, OT/SCADA identifiers, and internal infrastructure details are replaced with deterministic fakes before any API call leaves the process. Responses are deobfuscated transparently so users and tools see real values.
|
|
9
10
|
</p>
|
|
10
11
|
|
|
11
|
-
>
|
|
12
|
+
<p align="center">
|
|
13
|
+
<a href="#install">Install</a> ·
|
|
14
|
+
<a href="#why-shroud">Why Shroud</a> ·
|
|
15
|
+
<a href="#configure">Configure</a> ·
|
|
16
|
+
<a href="#agent-privacy-protocol-app">APP Protocol</a> ·
|
|
17
|
+
<a href="CHANGELOG.md">Changelog</a>
|
|
18
|
+
</p>
|
|
19
|
+
|
|
20
|
+
> Apache 2.0 · Zero runtime dependencies · Works with [OpenClaw](https://openclaw.ai) or any agent via [APP](#agent-privacy-protocol-app)
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Why Shroud
|
|
25
|
+
|
|
26
|
+
Frontier LLMs are transformative for infrastructure operations — network troubleshooting, incident response, change planning, compliance audits. But every prompt you send is an API call to a third party. Without protection, you're transmitting:
|
|
27
|
+
|
|
28
|
+
- **Network topology** — subnets, VLANs, BGP ASNs, OSPF areas, interface descriptions, ACL names, route-maps
|
|
29
|
+
- **Device identities** — hostnames, management IPs, SNMP communities, firmware versions
|
|
30
|
+
- **Credentials** — API keys, connection strings, PSKs, enable secrets, TACACS/RADIUS shared keys
|
|
31
|
+
- **OT/SCADA identifiers** — Modbus addresses, OPC-UA endpoints, IEC 61850 IED names, historian tags, BACnet device IDs
|
|
32
|
+
- **Customer PII** — emails, phone numbers, national IDs, credit cards, physical addresses
|
|
33
|
+
- **Internal URLs** — wiki pages, Jira tickets, admin portals, API endpoints
|
|
34
|
+
|
|
35
|
+
Shroud sits between your agent and the LLM. It detects all of the above (100+ entity types), replaces each with a deterministic format-preserving fake, and reverses the mapping on the way back. The LLM reasons over realistic-looking data. Your real infrastructure stays private.
|
|
36
|
+
|
|
37
|
+
### Who needs this
|
|
38
|
+
|
|
39
|
+
| Sector | What leaks without Shroud |
|
|
40
|
+
|--------|--------------------------|
|
|
41
|
+
| **Telecoms & ISPs** | MPLS topologies, BGP peering, customer CPE configs, circuit IDs |
|
|
42
|
+
| **Energy & utilities** | SCADA/ICS endpoints, substation IPs, OPC-UA tags, DNP3 addresses |
|
|
43
|
+
| **Transport & aviation** | ATC sector IDs, NAV frequencies, signalling network topology |
|
|
44
|
+
| **Banking & finance** | Internal API endpoints, database connection strings, customer PII |
|
|
45
|
+
| **Healthcare** | Patient identifiers, internal system hostnames, API credentials |
|
|
46
|
+
| **Government & defence** | Classified network segments, device inventories, operational IPs |
|
|
47
|
+
| **Any enterprise** | Internal URLs, credentials, employee PII, customer data |
|
|
48
|
+
|
|
49
|
+
### Regulatory context
|
|
50
|
+
|
|
51
|
+
If you process personal data of EU residents, **GDPR Article 32** requires "appropriate technical measures" to protect it. Sending unredacted PII to a third-party LLM API is a data transfer — Shroud ensures detected PII never leaves your process. Similar obligations exist under CCPA, HIPAA, PCI-DSS, and sector-specific regulations (NIS2, NERC CIP, IEC 62443).
|
|
52
|
+
|
|
53
|
+
Shroud does not guarantee compliance — regex-based detection has limitations (see [SECURITY.md](SECURITY.md)). But it is a meaningful technical control that reduces exposure.
|
|
54
|
+
|
|
55
|
+
---
|
|
12
56
|
|
|
13
57
|
## What it does
|
|
14
58
|
|
|
15
|
-
1. **Detects** 100+ entity types: emails, IPs, phones, API keys, hostnames, SNMP communities, BGP ASNs, credit cards, SSNs, file paths, URLs, person/org/location names, VLANs, route-maps, ACLs, OSPF IDs, IBANs, JWTs, PEM certs, GPS coordinates, ICS/SCADA identifiers, Palo Alto
|
|
59
|
+
1. **Detects** 100+ entity types: emails, IPs, phones, API keys, hostnames, SNMP communities, BGP ASNs, credit cards, SSNs, file paths, URLs, person/org/location names, VLANs, route-maps, ACLs, OSPF IDs, IBANs, JWTs, PEM certs, GPS coordinates, ICS/SCADA identifiers, vendor-specific secrets (Cisco, Juniper, Palo Alto, Check Point, Fortinet, F5, Arista), and custom regex patterns.
|
|
16
60
|
2. **Replaces** each value with a deterministic fake (same input + key = same fake every time). Fakes are format-preserving: IPv4 stays in CGNAT range (`100.64.0.0/10`), IPv6 uses ULA range (`fd00::/8`), emails keep `@domain` structure, credit cards pass Luhn, etc.
|
|
17
|
-
3. **
|
|
18
|
-
4. **
|
|
61
|
+
3. **Passes through public URLs** — external URLs (arxiv.org, docs.stripe.com, etc.) are not obfuscated. Shroud resolves FQDNs via DNS: public IPs pass through, RFC 1918 / NXDOMAIN / internal IPs are obfuscated. Well-known platforms (GitHub, YouTube, Wikipedia, etc.) are always passed through.
|
|
62
|
+
4. **Deobfuscates** LLM responses and tool parameters so the user sees real values and tools receive real arguments.
|
|
63
|
+
5. **Audit logs** every event with counts, categories, char deltas, and optional proof hashes — never logging raw sensitive values.
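
The "same input + key = same fake" property from step 2 can be illustrated with a keyed hash. This is a minimal sketch, not Shroud's actual mapping code: the function name `fakeIPv4`, the choice of HMAC-SHA256, and the way bits are taken from the digest are all assumptions made for illustration. Only the target range (`100.64.0.0/10`) and the use of a secret key come from the README.

```javascript
const { createHmac } = require("node:crypto");

// Hypothetical helper (not part of Shroud's API): derive a fake IPv4
// address inside the CGNAT block 100.64.0.0/10 from a real address.
function fakeIPv4(realIp, secretKey) {
  // Keyed digest: the same (realIp, secretKey) pair always yields the same bytes.
  const digest = createHmac("sha256", secretKey).update(realIp).digest();
  // A /10 prefix leaves 22 host bits; keep the low 22 bits of the first 4 bytes.
  const hostBits = digest.readUInt32BE(0) & 0x3fffff;
  const base = ((100 << 24) | (64 << 16)) >>> 0; // 100.64.0.0 as an unsigned int
  const addr = (base | hostBits) >>> 0;
  return [addr >>> 24, (addr >>> 16) & 255, (addr >>> 8) & 255, addr & 255].join(".");
}

console.log(fakeIPv4("10.1.0.1", "example-key"));
```

An HMAC cannot be inverted, so restoring the real value on the way back still needs a stored fake-to-real mapping, and a real implementation would also have to handle two inputs colliding on the same fake.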
|
|
19
64
|
|
|
20
65
|
### Hook lifecycle
|
|
21
66
|
|
|
22
67
|
| Hook | Direction | What happens |
|
|
23
68
|
|------|-----------|-------------|
|
|
24
69
|
| `globalThis.fetch` intercept | User → LLM | Obfuscate all outbound LLM API requests; deobfuscate SSE responses per content block |
|
|
25
|
-
| `before_prompt_build` | User → LLM |
|
|
70
|
+
| `before_prompt_build` | User → LLM | Warm DNS cache for URL classification; pre-seed mapping store |
|
|
26
71
|
| `before_message_write` | Any → History | Deobfuscate assistant messages for transcript; re-obfuscate on next turn |
|
|
27
72
|
| `before_tool_call` | LLM → Tool | Deobfuscate tool parameters + track tool chain depth |
|
|
28
73
|
| `tool_result_persist` | Tool → History | Obfuscate tool results before storing |
|
|
@@ -30,27 +75,26 @@
|
|
|
30
75
|
| `globalThis.__shroudStreamDeobfuscate` | LLM → Agent | Streaming event deobfuscation hook |
|
|
31
76
|
| `globalThis.__shroudDeobfuscate` | Agent → Channel | Global deobfuscation hook — called by OpenClaw before ANY channel send |
|
|
32
77
|
|
|
33
|
-
> **
|
|
78
|
+
> **How it works:** Shroud intercepts ALL outbound LLM API calls (Anthropic, OpenAI, Google, any provider) at the `fetch` level and obfuscates detected entities in every message — including assistant history and Slack `<mailto:>` markup — before it leaves the process. On the response side, SSE streaming is deobfuscated per content block with buffered flushing. Every delivery path (Slack, WhatsApp, TUI, Telegram, Discord, Signal, web) gets real text automatically. Zero host patches required.
|
|
79
|
+
|
|
80
|
+
> **Requires OpenClaw 2026.3.24 or later.**
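
The request side of the `fetch`-level interception described above can be sketched as a wrapper around an existing fetch function. This is only an illustration under stated assumptions: `wrapFetch` and the injected `obfuscate` callback are hypothetical names, and the sketch ignores the response side (SSE deobfuscation) and non-string bodies.

```javascript
// Hypothetical sketch: wrap a fetch-like function so that string request
// bodies are passed through an obfuscation callback before being sent.
function wrapFetch(baseFetch, obfuscate) {
  return function shroudedFetch(input, init = {}) {
    if (typeof init.body === "string") {
      // Copy init so the caller's object is not mutated.
      init = { ...init, body: obfuscate(init.body) };
    }
    return baseFetch(input, init);
  };
}

// Usage with a stand-in fetch that just echoes what it would send.
const echoFetch = (url, init) => ({ url, sentBody: init.body });
const guarded = wrapFetch(echoFetch, (text) =>
  text.split("admin@acme.com").join("user@example.net")
);
console.log(guarded("https://llm.invalid/v1", { method: "POST", body: "Contact admin@acme.com" }));
```

Installing such a wrapper globally would amount to assigning it to `globalThis.fetch`, which is what lets the approach work without patching the host application.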
|
|
34
81
|
|
|
35
|
-
|
|
82
|
+
---
|
|
36
83
|
|
|
37
84
|
## Install
|
|
38
85
|
|
|
39
86
|
### OpenClaw (2026.3.24+)
|
|
40
87
|
|
|
41
88
|
```bash
|
|
42
|
-
#
|
|
43
|
-
openclaw --version
|
|
44
|
-
|
|
45
|
-
# Install Shroud
|
|
89
|
+
openclaw --version # ensure 2026.3.24+
|
|
46
90
|
openclaw plugins install shroud-privacy
|
|
47
91
|
```
|
|
48
92
|
|
|
49
|
-
Configure in `~/.openclaw/openclaw.json` under `plugins.entries."shroud-privacy".config`. No OpenClaw file modifications needed — Shroud uses runtime
|
|
93
|
+
Configure in `~/.openclaw/openclaw.json` under `plugins.entries."shroud-privacy".config`. No OpenClaw file modifications needed — Shroud uses runtime interception only.
|
|
50
94
|
|
|
51
95
|
### Any agent (via APP)
|
|
52
96
|
|
|
53
|
-
The **Agent Privacy Protocol** (APP) lets any AI agent add privacy
|
|
97
|
+
The **Agent Privacy Protocol** (APP) lets any AI agent add privacy and infrastructure protection — no OpenClaw required. Shroud ships with an APP server and a Python client.
|
|
54
98
|
|
|
55
99
|
```bash
|
|
56
100
|
npm install shroud-privacy
|
|
@@ -83,19 +127,19 @@ node node_modules/shroud-privacy/app-server.mjs node_modules/shroud-privacy/dist
|
|
|
83
127
|
|
|
84
128
|
Handshake (server writes on startup):
|
|
85
129
|
```json
|
|
86
|
-
{"app":"1.0","engine":"shroud","version":"2.2.
|
|
130
|
+
{"app":"1.0","engine":"shroud","version":"2.2.7","capabilities":["obfuscate","deobfuscate","batch","stats","health","configure","audit","partitions"]}
|
|
87
131
|
```
|
|
88
132
|
|
|
89
133
|
Obfuscate:
|
|
90
134
|
```json
|
|
91
|
-
|
|
92
|
-
|
|
135
|
+
> {"id":1,"method":"obfuscate","params":{"text":"Contact admin@acme.com"}}
|
|
136
|
+
< {"id":1,"result":{"text":"Contact user@example.net","entityCount":1,"categories":{"email":1},"modified":true}}
|
|
93
137
|
```
|
|
94
138
|
|
|
95
139
|
Deobfuscate:
|
|
96
140
|
```json
|
|
97
|
-
|
|
98
|
-
|
|
141
|
+
> {"id":2,"method":"deobfuscate","params":{"text":"Contact user@example.net"}}
|
|
142
|
+
< {"id":2,"result":{"text":"Contact admin@acme.com","replacementCount":1,"modified":true}}
|
|
99
143
|
```
|
|
100
144
|
|
|
101
145
|
Other methods: `reset`, `stats`, `health`, `configure`, `shutdown`.
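
The request and response shapes shown above can be produced and consumed with two small helpers. This is a sketch, assuming each message is a single JSON object on its own line and that failures arrive as an `error` object with `code` and `message`; the helper names `encodeRequest` and `decodeResponse` are hypothetical and not part of the shipped client.

```javascript
// Build one request line. The id lets the caller match the reply to the request.
function encodeRequest(id, method, params = {}) {
  return JSON.stringify({ id, method, params }) + "\n";
}

// Parse one response line and surface protocol errors as exceptions.
function decodeResponse(line) {
  const msg = JSON.parse(line);
  if (msg.error) {
    throw new Error(`APP error ${msg.error.code}: ${msg.error.message}`);
  }
  return { id: msg.id, result: msg.result };
}

const requestLine = encodeRequest(1, "obfuscate", { text: "Contact admin@acme.com" });
process.stdout.write(requestLine);
```

In a real integration the request line would be written to the APP server's stdin and the matching response line read from its stdout.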
|
|
@@ -113,12 +157,13 @@ openclaw gateway restart
|
|
|
113
157
|
## Updating
|
|
114
158
|
|
|
115
159
|
```bash
|
|
116
|
-
# Remove old plugin, reinstall from npm, restart
|
|
117
160
|
openclaw plugins remove shroud-privacy
|
|
118
161
|
openclaw plugins install shroud-privacy
|
|
119
162
|
openclaw gateway restart
|
|
120
163
|
```
|
|
121
164
|
|
|
165
|
+
---
|
|
166
|
+
|
|
122
167
|
## Configure
|
|
123
168
|
|
|
124
169
|
Edit `~/.openclaw/openclaw.json` under `plugins.entries."shroud-privacy".config`:
|
|
@@ -127,11 +172,7 @@ Edit `~/.openclaw/openclaw.json` under `plugins.entries."shroud-privacy".config`
|
|
|
127
172
|
"shroud-privacy": {
|
|
128
173
|
"enabled": true,
|
|
129
174
|
"config": {
|
|
130
|
-
// Recommended: safe defaults for community use
|
|
131
175
|
"auditEnabled": true // audit log on — see what Shroud is doing
|
|
132
|
-
// "auditIncludeProofHashes": false // off by default (opt-in)
|
|
133
|
-
// "auditMaxFakesSample": 0 // off by default (opt-in)
|
|
134
|
-
// "auditLogFormat": "human" // human-readable single lines
|
|
135
176
|
// "minConfidence": 0.0 // catch everything (default)
|
|
136
177
|
// "secretKey": "" // auto-generated if empty
|
|
137
178
|
// "persistentSalt": "" // set for cross-session consistency
|
|
@@ -151,29 +192,17 @@ openclaw gateway restart
|
|
|
151
192
|
Out of the box, Shroud:
|
|
152
193
|
- Auto-generates a secret key (per-session unless you set `secretKey`)
|
|
153
194
|
- Detects all entity categories at confidence >= 0.0
|
|
195
|
+
- Passes through public URLs (DNS-verified) and well-known platforms
|
|
154
196
|
- Logs audit lines (counts + categories) but **not** proof hashes or fake samples
|
|
155
|
-
- Never logs raw values, real
|
|
156
|
-
|
|
157
|
-
To enable proof hashes and fake samples for deeper audit:
|
|
197
|
+
- Never logs raw values, real-to-fake mappings, or original text
|
|
158
198
|
|
|
159
|
-
|
|
160
|
-
"config": {
|
|
161
|
-
"auditEnabled": true,
|
|
162
|
-
"auditIncludeProofHashes": true,
|
|
163
|
-
"auditHashTruncate": 12,
|
|
164
|
-
"auditMaxFakesSample": 3
|
|
165
|
-
}
|
|
166
|
-
```
|
|
167
|
-
|
|
168
|
-
## Config reference
|
|
169
|
-
|
|
170
|
-
### Core settings
|
|
199
|
+
### Config reference
|
|
171
200
|
|
|
172
201
|
| Key | Type | Default | Description |
|
|
173
202
|
|-----|------|---------|-------------|
|
|
174
203
|
| `secretKey` | string | auto | HMAC secret for deterministic mapping |
|
|
175
204
|
| `persistentSalt` | string | `""` | Fixed salt for cross-session consistency |
|
|
176
|
-
| `minConfidence` | number | `0.0` | Minimum detector confidence (0.0
|
|
205
|
+
| `minConfidence` | number | `0.0` | Minimum detector confidence (0.0-1.0) |
|
|
177
206
|
| `allowlist` | string[] | `[]` | Values to never obfuscate |
|
|
178
207
|
| `denylist` | string[] | `[]` | Values to always obfuscate |
|
|
179
208
|
| `canaryEnabled` | boolean | `false` | Inject tracking tokens for leak detection |
|
|
@@ -206,56 +235,38 @@ Disable or tune individual detection rules by name. Rule names match the built-i
|
|
|
206
235
|
}
|
|
207
236
|
```
|
|
208
237
|
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
### Conversational tools
|
|
212
|
-
|
|
213
|
-
Shroud registers tools that the LLM can call during conversations:
|
|
214
|
-
|
|
215
|
-
| Tool | What it does |
|
|
216
|
-
|------|-------------|
|
|
217
|
-
| `shroud-stats` | Show all detection rules with status, confidence, hit counts, store size, and config summary |
|
|
238
|
+
---
|
|
218
239
|
|
|
219
|
-
|
|
220
|
-
|
|
221
|
-
```bash
|
|
222
|
-
node ~/.openclaw/extensions/shroud-privacy/scripts/shroud-stats.mjs # live rule table
|
|
223
|
-
node ~/.openclaw/extensions/shroud-privacy/scripts/shroud-stats.mjs --json # JSON output
|
|
224
|
-
node ~/.openclaw/extensions/shroud-privacy/scripts/shroud-stats.mjs --test "Contact john@acme.com"
|
|
225
|
-
```
|
|
226
|
-
|
|
227
|
-
Tip: create an alias for convenience:
|
|
228
|
-
```bash
|
|
229
|
-
alias shroud-stats="node ~/.openclaw/extensions/shroud-privacy/scripts/shroud-stats.mjs"
|
|
230
|
-
```
|
|
240
|
+
## URL handling
|
|
231
241
|
|
|
232
|
-
|
|
242
|
+
Shroud distinguishes between internal and external URLs:
|
|
233
243
|
|
|
234
|
-
|
|
244
|
+
- **External URLs pass through.** When Shroud detects a URL, it checks the FQDN against a DNS cache populated in the `before_prompt_build` hook. If the domain resolves to a public IP, the URL is not obfuscated — the LLM needs to see real URLs for tool calls like `fetch` and `web_search`. Well-known platforms (GitHub, YouTube, Wikipedia, Stack Overflow, npm, PyPI, etc.) always pass through regardless of DNS.
|
|
235
245
|
|
|
236
|
-
|
|
246
|
+
- **Internal URLs are obfuscated.** Domains that resolve to RFC 1918 addresses (10.x, 172.16-31.x, 192.168.x), CGNAT, link-local, loopback, or that fail DNS resolution (NXDOMAIN, timeout) are treated as internal infrastructure and obfuscated.
|
|
237
247
|
|
|
238
|
-
**
|
|
248
|
+
- **DNS cache miss = obfuscate.** If the FQDN hasn't been resolved yet (first message in a session, DNS timeout), the URL is obfuscated as a safe default. The cache warms on each turn, so subsequent mentions of the same domain will pass through if it's public.
|
|
239
249
|
|
|
240
|
-
|
|
250
|
+
| URL | Resolves to | Action |
|
|
251
|
+
|-----|-------------|--------|
|
|
252
|
+
| `https://arxiv.org/abs/2301.12345` | 151.101.1.42 (public) | Pass through |
|
|
253
|
+
| `https://docs.stripe.com/api` | 52.x.x.x (public) | Pass through |
|
|
254
|
+
| `https://wiki.internal.corp/runbooks` | 10.0.0.50 (RFC 1918) | Obfuscate |
|
|
255
|
+
| `https://jira.mycompany.net/issue/123` | 172.16.1.10 (RFC 1918) | Obfuscate |
|
|
256
|
+
| `https://secret.local/admin` | NXDOMAIN | Obfuscate |
|
|
257
|
+
| `https://github.com/org/repo` | (PUBLIC_DOMAINS list) | Pass through |
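
The decision table above can be expressed as a small classifier. This is a minimal sketch under stated assumptions, not the code in `dns-cache.js`: the cache is modelled as a plain `Map` from hostname to a resolved IPv4 string (or `null` for a failed lookup), only IPv4 is handled, and `isInternalIPv4` and `classifyUrl` are hypothetical names.

```javascript
// True when a dotted-quad IPv4 address falls in a non-public range.
function isInternalIPv4(ip) {
  const [a, b] = ip.split(".").map(Number);
  if (a === 10) return true;                         // RFC 1918: 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true;  // RFC 1918: 172.16.0.0/12
  if (a === 192 && b === 168) return true;           // RFC 1918: 192.168.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT: 100.64.0.0/10
  if (a === 169 && b === 254) return true;           // link-local
  if (a === 127) return true;                        // loopback
  return false;
}

// cache: Map<hostname, string | null>, where null records NXDOMAIN or a timeout.
function classifyUrl(url, cache, alwaysPublic = new Set()) {
  const host = new URL(url).hostname.toLowerCase();
  if (alwaysPublic.has(host)) return "pass";
  if (!cache.has(host)) return "obfuscate"; // cache miss: safe default
  const ip = cache.get(host);
  if (ip === null || isInternalIPv4(ip)) return "obfuscate";
  return "pass";
}

const cache = new Map([
  ["arxiv.org", "151.101.1.42"],
  ["wiki.internal.corp", "10.0.0.50"],
  ["secret.local", null],
]);
console.log(classifyUrl("https://wiki.internal.corp/runbooks", cache));
```

Keeping "unknown" distinct from "known public" is what makes the cache-miss rule safe: a URL only passes through after a lookup has positively shown a public address.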
|
|
241
258
|
|
|
242
|
-
|
|
259
|
+
### LLM agent guidance
|
|
243
260
|
|
|
244
|
-
|
|
245
|
-
1. `EventStream.prototype.push()` patch — deobfuscates content blocks in `message_end` events
|
|
246
|
-
2. `globalThis.__shroudDeobfuscate` — available for on-demand deobfuscation
|
|
247
|
-
3. `message_sending` hook — deobfuscates outbound message content when fired by OpenClaw
|
|
248
|
-
4. `before_message_write` hook — deobfuscates assistant messages in the transcript
|
|
261
|
+
Because Shroud replaces URLs before they reach the LLM, the LLM may see unfamiliar or fake-looking domains in the conversation context. Tool calls (fetch, read, etc.) are deobfuscated automatically before execution, so they work correctly even when the LLM sees a fake URL.
|
|
249
262
|
|
|
250
|
-
|
|
263
|
+
**If you are building an agent that uses Shroud, add the following to your agent's system prompt or instruction files:**
|
|
251
264
|
|
|
252
|
-
Shroud
|
|
265
|
+
> Shroud privacy is active. URLs and domains in the conversation may appear different from what the user sent — internal URLs are replaced with fake domains to protect infrastructure. If a URL looks unfamiliar or doesn't resolve, it has likely been obfuscated. The tool call pipeline deobfuscates automatically. Do NOT tell the user a URL is invalid just because you see an unfamiliar domain. If a fetch or read tool succeeded with the URL, trust the result.
|
|
253
266
|
|
|
254
|
-
|
|
255
|
-
- **Audit log lines** — `byRule=regex:email:3,regex:ipv4:2,...` alongside the existing `byCat` field.
|
|
256
|
-
- **`getStats()`** — the `ruleHits` object in the stats response, useful for programmatic access.
|
|
267
|
+
This prevents the LLM from questioning obfuscated URLs or telling the user their link is broken.
|
|
257
268
|
|
|
258
|
-
|
|
269
|
+
---
|
|
259
270
|
|
|
260
271
|
## Redaction levels
|
|
261
272
|
|
|
@@ -269,25 +280,7 @@ Three output modes for different audiences:
|
|
|
269
280
|
"redactionLevel": "masked"
|
|
270
281
|
```
|
|
271
282
|
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
The **Shroud Enterprise Edition** adds features for teams and regulated environments:
|
|
275
|
-
|
|
276
|
-
- **Multi-tenant isolation** — per-tenant HMAC keying and mapping stores
|
|
277
|
-
- **SIEM integration** — real-time event streaming to webhooks (JSON/CEF)
|
|
278
|
-
- **Key rotation** — rotate secrets without losing existing mappings
|
|
279
|
-
- **Active monitoring** — anomaly detection with alerting pipeline
|
|
280
|
-
- **Policy-as-code** — external JSON policy files with glob/regex rules
|
|
281
|
-
- **Shared store** — cross-agent file-backed mapping synchronization
|
|
282
|
-
- **Compliance mode** — locked category enforcement with audit trail
|
|
283
|
-
- **Exposure tracking** — rate-of-exposure alerting per category
|
|
284
|
-
- **Hot-reload** — live rule updates without restart
|
|
285
|
-
- **Session isolation** — per-session stores and mapping engines
|
|
286
|
-
- **Session handoff** — encrypted export/import for session continuity
|
|
287
|
-
- **Provenance tagging** — invisible audit markers in output
|
|
288
|
-
- **Corpus pre-scanning** — batch obfuscation for RAG pipelines
|
|
289
|
-
|
|
290
|
-
Contact for licensing: https://github.com/wkeything/shroud
|
|
283
|
+
---
|
|
291
284
|
|
|
292
285
|
## Detection intelligence
|
|
293
286
|
|
|
@@ -297,15 +290,17 @@ Shroud includes a `ContextDetector` that wraps the regex engine with post-detect
|
|
|
297
290
|
- **Proximity clustering**: When a name, email, and phone appear within 200 characters, each gets a confidence boost.
|
|
298
291
|
- **Hostname propagation**: `hostname FCNETR1` in one place → bare `FCNETR1` detected everywhere in the text.
|
|
299
292
|
- **Learned entities**: Hostnames and infra identifiers seen in previous messages are remembered and detected in future messages without requiring config-line context.
|
|
300
|
-
- **Documentation filtering**: RFC 3849 IPv6 doc prefix (`2001:db8::/32`), IPv6 loopback (`::1`), `example.com` emails, and well-known placeholders are automatically skipped.
|
|
301
|
-
- **
|
|
293
|
+
- **Documentation filtering**: RFC 3849 IPv6 doc prefix (`2001:db8::/32`), IPv6 loopback (`::1`), `example.com` emails, and well-known placeholders are automatically skipped.
|
|
294
|
+
- **DNS-based URL classification**: External URLs pass through to the LLM; internal URLs are obfuscated. See [URL handling](#url-handling).
|
|
302
295
|
- **Common word decay**: Words like `permit`, `deny`, `default` that happen to match patterns get 50% confidence reduction.
|
|
303
296
|
- **Recursive deobfuscation**: Up to 3 passes for nested structures (fakes inside JSON-encoded strings).
|
|
304
|
-
- **Subnet-aware deobfuscation**: When an LLM derives network/broadcast addresses from fake host IPs
|
|
297
|
+
- **Subnet-aware deobfuscation**: When an LLM derives network/broadcast addresses from fake host IPs, Shroud reverse-maps them via the SubnetMapper. Works for both CGNAT (IPv4) and ULA (IPv6) fake ranges.
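
The recursive deobfuscation bullet above deals with fakes that sit inside JSON-encoded strings, where characters such as backslashes are escaped once per nesting level. One possible way to bound the work, sketched here as an assumption rather than Shroud's actual algorithm, is to try each fake at successive JSON-escaping depths, up to three passes. The names `jsonEscape` and `deobfuscateNested` are hypothetical.

```javascript
// Escape a string the way it appears inside a JSON string literal (without quotes).
function jsonEscape(s) {
  return JSON.stringify(s).slice(1, -1);
}

// fakeToReal: Map<fake, real>. Pass 0 restores plain occurrences; each later
// pass targets the form the value takes one JSON-encoding level deeper.
function deobfuscateNested(text, fakeToReal, maxPasses = 3) {
  let out = text;
  for (const [fake, real] of fakeToReal) {
    let f = fake;
    let r = real;
    for (let pass = 0; pass < maxPasses; pass++) {
      out = out.split(f).join(r);
      f = jsonEscape(f);
      r = jsonEscape(r);
    }
  }
  return out;
}

const mapping = new Map([["C:\\fake\\cfg", "C:\\corp\\cfg"]]);
const nested = JSON.stringify({ payload: JSON.stringify({ path: "C:\\fake\\cfg" }) });
console.log(deobfuscateNested(nested, mapping));
```

Capping the number of passes keeps the cost predictable; values nested more deeply than the cap would be left as fakes in this sketch.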
|
|
298
|
+
|
|
299
|
+
---
|
|
305
300
|
|
|
306
301
|
## Verify it works
|
|
307
302
|
|
|
308
|
-
After restarting OpenClaw, send a message containing
|
|
303
|
+
After restarting OpenClaw, send a message containing sensitive data (e.g. an email, IP, or config snippet). Then check the logs:
|
|
309
304
|
|
|
310
305
|
```bash
|
|
311
306
|
tail -f ~/.openclaw/logs/openclaw.log \
|
|
@@ -326,41 +321,44 @@ With proof hashes enabled:
|
|
|
326
321
|
[shroud][audit] OBFUSCATE req=a3f1bc9e02d4e7f1 | entities=4 | chars=1200->1218 (delta=+18) | modified=YES | byCat=email:1,ip_address:2,hostname:1 | byRule=regex:email:1,regex:ipv4:2,regex:hostname:1 | proof_in=8a3c1f0e2b4d proof_out=f7d2a1c9e084 | fakes=[jsmith@corp.net|100.64.0.12|SW-LAB-01]
|
|
327
322
|
```
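
For monitoring or tests, an audit line like the one above can be split into fields. This is a minimal sketch that assumes the layout of the sample line (fields separated by ` | `, `key=value` pairs, bracketed `fakes` list); the function names are hypothetical and the format may differ between versions.

```javascript
// Parse one "[shroud][audit] EVENT key=value | key=value ..." line into an object.
function parseAuditLine(line) {
  const m = /^\[shroud\]\[audit\] (\w+) (.*)$/.exec(line);
  if (!m) return null;
  const record = { event: m[1] };
  for (const segment of m[2].split(" | ")) {
    // A segment may carry several pairs, e.g. "proof_in=... proof_out=...".
    // Values are either a bracketed list or a run without spaces or ")".
    for (const pair of segment.match(/\w+=(\[[^\]]*\]|[^\s)]+)/g) || []) {
      const eq = pair.indexOf("=");
      record[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return record;
}

// Turn "email:1,ip_address:2" into { email: 1, ip_address: 2 }.
// The last colon is the separator, so rule names like "regex:email" survive.
function parseCounts(field) {
  const counts = {};
  for (const item of field.split(",")) {
    const i = item.lastIndexOf(":");
    counts[item.slice(0, i)] = Number(item.slice(i + 1));
  }
  return counts;
}

const sample = "[shroud][audit] OBFUSCATE req=a3f1bc9e02d4e7f1 | entities=4 | modified=YES | byCat=email:1,ip_address:2";
console.log(parseCounts(parseAuditLine(sample).byCat));
```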
|
|
328
323
|
|
|
329
|
-
###
|
|
324
|
+
### Conversational tools
|
|
325
|
+
|
|
326
|
+
| Tool | What it does |
|
|
327
|
+
|------|-------------|
|
|
328
|
+
| `shroud-stats` | Show all detection rules with status, confidence, hit counts, store size, and config summary |
|
|
329
|
+
|
|
330
|
+
CLI:
|
|
331
|
+
|
|
332
|
+
```bash
|
|
333
|
+
shroud-stats # live rule table
|
|
334
|
+
shroud-stats --json # JSON output
|
|
335
|
+
shroud-stats --test "Contact john@acme.com" # test detection
|
|
336
|
+
```
|
|
337
|
+
|
|
338
|
+
---
|
|
330
339
|
|
|
331
|
-
|
|
332
|
-
|-------|---------|
|
|
333
|
-
| `req` | Random request ID (hex) — correlates obfuscate ↔ deobfuscate |
|
|
334
|
-
| `entities` | Total entities detected and replaced |
|
|
335
|
-
| `chars` | Input → output character count |
|
|
336
|
-
| `delta` | Character count change (fakes may be longer/shorter) |
|
|
337
|
-
| `modified` | `YES` if text was changed, `NO` if pass-through |
|
|
338
|
-
| `byCat` | Entity counts by category |
|
|
339
|
-
| `byRule` | Entity counts by detector rule |
|
|
340
|
-
| `proof_in` | Truncated salted SHA-256 of input text (opt-in) |
|
|
341
|
-
| `proof_out` | Truncated salted SHA-256 of output text (opt-in) |
|
|
342
|
-
| `fakes` | Sample of fake replacement values (opt-in, never real values) |
|
|
340
|
+
## Entity categories
|
|
343
341
|
|
|
344
|
-
|
|
342
|
+
`person_name`, `email`, `phone`, `ip_address`, `api_key`, `url`, `org_name`, `location`, `file_path`, `credit_card`, `ssn`, `mac_address`, `hostname`, `snmp_community`, `bgp_asn`, `network_credential`, `vlan_id`, `interface_desc`, `route_map`, `ospf_id`, `acl_name`, `iban`, `national_id`, `jwt`, `ics_identifier`, `gps_coordinate`, `certificate`, `custom`
|
|
345
343
|
|
|
346
|
-
|
|
344
|
+
---
|
|
347
345
|
|
|
348
346
|
## Agent Privacy Protocol (APP)
|
|
349
347
|
|
|
350
|
-
APP is an open protocol for adding privacy
|
|
348
|
+
APP is an open protocol for adding privacy and infrastructure protection to any AI agent. Shroud is the reference implementation.
|
|
351
349
|
|
|
352
350
|
### Overview
|
|
353
351
|
|
|
354
352
|
```
|
|
355
|
-
|
|
356
|
-
|
|
357
|
-
|
|
358
|
-
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
|
|
353
|
+
+-------------------+ stdin/stdout +------------------+
|
|
354
|
+
| Your Agent | <---- JSON-RPC ----> | APP Server |
|
|
355
|
+
| (any language) | | (app-server.mjs)|
|
|
356
|
+
+-------------------+ +------------------+
|
|
357
|
+
| |
|
|
358
|
+
| 1. obfuscate(user_input) | detects entities,
|
|
359
|
+
| 2. send to LLM | returns fakes
|
|
360
|
+
| 3. deobfuscate(llm_response) | restores reals
|
|
361
|
+
| 4. show to user |
|
|
364
362
|
```
|
|
365
363
|
|
|
366
364
|
### Protocol specification
|
|
@@ -369,20 +367,6 @@ APP is an open protocol for adding privacy obfuscation to any AI agent. Shroud i
|
|
|
369
367
|
- **Encoding**: UTF-8
|
|
370
368
|
- **Process model**: Agent spawns APP server as subprocess, one per agent instance
|
|
371
369
|
|
|
372
|
-
### Handshake
|
|
373
|
-
|
|
374
|
-
On startup, the server writes a single JSON line to stdout:
|
|
375
|
-
|
|
376
|
-
```json
|
|
377
|
-
{"app":"1.0","engine":"shroud","version":"2.2.5","capabilities":["obfuscate","deobfuscate","batch","stats","health","configure","audit","partitions"]}
|
|
378
|
-
```
|
|
379
|
-
|
|
380
|
-
The agent must read this line before sending requests. Fields:
|
|
381
|
-
- `app` — protocol version (always `"1.0"`)
|
|
382
|
-
- `engine` — implementation name
|
|
383
|
-
- `version` — implementation version
|
|
384
|
-
- `capabilities` — supported methods
|
|
385
|
-
|
|
386
370
|
### Methods
|
|
387
371
|
|
|
388
372
|
| Method | Params | Returns | Description |
|
|
@@ -396,38 +380,8 @@ The agent must read this line before sending requests. Fields:
|
|
|
396
380
|
| `batch` | `{operations: [{direction, text}]}` | `{results: [...]}` | Batch obfuscate/deobfuscate |
|
|
397
381
|
| `shutdown` | `{}` | `{ok}` | Graceful shutdown (flushes stats) |
|
|
398
382
|
|
|
399
|
-
### Request/response format
|
|
400
|
-
|
|
401
|
-
```
|
|
402
|
-
→ {"id":1,"method":"obfuscate","params":{"text":"Server 10.1.0.1 is down"}}
|
|
403
|
-
← {"id":1,"result":{"text":"Server 100.64.0.12 is down","entityCount":1,"categories":{"ip_address":1},"modified":true,"audit":{"requestId":"a1b2c3","proofIn":"8a3c1f","proofOut":"f7d2a1"}}}
|
|
404
|
-
```
|
|
405
|
-
|
|
406
|
-
Errors:
|
|
407
|
-
```
|
|
408
|
-
← {"id":1,"error":{"code":-32602,"message":"Missing required param: text"}}
|
|
409
|
-
```
|
|
410
|
-
|
|
411
|
-
### Heartbeat
|
|
412
|
-
|
|
413
|
-
The server writes JSON heartbeats to stderr every 30 seconds:
|
|
414
|
-
```json
|
|
415
|
-
{"heartbeat":true,"pid":12345,"uptime":120,"requests":42,"avgLatencyMs":1.2,"storeSize":15,"memoryMB":28}
|
|
416
|
-
```
|
|
417
|
-
|
|
418
|
-
### Integration checklist
|
|
419
|
-
|
|
420
|
-
1. `npm install shroud-privacy`
|
|
421
|
-
2. Spawn: `node node_modules/shroud-privacy/app-server.mjs node_modules/shroud-privacy/dist`
|
|
422
|
-
3. Read handshake line from stdout
|
|
423
|
-
4. Before LLM: send `obfuscate`, use returned `text`
|
|
424
|
-
5. After LLM: send `deobfuscate`, show returned `text` to user
|
|
425
|
-
6. On agent shutdown: send `shutdown`
|
|
426
|
-
|
|
427
383
|
### Python client
|
|
428
384
|
|
|
429
|
-
A ready-made Python client is included at `clients/python/shroud_client.py`:
|
|
430
|
-
|
|
431
385
|
```python
|
|
432
386
|
from shroud_client import ShroudClient
|
|
433
387
|
|
|
@@ -446,72 +400,25 @@ print(real.residual_fakes) # any CGNAT/ULA IPs that survived
|
|
|
446
400
|
client.stop()
|
|
447
401
|
```
|
|
448
402
|
|
|
449
|
-
|
|
403
|
+
---
|
|
450
404
|
|
|
451
405
|
## Development
|
|
452
406
|
|
|
453
407
|
```bash
|
|
454
408
|
npm install
|
|
455
|
-
npm test #
|
|
409
|
+
npm test # all 3 suites: unit + harness + openclaw
|
|
410
|
+
npm run test:unit # vitest (819 tests)
|
|
411
|
+
npm run test:integration # APP harness (359 tests)
|
|
412
|
+
npm run test:openclaw # OpenClaw sandbox (14 tests)
|
|
456
413
|
npm run build # compile TypeScript
|
|
457
414
|
npm run lint # type-check without emitting
|
|
458
415
|
```
|
|
459
416
|
|
|
460
|
-
|
|
461
|
-
|
|
462
|
-
```bash
|
|
463
|
-
npm run build
|
|
464
|
-
openclaw plugins install --path .
|
|
465
|
-
openclaw gateway restart
|
|
466
|
-
```
|
|
467
|
-
|
|
468
|
-
## Release workflow
|
|
469
|
-
|
|
470
|
-
### Tagging a release
|
|
471
|
-
|
|
472
|
-
```bash
|
|
473
|
-
# 1. Update version in package.json and openclaw.plugin.json
|
|
474
|
-
# 2. Update CHANGELOG.md
|
|
475
|
-
# 3. Commit and tag
|
|
476
|
-
git add -A
|
|
477
|
-
git commit -m "release: vX.Y.Z"
|
|
478
|
-
git tag vX.Y.Z
|
|
479
|
-
git push && git push --tags
|
|
480
|
-
```
|
|
481
|
-
|
|
482
|
-
Then create a GitHub Release from the tag (attach the changelog entry as notes).
|
|
483
|
-
|
|
484
|
-
### npm publish (maintainers only)
|
|
485
|
-
|
|
486
|
-
```bash
|
|
487
|
-
# Pre-flight (always run before publishing)
|
|
488
|
-
npm pack --dry-run # verify only dist/, openclaw.plugin.json, LICENSE are included
|
|
489
|
-
npm run prepublishOnly # lint + test + build (runs automatically on npm publish)
|
|
490
|
-
|
|
491
|
-
# One-time setup (when you decide to publish)
|
|
492
|
-
npm login
|
|
493
|
-
npm profile enable-2fa auth-and-writes
|
|
494
|
-
|
|
495
|
-
# Publish
|
|
496
|
-
npm publish # publishConfig.access = "public" is already set
|
|
497
|
-
```
|
|
498
|
-
|
|
499
|
-
**Security notes:**
|
|
500
|
-
- Enable 2FA for both login and publish (`auth-and-writes`). This prevents token-only takeover.
|
|
501
|
-
- Never commit npm tokens to git. Use `npm login` interactively or set `NPM_TOKEN` as a GitHub Actions secret.
|
|
502
|
-
- Use `npm publish --provenance` in CI to add Sigstore attestation (links the package to the exact source commit).
|
|
503
|
-
|
|
504
|
-
### CI
|
|
505
|
-
|
|
506
|
-
The repo includes `.github/workflows/ci.yml` which runs lint + test + build on every push and PR. The publish job is present but only triggers on `v*` tags and requires `NPM_TOKEN` as a repository secret — it will no-op until that secret is configured.
|
|
507
|
-
|
|
508
|
-
## Entity categories
|
|
509
|
-
|
|
510
|
-
`person_name`, `email`, `phone`, `ip_address`, `api_key`, `url`, `org_name`, `location`, `file_path`, `credit_card`, `ssn`, `mac_address`, `hostname`, `snmp_community`, `bgp_asn`, `network_credential`, `vlan_id`, `interface_desc`, `route_map`, `ospf_id`, `acl_name`, `iban`, `national_id`, `jwt`, `ics_identifier`, `gps_coordinate`, `certificate`, `custom`
|
|
417
|
+
---
|
|
511
418
|
|
|
512
419
|
## Disclaimer
|
|
513
420
|
|
|
514
|
-
This software is provided "as is", without warranty of any kind, express or implied. Shroud uses regex-based detection which may not catch all sensitive data. It reduces
|
|
421
|
+
This software is provided "as is", without warranty of any kind, express or implied. Shroud uses regex-based detection which may not catch all sensitive data. It reduces exposure but does not eliminate it. See [SECURITY.md](SECURITY.md) for known limitations. The authors assume no responsibility for data leakage, compliance failures, or any damages arising from use of this software.
|
|
515
422
|
|
|
516
423
|
## License
|
|
517
424
|
|
package/dist/detectors/regex.js
CHANGED
@@ -98,12 +98,53 @@ export function isDocExample(value, category) {
                         return true;
                     }
                 }
+                // DNS-based public URL detection: if the FQDN resolves to a public IP,
+                // the URL is external and should not be obfuscated. The cache is warmed
+                // asynchronously in before_prompt_build; cache miss → obfuscate (safe default).
+                const dnsCache = globalThis.__shroudDnsCache;
+                if (dnsCache) {
+                    const isPublic = dnsCache.isPublic(value);
+                    if (isPublic === true)
+                        return true;
+                }
             }
             return false;
         }
         case Category.BGP_ASN:
             // Private ASNs are real infra identifiers — don't skip them
             return false;
+        case Category.FILE_PATH: {
+            // Skip paths that are clearly URL path components from public domains.
+            // e.g., /www.npmjs.com/package/shroud-privacy, /github.com/org/repo
+            // This is a safety net — the span fix in detect() should prevent these,
+            // but production environments may have edge cases we can't reproduce.
+            if (value.startsWith("/")) {
+                const pathLower = value.toLowerCase();
+                for (const d of PUBLIC_DOMAINS) {
+                    if (pathLower.startsWith(`/${d}/`) || pathLower.startsWith(`/${d}`)
+                        || pathLower.startsWith(`/www.${d}/`) || pathLower.startsWith(`/www.${d}`)) {
+                        return true;
+                    }
+                }
+                for (const d of DOC_DOMAINS) {
+                    if (pathLower.startsWith(`/${d}/`) || pathLower.startsWith(`/${d}`)
+                        || pathLower.startsWith(`/www.${d}/`) || pathLower.startsWith(`/www.${d}`)) {
+                        return true;
+                    }
+                }
+                // DNS cache check — if the first path segment is a public domain
+                const dnsCache = globalThis.__shroudDnsCache;
+                if (dnsCache) {
+                    const firstSeg = value.slice(1).split("/")[0];
+                    if (firstSeg && firstSeg.includes(".")) {
+                        const isPublic = dnsCache.isPublic("https://" + firstSeg + "/");
+                        if (isPublic === true)
+                            return true;
+                    }
+                }
+            }
+            return false;
+        }
         case Category.HOSTNAME:
             if (DOC_HOSTNAMES.has(value) || DOC_HOSTNAMES.has(value.toUpperCase()))
                 return true;
@@ -1208,6 +1249,9 @@ export class RegexDetector {
                 }
                 // Skip documentation/example values (#7)
                 if (isDocExample(grp, pdef.category)) {
+                    // Still register the span to prevent other detectors
+                    // (e.g., file_path) from matching inside a skipped URL
+                    spans.add(grpStart, grpEnd);
                     continue;
                 }
                 spans.add(grpStart, grpEnd);
@@ -1234,6 +1278,9 @@ export class RegexDetector {
|
|
|
1234
1278
|
}
|
|
1235
1279
|
// Skip documentation/example values (#7)
|
|
1236
1280
|
if (isDocExample(value, pdef.category)) {
|
|
1281
|
+
// Still register the span to prevent other detectors
|
|
1282
|
+
// (e.g., file_path) from matching inside a skipped URL
|
|
1283
|
+
spans.add(start, end);
|
|
1237
1284
|
continue;
|
|
1238
1285
|
}
|
|
1239
1286
|
spans.add(start, end);
|
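Review note on the `FILE_PATH` branch above: the check is a string-prefix test, so `startsWith(`/${d}`)` also accepts a longer first segment that merely begins with a listed domain. The sketch below compares that behaviour with an exact match on the first path segment. It is an illustration under stated assumptions, not the package's code: `SAMPLE_PUBLIC_DOMAINS` is a hypothetical stand-in for `PUBLIC_DOMAINS` (whose contents are not shown in this diff), and both function names are invented here.

```javascript
// Hypothetical stand-in for the PUBLIC_DOMAINS list referenced in the diff;
// the real contents are not visible there.
const SAMPLE_PUBLIC_DOMAINS = new Set(["github.com", "npmjs.com"]);

// Prefix matching in the style of the diff, kept for comparison.
function prefixMatchesPublicDomain(path, domains = SAMPLE_PUBLIC_DOMAINS) {
  const lower = path.toLowerCase();
  for (const d of domains) {
    if (lower.startsWith(`/${d}`) || lower.startsWith(`/www.${d}`)) {
      return true;
    }
  }
  return false;
}

// Alternative: the first path segment must equal a listed domain exactly,
// optionally with a leading "www." label.
function firstSegmentIsPublicDomain(path, domains = SAMPLE_PUBLIC_DOMAINS) {
  if (!path.startsWith("/")) return false;
  const firstSeg = path.slice(1).split("/")[0].toLowerCase();
  const bare = firstSeg.startsWith("www.") ? firstSeg.slice(4) : firstSeg;
  return domains.has(firstSeg) || domains.has(bare);
}
```

With these sample values, a path such as `/github.com.corp.internal/secrets` passes the prefix test but fails the exact-segment test. Whether that difference matters depends on how often such paths appear in real prompts.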
|
package/dist/dns-cache.d.ts
@@ -0,0 +1,61 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* DNS resolution cache for URL classification.
|
|
3
|
+
*
|
|
4
|
+
* Resolves FQDNs to determine whether they point to public (external) or
|
|
5
|
+
* private (internal/RFC 1918) addresses. Public URLs are passed through
|
|
6
|
+
* without obfuscation — the LLM needs to see real URLs to make tool call
|
|
7
|
+
* decisions (e.g., "fetch this page").
|
|
8
|
+
*
|
|
9
|
+
* Design:
|
|
10
|
+
* - warmCache() is async — called in the before_prompt_build hook
|
|
11
|
+
* - isPublic() is sync — checked in the obfuscation pipeline's isDocExample()
|
|
12
|
+
* - Cache miss = null (unknown) → obfuscate (safe default, privacy-first)
|
|
13
|
+
* - Uses only Node.js builtins (dns, net). Zero runtime dependencies.
|
|
14
|
+
*/
|
|
15
|
+
/**
|
|
16
|
+
* RFC 1918 + other private/reserved IPv4 ranges.
|
|
17
|
+
* Returns true if the IP should be treated as internal.
|
|
18
|
+
*/
|
|
19
|
+
export declare function isPrivateIPv4(ip: string): boolean;
|
|
20
|
+
/**
|
|
21
|
+
* Check if an IPv6 address is private/reserved.
|
|
22
|
+
*/
|
|
23
|
+
export declare function isPrivateIPv6(ip: string): boolean;
|
|
24
|
+
/**
|
|
25
|
+
* Extract FQDN from a URL string.
|
|
26
|
+
* Returns null if the URL is malformed or the host is an IP literal.
|
|
27
|
+
*/
|
|
28
|
+
export declare function extractFqdn(url: string): string | null;
|
|
29
|
+
export declare class DnsCache {
|
|
30
|
+
private _cache;
|
|
31
|
+
private _ttlMs;
|
|
32
|
+
private _pending;
|
|
33
|
+
constructor(ttlMs?: number);
|
|
34
|
+
/**
|
|
35
|
+
* Resolve an array of URLs and warm the cache.
|
|
36
|
+
* Called from the async before_prompt_build hook.
|
|
37
|
+
* Resolves all FQDNs in parallel for speed.
|
|
38
|
+
*/
|
|
39
|
+
warmCache(urls: string[]): Promise<void>;
|
|
40
|
+
/**
|
|
41
|
+
* Check if a URL points to a public (external) host.
|
|
42
|
+
*
|
|
43
|
+
* Returns:
|
|
44
|
+
* true — resolved to a public IP, safe to pass through
|
|
45
|
+
* false — resolved to a private IP, should be obfuscated
|
|
46
|
+
* null — not in cache (DNS not yet resolved), obfuscate as safe default
|
|
47
|
+
*/
|
|
48
|
+
isPublic(url: string): boolean | null;
|
|
49
|
+
/**
|
|
50
|
+
* Get the resolved address for a URL (for logging/audit).
|
|
51
|
+
*/
|
|
52
|
+
getAddress(url: string): string | null;
|
|
53
|
+
/** Number of cached entries. */
|
|
54
|
+
get size(): number;
|
|
55
|
+
/** Clear the cache. */
|
|
56
|
+
clear(): void;
|
|
57
|
+
/** Pre-seed the cache (for testing with /etc/hosts or mocks). */
|
|
58
|
+
seed(fqdn: string, address: string | null, isPublic: boolean): void;
|
|
59
|
+
private _isCached;
|
|
60
|
+
private _resolve;
|
|
61
|
+
}
|
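Review note on the declarations above: `isPublic()` is tri-state, and only an explicit `true` lets a URL through, while `false` and `null` both lead to obfuscation. The sketch below shows that decision rule in isolation. It is a reduced stand-in written for this note, not the package's `DnsCache`: `TinyCache`, `shouldObfuscate`, and the explicit `now` parameter are assumptions introduced so the TTL behaviour can be exercised without waiting.

```javascript
// Minimal stand-in for the cache described above: a Map keyed by FQDN.
// Not the package's DnsCache; just enough to show the tri-state result.
class TinyCache {
  constructor(ttlMs = 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  // Record a classification for an FQDN at a given timestamp.
  seed(fqdn, isPublic, resolvedAt = Date.now()) {
    this.entries.set(fqdn.toLowerCase(), { isPublic, resolvedAt });
  }

  // true = public, false = private, null = unknown or expired.
  isPublic(fqdn, now = Date.now()) {
    const key = fqdn.toLowerCase();
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (now - entry.resolvedAt > this.ttlMs) {
      this.entries.delete(key);
      return null;
    }
    return entry.isPublic;
  }
}

// Only an explicit `true` skips obfuscation; false and null both obfuscate.
function shouldObfuscate(cache, fqdn, now = Date.now()) {
  return cache.isPublic(fqdn, now) !== true;
}
```

Treating "unknown" the same as "private" keeps the failure mode on the privacy side: a slow or failed lookup costs a usable URL, not a leaked hostname.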
|
package/dist/dns-cache.js
@@ -0,0 +1,224 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* DNS resolution cache for URL classification.
|
|
3
|
+
*
|
|
4
|
+
* Resolves FQDNs to determine whether they point to public (external) or
|
|
5
|
+
* private (internal/RFC 1918) addresses. Public URLs are passed through
|
|
6
|
+
* without obfuscation — the LLM needs to see real URLs to make tool call
|
|
7
|
+
* decisions (e.g., "fetch this page").
|
|
8
|
+
*
|
|
9
|
+
* Design:
|
|
10
|
+
* - warmCache() is async — called in the before_prompt_build hook
|
|
11
|
+
* - isPublic() is sync — checked in the obfuscation pipeline's isDocExample()
|
|
12
|
+
* - Cache miss = null (unknown) → obfuscate (safe default, privacy-first)
|
|
13
|
+
* - Uses only Node.js builtins (dns, net). Zero runtime dependencies.
|
|
14
|
+
*/
|
|
15
|
+
import { lookup } from "node:dns";
|
|
16
|
+
import { isIPv4, isIPv6 } from "node:net";
|
|
17
|
+
/** Default cache TTL: 1 hour. */
|
|
18
|
+
const DEFAULT_TTL_MS = 60 * 60 * 1000;
|
|
19
|
+
/**
|
|
20
|
+
* RFC 1918 + other private/reserved IPv4 ranges.
|
|
21
|
+
* Returns true if the IP should be treated as internal.
|
|
22
|
+
*/
|
|
23
|
+
export function isPrivateIPv4(ip) {
|
|
24
|
+
const parts = ip.split(".").map(Number);
|
|
25
|
+
if (parts.length !== 4 || parts.some((p) => isNaN(p)))
|
|
26
|
+
return true; // malformed → treat as private
|
|
27
|
+
const [a, b] = parts;
|
|
28
|
+
// 10.0.0.0/8 — RFC 1918
|
|
29
|
+
if (a === 10)
|
|
30
|
+
return true;
|
|
31
|
+
// 172.16.0.0/12 — RFC 1918 (172.16.0.0 – 172.31.255.255)
|
|
32
|
+
if (a === 172 && b >= 16 && b <= 31)
|
|
33
|
+
return true;
|
|
34
|
+
// 192.168.0.0/16 — RFC 1918
|
|
35
|
+
if (a === 192 && b === 168)
|
|
36
|
+
return true;
|
|
37
|
+
// 127.0.0.0/8 — loopback
|
|
38
|
+
if (a === 127)
|
|
39
|
+
return true;
|
|
40
|
+
// 169.254.0.0/16 — link-local
|
|
41
|
+
if (a === 169 && b === 254)
|
|
42
|
+
return true;
|
|
43
|
+
// 100.64.0.0/10 — CGNAT (Shroud's fake range — definitely private)
|
|
44
|
+
if (a === 100 && b >= 64 && b <= 127)
|
|
45
|
+
return true;
|
|
46
|
+
// 0.0.0.0/8 — "this" network
|
|
47
|
+
if (a === 0)
|
|
48
|
+
return true;
|
|
49
|
+
// 224.0.0.0/4 — multicast
|
|
50
|
+
if (a >= 224 && a <= 239)
|
|
51
|
+
return true;
|
|
52
|
+
// 240.0.0.0/4 — reserved
|
|
53
|
+
if (a >= 240)
|
|
54
|
+
return true;
|
|
55
|
+
return false;
|
|
56
|
+
}
|
|
57
|
+
/**
|
|
58
|
+
* Check if an IPv6 address is private/reserved.
|
|
59
|
+
*/
|
|
60
|
+
export function isPrivateIPv6(ip) {
|
|
61
|
+
const lower = ip.toLowerCase();
|
|
62
|
+
// ::1 — loopback
|
|
63
|
+
if (lower === "::1")
|
|
64
|
+
return true;
|
|
65
|
+
// fc00::/7 — unique local (ULA)
|
|
66
|
+
if (lower.startsWith("fc") || lower.startsWith("fd"))
|
|
67
|
+
return true;
|
|
68
|
+
// fe80::/10 — link-local
|
|
69
|
+
if (lower.startsWith("fe80"))
|
|
70
|
+
return true;
|
|
71
|
+
// :: — unspecified
|
|
72
|
+
if (lower === "::")
|
|
73
|
+
return true;
|
|
74
|
+
return false;
|
|
75
|
+
}
|
|
76
|
+
/**
|
|
77
|
+
* Extract FQDN from a URL string.
|
|
78
|
+
* Returns null if the URL is malformed or the host is an IP literal.
|
|
79
|
+
*/
|
|
80
|
+
export function extractFqdn(url) {
|
|
81
|
+
// Match protocol://host[:port][/path]
|
|
82
|
+
const match = url.match(/^https?:\/\/([^/:]+)/i);
|
|
83
|
+
if (!match)
|
|
84
|
+
return null;
|
|
85
|
+
const host = match[1].toLowerCase();
|
|
86
|
+
// Skip IP literals — they're handled by the IP detector, not the URL detector
|
|
87
|
+
if (isIPv4(host) || isIPv6(host) || host.startsWith("["))
|
|
88
|
+
return null;
|
|
89
|
+
// Skip localhost
|
|
90
|
+
if (host === "localhost")
|
|
91
|
+
return null;
|
|
92
|
+
// Must have at least one dot (skip bare hostnames)
|
|
93
|
+
if (!host.includes("."))
|
|
94
|
+
return null;
|
|
95
|
+
return host;
|
|
96
|
+
}
|
|
97
|
+
export class DnsCache {
|
|
98
|
+
_cache = new Map();
|
|
99
|
+
_ttlMs;
|
|
100
|
+
_pending = new Map();
|
|
101
|
+
constructor(ttlMs = DEFAULT_TTL_MS) {
|
|
102
|
+
this._ttlMs = ttlMs;
|
|
103
|
+
}
|
|
104
|
+
/**
|
|
105
|
+
* Resolve an array of URLs and warm the cache.
|
|
106
|
+
* Called from the async before_prompt_build hook.
|
|
107
|
+
* Resolves all FQDNs in parallel for speed.
|
|
108
|
+
*/
|
|
109
|
+
async warmCache(urls) {
|
|
110
|
+
const fqdns = new Set();
|
|
111
|
+
for (const url of urls) {
|
|
112
|
+
const fqdn = extractFqdn(url);
|
|
113
|
+
if (fqdn && !this._isCached(fqdn)) {
|
|
114
|
+
fqdns.add(fqdn);
|
|
115
|
+
}
|
|
116
|
+
}
|
|
117
|
+
if (fqdns.size === 0)
|
|
118
|
+
return;
|
|
119
|
+
const promises = [...fqdns].map((fqdn) => this._resolve(fqdn));
|
|
120
|
+
await Promise.allSettled(promises);
|
|
121
|
+
}
|
|
122
|
+
/**
|
|
123
|
+
* Check if a URL points to a public (external) host.
|
|
124
|
+
*
|
|
125
|
+
* Returns:
|
|
126
|
+
* true — resolved to a public IP, safe to pass through
|
|
127
|
+
* false — resolved to a private IP, should be obfuscated
|
|
128
|
+
* null — not in cache (DNS not yet resolved), obfuscate as safe default
|
|
129
|
+
*/
|
|
130
|
+
isPublic(url) {
|
|
131
|
+
const fqdn = extractFqdn(url);
|
|
132
|
+
if (!fqdn)
|
|
133
|
+
return null;
|
|
134
|
+
const entry = this._cache.get(fqdn);
|
|
135
|
+
if (!entry)
|
|
136
|
+
return null;
|
|
137
|
+
// Check TTL
|
|
138
|
+
if (Date.now() - entry.resolvedAt > this._ttlMs) {
|
|
139
|
+
this._cache.delete(fqdn);
|
|
140
|
+
return null;
|
|
141
|
+
}
|
|
142
|
+
return entry.isPublic;
|
|
143
|
+
}
|
|
144
|
+
/**
|
|
145
|
+
* Get the resolved address for a URL (for logging/audit).
|
|
146
|
+
*/
|
|
147
|
+
getAddress(url) {
|
|
148
|
+
const fqdn = extractFqdn(url);
|
|
149
|
+
if (!fqdn)
|
|
150
|
+
return null;
|
|
151
|
+
return this._cache.get(fqdn)?.address ?? null;
|
|
152
|
+
}
|
|
153
|
+
/** Number of cached entries. */
|
|
154
|
+
get size() {
|
|
155
|
+
return this._cache.size;
|
|
156
|
+
}
|
|
157
|
+
/** Clear the cache. */
|
|
158
|
+
clear() {
|
|
159
|
+
this._cache.clear();
|
|
160
|
+
this._pending.clear();
|
|
161
|
+
}
|
|
162
|
+
/** Pre-seed the cache (for testing with /etc/hosts or mocks). */
|
|
163
|
+
seed(fqdn, address, isPublic) {
|
|
164
|
+
this._cache.set(fqdn.toLowerCase(), {
|
|
165
|
+
address,
|
|
166
|
+
isPublic,
|
|
167
|
+
resolvedAt: Date.now(),
|
|
168
|
+
});
|
|
169
|
+
}
|
|
170
|
+
_isCached(fqdn) {
|
|
171
|
+
const entry = this._cache.get(fqdn);
|
|
172
|
+
if (!entry)
|
|
173
|
+
return false;
|
|
174
|
+
return Date.now() - entry.resolvedAt <= this._ttlMs;
|
|
175
|
+
}
|
|
176
|
+
_resolve(fqdn) {
|
|
177
|
+
// Deduplicate concurrent lookups for the same FQDN
|
|
178
|
+
const existing = this._pending.get(fqdn);
|
|
179
|
+
if (existing)
|
|
180
|
+
return existing;
|
|
181
|
+
const promise = new Promise((resolve) => {
|
|
182
|
+
// 3-second timeout to avoid blocking the hook
|
|
183
|
+
const timer = setTimeout(() => {
|
|
184
|
+
const entry = {
|
|
185
|
+
address: null,
|
|
186
|
+
isPublic: false, // timeout → treat as private (safe default)
|
|
187
|
+
resolvedAt: Date.now(),
|
|
188
|
+
};
|
|
189
|
+
this._cache.set(fqdn, entry);
|
|
190
|
+
this._pending.delete(fqdn);
|
|
191
|
+
resolve(entry);
|
|
192
|
+
}, 3000);
|
|
193
|
+
lookup(fqdn, { all: false }, (err, address, family) => {
|
|
194
|
+
clearTimeout(timer);
|
|
195
|
+
let entry;
|
|
196
|
+
if (err || !address) {
|
|
197
|
+
// NXDOMAIN, ENOTFOUND, etc. → treat as private
|
|
198
|
+
entry = {
|
|
199
|
+
address: null,
|
|
200
|
+
isPublic: false,
|
|
201
|
+
resolvedAt: Date.now(),
|
|
202
|
+
};
|
|
203
|
+
}
|
|
204
|
+
else {
|
|
205
|
+
const isPrivate = family === 4
|
|
206
|
+
? isPrivateIPv4(address)
|
|
207
|
+
: family === 6
|
|
208
|
+
? isPrivateIPv6(address)
|
|
209
|
+
: true; // unknown family → private
|
|
210
|
+
entry = {
|
|
211
|
+
address,
|
|
212
|
+
isPublic: !isPrivate,
|
|
213
|
+
resolvedAt: Date.now(),
|
|
214
|
+
};
|
|
215
|
+
}
|
|
216
|
+
this._cache.set(fqdn, entry);
|
|
217
|
+
this._pending.delete(fqdn);
|
|
218
|
+
resolve(entry);
|
|
219
|
+
});
|
|
220
|
+
});
|
|
221
|
+
this._pending.set(fqdn, promise);
|
|
222
|
+
return promise;
|
|
223
|
+
}
|
|
224
|
+
}
|
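Review note on `isPrivateIPv6` above: the function classifies by string prefix, which does not line up exactly with the bit-level ranges named in its comments. `fe80::/10` spans first groups `fe80` through `febf`, so `startsWith("fe80")` misses part of it, and `startsWith("fc")` also accepts a short first group such as `fc::1`, which is outside `fc00::/7`. The sketch below checks the first 16-bit group numerically. The helper names are invented for this note and the code is a possible refinement, not what the package ships.

```javascript
// Parse the first 16-bit group of an IPv6 address.
// Addresses that start with "::" have a zero first group.
function firstHextet(ip) {
  const head = ip.toLowerCase().split(":")[0];
  return head === "" ? 0 : parseInt(head, 16);
}

// fe80::/10: the top 10 bits must equal 1111 1110 10.
function isLinkLocalV6(ip) {
  return (firstHextet(ip) & 0xffc0) === 0xfe80;
}

// fc00::/7: the top 7 bits must equal 1111 110.
function isUniqueLocalV6(ip) {
  return (firstHextet(ip) & 0xfe00) === 0xfc00;
}
```

The sketch assumes the input is already a syntactically valid IPv6 address, as it would be when it comes from a resolver callback. It does not handle IPv4-mapped forms such as `::ffff:10.0.0.1`, which would need a separate check.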
package/dist/hooks.js
CHANGED
|
@@ -24,6 +24,7 @@ import { createHash, randomBytes } from "node:crypto";
|
|
|
24
24
|
import { writeFileSync } from "node:fs";
|
|
25
25
|
import { BUILTIN_PATTERNS } from "./detectors/regex.js";
|
|
26
26
|
import { STATS_FILE, IS_TEST } from "./config.js";
|
|
27
|
+
import { DnsCache } from "./dns-cache.js";
|
|
27
28
|
function getSharedObfuscator(fallback) {
|
|
28
29
|
return globalThis.__shroudObfuscator || fallback;
|
|
29
30
|
}
|
|
@@ -191,6 +192,35 @@ export function registerHooks(api, obfuscator) {
|
|
|
191
192
|
else {
|
|
192
193
|
g.__shroudObfuscator = obfuscator;
|
|
193
194
|
}
|
|
195
|
+
// DNS cache for public URL detection — shared across plugin instances
|
|
196
|
+
if (!g.__shroudDnsCache) {
|
|
197
|
+
const cache = new DnsCache();
|
|
198
|
+
g.__shroudDnsCache = cache;
|
|
199
|
+
// Pre-warm with well-known public domains so first-turn URLs pass through
|
|
200
|
+
// without waiting for async DNS resolution. These domains are guaranteed
|
|
201
|
+
// public — no lookup needed.
|
|
202
|
+
const publicDomains = [
|
|
203
|
+
"youtube.com", "youtu.be", "m.youtube.com",
|
|
204
|
+
"google.com", "google.co.uk", "google.de", "google.fr",
|
|
205
|
+
"github.com", "gitlab.com", "bitbucket.org",
|
|
206
|
+
"stackoverflow.com", "stackexchange.com",
|
|
207
|
+
"wikipedia.org", "wikimedia.org",
|
|
208
|
+
"twitter.com", "x.com",
|
|
209
|
+
"reddit.com",
|
|
210
|
+
"linkedin.com",
|
|
211
|
+
"medium.com",
|
|
212
|
+
"npmjs.com", "www.npmjs.com", "pypi.org", "crates.io",
|
|
213
|
+
"docker.com", "hub.docker.com",
|
|
214
|
+
"microsoft.com", "apple.com",
|
|
215
|
+
"mozilla.org",
|
|
216
|
+
"w3.org",
|
|
217
|
+
"archive.org",
|
|
218
|
+
];
|
|
219
|
+
for (const d of publicDomains) {
|
|
220
|
+
cache.seed(d, "0.0.0.1", true); // address doesn't matter, isPublic=true
|
|
221
|
+
cache.seed("www." + d, "0.0.0.1", true);
|
|
222
|
+
}
|
|
223
|
+
}
|
|
194
224
|
}
|
|
195
225
|
// All hook closures must use the shared obfuscator, not the local parameter.
|
|
196
226
|
// OpenClaw loads the plugin multiple times; only one instance has the mappings.
|
|
@@ -206,6 +236,57 @@ export function registerHooks(api, obfuscator) {
|
|
|
206
236
|
if (ob().toolDepth > 0) {
|
|
207
237
|
ob().resetToolDepth();
|
|
208
238
|
}
|
|
239
|
+
// ── DNS cache warming ──
|
|
240
|
+
// Extract all URLs from the prompt and messages, resolve their FQDNs
|
|
241
|
+
// to determine public vs private. This runs BEFORE obfuscation so
|
|
242
|
+
// the sync pipeline's isDocExample() can check the cache.
|
|
243
|
+
//
|
|
244
|
+
// Slack wraps URLs as <https://url|display> or <https://url>.
|
|
245
|
+
// We must strip this markup BEFORE extracting URLs, otherwise the
|
|
246
|
+
// regex won't match and the DNS cache won't warm for Slack messages.
|
|
247
|
+
const dnsCache = globalThis.__shroudDnsCache;
|
|
248
|
+
if (dnsCache) {
|
|
249
|
+
const urlRe = /https?:\/\/[^\s<>"')\]]+[^\s<>"')\].,;:!?]/g;
|
|
250
|
+
const allUrls = [];
|
|
251
|
+
// Strip Slack link markup so URL regex can match cleanly
|
|
252
|
+
function stripSlackForDns(text) {
|
|
253
|
+
text = text.replace(/<mailto:[^|>]+\|([^>]*)>/g, "$1");
|
|
254
|
+
text = text.replace(/<(https?:\/\/[^|>]+)\|[^>]*>/g, "$1");
|
|
255
|
+
text = text.replace(/<(https?:\/\/[^>]+)>/g, "$1");
|
|
256
|
+
return text;
|
|
257
|
+
}
|
|
258
|
+
if (typeof event?.prompt === "string") {
|
|
259
|
+
const cleaned = stripSlackForDns(event.prompt);
|
|
260
|
+
for (const m of cleaned.matchAll(urlRe))
|
|
261
|
+
allUrls.push(m[0]);
|
|
262
|
+
}
|
|
263
|
+
if (Array.isArray(event?.messages)) {
|
|
264
|
+
for (const msg of event.messages) {
|
|
265
|
+
const texts = [];
|
|
266
|
+
if (typeof msg.content === "string")
|
|
267
|
+
texts.push(msg.content);
|
|
268
|
+
else if (Array.isArray(msg.content)) {
|
|
269
|
+
for (const b of msg.content) {
|
|
270
|
+
if (b?.type === "text" && typeof b.text === "string")
|
|
271
|
+
texts.push(b.text);
|
|
272
|
+
}
|
|
273
|
+
}
|
|
274
|
+
for (const text of texts) {
|
|
275
|
+
const cleaned = stripSlackForDns(text);
|
|
276
|
+
for (const m of cleaned.matchAll(urlRe))
|
|
277
|
+
allUrls.push(m[0]);
|
|
278
|
+
}
|
|
279
|
+
}
|
|
280
|
+
}
|
|
281
|
+
if (allUrls.length > 0) {
|
|
282
|
+
try {
|
|
283
|
+
await dnsCache.warmCache(allUrls);
|
|
284
|
+
}
|
|
285
|
+
catch {
|
|
286
|
+
// DNS failure is non-fatal — URLs will be obfuscated (safe default)
|
|
287
|
+
}
|
|
288
|
+
}
|
|
289
|
+
}
|
|
209
290
|
let totalEntities = 0;
|
|
210
291
|
// Obfuscate the system prompt
|
|
211
292
|
const prompt = event?.prompt;
|
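Review note on the cache-warming hunk above: the Slack markup is stripped first, and only then is the URL regex applied. The sketch below repeats the two regex steps from the diff on a sample string so the interaction is easier to follow. The regexes and `stripSlackForDns` are copied from the hunk; `extractUrls` and the sample text are additions made for this note.

```javascript
// URL pattern copied from the hunk: the last character may not be
// trailing punctuation such as "." or ",".
const urlRe = /https?:\/\/[^\s<>"')\]]+[^\s<>"')\].,;:!?]/g;

// Copied from the hunk: unwrap Slack's <url|label>, <url> and <mailto:...> forms.
function stripSlackForDns(text) {
  text = text.replace(/<mailto:[^|>]+\|([^>]*)>/g, "$1");
  text = text.replace(/<(https?:\/\/[^|>]+)\|[^>]*>/g, "$1");
  text = text.replace(/<(https?:\/\/[^>]+)>/g, "$1");
  return text;
}

// Helper added for illustration: strip markup, then collect every URL match.
function extractUrls(text) {
  return [...stripSlackForDns(text).matchAll(urlRe)].map((m) => m[0]);
}

const sample =
  "See <https://github.com/org/repo|the repo>, then <https://intranet.example.internal/wiki>.";
const urls = extractUrls(sample);
// urls holds the two bare URLs, without the Slack label or trailing punctuation.
```

Because `matchAll` works on a copy of the regex, reusing the same global `urlRe` across many strings, as the hook does, carries no `lastIndex` state from one call to the next.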
package/openclaw.plugin.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"id": "shroud-privacy",
|
|
3
3
|
"name": "Shroud",
|
|
4
|
-
"version": "2.2.
|
|
4
|
+
"version": "2.2.8",
|
|
5
5
|
"description": "Privacy obfuscation with deterministic fake values and deobfuscation — PII never reaches the LLM, tool calls still work",
|
|
6
6
|
"configSchema": {
|
|
7
7
|
"type": "object",
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "shroud-privacy",
|
|
3
|
-
"version": "2.2.
|
|
4
|
-
"description": "Privacy
|
|
3
|
+
"version": "2.2.8",
|
|
4
|
+
"description": "Privacy and infrastructure protection for AI agents — detects sensitive data (PII, network topology, credentials, OT/SCADA) and replaces with deterministic fakes before anything reaches the LLM.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "dist/index.js",
|
|
7
7
|
"types": "dist/index.d.ts",
|