@safefence/openclaw-guardrails 0.6.2 → 0.6.4

package/README.md CHANGED
@@ -1,444 +1,13 @@
1
1
  # OpenClaw Guardrails
2
2
 
3
+ [![npm version](https://img.shields.io/npm/v/@safefence/openclaw-guardrails)](https://www.npmjs.com/package/@safefence/openclaw-guardrails)
4
+ [![npm provenance](https://img.shields.io/badge/npm-provenance-brightgreen)](https://docs.npmjs.com/generating-provenance-statements)
5
+
3
6
  > **Experimental** -- This project is under active development and not yet production-ready. APIs, config schemas, and behavior may change without notice between releases.
4
7
 
5
8
  Native TypeScript security kernel for OpenClaw (`>=2026.2.25`) with deterministic local enforcement, principal-aware authorization, and owner approval for group/multi-user safety.
6
9
 
7
- ## Repository Context
8
-
9
- - Root project overview: [`../../README.md`](../../README.md)
10
- - Research and threat analysis: [`../../docs/openclaw-llm-security-research.md`](../../docs/openclaw-llm-security-research.md)
11
- - OWASP LLM coverage mapping: see the research doc above.
12
-
13
- ## Core Model
14
-
15
- - One engine path for all phases (`GuardrailsEngine`).
16
- - Fixed-order detector pipeline with deterministic reason codes.
17
- - Monotonic precedence: `DENY > REDACT > ALLOW`.
18
- - No runtime dependency on remote inference or policy services.
19
- - Zero runtime dependencies — uses only Node.js built-ins (`fetch()`, `fs`).
20
- - Audit mode still applies redaction by default.
21
-
22
- ## How It Works
23
-
24
- ### Plugin ↔ Engine Flow
25
-
26
- The plugin has three layers: `openclaw-extension.ts` registers typed hooks with OpenClaw, `event-adapter.ts` maps between OpenClaw's structured `(event, ctx)` pairs and the internal `OpenClawContext`, and `openclaw-adapter.ts` converts contexts into `GuardEvent`s for the engine.
27
-
28
- ```mermaid
29
- sequenceDiagram
30
- participant OC as OpenClaw Runtime
31
- participant EXT as openclaw-extension.ts
32
- participant EA as event-adapter.ts
33
- participant ADP as openclaw-adapter.ts
34
- participant ENG as GuardrailsEngine
35
-
36
- OC->>EXT: api.on(hookName, handler)
37
- EXT->>EA: map*(event, ctx) → OpenClawContext
38
- EXT->>ADP: hooks.<hookName>(oclCtx)
39
- ADP->>ADP: toEvent(phase, ctx) → GuardEvent
40
- ADP->>ENG: engine.evaluate(guardEvent, phase)
41
- ENG-->>ADP: GuardDecision
42
- ADP->>ADP: applyRolloutPolicy()
43
- ADP->>ADP: updateMetrics()
44
- ADP-->>EXT: OpenClawHookResult
45
- EXT->>EA: mapTo*Result(hookResult) → typed result
46
- EXT-->>OC: hook-specific return value
47
- ```
48
-
49
- ### Hook Lifecycle
50
-
51
- Six lifecycle hooks span the full agent interaction. Each hook has different blocking/redaction capabilities:
52
-
53
- ```mermaid
54
- sequenceDiagram
55
- participant U as User / Channel
56
- participant OC as OpenClaw
57
- participant G as Guardrails Plugin
58
-
59
- rect rgb(240, 248, 255)
60
- Note over U,G: Agent Initialization
61
- OC->>G: before_agent_start(prompt, agentCtx)
62
- G-->>OC: { prependSystemContext: securityPolicy }
63
- Note right of G: Injects immutable security prompt
64
- end
65
-
66
- rect rgb(255, 248, 240)
67
- Note over U,G: Inbound Message
68
- U->>OC: Send message
69
- OC->>G: message_received(from, content, channelCtx)
70
- G-->>OC: void (observe-only, cannot block)
71
- Note right of G: Audits violations, defers enforcement
72
- end
73
-
74
- rect rgb(240, 255, 240)
75
- Note over U,G: Tool Execution Gate
76
- OC->>G: before_tool_call(toolName, params, agentCtx)
77
- G-->>OC: { block: true, blockReason } or {}
78
- Note right of G: Primary enforcement point
79
- end
80
-
81
- rect rgb(255, 255, 240)
82
- Note over U,G: Tool Result Persistence
83
- OC->>G: tool_result_persist(message, toolCtx)
84
- G-->>OC: { message: { content: redacted } } or {}
85
- Note right of G: Sync regex redaction only
86
- Note right of G: Async engine eval for audit (fire-and-forget)
87
- end
88
-
89
- rect rgb(255, 240, 240)
90
- Note over U,G: Outbound Message Gate
91
- OC->>G: message_sending(content, channelCtx)
92
- G-->>OC: { cancel: true } or { content: redacted } or {}
93
- Note right of G: Blocks system prompt leaks
94
- Note right of G: Always enforced in stage_b rollout
95
- end
96
-
97
- rect rgb(248, 240, 255)
98
- Note over U,G: Session End
99
- OC->>G: agent_end(messages, success, agentCtx)
100
- G-->>OC: void (observe-only)
101
- Note right of G: Emits metrics + monitoring snapshot
102
- end
103
- ```
104
-
105
- ### Hook Capability Matrix
106
-
107
- | Hook | Can Block | Can Redact | Can Cancel | Return Type |
108
- |---|---|---|---|---|
109
- | `before_agent_start` | No | No | No | `{ prependSystemContext }` |
110
- | `message_received` | No (void) | No | No | void |
111
- | `before_tool_call` | **Yes** | No | No | `{ block, blockReason }` |
112
- | `tool_result_persist` | No | **Yes** (sync) | No | `{ message }` |
113
- | `message_sending` | **Yes** | **Yes** | **Yes** | `{ cancel }` or `{ content }` |
114
- | `agent_end` | No (void) | No | No | void |
115
-
116
- ### Detector Pipeline
117
-
118
- All 12 detectors run sequentially for every `engine.evaluate()` call. No short-circuiting — an early DENY does not skip later detectors. All hits are merged, then `DENY > REDACT > ALLOW` precedence determines the outcome.
119
-
120
- ```mermaid
121
- sequenceDiagram
122
- participant ENG as Engine.evaluate()
123
- participant D1 as 1. Input Intent
124
- participant D2 as 2. Command Policy
125
- participant D3 as 3. Path Canonical
126
- participant D4 as 4. Network Egress
127
- participant D5 as 5. Provenance
128
- participant D6 as 6. Principal Authz
129
- participant D7 as 7. Owner Approval
130
- participant D8 as 8. Sensitive Data
131
- participant D9 as 9. Restricted Info
132
- participant D10 as 10. Output Safety
133
- participant D11 as 11. Budget
134
- participant D12 as 12. Extensions
135
-
136
- Note over ENG: normalizeGuardEvent(rawEvent)
137
-
138
- ENG->>D1: size limits, injection, exfil, context probes
139
- D1-->>ENG: hits[]
140
- ENG->>D2: tool allowlist, binary allowlist, shell ops, destructive cmds
141
- Note over D2: before_tool_call only
142
- D2-->>ENG: hits[]
143
- ENG->>D3: path traversal, workspace boundary, symlinks
144
- Note over D3: async (realpath), before_tool_call only
145
- D3-->>ENG: hits[]
146
- ENG->>D4: host allowlist, private egress, DNS validation
147
- Note over D4: async (DNS), before_tool_call only
148
- D4-->>ENG: hits[]
149
- ENG->>D5: supply chain trust + retrieval trust
150
- Note over D5: async, before_tool_call only
151
- D5-->>ENG: hits[]
152
- ENG->>D6: identity resolution, RBAC, mention-gating
153
- Note over D6: Anti-spoofing: owner/admin derived from config only
154
- D6-->>ENG: hits[] + approvalRequirement?
155
- ENG->>D7: challenge/verify approval token
156
- Note over D7: Only runs if D6 returned approvalRequirement
157
- D7-->>ENG: hits[] + approvalChallenge?
158
- ENG->>D8: secret patterns → PII patterns (cascaded)
159
- D8-->>ENG: hits[] + redactedContent?
160
- ENG->>D9: data-class redaction for non-owner principals
161
- D9-->>ENG: hits[] + redactedContent?
162
- ENG->>D10: system prompt leak, suspicious output patterns
163
- Note over D10: Receives pre-redacted content from D9/D8
164
- D10-->>ENG: hits[] + redactedContent?
165
- ENG->>D11: requests/min + tool calls/min (sliding window)
166
- D11-->>ENG: hits[]
167
- ENG->>D12: external HTTP validators + custom validators
168
- Note over D12: Concurrent via Promise.all, custom validators fail-open
169
- D12-->>ENG: hits[]
170
-
171
- Note over ENG: decideFromHits(): DENY > REDACT > ALLOW
172
- Note over ENG: aggregateRisk(): 1 - exp(-weighted_sum)
173
- Note over ENG: finalizeDecision(): audit mode override
174
- Note over ENG: auditSink.append() if enabled
175
- ```
176
-
177
- #### Detector Details
178
-
179
- | # | Detector | Active Phases | What It Checks | Decision | Weight |
180
- |---|---|---|---|---|---|
181
- | 1 | Input Intent | All | Input size limits, prompt injection patterns, exfiltration patterns, context probing (injected filenames, workspace probing) | DENY | 0.75–0.95 |
182
- | 2 | Command Policy | `before_tool_call` | Tool allowlist, binary allowlist, shell operators, destructive command patterns, arg pattern validation | DENY | 0.8–1.0 |
183
- | 3 | Path Canonical | `before_tool_call` | Path traversal patterns, workspace boundary (realpath), symlink traversal | DENY | 0.9–0.95 |
184
- | 4 | Network Egress | `before_tool_call` | Host allowlist, private/local IP blocking, DNS resolution, egress tool detection | DENY | 0.7–0.9 |
185
- | 5 | Provenance | `before_tool_call` | Skill source trust, hash integrity, retrieval trust level, signed source | DENY | 0.7–0.85 |
186
- | 6 | Principal Authz | All | Identity resolution, role-based tool policy, mention-gating, group channel enforcement, data-class restrictions | DENY | 0.7–0.95 |
187
- | 7 | Owner Approval | Conditional | Challenge creation, token verification (TTL, digest, conversation, replay) | DENY | 0.8–0.9 |
188
- | 8 | Sensitive Data | All | Secret patterns (AWS keys, GitHub PATs, PEM keys, etc.), PII patterns (emails, SSNs, credit cards) | REDACT | 0.5–0.7 |
189
- | 9 | Restricted Info | `message_received`, `tool_result_persist`, `message_sending` | Data-class policy for non-owner principals, cross-principal redaction | DENY/REDACT | 0.7–0.9 |
190
- | 10 | Output Safety | `message_received`, `tool_result_persist`, `message_sending` | System prompt leak patterns, injected filename references, suspicious patterns (script tags, bearer tokens) | DENY/REDACT | 0.55–0.95 |
191
- | 11 | Budget | All (tool calls: `before_tool_call` only) | Requests/minute, tool calls/minute (sliding 60s window, per-principal partitioned) | DENY | 0.65–0.75 |
192
- | 12 | Extensions | All | External HTTP validators (circuit breaker, timeout), custom validator functions (phase-filtered) | DENY | 0.5–0.7 |
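The Budget row above describes a sliding 60-second window partitioned per principal. A minimal sketch of that idea follows; the class name `SlidingWindowBudget`, the method `tryConsume`, and the string key format are hypothetical and are not taken from the package's `budget-store.ts`.

```ts
// Hypothetical sketch of a per-key sliding-window counter (not the package's code).
class SlidingWindowBudget {
  private readonly timestamps = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the event if the key is under its limit;
  // returns false (and records nothing) if the window is already full.
  tryConsume(key: string, now: number): boolean {
    const cutoff = now - this.windowMs;
    // Drop events that have slid out of the window.
    const recent = (this.timestamps.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.timestamps.set(key, recent);
    return allowed;
  }
}
```

Keying the map by a composite such as agent, principal, and conversation would keep one noisy sender from exhausting another sender's budget.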
193
-
194
- ### Risk Scoring
195
-
196
- Risk score formula: `1 - exp(-Σ(clamp(weight, 0, 1) × multiplier))` where DENY multiplier = 1.0, REDACT multiplier = 0.6. This produces a diminishing-returns curve: many small hits converge toward 1.0 but never exceed it. Rounded to 4 decimal places.
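To make the formula above concrete, here is a small sketch that computes it directly. The function name `riskScore` and the `RuleHit` shape are illustrative assumptions, as is treating ALLOW hits as contributing nothing; only the multipliers, the clamp, and the 4-decimal rounding come from the text.

```ts
type Decision = "DENY" | "REDACT" | "ALLOW";

// Hypothetical hit shape for illustration only.
interface RuleHit {
  decision: Decision;
  weight: number;
}

// DENY = 1.0 and REDACT = 0.6 as stated above; ALLOW = 0 is an assumption.
function multiplierFor(decision: Decision): number {
  if (decision === "DENY") return 1.0;
  if (decision === "REDACT") return 0.6;
  return 0;
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// 1 - exp(-sum(clamp(weight, 0, 1) * multiplier)), rounded to 4 decimal places.
function riskScore(hits: RuleHit[]): number {
  const weightedSum = hits.reduce(
    (sum, hit) => sum + clamp(hit.weight, 0, 1) * multiplierFor(hit.decision),
    0,
  );
  const score = 1 - Math.exp(-weightedSum);
  return Math.round(score * 10000) / 10000;
}
```

Under these assumptions a single DENY hit with weight 0.95 scores about 0.6133, and a single REDACT hit with weight 0.5 scores about 0.2592, which shows why several moderate hits are needed to approach 1.0.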
197
-
198
- ### Decision Finalization
199
-
200
- ```mermaid
201
- flowchart TD
202
- A[All RuleHits merged] --> B{Any DENY hit?}
203
- B -->|Yes| C[decision = DENY]
204
- B -->|No| D{Any REDACT hit?}
205
- D -->|Yes| E[decision = REDACT]
206
- D -->|No| F[decision = ALLOW]
207
- C --> G{mode = audit?}
208
- E --> G
209
- F --> H[Return GuardDecision]
210
- G -->|Yes| I[Override to ALLOW<br/>Prepend AUDIT_WOULD_DENY/REDACT<br/>Redact only if applyInAuditMode]
211
- G -->|No| J[Return as-is with enforcement]
212
- I --> H
213
- J --> H
214
- ```
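The flowchart above can be read as two steps: pick the strictest decision among the hits, then soften it in audit mode. The sketch below follows that reading. The function names and the `FinalDecision` shape are hypothetical, the marker names `AUDIT_WOULD_DENY` and `AUDIT_WOULD_REDACT` are inferred from the flowchart label, and the `applyInAuditMode` redaction branch is deliberately left out.

```ts
type Decision = "DENY" | "REDACT" | "ALLOW";
type Mode = "enforce" | "audit";

// Hypothetical result shape for illustration only.
interface FinalDecision {
  decision: Decision;
  reasonCodes: string[];
}

// Monotonic precedence: DENY > REDACT > ALLOW.
function strictest(hitDecisions: Decision[]): Decision {
  if (hitDecisions.some((d) => d === "DENY")) return "DENY";
  if (hitDecisions.some((d) => d === "REDACT")) return "REDACT";
  return "ALLOW";
}

function finalize(hitDecisions: Decision[], reasonCodes: string[], mode: Mode): FinalDecision {
  const decision = strictest(hitDecisions);
  // ALLOW is returned directly; enforce mode returns the decision as-is.
  if (decision === "ALLOW" || mode === "enforce") {
    return { decision, reasonCodes };
  }
  // Audit mode: override to ALLOW and prepend a marker recording what would have happened.
  const marker = decision === "DENY" ? "AUDIT_WOULD_DENY" : "AUDIT_WOULD_REDACT";
  return { decision: "ALLOW", reasonCodes: [marker, ...reasonCodes] };
}
```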
215
-
216
- ### Rollout Stages
217
-
218
- ```mermaid
219
- flowchart LR
220
- A[stage_a_audit] -->|"All violations audit-only"| B[stage_b_high_risk_enforce]
221
- B -->|"message_sending: always enforce<br/>before_tool_call: enforce if highRiskTools<br/>others: audit-only"| C[stage_c_full_enforce]
222
- C -->|"All violations enforced"| D[Production]
223
- ```
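One way to read the rollout diagram above is as a predicate answering "is a violation enforced for this stage, hook, and tool?". The sketch below encodes that reading; `isEnforced` and its parameter list are hypothetical, while the stage and hook names are the ones used in this README.

```ts
type RolloutStage =
  | "stage_a_audit"
  | "stage_b_high_risk_enforce"
  | "stage_c_full_enforce";

type Phase =
  | "before_agent_start"
  | "message_received"
  | "before_tool_call"
  | "tool_result_persist"
  | "message_sending"
  | "agent_end";

// Hypothetical helper: true when a violation should be enforced rather than only audited.
function isEnforced(
  stage: RolloutStage,
  phase: Phase,
  toolName: string | undefined,
  highRiskTools: string[],
): boolean {
  if (stage === "stage_a_audit") return false; // all violations audit-only
  if (stage === "stage_c_full_enforce") return true; // all violations enforced
  // stage_b_high_risk_enforce
  if (phase === "message_sending") return true; // always enforced
  if (phase === "before_tool_call") {
    return toolName !== undefined && highRiskTools.indexOf(toolName) !== -1;
  }
  return false; // other hooks stay audit-only
}
```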
224
-
225
- ## Security Features
226
-
227
- ### Identity and Authorization
228
- - Principal-aware identity model (`owner/admin/member/unknown`).
229
- - **Anti-spoofing**: privileged roles (`owner`/`admin`) are derived exclusively from `principal.ownerIds`/`adminIds` in config — caller-supplied `metadata.role` values of `"owner"` or `"admin"` are downgraded to `"member"`.
230
- - Group-aware authorization (mention-gating + role-based tool policy).
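The anti-spoofing rule above can be sketched as a role resolver that trusts only the configured ID lists. `resolveRole` and `PrincipalConfig` are hypothetical names, and mapping a missing or unrecognized claim to `"unknown"` is an assumption; the downgrade of claimed `"owner"`/`"admin"` to `"member"` is the behavior described in the text.

```ts
type PrincipalRole = "owner" | "admin" | "member" | "unknown";

// Mirrors the `principal.ownerIds` / `principal.adminIds` config keys.
interface PrincipalConfig {
  ownerIds: string[];
  adminIds: string[];
}

function resolveRole(
  senderId: string | undefined,
  claimedRole: string | undefined,
  cfg: PrincipalConfig,
): PrincipalRole {
  // Privileged roles come from config only, never from caller metadata.
  if (senderId !== undefined && cfg.ownerIds.indexOf(senderId) !== -1) return "owner";
  if (senderId !== undefined && cfg.adminIds.indexOf(senderId) !== -1) return "admin";
  // A claimed "owner" or "admin" is downgraded to "member".
  if (claimedRole === "owner" || claimedRole === "admin" || claimedRole === "member") {
    return "member";
  }
  // Assumption: no usable claim means the principal is unknown.
  return "unknown";
}
```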
231
-
232
- ### Owner Approval Workflow
233
-
234
- ```mermaid
235
- sequenceDiagram
236
- participant Agent as Agent / Caller
237
- participant ENG as GuardrailsEngine
238
- participant D6 as Principal Authz
239
- participant D7 as Owner Approval
240
- participant AB as ApprovalBroker
241
- participant AS as ApprovalStore
242
- participant NS as NotificationSink
243
- participant Owner as Owner / Admin
244
-
245
- rect rgb(255, 248, 240)
246
- Note over Agent,Owner: Phase 1: Challenge
247
- Agent->>ENG: before_tool_call (restricted tool, member role)
248
- ENG->>D6: evaluateAuthorization()
249
- D6-->>ENG: approvalRequirement (requiredRole, reason)
250
- ENG->>D7: detectOwnerApproval(requirement)
251
- D7->>AB: createChallenge(toolName, args, requesterId)
252
- AB->>AB: requestId = randomUUID()
253
- AB->>AB: actionDigest = SHA-256({ toolName, args, conversationId, ... })
254
- AB->>AS: save(record with expiresAt = now + TTL)
255
- AB->>NS: notify({ requestId, toolName, reason, expiresAt })
256
- AB-->>D7: { requestId, expiresAt, requiredRole }
257
- D7-->>ENG: DENY + approvalChallenge
258
- ENG-->>Agent: DENY with approvalChallenge.requestId
259
- end
260
-
261
- rect rgb(240, 255, 240)
262
- Note over Agent,Owner: Phase 2: Approval
263
- Owner->>ENG: /approve <requestId>
264
- ENG->>AB: approveRequest(requestId, ownerId, "owner")
265
- AB->>AS: lookup(requestId)
266
- AB->>AB: Verify: not expired, role sufficient, not self-approval
267
- AB->>AB: Check quorum (approverIds.length >= ownerQuorum?)
268
- AB->>AB: Generate token: apr_<uuid>
269
- AB->>AS: setToken(requestId, token)
270
- AB-->>ENG: token string
271
- ENG-->>Owner: "Approved. Token: apr_..."
272
- end
273
-
274
- rect rgb(240, 248, 255)
275
- Note over Agent,Owner: Phase 3: Redemption
276
- Agent->>ENG: before_tool_call (same tool + metadata.approval.token)
277
- ENG->>D6: evaluateAuthorization() → approvalRequirement
278
- ENG->>D7: detectOwnerApproval(requirement)
279
- D7->>AB: verifyAndConsumeToken(token)
280
- AB->>AS: lookup by token
281
- AB->>AB: Verify: not expired, not used, conversation match
282
- AB->>AB: Verify: action digest match (same tool + args)
283
- AB->>AS: markUsed(requestId)
284
- AB-->>D7: "valid"
285
- D7-->>ENG: no hits (ALLOW)
286
- ENG-->>Agent: ALLOW
287
- end
288
-
289
- rect rgb(255, 240, 240)
290
- Note over Agent,Owner: Replay Prevention
291
- Agent->>ENG: before_tool_call (same token again)
292
- ENG->>D7: detectOwnerApproval(requirement)
293
- D7->>AB: verifyAndConsumeToken(token)
294
- AB->>AB: Token already has usedAt timestamp
295
- AB-->>D7: "replayed"
296
- D7-->>ENG: DENY (OWNER_APPROVAL_REPLAYED)
297
- ENG-->>Agent: DENY
298
- end
299
- ```
300
-
301
- **Approval verification checks** (in order):
302
- 1. Token exists and maps to a valid record
303
- 2. Record not expired (TTL from creation)
304
- 3. Token not already consumed (`usedAt` is null)
305
- 4. RequestId matches (if provided by caller)
306
- 5. Requester identity matches original requester
307
- 6. Conversation matches (if `bindToConversation` enabled)
308
- 7. Action digest matches (SHA-256 of tool + args + context)
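The ordered checks above can be sketched as a single function that returns the first failing condition. Everything here is illustrative: the record and attempt shapes, the function name, and every verdict string except `"valid"` and `"replayed"` (which appear in the diagram) are assumptions, and the action digest is passed in as a precomputed string rather than hashed.

```ts
// Hypothetical shapes; the package's ApprovalStore record may differ.
interface ApprovalRecord {
  requestId: string;
  requesterId: string;
  conversationId: string;
  actionDigest: string;
  expiresAt: number; // epoch milliseconds
  usedAt: number | null;
}

interface RedemptionAttempt {
  token: string;
  requestId?: string;
  requesterId: string;
  conversationId: string;
  actionDigest: string;
}

type Verdict =
  | "valid"
  | "replayed"
  | "not_found"
  | "expired"
  | "request_mismatch"
  | "requester_mismatch"
  | "conversation_mismatch"
  | "digest_mismatch";

function verifyAndConsume(
  recordsByToken: Map<string, ApprovalRecord>,
  attempt: RedemptionAttempt,
  now: number,
  bindToConversation: boolean,
): Verdict {
  const record = recordsByToken.get(attempt.token);
  if (record === undefined) return "not_found"; // 1. token maps to a record
  if (now >= record.expiresAt) return "expired"; // 2. TTL not exceeded
  if (record.usedAt !== null) return "replayed"; // 3. single use
  if (attempt.requestId !== undefined && attempt.requestId !== record.requestId) {
    return "request_mismatch"; // 4. only checked when the caller supplies it
  }
  if (attempt.requesterId !== record.requesterId) return "requester_mismatch"; // 5.
  if (bindToConversation && attempt.conversationId !== record.conversationId) {
    return "conversation_mismatch"; // 6.
  }
  if (attempt.actionDigest !== record.actionDigest) return "digest_mismatch"; // 7.
  record.usedAt = now; // consume so a second redemption is rejected
  return "valid";
}
```

Marking the record as used only after every check passes means a failed attempt does not burn a legitimate token.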
309
-
310
- ### Outbound Guard (System Prompt Leak Prevention)
311
-
312
- ```mermaid
313
- sequenceDiagram
314
- participant Agent as Agent
315
- participant ADP as Adapter
316
- participant ENG as Engine
317
- participant D10 as Output Safety
318
-
319
- Agent->>ADP: message_sending(context)
320
- ADP->>ADP: extractOutboundContent()
321
- Note over ADP: Scans ALL string fields<br/>(not just "content")
322
- ADP->>ENG: evaluate(guardEvent, "message_sending")
323
- ENG->>D10: Check leak patterns + injected filenames
324
- alt System prompt content detected
325
- D10-->>ENG: DENY (SYSTEM_PROMPT_LEAK, weight 0.95)
326
- ENG-->>ADP: DENY
327
- ADP-->>Agent: { cancel: true }
328
- else Suspicious patterns (script tags, tokens)
329
- D10-->>ENG: REDACT (UNTRUSTED_OUTPUT, weight 0.55)
330
- ENG-->>ADP: REDACT with sanitized content
331
- ADP-->>Agent: { content: redactedContent }
332
- else Clean
333
- D10-->>ENG: no hits
334
- ENG-->>ADP: ALLOW
335
- ADP-->>Agent: {}
336
- end
337
- ```
338
-
339
- ### `tool_result_persist` — Split Sync/Async Strategy
340
-
341
- This hook is synchronous in OpenClaw but the engine is async. The adapter splits the work:
342
-
343
- ```mermaid
344
- sequenceDiagram
345
- participant OC as OpenClaw (sync)
346
- participant EXT as Extension
347
- participant ADP as Adapter (async)
348
- participant AUDIT as Audit Sink
349
-
350
- OC->>EXT: tool_result_persist(event, ctx)
351
-
352
- par Sync path (returns to OpenClaw)
353
- EXT->>EXT: redactWithPatterns(content, precompiled patterns)
354
- EXT-->>OC: { message: { content: redacted } } or {}
355
- and Async path (fire-and-forget)
356
- EXT->>ADP: hooks.tool_result_persist(oclCtx)
357
- ADP->>ADP: engine.evaluate() + metrics
358
- ADP->>AUDIT: auditSink.append()
359
- Note over ADP: Promise .catch() logs errors
360
- end
361
- ```
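The split described above can be sketched as a synchronous handler that starts the audit work without awaiting it. All names below (`redactSync`, `onToolResultPersist`, the callback parameters) are hypothetical stand-ins, not the package's `redactWithPatterns()` or adapter API, and the example pattern is only an illustration of a secret regex.

```ts
interface PersistResult {
  message?: { content: string };
}

// Synchronous regex redaction; patterns are expected to carry the `g` flag.
function redactSync(content: string, patterns: RegExp[], replacement: string): string {
  return patterns.reduce((text, pattern) => text.replace(pattern, replacement), content);
}

function onToolResultPersist(
  content: string,
  patterns: RegExp[],
  auditEvaluate: (content: string) => Promise<void>,
  logError: (err: unknown) => void,
): PersistResult {
  // Async path: fire-and-forget. The promise is not awaited; failures are only logged.
  auditEvaluate(content).catch(logError);

  // Sync path: the only work whose result is returned to the caller.
  const redacted = redactSync(content, patterns, "[REDACTED]");
  return redacted === content ? {} : { message: { content: redacted } };
}

// Illustrative pattern resembling an AWS access key ID.
const examplePatterns = [/AKIA[0-9A-Z]{16}/g];
```

The consequence of this design is that a DENY reached by the async evaluation cannot block persistence; it can only show up in audit output and metrics.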
362
-
363
- ### Reason Code Sanitization
364
-
365
- Sensitive reason codes are replaced before reaching the client to prevent detection fingerprinting:
366
-
367
- | Internal Code | Client-Facing Code |
368
- |---|---|
369
- | `SECRET_DETECTED` | `CONTENT_POLICY_VIOLATION` |
370
- | `PII_DETECTED` | `CONTENT_POLICY_VIOLATION` |
371
- | `EXFIL_PATTERN` | `CONTENT_POLICY_VIOLATION` |
372
- | `SYSTEM_PROMPT_LEAK` | `CONTENT_POLICY_VIOLATION` |
373
-
374
- All other reason codes pass through unchanged.
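The mapping in the table above amounts to a lookup with pass-through. A minimal sketch, assuming reason codes are plain strings; the function name `sanitizeReasonCodes` is hypothetical, and whether the real implementation de-duplicates the result is not stated, so this version does not.

```ts
// Internal codes that are masked before reaching the client (from the table above).
const CLIENT_FACING = new Map<string, string>([
  ["SECRET_DETECTED", "CONTENT_POLICY_VIOLATION"],
  ["PII_DETECTED", "CONTENT_POLICY_VIOLATION"],
  ["EXFIL_PATTERN", "CONTENT_POLICY_VIOLATION"],
  ["SYSTEM_PROMPT_LEAK", "CONTENT_POLICY_VIOLATION"],
]);

// Replace sensitive codes; every other code passes through unchanged.
function sanitizeReasonCodes(codes: string[]): string[] {
  return codes.map((code) => CLIENT_FACING.get(code) ?? code);
}
```

Collapsing four distinct detections into one generic code keeps a caller from learning which specific detector fired.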
375
-
376
- ### Redaction Cascade
377
-
378
- Sensitive data, restricted info, and output safety detectors produce redacted content in a priority chain:
379
-
380
- ```mermaid
381
- flowchart LR
382
- D8[D8: Sensitive Data<br/>secrets → PII] -->|redactedContent| D9[D9: Restricted Info<br/>data-class policy]
383
- D9 -->|redactedContent| D10[D10: Output Safety<br/>leak patterns]
384
- D10 -->|Final redactedContent| R[Engine picks:<br/>D10 > D9 > D8]
385
- ```
386
-
387
- ## Architecture
388
-
389
- ```
390
- src/
391
- ├── index.ts # Public exports
392
- ├── core/
393
- │ ├── engine.ts # Ordered detector pipeline + final decisioning
394
- │ ├── identity.ts # Principal normalization + anti-spoofing
395
- │ ├── authorization.ts # Role/channel/data-class policy evaluation
396
- │ ├── approval.ts # Owner approval broker + notification sink
397
- │ ├── approval-store.ts # Persistent approval state + pruning
398
- │ ├── audit-sink.ts # JSONL audit event sink
399
- │ ├── budget-store.ts # Per-principal budget tracking
400
- │ ├── custom-validator.ts # Custom validator interface
401
- │ ├── jsonl-writer.ts # Shared JSONL append writer
402
- │ ├── notification-sink.ts # Admin notification sink interface + impls
403
- │ ├── token-usage-store.ts # Per-user token usage tracking
404
- │ ├── normalize.ts # Event normalization
405
- │ ├── event-utils.ts # Guard event helpers
406
- │ ├── scoring.ts # Risk score aggregation
407
- │ ├── reason-codes.ts # Canonical reason code constants
408
- │ ├── types.ts # Core type definitions
409
- │ ├── command-parse.ts # Command string parsing
410
- │ ├── network-guard.ts # Network host/URL validation
411
- │ ├── path-canonical.ts # Path canonicalization + symlink checks
412
- │ ├── retrieval-trust.ts # Retrieval trust level evaluation
413
- │ ├── supply-chain.ts # Skill source + hash policy
414
- │ └── detectors/ # Security detector modules
415
- │ ├── index.ts # Detector exports
416
- │ ├── types.ts # Detector type definitions
417
- │ ├── budget-detector.ts # Per-principal budget enforcement
418
- │ ├── command-policy-detector.ts # Command allow/deny + shell operator blocking
419
- │ ├── external-validator-detector.ts # HTTP external validation + circuit breaker
420
- │ ├── input-intent-detector.ts # Prompt injection, exfiltration, context probing
421
- │ ├── network-egress-detector.ts # Host allowlist + private IP blocking
422
- │ ├── output-safety-detector.ts # System prompt leak + filename injection
423
- │ ├── owner-approval-detector.ts # Approval challenge gating
424
- │ ├── path-canonical-detector.ts # Symlink traversal detection
425
- │ ├── principal-authz-detector.ts # Role-based authorization
426
- │ ├── provenance-detector.ts # Skill source trust + hash integrity
427
- │ ├── restricted-info-detector.ts # Non-privileged group redaction
428
- │ └── sensitive-data-detector.ts # Secret/PII detection
429
- ├── plugin/
430
- │ ├── version.ts # Shared version constant
431
- │ ├── event-adapter.ts # OpenClaw typed hook ↔ internal context mapping
432
- │ ├── openclaw-adapter.ts # Core guardrails engine adapter + telemetry
433
- │ └── openclaw-extension.ts # Plugin entry point (api.on() typed hooks)
434
- ├── redaction/
435
- │ └── redact.ts # Secret/PII redaction engine (cached regex)
436
- └── rules/
437
- ├── default-policy.ts # Default config factory + merge
438
- └── patterns.ts # Detection pattern definitions
439
- ```
440
-
441
- ## Install in OpenClaw
10
+ ## Install
442
11
 
443
12
  ```bash
444
13
  openclaw plugins install @safefence/openclaw-guardrails
@@ -470,13 +39,10 @@ After changing plugin install/config, restart the OpenClaw service or gateway pr
470
39
 
471
40
  ## Usage
472
41
 
473
- Three main entry points:
474
-
475
42
  ```ts
476
- // 1. OpenClaw plugin — default export, auto-discovered by OpenClaw via
477
- // package.json "openclaw.extensions". Registers all typed hooks via api.on().
43
+ // 1. OpenClaw plugin — default export, auto-discovered via package.json
44
+ // "openclaw.extensions". Registers all typed hooks via api.on().
478
45
  import { openclawGuardrailsPlugin } from "@safefence/openclaw-guardrails";
479
- // openclawGuardrailsPlugin.register(api) is called automatically by OpenClaw.
480
46
 
481
47
  // 2. Plugin factory — returns a guardrails engine with hook handlers,
482
48
  // useful for testing or manual integration.
@@ -497,37 +63,7 @@ const engine = new GuardrailsEngine(config);
497
63
  const decision = await engine.evaluate(event);
498
64
  ```
499
65
 
500
- ### Plugin with advanced options
501
-
502
- ```ts
503
- import {
504
- createOpenClawGuardrailsPlugin,
505
- JsonlAuditSink,
506
- CallbackNotificationSink
507
- } from "@safefence/openclaw-guardrails";
508
-
509
- const plugin = createOpenClawGuardrailsPlugin({
510
- config: {
511
- workspaceRoot: "/workspace/project",
512
- audit: { enabled: true, sinkPath: "/var/log/guardrails/audit.jsonl" },
513
- budgetPersistence: { enabled: true, storagePath: "/data/token-usage.jsonl" },
514
- notifications: { enabled: true },
515
- externalValidation: {
516
- enabled: true,
517
- endpoint: "https://guard.example.com/validate",
518
- validators: ["jailbreak", "pii"],
519
- timeoutMs: 3000,
520
- failOpen: true
521
- }
522
- },
523
- auditSink: new JsonlAuditSink("/var/log/guardrails/audit.jsonl"),
524
- notificationSink: new CallbackNotificationSink(async (notification) => {
525
- await sendSlackMessage(adminChannel, `Approval needed: ${notification.reason}`);
526
- })
527
- });
528
- ```
529
-
530
- ### Custom validators
66
+ ### Custom Validators
531
67
 
532
68
  ```ts
533
69
  import { GuardrailsEngine } from "@safefence/openclaw-guardrails";
@@ -547,108 +83,15 @@ const spendingLimit: CustomValidator = {
547
83
  const engine = new GuardrailsEngine(config, { customValidators: [spendingLimit] });
548
84
  ```
549
85
 
550
- **Exported types**: `ApproverRole`, `ChannelType`, `DataClass`, `Decision`, `PrincipalContext`, `PrincipalRole`, `RolloutStage`, `GuardDecision`, `GuardEvent`, `GuardrailsConfig`, `Phase`, `TokenUsageSummary`, `AuditEvent`, `AuditSink`, `CustomValidator`, `CustomValidatorContext`, `NotificationSink`, `ApprovalNotification`, `TokenUsageRecord`, `EngineOptions`, `PluginOptions`.
86
+ ## Exports
551
87
 
552
- **Exported constants**: `REASON_CODES`, `UNKNOWN_SENDER`, `UNKNOWN_CONVERSATION`.
88
+ **Types**: `ApproverRole`, `ChannelType`, `DataClass`, `Decision`, `PrincipalContext`, `PrincipalRole`, `RolloutStage`, `GuardDecision`, `GuardEvent`, `GuardrailsConfig`, `Phase`, `TokenUsageSummary`, `AuditEvent`, `AuditSink`, `CustomValidator`, `CustomValidatorContext`, `NotificationSink`, `ApprovalNotification`, `TokenUsageRecord`, `EngineOptions`, `PluginOptions`.
553
89
 
554
- **Exported classes**: `GuardrailsEngine`, `JsonlAuditSink`, `NoopAuditSink`, `ConsoleNotificationSink`, `CallbackNotificationSink`, `NoopNotificationSink`, `TokenUsageStore`.
90
+ **Constants**: `REASON_CODES`, `UNKNOWN_SENDER`, `UNKNOWN_CONVERSATION`.
555
91
 
556
- **Config helpers**: `createDefaultConfig()`, `mergeConfig(base, overrides)`.
92
+ **Classes**: `GuardrailsEngine`, `JsonlAuditSink`, `NoopAuditSink`, `ConsoleNotificationSink`, `CallbackNotificationSink`, `NoopNotificationSink`, `TokenUsageStore`.
557
93
 
558
- ## Config Reference
559
-
560
- | Section | Key | Type | Default | Description |
561
- |---------|-----|------|---------|-------------|
562
- | *(root)* | `mode` | `"enforce" \| "audit"` | `"enforce"` | Whether violations block or just log |
563
- | *(root)* | `failClosed` | `boolean` | `true` | On engine error: DENY (true) or ALLOW (false) |
564
- | *(root)* | `workspaceRoot` | `string` | `process.cwd()` | Anchor for path resolution |
565
- | `allow` | `tools` | `string[]` | 8 tools | Allowed tool names |
566
- | `allow` | `commands` | `CommandEntry[]` | 6 binaries | Allowed binaries with optional argPattern |
567
- | `allow` | `writablePaths` | `string[]` | `[workspaceRoot]` | Filesystem write boundary |
568
- | `allow` | `networkHosts` | `string[]` | localhost only | Allowed egress hosts |
569
- | `allow` | `allowPrivateEgress` | `boolean` | `false` | Allow RFC 1918 / loopback destinations |
570
- | `deny` | `commandPatterns` | `string[]` | 8 patterns | Destructive command regexes |
571
- | `deny` | `pathPatterns` | `string[]` | 8 patterns | Path traversal regexes |
572
- | `deny` | `promptInjectionPatterns` | `string[]` | 6 patterns | Injection attempt regexes |
573
- | `deny` | `exfiltrationPatterns` | `string[]` | 4 patterns | Data exfiltration regexes |
574
- | `deny` | `shellOperatorPatterns` | `string[]` | 9 patterns | Shell chaining/redirect regexes |
575
- | `redaction` | `secretPatterns` | `string[]` | 7 patterns | Secret detection regexes (AWS, GitHub, PEM, etc.) |
576
- | `redaction` | `piiPatterns` | `string[]` | 4 patterns | PII detection regexes (email, SSN, CC, phone) |
577
- | `redaction` | `replacement` | `string` | `"[REDACTED]"` | Replacement string for matches |
578
- | `redaction` | `applyInAuditMode` | `boolean` | `true` | Redact even when mode=audit |
579
- | `limits` | `maxInputChars` | `number` | `20000` | Max input content length |
580
- | `limits` | `maxToolArgChars` | `number` | `10000` | Max serialized tool args length |
581
- | `limits` | `maxOutputChars` | `number` | `50000` | Max tool output length |
582
- | `limits` | `maxRequestsPerMinute` | `number` | `120` | Rate limit: requests per 60s window |
583
- | `limits` | `maxToolCallsPerMinute` | `number` | `60` | Rate limit: tool calls per 60s window |
584
- | `pathPolicy` | `enforceCanonicalRealpath` | `boolean` | `true` | Resolve symlinks and verify workspace boundary |
585
- | `pathPolicy` | `denySymlinkTraversal` | `boolean` | `true` | Block symlinks that escape workspace |
586
- | `supplyChain` | `trustedSkillSources` | `string[]` | — | Allowed skill installation domains |
587
- | `supplyChain` | `requireSkillHash` | `boolean` | `true` | Require hash for remote skills |
588
- | `supplyChain` | `allowedSkillHashes` | `string[]` | — | Pre-approved skill hashes |
589
- | `principal` | `requireContext` | `boolean` | `true` | Require identity context |
590
- | `principal` | `ownerIds` | `string[]` | `[]` | User IDs with owner privilege |
591
- | `principal` | `adminIds` | `string[]` | `[]` | User IDs with admin privilege |
592
- | `principal` | `failUnknownInGroup` | `boolean` | `true` | Deny unknown users in group channels |
593
- | `authorization` | `defaultEffect` | `"deny" \| "allow"` | `"deny"` | Default when no explicit rule matches |
594
- | `authorization` | `requireMentionInGroups` | `boolean` | `true` | Require @mention for group messages |
595
- | `authorization` | `restrictedTools` | `string[]` | 6 tools | Tools requiring elevated role or approval |
596
- | `authorization` | `restrictedDataClasses` | `string[]` | — | Data classes requiring elevated access |
597
- | `authorization` | `toolAllowByRole` | `Record<Role, string[]>` | Role-tiered | Per-role tool access lists |
598
- | `approval` | `enabled` | `boolean` | `true` | Enable owner approval workflow |
599
- | `approval` | `ttlSeconds` | `number` | `300` | Approval challenge TTL |
600
- | `approval` | `requireForTools` | `string[]` | 6 tools | Tools requiring approval |
601
- | `approval` | `requireForDataClasses` | `string[]` | `["restricted", "secret"]` | Data classes requiring approval |
602
- | `approval` | `ownerQuorum` | `number` | `1` | Number of approvers required |
603
- | `approval` | `bindToConversation` | `boolean` | `true` | Bind token to originating conversation |
604
- | `approval` | `storagePath` | `string?` | — | JSON file for persistent approvals |
605
- | `tenancy` | `budgetKeyMode` | `string` | `"agent+principal+conversation"` | Budget partitioning strategy |
606
- | `tenancy` | `redactCrossPrincipalOutput` | `boolean` | `true` | Redact vs deny for restricted data |
607
- | `outboundGuard` | `enabled` | `boolean` | `true` | Enable outbound leak prevention |
608
- | `outboundGuard` | `systemPromptLeakPatterns` | `string[]` | 8 patterns | Patterns indicating prompt leakage |
609
- | `outboundGuard` | `injectedFileNames` | `string[]` | 9 names | Config filenames to block in output |
610
- | `rollout` | `stage` | `RolloutStage` | `"stage_c_full_enforce"` | Current enforcement stage |
611
- | `rollout` | `highRiskTools` | `string[]` | — | Tools enforced in stage B |
612
- | `monitoring` | `falsePositiveThresholdPct` | `number` | `3` | False positive rate threshold |
613
- | `monitoring` | `consecutiveDaysForTuning` | `number` | `2` | Days above threshold before signaling |
614
- | `audit` | `enabled` | `boolean` | `false` | Enable JSONL audit trail |
615
- | `audit` | `sinkPath` | `string?` | — | File path for JSONL audit events |
616
- | `externalValidation` | `enabled` | `boolean` | `false` | Enable HTTP external validators |
617
- | `externalValidation` | `endpoint` | `string` | — | POST endpoint for validation requests |
618
- | `externalValidation` | `timeoutMs` | `number?` | `5000` | Per-request timeout |
619
- | `externalValidation` | `validators` | `string[]` | `[]` | Validator names to invoke |
620
- | `externalValidation` | `failOpen` | `boolean` | `false` | Allow on timeout/error |
621
- | `budgetPersistence` | `enabled` | `boolean` | `false` | Enable token usage tracking |
622
- | `budgetPersistence` | `storagePath` | `string?` | — | JSONL path for usage persistence |
623
- | `notifications` | `enabled` | `boolean` | `false` | Enable approval notifications |
624
- | `notifications` | `adminChannelId` | `string?` | — | Target channel for notifications |
625
-
626
- ## Migration
627
-
628
- ### v0.6.0 → v0.6.1
629
-
630
- 1. **Plugin API alignment**: The plugin now uses OpenClaw's typed hook system (`api.on()`) instead of `api.registerHook()`. Security decisions (block, cancel, redact) are now properly honoured by OpenClaw's pipeline — previously they were silently discarded.
631
- 2. **New event adapter layer**: `src/plugin/event-adapter.ts` bridges OpenClaw's structured `(event, ctx)` hook pairs to the internal `OpenClawContext`. No changes needed for users of `createOpenClawGuardrailsPlugin()` or `GuardrailsEngine` directly.
632
- 3. **Plugin export**: The default export is now an `{ id, name, version, register }` object (compatible with `resolvePluginModuleExport()`). The `registerOpenClawGuardrails` named export is preserved for backward compatibility.
633
- 4. **`tool_result_persist` sync redaction**: Uses the existing `redactWithPatterns()` utility for synchronous redaction. Full async engine evaluation runs fire-and-forget for audit/metrics.
634
- 5. **Manifest cleaned**: Removed unrecognized `entry` and `hooks` fields from `openclaw.plugin.json`. Set `additionalProperties: false` on root config schema.
635
- 6. **Peer dependency**: `openclaw` is now declared as a `peerDependency` (`>=2026.2.25`).
636
-
637
- ### v0.5.x → v0.6.0
638
-
639
- 1. `GuardrailsEngine` constructor now takes `(config, options?)` instead of `(config, budgetStore?, approvalBroker?)`. Pass dependencies via `EngineOptions`.
640
- 2. `createOpenClawGuardrailsPlugin()` accepts both `Partial<GuardrailsConfig>` (unchanged) and `PluginOptions` (new) for injecting audit sinks, notification sinks, etc.
641
- 3. New config sections (`audit`, `externalValidation`, `budgetPersistence`, `notifications`) default to disabled — no breaking changes for existing configs.
642
- 4. New reason codes: `EXTERNAL_VALIDATION_FAILED`, `EXTERNAL_VALIDATION_TIMEOUT`.
643
-
644
- ### v0.3.x → v0.4.0
645
-
646
- 1. Add `principal`, `authorization`, `approval`, and `tenancy` config blocks.
647
- 2. Pass sender/channel metadata in hook contexts (`senderId`, `conversationId`, `channelType`, `mentionedAgent`).
648
- 3. Integrate owner approval handling via `approvalChallenge.requestId` + `plugin.approveRequest(...)`.
649
- 4. Keep secure defaults unless you have a validated exception.
650
- 5. Use `rollout.stage` for staged deployment and monitor `metadata.guardrailsMonitoring`.
651
- 6. **Breaking**: callers can no longer self-assign privileged roles (`owner`/`admin`) via `metadata.role`. Privileged roles are now derived exclusively from `principal.ownerIds`/`adminIds` in config. Any caller-supplied `"owner"` or `"admin"` role is downgraded to `"member"`.
94
+ **Config helpers**: `createDefaultConfig()`, `mergeConfig(base, overrides)`.
652
95
 
653
96
  ## Limitations
654
97
 
@@ -658,6 +101,14 @@ const engine = new GuardrailsEngine(config, { customValidators: [spendingLimit]
658
101
  - External validator circuit breaker state is module-scoped (shared across engine instances in the same process).
659
102
  - Token usage `records` array grows unboundedly in memory for long-running processes. Use JSONL persistence and restart periodically for high-volume deployments.
660
103
 
104
+ ## Further Reading
105
+
106
+ - [Architecture & Internals](docs/ARCHITECTURE.md) — plugin flow, detector pipeline, security features, source layout
107
+ - [Config Reference](docs/CONFIG.md) — full configuration options
108
+ - [Migration Guide](docs/MIGRATION.md) — upgrade notes between versions
109
+ - [Research & Threat Analysis](../../docs/openclaw-llm-security-research.md)
110
+ - [Root Project Overview](../../README.md)
111
+
661
112
  ## Development
662
113
 
663
114
  ```bash
@@ -1 +1 @@
1
- export declare const PLUGIN_VERSION = "0.6.1";
1
+ export declare const PLUGIN_VERSION = "0.6.4";
@@ -1 +1 @@
1
- export const PLUGIN_VERSION = "0.6.1";
1
+ export const PLUGIN_VERSION = "0.6.4";
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "id": "openclaw-guardrails",
3
3
  "name": "openclaw-guardrails",
4
- "version": "0.6.2",
4
+ "version": "0.6.4",
5
5
  "description": "Deterministic local guardrails for OpenClaw hooks",
6
6
  "configSchema": {
7
7
  "type": "object",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@safefence/openclaw-guardrails",
3
- "version": "0.6.2",
3
+ "version": "0.6.4",
4
4
  "description": "Native deterministic guardrails plugin for OpenClaw",
5
5
  "openclaw": {
6
6
  "extensions": [
@@ -22,7 +22,7 @@
22
22
  "test:watch": "vitest",
23
23
  "preversion": "npm test && npm run build",
24
24
  "version": "bash scripts/sync-version.sh",
25
- "postversion": "echo '\nRun these to publish:\n git push origin master --tags\n npm publish --access public'"
25
+ "postversion": "echo '\nRun this to publish via CI:\n git push origin master --tags'"
26
26
  },
27
27
  "engines": {
28
28
  "node": ">=20"
@@ -34,6 +34,14 @@
34
34
  "owasp",
35
35
  "llm"
36
36
  ],
37
+ "repository": {
38
+ "type": "git",
39
+ "url": "https://github.com/douglasswm/safefence.git",
40
+ "directory": "packages/openclaw-guardrails"
41
+ },
42
+ "publishConfig": {
43
+ "provenance": true
44
+ },
37
45
  "license": "MIT",
38
46
  "peerDependencies": {
39
47
  "openclaw": ">=2026.2.25"