npm - @cyberdyne-systems/agent-safety - Versions diffs - 2026.3.3 → 2026.3.8 - Mend

@cyberdyne-systems/agent-safety 2026.3.3 → 2026.3.8

This diff shows the content of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between versions as they appear in the public registry.

Files changed (4)

package/README.md CHANGED

@@ -1,19 +1,55 @@
 # Agent Safety System
-OpenClaw plugin for LLM agent safety based on [arXiv:2602.20021 — "Agents of Chaos"](https://arxiv.org/abs/2602.20021).
+OpenClaw plugin for LLM agent safety based on [arXiv:2602.20021 -- "Agents of Chaos"](https://arxiv.org/abs/2602.20021).
 Hooks into `before_tool_call` to validate every tool call against a stakeholder model with trust levels, UID-based identity anchoring, and 8 risk dimensions.
-## Usage
+## Install
 ```bash
-openclaw plugins enable agent-safety
+openclaw plugins install @cyberdyne-systems/agent-safety
 ```
-See [full documentation](https://docs.openclaw.ai/extensions/agent-safety) for configuration, tool reference, and architecture.
+## Configuration
+After install, configure via `openclaw config set`:
+```bash
+# Validation mode: local (default), api, or both
+openclaw config set plugins.entries.agent-safety.mode local
+# Enable Claude API deep analysis (requires API key)
+openclaw config set plugins.entries.agent-safety.mode both
+openclaw config set plugins.entries.agent-safety.apiKey sk-ant-...
+# Block high-risk actions from unverified users (default: true)
+openclaw config set plugins.entries.agent-safety.blockHighRiskUnverified true
+```
+## How It Works
+1. **Quick check** (zero latency): local rules check trust level, permissions, identity spoofing, loop patterns, and dangerous command patterns
+2. **Deep analysis** (optional): Claude API evaluates 8 risk dimensions from the paper -- authority violation, resource abuse, information leak, safety bypass, goal misalignment, social engineering, cascading failure, irreversible action
+## Agent Safety Tool
+Once loaded, agents get an `agent_safety` tool with these actions:
+- `status` -- safety dashboard with audit stats
+- `stakeholders` -- list registered principals
+- `log` -- recent audit entries
+- `add_stakeholder` -- register a new principal
+- `set_trust` -- adjust trust level (0-4)
+## Test Suite
+114 tests covering all 11 case studies from the paper:
+- 23 MUST_BLOCK scenarios: 100% detection rate
+- 18 MUST_ALLOW scenarios: 0% false positive rate
 ## Development
 ```bash
-pnpm test extensions/agent-safety/
+pnpm vitest run extensions/agent-safety/
 ```
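
The two-stage flow described under "How It Works" in the new README can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `ToolCall`, `Verdict`, `quickCheck`, `validate`, the pattern list, and the rule that trust level 0 is always blocked are all assumptions made for the example, as is the choice that `api` mode skips the local rules.

```typescript
// Hypothetical sketch of a two-stage validator; none of these names come from the package.
type Mode = "local" | "api" | "both";

interface ToolCall {
  tool: string;
  params: Record<string, unknown>;
  trustLevel: number; // 0-4, matching the range the README gives for `set_trust`
}

interface Verdict {
  block: boolean;
  reason?: string;
}

// Assumed examples of "dangerous command patterns"; the real list is not shown in the diff.
const DANGEROUS_PATTERNS: RegExp[] = [/\brm\s+-rf\s+\//, /\bmkfs\b/];

// Stage 1: cheap local rules, no network round trip.
function quickCheck(call: ToolCall): Verdict {
  const raw = call.params["command"];
  const command = typeof raw === "string" ? raw : "";
  if (DANGEROUS_PATTERNS.some((pattern) => pattern.test(command))) {
    return { block: true, reason: "dangerous command pattern" };
  }
  if (call.trustLevel === 0) {
    return { block: true, reason: "untrusted sender" };
  }
  return { block: false };
}

// Stage 2 is injected so the sketch stays independent of any real API client.
type DeepAnalyzer = (call: ToolCall) => Promise<Verdict>;

async function validate(call: ToolCall, mode: Mode, deep?: DeepAnalyzer): Promise<Verdict> {
  if (mode !== "api") {
    const local = quickCheck(call);
    if (local.block) return local; // a local block short-circuits the deep analysis
  }
  if (mode !== "local" && deep !== undefined) {
    return deep(call);
  }
  return { block: false };
}

// Usage: a trusted sender running a harmless command in local mode.
validate({ tool: "bash", params: { command: "ls" }, trustLevel: 3 }, "local").then((verdict) => {
  console.log(verdict.block);
});
```

Running the local rules first means that, in `both` mode, a call already rejected locally never reaches the API, which is one plausible reading of the README's "zero latency" claim for the quick check.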

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@cyberdyne-systems/agent-safety",
-  "version": "2026.3.3",
+  "version": "2026.3.8",
   "description": "Agent safety system: stakeholder model, action validator, and safety dashboard — based on arXiv:2602.20021",
   "type": "module",
   "dependencies": {

package/src/integration.test.ts CHANGED

@@ -113,7 +113,8 @@ describe("Integration: full hook pipeline", () => {
     expect(simulateHook(store, auditLog, "bash", { command: "ls" }, "unknown_uid").block).toBe(
       true,
     );
-    expect(simulateHook(store, auditLog, "modify_memory", { content: "hi" }).block).toBe(true);
+    // No sender context → defaults to owner (local user), should be allowed
+    expect(simulateHook(store, auditLog, "modify_memory", { content: "hi" }).block).toBe(false);
   });
   // Audit logging

package/src/stakeholder-store.ts CHANGED

@@ -80,6 +80,11 @@ export class StakeholderStore {
       if (match) return match;
     }
+    // No sender context at all → local user, treat as owner
+    if (senderId === undefined && isOwner === undefined) {
+      return this.getOwner() ?? DEFAULT_STAKEHOLDERS[0];
+    }
     // Return untrusted default for unknown senders
     return {
       id: "unknown",
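
The resolution order this hunk introduces, and the flipped expectation in `integration.test.ts`, can be illustrated with a small standalone sketch. It is not the `StakeholderStore` implementation: the `Stakeholder` shape, the `OWNER` constant, the `registered` map, and the free function `resolve` are hypothetical, and the handling of an explicit `isOwner === true` is an assumption because that code lies outside the hunk.

```typescript
// Hypothetical stand-ins; the real class and DEFAULT_STAKEHOLDERS are only partly visible in the diff.
interface Stakeholder {
  id: string;
  trustLevel: number; // 0 (untrusted) to 4 (owner), per the README's 0-4 range
}

const OWNER: Stakeholder = { id: "owner", trustLevel: 4 };

const registered = new Map<string, Stakeholder>([
  ["alice_uid", { id: "alice", trustLevel: 2 }],
]);

function resolve(senderId?: string, isOwner?: boolean): Stakeholder {
  // Known sender: return the registered principal.
  if (senderId !== undefined) {
    const match = registered.get(senderId);
    if (match) return match;
  }
  // Assumption: an explicit owner flag maps to the owner (not shown in the hunk).
  if (isOwner === true) return OWNER;
  // New in 2026.3.8: no sender context at all means a local user, treated as owner.
  if (senderId === undefined && isOwner === undefined) return OWNER;
  // Anything else stays untrusted, as before.
  return { id: "unknown", trustLevel: 0 };
}

console.log(resolve().id); // "owner"
console.log(resolve("unknown_uid").id); // "unknown"
```

Under this reading, only calls that carry a sender ID or an explicit `isOwner` value can still resolve to the untrusted default; a call that arrives with neither is now granted owner trust, which is why the `modify_memory` test without sender context changed from blocked to allowed.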