npm - @oussamadouhou/agent-enforcement - Versions diffs - 0.1.2 - Mend

@oussamadouhou/agent-enforcement 0.1.2

Files changed (40)

package/.claude-plugin/plugin.json +5 -0
package/README.md +67 -0
package/dist/adapters/logger.d.ts +23 -0
package/dist/adapters/logger.d.ts.map +1 -0
package/dist/adapters/logger.js +58 -0
package/dist/adapters/logger.js.map +1 -0
package/dist/adapters/message-reader.d.ts +20 -0
package/dist/adapters/message-reader.d.ts.map +1 -0
package/dist/adapters/message-reader.js +123 -0
package/dist/adapters/message-reader.js.map +1 -0
package/dist/constants.d.ts +22 -0
package/dist/constants.d.ts.map +1 -0
package/dist/constants.js +50 -0
package/dist/constants.js.map +1 -0
package/dist/detection/environment.d.ts +7 -0
package/dist/detection/environment.d.ts.map +1 -0
package/dist/detection/environment.js +30 -0
package/dist/detection/environment.js.map +1 -0
package/dist/hook.d.ts +35 -0
package/dist/hook.d.ts.map +1 -0
package/dist/hook.js +223 -0
package/dist/hook.js.map +1 -0
package/dist/index.d.ts +14 -0
package/dist/index.d.ts.map +1 -0
package/dist/index.js +8 -0
package/dist/index.js.map +1 -0
package/dist/plugin.d.ts +4 -0
package/dist/plugin.d.ts.map +1 -0
package/dist/plugin.js +6 -0
package/dist/plugin.js.map +1 -0
package/dist/schema/verification-schema.d.ts +127 -0
package/dist/schema/verification-schema.d.ts.map +1 -0
package/dist/schema/verification-schema.js +132 -0
package/dist/schema/verification-schema.js.map +1 -0
package/docs/EVIDENCE_SYSTEM.md +214 -0
package/fixtures/claude/project-abc/session-123.jsonl +2 -0
package/fixtures/opencode/messages/session-123/msg_001.json +4 -0
package/hooks/enforcement.py +255 -0
package/hooks/hooks.json +15 -0
package/package.json +70 -0

package/docs/EVIDENCE_SYSTEM.md ADDED

@@ -0,0 +1,214 @@
+# Evidence-Based Trust System
+**Version**: 1.0.0
+**Plugin**: @oussamadouhou/agent-enforcement
+---
+## Overview
+The Evidence-Based Trust System prevents AI agents from marking tasks complete without providing verifiable evidence of completion.
+### Purpose
+- **Prevent premature completion**: No marking todos done without proof
+- **Enforce accountability**: Every completion requires evidence
+- **Enable verification**: Evidence must be independently verifiable
+- **Build trust**: Ground truth over assumptions
+---
+## Enforcement Levels
+Set via environment variable: `OPENCODE_ENFORCEMENT_LEVEL`
+| Level | Value | Behavior |
+|-------|-------|----------|
+| **CREATIVE** | `0` | No enforcement (brainstorming, planning) |
+| **STANDARD** | `1` | Warning in tool output (default) |
+| **STRICT** | `2` | Block tool execution with error |
+**Set level:**
+```bash
+export OPENCODE_ENFORCEMENT_LEVEL=1  # 0, 1, or 2
+```
+---
+## Required Evidence Format
+Before marking any todo as `completed`, you **MUST** provide an evidence block:
+```markdown
+**Evidence for [todo-id]**:
+**Execution**: [command/tool used]
+**Verification**: [what was checked]
+**Checklist**:
+- [x] Tool executed successfully
+- [x] Output captured
+- [x] Result matches expected
+- [x] No workarounds used
+- [x] Independently verifiable
+**Trust**: 🟢 HIGH | ✅ Ground Truth
+```
+---
+## Checklist Items (All Required)
+### 1. Tool executed successfully
+- ✅ Tool returned exit code 0 (bash)
+- ✅ Tool completed without errors
+- ✅ Output was generated
+- ❌ NOT: "I think it worked"
+### 2. Output captured
+- ✅ Tool output included in response
+- ✅ Output is readable and complete
+- ✅ Output shown to user
+- ❌ NOT: Hidden or summarized
+### 3. Result matches expected
+- ✅ Output contains expected data
+- ✅ Changes are visible in files/system
+- ✅ Verification command confirms success
+- ❌ NOT: Assumed without checking
+### 4. No workarounds used
+- ✅ Proper solution implemented
+- ✅ No hacks or temporary fixes
+- ✅ Follows best practices
+- ❌ NOT: `as any`, ignoring errors, deleting tests
+### 5. Independently verifiable
+- ✅ Another person/agent can verify
+- ✅ Evidence is objective
+- ✅ Not dependent on hidden state
+- ❌ NOT: "Trust me"
+---
+## Trust Markers (Required)
+### Confidence Level
+- 🟢 **HIGH**: Direct observation, ground truth
+- 🟡 **MEDIUM**: Indirect verification, inference
+- 🔴 **LOW**: Assumption, no verification
+### Evidence Type
+- ✅ **Ground Truth**: Tool executed, output captured
+- ⚠️ **Simulation**: Mental model, not executed
+- ❌ **Assumption**: No verification attempted
+---
+## Examples
+### ✅ GOOD: Complete Evidence
+```markdown
+I'll check the working directory.
+**Evidence for todo-1**:
+**Execution**: `pwd` command via bash tool
+**Verification**: Output shows `/home/user/project`
+**Checklist**:
+- [x] Tool executed successfully (exit code 0)
+- [x] Output captured: `/home/user/project`
+- [x] Result matches expected (correct directory)
+- [x] No workarounds used (standard pwd command)
+- [x] Independently verifiable (anyone can run pwd)
+**Trust**: 🟢 HIGH | ✅ Ground Truth
+*Now marking todo-1 complete*
+```
+### ❌ BAD: No Evidence
+```markdown
+I checked the working directory.
+*Marking todo-1 complete*  ← VIOLATION
+```
+### ❌ BAD: Incomplete Checklist
+```markdown
+**Evidence for todo-1**:
+**Execution**: `pwd`
+**Verification**: It worked
+**Checklist**:
+- [x] Tool executed
+- [ ] Output captured  ← MISSING
+**Trust**: 🟢 HIGH
+```
+---
+## When Evidence is NOT Required
+- ❌ Todo status is `pending`, `in_progress`, `cancelled`
+- ✅ Only `completed` status requires evidence
+---
+## Violations
+### What Happens (STANDARD Mode)
+When you try to mark a todo complete without evidence:
+1. **Tool completes normally** (todo is updated)
+2. **Warning appears in tool output** (you see it in response)
+3. **You should acknowledge** and provide evidence before proceeding
+### What Happens (STRICT Mode)
+When you try to mark a todo complete without evidence:
+1. **Tool execution is BLOCKED** (error thrown)
+2. **Todo is NOT updated**
+3. **You MUST provide evidence** then retry
+---
+## FAQ
+### Q: Why is this necessary?
+**A**: Prevents agents from claiming tasks are done without proof, improves reliability and accountability.
+### Q: Can I disable this?
+**A**: Set `OPENCODE_ENFORCEMENT_LEVEL=0` for CREATIVE mode (no enforcement).
+### Q: What if I'm just planning?
+**A**: Use CREATIVE mode (level 0) for planning/brainstorming.
+### Q: Can I provide evidence after completion?
+**A**: No - evidence must be provided BEFORE marking complete.
+### Q: What counts as valid evidence?
+**A**: Tool execution output, file contents, command results - anything independently verifiable.
+---
+## Technical Details
+### Hook Points
+- `tool.execute.before`: Validation logic
+- `tool.execute.after`: Warning injection
+### Detection
+- Monitors `todowrite` tool calls
+- Checks last assistant message for evidence block
+- Validates checklist items and trust markers
+### Warning Injection
+- Appends `<system-reminder>` to tool output
+- LLM sees warning in tool response
+- Human sees toast notification in TUI
+---
+**Documentation Location**: `~/.config/opencode/plugins/agent-enforcement/docs/EVIDENCE_SYSTEM.md`
+**Plugin Repository**: https://github.com/oussamadouhou/agent-enforcement
+**Issues**: https://github.com/oussamadouhou/agent-enforcement/issues
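
The evidence format documented above is fixed enough to render from a few fields. The sketch below is one hedged way to do that in Python; `Evidence` and `render_evidence_block` are hypothetical names introduced for illustration and are not part of the package.

```python
from dataclasses import dataclass

# The five checklist labels listed in EVIDENCE_SYSTEM.md.
CHECKLIST = (
    "Tool executed successfully",
    "Output captured",
    "Result matches expected",
    "No workarounds used",
    "Independently verifiable",
)


@dataclass
class Evidence:
    """Hypothetical container for one evidence block (not a package type)."""

    todo_id: str
    execution: str
    verification: str
    trust: str = "🟢 HIGH | ✅ Ground Truth"


def render_evidence_block(evidence: Evidence) -> str:
    """Render the markdown block in the layout the document requires."""
    lines = [
        f"**Evidence for {evidence.todo_id}**:",
        f"**Execution**: {evidence.execution}",
        f"**Verification**: {evidence.verification}",
        "**Checklist**:",
    ]
    # Every item is emitted as checked, so call this only after each
    # check has actually been performed.
    lines += [f"- [x] {item}" for item in CHECKLIST]
    lines.append(f"**Trust**: {evidence.trust}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = Evidence(
        todo_id="todo-1",
        execution="`pwd` command via bash tool",
        verification="Output shows `/home/user/project`",
    )
    print(render_evidence_block(example))
```

The helper only formats text. It does not establish that any of the checks were carried out, which is the part the document treats as mandatory.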

package/fixtures/claude/project-abc/session-123.jsonl ADDED

@@ -0,0 +1,2 @@
+{"role":"user","content":[{"type":"text","text":"Run the check"}]}
+{"role":"assistant","content":[{"type":"text","text":"Evidence for task-1:\n\nExecution:\n- Tool: read\n\nVerification:\n- Checked output\n\nChecklist:\n- [x] Tool executed successfully\n- [x] Output captured\n- [x] Result matches expected\n- [x] No workarounds used\n- [x] Independently verifiable\n\nTrust: 🟢 HIGH | ✅ Ground Truth"}]}

package/fixtures/opencode/messages/session-123/msg_001.json ADDED

@@ -0,0 +1,4 @@
+{
+  "role": "assistant",
+  "text": "**Evidence for task-1**:\n\n**Execution**:\n- Tool: read\n\n**Verification**:\n- Checked output\n\n**Checklist**:\n- [x] Tool executed successfully\n- [x] Output captured\n- [x] Result matches expected\n- [x] No workarounds used\n- [x] Independently verifiable\n\n**Trust**: 🟢 HIGH | ✅ Ground Truth"
+}
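
The two fixtures store the assistant text in different shapes: the Claude session file is JSONL with a `content` list of typed parts, while the OpenCode message is a single JSON object with a top-level `text` field. A minimal sketch of reading either shape follows. The function names and the shortened sample strings are assumptions made for illustration, not the package's message-reader API.

```python
import json
from typing import Any, Dict, Optional


def message_text(message: Dict[str, Any]) -> Optional[str]:
    """Return the text of an OpenCode-style or Claude-style message."""
    text = message.get("text")
    if isinstance(text, str):
        return text
    content = message.get("content")
    if isinstance(content, list):
        for part in content:
            if isinstance(part, dict) and part.get("type") == "text":
                if isinstance(part.get("text"), str):
                    return part["text"]
    return None


def last_assistant_text(jsonl: str) -> Optional[str]:
    """Scan JSONL from the end and return the newest assistant text."""
    for line in reversed(jsonl.splitlines()):
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(message, dict) and message.get("role") == "assistant":
            return message_text(message)
    return None


# Shortened stand-ins for the two fixture shapes (illustrative data only).
CLAUDE_JSONL = "\n".join([
    json.dumps({"role": "user", "content": [{"type": "text", "text": "Run the check"}]}),
    json.dumps({"role": "assistant", "content": [{"type": "text", "text": "Checklist:\n- [x] Output captured"}]}),
])
OPENCODE_MESSAGE = {"role": "assistant", "text": "**Checklist**:\n- [x] Output captured"}

if __name__ == "__main__":
    print(last_assistant_text(CLAUDE_JSONL))
    print(message_text(OPENCODE_MESSAGE))
```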

package/hooks/enforcement.py ADDED

@@ -0,0 +1,255 @@
+#!/usr/bin/env python3
+"""
+Claude Code hook handler for evidence enforcement.
+Reads stdin (hook input), validates evidence, outputs JSON result.
+"""
+from __future__ import annotations
+import json
+import os
+import re
+import sys
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+HOOK_NAME = "evidence-enforcement"
+ENFORCEMENT_LEVELS = {
+    "CREATIVE": 0,
+    "STANDARD": 1,
+    "STRICT": 2,
+}
+EVIDENCE_PATTERNS = {
+    "evidence_block": re.compile(r"\*\*Evidence(?:\s+for\s+[\w-]+)?\*\*:", re.IGNORECASE),
+    "execution_section": re.compile(r"\*\*Execution\*\*:", re.IGNORECASE),
+    "verification_section": re.compile(r"\*\*Verification\*\*:", re.IGNORECASE),
+    "checklist_section": re.compile(r"\*\*Checklist\*\*:", re.IGNORECASE),
+    "trust_markers": re.compile(r"\*\*(?:Confidence|Trust)\*\*:", re.IGNORECASE),
+}
+REQUIRED_CHECKLIST_ITEMS = [
+    "Tool executed",
+    "Output captured",
+    "Result matches",
+    "No workarounds",
+    "Independently verifiable",
+]
+class Logger:
+    def __init__(self, mode: str, path: str, level: str) -> None:
+        self.mode = mode
+        self.path = path
+        self.level = level
+        self.level_map = {"debug": 10, "info": 20, "warn": 30, "error": 40}
+    def _should_log(self, level: str) -> bool:
+        if self.mode == "silent":
+            return False
+        return self.level_map.get(level, 20) >= self.level_map.get(self.level, 20)
+    def _log(self, level: str, msg: str) -> None:
+        if not self._should_log(level):
+            return
+        timestamp = datetime.utcnow().isoformat()
+        formatted = f"[{timestamp}] [{level.upper()}] {msg}"
+        if self.mode == "file":
+            try:
+                with open(self.path, "a", encoding="utf-8") as handle:
+                    handle.write(formatted + "\n")
+            except Exception:
+                return
+        elif self.mode == "console":
+            print(formatted)
+    def info(self, msg: str) -> None:
+        self._log("info", msg)
+    def warn(self, msg: str) -> None:
+        self._log("warn", msg)
+    def error(self, msg: str) -> None:
+        self._log("error", msg)
+    def debug(self, msg: str) -> None:
+        self._log("debug", msg)
+def get_enforcement_level() -> int:
+    env_level = os.environ.get("ENFORCEMENT_LEVEL") or os.environ.get("OPENCODE_ENFORCEMENT_LEVEL")
+    if not env_level:
+        return ENFORCEMENT_LEVELS["STANDARD"]
+    if env_level.isdigit():
+        parsed = int(env_level)
+        if parsed in (0, 1, 2):
+            return parsed
+    upper = env_level.upper()
+    return ENFORCEMENT_LEVELS.get(upper, ENFORCEMENT_LEVELS["STANDARD"])
+def get_log_config() -> Dict[str, str]:
+    return {
+        "mode": os.environ.get("ENFORCEMENT_LOG_MODE", "file"),
+        "path": os.environ.get("ENFORCEMENT_LOG_PATH", "/tmp/agent-enforcement.log"),
+        "level": os.environ.get("ENFORCEMENT_LOG_LEVEL", "info"),
+    }
+def extract_text(message: Dict[str, Any]) -> Optional[str]:
+    text = message.get("text")
+    if isinstance(text, str):
+        return text
+    content = message.get("content")
+    if isinstance(content, list):
+        for part in content:
+            if isinstance(part, dict) and part.get("type") == "text":
+                part_text = part.get("text")
+                if isinstance(part_text, str):
+                    return part_text
+    return None
+def read_last_assistant_message(session_id: str) -> Optional[str]:
+    projects_dir = Path.home() / ".claude" / "projects"
+    if not projects_dir.exists():
+        return None
+    for project_dir in projects_dir.iterdir():
+        if not project_dir.is_dir():
+            continue
+        session_file = project_dir / f"{session_id}.jsonl"
+        if not session_file.exists():
+            continue
+        try:
+            lines = session_file.read_text(encoding="utf-8").splitlines()
+        except Exception:
+            continue
+        for line in reversed(lines):
+            try:
+                message = json.loads(line)
+            except json.JSONDecodeError:
+                continue
+            if message.get("role") == "assistant":
+                return extract_text(message) or line
+    return None
+def validate_message_text(message_text: str) -> Tuple[bool, List[str]]:
+    missing_items: List[str] = []
+    for item in REQUIRED_CHECKLIST_ITEMS:
+        item_regex = re.compile(rf"\[\s*[xX]\s*\].*{re.escape(item)}", re.IGNORECASE)
+        if not item_regex.search(message_text):
+            missing_items.append(item)
+    has_evidence_block = bool(EVIDENCE_PATTERNS["evidence_block"].search(message_text))
+    has_trust_markers = bool(EVIDENCE_PATTERNS["trust_markers"].search(message_text))
+    if not has_evidence_block:
+        return False, ["Evidence block missing"]
+    if missing_items:
+        return False, missing_items
+    if not has_trust_markers:
+        return False, ["Trust markers missing"]
+    return True, []
+def build_error_message(missing: List[str], level: int) -> str:
+    status = "BLOCKED" if level == ENFORCEMENT_LEVELS["STRICT"] else "WARNING"
+    if "Evidence block missing" in missing:
+        return (
+            f"[{HOOK_NAME}] {status}: Cannot mark todo complete without evidence block.\n\n"
+            "Required format:\n"
+            "**Evidence for [todo-id]**:\n"
+            "**Execution**: [command/tool used]\n"
+            "**Verification**: [what was checked]\n"
+            "**Checklist**:\n"
+            "- [x] Tool executed successfully\n"
+            "- [x] Output captured\n"
+            "- [x] Result matches expected\n"
+            "- [x] No workarounds used\n"
+            "- [x] Independently verifiable\n"
+            "**Trust**: 🟢 HIGH | ✅ Ground Truth\n\n"
+            "See ~/AGENTS.md \"Evidence-Based Trust System\" for details."
+        )
+    if "Trust markers missing" in missing:
+        return (
+            f"[{HOOK_NAME}] {status}: Trust markers missing.\n\n"
+            "Required:\n"
+            "**Confidence**: 🟢 HIGH | 🟡 MEDIUM | 🔴 LOW\n"
+            "**Evidence**: ✅ Ground Truth | ⚠️ Simulation | ❌ Assumption"
+        )
+    checklist = "\n".join([f"- [ ] {item}" for item in missing])
+    return (
+        f"[{HOOK_NAME}] {status}: Evidence checklist incomplete.\n\n"
+        f"Missing checklist items:\n{checklist}\n\n"
+        "All items must be checked [x] before marking todo complete."
+    )
+def parse_todos(raw_todos: Any) -> List[Dict[str, Any]]:
+    if isinstance(raw_todos, list):
+        return [t for t in raw_todos if isinstance(t, dict)]
+    if isinstance(raw_todos, str):
+        try:
+            parsed = json.loads(raw_todos)
+            if isinstance(parsed, list):
+                return [t for t in parsed if isinstance(t, dict)]
+        except json.JSONDecodeError:
+            return []
+    return []
+def main() -> None:
+    input_data = json.loads(sys.stdin.read())
+    tool = input_data.get("tool", "")
+    session_id = input_data.get("sessionID", "")
+    args = input_data.get("args", {}) if isinstance(input_data.get("args"), dict) else {}
+    raw_todos = args.get("todos", [])
+    if tool != "todowrite":
+        print(json.dumps({"action": "allow"}))
+        return
+    level = get_enforcement_level()
+    log_config = get_log_config()
+    logger = Logger(**log_config)
+    if level == ENFORCEMENT_LEVELS["CREATIVE"]:
+        logger.info(f"[{HOOK_NAME}] CREATIVE mode - no enforcement")
+        print(json.dumps({"action": "allow"}))
+        return
+    todos = parse_todos(raw_todos)
+    completed = [t for t in todos if t.get("status") == "completed"]
+    if not completed:
+        print(json.dumps({"action": "allow"}))
+        return
+    last_message = read_last_assistant_message(session_id)
+    if not last_message:
+        message = build_error_message(["Evidence block missing"], level)
+        if level == ENFORCEMENT_LEVELS["STRICT"]:
+            print(json.dumps({"action": "block", "message": message}))
+        else:
+            logger.warn(message)
+            print(json.dumps({"action": "allow"}))
+        return
+    valid, missing = validate_message_text(last_message)
+    if valid:
+        logger.info(f"[{HOOK_NAME}] Evidence validation passed")
+        print(json.dumps({"action": "allow"}))
+        return
+    message = build_error_message(missing, level)
+    if level == ENFORCEMENT_LEVELS["STRICT"]:
+        print(json.dumps({"action": "block", "message": message}))
+    else:
+        logger.warn(message)
+        print(json.dumps({"action": "allow"}))
+if __name__ == "__main__":
+    main()
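
The checklist check in `validate_message_text` is easy to misread, so here is a small standalone sketch that applies the same regular expression to the "Incomplete Checklist" example from the documentation. `missing_checklist_items` is a hypothetical helper name; the item list and pattern are copied from the script above.

```python
import re
from typing import List

# Same item prefixes as REQUIRED_CHECKLIST_ITEMS in enforcement.py.
REQUIRED_ITEMS = [
    "Tool executed",
    "Output captured",
    "Result matches",
    "No workarounds",
    "Independently verifiable",
]


def missing_checklist_items(text: str) -> List[str]:
    """Return the required items that have no checked box on their line."""
    missing = []
    for item in REQUIRED_ITEMS:
        # "." does not match a newline, so the checked box and the item
        # text must appear on the same line for the pattern to match.
        pattern = re.compile(rf"\[\s*[xX]\s*\].*{re.escape(item)}", re.IGNORECASE)
        if not pattern.search(text):
            missing.append(item)
    return missing


if __name__ == "__main__":
    incomplete = "**Checklist**:\n- [x] Tool executed\n- [ ] Output captured"
    print(missing_checklist_items(incomplete))
    # ['Output captured', 'Result matches', 'No workarounds', 'Independently verifiable']
```

Because the pattern is matched per line, a checked box on one line does not satisfy an unchecked item on the next line.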

package/hooks/hooks.json ADDED

@@ -0,0 +1,15 @@
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/enforcement.py",
+            "timeout": 10
+          }
+        ]
+      }
+    ]
+  }
+}
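
The hook configuration runs a command that receives JSON on stdin and answers with JSON on stdout. The following sketch shows one way to exercise that contract from Python. It assumes the payload fields that `enforcement.py` reads (`tool`, `sessionID`, `args.todos`); the `id` field, the `run_hook` helper, and the stand-in hook are hypothetical and exist only so the example runs without the package installed.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List

# Stand-in hook for the demo: it consumes stdin and always allows.
STUB_HOOK = (
    "import json, sys; json.load(sys.stdin); "
    'print(json.dumps({"action": "allow"}))'
)


def run_hook(command: List[str], payload: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Send the payload as JSON on stdin and parse the decision from stdout."""
    result = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the hook exits with a non-zero status
    )
    return json.loads(result.stdout)


payload = {
    "tool": "todowrite",
    "sessionID": "session-123",
    "args": {"todos": [{"id": "task-1", "status": "completed"}]},
}
decision = run_hook([sys.executable, "-c", STUB_HOOK], payload)
print(decision)
```

To try the real script, the command list could be replaced with `["python3", "<path to>/hooks/enforcement.py"]`, where the path depends on where the plugin is installed.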

package/package.json ADDED

@@ -0,0 +1,70 @@
+{
+  "name": "@oussamadouhou/agent-enforcement",
+  "version": "0.1.2",
+  "description": "Evidence enforcement hooks for OpenCode and Claude Code",
+  "main": "dist/plugin.js",
+  "type": "module",
+  "types": "dist/plugin.d.ts",
+  "exports": {
+    ".": {
+      "types": "./dist/plugin.d.ts",
+      "import": "./dist/plugin.js",
+      "default": "./dist/plugin.js"
+    },
+    "./index": {
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js",
+      "default": "./dist/index.js"
+    },
+    "./schema": {
+      "types": "./dist/schema/verification-schema.d.ts",
+      "import": "./dist/schema/verification-schema.js",
+      "default": "./dist/schema/verification-schema.js"
+    }
+  },
+  "files": [
+    "dist",
+    "hooks",
+    "docs",
+    ".claude-plugin",
+    "fixtures"
+  ],
+  "scripts": {
+    "build": "bun run clean && bun run compile",
+    "clean": "rm -rf dist",
+    "compile": "bunx tsc",
+    "prepublishOnly": "bun run build",
+    "test": "bun test"
+  },
+  "keywords": [
+    "opencode",
+    "claude-code",
+    "enforcement",
+    "evidence",
+    "todo"
+  ],
+  "author": "Oussama Douhou",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/oussamadouhou/agent-enforcement.git"
+  },
+  "homepage": "https://github.com/oussamadouhou/agent-enforcement#readme",
+  "bugs": {
+    "url": "https://github.com/oussamadouhou/agent-enforcement/issues"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "peerDependencies": {
+    "@opencode-ai/plugin": "^1.1.21"
+  },
+  "devDependencies": {
+    "@types/node": "^20.0.0",
+    "typescript": "^5.0.0",
+    "bun-types": "^1.0.0"
+  },
+  "engines": {
+    "node": ">=18.0.0"
+  }
+}