wardproof 0.2.0__tar.gz → 0.3.0__tar.gz
- {wardproof-0.2.0 → wardproof-0.3.0}/PKG-INFO +100 -31
- {wardproof-0.2.0 → wardproof-0.3.0}/README.md +82 -30
- {wardproof-0.2.0 → wardproof-0.3.0}/THREAT_MODEL.md +85 -5
- wardproof-0.3.0/benchmarks/README.md +222 -0
- wardproof-0.3.0/benchmarks/corpus.jsonl +136 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/external/README.md +21 -6
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/external/fetch_data.py +15 -4
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/external/injecagent.py +32 -0
- wardproof-0.3.0/benchmarks/heldout.py +132 -0
- wardproof-0.3.0/benchmarks/latency.py +114 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/run_benchmark.py +42 -1
- wardproof-0.3.0/examples/agent_to_agent_transfer.py +172 -0
- wardproof-0.3.0/examples/integrations/README.md +346 -0
- wardproof-0.3.0/examples/integrations/agentkit_guarded.py +213 -0
- wardproof-0.3.0/examples/integrations/anthropic_tools_guarded.py +198 -0
- wardproof-0.3.0/examples/integrations/crewai_guarded.py +154 -0
- wardproof-0.3.0/examples/integrations/langgraph_guarded.py +208 -0
- wardproof-0.3.0/examples/integrations/mcp_guarded.py +218 -0
- wardproof-0.3.0/examples/integrations/openai_tools_guarded.py +212 -0
- wardproof-0.3.0/examples/integrations/skills_guard.py +143 -0
- wardproof-0.3.0/examples/integrations/venice_guarded.py +256 -0
- wardproof-0.3.0/examples/morse_injection_blocked_at_action.py +132 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/examples/protect_rag_app.py +1 -1
- wardproof-0.3.0/examples/protect_x402_payments.py +205 -0
- wardproof-0.3.0/pyproject.toml +80 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/__init__.py +1 -1
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/agents/base.py +13 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/audit/ledger.py +18 -2
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/__init__.py +2 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/_normalize.py +17 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/prompt_injection.py +110 -1
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/tool_misuse.py +61 -11
- wardproof-0.3.0/wardproof/guardrails/transfer.py +211 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/x402_payment.py +70 -28
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/orchestration/engine.py +35 -3
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/standards.py +42 -1
- wardproof-0.2.0/benchmarks/README.md +0 -103
- wardproof-0.2.0/benchmarks/corpus.jsonl +0 -66
- wardproof-0.2.0/examples/protect_x402_payments.py +0 -138
- wardproof-0.2.0/pyproject.toml +0 -54
- {wardproof-0.2.0 → wardproof-0.3.0}/.gitignore +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/CONTRIBUTING.md +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/LICENSE +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/SECURITY.md +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/external/__init__.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/external/_screen.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/benchmarks/external/agentdojo.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/examples/protect_defi_agent.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/examples/protect_mcp_agent.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/agents/__init__.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/agents/detector.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/agents/responder.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/agents/verifier.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/audit/__init__.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/audit/stix.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/cli.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/config.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/base.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/mcp_guard.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/guardrails/memory_poisoning.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/llm/__init__.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/llm/base.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/llm/null.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/llm/ollama_client.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/orchestration/__init__.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/orchestration/factory.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/sandbox/__init__.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/sandbox/executor.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/sandbox/permissions.py +0 -0
- {wardproof-0.2.0 → wardproof-0.3.0}/wardproof/schema.py +0 -0
|
@@ -1,16 +1,22 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: wardproof
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.3.0
|
|
4
4
|
Summary: Local-first, verifiable defensive AI agent swarms that protect other AI agent systems.
|
|
5
5
|
Author: Wardproof contributors
|
|
6
6
|
License-Expression: MIT
|
|
7
7
|
License-File: LICENSE
|
|
8
8
|
Keywords: agents,ai-security,guardrails,local-first,prompt-injection
|
|
9
9
|
Requires-Python: >=3.11
|
|
10
|
+
Provides-Extra: agentkit
|
|
11
|
+
Requires-Dist: coinbase-agentkit>=0.7; extra == 'agentkit'
|
|
10
12
|
Provides-Extra: all
|
|
11
13
|
Requires-Dist: cryptography>=42; extra == 'all'
|
|
12
14
|
Requires-Dist: httpx>=0.27; extra == 'all'
|
|
13
15
|
Requires-Dist: pyyaml>=6; extra == 'all'
|
|
16
|
+
Provides-Extra: anthropic
|
|
17
|
+
Requires-Dist: anthropic>=0.40; extra == 'anthropic'
|
|
18
|
+
Provides-Extra: crewai
|
|
19
|
+
Requires-Dist: crewai>=1.14; extra == 'crewai'
|
|
14
20
|
Provides-Extra: crypto
|
|
15
21
|
Requires-Dist: cryptography>=42; extra == 'crypto'
|
|
16
22
|
Provides-Extra: dev
|
|
@@ -21,8 +27,19 @@ Requires-Dist: pytest>=8; extra == 'dev'
|
|
|
21
27
|
Requires-Dist: ruff>=0.4; extra == 'dev'
|
|
22
28
|
Provides-Extra: guard
|
|
23
29
|
Requires-Dist: llm-guard>=0.3; extra == 'guard'
|
|
30
|
+
Provides-Extra: langgraph
|
|
31
|
+
Requires-Dist: langchain-core>=1.4; extra == 'langgraph'
|
|
32
|
+
Requires-Dist: langgraph>=1.2; extra == 'langgraph'
|
|
33
|
+
Provides-Extra: mcp
|
|
34
|
+
Requires-Dist: mcp>=1.26; extra == 'mcp'
|
|
24
35
|
Provides-Extra: ollama
|
|
25
36
|
Requires-Dist: httpx>=0.27; extra == 'ollama'
|
|
37
|
+
Provides-Extra: openai
|
|
38
|
+
Requires-Dist: openai>=1.0; extra == 'openai'
|
|
39
|
+
Provides-Extra: venice
|
|
40
|
+
Requires-Dist: openai>=1.0; extra == 'venice'
|
|
41
|
+
Provides-Extra: x402
|
|
42
|
+
Requires-Dist: x402>=2.0; extra == 'x402'
|
|
26
43
|
Provides-Extra: yaml
|
|
27
44
|
Requires-Dist: pyyaml>=6; extra == 'yaml'
|
|
28
45
|
Description-Content-Type: text/markdown
|
|
@@ -31,7 +48,16 @@ Description-Content-Type: text/markdown
|
|
|
31
48
|
|
|
32
49
|
**Local-first, verifiable defensive AI agent swarms.**
|
|
33
50
|
|
|
51
|
+
Stop prompt injection and tool misuse before your agent drains its wallet, leaks
|
|
52
|
+
its keys, or runs the wrong command, and keep a tamper-evident log of every
|
|
53
|
+
decision.
|
|
54
|
+
|
|
34
55
|
[](https://github.com/Impossible-Mission-Force/wardproof/actions/workflows/ci.yml)
|
|
56
|
+
[](https://pypi.org/project/wardproof/)
|
|
57
|
+
[](https://github.com/Impossible-Mission-Force/wardproof/blob/main/LICENSE)
|
|
58
|
+
[](https://www.python.org/downloads/)
|
|
59
|
+
|
|
60
|
+

|
|
35
61
|
|
|
36
62
|
Wardproof is a small framework for building swarms of *defensive* agents that
|
|
37
63
|
sit in front of your *other* AI systems (RAG pipelines, tool-using agents,
|
|
@@ -44,10 +70,17 @@ It is deliberately **small, transparent, and forkable**. The security core has
|
|
|
44
70
|
**zero third-party dependencies** and runs **fully offline**, with a local
|
|
45
71
|
model via Ollama, or with no model at all.
|
|
46
72
|
|
|
47
|
-
> **Status: v0.
|
|
48
|
-
> (see [Benchmark](#benchmark))
|
|
49
|
-
>
|
|
50
|
-
>
|
|
73
|
+
> **Status: v0.3.0.** The deterministic core is built, tested, and benchmarked
|
|
74
|
+
> (see [Benchmark](#benchmark)), and ships dedicated guards for x402 agent
|
|
75
|
+
> payments, on-chain transfers, MCP tool calls, and skill/tool definitions, a
|
|
76
|
+
> controls-to-standards map (OWASP Agentic Top 10, OWASP LLM 2025, MITRE ATLAS,
|
|
77
|
+
> CSA MAESTRO, and NIST AI 600-1) with STIX 2.1 ledger export, harnesses that
|
|
78
|
+
> screen the public AgentDojo and InjecAgent suites, and drop-in integration
|
|
79
|
+
> examples for OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP,
|
|
80
|
+
> Coinbase AgentKit, and Venice AI. It is
|
|
81
|
+
> deployable today as a screening and audit layer, designed to run as defence in
|
|
82
|
+
> depth within the scope set out in [`THREAT_MODEL.md`](THREAT_MODEL.md) and
|
|
83
|
+
> [`SECURITY.md`](SECURITY.md).
|
|
51
84
|
|
|
52
85
|
---
|
|
53
86
|
|
|
@@ -84,14 +117,26 @@ different stance:
|
|
|
84
117
|
- **x402 payment guardrail**: chain-agnostic screening of x402 (HTTP 402)
|
|
85
118
|
payment envelopes (CAIP-2 network, amount, recipient, asset) with a recipient
|
|
86
119
|
allowlist, amount thresholds, replay detection, and 402-body injection checks.
|
|
120
|
+
- **Transfer guardrail**: screens on-chain transfers against a recipient
|
|
121
|
+
allowlist and spend threshold, and treats an agent-relayed transfer as never
|
|
122
|
+
pre-authorised (it escalates rather than trusting one agent's say-so).
|
|
87
123
|
- **MCP guard**: screens MCP tool descriptions and schemas for tool poisoning
|
|
88
124
|
(incl. hidden Unicode), allowlists servers, detects manifest rug pulls, and
|
|
89
125
|
audits every tool invocation.
|
|
126
|
+
- **Skill/tool scanner**: screens a skill or tool definition (name, description,
|
|
127
|
+
code) before it is registered, catching hidden instructions buried in a
|
|
128
|
+
description (the tool-poisoning class, one step earlier than a live call). See
|
|
129
|
+
`examples/integrations/skills_guard.py`.
|
|
130
|
+
- **Framework integrations**: drop-in examples that put the swarm in front of
|
|
131
|
+
OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP, and Coinbase
|
|
132
|
+
AgentKit tool calls, plus Venice AI as an optional escalate-only second-opinion
|
|
133
|
+
backend. Each is an optional dependency; the core imports none of them. See
|
|
134
|
+
[`examples/integrations/`](examples/integrations/).
|
|
90
135
|
- **Standards-aligned**: every control mapped to OWASP Top 10 for Agentic
|
|
91
136
|
Applications, OWASP Agentic Threats (T1-T15), OWASP LLM Top 10 2025, CSA
|
|
92
|
-
MAESTRO,
|
|
93
|
-
detections are ATLAS-tagged and export to **STIX 2.1** for
|
|
94
|
-
`wardproof export-stix`.
|
|
137
|
+
MAESTRO, MITRE ATLAS, and NIST AI 600-1 (`wardproof/standards.py`, enforced by
|
|
138
|
+
tests). Ledger detections are ATLAS-tagged and export to **STIX 2.1** for
|
|
139
|
+
SIEM/SOC via `wardproof export-stix`.
|
|
95
140
|
- **3 reference agents**: `DetectorAgent`, `VerifierAgent` (with detector
|
|
96
141
|
integrity check), `ResponderAgent`.
|
|
97
142
|
- **Capability sandbox**: default-deny permission broker (per-agent grants,
|
|
@@ -113,7 +158,7 @@ different stance:
|
|
|
113
158
|
pip install -e . # core only, zero third-party deps
|
|
114
159
|
pip install -e ".[crypto]" # + Ed25519 signed ledgers
|
|
115
160
|
pip install -e ".[ollama]" # + local model via Ollama
|
|
116
|
-
pip install -e ".[all]" #
|
|
161
|
+
pip install -e ".[all]" # optional runtime backends (ollama, crypto, yaml)
|
|
117
162
|
```
|
|
118
163
|
|
|
119
164
|
Requires Python 3.11+.
|
|
@@ -204,16 +249,30 @@ category:
|
|
|
204
249
|
python benchmarks/run_benchmark.py
|
|
205
250
|
```
|
|
206
251
|
|
|
207
|
-
On the default configuration
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
252
|
+
On the default configuration plus the optional payment, transfer, and MCP guards,
|
|
253
|
+
with no model (136 cases: 89 attacks, 47 benign), it flags all 89 attacks at a
|
|
254
|
+
0% false-positive rate (0 of 47 benign inputs flagged):
|
|
255
|
+
|
|
256
|
+
| Category | Recall (attacks flagged) | False positives |
|
|
257
|
+
| ---------------- | ------------------------ | --------------- |
|
|
258
|
+
| injection | 27/27 | 0/11 |
|
|
259
|
+
| tool_misuse | 23/23 | 0/10 |
|
|
260
|
+
| memory_poisoning | 16/16 | 0/10 |
|
|
261
|
+
| mcp_poisoning | 6/6 | 0/4 |
|
|
262
|
+
| skill_poisoning | 4/4 | 0/2 |
|
|
263
|
+
| x402_payment | 6/6 | 0/2 |
|
|
264
|
+
| transfer | 3/3 | 0/2 |
|
|
265
|
+
| agent_relayed | 4/4 | 0/2 |
|
|
266
|
+
| benign_general | n/a | 0/4 |
|
|
267
|
+
| **Overall** | **89/89 (100%)** | **0/47 (0%)** |
|
|
268
|
+
|
|
269
|
+
Treat these as a coverage and regression signal on *known* patterns, not a
|
|
270
|
+
security claim: the corpus is partly self-authored, so novel attacks (other
|
|
271
|
+
languages, fresh encodings, or pure-semantic paraphrase) can still slip past a
|
|
272
|
+
deterministic denylist. Closing that gap is the job of the optional LLM second
|
|
273
|
+
opinion (see Roadmap); these patterns are the floor, not the ceiling. Re-run the
|
|
274
|
+
harness to regenerate the numbers above; the full breakdown and the honest edges
|
|
275
|
+
are in [`benchmarks/README.md`](benchmarks/README.md).
|
|
217
276
|
|
|
218
277
|
---
|
|
219
278
|
|
|
@@ -240,22 +299,32 @@ No need to touch the engine, the ledger, or the agent base classes.
|
|
|
240
299
|
Wardproof is built to become a complete, auditable control layer for AI agents.
|
|
241
300
|
The direction:
|
|
242
301
|
|
|
243
|
-
**Now (v0.
|
|
244
|
-
The deterministic core: schema,
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
|
|
302
|
+
**Now (v0.3.0)**
|
|
303
|
+
The deterministic core: schema, guardrails, Detector / Verifier / Responder, a
|
|
304
|
+
capability sandbox, circuit breaker and watchdog, a hash-chained and optionally
|
|
305
|
+
signed audit ledger, a reproducible adversarial benchmark, a published threat
|
|
306
|
+
model, worked examples, a test suite, and a ledger verification CLI. On top of
|
|
307
|
+
that: dedicated guards for x402 payments (recipient allowlist, spend thresholds,
|
|
308
|
+
replay detection, injection screening of the 402 body), on-chain transfers, MCP
|
|
309
|
+
tool calls (description and schema screening, server allowlisting, rug-pull
|
|
310
|
+
detection), and skill/tool definitions; a controls-to-standards map (OWASP
|
|
311
|
+
Agentic Top 10, OWASP LLM 2025, MITRE ATLAS, CSA MAESTRO, NIST AI 600-1) with
|
|
312
|
+
STIX 2.1 ledger export; screening harnesses for the public AgentDojo and
|
|
313
|
+
InjecAgent suites; and drop-in integration examples for OpenAI and Anthropic tool
|
|
314
|
+
calling, CrewAI, LangGraph, MCP, and Coinbase AgentKit, plus Venice AI as an
|
|
315
|
+
optional escalate-only second-opinion backend (alongside the existing Ollama
|
|
316
|
+
backend).
|
|
249
317
|
|
|
250
318
|
**Next**
|
|
251
|
-
- A semantic detection layer
|
|
252
|
-
|
|
319
|
+
- A bundled local semantic detection layer that ships by default alongside the
|
|
320
|
+
deterministic guardrails, to close the gaps the benchmark exposes. The
|
|
321
|
+
escalate-only second-opinion hook already exists (Ollama or Venice); this would
|
|
322
|
+
add a default local model so the semantic layer is on without extra setup.
|
|
253
323
|
- First-class isolation backends behind one interface: subprocess with rlimits,
|
|
254
324
|
Docker, and gVisor or microVM, each with its trust boundary documented.
|
|
255
|
-
-
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
- Config files, structured logging, and a pluggable guardrail registry.
|
|
325
|
+
- A FastAPI middleware that drops the swarm in front of an existing agent
|
|
326
|
+
service, and a pluggable guardrail registry, config files, and structured
|
|
327
|
+
logging.
|
|
259
328
|
|
|
260
329
|
**Later**
|
|
261
330
|
- Observability: ledger export to OpenTelemetry and SIEM, a read-only audit
|
|
@@ -2,7 +2,16 @@
|
|
|
2
2
|
|
|
3
3
|
**Local-first, verifiable defensive AI agent swarms.**
|
|
4
4
|
|
|
5
|
+
Stop prompt injection and tool misuse before your agent drains its wallet, leaks
|
|
6
|
+
its keys, or runs the wrong command, and keep a tamper-evident log of every
|
|
7
|
+
decision.
|
|
8
|
+
|
|
5
9
|
[](https://github.com/Impossible-Mission-Force/wardproof/actions/workflows/ci.yml)
|
|
10
|
+
[](https://pypi.org/project/wardproof/)
|
|
11
|
+
[](https://github.com/Impossible-Mission-Force/wardproof/blob/main/LICENSE)
|
|
12
|
+
[](https://www.python.org/downloads/)
|
|
13
|
+
|
|
14
|
+

|
|
6
15
|
|
|
7
16
|
Wardproof is a small framework for building swarms of *defensive* agents that
|
|
8
17
|
sit in front of your *other* AI systems (RAG pipelines, tool-using agents,
|
|
@@ -15,10 +24,17 @@ It is deliberately **small, transparent, and forkable**. The security core has
|
|
|
15
24
|
**zero third-party dependencies** and runs **fully offline**, with a local
|
|
16
25
|
model via Ollama, or with no model at all.
|
|
17
26
|
|
|
18
|
-
> **Status: v0.
|
|
19
|
-
> (see [Benchmark](#benchmark))
|
|
20
|
-
>
|
|
21
|
-
>
|
|
27
|
+
> **Status: v0.3.0.** The deterministic core is built, tested, and benchmarked
|
|
28
|
+
> (see [Benchmark](#benchmark)), and ships dedicated guards for x402 agent
|
|
29
|
+
> payments, on-chain transfers, MCP tool calls, and skill/tool definitions, a
|
|
30
|
+
> controls-to-standards map (OWASP Agentic Top 10, OWASP LLM 2025, MITRE ATLAS,
|
|
31
|
+
> CSA MAESTRO, and NIST AI 600-1) with STIX 2.1 ledger export, harnesses that
|
|
32
|
+
> screen the public AgentDojo and InjecAgent suites, and drop-in integration
|
|
33
|
+
> examples for OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP,
|
|
34
|
+
> Coinbase AgentKit, and Venice AI. It is
|
|
35
|
+
> deployable today as a screening and audit layer, designed to run as defence in
|
|
36
|
+
> depth within the scope set out in [`THREAT_MODEL.md`](THREAT_MODEL.md) and
|
|
37
|
+
> [`SECURITY.md`](SECURITY.md).
|
|
22
38
|
|
|
23
39
|
---
|
|
24
40
|
|
|
@@ -55,14 +71,26 @@ different stance:
|
|
|
55
71
|
- **x402 payment guardrail**: chain-agnostic screening of x402 (HTTP 402)
|
|
56
72
|
payment envelopes (CAIP-2 network, amount, recipient, asset) with a recipient
|
|
57
73
|
allowlist, amount thresholds, replay detection, and 402-body injection checks.
|
|
74
|
+
- **Transfer guardrail**: screens on-chain transfers against a recipient
|
|
75
|
+
allowlist and spend threshold, and treats an agent-relayed transfer as never
|
|
76
|
+
pre-authorised (it escalates rather than trusting one agent's say-so).
|
|
58
77
|
- **MCP guard**: screens MCP tool descriptions and schemas for tool poisoning
|
|
59
78
|
(incl. hidden Unicode), allowlists servers, detects manifest rug pulls, and
|
|
60
79
|
audits every tool invocation.
|
|
80
|
+
- **Skill/tool scanner**: screens a skill or tool definition (name, description,
|
|
81
|
+
code) before it is registered, catching hidden instructions buried in a
|
|
82
|
+
description (the tool-poisoning class, one step earlier than a live call). See
|
|
83
|
+
`examples/integrations/skills_guard.py`.
|
|
84
|
+
- **Framework integrations**: drop-in examples that put the swarm in front of
|
|
85
|
+
OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP, and Coinbase
|
|
86
|
+
AgentKit tool calls, plus Venice AI as an optional escalate-only second-opinion
|
|
87
|
+
backend. Each is an optional dependency; the core imports none of them. See
|
|
88
|
+
[`examples/integrations/`](examples/integrations/).
|
|
61
89
|
- **Standards-aligned**: every control mapped to OWASP Top 10 for Agentic
|
|
62
90
|
Applications, OWASP Agentic Threats (T1-T15), OWASP LLM Top 10 2025, CSA
|
|
63
|
-
MAESTRO,
|
|
64
|
-
detections are ATLAS-tagged and export to **STIX 2.1** for
|
|
65
|
-
`wardproof export-stix`.
|
|
91
|
+
MAESTRO, MITRE ATLAS, and NIST AI 600-1 (`wardproof/standards.py`, enforced by
|
|
92
|
+
tests). Ledger detections are ATLAS-tagged and export to **STIX 2.1** for
|
|
93
|
+
SIEM/SOC via `wardproof export-stix`.
|
|
66
94
|
- **3 reference agents**: `DetectorAgent`, `VerifierAgent` (with detector
|
|
67
95
|
integrity check), `ResponderAgent`.
|
|
68
96
|
- **Capability sandbox**: default-deny permission broker (per-agent grants,
|
|
@@ -84,7 +112,7 @@ different stance:
|
|
|
84
112
|
pip install -e . # core only, zero third-party deps
|
|
85
113
|
pip install -e ".[crypto]" # + Ed25519 signed ledgers
|
|
86
114
|
pip install -e ".[ollama]" # + local model via Ollama
|
|
87
|
-
pip install -e ".[all]" #
|
|
115
|
+
pip install -e ".[all]" # optional runtime backends (ollama, crypto, yaml)
|
|
88
116
|
```
|
|
89
117
|
|
|
90
118
|
Requires Python 3.11+.
|
|
@@ -175,16 +203,30 @@ category:
|
|
|
175
203
|
python benchmarks/run_benchmark.py
|
|
176
204
|
```
|
|
177
205
|
|
|
178
|
-
On the default configuration
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
206
|
+
On the default configuration plus the optional payment, transfer, and MCP guards,
|
|
207
|
+
with no model (136 cases: 89 attacks, 47 benign), it flags all 89 attacks at a
|
|
208
|
+
0% false-positive rate (0 of 47 benign inputs flagged):
|
|
209
|
+
|
|
210
|
+
| Category | Recall (attacks flagged) | False positives |
|
|
211
|
+
| ---------------- | ------------------------ | --------------- |
|
|
212
|
+
| injection | 27/27 | 0/11 |
|
|
213
|
+
| tool_misuse | 23/23 | 0/10 |
|
|
214
|
+
| memory_poisoning | 16/16 | 0/10 |
|
|
215
|
+
| mcp_poisoning | 6/6 | 0/4 |
|
|
216
|
+
| skill_poisoning | 4/4 | 0/2 |
|
|
217
|
+
| x402_payment | 6/6 | 0/2 |
|
|
218
|
+
| transfer | 3/3 | 0/2 |
|
|
219
|
+
| agent_relayed | 4/4 | 0/2 |
|
|
220
|
+
| benign_general | n/a | 0/4 |
|
|
221
|
+
| **Overall** | **89/89 (100%)** | **0/47 (0%)** |
|
|
222
|
+
|
|
223
|
+
Treat these as a coverage and regression signal on *known* patterns, not a
|
|
224
|
+
security claim: the corpus is partly self-authored, so novel attacks (other
|
|
225
|
+
languages, fresh encodings, or pure-semantic paraphrase) can still slip past a
|
|
226
|
+
deterministic denylist. Closing that gap is the job of the optional LLM second
|
|
227
|
+
opinion (see Roadmap); these patterns are the floor, not the ceiling. Re-run the
|
|
228
|
+
harness to regenerate the numbers above; the full breakdown and the honest edges
|
|
229
|
+
are in [`benchmarks/README.md`](benchmarks/README.md).
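
The overall figures follow directly from the per-category rows. As a quick arithmetic check, the sketch below recomputes them from the table; the `ROWS` layout and variable names are illustrative and are not part of `benchmarks/run_benchmark.py`.

```python
# Per-category counts copied from the table above:
# (attacks flagged, attacks total, benign flagged, benign total).
ROWS = {
    "injection": (27, 27, 0, 11),
    "tool_misuse": (23, 23, 0, 10),
    "memory_poisoning": (16, 16, 0, 10),
    "mcp_poisoning": (6, 6, 0, 4),
    "skill_poisoning": (4, 4, 0, 2),
    "x402_payment": (6, 6, 0, 2),
    "transfer": (3, 3, 0, 2),
    "agent_relayed": (4, 4, 0, 2),
    "benign_general": (0, 0, 0, 4),  # benign-only category, no attacks
}

flagged = sum(row[0] for row in ROWS.values())
attacks = sum(row[1] for row in ROWS.values())
false_positives = sum(row[2] for row in ROWS.values())
benign = sum(row[3] for row in ROWS.values())

recall = flagged / attacks                      # share of attacks flagged
false_positive_rate = false_positives / benign  # share of benign inputs flagged

print(f"{attacks + benign} cases: {attacks} attacks, {benign} benign")
print(f"recall={recall:.0%} false_positive_rate={false_positive_rate:.0%}")
```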
|
|
188
230
|
|
|
189
231
|
---
|
|
190
232
|
|
|
@@ -211,22 +253,32 @@ No need to touch the engine, the ledger, or the agent base classes.
|
|
|
211
253
|
Wardproof is built to become a complete, auditable control layer for AI agents.
|
|
212
254
|
The direction:
|
|
213
255
|
|
|
214
|
-
**Now (v0.
|
|
215
|
-
The deterministic core: schema,
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
256
|
+
**Now (v0.3.0)**
|
|
257
|
+
The deterministic core: schema, guardrails, Detector / Verifier / Responder, a
|
|
258
|
+
capability sandbox, circuit breaker and watchdog, a hash-chained and optionally
|
|
259
|
+
signed audit ledger, a reproducible adversarial benchmark, a published threat
|
|
260
|
+
model, worked examples, a test suite, and a ledger verification CLI. On top of
|
|
261
|
+
that: dedicated guards for x402 payments (recipient allowlist, spend thresholds,
|
|
262
|
+
replay detection, injection screening of the 402 body), on-chain transfers, MCP
|
|
263
|
+
tool calls (description and schema screening, server allowlisting, rug-pull
|
|
264
|
+
detection), and skill/tool definitions; a controls-to-standards map (OWASP
|
|
265
|
+
Agentic Top 10, OWASP LLM 2025, MITRE ATLAS, CSA MAESTRO, NIST AI 600-1) with
|
|
266
|
+
STIX 2.1 ledger export; screening harnesses for the public AgentDojo and
|
|
267
|
+
InjecAgent suites; and drop-in integration examples for OpenAI and Anthropic tool
|
|
268
|
+
calling, CrewAI, LangGraph, MCP, and Coinbase AgentKit, plus Venice AI as an
|
|
269
|
+
optional escalate-only second-opinion backend (alongside the existing Ollama
|
|
270
|
+
backend).
|
|
220
271
|
|
|
221
272
|
**Next**
|
|
222
|
-
- A semantic detection layer
|
|
223
|
-
|
|
273
|
+
- A bundled local semantic detection layer that ships by default alongside the
|
|
274
|
+
deterministic guardrails, to close the gaps the benchmark exposes. The
|
|
275
|
+
escalate-only second-opinion hook already exists (Ollama or Venice); this would
|
|
276
|
+
add a default local model so the semantic layer is on without extra setup.
|
|
224
277
|
- First-class isolation backends behind one interface: subprocess with rlimits,
|
|
225
278
|
Docker, and gVisor or microVM, each with its trust boundary documented.
|
|
226
|
-
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
- Config files, structured logging, and a pluggable guardrail registry.
|
|
279
|
+
- A FastAPI middleware that drops the swarm in front of an existing agent
|
|
280
|
+
service, and a pluggable guardrail registry, config files, and structured
|
|
281
|
+
logging.
|
|
230
282
|
|
|
231
283
|
**Later**
|
|
232
284
|
- Observability: ledger export to OpenTelemetry and SIEM, a read-only audit
|
|
@@ -68,8 +68,8 @@ our own model.
|
|
|
68
68
|
| Data exfiltration | outbound-data detection plus `allowed_hosts` | a creative encoding, or an allowlisted host that is itself compromised |
|
|
69
69
|
| Memory poisoning | flags persistence and stealth phrasing | subtle writes below threshold get escalated, not blocked; `remember that` over-flags (benchmark `mem-b4`) |
|
|
70
70
|
| Compromised defensive agent | Verifier re-checks independently and audits the Detector; verdicts combine fail-closed | if Detector and Verifier share the same rule set, they share its blind spots |
|
|
71
|
-
| Alert flood or cascading failure | CircuitBreaker forces a human into the loop |
|
|
72
|
-
| Ledger tampering | hash chain, signatures, external append-only storage, self-verify | a stolen
|
|
71
|
+
| Alert flood or cascading failure | CircuitBreaker forces a human into the loop | a downgrade window opens once tripped (floor is ESCALATE, CRITICAL is exempt); see the breaker note below |
|
|
72
|
+
| Ledger tampering | hash chain, signatures, external append-only storage, self-verify | the hash chain alone cannot detect a full rewrite; signatures close that, but a stolen key can forge entries and deleting the whole store removes evidence; see the trust-model note below |
|
|
73
73
|
| Sandbox escape | rlimit-bounded subprocess runner | not a boundary for hostile native code; use containers, gVisor, or microVMs |
|
|
74
74
|
| Supply chain | zero-dependency core | optional extras and the toolchain remain; signed releases and SBOM are roadmap v1.0 |
|
|
75
75
|
|
|
@@ -80,13 +80,15 @@ operators can place Wardproof in their existing risk language. The mapping is no
|
|
|
80
80
|
prose: it lives in code (`wardproof/standards.py`) and a test
|
|
81
81
|
(`tests/test_standards.py`) fails if any identifier here is wrong or unknown. The
|
|
82
82
|
identifiers were verified against primary sources (the official
|
|
83
|
-
`mitre-atlas/atlas-data` dataset, genai.owasp.org,
|
|
84
|
-
|
|
83
|
+
`mitre-atlas/atlas-data` dataset, genai.owasp.org, the CSA MAESTRO publication,
|
|
84
|
+
and the NIST AI 600-1 PDF); provenance and source URLs are in
|
|
85
85
|
`research/03-standards-verification.md`.
|
|
86
86
|
|
|
87
87
|
Frameworks: **ASI** = OWASP Top 10 for Agentic Applications 2026; **T** = OWASP
|
|
88
88
|
Agentic AI Threats and Mitigations (T1-T15); **LLM** = OWASP Top 10 for LLM
|
|
89
|
-
Applications 2025; **L** = CSA MAESTRO layer; **AML.T** = MITRE ATLAS technique
|
|
89
|
+
Applications 2025; **L** = CSA MAESTRO layer; **AML.T** = MITRE ATLAS technique;
|
|
90
|
+
**NIST AI 600-1** = the generative-AI risks named in the NIST GenAI Profile
|
|
91
|
+
(mapped in a separate table below).
|
|
90
92
|
|
|
91
93
|
| Wardproof control | OWASP Agentic Top 10 | OWASP Agentic Threats | OWASP LLM 2025 | MAESTRO | MITRE ATLAS |
|
|
92
94
|
| --- | --- | --- | --- | --- | --- |
|
|
@@ -122,6 +124,31 @@ The MITRE ATLAS catalog in `standards.py` also carries `AML.T0024`,
|
|
|
122
124
|
`AML.T0098`, and `AML.T0083` for credential and inference-API exfiltration that
|
|
123
125
|
the roadmap guardrails will map onto.
|
|
124
126
|
|
|
127
|
+
### NIST AI 600-1 (Generative AI Profile)
|
|
128
|
+
|
|
129
|
+
The guards also map onto the generative-AI risks named in **NIST AI 600-1**,
|
|
130
|
+
"Artificial Intelligence Risk Management Framework: Generative Artificial
|
|
131
|
+
Intelligence Profile" (July 2024), available at
|
|
132
|
+
<https://doi.org/10.6028/NIST.AI.600-1>. The risk names below are verbatim from
|
|
133
|
+
Section 2 of that document (`standards.py` carries all twelve).
|
|
134
|
+
|
|
135
|
+
| Wardproof control | NIST AI 600-1 risk |
|
|
136
|
+
| --- | --- |
|
|
137
|
+
| Prompt-injection guardrail | Information Security, Data Privacy |
|
|
138
|
+
| Tool-misuse guardrail | Information Security |
|
|
139
|
+
| Memory-poisoning guardrail | Information Integrity |
|
|
140
|
+
| x402 payment guardrail | Information Security |
|
|
141
|
+
| On-chain transfer guardrail | Information Security |
|
|
142
|
+
| MCP guard | Information Security |
|
|
143
|
+
|
|
144
|
+
Honest scope note: NIST AI 600-1 frames risks, not specific attack techniques.
|
|
145
|
+
Its Section 2 "Information Security" risk covers offensive cyber, vulnerability
|
|
146
|
+
exploitation, and attacks on the confidentiality and integrity of training data,
|
|
147
|
+
code, and model weights; the literal phrase "prompt injection" appears in the
|
|
148
|
+
document's suggested actions, not in that Section 2 risk definition. The mapping
|
|
149
|
+
above is to the risk category each guard reduces, not a claim that the document
|
|
150
|
+
names each attack by the same words.
|
|
151
|
+
|
|
125
152
|
### SIEM / SOC integration
|
|
126
153
|
|
|
127
154
|
The ledger exports to **STIX 2.1** so detections flow into SIEM and SOC tooling
|
|
@@ -138,6 +165,59 @@ custom properties, so a SIEM alert traces back to the exact, hash-verifiable
|
|
|
138
165
|
ledger entry. The export is deterministic (stable object ids), so re-exporting
|
|
139
166
|
the same ledger yields an identical bundle.
|
|
140
167
|
|
|
168
|
+
### Audit ledger trust model
|
|
169
|
+
|
|
170
|
+
The ledger is **tamper-evident, not tamper-preventing**. Be precise about what
|
|
171
|
+
that buys you:
|
|
172
|
+
|
|
173
|
+
- The stdlib hash chain (no key) detects in-place **mutation**, **deletion**,
|
|
174
|
+
**reordering**, and a **forged appended** entry: each breaks a recomputed hash
|
|
175
|
+
or a chain link.
|
|
176
|
+
- The hash chain alone does **not** detect a **full rewrite**: an attacker who
|
|
177
|
+
changes an entry and recomputes every downstream hash and `prev_hash` produces
|
|
178
|
+
a chain that re-verifies. A bare hash chain cannot stop someone who rewrites
|
|
179
|
+
all of it.
|
|
180
|
+
- **Ed25519 signatures** close that gap by binding history to a key the writer
|
|
181
|
+
holds. When a public key is in play, `verify()` requires **every** entry to
|
|
182
|
+
carry a valid signature: a missing signature is treated as tampering, not
|
|
183
|
+
skipped. So a full rewrite fails (the old signatures no longer match the
|
|
184
|
+
recomputed hashes) and stripping signatures fails (a missing signature at the
|
|
185
|
+
tampered index). The `require_signatures` flag forces this check explicitly.
|
|
186
|
+
- Signatures matter only when the **writer and verifier differ** and the
|
|
187
|
+
**private key lives outside the audited agent**. If the same process holds the
|
|
188
|
+
key and can rewrite the store, it can produce a fresh valid chain: that is
|
|
189
|
+
tamper-evidence's limit, and it is why the key and the verifier belong outside
|
|
190
|
+
the agent. Deleting the entire store still removes the evidence; append-only or
|
|
191
|
+
tamper-resistant media is the operator's job.
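
The difference between what a bare hash chain catches and what it misses can be sketched with the standard library alone. This is a minimal illustration and not Wardproof's ledger code: `append`, `verify`, `full_rewrite`, `GENESIS`, and the entry layout are hypothetical names chosen for the example; only `prev_hash` comes from the text above. Signatures are deliberately left out, so the last case shows the gap they are meant to close.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical prev_hash for the first entry


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Hash an entry's payload together with the hash of its predecessor."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Append an entry that links to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": entry_hash(payload, prev_hash)}
    )


def verify(chain: list[dict]) -> bool:
    """Recompute every hash and check every link; no key is involved."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False  # broken link: deletion or reordering
        if entry["hash"] != entry_hash(entry["payload"], entry["prev_hash"]):
            return False  # recomputed hash differs: in-place mutation
        prev_hash = entry["hash"]
    return True


def full_rewrite(chain: list[dict], index: int, new_payload: dict) -> list[dict]:
    """Attacker model: change one entry, then recompute every hash after it."""
    rewritten: list[dict] = []
    for i, entry in enumerate(chain):
        append(rewritten, dict(new_payload if i == index else entry["payload"]))
    return rewritten


chain: list[dict] = []
for verdict in ("ALLOW", "BLOCK", "ESCALATE"):
    append(chain, {"verdict": verdict})

# In-place mutation: the stored hash no longer matches the payload.
mutated = [dict(entry) for entry in chain]
mutated[1]["payload"] = {"verdict": "ALLOW"}
mutation_detected = not verify(mutated)

# Deleting a middle entry: the next entry's prev_hash no longer links up.
deletion_detected = not verify(chain[:1] + chain[2:])

# Full rewrite: every hash is recomputed, so the bare chain re-verifies.
rewritten = full_rewrite(chain, 1, {"verdict": "ALLOW"})
rewrite_detected = not verify(rewritten)

print(mutation_detected, deletion_detected, rewrite_detected)  # True True False
```

The final `False` is the point of the section: without a signature over each entry hash, made with a key the rewriting process does not hold, a consistently rewritten chain is indistinguishable from the original.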
|
|
192
|
+
|
|
193
|
+
### Circuit-breaker downgrade tradeoff
|
|
194
|
+
|
|
195
|
+
When too many severe verdicts occur in a short window the breaker trips and, for
|
|
196
|
+
a cooldown, downgrades further `BLOCK`/`QUARANTINE` verdicts to `ESCALATE`. This
|
|
197
|
+
trades a block storm for a downgrade window, and it is a deliberate tradeoff, not
|
|
198
|
+
a hole:
|
|
199
|
+
|
|
200
|
+
- The downgrade floor is `ESCALATE`, **never** `ALLOW`: a human stays in the
|
|
201
|
+
loop, no action slips straight through.
|
|
202
|
+
- **CRITICAL** severity is **exempt**: a clearly critical action (a denied tool,
|
|
203
|
+
a payment replay) is never softened; only repeated lower-severity noise is.
|
|
204
|
+
- It is configurable: a high `max_events` effectively opts out for operators who
|
|
205
|
+
would rather take a block storm than a downgrade window.
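
The three properties above (an `ESCALATE` floor, the CRITICAL exemption, and opting out through `max_events`) can be sketched as below. This is an illustrative model, not the engine's implementation: the class shape, `window_seconds`, `cooldown_seconds`, the `apply` method, and the `HIGH`/`LOW` severity labels are assumptions; only `max_events`, the verdict names, and `CRITICAL` come from the text. Time is passed in explicitly so the behaviour is deterministic.

```python
from collections import deque

SEVERE = {"BLOCK", "QUARANTINE"}


class CircuitBreaker:
    """Hypothetical breaker: trips after max_events severe verdicts in a window."""

    def __init__(self, max_events: int, window_seconds: float, cooldown_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self._events: deque[float] = deque()
        self._tripped_until = float("-inf")

    def apply(self, verdict: str, severity: str, now: float) -> str:
        """Return the verdict to act on, possibly downgraded to ESCALATE."""
        if verdict not in SEVERE:
            return verdict  # ALLOW and ESCALATE pass through untouched

        was_tripped = now < self._tripped_until

        # Record this severe verdict and drop events outside the window.
        self._events.append(now)
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        if len(self._events) >= self.max_events:
            self._tripped_until = now + self.cooldown_seconds

        # Only *further* severe verdicts are softened, never below ESCALATE,
        # and CRITICAL severity is never softened at all.
        if was_tripped and severity != "CRITICAL":
            return "ESCALATE"
        return verdict


breaker = CircuitBreaker(max_events=3, window_seconds=10, cooldown_seconds=30)
storm = [breaker.apply("BLOCK", "HIGH", now=t) for t in (0, 1, 2)]
during_cooldown = breaker.apply("BLOCK", "HIGH", now=3)
critical = breaker.apply("QUARANTINE", "CRITICAL", now=4)
benign = breaker.apply("ALLOW", "LOW", now=5)
after_cooldown = breaker.apply("BLOCK", "HIGH", now=100)

# A very high max_events means the breaker effectively never trips.
opted_out = CircuitBreaker(max_events=10**9, window_seconds=10, cooldown_seconds=30)
never_downgraded = [opted_out.apply("BLOCK", "HIGH", now=t) for t in range(5)]
```

In this model the third `BLOCK` trips the breaker, the fourth is downgraded to `ESCALATE`, the CRITICAL `QUARANTINE` is left alone, and blocking resumes once the cooldown has passed.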
|
|
206
|
+
|
|
207
|
+
### x402 canonical-envelope assumption
|
|
208
|
+
|
|
209
|
+
The x402 guard reads the payment envelope tolerant of the several field-name
|
|
210
|
+
aliases the ecosystem uses. A hostile server could try a **split envelope**:
|
|
211
|
+
a benign value where the guard looks and a hostile value where the payer reads.
|
|
212
|
+
The guard defends in depth by collecting **every** recipient and amount value,
|
|
213
|
+
requiring all recipients to be allowlisted, evaluating the **worst-case** (max)
|
|
214
|
+
amount against the threshold, flagging conflicting aliases as ambiguous, and
|
|
215
|
+
flagging non-positive amounts. It still **assumes the caller passes a single
|
|
216
|
+
canonical envelope** and that the payment tool reads the same authoritative
|
|
217
|
+
fields the guard screens; the conflict detection is a backstop, not a substitute
|
|
218
|
+
for a canonical envelope. The guard screens; the sandbox permission broker
|
|
219
|
+
remains the hard enforcement layer for the actual payment tool.
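
A rough sketch of the split-envelope checks described above follows. It is not the code in `wardproof/guardrails/x402_payment.py`: the function name `screen_envelope`, the alias tuples, and the finding strings are assumptions made for the example, and the real guard may recognise different field names.

```python
from decimal import Decimal, InvalidOperation

# Illustrative alias lists; the aliases the real guard accepts may differ.
RECIPIENT_ALIASES = ("payTo", "pay_to", "recipient", "to")
AMOUNT_ALIASES = ("maxAmountRequired", "amount", "value")


def screen_envelope(envelope: dict, allowlist: set[str], max_amount: Decimal) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings: list[str] = []

    # Collect EVERY recipient and amount, not just the first alias found.
    recipients = [str(envelope[key]) for key in RECIPIENT_ALIASES if key in envelope]
    amounts: list[Decimal] = []
    for key in AMOUNT_ALIASES:
        if key not in envelope:
            continue
        try:
            value = Decimal(str(envelope[key]))
        except InvalidOperation:
            findings.append(f"unparseable amount in {key!r}")
            continue
        if not value.is_finite():
            findings.append(f"non-finite amount in {key!r}")
            continue
        amounts.append(value)

    if not recipients:
        findings.append("missing recipient")
    if not amounts:
        findings.append("missing amount")

    # Conflicting aliases are ambiguous: the payer might read the other one.
    if len(set(recipients)) > 1:
        findings.append("conflicting recipient aliases")
    if len(set(amounts)) > 1:
        findings.append("conflicting amount aliases")

    # Every recipient must be allowlisted (exact match, chain-agnostic).
    for recipient in sorted(set(recipients)):
        if recipient not in allowlist:
            findings.append(f"recipient not allowlisted: {recipient}")

    if any(amount <= 0 for amount in amounts):
        findings.append("non-positive amount")
    # Judge the worst case: the largest amount any alias carries.
    if amounts and max(amounts) > max_amount:
        findings.append("amount over threshold")
    return findings


ALLOWLIST = {"0xAAA"}
clean = screen_envelope({"payTo": "0xAAA", "maxAmountRequired": "5"}, ALLOWLIST, Decimal("10"))
split = screen_envelope(
    {"payTo": "0xAAA", "recipient": "0xBBB", "maxAmountRequired": "5", "amount": "5000"},
    ALLOWLIST,
    Decimal("10"),
)
negative = screen_envelope({"payTo": "0xAAA", "amount": "-1"}, ALLOWLIST, Decimal("10"))
```

Here `clean` comes back empty, while `split` is flagged on four counts (both alias conflicts, the unlisted recipient, and the worst-case amount), even though the first alias of each kind looks benign on its own.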
|
|
220
|
+
|
|
141
221
|
## Out of scope
|
|
142
222
|
|
|
143
223
|
Wardproof does **not** protect against the following. Treat these as the
|