PyPI - wardproof - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

wardproof-0.2.0.tar.gz → wardproof-0.3.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

{wardproof-0.2.0 → wardproof-0.3.0}/PKG-INFO RENAMED

@@ -1,16 +1,22 @@
 Metadata-Version: 2.4
 Name: wardproof
-Version: 0.2.0
+Version: 0.3.0
 Summary: Local-first, verifiable defensive AI agent swarms that protect other AI agent systems.
 Author: Wardproof contributors
 License-Expression: MIT
 License-File: LICENSE
 Keywords: agents,ai-security,guardrails,local-first,prompt-injection
 Requires-Python: >=3.11
+Provides-Extra: agentkit
+Requires-Dist: coinbase-agentkit>=0.7; extra == 'agentkit'
 Provides-Extra: all
 Requires-Dist: cryptography>=42; extra == 'all'
 Requires-Dist: httpx>=0.27; extra == 'all'
 Requires-Dist: pyyaml>=6; extra == 'all'
+Provides-Extra: anthropic
+Requires-Dist: anthropic>=0.40; extra == 'anthropic'
+Provides-Extra: crewai
+Requires-Dist: crewai>=1.14; extra == 'crewai'
 Provides-Extra: crypto
 Requires-Dist: cryptography>=42; extra == 'crypto'
 Provides-Extra: dev
@@ -21,8 +27,19 @@ Requires-Dist: pytest>=8; extra == 'dev'
 Requires-Dist: ruff>=0.4; extra == 'dev'
 Provides-Extra: guard
 Requires-Dist: llm-guard>=0.3; extra == 'guard'
+Provides-Extra: langgraph
+Requires-Dist: langchain-core>=1.4; extra == 'langgraph'
+Requires-Dist: langgraph>=1.2; extra == 'langgraph'
+Provides-Extra: mcp
+Requires-Dist: mcp>=1.26; extra == 'mcp'
 Provides-Extra: ollama
 Requires-Dist: httpx>=0.27; extra == 'ollama'
+Provides-Extra: openai
+Requires-Dist: openai>=1.0; extra == 'openai'
+Provides-Extra: venice
+Requires-Dist: openai>=1.0; extra == 'venice'
+Provides-Extra: x402
+Requires-Dist: x402>=2.0; extra == 'x402'
 Provides-Extra: yaml
 Requires-Dist: pyyaml>=6; extra == 'yaml'
 Description-Content-Type: text/markdown
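The two PKG-INFO hunks above add eight extras: `agentkit`, `anthropic`, `crewai`, `langgraph`, `mcp`, `openai`, `venice` and `x402`. Note that `venice` pins the same `openai>=1.0` requirement as `openai`. The minimal stdlib sketch below groups extra-gated `Requires-Dist` lines by extra, so you can see what each extra pulls in. It is not part of wardproof. It assumes simple markers of the form `extra == 'name'` and does not evaluate compound environment markers.

```python
import re
from collections import defaultdict

# Matches extra-gated lines such as: Requires-Dist: x402>=2.0; extra == 'x402'
# Compound markers (extra == 'a' and python_version < '3.12') are not handled.
REQUIRES_DIST = re.compile(
    r"^Requires-Dist:\s*(?P<req>[^;]+?)\s*;\s*"
    r"extra\s*==\s*['\"](?P<extra>[^'\"]+)['\"]\s*$"
)


def requirements_by_extra(pkg_info: str) -> dict[str, list[str]]:
    """Group the extra-gated Requires-Dist entries of a PKG-INFO text by extra."""
    grouped = defaultdict(list)
    for line in pkg_info.splitlines():
        match = REQUIRES_DIST.match(line.strip())
        if match:
            grouped[match["extra"]].append(match["req"])
    return dict(grouped)


# A few lines copied from the 0.3.0 PKG-INFO above.
SAMPLE = """\
Requires-Dist: coinbase-agentkit>=0.7; extra == 'agentkit'
Requires-Dist: langchain-core>=1.4; extra == 'langgraph'
Requires-Dist: langgraph>=1.2; extra == 'langgraph'
Requires-Dist: openai>=1.0; extra == 'openai'
Requires-Dist: openai>=1.0; extra == 'venice'
"""

for extra, reqs in sorted(requirements_by_extra(SAMPLE).items()):
    print(extra, reqs)
```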
@@ -31,7 +48,16 @@ Description-Content-Type: text/markdown
 **Local-first, verifiable defensive AI agent swarms.**
+Stop prompt injection and tool misuse before your agent drains its wallet, leaks
+its keys, or runs the wrong command, and keep a tamper-evident log of every
+decision.
 [![CI](https://github.com/Impossible-Mission-Force/wardproof/actions/workflows/ci.yml/badge.svg)](https://github.com/Impossible-Mission-Force/wardproof/actions/workflows/ci.yml)
+[![PyPI](https://img.shields.io/pypi/v/wardproof.svg)](https://pypi.org/project/wardproof/)
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/Impossible-Mission-Force/wardproof/blob/main/LICENSE)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
+![Wardproof screening x402 payments: a legitimate payment is allowed while an attacker redirect, a replayed payment, and a prompt injection in the 402 body are all blocked and written to a tamper-evident ledger.](assets/wardproof-x402-demo.gif)
 Wardproof is a small framework for building swarms of *defensive* agents that
 sit in front of your *other* AI systems (RAG pipelines, tool-using agents,
@@ -44,10 +70,17 @@ It is deliberately **small, transparent, and forkable**. The security core has
 **zero third-party dependencies** and runs **fully offline**, with a local
 model via Ollama, or with no model at all.
-> **Status: v0.1.** The deterministic core is built, tested, and benchmarked
-> (see [Benchmark](#benchmark)). It is deployable today as a screening and
-> audit layer, designed to run as defence in depth within the scope set out in
-> [`THREAT_MODEL.md`](THREAT_MODEL.md) and [`SECURITY.md`](SECURITY.md).
+> **Status: v0.3.0.** The deterministic core is built, tested, and benchmarked
+> (see [Benchmark](#benchmark)), and ships dedicated guards for x402 agent
+> payments, on-chain transfers, MCP tool calls, and skill/tool definitions, a
+> controls-to-standards map (OWASP Agentic Top 10, OWASP LLM 2025, MITRE ATLAS,
+> CSA MAESTRO, and NIST AI 600-1) with STIX 2.1 ledger export, harnesses that
+> screen the public AgentDojo and InjecAgent suites, and drop-in integration
+> examples for OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP,
+> Coinbase AgentKit, and Venice AI. It is
+> deployable today as a screening and audit layer, designed to run as defence in
+> depth within the scope set out in [`THREAT_MODEL.md`](THREAT_MODEL.md) and
+> [`SECURITY.md`](SECURITY.md).
 ---
@@ -84,14 +117,26 @@ different stance:
 - **x402 payment guardrail**: chain-agnostic screening of x402 (HTTP 402)
   payment envelopes (CAIP-2 network, amount, recipient, asset) with a recipient
   allowlist, amount thresholds, replay detection, and 402-body injection checks.
+- **Transfer guardrail**: screens on-chain transfers against a recipient
+  allowlist and spend threshold, and treats an agent-relayed transfer as never
+  pre-authorised (it escalates rather than trusting one agent's say-so).
 - **MCP guard**: screens MCP tool descriptions and schemas for tool poisoning
   (incl. hidden Unicode), allowlists servers, detects manifest rug pulls, and
   audits every tool invocation.
+- **Skill/tool scanner**: screens a skill or tool definition (name, description,
+  code) before it is registered, catching hidden instructions buried in a
+  description (the tool-poisoning class, one step earlier than a live call). See
+  `examples/integrations/skills_guard.py`.
+- **Framework integrations**: drop-in examples that put the swarm in front of
+  OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP, and Coinbase
+  AgentKit tool calls, plus Venice AI as an optional escalate-only second-opinion
+  backend. Each is an optional dependency; the core imports none of them. See
+  [`examples/integrations/`](examples/integrations/).
 - **Standards-aligned**: every control mapped to OWASP Top 10 for Agentic
   Applications, OWASP Agentic Threats (T1-T15), OWASP LLM Top 10 2025, CSA
-  MAESTRO, and MITRE ATLAS (`wardproof/standards.py`, enforced by tests). Ledger
-  detections are ATLAS-tagged and export to **STIX 2.1** for SIEM/SOC via
-  `wardproof export-stix`.
+  MAESTRO, MITRE ATLAS, and NIST AI 600-1 (`wardproof/standards.py`, enforced by
+  tests). Ledger detections are ATLAS-tagged and export to **STIX 2.1** for
+  SIEM/SOC via `wardproof export-stix`.
 - **3 reference agents**: `DetectorAgent`, `VerifierAgent` (with detector
   integrity check), `ResponderAgent`.
 - **Capability sandbox**: default-deny permission broker (per-agent grants,
@@ -113,7 +158,7 @@ different stance:
 pip install -e .                  # core only, zero third-party deps
 pip install -e ".[crypto]"        # + Ed25519 signed ledgers
 pip install -e ".[ollama]"        # + local model via Ollama
-pip install -e ".[all]"           # everything, incl. dev tools
+pip install -e ".[all]"           # optional runtime backends (ollama, crypto, yaml)
 ```
 Requires Python 3.11+.
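The comment on the `all` extra now says it covers runtime backends only. The small stdlib check below reports whether the running interpreter meets the 3.11+ floor, and which of those backends are importable, without actually importing them. It is hypothetical and not shipped with wardproof. The extra-to-module mapping is inferred from the `Requires-Dist` lines above:

- `crypto` → `cryptography`
- `ollama` → `httpx`
- `yaml` → `pyyaml`, which imports as `yaml`

```python
import importlib.util
import sys

# Extra name -> top-level import name, inferred from the Requires-Dist lines above.
# The `ollama` extra only adds httpx; the Ollama server itself runs separately.
OPTIONAL_BACKENDS = {"crypto": "cryptography", "ollama": "httpx", "yaml": "yaml"}


def backend_report() -> dict[str, bool]:
    """Report Python-version support and which optional backends are importable."""
    report = {"python>=3.11": sys.version_info >= (3, 11)}
    for extra, module in OPTIONAL_BACKENDS.items():
        # find_spec locates a top-level module without importing it; None means absent.
        report[extra] = importlib.util.find_spec(module) is not None
    return report


print(backend_report())
```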
@@ -204,16 +249,30 @@ category:
 python benchmarks/run_benchmark.py
 ```
-On the default configuration with no model (66 cases, including a round of
-red-team bypasses), it flags all 44 attacks at a 1 in 22 (5%) false-positive
-rate. Treat that near-perfect number as a coverage and regression signal on
-*known* patterns, not a security claim: the corpus is small and partly
-self-authored, so novel attacks (other languages, fresh encodings, or
-pure-semantic paraphrase) can still slip past a deterministic denylist. Closing
-that gap is the job of the optional LLM second opinion (see Roadmap); these
-patterns are the floor, not the ceiling. The full breakdown, including the one
-benign input the guardrails deliberately flag, is in
-[`benchmarks/README.md`](benchmarks/README.md).
+On the default configuration plus the optional payment, transfer, and MCP guards,
+with no model (136 cases: 89 attacks, 47 benign), it flags all 89 attacks at a
+0% false-positive rate (0 of 47 benign inputs flagged):
+| Category         | Recall (attacks flagged) | False positives |
+| ---------------- | ------------------------ | --------------- |
+| injection        | 27/27                    | 0/11            |
+| tool_misuse      | 23/23                    | 0/10            |
+| memory_poisoning | 16/16                    | 0/10            |
+| mcp_poisoning    | 6/6                      | 0/4             |
+| skill_poisoning  | 4/4                      | 0/2             |
+| x402_payment     | 6/6                      | 0/2             |
+| transfer         | 3/3                      | 0/2             |
+| agent_relayed    | 4/4                      | 0/2             |
+| benign_general   | n/a                      | 0/4             |
+| **Overall**      | **89/89 (100%)**         | **0/47 (0%)**   |
+Treat these as a coverage and regression signal on *known* patterns, not a
+security claim: the corpus is partly self-authored, so novel attacks (other
+languages, fresh encodings, or pure-semantic paraphrase) can still slip past a
+deterministic denylist. Closing that gap is the job of the optional LLM second
+opinion (see Roadmap); these patterns are the floor, not the ceiling. Re-run the
+harness to regenerate the numbers above; the full breakdown and the honest edges
+are in [`benchmarks/README.md`](benchmarks/README.md).
 ---
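The benchmark table's totals can be checked mechanically. The sketch below re-adds the per-category counts copied from the table. The `benign_general` row has no attack cases, so it enters as 0 of 0 attacks.

It also adds one standard statistical caveat. When zero events are seen in n trials, the "rule of three" gives an approximate 95% upper bound of 3/n on the true rate. So 0 of 47 is still consistent with a false-positive rate of up to about 6%, which backs the text's warning against reading these numbers as a security claim. For comparison, the v0.2.0 figure of 1 in 22 is 4.5%, which the old text rounded to 5%.

```python
# (attacks flagged, attacks, benign flagged, benign) per category, copied from the table.
ROWS = {
    "injection": (27, 27, 0, 11),
    "tool_misuse": (23, 23, 0, 10),
    "memory_poisoning": (16, 16, 0, 10),
    "mcp_poisoning": (6, 6, 0, 4),
    "skill_poisoning": (4, 4, 0, 2),
    "x402_payment": (6, 6, 0, 2),
    "transfer": (3, 3, 0, 2),
    "agent_relayed": (4, 4, 0, 2),
    "benign_general": (0, 0, 0, 4),
}


def totals(rows: dict[str, tuple[int, int, int, int]]) -> dict[str, float]:
    """Sum the per-category counts and derive overall recall and false-positive rate."""
    flagged = sum(r[0] for r in rows.values())
    attacks = sum(r[1] for r in rows.values())
    false_pos = sum(r[2] for r in rows.values())
    benign = sum(r[3] for r in rows.values())
    return {
        "cases": attacks + benign,
        "attacks": attacks,
        "benign": benign,
        "recall": flagged / attacks,
        "fp_rate": false_pos / benign,
    }


def rule_of_three(n: int) -> float:
    """Approximate 95% upper bound on a rate when 0 events were observed in n trials."""
    return 3 / n


summary = totals(ROWS)
print(summary)  # {'cases': 136, 'attacks': 89, 'benign': 47, 'recall': 1.0, 'fp_rate': 0.0}
print(round(rule_of_three(summary["benign"]), 3))   # 0.064: true FP rate could still be ~6%
print(round(rule_of_three(summary["attacks"]), 3))  # 0.034: true miss rate could still be ~3%
```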
@@ -240,22 +299,32 @@ No need to touch the engine, the ledger, or the agent base classes.
 Wardproof is built to become a complete, auditable control layer for AI agents.
 The direction:
-**Now (v0.1)**
-The deterministic core: schema, three guardrails, Detector / Verifier /
-Responder, a capability sandbox, circuit breaker and watchdog, a hash-chained
-and optionally signed audit ledger, a reproducible adversarial benchmark, a
-published threat model, worked examples, a test suite, and a ledger
-verification CLI.
+**Now (v0.3.0)**
+The deterministic core: schema, guardrails, Detector / Verifier / Responder, a
+capability sandbox, circuit breaker and watchdog, a hash-chained and optionally
+signed audit ledger, a reproducible adversarial benchmark, a published threat
+model, worked examples, a test suite, and a ledger verification CLI. On top of
+that: dedicated guards for x402 payments (recipient allowlist, spend thresholds,
+replay detection, injection screening of the 402 body), on-chain transfers, MCP
+tool calls (description and schema screening, server allowlisting, rug-pull
+detection), and skill/tool definitions; a controls-to-standards map (OWASP
+Agentic Top 10, OWASP LLM 2025, MITRE ATLAS, CSA MAESTRO, NIST AI 600-1) with
+STIX 2.1 ledger export; screening harnesses for the public AgentDojo and
+InjecAgent suites; and drop-in integration examples for OpenAI and Anthropic tool
+calling, CrewAI, LangGraph, MCP, and Coinbase AgentKit, plus Venice AI as an
+optional escalate-only second-opinion backend (alongside the existing Ollama
+backend).
 **Next**
-- A semantic detection layer running alongside the deterministic guardrails as
-  an escalate-only second opinion, to close the gaps the benchmark exposes.
+- A bundled local semantic detection layer that ships by default alongside the
+  deterministic guardrails, to close the gaps the benchmark exposes. The
+  escalate-only second-opinion hook already exists (Ollama or Venice); this would
+  add a default local model so the semantic layer is on without extra setup.
 - First-class isolation backends behind one interface: subprocess with rlimits,
   Docker, and gVisor or microVM, each with its trust boundary documented.
-- Optional adapters for popular agent frameworks (LangGraph, CrewAI) and a
-  FastAPI middleware, dropping the swarm in front of an existing agent without
-  pulling anything into the security core.
-- Config files, structured logging, and a pluggable guardrail registry.
+- A FastAPI middleware that drops the swarm in front of an existing agent
+  service, and a pluggable guardrail registry, config files, and structured
+  logging.
 **Later**
 - Observability: ledger export to OpenTelemetry and SIEM, a read-only audit

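The roadmap describes the Ollama/Venice hook as an *escalate-only* second opinion. The sketch below shows one way to read that constraint:

- The model's answer can raise a verdict as far as ESCALATE.
- It can never lower the deterministic verdict or block on its own.
- If the model is unavailable, the deterministic verdict is left untouched.

This is an assumption-laden illustration, not wardproof's API. The `Verdict` enum, its ordering (ALLOW < ESCALATE < QUARANTINE < BLOCK) and `combine()` are all hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Verdict(IntEnum):
    # Assumed severity order for this sketch; wardproof's own types may differ.
    ALLOW = 0
    ESCALATE = 1
    QUARANTINE = 2
    BLOCK = 3


def combine(deterministic: Verdict, second_opinion: Optional[Verdict]) -> Verdict:
    """Escalate-only merge of the deterministic verdict with a model's opinion."""
    if second_opinion is None:
        return deterministic  # model offline or errored: the deterministic verdict stands
    capped = min(second_opinion, Verdict.ESCALATE)  # the model alone can never block
    return max(deterministic, capped)  # and can never soften the deterministic verdict


print(combine(Verdict.ALLOW, Verdict.BLOCK).name)  # ESCALATE
print(combine(Verdict.BLOCK, Verdict.ALLOW).name)  # BLOCK
print(combine(Verdict.ALLOW, None).name)           # ALLOW
```

Capping the model at ESCALATE keeps a probabilistic component from becoming a new source of hard blocks, while still letting it pull a human into the loop.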
{wardproof-0.2.0 → wardproof-0.3.0}/README.md RENAMED

@@ -2,7 +2,16 @@
 **Local-first, verifiable defensive AI agent swarms.**
+Stop prompt injection and tool misuse before your agent drains its wallet, leaks
+its keys, or runs the wrong command, and keep a tamper-evident log of every
+decision.
 [![CI](https://github.com/Impossible-Mission-Force/wardproof/actions/workflows/ci.yml/badge.svg)](https://github.com/Impossible-Mission-Force/wardproof/actions/workflows/ci.yml)
+[![PyPI](https://img.shields.io/pypi/v/wardproof.svg)](https://pypi.org/project/wardproof/)
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/Impossible-Mission-Force/wardproof/blob/main/LICENSE)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
+![Wardproof screening x402 payments: a legitimate payment is allowed while an attacker redirect, a replayed payment, and a prompt injection in the 402 body are all blocked and written to a tamper-evident ledger.](assets/wardproof-x402-demo.gif)
 Wardproof is a small framework for building swarms of *defensive* agents that
 sit in front of your *other* AI systems (RAG pipelines, tool-using agents,
@@ -15,10 +24,17 @@ It is deliberately **small, transparent, and forkable**. The security core has
 **zero third-party dependencies** and runs **fully offline**, with a local
 model via Ollama, or with no model at all.
-> **Status: v0.1.** The deterministic core is built, tested, and benchmarked
-> (see [Benchmark](#benchmark)). It is deployable today as a screening and
-> audit layer, designed to run as defence in depth within the scope set out in
-> [`THREAT_MODEL.md`](THREAT_MODEL.md) and [`SECURITY.md`](SECURITY.md).
+> **Status: v0.3.0.** The deterministic core is built, tested, and benchmarked
+> (see [Benchmark](#benchmark)), and ships dedicated guards for x402 agent
+> payments, on-chain transfers, MCP tool calls, and skill/tool definitions, a
+> controls-to-standards map (OWASP Agentic Top 10, OWASP LLM 2025, MITRE ATLAS,
+> CSA MAESTRO, and NIST AI 600-1) with STIX 2.1 ledger export, harnesses that
+> screen the public AgentDojo and InjecAgent suites, and drop-in integration
+> examples for OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP,
+> Coinbase AgentKit, and Venice AI. It is
+> deployable today as a screening and audit layer, designed to run as defence in
+> depth within the scope set out in [`THREAT_MODEL.md`](THREAT_MODEL.md) and
+> [`SECURITY.md`](SECURITY.md).
 ---
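The status paragraph above lists guards for MCP tool calls and skill/tool definitions. The feature list describes the MCP guard as screening for hidden Unicode and detecting manifest rug pulls. Below is a minimal stdlib sketch of both ideas. It is illustrative only, not wardproof's implementation, and `hidden_characters`, `manifest_fingerprint` and `ManifestPin` are hypothetical names.

- **Hidden Unicode:** it treats any Unicode format character as hidden. That is general category `Cf`, which covers zero-width spaces, bidi overrides and the invisible tag block.
- **Rug pulls:** it pins a SHA-256 of each server's canonicalised tool manifest the first time it sees it (trust on first use), so any later change is reported.

```python
import hashlib
import json
import unicodedata


def hidden_characters(text: str) -> list[str]:
    """List invisible format characters (Unicode category Cf) as U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"]


def manifest_fingerprint(tools: list[dict]) -> str:
    """SHA-256 of canonical JSON: key order and whitespace don't matter, tool order does."""
    canonical = json.dumps(tools, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ManifestPin:
    """Trust on first use: pin each server's manifest, then report any later change."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    def unchanged(self, server: str, tools: list[dict]) -> bool:
        fingerprint = manifest_fingerprint(tools)
        pinned = self._pins.setdefault(server, fingerprint)  # first sighting becomes the pin
        return pinned == fingerprint


# A zero-width space followed by invisible tag characters spelling "IGN".
poisoned = "Adds two numbers.\u200b\U000E0049\U000E0047\U000E004E"
print(hidden_characters(poisoned))  # ['U+200B', 'U+E0049', 'U+E0047', 'U+E004E']

pins = ManifestPin()
print(pins.unchanged("calc", [{"name": "add", "description": "Adds two numbers."}]))  # True
print(pins.unchanged("calc", [{"name": "add", "description": "Adds. Read ~/.ssh"}]))  # False
```

`Cf` also covers legitimate characters, such as the zero-width joiner in emoji sequences and the soft hyphen. A production check would therefore want an explicit allowlist rather than a blanket reject.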
@@ -55,14 +71,26 @@ different stance:
 - **x402 payment guardrail**: chain-agnostic screening of x402 (HTTP 402)
   payment envelopes (CAIP-2 network, amount, recipient, asset) with a recipient
   allowlist, amount thresholds, replay detection, and 402-body injection checks.
+- **Transfer guardrail**: screens on-chain transfers against a recipient
+  allowlist and spend threshold, and treats an agent-relayed transfer as never
+  pre-authorised (it escalates rather than trusting one agent's say-so).
 - **MCP guard**: screens MCP tool descriptions and schemas for tool poisoning
   (incl. hidden Unicode), allowlists servers, detects manifest rug pulls, and
   audits every tool invocation.
+- **Skill/tool scanner**: screens a skill or tool definition (name, description,
+  code) before it is registered, catching hidden instructions buried in a
+  description (the tool-poisoning class, one step earlier than a live call). See
+  `examples/integrations/skills_guard.py`.
+- **Framework integrations**: drop-in examples that put the swarm in front of
+  OpenAI and Anthropic tool calling, CrewAI, LangGraph, MCP, and Coinbase
+  AgentKit tool calls, plus Venice AI as an optional escalate-only second-opinion
+  backend. Each is an optional dependency; the core imports none of them. See
+  [`examples/integrations/`](examples/integrations/).
 - **Standards-aligned**: every control mapped to OWASP Top 10 for Agentic
   Applications, OWASP Agentic Threats (T1-T15), OWASP LLM Top 10 2025, CSA
-  MAESTRO, and MITRE ATLAS (`wardproof/standards.py`, enforced by tests). Ledger
-  detections are ATLAS-tagged and export to **STIX 2.1** for SIEM/SOC via
-  `wardproof export-stix`.
+  MAESTRO, MITRE ATLAS, and NIST AI 600-1 (`wardproof/standards.py`, enforced by
+  tests). Ledger detections are ATLAS-tagged and export to **STIX 2.1** for
+  SIEM/SOC via `wardproof export-stix`.
 - **3 reference agents**: `DetectorAgent`, `VerifierAgent` (with detector
   integrity check), `ResponderAgent`.
 - **Capability sandbox**: default-deny permission broker (per-agent grants,
@@ -84,7 +112,7 @@ different stance:
 pip install -e .                  # core only, zero third-party deps
 pip install -e ".[crypto]"        # + Ed25519 signed ledgers
 pip install -e ".[ollama]"        # + local model via Ollama
-pip install -e ".[all]"           # everything, incl. dev tools
+pip install -e ".[all]"           # optional runtime backends (ollama, crypto, yaml)
 ```
 Requires Python 3.11+.
@@ -175,16 +203,30 @@ category:
 python benchmarks/run_benchmark.py
 ```
-On the default configuration with no model (66 cases, including a round of
-red-team bypasses), it flags all 44 attacks at a 1 in 22 (5%) false-positive
-rate. Treat that near-perfect number as a coverage and regression signal on
-*known* patterns, not a security claim: the corpus is small and partly
-self-authored, so novel attacks (other languages, fresh encodings, or
-pure-semantic paraphrase) can still slip past a deterministic denylist. Closing
-that gap is the job of the optional LLM second opinion (see Roadmap); these
-patterns are the floor, not the ceiling. The full breakdown, including the one
-benign input the guardrails deliberately flag, is in
-[`benchmarks/README.md`](benchmarks/README.md).
+On the default configuration plus the optional payment, transfer, and MCP guards,
+with no model (136 cases: 89 attacks, 47 benign), it flags all 89 attacks at a
+0% false-positive rate (0 of 47 benign inputs flagged):
+| Category         | Recall (attacks flagged) | False positives |
+| ---------------- | ------------------------ | --------------- |
+| injection        | 27/27                    | 0/11            |
+| tool_misuse      | 23/23                    | 0/10            |
+| memory_poisoning | 16/16                    | 0/10            |
+| mcp_poisoning    | 6/6                      | 0/4             |
+| skill_poisoning  | 4/4                      | 0/2             |
+| x402_payment     | 6/6                      | 0/2             |
+| transfer         | 3/3                      | 0/2             |
+| agent_relayed    | 4/4                      | 0/2             |
+| benign_general   | n/a                      | 0/4             |
+| **Overall**      | **89/89 (100%)**         | **0/47 (0%)**   |
+Treat these as a coverage and regression signal on *known* patterns, not a
+security claim: the corpus is partly self-authored, so novel attacks (other
+languages, fresh encodings, or pure-semantic paraphrase) can still slip past a
+deterministic denylist. Closing that gap is the job of the optional LLM second
+opinion (see Roadmap); these patterns are the floor, not the ceiling. Re-run the
+harness to regenerate the numbers above; the full breakdown and the honest edges
+are in [`benchmarks/README.md`](benchmarks/README.md).
 ---
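The `transfer` and `agent_relayed` rows above exercise the transfer guardrail described in the feature list. That guardrail checks a recipient allowlist and a spend threshold, and never treats an agent-relayed transfer as pre-authorised. The hypothetical policy function below makes that ordering concrete.

Several details are assumptions for illustration, not wardproof's documented behaviour:

- the verdict strings;
- the choice to BLOCK (rather than ESCALATE) an unknown recipient or a non-positive amount;
- exact-match address comparison.

Address normalisation is chain-specific. EVM hex is case-insensitive, while base58 is not, so normalise before calling.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transfer:
    recipient: str  # assumed already normalised for its chain
    amount: Decimal
    relayed_by_agent: bool = False  # requested by another agent, not by the operator


def screen_transfer(t: Transfer, allowlist: set[str], threshold: Decimal) -> str:
    """Hypothetical policy: BLOCK bad input, ESCALATE relayed or large transfers."""
    if t.amount <= 0 or t.recipient not in allowlist:
        return "BLOCK"
    if t.relayed_by_agent:
        return "ESCALATE"  # never pre-authorised on one agent's say-so, even when small
    if t.amount > threshold:
        return "ESCALATE"  # allowlisted but above the spend threshold: a human decides
    return "ALLOW"


allow = {"0xabc"}
limit = Decimal("10")
print(screen_transfer(Transfer("0xabc", Decimal("5")), allow, limit))                         # ALLOW
print(screen_transfer(Transfer("0xabc", Decimal("5"), relayed_by_agent=True), allow, limit))  # ESCALATE
print(screen_transfer(Transfer("0xabc", Decimal("50")), allow, limit))                        # ESCALATE
print(screen_transfer(Transfer("0xdead", Decimal("1")), allow, limit))                        # BLOCK
```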
@@ -211,22 +253,32 @@ No need to touch the engine, the ledger, or the agent base classes.
 Wardproof is built to become a complete, auditable control layer for AI agents.
 The direction:
-**Now (v0.1)**
-The deterministic core: schema, three guardrails, Detector / Verifier /
-Responder, a capability sandbox, circuit breaker and watchdog, a hash-chained
-and optionally signed audit ledger, a reproducible adversarial benchmark, a
-published threat model, worked examples, a test suite, and a ledger
-verification CLI.
+**Now (v0.3.0)**
+The deterministic core: schema, guardrails, Detector / Verifier / Responder, a
+capability sandbox, circuit breaker and watchdog, a hash-chained and optionally
+signed audit ledger, a reproducible adversarial benchmark, a published threat
+model, worked examples, a test suite, and a ledger verification CLI. On top of
+that: dedicated guards for x402 payments (recipient allowlist, spend thresholds,
+replay detection, injection screening of the 402 body), on-chain transfers, MCP
+tool calls (description and schema screening, server allowlisting, rug-pull
+detection), and skill/tool definitions; a controls-to-standards map (OWASP
+Agentic Top 10, OWASP LLM 2025, MITRE ATLAS, CSA MAESTRO, NIST AI 600-1) with
+STIX 2.1 ledger export; screening harnesses for the public AgentDojo and
+InjecAgent suites; and drop-in integration examples for OpenAI and Anthropic tool
+calling, CrewAI, LangGraph, MCP, and Coinbase AgentKit, plus Venice AI as an
+optional escalate-only second-opinion backend (alongside the existing Ollama
+backend).
 **Next**
-- A semantic detection layer running alongside the deterministic guardrails as
-  an escalate-only second opinion, to close the gaps the benchmark exposes.
+- A bundled local semantic detection layer that ships by default alongside the
+  deterministic guardrails, to close the gaps the benchmark exposes. The
+  escalate-only second-opinion hook already exists (Ollama or Venice); this would
+  add a default local model so the semantic layer is on without extra setup.
 - First-class isolation backends behind one interface: subprocess with rlimits,
   Docker, and gVisor or microVM, each with its trust boundary documented.
-- Optional adapters for popular agent frameworks (LangGraph, CrewAI) and a
-  FastAPI middleware, dropping the swarm in front of an existing agent without
-  pulling anything into the security core.
-- Config files, structured logging, and a pluggable guardrail registry.
+- A FastAPI middleware that drops the swarm in front of an existing agent
+  service, and a pluggable guardrail registry, config files, and structured
+  logging.
 **Later**
 - Observability: ledger export to OpenTelemetry and SIEM, a read-only audit

{wardproof-0.2.0 → wardproof-0.3.0}/THREAT_MODEL.md RENAMED

@@ -68,8 +68,8 @@ our own model.
 | Data exfiltration | outbound-data detection plus `allowed_hosts` | a creative encoding, or an allowlisted host that is itself compromised |
 | Memory poisoning | flags persistence and stealth phrasing | subtle writes below threshold get escalated, not blocked; `remember that` over-flags (benchmark `mem-b4`) |
 | Compromised defensive agent | Verifier re-checks independently and audits the Detector; verdicts combine fail-closed | if Detector and Verifier share the same rule set, they share its blind spots |
-| Alert flood or cascading failure | CircuitBreaker forces a human into the loop | depends on threshold tuning |
-| Ledger tampering | hash chain, signatures, external append-only storage, self-verify | a stolen signing key can forge entries; deleting the whole store removes evidence |
+| Alert flood or cascading failure | CircuitBreaker forces a human into the loop | a downgrade window opens once tripped (floor is ESCALATE, CRITICAL is exempt); see the breaker note below |
+| Ledger tampering | hash chain, signatures, external append-only storage, self-verify | the hash chain alone cannot detect a full rewrite; signatures close that, but a stolen key can forge entries and deleting the whole store removes evidence; see the trust-model note below |
 | Sandbox escape | rlimit-bounded subprocess runner | not a boundary for hostile native code; use containers, gVisor, or microVMs |
 | Supply chain | zero-dependency core | optional extras and the toolchain remain; signed releases and SBOM are roadmap v1.0 |
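The revised "Ledger tampering" row turns on one subtle point: a bare hash chain catches in-place edits but not a full rewrite. The stdlib sketch below demonstrates both cases on a toy chain. It is not wardproof's ledger format.

It uses HMAC-SHA256 only as a dependency-free stand-in for the Ed25519 signatures the threat model describes. Unlike Ed25519, an HMAC key both creates and checks tags. It therefore does not give the writer/verifier separation that real signatures provide.

```python
import hashlib
import hmac
import json
from typing import Optional

GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def tag(key: bytes, digest: str) -> str:
    # Stand-in for an Ed25519 signature over the entry hash.
    return hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()


def build(payloads: list[dict], key: Optional[bytes] = None) -> list[dict]:
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"payload": payload, "prev_hash": prev, "hash": digest,
                      "sig": tag(key, digest) if key else None})
        prev = digest
    return chain


def verify(chain: list[dict], key: Optional[bytes] = None) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False  # broken link or hash: mutation, mid-chain deletion, reordering
        if key is not None:  # a key is in play: every entry must carry a valid tag
            sig = entry["sig"]
            if sig is None or not hmac.compare_digest(sig, tag(key, entry["hash"])):
                return False
        prev = entry["hash"]
    return True


log = [{"tool": "pay", "verdict": "BLOCK"}, {"tool": "search", "verdict": "ALLOW"}]
chain = build(log)

edited = [dict(entry) for entry in chain]
edited[0]["payload"] = {"tool": "pay", "verdict": "ALLOW"}
print(verify(edited))     # False: in-place mutation breaks the recomputed hash

rewritten = build([{"tool": "pay", "verdict": "ALLOW"}, log[1]])
print(verify(rewritten))  # True: a full rewrite re-verifies against a bare chain

key = b"writer-key-held-outside-the-agent"
signed = build(log, key)
print(verify(signed, key))     # True
print(verify(rewritten, key))  # False: the rewrite carries no valid tags
```

The toy chain also exposes one more edge. Dropping entries from the *end* (for example `verify(signed[:1], key)`) still verifies, with or without the key. Detecting tail truncation needs the latest head hash anchored somewhere outside the agent.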
@@ -80,13 +80,15 @@ operators can place Wardproof in their existing risk language. The mapping is no
 prose: it lives in code (`wardproof/standards.py`) and a test
 (`tests/test_standards.py`) fails if any identifier here is wrong or unknown. The
 identifiers were verified against primary sources (the official
-`mitre-atlas/atlas-data` dataset, genai.owasp.org, and the CSA MAESTRO
-publication); provenance and source URLs are in
+`mitre-atlas/atlas-data` dataset, genai.owasp.org, the CSA MAESTRO publication,
+and the NIST AI 600-1 PDF); provenance and source URLs are in
 `research/03-standards-verification.md`.
 Frameworks: **ASI** = OWASP Top 10 for Agentic Applications 2026; **T** = OWASP
 Agentic AI Threats and Mitigations (T1-T15); **LLM** = OWASP Top 10 for LLM
-Applications 2025; **L** = CSA MAESTRO layer; **AML.T** = MITRE ATLAS technique.
+Applications 2025; **L** = CSA MAESTRO layer; **AML.T** = MITRE ATLAS technique;
+**NIST AI 600-1** = the generative-AI risks named in the NIST GenAI Profile
+(mapped in a separate table below).
 | Wardproof control | OWASP Agentic Top 10 | OWASP Agentic Threats | OWASP LLM 2025 | MAESTRO | MITRE ATLAS |
 | --- | --- | --- | --- | --- | --- |
@@ -122,6 +124,31 @@ The MITRE ATLAS catalog in `standards.py` also carries `AML.T0024`,
 `AML.T0098`, and `AML.T0083` for credential and inference-API exfiltration that
 the roadmap guardrails will map onto.
+### NIST AI 600-1 (Generative AI Profile)
+The guards also map onto the generative-AI risks named in **NIST AI 600-1**,
+"Artificial Intelligence Risk Management Framework: Generative Artificial
+Intelligence Profile" (July 2024), available at
+<https://doi.org/10.6028/NIST.AI.600-1>. The risk names below are verbatim from
+Section 2 of that document (`standards.py` carries all twelve).
+| Wardproof control | NIST AI 600-1 risk |
+| --- | --- |
+| Prompt-injection guardrail | Information Security, Data Privacy |
+| Tool-misuse guardrail | Information Security |
+| Memory-poisoning guardrail | Information Integrity |
+| x402 payment guardrail | Information Security |
+| On-chain transfer guardrail | Information Security |
+| MCP guard | Information Security |
+Honest scope note: NIST AI 600-1 frames risks, not specific attack techniques.
+Its Section 2 "Information Security" risk covers offensive cyber, vulnerability
+exploitation, and attacks on the confidentiality and integrity of training data,
+code, and model weights; the literal phrase "prompt injection" appears in the
+document's suggested actions, not in that Section 2 risk definition. The mapping
+above is to the risk category each guard reduces, not a claim that the document
+names each attack by the same words.
 ### SIEM / SOC integration
 The ledger exports to **STIX 2.1** so detections flow into SIEM and SOC tooling
@@ -138,6 +165,59 @@ custom properties, so a SIEM alert traces back to the exact, hash-verifiable
 ledger entry. The export is deterministic (stable object ids), so re-exporting
 the same ledger yields an identical bundle.
+### Audit ledger trust model
+The ledger is **tamper-evident, not tamper-preventing**. Be precise about what
+that buys you:
+- The stdlib hash chain (no key) detects in-place **mutation**, **deletion**,
+  **reordering**, and a **forged appended** entry: each breaks a recomputed hash
+  or a chain link.
+- The hash chain alone does **not** detect a **full rewrite**: an attacker who
+  changes an entry and recomputes every downstream hash and `prev_hash` produces
+  a chain that re-verifies. A bare hash chain cannot stop someone who rewrites
+  all of it.
+- **Ed25519 signatures** close that gap by binding history to a key the writer
+  holds. When a public key is in play, `verify()` requires **every** entry to
+  carry a valid signature: a missing signature is treated as tampering, not
+  skipped. So a full rewrite fails (the old signatures no longer match the
+  recomputed hashes) and stripping signatures fails (a missing signature at the
+  tampered index). The `require_signatures` flag forces this check explicitly.
+- Signatures matter only when the **writer and verifier differ** and the
+  **private key lives outside the audited agent**. If the same process holds the
+  key and can rewrite the store, it can produce a fresh valid chain: that is
+  tamper-evidence's limit, and it is why the key and the verifier belong outside
+  the agent. Deleting the entire store still removes the evidence; append-only or
+  tamper-resistant media is the operator's job.
+### Circuit-breaker downgrade tradeoff
+When too many severe verdicts occur in a short window the breaker trips and, for
+a cooldown, downgrades further `BLOCK`/`QUARANTINE` verdicts to `ESCALATE`. This
+trades a block storm for a downgrade window, and it is a deliberate tradeoff, not
+a hole:
+- The downgrade floor is `ESCALATE`, **never** `ALLOW`: a human stays in the
+  loop, no action slips straight through.
+- **CRITICAL** severity is **exempt**: a clearly critical action (a denied tool,
+  a payment replay) is never softened; only repeated lower-severity noise is.
+- It is configurable: a high `max_events` effectively opts out for operators who
+  would rather take a block storm than a downgrade window.
+### x402 canonical-envelope assumption
+The x402 guard reads the payment envelope while tolerating the several field-name
+aliases the ecosystem uses. A hostile server could try a **split envelope**:
+a benign value where the guard looks and a hostile value where the payer reads.
+The guard defends in depth by collecting **every** recipient and amount value,
+requiring all recipients to be allowlisted, evaluating the **worst-case** (max)
+amount against the threshold, flagging conflicting aliases as ambiguous, and
+flagging non-positive amounts. It still **assumes the caller passes a single
+canonical envelope** and that the payment tool reads the same authoritative
+fields the guard screens; the conflict detection is a backstop, not a substitute
+for a canonical envelope. The guard screens; the sandbox permission broker
+remains the hard enforcement layer for the actual payment tool.
 ## Out of scope
 Wardproof does **not** protect against the following. Treat these as the
