agent-threat-rules 2.0.12 → 2.0.14
- package/README.md +10 -7
- package/package.json +1 -1
- package/rules/agent-manipulation/ATR-2026-00030-cross-agent-attack.yaml +51 -0
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +4 -0
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +4 -0
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +4 -0
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +4 -0
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +4 -0
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +4 -0
- package/spec/compliance-metadata.md +125 -0
package/README.md
CHANGED
@@ -15,6 +15,7 @@ AI Agent threat detection rules -- open source, community-driven
 [](#what-atr-detects)
 [](#ecosystem)
 [](#evaluation)
+[](#evaluation)
 [](#ecosystem-scan)
 [](#standards-coverage)
 
@@ -72,6 +73,7 @@ Key finding: at least 3 coordinated threat actors mass-published poisoned skills
 
 | Benchmark | Samples | Recall | Precision | FP Rate |
 |-----------|---------|--------|-----------|---------|
+| **NVIDIA garak (in-the-wild jailbreaks)** | **666** | **97.1%** | 100% | 0% |
 | SKILL.md (498 labeled samples) | 498 | **100%** | **97%** | **0.20%** |
 | PINT (Invariant Labs, adversarial) | 850 | -- | 99.6% | 62.7% |
 | Wild scan (96K real-world) | 96,096 | -- | -- | 1.35% flag rate |
@@ -114,12 +116,13 @@ One line. Zero config. SARIF results in your Security tab.
 
 | Category | What it catches | Rules | Real CVEs |
 |----------|----------------|-------|-----------|
-| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads, CJK attacks,
-| **
-| **
-| **
-| **
-| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access |
+| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads (base-N, ROT, Unicode tags, sneaky-bits, zalgo, ecoji, base2048), CJK attacks, latent injection, glitch tokens, DRA parenthesis reconstruction, leakreplay MASK | 108 | CVE-2025-53773, CVE-2025-32711 |
+| **Agent Manipulation** | DAN family (DAN / DUDE / STAN / AntiDAN / RANTI / DevMode), AutoDAN, DanInTheWild, tense framing, grandma roleplay, goodside threat-JSON, doctor XML puppetry, cross-agent attacks, goal hijacking, Sybil consensus | 99 | -- |
+| **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos, HuggingFace unsafe artifacts | 37 | CVE-2025-59536, CVE-2026-28363 |
+| **Context Exfiltration** | API key generation/completion, system prompt theft, credential harvesting, env variable exfil, markdown-URL data exfil, XSS in tool response | 26 | CVE-2026-24307 |
+| **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions, ANSI escape elicitation | 16 | CVE-2025-68143/68144/68145 |
+| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access, shell escape | 9 | CVE-2026-0628 |
+| **Model Abuse** | Malware code generation (malwaregen), EICAR/GTUBE signatures, AV-evasion gen | 8 | -- |
 | **Excessive Autonomy** | Runaway loops, resource exhaustion, unauthorized financial actions | 5 | -- |
 | **Model Security** | Behavior extraction, malicious fine-tuning data | 2 | -- |
 | **Data Poisoning** | RAG/knowledge base tampering, memory manipulation | 1 | -- |
@@ -137,7 +140,7 @@ We test ATR with our own tests, external benchmarks, AND real-world wild scannin
 | **SKILL.md benchmark** | **498 labeled samples** | **498** | **97.0%** | **100%** |
 | **96K wild scan** | **OpenClaw + Skills.sh + Hermes + ClawHub** | **96,096** | **--** | **--** |
 | **PINT (adversarial)** | **Invariant Labs** | **850** | **99.6%** | **62.7%** |
-| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** |
+| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | 100% | **97.1%** |
 | Self-test (own test cases) | Internal | 361 | 100% | 88.5% |
 
 ```bash
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "agent-threat-rules",
-  "version": "2.0.
+  "version": "2.0.14",
   "type": "module",
   "description": "Open detection standard -- like Sigma, but for AI agents. 311 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense. 97.1% recall on NVIDIA garak.",
   "main": "./dist/index.js",
package/rules/agent-manipulation/ATR-2026-00030-cross-agent-attack.yaml
CHANGED
@@ -32,6 +32,57 @@ references:
     - "AML.T0043 - Craft Adversarial Data"
     - "AML.T0052.000 - Spearphishing via Social Engineering LLM"
 
+# Audit-grade compliance mapping — see spec/compliance-metadata.md
+compliance:
+  owasp_agentic:
+    - id: "ASI01:2026"
+      context: "Detects agent goal hijack when an attacker spoofs a peer agent's identity to inject manipulative instructions into inter-agent messages."
+      strength: primary
+    - id: "ASI07:2026"
+      context: "Directly addresses insecure inter-agent communication by flagging forged system-level message tags and manipulated message format conventions."
+      strength: primary
+    - id: "ASI10:2026"
+      context: "Surfaces rogue agents attempting orchestrator bypass or fake status-message injection against trusted peers."
+      strength: secondary
+  owasp_llm:
+    - id: "LLM01:2025"
+      context: "Inter-agent prompt injection is a prompt-injection vector operating at the agent-to-agent boundary rather than the user-to-agent boundary."
+      strength: primary
+    - id: "LLM06:2025"
+      context: "Excessive agency is what an attacker exploits when cross-agent spoofing succeeds — the target agent takes actions it would not otherwise take."
+      strength: secondary
+  eu_ai_act:
+    - article: 12
+      clause: "Automatic logging for high-risk AI systems"
+      context: "Every cross-agent attack detection emits a log record with actor agent, target agent, timestamp, payload hash, and rule match — satisfying Article 12's continuous logging requirement."
+      strength: primary
+    - article: 14
+      clause: "Human oversight"
+      context: "Critical-severity detections route to the respond-agent human-in-the-loop flow before allowing the attacker-controlled inter-agent call to complete."
+      strength: primary
+    - article: 15
+      clause: "Accuracy, robustness, and cybersecurity"
+      context: "Cross-agent attack detection is a cybersecurity control contributing to the Article 15 robustness requirement for high-risk AI systems."
+      strength: secondary
+  nist_ai_rmf:
+    - function: "Manage"
+      subcategory: "MG.2.3"
+      context: "Contributes detection evidence to incident response playbooks; each match is an incident candidate for the respond-agent pipeline."
+      strength: primary
+    - function: "Govern"
+      subcategory: "GV.1.1"
+      context: "Logs support organizational AI governance by recording every inter-agent communication risk event for quarterly governance review."
+      strength: secondary
+  iso_42001:
+    - clause: "6.2"
+      clause_name: "AI risk treatment"
+      context: "Cross-agent impersonation is an identified AI threat vector; this rule is the operational risk-treatment control."
+      strength: primary
+    - clause: "9.1"
+      clause_name: "Monitoring, measurement, analysis, evaluation"
+      context: "Confidence-scored detection events feed the AIMS monitoring and measurement requirements."
+      strength: secondary
+
 tags:
   category: agent-manipulation
   subcategory: cross-agent-attack
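A downstream consumer would probably want the `compliance:` block above as one flat record per mapping. The sketch below assumes the YAML has already been parsed into a plain dict (no parser is shown), copies only a few entries from the hunk with their `context` strings left out, and uses a hypothetical helper named `flatten_compliance` that is not part of ATR.

```python
# A few entries from the hunk above, as an already-parsed dict.
# `context` strings are omitted to keep the example short.
COMPLIANCE = {
    "owasp_agentic": [
        {"id": "ASI01:2026", "strength": "primary"},
        {"id": "ASI07:2026", "strength": "primary"},
        {"id": "ASI10:2026", "strength": "secondary"},
    ],
    "eu_ai_act": [
        {"article": 12, "strength": "primary"},
        {"article": 15, "strength": "secondary"},
    ],
    "nist_ai_rmf": [
        {"function": "Manage", "subcategory": "MG.2.3", "strength": "primary"},
    ],
}

# The field that identifies an entry differs per framework.
ID_FIELDS = {
    "owasp_agentic": "id",
    "owasp_llm": "id",
    "eu_ai_act": "article",
    "colorado_ai_act": "section",
    "nist_ai_rmf": "subcategory",
    "iso_42001": "clause",
}


def flatten_compliance(block):
    """Hypothetical helper: return (framework, identifier, strength) tuples."""
    rows = []
    for framework, entries in block.items():
        key = ID_FIELDS[framework]
        for entry in entries:
            rows.append((framework, str(entry[key]), entry["strength"]))
    return rows


# Keep only the mappings the rule claims as a main control.
primary = [row for row in flatten_compliance(COMPLIANCE) if row[2] == "primary"]
```

Stringifying the identifier keeps integer article numbers and string IDs in one column, which is a convenience of this sketch and not something the rule file requires.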
package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml
CHANGED
@@ -41,6 +41,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: misleading.FalseAssertion
+  garak_detector: always.Fail
+  payload_source: garak/probes/misleading.py
 detection:
   conditions:
     - field: tool_response
package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml
CHANGED
@@ -39,6 +39,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
+  garak_detector: always.Fail
+  payload_source: garak/probes/divergence.py
 detection:
   conditions:
     - field: tool_response
package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml
CHANGED
@@ -36,6 +36,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: "av_spam_scanning.EICAR, av_spam_scanning.GTUBE"
+  garak_detector: always.Fail
+  payload_source: garak/probes/av_spam_scanning.py
 detection:
   conditions:
     - field: tool_response
package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml
CHANGED
@@ -37,6 +37,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: promptinject.HijackHateHumans
+  garak_detector: always.Fail
+  payload_source: garak/probes/promptinject.py (Perez & Ribeiro NeurIPS 2022)
 detection:
   conditions:
     - field: tool_response
package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml
CHANGED
@@ -36,6 +36,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: "packagehallucination.Python, packagehallucination.JavaScript"
+  garak_detector: always.Fail
+  payload_source: garak/probes/packagehallucination.py
 detection:
   conditions:
     - field: tool_response
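Several of the `garak_probe` values in the hunks above pack more than one probe into a single comma-separated string. The diff does not document that format, so the following is only a sketch of how a consumer might split such a value, assuming commas separate probes and the first dot separates module from class. `split_probes` is a hypothetical name.

```python
def split_probes(value):
    """Hypothetical helper: split a `garak_probe` string into (module, class) pairs.

    Assumes probes are comma-separated and written as `module.ClassName`.
    """
    pairs = []
    for item in value.split(","):
        name = item.strip()
        if not name:
            continue  # tolerate a trailing comma
        module, _, class_name = name.partition(".")
        pairs.append((module, class_name))
    return pairs


probes = split_probes(
    "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
)
```

A YAML list would make this parsing unnecessary; the string form is kept here only because that is what the rules ship.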
package/spec/compliance-metadata.md
ADDED
@@ -0,0 +1,125 @@
+# ATR Rule Compliance Metadata Schema
+
+**Status:** Draft v0.1 · Proposed 2026-04-22
+**Scope:** Every `rules/**/*.yaml` may optionally include a top-level `compliance:` block that maps the rule to controls / articles / clauses in published AI compliance frameworks.
+
+## Why
+
+ATR rules already include `references:` pointing to OWASP LLM / OWASP Agentic Top 10 / MITRE ATLAS. That is an academic-citation block useful for researchers.
+
+`compliance:` is a separate, audit-grade block whose purpose is different: an enterprise customer's GRC team must be able to take a detection event, trace it back to a specific rule ID, and show an auditor that the rule addresses a specific **published article or control** of:
+
+1. EU AI Act (Regulation 2024/1689) — Articles 9-15, 50, 72, Annex III
+2. Colorado AI Act SB24-205 — enforced 2026-06-30
+3. NIST AI RMF 1.0 — Govern / Map / Measure / Manage functions + subcategories
+4. ISO/IEC 42001:2023 — clauses 6-10 (AIMS)
+5. OWASP Agentic Top 10 (2026) — ASI01..ASI10
+6. OWASP LLM Top 10 (2025) — LLM01..LLM10
+
+The `references:` block is not sufficient because:
+- It does not distinguish "we studied this paper" from "this rule enforces this specific regulatory control."
+- It has no structure for "what clause" vs "what context this rule addresses within that clause."
+- It cannot carry the prose an auditor needs to accept the mapping.
+
+## Schema
+
+```yaml
+compliance:
+  # One key per framework the rule maps to. Omit frameworks that do not apply.
+  owasp_agentic:
+    - id: "ASI01:2026" # Required. Canonical category ID.
+      context: "..." # Required. One-sentence prose explaining *how*
+      # this rule addresses the category. Auditor-
+      # readable; no jargon-only text.
+      strength: primary # Optional. primary | secondary | partial.
+
+  owasp_llm:
+    - id: "LLM01:2025"
+      context: "..."
+      strength: primary
+
+  eu_ai_act:
+    - article: 12 # Required. Article number (integer).
+      clause: "Automatic logging for high-risk AI systems" # Required. Short name.
+      context: "..." # Required. How this rule satisfies the clause.
+      strength: primary
+
+  colorado_ai_act:
+    - section: "SB24-205.5" # Required. Section identifier.
+      clause: "High-risk disclosure"
+      context: "..."
+      strength: primary
+
+  nist_ai_rmf:
+    - function: "Manage" # Required. Govern | Map | Measure | Manage.
+      subcategory: "MG.2.3" # Required. Full subcategory ID.
+      context: "..."
+      strength: primary
+
+  iso_42001:
+    - clause: "6.2" # Required. AIMS clause (e.g. 6.2, 9.1).
+      clause_name: "Risk treatment" # Required. Human-readable name.
+      context: "..."
+      strength: primary
+```
+
+### Field reference
+
+| Field | Type | Required | Notes |
+|---|---|---|---|
+| `id` / `article` / `section` / `function`+`subcategory` / `clause` | string/int | yes | Framework-specific canonical identifier. Must match the published framework exactly. |
+| `clause` (EU/Colorado/ISO) | string | yes | Short human name for the clause. Helps report readers. |
+| `context` | string | yes | One sentence, auditor-readable, explaining *why* the rule addresses this control. Not a copy of the clause text. |
+| `strength` | enum | no | `primary` (rule is a main control for this clause), `secondary` (supports it), `partial` (covers part of it). Defaults to `primary` if omitted. |
+
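The field reference above amounts to a small per-entry check: required keys must be present, and `strength` falls back to `primary` when omitted. A minimal sketch follows, assuming entries arrive as plain dicts. `REQUIRED` is read off the schema example, and `check_entry` is a hypothetical name, not an ATR function.

```python
STRENGTHS = {"primary", "secondary", "partial"}

# Required keys per framework, as read from the schema example above.
REQUIRED = {
    "owasp_agentic": ("id", "context"),
    "owasp_llm": ("id", "context"),
    "eu_ai_act": ("article", "clause", "context"),
    "colorado_ai_act": ("section", "clause", "context"),
    "nist_ai_rmf": ("function", "subcategory", "context"),
    "iso_42001": ("clause", "clause_name", "context"),
}


def check_entry(framework, entry):
    """Hypothetical helper: return (normalized_entry, errors) for one mapping."""
    errors = [f"missing {key}" for key in REQUIRED[framework] if key not in entry]
    normalized = dict(entry)
    # The spec says an omitted strength defaults to `primary`.
    normalized.setdefault("strength", "primary")
    if normalized["strength"] not in STRENGTHS:
        errors.append(f"bad strength: {normalized['strength']}")
    return normalized, errors


ok_entry, ok_errors = check_entry(
    "eu_ai_act",
    {"article": 14, "clause": "Human oversight", "context": "..."},
)
```

Returning a normalized copy instead of mutating the input keeps the rule file's own data untouched, which matters if the same dict is later written back out.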
+### Multiplicity
+
+A rule MAY map to multiple items within the same framework (a rule that logs event AND enforces policy touches both Article 12 and Article 14 of the EU AI Act). List each separately.
+
+A rule MAY map to zero frameworks (e.g., an experimental research rule). Omit the `compliance:` block entirely in that case — do not include an empty one.
+
+### Deprecation
+
+When a framework publishes a new version, both old and new keys MAY coexist during a transition window (e.g., both `owasp_llm` 2023 and 2025 items), clearly distinguished by the `id` version suffix.
+
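During such a transition window a consumer would need to tell the versions apart by the suffix after the colon. The sketch below assumes every versioned `id` has the form `CATEGORY:YEAR`, as in `LLM01:2025`. The 2023 identifier is made up for illustration, and both function names are hypothetical.

```python
def split_versioned_id(identifier):
    """Hypothetical helper: split 'LLM01:2025' into ('LLM01', 2025)."""
    category, sep, year = identifier.partition(":")
    if not sep or not year.isdigit():
        raise ValueError(f"no version suffix in {identifier!r}")
    return category, int(year)


def group_by_version(entries):
    """Group entry ids by framework version year, preserving input order."""
    groups = {}
    for entry in entries:
        _, year = split_versioned_id(entry["id"])
        groups.setdefault(year, []).append(entry["id"])
    return groups


# "LLM01:2023" is an illustrative id, not taken from any rule in this diff.
mixed = [{"id": "LLM01:2023"}, {"id": "LLM01:2025"}, {"id": "LLM06:2025"}]
by_year = group_by_version(mixed)
```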
+## Relationship to `references:`
+
+The existing `references:` block is preserved unchanged. `references:` is for academic / research citations (MITRE ATLAS technique IDs, papers, blog posts). `compliance:` is for regulatory audit evidence.
+
+A rule can have entries in both blocks — e.g., `references.mitre_atlas` AND `compliance.nist_ai_rmf` — and often will.
+
+## Validation
+
+- `scripts/validate-compliance.mjs` (to be added) validates every `compliance:` block against a per-framework allowlist of valid IDs / articles / subcategories / clauses. Rules with invalid entries fail CI.
+- The allowlists live in `data/compliance-frameworks/*.json` — one file per framework — and are updated via PR when a framework publishes revisions.
+
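The validator named above does not exist yet, so the following only illustrates the allowlist idea in Python, not the planned `.mjs` script. It assumes each allowlist is a flat set of valid identifiers (the real JSON layout is not specified) and fills them with the article numbers and ASI range listed earlier in the spec. `validate_block` is a hypothetical name.

```python
# Assumed allowlist shape: one flat collection of valid identifiers per framework.
# In the proposed layout these would be loaded from data/compliance-frameworks/*.json.
ALLOWLISTS = {
    "owasp_agentic": {f"ASI{n:02d}:2026" for n in range(1, 11)},
    "eu_ai_act": {9, 10, 11, 12, 13, 14, 15, 50, 72},
}

# Which field carries the identifier to check, per framework.
ID_FIELD = {"owasp_agentic": "id", "eu_ai_act": "article"}


def validate_block(rule_id, compliance):
    """Hypothetical check: return a list of error strings (empty means pass)."""
    errors = []
    for framework, entries in compliance.items():
        if framework not in ALLOWLISTS:
            errors.append(f"{rule_id}: unknown framework {framework}")
            continue
        for entry in entries:
            value = entry.get(ID_FIELD[framework])
            if value not in ALLOWLISTS[framework]:
                errors.append(f"{rule_id}: {framework} has invalid entry {value!r}")
    return errors


result = validate_block(
    "ATR-2026-00030",
    {"eu_ai_act": [{"article": 12}], "owasp_agentic": [{"id": "ASI07:2026"}]},
)
```

A CI wrapper would exit non-zero whenever the returned list is non-empty, which matches the "rules with invalid entries fail CI" behaviour described above.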
+## Downstream consumers
+
+The primary consumer is **PanGuard Enterprise's AI Compliance Audit Evidence Module**, which generates quarterly reports mapping detection events (via rule IDs) to auditor-grade framework evidence. Other downstream consumers may include:
+
+- ATR-compatible scanners that want to tag each detection with its regulatory context
+- GRC platforms (Vanta, Drata, etc.) that integrate ATR rule packs
+- Independent auditors verifying AI-system compliance claims
+
+All downstream consumers are welcome — the `compliance:` block is MIT-licensed alongside the rules.
+
+## Out of scope for this spec
+
+- How a scanner renders compliance data in its UI
+- How a GRC platform surfaces this in a customer's audit trail
+- The legal interpretation of any framework clause — this spec provides the mapping data; auditors and counsel interpret it
+
+## Open questions
+
+1. Should `strength` be required (forcing every mapping to declare its strength)? Argument for: signals rigour. Argument against: extra authoring friction for common `primary` case. **Current answer: optional, default `primary`.**
+2. Should framework-specific metadata (e.g., EU AI Act Annex III categories) live alongside article mappings? **Current answer: yes, under a nested `annex:` key within the article object if needed.**
+3. How to handle frameworks that don't exist yet but are expected (e.g., Japan AI Safety Act 2027)? **Current answer: add keys as frameworks publish; no speculative schema for unpublished frameworks.**
+
+## Roll-out plan
+
+1. 2026-04-22: this spec document merged
+2. 2026-04-W4: 10 sample rules carry `compliance:` block for OWASP Agentic + OWASP LLM (bootstrap from existing `references:` data)
+3. 2026-05: 50 rules extended across all 6 frameworks (LLM-assisted authoring + human QA)
+4. 2026-Q2-end: all 311 rules mapped across at least the 3 most-requested frameworks (EU AI Act, NIST AI RMF, OWASP Agentic)
+5. 2026-Q3: remaining frameworks (Colorado, ISO 42001, OWASP LLM) complete
+6. Ongoing: new ATR rules MUST include `compliance:` from day 1 (enforced by contribution checklist)