agent-threat-rules 2.0.12 → 2.0.13
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +10 -7
- package/package.json +1 -1
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +4 -0
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +4 -0
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +4 -0
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +4 -0
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +4 -0
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +4 -0
package/README.md
CHANGED
@@ -15,6 +15,7 @@ AI Agent threat detection rules -- open source, community-driven
 [](#what-atr-detects)
 [](#ecosystem)
 [](#evaluation)
+[](#evaluation)
 [](#ecosystem-scan)
 [](#standards-coverage)
 
@@ -72,6 +73,7 @@ Key finding: at least 3 coordinated threat actors mass-published poisoned skills
 
 | Benchmark | Samples | Recall | Precision | FP Rate |
 |-----------|---------|--------|-----------|---------|
+| **NVIDIA garak (in-the-wild jailbreaks)** | **666** | **97.1%** | 100% | 0% |
 | SKILL.md (498 labeled samples) | 498 | **100%** | **97%** | **0.20%** |
 | PINT (Invariant Labs, adversarial) | 850 | -- | 99.6% | 62.7% |
 | Wild scan (96K real-world) | 96,096 | -- | -- | 1.35% flag rate |
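The recall and precision figures in the garak row can be reproduced from raw counts. The sketch below is illustrative only: the README reports percentages, not counts, so `detected = 647` is an assumption (it is the only whole number of 666 samples that rounds to 97.1%). It also assumes that all 666 samples are attacks and that nothing was flagged incorrectly. The function names are hypothetical and are not part of the package.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual attacks that were flagged."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged samples that were actual attacks."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


SAMPLES = 666        # sample count taken from the garak row
detected = 647       # assumption: inferred from the reported 97.1% recall
missed = SAMPLES - detected
false_positives = 0  # assumption: consistent with the reported 100% precision

print(f"recall    = {recall(detected, missed):.1%}")
print(f"precision = {precision(detected, false_positives):.1%}")
```

Under these assumptions, 19 of the 666 samples go undetected, which is what separates 97.1% recall from 100%.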
@@ -114,12 +116,13 @@ One line. Zero config. SARIF results in your Security tab.
 
 | Category | What it catches | Rules | Real CVEs |
 |----------|----------------|-------|-----------|
-| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads, CJK attacks,
-| **
-| **
-| **
-| **
-| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access |
+| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads (base-N, ROT, Unicode tags, sneaky-bits, zalgo, ecoji, base2048), CJK attacks, latent injection, glitch tokens, DRA parenthesis reconstruction, leakreplay MASK | 108 | CVE-2025-53773, CVE-2025-32711 |
+| **Agent Manipulation** | DAN family (DAN / DUDE / STAN / AntiDAN / RANTI / DevMode), AutoDAN, DanInTheWild, tense framing, grandma roleplay, goodside threat-JSON, doctor XML puppetry, cross-agent attacks, goal hijacking, Sybil consensus | 99 | -- |
+| **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos, HuggingFace unsafe artifacts | 37 | CVE-2025-59536, CVE-2026-28363 |
+| **Context Exfiltration** | API key generation/completion, system prompt theft, credential harvesting, env variable exfil, markdown-URL data exfil, XSS in tool response | 26 | CVE-2026-24307 |
+| **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions, ANSI escape elicitation | 16 | CVE-2025-68143/68144/68145 |
+| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access, shell escape | 9 | CVE-2026-0628 |
+| **Model Abuse** | Malware code generation (malwaregen), EICAR/GTUBE signatures, AV-evasion gen | 8 | -- |
 | **Excessive Autonomy** | Runaway loops, resource exhaustion, unauthorized financial actions | 5 | -- |
 | **Model Security** | Behavior extraction, malicious fine-tuning data | 2 | -- |
 | **Data Poisoning** | RAG/knowledge base tampering, memory manipulation | 1 | -- |
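As a quick consistency check, the Rules column in the new version of this table can be summed and compared with the "311 rules" figure quoted in the package.json description. This is a minimal sketch that hard-codes the ten rows visible in this hunk; it assumes the table has no further rows outside the hunk, and the variable names are illustrative.

```python
# Rule counts per category, copied from the post-change rows of the hunk above.
RULES_PER_CATEGORY = {
    "Prompt Injection": 108,
    "Agent Manipulation": 99,
    "Skill Compromise": 37,
    "Context Exfiltration": 26,
    "Tool Poisoning": 16,
    "Privilege Escalation": 9,
    "Model Abuse": 8,
    "Excessive Autonomy": 5,
    "Model Security": 2,
    "Data Poisoning": 1,
}

ADVERTISED_TOTAL = 311  # figure quoted in the package.json description

total_rules = sum(RULES_PER_CATEGORY.values())
print(f"{total_rules} rules across {len(RULES_PER_CATEGORY)} categories")
print("matches description:", total_rules == ADVERTISED_TOTAL)
```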
@@ -137,7 +140,7 @@ We test ATR with our own tests, external benchmarks, AND real-world wild scannin
 | **SKILL.md benchmark** | **498 labeled samples** | **498** | **97.0%** | **100%** |
 | **96K wild scan** | **OpenClaw + Skills.sh + Hermes + ClawHub** | **96,096** | **--** | **--** |
 | **PINT (adversarial)** | **Invariant Labs** | **850** | **99.6%** | **62.7%** |
-| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** |
+| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | 100% | **97.1%** |
 | Self-test (own test cases) | Internal | 361 | 100% | 88.5% |
 
 ```bash
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "agent-threat-rules",
-"version": "2.0.
+"version": "2.0.13",
 "type": "module",
 "description": "Open detection standard -- like Sigma, but for AI agents. 311 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense. 97.1% recall on NVIDIA garak.",
 "main": "./dist/index.js",
package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml
CHANGED
@@ -41,6 +41,10 @@ agent_source:
 type: mcp_exchange
 framework: [any]
 provider: [any]
+metadata_provenance:
+garak_probe: misleading.FalseAssertion
+garak_detector: always.Fail
+payload_source: garak/probes/misleading.py
 detection:
 conditions:
 - field: tool_response
package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml
CHANGED
@@ -39,6 +39,10 @@ agent_source:
 type: mcp_exchange
 framework: [any]
 provider: [any]
+metadata_provenance:
+garak_probe: "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
+garak_detector: always.Fail
+payload_source: garak/probes/divergence.py
 detection:
 conditions:
 - field: tool_response
package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml
CHANGED
@@ -36,6 +36,10 @@ agent_source:
 type: mcp_exchange
 framework: [any]
 provider: [any]
+metadata_provenance:
+garak_probe: "av_spam_scanning.EICAR, av_spam_scanning.GTUBE"
+garak_detector: always.Fail
+payload_source: garak/probes/av_spam_scanning.py
 detection:
 conditions:
 - field: tool_response
package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml
CHANGED
@@ -37,6 +37,10 @@ agent_source:
 type: mcp_exchange
 framework: [any]
 provider: [any]
+metadata_provenance:
+garak_probe: promptinject.HijackHateHumans
+garak_detector: always.Fail
+payload_source: garak/probes/promptinject.py (Perez & Ribeiro NeurIPS 2022)
 detection:
 conditions:
 - field: tool_response
package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml
CHANGED
@@ -36,6 +36,10 @@ agent_source:
 type: mcp_exchange
 framework: [any]
 provider: [any]
+metadata_provenance:
+garak_probe: "packagehallucination.Python, packagehallucination.JavaScript"
+garak_detector: always.Fail
+payload_source: garak/probes/packagehallucination.py
 detection:
 conditions:
 - field: tool_response
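Each `metadata_provenance` block added in this release names one or more garak probes as a comma-separated string and points `payload_source` at a file under `garak/probes/`. The sketch below shows one way a reviewer might check that the two fields agree. It is illustrative only: the helper names are hypothetical, the entries are copied by hand from the hunks above rather than parsed from the YAML files, and the convention that a probe's module name equals its source file name is an assumption drawn from these five examples.

```python
def split_probes(garak_probe: str) -> list[str]:
    """Split a comma-separated garak_probe value into individual probe names."""
    return [name.strip() for name in garak_probe.split(",") if name.strip()]


def probe_modules(garak_probe: str) -> set[str]:
    """Return the module part ("divergence" in "divergence.Repeat") of each probe."""
    return {name.split(".", 1)[0] for name in split_probes(garak_probe)}


def source_matches(entry: dict) -> bool:
    """True when every listed probe lives in the module named by payload_source."""
    # payload_source may carry a trailing citation, so compare only the path token.
    path = entry["payload_source"].split()[0]
    module = path.removeprefix("garak/probes/").removesuffix(".py")
    return probe_modules(entry["garak_probe"]) == {module}


# Values copied from the added lines in this diff.
ENTRIES = [
    {"garak_probe": "misleading.FalseAssertion",
     "payload_source": "garak/probes/misleading.py"},
    {"garak_probe": "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken",
     "payload_source": "garak/probes/divergence.py"},
    {"garak_probe": "av_spam_scanning.EICAR, av_spam_scanning.GTUBE",
     "payload_source": "garak/probes/av_spam_scanning.py"},
    {"garak_probe": "promptinject.HijackHateHumans",
     "payload_source": "garak/probes/promptinject.py (Perez & Ribeiro NeurIPS 2022)"},
    {"garak_probe": "packagehallucination.Python, packagehallucination.JavaScript",
     "payload_source": "garak/probes/packagehallucination.py"},
]

for entry in ENTRIES:
    status = "ok" if source_matches(entry) else "MISMATCH"
    print(f"{status}: {split_probes(entry['garak_probe'])}")
```

Comparing only the first whitespace-delimited token of `payload_source` keeps the check tolerant of the parenthesized citation on the promptinject entry.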