agent-threat-rules 2.0.11 → 2.0.13
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +19 -15
- package/package.json +2 -2
- package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml +4 -0
- package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml +4 -0
- package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml +4 -0
- package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml +4 -0
- package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml +4 -0
- package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml +4 -0
package/README.md
CHANGED
|
@@ -12,9 +12,10 @@ AI Agent 威脅偵測規則 -- 開源、社群驅動
|
|
|
12
12
|
[](https://pypi.org/project/pyatr/)
|
|
13
13
|
[](https://github.com/marketplace/actions/atr-scan)
|
|
14
14
|
[](LICENSE)
|
|
15
|
-
[](#what-atr-detects)
|
|
16
16
|
[](#ecosystem)
|
|
17
17
|
[](#evaluation)
|
|
18
|
+
[](#evaluation)
|
|
18
19
|
[](#ecosystem-scan)
|
|
19
20
|
[](#standards-coverage)
|
|
20
21
|
|
|
@@ -72,6 +73,7 @@ Key finding: at least 3 coordinated threat actors mass-published poisoned skills
|
|
|
72
73
|
|
|
73
74
|
| Benchmark | Samples | Recall | Precision | FP Rate |
|
|
74
75
|
|-----------|---------|--------|-----------|---------|
|
|
76
|
+
| **NVIDIA garak (in-the-wild jailbreaks)** | **666** | **97.1%** | 100% | 0% |
|
|
75
77
|
| SKILL.md (498 labeled samples) | 498 | **100%** | **97%** | **0.20%** |
|
|
76
78
|
| PINT (Invariant Labs, adversarial) | 850 | -- | 99.6% | 62.7% |
|
|
77
79
|
| Wild scan (96K real-world) | 96,096 | -- | -- | 1.35% flag rate |
|
|
@@ -84,7 +86,7 @@ npm install -g agent-threat-rules
|
|
|
84
86
|
atr scan skill.md # scan a SKILL.md for threats
|
|
85
87
|
atr scan mcp-config.json # scan MCP events for threats
|
|
86
88
|
atr scan skill.md --sarif # output SARIF v2.1.0 for GitHub Security tab
|
|
87
|
-
atr convert generic-regex # export
|
|
89
|
+
atr convert generic-regex # export 311 rules as JSON (1,600+ regex patterns)
|
|
88
90
|
atr convert splunk # export to Splunk SPL
|
|
89
91
|
atr convert elastic # export to Elasticsearch Query DSL
|
|
90
92
|
atr stats # show rule collection stats
|
|
@@ -110,16 +112,17 @@ One line. Zero config. SARIF results in your Security tab.
|
|
|
110
112
|
|
|
111
113
|
## What ATR Detects
|
|
112
114
|
|
|
113
|
-
|
|
115
|
+
311 rules across 9 categories, mapped to real CVEs:
|
|
114
116
|
|
|
115
117
|
| Category | What it catches | Rules | Real CVEs |
|
|
116
118
|
|----------|----------------|-------|-----------|
|
|
117
|
-
| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads, CJK attacks,
|
|
118
|
-
| **
|
|
119
|
-
| **
|
|
120
|
-
| **
|
|
121
|
-
| **
|
|
122
|
-
| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access |
|
|
119
|
+
| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads (base-N, ROT, Unicode tags, sneaky-bits, zalgo, ecoji, base2048), CJK attacks, latent injection, glitch tokens, DRA parenthesis reconstruction, leakreplay MASK | 108 | CVE-2025-53773, CVE-2025-32711 |
|
|
120
|
+
| **Agent Manipulation** | DAN family (DAN / DUDE / STAN / AntiDAN / RANTI / DevMode), AutoDAN, DanInTheWild, tense framing, grandma roleplay, goodside threat-JSON, doctor XML puppetry, cross-agent attacks, goal hijacking, Sybil consensus | 99 | -- |
|
|
121
|
+
| **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos, HuggingFace unsafe artifacts | 37 | CVE-2025-59536, CVE-2026-28363 |
|
|
122
|
+
| **Context Exfiltration** | API key generation/completion, system prompt theft, credential harvesting, env variable exfil, markdown-URL data exfil, XSS in tool response | 26 | CVE-2026-24307 |
|
|
123
|
+
| **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions, ANSI escape elicitation | 16 | CVE-2025-68143/68144/68145 |
|
|
124
|
+
| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access, shell escape | 9 | CVE-2026-0628 |
|
|
125
|
+
| **Model Abuse** | Malware code generation (malwaregen), EICAR/GTUBE signatures, AV-evasion gen | 8 | -- |
|
|
123
126
|
| **Excessive Autonomy** | Runaway loops, resource exhaustion, unauthorized financial actions | 5 | -- |
|
|
124
127
|
| **Model Security** | Behavior extraction, malicious fine-tuning data | 2 | -- |
|
|
125
128
|
| **Data Poisoning** | RAG/knowledge base tampering, memory manipulation | 1 | -- |
|
|
@@ -137,7 +140,7 @@ We test ATR with our own tests, external benchmarks, AND real-world wild scannin
|
|
|
137
140
|
| **SKILL.md benchmark** | **498 labeled samples** | **498** | **97.0%** | **100%** |
|
|
138
141
|
| **96K wild scan** | **OpenClaw + Skills.sh + Hermes + ClawHub** | **96,096** | **--** | **--** |
|
|
139
142
|
| **PINT (adversarial)** | **Invariant Labs** | **850** | **99.6%** | **62.7%** |
|
|
140
|
-
| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** |
|
|
143
|
+
| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | 100% | **97.1%** |
|
|
141
144
|
| Self-test (own test cases) | Internal | 361 | 100% | 88.5% |
|
|
142
145
|
|
|
143
146
|
```bash
|
|
@@ -259,7 +262,7 @@ Every rule is a YAML file answering: **what** to detect, **how** to detect it, *
|
|
|
259
262
|
### Export rules
|
|
260
263
|
|
|
261
264
|
```bash
|
|
262
|
-
# For your security platform (
|
|
265
|
+
# For your security platform (311 rules, 1,600+ regex patterns as JSON)
|
|
263
266
|
atr convert generic-regex --output atr-rules.json
|
|
264
267
|
|
|
265
268
|
# For SIEM integration
|
|
@@ -307,7 +310,7 @@ Want to integrate ATR into your product? Three options:
|
|
|
307
310
|
```bash
|
|
308
311
|
# Option 1: Export rules as JSON (recommended for most tools)
|
|
309
312
|
atr convert generic-regex --output atr-rules.json
|
|
310
|
-
# →
|
|
313
|
+
# → 311 rules, 1,600+ regex patterns, severity/category metadata
|
|
311
314
|
|
|
312
315
|
# Option 2: Use the TypeScript engine directly
|
|
313
316
|
npm install agent-threat-rules
|
|
@@ -358,7 +361,8 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. See [CONTRIBUTION-GUI
|
|
|
358
361
|
- [x] **v0.4** -- 71 rules, ClawHub 36K scan, SAFE-MCP 91.8%
|
|
359
362
|
- [x] **v1.0** -- 108 rules, 53K mega scan, GitHub Action + SARIF, generic-regex export, Cisco adoption
|
|
360
363
|
- [x] **v1.1** -- Threat Cloud flywheel, 5 ecosystem merges, Microsoft AGT + NVIDIA Garak PRs
|
|
361
|
-
- [x] **v2.0.0**
|
|
364
|
+
- [x] **v2.0.0** -- 113 rules, 96K mega scan, 751 malware discovered, RFC-001, GOVERNANCE.md, website launch
|
|
365
|
+
- [x] **v2.0.11** (current) -- 311 rules, 193 new NVIDIA garak probe coverage (ATR-00300~00414), 97.1% garak recall
|
|
362
366
|
- [ ] **v2.1** -- Go engine, ML classifier integration, semantic signatures, community rule submissions
|
|
363
367
|
- [ ] **v3.0** -- Multi-engine standard: 2+ engines, 10+ production deployments, schema review by 3+ security teams
|
|
364
368
|
|
|
@@ -366,7 +370,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. See [CONTRIBUTION-GUI
|
|
|
366
370
|
|
|
367
371
|
| Phase | Goal | Status |
|
|
368
372
|
|-------|------|--------|
|
|
369
|
-
| **Phase 0: Core product** |
|
|
373
|
+
| **Phase 0: Core product** | 311 rules, 97.1% garak recall, OWASP 10/10, 96K scan | **Done** |
|
|
370
374
|
| **Phase 1: Distribution** | GitHub Action, SARIF, generic-regex export, ecosystem PRs | **Done** |
|
|
371
375
|
| **Phase 2: Adoption** | Cisco merged (34 rules), OWASP PR, 11 ecosystem PRs | **In progress** |
|
|
372
376
|
| **Phase 3: Community flywheel** | Threat Cloud crystallization, auto-generated rules, 10+ contributors | In progress |
|
|
@@ -381,7 +385,7 @@ ATR uses "ATR Scanned" (not "ATR Certified") until recall exceeds 80%. We are ho
|
|
|
381
385
|
```
|
|
382
386
|
ATR (this repo) Your Product / Integration
|
|
383
387
|
┌─────────────────────────┐ ┌──────────────────────────┐
|
|
384
|
-
│
|
|
388
|
+
│ 311 Rules (YAML) │ match │ Block / Allow / Alert │
|
|
385
389
|
│ Engine (TS + Py) │ ────────→ │ SIEM (Splunk / Elastic) │
|
|
386
390
|
│ CLI / MCP / GitHub Act. │ results │ CI/CD (SARIF → Security) │
|
|
387
391
|
│ SARIF / Generic Regex │ │ Runtime Proxy (MCP) │
|
package/package.json
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "agent-threat-rules",
|
|
3
|
-
"version": "2.0.
|
|
3
|
+
"version": "2.0.13",
|
|
4
4
|
"type": "module",
|
|
5
|
-
"description": "Open detection standard for AI
|
|
5
|
+
"description": "Open detection standard -- like Sigma, but for AI agents. 311 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense. 97.1% recall on NVIDIA garak.",
|
|
6
6
|
"main": "./dist/index.js",
|
|
7
7
|
"types": "./dist/index.d.ts",
|
|
8
8
|
"bin": {
|
|
@@ -41,6 +41,10 @@ agent_source:
|
|
|
41
41
|
type: mcp_exchange
|
|
42
42
|
framework: [any]
|
|
43
43
|
provider: [any]
|
|
44
|
+
metadata_provenance:
|
|
45
|
+
garak_probe: misleading.FalseAssertion
|
|
46
|
+
garak_detector: always.Fail
|
|
47
|
+
payload_source: garak/probes/misleading.py
|
|
44
48
|
detection:
|
|
45
49
|
conditions:
|
|
46
50
|
- field: tool_response
|
package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml
CHANGED
|
@@ -39,6 +39,10 @@ agent_source:
|
|
|
39
39
|
type: mcp_exchange
|
|
40
40
|
framework: [any]
|
|
41
41
|
provider: [any]
|
|
42
|
+
metadata_provenance:
|
|
43
|
+
garak_probe: "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
|
|
44
|
+
garak_detector: always.Fail
|
|
45
|
+
payload_source: garak/probes/divergence.py
|
|
42
46
|
detection:
|
|
43
47
|
conditions:
|
|
44
48
|
- field: tool_response
|
|
@@ -36,6 +36,10 @@ agent_source:
|
|
|
36
36
|
type: mcp_exchange
|
|
37
37
|
framework: [any]
|
|
38
38
|
provider: [any]
|
|
39
|
+
metadata_provenance:
|
|
40
|
+
garak_probe: "av_spam_scanning.EICAR, av_spam_scanning.GTUBE"
|
|
41
|
+
garak_detector: always.Fail
|
|
42
|
+
payload_source: garak/probes/av_spam_scanning.py
|
|
39
43
|
detection:
|
|
40
44
|
conditions:
|
|
41
45
|
- field: tool_response
|
|
@@ -37,6 +37,10 @@ agent_source:
|
|
|
37
37
|
type: mcp_exchange
|
|
38
38
|
framework: [any]
|
|
39
39
|
provider: [any]
|
|
40
|
+
metadata_provenance:
|
|
41
|
+
garak_probe: promptinject.HijackHateHumans
|
|
42
|
+
garak_detector: always.Fail
|
|
43
|
+
payload_source: garak/probes/promptinject.py (Perez & Ribeiro NeurIPS 2022)
|
|
40
44
|
detection:
|
|
41
45
|
conditions:
|
|
42
46
|
- field: tool_response
|
|
@@ -36,6 +36,10 @@ agent_source:
|
|
|
36
36
|
type: mcp_exchange
|
|
37
37
|
framework: [any]
|
|
38
38
|
provider: [any]
|
|
39
|
+
metadata_provenance:
|
|
40
|
+
garak_probe: "packagehallucination.Python, packagehallucination.JavaScript"
|
|
41
|
+
garak_detector: always.Fail
|
|
42
|
+
payload_source: garak/probes/packagehallucination.py
|
|
39
43
|
detection:
|
|
40
44
|
conditions:
|
|
41
45
|
- field: tool_response
|