agent-threat-rules 2.0.11 → 2.0.13

package/README.md CHANGED
@@ -12,9 +12,10 @@ AI Agent threat detection rules -- open source, community-driven
  [![PyPI](https://img.shields.io/pypi/v/pyatr?style=flat-square&color=brightgreen&label=PyPI)](https://pypi.org/project/pyatr/)
  [![GitHub Marketplace](https://img.shields.io/badge/Marketplace-ATR%20Scan-2ea44f?style=flat-square&logo=github)](https://github.com/marketplace/actions/atr-scan)
  [![License](https://img.shields.io/badge/license-MIT-brightgreen?style=flat-square)](LICENSE)
- [![Rules](https://img.shields.io/badge/rules-113-blue?style=flat-square)](#what-atr-detects)
+ [![Rules](https://img.shields.io/badge/rules-311-blue?style=flat-square)](#what-atr-detects)
  [![Tests](https://img.shields.io/badge/tests-361_passing-green?style=flat-square)](#ecosystem)
  [![SKILL.md Recall](https://img.shields.io/badge/SKILL.md_recall-100%25-brightgreen?style=flat-square)](#evaluation)
+ [![Garak Recall](https://img.shields.io/badge/garak_recall-97.1%25-brightgreen?style=flat-square)](#evaluation)
  [![Wild Scan](https://img.shields.io/badge/wild_scan-96%2C096_skills-blue?style=flat-square)](#ecosystem-scan)
  [![OWASP](https://img.shields.io/badge/OWASP_Agentic_Top_10-10%2F10-brightgreen?style=flat-square)](#standards-coverage)
 
@@ -72,6 +73,7 @@ Key finding: at least 3 coordinated threat actors mass-published poisoned skills
 
  | Benchmark | Samples | Recall | Precision | FP Rate |
  |-----------|---------|--------|-----------|---------|
+ | **NVIDIA garak (in-the-wild jailbreaks)** | **666** | **97.1%** | 100% | 0% |
  | SKILL.md (498 labeled samples) | 498 | **100%** | **97%** | **0.20%** |
  | PINT (Invariant Labs, adversarial) | 850 | -- | 99.6% | 62.7% |
  | Wild scan (96K real-world) | 96,096 | -- | -- | 1.35% flag rate |
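The benchmark rows in this hunk report recall, precision, and false-positive rate. As a rough sanity check (not part of the package), the sketch below recomputes those metrics from confusion counts and searches for the detected-sample count that is consistent with the reported 97.1% recall on 666 garak samples. It assumes every one of the 666 samples is an attack prompt; the helper names `recall`, `precision`, and `fp_rate` are illustrative.

```python
def recall(tp: int, fn: int) -> float:
    """Share of real attacks that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Share of flagged samples that were real attacks."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def fp_rate(fp: int, tn: int) -> float:
    """Share of benign samples that were wrongly flagged."""
    return fp / (fp + tn) if (fp + tn) else 0.0


GARAK_SAMPLES = 666          # sample count taken from the README table
REPORTED_RECALL_PCT = 97.1   # recall taken from the README table

# Assumption: all samples are positives, so misses = samples - detections.
consistent_tp = [
    tp
    for tp in range(GARAK_SAMPLES + 1)
    if round(100 * recall(tp, GARAK_SAMPLES - tp), 1) == REPORTED_RECALL_PCT
]
print(consistent_tp)  # [647]
```

Under that assumption, 647 detections and 19 misses is the only split that rounds to 97.1%.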
@@ -84,7 +86,7 @@ npm install -g agent-threat-rules
  atr scan skill.md # scan a SKILL.md for threats
  atr scan mcp-config.json # scan MCP events for threats
  atr scan skill.md --sarif # output SARIF v2.1.0 for GitHub Security tab
- atr convert generic-regex # export 113 rules as JSON (714+ regex patterns)
+ atr convert generic-regex # export 311 rules as JSON (1,600+ regex patterns)
  atr convert splunk # export to Splunk SPL
  atr convert elastic # export to Elasticsearch Query DSL
  atr stats # show rule collection stats
@@ -110,16 +112,17 @@ One line. Zero config. SARIF results in your Security tab.
 
  ## What ATR Detects
 
- 113 rules across 9 categories, mapped to real CVEs:
+ 311 rules across 9 categories, mapped to real CVEs:
 
  | Category | What it catches | Rules | Real CVEs |
  |----------|----------------|-------|-----------|
- | **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads, CJK attacks, hidden override instructions | 33 | CVE-2025-53773, CVE-2025-32711 |
- | **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos | 23 | CVE-2025-59536, CVE-2026-28363 |
- | **Context Exfiltration** | API key leakage, system prompt theft, credential harvesting, env variable exfiltration | 14 | CVE-2026-24307 |
- | **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions | 12 | CVE-2025-68143/68144/68145 |
- | **Agent Manipulation** | Cross-agent attacks, goal hijacking, Sybil consensus attacks, scope hijacking | 12 | -- |
- | **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access | 8 | CVE-2026-0628 |
+ | **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads (base-N, ROT, Unicode tags, sneaky-bits, zalgo, ecoji, base2048), CJK attacks, latent injection, glitch tokens, DRA parenthesis reconstruction, leakreplay MASK | 108 | CVE-2025-53773, CVE-2025-32711 |
+ | **Agent Manipulation** | DAN family (DAN / DUDE / STAN / AntiDAN / RANTI / DevMode), AutoDAN, DanInTheWild, tense framing, grandma roleplay, goodside threat-JSON, doctor XML puppetry, cross-agent attacks, goal hijacking, Sybil consensus | 99 | -- |
+ | **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos, HuggingFace unsafe artifacts | 37 | CVE-2025-59536, CVE-2026-28363 |
+ | **Context Exfiltration** | API key generation/completion, system prompt theft, credential harvesting, env variable exfil, markdown-URL data exfil, XSS in tool response | 26 | CVE-2026-24307 |
+ | **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions, ANSI escape elicitation | 16 | CVE-2025-68143/68144/68145 |
+ | **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access, shell escape | 9 | CVE-2026-0628 |
+ | **Model Abuse** | Malware code generation (malwaregen), EICAR/GTUBE signatures, AV-evasion gen | 8 | -- |
  | **Excessive Autonomy** | Runaway loops, resource exhaustion, unauthorized financial actions | 5 | -- |
  | **Model Security** | Behavior extraction, malicious fine-tuning data | 2 | -- |
  | **Data Poisoning** | RAG/knowledge base tampering, memory manipulation | 1 | -- |
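The per-category rule counts in this hunk can be cross-checked against the headline figure of 311. The sketch below copies the Rules column from the new and unchanged table rows (the dictionary name `NEW_RULE_COUNTS` is illustrative) and sums it. Note that the updated table lists ten category rows, while the added summary line still says "9 categories".

```python
# Rules column as it appears in the 2.0.13 side of the hunk.
NEW_RULE_COUNTS = {
    "Prompt Injection": 108,
    "Agent Manipulation": 99,
    "Skill Compromise": 37,
    "Context Exfiltration": 26,
    "Tool Poisoning": 16,
    "Privilege Escalation": 9,
    "Model Abuse": 8,
    "Excessive Autonomy": 5,
    "Model Security": 2,
    "Data Poisoning": 1,
}

total_rules = sum(NEW_RULE_COUNTS.values())
category_count = len(NEW_RULE_COUNTS)
print(total_rules, category_count)  # 311 10
```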
@@ -137,7 +140,7 @@ We test ATR with our own tests, external benchmarks, AND real-world wild scannin
  | **SKILL.md benchmark** | **498 labeled samples** | **498** | **97.0%** | **100%** |
  | **96K wild scan** | **OpenClaw + Skills.sh + Hermes + ClawHub** | **96,096** | **--** | **--** |
  | **PINT (adversarial)** | **Invariant Labs** | **850** | **99.6%** | **62.7%** |
- | **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | -- | **69.7%** |
+ | **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | 100% | **97.1%** |
  | Self-test (own test cases) | Internal | 361 | 100% | 88.5% |
 
  ```bash
@@ -259,7 +262,7 @@ Every rule is a YAML file answering: **what** to detect, **how** to detect it, *
  ### Export rules
 
  ```bash
- # For your security platform (113 rules, 714+ regex patterns as JSON)
+ # For your security platform (311 rules, 1,600+ regex patterns as JSON)
  atr convert generic-regex --output atr-rules.json
 
  # For SIEM integration
@@ -307,7 +310,7 @@ Want to integrate ATR into your product? Three options:
  ```bash
  # Option 1: Export rules as JSON (recommended for most tools)
  atr convert generic-regex --output atr-rules.json
- # → 113 rules, 714+ regex patterns, severity/category metadata
+ # → 311 rules, 1,600+ regex patterns, severity/category metadata
 
  # Option 2: Use the TypeScript engine directly
  npm install agent-threat-rules
@@ -358,7 +361,8 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. See [CONTRIBUTION-GUI
  - [x] **v0.4** -- 71 rules, ClawHub 36K scan, SAFE-MCP 91.8%
  - [x] **v1.0** -- 108 rules, 53K mega scan, GitHub Action + SARIF, generic-regex export, Cisco adoption
  - [x] **v1.1** -- Threat Cloud flywheel, 5 ecosystem merges, Microsoft AGT + NVIDIA Garak PRs
- - [x] **v2.0.0** (current) -- 113 rules, 96K mega scan, 751 malware discovered, RFC-001, GOVERNANCE.md, website launch
+ - [x] **v2.0.0** -- 113 rules, 96K mega scan, 751 malware discovered, RFC-001, GOVERNANCE.md, website launch
+ - [x] **v2.0.11** (current) -- 311 rules, 193 new NVIDIA garak probe coverage (ATR-00300~00414), 97.1% garak recall
  - [ ] **v2.1** -- Go engine, ML classifier integration, semantic signatures, community rule submissions
  - [ ] **v3.0** -- Multi-engine standard: 2+ engines, 10+ production deployments, schema review by 3+ security teams
 
@@ -366,7 +370,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. See [CONTRIBUTION-GUI
 
  | Phase | Goal | Status |
  |-------|------|--------|
- | **Phase 0: Core product** | 113 rules, 62.7% recall, OWASP 10/10, 96K scan | **Done** |
+ | **Phase 0: Core product** | 311 rules, 97.1% garak recall, OWASP 10/10, 96K scan | **Done** |
  | **Phase 1: Distribution** | GitHub Action, SARIF, generic-regex export, ecosystem PRs | **Done** |
  | **Phase 2: Adoption** | Cisco merged (34 rules), OWASP PR, 11 ecosystem PRs | **In progress** |
  | **Phase 3: Community flywheel** | Threat Cloud crystallization, auto-generated rules, 10+ contributors | In progress |
@@ -381,7 +385,7 @@ ATR uses "ATR Scanned" (not "ATR Certified") until recall exceeds 80%. We are ho
  ```
  ATR (this repo) Your Product / Integration
  ┌─────────────────────────┐ ┌──────────────────────────┐
- │ 113 Rules (YAML) │ match │ Block / Allow / Alert │
+ │ 311 Rules (YAML) │ match │ Block / Allow / Alert │
  │ Engine (TS + Py) │ ────────→ │ SIEM (Splunk / Elastic) │
  │ CLI / MCP / GitHub Act. │ results │ CI/CD (SARIF → Security) │
  │ SARIF / Generic Regex │ │ Runtime Proxy (MCP) │
package/package.json CHANGED
@@ -1,8 +1,8 @@
  {
  "name": "agent-threat-rules",
- "version": "2.0.11",
+ "version": "2.0.13",
  "type": "module",
- "description": "Open detection standard for AI agent security. 113 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense.",
+ "description": "Open detection standard -- like Sigma, but for AI agents. 311 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense. 97.1% recall on NVIDIA garak.",
  "main": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "bin": {
@@ -35,6 +35,10 @@ agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
+ metadata_provenance:
+ garak_probe: fitd.FITD
+ garak_detector: always.Fail
+ payload_source: garak/probes/fitd.py
  detection:
  conditions:
  - field: tool_response
@@ -41,6 +41,10 @@ agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
+ metadata_provenance:
+ garak_probe: misleading.FalseAssertion
+ garak_detector: always.Fail
+ payload_source: garak/probes/misleading.py
  detection:
  conditions:
  - field: tool_response
@@ -39,6 +39,10 @@ agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
+ metadata_provenance:
+ garak_probe: "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
+ garak_detector: always.Fail
+ payload_source: garak/probes/divergence.py
  detection:
  conditions:
  - field: tool_response
@@ -36,6 +36,10 @@ agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
+ metadata_provenance:
+ garak_probe: "av_spam_scanning.EICAR, av_spam_scanning.GTUBE"
+ garak_detector: always.Fail
+ payload_source: garak/probes/av_spam_scanning.py
  detection:
  conditions:
  - field: tool_response
@@ -37,6 +37,10 @@ agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
+ metadata_provenance:
+ garak_probe: promptinject.HijackHateHumans
+ garak_detector: always.Fail
+ payload_source: garak/probes/promptinject.py (Perez & Ribeiro NeurIPS 2022)
  detection:
  conditions:
  - field: tool_response
@@ -36,6 +36,10 @@ agent_source:
  type: mcp_exchange
  framework: [any]
  provider: [any]
+ metadata_provenance:
+ garak_probe: "packagehallucination.Python, packagehallucination.JavaScript"
+ garak_detector: always.Fail
+ payload_source: garak/probes/packagehallucination.py
  detection:
  conditions:
  - field: tool_response
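
The rule hunks above each add a `metadata_provenance` block whose `garak_probe` value is either a single probe name or a quoted, comma-separated string of several. A minimal sketch of normalizing that field follows. It assumes the YAML has already been loaded into a plain dict, and the function name `probe_names` is hypothetical, not part of the ATR engine.

```python
def probe_names(provenance: dict) -> list[str]:
    """Split the garak_probe value into individual probe names."""
    raw = provenance.get("garak_probe", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


# Values copied from two of the hunks in this diff.
single = {
    "garak_probe": "fitd.FITD",
    "garak_detector": "always.Fail",
    "payload_source": "garak/probes/fitd.py",
}
multi = {
    "garak_probe": "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken",
    "garak_detector": "always.Fail",
    "payload_source": "garak/probes/divergence.py",
}

print(probe_names(single))  # ['fitd.FITD']
print(probe_names(multi))
```

Returning a list in both cases lets downstream code treat single-probe and multi-probe rules uniformly.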