npm - agent-threat-rules - Versions diffs - 2.0.11 → 2.0.13 - Mend

agent-threat-rules 2.0.11 → 2.0.13


Files changed (8)

package/README.md CHANGED

@@ -12,9 +12,10 @@ AI Agent 威脅偵測規則 -- 開源、社群驅動
 [![PyPI](https://img.shields.io/pypi/v/pyatr?style=flat-square&color=brightgreen&label=PyPI)](https://pypi.org/project/pyatr/)
 [![GitHub Marketplace](https://img.shields.io/badge/Marketplace-ATR%20Scan-2ea44f?style=flat-square&logo=github)](https://github.com/marketplace/actions/atr-scan)
 [![License](https://img.shields.io/badge/license-MIT-brightgreen?style=flat-square)](LICENSE)
-[![Rules](https://img.shields.io/badge/rules-113-blue?style=flat-square)](#what-atr-detects)
+[![Rules](https://img.shields.io/badge/rules-311-blue?style=flat-square)](#what-atr-detects)
 [![Tests](https://img.shields.io/badge/tests-361_passing-green?style=flat-square)](#ecosystem)
 [![SKILL.md Recall](https://img.shields.io/badge/SKILL.md_recall-100%25-brightgreen?style=flat-square)](#evaluation)
+[![Garak Recall](https://img.shields.io/badge/garak_recall-97.1%25-brightgreen?style=flat-square)](#evaluation)
 [![Wild Scan](https://img.shields.io/badge/wild_scan-96%2C096_skills-blue?style=flat-square)](#ecosystem-scan)
 [![OWASP](https://img.shields.io/badge/OWASP_Agentic_Top_10-10%2F10-brightgreen?style=flat-square)](#standards-coverage)
@@ -72,6 +73,7 @@ Key finding: at least 3 coordinated threat actors mass-published poisoned skills
 | Benchmark | Samples | Recall | Precision | FP Rate |
 |-----------|---------|--------|-----------|---------|
+| **NVIDIA garak (in-the-wild jailbreaks)** | **666** | **97.1%** | 100% | 0% |
 | SKILL.md (498 labeled samples) | 498 | **100%** | **97%** | **0.20%** |
 | PINT (Invariant Labs, adversarial) | 850 | -- | 99.6% | 62.7% |
 | Wild scan (96K real-world) | 96,096 | -- | -- | 1.35% flag rate |
@@ -84,7 +86,7 @@ npm install -g agent-threat-rules
 atr scan skill.md                 # scan a SKILL.md for threats
 atr scan mcp-config.json          # scan MCP events for threats
 atr scan skill.md --sarif         # output SARIF v2.1.0 for GitHub Security tab
-atr convert generic-regex         # export 113 rules as JSON (714+ regex patterns)
+atr convert generic-regex         # export 311 rules as JSON (1,600+ regex patterns)
 atr convert splunk                # export to Splunk SPL
 atr convert elastic               # export to Elasticsearch Query DSL
 atr stats                         # show rule collection stats
@@ -110,16 +112,17 @@ One line. Zero config. SARIF results in your Security tab.
 ## What ATR Detects
-113 rules across 9 categories, mapped to real CVEs:
+311 rules across 9 categories, mapped to real CVEs:
 | Category | What it catches | Rules | Real CVEs |
 |----------|----------------|-------|-----------|
-| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads, CJK attacks, hidden override instructions | 33 | CVE-2025-53773, CVE-2025-32711 |
-| **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos | 23 | CVE-2025-59536, CVE-2026-28363 |
-| **Context Exfiltration** | API key leakage, system prompt theft, credential harvesting, env variable exfiltration | 14 | CVE-2026-24307 |
-| **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions | 12 | CVE-2025-68143/68144/68145 |
-| **Agent Manipulation** | Cross-agent attacks, goal hijacking, Sybil consensus attacks, scope hijacking | 12 | -- |
-| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access | 8 | CVE-2026-0628 |
+| **Prompt Injection** | "Ignore previous instructions", persona hijacking, encoded payloads (base-N, ROT, Unicode tags, sneaky-bits, zalgo, ecoji, base2048), CJK attacks, latent injection, glitch tokens, DRA parenthesis reconstruction, leakreplay MASK | 108 | CVE-2025-53773, CVE-2025-32711 |
+| **Agent Manipulation** | DAN family (DAN / DUDE / STAN / AntiDAN / RANTI / DevMode), AutoDAN, DanInTheWild, tense framing, grandma roleplay, goodside threat-JSON, doctor XML puppetry, cross-agent attacks, goal hijacking, Sybil consensus | 99 | -- |
+| **Skill Compromise** | Typosquatting, context poisoning, subcommand overflow, rug pull, supply chain attacks, credential exfil combos, HuggingFace unsafe artifacts | 37 | CVE-2025-59536, CVE-2026-28363 |
+| **Context Exfiltration** | API key generation/completion, system prompt theft, credential harvesting, env variable exfil, markdown-URL data exfil, XSS in tool response | 26 | CVE-2026-24307 |
+| **Tool Poisoning** | Malicious MCP responses, consent bypass, hidden LLM instructions, schema contradictions, ANSI escape elicitation | 16 | CVE-2025-68143/68144/68145 |
+| **Privilege Escalation** | Scope creep, delayed execution bypass, admin function access, shell escape | 9 | CVE-2026-0628 |
+| **Model Abuse** | Malware code generation (malwaregen), EICAR/GTUBE signatures, AV-evasion gen | 8 | -- |
 | **Excessive Autonomy** | Runaway loops, resource exhaustion, unauthorized financial actions | 5 | -- |
 | **Model Security** | Behavior extraction, malicious fine-tuning data | 2 | -- |
 | **Data Poisoning** | RAG/knowledge base tampering, memory manipulation | 1 | -- |
@@ -137,7 +140,7 @@ We test ATR with our own tests, external benchmarks, AND real-world wild scannin
 | **SKILL.md benchmark** | **498 labeled samples** | **498** | **97.0%** | **100%** |
 | **96K wild scan** | **OpenClaw + Skills.sh + Hermes + ClawHub** | **96,096** | **--** | **--** |
 | **PINT (adversarial)** | **Invariant Labs** | **850** | **99.6%** | **62.7%** |
-| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | -- | **69.7%** |
+| **Garak (real-world jailbreaks)** | **NVIDIA** | **666** | 100% | **97.1%** |
 | Self-test (own test cases) | Internal | 361 | 100% | 88.5% |
 ```bash
@@ -259,7 +262,7 @@ Every rule is a YAML file answering: **what** to detect, **how** to detect it, *
 ### Export rules
 ```bash
-# For your security platform (113 rules, 714+ regex patterns as JSON)
+# For your security platform (311 rules, 1,600+ regex patterns as JSON)
 atr convert generic-regex --output atr-rules.json
 # For SIEM integration
@@ -307,7 +310,7 @@ Want to integrate ATR into your product? Three options:
 ```bash
 # Option 1: Export rules as JSON (recommended for most tools)
 atr convert generic-regex --output atr-rules.json
-# → 113 rules, 714+ regex patterns, severity/category metadata
+# → 311 rules, 1,600+ regex patterns, severity/category metadata
 # Option 2: Use the TypeScript engine directly
 npm install agent-threat-rules
@@ -358,7 +361,8 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. See [CONTRIBUTION-GUI
 - [x] **v0.4** -- 71 rules, ClawHub 36K scan, SAFE-MCP 91.8%
 - [x] **v1.0** -- 108 rules, 53K mega scan, GitHub Action + SARIF, generic-regex export, Cisco adoption
 - [x] **v1.1** -- Threat Cloud flywheel, 5 ecosystem merges, Microsoft AGT + NVIDIA Garak PRs
-- [x] **v2.0.0** (current) -- 113 rules, 96K mega scan, 751 malware discovered, RFC-001, GOVERNANCE.md, website launch
+- [x] **v2.0.0** -- 113 rules, 96K mega scan, 751 malware discovered, RFC-001, GOVERNANCE.md, website launch
+- [x] **v2.0.11** (current) -- 311 rules, 193 new NVIDIA garak probe coverage (ATR-00300~00414), 97.1% garak recall
 - [ ] **v2.1** -- Go engine, ML classifier integration, semantic signatures, community rule submissions
 - [ ] **v3.0** -- Multi-engine standard: 2+ engines, 10+ production deployments, schema review by 3+ security teams
@@ -366,7 +370,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. See [CONTRIBUTION-GUI
 | Phase | Goal | Status |
 |-------|------|--------|
-| **Phase 0: Core product** | 113 rules, 62.7% recall, OWASP 10/10, 96K scan | **Done** |
+| **Phase 0: Core product** | 311 rules, 97.1% garak recall, OWASP 10/10, 96K scan | **Done** |
 | **Phase 1: Distribution** | GitHub Action, SARIF, generic-regex export, ecosystem PRs | **Done** |
 | **Phase 2: Adoption** | Cisco merged (34 rules), OWASP PR, 11 ecosystem PRs | **In progress** |
 | **Phase 3: Community flywheel** | Threat Cloud crystallization, auto-generated rules, 10+ contributors | In progress |
@@ -381,7 +385,7 @@ ATR uses "ATR Scanned" (not "ATR Certified") until recall exceeds 80%. We are ho
 ```
 ATR (this repo)                        Your Product / Integration
 ┌─────────────────────────┐            ┌──────────────────────────┐
-│ 113 Rules (YAML)        │   match    │ Block / Allow / Alert     │
+│ 311 Rules (YAML)        │   match    │ Block / Allow / Alert     │
 │ Engine (TS + Py)        │ ────────→  │ SIEM (Splunk / Elastic)  │
 │ CLI / MCP / GitHub Act. │   results  │ CI/CD (SARIF → Security) │
 │ SARIF / Generic Regex   │            │ Runtime Proxy (MCP)      │
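
Reviewer note: the per-category counts in the updated "What ATR Detects" table can be cross-checked against the advertised total. The following is a minimal sketch of that arithmetic, not code from the package; the dictionary name is hypothetical and the numbers are copied from the added and unchanged table rows in the hunk above.

```python
# Rule counts per category, copied from the 2.0.13 README table
# (rows marked "+" plus the three unchanged context rows).
rules_by_category = {
    "Prompt Injection": 108,
    "Agent Manipulation": 99,
    "Skill Compromise": 37,
    "Context Exfiltration": 26,
    "Tool Poisoning": 16,
    "Privilege Escalation": 9,
    "Model Abuse": 8,
    "Excessive Autonomy": 5,
    "Model Security": 2,
    "Data Poisoning": 1,
}

total_rules = sum(rules_by_category.values())
category_count = len(rules_by_category)

print(f"{total_rules} rules across {category_count} categories")
# prints "311 rules across 10 categories"
```

The counts sum to the stated 311. The table now has ten rows, since Model Abuse is new, while the heading line in the same hunk still reads "9 categories".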

package/package.json CHANGED

@@ -1,8 +1,8 @@
 {
   "name": "agent-threat-rules",
-  "version": "2.0.11",
+  "version": "2.0.13",
   "type": "module",
-  "description": "Open detection standard for AI agent security. 113 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense.",
+  "description": "Open detection standard -- like Sigma, but for AI agents. 311 rules for prompt injection, tool poisoning, context exfiltration, and MCP attacks. Shipped in Cisco AI Defense. 97.1% recall on NVIDIA garak.",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
   "bin": {

package/rules/agent-manipulation/ATR-2026-00269-fitd-escalation.yaml CHANGED

@@ -35,6 +35,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: fitd.FITD
+  garak_detector: always.Fail
+  payload_source: garak/probes/fitd.py
 detection:
   conditions:
     - field: tool_response

package/rules/agent-manipulation/ATR-2026-00288-false-premise-injection.yaml CHANGED

@@ -41,6 +41,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: misleading.FalseAssertion
+  garak_detector: always.Fail
+  payload_source: garak/probes/misleading.py
 detection:
   conditions:
     - field: tool_response

package/rules/context-exfiltration/ATR-2026-00290-divergence-repeat-word-training-extraction.yaml CHANGED

@@ -39,6 +39,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
+  garak_detector: always.Fail
+  payload_source: garak/probes/divergence.py
 detection:
   conditions:
     - field: tool_response

package/rules/model-abuse/ATR-2026-00281-eicar-gtube-malware-signature-request.yaml CHANGED

@@ -36,6 +36,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: "av_spam_scanning.EICAR, av_spam_scanning.GTUBE"
+  garak_detector: always.Fail
+  payload_source: garak/probes/av_spam_scanning.py
 detection:
   conditions:
     - field: tool_response

package/rules/prompt-injection/ATR-2026-00282-perez-prompt-injection-hijack.yaml CHANGED

@@ -37,6 +37,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: promptinject.HijackHateHumans
+  garak_detector: always.Fail
+  payload_source: garak/probes/promptinject.py (Perez & Ribeiro NeurIPS 2022)
 detection:
   conditions:
     - field: tool_response

package/rules/skill-compromise/ATR-2026-00260-package-hallucination.yaml CHANGED

@@ -36,6 +36,10 @@ agent_source:
   type: mcp_exchange
   framework: [any]
   provider: [any]
+metadata_provenance:
+  garak_probe: "packagehallucination.Python, packagehallucination.JavaScript"
+  garak_detector: always.Fail
+  payload_source: garak/probes/packagehallucination.py
 detection:
   conditions:
     - field: tool_response
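
Reviewer note: all six rule files gain the same `metadata_provenance` block, whose `garak_probe` value is either a single `module.Class` reference or a quoted, comma-separated list of them. The sketch below shows one way a consumer might split that value and check it against `payload_source`. It is an illustration under assumptions, not the package's engine: `ProbeRef`, `parse_garak_probes`, and `payload_source_matches` are hypothetical names, and the prefix check assumes `payload_source` always starts with `garak/probes/<module>.py`, as it does in the six hunks above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRef:
    module: str  # e.g. "divergence"
    name: str    # e.g. "Repeat"


def parse_garak_probes(value: str) -> list[ProbeRef]:
    """Split a garak_probe value into (module, class) references."""
    refs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # "divergence.Repeat" -> module "divergence", name "Repeat"
        module, _, name = item.partition(".")
        refs.append(ProbeRef(module, name))
    return refs


def payload_source_matches(probes: list[ProbeRef], payload_source: str) -> bool:
    """True if every probe's module agrees with the payload_source path prefix."""
    # startswith tolerates trailing notes such as a citation after the path.
    return all(
        payload_source.startswith(f"garak/probes/{p.module}.py") for p in probes
    )


probes = parse_garak_probes(
    "divergence.Repeat, divergence.RepeatExtended, divergence.RepeatedToken"
)
print([p.name for p in probes])
# prints ['Repeat', 'RepeatExtended', 'RepeatedToken']
print(payload_source_matches(probes, "garak/probes/divergence.py"))
# prints True
```

A prefix match rather than an exact comparison is used because one `payload_source` value in this diff carries a trailing citation after the file path.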