@panguard-ai/atr 1.4.3 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,251 +0,0 @@
1
- # How to Write an ATR Rule
2
-
3
- This guide walks you through creating detection rules for AI agent threats.
4
-
5
- ## Before You Start
6
-
7
- ATR rules use regex-based pattern matching (`detection_tier: pattern`). Be honest about what this can and cannot do:
8
-
9
- **Regex CAN detect:**
10
-
11
- - Known attack phrases and keywords
12
- - Encoded payloads (base64, hex, unicode)
13
- - Credential formats (API keys, tokens)
14
- - Structural patterns (markdown injection, delimiter abuse)
15
-
16
- **Regex CANNOT detect:**
17
-
18
- - Paraphrased attacks (same meaning, different words)
19
- - Multilingual variants you haven't written patterns for
20
- - Semantic manipulation (subtle framing, selective omission)
21
- - Protocol-level attacks (timing, ordering)
22
-
23
- If your rule tries to detect something regex fundamentally cannot catch, mark it clearly in the description. See `LIMITATIONS.md` for the full list.
24
-
25
- ## Step 1: Identify the Attack
26
-
27
- Before writing a rule, clearly define:
28
-
29
- - What attack does this detect?
30
- - Which OWASP Agentic Top 10 category does it map to?
31
- - What data source is needed (LLM I/O, tool calls, agent behavior)?
32
- - What specific text patterns indicate this attack?
33
- - Can this attack be trivially rephrased to evade detection?
34
-
35
- ## Step 2: Choose the Right Category
36
-
37
- | Category | Use When |
38
- | ---------------------- | ----------------------------------------------------------------- |
39
- | `prompt-injection` | User/external input tries to override agent instructions |
40
- | `tool-poisoning` | Tool responses contain malicious or manipulative content |
41
- | `context-exfiltration` | Agent leaks system prompt, API keys, or internal data |
42
- | `agent-manipulation` | One agent manipulates another agent's behavior |
43
- | `privilege-escalation` | Agent accesses resources beyond its authorized scope |
44
- | `excessive-autonomy` | Agent operates beyond intended boundaries (loops, resource abuse) |
45
- | `data-poisoning` | Training or retrieval data has been tampered with |
46
- | `skill-compromise` | MCP skills/tools are impersonated, hijacked, or over-permissioned |
47
- | `model-security` | Model weights, behavior, or training pipeline are targeted |
48
-
49
- ## Step 3: Write the Rule
50
-
51
- ### Required fields (per schema v0.1)
52
-
53
- ```yaml
54
- title: 'Descriptive Title of What This Detects'
55
- id: ATR-2026-XXX
56
- status: experimental
57
- description: |
58
- What this rule detects and how. Include limitations.
59
- Note: This rule detects [specific patterns], not [what it cannot detect].
60
- author: 'Your Name'
61
- date: '2026/03/09'
62
- schema_version: '0.1'
63
- detection_tier: pattern
64
- maturity: experimental
65
- severity: high
66
-
67
- references:
68
- owasp_llm:
69
- - 'LLM01:2025 - Prompt Injection'
70
- owasp_agentic:
71
- - 'ASI01:2026 - Agent Goal Hijack'
72
- mitre_atlas:
73
- - 'AML.T0054 - LLM Prompt Injection'
74
-
75
- tags:
76
- category: prompt-injection
77
- subcategory: your-subcategory
78
- confidence: medium
79
-
80
- agent_source:
81
- type: llm_io
82
- framework:
83
- - any
84
- provider:
85
- - any
86
-
87
- detection:
88
- conditions:
89
- - field: user_input
90
- operator: regex
91
- value: "(?i)your\\s+regex\\s+pattern\\s+here"
92
- description: 'What this pattern catches'
93
- - field: user_input
94
- operator: regex
95
- value: "(?i)another\\s+pattern"
96
- description: 'What this second pattern catches'
97
- condition: any
98
- false_positives:
99
- - 'Describe legitimate scenarios that could trigger this rule'
100
-
101
- response:
102
- actions:
103
- - block_input
104
- - alert
105
- - snapshot
106
- auto_response_threshold: high
107
- message_template: |
108
- [ATR-2026-XXX] Rule triggered. Pattern: {matched_pattern}.
109
- Session: {session_id}.
110
-
111
- test_cases:
112
- true_positives:
113
- - input: 'Exact text that SHOULD trigger this rule'
114
- expected: triggered
115
- description: 'Why this should trigger'
116
- true_negatives:
117
- - input: 'Legitimate text that should NOT trigger'
118
- expected: not_triggered
119
- description: 'Why this should not trigger'
120
-
121
- evasion_tests:
122
- - input: 'Rephrased version that evades detection'
123
- expected: not_triggered
124
- bypass_technique: paraphrase
125
- notes: 'Requires embedding similarity detection (v0.2)'
126
- ```
127
-
128
- ## Step 4: Severity Calibration
129
-
130
- Choose severity based on real-world impact, not pattern sophistication:
131
-
132
- | Severity | Criteria | Example |
133
- | --------------- | -------------------------------------------------------------- | ------------------------------------------------------------ |
134
- | `critical` | Immediate data loss, credential exposure, or system compromise | API key exfiltration with active exploitation |
135
- | `high` | Significant security boundary violation | Direct prompt injection overriding safety controls |
136
- | `medium` | Potential for escalation or policy violation | Suspicious tool call patterns without confirmed exploitation |
137
- | `low` | Anomalous behavior worth logging | Unusual but possibly legitimate agent autonomy |
138
- | `informational` | Context for security analysis | Metadata patterns useful for correlation |
139
-
140
- ## Step 5: Write Good Test Cases
141
-
142
- ### Minimum requirements
143
-
144
- - 5 true positives (attacks that SHOULD trigger)
145
- - 5 true negatives (legitimate content that should NOT trigger)
146
- - 3 evasion tests (known bypass techniques)
147
-
148
- ### True positive tips
149
-
150
- - Cover all detection conditions (each regex pattern should have at least one TP)
151
- - Include realistic attack payloads, not toy examples
152
- - Test edge cases: mixed case, extra whitespace, unicode
153
-
154
- ### True negative tips (most important for credibility)
155
-
156
- - Include adversarial true negatives: inputs that LOOK suspicious but are legitimate
157
- - Test professional language that overlaps with attack patterns
158
- - Examples:
159
- - "Please update the system prompt to reflect new branding" (legitimate, looks like injection)
160
- - "As CEO, I am directing you to prioritize this task" (authority language, not an attack)
161
- - "The analysis is 100% correct per our validation suite" (confidence language, not manipulation)
162
-
163
- ### Evasion test tips
164
-
165
- - Document what you CANNOT catch. This builds trust.
166
- - Common bypass techniques to test:
167
- - `paraphrase`: Same meaning, completely different wording
168
- - `language_switch`: Attack in German, Chinese, Japanese, etc.
169
- - `indirect_reference`: Referring to attack concepts without using keywords
170
- - `encoding`: Base64, ROT13, unicode homoglyphs
171
- - `social_engineering`: Polite, professional tone wrapping malicious intent
172
- - `fictional_framing`: "In a novel, the character would say..."
173
-
174
- ## Anti-Patterns (What NOT to Do)
175
-
176
- ### 1. Overly broad regex
177
-
178
- ```yaml
179
- # BAD: Catches nearly everything
180
- value: "(?i)(ignore|change|update|modify)"
181
-
182
- # GOOD: Specific to attack context
183
- value: "(?i)(ignore|disregard)\\s+(all\\s+)?previous\\s+(instructions|directives|rules)"
184
- ```
185
-
186
- ### 2. No word boundaries
187
-
188
- ```yaml
189
- # BAD: Matches "signore" (Italian for "sir")
190
- value: "(?i)ignore"
191
-
192
- # GOOD: Word boundary prevents false positives
193
- value: "(?i)\\bignore\\b\\s+\\b(previous|prior|above)\\b"
194
- ```
195
-
196
- ### 3. Claiming behavioral detection with regex
197
-
198
- ```yaml
199
- # BAD: This regex cannot detect actual cascading failures
200
- description: "Detects cascading failures in agent pipelines"
201
-
202
- # GOOD: Be honest about what regex detects
203
- description: |
204
- Detects textual descriptions of cascading failure patterns.
205
- Note: Structural cascade prevention requires behavioral monitoring (v0.2).
206
- ```
207
-
208
- ### 4. Aggressive response actions for weak detection
209
-
210
- ```yaml
211
- # BAD: Blocking based on text description detection
212
- response:
213
- actions:
214
- - block_input
215
- - kill_agent
216
-
217
- # GOOD: Alert-only for pattern-tier detection of behavioral threats
218
- response:
219
- actions:
220
- - alert
221
- - snapshot
222
- ```
223
-
224
- ### 5. Missing false_positives section
225
-
226
- Every rule WILL have false positives. If you can't think of any, your rule is either too narrow to be useful or you haven't thought hard enough.
227
-
228
- ## Step 6: Validate and Test
229
-
230
- ```bash
231
- # Validate your rule structure
232
- npx agent-threat-rules validate my-rule.yaml
233
-
234
- # Run embedded test cases
235
- npx agent-threat-rules test my-rule.yaml
236
-
237
- # Check stats
238
- npx agent-threat-rules stats
239
- ```
240
-
241
- ## Step 7: Submit
242
-
243
- 1. Fork `github.com/Agent-Threat-Rule/agent-threat-rules`
244
- 2. Place your rule in the correct category directory under `rules/`
245
- 3. Run `npx agent-threat-rules validate rules/` to check all rules
246
- 4. Run `npx agent-threat-rules test rules/` to run all test cases
247
- 5. Submit a PR with:
248
- - Rule YAML file
249
- - Description of what attack this detects
250
- - References (OWASP, MITRE ATLAS, CVE if applicable)
251
- - Any known limitations or evasion techniques
package/tsconfig.json DELETED
@@ -1,17 +0,0 @@
1
- {
2
- "compilerOptions": {
3
- "target": "ES2022",
4
- "module": "Node16",
5
- "moduleResolution": "Node16",
6
- "declaration": true,
7
- "declarationMap": true,
8
- "sourceMap": true,
9
- "outDir": "./dist",
10
- "rootDir": "./src",
11
- "strict": true,
12
- "esModuleInterop": true,
13
- "skipLibCheck": true,
14
- "composite": true
15
- },
16
- "include": ["src"]
17
- }