@stackone/defender 0.4.0 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,5 +1,6 @@
 # @stackone/defender
 
+---
 Prompt injection defense framework for AI tool-calling. Detects and neutralizes prompt injection attacks hidden in tool results (emails, documents, PRs, etc.) before they reach your LLM.
 
 ## Installation
@@ -17,7 +18,11 @@ import { createPromptDefense } from '@stackone/defender';
 
 // Create defense with Tier 1 (patterns) + Tier 2 (ML classifier)
 // blockHighRisk: true enables the allowed/blocked decision
-const defense = createPromptDefense({ enableTier2: true, blockHighRisk: true });
+const defense = createPromptDefense({
+  enableTier2: true,
+  blockHighRisk: true,
+  useDefaultToolRules: true, // Enable built-in per-tool base risk and field-handling rules (risky-field overrides always apply)
+});
 
 // Defend a tool result — ONNX model (~22MB) auto-loads on first call
 const result = await defense.defendToolResult(toolOutput, 'gmail_get_message');
@@ -70,11 +75,13 @@ Use `allowed` for blocking decisions:
 
 `riskLevel` is diagnostic metadata. It starts at the tool's base risk level and can only be escalated by detections — never reduced. Use it for logging and monitoring, not for allow/block logic.
 
+The following base risk levels apply when `useDefaultToolRules: true` is set. Without it, tools use `defaultRiskLevel` (defaults to `medium`).
+
 | Tool Pattern | Base Risk | Why |
 |--------------|-----------|-----|
 | `gmail_*`, `email_*` | `high` | Emails are the #1 injection vector |
-| `unified_documents_*` | `medium` | User-generated content |
-| `unified_hris_*` | `medium` | Employee data with free-text fields |
+| `documents_*` | `medium` | User-generated content |
+| `hris_*` | `medium` | Employee data with free-text fields |
 | `github_*` | `medium` | PRs/issues with user-generated content |
 | All other tools | `medium` | Default cautious level |
 
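For context outside the diff: the hunk above documents prefix-based base risk plus escalate-only `riskLevel` semantics. A minimal TypeScript sketch of that behavior, illustrative only and not the package's internals (`baseRiskFor` and `escalate` are hypothetical names):

```typescript
// Risk levels in ascending severity order.
type RiskLevel = 'low' | 'medium' | 'high' | 'critical';
const RISK_ORDER: RiskLevel[] = ['low', 'medium', 'high', 'critical'];

// Base risk per tool-name prefix, mirroring the README's table
// (only consulted when useDefaultToolRules is enabled).
const BASE_RISK_RULES: Array<[RegExp, RiskLevel]> = [
  [/^(gmail|email)_/, 'high'],  // emails are the primary injection vector
  [/^documents_/, 'medium'],
  [/^hris_/, 'medium'],
  [/^github_/, 'medium'],
];

function baseRiskFor(
  toolName: string,
  useDefaultToolRules: boolean,
  defaultRiskLevel: RiskLevel = 'medium',
): RiskLevel {
  if (!useDefaultToolRules) return defaultRiskLevel;
  const rule = BASE_RISK_RULES.find(([pattern]) => pattern.test(toolName));
  return rule ? rule[1] : 'medium'; // all other tools: medium
}

// riskLevel can only be escalated by detections, never reduced.
function escalate(current: RiskLevel, detected: RiskLevel): RiskLevel {
  return RISK_ORDER.indexOf(detected) > RISK_ORDER.indexOf(current)
    ? detected
    : current;
}
```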
@@ -97,9 +104,10 @@ Create a defense instance.
 
 ```typescript
 const defense = createPromptDefense({
-  enableTier1: true,    // Pattern detection (default: true)
-  enableTier2: true,    // ML classification (default: false)
-  blockHighRisk: true,  // Block high/critical content (default: false)
+  enableTier1: true,          // Pattern detection (default: true)
+  enableTier2: true,          // ML classification (default: false)
+  blockHighRisk: true,        // Block high/critical content (default: false)
+  useDefaultToolRules: true,  // Enable built-in per-tool base risk and field-handling rules (default: false)
   defaultRiskLevel: 'medium',
 });
 ```
@@ -129,7 +137,7 @@ Batch method — defends multiple tool results concurrently.
 ```typescript
 const results = await defense.defendToolResults([
   { value: emailData, toolName: 'gmail_get_message' },
-  { value: docData, toolName: 'unified_documents_get' },
+  { value: docData, toolName: 'documents_get' },
   { value: prData, toolName: 'github_get_pull_request' },
 ]);
 
@@ -178,7 +186,11 @@ await mlpDefense.warmupTier2();
 import { generateText, tool } from 'ai';
 import { createPromptDefense } from '@stackone/defender';
 
-const defense = createPromptDefense({ enableTier2: true, blockHighRisk: true });
+const defense = createPromptDefense({
+  enableTier2: true,
+  blockHighRisk: true,
+  useDefaultToolRules: true,
+});
 await defense.warmupTier2(); // optional, avoids first-call latency
 
 const result = await generateText({
@@ -203,15 +215,18 @@ const result = await generateText({
 
 ## Tool-Specific Rules
 
-Built-in rules define which fields to sanitize and what base risk level to use for each tool provider. See the [base risk table](#understanding-allowed-vs-risklevel) for risk levels.
+> **Note:** `useDefaultToolRules: true` enables built-in per-tool **risk rules** (base risk, skip fields, max lengths, thresholds). Risky-field detection (which fields get sanitized) uses tool-specific overrides regardless of this setting.
+
+Built-in per-tool rules define the base risk level and field-handling parameters for each tool provider. See the [base risk table](#understanding-allowed-vs-risklevel) for risk levels.
 
 | Tool Pattern | Risky Fields | Notes |
 |---|---|---|
 | `gmail_*`, `email_*` | subject, body, snippet, content | Base risk `high` — primary injection vector |
-| `unified_documents_*` | name, description, content, title | User-generated content |
+| `documents_*` | name, description, content, title | User-generated content |
 | `github_*` | name, title, body, description | PRs, issues, comments |
-| `unified_hris_*` | name, notes, bio, description | Employee free-text fields |
-| `unified_ats_*`, `unified_crm_*` | _(default risky fields)_ | Uses global defaults |
+| `hris_*` | name, notes, bio, description | Employee free-text fields |
+| `ats_*` | name, notes, description, summary | Candidate data |
+| `crm_*` | name, description, notes, content | Customer data |
 
 Tools not matching any pattern use `medium` base risk with default risky field detection.
 
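The risky-fields hunk above describes per-tool overrides with a fallback to global defaults. A TypeScript sketch under the same caveat: this is not the package's code, and since the README does not list the global defaults, `DEFAULT_RISKY_FIELDS` below is a placeholder.

```typescript
// Placeholder: the actual global default risky fields are not
// documented in the README shown here.
const DEFAULT_RISKY_FIELDS = ['name', 'description', 'content', 'notes'];

// Per-tool overrides, mirroring the README's risky-fields table.
const RISKY_FIELD_OVERRIDES: Array<[RegExp, string[]]> = [
  [/^(gmail|email)_/, ['subject', 'body', 'snippet', 'content']],
  [/^documents_/, ['name', 'description', 'content', 'title']],
  [/^github_/, ['name', 'title', 'body', 'description']],
  [/^hris_/, ['name', 'notes', 'bio', 'description']],
  [/^ats_/, ['name', 'notes', 'description', 'summary']],
  [/^crm_/, ['name', 'description', 'notes', 'content']],
];

// First matching prefix wins; unmatched tools fall back to the defaults.
function riskyFieldsFor(toolName: string): string[] {
  const match = RISKY_FIELD_OVERRIDES.find(([pattern]) => pattern.test(toolName));
  return match ? match[1] : DEFAULT_RISKY_FIELDS;
}
```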
@@ -235,4 +250,4 @@ npm test
 
 ## License
 
-SSPL-1.0 — See [LICENSE](./LICENSE) for details.
+Apache-2.0 — See [LICENSE](./LICENSE) for details.