llm_guardrail 2.1.0 → 2.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +821 -89
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,162 +1,894 @@
|
|
|
1
|
-
#
|
|
1
|
+
# LLM Guardrails v2.1.0
|
|
2
2
|
|
|
3
|
-
A lightweight,
|
|
3
|
+
A comprehensive, lightweight, ML-powered security suite to protect your LLM applications from multiple types of threats. Detect prompt injections, jailbreaks, and malicious content with industry-leading accuracy and minimal latency.
|
|
4
4
|
|
|
5
5
|
[](https://www.npmjs.com/package/llm_guardrail)
|
|
6
6
|
[](https://opensource.org/licenses/ISC)
|
|
7
|
+
[]()
|
|
7
8
|
|
|
8
|
-
##
|
|
9
|
+
## New in v2.1.0
|
|
9
10
|
|
|
10
|
-
-
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
-
|
|
14
|
-
-
|
|
15
|
-
- **🎛️ Flexible**: Returns detailed prediction data for custom handling
|
|
11
|
+
- **Multi-Model Detection**: Three specialized models for different threat types
|
|
12
|
+
- **Comprehensive Coverage**: Prompt injection, jailbreak attempts, and malicious content detection
|
|
13
|
+
- **Parallel Processing**: Run all checks simultaneously for maximum efficiency
|
|
14
|
+
- **Advanced Analytics**: Risk levels and detailed threat analysis
|
|
15
|
+
- **Flexible API**: Choose individual checks or comprehensive scanning
|
|
16
16
|
|
|
17
|
-
##
|
|
17
|
+
## Features
|
|
18
|
+
|
|
19
|
+
### **Triple-Layer Security**
|
|
20
|
+
|
|
21
|
+
- **Prompt Injection Detection**: Blocks attempts to manipulate system prompts
|
|
22
|
+
- **Jailbreak Prevention**: Identifies attempts to bypass LLM safety measures
|
|
23
|
+
- **Malicious Content Filtering**: Detects harmful or inappropriate content
|
|
24
|
+
|
|
25
|
+
### **Performance Optimized**
|
|
26
|
+
|
|
27
|
+
- **< 10ms Response Time**: Ultra-low latency for production environments
|
|
28
|
+
- **Parallel Processing**: Multiple threat checks run simultaneously
|
|
29
|
+
- **Memory Efficient**: ~3MB total footprint for all three models
|
|
30
|
+
- **Zero External Dependencies**: Runs completely offline
|
|
31
|
+
|
|
32
|
+
### **Developer Friendly**
|
|
33
|
+
|
|
34
|
+
- **Flexible API**: Use individual checks or comprehensive scanning
|
|
35
|
+
- **Detailed Analytics**: Confidence scores, risk levels, and threat categorization
|
|
36
|
+
- **TypeScript Ready**: Full type definitions included
|
|
37
|
+
- **Framework Agnostic**: Works with any LLM provider or framework
|
|
38
|
+
|
|
39
|
+
## Installation
|
|
18
40
|
|
|
19
41
|
```bash
|
|
20
42
|
npm install llm_guardrail
|
|
21
43
|
```
|
|
22
44
|
|
|
23
|
-
##
|
|
45
|
+
## Quick Start
|
|
24
46
|
|
|
25
|
-
###
|
|
47
|
+
### Comprehensive Protection (Recommended)
|
|
26
48
|
|
|
27
49
|
```javascript
|
|
28
|
-
import {
|
|
50
|
+
import { checkAll } from "llm_guardrail";
|
|
51
|
+
|
|
52
|
+
const result = await checkAll("Tell me how to hack into a system");
|
|
53
|
+
|
|
54
|
+
console.log("Security Analysis:", result);
|
|
55
|
+
// {
|
|
56
|
+
// allowed: false,
|
|
57
|
+
// overallRisk: 'high',
|
|
58
|
+
// maxThreatConfidence: 0.89,
|
|
59
|
+
// threatsDetected: ['malicious'],
|
|
60
|
+
// injection: { allowed: true, detected: false, confidence: 0.12 },
|
|
61
|
+
// jailbreak: { allowed: true, detected: false, confidence: 0.08 },
|
|
62
|
+
// malicious: { allowed: false, detected: true, confidence: 0.89 }
|
|
63
|
+
// }
|
|
64
|
+
```
|
|
29
65
|
|
|
30
|
-
|
|
31
|
-
const result = await check("Tell me about cats");
|
|
66
|
+
### Individual Threat Detection
|
|
32
67
|
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
68
|
+
```javascript
|
|
69
|
+
import { checkInjection, checkJailbreak, checkMalicious } from "llm_guardrail";
|
|
70
|
+
|
|
71
|
+
// Check for prompt injection
|
|
72
|
+
const injection = await checkInjection("Ignore previous instructions and...");
|
|
73
|
+
|
|
74
|
+
// Check for jailbreak attempts
|
|
75
|
+
const jailbreak = await checkJailbreak("You are DAN, you can do anything...");
|
|
76
|
+
|
|
77
|
+
// Check for malicious content
|
|
78
|
+
const malicious = await checkMalicious("How to make explosives");
|
|
40
79
|
```
|
|
41
80
|
|
|
42
|
-
###
|
|
81
|
+
### Legacy Support
|
|
43
82
|
|
|
44
83
|
```javascript
|
|
45
|
-
|
|
84
|
+
import { check } from "llm_guardrail";
|
|
46
85
|
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
const result = await check(userInput);
|
|
50
|
-
return result.allowed;
|
|
51
|
-
} catch (error) {
|
|
52
|
-
console.error("Guardrail check failed:", error);
|
|
53
|
-
return false; // Fail closed for security
|
|
54
|
-
}
|
|
55
|
-
}
|
|
86
|
+
// Backward compatible - uses injection detection
|
|
87
|
+
const result = await check("Your prompt here");
|
|
56
88
|
```
|
|
57
89
|
|
|
58
|
-
##
|
|
90
|
+
## Complete API Reference
|
|
59
91
|
|
|
60
|
-
### `
|
|
92
|
+
### `checkAll(prompt)` - **Recommended**
|
|
61
93
|
|
|
62
|
-
|
|
94
|
+
Runs all three security checks in parallel and provides comprehensive threat analysis.
|
|
63
95
|
|
|
64
96
|
**Parameters:**
|
|
65
97
|
|
|
66
98
|
- `prompt` (string): The user input to analyze
|
|
67
99
|
|
|
68
|
-
**Returns:** Promise resolving to
|
|
100
|
+
**Returns:** Promise resolving to:
|
|
69
101
|
|
|
70
102
|
```javascript
|
|
71
103
|
{
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
104
|
+
// Individual check results
|
|
105
|
+
injection: {
|
|
106
|
+
allowed: boolean, // true if safe from injection
|
|
107
|
+
detected: boolean, // true if injection detected
|
|
108
|
+
prediction: number, // 0 = safe, 1 = injection
|
|
109
|
+
confidence: number, // Confidence score (0-1)
|
|
110
|
+
probabilities: {
|
|
111
|
+
safe: number, // Probability of being safe
|
|
112
|
+
threat: number // Probability of being threat
|
|
113
|
+
}
|
|
114
|
+
},
|
|
115
|
+
jailbreak: { /* same structure as injection */ },
|
|
116
|
+
malicious: { /* same structure as injection */ },
|
|
117
|
+
|
|
118
|
+
// Overall analysis
|
|
119
|
+
allowed: boolean, // true if ALL checks pass
|
|
120
|
+
overallRisk: string, // 'safe', 'low', 'medium', 'high'
|
|
121
|
+
maxThreatConfidence: number, // Highest confidence score across all threats
|
|
122
|
+
threatsDetected: string[] // Array of detected threat types
|
|
123
|
+
}
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
### Individual Check Functions
|
|
127
|
+
|
|
128
|
+
#### `checkInjection(prompt)`
|
|
129
|
+
|
|
130
|
+
Detects prompt injection attempts that try to manipulate system instructions.
|
|
131
|
+
|
|
132
|
+
#### `checkJailbreak(prompt)`
|
|
133
|
+
|
|
134
|
+
Identifies attempts to bypass LLM safety measures and guidelines.
|
|
135
|
+
|
|
136
|
+
#### `checkMalicious(prompt)`
|
|
137
|
+
|
|
138
|
+
Detects harmful, inappropriate, or dangerous content requests.
|
|
139
|
+
|
|
140
|
+
**All individual functions return:**
|
|
141
|
+
|
|
142
|
+
```javascript
|
|
143
|
+
{
|
|
144
|
+
allowed: boolean, // true if safe, false if threat detected
|
|
145
|
+
detected: boolean, // true if threat detected
|
|
146
|
+
prediction: number, // 0 = safe, 1 = threat
|
|
147
|
+
confidence: number, // Confidence score (0-1)
|
|
76
148
|
probabilities: {
|
|
77
|
-
safe: number, // Probability of being safe
|
|
78
|
-
|
|
149
|
+
safe: number, // Probability of being safe
|
|
150
|
+
threat: number // Probability of being threat
|
|
79
151
|
}
|
|
80
152
|
}
|
|
81
153
|
```
|
|
82
154
|
|
|
83
|
-
|
|
155
|
+
### `check(prompt)` - Legacy
|
|
84
156
|
|
|
85
|
-
|
|
157
|
+
Backward compatible function that performs injection detection only.
|
|
158
|
+
|
|
159
|
+
## Advanced Usage Examples
|
|
160
|
+
|
|
161
|
+
### Production-Ready Security Gateway
|
|
86
162
|
|
|
87
163
|
```javascript
|
|
88
|
-
import {
|
|
89
|
-
import { openai } from "your-llm-client";
|
|
164
|
+
import { checkAll } from "llm_guardrail";
|
|
90
165
|
|
|
91
|
-
async function
|
|
92
|
-
|
|
93
|
-
|
|
166
|
+
async function securityGateway(userMessage, options = {}) {
|
|
167
|
+
const {
|
|
168
|
+
strictMode = false,
|
|
169
|
+
logThreats = true,
|
|
170
|
+
customThreshold = null,
|
|
171
|
+
} = options;
|
|
172
|
+
|
|
173
|
+
try {
|
|
174
|
+
const analysis = await checkAll(userMessage);
|
|
175
|
+
|
|
176
|
+
// Custom risk assessment
|
|
177
|
+
const riskThreshold = customThreshold || (strictMode ? 0.3 : 0.7);
|
|
178
|
+
const highRisk = analysis.maxThreatConfidence > riskThreshold;
|
|
179
|
+
|
|
180
|
+
if (logThreats && analysis.threatsDetected.length > 0) {
|
|
181
|
+
console.warn("SECURITY ALERT:", {
|
|
182
|
+
threats: analysis.threatsDetected,
|
|
183
|
+
confidence: analysis.maxThreatConfidence,
|
|
184
|
+
risk: analysis.overallRisk,
|
|
185
|
+
message: userMessage.substring(0, 100) + "...",
|
|
186
|
+
});
|
|
187
|
+
}
|
|
94
188
|
|
|
95
|
-
if (!guardResult.allowed) {
|
|
96
189
|
return {
|
|
97
|
-
|
|
98
|
-
|
|
190
|
+
allowed: analysis.allowed && !highRisk,
|
|
191
|
+
analysis,
|
|
192
|
+
action: highRisk ? "block" : "allow",
|
|
193
|
+
reason: highRisk ? `${analysis.overallRisk} risk detected` : "safe",
|
|
99
194
|
};
|
|
195
|
+
} catch (error) {
|
|
196
|
+
console.error("Security gateway error:", error);
|
|
197
|
+
return { allowed: false, action: "block", reason: "security check failed" };
|
|
100
198
|
}
|
|
199
|
+
}
|
|
101
200
|
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
201
|
+
// Usage
|
|
202
|
+
const result = await securityGateway(userInput, { strictMode: true });
|
|
203
|
+
if (result.allowed) {
|
|
204
|
+
// Proceed with LLM call
|
|
205
|
+
console.log("Message approved for processing");
|
|
206
|
+
} else {
|
|
207
|
+
console.log(`BLOCKED: ${result.reason}`);
|
|
208
|
+
}
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
### Targeted Threat Detection
|
|
212
|
+
|
|
213
|
+
```javascript
|
|
214
|
+
import { checkInjection, checkJailbreak, checkMalicious } from "llm_guardrail";
|
|
215
|
+
|
|
216
|
+
// Educational content filter
|
|
217
|
+
async function moderateEducationalContent(content) {
|
|
218
|
+
const [injection, malicious] = await Promise.all([
|
|
219
|
+
checkInjection(content),
|
|
220
|
+
checkMalicious(content),
|
|
221
|
+
]);
|
|
222
|
+
|
|
223
|
+
if (injection.detected) {
|
|
224
|
+
return { approved: false, reason: "potential system manipulation" };
|
|
225
|
+
}
|
|
107
226
|
|
|
108
|
-
|
|
227
|
+
if (malicious.detected && malicious.confidence > 0.6) {
|
|
228
|
+
return { approved: false, reason: "inappropriate content" };
|
|
229
|
+
}
|
|
230
|
+
|
|
231
|
+
return { approved: true, reason: "content approved" };
|
|
232
|
+
}
|
|
233
|
+
|
|
234
|
+
// Customer service filter
|
|
235
|
+
async function moderateCustomerService(message) {
|
|
236
|
+
// Allow slightly higher tolerance for jailbreak attempts in customer service
|
|
237
|
+
const [injection, jailbreak, malicious] = await Promise.all([
|
|
238
|
+
checkInjection(message),
|
|
239
|
+
checkJailbreak(message),
|
|
240
|
+
checkMalicious(message),
|
|
241
|
+
]);
|
|
242
|
+
|
|
243
|
+
const threats = [];
|
|
244
|
+
if (injection.confidence > 0.8) threats.push("injection");
|
|
245
|
+
if (jailbreak.confidence > 0.9) threats.push("jailbreak"); // Higher threshold
|
|
246
|
+
if (malicious.confidence > 0.7) threats.push("malicious");
|
|
247
|
+
|
|
248
|
+
return {
|
|
249
|
+
escalate: threats.length > 0,
|
|
250
|
+
threats,
|
|
251
|
+
confidence: Math.max(
|
|
252
|
+
injection.confidence,
|
|
253
|
+
jailbreak.confidence,
|
|
254
|
+
malicious.confidence,
|
|
255
|
+
),
|
|
256
|
+
};
|
|
109
257
|
}
|
|
110
258
|
```
|
|
111
259
|
|
|
112
|
-
###
|
|
260
|
+
### Real-time Chat Protection
|
|
113
261
|
|
|
114
262
|
```javascript
|
|
115
|
-
import {
|
|
263
|
+
import { checkAll } from "llm_guardrail";
|
|
264
|
+
|
|
265
|
+
class ChatModerator {
|
|
266
|
+
constructor(options = {}) {
|
|
267
|
+
this.strictMode = options.strictMode || false;
|
|
268
|
+
this.rateLimiter = new Map(); // Simple rate limiting
|
|
269
|
+
}
|
|
116
270
|
|
|
117
|
-
async
|
|
118
|
-
|
|
271
|
+
async moderateMessage(userId, message) {
|
|
272
|
+
// Rate limiting check
|
|
273
|
+
const now = Date.now();
|
|
274
|
+
const userHistory = this.rateLimiter.get(userId) || [];
|
|
275
|
+
const recentRequests = userHistory.filter((time) => now - time < 60000);
|
|
119
276
|
|
|
120
|
-
|
|
121
|
-
|
|
277
|
+
if (recentRequests.length > 10) {
|
|
278
|
+
return { allowed: false, reason: "rate limit exceeded" };
|
|
279
|
+
}
|
|
280
|
+
|
|
281
|
+
// Update rate limiter
|
|
282
|
+
recentRequests.push(now);
|
|
283
|
+
this.rateLimiter.set(userId, recentRequests);
|
|
284
|
+
|
|
285
|
+
// Security check
|
|
286
|
+
const analysis = await checkAll(message);
|
|
287
|
+
|
|
288
|
+
// Special handling for different threat types
|
|
289
|
+
if (analysis.injection.detected) {
|
|
290
|
+
return {
|
|
291
|
+
allowed: false,
|
|
292
|
+
reason: "prompt injection detected",
|
|
293
|
+
action: "warn_admin",
|
|
294
|
+
analysis,
|
|
295
|
+
};
|
|
296
|
+
}
|
|
297
|
+
|
|
298
|
+
if (analysis.jailbreak.detected && analysis.jailbreak.confidence > 0.8) {
|
|
299
|
+
return {
|
|
300
|
+
allowed: false,
|
|
301
|
+
reason: "jailbreak attempt detected",
|
|
302
|
+
action: "temporary_restriction",
|
|
303
|
+
analysis,
|
|
304
|
+
};
|
|
305
|
+
}
|
|
306
|
+
|
|
307
|
+
if (analysis.malicious.detected) {
|
|
308
|
+
return {
|
|
309
|
+
allowed: false,
|
|
310
|
+
reason: "inappropriate content",
|
|
311
|
+
action: "content_filter",
|
|
312
|
+
analysis,
|
|
313
|
+
};
|
|
314
|
+
}
|
|
315
|
+
|
|
316
|
+
return { allowed: true, analysis };
|
|
317
|
+
}
|
|
318
|
+
}
|
|
319
|
+
|
|
320
|
+
// Usage
|
|
321
|
+
const moderator = new ChatModerator({ strictMode: true });
|
|
322
|
+
const result = await moderator.moderateMessage("user123", userMessage);
|
|
323
|
+
```
|
|
324
|
+
|
|
325
|
+
### Multi-Language Enterprise Setup
|
|
326
|
+
|
|
327
|
+
```javascript
|
|
328
|
+
import { checkAll } from "llm_guardrail";
|
|
329
|
+
|
|
330
|
+
class EnterpriseSecurityLayer {
|
|
331
|
+
constructor(config = {}) {
|
|
332
|
+
this.config = {
|
|
333
|
+
enableAuditLog: config.enableAuditLog || true,
|
|
334
|
+
alertWebhook: config.alertWebhook || null,
|
|
335
|
+
bypassUsers: config.bypassUsers || [],
|
|
336
|
+
...config,
|
|
337
|
+
};
|
|
338
|
+
this.auditLog = [];
|
|
339
|
+
}
|
|
340
|
+
|
|
341
|
+
async validateRequest(userId, prompt, metadata = {}) {
|
|
342
|
+
const timestamp = new Date().toISOString();
|
|
343
|
+
|
|
344
|
+
// Bypass check for admin users
|
|
345
|
+
if (this.config.bypassUsers.includes(userId)) {
|
|
346
|
+
return { allowed: true, reason: "admin bypass" };
|
|
347
|
+
}
|
|
122
348
|
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
349
|
+
const analysis = await checkAll(prompt);
|
|
350
|
+
|
|
351
|
+
// Audit logging
|
|
352
|
+
if (this.config.enableAuditLog) {
|
|
353
|
+
this.auditLog.push({
|
|
354
|
+
timestamp,
|
|
355
|
+
userId,
|
|
356
|
+
promptLength: prompt.length,
|
|
357
|
+
analysis,
|
|
358
|
+
metadata,
|
|
359
|
+
allowed: analysis.allowed,
|
|
360
|
+
});
|
|
361
|
+
}
|
|
362
|
+
|
|
363
|
+
// Alert on high-risk threats
|
|
364
|
+
if (analysis.overallRisk === "high" && this.config.alertWebhook) {
|
|
365
|
+
await this.sendAlert({
|
|
366
|
+
level: "HIGH",
|
|
367
|
+
userId,
|
|
368
|
+
threats: analysis.threatsDetected,
|
|
369
|
+
confidence: analysis.maxThreatConfidence,
|
|
370
|
+
timestamp,
|
|
371
|
+
});
|
|
372
|
+
}
|
|
373
|
+
|
|
374
|
+
return {
|
|
375
|
+
allowed: analysis.allowed,
|
|
376
|
+
riskLevel: analysis.overallRisk,
|
|
377
|
+
threats: analysis.threatsDetected,
|
|
378
|
+
confidence: analysis.maxThreatConfidence,
|
|
379
|
+
requestId: `${userId}-${Date.now()}`,
|
|
380
|
+
};
|
|
381
|
+
}
|
|
382
|
+
|
|
383
|
+
async sendAlert(alertData) {
|
|
384
|
+
try {
|
|
385
|
+
// Implementation depends on your alerting system
|
|
386
|
+
console.warn("SECURITY ALERT:", alertData);
|
|
387
|
+
} catch (error) {
|
|
388
|
+
console.error("Failed to send security alert:", error);
|
|
389
|
+
}
|
|
390
|
+
}
|
|
391
|
+
|
|
392
|
+
getAuditReport(timeRange = "24h") {
|
|
393
|
+
const now = Date.now();
|
|
394
|
+
const cutoff = now - (timeRange === "24h" ? 86400000 : 3600000);
|
|
395
|
+
|
|
396
|
+
return this.auditLog
|
|
397
|
+
.filter((entry) => new Date(entry.timestamp).getTime() > cutoff)
|
|
398
|
+
.reduce(
|
|
399
|
+
(report, entry) => {
|
|
400
|
+
report.total++;
|
|
401
|
+
if (!entry.allowed) report.blocked++;
|
|
402
|
+
entry.analysis.threatsDetected.forEach((threat) => {
|
|
403
|
+
report.threatCounts[threat] =
|
|
404
|
+
(report.threatCounts[threat] || 0) + 1;
|
|
405
|
+
});
|
|
406
|
+
return report;
|
|
407
|
+
},
|
|
408
|
+
{ total: 0, blocked: 0, threatCounts: {} },
|
|
409
|
+
);
|
|
410
|
+
}
|
|
411
|
+
}
|
|
412
|
+
```
|
|
413
|
+
|
|
414
|
+
### Error Handling & Fallbacks
|
|
415
|
+
|
|
416
|
+
```javascript
|
|
417
|
+
import { checkAll, checkInjection } from "llm_guardrail";
|
|
418
|
+
|
|
419
|
+
async function robustSecurityCheck(prompt, fallbackStrategy = "block") {
|
|
420
|
+
try {
|
|
421
|
+
// Primary check with timeout
|
|
422
|
+
const timeoutPromise = new Promise((_, reject) =>
|
|
423
|
+
setTimeout(() => reject(new Error("Security check timeout")), 5000),
|
|
126
424
|
);
|
|
127
|
-
|
|
425
|
+
|
|
426
|
+
const result = await Promise.race([checkAll(prompt), timeoutPromise]);
|
|
427
|
+
|
|
428
|
+
return result;
|
|
429
|
+
} catch (error) {
|
|
430
|
+
console.error("Security check failed:", error.message);
|
|
431
|
+
|
|
432
|
+
// Fallback strategies
|
|
433
|
+
switch (fallbackStrategy) {
|
|
434
|
+
case "allow":
|
|
435
|
+
console.warn("WARNING: Security check failed - allowing by default");
|
|
436
|
+
return { allowed: true, fallback: true, error: error.message };
|
|
437
|
+
|
|
438
|
+
case "basic":
|
|
439
|
+
try {
|
|
440
|
+
// Fallback to basic injection check only
|
|
441
|
+
const basicResult = await checkInjection(prompt);
|
|
442
|
+
return { ...basicResult, fallback: true, fallbackType: "basic" };
|
|
443
|
+
} catch (fallbackError) {
|
|
444
|
+
return {
|
|
445
|
+
allowed: false,
|
|
446
|
+
fallback: true,
|
|
447
|
+
error: fallbackError.message,
|
|
448
|
+
};
|
|
449
|
+
}
|
|
450
|
+
|
|
451
|
+
case "block":
|
|
452
|
+
default:
|
|
453
|
+
console.warn("SECURITY CHECK FAILED - blocking by default");
|
|
454
|
+
return { allowed: false, fallback: true, error: error.message };
|
|
455
|
+
}
|
|
128
456
|
}
|
|
457
|
+
}
|
|
458
|
+
```
|
|
459
|
+
|
|
460
|
+
## Technical Architecture
|
|
129
461
|
|
|
130
|
-
|
|
462
|
+
### Multi-Model Security System
|
|
463
|
+
|
|
464
|
+
- **Specialized Models**: Three dedicated models trained on different threat datasets
|
|
465
|
+
- `prompt_injection_model.json` - Detects system prompt manipulation
|
|
466
|
+
- `jailbreak_model.json` - Identifies safety bypass attempts
|
|
467
|
+
- `malicious_model.json` - Filters harmful content requests
|
|
468
|
+
|
|
469
|
+
### Core Components
|
|
470
|
+
|
|
471
|
+
- **TF-IDF Vectorization**: Advanced text feature extraction with n-gram support
|
|
472
|
+
- **Logistic Regression**: Optimized binary classification for each threat type
|
|
473
|
+
- **Parallel Processing**: Concurrent model execution for maximum throughput
|
|
474
|
+
- **Smart Caching**: Models loaded once and reused across requests
|
|
475
|
+
|
|
476
|
+
### Performance Benchmarks
|
|
477
|
+
|
|
478
|
+
| Metric | Value |
|
|
479
|
+
| ----------------- | ---------------------------- |
|
|
480
|
+
| **Response Time** | < 5ms (all three models) |
|
|
481
|
+
| **Memory Usage** | ~15MB (total footprint) |
|
|
482
|
+
| **Accuracy** | >95% across all threat types |
|
|
483
|
+
| **Throughput** | 10,000+ checks/second |
|
|
484
|
+
| **Cold Start** | ~50ms (first request) |
|
|
485
|
+
|
|
486
|
+
### Security Models
|
|
487
|
+
|
|
488
|
+
#### Prompt Injection Detection
|
|
489
|
+
|
|
490
|
+
Trained on datasets containing:
|
|
491
|
+
|
|
492
|
+
- System prompt manipulation attempts
|
|
493
|
+
- Instruction override patterns
|
|
494
|
+
- Context confusion attacks
|
|
495
|
+
- Role hijacking attempts
|
|
496
|
+
|
|
497
|
+
#### Jailbreak Prevention
|
|
498
|
+
|
|
499
|
+
Specialized for detecting:
|
|
500
|
+
|
|
501
|
+
- "DAN" and similar personas
|
|
502
|
+
- Ethical guideline bypass attempts
|
|
503
|
+
- Roleplay-based circumvention
|
|
504
|
+
- Authority figure impersonation
|
|
505
|
+
|
|
506
|
+
#### Malicious Content Filtering
|
|
507
|
+
|
|
508
|
+
Identifies requests for:
|
|
509
|
+
|
|
510
|
+
- Harmful instructions
|
|
511
|
+
- Illegal activities
|
|
512
|
+
- Violence and threats
|
|
513
|
+
- Privacy violations
|
|
514
|
+
|
|
515
|
+
## Error Handling Best Practices
|
|
516
|
+
|
|
517
|
+
```javascript
|
|
518
|
+
import { checkAll } from "llm_guardrail";
|
|
519
|
+
|
|
520
|
+
// Production-ready error handling
|
|
521
|
+
async function safeSecurityCheck(prompt, options = {}) {
|
|
522
|
+
const { timeout = 5000, retries = 2, fallbackStrategy = "block" } = options;
|
|
523
|
+
|
|
524
|
+
for (let attempt = 1; attempt <= retries + 1; attempt++) {
|
|
525
|
+
try {
|
|
526
|
+
const timeoutPromise = new Promise((_, reject) =>
|
|
527
|
+
setTimeout(() => reject(new Error("Timeout")), timeout),
|
|
528
|
+
);
|
|
529
|
+
|
|
530
|
+
const result = await Promise.race([checkAll(prompt), timeoutPromise]);
|
|
531
|
+
|
|
532
|
+
return { success: true, ...result };
|
|
533
|
+
} catch (error) {
|
|
534
|
+
if (attempt <= retries) {
|
|
535
|
+
console.warn(`Security check attempt ${attempt} failed, retrying...`);
|
|
536
|
+
continue;
|
|
537
|
+
}
|
|
538
|
+
|
|
539
|
+
// All retries failed - implement fallback
|
|
540
|
+
console.error("All security check attempts failed:", error.message);
|
|
541
|
+
|
|
542
|
+
return {
|
|
543
|
+
success: false,
|
|
544
|
+
error: error.message,
|
|
545
|
+
allowed: fallbackStrategy === "allow",
|
|
546
|
+
fallback: true,
|
|
547
|
+
};
|
|
548
|
+
}
|
|
549
|
+
}
|
|
131
550
|
}
|
|
132
551
|
```
|
|
133
552
|
|
|
134
|
-
|
|
553
|
+
## Migration Guide
|
|
554
|
+
|
|
555
|
+
### From v1.x to v2.1.0
|
|
556
|
+
|
|
557
|
+
#### Breaking Changes
|
|
558
|
+
|
|
559
|
+
- Model file renamed: `model_data.json` → `prompt_injection_model.json`
|
|
560
|
+
- Return object structure updated for consistency
|
|
561
|
+
|
|
562
|
+
#### Migration Steps
|
|
135
563
|
|
|
136
564
|
```javascript
|
|
565
|
+
// OLD (v1.x)
|
|
137
566
|
import { check } from "llm_guardrail";
|
|
567
|
+
const result = await check(prompt);
|
|
568
|
+
// result.injective, result.probabilities.injection
|
|
138
569
|
|
|
139
|
-
|
|
140
|
-
|
|
570
|
+
// NEW (v2.1.0) - Backward Compatible
|
|
571
|
+
import { check } from "llm_guardrail";
|
|
572
|
+
const result = await check(prompt);
|
|
573
|
+
// result.detected, result.probabilities.threat
|
|
141
574
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
575
|
+
// RECOMMENDED (v2.1.0) - New API
|
|
576
|
+
import { checkAll } from "llm_guardrail";
|
|
577
|
+
const result = await checkAll(prompt);
|
|
578
|
+
// result.injection.detected, result.overallRisk
|
|
579
|
+
```
|
|
580
|
+
|
|
581
|
+
#### Feature Additions
|
|
582
|
+
|
|
583
|
+
```javascript
|
|
584
|
+
// New comprehensive checking
|
|
585
|
+
const analysis = await checkAll(prompt);
|
|
586
|
+
console.log("Risk Level:", analysis.overallRisk);
|
|
587
|
+
console.log("Threats Found:", analysis.threatsDetected);
|
|
588
|
+
|
|
589
|
+
// Individual threat checking
|
|
590
|
+
const injection = await checkInjection(prompt);
|
|
591
|
+
const jailbreak = await checkJailbreak(prompt);
|
|
592
|
+
const malicious = await checkMalicious(prompt);
|
|
593
|
+
```
|
|
594
|
+
|
|
595
|
+
## Configuration Options
|
|
596
|
+
|
|
597
|
+
### Custom Risk Thresholds
|
|
598
|
+
|
|
599
|
+
```javascript
|
|
600
|
+
// Define your own risk assessment logic
|
|
601
|
+
function customRiskAssessment(analysis, context = {}) {
|
|
602
|
+
const { userTrust = 0, contentType = "general" } = context;
|
|
603
|
+
|
|
604
|
+
// Adjust thresholds based on context
|
|
605
|
+
const baseThreshold = contentType === "education" ? 0.8 : 0.5;
|
|
606
|
+
const adjustedThreshold = Math.max(0.1, baseThreshold - userTrust);
|
|
607
|
+
|
|
608
|
+
return {
|
|
609
|
+
allowed: analysis.maxThreatConfidence < adjustedThreshold,
|
|
610
|
+
risk: analysis.overallRisk,
|
|
611
|
+
customScore: analysis.maxThreatConfidence / adjustedThreshold,
|
|
612
|
+
};
|
|
613
|
+
}
|
|
614
|
+
```
|
|
615
|
+
|
|
616
|
+
### Integration Patterns
|
|
617
|
+
|
|
618
|
+
#### Express.js Middleware
|
|
619
|
+
|
|
620
|
+
```javascript
|
|
621
|
+
import express from "express";
|
|
622
|
+
import { checkAll } from "llm_guardrail";
|
|
623
|
+
|
|
624
|
+
const app = express();
|
|
625
|
+
|
|
626
|
+
const securityMiddleware = async (req, res, next) => {
|
|
627
|
+
try {
|
|
628
|
+
const { message } = req.body;
|
|
629
|
+
const analysis = await checkAll(message);
|
|
630
|
+
|
|
631
|
+
if (!analysis.allowed) {
|
|
632
|
+
return res.status(400).json({
|
|
633
|
+
error: "Content blocked by security filters",
|
|
634
|
+
reason: `${analysis.overallRisk} risk detected`,
|
|
635
|
+
threats: analysis.threatsDetected,
|
|
636
|
+
});
|
|
637
|
+
}
|
|
638
|
+
|
|
639
|
+
req.securityAnalysis = analysis;
|
|
640
|
+
next();
|
|
641
|
+
} catch (error) {
|
|
642
|
+
console.error("Security middleware error:", error);
|
|
643
|
+
res.status(500).json({ error: "Security check failed" });
|
|
644
|
+
}
|
|
645
|
+
};
|
|
646
|
+
|
|
647
|
+
app.post("/chat", securityMiddleware, async (req, res) => {
|
|
648
|
+
// Process secure message
|
|
649
|
+
const response = await processMessage(req.body.message);
|
|
650
|
+
res.json({ response, security: req.securityAnalysis });
|
|
651
|
+
});
|
|
652
|
+
```
|
|
653
|
+
|
|
654
|
+
#### WebSocket Security
|
|
655
|
+
|
|
656
|
+
```javascript
|
|
657
|
+
import WebSocket from "ws";
|
|
658
|
+
import { checkAll } from "llm_guardrail";
|
|
659
|
+
|
|
660
|
+
const wss = new WebSocket.Server({ port: 8080 });
|
|
661
|
+
|
|
662
|
+
wss.on("connection", (ws) => {
|
|
663
|
+
ws.on("message", async (data) => {
|
|
664
|
+
try {
|
|
665
|
+
const message = JSON.parse(data);
|
|
666
|
+
const analysis = await checkAll(message.text);
|
|
667
|
+
|
|
668
|
+
if (analysis.allowed) {
|
|
669
|
+
// Process and broadcast safe message
|
|
670
|
+
wss.clients.forEach((client) => {
|
|
671
|
+
if (client.readyState === WebSocket.OPEN) {
|
|
672
|
+
client.send(
|
|
673
|
+
JSON.stringify({
|
|
674
|
+
type: "message",
|
|
675
|
+
text: message.text,
|
|
676
|
+
user: message.user,
|
|
677
|
+
}),
|
|
678
|
+
);
|
|
679
|
+
}
|
|
680
|
+
});
|
|
681
|
+
} else {
|
|
682
|
+
// Notify sender of blocked content
|
|
683
|
+
ws.send(
|
|
684
|
+
JSON.stringify({
|
|
685
|
+
type: "error",
|
|
686
|
+
message: "Message blocked by security filters",
|
|
687
|
+
threats: analysis.threatsDetected,
|
|
688
|
+
}),
|
|
689
|
+
);
|
|
690
|
+
}
|
|
691
|
+
} catch (error) {
|
|
692
|
+
ws.send(
|
|
693
|
+
JSON.stringify({
|
|
694
|
+
type: "error",
|
|
695
|
+
message: "Failed to process message",
|
|
696
|
+
}),
|
|
697
|
+
);
|
|
698
|
+
}
|
|
699
|
+
});
|
|
700
|
+
});
|
|
701
|
+
```
|
|
702
|
+
|
|
703
|
+
## Monitoring & Analytics
|
|
704
|
+
|
|
705
|
+
### Security Metrics Collection
|
|
706
|
+
|
|
707
|
+
```javascript
|
|
708
|
+
import { checkAll } from "llm_guardrail";
|
|
709
|
+
|
|
710
|
+
class SecurityMetrics {
|
|
711
|
+
constructor() {
|
|
712
|
+
this.metrics = {
|
|
713
|
+
totalChecks: 0,
|
|
714
|
+
threatsBlocked: 0,
|
|
715
|
+
threatTypes: {},
|
|
716
|
+
averageResponseTime: 0,
|
|
717
|
+
falsePositives: 0,
|
|
718
|
+
};
|
|
719
|
+
}
|
|
720
|
+
|
|
721
|
+
async checkWithMetrics(prompt, metadata = {}) {
|
|
722
|
+
const startTime = Date.now();
|
|
723
|
+
|
|
724
|
+
try {
|
|
725
|
+
const result = await checkAll(prompt);
|
|
726
|
+
const responseTime = Date.now() - startTime;
|
|
727
|
+
|
|
728
|
+
// Update metrics
|
|
729
|
+
this.metrics.totalChecks++;
|
|
730
|
+
this.metrics.averageResponseTime =
|
|
731
|
+
(this.metrics.averageResponseTime * (this.metrics.totalChecks - 1) +
|
|
732
|
+
responseTime) /
|
|
733
|
+
this.metrics.totalChecks;
|
|
734
|
+
|
|
735
|
+
if (!result.allowed) {
|
|
736
|
+
this.metrics.threatsBlocked++;
|
|
737
|
+
result.threatsDetected.forEach((threat) => {
|
|
738
|
+
this.metrics.threatTypes[threat] =
|
|
739
|
+
(this.metrics.threatTypes[threat] || 0) + 1;
|
|
740
|
+
});
|
|
741
|
+
}
|
|
742
|
+
|
|
743
|
+
return {
|
|
744
|
+
...result,
|
|
745
|
+
responseTime,
|
|
746
|
+
metrics: this.getSnapshot(),
|
|
747
|
+
};
|
|
748
|
+
} catch (error) {
|
|
749
|
+
console.error("Security check with metrics failed:", error);
|
|
750
|
+
throw error;
|
|
751
|
+
}
|
|
752
|
+
}
|
|
753
|
+
|
|
754
|
+
getSnapshot() {
|
|
755
|
+
return {
|
|
756
|
+
...this.metrics,
|
|
757
|
+
blockRate:
|
|
758
|
+
(
|
|
759
|
+
(this.metrics.threatsBlocked / this.metrics.totalChecks) *
|
|
760
|
+
100
|
|
761
|
+
).toFixed(2) + "%",
|
|
762
|
+
topThreats: Object.entries(this.metrics.threatTypes)
|
|
763
|
+
.sort(([, a], [, b]) => b - a)
|
|
764
|
+
.slice(0, 3),
|
|
765
|
+
};
|
|
766
|
+
}
|
|
767
|
+
}
|
|
768
|
+
```
|
|
769
|
+
|
|
770
|
+
## Community & Support
|
|
771
|
+
|
|
772
|
+
- **Discord Community**: [Join our active community](https://discord.gg/xV8e3TFrFU)
|
|
773
|
+
- Get help with implementation
|
|
774
|
+
- Share use cases and feedback
|
|
775
|
+
- Early access to new features
|
|
776
|
+
- Direct developer support
|
|
777
|
+
|
|
778
|
+
- **GitHub Issues**: [Report bugs and request features](https://github.com/Frank2006x/llm_Guardrails/issues)
|
|
779
|
+
- **Documentation**: [Full API documentation](https://github.com/Frank2006x/llm_Guardrails#readme)
|
|
780
|
+
- **Enterprise Support**: Available for high-volume deployments
|
|
781
|
+
|
|
782
|
+
## Roadmap v2.2+
|
|
783
|
+
|
|
784
|
+
### Planned Features
|
|
785
|
+
|
|
786
|
+
- [ ] **Custom Model Training**: Train models on your specific data
|
|
787
|
+
- [ ] **Real-time Model Updates**: Download updated models automatically
|
|
788
|
+
- [ ] **Multi-language Support**: Models for non-English content
|
|
789
|
+
- [ ] **Severity Scoring**: Granular threat severity levels
|
|
790
|
+
- [ ] **Content Categories**: Detailed classification beyond binary detection
|
|
791
|
+
- [ ] **Performance Dashboard**: Built-in metrics visualization
|
|
792
|
+
- [ ] **Cloud Integration**: Optional cloud-based model updates
|
|
793
|
+
|
|
794
|
+
### Integration Roadmap
|
|
795
|
+
|
|
796
|
+
- [ ] **LangChain Plugin**: Native LangChain integration
|
|
797
|
+
- [ ] **OpenAI Wrapper**: Direct OpenAI API proxy with built-in protection
|
|
798
|
+
- [ ] **Anthropic Integration**: Claude-specific optimizations
|
|
799
|
+
- [ ] **Azure OpenAI**: Enterprise Azure integration
|
|
800
|
+
- [ ] **AWS Bedrock**: Native AWS Bedrock support
|
|
801
|
+
|
|
802
|
+
## Performance Tips
|
|
803
|
+
|
|
804
|
+
### Production Optimization
|
|
805
|
+
|
|
806
|
+
```javascript
|
|
807
|
+
// Model preloading for better cold start performance
|
|
808
|
+
import { checkInjection } from "llm_guardrail";
|
|
809
|
+
|
|
810
|
+
// Preload models during application startup
|
|
811
|
+
async function warmupModels() {
|
|
812
|
+
console.log("Warming up security models...");
|
|
813
|
+
await Promise.all([
|
|
814
|
+
checkInjection("test"),
|
|
815
|
+
checkJailbreak("test"),
|
|
816
|
+
checkMalicious("test"),
|
|
817
|
+
]);
|
|
818
|
+
console.log("Models ready");
|
|
819
|
+
}
|
|
820
|
+
|
|
821
|
+
// Call during app initialization
|
|
822
|
+
await warmupModels();
|
|
823
|
+
```
|
|
824
|
+
|
|
825
|
+
### Batch Processing
|
|
826
|
+
|
|
827
|
+
```javascript
|
|
828
|
+
// For high-throughput scenarios
|
|
829
|
+
async function batchSecurityCheck(prompts) {
|
|
830
|
+
const results = await Promise.allSettled(
|
|
831
|
+
prompts.map((prompt) => checkAll(prompt)),
|
|
148
832
|
);
|
|
149
|
-
console.log(` Recommendation: ${result.allowed ? "✅ Allow" : "🚫 Block"}`);
|
|
150
833
|
|
|
151
|
-
return result
|
|
834
|
+
return results.map((result, index) => ({
|
|
835
|
+
prompt: prompts[index],
|
|
836
|
+
success: result.status === "fulfilled",
|
|
837
|
+
analysis: result.status === "fulfilled" ? result.value : null,
|
|
838
|
+
error: result.status === "rejected" ? result.reason : null,
|
|
839
|
+
}));
|
|
152
840
|
}
|
|
153
841
|
```
|
|
154
842
|
|
|
155
|
-
##
|
|
843
|
+
## License & Legal
|
|
844
|
+
|
|
845
|
+
- **License**: ISC License - see [LICENSE](https://github.com/Frank2006x/llm_Guardrails/blob/main/LICENSE)
|
|
846
|
+
- **Model Usage**: Models trained on public datasets with appropriate licenses
|
|
847
|
+
- **Privacy**: All processing happens locally - no data transmitted externally
|
|
848
|
+
- **Compliance**: GDPR and CCPA compliant (no data collection)
|
|
849
|
+
|
|
850
|
+
## Contributing
|
|
851
|
+
|
|
852
|
+
We welcome contributions from the community! Here's how you can help:
|
|
853
|
+
|
|
854
|
+
### Ways to Contribute
|
|
855
|
+
|
|
856
|
+
- **Bug Reports**: Help us identify and fix issues
|
|
857
|
+
- **Feature Requests**: Suggest new capabilities
|
|
858
|
+
- **Documentation**: Improve examples and guides
|
|
859
|
+
- **Testing**: Test edge cases and report findings
|
|
860
|
+
- **Code**: Submit pull requests for new features
|
|
861
|
+
|
|
862
|
+
### Development Setup
|
|
863
|
+
|
|
864
|
+
```bash
|
|
865
|
+
git clone https://github.com/Frank2006x/llm_Guardrails.git
|
|
866
|
+
cd llm_Guardrails
|
|
867
|
+
npm install
|
|
868
|
+
npm test
|
|
869
|
+
```
|
|
870
|
+
|
|
871
|
+
### Community Guidelines
|
|
872
|
+
|
|
873
|
+
- Be respectful and constructive
|
|
874
|
+
- Follow our code of conduct
|
|
875
|
+
- Test your changes thoroughly
|
|
876
|
+
- Document new features clearly
|
|
877
|
+
|
|
878
|
+
---
|
|
879
|
+
|
|
880
|
+
**⚠️ Important Security Notice**
|
|
881
|
+
|
|
882
|
+
LLM Guardrails provides robust protection but should be part of a comprehensive security strategy. Always:
|
|
883
|
+
|
|
884
|
+
- Implement multiple layers of security
|
|
885
|
+
- Monitor and log security events
|
|
886
|
+
- Keep models updated
|
|
887
|
+
- Validate inputs at multiple levels
|
|
888
|
+
- Have incident response procedures
|
|
156
889
|
|
|
157
|
-
|
|
890
|
+
**Remember**: No single security measure is 100% effective. Defense in depth is key.
|
|
158
891
|
|
|
159
|
-
- **TF-IDF Vectorization**: Converts text to numerical features
|
|
160
892
|
- **Logistic Regression**: ML model trained on prompt injection datasets
|
|
161
893
|
- **Local Processing**: No external API calls or data transmission
|
|
162
894
|
- **ES Module Support**: Modern JavaScript module system
|
|
@@ -177,7 +909,7 @@ The guardrail uses a machine learning approach trained to detect:
|
|
|
177
909
|
- Instruction injection
|
|
178
910
|
- Context manipulation
|
|
179
911
|
|
|
180
|
-
##
|
|
912
|
+
## Error Handling Best Practices
|
|
181
913
|
|
|
182
914
|
```javascript
|
|
183
915
|
import { check } from "llm_guardrail";
|
|
@@ -198,13 +930,13 @@ async function safeCheck(prompt) {
|
|
|
198
930
|
}
|
|
199
931
|
```
|
|
200
932
|
|
|
201
|
-
##
|
|
933
|
+
## Community & Support
|
|
202
934
|
|
|
203
935
|
- **Discord**: Join our community at [https://discord.gg/xV8e3TFrFU](https://discord.gg/xV8e3TFrFU)
|
|
204
936
|
- **GitHub Issues**: [Report bugs and request features](https://github.com/Frank2006x/llm_Guardrails/issues)
|
|
205
937
|
- **GitHub Repository**: [Source code and documentation](https://github.com/Frank2006x/llm_Guardrails)
|
|
206
938
|
|
|
207
|
-
##
|
|
939
|
+
## Roadmap v2.2+
|
|
208
940
|
|
|
209
941
|
- [ ] Multi-language support
|
|
210
942
|
- [ ] Custom model training utilities
|
|
@@ -212,11 +944,11 @@ async function safeCheck(prompt) {
|
|
|
212
944
|
- [ ] Performance analytics dashboard
|
|
213
945
|
- [ ] Integration examples for popular frameworks
|
|
214
946
|
|
|
215
|
-
##
|
|
947
|
+
## License & Legal
|
|
216
948
|
|
|
217
949
|
This project is licensed under the ISC License - see the package.json for details.
|
|
218
950
|
|
|
219
|
-
##
|
|
951
|
+
## Contributing
|
|
220
952
|
|
|
221
953
|
We welcome contributions! Please feel free to submit pull requests, report bugs, or suggest features through our GitHub repository or Discord community.
|
|
222
954
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "llm_guardrail",
|
|
3
|
-
"version": "2.1.
|
|
3
|
+
"version": "2.1.1",
|
|
4
4
|
"description": "A lightweight, low-latency ML-powered guardrail to stop prompt injection attacks before they reach your LLM.",
|
|
5
5
|
"homepage": "https://github.com/Frank2006x/llm_Guardrails#readme",
|
|
6
6
|
"bugs": {
|