ai-sdk-guardrails 5.0.0 → 5.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,10 +1,8 @@
1
1
  # AI SDK Guardrails
2
2
 
3
- **Input and output validation for the Vercel AI SDK**
3
+ ## Safety and quality controls for Vercel AI SDK
4
4
 
5
- Add safety checks and quality controls to your AI applications. Guard against prompt injection, prevent sensitive data leaks, and improve output reliability - all while keeping your existing AI SDK code unchanged.
6
-
7
- **Now includes MCP (Model Context Protocol) security guardrails** to help protect against attacks when using AI tools.
5
+ Add guardrails to your AI applications in one line of code. Block PII, prevent prompt injection, enforce output quality - while keeping your existing telemetry and observability stack intact.
8
6
 
9
7
  [![npm version](https://img.shields.io/npm/v/ai-sdk-guardrails.svg?logo=npm&label=npm)](https://www.npmjs.com/package/ai-sdk-guardrails)
10
8
  [![downloads](https://img.shields.io/npm/dw/ai-sdk-guardrails.svg?label=downloads)](https://www.npmjs.com/package/ai-sdk-guardrails)
@@ -14,52 +12,79 @@ Add safety checks and quality controls to your AI applications. Guard against pr
14
12
 
15
13
  ![Guardrails Demo](./media/guardrail-example.gif)
16
14
 
17
- ## Why this matters
15
+ ## Drop-in Guardrails for any AI model
18
16
 
19
- - **MCP**: Protect against prompt injection and data exfiltration when using MCP tools
20
- - **Agent**: Have more reliable and secure agentic workflows
21
- - **Tool security**: Protect against data exfiltration when using MCP tools
22
- - **Save costs**: Block unnecessary requests before they hit your model
23
- - **Improve safety**: Detect PII, block harmful content, prevent prompt injection
24
- - **Better quality**: Enforce minimum response lengths, validate structure, auto-retry on failures
25
- - **Easy integration**: Works as middleware with any AI SDK model
17
+ ```ts
18
+ import { withGuardrails, piiDetector } from 'ai-sdk-guardrails';
19
+ const model = openai('gpt-4o'); // or any other AI model
26
20
 
27
- ## Common use cases
21
+ // Everything else stays the same
22
+ const safeModel = withGuardrails(model, {
23
+ inputGuardrails: [piiDetector()],
24
+ });
28
25
 
29
- - Content moderation and safety filters
30
- - PII detection for compliance
31
- - Output quality requirements (length, format)
32
- - Prompt injection prevention
33
- - Tool usage validation
34
- - Auto-retry on low-quality responses
26
+ // Your existing code, telemetry, and logging still works
27
+ await generateText({ model: safeModel, prompt: '...' });
28
+ ```
35
29
 
36
- ## Secure AI in Under 60 Seconds
30
+ **That's it.** Your AI now blocks PII automatically.
37
31
 
38
- **Step 1:** Install (10 seconds)
32
+ ## Installation
39
33
 
40
34
  ```bash
41
35
  npm install ai-sdk-guardrails
42
36
  ```
43
37
 
44
- **Step 2:** Import (15 seconds)
38
+ ## 🧙‍♂️ No-Code Wizard (New!)
39
+
40
+ **Don't want to write code?** Use our visual wizard to configure guardrails:
41
+
42
+ 1. **Open the wizard**: [wizard-prototype/index.html](./wizard-prototype/index.html)
43
+ 2. **Choose your use case**: Content moderation, data protection, quality assurance, or security
44
+ 3. **Select guardrails**: Pick from 40+ built-in guardrails
45
+ 4. **Configure settings**: Adjust thresholds and parameters with sliders and toggles
46
+ 5. **Copy generated code**: Get production-ready TypeScript code instantly
47
+
48
+ **Perfect for:**
49
+
50
+ - 🎯 **Non-technical users** who need AI safety
51
+ - 🚀 **Quick prototyping** of guardrail configurations
52
+ - 📚 **Learning** how to use the library
53
+ - 👥 **Team onboarding** and training
54
+
55
+ The wizard generates code that works out of the box - just copy, paste, and run!
56
+
57
+ ## Why Guardrails Matter
58
+
59
+ Real problems that guardrails solve:
60
+
61
+ ❌ **Without guardrails:**
45
62
 
46
63
  ```ts
47
- import { withGuardrails, piiDetector } from 'ai-sdk-guardrails';
64
+ // User: "My email is john@company.com, help me..."
65
+ // → Sends PII to model → Compliance violation → $$$
48
66
  ```
49
67
 
50
- **Step 3:** Wrap your model (30 seconds)
68
+ **With guardrails:**
51
69
 
52
70
  ```ts
53
- const safeModel = withGuardrails(yourModel, {
54
- inputGuardrails: [piiDetector()],
71
+ const model = withGuardrails(baseModel, {
72
+ inputGuardrails: [piiDetector()], // Blocks before API call
55
73
  });
74
+ // → Request blocked → No PII leak → No cost → Compliant
56
75
  ```
57
76
 
58
- **Result:** Your AI now automatically blocks PII, prevents prompt injection, and validates outputs. That's it. No architecture changes, no security team required.
77
+ Common use cases:
78
+
79
+ - 🛡️ **Compliance**: Block PII before it reaches your model
80
+ - 💰 **Cost control**: Stop bad requests before they cost money
81
+ - 🔒 **Security**: Prevent prompt injection and data exfiltration
82
+ - ✅ **Quality**: Enforce minimum response standards
83
+ - 🔧 **Production**: Works with your existing observability tools
59
84
 
60
- ## TL;DR
85
+ ## Copy-Paste Examples
61
86
 
62
- Copy/paste minimal setup:
87
+ ### Basic Protection (Most Common)
63
88
 
64
89
  ```ts
65
90
  import { generateText } from 'ai';
@@ -68,142 +93,187 @@ import {
68
93
  withGuardrails,
69
94
  piiDetector,
70
95
  promptInjectionDetector,
71
- minLengthRequirement,
72
- mcpSecurityGuardrail,
73
96
  } from 'ai-sdk-guardrails';
74
97
 
75
98
  const model = withGuardrails(openai('gpt-4o'), {
76
99
  inputGuardrails: [piiDetector(), promptInjectionDetector()],
77
- outputGuardrails: [
78
- minLengthRequirement(160),
79
- mcpSecurityGuardrail({
80
- maxContentSize: 51200, // 50KB limit
81
- injectionThreshold: 0.7, // Configurable sensitivity
82
- allowedDomains: ['api.company.com'], // Domain allowlist
83
- }),
84
- ],
85
100
  });
86
101
 
102
+ // Use exactly like before - nothing else changes
87
103
  const { text } = await generateText({
88
104
  model,
89
- prompt: 'Write a friendly intro email.',
105
+ prompt: 'Write a friendly email',
90
106
  });
91
107
  ```
92
108
 
93
- See runnable examples: [examples/README.md](./examples/README.md)
94
-
95
- ## Quickstart (30 seconds)
109
+ ### Input + Output Protection
96
110
 
97
- Install with your provider (OpenAI shown):
111
+ ```ts
112
+ import {
113
+ withGuardrails,
114
+ piiDetector,
115
+ sensitiveDataFilter,
116
+ minLengthRequirement,
117
+ } from 'ai-sdk-guardrails';
98
118
 
99
- ```bash
100
- pnpm add ai-sdk-guardrails ai @ai-sdk/openai
101
- # or: npm i ai-sdk-guardrails ai @ai-sdk/openai
102
- # or: yarn add ai-sdk-guardrails ai @ai-sdk/openai
119
+ const model = withGuardrails(openai('gpt-4o'), {
120
+ inputGuardrails: [piiDetector()], // Block PII in prompts
121
+ outputGuardrails: [
122
+ sensitiveDataFilter(), // Remove secrets from responses
123
+ minLengthRequirement(100), // Enforce quality standards
124
+ ],
125
+ });
103
126
  ```
104
127
 
105
- Wrap your model and keep using `generateText` as usual:
128
+ ### Works With Streaming
106
129
 
107
130
  ```ts
108
- import { generateText } from 'ai';
109
- import { openai } from '@ai-sdk/openai';
110
- import { withGuardrails, piiDetector } from 'ai-sdk-guardrails';
131
+ import { streamText } from 'ai';
111
132
 
112
133
  const model = withGuardrails(openai('gpt-4o'), {
113
- inputGuardrails: [piiDetector()],
134
+ outputGuardrails: [minLengthRequirement(100)],
114
135
  });
115
136
 
116
- const { text } = await generateText({
117
- model,
118
- prompt: 'Write a friendly intro email.',
137
+ // Streaming just works - guardrails run after stream completes
138
+ const { textStream } = await streamText({ model, prompt: '...' });
139
+ for await (const chunk of textStream) {
140
+ process.stdout.write(chunk);
141
+ }
142
+ ```
143
+
144
+ ### Production Setup (With Error Handling)
145
+
146
+ ```ts
147
+ import { isGuardrailsError } from 'ai-sdk-guardrails';
148
+
149
+ const model = withGuardrails(openai('gpt-4o'), {
150
+ inputGuardrails: [piiDetector(), promptInjectionDetector()],
151
+ outputGuardrails: [sensitiveDataFilter()],
152
+ throwOnBlocked: true, // Throw errors instead of silent blocking
119
153
  });
154
+
155
+ try {
156
+ const { text } = await generateText({ model, prompt: '...' });
157
+ console.log(text);
158
+ } catch (error) {
159
+ if (isGuardrailsError(error)) {
160
+ console.error('Blocked by guardrail:', error.message);
161
+ // Show user-friendly message
162
+ }
163
+ }
120
164
  ```
121
165
 
122
- ## Contents
123
-
124
- - Overview
125
- - Concepts
126
- - Installation
127
- - Usage
128
- - Define a guardrail
129
- - Built-in helpers
130
- - Streaming
131
- - Auto Retry (utility and middleware)
132
- - Error Handling
133
- - API
134
- - Examples
135
- - Compatibility
136
- - Architecture
137
- - Contributing
138
-
139
- ## API Overview
166
+ ## How It Works
140
167
 
141
- ### Primary Functions
168
+ Guardrails run **in parallel** with your AI calls as middleware:
142
169
 
143
- - **`withGuardrails(model, config)`** - Main API for wrapping language models with guardrails
144
- - **`createGuardrails(config)`** - Factory to create reusable guardrail configurations
145
- - **`withAgentGuardrails(agentSettings, config)`** - Wrap AI SDK Agents with guardrails
170
+ ```mermaid
171
+ flowchart LR
172
+ A[Input] --> B[Input Guardrails]
173
+ B -->|✅ Clean| C[AI Model]
174
+ B -->|❌ Blocked| X[No API Call]
175
+ C --> D[Output Guardrails]
176
+ D -->|✅ Clean| E[Response]
177
+ D -->|❌ Blocked| R[Retry/Replace/Block]
178
+ ```
146
179
 
147
- ### Migration from v3.x
180
+ **Three-step workflow:**
148
181
 
149
- - `wrapWithGuardrails` `withGuardrails` (alias available, deprecated)
150
- - `wrapAgentWithGuardrails` `withAgentGuardrails` (alias available, deprecated)
151
- - Error classes: `InputBlockedError` `GuardrailsInputError`, `OutputBlockedError` → `GuardrailsOutputError`
182
+ 1. **Receive**: Input or output arrives
183
+ 2. **Check**: Guardrails run (PII detection, validation, etc.)
184
+ 3. **Decide**: Pass through, block, or retry
152
185
 
153
- ```ts
154
- // Before (v3.x - still works but deprecated)
155
- import { wrapWithGuardrails, InputBlockedError } from 'ai-sdk-guardrails';
156
- const model = wrapWithGuardrails(openai('gpt-4o'), { ... });
186
+ **Key benefit**: Non-invasive. Your existing telemetry, logging, and observability tools keep working because guardrails are just middleware.
157
187
 
158
- // After (v4.x - recommended)
159
- import { withGuardrails, GuardrailsInputError } from 'ai-sdk-guardrails';
160
- const model = withGuardrails(openai('gpt-4o'), { ... });
188
+ ## Built-in Guardrails
189
+
190
+ ### Input Guardrails (Run Before Model)
191
+
192
+ | Guardrail | Purpose | Example |
193
+ | --------------------------- | -------------------------------- | ------------------- |
194
+ | `piiDetector()` | Block emails, phones, SSNs | Compliance, privacy |
195
+ | `promptInjectionDetector()` | Detect injection attempts | Security |
196
+ | `blockedKeywords()` | Block specific terms | Content policy |
197
+ | `inputLengthLimit()` | Enforce max input length | Cost control |
198
+ | `rateLimiting()` | Per-user rate limits | Abuse prevention |
199
+ | `profanityFilter()` | Block offensive language | Content moderation |
200
+ | `toxicityDetector()` | Detect toxic content | Safety |
201
+ | `allowedToolsGuardrail()` | Restrict which tools can be used | Tool security |
202
+
203
+ ### Output Guardrails (Run After Model)
204
+
205
+ | Guardrail | Purpose | Example |
206
+ | ------------------------- | --------------------------- | ------------------------- |
207
+ | `sensitiveDataFilter()` | Remove secrets, API keys | Security |
208
+ | `minLengthRequirement()` | Enforce minimum length | Quality control |
209
+ | `outputLengthLimit()` | Enforce maximum length | Cost/UX control |
210
+ | `toxicityFilter()` | Block toxic responses | Safety |
211
+ | `jsonValidation()` | Validate JSON structure | Structured output |
212
+ | `schemaValidation()` | Validate against Zod schema | Type safety |
213
+ | `confidenceThreshold()` | Require minimum confidence | Quality |
214
+ | `hallucinationDetector()` | Detect uncertain claims | Accuracy |
215
+ | `secretRedaction()` | Redact secrets from output | Security |
216
+ | `mcpSecurityGuardrail()` | MCP tool security | Prevent data exfiltration |
217
+
218
+ ### MCP Security Guardrails
219
+
220
+ Protect against prompt injection and data exfiltration when using Model Context Protocol (MCP) tools:
221
+
222
+ ```ts
223
+ import { mcpSecurityGuardrail, mcpResponseSanitizer } from 'ai-sdk-guardrails';
161
224
 
162
- // Factory pattern (new in v4.x)
163
- import { createGuardrails } from 'ai-sdk-guardrails';
164
- const guards = createGuardrails({ ... });
165
- const model = guards(openai('gpt-4o'));
225
+ const model = withGuardrails(openai('gpt-4o'), {
226
+ outputGuardrails: [
227
+ mcpSecurityGuardrail({
228
+ detectExfiltration: true, // Detect data exfiltration attempts
229
+ scanEncodedContent: true, // Scan base64/hex encoded content
230
+ allowedDomains: ['api.company.com'], // Domain allowlist
231
+ maxContentSize: 51200, // 50KB limit
232
+ injectionThreshold: 0.7, // Sensitivity (lower = stricter)
233
+ }),
234
+ mcpResponseSanitizer(), // Clean malicious content vs blocking
235
+ ],
236
+ });
166
237
  ```
167
238
 
168
- ## Concepts
239
+ **Attack vectors prevented:**
169
240
 
170
- - Input guardrails: Validate or block prompts to save cost and enforce rules before the call.
171
- - Output guardrails: Check results for quality and safety. Block, replace, or retry as needed.
172
- - Middleware: Guardrails wrap any model via AI SDK middleware. Your app code stays the same.
241
+ - Direct prompt injection
242
+ - Tool response poisoning
243
+ - Data exfiltration via URLs
244
+ - ✅ Encoded attacks (base64/hex)
245
+ - ✅ Cascading exploits
246
+ - ✅ Context poisoning
173
247
 
174
- ## Installation
248
+ See [MCP Security documentation](#mcp-security-guardrails-advanced) for full details.
175
249
 
176
- See Quickstart for installation commands. Add providers you use as needed (e.g., `@ai-sdk/openai`, `@ai-sdk/mistral`).
250
+ ## Advanced Features
177
251
 
178
- ## Usage
252
+ ### Custom Guardrails
179
253
 
180
- ### Create custom guardrails
254
+ Create domain-specific guardrails:
181
255
 
182
256
  ```ts
183
- import { openai } from '@ai-sdk/openai';
184
- import {
185
- defineInputGuardrail,
186
- defineOutputGuardrail,
187
- withGuardrails,
188
- } from 'ai-sdk-guardrails';
189
- import { extractTextContent } from 'ai-sdk-guardrails/guardrails/input';
257
+ import { defineInputGuardrail, defineOutputGuardrail } from 'ai-sdk-guardrails';
190
258
  import { extractContent } from 'ai-sdk-guardrails/guardrails/output';
191
259
 
260
+ // Custom input guardrail
192
261
  const businessHours = defineInputGuardrail({
193
262
  name: 'business-hours',
194
- execute: async (params) => {
195
- const hr = new Date().getHours();
196
- return hr >= 9 && hr <= 17
263
+ execute: async () => {
264
+ const hour = new Date().getHours();
265
+ return hour >= 9 && hour <= 17
197
266
  ? { tripwireTriggered: false }
198
267
  : { tripwireTriggered: true, message: 'Outside business hours' };
199
268
  },
200
269
  });
201
270
 
271
+ // Custom output guardrail
202
272
  const minQuality = defineOutputGuardrail({
203
273
  name: 'min-quality',
204
274
  execute: async ({ result }) => {
205
275
  const { text } = extractContent(result);
206
- return text.length >= 80
276
+ return text.length >= 100
207
277
  ? { tripwireTriggered: false }
208
278
  : { tripwireTriggered: true, message: 'Response too short' };
209
279
  },
@@ -215,213 +285,114 @@ const model = withGuardrails(openai('gpt-4o'), {
215
285
  });
216
286
  ```
217
287
 
218
- ### Built-in helpers
288
+ ### Auto-Retry on Failures
289
+
290
+ Automatically retry when output doesn't meet requirements:
219
291
 
220
292
  ```ts
221
- import { openai } from '@ai-sdk/openai';
222
293
  import {
223
- withGuardrails,
224
- piiDetector,
225
- blockedKeywords,
226
- contentLengthLimit,
227
- promptInjectionDetector,
228
- sensitiveDataFilter,
294
+ wrapWithOutputGuardrails,
229
295
  minLengthRequirement,
230
- confidenceThreshold,
231
- mcpSecurityGuardrail,
232
- mcpResponseSanitizer,
233
296
  } from 'ai-sdk-guardrails';
234
297
 
235
- const model = withGuardrails(openai('gpt-4o'), {
236
- inputGuardrails: [
237
- piiDetector(),
238
- promptInjectionDetector({ threshold: 0.7 }),
239
- blockedKeywords(['test', 'spam']),
240
- contentLengthLimit(4000),
241
- ],
242
- outputGuardrails: [
243
- mcpSecurityGuardrail({
244
- detectExfiltration: true,
245
- scanEncodedContent: true,
246
- allowedDomains: ['trusted-api.com'],
247
- }),
248
- mcpResponseSanitizer(),
249
- sensitiveDataFilter(),
250
- minLengthRequirement(160),
251
- confidenceThreshold(0.6),
252
- ],
253
- });
254
- ```
255
-
256
- ## Streaming
257
-
258
- Works out of the box. By default, guardrails run after the stream ends (buffer mode). For early blocking, enable progressive mode.
259
-
260
- ```ts
261
- import { streamText } from 'ai';
262
- import { openai } from '@ai-sdk/openai';
263
- import { withGuardrails, minLengthRequirement } from 'ai-sdk-guardrails';
264
-
265
- const model = withGuardrails(openai('gpt-4o'), {
266
- outputGuardrails: [minLengthRequirement(120)],
267
- // Evaluate as tokens arrive; stop or replace early when blocked
268
- streamMode: 'progressive',
269
- replaceOnBlocked: true,
270
- });
271
-
272
- const { textStream } = await streamText({
273
- model,
274
- prompt: 'Tell me a short story about a robot.',
275
- });
276
-
277
- for await (const delta of textStream) process.stdout.write(delta);
278
- ```
279
-
280
- ## Auto Retry
281
-
282
- Choose what fits your flow:
283
-
284
- - Standalone utility: Use `retry()` to wrap any generation function with your own validator and backoff.
285
- - Middleware option: Add `retry` to output guardrails so retries run automatically when a check fails.
286
-
287
- ### Utility
288
-
289
- ```ts
290
- import { retry } from 'ai-sdk-guardrails';
291
- import { generateText } from 'ai';
292
- import { openai } from '@ai-sdk/openai';
293
-
294
- const result = await retry({
295
- generate: (params) => generateText({ model: openai('gpt-4o'), ...params }),
296
- params: { prompt: 'Explain backpropagation in depth.' },
297
- validate: (r) => ({
298
- blocked: (r.text ?? '').length < 500,
299
- message: 'Response too short',
300
- }),
301
- buildRetryParams: ({ lastParams }) => ({
302
- ...lastParams,
303
- maxOutputTokens: Math.max(800, (lastParams.maxOutputTokens ?? 400) + 300),
304
- }),
305
- maxRetries: 2,
306
- });
307
- ```
308
-
309
- ### Middleware
310
-
311
- ```ts
312
- import { generateText } from 'ai';
313
- import { openai } from '@ai-sdk/openai';
314
- import { withGuardrails, defineOutputGuardrail } from 'ai-sdk-guardrails';
315
- import { extractContent } from 'ai-sdk-guardrails/guardrails/output';
316
-
317
- const minLengthGuardrail = defineOutputGuardrail<{ minChars: number }>({
318
- name: 'min-output-length',
319
- execute: async ({ result }) => {
320
- const { text } = extractContent(result);
321
- const minChars = text.length + 1;
322
- return text.length < minChars
323
- ? {
324
- tripwireTriggered: true,
325
- severity: 'medium',
326
- message: `Answer too short: ${text.length} < ${minChars}`,
327
- metadata: { minChars },
328
- }
329
- : { tripwireTriggered: false };
330
- },
331
- });
332
-
333
- const guarded = wrapWithOutputGuardrails(
298
+ const model = wrapWithOutputGuardrails(
334
299
  openai('gpt-4o'),
335
- [minLengthGuardrail],
300
+ [minLengthRequirement(100)],
336
301
  {
337
- replaceOnBlocked: false,
338
302
  retry: {
339
- maxRetries: 1,
340
- buildRetryParams: ({ summary, lastParams }) => ({
303
+ maxRetries: 2,
304
+ buildRetryParams: ({ lastParams }) => ({
341
305
  ...lastParams,
342
- maxOutputTokens: Math.max(
343
- 800,
344
- (lastParams.maxOutputTokens ?? 400) + 300,
345
- ),
306
+ // Increase max tokens on retry
307
+ maxOutputTokens: (lastParams.maxOutputTokens ?? 400) + 200,
308
+ // Add context about the failure
346
309
  prompt: [
347
- ...(Array.isArray(lastParams.prompt) ? lastParams.prompt : []),
310
+ ...lastParams.prompt,
348
311
  {
349
- role: 'user' as const,
350
- content: [
351
- {
352
- type: 'text' as const,
353
- text: `Note: The previous answer ${summary.blockedResults[0]?.message}. Provide a comprehensive, detailed answer with examples.`,
354
- },
355
- ],
312
+ role: 'user',
313
+ content: 'Please provide a more detailed response.',
356
314
  },
357
315
  ],
358
316
  }),
359
317
  },
360
318
  },
361
319
  );
362
-
363
- const { text } = await generateText({
364
- model: guarded,
365
- prompt: 'Explain the significance of the Turing Test in AI history.',
366
- });
367
- ```
368
-
369
- Tip: Use backoff helpers if you need delays between retries: `exponentialBackoff`, `linearBackoff`, `fixedBackoff`, `jitteredExponentialBackoff`, or `backoffPresets`.
370
-
371
- ## Error Handling
372
-
373
- Set `throwOnBlocked: true` to throw structured errors you can catch and turn into friendly messages.
374
-
375
- ```ts
376
- import { isGuardrailsError } from 'ai-sdk-guardrails';
377
-
378
- try {
379
- const { text } = await generateText({ model, prompt: '...' });
380
- } catch (err) {
381
- if (isGuardrailsError(err)) {
382
- console.error('Guardrail blocked:', err.message);
383
- // err.results gives you details per guardrail
384
- } else {
385
- console.error('Unexpected error:', err);
386
- }
387
- }
388
320
  ```
389
321
 
390
- ## Reusable Guardrails Factory
322
+ ### Reusable Configurations
391
323
 
392
- Use `createGuardrails()` to create reusable guardrail configurations that can be applied to multiple models:
324
+ Create reusable guardrail sets:
393
325
 
394
326
  ```ts
395
- import { openai } from '@ai-sdk/openai';
396
- import { anthropic } from '@ai-sdk/anthropic';
397
- import { createGuardrails, defineInputGuardrail } from 'ai-sdk-guardrails';
327
+ import {
328
+ createGuardrails,
329
+ piiDetector,
330
+ sensitiveDataFilter,
331
+ } from 'ai-sdk-guardrails';
398
332
 
399
- // Create reusable guardrails configuration
333
+ // Define once
400
334
  const productionGuards = createGuardrails({
401
- inputGuardrails: [piiDetector(), contentFilter()],
402
- outputGuardrails: [qualityCheck(), minLength(100)],
335
+ inputGuardrails: [piiDetector()],
336
+ outputGuardrails: [sensitiveDataFilter()],
403
337
  throwOnBlocked: true,
404
338
  });
405
339
 
406
340
  // Apply to multiple models
407
341
  const gpt4 = productionGuards(openai('gpt-4o'));
408
342
  const claude = productionGuards(anthropic('claude-3-sonnet'));
343
+ ```
344
+
345
+ ### Streaming Modes
409
346
 
410
- // Compose multiple guardrail sets
411
- const strictLimits = createGuardrails({ inputGuardrails: [maxLength(500)] });
412
- const piiProtection = createGuardrails({ inputGuardrails: [piiDetector()] });
347
+ Control when guardrails run during streaming:
413
348
 
414
- // Chain them together
415
- const model = piiProtection(strictLimits(openai('gpt-4o')));
349
+ ```ts
350
+ const model = withGuardrails(openai('gpt-4o'), {
351
+ outputGuardrails: [minLengthRequirement(100)],
352
+ streamMode: 'progressive', // Run guardrails as tokens arrive
353
+ replaceOnBlocked: true, // Replace blocked output with fallback
354
+ });
416
355
  ```
417
356
 
418
- ## MCP Security Guardrails
357
+ - `buffer` (default): Wait for stream to complete, then check
358
+ - `progressive`: Check guardrails as tokens arrive (early termination)
359
+
360
+ ### Agent Support
419
361
 
420
- **Production-Ready**: Protect against prompt injection and data exfiltration attacks when using Model Context Protocol (MCP) tools. Based on research into the ["lethal trifecta" vulnerability](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) that has affected major AI platforms.
362
+ Guardrails work with AI SDK Agents:
363
+
364
+ ```ts
365
+ import { withAgentGuardrails } from 'ai-sdk-guardrails';
366
+ import { tool } from 'ai';
367
+
368
+ const agent = withAgentGuardrails(
369
+ {
370
+ model: openai('gpt-4o'),
371
+ tools: { search: searchTool },
372
+ system: 'You are a helpful assistant.',
373
+ },
374
+ {
375
+ inputGuardrails: [piiDetector()],
376
+ outputGuardrails: [sensitiveDataFilter()],
377
+ toolGuardrails: [
378
+ toolEgressPolicy({
379
+ allowedHosts: ['api.company.com'],
380
+ scanForUrls: true,
381
+ }),
382
+ ],
383
+ },
384
+ );
385
+
386
+ const result = await agent.generate({ prompt: '...' });
387
+ ```
388
+
389
+ ## MCP Security Guardrails (Advanced)
390
+
391
+ **Production-Ready**: Protect against the ["lethal trifecta" vulnerability](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) when using Model Context Protocol (MCP) tools.
421
392
 
422
393
  ### The Problem
423
394
 
424
- AI agents with MCP tools can be vulnerable when they have:
395
+ AI agents with MCP tools are vulnerable when they have:
425
396
 
426
397
  1. **Access to private data** (through tools)
427
398
  2. **Process untrusted content** (from tool responses)
@@ -429,9 +400,9 @@ AI agents with MCP tools can be vulnerable when they have:
429
400
 
430
401
  Malicious tool responses can contain hidden instructions that trick the AI into exfiltrating sensitive data.
431
402
 
432
- ### Production-Ready Solution
403
+ ### Production Configuration
433
404
 
434
- Full configurability with sensible defaults for immediate deployment:
405
+ Full configurability with sensible defaults:
435
406
 
436
407
  ```ts
437
408
  import {
@@ -451,100 +422,58 @@ const secureModel = withGuardrails(openai('gpt-4o'), {
451
422
  mcpSecurityGuardrail({
452
423
  injectionThreshold: 0.5, // Lower = more sensitive
453
424
  maxSuspiciousUrls: 0, // Zero tolerance
454
- maxContentSize: 25600, // 25KB limit for performance
425
+ maxContentSize: 25600, // 25KB limit
455
426
  minEncodedLength: 15, // Detect shorter encoded attacks
456
- encodedInjectionThreshold: 0.2, // Combined encoded + injection threshold
427
+ encodedInjectionThreshold: 0.2, // Combined threshold
457
428
  highRiskThreshold: 0.3, // High-risk cascade blocking
458
429
  authorityThreshold: 0.5, // Authority manipulation detection
459
430
  allowedDomains: ['api.company.com', 'trusted-partner.com'],
460
- customSuspiciousDomains: ['evil.com', 'malicious.org'],
431
+ customSuspiciousDomains: ['evil.com'],
461
432
  blockCascadingCalls: true,
462
433
  scanEncodedContent: true,
463
434
  detectExfiltration: true,
464
435
  }),
465
- mcpResponseSanitizer(), // Clean malicious content vs blocking
436
+ mcpResponseSanitizer(), // Clean vs block
466
437
  toolEgressPolicy({
467
- allowedHosts: ['api.company.com', 'trusted-partner.com'],
468
- blockedHosts: ['webhook.site', 'requestcatcher.com', 'ngrok.io'],
438
+ allowedHosts: ['api.company.com'],
439
+ blockedHosts: ['webhook.site', 'requestcatcher.com'],
469
440
  scanForUrls: true,
470
441
  }),
471
442
  ],
472
443
  });
473
444
  ```
474
445
 
475
- ### Environment & Role-Based Configuration
446
+ ### Environment-Based Configuration
476
447
 
477
448
  ```ts
478
- // Different security profiles for different environments
479
449
  function getSecurityConfig(env: 'production' | 'staging' | 'development') {
480
450
  const configs = {
481
451
  production: {
482
452
  injectionThreshold: 0.5, // High security
483
- maxContentSize: 25600, // 25KB limit
484
- authorityThreshold: 0.5, // Very sensitive
453
+ maxContentSize: 25600, // 25KB
454
+ authorityThreshold: 0.5,
485
455
  },
486
456
  staging: {
487
- injectionThreshold: 0.7, // Balanced security
488
- maxContentSize: 51200, // 50KB default
489
- authorityThreshold: 0.7, // Standard sensitivity
457
+ injectionThreshold: 0.7, // Balanced
458
+ maxContentSize: 51200, // 50KB
459
+ authorityThreshold: 0.7,
490
460
  },
491
461
  development: {
492
- injectionThreshold: 0.8, // Lower security, better performance
493
- maxContentSize: 102400, // 100KB for testing
494
- authorityThreshold: 0.8, // Less restrictive
462
+ injectionThreshold: 0.8, // Permissive
463
+ maxContentSize: 102400, // 100KB
464
+ authorityThreshold: 0.8,
495
465
  },
496
466
  };
497
467
  return configs[env];
498
468
  }
499
469
 
500
- const productionModel = withGuardrails(openai('gpt-4o'), {
470
+ const model = withGuardrails(openai('gpt-4o'), {
501
471
  outputGuardrails: [mcpSecurityGuardrail(getSecurityConfig('production'))],
502
472
  });
503
473
  ```
504
474
 
505
- ### Attack Vectors Prevented
506
-
507
- ✅ **Direct prompt injection** - "System: ignore all previous instructions"
508
- ✅ **Tool response poisoning** - Malicious content in MCP tool responses
509
- ✅ **Data exfiltration** - URLs constructed to steal sensitive data
510
- ✅ **Encoded attacks** - Base64/hex hidden malicious instructions
511
- ✅ **Cascading exploits** - Tool responses triggering additional dangerous calls
512
- ✅ **Context poisoning** - Attempts to modify AI behavior mid-conversation
513
-
514
- ### Secure MCP Agent Example
515
-
516
- ```ts
517
- import { withAgentGuardrails } from 'ai-sdk-guardrails';
518
-
519
- const secureAgent = withAgentGuardrails(
520
- {
521
- model: openai('gpt-4o'),
522
- tools: { file_search, api_call, database_query },
523
- system: 'You are a secure assistant. Always validate tool responses.',
524
- },
525
- {
526
- inputGuardrails: [promptInjectionDetector()],
527
- outputGuardrails: [
528
- mcpSecurityGuardrail({
529
- detectExfiltration: true,
530
- allowedDomains: ['trusted-api.com'],
531
- }),
532
- mcpResponseSanitizer(),
533
- ],
534
- toolGuardrails: [
535
- toolEgressPolicy({
536
- allowedHosts: ['trusted-api.com'],
537
- scanForUrls: true,
538
- }),
539
- ],
540
- },
541
- );
542
- ```
543
-
544
475
  ### Configuration Options
545
476
 
546
- All security parameters are fully configurable with sensible defaults:
547
-
548
477
  | Option | Default | Description |
549
478
  | --------------------------- | ------- | ------------------------------------------------ |
550
479
  | `injectionThreshold` | 0.7 | Prompt injection confidence threshold (0-1) |
@@ -556,106 +485,92 @@ All security parameters are fully configurable with sensible defaults:
556
485
  | `allowedDomains` | [] | Allowed domains for URL construction |
557
486
  | `customSuspiciousDomains` | [] | Additional suspicious domain patterns |
558
487
 
559
- ### Performance & Security Balance
560
-
561
- - **High Security**: Lower thresholds, stricter limits, comprehensive scanning
562
- - **Balanced**: Default settings, good for most production use cases
563
- - **High Performance**: Higher thresholds, larger limits, selective scanning
564
-
565
488
  See complete examples:
566
489
 
567
- - [Production MCP Configuration](./examples/44-production-mcp-config.ts) - **New!**
490
+ - [Production MCP Configuration](./examples/44-production-mcp-config.ts)
568
491
  - [MCP Security Test Suite](./examples/41-mcp-security-test.ts)
569
492
  - [Enhanced Security Testing](./examples/43-enhanced-mcp-security-test.ts)
570
- - [Vulnerability Proof of Concept](./examples/42-mcp-vulnerability-proof.ts)
571
493
 
572
- ## Agent Support
494
+ ## Error Handling
573
495
 
574
- Guardrails work with AI SDK Agents for multi-step agentic workflows:
496
+ ### Throw Errors on Block
575
497
 
576
498
  ```ts
577
- import { openai } from '@ai-sdk/openai';
578
- import { withAgentGuardrails, defineOutputGuardrail } from 'ai-sdk-guardrails';
579
- import { tool } from 'ai';
580
- import { z } from 'zod';
581
-
582
- // Define tools for the agent
583
- const searchTool = tool({
584
- description: 'Search for information',
585
- inputSchema: z.object({ query: z.string() }),
586
- execute: async ({ query }) => `Results for: ${query}`,
499
+ const model = withGuardrails(openai('gpt-4o'), {
500
+ inputGuardrails: [piiDetector()],
501
+ throwOnBlocked: true, // Throw errors instead of silent blocking
587
502
  });
588
503
 
589
- // Create agent with guardrails
590
- const agent = withAgentGuardrails(
591
- {
592
- model: openai('gpt-4o'),
593
- tools: { search: searchTool },
594
- system: 'You are a helpful research assistant.',
595
- },
596
- {
597
- outputGuardrails: [
598
- defineOutputGuardrail({
599
- name: 'tool-usage-required',
600
- description: 'Ensures agent uses search tools',
601
- execute: async (params) => {
602
- const hasToolCall = params.result.steps?.some(
603
- (step) => step.type === 'tool-call',
604
- );
605
-
606
- return {
607
- tripwireTriggered: !hasToolCall,
608
- message: hasToolCall
609
- ? 'Tool usage validated'
610
- : 'Must use search tools for research',
611
- severity: 'high',
612
- };
613
- },
614
- }),
615
- ],
616
- throwOnBlocked: true,
617
- },
618
- );
619
-
620
- // Use the guarded agent
621
- const result = await agent.generate({
622
- prompt: 'Research the latest AI developments',
623
- });
504
+ try {
505
+ const { text } = await generateText({ model, prompt: '...' });
506
+ } catch (error) {
507
+ if (isGuardrailsError(error)) {
508
+ console.error('Blocked:', error.message);
509
+ // error.results gives details per guardrail
510
+ }
511
+ }
624
512
  ```
625
513
 
626
- ## API
514
+ ### Error Types
627
515
 
628
- | Export | Description |
629
- | ----------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
630
- | `defineInputGuardrail`, `defineOutputGuardrail` | Create guardrails with clear messages, severity, and metadata. |
631
- | `withGuardrails`, `createGuardrails`, `withAgentGuardrails` | Attach guardrails to AI SDK models and agents via middleware. |
632
- | `executeInputGuardrails`, `executeOutputGuardrails` | Run guardrails programmatically (outside middleware) and get structured results. |
633
- | `retry`, `retryHelpers` | Standalone auto-retry utilities with validation and backoff. |
634
- | `GuardrailsError`, `GuardrailsInputError`, `GuardrailsOutputError`, `isGuardrailsError`, `extractErrorInfo` | Structured errors and helpers for robust handling. |
635
- | `exponentialBackoff`, `linearBackoff`, `fixedBackoff`, `jitteredExponentialBackoff`, `backoffPresets` | Backoff strategies to control retry pacing. |
516
+ - `GuardrailsInputError` - Input guardrail blocked
517
+ - `GuardrailsOutputError` - Output guardrail blocked
518
+ - `GuardrailExecutionError` - Guardrail threw an error
519
+ - `GuardrailTimeoutError` - Guardrail exceeded timeout
520
+ - `GuardrailConfigurationError` - Invalid configuration
636
521
 
637
- See source for built-in helpers:
522
+ ## API Reference
638
523
 
639
- - Input helpers: `./src/guardrails/input.ts`
640
- - Output helpers: `./src/guardrails/output.ts`
524
+ ### Primary Functions
641
525
 
642
- ## Examples
526
+ | Function | Purpose |
527
+ | ------------------------- | ---------------------------------------- |
528
+ | `withGuardrails` | Wrap model with guardrails (main API) |
529
+ | `createGuardrails` | Create reusable guardrail configurations |
530
+ | `withAgentGuardrails` | Wrap AI SDK Agents with guardrails |
531
+ | `defineInputGuardrail` | Create custom input guardrail |
532
+ | `defineOutputGuardrail` | Create custom output guardrail |
533
+ | `executeInputGuardrails` | Run input guardrails programmatically |
534
+ | `executeOutputGuardrails` | Run output guardrails programmatically |
535
+
536
+ ### Error Utilities
537
+
538
+ | Function | Purpose |
539
+ | ------------------- | ------------------------------------ |
540
+ | `isGuardrailsError` | Check if error is from guardrails |
541
+ | `extractErrorInfo` | Extract structured error information |
542
+
543
+ ### Retry Utilities
643
544
 
644
- Browse runnable examples for streaming, compliance, safety, and more:
545
+ | Function | Purpose |
546
+ | ---------------------------- | --------------------------------- |
547
+ | `retry` | Standalone retry utility |
548
+ | `exponentialBackoff` | Exponential backoff strategy |
549
+ | `linearBackoff` | Linear backoff strategy |
550
+ | `jitteredExponentialBackoff` | Jittered exponential backoff |
551
+ | `backoffPresets` | Pre-configured backoff strategies |
645
552
 
646
- - Index and commands: [examples/README.md](./examples/README.md)
553
+ See source for all built-in guardrails:
647
554
 
648
- Quick starts
555
+ - Input helpers: [`./src/guardrails/input.ts`](./src/guardrails/input.ts)
556
+ - Output helpers: [`./src/guardrails/output.ts`](./src/guardrails/output.ts)
557
+ - Tool helpers: [`./src/guardrails/tools.ts`](./src/guardrails/tools.ts)
558
+ - MCP security: [`./src/guardrails/mcp-security.ts`](./src/guardrails/mcp-security.ts)
559
+
560
+ ## Examples
561
+
562
+ Browse 48+ runnable examples: [examples/README.md](./examples/README.md) |
563
+
564
+ ### Quick Starts
649
565
 
650
566
  | Example | Description | File |
651
567
  | -------------------------- | ------------------------------- | --------------------------------------------------------------------------------- |
652
568
  | Simple combined protection | Minimal input and output setup | [07a-simple-combined-protection.ts](./examples/07a-simple-combined-protection.ts) |
653
569
  | Auto retry on output | Retry until output meets a rule | [32-auto-retry-output.ts](./examples/32-auto-retry-output.ts) |
654
- | LLM judge auto-retry | Judge feedback drives retry | [33-judge-auto-retry.ts](./examples/33-judge-auto-retry.ts) |
655
- | Expected tool use retry | Enforce/guide tool usage | [34-expected-tool-use-retry.ts](./examples/34-expected-tool-use-retry.ts) |
570
+ | LLM judge auto-retry | Judge feedback drives retry | [35-judge-auto-retry.ts](./examples/35-judge-auto-retry.ts) |
656
571
  | Weather assistant | End-to-end input/output + retry | [33-blog-post-weather-assistant.ts](./examples/33-blog-post-weather-assistant.ts) |
657
572
 
658
- Input safety
573
+ ### Input Safety
659
574
 
660
575
  | Example | Description | File |
661
576
  | ------------------ | ----------------------------------- | --------------------------------------------------------------- |
@@ -664,7 +579,7 @@ Input safety
664
579
  | PII detection | Detect PII before calling the model | [03-pii-detection.ts](./examples/03-pii-detection.ts) |
665
580
  | Rate limiting | Simple per-user rate limit | [13-rate-limiting.ts](./examples/13-rate-limiting.ts) |
666
581
 
667
- Output safety
582
+ ### Output Safety
668
583
 
669
584
  | Example | Description | File |
670
585
  | ----------------------- | ----------------------------------- | ------------------------------------------------------------------------- |
@@ -672,7 +587,7 @@ Output safety
672
587
  | Sensitive output filter | Filter secrets and PII in responses | [05-sensitive-output-filter.ts](./examples/05-sensitive-output-filter.ts) |
673
588
  | Hallucination detection | Flag uncertain factual claims | [19-hallucination-detection.ts](./examples/19-hallucination-detection.ts) |
674
589
 
675
- Streaming
590
+ ### Streaming
676
591
 
677
592
  | Example | Description | File |
678
593
  | ----------------- | ---------------------------------- | --------------------------------------------------------------------------------- |
@@ -680,7 +595,7 @@ Streaming
680
595
  | Streaming quality | Quality checks with streaming | [12-streaming-quality.ts](./examples/12-streaming-quality.ts) |
681
596
  | Early termination | Stop streams early when blocked | [28-streaming-early-termination.ts](./examples/28-streaming-early-termination.ts) |
682
597
 
683
- Advanced
598
+ ### Advanced
684
599
 
685
600
  | Example | Description | File |
686
601
  | -------------------------- | ----------------------------- | ------------------------------------------------------------------------------- |
@@ -689,30 +604,47 @@ Advanced
689
604
  | SQL code safety | Basic SQL safety checks | [24-sql-code-safety.ts](./examples/24-sql-code-safety.ts) |
690
605
  | Role hierarchy enforcement | Enforce role rules in prompts | [23-role-hierarchy-enforcement.ts](./examples/23-role-hierarchy-enforcement.ts) |
691
606
 
692
- ## Compatibility
607
+ ## Migration from v3.x
693
608
 
694
- - Runtime: Node.js 18+ recommended
695
- - AI SDK: Compatible with AI SDK 5 (`ai@^5`); wraps any model
696
- - For `generateObject`: for strict object validation, run `executeOutputGuardrails()` after generation
609
+ API naming has been improved in v4.x (old names still work but are deprecated):
697
610
 
698
- ## Architecture
611
+ ```ts
612
+ // Before (v3.x - still works but deprecated)
613
+ import { wrapWithGuardrails, InputBlockedError } from 'ai-sdk-guardrails';
614
+ const model = wrapWithGuardrails(openai('gpt-4o'), { ... });
699
615
 
700
- ```mermaid
701
- flowchart LR
702
- A[Input] --> B[Input Guardrails]
703
- B -->|Valid| C[AI Model]
704
- B -->|Blocked| X[No API Call]
705
- C --> D[Output Guardrails]
706
- D -->|Clean| E[Response]
707
- D -->|Blocked| R[Retry/Replace/Throw]
616
+ // After (v4.x - recommended)
617
+ import { withGuardrails, GuardrailsInputError } from 'ai-sdk-guardrails';
618
+ const model = withGuardrails(openai('gpt-4o'), { ... });
708
619
  ```
709
620
 
710
- ### Design principles
621
+ Changes:
622
+
623
+ - `wrapWithGuardrails` → `withGuardrails`
624
+ - `wrapAgentWithGuardrails` → `withAgentGuardrails`
625
+ - `InputBlockedError` → `GuardrailsInputError`
626
+ - `OutputBlockedError` → `GuardrailsOutputError`
627
+
628
+ ## Compatibility
629
+
630
+ - **Runtime**: Node.js 18+ recommended
631
+ - **AI SDK**: Compatible with AI SDK 5.x (`ai@^5`)
632
+ - **TypeScript**: Full type safety with TypeScript 5+
633
+ - **Works with any model**: OpenAI, Anthropic, Mistral, Groq, etc.
634
+
635
+ ## Why This Library?
636
+
637
+ **Non-invasive**: Guardrails are middleware. Your existing code, telemetry (Langfuse, Helicone), and logging stay intact.
638
+
639
+ **Production-ready**: Used in production by teams who need compliance, security, and cost control without rebuilding their infrastructure.
640
+
641
+ **Developer experience**: One line to add safety. Progressive complexity - start simple, add advanced features when needed.
642
+
643
+ **Type-safe**: Rich TypeScript types and inference throughout.
644
+
645
+ **Comprehensive**: 40+ built-in guardrails covering security, quality, compliance, and performance.
711
646
 
712
- - Helper-first: simple, chainable APIs with great DX
713
- - Composable: run multiple guardrails in any order
714
- - Type-safe: rich TypeScript types and inference
715
- - Sensible defaults: zero-config to start, full control when you need it
647
+ **Advanced features**: Early detection, parallel execution, enhanced prompt injection detection, MCP security, and more.
716
648
 
717
649
  ## Contributing
718
650
 
@@ -720,4 +652,4 @@ Issues and PRs are welcome.
720
652
 
721
653
  ## License
722
654
 
723
- MIT © Jag Reehal. See LICENSE for details.
655
+ MIT © Jag Reehal. See [LICENSE](./LICENSE) for details.