ai-sdk-guardrails 4.0.0 → 5.0.0

Files changed (2)
  1. package/README.md +558 -730
  2. package/package.json +26 -22
package/README.md CHANGED
@@ -1,895 +1,723 @@
1
1
  # AI SDK Guardrails
2
2
 
3
- Middleware for the Vercel AI SDK that adds safety, quality control, and cost management to your AI applications by intercepting prompts and responses.
3
+ **Input and output validation for the Vercel AI SDK**
4
4
 
5
- Block harmful inputs, filter low-quality outputs, and gain observability, all in just a few lines of code.
5
+ Add safety checks and quality controls to your AI applications. Guard against prompt injection, prevent sensitive data leaks, and improve output reliability - all while keeping your existing AI SDK code unchanged.
6
+
7
+ **Now includes MCP (Model Context Protocol) security guardrails** to help protect against attacks when using AI tools.
8
+
9
+ [![npm version](https://img.shields.io/npm/v/ai-sdk-guardrails.svg?logo=npm&label=npm)](https://www.npmjs.com/package/ai-sdk-guardrails)
10
+ [![downloads](https://img.shields.io/npm/dw/ai-sdk-guardrails.svg?label=downloads)](https://www.npmjs.com/package/ai-sdk-guardrails)
11
+ [![bundle size](https://img.shields.io/bundlephobia/minzip/ai-sdk-guardrails.svg?label=minzipped)](https://bundlephobia.com/package/ai-sdk-guardrails)
12
+ [![license](https://img.shields.io/npm/l/ai-sdk-guardrails.svg?label=license)](./LICENSE)
13
+ ![types](https://img.shields.io/badge/TypeScript-Ready-3178C6?logo=typescript&logoColor=white)
6
14
 
7
15
  ![Guardrails Demo](./media/guardrail-example.gif)
8
16
 
9
- ## TL;DR
17
+ ## Why this matters
10
18
 
11
- Quickly add input and output validation to any AI SDK-compatible model.
19
+ - **MCP**: Protect against prompt injection when using MCP tools
20
+ - **Agents**: Make agentic workflows more reliable and secure
21
+ - **Tool security**: Protect against data exfiltration when using MCP tools
22
+ - **Save costs**: Block unnecessary requests before they hit your model
23
+ - **Improve safety**: Detect PII, block harmful content, prevent prompt injection
24
+ - **Better quality**: Enforce minimum response lengths, validate structure, auto-retry on failures
25
+ - **Easy integration**: Works as middleware with any AI SDK model
12
26
 
13
- ```typescript
14
- import { openai } from '@ai-sdk/openai';
15
- import { generateText } from 'ai';
16
- import {
17
- wrapWithGuardrails,
18
- defineInputGuardrail,
19
- defineOutputGuardrail,
20
- } from 'ai-sdk-guardrails';
27
+ ## Common use cases
21
28
 
22
- // 1. Define your guardrails
23
- const inputGuard = defineInputGuardrail({
24
- name: 'length-check',
25
- execute: async ({ prompt }) =>
26
- prompt.length > 100
27
- ? { tripwireTriggered: true, message: 'Input too long' }
28
- : { tripwireTriggered: false },
29
- });
29
+ - Content moderation and safety filters
30
+ - PII detection for compliance
31
+ - Output quality requirements (length, format)
32
+ - Prompt injection prevention
33
+ - Tool usage validation
34
+ - Auto-retry on low-quality responses
30
35
 
31
- const outputGuard = defineOutputGuardrail({
32
- name: 'quality-check',
33
- execute: async ({ result }) =>
34
- result.text.length < 10
35
- ? { tripwireTriggered: true, message: 'Response too short' }
36
- : { tripwireTriggered: false },
37
- });
36
+ ## Secure AI in Under 60 Seconds
38
37
 
39
- // 2. Wrap your model
40
- const guardedModel = wrapWithGuardrails(openai('gpt-4o'), {
41
- inputGuardrails: [inputGuard],
42
- outputGuardrails: [outputGuard],
43
- });
38
+ **Step 1:** Install (10 seconds)
44
39
 
45
- // 3. Use it! Guardrails will run automatically.
46
- const { text } = await generateText({
47
- model: guardedModel,
48
- prompt: 'A prompt that is definitely not too long.',
49
- });
40
+ ```bash
41
+ npm install ai-sdk-guardrails
50
42
  ```
51
43
 
52
- ## How It Works
53
-
54
- ### Without Guardrails (Inefficient, Poor Quality)
44
+ **Step 2:** Import (15 seconds)
55
45
 
56
- ```mermaid
57
- flowchart LR
58
- A[User Input<br/>'hello'] --> B[AI Model] --> C[Response<br/>⚠️ Wastes resources<br/>😞 Often useless]
46
+ ```ts
47
+ import { withGuardrails, piiDetector } from 'ai-sdk-guardrails';
59
48
  ```
60
49
 
61
- ### With Input Guardrails (Save Resources)
50
+ **Step 3:** Wrap your model (30 seconds)
62
51
 
63
- ```mermaid
64
- flowchart LR
65
- A[User Input<br/>'hello'] --> B[Input Guardrails] --> C[❌ STOPPED<br/>✅ No API call made]
52
+ ```ts
53
+ const safeModel = withGuardrails(yourModel, {
54
+ inputGuardrails: [piiDetector()],
55
+ });
66
56
  ```
67
57
 
68
- ### With Output Guardrails (Ensure Quality)
58
+ **Result:** Every prompt is now checked for PII before it reaches the model, with no changes to the rest of your AI SDK code. Add `promptInjectionDetector()` or output guardrails to the same config to cover prompt injection and output validation.
69
59
 
70
- ```mermaid
71
- flowchart LR
72
- A[AI Response<br/>'Here's my SSN: 123-45-6789'] --> B[Output Guardrails] --> C[❌ BLOCKED<br/>🛡️ Privacy protected]
73
- ```
60
+ ## TL;DR
74
61
 
75
- ### Complete Protection
62
+ Copy/paste minimal setup:
76
63
 
77
- ```mermaid
78
- flowchart LR
79
- A[User Input] --> B[Input Guardrails] --> C[AI Model] --> D[Output Guardrails] --> E[Clean Response]
64
+ ```ts
65
+ import { generateText } from 'ai';
66
+ import { openai } from '@ai-sdk/openai';
67
+ import {
68
+ withGuardrails,
69
+ piiDetector,
70
+ promptInjectionDetector,
71
+ minLengthRequirement,
72
+ mcpSecurityGuardrail,
73
+ } from 'ai-sdk-guardrails';
74
+
75
+ const model = withGuardrails(openai('gpt-4o'), {
76
+ inputGuardrails: [piiDetector(), promptInjectionDetector()],
77
+ outputGuardrails: [
78
+ minLengthRequirement(160),
79
+ mcpSecurityGuardrail({
80
+ maxContentSize: 51200, // 50KB limit
81
+ injectionThreshold: 0.7, // Configurable sensitivity
82
+ allowedDomains: ['api.company.com'], // Domain allowlist
83
+ }),
84
+ ],
85
+ });
86
+
87
+ const { text } = await generateText({
88
+ model,
89
+ prompt: 'Write a friendly intro email.',
90
+ });
80
91
  ```
81
92
 
82
- That's it! Input guardrails optimize resource usage by stopping inefficient requests. Output guardrails ensure quality by filtering responses.
93
+ See runnable examples: [examples/README.md](./examples/README.md)
94
+
95
+ ## Quickstart (30 seconds)
83
96
 
84
- ## 📦 Installation
97
+ Install with your provider (OpenAI shown):
85
98
 
86
99
  ```bash
87
- npm install ai-sdk-guardrails
100
+ pnpm add ai-sdk-guardrails ai @ai-sdk/openai
101
+ # or: npm i ai-sdk-guardrails ai @ai-sdk/openai
102
+ # or: yarn add ai-sdk-guardrails ai @ai-sdk/openai
103
+ ```
88
104
 
89
- # or
105
+ Wrap your model and keep using `generateText` as usual:
90
106
 
91
- yarn add ai-sdk-guardrails
107
+ ```ts
108
+ import { generateText } from 'ai';
109
+ import { openai } from '@ai-sdk/openai';
110
+ import { withGuardrails, piiDetector } from 'ai-sdk-guardrails';
92
111
 
93
- # or
112
+ const model = withGuardrails(openai('gpt-4o'), {
113
+ inputGuardrails: [piiDetector()],
114
+ });
94
115
 
95
- pnpm add ai-sdk-guardrails
116
+ const { text } = await generateText({
117
+ model,
118
+ prompt: 'Write a friendly intro email.',
119
+ });
96
120
  ```
97
121
 
98
- ## 🔄 Migration Guide
122
+ ## Contents
99
123
 
100
- For breaking changes from v3 to v4 (including the new analytics-rich callbacks), see [v3-v4-MIGRATION.md](./v3-v4-MIGRATION.md).
124
+ - Overview
125
+ - Concepts
126
+ - Installation
127
+ - Usage
128
+ - Define a guardrail
129
+ - Built-in helpers
130
+ - Streaming
131
+ - Auto Retry (utility and middleware)
132
+ - Error Handling
133
+ - API
134
+ - Examples
135
+ - Compatibility
136
+ - Architecture
137
+ - Contributing
101
138
 
102
- ## 🚀 Quick Start
139
+ ## API Overview
103
140
 
104
- Add smart validation to your AI applications in just 3 steps:
141
+ ### Primary Functions
105
142
 
106
- ### 1. Prevent Unnecessary AI Calls
143
+ - **`withGuardrails(model, config)`** - Main API for wrapping language models with guardrails
144
+ - **`createGuardrails(config)`** - Factory to create reusable guardrail configurations
145
+ - **`withAgentGuardrails(agentSettings, config)`** - Wrap AI SDK Agents with guardrails
107
146
 
108
- ```typescript
109
- import { generateText } from 'ai';
110
- import { openai } from '@ai-sdk/openai';
111
- import {
112
- wrapWithInputGuardrails,
113
- defineInputGuardrail,
114
- } from 'ai-sdk-guardrails';
115
- import { extractTextContent } from 'ai-sdk-guardrails/guardrails/input';
147
+ ### Migration from v3.x
116
148
 
117
- // Block inefficient requests before calling the AI model
118
- const lengthGuard = defineInputGuardrail({
119
- name: 'blocked-keywords',
120
- execute: async (context) => {
121
- const { prompt } = extractTextContent(context);
122
- const blockedWords = ['spam', 'test', 'hello'];
123
-
124
- const foundWord = blockedWords.find((word) =>
125
- prompt.toLowerCase().includes(word.toLowerCase()),
126
- );
127
-
128
- if (foundWord) {
129
- return {
130
- tripwireTriggered: true,
131
- message: `Blocked keyword detected: ${foundWord}`,
132
- severity: 'medium',
133
- };
134
- }
135
-
136
- return { tripwireTriggered: false };
137
- },
138
- });
149
+ - `wrapWithGuardrails` → `withGuardrails` (alias available, deprecated)
150
+ - `wrapAgentWithGuardrails` → `withAgentGuardrails` (alias available, deprecated)
151
+ - Error classes: `InputBlockedError` → `GuardrailsInputError`, `OutputBlockedError` → `GuardrailsOutputError`
139
152
 
140
- const optimizedModel = wrapWithInputGuardrails(openai('gpt-4'), {
141
- inputGuardrails: [lengthGuard],
142
- });
153
+ ```ts
154
+ // Before (v3.x - still works but deprecated)
155
+ import { wrapWithGuardrails, InputBlockedError } from 'ai-sdk-guardrails';
156
+ const model = wrapWithGuardrails(openai('gpt-4o'), { ... });
157
+
158
+ // After (v4.x - recommended)
159
+ import { withGuardrails, GuardrailsInputError } from 'ai-sdk-guardrails';
160
+ const model = withGuardrails(openai('gpt-4o'), { ... });
161
+
162
+ // Factory pattern (new in v4.x)
163
+ import { createGuardrails } from 'ai-sdk-guardrails';
164
+ const guards = createGuardrails({ ... });
165
+ const model = guards(openai('gpt-4o'));
166
+ ```
143
167
 
144
- // This would normally waste an API call for a useless response
145
- try {
146
- const result = await generateText({
147
- model: optimizedModel,
148
- prompt: 'hello', // ❌ Blocked - prevents unnecessary API call
149
- });
150
- } catch (error) {
151
- console.log('Blocked request, saved money!');
152
- }
168
+ ## Concepts
153
169
 
154
- // This generates valuable content
155
- const goodResult = await generateText({
156
- model: optimizedModel,
157
- prompt: 'Write a product description for our new software', // ✅ This creates value
158
- });
159
- ```
170
+ - Input guardrails: Validate or block prompts to save cost and enforce rules before the call.
171
+ - Output guardrails: Check results for quality and safety. Block, replace, or retry as needed.
172
+ - Middleware: Guardrails wrap any model via AI SDK middleware. Your app code stays the same.
173
+
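The flow described under Concepts can be sketched without the library. This is a minimal, synchronous stand-in under simplified assumptions: `guard`, `firstTripped`, `GuardedResult`, `TextGuardrail`, and the string-based `Model` type are hypothetical names used only for illustration and are not part of `ai-sdk-guardrails`, whose real guardrails are async and receive richer context. Only the `tripwireTriggered` and `message` fields mirror the result shape used elsewhere in this README.

```typescript
// Hypothetical, simplified shapes for illustration only.
type GuardrailResult = { tripwireTriggered: boolean; message?: string };
type TextGuardrail = { name: string; execute: (text: string) => GuardrailResult };
type Model = (prompt: string) => string;

type GuardedResult =
  | { blocked: false; text: string }
  | { blocked: true; stage: 'input' | 'output'; guardrail: string; message: string };

// Returns the first guardrail that trips, or null when all of them pass.
function firstTripped(
  guardrails: TextGuardrail[],
  text: string,
): { guardrail: string; message: string } | null {
  for (const g of guardrails) {
    const result = g.execute(text);
    if (result.tripwireTriggered) {
      return { guardrail: g.name, message: result.message ?? 'blocked' };
    }
  }
  return null;
}

// Wraps a model so input checks run before the call and output checks after it.
function guard(model: Model, inputs: TextGuardrail[], outputs: TextGuardrail[]) {
  return (prompt: string): GuardedResult => {
    const inputHit = firstTripped(inputs, prompt);
    if (inputHit) return { blocked: true, stage: 'input', ...inputHit }; // model is never called
    const text = model(prompt);
    const outputHit = firstTripped(outputs, text);
    if (outputHit) return { blocked: true, stage: 'output', ...outputHit };
    return { blocked: false, text };
  };
}

// Usage with a fake model that counts how often it is called.
let modelCalls = 0;
const echoModel: Model = (prompt) => {
  modelCalls += 1;
  return `Echo: ${prompt}`;
};

const maxPromptLength: TextGuardrail = {
  name: 'max-prompt-length',
  execute: (text) =>
    text.length > 20
      ? { tripwireTriggered: true, message: 'Input too long' }
      : { tripwireTriggered: false },
};

const minAnswerLength: TextGuardrail = {
  name: 'min-answer-length',
  execute: (text) =>
    text.length < 10
      ? { tripwireTriggered: true, message: 'Response too short' }
      : { tripwireTriggered: false },
};

const guarded = guard(echoModel, [maxPromptLength], [minAnswerLength]);
const tooLong = guarded('Summarize this very long prompt, please.'); // blocked at the input stage
const tooShort = guarded('hi'); // model runs, output is blocked
const ok = guarded('hello world'); // passes both stages
```

The fake model is called only twice here: the over-long prompt is rejected before any call is made, which is the cost-saving behavior the input-guardrail bullet describes.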
174
+ ## Installation
175
+
176
+ See Quickstart for installation commands. Add providers you use as needed (e.g., `@ai-sdk/openai`, `@ai-sdk/mistral`).
160
177
 
161
- ### 2. Ensure Quality Output
178
+ ## Usage
162
179
 
163
- ```typescript
180
+ ### Create custom guardrails
181
+
182
+ ```ts
183
+ import { openai } from '@ai-sdk/openai';
164
184
  import {
165
- wrapWithOutputGuardrails,
185
+ defineInputGuardrail,
166
186
  defineOutputGuardrail,
187
+ withGuardrails,
167
188
  } from 'ai-sdk-guardrails';
189
+ import { extractTextContent } from 'ai-sdk-guardrails/guardrails/input';
168
190
  import { extractContent } from 'ai-sdk-guardrails/guardrails/output';
169
191
 
170
- const qualityGuard = defineOutputGuardrail({
171
- name: 'sensitive-info-detector',
172
- execute: async (context) => {
173
- const { text } = extractContent(context.result);
174
-
175
- // Simple sensitive info patterns
176
- const sensitivePatterns = [
177
- /\b\d{3}-\d{2}-\d{4}\b/, // SSN
178
- /\b[\w\.-]+@[\w\.-]+\.\w+\b/, // Email
179
- /\b\d{3}-\d{3}-\d{4}\b/, // Phone
180
- ];
181
-
182
- const foundPattern = sensitivePatterns.find((pattern) =>
183
- pattern.test(text),
184
- );
185
-
186
- if (foundPattern) {
187
- return {
188
- tripwireTriggered: true,
189
- message: 'Sensitive information detected in response',
190
- severity: 'high',
191
- };
192
- }
193
-
194
- return { tripwireTriggered: false };
192
+ const businessHours = defineInputGuardrail({
193
+ name: 'business-hours',
194
+ execute: async (params) => {
195
+ const hr = new Date().getHours();
196
+ return hr >= 9 && hr <= 17
197
+ ? { tripwireTriggered: false }
198
+ : { tripwireTriggered: true, message: 'Outside business hours' };
195
199
  },
196
200
  });
197
201
 
198
- const qualityModel = wrapWithOutputGuardrails(openai('gpt-4'), {
199
- outputGuardrails: [qualityGuard],
200
- onOutputBlocked: (executionSummary) => {
201
- console.log(
202
- 'Prevented sensitive data leak:',
203
- executionSummary.blockedResults[0]?.message,
204
- );
205
-
206
- // Access comprehensive analytics (New in v4.0.0)
207
- console.log(
208
- `Blocked ${executionSummary.stats.blocked} of ${executionSummary.guardrailsExecuted} guardrails`,
209
- );
202
+ const minQuality = defineOutputGuardrail({
203
+ name: 'min-quality',
204
+ execute: async ({ result }) => {
205
+ const { text } = extractContent(result);
206
+ return text.length >= 80
207
+ ? { tripwireTriggered: false }
208
+ : { tripwireTriggered: true, message: 'Response too short' };
210
209
  },
211
210
  });
212
211
 
213
- const result = await generateText({
214
- model: qualityModel,
215
- prompt: 'Create a user profile example',
212
+ const model = withGuardrails(openai('gpt-4o'), {
213
+ inputGuardrails: [businessHours],
214
+ outputGuardrails: [minQuality],
216
215
  });
217
- // Automatically blocks responses containing emails, phone numbers, or SSNs
218
216
  ```
219
217
 
220
- ### 3. Custom Business Logic
221
-
222
- ```typescript
223
- const businessHoursGuard = defineInputGuardrail({
224
- name: 'business-hours-only',
225
- execute: async () => {
226
- const hour = new Date().getUTCHours();
227
- // Only allow requests between 9 AM and 5 PM UTC
228
- if (hour < 9 || hour > 17) {
229
- return {
230
- tripwireTriggered: true,
231
- message:
232
- 'Requests are only permitted during business hours (9:00-17:00 UTC).',
233
- severity: 'low',
234
- };
235
- }
236
- return { tripwireTriggered: false };
237
- },
238
- });
218
+ ### Built-in helpers
239
219
 
240
- const smartEducationModel = wrapWithInputGuardrails(openai('gpt-4'), {
241
- inputGuardrails: [businessHoursGuard],
220
+ ```ts
221
+ import { openai } from '@ai-sdk/openai';
222
+ import {
223
+ withGuardrails,
224
+ piiDetector,
225
+ blockedKeywords,
226
+ contentLengthLimit,
227
+ promptInjectionDetector,
228
+ sensitiveDataFilter,
229
+ minLengthRequirement,
230
+ confidenceThreshold,
231
+ mcpSecurityGuardrail,
232
+ mcpResponseSanitizer,
233
+ } from 'ai-sdk-guardrails';
234
+
235
+ const model = withGuardrails(openai('gpt-4o'), {
236
+ inputGuardrails: [
237
+ piiDetector(),
238
+ promptInjectionDetector({ threshold: 0.7 }),
239
+ blockedKeywords(['test', 'spam']),
240
+ contentLengthLimit(4000),
241
+ ],
242
+ outputGuardrails: [
243
+ mcpSecurityGuardrail({
244
+ detectExfiltration: true,
245
+ scanEncodedContent: true,
246
+ allowedDomains: ['trusted-api.com'],
247
+ }),
248
+ mcpResponseSanitizer(),
249
+ sensitiveDataFilter(),
250
+ minLengthRequirement(160),
251
+ confidenceThreshold(0.6),
252
+ ],
242
253
  });
243
254
  ```
244
255
 
245
- ### 4. Type-Safe Metadata (TypeScript)
256
+ ## Streaming
246
257
 
247
- The library automatically infers metadata types from your guardrail definitions - no manual type annotations needed!
258
+ Works out of the box. By default, guardrails run after the stream ends (buffer mode). For early blocking, enable progressive mode.
248
259
 
249
- ```typescript
250
- // Define metadata interface for your guardrail
251
- interface PIIMetadata extends Record<string, unknown> {
252
- detectedTypes: Array<{ type: string; description: string }>;
253
- count: number;
254
- }
260
+ ```ts
261
+ import { streamText } from 'ai';
262
+ import { openai } from '@ai-sdk/openai';
263
+ import { withGuardrails, minLengthRequirement } from 'ai-sdk-guardrails';
255
264
 
256
- // Create guardrail with typed metadata
257
- const piiDetectionGuardrail = defineInputGuardrail({
258
- name: 'pii-detection',
259
- execute: async (context) => {
260
- const { prompt } = extractTextContent(context);
261
-
262
- const patterns = [
263
- {
264
- name: 'SSN',
265
- regex: /\b\d{3}-\d{2}-\d{4}\b/,
266
- description: 'Social Security Number',
267
- },
268
- {
269
- name: 'Email',
270
- regex: /\b[\w\.-]+@[\w\.-]+\.\w+\b/,
271
- description: 'Email address',
272
- },
273
- ];
274
-
275
- const detected = patterns.filter((p) => p.regex.test(prompt));
276
-
277
- if (detected.length > 0) {
278
- // TypeScript knows this metadata matches PIIMetadata
279
- const metadata: PIIMetadata = {
280
- detectedTypes: detected.map((p) => ({
281
- type: p.name,
282
- description: p.description,
283
- })),
284
- count: detected.length,
285
- };
286
-
287
- return {
288
- tripwireTriggered: true,
289
- message: `PII detected: ${detected.map((p) => p.name).join(', ')}`,
290
- severity: 'high',
291
- metadata, // Type is automatically inferred!
292
- };
293
- }
294
-
295
- return { tripwireTriggered: false };
296
- },
265
+ const model = withGuardrails(openai('gpt-4o'), {
266
+ outputGuardrails: [minLengthRequirement(120)],
267
+ // Evaluate as tokens arrive; stop or replace early when blocked
268
+ streamMode: 'progressive',
269
+ replaceOnBlocked: true,
297
270
  });
298
271
 
299
- // Use the guardrail - types flow through automatically!
300
- const protectedModel = wrapWithInputGuardrails(model, [piiDetectionGuardrail], {
301
- onInputBlocked: (summary) => {
302
- // TypeScript knows the metadata type - no casting needed!
303
- const metadata = summary.blockedResults[0]?.metadata;
304
- if (metadata?.detectedTypes) {
305
- // Full type safety and autocomplete for metadata.detectedTypes
306
- for (const type of metadata.detectedTypes) {
307
- console.log(`Detected: ${type.type} - ${type.description}`);
308
- }
309
- }
310
- },
272
+ const { textStream } = await streamText({
273
+ model,
274
+ prompt: 'Tell me a short story about a robot.',
311
275
  });
312
- ```
313
276
 
314
- **That's it!** Your AI application now optimizes resource usage, ensures quality, prevents inappropriate responses, and provides full type safety automatically.
315
-
316
- ## ✨ Features
317
-
318
- - 🛡️ **Input & Output Guardrails**: Enforce custom safety, compliance, and quality policies on both prompts and LLM responses.
319
- - 💰 **Cost Control**: Block invalid or wasteful prompts before they are sent to your LLM provider, saving you money.
320
- - 🎯 **Quality Improvement**: Automatically filter, flag, or retry low-quality or irrelevant model outputs.
321
- - 🔒 **Security Protection**: Built-in defenses against prompt injection, jailbreak attempts, PII leakage, secret exposure, and tool call validation.
322
- - 🏛️ **Compliance & Governance**: Enforce regulatory guidelines and business rules for enterprise applications with jurisdiction-specific compliance.
323
- - 🔄 **Streaming Support**: Works seamlessly with both streaming (streamText) and standard (generateText) API responses with real-time content monitoring.
324
- - 📊 **Observability Hooks**: Built-in callbacks (onInputBlocked, onOutputBlocked, etc.) for logging and monitoring with comprehensive execution analytics.
325
- - ⚙️ **Configurable Execution**: Run guardrails in parallel or sequentially and set custom timeouts.
326
- - 🚀 **AI SDK Native**: Designed from the ground up to integrate cleanly with AI SDK middleware patterns.
327
- - 🧠 **AI-Powered Verification**: LLM-as-judge capabilities for hallucination detection and quality assessment.
328
- - 🌍 **Global Compliance**: Support for multiple jurisdictions (US, EU, UK, CA, AU, JP, CN, IN) with region-specific policies.
329
- - 📝 **Content Protection**: Copyright and IP protection with originality scoring and verbatim passage detection.
330
- - 🔐 **Data Integrity**: Comprehensive table validation, SQL code safety, and schema enforcement.
331
- - 🌐 **Network Security**: Domain allowlisting, URL sanitization, and external access controls.
332
- - 🔒 **Privacy & Memory**: PII redaction, memory minimization, and secure logging practices.
333
- - 🛡️ **Safety & Escalation**: Toxicity de-escalation, human review workflows, and streaming early termination.
334
-
335
- ## 📚 API Overview
336
-
337
- | Function | Description |
338
- | ---------------------------- | ----------------------------------------------------------------------------- |
339
- | `defineInputGuardrail()` | Creates a guardrail to validate, inspect, or block prompts. |
340
- | `defineOutputGuardrail()` | Creates a guardrail to validate, filter, or re-route LLM outputs. |
341
- | `wrapWithGuardrails()` | ⭐ **Recommended** - The easiest way to add both input and output guardrails. |
342
- | `wrapWithInputGuardrails()` | Attaches input-only guardrails to a model. |
343
- | `wrapWithOutputGuardrails()` | Attaches output-only guardrails to a model. |
344
- | `isGuardrailsError()`, etc. | Error handling utilities and structured error types. |
345
-
346
- ## 🧠 Design Philosophy
347
-
348
- - ✅ **Helper-First**: Simple, chainable utility functions provide a great developer experience for fast adoption.
349
- - 🧩 **Composable**: Multiple guardrails can be chained together and will run in your specified order (or in parallel).
350
- - 🧾 **Type-Safe**: Full TypeScript support with automatic type inference for guardrail metadata - no manual type annotations needed!
351
- - 🧪 **Sensible Defaults**: Get started quickly with zero-config default behaviors that can be easily overridden.
352
-
353
- ## Architecture Overview
354
-
355
- The library leverages the Vercel AI SDK's middleware architecture to provide composable guardrails that integrate seamlessly with your existing AI applications:
356
-
357
- ```mermaid
358
- graph TB
359
- subgraph "Your Application"
360
- App[Your App Code]
361
- Config[Guardrail Configuration]
362
- end
363
-
364
- subgraph "AI SDK Guardrails Middleware"
365
- InputMW[Input Guardrails Middleware]
366
- OutputMW[Output Guardrails Middleware]
367
-
368
- subgraph "Input Guardrails Layer"
369
- Length[Length Validation]
370
- Spam[Spam Detection]
371
- PII[PII Detection]
372
- Business[Business Rules]
373
- Custom1[Custom Guards]
374
- end
375
-
376
- subgraph "Output Guardrails Layer"
377
- Quality[Quality Assurance]
378
- Sensitive[Sensitive Info Filter]
379
- Professional[Professional Tone]
380
- Factual[Factual Validation]
381
- Custom2[Custom Guards]
382
- end
383
- end
384
-
385
- subgraph "AI SDK Core"
386
- Wrapper[wrapLanguageModel]
387
- Generator[generateText/Object/Stream]
388
- end
389
-
390
- subgraph "External Services"
391
- AI[AI Model Provider]
392
- Log[Logging & Telemetry]
393
- end
394
-
395
- App --> Config
396
- Config --> InputMW
397
- InputMW --> Length
398
- InputMW --> Spam
399
- InputMW --> PII
400
- InputMW --> Business
401
- InputMW --> Custom1
402
-
403
- InputMW -->|Valid Request| Wrapper
404
- InputMW -->|Blocked Request| Log
405
-
406
- Wrapper --> Generator
407
- Generator --> AI
408
- AI --> OutputMW
409
-
410
- OutputMW --> Quality
411
- OutputMW --> Sensitive
412
- OutputMW --> Professional
413
- OutputMW --> Factual
414
- OutputMW --> Custom2
415
-
416
- OutputMW -->|Clean Response| App
417
- OutputMW -->|Quality Issues| Log
418
-
419
- style InputMW fill:#e1f5fe
420
- style OutputMW fill:#f3e5f5
421
- style AI fill:#fff3e0
422
- style App fill:#e8f5e8
277
+ for await (const delta of textStream) process.stdout.write(delta);
423
278
  ```
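The difference between the two stream modes comes down to when the check runs. The sketch below models only that timing, assuming a plain array of text chunks in place of a real stream; `checkBuffered`, `checkProgressive`, `Check`, and `Outcome` are hypothetical names, and nothing here reflects how the library buffers or replaces output internally.

```typescript
// Hypothetical helper types; they do not come from ai-sdk-guardrails.
type Check = (textSoFar: string) => string | null; // block reason, or null when fine
type Outcome = { chunksRead: number; checksRun: number; blockedBy: string | null };

// Buffer mode: read every chunk, then run the check once on the full text.
function checkBuffered(chunks: string[], check: Check): Outcome {
  const fullText = chunks.join('');
  return { chunksRead: chunks.length, checksRun: 1, blockedBy: check(fullText) };
}

// Progressive mode: re-run the check as each chunk arrives and stop at the first hit.
function checkProgressive(chunks: string[], check: Check): Outcome {
  let text = '';
  let chunksRead = 0;
  for (const chunk of chunks) {
    text += chunk;
    chunksRead += 1;
    const reason = check(text);
    if (reason !== null) {
      return { chunksRead, checksRun: chunksRead, blockedBy: reason };
    }
  }
  return { chunksRead, checksRun: chunksRead, blockedBy: null };
}

const chunks = ['The robot ', 'leaked the SECRET ', 'code to ', 'everyone.'];
const noSecrets: Check = (text) =>
  text.indexOf('SECRET') !== -1 ? 'contains SECRET' : null;

const buffered = checkBuffered(chunks, noSecrets); // reads all 4 chunks, 1 check
const progressive = checkProgressive(chunks, noSecrets); // stops after chunk 2
```

Both modes reach the same verdict; progressive mode reaches it earlier at the cost of running the check once per chunk, so it suits checks that can fail on partial text (banned content) better than checks that need the full response (minimum length).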
424
279
 
425
- ## 🍳 Recipes & Use Cases
280
+ ## Auto Retry
426
281
 
427
- Guardrails can enforce any custom logic. Here are a few common patterns.
282
+ Choose what fits your flow:
428
283
 
429
- ### Rate Limiting
284
+ - Standalone utility: Use `retry()` to wrap any generation function with your own validator and backoff.
285
+ - Middleware option: Add `retry` to output guardrails so retries run automatically when a check fails.
430
286
 
431
- Pass a userId in the metadata of your generateText call to enforce per-user rate limits.
287
+ ### Utility
432
288
 
433
- ```typescript
434
- const rateLimitGuard = defineInputGuardrail({
435
- name: 'user-rate-limit',
436
- execute: async ({ metadata }) => {
437
- const userId = metadata?.userId ?? 'anonymous';
438
- const allowed = await checkRateLimit(userId); // Your rate-limiting logic
289
+ ```ts
290
+ import { retry } from 'ai-sdk-guardrails';
291
+ import { generateText } from 'ai';
292
+ import { openai } from '@ai-sdk/openai';
439
293
 
440
- return allowed
441
- ? { tripwireTriggered: false }
442
- : {
443
- tripwireTriggered: true,
444
- message: `Rate limit exceeded for user: ${userId}`,
445
- };
446
- },
294
+ const result = await retry({
295
+ generate: (params) => generateText({ model: openai('gpt-4o'), ...params }),
296
+ params: { prompt: 'Explain backpropagation in depth.' },
297
+ validate: (r) => ({
298
+ blocked: (r.text ?? '').length < 500,
299
+ message: 'Response too short',
300
+ }),
301
+ buildRetryParams: ({ lastParams }) => ({
302
+ ...lastParams,
303
+ maxOutputTokens: Math.max(800, (lastParams.maxOutputTokens ?? 400) + 300),
304
+ }),
305
+ maxRetries: 2,
447
306
  });
448
307
  ```
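The control flow behind a call like the one above can be sketched as a plain loop. This is a synchronous stand-in under stated assumptions, not the library's implementation: `retrySketch`, `RetryOptions`, `Verdict`, and `fakeGenerate` are hypothetical, the generator is faked so no model is called, and the option names simply echo the ones used in the example (`generate`, `params`, `validate`, `buildRetryParams`, `maxRetries`).

```typescript
// Hypothetical stand-in types; the real retry() is async and calls an actual model.
type Params = { prompt: string; maxOutputTokens?: number };
type Verdict = { blocked: boolean; message?: string };

type RetryOptions<R> = {
  generate: (params: Params) => R;
  params: Params;
  validate: (result: R) => Verdict;
  buildRetryParams: (ctx: { lastParams: Params; verdict: Verdict }) => Params;
  maxRetries: number;
};

function retrySketch<R>(
  options: RetryOptions<R>,
): { result: R; attempts: number; passed: boolean } {
  let params = options.params;
  let result = options.generate(params);
  let verdict = options.validate(result);
  let attempts = 1;

  // Retry only while the validator blocks and the retry budget is not used up.
  while (verdict.blocked && attempts <= options.maxRetries) {
    params = options.buildRetryParams({ lastParams: params, verdict });
    result = options.generate(params);
    verdict = options.validate(result);
    attempts += 1;
  }
  return { result, attempts, passed: !verdict.blocked };
}

// Fake generator: answers briefly unless given a budget of at least 800 tokens.
const fakeGenerate = (params: Params): { text: string } => ({
  text:
    (params.maxOutputTokens ?? 400) >= 800
      ? 'Backpropagation applies the chain rule layer by layer to compute gradients.'
      : 'It computes gradients.',
});

const outcome = retrySketch({
  generate: fakeGenerate,
  params: { prompt: 'Explain backpropagation in depth.' },
  validate: (r) => ({ blocked: r.text.length < 40, message: 'Response too short' }),
  buildRetryParams: ({ lastParams }) => ({
    ...lastParams,
    maxOutputTokens: Math.max(800, (lastParams.maxOutputTokens ?? 400) + 300),
  }),
  maxRetries: 2,
});
```

With `maxRetries: 2` the loop allows at most three attempts in total. In this run the first answer is blocked, the rebuilt parameters raise the token budget to 800, and the second attempt passes.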
449
308
 
450
- ### LLM-as-Judge for Quality Scoring
309
+ ### Middleware
451
310
 
452
- Use a cheaper, faster model to "judge" the output of a more powerful one.
311
+ ```ts
312
+ import { generateText } from 'ai';
313
+ import { openai } from '@ai-sdk/openai';
314
+ import { wrapWithOutputGuardrails, defineOutputGuardrail } from 'ai-sdk-guardrails';
315
+ import { extractContent } from 'ai-sdk-guardrails/guardrails/output';
453
316
 
454
- ```typescript
455
- const qualityJudge = defineOutputGuardrail({
456
- name: 'llm-quality-judge',
317
+ const minLengthGuardrail = defineOutputGuardrail<{ minChars: number }>({
318
+ name: 'min-output-length',
457
319
  execute: async ({ result }) => {
458
- // Use a cheap model to score the primary model's output
459
- const judgement = await generateText({
460
- model: openai('gpt-3.5-turbo'),
461
- prompt: `Is the following response helpful and safe? Answer YES or NO. \n\nResponse: "${result.text}"`,
462
- });
463
-
464
- const isSafe = judgement.text.includes('YES');
465
- return isSafe
466
- ? { tripwireTriggered: false }
467
- : {
320
+ const { text } = extractContent(result);
321
+ const minChars = 600; // example threshold (assumed value); tune for your use case
322
+ return text.length < minChars
323
+ ? {
468
324
  tripwireTriggered: true,
469
- message: `Output failed LLM-as-judge quality check.`,
470
- metadata: { originalText: result.text },
471
- };
325
+ severity: 'medium',
326
+ message: `Answer too short: ${text.length} < ${minChars}`,
327
+ metadata: { minChars },
328
+ }
329
+ : { tripwireTriggered: false };
472
330
  },
473
331
  });
474
- ```
475
332
 
476
- ### Advanced Input Validation
477
-
478
- ```typescript
479
- import { extractTextContent } from 'ai-sdk-guardrails/guardrails/input';
480
-
481
- const comprehensiveInputGuard = defineInputGuardrail({
482
- name: 'comprehensive-input-validation',
483
- execute: async (context) => {
484
- const { prompt } = extractTextContent(context);
485
-
486
- // Length validation
487
- if (prompt.length < 10) {
488
- return {
489
- tripwireTriggered: true,
490
- message: 'Input too short - likely to produce low-value response',
491
- severity: 'medium',
492
- suggestion: 'Please provide more detailed input for better results',
493
- };
494
- }
495
-
496
- if (prompt.length > 4000) {
497
- return {
498
- tripwireTriggered: true,
499
- message: 'Input too long - may exceed token limits',
500
- severity: 'high',
501
- suggestion: 'Break your request into smaller, focused parts',
502
- };
503
- }
504
-
505
- // Content quality checks
506
- const spamPatterns = [
507
- /^(.)\1{10,}$/, // Repeated characters
508
- /^(test|hello|hi|hey)$/i, // Common spam words
509
- ];
510
-
511
- const foundSpam = spamPatterns.find((pattern) => pattern.test(prompt));
512
- if (foundSpam) {
513
- return {
514
- tripwireTriggered: true,
515
- message: 'Low-quality input detected',
516
- severity: 'high',
517
- };
518
- }
519
-
520
- return { tripwireTriggered: false };
333
+ const guarded = wrapWithOutputGuardrails(
334
+ openai('gpt-4o'),
335
+ [minLengthGuardrail],
336
+ {
337
+ replaceOnBlocked: false,
338
+ retry: {
339
+ maxRetries: 1,
340
+ buildRetryParams: ({ summary, lastParams }) => ({
341
+ ...lastParams,
342
+ maxOutputTokens: Math.max(
343
+ 800,
344
+ (lastParams.maxOutputTokens ?? 400) + 300,
345
+ ),
346
+ prompt: [
347
+ ...(Array.isArray(lastParams.prompt) ? lastParams.prompt : []),
348
+ {
349
+ role: 'user' as const,
350
+ content: [
351
+ {
352
+ type: 'text' as const,
353
+ text: `Note: The previous answer ${summary.blockedResults[0]?.message}. Provide a comprehensive, detailed answer with examples.`,
354
+ },
355
+ ],
356
+ },
357
+ ],
358
+ }),
359
+ },
521
360
  },
361
+ );
362
+
363
+ const { text } = await generateText({
364
+ model: guarded,
365
+ prompt: 'Explain the significance of the Turing Test in AI history.',
522
366
  });
523
367
  ```
524
368
 
525
- ### Professional Output Quality Control
369
+ Tip: Use backoff helpers if you need delays between retries: `exponentialBackoff`, `linearBackoff`, `fixedBackoff`, `jitteredExponentialBackoff`, or `backoffPresets`.
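For intuition about what such helpers typically compute, here is a small sketch of exponential and jittered delays. The functions `exponentialDelayMs` and `jitteredDelayMs`, their parameters, and their defaults are assumptions made for this illustration; they are not the signatures of the library's `exponentialBackoff` or `jitteredExponentialBackoff`.

```typescript
// Hypothetical delay calculators; names and defaults are illustrative only.
function exponentialDelayMs(attempt: number, baseMs = 250, maxMs = 10000): number {
  // attempt is 1-based: 250, 500, 1000, ... capped at maxMs
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

// "Full jitter": pick a random delay between 0 and the exponential delay,
// so that many clients retrying at once do not all hit the API together.
function jitteredDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs = 250,
  maxMs = 10000,
): number {
  return Math.floor(random() * exponentialDelayMs(attempt, baseMs, maxMs));
}

// Delay before retry 1, 2, 3, and 4 with the default settings.
const schedule = [1, 2, 3, 4].map((attempt) => exponentialDelayMs(attempt));
```

Passing the random source as a parameter keeps the jittered variant deterministic in tests while defaulting to `Math.random` in normal use.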
526
370
 
527
- ```typescript
528
- import { extractContent } from 'ai-sdk-guardrails/guardrails/output';
371
+ ## Error Handling
529
372
 
530
- const professionalQualityGuard = defineOutputGuardrail({
531
- name: 'professional-quality-control',
532
- execute: async (context) => {
533
- const { text } = extractContent(context.result);
534
-
535
- const qualityIssues = [];
536
-
537
- // Check for unprofessional language
538
- const unprofessionalTerms = ['lol', 'wtf', 'omg', 'ur', 'u r'];
539
- const hasUnprofessional = unprofessionalTerms.some((term) =>
540
- text.toLowerCase().includes(term),
541
- );
542
-
543
- if (hasUnprofessional) {
544
- qualityIssues.push('Contains unprofessional language');
545
- }
546
-
547
- // Check for placeholder text
548
- const placeholders = ['[insert', '[add', '[your', 'TODO:', 'FIXME:'];
549
- const hasPlaceholders = placeholders.some((placeholder) =>
550
- text.includes(placeholder),
551
- );
552
-
553
- if (hasPlaceholders) {
554
- qualityIssues.push('Contains placeholder text - incomplete response');
555
- }
556
-
557
- // Check for excessive repetition
558
- const sentences = text.split(/[.!?]+/).filter((s) => s.trim());
559
- const uniqueSentences = new Set(
560
- sentences.map((s) => s.trim().toLowerCase()),
561
- );
562
- const repetitionRatio = uniqueSentences.size / sentences.length;
563
-
564
- if (sentences.length > 3 && repetitionRatio < 0.6) {
565
- qualityIssues.push('Excessive repetition detected');
566
- }
567
-
568
- if (qualityIssues.length > 0) {
569
- return {
570
- tripwireTriggered: true,
571
- message: `Quality issues found: ${qualityIssues.join(', ')}`,
572
- severity: 'medium',
573
- suggestion: 'Request a more professional, complete response',
574
- metadata: {
575
- issues: qualityIssues,
576
- quality_score: repetitionRatio,
577
- },
578
- };
579
- }
373
+ Set `throwOnBlocked: true` to throw structured errors you can catch and turn into friendly messages.
580
374
 
581
- return { tripwireTriggered: false };
582
- },
583
- });
584
- ```
375
+ ```ts
376
+ import { generateText } from 'ai';
+ import { isGuardrailsError } from 'ai-sdk-guardrails';
585
377
 
586
- ## 🔄 Streaming Support
378
+ try {
379
+ const { text } = await generateText({ model, prompt: '...' });
380
+ } catch (err) {
381
+ if (isGuardrailsError(err)) {
382
+ console.error('Guardrail blocked:', err.message);
383
+ // err.results gives you details per guardrail
384
+ } else {
385
+ console.error('Unexpected error:', err);
386
+ }
387
+ }
388
+ ```
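One way to turn a caught error into a friendly message is sketched below. It runs without the package installed, so it uses a local stand-in (`hasGuardrailResults`) rather than `isGuardrailsError`, and the result fields (`context.guardrailName`, `suggestion`) as well as the guardrail names are assumptions for illustration; check the package's exported types for the real shape.

```typescript
// Assumed shape for illustration only; the real error type may differ.
interface BlockedResultSketch {
  message?: string;
  suggestion?: string;
  context?: { guardrailName?: string };
}

interface GuardrailsErrorSketch extends Error {
  results: BlockedResultSketch[];
}

// Local stand-in for `isGuardrailsError` so the sketch is self-contained.
function hasGuardrailResults(err: unknown): err is GuardrailsErrorSketch {
  const candidate = err as { results?: unknown } | null | undefined;
  return err instanceof Error && Array.isArray(candidate?.results);
}

// Map a caught error to text that is safe to show to an end user.
function toUserMessage(err: unknown): string {
  if (!hasGuardrailResults(err)) {
    return 'Something went wrong. Please try again.';
  }
  const first = err.results[0];
  switch (first?.context?.guardrailName) {
    case 'content-length-limit': // hypothetical guardrail name
      return 'Your message is too long. Please shorten it and try again.';
    case 'blocked-keywords': // hypothetical guardrail name
      return "I can't help with that topic.";
    default:
      return first?.suggestion ?? 'Please refine your request and try again.';
  }
}
```

Keeping the mapping in one function means internal guardrail messages are never shown to users directly.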
587
389
 
588
- Guardrails work with streams out-of-the-box. By default, output guardrails run after the complete response has been streamed (buffer mode).
390
+ ## Reusable Guardrails Factory
589
391
 
590
- ```typescript
591
- import { streamText } from 'ai';
392
+ Use `createGuardrails()` to create reusable guardrail configurations that can be applied to multiple models:
592
393
 
593
- const guardedModel = wrapWithGuardrails(openai('gpt-4o'), {
594
- outputGuardrails: [qualityJudge],
394
+ ```ts
395
+ import { openai } from '@ai-sdk/openai';
396
+ import { anthropic } from '@ai-sdk/anthropic';
397
+ import { createGuardrails, defineInputGuardrail } from 'ai-sdk-guardrails';
398
+
399
+ // Create reusable guardrails configuration
400
+ const productionGuards = createGuardrails({
401
+ inputGuardrails: [piiDetector(), contentFilter()],
402
+ outputGuardrails: [qualityCheck(), minLength(100)],
403
+ throwOnBlocked: true,
595
404
  });
596
405
 
597
- const { textStream } = await streamText({
598
- model: guardedModel,
599
- prompt: 'Tell me a short story about a robot.',
600
- });
406
+ // Apply to multiple models
407
+ const gpt4 = productionGuards(openai('gpt-4o'));
408
+ const claude = productionGuards(anthropic('claude-3-sonnet'));
601
409
 
602
- // Stream the response to the client
603
- for await (const delta of textStream) {
604
- process.stdout.write(delta);
605
- }
410
+ // Compose multiple guardrail sets
411
+ const strictLimits = createGuardrails({ inputGuardrails: [maxLength(500)] });
412
+ const piiProtection = createGuardrails({ inputGuardrails: [piiDetector()] });
606
413
 
607
- // The qualityJudge guardrail will run after the stream is complete.
414
+ // Chain them together
415
+ const model = piiProtection(strictLimits(openai('gpt-4o')));
608
416
  ```
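To make the composition pattern concrete, here is a toy sketch of a factory that returns a model-wrapping function. It is not how `createGuardrails` is implemented; `Model`, `InputCheck`, and `makeGuards` are hypothetical names, and the evaluation order shown is a property of this sketch only.

```typescript
// Toy model type for illustration; real AI SDK models have a richer interface.
type Model = { id: string; generate: (prompt: string) => string };
// Returns a reason string when the prompt should be blocked, otherwise null.
type InputCheck = (prompt: string) => string | null;

// Hypothetical factory mirroring the "configure once, apply to many" pattern.
function makeGuards(checks: InputCheck[]): (model: Model) => Model {
  return (model) => ({
    ...model,
    generate: (prompt) => {
      for (const check of checks) {
        const reason = check(prompt);
        if (reason !== null) throw new Error(`Blocked: ${reason}`);
      }
      return model.generate(prompt);
    },
  });
}

const maxLen = (limit: number): InputCheck => (p) =>
  p.length > limit ? `input longer than ${limit} characters` : null;
const noEmail: InputCheck = (p) =>
  /\S+@\S+\.\S+/.test(p) ? 'input contains an email address' : null;

const echo: Model = { id: 'echo', generate: (p) => `echo: ${p}` };

// In this sketch the outermost wrapper runs first, so `noEmail` is checked
// before `maxLen`.
const guarded = makeGuards([noEmail])(makeGuards([maxLen(20)])(echo));
console.log(guarded.generate('hello'));
```

Because each wrapper returns a value of the same `Model` type, wrappers can be nested in any order or applied to several models.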
609
417
 
610
- ### Progressive Streaming (opt-in)
418
+ ## MCP Security Guardrails
611
419
 
612
- For early blocking, enable progressive evaluation:
420
+ **Production-Ready**: Protect against prompt injection and data exfiltration attacks when using Model Context Protocol (MCP) tools. Based on research into the ["lethal trifecta" vulnerability](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) that has affected major AI platforms.
613
421
 
614
- ```ts
615
- const guardedModel = wrapWithGuardrails(openai('gpt-4o'), {
616
- outputGuardrails: [qualityJudge],
617
- // Evaluate on the fly and stop early when blocked
618
- streamMode: 'progressive',
619
- // Replace blocked output with a placeholder (default: true)
620
- replaceOnBlocked: true,
621
- });
622
- ```
422
+ ### The Problem
623
423
 
624
- In progressive mode, guardrails evaluate text as it arrives. If blocked:
424
+ AI agents with MCP tools can be vulnerable when they have:
625
425
 
626
- - with `throwOnBlocked: true`, the stream errors.
627
- - with `replaceOnBlocked: true`, a placeholder message is streamed and the stream ends.
628
- - otherwise, the original chunks continue (with a callback via `onOutputBlocked`).
426
+ 1. **Access to private data** (through tools)
427
+ 2. **Exposure to untrusted content** (from tool responses)
428
+ 3. **The ability to communicate externally** (for example, by making web requests)
629
429
 
630
- Note: Progressive mode runs guardrails more frequently and may increase overhead for long streams.
430
+ Malicious tool responses can contain hidden instructions that trick the AI into exfiltrating sensitive data.
631
431
 
632
- ### Configuration Highlights
432
+ ### Production-Ready Solution
633
433
 
634
- - `replaceOnBlocked` (output): defaults to `true` for safer behavior.
635
- - `executionOptions.logLevel`: defaults to `'warn'` (respects `'none' | 'error' | 'warn' | 'info' | 'debug'`).
636
- - `onInputBlocked` / `onOutputBlocked`: receive a `GuardrailExecutionSummary` with analytics.
434
+ Every parameter is configurable, with defaults you can deploy as-is:
637
435
 
638
- ### Cancellation Support
436
+ ```ts
437
+ import {
438
+ withGuardrails,
439
+ promptInjectionDetector,
440
+ mcpSecurityGuardrail,
441
+ mcpResponseSanitizer,
442
+ toolEgressPolicy,
443
+ } from 'ai-sdk-guardrails';
639
444
 
640
- Guardrails can receive an `AbortSignal` and should abort work on timeout or caller-initiated cancel:
445
+ // Conservative production setup (high security)
446
+ const secureModel = withGuardrails(openai('gpt-4o'), {
447
+ inputGuardrails: [
448
+ promptInjectionDetector({ threshold: 0.6, includeExamples: true }),
449
+ ],
450
+ outputGuardrails: [
451
+ mcpSecurityGuardrail({
452
+ injectionThreshold: 0.5, // Lower = more sensitive
453
+ maxSuspiciousUrls: 0, // Zero tolerance
454
+ maxContentSize: 25600, // 25KB limit for performance
455
+ minEncodedLength: 15, // Detect shorter encoded attacks
456
+ encodedInjectionThreshold: 0.2, // Combined encoded + injection threshold
457
+ highRiskThreshold: 0.3, // High-risk cascade blocking
458
+ authorityThreshold: 0.5, // Authority manipulation detection
459
+ allowedDomains: ['api.company.com', 'trusted-partner.com'],
460
+ customSuspiciousDomains: ['evil.com', 'malicious.org'],
461
+ blockCascadingCalls: true,
462
+ scanEncodedContent: true,
463
+ detectExfiltration: true,
464
+ }),
465
+ mcpResponseSanitizer(), // Sanitize malicious content instead of blocking the response
466
+ toolEgressPolicy({
467
+ allowedHosts: ['api.company.com', 'trusted-partner.com'],
468
+ blockedHosts: ['webhook.site', 'requestcatcher.com', 'ngrok.io'],
469
+ scanForUrls: true,
470
+ }),
471
+ ],
472
+ });
473
+ ```
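The host allow/block idea behind an egress policy can be sketched independently of the library. The code below is an assumption-laden illustration, not the implementation of `toolEgressPolicy`: `EgressRules`, `matchesHost`, `isEgressAllowed`, and `findUrls` are hypothetical names, and treating an empty allowlist as "no restriction" is a choice made for this sketch.

```typescript
// Hypothetical egress check, not the library's `toolEgressPolicy` implementation.
interface EgressRules {
  allowedHosts: string[];
  blockedHosts: string[];
}

// True when `host` equals `rule` or is a subdomain of it.
function matchesHost(host: string, rule: string): boolean {
  return host === rule || host.endsWith(`.${rule}`);
}

function isEgressAllowed(rawUrl: string, rules: EgressRules): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  // The blocklist wins over the allowlist.
  if (rules.blockedHosts.some((rule) => matchesHost(host, rule))) return false;
  // An empty allowlist means "no restriction" in this sketch.
  if (rules.allowedHosts.length === 0) return true;
  return rules.allowedHosts.some((rule) => matchesHost(host, rule));
}

// Pull URL-like substrings out of free text so each can be checked.
function findUrls(text: string): string[] {
  return text.match(/https?:\/\/[^\s"'<>)]+/g) ?? [];
}
```

Matching on the parsed hostname, rather than searching the raw string, avoids being fooled by lookalike URLs such as `https://api.company.com.evil.com/`.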
474
+
475
+ ### Environment & Role-Based Configuration
641
476
 
642
477
  ```ts
643
- const guard = defineInputGuardrail({
644
- name: 'long-check',
645
- async execute(context, { signal }) {
646
- await doWork({ signal }); // Pass signal to your async ops
647
- return { tripwireTriggered: false };
648
- },
649
- });
478
+ // Different security profiles for different environments
479
+ function getSecurityConfig(env: 'production' | 'staging' | 'development') {
480
+ const configs = {
481
+ production: {
482
+ injectionThreshold: 0.5, // High security
483
+ maxContentSize: 25600, // 25KB limit
484
+ authorityThreshold: 0.5, // Very sensitive
485
+ },
486
+ staging: {
487
+ injectionThreshold: 0.7, // Balanced security
488
+ maxContentSize: 51200, // 50KB default
489
+ authorityThreshold: 0.7, // Standard sensitivity
490
+ },
491
+ development: {
492
+ injectionThreshold: 0.8, // Lower security, better performance
493
+ maxContentSize: 102400, // 100KB for testing
494
+ authorityThreshold: 0.8, // Less restrictive
495
+ },
496
+ };
497
+ return configs[env];
498
+ }
650
499
 
651
- // Timeouts are enforced by guardrail execution; if it times out, you'll get a GuardrailTimeoutError.
500
+ const productionModel = withGuardrails(openai('gpt-4o'), {
501
+ outputGuardrails: [mcpSecurityGuardrail(getSecurityConfig('production'))],
502
+ });
652
503
  ```
653
504
 
654
- ## 🛠️ Error Handling
505
+ ### Attack Vectors Prevented
655
506
 
656
- When `throwOnBlocked: true` (the default), you can catch structured errors to handle blocks gracefully.
507
+ ✅ **Direct prompt injection** - "System: ignore all previous instructions"
508
+ ✅ **Tool response poisoning** - Malicious content in MCP tool responses
509
+ ✅ **Data exfiltration** - URLs constructed to steal sensitive data
510
+ ✅ **Encoded attacks** - Base64/hex hidden malicious instructions
511
+ ✅ **Cascading exploits** - Tool responses triggering additional dangerous calls
512
+ ✅ **Context poisoning** - Attempts to modify AI behavior mid-conversation
657
513
 
658
- ```typescript
659
- import { generateText } from 'ai';
660
- import { isGuardrailsError } from 'ai-sdk-guardrails';
514
+ ### Secure MCP Agent Example
661
515
 
662
- try {
663
- const result = await generateText({
664
- model: guardedModel,
665
- prompt: 'A prompt that might be blocked...',
666
- });
667
- } catch (error) {
668
- if (isGuardrailsError(error)) {
669
- // Error was thrown by one of our guardrails
670
- console.error('Guardrail check failed:', error.message);
671
- console.error('Triggered Guards:', error.results);
672
- } else {
673
- // Some other error occurred
674
- console.error('An unexpected error occurred:', error);
675
- }
676
- }
516
+ ```ts
517
+ import { withAgentGuardrails } from 'ai-sdk-guardrails';
518
+
519
+ const secureAgent = withAgentGuardrails(
520
+ {
521
+ model: openai('gpt-4o'),
522
+ tools: { file_search, api_call, database_query },
523
+ system: 'You are a secure assistant. Always validate tool responses.',
524
+ },
525
+ {
526
+ inputGuardrails: [promptInjectionDetector()],
527
+ outputGuardrails: [
528
+ mcpSecurityGuardrail({
529
+ detectExfiltration: true,
530
+ allowedDomains: ['trusted-api.com'],
531
+ }),
532
+ mcpResponseSanitizer(),
533
+ ],
534
+ toolGuardrails: [
535
+ toolEgressPolicy({
536
+ allowedHosts: ['trusted-api.com'],
537
+ scanForUrls: true,
538
+ }),
539
+ ],
540
+ },
541
+ );
677
542
  ```
678
543
 
679
- ### User-Friendly Error Messages
544
+ ### Configuration Options
680
545
 
681
- Transform technical guardrail messages into user-friendly guidance:
546
+ All security parameters are fully configurable with sensible defaults:
682
547
 
683
- ```typescript
684
- function createUserFriendlyMessage(guardrailResult): string {
685
- const guardrailName = guardrailResult.context?.guardrailName;
548
+ | Option | Default | Description |
549
+ | --------------------------- | ------- | ------------------------------------------------ |
550
+ | `injectionThreshold` | 0.7 | Prompt injection confidence threshold (0-1) |
551
+ | `maxSuspiciousUrls` | 0 | Max allowed suspicious URLs (0 = zero tolerance) |
552
+ | `maxContentSize` | 51200 | Max content size in bytes (50KB default) |
553
+ | `minEncodedLength` | 20 | Min encoded content length to analyze |
554
+ | `encodedInjectionThreshold` | 0.3 | Combined encoded + injection threshold |
555
+ | `authorityThreshold` | 0.7 | Authority manipulation detection sensitivity |
556
+ | `allowedDomains` | [] | Allowed domains for URL construction |
557
+ | `customSuspiciousDomains` | [] | Additional suspicious domain patterns |
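A minimal sketch of how partial overrides might be merged with these defaults is shown below. The option names and default values are copied from the table; the merge and range-check logic, along with the names `MCP_DEFAULTS` and `resolveMcpOptions`, are assumptions for illustration and not the library's code. The sketch also assumes all three threshold options share the 0-1 range that the table states for `injectionThreshold`.

```typescript
// Option names and defaults come from the table above; everything else here
// is an illustrative assumption.
interface McpSecurityOptions {
  injectionThreshold: number;
  maxSuspiciousUrls: number;
  maxContentSize: number;
  minEncodedLength: number;
  encodedInjectionThreshold: number;
  authorityThreshold: number;
  allowedDomains: string[];
  customSuspiciousDomains: string[];
}

const MCP_DEFAULTS: McpSecurityOptions = {
  injectionThreshold: 0.7,
  maxSuspiciousUrls: 0,
  maxContentSize: 51200,
  minEncodedLength: 20,
  encodedInjectionThreshold: 0.3,
  authorityThreshold: 0.7,
  allowedDomains: [],
  customSuspiciousDomains: [],
};

// Merge caller overrides over the defaults and reject out-of-range thresholds.
function resolveMcpOptions(
  overrides: Partial<McpSecurityOptions> = {},
): McpSecurityOptions {
  const resolved = { ...MCP_DEFAULTS, ...overrides };
  const thresholdKeys = [
    'injectionThreshold',
    'encodedInjectionThreshold',
    'authorityThreshold',
  ] as const;
  for (const key of thresholdKeys) {
    const value = resolved[key];
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError(`${key} must be between 0 and 1, got ${value}`);
    }
  }
  return resolved;
}

console.log(resolveMcpOptions({ injectionThreshold: 0.5 }).injectionThreshold);
```

Validating at configuration time surfaces a mistyped threshold (for example `5` instead of `0.5`) before any request is processed.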
686
558
 
687
- switch (guardrailName) {
688
- case 'content-length-limit':
689
- return 'Your message is too long. Please keep it under 500 characters for the best response.';
559
+ ### Performance & Security Balance
690
560
 
691
- case 'blocked-keywords':
692
- return "I can't help with that topic. Try asking about something else I can assist with.";
561
+ - **High Security**: Lower thresholds, stricter limits, comprehensive scanning
562
+ - **Balanced**: Default settings, good for most production use cases
563
+ - **High Performance**: Higher thresholds, larger limits, selective scanning
693
564
 
694
- case 'user-rate-limit':
695
- return "You're sending requests too quickly. Please wait a moment before trying again.";
565
+ See complete examples:
696
566
 
697
- default:
698
- return (
699
- guardrailResult.suggestion ||
700
- 'Please refine your request and try again.'
701
- );
702
- }
703
- }
704
- ```
567
+ - [Production MCP Configuration](./examples/44-production-mcp-config.ts) - **New!**
568
+ - [MCP Security Test Suite](./examples/41-mcp-security-test.ts)
569
+ - [Enhanced Security Testing](./examples/43-enhanced-mcp-security-test.ts)
570
+ - [Vulnerability Proof of Concept](./examples/42-mcp-vulnerability-proof.ts)
705
571
 
706
- ## Complete AI SDK Integration
572
+ ## Agent Support
707
573
 
708
- The library seamlessly integrates with all AI SDK functions:
574
+ Guardrails work with AI SDK Agents for multi-step agentic workflows:
709
575
 
710
- ```typescript
711
- // Create your production-ready model once
712
- const productionModel = wrapWithGuardrails(openai('gpt-4'), {
713
- inputGuardrails: [lengthGuard, spamGuard, rateLimitGuard],
714
- outputGuardrails: [qualityGuard, sensitiveInfoGuard],
715
- throwOnBlocked: false,
716
- onInputBlocked: (executionSummary) => {
717
- console.log('Input blocked:', executionSummary.blockedResults[0]?.message);
576
+ ```ts
577
+ import { openai } from '@ai-sdk/openai';
578
+ import { withAgentGuardrails, defineOutputGuardrail } from 'ai-sdk-guardrails';
579
+ import { tool } from 'ai';
580
+ import { z } from 'zod';
581
+
582
+ // Define tools for the agent
583
+ const searchTool = tool({
584
+ description: 'Search for information',
585
+ inputSchema: z.object({ query: z.string() }),
586
+ execute: async ({ query }) => `Results for: ${query}`,
587
+ });
718
588
 
719
- // Enhanced analytics available in v4.0.0
720
- console.log(`Execution time: ${executionSummary.totalExecutionTime}ms`);
721
- console.log(
722
- `Guardrails: ${executionSummary.stats.blocked} blocked, ${executionSummary.stats.passed} passed`,
723
- );
589
+ // Create agent with guardrails
590
+ const agent = withAgentGuardrails(
591
+ {
592
+ model: openai('gpt-4o'),
593
+ tools: { search: searchTool },
594
+ system: 'You are a helpful research assistant.',
724
595
  },
725
- onOutputBlocked: (executionSummary) => {
726
- console.log(
727
- 'Output filtered:',
728
- executionSummary.blockedResults[0]?.message,
729
- );
730
-
731
- // Track comprehensive metrics
732
- analytics.track('output_blocked', {
733
- severity: executionSummary.blockedResults[0]?.severity,
734
- totalGuardrails: executionSummary.guardrailsExecuted,
735
- executionTime: executionSummary.totalExecutionTime,
736
- });
596
+ {
597
+ outputGuardrails: [
598
+ defineOutputGuardrail({
599
+ name: 'tool-usage-required',
600
+ description: 'Ensures agent uses search tools',
601
+ execute: async (params) => {
602
+ const hasToolCall = params.result.steps?.some(
603
+ (step) => step.type === 'tool-call',
604
+ );
605
+
606
+ return {
607
+ tripwireTriggered: !hasToolCall,
608
+ message: hasToolCall
609
+ ? 'Tool usage validated'
610
+ : 'Must use search tools for research',
611
+ severity: 'high',
612
+ };
613
+ },
614
+ }),
615
+ ],
616
+ throwOnBlocked: true,
737
617
  },
738
- });
739
-
740
- // Use with any AI SDK function
741
- const textResult = await generateText({
742
- model: productionModel,
743
- prompt: 'Write a professional email response',
744
- });
618
+ );
745
619
 
746
- const objectResult = await generateObject({
747
- model: productionModel,
748
- prompt: 'Create a user profile',
749
- schema: userProfileSchema,
750
- });
751
-
752
- const textStream = await streamText({
753
- model: productionModel,
754
- prompt: 'Explain our product features',
620
+ // Use the guarded agent
621
+ const result = await agent.generate({
622
+ prompt: 'Research the latest AI developments',
755
623
  });
756
624
  ```
757
625
 
758
- ## Examples
626
+ ## API
759
627
 
760
- Explore **30 comprehensive examples** that demonstrate practical performance optimization, security protection, quality assurance, and enterprise-grade safety patterns:
628
+ | Export | Description |
629
+ | ----------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
630
+ | `defineInputGuardrail`, `defineOutputGuardrail` | Create guardrails with clear messages, severity, and metadata. |
631
+ | `withGuardrails`, `createGuardrails`, `withAgentGuardrails` | Attach guardrails to AI SDK models and agents via middleware. |
632
+ | `executeInputGuardrails`, `executeOutputGuardrails` | Run guardrails programmatically (outside middleware) and get structured results. |
633
+ | `retry`, `retryHelpers` | Standalone auto-retry utilities with validation and backoff. |
634
+ | `GuardrailsError`, `GuardrailsInputError`, `GuardrailsOutputError`, `isGuardrailsError`, `extractErrorInfo` | Structured errors and helpers for robust handling. |
635
+ | `exponentialBackoff`, `linearBackoff`, `fixedBackoff`, `jitteredExponentialBackoff`, `backoffPresets` | Backoff strategies to control retry pacing. |
761
636
 
762
- ### Core Foundation Examples
637
+ See source for built-in helpers:
763
638
 
764
- - **[Input Length Limits](examples/01-input-length-limit.ts)** - Foundation patterns for input validation
765
- - **[Blocked Keywords](examples/02-blocked-keywords.ts)** - Block prompts with specific keywords and content filtering
766
- - **[Output Length Check](examples/04-output-length-check.ts)** - Ensure minimum output length and quality control
767
- - **[Quality Assessment](examples/06-quality-assessment.ts)** - Assess response quality and content analysis
768
- - **[Combined Protection](examples/07-combined-protection.ts)** - Simple input/output validation for efficiency and quality
769
- - **[Simple Combined Protection](examples/07a-simple-combined-protection.ts)** - Simplified combined guardrails example
770
- - **[Blocking vs Warning](examples/08-blocking-vs-warning.ts)** - Compare blocking and warning modes with error handling
639
+ - Input helpers: `./src/guardrails/input.ts`
640
+ - Output helpers: `./src/guardrails/output.ts`
771
641
 
772
- ### Security & Protection Examples
773
-
774
- - **[PII Detection](examples/03-pii-detection.ts)** - Detect and block personal information in inputs
775
- - **[Sensitive Output Filter](examples/05-sensitive-output-filter.ts)** - Filter sensitive data from responses
776
- - **[Prompt Injection Detection](examples/16-prompt-injection-detection.ts)** - Comprehensive prompt injection detection with pattern matching and heuristic scoring
777
- - **[Tool Call Validation](examples/17-tool-call-validation.ts)** - Tool call validation with security patterns and dangerous operation detection
778
- - **[Basic Tool Allowlist](examples/17a-basic-tool-allowlist.ts)** - Basic tool allowlisting for secure tool usage
779
- - **[Tool Parameter Validation](examples/17b-tool-parameter-validation.ts)** - Validate tool parameters for security
780
- - **[Secret Leakage Scan](examples/18-secret-leakage-scan.ts)** - Secret leakage scanning with automatic redaction and entropy calculation
781
- - **[Jailbreak Detection](examples/30-jailbreak-detection.ts)** - Jailbreak detection with safe response templates and pattern recognition
642
+ ## Examples
782
643
 
783
- ### Content Quality & Validation Examples
644
+ Browse runnable examples for streaming, compliance, safety, and more:
784
645
 
785
- - **[Autoevals Guardrails](examples/31-autoevals-guardrails.ts)** - AI-powered quality evaluation using Autoevals library for factuality checking
786
- - **[Business Logic](examples/14-business-logic.ts)** - Custom business rules, work hours, and professional standards
787
- - **[LLM-as-Judge](examples/15-llm-as-judge.ts)** - AI-powered quality evaluation and scoring
788
- - **[Simple Quality Judge](examples/15a-simple-quality-judge.ts)** - Simplified quality assessment example
789
- - **[Hallucination Detection](examples/19-hallucination-detection.ts)** - Hallucination detection with LLM-as-judge verification and fact-checking
790
- - **[Response Consistency](examples/22-response-consistency.ts)** - Response consistency validation and coherence checking
646
+ - Index and commands: [examples/README.md](./examples/README.md)
791
647
 
792
- ### Compliance & Regulation Examples
648
+ Quick starts
793
649
 
794
- - **[Regulated Advice Compliance](examples/21-regulated-advice-compliance.ts)** - Regulated advice compliance with jurisdiction-specific rules (US, EU, UK, CA, AU, JP, CN, IN)
795
- - **[Role Hierarchy Enforcement](examples/23-role-hierarchy-enforcement.ts)** - Role hierarchy enforcement with multi-layered violation detection
650
+ | Example | Description | File |
651
+ | -------------------------- | ------------------------------- | --------------------------------------------------------------------------------- |
652
+ | Simple combined protection | Minimal input and output setup | [07a-simple-combined-protection.ts](./examples/07a-simple-combined-protection.ts) |
653
+ | Auto retry on output | Retry until output meets a rule | [32-auto-retry-output.ts](./examples/32-auto-retry-output.ts) |
654
+ | LLM judge auto-retry | Judge feedback drives retry | [33-judge-auto-retry.ts](./examples/33-judge-auto-retry.ts) |
655
+ | Expected tool use retry | Enforce/guide tool usage | [34-expected-tool-use-retry.ts](./examples/34-expected-tool-use-retry.ts) |
656
+ | Weather assistant | End-to-end input/output + retry | [33-blog-post-weather-assistant.ts](./examples/33-blog-post-weather-assistant.ts) |
796
657
 
797
- ### Data Integrity & Code Safety Examples
658
+ Input safety
798
659
 
799
- - **[Schema Validation](examples/09-schema-validation.ts)** - Schema validation and structured output quality
800
- - **[Object Content Filter](examples/10-object-content-filter.ts)** - Filter inappropriate content in generated objects
801
- - **[SQL Code Safety](examples/24-sql-code-safety.ts)** - SQL code safety with dangerous operation blocking and injection detection
660
+ | Example | Description | File |
661
+ | ------------------ | ----------------------------------- | --------------------------------------------------------------- |
662
+ | Input length limit | Enforce max input length | [01-input-length-limit.ts](./examples/01-input-length-limit.ts) |
663
+ | Blocked keywords | Block specific terms | [02-blocked-keywords.ts](./examples/02-blocked-keywords.ts) |
664
+ | PII detection | Detect PII before calling the model | [03-pii-detection.ts](./examples/03-pii-detection.ts) |
665
+ | Rate limiting | Simple per-user rate limit | [13-rate-limiting.ts](./examples/13-rate-limiting.ts) |
802
666
 
803
- ### Network & External Access Examples
667
+ Output safety
804
668
 
805
- - **[Domain Allowlisting](examples/25-browsing-domain-allowlist.ts)** - Domain allowlisting with URL sanitization and security validation
669
+ | Example | Description | File |
670
+ | ----------------------- | ----------------------------------- | ------------------------------------------------------------------------- |
671
+ | Output length check | Require min/max output length | [04-output-length-check.ts](./examples/04-output-length-check.ts) |
672
+ | Sensitive output filter | Filter secrets and PII in responses | [05-sensitive-output-filter.ts](./examples/05-sensitive-output-filter.ts) |
673
+ | Hallucination detection | Flag uncertain factual claims | [19-hallucination-detection.ts](./examples/19-hallucination-detection.ts) |
806
674
 
807
- ### Privacy & Memory Management Examples
675
+ Streaming
808
676
 
809
- - **[Memory Minimization](examples/26-memory-minimization.ts)** - Memory minimization with PII redaction and multiple redaction strategies
810
- - **[Logging Redaction](examples/27-logging-redaction.ts)** - Logging redaction with secure logging practices and compliance frameworks
677
+ | Example | Description | File |
678
+ | ----------------- | ---------------------------------- | --------------------------------------------------------------------------------- |
679
+ | Streaming limits | Apply limits in buffered streaming | [11-streaming-limits.ts](./examples/11-streaming-limits.ts) |
680
+ | Streaming quality | Quality checks with streaming | [12-streaming-quality.ts](./examples/12-streaming-quality.ts) |
681
+ | Early termination | Stop streams early when blocked | [28-streaming-early-termination.ts](./examples/28-streaming-early-termination.ts) |
811
682
 
812
- ### Safety & Escalation Examples
683
+ Advanced
813
684
 
814
- - **[Human Review Escalation](examples/20-human-review-escalation.ts)** - Human review escalation with content flagging, review routing, and quality control workflows
815
- - **[Toxicity & Harassment De-escalation](examples/29-toxicity-harassment-deescalation.ts)** - Toxicity and harassment de-escalation with safe response generation and user escalation tracking
685
+ | Example | Description | File |
686
+ | -------------------------- | ----------------------------- | ------------------------------------------------------------------------------- |
687
+ | Simple quality judge | Cheaper model judges quality | [15a-simple-quality-judge.ts](./examples/15a-simple-quality-judge.ts) |
688
+ | Secret leakage scan | Scan responses for secrets | [18-secret-leakage-scan.ts](./examples/18-secret-leakage-scan.ts) |
689
+ | SQL code safety | Basic SQL safety checks | [24-sql-code-safety.ts](./examples/24-sql-code-safety.ts) |
690
+ | Role hierarchy enforcement | Enforce role rules in prompts | [23-role-hierarchy-enforcement.ts](./examples/23-role-hierarchy-enforcement.ts) |
816
691
 
817
- ### Streaming Examples
692
+ ## Compatibility
818
693
 
819
- - **[Streaming Limits](examples/11-streaming-limits.ts)** - Apply guardrails to streaming responses with real-time validation
820
- - **[Streaming Quality](examples/12-streaming-quality.ts)** - Real-time quality monitoring for streams
821
- - **[Streaming Early Termination](examples/28-streaming-early-termination.ts)** - Streaming early termination with real-time content monitoring and session state management
694
+ - Runtime: Node.js 18+ recommended
695
+ - AI SDK: Compatible with AI SDK 5 (`ai@^5`); wraps any model
696
+ - `generateObject`: for strict object validation, run `executeOutputGuardrails()` on the result after generation
822
697
 
823
- ### Resource Management Examples
698
+ ## Architecture
824
699
 
825
- - **[Rate Limiting](examples/13-rate-limiting.ts)** - Smart rate limiting that prevents resource overuse
700
+ ```mermaid
701
+ flowchart LR
702
+ A[Input] --> B[Input Guardrails]
703
+ B -->|Valid| C[AI Model]
704
+ B -->|Blocked| X[No API Call]
705
+ C --> D[Output Guardrails]
706
+ D -->|Clean| E[Response]
707
+ D -->|Blocked| R[Retry/Replace/Throw]
708
+ ```
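The flow in the diagram can be approximated in a few lines. This is a simplified, synchronous sketch with hypothetical names (`Verdict`, `Guard`, `runPipeline`); real model calls are asynchronous, and the library's middleware also supports retry and replace behaviours that are not modelled here.

```typescript
type Verdict = { blocked: false } | { blocked: true; reason: string };
type Guard = (text: string) => Verdict;
type PipelineResult =
  | { status: 'ok'; text: string }
  | { status: 'input-blocked'; reason: string }
  | { status: 'output-blocked'; reason: string };

// Returns the reason from the first guard that blocks, or null if none do.
function firstBlock(guards: Guard[], text: string): string | null {
  for (const guard of guards) {
    const verdict = guard(text);
    if (verdict.blocked) return verdict.reason;
  }
  return null;
}

function runPipeline(
  prompt: string,
  model: (prompt: string) => string, // synchronous stand-in for a model call
  inputGuards: Guard[],
  outputGuards: Guard[],
): PipelineResult {
  const inputReason = firstBlock(inputGuards, prompt);
  if (inputReason !== null) {
    // Blocked input short-circuits: the model is never called.
    return { status: 'input-blocked', reason: inputReason };
  }
  const text = model(prompt);
  const outputReason = firstBlock(outputGuards, text);
  if (outputReason !== null) {
    return { status: 'output-blocked', reason: outputReason };
  }
  return { status: 'ok', text };
}
```

The early return on blocked input corresponds to the "No API Call" branch: a rejected prompt costs nothing beyond the guard checks themselves.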
826
709
 
827
- ### Running Examples
710
+ ### Design principles
828
711
 
829
- ```bash
830
- # Install dependencies
831
- pnpm install
832
-
833
- # Run core foundation examples
834
- tsx examples/01-input-length-limit.ts # Basic input validation
835
- tsx examples/02-blocked-keywords.ts # Keyword blocking
836
- tsx examples/04-output-length-check.ts # Output length validation
837
- tsx examples/06-quality-assessment.ts # Quality assessment
838
- tsx examples/07-combined-protection.ts # Combined input/output protection
839
- tsx examples/07a-simple-combined-protection.ts # Simplified combined protection
840
- tsx examples/08-blocking-vs-warning.ts # Blocking vs warning modes
841
-
842
- # Run security examples
843
- tsx examples/03-pii-detection.ts # PII protection
844
- tsx examples/05-sensitive-output-filter.ts # Sensitive output filtering
845
- tsx examples/16-prompt-injection-detection.ts # Prompt injection protection
846
- tsx examples/17-tool-call-validation.ts # Tool call validation
847
- tsx examples/17a-basic-tool-allowlist.ts # Basic tool allowlisting
848
- tsx examples/17b-tool-parameter-validation.ts # Tool parameter validation
849
- tsx examples/18-secret-leakage-scan.ts # Secret leakage prevention
850
- tsx examples/30-jailbreak-detection.ts # Jailbreak detection
851
-
852
- # Run content quality examples
853
- tsx examples/31-autoevals-guardrails.ts # AI-powered quality evaluation with Autoevals
854
- tsx examples/14-business-logic.ts # Business-specific rules
855
- tsx examples/15-llm-as-judge.ts # AI-powered quality control
856
- tsx examples/15a-simple-quality-judge.ts # Simplified quality assessment
857
- tsx examples/19-hallucination-detection.ts # Hallucination detection
858
- tsx examples/22-response-consistency.ts # Response consistency
859
-
860
- # Run compliance examples
861
- tsx examples/21-regulated-advice-compliance.ts # Regulatory compliance
862
- tsx examples/23-role-hierarchy-enforcement.ts # Role hierarchy enforcement
863
-
864
- # Run data integrity examples
865
- tsx examples/09-schema-validation.ts # Schema validation
866
- tsx examples/10-object-content-filter.ts # Object content filtering
867
- tsx examples/24-sql-code-safety.ts # SQL code safety
868
-
869
- # Run network security examples
870
- tsx examples/25-browsing-domain-allowlist.ts # Domain allowlisting
871
-
872
- # Run privacy examples
873
- tsx examples/26-memory-minimization.ts # Memory minimization
874
- tsx examples/27-logging-redaction.ts # Logging redaction
875
-
876
- # Run safety examples
877
- tsx examples/20-human-review-escalation.ts # Human review escalation
878
- tsx examples/29-toxicity-harassment-deescalation.ts # Toxicity de-escalation
879
-
880
- # Run streaming examples
881
- tsx examples/11-streaming-limits.ts # Streaming limits
882
- tsx examples/12-streaming-quality.ts # Streaming quality monitoring
883
- tsx examples/28-streaming-early-termination.ts # Streaming early termination
884
-
885
- # Run resource management examples
886
- tsx examples/13-rate-limiting.ts # Rate limiting
887
- ```
712
+ - Helper-first: simple, chainable APIs with great DX
713
+ - Composable: run multiple guardrails in any order
714
+ - Type-safe: rich TypeScript types and inference
715
+ - Sensible defaults: zero-config to start, full control when you need it
888
716
 
889
- ## 🤝 Contributing
717
+ ## Contributing
890
718
 
891
- Contributions of all sizes are welcome! Please open issues and pull requests on [GitHub](https://github.com/jagreehal/ai-sdk-guardrails).
719
+ Issues and PRs are welcome.
892
720
 
893
- ## 📄 License
721
+ ## License
894
722
 
895
- MIT © [Jag Reehal](https://github.com/jagreehal) See LICENSE for full details.
723
+ MIT © Jag Reehal. See LICENSE for details.