ai-evaluate 2.0.2 → 2.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,5 +1,4 @@
-
- 
- > ai-evaluate@2.0.1 build /Users/nathanclevenger/projects/primitives.org.ai/packages/ai-evaluate
- > tsc -p tsconfig.json
-
+
+ > ai-evaluate@2.1.3 build /Users/nathanclevenger/projects/primitives.org.ai/packages/ai-evaluate
+ > tsc -p tsconfig.json
+
package/CHANGELOG.md CHANGED
@@ -1,5 +1,36 @@
  # ai-evaluate
 
+ ## 2.1.3
+
+ ### Patch Changes
+
+ - Documentation and testing improvements
+ - Add deterministic AI testing suite with self-validating patterns
+ - Apply StoryBrand narrative to all package READMEs
+ - Update TESTING.md with four principles of deterministic AI testing
+ - Fix duplicate examples package name conflict
+
+ - Updated dependencies
+ - ai-functions@2.1.3
+ - ai-tests@2.1.3
+
+ ## 2.1.1
+
+ ### Patch Changes
+
+ - Updated dependencies [6beb531]
+ - ai-functions@2.1.1
+ - ai-tests@2.1.1
+
+ ## 2.0.3
+
+ ### Patch Changes
+
+ - Updated dependencies
+ - rpc.do@0.2.0
+ - ai-functions@2.0.3
+ - ai-tests@2.0.3
+
  ## 2.0.2
 
  ### Patch Changes
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 .org.ai
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md CHANGED
@@ -1,92 +1,100 @@
  # ai-evaluate
 
- Secure code execution in sandboxed environments. Run untrusted code safely using Cloudflare Workers or Miniflare.
+ **You need to run user code. But untrusted code is terrifying.**
 
- ## Installation
+ One malicious snippet could crash your server, access your file system, or make unauthorized network requests. You've seen the horror stories. You know the risks.
+
+ What if you could run any code with confidence?
+
+ ## The Solution
+
+ `ai-evaluate` runs untrusted code in V8 isolates with zero access to your system. No file system. No network (by default). No risk.
+
+ ```typescript
+ // Before: Dangerous eval
+ const result = eval(userCode) // Could do ANYTHING
+
+ // After: Sandboxed execution
+ import { evaluate } from 'ai-evaluate'
+
+ const result = await evaluate({
+ script: userCode
+ })
+ // Runs in isolated V8 context - your system is protected
+ ```
+
+ ## Quick Start
+
+ **1. Install**
 
  ```bash
  pnpm add ai-evaluate
  ```
 
- ## Quick Start
+ **2. Evaluate code safely**
 
  ```typescript
  import { evaluate } from 'ai-evaluate'
 
- // Run a simple script
  const result = await evaluate({
  script: '1 + 1'
  })
  // { success: true, value: 2, logs: [], duration: 5 }
+ ```
+
+ **3. Run tests on user code**
 
- // With a module and tests
+ ```typescript
  const result = await evaluate({
  module: `
  export const add = (a, b) => a + b
- export const multiply = (a, b) => a * b
  `,
  tests: `
- describe('math', () => {
+ describe('add', () => {
  it('adds numbers', () => {
- expect(add(2, 3)).toBe(5);
- })
- it('multiplies numbers', () => {
- expect(multiply(2, 3)).toBe(6);
+ expect(add(2, 3)).toBe(5)
  })
  })
- `,
- script: 'add(10, 20)'
+ `
  })
+ // result.testResults.passed === 1
  ```
 
- ## Features
+ ## What You Get
 
- - **Secure isolation** - Code runs in a sandboxed V8 isolate
- - **Vitest-compatible tests** - `describe`, `it`, `expect` in global scope
- - **Module exports** - Define modules and use exports in scripts/tests
- - **Cloudflare Workers** - Uses worker_loaders in production
- - **Miniflare** - Uses Miniflare for local development and Node.js
- - **Network isolation** - External network access blocked by default
+ - **Complete isolation** - Code runs in sandboxed V8 isolates
+ - **Built-in testing** - Vitest-compatible `describe`, `it`, `expect`
+ - **Module support** - Define exports and use them in scripts/tests
+ - **Production-ready** - Cloudflare Workers in production, Miniflare locally
+ - **Network blocked** - External access disabled by default
 
- ## API
+ ## API Reference
 
  ### evaluate(options)
 
- Execute code in a sandboxed environment.
-
  ```typescript
  interface EvaluateOptions {
- /** Module code with exports */
- module?: string
- /** Test code using vitest-style API */
- tests?: string
- /** Script code to run (module exports in scope) */
- script?: string
- /** Timeout in milliseconds (default: 5000) */
- timeout?: number
- /** Environment variables */
- env?: Record<string, string>
+ module?: string // Module code with exports
+ tests?: string // Vitest-style test code
+ script?: string // Script to execute
+ timeout?: number // Default: 5000ms
+ env?: Record<string, string> // Environment variables
+ sdk?: SDKConfig | boolean // Enable $, db, ai globals
  }
 
  interface EvaluateResult {
- /** Whether execution succeeded */
- success: boolean
- /** Return value from script */
- value?: unknown
- /** Console output */
- logs: LogEntry[]
- /** Test results (if tests provided) */
- testResults?: TestResults
- /** Error message if failed */
- error?: string
- /** Execution time in ms */
- duration: number
+ success: boolean // Execution succeeded
+ value?: unknown // Script return value
+ logs: LogEntry[] // Console output
+ testResults?: TestResults // Test results if tests provided
+ error?: string // Error message if failed
+ duration: number // Execution time in ms
  }
  ```
 
  ### createEvaluator(env)
 
- Create an evaluate function bound to a specific environment. Useful for Cloudflare Workers.
+ Bind to a Cloudflare Workers environment.
 
  ```typescript
  import { createEvaluator } from 'ai-evaluate'
@@ -94,24 +102,22 @@ import { createEvaluator } from 'ai-evaluate'
  export default {
  async fetch(request, env) {
  const sandbox = createEvaluator(env)
- const result = await sandbox({
- script: '1 + 1'
- })
+ const result = await sandbox({ script: '1 + 1' })
  return Response.json(result)
  }
  }
  ```
 
- ## Usage Patterns
+ ## Usage Examples
 
- ### Simple Script Execution
+ ### Simple Script
 
  ```typescript
  const result = await evaluate({
  script: `
- const x = 10;
- const y = 20;
- return x + y;
+ const x = 10
+ const y = 20
+ return x + y
  `
  })
  // result.value === 30
@@ -122,189 +128,137 @@ const result = await evaluate({
  ```typescript
  const result = await evaluate({
  module: `
- exports.greet = (name) => \`Hello, \${name}!\`;
- exports.sum = (...nums) => nums.reduce((a, b) => a + b, 0);
+ exports.greet = (name) => \`Hello, \${name}!\`
+ exports.sum = (...nums) => nums.reduce((a, b) => a + b, 0)
  `,
  script: `
- console.log(greet('World'));
- return sum(1, 2, 3, 4, 5);
+ console.log(greet('World'))
+ return sum(1, 2, 3, 4, 5)
  `
  })
  // result.value === 15
  // result.logs[0].message === 'Hello, World!'
  ```
 
- ### Running Tests
+ ### Testing User Code
 
  ```typescript
  const result = await evaluate({
  module: `
  exports.isPrime = (n) => {
- if (n < 2) return false;
+ if (n < 2) return false
  for (let i = 2; i <= Math.sqrt(n); i++) {
- if (n % i === 0) return false;
+ if (n % i === 0) return false
  }
- return true;
- };
+ return true
+ }
  `,
  tests: `
  describe('isPrime', () => {
  it('returns false for numbers less than 2', () => {
- expect(isPrime(0)).toBe(false);
- expect(isPrime(1)).toBe(false);
- });
+ expect(isPrime(0)).toBe(false)
+ expect(isPrime(1)).toBe(false)
+ })
 
  it('returns true for prime numbers', () => {
- expect(isPrime(2)).toBe(true);
- expect(isPrime(3)).toBe(true);
- expect(isPrime(17)).toBe(true);
- });
+ expect(isPrime(2)).toBe(true)
+ expect(isPrime(17)).toBe(true)
+ })
 
  it('returns false for composite numbers', () => {
- expect(isPrime(4)).toBe(false);
- expect(isPrime(9)).toBe(false);
- expect(isPrime(100)).toBe(false);
- });
- });
+ expect(isPrime(4)).toBe(false)
+ expect(isPrime(100)).toBe(false)
+ })
+ })
  `
  })
 
- console.log(result.testResults)
- // {
- // total: 3,
- // passed: 3,
- // failed: 0,
- // skipped: 0,
- // tests: [...]
- // }
+ // result.testResults = { total: 3, passed: 3, failed: 0, ... }
  ```
 
  ## Test Framework
 
- The sandbox provides a vitest-compatible test API with async support.
+ Full vitest-compatible API with async support.
 
- ### describe / it / test
+ ### Test Structure
 
  ```typescript
- describe('group name', () => {
- it('test name', () => {
- // test code
- });
-
- test('another test', () => {
- // test code
- });
-
- it.skip('skipped test', () => {
- // won't run
- });
-
- it.only('only this test', () => {
- // when .only is used, only these tests run
- });
- });
+ describe('group', () => {
+ it('test name', () => { /* ... */ })
+ test('another test', () => { /* ... */ })
+ it.skip('skipped', () => { /* ... */ })
+ it.only('focused', () => { /* ... */ })
+ })
  ```
 
  ### Async Tests
 
  ```typescript
- describe('async operations', () => {
- it('supports async/await', async () => {
- const result = await someAsyncFunction();
- expect(result).toBe('expected');
- });
-
- it('supports promises', () => {
- return fetchData().then(data => {
- expect(data).toBeDefined();
- });
- });
- });
+ it('async/await', async () => {
+ const result = await someAsyncFunction()
+ expect(result).toBe('expected')
+ })
  ```
 
  ### Hooks
 
  ```typescript
  describe('with setup', () => {
- let data;
-
- beforeEach(() => {
- data = { count: 0 };
- });
+ let data
 
- afterEach(() => {
- data = null;
- });
+ beforeEach(() => { data = { count: 0 } })
+ afterEach(() => { data = null })
 
- it('uses setup data', () => {
- data.count++;
- expect(data.count).toBe(1);
- });
- });
+ it('uses setup', () => {
+ data.count++
+ expect(data.count).toBe(1)
+ })
+ })
  ```
 
- ### expect matchers
+ ### Matchers
 
  ```typescript
  // Equality
- expect(value).toBe(expected) // Strict equality (===)
- expect(value).toEqual(expected) // Deep equality
- expect(value).toStrictEqual(expected) // Strict deep equality
+ expect(value).toBe(expected)
+ expect(value).toEqual(expected)
+ expect(value).toStrictEqual(expected)
 
  // Truthiness
- expect(value).toBeTruthy() // Truthy check
- expect(value).toBeFalsy() // Falsy check
- expect(value).toBeNull() // null check
- expect(value).toBeUndefined() // undefined check
- expect(value).toBeDefined() // not undefined
- expect(value).toBeNaN() // NaN check
+ expect(value).toBeTruthy()
+ expect(value).toBeFalsy()
+ expect(value).toBeNull()
+ expect(value).toBeUndefined()
+ expect(value).toBeDefined()
 
  // Numbers
- expect(value).toBeGreaterThan(n) // > comparison
- expect(value).toBeLessThan(n) // < comparison
- expect(value).toBeGreaterThanOrEqual(n)// >= comparison
- expect(value).toBeLessThanOrEqual(n) // <= comparison
- expect(value).toBeCloseTo(n, digits) // Floating point comparison
+ expect(value).toBeGreaterThan(n)
+ expect(value).toBeLessThan(n)
+ expect(value).toBeCloseTo(n, digits)
 
- // Strings
- expect(value).toMatch(/pattern/) // Regex match
- expect(value).toMatch('substring') // Contains substring
-
- // Arrays & Strings
- expect(value).toContain(item) // Array/string contains
- expect(value).toContainEqual(item) // Array contains (deep equality)
- expect(value).toHaveLength(n) // Length check
+ // Strings & Arrays
+ expect(value).toMatch(/pattern/)
+ expect(value).toContain(item)
+ expect(value).toHaveLength(n)
 
  // Objects
- expect(value).toHaveProperty('path') // Has property
- expect(value).toHaveProperty('path', v)// Has property with value
- expect(value).toMatchObject(partial) // Partial object match
-
- // Types
- expect(value).toBeInstanceOf(Class) // instanceof check
- expect(value).toBeTypeOf('string') // typeof check
+ expect(value).toHaveProperty('path')
+ expect(value).toMatchObject(partial)
 
  // Errors
- expect(fn).toThrow() // Throws any error
- expect(fn).toThrow('message') // Throws with message
- expect(fn).toThrow(/pattern/) // Throws matching pattern
- expect(fn).toThrow(ErrorClass) // Throws specific error type
+ expect(fn).toThrow()
+ expect(fn).toThrow('message')
 
- // Negated matchers
+ // Negation
  expect(value).not.toBe(expected)
- expect(value).not.toEqual(expected)
- expect(value).not.toContain(item)
- expect(fn).not.toThrow()
 
- // Promise matchers
+ // Promises
  await expect(promise).resolves.toBe(value)
  await expect(promise).rejects.toThrow('error')
  ```
 
  ## Cloudflare Workers Setup
 
- To use in Cloudflare Workers with worker_loaders:
-
  ### wrangler.toml
 
  ```toml
@@ -315,7 +269,7 @@ main = "src/index.ts"
  binding = "LOADER"
  ```
 
- ### Worker Code
+ ### Worker
 
  ```typescript
  import { createEvaluator } from 'ai-evaluate'
@@ -327,7 +281,6 @@ export interface Env {
  export default {
  async fetch(request: Request, env: Env): Promise<Response> {
  const sandbox = createEvaluator(env)
-
  const { code, tests } = await request.json()
 
  const result = await sandbox({
@@ -340,58 +293,33 @@ export default {
  }
  ```
 
- ## Node.js / Development
+ ## Local Development
 
- In Node.js or during development, the evaluate function automatically uses Miniflare:
+ In Node.js, Miniflare is used automatically:
 
  ```typescript
  import { evaluate } from 'ai-evaluate'
 
- // Miniflare is used automatically when LOADER binding is not present
  const result = await evaluate({
  script: 'return "Hello from Node!"'
  })
  ```
 
- Make sure `miniflare` is installed:
+ Ensure Miniflare is installed:
 
  ```bash
  pnpm add miniflare
  ```
 
- ## Security
-
- The sandbox provides several security features:
-
- 1. **V8 Isolate** - Code runs in an isolated V8 context
- 2. **No Network** - External network access is blocked (`globalOutbound: null`)
- 3. **No File System** - No access to the file system
- 4. **Memory Limits** - Standard Worker memory limits apply
- 5. **CPU Limits** - Execution time is limited
+ ## Security Model
 
- ## Example: Code Evaluation API
-
- ```typescript
- import { evaluate } from 'ai-evaluate'
- import { Hono } from 'hono'
-
- const app = new Hono()
-
- app.post('/evaluate', async (c) => {
- const { module, tests, script } = await c.req.json()
-
- const result = await evaluate({
- module,
- tests,
- script,
- timeout: 5000
- })
-
- return c.json(result)
- })
-
- export default app
- ```
+ | Protection | Description |
+ |------------|-------------|
+ | V8 Isolate | Code runs in isolated V8 context |
+ | No Network | External access blocked by default |
+ | No File System | Zero filesystem access |
+ | Memory Limits | Standard Worker limits apply |
+ | CPU Limits | Execution time bounded |
 
  ## Types
 
@@ -418,3 +346,11 @@ interface TestResult {
  duration: number
  }
  ```
+
+ ---
+
+ **Stop worrying about untrusted code. Start building.**
+
+ ```bash
+ pnpm add ai-evaluate
+ ```
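Code that consumes `evaluate()` output can build a compact report from the `EvaluateResult` fields documented in the README hunks above: `success`, `value`, `logs`, `testResults`, `error` and `duration`. The sketch below is illustrative only and does not import `ai-evaluate`; it redeclares the types locally. The `LogEntry.level`, `TestResult.name`, `TestResult.passed` and `TestResult.error` fields are assumptions, because the diff does not show those definitions. `summarizeResult` is a hypothetical helper and is not part of the package.

```typescript
// Local copies of the result types described in the README.
// Fields marked "assumed" are not confirmed by the diff and are illustrative only.
interface LogEntry {
  level: string // assumed
  message: string // the README reads result.logs[0].message
}

interface TestResult {
  name: string // assumed
  passed: boolean // assumed
  error?: string // assumed
  duration: number
}

interface TestResults {
  total: number
  passed: number
  failed: number
  skipped: number
  tests: TestResult[]
}

interface EvaluateResult {
  success: boolean
  value?: unknown
  logs: LogEntry[]
  testResults?: TestResults
  error?: string
  duration: number
}

// Hypothetical helper: turn an EvaluateResult into a one-line summary.
function summarizeResult(result: EvaluateResult): string {
  if (!result.success) {
    // On failure, report the error message (or a placeholder if none was set).
    return `failed after ${result.duration}ms: ${result.error ?? 'unknown error'}`
  }
  const parts: string[] = [`ok in ${result.duration}ms`]
  if (result.testResults) {
    const { passed, failed, skipped, total } = result.testResults
    parts.push(`tests ${passed}/${total} passed, ${failed} failed, ${skipped} skipped`)
  }
  if (result.value !== undefined) {
    parts.push(`value=${JSON.stringify(result.value)}`)
  }
  if (result.logs.length > 0) {
    parts.push(`${result.logs.length} log line(s)`)
  }
  return parts.join('; ')
}
```

Passing the README's Quick Start result `{ success: true, value: 2, logs: [], duration: 5 }` yields `ok in 5ms; value=2`.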
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ai-evaluate",
- "version": "2.0.2",
+ "version": "2.1.3",
  "description": "Secure code execution in sandboxed environments",
  "type": "module",
  "main": "dist/index.js",
@@ -11,19 +11,10 @@
  "types": "./dist/index.d.ts"
  }
  },
- "scripts": {
- "build": "tsc -p tsconfig.json",
- "dev": "tsc -p tsconfig.json --watch",
- "test": "vitest",
- "typecheck": "tsc --noEmit",
- "lint": "eslint .",
- "clean": "rm -rf dist"
- },
  "dependencies": {
- "ai-functions": "2.0.2",
- "ai-tests": "2.0.2",
  "capnweb": "^0.2.0",
- "rpc.do": "^0.1.0"
+ "ai-functions": "2.1.3",
+ "ai-tests": "2.1.3"
  },
  "devDependencies": {
  "@vitest/coverage-v8": "^2.1.0",
@@ -42,5 +33,13 @@
  "miniflare",
  "primitives"
  ],
- "license": "MIT"
- }
+ "license": "MIT",
+ "scripts": {
+ "build": "tsc -p tsconfig.json",
+ "dev": "tsc -p tsconfig.json --watch",
+ "test": "vitest",
+ "typecheck": "tsc --noEmit",
+ "lint": "eslint .",
+ "clean": "rm -rf dist"
+ }
+ }
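The package.json hunks above move the `scripts` block unchanged and alter `dependencies`:

- `rpc.do` (`^0.1.0`) is dropped.
- `ai-functions` and `ai-tests` move from `2.0.2` to `2.1.3`.
- `capnweb` stays at `^0.2.0`.

As a sketch only, a delta like this can be computed from the two `dependencies` maps. `diffDependencies` is a hypothetical helper, not part of npm or of this package, and it compares version strings literally rather than resolving semver ranges.

```typescript
type DependencyMap = Record<string, string>

interface DependencyDelta {
  added: Record<string, string>
  removed: Record<string, string>
  changed: Record<string, { from: string; to: string }>
}

// Own-property check, so names like "constructor" are not matched via the prototype.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key)

// Hypothetical helper: compare two package.json "dependencies" maps by literal string.
function diffDependencies(before: DependencyMap, after: DependencyMap): DependencyDelta {
  const delta: DependencyDelta = { added: {}, removed: {}, changed: {} }
  for (const [name, range] of Object.entries(before)) {
    const next = hasOwn(after, name) ? after[name] : undefined
    if (next === undefined) {
      delta.removed[name] = range // present before, gone now
    } else if (next !== range) {
      delta.changed[name] = { from: range, to: next } // same package, different spec
    }
  }
  for (const [name, range] of Object.entries(after)) {
    if (!hasOwn(before, name)) {
      delta.added[name] = range // new in the later version
    }
  }
  return delta
}

// Values copied from the package.json hunk above (2.0.2 vs 2.1.3).
const before: DependencyMap = {
  'ai-functions': '2.0.2',
  'ai-tests': '2.0.2',
  capnweb: '^0.2.0',
  'rpc.do': '^0.1.0',
}
const after: DependencyMap = {
  capnweb: '^0.2.0',
  'ai-functions': '2.1.3',
  'ai-tests': '2.1.3',
}

const delta = diffDependencies(before, after)
console.log(JSON.stringify(delta))
```

The literal string comparison keeps the sketch dependency-free. It treats a range edit such as `^0.1.0` to `^0.1.1` as a change even when both ranges could resolve to the same installed version.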