visual-ai-assertions 0.8.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # visual-ai-assertions
 
- AI-powered visual assertions for E2E tests. Send screenshots to Claude, GPT, or Gemini and get structured, typed results.
+ AI-powered visual assertions for E2E tests. Send screenshots — or short video recordings — to Claude, GPT, or Gemini and get structured, typed results.
 
  ## Installation
 
@@ -12,6 +12,9 @@ npm install visual-ai-assertions
  npm install @anthropic-ai/sdk # for Claude
  npm install @google/genai # for Gemini
 
+ # Optional: install ffmpeg deps to enable video input
+ npm install --save-dev fluent-ffmpeg @ffmpeg-installer/ffmpeg @ffprobe-installer/ffprobe
+
  # Zod is a peer dependency
  npm install zod
  ```
@@ -112,9 +115,9 @@ const ai = visualAI({
  });
  ```
 
- ### `ai.check(image, statements, options?)`
+ ### `ai.check(input, statements, options?)`
 
- Visual assertion. Returns `pass: true` only if ALL statements are true.
+ Visual assertion against a screenshot or short video. Returns `pass: true` only if ALL statements are true. For video inputs, a statement passes when it is true at any sampled frame.
 
  ```typescript
  // Single statement
@@ -130,6 +133,12 @@ const result = await ai.check(screenshot, [
  const result = await ai.check(screenshot, ["The form is submitted"], {
    instructions: ["Ignore loading spinners that appear briefly"],
  });
+
+ // Video input — statement is true if it ever happens during the clip
+ const result = await ai.check("./recording.webm", [
+   'A success toast with text "Saved" briefly appears',
+ ]);
+ console.log(result.statements[0].timestampSeconds); // e.g. 3.5
  ```
 
  **Returns:** `CheckResult`
@@ -149,9 +158,9 @@ const result = await ai.check(screenshot, ["The form is submitted"], {
  }
  ```
 
- ### `ai.ask(image, prompt, options?)`
+ ### `ai.ask(input, prompt, options?)`
 
- Free-form analysis. Returns structured issues with priority and category.
+ Free-form analysis of an image or video. Returns structured issues with priority and category. Video inputs are sampled into a frame timeline; the result includes `frameReferences` indicating which frames the model relied on.
 
  ```typescript
  const result = await ai.ask(screenshot, "Analyze this page for UI issues");
@@ -312,6 +321,39 @@ await ai.check("https://example.com/screenshot.png", "...");
 
  Oversized images are automatically resized to provider limits.
 
+ ### Video Input
+
+ `ai.check()` and `ai.ask()` also accept short video recordings (`.mp4`, `.webm`, `.mov`, `.mkv`) — useful for asserting on transient UI like toast messages. Accepted shapes are file path, `data:video/...;base64,...` URL, raw base64 string, `Buffer`, and `Uint8Array`. HTTP/HTTPS URLs are not supported for video inputs — fetch the bytes yourself first.
+
+ ```typescript
+ // Playwright recording on disk
+ const result = await ai.check("./trace/video/recording.webm", [
+   'A success toast with text "Saved" briefly appears',
+ ]);
+
+ // Result includes frame metadata + per-statement timestamps
+ console.log(result.frames);
+ // { count: 4, timestampsSeconds: [0.5, 1.5, 2.5, 3.5], durationSeconds: 4.0 }
+ console.log(result.statements[0].timestampSeconds); // 3.5
+
+ // Override sampling — defaults are 1 fps, max 10 frames, max 10 s of video
+ await ai.check("./long-clip.mp4", ["Loader disappears"], {
+   video: { fps: 2, maxFrames: 20, maxDurationSeconds: 15 },
+ });
+ ```
+
+ `maxFrames` is hard-capped at 60 to keep memory bounded. Frames are downscaled so the longer edge fits within 1568 px before being sent to the provider.
+
+ How it works: the library samples frames with ffmpeg and sends them to the provider as an ordered timeline. A statement passes when it is true at any sampled frame, unless its wording specifies otherwise (e.g. "throughout"). Template helpers (`accessibility`, `layout`, `pageLoad`, `content`, `elementsVisible`, `elementsHidden`) are image-only — pass video to `check()` or `ask()` instead.
+
+ **ffmpeg setup.** Video support is gated on three optional peer deps:
+
+ ```bash
+ npm install --save-dev fluent-ffmpeg @ffmpeg-installer/ffmpeg @ffprobe-installer/ffprobe
+ ```
+
+ Calling `check()` or `ask()` with a video input throws `VisualAIVideoError` (import from `visual-ai-assertions` to `instanceof`-narrow it) if these packages aren't installed. If you already have `ffmpeg`/`ffprobe` on `PATH`, only `fluent-ffmpeg` is required.
+
  ### Formatting & Assertion Helpers
 
  ```typescript
@@ -356,6 +398,9 @@ try {
    case "IMAGE_INVALID":
      // Invalid image: corrupt, unsupported format, etc.
      break;
+   case "VIDEO_INVALID":
+     // Invalid video: missing ffmpeg deps, oversized clip, decode failure, etc.
+     break;
    case "RESPONSE_PARSE_FAILED":
      // AI returned unparseable response — error.rawResponse has raw text
      break;
@@ -415,7 +460,12 @@ import type {
    AskResult,
    CheckResult,
    CompareResult,
+   Frame,
+   MediaInput,
    SupportedMimeType,
+   SupportedVideoMimeType,
+   VideoFramesMetadata,
+   VideoSamplingOptions,
    VisualAIConfig,
    VisualAIErrorCode,
  } from "visual-ai-assertions";
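
The new README text mentions `instanceof`-narrowing `VisualAIVideoError`, but the diff never shows the pattern. A minimal sketch of that narrowing step; the stand-in class below is a hypothetical mock so the snippet runs on its own (the real class, and whatever fields it carries, come from `visual-ai-assertions`), based loosely on the error-handling snippet in the diff:

```typescript
// Hypothetical local stand-in for the library's error type. In real usage:
//   import { VisualAIVideoError } from "visual-ai-assertions";
class VisualAIVideoError extends Error {
  constructor(public code: "VIDEO_INVALID", message: string) {
    super(message);
    this.name = "VisualAIVideoError";
  }
}

// Narrow an unknown caught value: handle video failures, rethrow the rest.
function handleVideoFailure(err: unknown): string {
  if (err instanceof VisualAIVideoError) {
    // e.g. missing ffmpeg deps, oversized clip, decode failure
    return `video input rejected (${err.code}): ${err.message}`;
  }
  throw err; // not a video problem, let it propagate
}

try {
  throw new VisualAIVideoError("VIDEO_INVALID", "ffmpeg deps not installed");
} catch (err) {
  console.log(handleVideoFailure(err));
  // prints: video input rejected (VIDEO_INVALID): ffmpeg deps not installed
}
```

Catching the specific class (rather than matching on `error.code` alone) keeps the `switch`-based handler above for generic `VisualAIError`s while letting video-only setup problems fail fast with an install hint.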