visual-ai-assertions 0.2.0

package/README.md ADDED
# visual-ai-assertions

AI-powered visual assertions for E2E tests. Send screenshots to Claude, GPT, or Gemini and get structured, typed results.

## Installation

```bash
# Install the library
npm install visual-ai-assertions

# Install your preferred provider SDK (pick one or more)
npm install @anthropic-ai/sdk  # for Claude
npm install openai             # for GPT
npm install @google/genai      # for Gemini

# Zod is a peer dependency
npm install zod
```

### System Requirements

This library uses [sharp](https://sharp.pixelplumbing.com/) for image processing.
Sharp downloads native binaries automatically for most supported platforms.

If installation fails in CI, Docker, or a minimal Linux image:

- See the [sharp installation guide](https://sharp.pixelplumbing.com/install)
- On Alpine Linux, install `vips-dev` with `apk add --no-cache vips-dev`
- On minimal Docker images, use `--platform=linux/amd64` or install the required build tools

## Quick Start

### Playwright + Anthropic

```typescript
import { test, expect } from "@playwright/test";
import { visualAI } from "visual-ai-assertions";

const ai = visualAI();
// Provider auto-inferred from ANTHROPIC_API_KEY env var

test("login page looks correct", async ({ page }) => {
  await page.goto("https://myapp.com/login");
  const screenshot = await page.screenshot();

  const result = await ai.check(screenshot, [
    "A login form is visible with email and password fields",
    "A 'Sign In' button is present and visually enabled",
    "The company logo appears in the header",
    "No error messages are displayed",
  ]);

  // Simple pass/fail
  expect(result.pass).toBe(true);

  // Or inspect individual statements
  for (const stmt of result.statements) {
    expect(stmt.pass, `Failed: ${stmt.statement} — ${stmt.reasoning}`).toBe(true);
  }
});
```

### WebDriverIO + OpenAI

```typescript
import { visualAI } from "visual-ai-assertions";

const ai = visualAI({ model: "gpt-5-mini" });
// Provider inferred from model prefix

describe("Product Page", () => {
  it("should display all required elements", async () => {
    await browser.url("https://myapp.com/products/1");
    const screenshot = await browser.saveScreenshot("./screenshot.png");

    const result = await ai.elementsVisible(screenshot, [
      "Product title",
      "Price tag",
      "Add to Cart button",
      "Product image",
    ]);

    expect(result.pass).toBe(true);
  });
});
```


## API Reference

### `visualAI(config?)`

Create an AI visual analysis instance. Provider is auto-inferred from the model name or API key environment variable.

```typescript
import { visualAI, Provider, Model } from "visual-ai-assertions";

// Minimal — provider inferred from ANTHROPIC_API_KEY env var
const ai = visualAI();

// Explicit configuration
const ai = visualAI({
  provider: "anthropic",      // optional — auto-inferred from model or API key
  apiKey: "sk-...",           // optional, defaults to provider env var
  model: "claude-sonnet-4-6", // optional, sensible defaults per provider
  debug: true,                // optional, logs prompts/responses to stderr
  maxTokens: 4096,            // optional, default 4096
  reasoningEffort: "high",    // optional, "low" | "medium" | "high" | "xhigh"
  trackUsage: false,          // optional, defaults to false — usage stats to stderr
});

// Use constants for IDE autocomplete
const ai = visualAI({
  provider: Provider.ANTHROPIC,
  model: Model.Anthropic.SONNET_4_6,
});
```

### `ai.check(image, statements, options?)`

Visual assertion. Returns `pass: true` only if ALL statements are true.

```typescript
// Single statement
const result = await ai.check(screenshot, "The login button is visible");

// Multiple statements
const result = await ai.check(screenshot, [
  "The login button is visible",
  "No error messages are displayed",
]);

// With instructions
const result = await ai.check(screenshot, ["The form is submitted"], {
  instructions: ["Ignore loading spinners that appear briefly"],
});
```

**Returns:** `CheckResult`

```typescript
{
  pass: boolean;                 // true only if ALL statements pass
  reasoning: string;             // overall summary
  issues: Issue[];               // structured findings
  statements: StatementResult[]; // per-statement breakdown
  usage?: {
    inputTokens: number;
    outputTokens: number;
    estimatedCost?: number;      // USD
    durationSeconds?: number;    // API call duration
  };
}
```


### `ai.ask(image, prompt, options?)`

Free-form analysis. Returns structured issues with priority and category.

```typescript
const result = await ai.ask(screenshot, "Analyze this page for UI issues");

// Filter by priority
const critical = result.issues.filter((i) => i.priority === "critical");

// With instructions
const result = await ai.ask(screenshot, "Check for accessibility issues", {
  instructions: ["Ignore contrast on decorative elements"],
});
```

**Returns:** `AskResult`

```typescript
{
  summary: string; // high-level analysis
  issues: Issue[]; // categorized findings
  usage?: {
    inputTokens: number;
    outputTokens: number;
    estimatedCost?: number;
    durationSeconds?: number;
  };
}
```


### `ai.compare(imageA, imageB, options?)`

Compare two images and get structured differences.

```typescript
import { writeFileSync } from "node:fs";

// Basic comparison
const result = await ai.compare(before, after);

// gemini-3-flash-preview includes an annotated diff by default.
// Pass { diffImage: false } to opt out.

// With custom prompt and instructions
const result = await ai.compare(before, after, {
  prompt: "Focus on header layout changes",
  instructions: ["Ignore date/time differences"],
});

// With AI-generated diff image (supported only by gemini-3-flash-preview)
const result = await ai.compare(before, after, {
  diffImage: true,
});
if (result.diffImage) {
  writeFileSync("diff.png", result.diffImage.data);
}
```

**Returns:** `CompareResult`

```typescript
{
  pass: boolean;          // true if no critical/major changes
  reasoning: string;      // overall summary
  changes: ChangeEntry[]; // list of visual differences
  diffImage?: {           // present when diffing is enabled explicitly or by Gemini 3 preview defaults
    data: Buffer;         // PNG image data
    width: number;
    height: number;
    mimeType: "image/png";
  };
  usage?: UsageInfo;
}
```

Where `ChangeEntry` is:

```typescript
{
  description: string; // what changed
  severity: "critical" | "major" | "minor";
}
```
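
If you want to gate on severity yourself rather than rely on `pass`, the shape above is easy to work with. A minimal sketch, with the `ChangeEntry` type restated locally so the snippet stands alone (`hasBlockingChanges` is an illustrative helper, not a library export):

```typescript
// Illustrative helper, not part of the library: mirrors the documented rule
// that a comparison fails only on critical or major changes.
type Severity = "critical" | "major" | "minor";

interface ChangeEntry {
  description: string;
  severity: Severity;
}

function hasBlockingChanges(changes: ChangeEntry[]): boolean {
  return changes.some((c) => c.severity === "critical" || c.severity === "major");
}
```

In a test this could back a softer policy than `expect(result.pass).toBe(true)`, for example logging minor changes while failing only on blocking ones.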

### Template Methods

Type-safe methods for common visual QA checks. All return `CheckResult`. Use the `Accessibility`, `Layout`, and `Content` constants for IDE autocomplete.

```typescript
import { Accessibility, Layout, Content } from "visual-ai-assertions";

// Check that UI elements are visible
await ai.elementsVisible(screenshot, ["Submit button", "Nav bar", "Footer"]);

// Check that UI elements are hidden
await ai.elementsHidden(screenshot, ["Loading spinner", "Error modal"]);

// Accessibility checks (contrast, readability, interactive visibility)
await ai.accessibility(screenshot);
await ai.accessibility(screenshot, {
  checks: [Accessibility.CONTRAST, Accessibility.READABILITY],
});

// Layout checks (overlap, overflow, alignment)
await ai.layout(screenshot);
await ai.layout(screenshot, {
  checks: [Layout.OVERLAP, Layout.OVERFLOW],
  instructions: ["Sticky headers may overlap content — ignore if < 10px"],
});

// Page load verification
await ai.pageLoad(screenshot);
await ai.pageLoad(screenshot, { expectLoaded: false }); // expect loading state

// Content checks (placeholder text, errors, broken images)
await ai.content(screenshot);
await ai.content(screenshot, {
  checks: [Content.PLACEHOLDER_TEXT, Content.ERROR_MESSAGES],
});
```

### Issue Structure

Every issue includes:

```typescript
{
  priority: "critical" | "major" | "minor";
  category:
    | "accessibility"
    | "missing-element"
    | "layout"
    | "content"
    | "styling"
    | "functionality"
    | "performance"
    | "other";
  description: string; // what the issue is
  suggestion: string;  // how to fix it
}
```
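
One pattern that falls out of this shape is triaging findings by priority in a test, for example failing only on critical issues. A self-contained sketch, with the `Issue` type restated locally (`groupByPriority` is illustrative, not a library export):

```typescript
// Illustrative helper, not part of the library: bucket issues by priority so a
// test can fail on critical findings while only logging minor ones.
type Priority = "critical" | "major" | "minor";

interface Issue {
  priority: Priority;
  category: string;
  description: string;
  suggestion: string;
}

function groupByPriority(issues: Issue[]): Record<Priority, Issue[]> {
  const groups: Record<Priority, Issue[]> = { critical: [], major: [], minor: [] };
  for (const issue of issues) {
    groups[issue.priority].push(issue);
  }
  return groups;
}
```

In a test this might look like `expect(groupByPriority(result.issues).critical).toHaveLength(0)`.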

### Image Input

Accepts multiple formats:

```typescript
// Buffer (from a Playwright screenshot)
const screenshot = await page.screenshot();
await ai.check(screenshot, "...");

// File path
await ai.check("./screenshots/page.png", "...");

// Base64 string
await ai.check(base64String, "...");

// URL
await ai.check("https://example.com/screenshot.png", "...");
```

Oversized images are automatically resized to fit provider limits.

### Formatting & Assertion Helpers

```typescript
import {
  formatCheckResult,
  formatCompareResult,
  assertVisualResult,
  assertVisualCompareResult,
} from "visual-ai-assertions";

// Pretty-print results to console
const result = await ai.check(screenshot, ["Login form is visible"]);
console.log(formatCheckResult(result, "login-page"));

// Throw VisualAIAssertionError on failure (the error carries the full result)
assertVisualResult(result, "login-page");

// Same for compare results
const diff = await ai.compare(before, after);
console.log(formatCompareResult(diff));
assertVisualCompareResult(diff, "regression-check");
```


## Error Handling

All errors extend `VisualAIError`, and every concrete error includes an `error.code` string for programmatic handling:

```typescript
import { isVisualAIKnownError } from "visual-ai-assertions";

try {
  const result = await ai.check(screenshot, "Page is loaded");
} catch (error) {
  if (isVisualAIKnownError(error)) {
    switch (error.code) {
      case "AUTH_FAILED":
        // Invalid or missing API key
        break;
      case "RATE_LIMITED":
        // Rate limited — error.retryAfter has seconds to wait
        break;
      case "IMAGE_INVALID":
        // Invalid image: corrupt, unsupported format, etc.
        break;
      case "RESPONSE_PARSE_FAILED":
        // AI returned an unparseable response — error.rawResponse has the raw text
        break;
      case "CONFIG_INVALID":
        // Provider SDK not installed or invalid config
        break;
      case "ASSERTION_FAILED":
        // assertVisualResult threw — error.result has the full failed result
        break;
      case "PROVIDER_ERROR":
      case "VISUAL_AI_ERROR":
        break;
    }
  }
}
```


The `VisualAIKnownError` union and the `isVisualAIKnownError()` helper are useful when you want `switch (error.code)` to narrow to subclass-specific fields such as `retryAfter`, `statusCode`, or `rawResponse`. Class-based `instanceof` checks continue to work too.

## Environment Variables

### API Keys

| Provider  | Environment Variable |
| --------- | -------------------- |
| Anthropic | `ANTHROPIC_API_KEY`  |
| OpenAI    | `OPENAI_API_KEY`     |
| Google    | `GOOGLE_API_KEY`     |

### Optional Configuration

| Variable                | Description |
| ----------------------- | ----------- |
| `VISUAL_AI_PROVIDER`    | Default provider when `provider` is not set in config. Must be `"anthropic"`, `"openai"`, or `"google"`. Falls back to auto-detecting from which API key env var is set. |
| `VISUAL_AI_MODEL`       | Default model when `model` is not set in config. Overrides the provider's default model. |
| `VISUAL_AI_DEBUG`       | Enable debug logging when `debug` is not set in config. Use `"true"` or `"1"` to enable. |
| `VISUAL_AI_TRACK_USAGE` | Enable usage tracking when `trackUsage` is not set in config. Use `"true"` or `"1"` to enable. |
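
For example, a CI job might configure everything through the environment rather than in code. A sketch with placeholder values:

```shell
# Placeholder key — pick the variable for your provider.
export ANTHROPIC_API_KEY="sk-ant-..."

# Optional overrides, consulted only when the matching config option is unset.
export VISUAL_AI_MODEL="claude-haiku-4-5"
export VISUAL_AI_DEBUG="1"
export VISUAL_AI_TRACK_USAGE="true"
```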

## Configuration

| Option            | Type    | Default          | Description |
| ----------------- | ------- | ---------------- | ----------- |
| `provider`        | string  | auto-inferred    | `"anthropic"`, `"openai"`, or `"google"` — inferred from model name or API key |
| `apiKey`          | string  | env var          | API key for the provider |
| `model`           | string  | provider default | Model to use |
| `debug`           | boolean | `false`          | Log prompts/responses to stderr |
| `maxTokens`       | number  | `4096`           | Max tokens for the AI response |
| `reasoningEffort` | string  | `undefined`      | `"low"`, `"medium"`, `"high"`, or `"xhigh"` — controls how deeply the model reasons |
| `trackUsage`      | boolean | `false`          | Log token usage and estimated cost to stderr |

## Exported Types

```typescript
import type {
  AskResult,
  CheckResult,
  CompareResult,
  SupportedMimeType,
  VisualAIConfig,
  VisualAIErrorCode,
} from "visual-ai-assertions";
```

`SupportedMimeType` is the exported image MIME union:

```typescript
type SupportedMimeType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";
```
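
When image metadata arrives as a plain string, it can help to narrow it to this union at the boundary. A sketch, with the union restated locally (`isSupportedMimeType` is illustrative, not a library export):

```typescript
// Illustrative guard, not part of the library: narrow an arbitrary string to
// the SupportedMimeType union before handing it to typed code.
type SupportedMimeType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";

const SUPPORTED_MIME_TYPES: readonly SupportedMimeType[] = [
  "image/jpeg",
  "image/png",
  "image/webp",
  "image/gif",
];

function isSupportedMimeType(value: string): value is SupportedMimeType {
  return (SUPPORTED_MIME_TYPES as readonly string[]).includes(value);
}
```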

**Default models:**

| Provider  | Default Model            |
| --------- | ------------------------ |
| Anthropic | `claude-sonnet-4-6`      |
| OpenAI    | `gpt-5-mini`             |
| Google    | `gemini-3-flash-preview` |

## Reasoning Effort

Control how deeply the model reasons before responding. Higher effort produces more thorough analysis but uses more tokens and takes longer.

```typescript
const ai = visualAI({
  reasoningEffort: "high", // "low" | "medium" | "high" | "xhigh"
});
```

When omitted, each provider uses its default behavior. The `"xhigh"` level enables maximum reasoning depth (it maps to Anthropic's `"max"` effort and OpenAI's `"xhigh"` via the Responses API).

| Provider  | Native Parameter                                      | `"xhigh"` maps to    |
| --------- | ----------------------------------------------------- | -------------------- |
| Anthropic | `thinking.type: "adaptive"` + `output_config.effort`  | `effort: "max"`      |
| OpenAI    | `reasoning.effort` (Responses API)                    | `effort: "xhigh"`    |
| Google    | `thinkingConfig.thinkingBudget` (1024 / 8192 / 24576) | `24576` (max budget) |

## Supported Models

All listed models support image/vision input. Pass any model ID to the `model` config option.

### Anthropic

| Model             | Model ID            | Input $/MTok | Output $/MTok | Notes                         |
| ----------------- | ------------------- | ------------ | ------------- | ----------------------------- |
| Claude Opus 4.6   | `claude-opus-4-6`   | $5           | $25           | Most capable, 128K max output |
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | $3           | $15           | **Default** — best value      |
| Claude Haiku 4.5  | `claude-haiku-4-5`  | $1           | $5            | Fastest, budget-friendly      |

### OpenAI

| Model       | Model ID      | Input $/MTok | Output $/MTok | Notes                          |
| ----------- | ------------- | ------------ | ------------- | ------------------------------ |
| GPT-5.4 Pro | `gpt-5.4-pro` | $30          | $180          | Most capable, extended context |
| GPT-5.4     | `gpt-5.4`     | $2.50        | $15           | Best vision quality            |
| GPT-5.2     | `gpt-5.2`     | $1.75        | $14           | Balanced quality and cost      |
| GPT-5 mini  | `gpt-5-mini`  | $0.25        | $2            | **Default** — fast and cheap   |

### Google

| Model          | Model ID                 | Input $/MTok | Output $/MTok | Notes                             |
| -------------- | ------------------------ | ------------ | ------------- | --------------------------------- |
| Gemini 3.1 Pro | `gemini-3.1-pro-preview` | $2           | $12           | Preview — most advanced reasoning |
| Gemini 3 Flash | `gemini-3-flash-preview` | $0.50        | $3            | **Default** — fast and capable    |
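
The per-MTok prices above pair naturally with the `usage` block on results. A sketch of the arithmetic, useful when `estimatedCost` is absent (`estimateCost` is illustrative, not a library export; the prices are copied from the tables above and only a few models are listed):

```typescript
// Illustrative cost arithmetic, not part of the library: USD cost from token
// counts using per-million-token prices.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// USD per million tokens, copied from the tables above (subset).
const PRICES: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4-6": { input: 3, output: 15 },
  "gpt-5-mini": { input: 0.25, output: 2 },
  "gemini-3-flash-preview": { input: 0.5, output: 3 },
};

function estimateCost(model: string, usage: Usage): number | undefined {
  const price = PRICES[model];
  if (!price) return undefined;
  return (
    (usage.inputTokens / 1_000_000) * price.input +
    (usage.outputTokens / 1_000_000) * price.output
  );
}
```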

## License

MIT