@arclabs561/ai-visual-test 0.5.1 → 0.7.4

Files changed (74)
  1. package/CHANGELOG.md +127 -11
  2. package/DEPLOYMENT.md +225 -9
  3. package/README.md +71 -80
  4. package/index.d.ts +902 -5
  5. package/package.json +10 -51
  6. package/src/batch-optimizer.mjs +39 -0
  7. package/src/cache.mjs +241 -16
  8. package/src/config.mjs +33 -91
  9. package/src/constants.mjs +54 -0
  10. package/src/convenience.mjs +113 -10
  11. package/src/cost-optimization.mjs +1 -0
  12. package/src/cost-tracker.mjs +134 -2
  13. package/src/data-extractor.mjs +36 -7
  14. package/src/dynamic-few-shot.mjs +69 -11
  15. package/src/errors.mjs +6 -2
  16. package/src/experience-propagation.mjs +12 -0
  17. package/src/experience-tracer.mjs +12 -3
  18. package/src/game-player.mjs +222 -43
  19. package/src/graceful-shutdown.mjs +126 -0
  20. package/src/helpers/playwright.mjs +22 -8
  21. package/src/human-validation-manager.mjs +99 -2
  22. package/src/index.mjs +48 -3
  23. package/src/integrations/playwright.mjs +140 -0
  24. package/src/judge.mjs +699 -24
  25. package/src/load-env.mjs +2 -1
  26. package/src/logger.mjs +31 -3
  27. package/src/model-tier-selector.mjs +1 -221
  28. package/src/natural-language-specs.mjs +31 -3
  29. package/src/persona-enhanced.mjs +4 -2
  30. package/src/persona-experience.mjs +1 -1
  31. package/src/pricing.mjs +28 -0
  32. package/src/prompt-composer.mjs +162 -5
  33. package/src/provider-data.mjs +115 -0
  34. package/src/render-change-detector.mjs +5 -0
  35. package/src/research-enhanced-validation.mjs +7 -5
  36. package/src/retry.mjs +21 -7
  37. package/src/rubrics.mjs +4 -0
  38. package/src/safe-logger.mjs +71 -0
  39. package/src/session-cost-tracker.mjs +320 -0
  40. package/src/smart-validator.mjs +8 -8
  41. package/src/spec-templates.mjs +52 -6
  42. package/src/startup-validation.mjs +127 -0
  43. package/src/temporal-adaptive.mjs +2 -2
  44. package/src/temporal-decision-manager.mjs +1 -271
  45. package/src/temporal-logic.mjs +104 -0
  46. package/src/temporal-note-pruner.mjs +119 -0
  47. package/src/temporal-preprocessor.mjs +1 -543
  48. package/src/temporal.mjs +681 -79
  49. package/src/utils/action-hallucination-detector.mjs +301 -0
  50. package/src/utils/baseline-validator.mjs +82 -0
  51. package/src/utils/cache-stats.mjs +104 -0
  52. package/src/utils/cached-llm.mjs +164 -0
  53. package/src/utils/capability-stratifier.mjs +108 -0
  54. package/src/utils/counterfactual-tester.mjs +83 -0
  55. package/src/utils/error-recovery.mjs +117 -0
  56. package/src/utils/explainability-scorer.mjs +119 -0
  57. package/src/utils/exploratory-automation.mjs +131 -0
  58. package/src/utils/index.mjs +10 -0
  59. package/src/utils/intent-recognizer.mjs +201 -0
  60. package/src/utils/log-sanitizer.mjs +165 -0
  61. package/src/utils/path-validator.mjs +88 -0
  62. package/src/utils/performance-logger.mjs +316 -0
  63. package/src/utils/performance-measurement.mjs +280 -0
  64. package/src/utils/prompt-sanitizer.mjs +213 -0
  65. package/src/utils/rate-limiter.mjs +144 -0
  66. package/src/validation-framework.mjs +24 -20
  67. package/src/validation-result-normalizer.mjs +35 -1
  68. package/src/validation.mjs +75 -25
  69. package/src/validators/accessibility-validator.mjs +144 -0
  70. package/src/validators/hybrid-validator.mjs +48 -4
  71. package/api/health.js +0 -34
  72. package/api/validate.js +0 -252
  73. package/public/index.html +0 -149
  74. package/vercel.json +0 -27
package/CHANGELOG.md CHANGED
@@ -1,24 +1,140 @@
1
1
  # Changelog
2
2
 
3
- All notable changes to @arclabs561/ai-visual-test will be documented in this file.
3
+ All notable changes to ai-visual-test will be documented in this file.
4
4
 
5
- ## [0.5.1] - 2025-11-14
5
+ ## [0.7.4] - 2026-03-03
6
+
7
+ ### Added
8
+ - **Structured result fields at top level** - `result.richIssues`, `result.recommendations`, `result.strengths` promoted from `result.semantic` to the top-level result object, eliminating the need to reach into `result.semantic` for structured output.
9
+ - `richIssues`: array of `{ description, importance, annoyance, impact, evidence, suggestion }` objects
10
+ - `recommendations`: array of `{ priority, suggestion, expectedImpact }` objects
11
+ - `strengths`: array of strings describing what works well
12
+ - **TypeScript types** for `RichIssue`, `Recommendation`; updated `SemanticInfo` and `ValidationResult` interfaces.
13
+
14
+ ### Fixed
15
+ - `result.issues` (flat strings) is preserved for backward compatibility; `result.richIssues` adds the structured version alongside it.
16
+
17
+ ## [0.7.3] - 2026-03-02
18
+
19
+ ### Added
20
+ - **Visual anchors** - domain-level grounding cues (text + image) injected into VLM prompts. Supports `AnchorEntry` union type: plain strings, dimension-scoped text, image references, or combinations. Config-level anchors merge with per-call `context.anchors`.
21
+ - **Dimension-scoped anchors** - tag anchors with rubric dimension names for targeted evaluation.
22
+ - **Image anchor resolution** - file paths, data URIs, and raw base64 supported for reference screenshots.
23
+
24
+ ### Fixed
25
+ - Prompt composer: proper `\n\n` separation between anchor section and base prompt.
26
+ - Judge: always warn on missing anchor images (not just verbose mode).
27
+ - Build script: strip `scripts` and `devDependencies` from dist `package.json`.
28
+ - Publish workflow: run only unit tests in CI; audit prod deps only.
29
+
30
+ ## [0.6.0] - 2025-01-17
6
31
 
7
32
  ### Changed
8
- - **Package renamed to scoped** - Now published as `@arclabs561/ai-visual-test` for consistency with other @arclabs561 packages
9
- - **Breaking change**: Update imports from `ai-visual-test` to `@arclabs561/ai-visual-test`
33
+ - **Selective Obfuscation** - Core algorithms obfuscated while maintaining debuggability
34
+ - Obfuscates only Tier 1 files (temporal decision, cost optimization, activity preprocessing)
35
+ - Keeps API surface, validators, utilities, and cache system readable
36
+ - Transparent about obfuscation strategy in README
37
+ - TypeScript definitions enhanced with comprehensive JSDoc (survives obfuscation)
38
+ - **Documentation Strategy** - Minimal, self-contained documentation in package
39
+ - `API_QUICK_REFERENCE.md` - Essential API patterns (in package)
40
+ - `EXAMPLES.md` - Working code examples (in package)
41
+ - Enhanced TypeScript definitions with examples and usage patterns
42
+ - README updated with obfuscation transparency section
43
+ - All documentation self-contained (no external hosting, GitHub is private)
44
+
45
+ ### Security
46
+ - **Path Traversal Prevention** - Added comprehensive path validation to prevent directory traversal attacks
47
+ - `src/utils/path-validator.mjs` - Centralized path validation utilities
48
+ - All image paths validated before file operations
49
+ - Absolute paths properly resolved and validated
50
+ - **Prompt Injection Protection** - Protection against prompt injection attacks
51
+ - `src/utils/prompt-sanitizer.mjs` - Prompt sanitization and security validation
52
+ - Strict mode validation (default) or sanitization mode
53
+ - Detects and prevents malicious prompt patterns
54
+ - **Image Format Validation** - Magic byte validation to prevent MIME type spoofing
55
+ - Validates PNG, JPEG, GIF, WebP formats using file signatures
56
+ - Prevents malicious file uploads disguised as images
57
+ - **Library-Level Rate Limiting** - Configurable request and cost-based rate limiting
58
+ - `src/utils/rate-limiter.mjs` - Request and cost-based rate limiting
59
+ - Prevents API abuse and cost overruns
60
+ - Configurable limits per window
61
+ - **Log Sanitization** - All logged output sanitized to prevent information leakage
62
+ - `src/utils/log-sanitizer.mjs` - Utilities for sanitizing sensitive data
63
+ - Error messages use basename for file paths
64
+ - Sensitive data removed from logs
65
+ - **Input Validation** - Comprehensive input validation
66
+ - Prompt length limits (10k characters max)
67
+ - File path validation for all file operations
68
+ - Error message sanitization
69
+
70
+ ### Changed
71
+ - **Repository Privacy** - GitHub repository made private
72
+ - Source code, history, and internal documentation no longer publicly accessible
73
+ - **Selective Obfuscation** - Protects proprietary algorithms while maintaining usability
74
+ - Obfuscates: `temporal-decision-manager.mjs`, `cost-optimization.mjs`, `model-tier-selector.mjs`, `temporal-preprocessor.mjs`
75
+ - Readable: API surface, validators, utilities, cache system, error handling
76
+ - Build script shows which files are obfuscated (🔒) vs readable (📄)
77
+ - Transparent documentation about obfuscation strategy
78
+ - **Package Cleanup** - Removed deployment-specific files from npm package
79
+ - Removed `vercel.json`, `api/**/*.js`, `public/**/*.html` from package
80
+ - Package now contains only library code (115 files)
81
+ - Cleaner, library-only distribution
82
+
83
+ ### Added
84
+ - **Security Utilities**
85
+ - `src/utils/path-validator.mjs` - Path validation and traversal prevention
86
+ - `src/utils/prompt-sanitizer.mjs` - Prompt injection protection
87
+ - `src/utils/rate-limiter.mjs` - Library-level rate limiting
88
+ - `src/utils/log-sanitizer.mjs` - Log sanitization utilities
89
+ - **Build System**
90
+ - `scripts/build-obfuscated.mjs` - Obfuscation build script
91
+ - `scripts/cleanup-root-docs.mjs` - Repository cleanup automation
92
+ - `npm run build` - Build obfuscated package
93
+ - `npm run build:skip-obfuscation` - Build without obfuscation (testing)
94
+ - **Documentation**
95
+ - `API_QUICK_REFERENCE.md` - Essential API patterns (in package)
96
+ - `EXAMPLES.md` - Working code examples (in package)
97
+ - Enhanced TypeScript definitions with comprehensive JSDoc comments
98
+ - `docs/OBFUSCATION_STRATEGY.md` - Complete obfuscation strategy
99
+ - `docs/OBFUSCATION_IMPLEMENTATION.md` - Implementation details
100
+
101
+ ### Improved
102
+ - **Error Handling** - Enhanced error messages with sanitization
103
+ - File paths use basename in error messages
104
+ - No sensitive information in error output
105
+ - Better error categorization
106
+ - **Secret Detection** - Improved false positive handling
107
+ - Added patterns for common code constructs
108
+ - Excluded script from self-checking
109
+ - Better detection of actual secrets vs. code patterns
110
+
111
+ ### Fixed
112
+ - **Test Failures** - Fixed ExploratoryStrategy test (shared state issue)
113
+ - **Build Script** - Fixed obfuscator detection logic
114
+ - **Package Paths** - Fixed package.json paths for dist/ directory
115
+
116
+ ### Repository
117
+ - **Cleanup** - Archived 14 temporary documentation files
118
+ - **Organization** - Root directory reduced from ~20+ to 7 essential files
119
+ - **Gitignore** - Updated to exclude temporary files and deployment configs
120
+
121
+ ### Security Rating
122
+ - Improved from **LOW-MEDIUM** to **8.5/10**
123
+ - All critical vulnerabilities addressed
124
+ - Production-ready security posture
10
125
 
11
126
  ## [0.5.0] - 2025-11-13
12
127
 
13
128
  ### Added
14
129
  - **API Sub-Modules** - Organized API into logical sub-modules for better tree-shaking
15
- - `@arclabs561/ai-visual-test/validators` - All validation functionality
16
- - `@arclabs561/ai-visual-test/temporal` - Temporal aggregation and decision-making
17
- - `@arclabs561/ai-visual-test/multi-modal` - Multi-modal validation features
18
- - `@arclabs561/ai-visual-test/ensemble` - Ensemble judging and bias detection
19
- - `@arclabs561/ai-visual-test/persona` - Persona-based testing
20
- - `@arclabs561/ai-visual-test/specs` - Natural language specifications
21
- - `@arclabs561/ai-visual-test/utils` - Utility functions and infrastructure
130
+ - `ai-visual-test/validators` - All validation functionality
131
+ - `ai-visual-test/temporal` - Temporal aggregation and decision-making
132
+ - `ai-visual-test/multi-modal` - Multi-modal validation features
133
+ - `ai-visual-test/ensemble` - Ensemble judging and bias detection
134
+ - `ai-visual-test/persona` - Persona-based testing
135
+ - `ai-visual-test/specs` - Natural language specifications
136
+ - `ai-visual-test/utils` - Utility functions and infrastructure
137
+ - Main export (`ai-visual-test`) still works for backward compatibility
22
138
  - **Smart Validators** - Automatically select the best validator type based on available context
23
139
  - `validateSmart()` - Universal smart validator that auto-selects best method
24
140
  - `validateAccessibilitySmart()` - Smart accessibility validation (programmatic/VLLM/hybrid)
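Reviewer note on the 0.6.0 **Image Format Validation** entry: the changelog says PNG, JPEG, GIF, and WebP files are validated by file signature, but this part of the diff does not show how. The following is a minimal sketch of what such a magic-byte check might look like. `detectImageFormat` and its helpers are hypothetical names used for illustration, not the package's API; only the byte signatures come from the published format definitions.

```javascript
// Hypothetical helper (not part of ai-visual-test): identify an image format
// from its leading bytes instead of trusting the file extension or MIME type.

// True when `bytes` contains `signature` starting at `offset`.
function startsWith(bytes, signature, offset = 0) {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((value, i) => bytes[offset + i] === value);
}

// Convert an ASCII string to an array of byte values.
function ascii(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Accepts a Uint8Array (or plain array) holding at least the first 12 bytes
// of a file. Returns 'png', 'jpeg', 'gif', 'webp', or null if unrecognized.
function detectImageFormat(bytes) {
  // PNG: fixed 8-byte signature.
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  // JPEG: start-of-image marker (FF D8) followed by the next marker prefix (FF).
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return 'jpeg';
  // GIF: ASCII header "GIF87a" or "GIF89a".
  if (startsWith(bytes, ascii('GIF87a')) || startsWith(bytes, ascii('GIF89a'))) return 'gif';
  // WebP: RIFF container with the form type "WEBP" at byte offset 8.
  if (startsWith(bytes, ascii('RIFF')) && startsWith(bytes, ascii('WEBP'), 8)) return 'webp';
  return null;
}
```

A caller would read the first few bytes of an upload and reject it when the result is `null` or disagrees with the declared type. This only guards against mislabelled files; it does not prove the rest of the file is a well-formed image.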
package/DEPLOYMENT.md CHANGED
@@ -1,5 +1,14 @@
1
1
  # Deployment Guide
2
2
 
3
+ ## Overview
4
+
5
+ This guide covers deploying `@arclabs561/ai-visual-test` in production environments, including:
6
+ - Vercel serverless deployment
7
+ - Docker containerization
8
+ - Health checks and monitoring
9
+ - Graceful shutdown
10
+ - Environment variable validation
11
+
3
12
  ## Vercel Deployment
4
13
 
5
14
  ### Quick Deploy
@@ -15,22 +24,95 @@ vercel
15
24
 
16
25
  ### Environment Variables
17
26
 
18
- Set these in Vercel dashboard:
27
+ **Required** (at least one API key):
28
+ - `GEMINI_API_KEY` - For Gemini provider
29
+ - `OPENAI_API_KEY` - For OpenAI provider
30
+ - `ANTHROPIC_API_KEY` - For Claude/Anthropic provider
31
+ - `GROQ_API_KEY` - For Groq provider (high-frequency decisions)
32
+
33
+ **Optional**:
34
+ - `VLM_PROVIDER` - Provider to use (auto-detected if not set): `gemini`, `openai`, `claude`, `groq`
35
+ - `VLM_MODEL` - Explicit model override
36
+ - `VLM_MODEL_TIER` - Model tier: `fast`, `balanced`, `best`
37
+ - `API_KEY` or `VLLM_API_KEY` - For API endpoint authentication
38
+ - `REQUIRE_AUTH` - Set to `true` to enforce authentication (default: `true` if API_KEY is set)
39
+ - `RATE_LIMIT_MAX_REQUESTS` - Max requests per minute (default: 10)
40
+ - `DISABLE_LLM_CACHE` - Set to `true` to disable caching globally
41
+
42
+ ### Startup Validation
19
43
 
20
- - `GEMINI_API_KEY` (or `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
21
- - `VLM_PROVIDER` (optional)
22
- - `API_KEY` or `VLLM_API_KEY` (optional, for API authentication)
23
- - `REQUIRE_AUTH` (optional, set to `true` to enforce authentication)
24
- - `RATE_LIMIT_MAX_REQUESTS` (optional, default: 10 requests per minute)
44
+ The library automatically validates configuration at startup. If required environment variables are missing, you'll get clear error messages:
45
+
46
+ ```javascript
47
+ import { validateStartup, validateStartupSoft } from '@arclabs561/ai-visual-test';
48
+
49
+ // Strict validation (throws on missing vars)
50
+ try {
51
+ validateStartup();
52
+ console.log('✅ Configuration valid');
53
+ } catch (error) {
54
+ console.error('❌ Configuration invalid:', error.message);
55
+ // Error includes actionable guidance:
56
+ // "Missing required environment variables for provider 'gemini': GEMINI_API_KEY"
57
+ }
58
+
59
+ // Soft validation (returns warnings)
60
+ const result = validateStartupSoft();
61
+ if (!result.valid) {
62
+ console.warn('⚠️ Configuration warnings:', result.warnings);
63
+ }
64
+ ```
25
65
 
26
66
  ### API Endpoints
27
67
 
28
68
  After deployment, you'll have:
29
69
 
30
- - `https://your-site.vercel.app/api/validate` - Validation endpoint
31
- - `https://your-site.vercel.app/api/health` - Health check
70
+ - `https://your-site.vercel.app/api/validate` - Validation endpoint (POST)
71
+ - `https://your-site.vercel.app/api/health` - Health check (GET)
32
72
  - `https://your-site.vercel.app/` - Web interface
33
73
 
74
+ #### Health Check Endpoint
75
+
76
+ The health check endpoint provides comprehensive status:
77
+
78
+ ```bash
79
+ curl https://your-site.vercel.app/api/health
80
+ ```
81
+
82
+ **Response**:
83
+ ```json
84
+ {
85
+ "status": "healthy",
86
+ "timestamp": "2025-01-17T12:00:00.000Z",
87
+ "version": "0.5.5",
88
+ "config": {
89
+ "enabled": true,
90
+ "provider": "gemini",
91
+ "hasApiKey": true
92
+ },
93
+ "validation": {
94
+ "valid": true,
95
+ "warnings": []
96
+ },
97
+ "cache": {
98
+ "enabled": true,
99
+ "hits": 1234,
100
+ "misses": 567,
101
+ "hitRate": 0.685
102
+ }
103
+ }
104
+ ```
105
+
106
+ **Status Codes**:
107
+ - `200` - Healthy (all checks pass)
108
+ - `503` - Degraded (configuration issues, but service may still work)
109
+ - `500` - Error (health check itself failed)
110
+
111
+ Use this endpoint for:
112
+ - Load balancer health checks
113
+ - Monitoring and alerting
114
+ - Deployment verification
115
+
34
116
  ### Usage
35
117
 
36
118
  ```javascript
@@ -66,6 +148,123 @@ const remaining = response.headers.get('X-RateLimit-Remaining');
66
148
  const resetAt = response.headers.get('X-RateLimit-Reset');
67
149
  ```
68
150
 
151
+ ## Docker Deployment
152
+
153
+ ### Dockerfile Example
154
+
155
+ ```dockerfile
156
+ FROM node:18-alpine
157
+
158
+ WORKDIR /app
159
+
160
+ # Copy package files
161
+ COPY package*.json ./
162
+ RUN npm ci --only=production
163
+
164
+ # Copy source code
165
+ COPY src ./src
166
+ COPY api ./api
167
+
168
+ # Set environment
169
+ ENV NODE_ENV=production
170
+
171
+ # Expose port (if running as server)
172
+ EXPOSE 3000
173
+
174
+ # Health check
175
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
176
+ CMD node -e "require('http').get('http://localhost:3000/api/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) })"
177
+
178
+ # Start application
179
+ CMD ["node", "api/server.js"]
180
+ ```
181
+
182
+ ### Docker Compose Example
183
+
184
+ ```yaml
185
+ version: '3.8'
186
+
187
+ services:
188
+ ai-visual-test:
189
+ build: .
190
+ ports:
191
+ - "3000:3000"
192
+ environment:
193
+ - GEMINI_API_KEY=${GEMINI_API_KEY}
194
+ - VLM_PROVIDER=gemini
195
+ - NODE_ENV=production
196
+ healthcheck:
197
+ test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
198
+ interval: 30s
199
+ timeout: 10s
200
+ retries: 3
201
+ start_period: 40s
202
+ ```
203
+
204
+ ## Graceful Shutdown
205
+
206
+ The library includes graceful shutdown handling for long-running processes:
207
+
208
+ ```javascript
209
+ import { initGracefulShutdown, registerShutdownHandler } from '@arclabs561/ai-visual-test';
210
+
211
+ // Initialize (automatically done in library, but can be customized)
212
+ initGracefulShutdown({ timeout: 30000 }); // 30 second timeout
213
+
214
+ // Register custom shutdown handlers
215
+ registerShutdownHandler(async () => {
216
+ // Clean up your resources
217
+ await closeDatabase();
218
+ await flushLogs();
219
+ }, 10); // Priority (higher = called first)
220
+ ```
221
+
222
+ **Features**:
223
+ - Handles `SIGTERM` and `SIGINT` signals
224
+ - Executes shutdown handlers in priority order
225
+ - Flushes caches and cleans up resources
226
+ - Timeout protection (default: 30s)
227
+ - Handles uncaught exceptions
228
+
229
+ ## Monitoring and Observability
230
+
231
+ ### Health Checks
232
+
233
+ Monitor the `/api/health` endpoint:
234
+ - **Interval**: Check every 30-60 seconds
235
+ - **Timeout**: 3-5 seconds
236
+ - **Alert on**: Status `503` (degraded) or `500` (error)
237
+
238
+ ### Metrics to Monitor
239
+
240
+ 1. **Health Check Status**
241
+ - `status: "healthy"` vs `"degraded"` vs `"error"`
242
+ - Validation warnings
243
+
244
+ 2. **Cache Performance**
245
+ - Hit rate (should be >50% in production)
246
+ - Cache size
247
+
248
+ 3. **API Performance**
249
+ - Response times (via performance logger)
250
+ - Error rates
251
+ - Cost tracking
252
+
253
+ ### Logging
254
+
255
+ The library includes comprehensive logging:
256
+ - API call performance (latency, retries, costs)
257
+ - Cache operations (hits, misses, evictions)
258
+ - Temporal decisions (when prompts trigger/skip)
259
+ - Error patterns
260
+
261
+ Enable debug logging:
262
+ ```javascript
263
+ import { setDebugEnabled } from '@arclabs561/ai-visual-test';
264
+
265
+ setDebugEnabled(true);
266
+ ```
267
+
69
268
  ## Local Development
70
269
 
71
270
  ```bash
@@ -76,5 +275,22 @@ npm install
76
275
  npm test
77
276
 
78
277
  # Use as library
79
- import { validateScreenshot } from '@ai-visual-test/core';
278
+ import { validateScreenshot } from '@arclabs561/ai-visual-test';
279
+
280
+ # Validate startup configuration
281
+ import { validateStartup } from '@arclabs561/ai-visual-test';
282
+ validateStartup(); // Throws if configuration invalid
80
283
  ```
284
+
285
+ ## Production Checklist
286
+
287
+ - [ ] Set required API keys in environment
288
+ - [ ] Configure `VLM_PROVIDER` if using specific provider
289
+ - [ ] Set `API_KEY` for endpoint authentication (if exposing API)
290
+ - [ ] Configure `RATE_LIMIT_MAX_REQUESTS` based on expected load
291
+ - [ ] Set up health check monitoring
292
+ - [ ] Configure logging aggregation
293
+ - [ ] Set up cost tracking and alerts
294
+ - [ ] Test graceful shutdown
295
+ - [ ] Verify cache directory permissions (if using file cache)
296
+ - [ ] Review security settings (`REQUIRE_AUTH`, rate limits)
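Reviewer note on the new **Graceful Shutdown** section: the guide states that handlers run in priority order (higher first) under a timeout, but does not show the mechanism. One way this could work is sketched below. `createShutdownRegistry`, `register`, `ordered`, and `run` are hypothetical names chosen for illustration; this is not the library's `initGracefulShutdown` / `registerShutdownHandler` implementation.

```javascript
// Hypothetical sketch (not the library's implementation): a registry that runs
// shutdown handlers from highest to lowest priority, bounded by a timeout.
function createShutdownRegistry({ timeout = 30000 } = {}) {
  const handlers = [];

  // Store a handler together with its priority.
  function register(fn, priority = 0) {
    handlers.push({ fn, priority });
  }

  // Higher priority first. Array.prototype.sort is stable, so handlers with
  // equal priority keep their registration order.
  function ordered() {
    return [...handlers].sort((a, b) => b.priority - a.priority);
  }

  // Run every handler in order; reject if the whole sequence exceeds `timeout` ms.
  async function run() {
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('Shutdown timed out')), timeout);
    });
    const work = (async () => {
      for (const { fn } of ordered()) {
        try {
          await fn();
        } catch (error) {
          // A failing handler is logged and skipped so later handlers still run.
          console.error('Shutdown handler failed:', error);
        }
      }
    })();
    try {
      await Promise.race([work, deadline]);
    } finally {
      clearTimeout(timer); // avoid keeping the process alive after handlers finish
    }
  }

  return { register, ordered, run };
}
```

In a real process this would typically be wired to signals, for example `process.once('SIGTERM', () => registry.run().finally(() => process.exit(0)))`. Running handlers sequentially keeps the priority guarantee simple; running equal-priority handlers in parallel would shut down faster at the cost of weaker ordering.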
package/README.md CHANGED
@@ -1,18 +1,20 @@
1
- # @arclabs561/ai-visual-test
1
+ # ai-visual-test
2
2
 
3
- AI-powered visual testing. Uses vision language models to understand screenshots instead of pixel-diffing.
3
+ Visual testing framework using Vision Language Models. Validates screenshots, checks accessibility, and can play games.
4
4
 
5
- ## Why
5
+ ## Why This Package
6
6
 
7
- Pixel-based testing breaks when content changes or layouts shift. This tool asks "does this look correct?" instead of "did pixels change?"
7
+ Pixel-based testing breaks when content changes. This tool asks "does this look correct?" instead of "did pixels change?"
8
8
 
9
- ## Install
9
+ ## Installation
10
10
 
11
11
  ```bash
12
12
  npm install @arclabs561/ai-visual-test
13
13
  ```
14
14
 
15
- Set an API key:
15
+ ## Configuration
16
+
17
+ Set an API key in a `.env` file:
16
18
 
17
19
  ```bash
18
20
  # .env file
@@ -23,7 +25,26 @@ OPENAI_API_KEY=your-key-here
23
25
  ANTHROPIC_API_KEY=your-key-here
24
26
  ```
25
27
 
26
- ## Use
28
+ ## Quick Start
29
+
30
+ ### With Playwright
31
+
32
+ ```javascript
33
+ import { validatePage } from '@arclabs561/ai-visual-test';
34
+ import { chromium } from 'playwright';
35
+
36
+ const browser = await chromium.launch();
37
+ const page = await browser.newPage();
38
+ await page.goto('https://example.com');
39
+
40
+ // validatePage() handles screenshotting
41
+ const result = await validatePage(page, 'Check for visual bugs and accessibility issues');
42
+
43
+ console.log(result.score); // 7 (0-10 scale)
44
+ console.log(result.issues); // ['Missing error messages', 'Low contrast']
45
+ ```
46
+
47
+ ### With Screenshot Path
27
48
 
28
49
  ```javascript
29
50
  import { validateScreenshot } from '@arclabs561/ai-visual-test';
@@ -33,109 +54,79 @@ const result = await validateScreenshot(
33
54
  'Check if this payment form is accessible and usable'
34
55
  );
35
56
 
36
- console.log(result.score); // 0-10
57
+ console.log(result.score); // 7 (0-10 scale)
37
58
  console.log(result.issues); // ['Missing error messages', 'Low contrast']
38
59
  ```
39
60
 
40
- ## What it's good for
41
-
42
- - **Accessibility** - Fast programmatic checks or VLLM semantic evaluation
43
- - **Design principles** - Validates brutalist, minimal, or other styles
44
- - **Temporal testing** - Analyzes animations and gameplay over time
45
- - **State validation** - Fast programmatic or VLLM extraction
46
- - **Game testing** - Validate gameplay with variable goals
47
- - **Natural language specs** - Write tests in plain English
61
+ ## Key Features
48
62
 
49
- ## What it's not good for
50
-
51
- - Pixel-perfect layout testing (use pixel-diffing tools)
52
- - Exact color matching (use design tools)
53
- - Performance testing (use Lighthouse)
54
- - Unit testing (use Jest/Vitest)
55
-
56
- ## API
57
-
58
- ### Core
63
+ ### 1. Hybrid Validation
64
+ Combines deterministic code checks (contrast ratios, aria-labels) with AI visual judgment.
59
65
 
60
66
  ```javascript
61
- import { validateScreenshot, createConfig } from '@arclabs561/ai-visual-test';
67
+ import { validateAccessibilityHybrid } from '@arclabs561/ai-visual-test/validators';
68
+ // Checks code AND pixels
69
+ const result = await validateAccessibilityHybrid(page, 'shot.png');
70
+ ```
62
71
 
63
- // Configure (optional - auto-detects from env)
64
- const config = createConfig({
65
- provider: 'gemini',
66
- apiKey: process.env.GEMINI_API_KEY
67
- });
72
+ ### 2. AI Game Agent
73
+ Plays Canvas/WebGL games by analyzing screenshots and planning actions. Includes Reflexion (learning from mistakes) and Chain of Thought.
68
74
 
69
- // Validate
70
- const result = await validateScreenshot(
71
- 'screenshot.png',
72
- 'Evaluate this screenshot',
73
- { testType: 'payment-screen' }
74
- );
75
+ ```javascript
76
+ import { playGame } from '@arclabs561/ai-visual-test';
77
+ await playGame(page, { goal: 'Win the level', maxSteps: 50 });
75
78
  ```
76
79
 
77
- ### Sub-modules (better tree-shaking)
80
+ ### 3. Cost Optimization
81
+ Caching, model tiering, and provider selection. See `test/performance/optimization-claims-validation.test.mjs` for validation.
78
82
 
79
- ```javascript
80
- // Validators
81
- import { StateValidator } from '@arclabs561/ai-visual-test/validators';
83
+ ## Documentation
82
84
 
83
- // Temporal
84
- import { aggregateTemporalNotes } from '@arclabs561/ai-visual-test/temporal';
85
+ - [**EXAMPLES.md**](./EXAMPLES.md) - Code snippets for Game Playing, Hybrid Validation, Playwright integration.
86
+ - [**API_QUICK_REFERENCE.md**](./API_QUICK_REFERENCE.md) - Function signatures and options.
87
+ - [**examples/**](./examples/) - Runnable examples.
88
+ - **TypeScript**: Type definitions included.
85
89
 
86
- // Multi-modal
87
- import { multiModalValidation } from '@arclabs561/ai-visual-test/multi-modal';
90
+ ## Playwright Integration
88
91
 
89
- // Ensemble
90
- import { EnsembleJudge } from '@arclabs561/ai-visual-test/ensemble';
92
+ Custom matchers for Playwright tests. **Requires `@playwright/test` to be installed** (already in devDependencies for this project).
91
93
 
92
- // Persona
93
- import { experiencePageAsPersona } from '@arclabs561/ai-visual-test/persona';
94
+ ### Setup
94
95
 
95
- // Specs
96
- import { parseSpec } from '@arclabs561/ai-visual-test/specs';
96
+ ```javascript
97
+ import { expect } from '@playwright/test';
98
+ import { createMatchers } from '@arclabs561/ai-visual-test/playwright';
97
99
 
98
- // Utils
99
- import { getCacheStats } from '@arclabs561/ai-visual-test/utils';
100
+ // Extend expect with custom matchers (call once in your test setup)
101
+ createMatchers(expect);
100
102
  ```
101
103
 
102
- ### With Playwright
104
+ ### Usage in Tests
103
105
 
104
106
  ```javascript
105
- import { test } from '@playwright/test';
106
- import { validateScreenshot } from '@arclabs561/ai-visual-test';
107
-
108
- test('payment screen', async ({ page }) => {
109
- await page.goto('https://example.com/checkout');
110
- await page.screenshot({ path: 'checkout.png' });
107
+ test('visual quality', async ({ page }) => {
108
+ await page.goto('https://example.com');
111
109
 
112
- const result = await validateScreenshot(
113
- 'checkout.png',
114
- 'Check if payment form is accessible'
115
- );
110
+ // Visual quality check
111
+ await expect(page).toHaveVisualScore(7, 'Check visual quality');
116
112
 
117
- assert(result.score >= 8, 'Payment form should score at least 8');
113
+ // Hybrid accessibility check (programmatic + AI)
114
+ await expect(page).toBeAccessibleHybrid(4.5);
118
115
  });
119
116
  ```
120
117
 
121
- ## Features
122
-
123
- - **Multi-provider** - Gemini, OpenAI, Claude
124
- - **Cost-effective** - Auto-selects cheapest provider, includes caching
125
- - **Multi-modal** - Screenshots + rendered code + context
126
- - **Temporal** - Time-series validation for animations
127
- - **Multi-perspective** - Multiple personas evaluate same state
128
- - **Zero dependencies** - Pure ES Modules
118
+ ### Installation
129
119
 
130
- ## Examples
120
+ For development in this project, Playwright is already installed. For use in other projects:
131
121
 
132
- See `examples/` directory for complete examples.
122
+ ```bash
123
+ npm install --save-dev @playwright/test
124
+ npx playwright install chromium
125
+ ```
133
126
 
134
- ## Documentation
127
+ See `examples/playwright-setup.mjs` for setup example.
135
128
 
136
- - `docs/API_SUBMODULES.md` - Sub-module usage
137
- - `docs/API_SURFACE_ORGANIZATION.md` - API organization
138
- - `CHANGELOG.md` - Version history
129
+ Documentation: [docs/PLAYWRIGHT_INTEGRATION.md](./docs/PLAYWRIGHT_INTEGRATION.md)
139
130
 
140
131
  ## License
141
132