@arclabs561/ai-visual-test 0.5.1 → 0.7.4
- package/CHANGELOG.md +127 -11
- package/DEPLOYMENT.md +225 -9
- package/README.md +71 -80
- package/index.d.ts +902 -5
- package/package.json +10 -51
- package/src/batch-optimizer.mjs +39 -0
- package/src/cache.mjs +241 -16
- package/src/config.mjs +33 -91
- package/src/constants.mjs +54 -0
- package/src/convenience.mjs +113 -10
- package/src/cost-optimization.mjs +1 -0
- package/src/cost-tracker.mjs +134 -2
- package/src/data-extractor.mjs +36 -7
- package/src/dynamic-few-shot.mjs +69 -11
- package/src/errors.mjs +6 -2
- package/src/experience-propagation.mjs +12 -0
- package/src/experience-tracer.mjs +12 -3
- package/src/game-player.mjs +222 -43
- package/src/graceful-shutdown.mjs +126 -0
- package/src/helpers/playwright.mjs +22 -8
- package/src/human-validation-manager.mjs +99 -2
- package/src/index.mjs +48 -3
- package/src/integrations/playwright.mjs +140 -0
- package/src/judge.mjs +699 -24
- package/src/load-env.mjs +2 -1
- package/src/logger.mjs +31 -3
- package/src/model-tier-selector.mjs +1 -221
- package/src/natural-language-specs.mjs +31 -3
- package/src/persona-enhanced.mjs +4 -2
- package/src/persona-experience.mjs +1 -1
- package/src/pricing.mjs +28 -0
- package/src/prompt-composer.mjs +162 -5
- package/src/provider-data.mjs +115 -0
- package/src/render-change-detector.mjs +5 -0
- package/src/research-enhanced-validation.mjs +7 -5
- package/src/retry.mjs +21 -7
- package/src/rubrics.mjs +4 -0
- package/src/safe-logger.mjs +71 -0
- package/src/session-cost-tracker.mjs +320 -0
- package/src/smart-validator.mjs +8 -8
- package/src/spec-templates.mjs +52 -6
- package/src/startup-validation.mjs +127 -0
- package/src/temporal-adaptive.mjs +2 -2
- package/src/temporal-decision-manager.mjs +1 -271
- package/src/temporal-logic.mjs +104 -0
- package/src/temporal-note-pruner.mjs +119 -0
- package/src/temporal-preprocessor.mjs +1 -543
- package/src/temporal.mjs +681 -79
- package/src/utils/action-hallucination-detector.mjs +301 -0
- package/src/utils/baseline-validator.mjs +82 -0
- package/src/utils/cache-stats.mjs +104 -0
- package/src/utils/cached-llm.mjs +164 -0
- package/src/utils/capability-stratifier.mjs +108 -0
- package/src/utils/counterfactual-tester.mjs +83 -0
- package/src/utils/error-recovery.mjs +117 -0
- package/src/utils/explainability-scorer.mjs +119 -0
- package/src/utils/exploratory-automation.mjs +131 -0
- package/src/utils/index.mjs +10 -0
- package/src/utils/intent-recognizer.mjs +201 -0
- package/src/utils/log-sanitizer.mjs +165 -0
- package/src/utils/path-validator.mjs +88 -0
- package/src/utils/performance-logger.mjs +316 -0
- package/src/utils/performance-measurement.mjs +280 -0
- package/src/utils/prompt-sanitizer.mjs +213 -0
- package/src/utils/rate-limiter.mjs +144 -0
- package/src/validation-framework.mjs +24 -20
- package/src/validation-result-normalizer.mjs +35 -1
- package/src/validation.mjs +75 -25
- package/src/validators/accessibility-validator.mjs +144 -0
- package/src/validators/hybrid-validator.mjs +48 -4
- package/api/health.js +0 -34
- package/api/validate.js +0 -252
- package/public/index.html +0 -149
- package/vercel.json +0 -27
package/CHANGELOG.md
CHANGED
|
@@ -1,24 +1,140 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
-
All notable changes to
|
|
3
|
+
All notable changes to ai-visual-test will be documented in this file.
|
|
4
4
|
|
|
5
|
-
## [0.
|
|
5
|
+
## [0.7.4] - 2026-03-03
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
- **Structured result fields at top level** - `result.richIssues`, `result.recommendations`, `result.strengths` promoted from `result.semantic` to the top-level result object, eliminating the need to reach into `result.semantic` for structured output.
|
|
9
|
+
- `richIssues`: array of `{ description, importance, annoyance, impact, evidence, suggestion }` objects
|
|
10
|
+
- `recommendations`: array of `{ priority, suggestion, expectedImpact }` objects
|
|
11
|
+
- `strengths`: array of strings describing what works well
|
|
12
|
+
- **TypeScript types** for `RichIssue`, `Recommendation`; updated `SemanticInfo` and `ValidationResult` interfaces.
|
|
13
|
+
|
|
14
|
+
### Fixed
|
|
15
|
+
- `result.issues` (flat strings) is preserved for backward compatibility; `result.richIssues` adds the structured version alongside it.
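As a rough illustration of the 0.7.4 result shape, the sketch below reads the promoted fields from a hand-written object. The field names come from the entries above; the sample values, the string form of `importance`, `annoyance`, and `priority`, and the `summarize` helper are assumptions made for the example, not output from the library.

```javascript
// Hypothetical sample shaped like the 0.7.4 entries; all values are invented.
const result = {
  score: 7,
  issues: ['Low contrast on submit button'], // flat strings, kept for compatibility
  richIssues: [
    {
      description: 'Low contrast on submit button',
      importance: 'high',
      annoyance: 'medium',
      impact: 'Users may miss the primary action',
      evidence: 'Grey text on a light grey background',
      suggestion: 'Darken the button label',
    },
  ],
  recommendations: [
    { priority: 'high', suggestion: 'Increase label contrast', expectedImpact: 'Better readability' },
  ],
  strengths: ['Clear visual hierarchy'],
};

// Illustrative helper: prefer structured issues, fall back to the flat strings.
function summarize(r) {
  const issues = Array.isArray(r.richIssues) && r.richIssues.length > 0
    ? r.richIssues.map((i) => `${i.description} -> ${i.suggestion}`)
    : (r.issues ?? []);
  const recommendations = (r.recommendations ?? []).map((x) => `[${x.priority}] ${x.suggestion}`);
  return { issues, recommendations, strengths: r.strengths ?? [] };
}

console.log(summarize(result));
```

Falling back to `result.issues` keeps the helper usable with results produced before the structured fields were promoted.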
|
|
16
|
+
|
|
17
|
+
## [0.7.3] - 2026-03-02
|
|
18
|
+
|
|
19
|
+
### Added
|
|
20
|
+
- **Visual anchors** - domain-level grounding cues (text + image) injected into VLM prompts. Supports `AnchorEntry` union type: plain strings, dimension-scoped text, image references, or combinations. Config-level anchors merge with per-call `context.anchors`.
|
|
21
|
+
- **Dimension-scoped anchors** - tag anchors with rubric dimension names for targeted evaluation.
|
|
22
|
+
- **Image anchor resolution** - file paths, data URIs, and raw base64 supported for reference screenshots.
|
|
23
|
+
|
|
24
|
+
### Fixed
|
|
25
|
+
- Prompt composer: proper `\n\n` separation between anchor section and base prompt.
|
|
26
|
+
- Judge: always warn on missing anchor images (not just verbose mode).
|
|
27
|
+
- Build script: strip `scripts` and `devDependencies` from dist `package.json`.
|
|
28
|
+
- Publish workflow: run only unit tests in CI; audit prod deps only.
|
|
29
|
+
|
|
30
|
+
## [0.6.0] - 2025-01-17
|
|
6
31
|
|
|
7
32
|
### Changed
|
|
8
|
-
- **
|
|
9
|
-
-
|
|
33
|
+
- **Selective Obfuscation** - Core algorithms obfuscated while maintaining debuggability
|
|
34
|
+
- Obfuscates only Tier 1 files (temporal decision, cost optimization, activity preprocessing)
|
|
35
|
+
- Keeps API surface, validators, utilities, and cache system readable
|
|
36
|
+
- Transparent about obfuscation strategy in README
|
|
37
|
+
- TypeScript definitions enhanced with comprehensive JSDoc (survives obfuscation)
|
|
38
|
+
- **Documentation Strategy** - Minimal, self-contained documentation in package
|
|
39
|
+
- `API_QUICK_REFERENCE.md` - Essential API patterns (in package)
|
|
40
|
+
- `EXAMPLES.md` - Working code examples (in package)
|
|
41
|
+
- Enhanced TypeScript definitions with examples and usage patterns
|
|
42
|
+
- README updated with obfuscation transparency section
|
|
43
|
+
- All documentation self-contained (no external hosting, GitHub is private)
|
|
44
|
+
|
|
45
|
+
### Security
|
|
46
|
+
- **Path Traversal Prevention** - Added comprehensive path validation to prevent directory traversal attacks
|
|
47
|
+
- `src/utils/path-validator.mjs` - Centralized path validation utilities
|
|
48
|
+
- All image paths validated before file operations
|
|
49
|
+
- Absolute paths properly resolved and validated
|
|
50
|
+
- **Prompt Injection Protection** - Protection against prompt injection attacks
|
|
51
|
+
- `src/utils/prompt-sanitizer.mjs` - Prompt sanitization and security validation
|
|
52
|
+
- Strict mode validation (default) or sanitization mode
|
|
53
|
+
- Detects and prevents malicious prompt patterns
|
|
54
|
+
- **Image Format Validation** - Magic byte validation to prevent MIME type spoofing
|
|
55
|
+
- Validates PNG, JPEG, GIF, WebP formats using file signatures
|
|
56
|
+
- Prevents malicious file uploads disguised as images
|
|
57
|
+
- **Library-Level Rate Limiting** - Configurable request and cost-based rate limiting
|
|
58
|
+
- `src/utils/rate-limiter.mjs` - Request and cost-based rate limiting
|
|
59
|
+
- Prevents API abuse and cost overruns
|
|
60
|
+
- Configurable limits per window
|
|
61
|
+
- **Log Sanitization** - All logged output sanitized to prevent information leakage
|
|
62
|
+
- `src/utils/log-sanitizer.mjs` - Utilities for sanitizing sensitive data
|
|
63
|
+
- Error messages use basename for file paths
|
|
64
|
+
- Sensitive data removed from logs
|
|
65
|
+
- **Input Validation** - Comprehensive input validation
|
|
66
|
+
- Prompt length limits (10k characters max)
|
|
67
|
+
- File path validation for all file operations
|
|
68
|
+
- Error message sanitization
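The magic-byte check listed under Image Format Validation can be sketched as follows. The byte signatures are the standard ones for PNG, JPEG, GIF, and WebP; the `detectImageFormat` helper and its return values are hypothetical and are not taken from the package source.

```javascript
// True if `bytes` contains `sig` starting at `offset`.
const startsWith = (bytes, sig, offset = 0) =>
  bytes.length >= offset + sig.length && sig.every((b, i) => bytes[offset + i] === b);

// Hypothetical helper: returns the detected format, or null if none matches.
function detectImageFormat(bytes) {
  // PNG: 89 'P' 'N' 'G' CR LF 1A LF
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  // JPEG: FF D8 FF
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return 'jpeg';
  // GIF: "GIF87a" or "GIF89a"
  if (startsWith(bytes, [0x47, 0x49, 0x46, 0x38]) &&
      (bytes[4] === 0x37 || bytes[4] === 0x39) && bytes[5] === 0x61) return 'gif';
  // WebP: "RIFF", a 4-byte size field, then "WEBP"
  if (startsWith(bytes, [0x52, 0x49, 0x46, 0x46]) &&
      startsWith(bytes, [0x57, 0x45, 0x42, 0x50], 8)) return 'webp';
  return null;
}

const png = Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0, 0]);
const fake = new TextEncoder().encode('<script>alert(1)</script>');
console.log(detectImageFormat(png));  // "png"
console.log(detectImageFormat(fake)); // null
```

Checking the leading bytes rather than the file extension or a declared MIME type is what stops a renamed non-image file from passing as a screenshot.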
|
|
69
|
+
|
|
70
|
+
### Changed
|
|
71
|
+
- **Repository Privacy** - GitHub repository made private
|
|
72
|
+
- Source code, history, and internal documentation no longer publicly accessible
|
|
73
|
+
- **Selective Obfuscation** - Protects proprietary algorithms while maintaining usability
|
|
74
|
+
- Obfuscates: `temporal-decision-manager.mjs`, `cost-optimization.mjs`, `model-tier-selector.mjs`, `temporal-preprocessor.mjs`
|
|
75
|
+
- Readable: API surface, validators, utilities, cache system, error handling
|
|
76
|
+
- Build script shows which files are obfuscated (🔒) vs readable (📄)
|
|
77
|
+
- Transparent documentation about obfuscation strategy
|
|
78
|
+
- **Package Cleanup** - Removed deployment-specific files from npm package
|
|
79
|
+
- Removed `vercel.json`, `api/**/*.js`, `public/**/*.html` from package
|
|
80
|
+
- Package now contains only library code (115 files)
|
|
81
|
+
- Cleaner, library-only distribution
|
|
82
|
+
|
|
83
|
+
### Added
|
|
84
|
+
- **Security Utilities**
|
|
85
|
+
- `src/utils/path-validator.mjs` - Path validation and traversal prevention
|
|
86
|
+
- `src/utils/prompt-sanitizer.mjs` - Prompt injection protection
|
|
87
|
+
- `src/utils/rate-limiter.mjs` - Library-level rate limiting
|
|
88
|
+
- `src/utils/log-sanitizer.mjs` - Log sanitization utilities
|
|
89
|
+
- **Build System**
|
|
90
|
+
- `scripts/build-obfuscated.mjs` - Obfuscation build script
|
|
91
|
+
- `scripts/cleanup-root-docs.mjs` - Repository cleanup automation
|
|
92
|
+
- `npm run build` - Build obfuscated package
|
|
93
|
+
- `npm run build:skip-obfuscation` - Build without obfuscation (testing)
|
|
94
|
+
- **Documentation**
|
|
95
|
+
- `API_QUICK_REFERENCE.md` - Essential API patterns (in package)
|
|
96
|
+
- `EXAMPLES.md` - Working code examples (in package)
|
|
97
|
+
- Enhanced TypeScript definitions with comprehensive JSDoc comments
|
|
98
|
+
- `docs/OBFUSCATION_STRATEGY.md` - Complete obfuscation strategy
|
|
99
|
+
- `docs/OBFUSCATION_IMPLEMENTATION.md` - Implementation details
|
|
100
|
+
|
|
101
|
+
### Improved
|
|
102
|
+
- **Error Handling** - Enhanced error messages with sanitization
|
|
103
|
+
- File paths use basename in error messages
|
|
104
|
+
- No sensitive information in error output
|
|
105
|
+
- Better error categorization
|
|
106
|
+
- **Secret Detection** - Improved false positive handling
|
|
107
|
+
- Added patterns for common code constructs
|
|
108
|
+
- Excluded script from self-checking
|
|
109
|
+
- Better detection of actual secrets vs. code patterns
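The error-handling entries above (basename-only file paths, no sensitive data in error output) might look roughly like the sketch below. The regular expression and all three helper names are assumptions chosen for illustration; they are not the contents of `log-sanitizer.mjs`.

```javascript
// Hypothetical pattern: NAME=value or NAME: value where NAME looks like a credential.
const SECRET_ASSIGNMENT = /\b([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET)[A-Z0-9_]*)\s*[=:]\s*\S+/g;

// Last path segment for POSIX or Windows separators.
// (Node's path.basename is the usual choice; this keeps the sketch import-free.)
function baseName(filePath) {
  const parts = String(filePath).split(/[\\/]/).filter(Boolean);
  return parts.length > 0 ? parts[parts.length - 1] : '';
}

// Replace credential values while keeping the variable name for debugging.
function redactSecrets(text) {
  return String(text).replace(SECRET_ASSIGNMENT, '$1=[REDACTED]');
}

// Build an error message that exposes neither the directory layout nor secrets.
function fileErrorMessage(action, filePath, cause) {
  return redactSecrets(`${action} failed for ${baseName(filePath)}: ${cause}`);
}

console.log(fileErrorMessage('Read', '/home/alice/shots/login.png', 'ENOENT'));
// "Read failed for login.png: ENOENT"
console.log(redactSecrets('request rejected, GEMINI_API_KEY=abc123 invalid'));
// "request rejected, GEMINI_API_KEY=[REDACTED] invalid"
```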
|
|
110
|
+
|
|
111
|
+
### Fixed
|
|
112
|
+
- **Test Failures** - Fixed ExploratoryStrategy test (shared state issue)
|
|
113
|
+
- **Build Script** - Fixed obfuscator detection logic
|
|
114
|
+
- **Package Paths** - Fixed package.json paths for dist/ directory
|
|
115
|
+
|
|
116
|
+
### Repository
|
|
117
|
+
- **Cleanup** - Archived 14 temporary documentation files
|
|
118
|
+
- **Organization** - Root directory reduced from more than 20 files to 7 essential files
|
|
119
|
+
- **Gitignore** - Updated to exclude temporary files and deployment configs
|
|
120
|
+
|
|
121
|
+
### Security Rating
|
|
122
|
+
- Improved from **LOW-MEDIUM** to **8.5/10**
|
|
123
|
+
- All critical vulnerabilities addressed
|
|
124
|
+
- Production-ready security posture
|
|
10
125
|
|
|
11
126
|
## [0.5.0] - 2025-11-13
|
|
12
127
|
|
|
13
128
|
### Added
|
|
14
129
|
- **API Sub-Modules** - Organized API into logical sub-modules for better tree-shaking
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
-
|
|
18
|
-
-
|
|
19
|
-
-
|
|
20
|
-
-
|
|
21
|
-
-
|
|
130
|
+
- `ai-visual-test/validators` - All validation functionality
|
|
131
|
+
- `ai-visual-test/temporal` - Temporal aggregation and decision-making
|
|
132
|
+
- `ai-visual-test/multi-modal` - Multi-modal validation features
|
|
133
|
+
- `ai-visual-test/ensemble` - Ensemble judging and bias detection
|
|
134
|
+
- `ai-visual-test/persona` - Persona-based testing
|
|
135
|
+
- `ai-visual-test/specs` - Natural language specifications
|
|
136
|
+
- `ai-visual-test/utils` - Utility functions and infrastructure
|
|
137
|
+
- Main export (`ai-visual-test`) still works for backward compatibility
|
|
22
138
|
- **Smart Validators** - Automatically select the best validator type based on available context
|
|
23
139
|
- `validateSmart()` - Universal smart validator that auto-selects best method
|
|
24
140
|
- `validateAccessibilitySmart()` - Smart accessibility validation (programmatic/VLLM/hybrid)
|
package/DEPLOYMENT.md
CHANGED
|
@@ -1,5 +1,14 @@
|
|
|
1
1
|
# Deployment Guide
|
|
2
2
|
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
This guide covers deploying `@arclabs561/ai-visual-test` in production environments, including:
|
|
6
|
+
- Vercel serverless deployment
|
|
7
|
+
- Docker containerization
|
|
8
|
+
- Health checks and monitoring
|
|
9
|
+
- Graceful shutdown
|
|
10
|
+
- Environment variable validation
|
|
11
|
+
|
|
3
12
|
## Vercel Deployment
|
|
4
13
|
|
|
5
14
|
### Quick Deploy
|
|
@@ -15,22 +24,95 @@ vercel
|
|
|
15
24
|
|
|
16
25
|
### Environment Variables
|
|
17
26
|
|
|
18
|
-
|
|
27
|
+
**Required** (at least one API key):
|
|
28
|
+
- `GEMINI_API_KEY` - For Gemini provider
|
|
29
|
+
- `OPENAI_API_KEY` - For OpenAI provider
|
|
30
|
+
- `ANTHROPIC_API_KEY` - For Claude/Anthropic provider
|
|
31
|
+
- `GROQ_API_KEY` - For Groq provider (high-frequency decisions)
|
|
32
|
+
|
|
33
|
+
**Optional**:
|
|
34
|
+
- `VLM_PROVIDER` - Provider to use (auto-detected if not set): `gemini`, `openai`, `claude`, `groq`
|
|
35
|
+
- `VLM_MODEL` - Explicit model override
|
|
36
|
+
- `VLM_MODEL_TIER` - Model tier: `fast`, `balanced`, `best`
|
|
37
|
+
- `API_KEY` or `VLLM_API_KEY` - For API endpoint authentication
|
|
38
|
+
- `REQUIRE_AUTH` - Set to `true` to enforce authentication (default: `true` if API_KEY is set)
|
|
39
|
+
- `RATE_LIMIT_MAX_REQUESTS` - Max requests per minute (default: 10)
|
|
40
|
+
- `DISABLE_LLM_CACHE` - Set to `true` to disable caching globally
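One way to picture the "at least one API key" rule and provider auto-detection is the sketch below. The provider names and key variables come from the lists above; the `resolveProvider` function and, in particular, the fallback order are assumptions, since the library's actual detection order is not stated here.

```javascript
// Provider names and key variables as listed above.
const PROVIDER_KEYS = {
  gemini: 'GEMINI_API_KEY',
  openai: 'OPENAI_API_KEY',
  claude: 'ANTHROPIC_API_KEY',
  groq: 'GROQ_API_KEY',
};

// Hypothetical resolver: an explicit VLM_PROVIDER wins; otherwise pick the
// first provider whose key is set. The fallback order is an assumption.
function resolveProvider(env) {
  const explicit = env.VLM_PROVIDER;
  if (explicit) {
    if (!Object.hasOwn(PROVIDER_KEYS, explicit)) {
      throw new Error(`Unknown VLM_PROVIDER: ${explicit}`);
    }
    const keyName = PROVIDER_KEYS[explicit];
    if (!env[keyName]) {
      throw new Error(`Missing ${keyName} for provider '${explicit}'`);
    }
    return explicit;
  }
  const found = Object.keys(PROVIDER_KEYS).find((name) => Boolean(env[PROVIDER_KEYS[name]]));
  if (!found) {
    throw new Error(`Set at least one of: ${Object.values(PROVIDER_KEYS).join(', ')}`);
  }
  return found;
}

console.log(resolveProvider({ OPENAI_API_KEY: 'sk-example' })); // "openai"
```

In a real process the argument would be `process.env`; taking it as a parameter keeps the function easy to test.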
|
|
41
|
+
|
|
42
|
+
### Startup Validation
|
|
19
43
|
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
44
|
+
The library automatically validates configuration at startup. If required environment variables are missing, you'll get clear error messages:
|
|
45
|
+
|
|
46
|
+
```javascript
|
|
47
|
+
import { validateStartup, validateStartupSoft } from '@arclabs561/ai-visual-test';
|
|
48
|
+
|
|
49
|
+
// Strict validation (throws on missing vars)
|
|
50
|
+
try {
|
|
51
|
+
validateStartup();
|
|
52
|
+
console.log('✅ Configuration valid');
|
|
53
|
+
} catch (error) {
|
|
54
|
+
console.error('❌ Configuration invalid:', error.message);
|
|
55
|
+
// Error includes actionable guidance:
|
|
56
|
+
// "Missing required environment variables for provider 'gemini': GEMINI_API_KEY"
|
|
57
|
+
}
|
|
58
|
+
|
|
59
|
+
// Soft validation (returns warnings)
|
|
60
|
+
const result = validateStartupSoft();
|
|
61
|
+
if (!result.valid) {
|
|
62
|
+
console.warn('⚠️ Configuration warnings:', result.warnings);
|
|
63
|
+
}
|
|
64
|
+
```
|
|
25
65
|
|
|
26
66
|
### API Endpoints
|
|
27
67
|
|
|
28
68
|
After deployment, you'll have:
|
|
29
69
|
|
|
30
|
-
- `https://your-site.vercel.app/api/validate` - Validation endpoint
|
|
31
|
-
- `https://your-site.vercel.app/api/health` - Health check
|
|
70
|
+
- `https://your-site.vercel.app/api/validate` - Validation endpoint (POST)
|
|
71
|
+
- `https://your-site.vercel.app/api/health` - Health check (GET)
|
|
32
72
|
- `https://your-site.vercel.app/` - Web interface
|
|
33
73
|
|
|
74
|
+
#### Health Check Endpoint
|
|
75
|
+
|
|
76
|
+
The health check endpoint provides comprehensive status:
|
|
77
|
+
|
|
78
|
+
```bash
|
|
79
|
+
curl https://your-site.vercel.app/api/health
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
**Response**:
|
|
83
|
+
```json
|
|
84
|
+
{
|
|
85
|
+
"status": "healthy",
|
|
86
|
+
"timestamp": "2025-01-17T12:00:00.000Z",
|
|
87
|
+
"version": "0.5.5",
|
|
88
|
+
"config": {
|
|
89
|
+
"enabled": true,
|
|
90
|
+
"provider": "gemini",
|
|
91
|
+
"hasApiKey": true
|
|
92
|
+
},
|
|
93
|
+
"validation": {
|
|
94
|
+
"valid": true,
|
|
95
|
+
"warnings": []
|
|
96
|
+
},
|
|
97
|
+
"cache": {
|
|
98
|
+
"enabled": true,
|
|
99
|
+
"hits": 1234,
|
|
100
|
+
"misses": 567,
|
|
101
|
+
"hitRate": 0.685
|
|
102
|
+
}
|
|
103
|
+
}
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
**Status Codes**:
|
|
107
|
+
- `200` - Healthy (all checks pass)
|
|
108
|
+
- `503` - Degraded (configuration issues, but service may still work)
|
|
109
|
+
- `500` - Error (health check itself failed)
|
|
110
|
+
|
|
111
|
+
Use this endpoint for:
|
|
112
|
+
- Load balancer health checks
|
|
113
|
+
- Monitoring and alerting
|
|
114
|
+
- Deployment verification
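A monitoring script could interpret the endpoint along the lines sketched here. The status-code meanings are taken from the list above; the helper names are hypothetical, and the hit-rate formula `hits / (hits + misses)` is an assumption that happens to agree with the sample response (1234 hits, 567 misses, `hitRate` 0.685).

```javascript
// Map the documented status codes to the documented states.
function classifyHealth(statusCode) {
  switch (statusCode) {
    case 200: return 'healthy';
    case 503: return 'degraded';
    case 500: return 'error';
    default: return 'unknown';
  }
}

// Assumed definition of the cache hit rate; returns 0 when there is no traffic.
function cacheHitRate({ hits, misses }) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

console.log(classifyHealth(503)); // "degraded"
console.log(cacheHitRate({ hits: 1234, misses: 567 }).toFixed(3)); // "0.685"
```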
|
|
115
|
+
|
|
34
116
|
### Usage
|
|
35
117
|
|
|
36
118
|
```javascript
|
|
@@ -66,6 +148,123 @@ const remaining = response.headers.get('X-RateLimit-Remaining');
|
|
|
66
148
|
const resetAt = response.headers.get('X-RateLimit-Reset');
|
|
67
149
|
```
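For context on where values such as `X-RateLimit-Remaining` and `X-RateLimit-Reset` could come from, here is a minimal fixed-window limiter sketch. The class name, the optional cost budget, the injectable clock, and the millisecond timestamp returned as `resetAt` are all assumptions for illustration; the default of 10 requests per minute mirrors the documented `RATE_LIMIT_MAX_REQUESTS` default, but this is not the package's implementation.

```javascript
// Fixed-window limiter sketch: counts requests (and optional cost) per window.
class FixedWindowLimiter {
  constructor({ maxRequests = 10, maxCost = Infinity, windowMs = 60_000, now = () => Date.now() } = {}) {
    this.maxRequests = maxRequests;
    this.maxCost = maxCost;
    this.windowMs = windowMs;
    this.now = now;
    this.windowStart = now();
    this.requests = 0;
    this.cost = 0;
  }

  // Returns whether the call is allowed, plus values a server could expose
  // through rate-limit response headers.
  tryAcquire(cost = 0) {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      // A new window starts: reset both counters.
      this.windowStart = t;
      this.requests = 0;
      this.cost = 0;
    }
    const allowed = this.requests < this.maxRequests && this.cost + cost <= this.maxCost;
    if (allowed) {
      this.requests += 1;
      this.cost += cost;
    }
    return {
      allowed,
      remaining: Math.max(0, this.maxRequests - this.requests),
      resetAt: this.windowStart + this.windowMs,
    };
  }
}

const limiter = new FixedWindowLimiter({ maxRequests: 2 });
console.log(limiter.tryAcquire().allowed); // true
console.log(limiter.tryAcquire().allowed); // true
console.log(limiter.tryAcquire().allowed); // false
```

A fixed window is the simplest variant; it allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.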
|
|
68
150
|
|
|
151
|
+
## Docker Deployment
|
|
152
|
+
|
|
153
|
+
### Dockerfile Example
|
|
154
|
+
|
|
155
|
+
```dockerfile
|
|
156
|
+
FROM node:18-alpine
|
|
157
|
+
|
|
158
|
+
WORKDIR /app
|
|
159
|
+
|
|
160
|
+
# Copy package files
|
|
161
|
+
COPY package*.json ./
|
|
162
|
+
RUN npm ci --only=production
|
|
163
|
+
|
|
164
|
+
# Copy source code
|
|
165
|
+
COPY src ./src
|
|
166
|
+
COPY api ./api
|
|
167
|
+
|
|
168
|
+
# Set environment
|
|
169
|
+
ENV NODE_ENV=production
|
|
170
|
+
|
|
171
|
+
# Expose port (if running as server)
|
|
172
|
+
EXPOSE 3000
|
|
173
|
+
|
|
174
|
+
# Health check
|
|
175
|
+
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
|
176
|
+
CMD node -e "require('http').get('http://localhost:3000/api/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) })"
|
|
177
|
+
|
|
178
|
+
# Start application
|
|
179
|
+
CMD ["node", "api/server.js"]
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
### Docker Compose Example
|
|
183
|
+
|
|
184
|
+
```yaml
|
|
185
|
+
version: '3.8'
|
|
186
|
+
|
|
187
|
+
services:
|
|
188
|
+
  ai-visual-test:
|
|
189
|
+
    build: .
|
|
190
|
+
    ports:
|
|
191
|
+
      - "3000:3000"
|
|
192
|
+
    environment:
|
|
193
|
+
      - GEMINI_API_KEY=${GEMINI_API_KEY}
|
|
194
|
+
      - VLM_PROVIDER=gemini
|
|
195
|
+
      - NODE_ENV=production
|
|
196
|
+
    healthcheck:
|
|
197
|
+
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
|
|
198
|
+
      interval: 30s
|
|
199
|
+
      timeout: 10s
|
|
200
|
+
      retries: 3
|
|
201
|
+
      start_period: 40s
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
## Graceful Shutdown
|
|
205
|
+
|
|
206
|
+
The library includes graceful shutdown handling for long-running processes:
|
|
207
|
+
|
|
208
|
+
```javascript
|
|
209
|
+
import { initGracefulShutdown, registerShutdownHandler } from '@arclabs561/ai-visual-test';
|
|
210
|
+
|
|
211
|
+
// Initialize (automatically done in library, but can be customized)
|
|
212
|
+
initGracefulShutdown({ timeout: 30000 }); // 30 second timeout
|
|
213
|
+
|
|
214
|
+
// Register custom shutdown handlers
|
|
215
|
+
registerShutdownHandler(async () => {
|
|
216
|
+
// Clean up your resources
|
|
217
|
+
await closeDatabase();
|
|
218
|
+
await flushLogs();
|
|
219
|
+
}, 10); // Priority (higher = called first)
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
**Features**:
|
|
223
|
+
- Handles `SIGTERM` and `SIGINT` signals
|
|
224
|
+
- Executes shutdown handlers in priority order
|
|
225
|
+
- Flushes caches and cleans up resources
|
|
226
|
+
- Timeout protection (default: 30s)
|
|
227
|
+
- Handles uncaught exceptions
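The behaviours listed above (signal handling, priority ordering, timeout protection) can be approximated with a small registry like the one below. This is a sketch under stated assumptions: `register`, `ordered`, and `runShutdown` are hypothetical names, and the library's own `initGracefulShutdown` may work differently internally.

```javascript
// Illustrative handler registry; not the library's internals.
const handlers = [];

function register(handler, priority = 0) {
  handlers.push({ handler, priority });
}

// Higher priority runs first; equal priorities keep registration order
// because Array.prototype.sort is stable.
function ordered() {
  return [...handlers].sort((a, b) => b.priority - a.priority);
}

async function runShutdown(timeoutMs = 30_000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Shutdown timed out')), timeoutMs);
  });
  const work = (async () => {
    for (const { handler } of ordered()) {
      try {
        await handler();
      } catch (err) {
        // One failing handler should not block the remaining ones.
        console.error('Shutdown handler failed:', err.message);
      }
    }
  })();
  try {
    await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after cleanup
  }
}

// Wire the signals once; the exit code reflects whether cleanup finished in time.
for (const signal of ['SIGTERM', 'SIGINT']) {
  process.once(signal, () => {
    runShutdown().then(() => process.exit(0), () => process.exit(1));
  });
}
```

Racing the handler chain against a timer is what provides the timeout protection: a hung handler cannot keep the process alive past the deadline.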
|
|
228
|
+
|
|
229
|
+
## Monitoring and Observability
|
|
230
|
+
|
|
231
|
+
### Health Checks
|
|
232
|
+
|
|
233
|
+
Monitor the `/api/health` endpoint:
|
|
234
|
+
- **Interval**: Check every 30-60 seconds
|
|
235
|
+
- **Timeout**: 3-5 seconds
|
|
236
|
+
- **Alert on**: Status `503` (degraded) or `500` (error)
|
|
237
|
+
|
|
238
|
+
### Metrics to Monitor
|
|
239
|
+
|
|
240
|
+
1. **Health Check Status**
|
|
241
|
+
- `status: "healthy"` vs `"degraded"` vs `"error"`
|
|
242
|
+
- Validation warnings
|
|
243
|
+
|
|
244
|
+
2. **Cache Performance**
|
|
245
|
+
- Hit rate (should be >50% in production)
|
|
246
|
+
- Cache size
|
|
247
|
+
|
|
248
|
+
3. **API Performance**
|
|
249
|
+
- Response times (via performance logger)
|
|
250
|
+
- Error rates
|
|
251
|
+
- Cost tracking
|
|
252
|
+
|
|
253
|
+
### Logging
|
|
254
|
+
|
|
255
|
+
The library includes comprehensive logging:
|
|
256
|
+
- API call performance (latency, retries, costs)
|
|
257
|
+
- Cache operations (hits, misses, evictions)
|
|
258
|
+
- Temporal decisions (when prompts trigger/skip)
|
|
259
|
+
- Error patterns
|
|
260
|
+
|
|
261
|
+
Enable debug logging:
|
|
262
|
+
```javascript
|
|
263
|
+
import { setDebugEnabled } from '@arclabs561/ai-visual-test';
|
|
264
|
+
|
|
265
|
+
setDebugEnabled(true);
|
|
266
|
+
```
|
|
267
|
+
|
|
69
268
|
## Local Development
|
|
70
269
|
|
|
71
270
|
```bash
|
|
@@ -76,5 +275,22 @@ npm install
|
|
|
76
275
|
npm test
|
|
77
276
|
|
|
78
277
|
# Use as library
|
|
79
|
-
import { validateScreenshot } from '@ai-visual-test
|
|
278
|
+
import { validateScreenshot } from '@arclabs561/ai-visual-test';
|
|
279
|
+
|
|
280
|
+
# Validate startup configuration
|
|
281
|
+
import { validateStartup } from '@arclabs561/ai-visual-test';
|
|
282
|
+
validateStartup(); // Throws if configuration invalid
|
|
80
283
|
```
|
|
284
|
+
|
|
285
|
+
## Production Checklist
|
|
286
|
+
|
|
287
|
+
- [ ] Set required API keys in environment
|
|
288
|
+
- [ ] Configure `VLM_PROVIDER` if using specific provider
|
|
289
|
+
- [ ] Set `API_KEY` for endpoint authentication (if exposing API)
|
|
290
|
+
- [ ] Configure `RATE_LIMIT_MAX_REQUESTS` based on expected load
|
|
291
|
+
- [ ] Set up health check monitoring
|
|
292
|
+
- [ ] Configure logging aggregation
|
|
293
|
+
- [ ] Set up cost tracking and alerts
|
|
294
|
+
- [ ] Test graceful shutdown
|
|
295
|
+
- [ ] Verify cache directory permissions (if using file cache)
|
|
296
|
+
- [ ] Review security settings (`REQUIRE_AUTH`, rate limits)
|
package/README.md
CHANGED
|
@@ -1,18 +1,20 @@
|
|
|
1
|
-
#
|
|
1
|
+
# ai-visual-test
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Visual testing framework using Vision Language Models. Validates screenshots, checks accessibility, and can play games.
|
|
4
4
|
|
|
5
|
-
## Why
|
|
5
|
+
## Why This Package
|
|
6
6
|
|
|
7
|
-
Pixel-based testing breaks when content changes
|
|
7
|
+
Pixel-based testing breaks when content changes. This tool asks "does this look correct?" instead of "did pixels change?"
|
|
8
8
|
|
|
9
|
-
##
|
|
9
|
+
## Installation
|
|
10
10
|
|
|
11
11
|
```bash
|
|
12
12
|
npm install @arclabs561/ai-visual-test
|
|
13
13
|
```
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
## Configuration
|
|
16
|
+
|
|
17
|
+
Set an API key in a `.env` file:
|
|
16
18
|
|
|
17
19
|
```bash
|
|
18
20
|
# .env file
|
|
@@ -23,7 +25,26 @@ OPENAI_API_KEY=your-key-here
|
|
|
23
25
|
ANTHROPIC_API_KEY=your-key-here
|
|
24
26
|
```
|
|
25
27
|
|
|
26
|
-
##
|
|
28
|
+
## Quick Start
|
|
29
|
+
|
|
30
|
+
### With Playwright
|
|
31
|
+
|
|
32
|
+
```javascript
|
|
33
|
+
import { validatePage } from '@arclabs561/ai-visual-test';
|
|
34
|
+
import { chromium } from 'playwright';
|
|
35
|
+
|
|
36
|
+
const browser = await chromium.launch();
|
|
37
|
+
const page = await browser.newPage();
|
|
38
|
+
await page.goto('https://example.com');
|
|
39
|
+
|
|
40
|
+
// validatePage() handles screenshotting
|
|
41
|
+
const result = await validatePage(page, 'Check for visual bugs and accessibility issues');
|
|
42
|
+
|
|
43
|
+
console.log(result.score); // 7 (0-10 scale)
|
|
44
|
+
console.log(result.issues); // ['Missing error messages', 'Low contrast']
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
### With Screenshot Path
|
|
27
48
|
|
|
28
49
|
```javascript
|
|
29
50
|
import { validateScreenshot } from '@arclabs561/ai-visual-test';
|
|
@@ -33,109 +54,79 @@ const result = await validateScreenshot(
|
|
|
33
54
|
'Check if this payment form is accessible and usable'
|
|
34
55
|
);
|
|
35
56
|
|
|
36
|
-
console.log(result.score);
|
|
57
|
+
console.log(result.score); // 7 (0-10 scale)
|
|
37
58
|
console.log(result.issues); // ['Missing error messages', 'Low contrast']
|
|
38
59
|
```
|
|
39
60
|
|
|
40
|
-
##
|
|
41
|
-
|
|
42
|
-
- **Accessibility** - Fast programmatic checks or VLLM semantic evaluation
|
|
43
|
-
- **Design principles** - Validates brutalist, minimal, or other styles
|
|
44
|
-
- **Temporal testing** - Analyzes animations and gameplay over time
|
|
45
|
-
- **State validation** - Fast programmatic or VLLM extraction
|
|
46
|
-
- **Game testing** - Validate gameplay with variable goals
|
|
47
|
-
- **Natural language specs** - Write tests in plain English
|
|
61
|
+
## Key Features
|
|
48
62
|
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
- Pixel-perfect layout testing (use pixel-diffing tools)
|
|
52
|
-
- Exact color matching (use design tools)
|
|
53
|
-
- Performance testing (use Lighthouse)
|
|
54
|
-
- Unit testing (use Jest/Vitest)
|
|
55
|
-
|
|
56
|
-
## API
|
|
57
|
-
|
|
58
|
-
### Core
|
|
63
|
+
### 1. Hybrid Validation
|
|
64
|
+
Combines deterministic code checks (contrast ratios, aria-labels) with AI visual judgment.
|
|
59
65
|
|
|
60
66
|
```javascript
|
|
61
|
-
import {
|
|
67
|
+
import { validateAccessibilityHybrid } from '@arclabs561/ai-visual-test/validators';
|
|
68
|
+
// Checks code AND pixels
|
|
69
|
+
const result = await validateAccessibilityHybrid(page, 'shot.png');
|
|
70
|
+
```
|
|
62
71
|
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
provider: 'gemini',
|
|
66
|
-
apiKey: process.env.GEMINI_API_KEY
|
|
67
|
-
});
|
|
72
|
+
### 2. AI Game Agent
|
|
73
|
+
Plays Canvas/WebGL games by analyzing screenshots and planning actions. Includes Reflexion (learning from mistakes) and Chain of Thought.
|
|
68
74
|
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
'Evaluate this screenshot',
|
|
73
|
-
{ testType: 'payment-screen' }
|
|
74
|
-
);
|
|
75
|
+
```javascript
|
|
76
|
+
import { playGame } from '@arclabs561/ai-visual-test';
|
|
77
|
+
await playGame(page, { goal: 'Win the level', maxSteps: 50 });
|
|
75
78
|
```
|
|
76
79
|
|
|
77
|
-
###
|
|
80
|
+
### 3. Cost Optimization
|
|
81
|
+
Caching, model tiering, and provider selection. See `test/performance/optimization-claims-validation.test.mjs` for validation.
|
|
78
82
|
|
|
79
|
-
|
|
80
|
-
// Validators
|
|
81
|
-
import { StateValidator } from '@arclabs561/ai-visual-test/validators';
|
|
83
|
+
## Documentation
|
|
82
84
|
|
|
83
|
-
|
|
84
|
-
|
|
85
|
+
- [**EXAMPLES.md**](./EXAMPLES.md) - Code snippets for Game Playing, Hybrid Validation, Playwright integration.
|
|
86
|
+
- [**API_QUICK_REFERENCE.md**](./API_QUICK_REFERENCE.md) - Function signatures and options.
|
|
87
|
+
- [**examples/**](./examples/) - Runnable examples.
|
|
88
|
+
- **TypeScript**: Type definitions included.
|
|
85
89
|
|
|
86
|
-
|
|
87
|
-
import { multiModalValidation } from '@arclabs561/ai-visual-test/multi-modal';
|
|
90
|
+
## Playwright Integration
|
|
88
91
|
|
|
89
|
-
|
|
90
|
-
import { EnsembleJudge } from '@arclabs561/ai-visual-test/ensemble';
|
|
92
|
+
Custom matchers for Playwright tests. **Requires `@playwright/test` to be installed** (already in devDependencies for this project).
|
|
91
93
|
|
|
92
|
-
|
|
93
|
-
import { experiencePageAsPersona } from '@arclabs561/ai-visual-test/persona';
|
|
94
|
+
### Setup
|
|
94
95
|
|
|
95
|
-
|
|
96
|
-
import {
|
|
96
|
+
```javascript
|
|
97
|
+
import { expect } from '@playwright/test';
|
|
98
|
+
import { createMatchers } from '@arclabs561/ai-visual-test/playwright';
|
|
97
99
|
|
|
98
|
-
//
|
|
99
|
-
|
|
100
|
+
// Extend expect with custom matchers (call once in your test setup)
|
|
101
|
+
createMatchers(expect);
|
|
100
102
|
```
|
|
101
103
|
|
|
102
|
-
###
|
|
104
|
+
### Usage in Tests
|
|
103
105
|
|
|
104
106
|
```javascript
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
test('payment screen', async ({ page }) => {
|
|
109
|
-
await page.goto('https://example.com/checkout');
|
|
110
|
-
await page.screenshot({ path: 'checkout.png' });
|
|
107
|
+
test('visual quality', async ({ page }) => {
|
|
108
|
+
await page.goto('https://example.com');
|
|
111
109
|
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
'Check if payment form is accessible'
|
|
115
|
-
);
|
|
110
|
+
// Visual quality check
|
|
111
|
+
await expect(page).toHaveVisualScore(7, 'Check visual quality');
|
|
116
112
|
|
|
117
|
-
|
|
113
|
+
// Hybrid accessibility check (programmatic + AI)
|
|
114
|
+
await expect(page).toBeAccessibleHybrid(4.5);
|
|
118
115
|
});
|
|
119
116
|
```
|
|
120
117
|
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
- **Multi-provider** - Gemini, OpenAI, Claude
|
|
124
|
-
- **Cost-effective** - Auto-selects cheapest provider, includes caching
|
|
125
|
-
- **Multi-modal** - Screenshots + rendered code + context
|
|
126
|
-
- **Temporal** - Time-series validation for animations
|
|
127
|
-
- **Multi-perspective** - Multiple personas evaluate same state
|
|
128
|
-
- **Zero dependencies** - Pure ES Modules
|
|
118
|
+
### Installation
|
|
129
119
|
|
|
130
|
-
|
|
120
|
+
For development in this project, Playwright is already installed. For use in other projects:
|
|
131
121
|
|
|
132
|
-
|
|
122
|
+
```bash
|
|
123
|
+
npm install --save-dev @playwright/test
|
|
124
|
+
npx playwright install chromium
|
|
125
|
+
```
|
|
133
126
|
|
|
134
|
-
|
|
127
|
+
See `examples/playwright-setup.mjs` for setup example.
|
|
135
128
|
|
|
136
|
-
|
|
137
|
-
- `docs/API_SURFACE_ORGANIZATION.md` - API organization
|
|
138
|
-
- `CHANGELOG.md` - Version history
|
|
129
|
+
Documentation: [docs/PLAYWRIGHT_INTEGRATION.md](./docs/PLAYWRIGHT_INTEGRATION.md)
|
|
139
130
|
|
|
140
131
|
## License
|
|
141
132
|
|