@wix/evalforge-evaluator 0.73.0 → 0.75.0

package/README.md ADDED
@@ -0,0 +1,53 @@
1
+ # @wix/evalforge-evaluator
2
+
3
+ CLI tool that executes AI agent evaluations. It fetches an eval run configuration from the backend, runs each scenario against a Claude Code agent, streams trace events, runs assertions, and reports results.
4
+
5
+ ## How It Works
6
+
7
+ ```
8
+ evaluator <project-id> <eval-run-id>
9
+ ```
10
+
11
+ 1. **Load configuration** from environment variables (server URL, AI Gateway credentials, etc.)
12
+ 2. **Fetch evaluation data** from the backend API — eval run, scenarios, agent config, skills, MCPs, sub-agents, and templates
13
+ 3. **For each scenario:**
14
+ - Prepare a working directory (download and extract template)
15
+ - Write skills to `.claude/skills/<name>/SKILL.md`
16
+ - Write MCPs to `.mcp.json`
17
+ - Write sub-agents to `.claude/agents/<name>.md`
18
+ - Launch the Claude Code agent with the scenario's trigger prompt via `@anthropic-ai/claude-agent-sdk`
19
+ - Stream trace events back to the backend
20
+ - Run assertions on the agent's output
21
+ - Report the scenario result
22
+ 4. **Finalize** — set eval run status to `COMPLETED` or `FAILED`
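The file-writing part of step 3 can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the `NamedDoc` and `ScenarioFiles` shapes and the `prepareWorkspace` helper are hypothetical (the real types live in `@wix/evalforge-types` and may differ), and the `mcpServers` key in the example is an assumption about the `.mcp.json` body. Only the three target paths come from the list above.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical shapes for illustration only.
interface NamedDoc {
  name: string;
  content: string;
}

interface ScenarioFiles {
  skills: NamedDoc[];
  subAgents: NamedDoc[];
  mcpConfig: Record<string, unknown>; // assumed to be the JSON body of .mcp.json
}

// Write one file, creating parent directories as needed.
function writeTextFile(path: string, body: string): void {
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, body, "utf8");
}

// Lay out skills, MCP config, and sub-agents inside the working directory.
// Returns the paths that were written, in the order they were written.
function prepareWorkspace(workDir: string, files: ScenarioFiles): string[] {
  const written: string[] = [];

  for (const skill of files.skills) {
    const path = join(workDir, ".claude", "skills", skill.name, "SKILL.md");
    writeTextFile(path, skill.content);
    written.push(path);
  }

  const mcpPath = join(workDir, ".mcp.json");
  writeTextFile(mcpPath, JSON.stringify(files.mcpConfig, null, 2));
  written.push(mcpPath);

  for (const agent of files.subAgents) {
    const path = join(workDir, ".claude", "agents", `${agent.name}.md`);
    writeTextFile(path, agent.content);
    written.push(path);
  }

  return written;
}

// Usage: build a throwaway workspace in the OS temp directory.
const workDir = mkdtempSync(join(tmpdir(), "eval-"));
const written = prepareWorkspace(workDir, {
  skills: [{ name: "greet", content: "# Greet skill" }],
  subAgents: [{ name: "reviewer", content: "# Reviewer agent" }],
  mcpConfig: { mcpServers: {} },
});
console.log(written.length, readFileSync(written[0], "utf8"));
```

Returning the written paths keeps the helper easy to check in tests and leaves template extraction and agent launch as separate steps.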
23
+
24
+ ## Environment Variables
25
+
26
+ | Variable | Required | Description |
27
+ |----------|----------|-------------|
28
+ | `EVAL_SERVER_URL` | Yes | Backend server URL for fetching data and reporting results |
29
+ | `AI_GATEWAY_URL` | Yes | AI Gateway base URL for LLM calls |
30
+ | `AI_GATEWAY_HEADERS` | No | Custom headers for AI Gateway (newline-separated `key:value` pairs) |
31
+ | `EVAL_API_PREFIX` | No | API path prefix (e.g., `/api/v1`) |
32
+ | `EVALUATIONS_DIR` | No | Base directory that holds the evaluation working directories |
33
+ | `TRACE_PUSH_URL` | No | URL for pushing trace events (remote job execution) |
34
+ | `EVAL_ROUTE_HEADER` | No | Value of the `x-wix-route` header, used for deploy preview routing |
35
+ | `EVAL_AUTH_TOKEN` | No | Bearer token for public endpoint authentication |
36
+
37
+ The evaluator is typically launched by the backend (locally or on a remote Dev Machine) with these variables pre-configured.
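As a rough sketch of how the variables in the table might be read and validated, the snippet below loads them from a plain object. The `EvaluatorConfig` shape and the `parseHeaders`, `requireEnv`, and `loadConfig` helpers are hypothetical names introduced here, and splitting each header line on its first colon is an assumption about how `key:value` pairs are parsed. The variable names and their required/optional status are taken from the table.

```typescript
// Hypothetical config shape; field names are illustrative.
interface EvaluatorConfig {
  serverUrl: string;
  aiGatewayUrl: string;
  aiGatewayHeaders: Record<string, string>;
  apiPrefix?: string;
  evaluationsDir?: string;
  tracePushUrl?: string;
  routeHeader?: string;
  authToken?: string;
}

type Env = Record<string, string | undefined>;

// Parse newline-separated `key:value` pairs. Splitting on the first colon
// only (an assumption) lets values such as URLs keep their own colons.
function parseHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const line of (raw ?? "").split("\n")) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank lines and lines without a key
    const key = line.slice(0, idx).trim();
    if (key) headers[key] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Fail fast when a required variable is missing or empty.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

function loadConfig(env: Env): EvaluatorConfig {
  return {
    serverUrl: requireEnv(env, "EVAL_SERVER_URL"),
    aiGatewayUrl: requireEnv(env, "AI_GATEWAY_URL"),
    aiGatewayHeaders: parseHeaders(env["AI_GATEWAY_HEADERS"]),
    apiPrefix: env["EVAL_API_PREFIX"],
    evaluationsDir: env["EVALUATIONS_DIR"],
    tracePushUrl: env["TRACE_PUSH_URL"],
    routeHeader: env["EVAL_ROUTE_HEADER"],
    authToken: env["EVAL_AUTH_TOKEN"],
  };
}

// Usage with an explicit object; in a real process you would pass process.env.
const exampleConfig = loadConfig({
  EVAL_SERVER_URL: "http://localhost:3000",
  AI_GATEWAY_URL: "http://localhost:4000",
  AI_GATEWAY_HEADERS: "x-team: evals\nx-origin:http://host:8080",
});
console.log(exampleConfig.aiGatewayHeaders);
```

Taking the environment as a parameter instead of reading `process.env` directly keeps the loader easy to unit test.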
38
+
39
+ ## Scripts
40
+
41
+ ```bash
42
+ yarn build # Build CJS + ESM + type declarations
43
+ yarn test # Run tests
44
+ yarn lint # Run ESLint
45
+ yarn clean # Remove build artifacts
46
+ ```
47
+
48
+ ## Dependencies
49
+
50
+ - `@wix/evalforge-types` — shared type definitions
51
+ - `@wix/eval-assertions` — assertion evaluation framework
52
+ - `@wix/evalforge-github-client` — GitHub API client for fetching skill files
53
+ - `@anthropic-ai/claude-agent-sdk` — Claude Code agent SDK