@wix/evalforge-evaluator 0.73.0 → 0.75.0
- package/README.md +53 -0
- package/build/index.js +124 -5954
- package/build/index.js.map +4 -4
- package/build/index.mjs +113 -5943
- package/build/index.mjs.map +4 -4
- package/build/types/run-scenario/agents/claude-code/write-skills.d.ts +4 -4
- package/build/types/run-scenario/environment.d.ts +1 -1
- package/package.json +5 -5
package/README.md
ADDED
@@ -0,0 +1,53 @@

# @wix/evalforge-evaluator

CLI tool that executes AI agent evaluations. It fetches an eval run configuration from the backend, runs each scenario against a Claude Code agent, streams trace events, runs assertions, and reports results.

## How It Works

```
evaluator <project-id> <eval-run-id>
```

1. **Load configuration** from environment variables (server URL, AI Gateway credentials, etc.)
2. **Fetch evaluation data** from the backend API: eval run, scenarios, agent config, skills, MCPs, sub-agents, and templates
3. **For each scenario:**
   - Prepare a working directory (download and extract template)
   - Write skills to `.claude/skills/<name>/SKILL.md`
   - Write MCPs to `.mcp.json`
   - Write sub-agents to `.claude/agents/<name>.md`
   - Launch the Claude Code agent with the scenario's trigger prompt via `@anthropic-ai/claude-agent-sdk`
   - Stream trace events back to the backend
   - Run assertions on the agent's output
   - Report the scenario result
4. **Finalize**: set eval run status to `COMPLETED` or `FAILED`
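
The file layout written in step 3 can be sketched as a pure function that maps a scenario's inputs to relative paths inside the working directory. This is an illustrative sketch, not the package's implementation: `NamedDoc`, `McpServer`, `ScenarioInputs`, and `planWorkspaceFiles` are hypothetical names, the real input types live in `@wix/evalforge-types` and may differ, and the top-level `mcpServers` key in `.mcp.json` is an assumption about how the MCP entries are grouped.

```typescript
// Hypothetical input shapes, for illustration only.
interface NamedDoc {
  name: string;
  content: string; // Markdown body, treated as opaque text here
}

interface McpServer {
  name: string;
  config: Record<string, unknown>; // server definition, passed through unchanged
}

interface ScenarioInputs {
  skills: NamedDoc[];
  subAgents: NamedDoc[];
  mcps: McpServer[];
}

// Map each input to the relative path named in the steps above.
// Returns path -> file content; the caller would write these to disk.
function planWorkspaceFiles(inputs: ScenarioInputs): Map<string, string> {
  const files = new Map<string, string>();

  for (const skill of inputs.skills) {
    files.set(`.claude/skills/${skill.name}/SKILL.md`, skill.content);
  }

  for (const agent of inputs.subAgents) {
    files.set(`.claude/agents/${agent.name}.md`, agent.content);
  }

  // Assumption: all MCP entries are collected under one "mcpServers" object.
  if (inputs.mcps.length > 0) {
    const mcpServers: Record<string, unknown> = {};
    for (const mcp of inputs.mcps) {
      mcpServers[mcp.name] = mcp.config;
    }
    files.set(".mcp.json", JSON.stringify({ mcpServers }, null, 2));
  }

  return files;
}
```

Keeping the path planning separate from the actual file writes makes the layout easy to check without touching the filesystem.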

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `EVAL_SERVER_URL` | Yes | Backend server URL for fetching data and reporting results |
| `AI_GATEWAY_URL` | Yes | AI Gateway base URL for LLM calls |
| `AI_GATEWAY_HEADERS` | No | Custom headers for AI Gateway (newline-separated `key:value` pairs) |
| `EVAL_API_PREFIX` | No | API path prefix (e.g., `/api/v1`) |
| `EVALUATIONS_DIR` | No | Directory for evaluation working directories |
| `TRACE_PUSH_URL` | No | URL for pushing trace events (remote job execution) |
| `EVAL_ROUTE_HEADER` | No | `x-wix-route` header for deploy preview routing |
| `EVAL_AUTH_TOKEN` | No | Bearer token for public endpoint authentication |

The evaluator is typically launched by the backend (locally or on a remote Dev Machine) with these variables pre-configured.
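
A minimal sketch of how the variables in the table might be read, including the newline-separated `key:value` format of `AI_GATEWAY_HEADERS`. The names `EvaluatorConfig`, `parseHeaders`, and `loadConfig` are hypothetical, and the handling of blank or malformed header lines (skipping them) is an assumption rather than documented behavior.

```typescript
// Hypothetical config shape mirroring the table above.
interface EvaluatorConfig {
  serverUrl: string;
  aiGatewayUrl: string;
  aiGatewayHeaders: Record<string, string>;
  apiPrefix?: string;
  evaluationsDir?: string;
  tracePushUrl?: string;
  routeHeader?: string;
  authToken?: string;
}

// Parse newline-separated "key:value" pairs. Splitting on the first colon
// keeps values that themselves contain colons (such as URLs) intact.
function parseHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const line of (raw ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // assumption: skip blank or malformed lines
    headers[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Build the config from an environment-like object; throws if a variable
// marked "Yes" in the Required column is missing or empty.
function loadConfig(env: Record<string, string | undefined>): EvaluatorConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  return {
    serverUrl: required("EVAL_SERVER_URL"),
    aiGatewayUrl: required("AI_GATEWAY_URL"),
    aiGatewayHeaders: parseHeaders(env["AI_GATEWAY_HEADERS"]),
    apiPrefix: env["EVAL_API_PREFIX"],
    evaluationsDir: env["EVALUATIONS_DIR"],
    tracePushUrl: env["TRACE_PUSH_URL"],
    routeHeader: env["EVAL_ROUTE_HEADER"],
    authToken: env["EVAL_AUTH_TOKEN"],
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to exercise with a plain object.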

## Scripts

```bash
yarn build    # Build CJS + ESM + type declarations
yarn test     # Run tests
yarn lint     # Run ESLint
yarn clean    # Remove build artifacts
```

## Dependencies

- `@wix/evalforge-types`: shared type definitions
- `@wix/eval-assertions`: assertion evaluation framework
- `@wix/evalforge-github-client`: GitHub API client for fetching skill files
- `@anthropic-ai/claude-agent-sdk`: Claude Code agent SDK