@dutchmanlabs/evalstudio 0.1.0

package/README.md ADDED
@@ -0,0 +1,106 @@
1
+ # Eval Studio CLI
2
+
3
+ Local-first CLI for scanning AI agents, generating eval suites through Dutchman Labs, running them locally, and uploading results back to Eval Studio.
4
+
5
+ ## Install and Run
6
+
7
+ From the monorepo during development:
8
+
9
+ ```bash
10
+ npm run build:cli
11
+ node packages/cli/dist/index.js --help
12
+ node packages/cli/dist/index.js login
13
+ ```
14
+
15
+ After install, the quick start is:
16
+
17
+ ```bash
18
+ evalstudio login
19
+ evalstudio init
20
+ evalstudio detect
21
+ evalstudio generate
22
+ evalstudio run
23
+ ```
24
+
25
+ The same commands are available without a global install via `npx`:
26
+
27
+ ```bash
28
+ npx @dutchmanlabs/evalstudio@latest login
29
+ npx @dutchmanlabs/evalstudio@latest init
30
+ npx @dutchmanlabs/evalstudio@latest detect
31
+ npx @dutchmanlabs/evalstudio@latest generate
32
+ npx @dutchmanlabs/evalstudio@latest run
33
+ ```
34
+
35
+ Or by installing globally:
36
+
37
+ ```bash
38
+ npm install -g @dutchmanlabs/evalstudio
39
+ evalstudio --help
40
+ ```
41
+
42
+ ## Commands
43
+
44
+ - `evalstudio login`
45
+ - `evalstudio init`
46
+ - `evalstudio detect`
47
+ - `evalstudio scan` (alias for `detect`)
48
+ - `evalstudio generate`
49
+ - `evalstudio run`
50
+ - `evalstudio status`
51
+ - `evalstudio export`
52
+
53
+ ## Local Files
54
+
55
+ The CLI writes its state to `.evalstudio/` in the current repo:
56
+
57
+ - `.evalstudio/config.json`
58
+ - `.evalstudio/scan-results.json`
59
+ - `.evalstudio/latest-suite.json`
60
+ - `.evalstudio/latest-run.json`
61
+ - `.evalstudio/exports/`
62
+
63
+ Global auth is stored in `~/.evalstudio/config.json`.
64
+
65
+ `generate` writes the current hosted suite to `.evalstudio/latest-suite.json`.
66
+
67
+ `run` executes the suite locally, saves the result set to `.evalstudio/latest-run.json`, and then uploads those results to the Dutchman Labs dashboard.
68
+
69
+ `export` is local-only. It transforms `.evalstudio/latest-run.json` into JSONL, CSV, or pytest artifacts under `.evalstudio/exports/`.
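
As a rough illustration of what this kind of transformation involves, the sketch below converts an in-memory result set to JSONL and CSV. The `RunRecord` shape and its field names are assumptions made for this example; the actual schema of `.evalstudio/latest-run.json` is not documented here and may differ.

```typescript
// Hypothetical record shape, for illustration only.
interface RunRecord {
  caseId: string;
  prompt: string;
  passed: boolean;
}

// JSONL: one JSON object per line.
function toJsonl(records: RunRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

// Quote a CSV field when it contains a comma, quote, or line break,
// doubling any embedded quotes (RFC 4180 style).
function csvField(value: string): string {
  return /[",\n\r]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// CSV: a header row followed by one row per record.
function toCsv(records: RunRecord[]): string {
  const header = "caseId,prompt,passed";
  const rows = records.map((r) =>
    [csvField(r.caseId), csvField(r.prompt), String(r.passed)].join(",")
  );
  return [header].concat(rows).join("\n");
}
```

Quoting fields matters here because prompts routinely contain commas and quotation marks, which would otherwise shift CSV columns.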
70
+
71
+ ## Help
72
+
73
+ ```bash
74
+ evalstudio --help
75
+ evalstudio help
76
+ evalstudio generate --help
77
+ evalstudio help run
78
+ ```
79
+
80
+ ## Demo Target
81
+
82
+ The canonical sibling demo repo used during validation is:
83
+
84
+ `/Users/riyadsarsour/Desktop/dutchman/testagent`
85
+
86
+ That demo agent listens on:
87
+
88
+ `http://127.0.0.1:3000/api/chat`
89
+
90
+ Typical demo flow:
91
+
92
+ ```bash
93
+ cd /Users/riyadsarsour/Desktop/dutchman/testagent
94
+ node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js init
95
+ node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js detect
96
+ node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js generate
97
+ node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js run --url http://127.0.0.1:3000/api/chat
98
+ node /Users/riyadsarsour/Desktop/dutchman/dutchmanlabs/packages/cli/dist/index.js export
99
+ ```
100
+
101
+ Artifacts to inspect after the demo:
102
+
103
+ - `.evalstudio/scan-results.json`
104
+ - `.evalstudio/latest-suite.json`
105
+ - `.evalstudio/latest-run.json`
106
+ - `.evalstudio/exports/`
package/VALIDATION.md ADDED
@@ -0,0 +1,69 @@
1
+ # Eval Studio CLI Validation
2
+
3
+ Validation date: 2026-03-31
4
+
5
+ ## Checklist
6
+
7
+ - [x] Happy path
8
+ - [x] Invalid API key
9
+ - [x] Generation limit exceeded
10
+ - [x] Multiple candidates
11
+ - [x] Local endpoint down
12
+ - [x] Export after run
13
+
14
+ ## Results
15
+
16
+ ### Happy path
17
+
18
+ - Environment: release-readiness pass used the local mock backend at `http://127.0.0.1:8791` plus a real local HTTP agent endpoint at `http://127.0.0.1:3000/api/chat`
19
+ - Commands exercised: `init`, `detect --candidate 1`, `generate`, `run --url http://127.0.0.1:3000/api/chat --payload '{"input":"{{prompt}}"}'`, `status`, `export`
20
+ - Result: end-to-end success on the polished build
21
+ - Notes:
22
+   - project created: `proj_72c7f75e`
23
+   - detect found 2 candidates and selected `app/api/chat/route.ts`
24
+   - suite created: `suite_3b2835ae`
25
+   - run uploaded: `run_08bd0e4c`
26
+   - JSONL, CSV, and pytest exports written locally under `.evalstudio/exports/`
27
+
28
+ Additional note:
29
+ - the core product loop was also validated earlier on 2026-03-31 against the real hosted backend with a real API key and local sibling demo repo
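
The `--payload` value used in the happy path is a JSON template containing a `{{prompt}}` placeholder. A minimal sketch of how such a placeholder could be filled safely follows. `renderPayload` is a hypothetical helper, not the CLI's implementation; it assumes the placeholder sits inside a JSON string, so the prompt is JSON-escaped before substitution.

```typescript
// Hypothetical helper: fill every {{prompt}} placeholder in a JSON template.
function renderPayload(template: string, prompt: string): string {
  // JSON.stringify escapes quotes, backslashes, and control characters.
  // Drop the surrounding quotes because the template supplies its own.
  const escaped = JSON.stringify(prompt).slice(1, -1);
  // A replacer function keeps "$" sequences in the prompt from being
  // interpreted as replacement patterns.
  return template.replace(/\{\{prompt\}\}/g, () => escaped);
}
```

Without the escaping step, a prompt containing a double quote would produce a request body that is no longer valid JSON.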
30
+
31
+ ### Invalid API key
32
+
33
+ - Environment: temp home directory with a bogus `es_live_...` key against the real hosted backend
34
+ - Command exercised: `status`
35
+ - Result: CLI shows a human-readable invalid/revoked key error without a stack trace
36
+
37
+ ### Generation limit exceeded
38
+
39
+ - Environment: local mock backend returning `429 generation_limit_exceeded`
40
+ - Command exercised: `generate`
41
+ - Result: CLI shows a friendly daily limit message and points the user to `evalstudio status`
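
One way to produce a message like this is to map the backend's status and error code to user-facing text. The sketch below is illustrative only: the `BackendError` shape and the message wording are assumptions, and only the `429 generation_limit_exceeded` pairing comes from the validation notes.

```typescript
// Hypothetical error shape; the real backend response format may differ.
interface BackendError {
  status: number;
  code: string;
}

// Translate a backend error into a short, human-readable message.
function friendlyMessage(err: BackendError): string {
  if (err.status === 429 && err.code === "generation_limit_exceeded") {
    return "Daily generation limit reached. Run `evalstudio status` to check your usage.";
  }
  // Fallback for anything not handled explicitly.
  return "Request failed (" + err.status + " " + err.code + ").";
}
```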
42
+
43
+ ### Multiple candidates
44
+
45
+ - Environment: temp validation repo with two detected candidates
46
+ - Command exercised: `detect --candidate 1`
47
+ - Result: CLI prints a ranked list, supports clean selection by number, and stores the selected candidate in `.evalstudio/config.json`
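
Ranked selection by number can be sketched as below. The `Candidate` shape, including the `score` field, is an assumption for illustration; how the CLI actually scores and stores candidates is not described here.

```typescript
// Hypothetical candidate shape.
interface Candidate {
  path: string;
  score: number;
}

// Highest score first; slice() avoids mutating the caller's array.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates.slice().sort((a, b) => b.score - a.score);
}

// `choice` is the 1-based number shown in the ranked list.
function selectCandidate(ranked: Candidate[], choice: number): Candidate {
  if (Math.floor(choice) !== choice) {
    throw new Error("Candidate number must be a whole number.");
  }
  const picked = ranked[choice - 1];
  if (picked === undefined) {
    throw new Error("Choose a number between 1 and " + ranked.length + ".");
  }
  return picked;
}
```

Treating the number as 1-based matches what a user sees in a printed list, so `--candidate 1` picks the top-ranked entry.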
48
+
49
+ ### Local endpoint down
50
+
51
+ - Environment: valid local suite with no service listening at the configured URL
52
+ - Command exercised: `run --url http://127.0.0.1:3999/api/chat --payload '{"input":"{{prompt}}"}'`
53
+ - Result: CLI surfaces an actionable local endpoint error instead of a raw fetch failure
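
A minimal sketch of turning a low-level connection failure into an actionable message is shown below. It assumes the underlying error exposes a Node-style code such as `ECONNREFUSED`; `describeEndpointError` and its wording are hypothetical and do not reproduce the CLI's actual output.

```typescript
// Hypothetical helper: map a connection error code to actionable text.
function describeEndpointError(url: string, code?: string): string {
  if (code === "ECONNREFUSED") {
    // Nothing accepted the connection at this address.
    return (
      "Could not connect to " + url + ". " +
      "Start your local agent, or pass a different address with --url."
    );
  }
  // Unknown failure: still name the URL so the user knows what was tried.
  return "Request to " + url + " failed" + (code ? " (" + code + ")" : "") + ".";
}
```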
54
+
55
+ ### Export after run
56
+
57
+ - Environment: local run cache present
58
+ - Command exercised: `export`
59
+ - Result: CLI writes JSONL, CSV, and pytest artifacts and prints their saved paths
60
+
61
+ ## Additional error smoke tests
62
+
63
+ - Missing API key: `status` reports `No Eval Studio API key is saved on this machine.`
64
+ - Project not initialized: `status` reports `This repo is not initialized for Eval Studio yet.`
65
+ - No candidates found: `detect` on an empty repo reports `No likely AI agent candidates were found in this repo.`
66
+ - No candidate selected: `generate` on an initialized repo without detection results reports `No candidate selected.`
67
+ - No eval suite generated: `run` on an initialized repo without a suite reports `No eval suite is saved for this repo.`
68
+ - Malformed local agent response: `run` against a server returning `{"ok":true}` reports `Your local agent responded, but the response shape wasn't recognized.`
69
+ - Backend upload failure: `run` against a backend stub that fails result uploads reports `Eval Studio couldn't upload your run results.` and leaves `.evalstudio/latest-run.json` with `"uploadStatus": "pending"`
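
Because a failed upload leaves `"uploadStatus": "pending"` in `.evalstudio/latest-run.json`, a script can detect runs that still need uploading. The sketch below checks only that one field on already-loaded JSON text; `isUploadPending` is a hypothetical helper, and no other `uploadStatus` values are assumed.

```typescript
// Hypothetical helper: true when the run file's top-level uploadStatus is "pending".
function isUploadPending(runJson: string): boolean {
  const parsed: unknown = JSON.parse(runJson);
  if (typeof parsed !== "object" || parsed === null) {
    return false;
  }
  return (parsed as { uploadStatus?: unknown }).uploadStatus === "pending";
}
```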