@evalstudio/cli 0.3.0 → 0.3.2

package/README.md ADDED
@@ -0,0 +1,211 @@
+ # @evalstudio/cli
+
+ Command-line interface for [EvalStudio](https://github.com/Treatwell-AI/evalstudio) — a flexible evaluation platform for testing chatbots, AI agents, and REST APIs.
+
+ ## Quick Start
+
+ ### Install
+
+ ```bash
+ npm install -g @evalstudio/cli
+
+ # Or run directly with npx
+ npx @evalstudio/cli --help
+ ```
+
+ ### Initialize a project
+
+ ```bash
+ mkdir my-evals && cd my-evals
+ evalstudio init
+
+ # Or with npx
+ mkdir my-evals && cd my-evals
+ npx @evalstudio/cli init
+ ```
+
+ This creates an `evalstudio.config.json` and a `data/` directory for storing test data.
+
+ ### Start the Web UI
+
+ The fastest way to get started is through the Web UI, which lets you manage everything visually — connectors, personas, scenarios, evals, and runs:
+
+ ```bash
+ evalstudio serve --open
+
+ # Or with npx
+ npx @evalstudio/cli serve --open
+ ```
+
+ This starts the API server and Web UI on `http://localhost:3000`. From there you can create all your resources and trigger eval runs through the browser.
+
+ ### CLI workflow
+
+ Everything available in the Web UI can also be done from the command line, which is useful for scripting and CI/CD pipelines.
+
+ #### Configure an LLM provider
+
+ Set up an LLM provider for evaluation (LLM-as-judge) and persona generation:
+
+ ```bash
+ evalstudio llm-provider create "openai" --provider openai --api-key sk-...
+ ```
+
+ #### Create a connector
+
+ Define the agent endpoint to test against. For example, a LangGraph dev server:
+
+ ```bash
+ evalstudio connector create "my-agent" \
+   --type langgraph \
+   --base-url "http://localhost:2024" \
+   --config '{"assistantId": "agent"}'
+ ```
+
+ #### Create a persona and scenario
+
+ ```bash
+ # Create a test persona
+ evalstudio persona create "frustrated-customer" \
+   -d "A customer who is unhappy with their recent purchase"
+
+ # Create a test scenario
+ evalstudio scenario create "refund-request" \
+   -i "Ask for a refund on a recent order" \
+   --success-criteria "Agent offers a refund or escalation path" \
+   --failure-criteria "Agent ignores the refund request" \
+   --personas "frustrated-customer"
+ ```
+
+ #### Create and run an eval
+
+ ```bash
+ # Create an eval combining scenarios with a connector
+ evalstudio eval create -n "customer-service-eval" \
+   -c "my-agent" \
+   --scenario "refund-request"
+
+ # Create runs for the eval
+ evalstudio run create -e "customer-service-eval"
+
+ # Process queued runs
+ evalstudio run process
+ ```
+
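For CI, the last two steps of the workflow could be wrapped in a small script. The following is a minimal sketch and not part of the package: `run_step`, `run_eval`, and the `DRY_RUN` variable are hypothetical names introduced here, and only the `run create -e` and `run process` commands shown in the README are used. With `DRY_RUN=1` (the default in this sketch) each command is printed instead of executed, so the script can be checked without `evalstudio` installed. Arguments are printed unquoted, which is fine for inspection but not for copy-pasting names that contain spaces.

```bash
set -eu

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the two documented run commands.
run_eval() {
  eval_name="$1"
  run_step evalstudio run create -e "$eval_name"   # queue runs for the eval
  run_step evalstudio run process                  # process the queued runs
}

run_eval "customer-service-eval"
```

Setting `DRY_RUN=0` in the environment would make the same script invoke the CLI for real, assuming `evalstudio` is on the `PATH`.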
+ ## Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `evalstudio init [name]` | Initialize a new project in the current directory |
+ | `evalstudio status` | Show project status and configuration |
+ | `evalstudio connector <sub>` | Manage connectors (create, list, show, update, delete, types) |
+ | `evalstudio llm-provider <sub>` | Manage LLM providers (create, list, show, update, delete, models) |
+ | `evalstudio persona <sub>` | Manage test personas (create, list, show, update, delete) |
+ | `evalstudio scenario <sub>` | Manage test scenarios (create, list, show, update, delete) |
+ | `evalstudio eval <sub>` | Manage evals (create, list, show, update, delete) |
+ | `evalstudio run <sub>` | Manage runs (create, list, show, delete, process) |
+ | `evalstudio serve` | Start the API server and Web UI |
+
+ All commands support `--json` for machine-readable output, useful for scripting and CI/CD pipelines.
+
+ ### `evalstudio serve` options
+
+ | Option | Description |
+ |--------|-------------|
+ | `-p, --port <number>` | Port to listen on (default: 3000, env: `EVALSTUDIO_PORT`) |
+ | `--no-web` | Disable Web UI, serve API only |
+ | `--no-processor` | Disable background run processor |
+ | `--open` | Open browser after starting |
+
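The `serve` options above can be combined, for instance to run an API-only server on a custom port. A minimal sketch follows; `serve_command` is a hypothetical helper introduced here for illustration, and it only prints a command line built from the documented flags. The port precedence it applies (argument, then `EVALSTUDIO_PORT`, then 3000) is a choice made in this sketch, not a statement about how the CLI itself resolves the port.

```bash
# Hypothetical helper: print an API-only `serve` command line.
# Port precedence in this sketch: first argument, then EVALSTUDIO_PORT,
# then the documented default of 3000.
serve_command() {
  port="${1:-${EVALSTUDIO_PORT:-3000}}"
  printf 'evalstudio serve --port %s --no-web\n' "$port"
}

serve_command 8080   # prints: evalstudio serve --port 8080 --no-web
```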
+ ### `evalstudio run process` options
+
+ | Option | Description |
+ |--------|-------------|
+ | `-w, --watch` | Continuously watch and process queued runs |
+ | `-c, --concurrency <number>` | Max concurrent runs (default: 3) |
+ | `--poll <ms>` | Poll interval in milliseconds (default: 2000) |
+
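As an illustration of the `run process` options above, a long-running worker could be started in watch mode with explicit concurrency and poll settings. In this sketch `worker_command` is a hypothetical helper, not part of the CLI; its defaults mirror the documented ones (3 concurrent runs, 2000 ms poll interval), and it prints the command rather than running it.

```bash
# Hypothetical helper: print a watch-mode `run process` command line.
# $1 = max concurrent runs (defaults to 3), $2 = poll interval in ms (defaults to 2000).
worker_command() {
  concurrency="${1:-3}"
  poll_ms="${2:-2000}"
  printf 'evalstudio run process --watch --concurrency %s --poll %s\n' \
    "$concurrency" "$poll_ms"
}

worker_command 5 1000
```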
+ ## Development Setup
+
+ ### Prerequisites
+
+ - Node.js 20+
+ - pnpm 9.15+
+
+ ### Clone and install
+
+ ```bash
+ git clone https://github.com/Treatwell-AI/evalstudio.git
+ cd evalstudio
+ pnpm install
+ ```
+
+ ### Build
+
+ ```bash
+ # Build all packages (required — CLI depends on core and api)
+ pnpm build
+
+ # Or build just the CLI and its dependencies
+ # (the trailing "..." makes the pnpm filter include workspace dependencies)
+ pnpm --filter "@evalstudio/cli..." build
+ ```
+
+ ### Run locally
+
+ ```bash
+ # Run the CLI directly from the build output
+ node packages/cli/dist/index.js status
+
+ # Or use pnpm to scope commands
+ pnpm --filter @evalstudio/cli build && node packages/cli/dist/index.js init
+ ```
+
+ ### Development workflow
+
+ ```bash
+ # Watch mode — recompiles on changes
+ pnpm --filter @evalstudio/cli dev
+
+ # Run tests
+ pnpm --filter @evalstudio/cli test
+
+ # Watch mode for tests
+ pnpm --filter @evalstudio/cli test:watch
+
+ # Type checking
+ pnpm --filter @evalstudio/cli typecheck
+
+ # Linting
+ pnpm --filter @evalstudio/cli lint
+ ```
+
+ ### Project structure
+
+ ```
+ packages/cli/
+ ├── src/
+ │   ├── index.ts              # CLI entry point
+ │   ├── commands/
+ │   │   ├── init.ts           # Project initialization
+ │   │   ├── status.ts         # Status display
+ │   │   ├── connector.ts      # Connector CRUD
+ │   │   ├── eval.ts           # Eval CRUD
+ │   │   ├── llm-provider.ts   # LLM provider CRUD
+ │   │   ├── persona.ts        # Persona CRUD
+ │   │   ├── run.ts            # Run management & processing
+ │   │   ├── scenario.ts       # Scenario CRUD
+ │   │   └── serve.ts          # API + Web server
+ │   └── __tests__/            # Test files
+ ├── dist/                     # Compiled output
+ └── web-dist/                 # Bundled Web UI (copied during build)
+ ```
+
+ ### Architecture
+
+ The CLI is a thin wrapper around `@evalstudio/core`. All business logic (storage, evaluation, connectors) lives in core — the CLI provides command parsing via [Commander.js](https://github.com/tj/commander.js) and formatted terminal output.
+
+ The `serve` command starts a [Fastify](https://fastify.dev/) server from `@evalstudio/api` and serves the pre-built Web UI from the `web-dist/` directory.
+
+ ## License
+
+ MIT
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@evalstudio/cli",
-   "version": "0.3.0",
+   "version": "0.3.2",
    "description": "Command-line interface for EvalStudio",
    "type": "module",
    "bin": {
@@ -31,8 +31,8 @@
    },
    "dependencies": {
      "commander": "^13.0.0",
-     "@evalstudio/core": "0.3.0",
-     "@evalstudio/api": "0.3.0"
+     "@evalstudio/core": "0.3.2",
+     "@evalstudio/api": "0.3.2"
    },
    "devDependencies": {
      "@types/node": "^22.10.10",