@evalstudio/cli 0.3.0 → 0.3.1
- package/README.md +211 -0
- package/package.json +3 -3
package/README.md
ADDED
@@ -0,0 +1,211 @@
# @evalstudio/cli

Command-line interface for [EvalStudio](https://github.com/Treatwell-AI/evalstudio), a flexible evaluation platform for testing chatbots, AI agents, and REST APIs.

## Quick Start

### Install

```bash
npm install -g @evalstudio/cli

# Or run directly with npx
npx @evalstudio/cli --help
```

### Initialize a project

```bash
mkdir my-evals && cd my-evals
evalstudio init

# Or with npx
mkdir my-evals && cd my-evals
npx @evalstudio/cli init
```

This creates an `evalstudio.config.json` and a `data/` directory for storing test data.
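
One way to confirm that initialization worked is to check for the two artifacts named above. The following is a minimal sketch: the helper name `is_evalstudio_project` is hypothetical and not part of the CLI, and it checks only for the presence of the file and directory, not their contents.

```bash
# Hypothetical helper (not part of the CLI): succeeds if the given directory
# contains the file and directory that `evalstudio init` is described as creating.
is_evalstudio_project() {
  local dir="${1:-.}"
  [ -f "$dir/evalstudio.config.json" ] && [ -d "$dir/data" ]
}
```

A script could then guard initialization with `is_evalstudio_project . || evalstudio init`.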

### Start the Web UI

The fastest way to get started is through the Web UI, which lets you manage everything visually (connectors, personas, scenarios, evals, and runs):

```bash
evalstudio serve --open

# Or with npx
npx @evalstudio/cli serve --open
```

This starts the API server and Web UI on `http://localhost:3000`. From there you can create all your resources and trigger eval runs through the browser.

### CLI workflow

Everything available in the Web UI can also be done from the command line, which is useful for scripting and CI/CD pipelines.

#### Configure an LLM provider

Set up an LLM provider for evaluation (LLM-as-judge) and persona generation:

```bash
evalstudio llm-provider create "openai" --provider openai --api-key sk-...
```

#### Create a connector

Define the agent endpoint to test against. For example, a LangGraph dev server:

```bash
evalstudio connector create "my-agent" \
  --type langgraph \
  --base-url "http://localhost:2024" \
  --config '{"assistantId": "agent"}'
```

#### Create a persona and scenario

```bash
# Create a test persona
evalstudio persona create "frustrated-customer" \
  -d "A customer who is unhappy with their recent purchase"

# Create a test scenario
evalstudio scenario create "refund-request" \
  -i "Ask for a refund on a recent order" \
  --success-criteria "Agent offers a refund or escalation path" \
  --failure-criteria "Agent ignores the refund request" \
  --personas "frustrated-customer"
```

#### Create and run an eval

```bash
# Create an eval combining scenarios with a connector
evalstudio eval create -n "customer-service-eval" \
  -c "my-agent" \
  --scenario "refund-request"

# Create runs for the eval
evalstudio run create -e "customer-service-eval"

# Process queued runs
evalstudio run process
```
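
For a CI job, the last two steps above could be wrapped in a small function. This is a sketch only: `run_eval_once` is a hypothetical name, and it assumes that `evalstudio run process` without `--watch` returns once the queue has been processed. Whether its exit code reflects eval failures is not stated here, so a real pipeline would need to verify that.

```bash
# Hypothetical wrapper (not part of the CLI): queue runs for one eval,
# then process the queue. Uses only the subcommands and -e flag shown above.
run_eval_once() {
  local eval_name="$1"
  # Stop early if the runs could not be created.
  evalstudio run create -e "$eval_name" || return 1
  evalstudio run process
}
```

Called as `run_eval_once "customer-service-eval"`, it issues the same two commands as the manual workflow.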

## Commands

| Command | Description |
|---------|-------------|
| `evalstudio init [name]` | Initialize a new project in the current directory |
| `evalstudio status` | Show project status and configuration |
| `evalstudio connector <sub>` | Manage connectors (create, list, show, update, delete, types) |
| `evalstudio llm-provider <sub>` | Manage LLM providers (create, list, show, update, delete, models) |
| `evalstudio persona <sub>` | Manage test personas (create, list, show, update, delete) |
| `evalstudio scenario <sub>` | Manage test scenarios (create, list, show, update, delete) |
| `evalstudio eval <sub>` | Manage evals (create, list, show, update, delete) |
| `evalstudio run <sub>` | Manage runs (create, list, show, delete, process) |
| `evalstudio serve` | Start the API server and Web UI |

All commands support `--json` for machine-readable output, useful for scripting and CI/CD pipelines.
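
As an illustration of the `--json` flag in a script, the sketch below saves the run list to a file for later inspection. The function name `save_runs_json` is hypothetical, and because the JSON structure is not documented in this README, the output is stored unmodified rather than parsed.

```bash
# Hypothetical helper (not part of the CLI): write the machine-readable
# run list to a file. The JSON shape is not assumed; it is saved as-is.
save_runs_json() {
  local out_file="${1:-runs.json}"
  evalstudio run list --json > "$out_file"
}
```

A CI job could archive the resulting file as a build artifact.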

### `evalstudio serve` options

| Option | Description |
|--------|-------------|
| `-p, --port <number>` | Port to listen on (default: 3000, env: `EVALSTUDIO_PORT`) |
| `--no-web` | Disable Web UI, serve API only |
| `--no-processor` | Disable background run processor |
| `--open` | Open browser after starting |
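
The options in the table can be combined. As a minimal sketch, the hypothetical helper below (`start_api_only` is not part of the CLI) serves only the API on a chosen port, falling back to the documented default of 3000.

```bash
# Hypothetical helper (not part of the CLI): serve the API without the
# Web UI on the given port, using flags from the options table.
start_api_only() {
  local port="${1:-3000}"
  evalstudio serve --port "$port" --no-web
}
```

For example, `start_api_only 4000` runs `evalstudio serve --port 4000 --no-web`.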

### `evalstudio run process` options

| Option | Description |
|--------|-------------|
| `-w, --watch` | Continuously watch and process queued runs |
| `-c, --concurrency <number>` | Max concurrent runs (default: 3) |
| `--poll <ms>` | Poll interval in milliseconds (default: 2000) |
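
These options suggest running a dedicated worker process. The sketch below is an assumption about how they might be combined, not a documented deployment pattern: `start_worker` is a hypothetical name, and its defaults mirror the values listed in the table.

```bash
# Hypothetical helper (not part of the CLI): long-running worker that
# watches the queue, with concurrency and poll interval as parameters.
start_worker() {
  local concurrency="${1:-3}"
  local poll_ms="${2:-2000}"
  evalstudio run process --watch --concurrency "$concurrency" --poll "$poll_ms"
}
```

Such a worker would presumably pair with `evalstudio serve --no-processor`, so that the server does not also process runs in the background.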

## Development Setup

### Prerequisites

- Node.js 20+
- pnpm 9.15+

### Clone and install

```bash
git clone https://github.com/Treatwell-AI/evalstudio.git
cd evalstudio
pnpm install
```

### Build

```bash
# Build all packages (required: CLI depends on core and api)
pnpm build

# Or build just the CLI and its dependencies
pnpm --filter @evalstudio/cli build
```

### Run locally

```bash
# Run the CLI directly from the build output
node packages/cli/dist/index.js status

# Or use pnpm to scope commands
pnpm --filter @evalstudio/cli build && node packages/cli/dist/index.js init
```

### Development workflow

```bash
# Watch mode: recompiles on changes
pnpm --filter @evalstudio/cli dev

# Run tests
pnpm --filter @evalstudio/cli test

# Watch mode for tests
pnpm --filter @evalstudio/cli test:watch

# Type checking
pnpm --filter @evalstudio/cli typecheck

# Linting
pnpm --filter @evalstudio/cli lint
```

### Project structure

```
packages/cli/
├── src/
│   ├── index.ts              # CLI entry point
│   ├── commands/
│   │   ├── init.ts           # Project initialization
│   │   ├── status.ts         # Status display
│   │   ├── connector.ts      # Connector CRUD
│   │   ├── eval.ts           # Eval CRUD
│   │   ├── llm-provider.ts   # LLM provider CRUD
│   │   ├── persona.ts        # Persona CRUD
│   │   ├── run.ts            # Run management & processing
│   │   ├── scenario.ts       # Scenario CRUD
│   │   └── serve.ts          # API + Web server
│   └── __tests__/            # Test files
├── dist/                     # Compiled output
└── web-dist/                 # Bundled Web UI (copied during build)
```

### Architecture

The CLI is a thin wrapper around `@evalstudio/core`. All business logic (storage, evaluation, connectors) lives in core; the CLI provides command parsing via [Commander.js](https://github.com/tj/commander.js) and formatted terminal output.

The `serve` command starts a [Fastify](https://fastify.dev/) server from `@evalstudio/api` and serves the pre-built Web UI from the `web-dist/` directory.

## License

MIT
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@evalstudio/cli",
-  "version": "0.3.
+  "version": "0.3.1",
   "description": "Command-line interface for EvalStudio",
   "type": "module",
   "bin": {
@@ -31,8 +31,8 @@
   },
   "dependencies": {
     "commander": "^13.0.0",
-    "@evalstudio/core": "0.3.
-    "@evalstudio/api": "0.3.
+    "@evalstudio/core": "0.3.1",
+    "@evalstudio/api": "0.3.1"
   },
   "devDependencies": {
     "@types/node": "^22.10.10",