@gleanwork/mcp-server-tester 0.12.0 → 1.0.0-beta.0
- package/README.md +120 -337
- package/dist/cli/index.js +455 -174
- package/dist/fixtures/mcp.d.ts +121 -44
- package/dist/fixtures/mcp.js +974 -244
- package/dist/fixtures/mcp.js.map +1 -1
- package/dist/fixtures/mcpAuth.js +6 -2
- package/dist/fixtures/mcpAuth.js.map +1 -1
- package/dist/index.cjs +4936 -1292
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +1660 -570
- package/dist/index.d.ts +1660 -570
- package/dist/index.js +4923 -1288
- package/dist/index.js.map +1 -1
- package/dist/reporters/mcpReporter.cjs +35 -16
- package/dist/reporters/mcpReporter.cjs.map +1 -1
- package/dist/reporters/mcpReporter.d.cts +8 -3
- package/dist/reporters/mcpReporter.d.ts +8 -3
- package/dist/reporters/mcpReporter.js +36 -17
- package/dist/reporters/mcpReporter.js.map +1 -1
- package/dist/reporters/ui-dist/app.js +5 -5
- package/dist/reporters/ui-dist/styles.css +1 -1
- package/package.json +63 -8
- package/src/reporters/ui-dist/app.js +5 -5
- package/src/reporters/ui-dist/styles.css +1 -1
package/README.md
CHANGED
|
@@ -4,161 +4,50 @@
|
|
|
4
4
|
[](https://www.npmjs.com/package/@gleanwork/mcp-server-tester)
|
|
5
5
|
[](https://github.com/gleanwork/mcp-server-tester/actions/workflows/ci.yml)
|
|
6
6
|
[](https://opensource.org/licenses/MIT)
|
|
7
|
-
[](https://nodejs.org)
|
|
8
|
-
[](https://www.typescriptlang.org/)
|
|
9
7
|
|
|
10
|
-
|
|
8
|
+
A testing and evaluation framework for [Model Context Protocol (MCP)](https://modelcontextprotocol.io) servers. Write deterministic Playwright tests against your MCP tools, or run data-driven eval datasets — including LLM-based evaluation of tool discoverability.
|
|
11
9
|
|
|
12
|
-
|
|
13
|
-
> **Experimental Project** - This library is in active development. APIs may change, and we welcome contributions, feedback, and collaboration as we evolve the framework. See [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
|
|
10
|
+
## Playwright Tests
|
|
14
11
|
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
## What's Included
|
|
18
|
-
|
|
19
|
-
This framework provides **two complementary approaches** for testing MCP servers:
|
|
20
|
-
|
|
21
|
-
### 1. **Automated Testing** (Playwright Tests)
|
|
22
|
-
|
|
23
|
-
Write deterministic, automated tests using standard Playwright patterns with MCP-specific fixtures. Perfect for:
|
|
24
|
-
|
|
25
|
-
- Direct tool calls with expected outputs
|
|
26
|
-
- Protocol conformance validation
|
|
27
|
-
- Integration testing with your MCP server
|
|
28
|
-
- CI/CD pipelines
|
|
12
|
+
The `mcp` Playwright fixture connects to your MCP server (stdio or HTTP) and exposes a high-level API for calling tools and asserting responses. Custom matchers keep assertions readable.
|
|
29
13
|
|
|
30
14
|
```typescript
|
|
31
|
-
test('read a file', async ({ mcp }) => {
|
|
32
|
-
const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
|
|
33
|
-
expect(result.content).toContain('Hello');
|
|
34
|
-
});
|
|
35
|
-
```
|
|
36
|
-
|
|
37
|
-
### 2. **Evaluation Datasets** (Evals) ⚠️ Experimental
|
|
38
|
-
|
|
39
|
-
Run deeper, more subjective analysis using dataset-driven evaluations. Includes:
|
|
40
|
-
|
|
41
|
-
- Schema validation (deterministic)
|
|
42
|
-
- Text and regex pattern matching (deterministic)
|
|
43
|
-
- LLM-as-a-judge scoring (non-deterministic)
|
|
44
|
-
|
|
45
|
-
**Note:** Evals, particularly those using LLM-as-a-judge, are highly experimental due to their non-deterministic nature. Results may vary between runs, and prompts may need tuning for your specific use case.
|
|
46
|
-
|
|
47
|
-
```typescript
|
|
48
|
-
const result = await runEvalDataset({ dataset, expectations }, { mcp });
|
|
49
|
-
expect(result.passed).toBe(result.total);
|
|
50
|
-
```
|
|
51
|
-
|
|
52
|
-
## Features
|
|
53
|
-
|
|
54
|
-
- 🎭 **Playwright Integration** - Use MCP servers in Playwright tests with idiomatic fixtures
|
|
55
|
-
- 📊 **Matrix Evals** - Run dataset-driven evaluations across multiple transports
|
|
56
|
-
- 📸 **Snapshot Testing** - Capture and compare deterministic responses with optional sanitizers for variable data
|
|
57
|
-
- 🤖 **LLM-as-a-Judge** - Optional semantic evaluation using Anthropic Claude
|
|
58
|
-
- 🔌 **Multiple Transports** - Support for both stdio (local) and HTTP (remote) connections
|
|
59
|
-
- ✅ **Protocol Conformance** - Built-in checks for MCP spec compliance
|
|
60
|
-
|
|
61
|
-
## Installation
|
|
62
|
-
|
|
63
|
-
```bash
|
|
64
|
-
npm install --save-dev @gleanwork/mcp-server-tester @playwright/test zod
|
|
65
|
-
```
|
|
66
|
-
|
|
67
|
-
**Note:** The Anthropic SDK is optional and only needed if you plan to use LLM-as-a-judge semantic evaluation:
|
|
68
|
-
|
|
69
|
-
```bash
|
|
70
|
-
npm install --save-dev @anthropic-ai/sdk
|
|
71
|
-
```
|
|
72
|
-
|
|
73
|
-
## Quick Start
|
|
74
|
-
|
|
75
|
-
### Initialize with CLI
|
|
76
|
-
|
|
77
|
-
The fastest way to get started:
|
|
78
|
-
|
|
79
|
-
```bash
|
|
80
|
-
npx mcp-server-tester init
|
|
81
|
-
|
|
82
|
-
# Follow the interactive prompts to create:
|
|
83
|
-
# - playwright.config.ts (configured for your MCP server)
|
|
84
|
-
# - tests/mcp.spec.ts (example tests)
|
|
85
|
-
# - data/example-dataset.json (sample eval dataset)
|
|
86
|
-
# - package.json (with all dependencies)
|
|
87
|
-
```
|
|
88
|
-
|
|
89
|
-
See the [CLI Guide](./docs/cli.md) for all options.
|
|
90
|
-
|
|
91
|
-
### Example: Testing in Action
|
|
92
|
-
|
|
93
|
-
Here's what a complete test suite looks like (following the **layered testing pattern**):
|
|
94
|
-
|
|
95
|
-
```typescript
|
|
96
|
-
// tests/mcp.spec.ts
|
|
97
15
|
import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
|
|
98
|
-
import {
|
|
99
|
-
loadEvalDataset,
|
|
100
|
-
runEvalDataset,
|
|
101
|
-
createSchemaExpectation,
|
|
102
|
-
} from '@gleanwork/mcp-server-tester';
|
|
103
|
-
import { z } from 'zod';
|
|
104
|
-
|
|
105
|
-
// Layer 1: MCP Protocol Conformance
|
|
106
|
-
test.describe('MCP Protocol Conformance', () => {
|
|
107
|
-
test('should return valid server info', async ({ mcp }) => {
|
|
108
|
-
const info = mcp.getServerInfo();
|
|
109
|
-
expect(info).toBeTruthy();
|
|
110
|
-
expect(info?.name).toBeTruthy();
|
|
111
|
-
expect(info?.version).toBeTruthy();
|
|
112
|
-
});
|
|
113
|
-
|
|
114
|
-
test('should list available tools', async ({ mcp }) => {
|
|
115
|
-
const tools = await mcp.listTools();
|
|
116
|
-
expect(Array.isArray(tools)).toBe(true);
|
|
117
|
-
expect(tools.length).toBeGreaterThan(0);
|
|
118
|
-
});
|
|
119
16
|
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
17
|
+
test('read_file returns file contents', async ({ mcp }) => {
|
|
18
|
+
const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
|
|
19
|
+
expect(result).toContainToolText('Hello, world');
|
|
20
|
+
expect(result).not.toBeToolError();
|
|
124
21
|
});
|
|
125
22
|
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
const result = await mcp.callTool('read_file', {
|
|
130
|
-
path: '/tmp/test.txt',
|
|
131
|
-
});
|
|
132
|
-
expect(result.content).toContain('Hello');
|
|
133
|
-
});
|
|
23
|
+
test('server exposes required tools', async ({ mcp }) => {
|
|
24
|
+
const tools = await mcp.listTools();
|
|
25
|
+
expect(tools.map((t) => t.name)).toContain('read_file');
|
|
134
26
|
});
|
|
27
|
+
```
|
|
135
28
|
|
|
136
|
-
|
|
137
|
-
test('file operations eval', async ({ mcp }) => {
|
|
138
|
-
const FileContentSchema = z.object({
|
|
139
|
-
content: z.string(),
|
|
140
|
-
});
|
|
29
|
+
Playwright tests are fast, deterministic, and designed for CI. Use them for regression testing, schema validation, and protocol conformance. The framework includes built-in conformance checks for the MCP spec.
|
|
141
30
|
|
|
142
|
-
|
|
143
|
-
schemas: { 'file-content': FileContentSchema },
|
|
144
|
-
});
|
|
31
|
+
Available matchers:
|
|
145
32
|
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
33
|
+
| Matcher | Description |
|
|
34
|
+
| ------------------------ | ----------------------------------------------- |
|
|
35
|
+
| `toContainToolText` | Response contains expected substrings |
|
|
36
|
+
| `toMatchToolSchema` | Response validates against a Zod schema |
|
|
37
|
+
| `toMatchToolPattern` | Response matches a regex pattern |
|
|
38
|
+
| `toMatchToolSnapshot` | Response matches a saved baseline |
|
|
39
|
+
| `toBeToolError` | Response is (or is not) an error |
|
|
40
|
+
| `toHaveToolResponseSize` | Response size is within bounds |
|
|
41
|
+
| `toSatisfyToolPredicate` | Response satisfies a custom function |
|
|
42
|
+
| `toHaveToolCalls` | LLM called the expected tools |
|
|
43
|
+
| `toHaveToolCallCount` | LLM made N tool calls |
|
|
44
|
+
| `toPassToolJudge` | LLM evaluates response quality against a rubric |
|
|
155
45
|
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
46
|
+
## Eval Datasets
|
|
47
|
+
|
|
48
|
+
Eval datasets let you define test cases as JSON files and run them with `runEvalDataset()`. Each case specifies a tool call and one or more assertions.
|
|
159
49
|
|
|
160
50
|
```json
|
|
161
|
-
// data/evals.json
|
|
162
51
|
{
|
|
163
52
|
"name": "file-ops",
|
|
164
53
|
"cases": [
|
|
@@ -166,256 +55,150 @@ test('file operations eval', async ({ mcp }) => {
|
|
|
166
55
|
"id": "read-config",
|
|
167
56
|
"toolName": "read_file",
|
|
168
57
|
"args": { "path": "/tmp/config.json" },
|
|
169
|
-
"
|
|
58
|
+
"expect": {
|
|
59
|
+
"schema": "file-content",
|
|
60
|
+
"containsText": ["version", "name"]
|
|
61
|
+
}
|
|
170
62
|
},
|
|
171
63
|
{
|
|
172
64
|
"id": "read-readme",
|
|
173
65
|
"toolName": "read_file",
|
|
174
66
|
"args": { "path": "/tmp/README.md" },
|
|
175
|
-
"
|
|
67
|
+
"expect": {
|
|
68
|
+
"snapshot": "readme-snapshot"
|
|
69
|
+
}
|
|
176
70
|
}
|
|
177
71
|
]
|
|
178
72
|
}
|
|
179
73
|
```
|
|
180
74
|
|
|
181
75
|
```typescript
|
|
182
|
-
|
|
183
|
-
import {
|
|
76
|
+
import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
|
|
77
|
+
import { loadEvalDataset, runEvalDataset } from '@gleanwork/mcp-server-tester';
|
|
78
|
+
import { z } from 'zod';
|
|
184
79
|
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
191
|
-
mcpConfig: {
|
|
192
|
-
transport: 'stdio',
|
|
193
|
-
command: 'node',
|
|
194
|
-
args: ['path/to/your/server.js'],
|
|
195
|
-
},
|
|
196
|
-
},
|
|
197
|
-
},
|
|
198
|
-
],
|
|
80
|
+
test('file operations eval', async ({ mcp }, testInfo) => {
|
|
81
|
+
const dataset = await loadEvalDataset('./data/evals.json', {
|
|
82
|
+
schemas: { 'file-content': z.object({ content: z.string() }) },
|
|
83
|
+
});
|
|
84
|
+
const result = await runEvalDataset({ dataset }, { mcp, testInfo });
|
|
85
|
+
expect(result.passed).toBe(result.total);
|
|
199
86
|
});
|
|
200
87
|
```
|
|
201
88
|
|
|
202
|
-
|
|
89
|
+
Supported assertion types:
|
|
203
90
|
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
91
|
+
| Type | Description |
|
|
92
|
+
| ---------------- | ----------------------------------------------- |
|
|
93
|
+
| `containsText` | Response includes expected substrings |
|
|
94
|
+
| `schema` | Response validates against a Zod schema |
|
|
95
|
+
| `regex` | Response matches a pattern |
|
|
96
|
+
| `snapshot` | Response matches a saved baseline |
|
|
97
|
+
| `judge` | LLM evaluates response quality against a rubric |
|
|
98
|
+
| `toolsTriggered` | LLM called the expected tools (LLM host mode) |
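As a rough illustration of what the two simplest assertion types in this table check, the sketch below runs a substring check and a regex check over a response's text. It is not the package's implementation: `AssertionResult`, `checkContainsText`, and `checkRegex` are hypothetical names, and treating `containsText` as "every listed substring must appear" is an assumption.

```typescript
// Hypothetical result shape for this sketch; not the package's own type.
type AssertionResult = { passed: boolean; missing: string[] };

// containsText-style check: assumes every expected substring must appear.
function checkContainsText(responseText: string, expected: string[]): AssertionResult {
  const missing = expected.filter((s) => !responseText.includes(s));
  return { passed: missing.length === 0, missing };
}

// regex-style check: passes when the pattern matches anywhere in the text.
function checkRegex(responseText: string, pattern: string): boolean {
  return new RegExp(pattern).test(responseText);
}

// Sample response text, made up for this example.
const sampleResponse = '{"name": "demo", "version": "1.2.3"}';

// Both substrings are present, so nothing is reported missing.
console.log(checkContainsText(sampleResponse, ['version', 'name']));

// "license" is absent, so it is listed in `missing`.
console.log(checkContainsText(sampleResponse, ['license']));

// The semantic-version pattern matches the "version" field.
console.log(checkRegex(sampleResponse, '"version": "\\d+\\.\\d+\\.\\d+"'));
```

Returning the missing substrings rather than a bare boolean makes a failing case easier to diagnose in a report.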
|
|
211
99
|
|
|
212
|
-
|
|
100
|
+
### LLM host mode
|
|
213
101
|
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
**Real MCP Server Tests:**
|
|
217
|
-
|
|
218
|
-
- [`filesystem-server/`](./examples/filesystem-server) - Test suite for Anthropic's Filesystem MCP server
|
|
219
|
-
- Demonstrates `fixturify-project` for isolated test fixtures
|
|
220
|
-
- Zod schema validation for JSON files
|
|
221
|
-
- 5 Playwright tests, 11 eval dataset cases
|
|
222
|
-
|
|
223
|
-
- [`sqlite-server/`](./examples/sqlite-server) - Test suite for SQLite MCP server
|
|
224
|
-
- Demonstrates `better-sqlite3` for database testing
|
|
225
|
-
- Custom expectations for record count validation
|
|
226
|
-
- 11 Playwright tests, 14 eval dataset cases
|
|
227
|
-
|
|
228
|
-
**Basic Patterns:**
|
|
229
|
-
|
|
230
|
-
- [`basic-playwright-usage/`](./examples/basic-playwright-usage) - Simple Playwright test patterns
|
|
231
|
-
|
|
232
|
-
Each example includes complete test suites, eval datasets, and npm scripts. See [`examples/README.md`](./examples/README.md) for detailed documentation.
|
|
233
|
-
|
|
234
|
-
## Key Concepts
|
|
235
|
-
|
|
236
|
-
### Fixtures
|
|
237
|
-
|
|
238
|
-
Access MCP servers in tests via Playwright fixtures:
|
|
239
|
-
|
|
240
|
-
- `mcpClient: Client` - Raw MCP SDK client
|
|
241
|
-
- `mcp: MCPFixtureApi` - High-level test API with helper methods
|
|
242
|
-
|
|
243
|
-
### Expectations
|
|
244
|
-
|
|
245
|
-
Validate tool responses with multiple expectation types:
|
|
246
|
-
|
|
247
|
-
- **Exact Match** - Structured JSON equality
|
|
248
|
-
- **Schema** - Zod validation
|
|
249
|
-
- **Text Contains** - Substring matching (great for markdown)
|
|
250
|
-
- **Regex** - Pattern matching
|
|
251
|
-
- **LLM Judge** - Semantic evaluation
|
|
252
|
-
|
|
253
|
-
See [Expectations Guide](./docs/expectations.md) for details.
|
|
254
|
-
|
|
255
|
-
### Transports
|
|
256
|
-
|
|
257
|
-
Connect to MCP servers via:
|
|
258
|
-
|
|
259
|
-
- **stdio** - Local server processes
|
|
260
|
-
- **HTTP** - Remote servers
|
|
261
|
-
|
|
262
|
-
See [Transports Guide](./docs/transports.md) for configuration.
|
|
263
|
-
|
|
264
|
-
### Snapshot Testing
|
|
265
|
-
|
|
266
|
-
Snapshot testing captures tool responses and compares them against stored baselines. This works best for **deterministic responses** like help text, configuration, or schema discovery.
|
|
267
|
-
|
|
268
|
-
> **Note:** For responses with timestamps, IDs, or live data, use [sanitizers](./docs/expectations.md#snapshot-sanitizers) to normalize variable content, or consider schema validation instead.
|
|
269
|
-
|
|
270
|
-
```bash
|
|
271
|
-
# Generate dataset with snapshot expectations
|
|
272
|
-
npx mcp-server-tester generate --snapshot -o data/evals.json
|
|
273
|
-
|
|
274
|
-
# First run captures snapshots
|
|
275
|
-
npx playwright test
|
|
276
|
-
|
|
277
|
-
# Update snapshots when server behavior changes
|
|
278
|
-
npx playwright test --update-snapshots
|
|
279
|
-
```
|
|
280
|
-
|
|
281
|
-
For responses with variable data, use sanitizers:
|
|
102
|
+
In LLM host mode, a real LLM receives your server's tool list and a natural language prompt, then decides which tools to call. This tests whether your tool names, descriptions, and input schemas are clear enough for autonomous use — a different question from whether the tools return correct output.
|
|
282
103
|
|
|
283
104
|
```json
|
|
284
105
|
{
|
|
285
|
-
"id": "
|
|
286
|
-
"
|
|
287
|
-
"
|
|
288
|
-
"
|
|
289
|
-
|
|
106
|
+
"id": "find-config",
|
|
107
|
+
"mode": "llm_host",
|
|
108
|
+
"scenario": "Find the application config file and return its contents",
|
|
109
|
+
"llmHostConfig": {
|
|
110
|
+
"provider": "anthropic",
|
|
111
|
+
"model": "claude-opus-4-20250514"
|
|
112
|
+
},
|
|
113
|
+
"expect": {
|
|
114
|
+
"toolsTriggered": {
|
|
115
|
+
"calls": [{ "name": "read_file", "required": true }]
|
|
116
|
+
}
|
|
117
|
+
}
|
|
290
118
|
}
|
|
291
119
|
```
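The `toolsTriggered` expectation in the JSON case above can be read as "each call marked `required` must show up among the tools the LLM actually invoked". The sketch below illustrates that reading only; the `ExpectedCall` type and `missingRequiredTools` function are hypothetical, as is the `list_directory` tool name, and the package's real matching rules may differ.

```typescript
// Hypothetical type mirroring the `toolsTriggered.calls` entries in the JSON case.
type ExpectedCall = { name: string; required?: boolean };

// Returns the names of required tools that the LLM never called.
function missingRequiredTools(expected: ExpectedCall[], calledToolNames: string[]): string[] {
  const called = new Set(calledToolNames);
  return expected
    .filter((call) => call.required === true && !called.has(call.name))
    .map((call) => call.name);
}

const expectedCalls: ExpectedCall[] = [{ name: 'read_file', required: true }];

// The LLM explored first, then read the file: no required tool is missing.
console.log(missingRequiredTools(expectedCalls, ['list_directory', 'read_file']));

// The LLM never called read_file: it is reported as missing.
console.log(missingRequiredTools(expectedCalls, ['list_directory']));
```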
|
|
292
120
|
|
|
293
|
-
See the [
|
|
294
|
-
|
|
295
|
-
## CLI OAuth Authentication
|
|
121
|
+
LLM host mode makes real API calls and produces non-deterministic results. Use `iterations` to run a case multiple times and measure pass rate rather than expecting 100% on a single run. See the [LLM Host Guide](docs/llm-host.md) for configuration and cost management.
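To make the pass-rate idea concrete, here is a minimal sketch of scoring repeated runs of one case against a threshold. The `IterationOutcome` shape, the function names, and the 0.7 threshold are assumptions for illustration; how the package itself records and reports iterations is not shown in this README.

```typescript
// Hypothetical shape: one entry per iteration of the same eval case.
type IterationOutcome = { iteration: number; passed: boolean };

// Fraction of iterations that passed, in [0, 1]; 0 for an empty run.
function passRate(outcomes: IterationOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const passedCount = outcomes.filter((o) => o.passed).length;
  return passedCount / outcomes.length;
}

// Accept the case when the observed rate reaches the chosen threshold.
function meetsThreshold(outcomes: IterationOutcome[], minRate: number): boolean {
  return passRate(outcomes) >= minRate;
}

// Made-up outcomes: three of four iterations passed.
const outcomes: IterationOutcome[] = [
  { iteration: 1, passed: true },
  { iteration: 2, passed: true },
  { iteration: 3, passed: false },
  { iteration: 4, passed: true },
];

console.log(passRate(outcomes)); // 0.75
console.log(meetsThreshold(outcomes, 0.7)); // true
```

A threshold below 1.0 tolerates occasional misses from the model while still flagging a tool description that confuses it most of the time.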
|
|
296
122
|
|
|
297
|
-
|
|
123
|
+
## Installation
|
|
298
124
|
|
|
299
|
-
|
|
125
|
+
Requires Node.js 22+.
|
|
300
126
|
|
|
301
127
|
```bash
|
|
302
|
-
|
|
303
|
-
npx mcp-server-tester login https://api.example.com/mcp
|
|
304
|
-
|
|
305
|
-
# Force re-authentication
|
|
306
|
-
npx mcp-server-tester login https://api.example.com/mcp --force
|
|
128
|
+
npm install --save-dev @gleanwork/mcp-server-tester @playwright/test zod
|
|
307
129
|
```
|
|
308
130
|
|
|
309
|
-
|
|
131
|
+
The Anthropic SDK is only needed for LLM-as-judge assertions or LLM host mode with the Anthropic provider:
|
|
310
132
|
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
133
|
+
```bash
|
|
134
|
+
npm install --save-dev @anthropic-ai/sdk
|
|
135
|
+
```
|
|
314
136
|
|
|
315
|
-
|
|
316
|
-
- **macOS**: `~/.local/state/mcp-tests/<server-key>/`
|
|
317
|
-
- **Windows**: `%LOCALAPPDATA%\mcp-tests\<server-key>\`
|
|
137
|
+
## Quick Start
|
|
318
138
|
|
|
319
|
-
|
|
139
|
+
```bash
|
|
140
|
+
npx mcp-server-tester init
|
|
141
|
+
```
|
|
320
142
|
|
|
321
|
-
|
|
322
|
-
- File permissions: `0600` (owner read/write only)
|
|
323
|
-
- Files stored: `tokens.json`, `client.json`, `server.json`
|
|
143
|
+
The CLI wizard creates a `playwright.config.ts`, example tests, and a sample eval dataset configured for your server. See the [CLI Guide](./docs/cli.md) for all options.
|
|
324
144
|
|
|
325
|
-
|
|
145
|
+
## Configuration
|
|
326
146
|
|
|
327
|
-
|
|
147
|
+
Point the framework at your MCP server in `playwright.config.ts`:
|
|
328
148
|
|
|
329
149
|
```typescript
|
|
330
|
-
import {
|
|
150
|
+
import { defineConfig } from '@playwright/test';
|
|
331
151
|
|
|
332
|
-
|
|
333
|
-
|
|
152
|
+
export default defineConfig({
|
|
153
|
+
testDir: './tests',
|
|
154
|
+
reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
|
|
155
|
+
projects: [
|
|
156
|
+
{
|
|
157
|
+
name: 'my-server',
|
|
158
|
+
use: {
|
|
159
|
+
mcpConfig: {
|
|
160
|
+
transport: 'stdio',
|
|
161
|
+
command: 'node',
|
|
162
|
+
args: ['server.js'],
|
|
163
|
+
},
|
|
164
|
+
},
|
|
165
|
+
},
|
|
166
|
+
],
|
|
334
167
|
});
|
|
335
|
-
|
|
336
|
-
// Get a valid access token (cached, refreshed, or new)
|
|
337
|
-
const result = await client.getAccessToken();
|
|
338
|
-
console.log(`Token: ${result.accessToken}`);
|
|
339
|
-
```
|
|
340
|
-
|
|
341
|
-
### CI/CD Usage (GitHub Actions)
|
|
342
|
-
|
|
343
|
-
For automated testing in CI, tokens can be provided via environment variables:
|
|
344
|
-
|
|
345
|
-
```yaml
|
|
346
|
-
# .github/workflows/mcp-tests.yml
|
|
347
|
-
jobs:
|
|
348
|
-
test:
|
|
349
|
-
runs-on: ubuntu-latest
|
|
350
|
-
env:
|
|
351
|
-
MCP_ACCESS_TOKEN: ${{ secrets.MCP_ACCESS_TOKEN }}
|
|
352
|
-
MCP_REFRESH_TOKEN: ${{ secrets.MCP_REFRESH_TOKEN }}
|
|
353
|
-
steps:
|
|
354
|
-
- uses: actions/checkout@v4
|
|
355
|
-
- run: npm ci
|
|
356
|
-
- run: npm run test:playwright
|
|
357
168
|
```
|
|
358
169
|
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
1. Authenticate locally: `npx mcp-server-tester login <server-url>`
|
|
362
|
-
2. Export tokens for GitHub: `npx mcp-server-tester token <server-url> --format gh`
|
|
363
|
-
3. Run the output `gh secret set` commands (requires [GitHub CLI](https://cli.github.com/))
|
|
364
|
-
|
|
365
|
-
The `token` command supports multiple formats:
|
|
170
|
+
For HTTP servers, set `transport: 'http'` and `serverUrl`. For servers that require OAuth, see the [Transports Guide](./docs/transports.md) and [CLI Guide](./docs/cli.md) for authentication setup, including CI/CD token management.
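For comparison with the stdio project above, the HTTP variant might look like the sketch below. Only `transport: 'http'` and `serverUrl` come from the README; the local types are stand-ins written for this example (the package's real config types may differ), the URL is a placeholder, and `describeTarget` is a hypothetical helper.

```typescript
// Local stand-in types for this sketch; the real config types may differ.
type StdioMcpConfig = { transport: 'stdio'; command: string; args?: string[] };
type HttpMcpConfig = { transport: 'http'; serverUrl: string };
type McpConfig = StdioMcpConfig | HttpMcpConfig;

// Shape of one entry in the `projects` array of playwright.config.ts.
type McpProject = { name: string; use: { mcpConfig: McpConfig } };

// HTTP counterpart of the stdio project entry.
const httpProject: McpProject = {
  name: 'my-http-server',
  use: {
    mcpConfig: {
      transport: 'http',
      serverUrl: 'https://api.example.com/mcp', // placeholder URL
    },
  },
};

// Narrowing on `transport` tells the two variants apart.
function describeTarget(config: McpConfig): string {
  return config.transport === 'http'
    ? `http -> ${config.serverUrl}`
    : `stdio -> ${config.command} ${(config.args ?? []).join(' ')}`.trim();
}

console.log(describeTarget(httpProject.use.mcpConfig));
```

In a real config, the `httpProject` object would take the place of the stdio entry inside `projects`.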
|
|
366
171
|
|
|
367
|
-
|
|
368
|
-
- `json` - JSON object for scripting
|
|
369
|
-
- `gh` - Ready-to-paste GitHub CLI commands
|
|
370
|
-
|
|
371
|
-
See the [CLI Guide](./docs/cli.md#token---export-tokens-for-cicd) for details.
|
|
372
|
-
|
|
373
|
-
Alternatively, inject tokens programmatically in your test setup:
|
|
374
|
-
|
|
375
|
-
```typescript
|
|
376
|
-
import { injectTokens } from '@gleanwork/mcp-server-tester';
|
|
377
|
-
|
|
378
|
-
// In globalSetup.ts
|
|
379
|
-
await injectTokens('https://api.example.com/mcp', {
|
|
380
|
-
accessToken: process.env.MCP_ACCESS_TOKEN!,
|
|
381
|
-
tokenType: 'Bearer',
|
|
382
|
-
});
|
|
383
|
-
```
|
|
172
|
+
## Documentation
|
|
384
173
|
|
|
385
|
-
|
|
174
|
+
- [Quick Start](./docs/quickstart.md) — detailed setup and configuration
|
|
175
|
+
- [Expectations](./docs/expectations.md) — all assertion types including snapshot sanitizers
|
|
176
|
+
- [LLM Host Simulation](docs/llm-host.md) — tool discoverability testing
|
|
177
|
+
- [API Reference](./docs/api-reference.md)
|
|
178
|
+
- [Transports](./docs/transports.md) — stdio and HTTP configuration, OAuth
|
|
179
|
+
- [CLI Commands](./docs/cli.md) — init, generate, login, token
|
|
180
|
+
- [UI Reporter](./docs/ui-reporter.md) — interactive web UI for test results
|
|
181
|
+
- [Development](./docs/development.md) — contributing and building
|
|
386
182
|
|
|
387
|
-
|
|
183
|
+
## Examples
|
|
388
184
|
|
|
389
|
-
|
|
185
|
+
The `examples/` directory contains complete working examples:
|
|
390
186
|
|
|
391
|
-
|
|
187
|
+
- [filesystem-server/](./examples/filesystem-server) — Test suite for Anthropic's Filesystem MCP server: 5 Playwright tests, 11 eval dataset cases, Zod schema validation.
|
|
188
|
+
- [sqlite-server/](./examples/sqlite-server) — Test suite for a SQLite MCP server: 11 Playwright tests, 14 eval dataset cases.
|
|
189
|
+
- [basic-playwright-usage/](./examples/basic-playwright-usage) — Minimal Playwright patterns.
|
|
392
190
|
|
|
393
|
-
|
|
394
|
-
export default defineConfig({
|
|
395
|
-
reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
|
|
396
|
-
});
|
|
397
|
-
```
|
|
191
|
+
## Known Limitations
|
|
398
192
|
|
|
399
|
-
|
|
193
|
+
The following MCP protocol features are not currently supported. These are deliberate scope decisions, not bugs:
|
|
400
194
|
|
|
401
|
-
|
|
195
|
+
- MCP resources (`listResources`, `readResource`)
|
|
196
|
+
- MCP prompts (`listPrompts`, `getPrompt`)
|
|
197
|
+
- Server-to-client notifications
|
|
198
|
+
- Streaming tool responses (`callTool` waits for the complete response)
|
|
402
199
|
|
|
403
|
-
|
|
404
|
-
- **Examples**: See [`examples/`](./examples) directory
|
|
405
|
-
- **Issues**: [GitHub Issues](https://github.com/gleanwork/mcp-server-tester/issues)
|
|
200
|
+
If any of these affect your use case, please open an issue.
|
|
406
201
|
|
|
407
202
|
## License
|
|
408
203
|
|
|
409
204
|
MIT
|
|
410
|
-
|
|
411
|
-
## Contributing
|
|
412
|
-
|
|
413
|
-
Contributions welcome! See [Development Guide](./docs/development.md) for setup instructions.
|
|
414
|
-
|
|
415
|
-
## Credits
|
|
416
|
-
|
|
417
|
-
Built with:
|
|
418
|
-
|
|
419
|
-
- [@modelcontextprotocol/sdk](https://github.com/modelcontextprotocol/typescript-sdk)
|
|
420
|
-
- [@playwright/test](https://playwright.dev)
|
|
421
|
-
- [Zod](https://zod.dev)
|