@gleanwork/mcp-server-tester 0.12.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Glean Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,421 @@
+ # @gleanwork/mcp-server-tester
+
+ [![Experimental](https://img.shields.io/badge/-Experimental-D8FD49?style=flat-square&logo=data:image/svg+xml;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMzIgMzIiIGZpbGw9Im5vbmUiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+CjxwYXRoIGQ9Ik0yNC4zMDA2IDIuOTU0MjdMMjAuNzY1NiAwLjE5OTk1MUwxNy45MDI4IDMuOTk1MjdDMTMuNTY1MyAxLjkzNDk1IDguMjMwMTkgMy4wODQzOSA1LjE5Mzk0IDcuMDA5ODNDMS42NTg4OCAxMS41NjQyIDIuNDgzIDE4LjExMzggNy4wMzczOCAyMS42NDg5QzguNzcyMzggMjIuOTkzNSAxMC43ODkzIDIzLjcwOTIgMTIuODI3OSAyMy44MTc3QzE2LjE0NjEgMjQuMDEyOCAxOS41MDc3IDIyLjYyNDggMjEuNjc2NSAxOS44MDU1QzI0LjczNDQgMTUuODggMjQuNTE3NSAxMC40MTQ4IDIxLjQ1OTYgNi43Mjc4OUwyNC4zMDA2IDIuOTU0MjdaTTE4LjExOTcgMTcuMDUxMkMxNi4xMDI4IDE5LjYzMiAxMi4zNzI1IDIwLjEwOTEgOS43NzAwMSAxOC4wOTIyQzcuMTg5MTkgMTYuMDc1MiA2LjcxMjA3IDEyLjMyMzMgOC43MjkwMSA5Ljc0MjQ2QzkuNzA0OTQgOC40ODQ1OCAxMS4xMTQ2IDcuNjgyMTQgMTIuNjc2MSA3LjQ4Njk2QzEzLjA0NDggNy40NDM1OCAxMy40MTM1IDcuNDIxOSAxMy43ODIyIDcuNDQzNThDMTQuOTc1IDcuNTA4NjUgMTYuMTI0NCA3Ljk0MjM5IDE3LjA3ODcgOC42Nzk3N0MxOS42NTk1IDEwLjcxODQgMjAuMTM2NiAxNC40NzAzIDE4LjExOTcgMTcuMDUxMloiIGZpbGw9IndoaXRlIi8+CjxwYXRoIGQ9Ik0yNC41MTc2IDIxLjY5MjJDMjMuOTMyIDIyLjQ1MTMgMjMuMjgxNCAyMy4xMjM2IDIyLjU2NTcgMjMuNzUyNUMyMS44NzE3IDI0LjMzODEgMjEuMTEyNyAyNC44ODAzIDIwLjMxMDIgMjUuMzM1N0MxOS41Mjk1IDI1Ljc2OTUgMTguNjgzNyAyNi4xMzgyIDE3LjgzNzggMjYuNDIwMUMxNi45OTIgMjYuNzAyIDE2LjEwMjggMjYuODk3MiAxNS4yMTM3IDI3LjAwNTdDMTQuMzI0NSAyNy4xMTQxIDEzLjQzNTMgMjcuMTU3NSAxMi41MjQ0IDI3LjA5MjRDMTEuNjEzNSAyNy4wMjczIDEwLjcyNDMgMjYuODc1NSA5Ljg1Njg0IDI2LjY1ODdMOS42NjE2NSAyNy4zNzQzTDguNzcyNDYgMzAuOTk2MkM5LjkwMDIxIDMxLjI5OTggMTEuMDQ5NyAzMS40NzMzIDEyLjIyMDggMzEuNTZDMTIuMjY0MiAzMS41NiAxMi4zMjkyIDMxLjU2IDEyLjM3MjYgMzEuNTZDMTMuNTAwMyAzMS42MjUxIDE0LjY0OTggMzEuNTgxNyAxNS43NTU4IDMxLjQ1MTZDMTYuOTI3IDMxLjI5OTggMTguMDk4MSAzMS4wMzk1IDE5LjIyNTggMzAuNjcwOEMyMC4zNTM2IDMwLjMwMjIgMjEuNDU5NyAyOS44MjUgMjIuNTAwNyAyOS4yMzk1QzIzLjU2MzQgMjguNjUzOSAyNC41NjEgMjcuOTM4MiAyNS40OTM1IDI3LjE1NzVDMjYuNDQ3OCAyNi4zNTUgMjcuMzE1MyAyNS40NDQyIDI4LjA3NDQgMjQuNDQ2NUMyOC4xODI4IDI0LjMxNjQgMjguMjY5NSAyNC4xNjQ2IDI4LjM3OCAyNC
4wMTI4TDI0Ljc3NzkgMjEuMzQ1MkMyNC42Njk0IDIxLjQ1MzcgMjQuNjA0NCAyMS41ODM4IDI0LjUxNzYgMjEuNjkyMloiIGZpbGw9IndoaXRlIi8+Cjwvc3ZnPg==&labelColor=343CED)](https://github.com/gleanwork/.github/blob/main/docs/repository-stability.md#experimental)
+ [![npm version](https://img.shields.io/npm/v/@gleanwork/mcp-server-tester)](https://www.npmjs.com/package/@gleanwork/mcp-server-tester)
+ [![CI](https://github.com/gleanwork/mcp-server-tester/actions/workflows/ci.yml/badge.svg)](https://github.com/gleanwork/mcp-server-tester/actions/workflows/ci.yml)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+ [![Node.js Version](https://img.shields.io/node/v/@gleanwork/mcp-server-tester)](https://nodejs.org)
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.7-blue.svg)](https://www.typescriptlang.org/)
+
+ > Playwright-based testing framework for MCP servers
+
+ > [!WARNING]
+ > **Experimental Project** - This library is in active development. APIs may change, and we welcome contributions, feedback, and collaboration as we evolve the framework. See [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
+
+ `@gleanwork/mcp-server-tester` is a comprehensive testing and evaluation framework for [Model Context Protocol (MCP)](https://modelcontextprotocol.io) servers. It provides first-class Playwright fixtures, data-driven eval datasets, and optional LLM-as-a-judge scoring.
+
+ ## What's Included
+
+ This framework provides **two complementary approaches** for testing MCP servers:
+
+ ### 1. **Automated Testing** (Playwright Tests)
+
+ Write deterministic, automated tests using standard Playwright patterns with MCP-specific fixtures. Perfect for:
+
+ - Direct tool calls with expected outputs
+ - Protocol conformance validation
+ - Integration testing with your MCP server
+ - CI/CD pipelines
+
+ ```typescript
+ test('read a file', async ({ mcp }) => {
+   const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
+   expect(result.content).toContain('Hello');
+ });
+ ```
+
+ ### 2. **Evaluation Datasets** (Evals) ⚠️ Experimental
+
+ Run deeper, more subjective analysis using dataset-driven evaluations. Includes:
+
+ - Schema validation (deterministic)
+ - Text and regex pattern matching (deterministic)
+ - LLM-as-a-judge scoring (non-deterministic)
+
+ **Note:** Evals, particularly those using LLM-as-a-judge, are highly experimental due to their non-deterministic nature. Results may vary between runs, and prompts may need tuning for your specific use case.
+
+ ```typescript
+ const result = await runEvalDataset({ dataset, expectations }, { mcp });
+ expect(result.passed).toBe(result.total);
+ ```
+
+ ## Features
+
+ - 🎭 **Playwright Integration** - Use MCP servers in Playwright tests with idiomatic fixtures
+ - 📊 **Matrix Evals** - Run dataset-driven evaluations across multiple transports
+ - 📸 **Snapshot Testing** - Capture and compare deterministic responses with optional sanitizers for variable data
+ - 🤖 **LLM-as-a-Judge** - Optional semantic evaluation using Anthropic Claude
+ - 🔌 **Multiple Transports** - Support for both stdio (local) and HTTP (remote) connections
+ - ✅ **Protocol Conformance** - Built-in checks for MCP spec compliance
+
+ ## Installation
+
+ ```bash
+ npm install --save-dev @gleanwork/mcp-server-tester @playwright/test zod
+ ```
+
+ **Note:** The Anthropic SDK is optional and only needed if you plan to use LLM-as-a-judge semantic evaluation:
+
+ ```bash
+ npm install --save-dev @anthropic-ai/sdk
+ ```
+
+ ## Quick Start
+
+ ### Initialize with CLI
+
+ The fastest way to get started:
+
+ ```bash
+ npx mcp-server-tester init
+
+ # Follow the interactive prompts to create:
+ # - playwright.config.ts (configured for your MCP server)
+ # - tests/mcp.spec.ts (example tests)
+ # - data/example-dataset.json (sample eval dataset)
+ # - package.json (with all dependencies)
+ ```
+
+ See the [CLI Guide](./docs/cli.md) for all options.
+
+ ### Example: Testing in Action
+
+ Here's what a complete test suite looks like (following the **layered testing pattern**):
+
+ ```typescript
+ // tests/mcp.spec.ts
+ import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
+ import {
+   loadEvalDataset,
+   runEvalDataset,
+   createSchemaExpectation,
+ } from '@gleanwork/mcp-server-tester';
+ import { z } from 'zod';
+
+ // Layer 1: MCP Protocol Conformance
+ test.describe('MCP Protocol Conformance', () => {
+   test('should return valid server info', async ({ mcp }) => {
+     const info = mcp.getServerInfo();
+     expect(info).toBeTruthy();
+     expect(info?.name).toBeTruthy();
+     expect(info?.version).toBeTruthy();
+   });
+
+   test('should list available tools', async ({ mcp }) => {
+     const tools = await mcp.listTools();
+     expect(Array.isArray(tools)).toBe(true);
+     expect(tools.length).toBeGreaterThan(0);
+   });
+
+   test('should handle invalid tool gracefully', async ({ mcp }) => {
+     const result = await mcp.callTool('nonexistent_tool', {});
+     expect(result.isError).toBe(true);
+   });
+ });
+
+ // Layer 2: Direct Tool Testing
+ test.describe('File Operations', () => {
+   test('should read a file', async ({ mcp }) => {
+     const result = await mcp.callTool('read_file', {
+       path: '/tmp/test.txt',
+     });
+     expect(result.content).toContain('Hello');
+   });
+ });
+
+ // Layer 3: Eval Datasets
+ test('file operations eval', async ({ mcp }) => {
+   const FileContentSchema = z.object({
+     content: z.string(),
+   });
+
+   const dataset = await loadEvalDataset('./data/evals.json', {
+     schemas: { 'file-content': FileContentSchema },
+   });
+
+   const result = await runEvalDataset(
+     {
+       dataset,
+       expectations: {
+         schema: createSchemaExpectation(dataset),
+       },
+     },
+     { mcp }
+   );
+
+   expect(result.passed).toBe(result.total);
+ });
+ ```
+
+ ```json
+ // data/evals.json
+ {
+   "name": "file-ops",
+   "cases": [
+     {
+       "id": "read-config",
+       "toolName": "read_file",
+       "args": { "path": "/tmp/config.json" },
+       "expectedSchemaName": "file-content"
+     },
+     {
+       "id": "read-readme",
+       "toolName": "read_file",
+       "args": { "path": "/tmp/README.md" },
+       "expectedTextContains": ["# Welcome", "## Installation"]
+     }
+   ]
+ }
+ ```
+
+ ```typescript
+ // playwright.config.ts
+ import { defineConfig } from '@playwright/test';
+
+ export default defineConfig({
+   testDir: './tests',
+   projects: [
+     {
+       name: 'mcp-local',
+       use: {
+         mcpConfig: {
+           transport: 'stdio',
+           command: 'node',
+           args: ['path/to/your/server.js'],
+         },
+       },
+     },
+   ],
+ });
+ ```
+
+ ## Documentation
+
+ - **[Quick Start Guide](./docs/quickstart.md)** - Detailed setup and configuration
+ - **[Expectations](./docs/expectations.md)** - All validation types (exact, schema, regex, text contains, snapshot, LLM judge)
+ - **[API Reference](./docs/api-reference.md)** - Complete API documentation
+ - **[CLI Commands](./docs/cli.md)** - `init`, `generate`, `login`, and `token` command details
+ - **[UI Reporter](./docs/ui-reporter.md)** - Interactive web UI for test results
+ - **[Transports](./docs/transports.md)** - Stdio vs HTTP configuration
+ - **[Development](./docs/development.md)** - Contributing and building
+
+ ## Examples
+
+ The `examples/` directory contains complete working examples:
+
+ **Real MCP Server Tests:**
+
+ - [`filesystem-server/`](./examples/filesystem-server) - Test suite for Anthropic's Filesystem MCP server
+   - Demonstrates `fixturify-project` for isolated test fixtures
+   - Zod schema validation for JSON files
+   - 5 Playwright tests, 11 eval dataset cases
+
+ - [`sqlite-server/`](./examples/sqlite-server) - Test suite for SQLite MCP server
+   - Demonstrates `better-sqlite3` for database testing
+   - Custom expectations for record count validation
+   - 11 Playwright tests, 14 eval dataset cases
+
+ **Basic Patterns:**
+
+ - [`basic-playwright-usage/`](./examples/basic-playwright-usage) - Simple Playwright test patterns
+
+ Each example includes complete test suites, eval datasets, and npm scripts. See [`examples/README.md`](./examples/README.md) for detailed documentation.
+
+ ## Key Concepts
+
+ ### Fixtures
+
+ Access MCP servers in tests via Playwright fixtures:
+
+ - `mcpClient: Client` - Raw MCP SDK client
+ - `mcp: MCPFixtureApi` - High-level test API with helper methods
+
+ ### Expectations
+
+ Validate tool responses with multiple expectation types:
+
+ - **Exact Match** - Structured JSON equality
+ - **Schema** - Zod validation
+ - **Text Contains** - Substring matching (great for markdown)
+ - **Regex** - Pattern matching
+ - **LLM Judge** - Semantic evaluation
+
+ See [Expectations Guide](./docs/expectations.md) for details.
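To make the two simplest deterministic expectation types concrete, here is a minimal sketch of what "Text Contains" and "Regex" checks amount to. The helper names `checkTextContains` and `checkRegex` and the `CheckResult` shape are hypothetical, chosen for illustration; they are not exports of this package, and the package's own expectations may behave differently.

```typescript
// Illustrative only: these helpers are hypothetical and are NOT part of
// @gleanwork/mcp-server-tester. They show the idea behind two expectation types.
interface CheckResult {
  pass: boolean;
  details: string;
}

// Text Contains: every expected substring must appear in the response text.
function checkTextContains(text: string, expected: string[]): CheckResult {
  const missing = expected.filter((needle) => !text.includes(needle));
  return {
    pass: missing.length === 0,
    details:
      missing.length === 0
        ? 'all substrings found'
        : `missing: ${missing.join(', ')}`,
  };
}

// Regex: every pattern must match somewhere in the response text.
function checkRegex(text: string, patterns: string[]): CheckResult {
  const failed = patterns.filter((pattern) => !new RegExp(pattern).test(text));
  return {
    pass: failed.length === 0,
    details:
      failed.length === 0
        ? 'all patterns matched'
        : `no match for: ${failed.join(', ')}`,
  };
}

const readme = '# Welcome\n\nSome intro.\n\n## Installation\n\nnpm install';
console.log(checkTextContains(readme, ['# Welcome', '## Installation']));
console.log(checkRegex(readme, ['## \\w+', 'npm install$']));
```

Both checks return the list of failures rather than a bare boolean, which tends to make a failing eval case easier to diagnose.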
+
+ ### Transports
+
+ Connect to MCP servers via:
+
+ - **stdio** - Local server processes
+ - **HTTP** - Remote servers
+
+ See [Transports Guide](./docs/transports.md) for configuration.
+
+ ### Snapshot Testing
+
+ Snapshot testing captures tool responses and compares them against stored baselines. This works best for **deterministic responses** like help text, configuration, or schema discovery.
+
+ > **Note:** For responses with timestamps, IDs, or live data, use [sanitizers](./docs/expectations.md#snapshot-sanitizers) to normalize variable content, or consider schema validation instead.
+
+ ```bash
+ # Generate dataset with snapshot expectations
+ npx mcp-server-tester generate --snapshot -o data/evals.json
+
+ # First run captures snapshots
+ npx playwright test
+
+ # Update snapshots when server behavior changes
+ npx playwright test --update-snapshots
+ ```
+
+ For responses with variable data, use sanitizers:
+
+ ```json
+ {
+   "id": "get-user",
+   "toolName": "get_user",
+   "args": { "id": "123" },
+   "expectedSnapshot": "user-profile",
+   "snapshotSanitizers": ["uuid", "iso-date", { "remove": ["lastLoginAt"] }]
+ }
+ ```
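The sanitizer list above can be read as a normalization pass applied before the snapshot comparison. The sketch below shows one way such a pass could work. It is an assumption-laden illustration, not the package's implementation: the `sanitize` function, the `Sanitizer` type, the regular expressions, and the `[UUID]` / `[ISO_DATE]` placeholders are all hypothetical.

```typescript
// Hypothetical sketch of snapshot sanitizing; NOT the library's implementation.
// Placeholder strings and regexes are assumptions made for illustration.
type Sanitizer = 'uuid' | 'iso-date' | { remove: string[] };

const UUID_RE =
  /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_DATE_RE =
  /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})/g;

function sanitize(value: unknown, sanitizers: Sanitizer[]): unknown {
  if (typeof value === 'string') {
    // Replace variable substrings with stable placeholders.
    let out = value;
    for (const s of sanitizers) {
      if (s === 'uuid') out = out.replace(UUID_RE, '[UUID]');
      else if (s === 'iso-date') out = out.replace(ISO_DATE_RE, '[ISO_DATE]');
    }
    return out;
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item, sanitizers));
  }
  if (value !== null && typeof value === 'object') {
    // Collect every field name listed in a { remove: [...] } sanitizer.
    const removed = new Set<string>();
    for (const s of sanitizers) {
      if (typeof s === 'object') s.remove.forEach((key) => removed.add(key));
    }
    const result: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      if (!removed.has(key)) result[key] = sanitize(inner, sanitizers);
    }
    return result;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}

const sanitized = sanitize(
  {
    id: '123e4567-e89b-12d3-a456-426614174000',
    createdAt: '2025-01-15T10:30:00Z',
    lastLoginAt: '2025-02-01T08:00:00Z',
    name: 'Ada',
  },
  ['uuid', 'iso-date', { remove: ['lastLoginAt'] }],
);
console.log(JSON.stringify(sanitized));
// {"id":"[UUID]","createdAt":"[ISO_DATE]","name":"Ada"}
```

Once the variable parts are replaced or dropped, two runs of the same tool call produce identical text, which is what makes a stored baseline comparison meaningful.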
+
+ See the [Expectations Guide](./docs/expectations.md#snapshot-testing) for when to use snapshots vs other validation methods.
+
+ ## CLI OAuth Authentication
+
+ For MCP servers that require OAuth authentication, the framework provides a CLI-based OAuth flow:
+
+ ### Interactive Login
+
+ ```bash
+ # Authenticate with an MCP server (opens browser)
+ npx mcp-server-tester login https://api.example.com/mcp
+
+ # Force re-authentication
+ npx mcp-server-tester login https://api.example.com/mcp --force
+ ```
+
+ ### Token Storage
+
+ Tokens are cached locally and automatically refreshed when expired.
+
+ **Storage locations:**
+
+ - **Linux**: `$XDG_STATE_HOME/mcp-tests/<server-key>/` or `~/.local/state/mcp-tests/<server-key>/`
+ - **macOS**: `~/.local/state/mcp-tests/<server-key>/`
+ - **Windows**: `%LOCALAPPDATA%\mcp-tests\<server-key>\`
+
+ **Security:**
+
+ - Directory permissions: `0700` (owner only)
+ - File permissions: `0600` (owner read/write only)
+ - Files stored: `tokens.json`, `client.json`, `server.json`
+
+ Use `--state-dir` to override the storage location.
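The per-platform storage locations listed above can be expressed as a small resolver. This is a sketch under stated assumptions: `resolveStateDir` is a hypothetical helper, not the package's code; how `<server-key>` is derived from the server URL is not described here, so it is passed in as an opaque string; and the Windows fallback used when `LOCALAPPDATA` is unset is a guess.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper mirroring the documented locations; NOT the package's code.
// `serverKey` is treated as an opaque string because its derivation is not documented here.
function resolveStateDir(
  serverKey: string,
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  home: string = os.homedir(),
): string {
  if (platform === 'win32') {
    // Assumption: fall back to <home>\AppData\Local when LOCALAPPDATA is unset.
    const base =
      env['LOCALAPPDATA'] ?? path.win32.join(home, 'AppData', 'Local');
    return path.win32.join(base, 'mcp-tests', serverKey);
  }
  // Per the list above, only Linux consults XDG_STATE_HOME; macOS always uses ~/.local/state.
  const xdg = env['XDG_STATE_HOME'];
  const base =
    platform === 'linux' && xdg
      ? xdg
      : path.posix.join(home, '.local', 'state');
  return path.posix.join(base, 'mcp-tests', serverKey);
}

console.log(resolveStateDir('example-server-key'));
```

Passing `platform`, `env`, and `home` as parameters with defaults keeps the function testable on any machine without touching real environment state.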
+
+ ### Programmatic Usage
+
+ ```typescript
+ import { CLIOAuthClient } from '@gleanwork/mcp-server-tester';
+
+ const client = new CLIOAuthClient({
+   mcpServerUrl: 'https://api.example.com/mcp',
+ });
+
+ // Get a valid access token (cached, refreshed, or new)
+ const result = await client.getAccessToken();
+ console.log(`Token: ${result.accessToken}`);
+ ```
+
+ ### CI/CD Usage (GitHub Actions)
+
+ For automated testing in CI, tokens can be provided via environment variables:
+
+ ```yaml
+ # .github/workflows/mcp-tests.yml
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     env:
+       MCP_ACCESS_TOKEN: ${{ secrets.MCP_ACCESS_TOKEN }}
+       MCP_REFRESH_TOKEN: ${{ secrets.MCP_REFRESH_TOKEN }}
+     steps:
+       - uses: actions/checkout@v4
+       - run: npm ci
+       - run: npm run test:playwright
+ ```
+
+ **To set up GitHub Actions secrets:**
+
+ 1. Authenticate locally: `npx mcp-server-tester login <server-url>`
+ 2. Export tokens for GitHub: `npx mcp-server-tester token <server-url> --format gh`
+ 3. Run the output `gh secret set` commands (requires [GitHub CLI](https://cli.github.com/))
+
+ The `token` command supports multiple formats:
+
+ - `env` (default) - Shell-compatible `KEY=value` pairs
+ - `json` - JSON object for scripting
+ - `gh` - Ready-to-paste GitHub CLI commands
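To illustrate how the three formats differ, here is a rough sketch of a formatter that renders the same token map in each style. It is hypothetical: `formatTokens` and `shellQuote` are invented for this example, the quoting style is an assumption, and the real `token` command's exact output may differ. The only external fact relied on is that the GitHub CLI accepts `gh secret set <name> --body <value>`.

```typescript
// Hypothetical formatter; the real `token` command's output may differ.
type TokenFormat = 'env' | 'json' | 'gh';

// Wrap a value in single quotes so a POSIX shell treats it literally.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

function formatTokens(
  tokens: Record<string, string>,
  format: TokenFormat = 'env',
): string {
  const entries = Object.entries(tokens);
  switch (format) {
    case 'json':
      // JSON object for scripting.
      return JSON.stringify(tokens, null, 2);
    case 'gh':
      // One `gh secret set` command per token.
      return entries
        .map(([key, value]) => `gh secret set ${key} --body ${shellQuote(value)}`)
        .join('\n');
    default:
      // Shell-compatible KEY=value pairs.
      return entries
        .map(([key, value]) => `${key}=${shellQuote(value)}`)
        .join('\n');
  }
}

const tokens = { MCP_ACCESS_TOKEN: 'abc123', MCP_REFRESH_TOKEN: 'def456' };
console.log(formatTokens(tokens, 'env'));
console.log(formatTokens(tokens, 'json'));
console.log(formatTokens(tokens, 'gh'));
```

Note that any of these outputs contains live credentials, so it should be piped straight into its destination rather than written to logs.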
+
+ See the [CLI Guide](./docs/cli.md#token---export-tokens-for-cicd) for details.
+
+ Alternatively, inject tokens programmatically in your test setup:
+
+ ```typescript
+ import { injectTokens } from '@gleanwork/mcp-server-tester';
+
+ // In globalSetup.ts
+ await injectTokens('https://api.example.com/mcp', {
+   accessToken: process.env.MCP_ACCESS_TOKEN!,
+   tokenType: 'Bearer',
+ });
+ ```
+
+ ## UI Reporter
+
+ Interactive web UI for visualizing test results:
+
+ ![MCP Test Reporter UI](./ui.png)
+
+ Add to your `playwright.config.ts`:
+
+ ```typescript
+ export default defineConfig({
+   reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
+ });
+ ```
+
+ See [UI Reporter Guide](./docs/ui-reporter.md) for features and usage.
+
+ ## Support
+
+ - **Documentation**: See [`docs/`](./docs) directory
+ - **Examples**: See [`examples/`](./examples) directory
+ - **Issues**: [GitHub Issues](https://github.com/gleanwork/mcp-server-tester/issues)
+
+ ## License
+
+ MIT
+
+ ## Contributing
+
+ Contributions welcome! See [Development Guide](./docs/development.md) for setup instructions.
+
+ ## Credits
+
+ Built with:
+
+ - [@modelcontextprotocol/sdk](https://github.com/modelcontextprotocol/typescript-sdk)
+ - [@playwright/test](https://playwright.dev)
+ - [Zod](https://zod.dev)