npm - @gleanwork/mcp-server-tester - Versions diffs - 0.12.0 - Mend

@gleanwork/mcp-server-tester 0.12.0

Files changed (28)

package/LICENSE +21 -0
package/README.md +421 -0
package/dist/cli/index.js +2785 -0
package/dist/fixtures/mcp.d.ts +605 -0
package/dist/fixtures/mcp.js +2378 -0
package/dist/fixtures/mcp.js.map +1 -0
package/dist/fixtures/mcpAuth.d.ts +31 -0
package/dist/fixtures/mcpAuth.js +317 -0
package/dist/fixtures/mcpAuth.js.map +1 -0
package/dist/index.cjs +3658 -0
package/dist/index.cjs.map +1 -0
package/dist/index.d.cts +3857 -0
package/dist/index.d.ts +3857 -0
package/dist/index.js +3582 -0
package/dist/index.js.map +1 -0
package/dist/reporters/mcpReporter.cjs +301 -0
package/dist/reporters/mcpReporter.cjs.map +1 -0
package/dist/reporters/mcpReporter.d.cts +85 -0
package/dist/reporters/mcpReporter.d.ts +85 -0
package/dist/reporters/mcpReporter.js +297 -0
package/dist/reporters/mcpReporter.js.map +1 -0
package/dist/reporters/ui-dist/app.js +174 -0
package/dist/reporters/ui-dist/index.html +28 -0
package/dist/reporters/ui-dist/styles.css +1 -0
package/package.json +138 -0
package/src/reporters/ui-dist/app.js +174 -0
package/src/reporters/ui-dist/index.html +28 -0
package/src/reporters/ui-dist/styles.css +1 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 Glean Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,421 @@
+# @gleanwork/mcp-server-tester
+[![Experimental](https://img.shields.io/badge/-Experimental-D8FD49?style=flat-square&labelColor=343CED)](https://github.com/gleanwork/.github/blob/main/docs/repository-stability.md#experimental)
+[![npm version](https://img.shields.io/npm/v/@gleanwork/mcp-server-tester)](https://www.npmjs.com/package/@gleanwork/mcp-server-tester)
+[![CI](https://github.com/gleanwork/mcp-server-tester/actions/workflows/ci.yml/badge.svg)](https://github.com/gleanwork/mcp-server-tester/actions/workflows/ci.yml)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Node.js Version](https://img.shields.io/node/v/@gleanwork/mcp-server-tester)](https://nodejs.org)
+[![TypeScript](https://img.shields.io/badge/TypeScript-5.7-blue.svg)](https://www.typescriptlang.org/)
+> Playwright-based testing framework for MCP servers
+> [!WARNING]
+> **Experimental Project** - This library is in active development. APIs may change, and we welcome contributions, feedback, and collaboration as we evolve the framework. See [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
+`@gleanwork/mcp-server-tester` is a comprehensive testing and evaluation framework for [Model Context Protocol (MCP)](https://modelcontextprotocol.io) servers. It provides first-class Playwright fixtures, data-driven eval datasets, and optional LLM-as-a-judge scoring.
+## What's Included
+This framework provides **two complementary approaches** for testing MCP servers:
+### 1. **Automated Testing** (Playwright Tests)
+Write deterministic, automated tests using standard Playwright patterns with MCP-specific fixtures. Perfect for:
+- Direct tool calls with expected outputs
+- Protocol conformance validation
+- Integration testing with your MCP server
+- CI/CD pipelines
+```typescript
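+import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
+// The `mcp` fixture is connected to the server configured in playwright.config.ts
+test('read a file', async ({ mcp }) => {
+  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
+  expect(result.content).toContain('Hello');
+});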
+```
+### 2. **Evaluation Datasets** (Evals) ⚠️ Experimental
+Run deeper, more subjective analysis using dataset-driven evaluations. Includes:
+- Schema validation (deterministic)
+- Text and regex pattern matching (deterministic)
+- LLM-as-a-judge scoring (non-deterministic)
+**Note:** Evals, particularly those using LLM-as-a-judge, are highly experimental due to their non-deterministic nature. Results may vary between runs, and prompts may need tuning for your specific use case.
+```typescript
+const result = await runEvalDataset({ dataset, expectations }, { mcp });
+expect(result.passed).toBe(result.total);
+```
+## Features
+- 🎭 **Playwright Integration** - Use MCP servers in Playwright tests with idiomatic fixtures
+- 📊 **Matrix Evals** - Run dataset-driven evaluations across multiple transports
+- 📸 **Snapshot Testing** - Capture and compare deterministic responses with optional sanitizers for variable data
+- 🤖 **LLM-as-a-Judge** - Optional semantic evaluation using Anthropic Claude
+- 🔌 **Multiple Transports** - Support for both stdio (local) and HTTP (remote) connections
+- ✅ **Protocol Conformance** - Built-in checks for MCP spec compliance
+## Installation
+```bash
+npm install --save-dev @gleanwork/mcp-server-tester @playwright/test zod
+```
+**Note:** The Anthropic SDK is optional and only needed if you plan to use LLM-as-a-judge semantic evaluation:
+```bash
+npm install --save-dev @anthropic-ai/sdk
+```
+## Quick Start
+### Initialize with CLI
+The fastest way to get started:
+```bash
+npx mcp-server-tester init
+# Follow the interactive prompts to create:
+# - playwright.config.ts (configured for your MCP server)
+# - tests/mcp.spec.ts (example tests)
+# - data/example-dataset.json (sample eval dataset)
+# - package.json (with all dependencies)
+```
+See the [CLI Guide](./docs/cli.md) for all options.
+### Example: Testing in Action
+Here's what a complete test suite looks like (following the **layered testing pattern**):
+```typescript
+// tests/mcp.spec.ts
+import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
+import {
+  loadEvalDataset,
+  runEvalDataset,
+  createSchemaExpectation,
+} from '@gleanwork/mcp-server-tester';
+import { z } from 'zod';
+// Layer 1: MCP Protocol Conformance
+test.describe('MCP Protocol Conformance', () => {
+  test('should return valid server info', async ({ mcp }) => {
+    const info = mcp.getServerInfo();
+    expect(info).toBeTruthy();
+    expect(info?.name).toBeTruthy();
+    expect(info?.version).toBeTruthy();
+  });
+  test('should list available tools', async ({ mcp }) => {
+    const tools = await mcp.listTools();
+    expect(Array.isArray(tools)).toBe(true);
+    expect(tools.length).toBeGreaterThan(0);
+  });
+  test('should handle invalid tool gracefully', async ({ mcp }) => {
+    const result = await mcp.callTool('nonexistent_tool', {});
+    expect(result.isError).toBe(true);
+  });
+});
+// Layer 2: Direct Tool Testing
+test.describe('File Operations', () => {
+  test('should read a file', async ({ mcp }) => {
+    const result = await mcp.callTool('read_file', {
+      path: '/tmp/test.txt',
+    });
+    expect(result.content).toContain('Hello');
+  });
+});
+// Layer 3: Eval Datasets
+test('file operations eval', async ({ mcp }) => {
+  const FileContentSchema = z.object({
+    content: z.string(),
+  });
+  const dataset = await loadEvalDataset('./data/evals.json', {
+    schemas: { 'file-content': FileContentSchema },
+  });
+  const result = await runEvalDataset(
+    {
+      dataset,
+      expectations: {
+        schema: createSchemaExpectation(dataset),
+      },
+    },
+    { mcp }
+  );
+  expect(result.passed).toBe(result.total);
+});
+```
+`data/evals.json`:
+```json
+{
+  "name": "file-ops",
+  "cases": [
+    {
+      "id": "read-config",
+      "toolName": "read_file",
+      "args": { "path": "/tmp/config.json" },
+      "expectedSchemaName": "file-content"
+    },
+    {
+      "id": "read-readme",
+      "toolName": "read_file",
+      "args": { "path": "/tmp/README.md" },
+      "expectedTextContains": ["# Welcome", "## Installation"]
+    }
+  ]
+}
+```
+```typescript
+// playwright.config.ts
+import { defineConfig } from '@playwright/test';
+export default defineConfig({
+  testDir: './tests',
+  projects: [
+    {
+      name: 'mcp-local',
+      use: {
+        mcpConfig: {
+          transport: 'stdio',
+          command: 'node',
+          args: ['path/to/your/server.js'],
+        },
+      },
+    },
+  ],
+});
+```
+## Documentation
+- **[Quick Start Guide](./docs/quickstart.md)** - Detailed setup and configuration
+- **[Expectations](./docs/expectations.md)** - All validation types (exact, schema, regex, text contains, snapshot, LLM judge)
+- **[API Reference](./docs/api-reference.md)** - Complete API documentation
+- **[CLI Commands](./docs/cli.md)** - `init`, `generate`, `login`, and `token` command details
+- **[UI Reporter](./docs/ui-reporter.md)** - Interactive web UI for test results
+- **[Transports](./docs/transports.md)** - Stdio vs HTTP configuration
+- **[Development](./docs/development.md)** - Contributing and building
+## Examples
+The `examples/` directory contains complete working examples:
+**Real MCP Server Tests:**
+- [`filesystem-server/`](./examples/filesystem-server) - Test suite for Anthropic's Filesystem MCP server
+  - Demonstrates `fixturify-project` for isolated test fixtures
+  - Zod schema validation for JSON files
+  - 5 Playwright tests, 11 eval dataset cases
+- [`sqlite-server/`](./examples/sqlite-server) - Test suite for SQLite MCP server
+  - Demonstrates `better-sqlite3` for database testing
+  - Custom expectations for record count validation
+  - 11 Playwright tests, 14 eval dataset cases
+**Basic Patterns:**
+- [`basic-playwright-usage/`](./examples/basic-playwright-usage) - Simple Playwright test patterns
+Each example includes complete test suites, eval datasets, and npm scripts. See [`examples/README.md`](./examples/README.md) for detailed documentation.
+## Key Concepts
+### Fixtures
+Access MCP servers in tests via Playwright fixtures:
+- `mcpClient: Client` - Raw MCP SDK client
+- `mcp: MCPFixtureApi` - High-level test API with helper methods
+### Expectations
+Validate tool responses with multiple expectation types:
+- **Exact Match** - Structured JSON equality
+- **Schema** - Zod validation
+- **Text Contains** - Substring matching (great for markdown)
+- **Regex** - Pattern matching
+- **LLM Judge** - Semantic evaluation
+See [Expectations Guide](./docs/expectations.md) for details.
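To make the deterministic expectation types more concrete, here is a minimal sketch of substring and regex checks over a tool result. It assumes the usual MCP result shape (a `content` array whose text blocks carry a `text` field); `ToolResultLike`, `extractText`, `missingSubstrings`, and `failingPatterns` are hypothetical names for illustration, not part of this package's API.

```typescript
// Illustrative only: not the package's implementation.
interface ToolResultLike {
  content: Array<{ type: string; text?: string }>;
}

// Concatenate the text of every text block in the result.
function extractText(result: ToolResultLike): string {
  return result.content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('\n');
}

// "Text Contains": return the expected substrings that are absent.
function missingSubstrings(result: ToolResultLike, expected: string[]): string[] {
  const text = extractText(result);
  return expected.filter((needle) => text.indexOf(needle) === -1);
}

// "Regex": return the pattern sources that do not match anywhere.
function failingPatterns(result: ToolResultLike, patterns: string[]): string[] {
  const text = extractText(result);
  return patterns.filter((source) => !new RegExp(source).test(text));
}

const sampleResult: ToolResultLike = {
  content: [{ type: 'text', text: '# Welcome\n## Installation\nv1.2.3' }],
};
const missing = missingSubstrings(sampleResult, ['# Welcome', '## Usage']);
const failing = failingPatterns(sampleResult, ['v\\d+\\.\\d+\\.\\d+', '^nope$']);
```

An empty array from either helper means the expectation passed; returning the failures rather than a boolean makes it easier to report which substring or pattern was not found.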
+### Transports
+Connect to MCP servers via:
+- **stdio** - Local server processes
+- **HTTP** - Remote servers
+See [Transports Guide](./docs/transports.md) for configuration.
+### Snapshot Testing
+Snapshot testing captures tool responses and compares them against stored baselines. This works best for **deterministic responses** like help text, configuration, or schema discovery.
+> **Note:** For responses with timestamps, IDs, or live data, use [sanitizers](./docs/expectations.md#snapshot-sanitizers) to normalize variable content, or consider schema validation instead.
+```bash
+# Generate dataset with snapshot expectations
+npx mcp-server-tester generate --snapshot -o data/evals.json
+# First run captures snapshots
+npx playwright test
+# Update snapshots when server behavior changes
+npx playwright test --update-snapshots
+```
+For responses with variable data, use sanitizers:
+```json
+{
+  "id": "get-user",
+  "toolName": "get_user",
+  "args": { "id": "123" },
+  "expectedSnapshot": "user-profile",
+  "snapshotSanitizers": ["uuid", "iso-date", { "remove": ["lastLoginAt"] }]
+}
+```
+See the [Expectations Guide](./docs/expectations.md#snapshot-testing) for when to use snapshots vs other validation methods.
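The idea behind sanitizers can be sketched as a normalization pass that runs before a response is compared with its stored baseline. The sketch below is an assumption about the general technique, not the package's code: the `sanitize` function, the regular expressions, and the `[UUID]` / `[ISO_DATE]` placeholders are all hypothetical; only the sanitizer names (`uuid`, `iso-date`, `remove`) come from the example above.

```typescript
// Illustrative only: the real sanitizers may match and replace differently.
type Sanitizer = 'uuid' | 'iso-date' | { remove: string[] };

const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_DATE_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;

function sanitize(value: unknown, sanitizers: Sanitizer[]): unknown {
  if (typeof value === 'string') {
    // Replace variable substrings with stable placeholders.
    let out = value;
    for (const s of sanitizers) {
      if (s === 'uuid') out = out.replace(UUID_RE, '[UUID]');
      else if (s === 'iso-date') out = out.replace(ISO_DATE_RE, '[ISO_DATE]');
    }
    return out;
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item, sanitizers));
  }
  if (value !== null && typeof value === 'object') {
    // Collect the keys that any { remove: [...] } sanitizer asks to drop.
    const removed: string[] = [];
    for (const s of sanitizers) {
      if (typeof s === 'object') removed.push(...s.remove);
    }
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (removed.indexOf(key) !== -1) continue;
      result[key] = sanitize(source[key], sanitizers);
    }
    return result;
  }
  return value; // numbers, booleans, null pass through unchanged
}

const sanitized = sanitize(
  {
    id: '123e4567-e89b-12d3-a456-426614174000',
    createdAt: '2025-01-02T03:04:05Z',
    lastLoginAt: '2025-06-01T00:00:00Z',
    name: 'Ada',
  },
  ['uuid', 'iso-date', { remove: ['lastLoginAt'] }]
) as Record<string, unknown>;
```

After this pass, two responses that differ only in IDs, timestamps, or removed fields serialize to the same snapshot.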
+## CLI OAuth Authentication
+For MCP servers that require OAuth authentication, the framework provides a CLI-based OAuth flow:
+### Interactive Login
+```bash
+# Authenticate with an MCP server (opens browser)
+npx mcp-server-tester login https://api.example.com/mcp
+# Force re-authentication
+npx mcp-server-tester login https://api.example.com/mcp --force
+```
+### Token Storage
+Tokens are cached locally and automatically refreshed when expired.
+**Storage locations:**
+- **Linux**: `$XDG_STATE_HOME/mcp-tests/<server-key>/` or `~/.local/state/mcp-tests/<server-key>/`
+- **macOS**: `~/.local/state/mcp-tests/<server-key>/`
+- **Windows**: `%LOCALAPPDATA%\mcp-tests\<server-key>\`
+**Security:**
+- Directory permissions: `0700` (owner only)
+- File permissions: `0600` (owner read/write only)
+- Files stored: `tokens.json`, `client.json`, `server.json`
+Use `--state-dir` to override the storage location.
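The storage locations listed above can be summarized as a small lookup. This is a sketch of the documented rules only: `resolveTokenDir` and its input type are hypothetical, how `<server-key>` is derived from the server URL is not described here (so it is passed in), and throwing when `LOCALAPPDATA` is unset is an assumption.

```typescript
// Illustrative only: mirrors the documented paths, not the package's code.
type Platform = 'linux' | 'darwin' | 'win32';

interface TokenDirInput {
  platform: Platform;
  homeDir: string;
  env: Record<string, string | undefined>;
  serverKey: string; // derivation of the key is not documented here
}

function resolveTokenDir(input: TokenDirInput): string {
  const { platform, homeDir, env, serverKey } = input;
  if (platform === 'win32') {
    const base = env.LOCALAPPDATA;
    if (!base) throw new Error('LOCALAPPDATA is not set'); // assumed behavior
    return `${base}\\mcp-tests\\${serverKey}`;
  }
  // Only Linux is documented as honoring XDG_STATE_HOME.
  if (platform === 'linux' && env.XDG_STATE_HOME) {
    return `${env.XDG_STATE_HOME}/mcp-tests/${serverKey}`;
  }
  return `${homeDir}/.local/state/mcp-tests/${serverKey}`;
}

const linuxDir = resolveTokenDir({
  platform: 'linux',
  homeDir: '/home/ada',
  env: { XDG_STATE_HOME: '/var/state' },
  serverKey: 'example',
});
const macDir = resolveTokenDir({
  platform: 'darwin',
  homeDir: '/Users/ada',
  env: {},
  serverKey: 'example',
});
```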
+### Programmatic Usage
+```typescript
+import { CLIOAuthClient } from '@gleanwork/mcp-server-tester';
+const client = new CLIOAuthClient({
+  mcpServerUrl: 'https://api.example.com/mcp',
+});
+// Get a valid access token (cached, refreshed, or new)
+const result = await client.getAccessToken();
+console.log(`Token: ${result.accessToken}`);
+```
+### CI/CD Usage (GitHub Actions)
+For automated testing in CI, tokens can be provided via environment variables:
+```yaml
+# .github/workflows/mcp-tests.yml
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    env:
+      MCP_ACCESS_TOKEN: ${{ secrets.MCP_ACCESS_TOKEN }}
+      MCP_REFRESH_TOKEN: ${{ secrets.MCP_REFRESH_TOKEN }}
+    steps:
+      - uses: actions/checkout@v4
+      - run: npm ci
+      - run: npm run test:playwright
+```
+**To set up GitHub Actions secrets:**
+1. Authenticate locally: `npx mcp-server-tester login <server-url>`
+2. Export tokens for GitHub: `npx mcp-server-tester token <server-url> --format gh`
+3. Run the `gh secret set` commands that the previous step prints (requires [GitHub CLI](https://cli.github.com/))
+The `token` command supports multiple formats:
+- `env` (default) - Shell-compatible `KEY=value` pairs
+- `json` - JSON object for scripting
+- `gh` - Ready-to-paste GitHub CLI commands
+See the [CLI Guide](./docs/cli.md#token---export-tokens-for-cicd) for details.
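As a rough picture of how the three formats differ, the sketch below renders the same token pair in each style. The exact output of the `token` command is not shown in this README, so treat the shapes as assumptions: `formatTokens` and `shellQuote` are hypothetical helpers, the variable names are taken from the workflow above, and the `gh` lines use the GitHub CLI's `gh secret set <name> --body <value>` form.

```typescript
// Illustrative only: the real command's output may differ in detail.
type TokenFormat = 'env' | 'json' | 'gh';

interface TokenPair {
  accessToken: string;
  refreshToken?: string;
}

// Wrap a value in single quotes so a POSIX shell treats it literally.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function formatTokens(tokens: TokenPair, format: TokenFormat = 'env'): string {
  const pairs: Array<[string, string]> = [['MCP_ACCESS_TOKEN', tokens.accessToken]];
  if (tokens.refreshToken !== undefined) {
    pairs.push(['MCP_REFRESH_TOKEN', tokens.refreshToken]);
  }
  switch (format) {
    case 'json': {
      const obj: Record<string, string> = {};
      for (const [key, value] of pairs) obj[key] = value;
      return JSON.stringify(obj, null, 2);
    }
    case 'gh':
      return pairs
        .map(([key, value]) => `gh secret set ${key} --body ${shellQuote(value)}`)
        .join('\n');
    default:
      return pairs.map(([key, value]) => `${key}=${value}`).join('\n');
  }
}

const envOutput = formatTokens({ accessToken: 'abc', refreshToken: 'def' });
const ghOutput = formatTokens({ accessToken: "a'b" }, 'gh');
const jsonOutput = formatTokens({ accessToken: 'abc' }, 'json');
```

Whatever the format, the output contains live credentials, so it is best piped straight into the secret store rather than written to a file or log.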
+Alternatively, inject tokens programmatically in your test setup:
+```typescript
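+// globalSetup.ts: Playwright runs the default export once before the tests.
+import { injectTokens } from '@gleanwork/mcp-server-tester';
+export default async function globalSetup(): Promise<void> {
+  const accessToken = process.env.MCP_ACCESS_TOKEN;
+  if (!accessToken) throw new Error('MCP_ACCESS_TOKEN is not set');
+  await injectTokens('https://api.example.com/mcp', {
+    accessToken,
+    tokenType: 'Bearer',
+  });
+}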
+```
+## UI Reporter
+Interactive web UI for visualizing test results:
+![MCP Test Reporter UI](./ui.png)
+Add to your `playwright.config.ts`:
+```typescript
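+import { defineConfig } from '@playwright/test';
+export default defineConfig({
+  // Keep the built-in list reporter and add the MCP UI reporter alongside it
+  reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
+});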
+```
+See [UI Reporter Guide](./docs/ui-reporter.md) for features and usage.
+## Support
+- **Documentation**: See [`docs/`](./docs) directory
+- **Examples**: See [`examples/`](./examples) directory
+- **Issues**: [GitHub Issues](https://github.com/gleanwork/mcp-server-tester/issues)
+## License
+MIT
+## Contributing
+Contributions welcome! See [Development Guide](./docs/development.md) for setup instructions.
+## Credits
+Built with:
+- [@modelcontextprotocol/sdk](https://github.com/modelcontextprotocol/typescript-sdk)
+- [@playwright/test](https://playwright.dev)
+- [Zod](https://zod.dev)