gitlab-mcp 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/README.md +42 -28
  2. package/dist/config/dotenv.d.ts +2 -0
  3. package/dist/config/dotenv.js +44 -0
  4. package/dist/config/dotenv.js.map +1 -0
  5. package/dist/config/env.d.ts +3 -2
  6. package/dist/config/env.js +17 -2
  7. package/dist/config/env.js.map +1 -1
  8. package/dist/http-app.js +10 -3
  9. package/dist/http-app.js.map +1 -1
  10. package/dist/http.js +5 -4
  11. package/dist/http.js.map +1 -1
  12. package/dist/index.js +5 -4
  13. package/dist/index.js.map +1 -1
  14. package/dist/lib/auth-context.d.ts +2 -1
  15. package/dist/lib/auth-context.js.map +1 -1
  16. package/dist/lib/gitlab-client.d.ts +42 -8
  17. package/dist/lib/gitlab-client.js +380 -42
  18. package/dist/lib/gitlab-client.js.map +1 -1
  19. package/dist/lib/network.js +12 -6
  20. package/dist/lib/network.js.map +1 -1
  21. package/dist/lib/oauth-scopes.d.ts +2 -0
  22. package/dist/lib/oauth-scopes.js +16 -0
  23. package/dist/lib/oauth-scopes.js.map +1 -0
  24. package/dist/lib/regex.d.ts +5 -0
  25. package/dist/lib/regex.js +111 -0
  26. package/dist/lib/regex.js.map +1 -0
  27. package/dist/lib/request-runtime.js +24 -11
  28. package/dist/lib/request-runtime.js.map +1 -1
  29. package/dist/tools/gitlab.js +193 -3
  30. package/dist/tools/gitlab.js.map +1 -1
  31. package/dist/tools/mr-code-context.d.ts +1 -1
  32. package/dist/types/auth.d.ts +1 -0
  33. package/dist/types/auth.js +2 -0
  34. package/dist/types/auth.js.map +1 -0
  35. package/dist/types/context.d.ts +1 -0
  36. package/docs/architecture.md +11 -11
  37. package/docs/authentication.md +4 -1
  38. package/docs/configuration.md +29 -22
  39. package/docs/deployment.md +2 -0
  40. package/docs/mcp-integration-testing-best-practices.md +381 -730
  41. package/docs/tools.md +24 -14
  42. package/package.json +1 -1
@@ -1,124 +1,138 @@
1
- # MCP Integration Testing Best Practices
1
+ # MCP Server Integration Testing Best Practices (JavaScript / TypeScript)
2
2
 
3
- > This article is based on hands-on experience with 324 test cases from the gitlab-mcp project, combined with the official MCP SDK, Inspector CLI, and community-recommended practices, to systematically summarize the methodology and implementation patterns for MCP Server integration testing.
3
+ > This guide is for teams building MCP servers in the Node.js / TypeScript ecosystem. It focuses on a **deterministic, automatable, layered** integration testing strategy.
4
4
  >
5
- > Scope note: This is a general methodology article. Code snippets, environment variable names, file paths, and npm scripts are illustrative examples and should be adapted to your repository layout and conventions.
5
+ > Baseline: MCP specification **2025-11-25**. The guide explicitly distinguishes **Streamable HTTP (current)** from **HTTP+SSE (legacy compatibility)**.
6
+ >
7
+ > Version note: as of early 2026, the TypeScript SDK is still in a v1 → v2 transition period. Many production systems still run v1, while v2 is introducing package splits and API changes.
8
+ >
9
+ > Scope note: this document summarizes methodology and implementation patterns. Snippets, env vars, file paths, and commands are examples and should be adapted to your repository.
10
+
11
+ ---
6
12
 
7
13
  ## Table of Contents
8
14
 
9
- - [Why MCP Servers Need Integration Testing](#why-mcp-servers-need-integration-testing)
10
- - [The Testing Pyramid: MCP Edition](#the-testing-pyramid-mcp-edition)
11
- - [Layer 1: InMemoryTransport Protocol-Level Testing](#layer-1-inmemorytransport-protocol-level-testing)
12
- - [Core Scaffolding: buildContext + createLinkedPair](#core-scaffolding-buildcontext--createlinkedpair)
13
- - [Pattern 1: Tool Registration and Discovery](#pattern-1-tool-registration-and-discovery)
14
- - [Pattern 2: Tool Handler End-to-End Verification](#pattern-2-tool-handler-end-to-end-verification)
15
+ - [Why MCP Servers Need Integration Tests](#why-mcp-servers-need-integration-tests)
16
+ - [Testing Pyramid for MCP](#testing-pyramid-for-mcp)
17
+ - [Layer 0: Design for Testability (Server Factory + Dependency Injection)](#layer-0-design-for-testability-server-factory--dependency-injection)
18
+ - [Layer 1: InMemoryTransport Protocol-Level Integration Tests (P0 Core)](#layer-1-inmemorytransport-protocol-level-integration-tests-p0-core)
19
+ - [Core Scaffold: buildContext + createLinkedPair](#core-scaffold-buildcontext--createlinkedpair)
20
+ - [Pattern 1: Capabilities and List Contracts (tools/resources/prompts/list)](#pattern-1-capabilities-and-list-contracts-toolsresourcespromptslist)
21
+ - [Pattern 2: Tool Handler End-to-End Validation (stub external dependencies)](#pattern-2-tool-handler-end-to-end-validation-stub-external-dependencies)
15
22
  - [Pattern 3: Schema Validation and Boundary Inputs](#pattern-3-schema-validation-and-boundary-inputs)
16
- - [Layer 2: HTTP Transport Layer Testing](#layer-2-http-transport-layer-testing)
17
- - [Streamable HTTP Testing](#streamable-http-testing)
18
- - [SSE Transport Testing](#sse-transport-testing)
19
- - [Session Lifecycle Testing](#session-lifecycle-testing)
20
- - [Layer 3: Security and Policy Testing](#layer-3-security-and-policy-testing)
21
- - [Remote Authentication Flow](#remote-authentication-flow)
22
- - [Error Handling and Sensitive Information Redaction](#error-handling-and-sensitive-information-redaction)
23
- - [Policy Engine: Read-Only, Feature Flags, Allowlists](#policy-engine-read-only-feature-flags-allowlists)
24
- - [Layer 4: Agent Loop Integration Testing](#layer-4-agent-loop-integration-testing)
25
- - [ScriptedLLM Pattern](#scriptedllm-pattern)
26
- - [Real LLM Smoke Testing](#real-llm-smoke-testing)
27
- - [Layer 5: Inspector CLI Black-Box Testing](#layer-5-inspector-cli-black-box-testing)
28
- - [CI/CD Integration Strategy](#cicd-integration-strategy)
29
- - [Common Pitfalls and Solutions](#common-pitfalls-and-solutions)
30
- - [Summary: Recommended Test Matrix](#summary-recommended-test-matrix)
23
+ - [Pattern 4: Minimal Coverage for Bidirectional Requests (Sampling/Elicitation)](#pattern-4-minimal-coverage-for-bidirectional-requests-samplingelicitation)
24
+ - [Layer 2: Streamable HTTP Transport Integration Tests (P0/P1)](#layer-2-streamable-http-transport-integration-tests-p0p1)
25
+ - [Spec Checklist You Must Align With (2025-11-25)](#spec-checklist-you-must-align-with-2025-11-25)
26
+ - [HTTP Test Harness: Port 0 + Isolated Server Instances](#http-test-harness-port-0--isolated-server-instances)
27
+ - [Must-Test Cases: Session, 404 Reinitialize, DELETE, Protocol Version Header](#must-test-cases-session-404-reinitialize-delete-protocol-version-header)
28
+ - [SSE Stream Tests (GET/POST SSE on Streamable HTTP)](#sse-stream-tests-getpost-sse-on-streamable-http)
29
+ - [Layer 2.5: Legacy HTTP+SSE Compatibility Tests (Only if Needed)](#layer-25-legacy-httpsse-compatibility-tests-only-if-needed)
30
+ - [Layer 3: Security / Auth / Policy Tests (P0/P1)](#layer-3-security--auth--policy-tests-p0p1)
31
+ - [Origin/Host Protection (DNS Rebinding)](#originhost-protection-dns-rebinding)
32
+ - [OAuth/Authorization (Resource Metadata Discovery)](#oauthauthorization-resource-metadata-discovery)
33
+ - [Error Handling and Secret Redaction](#error-handling-and-secret-redaction)
34
+ - [Policy Combinatorics: Read-Only, Allowlists, Feature Flags](#policy-combinatorics-read-only-allowlists-feature-flags)
35
+ - [Layer 4: Conformance Testing (Strongly Recommended)](#layer-4-conformance-testing-strongly-recommended)
36
+ - [Layer 5: Agent Loop Integration Tests (ScriptedLLM + Small Real-LLM Smoke)](#layer-5-agent-loop-integration-tests-scriptedllm--small-real-llm-smoke)
37
+ - [Layer 6: Inspector CLI Black-Box Contract Tests (Pre/Post Deployment)](#layer-6-inspector-cli-black-box-contract-tests-prepost-deployment)
38
+ - [CI/CD Layered Execution Strategy](#cicd-layered-execution-strategy)
39
+ - [Common Pitfalls and Fixes (Updated for 2025-11-25)](#common-pitfalls-and-fixes-updated-for-2025-11-25)
40
+ - [Recommended Test Matrix (Copy/Paste)](#recommended-test-matrix-copypaste)
41
+ - [References and Compatibility Notes](#references-and-compatibility-notes)
31
42
 
32
43
  ---
33
44
 
34
- ## Why MCP Servers Need Integration Testing
35
-
36
- An MCP Server is not an ordinary HTTP API. It has several characteristics that make testing more complex:
45
+ ## Why MCP Servers Need Integration Tests
37
46
 
38
- 1. **Stateful sessions**: The client must first `initialize` to obtain a session ID, and all subsequent requests must include it
39
- 2. **Multiple transport protocols**: The same server may simultaneously support Streamable HTTP, SSE, and stdio
40
- 3. **Bidirectional communication**: In SSE mode, the server can proactively push events to the client
41
- 4. **Policy layer**: Read-only mode, tool allowlists, and feature flags can alter the available tool set
42
- 5. **Authentication context**: In remote deployments, tokens are passed via HTTP headers and must propagate through AsyncLocalStorage
47
+ An MCP server is not a standard HTTP API. Complexity comes from the combination of protocol mechanics, session behavior, bidirectional messaging, and security boundaries.
43
48
 
44
- Pure unit tests cannot cover these interactions. You need a real MCP Client and Server communicating through a real (or in-memory simulated) transport layer to verify the complete request-response chain.
49
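+ 1. **Stateful sessions over HTTP**: the server may return `MCP-Session-Id` in the `initialize` response; if it does, every subsequent request must carry it. Once a session expires or is terminated, the server must answer requests for it with `404 Not Found`, and the client must start a new session with a fresh `initialize` request sent without a session ID.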
50
+ 2. **Multiple transports**: the spec defines stdio and Streamable HTTP. Streamable HTTP uses a single endpoint that can support both `POST` and `GET` (optional SSE stream).
51
+ 3. **Bidirectional messaging**: servers can send notifications and requests to the client over SSE. A dropped connection is not a cancellation; to cancel a request, the client should send an explicit `notifications/cancelled` message.
52
+ 4. **Explicit security requirements**: Streamable HTTP servers must validate the `Origin` header to mitigate DNS rebinding. Local deployments should bind to `127.0.0.1` rather than all interfaces, and servers should implement authentication where it is required.
53
+ 5. **Authorization is more than Bearer header plumbing**: the spec defines OAuth-based discovery and flow, including Protected Resource Metadata discovery.
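The session rules in item 1 can be sketched as a small client-side helper. This is a hypothetical sketch, not the SDK's Streamable HTTP client transport. `SessionClient`, `FetchLike`, and `MiniResponse` are illustrative names, and the injected `fetch` lets a test stand in for a real server. It shows three behaviors: `initialize` goes out without a session header, later requests carry `MCP-Session-Id` and `MCP-Protocol-Version`, and a `404` triggers exactly one re-initialize and retry.

```ts
// Hypothetical helper (not an SDK API): session-aware POSTs with 404 -> re-initialize.
type MiniResponse = { status: number; headers: { get(name: string): string | null } };
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<MiniResponse>;

const PROTOCOL_VERSION = "2025-11-25";

class SessionClient {
  private sessionId: string | null = null;
  private readonly url: string;
  private readonly fetchFn: FetchLike;

  constructor(url: string, fetchFn: FetchLike) {
    this.url = url;
    this.fetchFn = fetchFn;
  }

  get currentSessionId(): string | null {
    return this.sessionId;
  }

  // Session and protocol-version headers are only attached after initialization.
  private headers(afterInit: boolean): Record<string, string> {
    const h: Record<string, string> = {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    };
    if (afterInit) {
      h["MCP-Protocol-Version"] = PROTOCOL_VERSION;
      if (this.sessionId !== null) h["MCP-Session-Id"] = this.sessionId;
    }
    return h;
  }

  // initialize is sent WITHOUT a session header; the server may assign one in the response.
  async initialize(): Promise<void> {
    const res = await this.fetchFn(this.url, {
      method: "POST",
      headers: this.headers(false),
      body: JSON.stringify({
        jsonrpc: "2.0",
        id: 0,
        method: "initialize",
        params: {
          protocolVersion: PROTOCOL_VERSION,
          capabilities: {},
          clientInfo: { name: "sketch", version: "0.0.0" },
        },
      }),
    });
    if (res.status !== 200) throw new Error(`initialize failed: HTTP ${res.status}`);
    this.sessionId = res.headers.get("MCP-Session-Id");
    // Complete the handshake with the initialized notification.
    await this.fetchFn(this.url, {
      method: "POST",
      headers: this.headers(true),
      body: JSON.stringify({ jsonrpc: "2.0", method: "notifications/initialized" }),
    });
  }

  // 404 means the session is gone: start a new one and retry exactly once.
  async post(message: object): Promise<MiniResponse> {
    const send = () =>
      this.fetchFn(this.url, { method: "POST", headers: this.headers(true), body: JSON.stringify(message) });
    let res = await send();
    if (res.status === 404) {
      this.sessionId = null;
      await this.initialize();
      res = await send();
    }
    return res;
  }
}
```

Retrying only once is deliberate. A second `404` right after a fresh `initialize` points to a server bug, and the test should surface it rather than loop.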
45
54
 
46
- A common anti-pattern in the community is so-called **"Vibe Testing"** spinning up an LLM Agent, typing a few prompts, and considering it passed if the output "looks about right." This approach is non-deterministic, non-reproducible, and expensive. The correct approach is to build a **deterministic, automatable, layered** integration testing system.
55
+ Because of this, unit tests alone cannot cover full interaction behavior. Integration tests should use **real MCP clients + real MCP servers + real/simulated transports** so results are reproducible, assertable, and CI-friendly.
47
56
 
48
57
  ---
49
58
 
50
- ## The Testing Pyramid: MCP Edition
51
-
52
- ```
53
-               ┌─────────────┐
54
-               │   LLM E2E   │  ← Few, Nightly
55
-               │ Smoke Test  │
56
-              ─┤             ├─
57
-             / └─────────────┘ \
58
-            /   Inspector CLI   \  ← Black-box contract test
59
-           / ┌─────────────────┐ \
60
-          /  │   Security /    │  \
61
-         /   │ Policy / Error  │   \  ← Every PR
62
-        / ┌──┴─────────────────┴──┐ \
63
-       /  │ HTTP / SSE Transport  │  \
64
-      /   │   Session Lifecycle   │   \
65
-     / ┌──┴───────────────────────┴──┐ \
66
-    /  │ InMemoryTransport Protocol  │  \  ← Every commit
67
-   /   │   Registration / Schema /   │   \
68
-  /    │           Handler           │    \
69
- └─────────────────────────────────────────┘
59
+ ## Testing Pyramid for MCP
60
+
61
+ ```text
62
+                 ┌───────────────────┐
63
+                 │  Real LLM Smoke   │  ← small, nightly
64
+                 └─────────┬─────────┘
65
+                           │
66
+             ┌─────────────┴─────────────┐
67
+             │ Agent Loop (ScriptedLLM)  │  ← nightly / small PR subset
68
+             └─────────────┬─────────────┘
69
+                           │
70
+          ┌────────────────┴────────────────┐
71
+          │  Conformance (Spec Compliance)  │  ← nightly / pre-release
72
+          └────────────────┬────────────────┘
73
+                           │
74
+   ┌───────────────────────┴───────────────────────┐
75
+   │  Streamable HTTP + Session + SSE integration  │  ← every PR (P0/P1)
76
+   └───────────────────────┬───────────────────────┘
77
+                           │
78
+ ┌─────────────────────────┴─────────────────────────┐
79
+ │   InMemoryTransport protocol-level integration    │  ← every commit (P0)
80
+ └───────────────────────────────────────────────────┘
70
81
  ```
71
82
 
72
- **Principle: The lower the layer, the more tests, the faster, and the more deterministic.**
83
+ Principle: lower layers should be broader, faster, and more deterministic. Higher layers should be smaller and smoke-oriented.
73
84
 
74
85
  ---
75
86
 
76
- ## Layer 1: InMemoryTransport Protocol-Level Testing
87
+ ## Layer 0: Design for Testability (Server Factory + Dependency Injection)
77
88
 
78
- This is the **cornerstone** of MCP integration testing. Using the official TypeScript SDK's `InMemoryTransport.createLinkedPair()`, you can connect Client and Server directly within the same process — no child processes or HTTP servers needed.
89
+ Whether integration testing is practical is mostly determined by server architecture.
79
90
 
80
- **Advantages**:
91
+ ### Core requirements
81
92
 
82
- - Extremely fast (millisecond-level)
83
- - Fully deterministic, no network/port dependencies
84
- - Tests the real MCP protocol handshake and tool invocation chain
93
+ - **Server constructor should be a pure factory**: `createMcpServer(context)` depends only on `context`. Avoid direct top-level reads of `process.env`, DB connections, or network calls.
94
+ - **Context should be complete**: include env, logger, external API clients, policy engine, formatters, and optionally clock/random providers.
95
+ - **Defaults must be usable**: `buildContext()` should provide complete defaults, and tests should override only deltas.
96
+ - **Time/random should be controllable**: session IDs, expiry handling, and retries become stable when time/random sources are injectable.
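The last requirement, injectable time and randomness, can be sketched with a session store that takes its clock and ID source as constructor arguments. `SessionStore`, `FakeClock`, and `sequentialIds` are hypothetical names, not part of any SDK. With these seams, a test can check expiry by advancing a fake clock instead of sleeping.

```ts
// Hypothetical sketch: session expiry driven by an injected clock and ID source.
interface Clock {
  now(): number;
}
type IdSource = () => string;

class SessionStore {
  private readonly sessions = new Map<string, { lastSeen: number }>();
  private readonly clock: Clock;
  private readonly newId: IdSource;
  private readonly ttlMs: number;

  constructor(clock: Clock, newId: IdSource, ttlMs: number) {
    this.clock = clock;
    this.newId = newId;
    this.ttlMs = ttlMs;
  }

  create(): string {
    const id = this.newId();
    this.sessions.set(id, { lastSeen: this.clock.now() });
    return id;
  }

  // false means "unknown or idle past the TTL": the HTTP layer should answer 404.
  touch(id: string): boolean {
    const s = this.sessions.get(id);
    if (!s) return false;
    if (this.clock.now() - s.lastSeen > this.ttlMs) {
      this.sessions.delete(id);
      return false;
    }
    s.lastSeen = this.clock.now();
    return true;
  }
}

// Test doubles: a manually advanced clock and a counter-based ID source.
class FakeClock implements Clock {
  private t = 0;
  now(): number {
    return this.t;
  }
  advance(ms: number): void {
    this.t += ms;
  }
}

function sequentialIds(prefix = "sess-"): IdSource {
  let n = 0;
  return () => `${prefix}${++n}`;
}
```

In production, `{ now: () => Date.now() }` and `() => crypto.randomUUID()` would fill the same slots.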
85
97
 
86
- ### Core Scaffolding: buildContext + createLinkedPair
98
+ ---
87
99
 
88
- Extracting the server creation logic into a factory function is the key to testability. In tests, you import the factory directly and use dependency injection to replace external services.
100
+ ## Layer 1: InMemoryTransport Protocol-Level Integration Tests (P0 Core)
89
101
 
90
- ```typescript
91
- // tests/integration/_helpers.ts
102
+ Goal: exercise the real MCP client ↔ server lifecycle (`initialize`, `list`, `call`) inside a single process, without opening ports or spawning child processes.
103
+
104
+ ### Core Scaffold: buildContext + createLinkedPair
92
105
 
106
+ ```ts
107
+ // tests/integration/_helpers.ts (AppContext, ToolPolicyEngine, and OutputFormatter are your app's own types)
93
108
  import { Client } from "@modelcontextprotocol/sdk/client/index.js";
94
109
  import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
95
- import { createMcpServer } from "../../src/server/build-server.js";
96
110
 
97
- // 1) Build test context — all external dependencies can be stubbed
98
- export function buildContext(overrides?: BuildContextOptions): AppContext {
111
+ import { createMcpServer } from "../../src/server/createMcpServer.js";
112
+ import { vi } from "vitest";
113
+ export function buildContext(overrides?: Partial<AppContext>): AppContext {
99
114
  return {
100
115
  env: {
101
- ...defaultEnv, // Complete default configuration
102
- GITLAB_READ_ONLY_MODE: overrides?.readOnlyMode ?? false,
103
- GITLAB_ALLOWED_PROJECT_IDS: overrides?.allowedProjectIds ?? [],
104
- // ... other overridable fields
116
+ READ_ONLY_MODE: false,
117
+ ...overrides?.env
118
+ },
119
+ logger: overrides?.logger ?? {
120
+ info: vi.fn(),
121
+ warn: vi.fn(),
122
+ error: vi.fn(),
123
+ debug: vi.fn()
105
124
  },
106
- logger: {
107
- info: vi.fn(), warn: vi.fn(), error: vi.fn(),
108
- debug: vi.fn(), trace: vi.fn(), fatal: vi.fn(),
109
- child: () => ({}) as never
125
+ services: {
126
+ ...overrides?.services
110
127
  },
111
- gitlab: { ...overrides?.gitlabStub }, // Key: inject stub
112
- policy: new ToolPolicyEngine({ ... }), // Real policy engine
113
- formatter: new OutputFormatter({ ... }) // Real formatter
128
+ policy: overrides?.policy ?? new ToolPolicyEngine(),
129
+ formatter: overrides?.formatter ?? new OutputFormatter()
114
130
  };
115
131
  }
116
132
 
117
- // 2) Create Client ↔ Server linked pair
118
133
  export async function createLinkedPair(context: AppContext) {
119
134
  const server = createMcpServer(context);
120
- const [clientTransport, serverTransport] =
121
- InMemoryTransport.createLinkedPair();
135
+ const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
122
136
 
123
137
  await server.connect(serverTransport);
124
138
 
@@ -132,41 +146,39 @@ export async function createLinkedPair(context: AppContext) {
132
146
  }
133
147
  ```
134
148
 
135
- > **Best Practice**: `createMcpServer()` should be a **pure factory function** that accepts a complete context object and does not depend on global state or environment variables. This allows tests to construct server instances with any configuration.
149
+ Always close both transports in a `try/finally` block, even if closing one side currently cascades to the other; a leaked transport can leave later tests hanging.
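The `try/finally` rule can be packaged once so individual tests cannot forget it. `withCleanup` and `Closable` are hypothetical names. The only assumption about the SDK is that each transport exposes `close(): Promise<void>`. The helper has two properties: every closable is closed even if an earlier `close()` fails, and a close failure never masks the test body's own error.

```ts
// Hypothetical helper: open resources, run a body, and always close everything.
interface Closable {
  close(): Promise<void>;
}

async function withCleanup<V, R>(
  open: () => Promise<{ value: V; closables: Closable[] }>,
  body: (value: V) => Promise<R>
): Promise<R> {
  const { value, closables } = await open();
  let bodyFailed = false;
  try {
    return await body(value);
  } catch (err) {
    bodyFailed = true;
    throw err;
  } finally {
    // allSettled: one failing close() does not prevent the others from running.
    const results = await Promise.allSettled(closables.map((c) => c.close()));
    const closeFailure = results.find((r): r is PromiseRejectedResult => r.status === "rejected");
    // Only surface a close failure if the body succeeded, so the real test error wins.
    if (closeFailure && !bodyFailed) throw closeFailure.reason;
  }
}
```

With the scaffold above, `open` could await `createLinkedPair(ctx)` and return `{ value: client, closables: [clientTransport, serverTransport] }`.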
136
150
 
137
- ### Pattern 1: Tool Registration and Discovery
151
+ ### Pattern 1: Capabilities and List Contracts (tools/resources/prompts/list)
138
152
 
139
- Verify that `tools/list` returns the expected tool set; this is the most basic contract test.
153
+ Don't test only tools. A mature MCP server often exposes resources and prompts as well, and each list response (`tools/list`, `resources/list`, `prompts/list`) is part of its public contract.
140
154
 
141
- ```typescript
142
- describe("Tool Registration", () => {
143
- it("registers all core tools under default configuration", async () => {
155
+ ```ts
156
+ describe("Contract: listTools()", () => {
157
+ it("exposes expected core tools by default", async () => {
144
158
  const { client, clientTransport, serverTransport } = await createLinkedPair(buildContext());
159
+
145
160
  try {
146
161
  const { tools } = await client.listTools();
147
162
  const names = tools.map((t) => t.name);
148
163
 
149
- expect(names).toContain("gitlab_get_project");
150
- expect(names).toContain("gitlab_list_issues");
151
164
  expect(names).toContain("health_check");
165
+ expect(names).toContain("my_readonly_tool");
152
166
  } finally {
153
167
  await clientTransport.close();
154
168
  await serverTransport.close();
155
169
  }
156
170
  });
157
171
 
158
- it("excludes all mutating tools in read-only mode", async () => {
159
- const { client, clientTransport, serverTransport } = await createLinkedPair(
160
- buildContext({ readOnlyMode: true })
161
- );
172
+ it("hides write tools in read-only mode", async () => {
173
+ const ctx = buildContext({ env: { READ_ONLY_MODE: true } as any });
174
+ const { client, clientTransport, serverTransport } = await createLinkedPair(ctx);
175
+
162
176
  try {
163
177
  const { tools } = await client.listTools();
164
178
  const names = tools.map((t) => t.name);
165
179
 
166
- expect(names).not.toContain("gitlab_create_issue");
167
- expect(names).not.toContain("gitlab_execute_graphql_mutation");
168
- // Read-only tools are still present
169
- expect(names).toContain("gitlab_get_project");
180
+ expect(names).not.toContain("create_issue");
181
+ expect(names).toContain("health_check");
170
182
  } finally {
171
183
  await clientTransport.close();
172
184
  await serverTransport.close();
@@ -175,46 +187,42 @@ describe("Tool Registration", () => {
175
187
  });
176
188
  ```
177
189
 
178
- > **Key Point**: Always close both transport endpoints in the `finally` block. InMemoryTransport does not clean up automatically — if you miss this, subsequent tests will hang.
190
+ Recommended assertion style:
191
+
192
+ - Assert presence/absence of tool names (stable).
193
+ - Avoid asserting full text descriptions (high-churn).
194
+
195
+ ### Pattern 2: Tool Handler End-to-End Validation (stub external dependencies)
179
196
 
180
- ### Pattern 2: Tool Handler End-to-End Verification
197
+ Validate at least three aspects:
181
198
 
182
- Use `client.callTool()` to make real JSON-RPC calls, stub external APIs (such as GitLab), and verify three things:
199
+ 1. Correct dependency call with correct parameters.
200
+ 2. Correct MCP tool result shape (`content[]` and optional `structuredContent`).
201
+ 3. Correct error-path behavior.
183
202
 
184
- 1. The correct API method is called with the right arguments
185
- 2. The response structure conforms to the MCP specification (`content[].text` + `structuredContent`)
186
- 3. Error scenarios return `isError: true`
203
+ ```ts
204
+ describe("Tool handler: get_project", () => {
205
+ it("forwards project_id to dependency", async () => {
206
+ const getProject = vi.fn().mockResolvedValue({ id: 42, name: "alpha" });
187
207
 
188
- ```typescript
189
- describe("Tool handler: gitlab_get_project", () => {
190
- it("passes project_id to context.gitlab.getProject()", async () => {
191
- const getProject = vi.fn().mockResolvedValue({
192
- id: 42,
193
- name: "my-project",
194
- path_with_namespace: "group/my-project"
208
+ const ctx = buildContext({
209
+ services: { git: { getProject } } as any
195
210
  });
196
211
 
197
- const { client, clientTransport, serverTransport } = await createLinkedPair(
198
- buildContext({
199
- gitlabStub: { getProject }
200
- })
201
- );
212
+ const { client, clientTransport, serverTransport } = await createLinkedPair(ctx);
202
213
 
203
214
  try {
204
215
  const result = await client.callTool({
205
- name: "gitlab_get_project",
206
- arguments: { project_id: "group/my-project" }
216
+ name: "get_project",
217
+ arguments: { project_id: "group/alpha" }
207
218
  });
208
219
 
209
- // Verification 1: stub was called correctly
210
- expect(getProject).toHaveBeenCalledWith("group/my-project");
211
-
212
- // Verification 2: response is not an error
220
+ expect(getProject).toHaveBeenCalledWith("group/alpha");
213
221
  expect(result.isError).toBeFalsy();
214
222
 
215
- // Verification 3: text content contains expected data
216
- const text = (result.content as Array<{ text: string }>).find((c) => c.type === "text")!.text;
217
- expect(text).toContain("my-project");
223
+ const text = (result.content as any[]).find((c) => c.type === "text")?.text ?? "";
224
+ expect(text).toContain("alpha");
225
+ expect(result.structuredContent).toMatchObject({ id: 42, name: "alpha" });
218
226
  } finally {
219
227
  await clientTransport.close();
220
228
  await serverTransport.close();
@@ -223,51 +231,29 @@ describe("Tool handler: gitlab_get_project", () => {
223
231
  });
224
232
  ```
225
233
 
226
- > **Best Practice**: `gitlabStub` only needs to provide the methods used by the current test. Unstubbed method calls will produce runtime errors, which is **exactly** the behavior you want — it helps you discover unexpected API calls.
234
+ Practical rule: stub only what the test should use. Unexpected dependency usage should fail loudly.
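With a plain partial object, an unexpected call fails as a generic `undefined is not a function`, and an unexpected property read may not fail at all. A `Proxy` wrapper makes both fail with a clear message. `strictStub` is a hypothetical helper. It deliberately lets `then` and symbol keys through, because promise resolution and test-runner pretty-printing probe them.

```ts
// Hypothetical helper: any access to a member the test did not stub throws.
function strictStub<T extends object>(impl: Partial<T>, label = "stub"): T {
  return new Proxy(impl as T, {
    get(target, prop, receiver) {
      if (prop in target) return Reflect.get(target, prop, receiver);
      // Promise resolution checks `.then`; inspectors probe symbols. Report both as absent.
      if (prop === "then" || typeof prop === "symbol") return undefined;
      throw new Error(`${label}: unexpected access to "${String(prop)}"`);
    },
  });
}
```

For example, `buildContext({ services: { git: strictStub({ getProject }, "git") } })` would fail immediately if the handler touched any other method on `git`.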
227
235
 
228
236
  ### Pattern 3: Schema Validation and Boundary Inputs
229
237
 
230
- The MCP SDK's Zod schemas automatically validate input parameters. You should test:
238
+ Most MCP servers use schema validation (often Zod). Invalid input should be treated as a first-class test target:
231
239
 
232
- - `null` value preprocessing (`null → undefined`)
233
- - Missing required fields
234
- - Type mismatches
235
- - Invalid enum values
240
+ - Missing/extra fields.
241
+ - Type mismatches.
242
+ - Invalid enum values.
243
+ - `null` / `undefined` semantics.
236
244
 
237
- ```typescript
238
- describe("Schema Validation", () => {
239
- it("null values are preprocessed to undefined (optional fields do not error)", async () => {
240
- const listProjects = vi.fn().mockResolvedValue([]);
241
- const { client, clientTransport, serverTransport } = await createLinkedPair(
242
- buildContext({
243
- gitlabStub: { listProjects }
244
- })
245
- );
245
+ ```ts
246
+ describe("Schema validation", () => {
247
+ it("returns tool-level error for type mismatch", async () => {
248
+ const ctx = buildContext({ services: { git: { listProjects: vi.fn() } } as any });
249
+ const { client, clientTransport, serverTransport } = await createLinkedPair(ctx);
246
250
 
247
251
  try {
248
252
  const result = await client.callTool({
249
- name: "gitlab_list_projects",
250
- arguments: { search: null, page: null } // null → undefined
253
+ name: "list_projects",
254
+ arguments: { page: "not-a-number" } as any
251
255
  });
252
- expect(result.isError).toBeFalsy();
253
- } finally {
254
- await clientTransport.close();
255
- await serverTransport.close();
256
- }
257
- });
258
256
 
259
- it("type mismatch triggers Zod error", async () => {
260
- const { client, clientTransport, serverTransport } = await createLinkedPair(
261
- buildContext({
262
- gitlabStub: { listProjects: vi.fn() }
263
- })
264
- );
265
-
266
- try {
267
- const result = await client.callTool({
268
- name: "gitlab_list_projects",
269
- arguments: { page: "not-a-number" } // should be number
270
- });
271
257
  expect(result.isError).toBe(true);
272
258
  } finally {
273
259
  await clientTransport.close();
@@ -277,705 +263,370 @@ describe("Schema Validation", () => {
277
263
  });
278
264
  ```
279
265
 
266
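+ Some implementations (or SDK versions) map validation failures to a JSON-RPC protocol error (`-32602 InvalidParams`) instead of a tool-level error. In that case `client.callTool()` rejects rather than resolving with `isError: true`, so the assertion must change accordingly. Either mapping can be valid; what matters is that it is consistent and matches what your clients expect.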
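To pin down whichever mapping a server actually uses, it helps to classify a call's outcome instead of hard-coding `result.isError`. `classifyCall` is a hypothetical helper. It assumes the client rejects with an error object that carries a numeric `code`, which is the case for the v1 TypeScript SDK's `McpError`.

```ts
// Hypothetical helper: normalize "tool-level error" vs "protocol error" outcomes.
type CallOutcome = "ok" | "tool-error" | "protocol-error" | "other-error";

const INVALID_PARAMS = -32602; // JSON-RPC "Invalid params"

async function classifyCall(call: () => Promise<{ isError?: boolean }>): Promise<CallOutcome> {
  try {
    const result = await call();
    return result.isError ? "tool-error" : "ok";
  } catch (err) {
    // Protocol errors surface as rejections; read the JSON-RPC code if one is present.
    const code = typeof err === "object" && err !== null && "code" in err ? (err as { code: unknown }).code : undefined;
    return code === INVALID_PARAMS ? "protocol-error" : "other-error";
  }
}
```

A test then states the chosen contract explicitly, for example `expect(await classifyCall(() => client.callTool({ name: "list_projects", arguments: { page: "x" } }))).toBe("tool-error")`.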
267
+
268
+ ### Pattern 4: Minimal Coverage for Bidirectional Requests (Sampling/Elicitation)
269
+
270
+ If your server uses server-initiated requests (for example, sampling/elicitation), add a minimal closed-loop test:
271
+
272
+ - Register client-side handlers with deterministic responses.
273
+ - Verify server behavior continues correctly after handler responses.
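A deterministic client-side handler for this closed loop can be as simple as a queue of canned replies. It records what the server asked for and fails loudly if the server asks more often than scripted. `ScriptedResponder` and `SamplingReply` are hypothetical names, and `SamplingReply` only mirrors the shape of a sampling result. In the v1 TypeScript SDK you would typically declare the `sampling` capability on the client and register the responder with `client.setRequestHandler(CreateMessageRequestSchema, (req) => responder.respond(req))`. Check the exact API for your SDK version.

```ts
// Hypothetical sketch: replay canned replies to server-initiated requests, in order.
interface SamplingReply {
  role: "assistant";
  content: { type: "text"; text: string };
  model: string;
  stopReason?: string;
}

class ScriptedResponder<Req> {
  readonly received: Req[] = [];
  private readonly script: SamplingReply[];

  constructor(script: SamplingReply[]) {
    this.script = [...script];
  }

  // Arrow property so it can be passed directly as a handler without rebinding `this`.
  respond = async (request: Req): Promise<SamplingReply> => {
    this.received.push(request);
    const next = this.script.shift();
    if (!next) throw new Error(`unscripted request #${this.received.length}`);
    return next;
  };

  get remaining(): number {
    return this.script.length;
  }
}
```

After the tool call, asserting `remaining === 0` and inspecting `received` confirms that the server asked exactly what the test expected.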
274
+
280
275
  ---
281
276
 
282
- ## Layer 2: HTTP Transport Layer Testing
277
+ ## Layer 2: Streamable HTTP Transport Integration Tests (P0/P1)
278
+
279
+ InMemory tests are fast but bypass what happens on the wire: JSON serialization, HTTP and session headers, SSE framing, and Origin checks.
280
+
281
+ ### Spec Checklist You Must Align With (2025-11-25)
283
282
 
284
- InMemoryTransport skips serialization and the network layer. To test real HTTP endpoints, you need to start a real HTTP server.
283
+ - **Single MCP endpoint** supports both `POST` and `GET`.
284
+ - **POST** requests must send `Accept: application/json, text/event-stream`.
285
+ - **POST body** must be a single JSON-RPC request, notification, or response. JSON-RPC batching was removed in spec revision 2025-06-18, so test how your server handles an array body.
286
+ - **GET** opens an SSE stream; a server that does not offer one must return `405 Method Not Allowed`.
287
+ - **Origin validation**: a request with an invalid `Origin` header must be rejected with `403 Forbidden`.
288
+ - **Session behavior**: when session IDs are enabled, post-init requests must include `MCP-Session-Id`.
289
+ - **Protocol version header**: post-init requests send `MCP-Protocol-Version`; an invalid or unsupported value must be rejected with `400 Bad Request`.
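The negative paths in this checklist can be written as data, so each one becomes a single parametrized HTTP test. `negativeCases` and `HttpCase` are hypothetical names. The expected statuses follow the checklist, with one softer case: a missing session ID returning `400` is a SHOULD in the spec, so adjust it if your server differs.

```ts
// Hypothetical sketch: checklist negative paths as parametrized HTTP cases.
interface HttpCase {
  name: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
  expectedStatus: number;
}

function negativeCases(validSessionId: string, protocolVersion = "2025-11-25"): HttpCase[] {
  const base = { "Content-Type": "application/json", Accept: "application/json, text/event-stream" };
  const ping = JSON.stringify({ jsonrpc: "2.0", id: 99, method: "ping" });
  const postInit = { ...base, "MCP-Session-Id": validSessionId, "MCP-Protocol-Version": protocolVersion };
  return [
    // Spec: servers that require a session SHOULD answer 400 when it is missing.
    { name: "missing session id", method: "POST", headers: { ...base, "MCP-Protocol-Version": protocolVersion }, body: ping, expectedStatus: 400 },
    { name: "unknown session id", method: "POST", headers: { ...postInit, "MCP-Session-Id": "does-not-exist" }, body: ping, expectedStatus: 404 },
    { name: "unsupported protocol version", method: "POST", headers: { ...postInit, "MCP-Protocol-Version": "1999-01-01" }, body: ping, expectedStatus: 400 },
    { name: "foreign Origin", method: "POST", headers: { ...postInit, Origin: "https://evil.example" }, body: ping, expectedStatus: 403 },
  ];
}
```

The session ID is only known after `initialize`, so loop over `negativeCases(sessionId)` inside a test instead of building the table at collection time.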
285
290
 
286
- ### Streamable HTTP Testing
291
+ ### HTTP Test Harness: Port 0 + Isolated Server Instances
287
292
 
288
- **Key Pattern**: Use port 0 to let the OS assign a random port, avoiding port conflicts.
293
+ Bind each test's server to port 0 so the OS assigns a free port, and never share a stateful server instance across tests that touch session, rate-limit, or connection state.
289
294
 
290
- ```typescript
295
+ ```ts
291
296
  import { createServer, type Server as HttpServer } from "node:http";
292
- import { setupMcpHttpApp } from "../../src/http-app.js";
293
297
 
294
298
  let httpServer: HttpServer;
295
299
  let baseUrl: string;
296
- let result: SetupMcpHttpAppResult;
297
-
298
- beforeAll(async () => {
299
- const context = buildHttpContext();
300
- result = setupMcpHttpApp({
301
- context,
302
- env: context.env,
303
- logger: context.logger
304
- });
305
300
 
306
- httpServer = createServer(result.app);
301
+ beforeEach(async () => {
302
+ const app = buildApp(); // hypothetical: your Express app or any Node (req, res) request listener
303
+ httpServer = createServer(app);
304
+
307
305
  await new Promise<void>((resolve) => {
308
306
  httpServer.listen(0, "127.0.0.1", () => resolve());
309
307
  });
310
308
 
311
309
  const addr = httpServer.address();
312
- if (typeof addr === "object" && addr !== null) {
310
+ if (typeof addr === "object" && addr?.port) {
313
311
  baseUrl = `http://127.0.0.1:${addr.port}`;
312
+ } else {
313
+ throw new Error("Failed to bind test port");
314
314
  }
315
315
  });
316
316
 
317
- afterAll(async () => {
318
- // Key: close all sessions first, then shut down the HTTP server
319
- for (const sessionId of result.sessions.keys()) {
320
- await result.closeSession(sessionId, "shutdown");
321
- }
317
+ afterEach(async () => {
322
318
  await new Promise<void>((resolve, reject) => {
323
319
  httpServer.close((err) => (err ? reject(err) : resolve()));
324
320
  });
325
321
  });
326
322
  ```
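One caveat with the harness above is that `httpServer.close()` only stops accepting new connections and then waits for open ones. An open SSE stream or a keep-alive socket left by `fetch` can therefore stall `afterEach`. A minimal variant wraps start and stop, and force-closes sockets with `closeAllConnections()` (Node.js 18.2 and later). `startTestServer` and `TestServer` are hypothetical names.

```ts
import { createServer, type RequestListener, type Server } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: isolated server on an OS-assigned port with non-hanging teardown.
interface TestServer {
  baseUrl: string;
  stop(): Promise<void>;
}

async function startTestServer(listener: RequestListener): Promise<TestServer> {
  const server: Server = createServer(listener);
  await new Promise<void>((resolve, reject) => {
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => resolve());
  });
  const { port } = server.address() as AddressInfo;
  return {
    baseUrl: `http://127.0.0.1:${port}`,
    stop: () =>
      new Promise<void>((resolve, reject) => {
        // Stop accepting, then destroy lingering sockets so the close callback can fire.
        server.close((err) => (err ? reject(err) : resolve()));
        server.closeAllConnections();
      }),
  };
}
```

In Vitest this becomes `srv = await startTestServer(app)` in `beforeEach` and `await srv.stop()` in `afterEach`.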
327
323
 
328
- **Typical Test Scenarios**:
329
-
330
- | Scenario | HTTP Method | Expected Status | Error Code |
331
- | ------------------------------- | ---------------------------- | --------------- | ---------- |
332
- | Initialize session | `POST /mcp` (initialize) | 200 | — |
333
- | Subsequent request with session | `POST /mcp` | 200 | — |
334
- | Invalid session ID | `POST /mcp` | 404 | -32001 |
335
- | GET without initialization | `GET /mcp` | 400 | -32000 |
336
- | Capacity exceeded | `POST /mcp` (MAX_SESSIONS=1) | 503 | -32002 |
337
- | Rate limiting | `POST /mcp` (excessive) | 429 | -32003 |
338
- | Delete session | `DELETE /mcp` | 200 | — |
339
- | Health check | `GET /healthz` | 200 | — |
340
-
341
- > **Best Practice**: Each test scenario requiring independent configuration (e.g., `MAX_SESSIONS=1`) should create its own `setupMcpHttpApp` + `createServer` instance, cleaning up in a `finally` block. Do not share stateful server instances.
342
-
343
- ### SSE Transport Testing
344
-
345
- SSE testing is more complex than HTTP because `GET /sse` returns a long-lived event stream:
346
-
347
- ```typescript
348
- // SSE event parser
349
- async function* parseSseEvents(response: Response): AsyncGenerator<SseEvent> {
350
- const reader = response.body!.getReader();
351
- const decoder = new TextDecoder();
352
- let buffer = "";
353
-
354
- try {
355
- while (true) {
356
- const { done, value } = await reader.read();
357
- if (done) break;
358
- buffer += decoder.decode(value, { stream: true });
359
- const parts = buffer.split("\n\n");
360
- buffer = parts.pop()!;
361
-
362
- for (const part of parts) {
363
- const event: SseEvent = {};
364
- for (const line of part.split("\n")) {
365
- if (line.startsWith("event: ")) event.event = line.slice(7).trim();
366
- else if (line.startsWith("data: ")) event.data = line.slice(6).trim();
367
- }
368
- yield event;
369
- }
324
+ ### Must-Test Cases: Session, 404 Reinitialize, DELETE, Protocol Version Header
325
+
326
+ #### 1) Initialization returns `MCP-Session-Id` (stateful mode)
327
+
328
+ ```ts
329
+ const MCP_HEADERS = {
330
+ "Content-Type": "application/json",
331
+ Accept: "application/json, text/event-stream"
332
+ };
333
+
334
+ function initializeBody() {
335
+ return JSON.stringify({
336
+ jsonrpc: "2.0",
337
+ id: 1,
338
+ method: "initialize",
339
+ params: {
340
+ protocolVersion: "2025-11-25",
341
+ capabilities: {},
342
+ clientInfo: { name: "itest", version: "0.0.1" }
370
343
  }
371
- } finally {
372
- reader.releaseLock();
373
- }
344
+ });
374
345
  }
375
- ```
376
-
377
- **SSE Test Flow**:
378
-
379
- ```typescript
380
- it("GET /sse returns an endpoint event", async () => {
381
- const controller = new AbortController();
382
- try {
383
- const response = await fetch(`${baseUrl}/sse`, {
384
- headers: { Accept: "text/event-stream" },
385
- signal: controller.signal
386
- });
387
346
 
388
- expect(response.headers.get("content-type")).toContain("text/event-stream");
389
-
390
- const gen = parseSseEvents(response);
391
- const first = await gen.next();
392
- const event = first.value as SseEvent;
347
+ it("returns MCP-Session-Id during initialize", async () => {
348
+ const res = await fetch(`${baseUrl}/mcp`, {
349
+ method: "POST",
350
+ headers: MCP_HEADERS,
351
+ body: initializeBody()
352
+ });
393
353
 
394
- expect(event.event).toBe("endpoint");
395
- expect(event.data).toContain("/messages?sessionId=");
396
- } finally {
397
- controller.abort(); // Required: clean up the long-lived connection
398
- }
354
+ expect(res.status).toBe(200);
355
+ expect(res.headers.get("MCP-Session-Id")).toBeTruthy();
399
356
  });
400
357
  ```
401
358
 
402
- > **Note**: In SSE mode, when calling `handlePostMessage` via `POST /messages`, you must pass `req.body` as the third argument, because Express's `json()` middleware has already consumed the raw body stream. This is a common pitfall.
403
-
404
- > **Important**: `SSE=true` is incompatible with `REMOTE_AUTHORIZATION=true`. The environment validation layer enforces this constraint at startup. If you need remote per-request authentication, use Streamable HTTP transport instead.
405
-
406
- ### Session Lifecycle Testing
359
+ #### 2) Post-init requests require `MCP-Session-Id` and valid protocol version
407
360
 
408
- Session management is the most bug-prone area of an MCP Server. You must test the complete lifecycle:
409
-
410
- ```typescript
411
- describe("Session DELETE", () => {
412
- it("POST with the same session returns 404 after DELETE", async () => {
413
- const sessionId = await initializeSession(baseUrl);
414
-
415
- // Delete the session
416
- await fetch(`${baseUrl}/mcp`, {
417
- method: "DELETE",
418
- headers: { ...MCP_HEADERS, "mcp-session-id": sessionId }
419
- });
420
-
421
- // Attempt to use the deleted session
422
- const res = await fetch(`${baseUrl}/mcp`, {
423
- method: "POST",
424
- headers: { ...MCP_HEADERS, "mcp-session-id": sessionId },
425
- body: JSON.stringify({
426
- jsonrpc: "2.0",
427
- id: 2,
428
- method: "tools/list",
429
- params: {}
430
- })
431
- });
432
-
433
- expect(res.status).toBe(404);
434
- const body = await res.json();
435
- expect(body.error?.code).toBe(-32001);
361
+ ```ts
362
+ it("accepts post-init requests that carry session and protocol headers", async () => {
363
+ const initRes = await fetch(`${baseUrl}/mcp`, {
364
+ method: "POST",
365
+ headers: MCP_HEADERS,
366
+ body: initializeBody()
367
+ });
368
+ const sessionId = initRes.headers.get("MCP-Session-Id")!;
369
+
370
+ const res = await fetch(`${baseUrl}/mcp`, {
371
+ method: "POST",
372
+ headers: {
373
+ ...MCP_HEADERS,
374
+ "MCP-Session-Id": sessionId,
375
+ "MCP-Protocol-Version": "2025-11-25"
376
+ },
377
+ body: JSON.stringify({ jsonrpc: "2.0", id: 2, method: "tools/list", params: {} })
436
378
  });
437
- });
438
- ```
439
-
440
- **Garbage Collection Testing**: Trigger immediate expiration by setting `SESSION_TIMEOUT_SECONDS` to 0:
441
-
442
- ```typescript
443
- it("GC cleans up expired sessions", async () => {
444
- (ctx.env as any).SESSION_TIMEOUT_SECONDS = 0;
445
- // ... create session ...
446
- expect(result.sessions.size).toBe(1);
447
379
 
448
- await result.garbageCollectSessions();
449
- expect(result.sessions.size).toBe(0);
380
+ expect(res.status).toBe(200);
450
381
  });
451
382
  ```
452
383
 
453
- **Client Disconnection Testing** requires polling, since TCP close is not synchronous:
384
+ #### 3) Invalid protocol version returns `400`
454
385
 
455
- ```typescript
456
- it("SSE session is cleaned up after client disconnects", async () => {
457
- controller.abort();
386
+ ```ts
387
+ it("returns 400 for invalid MCP-Protocol-Version", async () => {
388
+ const initRes = await fetch(`${baseUrl}/mcp`, {
389
+ method: "POST",
390
+ headers: MCP_HEADERS,
391
+ body: initializeBody()
392
+ });
393
+ const sessionId = initRes.headers.get("MCP-Session-Id")!;
394
+
395
+ const res = await fetch(`${baseUrl}/mcp`, {
396
+ method: "POST",
397
+ headers: {
398
+ ...MCP_HEADERS,
399
+ "MCP-Session-Id": sessionId,
400
+ "MCP-Protocol-Version": "invalid-version"
401
+ },
402
+ body: JSON.stringify({ jsonrpc: "2.0", id: 2, method: "tools/list", params: {} })
403
+ });
458
404
 
459
- const deadline = Date.now() + 2000;
460
- while (result.sseSessions.size > 0 && Date.now() < deadline) {
461
- await new Promise((r) => setTimeout(r, 50));
462
- }
463
- expect(result.sseSessions.size).toBe(0);
405
+ expect(res.status).toBe(400);
464
406
  });
465
407
  ```
466
408
 
467
- ---
409
+ #### 4) Session termination and re-initialization (`404` behavior)
468
410
 
469
- ## Layer 3: Security and Policy Testing
411
+ If your server expires or terminates sessions, requests carrying a stale session ID should return `404`, and the client should re-initialize without sending the old session ID.
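A minimal sketch of the client behavior this case should pin down, assuming a fetch-compatible function is injected so the logic can also run against a fake server. `postWithReinitialize`, `FetchLike`, and `MiniResponse` are hypothetical names, not TypeScript SDK APIs. For brevity it omits the `notifications/initialized` message and the `MCP-Protocol-Version` header a real client would send after re-initializing.

```ts
// Minimal response/fetch shapes: the global fetch satisfies them, and so can a test fake.
interface MiniResponse {
  status: number;
  headers: { get(name: string): string | null };
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<MiniResponse>;

const BASE_HEADERS: Record<string, string> = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream"
};

// Hypothetical helper: POST a message using the current session ID. On 404
// (session expired or terminated), re-initialize WITHOUT the stale ID, store
// the new ID in `state`, and retry the original message once.
async function postWithReinitialize(
  fetchFn: FetchLike,
  url: string,
  state: { sessionId: string | null },
  initializeBody: string,
  messageBody: string
): Promise<MiniResponse> {
  const send = (body: string, sessionId: string | null) =>
    fetchFn(url, {
      method: "POST",
      headers: sessionId ? { ...BASE_HEADERS, "MCP-Session-Id": sessionId } : { ...BASE_HEADERS },
      body
    });

  const first = await send(messageBody, state.sessionId);
  if (first.status !== 404) return first;

  // Stale session: start a new one with no MCP-Session-Id header at all.
  const init = await send(initializeBody, null);
  const newId = init.headers.get("MCP-Session-Id");
  if (init.status !== 200 || !newId) {
    throw new Error(`re-initialize failed with status ${init.status}`);
  }
  state.sessionId = newId;
  return send(messageBody, newId);
}
```

Against a real server, pass a thin wrapper around the global `fetch` that records outgoing headers, so the test can assert that the re-initialize request carried no `MCP-Session-Id`.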
470
412
 
471
- ### Remote Authentication Flow
413
+ #### 5) DELETE semantics (`200` or `405`)
472
414
 
473
- When `REMOTE_AUTHORIZATION=true`, the token comes from the HTTP header rather than an environment variable:
415
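+ The server may support explicit session termination via `DELETE`, or may respond `405 Method Not Allowed` if it does not let clients terminate sessions. If termination is supported, later requests with that session ID should return `404`.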
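A sketch of a DELETE check that accepts both outcomes, assuming the same injected fetch shape (`checkDeleteSemantics`, `FetchLike`, and `MiniResponse` are hypothetical). It deliberately treats any `2xx` as a successful termination rather than requiring exactly `200`, and then requires the old session ID to be rejected with `404`.

```ts
interface MiniResponse {
  status: number;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<MiniResponse>;

// Hypothetical checker: DELETE the session and accept either allowed outcome.
// If the server terminated the session, the old ID must now be rejected with 404.
async function checkDeleteSemantics(
  fetchFn: FetchLike,
  url: string,
  sessionId: string,
  messageBody: string
): Promise<"terminated" | "not-supported"> {
  const headers = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    "MCP-Session-Id": sessionId
  };
  const del = await fetchFn(url, { method: "DELETE", headers });
  if (del.status === 405) return "not-supported"; // clients may not terminate sessions
  if (del.status < 200 || del.status >= 300) {
    throw new Error(`unexpected DELETE status ${del.status}`);
  }
  const after = await fetchFn(url, { method: "POST", headers, body: messageBody });
  if (after.status !== 404) {
    throw new Error(`expected 404 after DELETE, got ${after.status}`);
  }
  return "terminated";
}
```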
474
416
 
475
- ```typescript
476
- function buildRemoteAuthContext() {
477
- const ctx = buildContext({ token: null }); // No default token
478
- (ctx.env as any).REMOTE_AUTHORIZATION = true;
479
- (ctx.env as any).HTTP_JSON_ONLY = true;
480
- return ctx;
481
- }
417
+ ### SSE Stream Tests (GET/POST SSE on Streamable HTTP)
482
418
 
483
- describe("Remote Authentication", () => {
484
- it("missing token returns 401 + error code -32010", async () => {
485
- const res = await fetch(`${baseUrl}/mcp`, {
486
- method: "POST",
487
- headers: MCP_HEADERS, // No Authorization
488
- body: initializeBody()
489
- });
490
- expect(res.status).toBe(401);
491
- const body = await res.json();
492
- expect(body.error?.code).toBe(-32010);
493
- });
419
+ In Streamable HTTP, SSE happens on the **same endpoint**:
494
420
 
495
- it("Bearer token is passed via Authorization header", async () => {
496
- const res = await fetch(`${baseUrl}/mcp`, {
497
- method: "POST",
498
- headers: {
499
- ...MCP_HEADERS,
500
- Authorization: "Bearer test-remote-token"
501
- },
502
- body: initializeBody()
503
- });
504
- expect(res.status).toBe(200);
505
- expect(res.headers.get("mcp-session-id")).toBeTruthy();
506
- });
421
+ - A POST response can be either `application/json` or `text/event-stream`.
422
+ - GET may open a standalone SSE channel for notifications.
507
423
 
508
- it("authentication context propagates to session state", async () => {
509
- // Initialize session with token
510
- const initRes = await fetch(`${baseUrl}/mcp`, {
511
- method: "POST",
512
- headers: {
513
- ...MCP_HEADERS,
514
- Authorization: "Bearer my-secret-token"
515
- },
516
- body: initializeBody()
517
- });
518
- const sessionId = initRes.headers.get("mcp-session-id")!;
424
+ #### 1) `GET /mcp` should be SSE (`200`) or unsupported (`405`)
519
425
 
520
- // Verify internal auth state of the session
521
- const session = result.sessions.get(sessionId);
522
- expect(session!.auth?.token).toBe("my-secret-token");
523
- expect(session!.auth?.header).toBe("authorization");
426
+ ```ts
427
+ it("GET /mcp returns SSE or 405", async () => {
428
+ const res = await fetch(`${baseUrl}/mcp`, {
429
+ method: "GET",
430
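+ // Stateful servers may also require an MCP-Session-Id from a prior initialize here, and return 400 without one.
+ headers: { Accept: "text/event-stream" }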
524
431
  });
525
- });
526
- ```
527
432
 
528
- ### Error Handling and Sensitive Information Redaction
529
-
530
- Error handling is a security-critical path. Two modes need to be tested:
531
-
532
- | Mode | `GITLAB_ERROR_DETAIL_MODE` | Behavior |
533
- | ------ | -------------------------- | ------------------------------------------------------------------ |
534
- | `full` | Implementation-dependent | Returns full error details (better debugging, higher leakage risk) |
535
- | `safe` | Recommended for production | Hides internal details, returns generic messages |
536
-
537
- ```typescript
538
- describe("Error Handling", () => {
539
- it("GitLabApiError 404 → isError + status code", async () => {
540
- const getProject = vi.fn().mockRejectedValue(new GitLabApiError("Not Found", 404));
541
- // ...
542
- expect(result.isError).toBe(true);
543
- expect(text).toContain("GitLab API error 404");
544
- });
545
-
546
- it("safe mode hides error details", async () => {
547
- (ctx.env as any).GITLAB_ERROR_DETAIL_MODE = "safe";
548
-
549
- const getProject = vi
550
- .fn()
551
- .mockRejectedValue(new Error("DB connection failed: password=hunter2"));
552
- // ...
553
- expect(text).toBe("Request failed"); // Generic message
554
- expect(text).not.toContain("hunter2"); // No leakage
555
- });
556
-
557
- it("non-Error thrown values return Unknown error", async () => {
558
- const getProject = vi.fn().mockRejectedValue("string error");
559
- // ...
560
- expect(text).toBe("Unknown error");
561
- });
433
+ const ct = res.headers.get("content-type") ?? "";
434
+ expect([200, 405]).toContain(res.status);
435
+ if (res.status === 200) {
436
+ expect(ct).toContain("text/event-stream");
+ await res.body?.cancel(); // close the long-lived SSE stream so the test does not hang
437
+ }
562
438
  });
563
439
  ```
564
440
 
565
- **Token Redaction Testing** ensure tokens are not leaked in error details. Using `it.each` reduces boilerplate when testing multiple token patterns:
566
-
567
- ```typescript
568
- describe("Token Redaction", () => {
569
- it.each([
570
- ["GitLab PAT", "glpat-abcdef1234567890"],
571
- ["GitHub PAT", "ghp_abcdef1234567890abcde"],
572
- ["JWT", "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.payload"]
573
- ])("redacts %s token", async (label, token) => {
574
- const getProject = vi.fn().mockRejectedValue(
575
- new GitLabApiError("Unauthorized", 401, {
576
- message: `Token ${token} is invalid`
577
- })
578
- );
579
- // ...
580
- expect(text).toContain("[REDACTED]");
581
- expect(text).not.toContain(token);
582
- });
441
+ #### 2) JSON-only mode should reject GET (`405`)
583
442
 
584
- it("redacts sensitive object keys (authorization, password, secret)", async () => {
585
- const getProject = vi.fn().mockRejectedValue(
586
- new GitLabApiError("Error", 400, {
587
- authorization: "Bearer secret-val",
588
- password: "hunter2",
589
- message: "safe value" // Non-sensitive key is preserved
590
- })
591
- );
592
- // ...
593
- expect(text).not.toContain("secret-val");
594
- expect(text).not.toContain("hunter2");
595
- expect(text).toContain("safe value");
596
- });
597
- });
598
- ```
443
+ If you explicitly enable JSON-only response mode, test that `GET /mcp` returns `405`.
599
444
 
600
- ### Policy Engine: Read-Only, Feature Flags, Allowlists
445
+ #### 3) SSE disconnect/reconnect behavior
601
446
 
602
- The policy engine determines which tools are available. You must test various combinations:
447
+ Avoid fixed sleeps. Prefer deadline + polling assertions when testing reconnect behavior.
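A minimal deadline-plus-polling helper. `waitFor` here is a hypothetical local function; if your Vitest version provides `vi.waitFor`, it serves the same purpose.

```ts
// Hypothetical polling helper: re-check `condition` every `intervalMs` until it
// holds, failing once `timeoutMs` has elapsed. Replaces fixed sleeps in tests.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 2000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  // Final check so a condition that became true right at the deadline still passes.
  if (await condition()) return;
  throw new Error(`condition not met within ${timeoutMs} ms`);
}
```

For example, after aborting an SSE request, `await waitFor(() => sessions.size === 0)`, where `sessions` stands in for whatever session registry your server exposes to tests.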
603
448
 
604
- ```typescript
605
- describe("GraphQL Tool Policy", () => {
606
- it("disables GraphQL tools when ALLOWED_PROJECT_IDS is set", async () => {
607
- const { client, clientTransport, serverTransport } = await createLinkedPair(
608
- buildContext({ allowedProjectIds: ["123"] })
609
- );
610
- try {
611
- const names = (await client.listTools()).tools.map((t) => t.name);
612
- expect(names).not.toContain("gitlab_execute_graphql_query");
613
- expect(names).not.toContain("gitlab_execute_graphql_mutation");
614
- } finally {
615
- await clientTransport.close();
616
- await serverTransport.close();
617
- }
618
- });
449
+ ---
619
450
 
620
- it("GITLAB_ALLOW_GRAPHQL_WITH_PROJECT_SCOPE overrides the restriction", async () => {
621
- const { client, clientTransport, serverTransport } = await createLinkedPair(
622
- buildContext({
623
- allowedProjectIds: ["123"],
624
- allowGraphqlWithProjectScope: true
625
- })
626
- );
627
- try {
628
- const names = (await client.listTools()).tools.map((t) => t.name);
629
- expect(names).toContain("gitlab_execute_graphql_query");
630
- } finally {
631
- await clientTransport.close();
632
- await serverTransport.close();
633
- }
634
- });
451
+ ## Layer 2.5: Legacy HTTP+SSE Compatibility Tests (Only if Needed)
635
452
 
636
- it("compat tool blocks mutation in read-only mode", async () => {
637
- const { client, clientTransport, serverTransport } = await createLinkedPair(
638
- buildContext({ readOnlyMode: true, gitlabStub: { executeGraphql: vi.fn() } })
639
- );
640
- try {
641
- const result = await client.callTool({
642
- name: "gitlab_execute_graphql",
643
- arguments: { query: "mutation { createProject { id } }" }
644
- });
645
- expect(result.isError).toBe(true);
646
- } finally {
647
- await clientTransport.close();
648
- await serverTransport.close();
649
- }
650
- });
453
+ Streamable HTTP, introduced in spec 2025-03-26, replaces the legacy HTTP+SSE transport. If you still need compatibility with older clients:
651
454
 
652
- it("string literal containing 'mutation' does not trigger false positive", async () => {
653
- const executeGraphql = vi.fn().mockResolvedValue({ data: {} });
654
- const { client, clientTransport, serverTransport } = await createLinkedPair(
655
- buildContext({ gitlabStub: { executeGraphql } })
656
- );
657
- try {
658
- const result = await client.callTool({
659
- name: "gitlab_execute_graphql_query",
660
- arguments: { query: '{ project(name: "mutation thing") { id } }' }
661
- });
662
- expect(result.isError).toBeFalsy(); // Should not error
663
- } finally {
664
- await clientTransport.close();
665
- await serverTransport.close();
666
- }
667
- });
668
- });
669
- ```
455
+ - Clearly label it as **legacy transport**.
456
+ - Keep minimal black-box tests for old endpoints.
457
+ - Implement new capabilities only on Streamable HTTP, not legacy.
670
458
 
671
459
  ---
672
460
 
673
- ## Layer 4: Agent Loop Integration Testing
461
+ ## Layer 3: Security / Auth / Policy Tests (P0/P1)
674
462
 
675
- ### ScriptedLLM Pattern
463
+ ### Origin/Host Protection (DNS Rebinding)
676
464
 
677
- The real MCP use case is an LLM Agent calling tools. However, testing directly with a real LLM is non-deterministic, slow, and expensive. The recommended approach is **ScriptedLLM** — a pre-programmed sequence of LLM responses.
465
+ Recommended cases:
678
466
 
679
- > **Note**: Layer 4 and Layer 5 patterns are high-value strategies. Some teams already implement them, while others can adopt them incrementally as a roadmap.
467
+ - Missing `Origin` (allow/deny based on policy, but keep behavior consistent).
468
+ - Invalid `Origin` outside allowlist → `403`.
469
+ - Invalid `Host` is rejected when host validation is enabled.
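A table-driven sketch of the Origin cases, assuming an injected fetch-like function and a hypothetical allowlist containing `http://localhost:3000`. The missing-Origin row encodes "allow"; flip it if your policy denies. `Host` is left out because some HTTP clients do not let you override it, so that case may need a lower-level client such as `node:http`. `runOriginCases` and `OriginCase` are hypothetical names.

```ts
interface MiniResponse {
  status: number;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<MiniResponse>;

interface OriginCase {
  name: string;
  origin?: string; // undefined = send no Origin header
  expect: number[];
}

// Hypothetical case table; adjust the allowlisted origin and the missing-Origin policy.
const ORIGIN_CASES: OriginCase[] = [
  { name: "allowlisted origin", origin: "http://localhost:3000", expect: [200] },
  { name: "origin outside allowlist", origin: "https://evil.example", expect: [403] },
  // Non-browser clients usually omit Origin; this table assumes such requests are allowed.
  { name: "missing origin", expect: [200] }
];

// Sends one request per case (e.g. an initialize body) and describes every mismatch.
async function runOriginCases(
  fetchFn: FetchLike,
  url: string,
  body: string,
  cases: OriginCase[] = ORIGIN_CASES
): Promise<string[]> {
  const failures: string[] = [];
  for (const c of cases) {
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream"
    };
    if (c.origin !== undefined) headers["Origin"] = c.origin;
    const res = await fetchFn(url, { method: "POST", headers, body });
    if (!c.expect.includes(res.status)) {
      failures.push(`${c.name}: got ${res.status}, expected ${c.expect.join(" or ")}`);
    }
  }
  return failures;
}
```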
680
470
 
681
- ```typescript
682
- class ScriptedLLM {
683
- private cursor = 0;
684
- constructor(private responses: LLMResponse[]) {}
471
+ ### OAuth/Authorization (Resource Metadata Discovery)
685
472
 
686
- async createMessage(messages: Message[], tools: Tool[]) {
687
- if (this.cursor >= this.responses.length) {
688
- // Default end response
689
- return { content: [{ type: "text", text: "Done" }] };
690
- }
691
- return this.responses[this.cursor++];
692
- }
693
- }
473
+ If your HTTP transport supports OAuth-compliant authorization, minimally test:
694
474
 
695
- // Test case
696
- it("Agent calls tool and processes result", async () => {
697
- const listProjects = vi.fn().mockResolvedValue([{ id: 1, name: "alpha" }]);
698
-
699
- const { client } = await createLinkedPair(buildContext({ gitlabStub: { listProjects } }));
700
-
701
- const llm = new ScriptedLLM([
702
- // Round 1: LLM decides to call a tool
703
- {
704
- content: [
705
- {
706
- type: "tool_use",
707
- id: "call-1",
708
- name: "gitlab_list_projects",
709
- input: { search: "alpha" }
710
- }
711
- ]
712
- },
713
- // Round 2: LLM sees tool result and gives final answer
714
- {
715
- content: [{ type: "text", text: "Found project alpha" }]
716
- }
717
- ]);
475
+ - An unauthenticated request returns `401` with a `WWW-Authenticate` challenge pointing to the resource metadata.
476
+ - If you issue scope challenges, the challenge carries the expected `scope` parameter.
477
+ - The Protected Resource Metadata document (RFC 9728) is served and lists the expected `authorization_servers`.
718
478
 
719
- const result = await runAgentLoop({ client, llm, query: "find alpha" });
479
+ If you use a private bearer token model instead of full OAuth discovery, document and test that explicitly as a custom mode.
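A sketch of the two OAuth checks, assuming the Bearer challenge format of RFC 6750 with the `resource_metadata` parameter from RFC 9728, and a metadata document served at `/.well-known/oauth-protected-resource`. `parseBearerChallenge` and `checkResourceMetadata` are hypothetical helpers; the parser handles a single challenge and is not a full RFC 9110 parser.

```ts
// Hypothetical helper: parse the auth-params of a single Bearer challenge, e.g.
// Bearer resource_metadata="https://mcp.example.com/.well-known/oauth-protected-resource", scope="files:read"
// Returns null for non-Bearer schemes.
function parseBearerChallenge(header: string): Record<string, string> | null {
  const scheme = /^\s*Bearer\b\s*(.*)$/i.exec(header);
  if (!scheme) return null;
  const params: Record<string, string> = {};
  // name=token or name="quoted string" (with backslash escapes)
  const paramRe = /([A-Za-z0-9_-]+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s,]+))/g;
  let m: RegExpExecArray | null;
  while ((m = paramRe.exec(scheme[1])) !== null) {
    params[m[1].toLowerCase()] = m[2] !== undefined ? m[2].replace(/\\(.)/g, "$1") : m[3];
  }
  return params;
}

// Checks the fields an MCP client relies on in an RFC 9728 metadata document.
function checkResourceMetadata(doc: unknown, expectedAuthServer: string): string[] {
  const problems: string[] = [];
  const d = (doc ?? {}) as { resource?: unknown; authorization_servers?: unknown };
  if (typeof d.resource !== "string") problems.push("missing `resource`");
  const servers = d.authorization_servers;
  if (!Array.isArray(servers) || !servers.includes(expectedAuthServer)) {
    problems.push(`authorization_servers does not list ${expectedAuthServer}`);
  }
  return problems;
}
```

In an HTTP test, feed the `WWW-Authenticate` header of the `401` response to the parser, then fetch the `resource_metadata` URL and pass the parsed JSON to the metadata check.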
720
480
 
721
- expect(listProjects).toHaveBeenCalled();
722
- expect(result).toContain("alpha");
723
- });
724
- ```
481
+ ### Error Handling and Secret Redaction
725
482
 
726
- **Key Assertion Strategies**:
483
+ Differentiate:
727
484
 
728
- - Assert **whether a tool was called** and **with what arguments** — deterministic
729
- - Assert **the final output contains key information** semi-deterministic
730
- - **Do not** assert the full text of natural language output — unstable
485
+ - **Tool-level error**: the request reached the tool, which reported failure via `result.isError === true`.
486
+ - **Protocol-level error**: the request itself failed; the client receives a JSON-RPC error response or a transport exception.
731
487
 
732
- ### Real LLM Smoke Testing
488
+ Must-test:
733
489
 
734
- A small number of scenarios can use a real LLM (run in nightly CI):
490
+ - Each error source maps to a stable, predictable error shape.
491
+ - No leakage of sensitive values (`Authorization`, `password`, `token`, cookies).
492
+ - Security mode vs debug mode output differences.
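One way to cover both distinctions: wrap each call so tool-level and protocol-level failures become distinct outcomes, then scan the output for canary secrets you planted in the stubbed error. This assumes an SDK-style client whose `callTool` resolves with `isError` for tool failures and rejects for JSON-RPC errors; `classifyCall`, `findLeakedSecrets`, `ToolResult`, and the canary values are hypothetical.

```ts
// Hypothetical shapes, loosely mirroring an MCP CallToolResult.
interface ToolResult {
  isError?: boolean;
  content: { type: string; text?: string }[];
}
type Outcome =
  | { kind: "ok" | "tool-error"; text: string }
  | { kind: "protocol-error"; message: string };

// A resolved result is tool-level (isError marks failure); a rejected call
// means the request itself failed at the protocol or transport level.
async function classifyCall(call: () => Promise<ToolResult>): Promise<Outcome> {
  try {
    const result = await call();
    const text = result.content.map((c) => c.text ?? "").join("\n");
    return { kind: result.isError ? "tool-error" : "ok", text };
  } catch (err) {
    return { kind: "protocol-error", message: err instanceof Error ? err.message : String(err) };
  }
}

// Returns each planted canary secret that appears verbatim in the output.
function findLeakedSecrets(output: string, canaries: string[]): string[] {
  return canaries.filter((secret) => secret.length > 0 && output.includes(secret));
}
```

Run the same canary check in both security and debug modes, and on protocol-error messages too: debug output may legitimately carry more detail, but no mode should contain a canary.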
735
493
 
736
- ```typescript
737
- // Only run in CI with an API key
738
- describe.skipIf(!process.env.ANTHROPIC_API_KEY)("LLM E2E Smoke", () => {
739
- it("can discover and call the health_check tool", async () => {
740
- const result = await runAgentLoop({
741
- client,
742
- llm: new AnthropicLLM(process.env.ANTHROPIC_API_KEY!),
743
- query: "Check the server health"
744
- });
494
+ ### Policy Combinatorics: Read-Only, Allowlists, Feature Flags
745
495
 
746
- // Loose assertion — just needs to produce a result
747
- expect(result.length).toBeGreaterThan(0);
748
- }, 30_000); // Generous timeout
749
- });
750
- ```
496
+ Policy regressions are common.
751
497
 
752
- ---
498
+ Recommended approach:
753
499
 
754
- ## Layer 5: Inspector CLI Black-Box Testing
500
+ - Layer 1: contract tests on `listTools()` for key policy combinations.
501
+ - Layer 2: a small number of end-to-end HTTP tests combining session + policy.
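A small sketch for generating the Layer 1 policy combinations, assuming your policy can be expressed as independent axes. `combinations` is a hypothetical helper, and the axis names (`readOnly`, `allowlist`, `wiki`) are placeholders for your server's real settings.

```ts
// Cartesian product of named axes: one object per combination of axis values.
function combinations<T extends Record<string, readonly unknown[]>>(
  axes: T
): Array<{ [K in keyof T]: T[K][number] }> {
  let result: Array<Record<string, unknown>> = [{}];
  for (const [key, values] of Object.entries(axes)) {
    // Extend every partial combination with each value of the current axis.
    result = result.flatMap((partial) => values.map((value) => ({ ...partial, [key]: value })));
  }
  return result as unknown as Array<{ [K in keyof T]: T[K][number] }>;
}

// Hypothetical axes; replace with your server's real policy settings.
const policyCases = combinations({
  readOnly: [true, false],
  allowlist: [null, ["123"]],
  wiki: [true, false]
});
```

Each generated object can drive a table-driven contract test (for example Vitest's `it.each(policyCases)`) that builds a context from the combination and compares `listTools()` with the tools you expect to be visible. The product grows quickly, so include only axes that actually interact.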
755
502
 
756
- [MCP Inspector](https://github.com/modelcontextprotocol/inspector)'s `--cli` mode is designed for scripting and CI, outputting JSON format. It's ideal for **black-box contract testing**.
503
+ ---
757
504
 
758
- ### Local STDIO Testing
505
+ ## Layer 4: Conformance Testing (Strongly Recommended)
759
506
 
760
- Use your project’s actual compiled stdio entrypoint (for example, `dist/index.js`).
507
+ Conformance tests catch protocol drift during spec and SDK upgrades.
761
508
 
762
509
  ```bash
763
- # List tools
764
- npx @modelcontextprotocol/inspector --cli \
765
- node dist/index.js \
766
- --method tools/list
767
-
768
- # Call a tool
769
- npx @modelcontextprotocol/inspector --cli \
770
- node dist/index.js \
771
- --method tools/call \
772
- --tool-name health_check
773
- ```
510
+ # Run server conformance scenarios against a running server
511
+ npx @modelcontextprotocol/conformance server --url http://localhost:3000/mcp
774
512
 
775
- ### Remote HTTP Testing
513
+ # Run one scenario only
514
+ npx @modelcontextprotocol/conformance server --url http://localhost:3000/mcp --scenario server-initialize
776
515
 
777
- ```bash
778
- # Streamable HTTP (with auth header)
779
- npx @modelcontextprotocol/inspector --cli \
780
- https://my-mcp-server.example.com \
781
- --transport http \
782
- --method tools/list \
783
- --header "Authorization: Bearer $TOKEN"
516
+ # List all scenarios
517
+ npx @modelcontextprotocol/conformance list
784
518
  ```
785
519
 
786
- ### Embedding in Vitest
787
-
788
- ```typescript
789
- import { execa } from "execa";
790
-
791
- test("Inspector CLI: tool list contains health_check", async () => {
792
- const { stdout } = await execa("npx", [
793
- "-y",
794
- "@modelcontextprotocol/inspector",
795
- "--cli",
796
- "node",
797
- "dist/index.js",
798
- "--method",
799
- "tools/list"
800
- ]);
801
-
802
- const res = JSON.parse(stdout);
803
- const names = res.tools.map((t: { name: string }) => t.name);
804
- expect(names).toContain("health_check");
805
- });
806
- ```
520
+ Recommended usage:
807
521
 
808
- > **When to use Inspector CLI vs SDK tests**: If you want stable API-level tests (unaffected by CLI output format changes), prefer SDK + InMemoryTransport. Inspector CLI is better suited for post-deployment smoke verification.
522
+ - Nightly: full conformance suite.
523
+ - Pre-release: blocking gate.
809
524
 
810
525
  ---
811
526
 
812
- ## CI/CD Integration Strategy
813
-
814
- ### Recommended Layered Strategy
815
-
816
- | Trigger | Test Type | Tools | Duration |
817
- | ----------------- | -------------------------------- | -------------------- | -------- |
818
- | Every commit / PR | InMemoryTransport protocol tests | Vitest + SDK | < 5s |
819
- | Every commit / PR | HTTP/SSE transport layer tests | Vitest + real server | < 10s |
820
- | Every commit / PR | Security/policy/error handling | Vitest + SDK | < 5s |
821
- | Every PR | Inspector CLI contract test | Inspector --cli | < 15s |
822
- | Nightly | Agent Loop (ScriptedLLM) | Vitest + SDK | < 30s |
823
- | Nightly | LLM E2E smoke test | Vitest + real LLM | < 60s |
824
- | Pre-release | Containerized full-stack test | Docker + Inspector | < 5min |
825
-
826
- ### package.json Script Organization
827
-
828
- The following script layout is one practical example. Rename or regroup scripts based on your repository structure and CI strategy.
829
-
830
- ```json
831
- {
832
- "scripts": {
833
- "test": "vitest run",
834
- "test:unit": "vitest run tests/unit",
835
- "test:integration": "vitest run tests/integration",
836
- "test:e2e": "vitest run tests/e2e",
837
- "test:smoke": "vitest run tests/smoke --timeout=60000",
838
- "typecheck": "tsc --noEmit",
839
- "lint": "eslint ."
840
- }
841
- }
842
- ```
527
+ ## Layer 5: Agent Loop Integration Tests (ScriptedLLM + Small Real-LLM Smoke)
843
528
 
844
- ### CI Configuration Essentials
529
+ Core principle: assert deterministic signals (tool sequence and arguments), not exact natural-language text.
845
530
 
846
- ```yaml
847
- # .github/workflows/test.yml or equivalent .gitlab-ci.yml
848
- test:
849
- steps:
850
- - run: pnpm typecheck # Type check first
851
- - run: pnpm lint # Then lint
852
- - run: pnpm test # Finally run all tests
853
- ```
531
+ ### ScriptedLLM mode (recommended)
854
532
 
855
- > **Note**: For SSE tests involving TCP connection closure (e.g., client disconnection), use **polling with a deadline** instead of a fixed `setTimeout` to avoid flaky tests caused by timing differences in CI environments.
533
+ - Drive agents with pre-scripted LLM responses.
534
+ - Assert called tools, arguments, and key entities in final output.
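A minimal ScriptedLLM sketch. The message shapes, `ScriptedLLM`, and `runAgentLoop` are hypothetical and only loosely modeled on tool-use style LLM APIs; `callTool` stands in for a call through an MCP client, such as one connected over InMemoryTransport. Because the script is fixed, tool results are not fed back to the model.

```ts
// Hypothetical message shapes, loosely modeled on tool-use style LLM APIs.
type LLMBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
interface LLMResponse {
  content: LLMBlock[];
}
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

const DONE: LLMResponse = { content: [{ type: "text", text: "Done" }] };

// Replays pre-programmed responses in order; returns DONE once the script is exhausted.
class ScriptedLLM {
  private cursor = 0;
  private readonly responses: LLMResponse[];
  constructor(responses: LLMResponse[]) {
    this.responses = responses;
  }
  async next(): Promise<LLMResponse> {
    const scripted = this.responses[this.cursor];
    this.cursor += 1;
    return scripted ?? DONE;
  }
}

// Minimal agent loop: execute every tool_use block via `callTool`, stop at the first
// response without tool calls, and return its text plus a log of the calls made.
async function runAgentLoop(
  llm: ScriptedLLM,
  callTool: (name: string, input: Record<string, unknown>) => Promise<string>,
  maxTurns = 10
): Promise<{ finalText: string; calls: ToolCall[] }> {
  const calls: ToolCall[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await llm.next();
    const toolUses = response.content.filter(
      (b): b is Extract<LLMBlock, { type: "tool_use" }> => b.type === "tool_use"
    );
    if (toolUses.length === 0) {
      const finalText = response.content.map((b) => (b.type === "text" ? b.text : "")).join("");
      return { finalText, calls };
    }
    for (const use of toolUses) {
      calls.push({ name: use.name, input: use.input });
      await callTool(use.name, use.input); // a real loop would pass this result back to the LLM
    }
  }
  throw new Error(`agent did not finish within ${maxTurns} turns`);
}
```

The assertions then target `calls` (which tools, with which arguments) and only key entities in `finalText`.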
856
535
 
857
- ---
536
+ ### Real-LLM smoke (small)
858
537
 
859
- ## Common Pitfalls and Solutions
538
+ - Keep to 1–3 scenarios.
539
+ - Use weak assertions (non-empty output, successful basic tool call).
540
+ - Run nightly or on-demand only.
860
541
 
861
- ### 1. Express Body Parser Conflicts with SSE handlePostMessage
542
+ ---
862
543
 
863
- **Problem**: The `express.json()` middleware consumes the raw body stream, causing `SSEServerTransport.handlePostMessage()` to fail internally when calling `getRawBody()` with a `stream is not readable` error.
544
+ ## Layer 6: Inspector CLI Black-Box Contract Tests (Pre/Post Deployment)
864
545
 
865
- **Solution**: Always pass `req.body` as the third argument:
546
+ Inspector's `--cli` mode provides black-box validation from the user's perspective, against either target:
866
547
 
867
- ```typescript
868
- // Wrong
869
- await session.transport.handlePostMessage(req, res);
548
+ - Local build artifact: `node dist/index.js` (stdio).
549
+ - Remote deployment: `https://your-domain/mcp` (Streamable HTTP).
870
550
 
871
- // ✓ Correct
872
- await session.transport.handlePostMessage(req, res, req.body);
551
+ ```bash
552
+ npx @modelcontextprotocol/inspector --cli node dist/index.js --method tools/list
873
553
  ```
874
554
 
875
- ### 2. Unclosed InMemoryTransport Causes Tests to Hang
555
+ ---
876
556
 
877
- **Problem**: Forgetting to close transports causes the test process to never exit.
557
+ ## CI/CD Layered Execution Strategy
878
558
 
879
- **Solution**: Always use the `try/finally` pattern:
559
+ | Trigger | Test Layers | Goal |
560
+ | ----------------- | ------------------------------------------- | --------------------------------------------------------------- |
561
+ | Every commit / PR | Layer 1 (InMemory) | P0 contract: tools/handlers/schema/policy |
562
+ | Every PR | Layer 2 (HTTP) | P0/P1: session, 404 reinitialize, protocol headers, GET SSE/405 |
563
+ | Nightly | Layer 4 (Conformance) | Spec compliance and upgrade early warning |
564
+ | Nightly | Layer 5 (Agent loop + small real LLM smoke) | Real usage path smoke checks |
565
+ | Pre-release | Layer 6 (Inspector + deployment black-box) | Release acceptance gate |
880
566
 
881
- ```typescript
882
- const { client, clientTransport, serverTransport } = await createLinkedPair(context);
883
- try {
884
- // Test logic
885
- } finally {
886
- await clientTransport.close();
887
- await serverTransport.close();
888
- }
889
- ```
567
+ ---
890
568
 
891
- ### 3. Shared HTTP Server Causes State Leakage
569
+ ## Common Pitfalls and Fixes (Updated for 2025-11-25)
892
570
 
893
- **Problem**: Multiple tests share the same `setupMcpHttpApp` instance, and session state bleeds between them.
571
+ 1. Treating legacy HTTP+SSE as the current protocol model.
894
572
 
895
- **Solution**: Tests requiring independent configuration (e.g., `MAX_SESSIONS=1`) should create their own server instance and clean up in `finally`.
573
+ - Fix: use Streamable HTTP as primary; isolate legacy compatibility.
896
574
 
897
- ### 4. Timing Issues with SSE Client Disconnection
575
+ 2. Sending JSON-RPC batch arrays in POST bodies.
898
576
 
899
- **Problem**: After `controller.abort()`, the server-side `res.on("close")` callback is not triggered synchronously.
577
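+ - Fix: send exactly one JSON-RPC message per POST. JSON-RPC batching was removed in spec 2025-06-18, so accept batch arrays only if you deliberately support older protocol versions.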
900
578
 
901
- **Solution**: Use polling with a deadline:
579
+ 3. Forgetting `MCP-Protocol-Version` / `MCP-Session-Id` in post-init requests.
902
580
 
903
- ```typescript
904
- const deadline = Date.now() + 2000;
905
- while (sessions.size > 0 && Date.now() < deadline) {
906
- await new Promise((r) => setTimeout(r, 50));
907
- }
908
- ```
581
+ - Fix: test valid, missing, and invalid header combinations.
909
582
 
910
- ### 5. False Positives in GraphQL Mutation Detection
583
+ 4. Missing Origin checks (DNS rebinding risk).
911
584
 
912
- **Problem**: A query containing the string literal `"mutation"` is incorrectly identified as a mutation operation.
585
+ - Fix: add origin/host allowlist tests and default-safe binding strategy.
913
586
 
914
- **Solution**: Strip comments and string values before detection:
587
+ 5. Incorrect body handling in HTTP middleware.
915
588
 
916
- ```typescript
917
- const normalized = query
918
- .replace(/#[^\n]*/g, " ") // Remove line comments
919
- .replace(/"""[\s\S]*?"""/g, " ") // Remove block strings
920
- .replace(/"(?:\\.|[^"\\])*"/g, " "); // Remove double-quoted strings
921
- ```
589
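+ - Fix: if a body parser such as `express.json()` has already consumed the stream, pass the parsed body to the transport explicitly (in the TypeScript SDK, `transport.handleRequest(req, res, req.body)`).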
922
590
 
923
- **You must write corresponding tests** to verify this behavior does not produce false positives.
591
+ 6. Misclassifying SSE disconnect as request cancellation.
924
592
 
925
- ### 6. TypeScript Compatibility with Environment Variable Type Overrides
593
+ - Fix: treat a dropped SSE stream as a disconnect, not a cancellation; cancel in-flight work only on an explicit `notifications/cancelled` message, and test that path directly.
926
594
 
927
- **Problem**: Directly assigning `ctx.env.HTTP_JSON_ONLY = true` may cause a TypeScript error due to `readonly` types.
595
+ 7. Multi-replica session stickiness issues.
928
596
 
929
- **Solution**: Use type assertion:
930
-
931
- ```typescript
932
- (ctx.env as { HTTP_JSON_ONLY: boolean }).HTTP_JSON_ONLY = true;
933
- ```
597
+ - Fix: sticky sessions or external session store for stateful transport behavior.
934
598
 
935
599
  ---
936
600
 
937
- ## Summary: Recommended Test Matrix
938
-
939
- The following table summarizes the test dimensions a mature MCP Server should cover:
940
-
941
- | Dimension | Test Method | Priority |
942
- | -------------------------------------------------- | ------------------------ | -------- |
943
- | Protocol handshake (initialize / list) | InMemoryTransport | P0 |
944
- | Tool Handler correctness | InMemoryTransport + stub | P0 |
945
- | Schema validation / boundary inputs | InMemoryTransport | P0 |
946
- | HTTP Session create/reuse/delete | Real HTTP server | P0 |
947
- | SSE connect/message/disconnect | Real HTTP server | P1 |
948
- | Session capacity limits | Real HTTP server | P1 |
949
- | Session rate limiting | Real HTTP server | P1 |
950
- | Session garbage collection | Real HTTP server | P1 |
951
- | Remote authentication (Bearer / Private-Token) | Real HTTP server | P1 |
952
- | Dynamic API URL | Real HTTP server | P2 |
953
- | Error handling (GitLabApiError / Error / unknown) | InMemoryTransport | P0 |
954
- | Token redaction (glpat / ghp / JWT) | InMemoryTransport | P1 |
955
- | Sensitive key redaction (password / authorization) | InMemoryTransport | P1 |
956
- | Safe mode vs full mode | InMemoryTransport | P1 |
957
- | Read-only mode tool filtering | InMemoryTransport | P0 |
958
- | Feature flags (wiki / pipeline / release) | InMemoryTransport | P1 |
959
- | Tool allowlist / blocklist | InMemoryTransport | P1 |
960
- | GraphQL mutation detection and policy | InMemoryTransport | P1 |
961
- | Agent Loop (ScriptedLLM) | InMemoryTransport | P2 |
962
- | Response truncation (maxBytes) | InMemoryTransport | P2 |
963
- | Health check endpoint | Real HTTP server | P2 |
964
- | Inspector CLI contract test | Inspector --cli | P2 |
965
- | Real LLM E2E | Real LLM API | P3 |
601
+ ## Recommended Test Matrix (Copy/Paste)
602
+
603
+ | Dimension | Method | Priority |
604
+ | ----------------------------------------------------- | --------------- | -------- |
605
+ | initialize / lifecycle | InMemory + HTTP | P0 |
606
+ | tools/resources/prompts list contracts | InMemory | P0 |
607
+ | tool handler correctness with stubs | InMemory | P0 |
608
+ | schema validation (bad input) | InMemory | P0 |
609
+ | Streamable HTTP accept/headers/status | HTTP | P0 |
610
+ | `MCP-Session-Id`: create/reuse/terminate/reinitialize | HTTP | P0 |
611
+ | `MCP-Protocol-Version`: missing/invalid/valid | HTTP | P0 |
612
+ | GET behavior: SSE or 405 | HTTP | P1 |
613
+ | Origin/Host validation | HTTP | P0/P1 |
614
+ | OAuth challenge + resource metadata | HTTP | P1 |
615
+ | SSE reconnect (`retry` / `Last-Event-ID`) | HTTP | P2 |
616
+ | conformance suite | CLI | P1 |
617
+ | inspector black-box on artifacts | CLI | P2 |
618
+ | agent loop (ScriptedLLM) | InMemory | P2 |
619
+ | real LLM smoke | external API | P3 |
966
620
 
967
621
  ---
968
622
 
969
- ## References
970
-
971
- - [MCP Official Specification](https://modelcontextprotocol.io/specification) (referenced version: 2025-11-25)
972
- - [MCP TypeScript SDK](https://github.com/modelcontextprotocol/typescript-sdk)
973
- - [MCP Inspector (including CLI mode)](https://github.com/modelcontextprotocol/inspector)
974
- - [MCP Best Practices Guide](https://modelcontextprotocol.info/docs/best-practices/)
975
- - [MCP Server E2E Testing Example](https://github.com/mkusaka/mcp-server-e2e-testing-example)
976
- - [MCPcat Integration Testing Guide](https://mcpcat.io/guides/integration-tests-mcp-flows/)
977
- - [MCPcat Unit Testing Guide](https://mcpcat.io/guides/writing-unit-tests-mcp-servers/)
978
- - [Stop Vibe-Testing Your MCP Server](https://www.jlowin.dev/blog/stop-vibe-testing-mcp-servers)
979
- - [MCP Server Testing Tools Overview (Testomat.io)](https://testomat.io/blog/mcp-server-testing-tools/)
980
- - [MCP Server Best Practices (MarkTechPost)](https://www.marktechpost.com/2025/07/23/7-mcp-server-best-practices-for-scalable-ai-integrations-in-2025/)
981
- - [MCP Official Node.js Client Tutorial](https://modelcontextprotocol.io/tutorials/building-a-client)
623
+ ## References and Compatibility Notes
624
+
625
+ - MCP Specification 2025-11-25: transports, sessions, protocol version behavior.
626
+ - MCP Specification 2025-11-25: authorization and Protected Resource Metadata.
627
+ - MCP Conformance tooling.
628
+ - MCP Inspector documentation.
629
+ - TypeScript SDK migration notes (v1 to v2).
630
+ - TypeScript SDK server/client guides for Streamable HTTP behavior.
631
+
632
+ ---