gitlab-mcp 1.1.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (106)
  1. package/LICENSE +21 -0
  2. package/dist/config/env.d.ts +56 -0
  3. package/dist/config/env.js +163 -0
  4. package/dist/config/env.js.map +1 -0
  5. package/dist/http-app.d.ts +45 -0
  6. package/dist/http-app.js +550 -0
  7. package/dist/http-app.js.map +1 -0
  8. package/dist/http.d.ts +2 -0
  9. package/dist/http.js +65 -0
  10. package/dist/http.js.map +1 -0
  11. package/dist/index.d.ts +2 -0
  12. package/dist/index.js +65 -0
  13. package/dist/index.js.map +1 -0
  14. package/dist/lib/auth-context.d.ts +9 -0
  15. package/dist/lib/auth-context.js +9 -0
  16. package/dist/lib/auth-context.js.map +1 -0
  17. package/dist/lib/gitlab-client.d.ts +331 -0
  18. package/dist/lib/gitlab-client.js +1025 -0
  19. package/dist/lib/gitlab-client.js.map +1 -0
  20. package/dist/lib/logger.d.ts +2 -0
  21. package/dist/lib/logger.js +13 -0
  22. package/dist/lib/logger.js.map +1 -0
  23. package/dist/lib/network.d.ts +3 -0
  24. package/dist/lib/network.js +38 -0
  25. package/dist/lib/network.js.map +1 -0
  26. package/dist/lib/oauth.d.ts +29 -0
  27. package/dist/lib/oauth.js +220 -0
  28. package/dist/lib/oauth.js.map +1 -0
  29. package/dist/lib/output.d.ts +14 -0
  30. package/dist/lib/output.js +38 -0
  31. package/dist/lib/output.js.map +1 -0
  32. package/dist/lib/policy.d.ts +25 -0
  33. package/dist/lib/policy.js +48 -0
  34. package/dist/lib/policy.js.map +1 -0
  35. package/dist/lib/request-runtime.d.ts +26 -0
  36. package/dist/lib/request-runtime.js +323 -0
  37. package/dist/lib/request-runtime.js.map +1 -0
  38. package/dist/lib/sanitize.d.ts +1 -0
  39. package/dist/lib/sanitize.js +21 -0
  40. package/dist/lib/sanitize.js.map +1 -0
  41. package/dist/lib/session-capacity.d.ts +8 -0
  42. package/dist/lib/session-capacity.js +7 -0
  43. package/dist/lib/session-capacity.js.map +1 -0
  44. package/dist/server/build-server.d.ts +3 -0
  45. package/dist/server/build-server.js +13 -0
  46. package/dist/server/build-server.js.map +1 -0
  47. package/dist/tools/gitlab.d.ts +9 -0
  48. package/dist/tools/gitlab.js +2576 -0
  49. package/dist/tools/gitlab.js.map +1 -0
  50. package/dist/tools/health.d.ts +2 -0
  51. package/dist/tools/health.js +21 -0
  52. package/dist/tools/health.js.map +1 -0
  53. package/dist/tools/mr-code-context.d.ts +38 -0
  54. package/dist/tools/mr-code-context.js +330 -0
  55. package/dist/tools/mr-code-context.js.map +1 -0
  56. package/{src/types/context.ts → dist/types/context.d.ts} +5 -6
  57. package/dist/types/context.js +2 -0
  58. package/dist/types/context.js.map +1 -0
  59. package/docs/configuration.md +6 -6
  60. package/docs/mcp-integration-testing-best-practices.md +981 -0
  61. package/package.json +13 -1
  62. package/.dockerignore +0 -7
  63. package/.editorconfig +0 -9
  64. package/.env.example +0 -75
  65. package/.github/workflows/nodejs.yml +0 -31
  66. package/.github/workflows/npm-publish.yml +0 -31
  67. package/.husky/pre-commit +0 -1
  68. package/.nvmrc +0 -1
  69. package/.prettierrc.json +0 -6
  70. package/Dockerfile +0 -20
  71. package/docker-compose.yml +0 -10
  72. package/eslint.config.js +0 -23
  73. package/scripts/get-oauth-token.example.sh +0 -15
  74. package/src/config/env.ts +0 -171
  75. package/src/http.ts +0 -620
  76. package/src/index.ts +0 -77
  77. package/src/lib/auth-context.ts +0 -19
  78. package/src/lib/gitlab-client.ts +0 -1810
  79. package/src/lib/logger.ts +0 -17
  80. package/src/lib/network.ts +0 -45
  81. package/src/lib/oauth.ts +0 -287
  82. package/src/lib/output.ts +0 -51
  83. package/src/lib/policy.ts +0 -78
  84. package/src/lib/request-runtime.ts +0 -376
  85. package/src/lib/sanitize.ts +0 -25
  86. package/src/lib/session-capacity.ts +0 -14
  87. package/src/server/build-server.ts +0 -17
  88. package/src/tools/gitlab.ts +0 -3135
  89. package/src/tools/health.ts +0 -27
  90. package/src/tools/mr-code-context.ts +0 -473
  91. package/tests/auth-context.test.ts +0 -102
  92. package/tests/gitlab-client.test.ts +0 -672
  93. package/tests/graphql-guard.test.ts +0 -121
  94. package/tests/integration/agent-loop.integration.test.ts +0 -558
  95. package/tests/integration/server.integration.test.ts +0 -543
  96. package/tests/mr-code-context.test.ts +0 -600
  97. package/tests/oauth.test.ts +0 -43
  98. package/tests/output.test.ts +0 -186
  99. package/tests/policy.test.ts +0 -324
  100. package/tests/request-runtime.test.ts +0 -252
  101. package/tests/sanitize.test.ts +0 -123
  102. package/tests/session-capacity.test.ts +0 -49
  103. package/tests/upload-reference.test.ts +0 -88
  104. package/tsconfig.build.json +0 -11
  105. package/tsconfig.json +0 -21
  106. package/vitest.config.ts +0 -12
@@ -0,0 +1,981 @@
1
+ # MCP Integration Testing Best Practices
2
+
3
+ > This article draws on hands-on experience with 324 test cases from the gitlab-mcp project, together with the official MCP SDK, the Inspector CLI, and community-recommended practices, to summarize the methodology and implementation patterns for MCP Server integration testing.
4
+ >
5
+ > Scope note: This is a general methodology article. Code snippets, environment variable names, file paths, and npm scripts are illustrative examples and should be adapted to your repository layout and conventions.
6
+
7
+ ## Table of Contents
8
+
9
+ - [Why MCP Servers Need Integration Testing](#why-mcp-servers-need-integration-testing)
10
+ - [The Testing Pyramid: MCP Edition](#the-testing-pyramid-mcp-edition)
11
+ - [Layer 1: InMemoryTransport Protocol-Level Testing](#layer-1-inmemorytransport-protocol-level-testing)
12
+ - [Core Scaffolding: buildContext + createLinkedPair](#core-scaffolding-buildcontext--createlinkedpair)
13
+ - [Pattern 1: Tool Registration and Discovery](#pattern-1-tool-registration-and-discovery)
14
+ - [Pattern 2: Tool Handler End-to-End Verification](#pattern-2-tool-handler-end-to-end-verification)
15
+ - [Pattern 3: Schema Validation and Boundary Inputs](#pattern-3-schema-validation-and-boundary-inputs)
16
+ - [Layer 2: HTTP Transport Layer Testing](#layer-2-http-transport-layer-testing)
17
+ - [Streamable HTTP Testing](#streamable-http-testing)
18
+ - [SSE Transport Testing](#sse-transport-testing)
19
+ - [Session Lifecycle Testing](#session-lifecycle-testing)
20
+ - [Layer 3: Security and Policy Testing](#layer-3-security-and-policy-testing)
21
+ - [Remote Authentication Flow](#remote-authentication-flow)
22
+ - [Error Handling and Sensitive Information Redaction](#error-handling-and-sensitive-information-redaction)
23
+ - [Policy Engine: Read-Only, Feature Flags, Allowlists](#policy-engine-read-only-feature-flags-allowlists)
24
+ - [Layer 4: Agent Loop Integration Testing](#layer-4-agent-loop-integration-testing)
25
+ - [ScriptedLLM Pattern](#scriptedllm-pattern)
26
+ - [Real LLM Smoke Testing](#real-llm-smoke-testing)
27
+ - [Layer 5: Inspector CLI Black-Box Testing](#layer-5-inspector-cli-black-box-testing)
28
+ - [CI/CD Integration Strategy](#cicd-integration-strategy)
29
+ - [Common Pitfalls and Solutions](#common-pitfalls-and-solutions)
30
+ - [Summary: Recommended Test Matrix](#summary-recommended-test-matrix)
31
+
32
+ ---
33
+
34
+ ## Why MCP Servers Need Integration Testing
35
+
36
+ An MCP Server is not an ordinary HTTP API. It has several characteristics that make testing more complex:
37
+
38
+ 1. **Stateful sessions**: Over the HTTP transports, the client must first send `initialize` to obtain a session ID, and every subsequent request must include it
39
+ 2. **Multiple transport protocols**: The same server may simultaneously support Streamable HTTP, SSE, and stdio
40
+ 3. **Bidirectional communication**: In SSE mode, the server can proactively push events to the client
41
+ 4. **Policy layer**: Read-only mode, tool allowlists, and feature flags can alter the available tool set
42
+ 5. **Authentication context**: In remote deployments, tokens are passed via HTTP headers and must propagate through AsyncLocalStorage
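The authentication-context point is easier to reason about with a small example. The following is a minimal sketch, not the project's implementation: `AuthContext`, `runWithToken`, and `handleToolCall` are hypothetical names, and only `AsyncLocalStorage` itself comes from Node.js. It shows a token set at the request boundary remaining visible after an `await`, while two concurrent requests stay isolated from each other.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical per-request auth context; the field name is illustrative.
interface AuthContext {
  token: string;
}

const authStorage = new AsyncLocalStorage<AuthContext>();

// Deep in the call chain, read the token without threading it through arguments.
function currentToken(): string | undefined {
  return authStorage.getStore()?.token;
}

// Simulates a tool handler that awaits before it needs the token.
async function handleToolCall(): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 1));
  return currentToken() ?? "<no token>";
}

// Each request runs inside its own store, so concurrent requests do not share state.
function runWithToken(token: string): Promise<string> {
  return authStorage.run({ token }, handleToolCall);
}
```

An integration test for this property typically fires two requests with different tokens in parallel and asserts that each handler saw only its own token.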
43
+
44
+ Pure unit tests cannot cover these interactions. You need a real MCP Client and Server communicating through a real (or in-memory simulated) transport layer to verify the complete request-response chain.
45
+
46
+ A common anti-pattern in the community is so-called **"Vibe Testing"** — spinning up an LLM Agent, typing a few prompts, and considering it passed if the output "looks about right." This approach is non-deterministic, non-reproducible, and expensive. The correct approach is to build a **deterministic, automatable, layered** integration testing system.
47
+
48
+ ---
49
+
50
+ ## The Testing Pyramid: MCP Edition
51
+
52
+ ```
53
+               ┌─────────────┐
54
+               │ LLM E2E     │  ← Few, Nightly
55
+               │ Smoke Test  │
56
+              ─┤             ├─
57
+             / └─────────────┘ \
58
+            /   Inspector CLI   \  ← Black-box contract test
59
+           / ┌─────────────────┐ \
60
+          /  │ Security /      │  \
61
+         /   │ Policy / Error  │   \  ← Every PR
62
+        / ┌──┴─────────────────┴──┐ \
63
+       /  │ HTTP / SSE Transport  │  \
64
+      /   │ Session Lifecycle     │   \
65
+     / ┌──┴───────────────────────┴──┐ \
66
+    /  │ InMemoryTransport Protocol  │  \  ← Every commit
67
+   /   │ Registration / Schema /     │   \
68
+  /    │ Handler                     │    \
69
+ └─────────────────────────────────────────┘
70
+ ```
71
+
72
+ **Principle: The lower the layer, the more tests it should hold, and the faster and more deterministic those tests should be.**
73
+
74
+ ---
75
+
76
+ ## Layer 1: InMemoryTransport Protocol-Level Testing
77
+
78
+ This is the **cornerstone** of MCP integration testing. Using the official TypeScript SDK's `InMemoryTransport.createLinkedPair()`, you can connect Client and Server directly within the same process — no child processes or HTTP servers needed.
79
+
80
+ **Advantages**:
81
+
82
+ - Extremely fast (millisecond-level)
83
+ - Fully deterministic, no network/port dependencies
84
+ - Tests the real MCP protocol handshake and tool invocation chain
85
+
86
+ ### Core Scaffolding: buildContext + createLinkedPair
87
+
88
+ Extracting the server creation logic into a factory function is the key to testability. In tests, you import the factory directly and use dependency injection to replace external services.
89
+
90
+ ```typescript
91
+ // tests/integration/_helpers.ts
92
+
93
+ import { Client } from "@modelcontextprotocol/sdk/client/index.js";
94
+ import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
95
+ import { createMcpServer } from "../../src/server/build-server.js";
+ import { vi } from "vitest"; // required unless Vitest `globals` is enabled
96
+
97
+ // 1) Build test context — all external dependencies can be stubbed
98
+ export function buildContext(overrides?: BuildContextOptions): AppContext {
99
+ return {
100
+ env: {
101
+ ...defaultEnv, // Complete default configuration
102
+ GITLAB_READ_ONLY_MODE: overrides?.readOnlyMode ?? false,
103
+ GITLAB_ALLOWED_PROJECT_IDS: overrides?.allowedProjectIds ?? [],
104
+ // ... other overridable fields
105
+ },
106
+ logger: {
107
+ info: vi.fn(), warn: vi.fn(), error: vi.fn(),
108
+ debug: vi.fn(), trace: vi.fn(), fatal: vi.fn(),
109
+ child: () => ({}) as never
110
+ },
111
+ gitlab: { ...overrides?.gitlabStub }, // Key: inject stub
112
+ policy: new ToolPolicyEngine({ ... }), // Real policy engine
113
+ formatter: new OutputFormatter({ ... }) // Real formatter
114
+ };
115
+ }
116
+
117
+ // 2) Create Client ↔ Server linked pair
118
+ export async function createLinkedPair(context: AppContext) {
119
+ const server = createMcpServer(context);
120
+ const [clientTransport, serverTransport] =
121
+ InMemoryTransport.createLinkedPair();
122
+
123
+ await server.connect(serverTransport);
124
+
125
+ const client = new Client(
126
+ { name: "integration-test-client", version: "0.0.1" },
127
+ { capabilities: {} }
128
+ );
129
+ await client.connect(clientTransport);
130
+
131
+ return { client, server, clientTransport, serverTransport, context };
132
+ }
133
+ ```
134
+
135
+ > **Best Practice**: `createMcpServer()` should be a **pure factory function** that accepts a complete context object and does not depend on global state or environment variables. This allows tests to construct server instances with any configuration.
136
+
137
+ ### Pattern 1: Tool Registration and Discovery
138
+
139
+ Verify that `tools/list` returns the expected tool set — this is the most basic contract test.
140
+
141
+ ```typescript
142
+ describe("Tool Registration", () => {
143
+ it("registers all core tools under default configuration", async () => {
144
+ const { client, clientTransport, serverTransport } = await createLinkedPair(buildContext());
145
+ try {
146
+ const { tools } = await client.listTools();
147
+ const names = tools.map((t) => t.name);
148
+
149
+ expect(names).toContain("gitlab_get_project");
150
+ expect(names).toContain("gitlab_list_issues");
151
+ expect(names).toContain("health_check");
152
+ } finally {
153
+ await clientTransport.close();
154
+ await serverTransport.close();
155
+ }
156
+ });
157
+
158
+ it("excludes all mutating tools in read-only mode", async () => {
159
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
160
+ buildContext({ readOnlyMode: true })
161
+ );
162
+ try {
163
+ const { tools } = await client.listTools();
164
+ const names = tools.map((t) => t.name);
165
+
166
+ expect(names).not.toContain("gitlab_create_issue");
167
+ expect(names).not.toContain("gitlab_execute_graphql_mutation");
168
+ // Read-only tools are still present
169
+ expect(names).toContain("gitlab_get_project");
170
+ } finally {
171
+ await clientTransport.close();
172
+ await serverTransport.close();
173
+ }
174
+ });
175
+ });
176
+ ```
177
+
178
+ > **Key Point**: Always close both transport endpoints in the `finally` block. InMemoryTransport does not clean up automatically — if you miss this, subsequent tests will hang.
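Repeating the `try`/`finally` pair in every test is easy to get wrong. One option is a small wrapper. This is a sketch under assumptions: `withCleanup` and `Closable` are hypothetical names, and it assumes only that each transport exposes an async `close()`.

```typescript
// Minimal shape shared by anything that must be closed after a test.
interface Closable {
  close(): Promise<void>;
}

// Runs `body`, then closes every resource, even when `body` throws.
async function withCleanup<R>(
  resources: Closable[],
  body: () => Promise<R>
): Promise<R> {
  try {
    return await body();
  } finally {
    // allSettled: one failing close() must not prevent the others from running.
    await Promise.allSettled(resources.map((resource) => resource.close()));
  }
}
```

In a test this would wrap the assertions, for example `await withCleanup([clientTransport, serverTransport], async () => { /* assertions */ })`, so a failed `expect` still releases both endpoints.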
179
+
180
+ ### Pattern 2: Tool Handler End-to-End Verification
181
+
182
+ Use `client.callTool()` to make real JSON-RPC calls, stub external APIs (such as GitLab), and verify three things:
183
+
184
+ 1. The correct API method is called with the right arguments
185
+ 2. The response structure conforms to the MCP specification (`content[].text` + `structuredContent`)
186
+ 3. Error scenarios return `isError: true`
187
+
188
+ ```typescript
189
+ describe("Tool handler: gitlab_get_project", () => {
190
+ it("passes project_id to context.gitlab.getProject()", async () => {
191
+ const getProject = vi.fn().mockResolvedValue({
192
+ id: 42,
193
+ name: "my-project",
194
+ path_with_namespace: "group/my-project"
195
+ });
196
+
197
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
198
+ buildContext({
199
+ gitlabStub: { getProject }
200
+ })
201
+ );
202
+
203
+ try {
204
+ const result = await client.callTool({
205
+ name: "gitlab_get_project",
206
+ arguments: { project_id: "group/my-project" }
207
+ });
208
+
209
+ // Verification 1: stub was called correctly
210
+ expect(getProject).toHaveBeenCalledWith("group/my-project");
211
+
212
+ // Verification 2: response is not an error
213
+ expect(result.isError).toBeFalsy();
214
+
215
+ // Verification 3: text content contains expected data
216
+ const text = (result.content as Array<{ type: string; text: string }>).find((c) => c.type === "text")!.text;
217
+ expect(text).toContain("my-project");
218
+ } finally {
219
+ await clientTransport.close();
220
+ await serverTransport.close();
221
+ }
222
+ });
223
+ });
224
+ ```
225
+
226
+ > **Best Practice**: `gitlabStub` only needs to provide the methods used by the current test. Unstubbed method calls will produce runtime errors, which is **exactly** the behavior you want — it helps you discover unexpected API calls.
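With a plain object stub, an unexpected call fails with a generic "is not a function" `TypeError`. A `Proxy` can make the failure name the missing method. The following is a sketch, not part of the project: `strictStub` is a hypothetical helper built only on the standard `Proxy` and `Reflect` APIs.

```typescript
// Wraps a partial stub so that touching any method the test did not provide
// fails immediately with the method name in the message.
function strictStub<T extends object>(partial: Partial<T>, label = "stub"): T {
  return new Proxy(partial as T, {
    get(target, prop, receiver) {
      // Symbols and "then" are probed by the runtime (e.g. when awaiting), so let them through.
      if (typeof prop === "symbol" || prop === "then" || prop in target) {
        return Reflect.get(target, prop, receiver);
      }
      throw new Error(`${label}.${prop} was accessed but not stubbed`);
    }
  });
}
```

Passing `strictStub({ getProject }, "gitlab")` instead of a bare object keeps the "only stub what you use" rule while turning an accidental extra API call into a self-explanatory failure.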
227
+
228
+ ### Pattern 3: Schema Validation and Boundary Inputs
229
+
230
+ The MCP SDK validates tool arguments against the Zod schemas registered for each tool. You should test:
231
+
232
+ - `null` value preprocessing (`null → undefined`)
233
+ - Missing required fields
234
+ - Type mismatches
235
+ - Invalid enum values
236
+
237
+ ```typescript
238
+ describe("Schema Validation", () => {
239
+ it("null values are preprocessed to undefined (optional fields do not error)", async () => {
240
+ const listProjects = vi.fn().mockResolvedValue([]);
241
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
242
+ buildContext({
243
+ gitlabStub: { listProjects }
244
+ })
245
+ );
246
+
247
+ try {
248
+ const result = await client.callTool({
249
+ name: "gitlab_list_projects",
250
+ arguments: { search: null, page: null } // null → undefined
251
+ });
252
+ expect(result.isError).toBeFalsy();
253
+ } finally {
254
+ await clientTransport.close();
255
+ await serverTransport.close();
256
+ }
257
+ });
258
+
259
+ it("type mismatch triggers Zod error", async () => {
260
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
261
+ buildContext({
262
+ gitlabStub: { listProjects: vi.fn() }
263
+ })
264
+ );
265
+
266
+ try {
267
+ const result = await client.callTool({
268
+ name: "gitlab_list_projects",
269
+ arguments: { page: "not-a-number" } // should be number
270
+ });
271
+ expect(result.isError).toBe(true);
272
+ } finally {
273
+ await clientTransport.close();
274
+ await serverTransport.close();
275
+ }
276
+ });
277
+ });
278
+ ```
279
+
280
+ ---
281
+
282
+ ## Layer 2: HTTP Transport Layer Testing
283
+
284
+ InMemoryTransport skips serialization and the network layer. To test real HTTP endpoints, you need to start a real HTTP server.
285
+
286
+ ### Streamable HTTP Testing
287
+
288
+ **Key Pattern**: Use port 0 to let the OS assign a random port, avoiding port conflicts.
289
+
290
+ ```typescript
291
+ import { createServer, type Server as HttpServer } from "node:http";
292
+ import { setupMcpHttpApp } from "../../src/http-app.js";
293
+
294
+ let httpServer: HttpServer;
295
+ let baseUrl: string;
296
+ let result: SetupMcpHttpAppResult;
297
+
298
+ beforeAll(async () => {
299
+ const context = buildHttpContext();
300
+ result = setupMcpHttpApp({
301
+ context,
302
+ env: context.env,
303
+ logger: context.logger
304
+ });
305
+
306
+ httpServer = createServer(result.app);
307
+ await new Promise<void>((resolve) => {
308
+ httpServer.listen(0, "127.0.0.1", () => resolve());
309
+ });
310
+
311
+ const addr = httpServer.address();
312
+ if (typeof addr === "object" && addr !== null) {
313
+ baseUrl = `http://127.0.0.1:${addr.port}`;
314
+ }
315
+ });
316
+
317
+ afterAll(async () => {
318
+ // Key: close all sessions first, then shut down the HTTP server
319
+ for (const sessionId of result.sessions.keys()) {
320
+ await result.closeSession(sessionId, "shutdown");
321
+ }
322
+ await new Promise<void>((resolve, reject) => {
323
+ httpServer.close((err) => (err ? reject(err) : resolve()));
324
+ });
325
+ });
326
+ ```
327
+
328
+ **Typical Test Scenarios**:
329
+
330
+ | Scenario | Request | Expected HTTP Status | JSON-RPC Error Code |
331
+ | ------------------------------- | ---------------------------- | --------------- | ---------- |
332
+ | Initialize session | `POST /mcp` (initialize) | 200 | — |
333
+ | Subsequent request with session | `POST /mcp` | 200 | — |
334
+ | Invalid session ID | `POST /mcp` | 404 | -32001 |
335
+ | GET without initialization | `GET /mcp` | 400 | -32000 |
336
+ | Capacity exceeded | `POST /mcp` (MAX_SESSIONS=1) | 503 | -32002 |
337
+ | Rate limiting | `POST /mcp` (excessive) | 429 | -32003 |
338
+ | Delete session | `DELETE /mcp` | 200 | — |
339
+ | Health check | `GET /healthz` | 200 | — |
340
+
341
+ > **Best Practice**: Each test scenario requiring independent configuration (e.g., `MAX_SESSIONS=1`) should create its own `setupMcpHttpApp` + `createServer` instance, cleaning up in a `finally` block. Do not share stateful server instances.
342
+
343
+ ### SSE Transport Testing
344
+
345
+ SSE testing is more complex than plain request/response testing because `GET /sse` returns a long-lived event stream:
346
+
347
+ ```typescript
348
+ // SSE event parser
349
+ async function* parseSseEvents(response: Response): AsyncGenerator<SseEvent> {
350
+ const reader = response.body!.getReader();
351
+ const decoder = new TextDecoder();
352
+ let buffer = "";
353
+
354
+ try {
355
+ while (true) {
356
+ const { done, value } = await reader.read();
357
+ if (done) break;
358
+ buffer += decoder.decode(value, { stream: true });
359
+ const parts = buffer.split("\n\n");
360
+ buffer = parts.pop()!;
361
+
362
+ for (const part of parts) {
363
+ const event: SseEvent = {};
364
+ for (const line of part.split("\n")) {
365
+ if (line.startsWith("event: ")) event.event = line.slice(7).trim();
366
+ else if (line.startsWith("data: ")) event.data = line.slice(6).trim();
367
+ }
368
+ yield event;
369
+ }
370
+ }
371
+ } finally {
372
+ reader.releaseLock();
373
+ }
374
+ }
375
+ ```
376
+
377
+ **SSE Test Flow**:
378
+
379
+ ```typescript
380
+ it("GET /sse returns an endpoint event", async () => {
381
+ const controller = new AbortController();
382
+ try {
383
+ const response = await fetch(`${baseUrl}/sse`, {
384
+ headers: { Accept: "text/event-stream" },
385
+ signal: controller.signal
386
+ });
387
+
388
+ expect(response.headers.get("content-type")).toContain("text/event-stream");
389
+
390
+ const gen = parseSseEvents(response);
391
+ const first = await gen.next();
392
+ const event = first.value as SseEvent;
393
+
394
+ expect(event.event).toBe("endpoint");
395
+ expect(event.data).toContain("/messages?sessionId=");
396
+ } finally {
397
+ controller.abort(); // Required: clean up the long-lived connection
398
+ }
399
+ });
400
+ ```
401
+
402
+ > **Note**: In SSE mode, when calling `handlePostMessage` via `POST /messages`, you must pass `req.body` as the third argument, because Express's `json()` middleware has already consumed the raw body stream. This is a common pitfall.
403
+
404
+ > **Important**: `SSE=true` is incompatible with `REMOTE_AUTHORIZATION=true`. The environment validation layer enforces this constraint at startup. If you need remote per-request authentication, use Streamable HTTP transport instead.
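A startup constraint like this is cheap to cover with a unit test. The sketch below illustrates the shape of such a check; `validateTransportEnv` and `TransportEnv` are hypothetical names, and the real validation layer may be structured differently.

```typescript
// Only the two flags relevant to the constraint; a real env object has many more.
interface TransportEnv {
  SSE: boolean;
  REMOTE_AUTHORIZATION: boolean;
}

// Fails fast at startup instead of letting an unsupported combination run.
function validateTransportEnv(env: TransportEnv): TransportEnv {
  if (env.SSE && env.REMOTE_AUTHORIZATION) {
    throw new Error(
      "SSE=true cannot be combined with REMOTE_AUTHORIZATION=true; " +
        "use Streamable HTTP for per-request authentication"
    );
  }
  return env;
}
```

A test then asserts that the invalid combination throws and that each flag alone is accepted.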
405
+
406
+ ### Session Lifecycle Testing
407
+
408
+ Session management is the most bug-prone area of an MCP Server. You must test the complete lifecycle:
409
+
410
+ ```typescript
411
+ describe("Session DELETE", () => {
412
+ it("POST with the same session returns 404 after DELETE", async () => {
413
+ const sessionId = await initializeSession(baseUrl);
414
+
415
+ // Delete the session
416
+ await fetch(`${baseUrl}/mcp`, {
417
+ method: "DELETE",
418
+ headers: { ...MCP_HEADERS, "mcp-session-id": sessionId }
419
+ });
420
+
421
+ // Attempt to use the deleted session
422
+ const res = await fetch(`${baseUrl}/mcp`, {
423
+ method: "POST",
424
+ headers: { ...MCP_HEADERS, "mcp-session-id": sessionId },
425
+ body: JSON.stringify({
426
+ jsonrpc: "2.0",
427
+ id: 2,
428
+ method: "tools/list",
429
+ params: {}
430
+ })
431
+ });
432
+
433
+ expect(res.status).toBe(404);
434
+ const body = await res.json();
435
+ expect(body.error?.code).toBe(-32001);
436
+ });
437
+ });
438
+ ```
439
+
440
+ **Garbage Collection Testing**: Trigger immediate expiration by setting `SESSION_TIMEOUT_SECONDS` to 0:
441
+
442
+ ```typescript
443
+ it("GC cleans up expired sessions", async () => {
444
+ (ctx.env as any).SESSION_TIMEOUT_SECONDS = 0;
445
+ // ... create session ...
446
+ expect(result.sessions.size).toBe(1);
447
+
448
+ await result.garbageCollectSessions();
449
+ expect(result.sessions.size).toBe(0);
450
+ });
451
+ ```
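Setting the timeout to 0 works only if the expiry comparison is inclusive. An alternative that avoids mutating the environment is to inject the clock. The following sketch assumes a hypothetical `SessionState` shape and a hypothetical `collectExpiredSessions` function; it is not the project's `garbageCollectSessions`.

```typescript
// Hypothetical session record; real sessions carry transports, auth state, etc.
interface SessionState {
  lastActivityAt: number; // epoch milliseconds
  close(): Promise<void>;
}

// Closes and removes every session idle for at least `timeoutSeconds`.
// `now` is injectable so tests can control time without sleeping.
async function collectExpiredSessions(
  sessions: Map<string, SessionState>,
  timeoutSeconds: number,
  now: number = Date.now()
): Promise<string[]> {
  const expired: string[] = [];
  for (const [id, session] of sessions) {
    // ">=" so that a timeout of 0 expires every session immediately.
    if (now - session.lastActivityAt >= timeoutSeconds * 1000) expired.push(id);
  }
  for (const id of expired) {
    await sessions.get(id)?.close();
    sessions.delete(id);
  }
  return expired;
}
```

With the clock injected, a test can keep one fresh and one stale session in the same map and assert that only the stale one is removed.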
452
+
453
+ **Client Disconnection Testing** requires polling, because the server observes the closed connection asynchronously, some time after the client aborts:
454
+
455
+ ```typescript
456
+ it("SSE session is cleaned up after client disconnects", async () => {
457
+ controller.abort();
458
+
459
+ const deadline = Date.now() + 2000;
460
+ while (result.sseSessions.size > 0 && Date.now() < deadline) {
461
+ await new Promise((r) => setTimeout(r, 50));
462
+ }
463
+ expect(result.sseSessions.size).toBe(0);
464
+ });
465
+ ```
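The polling loop tends to be copied between tests, so it can be worth extracting. This is a generic sketch; `waitFor` is a hypothetical helper name, not something provided by the MCP SDK.

```typescript
// Polls `condition` until it returns true, or rejects once the timeout elapses.
async function waitFor(
  condition: () => boolean,
  { timeoutMs = 2000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The disconnection test then reduces to `controller.abort()` followed by `await waitFor(() => result.sseSessions.size === 0)`, and a timeout produces an explicit error instead of a bare size mismatch.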
466
+
467
+ ---
468
+
469
+ ## Layer 3: Security and Policy Testing
470
+
471
+ ### Remote Authentication Flow
472
+
473
+ When `REMOTE_AUTHORIZATION=true`, the token comes from the HTTP header rather than an environment variable:
474
+
475
+ ```typescript
476
+ function buildRemoteAuthContext() {
477
+ const ctx = buildContext({ token: null }); // No default token
478
+ (ctx.env as any).REMOTE_AUTHORIZATION = true;
479
+ (ctx.env as any).HTTP_JSON_ONLY = true;
480
+ return ctx;
481
+ }
482
+
483
+ describe("Remote Authentication", () => {
484
+ it("missing token returns 401 + error code -32010", async () => {
485
+ const res = await fetch(`${baseUrl}/mcp`, {
486
+ method: "POST",
487
+ headers: MCP_HEADERS, // No Authorization
488
+ body: initializeBody()
489
+ });
490
+ expect(res.status).toBe(401);
491
+ const body = await res.json();
492
+ expect(body.error?.code).toBe(-32010);
493
+ });
494
+
495
+ it("Bearer token is passed via Authorization header", async () => {
496
+ const res = await fetch(`${baseUrl}/mcp`, {
497
+ method: "POST",
498
+ headers: {
499
+ ...MCP_HEADERS,
500
+ Authorization: "Bearer test-remote-token"
501
+ },
502
+ body: initializeBody()
503
+ });
504
+ expect(res.status).toBe(200);
505
+ expect(res.headers.get("mcp-session-id")).toBeTruthy();
506
+ });
507
+
508
+ it("authentication context propagates to session state", async () => {
509
+ // Initialize session with token
510
+ const initRes = await fetch(`${baseUrl}/mcp`, {
511
+ method: "POST",
512
+ headers: {
513
+ ...MCP_HEADERS,
514
+ Authorization: "Bearer my-secret-token"
515
+ },
516
+ body: initializeBody()
517
+ });
518
+ const sessionId = initRes.headers.get("mcp-session-id")!;
519
+
520
+ // Verify internal auth state of the session
521
+ const session = result.sessions.get(sessionId);
522
+ expect(session!.auth?.token).toBe("my-secret-token");
523
+ expect(session!.auth?.header).toBe("authorization");
524
+ });
525
+ });
526
+ ```
527
+
528
+ ### Error Handling and Sensitive Information Redaction
529
+
530
+ Error handling is a security-critical path. Two modes need to be tested:
531
+
532
+ | `GITLAB_ERROR_DETAIL_MODE` | Usage note | Behavior |
533
+ | ------ | -------------------------- | ------------------------------------------------------------------ |
534
+ | `full` | Implementation-dependent | Returns full error details (better debugging, higher leakage risk) |
535
+ | `safe` | Recommended for production | Hides internal details, returns generic messages |
536
+
537
+ ```typescript
538
+ describe("Error Handling", () => {
539
+ it("GitLabApiError 404 → isError + status code", async () => {
540
+ const getProject = vi.fn().mockRejectedValue(new GitLabApiError("Not Found", 404));
541
+ // ...
542
+ expect(result.isError).toBe(true);
543
+ expect(text).toContain("GitLab API error 404");
544
+ });
545
+
546
+ it("safe mode hides error details", async () => {
547
+ (ctx.env as any).GITLAB_ERROR_DETAIL_MODE = "safe";
548
+
549
+ const getProject = vi
550
+ .fn()
551
+ .mockRejectedValue(new Error("DB connection failed: password=hunter2"));
552
+ // ...
553
+ expect(text).toBe("Request failed"); // Generic message
554
+ expect(text).not.toContain("hunter2"); // No leakage
555
+ });
556
+
557
+ it("non-Error thrown values return Unknown error", async () => {
558
+ const getProject = vi.fn().mockRejectedValue("string error");
559
+ // ...
560
+ expect(text).toBe("Unknown error");
561
+ });
562
+ });
563
+ ```
564
+
565
+ **Token Redaction Testing** — ensure tokens are not leaked in error details. Using `it.each` reduces boilerplate when testing multiple token patterns:
566
+
567
+ ```typescript
568
+ describe("Token Redaction", () => {
569
+ it.each([
570
+ ["GitLab PAT", "glpat-abcdef1234567890"],
571
+ ["GitHub PAT", "ghp_abcdef1234567890abcde"],
572
+ ["JWT", "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.payload"]
573
+ ])("redacts %s token", async (label, token) => {
574
+ const getProject = vi.fn().mockRejectedValue(
575
+ new GitLabApiError("Unauthorized", 401, {
576
+ message: `Token ${token} is invalid`
577
+ })
578
+ );
579
+ // ...
580
+ expect(text).toContain("[REDACTED]");
581
+ expect(text).not.toContain(token);
582
+ });
583
+
584
+ it("redacts sensitive object keys (authorization, password, secret)", async () => {
585
+ const getProject = vi.fn().mockRejectedValue(
586
+ new GitLabApiError("Error", 400, {
587
+ authorization: "Bearer secret-val",
588
+ password: "hunter2",
589
+ message: "safe value" // Non-sensitive key is preserved
590
+ })
591
+ );
592
+ // ...
593
+ expect(text).not.toContain("secret-val");
594
+ expect(text).not.toContain("hunter2");
595
+ expect(text).toContain("safe value");
596
+ });
597
+ });
598
+ ```
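For readers without a redaction layer yet, the behavior these tests expect can be sketched as follows. This is an illustrative assumption, not the project's sanitizer: the `redact` function, the regular expressions, and the sensitive-key list are all hypothetical and deliberately small.

```typescript
// Patterns for common credential formats; real deployments usually need more.
const TOKEN_PATTERNS: RegExp[] = [
  /glpat-[A-Za-z0-9_-]{8,}/g, // GitLab personal access token
  /gh[pousr]_[A-Za-z0-9]{16,}/g, // GitHub token families
  /eyJ[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+){1,2}/g // JWT-like: header.payload[.signature]
];

// Object keys whose values are replaced wholesale, regardless of content.
const SENSITIVE_KEY = /authorization|password|secret|token/i;

function redact(value: unknown): unknown {
  if (typeof value === "string") {
    return TOKEN_PATTERNS.reduce(
      (text, pattern) => text.replace(pattern, "[REDACTED]"),
      value
    );
  }
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

Pattern-based redaction is best-effort, which is why the tests assert both the presence of the placeholder and the absence of the original token.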
599
+
600
+ ### Policy Engine: Read-Only, Feature Flags, Allowlists
601
+
602
+ The policy engine determines which tools are available. You must test various combinations:
603
+
604
+ ```typescript
605
+ describe("GraphQL Tool Policy", () => {
606
+ it("disables GraphQL tools when ALLOWED_PROJECT_IDS is set", async () => {
607
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
608
+ buildContext({ allowedProjectIds: ["123"] })
609
+ );
610
+ try {
611
+ const names = (await client.listTools()).tools.map((t) => t.name);
612
+ expect(names).not.toContain("gitlab_execute_graphql_query");
613
+ expect(names).not.toContain("gitlab_execute_graphql_mutation");
614
+ } finally {
615
+ await clientTransport.close();
616
+ await serverTransport.close();
617
+ }
618
+ });
619
+
620
+ it("GITLAB_ALLOW_GRAPHQL_WITH_PROJECT_SCOPE overrides the restriction", async () => {
621
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
622
+ buildContext({
623
+ allowedProjectIds: ["123"],
624
+ allowGraphqlWithProjectScope: true
625
+ })
626
+ );
627
+ try {
628
+ const names = (await client.listTools()).tools.map((t) => t.name);
629
+ expect(names).toContain("gitlab_execute_graphql_query");
630
+ } finally {
631
+ await clientTransport.close();
632
+ await serverTransport.close();
633
+ }
634
+ });
635
+
636
+ it("compat tool blocks mutation in read-only mode", async () => {
637
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
638
+ buildContext({ readOnlyMode: true, gitlabStub: { executeGraphql: vi.fn() } })
639
+ );
640
+ try {
641
+ const result = await client.callTool({
642
+ name: "gitlab_execute_graphql",
643
+ arguments: { query: "mutation { createProject { id } }" }
644
+ });
645
+ expect(result.isError).toBe(true);
646
+ } finally {
647
+ await clientTransport.close();
648
+ await serverTransport.close();
649
+ }
650
+ });
651
+
652
+ it("string literal containing 'mutation' does not trigger false positive", async () => {
653
+ const executeGraphql = vi.fn().mockResolvedValue({ data: {} });
654
+ const { client, clientTransport, serverTransport } = await createLinkedPair(
655
+ buildContext({ gitlabStub: { executeGraphql } })
656
+ );
657
+ try {
658
+ const result = await client.callTool({
659
+ name: "gitlab_execute_graphql_query",
660
+ arguments: { query: '{ project(name: "mutation thing") { id } }' }
661
+ });
662
+ expect(result.isError).toBeFalsy(); // Should not error
663
+ } finally {
664
+ await clientTransport.close();
665
+ await serverTransport.close();
666
+ }
667
+ });
668
+ });
669
+ ```
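The false-positive test matters because a naive substring check for `mutation` would reject legitimate queries. A heuristic that passes both cases is to strip string literals and comments before looking for the keyword. The sketch below is an assumption about one possible guard; `containsMutation` is a hypothetical name, and a production guard would more robustly parse the document with a GraphQL parser.

```typescript
// Removes string literals (including block strings) and # comments so that
// keywords inside them are not mistaken for operations.
function stripLiteralsAndComments(query: string): string {
  return query
    .replace(/"""[\s\S]*?"""/g, '""') // block strings
    .replace(/"(?:\\.|[^"\\\n])*"/g, '""') // regular strings, honoring escapes
    .replace(/#[^\n]*/g, ""); // line comments
}

// True when the text contains a `mutation` keyword followed by an optional
// operation name and then "(", "{", or "@".
function containsMutation(query: string): boolean {
  const stripped = stripLiteralsAndComments(query);
  return /\bmutation\b\s*(?:[A-Za-z_]\w*\s*)?[({@]/.test(stripped);
}
```

Being a regex heuristic, this would still flag a query that selects a field literally named `mutation`, which errs on the safe side for a read-only guard.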
670
+
671
+ ---
672
+
673
+ ## Layer 4: Agent Loop Integration Testing
674
+
675
+ ### ScriptedLLM Pattern
676
+
677
+ The real MCP use case is an LLM Agent calling tools. However, testing directly with a real LLM is non-deterministic, slow, and expensive. The recommended approach is **ScriptedLLM** — a pre-programmed sequence of LLM responses.
678
+
679
+ > **Note**: The Layer 4 and Layer 5 patterns are high-value but optional. Teams that do not use them yet can treat them as a roadmap and adopt them incrementally.
680
+
681
+ ```typescript
682
+ class ScriptedLLM {
683
+ private cursor = 0;
684
+ constructor(private responses: LLMResponse[]) {}
685
+
686
+ async createMessage(messages: Message[], tools: Tool[]) {
687
+ if (this.cursor >= this.responses.length) {
688
+ // Default end response
689
+ return { content: [{ type: "text", text: "Done" }] };
690
+ }
691
+ return this.responses[this.cursor++];
692
+ }
693
+ }
694
+
695
+ // Test case
696
+ it("Agent calls tool and processes result", async () => {
697
+ const listProjects = vi.fn().mockResolvedValue([{ id: 1, name: "alpha" }]);
698
+
699
+ const { client } = await createLinkedPair(buildContext({ gitlabStub: { listProjects } }));
700
+
701
+ const llm = new ScriptedLLM([
702
+ // Round 1: LLM decides to call a tool
703
+ {
704
+ content: [
705
+ {
706
+ type: "tool_use",
707
+ id: "call-1",
708
+ name: "gitlab_list_projects",
709
+ input: { search: "alpha" }
710
+ }
711
+ ]
712
+ },
713
+ // Round 2: LLM sees tool result and gives final answer
714
+ {
715
+ content: [{ type: "text", text: "Found project alpha" }]
716
+ }
717
+ ]);
718
+
719
+ const result = await runAgentLoop({ client, llm, query: "find alpha" });
720
+
721
+ expect(listProjects).toHaveBeenCalled();
722
+ expect(result).toContain("alpha");
723
+ });
724
+ ```
725
+
726
+ **Key Assertion Strategies**:
727
+
728
+ - Assert **whether a tool was called** and **with what arguments** — deterministic
729
+ - Assert **the final output contains key information** — semi-deterministic
730
+ - **Do not** assert the full text of natural language output — unstable
731
+
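The ScriptedLLM test above calls a `runAgentLoop` helper that this guide does not define. The following is a minimal sketch of what such a loop might look like, not the project's actual implementation. The `AgentClient` and `AgentLLM` interfaces, the message shapes, and the `maxRounds` option are all assumptions made for illustration; `AgentClient` is only a small structural subset of what an MCP client offers.

```typescript
// Hypothetical shapes, modelled on the ScriptedLLM example above.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type LLMResponse = { content: Array<TextBlock | ToolUseBlock> };
type Message = { role: "user" | "assistant"; content: unknown };
type ToolInfo = { name: string };

// Assumed minimal client surface: list tools and call one by name.
interface AgentClient {
  listTools(): Promise<{ tools: ToolInfo[] }>;
  callTool(request: { name: string; arguments: Record<string, unknown> }): Promise<unknown>;
}

// Assumed LLM surface: matches ScriptedLLM.createMessage(messages, tools).
interface AgentLLM {
  createMessage(messages: Message[], tools: ToolInfo[]): Promise<LLMResponse>;
}

async function runAgentLoop(options: {
  client: AgentClient;
  llm: AgentLLM;
  query: string;
  maxRounds?: number;
}): Promise<string> {
  const { client, llm, query, maxRounds = 8 } = options;
  const { tools } = await client.listTools();
  const messages: Message[] = [{ role: "user", content: query }];

  for (let round = 0; round < maxRounds; round++) {
    const response = await llm.createMessage(messages, tools);
    messages.push({ role: "assistant", content: response.content });

    const toolUses = response.content.filter(
      (block): block is ToolUseBlock => block.type === "tool_use"
    );
    if (toolUses.length === 0) {
      // No tool requested: treat the text blocks as the final answer.
      return response.content
        .filter((block): block is TextBlock => block.type === "text")
        .map((block) => block.text)
        .join("\n");
    }

    // Run every requested tool and feed the results back as the next user turn.
    const results: unknown[] = [];
    for (const use of toolUses) {
      const output = await client.callTool({ name: use.name, arguments: use.input });
      results.push({ type: "tool_result", tool_use_id: use.id, content: output });
    }
    messages.push({ role: "user", content: results });
  }
  // Guard against scripts (or models) that never stop requesting tools.
  throw new Error(`Agent loop did not finish within ${maxRounds} rounds`);
}
```

The round limit is a deliberate design choice: a scripted or real LLM that keeps requesting tools would otherwise hang the test instead of failing it.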
732
+ ### Real LLM Smoke Testing
733
+
734
+ A small number of scenarios can use a real LLM (run in nightly CI):
735
+
736
+ ```typescript
737
+ // Only run in CI with an API key
738
+ describe.skipIf(!process.env.ANTHROPIC_API_KEY)("LLM E2E Smoke", () => {
739
+ it("can discover and call the health_check tool", async () => {
740
+ const result = await runAgentLoop({
741
+ client,
742
+ llm: new AnthropicLLM(process.env.ANTHROPIC_API_KEY!),
743
+ query: "Check the server health"
744
+ });
745
+
746
+ // Loose assertion — just needs to produce a result
747
+ expect(result.length).toBeGreaterThan(0);
748
+ }, 30_000); // Generous timeout
749
+ });
750
+ ```
751
+
752
+ ---
753
+
754
+ ## Layer 5: Inspector CLI Black-Box Testing
755
+
756
+ The `--cli` mode of [MCP Inspector](https://github.com/modelcontextprotocol/inspector) is designed for scripting and CI and prints JSON, which makes it well suited to **black-box contract testing**.
757
+
758
+ ### Local STDIO Testing
759
+
760
+ Use your project’s actual compiled stdio entrypoint (for example, `dist/index.js`).
761
+
762
+ ```bash
763
+ # List tools
764
+ npx @modelcontextprotocol/inspector --cli \
765
+ node dist/index.js \
766
+ --method tools/list
767
+
768
+ # Call a tool
769
+ npx @modelcontextprotocol/inspector --cli \
770
+ node dist/index.js \
771
+ --method tools/call \
772
+ --tool-name health_check
773
+ ```
774
+
775
+ ### Remote HTTP Testing
776
+
777
+ ```bash
778
+ # Streamable HTTP (with auth header)
779
+ npx @modelcontextprotocol/inspector --cli \
780
+ https://my-mcp-server.example.com \
781
+ --transport http \
782
+ --method tools/list \
783
+ --header "Authorization: Bearer $TOKEN"
784
+ ```
785
+
786
+ ### Embedding in Vitest
787
+
788
+ ```typescript
789
+ import { execa } from "execa";
790
+
791
+ test("Inspector CLI: tool list contains health_check", async () => {
792
+ const { stdout } = await execa("npx", [
793
+ "-y",
794
+ "@modelcontextprotocol/inspector",
795
+ "--cli",
796
+ "node",
797
+ "dist/index.js",
798
+ "--method",
799
+ "tools/list"
800
+ ]);
801
+
802
+ const res = JSON.parse(stdout);
803
+ const names = res.tools.map((t: { name: string }) => t.name);
804
+ expect(names).toContain("health_check");
805
+ });
806
+ ```
807
+
808
+ > **When to use Inspector CLI vs SDK tests**: If you want stable API-level tests (unaffected by CLI output format changes), prefer SDK + InMemoryTransport. Inspector CLI is better suited for post-deployment smoke verification.
809
+
810
+ ---
811
+
812
+ ## CI/CD Integration Strategy
813
+
814
+ ### Recommended Layered Strategy
815
+
816
+ | Trigger | Test Type | Tools | Duration |
817
+ | ----------------- | -------------------------------- | -------------------- | -------- |
818
+ | Every commit / PR | InMemoryTransport protocol tests | Vitest + SDK | < 5s |
819
+ | Every commit / PR | HTTP/SSE transport layer tests | Vitest + real server | < 10s |
820
+ | Every commit / PR | Security/policy/error handling | Vitest + SDK | < 5s |
821
+ | Every PR | Inspector CLI contract test | Inspector --cli | < 15s |
822
+ | Nightly | Agent Loop (ScriptedLLM) | Vitest + SDK | < 30s |
823
+ | Nightly | LLM E2E smoke test | Vitest + real LLM | < 60s |
824
+ | Pre-release | Containerized full-stack test | Docker + Inspector | < 5min |
825
+
826
+ ### package.json Script Organization
827
+
828
+ The following script layout is one practical example. Rename or regroup scripts based on your repository structure and CI strategy.
829
+
830
+ ```json
831
+ {
832
+ "scripts": {
833
+ "test": "vitest run",
834
+ "test:unit": "vitest run tests/unit",
835
+ "test:integration": "vitest run tests/integration",
836
+ "test:e2e": "vitest run tests/e2e",
837
+ "test:smoke": "vitest run tests/smoke --testTimeout=60000",
838
+ "typecheck": "tsc --noEmit",
839
+ "lint": "eslint ."
840
+ }
841
+ }
842
+ ```
843
+
844
+ ### CI Configuration Essentials
845
+
846
+ ```yaml
847
+ # .github/workflows/test.yml or equivalent .gitlab-ci.yml
848
+ test:
849
+ steps:
850
+ - run: pnpm typecheck # Type check first
851
+ - run: pnpm lint # Then lint
852
+ - run: pnpm test # Finally run all tests
853
+ ```
854
+
855
+ > **Note**: For SSE tests involving TCP connection closure (e.g., client disconnection), use **polling with a deadline** instead of a fixed `setTimeout` to avoid flaky tests caused by timing differences in CI environments.
856
+
857
+ ---
858
+
859
+ ## Common Pitfalls and Solutions
860
+
861
+ ### 1. Express Body Parser Conflicts with SSE handlePostMessage
862
+
863
+ **Problem**: The `express.json()` middleware consumes the raw body stream, causing `SSEServerTransport.handlePostMessage()` to fail internally when calling `getRawBody()` with a `stream is not readable` error.
864
+
865
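+ **Solution**: When a body parser has already consumed the stream, pass the parsed body (`req.body`) as the third argument so the transport does not try to read the stream again: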
866
+
867
+ ```typescript
868
+ // ✗ Wrong
869
+ await session.transport.handlePostMessage(req, res);
870
+
871
+ // ✓ Correct
872
+ await session.transport.handlePostMessage(req, res, req.body);
873
+ ```
874
+
875
+ ### 2. Unclosed InMemoryTransport Causes Tests to Hang
876
+
877
+ **Problem**: Forgetting to close transports causes the test process to never exit.
878
+
879
+ **Solution**: Always use the `try/finally` pattern:
880
+
881
+ ```typescript
882
+ const { client, clientTransport, serverTransport } = await createLinkedPair(context);
883
+ try {
884
+ // Test logic
885
+ } finally {
886
+ await clientTransport.close();
887
+ await serverTransport.close();
888
+ }
889
+ ```
890
+
891
+ ### 3. Shared HTTP Server Causes State Leakage
892
+
893
+ **Problem**: Multiple tests share the same `setupMcpHttpApp` instance, and session state bleeds between them.
894
+
895
+ **Solution**: Tests requiring independent configuration (e.g., `MAX_SESSIONS=1`) should create their own server instance and clean up in `finally`.
896
+
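One way to enforce per-test isolation is a small wrapper that starts a fresh server on an ephemeral port and always closes it afterwards. The sketch below uses only Node's built-in `http` module. `withServer` is a hypothetical helper name, and how the request handler is obtained (for example, from `setupMcpHttpApp` with test-specific settings) is left to the project.

```typescript
import { createServer } from "node:http";
import type { RequestListener } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: one fresh server per test, always closed afterwards.
async function withServer<T>(
  handler: RequestListener,
  run: (baseUrl: string) => Promise<T>
): Promise<T> {
  const server = createServer(handler);
  // Port 0 asks the OS for a free port, so parallel tests do not collide.
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", () => resolve()));
  const { port } = server.address() as AddressInfo;
  try {
    return await run(`http://127.0.0.1:${port}`);
  } finally {
    const closed = new Promise<void>((resolve, reject) =>
      server.close((error) => (error ? reject(error) : resolve()))
    );
    // Node 18.2+: drop keep-alive sockets so close() resolves promptly.
    server.closeAllConnections();
    await closed;
  }
}
```

Because the server lives only inside the callback, session maps and counters created for one test cannot be observed by the next.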
897
+ ### 4. Timing Issues with SSE Client Disconnection
898
+
899
+ **Problem**: After `controller.abort()`, the server-side `res.on("close")` callback fires asynchronously, so an assertion made immediately after the abort may still see the old session.
900
+
901
+ **Solution**: Use polling with a deadline:
902
+
903
+ ```typescript
904
+ const deadline = Date.now() + 2000;
905
+ while (sessions.size > 0 && Date.now() < deadline) {
906
+ await new Promise((r) => setTimeout(r, 50));
907
+ }
908
+ ```
909
+
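The polling loop above exits silently when the deadline passes, so a test still needs an assertion after it. A reusable variant can throw on timeout instead. This is a minimal sketch; `waitUntil` and its option names are hypothetical, not part of Vitest or the MCP SDK.

```typescript
// Polls `condition` until it returns true, or throws once the deadline passes.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  options: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<void> {
  const { timeoutMs = 2000, intervalMs = 50 } = options;
  const deadline = Date.now() + timeoutMs;
  while (!(await condition())) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly between checks instead of busy-waiting.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a disconnection test this would read `await waitUntil(() => sessions.size === 0);`, which fails with a clear message instead of passing on stale state.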
910
+ ### 5. False Positives in GraphQL Mutation Detection
911
+
912
+ **Problem**: A query containing the string literal `"mutation"` is incorrectly identified as a mutation operation.
913
+
914
+ **Solution**: Strip comments and string values before detection:
915
+
916
+ ```typescript
917
+ const normalized = query
918
+ .replace(/#[^\n]*/g, " ") // Remove line comments
919
+ .replace(/"""[\s\S]*?"""/g, " ") // Remove block strings
920
+ .replace(/"(?:\\.|[^"\\])*"/g, " "); // Remove double-quoted strings
921
+ ```
922
+
923
+ **You must write corresponding tests** to verify this behavior does not produce false positives.
924
+
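To make the normalization step concrete, here is a sketch of a detector built on top of it. The function names and the brace-depth scan are assumptions for illustration, not the package's actual implementation. The scan only counts a `mutation` keyword that appears outside any selection set, so a field that happens to be named `mutation` is not flagged.

```typescript
// Same normalization as above: blank out comments and string values.
function stripCommentsAndStrings(query: string): string {
  return query
    .replace(/#[^\n]*/g, " ")
    .replace(/"""[\s\S]*?"""/g, " ")
    .replace(/"(?:\\.|[^"\\])*"/g, " ");
}

// Hypothetical detector: true if `mutation` appears as a top-level keyword.
function looksLikeMutation(query: string): boolean {
  const normalized = stripCommentsAndStrings(query);
  // Tokens: braces, plus names with an optional `$` so variables stay distinct.
  const tokens = normalized.match(/[{}]|\$?[A-Za-z_][A-Za-z0-9_]*/g) ?? [];
  let depth = 0;
  for (const token of tokens) {
    if (token === "{") depth++;
    else if (token === "}") depth--;
    else if (depth === 0 && token === "mutation") return true;
  }
  return false;
}

// Cases worth pinning down in tests (inputs and expected results):
const cases: Array<[string, boolean]> = [
  ["mutation { createProject { id } }", true],
  ['{ project(name: "mutation thing") { id } }', false],
  ["# mutation\n{ project { id } }", false],
  ["query Q($mutation: Boolean) { mutation { id } }", false],
  ["query A { a } mutation B { b }", true]
];
```

Two limitations of this sketch: a `#` inside a string value is still treated as the start of a comment, because comments are stripped first, and an operation or fragment that is itself named `mutation` is flagged. The second case errs on the safe side for a read-only policy.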
925
+ ### 6. TypeScript Compatibility with Environment Variable Type Overrides
926
+
927
+ **Problem**: Directly assigning `ctx.env.HTTP_JSON_ONLY = true` may cause a TypeScript error due to `readonly` types.
928
+
929
+ **Solution**: Use a type assertion:
930
+
931
+ ```typescript
932
+ (ctx.env as { HTTP_JSON_ONLY: boolean }).HTTP_JSON_ONLY = true;
933
+ ```
934
+
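When several tests override several keys, a small helper can keep the assertions in one place and restore the original values afterwards. The sketch below is one possible shape. `Mutable`, `TestEnv`, and `overrideEnv` are hypothetical names, and the approach assumes the env object is only `readonly` at the type level; an object frozen with `Object.freeze` would still reject the write at runtime.

```typescript
// Removes `readonly` from every property of T (type-level only).
type Mutable<T> = { -readonly [K in keyof T]: T[K] };

// Hypothetical env shape, for illustration only.
interface TestEnv {
  readonly HTTP_JSON_ONLY: boolean;
  readonly MAX_SESSIONS: number;
}

// Applies overrides in place and returns a function that restores the old values.
function overrideEnv<T extends object>(env: T, overrides: Partial<Mutable<T>>): () => void {
  const snapshot = { ...env }; // shallow copy of the original values
  Object.assign(env, overrides); // runtime write; `readonly` is compile-time only
  return () => {
    Object.assign(env, snapshot);
  };
}
```

Calling the returned function in a `finally` block or an `afterEach` hook keeps one test's overrides from leaking into the next.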
935
+ ---
936
+
937
+ ## Summary: Recommended Test Matrix
938
+
939
+ The following table summarizes the test dimensions a mature MCP Server should cover:
940
+
941
+ | Dimension | Test Method | Priority |
942
+ | -------------------------------------------------- | ------------------------ | -------- |
943
+ | Protocol handshake (initialize / list) | InMemoryTransport | P0 |
944
+ | Tool Handler correctness | InMemoryTransport + stub | P0 |
945
+ | Schema validation / boundary inputs | InMemoryTransport | P0 |
946
+ | HTTP Session create/reuse/delete | Real HTTP server | P0 |
947
+ | SSE connect/message/disconnect | Real HTTP server | P1 |
948
+ | Session capacity limits | Real HTTP server | P1 |
949
+ | Session rate limiting | Real HTTP server | P1 |
950
+ | Session garbage collection | Real HTTP server | P1 |
951
+ | Remote authentication (Bearer / Private-Token) | Real HTTP server | P1 |
952
+ | Dynamic API URL | Real HTTP server | P2 |
953
+ | Error handling (GitLabApiError / Error / unknown) | InMemoryTransport | P0 |
954
+ | Token redaction (glpat / ghp / JWT) | InMemoryTransport | P1 |
955
+ | Sensitive key redaction (password / authorization) | InMemoryTransport | P1 |
956
+ | Safe mode vs full mode | InMemoryTransport | P1 |
957
+ | Read-only mode tool filtering | InMemoryTransport | P0 |
958
+ | Feature flags (wiki / pipeline / release) | InMemoryTransport | P1 |
959
+ | Tool allowlist / blocklist | InMemoryTransport | P1 |
960
+ | GraphQL mutation detection and policy | InMemoryTransport | P1 |
961
+ | Agent Loop (ScriptedLLM) | InMemoryTransport | P2 |
962
+ | Response truncation (maxBytes) | InMemoryTransport | P2 |
963
+ | Health check endpoint | Real HTTP server | P2 |
964
+ | Inspector CLI contract test | Inspector --cli | P2 |
965
+ | Real LLM E2E | Real LLM API | P3 |
966
+
967
+ ---
968
+
969
+ ## References
970
+
971
+ - [MCP Official Specification](https://modelcontextprotocol.io/specification) (referenced version: 2025-11-25)
972
+ - [MCP TypeScript SDK](https://github.com/modelcontextprotocol/typescript-sdk)
973
+ - [MCP Inspector (including CLI mode)](https://github.com/modelcontextprotocol/inspector)
974
+ - [MCP Best Practices Guide](https://modelcontextprotocol.info/docs/best-practices/)
975
+ - [MCP Server E2E Testing Example](https://github.com/mkusaka/mcp-server-e2e-testing-example)
976
+ - [MCPcat Integration Testing Guide](https://mcpcat.io/guides/integration-tests-mcp-flows/)
977
+ - [MCPcat Unit Testing Guide](https://mcpcat.io/guides/writing-unit-tests-mcp-servers/)
978
+ - [Stop Vibe-Testing Your MCP Server](https://www.jlowin.dev/blog/stop-vibe-testing-mcp-servers)
979
+ - [MCP Server Testing Tools Overview (Testomat.io)](https://testomat.io/blog/mcp-server-testing-tools/)
980
+ - [MCP Server Best Practices (MarkTechPost)](https://www.marktechpost.com/2025/07/23/7-mcp-server-best-practices-for-scalable-ai-integrations-in-2025/)
981
+ - [MCP Official Node.js Client Tutorial](https://modelcontextprotocol.io/tutorials/building-a-client)