npm - @yawlabs/mcp-compliance - Versions diffs - 0.9.1 → 0.10.1 - Mend

Files changed (8)

package/README.md +68 -2
package/dist/{chunk-CH2E27X5.js → chunk-7KISK3FS.js} +99 -9
package/dist/index.js +700 -187
package/dist/mcp/server.js +1 -1
package/dist/runner.d.ts +10 -1
package/dist/runner.js +1 -1
package/package.json +4 -1
package/schemas/report.v1.json +165 -0

package/README.md CHANGED

@@ -5,7 +5,7 @@
 [![GitHub stars](https://img.shields.io/github/stars/YawLabs/mcp-compliance)](https://github.com/YawLabs/mcp-compliance/stargazers)
 [![CI](https://github.com/YawLabs/mcp-compliance/actions/workflows/ci.yml/badge.svg)](https://github.com/YawLabs/mcp-compliance/actions/workflows/ci.yml)
-**Test any MCP server for spec compliance.** 84-test suite covering transport, lifecycle, tools, resources, prompts, error handling, schema validation, and security against the [MCP specification](https://modelcontextprotocol.io/specification/2025-11-25). Works against **HTTP endpoints** (`https://my-server.com/mcp`) and **stdio servers** (`npx @modelcontextprotocol/server-filesystem /tmp`) alike. CLI, MCP server, and programmatic API.
+**Test any MCP server for spec compliance.** 85-test suite covering transport, lifecycle, tools, resources, prompts, error handling, schema validation, and security against the [MCP specification](https://modelcontextprotocol.io/specification/2025-11-25). Works against **HTTP endpoints** (`https://my-server.com/mcp`) and **stdio servers** (`npx @modelcontextprotocol/server-filesystem /tmp`) alike. CLI, MCP server, and programmatic API.
 Built and maintained by [Yaw Labs](https://yaw.sh).
@@ -128,6 +128,19 @@ On Windows, `npx` and other `.cmd` shims are handled automatically by spawning t
 ### CI integration
+**GitHub Action** (drop into any `.github/workflows/*.yml`):
+```yaml
+- uses: YawLabs/mcp-compliance@v0
+  with:
+    target: 'node ./dist/server.js'   # or a URL like https://my-server.com/mcp
+    format: github                     # ::error / ::warning annotations on the PR
+    strict: 'true'                     # exit non-zero if any required test fails
+    min-grade: 'A'                     # also exit if grade slips
+```
+**Manual CLI invocation:**
 ```bash
 # GitHub Actions: emits ::error / ::warning annotations inline on the PR
 mcp-compliance test https://my-server.com/mcp --format github --strict
@@ -135,11 +148,30 @@ mcp-compliance test https://my-server.com/mcp --format github --strict
 # Slack/Linear/PR comment: drop the body straight into a comment
 mcp-compliance test https://my-server.com/mcp --format markdown > report.md
+# HTML report (self-contained, share anywhere — issue comments, S3, GitHub Pages)
+mcp-compliance test https://my-server.com/mcp --format html > report.html
 # Block release if grade slips below B
 mcp-compliance test https://my-server.com/mcp --min-grade B
 # Preview which tests will run before connecting (handy for --only/--skip authoring)
 mcp-compliance test --list --transport stdio --skip security
+# Diff two runs — exit 1 if anything that was passing is now failing
+mcp-compliance test https://my-server.com/mcp --format json > current.json
+mcp-compliance diff baseline.json current.json
+# Watch mode for stdio dev loop — re-runs on file changes in cwd
+mcp-compliance test --watch -- node ./dist/server.js
+# Latency benchmark
+mcp-compliance benchmark -- node ./dist/server.js -r 200 -c 4
+```
+**Docker:**
+```bash
+docker run --rm ghcr.io/yawlabs/mcp-compliance test https://my-server.com/mcp
 ```
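
Reviewer note: the README describes `mcp-compliance diff` as exiting 1 when a previously passing test now fails. The comparison itself is not part of this diff, so the following is only a minimal sketch of that rule. It assumes each JSON report has a `tests` array whose entries carry `id` and `passed`; `findRegressions` is a hypothetical name, not the CLI's implementation.

```javascript
// Hypothetical helper, not the package's own diff logic.
// Assumes report.tests is an array of { id, passed, ... } entries.
function findRegressions(baseline, current) {
  const currentById = new Map(current.tests.map((t) => [t.id, t]));
  const regressions = [];
  for (const before of baseline.tests) {
    const after = currentById.get(before.id);
    // Regression: passed in the baseline, still present, failing now.
    if (before.passed && after && !after.passed) {
      regressions.push(before.id);
    }
  }
  return regressions;
}

const baseline = {
  tests: [
    { id: "transport-post", passed: true },
    { id: "tools-list", passed: true },
  ],
};
const current = {
  tests: [
    { id: "transport-post", passed: true },
    { id: "tools-list", passed: false },
  ],
};

const regressions = findRegressions(baseline, current);
console.log(regressions);
```

A wrapper script could map a non-empty result to a non-zero exit code. How the real command treats tests that exist in only one of the two reports is not visible here.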
 ### Scaffold a config
@@ -447,7 +479,7 @@ Restart your MCP client and approve the server when prompted.
 ### Tools
-- **mcp_compliance_test** — Run the full 84-test suite against a URL or stdio command. Supports auth, custom headers, env vars, timeout, retries, and category/test filtering. Returns grade, score, and detailed results.
+- **mcp_compliance_test** — Run the full 85-test suite against a URL or stdio command. Supports auth, custom headers, env vars, timeout, retries, and category/test filtering. Returns grade, score, and detailed results.
 - **mcp_compliance_badge** — Get the badge markdown/HTML for a server. Supports auth and custom headers.
 - **mcp_compliance_explain** — Explain what a specific test ID checks and why it matters.
@@ -468,8 +500,42 @@ const report2 = await runComplianceSuite('https://my-server.com/mcp', {
   retries: 1,
   only: ['transport', 'lifecycle'],
 });
+// Live progress for streaming UIs (e.g. server-sent-events to a browser)
+await runComplianceSuite('https://my-server.com/mcp', {
+  onTestComplete: (result) => {
+    // result has the full TestResult: id, name, category, required,
+    // passed, details, durationMs, specRef. Push it to your client.
+    sendToClient(result);
+  },
+});
+```
+## Report schema
+The JSON output of the test suite is a stable, versioned contract. Every report includes a `schemaVersion` field at the top level. The full JSON Schema lives at [`schemas/report.v1.json`](./schemas/report.v1.json) and is shipped with the npm package.
+```jsonc
+{
+  "schemaVersion": "1",        // bumped on breaking changes to the report shape
+  "specVersion": "2025-11-25", // MCP spec version tested against
+  "toolVersion": "0.10.0",     // mcp-compliance version that produced the report
+  "url": "...",
+  "timestamp": "...",
+  "grade": "A",
+  "score": 92.5,
+  "tests": [ ... ],
+  // ...
+}
 ```
+Consumer guidance:
+- Pin against `schemaVersion`. Reject reports with an unknown version rather than guessing at the shape.
+- The schema validates with any Draft 2020-12 validator (e.g. `ajv`).
+- Within a major version, additions are non-breaking. Renames, removals, or type changes bump the version.
+- Two runs against the same server produce equivalent grade, score, and per-test pass/fail (modulo timings/timestamps).
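
Reviewer note: the first consumer guideline (pin against `schemaVersion`, reject unknown versions) can be sketched as a small guard. `parseReport` and `SUPPORTED_SCHEMA_VERSIONS` are hypothetical names for illustration. Full validation against `schemas/report.v1.json` would need a JSON Schema validator and is left out.

```javascript
// Hypothetical consumer-side guard; names and error text are illustrative.
const SUPPORTED_SCHEMA_VERSIONS = new Set(["1"]);

function parseReport(json) {
  const report = JSON.parse(json);
  // Reject anything we were not written against instead of guessing the shape.
  if (!SUPPORTED_SCHEMA_VERSIONS.has(report.schemaVersion)) {
    throw new Error(`Unsupported report schemaVersion: ${String(report.schemaVersion)}`);
  }
  return report;
}

const ok = parseReport(
  JSON.stringify({ schemaVersion: "1", grade: "A", score: 92.5, tests: [] })
);
console.log(ok.grade);

let rejected = false;
try {
  parseReport(JSON.stringify({ schemaVersion: "2", tests: [] }));
} catch {
  rejected = true;
}
console.log(rejected);
```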
 ## Specification
 The compliance testing methodology is published as an open specification:

package/dist/{chunk-CH2E27X5.js → chunk-7KISK3FS.js} RENAMED

@@ -5,7 +5,7 @@ import { request as request2 } from "undici";
 // src/badge.ts
 import { createHash } from "crypto";
 function urlHash(url) {
-  return createHash("sha256").update(url).digest("hex").slice(0, 12);
+  return createHash("sha256").update(url).digest("hex").slice(0, 24);
 }
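
Reviewer note: the hunk above widens the truncated SHA-256 prefix from 12 to 24 hex characters (48 to 96 bits). The sketch below reproduces the same hashing with the prefix length as a parameter, which is an addition for illustration only. It shows that the new value extends the old one rather than replacing it.

```javascript
const { createHash } = require("node:crypto");

// Same hashing as urlHash in the diff; hexChars is a parameter added
// here for illustration (the package hard-codes the slice length).
function urlHash(url, hexChars) {
  return createHash("sha256").update(url).digest("hex").slice(0, hexChars);
}

const url = "https://my-server.com/mcp";
const oldHash = urlHash(url, 12); // 12 hex chars = 48 bits
const newHash = urlHash(url, 24); // 24 hex chars = 96 bits

console.log(oldHash.length, newHash.length, newHash.startsWith(oldHash));
```

Anything keyed on the 12-character value would need a prefix match to line up with the 24-character one. Whether previously issued badge identifiers remain valid is not shown in this diff.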
 function generateBadge(url) {
   const hash = urlHash(url);
@@ -209,9 +209,21 @@ function createStdioTransport(opts) {
   let exited = false;
   let exitCode = null;
   let spawnError = null;
+  let spawned = false;
   const pending = /* @__PURE__ */ new Map();
   let stdoutBuffer = "";
   let stderrBuffer = "";
+  const spawnReady = new Promise((resolve, reject) => {
+    child.once("spawn", () => {
+      spawned = true;
+      resolve();
+    });
+    child.once("error", (err) => {
+      if (!spawned) reject(err);
+    });
+  });
+  spawnReady.catch(() => {
+  });
   child.on("error", (err) => {
     spawnError = err;
     rejectAllPending(err);
@@ -281,6 +293,15 @@ function createStdioTransport(opts) {
     ${snippet.replace(/\n/g, "\n    ")}`;
   }
   async function writeLine(line) {
+    if (!spawned && !spawnError) {
+      try {
+        await spawnReady;
+      } catch (err) {
+        throw new Error(
+          annotateWithStderr(`stdio transport: spawn failed \u2014 ${err instanceof Error ? err.message : String(err)}`)
+        );
+      }
+    }
     if (exited) {
       throw new Error(annotateWithStderr(`stdio transport: child has exited (code ${exitCode})`));
     }
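
Reviewer note: the two stdio hunks above make `writeLine` wait for the child's `spawn` event, so a failed spawn surfaces as a clear error instead of a write to a process that never started. Below is a minimal sketch of that readiness promise, using a bare `EventEmitter` as a stand-in for a `ChildProcess`. `waitForSpawn` is a hypothetical name.

```javascript
const { EventEmitter } = require("node:events");

// Resolves on "spawn"; rejects on an "error" that arrives before "spawn".
function waitForSpawn(child) {
  let spawned = false;
  const ready = new Promise((resolve, reject) => {
    child.once("spawn", () => {
      spawned = true;
      resolve();
    });
    child.once("error", (err) => {
      if (!spawned) reject(err);
    });
  });
  // No-op handler so a rejection nobody awaits is not reported as unhandled.
  ready.catch(() => {});
  return ready;
}

// Stand-in for a child process whose spawn fails.
const fakeChild = new EventEmitter();
const outcome = waitForSpawn(fakeChild).then(
  () => "spawned",
  (err) => `spawn failed: ${err.message}`
);
fakeChild.emit("error", new Error("ENOENT"));
outcome.then((msg) => console.log(msg));
```

The no-op `catch` matches the empty `spawnReady.catch(() => {})` in the diff: the promise may reject before any caller awaits it.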
@@ -375,6 +396,7 @@ function createStdioTransport(opts) {
 }
 // src/types.ts
+var REPORT_SCHEMA_VERSION = "1";
 var TEST_DEFINITIONS = [
   // ── Transport (13 tests) ─────────────────────────────────────────
   {
@@ -679,6 +701,15 @@ var TEST_DEFINITIONS = [
     description: "Sends a tools/call request with _meta.progressToken and checks if the server sends progress notifications via SSE. Progress support is optional but recommended for long-running operations.",
     recommendation: "When a request includes _meta.progressToken, send notifications/progress events via SSE to report progress. Include progressToken, progress (current), and optionally total fields."
   },
+  {
+    id: "lifecycle-meta-tolerance",
+    name: "Tolerates _meta field on requests",
+    category: "lifecycle",
+    required: false,
+    specRef: "basic/utilities#_meta",
+    description: "Sends a ping with params._meta = { extra: 'value' } and verifies the server doesn't error. The 2025-11-25 spec allows arbitrary _meta on any request; servers should ignore unknown _meta fields gracefully.",
+    recommendation: "Treat the _meta field as opaque \u2014 pass it through your request validator, but do not reject requests for unknown _meta keys. The MCP spec reserves _meta for protocol/transport metadata and forward-compat extensibility."
+  },
   // ── Tools (4 tests) ──────────────────────────────────────────────
   {
     id: "tools-list",
@@ -1228,14 +1259,20 @@ var STDIO_INCOMPATIBLE_IDS = /* @__PURE__ */ new Set([
   "error-parse-code",
   "error-invalid-request-code",
   // Security tests that are inherently HTTP-layer (auth headers,
-  // sessions, CORS, TLS, rate limits, RFC 9728 metadata).
+  // sessions, CORS, TLS, rate limits, RFC 9728 metadata). For stdio
+  // servers these don't apply — the parent process owns the trust
+  // boundary, not the server.
   "security-tls-required",
   "security-oauth-metadata",
   "security-token-in-uri",
   "security-rate-limiting",
   "security-cors-headers",
   "security-origin-validation",
-  "security-session-not-auth"
+  "security-session-not-auth",
+  "security-auth-required",
+  "security-auth-malformed",
+  "security-www-authenticate",
+  "security-session-entropy"
 ]);
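
Reviewer note: four more security tests are now marked as HTTP-only. The body of `supportsTransport` is mostly outside this hunk, so the sketch below is an assumption about how such a set could filter a test plan. It uses only the four IDs added here, and `runsOnStdio` is a hypothetical name.

```javascript
// Only the four IDs added in this hunk; the real set is larger.
const STDIO_INCOMPATIBLE_IDS = new Set([
  "security-auth-required",
  "security-auth-malformed",
  "security-www-authenticate",
  "security-session-entropy",
]);

// Hypothetical filter, not the package's supportsTransport.
function runsOnStdio(id) {
  return !STDIO_INCOMPATIBLE_IDS.has(id);
}

const planned = [
  "lifecycle-meta-tolerance",
  "security-auth-required",
  "tools-list",
  "security-session-entropy",
];
const stdioPlan = planned.filter(runsOnStdio);
console.log(stdioPlan);
```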
 function supportsTransport(def, kind) {
   if (!def) return true;
@@ -1299,8 +1336,11 @@ async function runComplianceSuite(target, options = {}) {
         return !options.skip.includes(category) && !options.skip.includes(id);
       }
       return true;
+    }, looksRejected2 = function(text, isErrorFlag) {
+      if (isErrorFlag) return true;
+      return REJECTION_PATTERNS.some((p) => p.test(text));
     };
-    var buildHeaders = buildHeaders2, shouldRun = shouldRun2;
+    var buildHeaders = buildHeaders2, shouldRun = shouldRun2, looksRejected = looksRejected2;
     const backendUrl = resolvedTarget.type === "http" ? resolvedTarget.url : "";
     const userHeaders = resolvedTarget.type === "http" ? resolvedTarget.headers ?? options.headers ?? {} : {};
     const displayUrl = resolvedTarget.type === "http" ? resolvedTarget.url : `stdio:${resolvedTarget.command}${resolvedTarget.args?.length ? ` ${resolvedTarget.args.join(" ")}` : ""}`;
@@ -1382,7 +1422,7 @@ async function runComplianceSuite(target, options = {}) {
           if (attempt < retries) await new Promise((r) => setTimeout(r, 1e3 * (attempt + 1)));
         }
       }
-      tests.push({
+      const result = {
         id,
         name,
         category,
@@ -1391,8 +1431,10 @@ async function runComplianceSuite(target, options = {}) {
         details: lastResult.details,
         durationMs: Date.now() - start,
         specRef: `${SPEC_BASE}/${specRef}`
-      });
+      };
+      tests.push(result);
       options.onProgress?.(id, lastResult.passed, lastResult.details);
+      options.onTestComplete?.(result);
     }
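
Reviewer note: `onTestComplete` now receives the full result object after each test. The README suggests pushing it to a browser over server-sent events. The sketch below shows one way to frame a result for that. The event name `test-complete` and the helper `toSseFrame` are assumptions, and the sample result omits `specRef` because its base URL is not visible in this diff.

```javascript
// Hypothetical helper: serialize one result as a server-sent-events frame.
// JSON.stringify escapes newlines, so the payload fits on one data line.
function toSseFrame(result) {
  return `event: test-complete\ndata: ${JSON.stringify(result)}\n\n`;
}

// Sample result using the field names listed in the README hunk.
const sample = {
  id: "lifecycle-meta-tolerance",
  name: "Tolerates _meta field on requests",
  category: "lifecycle",
  required: false,
  passed: true,
  details: "Server accepted ping with arbitrary _meta field",
  durationMs: 12,
};

const frame = toSseFrame(sample);
const decoded = JSON.parse(frame.split("\n")[1].slice("data: ".length));
console.log(decoded.id, decoded.passed);
```

A callback such as `onTestComplete: (result) => res.write(toSseFrame(result))` could then feed an HTTP response, where `res` is whatever response object the host server provides.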
     await test(
       "transport-post",
@@ -1984,6 +2026,28 @@ async function runComplianceSuite(target, options = {}) {
         }
       }
     );
+    await test(
+      "lifecycle-meta-tolerance",
+      "Tolerates _meta field on requests",
+      "lifecycle",
+      false,
+      "basic/utilities#_meta",
+      async () => {
+        try {
+          const res = await rpc("ping", { _meta: { "mcp-compliance/probe": "1" } });
+          const body = res.body;
+          if (body.error) {
+            return {
+              passed: false,
+              details: `Server rejected _meta on ping (code ${body.error.code}). _meta should be ignored, not error.`
+            };
+          }
+          return { passed: true, details: "Server accepted ping with arbitrary _meta field" };
+        } catch (err) {
+          return { passed: false, details: `Error: ${err instanceof Error ? err.message : String(err)}` };
+        }
+      }
+    );
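
Reviewer note: the new test sends a `ping` whose params carry an unknown `_meta` key and fails only if the response contains an `error`. The sketch below restates that rule outside the suite. The JSON-RPC envelope built here is an assumption, since the `rpc` helper is not part of this diff, and both function names are hypothetical. The probe key matches the test body, which differs from the `{ extra: 'value' }` wording in the test's description string.

```javascript
// Assumed JSON-RPC 2.0 envelope; the suite's rpc() helper may differ.
function buildPingWithMeta(id) {
  return {
    jsonrpc: "2.0",
    id,
    method: "ping",
    params: { _meta: { "mcp-compliance/probe": "1" } },
  };
}

// Restates the pass/fail rule from the test body in the diff.
function evaluatePingResponse(body) {
  if (body.error) {
    return {
      passed: false,
      details: `Server rejected _meta on ping (code ${body.error.code}). _meta should be ignored, not error.`,
    };
  }
  return { passed: true, details: "Server accepted ping with arbitrary _meta field" };
}

const request = buildPingWithMeta(1);
const tolerant = evaluatePingResponse({ jsonrpc: "2.0", id: 1, result: {} });
const strict = evaluatePingResponse({
  jsonrpc: "2.0",
  id: 1,
  error: { code: -32602, message: "Invalid params" },
});
console.log(request.method, tolerant.passed, strict.passed);
```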
     await test(
       "transport-content-type-init",
       "Initialize response has valid content type",
@@ -3284,25 +3348,50 @@ async function runComplianceSuite(target, options = {}) {
         }
       }
     );
+    const REJECTION_PATTERNS = [
+      /access denied/i,
+      /permission denied/i,
+      /not allowed/i,
+      /not permitted/i,
+      /outside (allowed|permitted)/i,
+      /forbidden/i,
+      /unauthorized/i,
+      /invalid (path|input|argument|parameter|request)/i,
+      /(payload|request) (rejected|blocked|refused)/i,
+      /enoent|eacces|eperm/i,
+      /sandbox(ed)?/i,
+      /(no such file|file not found)/i,
+      /\binvalid\b.*\b(input|json|argument|parameter|character)/i
+    ];
     async function runInjectionTest(toolName, paramName, payloads, detectPattern, label) {
       const issues = [];
+      let defended = 0;
       for (const payload of payloads) {
         try {
           const res = await rpc("tools/call", { name: toolName, arguments: { [paramName]: payload } });
-          const content = res.body?.result?.content;
+          const result = res.body?.result;
+          const content = result?.content;
+          const isErrorFlag = result?.isError === true;
           if (Array.isArray(content)) {
             const text = content.map((c) => c.text || "").join(" ");
             if (detectPattern.test(text)) {
-              issues.push(`Payload "${payload}" ${label} (output: ${text.substring(0, 100)})`);
+              if (looksRejected2(text, isErrorFlag)) {
+                defended++;
+              } else {
+                issues.push(`Payload "${payload}" ${label} (output: ${text.substring(0, 100)})`);
+              }
+            } else {
+              defended++;
             }
           }
         } catch {
+          defended++;
         }
       }
       if (issues.length > 0) return { passed: false, details: issues.join("; ") };
       return {
         passed: true,
-        details: `Tested ${payloads.length} payloads against ${toolName}.${paramName} \u2014 no ${label.split(" ")[0]} detected`
+        details: defended === payloads.length ? `Tested ${payloads.length} payloads against ${toolName}.${paramName} \u2014 server defended (rejected or sanitized)` : `Tested ${payloads.length} payloads against ${toolName}.${paramName} \u2014 no ${label.split(" ")[0]} detected`
       };
     }
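
Reviewer note: the reworked `runInjectionTest` no longer flags every response that matches the detection pattern. A match is counted as defended when the tool set `isError` or the text looks like a rejection message. Below is a minimal sketch of that decision, using a subset of the patterns from the diff. The `detectPattern` value and the `classify` helper are hypothetical.

```javascript
// Subset of REJECTION_PATTERNS from the diff.
const REJECTION_PATTERNS = [
  /access denied/i,
  /permission denied/i,
  /enoent|eacces|eperm/i,
  /(no such file|file not found)/i,
];

function looksRejected(text, isErrorFlag) {
  if (isErrorFlag) return true;
  return REJECTION_PATTERNS.some((p) => p.test(text));
}

// Hypothetical detection pattern for a path-traversal probe.
const detectPattern = /etc\/passwd/i;

// Mirrors the branch structure inside the payload loop.
function classify(text, isErrorFlag) {
  if (!detectPattern.test(text)) return "defended";
  return looksRejected(text, isErrorFlag) ? "defended" : "issue";
}

const echoedInError = classify("Access denied: ../../etc/passwd is outside the sandbox", false);
const leaked = classify("root:x:0:0:root:/root:/bin/bash (from /etc/passwd)", false);
const flagged = classify("contents of /etc/passwd", true);
console.log(echoedInError, leaked, flagged);
```

The trade-off is that this is a text heuristic. It removes false positives from servers that echo the payload in an error message, but output that both leaks data and happens to contain a rejection phrase would also be counted as defended.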
     if (toolNames.length > 0) {
@@ -3841,6 +3930,7 @@ async function runComplianceSuite(target, options = {}) {
     const { score, grade, overall, summary, categories } = computeScore(tests);
     const badge = generateBadge(displayUrl);
     return {
+      schemaVersion: REPORT_SCHEMA_VERSION,
       specVersion: SPEC_VERSION,
       toolVersion: TOOL_VERSION,
       url: displayUrl,