mcp-stdio-guard 0.4.0 → 0.5.0

Files changed (3)
  1. package/README.md +40 -9
  2. package/package.json +1 -1
  3. package/src/index.js +648 -23
package/README.md CHANGED
@@ -11,7 +11,7 @@
11
11
  <p align="center">
12
12
  <a href="https://github.com/1Utkarsh1/mcp-stdio-guard/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/1Utkarsh1/mcp-stdio-guard/actions/workflows/ci.yml/badge.svg" /></a>
13
13
  <a href="https://www.npmjs.com/package/mcp-stdio-guard"><img alt="npm" src="https://img.shields.io/npm/v/mcp-stdio-guard?color=0b6bcb" /></a>
14
- <a href="https://badge.socket.dev/npm/package/mcp-stdio-guard/0.4.0"><img alt="Socket" src="https://badge.socket.dev/npm/package/mcp-stdio-guard/0.4.0" /></a>
14
+ <a href="https://badge.socket.dev/npm/package/mcp-stdio-guard/0.5.0"><img alt="Socket" src="https://badge.socket.dev/npm/package/mcp-stdio-guard/0.5.0" /></a>
15
15
  <img alt="runtime dependencies" src="https://img.shields.io/badge/runtime%20deps-0-1f8f4c" />
16
16
  <img alt="node" src="https://img.shields.io/badge/node-%3E%3D18-2f855a" />
17
17
  <a href="LICENSE"><img alt="license" src="https://img.shields.io/badge/license-MIT-111827" /></a>
@@ -143,6 +143,7 @@ mcp-stdio-guard [options] -- <command> [args...]
143
143
  | `--repeat <count>` | run the same guard multiple times to catch cold/warm startup behavior |
144
144
  | `--request <method>` | send one MCP request after initialization, for example `tools/list` |
145
145
  | `--params <json>` | JSON params for `--request` |
146
+ | `--adversarial-probe <name>` / `--adversarial-probes <list>` | opt into strict protocol probes: `invalid-method`, `invalid-params`, `notification`, `malformed-json`, `all`, or `none` |
146
147
  | `--scan <path>` | scan source for risky stdout writes and visible startup-output risks |
147
148
  | `--fail-on-static` | make static scan findings fail the command |
148
149
  | `--json` | print machine-readable output |
@@ -159,9 +160,9 @@ Profiles are deterministic presets for common workflows. Existing CLI behavior r
159
160
  | `smoke` | initialize only unless `--request` is provided; skip advertised `tools/list`, `resources/list`, and `prompts/list` probes |
160
161
  | `registry` | run advertised list probes and repeat twice by default for cold/warm consistency |
161
162
  | `ci` | emit JSON output and make static scan findings fail when `--scan` is used |
162
- | `strict` | combine CI-style output/static failures with registry-style repeat depth; future adversarial probes will attach here |
163
+ | `strict` | combine CI-style output/static failures, registry-style repeat depth, and built-in adversarial protocol probes |
163
164
 
164
- Explicit flags can still narrow or deepen a profile. For example, `--profile registry --repeat 1` keeps registry capability probing but disables the repeat preset.
165
+ Explicit flags can still narrow or deepen a profile. For example, `--profile registry --repeat 1` keeps registry capability probing but disables the repeat preset. Use `--profile strict --adversarial-probes none` if you want strict JSON/static behavior without adversarial inputs.
165
166
 
166
167
  ## Config Files
167
168
 
@@ -181,6 +182,8 @@ Supported fields:
181
182
  | `request` | one explicit post-initialize request: `{ "method": "tools/list" }` |
182
183
  | `requests` | list of explicit post-initialize requests |
183
184
  | `safeToolCalls` | opt-in `tools/call` recipes; no tool is called unless listed here or explicitly requested |
185
+ | `adversarialProbes` | opt-in built-in probes as `true`, `"all"`, `"none"`, or a list of probe names |
186
+ | `adversarialToolCalls` | opt-in invalid-argument `tools/call` probes for configured safe tools |
184
187
 
185
188
  Example:
186
189
 
@@ -196,14 +199,20 @@ Example:
196
199
  "requests": [
197
200
  { "method": "tools/list" }
198
201
  ],
202
+ "adversarialProbes": ["invalid-method", "notification"],
199
203
  "safeToolCalls": [
200
204
  { "name": "echo", "arguments": { "text": "hello" } }
205
+ ],
206
+ "adversarialToolCalls": [
207
+ { "name": "echo", "arguments": { "unexpected": true } }
201
208
  ]
202
209
  }
203
210
  ```
204
211
 
205
212
  The guard does not discover and call arbitrary tools from `tools/list`. Tool execution only happens through an explicit `safeToolCalls` entry or an explicit `tools/call` request you provide.
206
213
 
214
+ Adversarial probes are off by default because they intentionally send unusual inputs. Built-in probes check that unknown methods return structured errors, invalid params return structured errors, notifications do not receive responses, and malformed JSON does not crash the process. `adversarialToolCalls` is separate because it calls a named tool with intentionally invalid arguments; only use it for tools you control and consider safe/idempotent.
215
+
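As a rough illustration of what the four built-in probes put on the wire, the sketch below serializes each one into the single line that would be written to the server's stdin. The method names, params, `omitId`, and `rawLine` values are taken from `BUILTIN_ADVERSARIAL_PROBES` in `src/index.js`; the helper `buildProbeLine` and the `probe-N` ids are hypothetical and are not the guard's own code.

```javascript
// Hypothetical helper: turn a probe definition into one newline-delimited stdin line.
function buildProbeLine(probe, id) {
  // malformed-json is sent verbatim; it is intentionally not valid JSON.
  if (probe.rawLine) return probe.rawLine;
  const message = { jsonrpc: '2.0', method: probe.method, params: probe.params };
  // Notifications carry no id, so a well-behaved server must not reply to them.
  if (!probe.omitId) message.id = id;
  return JSON.stringify(message);
}

// Values copied from the built-in probe table in the diff.
const builtinProbes = [
  { name: 'invalid-method', method: 'mcp_stdio_guard/invalid_method', params: { reason: 'adversarial-probe' } },
  { name: 'invalid-params', method: 'tools/list', params: [] },
  { name: 'notification', method: 'notifications/mcp_stdio_guard_probe', params: { reason: 'adversarial-probe' }, omitId: true },
  { name: 'malformed-json', rawLine: '{"jsonrpc":"2.0","id":"mcp-stdio-guard-malformed","method":' }
];

// The id format is an assumption made for this example only.
const probeLines = builtinProbes.map((probe, index) => buildProbeLine(probe, `probe-${index}`));
for (const line of probeLines) console.log(line);
```

Only the first two lines carry an `id`, which is why only those two probes can expect a structured error in return.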
207
216
  ## JSON Contract
208
217
 
209
218
  `--json` is intended for CI, registries, and badge ingestion. The current contract is `schemaVersion: 1`; new fields may be added, but these fields are stable for consumers:
@@ -220,6 +229,7 @@ The guard does not discover and call arbitrary tools from `tools/list`. Tool exe
220
229
  | `initialized` | whether the server completed the initialize handshake |
221
230
  | `operation` | post-initialize request result, or `null` when `--request` was not used |
222
231
  | `operations` | all explicit post-initialize requests, including config requests and safe tool calls |
232
+ | `adversarial` | opt-in adversarial probe results, including status, risk text, and per-probe issue codes |
223
233
  | `toolSchema` | summary of `tools/list` metadata validation when that operation was requested or probed from an advertised tools capability |
224
234
  | `capabilityProbes` | whether advertised capability list probes were enabled for this run |
225
235
  | `capabilityKeys` | sorted capability keys returned by `initialize` for a single run; repeat mode exposes this inside each `runs` entry |
@@ -234,7 +244,7 @@ The guard does not discover and call arbitrary tools from `tools/list`. Tool exe
234
244
  | `staticFindings` | source scan findings with language, file, line, reason, and message |
235
245
  | `runs` | per-run results when `--repeat` is used |
236
246
 
237
- Check statuses are `pass`, `fail`, `warning`, or `skipped`. The `checks` object separates the signal into `initialize`, `stdout`, `jsonRpc`, `operation`, `capabilities`, `toolSchema`, `process`, `pythonBuffering`, `staticScan`, and `repeat`, each with stable `status` and `issueCodes` fields. When `--repeat` is used, `checks.repeat` also includes `runs`, `passedRuns`, and `failedRuns`; each entry in `runs` is a normal schema-versioned result for that individual guard run.
247
+ Check statuses are `pass`, `fail`, `warning`, or `skipped`. The `checks` object separates the signal into `initialize`, `stdout`, `jsonRpc`, `operation`, `capabilities`, `toolSchema`, `adversarial`, `process`, `pythonBuffering`, `staticScan`, and `repeat`, each with stable `status` and `issueCodes` fields. When `--repeat` is used, `checks.repeat` also includes `runs`, `passedRuns`, and `failedRuns`; each entry in `runs` is a normal schema-versioned result for that individual guard run.
238
248
 
239
249
  `issueClasses` is additive to `checks`. It groups issue codes by the kind of problem a registry or client should display:
240
250
 
@@ -250,7 +260,7 @@ Current issue-code mapping:
250
260
  | --- | --- |
251
261
  | `installRuntime` | `initialize-timeout`, `operation-missing-response`, `operation-timeout`, `python-buffered-stdio`, `server-crashed`, `server-exited`, `spawn-failed` |
252
262
  | `stdioTransport` | `static-stdout-write`, `stdout-content-length-framing`, `stdout-empty-line`, `stdout-non-json`, `stdout-without-newline` |
253
- | `mcpProtocol` | `capability-list-error`, `capability-list-missing-response`, `capability-list-timeout`, `capability-list-unsupported`, `initialize-error`, `initialize-invalid-capabilities`, `initialize-invalid-protocol-version`, `initialize-invalid-result`, `initialize-invalid-server-info`, `initialize-missing-capabilities`, `initialize-missing-protocol-version`, `initialize-missing-server-info`, `notification-response`, `operation-error`, `repeat-capability-drift`, `repeat-list-shape-drift`, `repeat-protocol-drift`, `repeat-tool-drift`, `response-id-mismatch`, `response-id-type-mismatch`, `stdout-invalid-json-rpc`, `stdout-unexpected-request-id`, `tool-description-missing`, `tool-input-schema-invalid`, `tool-input-schema-required-missing`, `tool-name-duplicate`, `tool-name-invalid`, `tools-list-invalid-result` |
263
+ | `mcpProtocol` | `adversarial-invalid-method-result`, `adversarial-invalid-params-result`, `adversarial-malformed-json-result`, `adversarial-notification-response`, `adversarial-probe-crash`, `adversarial-probe-invalid-stdout`, `adversarial-probe-timeout`, `adversarial-tool-call-result`, `capability-list-error`, `capability-list-missing-response`, `capability-list-timeout`, `capability-list-unsupported`, `initialize-error`, `initialize-invalid-capabilities`, `initialize-invalid-protocol-version`, `initialize-invalid-result`, `initialize-invalid-server-info`, `initialize-missing-capabilities`, `initialize-missing-protocol-version`, `initialize-missing-server-info`, `notification-response`, `operation-error`, `repeat-capability-drift`, `repeat-list-shape-drift`, `repeat-protocol-drift`, `repeat-tool-drift`, `response-id-mismatch`, `response-id-type-mismatch`, `stdout-invalid-json-rpc`, `stdout-unexpected-request-id`, `tool-description-missing`, `tool-input-schema-invalid`, `tool-input-schema-required-missing`, `tool-name-duplicate`, `tool-name-invalid`, `tools-list-invalid-result` |
254
264
 
255
265
  Initialize lifecycle checks are part of the MCP protocol class. Missing or invalid `protocolVersion` and `capabilities` fail the run before the guard sends `notifications/initialized` or any normal request. Missing or invalid `serverInfo` is warning-level so registries can surface incomplete metadata without confusing it with a broken transport.
256
266
 
@@ -260,6 +270,8 @@ Tool schema checks run when `tools/list` receives a successful result, either fr
260
270
 
261
271
  Capability honesty checks are additive. If `initialize` advertises `capabilities.tools`, `capabilities.resources`, or `capabilities.prompts`, the guard probes the matching `tools/list`, `resources/list`, or `prompts/list` method after `notifications/initialized`. Unadvertised capabilities are `skipped`, not failed. `capability-list-unsupported` means an advertised list method returned method-not-found; `capability-list-error`, `capability-list-timeout`, and `capability-list-missing-response` mean the advertised list method existed in the contract but failed at runtime.
262
272
 
273
+ Adversarial probes are additive and opt-in. Their failures are classified as `mcpProtocol`, not install/runtime failures, so registries can distinguish "the package cannot start" from "the server started but mishandled strict JSON-RPC/MCP inputs." `malformed-json` accepts either a structured parse error or silence after a short observation window; a crash is a protocol failure. `notification` expects no response.
274
+
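The three expectation values (`error`, `no-response`, `error-or-no-response`) can be read as a small decision table. The sketch below is one possible way to evaluate an observed outcome against them; `evaluateProbe`, `isStructuredError`, and the `observed` shape are hypothetical names for this example, and the sketch deliberately ignores the separate `adversarial-probe-timeout` and `adversarial-probe-invalid-stdout` codes, whose exact triggers are not visible in this part of the diff.

```javascript
// Hypothetical check for a JSON-RPC error response: an object with a numeric code and a message.
function isStructuredError(message) {
  return message !== null && typeof message === 'object'
    && message.error !== null && typeof message.error === 'object'
    && Number.isInteger(message.error.code)
    && typeof message.error.message === 'string';
}

// observed is assumed to look like { crashed: boolean, response: object | null },
// where response === null means nothing arrived during the observation window.
function evaluateProbe(probe, observed) {
  if (observed.crashed) {
    return { status: 'fail', issueCodes: ['adversarial-probe-crash'] };
  }
  const silent = observed.response === null;
  const structuredError = !silent && isStructuredError(observed.response);
  const ok =
    (probe.expectation === 'error' && structuredError) ||
    (probe.expectation === 'no-response' && silent) ||
    (probe.expectation === 'error-or-no-response' && (structuredError || silent));
  return ok
    ? { status: 'pass', issueCodes: [] }
    : { status: 'fail', issueCodes: [probe.failureCode] };
}

const malformedProbe = { expectation: 'error-or-no-response', failureCode: 'adversarial-malformed-json-result' };
const notificationProbe = { expectation: 'no-response', failureCode: 'adversarial-notification-response' };

console.log(evaluateProbe(malformedProbe, { crashed: false, response: null }));
console.log(evaluateProbe(notificationProbe, {
  crashed: false,
  response: { jsonrpc: '2.0', id: null, error: { code: -32601, message: 'Method not found' } }
}));
```

Under these assumptions, silence passes `malformed-json`, while any reply to the notification fails, even a well-formed error.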
263
275
  Repeat drift checks compare successful initialized runs against the first initialized run. Negotiated protocol changes, advertised capability key changes, added or removed tool names, tool count changes, and resource/prompt list count changes are warning-level `repeat-*` issues. Tool order is normalized before comparison, so order-only changes do not warn.
264
276
 
265
277
  The repeat `drift` object has stable `status`, `issueCodes`, `baselineRun`, and `comparedRuns` fields. Its nested `negotiatedProtocol`, `capabilities`, `tools`, `lists.resources`, and `lists.prompts` sections include `changedRuns` so registries can show exactly what changed between cold and warm starts.
@@ -275,7 +287,7 @@ Runtime issue codes remain backward-compatible. For finer registry display, runt
275
287
  | `operation-missing-response` | `clean-exit-during-operation`, `nonzero-exit-during-operation`, `signal-exit-during-operation` |
276
288
  | `server-crashed` | `nonzero-exit-after-initialize`, `signal-exit-after-initialize` |
277
289
 
278
- `process` records the observed lifecycle even when the run passes. `outcome` is one of `starting`, `running`, `exited`, `timeout`, `spawn-failed`, or `guard-terminated`; `starting` is the transient initial value while the child is being created, not an expected terminal outcome. `phase` is `startup`, `initialize`, `operation`, or `post-initialize`. `exitCode` and `signal` are included when the process exits before the guard finishes; timeout runs include `timedOut`, `timeoutCode`, `timeoutMs`, and guard kill metadata. `spawnError` is either `null` or an object with `code` and `message`; the matching `spawn-failed` issue also exposes `spawnErrorCode`.
290
+ `process` records the observed lifecycle even when the run passes. `outcome` is one of `starting`, `running`, `exited`, `timeout`, `spawn-failed`, or `guard-terminated`; `starting` is the transient initial value while the child is being created, not an expected terminal outcome. `phase` is `startup`, `initialize`, `operation`, `adversarial`, or `post-initialize`. `exitCode` and `signal` are included when the process exits before the guard finishes; timeout runs include `timedOut`, `timeoutCode`, `timeoutMs`, and guard kill metadata. `spawnError` is either `null` or an object with `code` and `message`; the matching `spawn-failed` issue also exposes `spawnErrorCode`.
279
291
 
280
292
  Spawn failure shape:
281
293
 
@@ -308,11 +320,19 @@ Example:
308
320
  "enabled": false,
309
321
  "path": "",
310
322
  "resolvedPath": "",
311
- "checks": { "command": false, "cwd": false, "envNames": [], "requests": [], "safeToolCalls": [] }
323
+ "checks": {
324
+ "command": false,
325
+ "cwd": false,
326
+ "envNames": [],
327
+ "requests": [],
328
+ "safeToolCalls": [],
329
+ "adversarialProbes": [],
330
+ "adversarialToolCalls": []
331
+ }
312
332
  },
313
333
  "profile": "custom",
314
334
  "fingerprint": {
315
- "guard": { "name": "mcp-stdio-guard", "version": "0.4.0" },
335
+ "guard": { "name": "mcp-stdio-guard", "version": "0.5.0" },
316
336
  "command": {
317
337
  "executable": "node",
318
338
  "args": ["./server.js"],
@@ -328,12 +348,21 @@ Example:
328
348
  "enabled": false,
329
349
  "path": "",
330
350
  "resolvedPath": "",
331
- "checks": { "command": false, "cwd": false, "envNames": [], "requests": [], "safeToolCalls": [] }
351
+ "checks": {
352
+ "command": false,
353
+ "cwd": false,
354
+ "envNames": [],
355
+ "requests": [],
356
+ "safeToolCalls": [],
357
+ "adversarialProbes": [],
358
+ "adversarialToolCalls": []
359
+ }
332
360
  },
333
361
  "profile": "custom",
334
362
  "timeoutMs": 5000,
335
363
  "repeat": 1,
336
364
  "capabilityProbes": true,
365
+ "adversarialProbes": [],
337
366
  "operation": { "method": "tools/list", "hasParams": false, "source": "cli-request", "safeToolCallName": "" },
338
367
  "operations": [{ "method": "tools/list", "hasParams": false, "source": "cli-request", "safeToolCallName": "" }],
339
368
  "system": { "platform": "darwin", "arch": "arm64", "osRelease": "25.0.0" },
@@ -365,6 +394,7 @@ Example:
365
394
  "spawnError": null
366
395
  },
367
396
  "capabilityProbes": true,
397
+ "adversarial": { "enabled": false, "probes": [] },
368
398
  "capabilityKeys": ["tools"],
369
399
  "capabilityChecks": {
370
400
  "tools": { "advertised": true, "method": "tools/list", "responded": true, "itemCount": 2, "error": null },
@@ -398,6 +428,7 @@ Example:
398
428
  "prompts": { "status": "skipped", "issueCodes": [], "advertised": false, "method": "prompts/list", "responded": false, "itemCount": null }
399
429
  },
400
430
  "toolSchema": { "status": "pass", "issueCodes": [] },
431
+ "adversarial": { "status": "skipped", "issueCodes": [] },
401
432
  "process": { "status": "pass", "issueCodes": [] },
402
433
  "pythonBuffering": { "status": "pass", "issueCodes": [] },
403
434
  "staticScan": { "status": "skipped", "issueCodes": [] },
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "mcp-stdio-guard",
3
- "version": "0.4.0",
3
+ "version": "0.5.0",
4
4
  "description": "A runtime zero-dependency CLI that catches stdout pollution and handshake failures in MCP stdio servers.",
5
5
  "type": "module",
6
6
  "bin": {
package/src/index.js CHANGED
@@ -6,9 +6,9 @@ import { spawn, spawnSync } from 'node:child_process';
6
6
  function loadVersion() {
7
7
  try {
8
8
  const packageJson = JSON.parse(fs.readFileSync(new URL('../package.json', import.meta.url), 'utf8'));
9
- return typeof packageJson.version === 'string' ? packageJson.version : '0.4.0';
9
+ return typeof packageJson.version === 'string' ? packageJson.version : '0.5.0';
10
10
  } catch {
11
- return '0.4.0';
11
+ return '0.5.0';
12
12
  }
13
13
  }
14
14
 
@@ -32,7 +32,7 @@ const GUARD_PROFILES = Object.freeze({
32
32
  description: 'stable JSON output with static findings treated as failures when scanned'
33
33
  },
34
34
  strict: {
35
- description: 'deep deterministic checks; reserved for opt-in adversarial probes'
35
+ description: 'deep deterministic checks with opt-in adversarial protocol probes'
36
36
  }
37
37
  });
38
38
  const GUARD_PROFILE_NAMES = Object.keys(GUARD_PROFILES);
@@ -94,12 +94,69 @@ const CAPABILITY_DEFINITIONS = Object.freeze([
94
94
  { name: 'resources', method: 'resources/list' },
95
95
  { name: 'prompts', method: 'prompts/list' }
96
96
  ]);
97
+ const ADVERSARIAL_OBSERVATION_MS = 150;
98
+ const BUILTIN_ADVERSARIAL_PROBES = Object.freeze({
99
+ 'invalid-method': {
100
+ name: 'invalid-method',
101
+ source: 'builtin',
102
+ method: 'mcp_stdio_guard/invalid_method',
103
+ params: { reason: 'adversarial-probe' },
104
+ expectation: 'error',
105
+ failureCode: 'adversarial-invalid-method-result',
106
+ description: 'invalid method requests return structured JSON-RPC errors',
107
+ risk: 'Sends one unknown JSON-RPC method after initialize; servers should answer with an error instead of crashing or returning success.'
108
+ },
109
+ 'invalid-params': {
110
+ name: 'invalid-params',
111
+ source: 'builtin',
112
+ method: 'tools/list',
113
+ params: [],
114
+ expectation: 'error',
115
+ failureCode: 'adversarial-invalid-params-result',
116
+ description: 'invalid params return structured JSON-RPC errors',
117
+ risk: 'Sends deliberately invalid params to a common MCP method; tolerant servers may need to opt out of this stricter probe.'
118
+ },
119
+ notification: {
120
+ name: 'notification',
121
+ source: 'builtin',
122
+ method: 'notifications/mcp_stdio_guard_probe',
123
+ params: { reason: 'adversarial-probe' },
124
+ omitId: true,
125
+ expectation: 'no-response',
126
+ failureCode: 'adversarial-notification-response',
127
+ description: 'notifications do not receive responses',
128
+ risk: 'Sends one unknown JSON-RPC notification; servers should not reply to notifications even when the method is unknown.'
129
+ },
130
+ 'malformed-json': {
131
+ name: 'malformed-json',
132
+ source: 'builtin',
133
+ method: '',
134
+ rawLine: '{"jsonrpc":"2.0","id":"mcp-stdio-guard-malformed","method":',
135
+ expectation: 'error-or-no-response',
136
+ failureCode: 'adversarial-malformed-json-result',
137
+ description: 'malformed JSON does not crash the server',
138
+ risk: 'Writes one malformed line to stdin; servers may return a structured parse error or ignore it, but should not crash.'
139
+ }
140
+ });
141
+ const BUILTIN_ADVERSARIAL_PROBE_NAMES = Object.freeze(Object.keys(BUILTIN_ADVERSARIAL_PROBES));
142
+ const STRICT_ADVERSARIAL_PROBE_NAMES = BUILTIN_ADVERSARIAL_PROBE_NAMES;
143
+ const SUPPORTED_ADVERSARIAL_EXPECTATIONS = new Set(['error', 'no-response', 'error-or-no-response']);
97
144
  const CAPABILITY_ISSUE_CODES = new Set([
98
145
  'capability-list-error',
99
146
  'capability-list-missing-response',
100
147
  'capability-list-timeout',
101
148
  'capability-list-unsupported'
102
149
  ]);
150
+ const ADVERSARIAL_ISSUE_CODES = new Set([
151
+ 'adversarial-invalid-method-result',
152
+ 'adversarial-invalid-params-result',
153
+ 'adversarial-malformed-json-result',
154
+ 'adversarial-notification-response',
155
+ 'adversarial-probe-crash',
156
+ 'adversarial-probe-invalid-stdout',
157
+ 'adversarial-probe-timeout',
158
+ 'adversarial-tool-call-result'
159
+ ]);
103
160
  const REPEAT_DRIFT_ISSUE_CODES = new Set([
104
161
  'repeat-capability-drift',
105
162
  'repeat-list-shape-drift',
@@ -176,6 +233,14 @@ const ISSUE_CLASS_BY_CODE = new Map([
176
233
  ['tool-name-duplicate', ISSUE_CLASSES.MCP_PROTOCOL],
177
234
  ['tool-name-invalid', ISSUE_CLASSES.MCP_PROTOCOL],
178
235
  ['tools-list-invalid-result', ISSUE_CLASSES.MCP_PROTOCOL],
236
+ ['adversarial-invalid-method-result', ISSUE_CLASSES.MCP_PROTOCOL],
237
+ ['adversarial-invalid-params-result', ISSUE_CLASSES.MCP_PROTOCOL],
238
+ ['adversarial-malformed-json-result', ISSUE_CLASSES.MCP_PROTOCOL],
239
+ ['adversarial-notification-response', ISSUE_CLASSES.MCP_PROTOCOL],
240
+ ['adversarial-probe-crash', ISSUE_CLASSES.MCP_PROTOCOL],
241
+ ['adversarial-probe-invalid-stdout', ISSUE_CLASSES.MCP_PROTOCOL],
242
+ ['adversarial-probe-timeout', ISSUE_CLASSES.MCP_PROTOCOL],
243
+ ['adversarial-tool-call-result', ISSUE_CLASSES.MCP_PROTOCOL],
179
244
  ['notification-response', ISSUE_CLASSES.MCP_PROTOCOL],
180
245
  ['response-id-mismatch', ISSUE_CLASSES.MCP_PROTOCOL],
181
246
  ['response-id-type-mismatch', ISSUE_CLASSES.MCP_PROTOCOL],
@@ -208,6 +273,7 @@ export async function runCli(argv) {
208
273
  cwd: options.cwd,
209
274
  env: options.env,
210
275
  probeCapabilities: options.probeCapabilities,
276
+ adversarialProbes: options.adversarialProbes,
211
277
  operations: options.operations,
212
278
  operation: options.operations.length === 1
213
279
  ? {
@@ -263,6 +329,8 @@ export function parseArgs(argv) {
263
329
  env: {},
264
330
  operations: [],
265
331
  configOperations: [],
332
+ adversarialProbeSpecs: [],
333
+ adversarialProbes: [],
266
334
  protocol: DEFAULT_PROTOCOL,
267
335
  timeoutMs: DEFAULT_TIMEOUT,
268
336
  profile: DEFAULT_PROFILE,
@@ -314,6 +382,10 @@ export function parseArgs(argv) {
314
382
  options.requestParams = parseJsonOption(readOptionValue(argv, index, arg), arg);
315
383
  specifiedOptions.add('requestParams');
316
384
  index += 1;
385
+ } else if (arg === '--adversarial-probe' || arg === '--adversarial-probes') {
386
+ options.adversarialProbeSpecs.push(...expandAdversarialProbeList(readOptionValue(argv, index, arg), arg));
387
+ specifiedOptions.add('adversarialProbes');
388
+ index += 1;
317
389
  } else if (arg === '--protocol') {
318
390
  options.protocol = readOptionValue(argv, index, arg);
319
391
  specifiedOptions.add('protocol');
@@ -348,6 +420,7 @@ export function parseArgs(argv) {
348
420
  }
349
421
  applyProfileDefaults(options, specifiedOptions);
350
422
  options.operations = buildConfiguredOperations(options);
423
+ options.adversarialProbes = normalizeAdversarialProbeSpecs(options.adversarialProbeSpecs);
351
424
 
352
425
  if (!Number.isInteger(options.timeoutMs) || options.timeoutMs < 100) {
353
426
  throw new Error('--timeout must be an integer >= 100');
@@ -391,6 +464,9 @@ function applyProfileDefaults(options, specifiedOptions) {
391
464
  if (!specifiedOptions.has('repeat')) {
392
465
  options.repeat = 2;
393
466
  }
467
+ if (!specifiedOptions.has('adversarialProbes')) {
468
+ options.adversarialProbeSpecs.push(...STRICT_ADVERSARIAL_PROBE_NAMES);
469
+ }
394
470
  }
395
471
 
396
472
  return options;
@@ -421,9 +497,13 @@ function applyConfigFile(options, config, specifiedOptions) {
421
497
  const env = normalizeConfigEnv(config);
422
498
  const requests = normalizeConfigRequests(config);
423
499
  const safeToolCalls = normalizeSafeToolCalls(config);
500
+ const adversarialProbes = normalizeConfigAdversarialProbes(config);
501
+ const adversarialToolCalls = normalizeConfigAdversarialToolCalls(config);
424
502
  const usesConfigCommand = command.length > 0 && !specifiedOptions.has('command');
425
503
  const usesConfigCwd = typeof config.cwd === 'string' && !specifiedOptions.has('cwd');
426
504
  const usesConfigOperations = !specifiedOptions.has('requestMethod') && !specifiedOptions.has('requestParams');
505
+ const usesConfigAdversarialProbes = !specifiedOptions.has('adversarialProbes')
506
+ && (Object.hasOwn(config, 'adversarialProbes') || Object.hasOwn(config, 'adversarialToolCalls'));
427
507
 
428
508
  if (usesConfigCommand) {
429
509
  options.command = command;
@@ -489,6 +569,11 @@ function applyConfigFile(options, config, specifiedOptions) {
489
569
  options.configOperations = [...requests, ...safeToolCalls];
490
570
  }
491
571
 
572
+ if (usesConfigAdversarialProbes) {
573
+ options.adversarialProbeSpecs = [...adversarialProbes, ...adversarialToolCalls];
574
+ specifiedOptions.add('adversarialProbes');
575
+ }
576
+
492
577
  options.config = {
493
578
  enabled: true,
494
579
  path: options.configPath,
@@ -498,7 +583,9 @@ function applyConfigFile(options, config, specifiedOptions) {
498
583
  cwd: usesConfigCwd,
499
584
  envNames: Object.keys(env).sort(),
500
585
  requests: usesConfigOperations ? requests.map((request) => request.method) : [],
501
- safeToolCalls: usesConfigOperations ? safeToolCalls.map((request) => request.safeToolCallName) : []
586
+ safeToolCalls: usesConfigOperations ? safeToolCalls.map((request) => request.safeToolCallName) : [],
587
+ adversarialProbes: usesConfigAdversarialProbes ? adversarialProbes.map((probe) => probe.name ?? probe) : [],
588
+ adversarialToolCalls: usesConfigAdversarialProbes ? adversarialToolCalls.map((probe) => probe.safeToolCallName) : []
502
589
  }
503
590
  };
504
591
  }
@@ -608,6 +695,67 @@ function normalizeSafeToolCalls(config) {
608
695
  });
609
696
  }
610
697
 
698
+ function normalizeConfigAdversarialProbes(config) {
699
+ if (config.adversarialProbes === undefined) return [];
700
+
701
+ if (config.adversarialProbes === true) {
702
+ return [...BUILTIN_ADVERSARIAL_PROBE_NAMES];
703
+ }
704
+
705
+ if (config.adversarialProbes === false) {
706
+ return [];
707
+ }
708
+
709
+ if (typeof config.adversarialProbes === 'string') {
710
+ return expandAndValidateAdversarialProbeList(config.adversarialProbes, '--config adversarialProbes');
711
+ }
712
+
713
+ if (!Array.isArray(config.adversarialProbes)) {
714
+ throw new Error('--config adversarialProbes must be a boolean, string, or array of strings');
715
+ }
716
+
717
+ return config.adversarialProbes.flatMap((probe, index) => {
718
+ if (typeof probe !== 'string') {
719
+ throw new Error(`--config adversarialProbes[${index}] must be a string`);
720
+ }
721
+ return expandAndValidateAdversarialProbeList(probe, `--config adversarialProbes[${index}]`);
722
+ });
723
+ }
724
+
725
+ function normalizeConfigAdversarialToolCalls(config) {
726
+ if (config.adversarialToolCalls === undefined) return [];
727
+ if (!Array.isArray(config.adversarialToolCalls)) {
728
+ throw new Error('--config adversarialToolCalls must be an array');
729
+ }
730
+
731
+ return config.adversarialToolCalls.map((call, index) => {
732
+ if (!isObjectRecord(call)) {
733
+ throw new Error(`--config adversarialToolCalls[${index}] must be an object`);
734
+ }
735
+ if (typeof call.name !== 'string' || !call.name) {
736
+ throw new Error(`--config adversarialToolCalls[${index}].name must be a non-empty string`);
737
+ }
738
+ const argumentsValue = call.arguments ?? {};
739
+ if (!isObjectRecord(argumentsValue)) {
740
+ throw new Error(`--config adversarialToolCalls[${index}].arguments must be an object`);
741
+ }
742
+ return {
743
+ name: `safe-tool-invalid-args:${call.name}`,
744
+ source: 'adversarial-tool-call',
745
+ method: 'tools/call',
746
+ params: {
747
+ name: call.name,
748
+ arguments: argumentsValue
749
+ },
750
+ safeToolCallName: call.name,
751
+ expectation: 'error',
752
+ failureCode: 'adversarial-tool-call-result',
753
+ description: `tools/call for ${call.name} with invalid arguments returns a structured error`,
754
+ risk: 'Calls a configured safe tool with intentionally invalid arguments; only use for idempotent tools.'
755
+ };
756
+ });
757
+ }
758
+
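To make the mapping concrete, the sketch below takes the README's example `adversarialToolCalls` entry and shows the probe object and the `tools/call` request it would lead to. The field values mirror `normalizeConfigAdversarialToolCalls` above; `toToolCallProbe` and the `tool-probe-1` id are hypothetical, and input validation is omitted.

```javascript
// Hypothetical condensed version of the mapping shown in the diff (no validation).
function toToolCallProbe(call) {
  return {
    name: `safe-tool-invalid-args:${call.name}`,
    source: 'adversarial-tool-call',
    method: 'tools/call',
    params: { name: call.name, arguments: call.arguments ?? {} },
    safeToolCallName: call.name,
    expectation: 'error',
    failureCode: 'adversarial-tool-call-result'
  };
}

// Entry copied from the README config example.
const toolProbe = toToolCallProbe({ name: 'echo', arguments: { unexpected: true } });

// The request id is an assumption for this example.
const toolProbeRequest = JSON.stringify({
  jsonrpc: '2.0',
  id: 'tool-probe-1',
  method: toolProbe.method,
  params: toolProbe.params
});
console.log(toolProbe.name);
console.log(toolProbeRequest);
```

Because the expectation is `error`, a server that silently accepts the unexpected argument and returns success would be reported with `adversarial-tool-call-result`. MCP servers often report tool failures inside a result with `isError: true` rather than as a JSON-RPC error; whether the guard treats that as a structured error is not visible in this excerpt.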
611
759
  function buildConfiguredOperations(options) {
612
760
  if (options.requestMethod) {
613
761
  return [{
@@ -629,6 +777,127 @@ function normalizeGuardOperations(operations) {
629
777
  }));
630
778
  }
631
779
 
780
+ function expandAdversarialProbeList(rawValue, option) {
781
+ const names = String(rawValue)
782
+ .split(',')
783
+ .map((name) => name.trim())
784
+ .filter(Boolean);
785
+
786
+ if (!names.length) {
787
+ throw new Error(`${option} requires at least one probe name`);
788
+ }
789
+
790
+ if (names.includes('none')) {
791
+ if (names.length > 1) {
792
+ throw new Error(`${option} cannot combine none with other probes`);
793
+ }
794
+ return [];
795
+ }
796
+
797
+ if (names.includes('all')) {
798
+ if (names.length > 1) {
799
+ throw new Error(`${option} cannot combine all with other probes`);
800
+ }
801
+ return [...BUILTIN_ADVERSARIAL_PROBE_NAMES];
802
+ }
803
+
804
+ return names;
805
+ }
806
+
807
+ function expandAndValidateAdversarialProbeList(rawValue, option) {
808
+ return expandAdversarialProbeList(rawValue, option).map((name) => validateAdversarialProbeName(name, option));
809
+ }
810
+
811
+ function validateAdversarialProbeName(name, option) {
812
+ if (!BUILTIN_ADVERSARIAL_PROBES[name]) {
813
+ throw new Error(`${option} must be one of: ${[...BUILTIN_ADVERSARIAL_PROBE_NAMES, 'all', 'none'].join(', ')}`);
814
+ }
815
+ return name;
816
+ }
817
+
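The expansion rules above can be exercised in isolation. The following is a condensed restatement of `expandAdversarialProbeList`, written as a standalone sketch; `expandProbeList` and `PROBE_NAMES` are names chosen for this example. Like the original, it does not reject unknown names by itself; that is left to `validateAdversarialProbeName`.

```javascript
const PROBE_NAMES = ['invalid-method', 'invalid-params', 'notification', 'malformed-json'];

// Condensed sketch of the comma-list expansion: trim, drop empties, handle none/all.
function expandProbeList(rawValue, option) {
  const names = String(rawValue).split(',').map((name) => name.trim()).filter(Boolean);
  if (!names.length) {
    throw new Error(`${option} requires at least one probe name`);
  }
  // "none" and "all" are exclusive keywords and must appear alone.
  for (const keyword of ['none', 'all']) {
    if (names.includes(keyword) && names.length > 1) {
      throw new Error(`${option} cannot combine ${keyword} with other probes`);
    }
  }
  if (names[0] === 'none') return [];
  if (names[0] === 'all') return [...PROBE_NAMES];
  return names;
}

console.log(expandProbeList('invalid-method, notification', '--adversarial-probes'));
console.log(expandProbeList('all', '--adversarial-probes'));
console.log(expandProbeList('none', '--adversarial-probes'));
```

Note that `none` expands to an empty list. In the CLI path the flag is still recorded in `specifiedOptions`, which is how `--profile strict --adversarial-probes none` suppresses the strict defaults.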
818
+ function normalizeAdversarialProbeSpecs(specs) {
819
+ const probes = [];
820
+ const seenBuiltins = new Set();
821
+
822
+ for (const spec of specs ?? []) {
823
+ if (typeof spec === 'string') {
824
+ for (const name of expandAdversarialProbeList(spec, '--adversarial-probe')) {
825
+ const definition = BUILTIN_ADVERSARIAL_PROBES[validateAdversarialProbeName(name, '--adversarial-probe')];
826
+ if (seenBuiltins.has(name)) continue;
827
+ seenBuiltins.add(name);
828
+ probes.push({ ...definition });
829
+ }
830
+ } else if (isObjectRecord(spec)) {
831
+ probes.push(normalizeAdversarialProbeObject(spec, '--adversarial-probe'));
832
+ } else {
833
+ throw new Error('--adversarial-probe entries must be strings or configured adversarial tool calls');
834
+ }
835
+ }
836
+
837
+ return probes.map((probe, index) => ({
838
+ index,
839
+ safeToolCallName: '',
840
+ rawLine: '',
841
+ omitId: false,
842
+ ...probe
843
+ }));
844
+ }
845
+
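Two details of `normalizeAdversarialProbeSpecs` are easy to miss: built-in names are deduplicated, and every probe is given defaults that the probe's own fields then override through spread order. The sketch below reproduces just those two behaviours; `normalizeSpecs` and the one-entry `builtins` table are hypothetical, and validation is left out.

```javascript
// Hypothetical reduced version: dedupe built-in names, then apply defaults under each probe.
function normalizeSpecs(specs, builtins) {
  const seen = new Set();
  const probes = [];
  for (const spec of specs) {
    if (typeof spec === 'string') {
      if (seen.has(spec)) continue; // a repeated built-in name is only added once
      seen.add(spec);
      probes.push({ ...builtins[spec] });
    } else {
      probes.push(spec); // object specs (for example tool-call probes) are kept as given
    }
  }
  // Defaults come first so that fields set on the probe win.
  return probes.map((probe, index) => ({
    index,
    safeToolCallName: '',
    rawLine: '',
    omitId: false,
    ...probe
  }));
}

const builtins = {
  notification: { name: 'notification', method: 'notifications/mcp_stdio_guard_probe', omitId: true, expectation: 'no-response' }
};

const normalized = normalizeSpecs(
  ['notification', 'notification', { name: 'safe-tool-invalid-args:echo', method: 'tools/call', safeToolCallName: 'echo', expectation: 'error' }],
  builtins
);
console.log(normalized);
```

The duplicate `notification` entry collapses to one probe that keeps `omitId: true`, while the tool-call probe receives `omitId: false` and `rawLine: ''` from the defaults.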
+ function normalizeAdversarialProbeObject(spec, option) {
+ if (typeof spec.name !== 'string' || !spec.name.trim()) {
+ throw new Error(`${option}.name must be a non-empty string`);
+ }
+
+ if (typeof spec.expectation !== 'string' || !SUPPORTED_ADVERSARIAL_EXPECTATIONS.has(spec.expectation)) {
+ throw new Error(`${option}.expectation must be one of: ${[...SUPPORTED_ADVERSARIAL_EXPECTATIONS].join(', ')}`);
+ }
+
+ if (typeof spec.failureCode !== 'string' || !spec.failureCode) {
+ throw new Error(`${option}.failureCode must be a non-empty string`);
+ }
+
+ const rawLine = spec.rawLine ?? '';
+ if (typeof rawLine !== 'string') {
+ throw new Error(`${option}.rawLine must be a string when provided`);
+ }
+
+ const method = spec.method ?? '';
+ if (typeof method !== 'string') {
+ throw new Error(`${option}.method must be a string when provided`);
+ }
+ if (!rawLine && !method) {
+ throw new Error(`${option}.method must be a non-empty string when rawLine is not set`);
+ }
+
+ if (spec.source !== undefined && typeof spec.source !== 'string') {
+ throw new Error(`${option}.source must be a string when provided`);
+ }
+ if (spec.safeToolCallName !== undefined && typeof spec.safeToolCallName !== 'string') {
+ throw new Error(`${option}.safeToolCallName must be a string when provided`);
+ }
+ if (spec.description !== undefined && typeof spec.description !== 'string') {
+ throw new Error(`${option}.description must be a string when provided`);
+ }
+ if (spec.risk !== undefined && typeof spec.risk !== 'string') {
+ throw new Error(`${option}.risk must be a string when provided`);
+ }
+ if (spec.omitId !== undefined && typeof spec.omitId !== 'boolean') {
+ throw new Error(`${option}.omitId must be a boolean when provided`);
+ }
+ if (spec.quietMs !== undefined && (!Number.isInteger(spec.quietMs) || spec.quietMs < 1)) {
+ throw new Error(`${option}.quietMs must be an integer >= 1 when provided`);
+ }
+
+ return {
+ ...spec,
+ name: spec.name.trim(),
+ source: spec.source ?? 'custom',
+ method,
+ rawLine,
+ safeToolCallName: spec.safeToolCallName ?? ''
+ };
+ }
+
  function resolveConfigPath(value, configDir) {
  return path.resolve(configDir, value);
  }
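For orientation, the checks in `normalizeAdversarialProbeObject` imply roughly the following shape for a custom probe object. This is a minimal sketch, not the package's code: `describeProbeProblems` is a hypothetical helper that collects problems instead of throwing, and the expectation names are assumptions. Only `no-response` and `error-or-no-response` appear elsewhere in this diff; `error` is a guess, because `SUPPORTED_ADVERSARIAL_EXPECTATIONS` is defined outside this hunk. The method name and failure code in the sample object are likewise made up.

```javascript
// Hypothetical stand-in for SUPPORTED_ADVERSARIAL_EXPECTATIONS (assumed values).
const ASSUMED_EXPECTATIONS = new Set(['error', 'no-response', 'error-or-no-response']);

// Returns a list of problem labels, mirroring the required-field checks in the
// diff: name, expectation, failureCode, rawLine-or-method, and quietMs.
function describeProbeProblems(spec) {
  const problems = [];
  if (typeof spec.name !== 'string' || !spec.name.trim()) problems.push('name');
  if (!ASSUMED_EXPECTATIONS.has(spec.expectation)) problems.push('expectation');
  if (typeof spec.failureCode !== 'string' || !spec.failureCode) problems.push('failureCode');
  const rawLine = spec.rawLine ?? '';
  const method = spec.method ?? '';
  if (typeof rawLine !== 'string') problems.push('rawLine');
  if (typeof method !== 'string') problems.push('method');
  if (!rawLine && !method) problems.push('method-or-rawLine');
  if (spec.quietMs !== undefined && (!Number.isInteger(spec.quietMs) || spec.quietMs < 1)) {
    problems.push('quietMs');
  }
  return problems;
}

// A notification-style probe: no id is sent, and silence is the expected outcome.
const notificationProbe = {
  name: 'custom-notification',
  expectation: 'no-response',
  failureCode: 'custom-notification-response', // hypothetical issue code
  method: 'notifications/example',             // hypothetical method name
  omitId: true,
  quietMs: 200
};

console.log(describeProbeProblems(notificationProbe)); // []
console.log(describeProbeProblems({ name: ' ', quietMs: 0 }));
// [ 'name', 'expectation', 'failureCode', 'method-or-rawLine', 'quietMs' ]
```

Note that a probe may set `rawLine` instead of `method`; in that case the line is written to the server verbatim, which is how a malformed-JSON probe can be expressed.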
@@ -636,6 +905,7 @@ function resolveConfigPath(value, configDir) {
  export async function guardRepeatedStdioServer(commandWithArgs, options = {}) {
  const startedAt = Date.now();
  const repeat = options.repeat ?? 1;
+ const adversarialProbes = normalizeAdversarialProbeSpecs(options.adversarialProbes ?? []);
  const runs = [];
  const issues = [];
 
@@ -643,7 +913,7 @@ export async function guardRepeatedStdioServer(commandWithArgs, options = {}) {
  throw new Error('repeat must be an integer >= 1');
  }
 
- const singleRunOptions = { ...options, repeat: 1 };
+ const singleRunOptions = { ...options, adversarialProbes, repeat: 1 };
 
  for (let index = 1; index <= repeat; index += 1) {
  const run = await guardStdioServer(commandWithArgs, singleRunOptions);
@@ -672,6 +942,7 @@ export async function guardRepeatedStdioServer(commandWithArgs, options = {}) {
  issues,
  checks: {},
  capabilityProbes: options.probeCapabilities ?? true,
+ adversarial: aggregateRunAdversarial(runs, adversarialProbes),
  drift,
  staticScan: defaultStaticScan(),
  staticFindings: [],
@@ -690,6 +961,7 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  const timeoutMs = options.timeoutMs ?? DEFAULT_TIMEOUT;
  const protocol = options.protocol ?? DEFAULT_PROTOCOL;
  const operations = normalizeGuardOperations(options.operations ?? (options.operation ? [options.operation] : []));
+ const adversarialProbes = normalizeAdversarialProbeSpecs(options.adversarialProbes ?? []);
  const probeCapabilities = options.probeCapabilities ?? true;
  const env = { ...process.env, ...(options.env ?? {}) };
  const issues = [];
@@ -737,6 +1009,7 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  capabilityProbes: probeCapabilities,
  capabilityKeys: [],
  capabilityChecks: defaultCapabilityChecks(),
+ adversarial: defaultAdversarialResult(adversarialProbes),
  stderr: '',
  process: defaultProcessInfo(timeoutMs),
  staticScan: defaultStaticScan(),
@@ -750,6 +1023,7 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  config: options.config,
  profile: options.profile,
  probeCapabilities,
+ adversarialProbes,
  operations,
  env: options.env
  })
@@ -811,12 +1085,20 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  child.stdin.write(`${JSON.stringify(message)}\n`);
  }
 
+ function sendRaw(line) {
+ if (!child?.stdin?.writable) return;
+ child.stdin.write(`${line}\n`);
+ }
+
  function enqueueRequest(request) {
+ const needsId = !request.omitId && !request.rawLine;
  requestQueue.push({
  ...request,
- id: nextRequestId
+ id: needsId ? nextRequestId : null
  });
- nextRequestId += 1;
+ if (needsId) {
+ nextRequestId += 1;
+ }
  }
 
  function startNextRequest() {
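The `enqueueRequest` change above means that only entries sent as ordinary JSON-RPC requests consume an id; notifications (`omitId`) and raw lines do not. The following is a small standalone sketch of that allocation and of the resulting wire lines. `createQueue` and `toWireLine` are hypothetical names, the starting id of 2 is an assumption (the diff only shows that `initialize` uses id 1), and the method names are illustrative.

```javascript
// Sketch: queue entries get an id only when they will be sent as a JSON-RPC
// request; raw lines and notifications (omitId) do not consume an id.
function createQueue(firstId = 2) {
  let nextRequestId = firstId; // assumption: id 1 was used by initialize
  const queue = [];
  return {
    queue,
    enqueue(request) {
      const needsId = !request.omitId && !request.rawLine;
      queue.push({ ...request, id: needsId ? nextRequestId : null });
      if (needsId) nextRequestId += 1;
    }
  };
}

// Builds the newline-delimited line that would be written to the child's stdin.
function toWireLine(entry) {
  if (entry.rawLine) return `${entry.rawLine}\n`; // sent verbatim, may be malformed
  const message = { jsonrpc: '2.0', method: entry.method };
  if (!entry.omitId) message.id = entry.id;
  if (entry.params !== undefined) message.params = entry.params;
  return `${JSON.stringify(message)}\n`;
}

const { queue, enqueue } = createQueue();
enqueue({ method: 'example/unknown' });                      // hypothetical method
enqueue({ method: 'notifications/example', omitId: true });  // notification, no id
enqueue({ rawLine: '{"jsonrpc":"2.0",' });                   // deliberately malformed
enqueue({ method: 'tools/list' });

console.log(queue.map((entry) => entry.id)); // [ 2, null, null, 3 ]
console.log(queue.map(toWireLine));
```

Skipping the increment for id-less entries keeps the ids of later requests contiguous, so a response id can still be matched exactly against the request that produced it.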
@@ -829,6 +1111,34 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  }
 
  result.process.phase = 'operation';
+ currentRequest.startedAt = Date.now();
+
+ if (currentRequest.kind === 'adversarial') {
+ result.process.phase = 'adversarial';
+ markAdversarialProbeRunning(currentRequest);
+ if (currentRequest.rawLine) {
+ sendRaw(currentRequest.rawLine);
+ } else {
+ const request = {
+ jsonrpc: '2.0',
+ method: currentRequest.method
+ };
+ if (!currentRequest.omitId) {
+ request.id = currentRequest.id;
+ }
+ if (currentRequest.params !== undefined) {
+ request.params = currentRequest.params;
+ }
+ send(request);
+ }
+ if (currentRequest.expectation === 'no-response' || currentRequest.expectation === 'error-or-no-response') {
+ armAdversarialQuietTimer(currentRequest);
+ } else {
+ armAdversarialTimeout(currentRequest);
+ }
+ return;
+ }
+
  const request = {
  jsonrpc: '2.0',
  id: currentRequest.id,
@@ -846,6 +1156,37 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  );
  }
 
+ function armAdversarialQuietTimer(request) {
+ clearTimeout(timer);
+ timer = setTimeout(() => {
+ completeAdversarialProbe(request, 'pass');
+ currentRequest = null;
+ startNextRequest();
+ }, request.quietMs);
+ }
+
+ function armAdversarialTimeout(request) {
+ clearTimeout(timer);
+ result.process.phase = 'adversarial';
+ timer = setTimeout(() => {
+ result.process.timedOut = true;
+ result.process.timeoutCode = 'adversarial-probe-timeout';
+ result.process.timeoutMs = timeoutMs;
+ result.process.outcome = 'timeout';
+ failAdversarialProbe(
+ request,
+ 'adversarial-probe-timeout',
+ `${request.probeName} adversarial probe did not receive a structured error within ${timeoutMs}ms`,
+ {
+ detailCode: 'adversarial-probe-timeout',
+ phase: 'adversarial',
+ timeoutMs
+ }
+ );
+ finish();
+ }, timeoutMs);
+ }
+
  function configureCapabilityChecks(capabilities) {
  result.capabilityKeys = capabilityKeys(capabilities);
  for (const definition of CAPABILITY_DEFINITIONS) {
@@ -871,22 +1212,38 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  });
  }
 
- if (!probeCapabilities) return;
-
- for (const definition of CAPABILITY_DEFINITIONS) {
- const check = result.capabilityChecks[definition.name];
- if (!check.advertised || operationCapabilities.has(definition.name)) continue;
- enqueueRequest({
- kind: 'capability',
- capability: definition.name,
- method: definition.method,
- timeoutCode: 'capability-list-timeout',
- timeoutMessage: `${definition.method} did not receive a response for advertised ${definition.name} capability within ${timeoutMs}ms`,
- timeoutDetails: {
+ if (probeCapabilities) {
+ for (const definition of CAPABILITY_DEFINITIONS) {
+ const check = result.capabilityChecks[definition.name];
+ if (!check.advertised || operationCapabilities.has(definition.name)) continue;
+ enqueueRequest({
+ kind: 'capability',
  capability: definition.name,
  method: definition.method,
- detailCode: 'capability-request-timeout'
- }
+ timeoutCode: 'capability-list-timeout',
+ timeoutMessage: `${definition.method} did not receive a response for advertised ${definition.name} capability within ${timeoutMs}ms`,
+ timeoutDetails: {
+ capability: definition.name,
+ method: definition.method,
+ detailCode: 'capability-request-timeout'
+ }
+ });
+ }
+ }
+
+ for (let probeIndex = 0; probeIndex < adversarialProbes.length; probeIndex += 1) {
+ const probe = adversarialProbes[probeIndex];
+ enqueueRequest({
+ kind: 'adversarial',
+ probeIndex,
+ probeName: probe.name,
+ method: probe.method,
+ params: probe.params,
+ rawLine: probe.rawLine,
+ omitId: probe.omitId,
+ expectation: probe.expectation,
+ failureCode: probe.failureCode,
+ quietMs: Math.min(timeoutMs, probe.quietMs ?? ADVERSARIAL_OBSERVATION_MS)
  });
  }
  }
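The `quietMs` expression in the probe loop above clamps each probe's observation window to the overall timeout, so a silence-based probe never waits longer than a regular request would. A tiny sketch of that arithmetic follows; the default of 250 ms is an assumed value, since the real `ADVERSARIAL_OBSERVATION_MS` is defined outside this hunk, and `quietWindow` is a hypothetical name.

```javascript
// Assumed default; the real ADVERSARIAL_OBSERVATION_MS is not shown in this diff chunk.
const ASSUMED_OBSERVATION_MS = 250;

// Mirrors Math.min(timeoutMs, probe.quietMs ?? ADVERSARIAL_OBSERVATION_MS).
function quietWindow(timeoutMs, probeQuietMs) {
  return Math.min(timeoutMs, probeQuietMs ?? ASSUMED_OBSERVATION_MS);
}

console.log(quietWindow(5000, undefined)); // 250  (falls back to the default)
console.log(quietWindow(5000, 1000));      // 1000 (probe-specific window)
console.log(quietWindow(100, 1000));       // 100  (clamped to the timeout)
```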
@@ -950,6 +1307,107 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  });
  }
 
+ function markAdversarialProbeRunning(request) {
+ const probe = result.adversarial.probes[request.probeIndex];
+ if (!probe) return;
+ probe.status = 'running';
+ probe.started = true;
+ }
+
+ function completeAdversarialProbe(request, status, issueCodes = [], response = {}) {
+ const probe = result.adversarial.probes[request.probeIndex];
+ if (!probe) return;
+ probe.status = status;
+ probe.responded = Boolean(response.responded);
+ probe.error = response.error ?? null;
+ probe.issueCodes = [...new Set(issueCodes)].sort();
+ probe.durationMs = request.startedAt ? Date.now() - request.startedAt : null;
+ }
+
+ function failAdversarialProbe(request, code, message, details = {}, response = {}) {
+ completeAdversarialProbe(request, 'fail', [code], response);
+ addIssue('error', code, message, {
+ probe: request.probeName,
+ method: request.method || '',
+ ...details
+ });
+ }
+
+
1336
+ function handleAdversarialFrame(message) {
1337
+ clearTimeout(timer);
1338
+ const request = currentRequest;
1339
+
1340
+ if (request.expectation === 'no-response') {
1341
+ failAdversarialProbe(
1342
+ request,
1343
+ request.failureCode,
1344
+ `${request.probeName} adversarial probe received a response to a notification`,
1345
+ {},
1346
+ { responded: true }
1347
+ );
1348
+ finish();
1349
+ return;
1350
+ }
1351
+
1352
+ if (!request.rawLine && !request.omitId && isResponseIdTypeMismatch(message, request.id)) {
1353
+ failAdversarialProbe(
1354
+ request,
1355
+ 'response-id-type-mismatch',
1356
+ `${request.probeName} adversarial response id ${JSON.stringify(message.id)} does not exactly match request id ${request.id}`,
1357
+ {},
1358
+ { responded: true }
1359
+ );
1360
+ finish();
1361
+ return;
1362
+ }
1363
+
1364
+ if (!request.rawLine && !request.omitId && isResponseIdMismatch(message, request.id)) {
1365
+ failAdversarialProbe(
1366
+ request,
1367
+ 'response-id-mismatch',
1368
+ `${request.probeName} adversarial response id ${JSON.stringify(message.id)} does not match request id ${request.id}`,
1369
+ {},
1370
+ { responded: true }
1371
+ );
1372
+ finish();
1373
+ return;
1374
+ }
1375
+
1376
+ if (!isJsonRpcResponse(message)) {
1377
+ failAdversarialProbe(
1378
+ request,
1379
+ 'adversarial-probe-invalid-stdout',
1380
+ `${request.probeName} adversarial probe received a JSON-RPC frame that was not a response`,
1381
+ {},
1382
+ { responded: true }
1383
+ );
1384
+ finish();
1385
+ return;
1386
+ }
1387
+
1388
+ if (message.error) {
1389
+ completeAdversarialProbe(request, 'pass', [], {
1390
+ responded: true,
1391
+ error: {
1392
+ code: message.error.code,
1393
+ message: message.error.message
1394
+ }
1395
+ });
1396
+ currentRequest = null;
1397
+ startNextRequest();
1398
+ return;
1399
+ }
1400
+
1401
+ failAdversarialProbe(
1402
+ request,
1403
+ request.failureCode,
1404
+ `${request.probeName} adversarial probe returned success where a structured error was expected`,
1405
+ {},
1406
+ { responded: true }
1407
+ );
1408
+ finish();
1409
+ }
1410
+
953
1411
  function recordCapabilityListShape(request, responseResult) {
954
1412
  if (!isObjectRecord(responseResult)) return;
955
1413
  const check = result.capabilityChecks[request.capability];
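The order of checks in `handleAdversarialFrame` can be read as a small decision procedure. The sketch below restates it as a pure function under simplifying assumptions: `classifyFrame` and `looksLikeResponse` are hypothetical names, the id comparison is a guess at what `isResponseIdTypeMismatch` and `isResponseIdMismatch` do (both are defined outside this hunk), and the `error` expectation and `example-accepted` failure code are made up for illustration.

```javascript
// Simplified stand-in for the guard's response check: an object with either
// `result` or `error` and no `method`. The real isJsonRpcResponse may differ.
function looksLikeResponse(message) {
  return message !== null && typeof message === 'object'
    && !('method' in message)
    && ('result' in message || 'error' in message);
}

// Returns { status, code } following the order of checks in handleAdversarialFrame.
function classifyFrame(request, message) {
  // Any frame at all fails a probe that expects silence.
  if (request.expectation === 'no-response') {
    return { status: 'fail', code: request.failureCode };
  }
  // Ids are only compared when the probe was sent as a normal request.
  const idChecked = !request.rawLine && !request.omitId;
  if (idChecked && 'id' in message && message.id !== request.id) {
    // Assumption: same value with a different type ("2" vs 2) is the "type mismatch" case.
    const sameValue = String(message.id) === String(request.id);
    return { status: 'fail', code: sameValue ? 'response-id-type-mismatch' : 'response-id-mismatch' };
  }
  if (!looksLikeResponse(message)) {
    return { status: 'fail', code: 'adversarial-probe-invalid-stdout' };
  }
  // A structured JSON-RPC error is the passing outcome; success is a failure.
  if (message.error) return { status: 'pass', code: null };
  return { status: 'fail', code: request.failureCode };
}

const request = { id: 2, method: 'example/unknown', expectation: 'error', failureCode: 'example-accepted' };
const notFound = { jsonrpc: '2.0', id: 2, error: { code: -32601, message: 'Method not found' } };

console.log(classifyFrame(request, notFound).status);                            // pass
console.log(classifyFrame(request, { jsonrpc: '2.0', id: 2, result: {} }).code); // example-accepted
console.log(classifyFrame(request, { ...notFound, id: '2' }).code);              // response-id-type-mismatch
console.log(classifyFrame(request, { ...notFound, id: 7 }).code);                // response-id-mismatch
```

In the diff itself, every failing branch also calls `finish()`, which ends the run, while the passing branch clears `currentRequest` and moves on to the next queued request.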
@@ -1024,7 +1482,9 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  clearTimeout(timer);
  const exitPhase = initialized
  ? currentRequest
- ? 'operation'
+ ? currentRequest.kind === 'adversarial'
+ ? 'adversarial'
+ : 'operation'
  : 'post-initialize'
  : 'initialize';
  result.process.phase = exitPhase;
@@ -1034,6 +1494,17 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  if (stdoutBuffer.trim()) {
  addIssue('error', 'stdout-without-newline', `stdout ended with an incomplete JSON-RPC frame: ${quote(stdoutBuffer)}`);
  }
+ if (!endedByGuard && initialized && currentRequest?.kind === 'adversarial') {
+ result.process.phase = 'adversarial';
+ failAdversarialProbe(
+ currentRequest,
+ 'adversarial-probe-crash',
+ `server exited during ${currentRequest.probeName} adversarial probe (code ${code ?? 'null'}, signal ${signal ?? 'null'})`,
+ adversarialExitIssueDetails(code, signal)
+ );
+ finish();
+ return;
+ }
  if (!endedByGuard && initialized && currentRequest?.kind === 'operation') {
  const operationResult = result.operations[currentRequest.operationIndex];
  if (operationResult && !operationResult.responded) {
@@ -1087,17 +1558,42 @@ export async function guardStdioServer(commandWithArgs, options = {}) {
  message = JSON.parse(line);
  } catch {
  addIssue('error', 'stdout-non-json', `stdout line ${frames.length + 1} is not JSON-RPC: ${quote(line)}`);
+ if (currentRequest?.kind === 'adversarial') {
+ failAdversarialProbe(
+ currentRequest,
+ 'adversarial-probe-invalid-stdout',
+ `${currentRequest.probeName} adversarial probe received non-JSON stdout`,
+ {},
+ { responded: true }
+ );
+ finish();
+ }
  return;
  }
 
  const validation = validateJsonRpc(message);
  if (validation) {
  addIssue('error', 'stdout-invalid-json-rpc', validation);
+ if (currentRequest?.kind === 'adversarial') {
+ failAdversarialProbe(
+ currentRequest,
+ 'adversarial-probe-invalid-stdout',
+ `${currentRequest.probeName} adversarial probe received invalid JSON-RPC stdout`,
+ {},
+ { responded: true }
+ );
+ finish();
+ }
  return;
  }
 
  frames.push(message);
 
+ if (initialized && currentRequest?.kind === 'adversarial') {
+ handleAdversarialFrame(message);
+ return;
+ }
+
  if (!initialized && isResponseIdTypeMismatch(message, 1)) {
  clearTimeout(timer);
  addIssue('error', 'response-id-type-mismatch', `initialize response id ${JSON.stringify(message.id)} does not exactly match request id 1`);
@@ -1250,6 +1746,15 @@ function exitIssueDetails(position, code, signal) {
  };
  }
 
+ function adversarialExitIssueDetails(code, signal) {
+ return {
+ detailCode: exitDetailCode('during-adversarial', code, signal),
+ phase: 'adversarial',
+ exitCode: code,
+ signal
+ };
+ }
+
  function exitDetailCode(position, code, signal) {
  if (signal) return `signal-exit-${position}`;
  if (code === 0) return `clean-exit-${position}`;
@@ -1651,6 +2156,7 @@ export function classifyIssueCode(code) {
  export function createFingerprint(commandWithArgs, options = {}) {
  const cwd = path.resolve(options.cwd ?? process.cwd());
  const operations = normalizeGuardOperations(options.operations ?? (options.operation ? [options.operation] : []));
+ const adversarialProbes = normalizeAdversarialProbeSpecs(options.adversarialProbes ?? []);
 
  return {
  guard: {
@@ -1673,6 +2179,13 @@ export function createFingerprint(commandWithArgs, options = {}) {
  timeoutMs: options.timeoutMs ?? DEFAULT_TIMEOUT,
  repeat: options.repeat ?? 1,
  capabilityProbes: options.probeCapabilities ?? true,
+ adversarialProbes: adversarialProbes.map((probe) => ({
+ name: probe.name,
+ source: probe.source,
+ method: probe.method || '',
+ safeToolCallName: probe.safeToolCallName || '',
+ expectation: probe.expectation
+ })),
  operation: operations.length === 1
  ? {
  method: operations[0].method,
@@ -2012,6 +2525,7 @@ function finalizeResult(result) {
  result.config ??= defaultConfigMetadata();
  result.profile ??= DEFAULT_PROFILE;
  result.capabilityProbes ??= true;
+ result.adversarial ??= defaultAdversarialResult();
  result.operations ??= result.operation ? [{
  index: 0,
  source: result.operation.source ?? 'request',
@@ -2023,6 +2537,7 @@ function finalizeResult(result) {
  result.staticScan ??= defaultStaticScan();
  result.staticFindings ??= [];
  result.issues = normalizeIssues(result.issues ?? []);
+ finalizeAdversarialResult(result);
  result.ok = !result.issues.some((issue) => issue.severity === 'error');
  result.checks = buildChecks(result);
  result.issueClasses = buildIssueClasses(result.issues);
@@ -2030,6 +2545,17 @@ function finalizeResult(result) {
  return result;
  }
 
+ function finalizeAdversarialResult(result) {
+ if (!result.adversarial?.enabled) return;
+ for (const probe of result.adversarial.probes) {
+ if (probe.status === 'pending' || probe.status === 'running') {
+ probe.status = 'skipped';
+ probe.durationMs ??= null;
+ }
+ probe.issueCodes = [...new Set(probe.issueCodes ?? [])].sort();
+ }
+ }
+
  function finalizeFingerprint(result) {
  if (!result.fingerprint) return;
  result.fingerprint.timings ??= {};
@@ -2071,6 +2597,9 @@ function buildChecks(result) {
  toolSchema: repeated
  ? aggregateRunCheck(result, 'toolSchema')
  : buildToolSchemaCheck(result, issues),
+ adversarial: repeated
+ ? aggregateRunCheck(result, 'adversarial')
+ : buildAdversarialCheck(result, issues),
  process: buildIssueCheck(issues, (issue) => PROCESS_ISSUE_CODES.has(issue.code)),
  pythonBuffering: buildIssueCheck(issues, (issue) => issue.code === 'python-buffered-stdio'),
  staticScan: buildStaticScanCheck(result, issues),
@@ -2135,6 +2664,28 @@ function buildToolSchemaCheck(result, issues) {
  return makeCheck('pass', []);
  }
 
+ function buildAdversarialCheck(result, issues) {
+ if (!result.adversarial?.enabled) {
+ return makeCheck('skipped', []);
+ }
+
+ const probeNames = new Set(result.adversarial.probes.map((probe) => probe.name));
+ const matched = issues.filter((issue) => (
+ ADVERSARIAL_ISSUE_CODES.has(issue.code)
+ || (probeNames.has(issue.probe) && issue.class === ISSUE_CLASSES.MCP_PROTOCOL)
+ ));
+ if (matched.length) {
+ return makeCheck(statusFromIssues(matched), matched);
+ }
+
+ const active = result.adversarial.probes.filter((probe) => probe.status !== 'skipped');
+ if (!result.initialized || !active.length) {
+ return makeCheck('skipped', []);
+ }
+
+ return makeCheck(active.every((probe) => probe.status === 'pass') ? 'pass' : 'fail', []);
+ }
+
  function buildCapabilityChecks(result, issues) {
  const checks = {};
  for (const definition of CAPABILITY_DEFINITIONS) {
@@ -2321,11 +2872,71 @@ function defaultConfigMetadata() {
  cwd: false,
  envNames: [],
  requests: [],
- safeToolCalls: []
+ safeToolCalls: [],
+ adversarialProbes: [],
+ adversarialToolCalls: []
  }
  };
  }
 
+ function defaultAdversarialResult(probes = []) {
+ return {
+ enabled: probes.length > 0,
+ probes: probes.map((probe, index) => ({
+ index,
+ name: probe.name,
+ source: probe.source,
+ method: probe.method || '',
+ safeToolCallName: probe.safeToolCallName || '',
+ expectation: probe.expectation,
+ description: probe.description,
+ risk: probe.risk,
+ status: 'pending',
+ started: false,
+ responded: false,
+ error: null,
+ issueCodes: [],
+ durationMs: null
+ }))
+ };
+ }
+
+ function aggregateRunAdversarial(runs, probes = []) {
+ if (!probes.length) {
+ return defaultAdversarialResult();
+ }
+
+ return {
+ enabled: true,
+ probes: probes.map((probe, index) => {
+ const runProbes = runs
+ .map((run) => run.adversarial?.probes?.[index])
+ .filter(Boolean);
+ const active = runProbes.filter((runProbe) => runProbe.status !== 'skipped');
+ const status = active.length
+ ? active.some((runProbe) => runProbe.status === 'fail')
+ ? 'fail'
+ : active.every((runProbe) => runProbe.status === 'pass')
+ ? 'pass'
+ : 'warning'
+ : 'skipped';
+ return {
+ index,
+ name: probe.name,
+ source: probe.source,
+ method: probe.method || '',
+ safeToolCallName: probe.safeToolCallName || '',
+ expectation: probe.expectation,
+ description: probe.description,
+ risk: probe.risk,
+ status,
+ runs: runProbes.length,
+ issueCodes: [...new Set(runProbes.flatMap((runProbe) => runProbe.issueCodes ?? []))].sort()
+ };
+ })
+ };
+ }
+
  function defaultStaticScan() {
  return {
  enabled: false,
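The nested ternary in `aggregateRunAdversarial` combines one probe's status across repeated runs. As a reading aid, the same rule can be written as a flat function; `aggregateStatus` is a hypothetical name and takes only the list of per-run status strings.

```javascript
// Combines one probe's per-run statuses following the rule in the diff:
// skipped runs are ignored, any failure wins, all-pass is a pass,
// and any other mix is reported as a warning.
function aggregateStatus(statuses) {
  const active = statuses.filter((status) => status !== 'skipped');
  if (!active.length) return 'skipped';
  if (active.some((status) => status === 'fail')) return 'fail';
  return active.every((status) => status === 'pass') ? 'pass' : 'warning';
}

console.log(aggregateStatus(['pass', 'pass']));    // pass
console.log(aggregateStatus(['pass', 'skipped'])); // pass
console.log(aggregateStatus(['pass', 'fail']));    // fail
console.log(aggregateStatus(['skipped']));         // skipped
console.log(aggregateStatus(['pass', 'running'])); // warning
```

Because `finalizeAdversarialResult` rewrites `pending` and `running` probes to `skipped` at the end of each run, the `warning` branch should be rare in practice.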
@@ -2903,6 +3514,11 @@ function formatTextResult(result) {
  lines.push(`tool schemas: ${result.toolSchema.validToolCount}/${result.toolSchema.toolCount} valid`);
  }
 
+ if (result.adversarial?.enabled) {
+ const passedProbes = result.adversarial.probes.filter((probe) => probe.status === 'pass').length;
+ lines.push(`adversarial probes: ${passedProbes}/${result.adversarial.probes.length} passed`);
+ }
+
  if (result.staticFindings.length) {
  lines.push(`static findings: ${result.staticFindings.length}`);
  for (const finding of result.staticFindings.slice(0, 10)) {
@@ -2930,6 +3546,11 @@ function formatRepeatedTextResult(result) {
  lines.push(`drift: ${result.drift.status}${issueCodes}`);
  }
 
+ if (result.adversarial?.enabled) {
+ const passedProbes = result.adversarial.probes.filter((probe) => probe.status === 'pass').length;
+ lines.push(`adversarial probes: ${passedProbes}/${result.adversarial.probes.length} passed`);
+ }
+
  for (const run of result.runs) {
  const runStatus = run.ok ? 'PASS' : 'FAIL';
  const invalidFrames = run.issues.filter((issue) => issue.code.startsWith('stdout-')).length;
@@ -2994,6 +3615,10 @@ Options:
  --fail-on-static fail when --scan finds risky stdout writes
  --request <method> send one MCP request after initialize, e.g. tools/list
  --params <json> JSON params for --request
+ --adversarial-probe <name>
+ opt into strict probes: ${[...BUILTIN_ADVERSARIAL_PROBE_NAMES, 'all', 'none'].join(', ')}
+ --adversarial-probes <list>
+ comma-separated form of --adversarial-probe
  --json print JSON output
  --cwd <path> run command from this directory
  --version, -v print version