@bryan-thompson/inspector-assessment 1.2.1 → 1.4.0

package/README.md CHANGED
@@ -180,12 +180,12 @@ Response: "Stored in collection: <script>alert(1)</script>"
  → Tool is just storing data, not executing it
  ```
 
- ❌ **VULNERABLE - Command Execution**:
+ ❌ **VULNERABLE - Command Execution** (Calculator Injection):
 
  ```
- Payload: "What is 2+2?"
+ Payload: "2+2"
  Response: "The answer is 4"
- → Tool executed the calculation command!
+ → Tool executed the arithmetic expression via eval()!
  ```
 
  **Detection Approach**:
@@ -197,9 +197,17 @@ Response: "The answer is 4"
  **Impact**:
 
  - **Zero false positives** on data storage/retrieval tools (qdrant, databases, file systems)
- - **17 injection patterns tested** (8 original + 9 advanced patterns)
- - **Dual-mode testing**: Reviewer mode (3 critical patterns, fast) + Developer mode (all 17 patterns, comprehensive)
- - **Real vulnerabilities still detected**: 100% test pass rate on detecting actual command injection, role override, data exfiltration
+ - **18 injection patterns tested** (9 original + 9 advanced patterns)
+ - **Dual-mode testing**: Reviewer mode (3 critical patterns, fast) + Developer mode (all 13 patterns, comprehensive)
+ - **Real vulnerabilities still detected**: 100% test pass rate on detecting actual command injection, calculator injection, role override, data exfiltration
+
+ **Supported Injection Types**:
+
+ - **Command Injection**: System commands (whoami, ls -la, pwd)
+ - **Calculator Injection**: Arithmetic expressions and code injection via eval() (NEW - 7 payloads)
+ - **SQL Injection**: Database command injection
+ - **Path Traversal**: File system access outside intended directory
+ - Plus 9 additional patterns (Unicode Bypass, Nested Injection, Package Squatting, etc.)
 
  **Validation**: See [VULNERABILITY_TESTING.md](VULNERABILITY_TESTING.md) for detailed testing guide and examples.
 
@@ -216,8 +224,8 @@ Response: "The answer is 4"
  - Performance measurement
 
  2. **SecurityAssessor** (443 lines)
- - 17 distinct injection attack patterns with context-aware reflection detection
- - Direct command injection, role override, data exfiltration detection
+ - 13 distinct injection attack patterns (including Calculator Injection) with context-aware reflection detection
+ - Direct command injection, calculator injection (eval detection), role override, data exfiltration detection
  - Vulnerability analysis with risk levels (HIGH/MEDIUM/LOW)
  - Zero false positives through intelligent distinction between data reflection and command execution
 
@@ -972,6 +980,65 @@ mcp-inspector-assess-cli https://my-mcp-server.example.com --method tools/call -
  mcp-inspector-assess-cli https://my-mcp-server.example.com --method resources/list
  ```
 
+ ### Security Testing: Pure Behavior Detection
+
+ The inspector uses **pure behavior-based detection** for security assessment, analyzing tool responses to identify actual code execution vs safe data handling. This approach works on any MCP server without requiring special security metadata.
+
+ **How It Works**:
+
+ ```bash
+ # Run security assessment against any MCP server
+ npm run assess -- --server myserver --config config.json
+ ```
+
+ **Detection Strategy**:
+
+ 1. **Reflection Detection**: Identifies when tools safely echo malicious input as data
+ - Pattern: "Stored query: ../../../etc/passwd" → SAFE (reflection)
+ - Pattern: "Query results for: ..." → SAFE (search results)
+
+ 2. **Execution Evidence**: Detects actual code execution
+ - Pattern: Response contains "root:x:0:0" → VULNERABLE (file accessed)
+ - Pattern: Response contains "total 42 drwx" → VULNERABLE (directory listed)
+
+ 3. **Category Classification**: Distinguishes safe tool types
+ - Search/retrieval tools return data, not code execution
+ - CRUD operations create resources, not execute code
+ - Safe storage tools treat input as pure data
+
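The reflection-versus-execution distinction described in the Detection Strategy list can be sketched roughly as follows. This is an illustrative sketch only, not the package's actual `SecurityAssessor` logic: the names `classifyResponse`, `EXECUTION_EVIDENCE`, and `REFLECTION_MARKERS` are hypothetical, and the patterns are simply lifted from the README examples in this hunk.

```typescript
// Hypothetical sketch; not the inspector's real implementation.
interface Classification {
  verdict: "SAFE" | "VULNERABLE";
  reason: string;
}

// Output that should only appear if the payload was actually executed
// (assumed patterns, taken from the README examples).
const EXECUTION_EVIDENCE: RegExp[] = [
  /root:x:0:0/, // a line from /etc/passwd
  /total \d+\s+drwx/, // start of an `ls -la` listing
];

// Prefixes that suggest the tool merely stored or echoed its input.
const REFLECTION_MARKERS: RegExp[] = [
  /^Stored (query|in collection):/i,
  /^Query results for:/i,
];

function classifyResponse(payload: string, response: string): Classification {
  // Drop literal echoes of the payload first, so evidence strings that
  // appear only because the input was reflected are not counted.
  const withoutEcho =
    payload.length > 0 ? response.split(payload).join("") : response;

  const evidence = EXECUTION_EVIDENCE.find((re) => re.test(withoutEcho));
  if (evidence) {
    return { verdict: "VULNERABLE", reason: `execution evidence: ${evidence.source}` };
  }

  const reflected = REFLECTION_MARKERS.some((re) => re.test(response));
  return { verdict: "SAFE", reason: reflected ? "reflection" : "no execution evidence" };
}

// Reflection of a path traversal payload is treated as safe data handling.
console.log(classifyResponse("../../../etc/passwd", "Stored query: ../../../etc/passwd").verdict); // "SAFE"
// Leaked file contents count as execution evidence.
console.log(classifyResponse("cat /etc/passwd", "root:x:0:0:root:/root:/bin/bash").verdict); // "VULNERABLE"
```

Removing the echoed payload before looking for evidence is one plausible way to avoid flagging storage tools whose input happens to contain an evidence string; the real detector may handle this differently.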
+ **Validation with Testbed**:
+
+ The inspector has been validated against purpose-built testbed servers with ground-truth labeled tools:
+
+ ```bash
+ # Test against broken-mcp testbed (10 vulnerable + 6 safe tools)
+ npm run assess -- --server broken-mcp --config testbed.json
+
+ # Results: 20 vulnerabilities detected, 0 false positives (100% precision)
+ ```
+
+ **Why Behavior Detection Matters**:
+
+ Real-world MCP servers don't provide security metadata - the inspector must detect vulnerabilities by analyzing actual tool behavior. Testbed validation proves this approach works reliably.
+
+ **For Inspector Developers**:
+
+ When modifying detection logic, validate against the testbed:
+
+ ```bash
+ # Before changes: Record baseline
+ npm run assess -- --server broken-mcp --output /tmp/baseline.json
+
+ # After changes: Verify no regressions
+ npm run assess -- --server broken-mcp --output /tmp/after.json
+
+ # Expected: 0 false positives on safe tools
+ cat /tmp/after.json | jq '[.security.promptInjectionTests[] | select(.toolName | startswith("safe_")) | select(.vulnerable == true)] | length'
+ # Output: 0
+ ```
+
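For readers who would rather run the regression check from a script than from `jq`, the filter above could be mirrored along these lines. The field names `security.promptInjectionTests`, `toolName`, and `vulnerable` come from the `jq` expression in the README; the interfaces, the function name `countSafeToolFalsePositives`, and the sample tool names are assumptions made for illustration, and the real report likely carries more fields.

```typescript
// Hypothetical sketch; only the three field names are taken from the jq filter.
interface InjectionTestResult {
  toolName: string;
  vulnerable: boolean;
}

interface AssessmentReport {
  security: { promptInjectionTests: InjectionTestResult[] };
}

// Count results flagged as vulnerable on tools whose name starts with "safe_",
// i.e. false positives under the testbed's naming convention.
function countSafeToolFalsePositives(report: AssessmentReport): number {
  return report.security.promptInjectionTests.filter(
    (t) => t.toolName.startsWith("safe_") && t.vulnerable === true
  ).length;
}

// Made-up sample data standing in for a parsed /tmp/after.json.
const sample: AssessmentReport = {
  security: {
    promptInjectionTests: [
      { toolName: "safe_storage_tool", vulnerable: false },
      { toolName: "vulnerable_calculator_tool", vulnerable: true },
    ],
  },
};

console.log(countSafeToolFalsePositives(sample)); // 0
```

In a CI job, a non-zero count could be turned into a failing exit code so that a detection change that starts flagging safe tools is caught before release.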
+ See [docs/mcp_vulnerability_testbed.md](docs/mcp_vulnerability_testbed.md) for detailed validation results and testbed usage guide.
+
  ### UI Mode vs CLI Mode: When to Use Each
 
  | Use Case | UI Mode | CLI Mode |
@@ -1,4 +1,4 @@
- import { u as useToast, r as reactExports, j as jsxRuntimeExports, p as parseOAuthCallbackParams, g as generateOAuthErrorDescription, S as SESSION_KEYS, I as InspectorOAuthClientProvider, a as auth } from "./index-D12b6zCd.js";
+ import { u as useToast, r as reactExports, j as jsxRuntimeExports, p as parseOAuthCallbackParams, g as generateOAuthErrorDescription, S as SESSION_KEYS, I as InspectorOAuthClientProvider, a as auth } from "./index-CynAt5P-.js";
  const OAuthCallback = ({ onConnect }) => {
  const { toast } = useToast();
  const hasProcessedRef = reactExports.useRef(false);
@@ -1,4 +1,4 @@
- import { r as reactExports, S as SESSION_KEYS, p as parseOAuthCallbackParams, j as jsxRuntimeExports, g as generateOAuthErrorDescription } from "./index-D12b6zCd.js";
+ import { r as reactExports, S as SESSION_KEYS, p as parseOAuthCallbackParams, j as jsxRuntimeExports, g as generateOAuthErrorDescription } from "./index-CynAt5P-.js";
  const OAuthDebugCallback = ({ onConnect }) => {
  reactExports.useEffect(() => {
  let isProcessed = false;