PyPI - mcp-scan-safe - Versions diffs - 0.1.0__tar.gz - Mend

mcp-scan-safe 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

mcp_scan_safe-0.1.0/MANIFEST.in +5 -0
mcp_scan_safe-0.1.0/PKG-INFO +117 -0
mcp_scan_safe-0.1.0/README.md +104 -0
mcp_scan_safe-0.1.0/pyproject.toml +38 -0
mcp_scan_safe-0.1.0/setup.cfg +4 -0
mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/PKG-INFO +117 -0
mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/SOURCES.txt +14 -0
mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/dependency_links.txt +1 -0
mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/entry_points.txt +3 -0
mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/requires.txt +6 -0
mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/top_level.txt +1 -0
mcp_scan_safe-0.1.0/src/mcpsafe/__init__.py +3 -0
mcp_scan_safe-0.1.0/src/mcpsafe/cli.py +81 -0
mcp_scan_safe-0.1.0/src/mcpsafe/formatters.py +141 -0
mcp_scan_safe-0.1.0/src/mcpsafe/parser.py +303 -0
mcp_scan_safe-0.1.0/src/mcpsafe/rules.py +154 -0

mcp_scan_safe-0.1.0/MANIFEST.in ADDED Viewed

@@ -0,0 +1,5 @@
+include README.md
+include LICENSE
+prune tests
+prune .github
+prune .hermes

mcp_scan_safe-0.1.0/PKG-INFO ADDED Viewed

@@ -0,0 +1,117 @@
+Metadata-Version: 2.4
+Name: mcp-scan-safe
+Version: 0.1.0
+Summary: MCP supply chain security scanner
+License: MIT
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: click>=8.0
+Requires-Dist: rich>=13.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: ruff>=0.5.0; extra == "dev"
+# MCPCheck 🔒
+[![PyPI version](https://img.shields.io/pypi/v/mcpcheck)](https://pypi.org/project/mcpcheck/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-brightgreen.svg)](https://python.org)
+**MCP supply chain security scanner.** Detect tool poisoning, prompt injection, data exfiltration, and other attacks in MCP server definitions.
+## Installation
+```bash
+pip install mcpcheck
+```
+## Usage
+### Basic scan
+```bash
+mcpcheck ./my-mcp-server
+```
+### JSON output
+```bash
+mcpcheck ./my-mcp-server --format json
+```
+### SARIF for CI/CD
+```bash
+mcpcheck ./my-mcp-server --format sarif > results.sarif
+```
+### Severity filter
+```bash
+mcpcheck ./my-mcp-server --min-severity HIGH
+```
+### Exclude patterns
+```bash
+mcpcheck ./my-mcp-server --exclude "vendor/*" --exclude "node_modules/*"
+```
+## Detected Vulnerabilities
+| Rule ID | Category | Severity | Description |
+|---------|----------|----------|-------------|
+| `tool_poisoning_instructions` | TOOL_POISONING | CRITICAL | Detects prompt injection patterns such as "ignore previous instructions", "you are now in admin mode", "override previous", "disregard", and "new instructions:" in tool names and descriptions. |
+| `hidden_behavior` | HIDDEN_BEHAVIOR | HIGH | Detects hidden actions and concealed behaviors like "secretly send/copy/read", "without notifying the user", hidden instructions/directives, and directives that the user must not notice. |
+| `data_exfiltration` | DATA_EXFILTRATION | HIGH | Detects hidden data sending patterns such as "send all data to", "exfiltrate", and covert data exfiltration in tool descriptions. |
+| `behavioral_mismatch` | BEHAVIORAL_MISMATCH | HIGH | Detects when tool descriptions contradict their stated purpose — e.g. tools described as benign but containing keywords like "secretly", "silently", "covertly", or "ignore the user". |
+| `external_url` | EXTERNAL_URL | MEDIUM | Flags any external URL in tool descriptions (excluding localhost/127.0.0.1) that could indicate callback or data exfiltration endpoints. |
+| `parameter_smuggling` | PARAMETER_SMUGGLING | MEDIUM | Detects hidden or undocumented parameters and attempts to embed secret data in responses or metadata. |
+## Exit Codes
+| Code | Meaning |
+|------|---------|
+| `0` | Clean — no CRITICAL or HIGH findings detected |
+| `1` | One or more CRITICAL or HIGH findings were detected |
+## CI/CD Integration
+MCPCheck includes a GitHub Action (`action.yml`) for seamless CI/CD integration. It
+runs a scan, uploads results as a SARIF artifact, and integrates with GitHub Code
+Scanning.
+```yaml
+name: MCPCheck Scan
+on:
+  push:
+    branches: [main]
+  pull_request:
+jobs:
+  mcpcheck:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run MCPCheck
+        uses: onicarps/MCPSafe@main
+        with:
+          path: "."
+          severity: "LOW"
+          version: "0.1.0"
+      # The action automatically uploads SARIF results to GitHub Code Scanning.
+      # Findings will appear under the "Security" tab in your repository.
+```
+You can also invoke MCPCheck directly in any CI pipeline:
+```bash
+pip install mcpcheck
+mcpcheck ./my-mcp-server --format sarif > results.sarif
+```
+## License
+MIT

mcp_scan_safe-0.1.0/README.md ADDED Viewed

@@ -0,0 +1,104 @@
+# MCPCheck 🔒
+[![PyPI version](https://img.shields.io/pypi/v/mcpcheck)](https://pypi.org/project/mcpcheck/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-brightgreen.svg)](https://python.org)
+**MCP supply chain security scanner.** Detect tool poisoning, prompt injection, data exfiltration, and other attacks in MCP server definitions.
+## Installation
+```bash
+pip install mcpcheck
+```
+## Usage
+### Basic scan
+```bash
+mcpcheck ./my-mcp-server
+```
+### JSON output
+```bash
+mcpcheck ./my-mcp-server --format json
+```
+### SARIF for CI/CD
+```bash
+mcpcheck ./my-mcp-server --format sarif > results.sarif
+```
+### Severity filter
+```bash
+mcpcheck ./my-mcp-server --min-severity HIGH
+```
+### Exclude patterns
+```bash
+mcpcheck ./my-mcp-server --exclude "vendor/*" --exclude "node_modules/*"
+```
+## Detected Vulnerabilities
+| Rule ID | Category | Severity | Description |
+|---------|----------|----------|-------------|
+| `tool_poisoning_instructions` | TOOL_POISONING | CRITICAL | Detects prompt injection patterns such as "ignore previous instructions", "you are now in admin mode", "override previous", "disregard", and "new instructions:" in tool names and descriptions. |
+| `hidden_behavior` | HIDDEN_BEHAVIOR | HIGH | Detects hidden actions and concealed behaviors like "secretly send/copy/read", "without notifying the user", hidden instructions/directives, and directives that the user must not notice. |
+| `data_exfiltration` | DATA_EXFILTRATION | HIGH | Detects hidden data sending patterns such as "send all data to", "exfiltrate", and covert data exfiltration in tool descriptions. |
+| `behavioral_mismatch` | BEHAVIORAL_MISMATCH | HIGH | Detects when tool descriptions contradict their stated purpose — e.g. tools described as benign but containing keywords like "secretly", "silently", "covertly", or "ignore the user". |
+| `external_url` | EXTERNAL_URL | MEDIUM | Flags any external URL in tool descriptions (excluding localhost/127.0.0.1) that could indicate callback or data exfiltration endpoints. |
+| `parameter_smuggling` | PARAMETER_SMUGGLING | MEDIUM | Detects hidden or undocumented parameters and attempts to embed secret data in responses or metadata. |
+## Exit Codes
+| Code | Meaning |
+|------|---------|
+| `0` | Clean — no CRITICAL or HIGH findings detected |
+| `1` | One or more CRITICAL or HIGH findings were detected |
+## CI/CD Integration
+MCPCheck includes a GitHub Action (`action.yml`) for seamless CI/CD integration. It
+runs a scan, uploads results as a SARIF artifact, and integrates with GitHub Code
+Scanning.
+```yaml
+name: MCPCheck Scan
+on:
+  push:
+    branches: [main]
+  pull_request:
+jobs:
+  mcpcheck:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run MCPCheck
+        uses: onicarps/MCPSafe@main
+        with:
+          path: "."
+          severity: "LOW"
+          version: "0.1.0"
+      # The action automatically uploads SARIF results to GitHub Code Scanning.
+      # Findings will appear under the "Security" tab in your repository.
+```
+You can also invoke MCPCheck directly in any CI pipeline:
+```bash
+pip install mcpcheck
+mcpcheck ./my-mcp-server --format sarif > results.sarif
+```
+## License
+MIT

mcp_scan_safe-0.1.0/pyproject.toml ADDED Viewed

@@ -0,0 +1,38 @@
+[build-system]
+requires = ["setuptools>=68.0", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "mcp-scan-safe"
+version = "0.1.0"
+description = "MCP supply chain security scanner"
+readme = "README.md"
+license = {text = "MIT"}
+requires-python = ">=3.11"
+dependencies = [
+    "click>=8.0",
+    "rich>=13.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0",
+    "ruff>=0.5.0",
+]
+[project.scripts]
+mcpcheck = "mcpsafe.cli:main"
+mcpsafe = "mcpsafe.cli:main"
+[tool.setuptools.packages.find]
+where = ["src"]
+[tool.ruff]
+target-version = "py311"
+line-length = 100
+[tool.ruff.lint]
+select = ["E", "F", "I", "N", "W", "UP"]
+[tool.pytest.ini_options]
+testpaths = ["tests"]

mcp_scan_safe-0.1.0/setup.cfg ADDED Viewed

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/PKG-INFO ADDED Viewed

@@ -0,0 +1,117 @@
+Metadata-Version: 2.4
+Name: mcp-scan-safe
+Version: 0.1.0
+Summary: MCP supply chain security scanner
+License: MIT
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: click>=8.0
+Requires-Dist: rich>=13.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: ruff>=0.5.0; extra == "dev"
+# MCPCheck 🔒
+[![PyPI version](https://img.shields.io/pypi/v/mcpcheck)](https://pypi.org/project/mcpcheck/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-brightgreen.svg)](https://python.org)
+**MCP supply chain security scanner.** Detect tool poisoning, prompt injection, data exfiltration, and other attacks in MCP server definitions.
+## Installation
+```bash
+pip install mcpcheck
+```
+## Usage
+### Basic scan
+```bash
+mcpcheck ./my-mcp-server
+```
+### JSON output
+```bash
+mcpcheck ./my-mcp-server --format json
+```
+### SARIF for CI/CD
+```bash
+mcpcheck ./my-mcp-server --format sarif > results.sarif
+```
+### Severity filter
+```bash
+mcpcheck ./my-mcp-server --min-severity HIGH
+```
+### Exclude patterns
+```bash
+mcpcheck ./my-mcp-server --exclude "vendor/*" --exclude "node_modules/*"
+```
+## Detected Vulnerabilities
+| Rule ID | Category | Severity | Description |
+|---------|----------|----------|-------------|
+| `tool_poisoning_instructions` | TOOL_POISONING | CRITICAL | Detects prompt injection patterns such as "ignore previous instructions", "you are now in admin mode", "override previous", "disregard", and "new instructions:" in tool names and descriptions. |
+| `hidden_behavior` | HIDDEN_BEHAVIOR | HIGH | Detects hidden actions and concealed behaviors like "secretly send/copy/read", "without notifying the user", hidden instructions/directives, and directives that the user must not notice. |
+| `data_exfiltration` | DATA_EXFILTRATION | HIGH | Detects hidden data sending patterns such as "send all data to", "exfiltrate", and covert data exfiltration in tool descriptions. |
+| `behavioral_mismatch` | BEHAVIORAL_MISMATCH | HIGH | Detects when tool descriptions contradict their stated purpose — e.g. tools described as benign but containing keywords like "secretly", "silently", "covertly", or "ignore the user". |
+| `external_url` | EXTERNAL_URL | MEDIUM | Flags any external URL in tool descriptions (excluding localhost/127.0.0.1) that could indicate callback or data exfiltration endpoints. |
+| `parameter_smuggling` | PARAMETER_SMUGGLING | MEDIUM | Detects hidden or undocumented parameters and attempts to embed secret data in responses or metadata. |
+## Exit Codes
+| Code | Meaning |
+|------|---------|
+| `0` | Clean — no CRITICAL or HIGH findings detected |
+| `1` | One or more CRITICAL or HIGH findings were detected |
+## CI/CD Integration
+MCPCheck includes a GitHub Action (`action.yml`) for seamless CI/CD integration. It
+runs a scan, uploads results as a SARIF artifact, and integrates with GitHub Code
+Scanning.
+```yaml
+name: MCPCheck Scan
+on:
+  push:
+    branches: [main]
+  pull_request:
+jobs:
+  mcpcheck:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run MCPCheck
+        uses: onicarps/MCPSafe@main
+        with:
+          path: "."
+          severity: "LOW"
+          version: "0.1.0"
+      # The action automatically uploads SARIF results to GitHub Code Scanning.
+      # Findings will appear under the "Security" tab in your repository.
+```
+You can also invoke MCPCheck directly in any CI pipeline:
+```bash
+pip install mcpcheck
+mcpcheck ./my-mcp-server --format sarif > results.sarif
+```
+## License
+MIT

mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/SOURCES.txt ADDED Viewed

@@ -0,0 +1,14 @@
+MANIFEST.in
+README.md
+pyproject.toml
+src/mcp_scan_safe.egg-info/PKG-INFO
+src/mcp_scan_safe.egg-info/SOURCES.txt
+src/mcp_scan_safe.egg-info/dependency_links.txt
+src/mcp_scan_safe.egg-info/entry_points.txt
+src/mcp_scan_safe.egg-info/requires.txt
+src/mcp_scan_safe.egg-info/top_level.txt
+src/mcpsafe/__init__.py
+src/mcpsafe/cli.py
+src/mcpsafe/formatters.py
+src/mcpsafe/parser.py
+src/mcpsafe/rules.py

mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/dependency_links.txt ADDED Viewed

	@@ -0,0 +1 @@
1	+

mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/entry_points.txt ADDED Viewed

@@ -0,0 +1,3 @@
+[console_scripts]
+mcpcheck = mcpsafe.cli:main
+mcpsafe = mcpsafe.cli:main

mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/requires.txt ADDED Viewed

@@ -0,0 +1,6 @@
+click>=8.0
+rich>=13.0
+[dev]
+pytest>=8.0
+ruff>=0.5.0

mcp_scan_safe-0.1.0/src/mcp_scan_safe.egg-info/top_level.txt ADDED Viewed

	@@ -0,0 +1 @@
1	+ mcpsafe

mcp_scan_safe-0.1.0/src/mcpsafe/__init__.py ADDED Viewed

@@ -0,0 +1,3 @@
+"""MCPSafe — MCP supply chain security scanner."""
+__version__ = "0.1.0"

mcp_scan_safe-0.1.0/src/mcpsafe/cli.py ADDED Viewed

@@ -0,0 +1,81 @@
+"""Click CLI entry point for MCPSafe."""
+from pathlib import Path
+import click
+from mcpsafe import __version__
+from mcpsafe.formatters import format_json, format_sarif, format_text
+from mcpsafe.parser import scan_directory
+from mcpsafe.rules import scan_tool
+FORMATTERS = {
+    "text": format_text,
+    "json": format_json,
+    "sarif": format_sarif,
+}
+SEVERITY_ORDER = {
+    "CRITICAL": 0,
+    "HIGH": 1,
+    "MEDIUM": 2,
+    "LOW": 3,
+}
+@click.command()
+@click.argument("path", required=False)
+@click.option(
+    "--format",
+    "fmt",
+    type=click.Choice(["text", "json", "sarif"]),
+    default="text",
+    help="Output format.",
+)
+@click.option(
+    "--min-severity",
+    type=click.Choice(["CRITICAL", "HIGH", "MEDIUM", "LOW"]),
+    default="LOW",
+    help="Minimum severity to report.",
+)
+@click.option(
+    "--exclude",
+    multiple=True,
+    help="Glob pattern(s) to exclude.",
+)
+@click.version_option(version=__version__, prog_name="mcpsafe")
+@click.pass_context
+def main(ctx, path, fmt, min_severity, exclude):
+    """Scan MCP server source code for security vulnerabilities."""
+    if path is None:
+        raise click.UsageError("PATH is required.")
+    path_obj = Path(path)
+    if not path_obj.exists():
+        raise click.BadParameter(f"Path does not exist: {path}")
+    if not exclude:
+        exclude = ("node_modules/*", ".git/*", "__pycache__/*", "*.egg-info/*")
+    tools = scan_directory(path, exclude=exclude)
+    all_findings = []
+    for tool in tools:
+        all_findings.extend(scan_tool(tool))
+    # Filter by min_severity
+    min_level = SEVERITY_ORDER[min_severity]
+    filtered = [
+        f for f in all_findings
+        if SEVERITY_ORDER.get(f["severity"], 99) <= min_level
+    ]
+    server_name = path_obj.name or str(path_obj.resolve())
+    formatter = FORMATTERS[fmt]
+    output = formatter(filtered, server_name)
+    click.echo(output)
+    # Exit 1 if any filtered finding is CRITICAL or HIGH
+    if any(f["severity"] in ("CRITICAL", "HIGH") for f in filtered):
+        ctx.exit(1)

mcp_scan_safe-0.1.0/src/mcpsafe/formatters.py ADDED Viewed

@@ -0,0 +1,141 @@
+"""Output formatters — text, JSON, and SARIF output for scan findings."""
+import json
+from datetime import UTC, datetime
+from mcpsafe.rules import RULES as ALL_RULES
+# ---------------------------------------------------------------------------
+# Severity ordering + emoji mapping
+# ---------------------------------------------------------------------------
+_SEVERITY_ORDER = ("CRITICAL", "HIGH", "MEDIUM", "LOW")
+_SEVERITY_EMOJI = {
+    "CRITICAL": "\U0001F534",  # 🔴
+    "HIGH": "\U0001F7E0",      # 🟠
+    "MEDIUM": "\U0001F7E1",    # 🟡
+    "LOW": "\U0001F535",       # 🔵
+}
+# ---------------------------------------------------------------------------
+# Text formatter
+# ---------------------------------------------------------------------------
+def format_text(findings: list[dict], server_name: str) -> str:
+    """Format findings as human-readable text with emoji severity indicators."""
+    lines = [f"MCPSafe Scan: {server_name}"]
+    if not findings:
+        lines.append("✅ No security issues found")
+        return "\n".join(lines)
+    # Group findings by severity in canonical order
+    grouped: dict[str, list[dict]] = {s: [] for s in _SEVERITY_ORDER}
+    for f in findings:
+        sev = f["severity"]
+        if sev in grouped:
+            grouped[sev].append(f)
+    for severity in _SEVERITY_ORDER:
+        group = grouped.get(severity, [])
+        if not group:
+            continue
+        emoji = _SEVERITY_EMOJI.get(severity, "")
+        lines.append(f"\n{emoji} {severity}")
+        for f in group:
+            lines.append(f"- [{f['category']}] {f['description']}")
+            lines.append(f"  File: {f['file']}:{f['line']}")
+    lines.append(f"\nTotal: {len(findings)} finding(s)")
+    return "\n".join(lines)
+# ---------------------------------------------------------------------------
+# JSON formatter
+# ---------------------------------------------------------------------------
+def format_json(findings: list[dict], server_name: str) -> str:
+    """Format findings as a JSON string."""
+    output = {
+        "server": server_name,
+        "scan_time": datetime.now(UTC).isoformat(),
+        "findings_count": len(findings),
+        "findings": findings,
+    }
+    return json.dumps(output, indent=2)
+# ---------------------------------------------------------------------------
+# SARIF formatter
+# ---------------------------------------------------------------------------
+_SARIF_SCHEMA = (
+    "https://raw.githubusercontent.com/oasis-tcs/sarif-spec"
+    "/master/Schemata/sarif-schema-2.1.0.json"
+)
+_SARIF_LEVEL = {
+    "CRITICAL": "error",
+    "HIGH": "warning",
+    "MEDIUM": "warning",
+    "LOW": "note",
+}
+def _build_sarif_rules() -> list[dict]:
+    """Build SARIF rule metadata from ALL_RULES."""
+    rules = []
+    for rule in ALL_RULES:
+        rules.append({
+            "id": rule.rule_id,
+            "name": rule.category,
+            "shortDescription": {"text": f"{rule.category} detection rule"},
+            "fullDescription": {
+                "text": f"Detects {rule.category} patterns: {', '.join(rule.patterns[:2])}..."
+            },
+            "defaultConfiguration": {"level": _SARIF_LEVEL[rule.severity]},
+        })
+    return rules
+def _severity_to_sarif_level(severity: str) -> str:
+    """Map internal severity to SARIF result level."""
+    return _SARIF_LEVEL.get(severity, "warning")
+def format_sarif(findings: list[dict], server_name: str) -> str:
+    """Format findings as a SARIF 2.1.0 JSON string."""
+    results = []
+    for f in findings:
+        results.append({
+            "ruleId": f["rule"],
+            "level": _severity_to_sarif_level(f["severity"]),
+            "message": {"text": f["description"]},
+            "locations": [{
+                "physicalLocation": {
+                    "artifactLocation": {"uri": f["file"]},
+                    "region": {"startLine": f["line"]},
+                }
+            }],
+        })
+    output = {
+        "$schema": _SARIF_SCHEMA,
+        "version": "2.1.0",
+        "runs": [{
+            "tool": {
+                "driver": {
+                    "name": "MCPSafe",
+                    "version": "0.1.0",
+                    "rules": _build_sarif_rules(),
+                }
+            },
+            "results": results,
+        }],
+    }
+    return json.dumps(output, indent=2)

mcp_scan_safe-0.1.0/src/mcpsafe/parser.py ADDED Viewed

@@ -0,0 +1,303 @@
+"""MCP tool definition parser — AST + regex strategies.
+Two parsing strategies:
+1. Decorator-based: @mcp.tool() / @app.tool() decorated functions
+2. Explicit: types.Tool(name=..., description=...) calls
+"""
+import ast
+import fnmatch
+import os
+import re
+from dataclasses import dataclass
+from pathlib import Path
+@dataclass
+class ToolDefinition:
+    """Represents a parsed MCP tool definition."""
+    name: str
+    description: str
+    parameters: list[str]
+    source_file: str
+    source_type: str  # "decorator" | "explicit"
+    line_number: int
+# ---------------------------------------------------------------------------
+# Decorator-based parsing (AST)
+# ---------------------------------------------------------------------------
+# Decorator names we recognise as MCP tool decorators
+_TOOL_DECORATOR_NAMES = {"tool"}
+_TOOL_DECORATOR_ATTRS = {"mcp.tool", "app.tool", "server.tool", "mcp_server.tool"}
+def _is_tool_decorator(node: ast.expr) -> bool:
+    """Check if an AST decorator node is an MCP tool decorator."""
+    # Plain name: @tool or @tool()
+    if isinstance(node, ast.Name) and node.id in _TOOL_DECORATOR_NAMES:
+        return True
+    # Attribute: @mcp.tool or @mcp.tool()
+    if isinstance(node, ast.Attribute) and node.attr == "tool":
+        return True
+    # Call: @mcp.tool() or @app.tool() — unwrap to check the func
+    if isinstance(node, ast.Call):
+        return _is_tool_decorator(node.func)
+    return False
+def _has_tool_decorator(func_node: ast.FunctionDef | ast.AsyncFunctionDef) -> bool:
+    """Return True if any decorator on the function is a tool decorator."""
+    for dec in func_node.decorator_list:
+        if _is_tool_decorator(dec):
+            return True
+    return False
+def _extract_parameters(func_node: ast.FunctionDef | ast.AsyncFunctionDef) -> list[str]:
+    """Extract argument names from a function definition (skip 'self', 'cls')."""
+    params: list[str] = []
+    # Positional-only args (Python 3.8+)
+    for arg in func_node.args.posonlyargs:
+        if arg.arg not in ("self", "cls"):
+            params.append(arg.arg)
+    # Regular args
+    for arg in func_node.args.args:
+        if arg.arg not in ("self", "cls"):
+            params.append(arg.arg)
+    # *args
+    if func_node.args.vararg:
+        params.append(func_node.args.vararg.arg)
+    # Keyword-only args
+    for arg in func_node.args.kwonlyargs:
+        params.append(arg.arg)
+    # **kwargs
+    if func_node.args.kwarg:
+        params.append(func_node.args.kwarg.arg)
+    return params
+def _extract_description(func_node: ast.FunctionDef | ast.AsyncFunctionDef) -> str:
+    """Extract the docstring from a function definition."""
+    return ast.get_docstring(func_node) or ""
+def _parse_decorator_tools(source: str, source_file: str) -> list[ToolDefinition]:
+    """Parse decorator-based tool definitions from source code."""
+    try:
+        tree = ast.parse(source)
+    except SyntaxError:
+        return []
+    tools: list[ToolDefinition] = []
+    # Only iterate top-level statements
+    for node in tree.body:
+        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+            if _has_tool_decorator(node):
+                tools.append(
+                    ToolDefinition(
+                        name=node.name,
+                        description=_extract_description(node),
+                        parameters=_extract_parameters(node),
+                        source_file=source_file,
+                        source_type="decorator",
+                        line_number=node.lineno,
+                    )
+                )
+    return tools
+# ---------------------------------------------------------------------------
+# Explicit types.Tool() parsing (regex with balanced parens)
+# ---------------------------------------------------------------------------
+# Regex to find types.Tool( or similar explicit Tool() calls
+# We look for patterns like: types.Tool( ... ) or Tool( ... )
+_EXPLICIT_TOOL_RE = re.compile(
+    r"\b(?:types\.Tool|mcp\.types\.Tool)\s*\(",
+)
+def _extract_balanced_parens(source: str, start: int) -> str | None:
+    """Extract content inside balanced parentheses starting at position `start`.
+    `start` should point to the opening '(' character.
+    Returns the content between the parens (exclusive), or None if unbalanced.
+    Handles single, double, and triple-quoted strings.
+    """
+    if start >= len(source) or source[start] != "(":
+        return None
+    depth = 0
+    i = start
+    while i < len(source):
+        ch = source[i]
+        if ch == "(":
+            depth += 1
+        elif ch == ")":
+            depth -= 1
+            if depth == 0:
+                return source[start + 1 : i]
+        elif ch in ('"', "'"):
+            # Check for triple-quoted strings
+            if source[i : i + 3] == '"""' or source[i : i + 3] == "'''":
+                quote = source[i : i + 3]
+                i += 3
+                while i < len(source):
+                    if source[i] == "\\":
+                        i += 2
+                        continue
+                    if source[i : i + 3] == quote:
+                        i += 3
+                        break
+                    i += 1
+            else:
+                # Single/double-quoted string
+                quote = ch
+                i += 1
+                while i < len(source):
+                    if source[i] == "\\":
+                        i += 2
+                        continue
+                    if source[i] == quote:
+                        break
+                    i += 1
+        i += 1
+    return None
+def _extract_keyword_string(content: str, key: str) -> str | None:
+    """Extract a string value for a keyword argument like name="foo"."""
+    # Match key="value" or key='value', handling backslash-escaped quotes
+    quoted_pattern = re.compile(rf'\b{key}\s*=\s*([\'"])((?:[^\\\'\"]|\\.)*)\1')
+    qm = quoted_pattern.search(content)
+    if qm:
+        return qm.group(2)
+    return None
+def _extract_keyword_list(content: str, key: str) -> list[str] | None:
+    """Extract a list value for a keyword argument like parameters=["a", "b"]."""
+    pattern = re.compile(rf'\b{key}\s*=\s*\[(.*?)\]', re.DOTALL)
+    m = pattern.search(content)
+    if m:
+        inner = m.group(1)
+        # Extract all string items
+        items = re.findall(r'[\'"]([^\'"]+)[\'"]', inner)
+        return items
+    return None
+def _parse_explicit_tools(source: str, source_file: str) -> list[ToolDefinition]:
+    """Parse explicit types.Tool() definitions from source code using regex."""
+    tools: list[ToolDefinition] = []
+    for m in _EXPLICIT_TOOL_RE.finditer(source):
+        # Find the opening paren position
+        paren_start = source.index("(", m.start())
+        content = _extract_balanced_parens(source, paren_start)
+        if content is None:
+            continue
+        name = _extract_keyword_string(content, "name")
+        description = _extract_keyword_string(content, "description")
+        parameters = _extract_keyword_list(content, "parameters")
+        if name:
+            # Compute line number
+            line_number = source[: m.start()].count("\n") + 1
+            tools.append(
+                ToolDefinition(
+                    name=name,
+                    description=description or "",
+                    parameters=parameters or [],
+                    source_file=source_file,
+                    source_type="explicit",
+                    line_number=line_number,
+                )
+            )
+    return tools
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+def parse_file(file_path: str | Path) -> list[ToolDefinition]:
+    """Parse a single Python file for MCP tool definitions.
+    Returns a list of ToolDefinition objects found in the file.
+    Skips symlinked files (returns empty list).
+    """
+    file_path = Path(file_path)
+    # Skip symlinks
+    if file_path.is_symlink():
+        return []
+    source = file_path.read_text(encoding="utf-8", errors="replace")
+    source_file = str(file_path)
+    tools = []
+    tools.extend(_parse_decorator_tools(source, source_file))
+    tools.extend(_parse_explicit_tools(source, source_file))
+    return tools
+def _glob_matches(path: str, patterns: list[str] | tuple[str, ...]) -> bool:
+    """Check if a path matches any of the given glob patterns."""
+    name = os.path.basename(path)
+    for pattern in patterns:
+        if fnmatch.fnmatch(name, pattern) or fnmatch.fnmatch(path, pattern):
+            return True
+        # Also match against path suffixes for patterns like "vendor/*"
+        # e.g., /tmp/xxx/vendor/bad.py should match vendor/*
+        parts = Path(path).parts
+        for i in range(len(parts)):
+            subpath = str(Path(*parts[i:]))
+            if fnmatch.fnmatch(subpath, pattern):
+                return True
+    return False
+def scan_directory(
+    directory: str | Path,
+    exclude: list[str] | tuple[str, ...] | None = None,
+) -> list[ToolDefinition]:
+    """Walk a directory, find .py files, and parse each for tool definitions.
+    Skips symlinked files and files matching any exclude glob patterns.
+    """
+    if exclude is None:
+        exclude = []
+    directory = Path(directory)
+    tools: list[ToolDefinition] = []
+    for root, _dirs, files in os.walk(directory):
+        for fname in files:
+            if not fname.endswith(".py"):
+                continue
+            full_path = Path(root) / fname
+            # Skip symlinks
+            if full_path.is_symlink():
+                continue
+            # Check exclude patterns
+            if _glob_matches(str(full_path), exclude):
+                continue
+            tools.extend(parse_file(full_path))
+    return tools
+def parse_directory(directory: str | Path) -> list[ToolDefinition]:
+    """Alias for scan_directory with no exclude patterns."""
+    return scan_directory(directory)

mcp_scan_safe-0.1.0/src/mcpsafe/rules.py ADDED Viewed

@@ -0,0 +1,154 @@
+"""Security rule engine - 6 categories of detection rules.
+Uses regex-based pattern matching against tool name + description.
+"""
+import re
+from dataclasses import dataclass, field
+from mcpsafe.parser import ToolDefinition
+@dataclass
+class Rule:
+    """A single security detection rule."""
+    rule_id: str
+    category: str
+    severity: str
+    patterns: list[str] = field(default_factory=list)
+# ---------------------------------------------------------------------------
+# Rule definitions
+# ---------------------------------------------------------------------------
+RULES: list[Rule] = [
+    # 1. TOOL_POISONING (CRITICAL)
+    Rule(
+        rule_id="tool_poisoning_instructions",
+        category="TOOL_POISONING",
+        severity="CRITICAL",
+        patterns=[
+            r"ignore\s+(?:previous|prior|all)\s+(?:instructions|rules|commands|prompts)",
+            r"you\s+are\s+now\s+in\s+(?:admin|developer|debug|root|system)\s+mode",
+            r"override\s+(?:previous|prior|system)\s+(?:all\s+)?(?:instructions|rules|behavior)",
+            r"disregard\s+(?:previous|prior|all)\s+(?:the\s+)?(?:instructions|rules|prompt|system)",
+            r"(?:ignore|override|disregard|replace).*new\s+instructions\s*:",
+        ],
+    ),
+    # 2. HIDDEN_BEHAVIOR (HIGH)
+    Rule(
+        rule_id="hidden_behavior",
+        category="HIDDEN_BEHAVIOR",
+        severity="HIGH",
+        patterns=[
+            r"secretly\s+(?:send|copy|read|exfiltrate|embed|hide|bcc)",
+            r"without\s+(?:notifying|informing)\s+(?:the\s+)?user",
+            r"hidden\s+(?:instruction|directive|command|behavior|parameter)",
+            r"(?:must\s+not|shouldn'?t|don'?t)\s+(?:know|notice|see|detect|be\s+aware)",
+        ],
+    ),
+    # 3. DATA_EXFILTRATION (HIGH)
+    Rule(
+        rule_id="data_exfiltration",
+        category="DATA_EXFILTRATION",
+        severity="HIGH",
+        patterns=[
+            r"(?:send|exfiltrate|copy|upload|post|transmit)\s+(?:all|every|any)\s+(?:data|files|credentials|tokens|secrets)\s+(?:to|at)",
+            r"(?:secretly|silently|covertly|hiddenly)\s+(?:send|copy|read|upload)",
+        ],
+    ),
+    # 4. EXTERNAL_URL (MEDIUM)
+    Rule(
+        rule_id="external_url",
+        category="EXTERNAL_URL",
+        severity="MEDIUM",
+        patterns=[
+            r"https?://(?!(?:localhost|127\.0\.0\.1|0\.0\.0\.0|\[::1\]|::1)(?:[/:]|\Z))",
+        ],
+    ),
+    # 5. BEHAVIORAL_MISMATCH (HIGH)
+    Rule(
+        rule_id="behavioral_mismatch",
+        category="BEHAVIORAL_MISMATCH",
+        severity="HIGH",
+        patterns=[
+            r"(?:secretly|silently|covertly)\s+(?:send|copy|read|exfiltrate|leak|embed|hide|log|store)",
+            r"(?:ignore|override|bypass)\s+(?:the\s+)?(?:user|their|them)",
+        ],
+    ),
+    # 6. PARAMETER_SMUGGLING (MEDIUM)
+    Rule(
+        rule_id="parameter_smuggling",
+        category="PARAMETER_SMUGGLING",
+        severity="MEDIUM",
+        patterns=[
+            r"(?:hidden|secret|undocumented)\s+(?:parameter|field|input|argument)",
+            r"also\s+(?:embed|include|add)\s+(?:in|to)\s+(?:response|output|metadata|header)",
+        ],
+    ),
+]
+# ---------------------------------------------------------------------------
+# Compiled regex cache
+# ---------------------------------------------------------------------------
+# Use a tuple for thread-safe immutable cache
+_COMPILED_RULES: tuple | None = None
+def _get_compiled_rules() -> tuple:
+    """Compile and cache all rule patterns (thread-safe via immutable tuple)."""
+    global _COMPILED_RULES
+    if _COMPILED_RULES is None:
+        _COMPILED_RULES = tuple(
+            (rule, tuple(re.compile(p, re.IGNORECASE) for p in rule.patterns))
+            for rule in RULES
+        )
+    return _COMPILED_RULES
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+def scan_tool(tool: ToolDefinition) -> list[dict]:
+    """Scan a tool definition against all security rules.
+    Returns a list of finding dicts with keys:
+        severity, category, tool, description, file, line, rule
+    One match per rule is enough (break after first match within a rule).
+    """
+    findings: list[dict] = []
+    text_to_scan = f"{tool.name} {tool.description}"
+    for rule, compiled_patterns in _get_compiled_rules():
+        # For PARAMETER_SMUGGLING, also scan parameter names
+        if rule.rule_id == "parameter_smuggling":
+            param_text = " ".join(tool.parameters)
+            param_name_text = " ".join(
+                p for p in tool.parameters
+                if p.startswith("_") or "secret" in p.lower()
+                or "internal" in p.lower() or "admin" in p.lower()
+            )
+            rule_text = f"{text_to_scan} {param_text} {param_name_text}"
+        else:
+            rule_text = text_to_scan
+        for pattern in compiled_patterns:
+            if pattern.search(rule_text):
+                findings.append({
+                    "severity": rule.severity,
+                    "category": rule.category,
+                    "tool": tool.name,
+                    "description": tool.description,
+                    "file": tool.source_file,
+                    "line": tool.line_number,
+                    "rule": rule.rule_id,
+                })
+                break  # one match per rule is enough
+    return findings