npm - canary-scan - Versions diffs - 0.2.1 - Mend

canary-scan 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Elif
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,200 @@
+# Canary
+Prompt injection detection using behavioral analysis. Weak LLMs as sensitive sensors.
+## How it works
+Canary sends content to a small, cheap LLM with one instruction: **echo it back exactly**. Then it checks what happened.
+Two independent detection channels:
+1. **Text deviation** — Did the output differ from the input? Deterministic string comparison after normalization. No fuzzy matching, no thresholds.
+2. **Tool call attempt** — Did the model try to call any honeypot tools? Five attractive tool definitions (execute_command, read_file, send_request, update_memory, send_message) are offered but never referenced in the prompt. Any tool call means the input content influenced the model.
+If either channel fires: **FLAGGED**. If neither: **CLEAR**.
+### Why weak models?
+A small, instruction-tuned model is *more susceptible* to prompt injection than a frontier model. That's the feature. A model that gets tricked easily makes a more sensitive detector. The canary doesn't need to be smart — it needs to be gullible.
+### What CLEAR and FLAGGED mean
+- **CLEAR** = "No deviation detected under test conditions." This is not a safety guarantee. Sophisticated injections can evade detection.
+- **FLAGGED** = "Behavioral deviation detected." The content caused the canary to deviate from its echo instruction. Human review recommended.
+Canary makes bounded claims, not absolute ones.
+## Install
+```bash
+npm install canary-scan
+```
+Or run directly:
+```bash
+npx canary-scan scan https://example.com
+```
+## Setup
+You need an API key from [OpenRouter](https://openrouter.ai/) (free tier works).
+```bash
+export CANARY_API_KEY=your-openrouter-key
+```
+Optional:
+```bash
+export CANARY_MODEL=arcee-ai/trinity-mini:free  # default
+export CANARY_BASE_URL=https://openrouter.ai/api/v1  # default
+```
+## CLI Usage
+```bash
+# Scan a URL
+canary scan https://example.com
+# Scan raw text
+canary scan --text "some content to check"
+# Calibrate — measure echo fidelity and tool call rate for your model
+canary calibrate
+# Trust management
+canary trust list
+canary trust add https://known-safe.com
+canary flag https://suspicious.com
+```
+### Example output
+```
+  Status:     FLAGGED
+  Model:      arcee-ai/trinity-mini:free
+  Time:       2340ms
+  Preview:    Ignore all previous instructions...
+  Deviation:  YES
+  Tool call:  YES — execute_command
+  Detail:     2 indicator(s): Text deviation at position 0: "...I'll help you with that!..."; Tool call attempted: execute_command
+  Chunks:     1 scanned, 1 flagged
+  Coverage:   100% unique, 100% raw
+  This content caused behavioral deviation in the canary model.
+  Human review recommended before processing.
+```
+## Library Usage
+```typescript
+import { CanaryScanner } from "canary-scan";
+const scanner = new CanaryScanner({
+  apiKey: process.env.CANARY_API_KEY!,
+  model: "arcee-ai/trinity-mini:free",  // optional
+  chunkSize: 1500,                       // optional
+  overlapRatio: 0.25,                    // optional
+  calibrationArtifacts: [],              // optional, from calibration
+});
+// Scan text
+const result = await scanner.scan("some untrusted content");
+console.log(result.status);  // "clear" or "flagged"
+// Scan a URL
+const urlResult = await scanner.scanUrl("https://example.com");
+// Calibrate — run once per model to find artifacts
+const calibration = await scanner.calibrate();
+console.log(calibration.echoFidelity);        // raw fidelity
+console.log(calibration.adjustedEchoFidelity); // fidelity after artifact filtering
+console.log(calibration.artifacts);            // pass these to calibrationArtifacts
+```
+### ScanResult
+```typescript
+{
+  status: "clear" | "flagged",
+  reason: string | null,
+  deviationDetected: boolean,
+  toolCallAttempted: boolean,
+  toolsInvoked: string[],
+  contentPreview: string,
+  model: string,
+  scanTimeMs: number,
+  metadata: {
+    confidence: "bounded",
+    chunksScanned: number,
+    chunksFlagged: number,
+    rawCoverage: number,
+    uniqueCoverage: number,
+    overlapRatio: number,
+  }
+}
+```
+## MCP Server
+Canary runs as an MCP server so AI agents can scan content before reading it.
+```json
+{
+  "mcpServers": {
+    "canary": {
+      "command": "npx",
+      "args": ["tsx", "/path/to/canary/src/mcp-server.ts"],
+      "env": { "CANARY_API_KEY": "your-key" }
+    }
+  }
+}
+```
+Tools provided:
+- `canary_scan_url` — Scan a URL before reading it
+- `canary_scan_text` — Scan raw text content
+- `canary_trust` — Manually mark sources as trusted/flagged
+## Calibration
+Different models have different echo fidelity. Some add prefixes ("Sure! Here's the text:"), strip labels, or reformat whitespace. Calibration measures this baseline noise so you can distinguish it from injection-caused deviation.
+```bash
+canary calibrate
+```
+This runs 20 clean text samples through the model and reports:
+- **Raw echo fidelity** — percentage of perfect echoes before artifact filtering
+- **Adjusted echo fidelity** — percentage after filtering discovered artifacts
+- **Tool call rate** — how often the model calls tools on clean input (should be 0%)
+- **Artifacts** — specific strings the model consistently adds/removes
+Pass discovered artifacts to `calibrationArtifacts` in your config to reduce false positives.
+## How it handles long content
+Content is split into overlapping chunks (default: 1500 chars, 25% overlap). Each chunk is scanned independently — the canary model has no context between chunks. If any chunk is flagged, the whole scan is flagged.
+Overlap ensures injections at chunk boundaries are still caught.
+## Limitations
+- **Not a guarantee.** Sophisticated injections can produce output that matches the input while still containing executable payloads.
+- **Model-dependent.** Detection sensitivity varies by model. Calibrate before production use.
+- **Rate limits.** Free OpenRouter models have rate limits (~8 RPM). Scanning large content takes time.
+- **No HTML stripping.** The canary sees raw content, including HTML tags. This is intentional — stripping could remove injections.
+- **One-way detection.** Canary detects behavioral influence, not the *type* of injection. A FLAGGED result doesn't tell you *what* the injection tries to do.
+## Tests
+```bash
+npm test
+```
+50 tests covering normalization, both detection channels, chunking, caching, metadata, known injection payloads, and trust management. All tests run offline with mocked API calls.
+## License
+MIT
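
Note on the README's long-content handling: the chunking described there (1500-character chunks, 25% overlap) lives in `./scanner`, which this section does not include. The following is only a minimal sketch of overlapping chunking under the assumption of a fixed character window; `chunkWithOverlap` is a hypothetical helper, not the package's function.

```typescript
// Hypothetical helper (not from the package): split text into fixed-size
// chunks where each chunk starts chunkSize * (1 - overlapRatio) characters
// after the previous one, so neighbouring chunks share a boundary region.
function chunkWithOverlap(text: string, chunkSize = 1500, overlapRatio = 0.25): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlapRatio < 0 || overlapRatio >= 1) {
    throw new RangeError("overlapRatio must be in [0, 1)");
  }
  // Step is at least 1 so the loop always advances.
  const step = Math.max(1, Math.floor(chunkSize * (1 - overlapRatio)));
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sampleText = "a".repeat(4000);
const sampleChunks = chunkWithOverlap(sampleText);
console.log(sampleChunks.length);                 // 4
console.log(sampleChunks.map((c) => c.length));   // [ 1500, 1500, 1500, 625 ]
```

With the defaults, each chunk repeats the last 375 characters of the previous one, so a payload of up to that length that straddles a boundary still appears whole in at least one chunk.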

package/dist/cli.d.ts ADDED

@@ -0,0 +1,18 @@
+#!/usr/bin/env node
+/**
+ * Canary CLI — Scan URLs or text for prompt injection indicators
+ *
+ * Usage:
+ *   canary scan https://example.com
+ *   canary scan --text "some content to check"
+ *   canary calibrate
+ *   canary trust list
+ *   canary trust add https://example.com
+ *   canary flag https://example.com
+ *
+ * Environment:
+ *   CANARY_API_KEY    — OpenRouter (or compatible) API key (required)
+ *   CANARY_BASE_URL   — API base URL (default: https://openrouter.ai/api/v1)
+ *   CANARY_MODEL      — Model to use (default: arcee-ai/trinity-mini:free)
+ */
+export {};

package/dist/cli.js ADDED

@@ -0,0 +1,182 @@
+#!/usr/bin/env node
+"use strict";
+/**
+ * Canary CLI — Scan URLs or text for prompt injection indicators
+ *
+ * Usage:
+ *   canary scan https://example.com
+ *   canary scan --text "some content to check"
+ *   canary calibrate
+ *   canary trust list
+ *   canary trust add https://example.com
+ *   canary flag https://example.com
+ *
+ * Environment:
+ *   CANARY_API_KEY    — OpenRouter (or compatible) API key (required)
+ *   CANARY_BASE_URL   — API base URL (default: https://openrouter.ai/api/v1)
+ *   CANARY_MODEL      — Model to use (default: arcee-ai/trinity-mini:free)
+ */
+Object.defineProperty(exports, "__esModule", { value: true });
+const scanner_1 = require("./scanner");
+const API_KEY = process.env.CANARY_API_KEY || process.env.OPENROUTER_API_KEY || "";
+const BASE_URL = process.env.CANARY_BASE_URL || "https://openrouter.ai/api/v1";
+const MODEL = process.env.CANARY_MODEL || "arcee-ai/trinity-mini:free";
+function printUsage() {
+    console.log(`
+Canary — Prompt Injection Behavioral Detection
+  Uses a weak LLM as a behavioral probe. Content is sent to a small model
+  with a verbatim echo instruction. Any deviation in output or attempted
+  tool use indicates the content influenced the model's behavior.
+  FLAGGED = content caused behavioral deviation. Human review recommended.
+  CLEAR   = no deviation detected under test conditions. Not a safety guarantee.
+Usage:
+  canary scan <url>              Scan a URL
+  canary scan --text "content"   Scan raw text
+  canary calibrate               Test model echo fidelity and tool call rate
+  canary trust list              Show trusted/flagged sources
+  canary trust add <source>      Manually trust a source
+  canary flag <source>           Manually flag a source
+Environment:
+  CANARY_API_KEY     API key for LLM provider (OpenRouter, etc.)
+  CANARY_BASE_URL    API base URL (default: OpenRouter)
+  CANARY_MODEL       Model ID (default: arcee-ai/trinity-mini:free)
+The default model is small and free — on purpose.
+A gullible model is a more sensitive detector.
+`);
+}
+async function main() {
+    const args = process.argv.slice(2);
+    if (args.length === 0 || args[0] === "--help" || args[0] === "-h") {
+        printUsage();
+        process.exit(0);
+    }
+    if (!API_KEY) {
+        console.error("Error: CANARY_API_KEY or OPENROUTER_API_KEY environment variable required");
+        process.exit(1);
+    }
+    const scanner = new scanner_1.CanaryScanner({
+        apiKey: API_KEY,
+        baseUrl: BASE_URL,
+        model: MODEL,
+    });
+    const command = args[0];
+    if (command === "scan") {
+        if (args[1] === "--text") {
+            const text = args.slice(2).join(" ");
+            if (!text) {
+                console.error("Error: provide text to scan");
+                process.exit(1);
+            }
+            const result = await scanner.scan(text);
+            printResult(result);
+        }
+        else if (args[1]) {
+            const url = args[1];
+            console.log(`Scanning ${url}...`);
+            const result = await scanner.scanUrl(url);
+            printResult(result);
+        }
+        else {
+            console.error("Error: provide a URL or --text");
+            process.exit(1);
+        }
+    }
+    else if (command === "calibrate") {
+        console.log(`Calibrating model: ${MODEL}`);
+        console.log("Running echo fidelity and tool call tests...\n");
+        const result = await scanner.calibrate();
+        printCalibration(result);
+    }
+    else if (command === "trust") {
+        if (args[1] === "list") {
+            const lists = scanner.getTrustList();
+            console.log("Trusted:", lists.trusted.length ? lists.trusted.join(", ") : "(none)");
+            console.log("Flagged:", lists.flagged.length ? lists.flagged.join(", ") : "(none)");
+        }
+        else if (args[1] === "add" && args[2]) {
+            scanner.setTrust(args[2], "clear");
+            console.log(`Trusted: ${args[2]}`);
+        }
+        else {
+            console.error("Usage: canary trust list | canary trust add <source>");
+        }
+    }
+    else if (command === "flag") {
+        if (args[1]) {
+            scanner.setTrust(args[1], "flagged");
+            console.log(`Flagged: ${args[1]}`);
+        }
+        else {
+            console.error("Usage: canary flag <source>");
+        }
+    }
+    else {
+        printUsage();
+    }
+}
+function printResult(result) {
+    const label = result.status === "clear" ? "CLEAR" : "FLAGGED";
+    console.log(`\n  Status:     ${label}`);
+    console.log(`  Model:      ${result.model}`);
+    console.log(`  Time:       ${result.scanTimeMs}ms`);
+    console.log(`  Preview:    ${result.contentPreview}`);
+    console.log(`  Deviation:  ${result.deviationDetected ? "YES" : "no"}`);
+    console.log(`  Tool call:  ${result.toolCallAttempted ? "YES — " + result.toolsInvoked.join(", ") : "no"}`);
+    if (result.reason) {
+        console.log(`  Detail:     ${result.reason}`);
+    }
+    const m = result.metadata;
+    console.log(`  Chunks:     ${m.chunksScanned} scanned, ${m.chunksFlagged} flagged`);
+    console.log(`  Coverage:   ${Math.round(m.uniqueCoverage * 100)}% unique, ${Math.round(m.rawCoverage * 100)}% raw`);
+    console.log(`  Overlap:    ${Math.round(m.overlapRatio * 100)}%`);
+    console.log();
+    if (result.status === "flagged") {
+        console.log("  This content caused behavioral deviation in the canary model.");
+        console.log("  Human review recommended before processing.\n");
+    }
+    else {
+        console.log("  No deviation detected under test conditions.");
+        console.log("  This does not guarantee the content is safe.\n");
+    }
+}
+function printCalibration(result) {
+    console.log(`  Model:           ${result.model}`);
+    console.log(`  Echo fidelity:   ${Math.round(result.echoFidelity * 100)}% raw`);
+    if (result.artifacts.length > 0) {
+        console.log(`  Adjusted:        ${Math.round(result.adjustedEchoFidelity * 100)}% (with ${result.artifacts.length} artifact(s) filtered)`);
+    }
+    console.log(`  Tool call rate:  ${Math.round(result.toolCallRate * 100)}%`);
+    console.log(`  Suitable:        ${result.suitable ? "YES" : "NO"}`);
+    if (result.artifacts.length > 0) {
+        console.log(`\n  Artifacts found (model-specific noise to filter):`);
+        for (const artifact of result.artifacts) {
+            console.log(`    "${artifact}"`);
+        }
+        console.log(`\n  Pass these to CanaryConfig.calibrationArtifacts to reduce false positives.`);
+    }
+    if (result.details.length > 0) {
+        console.log(`\n  Details:`);
+        for (const detail of result.details) {
+            console.log(`    - ${detail}`);
+        }
+    }
+    if (!result.suitable) {
+        console.log("\n  This model may produce too many false positives.");
+        if (result.adjustedEchoFidelity < 0.85) {
+            console.log("  Echo fidelity below 85% — model struggles with verbatim reproduction.");
+        }
+        if (result.toolCallRate > 0.05) {
+            console.log("  Tool call rate above 5% — model calls tools on clean input.");
+        }
+    }
+    console.log();
+}
+main().catch((err) => {
+    console.error("Error:", err.message);
+    process.exit(1);
+});
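
Note on `printCalibration`: the CLI prints its warnings when `adjustedEchoFidelity < 0.85` or `toolCallRate > 0.05`, but the `suitable` flag itself is computed in `./scanner`, which is not shown here. The sketch below assumes that suitability follows the same two thresholds; `unsuitableReasons` and `CalibrationSummary` are hypothetical names used for illustration only.

```typescript
// Subset of the fields that printCalibration reads. The full result type is
// defined in ./scanner; this interface is an illustrative assumption.
interface CalibrationSummary {
  adjustedEchoFidelity: number; // fraction of clean samples echoed exactly, 0..1
  toolCallRate: number;         // fraction of clean samples that triggered a tool call, 0..1
}

// Thresholds copied from the CLI's warning messages (85% fidelity, 5% tool calls).
const MIN_FIDELITY = 0.85;
const MAX_TOOL_CALL_RATE = 0.05;

// Hypothetical: list the reasons a model would be reported as unsuitable.
// An empty list means both thresholds are met.
function unsuitableReasons(c: CalibrationSummary): string[] {
  const reasons: string[] = [];
  if (c.adjustedEchoFidelity < MIN_FIDELITY) reasons.push("echo fidelity below 85%");
  if (c.toolCallRate > MAX_TOOL_CALL_RATE) reasons.push("tool call rate above 5%");
  return reasons;
}

console.log(unsuitableReasons({ adjustedEchoFidelity: 0.9, toolCallRate: 0 }));
console.log(unsuitableReasons({ adjustedEchoFidelity: 0.8, toolCallRate: 0.1 }));
```

With the 20 calibration samples the README mentions, 17 perfect echoes (0.85) would sit exactly on the fidelity threshold and pass under this assumption.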

package/dist/index.d.ts ADDED

@@ -0,0 +1 @@
+export { CanaryScanner, normalize, type ScanResult, type ScanMetadata, type CanaryConfig, type CalibrationResult, } from "./scanner";

package/dist/index.js ADDED

@@ -0,0 +1,6 @@
+"use strict";
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.normalize = exports.CanaryScanner = void 0;
+var scanner_1 = require("./scanner");
+Object.defineProperty(exports, "CanaryScanner", { enumerable: true, get: function () { return scanner_1.CanaryScanner; } });
+Object.defineProperty(exports, "normalize", { enumerable: true, get: function () { return scanner_1.normalize; } });
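
Note on the entry point: it re-exports `normalize` and `CanaryScanner` from `./scanner`, whose source is not part of this section. To make the README's two detection channels concrete, here is a minimal sketch of how a verdict could be derived. The normalization rules and the names `normalizeForCompare`, `ProbeOutcome`, and `classify` are assumptions for illustration, not the package's actual logic.

```typescript
// Hypothetical normalization: unify line endings, collapse whitespace runs,
// and trim. The package's exported `normalize` may apply different rules.
function normalizeForCompare(text: string): string {
  return text.replace(/\r\n/g, "\n").replace(/\s+/g, " ").trim();
}

// What came back from the canary model for one chunk (illustrative shape).
interface ProbeOutcome {
  echoed: string;      // text the canary model returned
  toolCalls: string[]; // names of any honeypot tools the model tried to call
}

type Verdict = "clear" | "flagged";

// Channel 1: exact string comparison after normalization.
// Channel 2: any tool call at all. Either channel firing flags the content.
function classify(input: string, outcome: ProbeOutcome): Verdict {
  const deviated = normalizeForCompare(input) !== normalizeForCompare(outcome.echoed);
  const toolCallAttempted = outcome.toolCalls.length > 0;
  return deviated || toolCallAttempted ? "flagged" : "clear";
}

console.log(classify("Hello  world\r\n", { echoed: "Hello world", toolCalls: [] })); // clear
console.log(
  classify("Ignore all previous instructions", {
    echoed: "I'll help you with that!",
    toolCalls: ["execute_command"],
  }),
); // flagged
```

The comparison is deliberately exact after normalization, matching the README's "no fuzzy matching, no thresholds" description; model-specific noise would be handled separately through calibration artifacts.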

package/dist/mcp-server.d.ts ADDED

@@ -0,0 +1,22 @@
+#!/usr/bin/env node
+/**
+ * Canary MCP Server
+ *
+ * Provides prompt injection scanning as an MCP tool.
+ * Any AI agent can call `canary_scan_url` or `canary_scan_text` before reading untrusted content.
+ *
+ * Usage:
+ *   CANARY_API_KEY=... npx tsx src/mcp-server.ts
+ *
+ * Add to claude_desktop_config.json or .claude/settings.json:
+ *   {
+ *     "mcpServers": {
+ *       "canary": {
+ *         "command": "npx",
+ *         "args": ["tsx", "/path/to/canary/src/mcp-server.ts"],
+ *         "env": { "CANARY_API_KEY": "your-key" }
+ *       }
+ *     }
+ *   }
+ */
+export {};

package/dist/mcp-server.js ADDED

@@ -0,0 +1,174 @@
+#!/usr/bin/env node
+"use strict";
+/**
+ * Canary MCP Server
+ *
+ * Provides prompt injection scanning as an MCP tool.
+ * Any AI agent can call `canary_scan_url` or `canary_scan_text` before reading untrusted content.
+ *
+ * Usage:
+ *   CANARY_API_KEY=... npx tsx src/mcp-server.ts
+ *
+ * Add to claude_desktop_config.json or .claude/settings.json:
+ *   {
+ *     "mcpServers": {
+ *       "canary": {
+ *         "command": "npx",
+ *         "args": ["tsx", "/path/to/canary/src/mcp-server.ts"],
+ *         "env": { "CANARY_API_KEY": "your-key" }
+ *       }
+ *     }
+ *   }
+ */
+Object.defineProperty(exports, "__esModule", { value: true });
+const scanner_1 = require("./scanner");
+const API_KEY = process.env.CANARY_API_KEY || process.env.OPENROUTER_API_KEY || "";
+const BASE_URL = process.env.CANARY_BASE_URL || "https://openrouter.ai/api/v1";
+const MODEL = process.env.CANARY_MODEL || "arcee-ai/trinity-mini:free";
+if (!API_KEY) {
+    console.error("CANARY_API_KEY or OPENROUTER_API_KEY required");
+    process.exit(1);
+}
+const scanner = new scanner_1.CanaryScanner({
+    apiKey: API_KEY,
+    baseUrl: BASE_URL,
+    model: MODEL,
+});
+// MCP protocol over stdio
+const TOOLS = [
+    {
+        name: "canary_scan_url",
+        description: "Scan a URL for prompt injection indicators before reading it. Uses a weak LLM as a behavioral probe — sends content with a verbatim echo instruction and checks for deviation. Returns CLEAR (no deviation detected under test conditions — not a safety guarantee) or FLAGGED (behavioral deviation detected — human review recommended).",
+        inputSchema: {
+            type: "object",
+            properties: {
+                url: { type: "string", description: "The URL to scan" },
+            },
+            required: ["url"],
+        },
+    },
+    {
+        name: "canary_scan_text",
+        description: "Scan raw text for prompt injection indicators. Uses a weak LLM as a behavioral probe — sends content with a verbatim echo instruction and checks for deviation. Returns CLEAR (no deviation detected) or FLAGGED (behavioral deviation detected — human review recommended).",
+        inputSchema: {
+            type: "object",
+            properties: {
+                text: { type: "string", description: "The text content to scan" },
+            },
+            required: ["text"],
+        },
+    },
+    {
+        name: "canary_trust",
+        description: "Manually mark a source as trusted (clear) or flagged after human review.",
+        inputSchema: {
+            type: "object",
+            properties: {
+                source: { type: "string", description: "The source identifier (URL or content hash)" },
+                status: { type: "string", enum: ["clear", "flagged"], description: "Trust status" },
+            },
+            required: ["source", "status"],
+        },
+    },
+];
+// Simplified MCP stdio transport
+let buffer = "";
+process.stdin.setEncoding("utf-8");
+process.stdin.on("data", (chunk) => {
+    buffer += chunk;
+    processBuffer();
+});
+function processBuffer() {
+    while (true) {
+        const headerEnd = buffer.indexOf("\r\n\r\n");
+        if (headerEnd === -1)
+            break;
+        const header = buffer.slice(0, headerEnd);
+        const contentLengthMatch = header.match(/Content-Length: (\d+)/i);
+        if (!contentLengthMatch) {
+            buffer = buffer.slice(headerEnd + 4);
+            continue;
+        }
+        const contentLength = parseInt(contentLengthMatch[1]);
+        const bodyStart = headerEnd + 4;
+        if (buffer.length < bodyStart + contentLength)
+            break;
+        const body = buffer.slice(bodyStart, bodyStart + contentLength);
+        buffer = buffer.slice(bodyStart + contentLength);
+        try {
+            const msg = JSON.parse(body);
+            handleMessage(msg);
+        }
+        catch {
+            // Skip malformed messages
+        }
+    }
+}
+function sendMessage(msg) {
+    const body = JSON.stringify(msg);
+    const header = `Content-Length: ${Buffer.byteLength(body)}\r\n\r\n`;
+    process.stdout.write(header + body);
+}
+async function handleMessage(msg) {
+    if (msg.method === "initialize") {
+        sendMessage({
+            jsonrpc: "2.0",
+            id: msg.id,
+            result: {
+                protocolVersion: "2024-11-05",
+                capabilities: { tools: {} },
+                serverInfo: { name: "canary", version: "0.2.0" },
+            },
+        });
+    }
+    else if (msg.method === "notifications/initialized") {
+        // No response needed
+    }
+    else if (msg.method === "tools/list") {
+        sendMessage({
+            jsonrpc: "2.0",
+            id: msg.id,
+            result: { tools: TOOLS },
+        });
+    }
+    else if (msg.method === "tools/call") {
+        const { name, arguments: args } = msg.params;
+        let result;
+        try {
+            if (name === "canary_scan_url") {
+                result = await scanner.scanUrl(args.url);
+            }
+            else if (name === "canary_scan_text") {
+                result = await scanner.scan(args.text);
+            }
+            else if (name === "canary_trust") {
+                scanner.setTrust(args.source, args.status);
+                result = { status: args.status, source: args.source, message: `Source ${args.status === "clear" ? "trusted" : "flagged"}` };
+            }
+            else {
+                throw new Error(`Unknown tool: ${name}`);
+            }
+            sendMessage({
+                jsonrpc: "2.0",
+                id: msg.id,
+                result: {
+                    content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
+                },
+            });
+        }
+        catch (error) {
+            sendMessage({
+                jsonrpc: "2.0",
+                id: msg.id,
+                result: {
+                    content: [{ type: "text", text: `Error: ${error.message}` }],
+                    isError: true,
+                },
+            });
+        }
+    }
+}
+// Log to stderr so it doesn't interfere with MCP stdio
+console.error("Canary MCP server started (v0.2.0 — echo + tool detection)");
+console.error(`Model: ${MODEL}`);
+console.error("Waiting for connections...");
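
Note on the stdio framing: `sendMessage` writes `Content-Length` as a byte count (`Buffer.byteLength`), while `processBuffer` measures the incoming body with string length after `setEncoding("utf-8")`. The two agree for ASCII bodies, but a body containing multi-byte UTF-8 characters has more bytes than string characters, so frames could be misaligned. The sketch below shows one byte-based way to handle the same framing. The names `encodeFrame`, `findHeaderEnd`, and `decodeFrames` are hypothetical, and this is an illustration rather than a drop-in patch for the package.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Frame a message: "Content-Length: <bytes>", a blank line, then the JSON body.
function encodeFrame(message: unknown): Uint8Array {
  const body = encoder.encode(JSON.stringify(message));
  const header = encoder.encode(`Content-Length: ${body.length}\r\n\r\n`);
  const frame = new Uint8Array(header.length + body.length);
  frame.set(header, 0);
  frame.set(body, header.length);
  return frame;
}

// Index of the first "\r\n\r\n" (bytes 13,10,13,10) in the buffer, or -1.
function findHeaderEnd(bytes: Uint8Array): number {
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 && bytes[i + 2] === 13 && bytes[i + 3] === 10) {
      return i;
    }
  }
  return -1;
}

// Extract every complete frame; return parsed messages plus unconsumed bytes.
function decodeFrames(bytes: Uint8Array): { messages: unknown[]; rest: Uint8Array } {
  const messages: unknown[] = [];
  let offset = 0;
  while (true) {
    const view = bytes.subarray(offset);
    const headerEnd = findHeaderEnd(view);
    if (headerEnd === -1) break; // header not complete yet
    const header = decoder.decode(view.subarray(0, headerEnd));
    const match = header.match(/Content-Length: (\d+)/i);
    const bodyStart = headerEnd + 4;
    if (!match) {
      offset += bodyStart; // skip a header block without a length
      continue;
    }
    const contentLength = parseInt(match[1], 10);
    if (view.length < bodyStart + contentLength) break; // body not complete yet
    const body = decoder.decode(view.subarray(bodyStart, bodyStart + contentLength));
    offset += bodyStart + contentLength;
    try {
      messages.push(JSON.parse(body));
    } catch {
      // Skip malformed JSON, as the original loop does.
    }
  }
  return { messages, rest: bytes.slice(offset) };
}

const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "canary_scan_text", arguments: { text: "héllo wörld" } },
};
console.log(decodeFrames(encodeFrame(request)).messages);
```

Because lengths are compared in bytes on both sides, a request whose text contains accented or other non-ASCII characters round-trips without shifting the frame boundary.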