cloakllm 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +204 -0
- package/package.json +56 -0
- package/src/audit.js +227 -0
- package/src/cli.js +126 -0
- package/src/config.js +48 -0
- package/src/detector.js +166 -0
- package/src/index.d.ts +160 -0
- package/src/index.js +37 -0
- package/src/llm-detector.js +173 -0
- package/src/middleware.js +237 -0
- package/src/shield.js +126 -0
- package/src/tokenizer.js +133 -0
- package/src/vercel-middleware.js +245 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Ziv (Zivuch) Chen
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,204 @@
|
|
|
1
|
+
# CloakLLM
|
|
2
|
+
|
|
3
|
+
**PII cloaking and tamper-evident audit logs for LLM API calls.**
|
|
4
|
+
|
|
5
|
+
CloakLLM intercepts your LLM API calls, detects and cloaks PII before it reaches the provider, and logs every event to a tamper-evident audit chain designed for EU AI Act Article 12 compliance.
|
|
6
|
+
|
|
7
|
+
> **Also available for Python:** `pip install cloakllm` — includes spaCy NER for name/org/location detection. See [CloakLLM Python](https://github.com/cloakllm/CloakLLM-PY).
|
|
8
|
+
|
|
9
|
+
## Install
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
npm install cloakllm
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
## Quick Start
|
|
16
|
+
|
|
17
|
+
### With OpenAI SDK (one line)
|
|
18
|
+
|
|
19
|
+
```javascript
|
|
20
|
+
const cloakllm = require('cloakllm');
|
|
21
|
+
const OpenAI = require('openai');
|
|
22
|
+
|
|
23
|
+
const client = new OpenAI();
|
|
24
|
+
cloakllm.enable(client); // That's it. All calls are now cloaked.
|
|
25
|
+
|
|
26
|
+
const response = await client.chat.completions.create({
|
|
27
|
+
model: 'gpt-4o-mini',
|
|
28
|
+
messages: [{
|
|
29
|
+
role: 'user',
|
|
30
|
+
content: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
|
|
31
|
+
}]
|
|
32
|
+
});
|
|
33
|
+
// Provider never saw "sarah.j@techcorp.io"
|
|
34
|
+
// Response has the real email restored automatically
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
### With Vercel AI SDK
|
|
38
|
+
|
|
39
|
+
```javascript
|
|
40
|
+
const { createCloakLLMMiddleware } = require('cloakllm');
|
|
41
|
+
const { generateText, wrapLanguageModel } = require('ai');
|
|
42
|
+
const { openai } = require('@ai-sdk/openai');
|
|
43
|
+
|
|
44
|
+
const middleware = createCloakLLMMiddleware();
|
|
45
|
+
const model = wrapLanguageModel({ model: openai('gpt-4o'), middleware });
|
|
46
|
+
|
|
47
|
+
const { text } = await generateText({
|
|
48
|
+
model,
|
|
49
|
+
prompt: 'Write a reminder for sarah.j@techcorp.io about the Q3 audit'
|
|
50
|
+
});
|
|
51
|
+
// Provider never saw "sarah.j@techcorp.io"
|
|
52
|
+
// Response has the real email restored automatically
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
Works with any AI SDK provider (OpenAI, Anthropic, Google, Mistral, etc.) and supports both `generateText` and `streamText`.
|
|
56
|
+
|
|
57
|
+
### Standalone
|
|
58
|
+
|
|
59
|
+
```javascript
|
|
60
|
+
const { Shield } = require('cloakllm');
|
|
61
|
+
|
|
62
|
+
const shield = new Shield();
|
|
63
|
+
const [sanitized, tokenMap] = shield.sanitize(
|
|
64
|
+
'Send report to john@acme.com, SSN 123-45-6789'
|
|
65
|
+
);
|
|
66
|
+
// sanitized: "Send report to [EMAIL_0], SSN [SSN_0]"
|
|
67
|
+
|
|
68
|
+
// ... send sanitized text to any LLM ...
|
|
69
|
+
|
|
70
|
+
const restored = shield.desanitize(llmResponse, tokenMap);
|
|
71
|
+
// Original values restored
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
## What It Detects
|
|
75
|
+
|
|
76
|
+
| Category | Examples | Method |
|
|
77
|
+
|----------|----------|--------|
|
|
78
|
+
| `EMAIL` | `john@acme.com` | Regex |
|
|
79
|
+
| `SSN` | `123-45-6789` | Regex |
|
|
80
|
+
| `CREDIT_CARD` | `4111111111111111` | Regex |
|
|
81
|
+
| `PHONE` | `+1-555-0142` | Regex |
|
|
82
|
+
| `IP_ADDRESS` | `192.168.1.1` | Regex |
|
|
83
|
+
| `API_KEY` | `sk_live_abc123...` | Regex |
|
|
84
|
+
| `AWS_KEY` | `AKIAIOSFODNN7EXAMPLE` | Regex |
|
|
85
|
+
| `JWT` | `eyJhbG...` | Regex |
|
|
86
|
+
| `IBAN` | `DE89370400440532013000` | Regex |
|
|
87
|
+
| `PERSON` | John Smith | LLM (Local) |
|
|
88
|
+
| `ORG` | Acme Corp, Google | LLM (Local) |
|
|
89
|
+
| `GPE` | New York, Israel | LLM (Local) |
|
|
90
|
+
| `ADDRESS` | 742 Evergreen Terrace | LLM (Local) |
|
|
91
|
+
| `DATE_OF_BIRTH` | 1990-01-15 | LLM (Local) |
|
|
92
|
+
| `MEDICAL` | diabetes mellitus | LLM (Local) |
|
|
93
|
+
| `FINANCIAL` | account 4521-XXX | LLM (Local) |
|
|
94
|
+
| `NATIONAL_ID` | TZ 12345678 | LLM (Local) |
|
|
95
|
+
| `BIOMETRIC` | fingerprint hash | LLM (Local) |
|
|
96
|
+
| `USERNAME` | @johndoe42 | LLM (Local) |
|
|
97
|
+
| `PASSWORD` | P@ssw0rd123 | LLM (Local) |
|
|
98
|
+
| `VEHICLE` | plate ABC-1234 | LLM (Local) |
|
|
99
|
+
|
|
100
|
+
> **LLM categories** require opt-in (`llmDetection: true`) and a local [Ollama](https://ollama.com) instance. Data never leaves your machine.
|
|
101
|
+
|
|
102
|
+
## How It Works
|
|
103
|
+
|
|
104
|
+
```
|
|
105
|
+
Your app: "Email sarah.j@techcorp.io about Project Falcon"
|
|
106
|
+
Provider sees: "Email [EMAIL_0] about Project Falcon"
|
|
107
|
+
You receive: Original email restored in the response
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
1. **Detect** — Regex patterns find structured PII (emails, SSNs, credit cards, etc.)
|
|
111
|
+
2. **Cloak** — Replace with deterministic tokens: `[EMAIL_0]`, `[SSN_0]`
|
|
112
|
+
3. **Log** — Write to hash-chained audit trail (each entry includes previous entry's SHA-256 hash)
|
|
113
|
+
4. **Uncloak** — Restore original values in the LLM response
|
|
114
|
+
|
|
115
|
+
## Tamper-Evident Audit Chain
|
|
116
|
+
|
|
117
|
+
Every event is logged to JSONL files with hash chaining:
|
|
118
|
+
|
|
119
|
+
```json
|
|
120
|
+
{
|
|
121
|
+
"seq": 42,
|
|
122
|
+
"event_type": "sanitize",
|
|
123
|
+
"entity_count": 3,
|
|
124
|
+
"categories": {"EMAIL": 1, "SSN": 1, "PHONE": 1},
|
|
125
|
+
"prompt_hash": "sha256:9f86d0...",
|
|
126
|
+
"prev_hash": "sha256:7c4d2e...",
|
|
127
|
+
"entry_hash": "sha256:b5e8f3..."
|
|
128
|
+
}
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
Modify any entry and every subsequent hash breaks. Verify with:
|
|
132
|
+
|
|
133
|
+
```bash
|
|
134
|
+
npx cloakllm verify ./cloakllm_audit/
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
## CLI
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
# Scan text for PII
|
|
141
|
+
npx cloakllm scan "Email john@test.com, SSN 123-45-6789"
|
|
142
|
+
|
|
143
|
+
# Verify audit chain integrity
|
|
144
|
+
npx cloakllm verify ./cloakllm_audit/
|
|
145
|
+
|
|
146
|
+
# Show audit statistics
|
|
147
|
+
npx cloakllm stats ./cloakllm_audit/
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## Configuration
|
|
151
|
+
|
|
152
|
+
```javascript
|
|
153
|
+
const { Shield, ShieldConfig } = require('cloakllm');
|
|
154
|
+
|
|
155
|
+
const shield = new Shield(new ShieldConfig({
|
|
156
|
+
detectEmails: true, // default: true
|
|
157
|
+
detectPhones: true,
|
|
158
|
+
detectSsns: true,
|
|
159
|
+
detectCreditCards: true,
|
|
160
|
+
detectApiKeys: true,
|
|
161
|
+
detectIpAddresses: true,
|
|
162
|
+
detectIban: true,
|
|
163
|
+
logDir: './my-audit-logs', // default: ./cloakllm_audit
|
|
164
|
+
auditEnabled: true, // default: true
|
|
165
|
+
skipModels: ['ollama/'], // skip local models
|
|
166
|
+
customPatterns: [
|
|
167
|
+
{ name: 'EMPLOYEE_ID', pattern: 'EMP-\\d{6}' }
|
|
168
|
+
],
|
|
169
|
+
|
|
170
|
+
// LLM Detection (opt-in, requires Ollama)
|
|
171
|
+
llmDetection: true, // Enable LLM-based detection
|
|
172
|
+
llmModel: 'llama3.2', // Ollama model
|
|
173
|
+
llmOllamaUrl: 'http://localhost:11434', // Ollama endpoint
|
|
174
|
+
llmTimeout: 10000, // Timeout in ms
|
|
175
|
+
llmConfidence: 0.85, // Confidence score
|
|
176
|
+
}));
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
## EU AI Act Compliance
|
|
180
|
+
|
|
181
|
+
Article 12 of the EU AI Act requires tamper-evident audit logs for AI systems. Enforcement begins **August 2, 2026**. CloakLLM provides:
|
|
182
|
+
|
|
183
|
+
- **Hash-chained logs** — cryptographically linked, any modification breaks the chain
|
|
184
|
+
- **O(n) verification** — `cloakllm verify` audits the entire chain
|
|
185
|
+
- **No PII in logs** — only hashes and token counts are logged (original values never stored)
|
|
186
|
+
- **Event-level detail** — every sanitize/desanitize event is recorded
|
|
187
|
+
|
|
188
|
+
## Roadmap
|
|
189
|
+
|
|
190
|
+
- [x] NER-based detection (names, orgs, locations) via local LLM
|
|
191
|
+
- [x] Local LLM detection (opt-in, via Ollama)
|
|
192
|
+
- [ ] Streaming response support
|
|
193
|
+
- [x] Vercel AI SDK middleware
|
|
194
|
+
- [ ] LangChain.js integration
|
|
195
|
+
- [ ] OpenTelemetry span emission
|
|
196
|
+
- [ ] RFC 3161 trusted timestamping
|
|
197
|
+
|
|
198
|
+
## License
|
|
199
|
+
|
|
200
|
+
MIT — See [LICENSE](LICENSE).
|
|
201
|
+
|
|
202
|
+
## See Also
|
|
203
|
+
|
|
204
|
+
- **[CloakLLM Python](https://github.com/cloakllm/CloakLLM-PY)** — Python version with spaCy NER + LiteLLM integration
|
package/package.json
ADDED
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "cloakllm",
|
|
3
|
+
"version": "0.1.1",
|
|
4
|
+
"description": "PII cloaking and tamper-evident audit logs for LLM API calls. EU AI Act Article 12 compliance.",
|
|
5
|
+
"main": "src/index.js",
|
|
6
|
+
"types": "src/index.d.ts",
|
|
7
|
+
"bin": {
|
|
8
|
+
"cloakllm": "./src/cli.js"
|
|
9
|
+
},
|
|
10
|
+
"scripts": {
|
|
11
|
+
"test": "node --test test/*.js"
|
|
12
|
+
},
|
|
13
|
+
"keywords": [
|
|
14
|
+
"llm",
|
|
15
|
+
"privacy",
|
|
16
|
+
"pii",
|
|
17
|
+
"compliance",
|
|
18
|
+
"eu-ai-act",
|
|
19
|
+
"openai",
|
|
20
|
+
"vercel-ai-sdk",
|
|
21
|
+
"audit",
|
|
22
|
+
"security",
|
|
23
|
+
"gdpr",
|
|
24
|
+
"data-protection"
|
|
25
|
+
],
|
|
26
|
+
"author": "Ziv (Zivuch) Chen",
|
|
27
|
+
"license": "MIT",
|
|
28
|
+
"homepage": "https://github.com/cloakllm/CloakLLM-JS",
|
|
29
|
+
"repository": {
|
|
30
|
+
"type": "git",
|
|
31
|
+
"url": "https://github.com/cloakllm/CloakLLM-JS"
|
|
32
|
+
},
|
|
33
|
+
"bugs": {
|
|
34
|
+
"url": "https://github.com/cloakllm/CloakLLM-JS/issues"
|
|
35
|
+
},
|
|
36
|
+
"engines": {
|
|
37
|
+
"node": ">=18.0.0"
|
|
38
|
+
},
|
|
39
|
+
"files": [
|
|
40
|
+
"src/",
|
|
41
|
+
"LICENSE",
|
|
42
|
+
"README.md"
|
|
43
|
+
],
|
|
44
|
+
"peerDependencies": {
|
|
45
|
+
"openai": ">=4.0.0",
|
|
46
|
+
"ai": ">=4.0.0"
|
|
47
|
+
},
|
|
48
|
+
"peerDependenciesMeta": {
|
|
49
|
+
"openai": {
|
|
50
|
+
"optional": true
|
|
51
|
+
},
|
|
52
|
+
"ai": {
|
|
53
|
+
"optional": true
|
|
54
|
+
}
|
|
55
|
+
}
|
|
56
|
+
}
|
package/src/audit.js
ADDED
|
@@ -0,0 +1,227 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Tamper-Evident Audit Logger.
|
|
3
|
+
*
|
|
4
|
+
* Hash-chained append-only JSONL logs for EU AI Act Article 12 compliance.
|
|
5
|
+
* Each entry's SHA-256 hash includes the previous entry's hash.
|
|
6
|
+
* Any modification breaks the chain from that point forward.
|
|
7
|
+
*/
|
|
8
|
+
|
|
9
|
+
const crypto = require('crypto');
|
|
10
|
+
const fs = require('fs');
|
|
11
|
+
const path = require('path');
|
|
12
|
+
|
|
13
|
+
// Sentinel hash that anchors the very first entry of every chain.
const GENESIS_HASH = '0'.repeat(64);

class AuditLogger {
  /**
   * @param {import('./config').ShieldConfig} config - Shield configuration;
   *   only `auditEnabled` and `logDir` are read here.
   */
  constructor(config) {
    this.config = config;
    this._seq = 0;
    this._prevHash = GENESIS_HASH;
    this._logDir = config.logDir;
    this._initialized = false;
  }

  /**
   * Lazily create the log directory and resume chain state from the last
   * entry of the newest log file, so process restarts extend one chain.
   */
  _ensureInit() {
    if (this._initialized) return;

    fs.mkdirSync(this._logDir, { recursive: true });

    const files = this._getLogFiles();
    if (files.length > 0) {
      const newest = files.at(-1);
      try {
        const entries = fs
          .readFileSync(newest, 'utf-8')
          .split('\n')
          .filter((line) => line.trim());
        if (entries.length > 0) {
          const tail = JSON.parse(entries.at(-1));
          this._seq = tail.seq + 1;
          this._prevHash = tail.entry_hash;
        }
      } catch {
        // Corrupted tail file: deliberately start fresh from genesis state.
      }
    }

    this._initialized = true;
  }

  /** All `audit_*.jsonl` files in the log dir, sorted by name (= by date). */
  _getLogFiles() {
    if (!fs.existsSync(this._logDir)) return [];
    const names = fs
      .readdirSync(this._logDir)
      .filter((name) => name.startsWith('audit_') && name.endsWith('.jsonl'))
      .sort();
    return names.map((name) => path.join(this._logDir, name));
  }

  /** Path of today's log file (one file per UTC day). */
  _getLogFile() {
    const day = new Date().toISOString().split('T')[0];
    return path.join(this._logDir, `audit_${day}.jsonl`);
  }

  /**
   * Compute the SHA-256 hash of entry data over a canonical serialization
   * (object keys sorted recursively at every level), so the digest does
   * not depend on property insertion order.
   * @param {Object} data
   * @returns {string} lowercase hex digest
   */
  static computeHash(data) {
    const canonical = JSON.stringify(data, (_key, value) => {
      if (value === null || typeof value !== 'object' || Array.isArray(value)) {
        return value;
      }
      const ordered = {};
      for (const key of Object.keys(value).sort()) ordered[key] = value[key];
      return ordered;
    });
    return crypto.createHash('sha256').update(canonical).digest('hex');
  }

  /**
   * Append one hash-chained entry to today's JSONL log.
   * Only hashes and counts are persisted — never the raw prompt text.
   * @param {Object} options - Event details (eventType, texts, counts, ...).
   * @returns {Object|null} the written entry, or null when auditing is off
   */
  log({
    eventType,
    originalText = '',
    sanitizedText = '',
    model = null,
    provider = null,
    entityCount = 0,
    categories = {},
    tokensUsed = [],
    latencyMs = 0,
    metadata = {},
  }) {
    if (!this.config.auditEnabled) return null;

    this._ensureInit();

    // Empty text hashes to '' rather than the digest of the empty string.
    const sha256 = (text) =>
      text ? crypto.createHash('sha256').update(text).digest('hex') : '';

    const entry = {
      seq: this._seq,
      event_id: crypto.randomUUID(),
      timestamp: new Date().toISOString(),
      event_type: eventType,
      model,
      provider,
      entity_count: entityCount,
      categories,
      tokens_used: tokensUsed,
      prompt_hash: sha256(originalText),
      sanitized_hash: sha256(sanitizedText),
      latency_ms: Math.round(latencyMs * 100) / 100,
      prev_hash: this._prevHash,
      metadata,
    };

    // Hash is computed before entry_hash exists on the object, so
    // verification can recompute it after deleting entry_hash.
    entry.entry_hash = AuditLogger.computeHash(entry);

    fs.appendFileSync(this._getLogFile(), `${JSON.stringify(entry)}\n`);

    // Advance chain state for the next entry.
    this._prevHash = entry.entry_hash;
    this._seq += 1;

    return entry;
  }

  /**
   * Walk every entry across all log files, checking that (a) each
   * prev_hash links to the previous entry and (b) each entry_hash
   * recomputes from the entry body.
   * @param {string} [logFilePath] - Verify a single file; default: all files.
   * @returns {{ valid: boolean, errors: string[] }}
   */
  verifyChain(logFilePath = null) {
    const errors = [];
    const files = logFilePath ? [logFilePath] : this._getLogFiles();

    if (files.length === 0) return { valid: true, errors: [] };

    let expectedPrev = GENESIS_HASH;

    for (const fpath of files) {
      const fname = path.basename(fpath);
      const lines = fs
        .readFileSync(fpath, 'utf-8')
        .split('\n')
        .filter((l) => l.trim());

      for (const [idx, raw] of lines.entries()) {
        let entry;
        try {
          entry = JSON.parse(raw);
        } catch {
          errors.push(`${fname}:${idx + 1} — Invalid JSON`);
          continue;
        }

        // Link check: this entry must point at the previous entry's hash.
        if (entry.prev_hash !== expectedPrev) {
          errors.push(
            `${fname}:${idx + 1} seq=${entry.seq} — ` +
              `Chain broken: expected prev_hash=${expectedPrev.slice(0, 16)}..., ` +
              `got ${(entry.prev_hash || 'MISSING').slice(0, 16)}...`
          );
        }

        // Tamper check: the stored hash must recompute from the body.
        const storedHash = entry.entry_hash;
        delete entry.entry_hash;
        const recomputed = AuditLogger.computeHash(entry);
        if (storedHash !== recomputed) {
          errors.push(
            `${fname}:${idx + 1} seq=${entry.seq} — ` +
              `Entry tampered: stored_hash=${storedHash.slice(0, 16)}..., ` +
              `recomputed=${recomputed.slice(0, 16)}...`
          );
        }

        expectedPrev = storedHash;
      }
    }

    return { valid: errors.length === 0, errors };
  }

  /**
   * Aggregate counters across every log file.
   * @returns {Object} totals, per-category counts, models seen, file names
   */
  getStats() {
    this._ensureInit();

    const models = new Set();
    const stats = {
      total_events: 0,
      total_entities_detected: 0,
      categories: {},
      models_used: models,
      log_files: [],
    };

    for (const fpath of this._getLogFiles()) {
      stats.log_files.push(path.basename(fpath));
      for (const line of fs.readFileSync(fpath, 'utf-8').split('\n')) {
        if (!line.trim()) continue;
        let entry;
        try {
          entry = JSON.parse(line);
        } catch {
          continue; // skip corrupt lines
        }
        stats.total_events += 1;
        stats.total_entities_detected += entry.entity_count || 0;
        for (const [category, count] of Object.entries(entry.categories || {})) {
          stats.categories[category] = (stats.categories[category] || 0) + count;
        }
        if (entry.model) models.add(entry.model);
      }
    }

    stats.models_used = [...models];
    return stats;
  }
}
|
|
226
|
+
|
|
227
|
+
// Public API of this module: the logger class and the genesis sentinel hash.
module.exports = { AuditLogger, GENESIS_HASH };
|
package/src/cli.js
ADDED
|
@@ -0,0 +1,126 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
|
|
3
|
+
/**
|
|
4
|
+
* CloakLLM CLI.
|
|
5
|
+
*
|
|
6
|
+
* Usage:
|
|
7
|
+
* cloakllm scan "Send email to john@acme.com, SSN 123-45-6789"
|
|
8
|
+
* cloakllm verify ./cloakllm_audit/
|
|
9
|
+
* cloakllm stats ./cloakllm_audit/
|
|
10
|
+
*/
|
|
11
|
+
|
|
12
|
+
const { Shield } = require('./shield');
|
|
13
|
+
const { ShieldConfig } = require('./config');
|
|
14
|
+
const { AuditLogger } = require('./audit');
|
|
15
|
+
|
|
16
|
+
// CLI arguments after the node binary and script path; the first token
// selects the subcommand handled by the dispatch at the bottom of the file.
const args = process.argv.slice(2);
const command = args[0];
|
|
18
|
+
|
|
19
|
+
/**
 * `cloakllm scan <text>` — detect sensitive entities in the given text,
 * print each finding, then show the sanitized form and the token map.
 * Audit logging is disabled for ad-hoc scans.
 */
function cmdScan() {
  const text = args.slice(1).join(' ');
  if (!text) {
    console.error('Usage: cloakllm scan "text to scan"');
    process.exit(1);
  }

  const shield = new Shield(new ShieldConfig({ auditEnabled: false }));
  const analysis = shield.analyze(text);

  if (analysis.entity_count === 0) {
    console.log('✅ No sensitive entities detected.');
    return;
  }

  console.log(`⚠️ Found ${analysis.entity_count} sensitive entities:\n`);

  analysis.entities.forEach((ent) => {
    console.log(` [${ent.category}] "${ent.text}"`);
    console.log(` Position: ${ent.start}-${ent.end} | Confidence: ${Math.round(ent.confidence * 100)}% | Source: ${ent.source}`);
  });

  const [sanitized, tokenMap] = shield.sanitize(text);
  const rule = '─'.repeat(60);
  console.log(`\n${rule}`);
  console.log(`ORIGINAL: ${text}`);
  console.log(`SANITIZED: ${sanitized}`);
  console.log(rule);
  console.log(`\nToken map (${tokenMap.entityCount} entities):`);
  for (const [token, original] of tokenMap.reverse) {
    console.log(` ${token} → "${original}"`);
  }
}
|
|
53
|
+
|
|
54
|
+
/**
 * Warn when `dirPath` resolves outside the current working directory.
 *
 * Uses `path.relative` rather than the previous string-prefix check
 * (`resolved.startsWith(cwd + path.sep)`), which misfires when cwd is the
 * filesystem root (the prefix becomes '//', so every subdirectory warned)
 * and does not handle Windows paths on a different drive.
 *
 * @param {string} dirPath - Directory supplied on the command line.
 */
function warnIfOutsideCwd(dirPath) {
  const path = require('path');
  const resolved = path.resolve(dirPath);
  const relative = path.relative(process.cwd(), resolved);
  // '' means the cwd itself; a leading '..' segment or an absolute result
  // (different drive on Windows) means the target escapes the cwd.
  if (relative.startsWith('..') || path.isAbsolute(relative)) {
    console.warn(`Warning: Log directory '${resolved}' is outside the current working directory.`);
  }
}
|
|
62
|
+
|
|
63
|
+
/**
 * `cloakllm verify <log_dir>` — replay the hash chain in every audit log
 * file and report tampering. Exits non-zero on any integrity failure or
 * when the directory is missing.
 */
function cmdVerify() {
  const logDir = args[1];
  if (!logDir) {
    console.error('Usage: cloakllm verify <log_dir>');
    process.exit(1);
  }

  warnIfOutsideCwd(logDir);

  const fs = require('fs');
  if (!fs.existsSync(logDir)) {
    console.error(`❌ Log directory not found: ${logDir}`);
    process.exit(1);
  }

  const logger = new AuditLogger(new ShieldConfig({ logDir }));

  console.log(`Verifying audit chain in ${logDir}...`);
  const { valid, errors } = logger.verifyChain();

  if (!valid) {
    console.error(`❌ CHAIN INTEGRITY FAILURE — ${errors.length} error(s):\n`);
    errors.forEach((err) => console.error(` • ${err}`));
    process.exit(1);
  }

  console.log('✅ Audit chain integrity verified — no tampering detected.');
}
|
|
94
|
+
|
|
95
|
+
/**
 * `cloakllm stats <log_dir>` — print aggregate audit statistics as
 * pretty-printed JSON.
 */
function cmdStats() {
  const logDir = args[1];
  if (!logDir) {
    console.error('Usage: cloakllm stats <log_dir>');
    process.exit(1);
  }

  warnIfOutsideCwd(logDir);

  const logger = new AuditLogger(new ShieldConfig({ logDir }));
  console.log(JSON.stringify(logger.getStats(), null, 2));
}
|
|
108
|
+
|
|
109
|
+
// Subcommand dispatch. Unknown (or missing) commands fall through to the
// usage text. Object.hasOwn guards against prototype keys like 'toString'
// being treated as commands.
const handlers = {
  scan: cmdScan,
  verify: cmdVerify,
  stats: cmdStats,
};

if (Object.hasOwn(handlers, command)) {
  handlers[command]();
} else {
  console.log('CloakLLM — AI Compliance Middleware CLI\n');
  console.log('Commands:');
  console.log(' scan <text> Scan text for sensitive data');
  console.log(' verify <dir> Verify audit log integrity');
  console.log(' stats <dir> Show audit statistics');
}
|
package/src/config.js
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* CloakLLM Configuration.
|
|
3
|
+
*
|
|
4
|
+
* All settings have sensible defaults. Override via constructor:
|
|
5
|
+
* const config = new ShieldConfig({ logDir: './my-audit-logs' });
|
|
6
|
+
* const shield = new Shield(config);
|
|
7
|
+
*
|
|
8
|
+
* Or via environment variables:
|
|
9
|
+
* CLOAKLLM_LOG_DIR=./my-audit-logs
|
|
10
|
+
*/
|
|
11
|
+
|
|
12
|
+
class ShieldConfig {
  /**
   * Build a configuration object. Every field has a sensible default;
   * overrides come from the options object and, for some fields, from
   * CLOAKLLM_* environment variables (explicit options always win).
   * Defaults use `??`, so falsy-but-valid overrides (0, '', false) stick.
   * @param {Object} [opts]
   */
  constructor(opts = {}) {
    // --- Detection (regex pass) ---
    this.detectEmails = opts.detectEmails ?? true;
    this.detectPhones = opts.detectPhones ?? true;
    this.detectSsns = opts.detectSsns ?? true;
    this.detectCreditCards = opts.detectCreditCards ?? true;
    this.detectApiKeys = opts.detectApiKeys ?? true;
    this.detectIpAddresses = opts.detectIpAddresses ?? true;
    this.detectIban = opts.detectIban ?? true;
    /** @type {Array<{name: string, pattern: string}>} extra regex detectors */
    this.customPatterns = opts.customPatterns ?? [];

    // --- LLM detection (pass 2: local LLM via Ollama; opt-in) ---
    const envLlmFlag =
      (process.env.CLOAKLLM_LLM_DETECTION ?? 'false').toLowerCase() === 'true';
    this.llmDetection = opts.llmDetection ?? envLlmFlag;
    this.llmModel = opts.llmModel ?? process.env.CLOAKLLM_LLM_MODEL ?? 'llama3.2';
    this.llmOllamaUrl = opts.llmOllamaUrl ?? process.env.CLOAKLLM_OLLAMA_URL ?? 'http://localhost:11434';
    this.llmTimeout = opts.llmTimeout ?? 10000;
    this.llmConfidence = opts.llmConfidence ?? 0.85;

    // --- Tokenization ---
    this.descriptiveTokens = opts.descriptiveTokens ?? true;

    // --- Audit logging ---
    this.auditEnabled = opts.auditEnabled ?? true;
    this.logDir = opts.logDir ?? process.env.CLOAKLLM_LOG_DIR ?? './cloakllm_audit';
    this.logOriginalValues = opts.logOriginalValues ?? false;

    // --- Middleware ---
    this.autoMode = opts.autoMode ?? true;
    /** @type {string[]} Model prefixes to skip sanitization */
    this.skipModels = opts.skipModels ?? [];
  }
}
|
|
47
|
+
|
|
48
|
+
// Public API of this module.
module.exports = { ShieldConfig };
|