npm - kafkacode - Versions diffs - 1.4.1 → 1.5.0 - Mend

kafkacode 1.4.1 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/CHANGELOG.md +15 -0
package/README.md +31 -9
package/dist/ConfigLoader.js +113 -0
package/dist/FileScanner.js +29 -11
package/dist/LLMAnalyzer.js +4 -1
package/dist/PatternScanner.js +164 -23
package/dist/ReportGenerator.js +121 -21
package/dist/cli.js +75 -9
package/dist/index.js +4 -2
package/package.json +17 -3

package/CHANGELOG.md CHANGED

@@ -2,6 +2,21 @@
 All notable changes to this project are documented in this file.
+## [1.5.0] - 2026-06-06
+### Added
+- Config file support via `kafkacode.config.json`, `.kafkacoderc`, and `.kafkacoderc.json`.
+- `.kafkacodeignore`, repeatable `--exclude`, `--baseline`, and `--update-baseline`.
+- Severity controls with `--min-severity` and `--fail-on`.
+- Compact CI output via `--plain`.
+- Redacted findings by default, with `--show-secrets` for explicit local opt-in.
+- Stable rule IDs, confidence metadata, and SARIF fingerprints.
+- More file coverage: `.env`, JSON, YAML, TOML, INI, properties, XML, Terraform, Dockerfiles, Markdown, and shell scripts.
+- Expanded secret rules for GitHub, OpenAI, Anthropic, Google, Slack, SendGrid, npm, JWTs, and database URLs.
+### Changed
+- GitHub Action inputs now expose output format, failure thresholds, excludes, redaction, package version, and compact output.
 ## [1.4.0] - 2026-06-05
 ### Added
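
The "Redacted findings by default" entry above corresponds to the `_maskValue` helper added to `package/dist/ReportGenerator.js` in this release. A minimal standalone sketch of that masking rule follows. The free function `maskValue` and its `showSecrets` parameter are illustrative stand-ins for the class method and the `--show-secrets` option, not part of the package's API.

```javascript
// Standalone copy of the masking rule from ReportGenerator.js (_maskValue).
// `showSecrets` stands in for the --show-secrets opt-in; the function name is illustrative.
function maskValue(value, showSecrets = false) {
    if (!value) return value;                    // nothing to mask
    if (showSecrets) return value;               // explicit local opt-in
    if (value.length <= 6) return '[REDACTED]';  // too short to reveal any part
    // Keep only the first and last three characters.
    return `${value.slice(0, 3)}...[REDACTED]...${value.slice(-3)}`;
}

// AWS's documented example key ID, not a real credential.
const sample = 'AKIAIOSFODNN7EXAMPLE';
console.log(maskValue(sample));        // AKI...[REDACTED]...PLE
console.log(maskValue('abc123'));      // [REDACTED]
console.log(maskValue(sample, true));  // AKIAIOSFODNN7EXAMPLE
```

Six characters of a longer value still reach the output, so the redacted form identifies which credential leaked without being usable on its own.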

package/README.md CHANGED

@@ -2,12 +2,11 @@
 <img src="docs/logo4.png" width="104" alt="KafkaCode logo" />
-# KafkaCode
+# KafkaCode - Open-Source Privacy Code Scanner
-**Catch PII leaks, hardcoded secrets, and compliance risks before they ship.**
+**Local-first PII scanner and secret detection CLI for source code, CI/CD, GDPR, CCPA, and SARIF workflows.**
-An AI-powered privacy &amp; compliance scanner for your source code. One command,
-a clear **A+ → F privacy grade**, and CI-ready exit codes. Runs in seconds.
+KafkaCode catches PII leaks, hardcoded secrets, and privacy compliance risks before they ship. One command gives you a clear **A+ → F privacy grade**, CI-ready exit codes, JSON/SARIF output, and optional BYO-key AI analysis.
 [![npm version](https://img.shields.io/npm/v/kafkacode.svg?color=cb3837&logo=npm)](https://www.npmjs.com/package/kafkacode)
 [![npm downloads](https://img.shields.io/npm/dm/kafkacode.svg?color=cb3837)](https://www.npmjs.com/package/kafkacode)
@@ -26,7 +25,8 @@ a clear **A+ → F privacy grade**, and CI-ready exit codes. Runs in seconds.
 Most scanners stop at *"you leaked an AWS key."* KafkaCode goes further — it grades how
 your code handles **personal data**, flags **GDPR/CCPA** risks, and catches hardcoded
-secrets, with an optional **AI pass** for the context that regex alone can't see.
+secrets with a local-first pattern scanner and an optional **AI pass** for the context
+that regex alone can't see.
 You get one number a whole team understands — a **privacy grade from A+ to F** — plus a
 non-zero exit code that fails the build when something sensitive slips in.
@@ -52,10 +52,14 @@ kafkacode scan ./src --verbose
 - 🔑 **Secret detection** — AWS & Stripe keys, private keys, high-entropy strings
 - 🕵️ **PII detection** — emails, phone numbers, IP addresses
+- 🛡️ **Privacy compliance scanning** — source-code checks for GDPR, CCPA, and data privacy risks
 - 🤖 **AI-powered analysis** — contextual privacy issues a regex would miss
 - 🎓 **Privacy grade** — a single, shareable **A+ → F** score
 - 🏷️ **Grade badge** — drop your score into your README (`--badge`)
 - ⚡ **Fast & offline** — pattern scanning needs no network
+- 📄 **SARIF & JSON output** — integrate with GitHub code scanning and security dashboards
+- 🧰 **Config, ignores & baselines** — adopt safely in existing repositories
+- 🔒 **Redacted output by default** — prevent secrets from leaking into logs
 - 🌐 **7 languages** — Python, JavaScript, TypeScript, Java, Go, Ruby, PHP
 - 🚀 **CI/CD ready** — clean exit codes + a one-line GitHub Action
@@ -134,6 +138,12 @@ jobs:
 ```bash
 # Exits non-zero when issues are found, failing the build
 npx kafkacode scan ./src
+# Fail only on high or critical findings
+npx kafkacode scan ./src --fail-on high
+# Generate SARIF for GitHub code scanning
+npx kafkacode scan ./src --format sarif --output kafkacode.sarif --no-fail
 ```
 ## 🔍 What it detects
@@ -141,7 +151,7 @@ npx kafkacode scan ./src
 | Severity | Examples |
 | -------- | -------- |
 | 🚨 **Critical** | AWS keys, Stripe live keys, private keys |
-| 🔥 **High** | `password=`, `api_key=`, `token=` and other secrets in assignments |
+| 🔥 **High** | JWTs, `password=`, `api_key=`, `token=` and other secrets in assignments |
 | ⚠️ **Medium** | Emails, phone numbers, high-entropy strings |
 | 🔵 **Low** | IP addresses |
@@ -187,18 +197,30 @@ Pass `--no-ai` to force pattern-only even when a key is set.
 | PII / personal-data findings |     ✅    |          ➖           |   ➖    |
 | Privacy grade (A+ → F)       |     ✅    |          ➖           |   ➖    |
 | AI contextual analysis       |     ✅    |          ➖           |   ➖    |
+| SARIF output                 |     ✅    |          ➖           |   ✅    |
 | Zero-config, one command     |     ✅    |          ✅           |   ➖    |
 KafkaCode focuses on **privacy and developer-friendly grading** — it complements
 deep secret scanners rather than replacing them.
+## 📚 Guides
+- [PII scanner for source code](https://nikhil-kapu.github.io/kafkacode/guide/pii-scanner-for-source-code)
+- [Secret scanning in CI/CD](https://nikhil-kapu.github.io/kafkacode/guide/secret-scanning-in-ci-cd)
+- [GDPR code scanning](https://nikhil-kapu.github.io/kafkacode/guide/gdpr-code-scanning)
+- [SARIF privacy scanner](https://nikhil-kapu.github.io/kafkacode/guide/sarif-privacy-scanner)
+- [Local-first privacy scanner](https://nikhil-kapu.github.io/kafkacode/guide/local-first-privacy-scanner)
 ## 🗺️ Roadmap
 - [x] **Bring-your-own-key AI** — call Groq / OpenAI-compatible providers directly
 - [x] **`--json` & SARIF output** — SARIF integrates with the GitHub Security tab
-- [ ] Config file &amp; `.kafkacodeignore`
-- [ ] Baseline file to adopt on existing codebases
-- [ ] More file types (`.env`, YAML, Terraform, Dockerfiles)
+- [x] Config file &amp; `.kafkacodeignore`
+- [x] Baseline file to adopt on existing codebases
+- [x] More file types (`.env`, YAML, Terraform, Dockerfiles)
+- [x] Redacted snippets by default, with `--show-secrets` opt-in
+- [ ] Provider validation for selected secret types
+- [ ] More language-aware privacy rules
 Ideas and PRs welcome — see [CONTRIBUTING.md](CONTRIBUTING.md).
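
The README's `--fail-on high` example relies on an ordered severity comparison. The sketch below reproduces the ranking helpers that this release adds in `package/dist/ConfigLoader.js` and pairs them with a hypothetical `shouldFail` function. How `cli.js` actually applies the threshold is not visible in this part of the diff, so treat `shouldFail` as an assumption about the wiring.

```javascript
// Severity helpers as they appear in ConfigLoader.js (1.5.0).
const SEVERITY_ORDER = ['Low', 'Medium', 'High', 'Critical'];

function normalizeSeverity(severity, fallback = 'Low') {
    if (!severity) return fallback;
    // Accept any casing: "high", "HIGH" and "High" all become "High".
    const normalized = severity.charAt(0).toUpperCase() + severity.slice(1).toLowerCase();
    return SEVERITY_ORDER.includes(normalized) ? normalized : fallback;
}

function severityRank(severity) {
    return SEVERITY_ORDER.indexOf(normalizeSeverity(severity));
}

function isAtLeastSeverity(severity, threshold) {
    return severityRank(severity) >= severityRank(threshold);
}

// Hypothetical: one way a --fail-on threshold could drive the exit status.
function shouldFail(findings, failOn) {
    return findings.some(f => isAtLeastSeverity(f.severity, failOn));
}

const findings = [{ severity: 'Medium' }, { severity: 'Low' }];
console.log(shouldFail(findings, 'high'));    // false
console.log(shouldFail(findings, 'medium'));  // true
```

Because unknown values fall back to `Low`, a mistyped threshold such as `hgih` would behave like `low` under these helpers rather than raising an error.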

package/dist/ConfigLoader.js ADDED

@@ -0,0 +1,113 @@
+const fs = require('fs');
+const path = require('path');
+const crypto = require('crypto');
+const SEVERITY_ORDER = ['Low', 'Medium', 'High', 'Critical'];
+function normalizeSeverity(severity, fallback = 'Low') {
+    if (!severity) return fallback;
+    const normalized = severity.charAt(0).toUpperCase() + severity.slice(1).toLowerCase();
+    return SEVERITY_ORDER.includes(normalized) ? normalized : fallback;
+}
+function severityRank(severity) {
+    return SEVERITY_ORDER.indexOf(normalizeSeverity(severity));
+}
+function isAtLeastSeverity(severity, threshold) {
+    return severityRank(severity) >= severityRank(threshold);
+}
+function normalizeArray(value) {
+    if (!value) return [];
+    if (Array.isArray(value)) return value.filter(Boolean);
+    return [value].filter(Boolean);
+}
+function loadJsonFile(filePath) {
+    if (!filePath || !fs.existsSync(filePath)) return {};
+    const raw = fs.readFileSync(filePath, 'utf8');
+    return JSON.parse(raw);
+}
+function findDefaultConfig(rootDir) {
+    const names = [
+        'kafkacode.config.json',
+        '.kafkacoderc',
+        '.kafkacoderc.json'
+    ];
+    return names
+        .map(name => path.join(rootDir, name))
+        .find(filePath => fs.existsSync(filePath));
+}
+function loadConfig(rootDir, configPath) {
+    const resolvedRoot = path.resolve(rootDir);
+    const resolvedConfig = configPath
+        ? path.resolve(configPath)
+        : findDefaultConfig(resolvedRoot);
+    const config = resolvedConfig ? loadJsonFile(resolvedConfig) : {};
+    config.__path = resolvedConfig || null;
+    return config;
+}
+function getFindingFingerprint(finding, cwd = process.cwd()) {
+    const filePath = finding.file_path || finding.file || '';
+    const relativeFile = filePath
+        ? path.relative(cwd, filePath).split(path.sep).join('/')
+        : '';
+    const source = [
+        finding.rule_id || finding.finding_type || 'unknown-rule',
+        relativeFile,
+        (finding.code_snippet || '').trim()
+    ].join('\n');
+    return crypto.createHash('sha256').update(source).digest('hex');
+}
+function loadBaseline(filePath) {
+    if (!filePath || !fs.existsSync(filePath)) {
+        return new Set();
+    }
+    const parsed = loadJsonFile(filePath);
+    const findings = Array.isArray(parsed) ? parsed : normalizeArray(parsed.findings);
+    return new Set(findings.map(item => {
+        if (typeof item === 'string') return item;
+        return item && item.fingerprint;
+    }).filter(Boolean));
+}
+function writeBaseline(filePath, findings, cwd = process.cwd()) {
+    const entries = findings.map(finding => {
+        const filePathValue = finding.file_path || '';
+        return {
+            fingerprint: getFindingFingerprint(finding, cwd),
+            ruleId: finding.rule_id || '',
+            file: filePathValue ? path.relative(cwd, filePathValue).split(path.sep).join('/') : '',
+            line: finding.line_number || 0,
+            severity: finding.severity || 'Low',
+            description: finding.description || ''
+        };
+    });
+    const payload = {
+        version: 1,
+        generatedAt: new Date().toISOString(),
+        findings: entries
+    };
+    fs.writeFileSync(filePath, JSON.stringify(payload, null, 2) + '\n');
+}
+module.exports = {
+    SEVERITY_ORDER,
+    normalizeSeverity,
+    severityRank,
+    isAtLeastSeverity,
+    normalizeArray,
+    loadConfig,
+    loadBaseline,
+    writeBaseline,
+    getFindingFingerprint
+};
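
The fingerprint added above hashes the rule ID, the relative file path, and the trimmed snippet, but not the line number. The sketch below copies `getFindingFingerprint` from this file and shows what that implies for baselines. The sample findings, the `src/db.js` path, and the filtering step are illustrative assumptions, not code from the package.

```javascript
const crypto = require('crypto');
const path = require('path');

// Copied from ConfigLoader.js (1.5.0).
function getFindingFingerprint(finding, cwd = process.cwd()) {
    const filePath = finding.file_path || finding.file || '';
    const relativeFile = filePath
        ? path.relative(cwd, filePath).split(path.sep).join('/')
        : '';
    const source = [
        finding.rule_id || finding.finding_type || 'unknown-rule',
        relativeFile,
        (finding.code_snippet || '').trim()
    ].join('\n');
    return crypto.createHash('sha256').update(source).digest('hex');
}

// Illustrative findings; the file path is hypothetical.
const file = path.join(process.cwd(), 'src', 'db.js');
const original = { rule_id: 'KC_PII_EMAIL', file_path: file, line_number: 10, code_snippet: 'const owner = "a@b.io";' };
const moved = { ...original, line_number: 42 };                     // same code, new position
const changed = { ...original, code_snippet: 'const owner = "c@d.io";' };

// A baseline is a set of known fingerprints; anything outside it is new.
const baseline = new Set([getFindingFingerprint(original)]);
const fresh = [moved, changed].filter(f => !baseline.has(getFindingFingerprint(f)));
console.log(fresh.length); // 1 (only the finding whose snippet changed)
```

One consequence of this design is that two identical lines in the same file share a fingerprint, so baselining one of them also suppresses the other.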

package/dist/FileScanner.js CHANGED

@@ -3,19 +3,30 @@ const path = require('path');
 const { minimatch } = require('minimatch');
 class FileScanner {
-    constructor(rootDir) {
+    constructor(rootDir, options = {}) {
         this.rootDir = path.resolve(rootDir);
-        this.supportedExtensions = new Set(['.py', '.js', '.ts', '.java', '.go', '.rb', '.php']);
+        this.supportedExtensions = new Set([
+            '.py', '.js', '.jsx', '.ts', '.tsx', '.java', '.go', '.rb', '.php',
+            '.env', '.json', '.yaml', '.yml', '.toml', '.ini', '.properties',
+            '.xml', '.tf', '.tfvars', '.dockerfile', '.md', '.sh'
+        ]);
+        this.supportedFilenames = new Set([
+            '.env', '.env.example', '.env.local', '.env.development', '.env.production',
+            'Dockerfile', 'dockerfile', 'Containerfile', 'Makefile'
+        ]);
         this.ignoreDirs = new Set([
             '.git', 'node_modules', 'venv', '__pycache__', '.venv', 'env',
             'build', 'dist', 'target', 'out', '.next', '.nuxt', 'vendor',
-            'coverage', '.coverage', '.pytest_cache', '.mypy_cache'
+            'coverage', '.coverage', '.pytest_cache', '.mypy_cache',
+            '.vitepress', '.cache', '.turbo'
         ]);
-        this.gitignorePatterns = this._loadGitignore();
+        this.extraIgnorePatterns = options.exclude || [];
+        this.gitignorePatterns = this._loadIgnoreFile('.gitignore');
+        this.kafkaCodeIgnorePatterns = this._loadIgnoreFile('.kafkacodeignore');
     }
-    _loadGitignore() {
-        const gitignorePath = path.join(this.rootDir, '.gitignore');
+    _loadIgnoreFile(fileName) {
+        const gitignorePath = path.join(this.rootDir, fileName);
         const patterns = [];
         if (fs.existsSync(gitignorePath)) {
@@ -48,9 +59,16 @@ class FileScanner {
             }
         }
-        // Check gitignore patterns
-        for (const pattern of this.gitignorePatterns) {
-            if (minimatch(relativePath, pattern) || minimatch(path.basename(filePath), pattern)) {
+        // Check ignore patterns
+        const ignorePatterns = [
+            ...this.gitignorePatterns,
+            ...this.kafkaCodeIgnorePatterns,
+            ...this.extraIgnorePatterns
+        ];
+        for (const pattern of ignorePatterns) {
+            if (minimatch(relativePath, pattern, { dot: true }) ||
+                minimatch(path.basename(filePath), pattern, { dot: true }) ||
+                minimatch(relativePath, `${pattern}/**`, { dot: true })) {
                 return true;
             }
         }
@@ -75,7 +93,7 @@ class FileScanner {
                     files.push(...this._scanDirectory(fullPath));
                 } else if (entry.isFile()) {
                     const ext = path.extname(entry.name);
-                    if (this.supportedExtensions.has(ext)) {
+                    if (this.supportedExtensions.has(ext) || this.supportedFilenames.has(entry.name)) {
                         files.push(fullPath);
                     }
                 }
@@ -93,4 +111,4 @@ class FileScanner {
     }
 }
-module.exports = FileScanner;
+module.exports = FileScanner;
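
The new `supportedFilenames` set matters because Node's `path.extname` returns an empty string for names such as `.env` and `Dockerfile`, so an extension check alone would skip them. The sketch below uses a subset of the two sets from the diff. The `isScannable` helper is an illustrative name for the condition in `_scanDirectory`, not a function in the package.

```javascript
const path = require('path');

// Subsets of the sets defined in the FileScanner constructor (1.5.0).
const supportedExtensions = new Set(['.js', '.env', '.yaml', '.tf']);
const supportedFilenames = new Set([
    '.env', '.env.example', '.env.local', '.env.development', '.env.production',
    'Dockerfile', 'Makefile'
]);

// Same condition as in _scanDirectory: extension match OR exact filename match.
function isScannable(name) {
    return supportedExtensions.has(path.extname(name)) || supportedFilenames.has(name);
}

for (const name of ['.env', 'prod.env', '.env.local', '.env.staging', 'Dockerfile']) {
    console.log(name, JSON.stringify(path.extname(name)), isScannable(name));
}
```

Under these rules `prod.env` is picked up through its `.env` extension, while `.env.staging` is skipped: its extension is `.staging` and the name is not in the filename list.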

package/dist/LLMAnalyzer.js CHANGED

@@ -250,12 +250,15 @@ ${codeSnippet}`;
                 findings.push({
                     file_path: filePath,
                     line_number: vuln.line_number || startLine,
+                    rule_id: 'KC_AI_CONTEXT',
                     severity: vuln.severity || 'Medium',
                     finding_type: 'Context-Based Issue',
                     description: vuln.description || 'Privacy vulnerability detected',
+                    confidence: vuln.confidence || 'Medium',
                     code_snippet: this._getCodeSnippet(content, vuln.line_number || startLine),
                     suggestion: vuln.suggestion || 'Review and address the identified issue.',
-                    source: 'llm'
+                    source: 'llm',
+                    secret: false
                 });
             }

package/dist/PatternScanner.js CHANGED

@@ -6,62 +6,199 @@ class PatternScanner {
     _initPatterns() {
         return {
             'aws_access_key': {
+                id: 'KC_SECRET_AWS_ACCESS_KEY',
                 pattern: /\b(AKIA[0-9A-Z]{16})\b/g,
                 severity: 'Critical',
                 type: 'Hardcoded Secret',
-                description: 'AWS Access Key ID detected'
+                description: 'AWS Access Key ID detected',
+                confidence: 'High',
+                secret: true
             },
             'aws_secret_key': {
-                pattern: /\b[A-Za-z0-9/+=]{40}\b/g,
+                id: 'KC_SECRET_AWS_SECRET_KEY',
+                pattern: /\b(?:aws_?secret_?access_?key|AWS_SECRET_ACCESS_KEY)\b\s*[:=]\s*["']?([A-Za-z0-9/+=]{40})/gi,
                 severity: 'Critical',
                 type: 'Hardcoded Secret',
-                description: 'Potential AWS Secret Access Key detected'
+                description: 'AWS Secret Access Key detected',
+                confidence: 'High',
+                secret: true,
+                secretGroup: 1
             },
             'stripe_key': {
-                pattern: /\b(sk_live_[0-9a-zA-Z]{24}|pk_live_[0-9a-zA-Z]{24})\b/g,
+                id: 'KC_SECRET_STRIPE_KEY',
+                pattern: /\b((?:sk|pk|rk)_live_[0-9a-zA-Z]{24,})\b/g,
                 severity: 'Critical',
                 type: 'Hardcoded Secret',
-                description: 'Stripe API key detected'
+                description: 'Stripe live API key detected',
+                confidence: 'High',
+                secret: true
             },
             'private_key': {
-                pattern: /-----BEGIN\s+(RSA\s+)?PRIVATE\s+KEY-----/g,
+                id: 'KC_SECRET_PRIVATE_KEY',
+                pattern: /-----BEGIN\s+(?:RSA\s+|DSA\s+|EC\s+|OPENSSH\s+)?PRIVATE\s+KEY-----/g,
                 severity: 'Critical',
                 type: 'Hardcoded Secret',
-                description: 'Private key detected'
+                description: 'Private key detected',
+                confidence: 'High',
+                secret: true
+            },
+            'github_token': {
+                id: 'KC_SECRET_GITHUB_TOKEN',
+                pattern: /\b((?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{30,}|github_pat_[A-Za-z0-9_]{22,}_[A-Za-z0-9_]{59,})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'GitHub token detected',
+                confidence: 'High',
+                secret: true
+            },
+            'openai_key': {
+                id: 'KC_SECRET_OPENAI_KEY',
+                pattern: /\b(sk-(?:proj-)?[A-Za-z0-9_-]{32,})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'OpenAI API key detected',
+                confidence: 'High',
+                secret: true
+            },
+            'anthropic_key': {
+                id: 'KC_SECRET_ANTHROPIC_KEY',
+                pattern: /\b(sk-ant-[A-Za-z0-9_-]{32,})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'Anthropic API key detected',
+                confidence: 'High',
+                secret: true
+            },
+            'google_api_key': {
+                id: 'KC_SECRET_GOOGLE_API_KEY',
+                pattern: /\b(AIza[0-9A-Za-z_-]{35})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'Google API key detected',
+                confidence: 'High',
+                secret: true
+            },
+            'slack_token': {
+                id: 'KC_SECRET_SLACK_TOKEN',
+                pattern: /\b(xox[baprs]-[A-Za-z0-9-]{20,})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'Slack token detected',
+                confidence: 'High',
+                secret: true
+            },
+            'slack_webhook': {
+                id: 'KC_SECRET_SLACK_WEBHOOK',
+                pattern: /\b(https:\/\/hooks\.slack\.com\/services\/T[A-Z0-9]+\/B[A-Z0-9]+\/[A-Za-z0-9]+)\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'Slack webhook URL detected',
+                confidence: 'High',
+                secret: true
+            },
+            'sendgrid_key': {
+                id: 'KC_SECRET_SENDGRID_KEY',
+                pattern: /\b(SG\.[A-Za-z0-9_-]{16,}\.[A-Za-z0-9_-]{16,})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'SendGrid API key detected',
+                confidence: 'High',
+                secret: true
+            },
+            'npm_token': {
+                id: 'KC_SECRET_NPM_TOKEN',
+                pattern: /\b(npm_[A-Za-z0-9]{36,})\b/g,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'npm token detected',
+                confidence: 'High',
+                secret: true
+            },
+            'jwt': {
+                id: 'KC_SECRET_JWT',
+                pattern: /\b(eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,})\b/g,
+                severity: 'High',
+                type: 'Hardcoded Secret',
+                description: 'JSON Web Token detected',
+                confidence: 'Medium',
+                secret: true
+            },
+            'database_url': {
+                id: 'KC_SECRET_DATABASE_URL',
+                pattern: /\b((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s"']+:[^\s"']+@[^\s"']+)/gi,
+                severity: 'Critical',
+                type: 'Hardcoded Secret',
+                description: 'Database connection URL with credentials detected',
+                confidence: 'High',
+                secret: true
             },
             'high_entropy': {
+                id: 'KC_SECRET_HIGH_ENTROPY',
                 pattern: /\b[A-Za-z0-9+/=]{32,}\b/g,
                 severity: 'Medium',
                 type: 'Hardcoded Secret',
-                description: 'High entropy string (potential secret)'
+                description: 'High entropy string (potential secret)',
+                confidence: 'Low',
+                secret: true
             },
             'email': {
+                id: 'KC_PII_EMAIL',
                 pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/gi,
                 severity: 'Medium',
                 type: 'PII Detected',
-                description: 'Email address detected'
+                description: 'Email address detected',
+                confidence: 'High',
+                secret: false
             },
             'phone': {
+                id: 'KC_PII_PHONE',
                 pattern: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b|\b\(\d{3}\)\s?\d{3}[-.]?\d{4}\b/g,
                 severity: 'Medium',
                 type: 'PII Detected',
-                description: 'Phone number detected'
+                description: 'Phone number detected',
+                confidence: 'Medium',
+                secret: false
             },
             'ip_address': {
-                pattern: /\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b/g,
+                id: 'KC_PII_IP_ADDRESS',
+                pattern: /\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b/g,
                 severity: 'Low',
                 type: 'PII Detected',
-                description: 'IP address detected'
+                description: 'IP address detected',
+                confidence: 'Medium',
+                secret: false
             },
             'sensitive_keywords': {
-                pattern: /\b(password|secret|api_key|token|ssn|credit_card|credentials)\s*[=:]\s*["']?[^"'\s]+/gi,
+                id: 'KC_SECRET_SENSITIVE_ASSIGNMENT',
+                pattern: /\b(password|passwd|secret|api[_-]?key|token|ssn|credit[_-]?card|credentials?)\b\s*[:=]\s*["']?([^"'\s,;]+)/gi,
                 severity: 'High',
                 type: 'Hardcoded Secret',
-                description: 'Sensitive keyword in assignment context'
+                description: 'Sensitive keyword in assignment context',
+                confidence: 'Medium',
+                secret: true,
+                secretGroup: 2
             }
         };
     }
+    _isLikelyPlaceholder(data) {
+        const value = (data || '').toLowerCase();
+        return [
+            'example', 'sample', 'dummy', 'test', 'fake', 'placeholder',
+            'changeme', 'your_', 'your-', '<', 'xxxx', '****'
+        ].some(token => value.includes(token));
+    }
+    _defaultSuggestion(patternInfo) {
+        if (patternInfo.type === 'PII Detected') {
+            return 'Avoid hardcoding personal data; load it from approved runtime sources and redact it from logs.';
+        }
+        if (patternInfo.id === 'KC_SECRET_HIGH_ENTROPY') {
+            return 'Review this value and move secrets to environment variables or a secrets manager.';
+        }
+        return 'Move credentials to environment variables or a secrets manager, then rotate the exposed value.';
+    }
     _calculateEntropy(data) {
         if (!data) return 0;
@@ -103,24 +240,28 @@ class PatternScanner {
                         if (matchedText.length < 20) {
                             continue;
                         }
-                    }
-                    // Skip common false positives for AWS secret pattern
-                    if (patternName === 'aws_secret_key') {
-                        const matchedText = match[0];
-                        const falsePositives = ['example', 'dummy', 'test', 'fake'];
-                        if (falsePositives.some(fp => matchedText.toLowerCase().includes(fp))) {
+                        if (this._isLikelyPlaceholder(matchedText)) {
                             continue;
                         }
                     }
+                    const matchedText = patternInfo.secretGroup ? match[patternInfo.secretGroup] : match[0];
+                    if (patternInfo.secret && this._isLikelyPlaceholder(matchedText)) {
+                        continue;
+                    }
                     const finding = {
                         file_path: filePath,
                         line_number: lineNum + 1,
+                        rule_id: patternInfo.id,
                         severity: patternInfo.severity,
                         finding_type: patternInfo.type,
                         description: patternInfo.description,
-                        code_snippet: line.trim()
+                        suggestion: patternInfo.suggestion || this._defaultSuggestion(patternInfo),
+                        confidence: patternInfo.confidence || 'Medium',
+                        code_snippet: line.trim(),
+                        matched_text: matchedText,
+                        secret: patternInfo.secret === true
                     };
                     findings.push(finding);
@@ -132,4 +273,4 @@ class PatternScanner {
     }
 }
-module.exports = PatternScanner;
+module.exports = PatternScanner;
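
Two changes in this file work together: `secretGroup` selects the capture group that holds the secret value, and `_isLikelyPlaceholder` then filters that value instead of the whole match. The sketch below reproduces the `sensitive_keywords` rule and the placeholder check from the diff. The `scanLine` wrapper and the sample lines are illustrative and simplify the real scan loop.

```javascript
// Rule copied from _initPatterns (sensitive_keywords, 1.5.0); other fields omitted.
const rule = {
    id: 'KC_SECRET_SENSITIVE_ASSIGNMENT',
    pattern: /\b(password|passwd|secret|api[_-]?key|token|ssn|credit[_-]?card|credentials?)\b\s*[:=]\s*["']?([^"'\s,;]+)/gi,
    secret: true,
    secretGroup: 2
};

// Copied from PatternScanner._isLikelyPlaceholder.
function isLikelyPlaceholder(data) {
    const value = (data || '').toLowerCase();
    return [
        'example', 'sample', 'dummy', 'test', 'fake', 'placeholder',
        'changeme', 'your_', 'your-', '<', 'xxxx', '****'
    ].some(token => value.includes(token));
}

// Illustrative wrapper: returns the secret values that would be reported for one line.
function scanLine(line) {
    const hits = [];
    for (const match of line.matchAll(rule.pattern)) {
        // Group 2 is the assigned value; group 1 is only the keyword.
        const matchedText = rule.secretGroup ? match[rule.secretGroup] : match[0];
        if (rule.secret && isLikelyPlaceholder(matchedText)) continue;
        hits.push(matchedText);
    }
    return hits;
}

console.log(scanLine('const password = "s3cr3t-Passw0rd";')); // [ 's3cr3t-Passw0rd' ]
console.log(scanLine('api_key: "your_api_key_here"'));        // []
console.log(scanLine('token = "test-token-123"'));            // []
```

The placeholder check is a plain substring test, so any value that happens to contain `test` or `sample` is skipped as well. That trades some recall for fewer false positives.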

package/dist/ReportGenerator.js CHANGED

@@ -7,7 +7,9 @@ try {
 } catch (_) { /* fall back to the default */ }
 class ReportGenerator {
-    constructor() {
+    constructor(options = {}) {
+        this.showSecrets = options.showSecrets === true;
+        this.plain = options.plain === true;
         this.reportTime = new Date();
         this.severityIcons = {
             'Critical': '🚨',
@@ -72,6 +74,41 @@ class ReportGenerator {
         return { grade, url, markdown: `![Privacy Grade: ${grade}](${url})` };
     }
+    _maskValue(value) {
+        if (!value) return value;
+        if (this.showSecrets) return value;
+        if (value.length <= 6) return '[REDACTED]';
+        return `${value.slice(0, 3)}...[REDACTED]...${value.slice(-3)}`;
+    }
+    _sanitizeSnippet(finding) {
+        const snippet = finding.code_snippet || '';
+        if (this.showSecrets || !snippet) return snippet;
+        let sanitized = snippet;
+        if (finding.matched_text) {
+            sanitized = sanitized.split(finding.matched_text).join(this._maskValue(finding.matched_text));
+        }
+        // Defense in depth for snippets from contextual AI findings.
+        sanitized = sanitized
+            .replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL_REDACTED]')
+            .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
+            .replace(/\b\(\d{3}\)\s?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
+            .replace(/\b((?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{20,}|github_pat_[A-Za-z0-9_]{20,})\b/g, '[TOKEN_REDACTED]')
+            .replace(/\b(sk-(?:proj-)?[A-Za-z0-9_-]{24,}|sk-ant-[A-Za-z0-9_-]{24,})\b/g, '[API_KEY_REDACTED]')
+            .replace(/\b(AKIA[0-9A-Z]{16})\b/g, '[AWS_ACCESS_KEY_REDACTED]')
+            .replace(/\b((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s"']+:[^\s"']+@[^\s"']+)/gi, '[DATABASE_URL_REDACTED]');
+        return sanitized;
+    }
+    _displayPath(filePath) {
+        if (!filePath) return 'Unknown';
+        const relative = path.relative(process.cwd(), filePath).split(path.sep).join('/');
+        return relative && !relative.startsWith('..') ? relative : filePath;
+    }
     // Public: render findings as a structured JSON report.
     generateJson(scanDir, findings, fileCount) {
         const severityCounts = { Critical: 0, High: 0, Medium: 0, Low: 0 };
@@ -93,14 +130,17 @@ class ReportGenerator {
                 severityCounts
             },
             findings: findings.map((f) => ({
+                ruleId: f.rule_id || '',
                 file: rel(f.file_path),
                 line: f.line_number || 0,
                 severity: f.severity || 'Low',
                 type: f.finding_type || 'Issue',
                 description: f.description || '',
                 suggestion: f.suggestion || '',
+                confidence: f.confidence || 'Medium',
                 source: f.source === 'llm' ? 'ai' : 'pattern',
-                snippet: f.code_snippet || ''
+                fingerprint: f.fingerprint || '',
+                snippet: this._sanitizeSnippet(f)
             }))
         };
         return JSON.stringify(report, null, 2);
@@ -119,19 +159,23 @@ class ReportGenerator {
         const rules = new Map();
         for (const f of findings) {
-            const id = slug(f.finding_type);
+            const id = f.rule_id || slug(f.finding_type);
             if (!rules.has(id)) {
                 rules.set(id, {
                     id,
                     name: f.finding_type || 'Issue',
-                    shortDescription: { text: f.finding_type || 'Issue' },
-                    defaultConfiguration: { level: levelFor(f.severity) }
+                    shortDescription: { text: f.description || f.finding_type || 'Issue' },
+                    defaultConfiguration: { level: levelFor(f.severity) },
+                    properties: {
+                        confidence: f.confidence || 'Medium',
+                        category: f.finding_type || 'Issue'
+                    }
                 });
             }
         }
         const results = findings.map((f) => ({
-            ruleId: slug(f.finding_type),
+            ruleId: f.rule_id || slug(f.finding_type),
             level: levelFor(f.severity),
             message: { text: f.description || 'Privacy issue detected' },
             locations: [{
@@ -139,7 +183,14 @@ class ReportGenerator {
                     artifactLocation: { uri: rel(f.file_path) },
                     region: { startLine: Math.max(1, f.line_number || 1) }
                 }
-            }]
+            }],
+            partialFingerprints: f.fingerprint ? { kafkacode: f.fingerprint } : undefined,
+            properties: {
+                severity: f.severity || 'Low',
+                confidence: f.confidence || 'Medium',
+                source: f.source === 'llm' ? 'ai' : 'pattern',
+                snippet: this._sanitizeSnippet(f)
+            }
         }));
         const sarif = {
@@ -186,7 +237,64 @@ class ReportGenerator {
         return colors[grade] || '⚪';
     }
+    _countSeverities(findings) {
+        const severityCounts = {};
+        this.severityOrder.forEach(severity => {
+            severityCounts[severity] = 0;
+        });
+        for (const finding of findings) {
+            const severity = finding.severity || 'Low';
+            if (severityCounts.hasOwnProperty(severity)) {
+                severityCounts[severity]++;
+            }
+        }
+        return severityCounts;
+    }
+    generatePlainReport(scanDir, findings, fileCount) {
+        const severityCounts = this._countSeverities(findings);
+        const grade = this._calculateGrade(findings);
+        const lines = [
+            `KafkaCode Privacy Scan`,
+            `Directory: ${scanDir}`,
+            `Files scanned: ${fileCount}`,
+            `Issues: ${findings.length}`,
+            `Privacy grade: ${grade}`,
+            `Severity: Critical ${severityCounts.Critical}, High ${severityCounts.High}, Medium ${severityCounts.Medium}, Low ${severityCounts.Low}`,
+            ''
+        ];
+        if (!findings.length) {
+            lines.push('No privacy issues detected.');
+            return lines.join('\n');
+        }
+        const groupedFindings = this._groupFindingsBySeverity(findings);
+        for (const severity of this.severityOrder) {
+            const severityFindings = groupedFindings[severity];
+            if (!severityFindings.length) continue;
+            lines.push(`${severity.toUpperCase()}`);
+            for (const finding of severityFindings) {
+                lines.push(`- ${this._displayPath(finding.file_path)}:${finding.line_number || 0} ${finding.rule_id || 'N/A'} ${finding.description || 'Privacy issue detected'}`);
+                lines.push(`  ${this._sanitizeSnippet(finding) || 'N/A'}`);
+                if (finding.suggestion) {
+                    lines.push(`  Suggestion: ${finding.suggestion}`);
+                }
+            }
+            lines.push('');
+        }
+        return lines.join('\n');
+    }
     generateReport(scanDir, findings, fileCount) {
+        if (this.plain) {
+            return this.generatePlainReport(scanDir, findings, fileCount);
+        }
         const reportLines = [];
         // ASCII Art Header
@@ -212,17 +320,7 @@ class ReportGenerator {
         );
         // Summary box
-        const severityCounts = {};
-        this.severityOrder.forEach(severity => {
-            severityCounts[severity] = 0;
-        });
-        for (const finding of findings) {
-            const severity = finding.severity || 'Low';
-            if (severityCounts.hasOwnProperty(severity)) {
-                severityCounts[severity]++;
-            }
-        }
+        const severityCounts = this._countSeverities(findings);
         const grade = this._calculateGrade(findings);
         const gradeColor = this._getGradeColor(grade);
@@ -296,12 +394,14 @@ class ReportGenerator {
                         `┌── Issue #${i + 1} ──────────────────────────────────────────────────────────`,
                         `│ ${icon} ${finding.description || 'Privacy issue detected'}`,
                         '│',
-                        `│ 📍 Location: ${finding.file_path || 'Unknown'}:${finding.line_number || 0}`,
+                        `│ 📍 Location: ${this._displayPath(finding.file_path)}:${finding.line_number || 0}`,
                         `│ 🚨 Severity: ${finding.severity || 'Unknown'}`,
+                        `│ 🎚️  Confidence: ${finding.confidence || 'Medium'}`,
+                        `│ 🧩 Rule: ${finding.rule_id || 'N/A'}`,
                         `│ 🔎 Detection: ${sourceBadge}`,
                         '│',
                         '│ 💾 Code:',
-                        `│    ${(finding.line_number || 0).toString().padStart(3)} │ ${finding.code_snippet || 'N/A'}`
+                        `│    ${(finding.line_number || 0).toString().padStart(3)} │ ${this._sanitizeSnippet(finding) || 'N/A'}`
                     );
                     if (finding.suggestion) {
@@ -335,4 +435,4 @@ class ReportGenerator {
     }
 }
-module.exports = ReportGenerator;
+module.exports = ReportGenerator;
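
Note on the ReportGenerator.js changes above: the severity tally that used to live inline in `generateReport` is now the shared `_countSeverities` helper, which the new plain report also uses. The following is a minimal standalone sketch of that tally, not the package's code. `SEVERITY_ORDER`, `countSeverities`, and `summaryLine` are hypothetical names, and the order of the array is an assumption, since the class's `severityOrder` property is not shown in this diff. Only the four severity names are taken from the plain-report template.

```javascript
// Hypothetical stand-in for ReportGenerator's `severityOrder` property.
// The four names appear in the plain-report template; their order is assumed.
const SEVERITY_ORDER = ['Critical', 'High', 'Medium', 'Low'];

// Tally findings per known severity. A missing severity counts as 'Low';
// a severity outside the known list is ignored, as in the diff above.
function countSeverities(findings, severityOrder = SEVERITY_ORDER) {
    const counts = {};
    for (const severity of severityOrder) {
        counts[severity] = 0;
    }
    for (const finding of findings) {
        const severity = finding.severity || 'Low';
        if (Object.prototype.hasOwnProperty.call(counts, severity)) {
            counts[severity]++;
        }
    }
    return counts;
}

// Build the one-line severity summary used by the compact output.
function summaryLine(counts) {
    return `Severity: Critical ${counts.Critical}, High ${counts.High}, ` +
        `Medium ${counts.Medium}, Low ${counts.Low}`;
}

const counts = countSeverities([
    { severity: 'High' },
    {},                      // no severity: treated as 'Low'
    { severity: 'Unknown' }  // not a known severity: ignored
]);
console.log(summaryLine(counts));
```

The sketch calls `Object.prototype.hasOwnProperty.call` rather than `counts.hasOwnProperty`; for a plain object literal the two behave the same.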

package/dist/cli.js CHANGED

@@ -6,13 +6,28 @@ const fs = require('fs');
 const FileScanner = require('./FileScanner');
 const AnalysisEngine = require('./AnalysisEngine');
 const ReportGenerator = require('./ReportGenerator');
+const {
+    loadConfig,
+    loadBaseline,
+    writeBaseline,
+    getFindingFingerprint,
+    isAtLeastSeverity,
+    normalizeSeverity,
+    normalizeArray
+} = require('./ConfigLoader');
 const program = new Command();
+const VERSION = require('../package.json').version;
+function collect(value, previous) {
+    previous.push(value);
+    return previous;
+}
 program
     .name('kafkacode')
     .description('KafkaCode - Privacy and Compliance Scanner')
-    .version('1.4.0');
+    .version(VERSION);
 program
     .command('scan')
@@ -22,6 +37,14 @@ program
     .option('-b, --badge', 'Print a copy-paste privacy-grade badge for your README')
     .option('-f, --format <format>', 'Output format: console, json, or sarif', 'console')
     .option('-o, --output <file>', 'Write output to a file instead of stdout')
+    .option('-c, --config <file>', 'Path to a KafkaCode JSON config file')
+    .option('--exclude <pattern>', 'Exclude a glob pattern from scanning (repeatable)', collect, [])
+    .option('--baseline <file>', 'Ignore findings already present in a baseline file')
+    .option('--update-baseline <file>', 'Write current findings to a baseline file and exit 0')
+    .option('--min-severity <severity>', 'Minimum severity to report: low, medium, high, critical')
+    .option('--fail-on <severity>', 'Fail only when findings are at least this severity', 'low')
+    .option('--show-secrets', 'Show full matched snippets instead of redacting sensitive values')
+    .option('--plain', 'Use compact console output without the ASCII banner')
     .option('--no-ai', 'Disable AI-powered analysis (run pattern scan only)')
     .option('--no-fail', 'Exit 0 even when issues are found')
     .action(async (directory, options) => {
@@ -44,18 +67,41 @@ async function runScan(directory, options = {}) {
         process.exit(1);
     }
+    let config = {};
+    try {
+        config = loadConfig(directory, options.config);
+    } catch (error) {
+        console.error(`Error loading config: ${error.message}`);
+        process.exit(1);
+    }
+    const minSeverity = normalizeSeverity(options.minSeverity || config.minSeverity || 'Low');
+    const failOn = normalizeSeverity(options.failOn || config.failOn || 'Low');
+    const excludes = [
+        ...normalizeArray(config.exclude),
+        ...normalizeArray(options.exclude)
+    ];
+    const baselinePath = options.baseline || config.baseline || '';
+    const updateBaselinePath = options.updateBaseline || '';
+    const showSecrets = options.showSecrets === true || config.showSecrets === true;
+    const plain = options.plain === true || config.plain === true;
+    const aiDisabled = options.ai === false || config.ai === false;
     if (verbose) {
         console.log('🚀 Starting KafkaCode privacy scan...');
+        if (config.__path) {
+            console.log(`⚙️  Loaded config: ${config.__path}`);
+        }
     }
     try {
         // Initialize components
-        const fileScanner = new FileScanner(directory);
+        const fileScanner = new FileScanner(directory, { exclude: excludes });
         const analysisEngine = new AnalysisEngine(verbose);
-        if (options.ai === false) {
+        if (aiDisabled) {
             analysisEngine.disableAi();
         }
-        const reportGenerator = new ReportGenerator();
+        const reportGenerator = new ReportGenerator({ showSecrets, plain });
         // Scan for files
         if (verbose) {
@@ -76,7 +122,27 @@ async function runScan(directory, options = {}) {
         if (verbose) {
             console.log('🔍 Performing privacy analysis...');
         }
-        const findings = await analysisEngine.analyzeFiles(files);
+        let findings = await analysisEngine.analyzeFiles(files);
+        const scanRoot = path.resolve(directory);
+        findings = findings.map(finding => ({
+            ...finding,
+            fingerprint: getFindingFingerprint(finding, scanRoot)
+        }));
+        if (updateBaselinePath) {
+            const resolvedBaseline = path.resolve(updateBaselinePath);
+            writeBaseline(resolvedBaseline, findings, scanRoot);
+            console.error(`✅ Wrote baseline with ${findings.length} findings to ${resolvedBaseline}`);
+            process.exit(0);
+        }
+        if (baselinePath) {
+            const resolvedBaseline = path.resolve(baselinePath);
+            const baseline = loadBaseline(resolvedBaseline);
+            findings = findings.filter(finding => !baseline.has(finding.fingerprint));
+        }
+        findings = findings.filter(finding => isAtLeastSeverity(finding.severity, minSeverity));
         // Render the findings in the requested format (validated above)
         let output;
@@ -98,7 +164,7 @@ async function runScan(directory, options = {}) {
         // Console-only extras — kept out of machine-readable output
         if (format === 'console' && !options.output) {
-            if (options.ai !== false && !analysisEngine.aiEnabled()) {
+            if (!aiDisabled && !analysisEngine.aiEnabled()) {
                 console.log('💡 Tip: set KAFKACODE_API_KEY to enable AI-powered contextual analysis. See the README.\n');
             }
             if (options.badge) {
@@ -108,8 +174,8 @@ async function runScan(directory, options = {}) {
             }
         }
-        // Exit non-zero when issues are found, unless --no-fail was passed
-        const shouldFail = options.fail !== false && findings.length > 0;
+        // Exit non-zero when findings meet the failure threshold, unless --no-fail was passed
+        const shouldFail = options.fail !== false && findings.some(finding => isAtLeastSeverity(finding.severity, failOn));
         process.exit(shouldFail ? 1 : 0);
     } catch (error) {
@@ -132,4 +198,4 @@ process.on('SIGINT', () => {
     process.exit(1);
 });
-program.parse();
+program.parse();
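
Note on the cli.js changes above: findings now pass through three stages in order. Fingerprints already in the baseline are dropped, findings below `--min-severity` are dropped, and the exit code depends on whether any remaining finding meets `--fail-on`. The sketch below illustrates that ordering under stated assumptions. `SEVERITY_RANK`, `rank`, `isAtLeast`, and `applyFilters` are hypothetical names; the real `isAtLeastSeverity`, `normalizeSeverity`, and `loadBaseline` live in ConfigLoader.js and may behave differently, for example in how they treat unknown severities.

```javascript
// Assumed ranking of the four severities named in the --min-severity help text.
const SEVERITY_RANK = new Map([
    ['low', 0],
    ['medium', 1],
    ['high', 2],
    ['critical', 3]
]);

// Map a severity label to its rank, case-insensitively.
// Assumption: missing or unknown labels are treated as 'low'.
function rank(severity) {
    const value = SEVERITY_RANK.get(String(severity || 'low').toLowerCase());
    return value === undefined ? 0 : value;
}

function isAtLeast(severity, threshold) {
    return rank(severity) >= rank(threshold);
}

// Apply the baseline filter, then the minimum-severity filter, then
// derive the exit code from the findings that are left.
function applyFilters(findings, options = {}) {
    const {
        baseline = new Set(),   // fingerprints to ignore
        minSeverity = 'low',
        failOn = 'low',
        fail = true             // false models --no-fail
    } = options;

    const reported = findings
        .filter(finding => !baseline.has(finding.fingerprint))
        .filter(finding => isAtLeast(finding.severity, minSeverity));

    const shouldFail = fail !== false &&
        reported.some(finding => isAtLeast(finding.severity, failOn));

    return { reported, exitCode: shouldFail ? 1 : 0 };
}

const result = applyFilters(
    [
        { fingerprint: 'a', severity: 'Critical' },
        { fingerprint: 'b', severity: 'Medium' },
        { fingerprint: 'c', severity: 'Low' }
    ],
    { baseline: new Set(['a']), minSeverity: 'medium', failOn: 'high' }
);
console.log(result.reported.length, result.exitCode);
```

Because the failure check runs on the already filtered list, a baselined or below-threshold finding cannot fail the run. Separately, `--fail-on` is declared with a default of `'low'`, so `options.failOn` appears to be always set, and a `failOn` value from the config file would then not be reached by `options.failOn || config.failOn || 'Low'`.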

package/dist/index.js CHANGED

@@ -3,11 +3,13 @@ const PatternScanner = require('./PatternScanner');
 const LLMAnalyzer = require('./LLMAnalyzer');
 const AnalysisEngine = require('./AnalysisEngine');
 const ReportGenerator = require('./ReportGenerator');
+const ConfigLoader = require('./ConfigLoader');
 module.exports = {
     FileScanner,
     PatternScanner,
     LLMAnalyzer,
     AnalysisEngine,
-    ReportGenerator
-};
+    ReportGenerator,
+    ConfigLoader
+};

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "kafkacode",
-  "version": "1.4.1",
-  "description": "AI-powered privacy and compliance scanner - find PII leaks, hardcoded secrets, and compliance violations in your source code",
+  "version": "1.5.0",
+  "description": "Open-source, local-first privacy code scanner for PII leaks, hardcoded secrets, GDPR/CCPA compliance, SARIF, and CI/CD",
   "main": "dist/index.js",
   "bin": {
     "kafkacode": "dist/cli.js"
@@ -21,13 +21,27 @@
     "gdpr",
     "ccpa",
     "secret-detection",
+    "secret-scanning",
+    "secrets-scanner",
+    "secrets",
+    "pii-detection",
     "code-analysis",
     "privacy-scanner",
+    "privacy-code-scanner",
     "ai-powered",
     "shift-left",
     "cli-tool",
     "security-scanner",
-    "vulnerability-scanner"
+    "vulnerability-scanner",
+    "data-loss-prevention",
+    "dlp",
+    "gdpr-compliance",
+    "ccpa-compliance",
+    "sarif",
+    "github-actions",
+    "devsecops",
+    "appsec",
+    "data-privacy"
   ],
   "author": "KafkaLabs <contact@kafkalabs.com>",
   "license": "MIT",