kafkacode 1.4.1 → 1.5.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,21 @@
  
  All notable changes to this project are documented in this file.
  
+ ## [1.5.0] - 2026-06-06
+
+ ### Added
+ - Config file support via `kafkacode.config.json`, `.kafkacoderc`, and `.kafkacoderc.json`.
+ - `.kafkacodeignore`, repeatable `--exclude`, `--baseline`, and `--update-baseline`.
+ - Severity controls with `--min-severity` and `--fail-on`.
+ - Compact CI output via `--plain`.
+ - Redacted findings by default, with `--show-secrets` for explicit local opt-in.
+ - Stable rule IDs, confidence metadata, and SARIF fingerprints.
+ - More file coverage: `.env`, JSON, YAML, TOML, INI, properties, XML, Terraform, Dockerfiles, Markdown, and shell scripts.
+ - Expanded secret rules for GitHub, OpenAI, Anthropic, Google, Slack, SendGrid, npm, JWTs, and database URLs.
+
+ ### Changed
+ - GitHub Action inputs now expose output format, failure thresholds, excludes, redaction, package version, and compact output.
+
  ## [1.4.0] - 2026-06-05
  
  ### Added
package/README.md CHANGED
@@ -2,12 +2,11 @@
  
  <img src="docs/logo4.png" width="104" alt="KafkaCode logo" />
  
- # KafkaCode
+ # KafkaCode - Open-Source Privacy Code Scanner
  
- **Catch PII leaks, hardcoded secrets, and compliance risks before they ship.**
+ **Local-first PII scanner and secret detection CLI for source code, CI/CD, GDPR, CCPA, and SARIF workflows.**
  
- An AI-powered privacy &amp; compliance scanner for your source code. One command,
- a clear **A+ → F privacy grade**, and CI-ready exit codes. Runs in seconds.
+ KafkaCode catches PII leaks, hardcoded secrets, and privacy compliance risks before they ship. One command gives you a clear **A+ → F privacy grade**, CI-ready exit codes, JSON/SARIF output, and optional BYO-key AI analysis.
  
  [![npm version](https://img.shields.io/npm/v/kafkacode.svg?color=cb3837&logo=npm)](https://www.npmjs.com/package/kafkacode)
  [![npm downloads](https://img.shields.io/npm/dm/kafkacode.svg?color=cb3837)](https://www.npmjs.com/package/kafkacode)
@@ -26,7 +25,8 @@ a clear **A+ → F privacy grade**, and CI-ready exit codes. Runs in seconds.
26
25
 
27
26
  Most scanners stop at *"you leaked an AWS key."* KafkaCode goes further — it grades how
28
27
  your code handles **personal data**, flags **GDPR/CCPA** risks, and catches hardcoded
29
- secrets, with an optional **AI pass** for the context that regex alone can't see.
28
+ secrets with a local-first pattern scanner and an optional **AI pass** for the context
29
+ that regex alone can't see.
30
30
 
31
31
  You get one number a whole team understands — a **privacy grade from A+ to F** — plus a
32
32
  non-zero exit code that fails the build when something sensitive slips in.
@@ -52,10 +52,14 @@ kafkacode scan ./src --verbose
52
52
 
53
53
  - 🔑 **Secret detection** — AWS & Stripe keys, private keys, high-entropy strings
54
54
  - 🕵️ **PII detection** — emails, phone numbers, IP addresses
55
+ - 🛡️ **Privacy compliance scanning** — source-code checks for GDPR, CCPA, and data privacy risks
55
56
  - 🤖 **AI-powered analysis** — contextual privacy issues a regex would miss
56
57
  - 🎓 **Privacy grade** — a single, shareable **A+ → F** score
57
58
  - 🏷️ **Grade badge** — drop your score into your README (`--badge`)
58
59
  - ⚡ **Fast & offline** — pattern scanning needs no network
60
+ - 📄 **SARIF & JSON output** — integrate with GitHub code scanning and security dashboards
61
+ - 🧰 **Config, ignores & baselines** — adopt safely in existing repositories
62
+ - 🔒 **Redacted output by default** — prevent secrets from leaking into logs
59
63
  - 🌐 **7 languages** — Python, JavaScript, TypeScript, Java, Go, Ruby, PHP
60
64
  - 🚀 **CI/CD ready** — clean exit codes + a one-line GitHub Action
61
65
 
@@ -134,6 +138,12 @@ jobs:
  ```bash
  # Exits non-zero when issues are found, failing the build
  npx kafkacode scan ./src
+
+ # Fail only on high or critical findings
+ npx kafkacode scan ./src --fail-on high
+
+ # Generate SARIF for GitHub code scanning
+ npx kafkacode scan ./src --format sarif --output kafkacode.sarif --no-fail
  ```
  
  ## 🔍 What it detects
@@ -141,7 +151,7 @@ npx kafkacode scan ./src
  | Severity | Examples |
  | -------- | -------- |
  | 🚨 **Critical** | AWS keys, Stripe live keys, private keys |
- | 🔥 **High** | `password=`, `api_key=`, `token=` and other secrets in assignments |
+ | 🔥 **High** | JWTs, `password=`, `api_key=`, `token=` and other secrets in assignments |
  | ⚠️ **Medium** | Emails, phone numbers, high-entropy strings |
  | 🔵 **Low** | IP addresses |
  
@@ -187,18 +197,30 @@ Pass `--no-ai` to force pattern-only even when a key is set.
187
197
  | PII / personal-data findings | ✅ | ➖ | ➖ |
188
198
  | Privacy grade (A+ → F) | ✅ | ➖ | ➖ |
189
199
  | AI contextual analysis | ✅ | ➖ | ➖ |
200
+ | SARIF output | ✅ | ➖ | ✅ |
190
201
  | Zero-config, one command | ✅ | ✅ | ➖ |
191
202
 
192
203
  KafkaCode focuses on **privacy and developer-friendly grading** — it complements
193
204
  deep secret scanners rather than replacing them.
194
205
 
206
+ ## 📚 Guides
207
+
208
+ - [PII scanner for source code](https://nikhil-kapu.github.io/kafkacode/guide/pii-scanner-for-source-code)
209
+ - [Secret scanning in CI/CD](https://nikhil-kapu.github.io/kafkacode/guide/secret-scanning-in-ci-cd)
210
+ - [GDPR code scanning](https://nikhil-kapu.github.io/kafkacode/guide/gdpr-code-scanning)
211
+ - [SARIF privacy scanner](https://nikhil-kapu.github.io/kafkacode/guide/sarif-privacy-scanner)
212
+ - [Local-first privacy scanner](https://nikhil-kapu.github.io/kafkacode/guide/local-first-privacy-scanner)
213
+
195
214
  ## 🗺️ Roadmap
196
215
 
197
216
  - [x] **Bring-your-own-key AI** — call Groq / OpenAI-compatible providers directly
198
217
  - [x] **`--json` & SARIF output** — SARIF integrates with the GitHub Security tab
199
- - [ ] Config file &amp; `.kafkacodeignore`
200
- - [ ] Baseline file to adopt on existing codebases
201
- - [ ] More file types (`.env`, YAML, Terraform, Dockerfiles)
218
+ - [x] Config file &amp; `.kafkacodeignore`
219
+ - [x] Baseline file to adopt on existing codebases
220
+ - [x] More file types (`.env`, YAML, Terraform, Dockerfiles)
221
+ - [x] Redacted snippets by default, with `--show-secrets` opt-in
222
+ - [ ] Provider validation for selected secret types
223
+ - [ ] More language-aware privacy rules
202
224
 
203
225
  Ideas and PRs welcome — see [CONTRIBUTING.md](CONTRIBUTING.md).
204
226
 
@@ -0,0 +1,113 @@
+ const fs = require('fs');
+ const path = require('path');
+ const crypto = require('crypto');
+
+ const SEVERITY_ORDER = ['Low', 'Medium', 'High', 'Critical'];
+
+ function normalizeSeverity(severity, fallback = 'Low') {
+   if (!severity) return fallback;
+   const normalized = severity.charAt(0).toUpperCase() + severity.slice(1).toLowerCase();
+   return SEVERITY_ORDER.includes(normalized) ? normalized : fallback;
+ }
+
+ function severityRank(severity) {
+   return SEVERITY_ORDER.indexOf(normalizeSeverity(severity));
+ }
+
+ function isAtLeastSeverity(severity, threshold) {
+   return severityRank(severity) >= severityRank(threshold);
+ }
+
+ function normalizeArray(value) {
+   if (!value) return [];
+   if (Array.isArray(value)) return value.filter(Boolean);
+   return [value].filter(Boolean);
+ }
+
+ function loadJsonFile(filePath) {
+   if (!filePath || !fs.existsSync(filePath)) return {};
+   const raw = fs.readFileSync(filePath, 'utf8');
+   return JSON.parse(raw);
+ }
+
+ function findDefaultConfig(rootDir) {
+   const names = [
+     'kafkacode.config.json',
+     '.kafkacoderc',
+     '.kafkacoderc.json'
+   ];
+   return names
+     .map(name => path.join(rootDir, name))
+     .find(filePath => fs.existsSync(filePath));
+ }
+
+ function loadConfig(rootDir, configPath) {
+   const resolvedRoot = path.resolve(rootDir);
+   const resolvedConfig = configPath
+     ? path.resolve(configPath)
+     : findDefaultConfig(resolvedRoot);
+
+   const config = resolvedConfig ? loadJsonFile(resolvedConfig) : {};
+   config.__path = resolvedConfig || null;
+   return config;
+ }
+
+ function getFindingFingerprint(finding, cwd = process.cwd()) {
+   const filePath = finding.file_path || finding.file || '';
+   const relativeFile = filePath
+     ? path.relative(cwd, filePath).split(path.sep).join('/')
+     : '';
+   const source = [
+     finding.rule_id || finding.finding_type || 'unknown-rule',
+     relativeFile,
+     (finding.code_snippet || '').trim()
+   ].join('\n');
+   return crypto.createHash('sha256').update(source).digest('hex');
+ }
+
+ function loadBaseline(filePath) {
+   if (!filePath || !fs.existsSync(filePath)) {
+     return new Set();
+   }
+
+   const parsed = loadJsonFile(filePath);
+   const findings = Array.isArray(parsed) ? parsed : normalizeArray(parsed.findings);
+   return new Set(findings.map(item => {
+     if (typeof item === 'string') return item;
+     return item && item.fingerprint;
+   }).filter(Boolean));
+ }
+
+ function writeBaseline(filePath, findings, cwd = process.cwd()) {
+   const entries = findings.map(finding => {
+     const filePathValue = finding.file_path || '';
+     return {
+       fingerprint: getFindingFingerprint(finding, cwd),
+       ruleId: finding.rule_id || '',
+       file: filePathValue ? path.relative(cwd, filePathValue).split(path.sep).join('/') : '',
+       line: finding.line_number || 0,
+       severity: finding.severity || 'Low',
+       description: finding.description || ''
+     };
+   });
+
+   const payload = {
+     version: 1,
+     generatedAt: new Date().toISOString(),
+     findings: entries
+   };
+
+   fs.writeFileSync(filePath, JSON.stringify(payload, null, 2) + '\n');
+ }
+
+ module.exports = {
+   SEVERITY_ORDER,
+   normalizeSeverity,
+   severityRank,
+   isAtLeastSeverity,
+   normalizeArray,
+   loadConfig,
+   loadBaseline,
+   writeBaseline,
+   getFindingFingerprint
+ };
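Review note: the fingerprint helper in this new module (loaded as `./ConfigLoader` from `cli.js`) hashes the rule ID, the path relative to the working directory, and the trimmed snippet, and leaves the line number out. The block below is a minimal standalone sketch of that scheme, re-implemented for illustration rather than imported from the package. The `fingerprint` name, the `repo` directory, and the sample findings are assumptions, and the sketch skips the `finding.file` fallback.

```javascript
const path = require('path');
const crypto = require('crypto');

// Simplified copy of the hashing scheme in the diff: rule, relative path, trimmed snippet.
function fingerprint(finding, cwd) {
  const relativeFile = finding.file_path
    ? path.relative(cwd, finding.file_path).split(path.sep).join('/')
    : '';
  const source = [
    finding.rule_id || finding.finding_type || 'unknown-rule',
    relativeFile,
    (finding.code_snippet || '').trim()
  ].join('\n');
  return crypto.createHash('sha256').update(source).digest('hex');
}

// Hypothetical findings: the same flagged line before and after it moved in the file.
const cwd = path.resolve('repo');
const before = {
  rule_id: 'KC_SECRET_AWS_ACCESS_KEY',
  file_path: path.join(cwd, 'src', 'config.js'),
  line_number: 12,
  code_snippet: '  const key = "AKIAIOSFODNN7EXAMPLE";'
};
const after = { ...before, line_number: 40 };

// Hypothetical finding where the flagged line itself was edited.
const edited = { ...before, code_snippet: 'const key = "AKIAIOSFODNN7EXAMPLF";' };

const sameAfterMove = fingerprint(before, cwd) === fingerprint(after, cwd);
const changedAfterEdit = fingerprint(before, cwd) !== fingerprint(edited, cwd);
console.log({ sameAfterMove, changedAfterEdit });
```

If this reading is right, a baselined finding stays suppressed when surrounding code shifts, while any edit to the flagged line produces a new fingerprint. Two identical lines in the same file would also share one fingerprint, which may be worth confirming with the maintainers.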
@@ -3,19 +3,30 @@ const path = require('path');
3
3
  const { minimatch } = require('minimatch');
4
4
 
5
5
  class FileScanner {
6
- constructor(rootDir) {
6
+ constructor(rootDir, options = {}) {
7
7
  this.rootDir = path.resolve(rootDir);
8
- this.supportedExtensions = new Set(['.py', '.js', '.ts', '.java', '.go', '.rb', '.php']);
8
+ this.supportedExtensions = new Set([
9
+ '.py', '.js', '.jsx', '.ts', '.tsx', '.java', '.go', '.rb', '.php',
10
+ '.env', '.json', '.yaml', '.yml', '.toml', '.ini', '.properties',
11
+ '.xml', '.tf', '.tfvars', '.dockerfile', '.md', '.sh'
12
+ ]);
13
+ this.supportedFilenames = new Set([
14
+ '.env', '.env.example', '.env.local', '.env.development', '.env.production',
15
+ 'Dockerfile', 'dockerfile', 'Containerfile', 'Makefile'
16
+ ]);
9
17
  this.ignoreDirs = new Set([
10
18
  '.git', 'node_modules', 'venv', '__pycache__', '.venv', 'env',
11
19
  'build', 'dist', 'target', 'out', '.next', '.nuxt', 'vendor',
12
- 'coverage', '.coverage', '.pytest_cache', '.mypy_cache'
20
+ 'coverage', '.coverage', '.pytest_cache', '.mypy_cache',
21
+ '.vitepress', '.cache', '.turbo'
13
22
  ]);
14
- this.gitignorePatterns = this._loadGitignore();
23
+ this.extraIgnorePatterns = options.exclude || [];
24
+ this.gitignorePatterns = this._loadIgnoreFile('.gitignore');
25
+ this.kafkaCodeIgnorePatterns = this._loadIgnoreFile('.kafkacodeignore');
15
26
  }
16
27
 
17
- _loadGitignore() {
18
- const gitignorePath = path.join(this.rootDir, '.gitignore');
28
+ _loadIgnoreFile(fileName) {
29
+ const gitignorePath = path.join(this.rootDir, fileName);
19
30
  const patterns = [];
20
31
 
21
32
  if (fs.existsSync(gitignorePath)) {
@@ -48,9 +59,16 @@ class FileScanner {
48
59
  }
49
60
  }
50
61
 
51
- // Check gitignore patterns
52
- for (const pattern of this.gitignorePatterns) {
53
- if (minimatch(relativePath, pattern) || minimatch(path.basename(filePath), pattern)) {
62
+ // Check ignore patterns
63
+ const ignorePatterns = [
64
+ ...this.gitignorePatterns,
65
+ ...this.kafkaCodeIgnorePatterns,
66
+ ...this.extraIgnorePatterns
67
+ ];
68
+ for (const pattern of ignorePatterns) {
69
+ if (minimatch(relativePath, pattern, { dot: true }) ||
70
+ minimatch(path.basename(filePath), pattern, { dot: true }) ||
71
+ minimatch(relativePath, `${pattern}/**`, { dot: true })) {
54
72
  return true;
55
73
  }
56
74
  }
@@ -75,7 +93,7 @@ class FileScanner {
75
93
  files.push(...this._scanDirectory(fullPath));
76
94
  } else if (entry.isFile()) {
77
95
  const ext = path.extname(entry.name);
78
- if (this.supportedExtensions.has(ext)) {
96
+ if (this.supportedExtensions.has(ext) || this.supportedFilenames.has(entry.name)) {
79
97
  files.push(fullPath);
80
98
  }
81
99
  }
@@ -93,4 +111,4 @@ class FileScanner {
93
111
  }
94
112
  }
95
113
 
96
- module.exports = FileScanner;
114
+ module.exports = FileScanner;
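Review note: the new `supportedFilenames` set looks necessary because Node's `path.extname` returns an empty string for dotfiles such as `.env` and for extensionless names such as `Dockerfile`, so the extension set alone would miss them. The block below is a minimal sketch of the combined check with shortened copies of both sets. The `isScannable` helper name and the sample file names are assumptions.

```javascript
const path = require('path');

// Shortened copies of the two sets from the diff, for illustration only.
const supportedExtensions = new Set(['.js', '.env', '.yaml', '.dockerfile']);
const supportedFilenames = new Set(['.env', '.env.local', 'Dockerfile']);

// Hypothetical helper mirroring the condition used when walking a directory.
function isScannable(fileName) {
  const ext = path.extname(fileName);
  return supportedExtensions.has(ext) || supportedFilenames.has(fileName);
}

const samples = ['app.js', '.env', '.env.local', 'prod.env', 'Dockerfile', 'notes.txt'];
const results = Object.fromEntries(samples.map(name => [name, isScannable(name)]));
console.log(results);
```

In this sketch `prod.env` is picked up through the extension set, while `.env`, `.env.local`, and `Dockerfile` are picked up only through the filename set.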
@@ -250,12 +250,15 @@ ${codeSnippet}`;
250
250
  findings.push({
251
251
  file_path: filePath,
252
252
  line_number: vuln.line_number || startLine,
253
+ rule_id: 'KC_AI_CONTEXT',
253
254
  severity: vuln.severity || 'Medium',
254
255
  finding_type: 'Context-Based Issue',
255
256
  description: vuln.description || 'Privacy vulnerability detected',
257
+ confidence: vuln.confidence || 'Medium',
256
258
  code_snippet: this._getCodeSnippet(content, vuln.line_number || startLine),
257
259
  suggestion: vuln.suggestion || 'Review and address the identified issue.',
258
- source: 'llm'
260
+ source: 'llm',
261
+ secret: false
259
262
  });
260
263
  }
261
264
 
@@ -6,62 +6,199 @@ class PatternScanner {
6
6
  _initPatterns() {
7
7
  return {
8
8
  'aws_access_key': {
9
+ id: 'KC_SECRET_AWS_ACCESS_KEY',
9
10
  pattern: /\b(AKIA[0-9A-Z]{16})\b/g,
10
11
  severity: 'Critical',
11
12
  type: 'Hardcoded Secret',
12
- description: 'AWS Access Key ID detected'
13
+ description: 'AWS Access Key ID detected',
14
+ confidence: 'High',
15
+ secret: true
13
16
  },
14
17
  'aws_secret_key': {
15
- pattern: /\b[A-Za-z0-9/+=]{40}\b/g,
18
+ id: 'KC_SECRET_AWS_SECRET_KEY',
19
+ pattern: /\b(?:aws_?secret_?access_?key|AWS_SECRET_ACCESS_KEY)\b\s*[:=]\s*["']?([A-Za-z0-9/+=]{40})/gi,
16
20
  severity: 'Critical',
17
21
  type: 'Hardcoded Secret',
18
- description: 'Potential AWS Secret Access Key detected'
22
+ description: 'AWS Secret Access Key detected',
23
+ confidence: 'High',
24
+ secret: true,
25
+ secretGroup: 1
19
26
  },
20
27
  'stripe_key': {
21
- pattern: /\b(sk_live_[0-9a-zA-Z]{24}|pk_live_[0-9a-zA-Z]{24})\b/g,
28
+ id: 'KC_SECRET_STRIPE_KEY',
29
+ pattern: /\b((?:sk|pk|rk)_live_[0-9a-zA-Z]{24,})\b/g,
22
30
  severity: 'Critical',
23
31
  type: 'Hardcoded Secret',
24
- description: 'Stripe API key detected'
32
+ description: 'Stripe live API key detected',
33
+ confidence: 'High',
34
+ secret: true
25
35
  },
26
36
  'private_key': {
27
- pattern: /-----BEGIN\s+(RSA\s+)?PRIVATE\s+KEY-----/g,
37
+ id: 'KC_SECRET_PRIVATE_KEY',
38
+ pattern: /-----BEGIN\s+(?:RSA\s+|DSA\s+|EC\s+|OPENSSH\s+)?PRIVATE\s+KEY-----/g,
28
39
  severity: 'Critical',
29
40
  type: 'Hardcoded Secret',
30
- description: 'Private key detected'
41
+ description: 'Private key detected',
42
+ confidence: 'High',
43
+ secret: true
44
+ },
45
+ 'github_token': {
46
+ id: 'KC_SECRET_GITHUB_TOKEN',
47
+ pattern: /\b((?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{30,}|github_pat_[A-Za-z0-9_]{22,}_[A-Za-z0-9_]{59,})\b/g,
48
+ severity: 'Critical',
49
+ type: 'Hardcoded Secret',
50
+ description: 'GitHub token detected',
51
+ confidence: 'High',
52
+ secret: true
53
+ },
54
+ 'openai_key': {
55
+ id: 'KC_SECRET_OPENAI_KEY',
56
+ pattern: /\b(sk-(?:proj-)?[A-Za-z0-9_-]{32,})\b/g,
57
+ severity: 'Critical',
58
+ type: 'Hardcoded Secret',
59
+ description: 'OpenAI API key detected',
60
+ confidence: 'High',
61
+ secret: true
62
+ },
63
+ 'anthropic_key': {
64
+ id: 'KC_SECRET_ANTHROPIC_KEY',
65
+ pattern: /\b(sk-ant-[A-Za-z0-9_-]{32,})\b/g,
66
+ severity: 'Critical',
67
+ type: 'Hardcoded Secret',
68
+ description: 'Anthropic API key detected',
69
+ confidence: 'High',
70
+ secret: true
71
+ },
72
+ 'google_api_key': {
73
+ id: 'KC_SECRET_GOOGLE_API_KEY',
74
+ pattern: /\b(AIza[0-9A-Za-z_-]{35})\b/g,
75
+ severity: 'Critical',
76
+ type: 'Hardcoded Secret',
77
+ description: 'Google API key detected',
78
+ confidence: 'High',
79
+ secret: true
80
+ },
81
+ 'slack_token': {
82
+ id: 'KC_SECRET_SLACK_TOKEN',
83
+ pattern: /\b(xox[baprs]-[A-Za-z0-9-]{20,})\b/g,
84
+ severity: 'Critical',
85
+ type: 'Hardcoded Secret',
86
+ description: 'Slack token detected',
87
+ confidence: 'High',
88
+ secret: true
89
+ },
90
+ 'slack_webhook': {
91
+ id: 'KC_SECRET_SLACK_WEBHOOK',
92
+ pattern: /\b(https:\/\/hooks\.slack\.com\/services\/T[A-Z0-9]+\/B[A-Z0-9]+\/[A-Za-z0-9]+)\b/g,
93
+ severity: 'Critical',
94
+ type: 'Hardcoded Secret',
95
+ description: 'Slack webhook URL detected',
96
+ confidence: 'High',
97
+ secret: true
98
+ },
99
+ 'sendgrid_key': {
100
+ id: 'KC_SECRET_SENDGRID_KEY',
101
+ pattern: /\b(SG\.[A-Za-z0-9_-]{16,}\.[A-Za-z0-9_-]{16,})\b/g,
102
+ severity: 'Critical',
103
+ type: 'Hardcoded Secret',
104
+ description: 'SendGrid API key detected',
105
+ confidence: 'High',
106
+ secret: true
107
+ },
108
+ 'npm_token': {
109
+ id: 'KC_SECRET_NPM_TOKEN',
110
+ pattern: /\b(npm_[A-Za-z0-9]{36,})\b/g,
111
+ severity: 'Critical',
112
+ type: 'Hardcoded Secret',
113
+ description: 'npm token detected',
114
+ confidence: 'High',
115
+ secret: true
116
+ },
117
+ 'jwt': {
118
+ id: 'KC_SECRET_JWT',
119
+ pattern: /\b(eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,})\b/g,
120
+ severity: 'High',
121
+ type: 'Hardcoded Secret',
122
+ description: 'JSON Web Token detected',
123
+ confidence: 'Medium',
124
+ secret: true
125
+ },
126
+ 'database_url': {
127
+ id: 'KC_SECRET_DATABASE_URL',
128
+ pattern: /\b((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s"']+:[^\s"']+@[^\s"']+)/gi,
129
+ severity: 'Critical',
130
+ type: 'Hardcoded Secret',
131
+ description: 'Database connection URL with credentials detected',
132
+ confidence: 'High',
133
+ secret: true
31
134
  },
32
135
  'high_entropy': {
136
+ id: 'KC_SECRET_HIGH_ENTROPY',
33
137
  pattern: /\b[A-Za-z0-9+/=]{32,}\b/g,
34
138
  severity: 'Medium',
35
139
  type: 'Hardcoded Secret',
36
- description: 'High entropy string (potential secret)'
140
+ description: 'High entropy string (potential secret)',
141
+ confidence: 'Low',
142
+ secret: true
37
143
  },
38
144
  'email': {
145
+ id: 'KC_PII_EMAIL',
39
146
  pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/gi,
40
147
  severity: 'Medium',
41
148
  type: 'PII Detected',
42
- description: 'Email address detected'
149
+ description: 'Email address detected',
150
+ confidence: 'High',
151
+ secret: false
43
152
  },
44
153
  'phone': {
154
+ id: 'KC_PII_PHONE',
45
155
  pattern: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b|\b\(\d{3}\)\s?\d{3}[-.]?\d{4}\b/g,
46
156
  severity: 'Medium',
47
157
  type: 'PII Detected',
48
- description: 'Phone number detected'
158
+ description: 'Phone number detected',
159
+ confidence: 'Medium',
160
+ secret: false
49
161
  },
50
162
  'ip_address': {
51
- pattern: /\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b/g,
163
+ id: 'KC_PII_IP_ADDRESS',
164
+ pattern: /\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b/g,
52
165
  severity: 'Low',
53
166
  type: 'PII Detected',
54
- description: 'IP address detected'
167
+ description: 'IP address detected',
168
+ confidence: 'Medium',
169
+ secret: false
55
170
  },
56
171
  'sensitive_keywords': {
57
- pattern: /\b(password|secret|api_key|token|ssn|credit_card|credentials)\s*[=:]\s*["']?[^"'\s]+/gi,
172
+ id: 'KC_SECRET_SENSITIVE_ASSIGNMENT',
173
+ pattern: /\b(password|passwd|secret|api[_-]?key|token|ssn|credit[_-]?card|credentials?)\b\s*[:=]\s*["']?([^"'\s,;]+)/gi,
58
174
  severity: 'High',
59
175
  type: 'Hardcoded Secret',
60
- description: 'Sensitive keyword in assignment context'
176
+ description: 'Sensitive keyword in assignment context',
177
+ confidence: 'Medium',
178
+ secret: true,
179
+ secretGroup: 2
61
180
  }
62
181
  };
63
182
  }
64
183
 
184
+ _isLikelyPlaceholder(data) {
185
+ const value = (data || '').toLowerCase();
186
+ return [
187
+ 'example', 'sample', 'dummy', 'test', 'fake', 'placeholder',
188
+ 'changeme', 'your_', 'your-', '<', 'xxxx', '****'
189
+ ].some(token => value.includes(token));
190
+ }
191
+
192
+ _defaultSuggestion(patternInfo) {
193
+ if (patternInfo.type === 'PII Detected') {
194
+ return 'Avoid hardcoding personal data; load it from approved runtime sources and redact it from logs.';
195
+ }
196
+ if (patternInfo.id === 'KC_SECRET_HIGH_ENTROPY') {
197
+ return 'Review this value and move secrets to environment variables or a secrets manager.';
198
+ }
199
+ return 'Move credentials to environment variables or a secrets manager, then rotate the exposed value.';
200
+ }
201
+
65
202
  _calculateEntropy(data) {
66
203
  if (!data) return 0;
67
204
 
@@ -103,24 +240,28 @@ class PatternScanner {
103
240
  if (matchedText.length < 20) {
104
241
  continue;
105
242
  }
106
- }
107
-
108
- // Skip common false positives for AWS secret pattern
109
- if (patternName === 'aws_secret_key') {
110
- const matchedText = match[0];
111
- const falsePositives = ['example', 'dummy', 'test', 'fake'];
112
- if (falsePositives.some(fp => matchedText.toLowerCase().includes(fp))) {
243
+ if (this._isLikelyPlaceholder(matchedText)) {
113
244
  continue;
114
245
  }
115
246
  }
116
247
 
248
+ const matchedText = patternInfo.secretGroup ? match[patternInfo.secretGroup] : match[0];
249
+ if (patternInfo.secret && this._isLikelyPlaceholder(matchedText)) {
250
+ continue;
251
+ }
252
+
117
253
  const finding = {
118
254
  file_path: filePath,
119
255
  line_number: lineNum + 1,
256
+ rule_id: patternInfo.id,
120
257
  severity: patternInfo.severity,
121
258
  finding_type: patternInfo.type,
122
259
  description: patternInfo.description,
123
- code_snippet: line.trim()
260
+ suggestion: patternInfo.suggestion || this._defaultSuggestion(patternInfo),
261
+ confidence: patternInfo.confidence || 'Medium',
262
+ code_snippet: line.trim(),
263
+ matched_text: matchedText,
264
+ secret: patternInfo.secret === true
124
265
  };
125
266
 
126
267
  findings.push(finding);
@@ -132,4 +273,4 @@ class PatternScanner {
132
273
  }
133
274
  }
134
275
 
135
- module.exports = PatternScanner;
276
+ module.exports = PatternScanner;
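Review note: `secretGroup` lets a rule match a whole assignment while only the captured value is checked against the placeholder list and later redacted. The block below is a minimal sketch using the `KC_SECRET_SENSITIVE_ASSIGNMENT` regex and the placeholder tokens copied from the diff. The `scanLine` helper and the sample lines are assumptions, not part of the package.

```javascript
const rule = {
  id: 'KC_SECRET_SENSITIVE_ASSIGNMENT',
  pattern: /\b(password|passwd|secret|api[_-]?key|token|ssn|credit[_-]?card|credentials?)\b\s*[:=]\s*["']?([^"'\s,;]+)/gi,
  secretGroup: 2
};

const placeholderTokens = [
  'example', 'sample', 'dummy', 'test', 'fake', 'placeholder',
  'changeme', 'your_', 'your-', '<', 'xxxx', '****'
];

// Substring check, as in the diff: any token anywhere in the value counts.
function isLikelyPlaceholder(value) {
  const lower = (value || '').toLowerCase();
  return placeholderTokens.some(token => lower.includes(token));
}

// Hypothetical helper: captured values that survive the placeholder filter.
function scanLine(line) {
  const values = [];
  for (const match of line.matchAll(rule.pattern)) {
    const matchedText = rule.secretGroup ? match[rule.secretGroup] : match[0];
    if (!isLikelyPlaceholder(matchedText)) values.push(matchedText);
  }
  return values;
}

const real = scanLine('const password = "hunter2-prod";');
const placeholder = scanLine('API_KEY=your_api_key_here');
const substring = scanLine('token: "latest-build-9f3a"');
console.log({ real, placeholder, substring });
```

The third sample is skipped because `latest` contains the token `test`. Since the filter is a plain substring match, real values that happen to contain one of these tokens would be suppressed too, which may be an acceptable trade-off but is worth knowing when triaging missed findings.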
@@ -7,7 +7,9 @@ try {
7
7
  } catch (_) { /* fall back to the default */ }
8
8
 
9
9
  class ReportGenerator {
10
- constructor() {
10
+ constructor(options = {}) {
11
+ this.showSecrets = options.showSecrets === true;
12
+ this.plain = options.plain === true;
11
13
  this.reportTime = new Date();
12
14
  this.severityIcons = {
13
15
  'Critical': '🚨',
@@ -72,6 +74,41 @@ class ReportGenerator {
72
74
  return { grade, url, markdown: `![Privacy Grade: ${grade}](${url})` };
73
75
  }
74
76
 
77
+ _maskValue(value) {
78
+ if (!value) return value;
79
+ if (this.showSecrets) return value;
80
+ if (value.length <= 6) return '[REDACTED]';
81
+ return `${value.slice(0, 3)}...[REDACTED]...${value.slice(-3)}`;
82
+ }
83
+
84
+ _sanitizeSnippet(finding) {
85
+ const snippet = finding.code_snippet || '';
86
+ if (this.showSecrets || !snippet) return snippet;
87
+
88
+ let sanitized = snippet;
89
+ if (finding.matched_text) {
90
+ sanitized = sanitized.split(finding.matched_text).join(this._maskValue(finding.matched_text));
91
+ }
92
+
93
+ // Defense in depth for snippets from contextual AI findings.
94
+ sanitized = sanitized
95
+ .replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL_REDACTED]')
96
+ .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
97
+ .replace(/\b\(\d{3}\)\s?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
98
+ .replace(/\b((?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{20,}|github_pat_[A-Za-z0-9_]{20,})\b/g, '[TOKEN_REDACTED]')
99
+ .replace(/\b(sk-(?:proj-)?[A-Za-z0-9_-]{24,}|sk-ant-[A-Za-z0-9_-]{24,})\b/g, '[API_KEY_REDACTED]')
100
+ .replace(/\b(AKIA[0-9A-Z]{16})\b/g, '[AWS_ACCESS_KEY_REDACTED]')
101
+ .replace(/\b((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s"']+:[^\s"']+@[^\s"']+)/gi, '[DATABASE_URL_REDACTED]');
102
+
103
+ return sanitized;
104
+ }
105
+
106
+ _displayPath(filePath) {
107
+ if (!filePath) return 'Unknown';
108
+ const relative = path.relative(process.cwd(), filePath).split(path.sep).join('/');
109
+ return relative && !relative.startsWith('..') ? relative : filePath;
110
+ }
111
+
75
112
  // Public: render findings as a structured JSON report.
76
113
  generateJson(scanDir, findings, fileCount) {
77
114
  const severityCounts = { Critical: 0, High: 0, Medium: 0, Low: 0 };
@@ -93,14 +130,17 @@ class ReportGenerator {
93
130
  severityCounts
94
131
  },
95
132
  findings: findings.map((f) => ({
133
+ ruleId: f.rule_id || '',
96
134
  file: rel(f.file_path),
97
135
  line: f.line_number || 0,
98
136
  severity: f.severity || 'Low',
99
137
  type: f.finding_type || 'Issue',
100
138
  description: f.description || '',
101
139
  suggestion: f.suggestion || '',
140
+ confidence: f.confidence || 'Medium',
102
141
  source: f.source === 'llm' ? 'ai' : 'pattern',
103
- snippet: f.code_snippet || ''
142
+ fingerprint: f.fingerprint || '',
143
+ snippet: this._sanitizeSnippet(f)
104
144
  }))
105
145
  };
106
146
  return JSON.stringify(report, null, 2);
@@ -119,19 +159,23 @@ class ReportGenerator {
119
159
 
120
160
  const rules = new Map();
121
161
  for (const f of findings) {
122
- const id = slug(f.finding_type);
162
+ const id = f.rule_id || slug(f.finding_type);
123
163
  if (!rules.has(id)) {
124
164
  rules.set(id, {
125
165
  id,
126
166
  name: f.finding_type || 'Issue',
127
- shortDescription: { text: f.finding_type || 'Issue' },
128
- defaultConfiguration: { level: levelFor(f.severity) }
167
+ shortDescription: { text: f.description || f.finding_type || 'Issue' },
168
+ defaultConfiguration: { level: levelFor(f.severity) },
169
+ properties: {
170
+ confidence: f.confidence || 'Medium',
171
+ category: f.finding_type || 'Issue'
172
+ }
129
173
  });
130
174
  }
131
175
  }
132
176
 
133
177
  const results = findings.map((f) => ({
134
- ruleId: slug(f.finding_type),
178
+ ruleId: f.rule_id || slug(f.finding_type),
135
179
  level: levelFor(f.severity),
136
180
  message: { text: f.description || 'Privacy issue detected' },
137
181
  locations: [{
@@ -139,7 +183,14 @@ class ReportGenerator {
139
183
  artifactLocation: { uri: rel(f.file_path) },
140
184
  region: { startLine: Math.max(1, f.line_number || 1) }
141
185
  }
142
- }]
186
+ }],
187
+ partialFingerprints: f.fingerprint ? { kafkacode: f.fingerprint } : undefined,
188
+ properties: {
189
+ severity: f.severity || 'Low',
190
+ confidence: f.confidence || 'Medium',
191
+ source: f.source === 'llm' ? 'ai' : 'pattern',
192
+ snippet: this._sanitizeSnippet(f)
193
+ }
143
194
  }));
144
195
 
145
196
  const sarif = {
@@ -186,7 +237,64 @@ class ReportGenerator {
186
237
  return colors[grade] || '⚪';
187
238
  }
188
239
 
240
+ _countSeverities(findings) {
241
+ const severityCounts = {};
242
+ this.severityOrder.forEach(severity => {
243
+ severityCounts[severity] = 0;
244
+ });
245
+
246
+ for (const finding of findings) {
247
+ const severity = finding.severity || 'Low';
248
+ if (severityCounts.hasOwnProperty(severity)) {
249
+ severityCounts[severity]++;
250
+ }
251
+ }
252
+
253
+ return severityCounts;
254
+ }
255
+
256
+ generatePlainReport(scanDir, findings, fileCount) {
257
+ const severityCounts = this._countSeverities(findings);
258
+ const grade = this._calculateGrade(findings);
259
+ const lines = [
260
+ `KafkaCode Privacy Scan`,
261
+ `Directory: ${scanDir}`,
262
+ `Files scanned: ${fileCount}`,
263
+ `Issues: ${findings.length}`,
264
+ `Privacy grade: ${grade}`,
265
+ `Severity: Critical ${severityCounts.Critical}, High ${severityCounts.High}, Medium ${severityCounts.Medium}, Low ${severityCounts.Low}`,
266
+ ''
267
+ ];
268
+
269
+ if (!findings.length) {
270
+ lines.push('No privacy issues detected.');
271
+ return lines.join('\n');
272
+ }
273
+
274
+ const groupedFindings = this._groupFindingsBySeverity(findings);
275
+ for (const severity of this.severityOrder) {
276
+ const severityFindings = groupedFindings[severity];
277
+ if (!severityFindings.length) continue;
278
+
279
+ lines.push(`${severity.toUpperCase()}`);
280
+ for (const finding of severityFindings) {
281
+ lines.push(`- ${this._displayPath(finding.file_path)}:${finding.line_number || 0} ${finding.rule_id || 'N/A'} ${finding.description || 'Privacy issue detected'}`);
282
+ lines.push(` ${this._sanitizeSnippet(finding) || 'N/A'}`);
283
+ if (finding.suggestion) {
284
+ lines.push(` Suggestion: ${finding.suggestion}`);
285
+ }
286
+ }
287
+ lines.push('');
288
+ }
289
+
290
+ return lines.join('\n');
291
+ }
292
+
189
293
  generateReport(scanDir, findings, fileCount) {
294
+ if (this.plain) {
295
+ return this.generatePlainReport(scanDir, findings, fileCount);
296
+ }
297
+
190
298
  const reportLines = [];
191
299
 
192
300
  // ASCII Art Header
@@ -212,17 +320,7 @@ class ReportGenerator {
212
320
  );
213
321
 
214
322
  // Summary box
215
- const severityCounts = {};
216
- this.severityOrder.forEach(severity => {
217
- severityCounts[severity] = 0;
218
- });
219
-
220
- for (const finding of findings) {
221
- const severity = finding.severity || 'Low';
222
- if (severityCounts.hasOwnProperty(severity)) {
223
- severityCounts[severity]++;
224
- }
225
- }
323
+ const severityCounts = this._countSeverities(findings);
226
324
 
227
325
  const grade = this._calculateGrade(findings);
228
326
  const gradeColor = this._getGradeColor(grade);
@@ -296,12 +394,14 @@ class ReportGenerator {
296
394
  `┌── Issue #${i + 1} ──────────────────────────────────────────────────────────`,
297
395
  `│ ${icon} ${finding.description || 'Privacy issue detected'}`,
298
396
  '│',
299
- `│ 📍 Location: ${finding.file_path || 'Unknown'}:${finding.line_number || 0}`,
397
+ `│ 📍 Location: ${this._displayPath(finding.file_path)}:${finding.line_number || 0}`,
300
398
  `│ 🚨 Severity: ${finding.severity || 'Unknown'}`,
399
+ `│ 🎚️ Confidence: ${finding.confidence || 'Medium'}`,
400
+ `│ 🧩 Rule: ${finding.rule_id || 'N/A'}`,
301
401
  `│ 🔎 Detection: ${sourceBadge}`,
302
402
  '│',
303
403
  '│ 💾 Code:',
304
- `│ ${(finding.line_number || 0).toString().padStart(3)} │ ${finding.code_snippet || 'N/A'}`
404
+ `│ ${(finding.line_number || 0).toString().padStart(3)} │ ${this._sanitizeSnippet(finding) || 'N/A'}`
305
405
  );
306
406
 
307
407
  if (finding.suggestion) {
@@ -335,4 +435,4 @@ class ReportGenerator {
335
435
  }
336
436
  }
337
437
 
338
- module.exports = ReportGenerator;
438
+ module.exports = ReportGenerator;
package/dist/cli.js CHANGED
@@ -6,13 +6,28 @@ const fs = require('fs');
6
6
  const FileScanner = require('./FileScanner');
7
7
  const AnalysisEngine = require('./AnalysisEngine');
8
8
  const ReportGenerator = require('./ReportGenerator');
9
+ const {
10
+ loadConfig,
11
+ loadBaseline,
12
+ writeBaseline,
13
+ getFindingFingerprint,
14
+ isAtLeastSeverity,
15
+ normalizeSeverity,
16
+ normalizeArray
17
+ } = require('./ConfigLoader');
9
18
 
10
19
  const program = new Command();
20
+ const VERSION = require('../package.json').version;
21
+
22
+ function collect(value, previous) {
23
+ previous.push(value);
24
+ return previous;
25
+ }
11
26
 
12
27
  program
13
28
  .name('kafkacode')
14
29
  .description('KafkaCode - Privacy and Compliance Scanner')
15
- .version('1.4.0');
30
+ .version(VERSION);
16
31
 
17
32
  program
18
33
  .command('scan')
@@ -22,6 +37,14 @@ program
22
37
  .option('-b, --badge', 'Print a copy-paste privacy-grade badge for your README')
23
38
  .option('-f, --format <format>', 'Output format: console, json, or sarif', 'console')
24
39
  .option('-o, --output <file>', 'Write output to a file instead of stdout')
40
+ .option('-c, --config <file>', 'Path to a KafkaCode JSON config file')
41
+ .option('--exclude <pattern>', 'Exclude a glob pattern from scanning (repeatable)', collect, [])
42
+ .option('--baseline <file>', 'Ignore findings already present in a baseline file')
43
+ .option('--update-baseline <file>', 'Write current findings to a baseline file and exit 0')
44
+ .option('--min-severity <severity>', 'Minimum severity to report: low, medium, high, critical')
45
+ .option('--fail-on <severity>', 'Fail only when findings are at least this severity', 'low')
46
+ .option('--show-secrets', 'Show full matched snippets instead of redacting sensitive values')
47
+ .option('--plain', 'Use compact console output without the ASCII banner')
25
48
  .option('--no-ai', 'Disable AI-powered analysis (run pattern scan only)')
26
49
  .option('--no-fail', 'Exit 0 even when issues are found')
27
50
  .action(async (directory, options) => {
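
Commander passes a custom option processor the new value and the previously accumulated value, which is how `collect` makes `--exclude` repeatable. A small sketch of that accumulation without Commander itself; the `parseExcludes` helper and its `argv` input are illustrative assumptions, not part of the package:

```javascript
function collect(value, previous) {
  previous.push(value);
  return previous;
}

// Illustrative stand-in for Commander's parsing loop (handles only `--exclude <pattern>`).
function parseExcludes(argv) {
  let excludes = [];
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === '--exclude' && i + 1 < argv.length) {
      excludes = collect(argv[i + 1], excludes);
      i++; // skip the pattern that was just consumed
    }
  }
  return excludes;
}
```

This corresponds to an invocation such as `kafkacode scan . --exclude "dist/**" --exclude "*.min.js"`.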
@@ -44,18 +67,41 @@ async function runScan(directory, options = {}) {
44
67
  process.exit(1);
45
68
  }
46
69
 
70
+ let config = {};
71
+ try {
72
+ config = loadConfig(directory, options.config);
73
+ } catch (error) {
74
+ console.error(`Error loading config: ${error.message}`);
75
+ process.exit(1);
76
+ }
77
+
78
+ const minSeverity = normalizeSeverity(options.minSeverity || config.minSeverity || 'Low');
79
+ const failOn = normalizeSeverity(options.failOn || config.failOn || 'Low');
80
+ const excludes = [
81
+ ...normalizeArray(config.exclude),
82
+ ...normalizeArray(options.exclude)
83
+ ];
84
+ const baselinePath = options.baseline || config.baseline || '';
85
+ const updateBaselinePath = options.updateBaseline || '';
86
+ const showSecrets = options.showSecrets === true || config.showSecrets === true;
87
+ const plain = options.plain === true || config.plain === true;
88
+ const aiDisabled = options.ai === false || config.ai === false;
89
+
47
90
  if (verbose) {
48
91
  console.log('🚀 Starting KafkaCode privacy scan...');
92
+ if (config.__path) {
93
+ console.log(`⚙️ Loaded config: ${config.__path}`);
94
+ }
49
95
  }
50
96
 
51
97
  try {
52
98
  // Initialize components
53
- const fileScanner = new FileScanner(directory);
99
+ const fileScanner = new FileScanner(directory, { exclude: excludes });
54
100
  const analysisEngine = new AnalysisEngine(verbose);
55
- if (options.ai === false) {
101
+ if (aiDisabled) {
56
102
  analysisEngine.disableAi();
57
103
  }
58
- const reportGenerator = new ReportGenerator();
104
+ const reportGenerator = new ReportGenerator({ showSecrets, plain });
59
105
 
60
106
  // Scan for files
61
107
  if (verbose) {
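
The block above resolves each setting as CLI flag first, then config file, then a built-in default. Because `--fail-on` is declared with a Commander default of `'low'`, `options.failOn` appears to always be set when the scan is started from the CLI, so `config.failOn` would only take effect if that default were dropped. A sketch of the same precedence with a hypothetical `resolveSetting` helper that treats `undefined` as "not provided"; the `normalizeSeverity` shown here is an assumed implementation, not the one in `ConfigLoader`:

```javascript
// Hypothetical helper: the first defined value wins (CLI flag, then config, then default).
function resolveSetting(cliValue, configValue, fallback) {
  if (cliValue !== undefined) return cliValue;
  if (configValue !== undefined) return configValue;
  return fallback;
}

// Assumed normalization: 'high' -> 'High'; unknown values fall back to 'Low'.
function normalizeSeverity(value) {
  const levels = ['Low', 'Medium', 'High', 'Critical'];
  const wanted = String(value).toLowerCase();
  return levels.find(level => level.toLowerCase() === wanted) || 'Low';
}

const config = { minSeverity: 'medium', failOn: 'high' };
const cliOptions = { minSeverity: undefined, failOn: undefined };

const minSeverity = normalizeSeverity(resolveSetting(cliOptions.minSeverity, config.minSeverity, 'Low'));
const failOn = normalizeSeverity(resolveSetting(cliOptions.failOn, config.failOn, 'Low'));
```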
@@ -76,7 +122,27 @@ async function runScan(directory, options = {}) {
76
122
  if (verbose) {
77
123
  console.log('🔍 Performing privacy analysis...');
78
124
  }
79
- const findings = await analysisEngine.analyzeFiles(files);
125
+ let findings = await analysisEngine.analyzeFiles(files);
126
+ const scanRoot = path.resolve(directory);
127
+ findings = findings.map(finding => ({
128
+ ...finding,
129
+ fingerprint: getFindingFingerprint(finding, scanRoot)
130
+ }));
131
+
132
+ if (updateBaselinePath) {
133
+ const resolvedBaseline = path.resolve(updateBaselinePath);
134
+ writeBaseline(resolvedBaseline, findings, scanRoot);
135
+ console.error(`✅ Wrote baseline with ${findings.length} findings to ${resolvedBaseline}`);
136
+ process.exit(0);
137
+ }
138
+
139
+ if (baselinePath) {
140
+ const resolvedBaseline = path.resolve(baselinePath);
141
+ const baseline = loadBaseline(resolvedBaseline);
142
+ findings = findings.filter(finding => !baseline.has(finding.fingerprint));
143
+ }
144
+
145
+ findings = findings.filter(finding => isAtLeastSeverity(finding.severity, minSeverity));
80
146
 
81
147
  // Render the findings in the requested format (validated above)
82
148
  let output;
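
Baseline filtering in this hunk depends on each finding carrying a stable `fingerprint`. `getFindingFingerprint`, `writeBaseline`, and `loadBaseline` live in `ConfigLoader` and are not shown here, so the following is only a sketch under assumptions: the fingerprint is taken to be a SHA-256 hash of the path relative to the scan root, the rule ID, and the trimmed snippet, and the baseline is held in memory as a `Set`. The names `fingerprintFinding` and `filterNewFindings` are hypothetical.

```javascript
const crypto = require('crypto');
const path = require('path');

// Assumed fingerprint recipe; the real getFindingFingerprint may hash different fields.
function fingerprintFinding(finding, scanRoot) {
  const relativePath = path
    .relative(scanRoot, path.resolve(scanRoot, finding.file_path || ''))
    .split(path.sep)
    .join('/');
  const parts = [relativePath, finding.rule_id || '', (finding.code_snippet || '').trim()];
  return crypto.createHash('sha256').update(parts.join('\n')).digest('hex');
}

// Keep only findings whose fingerprint is absent from the baseline set.
function filterNewFindings(findings, baseline, scanRoot) {
  return findings
    .map(finding => ({ ...finding, fingerprint: fingerprintFinding(finding, scanRoot) }))
    .filter(finding => !baseline.has(finding.fingerprint));
}
```

Leaving the line number out of the hash in this sketch keeps a fingerprint unchanged when unrelated edits shift the finding up or down in the file.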
@@ -98,7 +164,7 @@ async function runScan(directory, options = {}) {
98
164
 
99
165
  // Console-only extras — kept out of machine-readable output
100
166
  if (format === 'console' && !options.output) {
101
- if (options.ai !== false && !analysisEngine.aiEnabled()) {
167
+ if (!aiDisabled && !analysisEngine.aiEnabled()) {
102
168
  console.log('💡 Tip: set KAFKACODE_API_KEY to enable AI-powered contextual analysis. See the README.\n');
103
169
  }
104
170
  if (options.badge) {
@@ -108,8 +174,8 @@ async function runScan(directory, options = {}) {
108
174
  }
109
175
  }
110
176
 
111
- // Exit non-zero when issues are found, unless --no-fail was passed
112
- const shouldFail = options.fail !== false && findings.length > 0;
177
+ // Exit non-zero when findings meet the failure threshold, unless --no-fail was passed
178
+ const shouldFail = options.fail !== false && findings.some(finding => isAtLeastSeverity(finding.severity, failOn));
113
179
  process.exit(shouldFail ? 1 : 0);
114
180
 
115
181
  } catch (error) {
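
The new exit logic fails the run only when some finding reaches the `--fail-on` threshold. `isAtLeastSeverity` is imported from `ConfigLoader` and not shown in this diff, so the version below is an assumed implementation based on a simple rank order, and `exitCodeFor` is a hypothetical wrapper around the `shouldFail` expression:

```javascript
// Assumed ranking; mirrors the levels listed in the --min-severity help text.
const SEVERITY_RANK = { Low: 1, Medium: 2, High: 3, Critical: 4 };

function isAtLeastSeverity(severity, threshold) {
  // Unknown severities rank 0 and never meet a threshold;
  // an unknown threshold is treated as Low.
  return (SEVERITY_RANK[severity] || 0) >= (SEVERITY_RANK[threshold] || 1);
}

// Mirrors the shouldFail expression: --no-fail (fail === false) always exits 0.
function exitCodeFor(findings, failOn, { fail = true } = {}) {
  const shouldFail = fail && findings.some(finding => isAtLeastSeverity(finding.severity, failOn));
  return shouldFail ? 1 : 0;
}
```

Under these assumptions, a scan with only `Medium` findings and a `High` threshold exits 0, while a single `Critical` finding exits 1.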
@@ -132,4 +198,4 @@ process.on('SIGINT', () => {
132
198
  process.exit(1);
133
199
  });
134
200
 
135
- program.parse();
201
+ program.parse();
package/dist/index.js CHANGED
@@ -3,11 +3,13 @@ const PatternScanner = require('./PatternScanner');
3
3
  const LLMAnalyzer = require('./LLMAnalyzer');
4
4
  const AnalysisEngine = require('./AnalysisEngine');
5
5
  const ReportGenerator = require('./ReportGenerator');
6
+ const ConfigLoader = require('./ConfigLoader');
6
7
 
7
8
  module.exports = {
8
9
  FileScanner,
9
10
  PatternScanner,
10
11
  LLMAnalyzer,
11
12
  AnalysisEngine,
12
- ReportGenerator
13
- };
13
+ ReportGenerator,
14
+ ConfigLoader
15
+ };
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "kafkacode",
3
- "version": "1.4.1",
4
- "description": "AI-powered privacy and compliance scanner - find PII leaks, hardcoded secrets, and compliance violations in your source code",
3
+ "version": "1.5.0",
4
+ "description": "Open-source, local-first privacy code scanner for PII leaks, hardcoded secrets, GDPR/CCPA compliance, SARIF, and CI/CD",
5
5
  "main": "dist/index.js",
6
6
  "bin": {
7
7
  "kafkacode": "dist/cli.js"
@@ -21,13 +21,27 @@
21
21
  "gdpr",
22
22
  "ccpa",
23
23
  "secret-detection",
24
+ "secret-scanning",
25
+ "secrets-scanner",
26
+ "secrets",
27
+ "pii-detection",
24
28
  "code-analysis",
25
29
  "privacy-scanner",
30
+ "privacy-code-scanner",
26
31
  "ai-powered",
27
32
  "shift-left",
28
33
  "cli-tool",
29
34
  "security-scanner",
30
- "vulnerability-scanner"
35
+ "vulnerability-scanner",
36
+ "data-loss-prevention",
37
+ "dlp",
38
+ "gdpr-compliance",
39
+ "ccpa-compliance",
40
+ "sarif",
41
+ "github-actions",
42
+ "devsecops",
43
+ "appsec",
44
+ "data-privacy"
31
45
  ],
32
46
  "author": "KafkaLabs <contact@kafkalabs.com>",
33
47
  "license": "MIT",