kafkacode 1.2.0 → 1.3.0

package/CHANGELOG.md CHANGED
@@ -1,26 +1,31 @@
  # Changelog

- All notable changes to this project will be documented in this file.
+ All notable changes to this project are documented in this file.

- ## [1.0.0] - 2024-09-22
+ ## [1.3.0] - 2026-06-05

  ### Added
- - Initial release of KafkaCode Privacy Scanner
- - Pattern-based detection for secrets, API keys, and sensitive data
- - AI-powered contextual analysis using Grok LLM
- - Support for multiple programming languages (Python, JavaScript, TypeScript, Java, Go, Ruby, PHP)
- - Beautiful console reporting with severity levels
- - Privacy grading system (A+ to F)
- - CLI interface with scan command
- - API key obfuscation for commercial distribution
- - Gitignore pattern support
- - Comprehensive test suite
+ - Bring-your-own-key AI analysis: set `KAFKACODE_API_KEY` to call an
+   OpenAI-compatible provider directly (defaults to Groq). `KAFKACODE_API_URL`
+   and `KAFKACODE_MODEL` override the endpoint and model.
+ - `--badge` flag that prints a shareable privacy-grade badge for your README.
+ - `--no-ai` flag to force pattern-only scanning.

- ### Features
- - Detects AWS keys, Stripe keys, private keys
- - Identifies PII like emails, phone numbers, IP addresses
- - High entropy string detection for potential secrets
- - Context-aware privacy vulnerability analysis
- - Detailed suggestions for remediation
- - Verbose logging option
- - Exit codes for CI/CD integration
+ ### Changed
+ - Open-sourced under the MIT License.
+ - AI analysis is now opt-in. With no key configured, scanning is pattern-only
+   and fully offline - no code leaves the machine.
+
+ ### Removed
+ - Silent "mock" analysis fallback; on an API error the snippet is now skipped
+   instead of fabricating findings.
+
+ ## [1.2.0] - 2025-10-05
+
+ ### Added
+ - Pattern-based detection for hardcoded secrets, API keys, and PII.
+ - AI-powered contextual privacy analysis.
+ - Support for Python, JavaScript, TypeScript, Java, Go, Ruby, and PHP.
+ - Console reporting with severity levels and an A+ to F privacy grade.
+ - `.gitignore`-aware file scanning.
+ - Non-zero exit codes for CI/CD integration.
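The opt-in, bring-your-own-key behaviour described in the 1.3.0 entry can be sketched roughly as follows. This is a minimal illustration, not the package's code: `resolveAiConfig` and its `noAi` parameter are hypothetical names, the defaults are the ones shown in the README and `LLMAnalyzer` hunks of this diff, and the sketch deliberately ignores the separate `KAFKACODE_BACKEND_ENDPOINT` path.

```javascript
// Hypothetical helper: resolve AI settings from an environment-like object.
// Variable names and defaults follow this diff; the function itself is
// illustrative and is not part of kafkacode.
function resolveAiConfig(env, { noAi = false } = {}) {
  const apiKey = env.KAFKACODE_API_KEY || '';
  return {
    apiKey,
    apiUrl: env.KAFKACODE_API_URL || 'https://api.groq.com/openai/v1',
    model: env.KAFKACODE_MODEL || 'llama-3.1-8b-instant',
    // AI is opt-in: it needs a key and must not be switched off by --no-ai.
    enabled: !noAi && apiKey !== '',
  };
}

console.log(resolveAiConfig({}).enabled);                          // false
console.log(resolveAiConfig({ KAFKACODE_API_KEY: 'k' }).enabled);  // true
```

With no key the result is pattern-only scanning, which matches the "fully offline" default the entry describes.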
package/README.md CHANGED
@@ -1,98 +1,217 @@
- # KafkaCode Privacy Scanner
-
  <div align="center">
- <h3>by <a href="https://kafkalabs.com">KafkaLabs</a></h3>
- <p>🔍 <strong>Shift-left privacy and compliance scanner for source code</strong></p>
- <p>
- <a href="https://kafkalabs.com/kafka-code">Website</a> •
- <a href="https://github.com/nikhil-kapu/KafkacodeFnpm">GitHub</a> •
- <a href="https://www.npmjs.com/package/kafkacode">npm</a>
- </p>
+
+ <img src="docs/logo4.png" width="104" alt="KafkaCode logo" />
+
+ # KafkaCode
+
+ **Catch PII leaks, hardcoded secrets, and compliance risks before they ship.**
+
+ An AI-powered privacy &amp; compliance scanner for your source code. One command,
+ a clear **A+ → F privacy grade**, and CI-ready exit codes. Runs in seconds.
+
+ [![npm version](https://img.shields.io/npm/v/kafkacode.svg?color=cb3837&logo=npm)](https://www.npmjs.com/package/kafkacode)
+ [![npm downloads](https://img.shields.io/npm/dm/kafkacode.svg?color=cb3837)](https://www.npmjs.com/package/kafkacode)
+ [![CI](https://img.shields.io/github/actions/workflow/status/nikhil-kapu/kafkacode/ci.yml?branch=main&label=CI&logo=github)](https://github.com/nikhil-kapu/kafkacode/actions)
+ [![license](https://img.shields.io/npm/l/kafkacode.svg?color=blue)](LICENSE)
+ [![node](https://img.shields.io/node/v/kafkacode.svg?color=339933&logo=node.js)](package.json)
+ [![PRs welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
+
+ [Quickstart](#-quickstart) · [Features](#-features) · [Example](#-example-output) · [CI/CD](#-cicd-integration) · [How it works](#-how-it-works) · [Contributing](#-contributing)
+
  </div>

  ---

- KafkaCode is an AI-powered privacy scanner by **KafkaLabs** that helps developers identify potential privacy issues, PII leaks, and compliance violations in their source code before they reach production.
+ ## Why KafkaCode?
+
+ Most scanners stop at *"you leaked an AWS key."* KafkaCode goes further - it grades how
+ your code handles **personal data**, flags **GDPR/CCPA** risks, and catches hardcoded
+ secrets, with an optional **AI pass** for the context that regex alone can't see.

- ## Features
+ You get one number a whole team understands - a **privacy grade from A+ to F** - plus a
+ non-zero exit code that fails the build when something sensitive slips in.
+
+ ```bash
+ npx kafkacode scan .
+ ```

- - 🔍 **Pattern-based Detection**: Identifies hardcoded secrets, API keys, and sensitive data
- - 🤖 **AI-powered Analysis**: Uses advanced LLM analysis for contextual privacy issues
- - ⚡ **Fast & Efficient**: Scans entire codebases in seconds
- - 🎯 **Multiple File Types**: Supports Python, JavaScript, TypeScript, Java, Go, Ruby, PHP
- - 📊 **Detailed Reports**: Beautiful console reports with severity levels
- - 🚀 **CI/CD Ready**: Easy integration with build pipelines
+ No install. No signup. No config.

- ## Installation
+ ## ⚡ Quickstart

  ```bash
+ # Run it once, anywhere (no install)
+ npx kafkacode scan .
+
+ # Or install globally
  npm install -g kafkacode
+ kafkacode scan ./src --verbose
  ```

- Or using npx (no installation required):
- ```bash
- npx kafkacode scan /path/to/your/project
+ ## ✨ Features
+
+ - 🔑 **Secret detection** - AWS & Stripe keys, private keys, high-entropy strings
+ - 🕵️ **PII detection** - emails, phone numbers, IP addresses
+ - 🤖 **AI-powered analysis** - contextual privacy issues a regex would miss
+ - 🎓 **Privacy grade** - a single, shareable **A+ → F** score
+ - 🏷️ **Grade badge** - drop your score into your README (`--badge`)
+ - ⚡ **Fast & offline** - pattern scanning needs no network
+ - 🌐 **7 languages** - Python, JavaScript, TypeScript, Java, Go, Ruby, PHP
+ - 🚀 **CI/CD ready** - clean exit codes + a one-line GitHub Action
+
+ ## 📊 Example output
+
+ ```text
+ 🎯 PRIVACY SCAN REPORT
+ ════════════════════════════════════════════════════════════════
+
+ 📊 SCAN SUMMARY
+    📁 Directory: ./src
+    📄 Files Scanned: 18
+    🔍 Total Issues: 4
+    🏆 Privacy Grade: 🔴 F
+
+    🚨 Critical: 1   🔥 High: 1   ⚠️ Medium: 2   🔵 Low: 0
+
+ 🚨 CRITICAL
+ ┌─ AWS Access Key ID detected
+ │  📍 src/config.js:12
+ │  💡 Move credentials to environment variables or a secrets manager.
+ └─
+
+ ⚠️ MEDIUM
+ ┌─ Email address detected (PII)
+ │  📍 src/users.js:47
+ │  💡 Avoid hardcoding personal data; load it at runtime.
+ └─
  ```

- ## Usage
+ ## 🏷️ Privacy grade & badge
+
+ KafkaCode distills every scan into one grade:
+
+ | Grade | Meaning |
+ | :---: | ------- |
+ | 🟢 **A+ / A / A-** | Excellent - no or only low-severity issues |
+ | 🟡 **B+ / B / B-** | Good - a few medium-severity issues |
+ | 🟠 **C+ / C / C-** | Needs attention - high-severity issues present |
+ | 🔴 **D / F** | Critical privacy/secret exposure |
+
+ Show it off in your own README:

- **Basic Scan:**
  ```bash
- kafkacode scan /path/to/your/project
+ kafkacode scan . --badge
  ```

- **Verbose Output:**
- ```bash
- kafkacode scan /path/to/your/project --verbose
+ ```text
+ 🏷️ Privacy Grade Badge - paste into your README:
+
+ ![Privacy Grade: A+](https://img.shields.io/badge/Privacy%20Grade-A%2B-brightgreen)
  ```

- ## What it detects
+ → ![Privacy Grade: A+](https://img.shields.io/badge/Privacy%20Grade-A%2B-brightgreen)
+
+ ## 🚀 CI/CD integration

- - **Critical Issues**: AWS keys, Stripe keys, Private keys
- - **High Severity**: Sensitive keywords in assignment context
- - **Medium Severity**: Email addresses, Phone numbers, High entropy strings
- - **Low Severity**: IP addresses, URLs
+ ### GitHub Action

- ## Privacy Grade
+ ```yaml
+ # .github/workflows/privacy.yml
+ name: Privacy Scan
+ on: [push, pull_request]
+ jobs:
+   scan:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: nikhil-kapu/kafkacode@v1
+         with:
+           path: ./src
+ ```

- KafkaCode assigns a privacy grade (A+ to F) based on the severity and number of issues found:
+ ### Any CI / pre-commit

- - **A+/A/A-**: Excellent privacy practices
- - **B+/B/B-**: Good privacy practices with minor issues
- - **C+/C/C-**: Moderate privacy issues that should be addressed
- - **D**: Multiple high-severity privacy issues
- - **F**: Critical privacy vulnerabilities detected
+ ```bash
+ # Exits non-zero when issues are found, failing the build
+ npx kafkacode scan ./src
+ ```

- ## Example Output
+ ## 🔍 What it detects
+
+ | Severity | Examples |
+ | -------- | -------- |
+ | 🚨 **Critical** | AWS keys, Stripe live keys, private keys |
+ | 🔥 **High** | `password=`, `api_key=`, `token=` and other secrets in assignments |
+ | ⚠️ **Medium** | Emails, phone numbers, high-entropy strings |
+ | 🔵 **Low** | IP addresses |
+
+ ## 🧠 How it works

  ```
- 🎯 PRIVACY SCAN REPORT
- ═══════════════════════════════════════
+ your code ─▶ FileScanner ─▶ ┌─ PatternScanner (regex, fully offline)
+                             └─ LLMAnalyzer    (optional AI context)
+                                      │
+                                      ▼
+                        ReportGenerator ─▶ grade + findings + exit code
+ ```

- 📊 SCAN SUMMARY
- 📁 Directory: ./src
- ⏰ Timestamp: 2024-01-15 10:30:45
- 📄 Files Scanned: 25
- 🔍 Total Issues: 3
- 🏆 Privacy Grade: 🟡B-
+ Pattern-based detection runs entirely on your machine with no network calls. The
+ optional AI layer adds contextual findings for the cases static rules can't catch.
+
+ ## 🤖 AI mode (optional, bring-your-own-key)
+
+ Pattern scanning works out of the box with **no setup and no network calls**. To add
+ AI-powered contextual findings, bring your own API key - KafkaCode calls an
+ OpenAI-compatible chat API directly, defaulting to [Groq](https://console.groq.com/keys)
+ (which has a free tier):
+
+ ```bash
+ export KAFKACODE_API_KEY=your_key_here
+ kafkacode scan ./src
  ```

- ## License
+ | Variable | Default | Purpose |
+ | -------- | ------- | ------- |
+ | `KAFKACODE_API_KEY` | _(unset)_ | Your provider API key - **enables AI mode** |
+ | `KAFKACODE_API_URL` | `https://api.groq.com/openai/v1` | OpenAI-compatible base URL (Groq, OpenAI, OpenRouter, local models…) |
+ | `KAFKACODE_MODEL` | `llama-3.1-8b-instant` | Model name |

- MIT License - Copyright (c) 2025 KafkaLabs
+ Without a key, KafkaCode runs **pattern-only and never sends your code anywhere**.
+ Pass `--no-ai` to force pattern-only even when a key is set.

- See [LICENSE](LICENSE) file for details.
+ ## 🆚 How it compares

- ## About KafkaLabs
+ | | KafkaCode | gitleaks / trufflehog | semgrep |
+ | ---------------------------- | :-------: | :-------------------: | :-----: |
+ | Hardcoded secrets | ✅ | ✅ (deep, git log) | ➖ |
+ | PII / personal-data findings | ✅ | ➖ | ➖ |
+ | Privacy grade (A+ → F) | ✅ | ➖ | ➖ |
+ | AI contextual analysis | ✅ | ➖ | ➖ |
+ | Zero-config, one command | ✅ | ✅ | ➖ |

- KafkaCode is built by [KafkaLabs](https://kafkalabs.com), helping developers build privacy-first applications.
+ KafkaCode focuses on **privacy and developer-friendly grading** - it complements
+ deep secret scanners rather than replacing them.

- - 🌐 **Website**: [kafkalabs.com/kafka-code](https://kafkalabs.com/kafka-code)
- - 📧 **Contact**: contact@kafkalabs.com
- - 💬 **Issues**: [GitHub Issues](https://github.com/nikhil-kapu/KafkacodeFnpm/issues)
+ ## 🗺️ Roadmap

- ---
+ - [x] **Bring-your-own-key AI** - call Groq / OpenAI-compatible providers directly
+ - [ ] `--json` and **SARIF** output (GitHub Security tab integration)
+ - [ ] Config file &amp; `.kafkacodeignore`
+ - [ ] Baseline file to adopt on existing codebases
+ - [ ] More file types (`.env`, YAML, Terraform, Dockerfiles)
+
+ Ideas and PRs welcome - see [CONTRIBUTING.md](CONTRIBUTING.md).
+
+ ## 🤝 Contributing
+
+ Contributions of all kinds are welcome - bug reports, new detection patterns, and docs.
+ Start with [CONTRIBUTING.md](CONTRIBUTING.md), and please report security issues per our
+ [Security Policy](SECURITY.md).
+
+ ## 📄 License
+
+ [MIT](LICENSE) © KafkaLabs

  <div align="center">
- Made with ❤️ by <a href="https://kafkalabs.com">KafkaLabs</a>
- </div>
+ <sub>🛡️ Keep your code secure, keep your users safe.</sub>
+ </div>
@@ -10,6 +10,16 @@ class AnalysisEngine {
      this.llmAnalyzer.verbose = verbose;
    }

+   /** Force pattern-only scanning, even if an API key is configured. */
+   disableAi() {
+     this.llmAnalyzer.disabled = true;
+   }
+
+   /** Whether AI-powered analysis will run for this scan. */
+   aiEnabled() {
+     return this.llmAnalyzer.isEnabled();
+   }
+
    async analyzeFile(filePath) {
      if (this.verbose) {
        console.log(`Analyzing: ${filePath}`);
@@ -1,8 +1,28 @@
  const https = require('https');

+ /**
+  * LLMAnalyzer performs optional AI-powered contextual analysis.
+  *
+  * It is "bring your own key": the user supplies an API key and KafkaCode calls
+  * an OpenAI-compatible chat-completions endpoint directly (defaulting to Groq).
+  * When no key (and no self-hosted backend) is configured, AI analysis is simply
+  * skipped - pattern-based scanning still runs, and no code leaves the machine.
+  */
  class LLMAnalyzer {
    constructor() {
-     this.backendEndpoint = process.env.KAFKACODE_BACKEND_ENDPOINT || 'https://adorable-motivation-production.up.railway.app';
+     // Bring-your-own-key: direct, OpenAI-compatible provider call.
+     this.apiKey = process.env.KAFKACODE_API_KEY || '';
+     this.apiUrl = process.env.KAFKACODE_API_URL || 'https://api.groq.com/openai/v1';
+     this.model = process.env.KAFKACODE_MODEL || 'llama-3.1-8b-instant';
+
+     // Advanced: a self-hosted backend exposing POST /api/analyze. If set,
+     // it takes precedence over a direct provider call.
+     this.backendEndpoint = process.env.KAFKACODE_BACKEND_ENDPOINT || '';
+
+     // Delay between snippet requests, to stay within free-tier rate limits.
+     this.rateLimitMs = parseInt(process.env.KAFKACODE_RATE_LIMIT_MS || '250', 10);
+
+     this.disabled = false;
      this.verbose = false;
      this.interestKeywords = new Set([
        'api', 'db', 'database', 'user', 'password', 'save', 'fetch', 'send', 'log',
@@ -11,6 +31,11 @@ class LLMAnalyzer {
      ]);
    }

+   /** AI analysis is available only when a key or a backend endpoint is configured. */
+   isEnabled() {
+     return !this.disabled && Boolean(this.apiKey || this.backendEndpoint);
+   }
+
    _createSnippetPrompt(codeSnippet, filePath, startLine) {
      return `SYSTEM: You are an automated privacy and compliance analysis engine. Your task is to review the following CODE SNIPPET and identify potential privacy vulnerabilities based ONLY on the provided code. The snippet is from a larger file. Do not infer functionality outside of this snippet. Your analysis must focus on how the code handles data that could be considered sensitive or PII.

@@ -67,7 +92,6 @@ ${codeSnippet}`;
      const mergedRanges = [];
      for (const [start, end] of ranges) {
        if (mergedRanges.length > 0 && start <= mergedRanges[mergedRanges.length - 1][1] + 10) {
-         // Extend previous range
          const lastRange = mergedRanges[mergedRanges.length - 1];
          mergedRanges[mergedRanges.length - 1] = [lastRange[0], Math.max(lastRange[1], end)];
        } else {
@@ -78,27 +102,50 @@ ${codeSnippet}`;
      return mergedRanges;
    }

-   async callGrokApi(codeSnippet, filePath, startLine) {
-     try {
-       return await this._callBackendApi(codeSnippet, filePath, startLine);
-     } catch (error) {
-       if (this.verbose) {
-         console.log(` ❌ LLM call failed, using mock: ${error.message}`);
-       }
-       return this._mockSnippetResponse(codeSnippet, filePath, startLine);
+   /** Route a snippet to either the self-hosted backend or the direct provider. */
+   async _analyzeSnippet(codeSnippet, filePath, startLine) {
+     if (this.backendEndpoint) {
+       return this._callBackendApi(codeSnippet, filePath, startLine);
      }
+     return this._callProviderApi(codeSnippet, filePath, startLine);
    }

-   async _callBackendApi(codeSnippet, filePath, startLine) {
+   /** Call an OpenAI-compatible chat-completions endpoint directly (BYOK). */
+   async _callProviderApi(codeSnippet, filePath, startLine) {
+     const prompt = this._createSnippetPrompt(codeSnippet, filePath, startLine);
      const payload = JSON.stringify({
-       codeSnippet,
-       filePath,
-       startLine
+       model: this.model,
+       messages: [{ role: 'user', content: prompt }],
+       temperature: 0,
+       max_tokens: 800
      });

-     const url = new URL(this.backendEndpoint);
-     url.pathname = '/api/analyze';
+     const base = this.apiUrl.replace(/\/+$/, '');
+     const url = new URL(`${base}/chat/completions`);
+     const options = {
+       hostname: url.hostname,
+       port: url.port || 443,
+       path: url.pathname + url.search,
+       method: 'POST',
+       headers: {
+         'Content-Type': 'application/json',
+         'Authorization': `Bearer ${this.apiKey}`,
+         'Content-Length': Buffer.byteLength(payload)
+       },
+       timeout: 20000
+     };
+
+     const raw = await this._request(options, payload);
+     const result = JSON.parse(raw);
+     const content = result.choices && result.choices[0] && result.choices[0].message.content;
+     return this._parseVulnerabilities(content || '');
+   }

+   /** Call a self-hosted KafkaCode backend (POST /api/analyze). */
+   async _callBackendApi(codeSnippet, filePath, startLine) {
+     const payload = JSON.stringify({ codeSnippet, filePath, startLine });
+     const base = this.backendEndpoint.replace(/\/+$/, '');
+     const url = new URL(`${base}/api/analyze`);
      const options = {
        hostname: url.hostname,
        port: url.port || 443,
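The URL handling in this hunk (strip trailing slashes from the base, append the route, then parse) can be tried in isolation. The sketch below is an illustration under assumptions: `buildRequestTarget` is a hypothetical helper name, and only the WHATWG `URL` global that Node provides is used.

```javascript
// Hypothetical helper mirroring the URL handling in the hunk above:
// strip trailing slashes from the base URL, then append the route.
function buildRequestTarget(baseUrl, route) {
  const base = baseUrl.replace(/\/+$/, '');
  const url = new URL(`${base}${route}`);
  return {
    hostname: url.hostname,
    // url.port is '' when the URL uses its scheme's default port.
    port: url.port || 443,
    path: url.pathname + url.search,
  };
}

const target = buildRequestTarget('https://api.groq.com/openai/v1/', '/chat/completions');
console.log(target.path); // /openai/v1/chat/completions
```

Concatenating before parsing keeps a base path such as `/openai/v1`, whereas the removed code assigned `url.pathname` directly and so replaced whatever path the base URL carried. The `443` fallback assumes an HTTPS endpoint.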
@@ -108,68 +155,57 @@ ${codeSnippet}`;
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(payload)
        },
-       timeout: 12000 // 12 second timeout for CLI request
+       timeout: 20000
      };

+     const raw = await this._request(options, payload);
+     const result = JSON.parse(raw);
+     const content = result.data && result.data.choices && result.data.choices[0].message.content;
+     return this._parseVulnerabilities(content || '');
+   }
+
+   /** Parse the model's text response into a vulnerabilities array, defensively. */
+   _parseVulnerabilities(content) {
+     try {
+       const parsed = JSON.parse(content);
+       return Array.isArray(parsed.vulnerabilities) ? parsed.vulnerabilities : [];
+     } catch (err) {
+       // Some models wrap JSON in prose or fences - extract the JSON object.
+       const start = content.indexOf('{');
+       const end = content.lastIndexOf('}') + 1;
+       if (start !== -1 && end > start) {
+         try {
+           const parsed = JSON.parse(content.substring(start, end));
+           return Array.isArray(parsed.vulnerabilities) ? parsed.vulnerabilities : [];
+         } catch (_) {
+           return [];
+         }
+       }
+       return [];
+     }
+   }
+
+   /** Promise wrapper around https.request with status + timeout handling. */
+   _request(options, payload) {
      return new Promise((resolve, reject) => {
        const req = https.request(options, (res) => {
          let data = '';
-
-         res.on('data', (chunk) => {
-           data += chunk;
-         });
-
+         res.on('data', (chunk) => { data += chunk; });
          res.on('end', () => {
-           try {
-             if (res.statusCode === 429) {
-               const errorData = JSON.parse(data);
-               throw new Error(`Rate limit exceeded: ${errorData.message}`);
-             }
-
-             if (res.statusCode !== 200) {
-               throw new Error(`HTTP ${res.statusCode}: ${data}`);
-             }
-
-             const result = JSON.parse(data);
-             const content = result.data.choices[0].message.content;
-
-             try {
-               const parsedResponse = JSON.parse(content);
-               if (this.verbose) {
-                 console.log(` ✅ LLM returned ${parsedResponse.vulnerabilities?.length || 0} vulnerabilities`);
-                 if (result.rateLimitRemaining !== undefined) {
-                   console.log(` 📊 Rate limit remaining: ${result.rateLimitRemaining}`);
-                 }
-               }
-               parsedResponse.__source = 'llm';
-               resolve(parsedResponse);
-             } catch (jsonError) {
-               const jsonStart = content.indexOf('{');
-               const jsonEnd = content.lastIndexOf('}') + 1;
-               if (jsonStart !== -1 && jsonEnd > jsonStart) {
-                 const parsed = JSON.parse(content.substring(jsonStart, jsonEnd));
-                 if (this.verbose) {
-                   console.log(` ✅ LLM returned ${parsed.vulnerabilities?.length || 0} vulnerabilities (extracted JSON)`);
-                 }
-                 parsed.__source = 'llm';
-                 resolve(parsed);
-               } else {
-                 resolve({ vulnerabilities: [] });
-               }
-             }
-           } catch (error) {
-             reject(error);
+           if (res.statusCode === 429) {
+             return reject(new Error('Rate limit exceeded (HTTP 429)'));
+           }
+           if (res.statusCode < 200 || res.statusCode >= 300) {
+             return reject(new Error(`HTTP ${res.statusCode}: ${data.slice(0, 200)}`));
            }
+           resolve(data);
          });
        });

-       req.on('error', (error) => {
-         reject(error);
-       });
-
+       req.on('error', reject);
        req.on('timeout', () => {
          req.destroy();
-         reject(new Error('Backend API request timeout'));
+         reject(new Error('LLM request timed out'));
        });

        req.write(payload);
@@ -177,87 +213,56 @@ ${codeSnippet}`;
      });
    }

-
-   _mockSnippetResponse(codeSnippet, filePath, startLine) {
-     const vulnerabilities = [];
-     const lines = codeSnippet.split('\n');
-
-     // Simple heuristic-based mock analysis
-     for (let i = 0; i < lines.length; i++) {
-       const line = lines[i];
-       const lineLower = line.toLowerCase();
-       const actualLineNum = startLine + i;
-
-       // Look for logging of potentially sensitive data
-       if (lineLower.includes('log') && ['email', 'user', 'password', 'token'].some(term => lineLower.includes(term))) {
-         vulnerabilities.push({
-           line_number: actualLineNum,
-           severity: 'Medium',
-           description: 'Potential logging of sensitive user data detected.',
-           suggestion: 'Consider logging only non-sensitive identifiers or hashing sensitive data before logging.'
-         });
-       }
-
-       // Look for insecure data transmission
-       if (lineLower.includes('http://') && ['api', 'send', 'post', 'request'].some(term => lineLower.includes(term))) {
-         vulnerabilities.push({
-           line_number: actualLineNum,
-           severity: 'High',
-           description: 'Insecure HTTP transmission of potentially sensitive data.',
-           suggestion: 'Use HTTPS instead of HTTP for all data transmission.'
-         });
-       }
+   async analyzeFile(filePath, content, patternFindings = []) {
+     // AI analysis is opt-in: with no key/backend configured, skip entirely.
+     if (!this.isEnabled()) {
+       return [];
      }

-     return { vulnerabilities, __source: 'mock' };
-   }
-
-   async analyzeFile(filePath, content, patternFindings = []) {
      const lines = content.split('\n');
      const areasOfInterest = this._identifyAreasOfInterest(content, patternFindings);
      const findings = [];

      if (this.verbose && areasOfInterest.length > 0) {
-       console.log(` Found ${areasOfInterest.length} areas of interest for LLM analysis`);
+       console.log(` Found ${areasOfInterest.length} areas of interest for AI analysis`);
      }

-     // Analyze each area of interest
      for (const [startLine, endLine] of areasOfInterest) {
-       // Extract snippet
-       const snippetLines = lines.slice(startLine - 1, endLine);
-       const snippet = snippetLines.join('\n');
+       const snippet = lines.slice(startLine - 1, endLine).join('\n');

        // Skip very small snippets
        if (snippet.trim().length < 50) {
          continue;
        }

+       let vulnerabilities;
        try {
-         // Call API with rate limiting
-         const grokResponse = await this.callGrokApi(snippet, filePath, startLine);
-
-         // Add delay to prevent rate limiting from free tier
-         await new Promise(resolve => setTimeout(resolve, 1000));
-
-         // Process findings
-         for (const vuln of grokResponse.vulnerabilities || []) {
-           const finding = {
-             file_path: filePath,
-             line_number: vuln.line_number || startLine,
-             severity: vuln.severity || 'Medium',
-             finding_type: 'Context-Based Issue',
-             description: vuln.description || 'Privacy vulnerability detected',
-             code_snippet: this._getCodeSnippet(content, vuln.line_number || startLine),
-             suggestion: vuln.suggestion || 'Review and address the identified issue.',
-             source: grokResponse.__source || 'unknown'
-           };
-           findings.push(finding);
-         }
-
+         vulnerabilities = await this._analyzeSnippet(snippet, filePath, startLine);
        } catch (error) {
-         // Continue with other snippets if one fails
+         // Skip this snippet on error - never fabricate findings.
+         if (this.verbose) {
+           console.log(` ⚠️ AI analysis skipped for ${filePath}:${startLine} - ${error.message}`);
+         }
          continue;
        }
+
+       for (const vuln of vulnerabilities) {
+         findings.push({
+           file_path: filePath,
+           line_number: vuln.line_number || startLine,
+           severity: vuln.severity || 'Medium',
+           finding_type: 'Context-Based Issue',
+           description: vuln.description || 'Privacy vulnerability detected',
+           code_snippet: this._getCodeSnippet(content, vuln.line_number || startLine),
+           suggestion: vuln.suggestion || 'Review and address the identified issue.',
+           source: 'llm'
+         });
+       }
+
+       // Gentle pacing to respect provider rate limits.
+       if (this.rateLimitMs > 0) {
+         await new Promise(resolve => setTimeout(resolve, this.rateLimitMs));
+       }
      }

      // Remove duplicates based on line number and description
@@ -283,4 +288,4 @@ ${codeSnippet}`;
    }
  }

- module.exports = LLMAnalyzer;
+ module.exports = LLMAnalyzer;
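The defensive parsing added in `_parseVulnerabilities` (strict JSON first, then the outermost `{...}` span) can be exercised on its own. The following is a stand-alone sketch of the same two-step idea rather than a copy of the package code; the function name and the sample strings are illustrative assumptions.

```javascript
// Illustrative stand-alone version of the two-step parse: try strict JSON
// first, then fall back to the outermost {...} span found in the text.
function parseVulnerabilities(content) {
  // Accept only an object whose `vulnerabilities` field is an array.
  const pick = (text) => {
    const parsed = JSON.parse(text);
    return Array.isArray(parsed.vulnerabilities) ? parsed.vulnerabilities : [];
  };
  try {
    return pick(content);
  } catch {
    const start = content.indexOf('{');
    const end = content.lastIndexOf('}') + 1;
    if (start === -1 || end <= start) return [];
    try {
      return pick(content.substring(start, end));
    } catch {
      return []; // Unparseable output yields no findings rather than invented ones.
    }
  }
}

const wrapped = 'Sure! {"vulnerabilities":[{"line_number":3,"severity":"High"}]} Hope that helps.';
console.log(parseVulnerabilities(wrapped).length);    // 1
console.log(parseVulnerabilities('not json').length); // 0
```

Returning an empty array on every failure path is consistent with the changelog's stated goal of never fabricating findings.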
@@ -47,6 +47,25 @@ class ReportGenerator {
      }
    }

+   // Public: return the privacy grade (A+ .. F) for a set of findings.
+   getGrade(findings) {
+     return this._calculateGrade(findings);
+   }
+
+   // Public: build a shields.io privacy-grade badge for embedding in a README.
+   getBadge(findings) {
+     const grade = this._calculateGrade(findings);
+     const colorMap = {
+       'A+': 'brightgreen', 'A': 'brightgreen', 'A-': 'green',
+       'B+': 'yellowgreen', 'B': 'yellowgreen', 'B-': 'yellow',
+       'C+': 'yellow', 'C': 'orange', 'C-': 'orange',
+       'D': 'red', 'F': 'red'
+     };
+     const color = colorMap[grade] || 'lightgrey';
+     const url = `https://img.shields.io/badge/Privacy%20Grade-${encodeURIComponent(grade)}-${color}`;
+     return { grade, url, markdown: `![Privacy Grade: ${grade}](${url})` };
+   }
+
    _groupFindingsBySeverity(findings) {
      const groups = {};
      this.severityOrder.forEach(severity => {
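The badge construction in `getBadge` can be sketched outside the class as below. This is an illustration, not the package's implementation: `buildBadge` and `escapeBadgeText` are hypothetical names, and the grade is passed in directly because `_calculateGrade` is not shown in this diff. Unlike the diffed code, the sketch also doubles dashes before encoding, on the assumption that shields.io static badges treat `-` as a field separator and document `--` as the escape for a literal dash; that matters for grades such as `A-`.

```javascript
// Colour per grade, mirroring the map in the hunk above.
const COLOR_BY_GRADE = {
  'A+': 'brightgreen', 'A': 'brightgreen', 'A-': 'green',
  'B+': 'yellowgreen', 'B': 'yellowgreen', 'B-': 'yellow',
  'C+': 'yellow', 'C': 'orange', 'C-': 'orange',
  'D': 'red', 'F': 'red',
};

// Hypothetical addition (not in the diffed code): double literal dashes,
// then URL-encode, so "A-" is not read as a field separator.
function escapeBadgeText(text) {
  return encodeURIComponent(text.replace(/-/g, '--'));
}

// Hypothetical stand-in for getBadge that takes an already computed grade.
function buildBadge(grade) {
  const color = COLOR_BY_GRADE[grade] || 'lightgrey';
  const url = `https://img.shields.io/badge/Privacy%20Grade-${escapeBadgeText(grade)}-${color}`;
  return { grade, url, markdown: `![Privacy Grade: ${grade}](${url})` };
}

console.log(buildBadge('A+').url);
// https://img.shields.io/badge/Privacy%20Grade-A%2B-brightgreen
```

For `A+` the result matches the badge URL shown in the README hunk; only grades containing a dash differ from what the diffed `getBadge` would produce.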
@@ -132,7 +151,7 @@ class ReportGenerator {
        '│ │',
        '│ 🎯 Detection Methods: │',
        `│ • Pattern-based: ${patternCount} issues │`,
-       `│ • AI-powered: ${llmCount} issues (Grok 4 Fast) │`,
+       `│ • AI-powered: ${llmCount} issues │`,
        '│ │',
        '│ 📈 Severity Breakdown: │',
        `│ 🚨 Critical: ${severityCounts['Critical'].toString().padEnd(10)} 🔥 High: ${severityCounts['High'].toString().padEnd(14)} │`,
@@ -209,16 +228,12 @@ class ReportGenerator {

      // Footer
      reportLines.push(
-       '╔══════════════════════════════════════════════════════════════════════════════╗',
-       '║ 🚀 GET STARTED ║',
-       '║ ║',
-       '║ 📚 Documentation: https://kafkacode.dev/docs ║',
-       '║ 🐙 GitHub: https://github.com/kafkacode/privacy-scanner ║',
-       '║ 💬 Support: https://discord.gg/kafkacode ║',
-       '║ 🐦 Follow: @KafkaCodeDev ║',
-       '║ ║',
-       '║ 🛡️ Keep your code secure, keep your users safe! 🛡️ ║',
-       '╚══════════════════════════════════════════════════════════════════════════════╝',
+       '',
+       '─'.repeat(80),
+       '🚀 KafkaCode · AI-powered privacy & compliance scanner',
+       '📚 Docs & issues: https://github.com/nikhil-kapu/kafkacode',
+       '🛡️ Keep your code secure, keep your users safe!',
+       '─'.repeat(80),
        ''
      );

package/dist/cli.js CHANGED
@@ -12,13 +12,15 @@ const program = new Command();
  program
    .name('kafkacode')
    .description('KafkaCode - Privacy and Compliance Scanner')
-   .version('1.2.0');
+   .version('1.3.0');

  program
    .command('scan')
    .description('Scan a directory for privacy issues')
    .argument('<directory>', 'Path to the source code directory to scan')
    .option('-v, --verbose', 'Print verbose progress updates during the scan')
+   .option('-b, --badge', 'Print a copy-paste privacy-grade badge for your README')
+   .option('--no-ai', 'Disable AI-powered analysis (run pattern scan only)')
    .action(async (directory, options) => {
      await runScan(directory, options);
    });
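In Commander, a negatable option such as `--no-ai` surfaces as `options.ai`, which defaults to `true` and becomes `false` only when the flag is passed; that is why the next hunk tests `options.ai === false`. The dependency-free sketch below imitates those semantics for the three `scan` flags. `parseScanFlags` is a hypothetical helper for illustration only; the real CLI lets Commander do the parsing, and Commander leaves unset boolean flags `undefined` rather than `false`.

```javascript
// Dependency-free sketch of the flag semantics; parseScanFlags is a
// hypothetical name. In the real CLI, Commander derives these values.
function parseScanFlags(argv) {
  // `ai` starts as true because the flag is declared in its negated form.
  const flags = { verbose: false, badge: false, ai: true };
  for (const arg of argv) {
    if (arg === '-v' || arg === '--verbose') flags.verbose = true;
    else if (arg === '-b' || arg === '--badge') flags.badge = true;
    else if (arg === '--no-ai') flags.ai = false;
  }
  return flags;
}

console.log(parseScanFlags(['--badge', '--no-ai']));
// { verbose: false, badge: true, ai: false }
```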
@@ -40,6 +42,9 @@ async function runScan(directory, options = {}) {
      // Initialize components
      const fileScanner = new FileScanner(directory);
      const analysisEngine = new AnalysisEngine(verbose);
+     if (options.ai === false) {
+       analysisEngine.disableAi();
+     }
      const reportGenerator = new ReportGenerator();

      // Scan for files
@@ -67,6 +72,18 @@ async function runScan(directory, options = {}) {
      const report = reportGenerator.generateReport(directory, findings, files.length);
      console.log(report);

+     // Hint that AI analysis is available when it wasn't used.
+     if (options.ai !== false && !analysisEngine.aiEnabled()) {
+       console.log('💡 Tip: set KAFKACODE_API_KEY to enable AI-powered contextual analysis. See the README.\n');
+     }
+
+     // Optionally print a copy-paste privacy-grade badge for the user's README
+     if (options.badge) {
+       const badge = reportGenerator.getBadge(findings);
+       console.log('🏷️ Privacy Grade Badge - paste into your README:\n');
+       console.log(` ${badge.markdown}\n`);
+     }
+
      // Return appropriate exit code
      process.exit(findings.length > 0 ? 1 : 0);

package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
    "name": "kafkacode",
-   "version": "1.2.0",
-   "description": "AI-powered privacy and compliance scanner by KafkaLabs - identify PII leaks, secrets, and compliance violations",
+   "version": "1.3.0",
+   "description": "AI-powered privacy and compliance scanner - find PII leaks, hardcoded secrets, and compliance violations in your source code",
    "main": "dist/index.js",
    "bin": {
      "kafkacode": "dist/cli.js"
@@ -27,8 +27,7 @@
      "shift-left",
      "cli-tool",
      "security-scanner",
-     "vulnerability-scanner",
-     "kafkalabs"
+     "vulnerability-scanner"
    ],
    "author": "KafkaLabs <contact@kafkalabs.com>",
    "license": "MIT",
@@ -48,10 +47,10 @@
    ],
    "repository": {
      "type": "git",
-     "url": "https://github.com/nikhil-kapu/KafkacodeFnpm.git"
+     "url": "git+https://github.com/nikhil-kapu/kafkacode.git"
    },
-   "homepage": "https://kafkalabs.com/kafka-code",
+   "homepage": "https://github.com/nikhil-kapu/kafkacode#readme",
    "bugs": {
-     "url": "https://github.com/nikhil-kapu/KafkacodeFnpm/issues"
+     "url": "https://github.com/nikhil-kapu/kafkacode/issues"
    }
- }
+ }