kafkacode 1.2.0 → 1.3.0
- package/CHANGELOG.md +25 -20
- package/README.md +178 -59
- package/dist/AnalysisEngine.js +10 -0
- package/dist/LLMAnalyzer.js +133 -128
- package/dist/ReportGenerator.js +26 -11
- package/dist/cli.js +18 -1
- package/package.json +7 -8
package/CHANGELOG.md
CHANGED
|
@@ -1,26 +1,31 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
-
All notable changes to this project
|
|
3
|
+
All notable changes to this project are documented in this file.
|
|
4
4
|
|
|
5
|
-
## [1.
|
|
5
|
+
## [1.3.0] - 2026-06-05
|
|
6
6
|
|
|
7
7
|
### Added
|
|
8
|
-
-
|
|
9
|
-
-
|
|
10
|
-
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
- Privacy grading system (A+ to F)
|
|
14
|
-
- CLI interface with scan command
|
|
15
|
-
- API key obfuscation for commercial distribution
|
|
16
|
-
- Gitignore pattern support
|
|
17
|
-
- Comprehensive test suite
|
|
8
|
+
- Bring-your-own-key AI analysis: set `KAFKACODE_API_KEY` to call an
|
|
9
|
+
OpenAI-compatible provider directly (defaults to Groq). `KAFKACODE_API_URL`
|
|
10
|
+
and `KAFKACODE_MODEL` override the endpoint and model.
|
|
11
|
+
- `--badge` flag that prints a shareable privacy-grade badge for your README.
|
|
12
|
+
- `--no-ai` flag to force pattern-only scanning.
|
|
18
13
|
|
|
19
|
-
###
|
|
20
|
-
-
|
|
21
|
-
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
-
|
|
26
|
-
|
|
14
|
+
### Changed
|
|
15
|
+
- Open-sourced under the MIT License.
|
|
16
|
+
- AI analysis is now opt-in. With no key configured, scanning is pattern-only
|
|
17
|
+
and fully offline - no code leaves the machine.
|
|
18
|
+
|
|
19
|
+
### Removed
|
|
20
|
+
- Silent "mock" analysis fallback; on an API error the snippet is now skipped
|
|
21
|
+
instead of fabricating findings.
|
|
22
|
+
|
|
23
|
+
## [1.2.0] - 2025-10-05
|
|
24
|
+
|
|
25
|
+
### Added
|
|
26
|
+
- Pattern-based detection for hardcoded secrets, API keys, and PII.
|
|
27
|
+
- AI-powered contextual privacy analysis.
|
|
28
|
+
- Support for Python, JavaScript, TypeScript, Java, Go, Ruby, and PHP.
|
|
29
|
+
- Console reporting with severity levels and an A+ to F privacy grade.
|
|
30
|
+
- `.gitignore`-aware file scanning.
|
|
31
|
+
- Non-zero exit codes for CI/CD integration.
|
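The bring-your-own-key settings listed under 1.3.0 can be pictured as a small resolution step over the environment. The sketch below is illustrative only: `resolveAiConfig` and the object it returns are hypothetical names, and the defaults are the ones the changelog and README state (the Groq endpoint and `llama-3.1-8b-instant`). It ignores the self-hosted backend option that appears later in `LLMAnalyzer.js`.

```javascript
// Hypothetical helper: resolve AI settings from an environment-like object.
// Variable names and defaults come from the 1.3.0 changelog and README;
// the function itself is an assumption made for illustration.
function resolveAiConfig(env) {
  const apiKey = env.KAFKACODE_API_KEY || '';
  // Trim trailing slashes so the endpoint is joined with exactly one "/".
  const base = (env.KAFKACODE_API_URL || 'https://api.groq.com/openai/v1').replace(/\/+$/, '');
  return {
    enabled: Boolean(apiKey), // AI is opt-in: no key means pattern-only
    endpoint: `${base}/chat/completions`,
    model: env.KAFKACODE_MODEL || 'llama-3.1-8b-instant',
  };
}

console.log(resolveAiConfig({}));
console.log(resolveAiConfig({ KAFKACODE_API_KEY: 'example-key', KAFKACODE_MODEL: 'my-model' }));
```

Passing `process.env` instead of a literal object would resolve the real settings; the key is deliberately left out of the returned object so it is never printed.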
package/README.md
CHANGED
|
@@ -1,98 +1,217 @@
|
|
|
1
|
-
# KafkaCode Privacy Scanner
|
|
2
|
-
|
|
3
1
|
<div align="center">
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
2
|
+
|
|
3
|
+
<img src="docs/logo4.png" width="104" alt="KafkaCode logo" />
|
|
4
|
+
|
|
5
|
+
# KafkaCode
|
|
6
|
+
|
|
7
|
+
**Catch PII leaks, hardcoded secrets, and compliance risks before they ship.**
|
|
8
|
+
|
|
9
|
+
An AI-powered privacy & compliance scanner for your source code. One command,
|
|
10
|
+
a clear **A+ to F privacy grade**, and CI-ready exit codes. Runs in seconds.
|
|
11
|
+
|
|
12
|
+
[](https://www.npmjs.com/package/kafkacode)
|
|
13
|
+
[](https://www.npmjs.com/package/kafkacode)
|
|
14
|
+
[](https://github.com/nikhil-kapu/kafkacode/actions)
|
|
15
|
+
[](LICENSE)
|
|
16
|
+
[](package.json)
|
|
17
|
+
[](CONTRIBUTING.md)
|
|
18
|
+
|
|
19
|
+
[Quickstart](#-quickstart) · [Features](#-features) · [Example](#-example-output) · [CI/CD](#-cicd-integration) · [How it works](#-how-it-works) · [Contributing](#-contributing)
|
|
20
|
+
|
|
11
21
|
</div>
|
|
12
22
|
|
|
13
23
|
---
|
|
14
24
|
|
|
15
|
-
|
|
25
|
+
## Why KafkaCode?
|
|
26
|
+
|
|
27
|
+
Most scanners stop at *"you leaked an AWS key."* KafkaCode goes further - it grades how
|
|
28
|
+
your code handles **personal data**, flags **GDPR/CCPA** risks, and catches hardcoded
|
|
29
|
+
secrets, with an optional **AI pass** for the context that regex alone can't see.
|
|
16
30
|
|
|
17
|
-
|
|
31
|
+
You get one number a whole team understands - a **privacy grade from A+ to F** - plus a
|
|
32
|
+
non-zero exit code that fails the build when something sensitive slips in.
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
npx kafkacode scan .
|
|
36
|
+
```
|
|
18
37
|
|
|
19
|
-
|
|
20
|
-
- 🤖 **AI-powered Analysis**: Uses advanced LLM analysis for contextual privacy issues
|
|
21
|
-
- ⚡ **Fast & Efficient**: Scans entire codebases in seconds
|
|
22
|
-
- 🎯 **Multiple File Types**: Supports Python, JavaScript, TypeScript, Java, Go, Ruby, PHP
|
|
23
|
-
- ๐ **Detailed Reports**: Beautiful console reports with severity levels
|
|
24
|
-
- ๐ **CI/CD Ready**: Easy integration with build pipelines
|
|
38
|
+
No install. No signup. No config.
|
|
25
39
|
|
|
26
|
-
##
|
|
40
|
+
## ⚡ Quickstart
|
|
27
41
|
|
|
28
42
|
```bash
|
|
43
|
+
# Run it once, anywhere (no install)
|
|
44
|
+
npx kafkacode scan .
|
|
45
|
+
|
|
46
|
+
# Or install globally
|
|
29
47
|
npm install -g kafkacode
|
|
48
|
+
kafkacode scan ./src --verbose
|
|
30
49
|
```
|
|
31
50
|
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
51
|
+
## ✨ Features
|
|
52
|
+
|
|
53
|
+
- **Secret detection** - AWS & Stripe keys, private keys, high-entropy strings
|
|
54
|
+
- 🕵️ **PII detection** - emails, phone numbers, IP addresses
|
|
55
|
+
- 🤖 **AI-powered analysis** - contextual privacy issues a regex would miss
|
|
56
|
+
- **Privacy grade** - a single, shareable **A+ to F** score
|
|
57
|
+
- 🏷️ **Grade badge** - drop your score into your README (`--badge`)
|
|
58
|
+
- ⚡ **Fast & offline** - pattern scanning needs no network
|
|
59
|
+
- **7 languages** - Python, JavaScript, TypeScript, Java, Go, Ruby, PHP
|
|
60
|
+
- **CI/CD ready** - clean exit codes + a one-line GitHub Action
|
|
61
|
+
|
|
62
|
+
## ๐ Example output
|
|
63
|
+
|
|
64
|
+
```text
|
|
65
|
+
🎯 PRIVACY SCAN REPORT
|
|
66
|
+
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
67
|
+
|
|
68
|
+
๐ SCAN SUMMARY
|
|
69
|
+
๐ Directory: ./src
|
|
70
|
+
๐ Files Scanned: 18
|
|
71
|
+
๐ Total Issues: 4
|
|
72
|
+
๐ Privacy Grade: ๐ด F
|
|
73
|
+
|
|
74
|
+
🚨 Critical: 1  🔥 High: 1  ⚠️ Medium: 2  🔵 Low: 0
|
|
75
|
+
|
|
76
|
+
🚨 CRITICAL
|
|
77
|
+
โโ AWS Access Key ID detected
|
|
78
|
+
โ ๐ src/config.js:12
|
|
79
|
+
โ ๐ก Move credentials to environment variables or a secrets manager.
|
|
80
|
+
โโ
|
|
81
|
+
|
|
82
|
+
⚠️ MEDIUM
|
|
83
|
+
โโ Email address detected (PII)
|
|
84
|
+
โ ๐ src/users.js:47
|
|
85
|
+
โ ๐ก Avoid hardcoding personal data; load it at runtime.
|
|
86
|
+
โโ
|
|
35
87
|
```
|
|
36
88
|
|
|
37
|
-
##
|
|
89
|
+
## 🏷️ Privacy grade & badge
|
|
90
|
+
|
|
91
|
+
KafkaCode distills every scan into one grade:
|
|
92
|
+
|
|
93
|
+
| Grade | Meaning |
|
|
94
|
+
| :---: | ------- |
|
|
95
|
+
| 🟢 **A+ / A / A-** | Excellent - no or only low-severity issues |
|
|
96
|
+
| 🟡 **B+ / B / B-** | Good - a few medium-severity issues |
|
|
97
|
+
| 🟠 **C+ / C / C-** | Needs attention - high-severity issues present |
|
|
98
|
+
| 🔴 **D / F** | Critical privacy/secret exposure |
|
|
99
|
+
|
|
100
|
+
Show it off in your own README:
|
|
38
101
|
|
|
39
|
-
**Basic Scan:**
|
|
40
102
|
```bash
|
|
41
|
-
kafkacode scan
|
|
103
|
+
kafkacode scan . --badge
|
|
42
104
|
```
|
|
43
105
|
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
106
|
+
```text
|
|
107
|
+
๐ท๏ธ Privacy Grade Badge โ paste into your README:
|
|
108
|
+
|
|
109
|
+

|
|
47
110
|
```
|
|
48
111
|
|
|
49
|
-
|
|
112
|
+
โ 
|
|
113
|
+
|
|
114
|
+
## ๐ CI/CD integration
|
|
50
115
|
|
|
51
|
-
|
|
52
|
-
- **High Severity**: Sensitive keywords in assignment context
|
|
53
|
-
- **Medium Severity**: Email addresses, Phone numbers, High entropy strings
|
|
54
|
-
- **Low Severity**: IP addresses, URLs
|
|
116
|
+
### GitHub Action
|
|
55
117
|
|
|
56
|
-
|
|
118
|
+
```yaml
|
|
119
|
+
# .github/workflows/privacy.yml
|
|
120
|
+
name: Privacy Scan
|
|
121
|
+
on: [push, pull_request]
|
|
122
|
+
jobs:
|
|
123
|
+
scan:
|
|
124
|
+
runs-on: ubuntu-latest
|
|
125
|
+
steps:
|
|
126
|
+
- uses: actions/checkout@v4
|
|
127
|
+
- uses: nikhil-kapu/kafkacode@v1
|
|
128
|
+
with:
|
|
129
|
+
path: ./src
|
|
130
|
+
```
|
|
57
131
|
|
|
58
|
-
|
|
132
|
+
### Any CI / pre-commit
|
|
59
133
|
|
|
60
|
-
|
|
61
|
-
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
- **F**: Critical privacy vulnerabilities detected
|
|
134
|
+
```bash
|
|
135
|
+
# Exits non-zero when issues are found, failing the build
|
|
136
|
+
npx kafkacode scan ./src
|
|
137
|
+
```
|
|
65
138
|
|
|
66
|
-
##
|
|
139
|
+
## ๐ What it detects
|
|
140
|
+
|
|
141
|
+
| Severity | Examples |
|
|
142
|
+
| -------- | -------- |
|
|
143
|
+
| 🚨 **Critical** | AWS keys, Stripe live keys, private keys |
|
|
144
|
+
| 🔥 **High** | `password=`, `api_key=`, `token=` and other secrets in assignments |
|
|
145
|
+
| ⚠️ **Medium** | Emails, phone numbers, high-entropy strings |
|
|
146
|
+
| 🔵 **Low** | IP addresses |
|
|
147
|
+
|
|
148
|
+
## 🧠 How it works
|
|
67
149
|
|
|
68
150
|
```
|
|
69
|
-
|
|
70
|
-
|
|
151
|
+
your code โโถ FileScanner โโถ โโ PatternScanner (regex, fully offline)
|
|
152
|
+
โโ LLMAnalyzer (optional AI context)
|
|
153
|
+
โ
|
|
154
|
+
โผ
|
|
155
|
+
ReportGenerator โโถ grade + findings + exit code
|
|
156
|
+
```
|
|
71
157
|
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
158
|
+
Pattern-based detection runs entirely on your machine with no network calls. The
|
|
159
|
+
optional AI layer adds contextual findings for the cases static rules can't catch.
|
|
160
|
+
|
|
161
|
+
## 🤖 AI mode (optional, bring-your-own-key)
|
|
162
|
+
|
|
163
|
+
Pattern scanning works out of the box with **no setup and no network calls**. To add
|
|
164
|
+
AI-powered contextual findings, bring your own API key - KafkaCode calls an
|
|
165
|
+
OpenAI-compatible chat API directly, defaulting to [Groq](https://console.groq.com/keys)
|
|
166
|
+
(which has a free tier):
|
|
167
|
+
|
|
168
|
+
```bash
|
|
169
|
+
export KAFKACODE_API_KEY=your_key_here
|
|
170
|
+
kafkacode scan ./src
|
|
78
171
|
```
|
|
79
172
|
|
|
80
|
-
|
|
173
|
+
| Variable | Default | Purpose |
|
|
174
|
+
| -------- | ------- | ------- |
|
|
175
|
+
| `KAFKACODE_API_KEY` | _(unset)_ | Your provider API key - **enables AI mode** |
|
|
176
|
+
| `KAFKACODE_API_URL` | `https://api.groq.com/openai/v1` | OpenAI-compatible base URL (Groq, OpenAI, OpenRouter, local models…) |
|
|
177
|
+
| `KAFKACODE_MODEL` | `llama-3.1-8b-instant` | Model name |
|
|
81
178
|
|
|
82
|
-
|
|
179
|
+
Without a key, KafkaCode runs **pattern-only and never sends your code anywhere**.
|
|
180
|
+
Pass `--no-ai` to force pattern-only even when a key is set.
|
|
83
181
|
|
|
84
|
-
|
|
182
|
+
## ๐ How it compares
|
|
85
183
|
|
|
86
|
-
|
|
184
|
+
| | KafkaCode | gitleaks / trufflehog | semgrep |
|
|
185
|
+
| ---------------------------- | :-------: | :-------------------: | :-----: |
|
|
186
|
+
| Hardcoded secrets | ✅ | ✅ (deep, git log) | ❌ |
|
|
187
|
+
| PII / personal-data findings | ✅ | ❌ | ❌ |
|
|
188
|
+
| Privacy grade (A+ to F) | ✅ | ❌ | ❌ |
|
|
189
|
+
| AI contextual analysis | ✅ | ❌ | ❌ |
|
|
190
|
+
| Zero-config, one command | ✅ | ✅ | ❌ |
|
|
87
191
|
|
|
88
|
-
KafkaCode
|
|
192
|
+
KafkaCode focuses on **privacy and developer-friendly grading** - it complements
|
|
193
|
+
deep secret scanners rather than replacing them.
|
|
89
194
|
|
|
90
|
-
|
|
91
|
-
- 📧 **Contact**: contact@kafkalabs.com
|
|
92
|
-
- 💬 **Issues**: [GitHub Issues](https://github.com/nikhil-kapu/KafkacodeFnpm/issues)
|
|
195
|
+
## 🗺️ Roadmap
|
|
93
196
|
|
|
94
|
-
|
|
197
|
+
- [x] **Bring-your-own-key AI** - call Groq / OpenAI-compatible providers directly
|
|
198
|
+
- [ ] `--json` and **SARIF** output (GitHub Security tab integration)
|
|
199
|
+
- [ ] Config file & `.kafkacodeignore`
|
|
200
|
+
- [ ] Baseline file to adopt on existing codebases
|
|
201
|
+
- [ ] More file types (`.env`, YAML, Terraform, Dockerfiles)
|
|
202
|
+
|
|
203
|
+
Ideas and PRs welcome - see [CONTRIBUTING.md](CONTRIBUTING.md).
|
|
204
|
+
|
|
205
|
+
## 🤝 Contributing
|
|
206
|
+
|
|
207
|
+
Contributions of all kinds are welcome - bug reports, new detection patterns, and docs.
|
|
208
|
+
Start with [CONTRIBUTING.md](CONTRIBUTING.md), and please report security issues per our
|
|
209
|
+
[Security Policy](SECURITY.md).
|
|
210
|
+
|
|
211
|
+
## ๐ License
|
|
212
|
+
|
|
213
|
+
[MIT](LICENSE) © KafkaLabs
|
|
95
214
|
|
|
96
215
|
<div align="center">
|
|
97
|
-
|
|
98
|
-
</div>
|
|
216
|
+
<sub>🛡️ Keep your code secure, keep your users safe.</sub>
|
|
217
|
+
</div>
|
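The README's "What it detects" table and the exit-code convention can be illustrated with a very small pattern scanner. This is a sketch under stated assumptions, not KafkaCode's implementation: `RULES`, `scanText`, and both regular expressions are hypothetical, and the sample key is the placeholder AWS publishes in its own documentation.

```javascript
// Illustrative rules only: these regexes are assumptions, not KafkaCode's own patterns.
const RULES = [
  { name: 'AWS Access Key ID', severity: 'Critical', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: 'Email address (PII)', severity: 'Medium', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
];

// Scan text line by line and record the first matching rule for each line.
function scanText(text, filePath) {
  const found = [];
  text.split('\n').forEach((line, index) => {
    for (const rule of RULES) {
      if (rule.regex.test(line)) {
        found.push({ file: filePath, line: index + 1, severity: rule.severity, type: rule.name });
        break;
      }
    }
  });
  return found;
}

const sample = [
  'const region = "us-east-1";',
  'const accessKey = "AKIAIOSFODNN7EXAMPLE";',
  'const owner = "jane.doe@example.com";',
].join('\n');

const findings = scanText(sample, 'src/config.js');
console.log(findings);

// Mirror the CLI convention from the README: non-zero when anything was found.
const exitCode = findings.length > 0 ? 1 : 0;
console.log(`exit code: ${exitCode}`);
```

Everything here runs locally with no network access, which is the property the README claims for pattern-only mode.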
package/dist/AnalysisEngine.js
CHANGED
|
@@ -10,6 +10,16 @@ class AnalysisEngine {
|
|
|
10
10
|
this.llmAnalyzer.verbose = verbose;
|
|
11
11
|
}
|
|
12
12
|
|
|
13
|
+
/** Force pattern-only scanning, even if an API key is configured. */
|
|
14
|
+
disableAi() {
|
|
15
|
+
this.llmAnalyzer.disabled = true;
|
|
16
|
+
}
|
|
17
|
+
|
|
18
|
+
/** Whether AI-powered analysis will run for this scan. */
|
|
19
|
+
aiEnabled() {
|
|
20
|
+
return this.llmAnalyzer.isEnabled();
|
|
21
|
+
}
|
|
22
|
+
|
|
13
23
|
async analyzeFile(filePath) {
|
|
14
24
|
if (this.verbose) {
|
|
15
25
|
console.log(`Analyzing: ${filePath}`);
|
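The two methods added to `AnalysisEngine` only delegate to the analyzer, so their effect is easiest to see with stand-ins. In this minimal sketch `StubAnalyzer` and `StubEngine` are hypothetical classes that model nothing but the enable/disable logic shown in this diff.

```javascript
// Stub standing in for LLMAnalyzer: only the enable/disable state is modelled.
class StubAnalyzer {
  constructor(apiKey, backendEndpoint = '') {
    this.apiKey = apiKey;
    this.backendEndpoint = backendEndpoint;
    this.disabled = false;
  }
  isEnabled() {
    return !this.disabled && Boolean(this.apiKey || this.backendEndpoint);
  }
}

// Stub standing in for AnalysisEngine, with the two delegating methods from the diff.
class StubEngine {
  constructor(analyzer) { this.llmAnalyzer = analyzer; }
  disableAi() { this.llmAnalyzer.disabled = true; }
  aiEnabled() { return this.llmAnalyzer.isEnabled(); }
}

const withKey = new StubEngine(new StubAnalyzer('example-key'));
console.log(withKey.aiEnabled()); // key present, not disabled
withKey.disableAi();
console.log(withKey.aiEnabled()); // the disabled flag overrides the key

const noKey = new StubEngine(new StubAnalyzer(''));
console.log(noKey.aiEnabled());   // nothing configured
```

Keeping the flag on the analyzer rather than on the engine means a single `isEnabled()` check covers both "no key configured" and "explicitly disabled".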
package/dist/LLMAnalyzer.js
CHANGED
|
@@ -1,8 +1,28 @@
|
|
|
1
1
|
const https = require('https');
|
|
2
2
|
|
|
3
|
+
/**
|
|
4
|
+
* LLMAnalyzer performs optional AI-powered contextual analysis.
|
|
5
|
+
*
|
|
6
|
+
* It is "bring your own key": the user supplies an API key and KafkaCode calls
|
|
7
|
+
* an OpenAI-compatible chat-completions endpoint directly (defaulting to Groq).
|
|
8
|
+
* When no key (and no self-hosted backend) is configured, AI analysis is simply
|
|
9
|
+
* skipped - pattern-based scanning still runs, and no code leaves the machine.
|
|
10
|
+
*/
|
|
3
11
|
class LLMAnalyzer {
|
|
4
12
|
constructor() {
|
|
5
|
-
|
|
13
|
+
// Bring-your-own-key: direct, OpenAI-compatible provider call.
|
|
14
|
+
this.apiKey = process.env.KAFKACODE_API_KEY || '';
|
|
15
|
+
this.apiUrl = process.env.KAFKACODE_API_URL || 'https://api.groq.com/openai/v1';
|
|
16
|
+
this.model = process.env.KAFKACODE_MODEL || 'llama-3.1-8b-instant';
|
|
17
|
+
|
|
18
|
+
// Advanced: a self-hosted backend exposing POST /api/analyze. If set,
|
|
19
|
+
// it takes precedence over a direct provider call.
|
|
20
|
+
this.backendEndpoint = process.env.KAFKACODE_BACKEND_ENDPOINT || '';
|
|
21
|
+
|
|
22
|
+
// Delay between snippet requests, to stay within free-tier rate limits.
|
|
23
|
+
this.rateLimitMs = parseInt(process.env.KAFKACODE_RATE_LIMIT_MS || '250', 10);
|
|
24
|
+
|
|
25
|
+
this.disabled = false;
|
|
6
26
|
this.verbose = false;
|
|
7
27
|
this.interestKeywords = new Set([
|
|
8
28
|
'api', 'db', 'database', 'user', 'password', 'save', 'fetch', 'send', 'log',
|
|
@@ -11,6 +31,11 @@ class LLMAnalyzer {
|
|
|
11
31
|
]);
|
|
12
32
|
}
|
|
13
33
|
|
|
34
|
+
/** AI analysis is available only when a key or a backend endpoint is configured. */
|
|
35
|
+
isEnabled() {
|
|
36
|
+
return !this.disabled && Boolean(this.apiKey || this.backendEndpoint);
|
|
37
|
+
}
|
|
38
|
+
|
|
14
39
|
_createSnippetPrompt(codeSnippet, filePath, startLine) {
|
|
15
40
|
return `SYSTEM: You are an automated privacy and compliance analysis engine. Your task is to review the following CODE SNIPPET and identify potential privacy vulnerabilities based ONLY on the provided code. The snippet is from a larger file. Do not infer functionality outside of this snippet. Your analysis must focus on how the code handles data that could be considered sensitive or PII.
|
|
16
41
|
|
|
@@ -67,7 +92,6 @@ ${codeSnippet}`;
|
|
|
67
92
|
const mergedRanges = [];
|
|
68
93
|
for (const [start, end] of ranges) {
|
|
69
94
|
if (mergedRanges.length > 0 && start <= mergedRanges[mergedRanges.length - 1][1] + 10) {
|
|
70
|
-
// Extend previous range
|
|
71
95
|
const lastRange = mergedRanges[mergedRanges.length - 1];
|
|
72
96
|
mergedRanges[mergedRanges.length - 1] = [lastRange[0], Math.max(lastRange[1], end)];
|
|
73
97
|
} else {
|
|
@@ -78,27 +102,50 @@ ${codeSnippet}`;
|
|
|
78
102
|
return mergedRanges;
|
|
79
103
|
}
|
|
80
104
|
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
if (this.verbose) {
|
|
86
|
-
console.log(` โ LLM call failed, using mock: ${error.message}`);
|
|
87
|
-
}
|
|
88
|
-
return this._mockSnippetResponse(codeSnippet, filePath, startLine);
|
|
105
|
+
/** Route a snippet to either the self-hosted backend or the direct provider. */
|
|
106
|
+
async _analyzeSnippet(codeSnippet, filePath, startLine) {
|
|
107
|
+
if (this.backendEndpoint) {
|
|
108
|
+
return this._callBackendApi(codeSnippet, filePath, startLine);
|
|
89
109
|
}
|
|
110
|
+
return this._callProviderApi(codeSnippet, filePath, startLine);
|
|
90
111
|
}
|
|
91
112
|
|
|
92
|
-
|
|
113
|
+
/** Call an OpenAI-compatible chat-completions endpoint directly (BYOK). */
|
|
114
|
+
async _callProviderApi(codeSnippet, filePath, startLine) {
|
|
115
|
+
const prompt = this._createSnippetPrompt(codeSnippet, filePath, startLine);
|
|
93
116
|
const payload = JSON.stringify({
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
117
|
+
model: this.model,
|
|
118
|
+
messages: [{ role: 'user', content: prompt }],
|
|
119
|
+
temperature: 0,
|
|
120
|
+
max_tokens: 800
|
|
97
121
|
});
|
|
98
122
|
|
|
99
|
-
const
|
|
100
|
-
url
|
|
123
|
+
const base = this.apiUrl.replace(/\/+$/, '');
|
|
124
|
+
const url = new URL(`${base}/chat/completions`);
|
|
125
|
+
const options = {
|
|
126
|
+
hostname: url.hostname,
|
|
127
|
+
port: url.port || 443,
|
|
128
|
+
path: url.pathname + url.search,
|
|
129
|
+
method: 'POST',
|
|
130
|
+
headers: {
|
|
131
|
+
'Content-Type': 'application/json',
|
|
132
|
+
'Authorization': `Bearer ${this.apiKey}`,
|
|
133
|
+
'Content-Length': Buffer.byteLength(payload)
|
|
134
|
+
},
|
|
135
|
+
timeout: 20000
|
|
136
|
+
};
|
|
137
|
+
|
|
138
|
+
const raw = await this._request(options, payload);
|
|
139
|
+
const result = JSON.parse(raw);
|
|
140
|
+
const content = result.choices && result.choices[0] && result.choices[0].message.content;
|
|
141
|
+
return this._parseVulnerabilities(content || '');
|
|
142
|
+
}
|
|
101
143
|
|
|
144
|
+
/** Call a self-hosted KafkaCode backend (POST /api/analyze). */
|
|
145
|
+
async _callBackendApi(codeSnippet, filePath, startLine) {
|
|
146
|
+
const payload = JSON.stringify({ codeSnippet, filePath, startLine });
|
|
147
|
+
const base = this.backendEndpoint.replace(/\/+$/, '');
|
|
148
|
+
const url = new URL(`${base}/api/analyze`);
|
|
102
149
|
const options = {
|
|
103
150
|
hostname: url.hostname,
|
|
104
151
|
port: url.port || 443,
|
|
@@ -108,68 +155,57 @@ ${codeSnippet}`;
|
|
|
108
155
|
'Content-Type': 'application/json',
|
|
109
156
|
'Content-Length': Buffer.byteLength(payload)
|
|
110
157
|
},
|
|
111
|
-
timeout:
|
|
158
|
+
timeout: 20000
|
|
112
159
|
};
|
|
113
160
|
|
|
161
|
+
const raw = await this._request(options, payload);
|
|
162
|
+
const result = JSON.parse(raw);
|
|
163
|
+
const content = result.data && result.data.choices && result.data.choices[0].message.content;
|
|
164
|
+
return this._parseVulnerabilities(content || '');
|
|
165
|
+
}
|
|
166
|
+
|
|
167
|
+
/** Parse the model's text response into a vulnerabilities array, defensively. */
|
|
168
|
+
_parseVulnerabilities(content) {
|
|
169
|
+
try {
|
|
170
|
+
const parsed = JSON.parse(content);
|
|
171
|
+
return Array.isArray(parsed.vulnerabilities) ? parsed.vulnerabilities : [];
|
|
172
|
+
} catch (err) {
|
|
173
|
+
// Some models wrap JSON in prose or fences - extract the JSON object.
|
|
174
|
+
const start = content.indexOf('{');
|
|
175
|
+
const end = content.lastIndexOf('}') + 1;
|
|
176
|
+
if (start !== -1 && end > start) {
|
|
177
|
+
try {
|
|
178
|
+
const parsed = JSON.parse(content.substring(start, end));
|
|
179
|
+
return Array.isArray(parsed.vulnerabilities) ? parsed.vulnerabilities : [];
|
|
180
|
+
} catch (_) {
|
|
181
|
+
return [];
|
|
182
|
+
}
|
|
183
|
+
}
|
|
184
|
+
return [];
|
|
185
|
+
}
|
|
186
|
+
}
|
|
187
|
+
|
|
188
|
+
/** Promise wrapper around https.request with status + timeout handling. */
|
|
189
|
+
_request(options, payload) {
|
|
114
190
|
return new Promise((resolve, reject) => {
|
|
115
191
|
const req = https.request(options, (res) => {
|
|
116
192
|
let data = '';
|
|
117
|
-
|
|
118
|
-
res.on('data', (chunk) => {
|
|
119
|
-
data += chunk;
|
|
120
|
-
});
|
|
121
|
-
|
|
193
|
+
res.on('data', (chunk) => { data += chunk; });
|
|
122
194
|
res.on('end', () => {
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
}
|
|
128
|
-
|
|
129
|
-
if (res.statusCode !== 200) {
|
|
130
|
-
throw new Error(`HTTP ${res.statusCode}: ${data}`);
|
|
131
|
-
}
|
|
132
|
-
|
|
133
|
-
const result = JSON.parse(data);
|
|
134
|
-
const content = result.data.choices[0].message.content;
|
|
135
|
-
|
|
136
|
-
try {
|
|
137
|
-
const parsedResponse = JSON.parse(content);
|
|
138
|
-
if (this.verbose) {
|
|
139
|
-
console.log(` ✅ LLM returned ${parsedResponse.vulnerabilities?.length || 0} vulnerabilities`);
|
|
140
|
-
if (result.rateLimitRemaining !== undefined) {
|
|
141
|
-
console.log(` ๐ Rate limit remaining: ${result.rateLimitRemaining}`);
|
|
142
|
-
}
|
|
143
|
-
}
|
|
144
|
-
parsedResponse.__source = 'llm';
|
|
145
|
-
resolve(parsedResponse);
|
|
146
|
-
} catch (jsonError) {
|
|
147
|
-
const jsonStart = content.indexOf('{');
|
|
148
|
-
const jsonEnd = content.lastIndexOf('}') + 1;
|
|
149
|
-
if (jsonStart !== -1 && jsonEnd > jsonStart) {
|
|
150
|
-
const parsed = JSON.parse(content.substring(jsonStart, jsonEnd));
|
|
151
|
-
if (this.verbose) {
|
|
152
|
-
console.log(` ✅ LLM returned ${parsed.vulnerabilities?.length || 0} vulnerabilities (extracted JSON)`);
|
|
153
|
-
}
|
|
154
|
-
parsed.__source = 'llm';
|
|
155
|
-
resolve(parsed);
|
|
156
|
-
} else {
|
|
157
|
-
resolve({ vulnerabilities: [] });
|
|
158
|
-
}
|
|
159
|
-
}
|
|
160
|
-
} catch (error) {
|
|
161
|
-
reject(error);
|
|
195
|
+
if (res.statusCode === 429) {
|
|
196
|
+
return reject(new Error('Rate limit exceeded (HTTP 429)'));
|
|
197
|
+
}
|
|
198
|
+
if (res.statusCode < 200 || res.statusCode >= 300) {
|
|
199
|
+
return reject(new Error(`HTTP ${res.statusCode}: ${data.slice(0, 200)}`));
|
|
162
200
|
}
|
|
201
|
+
resolve(data);
|
|
163
202
|
});
|
|
164
203
|
});
|
|
165
204
|
|
|
166
|
-
req.on('error',
|
|
167
|
-
reject(error);
|
|
168
|
-
});
|
|
169
|
-
|
|
205
|
+
req.on('error', reject);
|
|
170
206
|
req.on('timeout', () => {
|
|
171
207
|
req.destroy();
|
|
172
|
-
reject(new Error('
|
|
208
|
+
reject(new Error('LLM request timed out'));
|
|
173
209
|
});
|
|
174
210
|
|
|
175
211
|
req.write(payload);
|
|
@@ -177,87 +213,56 @@ ${codeSnippet}`;
|
|
|
177
213
|
});
|
|
178
214
|
}
|
|
179
215
|
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
// Simple heuristic-based mock analysis
|
|
186
|
-
for (let i = 0; i < lines.length; i++) {
|
|
187
|
-
const line = lines[i];
|
|
188
|
-
const lineLower = line.toLowerCase();
|
|
189
|
-
const actualLineNum = startLine + i;
|
|
190
|
-
|
|
191
|
-
// Look for logging of potentially sensitive data
|
|
192
|
-
if (lineLower.includes('log') && ['email', 'user', 'password', 'token'].some(term => lineLower.includes(term))) {
|
|
193
|
-
vulnerabilities.push({
|
|
194
|
-
line_number: actualLineNum,
|
|
195
|
-
severity: 'Medium',
|
|
196
|
-
description: 'Potential logging of sensitive user data detected.',
|
|
197
|
-
suggestion: 'Consider logging only non-sensitive identifiers or hashing sensitive data before logging.'
|
|
198
|
-
});
|
|
199
|
-
}
|
|
200
|
-
|
|
201
|
-
// Look for insecure data transmission
|
|
202
|
-
if (lineLower.includes('http://') && ['api', 'send', 'post', 'request'].some(term => lineLower.includes(term))) {
|
|
203
|
-
vulnerabilities.push({
|
|
204
|
-
line_number: actualLineNum,
|
|
205
|
-
severity: 'High',
|
|
206
|
-
description: 'Insecure HTTP transmission of potentially sensitive data.',
|
|
207
|
-
suggestion: 'Use HTTPS instead of HTTP for all data transmission.'
|
|
208
|
-
});
|
|
209
|
-
}
|
|
216
|
+
async analyzeFile(filePath, content, patternFindings = []) {
|
|
217
|
+
// AI analysis is opt-in: with no key/backend configured, skip entirely.
|
|
218
|
+
if (!this.isEnabled()) {
|
|
219
|
+
return [];
|
|
210
220
|
}
|
|
211
221
|
|
|
212
|
-
return { vulnerabilities, __source: 'mock' };
|
|
213
|
-
}
|
|
214
|
-
|
|
215
|
-
async analyzeFile(filePath, content, patternFindings = []) {
|
|
216
222
|
const lines = content.split('\n');
|
|
217
223
|
const areasOfInterest = this._identifyAreasOfInterest(content, patternFindings);
|
|
218
224
|
const findings = [];
|
|
219
225
|
|
|
220
226
|
if (this.verbose && areasOfInterest.length > 0) {
|
|
221
|
-
console.log(` Found ${areasOfInterest.length} areas of interest for
|
|
227
|
+
console.log(` Found ${areasOfInterest.length} areas of interest for AI analysis`);
|
|
222
228
|
}
|
|
223
229
|
|
|
224
|
-
// Analyze each area of interest
|
|
225
230
|
for (const [startLine, endLine] of areasOfInterest) {
|
|
226
|
-
|
|
227
|
-
const snippetLines = lines.slice(startLine - 1, endLine);
|
|
228
|
-
const snippet = snippetLines.join('\n');
|
|
231
|
+
const snippet = lines.slice(startLine - 1, endLine).join('\n');
|
|
229
232
|
|
|
230
233
|
// Skip very small snippets
|
|
231
234
|
if (snippet.trim().length < 50) {
|
|
232
235
|
continue;
|
|
233
236
|
}
|
|
234
237
|
|
|
238
|
+
let vulnerabilities;
|
|
235
239
|
try {
|
|
236
|
-
|
|
237
|
-
const grokResponse = await this.callGrokApi(snippet, filePath, startLine);
|
|
238
|
-
|
|
239
|
-
// Add delay to prevent rate limiting from free tier
|
|
240
|
-
await new Promise(resolve => setTimeout(resolve, 1000));
|
|
241
|
-
|
|
242
|
-
// Process findings
|
|
243
|
-
for (const vuln of grokResponse.vulnerabilities || []) {
|
|
244
|
-
const finding = {
|
|
245
|
-
file_path: filePath,
|
|
246
|
-
line_number: vuln.line_number || startLine,
|
|
247
|
-
severity: vuln.severity || 'Medium',
|
|
248
|
-
finding_type: 'Context-Based Issue',
|
|
249
|
-
description: vuln.description || 'Privacy vulnerability detected',
|
|
250
|
-
code_snippet: this._getCodeSnippet(content, vuln.line_number || startLine),
|
|
251
|
-
suggestion: vuln.suggestion || 'Review and address the identified issue.',
|
|
252
|
-
source: grokResponse.__source || 'unknown'
|
|
253
|
-
};
|
|
254
|
-
findings.push(finding);
|
|
255
|
-
}
|
|
256
|
-
|
|
240
|
+
vulnerabilities = await this._analyzeSnippet(snippet, filePath, startLine);
|
|
257
241
|
} catch (error) {
|
|
258
|
-
//
|
|
242
|
+
// Skip this snippet on error - never fabricate findings.
|
|
243
|
+
if (this.verbose) {
|
|
244
|
+
console.log(` โ ๏ธ AI analysis skipped for ${filePath}:${startLine} โ ${error.message}`);
|
|
245
|
+
}
|
|
259
246
|
continue;
|
|
260
247
|
}
|
|
248
|
+
|
|
249
|
+
for (const vuln of vulnerabilities) {
|
|
250
|
+
findings.push({
|
|
251
|
+
file_path: filePath,
|
|
252
|
+
line_number: vuln.line_number || startLine,
|
|
253
|
+
severity: vuln.severity || 'Medium',
|
|
254
|
+
finding_type: 'Context-Based Issue',
|
|
255
|
+
description: vuln.description || 'Privacy vulnerability detected',
|
|
256
|
+
code_snippet: this._getCodeSnippet(content, vuln.line_number || startLine),
|
|
257
|
+
suggestion: vuln.suggestion || 'Review and address the identified issue.',
|
|
258
|
+
source: 'llm'
|
|
259
|
+
});
|
|
260
|
+
}
|
|
261
|
+
|
|
262
|
+
// Gentle pacing to respect provider rate limits.
|
|
263
|
+
if (this.rateLimitMs > 0) {
|
|
264
|
+
await new Promise(resolve => setTimeout(resolve, this.rateLimitMs));
|
|
265
|
+
}
|
|
261
266
|
}
|
|
262
267
|
|
|
263
268
|
// Remove duplicates based on line number and description
|
|
@@ -283,4 +288,4 @@ ${codeSnippet}`;
|
|
|
283
288
|
}
|
|
284
289
|
}
|
|
285
290
|
|
|
286
|
-
module.exports = LLMAnalyzer;
|
|
291
|
+
module.exports = LLMAnalyzer;
|
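The defensive parsing added in `_parseVulnerabilities` can be tried in isolation. The function below is a standalone sketch of the same idea (strict JSON first, then the outermost `{...}` span, otherwise an empty list); the name `parseVulnerabilities` and the sample strings are made up for the example.

```javascript
// Standalone sketch of the fallback: never throw, always return an array.
function parseVulnerabilities(content) {
  const pick = (text) => {
    const parsed = JSON.parse(text);
    return Array.isArray(parsed.vulnerabilities) ? parsed.vulnerabilities : [];
  };
  try {
    return pick(content);
  } catch {
    // Some models wrap the JSON in prose: retry on the outermost {...} span.
    const start = content.indexOf('{');
    const end = content.lastIndexOf('}') + 1;
    if (start === -1 || end <= start) return [];
    try {
      return pick(content.substring(start, end));
    } catch {
      return [];
    }
  }
}

const clean = '{"vulnerabilities":[{"line_number":3,"severity":"High"}]}';
const wrapped = 'Sure! Here is the result: ' + clean + ' Let me know if you need more.';

console.log(parseVulnerabilities(clean).length);
console.log(parseVulnerabilities(wrapped).length);
console.log(parseVulnerabilities('no JSON here').length);
```

Returning an empty array on any parse failure matches the 1.3.0 behaviour described in the changelog: a bad response means the snippet contributes no findings rather than invented ones.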
package/dist/ReportGenerator.js
CHANGED
|
@@ -47,6 +47,25 @@ class ReportGenerator {
|
|
|
47
47
|
}
|
|
48
48
|
}
|
|
49
49
|
|
|
50
|
+
// Public: return the privacy grade (A+ .. F) for a set of findings.
|
|
51
|
+
getGrade(findings) {
|
|
52
|
+
return this._calculateGrade(findings);
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
// Public: build a shields.io privacy-grade badge for embedding in a README.
|
|
56
|
+
getBadge(findings) {
|
|
57
|
+
const grade = this._calculateGrade(findings);
|
|
58
|
+
const colorMap = {
|
|
59
|
+
'A+': 'brightgreen', 'A': 'brightgreen', 'A-': 'green',
|
|
60
|
+
'B+': 'yellowgreen', 'B': 'yellowgreen', 'B-': 'yellow',
|
|
61
|
+
'C+': 'yellow', 'C': 'orange', 'C-': 'orange',
|
|
62
|
+
'D': 'red', 'F': 'red'
|
|
63
|
+
};
|
|
64
|
+
const color = colorMap[grade] || 'lightgrey';
|
|
65
|
+
const url = `https://img.shields.io/badge/Privacy%20Grade-${encodeURIComponent(grade)}-${color}`;
|
|
66
|
+
return { grade, url, markdown: `` };
|
|
67
|
+
}
|
|
68
|
+
|
|
50
69
|
_groupFindingsBySeverity(findings) {
|
|
51
70
|
const groups = {};
|
|
52
71
|
this.severityOrder.forEach(severity => {
|
|
@@ -132,7 +151,7 @@ class ReportGenerator {
|
|
|
132
151
|
'โ โ',
|
|
133
152
|
'โ ๐ฏ Detection Methods: โ',
|
|
134
153
|
`โ โข Pattern-based: ${patternCount} issues โ`,
|
|
135
|
-
`โ โข AI-powered: ${llmCount} issues
|
|
154
|
+
`โ โข AI-powered: ${llmCount} issues โ`,
|
|
136
155
|
'โ โ',
|
|
137
156
|
'โ ๐ Severity Breakdown: โ',
|
|
138
157
|
`โ ๐จ Critical: ${severityCounts['Critical'].toString().padEnd(10)} ๐ฅ High: ${severityCounts['High'].toString().padEnd(14)} โ`,
|
|
@@ -209,16 +228,12 @@ class ReportGenerator {
|
|
|
209
228
|
|
|
210
229
|
// Footer
|
|
211
230
|
reportLines.push(
|
|
212
|
-
'
|
|
213
|
-
'
|
|
214
|
-
'
|
|
215
|
-
'
|
|
216
|
-
'
|
|
217
|
-
'
|
|
218
|
-
'โ ๐ฆ Follow: @KafkaCodeDev โ',
|
|
219
|
-
'โ โ',
|
|
220
|
-
'โ ๐ก๏ธ Keep your code secure, keep your users safe! ๐ก๏ธ โ',
|
|
221
|
-
'โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ',
|
|
231
|
+
'',
|
|
232
|
+
'โ'.repeat(80),
|
|
233
|
+
'๐ KafkaCode ยท AI-powered privacy & compliance scanner',
|
|
234
|
+
'๐ Docs & issues: https://github.com/nikhil-kapu/kafkacode',
|
|
235
|
+
'🛡️ Keep your code secure, keep your users safe!',
|
|
236
|
+
'โ'.repeat(80),
|
|
222
237
|
''
|
|
223
238
|
);
|
|
224
239
|
|
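One detail of `getBadge` may be worth checking: the grade is passed through `encodeURIComponent`, which leaves `-` untouched, so a grade such as `A-` puts an extra dash into a path where shields.io uses `-` as the field separator. The sketch below assumes shields.io's documented static-badge escaping (a literal dash written as `--`, a literal underscore as `__`); `escapeBadgeText` and `badgeUrl` are hypothetical helpers, not part of the package.

```javascript
// Hypothetical helper. Assumption: shields.io static badges expect a literal
// dash as "--" and a literal underscore as "__" inside the label and message.
function escapeBadgeText(text) {
  return encodeURIComponent(text.replace(/-/g, '--').replace(/_/g, '__'));
}

// Build a static-badge URL of the form /badge/<label>-<message>-<color>.
function badgeUrl(grade, color) {
  const label = escapeBadgeText('Privacy Grade');
  return `https://img.shields.io/badge/${label}-${escapeBadgeText(grade)}-${color}`;
}

console.log(badgeUrl('A+', 'brightgreen'));
console.log(badgeUrl('A-', 'green'));
```

Grades without a minus sign produce the same URL as the code in the diff, so the escaping only matters for `A-`, `B-`, and `C-`.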
package/dist/cli.js
CHANGED
|
@@ -12,13 +12,15 @@ const program = new Command();
|
|
|
12
12
|
program
|
|
13
13
|
.name('kafkacode')
|
|
14
14
|
.description('KafkaCode - Privacy and Compliance Scanner')
|
|
15
|
-
.version('1.
|
|
15
|
+
.version('1.3.0');
|
|
16
16
|
|
|
17
17
|
program
|
|
18
18
|
.command('scan')
|
|
19
19
|
.description('Scan a directory for privacy issues')
|
|
20
20
|
.argument('<directory>', 'Path to the source code directory to scan')
|
|
21
21
|
.option('-v, --verbose', 'Print verbose progress updates during the scan')
|
|
22
|
+
.option('-b, --badge', 'Print a copy-paste privacy-grade badge for your README')
|
|
23
|
+
.option('--no-ai', 'Disable AI-powered analysis (run pattern scan only)')
|
|
22
24
|
.action(async (directory, options) => {
|
|
23
25
|
await runScan(directory, options);
|
|
24
26
|
});
|
|
@@ -40,6 +42,9 @@ async function runScan(directory, options = {}) {
|
|
|
40
42
|
// Initialize components
|
|
41
43
|
const fileScanner = new FileScanner(directory);
|
|
42
44
|
const analysisEngine = new AnalysisEngine(verbose);
|
|
45
|
+
if (options.ai === false) {
|
|
46
|
+
analysisEngine.disableAi();
|
|
47
|
+
}
|
|
43
48
|
const reportGenerator = new ReportGenerator();
|
|
44
49
|
|
|
45
50
|
// Scan for files
|
|
@@ -67,6 +72,18 @@ async function runScan(directory, options = {}) {
|
|
|
67
72
|
const report = reportGenerator.generateReport(directory, findings, files.length);
|
|
68
73
|
console.log(report);
|
|
69
74
|
|
|
75
|
+
// Hint that AI analysis is available when it wasn't used.
|
|
76
|
+
if (options.ai !== false && !analysisEngine.aiEnabled()) {
|
|
77
|
+
console.log('💡 Tip: set KAFKACODE_API_KEY to enable AI-powered contextual analysis. See the README.\n');
|
|
78
|
+
}
|
|
79
|
+
|
|
80
|
+
// Optionally print a copy-paste privacy-grade badge for the user's README
|
|
81
|
+
if (options.badge) {
|
|
82
|
+
const badge = reportGenerator.getBadge(findings);
|
|
83
|
+
console.log('๐ท๏ธ Privacy Grade Badge โ paste into your README:\n');
|
|
84
|
+
console.log(` ${badge.markdown}\n`);
|
|
85
|
+
}
|
|
86
|
+
|
|
70
87
|
// Return appropriate exit code
|
|
71
88
|
process.exit(findings.length > 0 ? 1 : 0);
|
|
72
89
|
|
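The post-scan decisions in `runScan` (tip, badge, exit code) depend on only three inputs, which makes them easy to express as a pure function. This is a sketch: `planOutput` is a hypothetical name, and the `options` shape assumes Commander's handling of negatable flags, where `--no-ai` leaves `options.ai` as `true` by default and sets it to `false` when passed.

```javascript
// Hypothetical pure function mirroring the post-scan decisions in runScan.
// `options.ai` is false only when --no-ai was passed; `options.badge` is set by --badge.
function planOutput(options, aiEnabled, findingsCount) {
  return {
    // Show the tip only when AI was neither used nor explicitly disabled.
    showAiTip: options.ai !== false && !aiEnabled,
    showBadge: Boolean(options.badge),
    // Non-zero exit fails CI whenever there is at least one finding.
    exitCode: findingsCount > 0 ? 1 : 0,
  };
}

console.log(planOutput({ ai: true }, false, 0));
console.log(planOutput({ ai: false, badge: true }, false, 3));
console.log(planOutput({ ai: true }, true, 1));
```

Separating the decision from the printing in this way would also make the exit-code rule testable without spawning the CLI.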
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "kafkacode",
|
|
3
|
-
"version": "1.
|
|
4
|
-
"description": "AI-powered privacy and compliance scanner
|
|
3
|
+
"version": "1.3.0",
|
|
4
|
+
"description": "AI-powered privacy and compliance scanner - find PII leaks, hardcoded secrets, and compliance violations in your source code",
|
|
5
5
|
"main": "dist/index.js",
|
|
6
6
|
"bin": {
|
|
7
7
|
"kafkacode": "dist/cli.js"
|
|
@@ -27,8 +27,7 @@
|
|
|
27
27
|
"shift-left",
|
|
28
28
|
"cli-tool",
|
|
29
29
|
"security-scanner",
|
|
30
|
-
"vulnerability-scanner"
|
|
31
|
-
"kafkalabs"
|
|
30
|
+
"vulnerability-scanner"
|
|
32
31
|
],
|
|
33
32
|
"author": "KafkaLabs <contact@kafkalabs.com>",
|
|
34
33
|
"license": "MIT",
|
|
@@ -48,10 +47,10 @@
|
|
|
48
47
|
],
|
|
49
48
|
"repository": {
|
|
50
49
|
"type": "git",
|
|
51
|
-
"url": "https://github.com/nikhil-kapu/
|
|
50
|
+
"url": "git+https://github.com/nikhil-kapu/kafkacode.git"
|
|
52
51
|
},
|
|
53
|
-
"homepage": "https://
|
|
52
|
+
"homepage": "https://github.com/nikhil-kapu/kafkacode#readme",
|
|
54
53
|
"bugs": {
|
|
55
|
-
"url": "https://github.com/nikhil-kapu/
|
|
54
|
+
"url": "https://github.com/nikhil-kapu/kafkacode/issues"
|
|
56
55
|
}
|
|
57
|
-
}
|
|
56
|
+
}
|