kuzushi 0.11.0 → 0.11.2
- package/README.md +197 -479
- package/dist/copilot/shell.d.ts +12 -8
- package/dist/copilot/shell.js +295 -187
- package/dist/copilot/shell.js.map +1 -1
- package/dist/ui/App.js +5 -1
- package/dist/ui/App.js.map +1 -1
- package/dist/ui/components/CopilotShell.d.ts +3 -6
- package/dist/ui/components/CopilotShell.js +33 -50
- package/dist/ui/components/CopilotShell.js.map +1 -1
- package/dist/ui/state.d.ts +6 -1
- package/dist/ui/state.js.map +1 -1
- package/package.json +1 -1
package/README.md (changed)
````diff
@@ -1,6 +1,6 @@
 <img src="kuzushi.png" alt="Kuzushi" width="200" />
 
-# Kuzushi —
+# Kuzushi — Security-Native AI Operating Environment
 
 [](https://github.com/allsmog/Kuzushi/actions/workflows/ci.yml)
 [](https://www.npmjs.com/package/kuzushi)
````
````diff
@@ -8,96 +8,141 @@
 
 [kuzushi.dev](https://kuzushi.dev)
 
-
+Kuzushi combines offensive security, defensive operations, and compliance governance into a single interactive platform — powered by LLM agents, backed by a persistent workspace, and delivered through a rich terminal UI.
 
-
+Find the vulnerability. Prove it's exploitable. Deploy a honeypot to detect it. Check if it violates PCI DSS. Generate the patch. One tool, one conversation.
 
 ```sh
-
+npm install -g kuzushi
+kuzushi
 ```
 
-
+## Three Ways to Use It
 
-
+### Shell (default)
 
-
+Just type `kuzushi`. The interactive copilot shell starts with your loaded modules, available tools, and any active workspace. Talk naturally or use structured commands.
 
-
+```
+kuzushi shell # default — just `kuzushi` works
+kuzushi shell --workspace acme-pentest # resume an engagement
+kuzushi shell --target ./repo # set initial target
+kuzushi shell --load blackbox,honeypot # pre-load specific modules
+```
 
-```
-
-
+```
+┌─────────────────────────────────────────────────────────────┐
+│ kuzushi shell workspace: acme-api │
+│ modules: sast, randori, blackbox, honeypot, shinsa │
+│ target: ./acme-api (Node.js + Express + PostgreSQL) │
+└─────────────────────────────────────────────────────────────┘
 
-
-
+kuzushi> modules
+kuzushi> use sast
+kuzushi/sast> run scan ./repo preset=deep
+kuzushi/sast> back
+kuzushi> tools
+kuzushi> run sast:verify fingerprint=abc123
+kuzushi> exit
 ```
 
+### Scan (headless pipeline)
+
+The full SAST pipeline — 40+ agent tasks orchestrated as a DAG. Semgrep, CodeQL, 30+ agentic detectors, AI triage, verification, PoC generation, patch synthesis. CI/CD-native with SARIF, quality gates, and exit codes.
+
 ```sh
-
-kuzushi
+kuzushi scan <repo>
+kuzushi scan <repo> --preset deep --verify --auto-patch
+kuzushi scan <repo> --sarif report.sarif --fail-on-tp --quality-gate
+kuzushi scan <repo> --resume
+```
 
-
-export ANTHROPIC_API_KEY=sk-ant-...
-kuzushi /path/to/your/repo
+### Run (headless module tool)
 
-
-export OPENAI_API_KEY=sk-...
-kuzushi /path/to/repo --model openai:gpt-4o
+Execute a single module tool without the interactive shell. Scriptable, composable with unix pipes.
 
-
-kuzushi
+```sh
+kuzushi run sast:scan ./repo --json
+kuzushi run sast:triage fingerprint=abc123 --json
+kuzushi run sast:verify fingerprint=abc123 --quiet
+kuzushi run sast:findings severity=critical,high --json
 ```
 
-
+## Quick Start
+
+Prereqs: Node 22+, and either an API key or Claude Code OAuth login.
+
+```sh
+# Install globally
+npm install -g kuzushi
 
-
+# Start the copilot shell (default)
+kuzushi
 
-
+# Or run a headless scan
+kuzushi scan /path/to/your/repo
 
-
+# With specific providers
+export ANTHROPIC_API_KEY=sk-ant-...
+kuzushi scan /path/to/your/repo
 
-
+# With OpenAI, Google, Groq, Mistral, or 15+ other providers
+kuzushi scan /path/to/repo --model openai:gpt-4o
+kuzushi scan /path/to/repo --model google:gemini-2.0-flash
+```
 
-
+Kuzushi auto-downloads Opengrep if you don't have a scanner installed. Zero dependencies to manage.
 
-
+## Module System
 
-
-- **Confidence** — 0.0 to 1.0
-- **Rationale** — why the agent reached that verdict, referencing specific code lines
-- **Verification steps** — 2-6 actionable steps a human reviewer can follow
-- **Fix suggestion** — suggested patch when applicable
-- **PoC exploit** (with `--verify`) — a concrete proof-of-concept payload proving the vulnerability is exploitable
-- **Cost** — per-finding triage and verification cost in USD
+Kuzushi's capabilities come from pluggable modules. Each module exposes tools (for shell and run modes) and optionally pipeline tasks (for scan mode DAG execution).
 
-
+| Module | Category | What It Does |
+|--------|----------|-------------|
+| **sast** (built-in) | offense | 40+ task SAST pipeline: Semgrep, CodeQL, agentic detectors, AI triage, verification, PoC, patch |
+| **randori** | intel | 7-stage PASTA threat modeling with ATT&CK/CAPEC/NVD intel, attack trees, probabilistic risk |
+| **vuln-scout** | offense | Whitebox SAST with 15 Joern CPG verification scripts, 8 autonomous agents |
+| **augur** | offense | Neuro-symbolic SAST (IRIS/ICLR 2025 LLM-driven CodeQL taint analysis) |
+| **blackbox** | offense | Black/grey-box pentesting: nmap, gobuster, nikto, hydra, privilege escalation |
+| **pwn** | offense | Binary exploitation: checksec, GDB, ROP chains, heap exploitation, SROP |
+| **pentest** | offense | MCP server wrapping metasploit, nmap, hydra, john |
+| **honeypot** | defense | Autonomous honeypot orchestration: 14 service types, 6 honeytokens, Falco |
+| **yokai** | defense | Supply chain tripwires: dependency confusion, typosquatting, registry canaries |
+| **prompt-armor** | offense | LLM red teaming: 80+ attack plugins, 25+ mutation strategies |
+| **shinsa** | governance | Multi-framework compliance: ISO 27001, NIST 800-53, SOC 2, PCI DSS |
+| **revgraph** | intel | Binary reverse engineering: Ghidra + Neo4j, NL2Cypher, function embeddings |
 
-
+Modules are loaded via the shell (`use <module>`) or at startup (`--load blackbox,honeypot`).
+
+## The SAST Pipeline
+
+The built-in `sast` module runs a 40+ task DAG pipeline:
 
 ```
 ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────┐ ┌──────────┐
-│ Task DAG
+│ Task DAG │────>│ AI Triage │────>│ Verification │────>│ Patch │────>│ Report │
 │ Semgrep │ │ Investigate │ │ Construct │ │ Generate │ │ TP only │
 │ CodeQL │ │ each finding │ │ PoC exploits │ │ & verify │ │ + export │
-│
+│ 30+ tasks │ │ with context │ │ (optional) │ │ (opt-in) │ │ + stream │
 └─────────────┘ └──────────────┘ └──────────────┘ └─────────┘ └──────────┘
 ```
 
-1. **Context gathering** — auto-detects
-2. **Code graph** — builds
-3. **Threat modeling** — Randori PASTA plugin
-4. **Threat-informed hunting** —
-5. **Task DAG execution** —
-
-
-
-
-
-
-
-
-
-
+1. **Context gathering** — auto-detects tech stack, frameworks, auth patterns, ORMs, sanitization
+2. **Code graph** — builds entry-point-to-sink graph via static analysis + LLM discovery
+3. **Threat modeling** — Randori PASTA plugin: business objectives, technical scope, DFD decomposition, STRIDE threats with ATT&CK/CAPEC/OWASP mapping and 5-factor probabilistic scoring
+4. **Threat-informed hunting** — adversarial Claude agents per DFD external entity, CTF-style hunting
+5. **Task DAG execution** — Semgrep, CodeQL, agentic scanner, 15+ specialized detectors (SSRF, SQLi, XSS, command injection, XXE, deserialization, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL, secrets/crypto, auth logic, systems-level deep semantic analysis)
+6. **Classifier funnel** — cheap single-token pre-filter removes ~80% noise before expensive triage
+7. **Deduplication** — fingerprints and merges equivalent findings across scanners
+8. **AI triage** — agent investigates each finding with source context, code graph, evidence chains, threat model, CWE-specific knowledge
+9. **Variant analysis** — confirmed TPs trigger automatic search for similar patterns
+10. **Verification** (optional) — constructs concrete PoC exploit payloads
+11. **PoC harness** (optional) — produces runnable exploit scripts with execution feedback
+12. **Dynamic analysis** (optional) — executes harnesses in Docker sandbox
+13. **Auto-patch** (optional) — generates, validates, and re-verifies patches in disposable worktrees
+14. **Report** — terminal display + SARIF, Markdown, JSON, CSV, JSONL; optional SSE streaming
+
+For each finding, you get: **verdict** (tp/fp/by_design/needs_review), **confidence** (0-1), **rationale** with code references, **verification steps**, **fix suggestion**, optional **PoC exploit**, and **cost in USD**.
 
 ## CI Integration
 
````
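The new README describes the `sast` pipeline as 40+ agent tasks run as a DAG with explicit dependencies. One common way to order such a DAG is Kahn's algorithm, grouped into "waves" whose members could run concurrently; the TypeScript sketch below shows that technique. It is an illustration under assumptions, not Kuzushi's orchestrator: the `TaskSpec` shape, the `scheduleWaves` function, and the dependency edges are hypothetical, and the task IDs are simply borrowed from names that appear elsewhere in this README (`context-gatherer`, `semgrep`, `codeql`, `triage`, `verify`).

```typescript
// Hypothetical task description: an ID plus the IDs it must wait for.
interface TaskSpec {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm, grouped into waves: every task in a wave depends only on
// tasks in earlier waves, so the tasks inside one wave could run concurrently.
// Throws on duplicate IDs, unknown dependencies, or cycles.
function scheduleWaves(tasks: TaskSpec[]): string[][] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    if (indegree.has(task.id)) throw new Error(`duplicate task: ${task.id}`);
    indegree.set(task.id, 0);
    dependents.set(task.id, []);
  }
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      const children = dependents.get(dep);
      if (children === undefined) throw new Error(`${task.id} depends on unknown task ${dep}`);
      children.push(task.id);
      indegree.set(task.id, (indegree.get(task.id) ?? 0) + 1);
    }
  }

  const waves: string[][] = [];
  let ready = tasks.filter((t) => indegree.get(t.id) === 0).map((t) => t.id);
  let scheduled = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      // Releasing a task lowers its dependents' indegree; zero means ready.
      for (const child of dependents.get(id) ?? []) {
        const remaining = (indegree.get(child) ?? 0) - 1;
        indegree.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    ready = next;
  }
  // Tasks left unscheduled are stuck behind a cycle.
  if (scheduled !== tasks.length) throw new Error("dependency cycle detected");
  return waves;
}

// Illustrative edges only; not Kuzushi's real task graph.
const waves = scheduleWaves([
  { id: "context-gatherer", dependsOn: [] },
  { id: "semgrep", dependsOn: ["context-gatherer"] },
  { id: "codeql", dependsOn: ["context-gatherer"] },
  { id: "triage", dependsOn: ["semgrep", "codeql"] },
  { id: "verify", dependsOn: ["triage"] },
]);
console.log(JSON.stringify(waves));
// [["context-gatherer"],["semgrep","codeql"],["triage"],["verify"]]
```

Returning waves instead of one flat order makes the available concurrency explicit. A production orchestrator would also need per-task failure handling and checkpointing (the README mentions SQLite checkpoints for `--resume`), which this sketch leaves out.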
````diff
@@ -114,7 +159,7 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: 22
-      - run: npx kuzushi . --sarif results.sarif --quality-gate --fail-on-tp
+      - run: npx kuzushi scan . --sarif results.sarif --quality-gate --fail-on-tp
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
       - uses: github/codeql-action/upload-sarif@v3
````
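The CI step writes `results.sarif` and hands it to `github/codeql-action/upload-sarif`. If a later step needs to inspect that file itself, a small TypeScript sketch like the one below can count SARIF 2.1.0 results by `level`. It relies only on the standard SARIF fields `runs[].results[].level`. The `gateExitCode` rule (fail on any `error` result) is a hypothetical example and is not how `--quality-gate` or `--fail-on-tp` decide, which this README does not specify. The rule IDs in the sample log are made up.

```typescript
// Minimal subset of the SARIF 2.1.0 log format used by this sketch.
type SarifLevel = "none" | "note" | "warning" | "error";

interface SarifResult {
  ruleId?: string;
  level?: SarifLevel;
  message: { text?: string };
}

interface SarifLog {
  version: string;
  runs: { results?: SarifResult[] }[];
}

// Counts results per level across all runs. A missing `level` is counted as
// "warning"; this simplifies SARIF's defaulting rules, which also consult
// the result's `kind` and the rule's default configuration.
function countByLevel(log: SarifLog): Record<SarifLevel, number> {
  const counts: Record<SarifLevel, number> = { none: 0, note: 0, warning: 0, error: 0 };
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      counts[result.level ?? "warning"] += 1;
    }
  }
  return counts;
}

// Hypothetical gate: exit code 1 if any error-level result exists, else 0.
function gateExitCode(log: SarifLog): number {
  return countByLevel(log).error > 0 ? 1 : 0;
}

// Made-up sample log with one error and one result that omits `level`.
const sample: SarifLog = {
  version: "2.1.0",
  runs: [
    {
      results: [
        { ruleId: "example-sqli", level: "error", message: { text: "SQL injection" } },
        { ruleId: "example-weak-hash", message: { text: "MD5 used for password hashing" } },
      ],
    },
  ],
};

console.log(JSON.stringify(countByLevel(sample))); // {"none":0,"note":0,"warning":1,"error":1}
console.log(gateExitCode(sample)); // 1
```

In a real job you would load the file with `JSON.parse(readFileSync("results.sarif", "utf8"))` from `node:fs` and pass the result of `gateExitCode` to `process.exit`.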
@@ -126,255 +171,132 @@ jobs:
|
|
|
126
171
|
### Quality Gates
|
|
127
172
|
|
|
128
173
|
```sh
|
|
129
|
-
kuzushi <repo> --quality-gate # fail CI on threshold violations
|
|
130
|
-
kuzushi <repo> --fail-on-tp # fail if any high/critical TP is found
|
|
131
|
-
kuzushi <repo> --sarif results.sarif # export SARIF for GitHub Code Scanning
|
|
174
|
+
kuzushi scan <repo> --quality-gate # fail CI on threshold violations
|
|
175
|
+
kuzushi scan <repo> --fail-on-tp # fail if any high/critical TP is found
|
|
176
|
+
kuzushi scan <repo> --sarif results.sarif # export SARIF for GitHub Code Scanning
|
|
132
177
|
```
|
|
133
178
|
|
|
134
|
-
## Key Features
|
|
135
|
-
|
|
136
|
-
**Vendor-agnostic LLM runtime** — works with Anthropic, OpenAI, Google, Groq, Mistral, and 15+ other providers. Swap models at runtime with `--model provider:modelId`. Use cheaper models for triage, premium models for verification.
|
|
137
|
-
|
|
138
|
-
**Exploit verification** — goes beyond classification. Constructs concrete PoC payloads (SQL injection strings, XSS vectors, etc.) that prove a finding is exploitable, not just theoretically possible.
|
|
139
|
-
|
|
140
|
-
**Crypto behavioral testing** — generates and executes behavioral test harnesses in a Docker sandbox for crypto misuse findings. Detects timing side-channels, ECB mode, weak hashes, weak PRNGs, and more.
|
|
141
|
-
|
|
142
|
-
**IRIS-style taint analysis** — LLM-driven CodeQL taint analysis inspired by the IRIS paper (ICLR 2025). An LLM selects relevant CWE classes for the project, writes CodeQL extraction queries dynamically (language-agnostic, framework-agnostic), labels candidates, generates TaintTracking configurations, and iteratively refines queries when compilation fails. Structured taint paths (source-to-sink step data) are persisted to the findings DB for downstream verification and reporting. No templates, no hardcoded framework detection.
|
|
143
|
-
|
|
144
|
-
**Randori PASTA threat modeling** — ships `@kuzushi/randori-plugin` as a dependency. Runs 4-stage PASTA analysis (objectives, scope, DFD decomposition, STRIDE threats) via Claude Code plugin. Threat leads are injected into all detector prompts for threat-informed scanning. ATT&CK, CAPEC, and OWASP mapping included.
|
|
145
|
-
|
|
146
|
-
**Threat-informed hunting** — spawns adversarial Claude agents for each DFD external entity identified by the threat model. Each agent explores the codebase as that actor (end user, admin, external service, LLM agent) looking for exploitable paths. Findings are deduplicated and fed into triage/verification.
|
|
147
|
-
|
|
148
|
-
**Systems-level deep semantic hunt** — LLM-driven analysis pipeline for finding the class of bugs that survive decades of code review and fuzzing: integer overflow/wraparound (CWE-190), sentinel value collisions (CWE-787), signed/unsigned comparison bugs (CWE-681), buffer overflows exploitable via missing stack canaries (CWE-693), use-after-free in protocol state machines (CWE-416), and unsafe block violations in Rust (CWE-704). The LLM writes and runs CodeQL queries using range analysis, loop induction analysis, and type predicates — NOT TaintTracking — to find bugs that source-to-sink taint flows cannot express. Activates automatically for C, C++, Rust, and Go codebases. The `glasswing` preset routes a frontier model to this task for maximum depth.
|
|
149
|
-
|
|
150
|
-
**Auto rule generation** — verified exploitable findings automatically generate custom Semgrep rules. Rules are persisted to `.kuzushi/custom-rules/` and auto-loaded on subsequent scans, creating a feedback loop where the scanner gets smarter over time. Rules are validated against the original finding and removed if they don't match.
|
|
151
|
-
|
|
152
|
-
**Diff-aware taint analysis** — narrows analysis to files changed since a base branch. Run `--taint-diff-base main` in CI to only analyze what's new in the PR.
|
|
153
|
-
|
|
154
|
-
**Resumable runs** — checkpoints pipeline state to SQLite. Interrupted scan? `--resume` picks up exactly where it left off.
|
|
155
|
-
|
|
156
|
-
**Patch synthesis** — `kuzushi patch <repo> --fingerprint <fp>` generates and validates security patches in disposable git worktrees without touching your working copy.
|
|
157
|
-
|
|
158
|
-
**Language-tuned detection** — every detector adapts its prompts to your repo's actual tech stack. Kuzushi auto-detects languages and frameworks, then injects language-specific sinks, safe patterns, few-shot examples, investigation hints (grep patterns, key files), framework-aware guidance, and anti-hallucination constraints. A Python repo gets `subprocess.run` shell=True analysis and Django/Flask/FastAPI-specific advice. A C/C++ repo gets buffer-size #define resolution, signed/unsigned mismatch detection, and memory-safety few-shots. A Java repo gets SpEL injection, XXE factory configuration, and Spring Security guidance. 8 language ecosystems covered: C/C++, Java/Kotlin, Python, Go, JavaScript/TypeScript, Rust, PHP, Ruby — each with per-vulnerability-class depth. Polyglot repos get all relevant languages composed together.
|
|
159
|
-
|
|
160
|
-
**15+ specialized detectors** — dedicated detection tasks for command injection, XXE, insecure deserialization, SSRF, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL security, secrets/crypto, code config, auth logic, sharp edges, crypto behavioral testing, and systems-level deep semantic analysis (integer overflow, buffer overflow, sentinel collision, use-after-free, unsafe blocks). Each has vulnerability-class-specific prompts, anti-hallucination constraints, and multi-lens analysis. All detectors receive threat model leads for threat-informed scanning.
|
|
161
|
-
|
|
162
|
-
**Classifier funnel** — single-token LLM pre-filter using a cheap model removes ~80% of false positives before expensive triage, cutting per-scan cost dramatically.
|
|
163
|
-
|
|
164
|
-
**Source pre-read** — triage agents receive the flagged source file pre-loaded (50 lines surrounding the finding), eliminating cold-start tool calls and improving reasoning accuracy.
|
|
165
|
-
|
|
166
|
-
**LLM code graph** — builds a persistent code graph tracing entry points through middleware, controllers, services, and data-access layers. Static skeleton from import analysis + LLM-assisted gap-filling for dynamic dispatch, DI, and callback patterns. Discovery mode: when no HTTP routes are detected, the LLM identifies entry points itself (main functions, socket listeners, gRPC servers, CLI handlers) and traces security-relevant paths with threat model context. Feeds graph context into triage for better reasoning.
|
|
167
|
-
|
|
168
|
-
**Multi-strategy analysis** — runs 2-4 different analytical approaches per vulnerability class (syntactic pattern matching, dataflow tracing, first-principles reasoning, execution-based proof) and merges results with confidence boosting when strategies agree. Auto-generates reusable Semgrep rules from confirmed multi-strategy findings.
|
|
169
|
-
|
|
170
|
-
**13 CWE knowledge modules** — domain-specific knowledge for SQL injection, XSS, SSRF, command injection, path traversal, auth bypass, deserialization, race conditions, crypto, XXE, file upload, IDOR, and NoSQL injection — including dangerous patterns, safe patterns, bypass techniques, and fix examples.
|
|
171
|
-
|
|
172
|
-
**Incremental scanning** — skips re-triage for unchanged findings across runs. Tracks the last scanned commit, computes file diffs, and expands the rescan scope with dependency-aware invalidation via the import graph.
|
|
173
|
-
|
|
174
|
-
**Auto-patch with closed-loop verification** — after confirming a vulnerability, automatically generates a patch in a disposable git worktree, validates it (apply, build, test), then re-runs the scanner on the patched code to confirm the vulnerability is gone.
|
|
175
|
-
|
|
176
|
-
**Live streaming** — SSE server streams pipeline events in real-time (`--stream`). Connect with `curl`, `EventSource`, or any SSE client to watch findings appear as they're triaged.
|
|
177
|
-
|
|
178
|
-
**Interactive terminal UI** — React+Ink-powered live display with pipeline progress tree, spinners, attack chain diagrams, and a trophy screen for confirmed exploits. Includes an interactive REPL during scans (pause, skip, inspect findings), a first-run setup wizard, config confirmation flow, inline code preview, and clickable file paths. Auto-detects terminal theme and falls back to plain text in non-TTY environments.
|
|
179
|
-
|
|
180
|
-
**Audit logging** — optional JSONL audit trail of every agent decision for debugging, accountability, and compliance records.
|
|
181
|
-
|
|
182
179
|
## Scan Presets
|
|
183
180
|
|
|
184
|
-
Presets configure the pipeline for different cost/depth tradeoffs. CLI flags override preset values.
|
|
185
|
-
|
|
186
181
|
```sh
|
|
187
|
-
kuzushi <repo> --preset fast # semgrep only, no context/enrichment
|
|
188
|
-
kuzushi <repo> --preset standard # semgrep + IRIS taint + secrets/crypto
|
|
189
|
-
kuzushi <repo> --preset deep #
|
|
190
|
-
kuzushi <repo> --preset glasswing #
|
|
191
|
-
# + deep semantic hunt with a frontier model
|
|
182
|
+
kuzushi scan <repo> --preset fast # semgrep only, no context/enrichment
|
|
183
|
+
kuzushi scan <repo> --preset standard # semgrep + IRIS taint + secrets/crypto
|
|
184
|
+
kuzushi scan <repo> --preset deep # + verification + threat modeling + systems hunt
|
|
185
|
+
kuzushi scan <repo> --preset glasswing # + PoC + threat-informed hunting + frontier model
|
|
192
186
|
```
|
|
193
187
|
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
## Tasks
|
|
197
|
-
|
|
198
|
-
Every stage of the pipeline is an **AgentTask** — a composable unit with explicit dependencies that the orchestrator runs as a DAG. Tasks are selected via `--tasks` or `config.tasks`, and per-task config (including model overrides) lives in `config.taskConfig`.
|
|
188
|
+
## Key Features
|
|
199
189
|
|
|
200
|
-
|
|
201
|
-
|---------|-------------|---------------|
|
|
202
|
-
| `semgrep` (default) | Traditional SAST via Opengrep/Semgrep | Yes |
|
|
203
|
-
| `codeql` | Semantic dataflow/taint analysis via GitHub CodeQL CLI | No (opt-in) |
|
|
204
|
-
| `agentic` | AI-driven scanner — LLM with read-only repo tools | N/A |
|
|
205
|
-
| `taint-cwe-select` / `taint-iris` | IRIS-style LLM-driven CodeQL taint analysis — dynamic CWE selection, LLM-generated queries, iterative refinement | No (opt-in) |
|
|
206
|
-
| `systems-hunt` | Deep semantic analysis for C/C++/Rust/Go — LLM-driven CodeQL range analysis, loop induction, missing mitigations | No (opt-in) |
|
|
207
|
-
| `secrets-crypto-detect` | Secrets, API keys, and cryptographic misuse detection | N/A |
|
|
208
|
-
| `code-config-detect` | Security-relevant code and configuration issues | N/A |
|
|
209
|
-
| `threat-model-randori` | PASTA threat modeling with STRIDE analysis | N/A |
|
|
210
|
-
| `threat-hunt` | Adversarial CTF-style hunting per DFD entity | N/A |
|
|
211
|
-
| `context-gatherer` | Auto-detects tech stack, frameworks, auth patterns | N/A |
|
|
212
|
-
| `context-enricher` | Deep context enrichment (middleware, trust boundaries) | N/A |
|
|
190
|
+
**Vendor-agnostic LLM runtime** — Anthropic, OpenAI, Google, Groq, Mistral, and 15+ providers. Swap models at runtime with `--model provider:modelId`. Runs airgapped with Ollama.
|
|
213
191
|
|
|
214
|
-
|
|
215
|
-
kuzushi <repo> --tasks semgrep,codeql # run specific tasks
|
|
216
|
-
kuzushi <repo> --tasks agentic # AI-only scan
|
|
217
|
-
kuzushi <repo> --task-model threat-hunt=anthropic:claude-opus-4-6 # per-task model override
|
|
218
|
-
```
|
|
192
|
+
**Exploit verification** — constructs concrete PoC payloads (SQL injection strings, XSS vectors, etc.) that prove exploitability, not just theoretical possibility.
|
|
219
193
|
|
|
220
|
-
|
|
194
|
+
**IRIS-style taint analysis** — LLM-driven CodeQL taint analysis (ICLR 2025). LLM selects CWE classes, writes CodeQL queries dynamically, labels candidates, generates TaintTracking configs, iteratively refines on failure.
|
|
221
195
|
|
|
222
|
-
|
|
223
|
-
<summary><strong>All Commands</strong></summary>
|
|
196
|
+
**Randori PASTA threat modeling** — 4-stage PASTA analysis (objectives, scope, DFD, STRIDE threats) via Claude Code plugin. Threat leads injected into all detector prompts. ATT&CK, CAPEC, OWASP mapping.
|
|
224
197
|
|
|
225
|
-
|
|
198
|
+
**Systems-level deep semantic hunt** — finds bugs that survive decades of review: integer overflow, sentinel collisions, signed/unsigned comparison, buffer overflows, use-after-free in protocol state machines, unsafe Rust blocks. LLM writes CodeQL queries using range analysis, not TaintTracking.
|
|
226
199
|
|
|
227
|
-
|
|
228
|
-
kuzushi <repo> # scan with defaults
|
|
229
|
-
kuzushi <repo> --tasks codeql
|
|
230
|
-
kuzushi <repo> --tasks semgrep,codeql
|
|
231
|
-
kuzushi <repo> --tasks semgrep,agentic
|
|
232
|
-
kuzushi <repo> --severity ERROR # only ERROR-level findings
|
|
233
|
-
kuzushi <repo> --max 20 # triage top 20 findings only
|
|
234
|
-
kuzushi <repo> --model anthropic:claude-sonnet-4-6 # use a different model
|
|
235
|
-
kuzushi <repo> --task-model triage=openai:gpt-4o # separate model for triage stage
|
|
236
|
-
kuzushi <repo> --api-key sk-ant-... --base-url https://api.example.com/ # custom API endpoint
|
|
237
|
-
kuzushi <repo> --fresh # clear prior results, re-triage everything
|
|
238
|
-
kuzushi <repo> --db ./my.sqlite3 # custom database path
|
|
239
|
-
kuzushi <repo> --resume # resume the most recent interrupted run
|
|
240
|
-
kuzushi <repo> --resume <run-id> # resume a specific run by ID
|
|
241
|
-
```
|
|
200
|
+
**Multi-strategy analysis** — 2-4 analytical approaches per vulnerability class with confidence boosting when strategies agree. Auto-generates Semgrep rules from confirmed findings.
|
|
242
201
|
|
|
243
|
-
|
|
202
|
+
**Auto-patch with closed-loop verification** — generates patches in disposable worktrees, validates, re-runs the scanner to confirm the vulnerability is gone.
|
|
244
203
|
|
|
245
|
-
|
|
246
|
-
kuzushi <repo> --verify # enable exploit verification for TPs
|
|
247
|
-
kuzushi <repo> --verify --task-model verify=openai:gpt-4o-mini # cheaper model for verification
|
|
248
|
-
kuzushi <repo> --verify --verify-max-turns 20
|
|
249
|
-
kuzushi <repo> --verify --verify-concurrency 3
|
|
250
|
-
kuzushi <repo> --verify --verify-min-confidence 0.7 # skip low-confidence TPs
|
|
251
|
-
```
|
|
204
|
+
**Crypto behavioral testing** — generates and executes behavioral test harnesses in Docker for crypto misuse: timing side-channels, ECB mode, weak hashes, weak PRNGs.
|
|
252
205
|
|
|
253
|
-
|
|
206
|
+
**Language-tuned detection** — auto-detects languages and frameworks, injects language-specific sinks, safe patterns, few-shot examples, anti-hallucination constraints. 8 ecosystems: C/C++, Java/Kotlin, Python, Go, JS/TS, Rust, PHP, Ruby.
|
|
254
207
|
|
|
255
|
-
|
|
256
|
-
kuzushi <repo> --verify --poc-harness # generate exploit scripts for verified findings
|
|
257
|
-
kuzushi <repo> --verify --poc-harness --task-model poc-harness=openai:gpt-4o-mini
|
|
258
|
-
kuzushi <repo> --verify --poc-harness --poc-harness-max-turns 25
|
|
259
|
-
kuzushi <repo> --verify --poc-harness --poc-harness-concurrency 2
|
|
260
|
-
```
|
|
208
|
+
**Resumable runs** — checkpoints to SQLite. `--resume` picks up where you left off.
|
|
261
209
|
|
|
262
|
-
|
|
210
|
+
**Interactive terminal UI** — React+Ink-powered live display with pipeline progress tree, spinners, trophy screen for confirmed exploits. REPL during scans (pause, skip, inspect). First-run setup wizard. Falls back to plain text in non-TTY.
|
|
263
211
|
|
|
264
|
-
|
|
265
|
-
kuzushi <repo> --verify --poc-harness --dynamic-analysis # execute harnesses to confirm/reject findings
|
|
266
|
-
kuzushi <repo> --verify --dynamic-analysis --dynamic-max-candidates 10
|
|
267
|
-
kuzushi <repo> --verify --dynamic-analysis --dynamic-min-score 8
|
|
268
|
-
```
|
|
212
|
+
**Incremental scanning** — skips re-triage for unchanged findings. Dependency-aware invalidation via import graph.
|
|
269
213
|
|
|
270
|
-
|
|
214
|
+
**Audit logging** — JSONL audit trail of every agent decision.
|
|
271
215
|
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
kuzushi patch <repo> --fingerprint <fp> --build-cmd "npm run build"
|
|
275
|
-
kuzushi patch <repo> --fingerprint <fp> --test-cmd "npm test" --max-iterations 5
|
|
276
|
-
```
|
|
216
|
+
<details>
|
|
217
|
+
<summary><strong>All Scan Commands</strong></summary>
|
|
277
218
|
|
|
278
|
-
###
|
|
219
|
+
### Scan
|
|
279
220
|
|
|
280
221
|
```
|
|
281
|
-
kuzushi <repo>
|
|
222
|
+
kuzushi scan <repo> # scan with defaults
|
|
223
|
+
kuzushi scan <repo> --tasks codeql
|
|
224
|
+
kuzushi scan <repo> --tasks semgrep,codeql
|
|
225
|
+
kuzushi scan <repo> --severity ERROR
|
|
226
|
+
kuzushi scan <repo> --max 20
|
|
227
|
+
kuzushi scan <repo> --model anthropic:claude-sonnet-4-6
|
|
228
|
+
kuzushi scan <repo> --task-model triage=openai:gpt-4o
|
|
229
|
+
kuzushi scan <repo> --fresh
|
|
230
|
+
kuzushi scan <repo> --resume
|
|
282
231
|
```
|
|
283
232
|
|
|
284
|
-
###
|
|
233
|
+
### Verification & PoC
|
|
285
234
|
|
|
286
235
|
```
|
|
287
|
-
kuzushi <repo> --
|
|
288
|
-
kuzushi <repo> --
|
|
289
|
-
kuzushi <repo> --
|
|
290
|
-
kuzushi <repo> --multi-strategy-auto-rules # generate Semgrep rules from confirmed multi-strategy findings
|
|
236
|
+
kuzushi scan <repo> --verify
|
|
237
|
+
kuzushi scan <repo> --verify --poc-harness
|
|
238
|
+
kuzushi scan <repo> --verify --poc-harness --dynamic-analysis
|
|
291
239
|
```
|
|
292
240
|
|
|
293
|
-
###
|
|
241
|
+
### Multi-Strategy & Code Graph
|
|
294
242
|
|
|
295
243
|
```
|
|
296
|
-
kuzushi <repo> --
|
|
297
|
-
kuzushi <repo> --
|
|
298
|
-
kuzushi <repo> --
|
|
299
|
-
kuzushi <repo> --auto-patch --patch-verify-depth triage # re-run scanner + triage on patched code
|
|
300
|
-
kuzushi <repo> --auto-patch --patch-verify-depth full # full pipeline re-verify (most thorough)
|
|
301
|
-
kuzushi <repo> --auto-patch --patch-concurrency 3 # parallel patch synthesis tasks
|
|
244
|
+
kuzushi scan <repo> --multi-strategy
|
|
245
|
+
kuzushi scan <repo> --multi-strategy-full
|
|
246
|
+
kuzushi scan <repo> --code-graph
|
|
302
247
|
```
|
|
303
248
|
|
|
304
|
-
###
|
|
249
|
+
### Auto-Patch
|
|
305
250
|
|
|
306
251
|
```
|
|
307
|
-
kuzushi <repo> --
|
|
308
|
-
kuzushi <repo> --
|
|
309
|
-
# Then in another terminal:
|
|
310
|
-
curl -N http://localhost:3001/events # watch live pipeline events
|
|
252
|
+kuzushi scan <repo> --verify --auto-patch
+kuzushi scan <repo> --auto-patch --patch-verify-depth full
 ```

-###
+### Diff-Aware & Crypto

 ```
-kuzushi <repo> --
+kuzushi scan <repo> --taint-diff-base main
+kuzushi scan <repo> --crypto-behavioral-test
 ```

-###
+### Output

 ```
-kuzushi <repo> --
-kuzushi <repo> --
-kuzushi <repo> --
+kuzushi scan <repo> --output report.md
+kuzushi scan <repo> --sarif results.sarif
+kuzushi scan <repo> --json results.json
+kuzushi scan <repo> --csv results.csv
+kuzushi scan <repo> --stream
+kuzushi scan <repo> --audit-log
 ```

-###
+### Run Mode

 ```
-kuzushi
-kuzushi
-kuzushi
-kuzushi
-kuzushi <repo> --jsonl results.jsonl # export JSONL report
-kuzushi <repo> --audit-log # write agent activity to .kuzushi/runs/{runId}/
-kuzushi <repo> --verbose # show debug-level runtime diagnostics
-kuzushi <repo> --no-context # disable repo context gathering
+kuzushi run sast:scan ./repo --json
+kuzushi run sast:triage fingerprint=abc123 --json
+kuzushi run sast:verify fingerprint=abc123 --quiet
+kuzushi run sast:findings severity=critical,high --json
 ```

-###
+### Patch

 ```
-kuzushi <repo> --
-kuzushi <repo> --
-kuzushi <repo> --retry-backoff-ms 10000 # initial backoff delay (default: 5000)
+kuzushi patch <repo> --fingerprint <fp>
+kuzushi patch <repo> --fingerprint <fp> --build-cmd "npm run build" --test-cmd "npm test"
 ```

 ### Config

 ```
-kuzushi config get
-kuzushi config get model # show one key
+kuzushi config get
 kuzushi config set model anthropic:claude-sonnet-4-6
 kuzushi config set tasks semgrep,agentic
-kuzushi config
-kuzushi config
-kuzushi config set taskConfig.semgrep.binary opengrep
-kuzushi config set taskConfig.semgrep.configFlag auto
-kuzushi config set taskConfig.agentic.model anthropic:claude-sonnet-4-6
-kuzushi config set taskConfig.agentic.maxFindings 25
-kuzushi config set taskConfig.triage.model anthropic:claude-opus-4-6
-kuzushi config set taskConfig.verify.model openai:gpt-4o-mini
-kuzushi config set severity ERROR,WARNING,INFO
-kuzushi config set verify true
-kuzushi config set verifyMinConfidence 0.7
-kuzushi config set auditLog true
-kuzushi config validate --repo . # validate the effective config for this repo
-kuzushi config unset model # reset to default
-kuzushi config path # print config file location
+kuzushi config validate --repo .
+kuzushi config path
 ```

-Global config lives at `~/.kuzushi/config.json`. Optional project overrides can live at `<repo>/.kuzushi/config.json`. CLI flags override config values.
-
-**Repo-local config sandboxing:** By default, project-level config files (`<repo>/.kuzushi/config.json`) are sandboxed — keys that could execute code or reach external systems (e.g., `hooks`, `externalTasks`, `pocExecute`, scanner binary paths) are silently stripped. This prevents a cloned repo from altering your runtime behavior. Pass `--trust-repo-config` to opt in to the full project config when you trust the repository.
-
-Security note: `agentRuntimeConfig.apiKey` is stored in plaintext in config files. Prefer `--api-key` for one-off runs or `ANTHROPIC_API_KEY` from your shell/secret manager.
-
 </details>

 <details>
@@ -384,93 +306,44 @@ Security note: `agentRuntimeConfig.apiKey` is stored in plaintext in config file
 | --- | --- | --- |
 | `model` | `anthropic:claude-sonnet-4-6` | Default LLM model for all tasks and stages |
 | `tasks` | `["semgrep"]` | Enabled task IDs, in execution order |
-| `taskConfig` | `{ semgrep: {...}, triage: {...}, ... }` | Per-task config blocks keyed by task ID or stage ID
+| `taskConfig` | `{ semgrep: {...}, triage: {...}, ... }` | Per-task config blocks keyed by task ID or stage ID |
 | `severity` | `["ERROR","WARNING"]` | Semgrep severity filter |
 | `excludePatterns` | `["test","tests","node_modules",...]` | Directories/globs to skip |
-| `busBackend` | `"in-process"` | Message bus transport (`in-process`) |
 | `triageConcurrency` | `5` | Parallel LLM triage calls |
-| `scanMode` | `"concurrent"` | Task execution mode
-| `agentRuntimeBackend` | `"pi-ai"` | Agent runtime backend (`pi-ai`) |
+| `scanMode` | `"concurrent"` | Task execution mode |
 | `verify` | `false` | Enable proof-of-exploitability verification |
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-
-
-
-|
-|
-
-
-
-
-
-| `taskConfig.triage.model` | `model` | Model for triage agents |
-| `taskConfig.verify.model` | `model` | Model for verification agents |
-| `taskConfig.poc-harness.model` | `taskConfig.verify.model` → `model` | Model for PoC harness generation |
-| `multiStrategyMode` | `"off"` | Multi-strategy analysis mode (`off`, `adaptive`, `full`) |
-| `multiStrategyBudgetUsd` | `2.0` | Per-finding budget across all strategies (USD) |
-| `autoPatchEnabled` | `false` | Enable automatic patch generation in pipeline |
-| `autoPatchAfter` | `"verify"` | Trigger threshold for auto-patch (`verify`, `poc`, `triage`) |
-| `patchVerifyDepth` | `"task"` | Re-verification depth after patching (`task`, `triage`, `full`) |
-| `patchConcurrency` | `2` | Max concurrent patch synthesis tasks |
-| `incrementalCache` | `true` | Enable incremental scanning (skip unchanged findings across runs) |
-| `incrementalDepTracking` | `true` | Include importers of changed files in rescan scope |
-| `streamingEnabled` | `false` | Enable SSE streaming server for live pipeline events |
-| `streamingPort` | `0` (auto) | Port for the SSE streaming server |
-| `enableContextGathering` | `true` | Run repo context analysis before triage |
-| `auditLog` | `false` | Write agent activity to JSONL audit files |
-| `reportOutput` | _(unset)_ | Write markdown report output to this path |
-| `sarifOutput` | _(unset)_ | Write SARIF v2.1.0 output to this path |
-| `jsonOutput` | _(unset)_ | Write JSON report to this path |
-| `csvOutput` | _(unset)_ | Write CSV report to this path |
-| `jsonlOutput` | _(unset)_ | Write JSONL report to this path |
-| `maxTriageRetries` | `2` | Retry failed triage calls |
-| `maxVerifyRetries` | `2` | Retry failed verification calls |
-| `maxPocHarnessRetries` | `2` | Retry failed PoC harness generation calls |
-| `retryBackoffMs` | `5000` | Initial retry backoff delay in ms |
-| `retryBackoffMultiplier` | `2` | Exponential backoff multiplier |
-
-Example config:
-
-```json
-{
-"tasks": ["semgrep", "codeql", "context-gatherer", "context-enricher", "secrets-crypto-detect", "code-config-detect", "taint-cwe-select", "taint-iris"],
-"scanMode": "concurrent",
-"triageConcurrency": 3,
-"verify": true,
-"verifyMinConfidence": 0.7,
-"auditLog": true,
-"taskConfig": {
-"codeql": { "dbPath": "./codeql-db", "suite": "javascript-security-extended" },
-"semgrep": { "binary": "opengrep", "configFlag": "auto" },
-"agentic": { "model": "anthropic:claude-sonnet-4-6", "maxFindings": 20 },
-"triage": { "model": "anthropic:claude-opus-4-6" },
-"verify": { "model": "openai:gpt-4o-mini" }
-}
-}
-```
+| `pocHarness` | `false` | Enable PoC harness generation |
+| `cryptoBehavioralTestEnabled` | `false` | Enable crypto behavioral testing |
+| `codeGraphEnabled` | `true` | Enable LLM code graph |
+| `multiStrategyMode` | `"off"` | Multi-strategy analysis (`off`, `adaptive`, `full`) |
+| `autoPatchEnabled` | `false` | Enable auto-patch generation |
+| `incrementalCache` | `true` | Enable incremental scanning |
+| `auditLog` | `false` | Write JSONL audit files |
+
+**Stage model overrides** via `taskConfig`:
+
+| Key | Purpose |
+| --- | --- |
+| `taskConfig.triage.model` | Model for triage agents |
+| `taskConfig.verify.model` | Model for verification agents |
+| `taskConfig.poc-harness.model` | Model for PoC harness generation |
+
+Global config: `~/.kuzushi/config.json`. Project overrides: `<repo>/.kuzushi/config.json`. CLI flags override config.

 ### Environment Variables

 | Variable | Required | Description |
 | --- | --- | --- |
-| `ANTHROPIC_API_KEY` | When using `anthropic:*` models | Anthropic API key
-| `OPENAI_API_KEY` | When using `openai:*` models | OpenAI API key
-| `GEMINI_API_KEY` / `GOOGLE_API_KEY` | When using `google:*` models | Google API key
+| `ANTHROPIC_API_KEY` | When using `anthropic:*` models | Anthropic API key |
+| `OPENAI_API_KEY` | When using `openai:*` models | OpenAI API key |
+| `GEMINI_API_KEY` / `GOOGLE_API_KEY` | When using `google:*` models | Google API key |

 </details>

 <details>
 <summary><strong>CodeQL Setup</strong></summary>

-The `codeql` scanner requires the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases)
-
-Install it:
+The `codeql` scanner requires the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases) installed separately (not auto-downloaded).

 ```sh
 # Via GitHub CLI (recommended):
@@ -480,175 +353,20 @@ gh extension install github/gh-codeql && gh codeql install-stub
 # https://github.com/github/codeql-cli-binaries/releases
 ```

-
-
-1. `codeql` on your PATH
-2. Previously placed binary at `~/.kuzushi/bin/codeql`
-3. Fails with install instructions if not found
-
-CodeQL is **opt-in** — the default scanner list is `["semgrep"]`. To enable it:
-
-```sh
-kuzushi <repo> --scanners codeql # CodeQL only
-kuzushi <repo> --scanners semgrep,codeql # both scanners
-kuzushi config set scanners semgrep,codeql # persist as default
-```
-
-CodeQL builds a database from your source code before running queries. You can skip this step by pointing to a pre-built database:
-
-```sh
-kuzushi config set scannerConfig.codeql.dbPath ./codeql-db
-```
-
-</details>
-
-<details>
-<summary><strong>Taint Analysis Setup</strong></summary>
-
-The `taint-analysis` scanner is a multi-pass CodeQL-based pipeline that uses LLM-assisted classification to label sources, sinks, sanitizers, and summaries. It requires:
-
-1. **CodeQL CLI** — same requirement as the `codeql` scanner
-2. **Python 3** — used by taint analysis scripts for query generation
-
-Taint analysis templates, references, and scripts are bundled as the [`@kuzushi/augur`](https://www.npmjs.com/package/@kuzushi/augur) npm package and installed automatically with `pnpm install`. No manual clone or `TAINT_ANALYSIS_PATH` setup needed.
-
-```sh
-kuzushi <repo> --scanners taint-analysis
-kuzushi config set scannerConfig["taint-analysis"].labelingModel anthropic:claude-sonnet-4-6
-kuzushi config set scannerConfig["taint-analysis"].passes "[1,2,3,4,5,6]"
-```
-
-To override the bundled taint-analysis assets (e.g., for local development), set `TAINT_ANALYSIS_PATH` or `scannerConfig["taint-analysis"].taintAnalysisPath`:
-
-```sh
-export TAINT_ANALYSIS_PATH=/path/to/local/taint-analysis
-kuzushi config set scannerConfig["taint-analysis"].taintAnalysisPath /path/to/local/taint-analysis
-```
-
-Taint analysis runs in three DAG-ordered stages: **preflight** (database creation, candidate extraction), **label** (LLM classification), and **analyze** (library generation, query execution, finding extraction).
-
-### Taint Analysis TI + Artifact Outputs
-
-Each taint analysis run emits interoperability artifacts under the workspace (`scannerConfig["taint-analysis"].workspaceDir`, default `./iris`) and run directory:
-
-- `iris/exploration/TI_PRIOR.md` and `iris/exploration/ti_prior.json` — live TI prior (CISA KEV + NVD) with degraded-mode metadata when fetches fail
-- `iris/labels/TAINT_MODEL.json` — per-CWE taint model (`sources/sinks/sanitizers/propagators`) with TI-weighted basis
-- `iris/results/findings.raw.json` — normalized raw findings aggregate from taint analysis pass SARIF outputs
-- `.kuzushi/runs/<runId>/findings.triaged.json` — triaged findings export including optional taint analysis source/sink triage details
-
-Relevant `scannerConfig["taint-analysis"]` options:
-
-- `tiMode`: `"live-required"` (default)
-- `tiFailurePolicy`: `"continue_without_ti"` (default)
-- `tiTimeoutMs`: live TI fetch timeout in milliseconds
-- `refinementEnabled`: enable one post-triage refinement loop (default `false`)
-- `refinementIterations`: max refinement passes when enabled (default `1`)
-- `refinementDeltaOnly`: triage only changed findings after refinement (default `true`)
-- `refinementModel`: optional model override for refinement stage wiring
-
-</details>
-
-<details>
-<summary><strong>Agent Runtime Backends</strong></summary>
-
-Kuzushi supports two agent runtime backends:
-
-**Claude (default)** — Uses `@anthropic-ai/claude-agent-sdk` to spawn Claude Code subprocesses with built-in tool implementations (Read, Glob, Grep, Bash, etc.). Supports session reuse: batch operations keep a single subprocess alive across multiple turns via the SDK's streaming input API, reducing subprocess spawns by ~99%. Requires `ANTHROPIC_API_KEY`.
-
-**Pi-AI** — Uses `@mariozechner/pi-ai` to provide vendor-agnostic LLM access. It supports 15+ providers (Anthropic, OpenAI, Google, Groq, Mistral, etc.) through a single interface. All LLM calls run in-process (no subprocesses).
-
-Kuzushi implements an internal agentic loop on top of pi-ai:
-
-1. **Tool-calling loop** — call model, parse tool calls, execute tools, feed results back, repeat until stop or max turns
-2. **Local tool implementations** — Read (file reader with line numbers), Glob (Node 22+ `globSync`), Grep (regex search across files)
-3. **Structured output** — system prompt injection + post-hoc JSON extraction from fenced code blocks or raw text
-4. **Safety controls** — max turns, budget enforcement, abort signal, permission gating via `canUseTool`
+CodeQL is opt-in. Enable it:

 ```sh
-
-
-
-ANTHROPIC_API_KEY=... kuzushi <repo> --model anthropic:claude-sonnet-4-6
+kuzushi scan <repo> --tasks codeql
+kuzushi scan <repo> --tasks semgrep,codeql
+kuzushi config set tasks semgrep,codeql
 ```

 </details>

-
-<summary><strong>Architecture</strong></summary>
-
-Kuzushi is built on three core abstractions:
-
-**Message Bus** — A transport-agnostic `MessageBus` interface (`publish`, `subscribe`, `waitFor`) that decouples pipeline stages. The stable build supports the in-process `EventEmitter` transport today.
-
-**AgentTask + DAG** — Every unit of work (context gatherer, scanner, future threat modeler, etc.) implements the `AgentTask` interface: an `id`, `dependsOn` list, `outputKind`, and a `run()` method. The `TaskRegistry` resolves enabled tasks into a DAG, groups them into parallel stages, detects cycles, and hands execution to the `PipelineOrchestrator`. Upstream task outputs are forwarded to dependents automatically.
-
-**Pipeline Phases** — After the DAG completes, the orchestrator drives sequential phases: triage (classify findings), verification (construct PoC exploits), patch synthesis (auto-generate and re-verify fixes), and report (display results + optional SSE streaming). Each phase has its own concurrency control, cost tracking, and checkpoint support.
-
-**Strategy Framework** — The multi-strategy system wraps detection tasks with multiple analytical approaches (syntactic, dataflow, reasoning, execution) that run in parallel or adaptively, merging results with corroboration-based confidence boosting.
-
-**Code Graph** — A persistent SQLite-backed graph of code paths from entry points to sinks, built from static import analysis and LLM-assisted tracing. Injected into triage prompts for deeper reasoning about reachability and sanitization.
-
-**Session Reuse** — The Claude runtime uses the Agent SDK's `AsyncIterable<SDKUserMessage>` prompt to keep a single subprocess alive across multiple turns. Batch operations (taint labeling, triage, verification, rescoring, PoC generation, patch synthesis) write per-batch data to `.kuzushi/batches/` files and send the subprocess a prompt to `Read` each file. This reduces worst-case subprocess spawns from ~3,100 to ~24 per pipeline run. Runtimes without `createSession` support (pi-ai) fall back to one subprocess per call automatically.
-
-Existing `ScannerPlugin` implementations (Semgrep, Agentic) are adapted into `AgentTask` via `adaptScannerPlugin()`, so the scanner plugin API remains stable.
-
-See [AGENTS.md](AGENTS.md) for the full developer guide on adding new agent tasks.
-
-### Package Surface
-
-Kuzushi is published as a CLI-first package. The supported npm surface is the executable plus the root package export; internal modules under `dist/*` and `src/*` are not a stable API contract and may change between releases.
-
-Release builds are expected to come from a clean compile into `dist/`. `pnpm build` now cleans `dist/` first, `prepack` rebuilds automatically, and `pnpm verify:pack` runs `npm pack --dry-run` so stale artifacts do not get published.
-
-</details>
-
-## Output
-
-Results are stored in SQLite at `<repo>/.kuzushi/findings.sqlite3`. Export to any format:
-
-```sh
-kuzushi <repo> --output report.md # Markdown
-kuzushi <repo> --sarif results.sarif # SARIF v2.1.0 (GitHub Code Scanning compatible)
-kuzushi <repo> --json results.json # JSON
-kuzushi <repo> --csv results.csv # CSV
-kuzushi <repo> --jsonl results.jsonl # JSONL
-```
-
-## Development
-
-```
-pnpm install # install deps
-pnpm dev -- /path/to/repo # run in dev mode
-pnpm check:types # typecheck app + benchmark tooling
-pnpm typecheck # type check
-pnpm test # run tests
-pnpm test:e2e # deterministic mock-backed smoke scan against fixture app
-pnpm test:coverage # tests + coverage (70% threshold)
-pnpm build # compile to dist/
-pnpm verify:pack # verify published tarball contents
-pnpm benchmark # run benchmark suite against govwa dataset
-pnpm benchmark:freeze # freeze current benchmark results as baseline
-pnpm benchmark:diff # diff current results against frozen baseline
-pnpm benchmark:regression # CI regression check against baseline
-```
-
-`pnpm test:e2e` and the benchmark regression workflow use `tests/fixtures/mock-anthropic-server.mjs` for deterministic mock-backed coverage. They are useful smoke/regression checks, but they are not real LLM integration tests.
-
-## Troubleshooting
-
-- **"Error: missing API credentials for selected model provider(s)."**: Set the provider env var(s) for your selected models (for example `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`)
-- **"No findings from scanner. Code looks clean."**: Your code is clean, or try `--severity ERROR,WARNING,INFO` to include lower-severity rules
-- **Scan interrupted**: Re-run the same command (already-triaged findings are skipped), or use `--resume` to continue from the exact checkpoint
-- **Wrong model**: `kuzushi config set model anthropic:claude-sonnet-4-6` or pass `--model` per-scan
-- **Scanner download fails**: Install Opengrep or Semgrep manually, ensure it's on your PATH
-- **High triage cost**: Use `--triage-model openai:gpt-4o-mini` for cheaper triage, or `--max 10` to limit findings
-- **Verification too expensive**: Use `--verify-min-confidence 0.8` to only verify high-confidence TPs, or `--verify-model openai:gpt-4o-mini`
-- **pi-ai model not found**: Ensure the model string uses `provider:modelId` format (e.g., `openai:gpt-4o`, not just `gpt-4o`)
-
-## Contributing
+## Architecture

-See [
+See [VISION.md](VISION.md) for the full architecture vision, module system design, workspace/knowledge graph, intel layer, governance model, and implementation roadmap.

 ## License

-
+MIT