kuzushi 0.11.0 → 0.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
  <img src="kuzushi.png" alt="Kuzushi" width="200" />
 
- # Kuzushi — AI Security Scanner That Only Shows You Real Vulnerabilities
+ # Kuzushi — Security-Native AI Operating Environment
 
  [![CI](https://github.com/allsmog/Kuzushi/actions/workflows/ci.yml/badge.svg)](https://github.com/allsmog/Kuzushi/actions/workflows/ci.yml)
  [![npm](https://img.shields.io/npm/v/kuzushi)](https://www.npmjs.com/package/kuzushi)
@@ -8,96 +8,141 @@
 
  [kuzushi.dev](https://kuzushi.dev)
 
- SAST tools cry wolf. Semgrep finds 500 issues in your codebase. 480 of them are false positives. You spend hours triaging a wall of noise and still miss the real vulnerability on line 247.
+ Kuzushi combines offensive security, defensive operations, and compliance governance into a single interactive platform powered by LLM agents, backed by a persistent workspace, and delivered through a rich terminal UI.
 
- Kuzushi runs the same scanners, then sends an AI agent to investigate each finding reading the actual code, tracing data flow, checking for sanitization. It tells you which findings are real, which are noise, and optionally proves exploitability by constructing a working PoC.
+ Find the vulnerability. Prove it's exploitable. Deploy a honeypot to detect it. Check if it violates PCI DSS. Generate the patch. One tool, one conversation.
 
  ```sh
- npx kuzushi /path/to/your/repo
+ npm install -g kuzushi
+ kuzushi
  ```
 
- No config files. Just point it at a repo.
+ ## Three Ways to Use It
 
- <!-- TODO: Add terminal recording / asciinema GIF here -->
+ ### Shell (default)
 
- ## Quick Start
+ Just type `kuzushi`. The interactive copilot shell starts with your loaded modules, available tools, and any active workspace. Talk naturally or use structured commands.
 
- Prereqs: Node 22+, and either an API key or Claude Code OAuth login.
+ ```
+ kuzushi shell                          # default — just `kuzushi` works
+ kuzushi shell --workspace acme-pentest # resume an engagement
+ kuzushi shell --target ./repo          # set initial target
+ kuzushi shell --load blackbox,honeypot # pre-load specific modules
+ ```
 
- ```sh
- # Install globally (recommended — get upgrades with npm update -g kuzushi)
- npm install -g kuzushi
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │ kuzushi shell                         workspace: acme-api   │
+ │ modules: sast, randori, blackbox, honeypot, shinsa          │
+ │ target: ./acme-api (Node.js + Express + PostgreSQL)         │
+ └─────────────────────────────────────────────────────────────┘
 
- # Or run without installing
- npx kuzushi /path/to/your/repo
+ kuzushi> modules
+ kuzushi> use sast
+ kuzushi/sast> run scan ./repo preset=deep
+ kuzushi/sast> back
+ kuzushi> tools
+ kuzushi> run sast:verify fingerprint=abc123
+ kuzushi> exit
  ```
 
+ ### Scan (headless pipeline)
+
+ The full SAST pipeline — 40+ agent tasks orchestrated as a DAG. Semgrep, CodeQL, 30+ agentic detectors, AI triage, verification, PoC generation, patch synthesis. CI/CD-native with SARIF output, quality gates, and exit codes.
+
  ```sh
- # With Claude Code OAuth (no API key needed — uses your Claude login)
- kuzushi /path/to/your/repo
+ kuzushi scan <repo>
+ kuzushi scan <repo> --preset deep --verify --auto-patch
+ kuzushi scan <repo> --sarif report.sarif --fail-on-tp --quality-gate
+ kuzushi scan <repo> --resume
+ ```
 
- # With Anthropic API key
- export ANTHROPIC_API_KEY=sk-ant-...
- kuzushi /path/to/your/repo
+ ### Run (headless module tool)
 
- # With OpenAI
- export OPENAI_API_KEY=sk-...
- kuzushi /path/to/repo --model openai:gpt-4o
+ Execute a single module tool without the interactive shell. Scriptable, and composable with Unix pipes.
 
- # With Google, Groq, Mistral, or 15+ other providers
- kuzushi /path/to/repo --model google:gemini-2.0-flash
+ ```sh
+ kuzushi run sast:scan ./repo --json
+ kuzushi run sast:triage fingerprint=abc123 --json
+ kuzushi run sast:verify fingerprint=abc123 --quiet
+ kuzushi run sast:findings severity=critical,high --json
  ```
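Since every `run` invocation can emit `--json`, its output composes with ordinary Unix tooling. A minimal sketch of that composition — note the JSON shape used here is an assumption for illustration, not kuzushi's documented schema:

```shell
# Hypothetical stand-in for `kuzushi run sast:findings --json` output;
# the real field names and structure may differ.
findings='[{"fingerprint":"abc123","severity":"critical"},{"fingerprint":"def456","severity":"low"}]'

# Extract the fingerprints of critical findings, e.g. to feed each one
# into a follow-up `kuzushi run sast:verify fingerprint=...` call.
echo "$findings" | python3 -c '
import json, sys
for f in json.load(sys.stdin):
    if f["severity"] == "critical":
        print(f["fingerprint"])
'
```

The same pattern works with `jq` if it is installed; `python3` is used here only because it is nearly always present.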
 
- Kuzushi auto-downloads Opengrep if you don't have a scanner installed. Zero dependencies to manage.
+ ## Quick Start
+
+ Prereqs: Node 22+, and either an API key or Claude Code OAuth login.
+
+ ```sh
+ # Install globally
+ npm install -g kuzushi
 
- ## The Problem
+ # Start the copilot shell (default)
+ kuzushi
 
- **SAST scanners alone** have high recall but terrible precision — they flag everything that *could* be a vulnerability, drowning you in false positives. Teams burn hours triaging, develop alert fatigue, and eventually stop looking.
+ # Or run a headless scan
+ kuzushi scan /path/to/your/repo
 
- **LLMs alone** can reason about code but hallucinate when scanning from scratch — 95%+ false positive rate when you ask "find vulnerabilities in this repo."
+ # With specific providers
+ export ANTHROPIC_API_KEY=sk-ant-...
+ kuzushi scan /path/to/your/repo
 
- **Kuzushi combines both.** SAST signal narrows the search space. AI reasoning eliminates false positives. The result: near-human researcher agreement rates on vulnerability classification.
+ # With OpenAI, Google, Groq, Mistral, or 15+ other providers
+ kuzushi scan /path/to/repo --model openai:gpt-4o
+ kuzushi scan /path/to/repo --model google:gemini-2.0-flash
+ ```
 
- ## What You Get
+ Kuzushi auto-downloads Opengrep if you don't have a scanner installed. Zero dependencies to manage.
 
- For each finding, Kuzushi produces:
+ ## Module System
 
- **Verdict** `true_positive`, `false_positive`, `by_design`, or `needs_review`
- **Confidence** — 0.0 to 1.0
- **Rationale** — why the agent reached that verdict, referencing specific code lines
- **Verification steps** — 2-6 actionable steps a human reviewer can follow
- **Fix suggestion** — suggested patch when applicable
- **PoC exploit** (with `--verify`) — a concrete proof-of-concept payload proving the vulnerability is exploitable
- **Cost** — per-finding triage and verification cost in USD
+ Kuzushi's capabilities come from pluggable modules. Each module exposes tools (for shell and run modes) and optionally pipeline tasks (for scan mode DAG execution).
 
- The terminal report shows true positives first, then needs-review items. False positives and by-design findings are counted but deprioritized. You only see what matters.
+ | Module | Category | What It Does |
+ |--------|----------|-------------|
+ | **sast** (built-in) | offense | 40+ task SAST pipeline: Semgrep, CodeQL, agentic detectors, AI triage, verification, PoC, patch |
+ | **randori** | intel | 7-stage PASTA threat modeling with ATT&CK/CAPEC/NVD intel, attack trees, probabilistic risk |
+ | **vuln-scout** | offense | Whitebox SAST with 15 Joern CPG verification scripts, 8 autonomous agents |
+ | **augur** | offense | Neuro-symbolic SAST (IRIS/ICLR 2025 LLM-driven CodeQL taint analysis) |
+ | **blackbox** | offense | Black/grey-box pentesting: nmap, gobuster, nikto, hydra, privilege escalation |
+ | **pwn** | offense | Binary exploitation: checksec, GDB, ROP chains, heap exploitation, SROP |
+ | **pentest** | offense | MCP server wrapping metasploit, nmap, hydra, john |
+ | **honeypot** | defense | Autonomous honeypot orchestration: 14 service types, 6 honeytokens, Falco |
+ | **yokai** | defense | Supply chain tripwires: dependency confusion, typosquatting, registry canaries |
+ | **prompt-armor** | offense | LLM red teaming: 80+ attack plugins, 25+ mutation strategies |
+ | **shinsa** | governance | Multi-framework compliance: ISO 27001, NIST 800-53, SOC 2, PCI DSS |
+ | **revgraph** | intel | Binary reverse engineering: Ghidra + Neo4j, NL2Cypher, function embeddings |
 
- ## How It Works
+ Modules are loaded via the shell (`use <module>`) or at startup (`--load blackbox,honeypot`).
+
+ ## The SAST Pipeline
+
+ The built-in `sast` module runs a 40+ task DAG pipeline:
 
  ```
  ┌─────────────┐     ┌──────────────┐     ┌──────────────┐     ┌─────────┐     ┌──────────┐
- │ Task DAG    │────▶│ AI Triage    │────▶│ Verification │────▶│ Patch   │────▶│ Report   │
+ │ Task DAG    │────>│ AI Triage    │────>│ Verification │────>│ Patch   │────>│ Report   │
  │ Semgrep     │     │ Investigate  │     │ Construct    │     │ Generate│     │ TP only  │
  │ CodeQL      │     │ each finding │     │ PoC exploits │     │ & verify│     │ + export │
- │ 15+ tasks   │     │ with context │     │ (optional)   │     │ (opt-in)│     │ + stream │
+ │ 30+ tasks   │     │ with context │     │ (optional)   │     │ (opt-in)│     │ + stream │
  └─────────────┘     └──────────────┘     └──────────────┘     └─────────┘     └──────────┘
  ```
 
- 1. **Context gathering** — auto-detects your tech stack, frameworks, auth patterns, ORMs, and sanitization libraries
- 2. **Code graph** — builds a persistent entry-point-to-sink graph via static analysis + LLM discovery mode. For HTTP services, traces pre-identified routes. For CLI tools, daemons, and non-HTTP projects, the LLM identifies entry points itself (main functions, socket listeners, gRPC servers, CLI handlers) and traces security-relevant call/data-flow paths
- 3. **Threat modeling** — Randori PASTA plugin (shipped as `@kuzushi/randori-plugin`) performs 4-stage threat analysis: business objectives, technical scope, DFD decomposition, and STRIDE threat scenarios with ATT&CK/CAPEC/OWASP mapping and 5-factor probabilistic scoring. All threat leads are injected into every detector's prompts.
- 4. **Threat-informed hunting** — spawns one adversarial Claude agent per DFD external entity (users, services, attackers) to CTF-style hunt for vulnerabilities from each actor's perspective
- 5. **Task DAG execution** — runs enabled tasks as a dependency-aware DAG: Semgrep, CodeQL, agentic scanner, and 15+ specialized detectors (SSRF, SQLi, XSS, command injection, XXE, deserialization, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL, secrets/crypto, auth logic, sharp edges, systems-level deep semantic analysis); multi-strategy mode runs 2-4 analytical approaches per vuln class
- 4. **Classifier funnel** — cheap single-token pre-filter removes ~80% of noise before expensive triage
- 5. **Deduplication** — fingerprints and merges equivalent findings across scanners
- 6. **Incremental skip** — findings already triaged in previous runs are skipped automatically
- 7. **AI triage** — an agent investigates each finding with pre-loaded source context, code graph paths, evidence chains, threat model context, and CWE-specific knowledge modules. Threat model output from Randori PASTA is injected into triage prompts so the agent can distinguish design choices (`by_design`) from real vulnerabilities (`tp`). Batch-dropped findings auto-escalate to individual triage
- 8. **Variant analysis** — confirmed TPs trigger automatic search for similar patterns across the codebase
- 9. **Verification** (optional) — constructs concrete PoC exploit payloads for true positives
- 10. **PoC harness generation** (optional) — produces runnable exploit scripts with iterative execution feedback
- 11. **Dynamic analysis** (optional) — executes harnesses in Docker sandbox to confirm exploitability
- 12. **Auto-patch** (optional) generates, validates, and re-verifies patches in disposable git worktrees
- 13. **Report** — terminal display + export to SARIF, Markdown, JSON, CSV, or JSONL; optional SSE live streaming
+ 1. **Context gathering** — auto-detects tech stack, frameworks, auth patterns, ORMs, sanitization
+ 2. **Code graph** — builds entry-point-to-sink graph via static analysis + LLM discovery
+ 3. **Threat modeling** — Randori PASTA plugin: business objectives, technical scope, DFD decomposition, STRIDE threats with ATT&CK/CAPEC/OWASP mapping and 5-factor probabilistic scoring
+ 4. **Threat-informed hunting** — adversarial Claude agents per DFD external entity, CTF-style hunting
+ 5. **Task DAG execution** — Semgrep, CodeQL, agentic scanner, 15+ specialized detectors (SSRF, SQLi, XSS, command injection, XXE, deserialization, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL, secrets/crypto, auth logic, systems-level deep semantic analysis)
+ 6. **Classifier funnel** — cheap single-token pre-filter removes ~80% noise before expensive triage
+ 7. **Deduplication** — fingerprints and merges equivalent findings across scanners
+ 8. **AI triage** — agent investigates each finding with source context, code graph, evidence chains, threat model, CWE-specific knowledge
+ 9. **Variant analysis** — confirmed TPs trigger automatic search for similar patterns
+ 10. **Verification** (optional) constructs concrete PoC exploit payloads
+ 11. **PoC harness** (optional) — produces runnable exploit scripts with execution feedback
+ 12. **Dynamic analysis** (optional) — executes harnesses in Docker sandbox
+ 13. **Auto-patch** (optional) — generates, validates, and re-verifies patches in disposable worktrees
+ 14. **Report** — terminal display + SARIF, Markdown, JSON, CSV, JSONL; optional SSE streaming
+
+ For each finding, you get: **verdict** (tp/fp/by_design/needs_review), **confidence** (0-1), **rationale** with code references, **verification steps**, **fix suggestion**, optional **PoC exploit**, and **cost in USD**.
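That verdict taxonomy lends itself to quick report slicing. A sketch over a JSONL export — the field layout below is an assumption for illustration, not kuzushi's documented schema:

```shell
# Hypothetical JSONL export — one finding per line; field names are assumed.
cat > /tmp/findings.jsonl <<'EOF'
{"verdict":"true_positive","confidence":0.92}
{"verdict":"false_positive","confidence":0.88}
{"verdict":"true_positive","confidence":0.71}
EOF

# Tally findings per verdict (naive string extraction, fine for a sketch):
awk -F'"verdict":"' '{split($2, a, "\""); print a[1]}' /tmp/findings.jsonl | sort | uniq -c

rm /tmp/findings.jsonl
```

A real pipeline would use a JSON-aware tool instead of `awk` field splitting, but the shape of the workflow is the same: export, slice by verdict, act on the true positives.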
 
  ## CI Integration
 
@@ -114,7 +159,7 @@ jobs:
        - uses: actions/setup-node@v4
          with:
            node-version: 22
-       - run: npx kuzushi . --sarif results.sarif --quality-gate --fail-on-tp
+       - run: npx kuzushi scan . --sarif results.sarif --quality-gate --fail-on-tp
          env:
            ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        - uses: github/codeql-action/upload-sarif@v3
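For context, the steps in this hunk sit inside a workflow along the lines of the following minimal sketch. The trigger, job name, checkout step, and the `sarif_file` input are assumptions reconstructed around the lines shown in the hunk, not copied from the repo:

```yaml
name: security-scan
on: [pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npx kuzushi scan . --sarif results.sarif --quality-gate --fail-on-tp
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```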
@@ -126,255 +171,132 @@ jobs:
  ### Quality Gates
 
  ```sh
- kuzushi <repo> --quality-gate # fail CI on threshold violations
- kuzushi <repo> --fail-on-tp # fail if any high/critical TP is found
- kuzushi <repo> --sarif results.sarif # export SARIF for GitHub Code Scanning
+ kuzushi scan <repo> --quality-gate        # fail CI on threshold violations
+ kuzushi scan <repo> --fail-on-tp          # fail if any high/critical TP is found
+ kuzushi scan <repo> --sarif results.sarif # export SARIF for GitHub Code Scanning
  ```
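Both gate flags signal through the process exit code, so they compose with any CI shell step. A sketch of that control flow — the scan command is stubbed out here so the logic is runnable on its own; in CI the stub would be `kuzushi scan <repo> --fail-on-tp`:

```shell
# Stub standing in for `kuzushi scan <repo> --fail-on-tp`; a nonzero exit
# status is the assumed signal that a high/critical true positive was found.
scan() { return 1; }

if scan; then
  echo "gate passed"
else
  echo "gate failed: true positives found"
  # in CI: exit 1 here to fail the job
fi
```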
 
- ## Key Features
-
- **Vendor-agnostic LLM runtime** — works with Anthropic, OpenAI, Google, Groq, Mistral, and 15+ other providers. Swap models at runtime with `--model provider:modelId`. Use cheaper models for triage, premium models for verification.
-
- **Exploit verification** — goes beyond classification. Constructs concrete PoC payloads (SQL injection strings, XSS vectors, etc.) that prove a finding is exploitable, not just theoretically possible.
-
- **Crypto behavioral testing** — generates and executes behavioral test harnesses in a Docker sandbox for crypto misuse findings. Detects timing side-channels, ECB mode, weak hashes, weak PRNGs, and more.
-
- **IRIS-style taint analysis** — LLM-driven CodeQL taint analysis inspired by the IRIS paper (ICLR 2025). An LLM selects relevant CWE classes for the project, writes CodeQL extraction queries dynamically (language-agnostic, framework-agnostic), labels candidates, generates TaintTracking configurations, and iteratively refines queries when compilation fails. Structured taint paths (source-to-sink step data) are persisted to the findings DB for downstream verification and reporting. No templates, no hardcoded framework detection.
-
- **Randori PASTA threat modeling** — ships `@kuzushi/randori-plugin` as a dependency. Runs 4-stage PASTA analysis (objectives, scope, DFD decomposition, STRIDE threats) via Claude Code plugin. Threat leads are injected into all detector prompts for threat-informed scanning. ATT&CK, CAPEC, and OWASP mapping included.
-
- **Threat-informed hunting** — spawns adversarial Claude agents for each DFD external entity identified by the threat model. Each agent explores the codebase as that actor (end user, admin, external service, LLM agent) looking for exploitable paths. Findings are deduplicated and fed into triage/verification.
-
- **Systems-level deep semantic hunt** — LLM-driven analysis pipeline for finding the class of bugs that survive decades of code review and fuzzing: integer overflow/wraparound (CWE-190), sentinel value collisions (CWE-787), signed/unsigned comparison bugs (CWE-681), buffer overflows exploitable via missing stack canaries (CWE-693), use-after-free in protocol state machines (CWE-416), and unsafe block violations in Rust (CWE-704). The LLM writes and runs CodeQL queries using range analysis, loop induction analysis, and type predicates — NOT TaintTracking — to find bugs that source-to-sink taint flows cannot express. Activates automatically for C, C++, Rust, and Go codebases. The `glasswing` preset routes a frontier model to this task for maximum depth.
-
- **Auto rule generation** — verified exploitable findings automatically generate custom Semgrep rules. Rules are persisted to `.kuzushi/custom-rules/` and auto-loaded on subsequent scans, creating a feedback loop where the scanner gets smarter over time. Rules are validated against the original finding and removed if they don't match.
-
- **Diff-aware taint analysis** — narrows analysis to files changed since a base branch. Run `--taint-diff-base main` in CI to only analyze what's new in the PR.
-
- **Resumable runs** — checkpoints pipeline state to SQLite. Interrupted scan? `--resume` picks up exactly where it left off.
-
- **Patch synthesis** — `kuzushi patch <repo> --fingerprint <fp>` generates and validates security patches in disposable git worktrees without touching your working copy.
-
- **Language-tuned detection** — every detector adapts its prompts to your repo's actual tech stack. Kuzushi auto-detects languages and frameworks, then injects language-specific sinks, safe patterns, few-shot examples, investigation hints (grep patterns, key files), framework-aware guidance, and anti-hallucination constraints. A Python repo gets `subprocess.run` shell=True analysis and Django/Flask/FastAPI-specific advice. A C/C++ repo gets buffer-size #define resolution, signed/unsigned mismatch detection, and memory-safety few-shots. A Java repo gets SpEL injection, XXE factory configuration, and Spring Security guidance. 8 language ecosystems covered: C/C++, Java/Kotlin, Python, Go, JavaScript/TypeScript, Rust, PHP, Ruby — each with per-vulnerability-class depth. Polyglot repos get all relevant languages composed together.
-
- **15+ specialized detectors** — dedicated detection tasks for command injection, XXE, insecure deserialization, SSRF, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL security, secrets/crypto, code config, auth logic, sharp edges, crypto behavioral testing, and systems-level deep semantic analysis (integer overflow, buffer overflow, sentinel collision, use-after-free, unsafe blocks). Each has vulnerability-class-specific prompts, anti-hallucination constraints, and multi-lens analysis. All detectors receive threat model leads for threat-informed scanning.
-
- **Classifier funnel** — single-token LLM pre-filter using a cheap model removes ~80% of false positives before expensive triage, cutting per-scan cost dramatically.
-
- **Source pre-read** — triage agents receive the flagged source file pre-loaded (50 lines surrounding the finding), eliminating cold-start tool calls and improving reasoning accuracy.
-
- **LLM code graph** — builds a persistent code graph tracing entry points through middleware, controllers, services, and data-access layers. Static skeleton from import analysis + LLM-assisted gap-filling for dynamic dispatch, DI, and callback patterns. Discovery mode: when no HTTP routes are detected, the LLM identifies entry points itself (main functions, socket listeners, gRPC servers, CLI handlers) and traces security-relevant paths with threat model context. Feeds graph context into triage for better reasoning.
-
- **Multi-strategy analysis** — runs 2-4 different analytical approaches per vulnerability class (syntactic pattern matching, dataflow tracing, first-principles reasoning, execution-based proof) and merges results with confidence boosting when strategies agree. Auto-generates reusable Semgrep rules from confirmed multi-strategy findings.
-
- **13 CWE knowledge modules** — domain-specific knowledge for SQL injection, XSS, SSRF, command injection, path traversal, auth bypass, deserialization, race conditions, crypto, XXE, file upload, IDOR, and NoSQL injection — including dangerous patterns, safe patterns, bypass techniques, and fix examples.
-
- **Incremental scanning** — skips re-triage for unchanged findings across runs. Tracks the last scanned commit, computes file diffs, and expands the rescan scope with dependency-aware invalidation via the import graph.
-
- **Auto-patch with closed-loop verification** — after confirming a vulnerability, automatically generates a patch in a disposable git worktree, validates it (apply, build, test), then re-runs the scanner on the patched code to confirm the vulnerability is gone.
-
- **Live streaming** — SSE server streams pipeline events in real-time (`--stream`). Connect with `curl`, `EventSource`, or any SSE client to watch findings appear as they're triaged.
-
- **Interactive terminal UI** — React+Ink-powered live display with pipeline progress tree, spinners, attack chain diagrams, and a trophy screen for confirmed exploits. Includes an interactive REPL during scans (pause, skip, inspect findings), a first-run setup wizard, config confirmation flow, inline code preview, and clickable file paths. Auto-detects terminal theme and falls back to plain text in non-TTY environments.
-
- **Audit logging** — optional JSONL audit trail of every agent decision for debugging, accountability, and compliance records.
-
  ## Scan Presets
 
- Presets configure the pipeline for different cost/depth tradeoffs. CLI flags override preset values.
-
  ```sh
- kuzushi <repo> --preset fast # semgrep only, no context/enrichment/variant analysis
- kuzushi <repo> --preset standard # semgrep + IRIS taint + secrets/crypto detection
- kuzushi <repo> --preset deep # standard + verification + threat modeling + systems-level hunt
- kuzushi <repo> --preset glasswing # verification + PoC generation + threat-informed hunting
- # + deep semantic hunt with a frontier model
+ kuzushi scan <repo> --preset fast      # semgrep only, no context/enrichment
+ kuzushi scan <repo> --preset standard  # semgrep + IRIS taint + secrets/crypto
+ kuzushi scan <repo> --preset deep      # + verification + threat modeling + systems hunt
+ kuzushi scan <repo> --preset glasswing # + PoC + threat-informed hunting + frontier model
  ```
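Because presets are plain flag values, picking one per context is a one-liner in CI. A sketch — the branch-to-preset mapping here is an assumption for illustration, not a kuzushi convention:

```shell
# Run a cheap preset on feature branches, a deeper one on the main branch.
# GITHUB_REF_NAME is the branch name in GitHub Actions; default to main locally.
branch="${GITHUB_REF_NAME:-main}"
case "$branch" in
  main) preset=deep ;;
  *)    preset=fast ;;
esac
echo "kuzushi scan . --preset $preset"
```

Swap the `echo` for the real invocation once the mapping suits your cost budget.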
 
- The `glasswing` preset uses a cost-smart model tiering strategy: a standard model handles bulk scanning and triage, while a frontier model is used surgically for the systems-hunt and threat-hunt stages — where stronger adversarial reasoning has the highest ROI for zero-day discovery. Per-task model overrides keep costs controlled.
-
- ## Tasks
-
- Every stage of the pipeline is an **AgentTask** — a composable unit with explicit dependencies that the orchestrator runs as a DAG. Tasks are selected via `--tasks` or `config.tasks`, and per-task config (including model overrides) lives in `config.taskConfig`.
+ ## Key Features
 
- | Task ID | Description | Auto-download |
- |---------|-------------|---------------|
- | `semgrep` (default) | Traditional SAST via Opengrep/Semgrep | Yes |
- | `codeql` | Semantic dataflow/taint analysis via GitHub CodeQL CLI | No (opt-in) |
- | `agentic` | AI-driven scanner — LLM with read-only repo tools | N/A |
- | `taint-cwe-select` / `taint-iris` | IRIS-style LLM-driven CodeQL taint analysis — dynamic CWE selection, LLM-generated queries, iterative refinement | No (opt-in) |
- | `systems-hunt` | Deep semantic analysis for C/C++/Rust/Go — LLM-driven CodeQL range analysis, loop induction, missing mitigations | No (opt-in) |
- | `secrets-crypto-detect` | Secrets, API keys, and cryptographic misuse detection | N/A |
- | `code-config-detect` | Security-relevant code and configuration issues | N/A |
- | `threat-model-randori` | PASTA threat modeling with STRIDE analysis | N/A |
- | `threat-hunt` | Adversarial CTF-style hunting per DFD entity | N/A |
- | `context-gatherer` | Auto-detects tech stack, frameworks, auth patterns | N/A |
- | `context-enricher` | Deep context enrichment (middleware, trust boundaries) | N/A |
+ **Vendor-agnostic LLM runtime** works with Anthropic, OpenAI, Google, Groq, Mistral, and 15+ providers. Swap models at runtime with `--model provider:modelId`. Runs airgapped with Ollama.
 
- ```sh
- kuzushi <repo> --tasks semgrep,codeql # run specific tasks
- kuzushi <repo> --tasks agentic # AI-only scan
- kuzushi <repo> --task-model threat-hunt=anthropic:claude-opus-4-6 # per-task model override
- ```
+ **Exploit verification** — constructs concrete PoC payloads (SQL injection strings, XSS vectors, etc.) that prove exploitability, not just theoretical possibility.
 
- ---
+ **IRIS-style taint analysis** — LLM-driven CodeQL taint analysis (ICLR 2025). LLM selects CWE classes, writes CodeQL queries dynamically, labels candidates, generates TaintTracking configs, iteratively refines on failure.
 
- <details>
- <summary><strong>All Commands</strong></summary>
+ **Randori PASTA threat modeling** — 4-stage PASTA analysis (objectives, scope, DFD, STRIDE threats) via Claude Code plugin. Threat leads injected into all detector prompts. ATT&CK, CAPEC, OWASP mapping.
 
- ### Scan (default)
+ **Systems-level deep semantic hunt** — finds bugs that survive decades of review: integer overflow, sentinel collisions, signed/unsigned comparison, buffer overflows, use-after-free in protocol state machines, unsafe Rust blocks. LLM writes CodeQL queries using range analysis, not TaintTracking.
 
- ```
- kuzushi <repo> # scan with defaults
- kuzushi <repo> --tasks codeql
- kuzushi <repo> --tasks semgrep,codeql
- kuzushi <repo> --tasks semgrep,agentic
- kuzushi <repo> --severity ERROR # only ERROR-level findings
- kuzushi <repo> --max 20 # triage top 20 findings only
- kuzushi <repo> --model anthropic:claude-sonnet-4-6 # use a different model
- kuzushi <repo> --task-model triage=openai:gpt-4o # separate model for triage stage
- kuzushi <repo> --api-key sk-ant-... --base-url https://api.example.com/ # custom API endpoint
- kuzushi <repo> --fresh # clear prior results, re-triage everything
- kuzushi <repo> --db ./my.sqlite3 # custom database path
- kuzushi <repo> --resume # resume the most recent interrupted run
- kuzushi <repo> --resume <run-id> # resume a specific run by ID
- ```
+ **Multi-strategy analysis** — 2-4 analytical approaches per vulnerability class with confidence boosting when strategies agree. Auto-generates Semgrep rules from confirmed findings.
 
- ### Verification
+ **Auto-patch with closed-loop verification** — generates patches in disposable worktrees, validates, re-runs the scanner to confirm the vulnerability is gone.
 
- ```
- kuzushi <repo> --verify # enable exploit verification for TPs
- kuzushi <repo> --verify --task-model verify=openai:gpt-4o-mini # cheaper model for verification
- kuzushi <repo> --verify --verify-max-turns 20
- kuzushi <repo> --verify --verify-concurrency 3
- kuzushi <repo> --verify --verify-min-confidence 0.7 # skip low-confidence TPs
- ```
+ **Crypto behavioral testing** — generates and executes behavioral test harnesses in Docker for crypto misuse: timing side-channels, ECB mode, weak hashes, weak PRNGs.
 
- ### PoC Harness Generation
+ **Language-tuned detection** auto-detects languages and frameworks, then injects language-specific sinks, safe patterns, few-shot examples, and anti-hallucination constraints. 8 ecosystems: C/C++, Java/Kotlin, Python, Go, JS/TS, Rust, PHP, Ruby.
 
- ```
- kuzushi <repo> --verify --poc-harness # generate exploit scripts for verified findings
- kuzushi <repo> --verify --poc-harness --task-model poc-harness=openai:gpt-4o-mini
- kuzushi <repo> --verify --poc-harness --poc-harness-max-turns 25
- kuzushi <repo> --verify --poc-harness --poc-harness-concurrency 2
- ```
+ **Resumable runs** — checkpoints to SQLite. `--resume` picks up where you left off.
 
- ### Dynamic Analysis
+ **Interactive terminal UI** — React+Ink-powered live display with pipeline progress tree, spinners, trophy screen for confirmed exploits. REPL during scans (pause, skip, inspect). First-run setup wizard. Falls back to plain text in non-TTY.
 
- ```
- kuzushi <repo> --verify --poc-harness --dynamic-analysis # execute harnesses to confirm/reject findings
- kuzushi <repo> --verify --dynamic-analysis --dynamic-max-candidates 10
- kuzushi <repo> --verify --dynamic-analysis --dynamic-min-score 8
- ```
+ **Incremental scanning** — skips re-triage for unchanged findings. Dependency-aware invalidation via import graph.
 
- ### Patch Synthesis
+ **Audit logging** — JSONL audit trail of every agent decision.
 
- ```
- kuzushi patch <repo> --fingerprint <fp> # synthesize and validate a patch
- kuzushi patch <repo> --fingerprint <fp> --build-cmd "npm run build"
- kuzushi patch <repo> --fingerprint <fp> --test-cmd "npm test" --max-iterations 5
- ```
+ <details>
+ <summary><strong>All Scan Commands</strong></summary>
 
- ### Code Graph
+ ### Scan
 
  ```
- kuzushi <repo> --code-graph # enable LLM-powered code graph (entry-point-to-sink tracing)
+ kuzushi scan <repo> # scan with defaults
+ kuzushi scan <repo> --tasks codeql
+ kuzushi scan <repo> --tasks semgrep,codeql
+ kuzushi scan <repo> --severity ERROR
+ kuzushi scan <repo> --max 20
+ kuzushi scan <repo> --model anthropic:claude-sonnet-4-6
+ kuzushi scan <repo> --task-model triage=openai:gpt-4o
+ kuzushi scan <repo> --fresh
+ kuzushi scan <repo> --resume
  ```
 
- ### Multi-Strategy Analysis
+ ### Verification & PoC
 
  ```
- kuzushi <repo> --multi-strategy # adaptive mode: run cheapest strategy first, exit early if confident
- kuzushi <repo> --multi-strategy-full # run all strategies in parallel for maximum coverage
- kuzushi <repo> --multi-strategy-budget 3.0 # per-finding budget across all strategies (USD)
- kuzushi <repo> --multi-strategy-auto-rules # generate Semgrep rules from confirmed multi-strategy findings
+ kuzushi scan <repo> --verify
+ kuzushi scan <repo> --verify --poc-harness
+ kuzushi scan <repo> --verify --poc-harness --dynamic-analysis
  ```
 
- ### Auto-Patch (Closed-Loop)
+ ### Multi-Strategy & Code Graph
 
  ```
- kuzushi <repo> --verify --auto-patch # patch exploitable findings, re-verify
- kuzushi <repo> --verify --auto-patch --auto-patch-after triage # patch any TP (broadest trigger)
- kuzushi <repo> --verify --auto-patch --auto-patch-after poc # patch only after PoC proves it
- kuzushi <repo> --auto-patch --patch-verify-depth triage # re-run scanner + triage on patched code
- kuzushi <repo> --auto-patch --patch-verify-depth full # full pipeline re-verify (most thorough)
- kuzushi <repo> --auto-patch --patch-concurrency 3 # parallel patch synthesis tasks
+ kuzushi scan <repo> --multi-strategy
+ kuzushi scan <repo> --multi-strategy-full
+ kuzushi scan <repo> --code-graph
  ```
303
248
 
304
- ### Streaming
249
+ ### Auto-Patch
305
250
 
306
251
  ```
307
- kuzushi <repo> --stream # start SSE server on auto-assigned port
308
- kuzushi <repo> --stream --stream-port 3001 # start SSE server on specific port
309
- # Then in another terminal:
310
- curl -N http://localhost:3001/events # watch live pipeline events
252
+ kuzushi scan <repo> --verify --auto-patch
253
+ kuzushi scan <repo> --auto-patch --patch-verify-depth full
311
254
  ```
312
255
 
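The removed streaming section served live pipeline events over SSE (`curl -N .../events`). A minimal sketch of decoding the SSE wire format into `(event, data)` pairs; the `finding` and `done` event names are invented for illustration:

```python
import json

def parse_sse(text):
    """Parse a Server-Sent Events stream into a list of (event, data) pairs."""
    events = []
    event_type, data_lines = "message", []
    for line in text.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one event.
            events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

sample = (
    "event: finding\n"
    'data: {"severity": "ERROR"}\n'
    "\n"
    "event: done\n"
    "data: {}\n"
    "\n"
)
for name, payload in parse_sse(sample):
    print(name, json.loads(payload))
```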
- ### Crypto Behavioral Testing
+ ### Diff-Aware & Crypto

 ```
- kuzushi <repo> --crypto-behavioral-test # generate & run behavioral tests for crypto misuse findings
+ kuzushi scan <repo> --taint-diff-base main
+ kuzushi scan <repo> --crypto-behavioral-test
 ```

- ### Diff-Aware Taint
+ ### Output

 ```
- kuzushi <repo> --taint-diff-base main # only taint-analyze files changed since main
- kuzushi <repo> --taint-diff-base main --taint-diff-mode delta # emit only findings intersecting the diff
- kuzushi <repo> --taint-diff-base main --taint-diff-mode baseline # merge cached + rerun for full baseline
+ kuzushi scan <repo> --output report.md
+ kuzushi scan <repo> --sarif results.sarif
+ kuzushi scan <repo> --json results.json
+ kuzushi scan <repo> --csv results.csv
+ kuzushi scan <repo> --stream
+ kuzushi scan <repo> --audit-log
 ```

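SARIF v2.1.0 nests results under `runs[].results[]` with a `ruleId` per result, so `--sarif` output works with any SARIF consumer (GitHub Code Scanning included). A minimal sketch that tallies a synthetic SARIF document by rule:

```python
def count_by_rule(sarif):
    """Tally SARIF results by ruleId across all runs."""
    counts = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "unknown")
            counts[rule] = counts.get(rule, 0) + 1
    return counts

# Synthetic document standing in for real scanner output.
sample = {
    "version": "2.1.0",
    "runs": [{"results": [
        {"ruleId": "js/sql-injection", "level": "error"},
        {"ruleId": "js/sql-injection", "level": "error"},
        {"ruleId": "js/xss", "level": "warning"},
    ]}],
}
print(count_by_rule(sample))  # {'js/sql-injection': 2, 'js/xss': 1}
```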
- ### Output & Observability
+ ### Run Mode

 ```
- kuzushi <repo> --output report.md # export markdown report
- kuzushi <repo> --sarif results.sarif # export SARIF v2.1.0
- kuzushi <repo> --json results.json # export JSON report
- kuzushi <repo> --csv results.csv # export CSV report
- kuzushi <repo> --jsonl results.jsonl # export JSONL report
- kuzushi <repo> --audit-log # write agent activity to .kuzushi/runs/{runId}/
- kuzushi <repo> --verbose # show debug-level runtime diagnostics
- kuzushi <repo> --no-context # disable repo context gathering
+ kuzushi run sast:scan ./repo --json
+ kuzushi run sast:triage fingerprint=abc123 --json
+ kuzushi run sast:verify fingerprint=abc123 --quiet
+ kuzushi run sast:findings severity=critical,high --json
 ```

- ### Retry
+ ### Patch

 ```
- kuzushi <repo> --max-triage-retries 3 # retry failed triage calls (default: 2)
- kuzushi <repo> --max-verify-retries 3 # retry failed verification calls (default: 2)
- kuzushi <repo> --retry-backoff-ms 10000 # initial backoff delay (default: 5000)
+ kuzushi patch <repo> --fingerprint <fp>
+ kuzushi patch <repo> --fingerprint <fp> --build-cmd "npm run build" --test-cmd "npm test"
 ```

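Both versions expose retry backoff knobs (`--retry-backoff-ms`, and the `retryBackoffMs`/`retryBackoffMultiplier` config keys). Assuming the usual `initial * multiplier**attempt` schedule, the delays work out as follows:

```python
def backoff_schedule(retries, initial_ms=5000, multiplier=2):
    """Delay before each retry attempt: initial_ms * multiplier**attempt."""
    return [initial_ms * multiplier ** attempt for attempt in range(retries)]

print(backoff_schedule(3))         # [5000, 10000, 20000]
print(backoff_schedule(2, 10000))  # [10000, 20000], e.g. --retry-backoff-ms 10000
```

With the documented defaults (5000 ms initial, multiplier 2), the two default retries wait 5 s, then 10 s.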
 ### Config

 ```
- kuzushi config get # show all config
- kuzushi config get model # show one key
+ kuzushi config get
 kuzushi config set model anthropic:claude-sonnet-4-6
 kuzushi config set tasks semgrep,agentic
- kuzushi config set taskConfig.codeql.dbPath ./codeql-db
- kuzushi config set taskConfig.codeql.suite javascript-security-extended
- kuzushi config set taskConfig.semgrep.binary opengrep
- kuzushi config set taskConfig.semgrep.configFlag auto
- kuzushi config set taskConfig.agentic.model anthropic:claude-sonnet-4-6
- kuzushi config set taskConfig.agentic.maxFindings 25
- kuzushi config set taskConfig.triage.model anthropic:claude-opus-4-6
- kuzushi config set taskConfig.verify.model openai:gpt-4o-mini
- kuzushi config set severity ERROR,WARNING,INFO
- kuzushi config set verify true
- kuzushi config set verifyMinConfidence 0.7
- kuzushi config set auditLog true
- kuzushi config validate --repo . # validate the effective config for this repo
- kuzushi config unset model # reset to default
- kuzushi config path # print config file location
+ kuzushi config validate --repo .
+ kuzushi config path
 ```

- Global config lives at `~/.kuzushi/config.json`. Optional project overrides can live at `<repo>/.kuzushi/config.json`. CLI flags override config values.
-
- **Repo-local config sandboxing:** By default, project-level config files (`<repo>/.kuzushi/config.json`) are sandboxed — keys that could execute code or reach external systems (e.g., `hooks`, `externalTasks`, `pocExecute`, scanner binary paths) are silently stripped. This prevents a cloned repo from altering your runtime behavior. Pass `--trust-repo-config` to opt in to the full project config when you trust the repository.
-
- Security note: `agentRuntimeConfig.apiKey` is stored in plaintext in config files. Prefer `--api-key` for one-off runs or `ANTHROPIC_API_KEY` from your shell/secret manager.
-
 </details>

 <details>
@@ -384,93 +306,44 @@ Security note: `agentRuntimeConfig.apiKey` is stored in plaintext in config file
 | --- | --- | --- |
 | `model` | `anthropic:claude-sonnet-4-6` | Default LLM model for all tasks and stages |
 | `tasks` | `["semgrep"]` | Enabled task IDs, in execution order |
- | `taskConfig` | `{ semgrep: {...}, triage: {...}, ... }` | Per-task config blocks keyed by task ID or stage ID (see below) |
+ | `taskConfig` | `{ semgrep: {...}, triage: {...}, ... }` | Per-task config blocks keyed by task ID or stage ID |
 | `severity` | `["ERROR","WARNING"]` | Semgrep severity filter |
 | `excludePatterns` | `["test","tests","node_modules",...]` | Directories/globs to skip |
- | `busBackend` | `"in-process"` | Message bus transport (`in-process`) |
 | `triageConcurrency` | `5` | Parallel LLM triage calls |
- | `scanMode` | `"concurrent"` | Task execution mode (`sequential` or `concurrent`) |
- | `agentRuntimeBackend` | `"pi-ai"` | Agent runtime backend (`pi-ai`) |
+ | `scanMode` | `"concurrent"` | Task execution mode |
 | `verify` | `false` | Enable proof-of-exploitability verification |
- | `verifyMaxTurns` | `15` | Max turns for verification agent |
- | `verifyConcurrency` | `3` | Parallel verification calls |
- | `verifyVerdicts` | `["tp"]` | Which triage verdicts to verify |
- | `verifyMinConfidence` | `0` | Minimum triage confidence to trigger verification (0-1) |
- | `pocHarness` | `false` | Enable post-verification PoC harness generation (requires `--verify`) |
- | `pocHarnessMaxTurns` | `20` | Max turns for PoC harness agent |
- | `pocHarnessConcurrency` | `2` | Parallel PoC harness generation calls |
- | `cryptoBehavioralTestEnabled` | `false` | Enable crypto behavioral testing for crypto misuse findings |
- | `cryptoBehavioralMaxFindings` | `10` | Max findings to generate behavioral tests for per run |
- | `cryptoBehavioralTimeoutMs` | `120000` | Execution timeout per harness in ms |
- | `cryptoBehavioralPerFindingBudgetUsd` | `1` | Cost budget per finding for harness generation |
- | `codeGraphEnabled` | `true` | Enable LLM-powered code graph construction and enrichment |
-
- **Stage model overrides** set per-stage models via `taskConfig` instead of top-level fields:
-
- | `taskConfig` key | Fallback chain | Purpose |
- | --- | --- | --- |
- | `taskConfig.triage.model` | `model` | Model for triage agents |
- | `taskConfig.verify.model` | `model` | Model for verification agents |
- | `taskConfig.poc-harness.model` | `taskConfig.verify.model` → `model` | Model for PoC harness generation |
- | `multiStrategyMode` | `"off"` | Multi-strategy analysis mode (`off`, `adaptive`, `full`) |
- | `multiStrategyBudgetUsd` | `2.0` | Per-finding budget across all strategies (USD) |
- | `autoPatchEnabled` | `false` | Enable automatic patch generation in pipeline |
- | `autoPatchAfter` | `"verify"` | Trigger threshold for auto-patch (`verify`, `poc`, `triage`) |
- | `patchVerifyDepth` | `"task"` | Re-verification depth after patching (`task`, `triage`, `full`) |
- | `patchConcurrency` | `2` | Max concurrent patch synthesis tasks |
- | `incrementalCache` | `true` | Enable incremental scanning (skip unchanged findings across runs) |
- | `incrementalDepTracking` | `true` | Include importers of changed files in rescan scope |
- | `streamingEnabled` | `false` | Enable SSE streaming server for live pipeline events |
- | `streamingPort` | `0` (auto) | Port for the SSE streaming server |
- | `enableContextGathering` | `true` | Run repo context analysis before triage |
- | `auditLog` | `false` | Write agent activity to JSONL audit files |
- | `reportOutput` | _(unset)_ | Write markdown report output to this path |
- | `sarifOutput` | _(unset)_ | Write SARIF v2.1.0 output to this path |
- | `jsonOutput` | _(unset)_ | Write JSON report to this path |
- | `csvOutput` | _(unset)_ | Write CSV report to this path |
- | `jsonlOutput` | _(unset)_ | Write JSONL report to this path |
- | `maxTriageRetries` | `2` | Retry failed triage calls |
- | `maxVerifyRetries` | `2` | Retry failed verification calls |
- | `maxPocHarnessRetries` | `2` | Retry failed PoC harness generation calls |
- | `retryBackoffMs` | `5000` | Initial retry backoff delay in ms |
- | `retryBackoffMultiplier` | `2` | Exponential backoff multiplier |
-
- Example config:
-
- ```json
- {
-   "tasks": ["semgrep", "codeql", "context-gatherer", "context-enricher", "secrets-crypto-detect", "code-config-detect", "taint-cwe-select", "taint-iris"],
-   "scanMode": "concurrent",
-   "triageConcurrency": 3,
-   "verify": true,
-   "verifyMinConfidence": 0.7,
-   "auditLog": true,
-   "taskConfig": {
-     "codeql": { "dbPath": "./codeql-db", "suite": "javascript-security-extended" },
-     "semgrep": { "binary": "opengrep", "configFlag": "auto" },
-     "agentic": { "model": "anthropic:claude-sonnet-4-6", "maxFindings": 20 },
-     "triage": { "model": "anthropic:claude-opus-4-6" },
-     "verify": { "model": "openai:gpt-4o-mini" }
-   }
- }
- ```
+ | `pocHarness` | `false` | Enable PoC harness generation |
+ | `cryptoBehavioralTestEnabled` | `false` | Enable crypto behavioral testing |
+ | `codeGraphEnabled` | `true` | Enable LLM code graph |
+ | `multiStrategyMode` | `"off"` | Multi-strategy analysis (`off`, `adaptive`, `full`) |
+ | `autoPatchEnabled` | `false` | Enable auto-patch generation |
+ | `incrementalCache` | `true` | Enable incremental scanning |
+ | `auditLog` | `false` | Write JSONL audit files |
+
+ **Stage model overrides** via `taskConfig`:
+
+ | Key | Purpose |
+ | --- | --- |
+ | `taskConfig.triage.model` | Model for triage agents |
+ | `taskConfig.verify.model` | Model for verification agents |
+ | `taskConfig.poc-harness.model` | Model for PoC harness generation |
+
+ Global config: `~/.kuzushi/config.json`. Project overrides: `<repo>/.kuzushi/config.json`. CLI flags override config.

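The removed overrides table documents fallback chains such as `taskConfig.poc-harness.model` → `taskConfig.verify.model` → `model`. A sketch of that resolution order; `resolve_model` is a hypothetical helper for illustration, not part of Kuzushi's API:

```python
def resolve_model(config, chain):
    """Return the first configured model along a fallback chain.

    Chain entries name taskConfig blocks; the sentinel "model" means
    the top-level default model.
    """
    for key in chain:
        if key == "model":
            value = config.get("model")
        else:
            value = config.get("taskConfig", {}).get(key, {}).get("model")
        if value:
            return value
    return None

config = {
    "model": "anthropic:claude-sonnet-4-6",
    "taskConfig": {"verify": {"model": "openai:gpt-4o-mini"}},
}
# poc-harness is unset, so it falls back through verify:
print(resolve_model(config, ["poc-harness", "verify", "model"]))  # openai:gpt-4o-mini
# triage is unset, so it falls back to the top-level model:
print(resolve_model(config, ["triage", "model"]))  # anthropic:claude-sonnet-4-6
```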
 ### Environment Variables

 | Variable | Required | Description |
 | --- | --- | --- |
- | `ANTHROPIC_API_KEY` | When using `anthropic:*` models | Anthropic API key for pi-ai backend |
- | `OPENAI_API_KEY` | When using `openai:*` models | OpenAI API key for pi-ai backend |
- | `GEMINI_API_KEY` / `GOOGLE_API_KEY` | When using `google:*` models | Google API key for pi-ai backend |
+ | `ANTHROPIC_API_KEY` | When using `anthropic:*` models | Anthropic API key |
+ | `OPENAI_API_KEY` | When using `openai:*` models | OpenAI API key |
+ | `GEMINI_API_KEY` / `GOOGLE_API_KEY` | When using `google:*` models | Google API key |

 </details>

 <details>
 <summary><strong>CodeQL Setup</strong></summary>

- The `codeql` scanner requires the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases) to be installed separately. Unlike Semgrep, it is **not auto-downloaded** (the CLI is ~500 MB and requires accepting GitHub's license).
-
- Install it:
+ The `codeql` scanner requires the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases) installed separately (not auto-downloaded).

 ```sh
 # Via GitHub CLI (recommended):
@@ -480,175 +353,20 @@ gh extension install github/gh-codeql && gh codeql install-stub
 # https://github.com/github/codeql-cli-binaries/releases
 ```

- Kuzushi finds the CodeQL binary in this order:
-
- 1. `codeql` on your PATH
- 2. Previously placed binary at `~/.kuzushi/bin/codeql`
- 3. Fails with install instructions if not found
-
- CodeQL is **opt-in** — the default scanner list is `["semgrep"]`. To enable it:
-
- ```sh
- kuzushi <repo> --scanners codeql # CodeQL only
- kuzushi <repo> --scanners semgrep,codeql # both scanners
- kuzushi config set scanners semgrep,codeql # persist as default
- ```
-
- CodeQL builds a database from your source code before running queries. You can skip this step by pointing to a pre-built database:
-
- ```sh
- kuzushi config set scannerConfig.codeql.dbPath ./codeql-db
- ```
-
- </details>
-
- <details>
- <summary><strong>Taint Analysis Setup</strong></summary>
-
- The `taint-analysis` scanner is a multi-pass CodeQL-based pipeline that uses LLM-assisted classification to label sources, sinks, sanitizers, and summaries. It requires:
-
- 1. **CodeQL CLI** — same requirement as the `codeql` scanner
- 2. **Python 3** — used by taint analysis scripts for query generation
-
- Taint analysis templates, references, and scripts are bundled as the [`@kuzushi/augur`](https://www.npmjs.com/package/@kuzushi/augur) npm package and installed automatically with `pnpm install`. No manual clone or `TAINT_ANALYSIS_PATH` setup needed.
-
- ```sh
- kuzushi <repo> --scanners taint-analysis
- kuzushi config set scannerConfig["taint-analysis"].labelingModel anthropic:claude-sonnet-4-6
- kuzushi config set scannerConfig["taint-analysis"].passes "[1,2,3,4,5,6]"
- ```
-
- To override the bundled taint-analysis assets (e.g., for local development), set `TAINT_ANALYSIS_PATH` or `scannerConfig["taint-analysis"].taintAnalysisPath`:
-
- ```sh
- export TAINT_ANALYSIS_PATH=/path/to/local/taint-analysis
- kuzushi config set scannerConfig["taint-analysis"].taintAnalysisPath /path/to/local/taint-analysis
- ```
-
- Taint analysis runs in three DAG-ordered stages: **preflight** (database creation, candidate extraction), **label** (LLM classification), and **analyze** (library generation, query execution, finding extraction).
-
- ### Taint Analysis TI + Artifact Outputs
-
- Each taint analysis run emits interoperability artifacts under the workspace (`scannerConfig["taint-analysis"].workspaceDir`, default `./iris`) and run directory:
-
- - `iris/exploration/TI_PRIOR.md` and `iris/exploration/ti_prior.json` — live TI prior (CISA KEV + NVD) with degraded-mode metadata when fetches fail
- - `iris/labels/TAINT_MODEL.json` — per-CWE taint model (`sources/sinks/sanitizers/propagators`) with TI-weighted basis
- - `iris/results/findings.raw.json` — normalized raw findings aggregate from taint analysis pass SARIF outputs
- - `.kuzushi/runs/<runId>/findings.triaged.json` — triaged findings export including optional taint analysis source/sink triage details
-
- Relevant `scannerConfig["taint-analysis"]` options:
-
- - `tiMode`: `"live-required"` (default)
- - `tiFailurePolicy`: `"continue_without_ti"` (default)
- - `tiTimeoutMs`: live TI fetch timeout in milliseconds
- - `refinementEnabled`: enable one post-triage refinement loop (default `false`)
- - `refinementIterations`: max refinement passes when enabled (default `1`)
- - `refinementDeltaOnly`: triage only changed findings after refinement (default `true`)
- - `refinementModel`: optional model override for refinement stage wiring
-
- </details>
-
- <details>
- <summary><strong>Agent Runtime Backends</strong></summary>
-
- Kuzushi supports two agent runtime backends:
-
- **Claude (default)** — Uses `@anthropic-ai/claude-agent-sdk` to spawn Claude Code subprocesses with built-in tool implementations (Read, Glob, Grep, Bash, etc.). Supports session reuse: batch operations keep a single subprocess alive across multiple turns via the SDK's streaming input API, reducing subprocess spawns by ~99%. Requires `ANTHROPIC_API_KEY`.
-
- **Pi-AI** — Uses `@mariozechner/pi-ai` to provide vendor-agnostic LLM access. It supports 15+ providers (Anthropic, OpenAI, Google, Groq, Mistral, etc.) through a single interface. All LLM calls run in-process (no subprocesses).
-
- Kuzushi implements an internal agentic loop on top of pi-ai:
-
- 1. **Tool-calling loop** — call model, parse tool calls, execute tools, feed results back, repeat until stop or max turns
- 2. **Local tool implementations** — Read (file reader with line numbers), Glob (Node 22+ `globSync`), Grep (regex search across files)
- 3. **Structured output** — system prompt injection + post-hoc JSON extraction from fenced code blocks or raw text
- 4. **Safety controls** — max turns, budget enforcement, abort signal, permission gating via `canUseTool`
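The four-step loop described in the removed list is the classic tool-calling pattern. A minimal sketch with a stubbed model; the message and tool-call shapes here are invented for illustration and are not pi-ai's actual types:

```python
def agent_loop(call_model, tools, prompt, max_turns=5):
    """Minimal tool-calling loop: call model, run requested tools, feed results back."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = call_model(messages)
        if "tool" not in reply:  # no tool call means a final answer
            return reply["content"]
        result = tools[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    return None  # max turns exhausted (a safety control)

# Stub model: asks for one file read, then answers.
def fake_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool": "read", "args": "app.js"}
    return {"content": "no injection found"}

tools = {"read": lambda path: f"contents of {path}"}
print(agent_loop(fake_model, tools, "triage finding"))  # no injection found
```

The `max_turns` bound is what keeps a looping model from running forever, matching the safety controls listed above.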
+ CodeQL is opt-in. Enable it:

 ```sh
- # Use with any supported provider:
- OPENAI_API_KEY=... kuzushi <repo> --model openai:gpt-4o
- GEMINI_API_KEY=... kuzushi <repo> --model google:gemini-2.0-flash
- ANTHROPIC_API_KEY=... kuzushi <repo> --model anthropic:claude-sonnet-4-6
+ kuzushi scan <repo> --tasks codeql
+ kuzushi scan <repo> --tasks semgrep,codeql
+ kuzushi config set tasks semgrep,codeql
 ```

 </details>

- <details>
- <summary><strong>Architecture</strong></summary>
-
- Kuzushi is built on three core abstractions:
-
- **Message Bus** — A transport-agnostic `MessageBus` interface (`publish`, `subscribe`, `waitFor`) that decouples pipeline stages. The stable build supports the in-process `EventEmitter` transport today.
-
- **AgentTask + DAG** — Every unit of work (context gatherer, scanner, future threat modeler, etc.) implements the `AgentTask` interface: an `id`, `dependsOn` list, `outputKind`, and a `run()` method. The `TaskRegistry` resolves enabled tasks into a DAG, groups them into parallel stages, detects cycles, and hands execution to the `PipelineOrchestrator`. Upstream task outputs are forwarded to dependents automatically.
-
- **Pipeline Phases** — After the DAG completes, the orchestrator drives sequential phases: triage (classify findings), verification (construct PoC exploits), patch synthesis (auto-generate and re-verify fixes), and report (display results + optional SSE streaming). Each phase has its own concurrency control, cost tracking, and checkpoint support.
-
- **Strategy Framework** — The multi-strategy system wraps detection tasks with multiple analytical approaches (syntactic, dataflow, reasoning, execution) that run in parallel or adaptively, merging results with corroboration-based confidence boosting.
-
- **Code Graph** — A persistent SQLite-backed graph of code paths from entry points to sinks, built from static import analysis and LLM-assisted tracing. Injected into triage prompts for deeper reasoning about reachability and sanitization.
-
- **Session Reuse** — The Claude runtime uses the Agent SDK's `AsyncIterable<SDKUserMessage>` prompt to keep a single subprocess alive across multiple turns. Batch operations (taint labeling, triage, verification, rescoring, PoC generation, patch synthesis) write per-batch data to `.kuzushi/batches/` files and send the subprocess a prompt to `Read` each file. This reduces worst-case subprocess spawns from ~3,100 to ~24 per pipeline run. Runtimes without `createSession` support (pi-ai) fall back to one subprocess per call automatically.
-
- Existing `ScannerPlugin` implementations (Semgrep, Agentic) are adapted into `AgentTask` via `adaptScannerPlugin()`, so the scanner plugin API remains stable.
-
- See [AGENTS.md](AGENTS.md) for the full developer guide on adding new agent tasks.
-
- ### Package Surface
-
- Kuzushi is published as a CLI-first package. The supported npm surface is the executable plus the root package export; internal modules under `dist/*` and `src/*` are not a stable API contract and may change between releases.
-
- Release builds are expected to come from a clean compile into `dist/`. `pnpm build` now cleans `dist/` first, `prepack` rebuilds automatically, and `pnpm verify:pack` runs `npm pack --dry-run` so stale artifacts do not get published.
-
- </details>
-
- ## Output
-
- Results are stored in SQLite at `<repo>/.kuzushi/findings.sqlite3`. Export to any format:
-
- ```sh
- kuzushi <repo> --output report.md # Markdown
- kuzushi <repo> --sarif results.sarif # SARIF v2.1.0 (GitHub Code Scanning compatible)
- kuzushi <repo> --json results.json # JSON
- kuzushi <repo> --csv results.csv # CSV
- kuzushi <repo> --jsonl results.jsonl # JSONL
- ```
-
- ## Development
-
- ```
- pnpm install # install deps
- pnpm dev -- /path/to/repo # run in dev mode
- pnpm check:types # typecheck app + benchmark tooling
- pnpm typecheck # type check
- pnpm test # run tests
- pnpm test:e2e # deterministic mock-backed smoke scan against fixture app
- pnpm test:coverage # tests + coverage (70% threshold)
- pnpm build # compile to dist/
- pnpm verify:pack # verify published tarball contents
- pnpm benchmark # run benchmark suite against govwa dataset
- pnpm benchmark:freeze # freeze current benchmark results as baseline
- pnpm benchmark:diff # diff current results against frozen baseline
- pnpm benchmark:regression # CI regression check against baseline
- ```
-
- `pnpm test:e2e` and the benchmark regression workflow use `tests/fixtures/mock-anthropic-server.mjs` for deterministic mock-backed coverage. They are useful smoke/regression checks, but they are not real LLM integration tests.
-
- ## Troubleshooting
-
- - **"Error: missing API credentials for selected model provider(s)."**: Set the provider env var(s) for your selected models (for example `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`)
- - **"No findings from scanner. Code looks clean."**: Your code is clean, or try `--severity ERROR,WARNING,INFO` to include lower-severity rules
- - **Scan interrupted**: Re-run the same command (already-triaged findings are skipped), or use `--resume` to continue from the exact checkpoint
- - **Wrong model**: `kuzushi config set model anthropic:claude-sonnet-4-6` or pass `--model` per-scan
- - **Scanner download fails**: Install Opengrep or Semgrep manually, ensure it's on your PATH
- - **High triage cost**: Use `--triage-model openai:gpt-4o-mini` for cheaper triage, or `--max 10` to limit findings
- - **Verification too expensive**: Use `--verify-min-confidence 0.8` to only verify high-confidence TPs, or `--verify-model openai:gpt-4o-mini`
- - **pi-ai model not found**: Ensure the model string uses `provider:modelId` format (e.g., `openai:gpt-4o`, not just `gpt-4o`)
-
- ## Contributing
+ ## Architecture

- See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
+ See [VISION.md](VISION.md) for the full architecture vision, module system design, workspace/knowledge graph, intel layer, governance model, and implementation roadmap.

 ## License

- [MIT](LICENSE)
+ MIT