kuzushi 0.11.0 → 0.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
  <img src="kuzushi.png" alt="Kuzushi" width="200" />
 
- # Kuzushi — AI Security Scanner That Only Shows You Real Vulnerabilities
+ # Kuzushi — Security-Native AI Operating Environment
 
  [![CI](https://github.com/allsmog/Kuzushi/actions/workflows/ci.yml/badge.svg)](https://github.com/allsmog/Kuzushi/actions/workflows/ci.yml)
  [![npm](https://img.shields.io/npm/v/kuzushi)](https://www.npmjs.com/package/kuzushi)
@@ -8,96 +8,141 @@
 
  [kuzushi.dev](https://kuzushi.dev)
 
- SAST tools cry wolf. Semgrep finds 500 issues in your codebase. 480 of them are false positives. You spend hours triaging a wall of noise and still miss the real vulnerability on line 247.
+ Kuzushi combines offensive security, defensive operations, and compliance governance into a single interactive platform powered by LLM agents, backed by a persistent workspace, and delivered through a rich terminal UI.
 
- Kuzushi runs the same scanners, then sends an AI agent to investigate each finding reading the actual code, tracing data flow, checking for sanitization. It tells you which findings are real, which are noise, and optionally proves exploitability by constructing a working PoC.
+ Find the vulnerability. Prove it's exploitable. Deploy a honeypot to detect it. Check if it violates PCI DSS. Generate the patch. One tool, one conversation.
 
  ```sh
- npx kuzushi /path/to/your/repo
+ npm install -g kuzushi
+ kuzushi
  ```
 
- No config files. Just point it at a repo.
+ ## Three Ways to Use It
 
- <!-- TODO: Add terminal recording / asciinema GIF here -->
+ ### Shell (default)
 
- ## Quick Start
+ Just type `kuzushi`. The interactive copilot shell starts with your loaded modules, available tools, and any active workspace. Talk naturally or use structured commands.
 
- Prereqs: Node 22+, and either an API key or Claude Code OAuth login.
+ ```
+ kuzushi shell                          # default — just `kuzushi` works
+ kuzushi shell --workspace acme-pentest # resume an engagement
+ kuzushi shell --target ./repo          # set initial target
+ kuzushi shell --load blackbox,honeypot # pre-load specific modules
+ ```
 
- ```sh
- # Install globally (recommended — get upgrades with npm update -g kuzushi)
- npm install -g kuzushi
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │ kuzushi shell                         workspace: acme-api   │
+ │ modules: sast, randori, blackbox, honeypot, shinsa          │
+ │ target: ./acme-api (Node.js + Express + PostgreSQL)         │
+ └─────────────────────────────────────────────────────────────┘
 
- # Or run without installing
- npx kuzushi /path/to/your/repo
+ kuzushi> modules
+ kuzushi> use sast
+ kuzushi/sast> run scan ./repo preset=deep
+ kuzushi/sast> back
+ kuzushi> tools
+ kuzushi> run sast:verify fingerprint=abc123
+ kuzushi> exit
  ```
 
+ ### Scan (headless pipeline)
+
+ The full SAST pipeline — 40+ agent tasks orchestrated as a DAG. Semgrep, CodeQL, 30+ agentic detectors, AI triage, verification, PoC generation, patch synthesis. CI/CD-native with SARIF output, quality gates, and exit codes.
+
  ```sh
- # With Claude Code OAuth (no API key needed — uses your Claude login)
- kuzushi /path/to/your/repo
+ kuzushi scan <repo>
+ kuzushi scan <repo> --preset deep --verify --auto-patch
+ kuzushi scan <repo> --sarif report.sarif --fail-on-tp --quality-gate
+ kuzushi scan <repo> --resume
+ ```
 
- # With Anthropic API key
- export ANTHROPIC_API_KEY=sk-ant-...
- kuzushi /path/to/your/repo
+ ### Run (headless module tool)
 
- # With OpenAI
- export OPENAI_API_KEY=sk-...
- kuzushi /path/to/repo --model openai:gpt-4o
+ Execute a single module tool without the interactive shell. Scriptable, and composable with Unix pipes.
 
- # With Google, Groq, Mistral, or 15+ other providers
- kuzushi /path/to/repo --model google:gemini-2.0-flash
+ ```sh
+ kuzushi run sast:scan ./repo --json
+ kuzushi run sast:triage fingerprint=abc123 --json
+ kuzushi run sast:verify fingerprint=abc123 --quiet
+ kuzushi run sast:findings severity=critical,high --json
  ```
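Since every `run` invocation can emit `--json`, its output composes with ordinary Unix tooling. A minimal sketch of that composition — note the JSON shape used here is an assumption for illustration, not kuzushi's documented schema:

```shell
# Hypothetical stand-in for `kuzushi run sast:findings --json` output;
# the real field names and structure may differ.
findings='[{"fingerprint":"abc123","severity":"critical"},{"fingerprint":"def456","severity":"low"}]'

# Extract the fingerprints of critical findings, e.g. to feed each one
# into a follow-up `kuzushi run sast:verify fingerprint=...` call.
echo "$findings" | python3 -c '
import json, sys
for f in json.load(sys.stdin):
    if f["severity"] == "critical":
        print(f["fingerprint"])
'
```

The same pattern works with `jq` if it is installed; `python3` is used here only because it is nearly always present.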
 
- Kuzushi auto-downloads Opengrep if you don't have a scanner installed. Zero dependencies to manage.
+ ## Quick Start
+
+ Prereqs: Node 22+, and either an API key or Claude Code OAuth login.
+
+ ```sh
+ # Install globally
+ npm install -g kuzushi
 
- ## The Problem
+ # Start the copilot shell (default)
+ kuzushi
 
- **SAST scanners alone** have high recall but terrible precision — they flag everything that *could* be a vulnerability, drowning you in false positives. Teams burn hours triaging, develop alert fatigue, and eventually stop looking.
+ # Or run a headless scan
+ kuzushi scan /path/to/your/repo
 
- **LLMs alone** can reason about code but hallucinate when scanning from scratch — 95%+ false positive rate when you ask "find vulnerabilities in this repo."
+ # With specific providers
+ export ANTHROPIC_API_KEY=sk-ant-...
+ kuzushi scan /path/to/your/repo
 
- **Kuzushi combines both.** SAST signal narrows the search space. AI reasoning eliminates false positives. The result: near-human researcher agreement rates on vulnerability classification.
+ # With OpenAI, Google, Groq, Mistral, or 15+ other providers
+ kuzushi scan /path/to/repo --model openai:gpt-4o
+ kuzushi scan /path/to/repo --model google:gemini-2.0-flash
+ ```
 
- ## What You Get
+ Kuzushi auto-downloads Opengrep if you don't have a scanner installed. Zero dependencies to manage.
 
- For each finding, Kuzushi produces:
+ ## Module System
 
- **Verdict** `true_positive`, `false_positive`, `by_design`, or `needs_review`
- **Confidence** — 0.0 to 1.0
- **Rationale** — why the agent reached that verdict, referencing specific code lines
- **Verification steps** — 2-6 actionable steps a human reviewer can follow
- **Fix suggestion** — suggested patch when applicable
- **PoC exploit** (with `--verify`) — a concrete proof-of-concept payload proving the vulnerability is exploitable
- **Cost** — per-finding triage and verification cost in USD
+ Kuzushi's capabilities come from pluggable modules. Each module exposes tools (for shell and run modes) and optionally pipeline tasks (for scan mode DAG execution).
 
- The terminal report shows true positives first, then needs-review items. False positives and by-design findings are counted but deprioritized. You only see what matters.
+ | Module | Category | What It Does |
+ |--------|----------|-------------|
+ | **sast** (built-in) | offense | 40+ task SAST pipeline: Semgrep, CodeQL, agentic detectors, AI triage, verification, PoC, patch |
+ | **randori** | intel | 7-stage PASTA threat modeling with ATT&CK/CAPEC/NVD intel, attack trees, probabilistic risk |
+ | **vuln-scout** | offense | Whitebox SAST with 15 Joern CPG verification scripts, 8 autonomous agents |
+ | **augur** | offense | Neuro-symbolic SAST (IRIS/ICLR 2025 LLM-driven CodeQL taint analysis) |
+ | **blackbox** | offense | Black/grey-box pentesting: nmap, gobuster, nikto, hydra, privilege escalation |
+ | **pwn** | offense | Binary exploitation: checksec, GDB, ROP chains, heap exploitation, SROP |
+ | **pentest** | offense | MCP server wrapping metasploit, nmap, hydra, john |
+ | **honeypot** | defense | Autonomous honeypot orchestration: 14 service types, 6 honeytokens, Falco |
+ | **yokai** | defense | Supply chain tripwires: dependency confusion, typosquatting, registry canaries |
+ | **prompt-armor** | offense | LLM red teaming: 80+ attack plugins, 25+ mutation strategies |
+ | **shinsa** | governance | Multi-framework compliance: ISO 27001, NIST 800-53, SOC 2, PCI DSS |
+ | **revgraph** | intel | Binary reverse engineering: Ghidra + Neo4j, NL2Cypher, function embeddings |
 
- ## How It Works
+ Modules are loaded via the shell (`use <module>`) or at startup (`--load blackbox,honeypot`).
+
+ ## The SAST Pipeline
+
+ The built-in `sast` module runs a 40+ task DAG pipeline:
 
  ```
  ┌─────────────┐     ┌──────────────┐     ┌──────────────┐     ┌─────────┐     ┌──────────┐
- │ Task DAG    │────▶│ AI Triage    │────▶│ Verification │────▶│ Patch   │────▶│ Report   │
+ │ Task DAG    │────>│ AI Triage    │────>│ Verification │────>│ Patch   │────>│ Report   │
  │ Semgrep     │     │ Investigate  │     │ Construct    │     │ Generate│     │ TP only  │
  │ CodeQL      │     │ each finding │     │ PoC exploits │     │ & verify│     │ + export │
- │ 15+ tasks   │     │ with context │     │ (optional)   │     │ (opt-in)│     │ + stream │
+ │ 30+ tasks   │     │ with context │     │ (optional)   │     │ (opt-in)│     │ + stream │
  └─────────────┘     └──────────────┘     └──────────────┘     └─────────┘     └──────────┘
  ```
 
- 1. **Context gathering** — auto-detects your tech stack, frameworks, auth patterns, ORMs, and sanitization libraries
- 2. **Code graph** — builds a persistent entry-point-to-sink graph via static analysis + LLM discovery mode. For HTTP services, traces pre-identified routes. For CLI tools, daemons, and non-HTTP projects, the LLM identifies entry points itself (main functions, socket listeners, gRPC servers, CLI handlers) and traces security-relevant call/data-flow paths
- 3. **Threat modeling** — Randori PASTA plugin (shipped as `@kuzushi/randori-plugin`) performs 4-stage threat analysis: business objectives, technical scope, DFD decomposition, and STRIDE threat scenarios with ATT&CK/CAPEC/OWASP mapping and 5-factor probabilistic scoring. All threat leads are injected into every detector's prompts.
- 4. **Threat-informed hunting** — spawns one adversarial Claude agent per DFD external entity (users, services, attackers) to CTF-style hunt for vulnerabilities from each actor's perspective
- 5. **Task DAG execution** — runs enabled tasks as a dependency-aware DAG: Semgrep, CodeQL, agentic scanner, and 15+ specialized detectors (SSRF, SQLi, XSS, command injection, XXE, deserialization, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL, secrets/crypto, auth logic, sharp edges, systems-level deep semantic analysis); multi-strategy mode runs 2-4 analytical approaches per vuln class
- 4. **Classifier funnel** — cheap single-token pre-filter removes ~80% of noise before expensive triage
- 5. **Deduplication** — fingerprints and merges equivalent findings across scanners
- 6. **Incremental skip** — findings already triaged in previous runs are skipped automatically
- 7. **AI triage** — an agent investigates each finding with pre-loaded source context, code graph paths, evidence chains, threat model context, and CWE-specific knowledge modules. Threat model output from Randori PASTA is injected into triage prompts so the agent can distinguish design choices (`by_design`) from real vulnerabilities (`tp`). Batch-dropped findings auto-escalate to individual triage
- 8. **Variant analysis** — confirmed TPs trigger automatic search for similar patterns across the codebase
- 9. **Verification** (optional) — constructs concrete PoC exploit payloads for true positives
- 10. **PoC harness generation** (optional) — produces runnable exploit scripts with iterative execution feedback
- 11. **Dynamic analysis** (optional) — executes harnesses in Docker sandbox to confirm exploitability
- 12. **Auto-patch** (optional) generates, validates, and re-verifies patches in disposable git worktrees
- 13. **Report** — terminal display + export to SARIF, Markdown, JSON, CSV, or JSONL; optional SSE live streaming
+ 1. **Context gathering** — auto-detects tech stack, frameworks, auth patterns, ORMs, sanitization
+ 2. **Code graph** — builds entry-point-to-sink graph via static analysis + LLM discovery
+ 3. **Threat modeling** — Randori PASTA plugin: business objectives, technical scope, DFD decomposition, STRIDE threats with ATT&CK/CAPEC/OWASP mapping and 5-factor probabilistic scoring
+ 4. **Threat-informed hunting** — adversarial Claude agents per DFD external entity, CTF-style hunting
+ 5. **Task DAG execution** — Semgrep, CodeQL, agentic scanner, 15+ specialized detectors (SSRF, SQLi, XSS, command injection, XXE, deserialization, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL, secrets/crypto, auth logic, systems-level deep semantic analysis)
+ 6. **Classifier funnel** — cheap single-token pre-filter removes ~80% noise before expensive triage
+ 7. **Deduplication** — fingerprints and merges equivalent findings across scanners
+ 8. **AI triage** — agent investigates each finding with source context, code graph, evidence chains, threat model, CWE-specific knowledge
+ 9. **Variant analysis** — confirmed TPs trigger automatic search for similar patterns
+ 10. **Verification** (optional) constructs concrete PoC exploit payloads
+ 11. **PoC harness** (optional) — produces runnable exploit scripts with execution feedback
+ 12. **Dynamic analysis** (optional) — executes harnesses in Docker sandbox
+ 13. **Auto-patch** (optional) — generates, validates, and re-verifies patches in disposable worktrees
+ 14. **Report** — terminal display + SARIF, Markdown, JSON, CSV, JSONL; optional SSE streaming
+
+ For each finding, you get: **verdict** (tp/fp/by_design/needs_review), **confidence** (0-1), **rationale** with code references, **verification steps**, **fix suggestion**, optional **PoC exploit**, and **cost in USD**.
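That verdict taxonomy lends itself to quick report slicing. A sketch over a JSONL export — the field layout below is an assumption for illustration, not kuzushi's documented schema:

```shell
# Hypothetical JSONL export — one finding per line; field names are assumed.
cat > /tmp/findings.jsonl <<'EOF'
{"verdict":"true_positive","confidence":0.92}
{"verdict":"false_positive","confidence":0.88}
{"verdict":"true_positive","confidence":0.71}
EOF

# Tally findings per verdict (naive string extraction, fine for a sketch):
awk -F'"verdict":"' '{split($2, a, "\""); print a[1]}' /tmp/findings.jsonl | sort | uniq -c

rm /tmp/findings.jsonl
```

A real pipeline would use a JSON-aware tool instead of `awk` field splitting, but the shape of the workflow is the same: export, slice by verdict, act on the true positives.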
 
  ## CI Integration
 
@@ -114,7 +159,7 @@ jobs:
        - uses: actions/setup-node@v4
          with:
            node-version: 22
-       - run: npx kuzushi . --sarif results.sarif --quality-gate --fail-on-tp
+       - run: npx kuzushi scan . --sarif results.sarif --quality-gate --fail-on-tp
          env:
            ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        - uses: github/codeql-action/upload-sarif@v3
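For context, the steps in this hunk sit inside a workflow along the lines of the following minimal sketch. The trigger, job name, checkout step, and the `sarif_file` input are assumptions reconstructed around the lines shown in the hunk, not copied from the repo:

```yaml
name: security-scan
on: [pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npx kuzushi scan . --sarif results.sarif --quality-gate --fail-on-tp
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```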
@@ -126,255 +171,132 @@ jobs:
  ### Quality Gates
 
  ```sh
- kuzushi <repo> --quality-gate # fail CI on threshold violations
- kuzushi <repo> --fail-on-tp # fail if any high/critical TP is found
- kuzushi <repo> --sarif results.sarif # export SARIF for GitHub Code Scanning
+ kuzushi scan <repo> --quality-gate        # fail CI on threshold violations
+ kuzushi scan <repo> --fail-on-tp          # fail if any high/critical TP is found
+ kuzushi scan <repo> --sarif results.sarif # export SARIF for GitHub Code Scanning
  ```
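Both gate flags signal through the process exit code, so they compose with any CI shell step. A sketch of that control flow — the scan command is stubbed out here so the logic is runnable on its own; in CI the stub would be `kuzushi scan <repo> --fail-on-tp`:

```shell
# Stub standing in for `kuzushi scan <repo> --fail-on-tp`; a nonzero exit
# status is the assumed signal that a high/critical true positive was found.
scan() { return 1; }

if scan; then
  echo "gate passed"
else
  echo "gate failed: true positives found"
  # in CI: exit 1 here to fail the job
fi
```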
 
- ## Key Features
-
- **Vendor-agnostic LLM runtime** — works with Anthropic, OpenAI, Google, Groq, Mistral, and 15+ other providers. Swap models at runtime with `--model provider:modelId`. Use cheaper models for triage, premium models for verification.
-
- **Exploit verification** — goes beyond classification. Constructs concrete PoC payloads (SQL injection strings, XSS vectors, etc.) that prove a finding is exploitable, not just theoretically possible.
-
- **Crypto behavioral testing** — generates and executes behavioral test harnesses in a Docker sandbox for crypto misuse findings. Detects timing side-channels, ECB mode, weak hashes, weak PRNGs, and more.
-
- **IRIS-style taint analysis** — LLM-driven CodeQL taint analysis inspired by the IRIS paper (ICLR 2025). An LLM selects relevant CWE classes for the project, writes CodeQL extraction queries dynamically (language-agnostic, framework-agnostic), labels candidates, generates TaintTracking configurations, and iteratively refines queries when compilation fails. Structured taint paths (source-to-sink step data) are persisted to the findings DB for downstream verification and reporting. No templates, no hardcoded framework detection.
-
- **Randori PASTA threat modeling** — ships `@kuzushi/randori-plugin` as a dependency. Runs 4-stage PASTA analysis (objectives, scope, DFD decomposition, STRIDE threats) via Claude Code plugin. Threat leads are injected into all detector prompts for threat-informed scanning. ATT&CK, CAPEC, and OWASP mapping included.
-
- **Threat-informed hunting** — spawns adversarial Claude agents for each DFD external entity identified by the threat model. Each agent explores the codebase as that actor (end user, admin, external service, LLM agent) looking for exploitable paths. Findings are deduplicated and fed into triage/verification.
-
- **Systems-level deep semantic hunt** — LLM-driven analysis pipeline for finding the class of bugs that survive decades of code review and fuzzing: integer overflow/wraparound (CWE-190), sentinel value collisions (CWE-787), signed/unsigned comparison bugs (CWE-681), buffer overflows exploitable via missing stack canaries (CWE-693), use-after-free in protocol state machines (CWE-416), and unsafe block violations in Rust (CWE-704). The LLM writes and runs CodeQL queries using range analysis, loop induction analysis, and type predicates — NOT TaintTracking — to find bugs that source-to-sink taint flows cannot express. Activates automatically for C, C++, Rust, and Go codebases. The `glasswing` preset routes a frontier model to this task for maximum depth.
-
- **Auto rule generation** — verified exploitable findings automatically generate custom Semgrep rules. Rules are persisted to `.kuzushi/custom-rules/` and auto-loaded on subsequent scans, creating a feedback loop where the scanner gets smarter over time. Rules are validated against the original finding and removed if they don't match.
-
- **Diff-aware taint analysis** — narrows analysis to files changed since a base branch. Run `--taint-diff-base main` in CI to only analyze what's new in the PR.
-
- **Resumable runs** — checkpoints pipeline state to SQLite. Interrupted scan? `--resume` picks up exactly where it left off.
-
- **Patch synthesis** — `kuzushi patch <repo> --fingerprint <fp>` generates and validates security patches in disposable git worktrees without touching your working copy.
-
- **Language-tuned detection** — every detector adapts its prompts to your repo's actual tech stack. Kuzushi auto-detects languages and frameworks, then injects language-specific sinks, safe patterns, few-shot examples, investigation hints (grep patterns, key files), framework-aware guidance, and anti-hallucination constraints. A Python repo gets `subprocess.run` shell=True analysis and Django/Flask/FastAPI-specific advice. A C/C++ repo gets buffer-size #define resolution, signed/unsigned mismatch detection, and memory-safety few-shots. A Java repo gets SpEL injection, XXE factory configuration, and Spring Security guidance. 8 language ecosystems covered: C/C++, Java/Kotlin, Python, Go, JavaScript/TypeScript, Rust, PHP, Ruby — each with per-vulnerability-class depth. Polyglot repos get all relevant languages composed together.
-
- **15+ specialized detectors** — dedicated detection tasks for command injection, XXE, insecure deserialization, SSRF, NoSQL injection, template injection, prototype pollution, race conditions, supply chain, GraphQL security, secrets/crypto, code config, auth logic, sharp edges, crypto behavioral testing, and systems-level deep semantic analysis (integer overflow, buffer overflow, sentinel collision, use-after-free, unsafe blocks). Each has vulnerability-class-specific prompts, anti-hallucination constraints, and multi-lens analysis. All detectors receive threat model leads for threat-informed scanning.
-
- **Classifier funnel** — single-token LLM pre-filter using a cheap model removes ~80% of false positives before expensive triage, cutting per-scan cost dramatically.
-
- **Source pre-read** — triage agents receive the flagged source file pre-loaded (50 lines surrounding the finding), eliminating cold-start tool calls and improving reasoning accuracy.
-
- **LLM code graph** — builds a persistent code graph tracing entry points through middleware, controllers, services, and data-access layers. Static skeleton from import analysis + LLM-assisted gap-filling for dynamic dispatch, DI, and callback patterns. Discovery mode: when no HTTP routes are detected, the LLM identifies entry points itself (main functions, socket listeners, gRPC servers, CLI handlers) and traces security-relevant paths with threat model context. Feeds graph context into triage for better reasoning.
-
- **Multi-strategy analysis** — runs 2-4 different analytical approaches per vulnerability class (syntactic pattern matching, dataflow tracing, first-principles reasoning, execution-based proof) and merges results with confidence boosting when strategies agree. Auto-generates reusable Semgrep rules from confirmed multi-strategy findings.
-
- **13 CWE knowledge modules** — domain-specific knowledge for SQL injection, XSS, SSRF, command injection, path traversal, auth bypass, deserialization, race conditions, crypto, XXE, file upload, IDOR, and NoSQL injection — including dangerous patterns, safe patterns, bypass techniques, and fix examples.
-
- **Incremental scanning** — skips re-triage for unchanged findings across runs. Tracks the last scanned commit, computes file diffs, and expands the rescan scope with dependency-aware invalidation via the import graph.
-
- **Auto-patch with closed-loop verification** — after confirming a vulnerability, automatically generates a patch in a disposable git worktree, validates it (apply, build, test), then re-runs the scanner on the patched code to confirm the vulnerability is gone.
-
- **Live streaming** — SSE server streams pipeline events in real-time (`--stream`). Connect with `curl`, `EventSource`, or any SSE client to watch findings appear as they're triaged.
-
- **Interactive terminal UI** — React+Ink-powered live display with pipeline progress tree, spinners, attack chain diagrams, and a trophy screen for confirmed exploits. Includes an interactive REPL during scans (pause, skip, inspect findings), a first-run setup wizard, config confirmation flow, inline code preview, and clickable file paths. Auto-detects terminal theme and falls back to plain text in non-TTY environments.
-
- **Audit logging** — optional JSONL audit trail of every agent decision for debugging, accountability, and compliance records.
-
  ## Scan Presets
 
- Presets configure the pipeline for different cost/depth tradeoffs. CLI flags override preset values.
-
  ```sh
- kuzushi <repo> --preset fast # semgrep only, no context/enrichment/variant analysis
- kuzushi <repo> --preset standard # semgrep + IRIS taint + secrets/crypto detection
- kuzushi <repo> --preset deep # standard + verification + threat modeling + systems-level hunt
- kuzushi <repo> --preset glasswing # verification + PoC generation + threat-informed hunting
- # + deep semantic hunt with a frontier model
+ kuzushi scan <repo> --preset fast      # semgrep only, no context/enrichment
+ kuzushi scan <repo> --preset standard  # semgrep + IRIS taint + secrets/crypto
+ kuzushi scan <repo> --preset deep      # + verification + threat modeling + systems hunt
+ kuzushi scan <repo> --preset glasswing # + PoC + threat-informed hunting + frontier model
  ```
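Because presets are plain flag values, picking one per context is a one-liner in CI. A sketch — the branch-to-preset mapping here is an assumption for illustration, not a kuzushi convention:

```shell
# Run a cheap preset on feature branches, a deeper one on the main branch.
# GITHUB_REF_NAME is the branch name in GitHub Actions; default to main locally.
branch="${GITHUB_REF_NAME:-main}"
case "$branch" in
  main) preset=deep ;;
  *)    preset=fast ;;
esac
echo "kuzushi scan . --preset $preset"
```

Swap the `echo` for the real invocation once the mapping suits your cost budget.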
 
- The `glasswing` preset uses a cost-smart model tiering strategy: a standard model handles bulk scanning and triage, while a frontier model is used surgically for the systems-hunt and threat-hunt stages — where stronger adversarial reasoning has the highest ROI for zero-day discovery. Per-task model overrides keep costs controlled.
-
- ## Tasks
-
- Every stage of the pipeline is an **AgentTask** — a composable unit with explicit dependencies that the orchestrator runs as a DAG. Tasks are selected via `--tasks` or `config.tasks`, and per-task config (including model overrides) lives in `config.taskConfig`.
+ ## Key Features
 
- | Task ID | Description | Auto-download |
- |---------|-------------|---------------|
- | `semgrep` (default) | Traditional SAST via Opengrep/Semgrep | Yes |
- | `codeql` | Semantic dataflow/taint analysis via GitHub CodeQL CLI | No (opt-in) |
- | `agentic` | AI-driven scanner — LLM with read-only repo tools | N/A |
- | `taint-cwe-select` / `taint-iris` | IRIS-style LLM-driven CodeQL taint analysis — dynamic CWE selection, LLM-generated queries, iterative refinement | No (opt-in) |
- | `systems-hunt` | Deep semantic analysis for C/C++/Rust/Go — LLM-driven CodeQL range analysis, loop induction, missing mitigations | No (opt-in) |
- | `secrets-crypto-detect` | Secrets, API keys, and cryptographic misuse detection | N/A |
- | `code-config-detect` | Security-relevant code and configuration issues | N/A |
- | `threat-model-randori` | PASTA threat modeling with STRIDE analysis | N/A |
- | `threat-hunt` | Adversarial CTF-style hunting per DFD entity | N/A |
- | `context-gatherer` | Auto-detects tech stack, frameworks, auth patterns | N/A |
- | `context-enricher` | Deep context enrichment (middleware, trust boundaries) | N/A |
+ **Vendor-agnostic LLM runtime** works with Anthropic, OpenAI, Google, Groq, Mistral, and 15+ providers. Swap models at runtime with `--model provider:modelId`. Runs airgapped with Ollama.
 
- ```sh
- kuzushi <repo> --tasks semgrep,codeql # run specific tasks
- kuzushi <repo> --tasks agentic # AI-only scan
- kuzushi <repo> --task-model threat-hunt=anthropic:claude-opus-4-6 # per-task model override
- ```
+ **Exploit verification** — constructs concrete PoC payloads (SQL injection strings, XSS vectors, etc.) that prove exploitability, not just theoretical possibility.
 
- ---
+ **IRIS-style taint analysis** — LLM-driven CodeQL taint analysis (ICLR 2025). LLM selects CWE classes, writes CodeQL queries dynamically, labels candidates, generates TaintTracking configs, iteratively refines on failure.
 
- <details>
- <summary><strong>All Commands</strong></summary>
+ **Randori PASTA threat modeling** — 4-stage PASTA analysis (objectives, scope, DFD, STRIDE threats) via Claude Code plugin. Threat leads injected into all detector prompts. ATT&CK, CAPEC, OWASP mapping.
 
- ### Scan (default)
+ **Systems-level deep semantic hunt** — finds bugs that survive decades of review: integer overflow, sentinel collisions, signed/unsigned comparison, buffer overflows, use-after-free in protocol state machines, unsafe Rust blocks. LLM writes CodeQL queries using range analysis, not TaintTracking.
 
- ```
- kuzushi <repo> # scan with defaults
- kuzushi <repo> --tasks codeql
- kuzushi <repo> --tasks semgrep,codeql
- kuzushi <repo> --tasks semgrep,agentic
- kuzushi <repo> --severity ERROR # only ERROR-level findings
- kuzushi <repo> --max 20 # triage top 20 findings only
- kuzushi <repo> --model anthropic:claude-sonnet-4-6 # use a different model
- kuzushi <repo> --task-model triage=openai:gpt-4o # separate model for triage stage
- kuzushi <repo> --api-key sk-ant-... --base-url https://api.example.com/ # custom API endpoint
- kuzushi <repo> --fresh # clear prior results, re-triage everything
- kuzushi <repo> --db ./my.sqlite3 # custom database path
- kuzushi <repo> --resume # resume the most recent interrupted run
- kuzushi <repo> --resume <run-id> # resume a specific run by ID
- ```
+ **Multi-strategy analysis** — 2-4 analytical approaches per vulnerability class with confidence boosting when strategies agree. Auto-generates Semgrep rules from confirmed findings.
 
- ### Verification
+ **Auto-patch with closed-loop verification** — generates patches in disposable worktrees, validates, re-runs the scanner to confirm the vulnerability is gone.
 
- ```
- kuzushi <repo> --verify # enable exploit verification for TPs
- kuzushi <repo> --verify --task-model verify=openai:gpt-4o-mini # cheaper model for verification
- kuzushi <repo> --verify --verify-max-turns 20
- kuzushi <repo> --verify --verify-concurrency 3
- kuzushi <repo> --verify --verify-min-confidence 0.7 # skip low-confidence TPs
- ```
+ **Crypto behavioral testing** — generates and executes behavioral test harnesses in Docker for crypto misuse: timing side-channels, ECB mode, weak hashes, weak PRNGs.
 
- ### PoC Harness Generation
+ **Language-tuned detection** auto-detects languages and frameworks, then injects language-specific sinks, safe patterns, few-shot examples, and anti-hallucination constraints. 8 ecosystems: C/C++, Java/Kotlin, Python, Go, JS/TS, Rust, PHP, Ruby.
 
- ```
- kuzushi <repo> --verify --poc-harness # generate exploit scripts for verified findings
- kuzushi <repo> --verify --poc-harness --task-model poc-harness=openai:gpt-4o-mini
- kuzushi <repo> --verify --poc-harness --poc-harness-max-turns 25
- kuzushi <repo> --verify --poc-harness --poc-harness-concurrency 2
- ```
+ **Resumable runs** — checkpoints to SQLite. `--resume` picks up where you left off.
 
- ### Dynamic Analysis
+ **Interactive terminal UI** — React+Ink-powered live display with pipeline progress tree, spinners, trophy screen for confirmed exploits. REPL during scans (pause, skip, inspect). First-run setup wizard. Falls back to plain text in non-TTY.
 
- ```
- kuzushi <repo> --verify --poc-harness --dynamic-analysis # execute harnesses to confirm/reject findings
- kuzushi <repo> --verify --dynamic-analysis --dynamic-max-candidates 10
- kuzushi <repo> --verify --dynamic-analysis --dynamic-min-score 8
- ```
+ **Incremental scanning** — skips re-triage for unchanged findings. Dependency-aware invalidation via import graph.
 
- ### Patch Synthesis
+ **Audit logging** — JSONL audit trail of every agent decision.
 
- ```
- kuzushi patch <repo> --fingerprint <fp> # synthesize and validate a patch
- kuzushi patch <repo> --fingerprint <fp> --build-cmd "npm run build"
- kuzushi patch <repo> --fingerprint <fp> --test-cmd "npm test" --max-iterations 5
- ```
+ <details>
+ <summary><strong>All Scan Commands</strong></summary>
 
- ### Code Graph
+ ### Scan
 
  ```
- kuzushi <repo> --code-graph # enable LLM-powered code graph (entry-point-to-sink tracing)
+ kuzushi scan <repo> # scan with defaults
+ kuzushi scan <repo> --tasks codeql
+ kuzushi scan <repo> --tasks semgrep,codeql
+ kuzushi scan <repo> --severity ERROR
+ kuzushi scan <repo> --max 20
+ kuzushi scan <repo> --model anthropic:claude-sonnet-4-6
+ kuzushi scan <repo> --task-model triage=openai:gpt-4o
+ kuzushi scan <repo> --fresh
+ kuzushi scan <repo> --resume
  ```
 
- ### Multi-Strategy Analysis
+ ### Verification & PoC
 
  ```
- kuzushi <repo> --multi-strategy # adaptive mode: run cheapest strategy first, exit early if confident
- kuzushi <repo> --multi-strategy-full # run all strategies in parallel for maximum coverage
- kuzushi <repo> --multi-strategy-budget 3.0 # per-finding budget across all strategies (USD)
- kuzushi <repo> --multi-strategy-auto-rules # generate Semgrep rules from confirmed multi-strategy findings
+ kuzushi scan <repo> --verify
+ kuzushi scan <repo> --verify --poc-harness
+ kuzushi scan <repo> --verify --poc-harness --dynamic-analysis
  ```
 
- ### Auto-Patch (Closed-Loop)
+ ### Multi-Strategy & Code Graph
 
  ```
- kuzushi <repo> --verify --auto-patch # patch exploitable findings, re-verify
- kuzushi <repo> --verify --auto-patch --auto-patch-after triage # patch any TP (broadest trigger)
- kuzushi <repo> --verify --auto-patch --auto-patch-after poc # patch only after PoC proves it
- kuzushi <repo> --auto-patch --patch-verify-depth triage # re-run scanner + triage on patched code
- kuzushi <repo> --auto-patch --patch-verify-depth full # full pipeline re-verify (most thorough)
- kuzushi <repo> --auto-patch --patch-concurrency 3 # parallel patch synthesis tasks
+ kuzushi scan <repo> --multi-strategy
+ kuzushi scan <repo> --multi-strategy-full
+ kuzushi scan <repo> --code-graph
  ```
303
248
 
304
- ### Streaming
249
+ ### Auto-Patch
305
250
 
306
251
  ```
307
- kuzushi <repo> --stream # start SSE server on auto-assigned port
308
- kuzushi <repo> --stream --stream-port 3001 # start SSE server on specific port
309
- # Then in another terminal:
310
- curl -N http://localhost:3001/events # watch live pipeline events
252
+ kuzushi scan <repo> --verify --auto-patch
253
+ kuzushi scan <repo> --auto-patch --patch-verify-depth full
311
254
  ```
312
255
 
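The removed streaming section served live pipeline events over SSE (`curl -N .../events`). A minimal sketch of decoding the SSE wire format into `(event, data)` pairs; the `finding` and `done` event names are invented for illustration:

```python
import json

def parse_sse(text):
    """Parse a Server-Sent Events stream into a list of (event, data) pairs."""
    events = []
    event_type, data_lines = "message", []
    for line in text.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one event.
            events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

sample = (
    "event: finding\n"
    'data: {"severity": "ERROR"}\n'
    "\n"
    "event: done\n"
    "data: {}\n"
    "\n"
)
for name, payload in parse_sse(sample):
    print(name, json.loads(payload))
```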
- ### Crypto Behavioral Testing
+ ### Diff-Aware & Crypto

 ```
- kuzushi <repo> --crypto-behavioral-test # generate & run behavioral tests for crypto misuse findings
+ kuzushi scan <repo> --taint-diff-base main
+ kuzushi scan <repo> --crypto-behavioral-test
 ```

- ### Diff-Aware Taint
+ ### Output

 ```
- kuzushi <repo> --taint-diff-base main # only taint-analyze files changed since main
- kuzushi <repo> --taint-diff-base main --taint-diff-mode delta # emit only findings intersecting the diff
- kuzushi <repo> --taint-diff-base main --taint-diff-mode baseline # merge cached + rerun for full baseline
+ kuzushi scan <repo> --output report.md
+ kuzushi scan <repo> --sarif results.sarif
+ kuzushi scan <repo> --json results.json
+ kuzushi scan <repo> --csv results.csv
+ kuzushi scan <repo> --stream
+ kuzushi scan <repo> --audit-log
 ```

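SARIF v2.1.0 nests results under `runs[].results[]` with a `ruleId` per result, so `--sarif` output works with any SARIF consumer (GitHub Code Scanning included). A minimal sketch that tallies a synthetic SARIF document by rule:

```python
def count_by_rule(sarif):
    """Tally SARIF results by ruleId across all runs."""
    counts = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "unknown")
            counts[rule] = counts.get(rule, 0) + 1
    return counts

# Synthetic document standing in for real scanner output.
sample = {
    "version": "2.1.0",
    "runs": [{"results": [
        {"ruleId": "js/sql-injection", "level": "error"},
        {"ruleId": "js/sql-injection", "level": "error"},
        {"ruleId": "js/xss", "level": "warning"},
    ]}],
}
print(count_by_rule(sample))  # {'js/sql-injection': 2, 'js/xss': 1}
```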
- ### Output & Observability
+ ### Run Mode

 ```
- kuzushi <repo> --output report.md # export markdown report
- kuzushi <repo> --sarif results.sarif # export SARIF v2.1.0
- kuzushi <repo> --json results.json # export JSON report
- kuzushi <repo> --csv results.csv # export CSV report
- kuzushi <repo> --jsonl results.jsonl # export JSONL report
- kuzushi <repo> --audit-log # write agent activity to .kuzushi/runs/{runId}/
- kuzushi <repo> --verbose # show debug-level runtime diagnostics
- kuzushi <repo> --no-context # disable repo context gathering
+ kuzushi run sast:scan ./repo --json
+ kuzushi run sast:triage fingerprint=abc123 --json
+ kuzushi run sast:verify fingerprint=abc123 --quiet
+ kuzushi run sast:findings severity=critical,high --json
 ```

- ### Retry
+ ### Patch

 ```
- kuzushi <repo> --max-triage-retries 3 # retry failed triage calls (default: 2)
- kuzushi <repo> --max-verify-retries 3 # retry failed verification calls (default: 2)
- kuzushi <repo> --retry-backoff-ms 10000 # initial backoff delay (default: 5000)
+ kuzushi patch <repo> --fingerprint <fp>
+ kuzushi patch <repo> --fingerprint <fp> --build-cmd "npm run build" --test-cmd "npm test"
 ```

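Both versions expose retry backoff knobs (`--retry-backoff-ms`, and the `retryBackoffMs`/`retryBackoffMultiplier` config keys). Assuming the usual `initial * multiplier**attempt` schedule, the delays work out as follows:

```python
def backoff_schedule(retries, initial_ms=5000, multiplier=2):
    """Delay before each retry attempt: initial_ms * multiplier**attempt."""
    return [initial_ms * multiplier ** attempt for attempt in range(retries)]

print(backoff_schedule(3))         # [5000, 10000, 20000]
print(backoff_schedule(2, 10000))  # [10000, 20000], e.g. --retry-backoff-ms 10000
```

With the documented defaults (5000 ms initial, multiplier 2), the two default retries wait 5 s, then 10 s.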
 ### Config

 ```
- kuzushi config get # show all config
- kuzushi config get model # show one key
+ kuzushi config get
 kuzushi config set model anthropic:claude-sonnet-4-6
 kuzushi config set tasks semgrep,agentic
- kuzushi config set taskConfig.codeql.dbPath ./codeql-db
- kuzushi config set taskConfig.codeql.suite javascript-security-extended
- kuzushi config set taskConfig.semgrep.binary opengrep
- kuzushi config set taskConfig.semgrep.configFlag auto
- kuzushi config set taskConfig.agentic.model anthropic:claude-sonnet-4-6
- kuzushi config set taskConfig.agentic.maxFindings 25
- kuzushi config set taskConfig.triage.model anthropic:claude-opus-4-6
- kuzushi config set taskConfig.verify.model openai:gpt-4o-mini
- kuzushi config set severity ERROR,WARNING,INFO
- kuzushi config set verify true
- kuzushi config set verifyMinConfidence 0.7
- kuzushi config set auditLog true
- kuzushi config validate --repo . # validate the effective config for this repo
- kuzushi config unset model # reset to default
- kuzushi config path # print config file location
+ kuzushi config validate --repo .
+ kuzushi config path
 ```

- Global config lives at `~/.kuzushi/config.json`. Optional project overrides can live at `<repo>/.kuzushi/config.json`. CLI flags override config values.
-
- **Repo-local config sandboxing:** By default, project-level config files (`<repo>/.kuzushi/config.json`) are sandboxed — keys that could execute code or reach external systems (e.g., `hooks`, `externalTasks`, `pocExecute`, scanner binary paths) are silently stripped. This prevents a cloned repo from altering your runtime behavior. Pass `--trust-repo-config` to opt in to the full project config when you trust the repository.
-
- Security note: `agentRuntimeConfig.apiKey` is stored in plaintext in config files. Prefer `--api-key` for one-off runs or `ANTHROPIC_API_KEY` from your shell/secret manager.
-
 </details>

 <details>
@@ -384,93 +306,44 @@ Security note: `agentRuntimeConfig.apiKey` is stored in plaintext in config file
 | --- | --- | --- |
 | `model` | `anthropic:claude-sonnet-4-6` | Default LLM model for all tasks and stages |
 | `tasks` | `["semgrep"]` | Enabled task IDs, in execution order |
- | `taskConfig` | `{ semgrep: {...}, triage: {...}, ... }` | Per-task config blocks keyed by task ID or stage ID (see below) |
+ | `taskConfig` | `{ semgrep: {...}, triage: {...}, ... }` | Per-task config blocks keyed by task ID or stage ID |
 | `severity` | `["ERROR","WARNING"]` | Semgrep severity filter |
 | `excludePatterns` | `["test","tests","node_modules",...]` | Directories/globs to skip |
- | `busBackend` | `"in-process"` | Message bus transport (`in-process`) |
 | `triageConcurrency` | `5` | Parallel LLM triage calls |
- | `scanMode` | `"concurrent"` | Task execution mode (`sequential` or `concurrent`) |
- | `agentRuntimeBackend` | `"pi-ai"` | Agent runtime backend (`pi-ai`) |
+ | `scanMode` | `"concurrent"` | Task execution mode |
 | `verify` | `false` | Enable proof-of-exploitability verification |
- | `verifyMaxTurns` | `15` | Max turns for verification agent |
- | `verifyConcurrency` | `3` | Parallel verification calls |
- | `verifyVerdicts` | `["tp"]` | Which triage verdicts to verify |
- | `verifyMinConfidence` | `0` | Minimum triage confidence to trigger verification (0-1) |
- | `pocHarness` | `false` | Enable post-verification PoC harness generation (requires `--verify`) |
- | `pocHarnessMaxTurns` | `20` | Max turns for PoC harness agent |
- | `pocHarnessConcurrency` | `2` | Parallel PoC harness generation calls |
- | `cryptoBehavioralTestEnabled` | `false` | Enable crypto behavioral testing for crypto misuse findings |
- | `cryptoBehavioralMaxFindings` | `10` | Max findings to generate behavioral tests for per run |
- | `cryptoBehavioralTimeoutMs` | `120000` | Execution timeout per harness in ms |
- | `cryptoBehavioralPerFindingBudgetUsd` | `1` | Cost budget per finding for harness generation |
- | `codeGraphEnabled` | `true` | Enable LLM-powered code graph construction and enrichment |
-
- **Stage model overrides** set per-stage models via `taskConfig` instead of top-level fields:
-
- | `taskConfig` key | Fallback chain | Purpose |
- | --- | --- | --- |
- | `taskConfig.triage.model` | `model` | Model for triage agents |
- | `taskConfig.verify.model` | `model` | Model for verification agents |
- | `taskConfig.poc-harness.model` | `taskConfig.verify.model` → `model` | Model for PoC harness generation |
- | `multiStrategyMode` | `"off"` | Multi-strategy analysis mode (`off`, `adaptive`, `full`) |
- | `multiStrategyBudgetUsd` | `2.0` | Per-finding budget across all strategies (USD) |
- | `autoPatchEnabled` | `false` | Enable automatic patch generation in pipeline |
- | `autoPatchAfter` | `"verify"` | Trigger threshold for auto-patch (`verify`, `poc`, `triage`) |
- | `patchVerifyDepth` | `"task"` | Re-verification depth after patching (`task`, `triage`, `full`) |
- | `patchConcurrency` | `2` | Max concurrent patch synthesis tasks |
- | `incrementalCache` | `true` | Enable incremental scanning (skip unchanged findings across runs) |
- | `incrementalDepTracking` | `true` | Include importers of changed files in rescan scope |
- | `streamingEnabled` | `false` | Enable SSE streaming server for live pipeline events |
- | `streamingPort` | `0` (auto) | Port for the SSE streaming server |
- | `enableContextGathering` | `true` | Run repo context analysis before triage |
- | `auditLog` | `false` | Write agent activity to JSONL audit files |
- | `reportOutput` | _(unset)_ | Write markdown report output to this path |
- | `sarifOutput` | _(unset)_ | Write SARIF v2.1.0 output to this path |
- | `jsonOutput` | _(unset)_ | Write JSON report to this path |
- | `csvOutput` | _(unset)_ | Write CSV report to this path |
- | `jsonlOutput` | _(unset)_ | Write JSONL report to this path |
- | `maxTriageRetries` | `2` | Retry failed triage calls |
- | `maxVerifyRetries` | `2` | Retry failed verification calls |
- | `maxPocHarnessRetries` | `2` | Retry failed PoC harness generation calls |
- | `retryBackoffMs` | `5000` | Initial retry backoff delay in ms |
- | `retryBackoffMultiplier` | `2` | Exponential backoff multiplier |
-
- Example config:
-
- ```json
- {
-   "tasks": ["semgrep", "codeql", "context-gatherer", "context-enricher", "secrets-crypto-detect", "code-config-detect", "taint-cwe-select", "taint-iris"],
-   "scanMode": "concurrent",
-   "triageConcurrency": 3,
-   "verify": true,
-   "verifyMinConfidence": 0.7,
-   "auditLog": true,
-   "taskConfig": {
-     "codeql": { "dbPath": "./codeql-db", "suite": "javascript-security-extended" },
-     "semgrep": { "binary": "opengrep", "configFlag": "auto" },
-     "agentic": { "model": "anthropic:claude-sonnet-4-6", "maxFindings": 20 },
-     "triage": { "model": "anthropic:claude-opus-4-6" },
-     "verify": { "model": "openai:gpt-4o-mini" }
-   }
- }
- ```
+ | `pocHarness` | `false` | Enable PoC harness generation |
+ | `cryptoBehavioralTestEnabled` | `false` | Enable crypto behavioral testing |
+ | `codeGraphEnabled` | `true` | Enable LLM code graph |
+ | `multiStrategyMode` | `"off"` | Multi-strategy analysis (`off`, `adaptive`, `full`) |
+ | `autoPatchEnabled` | `false` | Enable auto-patch generation |
+ | `incrementalCache` | `true` | Enable incremental scanning |
+ | `auditLog` | `false` | Write JSONL audit files |
+
+ **Stage model overrides** via `taskConfig`:
+
+ | Key | Purpose |
+ | --- | --- |
+ | `taskConfig.triage.model` | Model for triage agents |
+ | `taskConfig.verify.model` | Model for verification agents |
+ | `taskConfig.poc-harness.model` | Model for PoC harness generation |
+
+ Global config: `~/.kuzushi/config.json`. Project overrides: `<repo>/.kuzushi/config.json`. CLI flags override config.

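The removed overrides table documents fallback chains such as `taskConfig.poc-harness.model` → `taskConfig.verify.model` → `model`. A sketch of that resolution order; `resolve_model` is a hypothetical helper for illustration, not part of Kuzushi's API:

```python
def resolve_model(config, chain):
    """Return the first configured model along a fallback chain.

    Chain entries name taskConfig blocks; the sentinel "model" means
    the top-level default model.
    """
    for key in chain:
        if key == "model":
            value = config.get("model")
        else:
            value = config.get("taskConfig", {}).get(key, {}).get("model")
        if value:
            return value
    return None

config = {
    "model": "anthropic:claude-sonnet-4-6",
    "taskConfig": {"verify": {"model": "openai:gpt-4o-mini"}},
}
# poc-harness is unset, so it falls back through verify:
print(resolve_model(config, ["poc-harness", "verify", "model"]))  # openai:gpt-4o-mini
# triage is unset, so it falls back to the top-level model:
print(resolve_model(config, ["triage", "model"]))  # anthropic:claude-sonnet-4-6
```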
 ### Environment Variables

 | Variable | Required | Description |
 | --- | --- | --- |
- | `ANTHROPIC_API_KEY` | When using `anthropic:*` models | Anthropic API key for pi-ai backend |
- | `OPENAI_API_KEY` | When using `openai:*` models | OpenAI API key for pi-ai backend |
- | `GEMINI_API_KEY` / `GOOGLE_API_KEY` | When using `google:*` models | Google API key for pi-ai backend |
+ | `ANTHROPIC_API_KEY` | When using `anthropic:*` models | Anthropic API key |
+ | `OPENAI_API_KEY` | When using `openai:*` models | OpenAI API key |
+ | `GEMINI_API_KEY` / `GOOGLE_API_KEY` | When using `google:*` models | Google API key |

 </details>

 <details>
 <summary><strong>CodeQL Setup</strong></summary>

- The `codeql` scanner requires the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases) to be installed separately. Unlike Semgrep, it is **not auto-downloaded** (the CLI is ~500 MB and requires accepting GitHub's license).
-
- Install it:
+ The `codeql` scanner requires the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases) installed separately (not auto-downloaded).

 ```sh
 # Via GitHub CLI (recommended):
@@ -480,175 +353,20 @@ gh extension install github/gh-codeql && gh codeql install-stub
 # https://github.com/github/codeql-cli-binaries/releases
 ```

- Kuzushi finds the CodeQL binary in this order:
-
- 1. `codeql` on your PATH
- 2. Previously placed binary at `~/.kuzushi/bin/codeql`
- 3. Fails with install instructions if not found
-
- CodeQL is **opt-in** — the default scanner list is `["semgrep"]`. To enable it:
-
- ```sh
- kuzushi <repo> --scanners codeql # CodeQL only
- kuzushi <repo> --scanners semgrep,codeql # both scanners
- kuzushi config set scanners semgrep,codeql # persist as default
- ```
-
- CodeQL builds a database from your source code before running queries. You can skip this step by pointing to a pre-built database:
-
- ```sh
- kuzushi config set scannerConfig.codeql.dbPath ./codeql-db
- ```
-
- </details>
-
- <details>
- <summary><strong>Taint Analysis Setup</strong></summary>
-
- The `taint-analysis` scanner is a multi-pass CodeQL-based pipeline that uses LLM-assisted classification to label sources, sinks, sanitizers, and summaries. It requires:
-
- 1. **CodeQL CLI** — same requirement as the `codeql` scanner
- 2. **Python 3** — used by taint analysis scripts for query generation
-
- Taint analysis templates, references, and scripts are bundled as the [`@kuzushi/augur`](https://www.npmjs.com/package/@kuzushi/augur) npm package and installed automatically with `pnpm install`. No manual clone or `TAINT_ANALYSIS_PATH` setup needed.
-
- ```sh
- kuzushi <repo> --scanners taint-analysis
- kuzushi config set scannerConfig["taint-analysis"].labelingModel anthropic:claude-sonnet-4-6
- kuzushi config set scannerConfig["taint-analysis"].passes "[1,2,3,4,5,6]"
- ```
-
- To override the bundled taint-analysis assets (e.g., for local development), set `TAINT_ANALYSIS_PATH` or `scannerConfig["taint-analysis"].taintAnalysisPath`:
-
- ```sh
- export TAINT_ANALYSIS_PATH=/path/to/local/taint-analysis
- kuzushi config set scannerConfig["taint-analysis"].taintAnalysisPath /path/to/local/taint-analysis
- ```
-
- Taint analysis runs in three DAG-ordered stages: **preflight** (database creation, candidate extraction), **label** (LLM classification), and **analyze** (library generation, query execution, finding extraction).
-
- ### Taint Analysis TI + Artifact Outputs
-
- Each taint analysis run emits interoperability artifacts under the workspace (`scannerConfig["taint-analysis"].workspaceDir`, default `./iris`) and run directory:
-
- - `iris/exploration/TI_PRIOR.md` and `iris/exploration/ti_prior.json` — live TI prior (CISA KEV + NVD) with degraded-mode metadata when fetches fail
- - `iris/labels/TAINT_MODEL.json` — per-CWE taint model (`sources/sinks/sanitizers/propagators`) with TI-weighted basis
- - `iris/results/findings.raw.json` — normalized raw findings aggregate from taint analysis pass SARIF outputs
- - `.kuzushi/runs/<runId>/findings.triaged.json` — triaged findings export including optional taint analysis source/sink triage details
-
- Relevant `scannerConfig["taint-analysis"]` options:
-
- - `tiMode`: `"live-required"` (default)
- - `tiFailurePolicy`: `"continue_without_ti"` (default)
- - `tiTimeoutMs`: live TI fetch timeout in milliseconds
- - `refinementEnabled`: enable one post-triage refinement loop (default `false`)
- - `refinementIterations`: max refinement passes when enabled (default `1`)
- - `refinementDeltaOnly`: triage only changed findings after refinement (default `true`)
- - `refinementModel`: optional model override for refinement stage wiring
-
- </details>
-
- <details>
- <summary><strong>Agent Runtime Backends</strong></summary>
-
- Kuzushi supports two agent runtime backends:
-
- **Claude (default)** — Uses `@anthropic-ai/claude-agent-sdk` to spawn Claude Code subprocesses with built-in tool implementations (Read, Glob, Grep, Bash, etc.). Supports session reuse: batch operations keep a single subprocess alive across multiple turns via the SDK's streaming input API, reducing subprocess spawns by ~99%. Requires `ANTHROPIC_API_KEY`.
-
- **Pi-AI** — Uses `@mariozechner/pi-ai` to provide vendor-agnostic LLM access. It supports 15+ providers (Anthropic, OpenAI, Google, Groq, Mistral, etc.) through a single interface. All LLM calls run in-process (no subprocesses).
-
- Kuzushi implements an internal agentic loop on top of pi-ai:
-
- 1. **Tool-calling loop** — call model, parse tool calls, execute tools, feed results back, repeat until stop or max turns
- 2. **Local tool implementations** — Read (file reader with line numbers), Glob (Node 22+ `globSync`), Grep (regex search across files)
- 3. **Structured output** — system prompt injection + post-hoc JSON extraction from fenced code blocks or raw text
- 4. **Safety controls** — max turns, budget enforcement, abort signal, permission gating via `canUseTool`
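The four-step loop described in the removed list is the classic tool-calling pattern. A minimal sketch with a stubbed model; the message and tool-call shapes here are invented for illustration and are not pi-ai's actual types:

```python
def agent_loop(call_model, tools, prompt, max_turns=5):
    """Minimal tool-calling loop: call model, run requested tools, feed results back."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = call_model(messages)
        if "tool" not in reply:  # no tool call means a final answer
            return reply["content"]
        result = tools[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    return None  # max turns exhausted (a safety control)

# Stub model: asks for one file read, then answers.
def fake_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool": "read", "args": "app.js"}
    return {"content": "no injection found"}

tools = {"read": lambda path: f"contents of {path}"}
print(agent_loop(fake_model, tools, "triage finding"))  # no injection found
```

The `max_turns` bound is what keeps a looping model from running forever, matching the safety controls listed above.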
+ CodeQL is opt-in. Enable it:

 ```sh
- # Use with any supported provider:
- OPENAI_API_KEY=... kuzushi <repo> --model openai:gpt-4o
- GEMINI_API_KEY=... kuzushi <repo> --model google:gemini-2.0-flash
- ANTHROPIC_API_KEY=... kuzushi <repo> --model anthropic:claude-sonnet-4-6
+ kuzushi scan <repo> --tasks codeql
+ kuzushi scan <repo> --tasks semgrep,codeql
+ kuzushi config set tasks semgrep,codeql
 ```

 </details>

- <details>
- <summary><strong>Architecture</strong></summary>
-
- Kuzushi is built on three core abstractions:
-
- **Message Bus** — A transport-agnostic `MessageBus` interface (`publish`, `subscribe`, `waitFor`) that decouples pipeline stages. The stable build supports the in-process `EventEmitter` transport today.
-
- **AgentTask + DAG** — Every unit of work (context gatherer, scanner, future threat modeler, etc.) implements the `AgentTask` interface: an `id`, `dependsOn` list, `outputKind`, and a `run()` method. The `TaskRegistry` resolves enabled tasks into a DAG, groups them into parallel stages, detects cycles, and hands execution to the `PipelineOrchestrator`. Upstream task outputs are forwarded to dependents automatically.
-
- **Pipeline Phases** — After the DAG completes, the orchestrator drives sequential phases: triage (classify findings), verification (construct PoC exploits), patch synthesis (auto-generate and re-verify fixes), and report (display results + optional SSE streaming). Each phase has its own concurrency control, cost tracking, and checkpoint support.
-
- **Strategy Framework** — The multi-strategy system wraps detection tasks with multiple analytical approaches (syntactic, dataflow, reasoning, execution) that run in parallel or adaptively, merging results with corroboration-based confidence boosting.
-
- **Code Graph** — A persistent SQLite-backed graph of code paths from entry points to sinks, built from static import analysis and LLM-assisted tracing. Injected into triage prompts for deeper reasoning about reachability and sanitization.
-
- **Session Reuse** — The Claude runtime uses the Agent SDK's `AsyncIterable<SDKUserMessage>` prompt to keep a single subprocess alive across multiple turns. Batch operations (taint labeling, triage, verification, rescoring, PoC generation, patch synthesis) write per-batch data to `.kuzushi/batches/` files and send the subprocess a prompt to `Read` each file. This reduces worst-case subprocess spawns from ~3,100 to ~24 per pipeline run. Runtimes without `createSession` support (pi-ai) fall back to one subprocess per call automatically.
-
- Existing `ScannerPlugin` implementations (Semgrep, Agentic) are adapted into `AgentTask` via `adaptScannerPlugin()`, so the scanner plugin API remains stable.
-
- See [AGENTS.md](AGENTS.md) for the full developer guide on adding new agent tasks.
-
- ### Package Surface
-
- Kuzushi is published as a CLI-first package. The supported npm surface is the executable plus the root package export; internal modules under `dist/*` and `src/*` are not a stable API contract and may change between releases.
-
- Release builds are expected to come from a clean compile into `dist/`. `pnpm build` now cleans `dist/` first, `prepack` rebuilds automatically, and `pnpm verify:pack` runs `npm pack --dry-run` so stale artifacts do not get published.
-
- </details>
-
- ## Output
-
- Results are stored in SQLite at `<repo>/.kuzushi/findings.sqlite3`. Export to any format:
-
- ```sh
- kuzushi <repo> --output report.md # Markdown
- kuzushi <repo> --sarif results.sarif # SARIF v2.1.0 (GitHub Code Scanning compatible)
- kuzushi <repo> --json results.json # JSON
- kuzushi <repo> --csv results.csv # CSV
- kuzushi <repo> --jsonl results.jsonl # JSONL
- ```
-
- ## Development
-
- ```
- pnpm install # install deps
- pnpm dev -- /path/to/repo # run in dev mode
- pnpm check:types # typecheck app + benchmark tooling
- pnpm typecheck # type check
- pnpm test # run tests
- pnpm test:e2e # deterministic mock-backed smoke scan against fixture app
- pnpm test:coverage # tests + coverage (70% threshold)
- pnpm build # compile to dist/
- pnpm verify:pack # verify published tarball contents
- pnpm benchmark # run benchmark suite against govwa dataset
- pnpm benchmark:freeze # freeze current benchmark results as baseline
- pnpm benchmark:diff # diff current results against frozen baseline
- pnpm benchmark:regression # CI regression check against baseline
- ```
-
- `pnpm test:e2e` and the benchmark regression workflow use `tests/fixtures/mock-anthropic-server.mjs` for deterministic mock-backed coverage. They are useful smoke/regression checks, but they are not real LLM integration tests.
-
- ## Troubleshooting
-
- - **"Error: missing API credentials for selected model provider(s)."**: Set the provider env var(s) for your selected models (for example `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`)
- - **"No findings from scanner. Code looks clean."**: Your code is clean, or try `--severity ERROR,WARNING,INFO` to include lower-severity rules
- - **Scan interrupted**: Re-run the same command (already-triaged findings are skipped), or use `--resume` to continue from the exact checkpoint
- - **Wrong model**: `kuzushi config set model anthropic:claude-sonnet-4-6` or pass `--model` per-scan
- - **Scanner download fails**: Install Opengrep or Semgrep manually, ensure it's on your PATH
- - **High triage cost**: Use `--triage-model openai:gpt-4o-mini` for cheaper triage, or `--max 10` to limit findings
- - **Verification too expensive**: Use `--verify-min-confidence 0.8` to only verify high-confidence TPs, or `--verify-model openai:gpt-4o-mini`
- - **pi-ai model not found**: Ensure the model string uses `provider:modelId` format (e.g., `openai:gpt-4o`, not just `gpt-4o`)
-
- ## Contributing
+ ## Architecture

- See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
+ See [VISION.md](VISION.md) for the full architecture vision, module system design, workspace/knowledge graph, intel layer, governance model, and implementation roadmap.

 ## License

- [MIT](LICENSE)
+ MIT