@optave/codegraph 2.2.0 → 2.2.2-dev.c252ef9

package/README.md CHANGED
@@ -5,7 +5,7 @@
5
5
  <h1 align="center">codegraph</h1>
6
6
 
7
7
  <p align="center">
8
- <strong>Always-fresh code intelligence for AI agents: sub-second incremental rebuilds, zero-cost by default, optionally enhanced with your LLM.</strong>
8
+ <strong>Give your AI the map before it starts exploring.</strong>
9
9
  </p>
10
10
 
11
11
  <p align="center">
@@ -13,63 +13,64 @@
13
13
  <a href="https://github.com/optave/codegraph/blob/main/LICENSE"><img src="https://img.shields.io/github/license/optave/codegraph?style=flat-square&logo=opensourceinitiative&logoColor=white" alt="Apache-2.0 License" /></a>
14
14
  <a href="https://github.com/optave/codegraph/actions"><img src="https://img.shields.io/github/actions/workflow/status/optave/codegraph/codegraph-impact.yml?style=flat-square&logo=githubactions&logoColor=white&label=CI" alt="CI" /></a>
15
15
  <img src="https://img.shields.io/badge/node-%3E%3D20-339933?style=flat-square&logo=node.js&logoColor=white" alt="Node >= 20" />
16
- <img src="https://img.shields.io/badge/graph-always%20fresh-brightgreen?style=flat-square&logo=shield&logoColor=white" alt="Always Fresh" />
17
16
  </p>
18
17
 
19
18
  <p align="center">
20
- <a href="#-why-codegraph">Why codegraph?</a>
21
- <a href="#-quick-start">Quick Start</a>
22
- <a href="#-features">Features</a>
23
- <a href="#-commands">Commands</a>
24
- <a href="#-language-support">Languages</a>
25
- <a href="#-ai-agent-integration">AI Integration</a>
26
- <a href="#-recommended-practices">Practices</a>
27
- <a href="#-ci--github-actions">CI/CD</a>
28
- <a href="#-roadmap">Roadmap</a>
29
- <a href="#-contributing">Contributing</a>
19
+ <a href="#the-problem">The Problem</a> &middot;
20
+ <a href="#what-codegraph-does">What It Does</a> &middot;
21
+ <a href="#-quick-start">Quick Start</a> &middot;
22
+ <a href="#-commands">Commands</a> &middot;
23
+ <a href="#-language-support">Languages</a> &middot;
24
+ <a href="#-ai-agent-integration">AI Integration</a> &middot;
25
+ <a href="#-how-it-works">How It Works</a> &middot;
26
+ <a href="#-recommended-practices">Practices</a> &middot;
27
+ <a href="#-roadmap">Roadmap</a>
30
28
  </p>
31
29
 
32
30
  ---
33
31
 
34
- > **The code graph that keeps up with your commits.**
35
- >
36
- > Codegraph parses your codebase with [tree-sitter](https://tree-sitter.github.io/) (native Rust or WASM), builds a function-level dependency graph in SQLite, and keeps it current with sub-second incremental rebuilds. Every query runs locally — no API keys, no Docker, no setup. When you want deeper intelligence, bring your own LLM provider and codegraph enhances search and analysis through the same API you already use. Your code only goes where you choose to send it.
32
+ ## The Problem
37
33
 
38
- ---
34
+ AI coding assistants are incredible — until your codebase gets big enough. Then they get lost.
39
35
 
40
- ## 🔄 Why most code graph tools can't keep up with your commits
36
+ On a large codebase, a great portion of your AI budget isn't going toward solving tasks. It's going toward the AI re-orienting itself in your code. Every session. Over and over. It burns tokens on tool calls — `grep`, `find`, `cat` — just to figure out what calls what. It loses context. It hallucinates dependencies. It modifies a function without realizing 14 callers across 9 files depend on it.
41
37
 
42
- If you use a code graph with an AI agent, the graph needs to be **current**. A stale graph gives the agent wrong answers — deleted functions still show up, new dependencies are invisible, impact analysis misses the code you just wrote. The graph should rebuild on every commit, ideally on every save.
38
+ When the AI catches these mistakes, you waste time and tokens on corrections. When it doesn't catch them, your codebase starts degrading with silent bugs until things stop working.
43
39
 
44
- Most tools in this space can't do that:
40
+ And when you hit `/clear` or run out of context? It starts from scratch.
45
41
 
46
- | Problem | Who has it | Why it breaks on every commit |
47
- |---|---|---|
48
- | **Full re-index on every change** | code-graph-rag, CodeMCP, axon, joern, cpg, GitNexus | No file-level change tracking. Change one file → re-parse and re-insert the entire codebase. On a 3,000-file project, that's 30+ seconds per commit minimum |
49
- | **Cloud API calls baked into the pipeline** | code-graph-rag, CodeRAG | Embeddings are generated through cloud APIs (OpenAI, Voyage AI, Gemini). Every rebuild = API round-trips for every function. Slow, expensive, and rate-limited. You can't put this in a commit hook |
50
- | **Heavy infrastructure that's slow to restart** | code-graph-rag (Memgraph), axon (KuzuDB), badger-graph (Dgraph) | External databases add latency to every write. Bulk-inserting a full graph into Memgraph is not a sub-second operation |
51
- | **No persistence between runs** | pyan, cflow | Re-parse from scratch every time. No database, no delta, no incremental anything |
42
+ ## What Codegraph Does
52
43
 
53
- **Codegraph solves this with three-tier incremental change detection:**
44
+ Codegraph gives your AI a pre-built, always-current map of your entire codebase — every function, every caller, every dependency — so it stops guessing and starts knowing.
54
45
 
55
- 1. **Tier 0 - Journal (O(changed)):** If `codegraph watch` was running, a change journal records exactly which files were touched. The next build reads the journal and only processes those files, with zero filesystem scanning
56
- 2. **Tier 1 — mtime+size (O(n) stats, O(changed) reads):** No journal? Codegraph stats every file and compares mtime + size against stored values. Matching files are skipped without reading a single byte — 10-100x cheaper than hashing
57
- 3. **Tier 2 — Hash (O(changed) reads):** Files that fail the mtime/size check are read and MD5-hashed. Only files whose hash actually changed get re-parsed and re-inserted
46
+ It parses your code with [tree-sitter](https://tree-sitter.github.io/) (native Rust or WASM), builds a function-level dependency graph in SQLite, and keeps it current with sub-second incremental rebuilds. Your AI gets answers like _"this function has 14 callers across 9 files"_ instantly, instead of spending 30 tool calls to maybe discover half of them.
58
47
 
59
- **Result:** change one file in a 3,000-file project → rebuild completes in **under a second**. With watch mode active, rebuilds are near-instant: the journal makes the build proportional to the number of changed files, not the size of the codebase. Put it in a commit hook, a file watcher, or let your AI agent trigger it. The graph is always current.
48
+ **Free. Open source. Fully local.** Zero network calls, zero telemetry. Your code stays on your machine. When you want deeper intelligence, bring your own LLM provider; your code only goes where you choose to send it.
60
49
 
61
- And because the core pipeline is pure local computation (tree-sitter + SQLite), there are no API calls, no network latency, and no cost. LLM-powered features (semantic search, richer embeddings) are a separate optional layer — they enhance the graph but never block it from being current.
50
+ **Three commands to get started:**
62
51
 
63
- ---
52
+ ```bash
53
+ npm install -g @optave/codegraph
54
+ cd your-project
55
+ codegraph build
56
+ ```
64
57
 
65
- ## 💡 Why codegraph?
58
+ That's it. No config files, no Docker, no JVM, no API keys, no accounts. The graph is ready to query. Add `codegraph mcp` to your AI agent's config and it has full access to your dependency graph through 17 MCP tools.
66
59
 
67
- <sub>Comparison last verified: February 2026</sub>
60
+ ### Why it matters
68
61
 
69
- Most code graph tools make you choose: **fast local analysis with no AI, or powerful AI features that require full re-indexing through cloud APIs on every change.** Codegraph gives you both — a graph that rebuilds in milliseconds on every commit, with optional LLM enhancement through the provider you're already using.
62
+ | Without codegraph | With codegraph |
63
+ |---|---|
64
+ | AI spends 20+ tool calls per session re-discovering your code structure | AI gets full dependency context in one call |
65
+ | Modifies `parseConfig()` without knowing 9 files import it | `fn-impact parseConfig` shows every caller before the edit |
66
+ | Hallucinates that `auth.js` imports from `db.js` | `deps src/auth.js` shows the real import graph |
67
+ | After `/clear`, starts from scratch | Graph persists — next session picks up where this one left off |
68
+ | Suggests renaming a function, breaks 14 call sites silently | `diff-impact --staged` catches the breakage before you commit |
70
69
 
71
70
  ### Feature comparison
72
71
 
72
+ <sub>Comparison last verified: February 2026</sub>
73
+
73
74
  | Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | [axon](https://github.com/harshkedia177/axon) |
74
75
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
75
76
  | Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
@@ -90,82 +91,22 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
90
91
 
91
92
  | | Differentiator | In practice |
92
93
  |---|---|---|
93
- | **⚡** | **Always-fresh graph** | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds even on large codebases. Competitors re-index everything from scratch; Merkle-tree approaches still require O(n) filesystem scanning |
94
- | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider for richer embeddings and AI-powered search — your code only goes to the provider you already chose |
94
+ | **⚡** | **Always-fresh graph** | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds even on large codebases |
95
+ | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider — your code only goes where you choose |
95
96
  | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |
96
- | **🤖** | **Built for AI agents** | 17-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default, your code doesn't leak to other projects |
97
- | **🌐** | **Multi-language, one CLI** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + HCL in a single graph — no juggling Madge, pyan, and cflow |
97
+ | **🤖** | **Built for AI agents** | 17-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default |
98
+ | **🌐** | **Multi-language, one CLI** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + HCL in a single graph |
98
99
  | **💥** | **Git diff impact** | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — ships with a GitHub Actions workflow |
99
- | **🧠** | **Semantic search** | Local embeddings by default, LLM-powered embeddings when opted in — multi-query with RRF ranking via `"auth; token; JWT"` |
100
-
101
- ### How other tools compare
102
-
103
- The key question is: **can you rebuild your graph on every commit in a large codebase without it costing money or taking minutes?** Most tools in this space either re-index everything from scratch (slow), require cloud API calls for core features (costly), or both. Codegraph's three-tier incremental detection achieves true O(changed) in the best case — when the watcher is running, rebuilds are proportional only to the number of files that changed, not the size of the codebase. The core pipeline needs no API keys at all. LLM-powered features are opt-in, using whichever provider you already work with.
104
-
105
- | Tool | What it does well | The tradeoff |
106
- |---|---|---|
107
- | [joern](https://github.com/joernio/joern) | Full CPG (AST + CFG + PDG) for vulnerability discovery, Scala query DSL, 14 languages, daily releases | No incremental builds — full re-parse on every change. Requires JDK 21, no built-in MCP, no watch mode |
108
- | [narsil-mcp](https://github.com/postrv/narsil-mcp) | 90 MCP tools, 32 languages, taint analysis, SBOM, dead code, neural search, Merkle-tree incremental indexing, single ~30MB binary | Merkle trees still require O(n) filesystem scanning on every rebuild. Primarily MCP-only — no standalone CLI query interface. Neural search requires API key or ONNX source build |
109
- | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | Graph RAG with Memgraph, multi-provider AI, semantic search, code editing via AST | No incremental rebuilds — full re-index + re-embed through cloud APIs on every change. Requires Docker |
110
- | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | Formal Code Property Graph (AST + CFG + PDG + DFG), ~10 languages, MCP module, LLVM IR support, academic specifications | No incremental builds. Requires JVM + Gradle, no zero config, no watch mode |
111
- | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | Knowledge graph with precomputed structural intelligence, 7 MCP tools, hybrid search (BM25 + semantic + RRF), clustering, process tracing | Full 6-phase pipeline re-run on changes. KuzuDB graph DB, browser mode limited to ~5,000 files. **PolyForm NC — no commercial use** |
112
- | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | SCIP compiler-grade indexing, compound operations (83% token savings), secret scanning | No incremental builds. Custom license, requires SCIP toolchains per language |
113
- | [axon](https://github.com/harshkedia177/axon) | 11-phase pipeline, KuzuDB, community detection, dead code, change coupling | Full pipeline re-run on changes. No license, Python-only, no MCP |
114
- | [Madge](https://github.com/pahen/madge) | Simple file-level JS/TS dependency graphs | No function-level analysis, no impact tracing, JS/TS only |
115
- | [dependency-cruiser](https://github.com/sverweij/dependency-cruiser) | Architectural rule validation for JS/TS | Module-level only (function-level explicitly out of scope), requires config |
116
- | [Nx graph](https://nx.dev/) | Monorepo project-level dependency graph | Requires Nx workspace, project-level only (not file or function) |
117
- | [pyan](https://github.com/Technologicat/pyan) / [cflow](https://www.gnu.org/software/cflow/) | Function-level call graphs | Single-language each (Python / C only), no persistence, no queries |
118
-
119
- ### Codegraph vs. Narsil-MCP: How to Decide
120
-
121
- If you are looking for local code intelligence over MCP, the closest alternative to `codegraph` is [postrv/narsil-mcp](https://github.com/postrv/narsil-mcp). Both projects aim to give AI agents deep context about your codebase, but they approach the problem with fundamentally different philosophies.
122
-
123
- Here is a cold, analytical breakdown to help you decide which tool fits your workflow.
124
-
125
- #### The Core Difference
126
-
127
- * **Codegraph is a surgical scalpel.** It does one thing exceptionally well: building an always-fresh, function-level dependency graph in SQLite and exposing it to AI agents with zero fluff.
128
- * **Narsil-MCP is a Swiss Army knife.** It is a sprawling, "batteries-included" intelligence server that includes everything from taint analysis and SBOM generation to SPARQL knowledge graphs.
129
-
130
- #### Feature Comparison
131
-
132
- | Aspect | Optave Codegraph | Narsil-MCP |
133
- | :--- | :--- | :--- |
134
- | **Philosophy** | Lean, deterministic, AI-optimized | Comprehensive, feature-dense |
135
- | **AI Tool Count** | 17 focused tools | 90 distinct tools |
136
- | **Language Support** | 11 languages | 32 languages |
137
- | **Primary Interface** | CLI-first with MCP integration | MCP-first (CLI is secondary) |
138
- | **Supply Chain Risk** | Low (minimal dependency tree) | Higher (requires massive dependency graph for embedded ML/scanners) |
139
- | **Graph Updates** | **Three-tier O(changed)** — journal → mtime+size → hash. With watch mode, only changed files are touched | Merkle trees — O(n) filesystem scan on every rebuild to recompute tree hashes |
140
-
141
- #### Choose Codegraph if:
142
-
143
- * **You need the fastest possible incremental rebuilds.** Codegraph’s three-tier change detection (journal → mtime+size → hash) achieves true O(changed) when the watcher is running — only touched files are processed. Narsil’s Merkle trees still require O(n) filesystem scanning to recompute hashes on every rebuild, even when nothing changed. On a 3,000-file project, this is the difference between near-instant and noticeable.
144
- * **You want to optimize AI agent reasoning.** Large Language Models degrade in performance and hallucinate when overwhelmed with choices. Codegraph’s tight 17-tool surface area ensures agents quickly understand their capabilities without wasting context window tokens.
145
- * **You are concerned about supply chain attacks.** To support 90 tools, SBOMs, and neural embeddings, a tool must pull in a massive dependency tree. Codegraph keeps its dependencies minimal, dramatically reducing the risk of malicious code sneaking onto your machine.
146
- * **You want deterministic blast-radius checks.** Features like `diff-impact` are built specifically to tell you exactly how a changed function cascades through your codebase before you merge a PR.
147
- * **You value a strong standalone CLI.** You want to query your code graph locally without necessarily spinning up an AI agent.
148
-
149
- #### Choose Narsil-MCP if:
150
-
151
- * **You want security and code intelligence together.** You don't want a separate MCP for security and prefer an all-in-one solution.
152
- * **You use niche languages.** Your codebase relies heavily on languages outside of Codegraph's core 11 (e.g., Fortran, Erlang, Zig, Swift).
153
- * **You are willing to manage tool presets.** Because 90 tools will overload an AI's context window, you don't mind manually configuring preset files (like "Minimal" or "Balanced") to restrict what the AI can see depending on your editor.
100
+ | **🧠** | **Semantic search** | Local embeddings by default, LLM-powered when opted in — multi-query with RRF ranking via `"auth; token; JWT"` |
154
101
 
155
102
  ---
156
103
 
157
104
  ## 🚀 Quick Start
158
105
 
159
106
  ```bash
160
- # Install from npm
107
+ # Install
161
108
  npm install -g @optave/codegraph
162
109
 
163
- # Or install from source
164
- git clone https://github.com/optave/codegraph.git
165
- cd codegraph
166
- npm install
167
- npm link
168
-
169
110
  # Build a graph for any project
170
111
  cd your-project
171
112
  codegraph build # → .codegraph/graph.db created
@@ -176,22 +117,56 @@ codegraph query myFunc # find any function, see callers & callees
176
117
  codegraph deps src/index.ts # file-level import/export map
177
118
  ```
178
119
 
120
+ Or install from source:
121
+
122
+ ```bash
123
+ git clone https://github.com/optave/codegraph.git
124
+ cd codegraph && npm install && npm link
125
+ ```
126
+
127
+ ### For AI agents
128
+
129
+ Add codegraph to your agent's instructions (e.g. `CLAUDE.md`):
130
+
131
+ ```markdown
132
+ Before modifying code, always:
133
+ 1. `codegraph where <name>` — find where the symbol lives
134
+ 2. `codegraph context <name> -T` — get full context (source, deps, callers)
135
+ 3. `codegraph fn-impact <name> -T` — check blast radius before editing
136
+
137
+ After modifying code:
138
+ 4. `codegraph diff-impact --staged -T` — verify impact before committing
139
+ ```
140
+
141
+ Or connect directly via MCP:
142
+
143
+ ```bash
144
+ codegraph mcp # 17-tool MCP server — AI queries the graph directly
145
+ ```
146
+
147
+ Full agent setup: [AI Agent Guide](docs/ai-agent-guide.md) &middot; [CLAUDE.md template](docs/ai-agent-guide.md#claudemd-template)
148
+
149
+ ---
150
+
179
151
  ## ✨ Features
180
152
 
181
153
  | | Feature | Description |
182
154
  |---|---|---|
183
- | 🔍 | **Symbol search** | Find any function, class, or method by name with callers/callees |
155
+ | 🔍 | **Symbol search** | Find any function, class, or method by name, with exact-match priority, relevance scoring, and `--file` and `--kind` filters |
184
156
  | 📁 | **File dependencies** | See what a file imports and what imports it |
185
157
  | 💥 | **Impact analysis** | Trace every file affected by a change (transitive) |
186
- | 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact |
158
+ | 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact with qualified call resolution |
159
+ | 🎯 | **Deep context** | `context` gives AI agents source, deps, callers, signature, and tests for a function in one call; `explain` gives structural summaries of files or functions |
160
+ | 📍 | **Fast lookup** | `where` shows exactly where a symbol is defined and used — minimal, fast |
187
161
  | 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, trace their callers |
188
162
  | 🗺️ | **Module map** | Bird's-eye view of your most-connected files |
163
+ | 🏗️ | **Structure & hotspots** | Directory cohesion scores, fan-in/fan-out hotspot detection, module boundaries |
189
164
  | 🔄 | **Cycle detection** | Find circular dependencies at file or function level |
190
165
  | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export |
191
166
  | 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking |
192
167
  | 👀 | **Watch mode** | Incrementally update the graph as files change |
193
168
  | 🤖 | **MCP server** | 17-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo |
194
- | 🔒 | **Your code, your choice** | Zero-cost core with no API keys. Optionally enhance with your LLM provider; your code only goes where you send it |
169
+ | | **Always fresh** | Three-tier incremental detection: sub-second rebuilds even on large codebases |
195
170
 
196
171
  ## 📦 Commands
197
172
 
@@ -210,7 +185,19 @@ codegraph watch [dir] # Watch for changes, update graph incrementally
210
185
  codegraph query <name> # Find a symbol — shows callers and callees
211
186
  codegraph deps <file> # File imports/exports
212
187
  codegraph map # Top 20 most-connected files
213
- codegraph map -n 50 # Top 50
188
+ codegraph map -n 50 --no-tests # Top 50, excluding test files
189
+ codegraph where <name> # Where is a symbol defined and used?
190
+ codegraph where --file src/db.js # List symbols, imports, exports for a file
191
+ codegraph stats # Graph health: nodes, edges, languages, quality score
192
+ ```
193
+
194
+ ### Deep Context (AI-Optimized)
195
+
196
+ ```bash
197
+ codegraph context <name> # Full context: source, deps, callers, signature, tests
198
+ codegraph context <name> --depth 2 --no-tests # Include callee source 2 levels deep
199
+ codegraph explain <file> # Structural summary: public API, internals, data flow
200
+ codegraph explain <function> # Function summary: signature, calls, callers, tests
214
201
  ```
215
202
 
216
203
  ### Impact Analysis
@@ -223,6 +210,15 @@ codegraph fn-impact <name> # What functions break if this one changes
223
210
  codegraph diff-impact # Impact of unstaged git changes
224
211
  codegraph diff-impact --staged # Impact of staged changes
225
212
  codegraph diff-impact HEAD~3 # Impact vs a specific ref
213
+ codegraph diff-impact main --format mermaid -T # Mermaid flowchart of blast radius
214
+ ```
215
+
216
+ ### Structure & Hotspots
217
+
218
+ ```bash
219
+ codegraph structure # Directory overview with cohesion scores
220
+ codegraph hotspots # Files with extreme fan-in, fan-out, or density
221
+ codegraph hotspots --metric coupling --level directory --no-tests
226
222
  ```
227
223
 
228
224
  ### Export & Visualization
@@ -238,10 +234,10 @@ codegraph cycles --functions # Function-level cycles
238
234
 
239
235
  ### Semantic Search
240
236
 
241
- Codegraph can build local embeddings for every function, method, and class, then search them by natural language. Everything runs locally using [@huggingface/transformers](https://huggingface.co/docs/transformers.js) — no API keys needed.
237
+ Local embeddings for every function, method, and class, searchable by natural language. Everything runs locally using [@huggingface/transformers](https://huggingface.co/docs/transformers.js), with no API keys needed.
242
238
 
243
239
  ```bash
244
- codegraph embed # Build embeddings (default: minilm)
240
+ codegraph embed # Build embeddings (default: nomic-v1.5)
245
241
  codegraph embed --model nomic # Use a different model
246
242
  codegraph search "handle authentication"
247
243
  codegraph search "parse config" --min-score 0.4 -n 10
@@ -268,9 +264,9 @@ A single trailing semicolon is ignored (falls back to single-query mode). The `-
268
264
  | `minilm` | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
269
265
  | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
270
266
  | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
271
- | `jina-code` (default) | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
267
+ | `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text (requires HF token) |
272
268
  | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
273
- | `nomic-v1.5` | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | Improved nomic, Matryoshka dimensions |
269
+ | `nomic-v1.5` (default) | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | **Improved nomic, Matryoshka dimensions** |
274
270
  | `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |
275
271
 
276
272
  The model used during `embed` is stored in the database, so `search` auto-detects it — no need to pass `--model` when searching.
@@ -289,28 +285,18 @@ codegraph registry remove <name> # Unregister
289
285
 
290
286
  `codegraph build` auto-registers the project — no manual setup needed.
291
287
 
292
- ### AI Integration
293
-
294
- ```bash
295
- codegraph mcp # Start MCP server (single-repo, current project only)
296
- codegraph mcp --multi-repo # Enable access to all registered repos
297
- codegraph mcp --repos a,b # Restrict to specific repos (implies --multi-repo)
298
- ```
299
-
300
- By default, the MCP server only exposes the local project's graph. AI agents cannot access other repositories unless you explicitly opt in with `--multi-repo` or `--repos`.
301
-
302
288
  ### Common Flags
303
289
 
304
290
  | Flag | Description |
305
291
  |---|---|
306
292
  | `-d, --db <path>` | Custom path to `graph.db` |
307
- | `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files |
293
+ | `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `deps`, `impact`) |
308
294
  | `--depth <n>` | Transitive trace depth (default varies by command) |
309
295
  | `-j, --json` | Output as JSON |
310
296
  | `-v, --verbose` | Enable debug output |
311
297
  | `--engine <engine>` | Parser engine: `native`, `wasm`, or `auto` (default: `auto`) |
312
- | `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class`, `struct`, `enum`, `trait`, `record`, `module` (search) |
313
- | `--file <pattern>` | Filter by file path pattern (search) |
298
+ | `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class`, `struct`, `enum`, `trait`, `record`, `module` (`fn`, `context`, `search`) |
299
+ | `-f, --file <path>` | Scope to a specific file (`fn`, `context`, `where`) |
314
300
  | `--rrf-k <n>` | RRF smoothing constant for multi-query search (default 60) |
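The `--rrf-k` constant refers to Reciprocal Rank Fusion, which merges the ranked results of several sub-queries (for example `"auth; token; JWT"`) into one list. The sketch below shows the standard RRF formula with `k = 60`; the function name `rrfFuse`, the 1-based rank convention, and the sample symbol lists are assumptions for illustration, not codegraph's actual implementation.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank is 1-based. Illustrative sketch only, not codegraph's own code.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so the rank is index + 1
      scores.set(id, (scores.get(id) || 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id, score]) => ({ id, score }));
}

// One ranked list per sub-query (hypothetical symbol names).
const fused = rrfFuse([
  ['validateToken', 'handleAuth', 'decryptJWT'],
  ['validateToken', 'decryptJWT'],
  ['decryptJWT', 'validateToken'],
]);
console.log(fused.map((r) => r.id));
```

A larger `k` flattens the difference between adjacent ranks, so symbols that appear in many lists gain weight relative to symbols that rank first in only one.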
315
301
 
316
302
  ## 🌐 Language Support
@@ -348,6 +334,16 @@ By default, the MCP server only exposes the local project's graph. AI agents can
348
334
  4. **Store** — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries
349
335
  5. **Query** — All queries run locally against the SQLite DB — typically under 100ms
350
336
 
337
+ ### Incremental Rebuilds
338
+
339
+ The graph stays current without re-parsing your entire codebase. Three-tier change detection ensures rebuilds are proportional to what changed, not the size of the project:
340
+
341
+ 1. **Tier 0 — Journal (O(changed)):** If `codegraph watch` was running, a change journal records exactly which files were touched. The next build reads the journal and only processes those files — zero filesystem scanning
342
+ 2. **Tier 1 — mtime+size (O(n) stats, O(changed) reads):** No journal? Codegraph stats every file and compares mtime + size against stored values. Matching files are skipped without reading a single byte
343
+ 3. **Tier 2 — Hash (O(changed) reads):** Files that fail the mtime/size check are read and MD5-hashed. Only files whose hash actually changed get re-parsed and re-inserted
344
+
345
+ **Result:** change one file in a 3,000-file project and the rebuild completes in under a second. Put it in a commit hook, a file watcher, or let your AI agent trigger it.
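The three tiers above can be sketched in a few lines of Node.js. This is a minimal illustration under stated assumptions, not codegraph's source: the function name `findChangedFiles`, the shape of `stored` (path to `{ mtimeMs, size, hash }`), and the journal being a plain array of paths are all hypothetical.

```javascript
// Hypothetical sketch of three-tier change detection; not codegraph's actual code.
const fs = require('node:fs');
const crypto = require('node:crypto');

// Tier 2 helper: MD5 of the file contents.
function md5(file) {
  return crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex');
}

// stored:  Map<path, { mtimeMs, size, hash }> from the previous build (assumed shape)
// journal: optional array of paths recorded by a watcher (assumed shape)
function findChangedFiles(files, stored, journal = null) {
  // Tier 0: a journal lists the touched files, so no filesystem scan is needed.
  if (journal) return [...new Set(journal)];

  const changed = [];
  for (const file of files) {
    const prev = stored.get(file);
    const stat = fs.statSync(file);
    // Tier 1: same mtime and size means the file is skipped without being read.
    if (prev && prev.mtimeMs === stat.mtimeMs && prev.size === stat.size) continue;
    // Tier 2: hash only the files that failed the cheap check.
    if (!prev || prev.hash !== md5(file)) changed.push(file);
  }
  return changed;
}
```

In this sketch, a file whose timestamp changed but whose content did not (for example after a `touch`) fails Tier 1, passes the Tier 2 hash comparison, and is not re-parsed.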
346
+
351
347
  ### Dual Engine
352
348
 
353
349
  Codegraph ships with two parsing engines:
@@ -361,18 +357,19 @@ Both engines produce identical output. Use `--engine native|wasm|auto` to contro
361
357
 
362
358
  ### Call Resolution
363
359
 
364
- Calls are resolved with priority and confidence scoring:
360
+ Calls are resolved with **qualified resolution** — method calls (`obj.method()`) are distinguished from standalone function calls, and built-in receivers (`console`, `Math`, `JSON`, `Array`, `Promise`, etc.) are filtered out automatically. Import scope is respected: a call to `foo()` only resolves to functions that are actually imported or defined in the same file, eliminating false positives from name collisions.
365
361
 
366
362
  | Priority | Source | Confidence |
367
363
  |---|---|---|
368
364
  | 1 | **Import-aware** — `import { foo } from './bar'` → link to `bar` | `1.0` |
369
365
  | 2 | **Same-file** — definitions in the current file | `1.0` |
370
- | 3 | **Same directory** — definitions in sibling files | `0.7` |
371
- | 4 | **Same parent directory** — definitions in sibling dirs | `0.5` |
372
- | 5 | **Global fallback** — match by name across codebase | `0.3` |
373
- | 6 | **Method hierarchy** — resolved through `extends`/`implements` | — |
366
+ | 3 | **Same directory** — definitions in sibling files (standalone calls only) | `0.7` |
367
+ | 4 | **Same parent directory** — definitions in sibling dirs (standalone calls only) | `0.5` |
368
+ | 5 | **Method hierarchy** — resolved through `extends`/`implements` | varies |
374
369
 
375
- Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.
370
+ Method calls on unknown receivers skip global fallback entirely — `stmt.run()` will never resolve to a standalone `run` function in another file. Duplicate caller/callee edges are deduplicated automatically. Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.
371
+
372
+ Codegraph also extracts symbols from common callback patterns: Commander `.command().action()` callbacks (as `command:build`), Express route handlers (as `route:GET /api/users`), and event emitter listeners (as `event:data`).
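
The priority table above can be read as a small scoring function. The following is only a sketch of that reading, not codegraph's actual resolver: `scoreCandidate` and `dirnameOf` are hypothetical names, paths are assumed to be repo-relative with `/` separators, and the method-hierarchy row is left out because its confidence is listed as "varies".

```javascript
// Hypothetical helper: POSIX-style dirname without importing node:path.
// Returns '' for a path with no directory component.
function dirnameOf(p) {
  const i = p.lastIndexOf('/');
  return i === -1 ? '' : p.slice(0, i);
}

// Returns a confidence for linking a call in `callerFile` to a definition
// in `candidateFile`, or null when none of the table's rules applies.
function scoreCandidate(callerFile, candidateFile, { imported = false, isMethodCall = false } = {}) {
  if (imported) return 1.0; // priority 1: import-aware
  if (callerFile === candidateFile) return 1.0; // priority 2: same file
  if (isMethodCall) return null; // proximity rules are for standalone calls only

  const callerDir = dirnameOf(callerFile);
  const candidateDir = dirnameOf(candidateFile);
  if (callerDir === candidateDir) return 0.7; // priority 3: same directory

  // priority 4: sibling directories under one parent
  const bothNested = callerDir !== '' && candidateDir !== '';
  if (bothNested && dirnameOf(callerDir) === dirnameOf(candidateDir)) return 0.5;

  return null; // no global name-only fallback
}

console.log(scoreCandidate('src/a.js', 'src/b.js')); // 0.7
console.log(scoreCandidate('src/x/a.js', 'src/y/b.js')); // 0.5
console.log(scoreCandidate('src/a.js', 'src/b.js', { isMethodCall: true })); // null
```

Returning `null` instead of a low score mirrors the README's statement that a method call such as `stmt.run()` is never linked to an unrelated standalone `run` function.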
376
373
 
377
374
  ## 📊 Performance
378
375
 
@@ -380,10 +377,10 @@ Self-measured on every release via CI ([full history](generated/BENCHMARKS.md)):
380
377
 
381
378
  | Metric | Latest |
382
379
  |---|---|
383
- | Build speed (native) | **2.5 ms/file** |
384
- | Build speed (WASM) | **5 ms/file** |
380
+ | Build speed (native) | **1.9 ms/file** |
381
+ | Build speed (WASM) | **6.6 ms/file** |
385
382
  | Query time | **1ms** |
386
- | ~50,000 files (est.) | **~125.0s build** |
383
+ | ~50,000 files (est.) | **~95.0s build** |
387
384
 
388
385
  Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files.
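
The ~50,000-file figure in the table appears to be a straight extrapolation of the per-file native build speed. A minimal check of that arithmetic, assuming linear scaling (the function name `estimateBuildSeconds` is illustrative and not part of the package):

```javascript
// Linear extrapolation: per-file cost in milliseconds times file count,
// converted to seconds. Assumes build time scales linearly with file count.
function estimateBuildSeconds(msPerFile, fileCount) {
  return (msPerFile * fileCount) / 1000;
}

// Native engine figure from the table: 1.9 ms/file over 50,000 files.
console.log(estimateBuildSeconds(1.9, 50_000).toFixed(1)); // "95.0"
```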
389
386
 
@@ -395,8 +392,8 @@ Codegraph includes a built-in [Model Context Protocol](https://modelcontextproto
395
392
 
396
393
  ```bash
397
394
  codegraph mcp # Single-repo mode (default) — only local project
398
- codegraph mcp --multi-repo # Multi-repo all registered repos accessible
399
- codegraph mcp --repos a,b # Multi-repo with allowlist
395
+ codegraph mcp --multi-repo # Enable access to all registered repos
396
+ codegraph mcp --repos a,b # Restrict to specific repos (implies --multi-repo)
400
397
  ```
401
398
 
402
399
  **Single-repo mode (default):** Tools operate only on the local `.codegraph/graph.db`. The `repo` parameter and `list_repos` tool are not exposed to the AI agent.
@@ -567,6 +564,23 @@ const { results: fused } = await multiSearchData(
567
564
  - **Dynamic calls are best-effort** — complex computed property access and `eval` patterns are not resolved
568
565
  - **Python imports** — resolves relative imports but doesn't follow `sys.path` or virtual environment packages
569
566
 
567
+ ## 🔍 How Codegraph Compares
568
+
569
+ <sub>Last verified: February 2026. Full analysis: <a href="generated/COMPETITIVE_ANALYSIS.md">COMPETITIVE_ANALYSIS.md</a></sub>
570
+
571
+ | Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) |
572
+ |---|:---:|:---:|:---:|:---:|:---:|:---:|
573
+ | Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
574
+ | Multi-language | **11** | **14** | **32** | Multi | **~10** | **9** |
575
+ | Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | — | — |
576
+ | MCP / AI agent support | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** |
577
+ | Git diff impact | **Yes** | — | — | — | — | **Yes** |
578
+ | Semantic search | **Yes** | — | **Yes** | **Yes** | — | **Yes** |
579
+ | Watch mode | **Yes** | — | **Yes** | — | — | — |
580
+ | Zero config, no Docker/JVM | **Yes** | — | **Yes** | — | — | — |
581
+ | Works without API keys | **Yes** | **Yes** | **Yes** | — | **Yes** | **Yes** |
582
+ | Commercial use (Apache/MIT) | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — |
583
+
570
584
  ## 🗺️ Roadmap
571
585
 
572
586
  See **[ROADMAP.md](ROADMAP.md)** for the full development roadmap. Current plan:
@@ -599,5 +613,5 @@ Looking to add a new language? Check out **[Adding a New Language](docs/adding-a
599
613
  ---
600
614
 
601
615
  <p align="center">
602
- <sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. Your code only goes where you choose to send it.</sub>
616
+ <sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. Your code stays on your machine.</sub>
603
617
  </p>
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@optave/codegraph",
3
- "version": "2.2.0",
3
+ "version": "2.2.2-dev.c252ef9",
4
4
  "description": "Local code graph CLI — parse codebases with tree-sitter, build dependency graphs, query them",
5
5
  "type": "module",
6
6
  "main": "src/index.js",
@@ -61,10 +61,10 @@
61
61
  "optionalDependencies": {
62
62
  "@huggingface/transformers": "^3.8.1",
63
63
  "@modelcontextprotocol/sdk": "^1.0.0",
64
- "@optave/codegraph-darwin-arm64": "2.2.0",
65
- "@optave/codegraph-darwin-x64": "2.2.0",
66
- "@optave/codegraph-linux-x64-gnu": "2.2.0",
67
- "@optave/codegraph-win32-x64-msvc": "2.2.0"
64
+ "@optave/codegraph-darwin-arm64": "2.2.2-dev.c252ef9",
65
+ "@optave/codegraph-darwin-x64": "2.2.2-dev.c252ef9",
66
+ "@optave/codegraph-linux-x64-gnu": "2.2.2-dev.c252ef9",
67
+ "@optave/codegraph-win32-x64-msvc": "2.2.2-dev.c252ef9"
68
68
  },
69
69
  "devDependencies": {
70
70
  "@biomejs/biome": "^2.4.4",
package/src/builder.js CHANGED
@@ -43,8 +43,28 @@ const BUILTIN_RECEIVERS = new Set([
43
43
  'require',
44
44
  ]);
45
45
 
46
- export function collectFiles(dir, files = [], config = {}, directories = null) {
46
+ export function collectFiles(
47
+ dir,
48
+ files = [],
49
+ config = {},
50
+ directories = null,
51
+ _visited = new Set(),
52
+ ) {
47
53
  const trackDirs = directories !== null;
54
+
55
+ // Resolve real path to detect symlink loops
56
+ let realDir;
57
+ try {
58
+ realDir = fs.realpathSync(dir);
59
+ } catch {
60
+ return trackDirs ? { files, directories } : files;
61
+ }
62
+ if (_visited.has(realDir)) {
63
+ warn(`Symlink loop detected, skipping: ${dir}`);
64
+ return trackDirs ? { files, directories } : files;
65
+ }
66
+ _visited.add(realDir);
67
+
48
68
  let entries;
49
69
  try {
50
70
  entries = fs.readdirSync(dir, { withFileTypes: true });
@@ -67,7 +87,7 @@ export function collectFiles(dir, files = [], config = {}, directories = null) {
67
87
 
68
88
  const full = path.join(dir, entry.name);
69
89
  if (entry.isDirectory()) {
70
- collectFiles(full, files, config, directories);
90
+ collectFiles(full, files, config, directories, _visited);
71
91
  } else if (EXTENSIONS.has(path.extname(entry.name))) {
72
92
  files.push(full);
73
93
  hasFiles = true;
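
The `_visited` guard added to `collectFiles` is the usual visited-set pattern keyed on the resolved real path. The sketch below shows the idea on an in-memory tree rather than the real filesystem, so it runs without `node:fs`. The `tree` shape, the `links` map and the `collect` function are all made up for illustration, and the walker is assumed to follow directory symlinks.

```javascript
// Hypothetical in-memory layout: real directory path -> entries.
const tree = {
  '/repo': [{ name: 'a.js' }, { name: 'sub', dir: true }],
  '/repo/sub': [{ name: 'b.js' }, { name: 'loop', dir: true }],
};
// '/repo/sub/loop' is a symlink pointing back at '/repo'.
const links = { '/repo/sub/loop': '/repo' };
const realpath = (p) => links[p] ?? p;

// Depth-first walk that skips any directory whose real path was already seen.
function collect(dir, files = [], visited = new Set()) {
  const real = realpath(dir);
  if (visited.has(real)) return files; // loop (or duplicate) detected, stop here
  visited.add(real);
  for (const entry of tree[real] ?? []) {
    const full = `${dir}/${entry.name}`;
    if (entry.dir) collect(full, files, visited);
    else files.push(full);
  }
  return files;
}

console.log(collect('/repo')); // [ '/repo/a.js', '/repo/sub/b.js' ]
```

Sharing one `visited` set across the whole recursion is what makes the walk terminate; a fresh set per call would only catch a directory that links directly to itself.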
@@ -125,6 +145,28 @@ function fileStat(filePath) {
125
145
  }
126
146
  }
127
147
 
148
+ /**
149
+ * Read a file with retry on transient errors (EBUSY/EACCES/EPERM).
150
+ * Editors performing non-atomic saves can cause these during mid-write.
151
+ */
152
+ const TRANSIENT_CODES = new Set(['EBUSY', 'EACCES', 'EPERM']);
153
+ const RETRY_DELAY_MS = 50;
154
+
155
+ export function readFileSafe(filePath, retries = 2) {
156
+ for (let attempt = 0; ; attempt++) {
157
+ try {
158
+ return fs.readFileSync(filePath, 'utf-8');
159
+ } catch (err) {
160
+ if (attempt < retries && TRANSIENT_CODES.has(err.code)) {
161
+ const end = Date.now() + RETRY_DELAY_MS;
162
+ while (Date.now() < end) {}
163
+ continue;
164
+ }
165
+ throw err;
166
+ }
167
+ }
168
+ }
169
+
128
170
  /**
129
171
  * Determine which files have changed since last build.
130
172
  * Three-tier cascade:
@@ -193,7 +235,7 @@ function getChangedFiles(db, allFiles, rootDir) {
193
235
 
194
236
  let content;
195
237
  try {
196
- content = fs.readFileSync(absPath, 'utf-8');
238
+ content = readFileSafe(absPath);
197
239
  } catch {
198
240
  continue;
199
241
  }
@@ -256,7 +298,7 @@ function getChangedFiles(db, allFiles, rootDir) {
256
298
  for (const item of needsHash) {
257
299
  let content;
258
300
  try {
259
- content = fs.readFileSync(item.file, 'utf-8');
301
+ content = readFileSafe(item.file);
260
302
  } catch {
261
303
  continue;
262
304
  }
@@ -459,7 +501,7 @@ export async function buildGraph(rootDir, opts = {}) {
459
501
  const absPath = path.join(rootDir, relPath);
460
502
  let code;
461
503
  try {
462
- code = fs.readFileSync(absPath, 'utf-8');
504
+ code = readFileSafe(absPath);
463
505
  } catch {
464
506
  code = null;
465
507
  }
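
`readFileSafe` above pauses between attempts with a `while (Date.now() < end) {}` spin loop. One possible alternative, shown here only as a sketch and not as the package's code, is to block with `Atomics.wait`, which idles the thread instead of spinning. `readWithRetry` and `sleepSync` are hypothetical names, and the reader function is injected so the example does not depend on `node:fs`.

```javascript
// Transient error codes, as listed in the diff above.
const TRANSIENT_CODES = new Set(['EBUSY', 'EACCES', 'EPERM']);

// Block the current thread for `ms` milliseconds without burning CPU.
// Nobody ever notifies this cell, so the wait always ends by timeout.
function sleepSync(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

// Call `readFn(filePath)`; on a transient error, sleep and retry up to
// `retries` more times. Any other error is rethrown immediately.
function readWithRetry(readFn, filePath, retries = 2, delayMs = 50) {
  for (let attempt = 0; ; attempt++) {
    try {
      return readFn(filePath);
    } catch (err) {
      if (attempt < retries && TRANSIENT_CODES.has(err.code)) {
        sleepSync(delayMs);
        continue;
      }
      throw err;
    }
  }
}
```

In a real caller, `readFn` could be something like `(p) => fs.readFileSync(p, 'utf-8')`. The retry semantics are the same as in the diffed function: the default of two retries means at most three read attempts.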
package/src/cli.js CHANGED
@@ -216,6 +216,7 @@ program
216
216
  .option('--depth <n>', 'Max transitive caller depth', '3')
217
217
  .option('-T, --no-tests', 'Exclude test/spec files from results')
218
218
  .option('-j, --json', 'Output as JSON')
219
+ .option('-f, --format <format>', 'Output format: text, mermaid, json', 'text')
219
220
  .action((ref, opts) => {
220
221
  diffImpact(opts.db, {
221
222
  ref,
@@ -223,6 +224,7 @@ program
223
224
  depth: parseInt(opts.depth, 10),
224
225
  noTests: !opts.tests,
225
226
  json: opts.json,
227
+ format: opts.format,
226
228
  });
227
229
  });
228
230
 
@@ -374,7 +376,7 @@ program
374
376
  .action(() => {
375
377
  console.log('\nAvailable embedding models:\n');
376
378
  for (const [key, config] of Object.entries(MODELS)) {
377
- const def = key === 'jina-code' ? ' (default)' : '';
379
+ const def = key === 'nomic-v1.5' ? ' (default)' : '';
378
380
  console.log(` ${key.padEnd(12)} ${String(config.dim).padStart(4)}d ${config.desc}${def}`);
379
381
  }
380
382
  console.log('\nUsage: codegraph embed --model <name>');
@@ -388,8 +390,8 @@ program
388
390
  )
389
391
  .option(
390
392
  '-m, --model <name>',
391
- 'Embedding model: minilm, jina-small, jina-base, jina-code (default), nomic, nomic-v1.5, bge-large. Run `codegraph models` for details',
392
- 'jina-code',
393
+ 'Embedding model: minilm, jina-small, jina-base, jina-code, nomic, nomic-v1.5 (default), bge-large. Run `codegraph models` for details',
394
+ 'nomic-v1.5',
393
395
  )
394
396
  .action(async (dir, opts) => {
395
397
  const root = path.resolve(dir || '.');
package/src/config.js CHANGED
@@ -19,7 +19,7 @@ export const DEFAULTS = {
19
19
  defaultDepth: 3,
20
20
  defaultLimit: 20,
21
21
  },
22
- embeddings: { model: 'jina-code', llmProvider: null },
22
+ embeddings: { model: 'nomic-v1.5', llmProvider: null },
23
23
  llm: { provider: null, model: null, baseUrl: null, apiKey: null, apiKeyCommand: null },
24
24
  search: { defaultMinScore: 0.2, rrfK: 60, topK: 15 },
25
25
  ci: { failOnCycles: false, impactThreshold: null },
package/src/embedder.js CHANGED
@@ -55,7 +55,7 @@ export const MODELS = {
55
55
  },
56
56
  };
57
57
 
58
- export const DEFAULT_MODEL = 'jina-code';
58
+ export const DEFAULT_MODEL = 'nomic-v1.5';
59
59
  const BATCH_SIZE_MAP = {
60
60
  minilm: 32,
61
61
  'jina-small': 16,
@@ -139,9 +139,11 @@ export function extractSymbols(tree, _filePath) {
139
139
  if (callInfo) {
140
140
  calls.push(callInfo);
141
141
  }
142
+ if (fn.type === 'member_expression') {
143
+ const cbDef = extractCallbackDefinition(node, fn);
144
+ if (cbDef) definitions.push(cbDef);
145
+ }
142
146
  }
143
- const cbDef = extractCallbackDefinition(node);
144
- if (cbDef) definitions.push(cbDef);
145
147
  break;
146
148
  }
147
149
 
@@ -320,10 +322,6 @@ function extractReceiverName(objNode) {
320
322
  if (objNode.type === 'identifier') return objNode.text;
321
323
  if (objNode.type === 'this') return 'this';
322
324
  if (objNode.type === 'super') return 'super';
323
- if (objNode.type === 'member_expression') {
324
- const prop = objNode.childForFieldName('property');
325
- if (prop) return objNode.text;
326
- }
327
325
  return objNode.text;
328
326
  }
329
327
 
@@ -432,8 +430,8 @@ const EXPRESS_METHODS = new Set([
432
430
  ]);
433
431
  const EVENT_METHODS = new Set(['on', 'once', 'addEventListener', 'addListener']);
434
432
 
435
- function extractCallbackDefinition(callNode) {
436
- const fn = callNode.childForFieldName('function');
433
+ function extractCallbackDefinition(callNode, fn) {
434
+ if (!fn) fn = callNode.childForFieldName('function');
437
435
  if (!fn || fn.type !== 'member_expression') return null;
438
436
 
439
437
  const prop = fn.childForFieldName('property');
package/src/index.js CHANGED
@@ -41,6 +41,7 @@ export {
41
41
  ALL_SYMBOL_KINDS,
42
42
  contextData,
43
43
  diffImpactData,
44
+ diffImpactMermaid,
44
45
  explainData,
45
46
  FALSE_POSITIVE_CALLER_THRESHOLD,
46
47
  FALSE_POSITIVE_NAMES,
package/src/mcp.js CHANGED
@@ -8,7 +8,7 @@
8
8
  import { createRequire } from 'node:module';
9
9
  import { findCycles } from './cycles.js';
10
10
  import { findDbPath } from './db.js';
11
- import { ALL_SYMBOL_KINDS } from './queries.js';
11
+ import { ALL_SYMBOL_KINDS, diffImpactMermaid } from './queries.js';
12
12
 
13
13
  const REPO_PROP = {
14
14
  repo: {
@@ -201,6 +201,11 @@ const BASE_TOOLS = [
201
201
  ref: { type: 'string', description: 'Git ref to diff against (default: HEAD)' },
202
202
  depth: { type: 'number', description: 'Transitive caller depth', default: 3 },
203
203
  no_tests: { type: 'boolean', description: 'Exclude test files', default: false },
204
+ format: {
205
+ type: 'string',
206
+ enum: ['json', 'mermaid'],
207
+ description: 'Output format (default: json)',
208
+ },
204
209
  },
205
210
  },
206
211
  },
@@ -467,12 +472,21 @@ export async function startMCPServer(customDbPath, options = {}) {
467
472
  });
468
473
  break;
469
474
  case 'diff_impact':
470
- result = diffImpactData(dbPath, {
471
- staged: args.staged,
472
- ref: args.ref,
473
- depth: args.depth,
474
- noTests: args.no_tests,
475
- });
475
+ if (args.format === 'mermaid') {
476
+ result = diffImpactMermaid(dbPath, {
477
+ staged: args.staged,
478
+ ref: args.ref,
479
+ depth: args.depth,
480
+ noTests: args.no_tests,
481
+ });
482
+ } else {
483
+ result = diffImpactData(dbPath, {
484
+ staged: args.staged,
485
+ ref: args.ref,
486
+ depth: args.depth,
487
+ noTests: args.no_tests,
488
+ });
489
+ }
476
490
  break;
477
491
  case 'semantic_search': {
478
492
  const { searchData } = await import('./embedder.js');
package/src/queries.js CHANGED
@@ -608,16 +608,34 @@ export function diffImpactData(customDbPath, opts = {}) {
608
608
 
609
609
  if (!diffOutput.trim()) {
610
610
  db.close();
611
- return { changedFiles: 0, affectedFunctions: [], affectedFiles: [], summary: null };
611
+ return {
612
+ changedFiles: 0,
613
+ newFiles: [],
614
+ affectedFunctions: [],
615
+ affectedFiles: [],
616
+ summary: null,
617
+ };
612
618
  }
613
619
 
614
620
  const changedRanges = new Map();
621
+ const newFiles = new Set();
615
622
  let currentFile = null;
623
+ let prevIsDevNull = false;
616
624
  for (const line of diffOutput.split('\n')) {
625
+ if (line.startsWith('--- /dev/null')) {
626
+ prevIsDevNull = true;
627
+ continue;
628
+ }
629
+ if (line.startsWith('--- ')) {
630
+ prevIsDevNull = false;
631
+ continue;
632
+ }
617
633
  const fileMatch = line.match(/^\+\+\+ b\/(.+)/);
618
634
  if (fileMatch) {
619
635
  currentFile = fileMatch[1];
620
636
  if (!changedRanges.has(currentFile)) changedRanges.set(currentFile, []);
637
+ if (prevIsDevNull) newFiles.add(currentFile);
638
+ prevIsDevNull = false;
621
639
  continue;
622
640
  }
623
641
  const hunkMatch = line.match(/^@@ .+ \+(\d+)(?:,(\d+))? @@/);
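
The new-file detection in this hunk relies on unified-diff headers coming in pairs: a `--- /dev/null` line followed by `+++ b/<path>` marks a file that did not exist before. A standalone sketch of just that step follows; `findNewFiles` is an illustrative name, not an export of the package, and the sample diff is invented.

```javascript
// Return the paths that a unified diff reports as newly created, i.e. whose
// "+++ b/<path>" header is directly preceded by "--- /dev/null".
function findNewFiles(diffOutput) {
  const newFiles = new Set();
  let prevIsDevNull = false;
  for (const line of diffOutput.split('\n')) {
    if (line.startsWith('--- ')) {
      prevIsDevNull = line.startsWith('--- /dev/null');
      continue;
    }
    const m = line.match(/^\+\+\+ b\/(.+)/);
    if (m) {
      if (prevIsDevNull) newFiles.add(m[1]);
      prevIsDevNull = false;
    }
  }
  return [...newFiles];
}

const sample = [
  'diff --git a/src/new.js b/src/new.js',
  'new file mode 100644',
  '--- /dev/null',
  '+++ b/src/new.js',
  '@@ -0,0 +1,2 @@',
  '+export const x = 1;',
  '+export const y = 2;',
  'diff --git a/src/old.js b/src/old.js',
  '--- a/src/old.js',
  '+++ b/src/old.js',
  '@@ -1 +1 @@',
  '-const a = 1;',
  '+const a = 2;',
].join('\n');

console.log(findNewFiles(sample)); // [ 'src/new.js' ]
```

One caveat with any prefix-based scan like this: a removed source line that itself begins with `-- ` shows up in the diff body as `--- ...` and is treated as a header. Here it only resets the flag, but a stricter parser would track whether it is inside a hunk.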
@@ -630,7 +648,13 @@ export function diffImpactData(customDbPath, opts = {}) {
630
648
 
631
649
  if (changedRanges.size === 0) {
632
650
  db.close();
633
- return { changedFiles: 0, affectedFunctions: [], affectedFiles: [], summary: null };
651
+ return {
652
+ changedFiles: 0,
653
+ newFiles: [],
654
+ affectedFunctions: [],
655
+ affectedFiles: [],
656
+ summary: null,
657
+ };
634
658
  }
635
659
 
636
660
  const affectedFunctions = [];
@@ -658,6 +682,10 @@ export function diffImpactData(customDbPath, opts = {}) {
658
682
  const visited = new Set([fn.id]);
659
683
  let frontier = [fn.id];
660
684
  let totalCallers = 0;
685
+ const levels = {};
686
+ const edges = [];
687
+ const idToKey = new Map();
688
+ idToKey.set(fn.id, `${fn.file}::${fn.name}:${fn.line}`);
661
689
  for (let d = 1; d <= maxDepth; d++) {
662
690
  const nextFrontier = [];
663
691
  for (const fid of frontier) {
@@ -673,6 +701,11 @@ export function diffImpactData(customDbPath, opts = {}) {
673
701
  visited.add(c.id);
674
702
  nextFrontier.push(c.id);
675
703
  allAffected.add(`${c.file}:${c.name}`);
704
+ const callerKey = `${c.file}::${c.name}:${c.line}`;
705
+ idToKey.set(c.id, callerKey);
706
+ if (!levels[d]) levels[d] = [];
707
+ levels[d].push({ name: c.name, kind: c.kind, file: c.file, line: c.line });
708
+ edges.push({ from: idToKey.get(fid), to: callerKey });
676
709
  totalCallers++;
677
710
  }
678
711
  }
@@ -686,6 +719,8 @@ export function diffImpactData(customDbPath, opts = {}) {
686
719
  file: fn.file,
687
720
  line: fn.line,
688
721
  transitiveCallers: totalCallers,
722
+ levels,
723
+ edges,
689
724
  };
690
725
  });
691
726
 
@@ -695,6 +730,7 @@ export function diffImpactData(customDbPath, opts = {}) {
695
730
  db.close();
696
731
  return {
697
732
  changedFiles: changedRanges.size,
733
+ newFiles: [...newFiles],
698
734
  affectedFunctions: functionResults,
699
735
  affectedFiles: [...affectedFiles],
700
736
  summary: {
@@ -705,6 +741,120 @@ export function diffImpactData(customDbPath, opts = {}) {
705
741
  };
706
742
  }
707
743
 
744
+ export function diffImpactMermaid(customDbPath, opts = {}) {
745
+ const data = diffImpactData(customDbPath, opts);
746
+ if (data.error) return data.error;
747
+ if (data.changedFiles === 0 || data.affectedFunctions.length === 0) {
748
+ return 'flowchart TB\n none["No impacted functions detected"]';
749
+ }
750
+
751
+ const newFileSet = new Set(data.newFiles || []);
752
+ const lines = ['flowchart TB'];
753
+
754
+ // Assign stable Mermaid node IDs
755
+ let nodeCounter = 0;
756
+ const nodeIdMap = new Map();
757
+ const nodeLabels = new Map();
758
+ function nodeId(key, label) {
759
+ if (!nodeIdMap.has(key)) {
760
+ nodeIdMap.set(key, `n${nodeCounter++}`);
761
+ if (label) nodeLabels.set(key, label);
762
+ }
763
+ return nodeIdMap.get(key);
764
+ }
765
+
766
+ // Register all nodes (changed functions + their callers)
767
+ for (const fn of data.affectedFunctions) {
768
+ nodeId(`${fn.file}::${fn.name}:${fn.line}`, fn.name);
769
+ for (const callers of Object.values(fn.levels || {})) {
770
+ for (const c of callers) {
771
+ nodeId(`${c.file}::${c.name}:${c.line}`, c.name);
772
+ }
773
+ }
774
+ }
775
+
776
+ // Collect all edges and determine blast radius
777
+ const allEdges = new Set();
778
+ const edgeFromNodes = new Set();
779
+ const edgeToNodes = new Set();
780
+ const changedKeys = new Set();
781
+
782
+ for (const fn of data.affectedFunctions) {
783
+ changedKeys.add(`${fn.file}::${fn.name}:${fn.line}`);
784
+ for (const edge of fn.edges || []) {
785
+ const edgeKey = `${edge.from}|${edge.to}`;
786
+ if (!allEdges.has(edgeKey)) {
787
+ allEdges.add(edgeKey);
788
+ edgeFromNodes.add(edge.from);
789
+ edgeToNodes.add(edge.to);
790
+ }
791
+ }
792
+ }
793
+
794
+ // Blast radius: caller nodes that are never a source (leaf nodes of the impact tree)
795
+ const blastRadiusKeys = new Set();
796
+ for (const key of edgeToNodes) {
797
+ if (!edgeFromNodes.has(key) && !changedKeys.has(key)) {
798
+ blastRadiusKeys.add(key);
799
+ }
800
+ }
801
+
802
+ // Intermediate callers: not changed, not blast radius
803
+ const intermediateKeys = new Set();
804
+ for (const key of edgeToNodes) {
805
+ if (!changedKeys.has(key) && !blastRadiusKeys.has(key)) {
806
+ intermediateKeys.add(key);
807
+ }
808
+ }
809
+
810
+ // Group changed functions by file
811
+ const fileGroups = new Map();
812
+ for (const fn of data.affectedFunctions) {
813
+ if (!fileGroups.has(fn.file)) fileGroups.set(fn.file, []);
814
+ fileGroups.get(fn.file).push(fn);
815
+ }
816
+
817
+ // Emit changed-file subgraphs
818
+ let sgCounter = 0;
819
+ for (const [file, fns] of fileGroups) {
820
+ const isNew = newFileSet.has(file);
821
+ const tag = isNew ? 'new' : 'modified';
822
+ const sgId = `sg${sgCounter++}`;
823
+ lines.push(` subgraph ${sgId}["${file} **(${tag})**"]`);
824
+ for (const fn of fns) {
825
+ const key = `${fn.file}::${fn.name}:${fn.line}`;
826
+ lines.push(` ${nodeIdMap.get(key)}["${fn.name}"]`);
827
+ }
828
+ lines.push(' end');
829
+ const style = isNew ? 'fill:#e8f5e9,stroke:#4caf50' : 'fill:#fff3e0,stroke:#ff9800';
830
+ lines.push(` style ${sgId} ${style}`);
831
+ }
832
+
833
+ // Emit intermediate caller nodes (outside subgraphs)
834
+ for (const key of intermediateKeys) {
835
+ lines.push(` ${nodeIdMap.get(key)}["${nodeLabels.get(key)}"]`);
836
+ }
837
+
838
+ // Emit blast radius subgraph
839
+ if (blastRadiusKeys.size > 0) {
840
+ const sgId = `sg${sgCounter++}`;
841
+ lines.push(` subgraph ${sgId}["Callers **(blast radius)**"]`);
842
+ for (const key of blastRadiusKeys) {
843
+ lines.push(` ${nodeIdMap.get(key)}["${nodeLabels.get(key)}"]`);
844
+ }
845
+ lines.push(' end');
846
+ lines.push(` style ${sgId} fill:#f3e5f5,stroke:#9c27b0`);
847
+ }
848
+
849
+ // Emit edges (impact flows from changed fn toward callers)
850
+ for (const edgeKey of allEdges) {
851
+ const [from, to] = edgeKey.split('|');
852
+ lines.push(` ${nodeIdMap.get(from)} --> ${nodeIdMap.get(to)}`);
853
+ }
854
+
855
+ return lines.join('\n');
856
+ }
857
+
708
858
  export function listFunctionsData(customDbPath, opts = {}) {
709
859
  const db = openReadonlyOrFail(customDbPath);
710
860
  const noTests = opts.noTests || false;
@@ -2079,8 +2229,12 @@ export function fnImpact(name, customDbPath, opts = {}) {
2079
2229
  }
2080
2230
 
2081
2231
  export function diffImpact(customDbPath, opts = {}) {
2232
+ if (opts.format === 'mermaid') {
2233
+ console.log(diffImpactMermaid(customDbPath, opts));
2234
+ return;
2235
+ }
2082
2236
  const data = diffImpactData(customDbPath, opts);
2083
- if (opts.json) {
2237
+ if (opts.json || opts.format === 'json') {
2084
2238
  console.log(JSON.stringify(data, null, 2));
2085
2239
  return;
2086
2240
  }
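
In `diffImpactMermaid`, callers are split into "blast radius" and intermediate nodes purely from the edge list: a caller that never appears as an edge source is a leaf of the impact tree. A compact sketch of that classification follows, using made-up node keys and a hypothetical `classifyCallers` helper rather than the package's own code.

```javascript
// Split caller nodes into leaves of the impact tree (blast radius) and
// intermediate callers. Edges point from a function toward its caller.
function classifyCallers(edges, changedKeys) {
  const sources = new Set(edges.map((e) => e.from));
  const targets = new Set(edges.map((e) => e.to));
  const blastRadius = [];
  const intermediate = [];
  for (const key of targets) {
    if (changedKeys.has(key)) continue; // changed functions get their own subgraph
    (sources.has(key) ? intermediate : blastRadius).push(key);
  }
  return { blastRadius, intermediate };
}

// Illustrative impact tree: change -> mid -> top, and change -> leaf.
const edges = [
  { from: 'change', to: 'mid' },
  { from: 'mid', to: 'top' },
  { from: 'change', to: 'leaf' },
];
const result = classifyCallers(edges, new Set(['change']));
console.log(result.blastRadius); // [ 'top', 'leaf' ]
console.log(result.intermediate); // [ 'mid' ]
```

Because the traversal is capped at `depth`, a node in the blast-radius group may still have callers of its own beyond the cutoff. The grouping describes the explored tree, not the whole graph.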
package/src/watcher.js CHANGED
@@ -1,5 +1,6 @@
1
1
  import fs from 'node:fs';
2
2
  import path from 'node:path';
3
+ import { readFileSafe } from './builder.js';
3
4
  import { EXTENSIONS, IGNORE_DIRS, normalizePath } from './constants.js';
4
5
  import { initSchema, openDb } from './db.js';
5
6
  import { appendJournalEntries } from './journal.js';
@@ -35,7 +36,7 @@ async function updateFile(_db, rootDir, filePath, stmts, engineOpts, cache) {
35
36
 
36
37
  let code;
37
38
  try {
38
- code = fs.readFileSync(filePath, 'utf-8');
39
+ code = readFileSafe(filePath);
39
40
  } catch (err) {
40
41
  warn(`Cannot read ${relPath}: ${err.message}`);
41
42
  return null;