@optave/codegraph 2.5.1 → 3.0.0

package/README.md CHANGED
@@ -31,19 +31,24 @@
31
31
 
32
32
  ## The Problem
33
33
 
34
- AI coding assistants are incredible until your codebase gets big enough. Then they get lost.
34
+ Large codebases are opaque. The structure lives in people's heads, not in tools.
35
35
 
36
- On a large codebase, a great portion of your AI budget isn't going toward solving tasks. It's going toward the AI re-orienting itself in your code. Every session. Over and over. It burns tokens on tool calls (`grep`, `find`, `cat`) just to figure out what calls what. It loses context. It hallucinates dependencies. It modifies a function without realizing 14 callers across 9 files depend on it.
36
+ A developer inherits a project and spends days grepping to understand what calls what. An AI agent burns half its token budget on `grep`, `find`, `cat` re-discovering the same structure every session. An architect draws boundary rules on a whiteboard that erode within weeks because nothing enforces them. A CI pipeline catches test failures but can't tell you _"this change silently affects 14 callers across 9 files."_
37
37
 
38
- When the AI catches these mistakes, you waste time and tokens on corrections. When it doesn't catch them, your codebase starts degrading with silent bugs until things stop working.
39
-
40
- And when you hit `/clear` or run out of context? It starts from scratch.
38
+ The information exists: it's in the code itself. But without a structured map, everyone is navigating blind: developers guess, AI agents hallucinate, and architecture degrades one unreviewed change at a time.
41
39
 
42
40
  ## What Codegraph Does
43
41
 
44
- Codegraph gives your AI a pre-built, always-current map of your entire codebase — every function, every caller, every dependency — so it stops guessing and starts knowing.
42
+ Codegraph builds a function-level dependency graph of your entire codebase — every function, every caller, every dependency — and keeps it current with sub-second incremental rebuilds.
43
+
44
+ It parses your code with [tree-sitter](https://tree-sitter.github.io/) (native Rust or WASM), stores the graph in SQLite, and gives you multiple ways to consume it:
45
+
46
+ - **CLI** — developers explore, query, and audit their code from the terminal
47
+ - **MCP server** — AI agents query the graph directly through 30 tools
48
+ - **CI gates** — `check` and `manifesto` commands enforce quality thresholds with exit codes
49
+ - **Programmatic API** — embed codegraph in your own tools via `npm install`
45
50
 
46
- It parses your code with [tree-sitter](https://tree-sitter.github.io/) (native Rust or WASM), builds a function-level dependency graph in SQLite, and keeps it current with sub-second incremental rebuilds. Your AI gets answers like _"this function has 14 callers across 9 files"_ instantly, instead of spending 30 tool calls to maybe discover half of them.
51
+ Instead of 30 tool calls to maybe discover half your dependencies, you get _"this function has 14 callers across 9 files"_ instantly. Instead of hoping architecture rules are followed, you enforce them. Instead of finding breakage in production, `diff-impact --staged` catches it before you commit.
47
52
 
48
53
  **Free. Open source. Fully local.** Zero network calls, zero telemetry. Your code stays on your machine. When you want deeper intelligence, bring your own LLM provider — your code only goes where you choose to send it.
49
54
 
@@ -55,39 +60,54 @@ cd your-project
55
60
  codegraph build
56
61
  ```
57
62
 
58
- That's it. No config files, no Docker, no JVM, no API keys, no accounts. The graph is ready to query. Add `codegraph mcp` to your AI agent's config and it has full access to your dependency graph through 24 MCP tools (25 in multi-repo mode).
63
+ That's it. No config files, no Docker, no JVM, no API keys, no accounts. The graph is ready to query.
59
64
 
60
65
  ### Why it matters
61
66
 
62
- | Without codegraph | With codegraph |
63
- |---|---|
64
- | AI spends 20+ tool calls per session re-discovering your code structure | AI gets full dependency context in one call |
65
- | Modifies `parseConfig()` without knowing 9 files import it | `fn-impact parseConfig` shows every caller before the edit |
66
- | Hallucinates that `auth.js` imports from `db.js` | `deps src/auth.js` shows the real import graph |
67
- | After `/clear`, starts from scratch | Graph persists; next session picks up where this one left off |
68
- | Suggests renaming a function, breaks 14 call sites silently | `diff-impact --staged` catches the breakage before you commit |
67
+ | | Without codegraph | With codegraph |
68
+ |---|---|---|
69
+ | **AI agents** | Spend 20+ tool calls per session re-discovering code structure | Get full dependency context in one MCP call |
70
+ | **AI agents** | Modify `parseConfig()` without knowing 9 files import it | `fn-impact parseConfig` shows every caller before the edit |
71
+ | **Developers** | Inherit a codebase and grep for hours to understand what calls what | `context handleAuth -T` gives source, deps, callers, and tests in one command |
72
+ | **Developers** | Rename a function, break 14 call sites silently | `diff-impact --staged` catches breakage before you commit |
73
+ | **CI pipelines** | Catch test failures but miss structural degradation | `check --staged` fails the build when blast radius or complexity thresholds are exceeded |
74
+ | **Architects** | Draw boundary rules that erode within weeks | `manifesto` and `boundaries` enforce architecture rules on every commit |
69
75
 
70
76
  ### Feature comparison
71
77
 
72
- <sub>Comparison last verified: February 2026</sub>
78
+ <sub>Comparison last verified: March 2026. Full analysis: <a href="generated/competitive/COMPETITIVE_ANALYSIS.md">COMPETITIVE_ANALYSIS.md</a></sub>
73
79
 
74
80
  | Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | [axon](https://github.com/harshkedia177/axon) |
75
81
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
76
82
  | Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
77
- | Multi-language | **11** | **14** | **32** | Multi | **~10** | **9** | SCIP langs | Few |
78
- | Semantic search | **Yes** | — | **Yes** | **Yes** | — | **Yes** | — | |
79
- | MCP / AI agent support | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — |
80
- | Git diff impact | **Yes** | — | — | — | — | **Yes** | — | **Yes** |
81
- | Git co-change analysis | **Yes** | — | — | — | — | — | **Yes** | **Yes** |
82
- | Watch mode | **Yes** | — | **Yes** | — | — | — | — | — |
83
- | Dead code / role classification | **Yes** | — | **Yes** | — | — | — | — | **Yes** |
84
- | Cycle detection | **Yes** | — | **Yes** | — | — | — | — | **Yes** |
85
- | Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | — | — | — | — |
86
- | Zero config | **Yes** | — | **Yes** | | | | | |
83
+ | Multi-language | **11** | **14** | **32** | **11** | **~10** | **12** | **12** | **3** |
84
+ | Semantic search | **Yes** | — | **Yes** | **Yes** | — | **Yes** | — | **Yes** |
85
+ | Hybrid BM25 + semantic | **Yes** | — | | | | **Yes** | — | **Yes** |
86
+ | CODEOWNERS integration | **Yes** | — | — | — | — | | — | |
87
+ | Architecture boundary rules | **Yes** | — | — | — | — | — | | |
88
+ | CI validation predicates | **Yes** | — | | — | — | — | — | — |
89
+ | Composite audit command | **Yes** | — | | — | — | — | — | |
90
+ | Batch querying | **Yes** | — | | — | — | — | — | |
91
+ | Graph snapshots | **Yes** | — | | — | — | — | — | — |
92
+ | MCP / AI agent support | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
93
+ | Git diff impact | **Yes** | — | — | — | — | **Yes** | **Yes** | **Yes** |
94
+ | Branch structural diff | **Yes** | — | — | — | — | — | — | **Yes** |
95
+ | Git co-change analysis | **Yes** | — | — | — | — | — | — | **Yes** |
96
+ | Watch mode | **Yes** | — | **Yes** | **Yes** | — | — | **Yes** | **Yes** |
97
+ | Dead code / role classification | **Yes** | — | **Yes** | — | — | — | **Yes** | **Yes** |
98
+ | Cycle detection | **Yes** | — | — | — | — | — | — | — |
99
+ | Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | — | — | Go only | **Yes** |
100
+ | Zero config | **Yes** | — | **Yes** | — | — | **Yes** | — | **Yes** |
87
101
  | Embeddable JS library (`npm install`) | **Yes** | — | — | — | — | — | — | — |
88
- | LLM-optional (works without API keys) | **Yes** | **Yes** | **Yes** | | **Yes** | **Yes** | **Yes** | **Yes** |
89
- | Commercial use allowed | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — | — | — |
90
- | Open source | **Yes** | Yes | Yes | Yes | Yes | Yes | Custom | — |
102
+ | LLM-optional (works without API keys) | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
103
+ | Dataflow analysis | **Yes** | **Yes** | | | **Yes** | — | — | — |
104
+ | Control flow graph (CFG) | **Yes** | **Yes** | | | **Yes** | | | — |
105
+ | AST node querying | **Yes** | **Yes** | — | — | **Yes** | — | — | — |
106
+ | Expanded node/edge types | **Yes** | **Yes** | — | — | **Yes** | — | — | — |
107
+ | GraphML / Neo4j export | **Yes** | **Yes** | — | — | — | — | — | — |
108
+ | Interactive graph viewer | **Yes** | — | — | — | — | — | — | — |
109
+ | Commercial use allowed | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | No | Paid | **Yes** |
110
+ | Open source | **Yes** | Yes | Yes | Yes | Yes | No | No | Yes |
91
111
 
92
112
  ### What makes codegraph different
93
113
 
@@ -97,10 +117,11 @@ That's it. No config files, no Docker, no JVM, no API keys, no accounts. The gra
97
117
  | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider — your code only goes where you choose |
98
118
  | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |
99
119
  | **🏷️** | **Role classification** | Every symbol auto-tagged as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` — agents instantly know what they're looking at |
100
- | **🤖** | **Built for AI agents** | 24-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default |
120
+ | **🤖** | **Built for AI agents** | 30-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default |
101
121
  | **🌐** | **Multi-language, one CLI** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + HCL in a single graph |
102
122
  | **💥** | **Git diff impact** | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — enriched with historically coupled files from git co-change analysis. Ships with a GitHub Actions workflow |
103
- | **🧠** | **Semantic search** | Local embeddings by default, LLM-powered when opted in; multi-query with RRF ranking via `"auth; token; JWT"` |
123
+ | **🧠** | **Hybrid search** | BM25 keyword + semantic embeddings fused via RRF — `hybrid` (default), `semantic`, or `keyword` mode; multi-query via `"auth; token; JWT"` |
124
+ | **🔬** | **Dataflow + CFG** | Track how data flows through functions (`flows_to`, `returns`, `mutates`; JS/TS only) and visualize intraprocedural control flow graphs for all 11 languages |
104
125
 
105
126
  ---
106
127
 
@@ -127,6 +148,8 @@ git clone https://github.com/optave/codegraph.git
127
148
  cd codegraph && npm install && npm link
128
149
  ```
129
150
 
151
+ > **Dev builds:** Pre-release tarballs are attached to [GitHub Releases](https://github.com/optave/codegraph/releases). Install with `npm install -g <path-to-tarball>`. Note that `npm install -g <tarball-url>` does not work because npm cannot resolve optional platform-specific dependencies from a URL — download the `.tgz` first, then install from the local file.
152
+
130
153
  ### For AI agents
131
154
 
132
155
  Add codegraph to your agent's instructions (e.g. `CLAUDE.md`):
@@ -144,7 +167,7 @@ After modifying code:
144
167
  Or connect directly via MCP:
145
168
 
146
169
  ```bash
147
- codegraph mcp # 24-tool MCP server — AI queries the graph directly
170
+ codegraph mcp # 30-tool MCP server — AI queries the graph directly
148
171
  ```
149
172
 
150
173
  Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) &middot; [CLAUDE.md template](docs/guides/ai-agent-guide.md#claudemd-template)
@@ -159,7 +182,7 @@ Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) &middot; [CLAU
159
182
  | 📁 | **File dependencies** | See what a file imports and what imports it |
160
183
  | 💥 | **Impact analysis** | Trace every file affected by a change (transitive) |
161
184
  | 🧬 | **Function-level tracing** | Call chains, caller trees, function-level impact, and A→B pathfinding with qualified call resolution |
162
- | 🎯 | **Deep context** | `context` gives AI agents source, deps, callers, signature, and tests for a function in one call; `explain` gives structural summaries of files or functions |
185
+ | 🎯 | **Deep context** | `context` gives AI agents source, deps, callers, signature, and tests for a function in one call; `audit --quick` gives structural summaries of files or functions |
163
186
  | 📍 | **Fast lookup** | `where` shows exactly where a symbol is defined and used — minimal, fast |
164
187
  | 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, trace their callers |
165
188
  | 🔗 | **Co-change analysis** | Analyze git history for files that always change together — surfaces hidden coupling the static graph can't see; enriches `diff-impact` with historically coupled files |
@@ -167,14 +190,31 @@ Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) &middot; [CLAU
167
190
  | 🏗️ | **Structure & hotspots** | Directory cohesion scores, fan-in/fan-out hotspot detection, module boundaries |
168
191
  | 🏷️ | **Node role classification** | Every symbol auto-tagged as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` based on connectivity patterns — agents instantly know architectural role |
169
192
  | 🔄 | **Cycle detection** | Find circular dependencies at file or function level |
170
- | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export |
193
+ | 📤 | **Export** | DOT, Mermaid, JSON, GraphML, GraphSON, and Neo4j CSV graph export |
171
194
  | 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking |
172
195
  | 👀 | **Watch mode** | Incrementally update the graph as files change |
173
- | 🤖 | **MCP server** | 24-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo |
196
+ | 🤖 | **MCP server** | 30-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo |
174
197
  | ⚡ | **Always fresh** | Three-tier incremental detection — sub-second rebuilds even on large codebases |
175
198
  | 🧮 | **Complexity metrics** | Cognitive, cyclomatic, nesting depth, Halstead, and Maintainability Index per function |
176
199
  | 🏘️ | **Community detection** | Louvain clustering to discover natural module boundaries and architectural drift |
177
- | 📜 | **Manifesto rule engine** | Configurable pass/fail rules with warn/fail thresholds for CI gates (exit code 1 on fail) |
200
+ | 📜 | **Manifesto rule engine** | Configurable pass/fail rules with warn/fail thresholds for CI gates via `check` (exit code 1 on fail) |
201
+ | 👥 | **CODEOWNERS integration** | Map graph nodes to CODEOWNERS entries — see who owns each function, ownership boundaries in `diff-impact` |
202
+ | 💾 | **Graph snapshots** | `snapshot save`/`restore` for instant DB backup and rollback — checkpoint before refactoring, restore without rebuilding |
203
+ | 🔎 | **Hybrid BM25 + semantic search** | FTS5 keyword search + embedding-based semantic search fused via Reciprocal Rank Fusion — `hybrid`, `semantic`, or `keyword` modes |
204
+ | 📄 | **Pagination & NDJSON streaming** | Universal `--limit`/`--offset` pagination on all MCP tools and CLI commands; `--ndjson` for newline-delimited JSON streaming |
205
+ | 🔀 | **Branch structural diff** | Compare code structure between two git refs — added/removed/changed symbols with transitive caller impact |
206
+ | 🛡️ | **Architecture boundaries** | User-defined dependency rules between modules with onion architecture preset — violations flagged in manifesto and CI |
207
+ | ✅ | **CI validation predicates** | `check` command with configurable gates: complexity, blast radius, cycles, boundary violations — exit code 0/1 for CI |
208
+ | 📋 | **Composite audit** | Single `audit` command combining explain + impact + health metrics per function — one call instead of 3-4 |
209
+ | 🚦 | **Triage queue** | `triage` merges connectivity, hotspots, roles, and complexity into a ranked audit priority queue |
210
+ | 📦 | **Batch querying** | Accept a list of targets and return all results in one JSON payload — enables multi-agent parallel dispatch |
211
+ | 🔬 | **Dataflow analysis** | Track how data moves through functions with `flows_to`, `returns`, and `mutates` edges — opt-in via `build --dataflow` (JS/TS) |
212
+ | 🧩 | **Control flow graph** | Intraprocedural CFG construction for all 11 languages — `cfg` command with text/DOT/Mermaid output, opt-in via `build --cfg` |
213
+ | 🔎 | **AST node querying** | Stored queryable AST nodes (calls, `new`, string, regex, throw, await) — `ast` command with SQL GLOB pattern matching |
214
+ | 🧬 | **Expanded node/edge types** | `parameter`, `property`, `constant` node kinds with `parent_id` for sub-declaration queries; `contains`, `parameter_of`, `receiver` edge kinds |
215
+ | 📊 | **Exports analysis** | `exports <file>` shows all exported symbols with per-symbol consumers, re-export detection, and counts |
216
+ | 📈 | **Interactive viewer** | `codegraph plot` generates an interactive HTML graph viewer with hierarchical/force/radial layouts, complexity overlays, and drill-down |
217
+ | 🏷️ | **Stable JSON schema** | `normalizeSymbol` utility ensures consistent 7-field output (name, kind, file, line, endLine, role, fileHash) across all commands |
178
218
 
179
219
  See [docs/examples](docs/examples) for real-world CLI and MCP usage examples.
180
220
 
@@ -202,6 +242,8 @@ codegraph stats # Graph health: nodes, edges, languages, quality
202
242
  codegraph roles # Node role classification (entry, core, utility, adapter, dead, leaf)
203
243
  codegraph roles --role dead -T # Find dead code (unreferenced, non-exported symbols)
204
244
  codegraph roles --role core --file src/ # Core symbols in src/
245
+ codegraph exports src/queries.js # Per-symbol consumer analysis (who calls each export)
246
+ codegraph children <name> # List parameters, properties, constants of a symbol
205
247
  ```
206
248
 
207
249
  ### Deep Context (AI-Optimized)
@@ -209,24 +251,28 @@ codegraph roles --role core --file src/ # Core symbols in src/
209
251
  ```bash
210
252
  codegraph context <name> # Full context: source, deps, callers, signature, tests
211
253
  codegraph context <name> --depth 2 --no-tests # Include callee source 2 levels deep
212
- codegraph explain <file> # Structural summary: public API, internals, data flow
213
- codegraph explain <function> # Function summary: signature, calls, callers, tests
254
+ codegraph audit <file> --quick # Structural summary: public API, internals, data flow
255
+ codegraph audit <function> --quick # Function summary: signature, calls, callers, tests
214
256
  ```
215
257
 
216
258
  ### Impact Analysis
217
259
 
218
260
  ```bash
219
261
  codegraph impact <file> # Transitive reverse dependency trace
220
- codegraph fn <name> # Function-level: callers, callees, call chain
221
- codegraph fn <name> --no-tests --depth 5
262
+ codegraph query <name> # Function-level: callers, callees, call chain
263
+ codegraph query <name> --no-tests --depth 5
222
264
  codegraph fn-impact <name> # What functions break if this one changes
223
- codegraph path <from> <to> # Shortest path between two symbols (A calls...calls B)
265
+ codegraph path <from> <to> # Shortest path between two symbols (A calls...calls B)
224
266
  codegraph path <from> <to> --reverse # Follow edges backward
225
- codegraph path <from> <to> --max-depth 5 --kinds calls,imports
267
+ codegraph path <from> <to> --depth 5 --kinds calls,imports
226
268
  codegraph diff-impact # Impact of unstaged git changes
227
269
  codegraph diff-impact --staged # Impact of staged changes
228
270
  codegraph diff-impact HEAD~3 # Impact vs a specific ref
229
271
  codegraph diff-impact main --format mermaid -T # Mermaid flowchart of blast radius
272
+ codegraph branch-compare main feature-branch # Structural diff between two refs
273
+ codegraph branch-compare main HEAD --no-tests # Symbols added/removed/changed vs main
274
+ codegraph branch-compare v2.4.0 v2.5.0 --json # JSON output for programmatic use
275
+ codegraph branch-compare main HEAD --format mermaid # Mermaid diagram of structural changes
230
276
  ```
231
277
 
232
278
  ### Co-Change Analysis
@@ -249,8 +295,8 @@ Co-change data also enriches `diff-impact` — historically coupled files appear
249
295
 
250
296
  ```bash
251
297
  codegraph structure # Directory overview with cohesion scores
252
- codegraph hotspots # Files with extreme fan-in, fan-out, or density
253
- codegraph hotspots --metric coupling --level directory --no-tests
298
+ codegraph triage --level file # Files with extreme fan-in, fan-out, or density
299
+ codegraph triage --level directory --sort coupling --no-tests
254
300
  ```
255
301
 
256
302
  ### Code Health & Architecture
@@ -263,8 +309,79 @@ codegraph complexity --above-threshold -T # Only functions exceeding warn thres
263
309
  codegraph communities # Louvain community detection — natural module boundaries
264
310
  codegraph communities --drift -T # Drift analysis only — split/merge candidates
265
311
  codegraph communities --functions # Function-level community detection
266
- codegraph manifesto # Pass/fail rule engine (exit code 1 on fail)
267
- codegraph manifesto -T # Exclude test files from rule evaluation
312
+ codegraph check # Pass/fail rule engine (exit code 1 on fail)
313
+ codegraph check -T # Exclude test files from rule evaluation
314
+ ```
315
+
316
+ ### Dataflow, CFG & AST
317
+
318
+ ```bash
319
+ codegraph dataflow <name> # Data flow edges for a function (flows_to, returns, mutates)
320
+ codegraph dataflow <name> --impact # Transitive data-dependent blast radius
321
+ codegraph cfg <name> # Control flow graph (text format)
322
+ codegraph cfg <name> --format dot # CFG as Graphviz DOT
323
+ codegraph cfg <name> --format mermaid # CFG as Mermaid diagram
324
+ codegraph ast # List all stored AST nodes
325
+ codegraph ast "handleAuth" # Search AST nodes by pattern (GLOB)
326
+ codegraph ast -k call # Filter by kind: call, new, string, regex, throw, await
327
+ codegraph ast -k throw --file src/ # Combine kind and file filters
328
+ ```
329
+
330
+ > **Note:** Dataflow requires `codegraph build --dataflow` (JS/TS only). CFG requires `codegraph build --cfg`. Both are opt-in to keep default builds fast.
331
+
332
+ ### Audit, Triage & Batch
333
+
334
+ Composite commands for risk-driven workflows and multi-agent dispatch.
335
+
336
+ ```bash
337
+ codegraph audit <file-or-function> # Combined structural summary + impact + health in one report
338
+ codegraph audit <target> --quick # Structural summary only (skip impact and health)
339
+ codegraph audit src/queries.js -T # Audit all functions in a file
340
+ codegraph triage # Ranked audit priority queue (connectivity + hotspots + roles)
341
+ codegraph triage -T --limit 20 # Top 20 riskiest functions, excluding tests
342
+ codegraph triage --level file -T # File-level hotspot analysis
343
+ codegraph triage --level directory -T # Directory-level hotspot analysis
344
+ codegraph batch target1 target2 ... # Batch query multiple targets in one call
345
+ codegraph batch --json targets.json # Batch from a JSON file
346
+ ```
347
+
348
+ ### CI Validation
349
+
350
+ `codegraph check` provides configurable pass/fail predicates for CI gates and state machines. Exit code 0 = pass, 1 = fail.
351
+
352
+ ```bash
353
+ codegraph check # Run manifesto rules on whole codebase
354
+ codegraph check --staged # Check staged changes (diff predicates)
355
+ codegraph check --staged --rules # Run both diff predicates AND manifesto rules
356
+ codegraph check --no-new-cycles # Fail if staged changes introduce cycles
357
+ codegraph check --max-complexity 30 # Fail if any function exceeds complexity threshold
358
+ codegraph check --max-blast-radius 50 # Fail if blast radius exceeds limit
359
+ codegraph check --no-boundary-violations # Fail on architecture boundary violations
360
+ codegraph check main # Check current branch vs main
361
+ ```
362
+
363
+ ### CODEOWNERS
364
+
365
+ Map graph symbols to CODEOWNERS entries. Shows who owns each function and surfaces ownership boundaries.
366
+
367
+ ```bash
368
+ codegraph owners # Show ownership for all symbols
369
+ codegraph owners src/queries.js # Ownership for symbols in a specific file
370
+ codegraph owners --boundary # Show ownership boundaries between modules
371
+ codegraph owners --owner @backend # Filter by owner
372
+ ```
373
+
374
+ Ownership data also enriches `diff-impact` — affected owners and suggested reviewers appear alongside the static dependency analysis.
375
+
376
+ ### Snapshots
377
+
378
+ Lightweight SQLite DB backup and restore — checkpoint before refactoring, instantly roll back without rebuilding.
379
+
380
+ ```bash
381
+ codegraph snapshot save before-refactor # Save a named snapshot
382
+ codegraph snapshot list # List all snapshots
383
+ codegraph snapshot restore before-refactor # Restore a snapshot
384
+ codegraph snapshot delete before-refactor # Delete a snapshot
268
385
  ```
269
386
 
270
387
  ### Export & Visualization
@@ -273,7 +390,11 @@ codegraph manifesto -T # Exclude test files from rule evaluation
273
390
  codegraph export -f dot # Graphviz DOT format
274
391
  codegraph export -f mermaid # Mermaid diagram
275
392
  codegraph export -f json # JSON graph
393
+ codegraph export -f graphml # GraphML (XML standard)
394
+ codegraph export -f graphson # GraphSON (TinkerPop v3 / Gremlin)
395
+ codegraph export -f neo4j # Neo4j CSV (bulk import, separate nodes/relationships files)
276
396
  codegraph export --functions -o graph.dot # Function-level, write to file
397
+ codegraph plot # Interactive HTML viewer with force/hierarchical/radial layouts
277
398
  codegraph cycles # Detect circular dependencies
278
399
  codegraph cycles --functions # Function-level cycles
279
400
  ```
@@ -287,6 +408,9 @@ codegraph embed # Build embeddings (default: nomic-v1.5)
287
408
  codegraph embed --model nomic # Use a different model
288
409
  codegraph search "handle authentication"
289
410
  codegraph search "parse config" --min-score 0.4 -n 10
411
+ codegraph search "parseConfig" --mode keyword # BM25 keyword-only (exact names)
412
+ codegraph search "auth flow" --mode semantic # Embedding-only (conceptual)
413
+ codegraph search "auth flow" --mode hybrid # BM25 + semantic RRF fusion (default)
290
414
  codegraph models # List available models
291
415
  ```
292
416
 
@@ -336,13 +460,17 @@ codegraph registry remove <name> # Unregister
336
460
  | Flag | Description |
337
461
  |---|---|
338
462
  | `-d, --db <path>` | Custom path to `graph.db` |
339
- | `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `path`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `roles`, `co-change`, `deps`, `impact`, `complexity`, `communities`, `manifesto`) |
463
+ | `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on most query commands including `query`, `fn-impact`, `path`, `context`, `where`, `diff-impact`, `search`, `map`, `roles`, `co-change`, `deps`, `impact`, `complexity`, `communities`, `branch-compare`, `audit`, `triage`, `check`, `dataflow`, `cfg`, `ast`, `exports`, `children`) |
340
464
  | `--depth <n>` | Transitive trace depth (default varies by command) |
341
465
  | `-j, --json` | Output as JSON |
342
466
  | `-v, --verbose` | Enable debug output |
343
467
  | `--engine <engine>` | Parser engine: `native`, `wasm`, or `auto` (default: `auto`) |
344
- | `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class`, `struct`, `enum`, `trait`, `record`, `module` (`fn`, `context`, `search`) |
468
+ | `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class`, `interface`, `type`, `struct`, `enum`, `trait`, `record`, `module`, `parameter`, `property`, `constant` |
345
469
  | `-f, --file <path>` | Scope to a specific file (`query`, `context`, `where`) |
470
+ | `--mode <mode>` | Search mode: `hybrid` (default), `semantic`, or `keyword` (`search`) |
471
+ | `--ndjson` | Output as newline-delimited JSON (one object per line) |
472
+ | `--limit <n>` | Limit number of results |
473
+ | `--offset <n>` | Skip first N results (pagination) |
346
474
  | `--rrf-k <n>` | RRF smoothing constant for multi-query search (default 60) |
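
The hybrid search mode and the `--rrf-k` flag both refer to Reciprocal Rank Fusion. The sketch below shows the general RRF formula, in which each result scores `1 / (k + rank)` in every ranked list it appears in and the scores are summed. It is a standalone illustration, not codegraph's implementation: the function name `rrf_fuse` and the sample result lists are hypothetical, and only the default `k = 60` is taken from the flag table above.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; k=60 mirrors the documented
    default of --rrf-k.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Made-up result lists standing in for a keyword ranking and a semantic ranking.
keyword_hits = ["parseConfig", "loadConfig", "readFile"]
semantic_hits = ["loadConfig", "validateConfig", "parseConfig"]

fused = rrf_fuse([keyword_hits, semantic_hits])
print(fused)
```

A symbol that ranks reasonably well in both lists (`loadConfig` here) ends up ahead of one that tops only a single list. A larger `k` flattens the difference between adjacent ranks.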
347
475
 
348
476
  ## 🌐 Language Support
@@ -375,10 +503,11 @@ codegraph registry remove <name> # Unregister
375
503
  ```
376
504
 
377
505
  1. **Parse** — tree-sitter parses every source file into an AST (native Rust engine or WASM fallback)
378
- 2. **Extract** — Functions, classes, methods, interfaces, imports, exports, and call sites are extracted
506
+ 2. **Extract** — Functions, classes, methods, interfaces, imports, exports, call sites, parameters, properties, and constants are extracted
379
507
  3. **Resolve** — Imports are resolved to actual files (handles ESM conventions, `tsconfig.json` path aliases, `baseUrl`)
380
- 4. **Store** — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries
381
- 5. **Query** — All queries run locally against the SQLite DB, typically under 100ms
508
+ 4. **Store** — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries, plus structural edges (`contains`, `parameter_of`, `receiver`)
509
+ 5. **Analyze** (opt-in) — Complexity metrics, control flow graphs (`--cfg`), dataflow edges (`--dataflow`), and AST node storage
510
+ 6. **Query** — All queries run locally against the SQLite DB — typically under 100ms
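
The store and query steps above can be pictured with a minimal sketch of a nodes-and-edges graph in SQLite. The table layout, column names, and file paths below are assumptions made for illustration and are not codegraph's actual schema. The function names reuse the `handleAuth()` → `validateToken()` → `decryptJWT()` chain from the README. The recursive query shows how a transitive caller lookup, the kind of answer `fn-impact` reports, might be expressed over such a graph.

```python
import sqlite3

# In-memory database with a hypothetical two-table graph layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file TEXT);
CREATE TABLE edges (source_id INTEGER, target_id INTEGER, kind TEXT);
""")

# Symbols as nodes (file paths are made up for the example).
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, "handleAuth", "function", "src/auth.js"),
    (2, "validateToken", "function", "src/token.js"),
    (3, "decryptJWT", "function", "src/jwt.js"),
])

# Call relationships as edges: source calls target.
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, 2, "calls"),
    (2, 3, "calls"),
])


def transitive_callers(conn, name):
    """Return every function that directly or indirectly calls `name`."""
    rows = conn.execute("""
        WITH RECURSIVE callers(id) AS (
            SELECT e.source_id FROM edges e
            JOIN nodes n ON n.id = e.target_id
            WHERE n.name = ? AND e.kind = 'calls'
            UNION  -- UNION (not UNION ALL) de-duplicates, so cycles terminate
            SELECT e.source_id FROM edges e
            JOIN callers c ON e.target_id = c.id
            WHERE e.kind = 'calls'
        )
        SELECT n.name FROM nodes n JOIN callers c ON n.id = c.id
        ORDER BY n.name
    """, (name,)).fetchall()
    return [row[0] for row in rows]


callers = transitive_callers(conn, "decryptJWT")
print(callers)
```

Keeping the graph in one local SQLite file means a lookup like this is a single indexed query and needs no re-parsing of source files.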
382
511
 
383
512
  ### Incremental Rebuilds
384
513
 
@@ -419,18 +548,18 @@ Codegraph also extracts symbols from common callback patterns: Commander `.comma
419
548
 
420
549
  ## 📊 Performance
421
550
 
422
- Self-measured on every release via CI ([build benchmarks](generated/BUILD-BENCHMARKS.md) | [embedding benchmarks](generated/EMBEDDING-BENCHMARKS.md)):
551
+ Self-measured on every release via CI ([build benchmarks](generated/benchmarks/BUILD-BENCHMARKS.md) | [embedding benchmarks](generated/benchmarks/EMBEDDING-BENCHMARKS.md)):
423
552
 
424
553
  | Metric | Latest |
425
554
  |---|---|
426
- | Build speed (native) | **2 ms/file** |
427
- | Build speed (WASM) | **8.4 ms/file** |
428
- | Query time | **2ms** |
555
+ | Build speed (native) | **1.9 ms/file** |
556
+ | Build speed (WASM) | **8.3 ms/file** |
557
+ | Query time | **3ms** |
429
558
  | No-op rebuild (native) | **4ms** |
430
- | 1-file rebuild (native) | **97ms** |
431
- | Query: fn-deps | **2.1ms** |
432
- | Query: path | **1.2ms** |
433
- | ~50,000 files (est.) | **~100.0s build** |
559
+ | 1-file rebuild (native) | **124ms** |
560
+ | Query: fn-deps | **1.4ms** |
561
+ | Query: path | **1.4ms** |
562
+ | ~50,000 files (est.) | **~95.0s build** |
434
563
 
435
564
  Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files.
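
As a worked example of the per-file normalization, the estimated row in the table is consistent with multiplying the per-file build speed by the file count. The helper below is a hypothetical illustration of that arithmetic, not part of codegraph, and it assumes a simple linear model with no fixed overhead.

```javascript
// Hypothetical helper: linear extrapolation from a per-file build speed.
function estimateBuildSeconds(msPerFile, fileCount) {
  return (msPerFile * fileCount) / 1000;
}

// Native engine, 3.0.0 figure from the table: 1.9 ms/file
console.log(estimateBuildSeconds(1.9, 50_000).toFixed(1)); // "95.0"

// Native engine, 2.5.1 figure from the table: 2 ms/file
console.log(estimateBuildSeconds(2, 50_000).toFixed(1)); // "100.0"
```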
436
565
 
@@ -452,7 +581,7 @@ Optional: `@huggingface/transformers` (semantic search), `@modelcontextprotocol/
452
581
 
453
582
  ### MCP Server
454
583
 
455
- Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server with 24 tools (25 in multi-repo mode), so AI assistants can query your dependency graph directly:
584
+ Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server with 30 tools (31 in multi-repo mode), so AI assistants can query your dependency graph directly:
456
585
 
457
586
  ```bash
458
587
  codegraph mcp # Single-repo mode (default) — only local project
@@ -475,7 +604,7 @@ This project uses codegraph. The database is at `.codegraph/graph.db`.
475
604
 
476
605
  ### Before modifying code, always:
477
606
  1. `codegraph where <name>` — find where the symbol lives
478
- 2. `codegraph explain <file-or-function>` — understand the structure
607
+ 2. `codegraph audit <file-or-function> --quick` — understand the structure
479
608
  3. `codegraph context <name> -T` — get full context (source, deps, callers)
480
609
  4. `codegraph fn-impact <name> -T` — check blast radius before editing
481
610
 
@@ -485,7 +614,7 @@ This project uses codegraph. The database is at `.codegraph/graph.db`.
485
614
  ### Other useful commands
486
615
  - `codegraph build .` — rebuild the graph (incremental by default)
487
616
  - `codegraph map` — module overview
488
- - `codegraph fn <name> -T` — function call chain
617
+ - `codegraph query <name> -T` — function call chain (callers + callees)
489
618
  - `codegraph path <from> <to> -T` — shortest call path between two symbols
490
619
  - `codegraph deps <file>` — file-level dependencies
491
620
  - `codegraph roles --role dead -T` — find dead code (unreferenced symbols)
@@ -493,8 +622,23 @@ This project uses codegraph. The database is at `.codegraph/graph.db`.
493
622
  - `codegraph co-change <file>` — files that historically change together
494
623
  - `codegraph complexity -T` — per-function complexity metrics (cognitive, cyclomatic, MI)
495
624
  - `codegraph communities --drift -T` — module boundary drift analysis
496
- - `codegraph manifesto -T` — pass/fail rule check (CI gate, exit code 1 on fail)
497
- - `codegraph search "<query>"` — semantic search (requires `codegraph embed`)
625
+ - `codegraph check -T` — pass/fail rule check (CI gate, exit code 1 on fail)
626
+ - `codegraph audit <target> -T` — combined structural summary + impact + health in one report
627
+ - `codegraph triage -T` — ranked audit priority queue
628
+ - `codegraph triage --level file -T` — file-level hotspot analysis
629
+ - `codegraph check --staged` — CI validation predicates (exit code 0/1)
630
+ - `codegraph batch target1 target2` — batch query multiple targets at once
631
+ - `codegraph owners [target]` — CODEOWNERS mapping for symbols
632
+ - `codegraph snapshot save <name>` — checkpoint the graph DB before refactoring
633
+ - `codegraph branch-compare main HEAD -T` — structural diff between two refs (added/removed/changed symbols)
634
+ - `codegraph exports <file>` — per-symbol consumer analysis (who calls each export)
635
+ - `codegraph children <name>` — list parameters, properties, constants of a symbol
636
+ - `codegraph dataflow <name>` — data flow edges (flows_to, returns, mutates)
637
+ - `codegraph cfg <name>` — intraprocedural control flow graph
638
+ - `codegraph ast <pattern>` — search stored AST nodes (calls, new, string, regex, throw, await)
639
+ - `codegraph plot` — interactive HTML dependency graph viewer
640
+ - `codegraph search "<query>"` — hybrid search (requires `codegraph embed`)
641
+ - `codegraph search "<query>" --mode keyword` — BM25 keyword search
498
642
  - `codegraph cycles` — check for circular dependencies
499
643
 
500
644
  ### Flags
@@ -576,7 +720,7 @@ Create a `.codegraphrc.json` in your project root to customize behavior:
576
720
 
577
721
  ### Manifesto rules
578
722
 
579
- Configure pass/fail thresholds for `codegraph manifesto`:
723
+ Configure pass/fail thresholds for `codegraph check` (manifesto mode):
580
724
 
581
725
  ```json
582
726
  {
@@ -592,7 +736,7 @@ Configure pass/fail thresholds for `codegraph manifesto`:
592
736
  }
593
737
  ```
594
738
 
595
- When any function exceeds a `fail` threshold, `codegraph manifesto` exits with code 1 — perfect for CI gates.
739
+ When any function exceeds a `fail` threshold, `codegraph check` exits with code 1 — perfect for CI gates.
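
The gate behaviour described above can be sketched as a small threshold check. This is an illustration under stated assumptions, not codegraph's implementation: the metric names, the `warn`/`fail` rule shape, the numeric limits, and the `checkFunctions` helper are all hypothetical, and the actual configuration keys should be taken from the JSON example.

```javascript
// Hypothetical rules: metric names, limits, and the warn/fail shape are assumed.
const rules = {
  cognitive: { warn: 15, fail: 30 },
  cyclomatic: { warn: 10, fail: 20 },
};

// Returns every violation plus the exit code a CI gate would use:
// 1 if any function exceeds a `fail` threshold, otherwise 0.
function checkFunctions(functions, ruleSet) {
  const violations = [];
  for (const fn of functions) {
    for (const [metric, { warn, fail }] of Object.entries(ruleSet)) {
      const value = fn.metrics[metric];
      if (value === undefined) continue; // metric not measured for this function
      if (value > fail) {
        violations.push({ name: fn.name, metric, value, level: 'fail' });
      } else if (value > warn) {
        violations.push({ name: fn.name, metric, value, level: 'warn' });
      }
    }
  }
  const exitCode = violations.some((v) => v.level === 'fail') ? 1 : 0;
  return { violations, exitCode };
}

// Made-up sample data for illustration only.
const sample = [
  { name: 'resolveImports', metrics: { cognitive: 34, cyclomatic: 12 } },
  { name: 'formatRow', metrics: { cognitive: 3, cyclomatic: 2 } },
];

const report = checkFunctions(sample, rules);
console.log(report.exitCode); // 1
```

Warnings are reported but do not change the exit code in this sketch, so only `fail` violations would block a pipeline; a wrapper script could pass `report.exitCode` to `process.exit`.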
596
740
 
597
741
  ### LLM credentials
598
742
 
@@ -616,13 +760,14 @@ Works with any secret manager: 1Password CLI (`op`), Bitwarden (`bw`), `pass`, H
616
760
  Codegraph also exports a full API for use in your own tools:
617
761
 
618
762
  ```js
619
- import { buildGraph, queryNameData, findCycles, exportDOT } from '@optave/codegraph';
763
+ import { buildGraph, queryNameData, findCycles, exportDOT, normalizeSymbol } from '@optave/codegraph';
620
764
 
621
765
  // Build the graph
622
766
  buildGraph('/path/to/project');
623
767
 
624
768
  // Query programmatically
625
769
  const results = queryNameData('myFunction', '/path/to/.codegraph/graph.db');
770
+ // All query results use normalizeSymbol for a stable 7-field schema
626
771
  ```
627
772
 
628
773
  ```js
@@ -659,25 +804,7 @@ const { results: fused } = await multiSearchData(
659
804
  - **No full type inference** — parses `.d.ts` interfaces but doesn't use TypeScript's type checker for overload resolution
660
805
  - **Dynamic calls are best-effort** — complex computed property access and `eval` patterns are not resolved
661
806
  - **Python imports** — resolves relative imports but doesn't follow `sys.path` or virtual environment packages
662
-
663
- ## 🔍 How Codegraph Compares
664
-
665
- <sub>Last verified: February 2026. Full analysis: <a href="generated/COMPETITIVE_ANALYSIS.md">COMPETITIVE_ANALYSIS.md</a></sub>
666
-
667
- | Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) |
668
- |---|:---:|:---:|:---:|:---:|:---:|:---:|
669
- | Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
670
- | Multi-language | **11** | **14** | **32** | Multi | **~10** | **9** |
671
- | Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | — | — |
672
- | MCP / AI agent support | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** |
673
- | Git diff impact | **Yes** | — | — | — | — | **Yes** |
674
- | Git co-change analysis | **Yes** | — | — | — | — | — |
675
- | Dead code / role classification | **Yes** | — | **Yes** | — | — | — |
676
- | Semantic search | **Yes** | — | **Yes** | **Yes** | — | **Yes** |
677
- | Watch mode | **Yes** | — | **Yes** | — | — | — |
678
- | Zero config, no Docker/JVM | **Yes** | — | **Yes** | — | — | — |
679
- | Works without API keys | **Yes** | **Yes** | **Yes** | — | **Yes** | **Yes** |
680
- | Commercial use (Apache/MIT) | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — |
807
+ - **Dataflow analysis** — currently JS/TS only; intraprocedural (single-function scope), not interprocedural
681
808
 
682
809
  ## 🗺️ Roadmap
683
810
 
@@ -685,12 +812,12 @@ See **[ROADMAP.md](docs/roadmap/ROADMAP.md)** for the full development roadmap a
685
812
 
686
813
  1. ~~**Rust Core**~~ — **Complete** (v1.3.0) — native tree-sitter parsing via napi-rs, parallel multi-core parsing, incremental re-parsing, import resolution & cycle detection in Rust
687
814
  2. ~~**Foundation Hardening**~~ — **Complete** (v1.4.0) — parser registry, 12-tool MCP server with multi-repo support, test coverage 62%→75%, `apiKeyCommand` secret resolution, global repo registry
688
- 3. **Architectural Refactoring** — parser plugin system, repository pattern, pipeline builder, engine strategy, domain errors, curated API
689
- 4. **Intelligent Embeddings** — LLM-generated descriptions, hybrid search
815
+ 3. ~~**Deep Analysis**~~ — **Complete** (v3.0.0): dataflow analysis (flows_to, returns, mutates), intraprocedural CFG for all 11 languages, stored AST nodes, expanded node/edge types (parameter, property, constant, contains, parameter_of, receiver), GraphML/GraphSON/Neo4j CSV export, interactive HTML viewer, CLI consolidation, stable JSON schema
816
+ 4. **Architectural Refactoring** — parser plugin system, repository pattern, pipeline builder, engine strategy, domain errors, curated API
690
817
  5. **Natural Language Queries** — `codegraph ask` command, conversational sessions
691
818
  6. **Expanded Language Support** — 8 new languages (12 → 20)
692
819
  7. **GitHub Integration & CI** — reusable GitHub Action, PR review, SARIF output
693
- 8. **Visualization & Advanced** — web UI, monorepo support, agentic search
820
+ 8. **TypeScript Migration** — gradual migration from JS to TypeScript
694
821
 
695
822
  ## 🤝 Contributing
696
823
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@optave/codegraph",
3
- "version": "2.5.1",
3
+ "version": "3.0.0",
4
4
  "description": "Local code graph CLI — parse codebases with tree-sitter, build dependency graphs, query them",
5
5
  "type": "module",
6
6
  "main": "src/index.js",
@@ -71,15 +71,16 @@
71
71
  },
72
72
  "optionalDependencies": {
73
73
  "@modelcontextprotocol/sdk": "^1.0.0",
74
- "@optave/codegraph-darwin-arm64": "2.5.1",
75
- "@optave/codegraph-darwin-x64": "2.5.1",
76
- "@optave/codegraph-linux-x64-gnu": "2.5.1",
77
- "@optave/codegraph-win32-x64-msvc": "2.5.1"
74
+ "@optave/codegraph-darwin-arm64": "3.0.0",
75
+ "@optave/codegraph-darwin-x64": "3.0.0",
76
+ "@optave/codegraph-linux-x64-gnu": "3.0.0",
77
+ "@optave/codegraph-win32-x64-msvc": "3.0.0"
78
78
  },
79
79
  "devDependencies": {
80
80
  "@biomejs/biome": "^2.4.4",
81
- "@commitlint/cli": "^19.8",
82
- "@commitlint/config-conventional": "^19.8",
81
+ "@commitlint/cli": "^20.4",
82
+ "@commitlint/config-conventional": "^20.0",
83
+ "@huggingface/transformers": "^3.8.1",
83
84
  "@tree-sitter-grammars/tree-sitter-hcl": "^1.2.0",
84
85
  "@vitest/coverage-v8": "^4.0.18",
85
86
  "commit-and-tag-version": "^12.5",