@optave/codegraph 2.0.0 → 2.1.1-dev.00f091c

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -45,19 +45,18 @@ Most tools in this space can't do that:
45
45
 
46
46
  | Problem | Who has it | Why it breaks on every commit |
47
47
  |---|---|---|
48
- | **Full re-index on every change** | code-graph-rag, CodeMCP, axon, autodev-codebase | No file-level change tracking. Change one file → re-parse and re-insert the entire codebase. On a 3,000-file project, that's 30+ seconds per commit minimum |
49
- | **Cloud API calls baked into the pipeline** | code-graph-rag, autodev-codebase, Claude-code-memory, CodeRAG | Embeddings are generated through cloud APIs (OpenAI, Voyage AI, Gemini). Every rebuild = API round-trips for every function. Slow, expensive, and rate-limited. You can't put this in a commit hook |
48
+ | **Full re-index on every change** | code-graph-rag, CodeMCP, axon, joern, cpg, GitNexus | No file-level change tracking. Change one file → re-parse and re-insert the entire codebase. On a 3,000-file project, that's 30+ seconds per commit minimum |
49
+ | **Cloud API calls baked into the pipeline** | code-graph-rag, CodeRAG | Embeddings are generated through cloud APIs (OpenAI, Voyage AI, Gemini). Every rebuild = API round-trips for every function. Slow, expensive, and rate-limited. You can't put this in a commit hook |
50
50
  | **Heavy infrastructure that's slow to restart** | code-graph-rag (Memgraph), axon (KuzuDB), badger-graph (Dgraph) | External databases add latency to every write. Bulk-inserting a full graph into Memgraph is not a sub-second operation |
51
- | **No persistence between runs** | glimpse, pyan, cflow | Re-parse from scratch every time. No database, no delta, no incremental anything |
51
+ | **No persistence between runs** | pyan, cflow | Re-parse from scratch every time. No database, no delta, no incremental anything |
52
52
 
53
- **Codegraph solves this with incremental builds:**
53
+ **Codegraph solves this with three-tier incremental change detection:**
54
54
 
55
- 1. Every file gets an MD5 hash stored in SQLite
56
- 2. On rebuild, only files whose hash changed get re-parsed
57
- 3. Stale nodes and edges for changed files are cleaned, then re-inserted
58
- 4. Everything else is untouched
55
+ 1. **Tier 0 — Journal (O(changed)):** If `codegraph watch` was running, a change journal records exactly which files were touched. The next build reads the journal and only processes those files — zero filesystem scanning
56
+ 2. **Tier 1 — mtime+size (O(n) stats, O(changed) reads):** No journal? Codegraph stats every file and compares mtime + size against stored values. Matching files are skipped without reading a single byte — 10-100x cheaper than hashing
57
+ 3. **Tier 2 — Hash (O(changed) reads):** Files that fail the mtime/size check are read and MD5-hashed. Only files whose hash actually changed get re-parsed and re-inserted
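The three tiers above can be sketched in a few lines. This is a minimal illustration, not codegraph's actual implementation — the `prior` map stands in for state the real tool keeps in SQLite, and all names are hypothetical:

```javascript
// Hypothetical sketch of three-tier change detection. `prior` maps each
// file path to { mtimeMs, size, md5 } recorded at the previous build.
import { statSync, readFileSync } from "node:fs";
import { createHash } from "node:crypto";

function changedFiles(allFiles, prior, journal) {
  // Tier 0: a watcher journal names exactly what was touched — O(changed),
  // no filesystem scan at all
  if (journal && journal.length > 0) return [...new Set(journal)];

  const changed = [];
  for (const file of allFiles) {
    const prev = prior.get(file);
    const st = statSync(file);
    // Tier 1: mtime + size both match → skip without reading a single byte
    if (prev && prev.mtimeMs === st.mtimeMs && prev.size === st.size) continue;
    // Tier 2: stat differs → read and hash to confirm a real content change
    const md5 = createHash("md5").update(readFileSync(file)).digest("hex");
    if (!prev || prev.md5 !== md5) changed.push(file);
  }
  return changed;
}
```

Note the ordering: each tier is strictly cheaper than the next, so the expensive read-and-hash path only runs for files that already failed the free stat comparison.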
59
58
 
60
- **Result:** change one file in a 3,000-file project → rebuild completes in **under a second**. Put it in a commit hook, a file watcher, or let your AI agent trigger it. The graph is always current.
59
+ **Result:** change one file in a 3,000-file project → rebuild completes in **under a second**. With watch mode active, rebuilds are near-instant — the journal makes the build proportional to the number of changed files, not the size of the codebase. Put it in a commit hook, a file watcher, or let your AI agent trigger it. The graph is always current.
61
60
 
62
61
  And because the core pipeline is pure local computation (tree-sitter + SQLite), there are no API calls, no network latency, and no cost. LLM-powered features (semantic search, richer embeddings) are a separate optional layer — they enhance the graph but never block it from being current.
63
62
 
@@ -71,26 +70,27 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
71
70
 
72
71
  ### Feature comparison
73
72
 
74
- | Capability | codegraph | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [glimpse](https://github.com/seatedro/glimpse) | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | [axon](https://github.com/harshkedia177/axon) | [autodev-codebase](https://github.com/anrgct/autodev-codebase) | [arbor](https://github.com/Anandb71/arbor) | [Claude-code-memory](https://github.com/Durafen/Claude-code-memory) |
73
+ | Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | [axon](https://github.com/harshkedia177/axon) |
75
74
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
76
- | Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | |
77
- | Multi-language | **11** | Multi | Multi | SCIP langs | Few | **40+** | Multi | |
78
- | Semantic search | **Yes** | **Yes** | | — | — | **Yes** | **Yes** | **Yes** |
79
- | MCP / AI agent support | **Yes** | **Yes** | — | **Yes** | | | **Yes** | **Yes** |
80
- | Git diff impact | **Yes** | — | — | — | **Yes** | — | | — |
81
- | Watch mode | **Yes** | — | | — | — | — | — | — |
82
- | CI workflow included | **Yes** | — | | — | — | — | — | |
83
- | Cycle detection | **Yes** | — | | — | **Yes** | — | — | — |
84
- | Incremental rebuilds | **Yes** | — | | — | — | — | — | — |
85
- | Zero config | **Yes** | — | **Yes** | — | — | — | **Yes** | — |
86
- | LLM-optional (works without API keys) | **Yes** | | **Yes** | **Yes** | **Yes** | | **Yes** | — |
87
- | Open source | **Yes** | Yes | Yes | Custom | | — | Yes | — |
75
+ | Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |
76
+ | Multi-language | **11** | **14** | **32** | Multi | **~10** | **9** | SCIP langs | Few |
77
+ | Semantic search | **Yes** | — | **Yes** | **Yes** | — | **Yes** | | |
78
+ | MCP / AI agent support | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — |
79
+ | Git diff impact | **Yes** | — | — | — | — | **Yes** | — | **Yes** |
80
+ | Watch mode | **Yes** | — | **Yes** | — | — | — | — | — |
81
+ | Cycle detection | **Yes** | — | **Yes** | — | — | — | — | **Yes** |
82
+ | Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | | — | — | — |
83
+ | Zero config | **Yes** | — | **Yes** | — | — | — | — | — |
84
+ | Embeddable JS library (`npm install`) | **Yes** | — | | — | — | — | | — |
85
+ | LLM-optional (works without API keys) | **Yes** | **Yes** | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** |
86
+ | Commercial use allowed | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — | | — |
87
+ | Open source | **Yes** | Yes | Yes | Yes | Yes | Yes | Custom | — |
88
88
 
89
89
  ### What makes codegraph different
90
90
 
91
91
  | | Differentiator | In practice |
92
92
  |---|---|---|
93
- | **⚡** | **Always-fresh graph** | Sub-second incremental rebuilds via file-hash tracking. Run on every commit, every save, or in watch mode; the graph is never stale. Competitors re-index everything from scratch |
93
+ | **⚡** | **Always-fresh graph** | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds even on large codebases. Competitors re-index everything from scratch; Merkle-tree approaches still require O(n) filesystem scanning |
94
94
  | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider for richer embeddings and AI-powered search — your code only goes to the provider you already chose |
95
95
 | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows that 14 callers across 9 files break if `decryptJWT` changes |
96
96
  | **🤖** | **Built for AI agents** | 13-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default, your code doesn't leak to other projects |
@@ -100,22 +100,58 @@ Most code graph tools make you choose: **fast local analysis with no AI, or powe
100
100
 
101
101
  ### How other tools compare
102
102
 
103
- The key question is: **can you rebuild your graph on every commit in a large codebase without it costing money or taking minutes?** Most tools in this space either re-index everything from scratch (slow), require cloud API calls for core features (costly), or both. Codegraph's incremental builds keep the graph current in milliseconds, and the core pipeline needs no API keys at all. LLM-powered features are opt-in, using whichever provider you already work with.
103
+ The key question is: **can you rebuild your graph on every commit in a large codebase without it costing money or taking minutes?** Most tools in this space either re-index everything from scratch (slow), require cloud API calls for core features (costly), or both. Codegraph's three-tier incremental detection achieves true O(changed) in the best case: when the watcher is running, rebuilds are proportional only to the number of files that changed, not the size of the codebase. The core pipeline needs no API keys at all. LLM-powered features are opt-in, using whichever provider you already work with.
104
104
 
105
105
  | Tool | What it does well | The tradeoff |
106
106
  |---|---|---|
107
+ | [joern](https://github.com/joernio/joern) | Full CPG (AST + CFG + PDG) for vulnerability discovery, Scala query DSL, 14 languages, daily releases | No incremental builds — full re-parse on every change. Requires JDK 21, no built-in MCP, no watch mode |
108
+ | [narsil-mcp](https://github.com/postrv/narsil-mcp) | 90 MCP tools, 32 languages, taint analysis, SBOM, dead code, neural search, Merkle-tree incremental indexing, single ~30MB binary | Merkle trees still require O(n) filesystem scanning on every rebuild. Primarily MCP-only — no standalone CLI query interface. Neural search requires API key or ONNX source build |
107
109
  | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | Graph RAG with Memgraph, multi-provider AI, semantic search, code editing via AST | No incremental rebuilds — full re-index + re-embed through cloud APIs on every change. Requires Docker |
108
- | [glimpse](https://github.com/seatedro/glimpse) | Clipboard-first LLM context tool, call graphs, LSP resolution, token counting | Context-packing tool, not a dependency graph: no persistence, no MCP, no incremental updates |
110
+ | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | Formal Code Property Graph (AST + CFG + PDG + DFG), ~10 languages, MCP module, LLVM IR support, academic specifications | No incremental builds. Requires JVM + Gradle, no zero config, no watch mode |
111
+ | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | Knowledge graph with precomputed structural intelligence, 7 MCP tools, hybrid search (BM25 + semantic + RRF), clustering, process tracing | Full 6-phase pipeline re-run on changes. KuzuDB graph DB, browser mode limited to ~5,000 files. **PolyForm NC — no commercial use** |
109
112
  | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | SCIP compiler-grade indexing, compound operations (83% token savings), secret scanning | No incremental builds. Custom license, requires SCIP toolchains per language |
110
113
  | [axon](https://github.com/harshkedia177/axon) | 11-phase pipeline, KuzuDB, community detection, dead code, change coupling | Full pipeline re-run on changes. No license, Python-only, no MCP |
111
- | [autodev-codebase](https://github.com/anrgct/autodev-codebase) | 40+ languages, interactive Cytoscape.js visualization, LLM reranking | Re-embeds through cloud APIs on changes. No license, complex setup |
112
- | [arbor](https://github.com/Anandb71/arbor) | Native GUI, confidence scoring, architectural role classification, fuzzy search | GUI-focused — no CLI pipeline, no watch mode, no CI integration |
113
- | [Claude-code-memory](https://github.com/Durafen/Claude-code-memory) | Persistent codebase memory for Claude Code, Memory Guard quality gate | Requires Voyage AI (cloud) + Qdrant (Docker) for core features |
114
114
  | [Madge](https://github.com/pahen/madge) | Simple file-level JS/TS dependency graphs | No function-level analysis, no impact tracing, JS/TS only |
115
115
  | [dependency-cruiser](https://github.com/sverweij/dependency-cruiser) | Architectural rule validation for JS/TS | Module-level only (function-level explicitly out of scope), requires config |
116
116
  | [Nx graph](https://nx.dev/) | Monorepo project-level dependency graph | Requires Nx workspace, project-level only (not file or function) |
117
117
  | [pyan](https://github.com/Technologicat/pyan) / [cflow](https://www.gnu.org/software/cflow/) | Function-level call graphs | Single-language each (Python / C only), no persistence, no queries |
118
118
 
119
+ ### Codegraph vs. Narsil-MCP: How to Decide
120
+
121
+ If you are looking for local code intelligence over MCP, the closest alternative to `codegraph` is [postrv/narsil-mcp](https://github.com/postrv/narsil-mcp). Both projects aim to give AI agents deep context about your codebase, but they approach the problem with fundamentally different philosophies.
122
+
123
+ Here is a cold, analytical breakdown to help you decide which tool fits your workflow.
124
+
125
+ #### The Core Difference
126
+
127
+ * **Codegraph is a surgical scalpel.** It does one thing exceptionally well: building an always-fresh, function-level dependency graph in SQLite and exposing it to AI agents with zero fluff.
128
+ * **Narsil-MCP is a Swiss Army knife.** It is a sprawling, "batteries-included" intelligence server that includes everything from taint analysis and SBOM generation to SPARQL knowledge graphs.
129
+
130
+ #### Feature Comparison
131
+
132
+ | Aspect | Optave Codegraph | Narsil-MCP |
133
+ | :--- | :--- | :--- |
134
+ | **Philosophy** | Lean, deterministic, AI-optimized | Comprehensive, feature-dense |
135
+ | **AI Tool Count** | 13 focused tools | 90 distinct tools |
136
+ | **Language Support** | 11 languages | 32 languages |
137
+ | **Primary Interface** | CLI-first with MCP integration | MCP-first (CLI is secondary) |
138
+ | **Supply Chain Risk** | Low (minimal dependency tree) | Higher (embedded ML models and scanners pull in a large dependency tree) |
139
+ | **Graph Updates** | **Three-tier O(changed)** — journal → mtime+size → hash. With watch mode, only changed files are touched | Merkle trees — O(n) filesystem scan on every rebuild to recompute tree hashes |
140
+
141
+ #### Choose Codegraph if:
142
+
143
+ * **You need the fastest possible incremental rebuilds.** Codegraph’s three-tier change detection (journal → mtime+size → hash) achieves true O(changed) when the watcher is running — only touched files are processed. Narsil’s Merkle trees still require O(n) filesystem scanning to recompute hashes on every rebuild, even when nothing changed. On a 3,000-file project, this is the difference between near-instant and noticeable.
144
+ * **You want to optimize AI agent reasoning.** Large Language Models degrade in performance and hallucinate when overwhelmed with choices. Codegraph’s tight 13-tool surface area ensures agents quickly understand their capabilities without wasting context window tokens.
145
+ * **You are concerned about supply chain attacks.** To support 90 tools, SBOMs, and neural embeddings, a tool must pull in a massive dependency tree. Codegraph keeps its dependencies minimal, dramatically reducing the risk of malicious code sneaking onto your machine.
146
+ * **You want deterministic blast-radius checks.** Features like `diff-impact` are built specifically to tell you exactly how a changed function cascades through your codebase before you merge a PR.
147
+ * **You value a strong standalone CLI.** You want to query your code graph locally without necessarily spinning up an AI agent.
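The blast-radius idea behind `diff-impact` can be illustrated with a short sketch. This is a toy walk over reverse call edges, not codegraph's actual query — the in-memory edge list stands in for the real SQLite-backed graph, and all names are illustrative:

```javascript
// Toy blast-radius walk: given call edges [caller, callee], find every
// function that transitively calls the changed one.
function blastRadius(edges, changed) {
  // Invert the edges: callee -> list of direct callers
  const callers = new Map();
  for (const [caller, callee] of edges) {
    if (!callers.has(callee)) callers.set(callee, []);
    callers.get(callee).push(caller);
  }
  // Breadth-first search upward from the changed function
  const impacted = new Set();
  const queue = [changed];
  while (queue.length) {
    const fn = queue.shift();
    for (const caller of callers.get(fn) ?? []) {
      if (!impacted.has(caller)) {
        impacted.add(caller);
        queue.push(caller);
      }
    }
  }
  return impacted;
}

// The README's example chain, plus one extra caller for illustration
const edges = [
  ["handleAuth", "validateToken"],
  ["validateToken", "decryptJWT"],
  ["auditLog", "decryptJWT"],
];
// Changing decryptJWT impacts validateToken, auditLog, and handleAuth
```

Because the walk is a plain graph traversal over stored edges, the result is deterministic: the same diff always yields the same impact set, with no model in the loop.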
148
+
149
+ #### Choose Narsil-MCP if:
150
+
151
+ * **You want security and code intelligence together.** You don't want a separate MCP server for security and prefer an all-in-one solution.
152
+ * **You use niche languages.** Your codebase relies heavily on languages outside of Codegraph's core 11 (e.g., Fortran, Erlang, Zig, Swift).
153
+ * **You are willing to manage tool presets.** Because 90 tools will overload an AI's context window, you don't mind manually configuring preset files (like "Minimal" or "Balanced") to restrict what the AI can see depending on your editor.
154
+
119
155
  ---
120
156
 
121
157
  ## 🚀 Quick Start
@@ -229,10 +265,10 @@ A single trailing semicolon is ignored (falls back to single-query mode). The `-
229
265
 
230
266
  | Flag | Model | Dimensions | Size | License | Notes |
231
267
  |---|---|---|---|---|---|
232
- | `minilm` (default) | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
268
+ | `minilm` | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
233
269
  | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
234
270
  | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
235
- | `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
271
+ | `jina-code` (default) | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
236
272
  | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
237
273
  | `nomic-v1.5` | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | Improved nomic, Matryoshka dimensions |
238
274
  | `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |
@@ -340,15 +376,16 @@ Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]
340
376
 
341
377
  ## 📊 Performance
342
378
 
343
- Benchmarked on a ~3,200-file TypeScript project:
379
+ Self-measured on every release via CI ([full history](generated/BENCHMARKS.md)):
344
380
 
345
- | Metric | Value |
381
+ | Metric | Latest |
346
382
  |---|---|
347
- | Build time | ~30s |
348
- | Nodes | 19,000+ |
349
- | Edges | 120,000+ |
350
- | Query time | <100ms |
351
- | DB size | ~5 MB |
383
+ | Build speed (native) | **2.5 ms/file** |
384
+ | Build speed (WASM) | **5 ms/file** |
385
+ | Query time | **1ms** |
386
+ | ~50,000 files (est.) | **~125 s build** |
387
+
388
+ Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files.
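The per-file normalization makes the large-project estimate a simple extrapolation. A quick sanity check of the table's 50,000-file figure, assuming build time scales roughly linearly with file count (the assumption the estimate itself makes):

```javascript
// Extrapolate a full-build time from the per-file metric in the table above.
const msPerFileNative = 2.5; // native build speed, ms per file
const files = 50_000;
const estSeconds = (msPerFileNative * files) / 1000;
console.log(estSeconds); // 125 — matches the table's ~125 s estimate
```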
352
389
 
353
390
  ## 🤖 AI Agent Integration
354
391
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@optave/codegraph",
3
- "version": "2.0.0",
3
+ "version": "2.1.1-dev.00f091c",
4
4
  "description": "Local code graph CLI — parse codebases with tree-sitter, build dependency graphs, query them",
5
5
  "type": "module",
6
6
  "main": "src/index.js",
@@ -29,11 +29,11 @@
29
29
  "lint": "biome check src/ tests/",
30
30
  "lint:fix": "biome check --write src/ tests/",
31
31
  "format": "biome format --write src/ tests/",
32
- "prepare": "npm run build:wasm && husky",
32
+ "prepare": "npm run build:wasm && husky && npm run deps:tree",
33
+ "deps:tree": "node scripts/gen-deps.cjs",
33
34
  "release": "commit-and-tag-version",
34
35
  "release:dry-run": "commit-and-tag-version --dry-run",
35
- "version": "node scripts/sync-native-versions.js && git add package.json",
36
- "prepublishOnly": "npm test"
36
+ "version": "node scripts/sync-native-versions.js && git add package.json"
37
37
  },
38
38
  "keywords": [
39
39
  "codegraph",
@@ -61,19 +61,19 @@
61
61
  "optionalDependencies": {
62
62
  "@huggingface/transformers": "^3.8.1",
63
63
  "@modelcontextprotocol/sdk": "^1.0.0",
64
- "@optave/codegraph-darwin-arm64": "2.0.0",
65
- "@optave/codegraph-darwin-x64": "2.0.0",
66
- "@optave/codegraph-linux-x64-gnu": "2.0.0",
67
- "@optave/codegraph-win32-x64-msvc": "2.0.0"
64
+ "@optave/codegraph-darwin-arm64": "2.1.1-dev.00f091c",
65
+ "@optave/codegraph-darwin-x64": "2.1.1-dev.00f091c",
66
+ "@optave/codegraph-linux-x64-gnu": "2.1.1-dev.00f091c",
67
+ "@optave/codegraph-win32-x64-msvc": "2.1.1-dev.00f091c"
68
68
  },
69
69
  "devDependencies": {
70
70
  "@biomejs/biome": "^2.4.4",
71
71
  "@commitlint/cli": "^19.8",
72
72
  "@commitlint/config-conventional": "^19.8",
73
- "commit-and-tag-version": "^12.5",
74
- "husky": "^9.1",
75
73
  "@tree-sitter-grammars/tree-sitter-hcl": "^1.2.0",
76
74
  "@vitest/coverage-v8": "^4.0.18",
75
+ "commit-and-tag-version": "^12.5",
76
+ "husky": "^9.1",
77
77
  "tree-sitter-c-sharp": "^0.23.1",
78
78
  "tree-sitter-cli": "^0.26.5",
79
79
  "tree-sitter-go": "^0.23.4",