@optave/codegraph 1.1.0 → 1.4.1

package/README.md CHANGED
@@ -1,311 +1,498 @@
1
- <p align="center">
2
- <img src="https://img.shields.io/badge/codegraph-dependency%20intelligence-blue?style=for-the-badge&logo=graphql&logoColor=white" alt="codegraph" />
3
- </p>
4
-
5
- <h1 align="center">codegraph</h1>
6
-
7
- <p align="center">
8
- <strong>Local code dependency graph CLI — parse, query, and visualize your codebase at file and function level.</strong>
9
- </p>
10
-
11
- <p align="center">
12
- <a href="https://www.npmjs.com/package/codegraph"><img src="https://img.shields.io/npm/v/codegraph?style=flat-square&logo=npm&logoColor=white&label=npm" alt="npm version" /></a>
13
- <a href="https://github.com/optave/codegraph/blob/main/LICENSE"><img src="https://img.shields.io/github/license/optave/codegraph?style=flat-square&logo=opensourceinitiative&logoColor=white" alt="Apache-2.0 License" /></a>
14
- <a href="https://github.com/optave/codegraph/actions"><img src="https://img.shields.io/github/actions/workflow/status/optave/codegraph/codegraph-impact.yml?style=flat-square&logo=githubactions&logoColor=white&label=CI" alt="CI" /></a>
15
- <img src="https://img.shields.io/badge/node-%3E%3D20-339933?style=flat-square&logo=node.js&logoColor=white" alt="Node >= 20" />
16
- <img src="https://img.shields.io/badge/platform-local%20only-important?style=flat-square&logo=shield&logoColor=white" alt="Local Only" />
17
- </p>
18
-
19
- <p align="center">
20
- <a href="#-quick-start">Quick Start</a> •
21
- <a href="#-features">Features</a> •
22
- <a href="#-commands">Commands</a> •
23
- <a href="#-language-support">Languages</a> •
24
- <a href="#-ai-agent-integration">AI Integration</a> •
25
- <a href="#-ci--github-actions">CI/CD</a> •
26
- <a href="#-contributing">Contributing</a>
27
- </p>
28
-
29
- ---
30
-
31
- > **Zero network calls. Zero telemetry. Your code never leaves your machine.**
32
- >
33
- > Codegraph uses [tree-sitter](https://tree-sitter.github.io/) (via WASM — no native compilation required) to parse your codebase into an AST, extracts functions, classes, imports, and call sites, resolves dependencies, and stores everything in a local SQLite database. Query it instantly from the command line.
34
-
35
- ---
36
-
37
- ## 🚀 Quick Start
38
-
39
- ```bash
40
- # Install
41
- git clone https://github.com/optave/codegraph.git
42
- cd codegraph
43
- npm install
44
- npm link
45
-
46
- # Build a graph for any project
47
- cd your-project
48
- codegraph build # .codegraph/graph.db created
49
-
50
- # Start exploring
51
- codegraph map # see most-connected files
52
- codegraph query myFunc # find any function, see callers & callees
53
- codegraph deps src/index.ts # file-level import/export map
54
- ```
55
-
56
- ## ✨ Features
57
-
58
- | | Feature | Description |
59
- |---|---|---|
60
- | 🔍 | **Symbol search** | Find any function, class, or method by name with callers/callees |
61
- | 📁 | **File dependencies** | See what a file imports and what imports it |
62
- | 💥 | **Impact analysis** | Trace every file affected by a change (transitive) |
63
- | 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact |
64
- | 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, trace their callers |
65
- | 🗺️ | **Module map** | Bird's-eye view of your most-connected files |
66
- | 🔄 | **Cycle detection** | Find circular dependencies at file or function level |
67
- | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export |
68
- | 🧠 | **Semantic search** | Embeddings-powered natural language code search |
69
- | 👀 | **Watch mode** | Incrementally update the graph as files change |
70
- | 🤖 | **MCP server** | Model Context Protocol integration for AI assistants |
71
- | 🔒 | **Fully local** | No network calls, no data exfiltration, SQLite-backed |
72
-
73
- ## 📦 Commands
74
-
75
- ### Build & Watch
76
-
77
- ```bash
78
- codegraph build [dir] # Parse and build the dependency graph
79
- codegraph build --no-incremental # Force full rebuild
80
- codegraph watch [dir] # Watch for changes, update graph incrementally
81
- ```
82
-
83
- ### Query & Explore
84
-
85
- ```bash
86
- codegraph query <name> # Find a symbol — shows callers and callees
87
- codegraph deps <file> # File imports/exports
88
- codegraph map # Top 20 most-connected files
89
- codegraph map -n 50 # Top 50
90
- ```
91
-
92
- ### Impact Analysis
93
-
94
- ```bash
95
- codegraph impact <file> # Transitive reverse dependency trace
96
- codegraph fn <name> # Function-level: callers, callees, call chain
97
- codegraph fn <name> --no-tests --depth 5
98
- codegraph fn-impact <name> # What functions break if this one changes
99
- codegraph diff-impact # Impact of unstaged git changes
100
- codegraph diff-impact --staged # Impact of staged changes
101
- codegraph diff-impact HEAD~3 # Impact vs a specific ref
102
- ```
103
-
104
- ### Export & Visualization
105
-
106
- ```bash
107
- codegraph export -f dot # Graphviz DOT format
108
- codegraph export -f mermaid # Mermaid diagram
109
- codegraph export -f json # JSON graph
110
- codegraph export --functions -o graph.dot # Function-level, write to file
111
- codegraph cycles # Detect circular dependencies
112
- codegraph cycles --functions # Function-level cycles
113
- ```
114
-
115
- ### Semantic Search
116
-
117
- Codegraph can build local embeddings for every function, method, and class, then search them by natural language. Everything runs locally using [@huggingface/transformers](https://huggingface.co/docs/transformers.js) — no API keys needed.
118
-
119
- ```bash
120
- codegraph embed # Build embeddings (default: minilm)
121
- codegraph embed --model nomic # Use a different model
122
- codegraph search "handle authentication"
123
- codegraph search "parse config" --min-score 0.4 -n 10
124
- codegraph models # List available models
125
- ```
126
-
127
- #### Available Models
128
-
129
- | Flag | Model | Dimensions | Size | License | Notes |
130
- |---|---|---|---|---|---|
131
- | `minilm` (default) | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
132
- | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
133
- | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
134
- | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Best quality, 8192 context |
135
-
136
- The model used during `embed` is stored in the database, so `search` auto-detects it — no need to pass `--model` when searching.
137
-
138
- ### AI Integration
139
-
140
- ```bash
141
- codegraph mcp # Start MCP server for AI assistants
142
- ```
143
-
144
- ### Common Flags
145
-
146
- | Flag | Description |
147
- |---|---|
148
- | `-d, --db <path>` | Custom path to `graph.db` |
149
- | `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files |
150
- | `--depth <n>` | Transitive trace depth (default varies by command) |
151
- | `-j, --json` | Output as JSON |
152
- | `-v, --verbose` | Enable debug output |
153
-
154
- ## 🌐 Language Support
155
-
156
- | Language | Extensions | Coverage |
157
- |---|---|---|
158
- | ![JavaScript](https://img.shields.io/badge/-JavaScript-F7DF1E?style=flat-square&logo=javascript&logoColor=black) | `.js`, `.jsx`, `.mjs`, `.cjs` | Full — functions, classes, imports, call sites |
159
- | ![TypeScript](https://img.shields.io/badge/-TypeScript-3178C6?style=flat-square&logo=typescript&logoColor=white) | `.ts`, `.tsx` | Full — interfaces, type aliases, `.d.ts` |
160
- | ![Python](https://img.shields.io/badge/-Python-3776AB?style=flat-square&logo=python&logoColor=white) | `.py` | Functions, classes, methods, imports, decorators |
161
- | ![Terraform](https://img.shields.io/badge/-Terraform-844FBA?style=flat-square&logo=terraform&logoColor=white) | `.tf`, `.hcl` | Resource, data, variable, module, output blocks |
162
-
163
- ## ⚙️ How It Works
164
-
165
- ```
166
- ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐ ┌─────────┐
167
- │ Source │───▶│ tree-sitter│───▶│ Extract │───▶│ Resolve │───▶│ SQLite │
168
- │ Files │ │ Parse │ │ Symbols │ │ Imports │ │ DB │
169
- └──────────┘ └───────────┘ └───────────┘ └──────────┘ └─────────┘
170
-
171
-
172
- ┌─────────┐
173
- │ Query │
174
- └─────────┘
175
- ```
176
-
177
- 1. **Parse** — tree-sitter (WASM) parses every source file into an AST
178
- 2. **Extract** — Functions, classes, methods, interfaces, imports, exports, and call sites are extracted
179
- 3. **Resolve** — Imports are resolved to actual files (handles ESM conventions, `tsconfig.json` path aliases, `baseUrl`)
180
- 4. **Store** — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries
181
- 5. **Query** — All queries run locally against the SQLite DB — typically under 100ms
182
-
183
- ### Call Resolution
184
-
185
- Calls are resolved with priority and confidence scoring:
186
-
187
- | Priority | Source | Confidence |
188
- |---|---|---|
189
- | 1 | **Import-aware** — `import { foo } from './bar'` → link to `bar` | `1.0` |
190
- | 2 | **Same-file** — definitions in the current file | `1.0` |
191
- | 3 | **Same directory** — definitions in sibling files | `0.7` |
192
- | 4 | **Same parent directory** — definitions in sibling dirs | `0.5` |
193
- | 5 | **Global fallback** — match by name across codebase | `0.3` |
194
- | 6 | **Method hierarchy** — resolved through `extends`/`implements` | — |
195
-
196
- Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.
197
-
198
- ## 📊 Performance
199
-
200
- Benchmarked on a ~3,200-file TypeScript project:
201
-
202
- | Metric | Value |
203
- |---|---|
204
- | Build time | ~30s |
205
- | Nodes | 19,000+ |
206
- | Edges | 120,000+ |
207
- | Query time | <100ms |
208
- | DB size | ~5 MB |
209
-
210
- ## 🤖 AI Agent Integration
211
-
212
- ### MCP Server
213
-
214
- Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server, so AI assistants can query your dependency graph directly:
215
-
216
- ```bash
217
- codegraph mcp
218
- ```
219
-
220
- ### CLAUDE.md / Agent Instructions
221
-
222
- Add this to your project's `CLAUDE.md` to help AI agents use codegraph:
223
-
224
- ```markdown
225
- ## Code Navigation
226
-
227
- This project has a codegraph database at `.codegraph/graph.db`.
228
-
229
- - **Before modifying a function**: `codegraph fn <name> --no-tests`
230
- - **Before modifying a file**: `codegraph deps <file>`
231
- - **To assess PR impact**: `codegraph diff-impact --no-tests`
232
- - **To find entry points**: `codegraph map`
233
- - **To trace breakage**: `codegraph fn-impact <name> --no-tests`
234
-
235
- Rebuild after major structural changes: `codegraph build`
236
- ```
237
-
238
- ## 🔁 CI / GitHub Actions
239
-
240
- Codegraph ships with a ready-to-use GitHub Actions workflow that comments impact analysis on every pull request.
241
-
242
- Copy `.github/workflows/codegraph-impact.yml` to your repo, and every PR will get a comment like:
243
-
244
- > **3 functions changed** → **12 callers affected** across **7 files**
245
-
246
- ## 🛠️ Configuration
247
-
248
- Create a `.codegraphrc.json` in your project root to customize behavior:
249
-
250
- ```json
251
- {
252
- "include": ["src/**", "lib/**"],
253
- "exclude": ["**/*.test.js", "**/__mocks__/**"],
254
- "ignoreDirs": ["node_modules", ".git", "dist"],
255
- "extensions": [".js", ".ts", ".tsx", ".py"],
256
- "aliases": {
257
- "@/": "./src/",
258
- "@utils/": "./src/utils/"
259
- },
260
- "build": {
261
- "incremental": true
262
- }
263
- }
264
- ```
265
-
266
- ## 📖 Programmatic API
267
-
268
- Codegraph also exports a full API for use in your own tools:
269
-
270
- ```js
271
- import { buildGraph, queryNameData, findCycles, exportDOT } from 'codegraph';
272
-
273
- // Build the graph
274
- buildGraph('/path/to/project');
275
-
276
- // Query programmatically
277
- const results = queryNameData('myFunction', '/path/to/.codegraph/graph.db');
278
- ```
279
-
280
- ## ⚠️ Limitations
281
-
282
- - **No full type inference** — parses `.d.ts` interfaces but doesn't use TypeScript's type checker for overload resolution
283
- - **Dynamic calls are best-effort** — complex computed property access and `eval` patterns are not resolved
284
- - **Python imports** — resolves relative imports but doesn't follow `sys.path` or virtual environment packages
285
-
286
- ## 🤝 Contributing
287
-
288
- Contributions are welcome! Here's how to get started:
289
-
290
- ```bash
291
- git clone https://github.com/optave/codegraph.git
292
- cd codegraph
293
- npm install --legacy-peer-deps
294
- npm test # run tests with vitest
295
- ```
296
-
297
- 1. Fork the repo
298
- 2. Create your feature branch (`git checkout -b feat/amazing-feature`)
299
- 3. Commit your changes
300
- 4. Push to the branch
301
- 5. Open a Pull Request
302
-
303
- ## 📄 License
304
-
305
- [Apache-2.0](LICENSE)
306
-
307
- ---
308
-
309
- <p align="center">
310
- <sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. No data leaves your machine. Ever.</sub>
311
- </p>
1
+ <p align="center">
2
+ <img src="https://img.shields.io/badge/codegraph-dependency%20intelligence-blue?style=for-the-badge&logo=graphql&logoColor=white" alt="codegraph" />
3
+ </p>
4
+
5
+ <h1 align="center">codegraph</h1>
6
+
7
+ <p align="center">
8
+ <strong>Local code dependency graph CLI — parse, query, and visualize your codebase at file and function level.</strong>
9
+ </p>
10
+
11
+ <p align="center">
12
+ <a href="https://www.npmjs.com/package/@optave/codegraph"><img src="https://img.shields.io/npm/v/@optave/codegraph?style=flat-square&logo=npm&logoColor=white&label=npm" alt="npm version" /></a>
13
+ <a href="https://github.com/optave/codegraph/blob/main/LICENSE"><img src="https://img.shields.io/github/license/optave/codegraph?style=flat-square&logo=opensourceinitiative&logoColor=white" alt="Apache-2.0 License" /></a>
14
+ <a href="https://github.com/optave/codegraph/actions"><img src="https://img.shields.io/github/actions/workflow/status/optave/codegraph/codegraph-impact.yml?style=flat-square&logo=githubactions&logoColor=white&label=CI" alt="CI" /></a>
15
+ <img src="https://img.shields.io/badge/node-%3E%3D20-339933?style=flat-square&logo=node.js&logoColor=white" alt="Node >= 20" />
16
+ <img src="https://img.shields.io/badge/platform-local%20only-important?style=flat-square&logo=shield&logoColor=white" alt="Local Only" />
17
+ </p>
18
+
19
+ <p align="center">
20
+ <a href="#-why-codegraph">Why codegraph?</a> •
21
+ <a href="#-quick-start">Quick Start</a> •
22
+ <a href="#-features">Features</a> •
23
+ <a href="#-commands">Commands</a> •
24
+ <a href="#-language-support">Languages</a> •
25
+ <a href="#-ai-agent-integration">AI Integration</a> •
26
+ <a href="#-recommended-practices">Practices</a> •
27
+ <a href="#-ci--github-actions">CI/CD</a> •
28
+ <a href="#-roadmap">Roadmap</a> •
29
+ <a href="#-contributing">Contributing</a>
30
+ </p>
31
+
32
+ ---
33
+
34
+ > **Zero network calls. Zero telemetry. Your code never leaves your machine.**
35
+ >
36
+ > Codegraph uses [tree-sitter](https://tree-sitter.github.io/) (via WASM — no native compilation required) to parse your codebase into an AST, extracts functions, classes, imports, and call sites, resolves dependencies, and stores everything in a local SQLite database. Query it instantly from the command line.
37
+
38
+ ---
39
+
40
+ ## 💡 Why codegraph?
41
+
42
+ <sub>Comparison last verified: February 2026</sub>
43
+
44
+ Most dependency graph tools only tell you which **files** import which — codegraph tells you which **functions** call which, who their callers are, and what breaks when something changes. Here's how it compares to the alternatives:
45
+
46
+ ### Feature comparison
47
+
48
+ | Capability | codegraph | Madge | dep-cruiser | Skott | Nx graph | Sourcetrail |
49
+ |---|:---:|:---:|:---:|:---:|:---:|:---:|
50
+ | Function-level analysis | **Yes** | — | — | — | — | **Yes** |
51
+ | Multi-language | **10** | 1 | 1 | 1 | Any (project) | 4 |
52
+ | Semantic search | **Yes** | — | — | — | — | — |
53
+ | MCP / AI agent support | **Yes** | — | — | — | — | — |
54
+ | Git diff impact | **Yes** | — | — | — | Partial | — |
55
+ | Persistent database | **Yes** | — | — | — | — | Yes |
56
+ | Watch mode | **Yes** | — | — | — | Daemon | — |
57
+ | CI workflow included | **Yes** | — | Rules | — | Yes | — |
58
+ | Cycle detection | **Yes** | Yes | Yes | Yes | — | — |
59
+ | Zero config | **Yes** | Yes | — | Yes | — | — |
60
+ | Fully local / no telemetry | **Yes** | Yes | Yes | Yes | Partial | Yes |
61
+ | Free & open source | **Yes** | Yes | Yes | Yes | Partial | Archived |
62
+
63
+ ### What makes codegraph different
64
+
65
+ | | Differentiator | In practice |
66
+ |---|---|---|
67
+ | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |
68
+ | **🌐** | **Multi-language, one CLI** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + Terraform in a single graph — no juggling Madge, pyan, and cflow |
69
+ | **🤖** | **AI-agent ready** | Built-in [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly via `codegraph fn <name>` |
70
+ | **💥** | **Git diff impact** | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — ships with a GitHub Actions workflow |
71
+ | **🔒** | **Fully local, zero telemetry** | No accounts, no API keys, no cloud, no data exfiltration — Apache-2.0, free forever |
72
+ | **⚡** | **Build once, query instantly** | SQLite-backed — build in ~30s, every query under 100ms. Native Rust engine with WASM fallback. Most competitors re-parse every run |
73
+ | **🧠** | **Semantic search** | `codegraph search "handle auth"` uses local embeddings — multi-query with RRF ranking via `"auth; token; JWT"` |
74
+
75
+ ### How other tools compare
76
+
77
+ Many tools in this space are cloud-based or SaaS — meaning your code leaves your machine. Others require external services, accounts, or API keys. Codegraph makes **zero network calls** and has **zero telemetry**. Everything runs locally.
78
+
79
+ | Tool | What it does well | Where it falls short |
80
+ |---|---|---|
81
+ | [Madge](https://github.com/pahen/madge) | Simple file-level JS/TS dependency graphs | No function-level analysis, no impact tracing, JS/TS only |
82
+ | [dependency-cruiser](https://github.com/sverweij/dependency-cruiser) | Architectural rule validation for JS/TS | Module-level only (function-level explicitly out of scope), requires config |
83
+ | [Skott](https://github.com/antoine-music/skott) | Module graph with unused code detection | File-level only, JS/TS only, no persistent database |
84
+ | [Nx graph](https://nx.dev/) | Monorepo project-level dependency graph | Requires Nx workspace, project-level only (not file or function) |
85
+ | [Sourcetrail](https://github.com/CoatiSoftware/Sourcetrail) | Rich GUI with symbol-level graphs | Archived/discontinued (2021), no JS/TS, no CLI |
86
+ | [Sourcegraph](https://sourcegraph.com/) | Enterprise code search and navigation | Cloud/SaaS — code sent to servers, $19+/user/mo, no longer open source |
87
+ | [CodeSee](https://www.codesee.io/) | Visual codebase maps | Cloud-based — code leaves your machine, acquired by GitKraken |
88
+ | [Understand](https://scitools.com/) | Deep multi-language static analysis | $100+/month per seat, proprietary, GUI-only, no CI or AI integration |
89
+ | [Snyk Code](https://snyk.io/) | AI-powered security scanning | Cloud-based — code sent to Snyk servers for analysis, not a dependency graph tool |
90
+ | [pyan](https://github.com/Technologicat/pyan) / [cflow](https://www.gnu.org/software/cflow/) | Function-level call graphs | Single-language each (Python / C only), no persistence, no queries |
91
+
92
+ ---
93
+
94
+ ## 🚀 Quick Start
95
+
96
+ ```bash
97
+ # Install from npm
98
+ npm install -g @optave/codegraph
99
+
100
+ # Or install from source
101
+ git clone https://github.com/optave/codegraph.git
102
+ cd codegraph
103
+ npm install
104
+ npm link
105
+
106
+ # Build a graph for any project
107
+ cd your-project
108
+ codegraph build # .codegraph/graph.db created
109
+
110
+ # Start exploring
111
+ codegraph map # see most-connected files
112
+ codegraph query myFunc # find any function, see callers & callees
113
+ codegraph deps src/index.ts # file-level import/export map
114
+ ```
115
+
116
+ ## ✨ Features
117
+
118
+ | | Feature | Description |
119
+ |---|---|---|
120
+ | 🔍 | **Symbol search** | Find any function, class, or method by name with callers/callees |
121
+ | 📁 | **File dependencies** | See what a file imports and what imports it |
122
+ | 💥 | **Impact analysis** | Trace every file affected by a change (transitive) |
123
+ | 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact |
124
+ | 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, trace their callers |
125
+ | 🗺️ | **Module map** | Bird's-eye view of your most-connected files |
126
+ | 🔄 | **Cycle detection** | Find circular dependencies at file or function level |
127
+ | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export |
128
+ | 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking |
129
+ | 👀 | **Watch mode** | Incrementally update the graph as files change |
130
+ | 🤖 | **MCP server** | Model Context Protocol integration for AI assistants |
131
+ | 🔒 | **Fully local** | No network calls, no data exfiltration, SQLite-backed |
132
+
133
+ ## 📦 Commands
134
+
135
+ ### Build & Watch
136
+
137
+ ```bash
138
+ codegraph build [dir] # Parse and build the dependency graph
139
+ codegraph build --no-incremental # Force full rebuild
140
+ codegraph build --engine wasm # Force WASM engine (skip native)
141
+ codegraph watch [dir] # Watch for changes, update graph incrementally
142
+ ```
143
+
144
+ ### Query & Explore
145
+
146
+ ```bash
147
+ codegraph query <name> # Find a symbol — shows callers and callees
148
+ codegraph deps <file> # File imports/exports
149
+ codegraph map # Top 20 most-connected files
150
+ codegraph map -n 50 # Top 50
151
+ ```
152
+
153
+ ### Impact Analysis
154
+
155
+ ```bash
156
+ codegraph impact <file> # Transitive reverse dependency trace
157
+ codegraph fn <name> # Function-level: callers, callees, call chain
158
+ codegraph fn <name> --no-tests --depth 5
159
+ codegraph fn-impact <name> # What functions break if this one changes
160
+ codegraph diff-impact # Impact of unstaged git changes
161
+ codegraph diff-impact --staged # Impact of staged changes
162
+ codegraph diff-impact HEAD~3 # Impact vs a specific ref
163
+ ```
164
+
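The "transitive reverse dependency trace" behind `codegraph impact` can be pictured as a breadth-first walk over reversed import edges. The sketch below is illustrative only: `impactOf`, the `[importer, imported]` edge format, and the `maxDepth` parameter are assumptions for this example, not codegraph's actual API or storage layout.

```js
// Hypothetical sketch: find every file that depends, directly or
// transitively, on a changed file. Edges are [importer, imported] pairs.
function impactOf(file, edges, maxDepth = Infinity) {
  // Reverse index: imported file -> files that import it.
  const importers = new Map();
  for (const [from, to] of edges) {
    if (!importers.has(to)) importers.set(to, []);
    importers.get(to).push(from);
  }

  // Breadth-first walk; depthOf doubles as the visited set, so cycles terminate.
  const depthOf = new Map([[file, 0]]);
  const queue = [file];
  while (queue.length > 0) {
    const current = queue.shift();
    const depth = depthOf.get(current);
    if (depth >= maxDepth) continue;
    for (const importer of importers.get(current) ?? []) {
      if (!depthOf.has(importer)) {
        depthOf.set(importer, depth + 1);
        queue.push(importer);
      }
    }
  }

  depthOf.delete(file); // report only the affected files, not the root itself
  return depthOf;       // Map of affected file -> distance from the changed file
}

// Made-up import edges for illustration.
const exampleEdges = [
  ["src/api.js", "src/auth.js"],
  ["src/auth.js", "src/jwt.js"],
  ["src/cli.js", "src/jwt.js"],
];
console.log(impactOf("src/jwt.js", exampleEdges));
```

Passing a small `maxDepth` mirrors the idea of the `--depth <n>` flag: the walk stops expanding once that distance is reached.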
165
+ ### Export & Visualization
166
+
167
+ ```bash
168
+ codegraph export -f dot # Graphviz DOT format
169
+ codegraph export -f mermaid # Mermaid diagram
170
+ codegraph export -f json # JSON graph
171
+ codegraph export --functions -o graph.dot # Function-level, write to file
172
+ codegraph cycles # Detect circular dependencies
173
+ codegraph cycles --functions # Function-level cycles
174
+ ```
175
+
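For readers unfamiliar with how circular dependencies are usually found, here is a minimal depth-first sketch. It is not taken from codegraph's source: `findCycles` and the adjacency-object input are assumptions for illustration, and this version reports one cycle per back edge rather than enumerating every elementary cycle.

```js
// Hypothetical sketch: detect cycles in a dependency graph given as
// { node: [dependencies...] } using a three-state depth-first search.
function findCycles(graph) {
  const UNVISITED = 0, IN_PROGRESS = 1, DONE = 2;
  const state = new Map();
  const stack = [];   // current DFS path
  const cycles = [];

  function visit(node) {
    state.set(node, IN_PROGRESS);
    stack.push(node);
    for (const next of graph[node] ?? []) {
      const s = state.get(next) ?? UNVISITED;
      if (s === IN_PROGRESS) {
        // Back edge: the path from `next` to `node` plus this edge is a cycle.
        cycles.push([...stack.slice(stack.indexOf(next)), next]);
      } else if (s === UNVISITED) {
        visit(next);
      }
    }
    stack.pop();
    state.set(node, DONE);
  }

  for (const node of Object.keys(graph)) {
    if ((state.get(node) ?? UNVISITED) === UNVISITED) visit(node);
  }
  return cycles;
}

// Made-up file-level graph: a -> b -> c -> a is circular, d is not part of it.
console.log(findCycles({ a: ["b"], b: ["c"], c: ["a"], d: ["a"] }));
```

The same routine works at function level if the nodes are function names and the edges are calls, which is the distinction the `--functions` flag draws.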
176
+ ### Semantic Search
177
+
178
+ Codegraph can build local embeddings for every function, method, and class, then search them by natural language. Everything runs locally using [@huggingface/transformers](https://huggingface.co/docs/transformers.js) — no API keys needed.
179
+
180
+ ```bash
181
+ codegraph embed # Build embeddings (default: minilm)
182
+ codegraph embed --model nomic # Use a different model
183
+ codegraph search "handle authentication"
184
+ codegraph search "parse config" --min-score 0.4 -n 10
185
+ codegraph models # List available models
186
+ ```
187
+
188
+ #### Multi-query search
189
+
190
+ Separate queries with `;` to search from multiple angles at once. Results are ranked using [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — items that rank highly across multiple queries rise to the top.
191
+
192
+ ```bash
193
+ codegraph search "auth middleware; JWT validation"
194
+ codegraph search "parse config; read settings; load env" -n 20
195
+ codegraph search "error handling; retry logic" --kind function
196
+ codegraph search "database connection; query builder" --rrf-k 30
197
+ ```
198
+
199
+ A single trailing semicolon is ignored (falls back to single-query mode). The `--rrf-k` flag controls the RRF smoothing constant (default 60) — lower values give more weight to top-ranked results.
200
+
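To make the ranking concrete, the following is a minimal sketch of the standard RRF formula, where each item scores `1 / (k + rank)` per query and the scores are summed. The function name `rrfFuse`, the input shape, and the 1-based rank convention are assumptions for illustration; codegraph's internal implementation may differ.

```js
// Hypothetical sketch of Reciprocal Rank Fusion.
// rankedLists: one array of result ids per sub-query, best match first.
// k: smoothing constant (the README gives 60 as the default for --rrf-k).
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // assumed 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Made-up per-query results for "auth middleware; JWT validation; check token".
const fused = rrfFuse([
  ["verifyJwt", "authMiddleware", "loadConfig"],
  ["authMiddleware", "verifyJwt", "refreshToken"],
  ["authMiddleware", "parseToken"],
]);
console.log(fused.map((r) => r.id));
```

In this example `authMiddleware` appears near the top of all three lists, so it outranks `verifyJwt`, which appears in only two. Lowering `k` (for example `rrfFuse(lists, 30)`) widens the gap between rank 1 and lower ranks, which is the effect described for `--rrf-k`.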
201
+ #### Available Models
202
+
203
+ | Flag | Model | Dimensions | Size | License | Notes |
204
+ |---|---|---|---|---|---|
205
+ | `minilm` (default) | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
206
+ | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
207
+ | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
208
+ | `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | **Best for code search**, trained on code+text |
209
+ | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
210
+ | `nomic-v1.5` | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | Improved nomic, Matryoshka dimensions |
211
+ | `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |
212
+
213
+ The model used during `embed` is stored in the database, so `search` auto-detects it — no need to pass `--model` when searching.
214
+
215
+ ### AI Integration
216
+
217
+ ```bash
218
+ codegraph mcp # Start MCP server for AI assistants
219
+ ```
220
+
221
+ ### Common Flags
222
+
223
+ | Flag | Description |
224
+ |---|---|
225
+ | `-d, --db <path>` | Custom path to `graph.db` |
226
+ | `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files |
227
+ | `--depth <n>` | Transitive trace depth (default varies by command) |
228
+ | `-j, --json` | Output as JSON |
229
+ | `-v, --verbose` | Enable debug output |
230
+ | `--engine <engine>` | Parser engine: `native`, `wasm`, or `auto` (default: `auto`) |
231
+ | `-k, --kind <kind>` | Filter by kind: `function`, `method`, `class` (search) |
232
+ | `--file <pattern>` | Filter by file path pattern (search) |
233
+ | `--rrf-k <n>` | RRF smoothing constant for multi-query search (default 60) |
234
+
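The `-T, --no-tests` row names three markers. A minimal sketch of such a filter follows, assuming plain substring matching on the file path; the exact matching rules are not stated in the README, and `isTestFile` and `excludeTests` are hypothetical names.

```js
// Hypothetical sketch: drop files whose path contains a test marker.
// The marker list is copied from the flag description; substring matching
// is an assumption.
const TEST_MARKERS = [".test.", ".spec.", "__test__"];

function isTestFile(filePath) {
  return TEST_MARKERS.some((marker) => filePath.includes(marker));
}

function excludeTests(filePaths) {
  return filePaths.filter((filePath) => !isTestFile(filePath));
}

console.log(
  excludeTests([
    "src/auth.js",
    "src/auth.test.js",
    "src/token.spec.ts",
    "__test__/helpers.js",
  ])
);
```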
235
+ ## 🌐 Language Support
236
+
237
+ | Language | Extensions | Coverage |
238
+ |---|---|---|
239
+ | ![JavaScript](https://img.shields.io/badge/-JavaScript-F7DF1E?style=flat-square&logo=javascript&logoColor=black) | `.js`, `.jsx`, `.mjs`, `.cjs` | Full — functions, classes, imports, call sites |
240
+ | ![TypeScript](https://img.shields.io/badge/-TypeScript-3178C6?style=flat-square&logo=typescript&logoColor=white) | `.ts`, `.tsx` | Full — interfaces, type aliases, `.d.ts` |
241
+ | ![Python](https://img.shields.io/badge/-Python-3776AB?style=flat-square&logo=python&logoColor=white) | `.py` | Functions, classes, methods, imports, decorators |
242
+ | ![Go](https://img.shields.io/badge/-Go-00ADD8?style=flat-square&logo=go&logoColor=white) | `.go` | Functions, methods, structs, interfaces, imports, call sites |
243
+ | ![Rust](https://img.shields.io/badge/-Rust-000000?style=flat-square&logo=rust&logoColor=white) | `.rs` | Functions, methods, structs, traits, `use` imports, call sites |
244
+ | ![Java](https://img.shields.io/badge/-Java-ED8B00?style=flat-square&logo=openjdk&logoColor=white) | `.java` | Classes, methods, constructors, interfaces, imports, call sites |
245
+ | ![C#](https://img.shields.io/badge/-C%23-512BD4?style=flat-square&logo=dotnet&logoColor=white) | `.cs` | Classes, structs, records, interfaces, enums, methods, constructors, using directives, invocations |
246
+ | ![PHP](https://img.shields.io/badge/-PHP-777BB4?style=flat-square&logo=php&logoColor=white) | `.php` | Functions, classes, interfaces, traits, enums, methods, namespace use, calls |
247
+ | ![Ruby](https://img.shields.io/badge/-Ruby-CC342D?style=flat-square&logo=ruby&logoColor=white) | `.rb` | Classes, modules, methods, singleton methods, require/require_relative, include/extend |
248
+ | ![Terraform](https://img.shields.io/badge/-Terraform-844FBA?style=flat-square&logo=terraform&logoColor=white) | `.tf`, `.hcl` | Resource, data, variable, module, output blocks |
249
+
250
+ ## ⚙️ How It Works
251
+
252
+ ```
253
+ ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐ ┌─────────┐
254
+ │ Source │──▶│ tree-sitter│──▶│ Extract │──▶│ Resolve │──▶│ SQLite │
255
+ │ Files │ │ Parse │ │ Symbols │ │ Imports │ │ DB │
256
+ └──────────┘ └───────────┘ └───────────┘ └──────────┘ └─────────┘
257
+
258
+
259
+ ┌─────────┐
260
+ │ Query │
261
+ └─────────┘
262
+ ```
263
+
264
+ 1. **Parse** — tree-sitter parses every source file into an AST (native Rust engine or WASM fallback)
265
+ 2. **Extract** — Functions, classes, methods, interfaces, imports, exports, and call sites are extracted
266
+ 3. **Resolve** — Imports are resolved to actual files (handles ESM conventions, `tsconfig.json` path aliases, `baseUrl`)
267
+ 4. **Store** — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries
268
+ 5. **Query** — All queries run locally against the SQLite DB — typically under 100ms
269
+
270
+ ### Dual Engine
271
+
272
+ Codegraph ships with two parsing engines:
273
+
274
+ | Engine | How it works | When it's used |
275
+ |--------|-------------|----------------|
276
+ | **Native** (Rust) | napi-rs addon built from `crates/codegraph-core/` — parallel multi-core parsing via rayon | Auto-selected when the prebuilt binary is available |
277
+ | **WASM** | `web-tree-sitter` with pre-built `.wasm` grammars in `grammars/` | Fallback when the native addon isn't installed |
278
+
279
+ Both engines produce identical output. Use `--engine native|wasm|auto` to control selection (default: `auto`).
280
+
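The `auto` behaviour described in the table can be sketched as "try native, fall back to WASM". This is an illustration under assumptions, not codegraph's loader: `selectEngine` and the injected `loadNative` callback are hypothetical, and treating an explicit `native` request as an error when the addon is missing is a guess.

```js
// Hypothetical sketch of engine selection for --engine native|wasm|auto.
// loadNative is an injected function that throws if the prebuilt addon
// is not available (stand-in for requiring the napi-rs binary).
function selectEngine(requested, loadNative) {
  if (!["native", "wasm", "auto"].includes(requested)) {
    throw new Error(`Unknown engine: ${requested}`);
  }
  if (requested === "wasm") return "wasm"; // skip native entirely

  try {
    loadNative();
    return "native";
  } catch (err) {
    // Assumption: an explicit "native" request fails loudly,
    // while "auto" quietly falls back to WASM.
    if (requested === "native") throw err;
    return "wasm";
  }
}

const missingAddon = () => {
  throw new Error("native addon not installed");
};
console.log(selectEngine("auto", missingAddon));
console.log(selectEngine("auto", () => ({})));
```

Injecting the loader keeps the sketch runnable without any native binary present.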
281
+ ### Call Resolution
282
+
283
+ Calls are resolved with priority and confidence scoring:
284
+
285
+ | Priority | Source | Confidence |
286
+ |---|---|---|
287
+ | 1 | **Import-aware** — `import { foo } from './bar'` → link to `bar` | `1.0` |
288
+ | 2 | **Same-file** — definitions in the current file | `1.0` |
289
+ | 3 | **Same directory** — definitions in sibling files | `0.7` |
290
+ | 4 | **Same parent directory** — definitions in sibling dirs | `0.5` |
291
+ | 5 | **Global fallback** — match by name across codebase | `0.3` |
292
+ | 6 | **Method hierarchy** — resolved through `extends`/`implements` | — |
293
+
294
+ Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.
295
+
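The priority table can be read as an ordered list of tiers: the first tier that finds a matching definition wins and supplies the confidence. The sketch below covers tiers 1 through 5 only, since the method-hierarchy tier has no stated confidence. All names (`resolveCall`, `TIERS`, the `call` and `definitions` shapes) are assumptions for illustration, not codegraph's internals.

```js
// Hypothetical sketch of priority-ordered call resolution.
// Paths use "/" separators; dirOf returns the containing directory.
const dirOf = (p) => {
  const i = p.lastIndexOf("/");
  return i === -1 ? "" : p.slice(0, i);
};

// Confidence values are taken from the table above.
const TIERS = [
  { source: "import", confidence: 1.0,
    matches: (call, def) => (call.imports ?? {})[call.name] === def.file },
  { source: "same-file", confidence: 1.0,
    matches: (call, def) => def.file === call.file },
  { source: "same-directory", confidence: 0.7,
    matches: (call, def) => dirOf(def.file) === dirOf(call.file) },
  { source: "same-parent-directory", confidence: 0.5,
    matches: (call, def) => dirOf(dirOf(def.file)) === dirOf(dirOf(call.file)) },
  { source: "global", confidence: 0.3,
    matches: () => true },
];

// call: { name, file, imports: { symbolName: resolvedFile } }
// definitions: [{ name, file }]
function resolveCall(call, definitions) {
  const candidates = definitions.filter((def) => def.name === call.name);
  for (const tier of TIERS) {
    const hit = candidates.find((def) => tier.matches(call, def));
    if (hit) return { file: hit.file, source: tier.source, confidence: tier.confidence };
  }
  return null; // no definition with that name anywhere
}

// Made-up definitions for illustration.
const exampleDefs = [
  { name: "foo", file: "src/utils/foo.js" },
  { name: "foo", file: "lib/other/foo.js" },
  { name: "foo", file: "src/api/helpers.js" },
];
console.log(resolveCall({ name: "foo", file: "src/api/handler.js", imports: {} }, exampleDefs));
```

Storing the confidence on the resulting edge lets later queries filter out low-confidence links, such as the name-only global fallback.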
296
+ ## 📊 Performance
297
+
298
+ Benchmarked on a ~3,200-file TypeScript project:
299
+
300
+ | Metric | Value |
301
+ |---|---|
302
+ | Build time | ~30s |
303
+ | Nodes | 19,000+ |
304
+ | Edges | 120,000+ |
305
+ | Query time | <100ms |
306
+ | DB size | ~5 MB |
307
+
308
+ ## 🤖 AI Agent Integration
309
+
310
+ ### MCP Server
311
+
312
+ Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server, so AI assistants can query your dependency graph directly:
313
+
314
+ ```bash
315
+ codegraph mcp
316
+ ```
317
+
318
+ ### CLAUDE.md / Agent Instructions
319
+
320
+ Add this to your project's `CLAUDE.md` to help AI agents use codegraph:
321
+
322
+ ```markdown
323
+ ## Code Navigation
324
+
325
+ This project uses codegraph. The database is at `.codegraph/graph.db`.
326
+
327
+ - **Before modifying a function**: `codegraph fn <name> --no-tests`
328
+ - **Before modifying a file**: `codegraph deps <file>`
329
+ - **To assess PR impact**: `codegraph diff-impact --no-tests`
330
+ - **To find entry points**: `codegraph map`
331
+ - **To trace breakage**: `codegraph fn-impact <name> --no-tests`
332
+
333
+ Rebuild after major structural changes: `codegraph build`
334
+
335
+ ### Semantic search
336
+
337
+ Use `codegraph search` to find functions by intent rather than exact name.
338
+ When a single query might miss results, combine multiple angles with `;`:
339
+
340
+ codegraph search "validate auth; check token; verify JWT"
341
+ codegraph search "parse config; load settings" --kind function
342
+
343
+ Multi-query search uses Reciprocal Rank Fusion — functions that rank
344
+ highly across several queries surface first. This is especially useful
345
+ when you're not sure what naming convention the codebase uses.
346
+
347
+ When writing multi-queries, use 2-4 sub-queries (2-4 words each) that
348
+ attack the problem from different angles. Pick from these strategies:
349
+ - **Naming variants**: cover synonyms the author might have used
350
+ ("send email; notify user; deliver message")
351
+ - **Abstraction levels**: pair high-level intent with low-level operation
352
+ ("handle payment; charge credit card")
353
+ - **Input/output sides**: cover the read half and write half
354
+ ("parse config; apply settings")
355
+ - **Domain + technical**: bridge business language and implementation
356
+ ("onboard tenant; create organization; provision workspace")
357
+
358
+ Use `--kind function` to cut noise. Use `--file <pattern>` to scope.
359
+ ```
360
+
361
+ ## 📋 Recommended Practices
362
+
363
+ See **[docs/recommended-practices.md](docs/recommended-practices.md)** for integration guides:
364
+
365
+ - **Git hooks** — auto-rebuild on commit, impact checks on push, commit message enrichment
366
+ - **CI/CD** — PR impact comments, threshold gates, graph caching
367
+ - **AI agents** — MCP server, CLAUDE.md templates, Claude Code hooks
368
+ - **Developer workflow** — watch mode, explore-before-you-edit, semantic search
369
+ - **Secure credentials** — `apiKeyCommand` with 1Password, Bitwarden, Vault, macOS Keychain, `pass`
370
+
371
+ ## 🔁 CI / GitHub Actions
372
+
373
+ Codegraph ships with a ready-to-use GitHub Actions workflow that posts an impact analysis comment on every pull request.
374
+
375
+ Copy `.github/workflows/codegraph-impact.yml` to your repo, and every PR will get a comment like:
376
+
377
+ > **3 functions changed** → **12 callers affected** across **7 files**
378
+
379
+ ## 🛠️ Configuration
380
+
381
+ Create a `.codegraphrc.json` in your project root to customize behavior:
382
+
383
+ ```json
384
+ {
385
+ "include": ["src/**", "lib/**"],
386
+ "exclude": ["**/*.test.js", "**/__mocks__/**"],
387
+ "ignoreDirs": ["node_modules", ".git", "dist"],
388
+ "extensions": [".js", ".ts", ".tsx", ".py"],
389
+ "aliases": {
390
+ "@/": "./src/",
391
+ "@utils/": "./src/utils/"
392
+ },
393
+ "build": {
394
+ "incremental": true
395
+ }
396
+ }
397
+ ```
398
+
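As an illustration of what the `aliases` map is for, the sketch below expands an alias prefix in an import specifier. It assumes aliases are plain prefix substitutions; `resolveAlias` is a hypothetical helper, and codegraph's real resolution may differ.

```javascript
// Hypothetical sketch: expand a configured alias prefix in an import specifier.
function resolveAlias(specifier, aliases) {
  // Try longer prefixes first so the most specific alias wins if two overlap.
  const prefixes = Object.keys(aliases).sort((a, b) => b.length - a.length);
  for (const prefix of prefixes) {
    if (specifier.startsWith(prefix)) {
      return aliases[prefix] + specifier.slice(prefix.length);
    }
  }
  return specifier; // no alias matched; leave the specifier unchanged
}

const aliases = { '@/': './src/', '@utils/': './src/utils/' };
console.log(resolveAlias('@utils/format', aliases));
```

Bare package names such as `react` match no prefix and pass through untouched.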
399
+ ### LLM credentials
400
+
401
+ Codegraph supports an `apiKeyCommand` field for secure credential management. Instead of storing API keys in config files or environment variables, you can have codegraph run a secret manager command at runtime:
402
+
403
+ ```json
404
+ {
405
+ "llm": {
406
+ "provider": "openai",
407
+ "apiKeyCommand": "op read op://vault/openai/api-key"
408
+ }
409
+ }
410
+ ```
411
+
412
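+ The command is split on whitespace and executed with `execFileSync`, so no shell is involved: there is no shell injection risk, but shell features such as pipes, quoting, and variable expansion are not interpreted either. Sources are tried in priority order: **command output > `CODEGRAPH_LLM_API_KEY` env var > file config**. If the command fails, codegraph warns and falls back to the next source.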
413
+
414
+ Works with any secret manager: 1Password CLI (`op`), Bitwarden (`bw`), `pass`, HashiCorp Vault, macOS Keychain (`security`), AWS Secrets Manager, etc.
415
+
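The lookup order described above could be sketched roughly as follows. This is an illustration, not codegraph's source: `resolveApiKey` is a hypothetical name, `config.llm.apiKey` is an assumed name for the file-config field, and the injectable `run` parameter exists only to make the sketch easy to test.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical sketch of: command output > CODEGRAPH_LLM_API_KEY > file config.
function resolveApiKey(config, env = process.env, run = execFileSync) {
  const llm = config.llm || {};
  if (llm.apiKeyCommand) {
    // Split on whitespace and run without a shell: the first token is the
    // executable, the rest are passed as literal arguments.
    const [file, ...args] = llm.apiKeyCommand.trim().split(/\s+/);
    try {
      const key = String(run(file, args, { encoding: 'utf8' })).trim();
      if (key) return key;
    } catch (err) {
      // Warn and fall through to the next source.
      console.warn(`apiKeyCommand failed: ${err.message}`);
    }
  }
  // `llm.apiKey` is an assumed field name for the file-config source.
  return env.CODEGRAPH_LLM_API_KEY || llm.apiKey || null;
}
```

Because the command string is split on whitespace, an argument that itself contains spaces cannot be expressed; a small wrapper script is one way around that.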
416
+ ## 📖 Programmatic API
417
+
418
+ Codegraph also exports a full API for use in your own tools:
419
+
420
+ ```js
421
+ import { buildGraph, queryNameData, findCycles, exportDOT } from '@optave/codegraph';
422
+
423
+ // Build the graph
424
+ buildGraph('/path/to/project');
425
+
426
+ // Query programmatically
427
+ const results = queryNameData('myFunction', '/path/to/.codegraph/graph.db');
428
+ ```
429
+
430
+ ```js
431
+ import { parseFileAuto, getActiveEngine, isNativeAvailable } from '@optave/codegraph';
432
+
433
+ // Check which engine is active
434
+ console.log(getActiveEngine()); // 'native' or 'wasm'
435
+ console.log(isNativeAvailable()); // true if Rust addon is installed
436
+
437
+ // Parse a single file (uses auto-selected engine)
438
+ const symbols = await parseFileAuto('/path/to/file.ts');
439
+ ```
440
+
441
+ ```js
442
+ import { searchData, multiSearchData, buildEmbeddings } from '@optave/codegraph';
443
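+ const dbPath = '/path/to/.codegraph/graph.db'; // graph database path used by the search calls below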
444
+ // Build embeddings (one-time)
445
+ await buildEmbeddings('/path/to/project');
446
+
447
+ // Single-query search
448
+ const { results } = await searchData('handle auth', dbPath);
449
+
450
+ // Multi-query search with RRF ranking
451
+ const { results: fused } = await multiSearchData(
452
+ ['auth middleware', 'JWT validation'],
453
+ dbPath,
454
+ { limit: 10, minScore: 0.3 }
455
+ );
456
+ // Each result has: { name, kind, file, line, rrf, queryScores[] }
457
+ ```
458
+
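For readers unfamiliar with Reciprocal Rank Fusion, the idea behind the `rrf` score can be sketched in a few lines. This is a generic illustration of the technique, not codegraph's code: `reciprocalRankFusion` is a hypothetical name, and the constant `k = 60` is the value commonly used in the literature, assumed here rather than taken from the project.

```javascript
// Generic Reciprocal Rank Fusion sketch: each ranked list contributes
// 1 / (k + rank) to an item's score, with 1-based ranks.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, rrf]) => ({ id, rrf }))
    .sort((a, b) => b.rrf - a.rrf);
}

// One ranked list of function names per sub-query (made-up data).
const fusedExample = reciprocalRankFusion([
  ['verifyJwt', 'checkToken', 'login'],
  ['checkToken', 'verifyJwt'],
  ['checkToken', 'logout'],
]);
console.log(fusedExample.map((r) => r.id));
```

An item that appears near the top of several lists accumulates more score than one that tops a single list, which is why multi-query search surfaces functions that match several phrasings.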
459
+ ## ⚠️ Limitations
460
+
461
+ - **No full type inference** — parses `.d.ts` interfaces but doesn't use TypeScript's type checker for overload resolution
462
+ - **Dynamic calls are best-effort** — complex computed property access and `eval` patterns are not resolved
463
+ - **Python imports** — resolves relative imports but doesn't follow `sys.path` or virtual environment packages
464
+
465
+ ## 🗺️ Roadmap
466
+
467
+ See **[ROADMAP.md](ROADMAP.md)** for the full development roadmap. Current plan:
468
+
469
+ 1. ~~**Rust Core**~~ — **Complete** (v1.3.0) — native tree-sitter parsing via napi-rs, parallel multi-core parsing, incremental re-parsing, import resolution & cycle detection in Rust
470
+ 2. ~~**Foundation Hardening**~~ — **Complete** (v1.4.0) — parser registry, 11-tool MCP server, test coverage 62%→75%, `apiKeyCommand` secret resolution
471
+ 3. **Intelligent Embeddings** — LLM-generated descriptions, hybrid search
472
+ 4. **Natural Language Queries** — `codegraph ask` command, conversational sessions
473
+ 5. **Expanded Language Support** — 8 new languages (12 → 20)
474
+ 6. **GitHub Integration & CI** — reusable GitHub Action, PR review, SARIF output
475
+ 7. **Visualization & Advanced** — web UI, dead code detection, monorepo support, agentic search
476
+
477
+ ## 🤝 Contributing
478
+
479
+ Contributions are welcome! See **[CONTRIBUTING.md](CONTRIBUTING.md)** for the full guide — setup, workflow, commit convention, testing, and architecture notes.
480
+
481
+ ```bash
482
+ git clone https://github.com/optave/codegraph.git
483
+ cd codegraph
484
+ npm install
485
+ npm test
486
+ ```
487
+
488
+ Looking to add a new language? Check out **[Adding a New Language](docs/adding-a-language.md)**.
489
+
490
+ ## 📄 License
491
+
492
+ [Apache-2.0](LICENSE)
493
+
494
+ ---
495
+
496
+ <p align="center">
497
+ <sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. No data leaves your machine. Ever.</sub>
498
+ </p>