codegraphy 0.1.0__tar.gz

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Charan Kulal
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,310 @@
+ Metadata-Version: 2.4
+ Name: codegraphy
+ Version: 0.1.0
+ Summary: SQLite/PostgreSQL codebase knowledge graph and MCP server for Claude Code
+ Author: Charan Kulal
+ License-Expression: MIT
+ Keywords: mcp,code-analysis,knowledge-graph,claude-code,sqlite,postgresql
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: Software Development :: Documentation
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: click>=8.0
+ Requires-Dist: mcp>=1.0
+ Provides-Extra: postgres
+ Requires-Dist: psycopg2-binary; extra == "postgres"
+ Provides-Extra: pgvector
+ Requires-Dist: pgvector; extra == "pgvector"
+ Provides-Extra: js
+ Requires-Dist: tree-sitter; extra == "js"
+ Requires-Dist: tree-sitter-javascript; extra == "js"
+ Requires-Dist: tree-sitter-typescript; extra == "js"
+ Provides-Extra: html
+ Requires-Dist: tree-sitter; extra == "html"
+ Requires-Dist: tree-sitter-html; extra == "html"
+ Provides-Extra: all
+ Requires-Dist: psycopg2-binary; extra == "all"
+ Requires-Dist: pgvector; extra == "all"
+ Requires-Dist: tree-sitter; extra == "all"
+ Requires-Dist: tree-sitter-javascript; extra == "all"
+ Requires-Dist: tree-sitter-typescript; extra == "all"
+ Requires-Dist: tree-sitter-html; extra == "all"
+ Dynamic: license-file
+
+ # codegraphy
+
+ Standalone Python package that parses a codebase into a knowledge graph (PostgreSQL or SQLite) and exposes it as an [MCP](https://modelcontextprotocol.io/) server for Claude Code. Claude calls graph tools instead of `Read` + `Bash(grep)`, cutting exploration token cost by an estimated 5–10×.
+
+ **Python:** 3.10+
+ **License:** MIT
+
+ ---
+
+ ## Why
+
+ Claude exploring an unfamiliar codebase today:
+
+ | Task | Without codegraphy | With codegraphy |
+ |------|-------------------|----------------|
+ | Find where `Something` is defined | Read 10 files (~15k tokens) | `search_symbol("Something")` (~200 tokens) |
+ | Understand a file's structure | Read full file (~3k tokens) | `get_file_summary("views.py")` (~300 tokens) |
+
+ ---
+
+ ## Installation
+
+ ```bash
+ # SQLite-only install (default, zero config):
+ pip install codegraphy
+
+ # For PostgreSQL support (quoted so shells such as zsh do not expand the brackets):
+ pip install "codegraphy[postgres]"
+
+ # For JS/TS parsing (planned):
+ pip install "codegraphy[js]"
+
+ # Everything:
+ pip install "codegraphy[all]"
+ ```
+
+ The base package has no database driver dependency: SQLite support comes from Python's standard library, so PostgreSQL stays opt-in.
+
+ ---
+
+ ## Quickstart
+
+ ```bash
+ # 1. Initialize the database (SQLite by default)
+ codegraphy init
+
+ # 2. Index your project
+ codegraphy index .
+
+ # 3. Start the MCP server (stdio, for Claude Code)
+ codegraphy serve
+ ```
+
+ That's it. Claude can now query your codebase graph instead of reading files.
+
+ ---
+
+ ## CLI Reference
+
+ ```bash
+ codegraphy init [--db URL]          # Create tables (SQLite default, or pass Postgres URL)
+ codegraphy index PATH [--exclude]   # Full index of a directory
+ codegraphy update                   # Incremental re-index via git diff
+ codegraphy serve                    # Start MCP server over stdio
+ codegraphy search NAME              # Search symbols (debug, not MCP)
+ codegraphy usages QUALIFIED_NAME    # Find usages (debug, not MCP)
+ codegraphy stats                    # Show graph statistics
+ ```
+
+ ---
+
+ ## MCP Tools
+
+ When running as an MCP server, codegraphy exposes these tools to Claude:
+
+ | Tool | Description |
+ |------|-------------|
+ | `search_symbol(name, kind?, limit?, fallback_grep?)` | Find symbols by name: exact match, then substring, then grep fallback |
+ | `get_file_summary(file_path)` | Classes, functions, imports in a file without reading it |
+ | `find_usages(qualified_name, limit?, fallback_grep?)` | Who imports/calls/references this symbol |
+ | `get_context(file_path, line, radius?)` | Read N lines around a line number |
+ | `path_between(from_qualified, to_qualified, max_depth?)` | BFS shortest path between two symbols |
+ | `grep_search(pattern, include?, exclude?, limit?)` | Direct grep that bypasses the graph |
+ | `graph_stats()` | File/symbol/edge counts, backend type |
+ | `what_touches_model(model_name)` | Django: views, admin, signals referencing a model |
+ | `search_semantic(query, limit?)` | pgvector semantic search (Postgres only, planned) |
+
+ All tools return a `source` field (`"graph"` or `"grep"`) so Claude can gauge confidence.
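
The lookup order described for `search_symbol` (exact match, then substring, then grep fallback) and the `source` field can be illustrated with a minimal sketch. This is not the package's implementation: `SYMBOLS`, `SOURCES`, and `search_symbol_sketch` are hypothetical names, an in-memory list stands in for the graph tables, and a plain text scan stands in for grep.

```python
import re

# Hypothetical in-memory stand-ins for the graph and the project files.
SYMBOLS = [
    {"name": "UserView", "kind": "class", "file": "views.py", "line": 10},
    {"name": "User", "kind": "class", "file": "models.py", "line": 3},
]
SOURCES = {"utils.py": "def helper():\n    return make_widget()\n"}


def search_symbol_sketch(name, limit=10, fallback_grep=True):
    """Try an exact match, then a substring match, then a text scan."""
    exact = [s for s in SYMBOLS if s["name"] == name]
    if exact:
        return {"source": "graph", "results": exact[:limit]}

    partial = [s for s in SYMBOLS if name.lower() in s["name"].lower()]
    if partial:
        return {"source": "graph", "results": partial[:limit]}

    if not fallback_grep:
        return {"source": "graph", "results": []}

    # Last resort: scan raw text, as a grep fallback might (assumption).
    pattern = re.compile(re.escape(name))
    hits = []
    for path, text in SOURCES.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append({"file": path, "line": lineno, "text": line.strip()})
    return {"source": "grep", "results": hits[:limit]}
```

A caller can treat `"graph"` results as structured hits and `"grep"` results as lower-confidence text matches, which is the distinction the `source` field is described as conveying.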
+
+ ---
+
+ ## Configuration
+
+ Settings are resolved in priority order: CLI flag → environment variable → `codegraphy.toml` → defaults.
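
The priority chain above can be sketched as a simple first-match lookup. The function name `resolve_setting`, the `DEFAULTS` values, and the `ENV_VARS` mapping are assumptions made for illustration; the package's real configuration code may be organized differently.

```python
import os

# Assumed defaults and setting-to-variable mapping, for illustration only.
DEFAULTS = {"database_url": "sqlite:///codegraphy.db", "root": "."}
ENV_VARS = {"database_url": "DATABASE_URL", "root": "REPOLENS_ROOT"}


def resolve_setting(key, cli_value=None, file_config=None, environ=None):
    """Return the first value found: CLI flag, env var, config file, default."""
    environ = os.environ if environ is None else environ
    file_config = file_config or {}

    if cli_value is not None:
        return cli_value

    env_name = ENV_VARS.get(key)
    if env_name and env_name in environ:
        return environ[env_name]

    if key in file_config:
        return file_config[key]

    return DEFAULTS[key]
```

Here `file_config` is expected to be the already-parsed contents of `codegraphy.toml` as a dict, which keeps the sketch independent of any particular TOML parser.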
+
+ ### Environment Variables
+
+ ```bash
+ DATABASE_URL=sqlite:///codegraphy.db      # or postgresql://localhost/codegraphy
+ REPOLENS_ROOT=.                           # project root for grep fallback
+ REPOLENS_PLUGINS=repolens.plugins.django
+ ```
+
+ ### Config File (optional)
+
+ ```toml
+ # codegraphy.toml (place at project root)
+ database_url = "postgresql://localhost/codegraphy"
+ root = "."
+ exclude = ["migrations", "node_modules", ".venv", "__pycache__"]
+ plugins = ["repolens.plugins.django"]
+ ```
+
+ ---
+
+ ## Claude Code Integration
+
+ ### Register the MCP server
+
+ Add the server to `.claude/settings.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "codegraphy": {
+       "command": "codegraphy",
+       "args": ["serve"],
+       "env": {
+         "DATABASE_URL": "sqlite:///codegraphy.db"
+       }
+     }
+   }
+ }
+ ```
+
+ ### Auto-update on session end (optional)
+
+ Add a `Stop` hook to `.claude/settings.json`. Each entry wraps its commands in a nested `hooks` array:
+
+ ```json
+ {
+   "hooks": {
+     "Stop": [{
+       "hooks": [{
+         "type": "command",
+         "command": "codegraphy update"
+       }]
+     }]
+   }
+ }
+ ```
+
+ ---
+
+ ## Architecture
+
+ ```
+ repolens/
+ ├── cli.py              # Click CLI entry points
+ ├── config.py           # DATABASE_URL, REPOLENS_ROOT, plugin list
+ ├── db/
+ │   ├── schema.py       # CREATE TABLE statements (PG + SQLite)
+ │   └── store.py        # upsert_symbol, upsert_edge, query helpers
+ ├── indexer/
+ │   ├── base.py         # BaseIndexer ABC, Symbol/Edge dataclasses
+ │   ├── python.py       # ast-based Python indexer
+ │   └── walker.py       # Filesystem walk + git-diff incremental
+ ├── mcp/
+ │   └── server.py       # FastMCP server + all tool definitions
+ ├── plugins/
+ │   ├── base.py         # BasePlugin ABC
+ │   └── django.py       # Django-aware: models, views, signals
+ └── session/            # (planned) git-diff hook + memory write
+ ```
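
The tree above lists an `ast`-based Python indexer. As a rough idea of what such an indexer does, the sketch below walks a module with the standard-library `ast` module and records classes, functions, methods, and imports. The function name `extract_symbols` and the `(kind, qualified_name, line)` tuple shape are assumptions for illustration, not the package's actual `Symbol` type.

```python
import ast


def extract_symbols(source, module="example"):
    """Collect classes, functions, methods, and imports from Python source."""
    symbols = []

    def visit(node, prefix, in_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                qualified = f"{prefix}.{child.name}"
                symbols.append(("class", qualified, child.lineno))
                visit(child, qualified, True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if in_class else "function"
                qualified = f"{prefix}.{child.name}"
                symbols.append((kind, qualified, child.lineno))
                visit(child, qualified, False)
            elif isinstance(child, ast.Import):
                for alias in child.names:
                    symbols.append(("import", alias.name, child.lineno))
            elif isinstance(child, ast.ImportFrom):
                for alias in child.names:
                    target = f"{child.module or ''}.{alias.name}"
                    symbols.append(("import", target, child.lineno))

    visit(ast.parse(source), module, False)
    return symbols
```

The sketch only looks at definitions that sit directly in a module, class, or function body; definitions nested inside `if` or `try` blocks are skipped to keep it short.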
+
+ ### Database Schema
+
+ Three tables power the graph:
+
+ - **`cg_files`**: indexed files with git hash for deduplication
+ - **`cg_symbols`**: every class, function, method, import with location + summary
+ - **`cg_edges`**: relationships of type `imports`, `calls`, `inherits`, `references`, `registers`, `handles_signal`
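
The edge relationships above form a directed graph, and `path_between` is described as a BFS shortest path over it. A minimal sketch of that idea follows, assuming edges have already been loaded as `(source, target)` pairs; `EDGES`, the symbol names, and `path_between_sketch` are hypothetical.

```python
from collections import deque

# Hypothetical (source, target) pairs as they might be read from cg_edges.
EDGES = [
    ("app.views.index", "app.services.load"),
    ("app.services.load", "app.models.User"),
    ("app.views.index", "app.utils.log"),
]


def path_between_sketch(start, goal, edges, max_depth=5):
    """Return the shortest directed path from start to goal, or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because BFS explores paths in order of length, the first path that reaches the goal is a shortest one, and `max_depth` bounds how many edges the search will follow.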
+
+ ### Indexing Strategy
+
+ 1. Walk files via `git ls-files` (falls back to `os.walk`)
+ 2. SHA-256 content hash skips unchanged files
+ 3. AST parsing extracts symbols and edges
+ 4. Plugins post-process symbols (e.g., Django re-tags `class` → `model`)
+ 5. Upsert into database with cascade delete for clean re-indexing
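
Step 2 of the list above, skipping unchanged files by content hash, can be sketched with `hashlib`. The function names and the dict-based stand-in for stored hashes are assumptions; in the package the previous hashes would presumably come from the database.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def files_to_reindex(current, stored_hashes):
    """Return paths whose content differs from the previously stored hash.

    current: mapping of path -> file bytes
    stored_hashes: mapping of path -> hex digest from the last index run
    """
    changed = []
    for path, data in current.items():
        if stored_hashes.get(path) != content_hash(data):
            changed.append(path)
    return sorted(changed)
```

New files have no stored hash, so they are always selected; files whose bytes are identical to the last run are skipped without being parsed.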
+
+ ---
+
+ ## Plugin System
+
+ Plugins implement two hooks:
+
+ ```python
+ class BasePlugin:
+     def on_symbol(self, symbol: Symbol) -> Symbol:
+         """Mutate or re-tag a symbol after parsing."""
+         return symbol
+
+     def extra_edges(self, symbols: list[Symbol]) -> list[Edge]:
+         """Derive additional edges from the symbol list."""
+         return []
+ ```
+
+ ### Built-in: Django Plugin
+
+ Detects Django patterns by file naming convention:
+ - Classes in `models.py` → `kind = "model"`
+ - Classes/functions in `views.py` → `kind = "view"`
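
To make the file-naming convention concrete, here is a minimal sketch of an `on_symbol` hook that re-tags symbols in the way the two bullets describe. The `Symbol` dataclass below is a hypothetical stand-in with assumed field names (`name`, `kind`, `file_path`), and `FileNamePlugin` is an illustrative class, not the built-in Django plugin.

```python
from dataclasses import dataclass, replace
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for the package's Symbol dataclass."""
    name: str
    kind: str
    file_path: str


class FileNamePlugin:
    """Re-tags symbols based on the name of the file they were found in."""

    def on_symbol(self, symbol: Symbol) -> Symbol:
        filename = PurePosixPath(symbol.file_path).name
        if filename == "models.py" and symbol.kind == "class":
            return replace(symbol, kind="model")
        if filename == "views.py" and symbol.kind in {"class", "function"}:
            return replace(symbol, kind="view")
        return symbol
```

Returning a new `Symbol` instead of mutating the input is a design choice of this sketch; the hook's docstring in the README allows either mutating or re-tagging.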
+
+ Enable via environment variable:
+ ```bash
+ REPOLENS_PLUGINS=repolens.plugins.django
+ ```
+
+ ---
+
+ ## Current Status
+
+ | Milestone | Status |
+ |-----------|--------|
+ | M1: Schema + Python indexer + `codegraphy index` | ✅ Complete |
+ | M2: `search_symbol` + `get_file_summary` + MCP serve | ✅ Complete |
+ | M3: `find_usages` + `path_between` + `get_context` + grep fallback | ✅ Complete |
+ | M4: `codegraphy update` (incremental) | ✅ Complete |
+ | M5: Django plugin | 🔶 Partial (symbol re-tagging, no admin/signal edges) |
+ | M6: Semantic search (pgvector) | ⬜ Stub only |
+ | M7: JS/TS indexer (tree-sitter) | ⬜ Planned |
+ | M8: HTML/Template indexer | ⬜ Planned |
+ | M9: `grep_search` tool + cross-language edges | 🔶 `grep_search` done, cross-language edges planned |
+
+ ---
+
+ ## Development
+
+ ```bash
+ # Clone and install in editable mode
+ git clone <repo-url> && cd codegraphy
+ python -m venv .venv && source .venv/bin/activate
+ pip install -e .
+
+ # Initialize local DB and index this project
+ codegraphy init
+ codegraphy index .
+
+ # Check stats
+ codegraphy stats
+ ```
+
+ ## Publishing
+
+ `codegraphy` is configured to build as a standard PyPI distribution from `pyproject.toml`.
+
+ For PyPI trusted publishing, use **`publish.yml`** as the workflow name. The workflow file lives at `.github/workflows/publish.yml`.
+
+ ```bash
+ python -m pip install --upgrade build twine
+ python -m build
+ python -m twine check dist/*
+ python -m twine upload dist/*
+ ```
+
+ ---
+
+ ## What It Is NOT
+
+ - Not a code execution sandbox
+ - Not a test runner or linter
+ - Not a replacement for LSP/IDE features
+ - Not an AI summarizer by default (summaries come from docstrings; AI summaries are a possible opt-in feature in the future)
@@ -0,0 +1,270 @@
+ # codegraphy
+
+ Standalone Python package that parses a codebase into a knowledge graph (PostgreSQL or SQLite) and exposes it as an [MCP](https://modelcontextprotocol.io/) server for Claude Code. Claude calls graph tools instead of `Read` + `Bash(grep)`, cutting exploration token cost by an estimated 5–10×.
+
+ **Python:** 3.10+
+ **License:** MIT
+
+ ---
+
+ ## Why
+
+ Claude exploring an unfamiliar codebase today:
+
+ | Task | Without codegraphy | With codegraphy |
+ |------|-------------------|----------------|
+ | Find where `Something` is defined | Read 10 files (~15k tokens) | `search_symbol("Something")` (~200 tokens) |
+ | Understand a file's structure | Read full file (~3k tokens) | `get_file_summary("views.py")` (~300 tokens) |
+
+ ---
+
+ ## Installation
+
+ ```bash
+ # SQLite-only install (default, zero config):
+ pip install codegraphy
+
+ # For PostgreSQL support (quoted so shells such as zsh do not expand the brackets):
+ pip install "codegraphy[postgres]"
+
+ # For JS/TS parsing (planned):
+ pip install "codegraphy[js]"
+
+ # Everything:
+ pip install "codegraphy[all]"
+ ```
+
+ The base package has no database driver dependency: SQLite support comes from Python's standard library, so PostgreSQL stays opt-in.
+
+ ---
+
+ ## Quickstart
+
+ ```bash
+ # 1. Initialize the database (SQLite by default)
+ codegraphy init
+
+ # 2. Index your project
+ codegraphy index .
+
+ # 3. Start the MCP server (stdio, for Claude Code)
+ codegraphy serve
+ ```
+
+ That's it. Claude can now query your codebase graph instead of reading files.
+
+ ---
+
+ ## CLI Reference
+
+ ```bash
+ codegraphy init [--db URL]          # Create tables (SQLite default, or pass Postgres URL)
+ codegraphy index PATH [--exclude]   # Full index of a directory
+ codegraphy update                   # Incremental re-index via git diff
+ codegraphy serve                    # Start MCP server over stdio
+ codegraphy search NAME              # Search symbols (debug, not MCP)
+ codegraphy usages QUALIFIED_NAME    # Find usages (debug, not MCP)
+ codegraphy stats                    # Show graph statistics
+ ```
+
+ ---
+
+ ## MCP Tools
+
+ When running as an MCP server, codegraphy exposes these tools to Claude:
+
+ | Tool | Description |
+ |------|-------------|
+ | `search_symbol(name, kind?, limit?, fallback_grep?)` | Find symbols by name: exact match, then substring, then grep fallback |
+ | `get_file_summary(file_path)` | Classes, functions, imports in a file without reading it |
+ | `find_usages(qualified_name, limit?, fallback_grep?)` | Who imports/calls/references this symbol |
+ | `get_context(file_path, line, radius?)` | Read N lines around a line number |
+ | `path_between(from_qualified, to_qualified, max_depth?)` | BFS shortest path between two symbols |
+ | `grep_search(pattern, include?, exclude?, limit?)` | Direct grep that bypasses the graph |
+ | `graph_stats()` | File/symbol/edge counts, backend type |
+ | `what_touches_model(model_name)` | Django: views, admin, signals referencing a model |
+ | `search_semantic(query, limit?)` | pgvector semantic search (Postgres only, planned) |
+
+ All tools return a `source` field (`"graph"` or `"grep"`) so Claude can gauge confidence.
+
+ ---
+
+ ## Configuration
+
+ Settings are resolved in priority order: CLI flag → environment variable → `codegraphy.toml` → defaults.
+
+ ### Environment Variables
+
+ ```bash
+ DATABASE_URL=sqlite:///codegraphy.db      # or postgresql://localhost/codegraphy
+ REPOLENS_ROOT=.                           # project root for grep fallback
+ REPOLENS_PLUGINS=repolens.plugins.django
+ ```
+
+ ### Config File (optional)
+
+ ```toml
+ # codegraphy.toml (place at project root)
+ database_url = "postgresql://localhost/codegraphy"
+ root = "."
+ exclude = ["migrations", "node_modules", ".venv", "__pycache__"]
+ plugins = ["repolens.plugins.django"]
+ ```
+
+ ---
+
+ ## Claude Code Integration
+
+ ### Register the MCP server
+
+ Add the server to `.claude/settings.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "codegraphy": {
+       "command": "codegraphy",
+       "args": ["serve"],
+       "env": {
+         "DATABASE_URL": "sqlite:///codegraphy.db"
+       }
+     }
+   }
+ }
+ ```
+
+ ### Auto-update on session end (optional)
+
+ Add a `Stop` hook to `.claude/settings.json`. Each entry wraps its commands in a nested `hooks` array:
+
+ ```json
+ {
+   "hooks": {
+     "Stop": [{
+       "hooks": [{
+         "type": "command",
+         "command": "codegraphy update"
+       }]
+     }]
+   }
+ }
+ ```
+
+ ---
+
+ ## Architecture
+
+ ```
+ repolens/
+ ├── cli.py              # Click CLI entry points
+ ├── config.py           # DATABASE_URL, REPOLENS_ROOT, plugin list
+ ├── db/
+ │   ├── schema.py       # CREATE TABLE statements (PG + SQLite)
+ │   └── store.py        # upsert_symbol, upsert_edge, query helpers
+ ├── indexer/
+ │   ├── base.py         # BaseIndexer ABC, Symbol/Edge dataclasses
+ │   ├── python.py       # ast-based Python indexer
+ │   └── walker.py       # Filesystem walk + git-diff incremental
+ ├── mcp/
+ │   └── server.py       # FastMCP server + all tool definitions
+ ├── plugins/
+ │   ├── base.py         # BasePlugin ABC
+ │   └── django.py       # Django-aware: models, views, signals
+ └── session/            # (planned) git-diff hook + memory write
+ ```
+
+ ### Database Schema
+
+ Three tables power the graph:
+
+ - **`cg_files`**: indexed files with git hash for deduplication
+ - **`cg_symbols`**: every class, function, method, import with location + summary
+ - **`cg_edges`**: relationships of type `imports`, `calls`, `inherits`, `references`, `registers`, `handles_signal`
+
+ ### Indexing Strategy
+
+ 1. Walk files via `git ls-files` (falls back to `os.walk`)
+ 2. SHA-256 content hash skips unchanged files
+ 3. AST parsing extracts symbols and edges
+ 4. Plugins post-process symbols (e.g., Django re-tags `class` → `model`)
+ 5. Upsert into database with cascade delete for clean re-indexing
+
+ ---
+
+ ## Plugin System
+
+ Plugins implement two hooks:
+
+ ```python
+ class BasePlugin:
+     def on_symbol(self, symbol: Symbol) -> Symbol:
+         """Mutate or re-tag a symbol after parsing."""
+         return symbol
+
+     def extra_edges(self, symbols: list[Symbol]) -> list[Edge]:
+         """Derive additional edges from the symbol list."""
+         return []
+ ```
+
+ ### Built-in: Django Plugin
+
+ Detects Django patterns by file naming convention:
+ - Classes in `models.py` → `kind = "model"`
+ - Classes/functions in `views.py` → `kind = "view"`
+
+ Enable via environment variable:
+ ```bash
+ REPOLENS_PLUGINS=repolens.plugins.django
+ ```
+
+ ---
+
+ ## Current Status
+
+ | Milestone | Status |
+ |-----------|--------|
+ | M1: Schema + Python indexer + `codegraphy index` | ✅ Complete |
+ | M2: `search_symbol` + `get_file_summary` + MCP serve | ✅ Complete |
+ | M3: `find_usages` + `path_between` + `get_context` + grep fallback | ✅ Complete |
+ | M4: `codegraphy update` (incremental) | ✅ Complete |
+ | M5: Django plugin | 🔶 Partial (symbol re-tagging, no admin/signal edges) |
+ | M6: Semantic search (pgvector) | ⬜ Stub only |
+ | M7: JS/TS indexer (tree-sitter) | ⬜ Planned |
+ | M8: HTML/Template indexer | ⬜ Planned |
+ | M9: `grep_search` tool + cross-language edges | 🔶 `grep_search` done, cross-language edges planned |
+
+ ---
+
+ ## Development
+
+ ```bash
+ # Clone and install in editable mode
+ git clone <repo-url> && cd codegraphy
+ python -m venv .venv && source .venv/bin/activate
+ pip install -e .
+
+ # Initialize local DB and index this project
+ codegraphy init
+ codegraphy index .
+
+ # Check stats
+ codegraphy stats
+ ```
+
+ ## Publishing
+
+ `codegraphy` is configured to build as a standard PyPI distribution from `pyproject.toml`.
+
+ For PyPI trusted publishing, use **`publish.yml`** as the workflow name. The workflow file lives at `.github/workflows/publish.yml`.
+
+ ```bash
+ python -m pip install --upgrade build twine
+ python -m build
+ python -m twine check dist/*
+ python -m twine upload dist/*
+ ```
+
+ ---
+
+ ## What It Is NOT
+
+ - Not a code execution sandbox
+ - Not a test runner or linter
+ - Not a replacement for LSP/IDE features
+ - Not an AI summarizer by default (summaries come from docstrings; AI summaries are a possible opt-in feature in the future)