codebase-context 1.6.2 → 1.8.0

Files changed (189)
  1. package/LICENSE +21 -21
  2. package/README.md +417 -282
  3. package/dist/analyzers/angular/index.d.ts.map +1 -1
  4. package/dist/analyzers/angular/index.js +91 -40
  5. package/dist/analyzers/angular/index.js.map +1 -1
  6. package/dist/analyzers/generic/index.d.ts +1 -0
  7. package/dist/analyzers/generic/index.d.ts.map +1 -1
  8. package/dist/analyzers/generic/index.js +94 -14
  9. package/dist/analyzers/generic/index.js.map +1 -1
  10. package/dist/cli-formatters.d.ts +47 -0
  11. package/dist/cli-formatters.d.ts.map +1 -0
  12. package/dist/cli-formatters.js +803 -0
  13. package/dist/cli-formatters.js.map +1 -0
  14. package/dist/cli-memory.d.ts +5 -0
  15. package/dist/cli-memory.d.ts.map +1 -0
  16. package/dist/cli-memory.js +218 -0
  17. package/dist/cli-memory.js.map +1 -0
  18. package/dist/cli.d.ts +3 -1
  19. package/dist/cli.d.ts.map +1 -1
  20. package/dist/cli.js +317 -88
  21. package/dist/cli.js.map +1 -1
  22. package/dist/constants/codebase-context.d.ts +13 -0
  23. package/dist/constants/codebase-context.d.ts.map +1 -1
  24. package/dist/constants/codebase-context.js +13 -0
  25. package/dist/constants/codebase-context.js.map +1 -1
  26. package/dist/core/auto-refresh.d.ts +16 -0
  27. package/dist/core/auto-refresh.d.ts.map +1 -0
  28. package/dist/core/auto-refresh.js +25 -0
  29. package/dist/core/auto-refresh.js.map +1 -0
  30. package/dist/core/file-watcher.d.ts +15 -0
  31. package/dist/core/file-watcher.d.ts.map +1 -0
  32. package/dist/core/file-watcher.js +59 -0
  33. package/dist/core/file-watcher.js.map +1 -0
  34. package/dist/core/index-meta.d.ts +27 -0
  35. package/dist/core/index-meta.d.ts.map +1 -0
  36. package/dist/core/index-meta.js +212 -0
  37. package/dist/core/index-meta.js.map +1 -0
  38. package/dist/core/indexer.d.ts.map +1 -1
  39. package/dist/core/indexer.js +324 -26
  40. package/dist/core/indexer.js.map +1 -1
  41. package/dist/core/reranker.d.ts.map +1 -1
  42. package/dist/core/reranker.js +3 -0
  43. package/dist/core/reranker.js.map +1 -1
  44. package/dist/core/search-quality.js +2 -2
  45. package/dist/core/search-quality.js.map +1 -1
  46. package/dist/core/search.d.ts +1 -0
  47. package/dist/core/search.d.ts.map +1 -1
  48. package/dist/core/search.js +79 -11
  49. package/dist/core/search.js.map +1 -1
  50. package/dist/core/symbol-references.d.ts +20 -0
  51. package/dist/core/symbol-references.d.ts.map +1 -0
  52. package/dist/core/symbol-references.js +186 -0
  53. package/dist/core/symbol-references.js.map +1 -0
  54. package/dist/embeddings/index.d.ts +8 -0
  55. package/dist/embeddings/index.d.ts.map +1 -1
  56. package/dist/embeddings/index.js +17 -2
  57. package/dist/embeddings/index.js.map +1 -1
  58. package/dist/embeddings/openai.d.ts +1 -1
  59. package/dist/embeddings/openai.d.ts.map +1 -1
  60. package/dist/embeddings/openai.js +3 -1
  61. package/dist/embeddings/openai.js.map +1 -1
  62. package/dist/embeddings/transformers.d.ts +6 -0
  63. package/dist/embeddings/transformers.d.ts.map +1 -1
  64. package/dist/embeddings/transformers.js +12 -5
  65. package/dist/embeddings/transformers.js.map +1 -1
  66. package/dist/embeddings/types.d.ts +1 -0
  67. package/dist/embeddings/types.d.ts.map +1 -1
  68. package/dist/embeddings/types.js +7 -1
  69. package/dist/embeddings/types.js.map +1 -1
  70. package/dist/eval/harness.d.ts +5 -0
  71. package/dist/eval/harness.d.ts.map +1 -0
  72. package/dist/eval/harness.js +153 -0
  73. package/dist/eval/harness.js.map +1 -0
  74. package/dist/eval/types.d.ts +59 -0
  75. package/dist/eval/types.d.ts.map +1 -0
  76. package/dist/eval/types.js +2 -0
  77. package/dist/eval/types.js.map +1 -0
  78. package/dist/grammars/manifest.d.ts +26 -0
  79. package/dist/grammars/manifest.d.ts.map +1 -0
  80. package/dist/grammars/manifest.js +64 -0
  81. package/dist/grammars/manifest.js.map +1 -0
  82. package/dist/index.d.ts +16 -2
  83. package/dist/index.d.ts.map +1 -1
  84. package/dist/index.js +181 -1300
  85. package/dist/index.js.map +1 -1
  86. package/dist/patterns/semantics.d.ts +2 -1
  87. package/dist/patterns/semantics.d.ts.map +1 -1
  88. package/dist/patterns/semantics.js +0 -2
  89. package/dist/patterns/semantics.js.map +1 -1
  90. package/dist/preflight/evidence-lock.d.ts +6 -0
  91. package/dist/preflight/evidence-lock.d.ts.map +1 -1
  92. package/dist/preflight/evidence-lock.js +33 -1
  93. package/dist/preflight/evidence-lock.js.map +1 -1
  94. package/dist/storage/index.d.ts +4 -1
  95. package/dist/storage/index.d.ts.map +1 -1
  96. package/dist/storage/index.js +2 -2
  97. package/dist/storage/index.js.map +1 -1
  98. package/dist/storage/lancedb.d.ts +11 -1
  99. package/dist/storage/lancedb.d.ts.map +1 -1
  100. package/dist/storage/lancedb.js +45 -11
  101. package/dist/storage/lancedb.js.map +1 -1
  102. package/dist/storage/types.d.ts +4 -1
  103. package/dist/storage/types.d.ts.map +1 -1
  104. package/dist/storage/types.js.map +1 -1
  105. package/dist/tools/detect-circular-dependencies.d.ts +5 -0
  106. package/dist/tools/detect-circular-dependencies.d.ts.map +1 -0
  107. package/dist/tools/detect-circular-dependencies.js +117 -0
  108. package/dist/tools/detect-circular-dependencies.js.map +1 -0
  109. package/dist/tools/get-codebase-metadata.d.ts +5 -0
  110. package/dist/tools/get-codebase-metadata.d.ts.map +1 -0
  111. package/dist/tools/get-codebase-metadata.js +53 -0
  112. package/dist/tools/get-codebase-metadata.js.map +1 -0
  113. package/dist/tools/get-indexing-status.d.ts +5 -0
  114. package/dist/tools/get-indexing-status.d.ts.map +1 -0
  115. package/dist/tools/get-indexing-status.js +44 -0
  116. package/dist/tools/get-indexing-status.js.map +1 -0
  117. package/dist/tools/get-memory.d.ts +5 -0
  118. package/dist/tools/get-memory.d.ts.map +1 -0
  119. package/dist/tools/get-memory.js +89 -0
  120. package/dist/tools/get-memory.js.map +1 -0
  121. package/dist/tools/get-style-guide.d.ts +5 -0
  122. package/dist/tools/get-style-guide.d.ts.map +1 -0
  123. package/dist/tools/get-style-guide.js +151 -0
  124. package/dist/tools/get-style-guide.js.map +1 -0
  125. package/dist/tools/get-symbol-references.d.ts +5 -0
  126. package/dist/tools/get-symbol-references.d.ts.map +1 -0
  127. package/dist/tools/get-symbol-references.js +70 -0
  128. package/dist/tools/get-symbol-references.js.map +1 -0
  129. package/dist/tools/get-team-patterns.d.ts +5 -0
  130. package/dist/tools/get-team-patterns.d.ts.map +1 -0
  131. package/dist/tools/get-team-patterns.js +147 -0
  132. package/dist/tools/get-team-patterns.js.map +1 -0
  133. package/dist/tools/index.d.ts +6 -0
  134. package/dist/tools/index.d.ts.map +1 -0
  135. package/dist/tools/index.js +41 -0
  136. package/dist/tools/index.js.map +1 -0
  137. package/dist/tools/refresh-index.d.ts +5 -0
  138. package/dist/tools/refresh-index.d.ts.map +1 -0
  139. package/dist/tools/refresh-index.js +40 -0
  140. package/dist/tools/refresh-index.js.map +1 -0
  141. package/dist/tools/remember.d.ts +5 -0
  142. package/dist/tools/remember.d.ts.map +1 -0
  143. package/dist/tools/remember.js +101 -0
  144. package/dist/tools/remember.js.map +1 -0
  145. package/dist/tools/search-codebase.d.ts +5 -0
  146. package/dist/tools/search-codebase.d.ts.map +1 -0
  147. package/dist/tools/search-codebase.js +745 -0
  148. package/dist/tools/search-codebase.js.map +1 -0
  149. package/dist/tools/types.d.ts +223 -0
  150. package/dist/tools/types.d.ts.map +1 -0
  151. package/dist/tools/types.js +2 -0
  152. package/dist/tools/types.js.map +1 -0
  153. package/dist/types/index.d.ts +79 -11
  154. package/dist/types/index.d.ts.map +1 -1
  155. package/dist/types/index.js +0 -1
  156. package/dist/types/index.js.map +1 -1
  157. package/dist/utils/ast-chunker.d.ts +71 -0
  158. package/dist/utils/ast-chunker.d.ts.map +1 -0
  159. package/dist/utils/ast-chunker.js +453 -0
  160. package/dist/utils/ast-chunker.js.map +1 -0
  161. package/dist/utils/chunking.d.ts.map +1 -1
  162. package/dist/utils/chunking.js +10 -3
  163. package/dist/utils/chunking.js.map +1 -1
  164. package/dist/utils/language-detection.d.ts.map +1 -1
  165. package/dist/utils/language-detection.js +26 -1
  166. package/dist/utils/language-detection.js.map +1 -1
  167. package/dist/utils/tree-sitter.d.ts +28 -0
  168. package/dist/utils/tree-sitter.d.ts.map +1 -0
  169. package/dist/utils/tree-sitter.js +422 -0
  170. package/dist/utils/tree-sitter.js.map +1 -0
  171. package/dist/utils/usage-tracker.d.ts +30 -40
  172. package/dist/utils/usage-tracker.d.ts.map +1 -1
  173. package/dist/utils/usage-tracker.js +66 -8
  174. package/dist/utils/usage-tracker.js.map +1 -1
  175. package/docs/capabilities.md +183 -92
  176. package/docs/cli.md +196 -0
  177. package/grammars/.gitkeep +0 -0
  178. package/grammars/tree-sitter-c.wasm +0 -0
  179. package/grammars/tree-sitter-c_sharp.wasm +0 -0
  180. package/grammars/tree-sitter-cpp.wasm +0 -0
  181. package/grammars/tree-sitter-go.wasm +0 -0
  182. package/grammars/tree-sitter-java.wasm +0 -0
  183. package/grammars/tree-sitter-javascript.wasm +0 -0
  184. package/grammars/tree-sitter-kotlin.wasm +0 -0
  185. package/grammars/tree-sitter-python.wasm +0 -0
  186. package/grammars/tree-sitter-rust.wasm +0 -0
  187. package/grammars/tree-sitter-tsx.wasm +0 -0
  188. package/grammars/tree-sitter-typescript.wasm +0 -0
  189. package/package.json +153 -157
package/README.md CHANGED
@@ -1,282 +1,417 @@
1
- # codebase-context
2
-
3
- ## Local-first second brain for AI Agents working on your codebase
4
-
5
- [![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](./package.json)
6
-
7
- You're tired of AI agents writing code that 'just works' but fits like a square peg in a round hole - not your conventions, not your architecture, not your repo. Even with well-curated instructions. You correct the agent, it doesn't remember. Next session, same mistakes.
8
-
9
- This MCP gives agents _just enough_ context so they match _how_ your team codes, know _why_, and _remember_ every correction.
10
-
11
- Here's what codebase-context does:
12
-
13
- **Finds the right context** - Search that doesn't just return code. Each result comes back with analyzed and quantified coding patterns and conventions, related team memories, file relationships, and quality indicators. It knows whether you're looking for a specific file, a concept, or how things wire together - and filters out the noise (test files, configs, old utilities) before the agent sees them. The agent gets curated context, not raw hits.
14
-
15
- **Knows your conventions** - Detected from your code and git history, not only from rules you wrote. Seeks team consensus and direction by adoption percentages and trends (rising/declining), golden files. Tells the difference between code that's _common_ and code that's _current_ - what patterns the team is moving toward and what's being left behind.
16
-
17
- **Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.
18
-
19
- **Checks before editing** - A preflight card with risk level, patterns to use and avoid, failure warnings, and a `readyToEdit` evidence check. Catches the "confidently wrong" problem: when code, team memories, and patterns contradict each other, it tells the agent to ask instead of guess. If evidence is thin or contradictory, it says so.
20
-
21
- One tool call returns all of it. Local-first - your code never leaves your machine.
22
-
23
- <!-- TODO: Add demo GIF: search_codebase("How does this app attach the auth token to outgoing API calls?") AuthInterceptor top result + preflight + agent proceeds or asks -->
24
- <!-- ![Demo](./docs/assets/demo.gif) -->
25
-
26
- ## Quick Start
27
-
28
- Add it to the configuration of your AI Agent of preference:
29
-
30
- ### Claude Code
31
-
32
- ```bash
33
- claude mcp add codebase-context -- npx -y codebase-context /path/to/your/project
34
- ```
35
-
36
- ### Claude Desktop
37
-
38
- Add to `claude_desktop_config.json`:
39
-
40
- ```json
41
- {
42
- "mcpServers": {
43
- "codebase-context": {
44
- "command": "npx",
45
- "args": ["-y", "codebase-context", "/path/to/your/project"]
46
- }
47
- }
48
- }
49
- ```
50
-
51
- ### VS Code (Copilot)
52
-
53
- Add `.vscode/mcp.json` to your project root:
54
-
55
- ```json
56
- {
57
- "servers": {
58
- "codebase-context": {
59
- "command": "npx",
60
- "args": ["-y", "codebase-context", "/path/to/your/project"] // Or "${workspaceFolder}"if your workspace is one project only
61
- }
62
- }
63
- }
64
- ```
65
-
66
- ### Cursor
67
-
68
- Add to `.cursor/mcp.json` in your project:
69
-
70
- ```json
71
- {
72
- "mcpServers": {
73
- "codebase-context": {
74
- "command": "npx",
75
- "args": ["-y", "codebase-context", "/path/to/your/project"]
76
- }
77
- }
78
- }
79
- ```
80
-
81
- ### Windsurf
82
-
83
- Open Settings > MCP and add:
84
-
85
- ```json
86
- {
87
- "mcpServers": {
88
- "codebase-context": {
89
- "command": "npx",
90
- "args": ["-y", "codebase-context", "/path/to/your/project"]
91
- }
92
- }
93
- }
94
- ```
95
-
96
- ## Codex
97
-
98
- Run codex mcp add codebase-context npx -y codebase-context "/path/to/your/project"
99
-
100
- ## What It Actually Does
101
-
102
- Other tools help AI find code. This one helps AI make the right decisions - by knowing what your team does, tracking where codebases are heading, and warning before mistakes happen.
103
-
104
- ### The Difference
105
-
106
- | Without codebase-context | With codebase-context |
107
- | ------------------------------------------------------- | --------------------------------------------------- |
108
- | Generates code using whatever matches or "sounds" right | Generates code following your team conventions |
109
- | Copies any example that fits | Follows your best implementations (golden files) |
110
- | Repeats mistakes you already corrected | Surfaces failure memories right before trying again |
111
- | You re-explain the same things every session | Remembers conventions and decisions automatically |
112
- | Edits confidently even when context is weak | Flags high-risk changes when evidence is thin |
113
- | Sees what the current code does and assumes | Sees how your code has evolved and why |
114
-
115
- ### The Search Tool (`search_codebase`)
116
-
117
- This is where it all comes together. One call returns:
118
-
119
- - **Code results** with `file` (path + line range), `summary`, `score`
120
- - **Type** per result: compact `componentType:layer` (e.g., `service:data`) — helps agents orient
121
- - **Pattern signals** per result: `trend` (Rising/Declining — Stable is omitted) and `patternWarning` when using legacy code
122
- - **Relationships** per result: `importedByCount` and `hasTests` (condensed)
123
- - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
124
- - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
125
- - **Preflight**: `ready` (boolean) + `reason` when evidence is thin. Pass `intent="edit"` to get the full preflight card. If search quality is low, `ready` is always `false`.
126
-
127
- Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
128
-
129
- ```json
130
- {
131
- "searchQuality": { "status": "ok", "confidence": 0.72 },
132
- "preflight": { "ready": true },
133
- "results": [
134
- {
135
- "file": "src/auth/auth.interceptor.ts:1-20",
136
- "summary": "HTTP interceptor that attaches auth token to outgoing requests",
137
- "score": 0.72,
138
- "type": "service:core",
139
- "trend": "Rising",
140
- "relationships": { "importedByCount": 4, "hasTests": true }
141
- }
142
- ],
143
- "relatedMemories": ["Always use HttpInterceptorFn (0.97)"]
144
- }
145
- ```
146
-
147
- Lean enough to fit on one screen. If search quality is low, preflight blocks edits instead of faking confidence.
148
-
149
- ### Patterns & Conventions (`get_team_patterns`)
150
-
151
- Detects what your team actually does by analyzing the codebase:
152
-
153
- - Adoption percentages for dependency injection, state management, testing, libraries
154
- - Patterns/conventions trend direction (Rising / Stable / Declining) based on git recency
155
- - Golden files - your best implementations ranked by modern pattern density
156
- - Conflicts - when the team hasn't converged (both approaches above 20% adoption)
157
-
158
- ### Team Memory (`remember` + `get_memory`)
159
-
160
- Record a decision once. It surfaces automatically in search results and preflight cards from then on. **Your git commits also become memories** - conventional commits like `refactor:`, `migrate:`, `fix:`, `revert:` from the last 90 days are auto-extracted during indexing.
161
-
162
- - **Types**: conventions (style rules), decisions (architecture choices), gotchas (things that break), failures (we tried X, it broke because Y)
163
- - **Confidence decay**: decisions age over 180 days, gotchas and failures over 90 days. Stale memories get flagged instead of blindly trusted.
164
- - **Zero-config git extraction**: runs automatically during `refresh_index`. No setup, no manual work.
165
-
166
- ### All Tools
167
-
168
- | Tool | What it does |
169
- | ------------------------------ | -------------------------------------------------------------------------------- |
170
- | `search_codebase` | Hybrid search with enrichment + preflight. Pass `intent="edit"` for edit readiness check. |
171
- | `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
172
- | `get_component_usage` | "Find Usages" - where a library or component is imported |
173
- | `remember` | Record a convention, decision, gotcha, or failure |
174
- | `get_memory` | Query team memory with confidence decay scoring |
175
- | `get_codebase_metadata` | Project structure, frameworks, dependencies |
176
- | `get_style_guide` | Style guide rules for the current project |
177
- | `detect_circular_dependencies` | Import cycles between files |
178
- | `refresh_index` | Re-index (full or incremental) + extract git memories |
179
- | `get_indexing_status` | Progress and stats for the current index |
180
-
181
- ## How the Search Works
182
-
183
- The retrieval pipeline is designed around one goal: give the agent the right context, not just any file that matches.
184
-
185
- - **Intent classification** - knows whether "AuthService" is a name lookup or "how does auth work" is conceptual. Adjusts keyword/semantic weights accordingly.
186
- - **Hybrid fusion (RRF)** - combines keyword and semantic search using Reciprocal Rank Fusion instead of brittle score averaging.
187
- - **Query expansion** - conceptual queries automatically expand with domain-relevant terms (auth → login, token, session, guard).
188
- - **Contamination control** - test files are filtered/demoted for non-test queries.
189
- - **Import centrality** - files that are imported more often rank higher.
190
- - **Cross-encoder reranking** - a stage-2 reranker triggers only when top scores are ambiguous. CPU-only, bounded to top-K.
191
- - **Incremental indexing** - only re-indexes files that changed since last run (SHA-256 manifest diffing).
192
- - **Auto-heal** - if the index corrupts, search triggers a full re-index automatically.
193
-
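The Reciprocal Rank Fusion step named in the list above can be sketched roughly as follows. This is a minimal illustration of the general technique, not the package's implementation: the function name `reciprocalRankFusion`, the sample file names, and the constant `k = 60` (a commonly used default for RRF) are all assumptions.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d).
// Hypothetical helper; k = 60 is a common default, not a value confirmed by the source.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Illustrative inputs: one keyword ranking and one semantic ranking.
const keywordHits = ["auth.service.ts", "auth.guard.ts", "login.component.ts"];
const semanticHits = ["auth.interceptor.ts", "auth.service.ts", "token.store.ts"];
const fused = reciprocalRankFusion([keywordHits, semanticHits]);
console.log(fused.map(([file]) => file));
```

Because only ranks are combined, a file that appears in both lists rises to the top without the keyword and semantic scores having to share a scale, which is the usual argument for RRF over score averaging.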
194
- ## Language Support
195
-
196
- Over **30+ languages** are supported: TypeScript, JavaScript, Python, Java, Kotlin, C/C++, C#, Go, Rust, PHP, Ruby, Swift, Scala, Shell, and common config/markup formats.
197
- However right now only **Angular** has a specific analyzer for enriched context (signals, standalone components, control flow, DI patterns).
198
- If you need enriched context from any language or framework, please file an issue - or even better, contribute with a new analyzer
199
-
200
- Structured filters available: `framework`, `language`, `componentType`, `layer` (presentation, business, data, state, core, shared).
201
-
202
- ## Configuration
203
-
204
- | Variable | Default | Description |
205
- | ------------------------ | -------------- | --------------------------------------------------------- |
206
- | `EMBEDDING_PROVIDER` | `transformers` | `openai` (fast, cloud) or `transformers` (local, private) |
207
- | `OPENAI_API_KEY` | - | Required only if using `openai` provider |
208
- | `CODEBASE_ROOT` | - | Project root (CLI arg takes precedence) |
209
- | `CODEBASE_CONTEXT_DEBUG` | - | Set to `1` for verbose logging |
210
-
211
- ## Performance
212
-
213
- - **First indexing**: 2-5 minutes for ~30k files (embedding computation).
214
- - **Subsequent queries**: milliseconds from cache.
215
- - **Incremental updates**: `refresh_index` with `incrementalOnly: true` processes only changed files (SHA-256 manifest diffing).
216
-
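The manifest diffing mentioned for incremental updates might look something like the sketch below. It assumes a manifest is a plain map from file path to SHA-256 hex digest; the `Manifest` type, the `diffManifests` helper, and the placeholder digests are hypothetical and only illustrate the idea of re-indexing what changed.

```typescript
// Assumed shape: file path -> SHA-256 hex digest (digests below are placeholders).
type Manifest = Record<string, string>;

interface ManifestDiff {
  added: string[];
  changed: string[];
  removed: string[];
}

const has = (m: Manifest, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(m, key);

// Hypothetical helper: compare the previous manifest with the current one.
function diffManifests(previous: Manifest, current: Manifest): ManifestDiff {
  const diff: ManifestDiff = { added: [], changed: [], removed: [] };
  for (const path of Object.keys(current)) {
    if (!has(previous, path)) diff.added.push(path);
    else if (previous[path] !== current[path]) diff.changed.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (!has(current, path)) diff.removed.push(path);
  }
  return diff;
}

const manifestDiff = diffManifests(
  { "src/a.ts": "aaa", "src/b.ts": "bbb", "src/c.ts": "ccc" },
  { "src/a.ts": "aaa", "src/b.ts": "b2b", "src/d.ts": "ddd" }
);
console.log(manifestDiff);
```

Only the `added` and `changed` paths would need re-embedding under this scheme; `removed` paths would be dropped from the index.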
217
- ## File Structure
218
-
219
- ```
220
- .codebase-context/
221
- memory.json # Team knowledge (should be persisted in git)
222
- intelligence.json # Pattern analysis (generated)
223
- index.json # Keyword index (generated)
224
- index/ # Vector database (generated)
225
- ```
226
-
227
- **Recommended `.gitignore`:**
228
-
229
- ```gitignore
230
- # Codebase Context - ignore generated files, keep memory
231
- .codebase-context/*
232
- !.codebase-context/memory.json
233
- ```
234
-
235
- ## CLI Access (Vendor-Neutral)
236
-
237
- You can manage team memory directly from the terminal without any AI agent:
238
-
239
- ```bash
240
- # List all memories
241
- npx codebase-context memory list
242
-
243
- # Filter by category or type
244
- npx codebase-context memory list --category conventions --type convention
245
-
246
- # Search memories
247
- npx codebase-context memory list --query "auth"
248
-
249
- # Add a memory
250
- npx codebase-context memory add --type convention --category tooling --memory "Use pnpm, not npm" --reason "Workspace support and speed"
251
-
252
- # Remove a memory
253
- npx codebase-context memory remove <id>
254
-
255
- # JSON output for scripting
256
- npx codebase-context memory list --json
257
- ```
258
-
259
- Set `CODEBASE_ROOT` to point to your project, or run from the project directory.
260
-
261
- ## Tip: Ensuring your AI Agent recalls memory:
262
-
263
- Add this to `.cursorrules`, `CLAUDE.md`, or `AGENTS.md`:
264
-
265
- ```
266
- ## Codebase Context
267
-
268
- **At start of each task:** Call `get_memory` to load team conventions.
269
-
270
- **When user says "remember this" or "record this":**
271
- - Call `remember` tool IMMEDIATELY before doing anything else.
272
- ```
273
-
274
- ## Links
275
-
276
- - [Motivation](./MOTIVATION.md) - Research and design rationale
277
- - [Changelog](./CHANGELOG.md) - Version history
278
- - [Contributing](./CONTRIBUTING.md) - How to add analyzers
279
-
280
- ## License
281
-
282
- MIT
1
+ # codebase-context
2
+
3
+ ## Local-first second brain for AI agents working on your codebase
4
+
5
+ [![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](./package.json)
6
+
7
+ You're tired of AI agents writing code that 'just works' but fits like a square peg in a round hole - not your conventions, not your architecture, not your repo. Even with well-curated instructions. You correct the agent, it doesn't remember. Next session, same mistakes.
8
+
9
+ This MCP gives agents _just enough_ context so they match _how_ your team codes, know _why_, and _remember_ every correction.
10
+
11
+ Here's what codebase-context does:
12
+
13
+ **Finds the right context** - Search that doesn't just return code. Each result comes back with analyzed and quantified coding patterns and conventions, related team memories, file relationships, and quality indicators. It knows whether you're looking for a specific file, a concept, or how things wire together - and filters out the noise (test files, configs, old utilities) before the agent sees them. The agent gets curated context, not raw hits.
14
+
15
+ **Knows your conventions** - Detected from your code and git history, not only from rules you wrote. Seeks team consensus and direction by adoption percentages and trends (rising/declining), golden files. Tells the difference between code that's _common_ and code that's _current_ - what patterns the team is moving toward and what's being left behind.
16
+
17
+ **Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.
18
+
19
+ **Checks before editing** - Before editing something, you get a decision card showing whether there's enough evidence to proceed. If a symbol has four callers (files that import or reference it) and only two appear in your search results, the card shows that coverage gap. If coverage is low, `whatWouldHelp` lists the specific searches to run before you touch anything. When code, team memories, and patterns contradict each other, it tells you to look deeper instead of guessing.
20
+
21
+ One tool call returns all of it. Local-first - your code never leaves your machine by default. Opt into `EMBEDDING_PROVIDER=openai` for cloud speed, but then code is sent externally.
22
+
23
+ The index auto-refreshes as you edit - a file watcher triggers incremental reindex in the background when the MCP server is running. No stale context between tool calls.
24
+
25
+ <!-- TODO: Add demo GIF: search_codebase("How does this app attach the auth token to outgoing API calls?") -> AuthInterceptor top result + preflight + agent proceeds or asks -->
26
+ <!-- ![Demo](./docs/assets/demo.gif) -->
27
+
28
+ ## Quick Start
29
+
30
+ Add it to the configuration of your AI Agent of preference:
31
+
32
+ ### Claude Code
33
+
34
+ ```bash
35
+ claude mcp add codebase-context -- npx -y codebase-context /path/to/your/project
36
+ ```
37
+
38
+ ### Claude Desktop
39
+
40
+ Add to `claude_desktop_config.json`:
41
+
42
+ ```json
43
+ {
44
+ "mcpServers": {
45
+ "codebase-context": {
46
+ "command": "npx",
47
+ "args": ["-y", "codebase-context", "/path/to/your/project"]
48
+ }
49
+ }
50
+ }
51
+ ```
52
+
53
+ ### VS Code (Copilot)
54
+
55
+ Add `.vscode/mcp.json` to your project root:
56
+
57
+ ```json
58
+ {
59
+ "servers": {
60
+ "codebase-context": {
61
+ "command": "npx",
62
+ "args": ["-y", "codebase-context", "/path/to/your/project"] // Or "${workspaceFolder}" if your workspace is one project only
63
+ }
64
+ }
65
+ }
66
+ ```
67
+
68
+ ### Cursor
69
+
70
+ Add to `.cursor/mcp.json` in your project:
71
+
72
+ ```json
73
+ {
74
+ "mcpServers": {
75
+ "codebase-context": {
76
+ "command": "npx",
77
+ "args": ["-y", "codebase-context", "/path/to/your/project"]
78
+ }
79
+ }
80
+ }
81
+ ```
82
+
83
+ ### Windsurf
84
+
85
+ Open Settings > MCP and add:
86
+
87
+ ```json
88
+ {
89
+ "mcpServers": {
90
+ "codebase-context": {
91
+ "command": "npx",
92
+ "args": ["-y", "codebase-context", "/path/to/your/project"]
93
+ }
94
+ }
95
+ }
96
+ ```
97
+
98
+ ### Codex
99
+
100
+ ```bash
101
+ codex mcp add codebase-context npx -y codebase-context "/path/to/your/project"
102
+ ```
103
+
104
+ ## New to this codebase?
105
+
106
+ Three commands to get what usually takes a new developer weeks to piece together:
107
+
108
+ ```bash
109
+ # What tech stack, architecture, and file count?
110
+ npx -y codebase-context metadata
111
+
112
+ # What does the team actually code like right now?
113
+ npx -y codebase-context patterns
114
+
115
+ # What team decisions were made (and why)?
116
+ npx -y codebase-context memory list
117
+ ```
118
+
119
+ This is also what your AI agent consumes automatically via MCP tools; the CLI is the human-readable version.
120
+
121
+ ### CLI preview
122
+
123
+ ```text
124
+ $ npx -y codebase-context patterns
125
+ ┌─ Team Patterns ──────────────────────────────────────────────────────┐
126
+ │ │
127
+ │ UNIT TEST FRAMEWORK │
128
+ │ USE: Vitest – 96% adoption │
129
+ │ alt CAUTION: Jest – 4% minority pattern │
130
+ │ │
131
+ │ STATE MANAGEMENT │
132
+ │ PREFER: RxJS – 63% adoption │
133
+ │ alt Redux-style store – 25% │
134
+ │ │
135
+ └──────────────────────────────────────────────────────────────────────┘
136
+ ```
137
+
138
+ ```text
139
+ $ npx -y codebase-context search --query "file watcher" --intent edit --limit 1
140
+ ┌─ Search: "file watcher" ─── intent: edit ────────────────────────────┐
141
+ │ Quality: ok (1.00) │
142
+ │ Ready to edit: YES │
143
+ │ │
144
+ │ Best example: index.ts │
145
+ └──────────────────────────────────────────────────────────────────────┘
146
+ ```
147
+
148
+ ```text
149
+ $ npx -y codebase-context metadata
150
+ ┌─ codebase-context [monorepo] ────────────────────────────────────────┐
151
+ │ │
152
+ │ Framework: Angular unknown Architecture: mixed │
153
+ │ 130 files · 24,211 lines · 1077 components │
154
+ │ │
155
+ │ Dependencies: @huggingface/transformers · @lancedb/lancedb · │
156
+ │ @modelcontextprotocol/sdk · @typescript-eslint/typescript-estree · │
157
+ │ chokidar · fuse.js (+14 more) │
158
+ │ │
159
+ └──────────────────────────────────────────────────────────────────────┘
160
+ ```
161
+
162
+ ```text
163
+ $ npx -y codebase-context refs --symbol "startFileWatcher"
164
+ ┌─ startFileWatcher ─── 11 references ─── static analysis ─────────────┐
165
+ │ │
166
+ │ startFileWatcher │
167
+ │ │ │
168
+ │ ├─ file-watcher.test.ts:5 │
169
+ │ import { startFileWatcher } from '../src/core/file-watcher.... │
170
+ │ │
171
+ └──────────────────────────────────────────────────────────────────────┘
172
+ ```
173
+
174
+ ```text
175
+ $ npx -y codebase-context cycles
176
+ ┌─ Circular Dependencies ──────────────────────────────────────────────┐
177
+ │ │
178
+ │ No cycles found · 98 files · 260 edges · 2.7 avg deps │
179
+ │ │
180
+ └──────────────────────────────────────────────────────────────────────┘
181
+ ```
182
+
183
+ See `docs/cli.md` for the full CLI gallery.
184
+
185
+ ## What It Actually Does
186
+
187
+ Other tools help AI find code. This one helps AI make the right decisions - by knowing what your team does, tracking where codebases are heading, and warning before mistakes happen.
188
+
189
+ ### The Difference
190
+
191
+ | Without codebase-context | With codebase-context |
192
+ | ------------------------------------------------------- | --------------------------------------------------- |
193
+ | Generates code using whatever matches or "sounds" right | Generates code following your team conventions |
194
+ | Copies any example that fits | Follows your best implementations (golden files) |
195
+ | Repeats mistakes you already corrected | Surfaces failure memories right before trying again |
196
+ | You re-explain the same things every session | Remembers conventions and decisions automatically |
197
+ | Edits confidently even when context is weak | Flags high-risk changes when evidence is thin |
198
+ | Sees what the current code does and assumes | Sees how your code has evolved and why |
199
+
200
+ ### The Search Tool (`search_codebase`)
201
+
202
+ This is where it all comes together. One call returns:
203
+
204
+ - **Code results** with `file` (path + line range), `summary`, `score`
205
+ - **Type** per result: compact `componentType:layer` (e.g., `service:data`) — helps agents orient
206
+ - **Pattern signals** per result: `trend` (Rising/Declining — Stable is omitted) and `patternWarning` when using legacy code
207
+ - **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests) — so you see suggested next reads and know what you haven't looked at yet
208
+ - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
209
+ - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
210
+ - **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (import-graph coverage — how many files that import or reference the result are in your search), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.
211
+
212
+ Snippets are optional (`includeSnippets: true`). When enabled, snippets that have symbol metadata (e.g. from the Generic analyzer's AST chunking or Angular component chunks) start with a scope header so you know where the code lives (e.g. `// AuthService.getToken()` or `// SpotifyApiService`). Example:
213
+
214
+ ```ts
215
+ // AuthService.getToken()
216
+ getToken(): string {
217
+ return this.token;
218
+ }
219
+ ```
220
+
221
+ Default output is lean: if the agent wants code, it calls `read_file`.
222
+
223
+ For scripting and automation, every CLI command accepts `--json` for machine output (stdout = JSON; logs/errors go to stderr).
224
+ See `docs/capabilities.md` for the field reference.
225
+
226
+ Lean enough to fit on one screen. If search quality is low, preflight blocks edits instead of faking confidence.
227
+
228
+ ### Patterns & Conventions (`get_team_patterns`)
229
+
230
+ Detects what your team actually does by analyzing the codebase:
231
+
232
+ - Adoption percentages for dependency injection, state management, testing, libraries
233
+ - Patterns/conventions trend direction (Rising / Stable / Declining) based on git recency
234
+ - Golden files - your best implementations ranked by modern pattern density
235
+ - Conflicts - when the team hasn't converged (both approaches above 20% adoption)
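
The adoption and conflict rules above can be sketched as follows. This is a minimal illustration, not the package's implementation: `UsageCounts`, `AdoptionReport`, and `analyzeAdoption` are hypothetical names, and only the 20% threshold comes from the text above.

```typescript
// Hypothetical input: number of files using each approach within one category.
type UsageCounts = Record<string, number>;

interface AdoptionReport {
  adoption: Record<string, number>; // approach -> percentage (0-100)
  conflict: string[]; // approaches above the threshold when two or more qualify
}

// Threshold taken from the README; the rest of this sketch is an assumption.
const CONFLICT_THRESHOLD = 20;

function analyzeAdoption(counts: UsageCounts): AdoptionReport {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const adoption: Record<string, number> = {};
  for (const [approach, n] of Object.entries(counts)) {
    adoption[approach] = total === 0 ? 0 : Math.round((n / total) * 100);
  }
  // A conflict needs at least two approaches that each exceed the threshold.
  const contenders = Object.entries(adoption)
    .filter(([, pct]) => pct > CONFLICT_THRESHOLD)
    .map(([approach]) => approach);
  return { adoption, conflict: contenders.length >= 2 ? contenders : [] };
}
```

With counts of 70 and 30 files, both approaches clear 20% and are reported as a conflict; at 90 and 10, the minority approach is below the threshold and no conflict is raised.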
236
+
237
+ ### Team Memory (`remember` + `get_memory`)
238
+
239
+ Record a decision once. It surfaces automatically in search results and preflight cards from then on. **Your git commits also become memories** - conventional commits like `refactor:`, `migrate:`, `fix:`, `revert:` from the last 90 days are auto-extracted during indexing.
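
As a rough sketch of what extracting memories from conventional commits could look like, the function below filters commit subjects by prefix and age. The `Commit` and `GitMemory` shapes, the regular expression, and `extractGitMemories` are assumptions made for illustration; only the four prefixes and the 90-day window come from the paragraph above.

```typescript
// Hypothetical shapes; the real extractor's types are not shown in this README.
interface Commit {
  subject: string;
  date: Date;
}

interface GitMemory {
  type: string;
  scope: string | null;
  summary: string;
  date: Date;
}

const MEMORY_PREFIXES = new Set(["refactor", "migrate", "fix", "revert"]);
const MAX_AGE_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Matches "type: summary" and "type(scope): summary", with an optional "!".
const CONVENTIONAL = /^(\w+)(?:\(([^)]+)\))?!?:\s+(.+)$/;

function extractGitMemories(commits: Commit[], now: Date): GitMemory[] {
  const memories: GitMemory[] = [];
  for (const commit of commits) {
    const ageDays = (now.getTime() - commit.date.getTime()) / DAY_MS;
    if (ageDays > MAX_AGE_DAYS) continue; // outside the 90-day window
    const match = CONVENTIONAL.exec(commit.subject);
    if (!match) continue; // not a conventional commit
    const type = match[1] ?? "";
    if (!MEMORY_PREFIXES.has(type)) continue; // e.g. "feat:" is skipped
    memories.push({
      type,
      scope: match[2] ?? null,
      summary: match[3] ?? "",
      date: commit.date,
    });
  }
  return memories;
}
```

Passing `now` explicitly keeps the function deterministic, which makes the age cutoff easy to test.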
240
+
241
+ - **Types**: conventions (style rules), decisions (architecture choices), gotchas (things that break), failures (we tried X, it broke because Y)
242
+ - **Confidence decay**: decisions age over 180 days, gotchas and failures over 90 days. Stale memories get flagged instead of blindly trusted.
243
+ - **Zero-config git extraction**: runs automatically during `refresh_index`. No setup, no manual work.
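
The README does not spell out the decay curve, so the sketch below picks one plausible reading: it treats the 180-day and 90-day windows as half-lives and flags a memory as stale once its confidence falls below a cutoff. The half-life interpretation, the `STALE_BELOW` value, and the choice to leave conventions undecayed are all assumptions, as are the function names.

```typescript
type MemoryType = "convention" | "decision" | "gotcha" | "failure";

// Windows from the README, interpreted here (as an assumption) as half-lives.
// null means "no decay"; the README gives no window for conventions.
const HALF_LIFE_DAYS: Record<MemoryType, number | null> = {
  convention: null,
  decision: 180,
  gotcha: 90,
  failure: 90,
};

// Hypothetical cutoff below which a memory is flagged as stale.
const STALE_BELOW = 0.3;

function confidence(type: MemoryType, ageDays: number): number {
  const halfLife = HALF_LIFE_DAYS[type];
  if (halfLife === null) return 1;
  // Exponential decay: confidence halves every `halfLife` days.
  return Math.pow(0.5, Math.max(0, ageDays) / halfLife);
}

function isStale(type: MemoryType, ageDays: number): boolean {
  return confidence(type, ageDays) < STALE_BELOW;
}
```

Under these assumptions a 180-day-old decision sits at 0.5 and is still trusted, while a failure of the same age has dropped to 0.25 and is flagged.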
244
+
245
+ ### All Tools
246
+
247
+ | Tool | What it does |
248
+ | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
249
+ | `search_codebase` | Hybrid search + decision card. Pass `intent="edit"` to get `ready`, `nextAction`, patterns, import-graph coverage, and `whatWouldHelp`. |
250
+ | `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
251
+ | `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets). `confidence: "syntactic"` = static/source-based only; no runtime or dynamic dispatch. |
252
+ | `remember` | Record a convention, decision, gotcha, or failure |
253
+ | `get_memory` | Query team memory with confidence decay scoring |
254
+ | `get_codebase_metadata` | Project structure, frameworks, dependencies |
255
+ | `get_style_guide` | Style guide rules for the current project |
256
+ | `detect_circular_dependencies` | Import cycles between files |
257
+ | `refresh_index` | Re-index (full or incremental) + extract git memories |
258
+ | `get_indexing_status` | Progress and stats for the current index |
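
To make "import cycles between files" concrete, here is a minimal depth-first search over an import graph. It is an illustrative sketch rather than the tool's algorithm: `ImportGraph` and `findCycles` are hypothetical names, and the function reports one cycle per back edge instead of enumerating every elementary cycle.

```typescript
// Hypothetical input: file -> files it imports.
type ImportGraph = Record<string, string[]>;

function findCycles(graph: ImportGraph): string[][] {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];
  const cycles: string[][] = [];

  function visit(file: string): void {
    state.set(file, "visiting");
    stack.push(file);
    for (const dep of graph[file] ?? []) {
      const depState = state.get(dep);
      if (depState === "visiting") {
        // Back edge: the slice of the stack from `dep` onward forms a cycle.
        cycles.push(stack.slice(stack.indexOf(dep)).concat(dep));
      } else if (depState === undefined) {
        visit(dep);
      }
    }
    stack.pop();
    state.set(file, "done");
  }

  for (const file of Object.keys(graph)) {
    if (!state.has(file)) visit(file);
  }
  return cycles;
}
```

Each reported cycle starts and ends with the same file, so it can be printed directly as a chain of imports.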
259
+
260
+ ## Evaluation Harness (`npm run eval`)
261
+
262
+ Reproducible evaluation with frozen fixtures, so ranking and chunking changes are measured against a fixed baseline. **For contributors and CI:** run it before releases or after changing search, ranking, or chunking to catch regressions.
263
+
264
+ - Two codebases: `npm run eval -- <codebaseA> <codebaseB>`
265
+ - Defaults: fixture A = `tests/fixtures/eval-angular-spotify.json`, fixture B = `tests/fixtures/eval-controlled.json`
266
+ - Offline smoke (no network):
267
+
268
+ ```bash
269
+ npm run eval -- tests/fixtures/codebases/eval-controlled tests/fixtures/codebases/eval-controlled \
270
+ --fixture-a=tests/fixtures/eval-controlled.json \
271
+ --fixture-b=tests/fixtures/eval-controlled.json \
272
+ --skip-reindex --no-rerank
273
+ ```
274
+
275
+ - Flags: `--help`, `--fixture-a`, `--fixture-b`, `--skip-reindex`, `--no-rerank`, `--no-redact`
276
+ - To save a report for later comparison, redirect stdout (e.g. `pnpm run eval -- <path-to-angular-spotify> --skip-reindex > internal-docs/tests/eval-runs/angular-spotify-YYYY-MM-DD.txt`).
277
+
278
+ ## How the Search Works
279
+
280
+ The retrieval pipeline is designed around one goal: give the agent the right context, not just any file that matches.
281
+
282
+ - **Definition-first ranking** - for exact-name lookups (e.g. a symbol name), the file that _defines_ the symbol ranks above files that only use it.
283
+ - **Intent classification** - distinguishes exact-name lookups (e.g. "AuthService") from conceptual queries (e.g. "how does auth work") and adjusts keyword/semantic weights accordingly.
284
+ - **Hybrid fusion (RRF)** - combines keyword and semantic search using Reciprocal Rank Fusion instead of brittle score averaging.
285
+ - **Query expansion** - conceptual queries automatically expand with domain-relevant terms (auth → login, token, session, guard).
286
+ - **Contamination control** - test files are filtered/demoted for non-test queries.
287
+ - **Import centrality** - files that are imported more often rank higher.
288
+ - **Cross-encoder reranking** - a stage-2 reranker triggers only when top scores are ambiguous. CPU-only, bounded to top-K.
289
+ - **Incremental indexing** - only re-indexes files that changed since last run (SHA-256 manifest diffing).
290
+ - **Version gating** - index artifacts are versioned; mismatches trigger automatic rebuild so mixed-version data is never served.
291
+ - **Auto-heal** - if the index corrupts, search triggers a full re-index automatically.
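
The hybrid fusion step in the list above can be illustrated with a generic Reciprocal Rank Fusion function. This is a textbook sketch, not the package's code: `reciprocalRankFusion` is a hypothetical name, and `k = 60` is the constant commonly used with RRF, which this README does not confirm.

```typescript
// Fuse several ranked lists of document ids into one ranking.
// Each list contributes 1 / (k + rank) per document, with rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60 // assumed default; not stated in the README
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}
```

Because only ranks are used, keyword and semantic scores never need to be normalized onto a common scale, which is the brittleness that score averaging runs into.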
292
+
293
+ **Index reliability:** Rebuilds write to a staging directory and swap atomically only on success, so a failed rebuild never corrupts the active index. Version mismatches or corruption trigger an automatic full re-index (no user action required).
294
+
295
+ ## Language Support
296
+
297
+ **10 languages** have full symbol extraction (Tree-sitter): TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust. **30+ languages** have indexing and retrieval coverage (keyword + semantic), including PHP, Ruby, Swift, Scala, Shell, and config/markup (JSON/YAML/TOML/XML, etc.).
298
+
299
+ Enrichment is framework-specific: right now only **Angular** has a dedicated analyzer for rich conventions/context (signals, standalone components, control flow, DI patterns).
300
+
301
+ For non-Angular projects, the **Generic** analyzer uses **AST-aligned chunking** when a Tree-sitter grammar is available: symbol-bounded chunks with **scope-aware prefixes** (e.g. `// ClassName.methodName`) so snippets show where code lives. Without a grammar it falls back to safe line-based chunking.
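
A scope-aware prefix can be sketched as a small helper that prepends a comment header to a symbol-bounded chunk. The `SymbolChunk` shape and both function names are hypothetical; the header format follows the `// AuthService.getToken()` and `// SpotifyApiService` examples given earlier in this README.

```typescript
// Hypothetical chunk metadata produced by AST-aligned chunking.
interface SymbolChunk {
  className: string | null; // enclosing class, if any
  symbolName: string;
  kind: "class" | "method" | "function";
  code: string;
}

function scopeHeader(chunk: SymbolChunk): string {
  // Callables get "()" appended, matching the README's example headers.
  const name = chunk.kind === "class" ? chunk.symbolName : `${chunk.symbolName}()`;
  const qualified = chunk.className ? `${chunk.className}.${name}` : name;
  return `// ${qualified}`;
}

function withScopeHeader(chunk: SymbolChunk): string {
  return `${scopeHeader(chunk)}\n${chunk.code}`;
}
```

Keeping the header a plain comment means the snippet stays valid source text in languages that use `//` comments; other languages would need their own comment syntax.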
302
+
303
+ Structured filters available: `framework`, `language`, `componentType`, `layer` (presentation, business, data, state, core, shared).
304
+
305
+ ## Configuration
306
+
307
+ | Variable | Default | Description |
308
+ | ------------------------ | -------------------------- | --------------------------------------------------------------------------------------------- |
309
+ | `EMBEDDING_PROVIDER` | `transformers` | `openai` (fast, cloud) or `transformers` (local, private) |
310
+ | `OPENAI_API_KEY` | - | Required only if using `openai` provider |
311
+ | `CODEBASE_ROOT` | - | Project root (CLI arg takes precedence) |
312
+ | `CODEBASE_CONTEXT_DEBUG` | - | Set to `1` for verbose logging |
313
+ | `EMBEDDING_MODEL` | `Xenova/bge-small-en-v1.5` | Local embedding model override (e.g. `onnx-community/granite-embedding-small-english-r2-ONNX` for Granite) |
314
+
315
+ ## Performance
316
+
317
+ - **First indexing**: 2-5 minutes for ~30k files (embedding computation).
318
+ - **Subsequent queries**: milliseconds from cache.
319
+ - **Incremental updates**: `refresh_index` with `incrementalOnly: true` processes only changed files (SHA-256 manifest diffing).
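
Manifest diffing, as mentioned in the last bullet, reduces to comparing two maps of file path to content hash. The sketch below assumes the SHA-256 digests have already been computed; `Manifest`, `ManifestDiff`, and `diffManifests` are illustrative names, not the package's API.

```typescript
// Hypothetical manifest: file path -> SHA-256 hex digest of its contents.
type Manifest = Record<string, string>;

interface ManifestDiff {
  added: string[];
  changed: string[];
  deleted: string[];
}

function has(manifest: Manifest, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(manifest, path);
}

function diffManifests(previous: Manifest, current: Manifest): ManifestDiff {
  const diff: ManifestDiff = { added: [], changed: [], deleted: [] };
  for (const path of Object.keys(current)) {
    if (!has(previous, path)) diff.added.push(path);
    else if (previous[path] !== current[path]) diff.changed.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (!has(current, path)) diff.deleted.push(path);
  }
  return diff;
}
```

Only the `added` and `changed` paths need re-embedding, and `deleted` paths can be dropped from the index, which is why an incremental refresh is much cheaper than the first indexing run.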
320
+
321
+ ## File Structure
322
+
323
+ ```text
324
+ .codebase-context/
325
+ memory.json # Team knowledge (should be persisted in git)
326
+ index-meta.json # Index metadata and version (generated)
327
+ intelligence.json # Pattern analysis (generated)
328
+ relationships.json # File/symbol relationships (generated)
329
+ index.json # Keyword index (generated)
330
+ index/ # Vector database (generated)
331
+ ```
332
+
333
+ **Recommended `.gitignore`:**
334
+
335
+ ```gitignore
336
+ # Codebase Context - ignore generated files, keep memory
337
+ .codebase-context/*
338
+ !.codebase-context/memory.json
339
+ ```
340
+
341
+ ## CLI Reference
342
+
343
+ All MCP tools are available as CLI commands — no AI agent required. Useful for onboarding, scripting, debugging, and CI workflows.
344
+ For formatted examples of each command's output, see `docs/cli.md`.
345
+
346
+ Set `CODEBASE_ROOT` to your project root, or run from the project directory.
347
+
348
+ ```bash
349
+ # Search the indexed codebase
350
+ npx -y codebase-context search --query "authentication middleware"
351
+ npx -y codebase-context search --query "auth" --intent edit --limit 5
352
+
353
+ # Project structure, frameworks, and dependencies
354
+ npx -y codebase-context metadata
355
+
356
+ # Index state and progress
357
+ npx -y codebase-context status
358
+
359
+ # Re-index the codebase
360
+ npx -y codebase-context reindex
361
+ npx -y codebase-context reindex --incremental --reason "added new service"
362
+
363
+ # Style guide rules
364
+ npx -y codebase-context style-guide
365
+ npx -y codebase-context style-guide --query "naming" --category patterns
366
+
367
+ # Team patterns (DI, state, testing, etc.)
368
+ npx -y codebase-context patterns
369
+ npx -y codebase-context patterns --category testing
370
+
371
+ # Symbol references
372
+ npx -y codebase-context refs --symbol "UserService"
373
+ npx -y codebase-context refs --symbol "handleLogin" --limit 20
374
+
375
+ # Circular dependency detection
376
+ npx -y codebase-context cycles
377
+ npx -y codebase-context cycles --scope src/features
378
+
379
+ # Memory management
380
+ npx -y codebase-context memory list
381
+ npx -y codebase-context memory list --category conventions --type convention
382
+ npx -y codebase-context memory list --query "auth" --json
383
+ npx -y codebase-context memory add --type convention --category tooling --memory "Use pnpm, not npm" --reason "Workspace support and speed"
384
+ npx -y codebase-context memory remove <id>
385
+ ```
386
+
387
+ All commands accept `--json` for raw JSON output suitable for piping and scripting.
388
+
389
+ ## What to add to your CLAUDE.md / AGENTS.md
390
+
391
+ Paste this into `.cursorrules`, `CLAUDE.md`, `AGENTS.md`, or wherever your AI reads project instructions:
392
+
393
+ ```markdown
394
+ ## Codebase Context (MCP)
395
+
396
+ **Start of every task:** Call `get_memory` to load team conventions before writing any code.
397
+
398
+ **Before editing existing code:** Call `search_codebase` with `intent: "edit"`. If the preflight card says `ready: false`, read the listed files before touching anything.
399
+
400
+ **Before writing new code:** Call `get_team_patterns` to check how the team handles DI, state, testing, and library wrappers — don't introduce a new pattern if one already exists.
401
+
402
+ **When asked to "remember" or "record" something:** Call `remember` immediately, before doing anything else.
403
+
404
+ **When adding imports that cross module boundaries:** Call `detect_circular_dependencies` with the relevant scope after adding the import.
405
+ ```
406
+
407
+ These are the behaviors that make the most difference day-to-day. Copy, trim what doesn't apply to your stack, and add it once.
408
+
409
+ ## Links
410
+
411
+ - [Motivation](./MOTIVATION.md) - Research and design rationale
412
+ - [Changelog](./CHANGELOG.md) - Version history
413
+ - [Contributing](./CONTRIBUTING.md) - How to add analyzers
414
+
415
+ ## License
416
+
417
+ MIT