@iceinvein/code-intelligence-mcp 0.2.4 → 1.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +213 -46
- package/package.json +1 -1
package/README.md
CHANGED
@@ -14,10 +14,14 @@ This server indexes your codebase locally to provide **fast, semantic, and struc
 
 Unlike basic text search, this server builds a local knowledge graph to understand your code.
 
-*
-*
-*
-*
+* 🔍 **Advanced Hybrid Search**: Combines **Tantivy** (keyword BM25) + **LanceDB** (semantic vector) + **Jina Code embeddings** (768-dim code-specific model) with Reciprocal Rank Fusion (RRF).
+* 🎯 **Cross-Encoder Reranking**: Always-on ORT-based reranker for precision result ranking.
+* 🧠 **Smart Context Assembly**: Token-aware budgeting with query-aware truncation that keeps relevant lines within context limits.
+* 📊 **PageRank Scoring**: Graph-based symbol importance scoring that identifies central, heavily-used components.
+* 🎓 **Learns from Feedback**: Optional learning system that adapts to user selections over time.
+* 🚀 **Production First**: Ranking heuristics prioritize implementation code over tests and glue code (`index.ts`).
+* 🔗 **Multi-Repo Support**: Index and search across multiple repositories/monorepos simultaneously.
+* ⚡ **Fast & Local**: Written in **Rust**. Uses Metal GPU acceleration on macOS. Parallel indexing with persistent caching.
 
 ---
 
@@ -47,16 +51,49 @@ Add to your `opencode.json` (or global config):
 
 ## Capabilities
 
-Available tools for the agent:
-
-
-
-|
-|
-| `
-| `
-| `
-| `
+Available tools for the agent (19 tools total):
+
+### Core Search & Navigation
+
+| Tool | Description |
+| :------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `search_code` | **Primary Search.** Finds code by meaning ("how does auth work?") or structure ("class User"). Supports query decomposition (e.g., "authentication and authorization"). |
+| `get_definition` | Retrieves the full definition of a specific symbol with disambiguation support. |
+| `find_references` | Finds all usages of a function, class, or variable. |
+| `get_call_hierarchy` | Specifies upstream callers and downstream callees. |
+| `get_type_graph` | Explores inheritance (extends/implements) and type aliases. |
+| `explore_dependency_graph` | Explores module-level dependencies upstream or downstream. |
+| `get_file_symbols` | Lists all symbols defined in a specific file. |
+| `get_usage_examples` | Returns real-world examples of how a symbol is used in the codebase. |
+
+### Advanced Analysis
+
+| Tool | Description |
+| :----------------------- | :---------------------------------------------------------------------------------------- |
+| `explain_search` | Returns detailed scoring breakdown to understand why results ranked as they did. |
+| `find_similar_code` | Finds code semantically similar to a given symbol or code snippet. |
+| `trace_data_flow` | Traces variable reads and writes through the codebase to understand data flow. |
+| `find_affected_code` | Finds code that would be affected if a symbol changes (reverse dependencies). |
+| `get_similarity_cluster` | Returns symbols in the same semantic similarity cluster as a given symbol. |
+| `summarize_file` | Generates a summary of file contents including symbol counts, structure, and key exports. |
+| `get_module_summary` | Lists all exported symbols from a module/file with their signatures. |
+
+### Testing & Documentation
+
+| Tool | Description |
+| :---------------------- | :------------------------------------------------------------------------------------------ |
+| `search_todos` | Searches for TODO and FIXME comments to track technical debt. |
+| `find_tests_for_symbol` | Finds test files that test a given symbol or source file. |
+| `search_decorators` | Searches for TypeScript/JavaScript decorators (@Component, @Controller, @Get, @Post, etc.). |
+
+### Context & Learning
+
+| Tool | Description |
+| :----------------- | :------------------------------------------------------------------------------ |
+| `hydrate_symbols` | Hydrates full context for a set of symbol IDs. |
+| `report_selection` | Records user selection feedback for learning (call when user selects a result). |
+| `refresh_index` | Manually triggers a re-index of the codebase. |
+| `get_index_stats` | Returns index statistics (files, symbols, edges, last updated). |
 
 ---
 
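Reviewer sketch (not part of the package): the tools listed in the Capabilities tables are invoked by an MCP client through the protocol's `tools/call` method. The snippet below shows roughly what such a request could look like for `search_code`. The helper name `build_tool_call` and the argument key `query` are assumptions made for illustration; the README does not document the tool's input schema, so check the schema the server advertises via `tools/list` before relying on it.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is an assumed argument name, not confirmed by the README.
request = build_tool_call(1, "search_code", {"query": "how does auth work?"})
print(json.dumps(request, indent=2))
```

In practice an MCP-aware agent builds this message itself; the sketch only makes the shape of the exchange visible.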
@@ -64,28 +101,46 @@ Available tools for the agent:
 
 The server supports semantic navigation and symbol extraction for the following languages:
 
-*
-*
-*
-*
-*
-*
-*
-*
+* **Rust**
+* **TypeScript / TSX**
+* **JavaScript**
+* **Python**
+* **Go**
+* **Java**
+* **C**
+* **C++**
 
 ---
 
-## Smart Ranking
+## Smart Ranking & Context Enhancement
+
+The ranking engine optimizes results for relevance using sophisticated signals:
+
+1. **PageRank Symbol Importance**: Graph-based scoring that identifies central, heavily-used components (similar to Google's PageRank).
+2. **Cross-Encoder Reranking**: Always-on ORT-based reranker applies deep learning to fine-tune result order.
+3. **Reciprocal Rank Fusion (RRF)**: Combines keyword, vector, and graph search results using statistically optimal rank fusion.
+4. **Query Decomposition**: Complex queries ("X and Y") are automatically split into sub-queries for better coverage.
+5. **Token-Aware Truncation**: Context assembly keeps query-relevant lines within token budgets using BM25-style relevance scoring.
+6. **Directory Semantics**: Implementation directories (`src`, `lib`, `app`) are boosted, while build artifacts (`dist`, `build`) and `node_modules` are penalized.
+7. **Test Penalty**: Test files (`*.test.ts`, `__tests__`) are ranked lower by default, but are boosted if the query intent implies testing.
+8. **Glue Code Filtering**: Re-export files (e.g., `index.ts`) are deprioritized in favor of the actual implementation.
+9. **JSDoc Boost**: Symbols with documentation receive a ranking boost, and examples are included in search results.
+10. **Learning from Feedback** (optional): Tracks user selections to personalize future search results.
+11. **Package-Aware Scoring** (multi-repo): Boosts results from the same package when working in monorepos.
+
+### Intent Detection
 
-The
+The system detects query intent and adjusts ranking accordingly:
 
-
-
-
-
-
-
-
+| Query Pattern | Intent | Effect |
+| ----------------- | ------------------------- | --------------------------------------- |
+| "struct User" | Definition | Boosts type definitions (1.5x) |
+| "who calls login" | Callers | Triggers graph lookup |
+| "verify login" | Testing | Boosts test files |
+| "User schema" | Schema/Model | Boosts schema/model files (50-75x) |
+| "auth and authz" | Multi-query decomposition | Splits into sub-queries, merges via RRF |
+
+For a deep dive into the system's design, see [System Architecture](SYSTEM_ARCHITECTURE.md).
 
 ---
 
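Reviewer sketch (not part of the package): the ranking list in this hunk names Reciprocal Rank Fusion as the way keyword and vector results are merged. A minimal illustration of the general technique follows. Each item scores the sum of `1 / (k + rank)` over every ranked list it appears in. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, as are the function name and the sample symbol IDs; the server's actual constant and data types are not stated in the README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; each item scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["auth.rs::login", "user.rs::User", "session.rs::refresh"]
vector_hits = ["session.rs::refresh", "auth.rs::login", "token.rs::verify"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Items that appear near the top of both lists rise above items that appear in only one, which is why RRF needs no score normalization between the two backends.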
@@ -93,12 +148,78 @@ The ranking engine optimizes results for relevance using several heuristics:
 
 Works without configuration by default. You can customize behavior via environment variables:
 
+### Core Settings
+
+```json
+"env": {
+  "BASE_DIR": "/path/to/repo", // Required: Repository root
+  "WATCH_MODE": "true", // Watch for file changes (Default: true)
+  "INDEX_PATTERNS": "**/*.ts,**/*.go", // File patterns to index
+  "EXCLUDE_PATTERNS": "**/node_modules/**",
+  "REPO_ROOTS": "/path/to/repo1,/path/to/repo2" // Multi-repo support
+}
+```
+
+### Embedding Model
+
+```json
+"env": {
+  "EMBEDDINGS_BACKEND": "jinacode", // jinacode (default), fastembed, hash
+  "EMBEDDINGS_DEVICE": "cpu", // cpu or metal (macOS GPU)
+  "EMBEDDING_BATCH_SIZE": "32"
+}
+```
+
+### Context Assembly
+
+```json
+"env": {
+  "MAX_CONTEXT_TOKENS": "8192", // Token budget for context (default: 8192)
+  "TOKEN_ENCODING": "o200k_base", // tiktoken encoding model
+  "MAX_CONTEXT_BYTES": "200000" // Legacy byte-based limit (fallback)
+}
+```
+
+### Ranking & Retrieval
+
+```json
+"env": {
+  "RANK_EXPORTED_BOOST": "0.1", // Boost for exported symbols
+  "RANK_TEST_PENALTY": "0.1", // Penalty for test files
+  "RANK_POPULARITY_WEIGHT": "0.05", // PageRank influence
+  "RRF_ENABLED": "true", // Enable Reciprocal Rank Fusion
+  "HYBRID_ALPHA": "0.7" // Vector vs keyword weight (0-1)
+}
+```
+
+### Learning System (Optional)
+
+```json
+"env": {
+  "LEARNING_ENABLED": "false", // Enable selection tracking (default: false)
+  "LEARNING_SELECTION_BOOST": "0.1", // Boost for previously selected symbols
+  "LEARNING_FILE_AFFINITY_BOOST": "0.05" // Boost for frequently accessed files
+}
+```
+
+### Performance
+
+```json
+"env": {
+  "PARALLEL_WORKERS": "1", // Indexing parallelism (default: 1 for SQLite)
+  "EMBEDDING_CACHE_ENABLED": "true", // Persistent embedding cache
+  "PAGERANK_ITERATIONS": "20", // PageRank computation iterations
+  "METRICS_ENABLED": "true", // Prometheus metrics
+  "METRICS_PORT": "9090"
+}
+```
+
+### Query Expansion
+
 ```json
 "env": {
-  "
-  "
-  "INDEX_PATTERNS": "**/*.go", // Add custom file types
-  "MAX_CONTEXT_BYTES": "50000" // Limit context window
+  "SYNONYM_EXPANSION_ENABLED": "true", // Expand "auth" → "authentication"
+  "ACRONYM_EXPANSION_ENABLED": "true" // Expand "db" → "database"
 }
 ```
 
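Reviewer sketch (not part of the package): every setting in the configuration blocks is passed as a string environment variable, so a launcher script may want to validate values before starting the server. The snippet below is one way to do that, using only the defaults the README states (`WATCH_MODE` true, `MAX_CONTEXT_TOKENS` 8192, `LEARNING_ENABLED` false, `PARALLEL_WORKERS` 1). The `Settings` class, `load_settings`, and the accepted boolean spellings are hypothetical; the server itself is written in Rust and its real parsing rules are not shown in this diff.

```python
import os
from dataclasses import dataclass


def _env_bool(env, name, default):
    """Parse a boolean-like string; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")


@dataclass(frozen=True)
class Settings:
    base_dir: str
    watch_mode: bool
    index_patterns: tuple
    max_context_tokens: int
    learning_enabled: bool
    parallel_workers: int


def load_settings(env=None):
    """Read a subset of the documented variables, applying the stated defaults."""
    env = os.environ if env is None else env
    if "BASE_DIR" not in env:
        raise ValueError("BASE_DIR is required")
    patterns = tuple(
        p.strip() for p in env.get("INDEX_PATTERNS", "").split(",") if p.strip()
    )
    return Settings(
        base_dir=env["BASE_DIR"],
        watch_mode=_env_bool(env, "WATCH_MODE", True),
        index_patterns=patterns,
        max_context_tokens=int(env.get("MAX_CONTEXT_TOKENS", "8192")),
        learning_enabled=_env_bool(env, "LEARNING_ENABLED", False),
        parallel_workers=int(env.get("PARALLEL_WORKERS", "1")),
    )


settings = load_settings(
    {"BASE_DIR": "/path/to/repo", "INDEX_PATTERNS": "**/*.ts,**/*.go"}
)
print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to run in a test without touching the real environment.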
@@ -113,12 +234,14 @@ flowchart LR
 subgraph Server [Code Intelligence Server]
 direction TB
 Tools[Tool Router]
-
+
 subgraph Indexer [Indexing Pipeline]
 direction TB
 Scan[File Scan] --> Parse[Tree-Sitter]
 Parse --> Extract[Symbol Extraction]
-Extract -->
+Extract --> PageRank[PageRank Compute]
+Extract --> Embed[Jina Code Embeddings]
+Extract --> JSDoc[JSDoc/Decorator/TODO Extract]
 end
 
 subgraph Storage [Storage Engine]
@@ -126,17 +249,31 @@ flowchart LR
 SQLite[(SQLite)]
 Tantivy[(Tantivy)]
 Lance[(LanceDB)]
+Cache[(Embedding Cache)]
+end
+
+subgraph Retrieval [Retrieval Engine]
+direction TB
+QueryExpand[Query Expansion]
+Hybrid[Hybrid Search RRF]
+Rerank[Cross-Encoder Reranker]
+Signals[Ranking Signals]
+Context[Token-Aware Assembly]
 end
 
 %% Data Flow
 Tools -- Index --> Scan
-
-Embed --> Tantivy
+PageRank --> SQLite
 Embed --> Lance
-
-
-
-Tools -- Query -->
+Embed --> Cache
+JSDoc --> SQLite
+
+Tools -- Query --> QueryExpand
+QueryExpand --> Hybrid
+Hybrid --> Rerank
+Rerank --> Signals
+Signals --> Context
+Context --> Tools
 end
 ```
 
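Reviewer sketch (not part of the package): the updated diagram adds a "PageRank Compute" step to the indexing pipeline, and the configuration exposes `PAGERANK_ITERATIONS` with a value of 20. The snippet below illustrates textbook PageRank by power iteration over a small call graph. The damping factor of 0.85, the handling of nodes without outgoing edges, and the symbol names are assumptions; the diff does not show how the server builds its graph or which variant it uses.

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical call graph: an edge points from caller to callee.
call_edges = [
    ("handler", "login"),
    ("cli", "login"),
    ("login", "hash_password"),
    ("handler", "render"),
]
ranks = pagerank(call_edges)
for symbol, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{symbol}: {score:.3f}")
```

Symbols reached from many callers accumulate rank, which matches the README's description of surfacing central, heavily-used components.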
@@ -144,6 +281,36 @@ flowchart LR
 
 ## Development
 
-1.
-2.
-3.
+1. **Prerequisites**: Rust (stable), `protobuf`.
+2. **Build**: `cargo build --release`
+3. **Run**: `./scripts/start_mcp.sh`
+4. **Test**: `cargo test` or `EMBEDDINGS_BACKEND=hash cargo test` (faster, skips model download)
+
+### Quick Testing with Hash Backend
+
+For faster development iteration, use the hash embedding backend which skips model downloads:
+
+```bash
+EMBEDDINGS_BACKEND=hash BASE_DIR=/path/to/repo ./target/release/code-intelligence-mcp-server
+```
+
+### Project Structure
+
+```
+src/
+├── indexer/ # File scanning, parsing, symbol extraction
+├── storage/ # SQLite, Tantivy, LanceDB layers
+├── retrieval/ # Hybrid search, ranking, context assembly
+├── graph/ # PageRank, call hierarchy, type graphs
+├── handlers/ # MCP tool handlers
+├── server/ # MCP protocol routing
+├── tools/ # Tool definitions
+├── embeddings/ # Jina Code model wrapper
+├── reranker/ # Cross-encoder ORT implementation
+├── metrics/ # Prometheus metrics
+└── config.rs # Environment-based configuration
+```
+
+## License
+
+MIT