@iceinvein/code-intelligence-mcp-standalone 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (4)
  1. package/README.md +695 -0
  2. package/bin/run.js +25 -0
  3. package/install.js +86 -0
  4. package/package.json +27 -0
package/README.md ADDED
@@ -0,0 +1,695 @@
1
+ # Code Intelligence MCP Server
2
+
3
+ > **Semantic search and code navigation for LLM agents.**
4
+
5
+ [![NPM Version](https://img.shields.io/npm/v/@iceinvein/code-intelligence-mcp?style=flat-square&color=blue)](https://www.npmjs.com/package/@iceinvein/code-intelligence-mcp)
6
+ [![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)](LICENSE)
7
+ [![MCP](https://img.shields.io/badge/MCP-Enabled-orange?style=flat-square)](https://modelcontextprotocol.io)
8
+
9
+ ---
10
+
11
+ This server indexes your codebase locally to provide **fast, semantic, and structure-aware** code navigation to tools like Claude Code, OpenCode, Trae, and Cursor.
12
+
13
+ ## Why Use This Server?
14
+
15
+ Unlike basic text search, this server builds a local knowledge graph to understand your code.
16
+
17
+ * **Advanced Hybrid Search**: Combines keyword search ([BM25](#glossary) via Tantivy) with semantic vector search (via LanceDB + jina-code-embeddings-0.5b) using [Reciprocal Rank Fusion (RRF)](#glossary) — a technique that merges ranked results from different search systems by position rather than raw score.
18
+ * **Smart Context Assembly**: Token-aware budgeting with query-aware truncation that keeps relevant lines within context limits.
19
+ * **On-Device LLM Descriptions**: Automatically generates natural-language descriptions for every symbol using a local **Qwen2.5-Coder-1.5B** model (llama.cpp with Metal GPU), enriching search with human-readable summaries. This bridges the vocabulary gap between how developers search ("auth handler") and how code is named (`authenticate_request`).
20
+ * **PageRank Scoring**: Graph-based symbol importance scoring (similar to Google's original algorithm) that identifies central, heavily-used components by analyzing call graphs and type relationships.
21
+ * **Learns from Feedback**: Optional learning system that adapts to user selections over time.
22
+ * **Production First**: Multi-layer test detection (file paths, symbol names, and AST-level `#[test]`/`mod tests` analysis) ensures implementation code ranks above test helpers.
23
+ * **Multi-Repo Support**: Index and search across multiple repositories/monorepos simultaneously.
24
+ * **OS-Native File Watching**: Uses the `notify` crate with macOS FSEvents for instant re-indexing on file changes.
25
+ * **Built-in Chat UI**: Optional ChatGPT-style web interface powered by a local **Qwen2.5-Coder-14B** model. Ask questions about your codebase in the browser with live tool-call visibility and streaming responses.
26
+ * **Fast & Local**: Written in **Rust** with Metal GPU acceleration on Apple Silicon. Parallel indexing with persistent caching.
27
+
28
+ ---
29
+
30
+ ## Quick Start
31
+
32
+ Runs directly via `npx` without requiring a local Rust toolchain.
33
+
34
+ ### Claude Code
35
+
36
+ Add to your MCP settings (global `~/.claude.json` or project-level `.mcp.json`):
37
+
38
+ ```json
39
+ {
40
+ "mcpServers": {
41
+ "code-intelligence": {
42
+ "command": "npx",
43
+ "args": ["-y", "@iceinvein/code-intelligence-mcp"],
44
+ "env": {}
45
+ }
46
+ }
47
+ }
48
+ ```
49
+
50
+ Or install via the CLI:
51
+
52
+ ```bash
53
+ claude mcp add code-intelligence -- npx -y @iceinvein/code-intelligence-mcp
54
+ ```
55
+
56
+ Once connected, Claude Code gains 23 MCP tools for semantic search (`search_code`), symbol navigation (`get_definition`, `find_references`), call/type graphs (`get_call_hierarchy`, `get_type_graph`), impact analysis (`find_affected_code`, `trace_data_flow`), and more. The server auto-detects the working directory and begins indexing in the background.
57
+
58
+ ### OpenCode / Trae
59
+
60
+ Add to your `opencode.json` (or global config):
61
+
62
+ ```json
63
+ {
64
+ "mcp": {
65
+ "code-intelligence": {
66
+ "type": "local",
67
+ "command": ["npx", "-y", "@iceinvein/code-intelligence-mcp"],
68
+ "enabled": true
69
+ }
70
+ }
71
+ }
72
+ ```
73
+
74
+ *The server will automatically download the embedding model (~531MB) and LLM (~1.1GB) on first launch, then index your project in the background.*
75
+
76
+ ---
77
+
78
+ ## Standalone Server Mode
79
+
80
+ By default, each MCP client spawns its own server process (stdio transport). If you run multiple clients against the same repo, a per-repo leader lock (`flock()`) ensures only one instance performs indexing, file watching, and LLM description generation. The leader loads the LLM (~1.1GB) during indexing and automatically frees it once descriptions are complete. Follower instances never load the LLM — they open the search index read-only and pick up the leader's changes. All instances load their own copy of the embedding model (~531MB) for query-time vector search.
81
+
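The per-repo leader lock described above can be sketched in Python. This is an illustration of the pattern only — the real server implements it in Rust via `flock()` — and the lock-file path here is hypothetical; `fcntl` is Unix-only:

```python
import fcntl
import os
import tempfile

def try_become_leader(lock_path):
    """Take an exclusive, non-blocking lock on the repo's lock file.

    Returns the open handle if this process is the leader (the lock stays
    held for the process lifetime), or None if another instance leads.
    """
    fh = open(lock_path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh  # leader: index, watch files, generate LLM descriptions
    except BlockingIOError:
        fh.close()
        return None  # follower: open the search index read-only

# Hypothetical lock-file name for demonstration.
lock_file = os.path.join(tempfile.gettempdir(), "cimcp-demo.lock")
leader = try_become_leader(lock_file)    # first open wins the lock
follower = try_become_leader(lock_file)  # second open sees it held
```

Because `flock()` conflicts between separate open file descriptions, any later instance pointed at the same repo immediately falls back to follower behavior.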
82
+ **Standalone mode** runs a single long-lived HTTP server that all clients share. The main advantage is cross-repo deduplication — in stdio mode, each instance loads its own embedding model regardless of which repo it's on. With 5 instances across 3 repos, that's 5 copies (~2.6GB). Standalone loads the models once and shares them across all repos and clients.
83
+
84
+ ### Starting the Server
85
+
86
+ ```bash
87
+ # Default: localhost:3333
88
+ npx @iceinvein/code-intelligence-mcp-standalone
89
+
90
+ # Custom host/port
91
+ npx @iceinvein/code-intelligence-mcp-standalone --port 4444 --host 0.0.0.0
92
+
93
+ # From source
94
+ ./target/release/code-intelligence-mcp-server --standalone
95
+ ./target/release/code-intelligence-mcp-server --standalone --port 4444
96
+
97
+ # Via environment variable
98
+ CIMCP_MODE=standalone ./target/release/code-intelligence-mcp-server
99
+ ```
100
+
101
+ ### Connecting MCP Clients
102
+
103
+ Point your MCP clients to the standalone server using Streamable HTTP transport:
104
+
105
+ **Claude Code** (`~/.claude.json` or project-level `.mcp.json`):
106
+ ```json
107
+ {
108
+ "mcpServers": {
109
+ "code-intelligence": {
110
+ "type": "streamable-http",
111
+ "url": "http://localhost:3333/mcp"
112
+ }
113
+ }
114
+ }
115
+ ```
116
+
117
+ Or via the CLI:
118
+ ```bash
119
+ claude mcp add --transport http code-intelligence http://localhost:3333/mcp
120
+ ```
121
+
122
+ **OpenCode** (`opencode.json`):
123
+ ```json
124
+ {
125
+ "mcp": {
126
+ "code-intelligence": {
127
+ "type": "remote",
128
+ "url": "http://localhost:3333/mcp",
129
+ "enabled": true
130
+ }
131
+ }
132
+ }
133
+ ```
134
+
135
+ **Cursor** (`.cursor/mcp.json`):
136
+ ```json
137
+ {
138
+ "mcpServers": {
139
+ "code-intelligence": {
140
+ "url": "http://localhost:3333/mcp"
141
+ }
142
+ }
143
+ }
144
+ ```
145
+
146
+ The server auto-detects each client's workspace root via the MCP `roots` capability — no `BASE_DIR` needed.
147
+
148
+ ### How It Works
149
+
150
+ ```mermaid
151
+ flowchart TB
152
+ A[Claude Code - Session A] & B[Cursor - Session B] & C[Trae - Session C]
153
+ A & B & C -- "POST /mcp (Streamable HTTP)" --> Server
154
+
155
+ Server["Standalone MCP Server<br/>(single process, shared embedding model)"]
156
+
157
+ Server --> RA["Repo A indexes<br/>SQLite + Tantivy + LanceDB"]
158
+ Server --> RB["Repo B indexes<br/>SQLite + Tantivy + LanceDB"]
159
+ Server --> RC["Repo C indexes<br/>SQLite + Tantivy + LanceDB"]
160
+ ```
161
+
162
+ Each client session is bound to its workspace root. The server maintains separate indexes per repo but shares the embedding model across all of them.
163
+
164
+ ### Data Storage
165
+
166
+ Both embedded (stdio) and standalone (HTTP) modes store all data in `~/.code-intelligence/`:
167
+
168
+ ```text
169
+ ~/.code-intelligence/
170
+ ├── server.toml # Optional config file (standalone only)
171
+ ├── models/ # Shared models (loaded once, shared across repos)
172
+ │ ├── jina-code-embeddings-0.5b-gguf/ # Embedding model (~531MB, GGUF via llama.cpp)
173
+ │ └── qwen2.5-coder-1.5b-gguf/ # LLM model (~1.1GB)
174
+ ├── logs/
175
+ │ └── server.log
176
+ └── repos/
177
+ ├── registry.json # Tracks all known repos
178
+ ├── a1b2c3d4e5f6a7b8/ # Per-repo data (SHA256 hash of repo path)
179
+ │ ├── code-intelligence.db
180
+ │ ├── tantivy-index/
181
+ │ └── vectors/
182
+ └── f8e7d6c5b4a3f2e1/
183
+ └── ...
184
+ ```
185
+
186
+ The same repo always maps to the same hash regardless of mode, so embedded and standalone can share the same index data.
187
+
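The path-to-directory mapping can be sketched as a plain SHA-256 of the repo path. Truncating to the first 16 hex characters, as the example tree above suggests, is an assumption for illustration:

```python
import hashlib

def repo_data_dir(repo_path: str) -> str:
    """Map a repo path to its per-repo data directory name.

    Assumption: the directory name is the first 16 hex chars of the
    SHA-256 digest, matching the shape of the example tree.
    """
    digest = hashlib.sha256(repo_path.encode("utf-8")).hexdigest()
    return digest[:16]
```

The mapping is deterministic, so the same checkout resolves to the same index directory whether the server runs embedded or standalone.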
188
+ ### Configuration
189
+
190
+ Standalone mode is configured via `~/.code-intelligence/server.toml` (created on first run with defaults). Environment variables and CLI flags override TOML settings.
191
+
192
+ **Priority:** CLI flags > Environment variables > `server.toml` > Defaults
193
+
194
+ **Example `server.toml`:**
195
+
196
+ ```toml
197
+ [server]
198
+ host = "127.0.0.1"
199
+ port = 3333
200
+
201
+ [embeddings]
202
+ backend = "llamacpp" # llamacpp (default) or hash (testing)
203
+ device = "metal" # cpu or metal (macOS GPU)
204
+
205
+ [repos.defaults]
206
+ index_patterns = "**/*.ts,**/*.tsx,**/*.rs,**/*.py,**/*.go"
207
+ exclude_patterns = "**/node_modules/**,**/dist/**,**/.git/**"
208
+ watch_mode = true # Auto-reindex on file changes
209
+
210
+ [lifecycle]
211
+ warm_ttl_seconds = 300 # How long idle repos stay in memory
212
+ ```
213
+
214
+ **Environment variable overrides (same as embedded mode):**
215
+
216
+ | Variable | Example | Description |
217
+ | -------- | ------- | ----------- |
218
+ | `CIMCP_MODE` | `standalone` | Alternative to `--standalone` flag |
219
+ | `EMBEDDINGS_BACKEND` | `hash` | Override embedding backend (`llamacpp` or `hash`) |
220
+ | `EMBEDDINGS_DEVICE` | `metal` | Override device (cpu/metal) |
221
+ | `EMBEDDINGS_MODEL_DIR` | `/path/to/model` | Override model directory |
222
+
223
+ ---
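The precedence rules above (CLI flags > environment variables > `server.toml` > defaults) amount to a first-non-empty lookup per setting, which can be sketched as:

```python
def resolve(cli=None, env=None, toml=None, default=None):
    """Resolve one setting: CLI flag wins, then env var, then TOML, then default."""
    for value in (cli, env, toml):
        if value is not None:
            return value
    return default

# e.g. `--port` absent, EMBEDDINGS-style env var set, TOML also set:
port = resolve(cli=None, env="4444", toml="3333", default="3333")
```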
224
+
225
+ ## Chat Mode (Experimental)
226
+
227
+ Chat mode adds a **ChatGPT-style web UI** for asking questions about your codebase directly in the browser. It runs a local **Qwen2.5-Coder-14B** model with full Metal GPU acceleration and uses the same search and navigation tools that MCP clients get — meaning search quality improvements automatically benefit the chat experience.
228
+
229
+ Chat mode requires standalone mode and Apple Silicon with at least 16GB of unified memory.
230
+
231
+ ### Quick Start
232
+
233
+ ```bash
234
+ # Start standalone server with chat enabled
235
+ npx @iceinvein/code-intelligence-mcp-standalone --chat
236
+
237
+ # Or from source
238
+ ./target/release/code-intelligence-mcp-server --standalone --chat
239
+
240
+ # Custom ports
241
+ ./target/release/code-intelligence-mcp-server --standalone --port 3333 --chat --chat-port 4000
242
+
243
+ # Via environment variables
244
+ CIMCP_MODE=standalone CIMCP_CHAT=true ./target/release/code-intelligence-mcp-server
245
+ ```
246
+
247
+ Once started, open **http://127.0.0.1:3334** in your browser.
248
+
249
+ On first launch, the 14B model (~9GB) is downloaded from HuggingFace and cached at `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/`. The MCP server starts immediately — the model loads in the background and the chat UI becomes available once loading completes (typically 2-5 minutes on first run, seconds on subsequent launches).
250
+
251
+ ### How It Works
252
+
253
+ ```mermaid
254
+ sequenceDiagram
255
+ participant Browser as Web UI
256
+ participant Chat as Chat Server (:3334)
257
+ participant Agent as Agent Loop
258
+ participant LLM as Qwen2.5-14B (Metal GPU)
259
+ participant Tools as MCP Tool Handlers
260
+
261
+ Browser->>Chat: POST /api/chat (messages + repo_path)
262
+ Chat-->>Browser: SSE stream opened
263
+
264
+ loop Up to 3 tool rounds
265
+ Agent->>LLM: Generate (full prompt)
266
+ LLM-->>Agent: Response with <tool_call> blocks
267
+ Agent-->>Browser: SSE: tool_call (tool name + args)
268
+ Agent->>Tools: Execute tool (search_code, get_definition, etc.)
269
+ Tools-->>Agent: Tool results (JSON)
270
+ Agent-->>Browser: SSE: tool_result (summary)
271
+ Note over Agent: Append results to conversation, next round
272
+ end
273
+
274
+ Agent->>LLM: Generate stream (final response)
275
+ LLM-->>Agent: Tokens (one at a time)
276
+ Agent-->>Browser: SSE: token (streamed)
277
+ Agent-->>Browser: SSE: done
278
+ ```
279
+
280
+ The agent uses up to **3 rounds** of tool calling before producing a final streamed response. Each round, the LLM can invoke any combination of 10 code intelligence tools to gather context before answering.
281
+
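The round structure above can be sketched as a small loop. The `stub_llm` and `stub_tools` names below are hypothetical stand-ins for the real Qwen model and MCP tool handlers, shown only to make the control flow concrete:

```python
MAX_TOOL_ROUNDS = 3

def run_agent(llm, tools, messages):
    """Sketch of the agent loop: up to MAX_TOOL_ROUNDS rounds of tool
    calling, then a final answer. `llm` is any callable returning either
    ("tool", name, args) or ("answer", text)."""
    for _ in range(MAX_TOOL_ROUNDS):
        kind, *rest = llm(messages)
        if kind == "answer":
            return rest[0]
        name, args = rest
        result = tools[name](**args)  # search_code, get_definition, ...
        messages = messages + [{"role": "tool", "tool": name, "content": result}]
    # Tool budget spent: request the final (streamed) answer.
    return llm(messages)[1]

# Hypothetical stand-ins for the model and the tool handlers:
def stub_llm(messages):
    if any(m["role"] == "tool" for m in messages):
        return ("answer", "Ranking merges BM25 and vector results with RRF.")
    return ("tool", "search_code", {"query": "ranking"})

stub_tools = {"search_code": lambda query: f"3 hits for '{query}'"}
reply = run_agent(stub_llm, stub_tools,
                  [{"role": "user", "content": "How does ranking work?"}])
```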
282
+ ### Available Tools
283
+
284
+ The chat agent has access to a curated subset of the full MCP tool suite:
285
+
286
+ | Tool | Purpose |
287
+ | :--- | :------ |
288
+ | `search_code` | Hybrid semantic + keyword search |
289
+ | `get_definition` | Jump to symbol source code |
290
+ | `find_references` | Find all usages of a symbol |
291
+ | `get_call_hierarchy` | Navigate callers and callees |
292
+ | `get_type_graph` | Explore type inheritance |
293
+ | `explore_dependency_graph` | Trace module imports/exports |
294
+ | `get_file_symbols` | List all symbols in a file |
295
+ | `find_affected_code` | Impact analysis (reverse dependencies) |
296
+ | `trace_data_flow` | Follow variable reads and writes |
297
+ | `summarize_file` | Structural file overview |
298
+
299
+ ### Web UI Features
300
+
301
+ - **Live token streaming** — responses appear word-by-word as the model generates
302
+ - **Tool call visibility** — see which tools the model invokes and their results in real-time
303
+ - **Multi-turn conversation** — full chat history maintained across turns
304
+ - **Markdown rendering** — code blocks with syntax highlighting (via highlight.js)
305
+ - **Dark/light theme** — toggle between themes with the header button
306
+ - **Repo selector** — specify the repository path to query against
307
+ - **Keyboard shortcuts** — Enter to send, Shift+Enter for newline
308
+
309
+ ### Configuration
310
+
311
+ | Setting | CLI Flag | Env Var | Default | Description |
312
+ | :------ | :------- | :------ | :------ | :---------- |
313
+ | Enable chat | `--chat` | `CIMCP_CHAT=true` | off | Activate chat mode |
314
+ | Chat port | `--chat-port PORT` | `CIMCP_CHAT_PORT=PORT` | `3334` | HTTP port for the chat UI |
315
+
316
+ **Priority:** CLI flags > Environment variables > Defaults
317
+
318
+ ### API Reference
319
+
320
+ The chat server exposes three HTTP endpoints:
321
+
322
+ **`GET /`** — Serves the web UI (single-page HTML with embedded CSS/JS).
323
+
324
+ **`GET /api/status`** — Returns model loading status.
325
+ ```json
326
+ {"model_loaded": true, "model_name": "Qwen2.5-Coder-14B-Instruct"}
327
+ ```
328
+
329
+ **`POST /api/chat`** — Starts a streaming chat session. Returns an SSE event stream.
330
+
331
+ Request body:
332
+ ```json
333
+ {
334
+ "messages": [
335
+ {"role": "user", "content": "How does the ranking system work?"}
336
+ ],
337
+ "repo_path": "/absolute/path/to/your/repo"
338
+ }
339
+ ```
340
+
341
+ SSE event types:
342
+
343
+ | Event | Data | Description |
344
+ | :---- | :--- | :---------- |
345
+ | `token` | `{"type":"token","content":"The "}` | A generated text token |
346
+ | `tool_call` | `{"type":"tool_call","tool":"search_code","args":{...}}` | Tool invocation started |
347
+ | `tool_result` | `{"type":"tool_result","tool":"search_code","summary":"..."}` | Tool execution completed |
348
+ | `error` | `{"type":"error","message":"..."}` | Non-recoverable error |
349
+ | `done` | `{"type":"done"}` | Stream complete |
350
+
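A client consumes these events by splitting the SSE body on blank lines and JSON-decoding each `data:` payload. The sketch below parses a captured stream offline (a real client would read the response incrementally, and the data-only framing shown here is an assumption based on the payloads in the table):

```python
import json

def parse_sse(stream_text):
    """Parse an SSE body from POST /api/chat into a list of event dicts."""
    events = []
    for block in stream_text.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events

# A captured fragment shaped like the event table above:
sample = (
    'data: {"type":"tool_call","tool":"search_code","args":{"query":"ranking"}}\n\n'
    'data: {"type":"token","content":"The "}\n\n'
    'data: {"type":"done"}\n\n'
)
answer = "".join(e["content"] for e in parse_sse(sample) if e["type"] == "token")
```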
351
+ ### Model Details
352
+
353
+ | Property | Value |
354
+ | :------- | :---- |
355
+ | Model | Qwen2.5-Coder-14B-Instruct |
356
+ | Format | GGUF Q4_K_M (~9 GB) |
357
+ | Context window | 8,192 tokens |
358
+ | Max generation | 2,048 tokens per response |
359
+ | GPU offloading | All layers via Metal |
360
+ | Sampling | Temperature 0.7 |
361
+ | HuggingFace repo | `Qwen/Qwen2.5-Coder-14B-Instruct-GGUF` |
362
+ | Cache location | `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/` |
363
+
364
+ ### Limitations
365
+
366
+ - **Standalone-only** — chat is not available in embedded (stdio) mode since it requires a persistent HTTP server
367
+ - **Apple Silicon required** — the 14B model needs Metal GPU acceleration; 16GB+ unified memory recommended
368
+ - **Context budget** — the 8K token context window is shared between conversation history, tool definitions, and tool results; long conversations may lose early context
369
+ - **Tool result truncation** — individual tool results are capped at 4,000 characters to preserve context budget
370
+ - **No authentication** — the chat server binds to localhost only; do not expose to the network without adding an auth layer
371
+ - **Single-threaded generation** — one chat request is processed at a time; concurrent requests queue
372
+
373
+ ---
374
+
375
+ ## Capabilities
376
+
377
+ Available tools for the agent (23 tools total):
378
+
379
+ ### Core Search & Navigation
380
+
381
+ | Tool | Description |
382
+ | :------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
383
+ | `search_code` | **Primary Search.** Finds code by meaning ("how does auth work?") or structure ("class User"). Supports query decomposition (e.g., "authentication and authorization"). |
384
+ | `get_definition` | Retrieves the full definition of a specific symbol with disambiguation support. |
385
+ | `find_references` | Finds all usages of a function, class, or variable. |
386
+ | `get_call_hierarchy`       | Navigates upstream callers and downstream callees.                                                                                                                       |
387
+ | `get_type_graph` | Explores inheritance (extends/implements) and type aliases. |
388
+ | `explore_dependency_graph` | Explores module-level dependencies upstream or downstream. |
389
+ | `get_file_symbols` | Lists all symbols defined in a specific file. |
390
+ | `get_usage_examples` | Returns real-world examples of how a symbol is used in the codebase. |
391
+
392
+ ### Advanced Analysis
393
+
394
+ | Tool | Description |
395
+ | :----------------------- | :---------------------------------------------------------------------------------------- |
396
+ | `explain_search` | Returns detailed scoring breakdown to understand why results ranked as they did. |
397
+ | `find_similar_code` | Finds code semantically similar to a given symbol or code snippet. |
398
+ | `trace_data_flow` | Traces variable reads and writes through the codebase to understand data flow. |
399
+ | `find_affected_code` | Finds code that would be affected if a symbol changes (reverse dependencies). |
400
+ | `get_similarity_cluster` | Returns symbols in the same semantic similarity cluster as a given symbol. |
401
+ | `summarize_file` | Generates a summary of file contents including symbol counts, structure, and key exports. |
402
+ | `get_module_summary` | Lists all exported symbols from a module/file with their signatures. |
403
+
404
+ ### Testing, Frameworks & Documentation
405
+
406
+ | Tool | Description |
407
+ | :------------------------- | :------------------------------------------------------------------------------------------------------------------------ |
408
+ | `search_todos` | Searches for TODO and FIXME comments to track technical debt. |
409
+ | `find_tests_for_symbol` | Finds test files that test a given symbol or source file. |
410
+ | `search_decorators` | Searches for TypeScript/JavaScript decorators (@Component, @Controller, @Get, @Post, etc.). |
411
+ | `search_framework_patterns`| Searches for framework-specific patterns (e.g., Elysia routes, WebSocket handlers, middleware) with method/path filtering.|
412
+
413
+ ### Context & Learning
414
+
415
+ | Tool | Description |
416
+ | :----------------- | :------------------------------------------------------------------------------ |
417
+ | `hydrate_symbols` | Hydrates full context for a set of symbol IDs. |
418
+ | `report_selection` | Records user selection feedback for learning (call when user selects a result). |
419
+ | `refresh_index` | Manually triggers a re-index of the codebase. |
420
+ | `get_index_stats` | Returns index statistics (files, symbols, edges, last updated). |
421
+
422
+ ---
423
+
424
+ ## Supported Languages
425
+
426
+ The server supports semantic navigation and symbol extraction for the following languages:
427
+
428
+ * **Rust**
429
+ * **TypeScript / TSX**
430
+ * **JavaScript**
431
+ * **Python**
432
+ * **Go**
433
+ * **Java**
434
+ * **C**
435
+ * **C++**
436
+
437
+ ---
438
+
439
+ ## Smart Ranking & Context Enhancement
440
+
441
+ The search pipeline runs two parallel searches — keyword (BM25 via Tantivy) and semantic (vector embeddings via LanceDB) — then merges them using Reciprocal Rank Fusion (RRF). On top of this hybrid base, the ranking engine applies structural signals to optimize for relevance:
442
+
443
+ 1. **PageRank Symbol Importance**: Graph-based scoring that identifies central, heavily-used components (similar to Google's PageRank).
444
+ 2. **Reciprocal Rank Fusion (RRF)**: Combines keyword, vector, and graph search results by rank position, which stays robust when the underlying score scales differ.
445
+ 3. **Query Decomposition**: Complex queries ("X and Y") are automatically split into sub-queries for better coverage.
446
+ 4. **Token-Aware Truncation**: Context assembly keeps query-relevant lines within token budgets using BM25-style relevance scoring.
447
+ 5. **LLM-Enriched Indexing**: On-device Qwen2.5-Coder generates natural-language descriptions for each symbol, bridging the vocabulary gap between how developers search and how code is named.
448
+ 6. **Morphological Variants**: Function names are expanded with stems and derivations (e.g., `watch` → `watcher`, `index` → `reindex`) to improve recall for natural-language queries.
449
+ 7. **Multi-Layer Test Detection**: Three mechanisms — file path patterns (`*.test.ts`), symbol name heuristics (`test_*`), and SQL-based AST analysis (`#[test]`, `mod tests`) — with a final enforcement pass that prevents test code from escaping via edge expansion.
450
+ 8. **Edge Expansion**: High-ranking symbols pull in structurally related code (callers, type members) with importance filtering to avoid noise from private helpers.
451
+ 9. **Directory Semantics**: Implementation directories (`src`, `lib`, `app`) are boosted, while build artifacts (`dist`, `build`) and `node_modules` are penalized.
452
+ 10. **Exported Symbol Boost**: Exported/public symbols receive a ranking boost as they represent the primary API surface.
453
+ 11. **Glue Code Filtering**: Re-export files (e.g., `index.ts`) are deprioritized in favor of the actual implementation.
454
+ 12. **JSDoc Boost**: Symbols with documentation receive a ranking boost, and examples are included in search results.
455
+ 13. **Learning from Feedback** (optional): Tracks user selections to personalize future search results.
456
+ 14. **Package-Aware Scoring** (multi-repo): Boosts results from the same package when working in monorepos.
457
+
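The RRF merge step (signal 2 above) can be sketched in a few lines. The constant `k=60` is the conventional value from the original RRF formulation; the server's exact constant and any per-list weighting are not documented here:

```python
def rrf_merge(ranked_lists, k=60):
    """Merge ranked result lists by position: score(doc) = sum 1/(k + rank)."""
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: keyword (BM25) order vs semantic (vector) order.
keyword = ["auth.rs", "login.rs", "session.rs"]
vector = ["auth.rs", "token.rs", "login.rs"]
merged = rrf_merge([keyword, vector])
```

A document ranked well in both lists (`auth.rs`) beats one ranked well in only a single list, without ever comparing the incompatible raw scores.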
458
+ ### Intent Detection
459
+
460
+ The system detects query intent and adjusts ranking accordingly:
461
+
462
+ | Query Pattern | Intent | Effect |
463
+ | ----------------- | ------------------------- | --------------------------------------- |
464
+ | "struct User" | Definition | Boosts type definitions (1.5x) |
465
+ | "who calls login" | Callers | Triggers graph lookup |
466
+ | "verify login" | Testing | Boosts test files |
467
+ | "User schema" | Schema/Model | Boosts schema/model files (50-75x) |
468
+ | "auth and authz" | Multi-query decomposition | Splits into sub-queries, merges via RRF |
469
+
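The pattern-to-intent mapping in the table can be sketched as a first-match rule list. The regexes below are illustrative only — the server's real classifier is richer than this:

```python
import re

# Illustrative rules matching the table above; order matters (first match wins).
INTENT_RULES = [
    (re.compile(r"^(struct|class|interface|enum)\s+\w+", re.I), "definition"),
    (re.compile(r"^who calls\s+\w+", re.I), "callers"),
    (re.compile(r"\b(verify|test)\b", re.I), "testing"),
    (re.compile(r"\b(schema|model)\b", re.I), "schema"),
    (re.compile(r"\b(and|or)\b", re.I), "decompose"),
]

def detect_intent(query: str) -> str:
    for pattern, intent in INTENT_RULES:
        if pattern.search(query):
            return intent
    return "general"
```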
470
+ For a deep dive into the system's design, see [System Architecture](SYSTEM_ARCHITECTURE.md).
471
+
472
+ ---
473
+
474
+ ## Glossary
475
+
476
+ Key terms used throughout this documentation:
477
+
478
+ | Term | Full Name | What It Means |
479
+ |------|-----------|---------------|
480
+ | **MCP** | Model Context Protocol | An open protocol for connecting LLM-based tools (like Claude Code, Cursor, OpenCode) to external data sources and capabilities. This server implements MCP to expose code search and navigation tools. |
481
+ | **BM25** | Best Match 25 | A probabilistic text search algorithm (used by Tantivy). Ranks results by how often your search terms appear in a document (term frequency) weighted by how rare those terms are across all documents (inverse document frequency / IDF). The standard algorithm behind most full-text search engines. |
482
+ | **IDF** | Inverse Document Frequency | A component of BM25 that measures how rare a term is. A term like `authenticate` appearing in only 3 files has high IDF (very discriminating), while `error` appearing in 200 files has low IDF (less useful for ranking). |
483
+ | **RRF** | Reciprocal Rank Fusion | A technique for merging ranked result lists from different search systems. Instead of comparing raw scores (which have different scales), RRF uses rank positions: a result ranked #1 in keyword search and #3 in vector search gets a combined score based on those positions. This makes it robust when combining fundamentally different search approaches. |
484
+ | **GGUF** | GGML Unified Format | A binary format for storing quantized (compressed) neural network weights. Used by llama.cpp to run both the embedding model and the LLM efficiently on consumer hardware. Q4_K_M quantization reduces the 1.5B parameter model from ~3GB to ~1.1GB with minimal quality loss. |
485
+ | **LLM** | Large Language Model | In this project, a local Qwen2.5-Coder-1.5B model that generates one-sentence natural-language descriptions for each code symbol (function, class, type). These descriptions are indexed alongside the code, helping BM25 match natural-language queries to technically-named code. |
486
+ | **PageRank** | — | A graph algorithm (originally from Google Search) adapted here to score symbol importance. Symbols that are called/referenced by many other symbols get higher PageRank scores, indicating they are central to the codebase. |
487
+ | **Tree-Sitter** | — | A parser generator that builds concrete syntax trees (CSTs) for source code. Used to extract symbols (functions, classes, types), their relationships (calls, imports, type hierarchies), and structural information from 8 supported languages. |
488
+
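The IDF intuition from the glossary can be made concrete with one common BM25 IDF variant (the Lucene-style formula); whether Tantivy uses exactly this form is an assumption here:

```python
import math

def bm25_idf(total_docs: int, docs_with_term: int) -> float:
    """One common BM25 IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    n, df = total_docs, docs_with_term
    return math.log(1 + (n - df + 0.5) / (df + 0.5))

# With 200 indexed files, the glossary's examples work out to:
rare = bm25_idf(200, 3)      # "authenticate" in only 3 files: high IDF
common = bm25_idf(200, 200)  # "error" in all 200 files: near-zero IDF
```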
489
+ ---
490
+
491
+ ## Configuration (Optional)
492
+
493
+ Works without configuration by default. You can customize behavior via environment variables:
494
+
495
+ ### Core Settings
496
+
497
+ ```json
498
+ "env": {
499
+ "BASE_DIR": "/path/to/repo", // Required: Repository root
500
+ "WATCH_MODE": "true", // Watch for file changes (Default: true)
501
+ "INDEX_PATTERNS": "**/*.ts,**/*.go", // File patterns to index
502
+ "EXCLUDE_PATTERNS": "**/node_modules/**",
503
+ "REPO_ROOTS": "/path/to/repo1,/path/to/repo2" // Multi-repo support
504
+ }
505
+ ```
506
+
507
+ ### Embedding Model
508
+
509
+ ```json
510
+ "env": {
511
+ "EMBEDDINGS_BACKEND": "llamacpp", // llamacpp (default) or hash (testing)
512
+ "EMBEDDINGS_DEVICE": "cpu", // cpu or metal (macOS GPU)
513
+ "EMBEDDING_BATCH_SIZE": "32"
514
+ }
515
+ ```
516
+
517
+ ### Context Assembly
518
+
519
+ ```json
520
+ "env": {
521
+ "MAX_CONTEXT_TOKENS": "8192", // Token budget for context (default: 8192)
522
+ "TOKEN_ENCODING": "o200k_base", // tiktoken encoding model
523
+ "MAX_CONTEXT_BYTES": "200000" // Legacy byte-based limit (fallback)
524
+ }
525
+ ```
526
+
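The token-aware, query-aware truncation these settings control can be sketched as a greedy selection of the most relevant lines under a budget. This uses naive term overlap and whitespace token counting as stand-ins for the real relevance scoring and tiktoken-based counting:

```python
def truncate_to_budget(lines, query, max_tokens, tokens=lambda s: len(s.split())):
    """Keep the most query-relevant lines that fit in the token budget."""
    terms = set(query.lower().split())
    # Score each line by term overlap; stable sort keeps source order on ties.
    scored = sorted(enumerate(lines),
                    key=lambda p: len(terms & set(p[1].lower().split())),
                    reverse=True)
    kept, budget = set(), max_tokens
    for idx, line in scored:
        cost = tokens(line)
        if cost <= budget:
            kept.add(idx)
            budget -= cost
    # Emit in original source order so the snippet stays readable.
    return [line for i, line in enumerate(lines) if i in kept]

lines = ["fn authenticate request", "let x = 1",
         "verify token signature", "unused helper code"]
snippet = truncate_to_budget(lines, "authenticate token", max_tokens=6)
```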
527
+ ### Ranking & Retrieval
528
+
529
+ ```json
530
+ "env": {
531
+ "RANK_EXPORTED_BOOST": "1.0", // Boost for exported symbols
532
+ "RANK_TEST_PENALTY": "0.1", // Penalty for test files
533
+ "RANK_POPULARITY_WEIGHT": "0.05", // PageRank influence
534
+ "RRF_ENABLED": "true", // Enable Reciprocal Rank Fusion
535
+ "HYBRID_ALPHA": "0.7" // Vector vs keyword weight (0-1)
536
+ }
537
+ ```
538
+
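What `HYBRID_ALPHA` controls can be sketched as a linear blend, assuming both scores are normalized to a common range before mixing:

```python
def hybrid_score(vector_score, keyword_score, alpha=0.7):
    """Blend normalized scores: alpha weights the semantic (vector) side,
    so 0.0 means keyword-only and 1.0 means vector-only."""
    return alpha * vector_score + (1 - alpha) * keyword_score
```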
539
+ ### Learning System (Optional)
540
+
541
+ ```json
542
+ "env": {
543
+ "LEARNING_ENABLED": "false", // Enable selection tracking (default: false)
544
+ "LEARNING_SELECTION_BOOST": "0.1", // Boost for previously selected symbols
545
+ "LEARNING_FILE_AFFINITY_BOOST": "0.05" // Boost for frequently accessed files
546
+ }
547
+ ```
548
+
549
+ ### Performance
550
+
551
+ ```json
552
+ "env": {
553
+ "PARALLEL_WORKERS": "1", // Indexing parallelism (default: 1 for SQLite)
554
+ "EMBEDDING_CACHE_ENABLED": "true", // Persistent embedding cache
555
+ "PAGERANK_ITERATIONS": "20", // PageRank computation iterations
556
+ "METRICS_ENABLED": "true", // Prometheus metrics
557
+ "METRICS_PORT": "9090"
558
+ }
559
+ ```
560
+
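The iteration count that `PAGERANK_ITERATIONS` bounds corresponds to power-iteration PageRank over the call graph. A sketch of the idea (the server's exact damping factor and edge weighting are not documented here):

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Power-iteration PageRank over (caller, callee) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:  # dangling node: spread its rank evenly
                for m in nodes:
                    nxt[m] += damping * rank[n] / len(nodes)
        rank = nxt
    return rank

# Symbols that many others funnel into accumulate rank.
edges = [("main", "auth"), ("api", "auth"), ("cli", "auth"), ("auth", "db")]
ranks = pagerank(edges)
```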
561
+ ### Query Expansion
562
+
563
+ ```json
564
+ "env": {
565
+ "SYNONYM_EXPANSION_ENABLED": "true", // Expand "auth" → "authentication"
566
+ "ACRONYM_EXPANSION_ENABLED": "true" // Expand "db" → "database"
567
+ }
568
+ ```
569
+
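The expansion these flags enable can be sketched as simple lookup tables applied per query term. The tables below are illustrative; the server ships its own curated synonym and acronym lists:

```python
# Illustrative expansion tables (hypothetical entries).
SYNONYMS = {"auth": ["authentication", "authorize"]}
ACRONYMS = {"db": "database", "cfg": "config"}

def expand_query(query: str):
    """Append synonyms and acronym expansions after each query term."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
        if term in ACRONYMS:
            expanded.append(ACRONYMS[term])
    return expanded
```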
570
+ ---
571
+
572
+ ## Architecture
573
+
574
+ ```mermaid
575
+ flowchart LR
576
+ Client[MCP Client] <==> Tools
577
+ Browser[Chat Web UI] <==> ChatServer
578
+
579
+ subgraph Server [Code Intelligence Server]
580
+ direction TB
581
+ Tools[Tool Router]
582
+
583
+ subgraph Chat [Chat Mode]
584
+ direction TB
585
+ ChatServer[Axum HTTP + SSE] --> Agent[Agent Loop]
586
+ Agent --> ChatLLM["Qwen2.5-Coder-14B<br/>(Metal GPU)"]
587
+ Agent -- "tool calls" --> Handlers
588
+ end
589
+
590
+ subgraph Indexer [Indexing Pipeline]
591
+ direction TB
592
+ Watch[OS-Native File Watcher] --> Scan[File Scan]
593
+ Scan --> Parse[Tree-Sitter]
594
+ Parse --> Extract[Symbol Extraction]
595
+ Extract --> PageRank[PageRank Compute]
596
+ Extract --> Embed[jina-code-0.5b Embeddings - llama.cpp]
597
+ Extract --> LLMDesc[LLM Descriptions - Qwen2.5-Coder]
598
+ Extract --> JSDoc[JSDoc/Decorator/TODO Extract]
599
+ end
600
+
601
+ subgraph Storage [Storage Engine]
602
+ direction TB
603
+ SQLite[(SQLite)]
604
+ Tantivy[(Tantivy)]
605
+ Lance[(LanceDB)]
606
+ Cache[(Embedding Cache)]
607
+ end
608
+
609
+ subgraph Retrieval [Retrieval Engine]
610
+ direction TB
611
+ QueryExpand[Query Expansion]
612
+ Hybrid[Hybrid Search RRF]
613
+ Signals[Ranking Signals]
614
+ Context[Token-Aware Assembly]
615
+ end
616
+
617
+ Handlers[Tool Handlers]
618
+ Tools --> Handlers
619
+ Handlers -- Index --> Watch
620
+ PageRank --> SQLite
621
+ Embed --> Lance
622
+ Embed --> Cache
623
+ LLMDesc --> SQLite
624
+ JSDoc --> SQLite
625
+
626
+ Handlers -- Query --> QueryExpand
627
+ QueryExpand --> Hybrid
628
+ Hybrid --> Signals
629
+ Signals --> Context
630
+ Context --> Handlers
631
+ end
632
+ ```
633
+
634
+ ---
635
+
636
+ ## Development
637
+
638
+ 1. **Prerequisites**: Rust (stable), `protobuf`.
639
+ 2. **Build**: `cargo build --release`
640
+ 3. **Run**: `./scripts/start_mcp.sh`
641
+ 4. **Test**: `cargo test` or `EMBEDDINGS_BACKEND=hash cargo test` (faster, skips model download)
642
+
643
+ ### Quick Testing with Hash Backend
644
+
645
+ For faster development iteration, use the hash embedding backend which skips model downloads:
646
+
647
+ ```bash
648
+ EMBEDDINGS_BACKEND=hash BASE_DIR=/path/to/repo ./target/release/code-intelligence-mcp-server
649
+ ```

### Project Structure

```text
src/
├── chat/              # Chat mode (--chat flag, standalone only)
│   ├── mod.rs         # Axum HTTP server, SSE streaming, routes
│   ├── agent.rs       # Multi-round agent loop, prompt building, tool call parsing
│   ├── llm.rs         # ChatLlm (Qwen2.5-Coder-14B via llama.cpp, Metal GPU)
│   ├── tools.rs       # Tool definitions (JSON) + dispatch to handlers
│   └── ui.html        # Single-file web UI (vanilla JS, marked.js, highlight.js)
├── indexer/
│   ├── extract/       # Language-specific symbol extractors (Rust, TS, Python, Go, Java, C, C++)
│   ├── pipeline/      # Indexing pipeline stages (scan, parse, embed, watch, describe)
│   └── package/       # Package detection (npm, Cargo, Go, Python)
├── storage/
│   ├── sqlite/        # SQLite schema, queries, operations
│   ├── tantivy.rs     # BM25 full-text search with n-gram tokenization
│   └── vector.rs      # LanceDB vector embeddings
├── retrieval/
│   ├── ranking/       # Scoring signals, RRF, diversity, edge expansion, reranker
│   ├── assembler/     # Token-aware context assembly and formatting
│   ├── hyde/          # Hypothetical document expansion
│   ├── mod.rs         # Search pipeline orchestrator
│   ├── hybrid.rs      # Hybrid BM25 + vector scoring loop
│   └── postprocess.rs # Final enforcement, vector promotion
├── graph/             # PageRank, call hierarchy, type graphs
├── handlers/          # MCP tool handlers (shared by MCP server + chat agent)
├── server/            # MCP protocol routing (embedded + standalone)
│   ├── mod.rs         # Shared tool dispatch, embedded handler
│   └── standalone.rs  # Standalone HTTP handler with session routing
├── tools/             # Tool definitions (23 MCP tools)
├── embeddings/        # jina-code-0.5b embedding model (GGUF via llama.cpp)
├── llm/               # On-device LLM (Qwen2.5-Coder-1.5B via llama.cpp, for descriptions)
├── reranker/          # Reranker trait and cache (currently disabled)
├── path/              # Cross-platform path normalization (camino)
├── text.rs            # Text processing (synonym expansion, morphological variants)
├── metrics/           # Prometheus metrics
├── config.rs          # Configuration (embedded + standalone)
├── session.rs         # Multi-repo session management (standalone)
└── registry.rs        # Repo registry with path hashing (standalone)
```
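The `retrieval/assembler/` module listed above is what keeps results inside the model's context window. A rough sketch of token-aware budgeting follows; whitespace-split tokens stand in for a real tokenizer, and the names are illustrative rather than the module's actual API:

```javascript
// Greedily add ranked snippets until the token budget is exhausted.
// The snippet that would overflow is truncated rather than dropped,
// so the highest-ranked material always survives in some form.
function assembleContext(snippets, budget) {
  const out = [];
  let used = 0;
  for (const s of snippets) {
    const tokens = s.split(/\s+/).filter(Boolean);
    if (used + tokens.length <= budget) {
      out.push(s);              // fits entirely
      used += tokens.length;
    } else {
      const room = budget - used;
      if (room > 0) out.push(tokens.slice(0, room).join(' ') + ' …'); // partial fit
      break;
    }
  }
  return out.join('\n---\n');
}

console.log(assembleContext(['a b c', 'd e f g'], 5)); // "a b c\n---\nd e …"
```

The real assembler additionally applies query-aware truncation (keeping the lines most relevant to the query), which this sketch omits.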

## License

MIT
package/bin/run.js ADDED

```js
#!/usr/bin/env node

const { spawn } = require('node:child_process');
const path = require('node:path');
const fs = require('node:fs');

const BINARY_NAME = 'code-intelligence-mcp-server';
const BINARY_PATH = path.join(__dirname, BINARY_NAME);

if (!fs.existsSync(BINARY_PATH)) {
  console.error(`Binary not found at ${BINARY_PATH}`);
  console.error('Please try reinstalling the package: npm install -g @iceinvein/code-intelligence-mcp-standalone');
  process.exit(1);
}

// Pass through all args, prepending --standalone
const args = ['--standalone', ...process.argv.slice(2)];

const child = spawn(BINARY_PATH, args, {
  stdio: 'inherit'
});

// Mirror the child's exit code; `code` is null when the child died from a signal
child.on('exit', (code) => process.exit(code ?? 0));
process.on('SIGINT', () => child.kill('SIGINT'));
process.on('SIGTERM', () => child.kill('SIGTERM'));
```
package/install.js ADDED

```js
const fs = require('fs');
const path = require('path');
const axios = require('axios');
const tar = require('tar');
const os = require('os');

const REPO = 'iceinvein/code_intelligence_mcp_server';
const BINARY_NAME = 'code-intelligence-mcp-server';
// Use the version from package.json to fetch the matching release tag
const VERSION = 'v' + require('./package.json').version;

const MAPPING = {
  'darwin': {
    'arm64': 'aarch64-apple-darwin'
  }
};

async function install() {
  const platform = os.platform();
  const arch = os.arch();

  if (!MAPPING[platform] || !MAPPING[platform][arch]) {
    console.error(`\n Code Intelligence MCP Server currently only supports macOS (Apple Silicon).\n`);
    console.error(` Detected: ${platform} ${arch}`);
    console.error(` Supported: darwin arm64 (macOS with Apple Silicon)\n`);
    console.error(` For updates on additional platform support, see:`);
    console.error(` https://github.com/iceinvein/code_intelligence_mcp_server\n`);
    process.exit(1);
  }

  const target = MAPPING[platform][arch];
  const tarFilename = `${BINARY_NAME}-${target}.tar.gz`;
  const url = `https://github.com/${REPO}/releases/download/${VERSION}/${tarFilename}`;

  const binDir = path.join(__dirname, 'bin');
  const destBinary = path.join(binDir, BINARY_NAME);

  // Ensure the bin dir exists
  if (!fs.existsSync(binDir)) {
    fs.mkdirSync(binDir, { recursive: true });
  }

  console.log(`Downloading ${BINARY_NAME} ${VERSION} for ${target}...`);
  console.log(`URL: ${url}`);

  try {
    const response = await axios({
      method: 'get',
      url: url,
      responseType: 'stream'
    });

    // Pipe the tar.gz stream directly into the extractor
    const extract = tar.x({
      C: binDir,
    });

    await new Promise((resolve, reject) => {
      response.data.on('error', reject); // surface network errors mid-stream
      extract.on('finish', resolve);
      extract.on('error', reject);
      response.data.pipe(extract);
    });

    // Verify the binary exists
    if (fs.existsSync(destBinary)) {
      fs.chmodSync(destBinary, 0o755);
      console.log(`Successfully installed to ${destBinary}`);
    } else {
      console.error('Extraction failed: binary not found after unpacking.');
      console.error(`Expected location: ${destBinary}`);
      // List the contents of binDir to help debug
      console.log('Contents of bin directory:', fs.readdirSync(binDir));
      process.exit(1);
    }

  } catch (error) {
    console.error('Failed to download or install binary:', error.message);
    if (error.response && error.response.status === 404) {
      console.error(`Release not found. Please ensure version ${VERSION} is published on GitHub.`);
    }
    process.exit(1);
  }
}

install();
```
package/package.json ADDED

```json
{
  "name": "@iceinvein/code-intelligence-mcp-standalone",
  "version": "1.5.1",
  "description": "Code Intelligence MCP Server - Standalone HTTP mode for multi-client setups",
  "bin": {
    "code-intelligence-mcp-standalone": "bin/run.js"
  },
  "scripts": {
    "postinstall": "node install.js"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/iceinvein/code_intelligence_mcp_server.git"
  },
  "author": "iceinvein",
  "license": "MIT",
  "dependencies": {
    "axios": "^1.13.2",
    "tar": "^7.5.3"
  },
  "os": [
    "darwin"
  ],
  "cpu": [
    "arm64"
  ]
}
```