@iceinvein/code-intelligence-mcp-standalone 2.0.1 → 2.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1 -166
- package/package.json +1 -1
package/README.md
CHANGED
@@ -22,7 +22,6 @@ Unlike basic text search, this server builds a local knowledge graph to understa
 * **Production First**: Multi-layer test detection (file paths, symbol names, and AST-level `#[test]`/`mod tests` analysis) ensures implementation code ranks above test helpers.
 * **Multi-Repo Support**: Index and search across multiple repositories/monorepos simultaneously.
 * **OS-Native File Watching**: Uses the `notify` crate with macOS FSEvents for instant re-indexing on file changes.
-* **Built-in Chat UI**: Optional ChatGPT-style web interface powered by a local **Qwen2.5-Coder-14B** model. Ask questions about your codebase in the browser with live tool-call visibility and streaming responses.
 * **Fast & Local**: Written in **Rust** with Metal GPU acceleration on Apple Silicon. Parallel indexing with persistent caching.
 
 ---
@@ -222,156 +221,6 @@ warm_ttl_seconds = 300 # How long idle repos stay in memory
 
 ---
 
-## Chat Mode (Experimental)
-
-Chat mode adds a **ChatGPT-style web UI** for asking questions about your codebase directly in the browser. It runs a local **Qwen2.5-Coder-14B** model with full Metal GPU acceleration and uses the same search and navigation tools that MCP clients get — meaning search-quality improvements automatically benefit the chat experience.
-
-Chat mode requires standalone mode and Apple Silicon with at least 16 GB of unified memory.
-
-### Quick Start
-
-```bash
-# Start standalone server with chat enabled
-npx @iceinvein/code-intelligence-mcp-standalone --chat
-
-# Or from source
-./target/release/code-intelligence-mcp-server --standalone --chat
-
-# Custom ports
-./target/release/code-intelligence-mcp-server --standalone --port 3333 --chat --chat-port 4000
-
-# Via environment variables
-CIMCP_MODE=standalone CIMCP_CHAT=true ./target/release/code-intelligence-mcp-server
-```
-
-Once started, open **http://127.0.0.1:3334** in your browser.
-
-On first launch, the 14B model (~9 GB) is downloaded from HuggingFace and cached at `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/`. The MCP server starts immediately — the model loads in the background, and the chat UI becomes available once loading completes (typically 2–5 minutes on first run, seconds on subsequent launches).
-
-### How It Works
-
-```mermaid
-sequenceDiagram
-    participant Browser as Web UI
-    participant Chat as Chat Server (:3334)
-    participant Agent as Agent Loop
-    participant LLM as Qwen2.5-14B (Metal GPU)
-    participant Tools as MCP Tool Handlers
-
-    Browser->>Chat: POST /api/chat (messages + repo_path)
-    Chat-->>Browser: SSE stream opened
-
-    loop Up to 3 tool rounds
-        Agent->>LLM: Generate (full prompt)
-        LLM-->>Agent: Response with <tool_call> blocks
-        Agent-->>Browser: SSE: tool_call (tool name + args)
-        Agent->>Tools: Execute tool (search_code, get_definition, etc.)
-        Tools-->>Agent: Tool results (JSON)
-        Agent-->>Browser: SSE: tool_result (summary)
-        Note over Agent: Append results to conversation, next round
-    end
-
-    Agent->>LLM: Generate stream (final response)
-    LLM-->>Agent: Tokens (one at a time)
-    Agent-->>Browser: SSE: token (streamed)
-    Agent-->>Browser: SSE: done
-```
-
-The agent runs up to **3 rounds** of tool calling before producing a final streamed response. In each round, the LLM can invoke any combination of 10 code-intelligence tools to gather context before answering.
-
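The removed section documents the loop structure (up to 3 tool rounds, results appended to the conversation, then a final generation). A minimal Python sketch of that control flow, assuming a JSON payload inside the documented `<tool_call>` blocks; all function names and the payload shape are illustrative assumptions, not the server's actual Rust implementation:

```python
import json
import re

MAX_ROUNDS = 3  # documented cap on tool-calling rounds


def parse_tool_calls(text):
    """Extract <tool_call>{...}</tool_call> blocks from a model response.

    The JSON-in-tag payload format is an assumption for illustration.
    """
    return [json.loads(m) for m in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.S)]


def run_agent(llm, tools, messages):
    """Run up to MAX_ROUNDS tool rounds, then produce the final answer."""
    for _ in range(MAX_ROUNDS):
        reply = llm(messages)          # one full generation per round
        calls = parse_tool_calls(reply)
        if not calls:
            break                      # model answered without requesting tools
        for call in calls:
            result = tools[call["name"]](**call.get("args", {}))
            # Append the tool result so the next round can see it
            messages.append({"role": "tool", "content": json.dumps(result)})
    return llm(messages)               # final (streamed, in the real server) response
```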
-### Available Tools
-
-The chat agent has access to a curated subset of the full MCP tool suite:
-
-| Tool | Purpose |
-| :--- | :------ |
-| `search_code` | Hybrid semantic + keyword search |
-| `get_definition` | Jump to symbol source code |
-| `find_references` | Find all usages of a symbol |
-| `get_call_hierarchy` | Navigate callers and callees |
-| `get_type_graph` | Explore type inheritance |
-| `explore_dependency_graph` | Trace module imports/exports |
-| `get_file_symbols` | List all symbols in a file |
-| `find_affected_code` | Impact analysis (reverse dependencies) |
-| `trace_data_flow` | Follow variable reads and writes |
-| `summarize_file` | Structural file overview |
-
-### Web UI Features
-
-- **Live token streaming** — responses appear word by word as the model generates
-- **Tool call visibility** — see which tools the model invokes, and their results, in real time
-- **Multi-turn conversation** — full chat history maintained across turns
-- **Markdown rendering** — code blocks with syntax highlighting (via highlight.js)
-- **Dark/light theme** — toggle between themes with the header button
-- **Repo selector** — specify the repository path to query against
-- **Keyboard shortcuts** — Enter to send, Shift+Enter for a newline
-
-### Configuration
-
-| Setting | CLI Flag | Env Var | Default | Description |
-| :------ | :------- | :------ | :------ | :---------- |
-| Enable chat | `--chat` | `CIMCP_CHAT=true` | off | Activate chat mode |
-| Chat port | `--chat-port PORT` | `CIMCP_CHAT_PORT=PORT` | `3334` | HTTP port for the chat UI |
-
-**Priority:** CLI flags > Environment variables > Defaults
-
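The precedence rule in the removed configuration table (CLI flags > environment variables > defaults) can be sketched for the chat port as follows; the helper name is illustrative, while the `--chat-port` flag, `CIMCP_CHAT_PORT` variable, and `3334` default come from the documented table:

```python
import os

DEFAULT_CHAT_PORT = 3334  # documented default


def resolve_chat_port(cli_port=None, env=os.environ):
    """Resolve the chat port using the documented precedence order."""
    if cli_port is not None:           # 1. --chat-port PORT wins
        return int(cli_port)
    if "CIMCP_CHAT_PORT" in env:       # 2. then CIMCP_CHAT_PORT
        return int(env["CIMCP_CHAT_PORT"])
    return DEFAULT_CHAT_PORT           # 3. finally the built-in default
```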
-### API Reference
-
-The chat server exposes three HTTP endpoints:
-
-**`GET /`** — Serves the web UI (single-page HTML with embedded CSS/JS).
-
-**`GET /api/status`** — Returns the model loading status.
-```json
-{"model_loaded": true, "model_name": "Qwen2.5-Coder-14B-Instruct"}
-```
-
-**`POST /api/chat`** — Starts a streaming chat session. Returns an SSE event stream.
-
-Request body:
-```json
-{
-  "messages": [
-    {"role": "user", "content": "How does the ranking system work?"}
-  ],
-  "repo_path": "/absolute/path/to/your/repo"
-}
-```
-
-SSE event types:
-
-| Event | Data | Description |
-| :---- | :--- | :---------- |
-| `token` | `{"type":"token","content":"The "}` | A generated text token |
-| `tool_call` | `{"type":"tool_call","tool":"search_code","args":{...}}` | Tool invocation started |
-| `tool_result` | `{"type":"tool_result","tool":"search_code","summary":"..."}` | Tool execution completed |
-| `error` | `{"type":"error","message":"..."}` | Non-recoverable error |
-| `done` | `{"type":"done"}` | Stream complete |
-
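A client-side sketch of dispatching the event payloads documented in the removed table, assuming they arrive on standard SSE `data:` lines (the transport framing is an assumption; the `"type"` discriminator and payload fields are from the table, and the helper name is illustrative):

```python
import json


def handle_sse_line(line, on_token, on_done):
    """Dispatch one SSE data line from POST /api/chat by its documented type."""
    if not line.startswith("data:"):
        return                         # ignore non-data framing lines
    event = json.loads(line[len("data:"):].strip())
    kind = event["type"]               # token | tool_call | tool_result | error | done
    if kind == "token":
        on_token(event["content"])     # append streamed text to the transcript
    elif kind == "done":
        on_done()                      # stream complete
    # tool_call / tool_result / error would be handled analogously
```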
-### Model Details
-
-| Property | Value |
-| :------- | :---- |
-| Model | Qwen2.5-Coder-14B-Instruct |
-| Format | GGUF Q4_K_M (~9 GB) |
-| Context window | 8,192 tokens |
-| Max generation | 2,048 tokens per response |
-| GPU offloading | All layers via Metal |
-| Sampling | Temperature 0.7 |
-| HuggingFace repo | `Qwen/Qwen2.5-Coder-14B-Instruct-GGUF` |
-| Cache location | `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/` |
-
-### Limitations
-
-- **Standalone-only** — chat is not available in embedded (stdio) mode, since it requires a persistent HTTP server
-- **Apple Silicon required** — the 14B model needs Metal GPU acceleration; 16 GB+ unified memory recommended
-- **Context budget** — the 8K-token context window is shared between conversation history, tool definitions, and tool results; long conversations may lose early context
-- **Tool result truncation** — individual tool results are capped at 4,000 characters to preserve the context budget
-- **No authentication** — the chat server binds to localhost only; do not expose it to the network without adding an auth layer
-- **Single-threaded generation** — one chat request is processed at a time; concurrent requests queue
-
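The 4,000-character cap on tool results mentioned in the removed limitations list amounts to a simple guard like this sketch; the function name and the truncation marker are assumptions, only the 4,000-character cap is documented:

```python
TOOL_RESULT_CAP = 4000  # documented per-result character cap


def truncate_tool_result(text, cap=TOOL_RESULT_CAP):
    """Cap one tool result's text to protect the shared 8K-token context budget."""
    if len(text) <= cap:
        return text
    return text[:cap]  # illustrative: real server behavior beyond the cap is unspecified
```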
----
-
 ## Capabilities
 
 Available tools for the agent (23 tools total):
@@ -574,19 +423,11 @@ Works without configuration by default. You can customize behavior via environme
 ```mermaid
 flowchart LR
     Client[MCP Client] <==> Tools
-    Browser[Chat Web UI] <==> ChatServer
 
     subgraph Server [Code Intelligence Server]
         direction TB
         Tools[Tool Router]
 
-        subgraph Chat [Chat Mode]
-            direction TB
-            ChatServer[Axum HTTP + SSE] --> Agent[Agent Loop]
-            Agent --> ChatLLM["Qwen2.5-Coder-14B<br/>(Metal GPU)"]
-            Agent -- "tool calls" --> Handlers
-        end
-
         subgraph Indexer [Indexing Pipeline]
             direction TB
             Watch[OS-Native File Watcher] --> Scan[File Scan]
@@ -652,12 +493,6 @@ EMBEDDINGS_BACKEND=hash BASE_DIR=/path/to/repo ./target/release/code-intelligenc
 
 ```text
 src/
-├── chat/              # Chat mode (--chat flag, standalone only)
-│   ├── mod.rs         # Axum HTTP server, SSE streaming, routes
-│   ├── agent.rs       # Multi-round agent loop, prompt building, tool-call parsing
-│   ├── llm.rs         # ChatLlm (Qwen2.5-Coder-14B via llama.cpp, Metal GPU)
-│   ├── tools.rs       # Tool definitions (JSON) + dispatch to handlers
-│   └── ui.html        # Single-file web UI (vanilla JS, marked.js, highlight.js)
 ├── indexer/
 │   ├── extract/       # Language-specific symbol extractors (Rust, TS, Python, Go, Java, C, C++)
 │   ├── pipeline/      # Indexing pipeline stages (scan, parse, embed, watch, describe)
@@ -674,7 +509,7 @@ src/
 │   ├── hybrid.rs      # Hybrid BM25 + vector scoring loop
 │   └── postprocess.rs # Final enforcement, vector promotion
 ├── graph/             # PageRank, call hierarchy, type graphs
-├── handlers/          # MCP tool handlers
+├── handlers/          # MCP tool handlers
 ├── server/            # MCP protocol routing (embedded + standalone)
 │   ├── mod.rs         # Shared tool dispatch, embedded handler
 │   └── standalone.rs  # Standalone HTTP handler with session routing
package/package.json
CHANGED