@iceinvein/code-intelligence-mcp 1.5.0 → 1.5.1

package/README.md CHANGED
@@ -22,6 +22,7 @@ Unlike basic text search, this server builds a local knowledge graph to understa
22
22
  * **Production First**: Multi-layer test detection (file paths, symbol names, and AST-level `#[test]`/`mod tests` analysis) ensures implementation code ranks above test helpers.
23
23
  * **Multi-Repo Support**: Index and search across multiple repositories/monorepos simultaneously.
24
24
  * **OS-Native File Watching**: Uses the `notify` crate with macOS FSEvents for instant re-indexing on file changes.
25
+ * **Built-in Chat UI**: Optional ChatGPT-style web interface powered by a local **Qwen2.5-Coder-14B** model. Ask questions about your codebase in the browser with live tool-call visibility and streaming responses.
25
26
  * **Fast & Local**: Written in **Rust** with Metal GPU acceleration on Apple Silicon. Parallel indexing with persistent caching.
26
27
 
27
28
  ---
@@ -221,6 +222,156 @@ warm_ttl_seconds = 300 # How long idle repos stay in memory
221
222
 
222
223
  ---
223
224
 
225
+ ## Chat Mode (Experimental)
226
+
227
+ Chat mode adds a **ChatGPT-style web UI** for asking questions about your codebase directly in the browser. It runs a local **Qwen2.5-Coder-14B** model with full Metal GPU acceleration and uses the same search and navigation tools that MCP clients get — meaning search quality improvements automatically benefit the chat experience.
228
+
229
+ Chat mode requires standalone mode and Apple Silicon with at least 16 GB of unified memory.
230
+
231
+ ### Quick Start
232
+
233
+ ```bash
234
+ # Start standalone server with chat enabled
235
+ npx @iceinvein/code-intelligence-mcp-standalone --chat
236
+
237
+ # Or from source
238
+ ./target/release/code-intelligence-mcp-server --standalone --chat
239
+
240
+ # Custom ports
241
+ ./target/release/code-intelligence-mcp-server --standalone --port 3333 --chat --chat-port 4000
242
+
243
+ # Via environment variables
244
+ CIMCP_MODE=standalone CIMCP_CHAT=true ./target/release/code-intelligence-mcp-server
245
+ ```
246
+
247
+ Once started, open **http://127.0.0.1:3334** in your browser.
248
+
249
+ On first launch, the 14B model (~9GB) is downloaded from HuggingFace and cached at `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/`. The MCP server starts immediately — the model loads in the background and the chat UI becomes available once loading completes (typically 2-5 minutes on first run, seconds on subsequent launches).
250
+
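Since the server starts before the model finishes loading, a client can poll `/api/status` (documented in the API Reference below) to detect readiness. A minimal sketch; the pure helper is shown with an illustrative (not run) polling loop that assumes the default port:

```python
import json

def is_chat_ready(status_body: str) -> bool:
    """True once /api/status reports the model has finished loading."""
    return bool(json.loads(status_body).get("model_loaded"))

# Polling sketch (not executed here): requires a running server from Quick Start.
# import time, urllib.request
# URL = "http://127.0.0.1:3334/api/status"
# while not is_chat_ready(urllib.request.urlopen(URL).read().decode()):
#     time.sleep(5)  # model load takes minutes on first run, seconds after
```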
251
+ ### How It Works
252
+
253
+ ```mermaid
254
+ sequenceDiagram
255
+ participant Browser as Web UI
256
+ participant Chat as Chat Server (:3334)
257
+ participant Agent as Agent Loop
258
+ participant LLM as Qwen2.5-14B (Metal GPU)
259
+ participant Tools as MCP Tool Handlers
260
+
261
+ Browser->>Chat: POST /api/chat (messages + repo_path)
262
+ Chat-->>Browser: SSE stream opened
263
+
264
+ loop Up to 3 tool rounds
265
+ Agent->>LLM: Generate (full prompt)
266
+ LLM-->>Agent: Response with <tool_call> blocks
267
+ Agent-->>Browser: SSE: tool_call (tool name + args)
268
+ Agent->>Tools: Execute tool (search_code, get_definition, etc.)
269
+ Tools-->>Agent: Tool results (JSON)
270
+ Agent-->>Browser: SSE: tool_result (summary)
271
+ Note over Agent: Append results to conversation, next round
272
+ end
273
+
274
+ Agent->>LLM: Generate stream (final response)
275
+ LLM-->>Agent: Tokens (one at a time)
276
+ Agent-->>Browser: SSE: token (streamed)
277
+ Agent-->>Browser: SSE: done
278
+ ```
279
+
280
+ The agent runs up to **3 rounds** of tool calling before producing a final streamed response. In each round, the LLM can invoke any combination of the 10 code intelligence tools listed below to gather context before answering.
281
+
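The multi-round loop in the diagram can be sketched as follows. This is an illustration, not the server's actual implementation: the `run_llm`/`run_tool` callables, message roles, and the exact JSON shape inside `<tool_call>` blocks are assumptions.

```python
import json
import re

# Matches <tool_call>{...}</tool_call> blocks emitted by the model.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def agent_loop(messages, run_llm, run_tool, max_rounds=3):
    """Run up to `max_rounds` of tool calling, then return the final answer.

    `run_llm(history)` returns the model's raw text; `run_tool(name, args)`
    executes one code-intelligence tool and returns a JSON-serializable result.
    """
    history = list(messages)
    for _ in range(max_rounds):
        reply = run_llm(history)
        calls = TOOL_CALL_RE.findall(reply)
        if not calls:  # no tool calls: treat the reply as the final answer
            return reply
        history.append({"role": "assistant", "content": reply})
        for raw in calls:
            call = json.loads(raw)  # assumed shape: {"name": ..., "arguments": {...}}
            result = run_tool(call["name"], call.get("arguments", {}))
            history.append({"role": "tool", "content": json.dumps(result)})
    # Tool budget exhausted: generate the final (streamed) response.
    return run_llm(history)
```

In the real server the final generation is streamed token by token over SSE rather than returned as one string.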
282
+ ### Available Tools
283
+
284
+ The chat agent has access to a curated subset of the full MCP tool suite:
285
+
286
+ | Tool | Purpose |
287
+ | :--- | :------ |
288
+ | `search_code` | Hybrid semantic + keyword search |
289
+ | `get_definition` | Jump to symbol source code |
290
+ | `find_references` | Find all usages of a symbol |
291
+ | `get_call_hierarchy` | Navigate callers and callees |
292
+ | `get_type_graph` | Explore type inheritance |
293
+ | `explore_dependency_graph` | Trace module imports/exports |
294
+ | `get_file_symbols` | List all symbols in a file |
295
+ | `find_affected_code` | Impact analysis (reverse dependencies) |
296
+ | `trace_data_flow` | Follow variable reads and writes |
297
+ | `summarize_file` | Structural file overview |
298
+
299
+ ### Web UI Features
300
+
301
+ - **Live token streaming** — responses appear word by word as the model generates them
302
+ - **Tool call visibility** — see which tools the model invokes and their results in real time
303
+ - **Multi-turn conversation** — full chat history maintained across turns
304
+ - **Markdown rendering** — code blocks with syntax highlighting (via highlight.js)
305
+ - **Dark/light theme** — toggle between themes with the header button
306
+ - **Repo selector** — specify the repository path to query against
307
+ - **Keyboard shortcuts** — Enter to send, Shift+Enter for newline
308
+
309
+ ### Configuration
310
+
311
+ | Setting | CLI Flag | Env Var | Default | Description |
312
+ | :------ | :------- | :------ | :------ | :---------- |
313
+ | Enable chat | `--chat` | `CIMCP_CHAT=true` | off | Activate chat mode |
314
+ | Chat port | `--chat-port PORT` | `CIMCP_CHAT_PORT=PORT` | `3334` | HTTP port for the chat UI |
315
+
316
+ **Priority:** CLI flags > Environment variables > Defaults
317
+
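That precedence can be expressed as a small resolver. A sketch only, using the flag and variable names from the table above; parameter names are hypothetical:

```python
import os

def resolve_chat_port(cli_port=None, env=None, default=3334):
    """Resolve the chat port: --chat-port flag > CIMCP_CHAT_PORT env > default."""
    env = os.environ if env is None else env
    if cli_port is not None:            # CLI flag wins outright
        return int(cli_port)
    env_port = env.get("CIMCP_CHAT_PORT")
    if env_port is not None:            # then the environment variable
        return int(env_port)
    return default                      # finally, the built-in default
```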
318
+ ### API Reference
319
+
320
+ The chat server exposes three HTTP endpoints:
321
+
322
+ **`GET /`** — Serves the web UI (single-page HTML with embedded CSS/JS).
323
+
324
+ **`GET /api/status`** — Returns model loading status.
325
+ ```json
326
+ {"model_loaded": true, "model_name": "Qwen2.5-Coder-14B-Instruct"}
327
+ ```
328
+
329
+ **`POST /api/chat`** — Starts a streaming chat session. Returns an SSE event stream.
330
+
331
+ Request body:
332
+ ```json
333
+ {
334
+ "messages": [
335
+ {"role": "user", "content": "How does the ranking system work?"}
336
+ ],
337
+ "repo_path": "/absolute/path/to/your/repo"
338
+ }
339
+ ```
340
+
341
+ SSE event types:
342
+
343
+ | Event | Data | Description |
344
+ | :---- | :--- | :---------- |
345
+ | `token` | `{"type":"token","content":"The "}` | A generated text token |
346
+ | `tool_call` | `{"type":"tool_call","tool":"search_code","args":{...}}` | Tool invocation started |
347
+ | `tool_result` | `{"type":"tool_result","tool":"search_code","summary":"..."}` | Tool execution completed |
348
+ | `error` | `{"type":"error","message":"..."}` | Non-recoverable error |
349
+ | `done` | `{"type":"done"}` | Stream complete |
350
+
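A client can consume this stream with any SSE parser. A rough sketch, assuming standard SSE framing (one `data: {...}` line per event, events separated by blank lines):

```python
import json

def parse_sse_events(raw_stream: str):
    """Parse an SSE text stream into the JSON event payloads listed above."""
    events = []
    for line in raw_stream.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

def collect_answer(events):
    """Concatenate `token` events into the final answer text."""
    return "".join(e["content"] for e in events if e["type"] == "token")
```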
351
+ ### Model Details
352
+
353
+ | Property | Value |
354
+ | :------- | :---- |
355
+ | Model | Qwen2.5-Coder-14B-Instruct |
356
+ | Format | GGUF Q4_K_M (~9 GB) |
357
+ | Context window | 8,192 tokens |
358
+ | Max generation | 2,048 tokens per response |
359
+ | GPU offloading | All layers via Metal |
360
+ | Sampling | Temperature 0.7 |
361
+ | HuggingFace repo | `Qwen/Qwen2.5-Coder-14B-Instruct-GGUF` |
362
+ | Cache location | `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/` |
363
+
364
+ ### Limitations
365
+
366
+ - **Standalone-only** — chat is not available in embedded (stdio) mode since it requires a persistent HTTP server
367
+ - **Apple Silicon required** — the 14B model needs Metal GPU acceleration; 16GB+ unified memory recommended
368
+ - **Context budget** — the 8K token context window is shared between conversation history, tool definitions, and tool results; long conversations may lose early context
369
+ - **Tool result truncation** — individual tool results are capped at 4,000 characters to preserve context budget
370
+ - **No authentication** — the chat server binds to localhost only; do not expose to the network without adding an auth layer
371
+ - **Single-threaded generation** — one chat request is processed at a time; concurrent requests queue
372
+
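The tool-result cap can be illustrated with a small helper. An approximation only: the source states the 4,000-character limit but not how (or whether) the server marks the cut, so the marker here is invented.

```python
TOOL_RESULT_CAP = 4_000  # characters, per the limitation above

def truncate_tool_result(text: str, cap: int = TOOL_RESULT_CAP) -> str:
    """Cap one tool result; the "[truncated]" marker is a hypothetical choice."""
    if len(text) <= cap:
        return text
    return text[:cap] + "\n…[truncated]"
```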
373
+ ---
374
+
224
375
  ## Capabilities
225
376
 
226
377
  Available tools for the agent (23 tools total):
@@ -423,11 +574,19 @@ Works without configuration by default. You can customize behavior via environme
423
574
  ```mermaid
424
575
  flowchart LR
425
576
  Client[MCP Client] <==> Tools
577
+ Browser[Chat Web UI] <==> ChatServer
426
578
 
427
579
  subgraph Server [Code Intelligence Server]
428
580
  direction TB
429
581
  Tools[Tool Router]
430
582
 
583
+ subgraph Chat [Chat Mode]
584
+ direction TB
585
+ ChatServer[Axum HTTP + SSE] --> Agent[Agent Loop]
586
+ Agent --> ChatLLM["Qwen2.5-Coder-14B<br/>(Metal GPU)"]
587
+ Agent -- "tool calls" --> Handlers
588
+ end
589
+
431
590
  subgraph Indexer [Indexing Pipeline]
432
591
  direction TB
433
592
  Watch[OS-Native File Watcher] --> Scan[File Scan]
@@ -455,19 +614,20 @@ flowchart LR
455
614
  Context[Token-Aware Assembly]
456
615
  end
457
616
 
458
- %% Data Flow
459
- Tools -- Index --> Watch
617
+ Handlers[Tool Handlers]
618
+ Tools --> Handlers
619
+ Handlers -- Index --> Watch
460
620
  PageRank --> SQLite
461
621
  Embed --> Lance
462
622
  Embed --> Cache
463
623
  LLMDesc --> SQLite
464
624
  JSDoc --> SQLite
465
625
 
466
- Tools -- Query --> QueryExpand
626
+ Handlers -- Query --> QueryExpand
467
627
  QueryExpand --> Hybrid
468
628
  Hybrid --> Signals
469
629
  Signals --> Context
470
- Context --> Tools
630
+ Context --> Handlers
471
631
  end
472
632
  ```
473
633
 
@@ -492,6 +652,12 @@ EMBEDDINGS_BACKEND=hash BASE_DIR=/path/to/repo ./target/release/code-intelligenc
492
652
 
493
653
  ```text
494
654
  src/
655
+ ├── chat/ # Chat mode (--chat flag, standalone only)
656
+ │ ├── mod.rs # Axum HTTP server, SSE streaming, routes
657
+ │ ├── agent.rs # Multi-round agent loop, prompt building, tool call parsing
658
+ │ ├── llm.rs # ChatLlm (Qwen2.5-Coder-14B via llama.cpp, Metal GPU)
659
+ │ ├── tools.rs # Tool definitions (JSON) + dispatch to handlers
660
+ │ └── ui.html # Single-file web UI (vanilla JS, marked.js, highlight.js)
495
661
  ├── indexer/
496
662
  │ ├── extract/ # Language-specific symbol extractors (Rust, TS, Python, Go, Java, C, C++)
497
663
  │ ├── pipeline/ # Indexing pipeline stages (scan, parse, embed, watch, describe)
@@ -508,13 +674,13 @@ src/
508
674
  │ ├── hybrid.rs # Hybrid BM25 + vector scoring loop
509
675
  │ └── postprocess.rs # Final enforcement, vector promotion
510
676
  ├── graph/ # PageRank, call hierarchy, type graphs
511
- ├── handlers/ # MCP tool handlers
677
+ ├── handlers/ # MCP tool handlers (shared by MCP server + chat agent)
512
678
  ├── server/ # MCP protocol routing (embedded + standalone)
513
679
  │ ├── mod.rs # Shared tool dispatch, embedded handler
514
680
  │ └── standalone.rs # Standalone HTTP handler with session routing
515
681
  ├── tools/ # Tool definitions (23 MCP tools)
516
682
  ├── embeddings/ # jina-code-0.5b embedding model (GGUF via llama.cpp)
517
- ├── llm/ # On-device LLM (Qwen2.5-Coder-1.5B via llama.cpp)
683
+ ├── llm/ # On-device LLM (Qwen2.5-Coder-1.5B via llama.cpp, for descriptions)
518
684
  ├── reranker/ # Reranker trait and cache (currently disabled)
519
685
  ├── path/ # Cross-platform path normalization (camino)
520
686
  ├── text.rs # Text processing (synonym expansion, morphological variants)
package/package.json CHANGED
@@ -1,10 +1,9 @@
1
1
  {
2
2
  "name": "@iceinvein/code-intelligence-mcp",
3
- "version": "1.5.0",
3
+ "version": "1.5.1",
4
4
  "description": "Code Intelligence MCP Server - Smart context for your LLM coding agent",
5
5
  "bin": {
6
- "code-intelligence-mcp": "bin/run.js",
7
- "code-intelligence-mcp-standalone": "bin/standalone.js"
6
+ "code-intelligence-mcp": "bin/run.js"
8
7
  },
9
8
  "scripts": {
10
9
  "postinstall": "node install.js"
package/bin/standalone.js DELETED
@@ -1,26 +0,0 @@
1
- #!/usr/bin/env node
2
-
3
- const { spawn } = require('node:child_process');
4
- const path = require('node:path');
5
- const os = require('node:os');
6
- const fs = require('node:fs');
7
-
8
- const BINARY_NAME = 'code-intelligence-mcp-server';
9
- const BINARY_PATH = path.join(__dirname, BINARY_NAME);
10
-
11
- if (!fs.existsSync(BINARY_PATH)) {
12
- console.error(`Binary not found at ${BINARY_PATH}`);
13
- console.error('Please try reinstalling the package: npm install -g @iceinvein/code-intelligence-mcp');
14
- process.exit(1);
15
- }
16
-
17
- // Pass through all args, prepend --standalone
18
- const args = ['--standalone', ...process.argv.slice(2)];
19
-
20
- const child = spawn(BINARY_PATH, args, {
21
- stdio: 'inherit'
22
- });
23
-
24
- child.on('exit', (code) => process.exit(code));
25
- process.on('SIGINT', () => child.kill('SIGINT'));
26
- process.on('SIGTERM', () => child.kill('SIGTERM'));