@iceinvein/code-intelligence-mcp-standalone 2.0.1 → 2.1.0

Files changed (2)

1. package/README.md +1 -166
2. package/package.json +1 -1
package/README.md CHANGED
@@ -22,7 +22,6 @@ Unlike basic text search, this server builds a local knowledge graph to understa
 * **Production First**: Multi-layer test detection (file paths, symbol names, and AST-level `#[test]`/`mod tests` analysis) ensures implementation code ranks above test helpers.
 * **Multi-Repo Support**: Index and search across multiple repositories/monorepos simultaneously.
 * **OS-Native File Watching**: Uses the `notify` crate with macOS FSEvents for instant re-indexing on file changes.
-* **Built-in Chat UI**: Optional ChatGPT-style web interface powered by a local **Qwen2.5-Coder-14B** model. Ask questions about your codebase in the browser with live tool-call visibility and streaming responses.
 * **Fast & Local**: Written in **Rust** with Metal GPU acceleration on Apple Silicon. Parallel indexing with persistent caching.
 
 ---
@@ -222,156 +221,6 @@ warm_ttl_seconds = 300 # How long idle repos stay in memory
 
 ---
 
-## Chat Mode (Experimental)
-
-Chat mode adds a **ChatGPT-style web UI** for asking questions about your codebase directly in the browser. It runs a local **Qwen2.5-Coder-14B** model with full Metal GPU acceleration and uses the same search and navigation tools that MCP clients get — meaning search quality improvements automatically benefit the chat experience.
-
-Chat mode requires standalone mode and Apple Silicon with at least 16GB of unified memory.
-
-### Quick Start
-
-```bash
-# Start standalone server with chat enabled
-npx @iceinvein/code-intelligence-mcp-standalone --chat
-
-# Or from source
-./target/release/code-intelligence-mcp-server --standalone --chat
-
-# Custom ports
-./target/release/code-intelligence-mcp-server --standalone --port 3333 --chat --chat-port 4000
-
-# Via environment variables
-CIMCP_MODE=standalone CIMCP_CHAT=true ./target/release/code-intelligence-mcp-server
-```
-
-Once started, open **http://127.0.0.1:3334** in your browser.
-
-On first launch, the 14B model (~9GB) is downloaded from HuggingFace and cached at `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/`. The MCP server starts immediately — the model loads in the background and the chat UI becomes available once loading completes (typically 2-5 minutes on first run, seconds on subsequent launches).
-
-### How It Works
-
-```mermaid
-sequenceDiagram
-    participant Browser as Web UI
-    participant Chat as Chat Server (:3334)
-    participant Agent as Agent Loop
-    participant LLM as Qwen2.5-14B (Metal GPU)
-    participant Tools as MCP Tool Handlers
-
-    Browser->>Chat: POST /api/chat (messages + repo_path)
-    Chat-->>Browser: SSE stream opened
-
-    loop Up to 3 tool rounds
-        Agent->>LLM: Generate (full prompt)
-        LLM-->>Agent: Response with <tool_call> blocks
-        Agent-->>Browser: SSE: tool_call (tool name + args)
-        Agent->>Tools: Execute tool (search_code, get_definition, etc.)
-        Tools-->>Agent: Tool results (JSON)
-        Agent-->>Browser: SSE: tool_result (summary)
-        Note over Agent: Append results to conversation, next round
-    end
-
-    Agent->>LLM: Generate stream (final response)
-    LLM-->>Agent: Tokens (one at a time)
-    Agent-->>Browser: SSE: token (streamed)
-    Agent-->>Browser: SSE: done
-```
-
-The agent uses up to **3 rounds** of tool calling before producing a final streamed response. Each round, the LLM can invoke any combination of 10 code intelligence tools to gather context before answering.
-
-### Available Tools
-
-The chat agent has access to a curated subset of the full MCP tool suite:
-
-| Tool | Purpose |
-| :--- | :------ |
-| `search_code` | Hybrid semantic + keyword search |
-| `get_definition` | Jump to symbol source code |
-| `find_references` | Find all usages of a symbol |
-| `get_call_hierarchy` | Navigate callers and callees |
-| `get_type_graph` | Explore type inheritance |
-| `explore_dependency_graph` | Trace module imports/exports |
-| `get_file_symbols` | List all symbols in a file |
-| `find_affected_code` | Impact analysis (reverse dependencies) |
-| `trace_data_flow` | Follow variable reads and writes |
-| `summarize_file` | Structural file overview |
-
-### Web UI Features
-
-- **Live token streaming** — responses appear word-by-word as the model generates
-- **Tool call visibility** — see which tools the model invokes and their results in real-time
-- **Multi-turn conversation** — full chat history maintained across turns
-- **Markdown rendering** — code blocks with syntax highlighting (via highlight.js)
-- **Dark/light theme** — toggle between themes with the header button
-- **Repo selector** — specify the repository path to query against
-- **Keyboard shortcuts** — Enter to send, Shift+Enter for newline
-
-### Configuration
-
-| Setting | CLI Flag | Env Var | Default | Description |
-| :------ | :------- | :------ | :------ | :---------- |
-| Enable chat | `--chat` | `CIMCP_CHAT=true` | off | Activate chat mode |
-| Chat port | `--chat-port PORT` | `CIMCP_CHAT_PORT=PORT` | `3334` | HTTP port for the chat UI |
-
-**Priority:** CLI flags > Environment variables > Defaults
-
-### API Reference
-
-The chat server exposes three HTTP endpoints:
-
-**`GET /`** — Serves the web UI (single-page HTML with embedded CSS/JS).
-
-**`GET /api/status`** — Returns model loading status.
-```json
-{"model_loaded": true, "model_name": "Qwen2.5-Coder-14B-Instruct"}
-```
-
-**`POST /api/chat`** — Starts a streaming chat session. Returns an SSE event stream.
-
-Request body:
-```json
-{
-  "messages": [
-    {"role": "user", "content": "How does the ranking system work?"}
-  ],
-  "repo_path": "/absolute/path/to/your/repo"
-}
-```
-
-SSE event types:
-
-| Event | Data | Description |
-| :---- | :--- | :---------- |
-| `token` | `{"type":"token","content":"The "}` | A generated text token |
-| `tool_call` | `{"type":"tool_call","tool":"search_code","args":{...}}` | Tool invocation started |
-| `tool_result` | `{"type":"tool_result","tool":"search_code","summary":"..."}` | Tool execution completed |
-| `error` | `{"type":"error","message":"..."}` | Non-recoverable error |
-| `done` | `{"type":"done"}` | Stream complete |
-
-### Model Details
-
-| Property | Value |
-| :------- | :---- |
-| Model | Qwen2.5-Coder-14B-Instruct |
-| Format | GGUF Q4_K_M (~9 GB) |
-| Context window | 8,192 tokens |
-| Max generation | 2,048 tokens per response |
-| GPU offloading | All layers via Metal |
-| Sampling | Temperature 0.7 |
-| HuggingFace repo | `Qwen/Qwen2.5-Coder-14B-Instruct-GGUF` |
-| Cache location | `~/.code-intelligence/models/qwen2.5-coder-14b-gguf/` |
-
-### Limitations
-
-- **Standalone-only** — chat is not available in embedded (stdio) mode since it requires a persistent HTTP server
-- **Apple Silicon required** — the 14B model needs Metal GPU acceleration; 16GB+ unified memory recommended
-- **Context budget** — the 8K token context window is shared between conversation history, tool definitions, and tool results; long conversations may lose early context
-- **Tool result truncation** — individual tool results are capped at 4,000 characters to preserve context budget
-- **No authentication** — the chat server binds to localhost only; do not expose to the network without adding an auth layer
-- **Single-threaded generation** — one chat request is processed at a time; concurrent requests queue
-
----
-
 ## Capabilities
 
 Available tools for the agent (23 tools total):
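The removed API Reference above fully specifies the chat server's SSE payloads. For readers pinning 2.0.x, here is a minimal client-side parser sketch based on that event table; `ChatEvent` and `parseSseChunk` are illustrative names, not part of the package:

```typescript
// Event shapes taken from the SSE event table in the removed README section.
// ChatEvent and parseSseChunk are hypothetical helpers, not shipped code.
type ChatEvent =
  | { type: "token"; content: string }
  | { type: "tool_call"; tool: string; args: Record<string, unknown> }
  | { type: "tool_result"; tool: string; summary: string }
  | { type: "error"; message: string }
  | { type: "done" };

// SSE frames arrive as "data: {...}" lines separated by blank lines.
function parseSseChunk(chunk: string): ChatEvent[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => JSON.parse(line.slice("data: ".length)) as ChatEvent);
}

const events = parseSseChunk(
  'data: {"type":"token","content":"The "}\n\ndata: {"type":"done"}\n'
);
// events[0].type === "token", events[1].type === "done"
```

A browser client would feed each decoded chunk of the `POST /api/chat` response body through such a parser and switch on `type` to render tokens, tool calls, and errors.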
@@ -574,19 +423,11 @@ Works without configuration by default. You can customize behavior via environme
 ```mermaid
 flowchart LR
     Client[MCP Client] <==> Tools
-    Browser[Chat Web UI] <==> ChatServer
 
     subgraph Server [Code Intelligence Server]
         direction TB
         Tools[Tool Router]
 
-        subgraph Chat [Chat Mode]
-            direction TB
-            ChatServer[Axum HTTP + SSE] --> Agent[Agent Loop]
-            Agent --> ChatLLM["Qwen2.5-Coder-14B<br/>(Metal GPU)"]
-            Agent -- "tool calls" --> Handlers
-        end
-
         subgraph Indexer [Indexing Pipeline]
             direction TB
             Watch[OS-Native File Watcher] --> Scan[File Scan]
@@ -652,12 +493,6 @@ EMBEDDINGS_BACKEND=hash BASE_DIR=/path/to/repo ./target/release/code-intelligenc
 
 ```text
 src/
-├── chat/            # Chat mode (--chat flag, standalone only)
-│   ├── mod.rs       # Axum HTTP server, SSE streaming, routes
-│   ├── agent.rs     # Multi-round agent loop, prompt building, tool call parsing
-│   ├── llm.rs       # ChatLlm (Qwen2.5-Coder-14B via llama.cpp, Metal GPU)
-│   ├── tools.rs     # Tool definitions (JSON) + dispatch to handlers
-│   └── ui.html      # Single-file web UI (vanilla JS, marked.js, highlight.js)
 ├── indexer/
 │   ├── extract/     # Language-specific symbol extractors (Rust, TS, Python, Go, Java, C, C++)
 │   ├── pipeline/    # Indexing pipeline stages (scan, parse, embed, watch, describe)
@@ -674,7 +509,7 @@ src/
 │   ├── hybrid.rs    # Hybrid BM25 + vector scoring loop
 │   └── postprocess.rs # Final enforcement, vector promotion
 ├── graph/           # PageRank, call hierarchy, type graphs
-├── handlers/        # MCP tool handlers (shared by MCP server + chat agent)
+├── handlers/        # MCP tool handlers
 ├── server/          # MCP protocol routing (embedded + standalone)
 │   ├── mod.rs       # Shared tool dispatch, embedded handler
 │   └── standalone.rs # Standalone HTTP handler with session routing
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@iceinvein/code-intelligence-mcp-standalone",
-  "version": "2.0.1",
+  "version": "2.1.0",
   "description": "Code Intelligence MCP Server - Standalone HTTP mode for multi-client setups",
   "bin": {
     "code-intelligence-mcp-standalone": "bin/run.js"
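For anyone staying on 2.0.x to keep chat mode: because the model loads in the background, the removed README section documents a `GET /api/status` endpoint that reports readiness. A polling sketch with an injected fetcher so it runs without a live server; `waitForModel` is a hypothetical helper, and only the response shape comes from the removed docs:

```typescript
// Response shape from the removed README's GET /api/status documentation.
interface ChatStatus {
  model_loaded: boolean;
  model_name: string;
}

// Poll until the model reports loaded; the fetcher is injected so this
// helper can be exercised without a running chat server.
async function waitForModel(
  fetchStatus: () => Promise<ChatStatus>,
  retries = 60,
  delayMs = 5000
): Promise<string> {
  for (let i = 0; i < retries; i++) {
    const status = await fetchStatus();
    if (status.model_loaded) return status.model_name;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("model did not load within the polling window");
}
```

In a real client the fetcher would be something like `() => fetch("http://127.0.0.1:3334/api/status").then((r) => r.json())`.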