mcp-local-rag 0.3.0 → 0.4.1

package/README.md CHANGED
@@ -1,14 +1,35 @@
1
1
  # MCP Local RAG
2
2
 
3
- A privacy-first document search server that runs entirely on your machine. No API keys, no cloud services, no data leaving your computer.
3
+ [![npm version](https://img.shields.io/npm/v/mcp-local-rag.svg)](https://www.npmjs.com/package/mcp-local-rag)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
4
5
 
5
- Built for the Model Context Protocol (MCP), this lets you use Cursor, Codex, Claude Code, or any MCP client to search through your local documents using semantic search—without sending anything to external services.
6
+ Local RAG for developers using MCP.
7
+ Semantic search with keyword boost for exact technical terms — fully private, zero setup.
8
+
9
+ ## Features
10
+
11
+ - **Semantic search with keyword boost**
12
+ Vector search first, then keyword matching boosts exact matches. Terms like `useEffect`, error codes, and class names rank higher—not just semantically guessed.
13
+
14
+ - **Smart semantic chunking**
15
+ Chunks documents by meaning, not character count. Uses embedding similarity to find natural topic boundaries—keeping related content together and splitting where topics change.
16
+
17
+ - **Quality-first result filtering**
18
+ Groups results by relevance gaps instead of arbitrary top-K cutoffs. Get fewer but more trustworthy chunks.
19
+
20
+ - **Runs entirely locally**
21
+ No API keys, no cloud, no data leaving your machine. Works fully offline after the first model download.
22
+
23
+ - **Zero-friction setup**
24
+ One `npx` command. No Docker, no Python, no servers to manage. Designed for Cursor, Codex, and Claude Code via MCP.
6
25
 
7
26
  ## Quick Start
8
27
 
9
- Add the MCP server to your AI coding tool. Choose your tool below:
28
+ Set `BASE_DIR` to the folder you want to search. Documents must live under it.
29
+
30
+ Add the MCP server to your AI coding tool:
10
31
 
11
- **For Cursor** - Add to `~/.cursor/mcp.json`:
32
+ **For Cursor** - Add to `~/.cursor/mcp.json`:
12
33
  ```json
13
34
  {
14
35
  "mcpServers": {
@@ -23,7 +44,7 @@ Add the MCP server to your AI coding tool. Choose your tool below:
23
44
  }
24
45
  ```
25
46
 
26
- **For Codex** - Add to `~/.codex/config.toml`:
47
+ **For Codex** - Add to `~/.codex/config.toml`:
27
48
  ```toml
28
49
  [mcp_servers.local-rag]
29
50
  command = "npx"
@@ -33,103 +54,129 @@ args = ["-y", "mcp-local-rag"]
33
54
  BASE_DIR = "/path/to/your/documents"
34
55
  ```
35
56
 
36
- **For Claude Code** - Run this command:
57
+ **For Claude Code** - Run this command:
37
58
  ```bash
38
59
  claude mcp add local-rag --scope user --env BASE_DIR=/path/to/your/documents -- npx -y mcp-local-rag
39
60
  ```
40
61
 
41
- Restart your tool, then start using:
62
+ Restart your tool, then start using it:
63
+
42
64
  ```
43
- "Ingest api-spec.pdf"
44
- "What does this document say about authentication?"
65
+ You: "Ingest api-spec.pdf"
66
+ Assistant: Successfully ingested api-spec.pdf (47 chunks created)
67
+
68
+ You: "What does the API documentation say about authentication?"
69
+ Assistant: Based on the documentation, authentication uses OAuth 2.0 with JWT tokens.
70
+ The flow is described in section 3.2...
45
71
  ```
46
72
 
47
73
  That's it. No installation, no Docker, no complex setup.
48
74
 
49
75
  ## Why This Exists
50
76
 
51
- You want to use AI to search through your documents. Maybe they're technical specs, research papers, internal documentation, or meeting notes. The problem: most solutions require sending your files to external APIs.
77
+ You want AI to search your documents: technical specs, research papers, internal docs. But most solutions send your files to external APIs.
52
78
 
53
- This creates three issues:
79
+ **Privacy.** Your documents might contain sensitive data. This runs entirely locally.
54
80
 
55
- **Privacy concerns.** Your documents might contain sensitive information—client data, proprietary research, personal notes. Sending them to third-party services means trusting them with that data.
81
+ **Cost.** External embedding APIs charge per use. This is free after the initial model download.
56
82
 
57
- **Cost at scale.** External embedding APIs charge per use. For large document sets or frequent searches, costs add up quickly.
83
+ **Offline.** Works without internet after setup.
58
84
 
59
- **Network dependency.** If you're offline or have limited connectivity, you can't search your own documents.
85
+ **Code search.** Pure semantic search misses exact terms like `useEffect` or `ERR_CONNECTION_REFUSED`. Keyword boost catches both meaning and exact matches.
60
86
 
61
- This project solves these problems by running everything locally. Documents never leave your machine. The embedding model downloads once, then works offline. And it's free to use as much as you want.
87
+ ## Usage
62
88
 
63
- ## What You Get
89
+ The server provides 5 MCP tools: ingest, search, list, delete, status
90
+ (`ingest_file`, `query_documents`, `list_files`, `delete_file`, `status`).
64
91
 
65
- The server provides five tools through MCP:
92
+ ### Ingesting Documents
66
93
 
67
- **Document ingestion** handles PDF, DOCX, TXT, and Markdown files. Point it at a file, and it extracts the text, splits it into searchable chunks, generates embeddings using a local model, and stores everything in a local vector database. If you ingest the same file again, it replaces the old version—no duplicate data.
94
+ ```
95
+ "Ingest the document at /Users/me/docs/api-spec.pdf"
96
+ ```
68
97
 
69
- **Semantic search** lets you query in natural language. Instead of keyword matching, it understands meaning. Ask "how does authentication work" and it finds relevant sections even if they use different words like "login flow" or "credential validation."
98
+ Supports PDF, DOCX, TXT, and Markdown. The server extracts text, splits it into chunks, generates embeddings locally, and stores everything in a local vector database.
70
99
 
71
- **File management** shows what you've ingested and when. You can see how many chunks each file produced and verify everything is indexed correctly.
100
+ Re-ingesting the same file replaces the old version automatically.
72
101
 
73
- **File deletion** removes ingested documents from the vector database. When you delete a file, all its chunks and embeddings are permanently removed. This is useful for removing outdated documents or sensitive data you no longer want indexed.
102
+ ### Searching Documents
74
103
 
75
- **System status** reports on your database—document count, total chunks, memory usage. Helpful for monitoring performance or debugging issues.
104
+ ```
105
+ "What does the API documentation say about authentication?"
106
+ "Find information about rate limiting"
107
+ "Search for error handling best practices"
108
+ ```
76
109
 
77
- All of this uses:
78
- - **LanceDB** for vector storage (file-based, no server needed)
79
- - **Transformers.js** for embeddings (runs in Node.js, no Python)
80
- - **all-MiniLM-L6-v2** model (384 dimensions, good balance of speed and accuracy)
81
- - **RecursiveCharacterTextSplitter** for intelligent text chunking
110
+ Search uses semantic similarity with keyword boost. This means `useEffect` finds documents containing that exact term, not just semantically similar React concepts.
82
111
 
83
- The result: query responses typically under 3 seconds on a standard laptop, even with thousands of document chunks indexed.
112
+ Results include text content, source file, and relevance score. Adjust result count with `limit` (1-20, default 10).
84
113
 
85
- ## First Run
114
+ ### Managing Files
86
115
 
87
- The server starts instantly, but the embedding model downloads **on first use** (when you ingest or search for the first time):
88
- - **Download size**: ~90MB (model files)
89
- - **Disk usage after caching**: ~120MB (includes ONNX runtime cache)
90
- - **Time**: 1-2 minutes on a decent connection
91
- - **First operation delay**: Your initial ingest or search request will wait for the model download to complete
116
+ ```
117
+ "List all ingested files" # See what's indexed
118
+ "Delete old-spec.pdf from RAG" # Remove a file
119
+ "Show RAG server status" # Check system health
120
+ ```
92
121
 
93
- You'll see a message like "Initializing model (downloading ~90MB, may take 1-2 minutes)..." in the console. The model caches in `CACHE_DIR` (default: `./models/`) for offline use.
122
+ ## Search Tuning
94
123
 
95
- **Why lazy initialization?** This approach allows the server to start immediately without upfront model loading. You only download when actually needed, making the server more responsive for quick status checks or file management operations.
124
+ Adjust these for your use case:
96
125
 
97
- **Offline Mode**: After first download, works completely offline—no internet required.
126
+ | Variable | Default | Description |
127
+ |----------|---------|-------------|
128
+ | `RAG_HYBRID_WEIGHT` | `0.6` | Keyword boost factor. 0 = semantic only, higher = stronger keyword boost. |
129
+ | `RAG_GROUPING` | (not set) | `similar` for top group only, `related` for top 2 groups. |
130
+ | `RAG_MAX_DISTANCE` | (not set) | Filter out low-relevance results (e.g., `0.5`). |
98
131
 
99
- ## Security
132
+ Example (stricter, code-focused):
133
+ ```json
134
+ "env": {
135
+ "RAG_HYBRID_WEIGHT": "0.7",
136
+ "RAG_GROUPING": "similar"
137
+ }
138
+ ```
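To make the effect of `RAG_MAX_DISTANCE` and `RAG_GROUPING` concrete, here is a minimal sketch of gap-based filtering. It is not the package's implementation: `filterResults`, the `jump` threshold of `0.1`, and the sample distances are hypothetical, and the real server may detect relevance gaps differently. It assumes only what the tables describe: results farther than the maximum distance are dropped, `similar` keeps the top group, and `related` keeps the top two groups.

```typescript
// Hypothetical sketch; names and the jump threshold are illustrative assumptions.
type Grouping = "similar" | "related";

interface FilterOptions {
  maxDistance?: number; // mirrors RAG_MAX_DISTANCE
  grouping?: Grouping;  // mirrors RAG_GROUPING
  jump?: number;        // assumed gap size that separates two groups
}

function filterResults(distances: number[], opts: FilterOptions = {}): number[] {
  const { maxDistance, grouping, jump = 0.1 } = opts;

  // Lower distance means more relevant, so sort ascending.
  let sorted = [...distances].sort((a, b) => a - b);

  // Drop anything beyond the distance threshold.
  if (maxDistance !== undefined) {
    sorted = sorted.filter((d) => d <= maxDistance);
  }
  if (grouping === undefined || sorted.length === 0) return sorted;

  // "similar" stops at the first gap, "related" at the second.
  const allowedJumps = grouping === "similar" ? 0 : 1;
  let jumps = 0;
  const kept = [sorted[0]];
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] - sorted[i - 1] > jump) {
      jumps++;
      if (jumps > allowedJumps) break;
    }
    kept.push(sorted[i]);
  }
  return kept;
}

const sample = [0.2, 0.22, 0.25, 0.45, 0.47, 0.8];
console.log(filterResults(sample, { grouping: "similar" }));
console.log(filterResults(sample, { grouping: "related" }));
console.log(filterResults(sample, { maxDistance: 0.5 }));
```

With the sample distances, `similar` keeps the three closest results, `related` also keeps the next cluster, and a bare `maxDistance` of `0.5` only removes the outlier at `0.8`.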
100
139
 
101
- **Path Restriction**: This server only accesses files within your `BASE_DIR`. Any attempt to access files outside this directory (e.g., via `../` path traversal) will be rejected.
140
+ ## How It Works
102
141
 
103
- **Local Only**: All processing happens on your machine. No network requests are made after the initial model download.
142
+ **TL;DR:**
143
+ - Documents are chunked by semantic similarity, not fixed character counts
144
+ - Each chunk is embedded locally using Transformers.js
145
+ - Search uses semantic similarity with keyword boost for exact matches
146
+ - Results are filtered based on relevance gaps, not raw scores
104
147
 
105
- **Model Verification**: The embedding model downloads from HuggingFace's official repository (`Xenova/all-MiniLM-L6-v2`). Verify integrity by checking the [official model card](https://huggingface.co/Xenova/all-MiniLM-L6-v2).
148
+ ### Details
106
149
 
107
- ## Configuration
150
+ When you ingest a document, the parser extracts text based on file type (PDF via `unpdf`, DOCX via `mammoth`, text files directly).
108
151
 
109
- The server works out of the box with sensible defaults, but you can customize it through environment variables.
152
+ The semantic chunker splits text into sentences, then groups them using embedding similarity. It finds natural topic boundaries where the meaning shifts—keeping related content together instead of cutting at arbitrary character limits. This produces chunks that are coherent units of meaning, typically 500-1000 characters.
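The idea of splitting where the meaning shifts can be sketched as follows. This is an illustration under stated assumptions, not the chunker shipped in the package: `semanticChunks`, the `0.5` similarity threshold, and the keyword-counting `toyEmbed` function are all hypothetical stand-ins (the real server embeds sentences with Transformers.js).

```typescript
// Hypothetical sketch; none of these names come from mcp-local-rag.
type Embed = (text: string) => number[];

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Start a new chunk whenever adjacent sentences fall below the threshold.
function semanticChunks(sentences: string[], embed: Embed, threshold = 0.5): string[] {
  if (sentences.length === 0) return [];
  const chunks: string[] = [];
  let current: string[] = [sentences[0]];
  let previous = embed(sentences[0]);
  for (let i = 1; i < sentences.length; i++) {
    const next = embed(sentences[i]);
    if (cosine(previous, next) < threshold) {
      chunks.push(current.join(" "));
      current = [];
    }
    current.push(sentences[i]);
    previous = next;
  }
  chunks.push(current.join(" "));
  return chunks;
}

// Toy embedding (assumption): counts words from two topics.
const toyEmbed: Embed = (text) => {
  const t = text.toLowerCase();
  return [
    (t.match(/auth|token|login/g) ?? []).length,
    (t.match(/rate|limit|quota/g) ?? []).length,
  ];
};

const chunks = semanticChunks(
  [
    "Login requires a token.",
    "Tokens expire after auth.",
    "Rate limits apply per key.",
    "The quota resets daily.",
  ],
  toyEmbed,
);
console.log(chunks);
```

The two authentication sentences end up in one chunk and the two rate-limit sentences in another, because the similarity drops at the topic change rather than at a character count.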
110
153
 
111
- ### For Codex
154
+ Each chunk goes through the Transformers.js embedding model (`all-MiniLM-L6-v2`), converting text into 384-dimensional vectors. Vectors are stored in LanceDB, a file-based vector database requiring no server process.
112
155
 
113
- Add to `~/.codex/config.toml`:
156
+ When you search:
157
+ 1. Your query becomes a vector using the same model
158
+ 2. Semantic (vector) search finds the most relevant chunks
159
+ 3. Quality filters apply (distance threshold, grouping)
160
+ 4. Keyword matches boost rankings for exact term matching
114
161
 
115
- ```toml
116
- [mcp_servers.local-rag]
117
- command = "npx"
118
- args = ["-y", "mcp-local-rag"]
162
+ The keyword boost ensures exact terms like `useEffect` or error codes rank higher when they match.
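A rough sketch of what a keyword boost can look like is shown below. The formula is an assumption chosen for illustration, not the package's actual scoring: `boostByKeywords` and the `Hit` shape are hypothetical, and the `weight` parameter only loosely mirrors `RAG_HYBRID_WEIGHT` (0 leaves the semantic ranking untouched, higher values pull exact matches up).

```typescript
// Hypothetical sketch; the boost formula is an illustrative assumption.
interface Hit {
  text: string;
  distance: number; // lower means more similar
}

function boostByKeywords(hits: Hit[], query: string, weight = 0.6): Hit[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);

  return hits
    .map((hit) => {
      const text = hit.text.toLowerCase();
      const matched = terms.filter((term) => text.includes(term)).length;
      const ratio = terms.length === 0 ? 0 : matched / terms.length;
      // Shrink the distance in proportion to the share of query terms found.
      return { ...hit, distance: hit.distance / (1 + weight * ratio) };
    })
    .sort((a, b) => a.distance - b.distance);
}

const ranked = boostByKeywords(
  [
    { text: "React hooks manage side effects", distance: 0.4 },
    { text: "Call useEffect after render", distance: 0.5 },
  ],
  "useEffect",
);
console.log(ranked.map((hit) => hit.text));
```

In this example the chunk containing the literal term `useEffect` moves ahead of the chunk that is only semantically close, which is the behavior the paragraph above describes.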
119
163
 
120
- [mcp_servers.local-rag.env]
121
- BASE_DIR = "/path/to/your/documents"
122
- DB_PATH = "./lancedb"
123
- CACHE_DIR = "./models"
124
- ```
164
+ <details>
165
+ <summary><strong>Configuration</strong></summary>
125
166
 
126
- **Note:** The section name must be `mcp_servers` (with underscore). Using `mcp-servers` or `mcpservers` will cause Codex to ignore the configuration.
167
+ ### Environment Variables
127
168
 
128
- ### For Cursor
169
+ | Variable | Default | Description |
170
+ |----------|---------|-------------|
171
+ | `BASE_DIR` | Current directory | Document root directory (security boundary) |
172
+ | `DB_PATH` | `./lancedb/` | Vector database location |
173
+ | `CACHE_DIR` | `./models/` | Model cache directory |
174
+ | `MODEL_NAME` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model ID ([available models](https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction)) |
175
+ | `MAX_FILE_SIZE` | `104857600` (100MB) | Maximum file size in bytes |
129
176
 
130
- Add to your Cursor settings:
131
- - **Global** (all projects): `~/.cursor/mcp.json`
132
- - **Project-specific**: `.cursor/mcp.json` in your project root
177
+ ### Client-Specific Setup
178
+
179
+ **Cursor** — Global: `~/.cursor/mcp.json`, Project: `.cursor/mcp.json`
133
180
 
134
181
  ```json
135
182
  {
@@ -138,424 +185,178 @@ Add to your Cursor settings:
138
185
  "command": "npx",
139
186
  "args": ["-y", "mcp-local-rag"],
140
187
  "env": {
141
- "BASE_DIR": "/path/to/your/documents",
142
- "DB_PATH": "./lancedb",
143
- "CACHE_DIR": "./models"
188
+ "BASE_DIR": "/path/to/your/documents"
144
189
  }
145
190
  }
146
191
  }
147
192
  }
148
193
  ```
149
194
 
150
- ### For Claude Code
151
-
152
- Run in your project directory to enable for that project:
153
-
154
- ```bash
155
- cd /path/to/your/project
156
- claude mcp add local-rag --env BASE_DIR=/path/to/your/documents -- npx -y mcp-local-rag
157
- ```
195
+ **Codex** - `~/.codex/config.toml` (note: the section name must be `mcp_servers`, with an underscore)
158
196
 
159
- Or add globally for all projects:
197
+ ```toml
198
+ [mcp_servers.local-rag]
199
+ command = "npx"
200
+ args = ["-y", "mcp-local-rag"]
160
201
 
161
- ```bash
162
- claude mcp add local-rag --scope user --env BASE_DIR=/path/to/your/documents -- npx -y mcp-local-rag
202
+ [mcp_servers.local-rag.env]
203
+ BASE_DIR = "/path/to/your/documents"
163
204
  ```
164
205
 
165
- **With additional environment variables:**
206
+ **Claude Code**:
166
207
 
167
208
  ```bash
168
209
  claude mcp add local-rag --scope user \
169
210
  --env BASE_DIR=/path/to/your/documents \
170
- --env DB_PATH=./lancedb \
171
- --env CACHE_DIR=./models \
172
211
  -- npx -y mcp-local-rag
173
212
  ```
174
213
 
175
- ### Environment Variables
214
+ ### First Run
176
215
 
177
- | Variable | Default | Description | Valid Range |
178
- |----------|---------|-------------|-------------|
179
- | `BASE_DIR` | Current directory | Document root directory. Server only accesses files within this path (prevents accidental system file access). | Any valid path |
180
- | `DB_PATH` | `./lancedb/` | Vector database storage location. Can grow large with many documents. | Any valid path |
181
- | `CACHE_DIR` | `./models/` | Model cache directory. After first download, model stays here for offline use. | Any valid path |
182
- | `MODEL_NAME` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model identifier. Must be Transformers.js compatible. See [available models](https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction&sort=trending). **Note:** Changing models requires deleting your database (`rm -rf ./lancedb/`) and re-ingesting all documents. See FAQ for details. | HF model ID |
183
- | `MAX_FILE_SIZE` | `104857600` (100MB) | Maximum file size in bytes. Larger files rejected to prevent memory issues. | 1MB - 500MB |
184
- | `CHUNK_SIZE` | `512` | Characters per chunk. Larger = more context but slower processing. | 128 - 2048 |
185
- | `CHUNK_OVERLAP` | `100` | Overlap between chunks. Preserves context across boundaries. | 0 - (CHUNK_SIZE/2) |
186
- | `RAG_MAX_DISTANCE` | (not set) | Maximum distance threshold for search results. Results with distance greater than this value are excluded. Lower values mean stricter filtering (e.g., `0.5` for high relevance only). | Positive number |
187
- | `RAG_GROUPING` | (not set) | Grouping mode for quality filtering. `similar` returns only the most similar group (stops at first distance jump). `related` includes related groups (stops at second distance jump). | `similar` or `related` |
188
- | `RAG_HYBRID_WEIGHT` | `0.6` | Balance between keyword (BM25) and semantic search. `0.0` = pure semantic, `1.0` = pure keyword. Default `0.6` prioritizes keyword matches—ideal for code/technical terms. Use `0.3-0.4` for natural language queries. | `0.0` - `1.0` |
216
+ The embedding model (~90MB) downloads on first use. Takes 1-2 minutes, then works offline.
189
217
 
190
- ## Usage
218
+ ### Security
191
219
 
192
- **After configuration**, restart your MCP client:
193
- - **Cursor**: Fully quit and relaunch (Cmd+Q on Mac, not just closing windows)
194
- - **Codex**: Restart the IDE/extension
195
- - **Claude Code**: No restart needed—changes apply immediately
220
+ - **Path restriction**: Only files within `BASE_DIR` are accessible
221
+ - **Local only**: No network requests after model download
222
+ - **Model source**: Official HuggingFace repository ([verify here](https://huggingface.co/Xenova/all-MiniLM-L6-v2))
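For readers curious how a `BASE_DIR` boundary check is typically written in Node.js, here is a minimal sketch. It is not taken from the package source: `isInsideBaseDir` is a hypothetical helper, and it only normalizes paths with the standard `node:path` module (it does not resolve symlinks, which a production check may also need to do).

```typescript
import * as path from "node:path";

// Hypothetical helper; the server's own validation may differ.
function isInsideBaseDir(baseDir: string, candidate: string): boolean {
  const base = path.resolve(baseDir);
  // Relative candidates are interpreted relative to the base directory.
  const target = path.resolve(base, candidate);
  const relative = path.relative(base, target);

  // A path that escapes the base starts with a ".." segment
  // (or is absolute, e.g. on a different Windows drive).
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  return !escapes;
}

console.log(isInsideBaseDir("/docs", "/docs/api/spec.pdf"));
console.log(isInsideBaseDir("/docs", "/docs/../etc/passwd"));
```

Comparing the normalized relative path, rather than checking a string prefix, avoids treating a sibling directory such as `/docsecret` as if it were inside `/docs`.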
196
223
 
197
- The server will appear as available tools that your AI assistant can use.
224
+ </details>
198
225
 
199
- ### Ingesting Documents
226
+ <details>
227
+ <summary><strong>Performance</strong></summary>
200
228
 
201
- **In Cursor**, the Composer Agent automatically uses MCP tools when needed:
229
+ Tested on MacBook Pro M1 (16GB RAM), Node.js 22:
202
230
 
203
- ```
204
- "Ingest the document at /Users/me/docs/api-spec.pdf"
205
- ```
231
+ **Query Speed**: ~1.2 seconds for 10,000 chunks (p90 < 3s)
206
232
 
207
- **In Codex CLI**, the assistant automatically uses configured MCP tools when needed:
233
+ **Ingestion** (10MB PDF):
234
+ - PDF parsing: ~8s
235
+ - Chunking: ~2s
236
+ - Embedding: ~30s
237
+ - DB insertion: ~5s
208
238
 
209
- ```bash
210
- codex "Ingest the document at /Users/me/docs/api-spec.pdf into the RAG system"
211
- ```
239
+ **Memory**: ~200MB idle, ~800MB peak (50MB file ingestion)
212
240
 
213
- **In Claude Code**, just ask naturally:
241
+ **Concurrency**: Handles 5 parallel queries without degradation.
214
242
 
215
- ```
216
- "Ingest the document at /Users/me/docs/api-spec.pdf"
217
- ```
243
+ </details>
218
244
 
219
- **Path Requirements**: The server requires **absolute paths** to files. Your AI assistant will typically convert natural language requests into absolute paths automatically. The `BASE_DIR` setting restricts access to only files within that directory tree for security, but you must still provide the full path.
245
+ <details>
246
+ <summary><strong>Troubleshooting</strong></summary>
220
247
 
221
- The server:
222
- 1. Validates the file exists and is under 100MB
223
- 2. Extracts text (handling PDF/DOCX/TXT/MD formats)
224
- 3. Splits into chunks (512 chars, 100 char overlap)
225
- 4. Generates embeddings for each chunk
226
- 5. Stores in the vector database
248
+ ### "No results found"
227
249
 
228
- This takes roughly 5-10 seconds per MB on a standard laptop. You'll see a confirmation when complete, including how many chunks were created.
250
+ Documents must be ingested first. Run `"List all ingested files"` to verify.
229
251
 
230
- ### Searching Documents
252
+ ### Model download failed
231
253
 
232
- Ask questions in natural language:
254
+ Check your internet connection. If you are behind a proxy, configure your network settings. The model can also be [downloaded manually](https://huggingface.co/Xenova/all-MiniLM-L6-v2).
233
255
 
234
- ```
235
- "What does the API documentation say about authentication?"
236
- "Find information about rate limiting"
237
- "Search for error handling best practices"
238
- ```
256
+ ### "File too large"
239
257
 
240
- The server uses **hybrid search** combining:
241
- 1. **Keyword matching (BM25)**: Finds exact term matches—crucial for code terms like `ProjectLifetimeScope` or error codes
242
- 2. **Semantic search**: Understands meaning, so "authentication" finds "login flow" content
258
+ Default limit is 100MB. Split large files or increase `MAX_FILE_SIZE`.
243
259
 
244
- This hybrid approach means searching for `useEffect` finds documents containing that exact term, not just semantically similar React concepts.
260
+ ### Slow queries
245
261
 
246
- Results include the text content, which file it came from, and a relevance score. Your AI assistant then uses these results to answer your question.
262
+ Check chunk count with `status`. Large documents with many chunks may slow queries. Consider splitting very large files.
247
263
 
248
- You can adjust the number of results:
264
+ ### "Path outside BASE_DIR"
249
265
 
250
- ```
251
- "Search for database optimization tips, return 5 results" # Fewer, more precise
252
- "Search for database optimization tips, return 20 results" # Broader exploration
253
- ```
266
+ Ensure file paths are within `BASE_DIR`. Use absolute paths.
254
267
 
255
- The limit parameter accepts 1-20 results. Recommended: 5 for precision, 10 for balance, 20 for broad exploration.
268
+ ### MCP client doesn't see tools
256
269
 
257
- ### Managing Files
270
+ 1. Verify config file syntax
271
+ 2. Restart client completely (Cmd+Q on Mac for Cursor)
272
+ 3. Test directly: `npx mcp-local-rag` should run without errors
258
273
 
259
- See what's indexed:
274
+ </details>
260
275
 
261
- ```
262
- "List all ingested files"
263
- ```
264
-
265
- This shows each file's path, how many chunks it produced, and when it was ingested.
276
+ <details>
277
+ <summary><strong>FAQ</strong></summary>
266
278
 
267
- Delete a file from the database:
268
-
269
- ```
270
- "Delete /Users/me/docs/old-spec.pdf from the RAG system"
271
- ```
279
+ **Is this really private?**
280
+ Yes. After model download, nothing leaves your machine. Verify with network monitoring.
272
281
 
273
- This permanently removes the file and all its chunks from the vector database. The operation is idempotent—deleting a file that doesn't exist succeeds without error.
282
+ **Can I use this offline?**
283
+ Yes, after the first model download (~90MB).
274
284
 
275
- Check system status:
285
+ **How does this compare to cloud RAG?**
286
+ Cloud services offer better accuracy at scale but require sending data externally. This trades some accuracy for complete privacy and zero runtime cost.
276
287
 
277
- ```
278
- "Show the RAG server status"
279
- ```
288
+ **What file formats are supported?**
289
+ PDF, DOCX, TXT, Markdown. Not yet: Excel, PowerPoint, images, HTML.
280
290
 
281
- This reports total documents, total chunks, current memory usage, and uptime.
291
+ **Can I change the embedding model?**
292
+ Yes, but you must delete your database and re-ingest all documents. Vectors from different models are not comparable, and their dimensions often differ.
282
293
 
283
- ### Re-ingesting Files
294
+ **GPU acceleration?**
295
+ Transformers.js runs on CPU. GPU support is experimental. CPU performance is adequate for most use cases.
284
296
 
285
- If you update a document, ingest it again:
297
+ **Multi-user support?**
298
+ No. Designed for single-user, local access. Multi-user would require authentication/access control.
286
299
 
287
- ```
288
- "Re-ingest api-spec.pdf with the latest changes"
289
- ```
300
+ **How do I back up?**
301
+ Copy the `DB_PATH` directory (default: `./lancedb/`).
290
302
 
291
- The server automatically deletes old chunks for that file before adding new ones. No duplicates, no stale data.
303
+ </details>
292
304
 
293
- ## Development
305
+ <details>
306
+ <summary><strong>Development</strong></summary>
294
307
 
295
308
  ### Building from Source
296
309
 
297
310
  ```bash
298
311
  git clone https://github.com/shinpr/mcp-local-rag.git
299
312
  cd mcp-local-rag
300
- npm install
313
+ pnpm install
301
314
  ```
302
315
 
303
- ### Running Tests
316
+ ### Testing
304
317
 
305
318
  ```bash
306
- # Run all tests
307
- npm test
308
-
309
- # Run with coverage
310
- npm run test:coverage
311
-
312
- # Watch mode for development
313
- npm run test:watch
319
+ pnpm test # Run all tests
320
+ pnpm run test:watch # Watch mode
314
321
  ```
315
322
 
316
- The test suite includes:
317
- - Unit tests for each component
318
- - Integration tests for the full ingestion and search flow
319
- - Security tests for path traversal protection
320
- - Performance tests verifying query speed targets
321
-
322
323
  ### Code Quality
323
324
 
324
325
  ```bash
325
- # Type check
326
- npm run type-check
327
-
328
- # Lint and format
329
- npm run check:fix
330
-
331
- # Check circular dependencies
332
- npm run check:deps
333
-
334
- # Full quality check (runs everything)
335
- npm run check:all
326
+ pnpm run type-check # TypeScript check
327
+ pnpm run check:fix # Lint and format
328
+ pnpm run check:deps # Circular dependency check
329
+ pnpm run check:all # Full quality check
336
330
  ```
337
331
 
338
332
  ### Project Structure
339
333
 
340
334
  ```
341
335
  src/
342
- index.ts # Entry point, starts the MCP server
343
- server/ # RAGServer class, MCP tool handlers
344
- parser/ # Document parsing (PDF, DOCX, TXT, MD)
345
- chunker/ # Text splitting logic
346
- embedder/ # Embedding generation with Transformers.js
347
- vectordb/ # LanceDB operations
348
- __tests__/ # Test suites
336
+ index.ts # Entry point
337
+ server/ # MCP tool handlers
338
+ parser/ # PDF, DOCX, TXT, MD parsing
339
+ chunker/ # Text splitting
340
+ embedder/ # Transformers.js embeddings
341
+ vectordb/ # LanceDB operations
342
+ __tests__/ # Test suites
349
343
  ```
350
344
 
351
- Each module has clear boundaries:
352
- - **Parser** validates file paths and extracts text
353
- - **Chunker** splits text into overlapping segments
354
- - **Embedder** generates 384-dimensional vectors
355
- - **VectorStore** handles all database operations
356
- - **RAGServer** orchestrates everything and exposes MCP tools
357
-
358
- ## Performance
359
-
360
- **Test Environment**: MacBook Pro M1 (16GB RAM), tested with v0.1.3 on Node.js 22 (January 2025)
361
-
362
- **Query Performance**:
363
- - Average: 1.2 seconds for 10,000 indexed chunks (5 results)
364
- - Target: p90 < 3 seconds ✓
365
-
366
- **Ingestion Speed** (10MB PDF):
367
- - Total: ~45 seconds
368
- - PDF parsing: ~8 seconds (17%)
369
- - Text chunking: ~2 seconds (4%)
370
- - Embedding generation: ~30 seconds (67%)
371
- - Database insertion: ~5 seconds (11%)
372
-
373
- **Memory Usage**:
374
- - Baseline: ~200MB idle
375
- - Peak: ~800MB when ingesting 50MB file
376
- - Target: < 1GB ✓
377
-
378
- **Concurrent Queries**: Handles 5 parallel queries without degradation. LanceDB's async API allows non-blocking operations.
379
-
380
- **Note**: Your results will vary based on hardware, especially CPU speed (embeddings run on CPU, not GPU).
381
-
382
- ## Troubleshooting
383
-
384
- ### "No results found" when searching
385
-
386
- **Cause**: Documents must be ingested before searching.
387
-
388
- **Solution**:
389
- 1. First ingest documents: `"Ingest /path/to/document.pdf"`
390
- 2. Verify ingestion: `"List all ingested files"`
391
- 3. Then search: `"Search for [your query]"`
392
-
393
- **Common mistake**: Trying to search immediately after configuration without ingesting any documents.
394
-
395
- ### "Model download failed"
396
-
397
- The embedding model downloads from HuggingFace on first use (when you ingest or search for the first time). If you're behind a proxy or firewall, you might need to configure network settings.
398
-
399
- **When it happens**: Your first ingest or search operation will trigger the download. If it fails, you'll see a detailed error message with troubleshooting guidance (network issues, disk space, cache corruption).
400
-
401
- **What to do**: The error message provides specific recommendations. Common solutions:
402
- 1. Check your internet connection and retry the operation
403
- 2. Ensure you have sufficient disk space (~120MB needed)
404
- 3. If problems persist, delete the cache directory and try again
405
-
406
- Alternatively, download the model manually:
407
- 1. Visit https://huggingface.co/Xenova/all-MiniLM-L6-v2
408
- 2. Download the model files
409
- 3. Set CACHE_DIR to where you saved them
410
-
411
- ### "File too large" error
412
-
413
- Default limit is 100MB. For larger files:
414
- - Split them into smaller documents
415
- - Or increase MAX_FILE_SIZE in your config (be aware of memory usage)
416
-
417
- ### Slow query performance
418
-
419
- If queries take longer than expected:
420
- - Check how many chunks you have indexed (`status` command)
421
- - Consider the hardware (embeddings are CPU-intensive)
422
- - Try reducing CHUNK_SIZE to create fewer chunks
423
-
424
- ### "Path outside BASE_DIR" error
425
-
426
- The server restricts file access to BASE_DIR for security. Make sure your file path is within that directory. Check for:
427
- - Correct BASE_DIR setting in your MCP config
428
- - Relative paths vs absolute paths
429
- - Typos in the file path
430
-
431
- ### MCP client doesn't see the tools
432
-
433
- **For Cursor:**
434
- 1. Open Settings → Features → Model Context Protocol
435
- 2. Verify the server configuration is saved
436
- 3. Restart Cursor completely
437
- 4. Check the MCP connection status in the status bar
438
-
439
- **For Codex CLI:**
440
- 1. Check `~/.codex/config.toml` to verify the configuration
441
- 2. Ensure the section name is `[mcp_servers.local-rag]` (with underscore)
442
- 3. Test the server directly: `npx mcp-local-rag` should run without errors
443
- 4. Restart Codex CLI or IDE extension
444
- 5. Check for error messages when Codex starts
445
-
446
- **For Claude Code:**
447
- 1. Run `claude mcp list` to see configured servers
448
- 2. Verify the server appears in the list
449
- 3. Check `~/.config/claude/mcp_config.json` for syntax errors
450
- 4. Test the server directly: `npx mcp-local-rag` should run without errors
451
-
452
- **Common issues:**
453
- - Invalid JSON syntax in config files
454
- - Wrong file paths in BASE_DIR setting
455
- - Server binary not found (try global install: `npm install -g mcp-local-rag`)
456
- - Firewall blocking local communication
457
-
458
- ## How It Works
459
-
460
- When you ingest a document, the parser extracts text based on the file type. PDFs use `pdf-parse`, DOCX uses `mammoth`, and text files are read directly.
461
-
462
- The chunker then splits the text using LangChain's RecursiveCharacterTextSplitter. It tries to break on natural boundaries (paragraphs, sentences) while keeping chunks around 512 characters. Adjacent chunks overlap by 100 characters to preserve context.
-
- Each chunk goes through the Transformers.js embedding model, which converts text into a 384-dimensional vector representing its semantic meaning. This happens in batches of 8 chunks at a time for efficiency.
-
- Vectors are stored in LanceDB, a columnar vector database that works with local files. No server process, no complex setup. It's just a directory with data files.
-
- When you search, the server performs **hybrid search**:
- 1. Your query becomes a vector using the same embedding model
- 2. LanceDB performs both keyword search (BM25) and vector similarity search
- 3. Results are combined using a weighted linear combination (default: 60% keyword, 40% semantic)
- 4. The top matches return to your MCP client with their original text and metadata
-
- This hybrid approach gets the best of both worlds: exact keyword matches (essential for code terms, error codes, function names) and semantic understanding (so "authentication" finds "login flow" content). The default weight prioritizes keyword matches, which works well for developer documentation where exact terms matter.
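The weighted linear combination in step 3 can be sketched as follows. This is an illustration only, under the assumption that keyword and vector scores have already been normalized to the range 0 to 1; how LanceDB or the package actually normalizes and combines scores is not stated in the text. The `ScoredChunk` type and `hybridRank` function are hypothetical names.

```typescript
// Hypothetical shape: one candidate chunk with two pre-normalized scores in [0, 1].
interface ScoredChunk {
  id: string;
  keywordScore: number; // e.g. a normalized BM25 score (assumption)
  vectorScore: number;  // e.g. a normalized similarity score (assumption)
}

// Combine both scores linearly and sort best-first.
// keywordWeight = 0.6 mirrors the "60% keyword, 40% semantic" default above.
function hybridRank(
  chunks: ScoredChunk[],
  keywordWeight = 0.6,
): Array<ScoredChunk & { score: number }> {
  if (keywordWeight < 0 || keywordWeight > 1) {
    throw new Error("keywordWeight must be between 0 and 1");
  }
  return chunks
    .map((c) => ({
      ...c,
      score: keywordWeight * c.keywordScore + (1 - keywordWeight) * c.vectorScore,
    }))
    .sort((a, b) => b.score - a.score);
}
```

With the default weight, a chunk that matches the query term exactly but is only loosely related semantically can still outrank a chunk that is semantically close but lacks the term, which is the behavior the paragraph above describes for developer documentation.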
-
- ## FAQ
-
- **Is this really private?**
-
- Yes. After the initial model download, nothing leaves your machine. You can verify this with network monitoring tools: there are no outbound requests during ingestion or search.
-
- **Can I use this offline?**
-
- Yes, once the model is cached. The first run needs internet to download the model (~90MB), but after that, everything works offline.
-
- **How does this compare to cloud RAG services?**
-
- Cloud services (OpenAI, Pinecone, etc.) typically offer better accuracy and scale. But they require sending your documents externally, ongoing costs, and internet connectivity. This project trades some accuracy for complete privacy and zero runtime cost.
-
- **What file formats are supported?**
-
- Currently supported:
- - **PDF**: `.pdf` (uses pdf-parse)
- - **Microsoft Word**: `.docx` (uses mammoth, not `.doc`)
- - **Plain Text**: `.txt`
- - **Markdown**: `.md`, `.markdown`
-
- **Not yet supported**:
- - Excel/CSV (`.xlsx`, `.csv`)
- - PowerPoint (`.pptx`)
- - Images with OCR (`.jpg`, `.png`)
- - HTML (`.html`)
- - Old Word documents (`.doc`)
-
- Want support for another format? [Open an issue](https://github.com/shinpr/mcp-local-rag/issues/new) with your use case.
-
- **Can I customize the embedding model?**
-
- Yes, set MODEL_NAME to any Transformers.js-compatible model from HuggingFace.
-
- However, switching models isn't as simple as changing a config value. Each model produces vectors of a specific dimension: `all-MiniLM-L6-v2` and `multilingual-e5-small` both output 384 dimensions, while `embeddinggemma-300m` outputs 768. Vectors from different models are fundamentally incompatible with each other.
-
- When you change models, you must:
-
- 1. **Delete your existing database**: `rm -rf ./lancedb/` (or your custom DB_PATH)
- 2. **Re-ingest all your documents** with the new model
-
- Simply re-ingesting without deleting won't work. LanceDB locks the vector dimension when you first insert data, and that schema persists even if you delete all documents. If you try to insert 768-dimensional vectors into a database that was created with 384-dimensional vectors, you'll get a dimension mismatch error.
-
- The good news: if you do forget to delete the database, LanceDB will give you a clear error message like "Query vector size 768 does not match index column size 384", so you'll know exactly what went wrong.
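The dimension mismatch described above can be caught before any write with a small guard. This is a sketch under assumptions, not part of the package or of LanceDB: `MODEL_DIMENSIONS` and `checkDimension` are hypothetical names, and the dimension values are the ones quoted in the FAQ answer.

```typescript
// Dimensions as quoted in the FAQ text; the table itself is hypothetical.
const MODEL_DIMENSIONS: Record<string, number> = {
  "all-MiniLM-L6-v2": 384,
  "multilingual-e5-small": 384,
  "embeddinggemma-300m": 768,
};

// Hypothetical guard: fail early if a vector does not match the dimension
// the existing table was created with.
function checkDimension(vector: number[], tableDimension: number): void {
  if (vector.length !== tableDimension) {
    throw new Error(
      `Vector has ${vector.length} dimensions but the table was created with ${tableDimension}`,
    );
  }
}
```

A matching dimension does not make two models interchangeable: `all-MiniLM-L6-v2` and `multilingual-e5-small` both produce 384-dimensional vectors, yet the FAQ still calls for deleting the database and re-ingesting when switching between them.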
-
- **How much does accuracy depend on the model?**
-
- `all-MiniLM-L6-v2` is optimized for English and performs well for technical documentation. For other languages, consider multilingual models like `multilingual-e5-small`. For higher accuracy, try larger models, but expect slower processing.
-
- **What about GPU acceleration?**
-
- Transformers.js runs on CPU by default. GPU support is experimental and varies by platform. For most use cases, CPU performance is adequate (embeddings are reasonably fast even without GPU).
-
- **Can multiple people share a database?**
-
- The current design assumes single-user, local access. For multi-user scenarios, you'd need to implement authentication and access control, both out of scope for this project's privacy-first design.
-
- **How do I back up my data?**
-
- Copy your DB_PATH directory (default: `./lancedb/`). That's your entire vector database. Copy BASE_DIR for your original documents. Both are just files, so no special export is needed.
+ </details>
 
  ## Contributing
 
- Contributions are welcome. Before submitting a PR:
+ Contributions welcome. Before submitting a PR:
 
- 1. Run the test suite: `npm test`
- 2. Ensure code quality: `npm run check:all`
+ 1. Run tests: `pnpm test`
+ 2. Check quality: `pnpm run check:all`
  3. Add tests for new features
- 4. Update documentation if you change behavior
+ 4. Update docs if behavior changes
 
  ## License
 
- MIT License - see LICENSE file for details.
-
- Free for personal and commercial use. No attribution required, but appreciated.
+ MIT License. Free for personal and commercial use.
 
  ## Acknowledgments
 
- Built with:
- - [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic
- - [LanceDB](https://lancedb.com/) for vector storage
- - [Transformers.js](https://huggingface.co/docs/transformers.js) by HuggingFace
- - [LangChain.js](https://js.langchain.com/) for text splitting
-
- Created as a practical tool for developers who want AI-powered document search without compromising privacy.
+ Built with [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic, [LanceDB](https://lancedb.com/), and [Transformers.js](https://huggingface.co/docs/transformers.js).