mcp-local-rag 0.3.0 → 0.4.1
- package/README.md +194 -393
- package/dist/chunker/index.d.ts +1 -32
- package/dist/chunker/index.d.ts.map +1 -1
- package/dist/chunker/index.js +3 -72
- package/dist/chunker/index.js.map +1 -1
- package/dist/chunker/semantic-chunker.d.ts +81 -0
- package/dist/chunker/semantic-chunker.d.ts.map +1 -0
- package/dist/chunker/semantic-chunker.js +248 -0
- package/dist/chunker/semantic-chunker.js.map +1 -0
- package/dist/chunker/sentence-splitter.d.ts +16 -0
- package/dist/chunker/sentence-splitter.d.ts.map +1 -0
- package/dist/chunker/sentence-splitter.js +114 -0
- package/dist/chunker/sentence-splitter.js.map +1 -0
- package/dist/embedder/index.d.ts.map +1 -1
- package/dist/embedder/index.js.map +1 -1
- package/dist/index.js +0 -2
- package/dist/index.js.map +1 -1
- package/dist/parser/index.d.ts +1 -1
- package/dist/parser/index.d.ts.map +1 -1
- package/dist/parser/index.js +8 -6
- package/dist/parser/index.js.map +1 -1
- package/dist/server/index.d.ts +0 -4
- package/dist/server/index.d.ts.map +1 -1
- package/dist/server/index.js +7 -11
- package/dist/server/index.js.map +1 -1
- package/dist/vectordb/index.d.ts +21 -6
- package/dist/vectordb/index.d.ts.map +1 -1
- package/dist/vectordb/index.js +113 -107
- package/dist/vectordb/index.js.map +1 -1
- package/package.json +5 -13
package/README.md
CHANGED
|
@@ -1,14 +1,35 @@
|
|
|
1
1
|
# MCP Local RAG
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
[npm package](https://www.npmjs.com/package/mcp-local-rag)
|
|
4
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
4
5
|
|
|
5
|
-
|
|
6
|
+
Local RAG for developers using MCP.
|
|
7
|
+
Semantic search with keyword boost for exact technical terms — fully private, zero setup.
|
|
8
|
+
|
|
9
|
+
## Features
|
|
10
|
+
|
|
11
|
+
- **Semantic search with keyword boost**
|
|
12
|
+
Vector search runs first, then keyword matching boosts exact matches. Terms like `useEffect`, error codes, and class names rank higher when they match exactly, rather than relying on semantic similarity alone.
|
|
13
|
+
|
|
14
|
+
- **Smart semantic chunking**
|
|
15
|
+
Chunks documents by meaning, not character count. Uses embedding similarity to find natural topic boundaries—keeping related content together and splitting where topics change.
|
|
16
|
+
|
|
17
|
+
- **Quality-first result filtering**
|
|
18
|
+
Groups results by relevance gaps instead of arbitrary top-K cutoffs. Get fewer but more trustworthy chunks.
|
|
19
|
+
|
|
20
|
+
- **Runs entirely locally**
|
|
21
|
+
No API keys, no cloud, no data leaving your machine. Works fully offline after the first model download.
|
|
22
|
+
|
|
23
|
+
- **Zero-friction setup**
|
|
24
|
+
One `npx` command. No Docker, no Python, no servers to manage. Designed for Cursor, Codex, and Claude Code via MCP.
|
|
6
25
|
|
|
7
26
|
## Quick Start
|
|
8
27
|
|
|
9
|
-
|
|
28
|
+
Set `BASE_DIR` to the folder you want to search. Documents must live under it.
|
|
29
|
+
|
|
30
|
+
Add the MCP server to your AI coding tool:
|
|
10
31
|
|
|
11
|
-
**For Cursor**
|
|
32
|
+
**For Cursor** — Add to `~/.cursor/mcp.json`:
|
|
12
33
|
```json
|
|
13
34
|
{
|
|
14
35
|
"mcpServers": {
|
|
@@ -23,7 +44,7 @@ Add the MCP server to your AI coding tool. Choose your tool below:
|
|
|
23
44
|
}
|
|
24
45
|
```
|
|
25
46
|
|
|
26
|
-
**For Codex**
|
|
47
|
+
**For Codex** — Add to `~/.codex/config.toml`:
|
|
27
48
|
```toml
|
|
28
49
|
[mcp_servers.local-rag]
|
|
29
50
|
command = "npx"
|
|
@@ -33,103 +54,129 @@ args = ["-y", "mcp-local-rag"]
|
|
|
33
54
|
BASE_DIR = "/path/to/your/documents"
|
|
34
55
|
```
|
|
35
56
|
|
|
36
|
-
**For Claude Code**
|
|
57
|
+
**For Claude Code** — Run this command:
|
|
37
58
|
```bash
|
|
38
59
|
claude mcp add local-rag --scope user --env BASE_DIR=/path/to/your/documents -- npx -y mcp-local-rag
|
|
39
60
|
```
|
|
40
61
|
|
|
41
|
-
Restart your tool, then start using:
|
|
62
|
+
Restart your tool, then start using it:
|
|
63
|
+
|
|
42
64
|
```
|
|
43
|
-
"Ingest api-spec.pdf"
|
|
44
|
-
|
|
65
|
+
You: "Ingest api-spec.pdf"
|
|
66
|
+
Assistant: Successfully ingested api-spec.pdf (47 chunks created)
|
|
67
|
+
|
|
68
|
+
You: "What does the API documentation say about authentication?"
|
|
69
|
+
Assistant: Based on the documentation, authentication uses OAuth 2.0 with JWT tokens.
|
|
70
|
+
The flow is described in section 3.2...
|
|
45
71
|
```
|
|
46
72
|
|
|
47
73
|
That's it. No installation, no Docker, no complex setup.
|
|
48
74
|
|
|
49
75
|
## Why This Exists
|
|
50
76
|
|
|
51
|
-
You want
|
|
77
|
+
You want AI to search your documents—technical specs, research papers, internal docs. But most solutions send your files to external APIs.
|
|
52
78
|
|
|
53
|
-
This
|
|
79
|
+
**Privacy.** Your documents might contain sensitive data. This runs entirely locally.
|
|
54
80
|
|
|
55
|
-
**
|
|
81
|
+
**Cost.** External embedding APIs charge per use. This is free after the initial model download.
|
|
56
82
|
|
|
57
|
-
**
|
|
83
|
+
**Offline.** Works without internet after setup.
|
|
58
84
|
|
|
59
|
-
**
|
|
85
|
+
**Code search.** Pure semantic search misses exact terms like `useEffect` or `ERR_CONNECTION_REFUSED`. Keyword boost catches both meaning and exact matches.
|
|
60
86
|
|
|
61
|
-
|
|
87
|
+
## Usage
|
|
62
88
|
|
|
63
|
-
|
|
89
|
+
The server provides 5 MCP tools: ingest, search, list, delete, status
|
|
90
|
+
(`ingest_file`, `query_documents`, `list_files`, `delete_file`, `status`).
|
|
64
91
|
|
|
65
|
-
|
|
92
|
+
### Ingesting Documents
|
|
66
93
|
|
|
67
|
-
|
|
94
|
+
```
|
|
95
|
+
"Ingest the document at /Users/me/docs/api-spec.pdf"
|
|
96
|
+
```
|
|
68
97
|
|
|
69
|
-
|
|
98
|
+
Supports PDF, DOCX, TXT, and Markdown. The server extracts text, splits it into chunks, generates embeddings locally, and stores everything in a local vector database.
|
|
70
99
|
|
|
71
|
-
|
|
100
|
+
Re-ingesting the same file replaces the old version automatically.
|
|
72
101
|
|
|
73
|
-
|
|
102
|
+
### Searching Documents
|
|
74
103
|
|
|
75
|
-
|
|
104
|
+
```
|
|
105
|
+
"What does the API documentation say about authentication?"
|
|
106
|
+
"Find information about rate limiting"
|
|
107
|
+
"Search for error handling best practices"
|
|
108
|
+
```
|
|
76
109
|
|
|
77
|
-
|
|
78
|
-
- **LanceDB** for vector storage (file-based, no server needed)
|
|
79
|
-
- **Transformers.js** for embeddings (runs in Node.js, no Python)
|
|
80
|
-
- **all-MiniLM-L6-v2** model (384 dimensions, good balance of speed and accuracy)
|
|
81
|
-
- **RecursiveCharacterTextSplitter** for intelligent text chunking
|
|
110
|
+
Search uses semantic similarity with keyword boost. This means `useEffect` finds documents containing that exact term, not just semantically similar React concepts.
|
|
82
111
|
|
|
83
|
-
|
|
112
|
+
Results include text content, source file, and relevance score. Adjust result count with `limit` (1-20, default 10).
|
|
84
113
|
|
|
85
|
-
|
|
114
|
+
### Managing Files
|
|
86
115
|
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
-
|
|
90
|
-
|
|
91
|
-
|
|
116
|
+
```
|
|
117
|
+
"List all ingested files" # See what's indexed
|
|
118
|
+
"Delete old-spec.pdf from RAG" # Remove a file
|
|
119
|
+
"Show RAG server status" # Check system health
|
|
120
|
+
```
|
|
92
121
|
|
|
93
|
-
|
|
122
|
+
## Search Tuning
|
|
94
123
|
|
|
95
|
-
|
|
124
|
+
Adjust these for your use case:
|
|
96
125
|
|
|
97
|
-
|
|
126
|
+
| Variable | Default | Description |
|
|
127
|
+
|----------|---------|-------------|
|
|
128
|
+
| `RAG_HYBRID_WEIGHT` | `0.6` | Keyword boost factor. 0 = semantic only, higher = stronger keyword boost. |
|
|
129
|
+
| `RAG_GROUPING` | (not set) | `similar` for top group only, `related` for top 2 groups. |
|
|
130
|
+
| `RAG_MAX_DISTANCE` | (not set) | Filter out low-relevance results (e.g., `0.5`). |
|
|
98
131
|
|
|
99
|
-
|
|
132
|
+
Example (stricter, code-focused):
|
|
133
|
+
```json
|
|
134
|
+
"env": {
|
|
135
|
+
"RAG_HYBRID_WEIGHT": "0.7",
|
|
136
|
+
"RAG_GROUPING": "similar"
|
|
137
|
+
}
|
|
138
|
+
```
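
To make the `RAG_GROUPING` modes more concrete, here is a minimal sketch of gap-based filtering. It is not the package's implementation: the function name `filterByRelevanceGaps`, the fixed `minGap` of `0.1`, and the sample distances are all assumptions chosen for illustration. The idea is that results are sorted by distance, a "jump" is any gap between neighbours larger than `minGap`, and `similar` keeps the first group while `related` keeps the first two.

```typescript
// Hypothetical sketch of relevance-gap grouping (names and minGap are assumptions).
type GroupingMode = "similar" | "related";

function filterByRelevanceGaps(
  distances: number[],
  mode: GroupingMode,
  minGap = 0.1,
): number[] {
  // Lower distance means a closer match, so sort ascending.
  const sorted = [...distances].sort((a, b) => a - b);
  const groupsToKeep = mode === "similar" ? 1 : 2;
  const kept: number[] = [];
  let jumpsSeen = 0;

  for (let i = 0; i < sorted.length; i++) {
    // A gap larger than minGap marks the start of a new relevance group.
    if (i > 0 && sorted[i] - sorted[i - 1] > minGap) {
      jumpsSeen++;
      if (jumpsSeen >= groupsToKeep) break;
    }
    kept.push(sorted[i]);
  }
  return kept;
}

const demoDistances = [0.21, 0.24, 0.26, 0.48, 0.5, 0.9];
const similarOnly = filterByRelevanceGaps(demoDistances, "similar");
const withRelated = filterByRelevanceGaps(demoDistances, "related");
```

With these sample values, `similar` stops at the jump between `0.26` and `0.48`, while `related` continues until the jump before `0.9`.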
|
|
100
139
|
|
|
101
|
-
|
|
140
|
+
## How It Works
|
|
102
141
|
|
|
103
|
-
**
|
|
142
|
+
**TL;DR:**
|
|
143
|
+
- Documents are chunked by semantic similarity, not fixed character counts
|
|
144
|
+
- Each chunk is embedded locally using Transformers.js
|
|
145
|
+
- Search uses semantic similarity with keyword boost for exact matches
|
|
146
|
+
- Results are filtered based on relevance gaps, not raw scores
|
|
104
147
|
|
|
105
|
-
|
|
148
|
+
### Details
|
|
106
149
|
|
|
107
|
-
|
|
150
|
+
When you ingest a document, the parser extracts text based on file type (PDF via `unpdf`, DOCX via `mammoth`, text files directly).
|
|
108
151
|
|
|
109
|
-
The
|
|
152
|
+
The semantic chunker splits text into sentences, then groups them using embedding similarity. It finds natural topic boundaries where the meaning shifts—keeping related content together instead of cutting at arbitrary character limits. This produces chunks that are coherent units of meaning, typically 500-1000 characters.
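
As a rough illustration of the idea (not the package's actual code), the sketch below keeps adding sentences to the current chunk while the cosine similarity between neighbouring sentence embeddings stays above a threshold, and starts a new chunk when it drops. The names `cosineSimilarity` and `groupSentences`, the `0.5` threshold, and the toy two-dimensional vectors are assumptions made for the example; the real chunker works on model embeddings and may pick boundaries differently.

```typescript
// Hypothetical sketch: start a new chunk where adjacent sentence embeddings diverge.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero-length vectors as unrelated instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function groupSentences(
  sentences: string[],
  embeddings: number[][],
  threshold = 0.5, // assumed cut-off, not a documented default
): string[] {
  if (sentences.length === 0) return [];
  const chunks: string[] = [];
  let current: string[] = [sentences[0]];

  for (let i = 1; i < sentences.length; i++) {
    const similarity = cosineSimilarity(embeddings[i - 1], embeddings[i]);
    if (similarity < threshold) {
      // Topic shift: close the current chunk and start a new one.
      chunks.push(current.join(" "));
      current = [];
    }
    current.push(sentences[i]);
  }
  chunks.push(current.join(" "));
  return chunks;
}

// Toy vectors stand in for real sentence embeddings.
const demoSentences = [
  "Tokens expire hourly.",
  "Refresh them via OAuth.",
  "Rate limits reset daily.",
];
const demoEmbeddings = [[1, 0], [0.9, 0.1], [0, 1]];
const demoChunks = groupSentences(demoSentences, demoEmbeddings);
```

Here the first two sentences point in nearly the same direction and stay together, while the third is almost orthogonal and becomes its own chunk.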
|
|
110
153
|
|
|
111
|
-
|
|
154
|
+
Each chunk goes through the Transformers.js embedding model (`all-MiniLM-L6-v2`), converting text into 384-dimensional vectors. Vectors are stored in LanceDB, a file-based vector database requiring no server process.
|
|
112
155
|
|
|
113
|
-
|
|
156
|
+
When you search:
|
|
157
|
+
1. Your query becomes a vector using the same model
|
|
158
|
+
2. Semantic (vector) search finds the most relevant chunks
|
|
159
|
+
3. Quality filters apply (distance threshold, grouping)
|
|
160
|
+
4. Keyword matches boost rankings for exact term matching
|
|
114
161
|
|
|
115
|
-
|
|
116
|
-
[mcp_servers.local-rag]
|
|
117
|
-
command = "npx"
|
|
118
|
-
args = ["-y", "mcp-local-rag"]
|
|
162
|
+
The keyword boost ensures exact terms like `useEffect` or error codes rank higher when they match.
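
One way to picture the boost step is the sketch below. It is an assumption-heavy illustration rather than the package's scoring code: the `SearchHit` shape, the `applyKeywordBoost` name, and the formula `distance / (1 + weight * matchRatio)` are invented for this example. It only mirrors the documented behaviour that a weight of `0` leaves semantic ranking untouched and larger weights favour exact term matches.

```typescript
// Hypothetical sketch of a keyword boost applied after vector search.
interface SearchHit {
  text: string;
  distance: number; // lower means closer to the query
}

function applyKeywordBoost(
  hits: SearchHit[],
  query: string,
  weight = 0.6, // mirrors the documented RAG_HYBRID_WEIGHT default
): SearchHit[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);

  // With no terms or a zero weight, keep the pure semantic ordering.
  if (terms.length === 0 || weight === 0) {
    return [...hits].sort((a, b) => a.distance - b.distance);
  }

  return hits
    .map((hit) => {
      const haystack = hit.text.toLowerCase();
      const matched = terms.filter((term) => haystack.includes(term)).length;
      const matchRatio = matched / terms.length;
      // Assumed formula: shrink the distance in proportion to matched terms.
      return { ...hit, distance: hit.distance / (1 + weight * matchRatio) };
    })
    .sort((a, b) => a.distance - b.distance);
}

const boosted = applyKeywordBoost(
  [
    { text: "React hooks manage side effects in components.", distance: 0.4 },
    { text: "Call useEffect after the component renders.", distance: 0.45 },
  ],
  "useEffect",
);
```

In this toy run the second hit starts slightly behind on semantic distance but contains the exact term, so it moves to the top after the boost.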
|
|
119
163
|
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
DB_PATH = "./lancedb"
|
|
123
|
-
CACHE_DIR = "./models"
|
|
124
|
-
```
|
|
164
|
+
<details>
|
|
165
|
+
<summary><strong>Configuration</strong></summary>
|
|
125
166
|
|
|
126
|
-
|
|
167
|
+
### Environment Variables
|
|
127
168
|
|
|
128
|
-
|
|
169
|
+
| Variable | Default | Description |
|
|
170
|
+
|----------|---------|-------------|
|
|
171
|
+
| `BASE_DIR` | Current directory | Document root directory (security boundary) |
|
|
172
|
+
| `DB_PATH` | `./lancedb/` | Vector database location |
|
|
173
|
+
| `CACHE_DIR` | `./models/` | Model cache directory |
|
|
174
|
+
| `MODEL_NAME` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model ID ([available models](https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction)) |
|
|
175
|
+
| `MAX_FILE_SIZE` | `104857600` (100MB) | Maximum file size in bytes |
|
|
129
176
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
177
|
+
### Client-Specific Setup
|
|
178
|
+
|
|
179
|
+
**Cursor** — Global: `~/.cursor/mcp.json`, Project: `.cursor/mcp.json`
|
|
133
180
|
|
|
134
181
|
```json
|
|
135
182
|
{
|
|
@@ -138,424 +185,178 @@ Add to your Cursor settings:
|
|
|
138
185
|
"command": "npx",
|
|
139
186
|
"args": ["-y", "mcp-local-rag"],
|
|
140
187
|
"env": {
|
|
141
|
-
"BASE_DIR": "/path/to/your/documents"
|
|
142
|
-
"DB_PATH": "./lancedb",
|
|
143
|
-
"CACHE_DIR": "./models"
|
|
188
|
+
"BASE_DIR": "/path/to/your/documents"
|
|
144
189
|
}
|
|
145
190
|
}
|
|
146
191
|
}
|
|
147
192
|
}
|
|
148
193
|
```
|
|
149
194
|
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
Run in your project directory to enable for that project:
|
|
153
|
-
|
|
154
|
-
```bash
|
|
155
|
-
cd /path/to/your/project
|
|
156
|
-
claude mcp add local-rag --env BASE_DIR=/path/to/your/documents -- npx -y mcp-local-rag
|
|
157
|
-
```
|
|
195
|
+
**Codex** — `~/.codex/config.toml` (note: must use `mcp_servers` with underscore)
|
|
158
196
|
|
|
159
|
-
|
|
197
|
+
```toml
|
|
198
|
+
[mcp_servers.local-rag]
|
|
199
|
+
command = "npx"
|
|
200
|
+
args = ["-y", "mcp-local-rag"]
|
|
160
201
|
|
|
161
|
-
|
|
162
|
-
|
|
202
|
+
[mcp_servers.local-rag.env]
|
|
203
|
+
BASE_DIR = "/path/to/your/documents"
|
|
163
204
|
```
|
|
164
205
|
|
|
165
|
-
**
|
|
206
|
+
**Claude Code**:
|
|
166
207
|
|
|
167
208
|
```bash
|
|
168
209
|
claude mcp add local-rag --scope user \
|
|
169
210
|
--env BASE_DIR=/path/to/your/documents \
|
|
170
|
-
--env DB_PATH=./lancedb \
|
|
171
|
-
--env CACHE_DIR=./models \
|
|
172
211
|
-- npx -y mcp-local-rag
|
|
173
212
|
```
|
|
174
213
|
|
|
175
|
-
###
|
|
214
|
+
### First Run
|
|
176
215
|
|
|
177
|
-
|
|
178
|
-
|----------|---------|-------------|-------------|
|
|
179
|
-
| `BASE_DIR` | Current directory | Document root directory. Server only accesses files within this path (prevents accidental system file access). | Any valid path |
|
|
180
|
-
| `DB_PATH` | `./lancedb/` | Vector database storage location. Can grow large with many documents. | Any valid path |
|
|
181
|
-
| `CACHE_DIR` | `./models/` | Model cache directory. After first download, model stays here for offline use. | Any valid path |
|
|
182
|
-
| `MODEL_NAME` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model identifier. Must be Transformers.js compatible. See [available models](https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction&sort=trending). **Note:** Changing models requires deleting your database (`rm -rf ./lancedb/`) and re-ingesting all documents. See FAQ for details. | HF model ID |
|
|
183
|
-
| `MAX_FILE_SIZE` | `104857600` (100MB) | Maximum file size in bytes. Larger files rejected to prevent memory issues. | 1MB - 500MB |
|
|
184
|
-
| `CHUNK_SIZE` | `512` | Characters per chunk. Larger = more context but slower processing. | 128 - 2048 |
|
|
185
|
-
| `CHUNK_OVERLAP` | `100` | Overlap between chunks. Preserves context across boundaries. | 0 - (CHUNK_SIZE/2) |
|
|
186
|
-
| `RAG_MAX_DISTANCE` | (not set) | Maximum distance threshold for search results. Results with distance greater than this value are excluded. Lower values mean stricter filtering (e.g., `0.5` for high relevance only). | Positive number |
|
|
187
|
-
| `RAG_GROUPING` | (not set) | Grouping mode for quality filtering. `similar` returns only the most similar group (stops at first distance jump). `related` includes related groups (stops at second distance jump). | `similar` or `related` |
|
|
188
|
-
| `RAG_HYBRID_WEIGHT` | `0.6` | Balance between keyword (BM25) and semantic search. `0.0` = pure semantic, `1.0` = pure keyword. Default `0.6` prioritizes keyword matches—ideal for code/technical terms. Use `0.3-0.4` for natural language queries. | `0.0` - `1.0` |
|
|
216
|
+
The embedding model (~90MB) downloads on first use. The download takes 1-2 minutes; after that, the server works offline.
|
|
189
217
|
|
|
190
|
-
|
|
218
|
+
### Security
|
|
191
219
|
|
|
192
|
-
**
|
|
193
|
-
- **
|
|
194
|
-
- **
|
|
195
|
-
- **Claude Code**: No restart needed—changes apply immediately
|
|
220
|
+
- **Path restriction**: Only files within `BASE_DIR` are accessible
|
|
221
|
+
- **Local only**: No network requests after model download
|
|
222
|
+
- **Model source**: Official HuggingFace repository ([verify here](https://huggingface.co/Xenova/all-MiniLM-L6-v2))
|
|
196
223
|
|
|
197
|
-
|
|
224
|
+
</details>
|
|
198
225
|
|
|
199
|
-
|
|
226
|
+
<details>
|
|
227
|
+
<summary><strong>Performance</strong></summary>
|
|
200
228
|
|
|
201
|
-
|
|
229
|
+
Tested on MacBook Pro M1 (16GB RAM), Node.js 22:
|
|
202
230
|
|
|
203
|
-
|
|
204
|
-
"Ingest the document at /Users/me/docs/api-spec.pdf"
|
|
205
|
-
```
|
|
231
|
+
**Query Speed**: ~1.2 seconds for 10,000 chunks (p90 < 3s)
|
|
206
232
|
|
|
207
|
-
**
|
|
233
|
+
**Ingestion** (10MB PDF):
|
|
234
|
+
- PDF parsing: ~8s
|
|
235
|
+
- Chunking: ~2s
|
|
236
|
+
- Embedding: ~30s
|
|
237
|
+
- DB insertion: ~5s
|
|
208
238
|
|
|
209
|
-
|
|
210
|
-
codex "Ingest the document at /Users/me/docs/api-spec.pdf into the RAG system"
|
|
211
|
-
```
|
|
239
|
+
**Memory**: ~200MB idle, ~800MB peak (50MB file ingestion)
|
|
212
240
|
|
|
213
|
-
**
|
|
241
|
+
**Concurrency**: Handles 5 parallel queries without degradation.
|
|
214
242
|
|
|
215
|
-
|
|
216
|
-
"Ingest the document at /Users/me/docs/api-spec.pdf"
|
|
217
|
-
```
|
|
243
|
+
</details>
|
|
218
244
|
|
|
219
|
-
|
|
245
|
+
<details>
|
|
246
|
+
<summary><strong>Troubleshooting</strong></summary>
|
|
220
247
|
|
|
221
|
-
|
|
222
|
-
1. Validates the file exists and is under 100MB
|
|
223
|
-
2. Extracts text (handling PDF/DOCX/TXT/MD formats)
|
|
224
|
-
3. Splits into chunks (512 chars, 100 char overlap)
|
|
225
|
-
4. Generates embeddings for each chunk
|
|
226
|
-
5. Stores in the vector database
|
|
248
|
+
### "No results found"
|
|
227
249
|
|
|
228
|
-
|
|
250
|
+
Documents must be ingested before they can be searched. Ask your assistant to `"List all ingested files"` to verify.
|
|
229
251
|
|
|
230
|
-
###
|
|
252
|
+
### Model download failed
|
|
231
253
|
|
|
232
|
-
|
|
254
|
+
Check your internet connection. If you are behind a proxy, configure your network settings accordingly. The model can also be [downloaded manually](https://huggingface.co/Xenova/all-MiniLM-L6-v2).
|
|
233
255
|
|
|
234
|
-
|
|
235
|
-
"What does the API documentation say about authentication?"
|
|
236
|
-
"Find information about rate limiting"
|
|
237
|
-
"Search for error handling best practices"
|
|
238
|
-
```
|
|
256
|
+
### "File too large"
|
|
239
257
|
|
|
240
|
-
|
|
241
|
-
1. **Keyword matching (BM25)**: Finds exact term matches—crucial for code terms like `ProjectLifetimeScope` or error codes
|
|
242
|
-
2. **Semantic search**: Understands meaning, so "authentication" finds "login flow" content
|
|
258
|
+
Default limit is 100MB. Split large files or increase `MAX_FILE_SIZE`.
|
|
243
259
|
|
|
244
|
-
|
|
260
|
+
### Slow queries
|
|
245
261
|
|
|
246
|
-
|
|
262
|
+
Check chunk count with `status`. Large documents with many chunks may slow queries. Consider splitting very large files.
|
|
247
263
|
|
|
248
|
-
|
|
264
|
+
### "Path outside BASE_DIR"
|
|
249
265
|
|
|
250
|
-
|
|
251
|
-
"Search for database optimization tips, return 5 results" # Fewer, more precise
|
|
252
|
-
"Search for database optimization tips, return 20 results" # Broader exploration
|
|
253
|
-
```
|
|
266
|
+
Ensure file paths are within `BASE_DIR`. Use absolute paths.
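
For readers wondering what such a boundary check looks like, here is a minimal sketch using Node's built-in `path` module. The function name `isInsideBaseDir` and the sample paths are hypothetical, and the server's real validation may differ (for example, in how it treats symlinks, which this sketch does not resolve).

```typescript
import * as path from "node:path";

// Hypothetical sketch: accept only paths that resolve to a location under baseDir.
function isInsideBaseDir(baseDir: string, filePath: string): boolean {
  const base = path.resolve(baseDir);
  // Relative inputs are resolved against the base directory.
  const target = path.resolve(base, filePath);
  const relative = path.relative(base, target);

  // An empty result means the target is the base directory itself, not a file in it.
  if (relative === "") return false;
  // A leading ".." segment means the target escaped the base directory.
  if (relative === ".." || relative.startsWith(".." + path.sep)) return false;
  // On Windows, a different drive yields an absolute "relative" path.
  return !path.isAbsolute(relative);
}

const allowed = isInsideBaseDir("/docs", "/docs/api-spec.pdf");
const traversal = isInsideBaseDir("/docs", "/docs/../etc/passwd");
const sibling = isInsideBaseDir("/docs", "/docs-private/notes.md");
```

Comparing resolved paths rather than raw strings matters here: a plain prefix check would wrongly accept `/docs-private/notes.md` for a base of `/docs`.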
|
|
254
267
|
|
|
255
|
-
|
|
268
|
+
### MCP client doesn't see tools
|
|
256
269
|
|
|
257
|
-
|
|
270
|
+
1. Verify config file syntax
|
|
271
|
+
2. Restart client completely (Cmd+Q on Mac for Cursor)
|
|
272
|
+
3. Test directly: `npx mcp-local-rag` should run without errors
|
|
258
273
|
|
|
259
|
-
|
|
274
|
+
</details>
|
|
260
275
|
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
```
|
|
264
|
-
|
|
265
|
-
This shows each file's path, how many chunks it produced, and when it was ingested.
|
|
276
|
+
<details>
|
|
277
|
+
<summary><strong>FAQ</strong></summary>
|
|
266
278
|
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
```
|
|
270
|
-
"Delete /Users/me/docs/old-spec.pdf from the RAG system"
|
|
271
|
-
```
|
|
279
|
+
**Is this really private?**
|
|
280
|
+
Yes. After model download, nothing leaves your machine. Verify with network monitoring.
|
|
272
281
|
|
|
273
|
-
|
|
282
|
+
**Can I use this offline?**
|
|
283
|
+
Yes, after the first model download (~90MB).
|
|
274
284
|
|
|
275
|
-
|
|
285
|
+
**How does this compare to cloud RAG?**
|
|
286
|
+
Cloud services offer better accuracy at scale but require sending data externally. This trades some accuracy for complete privacy and zero runtime cost.
|
|
276
287
|
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
```
|
|
288
|
+
**What file formats are supported?**
|
|
289
|
+
PDF, DOCX, TXT, and Markdown. Not yet supported: Excel, PowerPoint, images, HTML.
|
|
280
290
|
|
|
281
|
-
|
|
291
|
+
**Can I change the embedding model?**
|
|
292
|
+
Yes, but you must delete your database and re-ingest all documents. Vectors from different models are incompatible with each other, and their dimensions often differ.
|
|
282
293
|
|
|
283
|
-
|
|
294
|
+
**GPU acceleration?**
|
|
295
|
+
Transformers.js runs on CPU by default. GPU support is experimental. CPU performance is adequate for most use cases.
|
|
284
296
|
|
|
285
|
-
|
|
297
|
+
**Multi-user support?**
|
|
298
|
+
No. It is designed for single-user, local access. Multi-user use would require authentication and access control.
|
|
286
299
|
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
```
|
|
300
|
+
**How do I back up my data?**
|
|
301
|
+
Copy the `DB_PATH` directory (default: `./lancedb/`).
|
|
290
302
|
|
|
291
|
-
|
|
303
|
+
</details>
|
|
292
304
|
|
|
293
|
-
|
|
305
|
+
<details>
|
|
306
|
+
<summary><strong>Development</strong></summary>
|
|
294
307
|
|
|
295
308
|
### Building from Source
|
|
296
309
|
|
|
297
310
|
```bash
|
|
298
311
|
git clone https://github.com/shinpr/mcp-local-rag.git
|
|
299
312
|
cd mcp-local-rag
|
|
300
|
-
|
|
313
|
+
pnpm install
|
|
301
314
|
```
|
|
302
315
|
|
|
303
|
-
###
|
|
316
|
+
### Testing
|
|
304
317
|
|
|
305
318
|
```bash
|
|
306
|
-
# Run all tests
|
|
307
|
-
|
|
308
|
-
|
|
309
|
-
# Run with coverage
|
|
310
|
-
npm run test:coverage
|
|
311
|
-
|
|
312
|
-
# Watch mode for development
|
|
313
|
-
npm run test:watch
|
|
319
|
+
pnpm test # Run all tests
|
|
320
|
+
pnpm run test:watch # Watch mode
|
|
314
321
|
```
|
|
315
322
|
|
|
316
|
-
The test suite includes:
|
|
317
|
-
- Unit tests for each component
|
|
318
|
-
- Integration tests for the full ingestion and search flow
|
|
319
|
-
- Security tests for path traversal protection
|
|
320
|
-
- Performance tests verifying query speed targets
|
|
321
|
-
|
|
322
323
|
### Code Quality
|
|
323
324
|
|
|
324
325
|
```bash
|
|
325
|
-
#
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
#
|
|
329
|
-
npm run check:fix
|
|
330
|
-
|
|
331
|
-
# Check circular dependencies
|
|
332
|
-
npm run check:deps
|
|
333
|
-
|
|
334
|
-
# Full quality check (runs everything)
|
|
335
|
-
npm run check:all
|
|
326
|
+
pnpm run type-check # TypeScript check
|
|
327
|
+
pnpm run check:fix # Lint and format
|
|
328
|
+
pnpm run check:deps # Circular dependency check
|
|
329
|
+
pnpm run check:all # Full quality check
|
|
336
330
|
```
|
|
337
331
|
|
|
338
332
|
### Project Structure
|
|
339
333
|
|
|
340
334
|
```
|
|
341
335
|
src/
|
|
342
|
-
index.ts
|
|
343
|
-
server/
|
|
344
|
-
parser/
|
|
345
|
-
chunker/
|
|
346
|
-
embedder/
|
|
347
|
-
vectordb/
|
|
348
|
-
__tests__/
|
|
336
|
+
index.ts # Entry point
|
|
337
|
+
server/ # MCP tool handlers
|
|
338
|
+
parser/ # PDF, DOCX, TXT, MD parsing
|
|
339
|
+
chunker/ # Text splitting
|
|
340
|
+
embedder/ # Transformers.js embeddings
|
|
341
|
+
vectordb/ # LanceDB operations
|
|
342
|
+
__tests__/ # Test suites
|
|
349
343
|
```
|
|
350
344
|
|
|
351
|
-
|
|
352
|
-
- **Parser** validates file paths and extracts text
|
|
353
|
-
- **Chunker** splits text into overlapping segments
|
|
354
|
-
- **Embedder** generates 384-dimensional vectors
|
|
355
|
-
- **VectorStore** handles all database operations
|
|
356
|
-
- **RAGServer** orchestrates everything and exposes MCP tools
|
|
357
|
-
|
|
358
|
-
## Performance
|
|
359
|
-
|
|
360
|
-
**Test Environment**: MacBook Pro M1 (16GB RAM), tested with v0.1.3 on Node.js 22 (January 2025)
|
|
361
|
-
|
|
362
|
-
**Query Performance**:
|
|
363
|
-
- Average: 1.2 seconds for 10,000 indexed chunks (5 results)
|
|
364
|
-
- Target: p90 < 3 seconds ✓
|
|
365
|
-
|
|
366
|
-
**Ingestion Speed** (10MB PDF):
|
|
367
|
-
- Total: ~45 seconds
|
|
368
|
-
- PDF parsing: ~8 seconds (17%)
|
|
369
|
-
- Text chunking: ~2 seconds (4%)
|
|
370
|
-
- Embedding generation: ~30 seconds (67%)
|
|
371
|
-
- Database insertion: ~5 seconds (11%)
|
|
372
|
-
|
|
373
|
-
**Memory Usage**:
|
|
374
|
-
- Baseline: ~200MB idle
|
|
375
|
-
- Peak: ~800MB when ingesting 50MB file
|
|
376
|
-
- Target: < 1GB ✓
|
|
377
|
-
|
|
378
|
-
**Concurrent Queries**: Handles 5 parallel queries without degradation. LanceDB's async API allows non-blocking operations.
|
|
379
|
-
|
|
380
|
-
**Note**: Your results will vary based on hardware, especially CPU speed (embeddings run on CPU, not GPU).
|
|
381
|
-
|
|
382
|
-
## Troubleshooting
|
|
383
|
-
|
|
384
|
-
### "No results found" when searching
|
|
385
|
-
|
|
386
|
-
**Cause**: Documents must be ingested before searching.
|
|
387
|
-
|
|
388
|
-
**Solution**:
|
|
389
|
-
1. First ingest documents: `"Ingest /path/to/document.pdf"`
|
|
390
|
-
2. Verify ingestion: `"List all ingested files"`
|
|
391
|
-
3. Then search: `"Search for [your query]"`
|
|
392
|
-
|
|
393
|
-
**Common mistake**: Trying to search immediately after configuration without ingesting any documents.
|
|
394
|
-
|
|
395
|
-
### "Model download failed"
|
|
396
|
-
|
|
397
|
-
The embedding model downloads from HuggingFace on first use (when you ingest or search for the first time). If you're behind a proxy or firewall, you might need to configure network settings.
|
|
398
|
-
|
|
399
|
-
**When it happens**: Your first ingest or search operation will trigger the download. If it fails, you'll see a detailed error message with troubleshooting guidance (network issues, disk space, cache corruption).
|
|
400
|
-
|
|
401
|
-
**What to do**: The error message provides specific recommendations. Common solutions:
|
|
402
|
-
1. Check your internet connection and retry the operation
|
|
403
|
-
2. Ensure you have sufficient disk space (~120MB needed)
|
|
404
|
-
3. If problems persist, delete the cache directory and try again
|
|
405
|
-
|
|
406
|
-
Alternatively, download the model manually:
|
|
407
|
-
1. Visit https://huggingface.co/Xenova/all-MiniLM-L6-v2
|
|
408
|
-
2. Download the model files
|
|
409
|
-
3. Set CACHE_DIR to where you saved them
|
|
410
|
-
|
|
411
|
-
### "File too large" error
|
|
412
|
-
|
|
413
|
-
Default limit is 100MB. For larger files:
|
|
414
|
-
- Split them into smaller documents
|
|
415
|
-
- Or increase MAX_FILE_SIZE in your config (be aware of memory usage)
|
|
416
|
-
|
|
417
|
-
### Slow query performance
|
|
418
|
-
|
|
419
|
-
If queries take longer than expected:
|
|
420
|
-
- Check how many chunks you have indexed (`status` command)
|
|
421
|
-
- Consider the hardware (embeddings are CPU-intensive)
|
|
422
|
-
- Try reducing CHUNK_SIZE to create fewer chunks
|
|
423
|
-
|
|
424
|
-
### "Path outside BASE_DIR" error
|
|
425
|
-
|
|
426
|
-
The server restricts file access to BASE_DIR for security. Make sure your file path is within that directory. Check for:
|
|
427
|
-
- Correct BASE_DIR setting in your MCP config
|
|
428
|
-
- Relative paths vs absolute paths
|
|
429
|
-
- Typos in the file path
|
|
430
|
-
|
|
431
|
-
### MCP client doesn't see the tools
|
|
432
|
-
|
|
433
|
-
**For Cursor:**
|
|
434
|
-
1. Open Settings → Features → Model Context Protocol
|
|
435
|
-
2. Verify the server configuration is saved
|
|
436
|
-
3. Restart Cursor completely
|
|
437
|
-
4. Check the MCP connection status in the status bar
|
|
438
|
-
|
|
439
|
-
**For Codex CLI:**
|
|
440
|
-
1. Check `~/.codex/config.toml` to verify the configuration
|
|
441
|
-
2. Ensure the section name is `[mcp_servers.local-rag]` (with underscore)
|
|
442
|
-
3. Test the server directly: `npx mcp-local-rag` should run without errors
|
|
443
|
-
4. Restart Codex CLI or IDE extension
|
|
444
|
-
5. Check for error messages when Codex starts
|
|
445
|
-
|
|
446
|
-
**For Claude Code:**
|
|
447
|
-
1. Run `claude mcp list` to see configured servers
|
|
448
|
-
2. Verify the server appears in the list
|
|
449
|
-
3. Check `~/.config/claude/mcp_config.json` for syntax errors
|
|
450
|
-
4. Test the server directly: `npx mcp-local-rag` should run without errors
|
|
451
|
-
|
|
452
|
-
**Common issues:**
|
|
453
|
-
- Invalid JSON syntax in config files
|
|
454
|
-
- Wrong file paths in BASE_DIR setting
|
|
455
|
-
- Server binary not found (try global install: `npm install -g mcp-local-rag`)
|
|
456
|
-
- Firewall blocking local communication
|
|
457
|
-
|
|
458
|
-
## How It Works
|
|
459
|
-
|
|
460
|
-
When you ingest a document, the parser extracts text based on the file type. PDFs use `pdf-parse`, DOCX uses `mammoth`, and text files are read directly.
|
|
461
|
-
|
|
462
|
-
The chunker then splits the text using LangChain's RecursiveCharacterTextSplitter. It tries to break on natural boundaries (paragraphs, sentences) while keeping chunks around 512 characters. Adjacent chunks overlap by 100 characters to preserve context.
|
|
463
|
-
|
|
464
|
-
Each chunk goes through the Transformers.js embedding model, which converts text into a 384-dimensional vector representing its semantic meaning. This happens in batches of 8 chunks at a time for efficiency.
|
|
465
|
-
|
|
466
|
-
Vectors are stored in LanceDB, a columnar vector database that works with local files. No server process, no complex setup. It's just a directory with data files.
|
|
467
|
-
|
|
468
|
-
When you search, the server performs **hybrid search**:
|
|
469
|
-
1. Your query becomes a vector using the same embedding model
|
|
470
|
-
2. LanceDB performs both keyword search (BM25) and vector similarity search
|
|
471
|
-
3. Results are combined using a weighted linear combination (default: 60% keyword, 40% semantic)
|
|
472
|
-
4. The top matches return to your MCP client with their original text and metadata
|
|
473
|
-
|
|
474
|
-
This hybrid approach gets the best of both worlds: exact keyword matches (essential for code terms, error codes, function names) and semantic understanding (so "authentication" finds "login flow" content). The default weight prioritizes keyword matches, which works well for developer documentation where exact terms matter.
|
|
475
|
-
|
|
476
|
-
## FAQ
|
|
477
|
-
|
|
478
|
-
**Is this really private?**
|
|
479
|
-
|
|
480
|
-
Yes. After the initial model download, nothing leaves your machine. You can verify with network monitoring tools—no outbound requests during ingestion or search.
|
|
481
|
-
|
|
482
|
-
**Can I use this offline?**
|
|
483
|
-
|
|
484
|
-
Yes, once the model is cached. The first run needs internet to download the model (~90MB), but after that, everything works offline.
|
|
485
|
-
|
|
486
|
-
**How does this compare to cloud RAG services?**
|
|
487
|
-
|
|
488
|
-
Cloud services (OpenAI, Pinecone, etc.) typically offer better accuracy and scale. But they require sending your documents externally, ongoing costs, and internet connectivity. This project trades some accuracy for complete privacy and zero runtime cost.
|
|
489
|
-
|
|
490
|
-
**What file formats are supported?**
|
|
491
|
-
|
|
492
|
-
Currently supported:
|
|
493
|
-
- **PDF**: `.pdf` (uses pdf-parse)
|
|
494
|
-
- **Microsoft Word**: `.docx` (uses mammoth, not `.doc`)
|
|
495
|
-
- **Plain Text**: `.txt`
|
|
496
|
-
- **Markdown**: `.md`, `.markdown`
|
|
497
|
-
|
|
498
|
-
**Not yet supported**:
|
|
499
|
-
- Excel/CSV (`.xlsx`, `.csv`)
|
|
500
|
-
- PowerPoint (`.pptx`)
|
|
501
|
-
- Images with OCR (`.jpg`, `.png`)
|
|
502
|
-
- HTML (`.html`)
|
|
503
|
-
- Old Word documents (`.doc`)
|
|
504
|
-
|
|
505
|
-
Want support for another format? [Open an issue](https://github.com/shinpr/mcp-local-rag/issues/new) with your use case.
|
|
506
|
-
|
|
507
|
-
**Can I customize the embedding model?**
|
|
508
|
-
|
|
509
|
-
Yes, set MODEL_NAME to any Transformers.js-compatible model from HuggingFace.
|
|
510
|
-
|
|
511
|
-
However, switching models isn't as simple as changing a config value. Each model produces vectors of a specific dimension—`all-MiniLM-L6-v2` outputs 384 dimensions, while `multilingual-e5-small` outputs 384 and `embeddinggemma-300m` outputs 768. These vectors are fundamentally incompatible with each other.
|
|
512
|
-
|
|
513
|
-
When you change models, you must:
|
|
514
|
-
|
|
515
|
-
1. **Delete your existing database**: `rm -rf ./lancedb/` (or your custom DB_PATH)
|
|
516
|
-
2. **Re-ingest all your documents** with the new model
|
|
517
|
-
|
|
518
|
-
Simply re-ingesting without deleting won't work. LanceDB locks the vector dimension when you first insert data, and that schema persists even if you delete all documents. If you try to insert 768-dimensional vectors into a database that was created with 384-dimensional vectors, you'll get a dimension mismatch error.
|
|
519
|
-
|
|
520
|
-
The good news: if you do forget to delete the database, LanceDB will give you a clear error message like "Query vector size 768 does not match index column size 384"—so you'll know exactly what went wrong.
|
|
521
|
-
|
|
522
|
-
**How much does accuracy depend on the model?**
|
|
523
|
-
|
|
524
|
-
`all-MiniLM-L6-v2` is optimized for English and performs well for technical documentation. For other languages, consider multilingual models like `multilingual-e5-small`. For higher accuracy, try larger models—but expect slower processing.
|
|
525
|
-
|
|
526
|
-
**What about GPU acceleration?**
|
|
527
|
-
|
|
528
|
-
Transformers.js runs on CPU by default. GPU support is experimental and varies by platform. For most use cases, CPU performance is adequate (embeddings are reasonably fast even without GPU).
|
|
529
|
-
|
|
530
|
-
**Can multiple people share a database?**
|
|
531
|
-
|
|
532
|
-
The current design assumes single-user, local access. For multi-user scenarios, you'd need to implement authentication and access control—both out of scope for this project's privacy-first design.
|
|
533
|
-
|
|
534
|
-
**How do I back up my data?**
|
|
535
|
-
|
|
536
|
-
Copy your DB_PATH directory (default: `./lancedb/`). That's your entire vector database. Copy BASE_DIR for your original documents. Both are just files—no special export needed.
|
|
345
|
+
</details>
|
|
537
346
|
|
|
538
347
|
## Contributing
|
|
539
348
|
|
|
540
|
-
Contributions
|
|
349
|
+
Contributions welcome. Before submitting a PR:
|
|
541
350
|
|
|
542
|
-
1. Run
|
|
543
|
-
2.
|
|
351
|
+
1. Run tests: `pnpm test`
|
|
352
|
+
2. Check quality: `pnpm run check:all`
|
|
544
353
|
3. Add tests for new features
|
|
545
|
-
4. Update
|
|
354
|
+
4. Update docs if behavior changes
|
|
546
355
|
|
|
547
356
|
## License
|
|
548
357
|
|
|
549
|
-
MIT License
|
|
550
|
-
|
|
551
|
-
Free for personal and commercial use. No attribution required, but appreciated.
|
|
358
|
+
MIT License. Free for personal and commercial use.
|
|
552
359
|
|
|
553
360
|
## Acknowledgments
|
|
554
361
|
|
|
555
|
-
Built with
|
|
556
|
-
- [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic
|
|
557
|
-
- [LanceDB](https://lancedb.com/) for vector storage
|
|
558
|
-
- [Transformers.js](https://huggingface.co/docs/transformers.js) by HuggingFace
|
|
559
|
-
- [LangChain.js](https://js.langchain.com/) for text splitting
|
|
560
|
-
|
|
561
|
-
Created as a practical tool for developers who want AI-powered document search without compromising privacy.
|
|
362
|
+
Built with [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic, [LanceDB](https://lancedb.com/), and [Transformers.js](https://huggingface.co/docs/transformers.js).
|