@arabold/docs-mcp-server 1.10.0 → 1.12.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +152 -232
- package/db/migrations/000-initial-schema.sql +57 -0
- package/db/migrations/001-add-indexed-at-column.sql +6 -0
- package/dist/DocumentManagementService-_qCZ1Hi2.js +3409 -0
- package/dist/DocumentManagementService-_qCZ1Hi2.js.map +1 -0
- package/dist/EmbeddingFactory-BJMbJvje.js +174 -0
- package/dist/EmbeddingFactory-BJMbJvje.js.map +1 -0
- package/dist/FindVersionTool-CH1c3Tyu.js +170 -0
- package/dist/FindVersionTool-CH1c3Tyu.js.map +1 -0
- package/dist/RemoveTool-DmB1YJTA.js +65 -0
- package/dist/RemoveTool-DmB1YJTA.js.map +1 -0
- package/dist/assets/main.css +1 -0
- package/dist/assets/main.js +8097 -0
- package/dist/assets/main.js.map +1 -0
- package/dist/cli.js +49 -143
- package/dist/cli.js.map +1 -1
- package/dist/server.js +684 -388
- package/dist/server.js.map +1 -1
- package/dist/web.js +937 -0
- package/dist/web.js.map +1 -0
- package/package.json +35 -11
- package/public/assets/main.css +1 -0
- package/public/assets/main.js +8097 -0
- package/public/assets/main.js.map +1 -0
- package/dist/EmbeddingFactory-6UEXNF44.js +0 -1177
- package/dist/EmbeddingFactory-6UEXNF44.js.map +0 -1
- package/dist/chunk-VTO2ED43.js +0 -12098
- package/dist/chunk-VTO2ED43.js.map +0 -1
- package/dist/chunk-YCXNASA6.js +0 -124
- package/dist/chunk-YCXNASA6.js.map +0 -1
- package/dist/cli.d.ts +0 -1
- package/dist/server.d.ts +0 -1
package/README.md
CHANGED
@@ -1,81 +1,55 @@
-#
+# Docs MCP Server: Enhance Your AI Coding Assistant

-
+AI coding assistants often struggle with outdated documentation, leading to incorrect suggestions or hallucinated code examples. Verifying AI responses against specific library versions can be time-consuming and inefficient.

-
-
-- 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
-- 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
-- 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
-- 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
-- ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
-- 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.
-
-## Overview
-
-This project provides a Model Context Protocol (MCP) server designed to scrape, process, index, and search documentation for various software libraries and packages. It fetches content from specified URLs, splits it into meaningful chunks using semantic splitting techniques, generates vector embeddings using OpenAI, and stores the data in an SQLite database. The server utilizes `sqlite-vec` for efficient vector similarity search and FTS5 for full-text search capabilities, combining them for hybrid search results. It supports versioning, allowing documentation for different library versions (including unversioned content) to be stored and queried distinctly.
-
-The server exposes MCP tools for:
-
-- Starting a scraping job (`scrape_docs`): Returns a `jobId` immediately.
-- Checking job status (`get_job_status`): Retrieves the current status and progress of a specific job.
-- Listing active/completed jobs (`list_jobs`): Shows recent and ongoing jobs.
-- Cancelling a job (`cancel_job`): Attempts to stop a running or queued job.
-- Searching documentation (`search_docs`).
-- Listing indexed libraries (`list_libraries`).
-- Finding appropriate versions (`find_version`).
-- Removing indexed documents (`remove_docs`).
-- Fetching single URLs (`fetch_url`): Fetches a URL and returns its content as Markdown.
-
-## Configuration
-
-The following environment variables are supported to configure the embedding model behavior:
+The **Docs MCP Server** addresses these challenges by providing a personal, always-current knowledge base for your AI assistant. It acts as a bridge, connecting your LLM directly to the **latest official documentation** from thousands of software libraries.

-
+By grounding AI responses in accurate, version-aware context, the Docs MCP Server enables you to receive concise and relevant integration details and code snippets, improving the reliability and efficiency of LLM-assisted development.

-
+It's **free**, **open-source**, runs **locally** for privacy, and integrates seamlessly with your workflow via the Model Context Protocol (MCP).

-
+## Why Use the Docs MCP Server?

-
-- `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
-- `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
+LLM-assisted coding promises speed and efficiency, but often falls short due to:

-
+- 🌀 **Stale Knowledge:** LLMs train on snapshots of the internet, quickly falling behind new library releases and API changes.
+- 👻 **Code Hallucinations:** AI can invent plausible-looking code that is syntactically correct but functionally wrong or uses non-existent APIs.
+- ❓ **Version Ambiguity:** Generic answers rarely account for the specific version dependencies in _your_ project, leading to subtle bugs.
+- ⏳ **Verification Overhead:** Developers spend valuable time double-checking AI suggestions against official documentation.

-
+**The Docs MCP Server tackles these problems head-on by:**

-
+- ✅ **Providing Always Up-to-Date Context:** It fetches and indexes documentation _directly_ from official sources (websites, GitHub, npm, PyPI, local files) on demand.
+- 🎯 **Delivering Version-Specific Answers:** Search queries can target exact library versions, ensuring the information aligns with your project's dependencies.
+- 💡 **Reducing Hallucinations:** By grounding the LLM in real documentation, it provides accurate examples and integration details.
+- ⚡ **Boosting Productivity:** Get trustworthy answers faster, integrated directly into your AI assistant workflow.

-
-
-- `aws`: Uses AWS Bedrock embeddings
-
-  - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
-  - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
-  - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
-
-- `microsoft`: Uses Azure OpenAI embeddings
-  - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
-  - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
-  - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
-  - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
-
-### Vector Dimensions
-
-The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.
+## ✨ Key Features

-
-
-
+- **Up-to-Date Knowledge:** Fetches the latest documentation directly from the source.
+- **Version-Aware Search:** Get answers relevant to specific library versions (e.g., `react@18.2.0` vs `react@17.0.0`).
+- **Accurate Snippets:** Reduces AI hallucinations by using context from official docs.
+- **Web Interface:** Provides an easy-to-use web interface for searching and managing documentation.
+- **Broad Source Compatibility:** Scrapes websites, GitHub repos, package manager sites (npm, PyPI), and even local file directories.
+- **Intelligent Processing:** Automatically chunks documentation semantically and generates embeddings.
+- **Flexible Embedding Models:** Supports OpenAI (incl. compatible APIs like Ollama), Google Gemini/Vertex AI, Azure OpenAI, AWS Bedrock, and more.
+- **Powerful Hybrid Search:** Combines vector similarity with full-text search for relevance.
+- **Local & Private:** Runs entirely on your machine, keeping your data and queries private.
+- **Free & Open Source:** Built for the community, by the community.
+- **Simple Deployment:** Easy setup via Docker or `npx`.
+- **Seamless Integration:** Works with MCP-compatible clients (like Claude, Cline, Roo).

 ## Running the MCP Server

-
+Get up and running quickly!
+
+- [Option 1: Using Docker](#option-1-using-docker)
+- [Option 2: Using npx](#option-2-using-npx)
+- [Option 3: Using Docker Compose](#option-3-using-docker-compose)

-### Option 1: Using Docker
+### Option 1: Using Docker

-This
+This approach is easy, straightforward, and doesn't require Node.js to be installed.

 1. **Ensure Docker is installed and running.**
 2. **Configure your MCP settings:**
@@ -176,7 +150,7 @@ docker run -i --rm \

 ### Option 2: Using npx

-This approach is
+This approach is useful when you need local file access (e.g., indexing documentation from your local file system). While this can also be achieved by mounting paths into a Docker container, using `npx` is simpler but requires a Node.js installation.

 1. **Ensure Node.js is installed.**
 2. **Configure your MCP settings:**
@@ -204,183 +178,180 @@ This approach is recommended when you need local file access (e.g., indexing doc

 3. **That's it!** The server will now be available to your AI assistant.

-
-
-You can use the CLI to manage documentation directly, either via Docker or npx. **Important: Use the same method (Docker or npx) for both the server and CLI to ensure access to the same indexed documentation.**
-
-### Using Docker CLI
+### Option 3: Using Docker Compose

-
+This method provides a persistent local setup by running the server and web interface using Docker Compose. It requires cloning the repository but simplifies managing both services together.

-
-
-
-
-
-
-
+1. **Ensure Docker and Docker Compose are installed and running.**
+2. **Clone the repository:**
+   ```bash
+   git clone https://github.com/arabold/docs-mcp-server.git
+   cd docs-mcp-server
+   ```
+3. **Set up your environment:**
+   Copy the example environment file and **edit it** to add your necessary API keys (e.g., `OPENAI_API_KEY`).
+   ```bash
+   cp .env.example .env
+   # Now, edit the .env file with your editor
+   ```
+   Refer to the [Configuration](#configuration) section for details on available environment variables.
+4. **Launch the services:**
+   Run this command from the repository's root directory. It will build the images (if necessary) and start the server and web interface in the background.

-
+   ```bash
+   docker compose up -d
+   ```

-
+   - `-d`: Runs the containers in detached mode (in the background). Omit this to see logs directly in your terminal.

-If you
+   **Note:** If you pull updates for the repository (e.g., using `git pull`), you'll need to rebuild the Docker images to include the changes by running `docker compose up -d --build`.

-
-
-```
+5. **Configure your MCP client:**
+   Add the following configuration block to your MCP settings file (e.g., for Claude, Cline, Roo):

-
+   ```json
+   {
+     "mcpServers": {
+       "docs-mcp-server": {
+         "url": "http://localhost:6280/sse", // Connects via HTTP to the Docker Compose service
+         "disabled": false,
+         "autoApprove": []
+       }
+     }
+   }
+   ```

-
+   Restart your AI assistant application after updating the configuration.

-
+6. **Access the Web Interface:**
+   The web interface will be available at `http://localhost:6281`.

-
+**Benefits of this method:**

-
+- Runs both the server and web UI with a single command.
+- Uses the local source code (rebuilds automatically if code changes and you run `docker compose up --build`).
+- Persistent data storage via the `docs-mcp-data` Docker volume.
+- Easy configuration management via the `.env` file.

-
-docs-cli --help
-# or
-npx -y --package=@arabold/docs-mcp-server docs-cli --help
-```
+To stop the services, run `docker compose down` from the repository directory.

-
+## Using the Web Interface

-
-docs-cli scrape --help
-docs-cli search --help
-docs-cli fetch-url --help
-docs-cli find-version --help
-docs-cli remove --help
-docs-cli list --help
-```
+You can access a web-based GUI at `http://localhost:6281` to manage and search library documentation through your browser. **Important: Use the same method (Docker or npx) for both the server and web interface to ensure access to the same indexed documentation.**

-###
+### Using Docker Web Interface

-
+If you're running the server with Docker, use Docker for the web interface as well:

 ```bash
-
+docker run --rm \
+  -e OPENAI_API_KEY="your-openai-api-key-here" \
+  -v docs-mcp-data:/data \
+  -p 3000:3000 \
+  ghcr.io/arabold/docs-mcp-server:latest \
+  docs-web
 ```

-
-
-- `--no-follow-redirects`: Disable following HTTP redirects (default: follow redirects).
-- `--scrape-mode <mode>`: HTML processing strategy: 'fetch' (fast, less JS), 'playwright' (slow, full JS), 'auto' (default).
-
-**Examples:**
+Make sure to:

-
-
-
-```
+- Use the same volume name (`docs-mcp-data` in this example) as your server
+- Map port 6281 with `-p 6281:3000`
+- Pass any configuration environment variables with `-e` flags

-###
+### Using `npx` Web Interface

-
+If you're running the server with `npx`, use `npx` for the web interface as well:

 ```bash
-docs-
+npx -y --package=@arabold/docs-mcp-server docs-web --port 6281
 ```

-
+You can specify a different port using the `--port` flag.

-
-- Accepts full versions (`1.2.3`), pre-release versions (`1.2.3-beta.1`), or partial versions (`1`, `1.2` which are expanded to `1.0.0`, `1.2.0`).
-- If omitted, the documentation is indexed as **unversioned**.
-- `-p, --max-pages <number>`: Maximum pages to scrape (default: 1000).
-- `-d, --max-depth <number>`: Maximum navigation depth (default: 3).
-- `-c, --max-concurrency <number>`: Maximum concurrent requests (default: 3).
-- `--scope <scope>`: Defines the crawling boundary: 'subpages' (default), 'hostname', or 'domain'.
-- `--no-follow-redirects`: Disable following HTTP redirects (default: follow redirects).
-- `--scrape-mode <mode>`: HTML processing strategy: 'fetch' (fast, less JS), 'playwright' (slow, full JS), 'auto' (default).
-- `--ignore-errors`: Ignore errors during scraping (default: true).
+The `npx` approach will use the default data directory on your system (typically in your home directory), ensuring consistency between server and web interface.

-
+## Using the CLI

-
-# Scrape React 18.2.0 docs
-docs-cli scrape react --version 18.2.0 https://react.dev/
-```
+You can use the CLI to manage documentation directly, either via Docker or npx. **Important: Use the same method (Docker or npx) for both the server and CLI to ensure access to the same indexed documentation.**

-
+Here's how to invoke the CLI:

-
+### Using Docker CLI
+
+If you're running the server with Docker, use Docker for the CLI as well:

 ```bash
-
+docker run --rm \
+  -e OPENAI_API_KEY="your-openai-api-key-here" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest \
+  docs-cli <command> [options]
 ```

-
+Make sure to use the same volume name (`docs-mcp-data` in this example) as you did for the server. Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed using `-e` flags, just like with the server.

-
-- Supports exact versions (`18.0.0`), partial versions (`18`), or ranges (`18.x`).
-- If omitted, searches the **latest** available indexed version.
-- If a specific version/range doesn't match, it falls back to the latest indexed version _older_ than the target.
-- To search **only unversioned** documents, explicitly pass an empty string: `--version ""`. (Note: Omitting `--version` searches latest, which _might_ be unversioned if no other versions exist).
-- `-l, --limit <number>`: Maximum number of results (default: 5).
-- `-e, --exact-match`: Only match the exact version specified (disables fallback and range matching) (default: false).
+### Using `npx` CLI

-
+If you're running the server with npx, use `npx` for the CLI as well:

 ```bash
-
-docs-cli search react 'hooks'
+npx -y --package=@arabold/docs-mcp-server docs-cli <command> [options]
 ```

-
+The `npx` approach will use the default data directory on your system (typically in your home directory), ensuring consistency between server and CLI.

-
+The main commands available are:

-
-
-
+- `scrape`: Scrapes and indexes documentation from a URL.
+- `search`: Searches the indexed documentation.
+- `list`: Lists all indexed libraries.
+- `remove`: Removes indexed documentation.
+- `fetch-url`: Fetches a single URL and converts to Markdown.
+- `find-version`: Finds the best matching version for a library.

-
+See the [CLI Command Reference](#cli-command-reference) below for detailed command usage.

-
+## Configuration

-
+The following environment variables are supported to configure the embedding model behavior:

-
-# Find the latest indexed version for react
-docs-cli find-version react
-```
+### Embedding Model Configuration

-
+- `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables:

-
+  - `openai` (default): Uses OpenAI's embedding models

-
-
-
+    - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+    - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+    - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)

-
+  - `vertex`: Uses Google Cloud Vertex AI embeddings

-
+    - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file

-
-docs-cli remove <library> [options]
-```
+  - `gemini`: Uses Google Generative AI (Gemini) embeddings

-**
+    - `GOOGLE_API_KEY`: **Required.** Your Google API key

--
+  - `aws`: Uses AWS Bedrock embeddings

-**
+    - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+    - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+    - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock

-
-
-
-
+  - `microsoft`: Uses Azure OpenAI embeddings
+    - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+    - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+    - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+    - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version

-###
+### Vector Dimensions

-
-
--
+The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.
+
+For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.
+
+These variables can be set regardless of how you run the server (Docker, npx, or from source).
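As an editor's illustration of the configuration the new README describes: the variable names below come from the Configuration section, while the model name `nomic-embed-text` and the endpoint URL are illustrative assumptions for a local Ollama setup, not values the documentation prescribes.

```python
import os

# Hypothetical environment for a local Ollama setup (illustrative values):
# DOCS_MCP_EMBEDDING_MODEL uses the documented "provider:model_name" format.
ollama_env = {
    "DOCS_MCP_EMBEDDING_MODEL": "openai:nomic-embed-text",
    "OPENAI_API_BASE": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint (assumed default)
    "OPENAI_API_KEY": "ollama",  # dummy value; the OpenAI client requires a non-empty key
}
os.environ.update(ollama_env)

# The "provider:model_name" string splits on the first colon.
provider, _, model = os.environ["DOCS_MCP_EMBEDDING_MODEL"].partition(":")
print(provider, model)
```

The same three variables can equally be passed as `-e` flags to `docker run` or set in the `.env` file used by Docker Compose.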

 ## Development & Advanced Setup

@@ -390,39 +361,6 @@ This section covers running the server/CLI directly from the source code for dev

 This provides an isolated environment and exposes the server via HTTP endpoints.

-1. **Clone the repository:**
-   ```bash
-   git clone https://github.com/arabold/docs-mcp-server.git # Replace with actual URL if different
-   cd docs-mcp-server
-   ```
-2. **Create `.env` file:**
-   Copy the example and add your OpenAI key (see "Environment Setup" below).
-   ```bash
-   cp .env.example .env
-   # Edit .env and add your OPENAI_API_KEY
-   ```
-3. **Build the Docker image:**
-   ```bash
-   docker build -t docs-mcp-server .
-   ```
-4. **Run the Docker container:**
-
-   ```bash
-   # Option 1: Using a named volume (recommended)
-   # Docker automatically creates the volume 'docs-mcp-data' if it doesn't exist on first run.
-   docker run -i --env-file .env -v docs-mcp-data:/data --name docs-mcp-server docs-mcp-server
-
-   # Option 2: Mapping to a host directory
-   # docker run -i --env-file .env -v /path/on/your/host:/data --name docs-mcp-server docs-mcp-server
-   ```
-
-   - `-i`: Keep STDIN open even if not attached. This is crucial for interacting with the server over stdio.
-   - `--env-file .env`: Loads environment variables (like `OPENAI_API_KEY`) from your local `.env` file.
-   - `-v docs-mcp-data:/data` or `-v /path/on/your/host:/data`: **Crucial for persistence.** This mounts a Docker named volume (Docker creates `docs-mcp-data` automatically if needed) or a host directory to the `/data` directory inside the container. The `/data` directory is where the server stores its `documents.db` file (as configured by `DOCS_MCP_STORE_PATH` in the Dockerfile). This ensures your indexed documentation persists even if the container is stopped or removed.
-   - `--name docs-mcp-server`: Assigns a convenient name to the container.
-
-   The server inside the container now runs directly using Node.js and communicates over **stdio**.
-
 This method is useful for contributing to the project or running un-published versions.

 1. **Clone the repository:**
@@ -479,7 +417,7 @@ This method is useful for contributing to the project or running un-published ve
 # DOCS_MCP_STORE_PATH=/path/to/your/desired/storage/directory
 ```

-###
+### Testing (from Source)

 Since MCP servers communicate over stdio when run directly via Node.js, debugging can be challenging. We recommend using the [MCP Inspector](https://github.com/modelcontextprotocol/inspector), which is available as a package script after building:
@@ -489,24 +427,6 @@ npx @modelcontextprotocol/inspector node dist/server.js

 The Inspector will provide a URL to access debugging tools in your browser.

-### Releasing
-
-This project uses [semantic-release](https://github.com/semantic-release/semantic-release) and [Conventional Commits](https://www.conventionalcommits.org/) to automate the release process.
-
-**How it works:**
-
-1. **Commit Messages:** All commits merged into the `main` branch **must** follow the Conventional Commits specification.
-2. **Manual Trigger:** The "Release" GitHub Actions workflow can be triggered manually from the Actions tab when you're ready to create a new release.
-3. **`semantic-release` Actions:** Determines version, updates `CHANGELOG.md` & `package.json`, commits, tags, publishes to npm, and creates a GitHub Release.
-
-**What you need to do:**
-
-- Use Conventional Commits.
-- Merge changes to `main`.
-- Trigger a release manually when ready from the Actions tab in GitHub.
-
-**Automation handles:** Changelog, version bumps, tags, npm publish, GitHub releases.
-
 ### Architecture

 For details on the project's architecture and design principles, please see [ARCHITECTURE.md](ARCHITECTURE.md).
package/db/migrations/000-initial-schema.sql
ADDED
@@ -0,0 +1,57 @@
+-- Initial database schema setup
+
+-- Documents table
+CREATE TABLE IF NOT EXISTS documents(
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  library TEXT NOT NULL,
+  version TEXT NOT NULL DEFAULT '',
+  url TEXT NOT NULL,
+  content TEXT,
+  metadata JSON,
+  sort_order INTEGER NOT NULL,
+  UNIQUE(url, library, version, sort_order)
+);
+
+-- Indexes
+CREATE INDEX IF NOT EXISTS idx_documents_library_lower ON documents(lower(library));
+CREATE INDEX IF NOT EXISTS idx_documents_version_lower ON documents(lower(library), lower(version));
+
+-- Create Embeddings virtual table
+-- Note: Dimension is hardcoded here based on the value in schema.ts at the time of creation.
+-- If VECTOR_DIMENSION changes, a separate migration would be needed to update/recreate this table.
+CREATE VIRTUAL TABLE IF NOT EXISTS documents_vec USING vec0(
+  library TEXT NOT NULL,
+  version TEXT NOT NULL,
+  embedding FLOAT[1536]
+);
+
+-- Create FTS5 virtual table
+CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts USING fts5(
+  content,
+  title,
+  url,
+  path,
+  tokenize='porter unicode61',
+  content='documents',
+  content_rowid='id'
+);
+
+-- Delete trigger to maintain FTS index
+CREATE TRIGGER IF NOT EXISTS documents_fts_after_delete AFTER DELETE ON documents BEGIN
+  INSERT INTO documents_fts(documents_fts, rowid, content, title, url, path)
+  VALUES('delete', old.id, old.content, json_extract(old.metadata, '$.title'), old.url, json_extract(old.metadata, '$.path'));
+END;
+
+-- Update trigger to maintain FTS index
+CREATE TRIGGER IF NOT EXISTS documents_fts_after_update AFTER UPDATE ON documents BEGIN
+  INSERT INTO documents_fts(documents_fts, rowid, content, title, url, path)
+  VALUES('delete', old.id, old.content, json_extract(old.metadata, '$.title'), old.url, json_extract(old.metadata, '$.path'));
+  INSERT INTO documents_fts(rowid, content, title, url, path)
+  VALUES(new.id, new.content, json_extract(new.metadata, '$.title'), new.url, json_extract(new.metadata, '$.path'));
+END;
+
+-- Insert trigger to maintain FTS index
+CREATE TRIGGER IF NOT EXISTS documents_fts_after_insert AFTER INSERT ON documents BEGIN
+  INSERT INTO documents_fts(rowid, content, title, url, path)
+  VALUES(new.id, new.content, json_extract(new.metadata, '$.title'), new.url, json_extract(new.metadata, '$.path'));
+END;
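The triggers in the schema above implement SQLite's standard maintenance pattern for an external-content FTS5 table: inserts and updates add entries to the index, while removals must go through the special 'delete' command rather than a plain DELETE. A minimal sketch of that pattern using Python's built-in sqlite3 module (FTS5 is compiled into most Python builds; the vec0 table is omitted because sqlite-vec is a separate loadable extension, and the schema is simplified to a single content column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents(id INTEGER PRIMARY KEY, content TEXT)")
# External-content FTS5 table, mirroring content='documents', content_rowid='id' above.
conn.execute(
    "CREATE VIRTUAL TABLE documents_fts USING fts5("
    "content, content='documents', content_rowid='id')"
)

# Mirror of the AFTER INSERT trigger: index the new row.
conn.execute("INSERT INTO documents(content) VALUES ('hybrid search with sqlite')")
conn.execute("INSERT INTO documents_fts(rowid, content) SELECT id, content FROM documents")
hits = conn.execute(
    "SELECT rowid FROM documents_fts WHERE documents_fts MATCH 'hybrid'"
).fetchall()

# Mirror of the AFTER DELETE trigger: an external-content index entry is
# removed with the special 'delete' command, repeating the indexed values.
conn.execute(
    "INSERT INTO documents_fts(documents_fts, rowid, content) "
    "VALUES('delete', 1, 'hybrid search with sqlite')"
)
conn.execute("DELETE FROM documents WHERE id = 1")
remaining = conn.execute(
    "SELECT count(*) FROM documents_fts WHERE documents_fts MATCH 'hybrid'"
).fetchone()[0]
```

The 'delete' entry must repeat the originally indexed values, which is why the triggers re-read `old.content` and the old metadata fields instead of deleting by rowid alone.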
package/db/migrations/001-add-indexed-at-column.sql
ADDED
@@ -0,0 +1,6 @@
+-- Add indexed_at column to track when documents were last indexed
+-- Step 1: Add the column allowing NULLs (SQLite limitation workaround)
+ALTER TABLE documents ADD COLUMN indexed_at DATETIME;
+
+-- Step 2: Update existing rows to set the timestamp
+UPDATE documents SET indexed_at = CURRENT_TIMESTAMP WHERE indexed_at IS NULL;