@arabold/docs-mcp-server 1.5.0 → 1.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +215 -99
- package/dist/EmbeddingFactory-6UEXNF44.js +1177 -0
- package/dist/EmbeddingFactory-6UEXNF44.js.map +1 -0
- package/dist/{chunk-2YTVPKP5.js → chunk-FAZDXJQN.js} +57 -114
- package/dist/chunk-FAZDXJQN.js.map +1 -0
- package/dist/chunk-YCXNASA6.js +124 -0
- package/dist/chunk-YCXNASA6.js.map +1 -0
- package/dist/cli.js +9 -3
- package/dist/cli.js.map +1 -1
- package/dist/server.js +2 -2
- package/dist/server.js.map +1 -1
- package/package.json +7 -2
- package/dist/chunk-2YTVPKP5.js.map +0 -1
package/README.md
CHANGED
@@ -4,12 +4,12 @@ A MCP server for fetching and searching 3rd party package documentation.
 
 ## ✨ Key Features
 
-- 🌐 **
-- 🧠 **
-- 💾 **
-- 🔍 **Hybrid Search:** Combine vector and full-text search
-- ⚙️ **Job
-- 🐳 **
+- 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
+- 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
+- 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
+- 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
+- ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
+- 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.
 
 ## Overview
 
@@ -26,104 +26,216 @@ The server exposes MCP tools for:
 - Finding appropriate versions (`find_version`).
 - Removing indexed documents (`remove_docs`).
 
-## 
+## Configuration
 
-
+The following environment variables are supported to configure the embedding model behavior:
 
-### 
+### Embedding Model Configuration
 
-
+- `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables:
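The `provider:model_name` convention described by the added line above splits on the first colon, with bare model names falling back to the default provider. A minimal shell sketch (the function name is illustrative, not part of the package):

```shell
# Split a "provider:model_name" spec into its two parts.
# A bare model name (no colon) falls back to the default "openai" provider,
# mirroring the documented default of text-embedding-3-small.
parse_embedding_model() {
  spec="${1:-text-embedding-3-small}"
  case "$spec" in
    *:*) echo "${spec%%:*} ${spec#*:}" ;;   # provider:model
    *)   echo "openai $spec" ;;             # bare model name
  esac
}

parse_embedding_model "vertex:text-embedding-004"   # -> vertex text-embedding-004
parse_embedding_model ""                            # -> openai text-embedding-3-small
```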
 
-
-```bash
-npm install -g @arabold/docs-mcp-server
-```
-2. **Run the Server:**
-```bash
-docs-server
-```
-_(Note: You'll need to manage environment variables like `OPENAI_API_KEY` yourself when running this way, e.g., by setting them in your shell profile or using a tool like `dotenv`.)_
-3. **Run the CLI:**
-```bash
-docs-cli <command> [options]
-```
-(See "CLI Command Reference" below for available commands and options.)
+  - `openai` (default): Uses OpenAI's embedding models
 
-
+    - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+    - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+    - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
 
-
+  - `vertex`: Uses Google Cloud Vertex AI embeddings
 
-
+    - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file
 
-
-2. **Run the Server (e.g., for MCP Integration):**
+  - `gemini`: Uses Google Generative AI (Gemini) embeddings
 
-
-docker run -i --rm \
-  -e OPENAI_API_KEY="your-openai-api-key-here" \
-  -v docs-mcp-data:/data \
-  ghcr.io/arabold/docs-mcp-server:latest
-```
+    - `GOOGLE_API_KEY`: **Required.** Your Google API key
 
-
-- `--rm`: Automatically remove the container when it exits.
-- `-e OPENAI_API_KEY="..."`: **Required.** Set your OpenAI API key.
-- `-v docs-mcp-data:/data`: **Required for persistence.** Mounts a Docker named volume `docs-mcp-data` to the container's `/data` directory, where the database is stored. You can replace `docs-mcp-data` with a specific host path if preferred (e.g., `-v /path/on/host:/data`).
-- `ghcr.io/arabold/docs-mcp-server:latest`: Specifies the public Docker image to use.
-
-This is the recommended approach for integrating with tools like Claude Desktop or Cline.
-
-**Claude/Cline Configuration Example:**
-Add the following configuration block to your MCP settings file (adjust path as needed):
-
-- Cline: `/Users/andrerabold/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json`
-- Claude Desktop (MacOS): `~/Library/Application Support/Claude/claude_desktop_config.json`
-- Claude Desktop (Windows): `%APPDATA%/Claude/claude_desktop_config.json`
-
-```json
-{
-  "mcpServers": {
-    "docs-mcp-server": {
-      "command": "docker",
-      "args": [
-        "run",
-        "-i",
-        "--rm",
-        "-e",
-        "OPENAI_API_KEY",
-        "-v",
-        "docs-mcp-data:/data",
-        "ghcr.io/arabold/docs-mcp-server:latest"
-      ],
-      "env": {
-        "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
-      },
-      "disabled": false,
-      "autoApprove": []
-    }
-    // ... other servers might be listed here
-  }
-}
-```
+  - `aws`: Uses AWS Bedrock embeddings
 
-
+    - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+    - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+    - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
 
-
-
-
-
-
-
-
-
-
+  - `microsoft`: Uses Azure OpenAI embeddings
+    - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+    - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+    - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+    - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
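The provider list above implies a simple pre-flight check: each provider has a fixed set of required environment variables. A hypothetical helper, not the package's actual code:

```shell
# Hypothetical pre-flight check: report which required variables for a
# given embedding provider are missing from the environment.
required_vars_for() {
  case "$1" in
    openai)    echo "OPENAI_API_KEY" ;;
    vertex)    echo "GOOGLE_APPLICATION_CREDENTIALS" ;;
    gemini)    echo "GOOGLE_API_KEY" ;;
    aws)       echo "AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION" ;;
    microsoft) echo "AZURE_OPENAI_API_KEY AZURE_OPENAI_API_INSTANCE_NAME AZURE_OPENAI_API_DEPLOYMENT_NAME AZURE_OPENAI_API_VERSION" ;;
  esac
}

missing_vars() {
  missing=""
  for name in $(required_vars_for "$1"); do
    eval "value=\${$name:-}"           # look up the variable named in $name
    [ -n "$value" ] || missing="$missing $name"
  done
  echo $missing                        # unquoted: collapses the leading space
}
```

Running `missing_vars aws` before `docker run`, for example, surfaces a forgotten `AWS_REGION` early instead of at embedding time.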
+
+### Vector Dimensions
+
+The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.
+
+For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.
+
+These variables can be set regardless of how you run the server (Docker, npx, or from source).
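The fixed 1536-dimension constraint described above reduces to a trivial guard. An illustrative sketch (the helper is ours; the model dimensions in the comments are the published values for those OpenAI models):

```shell
# The schema stores vectors in a fixed 1536-wide column, so a model is
# usable only if its embedding dimension is at most 1536 (or the provider
# can reduce dimensions, as Gemini does).
MAX_DIM=1536
fits_schema() {
  [ "$1" -le "$MAX_DIM" ]
}

fits_schema 1536 && echo "text-embedding-3-small (1536 dims): ok"
fits_schema 3072 || echo "text-embedding-3-large (3072 dims): needs reduction"
```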
+
+## Running the MCP Server
+
+There are two ways to run the docs-mcp-server:
+
+### Option 1: Using Docker (Recommended)
+
+This is the recommended approach for most users. It's easy, straightforward, and doesn't require Node.js to be installed.
+
+1. **Ensure Docker is installed and running.**
+2. **Configure your MCP settings:**
+
+   **Claude/Cline/Roo Configuration Example:**
+   Add the following configuration block to your MCP settings file (adjust path as needed):
+
+   ```json
+   {
+     "mcpServers": {
+       "docs-mcp-server": {
+         "command": "docker",
+         "args": [
+           "run",
+           "-i",
+           "--rm",
+           "-e",
+           "OPENAI_API_KEY",
+           "-v",
+           "docs-mcp-data:/data",
+           "ghcr.io/arabold/docs-mcp-server:latest"
+         ],
+         "env": {
+           "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
+         },
+         "disabled": false,
+         "autoApprove": []
+       }
+     }
+   }
+   ```
+
+   Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+
+3. **That's it!** The server will now be available to your AI assistant.
+
+**Docker Container Settings:**
+
+- `-i`: Keep STDIN open, crucial for MCP communication over stdio.
+- `--rm`: Automatically remove the container when it exits.
+- `-e OPENAI_API_KEY`: **Required.** Set your OpenAI API key.
+- `-v docs-mcp-data:/data`: **Required for persistence.** Mounts a Docker named volume `docs-mcp-data` to store the database. You can replace with a specific host path if preferred (e.g., `-v /path/on/host:/data`).
 
-
+Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed to the container using the `-e` flag. For example:
 
-
+```bash
+# Example 1: Using OpenAI embeddings (default)
+docker run -i --rm \
+  -e OPENAI_API_KEY="your-key-here" \
+  -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest
+
+# Example 2: Using OpenAI-compatible API (like Ollama)
+docker run -i --rm \
+  -e OPENAI_API_KEY="your-key-here" \
+  -e OPENAI_API_BASE="http://localhost:11434/v1" \
+  -e DOCS_MCP_EMBEDDING_MODEL="embeddings" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest
+
+# Example 3a: Using Google Cloud Vertex AI embeddings
+# (OPENAI_API_KEY is kept for fallback to OpenAI)
+docker run -i --rm \
+  -e OPENAI_API_KEY="your-openai-key" \
+  -e DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004" \
+  -e GOOGLE_APPLICATION_CREDENTIALS="/app/gcp-key.json" \
+  -v docs-mcp-data:/data \
+  -v /path/to/gcp-key.json:/app/gcp-key.json:ro \
+  ghcr.io/arabold/docs-mcp-server:latest
+
+# Example 3b: Using Google Generative AI (Gemini) embeddings
+# (OPENAI_API_KEY is kept for fallback to OpenAI)
+docker run -i --rm \
+  -e OPENAI_API_KEY="your-openai-key" \
+  -e DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001" \
+  -e GOOGLE_API_KEY="your-google-api-key" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest
+
+# Example 4: Using AWS Bedrock embeddings
+docker run -i --rm \
+  -e AWS_ACCESS_KEY_ID="your-aws-key" \
+  -e AWS_SECRET_ACCESS_KEY="your-aws-secret" \
+  -e AWS_REGION="us-east-1" \
+  -e DOCS_MCP_EMBEDDING_MODEL="aws:amazon.titan-embed-text-v1" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest
+
+# Example 5: Using Azure OpenAI embeddings
+docker run -i --rm \
+  -e AZURE_OPENAI_API_KEY="your-azure-key" \
+  -e AZURE_OPENAI_API_INSTANCE_NAME="your-instance" \
+  -e AZURE_OPENAI_API_DEPLOYMENT_NAME="your-deployment" \
+  -e AZURE_OPENAI_API_VERSION="2024-02-01" \
+  -e DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest
+```
+
+### Option 2: Using npx
+
+This approach is recommended when you need local file access (e.g., indexing documentation from your local file system). While this can also be achieved by mounting paths into a Docker container, using npx is simpler but requires a Node.js installation.
+
+1. **Ensure Node.js is installed.**
+2. **Configure your MCP settings:**
+
+   **Claude/Cline/Roo Configuration Example:**
+   Add the following configuration block to your MCP settings file:
+
+   ```json
+   {
+     "mcpServers": {
+       "docs-mcp-server": {
+         "command": "npx",
+         "args": ["-y", "--package=@arabold/docs-mcp-server", "docs-server"],
+         "env": {
+           "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
+         },
+         "disabled": false,
+         "autoApprove": []
+       }
+     }
+   }
+   ```
+
+   Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+
+3. **That's it!** The server will now be available to your AI assistant.
+
+## Using the CLI
+
+You can use the CLI to manage documentation directly, either via Docker or npx. **Important: Use the same method (Docker or npx) for both the server and CLI to ensure access to the same indexed documentation.**
+
+### Using Docker CLI
+
+If you're running the server with Docker, use Docker for the CLI as well:
+
+```bash
+docker run --rm \
+  -e OPENAI_API_KEY="your-openai-api-key-here" \
+  -v docs-mcp-data:/data \
+  ghcr.io/arabold/docs-mcp-server:latest \
+  docs-cli <command> [options]
+```
+
+Make sure to use the same volume name (`docs-mcp-data` in this example) as you did for the server. Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed using `-e` flags, just like with the server.
 
-
+### Using npx CLI
+
+If you're running the server with npx, use npx for the CLI as well:
+
+```bash
+npx -y --package=@arabold/docs-mcp-server docs-cli <command> [options]
+```
+
+The npx approach will use the default data directory on your system (typically in your home directory), ensuring consistency between server and CLI.
+
+(See "CLI Command Reference" below for available commands and options.)
+
+### CLI Command Reference
+
+The `docs-cli` provides commands for managing the documentation index. Access it either via Docker (`docker run -v docs-mcp-data:/data ghcr.io/arabold/docs-mcp-server:latest docs-cli ...`) or `npx` (`npx -y --package=@arabold/docs-mcp-server docs-cli ...`).
 
 **General Help:**
 
@@ -140,7 +252,7 @@ docs-cli scrape --help
 docs-cli search --help
 docs-cli find-version --help
 docs-cli remove --help
-docs-cli list
+docs-cli list --help
 ```
 
 ### Scraping Documentation (`scrape`)
@@ -164,11 +276,8 @@ docs-cli scrape <library> <url> [options]
 **Examples:**
 
 ```bash
-# Scrape React 18.2.0 docs
+# Scrape React 18.2.0 docs
 docs-cli scrape react --version 18.2.0 https://react.dev/
-
-# Scrape React docs without a specific version (using npx)
-npx -y --package=@arabold/docs-mcp-server docs-cli scrape react https://react.dev/
 ```
 
 ### Searching Documentation (`search`)
@@ -194,9 +303,6 @@ docs-cli search <library> <query> [options]
 ```bash
 # Search latest React docs for 'hooks'
 docs-cli search react 'hooks'
-
-# Search React 18.x docs for 'hooks' (using npx)
-npx -y --package=@arabold/docs-mcp-server docs-cli search react --version 18.x 'hooks'
 ```
 
 ### Finding Available Versions (`find-version`)
@@ -218,12 +324,12 @@ docs-cli find-version <library> [options]
 docs-cli find-version react
 ```
 
-### Listing Libraries (`list
+### Listing Libraries (`list`)
 
 Lists all libraries currently indexed in the store.
 
 ```bash
-docs-cli list
+docs-cli list
 ```
 
 ### Removing Documentation (`remove`)
@@ -330,6 +436,16 @@ This method is useful for contributing to the project or running un-published ve
 # Required: Your OpenAI API key for generating embeddings.
 OPENAI_API_KEY=your-api-key-here
 
+# Optional: Your OpenAI Organization ID (handled automatically by LangChain if set)
+OPENAI_ORG_ID=
+
+# Optional: Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs)
+OPENAI_API_BASE=
+
+# Optional: Embedding model name (defaults to "text-embedding-3-small")
+# Examples: text-embedding-3-large, text-embedding-ada-002
+DOCS_MCP_EMBEDDING_MODEL=
+
 # Optional: Specify a custom directory to store the SQLite database file (documents.db).
 # If set, this path takes precedence over the default locations.
 # Default behavior (if unset):