@arabold/docs-mcp-server 1.5.0 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,12 +4,12 @@ A MCP server for fetching and searching 3rd party package documentation.
 
 ## ✨ Key Features
 
- - 🌐 **Scrape & Index:** Fetch documentation from web sources or local files.
- - 🧠 **Smart Processing:** Utilize semantic splitting and OpenAI embeddings for meaningful content chunks.
- - 💾 **Efficient Storage:** Store data in SQLite, leveraging `sqlite-vec` for vector search and FTS5 for full-text search.
- - 🔍 **Hybrid Search:** Combine vector and full-text search for relevant results across different library versions.
- - ⚙️ **Job Management:** Handle scraping tasks asynchronously with a robust job queue and management tools (MCP & CLI).
- - 🐳 **Easy Deployment:** Run the server easily using the provided Docker image.
+ - 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
+ - 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
+ - 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
+ - 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
+ - ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
+ - 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.
 
 ## Overview
 
@@ -26,104 +26,216 @@ The server exposes MCP tools for:
 - Finding appropriate versions (`find_version`).
 - Removing indexed documents (`remove_docs`).
 
- ## Usage
+ ## Configuration
 
- Once the package is published to npm (`@arabold/docs-mcp-server`), you can run the server or the companion CLI in two main ways:
+ The following environment variables are supported to configure the embedding model behavior:
 
- ### Method 1: Global Installation (Recommended for CLI Usage)
+ ### Embedding Model Configuration
 
- Install the package globally using npm. This makes the `docs-server` and `docs-cli` commands directly available in your terminal.
+ - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables:
 
- 1. **Install Globally:**
-    ```bash
-    npm install -g @arabold/docs-mcp-server
-    ```
- 2. **Run the Server:**
-    ```bash
-    docs-server
-    ```
-    _(Note: You'll need to manage environment variables like `OPENAI_API_KEY` yourself when running this way, e.g., by setting them in your shell profile or using a tool like `dotenv`.)_
- 3. **Run the CLI:**
-    ```bash
-    docs-cli <command> [options]
-    ```
-    (See "CLI Command Reference" below for available commands and options.)
+   - `openai` (default): Uses OpenAI's embedding models
 
- This method is convenient if you plan to use the `docs-cli` frequently.
+     - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+     - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+     - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
 
- ### Method 2: Running with Docker (Recommended for MCP Integration)
+   - `vertex`: Uses Google Cloud Vertex AI embeddings
 
- Run the server using the pre-built Docker image available on GitHub Container Registry. This provides an isolated environment and simplifies setup.
+     - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file
 
- 1. **Ensure Docker is installed and running.**
- 2. **Run the Server (e.g., for MCP Integration):**
+   - `gemini`: Uses Google Generative AI (Gemini) embeddings
 
-    ```bash
-    docker run -i --rm \
-      -e OPENAI_API_KEY="your-openai-api-key-here" \
-      -v docs-mcp-data:/data \
-      ghcr.io/arabold/docs-mcp-server:latest
-    ```
+     - `GOOGLE_API_KEY`: **Required.** Your Google API key
 
-    - `-i`: Keep STDIN open, crucial for MCP communication over stdio.
-    - `--rm`: Automatically remove the container when it exits.
-    - `-e OPENAI_API_KEY="..."`: **Required.** Set your OpenAI API key.
-    - `-v docs-mcp-data:/data`: **Required for persistence.** Mounts a Docker named volume `docs-mcp-data` to the container's `/data` directory, where the database is stored. You can replace `docs-mcp-data` with a specific host path if preferred (e.g., `-v /path/on/host:/data`).
-    - `ghcr.io/arabold/docs-mcp-server:latest`: Specifies the public Docker image to use.
-
-    This is the recommended approach for integrating with tools like Claude Desktop or Cline.
-
-    **Claude/Cline Configuration Example:**
-    Add the following configuration block to your MCP settings file (adjust path as needed):
-
-    - Cline: `/Users/andrerabold/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json`
-    - Claude Desktop (MacOS): `~/Library/Application Support/Claude/claude_desktop_config.json`
-    - Claude Desktop (Windows): `%APPDATA%/Claude/claude_desktop_config.json`
-
-    ```json
-    {
-      "mcpServers": {
-        "docs-mcp-server": {
-          "command": "docker",
-          "args": [
-            "run",
-            "-i",
-            "--rm",
-            "-e",
-            "OPENAI_API_KEY",
-            "-v",
-            "docs-mcp-data:/data",
-            "ghcr.io/arabold/docs-mcp-server:latest"
-          ],
-          "env": {
-            "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
-          },
-          "disabled": false,
-          "autoApprove": []
-        }
-        // ... other servers might be listed here
-      }
-    }
-    ```
+   - `aws`: Uses AWS Bedrock embeddings
 
-    Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+     - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+     - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+     - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
 
- 3. **Run the CLI (Requires Docker):**
-    To use the CLI commands, you can run them inside a temporary container:
-    ```bash
-    docker run --rm \
-      -e OPENAI_API_KEY="your-openai-api-key-here" \
-      -v docs-mcp-data:/data \
-      ghcr.io/arabold/docs-mcp-server:latest \
-      npx docs-cli <command> [options]
-    ```
-    (See "CLI Command Reference" below for available commands and options.)
+   - `microsoft`: Uses Azure OpenAI embeddings
+     - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+     - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+     - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+     - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
+
+ ### Vector Dimensions
+
+ The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.
+
+ For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.
+
+ These variables can be set regardless of how you run the server (Docker, npx, or from source).
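Editor's note (not part of the package diff): the `provider:model_name` format described above splits at the first colon, and a value with no colon implies the default `openai` provider. A minimal POSIX-shell sketch of that rule, for illustration only — the package's actual parsing code may differ:

```shell
# Sketch: interpret DOCS_MCP_EMBEDDING_MODEL as provider:model_name,
# defaulting the provider to "openai" when no colon is present.
MODEL_SPEC="${DOCS_MCP_EMBEDDING_MODEL:-text-embedding-3-small}"
case "$MODEL_SPEC" in
  *:*) provider="${MODEL_SPEC%%:*}"; model="${MODEL_SPEC#*:}" ;;
  *)   provider="openai";            model="$MODEL_SPEC" ;;
esac
echo "provider=$provider model=$model"
```

For example, `DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004"` yields `provider=vertex model=text-embedding-004`, while the unset default yields `provider=openai model=text-embedding-3-small`.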
+
+ ## Running the MCP Server
+
+ There are two ways to run the docs-mcp-server:
+
+ ### Option 1: Using Docker (Recommended)
+
+ This is the recommended approach for most users. It's easy, straightforward, and doesn't require Node.js to be installed.
+
+ 1. **Ensure Docker is installed and running.**
+ 2. **Configure your MCP settings:**
+
+    **Claude/Cline/Roo Configuration Example:**
+    Add the following configuration block to your MCP settings file (adjust path as needed):
+
+    ```json
+    {
+      "mcpServers": {
+        "docs-mcp-server": {
+          "command": "docker",
+          "args": [
+            "run",
+            "-i",
+            "--rm",
+            "-e",
+            "OPENAI_API_KEY",
+            "-v",
+            "docs-mcp-data:/data",
+            "ghcr.io/arabold/docs-mcp-server:latest"
+          ],
+          "env": {
+            "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
+          },
+          "disabled": false,
+          "autoApprove": []
+        }
+      }
+    }
+    ```
+
+    Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+
+ 3. **That's it!** The server will now be available to your AI assistant.
+
+ **Docker Container Settings:**
+
+ - `-i`: Keep STDIN open, crucial for MCP communication over stdio.
+ - `--rm`: Automatically remove the container when it exits.
+ - `-e OPENAI_API_KEY`: **Required.** Set your OpenAI API key.
+ - `-v docs-mcp-data:/data`: **Required for persistence.** Mounts a Docker named volume `docs-mcp-data` to store the database. You can replace this with a specific host path if preferred (e.g., `-v /path/on/host:/data`).
 
- This method is ideal for integrating the server into other tools and ensures a consistent runtime environment.
+ Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed to the container using the `-e` flag. For example:
 
- ## CLI Command Reference
+ ```bash
+ # Example 1: Using OpenAI embeddings (default)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-key-here" \
+   -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 2: Using OpenAI-compatible API (like Ollama)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-key-here" \
+   -e OPENAI_API_BASE="http://localhost:11434/v1" \
+   -e DOCS_MCP_EMBEDDING_MODEL="embeddings" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3a: Using Google Cloud Vertex AI embeddings
+ # (OPENAI_API_KEY is kept as a fallback to OpenAI)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-openai-key" \
+   -e DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004" \
+   -e GOOGLE_APPLICATION_CREDENTIALS="/app/gcp-key.json" \
+   -v docs-mcp-data:/data \
+   -v /path/to/gcp-key.json:/app/gcp-key.json:ro \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3b: Using Google Generative AI (Gemini) embeddings
+ # (OPENAI_API_KEY is kept as a fallback to OpenAI)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-openai-key" \
+   -e DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001" \
+   -e GOOGLE_API_KEY="your-google-api-key" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 4: Using AWS Bedrock embeddings
+ docker run -i --rm \
+   -e AWS_ACCESS_KEY_ID="your-aws-key" \
+   -e AWS_SECRET_ACCESS_KEY="your-aws-secret" \
+   -e AWS_REGION="us-east-1" \
+   -e DOCS_MCP_EMBEDDING_MODEL="aws:amazon.titan-embed-text-v1" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 5: Using Azure OpenAI embeddings
+ docker run -i --rm \
+   -e AZURE_OPENAI_API_KEY="your-azure-key" \
+   -e AZURE_OPENAI_API_INSTANCE_NAME="your-instance" \
+   -e AZURE_OPENAI_API_DEPLOYMENT_NAME="your-deployment" \
+   -e AZURE_OPENAI_API_VERSION="2024-02-01" \
+   -e DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+ ```
+
+ ### Option 2: Using npx
+
+ This approach is recommended when you need local file access (e.g., indexing documentation from your local file system). While this can also be achieved by mounting paths into a Docker container, using npx is simpler but requires a Node.js installation.
+
+ 1. **Ensure Node.js is installed.**
+ 2. **Configure your MCP settings:**
+
+    **Claude/Cline/Roo Configuration Example:**
+    Add the following configuration block to your MCP settings file:
+
+    ```json
+    {
+      "mcpServers": {
+        "docs-mcp-server": {
+          "command": "npx",
+          "args": ["-y", "--package=@arabold/docs-mcp-server", "docs-server"],
+          "env": {
+            "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
+          },
+          "disabled": false,
+          "autoApprove": []
+        }
+      }
+    }
+    ```
+
+    Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+
+ 3. **That's it!** The server will now be available to your AI assistant.
+
+ ## Using the CLI
+
+ You can use the CLI to manage documentation directly, either via Docker or npx. **Important: Use the same method (Docker or npx) for both the server and CLI to ensure access to the same indexed documentation.**
+
+ ### Using Docker CLI
+
+ If you're running the server with Docker, use Docker for the CLI as well:
+
+ ```bash
+ docker run --rm \
+   -e OPENAI_API_KEY="your-openai-api-key-here" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest \
+   docs-cli <command> [options]
+ ```
+
+ Make sure to use the same volume name (`docs-mcp-data` in this example) as you did for the server. Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed using `-e` flags, just like with the server.
 
- The `docs-cli` provides commands for managing the documentation index. Access it either via global installation (`docs-cli ...`) or `npx` (`npx -y --package=@arabold/docs-mcp-server docs-cli ...`).
+ ### Using npx CLI
+
+ If you're running the server with npx, use npx for the CLI as well:
+
+ ```bash
+ npx -y --package=@arabold/docs-mcp-server docs-cli <command> [options]
+ ```
+
+ The npx approach will use the default data directory on your system (typically in your home directory), ensuring consistency between server and CLI.
+
+ (See "CLI Command Reference" below for available commands and options.)
+
+ ### CLI Command Reference
+
+ The `docs-cli` provides commands for managing the documentation index. Access it either via Docker (`docker run -v docs-mcp-data:/data ghcr.io/arabold/docs-mcp-server:latest docs-cli ...`) or `npx` (`npx -y --package=@arabold/docs-mcp-server docs-cli ...`).
 
 **General Help:**
 
@@ -140,7 +252,7 @@ docs-cli scrape --help
 docs-cli search --help
 docs-cli find-version --help
 docs-cli remove --help
- docs-cli list-libraries --help
+ docs-cli list --help
 ```
 
 ### Scraping Documentation (`scrape`)
@@ -164,11 +276,8 @@ docs-cli scrape <library> <url> [options]
 **Examples:**
 
 ```bash
- # Scrape React 18.2.0 docs (assuming global install)
+ # Scrape React 18.2.0 docs
 docs-cli scrape react --version 18.2.0 https://react.dev/
-
- # Scrape React docs without a specific version (using npx)
- npx -y --package=@arabold/docs-mcp-server docs-cli scrape react https://react.dev/
 ```
 
 ### Searching Documentation (`search`)
@@ -194,9 +303,6 @@ docs-cli search <library> <query> [options]
 ```bash
 # Search latest React docs for 'hooks'
 docs-cli search react 'hooks'
-
- # Search React 18.x docs for 'hooks' (using npx)
- npx -y --package=@arabold/docs-mcp-server docs-cli search react --version 18.x 'hooks'
 ```
 
 ### Finding Available Versions (`find-version`)
@@ -218,12 +324,12 @@ docs-cli find-version <library> [options]
 docs-cli find-version react
 ```
 
- ### Listing Libraries (`list-libraries`)
+ ### Listing Libraries (`list`)
 
 Lists all libraries currently indexed in the store.
 
 ```bash
- docs-cli list-libraries
+ docs-cli list
 ```
 
 ### Removing Documentation (`remove`)
@@ -330,6 +436,16 @@ This method is useful for contributing to the project or running un-published versions.
 # Required: Your OpenAI API key for generating embeddings.
 OPENAI_API_KEY=your-api-key-here
 
+ # Optional: Your OpenAI Organization ID (handled automatically by LangChain if set)
+ OPENAI_ORG_ID=
+
+ # Optional: Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs)
+ OPENAI_API_BASE=
+
+ # Optional: Embedding model name (defaults to "text-embedding-3-small")
+ # Examples: text-embedding-3-large, text-embedding-ada-002
+ DOCS_MCP_EMBEDDING_MODEL=
+
 # Optional: Specify a custom directory to store the SQLite database file (documents.db).
 # If set, this path takes precedence over the default locations.
 # Default behavior (if unset):
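As an illustrative aside, a minimal `.env` combining the required variable with the new optional ones added above might look like this (values are placeholders; only `OPENAI_API_KEY` is required):

```shell
# Minimal .env sketch (placeholder values)
OPENAI_API_KEY=sk-proj-your-key-here
# Optional overrides:
DOCS_MCP_EMBEDDING_MODEL=text-embedding-3-small
```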