@arabold/docs-mcp-server 1.6.0 → 1.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +87 -14
- package/dist/EmbeddingFactory-6UEXNF44.js +1177 -0
- package/dist/EmbeddingFactory-6UEXNF44.js.map +1 -0
- package/dist/{chunk-S7C2LRQA.js → chunk-FAZDXJQN.js} +25 -132
- package/dist/chunk-FAZDXJQN.js.map +1 -0
- package/dist/chunk-YCXNASA6.js +124 -0
- package/dist/chunk-YCXNASA6.js.map +1 -0
- package/dist/cli.js +6 -2
- package/dist/cli.js.map +1 -1
- package/dist/server.js +2 -1
- package/dist/server.js.map +1 -1
- package/package.json +4 -1
- package/dist/chunk-S7C2LRQA.js.map +0 -1
package/README.md
CHANGED
@@ -4,12 +4,12 @@ A MCP server for fetching and searching 3rd party package documentation.

  ## ✨ Key Features

- - 🌐 **
- - 🧠 **
- - 💾 **
- - 🔍 **Hybrid Search:** Combine vector and full-text search
- - ⚙️ **Job
- - 🐳 **
+ - 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
+ - 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
+ - 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
+ - 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
+ - ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
+ - 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.

  ## Overview
@@ -28,14 +28,43 @@ The server exposes MCP tools for:
  ## Configuration

- The following environment variables are supported to configure the
+ The following environment variables are supported to configure the embedding model behavior:

- - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID (handled automatically by LangChain if set).
- - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs).
- - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Embedding model name (defaults to "text-embedding-3-small"). Must produce vectors with ≤1536 dimensions. Smaller dimensions are automatically padded with zeros.
+ ### Embedding Model Configuration
+
+ - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables:
+   - `openai` (default): Uses OpenAI's embedding models
+     - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+     - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+     - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
+   - `vertex`: Uses Google Cloud Vertex AI embeddings
+     - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file
+   - `gemini`: Uses Google Generative AI (Gemini) embeddings
+     - `GOOGLE_API_KEY`: **Required.** Your Google API key
+   - `aws`: Uses AWS Bedrock embeddings
+     - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+     - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+     - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
+   - `microsoft`: Uses Azure OpenAI embeddings
+     - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+     - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+     - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+     - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
+
+ ### Vector Dimensions
+
+ The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.
+
+ For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.

  These variables can be set regardless of how you run the server (Docker, npx, or from source).
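The `provider:model_name` convention added in this hunk can be sketched as follows. This is a hypothetical illustration of how such a string splits, not the package's actual parsing code:

```typescript
// Sketch of the `provider:model_name` format from DOCS_MCP_EMBEDDING_MODEL.
// Hypothetical helper; the package's real implementation may differ.
function parseEmbeddingModel(spec: string): { provider: string; model: string } {
  const idx = spec.indexOf(":");
  if (idx === -1) {
    // No provider prefix: fall back to the default `openai` provider.
    return { provider: "openai", model: spec };
  }
  // Split on the first ":" so model names themselves stay intact.
  return { provider: spec.slice(0, idx), model: spec.slice(idx + 1) };
}

// "vertex:text-embedding-004"  -> provider "vertex", model "text-embedding-004"
// "text-embedding-3-small"     -> provider "openai", model "text-embedding-3-small"
```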
@@ -92,10 +121,54 @@ This is the recommended approach for most users. It's easy, straightforward, and
  Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed to the container using the `-e` flag. For example:

  ```bash
+ # Example 1: Using OpenAI embeddings (default)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-key-here" \
+   -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 2: Using an OpenAI-compatible API (like Ollama)
  docker run -i --rm \
    -e OPENAI_API_KEY="your-key-here" \
-   -e
-   -e
+   -e OPENAI_API_BASE="http://localhost:11434/v1" \
+   -e DOCS_MCP_EMBEDDING_MODEL="embeddings" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3a: Using Google Cloud Vertex AI embeddings
+ # (OPENAI_API_KEY is kept for fallback to OpenAI)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-openai-key" \
+   -e DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004" \
+   -e GOOGLE_APPLICATION_CREDENTIALS="/app/gcp-key.json" \
+   -v docs-mcp-data:/data \
+   -v /path/to/gcp-key.json:/app/gcp-key.json:ro \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3b: Using Google Generative AI (Gemini) embeddings
+ # (OPENAI_API_KEY is kept for fallback to OpenAI)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-openai-key" \
+   -e DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001" \
+   -e GOOGLE_API_KEY="your-google-api-key" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 4: Using AWS Bedrock embeddings
+ docker run -i --rm \
+   -e AWS_ACCESS_KEY_ID="your-aws-key" \
+   -e AWS_SECRET_ACCESS_KEY="your-aws-secret" \
+   -e AWS_REGION="us-east-1" \
+   -e DOCS_MCP_EMBEDDING_MODEL="aws:amazon.titan-embed-text-v1" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 5: Using Azure OpenAI embeddings
+ docker run -i --rm \
+   -e AZURE_OPENAI_API_KEY="your-azure-key" \
+   -e AZURE_OPENAI_API_INSTANCE_NAME="your-instance" \
+   -e AZURE_OPENAI_API_DEPLOYMENT_NAME="your-deployment" \
+   -e AZURE_OPENAI_API_VERSION="2024-02-01" \
+   -e DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
    -v docs-mcp-data:/data \
    ghcr.io/arabold/docs-mcp-server:latest
  ```
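The fixed 1536-dimension schema described in the Configuration hunk implies that shorter vectors must be brought up to size before storage. A minimal sketch, assuming the zero-padding behavior the 1.6.0 README described; this is an illustration, not the package's actual storage code:

```typescript
// Sketch: fit an embedding into the fixed 1536-dimension schema.
// Assumes zero-padding for shorter vectors (per the 1.6.0 README);
// hypothetical helper, not the package's real implementation.
const SCHEMA_DIMENSION = 1536;

function padEmbedding(vector: number[], dim: number = SCHEMA_DIMENSION): number[] {
  if (vector.length > dim) {
    // Vectors longer than the schema dimension cannot be stored.
    throw new Error(`Model produces ${vector.length}-dim vectors; max supported is ${dim}`);
  }
  // Append zeros until the vector matches the schema dimension.
  return vector.concat(new Array(dim - vector.length).fill(0));
}
```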