@arabold/docs-mcp-server 1.6.0 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,12 +4,12 @@ An MCP server for fetching and searching 3rd party package documentation.
 
  ## ✨ Key Features
 
- - 🌐 **Scrape & Index:** Fetch documentation from web sources or local files.
- - 🧠 **Smart Processing:** Utilize semantic splitting and OpenAI embeddings for meaningful content chunks.
- - 💾 **Efficient Storage:** Store data in SQLite, leveraging `sqlite-vec` for vector search and FTS5 for full-text search.
- - 🔍 **Hybrid Search:** Combine vector and full-text search for relevant results across different library versions.
- - ⚙️ **Job Management:** Handle scraping tasks asynchronously with a robust job queue and management tools (MCP & CLI).
- - 🐳 **Easy Deployment:** Run the server easily using Docker or npx.
+ - 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
+ - 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
+ - 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
+ - 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
+ - ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
+ - 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.
 
  ## Overview
 
@@ -28,14 +28,43 @@ The server exposes MCP tools for:
 
  ## Configuration
 
- The following environment variables are supported to configure the OpenAI API and embedding behavior:
+ The following environment variables are supported to configure the embedding model behavior:
 
- - `OPENAI_API_KEY`: **Required.** Your OpenAI API key for generating embeddings.
- - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID (handled automatically by LangChain if set).
- - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs).
- - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Embedding model name (defaults to "text-embedding-3-small"). Must produce vectors with ≤1536 dimensions. Smaller dimensions are automatically padded with zeros.
+ ### Embedding Model Configuration
 
- The database schema uses a fixed dimension of 1536 for embedding vectors. Models that produce larger vectors are not supported and will cause an error. Models with smaller vectors (e.g., older embedding models) are automatically padded with zeros to match the required dimension.
+ - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables are listed below, with a usage example after the list:
+
+   - `openai` (default): Uses OpenAI's embedding models
+
+     - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+     - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+     - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
+
+   - `vertex`: Uses Google Cloud Vertex AI embeddings
+
+     - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file
+
+   - `gemini`: Uses Google Generative AI (Gemini) embeddings
+
+     - `GOOGLE_API_KEY`: **Required.** Your Google API key
+
+   - `aws`: Uses AWS Bedrock embeddings
+
+     - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+     - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+     - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
+
+   - `microsoft`: Uses Azure OpenAI embeddings
+
+     - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+     - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+     - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+     - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
+
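+ For example, to select Gemini embeddings when launching via npx (a minimal sketch; the exact npx invocation may differ for your setup, and `gemini:embedding-001` is the model name used in the Docker examples below):
+
+ ```bash
+ # Gemini provider: requires a Google API key (see the list above)
+ export GOOGLE_API_KEY="your-google-api-key"
+ export DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001"
+ # Assumed invocation; adjust to however you normally run the server
+ npx -y @arabold/docs-mcp-server
+ ```
+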
+ ### Vector Dimensions
+
+ The database schema uses a fixed dimension of 1536 for embedding vectors. Models that produce vectors with more than 1536 dimensions are not supported, unless the provider (such as Gemini) supports reducing the output dimension to fit.
+
+ For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.
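+
+ For example, a sketch for a local Ollama instance (the model name `nomic-embed-text` is an assumption; substitute whichever embedding model you have pulled):
+
+ ```bash
+ # Ollama's OpenAI-compatible endpoint ignores the key value,
+ # but the openai provider still requires the variable to be set
+ export OPENAI_API_KEY="ollama"
+ export OPENAI_API_BASE="http://localhost:11434/v1"
+ export DOCS_MCP_EMBEDDING_MODEL="nomic-embed-text"
+ ```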
 
  These variables can be set regardless of how you run the server (Docker, npx, or from source).
 
@@ -92,10 +121,54 @@ This is the recommended approach for most users. It's easy, straightforward, and
  Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed to the container using the `-e` flag. For example:
 
  ```bash
+ # Example 1: Using OpenAI embeddings (default)
+ docker run -i --rm \
+ -e OPENAI_API_KEY="your-key-here" \
+ -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small" \
+ -v docs-mcp-data:/data \
+ ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 2: Using an OpenAI-compatible API (like Ollama)
  docker run -i --rm \
  -e OPENAI_API_KEY="your-key-here" \
- -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-large" \
- -e OPENAI_API_BASE="http://your-api-endpoint" \
+ -e OPENAI_API_BASE="http://localhost:11434/v1" \
+ -e DOCS_MCP_EMBEDDING_MODEL="embeddings" \
+ -v docs-mcp-data:/data \
+ ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3a: Using Google Cloud Vertex AI embeddings
+ # (OPENAI_API_KEY is kept for fallback to OpenAI)
+ docker run -i --rm \
+ -e OPENAI_API_KEY="your-openai-key" \
+ -e DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004" \
+ -e GOOGLE_APPLICATION_CREDENTIALS="/app/gcp-key.json" \
+ -v docs-mcp-data:/data \
+ -v /path/to/gcp-key.json:/app/gcp-key.json:ro \
+ ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3b: Using Google Generative AI (Gemini) embeddings
+ # (OPENAI_API_KEY is kept for fallback to OpenAI)
+ docker run -i --rm \
+ -e OPENAI_API_KEY="your-openai-key" \
+ -e DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001" \
+ -e GOOGLE_API_KEY="your-google-api-key" \
+ -v docs-mcp-data:/data \
+ ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 4: Using AWS Bedrock embeddings
+ docker run -i --rm \
+ -e AWS_ACCESS_KEY_ID="your-aws-key" \
+ -e AWS_SECRET_ACCESS_KEY="your-aws-secret" \
+ -e AWS_REGION="us-east-1" \
+ -e DOCS_MCP_EMBEDDING_MODEL="aws:amazon.titan-embed-text-v1" \
+ -v docs-mcp-data:/data \
+ ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 5: Using Azure OpenAI embeddings
+ docker run -i --rm \
+ -e AZURE_OPENAI_API_KEY="your-azure-key" \
+ -e AZURE_OPENAI_API_INSTANCE_NAME="your-instance" \
+ -e AZURE_OPENAI_API_DEPLOYMENT_NAME="your-deployment" \
+ -e AZURE_OPENAI_API_VERSION="2024-02-01" \
+ -e DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
  -v docs-mcp-data:/data \
  ghcr.io/arabold/docs-mcp-server:latest
  ```