@arabold/docs-mcp-server 1.6.0 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,12 +4,12 @@ A MCP server for fetching and searching 3rd party package documentation.
 
 ## ✨ Key Features
 
- - 🌐 **Scrape & Index:** Fetch documentation from web sources or local files.
- - 🧠 **Smart Processing:** Utilize semantic splitting and OpenAI embeddings for meaningful content chunks.
- - 💾 **Efficient Storage:** Store data in SQLite, leveraging `sqlite-vec` for vector search and FTS5 for full-text search.
- - 🔍 **Hybrid Search:** Combine vector and full-text search for relevant results across different library versions.
- - ⚙️ **Job Management:** Handle scraping tasks asynchronously with a robust job queue and management tools (MCP & CLI).
- - 🐳 **Easy Deployment:** Run the server easily using Docker or npx.
+ - 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
+ - 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
+ - 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
+ - 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
+ - ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
+ - 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.
 
 ## Overview
 
@@ -25,17 +25,47 @@ The server exposes MCP tools for:
 - Listing indexed libraries (`list_libraries`).
 - Finding appropriate versions (`find_version`).
 - Removing indexed documents (`remove_docs`).
+ - Fetching a single URL and returning its content as Markdown (`fetch_url`).
 
 ## Configuration
 
- The following environment variables are supported to configure the OpenAI API and embedding behavior:
+ The following environment variables configure the embedding model behavior:
 
- - `OPENAI_API_KEY`: **Required.** Your OpenAI API key for generating embeddings.
- - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID (handled automatically by LangChain if set).
- - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs).
- - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Embedding model name (defaults to "text-embedding-3-small"). Must produce vectors with ≤1536 dimensions. Smaller dimensions are automatically padded with zeros.
+ ### Embedding Model Configuration
 
- The database schema uses a fixed dimension of 1536 for embedding vectors. Models that produce larger vectors are not supported and will cause an error. Models with smaller vectors (e.g., older embedding models) are automatically padded with zeros to match the required dimension.
+ - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables are listed below, followed by example settings.
+
+   - `openai` (default): Uses OpenAI's embedding models
+
+     - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+     - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+     - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
+
+   - `vertex`: Uses Google Cloud Vertex AI embeddings
+
+     - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file
+
+   - `gemini`: Uses Google Generative AI (Gemini) embeddings
+
+     - `GOOGLE_API_KEY`: **Required.** Your Google API key
+
+   - `aws`: Uses AWS Bedrock embeddings
+
+     - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+     - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+     - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
+
+   - `microsoft`: Uses Azure OpenAI embeddings
+
+     - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+     - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+     - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+     - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
+
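+ For example, to select a provider and model (model names are illustrative):
+
+ ```bash
+ # Default `openai` provider (no prefix needed)
+ export DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small"
+ # Explicit provider prefix
+ export DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004"
+ export DOCS_MCP_EMBEDDING_MODEL="aws:amazon.titan-embed-text-v1"
+ ```
+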
+ ### Vector Dimensions
+
+ The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, unless the provider (like Gemini) supports reducing the output dimension to fit.
+
+ For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.
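+
+ For example, a minimal sketch for a local Ollama endpoint (the model name is illustrative and must already be pulled in Ollama):
+
+ ```bash
+ export OPENAI_API_KEY="ollama"  # required by the provider; Ollama itself ignores the value
+ export OPENAI_API_BASE="http://localhost:11434/v1"
+ export DOCS_MCP_EMBEDDING_MODEL="nomic-embed-text"
+ ```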
 
 These variables can be set regardless of how you run the server (Docker, npx, or from source).
 
@@ -92,10 +122,54 @@ This is the recommended approach for most users. It's easy, straightforward, and
 Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed to the container using the `-e` flag. For example:
 
 ```bash
+ # Example 1: Using OpenAI embeddings (default)
 docker run -i --rm \
   -e OPENAI_API_KEY="your-key-here" \
-   -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-large" \
-   -e OPENAI_API_BASE="http://your-api-endpoint" \
+   -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 2: Using an OpenAI-compatible API (like Ollama)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-key-here" \
+   -e OPENAI_API_BASE="http://localhost:11434/v1" \
+   -e DOCS_MCP_EMBEDDING_MODEL="embeddings" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3a: Using Google Cloud Vertex AI embeddings
+ # (OPENAI_API_KEY is kept as a fallback to OpenAI)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-openai-key" \
+   -e DOCS_MCP_EMBEDDING_MODEL="vertex:text-embedding-004" \
+   -e GOOGLE_APPLICATION_CREDENTIALS="/app/gcp-key.json" \
+   -v docs-mcp-data:/data \
+   -v /path/to/gcp-key.json:/app/gcp-key.json:ro \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 3b: Using Google Generative AI (Gemini) embeddings
+ # (OPENAI_API_KEY is kept as a fallback to OpenAI)
+ docker run -i --rm \
+   -e OPENAI_API_KEY="your-openai-key" \
+   -e DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001" \
+   -e GOOGLE_API_KEY="your-google-api-key" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 4: Using AWS Bedrock embeddings
+ docker run -i --rm \
+   -e AWS_ACCESS_KEY_ID="your-aws-key" \
+   -e AWS_SECRET_ACCESS_KEY="your-aws-secret" \
+   -e AWS_REGION="us-east-1" \
+   -e DOCS_MCP_EMBEDDING_MODEL="aws:amazon.titan-embed-text-v1" \
+   -v docs-mcp-data:/data \
+   ghcr.io/arabold/docs-mcp-server:latest
+
+ # Example 5: Using Azure OpenAI embeddings
+ docker run -i --rm \
+   -e AZURE_OPENAI_API_KEY="your-azure-key" \
+   -e AZURE_OPENAI_API_INSTANCE_NAME="your-instance" \
+   -e AZURE_OPENAI_API_DEPLOYMENT_NAME="your-deployment" \
+   -e AZURE_OPENAI_API_VERSION="2024-02-01" \
+   -e DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
   -v docs-mcp-data:/data \
   ghcr.io/arabold/docs-mcp-server:latest
 ```
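 
 Note that inside Docker, `localhost` refers to the container itself. To reach a service such as Ollama running on the host machine, a sketch (assumes Docker Desktop's `host.docker.internal`; the `--add-host` flag makes it work on Linux too):
 
 ```bash
 docker run -i --rm \
   --add-host=host.docker.internal:host-gateway \
   -e OPENAI_API_KEY="your-key-here" \
   -e OPENAI_API_BASE="http://host.docker.internal:11434/v1" \
   -e DOCS_MCP_EMBEDDING_MODEL="embeddings" \
   -v docs-mcp-data:/data \
   ghcr.io/arabold/docs-mcp-server:latest
 ```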
@@ -177,11 +251,31 @@ npx -y --package=@arabold/docs-mcp-server docs-cli --help
 ```bash
 docs-cli scrape --help
 docs-cli search --help
+ docs-cli fetch-url --help
 docs-cli find-version --help
 docs-cli remove --help
 docs-cli list --help
 ```
 
+ ### Fetching Single URLs (`fetch-url`)
+
+ Fetches a single URL and converts its content to Markdown. Unlike `scrape`, this command does not crawl links or store the content.
+
+ ```bash
+ docs-cli fetch-url <url> [options]
+ ```
+
+ **Options:**
+
+ - `--no-follow-redirects`: Disable following HTTP redirects (default: follow redirects)
+
+ **Examples:**
+
+ ```bash
+ # Fetch a URL and convert to Markdown
+ docs-cli fetch-url https://example.com/page.html
+ ```
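+
+ A usage sketch combining the option above (URL is illustrative):
+
+ ```bash
+ # Fetch without following HTTP redirects
+ docs-cli fetch-url https://example.com/page.html --no-follow-redirects
+ ```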
+
 ### Scraping Documentation (`scrape`)
 
 Scrapes and indexes documentation from a given URL for a specific library.
@@ -398,13 +492,14 @@ This project uses [semantic-release](https://github.com/semantic-release/semanti
 **How it works:**
 
 1. **Commit Messages:** All commits merged into the `main` branch **must** follow the Conventional Commits specification (see the example after this list).
- 2. **Automation:** The "Release" GitHub Actions workflow automatically runs `semantic-release` on pushes to `main`.
+ 2. **Manual Trigger:** The "Release" GitHub Actions workflow can be triggered manually from the Actions tab when you're ready to create a new release.
 3. **`semantic-release` Actions:** Determines version, updates `CHANGELOG.md` & `package.json`, commits, tags, publishes to npm, and creates a GitHub Release.
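 
 For example, illustrative commit messages and the release type `semantic-release` infers from each under its default rules:
 
 ```bash
 git commit -m "feat: add Gemini embedding provider"   # minor release
 git commit -m "fix: follow redirects in fetch-url"    # patch release
 git commit -m "docs: clarify Docker configuration"    # no release
 ```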
 
 **What you need to do:**
 
 - Use Conventional Commits.
- - Merge to `main`.
+ - Merge changes to `main`.
+ - When you're ready, trigger a release manually from the Actions tab in GitHub.
 
 **Automation handles:** Changelog, version bumps, tags, npm publish, GitHub releases.