@arabold/docs-mcp-server 1.5.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -9,7 +9,7 @@ A MCP server for fetching and searching 3rd party package documentation.
  - 💾 **Efficient Storage:** Store data in SQLite, leveraging `sqlite-vec` for vector search and FTS5 for full-text search.
  - 🔍 **Hybrid Search:** Combine vector and full-text search for relevant results across different library versions.
  - ⚙️ **Job Management:** Handle scraping tasks asynchronously with a robust job queue and management tools (MCP & CLI).
- - 🐳 **Easy Deployment:** Run the server easily using the provided Docker image.
+ - 🐳 **Easy Deployment:** Run the server easily using Docker or npx.
 
  ## Overview
 
@@ -26,104 +26,143 @@ The server exposes MCP tools for:
  - Finding appropriate versions (`find_version`).
  - Removing indexed documents (`remove_docs`).
 
- ## Usage
+ ## Configuration
 
- Once the package is published to npm (`@arabold/docs-mcp-server`), you can run the server or the companion CLI in two main ways:
+ The following environment variables are supported to configure the OpenAI API and embedding behavior:
 
- ### Method 1: Global Installation (Recommended for CLI Usage)
+ - `OPENAI_API_KEY`: **Required.** Your OpenAI API key for generating embeddings.
+ - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID (handled automatically by LangChain if set).
+ - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs).
+ - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Embedding model name (defaults to "text-embedding-3-small"). Must produce vectors with ≤1536 dimensions. Smaller dimensions are automatically padded with zeros.
 
- Install the package globally using npm. This makes the `docs-server` and `docs-cli` commands directly available in your terminal.
+ The database schema uses a fixed dimension of 1536 for embedding vectors. Models that produce larger vectors are not supported and will cause an error. Models with smaller vectors (e.g., older embedding models) are automatically padded with zeros to match the required dimension.
 
- 1. **Install Globally:**
- ```bash
- npm install -g @arabold/docs-mcp-server
- ```
- 2. **Run the Server:**
- ```bash
- docs-server
- ```
- _(Note: You'll need to manage environment variables like `OPENAI_API_KEY` yourself when running this way, e.g., by setting them in your shell profile or using a tool like `dotenv`.)_
- 3. **Run the CLI:**
- ```bash
- docs-cli <command> [options]
- ```
- (See "CLI Command Reference" below for available commands and options.)
+ These variables can be set regardless of how you run the server (Docker, npx, or from source).
 
- This method is convenient if you plan to use the `docs-cli` frequently.
+ ## Running the MCP Server
 
- ### Method 2: Running with Docker (Recommended for MCP Integration)
+ There are two ways to run the docs-mcp-server:
 
- Run the server using the pre-built Docker image available on GitHub Container Registry. This provides an isolated environment and simplifies setup.
+ ### Option 1: Using Docker (Recommended)
 
- 1. **Ensure Docker is installed and running.**
- 2. **Run the Server (e.g., for MCP Integration):**
+ This is the recommended approach for most users. It's easy, straightforward, and doesn't require Node.js to be installed.
 
- ```bash
- docker run -i --rm \
- -e OPENAI_API_KEY="your-openai-api-key-here" \
- -v docs-mcp-data:/data \
- ghcr.io/arabold/docs-mcp-server:latest
- ```
+ 1. **Ensure Docker is installed and running.**
+ 2. **Configure your MCP settings:**
 
- - `-i`: Keep STDIN open, crucial for MCP communication over stdio.
- - `--rm`: Automatically remove the container when it exits.
- - `-e OPENAI_API_KEY="..."`: **Required.** Set your OpenAI API key.
- - `-v docs-mcp-data:/data`: **Required for persistence.** Mounts a Docker named volume `docs-mcp-data` to the container's `/data` directory, where the database is stored. You can replace `docs-mcp-data` with a specific host path if preferred (e.g., `-v /path/on/host:/data`).
- - `ghcr.io/arabold/docs-mcp-server:latest`: Specifies the public Docker image to use.
-
- This is the recommended approach for integrating with tools like Claude Desktop or Cline.
-
- **Claude/Cline Configuration Example:**
- Add the following configuration block to your MCP settings file (adjust path as needed):
-
- - Cline: `/Users/andrerabold/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json`
- - Claude Desktop (MacOS): `~/Library/Application Support/Claude/claude_desktop_config.json`
- - Claude Desktop (Windows): `%APPDATA%/Claude/claude_desktop_config.json`
-
- ```json
- {
- "mcpServers": {
- "docs-mcp-server": {
- "command": "docker",
- "args": [
- "run",
- "-i",
- "--rm",
- "-e",
- "OPENAI_API_KEY",
- "-v",
- "docs-mcp-data:/data",
- "ghcr.io/arabold/docs-mcp-server:latest"
- ],
- "env": {
- "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
- },
- "disabled": false,
- "autoApprove": []
- }
- // ... other servers might be listed here
- }
- }
- ```
+ **Claude/Cline/Roo Configuration Example:**
+ Add the following configuration block to your MCP settings file (adjust path as needed):
 
- Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+ ```json
+ {
+ "mcpServers": {
+ "docs-mcp-server": {
+ "command": "docker",
+ "args": [
+ "run",
+ "-i",
+ "--rm",
+ "-e",
+ "OPENAI_API_KEY",
+ "-v",
+ "docs-mcp-data:/data",
+ "ghcr.io/arabold/docs-mcp-server:latest"
+ ],
+ "env": {
+ "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
+ },
+ "disabled": false,
+ "autoApprove": []
+ }
+ }
+ }
+ ```
 
- 3. **Run the CLI (Requires Docker):**
- To use the CLI commands, you can run them inside a temporary container:
- ```bash
- docker run --rm \
- -e OPENAI_API_KEY="your-openai-api-key-here" \
- -v docs-mcp-data:/data \
- ghcr.io/arabold/docs-mcp-server:latest \
- npx docs-cli <command> [options]
- ```
- (See "CLI Command Reference" below for available commands and options.)
+ Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
+
+ 3. **That's it!** The server will now be available to your AI assistant.
+
+ **Docker Container Settings:**
+
+ - `-i`: Keep STDIN open, crucial for MCP communication over stdio.
+ - `--rm`: Automatically remove the container when it exits.
+ - `-e OPENAI_API_KEY`: **Required.** Set your OpenAI API key.
+ - `-v docs-mcp-data:/data`: **Required for persistence.** Mounts a Docker named volume `docs-mcp-data` to store the database. You can replace with a specific host path if preferred (e.g., `-v /path/on/host:/data`).
+
+ Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed to the container using the `-e` flag. For example:
+
+ ```bash
+ docker run -i --rm \
+ -e OPENAI_API_KEY="your-key-here" \
+ -e DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-large" \
+ -e OPENAI_API_BASE="http://your-api-endpoint" \
+ -v docs-mcp-data:/data \
+ ghcr.io/arabold/docs-mcp-server:latest
+ ```
+
+ ### Option 2: Using npx
+
+ This approach is recommended when you need local file access (e.g., indexing documentation from your local file system). While this can also be achieved by mounting paths into a Docker container, using npx is simpler but requires a Node.js installation.
+
+ 1. **Ensure Node.js is installed.**
+ 2. **Configure your MCP settings:**
+
+ **Claude/Cline/Roo Configuration Example:**
+ Add the following configuration block to your MCP settings file:
+
+ ```json
+ {
+ "mcpServers": {
+ "docs-mcp-server": {
+ "command": "npx",
+ "args": ["-y", "--package=@arabold/docs-mcp-server", "docs-server"],
+ "env": {
+ "OPENAI_API_KEY": "sk-proj-..." // Required: Replace with your key
+ },
+ "disabled": false,
+ "autoApprove": []
+ }
+ }
+ }
+ ```
 
- This method is ideal for integrating the server into other tools and ensures a consistent runtime environment.
+ Remember to replace `"sk-proj-..."` with your actual OpenAI API key and restart the application.
 
- ## CLI Command Reference
+ 3. **That's it!** The server will now be available to your AI assistant.
 
- The `docs-cli` provides commands for managing the documentation index. Access it either via global installation (`docs-cli ...`) or `npx` (`npx -y --package=@arabold/docs-mcp-server docs-cli ...`).
+ ## Using the CLI
+
+ You can use the CLI to manage documentation directly, either via Docker or npx. **Important: Use the same method (Docker or npx) for both the server and CLI to ensure access to the same indexed documentation.**
+
+ ### Using Docker CLI
+
+ If you're running the server with Docker, use Docker for the CLI as well:
+
+ ```bash
+ docker run --rm \
+ -e OPENAI_API_KEY="your-openai-api-key-here" \
+ -v docs-mcp-data:/data \
+ ghcr.io/arabold/docs-mcp-server:latest \
+ docs-cli <command> [options]
+ ```
+
+ Make sure to use the same volume name (`docs-mcp-data` in this example) as you did for the server. Any of the configuration environment variables (see [Configuration](#configuration) above) can be passed using `-e` flags, just like with the server.
+
+ ### Using npx CLI
+
+ If you're running the server with npx, use npx for the CLI as well:
+
+ ```bash
+ npx -y --package=@arabold/docs-mcp-server docs-cli <command> [options]
+ ```
+
+ The npx approach will use the default data directory on your system (typically in your home directory), ensuring consistency between server and CLI.
+
+ (See "CLI Command Reference" below for available commands and options.)
+
+ ### CLI Command Reference
+
+ The `docs-cli` provides commands for managing the documentation index. Access it either via Docker (`docker run -v docs-mcp-data:/data ghcr.io/arabold/docs-mcp-server:latest docs-cli ...`) or `npx` (`npx -y --package=@arabold/docs-mcp-server docs-cli ...`).
 
  **General Help:**
 
@@ -140,7 +179,7 @@ docs-cli scrape --help
  docs-cli search --help
  docs-cli find-version --help
  docs-cli remove --help
- docs-cli list-libraries --help
+ docs-cli list --help
  ```
 
  ### Scraping Documentation (`scrape`)
@@ -164,11 +203,8 @@ docs-cli scrape <library> <url> [options]
  **Examples:**
 
  ```bash
- # Scrape React 18.2.0 docs (assuming global install)
+ # Scrape React 18.2.0 docs
  docs-cli scrape react --version 18.2.0 https://react.dev/
-
- # Scrape React docs without a specific version (using npx)
- npx -y --package=@arabold/docs-mcp-server docs-cli scrape react https://react.dev/
  ```
 
  ### Searching Documentation (`search`)
@@ -194,9 +230,6 @@ docs-cli search <library> <query> [options]
  ```bash
  # Search latest React docs for 'hooks'
  docs-cli search react 'hooks'
-
- # Search React 18.x docs for 'hooks' (using npx)
- npx -y --package=@arabold/docs-mcp-server docs-cli search react --version 18.x 'hooks'
  ```
 
  ### Finding Available Versions (`find-version`)
@@ -218,12 +251,12 @@ docs-cli find-version <library> [options]
  docs-cli find-version react
  ```
 
- ### Listing Libraries (`list-libraries`)
+ ### Listing Libraries (`list`)
 
  Lists all libraries currently indexed in the store.
 
  ```bash
- docs-cli list-libraries
+ docs-cli list
  ```
 
  ### Removing Documentation (`remove`)
@@ -330,6 +363,16 @@ This method is useful for contributing to the project or running un-published ve
  # Required: Your OpenAI API key for generating embeddings.
  OPENAI_API_KEY=your-api-key-here
 
+ # Optional: Your OpenAI Organization ID (handled automatically by LangChain if set)
+ OPENAI_ORG_ID=
+
+ # Optional: Custom base URL for OpenAI API (e.g., for Azure OpenAI or compatible APIs)
+ OPENAI_API_BASE=
+
+ # Optional: Embedding model name (defaults to "text-embedding-3-small")
+ # Examples: text-embedding-3-large, text-embedding-ada-002
+ DOCS_MCP_EMBEDDING_MODEL=
+
  # Optional: Specify a custom directory to store the SQLite database file (documents.db).
  # If set, this path takes precedence over the default locations.
  # Default behavior (if unset):
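As an aside, a filled-in version of the `.env` fragment above might look like the following; every value is a placeholder assumption, not a real credential or endpoint:

```shell
# Hypothetical example values only — substitute your own.
OPENAI_API_KEY=your-api-key-here
OPENAI_ORG_ID=org-your-org-id
OPENAI_API_BASE=https://your-endpoint.example.com/v1
DOCS_MCP_EMBEDDING_MODEL=text-embedding-3-small
```

Leaving the optional variables empty (or omitting them) falls back to the defaults described above.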
@@ -10790,6 +10790,16 @@ var StoreError = class extends Error {
  }
  }
  };
+ var DimensionError = class extends StoreError {
+ constructor(modelName, modelDimension, dbDimension) {
+ super(
+ `Model "${modelName}" produces ${modelDimension}-dimensional vectors, which exceeds the database's fixed dimension of ${dbDimension}. Please use a model with dimension \u2264 ${dbDimension}.`
+ );
+ this.modelName = modelName;
+ this.modelDimension = modelDimension;
+ this.dbDimension = dbDimension;
+ }
+ };
  var ConnectionError = class extends StoreError {
  };
 
@@ -10863,6 +10873,9 @@ function mapDbDocumentToDocument(doc) {
  var DocumentStore = class {
  db;
  embeddings;
+ dbDimension = 1536;
+ // Fixed dimension from schema.ts
+ modelDimension;
  statements;
  /**
  * Calculates Reciprocal Rank Fusion score for a result
@@ -10971,14 +10984,46 @@ var DocumentStore = class {
  this.statements = statements;
  }
  /**
- * Initializes embeddings client
+ * Pads a vector to the fixed database dimension by appending zeros.
+ * Throws an error if the input vector is longer than the database dimension.
  */
- initializeEmbeddings() {
- this.embeddings = new OpenAIEmbeddings({
- modelName: "text-embedding-3-small",
+ padVector(vector) {
+ if (vector.length > this.dbDimension) {
+ throw new Error(
+ `Vector dimension ${vector.length} exceeds database dimension ${this.dbDimension}`
+ );
+ }
+ if (vector.length === this.dbDimension) {
+ return vector;
+ }
+ return [...vector, ...new Array(this.dbDimension - vector.length).fill(0)];
+ }
+ /**
+ * Initializes embeddings client using environment variables for configuration.
+ *
+ * Supports:
+ * - OPENAI_API_KEY (handled automatically by LangChain)
+ * - OPENAI_ORG_ID (handled automatically by LangChain)
+ * - DOCS_MCP_EMBEDDING_MODEL (optional, defaults to "text-embedding-3-small")
+ * - OPENAI_API_BASE (optional)
+ */
+ async initializeEmbeddings() {
+ const modelName = process.env.DOCS_MCP_EMBEDDING_MODEL || "text-embedding-3-small";
+ const baseURL = process.env.OPENAI_API_BASE;
+ const config = {
  stripNewLines: true,
- batchSize: 512
- });
+ batchSize: 512,
+ modelName
+ };
+ if (baseURL) {
+ config.configuration = { baseURL };
+ }
+ this.embeddings = new OpenAIEmbeddings(config);
+ const testVector = await this.embeddings.embedQuery("test");
+ this.modelDimension = testVector.length;
+ if (this.modelDimension > this.dbDimension) {
+ throw new DimensionError(modelName, this.modelDimension, this.dbDimension);
+ }
  }
  /**
  * Escapes a query string for use with SQLite FTS5 MATCH operator.
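The zero-padding rule introduced in the hunk above can be exercised in isolation. The following sketch mirrors `padVector` as a free function (the constant and the tiny demonstration dimension are ours, not the bundle's):

```javascript
// Sketch of the zero-padding rule: vectors shorter than the database
// dimension get trailing zeros; longer vectors are rejected outright.
const DB_DIMENSION = 1536;

function padVector(vector, dbDimension = DB_DIMENSION) {
  if (vector.length > dbDimension) {
    throw new Error(
      `Vector dimension ${vector.length} exceeds database dimension ${dbDimension}`
    );
  }
  if (vector.length === dbDimension) {
    return vector; // already the right size, return unchanged
  }
  return [...vector, ...new Array(dbDimension - vector.length).fill(0)];
}

// Using a tiny dimension of 4 for demonstration:
console.log(padVector([0.1, 0.2], 4)); // [ 0.1, 0.2, 0, 0 ]
```

Padding with zeros leaves dot products (and hence cosine-style similarity against other padded vectors) unchanged, which is why smaller models can share the fixed 1536-wide column.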
@@ -10996,8 +11041,11 @@ var DocumentStore = class {
  sqliteVec.load(this.db);
  this.db.exec(createTablesSQL);
  this.prepareStatements();
- this.initializeEmbeddings();
+ await this.initializeEmbeddings();
  } catch (error) {
+ if (error instanceof StoreError) {
+ throw error;
+ }
  throw new ConnectionError("Failed to initialize database connection", error);
  }
  }
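The `instanceof StoreError` guard added in this hunk keeps specific store errors (such as the new `DimensionError`) from being masked by the generic `ConnectionError` wrapper. The pattern in isolation, with simplified stand-in classes and our own `runInit` helper:

```javascript
// Simplified stand-ins for the bundle's error hierarchy.
class StoreError extends Error {}
class DimensionError extends StoreError {}
class ConnectionError extends StoreError {
  constructor(message, cause) {
    super(message);
    this.cause = cause;
  }
}

// Rethrow domain errors untouched; wrap everything else as a connection failure.
function runInit(step) {
  try {
    return step();
  } catch (error) {
    if (error instanceof StoreError) {
      throw error;
    }
    throw new ConnectionError("Failed to initialize database connection", error);
  }
}
```

Without the guard, a `DimensionError` raised during `initializeEmbeddings()` would reach the caller as a misleading "Failed to initialize database connection".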
@@ -11065,7 +11113,8 @@ var DocumentStore = class {
  `;
  return `${header}${doc.pageContent}`;
  });
- const embeddings = await this.embeddings.embedDocuments(texts);
+ const rawEmbeddings = await this.embeddings.embedDocuments(texts);
+ const paddedEmbeddings = rawEmbeddings.map((vector) => this.padVector(vector));
  const transaction = this.db.transaction((docs) => {
  for (let i = 0; i < docs.length; i++) {
  const doc = docs[i];
@@ -11086,7 +11135,7 @@ var DocumentStore = class {
  BigInt(rowId),
  library.toLowerCase(),
  version.toLowerCase(),
- JSON.stringify(embeddings[i])
+ JSON.stringify(paddedEmbeddings[i])
  );
  }
  });
@@ -11132,7 +11181,8 @@ var DocumentStore = class {
  */
  async findByContent(library, version, query, limit) {
  try {
- const embedding = await this.embeddings.embedQuery(query);
+ const rawEmbedding = await this.embeddings.embedQuery(query);
+ const embedding = this.padVector(rawEmbedding);
  const ftsQuery = this.escapeFtsQuery(query);
  const stmt = this.db.prepare(`
  WITH vec_scores AS (
@@ -11568,4 +11618,4 @@ export {
  RemoveTool,
  DocumentManagementService
  };
- //# sourceMappingURL=chunk-2YTVPKP5.js.map
+ //# sourceMappingURL=chunk-S7C2LRQA.js.map