@arabold/docs-mcp-server 1.10.0 → 1.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,81 +1,50 @@
- # docs-mcp-server MCP Server
+ # Docs MCP Server: Enhance Your AI Coding Assistant

- A MCP server for fetching and searching 3rd party package documentation.
+ AI coding assistants often struggle with outdated documentation, leading to incorrect suggestions or hallucinated code examples. Verifying AI responses against specific library versions can be time-consuming and inefficient.

- ## Key Features
-
- - 🌐 **Versatile Scraping:** Fetch documentation from diverse sources like websites, GitHub, npm, PyPI, or local files.
- - 🧠 **Intelligent Processing:** Automatically split content semantically and generate embeddings using your choice of models (OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, and more).
- - 💾 **Optimized Storage:** Leverage SQLite with `sqlite-vec` for efficient vector storage and FTS5 for robust full-text search.
- - 🔍 **Powerful Hybrid Search:** Combine vector similarity and full-text search across different library versions for highly relevant results.
- - ⚙️ **Asynchronous Job Handling:** Manage scraping and indexing tasks efficiently with a background job queue and MCP/CLI tools.
- - 🐳 **Simple Deployment:** Get up and running quickly using Docker or npx.
-
- ## Overview
-
- This project provides a Model Context Protocol (MCP) server designed to scrape, process, index, and search documentation for various software libraries and packages. It fetches content from specified URLs, splits it into meaningful chunks using semantic splitting techniques, generates vector embeddings using OpenAI, and stores the data in an SQLite database. The server utilizes `sqlite-vec` for efficient vector similarity search and FTS5 for full-text search capabilities, combining them for hybrid search results. It supports versioning, allowing documentation for different library versions (including unversioned content) to be stored and queried distinctly.
-
- The server exposes MCP tools for:
-
- - Starting a scraping job (`scrape_docs`): Returns a `jobId` immediately.
- - Checking job status (`get_job_status`): Retrieves the current status and progress of a specific job.
- - Listing active/completed jobs (`list_jobs`): Shows recent and ongoing jobs.
- - Cancelling a job (`cancel_job`): Attempts to stop a running or queued job.
- - Searching documentation (`search_docs`).
- - Listing indexed libraries (`list_libraries`).
- - Finding appropriate versions (`find_version`).
- - Removing indexed documents (`remove_docs`).
- - Fetching single URLs (`fetch_url`): Fetches a URL and returns its content as Markdown.
-
- ## Configuration
-
- The following environment variables are supported to configure the embedding model behavior:
-
- ### Embedding Model Configuration
-
- - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables:
-
- - `openai` (default): Uses OpenAI's embedding models
-
- - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
- - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
- - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)
+ The **Docs MCP Server** addresses these challenges by providing a personal, always-current knowledge base for your AI assistant. It acts as a bridge, connecting your LLM directly to the **latest official documentation** from thousands of software libraries.

- - `vertex`: Uses Google Cloud Vertex AI embeddings
+ By grounding AI responses in accurate, version-aware context, the Docs MCP Server enables you to receive concise and relevant integration details and code snippets, improving the reliability and efficiency of LLM-assisted development.

- - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file
+ It's **free**, **open-source**, runs **locally** for privacy, and integrates seamlessly with your workflow via the Model Context Protocol (MCP).

- - `gemini`: Uses Google Generative AI (Gemini) embeddings
+ ## Why Use the Docs MCP Server?

- - `GOOGLE_API_KEY`: **Required.** Your Google API key
+ LLM-assisted coding promises speed and efficiency, but often falls short due to:

- - `aws`: Uses AWS Bedrock embeddings
+ - 🌀 **Stale Knowledge:** LLMs train on snapshots of the internet, quickly falling behind new library releases and API changes.
+ - 👻 **Code Hallucinations:** AI can invent plausible-looking code that is syntactically correct but functionally wrong or uses non-existent APIs.
+ - ❓ **Version Ambiguity:** Generic answers rarely account for the specific version dependencies in _your_ project, leading to subtle bugs.
+ - ⏳ **Verification Overhead:** Developers spend valuable time double-checking AI suggestions against official documentation.

- - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
- - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
- - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock
+ **The Docs MCP Server tackles these problems head-on by:**

- - `microsoft`: Uses Azure OpenAI embeddings
- - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
- - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
- - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
- - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version
+ - **Providing Always Up-to-Date Context:** It fetches and indexes documentation _directly_ from official sources (websites, GitHub, npm, PyPI, local files) on demand.
+ - 🎯 **Delivering Version-Specific Answers:** Search queries can target exact library versions, ensuring the information aligns with your project's dependencies.
+ - 💡 **Reducing Hallucinations:** By grounding the LLM in real documentation, it provides accurate examples and integration details.
+ - **Boosting Productivity:** Get trustworthy answers faster, integrated directly into your AI assistant workflow.

- ### Vector Dimensions
-
- The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.
-
- For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.
+ ## Key Features

- These variables can be set regardless of how you run the server (Docker, npx, or from source).
+ - **Up-to-Date Knowledge:** Fetches the latest documentation directly from the source.
+ - **Version-Aware Search:** Get answers relevant to specific library versions (e.g., `react@18.2.0` vs `react@17.0.0`).
+ - **Accurate Snippets:** Reduces AI hallucinations by using context from official docs.
+ - **Broad Source Compatibility:** Scrapes websites, GitHub repos, package manager sites (npm, PyPI), and even local file directories.
+ - **Intelligent Processing:** Automatically chunks documentation semantically and generates embeddings.
+ - **Flexible Embedding Models:** Supports OpenAI (incl. compatible APIs like Ollama), Google Gemini/Vertex AI, Azure OpenAI, AWS Bedrock, and more.
+ - **Powerful Hybrid Search:** Combines vector similarity with full-text search for relevance.
+ - **Local & Private:** Runs entirely on your machine, keeping your data and queries private.
+ - **Free & Open Source:** Built for the community, by the community.
+ - **Simple Deployment:** Easy setup via Docker or `npx`.
+ - **Seamless Integration:** Works with MCP-compatible clients (like Claude, Cline, Roo).

  ## Running the MCP Server

- There are two ways to run the docs-mcp-server:
+ Get up and running quickly!

- ### Option 1: Using Docker (Recommended)
+ ### Option 1: Using Docker

- This is the recommended approach for most users. It's easy, straightforward, and doesn't require Node.js to be installed.
+ This approach is easy, straightforward, and doesn't require Node.js to be installed.

  1. **Ensure Docker is installed and running.**
  2. **Configure your MCP settings:**
@@ -176,7 +145,7 @@ docker run -i --rm \

  ### Option 2: Using npx

- This approach is recommended when you need local file access (e.g., indexing documentation from your local file system). While this can also be achieved by mounting paths into a Docker container, using npx is simpler but requires a Node.js installation.
+ This approach is useful when you need local file access (e.g., indexing documentation from your local file system). While this can also be achieved by mounting paths into a Docker container, using npx is simpler but requires a Node.js installation.

  1. **Ensure Node.js is installed.**
  2. **Configure your MCP settings:**
@@ -204,10 +173,29 @@ This approach is recommended when you need local file access (e.g., indexing doc

  3. **That's it!** The server will now be available to your AI assistant.

+ ### Option 3: Using npx with HTTP Protocol
+
+ Similar to Option 2, this uses `npx` to run the latest published package without needing Docker or a local clone. However, this option starts the server using the Streamable HTTP protocol instead of the default stdio, making it accessible via HTTP endpoints. This is useful if you have multiple clients, work with multiple code assistants in parallel, or want to expose the server to other applications.
+
+ 1. **Ensure Node.js is installed.**
+ 2. **Run the command:**
+
+ ```bash
+ # Ensure required environment variables like OPENAI_API_KEY are set
+ npx --package=@arabold/docs-mcp-server docs-server --protocol http --port 8000
+ ```
+
+ - `--protocol http`: Instructs the server to use the HTTP protocol.
+ - `--port <number>`: Specifies the listening port (default: 8000).
+
+ The server will expose endpoints like `/mcp` and `/sse` on the specified port.
+
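For illustration only (not part of the packaged README): once the server is running in HTTP mode, the endpoints mentioned above can be probed from a shell. The snippet below assumes the default port 8000 used in the command above; the exact response payloads are not documented in this diff.

```bash
# Assumes the server was started with: --protocol http --port 8000
# /sse is a Server-Sent Events endpoint; -N disables curl's output buffering
# so the event stream stays visible as it arrives.
curl -N -H "Accept: text/event-stream" http://localhost:8000/sse
```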
  ## Using the CLI

  You can use the CLI to manage documentation directly, either via Docker or npx. **Important: Use the same method (Docker or npx) for both the server and CLI to ensure access to the same indexed documentation.**

+ Here's how to invoke the CLI:
+
  ### Using Docker CLI

  If you're running the server with Docker, use Docker for the CLI as well:
@@ -232,155 +220,58 @@ npx -y --package=@arabold/docs-mcp-server docs-cli <command> [options]

  The npx approach will use the default data directory on your system (typically in your home directory), ensuring consistency between server and CLI.

- (See "CLI Command Reference" below for available commands and options.)
-
- ### CLI Command Reference
-
- The `docs-cli` provides commands for managing the documentation index. Access it either via Docker (`docker run -v docs-mcp-data:/data ghcr.io/arabold/docs-mcp-server:latest docs-cli ...`) or `npx` (`npx -y --package=@arabold/docs-mcp-server docs-cli ...`).
-
- **General Help:**
-
- ```bash
- docs-cli --help
- # or
- npx -y --package=@arabold/docs-mcp-server docs-cli --help
- ```
-
- **Command Specific Help:** (Replace `docs-cli` with the `npx...` command if not installed globally)
-
- ```bash
- docs-cli scrape --help
- docs-cli search --help
- docs-cli fetch-url --help
- docs-cli find-version --help
- docs-cli remove --help
- docs-cli list --help
- ```
-
- ### Fetching Single URLs (`fetch-url`)
-
- Fetches a single URL and converts its content to Markdown. Unlike `scrape`, this command does not crawl links or store the content.
-
- ```bash
- docs-cli fetch-url <url> [options]
- ```
-
- **Options:**
-
- - `--no-follow-redirects`: Disable following HTTP redirects (default: follow redirects).
- - `--scrape-mode <mode>`: HTML processing strategy: 'fetch' (fast, less JS), 'playwright' (slow, full JS), 'auto' (default).
-
- **Examples:**
-
- ```bash
- # Fetch a URL and convert to Markdown
- docs-cli fetch-url https://example.com/page.html
- ```
-
- ### Scraping Documentation (`scrape`)
-
- Scrapes and indexes documentation from a given URL for a specific library.
-
- ```bash
- docs-cli scrape <library> <url> [options]
- ```
-
- **Options:**
-
- - `-v, --version <string>`: The specific version to associate with the scraped documents.
- - Accepts full versions (`1.2.3`), pre-release versions (`1.2.3-beta.1`), or partial versions (`1`, `1.2` which are expanded to `1.0.0`, `1.2.0`).
- - If omitted, the documentation is indexed as **unversioned**.
- - `-p, --max-pages <number>`: Maximum pages to scrape (default: 1000).
- - `-d, --max-depth <number>`: Maximum navigation depth (default: 3).
- - `-c, --max-concurrency <number>`: Maximum concurrent requests (default: 3).
- - `--scope <scope>`: Defines the crawling boundary: 'subpages' (default), 'hostname', or 'domain'.
- - `--no-follow-redirects`: Disable following HTTP redirects (default: follow redirects).
- - `--scrape-mode <mode>`: HTML processing strategy: 'fetch' (fast, less JS), 'playwright' (slow, full JS), 'auto' (default).
- - `--ignore-errors`: Ignore errors during scraping (default: true).
-
- **Examples:**
-
- ```bash
- # Scrape React 18.2.0 docs
- docs-cli scrape react --version 18.2.0 https://react.dev/
- ```
-
- ### Searching Documentation (`search`)
-
- Searches the indexed documentation for a library, optionally filtering by version.
+ The main commands available are:

- ```bash
- docs-cli search <library> <query> [options]
- ```
+ - `scrape`: Scrapes and indexes documentation from a URL.
+ - `search`: Searches the indexed documentation.
+ - `list`: Lists all indexed libraries.
+ - `remove`: Removes indexed documentation.
+ - `fetch-url`: Fetches a single URL and converts to Markdown.
+ - `find-version`: Finds the best matching version for a library.

- **Options:**
+ See the [CLI Command Reference](#cli-command-reference) below for detailed command usage.
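For illustration, typical invocations of the commands listed above, taken from the command reference removed in this diff (prefix each call with `npx -y --package=@arabold/docs-mcp-server` if `docs-cli` is not installed globally):

```bash
# Index the React 18.2.0 documentation, then query and inspect the index.
docs-cli scrape react --version 18.2.0 https://react.dev/
docs-cli search react 'hooks'
docs-cli list
```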

- - `-v, --version <string>`: The target version or range to search within.
- - Supports exact versions (`18.0.0`), partial versions (`18`), or ranges (`18.x`).
- - If omitted, searches the **latest** available indexed version.
- - If a specific version/range doesn't match, it falls back to the latest indexed version _older_ than the target.
- - To search **only unversioned** documents, explicitly pass an empty string: `--version ""`. (Note: Omitting `--version` searches latest, which _might_ be unversioned if no other versions exist).
- - `-l, --limit <number>`: Maximum number of results (default: 5).
- - `-e, --exact-match`: Only match the exact version specified (disables fallback and range matching) (default: false).
-
- **Examples:**
-
- ```bash
- # Search latest React docs for 'hooks'
- docs-cli search react 'hooks'
- ```
-
- ### Finding Available Versions (`find-version`)
-
- Checks the index for the best matching version for a library based on a target, and indicates if unversioned documents exist.
-
- ```bash
- docs-cli find-version <library> [options]
- ```
+ ## Configuration

- **Options:**
+ The following environment variables are supported to configure the embedding model behavior:

- - `-v, --version <string>`: The target version or range. If omitted, finds the latest available version.
+ ### Embedding Model Configuration

- **Examples:**
+ - `DOCS_MCP_EMBEDDING_MODEL`: **Optional.** Format: `provider:model_name` or just `model_name` (defaults to `text-embedding-3-small`). Supported providers and their required environment variables:

- ```bash
- # Find the latest indexed version for react
- docs-cli find-version react
- ```
+ - `openai` (default): Uses OpenAI's embedding models

- ### Listing Libraries (`list`)
+ - `OPENAI_API_KEY`: **Required.** Your OpenAI API key
+ - `OPENAI_ORG_ID`: **Optional.** Your OpenAI Organization ID
+ - `OPENAI_API_BASE`: **Optional.** Custom base URL for OpenAI-compatible APIs (e.g., Ollama, Azure OpenAI)

- Lists all libraries currently indexed in the store.
+ - `vertex`: Uses Google Cloud Vertex AI embeddings

- ```bash
- docs-cli list
- ```
+ - `GOOGLE_APPLICATION_CREDENTIALS`: **Required.** Path to service account JSON key file

- ### Removing Documentation (`remove`)
+ - `gemini`: Uses Google Generative AI (Gemini) embeddings

- Removes indexed documents for a specific library and version.
+ - `GOOGLE_API_KEY`: **Required.** Your Google API key

- ```bash
- docs-cli remove <library> [options]
- ```
+ - `aws`: Uses AWS Bedrock embeddings

- **Options:**
+ - `AWS_ACCESS_KEY_ID`: **Required.** AWS access key
+ - `AWS_SECRET_ACCESS_KEY`: **Required.** AWS secret key
+ - `AWS_REGION` or `BEDROCK_AWS_REGION`: **Required.** AWS region for Bedrock

- - `-v, --version <string>`: The specific version to remove. If omitted, removes **unversioned** documents for the library.
+ - `microsoft`: Uses Azure OpenAI embeddings
+ - `AZURE_OPENAI_API_KEY`: **Required.** Azure OpenAI API key
+ - `AZURE_OPENAI_API_INSTANCE_NAME`: **Required.** Azure instance name
+ - `AZURE_OPENAI_API_DEPLOYMENT_NAME`: **Required.** Azure deployment name
+ - `AZURE_OPENAI_API_VERSION`: **Required.** Azure API version

- **Examples:**
+ ### Vector Dimensions

- ```bash
- # Remove React 18.2.0 docs
- docs-cli remove react --version 18.2.0
- ```
+ The database schema uses a fixed dimension of 1536 for embedding vectors. Only models that produce vectors with dimension ≤ 1536 are supported, except for certain providers (like Gemini) that support dimension reduction.

- ### Version Handling Summary
+ For OpenAI-compatible APIs (like Ollama), use the `openai` provider with `OPENAI_API_BASE` pointing to your endpoint.

- - **Scraping:** Requires a specific, valid version (`X.Y.Z`, `X.Y.Z-pre`, `X.Y`, `X`) or no version (for unversioned docs). Ranges (`X.x`) are invalid for scraping.
- - **Searching/Finding:** Accepts specific versions, partials, or ranges (`X.Y.Z`, `X.Y`, `X`, `X.x`). Falls back to the latest older version if the target doesn't match. Omitting the version targets the latest available. Explicitly searching `--version ""` targets unversioned documents.
- - **Unversioned Docs:** Libraries can have documentation stored without a specific version (by omitting `--version` during scrape). These can be searched explicitly using `--version ""`. The `find-version` command will also report if unversioned docs exist alongside any semver matches.
+ These variables can be set regardless of how you run the server (Docker, npx, or from source).
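For illustration, these variables combine as follows. The endpoint URL and embedding model name below are assumed values for a local Ollama instance and do not appear in this diff.

```bash
# Assumed setup: a local Ollama server exposing an OpenAI-compatible API.
export OPENAI_API_KEY="unused-placeholder"          # required by the openai provider; local endpoints typically ignore the value
export OPENAI_API_BASE="http://localhost:11434/v1"  # assumed Ollama OpenAI-compatible endpoint
export DOCS_MCP_EMBEDDING_MODEL="openai:nomic-embed-text"  # provider:model_name format
```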

  ## Development & Advanced Setup
 
@@ -390,39 +281,6 @@ This section covers running the server/CLI directly from the source code for dev

  This provides an isolated environment and exposes the server via HTTP endpoints.

- 1. **Clone the repository:**
- ```bash
- git clone https://github.com/arabold/docs-mcp-server.git # Replace with actual URL if different
- cd docs-mcp-server
- ```
- 2. **Create `.env` file:**
- Copy the example and add your OpenAI key (see "Environment Setup" below).
- ```bash
- cp .env.example .env
- # Edit .env and add your OPENAI_API_KEY
- ```
- 3. **Build the Docker image:**
- ```bash
- docker build -t docs-mcp-server .
- ```
- 4. **Run the Docker container:**
-
- ```bash
- # Option 1: Using a named volume (recommended)
- # Docker automatically creates the volume 'docs-mcp-data' if it doesn't exist on first run.
- docker run -i --env-file .env -v docs-mcp-data:/data --name docs-mcp-server docs-mcp-server
-
- # Option 2: Mapping to a host directory
- # docker run -i --env-file .env -v /path/on/your/host:/data --name docs-mcp-server docs-mcp-server
- ```
-
- - `-i`: Keep STDIN open even if not attached. This is crucial for interacting with the server over stdio.
- - `--env-file .env`: Loads environment variables (like `OPENAI_API_KEY`) from your local `.env` file.
- - `-v docs-mcp-data:/data` or `-v /path/on/your/host:/data`: **Crucial for persistence.** This mounts a Docker named volume (Docker creates `docs-mcp-data` automatically if needed) or a host directory to the `/data` directory inside the container. The `/data` directory is where the server stores its `documents.db` file (as configured by `DOCS_MCP_STORE_PATH` in the Dockerfile). This ensures your indexed documentation persists even if the container is stopped or removed.
- - `--name docs-mcp-server`: Assigns a convenient name to the container.
-
- The server inside the container now runs directly using Node.js and communicates over **stdio**.
-
  This method is useful for contributing to the project or running un-published versions.

  1. **Clone the repository:**
@@ -479,7 +337,7 @@ This method is useful for contributing to the project or running un-published ve
  # DOCS_MCP_STORE_PATH=/path/to/your/desired/storage/directory
  ```

- ### Debugging (from Source)
+ ### Testing (from Source)

  Since MCP servers communicate over stdio when run directly via Node.js, debugging can be challenging. We recommend using the [MCP Inspector](https://github.com/modelcontextprotocol/inspector), which is available as a package script after building:

@@ -489,24 +347,6 @@ npx @modelcontextprotocol/inspector node dist/server.js

  The Inspector will provide a URL to access debugging tools in your browser.

- ### Releasing
-
- This project uses [semantic-release](https://github.com/semantic-release/semantic-release) and [Conventional Commits](https://www.conventionalcommits.org/) to automate the release process.
-
- **How it works:**
-
- 1. **Commit Messages:** All commits merged into the `main` branch **must** follow the Conventional Commits specification.
- 2. **Manual Trigger:** The "Release" GitHub Actions workflow can be triggered manually from the Actions tab when you're ready to create a new release.
- 3. **`semantic-release` Actions:** Determines version, updates `CHANGELOG.md` & `package.json`, commits, tags, publishes to npm, and creates a GitHub Release.
-
- **What you need to do:**
-
- - Use Conventional Commits.
- - Merge changes to `main`.
- - Trigger a release manually when ready from the Actions tab in GitHub.
-
- **Automation handles:** Changelog, version bumps, tags, npm publish, GitHub releases.
-
  ### Architecture

  For details on the project's architecture and design principles, please see [ARCHITECTURE.md](ARCHITECTURE.md).
@@ -100,11 +100,6 @@ var require_extend = __commonJS({
  }
  });

- // src/config.ts
- var DEFAULT_MAX_PAGES = 1e3;
- var DEFAULT_MAX_DEPTH = 3;
- var DEFAULT_MAX_CONCURRENCY = 3;
-
  // src/utils/logger.ts
  var currentLogLevel = 2 /* INFO */;
  function setLogLevel(level) {
@@ -834,11 +829,9 @@ var HtmlToMarkdownMiddleware = class {
  if (match) language = match[1];
  }
  }
- const brElements = element.querySelectorAll("br");
- if (brElements.length > 0) {
- for (const br of brElements) {
- br.replaceWith("\n");
- }
+ const brElements = Array.from(element.querySelectorAll("br"));
+ for (const br of brElements) {
+ br.replaceWith("\n");
  }
  const text3 = element.textContent || "";
  return `
@@ -848,6 +841,19 @@ ${text3.replace(/^\n+|\n+$/g, "")}
  `;
  }
  });
+ this.turndownService.addRule("anchor", {
+ filter: ["a"],
+ replacement: (content3, node2) => {
+ const href = node2.getAttribute("href");
+ if (!content3 || content3 === "#") {
+ return "";
+ }
+ if (!href) {
+ return content3;
+ }
+ return `[${content3}](${href})`;
+ }
+ });
  }
  /**
  * Processes the context to convert the sanitized HTML body node to Markdown.
@@ -966,8 +972,8 @@ var CancellationError = class extends PipelineError {
  };

  // src/scraper/strategies/BaseScraperStrategy.ts
- var DEFAULT_MAX_PAGES2 = 100;
- var DEFAULT_MAX_DEPTH2 = 3;
+ var DEFAULT_MAX_PAGES = 100;
+ var DEFAULT_MAX_DEPTH = 3;
  var DEFAULT_CONCURRENCY = 3;
  var BaseScraperStrategy = class {
  visited = /* @__PURE__ */ new Set();
@@ -983,7 +989,7 @@ var BaseScraperStrategy = class {
  if (signal?.aborted) {
  throw new CancellationError("Scraping cancelled during batch processing");
  }
- const maxDepth = options.maxDepth ?? DEFAULT_MAX_DEPTH2;
+ const maxDepth = options.maxDepth ?? DEFAULT_MAX_DEPTH;
  if (item.depth > maxDepth) {
  return [];
  }
@@ -991,7 +997,7 @@ var BaseScraperStrategy = class {
  const result = await this.processItem(item, options, void 0, signal);
  if (result.document) {
  this.pageCount++;
- const maxPages = options.maxPages ?? DEFAULT_MAX_PAGES2;
+ const maxPages = options.maxPages ?? DEFAULT_MAX_PAGES;
  logger.info(
  `\u{1F310} Scraping page ${this.pageCount}/${maxPages} (depth ${item.depth}/${maxDepth}): ${item.url}`
  );
@@ -1043,7 +1049,7 @@ var BaseScraperStrategy = class {
  const baseUrl = new URL2(options.url);
  const queue = [{ url: options.url, depth: 0 }];
  this.visited.add(normalizeUrl(options.url, this.options.urlNormalizerOptions));
- const maxPages = options.maxPages ?? DEFAULT_MAX_PAGES2;
+ const maxPages = options.maxPages ?? DEFAULT_MAX_PAGES;
  const maxConcurrency = options.maxConcurrency ?? DEFAULT_CONCURRENCY;
  while (queue.length > 0 && this.pageCount < maxPages) {
  if (signal?.aborted) {
@@ -1941,6 +1947,13 @@ var ListLibrariesTool = class {
  }
  };

+ // src/utils/config.ts
+ var DEFAULT_MAX_PAGES2 = 1e3;
+ var DEFAULT_MAX_DEPTH2 = 3;
+ var DEFAULT_MAX_CONCURRENCY = 3;
+ var DEFAULT_PROTOCOL = "stdio";
+ var DEFAULT_HTTP_PORT = 8e3;
+
  // src/tools/ScrapeTool.ts
  import * as semver2 from "semver";
  var ScrapeTool = class {
@@ -1995,8 +2008,8 @@ var ScrapeTool = class {
  version: internalVersion,
  scope: scraperOptions?.scope ?? "subpages",
  followRedirects: scraperOptions?.followRedirects ?? true,
- maxPages: scraperOptions?.maxPages ?? DEFAULT_MAX_PAGES,
- maxDepth: scraperOptions?.maxDepth ?? DEFAULT_MAX_DEPTH,
+ maxPages: scraperOptions?.maxPages ?? DEFAULT_MAX_PAGES2,
+ maxDepth: scraperOptions?.maxDepth ?? DEFAULT_MAX_DEPTH2,
  maxConcurrency: scraperOptions?.maxConcurrency ?? DEFAULT_MAX_CONCURRENCY,
  ignoreErrors: scraperOptions?.ignoreErrors ?? true,
  scrapeMode: scraperOptions?.scrapeMode ?? "auto" /* Auto */
@@ -2080,26 +2093,6 @@ var SearchTool = class {
  logger.info(`\u2705 Found ${results.length} matching results`);
  return { results };
  } catch (error) {
- if (error instanceof LibraryNotFoundError) {
- logger.info(`\u2139\uFE0F Library not found: ${error.message}`);
- return {
- results: [],
- error: {
- message: error.message,
- suggestions: error.suggestions
- }
- };
- }
- if (error instanceof VersionNotFoundError) {
- logger.info(`\u2139\uFE0F Version not found: ${error.message}`);
- return {
- results: [],
- error: {
- message: error.message,
- availableVersions: error.availableVersions
- }
- };
- }
  logger.error(
  `\u274C Search failed: ${error instanceof Error ? error.message : "Unknown error"}`
  );
@@ -12073,9 +12066,6 @@ var DocumentManagementService = class {
  };

  export {
- DEFAULT_MAX_PAGES,
- DEFAULT_MAX_DEPTH,
- DEFAULT_MAX_CONCURRENCY,
  setLogLevel,
  logger,
  HttpFetcher,
@@ -12085,14 +12075,20 @@ export {
  PipelineManager,
  CancelJobTool,
  VersionNotFoundError,
+ LibraryNotFoundError,
  FetchUrlTool,
  FindVersionTool,
  GetJobInfoTool,
  ListJobsTool,
  ListLibrariesTool,
  RemoveTool,
+ DEFAULT_MAX_PAGES2 as DEFAULT_MAX_PAGES,
+ DEFAULT_MAX_DEPTH2 as DEFAULT_MAX_DEPTH,
+ DEFAULT_MAX_CONCURRENCY,
+ DEFAULT_PROTOCOL,
+ DEFAULT_HTTP_PORT,
  ScrapeTool,
  SearchTool,
  DocumentManagementService
  };
- //# sourceMappingURL=chunk-VTO2ED43.js.map
+ //# sourceMappingURL=chunk-VF2RUEVV.js.map