docshark 0.1.17 → 0.1.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,176 @@
1
+ # 🦈 DocShark
2
+
3
+ [![Built with Bun](https://img.shields.io/badge/Bun-%23000000.svg?style=flat&logo=bun&logoColor=white)](https://bun.sh/)
4
+ [![NPM Version](https://img.shields.io/npm/v/docshark.svg?style=flat&color=blue)](https://www.npmjs.com/package/docshark)
5
+ [![MCP Compatible](https://img.shields.io/badge/MCP-Ready-0D1117.svg?style=flat&logo=github&logoColor=white)](https://modelcontextprotocol.io/)
6
+ [![GitHub Release](https://img.shields.io/github/v/release/Michael-Obele/docshark?color=success)](https://github.com/Michael-Obele/docshark/releases)
7
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
+
9
+ **DocShark** is a powerful MCP (Model Context Protocol) server designed to scrape, index, and search any documentation website. It creates a local, highly-searchable knowledge base from public documentation pages using FTS5 (Full-Text Search) and BM25 ranking, allowing AI assistants to query the latest docs effortlessly.
10
+
11
+ ---
12
+
13
+ ## 🚀 Features
14
+
15
+ - **Automated Crawling**: Discovers pages via `sitemap.xml` with fallback to BFS link crawling.
16
+ - **Smart Extraction**: Uses Readability and Turndown to extract main content and convert it to clean Markdown, filtering out navbars and sidebars.
17
+ - **Semantic Chunking**: Splits content based on headings, preserving contextual headers for better AI understanding.
18
+ - **High-Performance Search**: Built-in SQLite + FTS5 indexing with BM25 ranking for accurate and lightning-fast search results.
19
+ - **JS-Rendered Site Support**: Tiered fetching strategy automatically detects React/Vue SPAs (empty shells) and upgrades to `puppeteer-core` if you have it installed (zero-config, auto-fallback).
20
+ - **Polite Crawling**: Respects `robots.txt` and implements rate limiting to prevent overloading documentation servers.
21
+ - **Standard MCP Tooling**: Connect perfectly with Desktop Claude, VS Code, Cursor, and any other MCP-compatible clients via standard `stdio` or `http`/`sse` transports.
22
+
23
+ ## 📦 What We Have Done (Phase 1)
24
+
25
+ **Phase 1: Core Engine** is fully implemented and tested.
26
+
27
+ - ✅ Custom SQLite Database with FTS5 virtual tables and auto-sync triggers.
28
+ - ✅ Web scraping engine supporting standard `fetch()` and `puppeteer-core`.
29
+ - ✅ Markdown processor utilizing Readability + Turndown.
30
+ - ✅ Heading-based semantic chunker (500-1200 tokens per chunk).
31
+ - ✅ Asynchronous job manager and queue system.
32
+ - ✅ Complete HTTP API (REST endpoints + SSE event streams).
33
+ - ✅ Seamless integration of 4 MCP tools: `manage_library`, `search_docs`, `list_libraries`, and `get_doc_page`.
34
+ - ✅ Robust CLI interface (`start`, `add`, `rename`, `search`, `list`).
35
+
36
+ ## 🏗️ What We Are Doing
37
+
38
+ We are actively polishing the integration between the core engine and external MCP clients (like VS Code Agents and Claude Desktop).
39
+
40
+ ## 🔮 What We Plan To Do (Phase 2 & Beyond)
41
+
42
+ - **Web Dashboard**: An intuitive SvelteKit dashboard to manage your synced libraries, view crawl progress in real-time (via SSE), and test searches manually.
43
+ - **Incremental Crawling**: Smarter `refresh` jobs that compare `ETag` and `Last-Modified` headers to only re-scrape updated pages.
44
+ - **Vector Search (RAG)**: Integration of lightweight vector embeddings for semantic similarity search alongside the existing FTS5 keyword search.
45
+ - **Advanced Scraping Setup**: Support for custom CSS selectors to define exactly where content lives in non-standard documentation websites.
46
+
47
+ ---
48
+
49
+ ## 🛠️ Usage
50
+
51
+ ### Quick Start (from npm)
52
+
53
+ You can run DocShark directly without installing it globally using `bunx`:
54
+
55
+ ```bash
56
+ # Add a documentation library to the index
57
+ bunx docshark add https://valibot.dev/guides/ --depth 2
58
+
59
+ # Search your indexed docs
60
+ bunx docshark search "schema validation"
61
+ ```
62
+
63
+ ### Installation
64
+
65
+ To install DocShark globally as a CLI tool:
66
+
67
+ DocShark is intended to be installed and run with Bun.
68
+
69
+ ```bash
70
+ # Global Bun installation
71
+ bun add -g docshark
72
+ ```
73
+
74
+ After installation, you can use the `docshark` command:
75
+
76
+ ```bash
77
+ docshark list
78
+
79
+ # Update the global Bun installation when a new release is published
80
+ docshark update
81
+
82
+ # Script-friendly update check
83
+ docshark update --check --quiet
84
+ ```
85
+
86
+ Interactive CLI runs will also let you know when a newer version is available. Update notices are intentionally skipped for MCP `stdio` mode so they never interfere with protocol output.
87
+
88
+ For scripts, `docshark update --check` exits `0` when current, `10` when a newer version is available, and `1` when the version check could not be completed.
89
+
90
+ ## 🔌 MCP Integration
91
+
92
+ ### VS Code (GitHub Copilot / MCP Extension)
93
+
94
+ Add DocShark to your `.vscode/settings.json` or global MCP configuration:
95
+
96
+ ```json
97
+ {
98
+ "mcpServers": {
99
+ "docshark": {
100
+ "command": "bunx",
101
+ "args": ["-y", "docshark", "start", "--stdio"]
102
+ }
103
+ }
104
+ }
105
+ ```
106
+
107
+ ### Cursor
108
+
109
+ 1. Open **Cursor Settings** > **Models** > **MCP**.
110
+ 2. Click **+ Add New MCP Server**.
111
+ 3. Name: `docshark`
112
+ 4. Type: `command`
113
+ 5. Command: `bunx -y docshark start --stdio`
114
+
115
+ ### Claude Desktop
116
+
117
+ Edit your Claude Desktop configuration file:
118
+
119
+ - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
120
+ - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
121
+
122
+ ```json
123
+ {
124
+ "mcpServers": {
125
+ "docshark": {
126
+ "command": "bunx",
127
+ "args": ["-y", "docshark", "start", "--stdio"]
128
+ }
129
+ }
130
+ }
131
+ ```
132
+
133
+ ---
134
+
135
+ ## 🛠️ Development
136
+
137
+ ### Local Setup
138
+
139
+ Ensure you have [Bun](https://bun.sh/) installed.
140
+
141
+ ```bash
142
+ # Clone the repository
143
+ git clone https://github.com/Michael-Obele/docshark.git
144
+ cd docshark
145
+
146
+ # Install dependencies
147
+ bun install
148
+
149
+ # (Optional) Enable auto-detection & scraping of Javascript React/Vue single-page apps
150
+ bun add puppeteer-core
151
+
152
+ # Start the DocShark MCP server in HTTP mode for local testing
153
+ bun run src/cli.ts start --port 6380
154
+ ```
155
+
156
+ ### Local CLI Debugging
157
+
158
+ ```bash
159
+ # Run CLI directly while developing
160
+ bun run src/cli.ts list
161
+ ```
162
+
163
+ ## 🔄 Versioning & Changelog
164
+
165
+ This project uses [Google's Release Please](https://github.com/googleapis/release-please) to automate versioning and changelog generation.
166
+
167
+ - **Semantic Versioning**: Our versions automatically bump (e.g. `0.0.1` -> `0.0.2` or `0.1.0`) based on standard Conventional Commits (`feat:`, `fix:`, `chore:`, etc.).
168
+ - **Automated**: A PR is automatically created on `master` when standard commits are merged, generating a standard `CHANGELOG.md`.
169
+
170
+ ## 📜 License
171
+
172
+ This project is open-source and available under the [MIT License](LICENSE).
173
+
174
+ ---
175
+
176
+ _Built to empower AI agents with the latest knowledge._
@@ -4,20 +4,32 @@ import { fileURLToPath } from "node:url";
4
4
  const scriptDir = dirname(fileURLToPath(import.meta.url));
5
5
  const packageJsonPath = resolve(scriptDir, "../../package.json");
6
6
  const versionFilePath = resolve(scriptDir, "../version.ts");
7
+ const releaseVersion = process.env.DOCSHARK_VERSION?.trim();
7
8
  const packageJson = JSON.parse(readFileSync(packageJsonPath, "utf8"));
8
- if (typeof packageJson.version !== "string") {
9
+ const targetVersion = releaseVersion && releaseVersion.length > 0
10
+ ? releaseVersion
11
+ : packageJson.version;
12
+ if (typeof targetVersion !== "string") {
9
13
  throw new Error("packages/core/package.json is missing a valid version.");
10
14
  }
11
15
  const nextVersionFile = [
12
16
  "// This file is automatically updated by the version sync script.",
13
- `export const VERSION = '${packageJson.version}';`,
17
+ `export const VERSION = '${targetVersion}';`,
14
18
  "",
15
19
  ].join("\n");
16
20
  const currentVersionFile = readFileSync(versionFilePath, "utf8");
17
21
  if (currentVersionFile !== nextVersionFile) {
18
22
  writeFileSync(versionFilePath, nextVersionFile, "utf8");
19
- console.log(`Updated src/version.ts to ${packageJson.version}`);
23
+ console.log(`Updated src/version.ts to ${targetVersion}`);
20
24
  }
21
25
  else {
22
- console.log(`src/version.ts already matches ${packageJson.version}`);
26
+ console.log(`src/version.ts already matches ${targetVersion}`);
27
+ }
28
+ if (releaseVersion && packageJson.version !== targetVersion) {
29
+ const nextPackageJson = {
30
+ ...packageJson,
31
+ version: targetVersion,
32
+ };
33
+ writeFileSync(packageJsonPath, `${JSON.stringify(nextPackageJson, null, 2)}\n`, "utf8");
34
+ console.log(`Updated package.json to ${targetVersion}`);
23
35
  }
package/dist/version.d.ts CHANGED
@@ -1 +1 @@
1
- export declare const VERSION = "0.1.17";
1
+ export declare const VERSION = "0.1.20";
package/dist/version.js CHANGED
@@ -1,2 +1,2 @@
1
1
  // This file is automatically updated by the version sync script.
2
- export const VERSION = '0.1.17';
2
+ export const VERSION = '0.1.20';
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "docshark",
3
- "version": "0.1.17",
3
+ "version": "0.1.20",
4
4
  "description": "🦈 Documentation MCP Server — scrape, index, and search any doc website",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",