docshark 0.1.5 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,13 @@
  # Changelog
 
+ ## 0.1.7 (2026-03-07)
+
+ **Full Changelog**: https://github.com/Michael-Obele/docshark/compare/v0.1.6...v0.1.7
+
+ ## 0.1.6 (2026-03-07)
+
+ **Full Changelog**: https://github.com/Michael-Obele/docshark/compare/v0.1.5...v0.1.6
+
  ## 0.1.5 (2026-03-02)
 
  **Full Changelog**: https://github.com/Michael-Obele/docshark/compare/v0.1.4...v0.1.5
package/README.md CHANGED
@@ -22,6 +22,7 @@
  ## šŸ“¦ What We Have Done (Phase 1)
 
  **Phase 1: Core Engine** is fully implemented and tested.
+
  - āœ… Custom SQLite Database with FTS5 virtual tables and auto-sync triggers.
  - āœ… Web scraping engine supporting standard `fetch()` and `puppeteer-core`.
  - āœ… Markdown processor utilizing Readability + Turndown.
@@ -46,62 +47,113 @@ We are actively polishing the integration between the core engine and external M
 
  ## šŸ› ļø Usage
 
- ### Installing & Running Locally
+ ### Quick Start (from npm)
 
- Ensure you have [Bun](https://bun.sh/) installed.
+ You can run DocShark directly without installing it globally using `bunx`:
 
  ```bash
- # Install dependencies
- bun install
+ # Add a documentation library to the index
+ bunx docshark add https://valibot.dev/guides/ --depth 2
 
- # (Optional) Enable auto-detection & scraping of Javascript React/Vue single-page apps
- bun add puppeteer-core
+ # Search your indexed docs
+ bunx docshark search "schema validation"
+ ```
 
- # Start the DocShark MCP server in HTTP mode
- bun run src/cli.ts start --port 6380
+ ### Installation
+
+ To install DocShark globally as a CLI tool:
+
+ ```bash
+ # Using npm
+ npm install -g docshark
+
+ # Using Bun
+ bun add -g docshark
  ```
 
- ### Important CLI Commands
+ After installation, you can use the `docshark` command:
 
  ```bash
- # Add a documentation library to the index
- bun run src/cli.ts add https://valibot.dev/guides/ --depth 2
+ docshark list
+ ```
 
- # Search your indexed docs
- bun run src/cli.ts search "schema validation"
+ ## šŸ”Œ MCP Integration
 
- # List all crawled libraries
- bun run src/cli.ts list
+ ### VS Code (GitHub Copilot / MCP Extension)
+
+ Add DocShark to your `.vscode/settings.json` or global MCP configuration:
+
+ ```json
+ {
+   "mcpServers": {
+     "docshark": {
+       "command": "bunx",
+       "args": ["-y", "docshark", "start", "--stdio"]
+     }
+   }
+ }
  ```
 
- ### Using in VS Code (Copilot Agent Mode)
+ ### Cursor
+
+ 1. Open **Cursor Settings** > **Models** > **MCP**.
+ 2. Click **+ Add New MCP Server**.
+ 3. Name: `docshark`
+ 4. Type: `command`
+ 5. Command: `bunx -y docshark start --stdio`
+
+ ### Claude Desktop
+
+ Edit your Claude Desktop configuration file:
+
+ - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
+ - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
 
- To use DocShark as an MCP server in VS Code:
- 1. Enable MCP discovery in your VS Code settings.
- 2. Create `.vscode/mcp.json` in your workspace:
  ```json
  {
-   "servers": {
+   "mcpServers": {
      "docshark": {
-       "type": "stdio",
-       "command": "bun",
-       "args": [
-         "run",
-         "/absolute/path/to/docshark/src/cli.ts",
-         "start",
-         "--stdio"
-       ]
+       "command": "bunx",
+       "args": ["-y", "docshark", "start", "--stdio"]
      }
    }
  }
  ```
- 3. Restart the server in VS Code properties, and your Copilot agent will now have access to the docshark tools.
 
  ---
 
+ ## šŸ› ļø Development
+
+ ### Local Setup
+
+ Ensure you have [Bun](https://bun.sh/) installed.
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/Michael-Obele/docshark.git
+ cd docshark
+
+ # Install dependencies
+ bun install
+
+ # (Optional) Enable auto-detection & scraping of Javascript React/Vue single-page apps
+ bun add puppeteer-core
+
+ # Start the DocShark MCP server in HTTP mode for local testing
+ bun run src/cli.ts start --port 6380
+ ```
+
+ ### Local CLI Debugging
+
+ ```bash
+ # Run CLI directly while developing
+ bun run src/cli.ts list
+ ```
+
  ## šŸ”„ Versioning & Changelog
 
  This project uses [Google's Release Please](https://github.com/googleapis/release-please) to automate versioning and changelog generation.
+
  - **Semantic Versioning**: Our versions automatically bump (e.g. `0.0.1` -> `0.0.2` or `0.1.0`) based on standard Conventional Commits (`feat:`, `fix:`, `chore:`, etc.).
  - **Automated**: A PR is automatically created on `master` when standard commits are merged, generating a standard `CHANGELOG.md`.
 
@@ -110,4 +162,5 @@ This project uses [Google's Release Please](https://github.com/googleapis/releas
  This project is open-source and available under the [MIT License](LICENSE).
 
  ---
- *Built to empower AI agents with the latest knowledge.*
+
+ _Built to empower AI agents with the latest knowledge._
@@ -0,0 +1,77 @@
+ import { VERSION } from '../version.js';
+ export function createApiRouter(deps) {
+   return {
+     async handle(request) {
+       const url = new URL(request.url);
+       const path = url.pathname.replace(/^\/api/, '');
+       const method = request.method;
+       try {
+         // GET /api/libraries
+         if (method === 'GET' && path === '/libraries') {
+           const status = url.searchParams.get('status') || 'all';
+           const libs = deps.db.listLibraries(status);
+           return json(libs);
+         }
+         // POST /api/libraries
+         if (method === 'POST' && path === '/libraries') {
+           const body = await request.json();
+           const lib = await deps.libraryService.add(body);
+           return json(lib, 201);
+         }
+         // DELETE /api/libraries/:id
+         const deleteMatch = path.match(/^\/libraries\/(.+)$/);
+         if (method === 'DELETE' && deleteMatch) {
+           deps.db.removeLibrary(deleteMatch[1]);
+           return json({ ok: true });
+         }
+         // POST /api/libraries/:id/refresh
+         const refreshMatch = path.match(/^\/libraries\/(.+)\/refresh$/);
+         if (method === 'POST' && refreshMatch) {
+           const job = deps.jobManager.startCrawl(refreshMatch[1]);
+           return json({ jobId: job.id });
+         }
+         // GET /api/search?q=...&library=...&limit=...
+         if (method === 'GET' && path === '/search') {
+           const q = url.searchParams.get('q') || '';
+           const library = url.searchParams.get('library') || undefined;
+           const limit = parseInt(url.searchParams.get('limit') || '5');
+           const results = deps.searchEngine.search(q, { library, limit });
+           return json(results);
+         }
+         // GET /api/crawls
+         if (method === 'GET' && path === '/crawls') {
+           const libraryId = url.searchParams.get('library_id') || undefined;
+           const jobs = deps.jobManager.listJobs(libraryId);
+           return json(jobs);
+         }
+         // GET /api/stats
+         if (method === 'GET' && path === '/stats') {
+           const libs = deps.db.listLibraries();
+           return json({
+             libraries: libs.length,
+             pages: libs.reduce((s, l) => s + l.page_count, 0),
+             chunks: libs.reduce((s, l) => s + l.chunk_count, 0),
+           });
+         }
+         // GET /api/health
+         if (method === 'GET' && path === '/health') {
+           return json({ status: 'ok', version: VERSION });
+         }
+         return new Response('Not Found', { status: 404 });
+       }
+       catch (err) {
+         console.error('[DocShark API]', err);
+         return json({ error: err.message }, 500);
+       }
+     },
+   };
+ }
+ function json(data, status = 200) {
+   return new Response(JSON.stringify(data), {
+     status,
+     headers: {
+       'Content-Type': 'application/json',
+       'Access-Control-Allow-Origin': '*',
+     },
+   });
+ }
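The new file above dispatches on `url.pathname` with regex matches and wraps every payload in a small `json` helper. A minimal, self-contained sketch of that same pattern (stubbed `/health` route only, placeholder version string, none of the package's real `deps` wiring), runnable in Node 18+ or Bun where `Request`/`Response` are global:

```javascript
// Sketch of the routing pattern used in the new API file (stub data, not the real deps).
function json(data, status = 200) {
  return new Response(JSON.stringify(data), {
    status,
    headers: {
      'Content-Type': 'application/json',
      'Access-Control-Allow-Origin': '*',
    },
  });
}

async function handle(request) {
  const url = new URL(request.url);
  // Strip the /api prefix, then match routes on method + path.
  const path = url.pathname.replace(/^\/api/, '');
  if (request.method === 'GET' && path === '/health') {
    return json({ status: 'ok', version: '0.0.0-sketch' });
  }
  return new Response('Not Found', { status: 404 });
}

// Usage: exercise the handler directly with a synthetic Request, no server needed.
handle(new Request('http://localhost:6380/api/health')).then(async (res) => {
  console.log(res.status, await res.json());
});
```

Because the handler is a plain `Request -> Response` function, it can be unit-tested without binding a port, which is presumably why the file exports a router rather than starting a server itself.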
package/dist/cli.d.ts CHANGED
@@ -1,2 +1,2 @@
- #!/usr/bin/env node
+ #!/usr/bin/env bun
  export {};
package/dist/cli.js CHANGED
@@ -1,179 +1,175 @@
- #!/usr/bin/env node
- import { createRequire } from "node:module";
- var __create = Object.create;
- var __getProtoOf = Object.getPrototypeOf;
- var __defProp = Object.defineProperty;
- var __getOwnPropNames = Object.getOwnPropertyNames;
- var __hasOwnProp = Object.prototype.hasOwnProperty;
- var __toESM = (mod, isNodeMode, target) => {
-   target = mod != null ? __create(__getProtoOf(mod)) : {};
-   const to = isNodeMode || !mod || !mod.__esModule ? __defProp(target, "default", { value: mod, enumerable: true }) : target;
-   for (let key of __getOwnPropNames(mod))
-     if (!__hasOwnProp.call(to, key))
-       __defProp(to, key, {
-         get: () => mod[key],
-         enumerable: true
-       });
-   return to;
- };
- var __require = /* @__PURE__ */ createRequire(import.meta.url);
-
- // src/cli.ts
+ #!/usr/bin/env bun
+ // src/cli.ts — DocShark CLI entry point
  import { Command } from "commander";
  import { startHttpServer } from "./http.js";
  import { StdioTransport } from "@tmcp/transport-stdio";
  import { server, db, searchEngine, libraryService } from "./server.js";
  import { VERSION } from "./version.js";
- var program = new Command().name("docshark").description("\uD83E\uDD88 Documentation MCP Server — scrape, index, and search any doc website").version(VERSION, "-v, --version", "output the current version");
- program.command("start", { isDefault: true }).description("Start the MCP server").option("-p, --port <port>", "HTTP server port", "6380").option("--stdio", "Run in STDIO mode (for Claude Desktop, Cursor, etc.)").option("--data-dir <path>", "Data directory", "").action(async (opts) => {
-   if (opts.dataDir) {
-     process.env.DOCSHARK_DATA_DIR = opts.dataDir;
-   }
-   db.init();
-   if (opts.stdio) {
-     const stdio = new StdioTransport(server);
-     stdio.listen();
-   } else {
-     await startHttpServer(parseInt(opts.port));
-   }
+ const program = new Command()
+   .name("docshark")
+   .description("🦈 Documentation MCP Server — scrape, index, and search any doc website")
+   .version(VERSION, "-v, --version", "output the current version");
+ program
+   .command("start", { isDefault: true })
+   .description("Start the MCP server")
+   .option("-p, --port <port>", "HTTP server port", "6380")
+   .option("--stdio", "Run in STDIO mode (for Claude Desktop, Cursor, etc.)")
+   .option("--data-dir <path>", "Data directory", "")
+   .action(async (opts) => {
+     if (opts.dataDir) {
+       process.env.DOCSHARK_DATA_DIR = opts.dataDir;
+     }
+     db.init();
+     if (opts.stdio) {
+       // STDIO mode — direct pipe, no HTTP
+       const stdio = new StdioTransport(server);
+       stdio.listen();
+     }
+     else {
+       await startHttpServer(parseInt(opts.port));
+     }
  });
- program.command("add <url>").description("Add a documentation library and start crawling").option("-n, --name <name>", "Library name (auto-generated from URL if omitted)").option("-d, --depth <n>", "Max crawl depth", "3").option("--lib-version <version>", "Library version").action(async (url, opts) => {
-   db.init();
-   try {
-     const lib = await libraryService.add({
-       url,
-       name: opts.name,
-       version: opts.libVersion,
-       maxDepth: parseInt(opts.depth)
-     });
-     console.log(`
- āœ… Added "${lib.display_name}" — crawling ${lib.url}...`);
-     console.log(` Job ID: ${lib.jobId}`);
-     console.log(` Use "docshark list" to check progress.
- `);
-     await waitForCrawl(lib.jobId);
-   } catch (err) {
-     console.error(`
- āŒ ${err.message}
- `);
-     process.exit(1);
-   }
+ program
+   .command("add <url>")
+   .description("Add a documentation library and start crawling")
+   .option("-n, --name <name>", "Library name (auto-generated from URL if omitted)")
+   .option("-d, --depth <n>", "Max crawl depth", "3")
+   .option("--lib-version <version>", "Library version")
+   .action(async (url, opts) => {
+     db.init();
+     try {
+       const lib = await libraryService.add({
+         url,
+         name: opts.name,
+         version: opts.libVersion,
+         maxDepth: parseInt(opts.depth),
+       });
+       console.log(`\nāœ… Added "${lib.display_name}" — crawling ${lib.url}...`);
+       console.log(` Job ID: ${lib.jobId}`);
+       console.log(` Use "docshark list" to check progress.\n`);
+       // Wait for the crawl to finish
+       await waitForCrawl(lib.jobId);
+     }
+     catch (err) {
+       console.error(`\nāŒ ${err.message}\n`);
+       process.exit(1);
+     }
  });
- program.command("search <query>").description("Search indexed documentation").option("-l, --library <name>", "Filter by library").option("--limit <n>", "Max results", "5").action(async (query, opts) => {
-   db.init();
-   const results = searchEngine.search(query, {
-     library: opts.library,
-     limit: parseInt(opts.limit)
-   });
-   if (results.length === 0) {
-     console.log(`
- No results found for "${query}".
- `);
-     return;
-   }
-   for (const r of results) {
-     console.log(`
- --- ${r.page_title} (${r.library_display_name}) ---`);
-     console.log(`Section: ${r.heading_context}`);
-     console.log(r.content.slice(0, 300));
-     console.log(`Source: ${r.page_url}
- `);
-   }
+ program
+   .command("search <query>")
+   .description("Search indexed documentation")
+   .option("-l, --library <name>", "Filter by library")
+   .option("--limit <n>", "Max results", "5")
+   .action(async (query, opts) => {
+     db.init();
+     const results = searchEngine.search(query, {
+       library: opts.library,
+       limit: parseInt(opts.limit),
+     });
+     if (results.length === 0) {
+       console.log(`\nNo results found for "${query}".\n`);
+       return;
+     }
+     for (const r of results) {
+       console.log(`\n--- ${r.page_title} (${r.library_display_name}) ---`);
+       console.log(`Section: ${r.heading_context}`);
+       console.log(r.content.slice(0, 300));
+       console.log(`Source: ${r.page_url}\n`);
+     }
  });
- program.command("list").description("List indexed libraries").action(() => {
-   db.init();
-   const libs = db.listLibraries();
-   if (libs.length === 0) {
-     console.log(`
- No libraries indexed. Use "docshark add <url>" to add one.
- `);
-     return;
-   }
-   console.table(libs.map((l) => ({
-     Name: l.name,
-     URL: l.url,
-     Pages: l.page_count,
-     Chunks: l.chunk_count,
-     Status: l.status,
-     "Last Crawled": l.last_crawled_at || "never"
-   })));
+ program
+   .command("list")
+   .description("List indexed libraries")
+   .action(() => {
+     db.init();
+     const libs = db.listLibraries();
+     if (libs.length === 0) {
+       console.log('\nNo libraries indexed. Use "docshark add <url>" to add one.\n');
+       return;
+     }
+     console.table(libs.map((l) => ({
+       Name: l.name,
+       URL: l.url,
+       Pages: l.page_count,
+       Chunks: l.chunk_count,
+       Status: l.status,
+       "Last Crawled": l.last_crawled_at || "never",
+     })));
  });
- program.command("refresh <name>").description("Refresh an existing documentation library").action(async (name) => {
-   db.init();
-   try {
-     const lib = db.getLibraryByName(name);
-     if (!lib)
-       throw new Error(`Library "${name}" not found.`);
-     const { jobManager } = await import("./server.js");
-     const job = jobManager.startCrawl(lib.id, { incremental: true });
-     console.log(`
- \uD83D\uDD04 Refreshing "${lib.display_name}" — crawling ${lib.url}...`);
-     console.log(` Job ID: ${job.id}`);
-     await waitForCrawl(job.id);
-   } catch (err) {
-     console.error(`
- āŒ ${err.message}
- `);
-     process.exit(1);
-   }
+ program
+   .command("refresh <name>")
+   .description("Refresh an existing documentation library")
+   .action(async (name) => {
+     db.init();
+     try {
+       const lib = db.getLibraryByName(name);
+       if (!lib)
+         throw new Error(`Library "${name}" not found.`);
+       const { jobManager } = await import("./server.js");
+       const job = jobManager.startCrawl(lib.id, { incremental: true });
+       console.log(`\nšŸ”„ Refreshing "${lib.display_name}" — crawling ${lib.url}...`);
+       console.log(` Job ID: ${job.id}`);
+       await waitForCrawl(job.id);
+     }
+     catch (err) {
+       console.error(`\nāŒ ${err.message}\n`);
+       process.exit(1);
+     }
  });
- program.command("remove <name>").description("Remove a documentation library and its index").action((name) => {
-   db.init();
-   try {
-     const lib = db.getLibraryByName(name);
-     if (!lib)
-       throw new Error(`Library "${name}" not found.`);
-     db.removeLibrary(lib.id);
-     console.log(`
- \uD83D\uDDD1\uFE0F Removed library "${lib.display_name}". Deleted ${lib.page_count} pages.
- `);
-   } catch (err) {
-     console.error(`
- āŒ ${err.message}
- `);
-     process.exit(1);
-   }
+ program
+   .command("remove <name>")
+   .description("Remove a documentation library and its index")
+   .action((name) => {
+     db.init();
+     try {
+       const lib = db.getLibraryByName(name);
+       if (!lib)
+         throw new Error(`Library "${name}" not found.`);
+       db.removeLibrary(lib.id);
+       console.log(`\nšŸ—‘ļø Removed library "${lib.display_name}". Deleted ${lib.page_count} pages.\n`);
+     }
+     catch (err) {
+       console.error(`\nāŒ ${err.message}\n`);
+       process.exit(1);
+     }
  });
- program.command("get <url>").description("Get the full markdown content of a specific indexed page").action((url) => {
-   db.init();
-   const page = db.getPage({ url });
-   if (!page) {
-     console.error(`
- āŒ Page not found in index: ${url}
- `);
-     process.exit(1);
-   }
-   console.log(`
- --- ${page.title} ---`);
-   console.log(`Source: ${page.url}
-
- `);
-   console.log(page.content_markdown);
-   console.log(`
- `);
+ program
+   .command("get <url>")
+   .description("Get the full markdown content of a specific indexed page")
+   .action((url) => {
+     db.init();
+     const page = db.getPage({ url });
+     if (!page) {
+       console.error(`\nāŒ Page not found in index: ${url}\n`);
+       process.exit(1);
+     }
+     console.log(`\n--- ${page.title} ---`);
+     console.log(`Source: ${page.url}\n\n`);
+     console.log(page.content_markdown);
+     console.log("\n");
  });
  program.parse();
+ /** Helper to wait for a crawl job to finish (CLI blocking mode) */
  async function waitForCrawl(jobId) {
-   const { jobManager } = await import("./server.js");
-   return new Promise((resolve) => {
-     const check = () => {
-       const job = jobManager.getJob(jobId);
-       if (!job || job.status === "completed" || job.status === "failed") {
-         if (job?.status === "completed") {
-           console.log(`
- \uD83E\uDD88 Crawl complete: ${job.pages_crawled} pages, ${job.chunks_created} chunks indexed.`);
-           if (job.pages_failed > 0) {
-             console.log(` āš ļø ${job.pages_failed} pages failed.`);
-           }
-         } else if (job?.status === "failed") {
-           console.error(`
- āŒ Crawl failed: ${job.error_message}`);
-         }
-         resolve();
-         return;
-       }
-       setTimeout(check, 1000);
-     };
-     check();
-   });
+   const { jobManager } = await import("./server.js");
+   return new Promise((resolve) => {
+     const check = () => {
+       const job = jobManager.getJob(jobId);
+       if (!job || job.status === "completed" || job.status === "failed") {
+         if (job?.status === "completed") {
+           console.log(`\n🦈 Crawl complete: ${job.pages_crawled} pages, ${job.chunks_created} chunks indexed.`);
+           if (job.pages_failed > 0) {
+             console.log(` āš ļø ${job.pages_failed} pages failed.`);
+           }
+         }
+         else if (job?.status === "failed") {
+           console.error(`\nāŒ Crawl failed: ${job.error_message}`);
+         }
+         resolve();
+         return;
+       }
+       setTimeout(check, 1000);
+     };
+     check();
+   });
  }
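The reformatted `waitForCrawl` above polls the job manager once per second and resolves when the job reaches a terminal state (or disappears). The same pattern in isolation, with an in-memory stub standing in for `jobManager` (all names here are illustrative, not the package's real API):

```javascript
// Stub job store: the job "completes" after two polls.
const jobs = new Map([['job-1', { status: 'running', polls: 0 }]]);

function getJob(id) {
  const job = jobs.get(id);
  if (job && job.status === 'running' && ++job.polls >= 2) {
    job.status = 'completed';
  }
  return job;
}

// Same shape as waitForCrawl: re-check on a timer, resolve once terminal or gone.
function waitForJob(jobId, intervalMs = 10) {
  return new Promise((resolve) => {
    const check = () => {
      const job = getJob(jobId);
      if (!job || job.status === 'completed' || job.status === 'failed') {
        resolve(job);
        return;
      }
      setTimeout(check, intervalMs);
    };
    check();
  });
}

// Usage: resolves after the stub flips to "completed".
waitForJob('job-1').then((job) => console.log(job.status));
```

Using `setTimeout` recursively rather than `setInterval` means a slow `getJob` call can never overlap with the next poll, which matches the structure the CLI uses.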