npm - bluera-knowledge - Versions diffs - 0.13.1 → 0.13.3 - Mend

bluera-knowledge 0.13.1 → 0.13.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23) hide show

package/.claude-plugin/plugin.json +2 -15
package/.mcp.json +11 -0
package/CHANGELOG.md +14 -0
package/CONTRIBUTING.md +307 -0
package/README.md +51 -1168
package/dist/{chunk-6ZVW2P2F.js → chunk-AJI5DCKY.js} +67 -50
package/dist/{chunk-6ZVW2P2F.js.map → chunk-AJI5DCKY.js.map} +1 -1
package/dist/{chunk-GCUKVV33.js → chunk-AOSDVRRH.js} +2 -2
package/dist/{chunk-H5AKKHY7.js → chunk-XL2UHMBL.js} +2 -2
package/dist/index.js +3 -3
package/dist/mcp/server.js +2 -2
package/dist/workers/background-worker-cli.js +2 -2
package/docs/cli.md +170 -0
package/docs/commands.md +392 -0
package/docs/crawler-architecture.md +89 -0
package/docs/mcp-integration.md +130 -0
package/docs/token-efficiency.md +91 -0
package/package.json +1 -1
package/src/crawl/bridge.test.ts +1 -1
package/src/crawl/bridge.ts +25 -2
package/src/mcp/plugin-mcp-config.test.ts +26 -19
/package/dist/{chunk-GCUKVV33.js.map → chunk-AOSDVRRH.js.map} +0 -0
/package/dist/{chunk-H5AKKHY7.js.map → chunk-XL2UHMBL.js.map} +0 -0

package/docs/crawler-architecture.md ADDED Viewed

@@ -0,0 +1,89 @@
+# Crawler Architecture
+The crawler defaults to **headless mode** (Playwright) for maximum compatibility with modern JavaScript-rendered sites. Use `--fast` for static HTML sites when speed is critical.
+## Default Mode (Headless - JavaScript-Rendered Sites)
+By default, the crawler uses Playwright via crawl4ai to render JavaScript content:
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI
+    participant IntelligentCrawler
+    participant Axios
+    participant Claude
+    User->>CLI: crawl URL --crawl "instruction"
+    CLI->>IntelligentCrawler: crawl(url, {useHeadless: true})
+    IntelligentCrawler->>PythonBridge: fetchHeadless(url)
+    PythonBridge->>crawl4ai: AsyncWebCrawler.arun(url)
+    crawl4ai->>Playwright: Launch browser & render JS
+    Playwright-->>crawl4ai: Rendered HTML
+    crawl4ai-->>PythonBridge: {html, markdown, links}
+    PythonBridge-->>IntelligentCrawler: Rendered HTML
+    IntelligentCrawler->>Claude: determineCrawlUrls(html, instruction)
+    Note over Claude: Natural language instruction<br/>STILL FULLY ACTIVE
+    Claude-->>IntelligentCrawler: [urls to crawl]
+    loop For each URL
+        IntelligentCrawler->>PythonBridge: fetchHeadless(url)
+        PythonBridge->>crawl4ai: Render JS
+        crawl4ai-->>PythonBridge: HTML
+        PythonBridge-->>IntelligentCrawler: HTML
+        IntelligentCrawler->>IntelligentCrawler: Convert to markdown & index
+    end
+```
+## Fast Mode (Static Sites - `--fast`)
+For static HTML sites, use `--fast` for faster crawling with axios:
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI
+    participant IntelligentCrawler
+    participant Axios
+    participant Claude
+    User->>CLI: crawl URL --crawl "instruction" --fast
+    CLI->>IntelligentCrawler: crawl(url, {useHeadless: false})
+    IntelligentCrawler->>Axios: fetchHtml(url)
+    Axios-->>IntelligentCrawler: Static HTML
+    IntelligentCrawler->>Claude: determineCrawlUrls(html, instruction)
+    Claude-->>IntelligentCrawler: [urls to crawl]
+    loop For each URL
+        IntelligentCrawler->>Axios: fetchHtml(url)
+        Axios-->>IntelligentCrawler: HTML
+        IntelligentCrawler->>IntelligentCrawler: Convert to markdown & index
+    end
+```
+---
+## Key Points
+| Feature | Description |
+|---------|-------------|
+| **Default to headless** | Maximum compatibility with modern JavaScript-rendered sites (React, Vue, Next.js) |
+| **Fast mode available** | Use `--fast` for static HTML sites when speed is critical |
+| **Intelligent crawling preserved** | Claude Code CLI analyzes pages and selects URLs in both modes |
+| **Automatic fallback** | If headless fetch fails, automatically falls back to axios |
+---
+## Intelligent Mode vs Simple Mode
+The crawler operates in two modes depending on Claude Code CLI availability:
+| Mode | Requires Claude CLI | Behavior |
+|------|---------------------|----------|
+| **Intelligent** | Yes | Claude analyzes pages and selects URLs based on natural language instructions |
+| **Simple (BFS)** | No | Breadth-first crawl up to max depth (2 levels) |
+**Automatic detection:**
+- When Claude Code CLI is available: Full intelligent mode with `--crawl` and `--extract` instructions
+- When Claude Code CLI is unavailable: Automatically uses simple BFS mode
+- Clear messaging: "Claude CLI not found, using simple crawl mode"
+> **Note:** Install Claude Code to unlock `--crawl` (AI-guided URL selection) and `--extract` (AI content extraction). Without it, web crawling still works but uses simple BFS mode.

package/docs/mcp-integration.md ADDED Viewed

@@ -0,0 +1,130 @@
+# MCP Integration
+The plugin includes a Model Context Protocol server that exposes search tools for AI agents.
+> **Important:** Commands vs MCP Tools: You interact with the plugin using `/bluera-knowledge:` slash commands. Behind the scenes, these commands instruct Claude Code to use MCP tools (`mcp__bluera-knowledge__*`) which handle the actual operations. Commands provide the user interface, while MCP tools are the backend that AI agents use to access your knowledge stores.
+## Configuration
+The MCP server is configured in `.mcp.json` at the plugin root:
+```json
+{
+  "bluera-knowledge": {
+    "command": "node",
+    "args": ["${CLAUDE_PLUGIN_ROOT}/dist/mcp/server.js"],
+    "env": {
+      "PROJECT_ROOT": "${PWD}",
+      "DATA_DIR": ".bluera/bluera-knowledge/data",
+      "CONFIG_PATH": ".bluera/bluera-knowledge/config.json"
+    }
+  }
+}
+```
+> **Note:** We use a separate `.mcp.json` file rather than inline `mcpServers` in `plugin.json` due to [Claude Code Bug #16143](https://github.com/anthropics/claude-code/issues/16143). This is the recommended pattern for Claude Code plugins.
+---
+## Context Efficiency Strategy
+### Why Only 3 MCP Tools?
+Every MCP tool exposed requires its full schema to be sent to Claude with each tool invocation. More tools = more tokens consumed before Claude can even respond.
+**Design decision:** Consolidate from 10+ tools down to 3:
+| Approach | Tool Count | Context Cost | Trade-off |
+|----------|------------|--------------|-----------|
+| Individual tools | 10+ | ~800+ tokens | Simple calls, high overhead |
+| **Consolidated (current)** | 3 | ~300 tokens | Minimal overhead, slightly longer commands |
+### How It Works
+1. **Native tools for common workflow** - `search` and `get_full_context` are the operations Claude uses most often, so they get dedicated tools with full schemas
+2. **Meta-tool for management** - The `execute` tool consolidates 8 store/job management commands into a single tool. Commands are discovered on-demand via `execute("commands")` or `execute("help", {command: "store:create"})`
+3. **Lazy documentation** - Command help isn't pre-sent with tool listings; it's discoverable when needed
+**Result:** ~60% reduction in context overhead for MCP tool listings, without sacrificing functionality.
+> **Tip:** This pattern—consolidating infrequent operations into a meta-tool while keeping high-frequency operations native—is a general strategy for MCP context efficiency.
+---
+## Available MCP Tools
+### `search`
+Semantic vector search across all indexed stores or a specific subset. Returns structured code units with relevance ranking.
+**Parameters:**
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `query` | string | Search query (natural language, patterns, or type signatures) |
+| `intent` | string | Search intent: find-pattern, find-implementation, find-usage, find-definition, find-documentation |
+| `mode` | string | Search mode: hybrid (default), vector, or fts |
+| `detail` | string | Context level: minimal, contextual, or full |
+| `limit` | number | Maximum results (default: 10) |
+| `stores` | array | Array of specific store IDs to search (optional, searches all stores if not specified) |
+| `threshold` | number | Minimum normalized score (0-1) for filtering results |
+| `minRelevance` | number | Minimum raw cosine similarity (0-1) for filtering results |
+### `get_full_context`
+Retrieve complete code and context for a specific search result by ID.
+**Parameters:**
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `resultId` | string | The result ID from a previous search |
+### `execute`
+Meta-tool for store and job management. Consolidates 8 operations into one tool with subcommands.
+**Parameters:**
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `command` | string | Command to execute (see below) |
+| `args` | object | Command-specific arguments (optional) |
+**Available commands:**
+| Command | Args | Description |
+|---------|------|-------------|
+| `stores` | `type?` | List all knowledge stores |
+| `store:info` | `store` | Get detailed store information including file path |
+| `store:create` | `name`, `type`, `source`, `branch?`, `description?` | Create a new store |
+| `store:index` | `store` | Re-index an existing store |
+| `store:delete` | `store` | Delete a store and all data |
+| `stores:sync` | `dryRun?`, `prune?`, `reindex?` | Sync stores from definitions config |
+| `jobs` | `activeOnly?`, `status?` | List background jobs |
+| `job:status` | `jobId` | Check specific job status |
+| `job:cancel` | `jobId` | Cancel a running job |
+| `help` | `command?` | Show help for commands |
+| `commands` | - | List all available commands |
+---
+## Known Issues
+### MCP Configuration Pattern
+This plugin uses a separate `.mcp.json` file for MCP server configuration (rather than inline in `plugin.json`). This is the recommended pattern for Claude Code plugins due to [Bug #16143](https://github.com/anthropics/claude-code/issues/16143) where inline `mcpServers` may be ignored during plugin manifest parsing.
+If you're developing a Claude Code plugin with MCP integration, we recommend:
+1. Create a `.mcp.json` file at your plugin root
+2. Do NOT use inline `mcpServers` in `plugin.json`
+### Related Claude Code Bugs
+| Issue | Status | Description |
+|-------|--------|-------------|
+| [#16143](https://github.com/anthropics/claude-code/issues/16143) | Open | Inline `mcpServers` in plugin.json ignored |
+| [#13543](https://github.com/anthropics/claude-code/issues/13543) | Fixed v2.0.65 | .mcp.json files not copied to plugin cache |
+| [#18336](https://github.com/anthropics/claude-code/issues/18336) | Open | MCP plugin shows enabled but no resources available |

package/docs/token-efficiency.md ADDED Viewed

@@ -0,0 +1,91 @@
+# Token Efficiency
+Beyond speed and accuracy, Bluera Knowledge can **significantly reduce token consumption** for code-related queries—typically saving 60-75% compared to web search approaches.
+## How It Works
+**Without Bluera Knowledge:**
+- Web searches return 5-10 results (~500-2,000 tokens each)
+- Total per search: **3,000-10,000 tokens**
+- Often need multiple searches to find the right answer
+- Lower signal-to-noise ratio (blog posts mixed with actual docs)
+**With Bluera Knowledge:**
+- Semantic search returns top 10 relevant code chunks (~200-400 tokens each)
+- Structured metadata (file paths, imports, purpose)
+- Total per search: **1,500-3,000 tokens**
+- Higher relevance due to vector search (fewer follow-up queries needed)
+---
+## Real-World Examples
+### Example 1: Library Implementation Question
+**Question:** "How does Express handle middleware errors?"
+| Approach | Token Cost | Result |
+|----------|-----------|--------|
+| **Web Search** | ~8,000 tokens (3 searches: general query → refined query → source code) | Blog posts + Stack Overflow + eventual guess |
+| **Bluera Knowledge** | ~2,000 tokens (1 semantic search) | Actual Express source code, authoritative |
+| **Savings** | **75% fewer tokens** | Higher accuracy |
+### Example 2: Dependency Exploration
+**Question:** "How does LanceDB's vector search work?"
+| Approach | Token Cost | Result |
+|----------|-----------|--------|
+| **Web Search** | ~9,500 tokens (General docs → API docs → fetch specific page) | Documentation, might miss implementation details |
+| **Bluera Knowledge** | ~1,500 tokens (Search returns source + tests + examples) | Source code from Python + Rust implementation |
+| **Savings** | **84% fewer tokens** | Complete picture |
+### Example 3: Version-Specific Behavior
+**Question:** "What changed in React 18's useEffect cleanup?"
+| Approach | Token Cost | Result |
+|----------|-----------|--------|
+| **Training Data** | 0 tokens (but might be outdated) | Uncertain if accurate for React 18 |
+| **Web Search** | ~5,000 tokens (Search changelog → blog posts → docs) | Mix of React 17 and 18 info |
+| **Bluera Knowledge** | ~2,000 tokens (Search indexed React 18 source) | Exact React 18 implementation |
+| **Savings** | **60% fewer tokens** | Version-accurate |
+---
+## When BK Uses More Tokens
+Bluera Knowledge isn't always the most token-efficient choice:
+| Scenario | Best Approach | Why |
+|----------|---------------|-----|
+| **Simple concept questions** ("What is a JavaScript closure?") | Training data | Claude already knows this (0 tokens) |
+| **Current events** ("Latest Next.js 15 release notes") | Web search | BK only has what you've indexed |
+| **General advice** ("How to structure a React app?") | Training data | Opinion-based, not code-specific |
+---
+## Summary: Token Savings by Query Type
+| Query Type | Typical Token Savings | When to Use BK |
+|------------|----------------------|----------------|
+| **Library internals** | 60-75% | Always |
+| **Version-specific behavior** | 50-70% | Always |
+| **"How does X work internally?"** | 70-85% | Always |
+| **API usage examples** | 40-60% | Recommended |
+| **General concepts** | -100% (uses more) | Skip BK |
+| **Current events** | -100% (uses more) | Skip BK |
+---
+## Best Practice
+**Default to BK for library questions.** It's cheap, fast, and authoritative:
+| Question Type | Action | Why |
+|--------------|--------|-----|
+| Library internals, APIs, errors, versions, config | **Query BK first** | Source code is definitive, 60-85% token savings |
+| General programming concepts | Skip BK | Training data is sufficient |
+| Breaking news, release notes | Web search | BK only has indexed content |
+The plugin's Skills teach Claude Code these patterns automatically. When in doubt about a dependency, query BK—it's faster and more accurate than guessing or web searching.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "bluera-knowledge",
-  "version": "0.13.1",
+  "version": "0.13.3",
   "description": "CLI tool for managing knowledge stores with semantic search",
   "type": "module",
   "bin": {

package/src/crawl/bridge.test.ts CHANGED Viewed

@@ -78,7 +78,7 @@ describe('PythonBridge', () => {
       expect(spawn).toHaveBeenCalledWith(
         'python3',
-        ['python/crawl_worker.py'],
+        [expect.stringMatching(/.*\/python\/crawl_worker\.py$/)],
         expect.objectContaining({
           stdio: ['pipe', 'pipe', 'pipe'],
         })

package/src/crawl/bridge.ts CHANGED Viewed

@@ -1,6 +1,8 @@
 import { spawn, type ChildProcess } from 'node:child_process';
 import { randomUUID } from 'node:crypto';
+import path from 'node:path';
 import { createInterface, type Interface as ReadlineInterface } from 'node:readline';
+import { fileURLToPath } from 'node:url';
 import { ZodError } from 'zod';
 import {
   type CrawlResult,
@@ -37,9 +39,30 @@ export class PythonBridge {
   start(): Promise<void> {
     if (this.process) return Promise.resolve();
-    logger.debug('Starting Python bridge process');
+    // Compute absolute path to Python worker using import.meta.url
+    // This works both in development (src/) and production (dist/)
+    const currentFilePath = fileURLToPath(import.meta.url);
+    const isProduction = currentFilePath.includes('/dist/');
+    let pythonWorkerPath: string;
+    if (isProduction) {
+      // Production: Find dist dir and go to sibling python/ directory
+      const distIndex = currentFilePath.indexOf('/dist/');
+      const pluginRoot = currentFilePath.substring(0, distIndex);
+      pythonWorkerPath = path.join(pluginRoot, 'python', 'crawl_worker.py');
+    } else {
+      // Development: Go up from src/crawl to find python/
+      const srcDir = path.dirname(path.dirname(currentFilePath));
+      const projectRoot = path.dirname(srcDir);
+      pythonWorkerPath = path.join(projectRoot, 'python', 'crawl_worker.py');
+    }
+    logger.debug(
+      { pythonWorkerPath, currentFilePath, isProduction },
+      'Starting Python bridge process'
+    );
-    this.process = spawn('python3', ['python/crawl_worker.py'], {
+    this.process = spawn('python3', [pythonWorkerPath], {
       stdio: ['pipe', 'pipe', 'pipe'],
     });

package/src/mcp/plugin-mcp-config.test.ts CHANGED Viewed

@@ -3,25 +3,32 @@ import { readFileSync, existsSync } from 'fs';
 import { join } from 'path';
 /**
- * Tests to verify plugin.json is correctly configured for MCP server.
+ * Tests to verify .mcp.json is correctly configured for MCP server.
  * The MCP server must work when the plugin is installed via marketplace.
  *
  * Key requirements:
  * - Must use ${CLAUDE_PLUGIN_ROOT} for server path (resolves to plugin cache)
  * - Must set PROJECT_ROOT env var (required by server fail-fast check)
  * - Must NOT use relative paths (would resolve to user's project, not plugin)
+ *
+ * Note: We use .mcp.json at plugin root (not inline in plugin.json) due to
+ * Claude Code Bug #16143 where inline mcpServers is ignored during parsing.
+ * See: https://github.com/anthropics/claude-code/issues/16143
  */
-describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
-  const configPath = join(process.cwd(), '.claude-plugin/plugin.json');
-  const config = JSON.parse(readFileSync(configPath, 'utf-8'));
+describe('Plugin MCP Configuration (.mcp.json)', () => {
+  const mcpJsonPath = join(process.cwd(), '.mcp.json');
+  const config = JSON.parse(readFileSync(mcpJsonPath, 'utf-8'));
+  it('has .mcp.json file at plugin root', () => {
+    expect(existsSync(mcpJsonPath)).toBe(true);
+  });
-  it('has mcpServers configuration inline', () => {
-    expect(config).toHaveProperty('mcpServers');
-    expect(config.mcpServers).toHaveProperty('bluera-knowledge');
+  it('has bluera-knowledge server configuration', () => {
+    expect(config).toHaveProperty('bluera-knowledge');
   });
   it('uses ${CLAUDE_PLUGIN_ROOT} for server path (required for plugin mode)', () => {
-    const serverConfig = config.mcpServers['bluera-knowledge'];
+    const serverConfig = config['bluera-knowledge'];
     const argsString = JSON.stringify(serverConfig.args);
     // CLAUDE_PLUGIN_ROOT is set by Claude Code when plugin is installed
@@ -31,7 +38,7 @@ describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
   });
   it('does NOT use relative paths (would break in plugin mode)', () => {
-    const serverConfig = config.mcpServers['bluera-knowledge'];
+    const serverConfig = config['bluera-knowledge'];
     const argsString = JSON.stringify(serverConfig.args);
     // Relative paths like ./dist would resolve to user's project directory
@@ -40,7 +47,7 @@ describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
   });
   it('sets PROJECT_ROOT environment variable (required by fail-fast server)', () => {
-    const serverConfig = config.mcpServers['bluera-knowledge'];
+    const serverConfig = config['bluera-knowledge'];
     // PROJECT_ROOT is required since b404cd6 (fail-fast change)
     expect(serverConfig.env).toHaveProperty('PROJECT_ROOT');
@@ -49,16 +56,16 @@ describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
 });
 /**
- * Tests to ensure .mcp.json is NOT distributed with the plugin.
- * .mcp.json at project root causes confusion between plugin and project config.
+ * Tests to ensure plugin.json does NOT have inline mcpServers.
+ * Inline mcpServers is broken due to Claude Code Bug #16143.
  */
-describe('No conflicting .mcp.json in repo', () => {
-  it('does NOT have .mcp.json in repo root (prevents config confusion)', () => {
-    const mcpJsonPath = join(process.cwd(), '.mcp.json');
+describe('plugin.json does NOT have inline mcpServers', () => {
+  it('plugin.json does NOT contain mcpServers (would be ignored by Claude Code)', () => {
+    const pluginJsonPath = join(process.cwd(), '.claude-plugin/plugin.json');
+    const pluginConfig = JSON.parse(readFileSync(pluginJsonPath, 'utf-8'));
-    // .mcp.json should NOT exist in the repo
-    // - For plugin mode: use mcpServers in plugin.json
-    // - For development: use ~/.claude.json per README
-    expect(existsSync(mcpJsonPath)).toBe(false);
+    // mcpServers in plugin.json is ignored due to Bug #16143
+    // All MCP config should be in .mcp.json at plugin root
+    expect(pluginConfig).not.toHaveProperty('mcpServers');
   });
 });

/package/dist/{chunk-GCUKVV33.js.map → chunk-AOSDVRRH.js.map} RENAMED Viewed

File without changes

/package/dist/{chunk-H5AKKHY7.js.map → chunk-XL2UHMBL.js.map} RENAMED Viewed

File without changes