bluera-knowledge 0.13.1 → 0.13.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,89 @@
1
+ # Crawler Architecture
2
+
3
+ The crawler defaults to **headless mode** (Playwright) for maximum compatibility with modern JavaScript-rendered sites. Use `--fast` for static HTML sites when speed is critical.
4
+
5
+ ## Default Mode (Headless - JavaScript-Rendered Sites)
6
+
7
+ By default, the crawler uses Playwright via crawl4ai to render JavaScript content:
8
+
9
+ ```mermaid
10
+ sequenceDiagram
11
+ participant User
12
+ participant CLI
13
+ participant IntelligentCrawler
14
+ participant Axios
15
+ participant Claude
16
+
17
+ User->>CLI: crawl URL --crawl "instruction"
18
+ CLI->>IntelligentCrawler: crawl(url, {useHeadless: true})
19
+ IntelligentCrawler->>PythonBridge: fetchHeadless(url)
20
+ PythonBridge->>crawl4ai: AsyncWebCrawler.arun(url)
21
+ crawl4ai->>Playwright: Launch browser & render JS
22
+ Playwright-->>crawl4ai: Rendered HTML
23
+ crawl4ai-->>PythonBridge: {html, markdown, links}
24
+ PythonBridge-->>IntelligentCrawler: Rendered HTML
25
+ IntelligentCrawler->>Claude: determineCrawlUrls(html, instruction)
26
+ Note over Claude: Natural language instruction<br/>STILL FULLY ACTIVE
27
+ Claude-->>IntelligentCrawler: [urls to crawl]
28
+ loop For each URL
29
+ IntelligentCrawler->>PythonBridge: fetchHeadless(url)
30
+ PythonBridge->>crawl4ai: Render JS
31
+ crawl4ai-->>PythonBridge: HTML
32
+ PythonBridge-->>IntelligentCrawler: HTML
33
+ IntelligentCrawler->>IntelligentCrawler: Convert to markdown & index
34
+ end
35
+ ```
36
+
37
+ ## Fast Mode (Static Sites - `--fast`)
38
+
39
+ For static HTML sites, use `--fast` for faster crawling with axios:
40
+
41
+ ```mermaid
42
+ sequenceDiagram
43
+ participant User
44
+ participant CLI
45
+ participant IntelligentCrawler
46
+ participant Axios
47
+ participant Claude
48
+
49
+ User->>CLI: crawl URL --crawl "instruction" --fast
50
+ CLI->>IntelligentCrawler: crawl(url, {useHeadless: false})
51
+ IntelligentCrawler->>Axios: fetchHtml(url)
52
+ Axios-->>IntelligentCrawler: Static HTML
53
+ IntelligentCrawler->>Claude: determineCrawlUrls(html, instruction)
54
+ Claude-->>IntelligentCrawler: [urls to crawl]
55
+ loop For each URL
56
+ IntelligentCrawler->>Axios: fetchHtml(url)
57
+ Axios-->>IntelligentCrawler: HTML
58
+ IntelligentCrawler->>IntelligentCrawler: Convert to markdown & index
59
+ end
60
+ ```
61
+
62
+ ---
63
+
64
+ ## Key Points
65
+
66
+ | Feature | Description |
67
+ |---------|-------------|
68
+ | **Default to headless** | Maximum compatibility with modern JavaScript-rendered sites (React, Vue, Next.js) |
69
+ | **Fast mode available** | Use `--fast` for static HTML sites when speed is critical |
70
+ | **Intelligent crawling preserved** | Claude Code CLI analyzes pages and selects URLs in both modes |
71
+ | **Automatic fallback** | If headless fetch fails, automatically falls back to axios |
72
+
73
+ ---
74
+
75
+ ## Intelligent Mode vs Simple Mode
76
+
77
+ The crawler operates in two modes depending on Claude Code CLI availability:
78
+
79
+ | Mode | Requires Claude CLI | Behavior |
80
+ |------|---------------------|----------|
81
+ | **Intelligent** | Yes | Claude analyzes pages and selects URLs based on natural language instructions |
82
+ | **Simple (BFS)** | No | Breadth-first crawl up to max depth (2 levels) |
83
+
84
+ **Automatic detection:**
85
+ - When Claude Code CLI is available: Full intelligent mode with `--crawl` and `--extract` instructions
86
+ - When Claude Code CLI is unavailable: Automatically uses simple BFS mode
87
+ - Clear messaging: "Claude CLI not found, using simple crawl mode"
88
+
89
+ > **Note:** Install Claude Code to unlock `--crawl` (AI-guided URL selection) and `--extract` (AI content extraction). Without it, web crawling still works but uses simple BFS mode.
@@ -0,0 +1,130 @@
1
+ # MCP Integration
2
+
3
+ The plugin includes a Model Context Protocol server that exposes search tools for AI agents.
4
+
5
+ > **Important:** Commands vs MCP Tools: You interact with the plugin using `/bluera-knowledge:` slash commands. Behind the scenes, these commands instruct Claude Code to use MCP tools (`mcp__bluera-knowledge__*`) which handle the actual operations. Commands provide the user interface, while MCP tools are the backend that AI agents use to access your knowledge stores.
6
+
7
+ ## Configuration
8
+
9
+ The MCP server is configured in `.mcp.json` at the plugin root:
10
+
11
+ ```json
12
+ {
13
+ "bluera-knowledge": {
14
+ "command": "node",
15
+ "args": ["${CLAUDE_PLUGIN_ROOT}/dist/mcp/server.js"],
16
+ "env": {
17
+ "PROJECT_ROOT": "${PWD}",
18
+ "DATA_DIR": ".bluera/bluera-knowledge/data",
19
+ "CONFIG_PATH": ".bluera/bluera-knowledge/config.json"
20
+ }
21
+ }
22
+ }
23
+ ```
24
+
25
+ > **Note:** We use a separate `.mcp.json` file rather than inline `mcpServers` in `plugin.json` due to [Claude Code Bug #16143](https://github.com/anthropics/claude-code/issues/16143). This is the recommended pattern for Claude Code plugins.
26
+
27
+ ---
28
+
29
+ ## Context Efficiency Strategy
30
+
31
+ ### Why Only 3 MCP Tools?
32
+
33
+ Every MCP tool exposed requires its full schema to be sent to Claude with each tool invocation. More tools = more tokens consumed before Claude can even respond.
34
+
35
+ **Design decision:** Consolidate from 10+ tools down to 3:
36
+
37
+ | Approach | Tool Count | Context Cost | Trade-off |
38
+ |----------|------------|--------------|-----------|
39
+ | Individual tools | 10+ | ~800+ tokens | Simple calls, high overhead |
40
+ | **Consolidated (current)** | 3 | ~300 tokens | Minimal overhead, slightly longer commands |
41
+
42
+ ### How It Works
43
+
44
+ 1. **Native tools for common workflow** - `search` and `get_full_context` are the operations Claude uses most often, so they get dedicated tools with full schemas
45
+
46
+ 2. **Meta-tool for management** - The `execute` tool consolidates 8 store/job management commands into a single tool. Commands are discovered on-demand via `execute("commands")` or `execute("help", {command: "store:create"})`
47
+
48
+ 3. **Lazy documentation** - Command help isn't pre-sent with tool listings; it's discoverable when needed
49
+
50
+ **Result:** ~60% reduction in context overhead for MCP tool listings, without sacrificing functionality.
51
+
52
+ > **Tip:** This pattern—consolidating infrequent operations into a meta-tool while keeping high-frequency operations native—is a general strategy for MCP context efficiency.
53
+
54
+ ---
55
+
56
+ ## Available MCP Tools
57
+
58
+ ### `search`
59
+
60
+ Semantic vector search across all indexed stores or a specific subset. Returns structured code units with relevance ranking.
61
+
62
+ **Parameters:**
63
+
64
+ | Parameter | Type | Description |
65
+ |-----------|------|-------------|
66
+ | `query` | string | Search query (natural language, patterns, or type signatures) |
67
+ | `intent` | string | Search intent: find-pattern, find-implementation, find-usage, find-definition, find-documentation |
68
+ | `mode` | string | Search mode: hybrid (default), vector, or fts |
69
+ | `detail` | string | Context level: minimal, contextual, or full |
70
+ | `limit` | number | Maximum results (default: 10) |
71
+ | `stores` | array | Array of specific store IDs to search (optional, searches all stores if not specified) |
72
+ | `threshold` | number | Minimum normalized score (0-1) for filtering results |
73
+ | `minRelevance` | number | Minimum raw cosine similarity (0-1) for filtering results |
74
+
75
+ ### `get_full_context`
76
+
77
+ Retrieve complete code and context for a specific search result by ID.
78
+
79
+ **Parameters:**
80
+
81
+ | Parameter | Type | Description |
82
+ |-----------|------|-------------|
83
+ | `resultId` | string | The result ID from a previous search |
84
+
85
+ ### `execute`
86
+
87
+ Meta-tool for store and job management. Consolidates 8 operations into one tool with subcommands.
88
+
89
+ **Parameters:**
90
+
91
+ | Parameter | Type | Description |
92
+ |-----------|------|-------------|
93
+ | `command` | string | Command to execute (see below) |
94
+ | `args` | object | Command-specific arguments (optional) |
95
+
96
+ **Available commands:**
97
+
98
+ | Command | Args | Description |
99
+ |---------|------|-------------|
100
+ | `stores` | `type?` | List all knowledge stores |
101
+ | `store:info` | `store` | Get detailed store information including file path |
102
+ | `store:create` | `name`, `type`, `source`, `branch?`, `description?` | Create a new store |
103
+ | `store:index` | `store` | Re-index an existing store |
104
+ | `store:delete` | `store` | Delete a store and all data |
105
+ | `stores:sync` | `dryRun?`, `prune?`, `reindex?` | Sync stores from definitions config |
106
+ | `jobs` | `activeOnly?`, `status?` | List background jobs |
107
+ | `job:status` | `jobId` | Check specific job status |
108
+ | `job:cancel` | `jobId` | Cancel a running job |
109
+ | `help` | `command?` | Show help for commands |
110
+ | `commands` | - | List all available commands |
111
+
112
+ ---
113
+
114
+ ## Known Issues
115
+
116
+ ### MCP Configuration Pattern
117
+
118
+ This plugin uses a separate `.mcp.json` file for MCP server configuration (rather than inline in `plugin.json`). This is the recommended pattern for Claude Code plugins due to [Bug #16143](https://github.com/anthropics/claude-code/issues/16143) where inline `mcpServers` may be ignored during plugin manifest parsing.
119
+
120
+ If you're developing a Claude Code plugin with MCP integration, we recommend:
121
+ 1. Create a `.mcp.json` file at your plugin root
122
+ 2. Do NOT use inline `mcpServers` in `plugin.json`
123
+
124
+ ### Related Claude Code Bugs
125
+
126
+ | Issue | Status | Description |
127
+ |-------|--------|-------------|
128
+ | [#16143](https://github.com/anthropics/claude-code/issues/16143) | Open | Inline `mcpServers` in plugin.json ignored |
129
+ | [#13543](https://github.com/anthropics/claude-code/issues/13543) | Fixed v2.0.65 | .mcp.json files not copied to plugin cache |
130
+ | [#18336](https://github.com/anthropics/claude-code/issues/18336) | Open | MCP plugin shows enabled but no resources available |
@@ -0,0 +1,91 @@
1
+ # Token Efficiency
2
+
3
+ Beyond speed and accuracy, Bluera Knowledge can **significantly reduce token consumption** for code-related queries—typically saving 60-75% compared to web search approaches.
4
+
5
+ ## How It Works
6
+
7
+ **Without Bluera Knowledge:**
8
+ - Web searches return 5-10 results (~500-2,000 tokens each)
9
+ - Total per search: **3,000-10,000 tokens**
10
+ - Often need multiple searches to find the right answer
11
+ - Lower signal-to-noise ratio (blog posts mixed with actual docs)
12
+
13
+ **With Bluera Knowledge:**
14
+ - Semantic search returns top 10 relevant code chunks (~200-400 tokens each)
15
+ - Structured metadata (file paths, imports, purpose)
16
+ - Total per search: **1,500-3,000 tokens**
17
+ - Higher relevance due to vector search (fewer follow-up queries needed)
18
+
19
+ ---
20
+
21
+ ## Real-World Examples
22
+
23
+ ### Example 1: Library Implementation Question
24
+
25
+ **Question:** "How does Express handle middleware errors?"
26
+
27
+ | Approach | Token Cost | Result |
28
+ |----------|-----------|--------|
29
+ | **Web Search** | ~8,000 tokens (3 searches: general query → refined query → source code) | Blog posts + Stack Overflow + eventual guess |
30
+ | **Bluera Knowledge** | ~2,000 tokens (1 semantic search) | Actual Express source code, authoritative |
31
+ | **Savings** | **75% fewer tokens** | Higher accuracy |
32
+
33
+ ### Example 2: Dependency Exploration
34
+
35
+ **Question:** "How does LanceDB's vector search work?"
36
+
37
+ | Approach | Token Cost | Result |
38
+ |----------|-----------|--------|
39
+ | **Web Search** | ~9,500 tokens (General docs → API docs → fetch specific page) | Documentation, might miss implementation details |
40
+ | **Bluera Knowledge** | ~1,500 tokens (Search returns source + tests + examples) | Source code from Python + Rust implementation |
41
+ | **Savings** | **84% fewer tokens** | Complete picture |
42
+
43
+ ### Example 3: Version-Specific Behavior
44
+
45
+ **Question:** "What changed in React 18's useEffect cleanup?"
46
+
47
+ | Approach | Token Cost | Result |
48
+ |----------|-----------|--------|
49
+ | **Training Data** | 0 tokens (but might be outdated) | Uncertain if accurate for React 18 |
50
+ | **Web Search** | ~5,000 tokens (Search changelog → blog posts → docs) | Mix of React 17 and 18 info |
51
+ | **Bluera Knowledge** | ~2,000 tokens (Search indexed React 18 source) | Exact React 18 implementation |
52
+ | **Savings** | **60% fewer tokens** | Version-accurate |
53
+
54
+ ---
55
+
56
+ ## When BK Uses More Tokens
57
+
58
+ Bluera Knowledge isn't always the most token-efficient choice:
59
+
60
+ | Scenario | Best Approach | Why |
61
+ |----------|---------------|-----|
62
+ | **Simple concept questions** ("What is a JavaScript closure?") | Training data | Claude already knows this (0 tokens) |
63
+ | **Current events** ("Latest Next.js 15 release notes") | Web search | BK only has what you've indexed |
64
+ | **General advice** ("How to structure a React app?") | Training data | Opinion-based, not code-specific |
65
+
66
+ ---
67
+
68
+ ## Summary: Token Savings by Query Type
69
+
70
+ | Query Type | Typical Token Savings | When to Use BK |
71
+ |------------|----------------------|----------------|
72
+ | **Library internals** | 60-75% | Always |
73
+ | **Version-specific behavior** | 50-70% | Always |
74
+ | **"How does X work internally?"** | 70-85% | Always |
75
+ | **API usage examples** | 40-60% | Recommended |
76
+ | **General concepts** | -100% (uses more) | Skip BK |
77
+ | **Current events** | -100% (uses more) | Skip BK |
78
+
79
+ ---
80
+
81
+ ## Best Practice
82
+
83
+ **Default to BK for library questions.** It's cheap, fast, and authoritative:
84
+
85
+ | Question Type | Action | Why |
86
+ |--------------|--------|-----|
87
+ | Library internals, APIs, errors, versions, config | **Query BK first** | Source code is definitive, 60-85% token savings |
88
+ | General programming concepts | Skip BK | Training data is sufficient |
89
+ | Breaking news, release notes | Web search | BK only has indexed content |
90
+
91
+ The plugin's Skills teach Claude Code these patterns automatically. When in doubt about a dependency, query BK—it's faster and more accurate than guessing or web searching.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "bluera-knowledge",
3
- "version": "0.13.1",
3
+ "version": "0.13.3",
4
4
  "description": "CLI tool for managing knowledge stores with semantic search",
5
5
  "type": "module",
6
6
  "bin": {
@@ -78,7 +78,7 @@ describe('PythonBridge', () => {
78
78
 
79
79
  expect(spawn).toHaveBeenCalledWith(
80
80
  'python3',
81
- ['python/crawl_worker.py'],
81
+ [expect.stringMatching(/.*\/python\/crawl_worker\.py$/)],
82
82
  expect.objectContaining({
83
83
  stdio: ['pipe', 'pipe', 'pipe'],
84
84
  })
@@ -1,6 +1,8 @@
1
1
  import { spawn, type ChildProcess } from 'node:child_process';
2
2
  import { randomUUID } from 'node:crypto';
3
+ import path from 'node:path';
3
4
  import { createInterface, type Interface as ReadlineInterface } from 'node:readline';
5
+ import { fileURLToPath } from 'node:url';
4
6
  import { ZodError } from 'zod';
5
7
  import {
6
8
  type CrawlResult,
@@ -37,9 +39,30 @@ export class PythonBridge {
37
39
  start(): Promise<void> {
38
40
  if (this.process) return Promise.resolve();
39
41
 
40
- logger.debug('Starting Python bridge process');
42
+ // Compute absolute path to Python worker using import.meta.url
43
+ // This works both in development (src/) and production (dist/)
44
+ const currentFilePath = fileURLToPath(import.meta.url);
45
+ const isProduction = currentFilePath.includes('/dist/');
46
+
47
+ let pythonWorkerPath: string;
48
+ if (isProduction) {
49
+ // Production: Find dist dir and go to sibling python/ directory
50
+ const distIndex = currentFilePath.indexOf('/dist/');
51
+ const pluginRoot = currentFilePath.substring(0, distIndex);
52
+ pythonWorkerPath = path.join(pluginRoot, 'python', 'crawl_worker.py');
53
+ } else {
54
+ // Development: Go up from src/crawl to find python/
55
+ const srcDir = path.dirname(path.dirname(currentFilePath));
56
+ const projectRoot = path.dirname(srcDir);
57
+ pythonWorkerPath = path.join(projectRoot, 'python', 'crawl_worker.py');
58
+ }
59
+
60
+ logger.debug(
61
+ { pythonWorkerPath, currentFilePath, isProduction },
62
+ 'Starting Python bridge process'
63
+ );
41
64
 
42
- this.process = spawn('python3', ['python/crawl_worker.py'], {
65
+ this.process = spawn('python3', [pythonWorkerPath], {
43
66
  stdio: ['pipe', 'pipe', 'pipe'],
44
67
  });
45
68
 
@@ -3,25 +3,32 @@ import { readFileSync, existsSync } from 'fs';
3
3
  import { join } from 'path';
4
4
 
5
5
  /**
6
- * Tests to verify plugin.json is correctly configured for MCP server.
6
+ * Tests to verify .mcp.json is correctly configured for MCP server.
7
7
  * The MCP server must work when the plugin is installed via marketplace.
8
8
  *
9
9
  * Key requirements:
10
10
  * - Must use ${CLAUDE_PLUGIN_ROOT} for server path (resolves to plugin cache)
11
11
  * - Must set PROJECT_ROOT env var (required by server fail-fast check)
12
12
  * - Must NOT use relative paths (would resolve to user's project, not plugin)
13
+ *
14
+ * Note: We use .mcp.json at plugin root (not inline in plugin.json) due to
15
+ * Claude Code Bug #16143 where inline mcpServers is ignored during parsing.
16
+ * See: https://github.com/anthropics/claude-code/issues/16143
13
17
  */
14
- describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
15
- const configPath = join(process.cwd(), '.claude-plugin/plugin.json');
16
- const config = JSON.parse(readFileSync(configPath, 'utf-8'));
18
+ describe('Plugin MCP Configuration (.mcp.json)', () => {
19
+ const mcpJsonPath = join(process.cwd(), '.mcp.json');
20
+ const config = JSON.parse(readFileSync(mcpJsonPath, 'utf-8'));
21
+
22
+ it('has .mcp.json file at plugin root', () => {
23
+ expect(existsSync(mcpJsonPath)).toBe(true);
24
+ });
17
25
 
18
- it('has mcpServers configuration inline', () => {
19
- expect(config).toHaveProperty('mcpServers');
20
- expect(config.mcpServers).toHaveProperty('bluera-knowledge');
26
+ it('has bluera-knowledge server configuration', () => {
27
+ expect(config).toHaveProperty('bluera-knowledge');
21
28
  });
22
29
 
23
30
  it('uses ${CLAUDE_PLUGIN_ROOT} for server path (required for plugin mode)', () => {
24
- const serverConfig = config.mcpServers['bluera-knowledge'];
31
+ const serverConfig = config['bluera-knowledge'];
25
32
  const argsString = JSON.stringify(serverConfig.args);
26
33
 
27
34
  // CLAUDE_PLUGIN_ROOT is set by Claude Code when plugin is installed
@@ -31,7 +38,7 @@ describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
31
38
  });
32
39
 
33
40
  it('does NOT use relative paths (would break in plugin mode)', () => {
34
- const serverConfig = config.mcpServers['bluera-knowledge'];
41
+ const serverConfig = config['bluera-knowledge'];
35
42
  const argsString = JSON.stringify(serverConfig.args);
36
43
 
37
44
  // Relative paths like ./dist would resolve to user's project directory
@@ -40,7 +47,7 @@ describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
40
47
  });
41
48
 
42
49
  it('sets PROJECT_ROOT environment variable (required by fail-fast server)', () => {
43
- const serverConfig = config.mcpServers['bluera-knowledge'];
50
+ const serverConfig = config['bluera-knowledge'];
44
51
 
45
52
  // PROJECT_ROOT is required since b404cd6 (fail-fast change)
46
53
  expect(serverConfig.env).toHaveProperty('PROJECT_ROOT');
@@ -49,16 +56,16 @@ describe('Plugin MCP Configuration (.claude-plugin/plugin.json)', () => {
49
56
  });
50
57
 
51
58
  /**
52
- * Tests to ensure .mcp.json is NOT distributed with the plugin.
53
- * .mcp.json at project root causes confusion between plugin and project config.
59
+ * Tests to ensure plugin.json does NOT have inline mcpServers.
60
+ * Inline mcpServers is broken due to Claude Code Bug #16143.
54
61
  */
55
- describe('No conflicting .mcp.json in repo', () => {
56
- it('does NOT have .mcp.json in repo root (prevents config confusion)', () => {
57
- const mcpJsonPath = join(process.cwd(), '.mcp.json');
62
+ describe('plugin.json does NOT have inline mcpServers', () => {
63
+ it('plugin.json does NOT contain mcpServers (would be ignored by Claude Code)', () => {
64
+ const pluginJsonPath = join(process.cwd(), '.claude-plugin/plugin.json');
65
+ const pluginConfig = JSON.parse(readFileSync(pluginJsonPath, 'utf-8'));
58
66
 
59
- // .mcp.json should NOT exist in the repo
60
- // - For plugin mode: use mcpServers in plugin.json
61
- // - For development: use ~/.claude.json per README
62
- expect(existsSync(mcpJsonPath)).toBe(false);
67
+ // mcpServers in plugin.json is ignored due to Bug #16143
68
+ // All MCP config should be in .mcp.json at plugin root
69
+ expect(pluginConfig).not.toHaveProperty('mcpServers');
63
70
  });
64
71
  });