bluera-knowledge 0.30.0 → 0.32.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (67)
  1. package/.claude-plugin/plugin.json +24 -0
  2. package/.mcp.json +13 -0
  3. package/CHANGELOG.md +37 -0
  4. package/NOTICE +47 -0
  5. package/README.md +2 -2
  6. package/bun.lock +1978 -0
  7. package/commands/add-folder.md +48 -0
  8. package/commands/add-repo.md +50 -0
  9. package/commands/cancel.md +63 -0
  10. package/commands/check-status.md +130 -0
  11. package/commands/crawl.md +61 -0
  12. package/commands/doctor.md +27 -0
  13. package/commands/eval.md +222 -0
  14. package/commands/health.md +72 -0
  15. package/commands/index.md +48 -0
  16. package/commands/remove-store.md +52 -0
  17. package/commands/search.md +80 -0
  18. package/commands/search.sh +63 -0
  19. package/commands/skill-activation.md +131 -0
  20. package/commands/stores.md +54 -0
  21. package/commands/suggest.md +118 -0
  22. package/commands/sync.md +96 -0
  23. package/commands/test-plugin.md +547 -0
  24. package/commands/uninstall.md +65 -0
  25. package/dist/{chunk-B335UOU7.js → chunk-3TB7TDVF.js} +24 -3
  26. package/dist/chunk-3TB7TDVF.js.map +1 -0
  27. package/dist/{chunk-KCI4U6FH.js → chunk-KDZDLJUY.js} +2 -2
  28. package/dist/{chunk-AEXFPA57.js → chunk-YDTTD53Y.js} +158 -26
  29. package/dist/chunk-YDTTD53Y.js.map +1 -0
  30. package/dist/index.js +3 -3
  31. package/dist/mcp/bootstrap.js +10 -0
  32. package/dist/mcp/bootstrap.js.map +1 -1
  33. package/dist/mcp/server.d.ts +5 -3
  34. package/dist/mcp/server.js +2 -2
  35. package/dist/workers/background-worker-cli.js +2 -2
  36. package/hooks/check-ready.sh +109 -0
  37. package/hooks/hooks.json +87 -0
  38. package/hooks/job-status-hook.sh +51 -0
  39. package/hooks/posttooluse-bk-reminder.py +126 -0
  40. package/hooks/posttooluse-web-research.py +209 -0
  41. package/hooks/pretooluse-bk-suggest.py +296 -0
  42. package/hooks/skill-activation.py +221 -0
  43. package/hooks/skill-rules.json +131 -0
  44. package/package.json +10 -2
  45. package/scripts/CLAUDE.md +65 -0
  46. package/scripts/auto-setup.sh +65 -0
  47. package/scripts/bench-regression.sh +345 -0
  48. package/scripts/dev.sh +16 -0
  49. package/scripts/doctor.sh +103 -0
  50. package/scripts/download-models.ts +188 -0
  51. package/scripts/export-web-store.ts +142 -0
  52. package/scripts/lib/mock-server.sh +70 -0
  53. package/scripts/mcp-wrapper.sh +91 -0
  54. package/scripts/setup.sh +224 -0
  55. package/scripts/test-mcp-dev.js +260 -0
  56. package/scripts/validate-local.sh +412 -0
  57. package/scripts/validate-npm-release.sh +406 -0
  58. package/skills/advanced-workflows/SKILL.md +273 -0
  59. package/skills/knowledge-search/SKILL.md +110 -0
  60. package/skills/search-optimization/SKILL.md +199 -0
  61. package/skills/search-optimization/references/mistakes.md +21 -0
  62. package/skills/search-optimization/references/strategies.md +80 -0
  63. package/skills/store-lifecycle/SKILL.md +470 -0
  64. package/skills/when-to-query/SKILL.md +160 -0
  65. package/dist/chunk-AEXFPA57.js.map +0 -1
  66. package/dist/chunk-B335UOU7.js.map +0 -1
  67. package/dist/{chunk-KCI4U6FH.js.map → chunk-KDZDLJUY.js.map} +0 -0
@@ -0,0 +1,48 @@
1
+ ---
2
+ description: Index a local folder of reference material
3
+ argument-hint: "[path] [--name store-name]"
4
+ allowed-tools: ["mcp__bluera-knowledge__execute"]
5
+ ---
6
+
7
+ # Add Local Folder to Knowledge Stores
8
+
9
+ Index a local folder of reference material: **$ARGUMENTS**
10
+
11
+ ## Steps
12
+
13
+ 1. Parse arguments from $ARGUMENTS:
14
+ - Extract the folder path (required, first positional argument)
15
+ - Extract --name parameter (optional, defaults to folder name)
16
+
17
+ 2. Use mcp__bluera-knowledge__execute tool with command "store:create":
18
+ - args.name: Store name (from --name or folder basename)
19
+ - args.type: "file"
20
+ - args.source: The folder path
21
+
22
+ 3. Display results showing job ID for background indexing:
23
+
24
+ ```
25
+ ✓ Adding folder: /Users/me/my-docs...
26
+ ✓ Created store: my-docs (e5f6g7h8...)
27
+ Location: ~/.local/share/bluera-knowledge/stores/e5f6g7h8.../
28
+
29
+ 🔄 Indexing started in background
30
+ Job ID: job_xyz789abc123
31
+
32
+ Check status with: /bluera-knowledge:check-status job_xyz789abc123
33
+ Or view all jobs: /bluera-knowledge:check-status
34
+ ```
35
+
36
+ ## Error Handling
37
+
38
+ If creation fails (e.g., path doesn't exist, permission denied):
39
+
40
+ ```
41
+ ✗ Failed to add folder: [error message]
42
+
43
+ Common issues:
44
+ - Check that the path exists
45
+ - Ensure you have read permissions for the folder
46
+ - Verify the path is a directory, not a file
47
+ - Use absolute paths to avoid ambiguity
48
+ ```
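The argument parsing in steps 1 and 2 above could be sketched roughly as follows. This is an illustration only, not the plugin's implementation: the `StoreCreateRequest` type and `buildAddFolderRequest` helper are hypothetical names, and the `{ command, args }` payload shape is assumed from the field names listed in the steps.

```typescript
// Hypothetical payload shape; field names follow the steps above
// (command "store:create", args.name, args.type, args.source).
interface StoreCreateRequest {
  command: "store:create";
  args: { name: string; type: "file"; source: string };
}

// Build a store:create request from already-tokenized arguments,
// e.g. ["/Users/me/my-docs", "--name", "docs"].
function buildAddFolderRequest(tokens: string[]): StoreCreateRequest {
  let source: string | undefined;
  let name: string | undefined;

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === "--name") {
      name = tokens[i + 1];
      i++; // skip the value that was just consumed
    } else if (source === undefined) {
      source = token; // first positional argument is the folder path
    }
  }

  if (source === undefined) {
    throw new Error("A folder path is required");
  }

  // Default the store name to the folder basename, ignoring trailing slashes.
  const segments = source.split("/").filter((s) => s.length > 0);
  const storeName: string =
    name !== undefined && name !== "" ? name : segments.pop() ?? source;

  return {
    command: "store:create",
    args: { name: storeName, type: "file", source },
  };
}
```

With this sketch, `buildAddFolderRequest(["/Users/me/my-docs"])` yields a request whose `args.name` is `my-docs`, matching the example output in step 3.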
@@ -0,0 +1,50 @@
1
+ ---
2
+ description: Clone and index a library source repository
3
+ argument-hint: "[git-url] [--name store-name] [--branch branch-name]"
4
+ allowed-tools: ["mcp__bluera-knowledge__execute"]
5
+ ---
6
+
7
+ # Add Repository to Knowledge Stores
8
+
9
+ Clone and index a library source repository: **$ARGUMENTS**
10
+
11
+ ## Steps
12
+
13
+ 1. Parse arguments from $ARGUMENTS:
14
+ - Extract the git URL (required, first positional argument)
15
+ - Extract --name parameter (optional, defaults to repo name from URL)
16
+ - Extract --branch parameter (optional, defaults to default branch)
17
+
18
+ 2. Use mcp__bluera-knowledge__execute tool with command "store:create":
19
+ - args.name: Store name (from --name or extracted from URL)
20
+ - args.type: "repo"
21
+ - args.source: The git URL
22
+ - args.branch: Branch name (if --branch specified)
23
+
24
+ 3. Display results showing job ID for background indexing:
25
+
26
+ ```
27
+ ✓ Cloning https://github.com/facebook/react...
28
+ ✓ Created store: react (a1b2c3d4...)
29
+ Location: ~/.local/share/bluera-knowledge/stores/a1b2c3d4.../
30
+
31
+ 🔄 Indexing started in background
32
+ Job ID: job_abc123def456
33
+
34
+ Check status with: /bluera-knowledge:check-status job_abc123def456
35
+ Or view all jobs: /bluera-knowledge:check-status
36
+ ```
37
+
38
+ ## Error Handling
39
+
40
+ If creation fails (e.g., invalid URL, network error, git not available):
41
+
42
+ ```
43
+ ✗ Failed to clone repository: [error message]
44
+
45
+ Common issues:
46
+ - Check that the git URL is valid and accessible
47
+ - Ensure you have network connectivity
48
+ - Verify git is installed on your system
49
+ - For private repos, check your SSH keys or credentials
50
+ ```
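The "defaults to repo name from URL" rule in step 1 above is not spelled out in the command file. One plausible reading, shown here as a sketch under that assumption, is to take the last path segment of the URL and drop a trailing `.git`. The helper names `repoNameFromUrl` and `buildRepoArgs` are hypothetical.

```typescript
// Assumed rule: the store name is the last path segment of the git URL,
// without a trailing ".git". The command file does not define this precisely.
function repoNameFromUrl(gitUrl: string): string {
  // Drop trailing slashes, then an optional ".git" suffix.
  const trimmed = gitUrl.replace(/\/+$/, "").replace(/\.git$/, "");
  // Split on "/" and ":" so scp-style URLs (git@host:owner/repo) also work.
  const segments = trimmed.split(/[/:]/).filter((s) => s.length > 0);
  const last = segments.pop();
  if (last === undefined) {
    throw new Error("Cannot derive a store name from an empty URL");
  }
  return last;
}

// Arguments for "store:create" as listed in step 2 above.
interface RepoStoreArgs {
  name: string;
  type: "repo";
  source: string;
  branch?: string;
}

function buildRepoArgs(gitUrl: string, name?: string, branch?: string): RepoStoreArgs {
  const args: RepoStoreArgs = {
    name: name ?? repoNameFromUrl(gitUrl),
    type: "repo",
    source: gitUrl,
  };
  if (branch !== undefined) {
    args.branch = branch; // only set when --branch was specified
  }
  return args;
}
```

For `https://github.com/facebook/react` this produces the store name `react`, as in the example output above.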
@@ -0,0 +1,63 @@
1
+ ---
2
+ description: Cancel a background job
3
+ argument-hint: "[job-id]"
4
+ allowed-tools: ["mcp__bluera-knowledge__execute"]
5
+ ---
6
+
7
+ # Cancel Background Job
8
+
9
+ Cancel a running or pending background job: **$ARGUMENTS**
10
+
11
+ ## Steps
12
+
13
+ 1. Parse the job ID from $ARGUMENTS (required)
14
+ - If no job ID is provided, show an error and suggest using /bluera-knowledge:check-status to list active jobs
15
+
16
+ 2. Use mcp__bluera-knowledge__execute tool with command "job:cancel":
17
+ - args.jobId: The job ID from $ARGUMENTS
18
+
19
+ 3. Display cancellation result:
20
+
21
+ ```
22
+ ✓ Job job_abc123def456 cancelled
23
+ Type: clone
24
+ Progress: 45% (was indexing)
25
+
26
+ The job has been stopped and will not continue.
27
+ ```
28
+
29
+ ## When to Cancel
30
+
31
+ Cancel a job when:
32
+ - You accidentally started indexing the wrong repository
33
+ - The operation is taking too long and you want to try a different approach
34
+ - You need to free up system resources
35
+ - You want to stop an operation before it completes
36
+
37
+ ## Important Notes
38
+
39
+ - Only jobs in 'pending' or 'running' status can be cancelled
40
+ - Completed or failed jobs cannot be cancelled
41
+ - Cancelled jobs are marked with status 'cancelled' and remain in the job list
42
+ - Partial work may be saved (e.g., partially indexed files remain in the database)
43
+
44
+ ## Error Handling
45
+
46
+ If job cannot be cancelled:
47
+
48
+ ```
49
+ ✗ Cannot cancel job job_abc123def456: Job has already completed
50
+
51
+ Only pending or running jobs can be cancelled.
52
+ ```
53
+
54
+ If job not found:
55
+
56
+ ```
57
+ ✗ Job not found: job_abc123def456
58
+
59
+ Common issues:
60
+ - Check the job ID is correct
61
+ - Use /bluera-knowledge:check-status to see all active jobs
62
+ - Job may have already completed and been cleaned up
63
+ ```
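The cancellation rule from the "Important Notes" section above (only `pending` or `running` jobs can be cancelled) can be expressed as a small guard. This is a sketch only: the `JobStatus` union is assembled from the statuses named in this file, and the message wording in `describeCancel` is illustrative rather than the tool's actual output.

```typescript
// Statuses mentioned in the command file; the real job model may have more.
type JobStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

// Only pending or running jobs can be cancelled.
function canCancel(status: JobStatus): boolean {
  return status === "pending" || status === "running";
}

// Illustrative message builder (hypothetical wording).
function describeCancel(jobId: string, status: JobStatus): string {
  if (canCancel(status)) {
    return `✓ Job ${jobId} cancelled`;
  }
  return `✗ Cannot cancel job ${jobId}: job is already ${status}`;
}
```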
@@ -0,0 +1,130 @@
1
+ ---
2
+ description: Check status of background operations
3
+ argument-hint: "[job-id]"
4
+ allowed-tools: ["mcp__bluera-knowledge__execute"]
5
+ ---
6
+
7
+ # Check Background Job Status
8
+
9
+ Check the status of a background operation: **$ARGUMENTS**
10
+
11
+ ## Steps
12
+
13
+ 1. Parse $ARGUMENTS:
14
+ - If a job ID is provided, use it for specific job status
15
+ - If no arguments, show all active jobs
16
+
17
+ 2. If job ID provided:
18
+ - Use mcp__bluera-knowledge__execute tool with command "job:status":
19
+ - args.jobId: The job ID from $ARGUMENTS
20
+ - Display current status, progress, and details
21
+
22
+ 3. If no job ID provided:
23
+ - Use mcp__bluera-knowledge__execute tool with command "jobs":
24
+ - args.activeOnly: true
25
+ - Display a table of running/pending jobs
26
+
27
+ ## Display Format
28
+
29
+ For a specific job:
30
+
31
+ ```
32
+ Job Status: job_abc123def456
33
+ ───────────────────────────────────────
34
+ Store: react-query
35
+ Phase: indexing (2/2)
36
+ Progress: ████░░░░ 45% (562/1,247 files)
37
+ Started: 2 minutes ago
38
+ ```
39
+
40
+ For all active jobs, format as a rich table with progress bars:
41
+
42
+ ```
43
+ Active Background Jobs
44
+ ────────────────────────────────────────────────────────────────────────────────────────────
45
+ | Job ID | Store | Phase | Progress | Files |
46
+ |--------------------|------------------|-----------------|--------------|----------------|
47
+ | job_3abaf9639770 | claude-agent-sdk | indexing (2/2) | █████░░░ 59% | 32/77 files |
48
+ | job_4f0315fdcff9 | zustand | cloning (1/2) | █░░░░░░░ 15% | - |
49
+ | job_1d1d93fd254f | uvicorn | indexing (1/1) | ████░░░░ 44% | 20/100 files |
50
+ | job_ac7584576f18 | tanstack-query | crawling (1/2) | ██░░░░░░ 31% | 24 pages |
51
+ | job_8113ea07cf53 | framer-motion | indexing (2/2) | ██░░░░░░ 30% | 8/1378 files |
52
+ | job_288c24b6724c | monaco-editor | indexing (1/1) | ██░░░░░░ 31% | 12/924 files |
53
+
54
+ ✓ tiktoken: Completed
55
+
56
+ 6 jobs still running. The smaller repos (claude-agent-sdk, zustand, uvicorn) are progressing
57
+ faster. The larger ones (tanstack-query: 1741 files, framer-motion: 1378 files,
58
+ monaco-editor: 924 files) will take longer.
59
+ ```
60
+
61
+ **Phase column:**
62
+ - Read from `job.details.phase`, `job.details.phaseStep`, `job.details.phaseTotalSteps`
63
+ - Format as: `{phase} ({step}/{total})` e.g., `indexing (2/2)`, `cloning (1/2)`
64
+ - Phases: `cloning`, `crawling`, `indexing`
65
+ - Clone jobs: cloning (1/2) → indexing (2/2)
66
+ - Index jobs: indexing (1/1)
67
+ - Crawl jobs: crawling (1/2) → indexing (2/2)
68
+
69
+ **Progress bar rendering (8 chars wide):**
70
+
71
+ Build the bar using these characters: `█` (filled) and `░` (empty)
72
+
73
+ ```
74
+ Algorithm:
75
+ filled = Math.round(progress / 100 * 8)
76
+ bar = '█'.repeat(filled) + '░'.repeat(8 - filled) + ' ' + progress + '%'
77
+ ```
78
+
79
+ Examples:
80
+ ```
81
+ 0% → ░░░░░░░░ 0%
82
+ 15% → █░░░░░░░ 15%
83
+ 25% → ██░░░░░░ 25%
84
+ 31% → ██░░░░░░ 31%
85
+ 44% → ████░░░░ 44%
86
+ 59% → █████░░░ 59%
87
+ 75% → ██████░░ 75%
88
+ 88% → ███████░ 88%
89
+ 100% → ████████ 100%
90
+ ```
91
+
92
+ **Files column:**
93
+ - For indexing: Show `{filesProcessed}/{totalFiles} files` from job.details
94
+ - For crawling: Show `{pagesCrawled} pages` from job.details
95
+ - For cloning (phase 1): Show `-` (no file count yet)
96
+
97
+ **Summary section:**
98
+ - After the table, add a brief summary noting:
99
+ - How many jobs are still running
100
+ - Which repos are progressing faster (smaller file counts)
101
+ - Which repos will take longer (larger file counts)
102
+ - If any jobs recently completed, note them with ✓ prefix
103
+
104
+ If no active jobs:
105
+
106
+ ```
107
+ No active background jobs.
108
+
109
+ Recent completed jobs:
110
+ ────────────────────────────────────────────────────────────────────────────────────────────
111
+ | Job ID | Store | Phase | Files | Completed |
112
+ |--------------------|------------------|-----------------|----------------|--------------|
113
+ | job_old123abc456 | react-query | indexing (2/2) | 245/245 files | 5m ago |
114
+ | job_xyz789ghi012 | zustand | indexing (1/1) | 67/67 files | 12m ago |
115
+
116
+ All jobs completed successfully.
117
+ ```
118
+
119
+ ## Error Handling
120
+
121
+ If job not found:
122
+
123
+ ```
124
+ ✗ Job not found: job_abc123def456
125
+
126
+ Common issues:
127
+ - Check the job ID is correct
128
+ - Job may have expired (stale pending jobs are marked failed after 2 hours)
129
+ - Use /bluera-knowledge:check-status to see all active jobs
130
+ ```
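The progress bar algorithm and the Phase and Files column rules above can be put together in a short sketch. The `JobDetails` interface uses the field names quoted in the command file (`phase`, `phaseStep`, `phaseTotalSteps`, `filesProcessed`, `totalFiles`, `pagesCrawled`), but their types and optionality are assumptions, as is the clamping of out-of-range progress values.

```typescript
const BAR_WIDTH = 8;

// Render an 8-character bar followed by the percentage, using Math.round
// as in the algorithm above.
function renderBar(progress: number): string {
  // Clamp so out-of-range input cannot produce a negative repeat count.
  const clamped = Math.min(100, Math.max(0, progress));
  const filled = Math.round((clamped / 100) * BAR_WIDTH);
  return "█".repeat(filled) + "░".repeat(BAR_WIDTH - filled) + " " + clamped + "%";
}

// Field names are taken from the command file; types are assumed.
interface JobDetails {
  phase: "cloning" | "crawling" | "indexing";
  phaseStep: number;
  phaseTotalSteps: number;
  filesProcessed?: number;
  totalFiles?: number;
  pagesCrawled?: number;
}

// Phase column: "{phase} ({step}/{total})".
function formatPhase(details: JobDetails): string {
  return `${details.phase} (${details.phaseStep}/${details.phaseTotalSteps})`;
}

// Files column: file counts when indexing, page count when crawling, "-" otherwise.
function formatFiles(details: JobDetails): string {
  if (
    details.phase === "indexing" &&
    details.filesProcessed !== undefined &&
    details.totalFiles !== undefined
  ) {
    return `${details.filesProcessed}/${details.totalFiles} files`;
  }
  if (details.phase === "crawling" && details.pagesCrawled !== undefined) {
    return `${details.pagesCrawled} pages`;
  }
  return "-";
}
```

For example, `renderBar(44)` returns `████░░░░ 44%` and `renderBar(100)` returns `████████ 100%`.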
@@ -0,0 +1,61 @@
1
+ ---
2
+ description: Crawl web pages with natural language control and add to knowledge store
3
+ argument-hint: "[url] [store-name] [--crawl instruction] [--extract instruction] [--simple] [--max-pages number] [--fast]"
4
+ allowed-tools: ["Bash(node ${CLAUDE_PLUGIN_ROOT}/dist/index.js crawl:*)"]
5
+ context: fork
6
+ ---
7
+
8
+ **⚠️ IMPORTANT: Store name is a POSITIONAL argument, NOT an option!**
9
+
10
+ ```
11
+ WRONG: crawl https://example.com --store=my-store
12
+ RIGHT: crawl https://example.com my-store
13
+ ```
14
+
15
+ Crawling and indexing: $ARGUMENTS
16
+
17
+ ```bash
18
+ node ${CLAUDE_PLUGIN_ROOT}/dist/index.js crawl $ARGUMENTS
19
+ ```
20
+
21
+ The web pages will be crawled with intelligent link selection and optional natural language extraction, then indexed for searching.
22
+
23
+ **Note:** The web store is auto-created if it doesn't exist. No need to create the store first.
24
+
25
+ ## Usage Examples
26
+
27
+ **Intelligent crawl strategy:**
28
+ ```
29
+ /bluera-knowledge:crawl https://code.claude.com/docs/en/ claude-docs --crawl "all Getting Started pages"
30
+ ```
31
+
32
+ **With extraction:**
33
+ ```
34
+ /bluera-knowledge:crawl https://example.com/pricing pricing-store --extract "extract pricing and features"
35
+ ```
36
+
37
+ **Both strategy and extraction:**
38
+ ```
39
+ /bluera-knowledge:crawl https://docs.example.com my-docs --crawl "API reference pages" --extract "API endpoints and parameters"
40
+ ```
41
+
42
+ **Simple BFS mode:**
43
+ ```
44
+ /bluera-knowledge:crawl https://example.com/docs docs-store --simple
45
+ ```
46
+
47
+ **Fast mode (axios-only, no JavaScript rendering):**
48
+ ```
49
+ /bluera-knowledge:crawl https://example.com/docs docs-store --fast --max-pages 20
50
+ ```
51
+
52
+ ## Options
53
+
54
+ - `--crawl <instruction>` - Natural language instruction for which pages to crawl (e.g., "all Getting Started pages")
55
+ - `--extract <instruction>` - Natural language instruction for what content to extract (e.g., "extract API references")
56
+ - `--simple` - Use simple BFS (breadth-first search) mode instead of intelligent crawling
57
+ - `--max-pages <number>` - Maximum number of pages to crawl (default: 50)
58
+ - `--fast` - Use fast axios-only mode instead of headless browser
59
+ - Default behavior uses headless browser (Playwright via crawl4ai) for JavaScript-rendered sites
60
+ - Use `--fast` when the target site doesn't use client-side rendering
61
+ - Much faster than headless mode but may miss content from JavaScript-heavy sites
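Because the store name is positional, a caller that assembles the `crawl` invocation programmatically has to place it directly after the URL and before any flags. The sketch below illustrates that ordering using only the options documented above. `CrawlOptions` and `buildCrawlArgs` are hypothetical names, not part of the package.

```typescript
// Options documented in the command file; this interface itself is hypothetical.
interface CrawlOptions {
  crawl?: string; // natural language instruction for which pages to crawl
  extract?: string; // natural language instruction for what to extract
  simple?: boolean; // simple BFS mode
  maxPages?: number; // maximum number of pages to crawl
  fast?: boolean; // axios-only mode, no headless browser
}

// Build the argument list that follows "dist/index.js".
// URL and store name are positional; every option comes after them.
function buildCrawlArgs(url: string, storeName: string, options: CrawlOptions = {}): string[] {
  const args: string[] = ["crawl", url, storeName];
  if (options.crawl !== undefined) args.push("--crawl", options.crawl);
  if (options.extract !== undefined) args.push("--extract", options.extract);
  if (options.simple === true) args.push("--simple");
  if (options.maxPages !== undefined) args.push("--max-pages", String(options.maxPages));
  if (options.fast === true) args.push("--fast");
  return args;
}
```

Returning an array rather than a joined string keeps instructions that contain spaces, such as `"all Getting Started pages"`, as single arguments when passed to a process spawner.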
@@ -0,0 +1,27 @@
1
+ ---
2
+ description: Diagnose plugin issues and get fix instructions
3
+ allowed-tools: ["Bash(${CLAUDE_PLUGIN_ROOT:-.}/scripts/doctor.sh)"]
4
+ ---
5
+ # Bluera Knowledge Doctor
6
+
7
+ Run comprehensive diagnostics to identify and fix plugin issues.
8
+
9
+ ## Instructions
10
+
11
+ Run the doctor script to check all prerequisites:
12
+
13
+ ```bash
14
+ bash "${CLAUDE_PLUGIN_ROOT:-.}/scripts/doctor.sh"
15
+ ```
16
+
17
+ The script checks:
18
+ 1. **Build tools** (make/gcc) - Required for native modules
19
+ 2. **Node.js** - Required for MCP server
20
+ 3. **Plugin dependencies** (node_modules) - Required for MCP server
21
+ 4. **MCP wrapper** - Required for MCP server startup
22
+ 5. **Python 3** - Optional, for embeddings
23
+ 6. **Playwright** - Optional, for web crawling
24
+
25
+ For any `[FAIL]` items, follow the FIX instructions provided.
26
+
27
+ After fixing issues, restart Claude Code for changes to take effect.
@@ -0,0 +1,222 @@
1
+ ---
2
+ description: Evaluate agent quality across three modes — without BK, BK grep-only, and BK full
3
+ argument-hint: "[query | --predefined | --predefined N]"
4
+ allowed-tools: ["mcp__bluera-knowledge__execute", "mcp__bluera-knowledge__search", "mcp__bluera-knowledge__get_full_context", "Read", "Grep", "Glob", "WebSearch", "Bash"]
5
+ context: fork
6
+ ---
7
+
8
+ # Agent Quality Evaluation
9
+
10
+ Compare how well Claude answers library questions across three access levels.
11
+
12
+ For each query, three agents run in parallel:
13
+ - **Without BK** — uses only web search and training knowledge
14
+ - **BK Grep** — can Grep/Read/Glob the cloned source repos but has no vector search
15
+ - **BK Full** — uses BK vector search + get_full_context + Grep/Read (all BK tools)
16
+
17
+ Then score all three answers on accuracy, specificity, completeness, and source grounding.
18
+
19
+ ## Arguments
20
+
21
+ Parse `$ARGUMENTS`:
22
+
23
+ - **No arguments or empty**: Show usage help
24
+ - **Quoted string** (not starting with `--`): Arbitrary query mode — run eval for that single question
25
+ - **`--predefined`**: Run all predefined queries (skip any whose stores are not indexed)
26
+ - **`--predefined N`**: Run predefined query #N only (1-based index)
27
+
28
+ If no arguments provided, show:
29
+ ```
30
+ Usage:
31
+ /bluera-knowledge:eval "How does Express handle errors?" # Arbitrary query
32
+ /bluera-knowledge:eval --predefined # Run all predefined queries
33
+ /bluera-knowledge:eval --predefined 3 # Run predefined query #3
34
+ ```
35
+
36
+ ## Step 1: Prerequisites Check
37
+
38
+ 1. Call MCP `execute` with `{ command: "stores" }` to list indexed stores
39
+ 2. If no stores are indexed, show error and abort:
40
+ ```
41
+ No knowledge stores indexed. Add at least one library first:
42
+ /bluera-knowledge:add-repo https://github.com/expressjs/express --name express
43
+ ```
44
+ 3. Record the list of available store names — you'll pass these to the BK Full agent
45
+ 4. Build a `STORE_PATHS` mapping from the store response: for each store with a `path` field, record `- **<name>**: \`<path>\`` (one per line, as a markdown list). This gets passed to the BK Grep agent.
46
+
47
+ ## Step 2: Resolve Queries
48
+
49
+ ### Predefined mode (`--predefined`)
50
+
51
+ 1. Read the predefined queries file: `$CLAUDE_PLUGIN_ROOT/evals/agent-quality/queries/predefined.yaml`
52
+ 2. Parse the YAML content
53
+ 3. For each query, check if ANY of its `store_hint` values match an available store name
54
+ 4. Split into **runnable** (store available) and **skipped** (store not available) lists
55
+ 5. If `--predefined N` was specified, select only the query at index N from the full list (skip if store not available)
56
+ 6. If no queries are runnable, show what stores to add and abort
57
+
58
+ ### Arbitrary mode (bare query string)
59
+
60
+ 1. Use the raw query string as the question
61
+ 2. Set `expected_topics` and `anti_patterns` to empty lists
62
+ 3. Set `id` to "arbitrary", `category` to "general", `difficulty` to "unknown"
63
+
64
+ ## Step 3: Load Templates
65
+
66
+ Read these files from `$CLAUDE_PLUGIN_ROOT/evals/agent-quality/templates/`:
67
+
68
+ 1. `without-bk-agent.md` — instructions for the baseline agent
69
+ 2. `bk-grep-agent.md` — instructions for the BK Grep agent
70
+ 3. `with-bk-agent.md` — instructions for the BK Full agent
71
+ 4. `judge.md` — grading rubric
72
+
73
+ ## Step 4: Run Eval (for each query)
74
+
75
+ ### Spawn ALL THREE agents in parallel (same turn, three Task tool calls)
76
+
77
+ **Without-BK agent** — Use the Task tool with `subagent_type: "general-purpose"`:
78
+ - Take the content from `without-bk-agent.md`
79
+ - Replace `{{QUESTION}}` with the actual question
80
+ - Send as the task prompt
81
+
82
+ **BK Grep agent** — Use the Task tool with `subagent_type: "general-purpose"`:
83
+ - Take the content from `bk-grep-agent.md`
84
+ - Replace `{{QUESTION}}` with the actual question
85
+ - Replace `{{STORE_PATHS}}` with the store name-to-path mapping built in Step 1
86
+ - Send as the task prompt
87
+
88
+ **BK Full agent** — Use the Task tool with `subagent_type: "general-purpose"`:
89
+ - Take the content from `with-bk-agent.md`
90
+ - Replace `{{QUESTION}}` with the actual question
91
+ - Replace `{{STORES}}` with the list of available store names (one per line, as a markdown list)
92
+ - Send as the task prompt
93
+
94
+ Wait for all three agents to complete.
95
+
96
+ ### Capture Token Usage
97
+
98
+ From each Task tool response, parse the `<usage>` block to extract:
99
+ - `total_tokens` — the total tokens consumed by the agent
100
+ - `duration_ms` — wall-clock time for the agent
101
+
102
+ If usage data is not available in a Task response, show "N/A" for that agent.
103
+
104
+ ### Judge the results
105
+
106
+ Using the rubric from `judge.md`, evaluate all three answers yourself:
107
+
108
+ 1. Read all three agent responses
109
+ 2. For each answer, score all 4 criteria (1-5):
110
+ - **Factual Accuracy**: Are the claims correct?
111
+ - **Specificity**: Does it cite specific files, functions, code?
112
+ - **Completeness**: Does it cover the full answer?
113
+ - **Source Grounding**: Are claims backed by evidence?
114
+ 3. If the query has `expected_topics`, check which answers mention each topic
115
+ 4. If the query has `anti_patterns`, flag if any answer makes those claims
116
+ 5. Calculate totals (max 20 each), determine winner and deltas
117
+
118
+ ## Step 5: Output Results
119
+
120
+ ### Single query output (arbitrary or `--predefined N`)
121
+
122
+ Show the full comparison:
123
+
124
+ ```
125
+ ## Eval: "<question>"
126
+
127
+ | Criterion | Without BK | BK Grep | BK Full |
128
+ |-------------------|:----------:|:-------:|:-------:|
129
+ | Accuracy | X | X | X |
130
+ | Specificity | X | X | X |
131
+ | Completeness | X | X | X |
132
+ | Source Grounding | X | X | X |
133
+ | **Total** | **X** | **X** | **X** |
134
+
135
+ | Usage | Without BK | BK Grep | BK Full |
136
+ |-------------------|:----------:|:-------:|:-------:|
137
+ | Tokens | X,XXX | X,XXX | X,XXX |
138
+ | Duration (s) | X.X | X.X | X.X |
139
+
140
+ **Winner:** [BK Full | BK Grep | Without BK | Tie] ([significant | marginal | none])
141
+ **Key Difference:** [One sentence explaining the most important quality gap]
142
+ **Grep vs Full:** [One sentence on whether vector search outperformed manual grep, and if so how]
143
+ ```
144
+
145
+ If expected topics were provided:
146
+ ```
147
+ ### Expected Topics
148
+ - [x] topic covered by all three
149
+ - [x] topic covered by BK Full + BK Grep only
150
+ - [x] topic covered by BK Full only
151
+ - [ ] topic missed by all
152
+ ```
153
+
154
+ ### Multi-query output (`--predefined`)
155
+
156
+ Show a summary row per query, then aggregate:
157
+
158
+ ```
159
+ ## Agent Quality Eval Summary
160
+
161
+ Ran X/8 queries (Y skipped — stores not indexed)
162
+
163
+ | # | Query | Difficulty | w/o BK | Grep | Full | Winner | Delta |
164
+ |---|-------|:----------:|:------:|:----:|:----:|--------|-------|
165
+ | 1 | query-id | medium | 9/20 | 15/20 | 19/20 | Full | significant |
166
+ | 2 | query-id | easy | 14/20 | 17/20 | 18/20 | Full | marginal |
167
+ | ... |
168
+
169
+ ### Token Usage
170
+
171
+ | # | Query | w/o BK tokens | Grep tokens | Full tokens |
172
+ |---|-------|:-------------:|:-----------:|:-----------:|
173
+ | 1 | query-id | 2,340 | 8,120 | 5,670 |
174
+ | 2 | query-id | 1,890 | 6,450 | 4,230 |
175
+ | ... |
176
+
177
+ ### Aggregate
178
+ - **Without BK mean:** X.X/20 (avg X,XXX tokens)
179
+ - **BK Grep mean:** X.X/20 (avg X,XXX tokens)
180
+ - **BK Full mean:** X.X/20 (avg X,XXX tokens)
181
+ - **Full vs Without:** +X.X points (+XX%)
182
+ - **Full vs Grep:** +X.X points (+XX%)
183
+ - **Grep vs Without:** +X.X points (+XX%)
184
+ - **Full win rate:** X/X (XX%)
185
+ - **Significant wins (Full):** X
186
+
187
+ ### By Category
188
+ | Category | w/o BK | Grep | Full | Full delta |
189
+ |----------|:------:|:----:|:----:|------------|
190
+ | implementation | X.X | X.X | X.X | +X.X |
191
+ | api | X.X | X.X | X.X | +X.X |
192
+
193
+ ### By Difficulty
194
+ | Difficulty | w/o BK | Grep | Full | Full delta |
195
+ |------------|:------:|:----:|:----:|------------|
196
+ | easy | X.X | X.X | X.X | +X.X |
197
+ | medium | X.X | X.X | X.X | +X.X |
198
+ | hard | X.X | X.X | X.X | +X.X |
199
+
200
+ ### Token Efficiency
201
+ | Agent | Mean Score | Mean Tokens | Score/1K Tokens |
202
+ |-------|:----------:|:-----------:|:---------------:|
203
+ | Without BK | X.X | X,XXX | X.XX |
204
+ | BK Grep | X.X | X,XXX | X.XX |
205
+ | BK Full | X.X | X,XXX | X.XX |
206
+ ```
207
+
208
+ If any queries were skipped:
209
+ ```
210
+ ### Skipped (store not indexed)
211
+ - vue-reactivity-tracking — add with: /bluera-knowledge:add-repo https://github.com/vuejs/core --name vue
212
+ - fastapi-dependency-injection — add with: /bluera-knowledge:add-repo https://github.com/fastapi/fastapi --name fastapi
213
+ ```
214
+
215
+ ## Important Notes
216
+
217
+ - Each query spawns 3 subagents. For `--predefined` with 8 queries, that's up to 24 agent runs. Process one query at a time (but spawn all three agents for each query in parallel).
218
+ - The without-BK agent may use WebSearch — this is intentional. We're comparing against "the best Claude can do without BK."
219
+ - The BK Grep agent may NOT use WebSearch. It tests what an agent can discover by exploring raw source code, to isolate the value of vector search.
220
+ - Scoring is somewhat subjective. The value is in the comparison (relative scores) rather than absolute numbers. Look at the delta and key differences.
221
+ - The Token Efficiency table reveals cost-effectiveness: if BK Grep achieves similar scores to BK Full with fewer tokens, it suggests vector search isn't adding much for that query type.
222
+ - For arbitrary queries without expected topics, grading relies entirely on the 4 general criteria. This is fine — it still reveals whether BK adds value.
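The scoring arithmetic in Steps 4 and 5 above (four criteria at 1-5 each, totals out of 20, a winner, and a score per 1K tokens) can be sketched as below. The type and function names are hypothetical, and the reading of "Score/1K Tokens" as score divided by thousands of tokens is an assumption. The thresholds behind "significant" and "marginal" are not defined in the command file, so they are left out.

```typescript
// Each criterion is scored 1-5, so the maximum total is 20.
interface CriterionScores {
  accuracy: number;
  specificity: number;
  completeness: number;
  sourceGrounding: number;
}

type AgentLabel = "Without BK" | "BK Grep" | "BK Full";

function totalScore(scores: CriterionScores): number {
  return scores.accuracy + scores.specificity + scores.completeness + scores.sourceGrounding;
}

// The winner is the agent with the strictly highest total; otherwise "Tie".
function pickWinner(totals: Record<AgentLabel, number>): AgentLabel | "Tie" {
  const labels: AgentLabel[] = ["Without BK", "BK Grep", "BK Full"];
  const best = Math.max(totals["Without BK"], totals["BK Grep"], totals["BK Full"]);
  let winner: AgentLabel | "Tie" = "Tie";
  let leaders = 0;
  for (const label of labels) {
    if (totals[label] === best) {
      winner = label;
      leaders++;
    }
  }
  return leaders === 1 ? winner : "Tie";
}

// Assumed definition: mean score divided by mean tokens expressed in thousands.
// Returns 0 when no token usage is available, to avoid dividing by zero.
function scorePerThousandTokens(meanScore: number, meanTokens: number): number {
  if (meanTokens <= 0) {
    return 0;
  }
  return meanScore / (meanTokens / 1000);
}
```

With the sample row from the summary table (9/20, 15/20, 19/20), `pickWinner` returns `BK Full`.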
@@ -0,0 +1,72 @@
1
+ ---
2
+ description: Check health of all stores (path existence, model compatibility)
3
+ allowed-tools: ["mcp__bluera-knowledge__execute"]
4
+ ---
5
+
6
+ # Store Health Check
7
+
8
+ Diagnose issues with knowledge stores: missing paths, schema migrations, model mismatches.
9
+
10
+ ## Steps
11
+
12
+ 1. Use the mcp__bluera-knowledge__execute tool with command "stores:health" to check all stores
13
+
14
+ 2. Present results grouped by status:
15
+
16
+ ```
17
+ ## Store Health Report
18
+
19
+ ### Errors (require action)
20
+ | Store | Type | Issue | Fix |
21
+ |-------|------|-------|-----|
22
+ | my-repo | repo | Path not found | Re-create store or fix projectRoot |
23
+
24
+ ### Warnings (recommended action)
25
+ | Store | Type | Issue | Fix |
26
+ |-------|------|-------|-----|
27
+ | old-docs | web | Schema v1 | Run: /bluera-knowledge:index old-docs |
28
+
29
+ ### Healthy
30
+ - react-docs (web)
31
+ - lodash (repo)
32
+
33
+ **Summary**: 2 healthy, 1 warning, 1 error
34
+ ```
35
+
36
+ ## Exit Codes
37
+
38
+ The health check returns an exit code for scripting:
39
+
40
+ | Exit Code | Meaning |
41
+ |-----------|---------|
42
+ | 0 | All stores healthy |
43
+ | 1 | At least one store has an error (path not found) |
44
+ | 2 | No errors, but at least one warning (model/schema issue) |
45
+
46
+ ## Issue Types
47
+
48
+ ### PATH_NOT_FOUND (Error)
49
+ The store's source path no longer exists. This happens when:
50
+ - A local folder was deleted or moved
51
+ - The project was relocated and paths weren't updated
52
+ - A cloned repo directory was removed
53
+
54
+ **Fix**: Re-create the store or update the projectRoot setting.
55
+
56
+ ### SCHEMA_V1 (Warning)
57
+ The store was created before model tracking was added. It needs to be re-indexed to be searchable.
58
+
59
+ **Fix**: Run `/bluera-knowledge:index <store-name>`
60
+
61
+ ### MODEL_MISMATCH (Warning)
62
+ The store was indexed with a different embedding model than the current configuration.
63
+
64
+ **Fix**: Run `/bluera-knowledge:index <store-name>` to re-index with the current model.
65
+
66
+ ## Check Single Store
67
+
68
+ To check a specific store only:
69
+
70
+ ```
71
+ stores:health --store=<store-name>
72
+ ```
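The exit code table and issue types above imply a simple precedence: any error wins over warnings, and warnings win over a clean result. A minimal sketch of that mapping follows. The function names are hypothetical, and the severity assignments are taken from the "Issue Types" headings in this file.

```typescript
type IssueType = "PATH_NOT_FOUND" | "SCHEMA_V1" | "MODEL_MISMATCH";
type Severity = "error" | "warning";

// Severity per the headings above: PATH_NOT_FOUND is an error,
// SCHEMA_V1 and MODEL_MISMATCH are warnings.
function severityOf(issue: IssueType): Severity {
  return issue === "PATH_NOT_FOUND" ? "error" : "warning";
}

// 0 = all healthy, 1 = at least one error, 2 = warnings only.
function healthExitCode(issues: IssueType[]): 0 | 1 | 2 {
  let code: 0 | 1 | 2 = 0;
  for (const issue of issues) {
    if (severityOf(issue) === "error") {
      return 1; // an error takes precedence over any warning
    }
    code = 2;
  }
  return code;
}
```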