bluera-knowledge 0.30.0 → 0.32.0
- package/.claude-plugin/plugin.json +24 -0
- package/.mcp.json +13 -0
- package/CHANGELOG.md +37 -0
- package/NOTICE +47 -0
- package/README.md +2 -2
- package/bun.lock +1978 -0
- package/commands/add-folder.md +48 -0
- package/commands/add-repo.md +50 -0
- package/commands/cancel.md +63 -0
- package/commands/check-status.md +130 -0
- package/commands/crawl.md +61 -0
- package/commands/doctor.md +27 -0
- package/commands/eval.md +222 -0
- package/commands/health.md +72 -0
- package/commands/index.md +48 -0
- package/commands/remove-store.md +52 -0
- package/commands/search.md +80 -0
- package/commands/search.sh +63 -0
- package/commands/skill-activation.md +131 -0
- package/commands/stores.md +54 -0
- package/commands/suggest.md +118 -0
- package/commands/sync.md +96 -0
- package/commands/test-plugin.md +547 -0
- package/commands/uninstall.md +65 -0
- package/dist/{chunk-B335UOU7.js → chunk-3TB7TDVF.js} +24 -3
- package/dist/chunk-3TB7TDVF.js.map +1 -0
- package/dist/{chunk-KCI4U6FH.js → chunk-KDZDLJUY.js} +2 -2
- package/dist/{chunk-AEXFPA57.js → chunk-YDTTD53Y.js} +158 -26
- package/dist/chunk-YDTTD53Y.js.map +1 -0
- package/dist/index.js +3 -3
- package/dist/mcp/bootstrap.js +10 -0
- package/dist/mcp/bootstrap.js.map +1 -1
- package/dist/mcp/server.d.ts +5 -3
- package/dist/mcp/server.js +2 -2
- package/dist/workers/background-worker-cli.js +2 -2
- package/hooks/check-ready.sh +109 -0
- package/hooks/hooks.json +87 -0
- package/hooks/job-status-hook.sh +51 -0
- package/hooks/posttooluse-bk-reminder.py +126 -0
- package/hooks/posttooluse-web-research.py +209 -0
- package/hooks/pretooluse-bk-suggest.py +296 -0
- package/hooks/skill-activation.py +221 -0
- package/hooks/skill-rules.json +131 -0
- package/package.json +10 -2
- package/scripts/CLAUDE.md +65 -0
- package/scripts/auto-setup.sh +65 -0
- package/scripts/bench-regression.sh +345 -0
- package/scripts/dev.sh +16 -0
- package/scripts/doctor.sh +103 -0
- package/scripts/download-models.ts +188 -0
- package/scripts/export-web-store.ts +142 -0
- package/scripts/lib/mock-server.sh +70 -0
- package/scripts/mcp-wrapper.sh +91 -0
- package/scripts/setup.sh +224 -0
- package/scripts/test-mcp-dev.js +260 -0
- package/scripts/validate-local.sh +412 -0
- package/scripts/validate-npm-release.sh +406 -0
- package/skills/advanced-workflows/SKILL.md +273 -0
- package/skills/knowledge-search/SKILL.md +110 -0
- package/skills/search-optimization/SKILL.md +199 -0
- package/skills/search-optimization/references/mistakes.md +21 -0
- package/skills/search-optimization/references/strategies.md +80 -0
- package/skills/store-lifecycle/SKILL.md +470 -0
- package/skills/when-to-query/SKILL.md +160 -0
- package/dist/chunk-AEXFPA57.js.map +0 -1
- package/dist/chunk-B335UOU7.js.map +0 -1
- /package/dist/{chunk-KCI4U6FH.js.map → chunk-KDZDLJUY.js.map} +0 -0

package/commands/add-folder.md
ADDED
@@ -0,0 +1,48 @@
---
description: Index a local folder of reference material
argument-hint: "[path] [--name store-name]"
allowed-tools: ["mcp__bluera-knowledge__execute"]
---

# Add Local Folder to Knowledge Stores

Index a local folder of reference material: **$ARGUMENTS**

## Steps

1. Parse arguments from $ARGUMENTS:
   - Extract the folder path (required, first positional argument)
   - Extract the --name parameter (optional, defaults to the folder name)

2. Use the mcp__bluera-knowledge__execute tool with command "store:create":
   - args.name: Store name (from --name or the folder basename)
   - args.type: "file"
   - args.source: The folder path

3. Display the results, including the job ID for background indexing:

```
✓ Adding folder: /Users/me/my-docs...
✓ Created store: my-docs (e5f6g7h8...)
Location: ~/.local/share/bluera-knowledge/stores/e5f6g7h8.../

🔄 Indexing started in background
Job ID: job_xyz789abc123

Check status with: /bluera-knowledge:check-status job_xyz789abc123
Or view all jobs: /bluera-knowledge:check-status
```
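
A minimal TypeScript sketch of steps 1 and 2, for illustration only. It assumes the `execute` tool takes a `{ command, args }` object (inferred from the `args.*` fields above, not confirmed against the server) and that `$ARGUMENTS` contains no quoted paths with spaces. `parseAddFolderArgs` and `StoreCreateRequest` are hypothetical names. The path is resolved to an absolute one, in line with the Error Handling advice to use absolute paths.

```typescript
import * as path from "node:path";

// Hypothetical request shape for the execute tool, inferred from the
// args.name / args.type / args.source fields in step 2.
interface StoreCreateRequest {
  command: "store:create";
  args: { name: string; type: "file"; source: string };
}

// Parse "<path> [--name store-name]" into a store:create request.
// Assumes whitespace-separated tokens (no quoted paths containing spaces).
function parseAddFolderArgs(raw: string): StoreCreateRequest {
  const tokens = raw.trim().split(/\s+/).filter((t) => t.length > 0);
  let folder: string | undefined;
  let name: string | undefined;

  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i] === "--name") {
      name = tokens[++i];
      if (name === undefined) throw new Error("--name requires a value");
    } else if (folder === undefined) {
      folder = tokens[i]; // first positional argument is the folder path
    }
  }
  if (folder === undefined) throw new Error("a folder path is required");

  // Resolve to an absolute path; path.resolve also drops trailing slashes,
  // so basename() yields the folder name used as the default store name.
  const source = path.resolve(folder);
  return {
    command: "store:create",
    args: { name: name ?? path.basename(source), type: "file", source },
  };
}
```

Note that `path.resolve` does not expand `~`; a path like `~/notes` is treated as a literal directory named `~` under the current working directory, so expand it before calling if needed.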

## Error Handling

If creation fails (e.g., path doesn't exist, permission denied):

```
✗ Failed to add folder: [error message]

Common issues:
- Check that the path exists
- Ensure you have read permissions for the folder
- Verify the path is a directory, not a file
- Use absolute paths to avoid ambiguity
```

package/commands/add-repo.md
ADDED
@@ -0,0 +1,50 @@
---
description: Clone and index a library source repository
argument-hint: "[git-url] [--name store-name] [--branch branch-name]"
allowed-tools: ["mcp__bluera-knowledge__execute"]
---

# Add Repository to Knowledge Stores

Clone and index a library source repository: **$ARGUMENTS**

## Steps

1. Parse arguments from $ARGUMENTS:
   - Extract the git URL (required, first positional argument)
   - Extract the --name parameter (optional, defaults to the repo name from the URL)
   - Extract the --branch parameter (optional, defaults to the repository's default branch)

2. Use the mcp__bluera-knowledge__execute tool with command "store:create":
   - args.name: Store name (from --name or extracted from the URL)
   - args.type: "repo"
   - args.source: The git URL
   - args.branch: Branch name (only if --branch is specified)

3. Display the results, including the job ID for background indexing:

```
✓ Cloning https://github.com/facebook/react...
✓ Created store: react (a1b2c3d4...)
Location: ~/.local/share/bluera-knowledge/stores/a1b2c3d4.../

🔄 Indexing started in background
Job ID: job_abc123def456

Check status with: /bluera-knowledge:check-status job_abc123def456
Or view all jobs: /bluera-knowledge:check-status
```
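
Step 1 defaults the store name to the repo name from the URL without spelling out the rule. A TypeScript sketch of one plausible rule (an assumption, not the plugin's actual logic): take the last path segment of an HTTPS or SSH-style URL and drop a trailing `.git`. `repoNameFromUrl`, `buildRepoArgs`, and `RepoCreateArgs` are hypothetical.

```typescript
// Hypothetical args for store:create with type "repo" (fields from step 2).
interface RepoCreateArgs {
  name: string;
  type: "repo";
  source: string;
  branch?: string;
}

// Derive a default store name from a git URL, e.g.
// "https://github.com/facebook/react" or "git@github.com:facebook/react.git" -> "react".
function repoNameFromUrl(url: string): string {
  const trimmed = url.trim().replace(/\/+$/, ""); // drop trailing slashes
  const last = trimmed.split(/[/:]/).pop() ?? ""; // "/" for HTTPS, ":" for SSH form
  const name = last.replace(/\.git$/, "");
  if (name.length === 0) throw new Error(`cannot derive a repo name from ${url}`);
  return name;
}

function buildRepoArgs(
  url: string,
  opts: { name?: string; branch?: string } = {},
): RepoCreateArgs {
  const args: RepoCreateArgs = {
    name: opts.name ?? repoNameFromUrl(url),
    type: "repo",
    source: url,
  };
  // Only set branch when --branch was given; otherwise the default branch is used.
  if (opts.branch !== undefined) args.branch = opts.branch;
  return args;
}
```

Leaving `branch` out entirely, rather than sending an empty string, matches the step 2 wording that `args.branch` is set only when `--branch` is given.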

## Error Handling

If creation fails (e.g., invalid URL, network error, git not available):

```
✗ Failed to clone repository: [error message]

Common issues:
- Check that the git URL is valid and accessible
- Ensure you have network connectivity
- Verify git is installed on your system
- For private repos, check your SSH keys or credentials
```

package/commands/cancel.md
ADDED
@@ -0,0 +1,63 @@
---
description: Cancel a background job
argument-hint: "[job-id]"
allowed-tools: ["mcp__bluera-knowledge__execute"]
---

# Cancel Background Job

Cancel a running or pending background job: **$ARGUMENTS**

## Steps

1. Parse the job ID from $ARGUMENTS (required)
   - If no job ID is provided, show an error and suggest /bluera-knowledge:check-status to list active jobs

2. Use the mcp__bluera-knowledge__execute tool with command "job:cancel":
   - args.jobId: The job ID from $ARGUMENTS

3. Display the cancellation result:

```
✓ Job job_abc123def456 cancelled
Type: clone
Progress: 45% (was indexing)

The job has been stopped and will not continue.
```

## When to Cancel

Cancel a job when:
- You accidentally started indexing the wrong repository
- The operation is taking too long and you want to try a different approach
- You need to free up system resources
- You want to stop an operation before it completes

## Important Notes

- Only jobs in 'pending' or 'running' status can be cancelled
- Completed or failed jobs cannot be cancelled
- Cancelled jobs are marked with status 'cancelled' and remain in the job list
- Partial work may be saved (e.g., partially indexed files remain in the database)
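
The first two notes reduce to a single status check. The server presumably enforces it on its own; this TypeScript sketch only makes the rule explicit, using the statuses named in this document. `isCancellable`, `prepareCancel`, and the `{ command, args }` request shape are hypothetical.

```typescript
// Job statuses mentioned in this command's notes.
type JobStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

type CancelResult =
  | { ok: true; request: { command: "job:cancel"; args: { jobId: string } } }
  | { ok: false; message: string };

// Only pending or running jobs can be cancelled.
function isCancellable(status: JobStatus): boolean {
  return status === "pending" || status === "running";
}

// Return the job:cancel request to send, or an error line in the
// "✗ Cannot cancel job ..." style shown under Error Handling.
function prepareCancel(jobId: string, status: JobStatus): CancelResult {
  if (!isCancellable(status)) {
    return { ok: false, message: `✗ Cannot cancel job ${jobId}: job is ${status}` };
  }
  return { ok: true, request: { command: "job:cancel", args: { jobId } } };
}
```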

## Error Handling

If the job cannot be cancelled:

```
✗ Cannot cancel job job_abc123def456: Job has already completed

Only pending or running jobs can be cancelled.
```

If the job is not found:

```
✗ Job not found: job_abc123def456

Common issues:
- Check that the job ID is correct
- Use /bluera-knowledge:check-status to see all active jobs
- The job may have already completed and been cleaned up
```

package/commands/check-status.md
ADDED
@@ -0,0 +1,130 @@
---
description: Check status of background operations
argument-hint: "[job-id]"
allowed-tools: ["mcp__bluera-knowledge__execute"]
---

# Check Background Job Status

Check the status of a background operation: **$ARGUMENTS**

## Steps

1. Parse $ARGUMENTS:
   - If a job ID is provided, use it to look up that job's status
   - If no arguments are given, show all active jobs

2. If a job ID is provided:
   - Use the mcp__bluera-knowledge__execute tool with command "job:status":
     - args.jobId: The job ID from $ARGUMENTS
   - Display the current status, progress, and details

3. If no job ID is provided:
   - Use the mcp__bluera-knowledge__execute tool with command "jobs":
     - args.activeOnly: true
   - Display a table of running/pending jobs
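
Steps 2 and 3 choose between two `execute` calls. A TypeScript sketch of that branch, assuming the tool accepts `{ command, args }` (inferred from the `args.*` fields above) and that the job ID is the first whitespace-separated token; `buildStatusRequest` is a hypothetical helper.

```typescript
// Hypothetical request shapes for the execute tool.
type StatusRequest =
  | { command: "job:status"; args: { jobId: string } }
  | { command: "jobs"; args: { activeOnly: true } };

// A job ID selects job:status; empty arguments list active jobs instead.
function buildStatusRequest(rawArgs: string): StatusRequest {
  const jobId = rawArgs.trim().split(/\s+/)[0];
  if (jobId) {
    return { command: "job:status", args: { jobId } };
  }
  return { command: "jobs", args: { activeOnly: true } };
}
```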

## Display Format

For a specific job:

```
Job Status: job_abc123def456
───────────────────────────────────────
Store: react-query
Phase: indexing (2/2)
Progress: ████░░░░ 45% (562/1,247 files)
Started: 2 minutes ago
```

For all active jobs, format as a rich table with progress bars:

```
Active Background Jobs
────────────────────────────────────────────────────────────────────────────────────────────
| Job ID | Store | Phase | Progress | Files |
|--------------------|------------------|-----------------|--------------|----------------|
| job_3abaf9639770 | claude-agent-sdk | indexing (2/2) | █████░░░ 59% | 32/77 files |
| job_4f0315fdcff9 | zustand | cloning (1/2) | █░░░░░░░ 15% | - |
| job_1d1d93fd254f | uvicorn | indexing (1/1) | ████░░░░ 44% | 20/100 files |
| job_ac7584576f18 | tanstack-query | crawling (1/2) | ██░░░░░░ 31% | 24 pages |
| job_8113ea07cf53 | framer-motion | indexing (2/2) | ██░░░░░░ 30% | 8/1378 files |
| job_288c24b6724c | monaco-editor | indexing (1/1) | ██░░░░░░ 31% | 12/924 files |

✓ tiktoken: Completed

6 jobs still running. The smaller repos (claude-agent-sdk, zustand, uvicorn) are progressing
faster. The larger ones (tanstack-query: 1741 files, framer-motion: 1378 files,
monaco-editor: 924 files) will take longer.
```

**Phase column:**
- Read from `job.details.phase`, `job.details.phaseStep`, `job.details.phaseTotalSteps`
- Format as: `{phase} ({step}/{total})` e.g., `indexing (2/2)`, `cloning (1/2)`
- Phases: `cloning`, `crawling`, `indexing`
  - Clone jobs: cloning (1/2) → indexing (2/2)
  - Index jobs: indexing (1/1)
  - Crawl jobs: crawling (1/2) → indexing (2/2)

**Progress bar rendering (8 chars wide):**

Build the bar using these characters: `█` (filled) and `░` (empty)

```js
// Algorithm: progress is an integer percentage (0-100)
function renderBar(progress) {
  const filled = Math.round((progress / 100) * 8);
  return '█'.repeat(filled) + '░'.repeat(8 - filled) + ' ' + progress + '%';
}
```

Examples:
```
0% → ░░░░░░░░ 0%
15% → █░░░░░░░ 15%
25% → ██░░░░░░ 25%
31% → ██░░░░░░ 31%
44% → ████░░░░ 44%
59% → █████░░░ 59%
75% → ██████░░ 75%
88% → ███████░ 88%
100% → ████████ 100%
```

**Files column:**
- For indexing: Show `{filesProcessed}/{totalFiles} files` from job.details
- For crawling: Show `{pagesCrawled} pages` from job.details
- For cloning (phase 1): Show `-` (no file count yet)
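
The Phase and Files columns can be derived from `job.details` as in this TypeScript sketch. The field names come from the bullets above; the `JobDetails` interface itself, and which fields are present in each phase, are assumptions.

```typescript
// Assumed shape of job.details; field names are taken from the column rules,
// but which fields are populated in each phase is an assumption.
interface JobDetails {
  phase: "cloning" | "crawling" | "indexing";
  phaseStep: number;
  phaseTotalSteps: number;
  filesProcessed?: number;
  totalFiles?: number;
  pagesCrawled?: number;
}

// Phase column: "{phase} ({step}/{total})", e.g. "indexing (2/2)".
function formatPhase(d: JobDetails): string {
  return `${d.phase} (${d.phaseStep}/${d.phaseTotalSteps})`;
}

// Files column: file counts while indexing, page count while crawling, "-" otherwise.
function formatFiles(d: JobDetails): string {
  if (d.phase === "indexing" && d.filesProcessed !== undefined && d.totalFiles !== undefined) {
    return `${d.filesProcessed}/${d.totalFiles} files`;
  }
  if (d.phase === "crawling" && d.pagesCrawled !== undefined) {
    return `${d.pagesCrawled} pages`;
  }
  return "-"; // e.g. cloning (phase 1), before any files are counted
}
```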

**Summary section:**
- After the table, add a brief summary noting:
  - How many jobs are still running
  - Which repos are progressing faster (smaller file counts)
  - Which repos will take longer (larger file counts)
  - If any jobs recently completed, note them with a ✓ prefix

If there are no active jobs:

```
No active background jobs.

Recent completed jobs:
────────────────────────────────────────────────────────────────────────────────────────────
| Job ID | Store | Phase | Files | Completed |
|--------------------|------------------|-----------------|----------------|--------------|
| job_old123abc456 | react-query | indexing (2/2) | 245/245 files | 5m ago |
| job_xyz789ghi012 | zustand | indexing (1/1) | 67/67 files | 12m ago |

All jobs completed successfully.
```

## Error Handling

If the job is not found:

```
✗ Job not found: job_abc123def456

Common issues:
- Check that the job ID is correct
- The job may have expired (stale pending jobs are marked failed after 2 hours)
- Use /bluera-knowledge:check-status to see all active jobs
```

package/commands/crawl.md
ADDED
@@ -0,0 +1,61 @@
---
description: Crawl web pages with natural language control and add to knowledge store
argument-hint: "[url] [store-name] [--crawl instruction] [--extract instruction] [--fast]"
allowed-tools: ["Bash(node ${CLAUDE_PLUGIN_ROOT}/dist/index.js crawl:*)"]
context: fork
---

**⚠️ IMPORTANT: Store name is a POSITIONAL argument, NOT an option!**

```
WRONG: crawl https://example.com --store=my-store
RIGHT: crawl https://example.com my-store
```

Crawling and indexing: $ARGUMENTS

```bash
node ${CLAUDE_PLUGIN_ROOT}/dist/index.js crawl $ARGUMENTS
```

The web pages will be crawled with intelligent link selection and optional natural language extraction, then indexed for searching.

**Note:** The web store is auto-created if it doesn't exist, so there is no need to create it first.

## Usage Examples

**Intelligent crawl strategy:**
```
/bluera-knowledge:crawl https://code.claude.com/docs/en/ claude-docs --crawl "all Getting Started pages"
```

**With extraction:**
```
/bluera-knowledge:crawl https://example.com/pricing pricing-store --extract "extract pricing and features"
```

**Both strategy and extraction:**
```
/bluera-knowledge:crawl https://docs.example.com my-docs --crawl "API reference pages" --extract "API endpoints and parameters"
```

**Simple BFS mode:**
```
/bluera-knowledge:crawl https://example.com/docs docs-store --simple
```

**Fast mode (axios-only, no JavaScript rendering):**
```
/bluera-knowledge:crawl https://example.com/docs docs-store --fast --max-pages 20
```

## Options

- `--crawl <instruction>` - Natural language instruction for which pages to crawl (e.g., "all Getting Started pages")
- `--extract <instruction>` - Natural language instruction for what content to extract (e.g., "extract API references")
- `--simple` - Use simple BFS (breadth-first search) mode instead of intelligent crawling
- `--max-pages <number>` - Maximum number of pages to crawl (default: 50)
- `--fast` - Use fast axios-only mode instead of a headless browser
  - Default behavior uses a headless browser (Playwright via crawl4ai) for JavaScript-rendered sites
  - Use `--fast` when the target site doesn't use client-side rendering
  - Much faster than headless mode, but may miss content on JavaScript-heavy sites
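
A TypeScript sketch that builds the argument vector for the `node ${CLAUDE_PLUGIN_ROOT}/dist/index.js crawl ...` call while keeping the store name positional. It uses only the flags documented above; `buildCrawlArgv` and `CrawlOptions` are hypothetical, and the order of flags after the two positionals is assumed not to matter.

```typescript
// Hypothetical options mirroring the documented flags.
interface CrawlOptions {
  crawl?: string; // --crawl <instruction>
  extract?: string; // --extract <instruction>
  simple?: boolean; // --simple
  maxPages?: number; // --max-pages <number>
  fast?: boolean; // --fast
}

// Build argv for `node <pluginRoot>/dist/index.js crawl <url> <store> [flags]`.
// The store name is the second POSITIONAL argument, never a --store option.
function buildCrawlArgv(
  pluginRoot: string,
  url: string,
  store: string,
  opts: CrawlOptions = {},
): string[] {
  const argv = [`${pluginRoot}/dist/index.js`, "crawl", url, store];
  if (opts.crawl !== undefined) argv.push("--crawl", opts.crawl);
  if (opts.extract !== undefined) argv.push("--extract", opts.extract);
  if (opts.simple) argv.push("--simple");
  if (opts.maxPages !== undefined) argv.push("--max-pages", String(opts.maxPages));
  if (opts.fast) argv.push("--fast");
  return argv;
}
```

Passing the array to `execFile("node", argv)` from `node:child_process` sends each multi-word `--crawl` or `--extract` instruction as a single argument, with no shell quoting involved.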

package/commands/doctor.md
ADDED
@@ -0,0 +1,27 @@
---
description: Diagnose plugin issues and get fix instructions
allowed-tools: ["Bash(${CLAUDE_PLUGIN_ROOT:-.}/scripts/doctor.sh)"]
---
# Bluera Knowledge Doctor

Run comprehensive diagnostics to identify and fix plugin issues.

## Instructions

Run the doctor script to check all prerequisites:

```bash
bash "${CLAUDE_PLUGIN_ROOT:-.}/scripts/doctor.sh"
```

The script checks:
1. **Build tools** (make/gcc) - Required for native modules
2. **Node.js** - Required for MCP server
3. **Plugin dependencies** (node_modules) - Required for MCP server
4. **MCP wrapper** - Required for MCP server startup
5. **Python 3** - Optional, for embeddings
6. **Playwright** - Optional, for web crawling

For any `[FAIL]` items, follow the FIX instructions provided.

After fixing issues, restart Claude Code for changes to take effect.

package/commands/eval.md
ADDED
@@ -0,0 +1,222 @@
---
description: Evaluate agent quality across three modes — without BK, BK grep-only, and BK full
argument-hint: "[query | --predefined | --predefined N]"
allowed-tools: ["mcp__bluera-knowledge__execute", "mcp__bluera-knowledge__search", "mcp__bluera-knowledge__get_full_context", "Read", "Grep", "Glob", "WebSearch", "Bash"]
context: fork
---

# Agent Quality Evaluation

Compare how well Claude answers library questions across three access levels.

For each query, three agents run in parallel:
- **Without BK** — uses only web search and training knowledge
- **BK Grep** — can Grep/Read/Glob the cloned source repos but has no vector search
- **BK Full** — uses BK vector search + get_full_context + Grep/Read (all BK tools)

Then score all three answers on accuracy, specificity, completeness, and source grounding.

## Arguments

Parse `$ARGUMENTS`:

- **No arguments or empty**: Show usage help
- **Quoted string** (not starting with `--`): Arbitrary query mode — run eval for that single question
- **`--predefined`**: Run all predefined queries (skip any whose stores are not indexed)
- **`--predefined N`**: Run predefined query #N only (1-based index)

If no arguments are provided, show:
```
Usage:
/bluera-knowledge:eval "How does Express handle errors?" # Arbitrary query
/bluera-knowledge:eval --predefined # Run all predefined queries
/bluera-knowledge:eval --predefined 3 # Run predefined query #3
```
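
A TypeScript sketch of the argument rules above. `parseEvalArgs` and `EvalMode` are hypothetical, and treating any other `--` flag (or an invalid N) as a request for usage help is an assumption, since the command does not say what to do with them.

```typescript
type EvalMode =
  | { kind: "usage" }
  | { kind: "arbitrary"; query: string }
  | { kind: "predefined"; index?: number }; // index is 1-based when present

function parseEvalArgs(raw: string): EvalMode {
  const trimmed = raw.trim();
  if (trimmed === "") return { kind: "usage" };

  if (trimmed.startsWith("--")) {
    const [flag, n, ...rest] = trimmed.split(/\s+/);
    // Assumption: unknown flags or extra tokens fall back to usage help.
    if (flag !== "--predefined" || rest.length > 0) return { kind: "usage" };
    if (n === undefined) return { kind: "predefined" };
    const index = Number(n);
    // N is a 1-based position, so reject 0, negatives, and non-integers.
    if (!Number.isInteger(index) || index < 1) return { kind: "usage" };
    return { kind: "predefined", index };
  }

  // Anything else is a free-form question; strip one pair of surrounding quotes.
  return { kind: "arbitrary", query: trimmed.replace(/^(["'])([\s\S]*)\1$/, "$2") };
}
```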

## Step 1: Prerequisites Check

1. Call MCP `execute` with `{ command: "stores" }` to list indexed stores
2. If no stores are indexed, show an error and abort:
   ```
   No knowledge stores indexed. Add at least one library first:
   /bluera-knowledge:add-repo https://github.com/expressjs/express --name express
   ```
3. Record the list of available store names — you'll pass these to the BK Full agent
4. Build a `STORE_PATHS` mapping from the store response: for each store with a `path` field, record `` - **<name>**: `<path>` `` (one per line, as a markdown list). This gets passed to the BK Grep agent.
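
Items 3 and 4 produce the `{{STORES}}` and `{{STORE_PATHS}}` substitutions used in Step 4. A TypeScript sketch, assuming each entry in the `stores` response has a `name` and an optional `path` (the real response may carry more fields); `StoreInfo`, `buildStoresList`, and `buildStorePaths` are hypothetical.

```typescript
// Assumed store entry shape; only `name` and an optional `path` are used here.
interface StoreInfo {
  name: string;
  path?: string;
}

// {{STORES}}: one markdown bullet per store name (for the BK Full agent).
function buildStoresList(stores: StoreInfo[]): string {
  return stores.map((s) => `- ${s.name}`).join("\n");
}

// {{STORE_PATHS}}: "- **<name>**: `<path>`" for stores that have a path (BK Grep agent).
function buildStorePaths(stores: StoreInfo[]): string {
  return stores
    .filter((s) => typeof s.path === "string" && s.path.length > 0)
    .map((s) => `- **${s.name}**: \`${s.path}\``)
    .join("\n");
}
```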

## Step 2: Resolve Queries

### Predefined mode (`--predefined`)

1. Read the predefined queries file: `$CLAUDE_PLUGIN_ROOT/evals/agent-quality/queries/predefined.yaml`
2. Parse the YAML content
3. For each query, check if ANY of its `store_hint` values match an available store name
4. Split into **runnable** (store available) and **skipped** (store not available) lists
5. If `--predefined N` was specified, select only the query at 1-based index N from the full list (skip it if its store is not available)
6. If no queries are runnable, show which stores to add and abort

### Arbitrary mode (bare query string)

1. Use the raw query string as the question
2. Set `expected_topics` and `anti_patterns` to empty lists
3. Set `id` to "arbitrary", `category` to "general", `difficulty` to "unknown"
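
A TypeScript sketch of query resolution. `store_hint`, `expected_topics`, `anti_patterns`, `id`, `category`, and `difficulty` come from the steps above; the `question` field name for the query text is a hypothetical placeholder, as are the `EvalQuery` interface and the helper names. Because `--predefined N` indexes the full list before the availability check, N refers to the same query whether or not its store is indexed.

```typescript
// Hypothetical shape of one predefined query; `question` is a placeholder
// name for the query text, the other fields are named in Step 2.
interface EvalQuery {
  id: string;
  question: string;
  category: string;
  difficulty: string;
  store_hint: string[];
  expected_topics: string[];
  anti_patterns: string[];
}

interface Resolved {
  runnable: EvalQuery[];
  skipped: EvalQuery[];
}

// A query is runnable when ANY of its store hints names an indexed store.
function splitByAvailability(queries: EvalQuery[], available: Set<string>): Resolved {
  const runnable: EvalQuery[] = [];
  const skipped: EvalQuery[] = [];
  for (const q of queries) {
    (q.store_hint.some((h) => available.has(h)) ? runnable : skipped).push(q);
  }
  return { runnable, skipped };
}

// --predefined N picks from the FULL list (1-based) before checking availability.
function selectPredefined(queries: EvalQuery[], available: Set<string>, n?: number): Resolved {
  if (n === undefined) return splitByAvailability(queries, available);
  const q = queries[n - 1];
  if (q === undefined) throw new Error(`no predefined query #${n}`);
  return splitByAvailability([q], available);
}

// Arbitrary mode: wrap a bare question with the defaults listed in Step 2.
function arbitraryQuery(question: string): EvalQuery {
  return {
    id: "arbitrary",
    question,
    category: "general",
    difficulty: "unknown",
    store_hint: [],
    expected_topics: [],
    anti_patterns: [],
  };
}
```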

## Step 3: Load Templates

Read these files from `$CLAUDE_PLUGIN_ROOT/evals/agent-quality/templates/`:

1. `without-bk-agent.md` — instructions for the baseline agent
2. `bk-grep-agent.md` — instructions for the BK Grep agent
3. `with-bk-agent.md` — instructions for the BK Full agent
4. `judge.md` — grading rubric

## Step 4: Run Eval (for each query)

### Spawn ALL THREE agents in parallel (same turn, three Task tool calls)

**Without-BK agent** — Use the Task tool with `subagent_type: "general-purpose"`:
- Take the content from `without-bk-agent.md`
- Replace `{{QUESTION}}` with the actual question
- Send as the task prompt

**BK Grep agent** — Use the Task tool with `subagent_type: "general-purpose"`:
- Take the content from `bk-grep-agent.md`
- Replace `{{QUESTION}}` with the actual question
- Replace `{{STORE_PATHS}}` with the store name-to-path mapping built in Step 1
- Send as the task prompt

**BK Full agent** — Use the Task tool with `subagent_type: "general-purpose"`:
- Take the content from `with-bk-agent.md`
- Replace `{{QUESTION}}` with the actual question
- Replace `{{STORES}}` with the list of available store names (one per line, as a markdown list)
- Send as the task prompt

Wait for all three agents to complete.
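
A TypeScript sketch of the placeholder substitution, assuming placeholders are uppercase names in double braces such as `{{QUESTION}}`; `fillTemplate` is hypothetical. A replacer function is used instead of a replacement string so that a `$` in the question (for example `$&`) is inserted literally rather than interpreted as a replacement pattern, and an unknown placeholder fails loudly instead of reaching an agent unfilled.

```typescript
// Replace every {{KEY}} placeholder; throw if a template expects a key we did not supply.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, key: string) => {
    if (!(key in values)) throw new Error(`no value for placeholder ${match}`);
    return values[key]; // returned literally: no $-pattern expansion
  });
}
```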

### Capture Token Usage

From each Task tool response, parse the `<usage>` block to extract:
- `total_tokens` — the total tokens consumed by the agent
- `duration_ms` — wall-clock time for the agent

If usage data is not available in a Task response, show "N/A" for that agent.

### Judge the results

Using the rubric from `judge.md`, evaluate all three answers yourself:

1. Read all three agent responses
2. For each answer, score all 4 criteria (1-5):
   - **Factual Accuracy**: Are the claims correct?
   - **Specificity**: Does it cite specific files, functions, code?
   - **Completeness**: Does it cover the full answer?
   - **Source Grounding**: Are claims backed by evidence?
3. If the query has `expected_topics`, check which answers mention each topic
4. If the query has `anti_patterns`, flag if any answer makes those claims
5. Calculate totals (max 20 each), determine winner and deltas
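
A TypeScript sketch of step 5, assuming the winner is the highest total and the delta is the gap between the top two totals; the rubric in `judge.md` may define these differently, and the significant/marginal/none label is left to judgment here. `Scores`, `Verdict`, `total`, and `judge` are hypothetical names.

```typescript
type Agent = "Without BK" | "BK Grep" | "BK Full";

// The four rubric criteria, each scored 1-5, so a total is at most 20.
interface Scores {
  accuracy: number;
  specificity: number;
  completeness: number;
  grounding: number;
}

function total(s: Scores): number {
  const parts = [s.accuracy, s.specificity, s.completeness, s.grounding];
  for (const p of parts) {
    if (!Number.isInteger(p) || p < 1 || p > 5) throw new Error(`score out of range: ${p}`);
  }
  return parts.reduce((a, b) => a + b, 0);
}

interface Verdict {
  totals: Record<Agent, number>;
  winner: Agent | "Tie";
  delta: number; // gap between the best and second-best totals
}

function judge(scores: Record<Agent, Scores>): Verdict {
  const totals: Record<Agent, number> = {
    "Without BK": total(scores["Without BK"]),
    "BK Grep": total(scores["BK Grep"]),
    "BK Full": total(scores["BK Full"]),
  };
  // Rank agents by total, highest first.
  const ranked = (Object.keys(totals) as Agent[]).sort((a, b) => totals[b] - totals[a]);
  const delta = totals[ranked[0]] - totals[ranked[1]];
  return { totals, winner: delta === 0 ? "Tie" : ranked[0], delta };
}
```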

## Step 5: Output Results

### Single query output (arbitrary or `--predefined N`)

Show the full comparison:

```
## Eval: "<question>"

| Criterion | Without BK | BK Grep | BK Full |
|-------------------|:----------:|:-------:|:-------:|
| Accuracy | X | X | X |
| Specificity | X | X | X |
| Completeness | X | X | X |
| Source Grounding | X | X | X |
| **Total** | **X** | **X** | **X** |

| Usage | Without BK | BK Grep | BK Full |
|-------------------|:----------:|:-------:|:-------:|
| Tokens | X,XXX | X,XXX | X,XXX |
| Duration (s) | X.X | X.X | X.X |

**Winner:** [BK Full | BK Grep | Without BK | Tie] ([significant | marginal | none])
**Key Difference:** [One sentence explaining the most important quality gap]
**Grep vs Full:** [One sentence on whether vector search outperformed manual grep, and if so how]
```

If expected topics were provided:
```
### Expected Topics
- [x] topic covered by all three
- [x] topic covered by BK Full + BK Grep only
- [x] topic covered by BK Full only
- [ ] topic missed by all
```

### Multi-query output (`--predefined`)

Show a summary row per query, then aggregate:

```
## Agent Quality Eval Summary

Ran X/8 queries (Y skipped — stores not indexed)

| # | Query | Difficulty | w/o BK | Grep | Full | Winner | Delta |
|---|-------|:----------:|:------:|:----:|:----:|--------|-------|
| 1 | query-id | medium | 9/20 | 15/20 | 19/20 | Full | significant |
| 2 | query-id | easy | 14/20 | 17/20 | 18/20 | Full | marginal |
| ... |

### Token Usage

| # | Query | w/o BK tokens | Grep tokens | Full tokens |
|---|-------|:-------------:|:-----------:|:-----------:|
| 1 | query-id | 2,340 | 8,120 | 5,670 |
| 2 | query-id | 1,890 | 6,450 | 4,230 |
| ... |

### Aggregate
- **Without BK mean:** X.X/20 (avg X,XXX tokens)
- **BK Grep mean:** X.X/20 (avg X,XXX tokens)
- **BK Full mean:** X.X/20 (avg X,XXX tokens)
- **Full vs Without:** +X.X points (+XX%)
- **Full vs Grep:** +X.X points (+XX%)
- **Grep vs Without:** +X.X points (+XX%)
- **Full win rate:** X/X (XX%)
- **Significant wins (Full):** X

### By Category
| Category | w/o BK | Grep | Full | Full delta |
|----------|:------:|:----:|:----:|------------|
| implementation | X.X | X.X | X.X | +X.X |
| api | X.X | X.X | X.X | +X.X |

### By Difficulty
| Difficulty | w/o BK | Grep | Full | Full delta |
|------------|:------:|:----:|:----:|------------|
| easy | X.X | X.X | X.X | +X.X |
| medium | X.X | X.X | X.X | +X.X |
| hard | X.X | X.X | X.X | +X.X |

### Token Efficiency
| Agent | Mean Score | Mean Tokens | Score/1K Tokens |
|-------|:----------:|:-----------:|:---------------:|
| Without BK | X.X | X,XXX | X.XX |
| BK Grep | X.X | X,XXX | X.XX |
| BK Full | X.X | X,XXX | X.XX |
```

If any queries were skipped:
```
### Skipped (store not indexed)
- vue-reactivity-tracking — add with: /bluera-knowledge:add-repo https://github.com/vuejs/core --name vue
- fastapi-dependency-injection — add with: /bluera-knowledge:add-repo https://github.com/fastapi/fastapi --name fastapi
```
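
The Aggregate and Token Efficiency numbers can be computed from per-query totals as in this TypeScript sketch. Two assumptions: a percentage delta is relative to the baseline's mean score, and BK Full "wins" a query only when it strictly beats both other agents. `QueryResult` and the helper functions are hypothetical.

```typescript
// Per-query totals (/20) and token counts for the three agents.
interface QueryResult {
  without: number;
  grep: number;
  full: number;
  withoutTokens: number;
  grepTokens: number;
  fullTokens: number;
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// "+X.X points (+XX%)": absolute gap in mean score, and the gap relative to the baseline.
function compare(a: number, baseline: number): { points: number; percent: number } {
  return { points: a - baseline, percent: ((a - baseline) / baseline) * 100 };
}

// Score/1K Tokens column: mean score divided by mean tokens in thousands.
function scorePer1k(meanScore: number, meanTokens: number): number {
  return meanScore / (meanTokens / 1000);
}

function aggregate(results: QueryResult[]) {
  if (results.length === 0) throw new Error("no queries ran");
  const m = {
    without: mean(results.map((r) => r.without)),
    grep: mean(results.map((r) => r.grep)),
    full: mean(results.map((r) => r.full)),
  };
  const t = {
    without: mean(results.map((r) => r.withoutTokens)),
    grep: mean(results.map((r) => r.grepTokens)),
    full: mean(results.map((r) => r.fullTokens)),
  };
  // Assumption: a Full "win" means strictly beating both other agents.
  const fullWins = results.filter((r) => r.full > r.grep && r.full > r.without).length;
  return {
    means: m,
    fullVsWithout: compare(m.full, m.without),
    fullVsGrep: compare(m.full, m.grep),
    grepVsWithout: compare(m.grep, m.without),
    fullWinRate: fullWins / results.length,
    efficiency: {
      without: scorePer1k(m.without, t.without),
      grep: scorePer1k(m.grep, t.grep),
      full: scorePer1k(m.full, t.full),
    },
  };
}
```

With the two example rows in the summary template (totals 9/15/19 and 14/17/18), the BK Full mean is 18.5 and Full vs Grep comes out at +2.5 points (+15.6%).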

## Important Notes

- Each query spawns 3 subagents. For `--predefined` with 8 queries, that's up to 24 agent runs. Process one query at a time (but spawn all three agents for each query in parallel).
- The without-BK agent may use WebSearch — this is intentional. We're comparing against "the best Claude can do without BK."
- The BK Grep agent may NOT use WebSearch. It tests what an agent can discover by exploring raw source code, to isolate the value of vector search.
- Scoring is somewhat subjective. The value is in the comparison (relative scores) rather than absolute numbers. Look at the delta and key differences.
- The Token Efficiency table reveals cost-effectiveness: if BK Grep achieves similar scores to BK Full with fewer tokens, it suggests vector search isn't adding much for that query type.
- For arbitrary queries without expected topics, grading relies entirely on the 4 general criteria. This is fine — it still reveals whether BK adds value.

package/commands/health.md
ADDED
@@ -0,0 +1,72 @@
---
description: Check health of all stores (path existence, model compatibility)
allowed-tools: ["mcp__bluera-knowledge__execute"]
---

# Store Health Check

Diagnose issues with knowledge stores: missing paths, schema migrations, model mismatches.

## Steps

1. Use the mcp__bluera-knowledge__execute tool with command "stores:health" to check all stores

2. Present results grouped by status:

```
## Store Health Report

### Errors (require action)
| Store | Type | Issue | Fix |
|-------|------|-------|-----|
| my-repo | repo | Path not found | Re-create store or fix projectRoot |

### Warnings (recommended action)
| Store | Type | Issue | Fix |
|-------|------|-------|-----|
| old-docs | web | Schema v1 | Run: /bluera-knowledge:index old-docs |

### Healthy
- react-docs (web)
- lodash (repo)

**Summary**: 2 healthy, 1 warning, 1 error
```

## Exit Codes

The health check returns an exit code for scripting:

| Exit Code | Meaning |
|-----------|---------|
| 0 | All stores healthy |
| 1 | At least one store has an error (path not found) |
| 2 | No errors, but at least one warning (model/schema issue) |

## Issue Types

### PATH_NOT_FOUND (Error)
The store's source path no longer exists. This happens when:
- A local folder was deleted or moved
- The project was relocated and paths weren't updated
- A cloned repo directory was removed

**Fix**: Re-create the store or update the projectRoot setting.

### SCHEMA_V1 (Warning)
The store was created before model tracking was added. It needs to be re-indexed to be searchable.

**Fix**: Run `/bluera-knowledge:index <store-name>`

### MODEL_MISMATCH (Warning)
The store was indexed with a different embedding model than the current configuration.

**Fix**: Run `/bluera-knowledge:index <store-name>` to re-index with the current model.
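
The Exit Codes table and the severities under Issue Types combine into one rule: any error wins, then any warning. A TypeScript sketch, assuming each store reports a list of issue codes (the actual `stores:health` response shape isn't documented here); `StoreHealth`, `SEVERITY`, and `healthExitCode` are hypothetical.

```typescript
type IssueCode = "PATH_NOT_FOUND" | "SCHEMA_V1" | "MODEL_MISMATCH";

// Severity per the Issue Types section: missing paths are errors, the rest are warnings.
const SEVERITY: Record<IssueCode, "error" | "warning"> = {
  PATH_NOT_FOUND: "error",
  SCHEMA_V1: "warning",
  MODEL_MISMATCH: "warning",
};

interface StoreHealth {
  name: string;
  issues: IssueCode[];
}

// Exit codes from the table: 1 if any error, else 2 if any warning, else 0.
function healthExitCode(stores: StoreHealth[]): 0 | 1 | 2 {
  const severities = stores.flatMap((s) => s.issues.map((i) => SEVERITY[i]));
  if (severities.includes("error")) return 1;
  if (severities.includes("warning")) return 2;
  return 0;
}
```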

## Check Single Store

To check a specific store only:

```
stores:health --store=<store-name>
```