bluera-knowledge 0.13.0 → 0.13.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. package/.claude/rules/code-quality.md +12 -0
  2. package/.claude/rules/git.md +5 -0
  3. package/.claude/rules/versioning.md +7 -0
  4. package/.claude-plugin/plugin.json +2 -15
  5. package/.mcp.json +11 -0
  6. package/CHANGELOG.md +7 -0
  7. package/CLAUDE.md +5 -13
  8. package/CONTRIBUTING.md +307 -0
  9. package/README.md +58 -1167
  10. package/commands/crawl.md +2 -1
  11. package/commands/test-plugin.md +197 -72
  12. package/docs/claude-code-best-practices.md +458 -0
  13. package/docs/cli.md +170 -0
  14. package/docs/commands.md +392 -0
  15. package/docs/crawler-architecture.md +89 -0
  16. package/docs/mcp-integration.md +130 -0
  17. package/docs/token-efficiency.md +91 -0
  18. package/eslint.config.js +1 -1
  19. package/hooks/check-dependencies.sh +18 -1
  20. package/hooks/hooks.json +2 -2
  21. package/hooks/posttooluse-bk-reminder.py +30 -2
  22. package/package.json +1 -1
  23. package/scripts/test-mcp-dev.js +260 -0
  24. package/src/mcp/plugin-mcp-config.test.ts +26 -19
  25. package/tests/integration/cli-consistency.test.ts +3 -2
  26. package/docs/plans/2024-12-17-ai-search-quality-implementation.md +0 -752
  27. package/docs/plans/2024-12-17-ai-search-quality-testing-design.md +0 -201
  28. package/docs/plans/2025-12-16-bluera-knowledge-cli.md +0 -2951
  29. package/docs/plans/2025-12-16-phase2-features.md +0 -1518
  30. package/docs/plans/2025-12-17-hil-implementation.md +0 -926
  31. package/docs/plans/2025-12-17-hil-quality-testing.md +0 -224
  32. package/docs/plans/2025-12-17-search-quality-phase1-implementation.md +0 -1416
  33. package/docs/plans/2025-12-17-search-quality-testing-v2-design.md +0 -212
  34. package/docs/plans/2025-12-28-ai-agent-optimization.md +0 -1630
@@ -1,212 +0,0 @@
- # Search Quality Testing v2 - Design
-
- ## Goal
-
- Build a valid, reproducible search quality testing system that enables real-world performance tracking and drives actionable improvements over time.
-
- ## Core Requirements
-
- - **Valid tests**: Real representative data, not synthetic placeholders
- - **Regression tracking**: Stable queries against stable corpus to measure changes
- - **Exploratory testing**: Generate fresh queries to discover new issues
- - **Actionable output**: AI-judged with spot-checks for calibration
-
- ---
-
- ## Test Corpus
-
- ### Structure
-
- ```
- tests/fixtures/corpus/          # Committed directly (no .git folders)
- ├── oss-repos/
- │   ├── zod/                    # TypeScript schema validation
- │   └── hono/                   # Lightweight web framework
- ├── documentation/
- │   └── express-docs/           # Express.js guide excerpts
- ├── articles/                   # Technical blog posts, tutorials
- ├── papers/                     # Research papers (markdown)
- └── VERSION.md                  # Corpus version documentation
- ```
-
- ### Selection Criteria
-
- - Small but representative (~50-100 documents)
- - Mix of content types: code + docs, articles, reference
- - Pinned versions (cleaned snapshots, no .git)
- - Reflects real usage: dev docs, documented codebases, articles
-
- ---
-
- ## Query Management
-
- ### Core Queries (`queries/core.json`)
-
- ```json
- {
-   "version": "1.0.0",
-   "description": "Stable regression benchmark queries",
-   "queries": [
-     {
-       "id": "auth-001",
-       "query": "JWT token validation middleware",
-       "intent": "Find authentication middleware implementations",
-       "category": "code-pattern",
-       "addedAt": "2025-12-17",
-       "expectedSources": []
-     }
-   ]
- }
- ```
-
- ### Query Categories
-
- - `code-pattern` - Find implementation patterns
- - `concept` - Explain a concept or approach
- - `api-reference` - Look up specific API/function
- - `troubleshooting` - Debug/error resolution
- - `comparison` - Compare approaches or tools
-
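The query file shape and the five categories above could be captured as types with a small validator. This is a minimal sketch, not code from the package: the names `QuerySet`, `CoreQuery`, and `validateQuerySet` are hypothetical, and the checks (unique ids, known category, `YYYY-MM-DD` date) are assumptions about what a stable benchmark set would need.

```typescript
// Hypothetical types mirroring the core.json example; not taken from the package source.
const QUERY_CATEGORIES = [
  "code-pattern",
  "concept",
  "api-reference",
  "troubleshooting",
  "comparison",
] as const;

type QueryCategory = (typeof QUERY_CATEGORIES)[number];

interface CoreQuery {
  id: string;
  query: string;
  intent: string;
  category: QueryCategory;
  addedAt: string; // expected as YYYY-MM-DD
  expectedSources: string[];
}

interface QuerySet {
  version: string;
  description: string;
  queries: CoreQuery[];
}

// Returns human-readable problems; an empty array means no problem was found.
function validateQuerySet(set: QuerySet): string[] {
  const problems: string[] = [];
  const seen: { [id: string]: boolean } = {};
  for (const q of set.queries) {
    // Stable regression tracking needs ids that identify exactly one query.
    if (seen[q.id]) problems.push("duplicate id: " + q.id);
    seen[q.id] = true;
    if ((QUERY_CATEGORIES as readonly string[]).indexOf(q.category) === -1) {
      problems.push("unknown category for " + q.id + ": " + q.category);
    }
    if (!/^\d{4}-\d{2}-\d{2}$/.test(q.addedAt)) {
      problems.push("bad addedAt for " + q.id + ": " + q.addedAt);
    }
  }
  return problems;
}
```

Running such a check before a regression run would catch a mistyped category or a reused id before it skews score comparisons.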
- ### Generated Queries
-
- - Saved to `queries/generated/YYYY-MM-DD-HH-MM.json`
- - Same structure with `"source": "ai-generated"`
- - Can promote good queries to core set manually
-
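The timestamped file name for a generated query set can be derived from a `Date`. A sketch under stated assumptions: `generatedQueryPath` is a hypothetical helper, and the use of UTC fields is a guess, since the design does not say which time zone the stamp uses.

```typescript
// Pads a number to two digits, e.g. 7 -> "07".
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds "queries/generated/YYYY-MM-DD-HH-MM.json" from a Date.
// Assumption: UTC fields are used so the name does not depend on the machine's time zone.
function generatedQueryPath(now: Date): string {
  const stamp = [
    String(now.getUTCFullYear()),
    pad2(now.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(now.getUTCDate()),
    pad2(now.getUTCHours()),
    pad2(now.getUTCMinutes()),
  ].join("-");
  return "queries/generated/" + stamp + ".json";
}
```

Because the name sorts lexicographically in time order, a plain directory listing shows exploratory runs oldest to newest.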
- ---
-
- ## Results & Tracking
-
- ### Structure
-
- ```
- tests/quality-results/
- ├── runs/                            # Individual run outputs
- │   └── 2025-12-17T16-23-58.jsonl
- ├── baseline.json                    # Current performance baseline
- └── history.json                     # Score trends over time
- ```
-
- ### Baseline (`baseline.json`)
-
- ```json
- {
-   "updatedAt": "2025-12-17",
-   "corpus": "v1.0.0",
-   "querySet": "core@1.0.0",
-   "scores": {
-     "relevance": 0.72,
-     "ranking": 0.68,
-     "coverage": 0.65,
-     "snippetQuality": 0.70,
-     "overall": 0.69
-   },
-   "thresholds": {
-     "regression": 0.05,
-     "improvement": 0.03
-   }
- }
- ```
-
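One way the `thresholds` block might be applied is to classify each metric's change against the baseline. This is an illustrative sketch only: `compareToBaseline` and the `Verdict` labels are hypothetical, and treating both thresholds as inclusive is an assumption. Deltas are rounded to two decimals so that floating-point noise (for example `0.75 - 0.72`) does not push a value across a threshold.

```typescript
type Scores = {
  relevance: number;
  ranking: number;
  coverage: number;
  snippetQuality: number;
  overall: number;
};

interface Baseline {
  scores: Scores;
  thresholds: { regression: number; improvement: number };
}

type Verdict = "regression" | "improvement" | "unchanged";

interface MetricDelta {
  metric: keyof Scores;
  current: number;
  delta: number;
  verdict: Verdict;
}

// Rounds to two decimals, matching the precision used in baseline.json.
function round2(x: number): number {
  return Math.round(x * 100) / 100;
}

// Classifies each metric relative to the baseline.
// Assumption: a drop of at least `regression` is a regression and a gain of
// at least `improvement` is an improvement (both bounds inclusive).
function compareToBaseline(current: Scores, baseline: Baseline): MetricDelta[] {
  const metrics: (keyof Scores)[] = [
    "relevance",
    "ranking",
    "coverage",
    "snippetQuality",
    "overall",
  ];
  return metrics.map((metric) => {
    const delta = round2(current[metric] - baseline.scores[metric]);
    let verdict: Verdict = "unchanged";
    if (delta <= -baseline.thresholds.regression) verdict = "regression";
    else if (delta >= baseline.thresholds.improvement) verdict = "improvement";
    return { metric: metric, current: current[metric], delta: delta, verdict: verdict };
  });
}
```

With the baseline above, a run scoring 0.75 on relevance would count as an improvement (+0.03), while a ranking score of 0.66 (-0.02) would stay inside the 0.05 regression tolerance.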
- ### Comparison Output
-
- ```
- 📊 Search Quality Results (vs baseline)
-
- Relevance: 0.75 (+0.03) ✅
- Ranking: 0.66 (-0.02)
- Coverage: 0.71 (+0.06) ✅
- Snippet: 0.68 (-0.02)
- Overall: 0.70 (+0.01)
-
- ✅ No regressions detected
- ```
-
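A report in this shape could be rendered from per-metric rows. The sketch below is hypothetical: `Row`, `formatDelta`, and `formatReport` are illustrative names, and the `❌` marker and the wording of the failure line are assumptions, since the design only shows the passing case.

```typescript
interface Row {
  label: string;
  score: number;
  delta: number;
  verdict: "regression" | "improvement" | "unchanged";
}

// Formats a signed delta with two decimals, e.g. 0.03 -> "(+0.03)".
function formatDelta(delta: number): string {
  const sign = delta >= 0 ? "+" : "-";
  return "(" + sign + Math.abs(delta).toFixed(2) + ")";
}

// Renders the comparison report as a single string.
// Assumption: regressions are flagged with "❌"; the design only shows "✅".
function formatReport(rows: Row[]): string {
  const lines: string[] = ["📊 Search Quality Results (vs baseline)", ""];
  let regressions = 0;
  for (const row of rows) {
    let line = row.label + ": " + row.score.toFixed(2) + " " + formatDelta(row.delta);
    if (row.verdict === "improvement") line += " ✅";
    if (row.verdict === "regression") {
      line += " ❌";
      regressions += 1;
    }
    lines.push(line);
  }
  lines.push("");
  lines.push(
    regressions === 0
      ? "✅ No regressions detected"
      : "❌ " + regressions + " regression(s) detected"
  );
  return lines.join("\n");
}
```

Returning a string rather than printing directly keeps the formatter easy to reuse, for instance in a PR comment.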
- ---
-
- ## Test Execution
-
- ### Commands
-
- | Command | Purpose |
- |---------|---------|
- | `npm run test:corpus:index` | Create store + index committed corpus |
- | `npm run test:search-quality` | Regression check against baseline |
- | `npm run test:search-quality -- --explore` | Generate fresh queries + run |
- | `npm run test:search-quality -- --set <name>` | Re-run historical query set |
- | `npm run test:search-quality -- --update-baseline` | Lock current scores as baseline |
-
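The three flags in the table could be parsed with a small hand-written loop. This is a sketch, not the package's script: `RunOptions` and `parseArgs` are hypothetical, and defaulting the query set to `"core"` is an assumption based on the regression run using the core queries.

```typescript
interface RunOptions {
  explore: boolean;
  updateBaseline: boolean;
  querySet: string;
}

// Parses the arguments that follow `--` on the npm command line.
// Assumption: with no --set flag, the run uses the "core" query set.
function parseArgs(argv: string[]): RunOptions {
  const options: RunOptions = { explore: false, updateBaseline: false, querySet: "core" };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--explore") {
      options.explore = true;
    } else if (arg === "--update-baseline") {
      options.updateBaseline = true;
    } else if (arg === "--set") {
      const name = argv[i + 1];
      // Reject a missing value or another flag in the value position.
      if (name === undefined || name.slice(0, 2) === "--") {
        throw new Error("--set requires a query set name");
      }
      options.querySet = name;
      i++; // skip the value we just consumed
    } else {
      throw new Error("unknown flag: " + arg);
    }
  }
  return options;
}
```

Failing loudly on unknown flags avoids a silent regression run when a flag is mistyped.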
- ### CI Integration
-
- ```yaml
- - run: npm run test:corpus:index
- - run: npm run test:search-quality
- ```
-
- ---
-
- ## AI Judgment Calibration
-
- ### Spot-Check Workflow
-
- ```bash
- npm run test:search-quality -- --review
- ```
-
- Interactive review of AI judgments to track agreement rate.
-
- ### Calibration Data (`queries/calibration.json`)
-
- ```json
- {
-   "judgments": [...],
-   "stats": {
-     "totalReviewed": 47,
-     "agreementRate": 0.89,
-     "lastReview": "2025-12-17"
-   }
- }
- ```
-
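The `stats` block could be derived from the reviewed judgments. The shape of a judgment entry is elided in the design, so the `Judgment` fields below (`aiVerdict`, `humanVerdict`) are purely hypothetical, as is the `computeStats` helper; the agreement rate is taken here as the share of reviewed items where the human agreed with the AI, rounded to two decimals.

```typescript
// Hypothetical judgment shape; the design elides the real entries.
interface Judgment {
  aiVerdict: string;
  humanVerdict: string;
}

interface CalibrationStats {
  totalReviewed: number;
  agreementRate: number;
  lastReview: string; // YYYY-MM-DD
}

// Computes the stats block from reviewed judgments.
// Assumption: agreement means the two verdicts are identical strings.
function computeStats(judgments: Judgment[], lastReview: string): CalibrationStats {
  let agreed = 0;
  for (const j of judgments) {
    if (j.aiVerdict === j.humanVerdict) agreed += 1;
  }
  const total = judgments.length;
  // Avoid dividing by zero when nothing has been reviewed yet.
  const rate = total === 0 ? 0 : Math.round((agreed / total) * 100) / 100;
  return { totalReviewed: total, agreementRate: rate, lastReview: lastReview };
}
```

For example, 42 agreements out of 47 reviewed judgments give 42 / 47 ≈ 0.894, which rounds to the 0.89 shown in the sample.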
- ### When to Recalibrate
-
- - Agreement rate drops below 85%
- - After major prompt changes
- - Quarterly as hygiene
-
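The three recalibration triggers can be expressed as a single check. This is a sketch under assumptions: `recalibrationReasons` and its input fields are hypothetical, "quarterly" is approximated as 90 days since the last review, and whether a prompt change counts as "major" is left to the caller.

```typescript
interface RecalibrationInput {
  agreementRate: number; // fraction between 0 and 1
  majorPromptChange: boolean; // caller decides what counts as "major"
  lastReview: string; // YYYY-MM-DD
  today: string; // YYYY-MM-DD
}

// Whole days between two YYYY-MM-DD dates, both interpreted as UTC midnight.
function daysBetween(from: string, to: string): number {
  const ms = Date.parse(to + "T00:00:00Z") - Date.parse(from + "T00:00:00Z");
  return Math.floor(ms / 86400000);
}

// Returns the reasons a recalibration is due; an empty array means none apply.
// Assumption: "quarterly" is approximated as 90 or more days since the last review.
function recalibrationReasons(input: RecalibrationInput): string[] {
  const reasons: string[] = [];
  if (input.agreementRate < 0.85) reasons.push("agreement rate below 85%");
  if (input.majorPromptChange) reasons.push("major prompt change");
  if (daysBetween(input.lastReview, input.today) >= 90) reasons.push("quarterly review due");
  return reasons;
}
```

Returning reasons rather than a boolean lets the report say why a review is being requested.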
- ---
-
- ## Implementation Priority
-
- ### Phase 1 - Foundation
-
- 1. Build corpus: clone Zod + Hono, clean .git dirs, commit
- 2. Add 5-10 articles/docs manually
- 3. Create `core.json` with 15-20 curated queries
- 4. Update script for committed corpus + named query sets
- 5. Add baseline comparison output
-
- ### Phase 2 - Tracking
-
- 1. Implement `baseline.json` and `history.json`
- 2. Add `--update-baseline` flag
- 3. Regression detection with threshold alerts
- 4. Before/after comparison in output
-
- ### Phase 3 - Calibration
-
- 1. Interactive `--review` command
- 2. `calibration.json` tracking
- 3. Agreement rate reporting
-
- ### Phase 4 - CI
-
- 1. GitHub Actions workflow
- 2. PR comments with score changes
- 3. Block merges on regression
-
- ### Out of Scope (YAGNI)
-
- - PDF support (add later if needed)
- - Visualization dashboards
- - Automated query promotion