@joycodetech/qmd-ja 2.5.3

## 🇯🇵 qmd-ja: Japanese-enhanced fork of qmd

> **qmd-ja** is a fork of [tobi/qmd](https://github.com/tobi/qmd) (MIT) with accurate Japanese morphological analysis for BM25 full-text search.

### What's different from upstream

| | tobi/qmd (upstream) | joycodetech/qmd-ja |
|---|---|---|
| CJK tokenizer | Unigram (character-level) | **Vaporetto WASM** (morphological) |
| `ナレッジベース` → FTS | `ナレッジベ ー ス` ❌ | `ナレッジベース` ✅ |
| `プロンプトコンパイラ` → FTS | `プ ロ ン プ ト...` (17 tokens) | `プロンプトコンパイラ` (1 token) ✅ |
| Init time | 0 ms | 16 ms (one-time) |
| Tokenize speed | 3 µs/query | 5 µs/query |
| `ー` in CJK_RUN_PATTERN | Missing (bug) | Fixed ✅ |
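
To see why a missing `ー` (U+30FC, the katakana prolonged sound mark) breaks words apart, compare two run-detection regexes. This is a minimal sketch under stated assumptions: `BUGGY_CJK_RUN` and `FIXED_CJK_RUN` are hypothetical stand-ins, not the actual `CJK_RUN_PATTERN` from either codebase, and the character ranges (hiragana, katakana letters ァ to ヺ, common CJK ideographs) are illustrative.

```typescript
// Hypothetical run patterns for illustration only; not qmd's real source.
// Katakana letters ァ..ヺ are U+30A1..U+30FA, so a range that stops there
// leaves out the prolonged sound mark ー (U+30FC).
const BUGGY_CJK_RUN = /[\u3041-\u3096\u30A1-\u30FA\u4E00-\u9FFF]+/g;
// Adding U+30FC keeps words such as ナレッジベース in one run.
const FIXED_CJK_RUN = /[\u3041-\u3096\u30A1-\u30FA\u30FC\u4E00-\u9FFF]+/g;

function cjkRuns(text: string, pattern: RegExp): string[] {
  // match() with a global regex returns every match, or null if none.
  return text.match(pattern) ?? [];
}

const word = "ナレッジベース";
console.log(cjkRuns(word, BUGGY_CJK_RUN)); // → ["ナレッジベ", "ス"]
console.log(cjkRuns(word, FIXED_CJK_RUN)); // → ["ナレッジベース"]
```

Run detection is only the first step: how each run is then split into tokens (character unigrams upstream, Vaporetto in qmd-ja, per the table) is outside this sketch.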

### Installation

```bash
npm install -g @joycodetech/qmd-ja
```

> All other features, commands, and MCP integration are identical to [tobi/qmd](https://github.com/tobi/qmd).
> See the original documentation below ↓

---

# QMD - Query Markup Documents

An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for your agentic flows.

QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking, all running locally via node-llama-cpp with GGUF models.

![QMD Architecture](assets/qmd-architecture.png)

You can read more about QMD's progress in the [CHANGELOG](CHANGELOG.md).

## Quick Start

```sh
# Install globally (Node or Bun)
npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd

# Or run directly
npx @tobilu/qmd ...
bunx @tobilu/qmd ...

# Create collections for your notes, docs, and meeting transcripts
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd collection add ~/work/docs --name docs

# Add context to help with search results. Each context is returned along with
# any matching document beneath its path, so contexts form a tree. This is the
# key feature of QMD: it lets LLMs make much better contextual choices when
# selecting documents. Don't sleep on it!
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://meetings "Meeting transcripts and notes"
qmd context add qmd://docs "Work documentation"

# Generate embeddings for semantic search
qmd embed

# Search across everything
qmd search "project timeline"           # Fast keyword search
qmd vsearch "how to deploy"             # Semantic search
qmd query "quarterly planning process"  # Hybrid + reranking (best quality)

# Get a specific document
qmd get "meetings/2024-01-15.md"

# Get a document by docid (shown in search results)
qmd get "#abc123"

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Search within a specific collection
qmd search "API" -c notes

# Export all matches for an agent
qmd search "API" --all --files --min-score 0.3
```

### Using with AI Agents

QMD's `--json` and `--files` output formats are designed for agentic workflows:

```sh
# Get structured results for an LLM
qmd search "authentication" --json -n 10

# List all relevant files above a threshold
qmd query "error handling" --all --files --min-score 0.4

# Retrieve full document content
qmd get "docs/api-reference.md" --full
```

### MCP Server

QMD works fine when you simply tell your agent to use it on the command line, but it also exposes an MCP (Model Context Protocol) server for tighter integration.

**Tools exposed:**
- `query`: Search with typed sub-queries (`lex`/`vec`/`hyde`), combined via RRF + reranking
- `get`: Retrieve a document by path or docid (with fuzzy matching suggestions)
- `multi_get`: Batch retrieve by glob pattern, comma-separated list, or docids
- `status`: Index health and collection info

**Claude Desktop configuration** (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```

**Claude Code**: installing the plugin is recommended:

```bash
claude plugin marketplace add tobi/qmd
claude plugin install qmd@qmd
```

Or configure MCP manually in `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```

#### HTTP Transport

By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:

```sh
# Foreground (Ctrl-C to stop)
qmd mcp --http              # localhost:8181
qmd mcp --http --port 8080  # custom port

# Background daemon
qmd mcp --http --daemon     # start, writes PID to ~/.cache/qmd/mcp.pid
qmd mcp stop                # stop via PID file
qmd status                  # shows "MCP: running (PID ...)" when active
```

The HTTP server exposes two endpoints:
- `POST /mcp`: MCP Streamable HTTP (JSON responses, stateless)
- `GET /health`: liveness check with uptime

LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 min idle and transparently recreated on the next request (~1s penalty, models remain loaded).

Point any MCP client at `http://localhost:8181/mcp` to connect.
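
The idle behaviour described above can be modelled as a lazily created resource with an idle timeout. The sketch below is hypothetical: `IdleResource`, its injectable clock, and the choice to check idleness on access instead of running a background timer are assumptions made to keep it small and testable, not QMD's implementation.

```typescript
// Hypothetical sketch of "dispose after idle, recreate on next request".
// Not QMD's implementation; the class and clock injection are illustrative.
class IdleResource<T> {
  private value: T | null = null;
  private lastUsed = 0;
  created = 0; // how many times the factory has run

  constructor(
    private readonly create: () => T,
    private readonly dispose: (value: T) => void,
    private readonly idleMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** Returns a live value, recreating it if it sat idle for idleMs or longer. */
  get(): T {
    const t = this.now();
    if (this.value !== null && t - this.lastUsed >= this.idleMs) {
      this.dispose(this.value); // idle window elapsed: release the old context
      this.value = null;
    }
    let value = this.value;
    if (value === null) {
      value = this.create();
      this.value = value;
      this.created++;
    }
    this.lastUsed = t;
    return value;
  }
}

// Fake clock so the 5-minute window can be exercised instantly.
let clock = 0;
const ctx = new IdleResource(() => ({ loadedAt: clock }), () => {}, 5 * 60_000, () => clock);

ctx.get();          // first request: context created
clock += 60_000;
ctx.get();          // 1 minute later: reused
clock += 5 * 60_000;
ctx.get();          // idle for 5 minutes: disposed and recreated
console.log(ctx.created); // → 2
```

Only the context is recycled here, matching the note that the models themselves stay loaded. A long-running server would more plausibly dispose on a timer so memory is freed even when no request arrives; checking on access just keeps the sketch deterministic.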

#### MCP Tool Parameters

| Tool | Parameter | Type | Notes |
|------|-----------|------|-------|
| `query` | `searches` | array | Typed sub-queries (`lex`/`vec`/`hyde`), 1–10. **Required.** The first gets 2x weight. |
| `query` | `collections` | string[] | Filter by collection names (OR). **Array only**: a singular `collection` is silently ignored. |
| `query` | `intent` | string | Disambiguation context (does not search on its own) |
| `query` | `limit` | number | Max results (default 10) |
| `query` | `minScore` | number | Minimum relevance 0–1 (default 0) |
| `query` | `candidateLimit` | number | Max candidates to rerank (default 40) |
| `query` | `rerank` | boolean | Run LLM reranking (default **true**); set false for RRF-only |
| `get` | `file` | string | Path, docid (`#abc123`), or `path:from:count` (e.g. `#abc123:120:40`) |
| `get` | `fromLine` | number | Start line (1-indexed); overrides the `:from` suffix |
| `get` | `maxLines` | number | Limit returned lines |
| `get` | `lineNumbers` | boolean | Prefix lines with numbers (default **true**) |
| `multi_get` | `pattern` | string | Glob pattern or comma-separated list |
| `multi_get` | `maxBytes` | number | Skip files larger than N bytes (default 10240) |
| `multi_get` | `maxLines` | number | Limit lines per file |
| `multi_get` | `lineNumbers` | boolean | Prefix lines with numbers (default **true**) |

Unknown parameters are silently ignored (not rejected), so double-check names if
results seem unscoped. The HTTP `/query` and `/search` endpoints return
`qmd://collection/path` URIs in the `file` field, matching the CLI and MCP output.
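
Because unknown parameters are dropped silently, a client-side check can catch typos such as `collection` before a request goes out. The sketch assumes the standard MCP JSON-RPC `tools/call` envelope (`method`, `params.name`, `params.arguments`); `buildQueryCall`, `KNOWN_QUERY_PARAMS`, and the `{ type, query }` shape of each `searches` entry (borrowed from the SDK's `ExpandedQuery` type) are assumptions, not part of QMD.

```typescript
// Hypothetical client-side guard; QMD itself silently ignores unknown keys.
type SubQuery = { type: "lex" | "vec" | "hyde"; query: string };

// Parameter names from the `query` rows of the table above.
const KNOWN_QUERY_PARAMS = new Set([
  "searches", "collections", "intent", "limit", "minScore", "candidateLimit", "rerank",
]);
const SUB_QUERY_TYPES = new Set(["lex", "vec", "hyde"]);

function buildQueryCall(id: number, args: Record<string, unknown>) {
  const unknownKeys = Object.keys(args).filter((k) => !KNOWN_QUERY_PARAMS.has(k));
  if (unknownKeys.length > 0) {
    // Fail loudly instead of getting silently unscoped results.
    throw new Error(`Unknown query parameter(s): ${unknownKeys.join(", ")}`);
  }
  const searches = args.searches as SubQuery[] | undefined;
  if (!Array.isArray(searches) || searches.length < 1 || searches.length > 10) {
    throw new Error("`searches` must hold 1 to 10 sub-queries");
  }
  if (!searches.every((s) => SUB_QUERY_TYPES.has(s.type))) {
    throw new Error("each sub-query type must be lex, vec, or hyde");
  }
  // JSON-RPC 2.0 request in the shape MCP uses for tool invocation.
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "query", arguments: args },
  };
}

const call = buildQueryCall(1, {
  searches: [
    { type: "lex", query: "rate limit" },
    { type: "vec", query: "how is API throttling configured" },
  ],
  collections: ["docs"], // plural array, per the table
  limit: 5,
});
console.log(JSON.stringify(call, null, 2));
```

In practice an MCP client library normally handles the `initialize` handshake, transport headers, and the envelope itself; the sketch only shows where the parameter names from the table end up.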

### SDK / Library Usage

Use QMD as a library in your own Node.js or Bun applications.

#### Installation

```sh
npm install @tobilu/qmd
```

#### Quick Start

```typescript
import { createStore } from '@tobilu/qmd'

const store = await createStore({
  dbPath: './my-index.sqlite',
  config: {
    collections: {
      docs: { path: '/path/to/docs', pattern: '**/*.md' },
    },
  },
})

const results = await store.search({ query: "authentication flow" })
console.log(results.map(r => `${r.title} (${Math.round(r.score * 100)}%)`))

await store.close()
```

#### Store Creation

`createStore()` accepts three modes:

```typescript
import { createStore } from '@tobilu/qmd'

// 1. Inline config: no files needed besides the DB
const store = await createStore({
  dbPath: './index.sqlite',
  config: {
    collections: {
      docs: { path: '/path/to/docs', pattern: '**/*.md' },
      notes: { path: '/path/to/notes' },
    },
  },
})

// 2. YAML config file: collections defined in a file
const store2 = await createStore({
  dbPath: './index.sqlite',
  configPath: './qmd.yml',
})

// 3. DB-only: reopen a previously configured store
const store3 = await createStore({ dbPath: './index.sqlite' })
```

#### Search

The unified `search()` method handles both simple queries and pre-expanded structured queries:

```typescript
// Simple query: auto-expanded via LLM, then BM25 + vector + reranking
const results = await store.search({ query: "authentication flow" })

// With options
const results2 = await store.search({
  query: "rate limiting",
  intent: "API throttling and abuse prevention",
  collection: "docs",
  limit: 5,
  minScore: 0.3,
  explain: true,
})

// Pre-expanded queries: skip auto-expansion, control each sub-query
const results3 = await store.search({
  queries: [
    { type: 'lex', query: '"connection pool" timeout -redis' },
    { type: 'vec', query: 'why do database connections time out under load' },
  ],
  collections: ["docs", "notes"],
})

// Skip reranking for faster results
const fast = await store.search({ query: "auth", rerank: false })
```

For direct backend access:

```typescript
// BM25 keyword search (fast, no LLM)
const lexResults = await store.searchLex("auth middleware", { limit: 10 })

// Vector similarity search (embedding model, no reranking)
const vecResults = await store.searchVector("how users log in", { limit: 10 })

// Manual query expansion for full control
const expanded = await store.expandQuery("auth flow", { intent: "user login" })
const results4 = await store.search({ queries: expanded })
```

#### Retrieval

```typescript
// Get a document by path or docid
const doc = await store.get("docs/readme.md")
const byId = await store.get("#abc123")

if (!("error" in doc)) {
  console.log(doc.title, doc.displayPath, doc.context)
}

// Get document body with line range
const body = await store.getDocumentBody("docs/readme.md", {
  fromLine: 50,
  maxLines: 100,
})

// Batch retrieve by glob or comma-separated list
const { docs, errors } = await store.multiGet("docs/**/*.md", {
  maxBytes: 20480,
})
```

#### Collections

```typescript
// Add a collection
await store.addCollection("myapp", {
  path: "/src/myapp",
  pattern: "**/*.ts",
  ignore: ["node_modules/**", "*.test.ts"],
})

// List collections with document stats
const collections = await store.listCollections()
// => [{ name, pwd, glob_pattern, doc_count, active_count, last_modified, includeByDefault }]

// Get names of collections included in queries by default
const defaults = await store.getDefaultCollectionNames()

// Remove / rename
await store.removeCollection("myapp")
await store.renameCollection("old-name", "new-name")
```

#### Context

Context adds descriptive metadata that improves search relevance and is returned alongside results:

```typescript
// Add context for a path within a collection
await store.addContext("docs", "/api", "REST API reference documentation")

// Set global context (applies to all collections)
await store.setGlobalContext("Internal engineering documentation")

// List all contexts
const contexts = await store.listContexts()
// => [{ collection, path, context }]

// Remove context
await store.removeContext("docs", "/api")
await store.setGlobalContext(undefined) // clear global
```

#### Indexing

```typescript
// Re-index collections by scanning the filesystem
const result = await store.update({
  collections: ["docs"], // optional; defaults to all
  onProgress: ({ collection, file, current, total }) => {
    console.log(`[${collection}] ${current}/${total} ${file}`)
  },
})
// => { collections, indexed, updated, unchanged, removed, needsEmbedding }

// Generate vector embeddings
const embedResult = await store.embed({
  force: false,          // true to re-embed everything
  chunkStrategy: "auto", // "regex" (default) or "auto" (AST for code files)
  onProgress: ({ current, total, collection }) => {
    console.log(`Embedding ${current}/${total}`)
  },
})
```

#### Types

Key types exported for SDK consumers:

```typescript
import type {
  QMDStore,            // The store interface
  SearchOptions,       // Options for search()
  LexSearchOptions,    // Options for searchLex()
  VectorSearchOptions, // Options for searchVector()
  HybridQueryResult,   // Search result with score, snippet, context
  SearchResult,        // Result from searchLex/searchVector
  ExpandedQuery,       // Typed sub-query { type: 'lex'|'vec'|'hyde', query }
  DocumentResult,      // Document metadata + body
  DocumentNotFound,    // Error with similarFiles suggestions
  MultiGetResult,      // Batch retrieval result
  UpdateProgress,      // Progress callback info for update()
  UpdateResult,        // Aggregated update result
  EmbedProgress,       // Progress callback info for embed()
  EmbedResult,         // Embedding result
  StoreOptions,        // createStore() options
  CollectionConfig,    // Inline config shape
  IndexStatus,         // From getStatus()
  IndexHealthInfo,     // From getIndexHealth()
} from '@tobilu/qmd'
```

Utility exports:

```typescript
import {
  extractSnippet,              // Extract a relevant snippet from text
  addLineNumbers,              // Add line numbers to text
  DEFAULT_MULTI_GET_MAX_BYTES, // Default max file size for multiGet (10KB)
  Maintenance,                 // Database maintenance operations
} from '@tobilu/qmd'
```

#### Lifecycle

```typescript
// Close the store: disposes LLM models and DB connection
await store.close()
```

The SDK requires an explicit `dbPath`; no defaults are assumed. This makes it safe to embed in any application without side effects.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         QMD Hybrid Search Pipeline                          │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘
                                       │
                        ┌──────────────┴──────────────┐
                        ▼                             ▼
               ┌─────────────────┐           ┌─────────────────┐
               │ Query Expansion │           │ Original Query  │
               │  (fine-tuned)   │           │   (×2 weight)   │
               └────────┬────────┘           └────────┬────────┘
                        │                             │
                        │ 2 alternative queries       │
                        └──────────────┬──────────────┘
                                       │
               ┌───────────────────────┼───────────────────────┐
               ▼                       ▼                       ▼
      ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
      │ Original Query  │     │ Expanded Query 1│     │ Expanded Query 2│
      └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
               │                       │                       │
         ┌─────┴─────┐           ┌─────┴─────┐           ┌─────┴─────┐
         ▼           ▼           ▼           ▼           ▼           ▼
     ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐
     │ BM25  │   │Vector │   │ BM25  │   │Vector │   │ BM25  │   │Vector │
     │(FTS5) │   │Search │   │(FTS5) │   │Search │   │(FTS5) │   │Search │
     └───┬───┘   └───┬───┘   └───┬───┘   └───┬───┘   └───┬───┘   └───┬───┘
         │           │           │           │           │           │
         └─────┬─────┘           └─────┬─────┘           └─────┬─────┘
               │                       │                       │
               └───────────────────────┼───────────────────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │  RRF Fusion + Bonus   │
                           │  Original query: ×2   │
                           │ Top-rank bonus: +0.05 │
                           │      Top 30 Kept      │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │    LLM Re-ranking     │
                           │   (qwen3-reranker)    │
                           │   Yes/No + logprobs   │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │ Position-Aware Blend  │
                           │   Top 1-3:  75% RRF   │
                           │   Top 4-10: 60% RRF   │
                           │   Top 11+:  40% RRF   │
                           └───────────────────────┘
```

## Score Normalization & Fusion

### Search Backends

| Backend | Raw Score | Conversion | Range |
|---------|-----------|------------|-------|
| **FTS (BM25)** | SQLite FTS5 BM25 | `Math.abs(score)` | 0 to ~25+ |
| **Vector** | Cosine distance | `1 / (1 + distance)` | 1/3 to 1.0 (cosine distance spans 0 to 2) |
| **Reranker** | LLM 0-10 rating | `score / 10` | 0.0 to 1.0 |
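
The conversions in the table are simple enough to write out directly. This sketch mirrors the table's formulas; the function names are hypothetical, and it assumes, as the table states, that the vector backend reports cosine distance and the reranker a 0 to 10 rating.

```typescript
// Hypothetical helpers mirroring the Conversion column above.

/** SQLite FTS5's bm25() is negative (more negative = better); flip the sign. */
function bm25Score(raw: number): number {
  return Math.abs(raw);
}

/** Cosine distance d in [0, 2] maps to 1 / (1 + d), i.e. [1/3, 1]. */
function vectorScore(distance: number): number {
  return 1 / (1 + distance);
}

/** A 0-10 reranker rating scaled to [0, 1]. */
function rerankScore(rating: number): number {
  return rating / 10;
}

console.log(bm25Score(-7.5)); // → 7.5
console.log(vectorScore(0));  // → 1 (same direction)
console.log(vectorScore(1));  // → 0.5 (orthogonal)
console.log(rerankScore(8));  // → 0.8
```

Because BM25 values are unbounded while the other two fall in [0, 1], raw scores from different backends are not directly comparable; rank-based fusion such as RRF sidesteps this by combining positions instead of raw values.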

### Fusion Strategy

The `query` command uses **Reciprocal Rank Fusion (RRF)** with position-aware blending:

1. **Query Expansion**: Original query (×2 for weighting) + 1 LLM variation
2. **Parallel Retrieval**: Each query searches both FTS and vector indexes
3. **RRF Fusion**: Combine all result lists using `score = Σ(1/(k+rank+1))` where k=60
4. **Top-Rank Bonus**: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
5. **Top-K Selection**: Take top 30 candidates for reranking
6. **Re-ranking**: LLM scores each document (yes/no with logprobs confidence)
7. **Position-Aware Blending**:
   - RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
   - RRF rank 4-10: 60% retrieval, 40% reranker
   - RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
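
Steps 3 to 5 can be sketched as follows. This is an illustrative reading of the list, not QMD's implementation: it assumes 0-based ranks (which `k + rank + 1` suggests), applies the ×2 weight as a multiplier on the original query's lists, and gives each document only its single best bonus. `fuse` and `RankedList` are hypothetical names.

```typescript
// Hypothetical sketch of steps 3-5: weighted RRF plus a top-rank bonus.

interface RankedList {
  weight: number;   // e.g. 2 for the original query's lists, 1 otherwise
  docIds: string[]; // best match first
}

function fuse(lists: RankedList[], k = 60, topK = 30): Array<[string, number]> {
  const scores = new Map<string, number>();
  const bestRank = new Map<string, number>();

  for (const { weight, docIds } of lists) {
    docIds.forEach((id, rank) => {
      // RRF term for a 0-based rank, scaled by the list weight.
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
      bestRank.set(id, Math.min(bestRank.get(id) ?? Infinity, rank));
    });
  }

  for (const [id, rank] of bestRank) {
    // #1 in any list: +0.05; #2 or #3 (0-based 1 or 2): +0.02.
    const bonus = rank === 0 ? 0.05 : rank <= 2 ? 0.02 : 0;
    scores.set(id, scores.get(id)! + bonus);
  }

  // Keep only the best topK candidates for reranking.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).slice(0, topK);
}

const fused = fuse([
  { weight: 2, docIds: ["a", "b", "c"] }, // original query, BM25
  { weight: 2, docIds: ["b", "d"] },      // original query, vector
  { weight: 1, docIds: ["c", "d"] },      // expanded query, BM25
]);
console.log(fused.map(([id]) => id)); // → ["b", "c", "a", "d"]
```

In the sample, `c` overtakes `a` because it is #1 in the expanded-query list and also appears in an original-query list, even though `a` tops the first list.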

**Why this approach**: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
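
The blending rule in step 7 can be sketched as a weighted average whose retrieval weight depends on the candidate's RRF position. The sketch assumes the retrieval score has already been normalized to [0, 1], which the text does not specify; `blend`, `retrievalWeight`, and `Candidate` are hypothetical.

```typescript
// Hypothetical sketch of position-aware blending (step 7).
// Assumes `retrieval` is already normalized to [0, 1].

interface Candidate {
  id: string;
  retrieval: number; // normalized retrieval score
  rerank: number;    // reranker relevance in [0, 1]
}

/** Share of the final score taken from retrieval, by 1-based RRF position. */
function retrievalWeight(position: number): number {
  if (position <= 3) return 0.75;
  if (position <= 10) return 0.6;
  return 0.4;
}

/** `rrfOrdered` must be sorted by fused RRF score, best first. */
function blend(rrfOrdered: Candidate[]): Array<{ id: string; score: number }> {
  return rrfOrdered
    .map((c, i) => {
      const w = retrievalWeight(i + 1);
      return { id: c.id, score: w * c.retrieval + (1 - w) * c.rerank };
    })
    .sort((a, b) => b.score - a.score);
}

// An exact match at RRF #1 keeps its lead despite a lukewarm reranker score:
// 0.75 * 1.0 + 0.25 * 0.4 = 0.85 versus 0.75 * 0.7 + 0.25 * 0.9 = 0.75.
const blended = blend([
  { id: "exact-match", retrieval: 1.0, rerank: 0.4 },
  { id: "paraphrase", retrieval: 0.7, rerank: 0.9 },
]);
console.log(blended.map((b) => b.id)); // → ["exact-match", "paraphrase"]
```

Past position 10 the same pair would be weighted 40/60 toward the reranker, so there a strong reranker score can reverse the retrieval order.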

### Score Interpretation

| Score | Meaning |
|-------|---------|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |
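
When post-processing JSON output, the table can be applied as a simple bucketing function. This is a hypothetical helper: `relevance` and the sample hits are made up, and because the table's ranges share endpoints, the sketch assumes each lower bound is inclusive.

```typescript
// Hypothetical helper that buckets scores using the table above.
// Assumption: each range's lower bound is inclusive.
type Relevance = "high" | "moderate" | "some" | "low";

function relevance(score: number): Relevance {
  if (score >= 0.8) return "high";
  if (score >= 0.5) return "moderate";
  if (score >= 0.2) return "some";
  return "low";
}

// Drop weak hits first (a 0.3 cut-off, in the spirit of `--min-score 0.3`), then label.
const hits = [
  { file: "docs/guide.md", score: 0.93 },
  { file: "notes/meeting.md", score: 0.67 },
  { file: "notes/old.md", score: 0.12 },
];
const labeled = hits
  .filter((h) => h.score >= 0.3)
  .map((h) => `${h.file}: ${relevance(h.score)}`);
console.log(labeled); // → ["docs/guide.md: high", "notes/meeting.md: moderate"]
```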

## Requirements

### System Requirements

- **Node.js** >= 22 or **Bun** >= 1.0.0
- **macOS**: Homebrew SQLite (for extension support)
  ```sh
  brew install sqlite
  ```

### GGUF Models (via node-llama-cpp)

QMD uses three local GGUF models (auto-downloaded on first use):

| Model | Purpose | Size |
|-------|---------|------|
| `embeddinggemma-300M-Q8_0` | Vector embeddings (default) | ~300MB |
| `qwen3-reranker-0.6b-q8_0` | Re-ranking | ~640MB |
| `qmd-query-expansion-1.7B-q4_k_m` | Query expansion (fine-tuned) | ~1.1GB |

Models are downloaded from HuggingFace and cached in `~/.cache/qmd/models/`.

### Custom Embedding Model

Override the default embedding model via the `QMD_EMBED_MODEL` environment variable.
This is useful for multilingual corpora (e.g. Chinese, Japanese, Korean) where
`embeddinggemma-300M` has limited coverage.

```sh
# Use Qwen3-Embedding-0.6B for better multilingual (CJK) support
export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"

# After changing the model, re-embed all collections:
qmd embed -f
```

Supported model families:
- **embeddinggemma** (default): English-optimized, small footprint
- **Qwen3-Embedding**: Multilingual (119 languages including CJK), MTEB top-ranked

> **Note:** When switching embedding models, you must re-index with `qmd embed -f`
> since vectors are not cross-compatible between models. The prompt format is
> automatically adjusted for each model family.

## Installation

```sh
npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd
```

### Development

```sh
git clone https://github.com/tobi/qmd
cd qmd
npm install
npm link
```

## Usage

### Collection Management

```sh
# Create a collection from current directory
qmd collection add . --name myproject

# Create a collection with explicit path and custom glob mask
qmd collection add ~/Documents/notes --name notes --mask "**/*.md"

# List all collections
qmd collection list

# Remove a collection
qmd collection remove myproject

# Rename a collection
qmd collection rename myproject my-project

# List files in a collection
qmd ls notes
qmd ls notes/subfolder

# Show collection details (path, glob mask, include status, context count)
qmd collection show notes

# Include or exclude a collection from default (unscoped) queries
qmd collection include notes
qmd collection exclude notes

# Run a command before every `qmd update` (e.g. git pull); empty arg clears it
qmd collection update-cmd notes 'git pull --rebase'
qmd collection update-cmd notes
```

### Generate Vector Embeddings

```sh
# Embed all indexed documents (900 tokens/chunk, 15% overlap)
qmd embed

# Force re-embed everything
qmd embed -f

# Enable AST-aware chunking for code files (TS, JS, Python, Go, Rust)
qmd embed --chunk-strategy auto

# Also works with query for consistent chunk selection
qmd query "auth flow" --chunk-strategy auto

# Memory control for large corpora / constrained systems
qmd embed --max-docs-per-batch 50   # cap docs per embedding batch
qmd embed --max-batch-mb 64         # cap batch size in MB
```
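
Read as sliding windows, the `900 tokens/chunk, 15% overlap` figures mean each chunk starts 765 tokens after the previous one, so consecutive chunks share 135 tokens. A minimal sketch of that arithmetic, assuming token positions are already known; `chunkRanges` is hypothetical, is not QMD's chunker, and makes no attempt to pick natural break points.

```typescript
// Hypothetical sketch of fixed-size token windows with overlap.

interface Chunk {
  start: number; // first token index (inclusive)
  end: number;   // last token index (exclusive)
}

function chunkRanges(tokenCount: number, size = 900, overlap = 0.15): Chunk[] {
  const step = size - Math.round(size * overlap); // 900 - 135 = 765
  if (step <= 0) throw new Error("overlap must be below 100%");
  const chunks: Chunk[] = [];
  for (let start = 0; start < tokenCount; start += step) {
    const end = Math.min(start + size, tokenCount);
    chunks.push({ start, end });
    if (end === tokenCount) break; // this window already reaches the end
  }
  return chunks;
}

console.log(chunkRanges(2000));
// → [{ start: 0, end: 900 }, { start: 765, end: 1665 }, { start: 1530, end: 2000 }]
```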

**AST-aware chunking** (`--chunk-strategy auto`) uses tree-sitter to chunk code
files at function, class, and import boundaries instead of arbitrary text
positions. This produces higher-quality chunks and better search results for
codebases. Markdown and other file types always use regex-based chunking
regardless of strategy.

The default is `regex` (existing behavior). Use `--chunk-strategy auto` to
opt in. Run `qmd status` to verify which grammars are available.

> **Note:** Tree-sitter grammars are optional dependencies. If they are not
> installed, `--chunk-strategy auto` falls back to regex-only chunking
> automatically. Tested on both Node.js and Bun.

### Context Management

Context adds descriptive metadata to collections and paths, helping search understand your content.

```sh
# Add context to a collection (using qmd:// virtual paths)
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"

# Add context from within a collection directory
cd ~/notes && qmd context add "Personal notes and ideas"
cd ~/notes/work && qmd context add "Work-related notes"

# Add global context (applies to all collections)
qmd context add / "Knowledge base for my projects"

# List all contexts
qmd context list

# Remove context
qmd context rm qmd://notes/old
```
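
The tree behaviour of contexts can be pictured as ancestor matching on `qmd://` paths. This is an assumption-laden illustration: `contextsFor`, the `Map` keyed by virtual path, treating `/` as global, and the general-to-specific ordering are all hypothetical, not QMD's actual resolution logic.

```typescript
// Hypothetical sketch of tree-shaped context lookup; not QMD's resolver.
// Every context whose path is "/" (global) or an ancestor of the document
// applies, ordered from most general to most specific.

function contextsFor(docUri: string, contexts: Map<string, string>): string[] {
  const matches: Array<[string, string]> = [];
  for (const [path, text] of contexts) {
    const prefix = path.endsWith("/") ? path : path + "/";
    // Compare whole path segments so qmd://notes does not match qmd://notes2.
    if (path === "/" || docUri === path || docUri.startsWith(prefix)) {
      matches.push([path, text]);
    }
  }
  // Shorter path = more general; "/" is shortest, so global context comes first.
  return matches.sort((a, b) => a[0].length - b[0].length).map(([, text]) => text);
}

const contexts = new Map([
  ["/", "Knowledge base for my projects"],
  ["qmd://notes", "Personal notes and ideas"],
  ["qmd://notes/work", "Work-related notes"],
  ["qmd://docs/api", "API documentation"],
]);

console.log(contextsFor("qmd://notes/work/standup.md", contexts));
// → ["Knowledge base for my projects", "Personal notes and ideas", "Work-related notes"]
```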

### Search Commands

```
┌──────────────────────────────────────────────────────────────────┐
│                           Search Modes                           │
├──────────┬───────────────────────────────────────────────────────┤
│ search   │ BM25 full-text search only                            │
│ vsearch  │ Vector semantic search only                           │
│ query    │ Hybrid: FTS + Vector + Query Expansion + Re-ranking   │
└──────────┴───────────────────────────────────────────────────────┘
```

```sh
# Full-text search (fast, keyword-based)
qmd search "authentication flow"

# Vector search (semantic similarity)
qmd vsearch "how to login"

# Hybrid search with re-ranking (best quality)
qmd query "user authentication"
```

Two aliases exist for the semantic/hybrid modes: `vector-search` (→ `vsearch`)
and `deep-search` (→ `query`).

### Options

```sh
# Search options
-n <num>                       # Number of results (default: 5, or 20 for --files/--json)
-c, --collection               # Restrict search to a specific collection
--all                          # Return all matches (use with --min-score to filter)
--min-score <num>              # Minimum score threshold (default: 0)
--full                         # Show full document content
--line-numbers                 # Add line numbers to output
--explain                      # Include retrieval score traces (query, JSON/CLI output)
--index <name>                 # Use named index
--intent "<text>"              # Disambiguation context (e.g. "web page load times")
--no-rerank                    # Skip LLM reranking (RRF scores only; faster on CPU)
-C, --candidate-limit <n>      # Max candidates to rerank (default: 40)
--full-path                    # Emit on-disk filesystem paths instead of qmd:// URIs

# Output formats (for search and multi-get)
--format <kind>                # cli (default) | json | csv | md | xml | files
                               # (--json, --csv, --md, --xml, --files are legacy aliases)

# Get options
qmd get <file>[:from[:count]]  # Get document; optional start line and count
-l <num>                       # Maximum lines to return
--from <num>                   # Start line (overrides the :from suffix)
--no-line-numbers              # Disable line numbering (on by default)

# Multi-get options
-l <num>                       # Maximum lines per file
--max-bytes <num>              # Skip files larger than N bytes (default: 10KB)
```
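
The `<file>[:from[:count]]` form packs a start line and a line count into one argument. A minimal parser sketch follows; `parseFileSpec` is hypothetical, and treating only trailing all-digit segments as line numbers (so any other colon stays in the path) is an assumption about how ambiguous input is handled.

```typescript
// Hypothetical parser for `<file>[:from[:count]]` (e.g. "#abc123:120:40").

interface FileSpec {
  target: string;    // path or #docid
  fromLine?: number; // 1-indexed start line
  count?: number;    // maximum number of lines
}

function parseFileSpec(spec: string): FileSpec {
  // Lazy target, then up to two optional ":<digits>" groups anchored at the end.
  const m = /^([\s\S]*?)(?::(\d+))?(?::(\d+))?$/.exec(spec)!;
  const result: FileSpec = { target: m[1] };
  if (m[2] !== undefined) result.fromLine = Number(m[2]);
  if (m[3] !== undefined) result.count = Number(m[3]);
  return result;
}

console.log(parseFileSpec("#abc123:120:40"));    // target "#abc123", fromLine 120, count 40
console.log(parseFileSpec("docs/readme.md:50")); // target "docs/readme.md", fromLine 50
console.log(parseFileSpec("docs/readme.md"));    // target only
```

A `--from` flag, when present, would then simply take precedence over the parsed `fromLine`, as the options list describes.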

### Collection Filtering

The `-c`/`--collection` flag filters results by collection **name** (as shown by
`qmd collection list`). Collections are a global registry, so you can search any
collection from any directory:

```sh
qmd search "auth" -c notes            # single collection
qmd search "auth" -c notes -c docs    # multiple collections (OR)
```

With no `-c` flag, all default-included collections are searched. Collections
marked excluded (`qmd collection exclude <name>`) are skipped unless named
explicitly with `-c`.

> **Note:** With multiple `-c` flags, results come from a global top-K pool and are
> then filtered. If one collection dominates the rankings, matches from smaller
> collections may not appear at the default limit; raise `-n` or use `--all`.
753
+
754
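The effect described in the note can be reproduced with a toy TypeScript model of "global top-K, then filter by collection". The `Hit` type and `topKThenFilter` function are hypothetical and only mirror the documented behavior; they are not QMD's implementation.

```typescript
// Hypothetical model of "rank globally, keep top-K, then filter by collection".
interface Hit {
  collection: string;
  path: string;
  score: number;
}

function topKThenFilter(hits: Hit[], k: number, collections: string[]): Hit[] {
  const wanted = new Set(collections);
  return [...hits]
    .sort((a, b) => b.score - a.score) // global ranking across all collections
    .slice(0, k)                       // keep the global top-K pool
    .filter((h) => wanted.has(h.collection)); // only then apply -c filters
}

const hits: Hit[] = [
  { collection: "docs", path: "a.md", score: 0.9 },
  { collection: "docs", path: "b.md", score: 0.8 },
  { collection: "docs", path: "c.md", score: 0.7 },
  { collection: "notes", path: "meeting.md", score: 0.6 },
];

// With k = 3 the "notes" match is cut before filtering; a larger k keeps it.
const small = topKThenFilter(hits, 3, ["docs", "notes"]);
const large = topKThenFilter(hits, 10, ["docs", "notes"]);
console.log(small.map((h) => h.path), large.map((h) => h.path));
```

Raising the limit with `-n`, or using `--all`, corresponds to a larger `k` in this model.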
### Output Format

Default output is colorized CLI format (respects the `NO_COLOR` environment variable).

When stdout is a TTY, result paths are emitted as clickable terminal hyperlinks (OSC 8). Clicking a path opens the file in your editor using an editor URI template.

When stdout is not a TTY (for example, when piped to another command or redirected to a file), QMD emits plain-text paths with no escape sequences.

TTY example:

```
docs/guide.md:42 #a1b2c3
Title: Software Craftsmanship
Context: Work documentation
Score: 93%

This section covers the **craftsmanship** of building
quality software with attention to detail.
See also: engineering principles


notes/meeting.md:15 #d4e5f6
Title: Q4 Planning
Context: Personal notes and ideas
Score: 67%

Discussion about code quality and craftsmanship
in the development process.
```

Each result contains:

- **Path**: Collection-relative path (e.g., `docs/guide.md`)
- **Docid**: Short hash identifier (e.g., `#a1b2c3`); use with `qmd get "#a1b2c3"` (quoted, so the shell does not treat `#` as the start of a comment)
- **Title**: Extracted from the document (first heading or filename)
- **Context**: Path context, if configured via `qmd context add`
- **Score**: Color-coded (green >70%, yellow >40%, dim otherwise)
- **Snippet**: Context around the match, with query terms highlighted

Configure the editor link target with `QMD_EDITOR_URI` (or `editor_uri` in config):

```sh
# VS Code (default)
export QMD_EDITOR_URI="vscode://file/{path}:{line}:{col}"

# Cursor
export QMD_EDITOR_URI="cursor://file/{path}:{line}:{col}"

# Zed
export QMD_EDITOR_URI="zed://file/{path}:{line}:{col}"

# Sublime Text
export QMD_EDITOR_URI="subl://open?url=file://{path}&line={line}"
```

Template placeholders:

- `{path}`: absolute filesystem path (URI-encoded)
- `{line}`: 1-based line number
- `{col}` or `{column}`: 1-based column number

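A minimal sketch of how an editor URI template and an OSC 8 hyperlink could be produced. The helper names are hypothetical, and two details are assumptions rather than documented QMD behavior: the path is encoded segment by segment with `encodeURIComponent` (keeping `/` separators), and POSIX-style absolute paths are expected. The escape sequence itself (`ESC ] 8 ; ; URI ST label ESC ] 8 ; ; ST`) is the standard OSC 8 terminal hyperlink format.

```typescript
// Hypothetical helpers; not QMD exports. Assumes POSIX absolute paths.

// Substitute {path}, {line} and {col}/{column} into an editor URI template.
function expandEditorUri(template: string, absPath: string, line: number, col = 1): string {
  // Encode each path segment but keep the "/" separators.
  const encodedPath = absPath.split("/").map(encodeURIComponent).join("/");
  return template
    .replace(/\{path\}/g, () => encodedPath)
    .replace(/\{line\}/g, () => String(line))
    .replace(/\{(col|column)\}/g, () => String(col));
}

// Wrap a label in an OSC 8 hyperlink: ESC ] 8 ; ; URI ST label ESC ] 8 ; ; ST
function osc8(uri: string, label: string): string {
  return `\x1b]8;;${uri}\x1b\\${label}\x1b]8;;\x1b\\`;
}

// Emit a hyperlink only on a TTY; piped or redirected output stays plain text.
function renderResultPath(label: string, uri: string, isTTY: boolean): string {
  return isTTY ? osc8(uri, label) : label;
}

const template = "subl://open?url=file://{path}&line={line}";
const uri = expandEditorUri(template, "/home/me/my notes/meeting.md", 15);
console.log(renderResultPath("notes/meeting.md:15", uri, false));
```

Passing `isTTY = false` mirrors the piped/redirected case described above, where only the plain label is printed.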
### Examples

```sh
# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"

# Output as markdown for LLM context
qmd search --md --full "error handling"

# JSON output for scripting
qmd query --json "quarterly reports"

# Inspect how each result was scored (RRF + rerank blend)
qmd query --json --explain "quarterly reports"

# Use a separate index for a different knowledge base
qmd --index work search "quarterly reports"
```

The `--explain` flag attaches a score breakdown to each result: the FTS/vector
backend scores plus the RRF fusion math (rank, weight, top-rank bonus) and every
sub-query's contribution. Abbreviated:

```json
{
  "docid": "#6c90f0",
  "score": 0.89,
  "file": "qmd://qmd/README.md",
  "explain": {
    "ftsScores": [0.892, 0.907],
    "vectorScores": [0.540, 0.484],
    "rrf": {
      "rank": 1,
      "weight": 0.75,
      "baseScore": 0.123,
      "topRankBonus": 0.05,
      "totalScore": 0.173,
      "contributions": [
        { "source": "fts", "queryType": "original", "query": "reranking",
          "rank": 1, "weight": 2, "backendScore": 0.892, "rrfContribution": 0.0328 }
      ]
    }
  }
}
```

### Index Maintenance

```sh
# Show index status and collections with contexts
qmd status

# Re-index all collections. If a collection has a configured update command
# (e.g. `git pull`), it runs first; set one with `qmd collection update-cmd`.
qmd update

# Diagnose the install (runtime, sqlite-vec, embedding fingerprints, GPU probe)
qmd doctor

# Initialize a project-local index in the current directory
qmd init

# Get document by filepath (with fuzzy matching suggestions)
qmd get notes/meeting.md

# Get document by docid (from search results)
qmd get "#abc123"

# Get document starting at line 50, max 100 lines
qmd get notes/meeting.md:50 -l 100

# Read 40 lines starting at line 120 via the :from:count suffix (works with docids)
qmd get notes/meeting.md:120:40
qmd get "#abc123:120:40"

# get / multi-get are line-numbered by default; disable with --no-line-numbers
qmd get notes/meeting.md --no-line-numbers

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Get multiple documents by comma-separated list (supports docids)
qmd multi-get "doc1.md, doc2.md, #abc123"

# Limit multi-get to files under 20KB
qmd multi-get "docs/*.md" --max-bytes 20480

# Output multi-get as JSON for agent processing
qmd multi-get "docs/*.md" --json

# Clean up cache and orphaned data
qmd cleanup
```

### Benchmarking

Measure search quality across four search backends with `qmd bench` and a fixture file
of queries with known-relevant documents.

**From a git checkout**, an example fixture and its test corpus ship in the repo:

```sh
# One-time setup (indexes the repo's test corpus into its own collection)
qmd collection add test/eval-docs --name eval-docs
qmd embed -c eval-docs

# Run the benchmark (table output)
qmd bench src/bench/fixtures/example.json

# JSON output for programmatic analysis
qmd bench src/bench/fixtures/example.json --json
```

> The example fixture (`src/bench/fixtures/example.json`) and its test corpus
> (`test/eval-docs/`) exist only in a git checkout; they are **not** part of the
> published npm package. If you installed via `npm`/`npx`, write your own fixture
> (see below) against a collection you have already indexed:
>
> ```sh
> qmd bench my-fixture.json -c my-collection
> ```

Each query runs against four backends, reporting precision@k, recall, MRR, and F1:

| Backend | What it tests | LLM required |
|---------|---------------|--------------|
| `bm25` | Keyword search only (FTS5) | No |
| `vector` | Semantic similarity only | Embedding model |
| `hybrid` | BM25 + vector fusion (no reranking) | Embedding model |
| `full` | Full pipeline with LLM reranking | All three models |

**Score interpretation:** `1.00` = perfect (all expected docs in top results),
`0.00` = complete miss. The example fixture typically shows bm25 ~0.50, vector
~0.70, and hybrid/full ~1.00, a concrete illustration of why hybrid search beats
either backend alone.

**Custom fixtures** are JSON:

```json
{
  "description": "My benchmark",
  "version": 1,
  "collection": "my-collection",
  "queries": [
    {
      "id": "find-auth",
      "query": "authentication flow",
      "type": "semantic",
      "expected_files": ["docs/auth-design.md"],
      "expected_in_top_k": 3
    }
  ]
}
```

`expected_files` are collection-relative paths as shown by `qmd ls`. The `type`
field (`exact`, `semantic`, `topical`, `cross-domain`, `alias`) labels queries for
grouping; it does not change search behavior.

> **Heads-up:** if the fixture's collection isn't indexed, bench currently runs to
> completion and reports all zeros with no warning. Verify setup with
> `qmd ls <collection>` first.

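The metrics in the backend table can be computed per query as in the sketch below, which reads the fixture fields shown above. The definitions are textbook ones with one assumption: precision is divided by `min(k, expected_files.length)` so that finding every expected file within the top `expected_in_top_k` results scores `1.00`, matching the score interpretation above. QMD's own scoring (in `src/bench/score.ts`) may be defined differently.

```typescript
// Hypothetical per-query scoring for a bench fixture entry.
interface FixtureQuery {
  id: string;
  query: string;
  type: string; // label only; does not affect search
  expected_files: string[];
  expected_in_top_k: number;
}

interface QueryMetrics {
  precisionAtK: number;
  recall: number;
  reciprocalRank: number;
  f1: number;
}

function scoreQuery(q: FixtureQuery, ranked: string[]): QueryMetrics {
  const k = q.expected_in_top_k;
  const expected = new Set(q.expected_files);
  const hits = ranked.slice(0, k).filter((f) => expected.has(f)).length;
  const denom = Math.min(k, expected.size); // assumed normalization
  const precisionAtK = denom > 0 ? hits / denom : 0;
  const recall = expected.size > 0 ? hits / expected.size : 0;
  const first = ranked.findIndex((f) => expected.has(f));
  const reciprocalRank = first === -1 ? 0 : 1 / (first + 1);
  const f1 = precisionAtK + recall > 0
    ? (2 * precisionAtK * recall) / (precisionAtK + recall)
    : 0;
  return { precisionAtK, recall, reciprocalRank, f1 };
}

// MRR is the mean reciprocal rank over all queries.
function meanReciprocalRank(results: QueryMetrics[]): number {
  if (results.length === 0) return 0;
  return results.reduce((sum, m) => sum + m.reciprocalRank, 0) / results.length;
}
```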
## Data Storage

The default index is stored at `~/.cache/qmd/index.sqlite` (the cache directory follows `XDG_CACHE_HOME` when it is set).

### Schema

```sql
collections      -- Indexed directories with name and glob patterns
path_contexts    -- Context descriptions by virtual path (qmd://...)
documents        -- Markdown content with metadata and docid (6-char hash)
documents_fts    -- FTS5 full-text index
content_vectors  -- Embedding chunks (hash, seq, pos, 900 tokens each)
vectors_vec      -- sqlite-vec vector index (hash_seq key)
llm_cache        -- Cached LLM responses (query expansion, rerank scores)
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `XDG_CACHE_HOME` | `~/.cache` | Cache directory location |
| `QMD_LLAMA_GPU` | `auto` | Force llama.cpp GPU backend (`metal`, `vulkan`, `cuda`) or disable GPU with `false` |
| `QMD_FORCE_CPU` | unset | Set to `1`/`true` to force CPU mode before any CUDA/Vulkan/Metal probing. Equivalent CLI flag: `--no-gpu`. |
| `QMD_EMBED_PARALLELISM` | automatic | Override embedding/reranking context parallelism (1-8). Windows CUDA defaults to `1` because parallel CUDA contexts can crash with `ggml-cuda.cu:98`; use Vulkan or raise this only if your driver is stable. |

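As a rough illustration of how these variables interact, the sketch below resolves them in the documented order (`QMD_FORCE_CPU` is checked before any GPU selection). Function names are hypothetical; the returned GPU value is shaped like the `gpu` option of node-llama-cpp's `getLlama()` (`"auto"`, a backend name, or `false`), and clamping `QMD_EMBED_PARALLELISM` into 1-8 rather than rejecting out-of-range values is an assumption. The Windows CUDA default of `1` is left to the caller's `automatic` value.

```typescript
// Hypothetical resolution of QMD's GPU-related environment variables.
type GpuSetting = "auto" | "metal" | "vulkan" | "cuda" | false;
type Env = Record<string, string | undefined>;

function isTruthy(value: string | undefined): boolean {
  const v = (value ?? "").trim().toLowerCase();
  return v === "1" || v === "true";
}

function resolveGpu(env: Env): GpuSetting {
  // QMD_FORCE_CPU wins before any backend probing.
  if (isTruthy(env.QMD_FORCE_CPU)) return false;
  const gpu = (env.QMD_LLAMA_GPU ?? "auto").trim().toLowerCase();
  if (gpu === "false") return false;
  if (gpu === "metal" || gpu === "vulkan" || gpu === "cuda") return gpu;
  return "auto";
}

function resolveParallelism(env: Env, automatic: number): number {
  const raw = env.QMD_EMBED_PARALLELISM;
  if (raw === undefined) return automatic;
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n)) return automatic;
  return Math.min(8, Math.max(1, n)); // assumed clamp to the documented 1-8 range
}

console.log(resolveGpu({ QMD_FORCE_CPU: "1", QMD_LLAMA_GPU: "cuda" })); // false
```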
## How It Works

### Indexing Flow

```
Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
    │                                                   │               │
    │                                                   │               ▼
    │                                                   │        Generate docid
    │                                                   │         (6-char hash)
    │                                                   │               │
    └──────────────────────────────────────────────────►└──► Store in SQLite
                                                                    │
                                                                    ▼
                                                               FTS5 Index
```

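The title and docid steps of the indexing flow can be sketched with Node's built-in `crypto` and `path` modules. Title extraction follows the rule stated under Output Format (first heading, else the filename). The hash function is an assumption: the README only says the docid is a 6-character hash, so SHA-256 truncated to six hex characters is used here purely for illustration, and the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { basename, extname } from "node:path";

// Title: first markdown heading, else the filename without its extension.
function extractTitle(content: string, filePath: string): string {
  const m = /^#{1,6}[ \t]+(.+)$/m.exec(content);
  return m ? m[1].trim() : basename(filePath, extname(filePath));
}

// Content hash; SHA-256 is an assumption, not confirmed by this README.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Short docid as displayed in search results, e.g. "#a1b2c3".
function docid(content: string): string {
  return `#${contentHash(content).slice(0, 6)}`;
}

const doc = "# Q4 Planning\n\nDiscussion about code quality.";
console.log(extractTitle(doc, "notes/meeting.md"), docid(doc));
```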
### Embedding Flow

Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:

```
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                         │                  "title | text"        embedBatch()
                         │
                         └─► Chunks stored with:
                             - hash: document hash
                             - seq: chunk sequence (0, 1, 2...)
                             - pos: character position in original
```

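The overlap arithmetic is easiest to see with hard cuts: 15% of 900 tokens is 135 tokens, so each window starts 765 tokens after the previous one. The sketch below ignores smart boundary detection and assumes token counts are already known; the function name is hypothetical.

```typescript
// Hypothetical fixed-size windows: ~900-token chunks with 15% overlap, hard cuts only.
interface TokenSpan {
  start: number; // inclusive token offset
  end: number;   // exclusive token offset
}

function overlappingWindows(totalTokens: number, size = 900, overlapRatio = 0.15): TokenSpan[] {
  const overlap = Math.round(size * overlapRatio); // 135 tokens for the defaults
  const step = size - overlap;                     // 765 tokens between window starts
  const spans: TokenSpan[] = [];
  for (let start = 0; start < totalTokens; start += step) {
    const end = Math.min(start + size, totalTokens);
    spans.push({ start, end });
    if (end === totalTokens) break; // last window reached the end of the document
  }
  return spans;
}

console.log(overlappingWindows(2000));
```

For a 2000-token document this yields windows starting at 0, 765 and 1530, each sharing 135 tokens with its predecessor.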
### Smart Chunking

Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.

**Break Point Scores:**

| Pattern | Score | Description |
|---------|-------|-------------|
| `# Heading` | 100 | H1 - major section |
| `## Heading` | 90 | H2 - subsection |
| `### Heading` | 80 | H3 |
| `#### Heading` | 70 | H4 |
| `##### Heading` | 60 | H5 |
| `###### Heading` | 50 | H6 |
| ` ``` ` | 80 | Code block boundary |
| `---` / `***` | 60 | Horizontal rule |
| Blank line | 20 | Paragraph boundary |
| `- item` / `1. item` | 5 | List item |
| Line break | 1 | Minimal break |

**Algorithm:**

1. Scan the document for all break points with scores
2. When approaching the 900-token target, search a 200-token window before the cutoff
3. Score each break point: `finalScore = baseScore × (1 - (distance/window)² × 0.7)`
4. Cut at the highest-scoring break point

The squared distance decay means an H1 heading 200 tokens back (100 × 0.3 = 30) still beats a plain line break at the target (score 1), while a closer heading beats a more distant one.

**Code Fence Protection:** Break points inside code blocks are ignored, so code stays together. If a code block exceeds the chunk size, it's kept whole when possible.

**AST-Aware Chunking (Code Files):**

For supported code files, QMD also parses the source with [tree-sitter](https://tree-sitter.github.io/) and adds AST-derived break points that are merged with the regex scores above:

| AST Node | Score | Languages |
|----------|-------|-----------|
| Class / interface / struct / impl / trait | 100 | All |
| Function / method | 90 | All |
| Type alias / enum | 80 | All |
| Import / use declaration | 60 | All |

Supported for `.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.go`, and `.rs` files. Enable with `--chunk-strategy auto`. Markdown and other file types always use regex chunking.

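One plausible way to fold the AST scores into the regex scores is to keep the higher score for each candidate line, as sketched below. The per-line maximum is an assumption (the README says only that the scores are merged), the tree-sitter parsing step is omitted, the names are hypothetical, and the strategy check mirrors the `--chunk-strategy auto` flag and extension list above.

```typescript
import { extname } from "node:path";

// Scores from the AST table (tree-sitter parsing itself is not shown).
const AST_SCORES = {
  classLike: 100,   // class / interface / struct / impl / trait
  functionLike: 90, // function / method
  typeLike: 80,     // type alias / enum
  importLike: 60,   // import / use declaration
} as const;

const AST_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs"]);

// AST break points apply only with --chunk-strategy auto and a supported extension.
function usesAstChunking(path: string, strategy: string): boolean {
  return strategy === "auto" && AST_EXTENSIONS.has(extname(path));
}

// Merge line -> score maps, keeping the higher score where both propose a break.
function mergeBreakPoints(
  regex: Map<number, number>,
  ast: Map<number, number>,
): Map<number, number> {
  const merged = new Map(regex);
  for (const [line, score] of ast) {
    merged.set(line, Math.max(merged.get(line) ?? 0, score));
  }
  return merged;
}

const merged = mergeBreakPoints(
  new Map([[0, 1], [5, 20]]),
  new Map([[5, AST_SCORES.functionLike], [9, AST_SCORES.classLike]]),
);
console.log([...merged.entries()]);
```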
### Query Flow (Hybrid)

```
Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
                                           │
                                    For each query:
                                 ┌─────────┴─────────┐
                                 ▼                   ▼
                            FTS (BM25)         Vector Search
                                 │                   │
                                 ▼                   ▼
                            Ranked List         Ranked List
                                 │                   │
                                 └─────────┬─────────┘
                                           ▼
                                   RRF Fusion (k=60)
                               Original query ×2 weight
                         Top-rank bonus: +0.05/#1, +0.02/#2-3
                                           │
                                           ▼
                                   Top 30 candidates
                                           │
                                           ▼
                                    LLM Re-ranking
                             (yes/no + logprob confidence)
                                           │
                                           ▼
                                 Position-Aware Blend
                           Rank 1-3:  75% RRF / 25% reranker
                           Rank 4-10: 60% RRF / 40% reranker
                           Rank 11+:  40% RRF / 60% reranker
                                           │
                                           ▼
                                     Final Results
```

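The fusion and blending stages can be sketched in TypeScript using the constants from the diagram (k=60, ×2 weight for the original query, +0.05/+0.02 top-rank bonuses, 30 candidates, and the position-dependent blend weights). Names are hypothetical, and two details are assumptions: the bonus is keyed on a document's best rank in any single list, and the RRF score is taken to be normalized to [0, 1] before it is blended with the reranker score.

```typescript
// Hypothetical sketch of RRF fusion plus position-aware blending.
interface RankedList {
  docs: string[];        // docids, best first
  fromOriginal: boolean; // produced by the original (unexpanded) query
}

const RRF_K = 60;

function rrfFuse(lists: RankedList[]): Map<string, number> {
  const scores = new Map<string, number>();
  const bestRank = new Map<string, number>();
  for (const list of lists) {
    const weight = list.fromOriginal ? 2 : 1; // original query counts double
    list.docs.forEach((doc, i) => {
      const rank = i + 1;
      scores.set(doc, (scores.get(doc) ?? 0) + weight / (RRF_K + rank));
      bestRank.set(doc, Math.min(bestRank.get(doc) ?? Infinity, rank));
    });
  }
  // Top-rank bonus (assumed to use the best rank in any single list).
  for (const [doc, rank] of bestRank) {
    const bonus = rank === 1 ? 0.05 : rank <= 3 ? 0.02 : 0;
    scores.set(doc, (scores.get(doc) ?? 0) + bonus);
  }
  return scores;
}

// Keep the highest-scoring candidates for reranking.
function topCandidates(scores: Map<string, number>, limit = 30): string[] {
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([doc]) => doc);
}

// Share of the final score given to RRF at a position; the reranker gets the rest.
function rrfWeight(position: number): number {
  if (position <= 3) return 0.75;
  if (position <= 10) return 0.6;
  return 0.4;
}

function blendScore(position: number, rrfNormalized: number, rerankScore: number): number {
  const w = rrfWeight(position);
  return w * rrfNormalized + (1 - w) * rerankScore;
}
```

With these constants, a document ranked first by the original query's FTS search contributes 2 / (60 + 1) ≈ 0.0328, the same `rrfContribution` that appears in the `--explain` sample.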
## Model Configuration

Models are configured in `src/llm.ts` as HuggingFace URIs:

```typescript
const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";
```

### EmbeddingGemma Prompt Format

```
// For queries
"task: search result | query: {query}"

// For documents
"title: {title} | text: {content}"
```

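The two templates translate directly into string builders. Only the string formats come from this README; the function names are hypothetical.

```typescript
// Build EmbeddingGemma inputs from the query and document templates.
function formatQueryPrompt(query: string): string {
  return `task: search result | query: ${query}`;
}

function formatDocumentPrompt(title: string, content: string): string {
  return `title: ${title} | text: ${content}`;
}

console.log(formatQueryPrompt("how to login"));
// task: search result | query: how to login
```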
### Qwen3-Reranker

Uses node-llama-cpp's `createRankingContext()` and `rankAndSort()` API for cross-encoder reranking. Returns documents sorted by relevance score (0.0 to 1.0).

### Qwen3 (Query Expansion)

Used for generating query variations via `LlamaChatSession`.

## License

MIT