@tobilu/qmd 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
@@ -0,0 +1,34 @@
# Changelog

All notable changes to QMD will be documented in this file.

## [0.9.0] - 2026-02-15

Initial public release.

### Features

- **Hybrid search pipeline** — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
- **Smart chunking** — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
- **Query expansion** — fine-tuned Qwen3 1.7B model generates search variations for better recall
- **Cross-encoder reranking** — Qwen3-Reranker scores candidates with position-aware blending
- **Vector embeddings** — EmbeddingGemma 300M via node-llama-cpp, all on-device
- **MCP server** — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
- **Collection management** — index multiple directories with glob patterns
- **Context annotations** — add descriptions to collections and paths for richer search
- **Document IDs** — 6-char content hash for stable references across re-indexes
- **Multi-get** — retrieve multiple documents by glob pattern, comma list, or docids
- **Multiple output formats** — JSON, CSV, Markdown, XML, files list
- **Claude Code plugin** — inline status checks and MCP integration

### Fixes

- Handle dense content (code) that tokenizes beyond expected chunk size
- Proper cleanup of Metal GPU resources
- SQLite-vec readiness verification after extension load
- Reactivate deactivated documents on re-index
- BM25 score normalization with Math.abs
- Bun UTF-8 path corruption workaround

[0.9.0]: https://github.com/tobi/qmd/releases/tag/v0.9.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024-2026 Tobi Lutke

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,615 @@
# QMD - Query Markup Documents

An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for agentic workflows.

QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

![QMD Architecture](assets/qmd-architecture.png)

## Quick Start

```sh
# Install globally
bun install -g @tobilu/qmd

# Create collections for your notes, docs, and meeting transcripts
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd collection add ~/work/docs --name docs

# Add context to help with search results
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://meetings "Meeting transcripts and notes"
qmd context add qmd://docs "Work documentation"

# Generate embeddings for semantic search
qmd embed

# Search across everything
qmd search "project timeline"           # Fast keyword search
qmd vsearch "how to deploy"             # Semantic search
qmd query "quarterly planning process"  # Hybrid + reranking (best quality)

# Get a specific document
qmd get "meetings/2024-01-15.md"

# Get a document by docid (shown in search results)
qmd get "#abc123"

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Search within a specific collection
qmd search "API" -c notes

# Export all matches for an agent
qmd search "API" --all --files --min-score 0.3
```

### Using with AI Agents

QMD's `--json` and `--files` output formats are designed for agentic workflows:

```sh
# Get structured results for an LLM
qmd search "authentication" --json -n 10

# List all relevant files above a threshold
qmd query "error handling" --all --files --min-score 0.4

# Retrieve full document content
qmd get "docs/api-reference.md" --full
```
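
For example, an agent tool can shell out to `qmd` and consume the JSON directly. A minimal Bun/TypeScript sketch (the structure of the JSON payload isn't documented above, so treat the parsed result as opaque data rather than relying on specific field names):

```typescript
// Minimal sketch: calling qmd from a Bun/TypeScript agent tool.
import { $ } from "bun";

async function searchNotes(query: string) {
  // Bun Shell escapes the interpolated query; --json emits structured results.
  return await $`qmd search ${query} --json -n 10`.json();
}

console.log(await searchNotes("authentication"));
```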

### MCP Server

The tool works perfectly well when you simply tell your agent to use it on the command line, but QMD also exposes an MCP (Model Context Protocol) server for tighter integration.

**Tools exposed:**
- `qmd_search` - Fast BM25 keyword search (supports collection filter)
- `qmd_vector_search` - Semantic vector search (supports collection filter)
- `qmd_deep_search` - Deep search with query expansion and reranking (supports collection filter)
- `qmd_get` - Retrieve document by path or docid (with fuzzy matching suggestions)
- `qmd_multi_get` - Retrieve multiple documents by glob pattern, list, or docids
- `qmd_status` - Index health and collection info

**Claude Desktop configuration** (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```

**Claude Code** — Install the plugin (recommended):

```bash
claude marketplace add tobi/qmd
claude plugin add qmd@qmd
```

Or configure MCP manually in `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```

#### HTTP Transport

By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:

```sh
# Foreground (Ctrl-C to stop)
qmd mcp --http              # localhost:8181
qmd mcp --http --port 8080  # custom port

# Background daemon
qmd mcp --http --daemon     # start, writes PID to ~/.cache/qmd/mcp.pid
qmd mcp stop                # stop via PID file
qmd status                  # shows "MCP: running (PID ...)" when active
```

The HTTP server exposes two endpoints:
- `POST /mcp` — MCP Streamable HTTP (JSON responses, stateless)
- `GET /health` — liveness check with uptime

LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 minutes of idle time and transparently recreated on the next request (~1s penalty; the models themselves remain loaded).

Point any MCP client at `http://localhost:8181/mcp` to connect.
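
As a quick sanity check, both endpoints can also be exercised without a full MCP client. A minimal TypeScript sketch (assumes the default port; the body is a standard JSON-RPC 2.0 message per the MCP spec, but the exact headers a given client sends may differ):

```typescript
const base = "http://localhost:8181";

// Liveness check with uptime
console.log(await (await fetch(`${base}/health`)).text());

// One stateless JSON-RPC request against the Streamable HTTP endpoint
const res = await fetch(`${base}/mcp`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
  },
  body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list", params: {} }),
});
console.log(await res.text());
```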

## Architecture

```
┌──────────────────────────────┐
│  QMD Hybrid Search Pipeline  │
└──────────────────────────────┘

                          User Query
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
      Query Expansion                  Original Query
        (fine-tuned)                    (×2 weight)
              │                               │
              │  2 alternative queries        │
              └───────────────┬───────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
  Original Query       Expanded Query 1      Expanded Query 2
     ┌────┴────┐          ┌────┴────┐           ┌────┴────┐
     ▼         ▼          ▼         ▼           ▼         ▼
   BM25     Vector      BM25     Vector       BM25     Vector
  (FTS5)    Search     (FTS5)    Search      (FTS5)    Search
     └────┬────┘          └────┬────┘           └────┬────┘
          └────────────────────┼─────────────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │    RRF Fusion + Bonus     │
                 │    Original query: ×2     │
                 │    Top-rank bonus: +0.05  │
                 │    Top 30 kept            │
                 └─────────────┬─────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │      LLM Re-ranking       │
                 │     (qwen3-reranker)      │
                 │     Yes/No + logprobs     │
                 └─────────────┬─────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │   Position-Aware Blend    │
                 │   Top 1-3:  75% RRF       │
                 │   Top 4-10: 60% RRF       │
                 │   Top 11+:  40% RRF       │
                 └───────────────────────────┘
```

## Score Normalization & Fusion

### Search Backends

| Backend | Raw Score | Conversion | Range |
|---------|-----------|------------|-------|
| **FTS (BM25)** | SQLite FTS5 BM25 | `Math.abs(score)` | 0 to ~25+ |
| **Vector** | Cosine distance | `1 / (1 + distance)` | 0.0 to 1.0 |
| **Reranker** | LLM 0-10 rating | `score / 10` | 0.0 to 1.0 |
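
These conversions put all three backends on a comparable scale. Transcribed directly into TypeScript:

```typescript
// SQLite FTS5 reports BM25 as a negative value (more negative = better match).
const normalizeBm25 = (raw: number): number => Math.abs(raw);

// Cosine distance (0 = identical) becomes a similarity in (0, 1].
const normalizeVector = (distance: number): number => 1 / (1 + distance);

// A 0-10 reranker rating becomes 0.0-1.0.
const normalizeReranker = (rating: number): number => rating / 10;
```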

### Fusion Strategy

The `query` command uses **Reciprocal Rank Fusion (RRF)** with position-aware blending:

1. **Query Expansion**: The original query (weighted ×2) plus two LLM-generated variations
2. **Parallel Retrieval**: Each query searches both FTS and vector indexes
3. **RRF Fusion**: Combine all result lists using `score = Σ(1/(k+rank+1))` where k=60
4. **Top-Rank Bonus**: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
5. **Top-K Selection**: Take top 30 candidates for reranking
6. **Re-ranking**: LLM scores each document (yes/no with logprobs confidence)
7. **Position-Aware Blending**:
   - RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
   - RRF rank 4-10: 60% retrieval, 40% reranker
   - RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)

**Why this approach**: Pure RRF can dilute exact matches when the expanded queries miss them. The top-rank bonus preserves documents that rank #1 for the original query, and position-aware blending keeps the reranker from overriding high-confidence retrieval results.
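
The fusion and blending math is small enough to sketch directly. This is an illustration of the formulas above, not QMD's actual implementation; in practice the RRF and reranker scores would also need to be brought onto the same scale before blending:

```typescript
type RankedList = { docIds: string[]; fromOriginalQuery: boolean };

const K = 60;

// Reciprocal Rank Fusion with the ×2 original-query weight and top-rank bonus.
function rrfFuse(lists: RankedList[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    const weight = list.fromOriginalQuery ? 2 : 1;
    list.docIds.forEach((id, rank) => {
      let s = (scores.get(id) ?? 0) + weight / (K + rank + 1);
      if (rank === 0) s += 0.05;      // ranked #1 in this list
      else if (rank <= 2) s += 0.02;  // ranked #2-3
      scores.set(id, s);
    });
  }
  return scores;
}

// Position-aware blend: the retrieval weight depends on the RRF rank.
function blend(rrfRank: number, rrfScore: number, rerankScore: number): number {
  const w = rrfRank < 3 ? 0.75 : rrfRank < 10 ? 0.6 : 0.4;
  return w * rrfScore + (1 - w) * rerankScore;
}

function fuseAndRerank(lists: RankedList[], rerankScores: Map<string, number>) {
  const top30 = [...rrfFuse(lists).entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 30); // only the top 30 candidates are reranked
  return top30
    .map(([id, rrf], rank) => ({ id, score: blend(rank, rrf, rerankScores.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```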

### Score Interpretation

| Score | Meaning |
|-------|---------|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |

## Requirements

### System Requirements

- **Bun** >= 1.0.0
- **macOS**: Homebrew SQLite (for extension support)
  ```sh
  brew install sqlite
  ```

### GGUF Models (via node-llama-cpp)

QMD uses three local GGUF models (auto-downloaded on first use):

| Model | Purpose | Size |
|-------|---------|------|
| `embeddinggemma-300M-Q8_0` | Vector embeddings | ~300MB |
| `qwen3-reranker-0.6b-q8_0` | Re-ranking | ~640MB |
| `qmd-query-expansion-1.7B-q4_k_m` | Query expansion (fine-tuned) | ~1.1GB |

Models are downloaded from HuggingFace and cached in `~/.cache/qmd/models/`.

## Installation

```sh
bun install -g @tobilu/qmd
```

Make sure `~/.bun/bin` is in your PATH.

### Development

```sh
git clone https://github.com/tobi/qmd
cd qmd
bun install
bun link
```

## Usage

### Collection Management

```sh
# Create a collection from current directory
qmd collection add . --name myproject

# Create a collection with explicit path and custom glob mask
qmd collection add ~/Documents/notes --name notes --mask "**/*.md"

# List all collections
qmd collection list

# Remove a collection
qmd collection remove myproject

# Rename a collection
qmd collection rename myproject my-project

# List files in a collection
qmd ls notes
qmd ls notes/subfolder
```

### Generate Vector Embeddings

```sh
# Embed all indexed documents (900 tokens/chunk, 15% overlap)
qmd embed

# Force re-embed everything
qmd embed -f
```

### Context Management

Context adds descriptive metadata to collections and paths, helping search understand your content.

```sh
# Add context to a collection (using qmd:// virtual paths)
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"

# Add context from within a collection directory
cd ~/notes && qmd context add "Personal notes and ideas"
cd ~/notes/work && qmd context add "Work-related notes"

# Add global context (applies to all collections)
qmd context add / "Knowledge base for my projects"

# List all contexts
qmd context list

# Remove context
qmd context rm qmd://notes/old
```

### Search Commands

```
┌────────────────────────────────────────────────────────────────┐
│                          Search Modes                          │
├─────────┬──────────────────────────────────────────────────────┤
│ search  │ BM25 full-text search only                           │
│ vsearch │ Vector semantic search only                          │
│ query   │ Hybrid: FTS + Vector + Query Expansion + Re-ranking  │
└─────────┴──────────────────────────────────────────────────────┘
```

```sh
# Full-text search (fast, keyword-based)
qmd search "authentication flow"

# Vector search (semantic similarity)
qmd vsearch "how to login"

# Hybrid search with re-ranking (best quality)
qmd query "user authentication"
```

### Options

```sh
# Search options
-n <num>             # Number of results (default: 5, or 20 for --files/--json)
-c, --collection     # Restrict search to a specific collection
--all                # Return all matches (use with --min-score to filter)
--min-score <num>    # Minimum score threshold (default: 0)
--full               # Show full document content
--line-numbers       # Add line numbers to output
--index <name>       # Use named index

# Output formats (for search and multi-get)
--files              # Output: docid,score,filepath,context
--json               # JSON output with snippets
--csv                # CSV output
--md                 # Markdown output
--xml                # XML output

# Get options
qmd get <file>[:line]  # Get document, optionally starting at line
-l <num>             # Maximum lines to return
--from <num>         # Start from line number

# Multi-get options
-l <num>             # Maximum lines per file
--max-bytes <num>    # Skip files larger than N bytes (default: 10KB)
```

### Output Format

The default output is a colorized CLI format (it respects the `NO_COLOR` environment variable):

```
docs/guide.md:42 #a1b2c3
Title: Software Craftsmanship
Context: Work documentation
Score: 93%

This section covers the **craftsmanship** of building
quality software with attention to detail.
See also: engineering principles


notes/meeting.md:15 #d4e5f6
Title: Q4 Planning
Context: Personal notes and ideas
Score: 67%

Discussion about code quality and craftsmanship
in the development process.
```

- **Path**: Collection-relative path (e.g., `docs/guide.md`)
- **Docid**: Short hash identifier (e.g., `#a1b2c3`) - use with `qmd get #a1b2c3`
- **Title**: Extracted from document (first heading or filename)
- **Context**: Path context if configured via `qmd context add`
- **Score**: Color-coded (green >70%, yellow >40%, dim otherwise)
- **Snippet**: Context around match with query terms highlighted

### Examples

```sh
# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"

# Output as markdown for LLM context
qmd search --md --full "error handling"

# JSON output for scripting
qmd query --json "quarterly reports"

# Use separate index for different knowledge base
qmd --index work search "quarterly reports"
```

### Index Maintenance

```sh
# Show index status and collections with contexts
qmd status

# Re-index all collections
qmd update

# Re-index with git pull first (for remote repos)
qmd update --pull

# Get document by filepath (with fuzzy matching suggestions)
qmd get notes/meeting.md

# Get document by docid (from search results)
qmd get "#abc123"

# Get document starting at line 50, max 100 lines
qmd get notes/meeting.md:50 -l 100

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Get multiple documents by comma-separated list (supports docids)
qmd multi-get "doc1.md, doc2.md, #abc123"

# Limit multi-get to files under 20KB
qmd multi-get "docs/*.md" --max-bytes 20480

# Output multi-get as JSON for agent processing
qmd multi-get "docs/*.md" --json

# Clean up cache and orphaned data
qmd cleanup
```

## Data Storage

The index is stored in `~/.cache/qmd/index.sqlite`.

### Schema

```sql
collections      -- Indexed directories with name and glob patterns
path_contexts    -- Context descriptions by virtual path (qmd://...)
documents        -- Markdown content with metadata and docid (6-char hash)
documents_fts    -- FTS5 full-text index
content_vectors  -- Embedding chunks (hash, seq, pos, 900 tokens each)
vectors_vec      -- sqlite-vec vector index (hash_seq key)
llm_cache        -- Cached LLM responses (query expansion, rerank scores)
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `XDG_CACHE_HOME` | `~/.cache` | Cache directory location |

## How It Works

### Indexing Flow

```
Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
                                                                        │
                                                                        ▼
                                                                 Generate docid
                                                                  (6-char hash)
                                                                        │
                                                                        ▼
                                                                Store in SQLite
                                                                        │
                                                                        ▼
                                                                   FTS5 Index
```
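
The docid is derived from the document content, which is what keeps references stable across re-indexes. A sketch of the idea (the README doesn't state which hash function is used, so SHA-256 is an assumption here; only "6-char content hash" comes from the docs):

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: a stable 6-character id derived from content.
function docid(content: string): string {
  return createHash("sha256").update(content).digest("hex").slice(0, 6);
}

console.log(docid("# Q4 Planning\n...")); // same content => same docid
```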

### Embedding Flow

Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:

```
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                     │                       "title | text"        embedBatch()
                     │
                     └─► Chunks stored with:
                           - hash: document hash
                           - seq:  chunk sequence (0, 1, 2...)
                           - pos:  character position in original
```
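
QMD's `embedBatch()` wrapper isn't shown in this README, but the underlying node-llama-cpp calls look roughly like this. A sketch, using the embedding model URI from the Model Configuration section below and the document prompt format described there:

```typescript
import { getLlama, resolveModelFile } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: await resolveModelFile(
    "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf"
  ),
});
const embedding = await model.createEmbeddingContext();

// One chunk, formatted as "title | text" before embedding.
const chunk = "title: Q4 Planning | text: Discussion about code quality...";
const { vector } = await embedding.getEmbeddingFor(chunk);
console.log(vector.length);
```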

### Smart Chunking

Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.

**Break Point Scores:**

| Pattern | Score | Description |
|---------|-------|-------------|
| `# Heading` | 100 | H1 - major section |
| `## Heading` | 90 | H2 - subsection |
| `### Heading` | 80 | H3 |
| `#### Heading` | 70 | H4 |
| `##### Heading` | 60 | H5 |
| `###### Heading` | 50 | H6 |
| ` ``` ` | 80 | Code block boundary |
| `---` / `***` | 60 | Horizontal rule |
| Blank line | 20 | Paragraph boundary |
| `- item` / `1. item` | 5 | List item |
| Line break | 1 | Minimal break |

**Algorithm:**

1. Scan document for all break points with scores
2. When approaching the 900-token target, search a 200-token window before the cutoff
3. Score each break point: `finalScore = baseScore × (1 - (distance/window)² × 0.7)`
4. Cut at the highest-scoring break point

The squared distance decay means a heading 200 tokens back (score ~30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
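
A sketch of the window scoring from step 3, with illustrative types (the real chunker works on tokenized markdown, not a flat candidate list like this):

```typescript
interface BreakPoint {
  pos: number;       // token offset of the candidate break
  baseScore: number; // from the break-point table above
}

const WINDOW = 200; // tokens searched before the ~900-token target

function pickBreak(candidates: BreakPoint[], target: number): BreakPoint | undefined {
  let best: { bp: BreakPoint; score: number } | undefined;
  for (const bp of candidates) {
    const distance = target - bp.pos;
    if (distance < 0 || distance > WINDOW) continue; // only look back within the window
    // finalScore = baseScore × (1 - (distance/window)² × 0.7)
    const score = bp.baseScore * (1 - (distance / WINDOW) ** 2 * 0.7);
    if (!best || score > best.score) best = { bp, score };
  }
  return best?.bp;
}

// A heading 200 tokens back scores 100 × (1 - 0.7) = 30 and still beats a
// plain line break (score 1) sitting right at the target.
console.log(pickBreak([{ pos: 700, baseScore: 100 }, { pos: 900, baseScore: 1 }], 900));
```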

**Code Fence Protection:** Break points inside code blocks are ignored—code stays together. If a code block exceeds the chunk size, it's kept whole when possible.

### Query Flow (Hybrid)

```
Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
                                      │
              each query ──► FTS (BM25)       ──► ranked list
                        └──► Vector Search    ──► ranked list
                                      │
                                      ▼
                     RRF Fusion (k=60)
                       Original query ×2 weight
                       Top-rank bonus: +0.05 (#1), +0.02 (#2-3)
                                      │
                                      ▼
                     Top 30 candidates
                                      │
                                      ▼
                     LLM Re-ranking
                       (yes/no + logprob confidence)
                                      │
                                      ▼
                     Position-Aware Blend
                       Rank 1-3:  75% RRF / 25% reranker
                       Rank 4-10: 60% RRF / 40% reranker
                       Rank 11+:  40% RRF / 60% reranker
                                      │
                                      ▼
                     Final Results
```

## Model Configuration

Models are configured in `src/llm.ts` as HuggingFace URIs:

```typescript
const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";
```

### EmbeddingGemma Prompt Format

```
// For queries
"task: search result | query: {query}"

// For documents
"title: {title} | text: {content}"
```

### Qwen3-Reranker

The reranker uses node-llama-cpp's `createRankingContext()` and `rankAndSort()` API for cross-encoder reranking and returns documents sorted by relevance score (0.0 - 1.0).
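
A sketch of what that call looks like with node-llama-cpp (the model URI matches the default above; the candidate texts are invented for illustration):

```typescript
import { getLlama, resolveModelFile } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: await resolveModelFile(
    "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf"
  ),
});
const ranking = await model.createRankingContext();

const candidates = [
  "Authentication is handled via OAuth tokens.",
  "The deployment pipeline runs on every merge.",
];

// rankAndSort() orders the documents by relevance to the query.
const ranked = await ranking.rankAndSort("user authentication", candidates);
for (const { document, score } of ranked) {
  console.log(score.toFixed(2), document);
}
```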

### Qwen3 (Query Expansion)

The fine-tuned Qwen3 1.7B model generates query variations via `LlamaChatSession`.
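
A sketch of that call path (the prompt wording here is invented; the fine-tuned model presumably expects the prompt format it was trained on):

```typescript
import { getLlama, resolveModelFile, LlamaChatSession } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: await resolveModelFile(
    "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf"
  ),
});
const context = await model.createContext();
const session = new LlamaChatSession({ contextSequence: context.getSequence() });

const variations = await session.prompt(
  "Rewrite this search query two different ways: quarterly planning process"
);
console.log(variations);
```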

## License

MIT