org-qmd 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,917 @@
1
+ # QMD - Query Markup Documents
2
+
3
+ An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for your agentic flows.
4
+
5
+ QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.
6
+
7
+ ![QMD Architecture](assets/qmd-architecture.png)
8
+
9
+ You can read more about QMD's progress in the [CHANGELOG](CHANGELOG.md).
10
+
11
+ ## Quick Start
12
+
13
+ ```sh
14
+ # Install globally (Node or Bun)
15
+ npm install -g org-qmd
16
+ # or
17
+ bun install -g org-qmd
18
+
19
+ # Or run directly
20
+ npx org-qmd ...
21
+ bunx org-qmd ...
22
+
23
+ # Create collections for your notes, docs, and meeting transcripts
24
+ qmd collection add ~/notes --name notes
25
+ qmd collection add ~/Documents/meetings --name meetings
26
+ qmd collection add ~/work/docs --name docs
27
+
28
+ # Add context to improve search results. Each piece of context is returned
+ # alongside matching sub-documents, and contexts nest like a tree. This is
+ # QMD's key feature: it lets LLMs make much better contextual choices when
+ # selecting documents. Don't sleep on it!
29
+ qmd context add qmd://notes "Personal notes and ideas"
30
+ qmd context add qmd://meetings "Meeting transcripts and notes"
31
+ qmd context add qmd://docs "Work documentation"
32
+
33
+ # Generate embeddings for semantic search
34
+ qmd embed
35
+
36
+ # Search across everything
37
+ qmd search "project timeline" # Fast keyword search
38
+ qmd vsearch "how to deploy" # Semantic search
39
+ qmd query "quarterly planning process" # Hybrid + reranking (best quality)
40
+
41
+ # Get a specific document
42
+ qmd get "meetings/2024-01-15.md"
43
+
44
+ # Get a document by docid (shown in search results)
45
+ qmd get "#abc123"
46
+
47
+ # Get multiple documents by glob pattern
48
+ qmd multi-get "journals/2025-05*.md"
49
+
50
+ # Search within a specific collection
51
+ qmd search "API" -c notes
52
+
53
+ # Export all matches for an agent
54
+ qmd search "API" --all --files --min-score 0.3
55
+ ```
56
+
57
+ ### Using with AI Agents
58
+
59
+ QMD's `--json` and `--files` output formats are designed for agentic workflows:
60
+
61
+ ```sh
62
+ # Get structured results for an LLM
63
+ qmd search "authentication" --json -n 10
64
+
65
+ # List all relevant files above a threshold
66
+ qmd query "error handling" --all --files --min-score 0.4
67
+
68
+ # Retrieve full document content
69
+ qmd get "docs/api-reference.md" --full
70
+ ```
71
+
72
+ ### MCP Server
73
+
74
+ The tool works perfectly well when you simply tell your agent to use it on the command line, but it also exposes an MCP (Model Context Protocol) server for tighter integration.
75
+
76
+ **Tools exposed:**
77
+ - `query` — Search with typed sub-queries (`lex`/`vec`/`hyde`), combined via RRF + reranking
78
+ - `get` — Retrieve a document by path or docid (with fuzzy matching suggestions)
79
+ - `multi_get` — Batch retrieve by glob pattern, comma-separated list, or docids
80
+ - `status` — Index health and collection info
81
+
82
+ **Claude Desktop configuration** (`~/Library/Application Support/Claude/claude_desktop_config.json`):
83
+
84
+ ```json
85
+ {
86
+ "mcpServers": {
87
+ "qmd": {
88
+ "command": "qmd",
89
+ "args": ["mcp"]
90
+ }
91
+ }
92
+ }
93
+ ```
94
+
95
+ **Claude Code** — Install the plugin (recommended):
96
+
97
+ ```bash
98
+ claude plugin marketplace add tobi/qmd
99
+ claude plugin install qmd@qmd
100
+ ```
101
+
102
+ Or configure MCP manually in `~/.claude/settings.json`:
103
+
104
+ ```json
105
+ {
106
+ "mcpServers": {
107
+ "qmd": {
108
+ "command": "qmd",
109
+ "args": ["mcp"]
110
+ }
111
+ }
112
+ }
113
+ ```
114
+
115
+ #### HTTP Transport
116
+
117
+ By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:
118
+
119
+ ```sh
120
+ # Foreground (Ctrl-C to stop)
121
+ qmd mcp --http # localhost:8181
122
+ qmd mcp --http --port 8080 # custom port
123
+
124
+ # Background daemon
125
+ qmd mcp --http --daemon # start, writes PID to ~/.cache/qmd/mcp.pid
126
+ qmd mcp stop # stop via PID file
127
+ qmd status # shows "MCP: running (PID ...)" when active
128
+ ```
129
+
130
+ The HTTP server exposes two endpoints:
131
+ - `POST /mcp` — MCP Streamable HTTP (JSON responses, stateless)
132
+ - `GET /health` — liveness check with uptime
133
+
134
+ LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 min idle and transparently recreated on the next request (~1s penalty, models remain loaded).
135
+
136
+ Point any MCP client at `http://localhost:8181/mcp` to connect.
137
+
138
+ ### SDK / Library Usage
139
+
140
+ Use QMD as a library in your own Node.js or Bun applications.
141
+
142
+ #### Installation
143
+
144
+ ```sh
145
+ npm install @tobilu/qmd
146
+ ```
147
+
148
+ #### Quick Start
149
+
150
+ ```typescript
151
+ import { createStore } from '@tobilu/qmd'
152
+
153
+ const store = await createStore({
154
+ dbPath: './my-index.sqlite',
155
+ config: {
156
+ collections: {
157
+ docs: { path: '/path/to/docs', pattern: '**/*.md' },
158
+ },
159
+ },
160
+ })
161
+
162
+ const results = await store.search({ query: "authentication flow" })
163
+ console.log(results.map(r => `${r.title} (${Math.round(r.score * 100)}%)`))
164
+
165
+ await store.close()
166
+ ```
167
+
168
+ #### Store Creation
169
+
170
+ `createStore()` accepts three modes:
171
+
172
+ ```typescript
173
+ import { createStore } from '@tobilu/qmd'
174
+
175
+ // 1. Inline config — no files needed besides the DB
176
+ const store = await createStore({
177
+ dbPath: './index.sqlite',
178
+ config: {
179
+ collections: {
180
+ docs: { path: '/path/to/docs', pattern: '**/*.md' },
181
+ notes: { path: '/path/to/notes' },
182
+ },
183
+ },
184
+ })
185
+
186
+ // 2. YAML config file — collections defined in a file
187
+ const store2 = await createStore({
188
+ dbPath: './index.sqlite',
189
+ configPath: './qmd.yml',
190
+ })
191
+
192
+ // 3. DB-only — reopen a previously configured store
193
+ const store3 = await createStore({ dbPath: './index.sqlite' })
194
+ ```
195
+
196
+ #### Search
197
+
198
+ The unified `search()` method handles both simple queries and pre-expanded structured queries:
199
+
200
+ ```typescript
201
+ // Simple query — auto-expanded via LLM, then BM25 + vector + reranking
202
+ const results = await store.search({ query: "authentication flow" })
203
+
204
+ // With options
205
+ const results2 = await store.search({
206
+ query: "rate limiting",
207
+ intent: "API throttling and abuse prevention",
208
+ collection: "docs",
209
+ limit: 5,
210
+ minScore: 0.3,
211
+ explain: true,
212
+ })
213
+
214
+ // Pre-expanded queries — skip auto-expansion, control each sub-query
215
+ const results3 = await store.search({
216
+ queries: [
217
+ { type: 'lex', query: '"connection pool" timeout -redis' },
218
+ { type: 'vec', query: 'why do database connections time out under load' },
219
+ ],
220
+ collections: ["docs", "notes"],
221
+ })
222
+
223
+ // Skip reranking for faster results
224
+ const fast = await store.search({ query: "auth", rerank: false })
225
+ ```
226
+
227
+ For direct backend access:
228
+
229
+ ```typescript
230
+ // BM25 keyword search (fast, no LLM)
231
+ const lexResults = await store.searchLex("auth middleware", { limit: 10 })
232
+
233
+ // Vector similarity search (embedding model, no reranking)
234
+ const vecResults = await store.searchVector("how users log in", { limit: 10 })
235
+
236
+ // Manual query expansion for full control
237
+ const expanded = await store.expandQuery("auth flow", { intent: "user login" })
238
+ const results4 = await store.search({ queries: expanded })
239
+ ```
240
+
241
+ #### Retrieval
242
+
243
+ ```typescript
244
+ // Get a document by path or docid
245
+ const doc = await store.get("docs/readme.md")
246
+ const byId = await store.get("#abc123")
247
+
248
+ if (!("error" in doc)) {
249
+ console.log(doc.title, doc.displayPath, doc.context)
250
+ }
251
+
252
+ // Get document body with line range
253
+ const body = await store.getDocumentBody("docs/readme.md", {
254
+ fromLine: 50,
255
+ maxLines: 100,
256
+ })
257
+
258
+ // Batch retrieve by glob or comma-separated list
259
+ const { docs, errors } = await store.multiGet("docs/**/*.md", {
260
+ maxBytes: 20480,
261
+ })
262
+ ```
263
+
264
+ #### Collections
265
+
266
+ ```typescript
267
+ // Add a collection
268
+ await store.addCollection("myapp", {
269
+ path: "/src/myapp",
270
+ pattern: "**/*.ts",
271
+ ignore: ["node_modules/**", "*.test.ts"],
272
+ })
273
+
274
+ // List collections with document stats
275
+ const collections = await store.listCollections()
276
+ // => [{ name, pwd, glob_pattern, doc_count, active_count, last_modified, includeByDefault }]
277
+
278
+ // Get names of collections included in queries by default
279
+ const defaults = await store.getDefaultCollectionNames()
280
+
281
+ // Remove / rename
282
+ await store.removeCollection("myapp")
283
+ await store.renameCollection("old-name", "new-name")
284
+ ```
285
+
286
+ #### Context
287
+
288
+ Context adds descriptive metadata that improves search relevance and is returned alongside results:
289
+
290
+ ```typescript
291
+ // Add context for a path within a collection
292
+ await store.addContext("docs", "/api", "REST API reference documentation")
293
+
294
+ // Set global context (applies to all collections)
295
+ await store.setGlobalContext("Internal engineering documentation")
296
+
297
+ // List all contexts
298
+ const contexts = await store.listContexts()
299
+ // => [{ collection, path, context }]
300
+
301
+ // Remove context
302
+ await store.removeContext("docs", "/api")
303
+ await store.setGlobalContext(undefined) // clear global
304
+ ```
305
+
306
+ #### Indexing
307
+
308
+ ```typescript
309
+ // Re-index collections by scanning the filesystem
310
+ const result = await store.update({
311
+ collections: ["docs"], // optional — defaults to all
312
+ onProgress: ({ collection, file, current, total }) => {
313
+ console.log(`[${collection}] ${current}/${total} ${file}`)
314
+ },
315
+ })
316
+ // => { collections, indexed, updated, unchanged, removed, needsEmbedding }
317
+
318
+ // Generate vector embeddings
319
+ const embedResult = await store.embed({
320
+ force: false, // true to re-embed everything
321
+ chunkStrategy: "auto", // "regex" (default) or "auto" (AST for code files)
322
+ onProgress: ({ current, total, collection }) => {
323
+ console.log(`Embedding ${current}/${total}`)
324
+ },
325
+ })
326
+ ```
327
+
328
+ #### Types
329
+
330
+ Key types exported for SDK consumers:
331
+
332
+ ```typescript
333
+ import type {
334
+ QMDStore, // The store interface
335
+ SearchOptions, // Options for search()
336
+ LexSearchOptions, // Options for searchLex()
337
+ VectorSearchOptions, // Options for searchVector()
338
+ HybridQueryResult, // Search result with score, snippet, context
339
+ SearchResult, // Result from searchLex/searchVector
340
+ ExpandedQuery, // Typed sub-query { type: 'lex'|'vec'|'hyde', query }
341
+ DocumentResult, // Document metadata + body
342
+ DocumentNotFound, // Error with similarFiles suggestions
343
+ MultiGetResult, // Batch retrieval result
344
+ UpdateProgress, // Progress callback info for update()
345
+ UpdateResult, // Aggregated update result
346
+ EmbedProgress, // Progress callback info for embed()
347
+ EmbedResult, // Embedding result
348
+ StoreOptions, // createStore() options
349
+ CollectionConfig, // Inline config shape
350
+ IndexStatus, // From getStatus()
351
+ IndexHealthInfo, // From getIndexHealth()
352
+ } from '@tobilu/qmd'
353
+ ```
354
+
355
+ Utility exports:
356
+
357
+ ```typescript
358
+ import {
359
+ extractSnippet, // Extract a relevant snippet from text
360
+ addLineNumbers, // Add line numbers to text
361
+ DEFAULT_MULTI_GET_MAX_BYTES, // Default max file size for multiGet (10KB)
362
+ Maintenance, // Database maintenance operations
363
+ } from '@tobilu/qmd'
364
+ ```
365
+
366
+ #### Lifecycle
367
+
368
+ ```typescript
369
+ // Close the store — disposes LLM models and DB connection
370
+ await store.close()
371
+ ```
372
+
373
+ The SDK requires explicit `dbPath` — no defaults are assumed. This makes it safe to embed in any application without side effects.
374
+
375
+ ## Architecture
376
+
377
+ ```
378
+ ┌─────────────────────────────────────────────────────────────────────────────┐
379
+ │ QMD Hybrid Search Pipeline │
380
+ └─────────────────────────────────────────────────────────────────────────────┘
381
+
382
+ ┌─────────────────┐
383
+ │ User Query │
384
+ └────────┬────────┘
385
+
386
+ ┌──────────────┴──────────────┐
387
+ ▼ ▼
388
+ ┌────────────────┐ ┌────────────────┐
389
+ │ Query Expansion│ │ Original Query│
390
+ │ (fine-tuned) │ │ (×2 weight) │
391
+ └───────┬────────┘ └───────┬────────┘
392
+ │ │
393
+ │ 2 alternative queries │
394
+ └──────────────┬──────────────┘
395
+
396
+ ┌───────────────────────┼───────────────────────┐
397
+ ▼ ▼ ▼
398
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
399
+ │ Original Query │ │ Expanded Query 1│ │ Expanded Query 2│
400
+ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘
401
+ │ │ │
402
+ ┌───────┴───────┐ ┌───────┴───────┐ ┌───────┴───────┐
403
+ ▼ ▼ ▼ ▼ ▼ ▼
404
+ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
405
+ │ BM25 │ │Vector │ │ BM25 │ │Vector │ │ BM25 │ │Vector │
406
+ │(FTS5) │ │Search │ │(FTS5) │ │Search │ │(FTS5) │ │Search │
407
+ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
408
+ │ │ │ │ │ │
409
+ └───────┬───────┘ └──────┬──────┘ └──────┬──────┘
410
+ │ │ │
411
+ └────────────────────────┼───────────────────────┘
412
+
413
+
414
+ ┌───────────────────────┐
415
+ │ RRF Fusion + Bonus │
416
+ │ Original query: ×2 │
417
+ │ Top-rank bonus: +0.05│
418
+ │ Top 30 Kept │
419
+ └───────────┬───────────┘
420
+
421
+
422
+ ┌───────────────────────┐
423
+ │ LLM Re-ranking │
424
+ │ (qwen3-reranker) │
425
+ │ Yes/No + logprobs │
426
+ └───────────┬───────────┘
427
+
428
+
429
+ ┌───────────────────────┐
430
+ │ Position-Aware Blend │
431
+ │ Top 1-3: 75% RRF │
432
+ │ Top 4-10: 60% RRF │
433
+ │ Top 11+: 40% RRF │
434
+ └───────────────────────┘
435
+ ```
436
+
437
+ ## Score Normalization & Fusion
438
+
439
+ ### Search Backends
440
+
441
+ | Backend | Raw Score | Conversion | Range |
442
+ |---------|-----------|------------|-------|
443
+ | **FTS (BM25)** | SQLite FTS5 BM25 | `Math.abs(score)` | 0 to ~25+ |
444
+ | **Vector** | Cosine distance | `1 / (1 + distance)` | 0.0 to 1.0 |
445
+ | **Reranker** | LLM 0-10 rating | `score / 10` | 0.0 to 1.0 |
446
+
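In code, the conversions in the table reduce to three one-line normalizers. This is a sketch of the table's math only; the function names are illustrative, not QMD's API:

```typescript
// Normalize raw backend scores into comparable ranges (per the table above).

// SQLite FTS5 reports BM25 as a negative number (more negative = better match).
function normalizeFts(bm25: number): number {
  return Math.abs(bm25); // 0 to ~25+
}

// Vector search yields a cosine distance (0 = identical direction).
function normalizeVector(distance: number): number {
  return 1 / (1 + distance); // 1.0 at distance 0, falling toward 0 as distance grows
}

// The reranker's 0-10 rating is scaled into the same 0..1 range.
function normalizeRerank(rating: number): number {
  return rating / 10;
}

console.log(normalizeFts(-12.5)); // 12.5
console.log(normalizeVector(0)); // 1
console.log(normalizeRerank(7)); // 0.7
```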
447
+ ### Fusion Strategy
448
+
449
+ The `query` command uses **Reciprocal Rank Fusion (RRF)** with position-aware blending:
450
+
451
+ 1. **Query Expansion**: Original query (weighted ×2) + 2 LLM-generated variations
452
+ 2. **Parallel Retrieval**: Each query searches both FTS and vector indexes
453
+ 3. **RRF Fusion**: Combine all result lists using `score = Σ(1/(k+rank+1))` where k=60
454
+ 4. **Top-Rank Bonus**: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
455
+ 5. **Top-K Selection**: Take top 30 candidates for reranking
456
+ 6. **Re-ranking**: LLM scores each document (yes/no with logprobs confidence)
457
+ 7. **Position-Aware Blending**:
458
+ - RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
459
+ - RRF rank 4-10: 60% retrieval, 40% reranker
460
+ - RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
461
+
462
+ **Why this approach**: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
463
+
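Steps 3, 4, and 7 above can be sketched as follows. This is an illustrative reimplementation of the stated formulas, not QMD's source; in particular, applying the ×2 original-query weight as a multiplier on each RRF contribution, and granting the top-rank bonus per result list, are assumptions:

```typescript
const K = 60; // RRF smoothing constant from step 3

// One ranked result list per (query, backend) pair; weight is 2 for the original query.
type RankedList = { docs: string[]; weight: number };

// Step 3 + 4: RRF fusion with top-rank bonus.
function rrfFuse(lists: RankedList[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { docs, weight } of lists) {
    docs.forEach((doc, rank) => {
      let s = weight / (K + rank + 1); // score = Σ(1/(k+rank+1)), rank 0-based
      if (rank === 0) s += 0.05;       // #1 in this list
      else if (rank <= 2) s += 0.02;   // #2-3 in this list
      scores.set(doc, (scores.get(doc) ?? 0) + s);
    });
  }
  return scores;
}

// Step 7: position-aware blend of retrieval (RRF) and reranker scores.
function blend(rrfRank: number, rrfScore: number, rerankScore: number): number {
  const w = rrfRank <= 3 ? 0.75 : rrfRank <= 10 ? 0.6 : 0.4;
  return w * rrfScore + (1 - w) * rerankScore;
}
```

Documents that rank highly in several lists accumulate contributions from each, which is what lets RRF fuse heterogeneous BM25 and vector rankings without normalizing their raw scores.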
464
+ ### Score Interpretation
465
+
466
+ | Score | Meaning |
467
+ |-------|---------|
468
+ | 0.8 - 1.0 | Highly relevant |
469
+ | 0.5 - 0.8 | Moderately relevant |
470
+ | 0.2 - 0.5 | Somewhat relevant |
471
+ | 0.0 - 0.2 | Low relevance |
472
+
473
+ ## Requirements
474
+
475
+ ### System Requirements
476
+
477
+ - **Node.js** >= 22 or **Bun** >= 1.0.0
479
+ - **macOS**: Homebrew SQLite (for extension support)
480
+ ```sh
481
+ brew install sqlite
482
+ ```
483
+
484
+ ### GGUF Models (via node-llama-cpp)
485
+
486
+ QMD uses three local GGUF models (auto-downloaded on first use):
487
+
488
+ | Model | Purpose | Size |
489
+ |-------|---------|------|
490
+ | `embeddinggemma-300M-Q8_0` | Vector embeddings (default) | ~300MB |
491
+ | `qwen3-reranker-0.6b-q8_0` | Re-ranking | ~640MB |
492
+ | `qmd-query-expansion-1.7B-q4_k_m` | Query expansion (fine-tuned) | ~1.1GB |
493
+
494
+ Models are downloaded from HuggingFace and cached in `~/.cache/qmd/models/`.
495
+
496
+ ### Custom Embedding Model
497
+
498
+ Override the default embedding model via the `QMD_EMBED_MODEL` environment variable.
499
+ This is useful for multilingual corpora (e.g. Chinese, Japanese, Korean) where
500
+ `embeddinggemma-300M` has limited coverage.
501
+
502
+ ```sh
503
+ # Use Qwen3-Embedding-0.6B for better multilingual (CJK) support
504
+ export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
505
+
506
+ # After changing the model, re-embed all collections:
507
+ qmd embed -f
508
+ ```
509
+
510
+ Supported model families:
511
+ - **embeddinggemma** (default) — English-optimized, small footprint
512
+ - **Qwen3-Embedding** — Multilingual (119 languages including CJK), MTEB top-ranked
513
+
514
+ > **Note:** When switching embedding models, you must re-index with `qmd embed -f`
515
+ > since vectors are not cross-compatible between models. The prompt format is
516
+ > automatically adjusted for each model family.
517
+
518
+ ## Installation
519
+
520
+ ```sh
521
+ npm install -g @tobilu/qmd
522
+ # or
523
+ bun install -g @tobilu/qmd
524
+ ```
525
+
526
+ ### Development
527
+
528
+ ```sh
529
+ git clone https://github.com/tobi/qmd
530
+ cd qmd
531
+ npm install
532
+ npm link
533
+ ```
534
+
535
+ ## Usage
536
+
537
+ ### Collection Management
538
+
539
+ ```sh
540
+ # Create a collection from current directory
541
+ qmd collection add . --name myproject
542
+
543
+ # Create a collection with explicit path and custom glob mask
544
+ qmd collection add ~/Documents/notes --name notes --mask "**/*.md"
545
+
546
+ # List all collections
547
+ qmd collection list
548
+
549
+ # Remove a collection
550
+ qmd collection remove myproject
551
+
552
+ # Rename a collection
553
+ qmd collection rename myproject my-project
554
+
555
+ # List files in a collection
556
+ qmd ls notes
557
+ qmd ls notes/subfolder
558
+ ```
559
+
560
+ ### Generate Vector Embeddings
561
+
562
+ ```sh
563
+ # Embed all indexed documents (900 tokens/chunk, 15% overlap)
564
+ qmd embed
565
+
566
+ # Force re-embed everything
567
+ qmd embed -f
568
+
569
+ # Enable AST-aware chunking for code files (TS, JS, Python, Go, Rust)
570
+ qmd embed --chunk-strategy auto
571
+
572
+ # Also works with query for consistent chunk selection
573
+ qmd query "auth flow" --chunk-strategy auto
574
+ ```
575
+
576
+ **AST-aware chunking** (`--chunk-strategy auto`) uses tree-sitter to chunk code
577
+ files at function, class, and import boundaries instead of arbitrary text
578
+ positions. This produces higher-quality chunks and better search results for
579
+ codebases. Markdown and other file types always use regex-based chunking
580
+ regardless of strategy.
581
+
582
+ The default is `regex` (existing behavior). Use `--chunk-strategy auto` to
583
+ opt in. Run `qmd status` to verify which grammars are available.
584
+
585
+ > **Note:** Tree-sitter grammars are optional dependencies. If they are not
586
+ > installed, `--chunk-strategy auto` falls back to regex-only chunking
587
+ > automatically. Tested on both Node.js and Bun.
588
+
589
+ ### Context Management
590
+
591
+ Context adds descriptive metadata to collections and paths, helping search understand your content.
592
+
593
+ ```sh
594
+ # Add context to a collection (using qmd:// virtual paths)
595
+ qmd context add qmd://notes "Personal notes and ideas"
596
+ qmd context add qmd://docs/api "API documentation"
597
+
598
+ # Add context from within a collection directory
599
+ cd ~/notes && qmd context add "Personal notes and ideas"
600
+ cd ~/notes/work && qmd context add "Work-related notes"
601
+
602
+ # Add global context (applies to all collections)
603
+ qmd context add / "Knowledge base for my projects"
604
+
605
+ # List all contexts
606
+ qmd context list
607
+
608
+ # Remove context
609
+ qmd context rm qmd://notes/old
610
+ ```
611
+
612
+ ### Search Commands
613
+
614
+ ```
615
+ ┌──────────────────────────────────────────────────────────────────┐
616
+ │ Search Modes │
617
+ ├──────────┬───────────────────────────────────────────────────────┤
618
+ │ search │ BM25 full-text search only │
619
+ │ vsearch │ Vector semantic search only │
620
+ │ query │ Hybrid: FTS + Vector + Query Expansion + Re-ranking │
621
+ └──────────┴───────────────────────────────────────────────────────┘
622
+ ```
623
+
624
+ ```sh
625
+ # Full-text search (fast, keyword-based)
626
+ qmd search "authentication flow"
627
+
628
+ # Vector search (semantic similarity)
629
+ qmd vsearch "how to login"
630
+
631
+ # Hybrid search with re-ranking (best quality)
632
+ qmd query "user authentication"
633
+ ```
634
+
635
+ ### Options
636
+
637
+ ```sh
638
+ # Search options
639
+ -n <num> # Number of results (default: 5, or 20 for --files/--json)
640
+ -c, --collection # Restrict search to a specific collection
641
+ --all # Return all matches (use with --min-score to filter)
642
+ --min-score <num> # Minimum score threshold (default: 0)
643
+ --full # Show full document content
644
+ --line-numbers # Add line numbers to output
645
+ --explain # Include retrieval score traces (query, JSON/CLI output)
646
+ --index <name> # Use named index
647
+
648
+ # Output formats (for search and multi-get)
649
+ --files # Output: docid,score,filepath,context
650
+ --json # JSON output with snippets
651
+ --csv # CSV output
652
+ --md # Markdown output
653
+ --xml # XML output
654
+
655
+ # Get options
656
+ qmd get <file>[:line] # Get document, optionally starting at line
657
+ -l <num> # Maximum lines to return
658
+ --from <num> # Start from line number
659
+
660
+ # Multi-get options
661
+ -l <num> # Maximum lines per file
662
+ --max-bytes <num> # Skip files larger than N bytes (default: 10KB)
663
+ ```
664
+
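The `--files` output above (`docid,score,filepath,context`) can be consumed like this. A minimal sketch, assuming the first three fields never contain commas while the trailing context may; `parseFilesLine` is a hypothetical helper, not part of QMD:

```typescript
interface FileHit {
  docid: string;
  score: number;
  filepath: string;
  context: string;
}

// Parse one line of `--files` output; everything after the third comma is context.
function parseFilesLine(line: string): FileHit {
  const [docid, score, filepath, ...rest] = line.split(",");
  return { docid, score: Number(score), filepath, context: rest.join(",") };
}

console.log(parseFilesLine("#abc123,0.82,docs/guide.md,Work documentation"));
```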
665
+ ### Output Format
666
+
667
+ Default output is colorized CLI format (respects `NO_COLOR` env):
668
+
669
+ ```
670
+ docs/guide.md:42 #a1b2c3
671
+ Title: Software Craftsmanship
672
+ Context: Work documentation
673
+ Score: 93%
674
+
675
+ This section covers the **craftsmanship** of building
676
+ quality software with attention to detail.
677
+ See also: engineering principles
678
+
679
+
680
+ notes/meeting.md:15 #d4e5f6
681
+ Title: Q4 Planning
682
+ Context: Personal notes and ideas
683
+ Score: 67%
684
+
685
+ Discussion about code quality and craftsmanship
686
+ in the development process.
687
+ ```
688
+
689
+ - **Path**: Collection-relative path (e.g., `docs/guide.md`)
690
+ - **Docid**: Short hash identifier (e.g., `#a1b2c3`); use with `qmd get #a1b2c3`
691
+ - **Title**: Extracted from document (first heading or filename)
692
+ - **Context**: Path context if configured via `qmd context add`
693
+ - **Score**: Color-coded (green >70%, yellow >40%, dim otherwise)
694
+ - **Snippet**: Context around match with query terms highlighted
695
+
696
+ ### Examples
697
+
698
+ ```sh
699
+ # Get 10 results with minimum score 0.3
700
+ qmd query -n 10 --min-score 0.3 "API design patterns"
701
+
702
+ # Output as markdown for LLM context
703
+ qmd search --md --full "error handling"
704
+
705
+ # JSON output for scripting
706
+ qmd query --json "quarterly reports"
707
+
708
+ # Inspect how each result was scored (RRF + rerank blend)
709
+ qmd query --json --explain "quarterly reports"
710
+
711
+ # Use separate index for different knowledge base
712
+ qmd --index work search "quarterly reports"
713
+ ```
714
+
715
+ ### Index Maintenance
716
+
717
+ ```sh
718
+ # Show index status and collections with contexts
719
+ qmd status
720
+
721
+ # Re-index all collections
722
+ qmd update
723
+
724
+ # Re-index with git pull first (for remote repos)
725
+ qmd update --pull
726
+
727
+ # Get document by filepath (with fuzzy matching suggestions)
728
+ qmd get notes/meeting.md
729
+
730
+ # Get document by docid (from search results)
731
+ qmd get "#abc123"
732
+
733
+ # Get document starting at line 50, max 100 lines
734
+ qmd get notes/meeting.md:50 -l 100
735
+
736
+ # Get multiple documents by glob pattern
737
+ qmd multi-get "journals/2025-05*.md"
738
+
739
+ # Get multiple documents by comma-separated list (supports docids)
740
+ qmd multi-get "doc1.md, doc2.md, #abc123"
741
+
742
+ # Limit multi-get to files under 20KB
743
+ qmd multi-get "docs/*.md" --max-bytes 20480
744
+
745
+ # Output multi-get as JSON for agent processing
746
+ qmd multi-get "docs/*.md" --json
747
+
748
+ # Clean up cache and orphaned data
749
+ qmd cleanup
750
+ ```
751
+
752
+ ## Data Storage
753
+
754
+ Index stored in: `~/.cache/qmd/index.sqlite`
755
+
756
+ ### Schema
757
+
758
+ ```sql
759
+ collections -- Indexed directories with name and glob patterns
760
+ path_contexts -- Context descriptions by virtual path (qmd://...)
761
+ documents -- Markdown content with metadata and docid (6-char hash)
762
+ documents_fts -- FTS5 full-text index
763
+ content_vectors -- Embedding chunks (hash, seq, pos, 900 tokens each)
764
+ vectors_vec -- sqlite-vec vector index (hash_seq key)
765
+ llm_cache -- Cached LLM responses (query expansion, rerank scores)
766
+ ```
767
+
768
+ ## Environment Variables
769
+
770
+ | Variable | Default | Description |
771
+ |----------|---------|-------------|
772
+ | `XDG_CACHE_HOME` | `~/.cache` | Cache directory location |
773
+
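How the cache location resolves can be sketched as below. This follows the XDG convention implied by the table; `qmdCacheDir` is illustrative, not a QMD export:

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Resolve the QMD cache directory: $XDG_CACHE_HOME/qmd, defaulting to ~/.cache/qmd.
function qmdCacheDir(env: Record<string, string | undefined> = process.env): string {
  const base = env.XDG_CACHE_HOME ?? join(homedir(), ".cache");
  return join(base, "qmd"); // index.sqlite and models/ live under here
}

console.log(qmdCacheDir({ XDG_CACHE_HOME: "/tmp/cache" })); // /tmp/cache/qmd
```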
774
+ ## How It Works
775
+
776
+ ### Indexing Flow
777
+
778
+ ```
779
+ Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
780
+ │ │ │
781
+ │ │ ▼
782
+ │ │ Generate docid
783
+ │ │ (6-char hash)
784
+ │ │ │
785
+ └──────────────────────────────────────────────────►└──► Store in SQLite
786
+
787
+
788
+ FTS5 Index
789
+ ```
790
+
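The docid step above produces a 6-character hash of the document. A minimal sketch, assuming a truncated hex SHA-256 of the content; the actual hash function QMD uses is not specified here:

```typescript
import { createHash } from "node:crypto";

// Illustrative docid derivation: first 6 hex chars of a content hash.
// Deterministic, so re-indexing unchanged content yields the same docid.
function docid(content: string): string {
  return createHash("sha256").update(content).digest("hex").slice(0, 6);
}

console.log(docid("hello")); // "2cf24d"
```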
791
+ ### Embedding Flow
792
+
793
+ Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
794
+
795
+ ```
796
+ Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
797
+ │ "title | text" embedBatch()
798
+
799
+ └─► Chunks stored with:
800
+ - hash: document hash
801
+ - seq: chunk sequence (0, 1, 2...)
802
+ - pos: character position in original
803
+ ```
804
+
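Before smart boundary detection adjusts the cut points, the target windows implied by "~900 tokens with 15% overlap" look like this. A sketch over token counts only; the real chunker operates on tokenized text:

```typescript
// Fixed-size chunk windows: each chunk is `size` tokens, advancing by
// size * (1 - overlap) so consecutive chunks share 15% of their tokens.
function chunkWindows(
  totalTokens: number,
  size = 900,
  overlap = 0.15,
): Array<[number, number]> {
  const step = Math.floor(size * (1 - overlap)); // 765 tokens forward per chunk
  const windows: Array<[number, number]> = [];
  for (let start = 0; start < totalTokens; start += step) {
    windows.push([start, Math.min(start + size, totalTokens)]);
    if (start + size >= totalTokens) break; // last window reached the end
  }
  return windows;
}

console.log(chunkWindows(2000)); // [[0, 900], [765, 1665], [1530, 2000]]
```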
805
+ ### Smart Chunking
806
+
807
+ Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.
808
+
809
+ **Break Point Scores:**
810
+
811
+ | Pattern | Score | Description |
812
+ |---------|-------|-------------|
813
+ | `# Heading` | 100 | H1 - major section |
814
+ | `## Heading` | 90 | H2 - subsection |
815
+ | `### Heading` | 80 | H3 |
816
+ | `#### Heading` | 70 | H4 |
817
+ | `##### Heading` | 60 | H5 |
818
+ | `###### Heading` | 50 | H6 |
819
+ | ` ``` ` | 80 | Code block boundary |
820
+ | `---` / `***` | 60 | Horizontal rule |
821
+ | Blank line | 20 | Paragraph boundary |
822
+ | `- item` / `1. item` | 5 | List item |
823
+ | Line break | 1 | Minimal break |
824
+
825
+ **Algorithm:**
826
+
827
+ 1. Scan document for all break points with scores
828
+ 2. When approaching the 900-token target, search a 200-token window before the cutoff
829
+ 3. Score each break point: `finalScore = baseScore × (1 - (distance/window)² × 0.7)`
830
+ 4. Cut at the highest-scoring break point
831
+
832
+ The squared distance decay means a heading 200 tokens back (score ~30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
833
+
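The decay formula from step 3, directly in code (illustrative):

```typescript
// finalScore = baseScore × (1 - (distance/window)² × 0.7)
// distance = tokens between the candidate break point and the chunk-size target.
function breakScore(baseScore: number, distance: number, window = 200): number {
  const d = distance / window;
  return baseScore * (1 - d * d * 0.7);
}

// An H1 heading a full window (200 tokens) back still beats a line break at the target:
console.log(breakScore(100, 200)); // ≈ 30
console.log(breakScore(1, 0));     // 1
```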
834
+ **Code Fence Protection:** Break points inside code blocks are ignored—code stays together. If a code block exceeds the chunk size, it's kept whole when possible.
835
+
836
+ **AST-Aware Chunking (Code Files):**
837
+
838
+ For supported code files, QMD also parses the source with [tree-sitter](https://tree-sitter.github.io/) and adds AST-derived break points that are merged with the regex scores above:
839
+
840
+ | AST Node | Score | Languages |
841
+ |----------|-------|-----------|
842
+ | Class / interface / struct / impl / trait | 100 | All |
843
+ | Function / method | 90 | All |
844
+ | Type alias / enum | 80 | All |
845
+ | Import / use declaration | 60 | All |
846
+
847
+ Supported for `.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.go`, and `.rs` files. Enable with `--chunk-strategy auto`. Markdown and other file types always use regex chunking.
848
+
849
+ ### Query Flow (Hybrid)
850
+
851
+ ```
852
+ Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
853
+
854
+ ┌─────────┴─────────┐
855
+ ▼ ▼
856
+ For each query: FTS (BM25)
857
+ │ │
858
+ ▼ ▼
859
+ Vector Search Ranked List
860
+
861
+
862
+ Ranked List
863
+
864
+ └─────────┬─────────┘
865
+
866
+ RRF Fusion (k=60)
867
+ Original query ×2 weight
868
+ Top-rank bonus: +0.05/#1, +0.02/#2-3
869
+
870
+
871
+ Top 30 candidates
872
+
873
+
874
+ LLM Re-ranking
875
+ (yes/no + logprob confidence)
876
+
877
+
878
+ Position-Aware Blend
879
+ Rank 1-3: 75% RRF / 25% reranker
880
+ Rank 4-10: 60% RRF / 40% reranker
881
+ Rank 11+: 40% RRF / 60% reranker
882
+
883
+
884
+ Final Results
885
+ ```
886
+
887
+ ## Model Configuration
888
+
889
+ Models are configured in `src/llm.ts` as HuggingFace URIs:
890
+
891
+ ```typescript
892
+ const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
893
+ const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
894
+ const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";
895
+ ```
896
+
897
+ ### EmbeddingGemma Prompt Format
898
+
899
+ ```
900
+ // For queries
901
+ "task: search result | query: {query}"
902
+
903
+ // For documents
904
+ "title: {title} | text: {content}"
905
+ ```
906
+
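As helper functions (same templates as above; the function names are illustrative):

```typescript
// EmbeddingGemma expects asymmetric prompts: queries and documents are
// formatted differently before embedding.
function embedQueryPrompt(query: string): string {
  return `task: search result | query: ${query}`;
}

function embedDocPrompt(title: string, content: string): string {
  return `title: ${title} | text: ${content}`;
}

console.log(embedQueryPrompt("how to deploy"));
// task: search result | query: how to deploy
```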
907
+ ### Qwen3-Reranker
908
+
909
+ Uses node-llama-cpp's `createRankingContext()` and `rankAndSort()` API for cross-encoder reranking. Returns documents sorted by relevance score (0.0 - 1.0).
910
+
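One common way to turn a yes/no judgment with logprobs into the 0.0-1.0 relevance score is a two-way softmax over the "yes" and "no" log-probabilities. A sketch under that assumption (node-llama-cpp's internal formula may differ):

```typescript
// Relevance = P(yes) under a softmax over the two answer tokens' logprobs.
function yesConfidence(yesLogprob: number, noLogprob: number): number {
  const ey = Math.exp(yesLogprob);
  const en = Math.exp(noLogprob);
  return ey / (ey + en);
}

console.log(yesConfidence(0, 0)); // 0.5, the model is equally unsure
```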
911
+ ### Qwen3 (Query Expansion)
912
+
913
+ Used for generating query variations via `LlamaChatSession`.
914
+
915
+ ## License
916
+
917
+ MIT