@tobilu/qmd 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
@@ -0,0 +1,34 @@
# Changelog

All notable changes to QMD will be documented in this file.

## [0.9.0] - 2026-02-15

Initial public release.

### Features

- **Hybrid search pipeline** — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
- **Smart chunking** — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
- **Query expansion** — fine-tuned Qwen3 1.7B model generates search variations for better recall
- **Cross-encoder reranking** — Qwen3-Reranker scores candidates with position-aware blending
- **Vector embeddings** — EmbeddingGemma 300M via node-llama-cpp, all on-device
- **MCP server** — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
- **Collection management** — index multiple directories with glob patterns
- **Context annotations** — add descriptions to collections and paths for richer search
- **Document IDs** — 6-char content hash for stable references across re-indexes
- **Multi-get** — retrieve multiple documents by glob pattern, comma list, or docids
- **Multiple output formats** — JSON, CSV, Markdown, XML, files list
- **Claude Code plugin** — inline status checks and MCP integration

### Fixes

- Handle dense content (code) that tokenizes beyond expected chunk size
- Proper cleanup of Metal GPU resources
- SQLite-vec readiness verification after extension load
- Reactivate deactivated documents on re-index
- BM25 score normalization with Math.abs
- Bun UTF-8 path corruption workaround

[0.9.0]: https://github.com/tobi/qmd/releases/tag/v0.9.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024-2026 Tobi Lutke

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,615 @@
# QMD - Query Markup Documents

An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for agentic workflows.

QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

![QMD Architecture](assets/qmd-architecture.png)

## Quick Start

```sh
# Install globally
bun install -g @tobilu/qmd

# Create collections for your notes, docs, and meeting transcripts
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd collection add ~/work/docs --name docs

# Add context to help with search results
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://meetings "Meeting transcripts and notes"
qmd context add qmd://docs "Work documentation"

# Generate embeddings for semantic search
qmd embed

# Search across everything
qmd search "project timeline"           # Fast keyword search
qmd vsearch "how to deploy"             # Semantic search
qmd query "quarterly planning process"  # Hybrid + reranking (best quality)

# Get a specific document
qmd get "meetings/2024-01-15.md"

# Get a document by docid (shown in search results)
qmd get "#abc123"

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Search within a specific collection
qmd search "API" -c notes

# Export all matches for an agent
qmd search "API" --all --files --min-score 0.3
```

### Using with AI Agents

QMD's `--json` and `--files` output formats are designed for agentic workflows:

```sh
# Get structured results for an LLM
qmd search "authentication" --json -n 10

# List all relevant files above a threshold
qmd query "error handling" --all --files --min-score 0.4

# Retrieve full document content
qmd get "docs/api-reference.md" --full
```
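
For example, an agent tool can shell out to `qmd` and consume the JSON directly. A minimal Bun/TypeScript sketch (the structure of the JSON payload isn't documented above, so treat the parsed result as opaque data rather than relying on specific field names):

```typescript
// Minimal sketch: calling qmd from a Bun/TypeScript agent tool.
import { $ } from "bun";

async function searchNotes(query: string) {
  // Bun Shell escapes the interpolated query; --json emits structured results.
  return await $`qmd search ${query} --json -n 10`.json();
}

console.log(await searchNotes("authentication"));
```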

### MCP Server

The tool works perfectly well when you simply tell your agent to use it on the command line, but QMD also exposes an MCP (Model Context Protocol) server for tighter integration.

**Tools exposed:**
- `qmd_search` - Fast BM25 keyword search (supports collection filter)
- `qmd_vector_search` - Semantic vector search (supports collection filter)
- `qmd_deep_search` - Deep search with query expansion and reranking (supports collection filter)
- `qmd_get` - Retrieve document by path or docid (with fuzzy matching suggestions)
- `qmd_multi_get` - Retrieve multiple documents by glob pattern, list, or docids
- `qmd_status` - Index health and collection info

**Claude Desktop configuration** (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```

**Claude Code** — Install the plugin (recommended):

```bash
claude marketplace add tobi/qmd
claude plugin add qmd@qmd
```

Or configure MCP manually in `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```

#### HTTP Transport

By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:

```sh
# Foreground (Ctrl-C to stop)
qmd mcp --http              # localhost:8181
qmd mcp --http --port 8080  # custom port

# Background daemon
qmd mcp --http --daemon     # start, writes PID to ~/.cache/qmd/mcp.pid
qmd mcp stop                # stop via PID file
qmd status                  # shows "MCP: running (PID ...)" when active
```

The HTTP server exposes two endpoints:
- `POST /mcp` — MCP Streamable HTTP (JSON responses, stateless)
- `GET /health` — liveness check with uptime

LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 minutes of idle time and transparently recreated on the next request (~1s penalty; the models themselves remain loaded).

Point any MCP client at `http://localhost:8181/mcp` to connect.
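
As a quick sanity check, both endpoints can also be exercised without a full MCP client. A minimal TypeScript sketch (assumes the default port; the body is a standard JSON-RPC 2.0 message per the MCP spec, but the exact headers a given client sends may differ):

```typescript
const base = "http://localhost:8181";

// Liveness check with uptime
console.log(await (await fetch(`${base}/health`)).text());

// One stateless JSON-RPC request against the Streamable HTTP endpoint
const res = await fetch(`${base}/mcp`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
  },
  body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list", params: {} }),
});
console.log(await res.text());
```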

## Architecture

```
┌──────────────────────────────┐
│  QMD Hybrid Search Pipeline  │
└──────────────────────────────┘

                          User Query
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
      Query Expansion                  Original Query
        (fine-tuned)                    (×2 weight)
              │                               │
              │  2 alternative queries        │
              └───────────────┬───────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
  Original Query       Expanded Query 1      Expanded Query 2
     ┌────┴────┐          ┌────┴────┐           ┌────┴────┐
     ▼         ▼          ▼         ▼           ▼         ▼
   BM25     Vector      BM25     Vector       BM25     Vector
  (FTS5)    Search     (FTS5)    Search      (FTS5)    Search
     └────┬────┘          └────┬────┘           └────┬────┘
          └────────────────────┼─────────────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │    RRF Fusion + Bonus     │
                 │    Original query: ×2     │
                 │    Top-rank bonus: +0.05  │
                 │    Top 30 kept            │
                 └─────────────┬─────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │      LLM Re-ranking       │
                 │     (qwen3-reranker)      │
                 │     Yes/No + logprobs     │
                 └─────────────┬─────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │   Position-Aware Blend    │
                 │   Top 1-3:  75% RRF       │
                 │   Top 4-10: 60% RRF       │
                 │   Top 11+:  40% RRF       │
                 └───────────────────────────┘
```

## Score Normalization & Fusion

### Search Backends

| Backend | Raw Score | Conversion | Range |
|---------|-----------|------------|-------|
| **FTS (BM25)** | SQLite FTS5 BM25 | `Math.abs(score)` | 0 to ~25+ |
| **Vector** | Cosine distance | `1 / (1 + distance)` | 0.0 to 1.0 |
| **Reranker** | LLM 0-10 rating | `score / 10` | 0.0 to 1.0 |
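
These conversions put all three backends on a comparable scale. Transcribed directly into TypeScript:

```typescript
// SQLite FTS5 reports BM25 as a negative value (more negative = better match).
const normalizeBm25 = (raw: number): number => Math.abs(raw);

// Cosine distance (0 = identical) becomes a similarity in (0, 1].
const normalizeVector = (distance: number): number => 1 / (1 + distance);

// A 0-10 reranker rating becomes 0.0-1.0.
const normalizeReranker = (rating: number): number => rating / 10;
```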

### Fusion Strategy

The `query` command uses **Reciprocal Rank Fusion (RRF)** with position-aware blending:

1. **Query Expansion**: The original query (weighted ×2) plus two LLM-generated variations
2. **Parallel Retrieval**: Each query searches both FTS and vector indexes
3. **RRF Fusion**: Combine all result lists using `score = Σ(1/(k+rank+1))` where k=60
4. **Top-Rank Bonus**: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
5. **Top-K Selection**: Take top 30 candidates for reranking
6. **Re-ranking**: LLM scores each document (yes/no with logprobs confidence)
7. **Position-Aware Blending**:
   - RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
   - RRF rank 4-10: 60% retrieval, 40% reranker
   - RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)

**Why this approach**: Pure RRF can dilute exact matches when the expanded queries miss them. The top-rank bonus preserves documents that rank #1 for the original query, and position-aware blending keeps the reranker from overriding high-confidence retrieval results.
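
The fusion and blending math is small enough to sketch directly. This is an illustration of the formulas above, not QMD's actual implementation; in practice the RRF and reranker scores would also need to be brought onto the same scale before blending:

```typescript
type RankedList = { docIds: string[]; fromOriginalQuery: boolean };

const K = 60;

// Reciprocal Rank Fusion with the ×2 original-query weight and top-rank bonus.
function rrfFuse(lists: RankedList[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    const weight = list.fromOriginalQuery ? 2 : 1;
    list.docIds.forEach((id, rank) => {
      let s = (scores.get(id) ?? 0) + weight / (K + rank + 1);
      if (rank === 0) s += 0.05;      // ranked #1 in this list
      else if (rank <= 2) s += 0.02;  // ranked #2-3
      scores.set(id, s);
    });
  }
  return scores;
}

// Position-aware blend: the retrieval weight depends on the RRF rank.
function blend(rrfRank: number, rrfScore: number, rerankScore: number): number {
  const w = rrfRank < 3 ? 0.75 : rrfRank < 10 ? 0.6 : 0.4;
  return w * rrfScore + (1 - w) * rerankScore;
}

function fuseAndRerank(lists: RankedList[], rerankScores: Map<string, number>) {
  const top30 = [...rrfFuse(lists).entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 30); // only the top 30 candidates are reranked
  return top30
    .map(([id, rrf], rank) => ({ id, score: blend(rank, rrf, rerankScores.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```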

### Score Interpretation

| Score | Meaning |
|-------|---------|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |

## Requirements

### System Requirements

- **Bun** >= 1.0.0
- **macOS**: Homebrew SQLite (for extension support)
  ```sh
  brew install sqlite
  ```

### GGUF Models (via node-llama-cpp)

QMD uses three local GGUF models (auto-downloaded on first use):

| Model | Purpose | Size |
|-------|---------|------|
| `embeddinggemma-300M-Q8_0` | Vector embeddings | ~300MB |
| `qwen3-reranker-0.6b-q8_0` | Re-ranking | ~640MB |
| `qmd-query-expansion-1.7B-q4_k_m` | Query expansion (fine-tuned) | ~1.1GB |

Models are downloaded from HuggingFace and cached in `~/.cache/qmd/models/`.

## Installation

```sh
bun install -g @tobilu/qmd
```

Make sure `~/.bun/bin` is in your PATH.

### Development

```sh
git clone https://github.com/tobi/qmd
cd qmd
bun install
bun link
```

## Usage

### Collection Management

```sh
# Create a collection from current directory
qmd collection add . --name myproject

# Create a collection with explicit path and custom glob mask
qmd collection add ~/Documents/notes --name notes --mask "**/*.md"

# List all collections
qmd collection list

# Remove a collection
qmd collection remove myproject

# Rename a collection
qmd collection rename myproject my-project

# List files in a collection
qmd ls notes
qmd ls notes/subfolder
```

### Generate Vector Embeddings

```sh
# Embed all indexed documents (900 tokens/chunk, 15% overlap)
qmd embed

# Force re-embed everything
qmd embed -f
```

### Context Management

Context adds descriptive metadata to collections and paths, helping search understand your content.

```sh
# Add context to a collection (using qmd:// virtual paths)
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"

# Add context from within a collection directory
cd ~/notes && qmd context add "Personal notes and ideas"
cd ~/notes/work && qmd context add "Work-related notes"

# Add global context (applies to all collections)
qmd context add / "Knowledge base for my projects"

# List all contexts
qmd context list

# Remove context
qmd context rm qmd://notes/old
```

### Search Commands

```
┌────────────────────────────────────────────────────────────────┐
│                          Search Modes                          │
├─────────┬──────────────────────────────────────────────────────┤
│ search  │ BM25 full-text search only                           │
│ vsearch │ Vector semantic search only                          │
│ query   │ Hybrid: FTS + Vector + Query Expansion + Re-ranking  │
└─────────┴──────────────────────────────────────────────────────┘
```

```sh
# Full-text search (fast, keyword-based)
qmd search "authentication flow"

# Vector search (semantic similarity)
qmd vsearch "how to login"

# Hybrid search with re-ranking (best quality)
qmd query "user authentication"
```

### Options

```sh
# Search options
-n <num>             # Number of results (default: 5, or 20 for --files/--json)
-c, --collection     # Restrict search to a specific collection
--all                # Return all matches (use with --min-score to filter)
--min-score <num>    # Minimum score threshold (default: 0)
--full               # Show full document content
--line-numbers       # Add line numbers to output
--index <name>       # Use named index

# Output formats (for search and multi-get)
--files              # Output: docid,score,filepath,context
--json               # JSON output with snippets
--csv                # CSV output
--md                 # Markdown output
--xml                # XML output

# Get options
qmd get <file>[:line]  # Get document, optionally starting at line
-l <num>             # Maximum lines to return
--from <num>         # Start from line number

# Multi-get options
-l <num>             # Maximum lines per file
--max-bytes <num>    # Skip files larger than N bytes (default: 10KB)
```

### Output Format

The default output is a colorized CLI format (it respects the `NO_COLOR` environment variable):

```
docs/guide.md:42 #a1b2c3
Title: Software Craftsmanship
Context: Work documentation
Score: 93%

This section covers the **craftsmanship** of building
quality software with attention to detail.
See also: engineering principles


notes/meeting.md:15 #d4e5f6
Title: Q4 Planning
Context: Personal notes and ideas
Score: 67%

Discussion about code quality and craftsmanship
in the development process.
```

- **Path**: Collection-relative path (e.g., `docs/guide.md`)
- **Docid**: Short hash identifier (e.g., `#a1b2c3`) - use with `qmd get #a1b2c3`
- **Title**: Extracted from document (first heading or filename)
- **Context**: Path context if configured via `qmd context add`
- **Score**: Color-coded (green >70%, yellow >40%, dim otherwise)
- **Snippet**: Context around match with query terms highlighted

### Examples

```sh
# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"

# Output as markdown for LLM context
qmd search --md --full "error handling"

# JSON output for scripting
qmd query --json "quarterly reports"

# Use separate index for different knowledge base
qmd --index work search "quarterly reports"
```

### Index Maintenance

```sh
# Show index status and collections with contexts
qmd status

# Re-index all collections
qmd update

# Re-index with git pull first (for remote repos)
qmd update --pull

# Get document by filepath (with fuzzy matching suggestions)
qmd get notes/meeting.md

# Get document by docid (from search results)
qmd get "#abc123"

# Get document starting at line 50, max 100 lines
qmd get notes/meeting.md:50 -l 100

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Get multiple documents by comma-separated list (supports docids)
qmd multi-get "doc1.md, doc2.md, #abc123"

# Limit multi-get to files under 20KB
qmd multi-get "docs/*.md" --max-bytes 20480

# Output multi-get as JSON for agent processing
qmd multi-get "docs/*.md" --json

# Clean up cache and orphaned data
qmd cleanup
```

## Data Storage

The index is stored in `~/.cache/qmd/index.sqlite`.

### Schema

```sql
collections      -- Indexed directories with name and glob patterns
path_contexts    -- Context descriptions by virtual path (qmd://...)
documents        -- Markdown content with metadata and docid (6-char hash)
documents_fts    -- FTS5 full-text index
content_vectors  -- Embedding chunks (hash, seq, pos, 900 tokens each)
vectors_vec      -- sqlite-vec vector index (hash_seq key)
llm_cache        -- Cached LLM responses (query expansion, rerank scores)
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `XDG_CACHE_HOME` | `~/.cache` | Cache directory location |

## How It Works

### Indexing Flow

```
Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
                                                                        │
                                                                        ▼
                                                                 Generate docid
                                                                  (6-char hash)
                                                                        │
                                                                        ▼
                                                                Store in SQLite
                                                                        │
                                                                        ▼
                                                                   FTS5 Index
```
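
The docid is derived from the document content, which is what keeps references stable across re-indexes. A sketch of the idea (the README doesn't state which hash function is used, so SHA-256 is an assumption here; only "6-char content hash" comes from the docs):

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: a stable 6-character id derived from content.
function docid(content: string): string {
  return createHash("sha256").update(content).digest("hex").slice(0, 6);
}

console.log(docid("# Q4 Planning\n...")); // same content => same docid
```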

### Embedding Flow

Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:

```
Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                     │                       "title | text"        embedBatch()
                     │
                     └─► Chunks stored with:
                           - hash: document hash
                           - seq:  chunk sequence (0, 1, 2...)
                           - pos:  character position in original
```
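
QMD's `embedBatch()` wrapper isn't shown in this README, but the underlying node-llama-cpp calls look roughly like this. A sketch, using the embedding model URI from the Model Configuration section below and the document prompt format described there:

```typescript
import { getLlama, resolveModelFile } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: await resolveModelFile(
    "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf"
  ),
});
const embedding = await model.createEmbeddingContext();

// One chunk, formatted as "title | text" before embedding.
const chunk = "title: Q4 Planning | text: Discussion about code quality...";
const { vector } = await embedding.getEmbeddingFor(chunk);
console.log(vector.length);
```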

### Smart Chunking

Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.

**Break Point Scores:**

| Pattern | Score | Description |
|---------|-------|-------------|
| `# Heading` | 100 | H1 - major section |
| `## Heading` | 90 | H2 - subsection |
| `### Heading` | 80 | H3 |
| `#### Heading` | 70 | H4 |
| `##### Heading` | 60 | H5 |
| `###### Heading` | 50 | H6 |
| ` ``` ` | 80 | Code block boundary |
| `---` / `***` | 60 | Horizontal rule |
| Blank line | 20 | Paragraph boundary |
| `- item` / `1. item` | 5 | List item |
| Line break | 1 | Minimal break |

**Algorithm:**

1. Scan document for all break points with scores
2. When approaching the 900-token target, search a 200-token window before the cutoff
3. Score each break point: `finalScore = baseScore × (1 - (distance/window)² × 0.7)`
4. Cut at the highest-scoring break point

The squared distance decay means a heading 200 tokens back (score ~30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
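
A sketch of the window scoring from step 3, with illustrative types (the real chunker works on tokenized markdown, not a flat candidate list like this):

```typescript
interface BreakPoint {
  pos: number;       // token offset of the candidate break
  baseScore: number; // from the break-point table above
}

const WINDOW = 200; // tokens searched before the ~900-token target

function pickBreak(candidates: BreakPoint[], target: number): BreakPoint | undefined {
  let best: { bp: BreakPoint; score: number } | undefined;
  for (const bp of candidates) {
    const distance = target - bp.pos;
    if (distance < 0 || distance > WINDOW) continue; // only look back within the window
    // finalScore = baseScore × (1 - (distance/window)² × 0.7)
    const score = bp.baseScore * (1 - (distance / WINDOW) ** 2 * 0.7);
    if (!best || score > best.score) best = { bp, score };
  }
  return best?.bp;
}

// A heading 200 tokens back scores 100 × (1 - 0.7) = 30 and still beats a
// plain line break (score 1) sitting right at the target.
console.log(pickBreak([{ pos: 700, baseScore: 100 }, { pos: 900, baseScore: 1 }], 900));
```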

**Code Fence Protection:** Break points inside code blocks are ignored—code stays together. If a code block exceeds the chunk size, it's kept whole when possible.

### Query Flow (Hybrid)

```
Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
                                      │
              each query ──► FTS (BM25)       ──► ranked list
                        └──► Vector Search    ──► ranked list
                                      │
                                      ▼
                     RRF Fusion (k=60)
                       Original query ×2 weight
                       Top-rank bonus: +0.05 (#1), +0.02 (#2-3)
                                      │
                                      ▼
                     Top 30 candidates
                                      │
                                      ▼
                     LLM Re-ranking
                       (yes/no + logprob confidence)
                                      │
                                      ▼
                     Position-Aware Blend
                       Rank 1-3:  75% RRF / 25% reranker
                       Rank 4-10: 60% RRF / 40% reranker
                       Rank 11+:  40% RRF / 60% reranker
                                      │
                                      ▼
                     Final Results
```

## Model Configuration

Models are configured in `src/llm.ts` as HuggingFace URIs:

```typescript
const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";
```

### EmbeddingGemma Prompt Format

```
// For queries
"task: search result | query: {query}"

// For documents
"title: {title} | text: {content}"
```

### Qwen3-Reranker

The reranker uses node-llama-cpp's `createRankingContext()` and `rankAndSort()` API for cross-encoder reranking and returns documents sorted by relevance score (0.0 - 1.0).
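
A sketch of what that call looks like with node-llama-cpp (the model URI matches the default above; the candidate texts are invented for illustration):

```typescript
import { getLlama, resolveModelFile } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: await resolveModelFile(
    "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf"
  ),
});
const ranking = await model.createRankingContext();

const candidates = [
  "Authentication is handled via OAuth tokens.",
  "The deployment pipeline runs on every merge.",
];

// rankAndSort() orders the documents by relevance to the query.
const ranked = await ranking.rankAndSort("user authentication", candidates);
for (const { document, score } of ranked) {
  console.log(score.toFixed(2), document);
}
```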

### Qwen3 (Query Expansion)

The fine-tuned Qwen3 1.7B model generates query variations via `LlamaChatSession`.
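
A sketch of that call path (the prompt wording here is invented; the fine-tuned model presumably expects the prompt format it was trained on):

```typescript
import { getLlama, resolveModelFile, LlamaChatSession } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: await resolveModelFile(
    "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf"
  ),
});
const context = await model.createContext();
const session = new LlamaChatSession({ contextSequence: context.getSequence() });

const variations = await session.prompt(
  "Rewrite this search query two different ways: quarterly planning process"
);
console.log(variations);
```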

## License

MIT