@ambicuity/kindx 0.1.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,12 +1,60 @@
1
- # KINDX -- On-Device Document Intelligence Engine
1
+ ```
2
+ ██╗ ██╗██╗███╗ ██╗██████╗ ██╗ ██╗
3
+ ██║ ██╔╝██║████╗ ██║██╔══██╗╚██╗██╔╝
4
+ █████╔╝ ██║██╔██╗ ██║██║ ██║ ╚███╔╝
5
+ ██╔═██╗ ██║██║╚██╗██║██║ ██║ ██╔██╗
6
+ ██║ ██╗██║██║ ╚████║██████╔╝██╔╝ ██╗
7
+ ╚═╝ ╚═╝╚═╝╚═╝ ╚═══╝╚═════╝ ╚═╝ ╚═╝
8
+ ```
9
+
10
+ # KINDX — Enterprise-Grade On-Device Knowledge Infrastructure
11
+
12
+ [![MCP-Compatible](https://img.shields.io/badge/MCP-Compatible-6f42c1?style=flat-square&logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMTYiIGhlaWdodD0iMTYiIHZpZXdCb3g9IjAgMCAxNiAxNiIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cmVjdCB3aWR0aD0iMTYiIGhlaWdodD0iMTYiIHJ4PSIyIiBmaWxsPSIjNmY0MmMxIi8+PC9zdmc+)](https://modelcontextprotocol.io)
13
+ [![Local-First](https://img.shields.io/badge/Local--First-Privacy%20Guaranteed-22c55e?style=flat-square)](https://github.com/ambicuity/KINDX)
14
+ [![Node.js](https://img.shields.io/badge/Node.js-22%2B-339933?style=flat-square&logo=node.js&logoColor=white)](https://nodejs.org)
15
+ [![TypeScript](https://img.shields.io/badge/TypeScript-Strict-3178C6?style=flat-square&logo=typescript&logoColor=white)](https://www.typescriptlang.org)
16
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./LICENSE)
17
+ [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/ambicuity/KINDX/badge)](https://scorecard.dev/viewer/?uri=github.com/ambicuity/KINDX)
18
+
19
+ **Knowledge Infrastructure for AI Agents.** KINDX is a high-performance, local-first backend for Agentic Context Injection — enabling AI agents to perform deterministic, privacy-preserving Contextual Retrieval over enterprise corpora without a single byte leaving the edge.
20
+
21
+ KINDX combines BM25 full-text retrieval, vector semantic retrieval, and LLM re-ranking — all running locally via `node-llama-cpp` with GGUF models. It is designed to be called by agents, not typed by humans.
22
+
23
+ > Read the progress log in the [CHANGELOG](./CHANGELOG.md).
24
+
25
+ ---
26
+
27
+ ## Why KINDX?
28
+
29
+ The local RAG ecosystem is fragmenting: LanceDB is moving to multimodal ML infrastructure, Chroma is moving to managed cloud, Orama is moving to the browser. **KINDX is the only tool that stays on the desktop and speaks the agent's native language.**
30
+
31
+ | Capability | KINDX | LanceDB | Chroma | Orama | Khoj |
32
+ |---|:---:|:---:|:---:|:---:|:---:|
33
+ | **Local-first / Air-gapped** | ✅ | ✅ | ❌ | ✅ | ✅ |
34
+ | **MCP Server (agent protocol)** | ✅ | ❌ | ❌ | ❌ | ❌ |
35
+ | **On-device GGUF inference** | ✅ | ❌ | ❌ | ❌ | Partial |
36
+ | **Hybrid BM25 + Vector + Rerank** | ✅ | Partial | Partial | ✅ | ❌ |
37
+ | **Structured agent output (JSON/CSV/XML)** | ✅ | ❌ | ❌ | ❌ | ❌ |
38
+ | **CLI-first / `child_process` invocable** | ✅ | ❌ | ❌ | ❌ | ❌ |
39
+
40
+ KINDX is the only product in this category that combines local-first privacy, first-class MCP support, on-device GGUF inference, structured pipeline output, and CLI invocability — making it the ideal Memory Node for MCP-compatible autonomous agents (Claude Code, Cursor, Continue.dev, AutoGPT, and beyond).
41
+
42
+ ---
43
+
44
+ ## The Three Pillars
2
45
 
3
- A local-first search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Designed for agentic workflows.
46
+ ### 1. Deterministic Privacy
47
+ Every inference — embedding, reranking, query expansion — runs on local GGUF models via `node-llama-cpp`. Sensitive documents never leave the edge. There is no telemetry, no API call, no cloud dependency.
4
48
 
5
- KINDX combines BM25 full-text search, vector semantic search, and LLM re-ranking -- all running locally via node-llama-cpp with GGUF models.
49
+ ### 2. Agent-Native Design
50
+ KINDX is architected for `child_process` invocation from autonomous agents (AutoGPT, OpenDevin, Claude Code, LangGraph). The `--json`, `--files`, `--csv`, and `--xml` output flags produce structured payloads for agent consumption. The MCP server provides tight protocol-level integration.
6
51
 
7
- You can read more about KINDX's progress in the [CHANGELOG](./CHANGELOG.md).
52
+ ### 3. Hybrid Precision (Neural-Symbolic Retrieval)
53
+ Position-Aware Blending merges BM25 symbolic retrieval with neural vector similarity and LLM cross-encoder reranking. Reciprocal Rank Fusion (RRF, k=60) combined with a top-rank bonus keeps exact-match signals from being diluted during fusion. See the [Architecture](#architecture) section for the full pipeline specification.
54
+
55
+ ---
8
56
 
9
- ## Quick Start
57
+ ## Quick Start — Local-First Agentic Stack
10
58
 
11
59
  ```bash
12
60
  # Install globally (Node or Bun)
@@ -14,72 +62,99 @@ npm install -g @ambicuity/kindx
14
62
  # or
15
63
  bun install -g @ambicuity/kindx
16
64
 
17
- # Or run directly
65
+ # Or invoke without installing
18
66
  npx @ambicuity/kindx ...
19
67
  bunx @ambicuity/kindx ...
20
68
 
21
- # Create collections for your notes, docs, and meeting transcripts
69
71
+ # Register collections
22
72
  kindx collection add ~/notes --name notes
23
73
  kindx collection add ~/Documents/meetings --name meetings
24
74
  kindx collection add ~/work/docs --name docs
25
75
 
26
- # Add context to help with search results
27
- kindx context add kindx://notes "Personal notes and ideas"
28
- kindx context add kindx://meetings "Meeting transcripts and notes"
29
- kindx context add kindx://docs "Work documentation"
76
+ # Annotate collections with semantic context
77
+ kindx context add kindx://notes "Personal documents and ideation corpus"
78
+ kindx context add kindx://meetings "Meeting transcripts and decision records"
79
+ kindx context add kindx://docs "Engineering documentation corpus"
30
80
 
31
- # Generate embeddings for semantic search
81
+ # Build the vector index from corpus
32
82
  kindx embed
33
83
 
34
- # Search across everything
35
- kindx search "project timeline" # Fast keyword search
36
- kindx vsearch "how to deploy" # Semantic search
37
- kindx query "quarterly planning process" # Hybrid + reranking (best quality)
84
+ # Contextual Retrieval — choose retrieval mode
85
+ kindx search "project timeline" # BM25 full-text retrieval (fast)
86
+ kindx vsearch "how to deploy" # Neural vector retrieval
87
+ kindx query "quarterly planning process" # Hybrid + reranking (highest precision)
38
88
 
39
- # Get a specific document
89
+ # Neural Extraction — retrieve a specific document
40
90
  kindx get "meetings/2024-01-15.md"
41
91
 
42
- # Get a document by docid (shown in search results)
92
+ # Neural Extraction by docid (shown in retrieval results)
43
93
  kindx get "#abc123"
44
94
 
45
- # Get multiple documents by glob pattern
95
+ # Bulk Neural Extraction via glob pattern
46
96
  kindx multi-get "journals/2025-05*.md"
47
97
 
48
- # Search within a specific collection
98
+ # Scoped Contextual Retrieval within a collection
49
99
  kindx search "API" -c notes
50
100
 
51
- # Export all matches for an agent
101
+ # Corrective feedback (Phase 1)
102
+ kindx feedback --irrelevant --query "deploy k8s" --chunk "#abc123:2"
103
+ kindx feedback --relevant --query "deploy k8s" --chunk "#abc123:2"
104
+ kindx feedback list --query "deploy"
105
+
106
+ # Export full match set for agent pipeline
52
107
  kindx search "API" --all --files --min-score 0.3
108
  ```

+ > **Pro-tip (Small Collections):** For collections under ~100 documents, `kindx search` (BM25) is incredibly fast and often sufficient. The query expansion and reranking overhead of `kindx query` is best suited for larger, noisier corporate datasets.
54
111
 
55
- ### Using with AI Agents
112
+ ---
113
+
114
+ ## Agent-Native Integration
56
115
 
57
- KINDX's `--json` and `--files` output formats are designed for agentic workflows:
116
+ KINDX's primary interface is structured output for agent pipelines. Treat CLI invocations as RPC calls.
58
117
 
59
118
  ```bash
60
- # Get structured results for an LLM
119
+ # Structured JSON payload for LLM context injection
61
120
  kindx search "authentication" --json -n 10
62
121
 
63
- # List all relevant files above a threshold
122
+ # Filepath manifest above relevance threshold for agent file consumption
64
123
  kindx query "error handling" --all --files --min-score 0.4
65
124
 
66
- # Retrieve full document content
125
+ # Full document content for agent context window
67
126
  kindx get "docs/api-reference.md" --full
68
127
  ```
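
From the agent side, the invocations above reduce to building an argv and parsing stdout. A minimal Python sketch, assuming only the documented flags (`--json`, `-n`, `-c`); the JSON schema is not specified here, so the result is returned as whatever `json.loads` yields:

```python
import json
import shutil
import subprocess

def build_argv(query, n=10, collection=None):
    """Build the CLI invocation; flags mirror the README (--json, -n, -c)."""
    argv = ["kindx", "search", query, "--json", "-n", str(n)]
    if collection:
        argv += ["-c", collection]
    return argv

def kindx_search(query, n=10, collection=None):
    """Treat the CLI invocation as an RPC call and parse its JSON payload."""
    if shutil.which("kindx") is None:
        raise FileNotFoundError("kindx CLI not found on PATH")
    proc = subprocess.run(build_argv(query, n, collection),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```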
69
128
 
70
- ### MCP Server
129
+ > **Pro-tip (Agentic Performance):** Prefer `kindx query` over `kindx search` for open-ended agent instructions. The query expansion and LLM re-ranking pipeline surfaces semantically adjacent documents that keyword retrieval misses.
130
+
131
+ > **Pro-tip (Context Window Budgeting):** Use `--min-score 0.4` with `--files` to produce a ranked manifest, then `multi-get` only the top-k assets. This two-phase pattern prevents context window overflow while preserving retrieval precision.
132
+
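The two-phase budgeting pattern can be sketched in Python. The manifest layout follows the documented `--files` format (`docid,score,filepath,context`); the sample lines below are hypothetical, not real CLI output:

```python
def parse_files_manifest(text):
    """Parse --files lines: docid,score,filepath,context (context may contain commas)."""
    rows = []
    for line in text.strip().splitlines():
        docid, score, filepath, context = line.split(",", 3)
        rows.append({"docid": docid, "score": float(score),
                     "filepath": filepath, "context": context})
    return rows

def top_k_paths(text, k):
    """Phase 1: rank the manifest. Phase 2 would `kindx multi-get` only these paths."""
    ranked = sorted(parse_files_manifest(text), key=lambda r: r["score"], reverse=True)
    return [r["filepath"] for r in ranked[:k]]

# Hypothetical output of `kindx query ... --all --files --min-score 0.4`
manifest = """\
#a1,0.91,docs/errors.md,Engineering documentation corpus
#b2,0.55,notes/retry.md,Personal documents and ideation corpus
#c3,0.47,docs/logging.md,Engineering documentation corpus"""
```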
133
+ ### Typed SDK Packages
134
+
135
+ KINDX now includes typed client packages and integration scaffolding:
136
+
137
+ - `@ambicuity/kindx-schemas` — shared Zod schemas for KINDX MCP/HTTP request and response contracts.
138
+ - `@ambicuity/kindx-client` — TypeScript client for `/query` and MCP tool calls (`get`, `multi_get`, `status`, `kindx_feedback`, and memory tools).
139
+ - `python/kindx-langchain` — installable Python retriever wrapper for LangChain-style document retrieval.
140
+ - [`reference/integrations/agent-templates.md`](reference/integrations/agent-templates.md) — tested MCP configuration templates for OpenDevin, Goose, and Claude Code.
141
+
142
+ ---
143
+
144
+ ## MCP Server
71
145
 
72
- Although the tool works perfectly fine when you just tell your agent to use it on the command line, it also exposes an MCP (Model Context Protocol) server for tighter integration.
146
+ KINDX exposes a Model Context Protocol (MCP) server for tool-call integration with any MCP-compatible agent runtime.
73
147
 
74
- Tools exposed:
75
- - `kindx_search` -- Fast BM25 keyword search (supports collection filter)
76
- - `kindx_vector_search` -- Semantic vector search (supports collection filter)
77
- - `kindx_deep_search` -- Deep search with query expansion and reranking (supports collection filter)
78
- - `kindx_get` -- Retrieve document by path or docid (with fuzzy matching suggestions)
79
- - `kindx_multi_get` -- Retrieve multiple documents by glob pattern, list, or docids
80
- - `kindx_status` -- Index health and collection info
148
+ **Registered Tools:**
149
+ - `kindx_search` — BM25 Contextual Retrieval (supports collection filter)
150
+ - `kindx_vector_search` — Neural vector Contextual Retrieval (supports collection filter)
151
+ - `kindx_deep_search` — Hybrid Neural-Symbolic retrieval with query expansion and reranking (supports collection filter)
152
+ - `kindx_get` — Neural Extraction by path or docid (with fuzzy matching fallback)
153
+ - `kindx_multi_get` — Bulk Neural Extraction by glob pattern, list, or docids
154
+ - `kindx_status` — Index health and collection inventory
155
+ - `kindx_feedback` — Store relevance feedback (`relevant` / `irrelevant`) for query+chunk pairs
81
156
 
82
- Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json`):
157
+ **Claude Desktop configuration** (`~/Library/Application Support/Claude/claude_desktop_config.json`):
83
158
 
84
159
  ```json
85
160
  {
@@ -92,28 +167,32 @@ Claude Desktop configuration (`~/Library/Application Support/Claude/claude_deskt
92
167
  }
93
168
  ```
94
169
 
95
- #### HTTP Transport
170
+ ### HTTP Transport
96
171
 
97
- By default, KINDX's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:
172
+ By default, the MCP server uses stdio (launched as a subprocess per client). For a shared, long-lived server that avoids repeated model loading across agent sessions, use the HTTP transport:
98
173
 
99
174
  ```bash
100
- # Foreground (Ctrl-C to stop)
175
+ # Foreground
101
176
  kindx mcp --http # localhost:8181
102
177
  kindx mcp --http --port 8080 # custom port
103
178
 
104
- # Background daemon
105
- kindx mcp --http --daemon # start, writes PID to ~/.cache/kindx/mcp.pid
106
- kindx mcp stop # stop via PID file
107
- kindx status # shows "MCP: running (PID ...)" when active
179
+ # Persistent daemon
180
+ kindx mcp --http --daemon # writes PID to ~/.cache/kindx/mcp.pid
181
+ kindx mcp stop # terminate via PID file
182
+ kindx status # reports "MCP: running (PID ...)"
108
183
  ```
109
184
 
110
- The HTTP server exposes two endpoints:
111
- - `POST /mcp` -- MCP Streamable HTTP (JSON responses, stateless)
112
- - `GET /health` -- liveness check with uptime
185
+ Endpoints:
186
+ - `POST /mcp` — MCP Streamable HTTP (JSON, stateless)
187
+ - `GET /health` — liveness probe with uptime
113
188
 
114
- LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 min idle and transparently recreated on the next request (~1s penalty, models remain loaded).
189
+ LLM models remain resident in VRAM across requests. Embedding and reranking contexts are disposed after 5 min idle and transparently recreated on next request (~1 s penalty, models remain warm).
115
190
 
116
- Point any MCP client at `http://localhost:8181/mcp` to connect.
191
+ Point any MCP client at `http://localhost:8181/mcp`.
192
+
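A stateless tool call needs nothing beyond the standard library. This sketch assumes the JSON-RPC 2.0 `tools/call` envelope defined by the MCP specification and the `kindx_search` tool listed above; the argument key names (`query`, `n`) and header requirements are assumptions that may differ by server version:

```python
import json
import urllib.request

def tool_call_request(tool, arguments, request_id=1):
    """Build a JSON-RPC 2.0 tools/call envelope as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

def post_mcp(body, url="http://localhost:8181/mcp"):
    """POST one stateless request (requires a running `kindx mcp --http`)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json, text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = tool_call_request("kindx_search", {"query": "authentication", "n": 5})
```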
193
+ > **Pro-tip (Multi-Agent Deployments):** Run `kindx mcp --http --daemon` once at agent-cluster startup. All child agents share a single warm model context, eliminating per-invocation model load overhead (~3–8 s per cold start).
194
+
195
+ ---
117
196
 
118
197
  ## Architecture
119
198
 
@@ -164,7 +243,7 @@ graph TB
164
243
  CAT --> SQLite
165
244
  ```
166
245
 
167
- ### Hybrid Search Pipeline
246
+ ### Hybrid Retrieval Pipeline
168
247
 
169
248
  ```mermaid
170
249
  flowchart TD
@@ -219,10 +298,10 @@ flowchart TD
219
298
 
220
299
  ### Score Normalization and Fusion
221
300
 
222
- #### Search Backends
301
+ #### Retrieval Backends
223
302
 
224
303
  - **BM25 (FTS5)**: `Math.abs(score)` normalized via `score / 10`
225
- - **Vector search**: `1 / (1 + distance)` cosine similarity
304
+ - **Vector retrieval**: `1 / (1 + distance)` cosine similarity
226
305
 
227
306
  #### Fusion Strategy
228
307
 
@@ -231,15 +310,17 @@ The `query` command uses Reciprocal Rank Fusion (RRF) with position-aware blendi
231
310
  1. **Query Expansion**: Original query (x2 for weighting) + 1 LLM variation
232
311
  2. **Parallel Retrieval**: Each query searches both FTS and vector indexes
233
312
  3. **RRF Fusion**: Combine all result lists using `score = Sum(1/(k+rank+1))` where k=60
234
- 4. **Top-Rank Bonus**: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
313
+ 4. **Top-Rank Bonus**: documents ranking #1 in any list get +0.05, #2-3 get +0.02
235
314
  5. **Top-K Selection**: Take top 30 candidates for reranking
236
- 6. **Re-ranking**: LLM scores each document (yes/no with logprobs confidence)
315
+ 6. **Re-ranking**: LLM scores each asset (yes/no with logprobs confidence)
237
316
  7. **Position-Aware Blending**:
238
317
  - RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
239
318
  - RRF rank 4-10: 60% retrieval, 40% reranker
240
319
  - RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
241
320
 
242
- Why this approach: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
321
+ **Design rationale:** Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from overriding high-confidence retrieval signals.
322
+
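The fusion math above condenses into a short sketch: score normalization, RRF with the top-rank bonus, and the position-aware blend. Zero-based rank indexing and tie handling are assumptions not fixed by the spec:

```python
def norm_bm25(score):
    """FTS5 BM25 scores are negative; take abs then divide by 10 (per the docs)."""
    return abs(score) / 10

def norm_vector(distance):
    """Convert a vector distance into a similarity in (0, 1]."""
    return 1 / (1 + distance)

def rrf_fuse(result_lists, k=60):
    """RRF: score = sum(1/(k+rank+1)) over every list a doc appears in,
    plus the top-rank bonus (+0.05 for #1, +0.02 for #2-3)."""
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
            if rank == 0:
                scores[doc] += 0.05
            elif rank in (1, 2):
                scores[doc] += 0.02
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def blend(rrf_rank, retrieval_score, rerank_score):
    """Position-aware blending: trust retrieval at the top, the reranker below."""
    if rrf_rank <= 3:
        w = 0.75
    elif rrf_rank <= 10:
        w = 0.60
    else:
        w = 0.40
    return w * retrieval_score + (1 - w) * rerank_score
```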
323
+ ---
243
324
 
244
325
  ## Requirements
245
326
 
@@ -253,33 +334,53 @@ Why this approach: Pure RRF can dilute exact matches when expanded queries don't
253
334
  brew install sqlite
254
335
  ```
255
336
 
337
+ ### WSL2 GPU Support (Windows)
338
+
339
+ If you are running KINDX inside WSL2 with an NVIDIA GPU, `node-llama-cpp` might fall back to the slow, non-conformant Vulkan translation layer (`dzn`), causing `vsearch` and `query` to take 60-90 seconds or crash.
340
+
341
+ To enable native CUDA GPU acceleration, install the CUDA toolkit runtime libraries (do *not* install the driver meta-packages; WSL2 passes the driver through from Windows):
342
+
343
+ ```bash
344
+ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
345
+ sudo dpkg -i cuda-keyring_1.1-1_all.deb
346
+ sudo apt-get update
347
+ sudo apt-get install cuda-toolkit-13-1 # or cuda-toolkit-12-6
348
+ ```
349
+
256
350
  ### GGUF Models (via node-llama-cpp)
257
351
 
258
352
  KINDX uses three local GGUF models (auto-downloaded on first use):
259
353
 
260
- - `embeddinggemma-300M-Q8_0` -- embedding model
261
- - `qwen3-reranker-0.6b-q8_0` -- cross-encoder reranker
262
- - `kindx-query-expansion-1.7B-q4_k_m` -- query expansion (fine-tuned)
354
+ - `embeddinggemma-300M-Q8_0` — embedding model
355
+ - `qwen3-reranker-0.6b-q8_0` — cross-encoder reranker
356
+ - `kindx-query-expansion-1.7B-q4_k_m` — query expansion (fine-tuned)
263
357
 
264
358
  Models are downloaded from HuggingFace and cached in `~/.cache/kindx/models/`.
265
359
 
360
+ > **Pro-tip (Air-Gapped Deployments):** Pre-download all three GGUF files and place them in `~/.cache/kindx/models/`. KINDX resolves models from the local cache first; no network access is required at runtime.
361
+
266
362
  ### Custom Embedding Model
267
363
 
268
- Override the default embedding model via the `KINDX_EMBED_MODEL` environment variable. This is useful for multilingual corpora (e.g. Chinese, Japanese, Korean) where embeddinggemma-300M has limited coverage.
364
+ Override the default embedding model via the `KINDX_EMBED_MODEL` environment variable. Required for multilingual corpora (CJK, Arabic, etc.) where `embeddinggemma-300M` has limited coverage.
269
365
 
270
366
  ```bash
271
- # Use Qwen3-Embedding-0.6B for better multilingual (CJK) support
272
- export KINDX_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/qwen3-embedding-0.6b-q8_0.gguf"
367
+ # Use Qwen3-Embedding-0.6B for multilingual corpus (CJK) support
368
+ export KINDX_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/qwen3-embedding-0.6b-Q8_0.gguf"
273
369
 
274
- # After changing the model, re-embed all collections:
370
+ # Force re-embed all documents after model switch
275
371
  kindx embed -f
276
372
  ```
277
373
 
278
374
  Supported model families:
279
- - **embeddinggemma** (default) -- English-optimized, small footprint
280
- - **Qwen3-Embedding** -- Multilingual (119 languages including CJK), MTEB top-ranked
281
375
 
282
- Note: When switching embedding models, you must re-index with `kindx embed -f` since vectors are not cross-compatible between models. The prompt format is automatically adjusted for each model family.
376
+ | Model | Use Case |
377
+ |---|---|
378
+ | `embeddinggemma` (default) | English-optimized, minimal footprint |
379
+ | `Qwen3-Embedding` | Multilingual (119 languages including CJK), MTEB top-ranked |
380
+
381
+ > **Note:** Switching embedding models requires full re-indexing (`kindx embed -f`). Vectors are model-specific and not cross-compatible. The prompt format is automatically adjusted per model family.
382
+
383
+ ---
283
384
 
284
385
  ## Installation
285
386
 
@@ -298,18 +399,110 @@ npm install
298
399
  npm link
299
400
  ```
300
401
 
301
- ## Usage
402
+ ### Troubleshooting: Permission Errors (`EACCES`)
403
+
404
+ If you see `npm error code EACCES` when running `npm install -g`, your system npm is configured to write to a directory owned by root (e.g. `/usr/local/lib/node_modules`). **Do not use `sudo npm install -g`** — this is a security risk.
405
+
406
+ The recommended fix is to use a Node version manager so that npm writes to a user-owned prefix:
407
+
408
+ **Option 1 — `nvm` (most common)**
409
+
410
+ ```bash
411
+ # Install nvm
412
+ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.2/install.sh | bash
413
+ # Restart your shell, then:
414
+ nvm install --lts
415
+ nvm use --lts
416
+ npm install -g @ambicuity/kindx
417
+ ```
302
418
 
303
- ### Collection Management
419
+ **Option 2 — `mise` (polyglot version manager)**
304
420
 
305
421
  ```bash
306
- # Create a collection from current directory
422
+ # Install mise
423
+ curl https://mise.run | sh
424
+ # Restart your shell, then:
425
+ mise use -g node@lts
426
+ npm install -g @ambicuity/kindx
427
+ ```
428
+
429
+ **Option 3 — configure a user-writable npm prefix**
430
+
431
+ ```bash
432
+ mkdir -p ~/.npm-global
433
+ npm config set prefix ~/.npm-global
434
+ # Add to your shell profile (~/.zshrc or ~/.bashrc):
435
+ export PATH="$HOME/.npm-global/bin:$PATH"
436
+ # Then:
437
+ npm install -g @ambicuity/kindx
438
+ ```
439
+
440
+ After any of the above, `kindx --version` should print the installed version.
441
+
442
+
+ ---
+
443
+ ## Usage Reference
444
+
445
+ ### Command Index
446
+
447
+ Top-level commands:
448
+
449
+ ```bash
450
+ kindx query <query> # Hybrid search with expansion + reranking
451
+ kindx search <query> # BM25 full-text search
452
+ kindx vsearch <query> # Vector similarity search
453
+ kindx get <file> [--from N] # Retrieve one document (optionally from line offset)
454
+ kindx multi-get <pattern> # Retrieve many documents by glob/list/docid
455
+ kindx embed # Generate or refresh embeddings
456
+ kindx pull # Download/check the default local models
457
+ kindx update # Re-index configured collections
458
+ kindx watch # Keep the index fresh in the background
459
+ kindx status # Report index, collection, and MCP health
460
+ kindx cleanup # Clear cache/orphaned rows and vacuum the DB
461
+ kindx mcp # Start the MCP server (stdio by default)
462
+ kindx migrate <target> <path> # Import from Chroma or OpenCLAW
463
+ kindx skill install # Install the packaged Claude skill locally
464
+ kindx --skill # Print the packaged skill markdown
465
+ kindx --version # Print the installed CLI version
466
+ ```
467
+
468
+ KINDX opens SQLite indexes with `journal_mode=WAL` and `busy_timeout=5000`, so background writers
469
+ (for example `kindx watch`) and MCP readers can run concurrently with fewer lock conflicts.
470
+
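Those two pragmas are plain SQLite and can be verified directly; the path below is a stand-in for `~/.cache/kindx/index.sqlite`:

```python
import os
import sqlite3
import tempfile

# WAL needs a real file, not :memory:
path = os.path.join(tempfile.mkdtemp(), "index.sqlite")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")   # readers don't block the writer
conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked database
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
conn.close()
```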
471
+ Collection subcommands:
472
+
473
+ ```bash
474
+ kindx collection add <path> [--name NAME] [--mask GLOB]
475
+ kindx collection list
476
+ kindx collection show <name>
477
+ kindx collection remove <name>
478
+ kindx collection rename <old> <new>
479
+ kindx collection update-cmd <name> [command]
480
+ kindx collection include <name>
481
+ kindx collection exclude <name>
482
+ ```
483
+
484
+ Context and MCP subcommands:
485
+
486
+ ```bash
487
+ kindx context add [path] "text"
488
+ kindx context list
489
+ kindx context rm <path>
490
+
491
+ kindx mcp --http
492
+ kindx mcp --http --daemon
493
+ kindx mcp stop
494
+ ```
495
+
496
+ ### Collection Management
497
+
498
+ ```bash
499
+ # Register a collection from current directory
307
500
  kindx collection add . --name myproject
308
501
 
309
- # Create a collection with explicit path and custom glob mask
502
+ # Register with explicit path and glob mask
310
503
  kindx collection add ~/Documents/notes --name notes --mask "**/*.md"
311
504
 
312
- # List all collections
505
+ # List all registered collections
313
506
  kindx collection list
314
507
 
315
508
  # Remove a collection
@@ -318,137 +511,204 @@ kindx collection remove myproject
318
511
  # Rename a collection
319
512
  kindx collection rename myproject my-project
320
513
 
321
- # List files in a collection
514
+ # Show collection details and current settings
515
+ kindx collection show my-project
516
+
517
+ # Configure a pre-refresh command
518
+ kindx collection update-cmd my-project "git pull --ff-only"
519
+
520
+ # Include or exclude a collection from default queries
521
+ kindx collection include my-project
522
+ kindx collection exclude archive
523
+
524
+ # List documents within a collection
322
525
  kindx ls notes
323
526
  kindx ls notes/subfolder
324
527
  ```
325
528
 
326
- ### Generate Vector Embeddings
529
+ ### YAML Configuration
530
+
531
+ By default, collection settings are stored in `~/.config/kindx/index.yml`. The config directory is resolved as `KINDX_CONFIG_DIR` (if set), then `XDG_CONFIG_HOME/kindx` (if set), then `~/.config/kindx`. Named indexes use `~/.config/kindx/{indexName}.yml` (default: `index.yml`). You can edit this file directly to configure `ignore` patterns and glob rules for files that should be skipped during indexing and search.
532
+
533
+ #### Example
534
+
535
+ ```yaml
536
+ collections:
537
+ docs:
538
+ path: ~/work/docs
539
+ pattern: "**/*.md"
540
+ ignore:
541
+ - "archive/**"
542
+ - "sessions/**"
543
+ - "**/*.draft.md"
544
+ ```
545
+
546
+ #### How it works
547
+
548
+ - `pattern` defines which files are included
549
+ - `ignore` excludes matching files and directories
550
+ - A file must match `pattern` **and not match any `ignore` rule** to be indexed
551
+ - Ignored files are skipped during indexing and will not appear in search results
552
+
553
+ #### Notes
554
+
555
+ - `ignore` is configured in YAML (no CLI support currently)
556
+ - Patterns are evaluated relative to the collection `path`.
557
+ - By default, `node_modules`, `.git`, `.cache`, `vendor`, `dist`, and `build` are already ignored; `ignore` adds custom exclusions.
558
+ - After editing `index.yml`, run `kindx update` to re-index with the updated rules.
559
+
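The rule composes as "matches `pattern` and matches no `ignore` entry". A rough Python sketch using `fnmatch`, which only approximates full `**` glob semantics; the real matcher may differ on edge cases:

```python
from fnmatch import fnmatch

def is_indexed(relpath, pattern, ignore):
    """Indexed iff relpath matches `pattern` and no `ignore` rule matches."""
    if not fnmatch(relpath, pattern):
        return False
    return not any(fnmatch(relpath, rule) for rule in ignore)

# Rules from the example config above
ignore_rules = ["archive/**", "sessions/**", "**/*.draft.md"]
```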
560
+ ### Vector Index Generation
327
561
 
328
562
  ```bash
329
563
  # Embed all indexed documents (900 tokens/chunk, 15% overlap)
330
564
  kindx embed
331
565
 
332
- # Force re-embed everything
566
+ # Force re-embed entire corpus
333
567
  kindx embed -f
334
568
  ```
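
The chunking parameters above (900 tokens per chunk, 15% overlap) imply a stride of `size - size * overlap`. An illustrative sketch over an already-tokenized sequence; the real chunker's tokenizer and boundary handling may differ:

```python
def chunk(tokens, size=900, overlap_frac=0.15):
    """Fixed-size chunks with fractional overlap: stride = size * (1 - overlap)."""
    step = max(1, int(size * (1 - overlap_frac)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

With the defaults, each 900-token chunk shares 135 tokens (15%) with its predecessor.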
335
569
 
336
570
  ### Context Management
337
571
 
338
- Context adds descriptive metadata to collections and paths, helping search understand your content.
572
+ Context annotations add semantic metadata to collections and paths, improving Contextual Retrieval precision.
339
573
 
340
574
  ```bash
341
- # Add context to a collection (using kindx:// virtual paths)
342
- kindx context add kindx://notes "Personal notes and ideas"
343
- kindx context add kindx://docs/api "API documentation"
575
+ # Annotate a collection (using kindx:// virtual paths)
576
+ kindx context add kindx://notes "Personal documents and ideation corpus"
577
+ kindx context add kindx://docs/api "API and integration documentation corpus"
344
578
 
345
- # Add context from within a collection directory
346
- cd ~/notes && kindx context add "Personal notes and ideas"
347
- cd ~/notes/work && kindx context add "Work-related notes"
579
+ # Annotate from within a corpus directory
580
+ cd ~/notes && kindx context add "Personal documents and ideas"
581
+ cd ~/notes/work && kindx context add "Work-related knowledge corpus"
348
582
 
349
- # Add global context (applies to all collections)
350
- kindx context add / "Knowledge base for my projects"
583
+ # Add global context (applies across all collections)
584
+ kindx context add / "Enterprise knowledge base for agent context injection"
351
585
 
352
- # List all contexts
586
+ # List all context annotations
353
587
  kindx context list
354
588
 
355
- # Remove context
589
+ # Remove context annotation
356
590
  kindx context rm kindx://notes/old
357
591
  ```
358
592
 
359
- ### Search Commands
593
+ ### Contextual Retrieval Commands
360
594
 
361
595
  ```
362
596
  +------------------------------------------------------------+
363
- | Search Modes |
597
+ | Retrieval Modes |
364
598
  +----------+-------------------------------------------------+
365
- | search | BM25 full-text search only |
366
- | vsearch | Vector semantic search only |
367
- | query | Hybrid: FTS + Vector + Query Expansion + Rerank |
599
+ | search | BM25 full-text retrieval only |
600
+ | vsearch | Neural vector retrieval only |
601
+ | query | Hybrid: FTS + Vector + Expansion + Rerank |
368
602
  +----------+-------------------------------------------------+
369
603
  ```
370
604
 
371
605
  ```bash
372
- # Full-text search (fast, keyword-based)
606
+ # Full-text Contextual Retrieval (fast, keyword-based)
373
607
  kindx search "authentication flow"
374
608
 
375
- # Vector search (semantic similarity)
609
+ # Neural vector Contextual Retrieval (semantic similarity)
376
610
  kindx vsearch "how to login"
377
611
 
378
- # Hybrid search with re-ranking (best quality)
612
+ # Hybrid Neural-Symbolic retrieval with re-ranking (highest precision)
379
613
  kindx query "user authentication"
380
614
  ```
381
615
 
382
- ### Options
616
+ ### CLI Options
383
617
 
384
618
  ```bash
385
- # Search options
619
+ # Retrieval options
386
620
  -n <num> # Number of results (default: 5, or 20 for --files/--json)
387
- -c, --collection # Restrict search to a specific collection
388
- --all # Return all matches (use with --min-score to filter)
389
- --min-score <num> # Minimum score threshold (default: 0)
390
- --full # Show full document content
391
- --line-numbers # Add line numbers to output
621
+ -c, --collection # Restrict retrieval to a specific collection
622
+ --all # Return all matches (combine with --min-score to filter)
623
+ --min-score <num> # Minimum relevance threshold (default: 0)
624
+ --full # Return full document content
625
+ --line-numbers # Annotate output with line numbers
392
626
  --explain # Include retrieval score traces (query, JSON/CLI output)
393
627
  --index <name> # Use named index
394
628
 
395
- # Output formats (for search and multi-get)
629
+ # Structured output formats (for agent pipeline consumption)
396
630
  --files # Output: docid,score,filepath,context
397
- --json # JSON output with snippets
631
+ --json # JSON payload with snippets
398
632
  --csv # CSV output
399
633
  --md # Markdown output
400
634
  --xml # XML output
401
635
 
402
- # Get options
403
- kindx get <file>[:line] # Get document, optionally starting at line
636
+ # Neural Extraction options
637
+ kindx get <file>[:line] # Extract document, optionally from line offset
404
638
  -l <num> # Maximum lines to return
405
639
  --from <num> # Start from line number
406
640
 
407
- # Multi-get options
408
- -l <num> # Maximum lines per file
409
- --max-bytes <num> # Skip files larger than N bytes (default: 10KB)
641
+ # Bulk Neural Extraction options
642
+ -l <num> # Maximum lines per asset
643
+ --max-bytes <num> # Skip assets larger than N bytes (default: 10KB)
410
644
  ```
411
645
 
412
646
  ### Index Maintenance
413
647
 
414
648
  ```bash
415
- # Show index status and collections with contexts
649
+ # Report index health and collection inventory
416
650
  kindx status
417
651
 
418
652
  # Re-index all collections
419
653
  kindx update
420
654
 
421
- # Re-index with git pull first (for remote repos)
655
+ # Re-index with upstream git pull (for remote corpus repos)
422
656
  kindx update --pull
423
657
 
424
- # Get document by filepath (with fuzzy matching suggestions)
658
+ # Download/check the default local models
659
+ kindx pull
660
+
661
+ # Force re-download the default models
662
+ kindx pull --refresh
663
+
664
+ # Watch one or more collections for changes
665
+ kindx watch
666
+ kindx watch notes docs
667
+
668
+ # Neural Extraction by filepath (with fuzzy matching fallback)
425
669
  kindx get notes/meeting.md
426
670
 
427
- # Get document by docid (from search results)
671
+ # Neural Extraction by docid (from retrieval results)
428
672
  kindx get "#abc123"
429
673
 
430
- # Get document starting at line 50, max 100 lines
674
+ # Extract document starting at line 50, max 100 lines
431
675
  kindx get notes/meeting.md:50 -l 100
432
676
 
433
- # Get multiple documents by glob pattern
677
+ # Bulk Neural Extraction via glob pattern
434
678
  kindx multi-get "journals/2025-05*.md"
435
679
 
436
- # Get multiple documents by comma-separated list (supports docids)
680
+ # Bulk Neural Extraction via comma-separated list (supports docids)
437
681
  kindx multi-get "doc1.md, doc2.md, #abc123"
438
682
 
439
- # Limit multi-get to files under 20KB
683
+ # Limit bulk extraction to assets under 20KB
440
684
  kindx multi-get "docs/*.md" --max-bytes 20480
441
685
 
442
- # Output multi-get as JSON for agent processing
686
+ # Export bulk extraction as JSON for agent processing
443
687
  kindx multi-get "docs/*.md" --json
444
688
 
445
- # Clean up cache and orphaned data
689
+ # Purge cache and orphaned index data
446
690
  kindx cleanup
691
+
692
+ # Import an existing Chroma or OpenCLAW corpus
693
+ kindx migrate chroma /path/to/chroma.sqlite3
694
+ kindx migrate openclaw /path/to/openclaw/repo
447
695
  ```
448
696
 
697
+ ### Claude Skill Packaging
698
+
699
+ ```bash
700
+ # Print the packaged skill markdown
701
+ kindx --skill
702
+
703
+ # Install the packaged skill into ~/.claude/commands/
704
+ kindx skill install
705
+ ```
706
+
707
+ ---
708
+
449
709
  ## Data Storage
450
710
 
451
- Index stored in: `~/.cache/kindx/index.sqlite`
711
+ Index stored at: `~/.cache/kindx/index.sqlite`
452
712
 
453
713
  ### Schema
454
714
 
@@ -499,15 +759,33 @@ erDiagram
499
759
  content_vectors ||--|| vectors_vec : embeds
500
760
  ```
501
761
 
762
+ ---
763
+
502
764
  ## Environment Variables
503
765
 
504
766
  | Variable | Default | Description |
505
767
  |----------|---------|-------------|
506
768
  | `KINDX_EMBED_MODEL` | `embeddinggemma-300M` | Override embedding model (HuggingFace URI) |
507
- | `KINDX_EXPAND_CONTEXT_SIZE` | `2048` | Context window for query expansion |
769
+ | `KINDX_EXPAND_CONTEXT_SIZE` | `2048` | Context window for query expansion LLM |
770
+ | `KINDX_RERANK_CONTEXT_SIZE` | `4096` | Context window for reranking contexts |
771
+ | `KINDX_LOW_VRAM` | (auto) | Force low-VRAM policy on/off (`1`/`0`) |
772
+ | `KINDX_VRAM_BUDGET_MB` | (unset) | Optional GPU budget in MB; constrains context + parallelism |
773
+ | `KINDX_LOW_VRAM_THRESHOLD_MB` | `6144` | Auto low-VRAM threshold based on free GPU memory |
774
+ | `KINDX_LOW_VRAM_EMBED_PARALLELISM` | `2` | Max embedding context parallelism in low-VRAM mode |
775
+ | `KINDX_LOW_VRAM_RERANK_PARALLELISM` | `1` | Max rerank context parallelism in low-VRAM mode |
776
+ | `KINDX_LOW_VRAM_EXPAND_CONTEXT_SIZE` | `1024` | Expansion context size cap in low-VRAM mode |
777
+ | `KINDX_LOW_VRAM_RERANK_CONTEXT_SIZE` | `1024` | Rerank context size cap in low-VRAM mode |
508
778
  | `KINDX_CONFIG_DIR` | `~/.config/kindx` | Configuration directory override |
509
779
  | `XDG_CACHE_HOME` | `~/.cache` | Cache base directory |
510
- | `NO_COLOR` | (unset) | Disable terminal colors |
780
+ | `NO_COLOR` | (unset) | Disable ANSI terminal colors |
781
+ | `KINDX_LLM_BACKEND` | `local` | Set to `remote` to use an OpenAI-compatible API instead of local GPU |
782
+ | `KINDX_OPENAI_BASE_URL` | `http://127.0.0.1:11434/v1` | URL for the Remote API backend (e.g. Ollama, LM Studio) |
783
+ | `KINDX_OPENAI_API_KEY` | (unset) | API key for the Remote API backend if required |
784
+ | `KINDX_OPENAI_EMBED_MODEL` | `nomic-embed-text` | Model name to pass for `/v1/embeddings` |
785
+ | `KINDX_OPENAI_GENERATE_MODEL` | `llama3.2` | Model name to pass for `/v1/chat/completions` (query expansion) |
786
+ | `KINDX_OPENAI_RERANK_MODEL` | (unset) | Model name to pass for `/v1/rerank` (if supported by backend) |
787
+
788
+ ---
511
789
 
512
790
  ## How It Works
513
791
 
@@ -515,8 +793,8 @@ erDiagram
515
793
 
516
794
  ```mermaid
517
795
  flowchart LR
518
- COL["Collection Config"] --> GLOB["Glob Pattern Scan"]
519
- GLOB --> MD["Markdown Files"]
796
+ COL["Collection Config"] --> GLOB["Glob Pattern Scan"]
797
+ GLOB --> MD["Markdown Documents"]
520
798
  MD --> PARSE["Parse Title + Hash Content"]
521
799
  PARSE --> DOCID["Generate docid (6-char hash)"]
522
800
  DOCID --> SQL["Store in SQLite"]
@@ -525,11 +803,11 @@ flowchart LR
525
803
 
526
804
  ### Embedding Flow
527
805
 
528
- Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
806
+ Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:
529
807
 
530
808
  ```mermaid
531
809
  flowchart LR
532
- DOC["Document"] --> CHUNK["Smart Chunk (~900 tokens)"]
810
+ DOC["Document"] --> CHUNK["Smart Chunk (~900 tokens)"]
533
811
  CHUNK --> FMT["Format: title | text"]
534
812
  FMT --> LLM["node-llama-cpp embedBatch"]
535
813
  LLM --> STORE["Store Vectors in sqlite-vec"]
@@ -539,9 +817,9 @@ flowchart LR
539
817
 
540
818
  ### Smart Chunking
541
819
 
542
- Instead of cutting at hard token boundaries, KINDX uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.
820
+ Instead of cutting at hard token boundaries, KINDX uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together within a single chunk.
543
821
 
544
- Algorithm:
822
+ **Algorithm:**
545
823
  1. Scan document for all break points with scores
546
824
  2. When approaching the 900-token target, search a 200-token window before the cutoff
547
825
  3. Score each break point: `finalScore = baseScore x (1 - (distance/window)^2 x 0.7)`
@@ -549,7 +827,7 @@ Algorithm:
549
827
 
550
828
  The squared distance decay means a heading 200 tokens back (score ~30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
551
829
 
552
- Code Fence Protection: Break points inside code blocks are ignored -- code stays together. If a code block exceeds the chunk size, it is kept whole when possible.
830
+ **Code Fence Protection:** Break points inside code blocks are ignored; code stays together. If a code block exceeds the chunk size, it is kept whole when possible.
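The distance-decayed scoring above can be sketched in TypeScript (function and variable names here are illustrative, not the actual KINDX source):

```typescript
// Sketch of the break-point scoring rule:
//   finalScore = baseScore * (1 - (distance / window)^2 * 0.7)
// Candidates farther from the chunk target are penalized quadratically,
// so a strong break (e.g. a heading) can still win at a distance.
function scoreBreakPoint(baseScore: number, distance: number, window: number): number {
  return baseScore * (1 - (distance / window) ** 2 * 0.7);
}

// A heading (base score 30) 200 tokens back, with a 200-token search window:
const heading = scoreBreakPoint(30, 200, 200);   // ≈ 9
// A plain line break (base score 1) exactly at the target:
const lineBreak = scoreBreakPoint(1, 0, 200);    // 1
console.log(heading > lineBreak);                // the heading still wins
```

Even at the far edge of the window, the penalty bottoms out at a 0.7 reduction, which is why a heading's base score of 30 still beats a line break at the target, as described above.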
553
831
 
554
832
  ### Model Configuration
555
833
 
@@ -561,18 +839,20 @@ const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-re
561
839
  const DEFAULT_GENERATE_MODEL = "hf:ambicuity/kindx-query-expansion-1.7B-gguf/kindx-query-expansion-1.7B-q4_k_m.gguf";
562
840
  ```
563
841
 
842
+ ---
843
+
564
844
  ## Contributing
565
845
 
566
- See [CONTRIBUTING.md](./CONTRIBUTING.md) for the full contribution guide.
846
+ See [CONTRIBUTING.md](./CONTRIBUTING.md) for the full contribution guide and the KINDX Specification.
567
847
 
568
848
  ## Security
569
849
 
570
- See [SECURITY.md](./SECURITY.md) for reporting vulnerabilities.
850
+ See [SECURITY.md](./SECURITY.md) for vulnerability disclosure.
571
851
 
572
852
  ## License
573
853
 
574
- MIT -- see [LICENSE](./LICENSE) for details.
854
+ MIT; see [LICENSE](./LICENSE) for details.
575
855
 
576
856
  ---
577
857
 
578
- Maintained by [Ritesh Rana](https://github.com/ambicuity) -- `contact@riteshrana.engineer`
858
+ Maintained by [Ritesh Rana](https://github.com/ambicuity), `contact@riteshrana.engineer`