mdcontext 0.0.1 → 0.2.0
- package/.changeset/README.md +28 -0
- package/.changeset/config.json +11 -0
- package/.claude/settings.local.json +25 -0
- package/.github/workflows/ci.yml +83 -0
- package/.github/workflows/claude-code-review.yml +44 -0
- package/.github/workflows/claude.yml +85 -0
- package/.github/workflows/release.yml +113 -0
- package/.tldrignore +112 -0
- package/BACKLOG.md +338 -0
- package/CONTRIBUTING.md +186 -0
- package/NOTES/NOTES +44 -0
- package/README.md +434 -11
- package/biome.json +36 -0
- package/cspell.config.yaml +14 -0
- package/dist/chunk-23UPXDNL.js +3044 -0
- package/dist/chunk-2W7MO2DL.js +1366 -0
- package/dist/chunk-3NUAZGMA.js +1689 -0
- package/dist/chunk-7TOWB2XB.js +366 -0
- package/dist/chunk-7XOTOADQ.js +3065 -0
- package/dist/chunk-AH2PDM2K.js +3042 -0
- package/dist/chunk-BNXWSZ63.js +3742 -0
- package/dist/chunk-BTL5DJVU.js +3222 -0
- package/dist/chunk-HDHYG7E4.js +104 -0
- package/dist/chunk-HLR4KZBP.js +3234 -0
- package/dist/chunk-IP3FRFEB.js +1045 -0
- package/dist/chunk-KHU56VDO.js +3042 -0
- package/dist/chunk-KRYIFLQR.js +88 -0
- package/dist/chunk-LBSDNLEM.js +287 -0
- package/dist/chunk-MNTQ7HCP.js +2643 -0
- package/dist/chunk-MUJELQQ6.js +1387 -0
- package/dist/chunk-MXJGMSLV.js +2199 -0
- package/dist/chunk-N6QJGC3Z.js +2636 -0
- package/dist/chunk-OBELGBPM.js +1713 -0
- package/dist/chunk-OT7R5XTA.js +3192 -0
- package/dist/chunk-P7X4RA2T.js +106 -0
- package/dist/chunk-PIDUQNC2.js +3185 -0
- package/dist/chunk-POGCDIH4.js +3187 -0
- package/dist/chunk-PSIEOQGZ.js +3043 -0
- package/dist/chunk-PVRT3IHA.js +3238 -0
- package/dist/chunk-QNN4TT23.js +1430 -0
- package/dist/chunk-RE3R45RJ.js +3042 -0
- package/dist/chunk-S7E6TFX6.js +803 -0
- package/dist/chunk-SG6GLU4U.js +1378 -0
- package/dist/chunk-SJCDV2ST.js +274 -0
- package/dist/chunk-SYE5XLF3.js +104 -0
- package/dist/chunk-T5VLYBZD.js +103 -0
- package/dist/chunk-TOQB7VWU.js +3238 -0
- package/dist/chunk-VFNMZ4ZQ.js +3228 -0
- package/dist/chunk-VVTGZNBT.js +1629 -0
- package/dist/chunk-W7Q4RFEV.js +104 -0
- package/dist/chunk-XTYYVRLO.js +3190 -0
- package/dist/chunk-Y6MDYVJD.js +3063 -0
- package/dist/cli/main.d.ts +1 -0
- package/dist/cli/main.js +5458 -0
- package/dist/index.d.ts +653 -0
- package/dist/index.js +79 -0
- package/dist/mcp/server.d.ts +1 -0
- package/dist/mcp/server.js +472 -0
- package/dist/schema-BAWSG7KY.js +22 -0
- package/dist/schema-E3QUPL26.js +20 -0
- package/dist/schema-EHL7WUT6.js +20 -0
- package/docs/019-USAGE.md +625 -0
- package/docs/020-current-implementation.md +364 -0
- package/docs/021-DOGFOODING-FINDINGS.md +175 -0
- package/docs/BACKLOG.md +80 -0
- package/docs/CONFIG.md +1123 -0
- package/docs/DESIGN.md +439 -0
- package/docs/ERRORS.md +383 -0
- package/docs/PROJECT.md +88 -0
- package/docs/ROADMAP.md +407 -0
- package/docs/summarization.md +320 -0
- package/docs/test-links.md +9 -0
- package/justfile +40 -0
- package/package.json +74 -9
- package/pnpm-workspace.yaml +5 -0
- package/research/INDEX.md +315 -0
- package/research/code-review/README.md +90 -0
- package/research/code-review/cli-error-handling-review.md +979 -0
- package/research/code-review/code-review-validation-report.md +464 -0
- package/research/code-review/main-ts-review.md +1128 -0
- package/research/config-analysis/01-current-implementation.md +470 -0
- package/research/config-analysis/02-strategy-recommendation.md +428 -0
- package/research/config-analysis/03-task-candidates.md +715 -0
- package/research/config-analysis/033-research-configuration-management.md +828 -0
- package/research/config-analysis/034-research-effect-cli-config.md +1504 -0
- package/research/config-analysis/04-consolidated-task-candidates.md +277 -0
- package/research/config-docs/SUMMARY.md +357 -0
- package/research/config-docs/TEST-RESULTS.md +776 -0
- package/research/config-docs/TODO.md +542 -0
- package/research/config-docs/analysis.md +744 -0
- package/research/config-docs/fix-validation.md +502 -0
- package/research/config-docs/help-audit.md +264 -0
- package/research/config-docs/help-system-analysis.md +890 -0
- package/research/dogfood/consolidated-tool-evaluation.md +373 -0
- package/research/dogfood/strategy-a/a-synthesis.md +184 -0
- package/research/dogfood/strategy-a/a1-docs.md +226 -0
- package/research/dogfood/strategy-a/a2-amorphic.md +156 -0
- package/research/dogfood/strategy-a/a3-llm.md +164 -0
- package/research/dogfood/strategy-b/b-synthesis.md +228 -0
- package/research/dogfood/strategy-b/b1-architecture.md +207 -0
- package/research/dogfood/strategy-b/b2-gaps.md +258 -0
- package/research/dogfood/strategy-b/b3-workflows.md +250 -0
- package/research/dogfood/strategy-c/c-synthesis.md +451 -0
- package/research/dogfood/strategy-c/c1-explorer.md +192 -0
- package/research/dogfood/strategy-c/c2-diver-memory.md +145 -0
- package/research/dogfood/strategy-c/c3-diver-control.md +148 -0
- package/research/dogfood/strategy-c/c4-diver-failure.md +151 -0
- package/research/dogfood/strategy-c/c5-diver-execution.md +221 -0
- package/research/dogfood/strategy-c/c6-diver-org.md +221 -0
- package/research/effect-cli-error-handling.md +845 -0
- package/research/effect-errors-as-values.md +943 -0
- package/research/errors-task-analysis/00-consolidated-tasks.md +207 -0
- package/research/errors-task-analysis/cli-commands-analysis.md +909 -0
- package/research/errors-task-analysis/embeddings-analysis.md +709 -0
- package/research/errors-task-analysis/index-search-analysis.md +812 -0
- package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
- package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
- package/research/issue-review.md +603 -0
- package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
- package/research/llm-summarization/alternative-providers-2026.md +1428 -0
- package/research/llm-summarization/anthropic-2026.md +367 -0
- package/research/llm-summarization/claude-cli-integration.md +1706 -0
- package/research/llm-summarization/cli-integration-patterns.md +3155 -0
- package/research/llm-summarization/openai-2026.md +473 -0
- package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
- package/research/llm-summarization/opencode-cli-integration.md +1552 -0
- package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
- package/research/llm-summarization/prototype-results.md +56 -0
- package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
- package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
- package/research/mdcontext-error-analysis.md +521 -0
- package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
- package/research/mdcontext-pudding/01-index-embed.md +956 -0
- package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
- package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
- package/research/mdcontext-pudding/02-search.md +970 -0
- package/research/mdcontext-pudding/03-context.md +779 -0
- package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
- package/research/mdcontext-pudding/04-tree.md +704 -0
- package/research/mdcontext-pudding/05-config.md +1038 -0
- package/research/mdcontext-pudding/06-links-summary.txt +87 -0
- package/research/mdcontext-pudding/06-links.md +679 -0
- package/research/mdcontext-pudding/07-stats.md +693 -0
- package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
- package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
- package/research/mdcontext-pudding/README.md +168 -0
- package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
- package/research/npm_publish/011-npm-workflow-research-agent2.md +792 -0
- package/research/npm_publish/012-npm-workflow-research-agent1.md +530 -0
- package/research/npm_publish/013-npm-workflow-research-agent3.md +722 -0
- package/research/npm_publish/014-npm-workflow-synthesis.md +556 -0
- package/research/npm_publish/031-npm-workflow-task-analysis.md +134 -0
- package/research/research-quality-review.md +834 -0
- package/research/semantic-search/002-research-embedding-models.md +490 -0
- package/research/semantic-search/003-research-rag-alternatives.md +523 -0
- package/research/semantic-search/004-research-vector-search.md +841 -0
- package/research/semantic-search/032-research-semantic-search.md +427 -0
- package/research/semantic-search/embedding-text-analysis.md +156 -0
- package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
- package/research/semantic-search/query-processing-analysis.md +207 -0
- package/research/semantic-search/root-cause-and-solution.md +114 -0
- package/research/semantic-search/threshold-validation-report.md +69 -0
- package/research/semantic-search/vector-search-analysis.md +63 -0
- package/research/task-management-2026/00-synthesis-recommendations.md +295 -0
- package/research/task-management-2026/01-ai-workflow-tools.md +416 -0
- package/research/task-management-2026/02-agent-framework-patterns.md +476 -0
- package/research/task-management-2026/03-lightweight-file-based.md +567 -0
- package/research/task-management-2026/04-established-tools-ai-features.md +541 -0
- package/research/task-management-2026/linear/01-core-features-workflow.md +771 -0
- package/research/task-management-2026/linear/02-api-integrations.md +930 -0
- package/research/task-management-2026/linear/03-ai-features.md +368 -0
- package/research/task-management-2026/linear/04-pricing-setup.md +205 -0
- package/research/task-management-2026/linear/05-usage-patterns-best-practices.md +605 -0
- package/research/test-path-issues.md +276 -0
- package/review/ALP-76/1-error-type-design.md +962 -0
- package/review/ALP-76/2-error-handling-patterns.md +906 -0
- package/review/ALP-76/3-error-presentation.md +624 -0
- package/review/ALP-76/4-test-coverage.md +625 -0
- package/review/ALP-76/5-migration-completeness.md +440 -0
- package/review/ALP-76/6-effect-best-practices.md +755 -0
- package/scripts/apply-branch-protection.sh +47 -0
- package/scripts/branch-protection-templates.json +79 -0
- package/scripts/prototype-summarization.ts +346 -0
- package/scripts/rebuild-hnswlib.js +58 -0
- package/scripts/setup-branch-protection.sh +64 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
- package/src/cli/argv-preprocessor.test.ts +210 -0
- package/src/cli/argv-preprocessor.ts +202 -0
- package/src/cli/cli.test.ts +627 -0
- package/src/cli/commands/backlinks.ts +54 -0
- package/src/cli/commands/config-cmd.ts +642 -0
- package/src/cli/commands/context.ts +285 -0
- package/src/cli/commands/duplicates.ts +122 -0
- package/src/cli/commands/embeddings.ts +529 -0
- package/src/cli/commands/index-cmd.ts +480 -0
- package/src/cli/commands/index.ts +16 -0
- package/src/cli/commands/links.ts +52 -0
- package/src/cli/commands/search.ts +1281 -0
- package/src/cli/commands/stats.ts +149 -0
- package/src/cli/commands/tree.ts +128 -0
- package/src/cli/config-layer.ts +176 -0
- package/src/cli/error-handler.test.ts +235 -0
- package/src/cli/error-handler.ts +655 -0
- package/src/cli/flag-schemas.ts +341 -0
- package/src/cli/help.ts +588 -0
- package/src/cli/index.ts +9 -0
- package/src/cli/main.ts +435 -0
- package/src/cli/options.ts +41 -0
- package/src/cli/shared-error-handling.ts +199 -0
- package/src/cli/typo-suggester.test.ts +105 -0
- package/src/cli/typo-suggester.ts +130 -0
- package/src/cli/utils.ts +259 -0
- package/src/config/file-provider.test.ts +320 -0
- package/src/config/file-provider.ts +273 -0
- package/src/config/index.ts +72 -0
- package/src/config/integration.test.ts +667 -0
- package/src/config/precedence.test.ts +277 -0
- package/src/config/precedence.ts +451 -0
- package/src/config/schema.test.ts +414 -0
- package/src/config/schema.ts +603 -0
- package/src/config/service.test.ts +320 -0
- package/src/config/service.ts +243 -0
- package/src/config/testing.test.ts +264 -0
- package/src/config/testing.ts +110 -0
- package/src/core/index.ts +1 -0
- package/src/core/types.ts +113 -0
- package/src/duplicates/detector.test.ts +183 -0
- package/src/duplicates/detector.ts +414 -0
- package/src/duplicates/index.ts +18 -0
- package/src/embeddings/embedding-namespace.test.ts +300 -0
- package/src/embeddings/embedding-namespace.ts +947 -0
- package/src/embeddings/heading-boost.test.ts +222 -0
- package/src/embeddings/hnsw-build-options.test.ts +198 -0
- package/src/embeddings/hyde.test.ts +272 -0
- package/src/embeddings/hyde.ts +264 -0
- package/src/embeddings/index.ts +10 -0
- package/src/embeddings/openai-provider.ts +414 -0
- package/src/embeddings/pricing.json +22 -0
- package/src/embeddings/provider-constants.ts +204 -0
- package/src/embeddings/provider-errors.test.ts +967 -0
- package/src/embeddings/provider-errors.ts +565 -0
- package/src/embeddings/provider-factory.test.ts +240 -0
- package/src/embeddings/provider-factory.ts +225 -0
- package/src/embeddings/provider-integration.test.ts +788 -0
- package/src/embeddings/query-preprocessing.test.ts +187 -0
- package/src/embeddings/semantic-search-threshold.test.ts +508 -0
- package/src/embeddings/semantic-search.ts +1270 -0
- package/src/embeddings/types.ts +359 -0
- package/src/embeddings/vector-store.ts +708 -0
- package/src/embeddings/voyage-provider.ts +313 -0
- package/src/errors/errors.test.ts +845 -0
- package/src/errors/index.ts +533 -0
- package/src/index/ignore-patterns.test.ts +354 -0
- package/src/index/ignore-patterns.ts +305 -0
- package/src/index/index.ts +4 -0
- package/src/index/indexer.ts +684 -0
- package/src/index/storage.ts +260 -0
- package/src/index/types.ts +147 -0
- package/src/index/watcher.ts +189 -0
- package/src/index.ts +30 -0
- package/src/integration/search-keyword.test.ts +678 -0
- package/src/mcp/server.ts +612 -0
- package/src/parser/index.ts +1 -0
- package/src/parser/parser.test.ts +291 -0
- package/src/parser/parser.ts +394 -0
- package/src/parser/section-filter.test.ts +277 -0
- package/src/parser/section-filter.ts +392 -0
- package/src/search/__tests__/hybrid-search.test.ts +650 -0
- package/src/search/bm25-store.ts +366 -0
- package/src/search/cross-encoder.test.ts +253 -0
- package/src/search/cross-encoder.ts +406 -0
- package/src/search/fuzzy-search.test.ts +419 -0
- package/src/search/fuzzy-search.ts +273 -0
- package/src/search/hybrid-search.ts +448 -0
- package/src/search/path-matcher.test.ts +276 -0
- package/src/search/path-matcher.ts +33 -0
- package/src/search/query-parser.test.ts +260 -0
- package/src/search/query-parser.ts +319 -0
- package/src/search/searcher.test.ts +280 -0
- package/src/search/searcher.ts +724 -0
- package/src/search/wink-bm25.d.ts +30 -0
- package/src/summarization/cli-providers/claude.ts +202 -0
- package/src/summarization/cli-providers/detection.test.ts +273 -0
- package/src/summarization/cli-providers/detection.ts +118 -0
- package/src/summarization/cli-providers/index.ts +8 -0
- package/src/summarization/cost.test.ts +139 -0
- package/src/summarization/cost.ts +102 -0
- package/src/summarization/error-handler.test.ts +127 -0
- package/src/summarization/error-handler.ts +111 -0
- package/src/summarization/index.ts +102 -0
- package/src/summarization/pipeline.test.ts +498 -0
- package/src/summarization/pipeline.ts +231 -0
- package/src/summarization/prompts.test.ts +269 -0
- package/src/summarization/prompts.ts +133 -0
- package/src/summarization/provider-factory.test.ts +396 -0
- package/src/summarization/provider-factory.ts +178 -0
- package/src/summarization/types.ts +184 -0
- package/src/summarize/budget-bugs.test.ts +620 -0
- package/src/summarize/formatters.ts +419 -0
- package/src/summarize/index.ts +20 -0
- package/src/summarize/summarizer.test.ts +275 -0
- package/src/summarize/summarizer.ts +597 -0
- package/src/summarize/verify-bugs.test.ts +238 -0
- package/src/types/huggingface-transformers.d.ts +66 -0
- package/src/utils/index.ts +1 -0
- package/src/utils/tokens.test.ts +142 -0
- package/src/utils/tokens.ts +186 -0
- package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
- package/tests/fixtures/cli/.mdcontext/config.json +8 -0
- package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/indexes/documents.json +33 -0
- package/tests/fixtures/cli/.mdcontext/indexes/links.json +12 -0
- package/tests/fixtures/cli/.mdcontext/indexes/sections.json +247 -0
- package/tests/fixtures/cli/README.md +9 -0
- package/tests/fixtures/cli/api-reference.md +11 -0
- package/tests/fixtures/cli/getting-started.md +11 -0
- package/tests/integration/embed-index.test.ts +712 -0
- package/tests/integration/search-context.test.ts +469 -0
- package/tests/integration/search-semantic.test.ts +522 -0
- package/tsconfig.json +26 -0
- package/vitest.config.ts +16 -0
- package/vitest.setup.ts +12 -0
@@ -0,0 +1,841 @@

# Vector Search Research: Patterns and Techniques (2025-2026)

Research findings for improving mdcontext semantic search capabilities.

## Table of Contents

1. [Hybrid Search](#1-hybrid-search)
2. [Re-ranking Approaches](#2-re-ranking-approaches)
3. [Vector Index Alternatives](#3-vector-index-alternatives)
4. [Filtering and Metadata](#4-filtering-and-metadata)
5. [Emerging Patterns](#5-emerging-patterns-2025-2026)
6. [Quick Wins: HNSW Parameter Tuning](#6-quick-wins-hnsw-parameter-tuning)
7. [Top 3 Recommendations](#7-top-3-recommendations)
8. [Effort/Impact Analysis](#8-effortimpact-analysis)

---

## 1. Hybrid Search

Hybrid search combines sparse retrieval (BM25/keyword) with dense retrieval (vector embeddings) to leverage the strengths of both approaches.

### Why Hybrid Search?

| Approach | Strengths | Weaknesses |
| -------------------- | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------- |
| **Keyword (BM25)** | Exact term matching, handles specific identifiers (e.g., "TS-01"), no vocabulary mismatch for known terms | Misses synonyms, semantic meaning, context |
| **Semantic (Dense)** | Understands meaning, handles paraphrasing, conceptual similarity | May miss exact terms, identifiers, proper nouns |
| **Hybrid** | Best of both worlds: exact + semantic | Added complexity, needs score fusion |

**Key insight**: Pure embedding search may miss important exact matches. For example, a search for "TS-01" may fail to retrieve documents that mention that identifier, because embeddings capture meaning in a high-dimensional semantic space rather than exact lexical matches.

### Fusion Techniques

#### Reciprocal Rank Fusion (RRF)

RRF is one of the most widely adopted fusion algorithms for hybrid search. It merges ranked lists without requiring score normalization.

**Formula**:

```
RRF_score(d) = Σ_i 1/(k + rank_i(d))
```

Where:

- `k` is a smoothing constant (typically 60)
- `rank_i(d)` is the document's rank in system `i` (1-based); the sum runs over every system that returned `d`
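
As a worked example of the formula (the ranks are made up for illustration), take `k = 60`, a document A that BM25 ranks 1st and semantic search ranks 3rd, and a document B that semantic search ranks 1st but BM25 does not return at all:

```latex
\mathrm{RRF}(A) = \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.0164 + 0.0159 \approx 0.0323
\qquad
\mathrm{RRF}(B) = \frac{1}{60 + 1} \approx 0.0164
```

A scores roughly twice as high as B even though B holds a top rank in one list: with equal weights, agreement between retrievers outweighs a single strong rank.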

**Advantages**:

- Score-agnostic: Works with incompatible scoring systems (cosine similarity 0-1 vs BM25 unbounded)
- Simple to implement
- No hyperparameter tuning for score scales
- Robust across different retrieval methods

**Performance**: Hybrid search with RRF consistently outperforms single-method retrieval by 10-15% in precision benchmarks.

#### Weighted RRF

Extends RRF with configurable weights per retrieval method:

```typescript
// Example configuration
const weights = {
  bm25: 1.0, // Full weight for lexical precision
  semantic: 0.7, // Slightly lower for semantic similarity
};
```

This allows emphasizing one method over another based on use case.

#### Linear Combination

Simpler fusion that combines normalized scores directly:

```typescript
finalScore = alpha * semanticScore + (1 - alpha) * bm25Score;
```

**Requires** normalizing both scores to the same scale. Less robust than RRF but faster.
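
Because the two raw scores live on different scales, a normalization step has to run before the blend. The sketch below uses min-max scaling, which is one common choice rather than anything this document prescribes; `minMaxNormalize`, `linearCombine`, and the sample scores are hypothetical.

```typescript
/** Scale raw scores to [0, 1]; a list with no spread maps to all 1s. */
function minMaxNormalize(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

/** Blend two already-normalized scores; alpha = 1 is purely semantic. */
function linearCombine(
  semanticScore: number,
  bm25Score: number,
  alpha = 0.5,
): number {
  return alpha * semanticScore + (1 - alpha) * bm25Score;
}

// Hypothetical raw scores for the same three sections, in the same order.
const semanticRaw = [0.82, 0.75, 0.4];
const bm25Raw = [3.1, 12.4, 0.0];

const semanticNorm = minMaxNormalize(semanticRaw);
const bm25Norm = minMaxNormalize(bm25Raw);

// The second section wins: it is close to the best semantic score
// and has by far the best BM25 score.
const fused = semanticNorm.map((s, i) => linearCombine(s, bm25Norm[i], 0.5));
```

Min-max scaling is sensitive to outliers (one very large BM25 score compresses all the others), which is part of why rank-based fusion such as RRF tends to be more robust.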

### Implementation Options for Node.js

#### BM25 Libraries

| Library | Notes | NPM |
| --------------------------- | ------------------------------------------------------------ | ----------------------- |
| **wink-bm25-text-search** | Full-featured, supports field weighting, ~100% test coverage | `wink-bm25-text-search` |
| **OkapiBM25** | Simple, typed implementation, 111K downloads/year | `okapi-bm25` |
| **@langchain/community** | BM25Retriever for LangChain pipelines | `@langchain/community` |
| **winkNLP BM25 Vectorizer** | BM25 with configurable k1/b parameters | `wink-nlp` |

**Recommendation**: `wink-bm25-text-search` for its reliability and text-processing features (stemming, stop words, field boosting).
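
To make the `k1`/`b` parameters mentioned in the table concrete, here is a minimal, dependency-free sketch of Okapi BM25 scoring. It is not the API of any library listed above; the tokenizer, the Lucene-style IDF variant, the defaults `k1 = 1.2` and `b = 0.75`, and the sample corpus are all assumptions chosen for illustration.

```typescript
/**
 * Score every document against the query with Okapi BM25.
 * k1 controls term-frequency saturation; b controls length normalization.
 */
function bm25Scores(
  queryTerms: string[],
  docs: string[][],
  k1 = 1.2,
  b = 0.75,
): number[] {
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  const uniqueTerms = Array.from(new Set(queryTerms));

  return docs.map((doc) => {
    let score = 0;
    for (const term of uniqueTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Document frequency: how many documents contain the term.
      const docFreq = docs.filter((d) => d.includes(term)).length;
      // Lucene-style IDF, which stays non-negative for very common terms.
      const idf = Math.log(1 + (n - docFreq + 0.5) / (docFreq + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

// Naive tokenizer for the example: lowercase, split on non-word characters.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/\W+/).filter(Boolean);

const corpus = [
  "Error code TS-01 is raised when the index is missing",
  "Semantic search finds conceptually similar sections",
  "Configure the index path in the config file",
].map(tokenize);

// The first document matches every query token, the third matches only
// "index", and the second matches nothing and scores 0.
const scores = bm25Scores(tokenize("TS-01 index"), corpus);
```

A production library adds what this sketch leaves out: an inverted index so scoring does not rescan the corpus, plus stemming and stop-word handling.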

### Hybrid Search Implementation Pattern

```typescript
interface HybridSearchResult {
  sectionId: string;
  semanticRank?: number;
  bm25Rank?: number;
  rrfScore: number;
}

// Assumed helpers (not defined here): each resolves to hits ordered best-first.
declare function semanticSearch(
  query: string,
  options: { limit: number },
): Promise<{ sectionId: string }[]>;
declare function bm25Search(
  query: string,
  options: { limit: number },
): Promise<{ sectionId: string }[]>;

async function hybridSearch(
  query: string,
  options: {
    semanticWeight?: number; // default 1.0
    bm25Weight?: number; // default 1.0
    k?: number; // RRF smoothing constant, default 60
    limit?: number; // default 10 (assumed)
  } = {},
): Promise<HybridSearchResult[]> {
  const { semanticWeight = 1.0, bm25Weight = 1.0, k = 60, limit = 10 } = options;

  // 1. Run both searches in parallel, over-fetching so fusion has enough candidates
  const [semanticResults, bm25Results] = await Promise.all([
    semanticSearch(query, { limit: limit * 2 }),
    bm25Search(query, { limit: limit * 2 }),
  ]);

  // 2. Apply RRF fusion
  const scores = new Map<string, number>();

  semanticResults.forEach((r, i) => {
    const rank = i + 1;
    const score = (scores.get(r.sectionId) || 0) + semanticWeight / (k + rank);
    scores.set(r.sectionId, score);
  });

  bm25Results.forEach((r, i) => {
    const rank = i + 1;
    const score = (scores.get(r.sectionId) || 0) + bm25Weight / (k + rank);
    scores.set(r.sectionId, score);
  });

  // 3. Sort by RRF score and return top results
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([id, score]) => ({ sectionId: id, rrfScore: score }));
}
```

### When Hybrid Beats Pure Semantic

- **Exact term searches**: Product codes, error codes, API names
- **Proper nouns**: Names, brands, specific technologies
- **Technical documentation**: Where exact terminology matters
- **Short queries**: Single-word searches that need lexical grounding

---

## 2. Re-ranking Approaches

Re-ranking is a two-stage retrieval pattern: first retrieve candidates with a fast method, then re-rank top-N with a more accurate model.

### Why Re-ranking?

- **Bi-encoders (embedding models)** encode query and documents separately, enabling fast ANN search but missing cross-attention between query-document pairs
- **Cross-encoders** jointly encode query+document, capturing fine-grained relevance but too slow for full corpus search
- **Solution**: Use bi-encoder for retrieval, cross-encoder for re-ranking top candidates
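
The two-stage flow described above can be sketched independently of any particular model. In this sketch the retriever and the pair scorer are injected as functions; `retrieveAndRerank`, `Retriever`, `PairScorer`, and the toy stand-ins are hypothetical names, with a word-overlap count standing in for a real cross-encoder.

```typescript
interface Candidate {
  id: string;
  content: string;
}

// Both stages are injected so the sketch stays model-agnostic.
type Retriever = (query: string, limit: number) => Promise<Candidate[]>;
type PairScorer = (query: string, content: string) => Promise<number>;

async function retrieveAndRerank(
  query: string,
  retrieve: Retriever,
  scorePair: PairScorer,
  { candidates = 20, limit = 5 } = {},
): Promise<Array<Candidate & { score: number }>> {
  // Stage 1: cheap, recall-oriented retrieval over the whole corpus.
  const pool = await retrieve(query, candidates);

  // Stage 2: expensive, precision-oriented scoring of each (query, doc) pair.
  const scored = await Promise.all(
    pool.map(async (c) => ({ ...c, score: await scorePair(query, c.content) })),
  );

  return scored.sort((a, b) => b.score - a.score).slice(0, limit);
}

// Toy stand-ins: a fixed candidate list and a word-overlap "cross-encoder".
const toyDocs: Candidate[] = [
  { id: "a", content: "install the package" },
  { id: "b", content: "configure hybrid search weights" },
  { id: "c", content: "search" },
];
const toyRetrieve: Retriever = async (_query, limit) => toyDocs.slice(0, limit);
const toyScore: PairScorer = async (query, content) =>
  query.split(" ").filter((w) => content.split(" ").includes(w)).length;

const ranked = retrieveAndRerank("hybrid search", toyRetrieve, toyScore, {
  candidates: 3,
  limit: 2,
});
ranked.then((top) => console.log(top.map((c) => c.id))); // ids in order: b, c
```

Keeping `candidates` larger than `limit` is what gives the second stage room to promote a result the first stage ranked low.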

### Cross-Encoder Models

#### MS-MARCO MiniLM (Recommended for mdcontext)

| Model | Parameters | Latency | Use Case |
| ------------------------- | ---------- | ---------------- | -------------------------- |
| `ms-marco-MiniLM-L-6-v2` | 22.7M | 2-5ms/pair (CPU) | Fast, general purpose |
| `ms-marco-MiniLM-L-12-v2` | 33M | ~10ms/pair | Better quality, still fast |
| `BGE-reranker-base` | - | - | Multilingual support |
| `BGE-reranker-large` | - | - | Best quality, multilingual |

**Key stats**:

- Re-ranking typically improves RAG accuracy by 20-35%
- Adds 200-500ms latency (for top-20 re-ranking)
- Leading organizations see 30-50% improvements in retrieval precision

#### Performance Benefits (2025-2026 Production Data)

> "Three factors converge in 2026 to make reranking mainstream: open-source cross-encoder implementations have matured significantly, models like ms-marco-MiniLM-L-12-v2 deliver 95% of the performance of proprietary alternatives while running on commodity hardware."

### ColBERT: Late Interaction Models

ColBERT uses "late interaction", encoding query and document separately but comparing them at the token level:

**Architecture**:

```
Query:    [q1, q2, q3, ...] → Token embeddings
Document: [d1, d2, d3, ...] → Token embeddings
Score:    MaxSim(Q, D) = Σ_i max_j (q_i · d_j)
```
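
The MaxSim score above is small enough to write out directly. The following is a minimal sketch over plain number arrays, with toy 2-dimensional token embeddings standing in for real model output; `maxSim` and `dot` are illustrative names, not part of any ColBERT library.

```typescript
type Vec = number[];

const dot = (a: Vec, b: Vec): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

/**
 * Late-interaction score: for each query token, take the similarity of its
 * best-matching document token, then sum over the query tokens.
 */
function maxSim(queryTokens: Vec[], docTokens: Vec[]): number {
  if (docTokens.length === 0) return 0;
  return queryTokens.reduce(
    (total, q) => total + Math.max(...docTokens.map((d) => dot(q, d))),
    0,
  );
}

// Toy token embeddings (hypothetical values).
const queryTokens: Vec[] = [[1, 0], [0, 1]];
const docA: Vec[] = [[1, 0], [0, 1], [0.5, 0.5]];
const docB: Vec[] = [[0.5, 0.5]];

const scoreA = maxSim(queryTokens, docA); // 1 + 1 = 2
const scoreB = maxSim(queryTokens, docB); // 0.5 + 0.5 = 1
```

Because the document token embeddings do not depend on the query, they can be computed once at index time; only the `max` and the sum run at query time.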

**Advantages**:

- Better quality than bi-encoders
- Faster than cross-encoders (document embeddings can be precomputed)
- Storage-efficient with ColBERTv2 residual compression (6-10x smaller)

**Production Readiness (2025)**:

- Memory-mapped index storage (ColBERT-serve) reduces RAM by 90%+
- RAGatouille library provides easy Python integration
- Active research area (ECIR 2026 workshop on Late Interaction)

**For mdcontext**: ColBERT is likely overkill given the modest corpus size. Cross-encoders offer simpler integration with similar quality benefits.

### LLM-Based Re-ranking

Using language models to rank search results:

```typescript
// Example prompt
const prompt = `
Given the query: "${query}"

Rank these documents by relevance (most relevant first):
${documents.map((d, i) => `${i + 1}. ${d.title}: ${d.snippet}`).join("\n")}

Return only the numbers in ranked order.
`;
```
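
The prompt asks for "only the numbers in ranked order", so the reply still has to be mapped back to documents. A defensive parser might look like the following sketch; `parseRanking` is a hypothetical helper, and the fallback policy (append anything the model skipped) is an assumption, not something the source specifies.

```typescript
/**
 * Turn a reply such as "3, 1, 2" into zero-based indices [2, 0, 1].
 * Out-of-range and repeated numbers are dropped; documents the model
 * omitted are appended in their original order so nothing is lost.
 */
function parseRanking(reply: string, documentCount: number): number[] {
  const seen = new Set<number>();
  const order: number[] = [];

  for (const match of reply.match(/\d+/g) ?? []) {
    const index = Number(match) - 1; // the prompt numbers documents from 1
    if (index >= 0 && index < documentCount && !seen.has(index)) {
      seen.add(index);
      order.push(index);
    }
  }

  // Keep any document the model failed to mention, in original order.
  for (let i = 0; i < documentCount; i++) {
    if (!seen.has(i)) order.push(i);
  }
  return order;
}

const clean = parseRanking("3, 1, 2", 3); // [2, 0, 1]
const messy = parseRanking("2\n2\n7", 3); // [1, 0, 2]
```

Validation of this kind matters because nothing forces the model to return a valid permutation.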

**Pros**: Highly accurate, understands nuance
**Cons**: Slow, expensive, adds LLM dependency

**Recommendation for mdcontext**: Not recommended. Cross-encoders provide good accuracy without LLM cost/latency.

### JavaScript/TypeScript Implementation Options

#### Option 1: Transformers.js (Browser + Node.js)

```typescript
import { pipeline } from "@xenova/transformers";

// Load cross-encoder for re-ranking
const reranker = await pipeline(
  "text-classification",
  "Xenova/ms-marco-MiniLM-L-6-v2",
);

// Score query-document pairs
const scores = await Promise.all(
  documents.map((doc) => reranker(`${query} [SEP] ${doc.content}`)),
);
```

**Pros**:

- Runs locally (no API calls)
- ONNX runtime + WebGPU acceleration available
- Works in browser and Node.js

**Cons**:

- Model download required (~80MB for MiniLM-L6)
- First load is slow
- Node.js ONNX setup can be tricky

#### Option 2: External Re-ranking API

Services like Cohere, Jina, or self-hosted endpoints.

```typescript
const response = await fetch("https://api.cohere.ai/v1/rerank", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json", // needed so the JSON body is parsed as JSON
  },
  body: JSON.stringify({
    query,
    documents: docs.map((d) => d.content),
    top_n: 10,
    model: "rerank-english-v2.0",
  }),
});
```
|
|
277
|
+
|
|
278
|
+
**Pros**: Easy integration, no local model management
|
|
279
|
+
**Cons**: API cost, latency, dependency
|
|
280
|
+
|
|
281
|
+
### When to Use Re-ranking
|
|
282
|
+
|
|
283
|
+
| Use Case | Re-ranking Value |
|
|
284
|
+
| ------------------------------ | ---------------- |
|
|
285
|
+
| High-precision requirements | High |
|
|
286
|
+
| Long documents with dense info | High |
|
|
287
|
+
| Ambiguous queries | High |
|
|
288
|
+
| Simple keyword searches | Low |
|
|
289
|
+
| Real-time autocomplete | Low (latency) |
|
|
290
|
+
| Very small result sets (<5) | Low |
|
|
291
|
+
|
|
292
|
+
**For mdcontext**: Medium-high value. Documentation search benefits from re-ranking because section embeddings may rank "close enough" results highly, and cross-encoders can distinguish subtle relevance differences.
|
|
293
|
+
|
|
294
|
+
---
|
|
295
|
+
|
|
296
|
+
## 3. Vector Index Alternatives
|
|
297
|
+
|
|
298
|
+
### HNSW (Current mdcontext Implementation)
|
|
299
|
+
|
|
300
|
+
**Strengths**:
|
|
301
|
+
|
|
302
|
+
- Near-instantaneous nearest neighbor retrieval
|
|
303
|
+
- Excellent recall and speed when data fits in RAM
|
|
304
|
+
- Incremental updates without rebuild
|
|
305
|
+
- Well-supported in Node.js (hnswlib-node)
|
|
306
|
+
|
|
307
|
+
**Weaknesses**:
|
|
308
|
+
|
|
309
|
+
- Entire graph must fit in memory
|
|
310
|
+
- Higher memory footprint per vector (graph structure overhead)
|
|
311
|
+
|
|
312
|
+
**Best for**: Mid-sized datasets (<10M vectors) with RAM budget
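
To put the RAM budget in numbers, here is a rough back-of-the-envelope estimator. It assumes an hnswlib-style layout where each element stores its float32 vector, up to `2 * M` base-layer neighbor ids (4 bytes each) plus a 4-byte link count, and an 8-byte label. Upper-layer links are ignored, so treat the result as a lower bound. The function name and the 1536-dimension figure in the usage note are illustrative assumptions, not mdcontext settings.

```typescript
// Approximate base-layer memory for an hnswlib-style index, in bytes.
function estimateHnswBytes(numVectors: number, dim: number, M: number): number {
  const vectorBytes = dim * 4; // float32 components
  const linkBytes = 2 * M * 4 + 4; // base layer keeps up to 2*M neighbor ids + a count
  const labelBytes = 8; // external label per element
  return numVectors * (vectorBytes + linkBytes + labelBytes);
}
```

For example, `estimateHnswBytes(100_000, 1536, 16)` returns `628400000`, roughly 628 MB. At that dimensionality the vector data dominates and the graph links add only about 2%.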
|
|
313
|
+
|
|
314
|
+
### IVF (Inverted File Index)
|
|
315
|
+
|
|
316
|
+
**Architecture**: Clusters vectors using k-means, searches only relevant clusters
|
|
317
|
+
|
|
318
|
+
**Strengths**:
|
|
319
|
+
|
|
320
|
+
- Lower memory than HNSW (loads clusters on-demand)
|
|
321
|
+
- Configurable recall/speed tradeoff via nprobe parameter
|
|
322
|
+
- IVF+PQ enables billion-scale on disk
|
|
323
|
+
|
|
324
|
+
**Weaknesses**:
|
|
325
|
+
|
|
326
|
+
- Accuracy depends on clustering quality
|
|
327
|
+
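- Large updates require re-clustering (new vectors can be appended to existing clusters, but recall degrades as the data drifts away from the trained centroids)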
|
|
328
|
+
- Cold queries may miss results if clusters are poor
|
|
329
|
+
|
|
330
|
+
**Best for**: Large static datasets, memory-constrained environments
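
The role of `nprobe` is easiest to see in a toy version. The sketch below assumes centroids and cluster assignments were already produced by k-means; `IvfIndex` and `ivfSearch` are hypothetical names, and everything lives in memory for simplicity.

```typescript
type Vec = number[];

function squaredDistance(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

interface IvfIndex {
  centroids: Vec[]; // one centroid per cluster (e.g. from k-means)
  lists: number[][]; // lists[c] = ids of the vectors assigned to cluster c
  vectors: Vec[]; // vectors[id] = stored vector
}

// Return the ids of the k nearest vectors found in the nprobe closest clusters.
function ivfSearch(index: IvfIndex, query: Vec, k: number, nprobe: number): number[] {
  // 1. Rank clusters by centroid distance and keep the nprobe closest.
  const probed = index.centroids
    .map((centroid, cluster) => ({ cluster, d: squaredDistance(query, centroid) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, nprobe);

  // 2. Scan only the vectors inside those clusters.
  const hits: Array<{ id: number; d: number }> = [];
  for (const { cluster } of probed) {
    for (const id of index.lists[cluster]) {
      hits.push({ id, d: squaredDistance(query, index.vectors[id]) });
    }
  }
  return hits.sort((a, b) => a.d - b.d).slice(0, k).map((h) => h.id);
}
```

A low `nprobe` scans fewer vectors but can miss a true neighbor that sits just across a cluster boundary; raising it trades speed for recall.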
|
|
331
|
+
|
|
332
|
+
### DiskANN
|
|
333
|
+
|
|
334
|
+
**Architecture**: Vamana graph + product quantization for SSD storage
|
|
335
|
+
|
|
336
|
+
**Strengths**:
|
|
337
|
+
|
|
338
|
+
- Handles datasets larger than RAM
|
|
339
|
+
- Stable latency with beam search and caching
|
|
340
|
+
- Good for dynamic datasets when using the FreshDiskANN variant
|
|
341
|
+
|
|
342
|
+
**Weaknesses**:
|
|
343
|
+
|
|
344
|
+
- IOPS bottlenecks possible
|
|
345
|
+
- Base DiskANN is immutable (FreshDiskANN adds updates)
|
|
346
|
+
- More complex setup
|
|
347
|
+
|
|
348
|
+
**Best for**: Large datasets (10M+) where ~25% fits in RAM
|
|
349
|
+
|
|
350
|
+
### Comparison Summary
|
|
351
|
+
|
|
352
|
+
| Index | Memory | Speed | Updates | Best Scale |
|
|
353
|
+
| ----------- | ------ | ------- | ------- | ------------ |
|
|
354
|
+
| **HNSW** | High | Fastest | Easy | <10M vectors |
|
|
355
|
+
| **IVF** | Medium | Fast | Rebuild | 10M-100M |
|
|
356
|
+
| **DiskANN** | Low | Good | Limited | 100M+ |
|
|
357
|
+
|
|
358
|
+
### Node.js Library Options
|
|
359
|
+
|
|
360
|
+
| Library | Index Types | Notes |
|
|
361
|
+
| ------------------------ | ------------- | --------------------------------------------- |
|
|
362
|
+
| **hnswlib-node** | HNSW only | Mature, reliable, current mdcontext choice |
|
|
363
|
+
| **faiss-node** | IVF, HNSW, PQ | Facebook's FAISS bindings, more index options |
|
|
364
|
+
| **LangChain FaissStore** | FAISS-backed | Higher-level API, LangChain ecosystem |
|
|
365
|
+
| **hnswsqlite** | HNSW + SQLite | Persistence with metadata |
|
|
366
|
+
|
|
367
|
+
**Recommendation for mdcontext**: Stay with hnswlib-node. Documentation corpora are typically <100K sections, well within HNSW's sweet spot. The complexity of FAISS isn't warranted.
|
|
368
|
+
|
|
369
|
+
---
|
|
370
|
+
|
|
371
|
+
## 4. Filtering and Metadata
|
|
372
|
+
|
|
373
|
+
### The Filtering Challenge
|
|
374
|
+
|
|
375
|
+
Vector search filtering is non-trivial because ANN indexes like HNSW optimize for similarity, not attribute filtering.
|
|
376
|
+
|
|
377
|
+
### Three Strategies
|
|
378
|
+
|
|
379
|
+
#### Pre-Filtering (Filter-Then-Search)
|
|
380
|
+
|
|
381
|
+
1. Apply metadata filter (e.g., `path LIKE 'docs/api/%'`)
|
|
382
|
+
2. Run ANN search on filtered subset
|
|
383
|
+
|
|
384
|
+
**Pros**:
|
|
385
|
+
|
|
386
|
+
- Accurate results (only searches valid candidates)
|
|
387
|
+
- Works well for low-cardinality filters
|
|
388
|
+
|
|
389
|
+
**Cons**:
|
|
390
|
+
|
|
391
|
+
- **Breaks HNSW graph connectivity** when filter is highly selective
|
|
392
|
+
- May require brute-force search on small filtered sets
|
|
393
|
+
- Significant recall drop when <10% vectors remain
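
When the filtered subset is small, exact search is cheap and sidesteps the connectivity problem entirely. A minimal sketch of brute-force cosine search over pre-filtered candidates; `Candidate` and `bruteForceTopK` are hypothetical names, and embeddings are assumed to be plain number arrays of equal length.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as unrelated
}

interface Candidate {
  id: string;
  embedding: number[];
}

// Score every candidate exactly and return the k most similar, best first.
function bruteForceTopK(
  query: number[],
  candidates: Candidate[],
  k: number,
): Array<{ id: string; score: number }> {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The cost is linear in the number of candidates, which is why this only makes sense after a selective filter.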
|
|
394
|
+
|
|
395
|
+
#### Post-Filtering (Search-Then-Filter)
|
|
396
|
+
|
|
397
|
+
1. Run ANN search for k\*N candidates (N is the over-fetch factor)
|
|
398
|
+
2. Apply metadata filter
|
|
399
|
+
3. Return top k that pass filter
|
|
400
|
+
|
|
401
|
+
**Pros**:
|
|
402
|
+
|
|
403
|
+
- Predictable latency
|
|
404
|
+
- HNSW graph stays intact
|
|
405
|
+
|
|
406
|
+
**Cons**:
|
|
407
|
+
|
|
408
|
+
- May return fewer than k results
|
|
409
|
+
- Wastes computation on filtered-out results
|
|
410
|
+
- Poor recall with selective filters
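
The three steps and the "fewer than k" failure mode can be sketched in a few lines. `postFilterSearch` is a hypothetical helper; the `search` callback stands in for the ANN query and is synchronous here only to keep the sketch short.

```typescript
// Over-fetch k * overFetch candidates, drop those that fail the filter,
// and return at most k survivors in their original ranked order.
function postFilterSearch<T>(
  search: (limit: number) => T[],
  passes: (item: T) => boolean,
  k: number,
  overFetch: number,
): T[] {
  const candidates = search(Math.ceil(k * overFetch));
  return candidates.filter(passes).slice(0, k);
}
```

If the filter rejects most candidates, the result is shorter than `k` no matter how good the ANN search was, which is the recall problem listed above.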
|
|
411
|
+
|
|
412
|
+
#### Integrated Filtering (In-Algorithm)
|
|
413
|
+
|
|
414
|
+
Modern vector databases modify the search algorithm to be filter-aware:
|
|
415
|
+
|
|
416
|
+
- **Weaviate ACORN**: Two-hop graph expansion for filtered search
|
|
417
|
+
- **Qdrant**: Pre-filtering with automatic fallback to payload index
|
|
418
|
+
- **Pinecone**: Merged metadata and vector indexes
|
|
419
|
+
|
|
420
|
+
**Performance**: Engines with integrated filtering maintain recall and often get _faster_ with filters (less work to do).
|
|
421
|
+
|
|
422
|
+
### Current mdcontext Filtering
|
|
423
|
+
|
|
424
|
+
From `current-implementation.md`:
|
|
425
|
+
|
|
426
|
+
- Only path pattern filtering supported (`pathPattern` option)
|
|
427
|
+
- Implemented as post-filtering
|
|
428
|
+
|
|
429
|
+
### Recommended Approach for mdcontext
|
|
430
|
+
|
|
431
|
+
Given typical documentation corpus sizes (<100K sections), a pragmatic hybrid approach is to pick the strategy based on how selective the filter is:
|
|
432
|
+
|
|
433
|
+
```typescript
|
|
434
|
+
interface FilteredSearchOptions {
|
|
435
|
+
pathPattern?: string;
|
|
436
|
+
documentTypes?: string[];
|
|
437
|
+
minTokens?: number;
|
|
438
|
+
// Future: tags, dates, etc.
|
|
439
|
+
}
|
|
440
|
+
|
|
441
|
+
async function filteredSearch(query: string, options: FilteredSearchOptions, limit = 10) {
|
|
442
|
+
// 1. Estimate filter selectivity
|
|
443
|
+
const totalDocs = await getDocumentCount();
|
|
444
|
+
const filteredCount = await estimateFilteredCount(options);
|
|
445
|
+
const selectivity = filteredCount / totalDocs;
|
|
446
|
+
|
|
447
|
+
if (selectivity < 0.1) {
|
|
448
|
+
// Highly selective: brute-force on filtered set
|
|
449
|
+
const candidates = await getFilteredSections(options);
|
|
450
|
+
return bruteForceSearch(query, candidates);
|
|
451
|
+
} else if (selectivity < 0.5) {
|
|
452
|
+
// Medium selectivity: over-fetch then filter
|
|
453
|
+
const results = await semanticSearch(query, { limit: limit * 3 });
|
|
454
|
+
return applyFilters(results, options).slice(0, limit);
|
|
455
|
+
} else {
|
|
456
|
+
// Low selectivity: standard search with post-filter
|
|
457
|
+
const results = await semanticSearch(query, { limit: Math.ceil(limit * 1.5) });
|
|
458
|
+
return applyFilters(results, options).slice(0, limit);
|
|
459
|
+
}
|
|
460
|
+
}
|
|
461
|
+
```
|
|
462
|
+
|
|
463
|
+
### Metadata to Consider
|
|
464
|
+
|
|
465
|
+
| Metadata | Use Case |
|
|
466
|
+
| -------------- | ---------------------------------- |
|
|
467
|
+
| `documentPath` | Filter by directory/file |
|
|
468
|
+
| `documentType` | Filter API docs, guides, tutorials |
|
|
469
|
+
| `lastModified` | Prefer recent content |
|
|
470
|
+
| `tokens` | Filter by content length |
|
|
471
|
+
| `headingLevel` | Prefer top-level sections |
|
|
472
|
+
| `tags` | Custom categorization |
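
One possible way to represent these fields and test a section against a filter is sketched below. The field names follow the table; the `MetadataFilter` shape, the prefix-based path match, and the `matchesFilter` helper are assumptions made for illustration, not the existing mdcontext API.

```typescript
interface SectionMetadata {
  documentPath: string;
  documentType: string;
  lastModified: Date;
  tokens: number;
  headingLevel: number;
  tags: string[];
}

// Hypothetical filter shape; every field is optional and all given fields must match.
interface MetadataFilter {
  pathPrefix?: string; // simplification: prefix match instead of a glob pattern
  documentTypes?: string[];
  minTokens?: number;
  modifiedAfter?: Date;
  tags?: string[]; // section must carry every listed tag
}

function matchesFilter(meta: SectionMetadata, f: MetadataFilter): boolean {
  if (f.pathPrefix !== undefined && !meta.documentPath.startsWith(f.pathPrefix)) return false;
  if (f.documentTypes !== undefined && f.documentTypes.indexOf(meta.documentType) === -1) return false;
  if (f.minTokens !== undefined && meta.tokens < f.minTokens) return false;
  if (f.modifiedAfter !== undefined && meta.lastModified.getTime() <= f.modifiedAfter.getTime()) return false;
  if (f.tags !== undefined && !f.tags.every((t) => meta.tags.indexOf(t) !== -1)) return false;
  return true;
}
```

Ranking preferences such as "prefer recent content" or "prefer top-level sections" would be score adjustments rather than hard filters, so they are left out of this predicate.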
|
|
473
|
+
|
|
474
|
+
---
|
|
475
|
+
|
|
476
|
+
## 5. Emerging Patterns (2025-2026)
|
|
477
|
+
|
|
478
|
+
### Learned Sparse Retrieval (SPLADE)
|
|
479
|
+
|
|
480
|
+
**What it is**: Neural models that produce sparse vectors with semantic term expansion.
|
|
481
|
+
|
|
482
|
+
**How it works**:
|
|
483
|
+
|
|
484
|
+
- Encodes text into sparse vector where dimensions = vocabulary terms
|
|
485
|
+
- Activates semantically related terms (e.g., "study" also activates "learn", "research")
|
|
486
|
+
- Compatible with the same inverted-index infrastructure that BM25 uses
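
To make the inverted-index point concrete, here is a toy sketch of scoring sparse vectors through postings lists. The term weights are hand-written stand-ins for what a SPLADE model would produce; no model is involved, and `buildInvertedIndex` and `searchSparse` are hypothetical names.

```typescript
type SparseVector = Map<string, number>; // term -> weight

interface Posting {
  docId: string;
  weight: number;
}

// Build term -> postings lists from per-document sparse vectors.
function buildInvertedIndex(docs: Map<string, SparseVector>): Map<string, Posting[]> {
  const index = new Map<string, Posting[]>();
  docs.forEach((vector, docId) => {
    vector.forEach((weight, term) => {
      const postings = index.get(term) ?? [];
      postings.push({ docId, weight });
      index.set(term, postings);
    });
  });
  return index;
}

// Score = dot product of query and document weights over shared terms.
function searchSparse(
  index: Map<string, Posting[]>,
  query: SparseVector,
  k: number,
): Array<{ docId: string; score: number }> {
  const scores = new Map<string, number>();
  query.forEach((queryWeight, term) => {
    (index.get(term) ?? []).forEach(({ docId, weight }) => {
      scores.set(docId, (scores.get(docId) ?? 0) + queryWeight * weight);
    });
  });
  return Array.from(scores, ([docId, score]) => ({ docId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Only documents sharing at least one active term with the query are touched, which is what keeps lookup cost close to keyword search.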
|
|
487
|
+
|
|
488
|
+
**SPLADE vs BM25**:
|
|
489
|
+
|
|
490
|
+
| Aspect | BM25 | SPLADE |
|
|
491
|
+
| ------------------- | ----------------- | ---------------------------- |
|
|
492
|
+
| Vocabulary mismatch | Critical weakness | Solved via expansion |
|
|
493
|
+
| Latency | Baseline | Similar (with optimizations) |
|
|
494
|
+
| Quality | Good | Better in-domain |
|
|
495
|
+
| Index compatibility | Inverted index | Inverted index |
|
|
496
|
+
|
|
497
|
+
**2025 Status**:
|
|
498
|
+
|
|
499
|
+
- SPLADE query latency is now close to BM25 (reported difference under 4ms)
|
|
500
|
+
- Best results with hybrid sparse+dense approaches
|
|
501
|
+
- New pruning techniques (Superblock Pruning) up to 16x faster
|
|
502
|
+
|
|
503
|
+
**For mdcontext**: Interesting but adds complexity. BM25 + semantic hybrid likely sufficient.
|
|
504
|
+
|
|
505
|
+
### Query Expansion with HyDE
|
|
506
|
+
|
|
507
|
+
**Hypothetical Document Embeddings (HyDE)**:
|
|
508
|
+
|
|
509
|
+
1. User submits query
|
|
510
|
+
2. LLM generates hypothetical answer document
|
|
511
|
+
3. Embed the hypothetical document (not the query)
|
|
512
|
+
4. Search for real documents similar to the hypothetical
|
|
513
|
+
|
|
514
|
+
**Why it works**: Compares document-to-document rather than question-to-document, bridging the semantic gap.
|
|
515
|
+
|
|
516
|
+
**Implementation**:
|
|
517
|
+
|
|
518
|
+
```typescript
|
|
519
|
+
async function hydeSearch(query: string) {
|
|
520
|
+
// 1. Generate hypothetical document
|
|
521
|
+
const hypothetical = await llm.generate(
|
|
522
|
+
`Write a detailed paragraph that would answer: "${query}"`,
|
|
523
|
+
);
|
|
524
|
+
|
|
525
|
+
// 2. Embed hypothetical (or average multiple)
|
|
526
|
+
const embedding = await embed(hypothetical);
|
|
527
|
+
|
|
528
|
+
// 3. Search with hypothetical embedding
|
|
529
|
+
return vectorStore.search(embedding);
|
|
530
|
+
}
|
|
531
|
+
```
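
The "average multiple" variant mentioned in step 2 generates several hypothetical documents and searches with the mean of their embeddings, which can dampen the effect of any single misleading generation. A minimal sketch of the averaging step; `averageEmbeddings` is a hypothetical helper and assumes all embeddings share one dimension.

```typescript
// Element-wise mean of several embeddings of equal length.
function averageEmbeddings(embeddings: number[][]): number[] {
  if (embeddings.length === 0) {
    throw new Error("averageEmbeddings needs at least one embedding");
  }
  const dim = embeddings[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const embedding of embeddings) {
    if (embedding.length !== dim) {
      throw new Error("all embeddings must have the same dimension");
    }
    for (let i = 0; i < dim; i++) {
      mean[i] += embedding[i] / embeddings.length;
    }
  }
  return mean;
}
```

If the vector store expects unit-length vectors, the mean should be re-normalized before searching.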
|
|
532
|
+
|
|
533
|
+
**Benefits**:
|
|
534
|
+
|
|
535
|
+
- 10-30% retrieval improvement on ambiguous queries
|
|
536
|
+
- Zero-shot (no training required)
|
|
537
|
+
- Domain adaptable
|
|
538
|
+
|
|
539
|
+
**Limitations**:
|
|
540
|
+
|
|
541
|
+
- Requires LLM call (cost, latency)
|
|
542
|
+
- Works poorly if LLM has no domain knowledge
|
|
543
|
+
- May hallucinate misleading hypotheticals
|
|
544
|
+
|
|
545
|
+
**For mdcontext**: Good option for complex queries, could be opt-in feature.
|
|
546
|
+
|
|
547
|
+
### GraphRAG
|
|
548
|
+
|
|
549
|
+
Combines vector search with knowledge graphs:
|
|
550
|
+
|
|
551
|
+
- Entities and relationships extracted from documents
|
|
552
|
+
- Queries traverse both vector space and graph
|
|
553
|
+
- Claims 99% precision in some benchmarks
|
|
554
|
+
|
|
555
|
+
**For mdcontext**: Overkill for documentation search. More relevant for enterprise knowledge bases.
|
|
556
|
+
|
|
557
|
+
### Long-Context RAG
|
|
558
|
+
|
|
559
|
+
Processing longer retrieval units (sections, documents) rather than small chunks.
|
|
560
|
+
|
|
561
|
+
**Benefits**:
|
|
562
|
+
|
|
563
|
+
- Preserves context
|
|
564
|
+
- Reduces fragmentation
|
|
565
|
+
- Better for coherent understanding
|
|
566
|
+
|
|
567
|
+
**mdcontext alignment**: Already uses section-level granularity, well-aligned with this trend.
|
|
568
|
+
|
|
569
|
+
### Self-RAG
|
|
570
|
+
|
|
571
|
+
Self-reflective retrieval that:
|
|
572
|
+
|
|
573
|
+
1. Decides when to retrieve
|
|
574
|
+
2. Evaluates retrieval quality
|
|
575
|
+
3. Critiques generated outputs
|
|
576
|
+
|
|
577
|
+
**For mdcontext**: Beyond current scope, more relevant for RAG pipelines with generation.
|
|
578
|
+
|
|
579
|
+
---
|
|
580
|
+
|
|
581
|
+
## 6. Quick Wins: HNSW Parameter Tuning
|
|
582
|
+
|
|
583
|
+
Current mdcontext parameters (from `current-implementation.md`):
|
|
584
|
+
|
|
585
|
+
```typescript
|
|
586
|
+
M: 16; // Max connections per node
|
|
587
|
+
efConstruction: 200; // Construction-time search width
|
|
588
|
+
efSearch: 100; // Query-time search width (implicit)
|
|
589
|
+
```
|
|
590
|
+
|
|
591
|
+
### Parameter Effects
|
|
592
|
+
|
|
593
|
+
| Parameter | Increase Effect | Decrease Effect |
|
|
594
|
+
| ------------------ | ----------------------------------------- | ----------------------------------------- |
|
|
595
|
+
| **M** | Better recall, larger index, slower build | Faster build, smaller index, lower recall |
|
|
596
|
+
| **efConstruction** | Better graph quality, slower build | Faster build, potentially lower recall |
|
|
597
|
+
| **efSearch** | Better recall, slower queries | Faster queries, lower recall |
|
|
598
|
+
|
|
599
|
+
### Recommended Tuning
|
|
600
|
+
|
|
601
|
+
For documentation search (~1K-100K sections):
|
|
602
|
+
|
|
603
|
+
#### Option A: Balanced (Current)
|
|
604
|
+
|
|
605
|
+
```typescript
|
|
606
|
+
M: 16;
|
|
607
|
+
efConstruction: 200;
|
|
608
|
+
efSearch: 100; // Consider increasing
|
|
609
|
+
```
|
|
610
|
+
|
|
611
|
+
Good balance, may benefit from higher efSearch.
|
|
612
|
+
|
|
613
|
+
#### Option B: Quality-Focused
|
|
614
|
+
|
|
615
|
+
```typescript
|
|
616
|
+
M: 24; // More connections
|
|
617
|
+
efConstruction: 256; // Better graph
|
|
618
|
+
efSearch: 200; // More thorough search
|
|
619
|
+
```
|
|
620
|
+
|
|
621
|
+
~30% more memory, ~95%+ recall, slightly slower build.
|
|
622
|
+
|
|
623
|
+
#### Option C: Speed-Focused
|
|
624
|
+
|
|
625
|
+
```typescript
|
|
626
|
+
M: 12;
|
|
627
|
+
efConstruction: 128;
|
|
628
|
+
efSearch: 64;
|
|
629
|
+
```
|
|
630
|
+
|
|
631
|
+
Faster builds and queries, ~85-90% recall.
|
|
632
|
+
|
|
633
|
+
### Quick Win: Dynamic efSearch
|
|
634
|
+
|
|
635
|
+
Since efSearch can be set at query time:
|
|
636
|
+
|
|
637
|
+
```typescript
|
|
638
|
+
function search(
|
|
639
|
+
query: string,
|
|
640
|
+
options: { quality?: "fast" | "balanced" | "thorough" },
|
|
641
|
+
) {
|
|
642
|
+
const efSearch = {
|
|
643
|
+
fast: 64,
|
|
644
|
+
balanced: 100,
|
|
645
|
+
thorough: 256,
|
|
646
|
+
}[options.quality ?? "balanced"];
|
|
647
|
+
|
|
648
|
+
return embed(query).then((queryEmbedding) => vectorStore.search(queryEmbedding, { efSearch }));
|
|
649
|
+
}
|
|
650
|
+
```
|
|
651
|
+
|
|
652
|
+
### Validation Approach
|
|
653
|
+
|
|
654
|
+
1. Create ground-truth test set (10-20 queries with known relevant sections)
|
|
655
|
+
2. Measure recall@k for different parameters
|
|
656
|
+
3. Measure query latency
|
|
657
|
+
4. Choose based on recall/latency tradeoff
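
Step 2 can be computed with a small helper. This sketch assumes each test case lists the ids returned by search (in rank order) and the set of ids judged relevant; `recallAtK` and `meanRecallAtK` are hypothetical names.

```typescript
interface EvalCase {
  retrieved: string[]; // ids returned by search, best first
  relevant: Set<string>; // ground-truth relevant ids
}

// Fraction of the relevant ids that appear in the top k results.
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const seen = new Set<string>();
  for (const id of retrieved.slice(0, k)) {
    if (relevant.has(id)) seen.add(id); // a Set avoids double-counting duplicates
  }
  return seen.size / relevant.size;
}

// Average recall@k across all test queries.
function meanRecallAtK(cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => sum + recallAtK(c.retrieved, c.relevant, k), 0);
  return total / cases.length;
}
```

Running this once per parameter preset, alongside a latency measurement, gives the numbers needed for step 4.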
|
|
658
|
+
|
|
659
|
+
---
|
|
660
|
+
|
|
661
|
+
## 7. Top 3 Recommendations
|
|
662
|
+
|
|
663
|
+
### Recommendation 1: Hybrid Search with RRF
|
|
664
|
+
|
|
665
|
+
**What**: Add BM25 keyword search alongside semantic search, fuse results with Reciprocal Rank Fusion.
|
|
666
|
+
|
|
667
|
+
**Why**:
|
|
668
|
+
|
|
669
|
+
- Handles exact term matching (API names, error codes)
|
|
670
|
+
- 10-15% precision improvement in benchmarks
|
|
671
|
+
- Low implementation complexity
|
|
672
|
+
- Falls back gracefully (if one method fails, other still works)
|
|
673
|
+
|
|
674
|
+
**Implementation**:
|
|
675
|
+
|
|
676
|
+
1. Add `wink-bm25-text-search` dependency
|
|
677
|
+
2. Build BM25 index during embedding build (uses same section content)
|
|
678
|
+
3. Add `--mode hybrid` option to search command
|
|
679
|
+
4. Implement RRF fusion (~50 lines of code)
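
For a sense of scale on step 4, here is a minimal RRF sketch. Each input is a list of section ids in rank order (for example one from BM25 and one from semantic search); the constant `k = 60` is the value commonly used in the literature and is an assumption here, as is the function name.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per id,
// with rank starting at 1. Ids missing from a list contribute nothing for it.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

Because only ranks are used, the BM25 and cosine scores never need to be normalized against each other.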
|
|
680
|
+
|
|
681
|
+
**Effort**: Medium (2-3 days)
|
|
682
|
+
**Impact**: High
|
|
683
|
+
|
|
684
|
+
### Recommendation 2: Cross-Encoder Re-ranking
|
|
685
|
+
|
|
686
|
+
**What**: Re-rank top-20 semantic search results using ms-marco-MiniLM-L-6-v2 cross-encoder.
|
|
687
|
+
|
|
688
|
+
**Why**:
|
|
689
|
+
|
|
690
|
+
- 20-35% accuracy improvement
|
|
691
|
+
- Catches relevant results that rank lower in embedding space
|
|
692
|
+
- Can be opt-in (--rerank flag) to avoid latency when not needed
|
|
693
|
+
- Modern cross-encoders are fast (2-5ms per pair)
|
|
694
|
+
|
|
695
|
+
**Implementation**:
|
|
696
|
+
|
|
697
|
+
1. Add Transformers.js dependency or use API (Cohere/Jina)
|
|
698
|
+
2. Load cross-encoder model on first rerank request
|
|
699
|
+
3. Score top-N candidates
|
|
700
|
+
4. Re-sort by cross-encoder score
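
Steps 3 and 4 are independent of which model does the scoring. The sketch below takes the scorer as a callback so that a local cross-encoder or an API client could be plugged in; `Hit`, `PairScorer`, and `rerankTopN` are hypothetical names, and the scorer is synchronous only to keep the example short (a real one would be async).

```typescript
interface Hit {
  id: string;
  content: string;
  score: number;
}

// Scores one (query, passage) pair; higher means more relevant.
type PairScorer = (query: string, passage: string) => number;

// Re-score the first topN hits and re-sort them; hits beyond topN keep
// their original order and are appended unchanged.
function rerankTopN(query: string, hits: Hit[], scorePair: PairScorer, topN = 20): Hit[] {
  const head = hits
    .slice(0, topN)
    .map((hit) => ({ ...hit, score: scorePair(query, hit.content) }))
    .sort((a, b) => b.score - a.score);
  const tail = hits.slice(topN); // note: these still carry embedding scores
  return head.concat(tail);
}
```

Scores in the head and tail come from different models and are not comparable, so callers should rely on the returned order rather than on the `score` field across that boundary.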
|
|
701
|
+
|
|
702
|
+
**Effort**: Medium (2-3 days for Transformers.js, 1 day for API)
|
|
703
|
+
**Impact**: High
|
|
704
|
+
|
|
705
|
+
### Recommendation 3: HNSW Parameter Optimization
|
|
706
|
+
|
|
707
|
+
**What**: Tune HNSW parameters based on corpus size and add dynamic efSearch.
|
|
708
|
+
|
|
709
|
+
**Why**:
|
|
710
|
+
|
|
711
|
+
- Zero dependency changes
|
|
712
|
+
- Immediate quality/speed improvements
|
|
713
|
+
- Low risk
|
|
714
|
+
|
|
715
|
+
**Implementation**:
|
|
716
|
+
|
|
717
|
+
1. Add config options for M, efConstruction
|
|
718
|
+
2. Implement dynamic efSearch (fast/balanced/thorough)
|
|
719
|
+
3. Add `--quality` flag to search command
|
|
720
|
+
4. Consider auto-tuning based on corpus size
|
|
721
|
+
|
|
722
|
+
**Effort**: Low (1 day)
|
|
723
|
+
**Impact**: Medium
|
|
724
|
+
|
|
725
|
+
---
|
|
726
|
+
|
|
727
|
+
## 8. Effort/Impact Analysis
|
|
728
|
+
|
|
729
|
+
### Summary Matrix
|
|
730
|
+
|
|
731
|
+
| Improvement | Effort | Impact | Risk | Priority |
|
|
732
|
+
| ---------------------------- | ------------- | ---------- | -------- | -------- |
|
|
733
|
+
| **HNSW parameter tuning** | Low (1d) | Medium | Very Low | P0 |
|
|
734
|
+
| **Hybrid search (BM25+RRF)** | Medium (2-3d) | High | Low | P1 |
|
|
735
|
+
| **Cross-encoder re-ranking** | Medium (2-3d) | High | Medium | P1 |
|
|
736
|
+
| **Dynamic efSearch** | Low (0.5d) | Low-Medium | Very Low | P0 |
|
|
737
|
+
| **HyDE query expansion** | Medium (2d) | Medium | Medium | P2 |
|
|
738
|
+
| **Enhanced filtering** | Medium (2d) | Medium | Low | P2 |
|
|
739
|
+
| **SPLADE sparse retrieval** | High (5d+) | Medium | Medium | P3 |
|
|
740
|
+
| **ColBERT late interaction** | High (1w+) | Medium | High | P3 |
|
|
741
|
+
|
|
742
|
+
### Recommended Implementation Order
|
|
743
|
+
|
|
744
|
+
**Phase 1: Quick Wins (Week 1)**
|
|
745
|
+
|
|
746
|
+
1. HNSW parameter optimization + dynamic efSearch
|
|
747
|
+
2. Add quality flag to search CLI
|
|
748
|
+
|
|
749
|
+
**Phase 2: Hybrid Search (Week 2)**
|
|
750
|
+
|
|
751
|
+
1. Integrate BM25 library
|
|
752
|
+
2. Build BM25 index during embedding build
|
|
753
|
+
3. Implement RRF fusion
|
|
754
|
+
4. Add hybrid mode to CLI
|
|
755
|
+
|
|
756
|
+
**Phase 3: Re-ranking (Week 3)**
|
|
757
|
+
|
|
758
|
+
1. Evaluate Transformers.js vs API approach
|
|
759
|
+
2. Implement re-ranking pipeline
|
|
760
|
+
3. Add --rerank flag
|
|
761
|
+
4. Cache loaded models
|
|
762
|
+
|
|
763
|
+
**Phase 4: Polish (Week 4)**
|
|
764
|
+
|
|
765
|
+
1. Add HyDE as opt-in for complex queries
|
|
766
|
+
2. Enhance metadata filtering
|
|
767
|
+
3. Add search quality metrics/logging
|
|
768
|
+
4. Documentation
|
|
769
|
+
|
|
770
|
+
### Risk Mitigation
|
|
771
|
+
|
|
772
|
+
| Risk | Mitigation |
|
|
773
|
+
| --------------------------- | --------------------------- |
|
|
774
|
+
| Transformers.js ONNX issues | Fallback to API reranking |
|
|
775
|
+
| BM25 index size | Store separately, lazy load |
|
|
776
|
+
| Increased latency | Make re-ranking opt-in |
|
|
777
|
+
| Model download size | Cache models, lazy load |
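
The "lazy load" mitigations can share one small utility. This is a sketch of a memoizing loader: the expensive load (a model, a BM25 index) runs on first use, concurrent callers share the same in-flight promise, and a failed load is not cached so a later call can retry. The `lazy` helper is a hypothetical name.

```typescript
// Wrap an async loader so it runs at most once per successful load.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((err) => {
        cached = undefined; // forget failures so the next call retries
        throw err;
      });
    }
    return cached;
  };
}
```

A caller would create the getter once at module level, for example `const getReranker = lazy(loadReranker)` with a hypothetical `loadReranker`, and `await getReranker()` wherever the model is needed.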
|
|
778
|
+
|
|
779
|
+
---
|
|
780
|
+
|
|
781
|
+
## Sources
|
|
782
|
+
|
|
783
|
+
### Hybrid Search
|
|
784
|
+
|
|
785
|
+
- [Hybrid Search Explained - Weaviate](https://weaviate.io/blog/hybrid-search-explained)
|
|
786
|
+
- [Hybrid Search with BM25 and Rank Fusion - Medium](https://medium.com/thinking-sand/hybrid-search-with-bm25-and-rank-fusion-for-accurate-results-456a70305dc5)
|
|
787
|
+
- [Hybrid Search Scoring (RRF) - Azure AI Search](https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking)
|
|
788
|
+
- [Comprehensive Hybrid Search Guide - Elastic](https://www.elastic.co/what-is/hybrid-search)
|
|
789
|
+
- [Reciprocal Rank Fusion - ParadeDB](https://www.paradedb.com/learn/search-concepts/reciprocal-rank-fusion)
|
|
790
|
+
|
|
791
|
+
### Re-ranking
|
|
792
|
+
|
|
793
|
+
- [cross-encoder/ms-marco-MiniLM-L6-v2 - Hugging Face](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2)
|
|
794
|
+
- [RAG Reranking Techniques - CustomGPT](https://customgpt.ai/rag-reranking-techniques/)
|
|
795
|
+
- [Adaptive Retrieval Reranking - RAG About It](https://ragaboutit.com/adaptive-retrieval-reranking-how-to-implement-cross-encoder-models-to-fix-enterprise-rag-ranking-failures/)
|
|
796
|
+
- [MS MARCO Cross-Encoders - Sentence Transformers](https://www.sbert.net/docs/pretrained-models/ce-msmarco.html)
|
|
797
|
+
- [FlashRank - GitHub](https://github.com/PrithivirajDamodaran/FlashRank)
|
|
798
|
+
|
|
799
|
+
### Vector Indexes
|
|
800
|
+
|
|
801
|
+
- [Vector Search at Scale: HNSW vs IVF vs DiskANN](https://netcrit.net/vector-search-at-scale-hnsw-vs-ivf-vs-diskann)
|
|
802
|
+
- [HNSW vs DiskANN - Tiger Data](https://www.tigerdata.com/learn/hnsw-vs-diskann)
|
|
803
|
+
- [How to Pick a Vector Index - Zilliz](https://zilliz.com/learn/how-to-pick-a-vector-index-in-milvus-visual-guide)
|
|
804
|
+
- [HNSW Index Explained - Milvus](https://milvus.io/docs/index-explained.md)
|
|
805
|
+
|
|
806
|
+
### HNSW Tuning
|
|
807
|
+
|
|
808
|
+
- [Practical Guide to HNSW Hyperparameters - OpenSearch](https://opensearch.org/blog/a-practical-guide-to-selecting-hnsw-hyperparameters/)
|
|
809
|
+
- [HNSW Configuration Parameters - Milvus AI Reference](https://milvus.io/ai-quick-reference/what-are-the-key-configuration-parameters-for-an-hnsw-index-such-as-m-and-efconstructionefsearch-and-how-does-each-influence-the-tradeoff-between-index-size-build-time-query-speed-and-recall)
|
|
810
|
+
- [HNSW Indexes with Postgres - Crunchy Data](https://www.crunchydata.com/blog/hnsw-indexes-with-postgres-and-pgvector)
|
|
811
|
+
|
|
812
|
+
### Filtering
|
|
813
|
+
|
|
814
|
+
- [Complete Guide to Filtering in Vector Search - Qdrant](https://qdrant.tech/articles/vector-search-filtering/)
|
|
815
|
+
- [Vector Query Filters - Azure AI Search](https://learn.microsoft.com/en-us/azure/search/vector-search-filters)
|
|
816
|
+
- [Achilles Heel of Vector Search: Filters](https://yudhiesh.github.io/2025/05/09/the-achilles-heel-of-vector-search-filters/)
|
|
817
|
+
- [Metadata Filtering and Hybrid Search - Dataquest](https://www.dataquest.io/blog/metadata-filtering-and-hybrid-search-for-vector-databases/)
|
|
818
|
+
|
|
819
|
+
### Emerging Patterns
|
|
820
|
+
|
|
821
|
+
- [Late Interaction Overview: ColBERT, ColPali - Weaviate](https://weaviate.io/blog/late-interaction-overview)
|
|
822
|
+
- [Modern Sparse Neural Retrieval - Qdrant](https://qdrant.tech/articles/modern-sparse-neural-retrieval/)
|
|
823
|
+
- [SPLADE vs BM25 - Zilliz](https://zilliz.com/learn/comparing-splade-sparse-vectors-with-bm25)
|
|
824
|
+
- [HyDE for RAG - Machine Learning Plus](https://machinelearningplus.com/gen-ai/hypothetical-document-embedding-hyde-a-smarter-rag-method-to-search-documents/)
|
|
825
|
+
- [Better RAG with HyDE - Zilliz](https://zilliz.com/learn/improve-rag-and-information-retrieval-with-hyde-hypothetical-document-embeddings)
|
|
826
|
+
|
|
827
|
+
### Node.js Libraries
|
|
828
|
+
|
|
829
|
+
- [hnswlib-node - npm](https://www.npmjs.com/package/hnswlib-node)
|
|
830
|
+
- [hnswlib-node - GitHub](https://github.com/yoshoku/hnswlib-node)
|
|
831
|
+
- [wink-bm25-text-search - npm](https://www.npmjs.com/package/wink-bm25-text-search)
|
|
832
|
+
- [OkapiBM25 - GitHub](https://github.com/FurkanToprak/OkapiBM25)
|
|
833
|
+
- [Transformers.js v3 - Hugging Face](https://huggingface.co/blog/transformersjs-v3)
|
|
834
|
+
- [FaissStore - LangChain.js](https://js.langchain.com/docs/integrations/vectorstores/faiss/)
|
|
835
|
+
|
|
836
|
+
### RAG Best Practices
|
|
837
|
+
|
|
838
|
+
- [2025 Guide to RAG - Eden AI](https://www.edenai.co/post/the-2025-guide-to-retrieval-augmented-generation-rag)
|
|
839
|
+
- [Enhancing RAG: Study of Best Practices - arXiv](https://arxiv.org/abs/2501.07391)
|
|
840
|
+
- [RAG 2025 Definitive Guide - Chitika](https://www.chitika.com/retrieval-augmented-generation-rag-the-definitive-guide-2025/)
|
|
841
|
+
- [Role of Sufficient Context in RAG - Google Research](https://research.google/blog/deeper-insights-into-retrieval-augmented-generation-the-role-of-sufficient-context/)
|