mdcontext 0.0.1 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.changeset/README.md +28 -0
- package/.changeset/config.json +11 -0
- package/.claude/settings.local.json +25 -0
- package/.github/workflows/ci.yml +83 -0
- package/.github/workflows/claude-code-review.yml +44 -0
- package/.github/workflows/claude.yml +85 -0
- package/.github/workflows/release.yml +113 -0
- package/.tldrignore +112 -0
- package/BACKLOG.md +338 -0
- package/CONTRIBUTING.md +186 -0
- package/NOTES/NOTES +44 -0
- package/README.md +434 -11
- package/biome.json +36 -0
- package/cspell.config.yaml +14 -0
- package/dist/chunk-23UPXDNL.js +3044 -0
- package/dist/chunk-2W7MO2DL.js +1366 -0
- package/dist/chunk-3NUAZGMA.js +1689 -0
- package/dist/chunk-7TOWB2XB.js +366 -0
- package/dist/chunk-7XOTOADQ.js +3065 -0
- package/dist/chunk-AH2PDM2K.js +3042 -0
- package/dist/chunk-BNXWSZ63.js +3742 -0
- package/dist/chunk-BTL5DJVU.js +3222 -0
- package/dist/chunk-HDHYG7E4.js +104 -0
- package/dist/chunk-HLR4KZBP.js +3234 -0
- package/dist/chunk-IP3FRFEB.js +1045 -0
- package/dist/chunk-KHU56VDO.js +3042 -0
- package/dist/chunk-KRYIFLQR.js +88 -0
- package/dist/chunk-LBSDNLEM.js +287 -0
- package/dist/chunk-MNTQ7HCP.js +2643 -0
- package/dist/chunk-MUJELQQ6.js +1387 -0
- package/dist/chunk-MXJGMSLV.js +2199 -0
- package/dist/chunk-N6QJGC3Z.js +2636 -0
- package/dist/chunk-OBELGBPM.js +1713 -0
- package/dist/chunk-OT7R5XTA.js +3192 -0
- package/dist/chunk-P7X4RA2T.js +106 -0
- package/dist/chunk-PIDUQNC2.js +3185 -0
- package/dist/chunk-POGCDIH4.js +3187 -0
- package/dist/chunk-PSIEOQGZ.js +3043 -0
- package/dist/chunk-PVRT3IHA.js +3238 -0
- package/dist/chunk-QNN4TT23.js +1430 -0
- package/dist/chunk-RE3R45RJ.js +3042 -0
- package/dist/chunk-S7E6TFX6.js +803 -0
- package/dist/chunk-SG6GLU4U.js +1378 -0
- package/dist/chunk-SJCDV2ST.js +274 -0
- package/dist/chunk-SYE5XLF3.js +104 -0
- package/dist/chunk-T5VLYBZD.js +103 -0
- package/dist/chunk-TOQB7VWU.js +3238 -0
- package/dist/chunk-VFNMZ4ZQ.js +3228 -0
- package/dist/chunk-VVTGZNBT.js +1629 -0
- package/dist/chunk-W7Q4RFEV.js +104 -0
- package/dist/chunk-XTYYVRLO.js +3190 -0
- package/dist/chunk-Y6MDYVJD.js +3063 -0
- package/dist/cli/main.d.ts +1 -0
- package/dist/cli/main.js +5458 -0
- package/dist/index.d.ts +653 -0
- package/dist/index.js +79 -0
- package/dist/mcp/server.d.ts +1 -0
- package/dist/mcp/server.js +472 -0
- package/dist/schema-BAWSG7KY.js +22 -0
- package/dist/schema-E3QUPL26.js +20 -0
- package/dist/schema-EHL7WUT6.js +20 -0
- package/docs/019-USAGE.md +625 -0
- package/docs/020-current-implementation.md +364 -0
- package/docs/021-DOGFOODING-FINDINGS.md +175 -0
- package/docs/BACKLOG.md +80 -0
- package/docs/CONFIG.md +1123 -0
- package/docs/DESIGN.md +439 -0
- package/docs/ERRORS.md +383 -0
- package/docs/PROJECT.md +88 -0
- package/docs/ROADMAP.md +407 -0
- package/docs/summarization.md +320 -0
- package/docs/test-links.md +9 -0
- package/justfile +40 -0
- package/package.json +74 -9
- package/pnpm-workspace.yaml +5 -0
- package/research/INDEX.md +315 -0
- package/research/code-review/README.md +90 -0
- package/research/code-review/cli-error-handling-review.md +979 -0
- package/research/code-review/code-review-validation-report.md +464 -0
- package/research/code-review/main-ts-review.md +1128 -0
- package/research/config-analysis/01-current-implementation.md +470 -0
- package/research/config-analysis/02-strategy-recommendation.md +428 -0
- package/research/config-analysis/03-task-candidates.md +715 -0
- package/research/config-analysis/033-research-configuration-management.md +828 -0
- package/research/config-analysis/034-research-effect-cli-config.md +1504 -0
- package/research/config-analysis/04-consolidated-task-candidates.md +277 -0
- package/research/config-docs/SUMMARY.md +357 -0
- package/research/config-docs/TEST-RESULTS.md +776 -0
- package/research/config-docs/TODO.md +542 -0
- package/research/config-docs/analysis.md +744 -0
- package/research/config-docs/fix-validation.md +502 -0
- package/research/config-docs/help-audit.md +264 -0
- package/research/config-docs/help-system-analysis.md +890 -0
- package/research/dogfood/consolidated-tool-evaluation.md +373 -0
- package/research/dogfood/strategy-a/a-synthesis.md +184 -0
- package/research/dogfood/strategy-a/a1-docs.md +226 -0
- package/research/dogfood/strategy-a/a2-amorphic.md +156 -0
- package/research/dogfood/strategy-a/a3-llm.md +164 -0
- package/research/dogfood/strategy-b/b-synthesis.md +228 -0
- package/research/dogfood/strategy-b/b1-architecture.md +207 -0
- package/research/dogfood/strategy-b/b2-gaps.md +258 -0
- package/research/dogfood/strategy-b/b3-workflows.md +250 -0
- package/research/dogfood/strategy-c/c-synthesis.md +451 -0
- package/research/dogfood/strategy-c/c1-explorer.md +192 -0
- package/research/dogfood/strategy-c/c2-diver-memory.md +145 -0
- package/research/dogfood/strategy-c/c3-diver-control.md +148 -0
- package/research/dogfood/strategy-c/c4-diver-failure.md +151 -0
- package/research/dogfood/strategy-c/c5-diver-execution.md +221 -0
- package/research/dogfood/strategy-c/c6-diver-org.md +221 -0
- package/research/effect-cli-error-handling.md +845 -0
- package/research/effect-errors-as-values.md +943 -0
- package/research/errors-task-analysis/00-consolidated-tasks.md +207 -0
- package/research/errors-task-analysis/cli-commands-analysis.md +909 -0
- package/research/errors-task-analysis/embeddings-analysis.md +709 -0
- package/research/errors-task-analysis/index-search-analysis.md +812 -0
- package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
- package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
- package/research/issue-review.md +603 -0
- package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
- package/research/llm-summarization/alternative-providers-2026.md +1428 -0
- package/research/llm-summarization/anthropic-2026.md +367 -0
- package/research/llm-summarization/claude-cli-integration.md +1706 -0
- package/research/llm-summarization/cli-integration-patterns.md +3155 -0
- package/research/llm-summarization/openai-2026.md +473 -0
- package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
- package/research/llm-summarization/opencode-cli-integration.md +1552 -0
- package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
- package/research/llm-summarization/prototype-results.md +56 -0
- package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
- package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
- package/research/mdcontext-error-analysis.md +521 -0
- package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
- package/research/mdcontext-pudding/01-index-embed.md +956 -0
- package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
- package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
- package/research/mdcontext-pudding/02-search.md +970 -0
- package/research/mdcontext-pudding/03-context.md +779 -0
- package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
- package/research/mdcontext-pudding/04-tree.md +704 -0
- package/research/mdcontext-pudding/05-config.md +1038 -0
- package/research/mdcontext-pudding/06-links-summary.txt +87 -0
- package/research/mdcontext-pudding/06-links.md +679 -0
- package/research/mdcontext-pudding/07-stats.md +693 -0
- package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
- package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
- package/research/mdcontext-pudding/README.md +168 -0
- package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
- package/research/npm_publish/011-npm-workflow-research-agent2.md +792 -0
- package/research/npm_publish/012-npm-workflow-research-agent1.md +530 -0
- package/research/npm_publish/013-npm-workflow-research-agent3.md +722 -0
- package/research/npm_publish/014-npm-workflow-synthesis.md +556 -0
- package/research/npm_publish/031-npm-workflow-task-analysis.md +134 -0
- package/research/research-quality-review.md +834 -0
- package/research/semantic-search/002-research-embedding-models.md +490 -0
- package/research/semantic-search/003-research-rag-alternatives.md +523 -0
- package/research/semantic-search/004-research-vector-search.md +841 -0
- package/research/semantic-search/032-research-semantic-search.md +427 -0
- package/research/semantic-search/embedding-text-analysis.md +156 -0
- package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
- package/research/semantic-search/query-processing-analysis.md +207 -0
- package/research/semantic-search/root-cause-and-solution.md +114 -0
- package/research/semantic-search/threshold-validation-report.md +69 -0
- package/research/semantic-search/vector-search-analysis.md +63 -0
- package/research/task-management-2026/00-synthesis-recommendations.md +295 -0
- package/research/task-management-2026/01-ai-workflow-tools.md +416 -0
- package/research/task-management-2026/02-agent-framework-patterns.md +476 -0
- package/research/task-management-2026/03-lightweight-file-based.md +567 -0
- package/research/task-management-2026/04-established-tools-ai-features.md +541 -0
- package/research/task-management-2026/linear/01-core-features-workflow.md +771 -0
- package/research/task-management-2026/linear/02-api-integrations.md +930 -0
- package/research/task-management-2026/linear/03-ai-features.md +368 -0
- package/research/task-management-2026/linear/04-pricing-setup.md +205 -0
- package/research/task-management-2026/linear/05-usage-patterns-best-practices.md +605 -0
- package/research/test-path-issues.md +276 -0
- package/review/ALP-76/1-error-type-design.md +962 -0
- package/review/ALP-76/2-error-handling-patterns.md +906 -0
- package/review/ALP-76/3-error-presentation.md +624 -0
- package/review/ALP-76/4-test-coverage.md +625 -0
- package/review/ALP-76/5-migration-completeness.md +440 -0
- package/review/ALP-76/6-effect-best-practices.md +755 -0
- package/scripts/apply-branch-protection.sh +47 -0
- package/scripts/branch-protection-templates.json +79 -0
- package/scripts/prototype-summarization.ts +346 -0
- package/scripts/rebuild-hnswlib.js +58 -0
- package/scripts/setup-branch-protection.sh +64 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
- package/src/cli/argv-preprocessor.test.ts +210 -0
- package/src/cli/argv-preprocessor.ts +202 -0
- package/src/cli/cli.test.ts +627 -0
- package/src/cli/commands/backlinks.ts +54 -0
- package/src/cli/commands/config-cmd.ts +642 -0
- package/src/cli/commands/context.ts +285 -0
- package/src/cli/commands/duplicates.ts +122 -0
- package/src/cli/commands/embeddings.ts +529 -0
- package/src/cli/commands/index-cmd.ts +480 -0
- package/src/cli/commands/index.ts +16 -0
- package/src/cli/commands/links.ts +52 -0
- package/src/cli/commands/search.ts +1281 -0
- package/src/cli/commands/stats.ts +149 -0
- package/src/cli/commands/tree.ts +128 -0
- package/src/cli/config-layer.ts +176 -0
- package/src/cli/error-handler.test.ts +235 -0
- package/src/cli/error-handler.ts +655 -0
- package/src/cli/flag-schemas.ts +341 -0
- package/src/cli/help.ts +588 -0
- package/src/cli/index.ts +9 -0
- package/src/cli/main.ts +435 -0
- package/src/cli/options.ts +41 -0
- package/src/cli/shared-error-handling.ts +199 -0
- package/src/cli/typo-suggester.test.ts +105 -0
- package/src/cli/typo-suggester.ts +130 -0
- package/src/cli/utils.ts +259 -0
- package/src/config/file-provider.test.ts +320 -0
- package/src/config/file-provider.ts +273 -0
- package/src/config/index.ts +72 -0
- package/src/config/integration.test.ts +667 -0
- package/src/config/precedence.test.ts +277 -0
- package/src/config/precedence.ts +451 -0
- package/src/config/schema.test.ts +414 -0
- package/src/config/schema.ts +603 -0
- package/src/config/service.test.ts +320 -0
- package/src/config/service.ts +243 -0
- package/src/config/testing.test.ts +264 -0
- package/src/config/testing.ts +110 -0
- package/src/core/index.ts +1 -0
- package/src/core/types.ts +113 -0
- package/src/duplicates/detector.test.ts +183 -0
- package/src/duplicates/detector.ts +414 -0
- package/src/duplicates/index.ts +18 -0
- package/src/embeddings/embedding-namespace.test.ts +300 -0
- package/src/embeddings/embedding-namespace.ts +947 -0
- package/src/embeddings/heading-boost.test.ts +222 -0
- package/src/embeddings/hnsw-build-options.test.ts +198 -0
- package/src/embeddings/hyde.test.ts +272 -0
- package/src/embeddings/hyde.ts +264 -0
- package/src/embeddings/index.ts +10 -0
- package/src/embeddings/openai-provider.ts +414 -0
- package/src/embeddings/pricing.json +22 -0
- package/src/embeddings/provider-constants.ts +204 -0
- package/src/embeddings/provider-errors.test.ts +967 -0
- package/src/embeddings/provider-errors.ts +565 -0
- package/src/embeddings/provider-factory.test.ts +240 -0
- package/src/embeddings/provider-factory.ts +225 -0
- package/src/embeddings/provider-integration.test.ts +788 -0
- package/src/embeddings/query-preprocessing.test.ts +187 -0
- package/src/embeddings/semantic-search-threshold.test.ts +508 -0
- package/src/embeddings/semantic-search.ts +1270 -0
- package/src/embeddings/types.ts +359 -0
- package/src/embeddings/vector-store.ts +708 -0
- package/src/embeddings/voyage-provider.ts +313 -0
- package/src/errors/errors.test.ts +845 -0
- package/src/errors/index.ts +533 -0
- package/src/index/ignore-patterns.test.ts +354 -0
- package/src/index/ignore-patterns.ts +305 -0
- package/src/index/index.ts +4 -0
- package/src/index/indexer.ts +684 -0
- package/src/index/storage.ts +260 -0
- package/src/index/types.ts +147 -0
- package/src/index/watcher.ts +189 -0
- package/src/index.ts +30 -0
- package/src/integration/search-keyword.test.ts +678 -0
- package/src/mcp/server.ts +612 -0
- package/src/parser/index.ts +1 -0
- package/src/parser/parser.test.ts +291 -0
- package/src/parser/parser.ts +394 -0
- package/src/parser/section-filter.test.ts +277 -0
- package/src/parser/section-filter.ts +392 -0
- package/src/search/__tests__/hybrid-search.test.ts +650 -0
- package/src/search/bm25-store.ts +366 -0
- package/src/search/cross-encoder.test.ts +253 -0
- package/src/search/cross-encoder.ts +406 -0
- package/src/search/fuzzy-search.test.ts +419 -0
- package/src/search/fuzzy-search.ts +273 -0
- package/src/search/hybrid-search.ts +448 -0
- package/src/search/path-matcher.test.ts +276 -0
- package/src/search/path-matcher.ts +33 -0
- package/src/search/query-parser.test.ts +260 -0
- package/src/search/query-parser.ts +319 -0
- package/src/search/searcher.test.ts +280 -0
- package/src/search/searcher.ts +724 -0
- package/src/search/wink-bm25.d.ts +30 -0
- package/src/summarization/cli-providers/claude.ts +202 -0
- package/src/summarization/cli-providers/detection.test.ts +273 -0
- package/src/summarization/cli-providers/detection.ts +118 -0
- package/src/summarization/cli-providers/index.ts +8 -0
- package/src/summarization/cost.test.ts +139 -0
- package/src/summarization/cost.ts +102 -0
- package/src/summarization/error-handler.test.ts +127 -0
- package/src/summarization/error-handler.ts +111 -0
- package/src/summarization/index.ts +102 -0
- package/src/summarization/pipeline.test.ts +498 -0
- package/src/summarization/pipeline.ts +231 -0
- package/src/summarization/prompts.test.ts +269 -0
- package/src/summarization/prompts.ts +133 -0
- package/src/summarization/provider-factory.test.ts +396 -0
- package/src/summarization/provider-factory.ts +178 -0
- package/src/summarization/types.ts +184 -0
- package/src/summarize/budget-bugs.test.ts +620 -0
- package/src/summarize/formatters.ts +419 -0
- package/src/summarize/index.ts +20 -0
- package/src/summarize/summarizer.test.ts +275 -0
- package/src/summarize/summarizer.ts +597 -0
- package/src/summarize/verify-bugs.test.ts +238 -0
- package/src/types/huggingface-transformers.d.ts +66 -0
- package/src/utils/index.ts +1 -0
- package/src/utils/tokens.test.ts +142 -0
- package/src/utils/tokens.ts +186 -0
- package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
- package/tests/fixtures/cli/.mdcontext/config.json +8 -0
- package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/indexes/documents.json +33 -0
- package/tests/fixtures/cli/.mdcontext/indexes/links.json +12 -0
- package/tests/fixtures/cli/.mdcontext/indexes/sections.json +247 -0
- package/tests/fixtures/cli/README.md +9 -0
- package/tests/fixtures/cli/api-reference.md +11 -0
- package/tests/fixtures/cli/getting-started.md +11 -0
- package/tests/integration/embed-index.test.ts +712 -0
- package/tests/integration/search-context.test.ts +469 -0
- package/tests/integration/search-semantic.test.ts +522 -0
- package/tsconfig.json +26 -0
- package/vitest.config.ts +16 -0
- package/vitest.setup.ts +12 -0
@@ -0,0 +1,523 @@

# RAG Alternatives Research: Improving Semantic Search Quality

This document explores alternatives to traditional RAG patterns for improving semantic search quality in mdcontext. Since mdcontext is a pure retrieval system (no LLM generation), we focus on techniques that enhance retrieval precision and recall without adding generation complexity.

## Table of Contents

1. [The RAG Problem](#the-rag-problem)
2. [Alternative Approaches](#alternative-approaches)
3. [Top 3 Recommendations](#top-3-recommendations)
4. [Effort/Impact Analysis](#effortimpact-analysis)
5. [Quick Wins](#quick-wins)

---
## The RAG Problem

### Why Standard RAG Hurts Retrieval Quality

Traditional RAG (Retrieval-Augmented Generation) is designed to enhance LLM generation with retrieved context. However, this paradigm introduces several problems when applied to pure semantic search:

#### 1. Optimization Mismatch

RAG systems optimize for **generation quality**, not **retrieval precision**. This creates a fundamental mismatch:

- RAG tolerates noisy retrieval because LLMs can filter irrelevant context
- Pure search requires every result to be relevant since users see raw results
- RAG metrics (BLEU, ROUGE) don't align with search metrics (nDCG, MRR)

#### 2. The Confidence Problem

Research shows RAG paradoxically reduces model accuracy in some cases:

> "While RAG generally improves overall performance, it paradoxically reduces the model's ability to abstain from answering when appropriate. The introduction of additional context seems to increase the model's confidence, leading to a higher propensity for hallucination rather than abstention."

Google research found that Gemma's incorrect answer rate increased from 10.2% to 66.1% when using insufficient context, demonstrating how retrieved content can actively harm results.

#### 3. The Vocabulary Mismatch Problem

Dense embeddings theoretically solve vocabulary mismatch (e.g., "coronavirus" should match "COVID"), but real embeddings fall short:

> "While semantic embedding models are supposed to eliminate the need for query expansion... real embeddings made by real models often fall short."

Example: A query for "skin rash" might retrieve documents about "behaving rashly" while missing medical articles about "dermatitis."

#### 4. When Retrieval Beats Generation

For documentation search specifically:

| Use Case              | RAG Appropriate | Pure Retrieval Better |
| --------------------- | --------------- | --------------------- |
| Answer synthesis      | Yes             | No                    |
| Finding specific docs | No              | Yes                   |
| Exploratory search    | No              | Yes                   |
| Code examples         | Depends         | Usually               |
| API reference         | No              | Yes                   |

mdcontext's use case (finding relevant documentation sections) is best served by optimizing retrieval directly.

---
## Alternative Approaches

### 1. Hybrid Search (BM25 + Dense)

**What it is**: Combine traditional keyword search (BM25) with dense vector search, fusing results using techniques like Reciprocal Rank Fusion (RRF).

**Why it works**:

- Dense vectors excel at semantic understanding
- BM25 excels at exact matches (error codes, SKUs, technical terms)
- Hybrid captures both without tradeoffs

**Performance gains**:

> "Hybrid search improves recall 15-30% over single methods with minimal added complexity."
>
> "In open-domain QA benchmarks... BM25 passage recall is 22.1%; dense retrievers (DPR) reach 48.7%, but hybrid pipelines achieve up to 53.4%."

**Fusion methods**:

1. **Reciprocal Rank Fusion (RRF)**: Simplest, requires no tuning

   ```
   score = sum(1 / (k + rank)) for each retriever
   k = 60 (standard constant)
   ```

2. **Linear Combination**: More control, requires tuning

   ```
   score = alpha * bm25_score + (1 - alpha) * dense_score
   ```

**JavaScript/TypeScript options**:

- [wink-bm25-text-search](https://www.npmjs.com/package/wink-bm25-text-search): Full-featured BM25 with NLP integration
- [OkapiBM25](https://www.npmjs.com/package/okapi-bm25): Simple, typed implementation
- [@langchain/community BM25Retriever](https://js.langchain.com/docs/integrations/retrievers/bm25/)
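The RRF formula above fits in a few lines of TypeScript. This is a minimal sketch, not mdcontext's actual implementation; it assumes each retriever returns a best-first list of document IDs:

```typescript
// Reciprocal Rank Fusion sketch: each document's fused score is the sum of
// 1 / (k + rank) across every retriever that returned it (rank is 1-based).
type Ranking = string[]; // document IDs, best first

function rrfFuse(rankings: Ranking[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Documents that appear in both the BM25 and dense rankings accumulate score from each list, which is why RRF naturally promotes results the retrievers agree on.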
### 2. Cross-Encoder Re-ranking

**What it is**: Use a secondary model to re-score top-k results from initial retrieval.

**How it works**:

1. First stage: Fast bi-encoder retrieval (current approach)
2. Second stage: Cross-encoder scores (query, document) pairs for top-k results
3. Re-order based on cross-encoder scores

**Why it's better**:

> "Cross-encoders are more accurate than bi-encoders but they don't scale well, so using them to re-order a shortened list returned by semantic search is the ideal use case."

Cross-encoders process query and document together, enabling deeper semantic matching that bi-encoders (separate embedding) cannot achieve.

**Trade-offs**:

| Aspect      | Bi-Encoder             | Cross-Encoder            |
| ----------- | ---------------------- | ------------------------ |
| Speed       | Fast (precompute docs) | Slow (compute per query) |
| Accuracy    | Good                   | Best                     |
| Scalability | O(1) for docs          | O(n) per query           |
| Use case    | Initial retrieval      | Re-ranking top-k         |

**Implementation options**:

- [Transformers.js](https://huggingface.co/docs/transformers.js): Run ONNX models in Node.js
- [Cohere Rerank API](https://cohere.com/rerank): Managed service
- Python sidecar with sentence-transformers
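The two-stage flow can be sketched as follows. `scorePair` here is a placeholder interface standing in for whichever backend is chosen (a Transformers.js model, the Cohere API, or a sidecar); it is an assumption for illustration, not mdcontext's actual API:

```typescript
// Two-stage retrieval sketch: a fast first stage supplies candidates, then a
// cross-encoder scores each (query, document) pair and re-orders them.
type Candidate = { id: string; text: string };
type PairScorer = (query: string, doc: string) => Promise<number>;

async function rerank(
  query: string,
  candidates: Candidate[], // e.g. top-20 from bi-encoder retrieval
  scorePair: PairScorer,
  topK = 10,
): Promise<Candidate[]> {
  const scored = await Promise.all(
    candidates.map(async (c) => ({ c, score: await scorePair(query, c.text) })),
  );
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.c);
}
```

Because `scorePair` is pluggable, the same wrapper works whether scoring runs in-process or over the network, and it makes the "re-rank top-20 to top-10" recommendation a one-line call.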
### 3. SPLADE (Learned Sparse Retrieval)

**What it is**: Neural model that produces sparse vectors compatible with inverted indexes, combining benefits of neural understanding with lexical precision.

**How it works**:

- Uses BERT to weight term importance
- Enables term expansion (adds relevant related terms)
- Produces sparse vectors (mostly zeros) for efficient indexing

**Key advantages**:

> "Sparse representations benefit from several advantages compared to dense approaches: efficient use of inverted index, explicit lexical match, interpretability. They also seem to be better at generalizing on out-of-domain data."

**When SPLADE beats dense**:

- Out-of-domain generalization
- Interpretability requirements
- Exact term matching important
- Limited training data

**Trade-offs**:

- Requires specialized model serving
- Less mature JavaScript ecosystem
- May need fine-tuning for domain

### 4. ColBERT Late Interaction

**What it is**: Multi-vector approach where documents and queries are represented by multiple token-level vectors, matched via "late interaction."

**How it works**:

1. Encode query tokens → multiple query vectors
2. Encode document tokens → multiple document vectors
3. Compute MaxSim: for each query token, find max similarity to any doc token
4. Sum MaxSim scores across query tokens
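The MaxSim computation in steps 3-4 is just a small double loop over token vectors. A sketch, assuming vectors are already L2-normalized so that the dot product equals cosine similarity:

```typescript
// Late-interaction scoring: for each query token vector, take its best
// similarity against any document token vector, then sum over query tokens.
// Assumes all vectors are L2-normalized (dot product = cosine similarity).
type Vec = number[];

const dot = (a: Vec, b: Vec) => a.reduce((s, x, i) => s + x * b[i], 0);

function maxSim(queryTokens: Vec[], docTokens: Vec[]): number {
  let total = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) best = Math.max(best, dot(q, d));
    total += best;
  }
  return total;
}
```

The naive version is O(|query| × |doc|) per document, which is why systems like PLAID exist: they prune candidate documents before running this exact computation.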
**Performance characteristics**:

> "PLAID reduces late interaction search latency by up to 7x on a GPU and 45x on a CPU against vanilla ColBERTv2."

**Production viability**:

- PLAID engine enables production-scale deployment
- Memory-mapped storage reduces RAM by 90%
- Sub-millisecond query latency achievable

**Limitations for mdcontext**:

- No mature JavaScript implementation
- Would require Python service
- More complex infrastructure
- Overkill for typical documentation corpus sizes

### 5. Query Expansion Techniques

#### a) HyDE (Hypothetical Document Embeddings)

**What it is**: Use LLM to generate a hypothetical answer, then search using the answer's embedding instead of the query's.

**How it works**:

1. Query: "How do I configure authentication?"
2. LLM generates hypothetical answer (may be wrong, but captures patterns)
3. Embed the hypothetical answer
4. Search with that embedding
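The four steps reduce to a tiny pipeline. `generate` and `embed` below are stand-in interfaces for illustration only; mdcontext's real HyDE code lives in `src/embeddings/hyde.ts` and will differ:

```typescript
// HyDE sketch: embed a hypothetical answer instead of the raw query.
// `generate` and `embed` are placeholder interfaces, not mdcontext's API.
type Generate = (prompt: string) => Promise<string>;
type Embed = (text: string) => Promise<number[]>;

async function hydeQueryVector(
  query: string,
  generate: Generate,
  embed: Embed,
): Promise<number[]> {
  // Steps 1-2: ask the LLM for a plausible (possibly wrong) answer passage.
  const hypothetical = await generate(
    `Write a short documentation passage that answers: ${query}`,
  );
  // Step 3: embed the hypothetical answer; step 4: the caller searches
  // the existing vector index with this vector instead of the query's.
  return embed(hypothetical);
}
```

The key property is that the vector fed to search now looks like a *document*, not a question, which is exactly what the index contains.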
**Why it works**:

> "The semantic gap between your short question and the detailed answer creates mismatches. HyDE bridges this gap by first expanding your question into a hypothetical detailed answer."

**When to use**:

- Complex questions
- Domain-specific jargon
- When query is much shorter than target documents

**When NOT to use**:

- Simple keyword queries
- When LLM lacks domain knowledge
- Latency-sensitive applications (adds LLM call)

#### b) LLM Query Expansion

**What it is**: Use LLM to expand query with synonyms, related terms, and reformulations.

**Approaches**:

1. **Explicit expansion**: Generate expansion terms to append
2. **Multi-query**: Generate multiple query variations, search all, merge results
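A minimal sketch of the multi-query variant: search each variation, then merge by keeping every document's best score, de-duplicated. Names here are illustrative, not mdcontext's API:

```typescript
// Multi-query expansion sketch: run the same search over several query
// variations and keep each document's best score across all variations.
type Hit = { id: string; score: number };
type Search = (query: string) => Promise<Hit[]>;

async function multiQuerySearch(variations: string[], search: Search): Promise<Hit[]> {
  const best = new Map<string, number>();
  for (const q of variations) {
    for (const hit of await search(q)) {
      best.set(hit.id, Math.max(best.get(hit.id) ?? -Infinity, hit.score));
    }
  }
  return [...best.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Max-score merging is one reasonable choice; RRF over the per-variation rankings (as in the hybrid-search section) is another, and avoids comparing raw scores across queries.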
**Risk**:

> "While query expansion is helpful, using LLMs risks adding unhelpful query terms that reduce performance."

**Best practices**:

- Use for ambiguous queries only
- Limit expansion scope
- Consider query type detection before expanding

### 6. Domain-Adapted Embeddings

**What it is**: Fine-tune embedding models on your specific corpus or domain.

**Why it matters**:

> "Off-the-shelf embedding models are often limited to general knowledge and not company- or domain-specific knowledge."

**Results**:

> "Fine-tuning can boost performance by ~7% with only 6.3k samples. The training took 3 minutes on a consumer size GPU."

**Approaches**:

| Approach              | Effort | Improvement | When to Use               |
| --------------------- | ------ | ----------- | ------------------------- |
| LoRA adapters         | Low    | 5-10%       | Specialized terminology   |
| Full fine-tune        | Medium | 10-15%      | Domain-specific semantics |
| Contrastive on corpus | High   | 15-20%      | Mission-critical search   |

**Requirements**:

- Training data (query-document pairs)
- GPU for training (consumer-grade sufficient)
- Evaluation dataset

### 7. Matryoshka Representation Learning (MRL)

**What it is**: Embeddings that work at multiple dimensions, enabling adaptive precision/speed tradeoffs.

**How it works**:

- Full embedding: 1536 dimensions
- Can truncate to 768, 384, 256, 128, etc.
- Early dimensions contain most information
- Enable two-stage retrieval with progressive precision
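Truncation itself is trivial: slice off the leading components and re-normalize so cosine similarity on the shorter vectors still behaves. A sketch:

```typescript
// Matryoshka-style truncation sketch: keep the first `dims` components of an
// MRL-trained embedding, then L2-normalize so cosine similarity stays meaningful.
function truncateEmbedding(vec: number[], dims: number): number[] {
  const head = vec.slice(0, dims);
  const norm = Math.hypot(...head);
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

This only pays off with models trained for MRL (the ones listed below); slicing an arbitrary embedding this way discards information uniformly rather than keeping the most informative dimensions.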
|
|
270
|
+
|
|
271
|
+
**Benefits**:
|
|
272
|
+
|
|
273
|
+
> "Up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy... up to 14x real-world speed-ups for large-scale retrieval."
|
|
274
|
+
|
|
275
|
+
**Supported models**:
|
|
276
|
+
|
|
277
|
+
- OpenAI text-embedding-3-large (supports dimension reduction)
|
|
278
|
+
- Nomic nomic-embed-text-v1
|
|
279
|
+
- Alibaba gte-multilingual-base
|
|
280
|
+
|
|
281
|
+
**Application for mdcontext**:
|
|
282
|
+
|
|
283
|
+
- Already using text-embedding-3-small (supports dimensions parameter)
|
|
284
|
+
- Could use lower dimensions for initial shortlist
|
|
285
|
+
- Full dimensions for final ranking
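
The shortlist-then-rerank idea can be sketched as brute-force TypeScript. This is a minimal illustration, not mdcontext's actual index code; the `Indexed` shape, function names, and sizes are assumptions for the example:

```typescript
// Truncate an MRL embedding to its leading dimensions and re-normalize.
// Assumes the leading dimensions are not all zero (true for real embeddings).
function truncate(v: number[], dims: number): number[] {
  const head = v.slice(0, dims);
  const norm = Math.hypot(...head);
  return head.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

interface Indexed {
  id: string;
  full: number[]; // full-dimension embedding
  short: number[]; // truncated embedding, precomputed at index time
}

// Stage 1: cheap shortlist on truncated vectors.
// Stage 2: exact ranking on full vectors for the survivors only.
function twoStageSearch(
  query: number[],
  index: Indexed[],
  shortlistSize = 100,
  topK = 10,
): string[] {
  const qShort = truncate(query, index[0].short.length);
  return [...index]
    .sort((a, b) => dot(b.short, qShort) - dot(a.short, qShort))
    .slice(0, shortlistSize)
    .sort((a, b) => dot(b.full, query) - dot(a.full, query))
    .slice(0, topK)
    .map((c) => c.id);
}
```

In a real index the stage-2 re-scoring touches only `shortlistSize` vectors, which is where the speed-up comes from.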

---

## Top 3 Recommendations

### Recommendation 1: Hybrid Search (BM25 + Dense)

**Why #1**: Maximum impact with minimal complexity.

**Rationale**:

- Addresses the vocabulary mismatch problem directly
- 15-30% recall improvement documented
- Well-supported in the JavaScript ecosystem
- No external dependencies (LLM, GPU, Python)
- Complements existing dense search perfectly

**Implementation path**:

1. Add BM25 index alongside HNSW
2. Run parallel queries
3. Fuse with RRF (k=60)
4. Return fused top-k
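
The fusion step can be sketched as a small RRF routine; a minimal sketch, assuming each source returns hits with 1-based ranks (the `SearchHit` shape is illustrative, k=60 as recommended above):

```typescript
interface SearchHit {
  id: string; // chunk identifier (illustrative shape)
  rank: number; // 1-based rank within its source list
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// Documents appearing near the top of multiple lists float upward.
function rrfFuse(lists: SearchHit[][], k = 60, topK = 10): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    for (const hit of list) {
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + hit.rank));
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id)
    .slice(0, topK);
}
```

Because RRF works on ranks rather than raw scores, it needs no score normalization between the BM25 and dense lists.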

**Expected improvement**: 15-25% better recall for technical queries.

### Recommendation 2: Cross-Encoder Re-ranking

**Why #2**: Best precision gains for reasonable cost.

**Rationale**:

- Dramatically improves top-10 relevance
- Can be applied selectively (complex queries only)
- Transformers.js enables a pure JavaScript implementation
- Small models (MiniLM) run fast enough for interactive use

**Implementation path**:

1. Use Transformers.js with a cross-encoder model
2. Re-rank top-20 candidates to top-10
3. Consider caching for repeated queries
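
The re-ranking flow might look like the following sketch. The scorer is injected as a plain async function so the example stays self-contained; in practice it would wrap a cross-encoder (e.g. a MiniLM variant) loaded through Transformers.js. The `Candidate` shape is illustrative:

```typescript
interface Candidate {
  id: string;
  text: string;
}

// Re-rank candidates with a (query, document) relevance scorer and keep topK.
// The scorer is where a cross-encoder would plug in.
async function rerank(
  query: string,
  candidates: Candidate[],
  scorePair: (query: string, doc: string) => Promise<number>,
  topK = 10,
): Promise<Candidate[]> {
  const scored = await Promise.all(
    candidates.map(async (c) => ({ c, score: await scorePair(query, c.text) })),
  );
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.c);
}
```

Scoring all pairs concurrently with `Promise.all` keeps latency close to a single model call when the backend batches requests.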

**Expected improvement**: 10-20% improvement in precision@10.

### Recommendation 3: Query Expansion (Selective HyDE)

**Why #3**: Addresses the semantic gap for complex queries.

**Rationale**:

- Transforms short queries into document-like representations
- Works well for "how to" and conceptual queries
- Can be optional (applied only when detected as helpful)
- Uses the existing OpenAI integration

**Implementation path**:

1. Detect query type (simple keyword vs. complex question)
2. For complex queries, generate 1-3 hypothetical answers
3. Embed answers, average embeddings
4. Search with expanded representation
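
Step 3, averaging the hypothetical-answer embeddings into a single query vector, is a small routine; a minimal sketch (generation and embedding calls are left to the existing OpenAI integration):

```typescript
// Average several embedding vectors into one query representation.
// All vectors must share the same dimension.
function averageEmbeddings(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("no vectors to average");
  const dim = vectors[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}
```

The averaged vector is then used in place of the raw query embedding for the dense search.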

**Expected improvement**: 15-30% for complex queries, 0% for simple keywords (but no regression).

---

## Effort/Impact Analysis

| Technique                 | Implementation Effort | Accuracy Impact       | Latency Impact      | Dependencies                       |
| ------------------------- | --------------------- | --------------------- | ------------------- | ---------------------------------- |
| **Hybrid Search**         | Medium (2-3 days)     | High (+15-30%)        | Low (+5-10ms)       | npm package only                   |
| **Cross-Encoder Re-rank** | Medium (2-3 days)     | High (+10-20%)        | Medium (+50-200ms)  | Transformers.js + ONNX model       |
| **HyDE Query Expansion**  | Low (1 day)           | Medium (+15%)         | High (+500-1000ms)  | OpenAI API                         |
| **SPLADE**                | High (1-2 weeks)      | Medium (+10%)         | Low                 | Python service                     |
| **ColBERT**               | Very High (2-4 weeks) | Very High (+20%)      | Medium              | Python service + specialized index |
| **Fine-tuned Embeddings** | High (1 week)         | Medium-High (+10-15%) | None                | Training infrastructure            |
| **Matryoshka Dimensions** | Low (0.5 days)        | Low (+5%)             | Improvement (-20ms) | Already supported                  |

### Prioritized Roadmap

```
Phase 1 (Quick Wins - 1 week):
├── Matryoshka dimension optimization
└── Query preprocessing improvements

Phase 2 (Core Improvements - 2 weeks):
├── Hybrid search (BM25 + dense)
└── RRF fusion implementation

Phase 3 (Advanced Features - 2 weeks):
├── Cross-encoder re-ranking (Transformers.js)
└── Selective HyDE for complex queries

Phase 4 (Future Optimization):
├── Domain-adapted embeddings (if corpus-specific issues arise)
└── SPLADE evaluation (if hybrid search proves insufficient)
```

---

## Quick Wins

These improvements can be implemented quickly with immediate benefits:

### 1. Query Preprocessing (1-2 hours)

```typescript
function preprocessQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/[^\w\s]/g, " ") // Replace punctuation with spaces
    .replace(/\s+/g, " ") // Normalize whitespace
    .trim();
}
```

**Impact**: Reduces embedding noise; 2-5% precision improvement.

### 2. Matryoshka Dimension Reduction (2-4 hours)

OpenAI's text-embedding-3-small supports a `dimensions` parameter:

```typescript
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: texts,
  dimensions: 512, // Instead of the default 1536
});
```

**Benefits**:

- 3x smaller index
- Faster search
- Minimal accuracy loss (< 2% in most cases)

**Best for**: Larger corpora, faster iteration.

### 3. Result Deduplication (1-2 hours)

Remove near-duplicate results based on:

- Same document + similar headings
- High cosine similarity between result embeddings
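
A possible sketch of the similarity-based pass, assuming each result carries its embedding (the `ResultChunk` shape and the 0.95 threshold are illustrative assumptions):

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface ResultChunk {
  id: string;
  embedding: number[];
}

// Greedy dedup: keep a result only if it is not too similar
// to any already-kept result. Input is assumed ranked best-first,
// so the highest-ranked copy of each near-duplicate survives.
function dedupe(results: ResultChunk[], threshold = 0.95): ResultChunk[] {
  const kept: ResultChunk[] = [];
  for (const r of results) {
    if (!kept.some((k) => cosine(k.embedding, r.embedding) > threshold)) {
      kept.push(r);
    }
  }
  return kept;
}
```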

**Impact**: Better result diversity, improved user experience.

### 4. Boost Heading Matches (2-4 hours)

Add a bonus score when query terms appear in section headings:

```typescript
function adjustScore(result: SearchResult, query: string): number {
  const queryTerms = query.toLowerCase().split(/\s+/);
  const headingLower = result.heading.toLowerCase();
  const headingMatches = queryTerms.filter((t) =>
    headingLower.includes(t),
  ).length;

  return result.similarity + headingMatches * 0.05; // +5% per matching term
}
```

**Impact**: Significant for navigation queries ("installation guide", "API reference").

### 5. Document Title Context (2-4 hours)

Ensure document titles are prominent in embeddings:

```typescript
function getEmbeddingText(section: Section, doc: Document): string {
  return `
Document: ${doc.title}
Section: ${section.heading}
Parent: ${section.parent?.heading || "None"}

${section.content}
`.trim();
}
```

**Impact**: Better matching for document-level queries.

### 6. Negative Result Caching (4-8 hours)

Cache queries that return poor results:

- Track low-similarity searches
- Use them as query-expansion hints
- Inform users when no good matches exist
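
One possible shape for this tracking; the 0.3 floor and all names here are illustrative assumptions, not mdcontext API:

```typescript
// Assumed similarity floor below which a search counts as a "poor" result.
const NO_MATCH_FLOOR = 0.3;

// Normalized query -> best similarity observed. Later inspection of this
// map supplies candidates for query-expansion work.
const poorQueries = new Map<string, number>();

// Returns false when the best hit was too weak, so the caller can
// surface a "no good matches" notice instead of misleading results.
function recordSearchQuality(query: string, bestSimilarity: number): boolean {
  const key = query.trim().toLowerCase();
  if (bestSimilarity < NO_MATCH_FLOOR) {
    poorQueries.set(key, bestSimilarity);
    return false;
  }
  return true;
}
```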

**Impact**: Better UX, data for future improvements.

---

## References

### Research Papers

- [Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE)](https://arxiv.org/abs/2212.10496)
- [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147)
- [SPLADE: Sparse Lexical and Expansion Model for Information Retrieval](https://arxiv.org/abs/2109.10086)
- [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction](https://arxiv.org/abs/2004.12832)
- [Conventional Contrastive Learning Often Falls Short](https://arxiv.org/abs/2505.19274)

### Implementation Resources

- [Transformers.js Documentation](https://huggingface.co/docs/transformers.js)
- [wink-bm25-text-search](https://www.npmjs.com/package/wink-bm25-text-search)
- [Sentence Transformers - Retrieve & Re-Rank](https://sbert.net/examples/sentence_transformer/applications/retrieve_rerank/README.html)
- [OpenAI Cookbook - Search Reranking with Cross-Encoders](https://cookbook.openai.com/examples/search_reranking_with_cross-encoders)

### Industry Best Practices

- [Weaviate: Hybrid Search Explained](https://weaviate.io/blog/hybrid-search-explained)
- [Qdrant: Modern Sparse Neural Retrieval](https://qdrant.tech/articles/modern-sparse-neural-retrieval/)
- [Pinecone: SPLADE for Sparse Vector Search](https://www.pinecone.io/learn/splade/)
- [Google Research: The Role of Sufficient Context in RAG](https://research.google/blog/deeper-insights-into-retrieval-augmented-generation-the-role-of-sufficient-context/)

---

## Summary

For mdcontext's semantic search use case, the recommended approach is:

1. **Hybrid search** for the best baseline improvement
2. **Cross-encoder re-ranking** for precision when needed
3. **Selective query expansion** for complex queries

These three techniques, combined with quick wins like query preprocessing and heading boosting, can significantly improve search quality without introducing the complexity and failure modes of full RAG systems.

The key insight is that **pure retrieval optimization beats RAG** for documentation search because:

- Users want to find documents, not generated answers
- Every result must be relevant (there is no LLM to filter noise)
- Latency matters for interactive search
- Simpler systems are more reliable and maintainable