mdcontext 0.0.1 → 0.2.0

Files changed (337)
  1. package/.changeset/README.md +28 -0
  2. package/.changeset/config.json +11 -0
  3. package/.claude/settings.local.json +25 -0
  4. package/.github/workflows/ci.yml +83 -0
  5. package/.github/workflows/claude-code-review.yml +44 -0
  6. package/.github/workflows/claude.yml +85 -0
  7. package/.github/workflows/release.yml +113 -0
  8. package/.tldrignore +112 -0
  9. package/BACKLOG.md +338 -0
  10. package/CONTRIBUTING.md +186 -0
  11. package/NOTES/NOTES +44 -0
  12. package/README.md +434 -11
  13. package/biome.json +36 -0
  14. package/cspell.config.yaml +14 -0
  15. package/dist/chunk-23UPXDNL.js +3044 -0
  16. package/dist/chunk-2W7MO2DL.js +1366 -0
  17. package/dist/chunk-3NUAZGMA.js +1689 -0
  18. package/dist/chunk-7TOWB2XB.js +366 -0
  19. package/dist/chunk-7XOTOADQ.js +3065 -0
  20. package/dist/chunk-AH2PDM2K.js +3042 -0
  21. package/dist/chunk-BNXWSZ63.js +3742 -0
  22. package/dist/chunk-BTL5DJVU.js +3222 -0
  23. package/dist/chunk-HDHYG7E4.js +104 -0
  24. package/dist/chunk-HLR4KZBP.js +3234 -0
  25. package/dist/chunk-IP3FRFEB.js +1045 -0
  26. package/dist/chunk-KHU56VDO.js +3042 -0
  27. package/dist/chunk-KRYIFLQR.js +88 -0
  28. package/dist/chunk-LBSDNLEM.js +287 -0
  29. package/dist/chunk-MNTQ7HCP.js +2643 -0
  30. package/dist/chunk-MUJELQQ6.js +1387 -0
  31. package/dist/chunk-MXJGMSLV.js +2199 -0
  32. package/dist/chunk-N6QJGC3Z.js +2636 -0
  33. package/dist/chunk-OBELGBPM.js +1713 -0
  34. package/dist/chunk-OT7R5XTA.js +3192 -0
  35. package/dist/chunk-P7X4RA2T.js +106 -0
  36. package/dist/chunk-PIDUQNC2.js +3185 -0
  37. package/dist/chunk-POGCDIH4.js +3187 -0
  38. package/dist/chunk-PSIEOQGZ.js +3043 -0
  39. package/dist/chunk-PVRT3IHA.js +3238 -0
  40. package/dist/chunk-QNN4TT23.js +1430 -0
  41. package/dist/chunk-RE3R45RJ.js +3042 -0
  42. package/dist/chunk-S7E6TFX6.js +803 -0
  43. package/dist/chunk-SG6GLU4U.js +1378 -0
  44. package/dist/chunk-SJCDV2ST.js +274 -0
  45. package/dist/chunk-SYE5XLF3.js +104 -0
  46. package/dist/chunk-T5VLYBZD.js +103 -0
  47. package/dist/chunk-TOQB7VWU.js +3238 -0
  48. package/dist/chunk-VFNMZ4ZQ.js +3228 -0
  49. package/dist/chunk-VVTGZNBT.js +1629 -0
  50. package/dist/chunk-W7Q4RFEV.js +104 -0
  51. package/dist/chunk-XTYYVRLO.js +3190 -0
  52. package/dist/chunk-Y6MDYVJD.js +3063 -0
  53. package/dist/cli/main.d.ts +1 -0
  54. package/dist/cli/main.js +5458 -0
  55. package/dist/index.d.ts +653 -0
  56. package/dist/index.js +79 -0
  57. package/dist/mcp/server.d.ts +1 -0
  58. package/dist/mcp/server.js +472 -0
  59. package/dist/schema-BAWSG7KY.js +22 -0
  60. package/dist/schema-E3QUPL26.js +20 -0
  61. package/dist/schema-EHL7WUT6.js +20 -0
  62. package/docs/019-USAGE.md +625 -0
  63. package/docs/020-current-implementation.md +364 -0
  64. package/docs/021-DOGFOODING-FINDINGS.md +175 -0
  65. package/docs/BACKLOG.md +80 -0
  66. package/docs/CONFIG.md +1123 -0
  67. package/docs/DESIGN.md +439 -0
  68. package/docs/ERRORS.md +383 -0
  69. package/docs/PROJECT.md +88 -0
  70. package/docs/ROADMAP.md +407 -0
  71. package/docs/summarization.md +320 -0
  72. package/docs/test-links.md +9 -0
  73. package/justfile +40 -0
  74. package/package.json +74 -9
  75. package/pnpm-workspace.yaml +5 -0
  76. package/research/INDEX.md +315 -0
  77. package/research/code-review/README.md +90 -0
  78. package/research/code-review/cli-error-handling-review.md +979 -0
  79. package/research/code-review/code-review-validation-report.md +464 -0
  80. package/research/code-review/main-ts-review.md +1128 -0
  81. package/research/config-analysis/01-current-implementation.md +470 -0
  82. package/research/config-analysis/02-strategy-recommendation.md +428 -0
  83. package/research/config-analysis/03-task-candidates.md +715 -0
  84. package/research/config-analysis/033-research-configuration-management.md +828 -0
  85. package/research/config-analysis/034-research-effect-cli-config.md +1504 -0
  86. package/research/config-analysis/04-consolidated-task-candidates.md +277 -0
  87. package/research/config-docs/SUMMARY.md +357 -0
  88. package/research/config-docs/TEST-RESULTS.md +776 -0
  89. package/research/config-docs/TODO.md +542 -0
  90. package/research/config-docs/analysis.md +744 -0
  91. package/research/config-docs/fix-validation.md +502 -0
  92. package/research/config-docs/help-audit.md +264 -0
  93. package/research/config-docs/help-system-analysis.md +890 -0
  94. package/research/dogfood/consolidated-tool-evaluation.md +373 -0
  95. package/research/dogfood/strategy-a/a-synthesis.md +184 -0
  96. package/research/dogfood/strategy-a/a1-docs.md +226 -0
  97. package/research/dogfood/strategy-a/a2-amorphic.md +156 -0
  98. package/research/dogfood/strategy-a/a3-llm.md +164 -0
  99. package/research/dogfood/strategy-b/b-synthesis.md +228 -0
  100. package/research/dogfood/strategy-b/b1-architecture.md +207 -0
  101. package/research/dogfood/strategy-b/b2-gaps.md +258 -0
  102. package/research/dogfood/strategy-b/b3-workflows.md +250 -0
  103. package/research/dogfood/strategy-c/c-synthesis.md +451 -0
  104. package/research/dogfood/strategy-c/c1-explorer.md +192 -0
  105. package/research/dogfood/strategy-c/c2-diver-memory.md +145 -0
  106. package/research/dogfood/strategy-c/c3-diver-control.md +148 -0
  107. package/research/dogfood/strategy-c/c4-diver-failure.md +151 -0
  108. package/research/dogfood/strategy-c/c5-diver-execution.md +221 -0
  109. package/research/dogfood/strategy-c/c6-diver-org.md +221 -0
  110. package/research/effect-cli-error-handling.md +845 -0
  111. package/research/effect-errors-as-values.md +943 -0
  112. package/research/errors-task-analysis/00-consolidated-tasks.md +207 -0
  113. package/research/errors-task-analysis/cli-commands-analysis.md +909 -0
  114. package/research/errors-task-analysis/embeddings-analysis.md +709 -0
  115. package/research/errors-task-analysis/index-search-analysis.md +812 -0
  116. package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
  117. package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
  118. package/research/issue-review.md +603 -0
  119. package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
  120. package/research/llm-summarization/alternative-providers-2026.md +1428 -0
  121. package/research/llm-summarization/anthropic-2026.md +367 -0
  122. package/research/llm-summarization/claude-cli-integration.md +1706 -0
  123. package/research/llm-summarization/cli-integration-patterns.md +3155 -0
  124. package/research/llm-summarization/openai-2026.md +473 -0
  125. package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
  126. package/research/llm-summarization/opencode-cli-integration.md +1552 -0
  127. package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
  128. package/research/llm-summarization/prototype-results.md +56 -0
  129. package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
  130. package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
  131. package/research/mdcontext-error-analysis.md +521 -0
  132. package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
  133. package/research/mdcontext-pudding/01-index-embed.md +956 -0
  134. package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
  135. package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
  136. package/research/mdcontext-pudding/02-search.md +970 -0
  137. package/research/mdcontext-pudding/03-context.md +779 -0
  138. package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
  139. package/research/mdcontext-pudding/04-tree.md +704 -0
  140. package/research/mdcontext-pudding/05-config.md +1038 -0
  141. package/research/mdcontext-pudding/06-links-summary.txt +87 -0
  142. package/research/mdcontext-pudding/06-links.md +679 -0
  143. package/research/mdcontext-pudding/07-stats.md +693 -0
  144. package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
  145. package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
  146. package/research/mdcontext-pudding/README.md +168 -0
  147. package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
  148. package/research/npm_publish/011-npm-workflow-research-agent2.md +792 -0
  149. package/research/npm_publish/012-npm-workflow-research-agent1.md +530 -0
  150. package/research/npm_publish/013-npm-workflow-research-agent3.md +722 -0
  151. package/research/npm_publish/014-npm-workflow-synthesis.md +556 -0
  152. package/research/npm_publish/031-npm-workflow-task-analysis.md +134 -0
  153. package/research/research-quality-review.md +834 -0
  154. package/research/semantic-search/002-research-embedding-models.md +490 -0
  155. package/research/semantic-search/003-research-rag-alternatives.md +523 -0
  156. package/research/semantic-search/004-research-vector-search.md +841 -0
  157. package/research/semantic-search/032-research-semantic-search.md +427 -0
  158. package/research/semantic-search/embedding-text-analysis.md +156 -0
  159. package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
  160. package/research/semantic-search/query-processing-analysis.md +207 -0
  161. package/research/semantic-search/root-cause-and-solution.md +114 -0
  162. package/research/semantic-search/threshold-validation-report.md +69 -0
  163. package/research/semantic-search/vector-search-analysis.md +63 -0
  164. package/research/task-management-2026/00-synthesis-recommendations.md +295 -0
  165. package/research/task-management-2026/01-ai-workflow-tools.md +416 -0
  166. package/research/task-management-2026/02-agent-framework-patterns.md +476 -0
  167. package/research/task-management-2026/03-lightweight-file-based.md +567 -0
  168. package/research/task-management-2026/04-established-tools-ai-features.md +541 -0
  169. package/research/task-management-2026/linear/01-core-features-workflow.md +771 -0
  170. package/research/task-management-2026/linear/02-api-integrations.md +930 -0
  171. package/research/task-management-2026/linear/03-ai-features.md +368 -0
  172. package/research/task-management-2026/linear/04-pricing-setup.md +205 -0
  173. package/research/task-management-2026/linear/05-usage-patterns-best-practices.md +605 -0
  174. package/research/test-path-issues.md +276 -0
  175. package/review/ALP-76/1-error-type-design.md +962 -0
  176. package/review/ALP-76/2-error-handling-patterns.md +906 -0
  177. package/review/ALP-76/3-error-presentation.md +624 -0
  178. package/review/ALP-76/4-test-coverage.md +625 -0
  179. package/review/ALP-76/5-migration-completeness.md +440 -0
  180. package/review/ALP-76/6-effect-best-practices.md +755 -0
  181. package/scripts/apply-branch-protection.sh +47 -0
  182. package/scripts/branch-protection-templates.json +79 -0
  183. package/scripts/prototype-summarization.ts +346 -0
  184. package/scripts/rebuild-hnswlib.js +58 -0
  185. package/scripts/setup-branch-protection.sh +64 -0
  186. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
  187. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
  188. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
  189. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
  190. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  191. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  192. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
  193. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
  194. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
  195. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
  196. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
  197. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
  198. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
  199. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
  200. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
  201. package/src/cli/argv-preprocessor.test.ts +210 -0
  202. package/src/cli/argv-preprocessor.ts +202 -0
  203. package/src/cli/cli.test.ts +627 -0
  204. package/src/cli/commands/backlinks.ts +54 -0
  205. package/src/cli/commands/config-cmd.ts +642 -0
  206. package/src/cli/commands/context.ts +285 -0
  207. package/src/cli/commands/duplicates.ts +122 -0
  208. package/src/cli/commands/embeddings.ts +529 -0
  209. package/src/cli/commands/index-cmd.ts +480 -0
  210. package/src/cli/commands/index.ts +16 -0
  211. package/src/cli/commands/links.ts +52 -0
  212. package/src/cli/commands/search.ts +1281 -0
  213. package/src/cli/commands/stats.ts +149 -0
  214. package/src/cli/commands/tree.ts +128 -0
  215. package/src/cli/config-layer.ts +176 -0
  216. package/src/cli/error-handler.test.ts +235 -0
  217. package/src/cli/error-handler.ts +655 -0
  218. package/src/cli/flag-schemas.ts +341 -0
  219. package/src/cli/help.ts +588 -0
  220. package/src/cli/index.ts +9 -0
  221. package/src/cli/main.ts +435 -0
  222. package/src/cli/options.ts +41 -0
  223. package/src/cli/shared-error-handling.ts +199 -0
  224. package/src/cli/typo-suggester.test.ts +105 -0
  225. package/src/cli/typo-suggester.ts +130 -0
  226. package/src/cli/utils.ts +259 -0
  227. package/src/config/file-provider.test.ts +320 -0
  228. package/src/config/file-provider.ts +273 -0
  229. package/src/config/index.ts +72 -0
  230. package/src/config/integration.test.ts +667 -0
  231. package/src/config/precedence.test.ts +277 -0
  232. package/src/config/precedence.ts +451 -0
  233. package/src/config/schema.test.ts +414 -0
  234. package/src/config/schema.ts +603 -0
  235. package/src/config/service.test.ts +320 -0
  236. package/src/config/service.ts +243 -0
  237. package/src/config/testing.test.ts +264 -0
  238. package/src/config/testing.ts +110 -0
  239. package/src/core/index.ts +1 -0
  240. package/src/core/types.ts +113 -0
  241. package/src/duplicates/detector.test.ts +183 -0
  242. package/src/duplicates/detector.ts +414 -0
  243. package/src/duplicates/index.ts +18 -0
  244. package/src/embeddings/embedding-namespace.test.ts +300 -0
  245. package/src/embeddings/embedding-namespace.ts +947 -0
  246. package/src/embeddings/heading-boost.test.ts +222 -0
  247. package/src/embeddings/hnsw-build-options.test.ts +198 -0
  248. package/src/embeddings/hyde.test.ts +272 -0
  249. package/src/embeddings/hyde.ts +264 -0
  250. package/src/embeddings/index.ts +10 -0
  251. package/src/embeddings/openai-provider.ts +414 -0
  252. package/src/embeddings/pricing.json +22 -0
  253. package/src/embeddings/provider-constants.ts +204 -0
  254. package/src/embeddings/provider-errors.test.ts +967 -0
  255. package/src/embeddings/provider-errors.ts +565 -0
  256. package/src/embeddings/provider-factory.test.ts +240 -0
  257. package/src/embeddings/provider-factory.ts +225 -0
  258. package/src/embeddings/provider-integration.test.ts +788 -0
  259. package/src/embeddings/query-preprocessing.test.ts +187 -0
  260. package/src/embeddings/semantic-search-threshold.test.ts +508 -0
  261. package/src/embeddings/semantic-search.ts +1270 -0
  262. package/src/embeddings/types.ts +359 -0
  263. package/src/embeddings/vector-store.ts +708 -0
  264. package/src/embeddings/voyage-provider.ts +313 -0
  265. package/src/errors/errors.test.ts +845 -0
  266. package/src/errors/index.ts +533 -0
  267. package/src/index/ignore-patterns.test.ts +354 -0
  268. package/src/index/ignore-patterns.ts +305 -0
  269. package/src/index/index.ts +4 -0
  270. package/src/index/indexer.ts +684 -0
  271. package/src/index/storage.ts +260 -0
  272. package/src/index/types.ts +147 -0
  273. package/src/index/watcher.ts +189 -0
  274. package/src/index.ts +30 -0
  275. package/src/integration/search-keyword.test.ts +678 -0
  276. package/src/mcp/server.ts +612 -0
  277. package/src/parser/index.ts +1 -0
  278. package/src/parser/parser.test.ts +291 -0
  279. package/src/parser/parser.ts +394 -0
  280. package/src/parser/section-filter.test.ts +277 -0
  281. package/src/parser/section-filter.ts +392 -0
  282. package/src/search/__tests__/hybrid-search.test.ts +650 -0
  283. package/src/search/bm25-store.ts +366 -0
  284. package/src/search/cross-encoder.test.ts +253 -0
  285. package/src/search/cross-encoder.ts +406 -0
  286. package/src/search/fuzzy-search.test.ts +419 -0
  287. package/src/search/fuzzy-search.ts +273 -0
  288. package/src/search/hybrid-search.ts +448 -0
  289. package/src/search/path-matcher.test.ts +276 -0
  290. package/src/search/path-matcher.ts +33 -0
  291. package/src/search/query-parser.test.ts +260 -0
  292. package/src/search/query-parser.ts +319 -0
  293. package/src/search/searcher.test.ts +280 -0
  294. package/src/search/searcher.ts +724 -0
  295. package/src/search/wink-bm25.d.ts +30 -0
  296. package/src/summarization/cli-providers/claude.ts +202 -0
  297. package/src/summarization/cli-providers/detection.test.ts +273 -0
  298. package/src/summarization/cli-providers/detection.ts +118 -0
  299. package/src/summarization/cli-providers/index.ts +8 -0
  300. package/src/summarization/cost.test.ts +139 -0
  301. package/src/summarization/cost.ts +102 -0
  302. package/src/summarization/error-handler.test.ts +127 -0
  303. package/src/summarization/error-handler.ts +111 -0
  304. package/src/summarization/index.ts +102 -0
  305. package/src/summarization/pipeline.test.ts +498 -0
  306. package/src/summarization/pipeline.ts +231 -0
  307. package/src/summarization/prompts.test.ts +269 -0
  308. package/src/summarization/prompts.ts +133 -0
  309. package/src/summarization/provider-factory.test.ts +396 -0
  310. package/src/summarization/provider-factory.ts +178 -0
  311. package/src/summarization/types.ts +184 -0
  312. package/src/summarize/budget-bugs.test.ts +620 -0
  313. package/src/summarize/formatters.ts +419 -0
  314. package/src/summarize/index.ts +20 -0
  315. package/src/summarize/summarizer.test.ts +275 -0
  316. package/src/summarize/summarizer.ts +597 -0
  317. package/src/summarize/verify-bugs.test.ts +238 -0
  318. package/src/types/huggingface-transformers.d.ts +66 -0
  319. package/src/utils/index.ts +1 -0
  320. package/src/utils/tokens.test.ts +142 -0
  321. package/src/utils/tokens.ts +186 -0
  322. package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
  323. package/tests/fixtures/cli/.mdcontext/config.json +8 -0
  324. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  325. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  326. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +33 -0
  327. package/tests/fixtures/cli/.mdcontext/indexes/links.json +12 -0
  328. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +247 -0
  329. package/tests/fixtures/cli/README.md +9 -0
  330. package/tests/fixtures/cli/api-reference.md +11 -0
  331. package/tests/fixtures/cli/getting-started.md +11 -0
  332. package/tests/integration/embed-index.test.ts +712 -0
  333. package/tests/integration/search-context.test.ts +469 -0
  334. package/tests/integration/search-semantic.test.ts +522 -0
  335. package/tsconfig.json +26 -0
  336. package/vitest.config.ts +16 -0
  337. package/vitest.setup.ts +12 -0
@@ -0,0 +1,841 @@
1
+ # Vector Search Research: Patterns and Techniques (2025-2026)
2
+
3
+ Research findings for improving mdcontext semantic search capabilities.
4
+
5
+ ## Table of Contents
6
+
7
+ 1. [Hybrid Search](#1-hybrid-search)
8
+ 2. [Re-ranking Approaches](#2-re-ranking-approaches)
9
+ 3. [Vector Index Alternatives](#3-vector-index-alternatives)
10
+ 4. [Filtering and Metadata](#4-filtering-and-metadata)
11
+ 5. [Emerging Patterns](#5-emerging-patterns-2025-2026)
12
+ 6. [Quick Wins: HNSW Parameter Tuning](#6-quick-wins-hnsw-parameter-tuning)
13
+ 7. [Top 3 Recommendations](#7-top-3-recommendations)
14
+ 8. [Effort/Impact Analysis](#8-effortimpact-analysis)
15
+
16
+ ---
17
+
18
+ ## 1. Hybrid Search
19
+
20
+ Hybrid search combines sparse retrieval (BM25/keyword) with dense retrieval (vector embeddings) to leverage the strengths of both approaches.
21
+
22
+ ### Why Hybrid Search?
23
+
24
+ | Approach | Strengths | Weaknesses |
25
+ | -------------------- | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------- |
26
+ | **Keyword (BM25)** | Exact term matching, handles specific identifiers (e.g., "TS-01"), no vocabulary mismatch for known terms | Misses synonyms, semantic meaning, context |
27
+ | **Semantic (Dense)** | Understands meaning, handles paraphrasing, conceptual similarity | May miss exact terms, identifiers, proper nouns |
28
+ | **Hybrid** | Best of both worlds: exact + semantic | Added complexity, needs score fusion |
29
+
30
+ **Key insight**: Pure embedding search can miss important exact matches. For example, a search for "TS-01" may not retrieve documents that mention that identifier, because embeddings encode semantic similarity rather than lexical overlap.
31
+
32
+ ### Fusion Techniques
33
+
34
+ #### Reciprocal Rank Fusion (RRF)
35
+
36
+ RRF is one of the most widely adopted fusion algorithms for hybrid search. It merges ranked lists using only rank positions, so no score normalization is required.
37
+
38
+ **Formula**:
39
+
40
+ ```
41
+ RRF_score(d) = Σ_i 1/(k + rank_i(d))
42
+ ```
43
+
44
+ Where:
45
+
46
+ - `k` is a smoothing constant (typically 60)
47
+ - `rank_i(d)` is the document's 1-based rank in system i (a system that did not return `d` contributes nothing to the sum)
48
+
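As a rough illustration of the formula above, the following sketch fuses two ranked lists with `k = 60`. The function name `rrfFuse` and the section IDs are hypothetical and exist only for this example.

```typescript
// Reciprocal Rank Fusion over any number of ranked ID lists.
// Each inner array is one retrieval system's results, best match first.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // the formula uses 1-based ranks
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Hypothetical section IDs, for illustration only.
const semanticRanking = ["auth-guide", "login-api", "sessions"];
const keywordRanking = ["login-api", "error-codes", "auth-guide"];

const fused = rrfFuse([semanticRanking, keywordRanking]);
const fusedOrder = Array.from(fused.entries())
  .sort((a, b) => b[1] - a[1])
  .map(([id]) => id);
// fusedOrder: ["login-api", "auth-guide", "error-codes", "sessions"]
```

Documents that appear in both lists (`login-api`, `auth-guide`) end up ahead of documents that rank well in only one list, which is the behavior RRF is chosen for.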
49
+ **Advantages**:
50
+
51
+ - Score-agnostic: Works with incompatible scoring systems (cosine similarity 0-1 vs BM25 unbounded)
52
+ - Simple to implement
53
+ - No hyperparameter tuning for score scales
54
+ - Robust across different retrieval methods
55
+
56
+ **Performance**: Hybrid search with RRF is commonly reported to outperform single-method retrieval by roughly 10-15% in precision benchmarks, though the gain depends on the corpus and query mix.
57
+
58
+ #### Weighted RRF
59
+
60
+ Extends RRF with configurable weights per retrieval method:
61
+
62
+ ```typescript
63
+ // Example configuration
64
+ const weights = {
65
+   bm25: 1.0, // Full weight for lexical precision
66
+   semantic: 0.7, // Slightly lower for semantic similarity
67
+ };
68
+ ```
69
+
70
+ This allows emphasizing one method over another based on use case.
71
+
72
+ #### Linear Combination
73
+
74
+ Simpler fusion that combines normalized scores directly:
75
+
76
+ ```typescript
77
+ finalScore = alpha * semanticScore + (1 - alpha) * bm25Score;
78
+ ```
79
+
80
+ **Requires** normalizing both scores to the same scale (for example, min-max to 0-1). Less robust than RRF, but faster to compute.
81
+
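A minimal sketch of linear combination with the normalization step made explicit. It assumes min-max normalization and treats a document missing from one list as scoring 0 there; both choices, and the names `normalizeScores` and `linearFuse`, are assumptions for illustration rather than mdcontext behavior.

```typescript
// Min-max normalize a map of raw scores into the 0-1 range.
function normalizeScores(raw: Map<string, number>): Map<string, number> {
  const values = Array.from(raw.values());
  const normalized = new Map<string, number>();
  if (values.length === 0) return normalized;
  const min = Math.min(...values);
  const range = Math.max(...values) - min;
  raw.forEach((score, id) => {
    // If every score is identical there is nothing to rank, so use 1 for all.
    normalized.set(id, range === 0 ? 1 : (score - min) / range);
  });
  return normalized;
}

// alpha = 1 is pure semantic, alpha = 0 is pure BM25.
function linearFuse(
  semantic: Map<string, number>,
  bm25: Map<string, number>,
  alpha = 0.5,
): Map<string, number> {
  const s = normalizeScores(semantic);
  const b = normalizeScores(bm25);
  const ids = new Set<string>([...Array.from(s.keys()), ...Array.from(b.keys())]);
  const fused = new Map<string, number>();
  ids.forEach((id) => {
    // A document missing from one list contributes 0 for that list.
    fused.set(id, alpha * (s.get(id) ?? 0) + (1 - alpha) * (b.get(id) ?? 0));
  });
  return fused;
}
```

Because min-max normalization depends on the best and worst score in each result set, a single outlier can compress every other score, which is one reason this approach is less robust than rank-based fusion.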
82
+ ### Implementation Options for Node.js
83
+
84
+ #### BM25 Libraries
85
+
86
+ | Library | Notes | NPM |
87
+ | --------------------------- | ------------------------------------------------------------ | ----------------------- |
88
+ | **wink-bm25-text-search** | Full-featured, supports field weighting, ~100% test coverage | `wink-bm25-text-search` |
89
+ | **OkapiBM25** | Simple, typed implementation, 111K downloads/year | `okapi-bm25` |
90
+ | **@langchain/community** | BM25Retriever for LangChain pipelines | `@langchain/community` |
91
+ | **winkNLP BM25 Vectorizer** | BM25 with configurable k1/b parameters | `wink-nlp` |
92
+
93
+ **Recommendation**: `wink-bm25-text-search` for its reliability and text-processing features (stemming, stop words, field boosting).
94
+
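To make the `k1`/`b` parameters mentioned in the table concrete, here is a from-scratch sketch of Okapi BM25 scoring. It is not the API of any library listed above; the tokenizer, the defaults (`k1 = 1.2`, `b = 0.75`), and the sample documents are assumptions chosen for illustration.

```typescript
// Lowercase and split on anything that is not a letter, digit, or hyphen,
// so identifiers such as "TS-01" survive as a single token.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9-]+/).filter((t) => t.length > 0);
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const totalLength = tokenized.reduce((sum, d) => sum + d.length, 0);
  const avgLength = totalLength / Math.max(docs.length, 1);
  const queryTerms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const docsWithTerm = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (docs.length - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
      // b controls how strongly long documents are penalized.
      const lengthNorm = 1 - b + b * (doc.length / avgLength);
      // k1 controls how quickly repeated terms saturate.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

// Hypothetical corpus: only the first document mentions the identifier.
const sampleDocs = [
  "Error TS-01 is raised by the config loader",
  "Configuration management overview",
  "Handling errors gracefully",
];
const identifierScores = bm25Scores("TS-01", sampleDocs);
```

Only the first document receives a non-zero score for the query `TS-01`, which is the exact-identifier case where keyword search complements embeddings.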
95
+ ### Hybrid Search Implementation Pattern
96
+
97
+ ```typescript
98
+ interface HybridSearchResult {
99
+   sectionId: string;
100
+   semanticRank?: number;
101
+   bm25Rank?: number;
102
+   rrfScore: number;
103
+ }
104
+
105
+ async function hybridSearch(
106
+   query: string,
107
+   {
108
+     semanticWeight = 1.0,
109
+     bm25Weight = 1.0,
110
+     k = 60, // RRF smoothing constant
111
+     limit = 10, // assumed default
112
+   }: { semanticWeight?: number; bm25Weight?: number; k?: number; limit?: number } = {},
113
+ ): Promise<HybridSearchResult[]> {
114
+   // 1. Run both searches in parallel
115
+   const [semanticResults, bm25Results] = await Promise.all([
116
+     semanticSearch(query, { limit: limit * 2 }),
117
+     bm25Search(query, { limit: limit * 2 }),
118
+   ]);
119
+
120
+   // 2. Apply RRF fusion
121
+   const scores = new Map<string, number>();
122
+
123
+   semanticResults.forEach((r, i) => {
124
+     const rank = i + 1;
125
+     const score = (scores.get(r.sectionId) || 0) + semanticWeight / (k + rank);
126
+     scores.set(r.sectionId, score);
127
+   });
128
+
129
+   bm25Results.forEach((r, i) => {
130
+     const rank = i + 1;
131
+     const score = (scores.get(r.sectionId) || 0) + bm25Weight / (k + rank);
132
+     scores.set(r.sectionId, score);
133
+   });
134
+
135
+   // 3. Sort by RRF score and return top results
136
+   return [...scores.entries()]
137
+     .sort((a, b) => b[1] - a[1])
138
+     .slice(0, limit)
139
+     .map(([id, score]) => ({ sectionId: id, rrfScore: score }));
140
+ }
141
+ ```
142
+
143
+ ### When Hybrid Beats Pure Semantic
144
+
145
+ - **Exact term searches**: Product codes, error codes, API names
146
+ - **Proper nouns**: Names, brands, specific technologies
147
+ - **Technical documentation**: Where exact terminology matters
148
+ - **Short queries**: Single-word searches that need lexical grounding
149
+
150
+ ---
151
+
152
+ ## 2. Re-ranking Approaches
153
+
154
+ Re-ranking is a two-stage retrieval pattern: first retrieve candidates with a fast method, then re-rank top-N with a more accurate model.
155
+
156
+ ### Why Re-ranking?
157
+
158
+ - **Bi-encoders (embedding models)** encode query and documents separately, enabling fast ANN search but missing cross-attention between query-document pairs
159
+ - **Cross-encoders** jointly encode query+document, capturing fine-grained relevance but too slow for full corpus search
160
+ - **Solution**: Use bi-encoder for retrieval, cross-encoder for re-ranking top candidates
161
+
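The retrieve-then-re-rank pattern can be sketched independently of any particular model. In the example below the cross-encoder is abstracted behind a hypothetical `PairScorer` callback, and `Candidate`, `rerankTopN`, and the default `topN = 20` are illustrative assumptions.

```typescript
interface Candidate {
  sectionId: string;
  content: string;
}

// Hypothetical scorer: returns a relevance score for one (query, document) pair.
// In practice this would wrap a cross-encoder; here it is injected by the caller.
type PairScorer = (query: string, content: string) => Promise<number>;

async function rerankTopN(
  query: string,
  candidates: Candidate[], // already ordered by the fast first-stage retriever
  scorePair: PairScorer,
  topN = 20,
): Promise<Candidate[]> {
  // Only the head of the list is re-scored, which bounds the added latency.
  const head = candidates.slice(0, topN);
  const tail = candidates.slice(topN);

  const scored = await Promise.all(
    head.map(async (candidate) => ({
      candidate,
      score: await scorePair(query, candidate.content),
    })),
  );
  scored.sort((a, b) => b.score - a.score);

  // Re-ranked head first, then the untouched tail in its original order.
  return [...scored.map((s) => s.candidate), ...tail];
}
```

Keeping the scorer as a parameter makes it possible to swap a local model for a hosted API, or to stub it in tests, without changing the pipeline.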
162
+ ### Cross-Encoder Models
163
+
164
+ #### MS-MARCO MiniLM (Recommended for mdcontext)
165
+
166
+ | Model | Parameters | Latency | Use Case |
167
+ | ------------------------- | ---------- | ---------------- | -------------------------- |
168
+ | `ms-marco-MiniLM-L-6-v2` | 22.7M | 2-5ms/pair (CPU) | Fast, general purpose |
169
+ | `ms-marco-MiniLM-L-12-v2` | 33M | ~10ms/pair | Better quality, still fast |
170
+ | `BGE-reranker-base` | - | - | Multilingual support |
171
+ | `BGE-reranker-large` | - | - | Best quality, multilingual |
172
+
173
+ **Key stats**:
174
+
175
+ - Re-ranking typically improves RAG accuracy by 20-35%
176
+ - Adds 200-500ms latency (for top-20 re-ranking)
177
+ - Leading organizations see 30-50% improvements in retrieval precision
178
+
179
+ #### Performance Benefits (2025-2026 Production Data)
180
+
181
+ > "Three factors converge in 2026 to make reranking mainstream: open-source cross-encoder implementations have matured significantly, models like ms-marco-MiniLM-L-12-v2 deliver 95% of the performance of proprietary alternatives while running on commodity hardware."
182
+
183
+ ### ColBERT: Late Interaction Models
184
+
185
+ ColBERT uses "late interaction" - encoding query and document separately but comparing at token level:
186
+
187
+ **Architecture**:
188
+
189
+ ```
190
+ Query: [q1, q2, q3, ...] → Token embeddings
191
+ Document: [d1, d2, d3, ...] → Token embeddings
192
+ Score: MaxSim(Q, D) = Σ_i max_j (qi · dj)
193
+ ```
194
+
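The MaxSim score above can be written out directly: for each query token, take its best dot product against any document token, then sum those maxima. This is a minimal sketch over plain number arrays; the function names are illustrative, and a real ColBERT implementation would operate on batched tensors.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

// Late-interaction score: sum over query tokens of the best-matching document token.
function maxSim(queryTokens: number[][], docTokens: number[][]): number {
  if (docTokens.length === 0) return 0;
  return queryTokens.reduce((total, q) => {
    const best = Math.max(...docTokens.map((d) => dot(q, d)));
    return total + best;
  }, 0);
}
```

Because the document token embeddings do not depend on the query, they can be computed once at index time; only the cheap max-and-sum step runs per query.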
195
+ **Advantages**:
196
+
197
+ - Better quality than bi-encoders
198
+ - Faster than cross-encoders (document embeddings can be precomputed)
199
+ - Storage-efficient with ColBERTv2 residual compression (6-10x smaller)
200
+
201
+ **Production Readiness (2025)**:
202
+
203
+ - Memory-mapped index storage (ColBERT-serve) reduces RAM by 90%+
204
+ - RAGatouille library provides easy Python integration
205
+ - Active research area (ECIR 2026 workshop on Late Interaction)
206
+
207
+ **For mdcontext**: ColBERT is likely overkill given the modest corpus size. Cross-encoders offer simpler integration with similar quality benefits.
208
+
209
+ ### LLM-Based Re-ranking
210
+
211
+ Using language models to rank search results:
212
+
213
+ ```typescript
214
+ // Example prompt
215
+ const prompt = `
216
+ Given the query: "${query}"
217
+
218
+ Rank these documents by relevance (most relevant first):
219
+ ${documents.map((d, i) => `${i + 1}. ${d.title}: ${d.snippet}`).join("\n")}
220
+
221
+ Return only the numbers in ranked order.
222
+ `;
223
+ ```
224
+
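A prompt like the one above returns free text, so the reply has to be parsed defensively before it can reorder results. The sketch below assumes the model answers with document numbers in some delimiter-separated form; `parseRanking` is a hypothetical helper, not part of mdcontext.

```typescript
// Convert an LLM reply such as "3, 1, 2" into zero-based document indices.
function parseRanking(reply: string, documentCount: number): number[] {
  const order: number[] = [];
  const seen = new Set<number>();

  for (const match of reply.match(/\d+/g) ?? []) {
    const index = Number(match) - 1; // the prompt numbers documents from 1
    // Ignore out-of-range numbers and repeats instead of failing.
    if (index >= 0 && index < documentCount && !seen.has(index)) {
      seen.add(index);
      order.push(index);
    }
  }

  // Append anything the model omitted, in original retrieval order.
  for (let i = 0; i < documentCount; i++) {
    if (!seen.has(i)) order.push(i);
  }
  return order;
}
```

Falling back to the original order for missing or invalid entries means a malformed reply degrades to the first-stage ranking rather than dropping results.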
225
+ **Pros**: Highly accurate, understands nuance
226
+ **Cons**: Slow, expensive, adds LLM dependency
227
+
228
+ **Recommendation for mdcontext**: Not recommended. Cross-encoders provide good accuracy without LLM cost/latency.
229
+
230
+ ### JavaScript/TypeScript Implementation Options
231
+
232
+ #### Option 1: Transformers.js (Browser + Node.js)
233
+
234
+ ```typescript
235
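+ import { AutoModelForSequenceClassification, AutoTokenizer } from "@xenova/transformers";
236
+
237
+ // Load the cross-encoder and its tokenizer (text-pair input needs both)
238
+ const modelId = "Xenova/ms-marco-MiniLM-L-6-v2";
239
+ const tokenizer = await AutoTokenizer.from_pretrained(modelId);
240
+ const model = await AutoModelForSequenceClassification.from_pretrained(modelId);
241
+
242
+ // Score every query-document pair in one batch; a higher logit means more relevant
243
+ const queries = documents.map(() => query);
244
+ const passages = documents.map((doc) => doc.content);
245
+ const inputs = tokenizer(queries, { text_pair: passages, padding: true, truncation: true });
246
+ const { logits } = await model(inputs);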
247
+ ```
248
+
249
+ **Pros**:
250
+
251
+ - Runs locally (no API calls)
252
+ - ONNX runtime + WebGPU acceleration available
253
+ - Works in browser and Node.js
254
+
255
+ **Cons**:
256
+
257
+ - Model download required (~80MB for MiniLM-L6)
258
+ - First load is slow
259
+ - Node.js ONNX setup can be tricky
260
+
261
+ #### Option 2: External Re-ranking API
262
+
263
+ Services like Cohere, Jina, or self-hosted endpoints.
264
+
265
+ ```typescript
266
+ const response = await fetch("https://api.cohere.ai/v1/rerank", {
267
+   method: "POST",
268
+   headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
269
+   body: JSON.stringify({
270
+     query,
271
+     documents: docs.map((d) => d.content),
272
+     top_n: 10,
273
+     model: "rerank-english-v2.0",
274
+   }),
275
+ });
276
+ ```
277
+
278
+ **Pros**: Easy integration, no local model management
279
+ **Cons**: API cost, latency, dependency
280
+
281
+ ### When to Use Re-ranking
282
+
283
+ | Use Case | Re-ranking Value |
284
+ | ------------------------------ | ---------------- |
285
+ | High-precision requirements | High |
286
+ | Long documents with dense info | High |
287
+ | Ambiguous queries | High |
288
+ | Simple keyword searches | Low |
289
+ | Real-time autocomplete | Low (latency) |
290
+ | Very small result sets (<5) | Low |
291
+
292
+ **For mdcontext**: Medium-high value. Documentation search benefits from re-ranking because section embeddings may rank "close enough" results highly, and cross-encoders can distinguish subtle relevance differences.
293
+
294
+ ---
295
+
296
+ ## 3. Vector Index Alternatives
297
+
298
+ ### HNSW (Current mdcontext Implementation)
299
+
300
+ **Strengths**:
301
+
302
+ - Near-instantaneous nearest neighbor retrieval
303
+ - Excellent recall and speed when data fits in RAM
304
+ - Incremental updates without rebuild
305
+ - Well-supported in Node.js (hnswlib-node)
306
+
307
+ **Weaknesses**:
308
+
309
+ - Entire graph must fit in memory
310
+ - Higher memory footprint per vector (graph structure overhead)
311
+
312
+ **Best for**: Mid-sized datasets (<10M vectors) with RAM budget
313
+
314
+ ### IVF (Inverted File Index)
315
+
316
+ **Architecture**: Clusters vectors using k-means, searches only relevant clusters
317
+
318
+ **Strengths**:
319
+
320
+ - Lower memory than HNSW (loads clusters on-demand)
321
+ - Configurable recall/speed tradeoff via nprobe parameter
322
+ - IVF+PQ enables billion-scale on disk
323
+
324
+ **Weaknesses**:
325
+
326
+ - Accuracy depends on clustering quality
327
+ - Updates require re-clustering
328
+ - Cold queries may miss results if clusters are poor
329
+
330
+ **Best for**: Large static datasets, memory-constrained environments
331
+
332
+ ### DiskANN
333
+
334
+ **Architecture**: Vamana graph + product quantization for SSD storage
335
+
336
+ **Strengths**:
337
+
338
+ - Handles datasets larger than RAM
339
+ - Stable latency with beam search and caching
340
+ - Good for dynamic datasets when using the FreshDiskANN variant
341
+
342
+ **Weaknesses**:
343
+
344
+ - IOPS bottlenecks possible
345
+ - Base DiskANN is immutable (FreshDiskANN adds updates)
346
+ - More complex setup
347
+
348
+ **Best for**: Large datasets (10M+) where ~25% fits in RAM
349
+
350
+ ### Comparison Summary
351
+
352
+ | Index | Memory | Speed | Updates | Best Scale |
353
+ | ----------- | ------ | ------- | ------- | ------------ |
354
+ | **HNSW** | High | Fastest | Easy | <10M vectors |
355
+ | **IVF** | Medium | Fast | Rebuild | 10M-100M |
356
+ | **DiskANN** | Low | Good | Limited | 100M+ |
357
+
358
+ ### Node.js Library Options
359
+
360
+ | Library | Index Types | Notes |
361
+ | ------------------------ | ------------- | --------------------------------------------- |
362
+ | **hnswlib-node** | HNSW only | Mature, reliable, current mdcontext choice |
363
+ | **faiss-node** | IVF, HNSW, PQ | Facebook's FAISS bindings, more index options |
364
+ | **LangChain FaissStore** | FAISS-backed | Higher-level API, LangChain ecosystem |
365
+ | **hnswsqlite** | HNSW + SQLite | Persistence with metadata |
366
+
367
+ **Recommendation for mdcontext**: Stay with hnswlib-node. Documentation corpora are typically <100K sections, well within HNSW's sweet spot. The complexity of FAISS isn't warranted.
368
+
369
+ ---
370
+
371
+ ## 4. Filtering and Metadata
372
+
373
+ ### The Filtering Challenge
374
+
375
+ Vector search filtering is non-trivial because ANN indexes like HNSW optimize for similarity, not attribute filtering.
376
+
377
+ ### Three Strategies
378
+
379
+ #### Pre-Filtering (Filter-Then-Search)
380
+
381
+ 1. Apply metadata filter (e.g., `path LIKE 'docs/api/%'`)
382
+ 2. Run ANN search on filtered subset
383
+
384
+ **Pros**:
385
+
386
+ - Accurate results (only searches valid candidates)
387
+ - Works well for low-cardinality filters
388
+
389
+ **Cons**:
390
+
391
+ - **Breaks HNSW graph connectivity** when filter is highly selective
392
+ - May require brute-force search on small filtered sets
393
+ - Significant recall drop when <10% of vectors remain
394
+
395
+ #### Post-Filtering (Search-Then-Filter)
396
+
397
+ 1. Run ANN search for k\*N candidates
398
+ 2. Apply metadata filter
399
+ 3. Return top k that pass filter
400
+
401
+ **Pros**:
402
+
403
+ - Predictable latency
404
+ - HNSW graph stays intact
405
+
406
+ **Cons**:
407
+
408
+ - May return fewer than k results
409
+ - Wastes computation on filtered-out results
410
+ - Poor recall with selective filters
411
+
412
+ #### Integrated Filtering (In-Algorithm)
413
+
414
+ Modern vector databases modify the search algorithm to be filter-aware:
415
+
416
+ - **Weaviate ACORN**: Two-hop graph expansion for filtered search
417
+ - **Qdrant**: Pre-filtering with automatic fallback to payload index
418
+ - **Pinecone**: Merged metadata and vector indexes
419
+
420
+ **Performance**: Engines with integrated filtering maintain recall and often get _faster_ with filters (less work to do).
421
+
422
+ ### Current mdcontext Filtering
423
+
424
+ From `current-implementation.md`:
425
+
426
+ - Only path pattern filtering supported (`pathPattern` option)
427
+ - Implemented as post-filtering
428
+
429
+ ### Recommended Approach for mdcontext
430
+
431
+ Given typical documentation corpus sizes (<100K sections), a pragmatic hybrid approach is to pick the strategy based on filter selectivity:
432
+
433
+ ```typescript
434
+ interface FilteredSearchOptions {
435
+ pathPattern?: string;
436
+ documentTypes?: string[];
437
+ minTokens?: number;
438
+ // Future: tags, dates, etc.
439
+ }
440
+
441
+ async function filteredSearch(query: string, options: FilteredSearchOptions, limit = 10) {
442
+ // 1. Estimate filter selectivity
443
+ const totalDocs = await getDocumentCount();
444
+ const filteredCount = await estimateFilteredCount(options);
445
+ const selectivity = filteredCount / totalDocs;
446
+
447
+ if (selectivity < 0.1) {
448
+ // Highly selective: brute-force on filtered set
449
+ const candidates = await getFilteredSections(options);
450
+ return bruteForceSearch(query, candidates);
451
+ } else if (selectivity < 0.5) {
452
+ // Medium selectivity: over-fetch then filter
453
+ const results = await semanticSearch(query, { limit: limit * 3 });
454
+ return applyFilters(results, options).slice(0, limit);
455
+ } else {
456
+ // Low selectivity: standard search with post-filter
457
+ const results = await semanticSearch(query, { limit: Math.ceil(limit * 1.5) });
458
+ return applyFilters(results, options).slice(0, limit);
459
+ }
460
+ }
461
+ ```
462
+
463
+ ### Metadata to Consider
464
+
465
+ | Metadata | Use Case |
466
+ | -------------- | ---------------------------------- |
467
+ | `documentPath` | Filter by directory/file |
468
+ | `documentType` | Filter API docs, guides, tutorials |
469
+ | `lastModified` | Prefer recent content |
470
+ | `tokens` | Filter by content length |
471
+ | `headingLevel` | Prefer top-level sections |
472
+ | `tags` | Custom categorization |
473
+
474
+ ---
475
+
476
+ ## 5. Emerging Patterns (2025-2026)
477
+
478
+ ### Learned Sparse Retrieval (SPLADE)
479
+
480
+ **What it is**: Neural models that produce sparse vectors with semantic term expansion.
481
+
482
+ **How it works**:
483
+
484
+ - Encodes text into sparse vector where dimensions = vocabulary terms
485
+ - Activates semantically related terms (e.g., "study" also activates "learn", "research")
486
+ - Compatible with inverted indexes like BM25
487
+
488
+ **SPLADE vs BM25**:
489
+
490
+ | Aspect | BM25 | SPLADE |
491
+ | ------------------- | ----------------- | ---------------------------- |
492
+ | Vocabulary mismatch | Critical weakness | Solved via expansion |
493
+ | Latency | Baseline | Similar (with optimizations) |
494
+ | Quality | Good | Better in-domain |
495
+ | Index compatibility | Inverted index | Inverted index |
496
+
497
+ **2025 Status**:
498
+
499
+ - SPLADE query latency is now close to BM25 (<4ms difference)
500
+ - Best results with hybrid sparse+dense approaches
501
+ - New pruning techniques (Superblock Pruning) up to 16x faster
502
+
503
+ **For mdcontext**: Interesting but adds complexity. BM25 + semantic hybrid likely sufficient.
504
+
505
+ ### Query Expansion with HyDE
506
+
507
+ **Hypothetical Document Embeddings (HyDE)**:
508
+
509
+ 1. User submits query
510
+ 2. LLM generates hypothetical answer document
511
+ 3. Embed the hypothetical document (not the query)
512
+ 4. Search for real documents similar to the hypothetical
513
+
514
+ **Why it works**: Compares document-to-document rather than question-to-document, bridging the semantic gap.
515
+
516
+ **Implementation**:
517
+
518
+ ```typescript
519
+ async function hydeSearch(query: string) {
520
+ // 1. Generate hypothetical document
521
+ const hypothetical = await llm.generate(
522
+ `Write a detailed paragraph that would answer: "${query}"`,
523
+ );
524
+
525
+ // 2. Embed hypothetical (or average multiple)
526
+ const embedding = await embed(hypothetical);
527
+
528
+ // 3. Search with hypothetical embedding
529
+ return vectorStore.search(embedding);
530
+ }
531
+ ```
532
+
533
+ **Benefits**:
534
+
535
+ - 10-30% retrieval improvement on ambiguous queries
536
+ - Zero-shot (no training required)
537
+ - Domain adaptable
538
+
539
+ **Limitations**:
540
+
541
+ - Requires LLM call (cost, latency)
542
+ - Works poorly if LLM has no domain knowledge
543
+ - May hallucinate misleading hypotheticals
544
+
545
+ **For mdcontext**: Good option for complex queries, could be opt-in feature.
546
+
547
+ ### GraphRAG
548
+
549
+ Combines vector search with knowledge graphs:
550
+
551
+ - Entities and relationships extracted from documents
552
+ - Queries traverse both vector space and graph
553
+ - Proponents claim 99% precision in some benchmarks
554
+
555
+ **For mdcontext**: Overkill for documentation search. More relevant for enterprise knowledge bases.
556
+
557
+ ### Long-Context RAG
558
+
559
+ Processing longer retrieval units (sections, documents) rather than small chunks.
560
+
561
+ **Benefits**:
562
+
563
+ - Preserves context
564
+ - Reduces fragmentation
565
+ - Better for coherent understanding
566
+
567
+ **mdcontext alignment**: Already uses section-level granularity, well-aligned with this trend.
568
+
569
+ ### Self-RAG
570
+
571
+ Self-reflective retrieval that:
572
+
573
+ 1. Decides when to retrieve
574
+ 2. Evaluates retrieval quality
575
+ 3. Critiques generated outputs
576
+
577
+ **For mdcontext**: Beyond current scope, more relevant for RAG pipelines with generation.
578
+
579
+ ---
580
+
581
+ ## 6. Quick Wins: HNSW Parameter Tuning
582
+
583
+ Current mdcontext parameters (from `current-implementation.md`):
584
+
585
+ ```typescript
586
+ M: 16; // Max connections per node
587
+ efConstruction: 200; // Construction-time search width
588
+ efSearch: 100; // Query-time search width (implicit)
589
+ ```
590
+
591
+ ### Parameter Effects
592
+
593
+ | Parameter | Increase Effect | Decrease Effect |
594
+ | ------------------ | ----------------------------------------- | ----------------------------------------- |
595
+ | **M** | Better recall, larger index, slower build | Faster build, smaller index, lower recall |
596
+ | **efConstruction** | Better graph quality, slower build | Faster build, potentially lower recall |
597
+ | **efSearch** | Better recall, slower queries | Faster queries, lower recall |
598
+
599
+ ### Recommended Tuning
600
+
601
+ For documentation search (~1K-100K sections):
602
+
603
+ #### Option A: Balanced (Current)
604
+
605
+ ```typescript
606
+ M: 16;
607
+ efConstruction: 200;
608
+ efSearch: 100; // Consider increasing
609
+ ```
610
+
611
+ Good balance, may benefit from higher efSearch.
612
+
613
+ #### Option B: Quality-Focused
614
+
615
+ ```typescript
616
+ M: 24; // More connections
617
+ efConstruction: 256; // Better graph
618
+ efSearch: 200; // More thorough search
619
+ ```
620
+
621
+ ~30% more memory, ~95%+ recall, slightly slower build.
622
+
623
+ #### Option C: Speed-Focused
624
+
625
+ ```typescript
626
+ M: 12;
627
+ efConstruction: 128;
628
+ efSearch: 64;
629
+ ```
630
+
631
+ Faster builds and queries, ~85-90% recall.
632
+
633
+ ### Quick Win: Dynamic efSearch
634
+
635
+ Since efSearch can be set at query time, it can be exposed as a per-query quality option:
636
+
637
+ ```typescript
638
+ async function search(
638
+ query: string,
639
+ options: { quality?: "fast" | "balanced" | "thorough" },
640
+ ) {
641
+ const efSearch = {
642
+ fast: 64,
643
+ balanced: 100,
644
+ thorough: 256,
645
+ }[options.quality ?? "balanced"];
646
+
647
+ return vectorStore.search(await embed(query), { efSearch }); // embed(): same embedding helper as in the HyDE example
649
+ }
650
+ ```
651
+
652
+ ### Validation Approach
653
+
654
+ 1. Create ground-truth test set (10-20 queries with known relevant sections)
655
+ 2. Measure recall@k for different parameters
656
+ 3. Measure query latency
657
+ 4. Choose based on recall/latency tradeoff
658
+
659
+ ---
660
+
661
+ ## 7. Top 3 Recommendations
662
+
663
+ ### Recommendation 1: Hybrid Search with RRF
664
+
665
+ **What**: Add BM25 keyword search alongside semantic search, fuse results with Reciprocal Rank Fusion.
666
+
667
+ **Why**:
668
+
669
+ - Handles exact term matching (API names, error codes)
670
+ - 10-15% precision improvement in benchmarks
671
+ - Low implementation complexity
672
+ - Falls back gracefully (if one method fails, other still works)
673
+
674
+ **Implementation**:
675
+
676
+ 1. Add `wink-bm25-text-search` dependency
677
+ 2. Build BM25 index during embedding build (uses same section content)
678
+ 3. Add `--mode hybrid` option to search command
679
+ 4. Implement RRF fusion (~50 lines of code)
680
+
681
+ **Effort**: Medium (2-3 days)
682
+ **Impact**: High
683
+
684
+ ### Recommendation 2: Cross-Encoder Re-ranking
685
+
686
+ **What**: Re-rank top-20 semantic search results using ms-marco-MiniLM-L-6-v2 cross-encoder.
687
+
688
+ **Why**:
689
+
690
+ - 20-35% accuracy improvement
691
+ - Catches relevant results that rank lower in embedding space
692
+ - Can be opt-in (--rerank flag) to avoid latency when not needed
693
+ - Modern cross-encoders are fast (2-5ms per pair)
694
+
695
+ **Implementation**:
696
+
697
+ 1. Add Transformers.js dependency or use API (Cohere/Jina)
698
+ 2. Load cross-encoder model on first rerank request
699
+ 3. Score top-N candidates
700
+ 4. Re-sort by cross-encoder score
701
+
702
+ **Effort**: Medium (2-3 days for Transformers.js, 1 day for API)
703
+ **Impact**: High
704
+
705
+ ### Recommendation 3: HNSW Parameter Optimization
706
+
707
+ **What**: Tune HNSW parameters based on corpus size and add dynamic efSearch.
708
+
709
+ **Why**:
710
+
711
+ - Zero dependency changes
712
+ - Immediate quality/speed improvements
713
+ - Low risk
714
+
715
+ **Implementation**:
716
+
717
+ 1. Add config options for M, efConstruction
718
+ 2. Implement dynamic efSearch (fast/balanced/thorough)
719
+ 3. Add `--quality` flag to search command
720
+ 4. Consider auto-tuning based on corpus size
721
+
722
+ **Effort**: Low (1 day)
723
+ **Impact**: Medium
724
+
725
+ ---
726
+
727
+ ## 8. Effort/Impact Analysis
728
+
729
+ ### Summary Matrix
730
+
731
+ | Improvement | Effort | Impact | Risk | Priority |
732
+ | ---------------------------- | ------------- | ---------- | -------- | -------- |
733
+ | **HNSW parameter tuning** | Low (1d) | Medium | Very Low | P0 |
734
+ | **Hybrid search (BM25+RRF)** | Medium (2-3d) | High | Low | P1 |
735
+ | **Cross-encoder re-ranking** | Medium (2-3d) | High | Medium | P1 |
736
+ | **Dynamic efSearch** | Low (0.5d) | Low-Medium | Very Low | P0 |
737
+ | **HyDE query expansion** | Medium (2d) | Medium | Medium | P2 |
738
+ | **Enhanced filtering** | Medium (2d) | Medium | Low | P2 |
739
+ | **SPLADE sparse retrieval** | High (5d+) | Medium | Medium | P3 |
740
+ | **ColBERT late interaction** | High (1w+) | Medium | High | P3 |
741
+
742
+ ### Recommended Implementation Order
743
+
744
+ **Phase 1: Quick Wins (Week 1)**
745
+
746
+ 1. HNSW parameter optimization + dynamic efSearch
747
+ 2. Add quality flag to search CLI
748
+
749
+ **Phase 2: Hybrid Search (Week 2)**
750
+
751
+ 1. Integrate BM25 library
752
+ 2. Build BM25 index during embedding build
753
+ 3. Implement RRF fusion
754
+ 4. Add hybrid mode to CLI
755
+
756
+ **Phase 3: Re-ranking (Week 3)**
757
+
758
+ 1. Evaluate Transformers.js vs API approach
759
+ 2. Implement re-ranking pipeline
760
+ 3. Add --rerank flag
761
+ 4. Cache loaded models
762
+
763
+ **Phase 4: Polish (Week 4)**
764
+
765
+ 1. Add HyDE as opt-in for complex queries
766
+ 2. Enhance metadata filtering
767
+ 3. Add search quality metrics/logging
768
+ 4. Documentation
769
+
770
+ ### Risk Mitigation
771
+
772
+ | Risk | Mitigation |
773
+ | --------------------------- | --------------------------- |
774
+ | Transformers.js ONNX issues | Fallback to API reranking |
775
+ | BM25 index size | Store separately, lazy load |
776
+ | Increased latency | Make re-ranking opt-in |
777
+ | Model download size | Cache models, lazy load |
778
+
779
+ ---
780
+
781
+ ## Sources
782
+
783
+ ### Hybrid Search
784
+
785
+ - [Hybrid Search Explained - Weaviate](https://weaviate.io/blog/hybrid-search-explained)
786
+ - [Hybrid Search with BM25 and Rank Fusion - Medium](https://medium.com/thinking-sand/hybrid-search-with-bm25-and-rank-fusion-for-accurate-results-456a70305dc5)
787
+ - [Hybrid Search Scoring (RRF) - Azure AI Search](https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking)
788
+ - [Comprehensive Hybrid Search Guide - Elastic](https://www.elastic.co/what-is/hybrid-search)
789
+ - [Reciprocal Rank Fusion - ParadeDB](https://www.paradedb.com/learn/search-concepts/reciprocal-rank-fusion)
790
+
791
+ ### Re-ranking
792
+
793
+ - [cross-encoder/ms-marco-MiniLM-L6-v2 - Hugging Face](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2)
794
+ - [RAG Reranking Techniques - CustomGPT](https://customgpt.ai/rag-reranking-techniques/)
795
+ - [Adaptive Retrieval Reranking - RAG About It](https://ragaboutit.com/adaptive-retrieval-reranking-how-to-implement-cross-encoder-models-to-fix-enterprise-rag-ranking-failures/)
796
+ - [MS MARCO Cross-Encoders - Sentence Transformers](https://www.sbert.net/docs/pretrained-models/ce-msmarco.html)
797
+ - [FlashRank - GitHub](https://github.com/PrithivirajDamodaran/FlashRank)
798
+
799
+ ### Vector Indexes
800
+
801
+ - [Vector Search at Scale: HNSW vs IVF vs DiskANN](https://netcrit.net/vector-search-at-scale-hnsw-vs-ivf-vs-diskann)
802
+ - [HNSW vs DiskANN - Tiger Data](https://www.tigerdata.com/learn/hnsw-vs-diskann)
803
+ - [How to Pick a Vector Index - Zilliz](https://zilliz.com/learn/how-to-pick-a-vector-index-in-milvus-visual-guide)
804
+ - [HNSW Index Explained - Milvus](https://milvus.io/docs/index-explained.md)
805
+
806
+ ### HNSW Tuning
807
+
808
+ - [Practical Guide to HNSW Hyperparameters - OpenSearch](https://opensearch.org/blog/a-practical-guide-to-selecting-hnsw-hyperparameters/)
809
+ - [HNSW Configuration Parameters - Milvus AI Reference](https://milvus.io/ai-quick-reference/what-are-the-key-configuration-parameters-for-an-hnsw-index-such-as-m-and-efconstructionefsearch-and-how-does-each-influence-the-tradeoff-between-index-size-build-time-query-speed-and-recall)
810
+ - [HNSW Indexes with Postgres - Crunchy Data](https://www.crunchydata.com/blog/hnsw-indexes-with-postgres-and-pgvector)
811
+
812
+ ### Filtering
813
+
814
+ - [Complete Guide to Filtering in Vector Search - Qdrant](https://qdrant.tech/articles/vector-search-filtering/)
815
+ - [Vector Query Filters - Azure AI Search](https://learn.microsoft.com/en-us/azure/search/vector-search-filters)
816
+ - [Achilles Heel of Vector Search: Filters](https://yudhiesh.github.io/2025/05/09/the-achilles-heel-of-vector-search-filters/)
817
+ - [Metadata Filtering and Hybrid Search - Dataquest](https://www.dataquest.io/blog/metadata-filtering-and-hybrid-search-for-vector-databases/)
818
+
819
+ ### Emerging Patterns
820
+
821
+ - [Late Interaction Overview: ColBERT, ColPali - Weaviate](https://weaviate.io/blog/late-interaction-overview)
822
+ - [Modern Sparse Neural Retrieval - Qdrant](https://qdrant.tech/articles/modern-sparse-neural-retrieval/)
823
+ - [SPLADE vs BM25 - Zilliz](https://zilliz.com/learn/comparing-splade-sparse-vectors-with-bm25)
824
+ - [HyDE for RAG - Machine Learning Plus](https://machinelearningplus.com/gen-ai/hypothetical-document-embedding-hyde-a-smarter-rag-method-to-search-documents/)
825
+ - [Better RAG with HyDE - Zilliz](https://zilliz.com/learn/improve-rag-and-information-retrieval-with-hyde-hypothetical-document-embeddings)
826
+
827
+ ### Node.js Libraries
828
+
829
+ - [hnswlib-node - npm](https://www.npmjs.com/package/hnswlib-node)
830
+ - [hnswlib-node - GitHub](https://github.com/yoshoku/hnswlib-node)
831
+ - [wink-bm25-text-search - npm](https://www.npmjs.com/package/wink-bm25-text-search)
832
+ - [OkapiBM25 - GitHub](https://github.com/FurkanToprak/OkapiBM25)
833
+ - [Transformers.js v3 - Hugging Face](https://huggingface.co/blog/transformersjs-v3)
834
+ - [FaissStore - LangChain.js](https://js.langchain.com/docs/integrations/vectorstores/faiss/)
835
+
836
+ ### RAG Best Practices
837
+
838
+ - [2025 Guide to RAG - Eden AI](https://www.edenai.co/post/the-2025-guide-to-retrieval-augmented-generation-rag)
839
+ - [Enhancing RAG: Study of Best Practices - arXiv](https://arxiv.org/abs/2501.07391)
840
+ - [RAG 2025 Definitive Guide - Chitika](https://www.chitika.com/retrieval-augmented-generation-rag-the-definitive-guide-2025/)
841
+ - [Role of Sufficient Context in RAG - Google Research](https://research.google/blog/deeper-insights-into-retrieval-augmented-generation-the-role-of-sufficient-context/)