mdcontext 0.0.1 → 0.2.0

Files changed (337)
  1. package/.changeset/README.md +28 -0
  2. package/.changeset/config.json +11 -0
  3. package/.claude/settings.local.json +25 -0
  4. package/.github/workflows/ci.yml +83 -0
  5. package/.github/workflows/claude-code-review.yml +44 -0
  6. package/.github/workflows/claude.yml +85 -0
  7. package/.github/workflows/release.yml +113 -0
  8. package/.tldrignore +112 -0
  9. package/BACKLOG.md +338 -0
  10. package/CONTRIBUTING.md +186 -0
  11. package/NOTES/NOTES +44 -0
  12. package/README.md +434 -11
  13. package/biome.json +36 -0
  14. package/cspell.config.yaml +14 -0
  15. package/dist/chunk-23UPXDNL.js +3044 -0
  16. package/dist/chunk-2W7MO2DL.js +1366 -0
  17. package/dist/chunk-3NUAZGMA.js +1689 -0
  18. package/dist/chunk-7TOWB2XB.js +366 -0
  19. package/dist/chunk-7XOTOADQ.js +3065 -0
  20. package/dist/chunk-AH2PDM2K.js +3042 -0
  21. package/dist/chunk-BNXWSZ63.js +3742 -0
  22. package/dist/chunk-BTL5DJVU.js +3222 -0
  23. package/dist/chunk-HDHYG7E4.js +104 -0
  24. package/dist/chunk-HLR4KZBP.js +3234 -0
  25. package/dist/chunk-IP3FRFEB.js +1045 -0
  26. package/dist/chunk-KHU56VDO.js +3042 -0
  27. package/dist/chunk-KRYIFLQR.js +88 -0
  28. package/dist/chunk-LBSDNLEM.js +287 -0
  29. package/dist/chunk-MNTQ7HCP.js +2643 -0
  30. package/dist/chunk-MUJELQQ6.js +1387 -0
  31. package/dist/chunk-MXJGMSLV.js +2199 -0
  32. package/dist/chunk-N6QJGC3Z.js +2636 -0
  33. package/dist/chunk-OBELGBPM.js +1713 -0
  34. package/dist/chunk-OT7R5XTA.js +3192 -0
  35. package/dist/chunk-P7X4RA2T.js +106 -0
  36. package/dist/chunk-PIDUQNC2.js +3185 -0
  37. package/dist/chunk-POGCDIH4.js +3187 -0
  38. package/dist/chunk-PSIEOQGZ.js +3043 -0
  39. package/dist/chunk-PVRT3IHA.js +3238 -0
  40. package/dist/chunk-QNN4TT23.js +1430 -0
  41. package/dist/chunk-RE3R45RJ.js +3042 -0
  42. package/dist/chunk-S7E6TFX6.js +803 -0
  43. package/dist/chunk-SG6GLU4U.js +1378 -0
  44. package/dist/chunk-SJCDV2ST.js +274 -0
  45. package/dist/chunk-SYE5XLF3.js +104 -0
  46. package/dist/chunk-T5VLYBZD.js +103 -0
  47. package/dist/chunk-TOQB7VWU.js +3238 -0
  48. package/dist/chunk-VFNMZ4ZQ.js +3228 -0
  49. package/dist/chunk-VVTGZNBT.js +1629 -0
  50. package/dist/chunk-W7Q4RFEV.js +104 -0
  51. package/dist/chunk-XTYYVRLO.js +3190 -0
  52. package/dist/chunk-Y6MDYVJD.js +3063 -0
  53. package/dist/cli/main.d.ts +1 -0
  54. package/dist/cli/main.js +5458 -0
  55. package/dist/index.d.ts +653 -0
  56. package/dist/index.js +79 -0
  57. package/dist/mcp/server.d.ts +1 -0
  58. package/dist/mcp/server.js +472 -0
  59. package/dist/schema-BAWSG7KY.js +22 -0
  60. package/dist/schema-E3QUPL26.js +20 -0
  61. package/dist/schema-EHL7WUT6.js +20 -0
  62. package/docs/019-USAGE.md +625 -0
  63. package/docs/020-current-implementation.md +364 -0
  64. package/docs/021-DOGFOODING-FINDINGS.md +175 -0
  65. package/docs/BACKLOG.md +80 -0
  66. package/docs/CONFIG.md +1123 -0
  67. package/docs/DESIGN.md +439 -0
  68. package/docs/ERRORS.md +383 -0
  69. package/docs/PROJECT.md +88 -0
  70. package/docs/ROADMAP.md +407 -0
  71. package/docs/summarization.md +320 -0
  72. package/docs/test-links.md +9 -0
  73. package/justfile +40 -0
  74. package/package.json +74 -9
  75. package/pnpm-workspace.yaml +5 -0
  76. package/research/INDEX.md +315 -0
  77. package/research/code-review/README.md +90 -0
  78. package/research/code-review/cli-error-handling-review.md +979 -0
  79. package/research/code-review/code-review-validation-report.md +464 -0
  80. package/research/code-review/main-ts-review.md +1128 -0
  81. package/research/config-analysis/01-current-implementation.md +470 -0
  82. package/research/config-analysis/02-strategy-recommendation.md +428 -0
  83. package/research/config-analysis/03-task-candidates.md +715 -0
  84. package/research/config-analysis/033-research-configuration-management.md +828 -0
  85. package/research/config-analysis/034-research-effect-cli-config.md +1504 -0
  86. package/research/config-analysis/04-consolidated-task-candidates.md +277 -0
  87. package/research/config-docs/SUMMARY.md +357 -0
  88. package/research/config-docs/TEST-RESULTS.md +776 -0
  89. package/research/config-docs/TODO.md +542 -0
  90. package/research/config-docs/analysis.md +744 -0
  91. package/research/config-docs/fix-validation.md +502 -0
  92. package/research/config-docs/help-audit.md +264 -0
  93. package/research/config-docs/help-system-analysis.md +890 -0
  94. package/research/dogfood/consolidated-tool-evaluation.md +373 -0
  95. package/research/dogfood/strategy-a/a-synthesis.md +184 -0
  96. package/research/dogfood/strategy-a/a1-docs.md +226 -0
  97. package/research/dogfood/strategy-a/a2-amorphic.md +156 -0
  98. package/research/dogfood/strategy-a/a3-llm.md +164 -0
  99. package/research/dogfood/strategy-b/b-synthesis.md +228 -0
  100. package/research/dogfood/strategy-b/b1-architecture.md +207 -0
  101. package/research/dogfood/strategy-b/b2-gaps.md +258 -0
  102. package/research/dogfood/strategy-b/b3-workflows.md +250 -0
  103. package/research/dogfood/strategy-c/c-synthesis.md +451 -0
  104. package/research/dogfood/strategy-c/c1-explorer.md +192 -0
  105. package/research/dogfood/strategy-c/c2-diver-memory.md +145 -0
  106. package/research/dogfood/strategy-c/c3-diver-control.md +148 -0
  107. package/research/dogfood/strategy-c/c4-diver-failure.md +151 -0
  108. package/research/dogfood/strategy-c/c5-diver-execution.md +221 -0
  109. package/research/dogfood/strategy-c/c6-diver-org.md +221 -0
  110. package/research/effect-cli-error-handling.md +845 -0
  111. package/research/effect-errors-as-values.md +943 -0
  112. package/research/errors-task-analysis/00-consolidated-tasks.md +207 -0
  113. package/research/errors-task-analysis/cli-commands-analysis.md +909 -0
  114. package/research/errors-task-analysis/embeddings-analysis.md +709 -0
  115. package/research/errors-task-analysis/index-search-analysis.md +812 -0
  116. package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
  117. package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
  118. package/research/issue-review.md +603 -0
  119. package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
  120. package/research/llm-summarization/alternative-providers-2026.md +1428 -0
  121. package/research/llm-summarization/anthropic-2026.md +367 -0
  122. package/research/llm-summarization/claude-cli-integration.md +1706 -0
  123. package/research/llm-summarization/cli-integration-patterns.md +3155 -0
  124. package/research/llm-summarization/openai-2026.md +473 -0
  125. package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
  126. package/research/llm-summarization/opencode-cli-integration.md +1552 -0
  127. package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
  128. package/research/llm-summarization/prototype-results.md +56 -0
  129. package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
  130. package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
  131. package/research/mdcontext-error-analysis.md +521 -0
  132. package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
  133. package/research/mdcontext-pudding/01-index-embed.md +956 -0
  134. package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
  135. package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
  136. package/research/mdcontext-pudding/02-search.md +970 -0
  137. package/research/mdcontext-pudding/03-context.md +779 -0
  138. package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
  139. package/research/mdcontext-pudding/04-tree.md +704 -0
  140. package/research/mdcontext-pudding/05-config.md +1038 -0
  141. package/research/mdcontext-pudding/06-links-summary.txt +87 -0
  142. package/research/mdcontext-pudding/06-links.md +679 -0
  143. package/research/mdcontext-pudding/07-stats.md +693 -0
  144. package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
  145. package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
  146. package/research/mdcontext-pudding/README.md +168 -0
  147. package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
  148. package/research/npm_publish/011-npm-workflow-research-agent2.md +792 -0
  149. package/research/npm_publish/012-npm-workflow-research-agent1.md +530 -0
  150. package/research/npm_publish/013-npm-workflow-research-agent3.md +722 -0
  151. package/research/npm_publish/014-npm-workflow-synthesis.md +556 -0
  152. package/research/npm_publish/031-npm-workflow-task-analysis.md +134 -0
  153. package/research/research-quality-review.md +834 -0
  154. package/research/semantic-search/002-research-embedding-models.md +490 -0
  155. package/research/semantic-search/003-research-rag-alternatives.md +523 -0
  156. package/research/semantic-search/004-research-vector-search.md +841 -0
  157. package/research/semantic-search/032-research-semantic-search.md +427 -0
  158. package/research/semantic-search/embedding-text-analysis.md +156 -0
  159. package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
  160. package/research/semantic-search/query-processing-analysis.md +207 -0
  161. package/research/semantic-search/root-cause-and-solution.md +114 -0
  162. package/research/semantic-search/threshold-validation-report.md +69 -0
  163. package/research/semantic-search/vector-search-analysis.md +63 -0
  164. package/research/task-management-2026/00-synthesis-recommendations.md +295 -0
  165. package/research/task-management-2026/01-ai-workflow-tools.md +416 -0
  166. package/research/task-management-2026/02-agent-framework-patterns.md +476 -0
  167. package/research/task-management-2026/03-lightweight-file-based.md +567 -0
  168. package/research/task-management-2026/04-established-tools-ai-features.md +541 -0
  169. package/research/task-management-2026/linear/01-core-features-workflow.md +771 -0
  170. package/research/task-management-2026/linear/02-api-integrations.md +930 -0
  171. package/research/task-management-2026/linear/03-ai-features.md +368 -0
  172. package/research/task-management-2026/linear/04-pricing-setup.md +205 -0
  173. package/research/task-management-2026/linear/05-usage-patterns-best-practices.md +605 -0
  174. package/research/test-path-issues.md +276 -0
  175. package/review/ALP-76/1-error-type-design.md +962 -0
  176. package/review/ALP-76/2-error-handling-patterns.md +906 -0
  177. package/review/ALP-76/3-error-presentation.md +624 -0
  178. package/review/ALP-76/4-test-coverage.md +625 -0
  179. package/review/ALP-76/5-migration-completeness.md +440 -0
  180. package/review/ALP-76/6-effect-best-practices.md +755 -0
  181. package/scripts/apply-branch-protection.sh +47 -0
  182. package/scripts/branch-protection-templates.json +79 -0
  183. package/scripts/prototype-summarization.ts +346 -0
  184. package/scripts/rebuild-hnswlib.js +58 -0
  185. package/scripts/setup-branch-protection.sh +64 -0
  186. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
  187. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
  188. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
  189. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
  190. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  191. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  192. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
  193. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
  194. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
  195. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
  196. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
  197. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
  198. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
  199. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
  200. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
  201. package/src/cli/argv-preprocessor.test.ts +210 -0
  202. package/src/cli/argv-preprocessor.ts +202 -0
  203. package/src/cli/cli.test.ts +627 -0
  204. package/src/cli/commands/backlinks.ts +54 -0
  205. package/src/cli/commands/config-cmd.ts +642 -0
  206. package/src/cli/commands/context.ts +285 -0
  207. package/src/cli/commands/duplicates.ts +122 -0
  208. package/src/cli/commands/embeddings.ts +529 -0
  209. package/src/cli/commands/index-cmd.ts +480 -0
  210. package/src/cli/commands/index.ts +16 -0
  211. package/src/cli/commands/links.ts +52 -0
  212. package/src/cli/commands/search.ts +1281 -0
  213. package/src/cli/commands/stats.ts +149 -0
  214. package/src/cli/commands/tree.ts +128 -0
  215. package/src/cli/config-layer.ts +176 -0
  216. package/src/cli/error-handler.test.ts +235 -0
  217. package/src/cli/error-handler.ts +655 -0
  218. package/src/cli/flag-schemas.ts +341 -0
  219. package/src/cli/help.ts +588 -0
  220. package/src/cli/index.ts +9 -0
  221. package/src/cli/main.ts +435 -0
  222. package/src/cli/options.ts +41 -0
  223. package/src/cli/shared-error-handling.ts +199 -0
  224. package/src/cli/typo-suggester.test.ts +105 -0
  225. package/src/cli/typo-suggester.ts +130 -0
  226. package/src/cli/utils.ts +259 -0
  227. package/src/config/file-provider.test.ts +320 -0
  228. package/src/config/file-provider.ts +273 -0
  229. package/src/config/index.ts +72 -0
  230. package/src/config/integration.test.ts +667 -0
  231. package/src/config/precedence.test.ts +277 -0
  232. package/src/config/precedence.ts +451 -0
  233. package/src/config/schema.test.ts +414 -0
  234. package/src/config/schema.ts +603 -0
  235. package/src/config/service.test.ts +320 -0
  236. package/src/config/service.ts +243 -0
  237. package/src/config/testing.test.ts +264 -0
  238. package/src/config/testing.ts +110 -0
  239. package/src/core/index.ts +1 -0
  240. package/src/core/types.ts +113 -0
  241. package/src/duplicates/detector.test.ts +183 -0
  242. package/src/duplicates/detector.ts +414 -0
  243. package/src/duplicates/index.ts +18 -0
  244. package/src/embeddings/embedding-namespace.test.ts +300 -0
  245. package/src/embeddings/embedding-namespace.ts +947 -0
  246. package/src/embeddings/heading-boost.test.ts +222 -0
  247. package/src/embeddings/hnsw-build-options.test.ts +198 -0
  248. package/src/embeddings/hyde.test.ts +272 -0
  249. package/src/embeddings/hyde.ts +264 -0
  250. package/src/embeddings/index.ts +10 -0
  251. package/src/embeddings/openai-provider.ts +414 -0
  252. package/src/embeddings/pricing.json +22 -0
  253. package/src/embeddings/provider-constants.ts +204 -0
  254. package/src/embeddings/provider-errors.test.ts +967 -0
  255. package/src/embeddings/provider-errors.ts +565 -0
  256. package/src/embeddings/provider-factory.test.ts +240 -0
  257. package/src/embeddings/provider-factory.ts +225 -0
  258. package/src/embeddings/provider-integration.test.ts +788 -0
  259. package/src/embeddings/query-preprocessing.test.ts +187 -0
  260. package/src/embeddings/semantic-search-threshold.test.ts +508 -0
  261. package/src/embeddings/semantic-search.ts +1270 -0
  262. package/src/embeddings/types.ts +359 -0
  263. package/src/embeddings/vector-store.ts +708 -0
  264. package/src/embeddings/voyage-provider.ts +313 -0
  265. package/src/errors/errors.test.ts +845 -0
  266. package/src/errors/index.ts +533 -0
  267. package/src/index/ignore-patterns.test.ts +354 -0
  268. package/src/index/ignore-patterns.ts +305 -0
  269. package/src/index/index.ts +4 -0
  270. package/src/index/indexer.ts +684 -0
  271. package/src/index/storage.ts +260 -0
  272. package/src/index/types.ts +147 -0
  273. package/src/index/watcher.ts +189 -0
  274. package/src/index.ts +30 -0
  275. package/src/integration/search-keyword.test.ts +678 -0
  276. package/src/mcp/server.ts +612 -0
  277. package/src/parser/index.ts +1 -0
  278. package/src/parser/parser.test.ts +291 -0
  279. package/src/parser/parser.ts +394 -0
  280. package/src/parser/section-filter.test.ts +277 -0
  281. package/src/parser/section-filter.ts +392 -0
  282. package/src/search/__tests__/hybrid-search.test.ts +650 -0
  283. package/src/search/bm25-store.ts +366 -0
  284. package/src/search/cross-encoder.test.ts +253 -0
  285. package/src/search/cross-encoder.ts +406 -0
  286. package/src/search/fuzzy-search.test.ts +419 -0
  287. package/src/search/fuzzy-search.ts +273 -0
  288. package/src/search/hybrid-search.ts +448 -0
  289. package/src/search/path-matcher.test.ts +276 -0
  290. package/src/search/path-matcher.ts +33 -0
  291. package/src/search/query-parser.test.ts +260 -0
  292. package/src/search/query-parser.ts +319 -0
  293. package/src/search/searcher.test.ts +280 -0
  294. package/src/search/searcher.ts +724 -0
  295. package/src/search/wink-bm25.d.ts +30 -0
  296. package/src/summarization/cli-providers/claude.ts +202 -0
  297. package/src/summarization/cli-providers/detection.test.ts +273 -0
  298. package/src/summarization/cli-providers/detection.ts +118 -0
  299. package/src/summarization/cli-providers/index.ts +8 -0
  300. package/src/summarization/cost.test.ts +139 -0
  301. package/src/summarization/cost.ts +102 -0
  302. package/src/summarization/error-handler.test.ts +127 -0
  303. package/src/summarization/error-handler.ts +111 -0
  304. package/src/summarization/index.ts +102 -0
  305. package/src/summarization/pipeline.test.ts +498 -0
  306. package/src/summarization/pipeline.ts +231 -0
  307. package/src/summarization/prompts.test.ts +269 -0
  308. package/src/summarization/prompts.ts +133 -0
  309. package/src/summarization/provider-factory.test.ts +396 -0
  310. package/src/summarization/provider-factory.ts +178 -0
  311. package/src/summarization/types.ts +184 -0
  312. package/src/summarize/budget-bugs.test.ts +620 -0
  313. package/src/summarize/formatters.ts +419 -0
  314. package/src/summarize/index.ts +20 -0
  315. package/src/summarize/summarizer.test.ts +275 -0
  316. package/src/summarize/summarizer.ts +597 -0
  317. package/src/summarize/verify-bugs.test.ts +238 -0
  318. package/src/types/huggingface-transformers.d.ts +66 -0
  319. package/src/utils/index.ts +1 -0
  320. package/src/utils/tokens.test.ts +142 -0
  321. package/src/utils/tokens.ts +186 -0
  322. package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
  323. package/tests/fixtures/cli/.mdcontext/config.json +8 -0
  324. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  325. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  326. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +33 -0
  327. package/tests/fixtures/cli/.mdcontext/indexes/links.json +12 -0
  328. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +247 -0
  329. package/tests/fixtures/cli/README.md +9 -0
  330. package/tests/fixtures/cli/api-reference.md +11 -0
  331. package/tests/fixtures/cli/getting-started.md +11 -0
  332. package/tests/integration/embed-index.test.ts +712 -0
  333. package/tests/integration/search-context.test.ts +469 -0
  334. package/tests/integration/search-semantic.test.ts +522 -0
  335. package/tsconfig.json +26 -0
  336. package/vitest.config.ts +16 -0
  337. package/vitest.setup.ts +12 -0
@@ -0,0 +1,171 @@
# Multi-Word Semantic Search Failure Reproduction

## Executive Summary

After systematic testing, the reported "multi-word semantic search failure" turns out to be **NOT a failure of semantic search itself**, but a **threshold calibration issue**. The key findings are:

1. **Single-word queries have low similarity scores** (roughly 30-49%), while multi-word queries have higher scores (50-70%)
2. **The default threshold of 0.5** filters out both single-word queries AND semantically distant multi-word queries
3. **Queries with abstract, non-domain-specific terms** (e.g., "gaps missing omissions", "issue challenge gap") score below the threshold
4. **Domain-specific multi-word queries work well** (e.g., "failure automation" = 61%, "process orchestration" = 68%)

## Test Methodology

### Test Corpus

Created a controlled test corpus in `src/__tests__/fixtures/semantic-search/multi-word-corpus/` with 6 markdown files covering:
- failure-automation.md - Failure detection and automated recovery
- job-context.md - Job execution context and metadata
- error-handling.md - Error handling patterns
- configuration-management.md - Config management practices
- distributed-systems.md - Distributed systems architecture
- process-orchestration.md - Workflow orchestration patterns

**Corpus Statistics:**
- 6 documents
- 67 sections
- 52 embedded vectors
- ~4,725 tokens

### Index Command

```bash
node dist/cli/main.js index src/__tests__/fixtures/semantic-search/multi-word-corpus --embed --force
```

## Test Results

### Multi-Word Domain-Specific Queries (DEFAULT THRESHOLD 0.5)

| Query | Results | Top Match | Top Score |
|-------|---------|-----------|-----------|
| "failure automation" | 7 | failure-automation.md: Best Practices | 61.6% |
| "job context" | 4 | job-context.md: What is Job Context? | 60.4% |
| "error handling" | 7 | error-handling.md: Introduction | 63.7% |
| "configuration management" | 8 | configuration-management.md: Overview | 69.5% |
| "distributed systems" | 4 | distributed-systems.md: What Are... | 61.0% |
| "process orchestration" | 8 | process-orchestration.md: Introduction | 68.0% |

**Finding:** Multi-word queries with domain-specific terms **WORK WELL** with the default threshold.

### Single-Word Queries (DEFAULT THRESHOLD 0.5)

| Query | Results | Notes |
|-------|---------|-------|
| "failure" | 0 | Below 0.5 threshold |
| "automation" | 0 | Below 0.5 threshold |
| "context" | 0 | Below 0.5 threshold |
| "error" | 0 | Below 0.5 threshold |

### Single-Word Queries (THRESHOLD 0.3)

| Query | Results | Top Match | Top Score |
|-------|---------|-----------|-----------|
| "failure" | 10 | failure-automation.md: Failure Isolation | 39.1% |
| "automation" | 10 | (similar) | ~35% |
| "error" | 10 | error-handling.md: Programming Errors | 49.1% |

**Finding:** Single-word queries have inherently **LOW similarity scores** (30-49%). Likely contributing factors (hypotheses; these tests do not isolate them):
1. A short query embedding carries little semantic context
2. The embedding model produces less distinctive vectors for single words
3. Similarity between the embedding of a one-word query and the embedding of a long, multi-topic section is compressed, because the section vector blends many concepts

### Abstract/Generic Multi-Word Queries (DEFAULT THRESHOLD 0.5)

| Query | Results | Notes |
|-------|---------|-------|
| "issue challenge gap" | 0 | Abstract terms, no domain match |
| "gaps missing omissions" | 0 | Meta-language about content, not content itself |

### Abstract Queries (THRESHOLD 0.3)

| Query | Results | Top Match | Top Score |
|-------|---------|-----------|-----------|
| "issue challenge gap" | 10 | distributed-systems.md: Consistency vs Availability | 40.8% |
| "gaps missing omissions" | 3 | error-handling.md: Programming Errors | 35.0% |

**Finding:** Abstract/meta-language queries top out at **roughly 35-41%**: below the default threshold, but findable with a lower one.

### Hybrid Search Results

| Query | Hybrid Results | Primary Source |
|-------|---------------|----------------|
| "failure automation" | 7 | Semantic (RRF ~1.6) |
| "job context" | 4 | Semantic (RRF ~1.6) |

**Finding:** Hybrid search successfully combines semantic and keyword results, but the semantic component still uses the threshold filter.

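Reciprocal Rank Fusion (RRF) merges the semantic and keyword rankings by rank position rather than by raw score. The sketch below is a generic RRF, not mdcontext's implementation: the `k = 60` constant, the function name, and the section IDs are assumptions. Standard RRF scores stay well below 1 (at most 2/61 for two lists with k = 60), so the ~1.6 values in the table suggest mdcontext scales or weights its scores differently. The sketch also shows why the threshold still matters: a section dropped from the semantic list before fusion can only contribute through the keyword list.

```typescript
// Generic Reciprocal Rank Fusion sketch. Assumption: k = 60 (a common
// default); mdcontext's constant, weights, and score scaling may differ.
type Ranked = { id: string }

function reciprocalRankFusion(lists: Ranked[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>()
  for (const list of lists) {
    list.forEach((item, index) => {
      const rank = index + 1 // ranks are 1-based
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank))
    })
  }
  return scores
}

// Hypothetical section IDs: one semantic ranking (already threshold-filtered)
// and one keyword ranking.
const semanticRanking: Ranked[] = [{ id: 'best-practices' }, { id: 'overview' }]
const keywordRanking: Ranked[] = [{ id: 'overview' }, { id: 'failure-isolation' }]

const fused = Array.from(reciprocalRankFusion([semanticRanking, keywordRanking]).entries())
  .sort((a, b) => b[1] - a[1])
console.log(fused.map(([id]) => id))
```

Here `overview` ranks first because it appears in both lists, even though it is only second in the semantic ranking.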
## Pattern Analysis

### What Works (>50% similarity)
- Multi-word queries with **domain-specific terms** directly present in content
- Queries that form **coherent concepts** (e.g., "process orchestration")
- Queries that match **document titles or major headings**

### What Fails at Default Threshold
- **Single words** - all score 30-49%
- **Abstract meta-language** - "gaps", "issues", "challenges" without domain context
- **Non-domain queries** searching indexed domain content
- **Very short queries** (1-2 generic words)

### Similarity Score Distribution

```
70%+  : Document title/heading exact concept matches
60-70%: Multi-word domain queries matching content topics
50-60%: Multi-word queries with partial concept overlap
40-50%: Single words or abstract queries with some relevance
30-40%: Tangentially related content
<30%  : Unrelated content (correctly filtered)
```

## Dogfooding Context

The dogfooding agents reported semantic search as "unreliable for multi-word conceptual queries". Re-analysis shows:

1. **No embeddings were built** during dogfooding (only the keyword index existed)
2. Semantic search was therefore **unavailable**, and searches fell back to keyword search
3. Multi-word **keyword** searches like "failure automation" worked
4. Multi-word keyword searches written as **quoted phrases** returned 0 results, because a quoted phrase requires an exact text match
5. Abstract queries like "gaps missing omissions" correctly returned 0 results (the phrase does not occur in the content)

The actual issues were:
- **Semantic search unavailable** (no embeddings)
- **Keyword phrase search** misunderstood (quoted = exact match)
- **Abstract conceptual queries** cannot match concrete content via keyword search

## Recommendations

### For ALP-204 (Embedding Text Analysis)
- Analyze how `generateEmbeddingText()` combines section context
- Check whether heading + parent + content provides enough semantic signal for short queries

### For ALP-205 (Query Processing)
- Query text is passed directly to the embedding provider, with no preprocessing
- Consider query expansion for short queries

### For ALP-206 (Vector Search Parameters)
- The default threshold of 0.5 is **too high** for single-word queries
- Consider adaptive thresholds based on query length
- Consider retrieving the top-K results regardless of threshold, then filtering

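The two ALP-206 ideas combine naturally: take the top-K candidates by similarity, then apply a threshold that is lower for one-word queries. This is a sketch under assumptions: `ScoredSection`, `adaptiveThreshold`, `selectResults`, and `topK = 10` are hypothetical names and defaults, and the 0.3 / 0.5 cutoffs are simply the values tested in this report. An explicitly supplied threshold (for example from `--threshold`) always takes precedence.

```typescript
// Hypothetical adaptive-threshold selection; names and cutoffs are assumptions.
interface ScoredSection {
  id: string
  similarity: number // cosine similarity score, e.g. 0.391 for 39.1%
}

function wordCount(query: string): number {
  return query.trim().split(/\s+/).filter(Boolean).length
}

// An explicit threshold always wins; otherwise 1-word queries get 0.3.
function adaptiveThreshold(query: string, explicit?: number): number {
  if (explicit !== undefined) return explicit
  return wordCount(query) <= 1 ? 0.3 : 0.5
}

// Rank all candidates, keep the top K, then drop those below the threshold.
function selectResults(
  query: string,
  candidates: ScoredSection[],
  topK = 10,
  explicit?: number,
): ScoredSection[] {
  const threshold = adaptiveThreshold(query, explicit)
  return candidates
    .slice()
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
    .filter((c) => c.similarity >= threshold)
}
```

With this sketch, a single-word query such as "failure" (top score 39.1% in the tests above) would return results at the 0.3 cutoff, while "failure automation" would still be filtered at 0.5.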
### For ALP-207 (Solution Design)
Key solutions to consider:
1. **Adaptive threshold** - lower for short queries
2. **Query expansion** - augment short queries with context
3. **Better user feedback** - show an "X results below threshold" message
4. **Threshold documentation** - educate users about the `--threshold` flag

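For the "better user feedback" item, a search can count how many candidates fell below the threshold and report that number instead of a bare empty result. A hypothetical sketch: `Hit`, `partitionByThreshold`, and the message wording are assumptions, not existing mdcontext output.

```typescript
// Hypothetical helper: split hits at the threshold and build a user-facing hint.
interface Hit {
  id: string
  similarity: number
}

interface Partitioned {
  kept: Hit[]
  belowCount: number
  hint?: string
}

function partitionByThreshold(hits: Hit[], threshold: number): Partitioned {
  const kept = hits.filter((h) => h.similarity >= threshold)
  const belowCount = hits.length - kept.length
  if (belowCount === 0) return { kept, belowCount }
  // Only suggest lowering --threshold when nothing survived the filter.
  const suggestion = kept.length === 0 ? '; try a lower --threshold' : ''
  return { kept, belowCount, hint: `${belowCount} result(s) below threshold ${threshold}${suggestion}` }
}
```

For the single-word "failure" query at the default 0.5 threshold, this would turn an empty result into a message saying that matches exist but score lower.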
## Conclusion

Multi-word semantic search **is working correctly** for domain-specific queries. The perceived "failure" is a combination of:
1. No embeddings in the dogfooding environment
2. A threshold too high for short/abstract queries
3. Confusion between keyword phrase search and semantic search
4. Users expecting semantic search to understand meta-language about content

The fix is NOT to change the semantic search algorithm, but to:
1. Calibrate the default threshold appropriately
2. Add query-length-aware threshold adjustment
3. Improve error messages when no results are found
4. Consider hybrid search as the default mode
@@ -0,0 +1,207 @@
# Query Processing Analysis

## Executive Summary

Query processing is **minimal and appropriate**. The query text is passed directly to the embedding API without modification. This is correct behavior for OpenAI's text-embedding-3-small model, which accepts raw natural-language text and does not require client-side normalization.

The asymmetry between the query format (plain text) and the document format (text with metadata) does NOT cause issues: embedding models are designed for this asymmetric retrieval pattern.

## Query Flow

```
User Input
    │
    ▼
CLI Parser (search.ts)
    │ query string unchanged
    ▼
semanticSearch(rootPath, query, options)
    │ query string unchanged
    ▼
provider.embed([query])
    │ passed directly to API
    ▼
OpenAI Embeddings API
    │ returns 512-dimensional vector
    ▼
Vector Store search()
    │ cosine similarity comparison
    ▼
Results filtered by threshold
```

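The flow above can be sketched end to end in a few lines. This is an illustration under assumptions, not mdcontext code: `searchSketch`, `IndexedSection`, and the injected `embed` function (standing in for `provider.embed`) are hypothetical, and the real vector store may use an approximate index rather than the linear scan shown here. The property it demonstrates is that the query string reaches the embedder unchanged and the threshold is the only filter applied afterwards.

```typescript
// Hypothetical end-to-end sketch: embed the raw query, score every section by
// cosine similarity, keep those at or above the threshold, best first.
type Vector = number[]

function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector yields NaN here, which the threshold filter below drops.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

interface IndexedSection {
  id: string
  vector: Vector
}

async function searchSketch(
  query: string,
  embed: (texts: string[]) => Promise<Vector[]>,
  sections: IndexedSection[],
  threshold: number,
) {
  // The query string is passed to the embedder unchanged.
  const [queryVector] = await embed([query])
  return sections
    .map((s) => ({ id: s.id, similarity: cosineSimilarity(queryVector, s.vector) }))
    .filter((r) => r.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
}
```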
## Code Trace

### Entry Point: CLI

```typescript
// src/cli/commands/search.ts:53-55
query: Args.text({ name: 'query' }).pipe(
  Args.withDescription('Search query (natural language or regex pattern)'),
),
```

The query enters as a raw text string, with no preprocessing.

### Search Mode Detection

```typescript
// src/cli/commands/search.ts:201-206
} else if (isAdvancedQuery(query)) {
  effectiveMode = 'keyword'
  modeReason = 'boolean/phrase pattern detected'
} else if (isRegexPattern(query)) {
  effectiveMode = 'keyword'
  modeReason = 'regex pattern detected'
}
```

Queries with boolean operators (AND, OR, NOT) or quoted phrases are routed to keyword search, as are regex patterns. Plain multi-word queries go to semantic search.

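A rough sketch of this routing, plus a fallback to keyword search when no embeddings exist (where that check sits relative to the others is an assumption). `looksAdvanced`, `looksLikeRegex`, and `chooseMode` are hypothetical stand-ins: the real `isAdvancedQuery()` and `isRegexPattern()` heuristics are not shown in this excerpt and are likely more careful than these regular expressions.

```typescript
// Hypothetical approximations of the mode-detection heuristics.
type Mode = 'keyword' | 'semantic'

// Quoted phrases or uppercase boolean operators => advanced keyword query.
const looksAdvanced = (q: string): boolean =>
  /"[^"]+"/.test(q) || /\b(AND|OR|NOT)\b/.test(q)

// Crude check for regex metacharacters (deliberately excludes '?' and '.').
const looksLikeRegex = (q: string): boolean => /[\\^$*+()[\]{}|]/.test(q)

function chooseMode(query: string, hasEmbeddings: boolean): { mode: Mode; reason: string } {
  if (looksAdvanced(query)) return { mode: 'keyword', reason: 'boolean/phrase pattern detected' }
  if (looksLikeRegex(query)) return { mode: 'keyword', reason: 'regex pattern detected' }
  if (!hasEmbeddings) return { mode: 'keyword', reason: 'no embeddings available' }
  return { mode: 'semantic', reason: 'plain natural-language query' }
}
```

Under this sketch, `"failure automation"` with quotes goes to keyword search and is matched as an exact phrase, while the same words without quotes go to semantic search.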
### Semantic Search Function

```typescript
// src/embeddings/semantic-search.ts:558-559
// Embed the query
const queryResult = yield* wrapEmbedding(provider.embed([query]))
```

**No preprocessing**: the query is embedded exactly as received.

### Embedding API Call

```typescript
// src/embeddings/openai-provider.ts:175-179
const response = await this.client.embeddings.create({
  model: this.model,
  input: batch, // query text passed directly
  dimensions: 512,
})
```

Query text goes directly to the OpenAI API without modification.

## Query vs Document Format Asymmetry

### Document Embedding Format (from ALP-204)

```
# {heading}
Parent section: {parentHeading}
Document: {documentTitle}

{content}
```

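The document-side template above can be expressed as a small builder. This is a hypothetical sketch: `SectionInfo` and `buildEmbeddingText` are illustrative names (the codebase's own version is `generateEmbeddingText()`, referenced under ALP-204, and may differ), and omitting the `Parent section:` line for top-level sections is an assumption. The query side needs no builder at all, since it is embedded as the raw string.

```typescript
// Hypothetical builder for the document embedding format shown above.
interface SectionInfo {
  heading: string
  parentHeading?: string // assumed absent for top-level sections
  documentTitle: string
  content: string
}

function buildEmbeddingText(s: SectionInfo): string {
  const lines = [`# ${s.heading}`]
  if (s.parentHeading) lines.push(`Parent section: ${s.parentHeading}`)
  // Blank line separates the metadata header from the section body.
  lines.push(`Document: ${s.documentTitle}`, '', s.content)
  return lines.join('\n')
}
```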
### Query Format

```
{raw query text}
```

### Analysis

This asymmetry is **intentional and correct** for semantic search:

1. **Embedding models handle asymmetry**: OpenAI's text-embedding models are trained on diverse text formats and produce semantically meaningful vectors regardless of format.

2. **Query expansion is not needed**: The embedding model understands "failure automation" conceptually; it doesn't need to see the `# Failure Automation` format.

3. **Document context helps disambiguation**: The heading/document metadata in indexed content helps distinguish between sections with similar content but different contexts.

4. **Common practice**: Many RAG systems search plain queries against enriched documents.

114
+ ## Query Variation Tests
115
+
116
+ All variations clear the 0.5 default threshold, with top-result similarity between 61.6% and 70.3%:
117
+
118
+ | Query | Top Result | Similarity |
119
+ |-------|------------|------------|
120
+ | "failure automation" | Best Practices | 61.6% |
121
+ | "failure-automation" | Overview | 68.8% |
122
+ | "Failure Automation" | Best Practices | 65.6% |
123
+ | "automation for failures" | Overview | 70.3% |
124
+ | "how to automate failure handling" | Best Practices | 66.4% |
125
+
126
+ **Findings:**
127
+ - Casing doesn't significantly affect results
128
+ - Hyphenation changes the top result (Overview instead of Best Practices)
129
+ - Reordering the words ("automation for failures") changes the top result but does not break search
130
+ - Natural language queries work well
131
+
132
+ ## Threshold Analysis
133
+
134
+ ### Default Threshold Flow
135
+
136
+ ```
137
+ CLI default: 0.45
138
+       │
139
+       ▼ (if CLI uses default)
140
+ Config default: 0.5
141
+       │
142
+       ▼
143
+ Effective threshold: 0.5
144
+ ```
145
+
146
+ When the user does not pass `--threshold`, the CLI default (0.45) is discarded and the config default (0.5) applies, so the effective threshold is 0.5.
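The precedence in the flow above can be sketched as a small resolver. This is a minimal sketch, not the mdcontext implementation. `resolveThreshold` and `ThresholdInput` are hypothetical names. The sketch assumes the CLI layer can report whether `--threshold` was passed explicitly.

```typescript
// Hypothetical sketch of threshold precedence; names are illustrative.
const CLI_DEFAULT_THRESHOLD = 0.45
const CONFIG_DEFAULT_THRESHOLD = 0.5

interface ThresholdInput {
  cliValue: number // parsed flag value; CLI_DEFAULT_THRESHOLD when the flag is absent
  cliExplicit: boolean // assumption: true only if the user actually passed --threshold
  configValue?: number // value from the config file, if one is set
}

function resolveThreshold({ cliValue, cliExplicit, configValue }: ThresholdInput): number {
  // An explicit flag always wins
  if (cliExplicit) return cliValue
  // Otherwise the CLI default is discarded and the config value (or its default) applies
  return configValue ?? CONFIG_DEFAULT_THRESHOLD
}

console.log(resolveThreshold({ cliValue: CLI_DEFAULT_THRESHOLD, cliExplicit: false })) // 0.5
console.log(resolveThreshold({ cliValue: 0.3, cliExplicit: true })) // 0.3
```

Tracking "explicitly passed" separately from the value avoids a subtle trap. A check like `cliValue === 0.45` could not tell a user who typed `--threshold 0.45` apart from one who typed nothing.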
147
+
148
+ ### Threshold Impact
149
+
150
+ | Threshold | Single-word "failure" | Multi-word "failure automation" |
151
+ |-----------|----------------------|--------------------------------|
152
+ | 0.5 | 0 results | 7 results |
153
+ | 0.3 | 10 results | 7+ results |
154
+ | 0.1 | 10 results | 7+ results |
155
+
156
+ At the 0.5 default, every match for the single-word query is filtered out, while the multi-word query still returns 7 results.
157
+
158
+ ## Potential Query Enhancements (for ALP-207)
159
+
160
+ While current processing is correct, potential improvements could include:
161
+
162
+ ### 1. Query Expansion for Short Queries
163
+
164
+ ```typescript
165
+ // Hypothetical enhancement: wrap one- or two-word queries in a descriptive template
166
+ const expandQuery = (query: string): string => query.trim().split(/\s+/).length <= 2
167
+ ? `Find content about: ${query}`
168
+ : query
169
+ ```
170
+
171
+ ### 2. Adaptive Threshold
172
+
173
+ ```typescript
174
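+ // Lower the default threshold for single-word queries; an explicit --threshold still takes precedence
175
+ const adaptiveThreshold = (query: string, userThreshold?: number): number => userThreshold ?? (query.trim().split(/\s+/).length <= 1
176
+ ? 0.3
177
+ : 0.5)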
178
+ ```
179
+
180
+ ### 3. Hybrid by Default
181
+
182
+ Short queries might benefit from hybrid mode being the default, leveraging both keyword and semantic signals.
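Reciprocal rank fusion (RRF) is one widely used way to combine keyword and semantic rankings. The sketch below illustrates that technique under one assumption: each mode returns an ordered list of section IDs. It is not necessarily how mdcontext's hybrid mode scores results. The section IDs are hypothetical.

```typescript
// Illustrative reciprocal rank fusion: a section ranked well by either mode rises;
// a section ranked by both modes rises further. Not mdcontext's actual scoring.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based; the constant k damps the advantage of the very top positions
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1))
    })
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
}

// Hypothetical section IDs from each mode, best first
const keywordHits = ['error-handling.md#retries', 'failure-automation.md#overview']
const semanticHits = ['failure-automation.md#overview', 'failure-automation.md#best-practices']
console.log(reciprocalRankFusion([keywordHits, semanticHits])[0].id) // failure-automation.md#overview
```

RRF uses only rank positions, so it sidesteps the problem that keyword scores and cosine similarities are on different scales.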
183
+
184
+ ## Recommendations
185
+
186
+ ### No Changes Needed to Query Processing
187
+
188
+ The current implementation is correct. The query flow is:
189
+ - Clean (no unnecessary transformations)
190
+ - Transparent (what you type is what gets embedded)
191
+ - Flexible (users can adjust with --threshold)
192
+
193
+ ### Focus Areas for ALP-207
194
+
195
+ 1. **Threshold tuning** - Consider lowering default to 0.4 or making it adaptive
196
+ 2. **Better feedback** - Show "X results below threshold" when 0 results
197
+ 3. **Documentation** - Explain threshold behavior in help text
198
+ 4. **Hybrid default** - Consider hybrid mode as default for better coverage
199
+
200
+ ## Conclusion
201
+
202
+ Query processing is implemented correctly. The perceived "multi-word query failures" are actually threshold calibration issues, not query processing bugs. The search correctly:
203
+
204
+ 1. Passes queries unchanged to embedding API (correct)
205
+ 2. Uses asymmetric retrieval (query vs enriched documents) (correct)
206
+ 3. Handles query variations semantically (working)
207
+ 4. Applies configurable threshold (working, but may need tuning)
@@ -0,0 +1,114 @@
1
+ # Root Cause Analysis and Solution Design
2
+
3
+ ## Executive Summary
4
+
5
+ **Root Cause**: The "multi-word semantic search failure" is a **threshold calibration issue**, not a search algorithm bug.
6
+
7
+ **Key Findings**:
8
+ 1. Multi-word domain queries work correctly (roughly 53-70% similarity)
9
+ 2. Single-word queries score lower (roughly 31-49%) due to embedding model properties
10
+ 3. Default 0.5 threshold filters out short/abstract queries
11
+ 4. The dogfooding run had no embeddings built, so agents fell back to keyword search
12
+ 5. Embedding text format, query processing, and HNSW config are all correct
13
+
14
+ **Solution**: Lower default threshold + improve user feedback for edge cases.
15
+
16
+ ## Synthesis of Diagnostic Findings
17
+
18
+ ### ALP-203: Reproduction Results
19
+
20
+ | Query Type | Works at 0.5? | Score Range |
21
+ |------------|---------------|-------------|
22
+ | "failure automation" | YES | 54-62% |
23
+ | "error handling" | YES | 53-64% |
24
+ | "failure" (single) | NO | 31-39% |
25
+ | "error" (single) | NO | 32-49% |
26
+ | "gaps missing omissions" | NO | 30-35% |
27
+
28
+ **Conclusion**: Multi-word domain queries work. Short/abstract queries fail threshold.
29
+
30
+ ### ALP-204: Embedding Text Analysis
31
+
32
+ - Format is correct: `# heading\nParent section: X\nDocument: Y\n\ncontent`
33
+ - Follows industry best practices
34
+ - No issues identified
35
+
36
+ ### ALP-205: Query Processing Analysis
37
+
38
+ - Query passed unchanged to embedding API (correct)
39
+ - Asymmetric retrieval (plain query vs enriched docs) is normal
40
+ - Query variations all work correctly
41
+
42
+ ### ALP-206: Vector Search Analysis
43
+
44
+ - HNSW parameters (M=16, efConstruction=200, efSearch=100) are optimal
45
+ - Cosine distance correct for text embeddings
46
+ - Threshold filtering is the only issue
47
+
48
+ ## Root Cause
49
+
50
+ **Primary Cause**: The default similarity threshold (0.5) is too high for:
51
+ 1. Single-word queries (max ~49% similarity due to embedding model properties)
52
+ 2. Abstract/meta-language queries
53
+ 3. Non-domain-specific queries
54
+
55
+ **NOT the cause**:
56
+ - Embedding text format (correct)
57
+ - Query processing (correct)
58
+ - HNSW parameters (optimal)
59
+ - Embedding model (working as expected)
60
+
61
+ **Contributing Factor**: Dogfooding lacked embeddings, causing confusion about what was failing.
62
+
63
+ ## Solution Design
64
+
65
+ ### Recommended Approach: Threshold Tuning + UX Improvements
66
+
67
+ #### 1. Lower Default Threshold to 0.35
68
+
69
+ ```typescript
70
+ // src/config/schema.ts
71
+ minSimilarity: Config.number('minSimilarity').pipe(Config.withDefault(0.35))
72
+ ```
73
+
74
+ **Rationale**:
75
+ - Captures most single-word results, which score roughly 31-49%
76
+ - Still filters weak matches scoring below 35%, including irrelevant content (<30%)
77
+ - Low risk: users can still adjust it with `--threshold`
78
+
79
+ #### 2. Add "Below Threshold" Feedback
80
+
81
+ When a search returns 0 results, show a hint about results that scored below the threshold:
82
+
83
+ ```
84
+ Results: 0
85
+
86
+ Note: 10 results found below 0.35 threshold (highest: 0.34)
87
+ Tip: Use --threshold 0.3 to see more results
88
+ ```
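A function like the following could produce the hint above. `formatBelowThresholdHint` is a hypothetical name. The suggested value is also an assumption: the highest below-threshold score rounded down to one decimal place. That heuristic happens to reproduce the example (0.34 → 0.3).

```typescript
// Hypothetical sketch of the zero-result hint; the suggestion heuristic is an assumption.
function formatBelowThresholdHint(threshold: number, belowCount: number, highest: number): string {
  // Nothing scored below the threshold either, so there is nothing to suggest
  if (belowCount === 0) return 'Results: 0'
  // Round the best below-threshold score down to one decimal place
  const suggested = Math.floor(highest * 10) / 10
  return [
    'Results: 0',
    '',
    `Note: ${belowCount} results found below ${threshold} threshold (highest: ${highest.toFixed(2)})`,
    `Tip: Use --threshold ${suggested.toFixed(1)} to see more results`,
  ].join('\n')
}

console.log(formatBelowThresholdHint(0.35, 10, 0.34))
```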
89
+
90
+ #### 3. Consider Hybrid Search as Default
91
+
92
+ For queries without boolean operators, hybrid mode provides better coverage by combining semantic and keyword signals.
93
+
94
+ ## Implementation Plan for Phase 2
95
+
96
+ 1. **Lower default threshold** - Change config default from 0.5 to 0.35
97
+ 2. **Add below-threshold feedback** - Show hint when 0 results
98
+ 3. **Document threshold behavior** - Update README/help
99
+ 4. **Validate changes** - Re-run test corpus
100
+
101
+ ## Expected Outcomes
102
+
103
+ | Metric | Before | After |
104
+ |--------|--------|-------|
105
+ | Single-word results at default | 0 | 10+ |
106
+ | Multi-word results | 7+ | 7+ (unchanged) |
107
+
108
+ ## Conclusion
109
+
110
+ The "multi-word semantic search failure" was misidentified. Multi-word queries work correctly. The issue is threshold calibration affecting single-word and abstract queries.
111
+
112
+ **Recommended Solution**: Lower threshold to 0.35, add user feedback, improve documentation.
113
+
114
+ **No algorithmic changes needed** to embedding generation, query processing, or vector search.
@@ -0,0 +1,69 @@
1
+ # Threshold Validation Report
2
+
3
+ ## Summary
4
+
5
+ Validation confirms that lowering the default similarity threshold from 0.5 to 0.35 (ALP-208) **fixes single-word query failures** without regressing multi-word query performance.
6
+
7
+ ## Test Environment
8
+
9
+ - **Test Corpus**: `src/__tests__/fixtures/semantic-search/multi-word-corpus/`
10
+ - **Documents**: 6 markdown files (failure-automation, job-context, error-handling, configuration-management, distributed-systems, process-orchestration)
11
+ - **Sections**: 52 embedded vectors
12
+ - **Date**: 2026-01-26
13
+
14
+ ## Before/After Comparison
15
+
16
+ ### Single-Word Queries
17
+
18
+ | Query | Before (0.5) | After (0.35) | Top Match | Top Score |
19
+ |-------|-------------|--------------|-----------|-----------|
20
+ | "failure" | 0 results | **6 results** | failure-automation.md: Failure Isolation | 39.0% |
21
+ | "error" | 0 results | **7 results** | error-handling.md: Programming Errors | 49.1% |
22
+ | "automation" | 0 results | **10 results** | failure-automation.md: Overview | 44.9% |
23
+ | "context" | 0 results | **10 results** | job-context.md: What is Job Context? | 48.1% |
24
+
25
+ **Improvement**: All 4 single-word test queries now return results (6-10 each) at the default threshold.
26
+
27
+ ### Multi-Word Queries (Regression Check)
28
+
29
+ | Query | Before (0.5) | After (0.35) | Top Match | Top Score |
30
+ |-------|-------------|--------------|-----------|-----------|
31
+ | "failure automation" | 7 results | 10 results | failure-automation.md: Best Practices | 61.5% |
32
+ | "job context" | 4 results | 7 results | job-context.md: What is Job Context? | 60.4% |
33
+ | "error handling" | 7 results | 10 results | error-handling.md: Introduction | 63.6% |
34
+ | "configuration management" | 8 results | 10 results | configuration-management.md: Overview | 69.5% |
35
+ | "distributed systems" | 4 results | 10 results | distributed-systems.md: What Are... | 60.9% |
36
+ | "process orchestration" | 8 results | 10 results | process-orchestration.md: Introduction | 67.9% |
37
+
38
+ **Finding**: No regression. Multi-word queries actually return MORE results (expected, since threshold is lower), with the same top matches and scores.
39
+
40
+ ## Success Criteria Validation
41
+
42
+ - [x] **Single-word queries return results at default threshold** - All 4 test queries now return 6-10 results
43
+ - [x] **Multi-word queries work as before (no regression)** - All 6 queries return results with same top matches
44
+ - [x] **Quantitative improvement documented** - See tables above
45
+
46
+ ## Below-Threshold Feedback (ALP-209)
47
+
48
+ The new feedback feature correctly reports results below threshold:
49
+
50
+ ```json
51
+ {
52
+ "results": [...6 results...],
53
+ "belowThresholdCount": 14,
54
+ "belowThresholdHighest": 0.349
55
+ }
56
+ ```
57
+
58
+ This helps users understand that more content exists if they lower the threshold.
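The two feedback fields can be derived by partitioning scored candidates at the threshold. In this sketch, only `results`, `belowThresholdCount`, and `belowThresholdHighest` come from the report. `applyThreshold`, `Scored`, the default limit of 10, and the inclusive `>=` comparison are assumptions.

```typescript
// Hypothetical sketch of deriving results and below-threshold feedback.
interface Scored {
  id: string
  similarity: number // cosine similarity; higher is more similar
}

interface SearchOutput {
  results: Scored[]
  belowThresholdCount: number
  belowThresholdHighest: number | null
}

function applyThreshold(candidates: Scored[], threshold: number, limit = 10): SearchOutput {
  // Sort a copy in descending similarity so both partitions stay ordered
  const sorted = [...candidates].sort((a, b) => b.similarity - a.similarity)
  const above = sorted.filter((c) => c.similarity >= threshold)
  const below = sorted.filter((c) => c.similarity < threshold)
  return {
    results: above.slice(0, limit),
    belowThresholdCount: below.length,
    // Sorted descending, so the first below-threshold candidate is the best of them
    belowThresholdHighest: below.length > 0 ? below[0].similarity : null,
  }
}
```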
59
+
60
+ ## Conclusion
61
+
62
+ The threshold change from 0.5 to 0.35 is validated as the correct fix:
63
+
64
+ 1. **Single-word queries now work** - Users can search for concepts like "failure", "error", "context"
65
+ 2. **Multi-word queries unaffected** - High-quality results with same top matches
66
+ 3. **User guidance in place** - Documentation (ALP-210) explains threshold behavior
67
+ 4. **Below-threshold feedback** - Users see when lowering threshold would help
68
+
69
+ The root cause identified in ALP-207 (threshold too high for short queries scoring 30-40%) is confirmed fixed.
@@ -0,0 +1,63 @@
1
+ # Vector Search Parameters and Scoring Analysis
2
+
3
+ ## Executive Summary
4
+
5
+ The HNSW vector search configuration is **appropriate and well-tuned**. The root cause of "0 results" is **NOT the vector search algorithm**, but the **similarity threshold filtering** applied after search.
6
+
7
+ Key finding: Single-word queries score lower (31-49%) than multi-word queries (50-70%). The default 0.5 threshold filters out all single-word results.
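Given a non-empty index, a k-NN search always returns its nearest neighbours. A query can therefore end up with 0 results only through the post-search threshold filter. Below is a minimal sketch of that filter. It assumes the store uses hnswlib's `cosine` space, where the reported distance is 1 minus cosine similarity, and converts distances back to similarities before filtering. The function names are illustrative.

```typescript
// Minimal sketch, assuming hnswlib-style cosine distance (1 - cosine similarity).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

const cosineDistance = (a: number[], b: number[]): number => 1 - cosineSimilarity(a, b)

// Convert k-NN distances back to similarities, then drop those under the threshold
function filterByThreshold(distances: number[], threshold: number): number[] {
  return distances.map((d) => 1 - d).filter((s) => s >= threshold)
}

// Toy 2-D vectors: the search returns both neighbours, and only the filter drops one
const query = [1, 0]
const docs = [[0.4, 0.9165], [0.6, 0.8]]
const distances = docs.map((d) => cosineDistance(query, d))
console.log(filterByThreshold(distances, 0.5).length) // 1
```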
8
+
9
+ ## HNSW Configuration
10
+
11
+ ### Current Parameters
12
+
13
+ From `src/embeddings/vector-store.ts:98`:
14
+
15
+ ```typescript
16
+ this.index.initIndex(10000, 16, 200, 100)
17
+ // maxElements, M, efConstruction, 4th arg (hnswlib-node names it randomSeed; efSearch is set via setEf)
18
+ ```
19
+
20
+ | Parameter | Value | Description | Assessment |
21
+ |-----------|-------|-------------|------------|
22
+ | maxElements | 10,000 | Initial capacity (auto-resizes) | Adequate |
23
+ | M | 16 | Max connections per node | Good balance |
24
+ | efConstruction | 200 | Construction-time search width | High quality |
25
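+ | 4th `initIndex` argument | 100 | Read as efSearch here, but in hnswlib-node this position is `randomSeed`; query-time ef is set with `setEf()` | Verify efSearch is actually set |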
26
+
27
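+ M and efConstruction are well-tuned. If the index is hnswlib-node, confirm that efSearch is set separately with `setEf()`; on a small test corpus, a lower ef is unlikely to change the results.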
28
+
29
+ ## Similarity Score Analysis
30
+
31
+ ### Threshold Experiment
32
+
33
+ Testing "failure" at different thresholds:
34
+
35
+ | Threshold | Results | Top Score |
36
+ |-----------|---------|-----------|
37
+ | 0.0 | 10 | 39.1% |
38
+ | 0.3 | 10 | 39.1% |
39
+ | 0.4 | 0 | - |
40
+ | 0.5 | 0 | - |
41
+
42
+ ### Score Distribution by Query Type
43
+
44
+ | Query Type | Score Range | Results at 0.5 |
45
+ |------------|-------------|----------------|
46
+ | Single word | 31-49% | 0 |
47
+ | Two-word domain | 54-70% | 7+ |
48
+ | Natural language | 50-66% | 9 |
49
+
50
+ ## Root Cause
51
+
52
+ The 0.5 default threshold filters out single-word results (max ~49%). This is threshold calibration, not a search algorithm issue.
53
+
54
+ ## Recommendations for ALP-207
55
+
56
+ 1. Lower default threshold to 0.3-0.4
57
+ 2. Consider adaptive threshold by query length
58
+ 3. Show "N results below threshold" message
59
+ 4. Make threshold more visible in docs
60
+
61
+ ## Conclusion
62
+
63
+ Vector search works correctly. Focus ALP-207 on threshold tuning, not algorithmic changes.