mdcontext 0.1.0 → 0.2.0

Files changed (251)
  1. package/.changeset/config.json +9 -9
  2. package/.claude/settings.local.json +25 -0
  3. package/.github/workflows/claude-code-review.yml +44 -0
  4. package/.github/workflows/claude.yml +85 -0
  5. package/CONTRIBUTING.md +186 -0
  6. package/NOTES/NOTES +44 -0
  7. package/README.md +206 -3
  8. package/biome.json +1 -1
  9. package/dist/chunk-23UPXDNL.js +3044 -0
  10. package/dist/chunk-2W7MO2DL.js +1366 -0
  11. package/dist/chunk-3NUAZGMA.js +1689 -0
  12. package/dist/chunk-7TOWB2XB.js +366 -0
  13. package/dist/chunk-7XOTOADQ.js +3065 -0
  14. package/dist/chunk-AH2PDM2K.js +3042 -0
  15. package/dist/chunk-BNXWSZ63.js +3742 -0
  16. package/dist/chunk-BTL5DJVU.js +3222 -0
  17. package/dist/chunk-HDHYG7E4.js +104 -0
  18. package/dist/chunk-HLR4KZBP.js +3234 -0
  19. package/dist/chunk-IP3FRFEB.js +1045 -0
  20. package/dist/chunk-KHU56VDO.js +3042 -0
  21. package/dist/chunk-KRYIFLQR.js +85 -89
  22. package/dist/chunk-LBSDNLEM.js +287 -0
  23. package/dist/chunk-MNTQ7HCP.js +2643 -0
  24. package/dist/chunk-MUJELQQ6.js +1387 -0
  25. package/dist/chunk-MXJGMSLV.js +2199 -0
  26. package/dist/chunk-N6QJGC3Z.js +2636 -0
  27. package/dist/chunk-OBELGBPM.js +1713 -0
  28. package/dist/chunk-OT7R5XTA.js +3192 -0
  29. package/dist/chunk-P7X4RA2T.js +106 -0
  30. package/dist/chunk-PIDUQNC2.js +3185 -0
  31. package/dist/chunk-POGCDIH4.js +3187 -0
  32. package/dist/chunk-PSIEOQGZ.js +3043 -0
  33. package/dist/chunk-PVRT3IHA.js +3238 -0
  34. package/dist/chunk-QNN4TT23.js +1430 -0
  35. package/dist/chunk-RE3R45RJ.js +3042 -0
  36. package/dist/chunk-S7E6TFX6.js +718 -657
  37. package/dist/chunk-SG6GLU4U.js +1378 -0
  38. package/dist/chunk-SJCDV2ST.js +274 -0
  39. package/dist/chunk-SYE5XLF3.js +104 -0
  40. package/dist/chunk-T5VLYBZD.js +103 -0
  41. package/dist/chunk-TOQB7VWU.js +3238 -0
  42. package/dist/chunk-VFNMZ4ZQ.js +3228 -0
  43. package/dist/chunk-VVTGZNBT.js +1533 -1423
  44. package/dist/chunk-W7Q4RFEV.js +104 -0
  45. package/dist/chunk-XTYYVRLO.js +3190 -0
  46. package/dist/chunk-Y6MDYVJD.js +3063 -0
  47. package/dist/cli/main.js +4072 -629
  48. package/dist/index.d.ts +420 -33
  49. package/dist/index.js +8 -15
  50. package/dist/mcp/server.js +103 -7
  51. package/dist/schema-BAWSG7KY.js +22 -0
  52. package/dist/schema-E3QUPL26.js +20 -0
  53. package/dist/schema-EHL7WUT6.js +20 -0
  54. package/docs/019-USAGE.md +44 -5
  55. package/docs/020-current-implementation.md +8 -8
  56. package/docs/021-DOGFOODING-FINDINGS.md +1 -1
  57. package/docs/CONFIG.md +1123 -0
  58. package/docs/ERRORS.md +383 -0
  59. package/docs/summarization.md +320 -0
  60. package/justfile +40 -0
  61. package/package.json +39 -33
  62. package/research/INDEX.md +315 -0
  63. package/research/code-review/README.md +90 -0
  64. package/research/code-review/cli-error-handling-review.md +979 -0
  65. package/research/code-review/code-review-validation-report.md +464 -0
  66. package/research/code-review/main-ts-review.md +1128 -0
  67. package/research/config-docs/SUMMARY.md +357 -0
  68. package/research/config-docs/TEST-RESULTS.md +776 -0
  69. package/research/config-docs/TODO.md +542 -0
  70. package/research/config-docs/analysis.md +744 -0
  71. package/research/config-docs/fix-validation.md +502 -0
  72. package/research/config-docs/help-audit.md +264 -0
  73. package/research/config-docs/help-system-analysis.md +890 -0
  74. package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
  75. package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
  76. package/research/issue-review.md +603 -0
  77. package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
  78. package/research/llm-summarization/alternative-providers-2026.md +1428 -0
  79. package/research/llm-summarization/anthropic-2026.md +367 -0
  80. package/research/llm-summarization/claude-cli-integration.md +1706 -0
  81. package/research/llm-summarization/cli-integration-patterns.md +3155 -0
  82. package/research/llm-summarization/openai-2026.md +473 -0
  83. package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
  84. package/research/llm-summarization/opencode-cli-integration.md +1552 -0
  85. package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
  86. package/research/llm-summarization/prototype-results.md +56 -0
  87. package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
  88. package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
  89. package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
  90. package/research/mdcontext-pudding/01-index-embed.md +956 -0
  91. package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
  92. package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
  93. package/research/mdcontext-pudding/02-search.md +970 -0
  94. package/research/mdcontext-pudding/03-context.md +779 -0
  95. package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
  96. package/research/mdcontext-pudding/04-tree.md +704 -0
  97. package/research/mdcontext-pudding/05-config.md +1038 -0
  98. package/research/mdcontext-pudding/06-links-summary.txt +87 -0
  99. package/research/mdcontext-pudding/06-links.md +679 -0
  100. package/research/mdcontext-pudding/07-stats.md +693 -0
  101. package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
  102. package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
  103. package/research/mdcontext-pudding/README.md +168 -0
  104. package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
  105. package/research/research-quality-review.md +834 -0
  106. package/research/semantic-search/embedding-text-analysis.md +156 -0
  107. package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
  108. package/research/semantic-search/query-processing-analysis.md +207 -0
  109. package/research/semantic-search/root-cause-and-solution.md +114 -0
  110. package/research/semantic-search/threshold-validation-report.md +69 -0
  111. package/research/semantic-search/vector-search-analysis.md +63 -0
  112. package/research/test-path-issues.md +276 -0
  113. package/review/ALP-76/1-error-type-design.md +962 -0
  114. package/review/ALP-76/2-error-handling-patterns.md +906 -0
  115. package/review/ALP-76/3-error-presentation.md +624 -0
  116. package/review/ALP-76/4-test-coverage.md +625 -0
  117. package/review/ALP-76/5-migration-completeness.md +440 -0
  118. package/review/ALP-76/6-effect-best-practices.md +755 -0
  119. package/scripts/apply-branch-protection.sh +47 -0
  120. package/scripts/branch-protection-templates.json +79 -0
  121. package/scripts/prototype-summarization.ts +346 -0
  122. package/scripts/rebuild-hnswlib.js +32 -37
  123. package/scripts/setup-branch-protection.sh +64 -0
  124. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
  125. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
  126. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
  127. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
  128. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  129. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  130. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
  131. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
  132. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
  133. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
  134. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
  135. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
  136. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
  137. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
  138. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
  139. package/src/cli/argv-preprocessor.test.ts +2 -2
  140. package/src/cli/cli.test.ts +230 -33
  141. package/src/cli/commands/config-cmd.ts +642 -0
  142. package/src/cli/commands/context.ts +97 -9
  143. package/src/cli/commands/duplicates.ts +122 -0
  144. package/src/cli/commands/embeddings.ts +529 -0
  145. package/src/cli/commands/index-cmd.ts +210 -30
  146. package/src/cli/commands/index.ts +3 -0
  147. package/src/cli/commands/search.ts +894 -64
  148. package/src/cli/commands/stats.ts +3 -0
  149. package/src/cli/commands/tree.ts +26 -5
  150. package/src/cli/config-layer.ts +176 -0
  151. package/src/cli/error-handler.test.ts +235 -0
  152. package/src/cli/error-handler.ts +655 -0
  153. package/src/cli/flag-schemas.ts +66 -0
  154. package/src/cli/help.ts +209 -7
  155. package/src/cli/main.ts +348 -58
  156. package/src/cli/options.ts +10 -0
  157. package/src/cli/shared-error-handling.ts +199 -0
  158. package/src/cli/utils.ts +150 -17
  159. package/src/config/file-provider.test.ts +320 -0
  160. package/src/config/file-provider.ts +273 -0
  161. package/src/config/index.ts +72 -0
  162. package/src/config/integration.test.ts +667 -0
  163. package/src/config/precedence.test.ts +277 -0
  164. package/src/config/precedence.ts +451 -0
  165. package/src/config/schema.test.ts +414 -0
  166. package/src/config/schema.ts +603 -0
  167. package/src/config/service.test.ts +320 -0
  168. package/src/config/service.ts +243 -0
  169. package/src/config/testing.test.ts +264 -0
  170. package/src/config/testing.ts +110 -0
  171. package/src/core/types.ts +6 -33
  172. package/src/duplicates/detector.test.ts +183 -0
  173. package/src/duplicates/detector.ts +414 -0
  174. package/src/duplicates/index.ts +18 -0
  175. package/src/embeddings/embedding-namespace.test.ts +300 -0
  176. package/src/embeddings/embedding-namespace.ts +947 -0
  177. package/src/embeddings/heading-boost.test.ts +222 -0
  178. package/src/embeddings/hnsw-build-options.test.ts +198 -0
  179. package/src/embeddings/hyde.test.ts +272 -0
  180. package/src/embeddings/hyde.ts +264 -0
  181. package/src/embeddings/index.ts +2 -0
  182. package/src/embeddings/openai-provider.ts +332 -83
  183. package/src/embeddings/pricing.json +22 -0
  184. package/src/embeddings/provider-constants.ts +204 -0
  185. package/src/embeddings/provider-errors.test.ts +967 -0
  186. package/src/embeddings/provider-errors.ts +565 -0
  187. package/src/embeddings/provider-factory.test.ts +240 -0
  188. package/src/embeddings/provider-factory.ts +225 -0
  189. package/src/embeddings/provider-integration.test.ts +788 -0
  190. package/src/embeddings/query-preprocessing.test.ts +187 -0
  191. package/src/embeddings/semantic-search-threshold.test.ts +508 -0
  192. package/src/embeddings/semantic-search.ts +780 -93
  193. package/src/embeddings/types.ts +293 -16
  194. package/src/embeddings/vector-store.ts +486 -77
  195. package/src/embeddings/voyage-provider.ts +313 -0
  196. package/src/errors/errors.test.ts +845 -0
  197. package/src/errors/index.ts +533 -0
  198. package/src/index/ignore-patterns.test.ts +354 -0
  199. package/src/index/ignore-patterns.ts +305 -0
  200. package/src/index/indexer.ts +286 -48
  201. package/src/index/storage.ts +94 -30
  202. package/src/index/types.ts +40 -2
  203. package/src/index/watcher.ts +67 -9
  204. package/src/index.ts +22 -0
  205. package/src/integration/search-keyword.test.ts +678 -0
  206. package/src/mcp/server.ts +135 -6
  207. package/src/parser/parser.ts +18 -19
  208. package/src/parser/section-filter.test.ts +277 -0
  209. package/src/parser/section-filter.ts +125 -3
  210. package/src/search/__tests__/hybrid-search.test.ts +650 -0
  211. package/src/search/bm25-store.ts +366 -0
  212. package/src/search/cross-encoder.test.ts +253 -0
  213. package/src/search/cross-encoder.ts +406 -0
  214. package/src/search/fuzzy-search.test.ts +419 -0
  215. package/src/search/fuzzy-search.ts +273 -0
  216. package/src/search/hybrid-search.ts +448 -0
  217. package/src/search/path-matcher.test.ts +276 -0
  218. package/src/search/path-matcher.ts +33 -0
  219. package/src/search/searcher.test.ts +99 -1
  220. package/src/search/searcher.ts +189 -67
  221. package/src/search/wink-bm25.d.ts +30 -0
  222. package/src/summarization/cli-providers/claude.ts +202 -0
  223. package/src/summarization/cli-providers/detection.test.ts +273 -0
  224. package/src/summarization/cli-providers/detection.ts +118 -0
  225. package/src/summarization/cli-providers/index.ts +8 -0
  226. package/src/summarization/cost.test.ts +139 -0
  227. package/src/summarization/cost.ts +102 -0
  228. package/src/summarization/error-handler.test.ts +127 -0
  229. package/src/summarization/error-handler.ts +111 -0
  230. package/src/summarization/index.ts +102 -0
  231. package/src/summarization/pipeline.test.ts +498 -0
  232. package/src/summarization/pipeline.ts +231 -0
  233. package/src/summarization/prompts.test.ts +269 -0
  234. package/src/summarization/prompts.ts +133 -0
  235. package/src/summarization/provider-factory.test.ts +396 -0
  236. package/src/summarization/provider-factory.ts +178 -0
  237. package/src/summarization/types.ts +184 -0
  238. package/src/summarize/summarizer.ts +104 -35
  239. package/src/types/huggingface-transformers.d.ts +66 -0
  240. package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
  241. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  242. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  243. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +4 -4
  244. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +14 -0
  245. package/tests/integration/embed-index.test.ts +712 -0
  246. package/tests/integration/search-context.test.ts +469 -0
  247. package/tests/integration/search-semantic.test.ts +522 -0
  248. package/vitest.config.ts +1 -6
  249. package/AGENTS.md +0 -46
  250. package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
  251. package/tests/fixtures/cli/.mdcontext/vectors.meta.json +0 -1264
@@ -0,0 +1,168 @@
1
+ # mdcontext Pudding Research
2
+
3
+ **"The proof is in the pudding"** - Comprehensive dogfooding and testing of mdcontext functionality.
4
+
5
+ This directory contains detailed test results, analysis, and findings from exercising mdcontext against real-world codebases.
6
+
7
+ ## Test Reports
8
+
9
+ ### ✅ Completed
10
+
11
+ | Report | Status | Key Findings |
12
+ |--------|--------|--------------|
13
+ | [01-index-embed.md](./01-index-embed.md) | 🔴 Critical Bug | Vector metadata save fails on large corpora |
14
+ | [02-search.md](./02-search.md) | ✅ Complete | Search functionality working well |
15
+ | [03-context.md](./03-context.md) | ✅ Complete | Context assembly excellent |
16
+ | [04-tree.md](./04-tree.md) | ✅ Complete | Tree visualization works |
17
+ | [05-config.md](./05-config.md) | ✅ Complete | Config system solid |
18
+ | [06-links.md](./06-links.md) | ✅ Complete | Link graph analysis working |
19
+ | [07-stats.md](./07-stats.md) | ✅ Complete | Stats command comprehensive |
20
+
21
+ ### 🔴 Critical Findings
22
+
23
+ **Test 01 (Index & Embeddings)** uncovered a critical bug:
24
+ - **Issue**: Vector metadata save fails on large corpora (>1500 docs)
25
+ - **Severity**: P0 - Blocks production use of semantic search
26
+ - **Impact**: All embedding providers affected
27
+ - **Root Cause**: `JSON.stringify` output exceeds the V8 string size limit (~512MB)
28
+ - **Location**: `src/embeddings/vector-store.ts:401`
29
+ - **Fix Plan**: [BUG-FIX-PLAN.md](./BUG-FIX-PLAN.md)
30
+
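The root cause above comes from serializing all metadata into one string. As an illustration only, here is a minimal TypeScript sketch of one way to avoid that: write the records as newline-delimited JSON in bounded chunks. This is not the MessagePack approach the fix plan names, and `SectionMeta`, `writeJsonLines`, `readJsonLines`, and the chunk threshold are hypothetical names and values, not part of mdcontext.

```typescript
// Hypothetical record shape; the real vector metadata layout is not shown in this report.
interface SectionMeta {
  id: string;
  heading: string;
  tokens: number;
}

// Serialize records one line at a time and hand bounded chunks to a sink
// (for example a file write), so no single string grows with the whole corpus.
// Returns the number of chunks passed to the sink.
function writeJsonLines(
  records: SectionMeta[],
  write: (chunk: string) => void,
  maxChunkChars: number = 1_000_000, // assumed flush threshold
): number {
  let chunk = "";
  let chunksWritten = 0;
  for (const record of records) {
    const line = JSON.stringify(record) + "\n";
    // Flush before the chunk would exceed the threshold.
    if (chunk.length > 0 && chunk.length + line.length > maxChunkChars) {
      write(chunk);
      chunksWritten += 1;
      chunk = "";
    }
    chunk += line;
  }
  if (chunk.length > 0) {
    write(chunk);
    chunksWritten += 1;
  }
  return chunksWritten;
}

// Inverse operation: parse newline-delimited JSON back into records.
function readJsonLines(text: string): SectionMeta[] {
  return text
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SectionMeta);
}
```

Each call to `JSON.stringify` here only sees one record, so the size of any single string depends on the largest record and the chunk threshold rather than on the corpus size.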
31
+ ## Quick Links
32
+
33
+ ### Start Here 🎯
34
+ - **[00-EXECUTIVE-SUMMARY.md](./00-EXECUTIVE-SUMMARY.md)** - Complete testing overview, grades, and recommendations
35
+ - **[P0-BUG-VALIDATION.md](./P0-BUG-VALIDATION.md)** - Critical bug validation (100% reproducible)
36
+ - **[BUG-FIX-PLAN.md](./BUG-FIX-PLAN.md)** - Implementation plan for the fix (estimated 6 hours)
37
+
38
+ ### Detailed Reports
39
+ - **[01-index-embed.md](./01-index-embed.md)** - Indexing & embedding deep dive (26KB)
40
+ - **[02-search.md](./02-search.md)** - Search testing (22 scenarios, Grade: A)
41
+ - **[03-context.md](./03-context.md)** - Token budget analysis (Grade: A-)
42
+ - **[04-tree.md](./04-tree.md)** - Structure navigation testing
43
+ - **[05-config.md](./05-config.md)** - Configuration management (26KB)
44
+ - **[06-links.md](./06-links.md)** - Knowledge graph capabilities
45
+ - **[07-stats.md](./07-stats.md)** - Analytics and metrics
46
+ - **[TESTING-SUMMARY.md](./TESTING-SUMMARY.md)** - Test matrix
47
+
48
+ ### Test Data
49
+ - Test corpus: agentic-flow (1561 docs, 52,714 sections)
50
+ - Reference corpus: mdcontext (120 docs, 4,234 sections)
51
+ - Test logs: `/tmp/test*.log`
52
+
53
+ ## Test Coverage
54
+
55
+ ### What Was Tested ✅
56
+
57
+ 1. **Basic Indexing**
58
+ - Large corpus (1561 docs): ✅ Works
59
+ - Performance: 108 docs/sec
60
+ - Storage: 28MB
61
+
62
+ 2. **Embeddings (Multiple Providers)**
63
+ - OpenAI (small corpus 120 docs): ✅ Works
64
+ - OpenAI (large corpus 1558 docs): ❌ Bug validated
65
+ - OpenRouter (large corpus 1558 docs): ❌ Bug validated
66
+ - Ollama (large corpus 1558 docs): ❌ Bug validated
67
+ - **100% reproducible across all providers**
68
+
69
+ 3. **CLI Features**
70
+ - JSON output: ✅ Perfect
71
+ - Force rebuild: ✅ Works
72
+ - Incremental updates: ✅ Excellent (28x faster)
73
+
74
+ 4. **Search Functionality** (02-search.md)
75
+ - Keyword search: ✅ Works well
76
+ - Multi-word queries: ✅ Fixed
77
+ - Semantic search: ⚠️ Blocked by embedding bug
78
+
79
+ 5. **Context Assembly** (03-context.md)
80
+ - Compression levels: ✅ All working
81
+ - Token budgets: ✅ Accurate
82
+ - Quality: ✅ High
83
+
84
+ ### What Wasn't Tested ⏸️
85
+
86
+ - Anthropic embeddings (no embedding API)
87
+ - Voyage AI embeddings
88
+ - Very large corpora (>5000 docs)
89
+ - Real-time watch mode under load
90
+ - Concurrent indexing
91
+
92
+ ## Performance Summary
93
+
94
+ | Operation | Speed | Cost | Storage |
95
+ |-----------|-------|------|---------|
96
+ | Basic Index (1561 docs) | 14.4s | Free | 28MB |
97
+ | OpenAI Embed (120 docs) | 64.7s | $0.011 | 69MB |
98
+ | Incremental (1 file) | 54ms | Free | Δ only |
99
+
100
+ ## Bug Impact Analysis
101
+
102
+ ### Working Today ✅
103
+ - Small projects (<200 docs) with embeddings
104
+ - Any size project without embeddings
105
+ - All CLI features
106
+
107
+ ### Blocked 🚫
108
+ - Medium projects (200-1000 docs) with embeddings (untested; may hit the same limit)
109
+ - Large projects (>1500 docs) with embeddings
110
+ - Production semantic search on real codebases
111
+
112
+ ### After Bug Fix 🔧
113
+ - All corpus sizes
114
+ - All providers
115
+ - Full semantic search capability
116
+
117
+ ## Next Steps
118
+
119
+ ### Immediate (P0)
120
+ 1. Implement MessagePack binary format for vector metadata
121
+ 2. Test fix on agentic-flow corpus
122
+ 3. Deploy to production
123
+
124
+ ### Short-term (P1)
125
+ 1. Reduce metadata redundancy
126
+ 2. Add size validation warnings
127
+ 3. Test OpenAI on medium corpus (500-1000 docs)
128
+
129
+ ### Medium-term (P2)
130
+ 1. Benchmark search performance with embeddings
131
+ 2. Test additional embedding providers
132
+ 3. Optimize storage further
133
+
134
+ ### Long-term (P3)
135
+ 1. Consider SQLite storage backend
136
+ 2. Add compression options
137
+ 3. Support partial embedding
138
+
139
+ ## How to Use This Research
140
+
141
+ ### For Users
142
+ - **Want embeddings?** Read [01-index-embed.md](./01-index-embed.md) Quick Reference
143
+ - **Hit a bug?** Check the Issues Found section
144
+ - **Need workarounds?** See Best Practices section
145
+
146
+ ### For Developers
147
+ - **Fixing bugs?** See [BUG-FIX-PLAN.md](./BUG-FIX-PLAN.md)
148
+ - **Adding features?** Review performance benchmarks
149
+ - **Understanding codebase?** Read test methodology sections
150
+
151
+ ### For Product Decisions
152
+ - **Production readiness?** See [TESTING-SUMMARY.md](./TESTING-SUMMARY.md)
153
+ - **Pricing estimates?** Check cost analysis in main report
154
+ - **Roadmap planning?** Review Next Steps section
155
+
156
+ ## Acknowledgments
157
+
158
+ **Testing Methodology**: Real-world dogfooding
159
+ **Test Corpora**: agentic-flow (production codebase)
160
+ **Testing Tool**: mdcontext itself (self-hosting)
161
+ **Test Date**: 2026-01-27
162
+ **Tester**: Claude Sonnet 4.5
163
+ **Test Duration**: 90 minutes
164
+
165
+ ---
166
+
167
+ **Status**: Testing complete, critical bug found, fix plan ready
168
+ **Next Action**: Implement binary format for vector metadata
@@ -0,0 +1,128 @@
1
+ # mdcontext Testing Summary - 2026-01-27
2
+
3
+ ## Tests Completed
4
+
5
+ ### ✅ Successful Tests
6
+ 1. **Basic Indexing** (agentic-flow, 1561 docs)
7
+ - Duration: 14.4s
8
+ - Storage: 28MB
9
+ - Result: SUCCESS
10
+
11
+ 2. **OpenAI Embeddings** (mdcontext, 120 docs)
12
+ - Duration: 66.8s (1.5s index + 64.7s embed)
13
+ - Cost: $0.011
14
+ - Storage: 69MB (2.2MB index + 66.2MB embeddings)
15
+ - Result: SUCCESS
16
+
17
+ 3. **JSON Output**
18
+ - Basic: Single-line JSON
19
+ - Pretty: Formatted JSON
20
+ - Result: PERFECT
21
+
22
+ 4. **Force Rebuild**
23
+ - Reindexed all 120 docs
24
+ - Duration: 1524ms vs 47ms (incremental)
25
+ - Result: SUCCESS
26
+
27
+ 5. **Incremental Updates**
28
+ - Modified 1 file → Only 1 file reindexed
29
+ - Duration: 54ms (28x faster)
30
+ - Result: EXCELLENT
31
+
32
+ ### ❌ Failed Tests
33
+ 6. **OpenRouter Embeddings** (agentic-flow, 1561 docs)
34
+ - Generated embeddings: SUCCESS (101MB vectors.bin)
35
+ - Metadata save: FAILED (JSON size limit)
36
+ - Error: VectorStoreError - Invalid string length
37
+ - Result: BUG FOUND
38
+
39
+ 7. **Ollama Embeddings** (agentic-flow, 1561 docs)
40
+ - Generated embeddings: SUCCESS (101MB vectors.bin)
41
+ - Metadata save: FAILED (JSON size limit)
42
+ - Error: Same as OpenRouter
43
+ - Result: SAME BUG CONFIRMED
44
+
45
+ ## Critical Bug Details
46
+
47
+ **Issue**: Vector metadata serialization fails on large corpora
48
+ **Root Cause**: `JSON.stringify` output exceeds the V8 string size limit (~512MB)
49
+ **Affected**: ALL embedding providers on corpora >1500 docs
50
+ **Impact**: Cannot use semantic search on production codebases
51
+
52
+ ### Size Analysis
53
+ - Small corpus (120 docs, 3903 sections): 58MB metadata ✅ Works
54
+ - Large corpus (1561 docs, 52,714 sections): ~785MB metadata ❌ Fails
55
+
56
+ ### Calculation
57
+ ```
58
+ mdcontext: 58MB / 3,903 sections = 14.9KB per section
59
+ agentic-flow: 52,714 sections × 14.9KB = 785MB (exceeds limit)
60
+ ```
61
+
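The calculation above is a linear extrapolation from the small corpus. It can be sketched in TypeScript as follows, using only the figures quoted in the size analysis. The names `CorpusSample`, `bytesPerSection`, and `projectMetadataBytes` are made up for this example, "MB" is assumed to mean 10^6 bytes, and the 512MB ceiling is the approximate limit quoted in this report rather than a measured value.

```typescript
// Figures are copied from the size analysis; "MB" is treated as 10^6 bytes (assumption).
const MB = 1_000_000;

interface CorpusSample {
  sections: number;
  metadataBytes: number;
}

// Average metadata bytes per section observed on a measured corpus.
function bytesPerSection(sample: CorpusSample): number {
  return sample.metadataBytes / sample.sections;
}

// Linear extrapolation to a corpus with a different section count.
function projectMetadataBytes(sample: CorpusSample, targetSections: number): number {
  return bytesPerSection(sample) * targetSections;
}

const mdcontextSample: CorpusSample = { sections: 3_903, metadataBytes: 58 * MB };
const agenticFlowSections = 52_714;
const assumedLimitBytes = 512 * MB; // approximate limit quoted in this report

const projectedBytes = projectMetadataBytes(mdcontextSample, agenticFlowSections);
const exceedsLimit = projectedBytes > assumedLimitBytes;
```

Without rounding the per-section figure to 14.9KB, the projection comes out at about 783MB, which agrees with the ~785MB estimate and is still well above the assumed limit.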
62
+ ## Providers Tested
63
+
64
+ | Provider | Small Corpus | Large Corpus | Status |
65
+ |----------|--------------|--------------|--------|
66
+ | OpenAI | ✅ SUCCESS | ⚠️ Untested (likely fails) | Partial |
67
+ | OpenRouter | ⚠️ Should work | ❌ FAILED | Blocked |
68
+ | Ollama | ⚠️ Should work | ❌ FAILED | Blocked |
69
+
70
+ ## Performance Benchmarks
71
+
72
+ ### Indexing Speed
73
+ - **Without embeddings**: ~108 docs/sec, ~3600 sections/sec
74
+ - **With embeddings (OpenAI)**: ~1.85 docs/sec, ~60 sections/sec
75
+
76
+ ### Costs (OpenAI)
77
+ - Small corpus (120 docs): $0.011
78
+ - Estimated large corpus (1561 docs): ~$0.18 (once the bug is fixed)
79
+
80
+ ### Storage Overhead
81
+ - Basic index: ~18KB per doc
82
+ - With embeddings: ~575KB per doc (~32x increase)
83
+
84
+ ## Recommendations
85
+
86
+ ### Priority 1: Fix Metadata Save Bug
87
+ - **Solution**: Switch to binary format (MessagePack/CBOR)
88
+ - **ETA**: 4-8 hours
89
+ - **Impact**: Unblocks all large-scale embedding use
90
+
91
+ ### Priority 2: Add Early Validation
92
+ - Check estimated metadata size before processing
93
+ - Fail early with clear error message
94
+ - Prevent wasted time/money on doomed runs
95
+
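One possible shape for the early validation described above is a pre-flight check that runs before any embedding requests are sent. The sketch below is illustrative only: `preflightMetadataSize` and `PreflightResult` are hypothetical names, and the default per-section size (14.9KB) and ceiling (512MB) are taken from the figures in this report as assumptions, not read from mdcontext.

```typescript
interface PreflightResult {
  ok: boolean;
  estimatedBytes: number;
  message: string;
}

// Estimate the metadata size from the section count and refuse to start
// when the estimate is above the ceiling, before any paid API calls are made.
function preflightMetadataSize(
  sectionCount: number,
  bytesPerSection: number = 14_900, // assumed average, from the 14.9KB/section figure
  limitBytes: number = 512 * 1_000_000, // assumed ceiling, from the ~512MB figure
): PreflightResult {
  const estimatedBytes = sectionCount * bytesPerSection;
  const toMB = (bytes: number): string => (bytes / 1_000_000).toFixed(0) + "MB";

  if (estimatedBytes > limitBytes) {
    return {
      ok: false,
      estimatedBytes,
      message:
        `Estimated vector metadata (${toMB(estimatedBytes)}) exceeds the ` +
        `${toMB(limitBytes)} limit for ${sectionCount} sections; aborting before embedding.`,
    };
  }
  return {
    ok: true,
    estimatedBytes,
    message: `Estimated vector metadata: ${toMB(estimatedBytes)} for ${sectionCount} sections.`,
  };
}
```

With the section counts from the size analysis, `preflightMetadataSize(52_714)` reports a failure and `preflightMetadataSize(3_903)` passes, so a doomed run on the large corpus would stop before spending time or money on embeddings.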
96
+ ### Priority 3: Optimize Metadata Size
97
+ - Currently 7x larger than binary vectors
98
+ - Audit what's stored per vector
99
+ - Remove redundant data
100
+
101
+ ## Production Readiness
102
+
103
+ ### Ready Now ✅
104
+ - Basic indexing (any size)
105
+ - Small corpus embeddings (<200 docs)
106
+ - All CLI features (JSON, force, incremental)
107
+
108
+ ### Blocked 🚫
109
+ - Medium-large corpus embeddings (>1500 docs)
110
+ - Production semantic search
111
+
112
+ ### After Bug Fix 🔧
113
+ - All corpus sizes
114
+ - All providers
115
+ - Production semantic search
116
+
117
+ ## Files Generated
118
+
119
+ - `/Users/alphab/Dev/LLM/DEV/mdcontext/research/mdcontext-pudding/01-index-embed.md` (940 lines)
120
+ - Test logs in `/tmp/test*.log`
121
+ - Partial indexes in target directories
122
+
123
+ ---
124
+
125
+ **Total Test Time**: 90 minutes
126
+ **Commands Executed**: 15+
127
+ **Bug Severity**: Critical (P0)
128
+ **Next Action**: Implement binary metadata format