mdcontext 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.changeset/config.json +9 -9
- package/.claude/settings.local.json +25 -0
- package/.github/workflows/claude-code-review.yml +44 -0
- package/.github/workflows/claude.yml +85 -0
- package/CONTRIBUTING.md +186 -0
- package/NOTES/NOTES +44 -0
- package/README.md +206 -3
- package/biome.json +1 -1
- package/dist/chunk-23UPXDNL.js +3044 -0
- package/dist/chunk-2W7MO2DL.js +1366 -0
- package/dist/chunk-3NUAZGMA.js +1689 -0
- package/dist/chunk-7TOWB2XB.js +366 -0
- package/dist/chunk-7XOTOADQ.js +3065 -0
- package/dist/chunk-AH2PDM2K.js +3042 -0
- package/dist/chunk-BNXWSZ63.js +3742 -0
- package/dist/chunk-BTL5DJVU.js +3222 -0
- package/dist/chunk-HDHYG7E4.js +104 -0
- package/dist/chunk-HLR4KZBP.js +3234 -0
- package/dist/chunk-IP3FRFEB.js +1045 -0
- package/dist/chunk-KHU56VDO.js +3042 -0
- package/dist/chunk-KRYIFLQR.js +85 -89
- package/dist/chunk-LBSDNLEM.js +287 -0
- package/dist/chunk-MNTQ7HCP.js +2643 -0
- package/dist/chunk-MUJELQQ6.js +1387 -0
- package/dist/chunk-MXJGMSLV.js +2199 -0
- package/dist/chunk-N6QJGC3Z.js +2636 -0
- package/dist/chunk-OBELGBPM.js +1713 -0
- package/dist/chunk-OT7R5XTA.js +3192 -0
- package/dist/chunk-P7X4RA2T.js +106 -0
- package/dist/chunk-PIDUQNC2.js +3185 -0
- package/dist/chunk-POGCDIH4.js +3187 -0
- package/dist/chunk-PSIEOQGZ.js +3043 -0
- package/dist/chunk-PVRT3IHA.js +3238 -0
- package/dist/chunk-QNN4TT23.js +1430 -0
- package/dist/chunk-RE3R45RJ.js +3042 -0
- package/dist/chunk-S7E6TFX6.js +718 -657
- package/dist/chunk-SG6GLU4U.js +1378 -0
- package/dist/chunk-SJCDV2ST.js +274 -0
- package/dist/chunk-SYE5XLF3.js +104 -0
- package/dist/chunk-T5VLYBZD.js +103 -0
- package/dist/chunk-TOQB7VWU.js +3238 -0
- package/dist/chunk-VFNMZ4ZQ.js +3228 -0
- package/dist/chunk-VVTGZNBT.js +1533 -1423
- package/dist/chunk-W7Q4RFEV.js +104 -0
- package/dist/chunk-XTYYVRLO.js +3190 -0
- package/dist/chunk-Y6MDYVJD.js +3063 -0
- package/dist/cli/main.js +4072 -629
- package/dist/index.d.ts +420 -33
- package/dist/index.js +8 -15
- package/dist/mcp/server.js +103 -7
- package/dist/schema-BAWSG7KY.js +22 -0
- package/dist/schema-E3QUPL26.js +20 -0
- package/dist/schema-EHL7WUT6.js +20 -0
- package/docs/019-USAGE.md +44 -5
- package/docs/020-current-implementation.md +8 -8
- package/docs/021-DOGFOODING-FINDINGS.md +1 -1
- package/docs/CONFIG.md +1123 -0
- package/docs/ERRORS.md +383 -0
- package/docs/summarization.md +320 -0
- package/justfile +40 -0
- package/package.json +39 -33
- package/research/INDEX.md +315 -0
- package/research/code-review/README.md +90 -0
- package/research/code-review/cli-error-handling-review.md +979 -0
- package/research/code-review/code-review-validation-report.md +464 -0
- package/research/code-review/main-ts-review.md +1128 -0
- package/research/config-docs/SUMMARY.md +357 -0
- package/research/config-docs/TEST-RESULTS.md +776 -0
- package/research/config-docs/TODO.md +542 -0
- package/research/config-docs/analysis.md +744 -0
- package/research/config-docs/fix-validation.md +502 -0
- package/research/config-docs/help-audit.md +264 -0
- package/research/config-docs/help-system-analysis.md +890 -0
- package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
- package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
- package/research/issue-review.md +603 -0
- package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
- package/research/llm-summarization/alternative-providers-2026.md +1428 -0
- package/research/llm-summarization/anthropic-2026.md +367 -0
- package/research/llm-summarization/claude-cli-integration.md +1706 -0
- package/research/llm-summarization/cli-integration-patterns.md +3155 -0
- package/research/llm-summarization/openai-2026.md +473 -0
- package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
- package/research/llm-summarization/opencode-cli-integration.md +1552 -0
- package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
- package/research/llm-summarization/prototype-results.md +56 -0
- package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
- package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
- package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
- package/research/mdcontext-pudding/01-index-embed.md +956 -0
- package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
- package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
- package/research/mdcontext-pudding/02-search.md +970 -0
- package/research/mdcontext-pudding/03-context.md +779 -0
- package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
- package/research/mdcontext-pudding/04-tree.md +704 -0
- package/research/mdcontext-pudding/05-config.md +1038 -0
- package/research/mdcontext-pudding/06-links-summary.txt +87 -0
- package/research/mdcontext-pudding/06-links.md +679 -0
- package/research/mdcontext-pudding/07-stats.md +693 -0
- package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
- package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
- package/research/mdcontext-pudding/README.md +168 -0
- package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
- package/research/research-quality-review.md +834 -0
- package/research/semantic-search/embedding-text-analysis.md +156 -0
- package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
- package/research/semantic-search/query-processing-analysis.md +207 -0
- package/research/semantic-search/root-cause-and-solution.md +114 -0
- package/research/semantic-search/threshold-validation-report.md +69 -0
- package/research/semantic-search/vector-search-analysis.md +63 -0
- package/research/test-path-issues.md +276 -0
- package/review/ALP-76/1-error-type-design.md +962 -0
- package/review/ALP-76/2-error-handling-patterns.md +906 -0
- package/review/ALP-76/3-error-presentation.md +624 -0
- package/review/ALP-76/4-test-coverage.md +625 -0
- package/review/ALP-76/5-migration-completeness.md +440 -0
- package/review/ALP-76/6-effect-best-practices.md +755 -0
- package/scripts/apply-branch-protection.sh +47 -0
- package/scripts/branch-protection-templates.json +79 -0
- package/scripts/prototype-summarization.ts +346 -0
- package/scripts/rebuild-hnswlib.js +32 -37
- package/scripts/setup-branch-protection.sh +64 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
- package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
- package/src/cli/argv-preprocessor.test.ts +2 -2
- package/src/cli/cli.test.ts +230 -33
- package/src/cli/commands/config-cmd.ts +642 -0
- package/src/cli/commands/context.ts +97 -9
- package/src/cli/commands/duplicates.ts +122 -0
- package/src/cli/commands/embeddings.ts +529 -0
- package/src/cli/commands/index-cmd.ts +210 -30
- package/src/cli/commands/index.ts +3 -0
- package/src/cli/commands/search.ts +894 -64
- package/src/cli/commands/stats.ts +3 -0
- package/src/cli/commands/tree.ts +26 -5
- package/src/cli/config-layer.ts +176 -0
- package/src/cli/error-handler.test.ts +235 -0
- package/src/cli/error-handler.ts +655 -0
- package/src/cli/flag-schemas.ts +66 -0
- package/src/cli/help.ts +209 -7
- package/src/cli/main.ts +348 -58
- package/src/cli/options.ts +10 -0
- package/src/cli/shared-error-handling.ts +199 -0
- package/src/cli/utils.ts +150 -17
- package/src/config/file-provider.test.ts +320 -0
- package/src/config/file-provider.ts +273 -0
- package/src/config/index.ts +72 -0
- package/src/config/integration.test.ts +667 -0
- package/src/config/precedence.test.ts +277 -0
- package/src/config/precedence.ts +451 -0
- package/src/config/schema.test.ts +414 -0
- package/src/config/schema.ts +603 -0
- package/src/config/service.test.ts +320 -0
- package/src/config/service.ts +243 -0
- package/src/config/testing.test.ts +264 -0
- package/src/config/testing.ts +110 -0
- package/src/core/types.ts +6 -33
- package/src/duplicates/detector.test.ts +183 -0
- package/src/duplicates/detector.ts +414 -0
- package/src/duplicates/index.ts +18 -0
- package/src/embeddings/embedding-namespace.test.ts +300 -0
- package/src/embeddings/embedding-namespace.ts +947 -0
- package/src/embeddings/heading-boost.test.ts +222 -0
- package/src/embeddings/hnsw-build-options.test.ts +198 -0
- package/src/embeddings/hyde.test.ts +272 -0
- package/src/embeddings/hyde.ts +264 -0
- package/src/embeddings/index.ts +2 -0
- package/src/embeddings/openai-provider.ts +332 -83
- package/src/embeddings/pricing.json +22 -0
- package/src/embeddings/provider-constants.ts +204 -0
- package/src/embeddings/provider-errors.test.ts +967 -0
- package/src/embeddings/provider-errors.ts +565 -0
- package/src/embeddings/provider-factory.test.ts +240 -0
- package/src/embeddings/provider-factory.ts +225 -0
- package/src/embeddings/provider-integration.test.ts +788 -0
- package/src/embeddings/query-preprocessing.test.ts +187 -0
- package/src/embeddings/semantic-search-threshold.test.ts +508 -0
- package/src/embeddings/semantic-search.ts +780 -93
- package/src/embeddings/types.ts +293 -16
- package/src/embeddings/vector-store.ts +486 -77
- package/src/embeddings/voyage-provider.ts +313 -0
- package/src/errors/errors.test.ts +845 -0
- package/src/errors/index.ts +533 -0
- package/src/index/ignore-patterns.test.ts +354 -0
- package/src/index/ignore-patterns.ts +305 -0
- package/src/index/indexer.ts +286 -48
- package/src/index/storage.ts +94 -30
- package/src/index/types.ts +40 -2
- package/src/index/watcher.ts +67 -9
- package/src/index.ts +22 -0
- package/src/integration/search-keyword.test.ts +678 -0
- package/src/mcp/server.ts +135 -6
- package/src/parser/parser.ts +18 -19
- package/src/parser/section-filter.test.ts +277 -0
- package/src/parser/section-filter.ts +125 -3
- package/src/search/__tests__/hybrid-search.test.ts +650 -0
- package/src/search/bm25-store.ts +366 -0
- package/src/search/cross-encoder.test.ts +253 -0
- package/src/search/cross-encoder.ts +406 -0
- package/src/search/fuzzy-search.test.ts +419 -0
- package/src/search/fuzzy-search.ts +273 -0
- package/src/search/hybrid-search.ts +448 -0
- package/src/search/path-matcher.test.ts +276 -0
- package/src/search/path-matcher.ts +33 -0
- package/src/search/searcher.test.ts +99 -1
- package/src/search/searcher.ts +189 -67
- package/src/search/wink-bm25.d.ts +30 -0
- package/src/summarization/cli-providers/claude.ts +202 -0
- package/src/summarization/cli-providers/detection.test.ts +273 -0
- package/src/summarization/cli-providers/detection.ts +118 -0
- package/src/summarization/cli-providers/index.ts +8 -0
- package/src/summarization/cost.test.ts +139 -0
- package/src/summarization/cost.ts +102 -0
- package/src/summarization/error-handler.test.ts +127 -0
- package/src/summarization/error-handler.ts +111 -0
- package/src/summarization/index.ts +102 -0
- package/src/summarization/pipeline.test.ts +498 -0
- package/src/summarization/pipeline.ts +231 -0
- package/src/summarization/prompts.test.ts +269 -0
- package/src/summarization/prompts.ts +133 -0
- package/src/summarization/provider-factory.test.ts +396 -0
- package/src/summarization/provider-factory.ts +178 -0
- package/src/summarization/types.ts +184 -0
- package/src/summarize/summarizer.ts +104 -35
- package/src/types/huggingface-transformers.d.ts +66 -0
- package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
- package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/indexes/documents.json +4 -4
- package/tests/fixtures/cli/.mdcontext/indexes/sections.json +14 -0
- package/tests/integration/embed-index.test.ts +712 -0
- package/tests/integration/search-context.test.ts +469 -0
- package/tests/integration/search-semantic.test.ts +522 -0
- package/vitest.config.ts +1 -6
- package/AGENTS.md +0 -46
- package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/vectors.meta.json +0 -1264
|
@@ -0,0 +1,168 @@
|
|
|
1
|
+
# mdcontext Pudding Research
|
|
2
|
+
|
|
3
|
+
**"The proof is in the pudding"** - Comprehensive dogfooding and testing of mdcontext functionality.
|
|
4
|
+
|
|
5
|
+
This directory contains detailed test results, analysis, and findings from exercising mdcontext against real-world codebases.
|
|
6
|
+
|
|
7
|
+
## Test Reports
|
|
8
|
+
|
|
9
|
+
### ✅ Completed
|
|
10
|
+
|
|
11
|
+
| Report | Status | Key Findings |
|
|
12
|
+
|--------|--------|--------------|
|
|
13
|
+
| [01-index-embed.md](./01-index-embed.md) | 🔴 Critical Bug | Vector metadata save fails on large corpora |
|
|
14
|
+
| [02-search.md](./02-search.md) | ✅ Complete | Search functionality working well |
|
|
15
|
+
| [03-context.md](./03-context.md) | ✅ Complete | Context assembly excellent |
|
|
16
|
+
| [04-tree.md](./04-tree.md) | ✅ Complete | Tree visualization works |
|
|
17
|
+
| [05-config.md](./05-config.md) | ✅ Complete | Config system solid |
|
|
18
|
+
| [06-links.md](./06-links.md) | ✅ Complete | Link graph analysis working |
|
|
19
|
+
| [07-stats.md](./07-stats.md) | ✅ Complete | Stats command comprehensive |
|
|
20
|
+
|
|
21
|
+
### 🔴 Critical Findings
|
|
22
|
+
|
|
23
|
+
**Test 01 (Index & Embeddings)** uncovered a critical bug:
|
|
24
|
+
- **Issue**: Vector metadata save fails on large corpora (>1500 docs)
|
|
25
|
+
- **Severity**: P0 - Blocks production use of semantic search
|
|
26
|
+
- **Impact**: All embedding providers affected
|
|
27
|
+
- **Root Cause**: JSON.stringify size limit (~512MB)
|
|
28
|
+
- **Location**: `src/embeddings/vector-store.ts:401`
|
|
29
|
+
- **Fix Plan**: [BUG-FIX-PLAN.md](./BUG-FIX-PLAN.md)
|
|
30
|
+
|
|
31
|
+
## Quick Links
|
|
32
|
+
|
|
33
|
+
### Start Here 🎯
|
|
34
|
+
- **[00-EXECUTIVE-SUMMARY.md](./00-EXECUTIVE-SUMMARY.md)** - Complete testing overview, grades, and recommendations
|
|
35
|
+
- **[P0-BUG-VALIDATION.md](./P0-BUG-VALIDATION.md)** - Critical bug validation (100% reproducible)
|
|
36
|
+
- **[BUG-FIX-PLAN.md](./BUG-FIX-PLAN.md)** - Implementation plan for fix (6 hours)
|
|
37
|
+
|
|
38
|
+
### Detailed Reports
|
|
39
|
+
- **[01-index-embed.md](./01-index-embed.md)** - Indexing & embedding deep dive (26KB)
|
|
40
|
+
- **[02-search.md](./02-search.md)** - Search testing (22 scenarios, Grade: A)
|
|
41
|
+
- **[03-context.md](./03-context.md)** - Token budget analysis (Grade: A-)
|
|
42
|
+
- **[04-tree.md](./04-tree.md)** - Structure navigation testing
|
|
43
|
+
- **[05-config.md](./05-config.md)** - Configuration management (26KB)
|
|
44
|
+
- **[06-links.md](./06-links.md)** - Knowledge graph capabilities
|
|
45
|
+
- **[07-stats.md](./07-stats.md)** - Analytics and metrics
|
|
46
|
+
- **[TESTING-SUMMARY.md](./TESTING-SUMMARY.md)** - Test matrix
|
|
47
|
+
|
|
48
|
+
### Test Data
|
|
49
|
+
- Test corpus: agentic-flow (1561 docs, 52,714 sections)
|
|
50
|
+
- Reference corpus: mdcontext (120 docs, 4,234 sections)
|
|
51
|
+
- Test logs: `/tmp/test*.log`
|
|
52
|
+
|
|
53
|
+
## Test Coverage
|
|
54
|
+
|
|
55
|
+
### What Was Tested ✅
|
|
56
|
+
|
|
57
|
+
1. **Basic Indexing**
|
|
58
|
+
- Large corpus (1561 docs): ✅ Works
|
|
59
|
+
- Performance: 108 docs/sec
|
|
60
|
+
- Storage: 28MB
|
|
61
|
+
|
|
62
|
+
2. **Embeddings (Multiple Providers)**
|
|
63
|
+
- OpenAI (small corpus 120 docs): ✅ Works
|
|
64
|
+
- OpenAI (large corpus 1558 docs): ❌ Bug validated
|
|
65
|
+
- OpenRouter (large corpus 1558 docs): ❌ Bug validated
|
|
66
|
+
- Ollama (large corpus 1558 docs): ❌ Bug validated
|
|
67
|
+
- **100% reproducible across all providers**
|
|
68
|
+
|
|
69
|
+
3. **CLI Features**
|
|
70
|
+
- JSON output: ✅ Perfect
|
|
71
|
+
- Force rebuild: ✅ Works
|
|
72
|
+
- Incremental updates: ✅ Excellent (28x faster)
|
|
73
|
+
|
|
74
|
+
4. **Search Functionality** (02-search.md)
|
|
75
|
+
- Keyword search: ✅ Works well
|
|
76
|
+
- Multi-word queries: ✅ Fixed
|
|
77
|
+
- Semantic search: ⚠️ Blocked by embedding bug
|
|
78
|
+
|
|
79
|
+
5. **Context Assembly** (03-context.md)
|
|
80
|
+
- Compression levels: ✅ All working
|
|
81
|
+
- Token budgets: ✅ Accurate
|
|
82
|
+
- Quality: ✅ High
|
|
83
|
+
|
|
84
|
+
### What Wasn't Tested ⏸️
|
|
85
|
+
|
|
86
|
+
- Anthropic embeddings (no embedding API)
|
|
87
|
+
- Voyage AI embeddings
|
|
88
|
+
- Very large corpora (>5000 docs)
|
|
89
|
+
- Real-time watch mode under load
|
|
90
|
+
- Concurrent indexing
|
|
91
|
+
|
|
92
|
+
## Performance Summary
|
|
93
|
+
|
|
94
|
+
| Operation | Speed | Cost | Storage |
|
|
95
|
+
|-----------|-------|------|---------|
|
|
96
|
+
| Basic Index (1561 docs) | 14.4s | Free | 28MB |
|
|
97
|
+
| OpenAI Embed (120 docs) | 64.7s | $0.011 | 69MB |
|
|
98
|
+
| Incremental (1 file) | 54ms | Free | Δ only |
|
|
99
|
+
|
|
100
|
+
## Bug Impact Analysis
|
|
101
|
+
|
|
102
|
+
### Working Today ✅
|
|
103
|
+
- Small projects (<200 docs) with embeddings
|
|
104
|
+
- Any size project without embeddings
|
|
105
|
+
- All CLI features
|
|
106
|
+
|
|
107
|
+
### Blocked 🚫
|
|
108
|
+
- Medium projects (200-1000 docs) with embeddings
|
|
109
|
+
- Large projects (>1500 docs) with embeddings
|
|
110
|
+
- Production semantic search on real codebases
|
|
111
|
+
|
|
112
|
+
### After Bug Fix 🔧
|
|
113
|
+
- All corpus sizes
|
|
114
|
+
- All providers
|
|
115
|
+
- Full semantic search capability
|
|
116
|
+
|
|
117
|
+
## Next Steps
|
|
118
|
+
|
|
119
|
+
### Immediate (P0)
|
|
120
|
+
1. Implement MessagePack binary format for vector metadata
|
|
121
|
+
2. Test fix on agentic-flow corpus
|
|
122
|
+
3. Deploy to production
|
|
123
|
+
|
|
124
|
+
### Short-term (P1)
|
|
125
|
+
1. Reduce metadata redundancy
|
|
126
|
+
2. Add size validation warnings
|
|
127
|
+
3. Test OpenAI on medium corpus (500-1000 docs)
|
|
128
|
+
|
|
129
|
+
### Medium-term (P2)
|
|
130
|
+
1. Benchmark search performance with embeddings
|
|
131
|
+
2. Test additional embedding providers
|
|
132
|
+
3. Optimize storage further
|
|
133
|
+
|
|
134
|
+
### Long-term (P3)
|
|
135
|
+
1. Consider SQLite storage backend
|
|
136
|
+
2. Add compression options
|
|
137
|
+
3. Support for partial embedding
|
|
138
|
+
|
|
139
|
+
## How to Use This Research
|
|
140
|
+
|
|
141
|
+
### For Users
|
|
142
|
+
- **Want embeddings?** Read [01-index-embed.md](./01-index-embed.md) Quick Reference
|
|
143
|
+
- **Hit a bug?** Check the Issues Found section
|
|
144
|
+
- **Need workarounds?** See Best Practices section
|
|
145
|
+
|
|
146
|
+
### For Developers
|
|
147
|
+
- **Fixing bugs?** See [BUG-FIX-PLAN.md](./BUG-FIX-PLAN.md)
|
|
148
|
+
- **Adding features?** Review performance benchmarks
|
|
149
|
+
- **Understanding codebase?** Read test methodology sections
|
|
150
|
+
|
|
151
|
+
### For Product Decisions
|
|
152
|
+
- **Production readiness?** See [TESTING-SUMMARY.md](./TESTING-SUMMARY.md)
|
|
153
|
+
- **Pricing estimates?** Check cost analysis in main report
|
|
154
|
+
- **Roadmap planning?** Review Next Steps section
|
|
155
|
+
|
|
156
|
+
## Acknowledgments
|
|
157
|
+
|
|
158
|
+
**Testing Methodology**: Real-world dogfooding
|
|
159
|
+
**Test Corpora**: agentic-flow (production codebase)
|
|
160
|
+
**Testing Tool**: mdcontext itself (self-hosting)
|
|
161
|
+
**Test Date**: 2026-01-27
|
|
162
|
+
**Tester**: Claude Sonnet 4.5
|
|
163
|
+
**Test Duration**: 90 minutes
|
|
164
|
+
|
|
165
|
+
---
|
|
166
|
+
|
|
167
|
+
**Status**: Testing complete, critical bug found, fix plan ready
|
|
168
|
+
**Next Action**: Implement binary format for vector metadata
|
|
@@ -0,0 +1,128 @@
|
|
|
1
|
+
# mdcontext Testing Summary - 2026-01-27
|
|
2
|
+
|
|
3
|
+
## Tests Completed
|
|
4
|
+
|
|
5
|
+
### ✅ Successful Tests
|
|
6
|
+
1. **Basic Indexing** (agentic-flow, 1561 docs)
|
|
7
|
+
- Duration: 14.4s
|
|
8
|
+
- Storage: 28MB
|
|
9
|
+
- Result: SUCCESS
|
|
10
|
+
|
|
11
|
+
2. **OpenAI Embeddings** (mdcontext, 120 docs)
|
|
12
|
+
- Duration: 66.8s (1.5s index + 64.7s embed)
|
|
13
|
+
- Cost: $0.011
|
|
14
|
+
- Storage: 69MB (2.2M index + 66.2M embeddings)
|
|
15
|
+
- Result: SUCCESS
|
|
16
|
+
|
|
17
|
+
3. **JSON Output**
|
|
18
|
+
- Basic: Single-line JSON
|
|
19
|
+
- Pretty: Formatted JSON
|
|
20
|
+
- Result: PERFECT
|
|
21
|
+
|
|
22
|
+
4. **Force Rebuild**
|
|
23
|
+
- Reindexed all 120 docs
|
|
24
|
+
- Duration: 1524ms vs 47ms (incremental)
|
|
25
|
+
- Result: SUCCESS
|
|
26
|
+
|
|
27
|
+
5. **Incremental Updates**
|
|
28
|
+
- Modified 1 file → Only 1 file reindexed
|
|
29
|
+
- Duration: 54ms (28x faster)
|
|
30
|
+
- Result: EXCELLENT
|
|
31
|
+
|
|
32
|
+
### ❌ Failed Tests
|
|
33
|
+
6. **OpenRouter Embeddings** (agentic-flow, 1561 docs)
|
|
34
|
+
- Generated embeddings: SUCCESS (101MB vectors.bin)
|
|
35
|
+
- Metadata save: FAILED (JSON size limit)
|
|
36
|
+
- Error: VectorStoreError - Invalid string length
|
|
37
|
+
- Result: BUG FOUND
|
|
38
|
+
|
|
39
|
+
7. **Ollama Embeddings** (agentic-flow, 1561 docs)
|
|
40
|
+
- Generated embeddings: SUCCESS (101MB vectors.bin)
|
|
41
|
+
- Metadata save: FAILED (JSON size limit)
|
|
42
|
+
- Error: Same as OpenRouter
|
|
43
|
+
- Result: SAME BUG CONFIRMED
|
|
44
|
+
|
|
45
|
+
## Critical Bug Details
|
|
46
|
+
|
|
47
|
+
**Issue**: Vector metadata serialization fails on large corpora
|
|
48
|
+
**Root Cause**: JSON.stringify exceeds V8 string size limit (~512MB)
|
|
49
|
+
**Affected**: ALL embedding providers on corpora >1500 docs
|
|
50
|
+
**Impact**: Cannot use semantic search on production codebases
|
|
51
|
+
|
|
52
|
+
### Size Analysis
|
|
53
|
+
- Small corpus (120 docs, 3903 sections): 58MB metadata ✅ Works
|
|
54
|
+
- Large corpus (1561 docs, 52,714 sections): ~785MB metadata ❌ Fails
|
|
55
|
+
|
|
56
|
+
### Calculation
|
|
57
|
+
```
|
|
58
|
+
mdcontext: 58MB / 3,903 sections = 14.9KB per section
|
|
59
|
+
agentic-flow: 52,714 sections × 14.9KB = 785MB (exceeds limit)
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
## Providers Tested
|
|
63
|
+
|
|
64
|
+
| Provider | Small Corpus | Large Corpus | Status |
|
|
65
|
+
|----------|--------------|--------------|--------|
|
|
66
|
+
| OpenAI | ✅ SUCCESS | ⚠️ Untested (likely fails) | Partial |
|
|
67
|
+
| OpenRouter | ⚠️ Should work | ❌ FAILED | Blocked |
|
|
68
|
+
| Ollama | ⚠️ Should work | ❌ FAILED | Blocked |
|
|
69
|
+
|
|
70
|
+
## Performance Benchmarks
|
|
71
|
+
|
|
72
|
+
### Indexing Speed
|
|
73
|
+
- **Without embeddings**: ~108 docs/sec, ~3600 sections/sec
|
|
74
|
+
- **With embeddings (OpenAI)**: ~1.85 docs/sec, ~60 sections/sec
|
|
75
|
+
|
|
76
|
+
### Costs (OpenAI)
|
|
77
|
+
- Small corpus (120 docs): $0.011
|
|
78
|
+
- Estimated large (1561 docs): ~$0.18 (if bug fixed)
|
|
79
|
+
|
|
80
|
+
### Storage Overhead
|
|
81
|
+
- Basic index: ~18KB per doc
|
|
82
|
+
- With embeddings: ~575KB per doc (31x increase)
|
|
83
|
+
|
|
84
|
+
## Recommendations
|
|
85
|
+
|
|
86
|
+
### Priority 1: Fix Metadata Save Bug
|
|
87
|
+
- **Solution**: Switch to binary format (MessagePack/CBOR)
|
|
88
|
+
- **ETA**: 4-8 hours
|
|
89
|
+
- **Impact**: Unblocks all large-scale embedding use
|
|
90
|
+
|
|
91
|
+
### Priority 2: Add Early Validation
|
|
92
|
+
- Check estimated metadata size before processing
|
|
93
|
+
- Fail early with clear error message
|
|
94
|
+
- Prevent wasted time/money on doomed runs
|
|
95
|
+
|
|
96
|
+
### Priority 3: Optimize Metadata Size
|
|
97
|
+
- Currently 7x larger than binary vectors
|
|
98
|
+
- Audit what's stored per vector
|
|
99
|
+
- Remove redundant data
|
|
100
|
+
|
|
101
|
+
## Production Readiness
|
|
102
|
+
|
|
103
|
+
### Ready Now ✅
|
|
104
|
+
- Basic indexing (any size)
|
|
105
|
+
- Small corpus embeddings (<200 docs)
|
|
106
|
+
- All CLI features (JSON, force, incremental)
|
|
107
|
+
|
|
108
|
+
### Blocked 🚫
|
|
109
|
+
- Medium-large corpus embeddings (>1500 docs)
|
|
110
|
+
- Production semantic search
|
|
111
|
+
|
|
112
|
+
### After Bug Fix 🔧
|
|
113
|
+
- All corpus sizes
|
|
114
|
+
- All providers
|
|
115
|
+
- Production semantic search
|
|
116
|
+
|
|
117
|
+
## Files Generated
|
|
118
|
+
|
|
119
|
+
- `/Users/alphab/Dev/LLM/DEV/mdcontext/research/mdcontext-pudding/01-index-embed.md` (940 lines)
|
|
120
|
+
- Test logs in `/tmp/test*.log`
|
|
121
|
+
- Partial indexes in target directories
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
**Total Test Time**: 90 minutes
|
|
126
|
+
**Commands Executed**: 15+
|
|
127
|
+
**Bug Severity**: Critical (P0)
|
|
128
|
+
**Next Action**: Implement binary metadata format
|