mdcontext 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (251)
  1. package/.changeset/config.json +9 -9
  2. package/.claude/settings.local.json +25 -0
  3. package/.github/workflows/claude-code-review.yml +44 -0
  4. package/.github/workflows/claude.yml +85 -0
  5. package/CONTRIBUTING.md +186 -0
  6. package/NOTES/NOTES +44 -0
  7. package/README.md +206 -3
  8. package/biome.json +1 -1
  9. package/dist/chunk-23UPXDNL.js +3044 -0
  10. package/dist/chunk-2W7MO2DL.js +1366 -0
  11. package/dist/chunk-3NUAZGMA.js +1689 -0
  12. package/dist/chunk-7TOWB2XB.js +366 -0
  13. package/dist/chunk-7XOTOADQ.js +3065 -0
  14. package/dist/chunk-AH2PDM2K.js +3042 -0
  15. package/dist/chunk-BNXWSZ63.js +3742 -0
  16. package/dist/chunk-BTL5DJVU.js +3222 -0
  17. package/dist/chunk-HDHYG7E4.js +104 -0
  18. package/dist/chunk-HLR4KZBP.js +3234 -0
  19. package/dist/chunk-IP3FRFEB.js +1045 -0
  20. package/dist/chunk-KHU56VDO.js +3042 -0
  21. package/dist/chunk-KRYIFLQR.js +85 -89
  22. package/dist/chunk-LBSDNLEM.js +287 -0
  23. package/dist/chunk-MNTQ7HCP.js +2643 -0
  24. package/dist/chunk-MUJELQQ6.js +1387 -0
  25. package/dist/chunk-MXJGMSLV.js +2199 -0
  26. package/dist/chunk-N6QJGC3Z.js +2636 -0
  27. package/dist/chunk-OBELGBPM.js +1713 -0
  28. package/dist/chunk-OT7R5XTA.js +3192 -0
  29. package/dist/chunk-P7X4RA2T.js +106 -0
  30. package/dist/chunk-PIDUQNC2.js +3185 -0
  31. package/dist/chunk-POGCDIH4.js +3187 -0
  32. package/dist/chunk-PSIEOQGZ.js +3043 -0
  33. package/dist/chunk-PVRT3IHA.js +3238 -0
  34. package/dist/chunk-QNN4TT23.js +1430 -0
  35. package/dist/chunk-RE3R45RJ.js +3042 -0
  36. package/dist/chunk-S7E6TFX6.js +718 -657
  37. package/dist/chunk-SG6GLU4U.js +1378 -0
  38. package/dist/chunk-SJCDV2ST.js +274 -0
  39. package/dist/chunk-SYE5XLF3.js +104 -0
  40. package/dist/chunk-T5VLYBZD.js +103 -0
  41. package/dist/chunk-TOQB7VWU.js +3238 -0
  42. package/dist/chunk-VFNMZ4ZQ.js +3228 -0
  43. package/dist/chunk-VVTGZNBT.js +1533 -1423
  44. package/dist/chunk-W7Q4RFEV.js +104 -0
  45. package/dist/chunk-XTYYVRLO.js +3190 -0
  46. package/dist/chunk-Y6MDYVJD.js +3063 -0
  47. package/dist/cli/main.js +4072 -629
  48. package/dist/index.d.ts +420 -33
  49. package/dist/index.js +8 -15
  50. package/dist/mcp/server.js +103 -7
  51. package/dist/schema-BAWSG7KY.js +22 -0
  52. package/dist/schema-E3QUPL26.js +20 -0
  53. package/dist/schema-EHL7WUT6.js +20 -0
  54. package/docs/019-USAGE.md +44 -5
  55. package/docs/020-current-implementation.md +8 -8
  56. package/docs/021-DOGFOODING-FINDINGS.md +1 -1
  57. package/docs/CONFIG.md +1123 -0
  58. package/docs/ERRORS.md +383 -0
  59. package/docs/summarization.md +320 -0
  60. package/justfile +40 -0
  61. package/package.json +39 -33
  62. package/research/INDEX.md +315 -0
  63. package/research/code-review/README.md +90 -0
  64. package/research/code-review/cli-error-handling-review.md +979 -0
  65. package/research/code-review/code-review-validation-report.md +464 -0
  66. package/research/code-review/main-ts-review.md +1128 -0
  67. package/research/config-docs/SUMMARY.md +357 -0
  68. package/research/config-docs/TEST-RESULTS.md +776 -0
  69. package/research/config-docs/TODO.md +542 -0
  70. package/research/config-docs/analysis.md +744 -0
  71. package/research/config-docs/fix-validation.md +502 -0
  72. package/research/config-docs/help-audit.md +264 -0
  73. package/research/config-docs/help-system-analysis.md +890 -0
  74. package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
  75. package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
  76. package/research/issue-review.md +603 -0
  77. package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
  78. package/research/llm-summarization/alternative-providers-2026.md +1428 -0
  79. package/research/llm-summarization/anthropic-2026.md +367 -0
  80. package/research/llm-summarization/claude-cli-integration.md +1706 -0
  81. package/research/llm-summarization/cli-integration-patterns.md +3155 -0
  82. package/research/llm-summarization/openai-2026.md +473 -0
  83. package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
  84. package/research/llm-summarization/opencode-cli-integration.md +1552 -0
  85. package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
  86. package/research/llm-summarization/prototype-results.md +56 -0
  87. package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
  88. package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
  89. package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
  90. package/research/mdcontext-pudding/01-index-embed.md +956 -0
  91. package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
  92. package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
  93. package/research/mdcontext-pudding/02-search.md +970 -0
  94. package/research/mdcontext-pudding/03-context.md +779 -0
  95. package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
  96. package/research/mdcontext-pudding/04-tree.md +704 -0
  97. package/research/mdcontext-pudding/05-config.md +1038 -0
  98. package/research/mdcontext-pudding/06-links-summary.txt +87 -0
  99. package/research/mdcontext-pudding/06-links.md +679 -0
  100. package/research/mdcontext-pudding/07-stats.md +693 -0
  101. package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
  102. package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
  103. package/research/mdcontext-pudding/README.md +168 -0
  104. package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
  105. package/research/research-quality-review.md +834 -0
  106. package/research/semantic-search/embedding-text-analysis.md +156 -0
  107. package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
  108. package/research/semantic-search/query-processing-analysis.md +207 -0
  109. package/research/semantic-search/root-cause-and-solution.md +114 -0
  110. package/research/semantic-search/threshold-validation-report.md +69 -0
  111. package/research/semantic-search/vector-search-analysis.md +63 -0
  112. package/research/test-path-issues.md +276 -0
  113. package/review/ALP-76/1-error-type-design.md +962 -0
  114. package/review/ALP-76/2-error-handling-patterns.md +906 -0
  115. package/review/ALP-76/3-error-presentation.md +624 -0
  116. package/review/ALP-76/4-test-coverage.md +625 -0
  117. package/review/ALP-76/5-migration-completeness.md +440 -0
  118. package/review/ALP-76/6-effect-best-practices.md +755 -0
  119. package/scripts/apply-branch-protection.sh +47 -0
  120. package/scripts/branch-protection-templates.json +79 -0
  121. package/scripts/prototype-summarization.ts +346 -0
  122. package/scripts/rebuild-hnswlib.js +32 -37
  123. package/scripts/setup-branch-protection.sh +64 -0
  124. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
  125. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
  126. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
  127. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
  128. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  129. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  130. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
  131. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
  132. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
  133. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
  134. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
  135. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
  136. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
  137. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
  138. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
  139. package/src/cli/argv-preprocessor.test.ts +2 -2
  140. package/src/cli/cli.test.ts +230 -33
  141. package/src/cli/commands/config-cmd.ts +642 -0
  142. package/src/cli/commands/context.ts +97 -9
  143. package/src/cli/commands/duplicates.ts +122 -0
  144. package/src/cli/commands/embeddings.ts +529 -0
  145. package/src/cli/commands/index-cmd.ts +210 -30
  146. package/src/cli/commands/index.ts +3 -0
  147. package/src/cli/commands/search.ts +894 -64
  148. package/src/cli/commands/stats.ts +3 -0
  149. package/src/cli/commands/tree.ts +26 -5
  150. package/src/cli/config-layer.ts +176 -0
  151. package/src/cli/error-handler.test.ts +235 -0
  152. package/src/cli/error-handler.ts +655 -0
  153. package/src/cli/flag-schemas.ts +66 -0
  154. package/src/cli/help.ts +209 -7
  155. package/src/cli/main.ts +348 -58
  156. package/src/cli/options.ts +10 -0
  157. package/src/cli/shared-error-handling.ts +199 -0
  158. package/src/cli/utils.ts +150 -17
  159. package/src/config/file-provider.test.ts +320 -0
  160. package/src/config/file-provider.ts +273 -0
  161. package/src/config/index.ts +72 -0
  162. package/src/config/integration.test.ts +667 -0
  163. package/src/config/precedence.test.ts +277 -0
  164. package/src/config/precedence.ts +451 -0
  165. package/src/config/schema.test.ts +414 -0
  166. package/src/config/schema.ts +603 -0
  167. package/src/config/service.test.ts +320 -0
  168. package/src/config/service.ts +243 -0
  169. package/src/config/testing.test.ts +264 -0
  170. package/src/config/testing.ts +110 -0
  171. package/src/core/types.ts +6 -33
  172. package/src/duplicates/detector.test.ts +183 -0
  173. package/src/duplicates/detector.ts +414 -0
  174. package/src/duplicates/index.ts +18 -0
  175. package/src/embeddings/embedding-namespace.test.ts +300 -0
  176. package/src/embeddings/embedding-namespace.ts +947 -0
  177. package/src/embeddings/heading-boost.test.ts +222 -0
  178. package/src/embeddings/hnsw-build-options.test.ts +198 -0
  179. package/src/embeddings/hyde.test.ts +272 -0
  180. package/src/embeddings/hyde.ts +264 -0
  181. package/src/embeddings/index.ts +2 -0
  182. package/src/embeddings/openai-provider.ts +332 -83
  183. package/src/embeddings/pricing.json +22 -0
  184. package/src/embeddings/provider-constants.ts +204 -0
  185. package/src/embeddings/provider-errors.test.ts +967 -0
  186. package/src/embeddings/provider-errors.ts +565 -0
  187. package/src/embeddings/provider-factory.test.ts +240 -0
  188. package/src/embeddings/provider-factory.ts +225 -0
  189. package/src/embeddings/provider-integration.test.ts +788 -0
  190. package/src/embeddings/query-preprocessing.test.ts +187 -0
  191. package/src/embeddings/semantic-search-threshold.test.ts +508 -0
  192. package/src/embeddings/semantic-search.ts +780 -93
  193. package/src/embeddings/types.ts +293 -16
  194. package/src/embeddings/vector-store.ts +486 -77
  195. package/src/embeddings/voyage-provider.ts +313 -0
  196. package/src/errors/errors.test.ts +845 -0
  197. package/src/errors/index.ts +533 -0
  198. package/src/index/ignore-patterns.test.ts +354 -0
  199. package/src/index/ignore-patterns.ts +305 -0
  200. package/src/index/indexer.ts +286 -48
  201. package/src/index/storage.ts +94 -30
  202. package/src/index/types.ts +40 -2
  203. package/src/index/watcher.ts +67 -9
  204. package/src/index.ts +22 -0
  205. package/src/integration/search-keyword.test.ts +678 -0
  206. package/src/mcp/server.ts +135 -6
  207. package/src/parser/parser.ts +18 -19
  208. package/src/parser/section-filter.test.ts +277 -0
  209. package/src/parser/section-filter.ts +125 -3
  210. package/src/search/__tests__/hybrid-search.test.ts +650 -0
  211. package/src/search/bm25-store.ts +366 -0
  212. package/src/search/cross-encoder.test.ts +253 -0
  213. package/src/search/cross-encoder.ts +406 -0
  214. package/src/search/fuzzy-search.test.ts +419 -0
  215. package/src/search/fuzzy-search.ts +273 -0
  216. package/src/search/hybrid-search.ts +448 -0
  217. package/src/search/path-matcher.test.ts +276 -0
  218. package/src/search/path-matcher.ts +33 -0
  219. package/src/search/searcher.test.ts +99 -1
  220. package/src/search/searcher.ts +189 -67
  221. package/src/search/wink-bm25.d.ts +30 -0
  222. package/src/summarization/cli-providers/claude.ts +202 -0
  223. package/src/summarization/cli-providers/detection.test.ts +273 -0
  224. package/src/summarization/cli-providers/detection.ts +118 -0
  225. package/src/summarization/cli-providers/index.ts +8 -0
  226. package/src/summarization/cost.test.ts +139 -0
  227. package/src/summarization/cost.ts +102 -0
  228. package/src/summarization/error-handler.test.ts +127 -0
  229. package/src/summarization/error-handler.ts +111 -0
  230. package/src/summarization/index.ts +102 -0
  231. package/src/summarization/pipeline.test.ts +498 -0
  232. package/src/summarization/pipeline.ts +231 -0
  233. package/src/summarization/prompts.test.ts +269 -0
  234. package/src/summarization/prompts.ts +133 -0
  235. package/src/summarization/provider-factory.test.ts +396 -0
  236. package/src/summarization/provider-factory.ts +178 -0
  237. package/src/summarization/types.ts +184 -0
  238. package/src/summarize/summarizer.ts +104 -35
  239. package/src/types/huggingface-transformers.d.ts +66 -0
  240. package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
  241. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  242. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  243. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +4 -4
  244. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +14 -0
  245. package/tests/integration/embed-index.test.ts +712 -0
  246. package/tests/integration/search-context.test.ts +469 -0
  247. package/tests/integration/search-semantic.test.ts +522 -0
  248. package/vitest.config.ts +1 -6
  249. package/AGENTS.md +0 -46
  250. package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
  251. package/tests/fixtures/cli/.mdcontext/vectors.meta.json +0 -1264
package/docs/ERRORS.md ADDED
@@ -0,0 +1,383 @@
# Error Handling Patterns

This document describes the error handling conventions used in mdcontext, following Effect's "errors as values" philosophy.

## Error Type Taxonomy

All domain errors are defined in `src/errors/index.ts` using Effect's `Data.TaggedError`:

```typescript
import { Data } from 'effect'

// ErrorCode is defined in the same module and maps each error to its stable code
export class FileReadError extends Data.TaggedError('FileReadError')<{
  readonly path: string
  readonly message: string
  readonly cause?: unknown
}> {
  get code() { return ErrorCode.FILE_READ }
}
```

### Error Categories

- **File System**: `FileReadError`, `FileWriteError`, `DirectoryCreateError`, `DirectoryWalkError`
- **Parsing**: `ParseError` (for markdown parsing failures)
- **API**: `ApiKeyMissingError`, `ApiKeyInvalidError`
- **Embeddings**: `EmbeddingError` (rate limits, quota, network failures)
- **Index**: `IndexNotFoundError`, `IndexCorruptedError`, `IndexBuildError`
- **Search**: `DocumentNotFoundError`, `EmbeddingsNotFoundError`
- **Config**: `ConfigError`
- **Vector Store**: `VectorStoreError`
- **Watch**: `WatchError`
- **CLI**: `CliValidationError`

## Error Codes

Each error type has a unique error code for programmatic handling (`EmbeddingError` has one code per failure reason, in the E31x range). Error codes are stable identifiers that don't change when messages are updated.

### Code Format

Codes follow the pattern `E{category}{number}`: the first digit is the category and the last two digits number the error within that category.

| Category | Code Range | Description |
|----------|------------|-------------|
| File System | E1xx | File and directory operations |
| Parse | E2xx | Markdown parsing errors |
| API | E3xx | API authentication and embedding errors |
| Index | E4xx | Index operations |
| Search | E5xx | Search operations |
| Vector Store | E6xx | Vector store operations |
| Config | E7xx | Configuration errors |
| Watch | E8xx | File watcher errors |
| CLI | E9xx | CLI validation errors |

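Because the category lives in the first digit, a script or tool can recover it from the code string alone. A minimal sketch, assuming every code is `E` followed by three digits as in the tables here; `parseErrorCode` and `CATEGORY_BY_DIGIT` are hypothetical names, not mdcontext exports:

```typescript
// Category names copied from the "Code Format" table; the first digit selects one
const CATEGORY_BY_DIGIT: Record<string, string> = {
  '1': 'File System',
  '2': 'Parse',
  '3': 'API',
  '4': 'Index',
  '5': 'Search',
  '6': 'Vector Store',
  '7': 'Config',
  '8': 'Watch',
  '9': 'CLI',
}

interface ParsedErrorCode {
  readonly category: string
  readonly number: number
}

// Hypothetical helper: splits "E312" into category "API" and number 12.
// Returns null for strings that don't follow the E{category}{number} pattern.
function parseErrorCode(code: string): ParsedErrorCode | null {
  const match = /^E([1-9])(\d{2})$/.exec(code)
  if (match === null) return null
  const digit = match[1]
  const rest = match[2]
  if (digit === undefined || rest === undefined) return null
  const category = CATEGORY_BY_DIGIT[digit]
  if (category === undefined) return null
  return { category, number: Number(rest) }
}
```

For example, `parseErrorCode('E312')` returns `{ category: 'API', number: 12 }`, while a tag such as `'FileReadError'` returns `null`.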
### Error Code Reference

| Code | Error Type | Description |
|------|------------|-------------|
| E100 | FileReadError | Cannot read file |
| E101 | FileWriteError | Cannot write file |
| E102 | DirectoryCreateError | Cannot create directory |
| E103 | DirectoryWalkError | Cannot traverse directory |
| E200 | ParseError | Markdown parsing failed |
| E300 | ApiKeyMissingError | API key not set in environment |
| E301 | ApiKeyInvalidError | API key rejected by provider |
| E310 | EmbeddingError (RateLimit) | Rate limit exceeded |
| E311 | EmbeddingError (QuotaExceeded) | API quota exceeded |
| E312 | EmbeddingError (Network) | Network error during embedding |
| E313 | EmbeddingError (ModelError) | Model error |
| E319 | EmbeddingError (Unknown) | Unknown embedding error |
| E400 | IndexNotFoundError | Index does not exist |
| E401 | IndexCorruptedError | Index is corrupted |
| E402 | IndexBuildError | Failed to build index |
| E500 | DocumentNotFoundError | Document not in index |
| E501 | EmbeddingsNotFoundError | Embeddings not found |
| E600 | VectorStoreError | Vector store operation failed |
| E700 | ConfigError | Configuration error |
| E800 | WatchError | File watcher error |
| E900 | CliValidationError | Invalid CLI arguments |

### Exit Codes

CLI exit codes map to error categories:

| Exit Code | Category | Description |
|-----------|----------|-------------|
| 0 | Success | Operation completed successfully |
| 1 | User Error | Invalid arguments, missing config, etc. |
| 2 | System Error | File system, network, etc. |
| 3 | API Error | Authentication, rate limits, etc. |

### Usage in Scripts

Error codes enable reliable scripting and CI/CD integration:

```bash
# Check for specific error codes in output
mdcontext search "query" 2>&1 | grep -q "\[E400\]" && echo "Index not found"

# Use exit codes for control flow
mdcontext index || {
  case $? in
    1) echo "User error - check arguments" ;;
    2) echo "System error - check permissions" ;;
    3) echo "API error - check credentials" ;;
  esac
}
```

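The same checks can be made from a Node.js script. A minimal sketch using `spawnSync` from `node:child_process`; it assumes `mdcontext` is on `PATH` and, like the `grep` example, that codes appear in the output in square brackets such as `[E400]`. `classifyExit`, `extractErrorCodes`, and `runMdcontext` are hypothetical helpers:

```typescript
import { spawnSync } from 'node:child_process'

type ExitCategory = 'success' | 'user' | 'system' | 'api' | 'unknown'

// Mirrors the "Exit Codes" table
function classifyExit(status: number | null): ExitCategory {
  switch (status) {
    case 0:
      return 'success'
    case 1:
      return 'user'
    case 2:
      return 'system'
    case 3:
      return 'api'
    default:
      // null means the process was killed by a signal or never started
      return 'unknown'
  }
}

// Collects every bracketed code such as "[E400]" in the order it appears
function extractErrorCodes(output: string): string[] {
  return Array.from(output.matchAll(/\[(E\d{3})\]/g), (m) => m[1] ?? '')
}

// Runs the CLI and reports both the exit category and any error codes printed
function runMdcontext(args: string[]): { category: ExitCategory; codes: string[] } {
  const result = spawnSync('mdcontext', args, { encoding: 'utf-8' })
  const output = `${result.stdout ?? ''}${result.stderr ?? ''}`
  return { category: classifyExit(result.status), codes: extractErrorCodes(output) }
}
```

A CI step could then call `runMdcontext(['index'])` and branch on `category` the way the `case` statement does.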
### Programmatic Access

```typescript
import { FileReadError, ErrorCode } from './errors/index.js'

const error = new FileReadError({ path: '/file.md', message: 'ENOENT' })
console.log(error.code) // 'E100'
console.log(error._tag) // 'FileReadError'
console.log(error.code === ErrorCode.FILE_READ) // true
```

## Transformation Patterns

### 1. `mapError` - Transform Error Types

Use `mapError` to convert low-level errors to domain errors. This preserves error specificity while adapting the error type.

```typescript
// GOOD - Maps to domain error with context
parse(content, options).pipe(
  Effect.mapError((e) =>
    new ParseError({
      message: e.message,
      path: filePath,
      cause: e,
    })
  )
)

// BAD - Loses type information
Effect.mapError((e) => new Error(`${e._tag}: ${e.message}`))
```

**When to use:**
- Converting library errors to domain errors
- Adding context (path, operation) to errors
- Translating between error domains
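
To see what `mapError` changes and what it leaves alone, here is a small runnable sketch. It assumes the `effect` package (v3) is installed; `LibraryParseError` and `parseMarkdown` are hypothetical stand-ins for a third-party parser, and `ParseError` is redeclared locally with the fields used above:

```typescript
import { Data, Effect, Either } from 'effect'

// Hypothetical low-level error, as a markdown library might report it
class LibraryParseError extends Data.TaggedError('LibraryParseError')<{
  readonly message: string
}> {}

// Local stand-in for the domain ParseError (fields mirror the example above)
class ParseError extends Data.TaggedError('ParseError')<{
  readonly message: string
  readonly path: string
  readonly cause?: unknown
}> {}

// Stand-in parser: fails on blank input, otherwise counts heading lines
const parseMarkdown = (content: string): Effect.Effect<number, LibraryParseError> =>
  content.trim() === ''
    ? Effect.fail(new LibraryParseError({ message: 'empty document' }))
    : Effect.succeed(content.split('\n').filter((line) => line.startsWith('#')).length)

// Only the error channel changes: LibraryParseError becomes ParseError with the path attached
const parseFile = (filePath: string, content: string): Effect.Effect<number, ParseError> =>
  parseMarkdown(content).pipe(
    Effect.mapError((e) => new ParseError({ message: e.message, path: filePath, cause: e })),
  )

const failed = Effect.runSync(Effect.either(parseFile('notes.md', '   ')))
if (Either.isLeft(failed)) {
  console.log(failed.left._tag, failed.left.path) // ParseError notes.md
}
```

The original library error survives as `cause`, which is what the "Preserve cause chains" practice relies on.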

### 2. `catchTag` / `catchTags` - Handle Specific Errors

Use `catchTag` when you need to handle a specific known error type. This enables exhaustive error handling and type-safe recovery.

```typescript
// Handle specific error with recovery
estimateEmbeddingCost(dir).pipe(
  Effect.catchTag('IndexNotFoundError', () =>
    Effect.succeed(null) // Index doesn't exist, return null estimate
  )
)

// Handle multiple specific errors
buildEmbeddings(dir).pipe(
  Effect.catchTags({
    ApiKeyMissingError: (e) => {
      console.error(e.message)
      return Effect.succeed(null)
    },
    ApiKeyInvalidError: (e) => {
      console.error(e.message)
      return Effect.succeed(null)
    },
  })
)
```

**When to use:**
- Recovering from expected error conditions
- Providing fallback values for specific failures
- Implementing retry logic for transient errors
- Filtering/handling known error types mid-pipeline
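
`catchTag` also narrows the error type: the handled tag disappears from the error channel while other errors pass through untouched. A runnable sketch, assuming `effect` v3; `loadIndex` is a hypothetical loader and the two error classes are local stand-ins for the ones in `src/errors/index.ts`:

```typescript
import { Data, Effect, Either } from 'effect'

class IndexNotFoundError extends Data.TaggedError('IndexNotFoundError')<{
  readonly path: string
}> {}

class IndexCorruptedError extends Data.TaggedError('IndexCorruptedError')<{
  readonly path: string
  readonly reason: string
}> {}

// Hypothetical loader whose outcome depends on the directory name
const loadIndex = (dir: string): Effect.Effect<number, IndexNotFoundError | IndexCorruptedError> => {
  if (dir === 'missing') return Effect.fail(new IndexNotFoundError({ path: dir }))
  if (dir === 'broken') return Effect.fail(new IndexCorruptedError({ path: dir, reason: 'InvalidJson' }))
  return Effect.succeed(42)
}

// The annotation documents what catchTag leaves behind: IndexNotFoundError is gone,
// IndexCorruptedError still propagates to the caller
const loadOrNull = (dir: string): Effect.Effect<number | null, IndexCorruptedError> =>
  loadIndex(dir).pipe(Effect.catchTag('IndexNotFoundError', () => Effect.succeed(null)))
```

If the `catchTag` call were removed, the return-type annotation would stop compiling, which is the type-safety benefit described above.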

### 3. `catchAll` - Boundary Error Handling

Use `catchAll` **only at system boundaries** where all errors must be converted to a final format (user message, JSON response, etc.).

```typescript
// GOOD - At CLI boundary (main.ts)
program.pipe(
  Effect.catchAll((error) => {
    console.error(formatError(error))
    return Effect.succeed(ExitCode.failure)
  })
)

// GOOD - At MCP boundary (server.ts)
// MCP protocol requires JSON responses for all operations
handler.pipe(
  Effect.catchAll((e) =>
    Effect.succeed({
      isError: true,
      content: [{ type: 'text', text: `Error: ${e.message}` }]
    })
  )
)

// BAD - In middle of pipeline (loses type information)
readFile(path).pipe(
  Effect.catchAll(() => Effect.succeed(null)) // Silent failure!
)
```

**When to use:**
- CLI entry points (converting to exit codes)
- MCP/API handlers (converting to protocol responses)
- Top-level program error handling

**When NOT to use:**
- Middle of pipelines (use `catchTag` instead)
- When error type information is needed downstream
- For silent failures without logging
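
At a boundary, `catchAll` is what turns the error type into `never`, which is a useful compile-time check that nothing can escape. A sketch of an MCP-style handler under that pattern, assuming `effect` v3; `getDocument`, `handleTool`, and the `ToolResponse` shape are hypothetical, modeled on the server example above:

```typescript
import { Data, Effect } from 'effect'

class DocumentNotFoundError extends Data.TaggedError('DocumentNotFoundError')<{
  readonly message: string
}> {}

// Response shape modeled on the MCP example above
interface ToolResponse {
  readonly isError?: boolean
  readonly content: ReadonlyArray<{ readonly type: 'text'; readonly text: string }>
}

// Hypothetical lookup that fails for anything except one known document
const getDocument = (path: string): Effect.Effect<string, DocumentNotFoundError> =>
  path === 'readme.md'
    ? Effect.succeed('# Readme')
    : Effect.fail(new DocumentNotFoundError({ message: `Document not found: ${path}` }))

// Effect.Effect<ToolResponse> means the error channel is never: every failure became data
const handleTool = (path: string): Effect.Effect<ToolResponse> =>
  getDocument(path).pipe(
    Effect.map((text): ToolResponse => ({ content: [{ type: 'text', text }] })),
    Effect.catchAll((e) =>
      Effect.succeed<ToolResponse>({ isError: true, content: [{ type: 'text', text: `Error: ${e.message}` }] }),
    ),
  )
```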

### 4. `Effect.tryPromise` / `Effect.try` - Lift External Operations

Use these to wrap promise-based or synchronous operations, converting thrown exceptions and rejected promises into typed Effect failures.

```typescript
// For promises
Effect.tryPromise({
  try: () => fs.readFile(path, 'utf-8'),
  catch: (e) => new FileReadError({
    path,
    message: e instanceof Error ? e.message : String(e),
    cause: e,
  })
})

// For synchronous code that may throw
Effect.try({
  try: () => JSON.parse(content),
  catch: (e) => new IndexCorruptedError({
    path,
    reason: 'InvalidJson',
    details: e instanceof Error ? e.message : undefined,
  })
})
```

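The synchronous case can be exercised end to end. A sketch assuming `effect` v3, with `IndexCorruptedError` redeclared locally using the `path`, `reason`, and `details` fields from the example above; `parseIndexJson` is a hypothetical name:

```typescript
import { Data, Effect, Either } from 'effect'

class IndexCorruptedError extends Data.TaggedError('IndexCorruptedError')<{
  readonly path: string
  readonly reason: string
  readonly details?: string
}> {}

// A throw from JSON.parse becomes a typed IndexCorruptedError in the error channel
const parseIndexJson = (path: string, content: string): Effect.Effect<unknown, IndexCorruptedError> =>
  Effect.try({
    try: (): unknown => JSON.parse(content),
    catch: (e) =>
      new IndexCorruptedError({
        path,
        reason: 'InvalidJson',
        details: e instanceof Error ? e.message : undefined,
      }),
  })

const result = Effect.runSync(Effect.either(parseIndexJson('sections.json', '{not json')))
if (Either.isLeft(result)) {
  console.log(result.left.reason) // InvalidJson
}
```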
## Best Practices

### Do's

- **Always use domain errors** - Never map to generic `Error`
- **Preserve cause chains** - Include the `cause` field for debugging
- **Add context** - Include path, operation, and relevant metadata
- **Document error types** - Use JSDoc to list the errors each function can fail with
- **Log at boundaries** - When swallowing errors, log for debugging

### Don'ts

- **Don't swallow errors silently** - Always log or handle explicitly
- **Don't use `catchAll` mid-pipeline** - Use `catchTag` instead
- **Don't mix paradigms** - Avoid try/catch inside Effect.gen
- **Don't map to generic Error** - Always use typed domain errors

### Batch Processing Pattern

When processing multiple items where individual failures shouldn't stop the batch:

```typescript
// Failures are collected here instead of aborting the batch
const errors: Array<{ path: string; message: string }> = []

const processFile = (path: string) =>
  Effect.gen(function* () {
    // ... processing logic
  }).pipe(
    // Note: catchAll intentional for batch processing
    // Individual file failures collected in errors array
    // rather than stopping the entire operation
    Effect.catchAll((error) => {
      errors.push({ path, message: error.message })
      return Effect.void
    })
  )
```

Always add a comment explaining why `catchAll` is appropriate.
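
If shared mutable state is undesirable, the same collect-don't-abort behavior can be expressed by turning each file's outcome into an `Either`. This is an alternative sketch, not taken from the mdcontext source; it assumes `effect` v3, and `processFile` / `processAll` are hypothetical:

```typescript
import { Data, Effect, Either } from 'effect'

class FileReadError extends Data.TaggedError('FileReadError')<{
  readonly path: string
  readonly message: string
}> {}

// Stand-in for per-file work: paths ending in ".bad" fail, others yield their length
const processFile = (path: string): Effect.Effect<number, FileReadError> =>
  path.endsWith('.bad')
    ? Effect.fail(new FileReadError({ path, message: 'ENOENT' }))
    : Effect.succeed(path.length)

// Effect.either moves each failure into the success channel, so the batch itself never fails
const processAll = (paths: ReadonlyArray<string>) =>
  Effect.forEach(paths, (path) => Effect.either(processFile(path))).pipe(
    Effect.map((outcomes) => {
      const processed: number[] = []
      const errors: Array<{ path: string; message: string }> = []
      for (const outcome of outcomes) {
        if (Either.isRight(outcome)) processed.push(outcome.right)
        else errors.push({ path: outcome.left.path, message: outcome.left.message })
      }
      return { processed, errors }
    }),
  )
```

The trade-off is that no `catchAll` is needed at all, so there is nothing to justify in a comment; the cost is a small amount of unpacking at the end.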

### Graceful Degradation Pattern

When a feature is optional and failure shouldn't block the main operation:

```typescript
// Optional embedding cost estimate for user prompt
const estimate = yield* estimateEmbeddingCost(dir).pipe(
  Effect.catchTag('IndexNotFoundError', () => Effect.succeed(null)),
  // Note: catchAll for graceful degradation
  // This is optional information - failure shouldn't block indexing
  // Log through the current fiber rather than starting a nested Effect.runSync
  Effect.catchAll((e) =>
    Effect.logWarning(`Could not estimate: ${e.message}`).pipe(Effect.as(null))
  )
)
```

## Summarization Error Handling

AI summarization uses a separate error system with graceful degradation: errors never prevent search results from being displayed.

### Summarization Error Codes

| Code | Meaning | Suggested Action |
|------|---------|------------------|
| PROVIDER_NOT_FOUND | Provider name unknown | Check provider spelling |
| PROVIDER_NOT_AVAILABLE | CLI tool not installed | Install the CLI tool |
| CLI_EXECUTION_FAILED | CLI process error | Check CLI authentication |
| API_REQUEST_FAILED | API call failed | Check API key and network |
| RATE_LIMITED | Too many requests | Wait and retry |
| INVALID_RESPONSE | Bad provider response | Report as bug |
| TIMEOUT | Request timed out | Reduce result set |
| NO_API_KEY | Missing API key | Set environment variable |

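Because the table pairs each code with a fixed action, a `Record` keyed on a union of the codes lets the compiler flag any code that is missing an entry. A sketch; the type and function names are hypothetical and the strings are copied from the table:

```typescript
type SummarizationErrorCode =
  | 'PROVIDER_NOT_FOUND'
  | 'PROVIDER_NOT_AVAILABLE'
  | 'CLI_EXECUTION_FAILED'
  | 'API_REQUEST_FAILED'
  | 'RATE_LIMITED'
  | 'INVALID_RESPONSE'
  | 'TIMEOUT'
  | 'NO_API_KEY'

// Omitting a key here is a compile error, so the table and the code stay in sync
const SUGGESTED_ACTION: Record<SummarizationErrorCode, string> = {
  PROVIDER_NOT_FOUND: 'Check provider spelling',
  PROVIDER_NOT_AVAILABLE: 'Install the CLI tool',
  CLI_EXECUTION_FAILED: 'Check CLI authentication',
  API_REQUEST_FAILED: 'Check API key and network',
  RATE_LIMITED: 'Wait and retry',
  INVALID_RESPONSE: 'Report as bug',
  TIMEOUT: 'Reduce result set',
  NO_API_KEY: 'Set environment variable',
}

function describeSummarizationError(code: SummarizationErrorCode, message: string): string {
  return `Summarization failed (${code}): ${message}. ${SUGGESTED_ACTION[code]}.`
}
```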
### Troubleshooting Summarization

**"CLI tool 'claude' not found"**
```bash
# Install Claude Code
# Visit: https://claude.ai/download
```

**"CLI tool 'opencode' not found"**
```bash
# Install OpenCode
npm install -g @opencode/cli
# Or: https://github.com/opencode-ai/opencode
```

**"Authentication failed for anthropic"**
```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

**"Rate limit exceeded"**
- Wait 60 seconds and retry
- Consider switching to a CLI provider (free with subscription)

**"Summarization failed: timeout"**
- Reduce results: `mdcontext search "query" --limit 5 --summarize`
- The default timeout is 60 seconds

**"No summarization providers available"**
Either:
1. Install a CLI tool: `claude`, `opencode`, or `gh copilot`
2. Configure an API provider with a valid API key

### Graceful Degradation

Summarization errors never crash the CLI. When summarization fails:

1. Error message is displayed
2. Search results are shown normally
3. Exit code remains 0 (success)

```typescript
// Implementation pattern in search.ts
const runSummarization = (options: SummarizationOptions): Effect.Effect<void, never> =>
  runSummarizationUnsafe(options).pipe(
    Effect.catchAll((error) =>
      Effect.sync(() => {
        displaySummarizationError(error)
        // Search results still displayed
      }),
    ),
  )
```

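The three guarantees in that list can be checked with a small runnable model of the pattern. It assumes `effect` v3; `SummarizationError`, `runSummarizationUnsafe`, `searchCommand`, and the `output` array (standing in for the terminal) are hypothetical:

```typescript
import { Data, Effect } from 'effect'

class SummarizationError extends Data.TaggedError('SummarizationError')<{
  readonly code: string
  readonly message: string
}> {}

// Lines that would have been printed to the terminal
const output: string[] = []

// Hypothetical provider call that can fail
const runSummarizationUnsafe = (succeed: boolean): Effect.Effect<void, SummarizationError> =>
  succeed
    ? Effect.sync(() => {
        output.push('Summary text')
      })
    : Effect.fail(new SummarizationError({ code: 'TIMEOUT', message: 'Request timed out' }))

// Same shape as runSummarization above: the error channel is never
const runSummarization = (succeed: boolean): Effect.Effect<void> =>
  runSummarizationUnsafe(succeed).pipe(
    Effect.catchAll((error) =>
      Effect.sync(() => {
        output.push(`Summarization failed: ${error.message}`)
      }),
    ),
  )

// Results are printed first, so a summarization failure cannot hide them
const searchCommand = (succeed: boolean): Effect.Effect<number> =>
  Effect.gen(function* () {
    output.push('result 1', 'result 2')
    yield* runSummarization(succeed)
    return 0 // exit code
  })
```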
367
+
368
+ ## Error Formatting
369
+
370
+ Error formatting (user-friendly messages) should only happen at the CLI boundary in `src/cli/error-handler.ts`. Internal errors carry structured data; presentation is separate from logic.
371
+
372
+ ```typescript
373
+ // src/cli/error-handler.ts
374
+ const formatError = (error: MdContextError): string => {
375
+ switch (error._tag) {
376
+ case 'FileReadError':
377
+ return `Cannot read file: ${error.path}\n${error.message}`
378
+ case 'ApiKeyMissingError':
379
+ return `API key not configured.\n\nSet ${error.envVar} environment variable.`
380
+ // ... other error types
381
+ }
382
+ }
383
+ ```
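
Because every error carries a literal `_tag`, the `switch` can be made exhaustive so that adding a new error type without a formatting branch fails to compile. A dependency-free sketch using two hypothetical interfaces in place of the real `MdContextError` union (field names follow the example above):

```typescript
interface FileReadError {
  readonly _tag: 'FileReadError'
  readonly path: string
  readonly message: string
}

interface ApiKeyMissingError {
  readonly _tag: 'ApiKeyMissingError'
  readonly envVar: string
}

type ExampleError = FileReadError | ApiKeyMissingError

// Reached only if a union member has no case; the `never` parameter makes that a compile error
function assertNever(value: never): never {
  throw new Error(`Unhandled error: ${JSON.stringify(value)}`)
}

function formatError(error: ExampleError): string {
  switch (error._tag) {
    case 'FileReadError':
      return `Cannot read file: ${error.path}\n${error.message}`
    case 'ApiKeyMissingError':
      return `API key not configured.\n\nSet ${error.envVar} environment variable.`
    default:
      return assertNever(error)
  }
}
```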
package/docs/summarization.md ADDED
@@ -0,0 +1,320 @@
# AI Summarization Architecture

This document covers the architecture and implementation details of mdcontext's AI-powered search result summarization feature.

## Overview

mdcontext can generate AI-powered summaries of search results using either:

1. **CLI tools** (Claude Code, Copilot CLI, OpenCode) - No per-query cost; included in your existing subscription
2. **API providers** (DeepSeek, Anthropic, OpenAI, Gemini) - Pay per query

The design prioritizes CLI providers because they reuse subscriptions developers already have.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         CLI (search.ts)                         │
│        --summarize flag triggers summarization pipeline         │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Provider Factory                         │
│        getBestAvailableSummarizer() / createSummarizer()        │
│        - Detects installed CLI tools                            │
│        - Creates appropriate provider instance                  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                 ┌───────────────┴───────────────┐
                 ▼                               ▼
      ┌─────────────────────┐         ┌─────────────────────┐
      │    CLI Providers    │         │    API Providers    │
      │       (Free)        │         │    (Pay-per-use)    │
      │                     │         │                     │
      │  - ClaudeCLI        │         │  - DeepSeek         │
      │  - OpenCode         │         │  - Anthropic        │
      │  - Copilot          │         │  - OpenAI           │
      │  - Aider            │         │  - Gemini           │
      │  - Cline            │         │  - Qwen             │
      └─────────────────────┘         └─────────────────────┘
                 │                               │
                 └───────────────┬───────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Summarizer Interface                       │
│        summarize(input, prompt) → SummaryResult                 │
│        summarizeStream(input, prompt, options) → void           │
│        estimateCost(inputTokens) → number                       │
│        isAvailable() → boolean                                  │
└─────────────────────────────────────────────────────────────────┘
```

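The bottom layer of the diagram can be read as a TypeScript interface, with the factory choosing among implementations. The sketch below is a hypothetical reading, not the types from `src/summarization/types.ts`: the `SummaryResult` shape is assumed, `summarizeStream` is omitted, and `pickSummarizer` only illustrates the CLI-first preference described in the Overview:

```typescript
// Assumed result shape; the real SummaryResult may carry more fields
interface SummaryResult {
  readonly summary: string
}

interface Summarizer {
  summarize(input: string, prompt: string): Promise<SummaryResult>
  estimateCost(inputTokens: number): number
  isAvailable(): boolean
}

interface Candidate {
  readonly mode: 'cli' | 'api'
  readonly name: string
  readonly summarizer: Summarizer
}

// CLI-first selection: the first available CLI provider wins, then the first available API provider
function pickSummarizer(candidates: ReadonlyArray<Candidate>): Candidate | undefined {
  const available = candidates.filter((c) => c.summarizer.isAvailable())
  return available.find((c) => c.mode === 'cli') ?? available.find((c) => c.mode === 'api')
}

// Minimal stub for illustration: fixed availability and per-token price
const stub = (available: boolean, pricePerToken: number): Summarizer => ({
  summarize: async (input) => ({ summary: input.slice(0, 40) }),
  estimateCost: (inputTokens) => inputTokens * pricePerToken,
  isAvailable: () => available,
})
```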
## Components

### Provider Detection (`cli-providers/detection.ts`)

Automatically discovers installed CLI tools:

```typescript
import { detectInstalledCLIs } from './summarization/index.js'

const installed = await detectInstalledCLIs()
// [{ name: 'claude', command: 'claude', displayName: 'Claude Code', ... }]
```

Detection runs `which` (Unix) or `where` (Windows) via `spawn()`, never through shell interpolation.
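
A minimal sketch of that detection approach using Node's `spawn` from `node:child_process`; `lookupCommand` and `isCommandInstalled` are hypothetical names, not mdcontext's exports. The command name is passed as an argument-array entry, so no shell ever parses it:

```typescript
import { spawn } from 'node:child_process'

// `where` on Windows, `which` elsewhere
function lookupCommand(platform: NodeJS.Platform = process.platform): string {
  return platform === 'win32' ? 'where' : 'which'
}

// Resolves true when the lookup tool exits 0, i.e. the command was found on PATH
function isCommandInstalled(command: string, platform: NodeJS.Platform = process.platform): Promise<boolean> {
  return new Promise((resolve) => {
    const child = spawn(lookupCommand(platform), [command], { stdio: 'ignore' })
    child.on('error', () => resolve(false)) // the lookup tool itself could not be started
    child.on('close', (code) => resolve(code === 0))
  })
}
```

Usage would look like `const hasClaude = await isCommandInstalled('claude')`.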

### Provider Factory (`provider-factory.ts`)

Creates summarizer instances based on configuration:

```typescript
import { createSummarizer, getBestAvailableSummarizer } from './summarization/index.js'

// Auto-detect best available provider
const result = await getBestAvailableSummarizer()
if (result) {
  const { summarizer, config } = result
  // Use summarizer...
}

// Or create from explicit config
const summarizer = await createSummarizer({
  mode: 'cli',
  provider: 'claude',
})
```
90
+ ### Cost Estimation (`cost.ts`)
91
+
92
+ Estimates costs before execution:
93
+
94
+ ```typescript
95
+ import { estimateSummaryCost, formatCostDisplay } from './summarization/index.js'
96
+
97
+ const estimate = estimateSummaryCost(inputText, 'api', 'deepseek')
98
+ // {
99
+ // inputTokens: 2500,
100
+ // outputTokens: 500,
101
+ // estimatedCost: 0.0007,
102
+ // provider: 'deepseek',
103
+ // isPaid: true,
104
+ // formattedCost: '$0.0007'
105
+ // }
106
+
107
+ console.log(formatCostDisplay(estimate))
108
+ // "Estimated cost: $0.0007"
109
+ ```
110
+
111
+ CLI providers always return `isPaid: false` with `formattedCost: 'FREE (subscription)'`.
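The arithmetic behind an API estimate can be sketched as follows. The sketch makes several assumptions:

- The ~4 characters per token heuristic comes from the "100K characters (~25K tokens)" figure under Token Limits.
- Output is fixed at 500 tokens, as that section states.
- Prices are in USD per 1M tokens. This unit is an assumption, because `API_PRICING` does not state one.
- The function names are hypothetical.

```typescript
// Hypothetical per-provider pricing, assumed to be USD per 1M tokens.
interface Pricing {
  input: number
  output: number
}

// ~4 characters per token, matching "100K characters (~25K tokens)".
const CHARS_PER_TOKEN = 4
// Output tokens are capped at 500 for cost estimates.
const ESTIMATED_OUTPUT_TOKENS = 500
// Assumed pricing unit: one price applies per 1M tokens.
const TOKENS_PER_PRICE_UNIT = 1_000_000

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

function estimateCostUsd(inputText: string, pricing: Pricing): number {
  const inputTokens = estimateTokens(inputText)
  return (inputTokens * pricing.input + ESTIMATED_OUTPUT_TOKENS * pricing.output) / TOKENS_PER_PRICE_UNIT
}
```

Take the `newapi` prices from "Adding New Providers" (`input: 0.50`, `output: 1.00`). A 10,000-character input then gives 2,500 input tokens, and the cost is (2,500 × 0.50 + 500 × 1.00) / 1,000,000 = $0.00175.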
+
+ ### Prompt Templates (`prompts.ts`)
+
+ Pre-built prompts for different summarization styles:
+
+ | Template | Description |
+ |----------|-------------|
+ | `default` | Balanced summary with key findings |
+ | `concise` | 2-3 sentence quick summary |
+ | `detailed` | Comprehensive analysis |
+ | `actionable` | Focus on next steps |
+ | `technical` | Code patterns and API details |
+
+ ```typescript
+ import { buildPrompt } from './summarization/index.js'
+
+ const prompt = buildPrompt({
+   query: 'authentication',
+   resultCount: 10,
+   searchMode: 'hybrid',
+ }, 'actionable')
+ ```
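The table maps each template name to a style. The sketch below shows how such a lookup could compose a prompt. `PromptContext`, `TEMPLATE_INSTRUCTIONS`, `buildPromptSketch`, and the instruction wording are all hypothetical. The wording paraphrases the table and is not the package's actual prompt text.

```typescript
type TemplateName = 'default' | 'concise' | 'detailed' | 'actionable' | 'technical'

// Hypothetical context, mirroring the fields passed to buildPrompt() above.
interface PromptContext {
  query: string
  resultCount: number
  searchMode: string
}

// Hypothetical one-line instructions paraphrasing the template table.
const TEMPLATE_INSTRUCTIONS: Record<TemplateName, string> = {
  default: 'Write a balanced summary that highlights the key findings.',
  concise: 'Summarize in 2-3 sentences.',
  detailed: 'Provide a comprehensive analysis.',
  actionable: 'Focus on concrete next steps.',
  technical: 'Focus on code patterns and API details.',
}

function buildPromptSketch(context: PromptContext, template: TemplateName = 'default'): string {
  return [
    `The following ${context.resultCount} results came from a ${context.searchMode} search for "${context.query}".`,
    TEMPLATE_INSTRUCTIONS[template],
  ].join('\n')
}
```

Typing the table as `Record<TemplateName, string>` makes a missing template a compile-time error rather than a runtime `undefined`.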
+
+ ### Error Handling (`error-handler.ts`)
+
+ Failures degrade gracefully, so search results are still displayed when summarization fails:
+
+ ```typescript
+ import { displaySummarizationError, isRecoverableError } from './summarization/index.js'
+
+ try {
+   await summarizer.summarize(input, prompt)
+ } catch (error) {
+   if (isRecoverableError(error)) {
+     // Retry logic
+   } else {
+     displaySummarizationError(error)
+     // Shows user-friendly message, search results still displayed
+   }
+ }
+ ```
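The `// Retry logic` branch above is left open. One common way to fill it is exponential backoff with a bounded number of attempts. In this sketch, `withRetry` and its defaults (3 attempts, 500 ms base delay) are hypothetical choices, not values from this package.

```typescript
// Hypothetical retry helper: re-runs `operation` while `isRecoverable` says so,
// doubling the delay after each failure, up to `maxAttempts` total attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRecoverable: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      // Give up on non-recoverable errors or once the attempt budget is spent.
      if (!isRecoverable(error) || attempt >= maxAttempts) throw error
      // Exponential backoff: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
      const delay = baseDelayMs * 2 ** (attempt - 1)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

Usage: `await withRetry(() => summarizer.summarize(input, prompt), isRecoverableError)`. This assumes that `isRecoverableError` accepts the caught value and returns a boolean, as its use above suggests.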
+
+ ## Security Considerations
+
+ ### Shell Injection Prevention
+
+ All CLI invocations use `spawn()` with argument arrays, **NEVER** `exec()` with string interpolation:
+
+ ```typescript
+ import { exec, spawn } from 'node:child_process'
+
+ // CORRECT - Safe from shell injection
+ spawn('claude', ['-p', userInput, '--output-format', 'text'])
+
+ // WRONG - Vulnerable to shell injection
+ exec(`claude -p "${userInput}"`) // NEVER DO THIS
+ ```
+
+ This is enforced throughout the codebase: user input is passed as array elements and never interpolated into shell commands. The protection depends on `spawn()`'s `shell` option keeping its default of `false`. With `shell: true`, Node.js joins the command and arguments into one string for the shell to parse.
+
+ ### API Key Handling
+
+ - API keys are sourced from environment variables only
+ - Never stored in config files
+ - Environment variable names follow provider conventions:
+   - `DEEPSEEK_API_KEY`
+   - `ANTHROPIC_API_KEY`
+   - `OPENAI_API_KEY`
+   - `GOOGLE_API_KEY` (for Gemini)
+   - `QWEN_API_KEY`
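The environment-only rule above can be sketched as a lookup that fails with an actionable message. Some names here are hypothetical: `resolveApiKey`, the `API_KEY_ENV_VARS` map, and the provider identifiers `openai`, `gemini`, and `qwen`. Trimming surrounding whitespace is a choice made here, not documented behavior.

```typescript
// Environment variable per API provider, as listed above (identifiers partly assumed).
const API_KEY_ENV_VARS: Record<string, string> = {
  deepseek: 'DEEPSEEK_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  gemini: 'GOOGLE_API_KEY',
  qwen: 'QWEN_API_KEY',
}

// Hypothetical helper: reads the key from the environment only, never from config files.
function resolveApiKey(provider: string, env: Record<string, string | undefined> = process.env): string {
  const variable = API_KEY_ENV_VARS[provider]
  if (!variable) throw new Error(`Unknown API provider: ${provider}`)
  const key = env[variable]?.trim()
  if (!key) throw new Error(`Missing API key for ${provider}: set ${variable}`)
  return key
}
```

Naming the variable in the error lets the user fix the problem directly, as in the "Authentication failed for anthropic" entry under Troubleshooting.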
+
+ ### Timeout Protection
+
+ CLI processes have a default 60-second timeout to prevent hung processes.
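The argument-array rule and the 60-second default can be combined in one helper. This is a minimal sketch. It assumes that a timed-out process is killed with the default signal and that the call is treated as a failure. `runCli` is a hypothetical name.

```typescript
import { spawn } from 'node:child_process'

// 60 seconds, the default described above.
const DEFAULT_TIMEOUT_MS = 60_000

// Hypothetical helper: runs a CLI with an argument array (no shell) and
// rejects if it exits non-zero or exceeds the timeout.
function runCli(command: string, args: string[], timeoutMs = DEFAULT_TIMEOUT_MS): Promise<string> {
  return new Promise((resolve, reject) => {
    const proc = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] })
    let stdout = ''
    let stderr = ''
    // Decode as UTF-8 so multi-byte characters split across chunks stay intact.
    proc.stdout.setEncoding('utf8')
    proc.stderr.setEncoding('utf8')
    proc.stdout.on('data', (chunk: string) => { stdout += chunk })
    proc.stderr.on('data', (chunk: string) => { stderr += chunk })

    // Kill the child if it runs too long, so a hung CLI cannot block forever.
    const timer = setTimeout(() => {
      proc.kill()
      reject(new Error(`${command} timed out after ${timeoutMs}ms`))
    }, timeoutMs)

    proc.on('error', (error) => {
      clearTimeout(timer)
      reject(error)
    })
    proc.on('close', (code) => {
      clearTimeout(timer)
      // After a timeout the promise is already rejected; later calls are no-ops.
      if (code === 0) resolve(stdout)
      else reject(new Error(`${command} exited with code ${code}: ${stderr.trim()}`))
    })
  })
}
```

Setting stdin to `'ignore'` keeps a CLI that reads standard input from waiting for input that never arrives.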
+
+ ## Adding New Providers
+
+ ### CLI Provider
+
+ 1. Add an entry to `KNOWN_CLIS` in `cli-providers/detection.ts`:
+
+ ```typescript
+ {
+   name: 'newcli',
+   command: 'newcli',
+   displayName: 'New CLI Tool',
+   args: ['--prompt'],
+   useStdin: false,
+ }
+ ```
+
+ 2. Create the implementation in `cli-providers/newcli.ts`:
+
+ ```typescript
+ import { spawn } from 'node:child_process'
+ import type { Summarizer, SummaryResult } from '../types.js'
+
+ export class NewCLISummarizer implements Summarizer {
+   async summarize(input: string, prompt: string): Promise<SummaryResult> {
+     // SECURITY: Always use spawn() with argument arrays
+     const proc = spawn('newcli', ['--prompt', prompt, input])
+     // ... implementation
+   }
+
+   async isAvailable(): Promise<boolean> {
+     // Check if CLI is installed
+   }
+ }
+ ```
+
+ 3. Add the provider to the factory in `provider-factory.ts`.
+
+ ### API Provider
+
+ 1. Add pricing to `API_PRICING` in `cost.ts`:
+
+ ```typescript
+ export const API_PRICING = {
+   // ... existing providers
+   newapi: { input: 0.50, output: 1.00, displayName: 'New API' },
+ }
+ ```
+
+ 2. Create an implementation using the Vercel AI SDK (when implemented):
+
+ ```typescript
+ import { createOpenAI } from '@ai-sdk/openai'
+ import type { Summarizer } from '../types.js'
+
+ export class NewAPISummarizer implements Summarizer {
+   // Use Vercel AI SDK for OpenAI-compatible APIs
+ }
+ ```
+
+ ## Performance
+
+ | Provider Type | Latency | Cost |
+ |--------------|---------|------|
+ | CLI (Claude) | 2-5s | Free |
+ | CLI (OpenCode) | 2-5s | Free |
+ | API (DeepSeek) | 1-3s | ~$0.0007/query |
+ | API (OpenAI) | 1-2s | ~$0.005/query |
+
+ ### Token Limits
+
+ - Input is automatically truncated at 100K characters (~25K tokens)
+ - Result content is truncated to 500 chars per result
+ - Output tokens capped at 500 for cost estimates
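The two character limits above can be sketched as a per-result cap followed by a total cap. Several details are assumptions: the helper names, the order of the two caps, the `\n\n` separator, and the plain cut without a truncation marker. The real code may mark truncation differently.

```typescript
// Limits stated in this section.
const MAX_INPUT_CHARS = 100_000
const MAX_RESULT_CHARS = 500

// Plain cut at `max` UTF-16 code units; no truncation marker is added.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max)
}

// Hypothetical: cap each result first, then cap the combined input.
function buildSummaryInput(results: string[]): string {
  const joined = results.map((result) => truncate(result, MAX_RESULT_CHARS)).join('\n\n')
  return truncate(joined, MAX_INPUT_CHARS)
}
```

With the two-character separator assumed here, the 100K total cap only starts cutting at 200 or more full-length results (200 × 500 + 199 × 2 = 100,398 characters).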
+
+ ## Configuration Reference
+
+ ### Config File
+
+ ```javascript
+ // mdcontext.config.js
+ /** @type {import('mdcontext').PartialMdContextConfig} */
+ export default {
+   aiSummarization: {
+     mode: 'cli', // 'cli' or 'api'
+     provider: 'claude', // Provider name
+     model: 'deepseek-chat', // Model name (used only in 'api' mode)
+     stream: false, // Enable streaming
+   },
+ }
+ ```
+
+ ### Environment Variables
+
+ | Variable | Description |
+ |----------|-------------|
+ | `MDCONTEXT_AISUMMARIZATION_MODE` | 'cli' or 'api' |
+ | `MDCONTEXT_AISUMMARIZATION_PROVIDER` | Provider name |
+ | `MDCONTEXT_AISUMMARIZATION_MODEL` | Model name (API only) |
+ | `MDCONTEXT_AISUMMARIZATION_STREAM` | 'true' or 'false' |
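The variables above can be parsed into a partial config as sketched here. `readAiSummarizationEnv` and `AiSummarizationEnv` are hypothetical. Rejecting invalid `MODE` or `STREAM` values, rather than ignoring them, is a choice made for this sketch. How the package merges these values with the config file is not described here, so the sketch does not attempt it.

```typescript
type Mode = 'cli' | 'api'

// Hypothetical partial config assembled from the variables above.
interface AiSummarizationEnv {
  mode?: Mode
  provider?: string
  model?: string
  stream?: boolean
}

function readAiSummarizationEnv(env: Record<string, string | undefined> = process.env): AiSummarizationEnv {
  const result: AiSummarizationEnv = {}

  const mode = env.MDCONTEXT_AISUMMARIZATION_MODE
  if (mode === 'cli' || mode === 'api') result.mode = mode
  else if (mode !== undefined) throw new Error(`Invalid MDCONTEXT_AISUMMARIZATION_MODE: ${mode}`)

  if (env.MDCONTEXT_AISUMMARIZATION_PROVIDER) result.provider = env.MDCONTEXT_AISUMMARIZATION_PROVIDER
  if (env.MDCONTEXT_AISUMMARIZATION_MODEL) result.model = env.MDCONTEXT_AISUMMARIZATION_MODEL

  // Only the literal strings 'true' and 'false' are accepted.
  const stream = env.MDCONTEXT_AISUMMARIZATION_STREAM
  if (stream === 'true' || stream === 'false') result.stream = stream === 'true'
  else if (stream !== undefined) throw new Error(`Invalid MDCONTEXT_AISUMMARIZATION_STREAM: ${stream}`)

  return result
}
```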
+
+ ## Troubleshooting
+
+ ### "CLI tool 'claude' not found"
+
+ **Solution:** Install Claude Code from https://claude.ai/download
+
+ ### "CLI tool 'opencode' not found"
+
+ **Solution:** Install OpenCode from https://github.com/opencode-ai/opencode
+
+ ### "Authentication failed for anthropic"
+
+ **Solution:** Set API key: `export ANTHROPIC_API_KEY=sk-...`
+
+ ### "Rate limit exceeded"
+
+ **Solution:** Wait and retry, or switch to a CLI provider, which costs nothing beyond your existing subscription.
+
+ ### "Summarization failed: timeout"
+
+ **Solution:** Reduce the result set with `--limit`, or increase the timeout in your config.
+
+ ### "No summarization providers available"
+
+ **Solution:** Either:
+ 1. Install a CLI tool (Claude Code, OpenCode)
+ 2. Configure an API provider with a valid API key
+
+ ### OpenCode JSON format errors
+
+ **Solution:** OpenCode's JSON output format is undocumented. Try updating OpenCode, or switch to the Claude CLI.
+
+ ## Related Documentation
+
+ - [README.md](../README.md#ai-summarization) - Quick start guide
+ - [CONFIG.md](./CONFIG.md) - Full configuration reference
+ - [ERRORS.md](./ERRORS.md) - Error handling patterns