mdcontext 0.1.0 → 0.2.0

Files changed (251)
  1. package/.changeset/config.json +9 -9
  2. package/.claude/settings.local.json +25 -0
  3. package/.github/workflows/claude-code-review.yml +44 -0
  4. package/.github/workflows/claude.yml +85 -0
  5. package/CONTRIBUTING.md +186 -0
  6. package/NOTES/NOTES +44 -0
  7. package/README.md +206 -3
  8. package/biome.json +1 -1
  9. package/dist/chunk-23UPXDNL.js +3044 -0
  10. package/dist/chunk-2W7MO2DL.js +1366 -0
  11. package/dist/chunk-3NUAZGMA.js +1689 -0
  12. package/dist/chunk-7TOWB2XB.js +366 -0
  13. package/dist/chunk-7XOTOADQ.js +3065 -0
  14. package/dist/chunk-AH2PDM2K.js +3042 -0
  15. package/dist/chunk-BNXWSZ63.js +3742 -0
  16. package/dist/chunk-BTL5DJVU.js +3222 -0
  17. package/dist/chunk-HDHYG7E4.js +104 -0
  18. package/dist/chunk-HLR4KZBP.js +3234 -0
  19. package/dist/chunk-IP3FRFEB.js +1045 -0
  20. package/dist/chunk-KHU56VDO.js +3042 -0
  21. package/dist/chunk-KRYIFLQR.js +85 -89
  22. package/dist/chunk-LBSDNLEM.js +287 -0
  23. package/dist/chunk-MNTQ7HCP.js +2643 -0
  24. package/dist/chunk-MUJELQQ6.js +1387 -0
  25. package/dist/chunk-MXJGMSLV.js +2199 -0
  26. package/dist/chunk-N6QJGC3Z.js +2636 -0
  27. package/dist/chunk-OBELGBPM.js +1713 -0
  28. package/dist/chunk-OT7R5XTA.js +3192 -0
  29. package/dist/chunk-P7X4RA2T.js +106 -0
  30. package/dist/chunk-PIDUQNC2.js +3185 -0
  31. package/dist/chunk-POGCDIH4.js +3187 -0
  32. package/dist/chunk-PSIEOQGZ.js +3043 -0
  33. package/dist/chunk-PVRT3IHA.js +3238 -0
  34. package/dist/chunk-QNN4TT23.js +1430 -0
  35. package/dist/chunk-RE3R45RJ.js +3042 -0
  36. package/dist/chunk-S7E6TFX6.js +718 -657
  37. package/dist/chunk-SG6GLU4U.js +1378 -0
  38. package/dist/chunk-SJCDV2ST.js +274 -0
  39. package/dist/chunk-SYE5XLF3.js +104 -0
  40. package/dist/chunk-T5VLYBZD.js +103 -0
  41. package/dist/chunk-TOQB7VWU.js +3238 -0
  42. package/dist/chunk-VFNMZ4ZQ.js +3228 -0
  43. package/dist/chunk-VVTGZNBT.js +1533 -1423
  44. package/dist/chunk-W7Q4RFEV.js +104 -0
  45. package/dist/chunk-XTYYVRLO.js +3190 -0
  46. package/dist/chunk-Y6MDYVJD.js +3063 -0
  47. package/dist/cli/main.js +4072 -629
  48. package/dist/index.d.ts +420 -33
  49. package/dist/index.js +8 -15
  50. package/dist/mcp/server.js +103 -7
  51. package/dist/schema-BAWSG7KY.js +22 -0
  52. package/dist/schema-E3QUPL26.js +20 -0
  53. package/dist/schema-EHL7WUT6.js +20 -0
  54. package/docs/019-USAGE.md +44 -5
  55. package/docs/020-current-implementation.md +8 -8
  56. package/docs/021-DOGFOODING-FINDINGS.md +1 -1
  57. package/docs/CONFIG.md +1123 -0
  58. package/docs/ERRORS.md +383 -0
  59. package/docs/summarization.md +320 -0
  60. package/justfile +40 -0
  61. package/package.json +39 -33
  62. package/research/INDEX.md +315 -0
  63. package/research/code-review/README.md +90 -0
  64. package/research/code-review/cli-error-handling-review.md +979 -0
  65. package/research/code-review/code-review-validation-report.md +464 -0
  66. package/research/code-review/main-ts-review.md +1128 -0
  67. package/research/config-docs/SUMMARY.md +357 -0
  68. package/research/config-docs/TEST-RESULTS.md +776 -0
  69. package/research/config-docs/TODO.md +542 -0
  70. package/research/config-docs/analysis.md +744 -0
  71. package/research/config-docs/fix-validation.md +502 -0
  72. package/research/config-docs/help-audit.md +264 -0
  73. package/research/config-docs/help-system-analysis.md +890 -0
  74. package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
  75. package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
  76. package/research/issue-review.md +603 -0
  77. package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
  78. package/research/llm-summarization/alternative-providers-2026.md +1428 -0
  79. package/research/llm-summarization/anthropic-2026.md +367 -0
  80. package/research/llm-summarization/claude-cli-integration.md +1706 -0
  81. package/research/llm-summarization/cli-integration-patterns.md +3155 -0
  82. package/research/llm-summarization/openai-2026.md +473 -0
  83. package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
  84. package/research/llm-summarization/opencode-cli-integration.md +1552 -0
  85. package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
  86. package/research/llm-summarization/prototype-results.md +56 -0
  87. package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
  88. package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
  89. package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
  90. package/research/mdcontext-pudding/01-index-embed.md +956 -0
  91. package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
  92. package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
  93. package/research/mdcontext-pudding/02-search.md +970 -0
  94. package/research/mdcontext-pudding/03-context.md +779 -0
  95. package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
  96. package/research/mdcontext-pudding/04-tree.md +704 -0
  97. package/research/mdcontext-pudding/05-config.md +1038 -0
  98. package/research/mdcontext-pudding/06-links-summary.txt +87 -0
  99. package/research/mdcontext-pudding/06-links.md +679 -0
  100. package/research/mdcontext-pudding/07-stats.md +693 -0
  101. package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
  102. package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
  103. package/research/mdcontext-pudding/README.md +168 -0
  104. package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
  105. package/research/research-quality-review.md +834 -0
  106. package/research/semantic-search/embedding-text-analysis.md +156 -0
  107. package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
  108. package/research/semantic-search/query-processing-analysis.md +207 -0
  109. package/research/semantic-search/root-cause-and-solution.md +114 -0
  110. package/research/semantic-search/threshold-validation-report.md +69 -0
  111. package/research/semantic-search/vector-search-analysis.md +63 -0
  112. package/research/test-path-issues.md +276 -0
  113. package/review/ALP-76/1-error-type-design.md +962 -0
  114. package/review/ALP-76/2-error-handling-patterns.md +906 -0
  115. package/review/ALP-76/3-error-presentation.md +624 -0
  116. package/review/ALP-76/4-test-coverage.md +625 -0
  117. package/review/ALP-76/5-migration-completeness.md +440 -0
  118. package/review/ALP-76/6-effect-best-practices.md +755 -0
  119. package/scripts/apply-branch-protection.sh +47 -0
  120. package/scripts/branch-protection-templates.json +79 -0
  121. package/scripts/prototype-summarization.ts +346 -0
  122. package/scripts/rebuild-hnswlib.js +32 -37
  123. package/scripts/setup-branch-protection.sh +64 -0
  124. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
  125. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
  126. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
  127. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
  128. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  129. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  130. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
  131. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
  132. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
  133. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
  134. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
  135. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
  136. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
  137. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
  138. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
  139. package/src/cli/argv-preprocessor.test.ts +2 -2
  140. package/src/cli/cli.test.ts +230 -33
  141. package/src/cli/commands/config-cmd.ts +642 -0
  142. package/src/cli/commands/context.ts +97 -9
  143. package/src/cli/commands/duplicates.ts +122 -0
  144. package/src/cli/commands/embeddings.ts +529 -0
  145. package/src/cli/commands/index-cmd.ts +210 -30
  146. package/src/cli/commands/index.ts +3 -0
  147. package/src/cli/commands/search.ts +894 -64
  148. package/src/cli/commands/stats.ts +3 -0
  149. package/src/cli/commands/tree.ts +26 -5
  150. package/src/cli/config-layer.ts +176 -0
  151. package/src/cli/error-handler.test.ts +235 -0
  152. package/src/cli/error-handler.ts +655 -0
  153. package/src/cli/flag-schemas.ts +66 -0
  154. package/src/cli/help.ts +209 -7
  155. package/src/cli/main.ts +348 -58
  156. package/src/cli/options.ts +10 -0
  157. package/src/cli/shared-error-handling.ts +199 -0
  158. package/src/cli/utils.ts +150 -17
  159. package/src/config/file-provider.test.ts +320 -0
  160. package/src/config/file-provider.ts +273 -0
  161. package/src/config/index.ts +72 -0
  162. package/src/config/integration.test.ts +667 -0
  163. package/src/config/precedence.test.ts +277 -0
  164. package/src/config/precedence.ts +451 -0
  165. package/src/config/schema.test.ts +414 -0
  166. package/src/config/schema.ts +603 -0
  167. package/src/config/service.test.ts +320 -0
  168. package/src/config/service.ts +243 -0
  169. package/src/config/testing.test.ts +264 -0
  170. package/src/config/testing.ts +110 -0
  171. package/src/core/types.ts +6 -33
  172. package/src/duplicates/detector.test.ts +183 -0
  173. package/src/duplicates/detector.ts +414 -0
  174. package/src/duplicates/index.ts +18 -0
  175. package/src/embeddings/embedding-namespace.test.ts +300 -0
  176. package/src/embeddings/embedding-namespace.ts +947 -0
  177. package/src/embeddings/heading-boost.test.ts +222 -0
  178. package/src/embeddings/hnsw-build-options.test.ts +198 -0
  179. package/src/embeddings/hyde.test.ts +272 -0
  180. package/src/embeddings/hyde.ts +264 -0
  181. package/src/embeddings/index.ts +2 -0
  182. package/src/embeddings/openai-provider.ts +332 -83
  183. package/src/embeddings/pricing.json +22 -0
  184. package/src/embeddings/provider-constants.ts +204 -0
  185. package/src/embeddings/provider-errors.test.ts +967 -0
  186. package/src/embeddings/provider-errors.ts +565 -0
  187. package/src/embeddings/provider-factory.test.ts +240 -0
  188. package/src/embeddings/provider-factory.ts +225 -0
  189. package/src/embeddings/provider-integration.test.ts +788 -0
  190. package/src/embeddings/query-preprocessing.test.ts +187 -0
  191. package/src/embeddings/semantic-search-threshold.test.ts +508 -0
  192. package/src/embeddings/semantic-search.ts +780 -93
  193. package/src/embeddings/types.ts +293 -16
  194. package/src/embeddings/vector-store.ts +486 -77
  195. package/src/embeddings/voyage-provider.ts +313 -0
  196. package/src/errors/errors.test.ts +845 -0
  197. package/src/errors/index.ts +533 -0
  198. package/src/index/ignore-patterns.test.ts +354 -0
  199. package/src/index/ignore-patterns.ts +305 -0
  200. package/src/index/indexer.ts +286 -48
  201. package/src/index/storage.ts +94 -30
  202. package/src/index/types.ts +40 -2
  203. package/src/index/watcher.ts +67 -9
  204. package/src/index.ts +22 -0
  205. package/src/integration/search-keyword.test.ts +678 -0
  206. package/src/mcp/server.ts +135 -6
  207. package/src/parser/parser.ts +18 -19
  208. package/src/parser/section-filter.test.ts +277 -0
  209. package/src/parser/section-filter.ts +125 -3
  210. package/src/search/__tests__/hybrid-search.test.ts +650 -0
  211. package/src/search/bm25-store.ts +366 -0
  212. package/src/search/cross-encoder.test.ts +253 -0
  213. package/src/search/cross-encoder.ts +406 -0
  214. package/src/search/fuzzy-search.test.ts +419 -0
  215. package/src/search/fuzzy-search.ts +273 -0
  216. package/src/search/hybrid-search.ts +448 -0
  217. package/src/search/path-matcher.test.ts +276 -0
  218. package/src/search/path-matcher.ts +33 -0
  219. package/src/search/searcher.test.ts +99 -1
  220. package/src/search/searcher.ts +189 -67
  221. package/src/search/wink-bm25.d.ts +30 -0
  222. package/src/summarization/cli-providers/claude.ts +202 -0
  223. package/src/summarization/cli-providers/detection.test.ts +273 -0
  224. package/src/summarization/cli-providers/detection.ts +118 -0
  225. package/src/summarization/cli-providers/index.ts +8 -0
  226. package/src/summarization/cost.test.ts +139 -0
  227. package/src/summarization/cost.ts +102 -0
  228. package/src/summarization/error-handler.test.ts +127 -0
  229. package/src/summarization/error-handler.ts +111 -0
  230. package/src/summarization/index.ts +102 -0
  231. package/src/summarization/pipeline.test.ts +498 -0
  232. package/src/summarization/pipeline.ts +231 -0
  233. package/src/summarization/prompts.test.ts +269 -0
  234. package/src/summarization/prompts.ts +133 -0
  235. package/src/summarization/provider-factory.test.ts +396 -0
  236. package/src/summarization/provider-factory.ts +178 -0
  237. package/src/summarization/types.ts +184 -0
  238. package/src/summarize/summarizer.ts +104 -35
  239. package/src/types/huggingface-transformers.d.ts +66 -0
  240. package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
  241. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  242. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  243. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +4 -4
  244. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +14 -0
  245. package/tests/integration/embed-index.test.ts +712 -0
  246. package/tests/integration/search-context.test.ts +469 -0
  247. package/tests/integration/search-semantic.test.ts +522 -0
  248. package/vitest.config.ts +1 -6
  249. package/AGENTS.md +0 -46
  250. package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
  251. package/tests/fixtures/cli/.mdcontext/vectors.meta.json +0 -1264
@@ -0,0 +1,1426 @@
1
+ # Prompt Engineering for Code Summarization: 2026 Best Practices
2
+
3
+ **Research Date:** January 26, 2026
4
+ **Purpose:** Comprehensive guide to prompt engineering best practices for code summarization and analysis, specifically tailored for mdcontext.
5
+
6
+ ---
7
+
8
+ ## Table of Contents
9
+
10
+ 1. [Executive Summary](#executive-summary)
11
+ 2. [Effective Prompt Structures](#effective-prompt-structures)
12
+ 3. [Few-Shot vs Zero-Shot Approaches](#few-shot-vs-zero-shot-approaches)
13
+ 4. [Code Snippet Formatting](#code-snippet-formatting)
14
+ 5. [Token Optimization Strategies](#token-optimization-strategies)
15
+ 6. [Structured Output Formats](#structured-output-formats)
16
+ 7. [Common Pitfalls and Solutions](#common-pitfalls-and-solutions)
17
+ 8. [Prompt Templates for Code Summarization](#prompt-templates-for-code-summarization)
18
+ 9. [References](#references)
19
+
20
+ ---
21
+
22
+ ## Executive Summary
23
+
24
+ Modern LLMs in 2026 excel at code summarization when provided with well-structured prompts that:
25
+
26
+ - **Separate concerns** - Distinct sections for instructions, context, task, and output format
27
+ - **Use explicit constraints** - Specify exact requirements rather than vague requests
28
+ - **Leverage examples strategically** - Few-shot for specialized tasks, zero-shot for common operations
29
+ - **Optimize token usage** - Remove unnecessary formatting while maintaining code clarity
30
+ - **Request structured outputs** - JSON or Markdown for consistent, parsable results
31
+
32
+ Key insight: **One good example beats five adjectives** when reliability is critical.
33
+
34
+ ---
35
+
36
+ ## Effective Prompt Structures
37
+
38
+ ### 1. Core Structural Pattern (2026 Standard)
39
+
40
+ Modern prompts should follow a clear separation of concerns:
41
+
42
+ ```markdown
43
+ ## Instructions
44
+ [Clear, concise task definition with explicit constraints]
45
+
46
+ ## Context
47
+ [Relevant background information, codebase conventions, domain knowledge]
48
+
49
+ ## Task
50
+ [Specific action to perform]
51
+
52
+ ## Output Format
53
+ [Exact structure expected - JSON schema, Markdown template, etc.]
54
+ ```
55
+
56
+ **Why this works:** Claude and other frontier models excel at following structured instructions. Mixing context and instructions in one paragraph reduces reliability.
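
The four-section layout above can also be assembled in code, so the section order never drifts between call sites. This is a minimal sketch only: `PromptSections` and `buildPrompt` are hypothetical names introduced for illustration and are not taken from mdcontext.

```typescript
// Hypothetical helper (not part of mdcontext): assembles the four
// sections in a fixed order, one "## Heading" per section.
interface PromptSections {
  instructions: string;
  context: string;
  task: string;
  outputFormat: string;
}

function buildPrompt(sections: PromptSections): string {
  const ordered: Array<[string, string]> = [
    ["Instructions", sections.instructions],
    ["Context", sections.context],
    ["Task", sections.task],
    ["Output Format", sections.outputFormat],
  ];
  // Trim each body so stray whitespace does not leak into the prompt.
  return ordered
    .map(([heading, body]) => `## ${heading}\n${body.trim()}`)
    .join("\n\n");
}
```

Keeping the sections as separate fields means context can change per file while the instructions stay fixed.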
57
+
58
+ ### 2. Constraint Specification
59
+
60
+ **❌ Vague approach:**
61
+ ```
62
+ Summarize this code briefly.
63
+ ```
64
+
65
+ **✅ Explicit constraints:**
66
+ ```
67
+ Summarize this code in exactly 3 sentences:
68
+ - Sentence 1 (max 20 words): Primary purpose
69
+ - Sentence 2 (max 20 words): Key inputs and outputs
70
+ - Sentence 3 (max 20 words): Important side effects or dependencies
71
+ ```
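
Explicit constraints are also easy to verify after the model responds. The sketch below checks the "3 sentences, max 20 words each" rule from the example above. `checkSummaryConstraints` is a hypothetical helper, and its sentence splitting is deliberately naive (it splits on `.`, `!` and `?`, so abbreviations or decimals will confuse it).

```typescript
// Hypothetical validator for the "exactly N sentences, max M words each" rule.
// Returns a list of human-readable problems; an empty list means the summary passes.
function checkSummaryConstraints(
  summary: string,
  sentenceCount: number = 3,
  maxWords: number = 20
): string[] {
  // Naive split: a sentence is a run of text ending in ., ! or ? (or end of input).
  const rawSentences = summary.match(/[^.!?]+(?:[.!?]+|$)/g) || [];
  const sentences = rawSentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const problems: string[] = [];
  if (sentences.length !== sentenceCount) {
    problems.push(`expected ${sentenceCount} sentences, got ${sentences.length}`);
  }
  sentences.forEach((sentence, i) => {
    const words = sentence.split(/\s+/).filter((w) => w.length > 0).length;
    if (words > maxWords) {
      problems.push(`sentence ${i + 1} has ${words} words (max ${maxWords})`);
    }
  });
  return problems;
}
```

A non-empty result could be fed back to the model as a correction request, or simply logged.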
72
+
73
+ ### 3. Prompt Template for Code Summarization
74
+
75
+ ```markdown
76
+ ## Instructions
77
+ Analyze the following code and provide a comprehensive summary following the exact structure specified in the Output Format section.
78
+
79
+ ## Context
80
+ - Programming language: {LANGUAGE}
81
+ - Project context: {PROJECT_TYPE}
82
+ - Code location: {FILE_PATH}
83
+ - Related modules: {DEPENDENCIES}
84
+
85
+ ## Task
86
+ Generate a multi-level summary that captures:
87
+ 1. High-level purpose (what the code does and why)
88
+ 2. Key technical details (algorithms, patterns, critical logic)
89
+ 3. Dependencies and integration points
90
+ 4. Notable edge cases or constraints
91
+
92
+ ## Code
93
+ ```{LANGUAGE}
94
+ {CODE_SNIPPET}
95
+ ```
96
+
97
+ ## Output Format
98
+ Return a JSON object with this exact structure:
99
+ ```json
100
+ {
101
+   "summary": "One-line description (max 100 chars)",
102
+   "purpose": "Detailed explanation of what this code accomplishes",
103
+   "technical_approach": "Key algorithms, patterns, or techniques used",
104
+   "inputs": ["List of primary inputs or parameters"],
105
+   "outputs": ["List of outputs or return values"],
106
+   "dependencies": ["External libraries, modules, or services"],
107
+   "side_effects": ["Any state changes, I/O operations, or side effects"],
108
+   "edge_cases": ["Notable edge cases or error handling"],
109
+   "complexity": "Brief complexity analysis (time/space)"
110
+ }
111
+ ```
112
+ ```
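
The `{LANGUAGE}`, `{FILE_PATH}` and similar placeholders in the template above need to be substituted before the prompt is sent. One possible way to do that is sketched here. `fillTemplate` is a hypothetical helper, and the choice to match only upper-case placeholder names is an assumption made so that the literal braces of the JSON example are left alone.

```typescript
// Hypothetical helper: substitutes {UPPER_CASE} placeholders and reports
// any that had no value, instead of silently sending them to the model.
function fillTemplate(
  template: string,
  values: Record<string, string>
): { prompt: string; missing: string[] } {
  const missing: string[] = [];
  const prompt = template.replace(
    /\{([A-Z_]+)\}/g,
    (match: string, name: string): string => {
      const value = values[name];
      if (value !== undefined) {
        return value;
      }
      if (missing.indexOf(name) === -1) {
        missing.push(name);
      }
      return match; // leave unknown placeholders visible in the output
    }
  );
  return { prompt, missing };
}
```

Checking `missing` before calling the model avoids paying for a request whose prompt still contains a raw placeholder.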
113
+
114
+ ---
115
+
116
+ ## Few-Shot vs Zero-Shot Approaches
117
+
118
+ ### Decision Framework
119
+
120
+ | Factor | Zero-Shot | Few-Shot |
121
+ |--------|-----------|----------|
122
+ | **Task Complexity** | Simple, well-understood tasks | Specialized, domain-specific tasks |
123
+ | **Output Format** | Standard formats (plain text, basic JSON) | Custom structures, specific styling |
124
+ | **Model Type** | Instruction-tuned models (GPT-4, Claude 3.5+) | General-purpose or smaller models |
125
+ | **Token Budget** | Limited (examples consume tokens) | Sufficient for 2-5 examples |
126
+ | **Consistency Needs** | Moderate | High - examples establish clear patterns |
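
The table can be turned into a simple heuristic. The following is a sketch only: `TaskProfile` and `chooseStrategy` are hypothetical, and the "two or more factors" threshold is an assumption chosen for illustration, not a figure from the research above.

```typescript
type Strategy = "zero-shot" | "few-shot";

// One boolean per row of the decision table.
interface TaskProfile {
  specializedDomain: boolean;      // Task Complexity
  customOutputFormat: boolean;     // Output Format
  instructionTunedModel: boolean;  // Model Type
  tokenBudgetForExamples: boolean; // Token Budget
  highConsistencyNeeded: boolean;  // Consistency Needs
}

function chooseStrategy(profile: TaskProfile): Strategy {
  // Without room for examples, few-shot is not an option.
  if (!profile.tokenBudgetForExamples) {
    return "zero-shot";
  }
  const votesForFewShot = [
    profile.specializedDomain,
    profile.customOutputFormat,
    !profile.instructionTunedModel,
    profile.highConsistencyNeeded,
  ].filter((vote) => vote).length;
  // Assumption: two or more factors pointing at few-shot tip the decision.
  return votesForFewShot >= 2 ? "few-shot" : "zero-shot";
}
```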
127
+
128
+ ### Zero-Shot: When to Use
129
+
130
+ **Best for:**
131
+ - Text summarization with standard formats
132
+ - Code tasks abundant in training data (e.g., "explain this Python function")
133
+ - Working with fine-tuned instruction-following models
134
+ - Token-constrained environments
135
+
136
+ **Example - Zero-Shot Code Summary:**
137
+ ```markdown
138
+ Analyze this TypeScript function and explain:
139
+ 1. What it does
140
+ 2. Key inputs and outputs
141
+ 3. Any important edge cases
142
+
143
+ ```typescript
144
+ export function parseConfig(filePath: string): Config | null {
145
+   try {
146
+     const content = fs.readFileSync(filePath, 'utf-8');
147
+     return JSON.parse(content);
148
+   } catch (error) {
149
+     console.error(`Failed to parse config: ${error}`);
150
+     return null;
151
+   }
152
+ }
153
+ ```
154
+
155
+ Output as a concise paragraph.
156
+ ```
157
+
158
+ ### Few-Shot: When to Use
159
+
160
+ **Best for:**
161
+ - Domain-specific code patterns (e.g., blockchain, quantum computing)
162
+ - Custom output formats that are hard to describe
163
+ - Establishing consistent style across summaries
164
+ - Teaching new concepts not well-represented in training data
165
+
166
+ **Example - Few-Shot Code Summary:**
167
+ ```markdown
168
+ Analyze the following function and provide a summary using the exact style shown in these examples:
169
+
170
+ ## Example 1:
171
+
172
+ **Code:**
173
+ ```python
174
+ def validate_email(email: str) -> bool:
175
+     pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
176
+     return bool(re.match(pattern, email))
177
+ ```
178
+
179
+ **Summary:**
180
+ Purpose: Email format validation using regex
181
+ Pattern: Pure function - deterministic validation
182
+ Input: String (email address)
183
+ Output: Boolean (valid/invalid)
184
+ Complexity: O(n) where n = email length
185
+ Edge Cases: Doesn't verify email existence, only format
186
+
187
+ ## Example 2:
188
+
189
+ **Code:**
190
+ ```python
191
+ async def fetch_user_data(user_id: int) -> dict:
192
+     async with aiohttp.ClientSession() as session:
193
+         response = await session.get(f'/api/users/{user_id}')
194
+         return await response.json()
195
+ ```
196
+
197
+ **Summary:**
198
+ Purpose: Async HTTP fetch for user data
199
+ Pattern: Async I/O with context management
200
+ Input: Integer (user_id)
201
+ Output: Dictionary (user data from API)
202
+ Complexity: O(1) computation, I/O-bound
203
+ Edge Cases: No error handling for failed requests or invalid JSON
204
+
205
+ ## Now analyze this code:
206
+
207
+ **Code:**
208
+ ```typescript
209
+ export async function processQueue<T>(
210
+   queue: Queue<T>,
211
+   handler: (item: T) => Promise<void>,
212
+   concurrency: number = 5
213
+ ): Promise<void> {
214
+   const workers = Array(concurrency).fill(null).map(async () => {
215
+     while (true) {
216
+       const item = await queue.dequeue();
217
+       if (!item) break;
218
+       await handler(item);
219
+     }
220
+   });
221
+   await Promise.all(workers);
222
+ }
223
+ ```
224
+
225
+ **Summary:**
226
+ [Model generates summary following established pattern]
227
+ ```
228
+
229
+ **Performance Note:** Few-shot prompting often **outperforms zero-shot** on specialized tasks because labeled examples give the model targeted guidance on format and style, at the cost of the extra tokens the examples consume.
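
A few-shot prompt in the style shown above can be generated from a list of stored examples. The sketch below is illustrative: `FewShotExample` and `buildFewShotPrompt` are hypothetical names, and the layout simply mirrors the example prompt in this section.

```typescript
interface FewShotExample {
  language: string;
  code: string;
  summary: string;
}

// Hypothetical builder: renders each example as Code + Summary, then
// appends the target code with an empty Summary for the model to complete.
function buildFewShotPrompt(
  examples: FewShotExample[],
  target: { language: string; code: string }
): string {
  const fence = "`".repeat(3); // code-fence marker, built here to keep this listing readable
  const parts: string[] = [
    "Analyze the following function and provide a summary using the exact style shown in these examples:",
  ];
  examples.forEach((example, i) => {
    parts.push(`## Example ${i + 1}:`);
    parts.push(`**Code:**\n${fence}${example.language}\n${example.code}\n${fence}`);
    parts.push(`**Summary:**\n${example.summary}`);
  });
  parts.push("## Now analyze this code:");
  parts.push(`**Code:**\n${fence}${target.language}\n${target.code}\n${fence}`);
  parts.push("**Summary:**");
  return parts.join("\n\n");
}
```

Ending the prompt on an empty `**Summary:**` label nudges the model to continue in the same field layout as the examples.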
230
+
231
+ ---
232
+
233
+ ## Code Snippet Formatting
234
+
235
+ ### Key Research Findings (2026)
236
+
237
+ 1. **Format Removal Reduces Tokens by 24.5%** on average with negligible impact on LLM performance
238
+ 2. **Labeled Code Blocks Improve Understanding** - Models recognize code vs. context more accurately
239
+ 3. **Strategic Formatting** - Use minimal formatting for token efficiency while maintaining clarity
240
+
241
+ ### Formatting Best Practices
242
+
243
+ #### ✅ Recommended: Clearly Labeled Code Blocks
244
+
245
+ ```markdown
246
+ ## Context
247
+ Analyzing a Python data processing pipeline
248
+
249
+ ## Code to Analyze
250
+ ```python
251
+ def transform_data(raw_data):
252
+     cleaned = remove_nulls(raw_data)
253
+     normalized = normalize_values(cleaned)
254
+     return aggregate_results(normalized)
255
+ ```
256
+
257
+ ## Task
258
+ Summarize the data transformation pipeline
259
+ ```
260
+
261
+ **Why this works:**
262
+ - Clear boundaries between code and instructions
263
+ - Model understands code context immediately
264
+ - Maintains readability for human review
265
+
266
+ #### ⚠️ Token-Optimized: Minimal Formatting (When Needed)
267
+
268
+ For large codebases where token limits are critical:
269
+
270
+ ```markdown
271
+ Summarize this function:
272
+
273
+ def transform_data(raw_data):
274
+     cleaned=remove_nulls(raw_data)
275
+     normalized=normalize_values(cleaned)
276
+     return aggregate_results(normalized)
277
+
278
+ Focus on: purpose, data flow, dependencies
279
+ ```
280
+
281
+ **Trade-offs:**
282
+ - Saves ~25% tokens
283
+ - Slightly reduced readability
284
+ - LLM performance maintained according to 2026 research
285
+ - Use when processing many files or hitting context limits
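
One conservative form of minimal formatting is to drop blank lines and trailing whitespace while keeping indentation and line breaks, which stays safe for indentation-sensitive languages such as Python. The sketch below shows that variant. `minimizeFormatting` and `estimateSavings` are hypothetical helpers, and the savings figure is a rough character-based proxy, since real token counts depend on the tokenizer.

```typescript
// Hypothetical helper: removes blank lines and trailing whitespace only.
// Leading indentation and line breaks are preserved.
function minimizeFormatting(code: string): string {
  return code
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .filter((line) => line.length > 0)
    .join("\n");
}

// Rough proxy for savings: fraction of characters removed.
function estimateSavings(original: string, minimized: string): number {
  if (original.length === 0) {
    return 0;
  }
  return 1 - minimized.length / original.length;
}
```

Measuring the savings on a sample of real files before adopting a more aggressive strategy helps confirm that the reduction is worth the readability cost.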
286
+
287
+ #### ❌ Avoid: Mixing Code Without Delimiters
288
+
289
+ ```markdown
290
+ Look at this code def transform_data(raw_data): cleaned = remove_nulls(raw_data) and tell me what it does
291
+ ```
292
+
293
+ **Problems:**
294
+ - Model may confuse code with instructions
295
+ - Harder to parse code structure
296
+ - Reduced accuracy in analysis
297
+
298
+ ### Structural Summarization for Large Codebases
299
+
300
+ For token-constrained scenarios, use tools like tree-sitter to create structural summaries:
301
+
302
+ ```markdown
303
+ ## File: /src/data/pipeline.py
304
+ Functions:
305
+ - transform_data(raw_data) -> processed_data
306
+ - remove_nulls(data) -> cleaned_data
307
+ - normalize_values(data) -> normalized_data
308
+ - aggregate_results(data) -> summary
309
+
310
+ Dependencies:
311
+ - pandas
312
+ - numpy
313
+ - custom.validators
314
+
315
+ Purpose: ETL pipeline for sensor data processing
316
+ ```
317
+
318
+ **Benefit:** Drastically reduces tokens while preserving high-level architecture for summarization.
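
To make the idea concrete without pulling in a parser, the sketch below extracts a similar outline using regular expressions. This is a stand-in technique, not tree-sitter: it only recognizes top-level Python `def` and `import`/`from` lines, and `summarizePythonStructure` is a hypothetical name.

```typescript
interface StructuralSummary {
  functions: string[];
  dependencies: string[];
}

// Regex-based stand-in for a real parser such as tree-sitter.
// Only top-level (unindented) Python def/import lines are recognized.
function summarizePythonStructure(source: string): StructuralSummary {
  const functions: string[] = [];
  const dependencies: string[] = [];
  source.split("\n").forEach((line) => {
    const def = /^def\s+(\w+)\s*\(([^)]*)\)/.exec(line);
    if (def) {
      const name = def[1] || "";
      const params = (def[2] || "").trim();
      functions.push(`${name}(${params})`);
      return;
    }
    const imp = /^(?:import|from)\s+([\w.]+)/.exec(line);
    const moduleName = imp ? imp[1] : undefined;
    if (moduleName !== undefined && dependencies.indexOf(moduleName) === -1) {
      dependencies.push(moduleName);
    }
  });
  return { functions, dependencies };
}
```

A production version would likely use a real parser to handle nested definitions, decorators, and multi-line signatures.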
319
+
320
+ ---
321
+
322
+ ## Token Optimization Strategies
323
+
324
+ ### 1. Batching Inputs (30% Token Savings)
325
+
326
+ **❌ Inefficient: Repeated Instructions**
327
+ ```markdown
328
+ Summarize this function's purpose:
329
+ [Function 1]
330
+
331
+ Summarize this function's purpose:
332
+ [Function 2]
333
+
334
+ Summarize this function's purpose:
335
+ [Function 3]
336
+ ```
337
+
338
+ **✅ Efficient: Batched Processing**
339
+ ```markdown
340
+ Analyze each of the following functions and provide:
341
+ - Purpose (one sentence)
342
+ - Key inputs/outputs
343
+ - Complexity
344
+
345
+ Format: JSON array with one object per function
346
+
347
+ ## Function 1
348
+ [Code]
349
+
350
+ ## Function 2
351
+ [Code]
352
+
353
+ ## Function 3
354
+ [Code]
355
+ ```
356
+
357
+ **Savings:** Instructions specified once, not repeated for each input.
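
The batched layout can be produced from a list of code snippets as follows. This is a sketch: `buildBatchedPrompt` is a hypothetical helper, and the instruction text is copied from the batched example above.

```typescript
// Hypothetical helper: states the instructions once, then appends one
// "## Function N" section per snippet.
function buildBatchedPrompt(functions: string[]): string {
  const header = [
    "Analyze each of the following functions and provide:",
    "- Purpose (one sentence)",
    "- Key inputs/outputs",
    "- Complexity",
    "",
    "Format: JSON array with one object per function",
  ].join("\n");
  const sections = functions.map(
    (code, i) => `## Function ${i + 1}\n${code}`
  );
  return [header].concat(sections).join("\n\n");
}
```

Numbering the sections gives the model a stable key for matching each object in the returned array to its input.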
358
+
359
+ ### 2. Prompt Compression Techniques
360
+
361
+ **LLMLingua (Microsoft)**
362
+ - Prunes unnecessary tokens from prompts
363
+ - Maintains semantic meaning
364
+ - Useful for very long context windows
365
+
366
+ **Tree-Sitter Parsing**
367
+ - Converts code to structural AST
368
+ - Preserves architecture while reducing tokens
369
+ - Ideal for codebase-wide summarization
370
+
371
+ ### 3. Format Optimization by Task
372
+
373
+ | Task Type | Recommended Format | Token Efficiency |
374
+ |-----------|-------------------|------------------|
375
+ | Single file summary | Full formatted code | Clarity > tokens |
376
+ | Multi-file analysis | Minimal formatting | Tokens > minor clarity loss |
377
+ | Codebase overview | Structural summary (AST) | Maximum token savings |
378
+ | Detailed review | Full formatted with context | Clarity > tokens |
379
+
380
+ ### 4. Practical Implementation for mdcontext
381
+
382
+ ```typescript
383
+ // Adaptive formatting based on file size
384
+ function formatCodeForSummarization(code: string, language: string, fileSize: number): string {
385
+   if (fileSize < 1000) {
386
+     // Small files: preserve formatting and label the block with its language
387
+     return `\`\`\`${language}\n${code}\n\`\`\``;
388
+   } else if (fileSize < 5000) {
389
+     // Medium files: minimal formatting (collapses all whitespace, so avoid for indentation-sensitive languages)
390
+     return code.replace(/\s+/g, ' ').trim();
391
+   } else {
392
+     // Large files: structural summary (generateASTSummary is assumed to be defined elsewhere)
393
+     return generateASTSummary(code);
394
+   }
395
+ }
396
+ ```
397
+
398
+ ---
399
+
400
+ ## Structured Output Formats
401
+
402
+ ### Why Structured Outputs Matter
403
+
404
+ 1. **Consistency** - Same format every time, easier to parse programmatically
405
+ 2. **Validation** - Can verify output matches expected schema
406
+ 3. **Integration** - Direct use in downstream systems without custom parsing logic
407
+ 4. **Clarity** - LLMs understand structure better than freeform requests
408
+
409
+ ### JSON vs Markdown: When to Use Each
410
+
411
+ | Format | Best For | Example Use Cases |
412
+ |--------|----------|-------------------|
413
+ | **JSON** | Machine parsing, APIs, databases | Code metrics, dependency graphs, test results |
414
+ | **Markdown** | Human reading, documentation, RAG | Code explanations, architecture docs, summaries |
415
+
416
+ **Key Insight (2026):** JSON is the gold standard for explicit key-value structure that LLMs parse reliably, while Markdown excels for text-heavy applications like summarization.
417
+
418
+ ### JSON Output Template
419
+
420
+ ```markdown
421
+ ## Instructions
422
+ Analyze the provided code and return a JSON object matching the schema below.
423
+
424
+ ## Output Schema
425
+ ```json
426
+ {
427
+   "file_path": "string",
428
+   "language": "string",
429
+   "summary": {
430
+     "one_line": "string (max 100 chars)",
431
+     "detailed": "string (2-3 sentences)"
432
+   },
433
+   "functions": [
434
+     {
435
+       "name": "string",
436
+       "purpose": "string",
437
+       "parameters": ["string"],
438
+       "returns": "string",
439
+       "complexity": "string"
440
+     }
441
+   ],
442
+   "dependencies": {
443
+     "external": ["string"],
444
+     "internal": ["string"]
445
+   },
446
+   "metadata": {
447
+     "lines_of_code": "number",
448
+     "estimated_complexity": "low|medium|high",
449
+     "test_coverage_needed": "boolean"
450
+   }
451
+ }
452
+ ```
453
+
454
+ ## Code to Analyze
455
+ ```{language}
456
+ {code}
457
+ ```
458
+
459
+ ## Requirements
460
+ - Return ONLY valid JSON, no additional text
461
+ - Ensure all required fields are present
462
+ - Use null for unknown values, not empty strings
463
+ ```
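
Even with "Return ONLY valid JSON" in the prompt, it is prudent to check the response before using it. The sketch below parses a response and verifies the top-level fields of the schema above. `parseSummaryResponse` and `ParseResult` are hypothetical names, and the check is intentionally shallow (it does not validate nested fields).

```typescript
interface ParseResult {
  ok: boolean;
  error: string | null;
  value: Record<string, unknown> | null;
}

// Hypothetical validator: confirms the response is a JSON object with the
// top-level keys from the output schema and an array under "functions".
function parseSummaryResponse(raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response is not valid JSON", value: null };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "response is not a JSON object", value: null };
  }
  const obj = parsed as Record<string, unknown>;
  const required = ["file_path", "language", "summary", "functions", "dependencies", "metadata"];
  const missing = required.filter((key) => !(key in obj));
  if (missing.length > 0) {
    return { ok: false, error: `missing fields: ${missing.join(", ")}`, value: null };
  }
  if (!Array.isArray(obj["functions"])) {
    return { ok: false, error: "functions must be an array", value: null };
  }
  return { ok: true, error: null, value: obj };
}
```

The `error` string is phrased so it can be sent back to the model in a retry prompt.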
464
+
465
+ ### Markdown Output Template
466
+
467
+ ```markdown
468
+ ## Instructions
469
+ Analyze the provided code and create a comprehensive Markdown summary.
470
+
471
+ ## Output Format
472
+ Use this exact structure:
473
+
474
+ ```markdown
475
+ # {File Name}
476
+
477
+ ## Overview
478
+ [2-3 sentence description of the file's purpose and role in the project]
479
+
480
+ ## Key Components
481
+
482
+ ### Functions
483
+ - **{function_name}**: {one-line description}
484
+   - Inputs: {parameters}
485
+   - Outputs: {return type}
486
+   - Complexity: {time/space complexity}
487
+
488
+ ### Classes
489
+ - **{class_name}**: {one-line description}
490
+   - Responsibilities: {key responsibilities}
491
+   - Dependencies: {what it depends on}
492
+
493
+ ## Dependencies
494
+ - External: {list}
495
+ - Internal: {list}
496
+
497
+ ## Architecture Notes
498
+ {How this fits into the larger system}
499
+
500
+ ## Potential Issues
501
+ {Any code smells, technical debt, or areas for improvement}
502
+ ```
503
+
504
+ ## Code to Analyze
505
+ [code here]
506
+ ```
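
Markdown output cannot be schema-validated the way JSON can, but the required headings can still be checked. A minimal sketch, assuming the section names from the template above; `findMissingSections` is a hypothetical helper.

```typescript
// Hypothetical check: returns the required headings that do not appear
// as heading lines in the model's Markdown output.
function findMissingSections(markdown: string, required: string[]): string[] {
  const headings = markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^#{1,6}\s/.test(line));
  return required.filter((heading) => headings.indexOf(heading) === -1);
}

// Section headings taken from the template above.
const REQUIRED_SECTIONS = [
  "## Overview",
  "## Key Components",
  "## Dependencies",
  "## Architecture Notes",
  "## Potential Issues",
];
```

Calling `findMissingSections(output, REQUIRED_SECTIONS)` yields an empty array when every section is present.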
507
+
508
+ ### Hybrid Approach: Structured Markdown
509
+
510
+ Combining benefits of both formats:
511
+
512
+ ```markdown
513
+ ---
514
+ file: /src/auth/validator.ts
515
+ language: typescript
516
+ complexity: medium
517
+ dependencies: [bcrypt, jsonwebtoken]
518
+ ---
519
+
520
+ # Authentication Validator
521
+
522
+ ## Purpose
523
+ Validates user credentials and generates JWT tokens for authenticated sessions.
524
+
525
+ ## Implementation Details
526
+
527
+ **validateCredentials()**
528
+ - Input: `{ username: string, password: string }`
529
+ - Output: `Promise<User | null>`
530
+ - Complexity: O(1) - single DB lookup + hash comparison
531
+ - Side Effects: Logs failed attempts to security log
532
+
533
+ **generateToken()**
534
+ - Input: `User` object
535
+ - Output: `string` (JWT token)
536
+ - Complexity: O(1)
537
+ - Side Effects: None (pure function)
538
+
539
+ ## Security Considerations
540
+ - Uses bcrypt with cost factor 12 for password hashing
541
+ - JWT tokens expire after 24 hours
542
+ - No rate limiting implemented ⚠️ (potential vulnerability)
543
+ ```
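Because the metadata in this hybrid format sits in a front-matter block, a consumer can separate it from the prose before doing anything else. The following is a minimal TypeScript sketch, assuming the front matter is delimited by `---` lines, uses `\n` line endings, and holds simple `key: value` pairs. `splitFrontMatter` is a hypothetical helper, not part of mdcontext, and list values such as `dependencies` are kept as raw strings rather than parsed as YAML.

```typescript
interface SplitDocument {
  meta: Record<string, string>;
  body: string;
}

// Hypothetical helper: separates a leading front-matter block from the Markdown body.
function splitFrontMatter(doc: string): SplitDocument {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(doc);
  if (match === null) {
    // No front matter: treat the whole document as body.
    return { meta: {}, body: doc };
  }

  const header = match[1] ?? "";
  const meta: Record<string, string> = {};
  for (const line of header.split("\n")) {
    // Split on the first colon only, so values may themselves contain colons.
    const colon = line.indexOf(":");
    if (colon > 0) {
      meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return { meta, body: match[2] ?? "" };
}
```

For anything beyond flat key/value pairs, a real YAML parser would be the safer choice.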
544
+
545
+ ### Enforcing Structured Outputs
546
+
547
+ **Schema Specification:**
548
+ ```markdown
549
+ Return JSON matching this Pydantic schema:
550
+
551
+ ```python
+ from typing import List, Literal, Optional
+
+ from pydantic import BaseModel, Field
+
+
+ class FunctionSummary(BaseModel):
+     name: str
+     purpose: str = Field(max_length=200)
+     parameters: List[str]
+     returns: Optional[str]
+     complexity: Literal["O(1)", "O(n)", "O(n log n)", "O(n²)", "unknown"]
+
+
+ class CodeSummary(BaseModel):
+     file_path: str
+     language: str
+     functions: List[FunctionSummary]
+     total_lines: int
+ ```
565
+ ```
566
+
567
+ **Grammar Enforcement:**
568
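+ Most frontier models in 2026 support a JSON mode that guarantees syntactically valid JSON. Syntactic validity does not imply conformance to your schema, so still validate the parsed output against the expected structure.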
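Even when the model returns well-formed JSON, the caller still has to check that the object has the expected shape. The sketch below mirrors the `FunctionSummary` fields from the Pydantic schema as a TypeScript type guard. The names `isFunctionSummary` and `parseFunctionSummary` are illustrative assumptions, not an existing mdcontext API.

```typescript
type Complexity = "O(1)" | "O(n)" | "O(n log n)" | "O(n²)" | "unknown";

interface FunctionSummary {
  name: string;
  purpose: string;
  parameters: string[];
  returns: string | null;
  complexity: Complexity;
}

const COMPLEXITIES: readonly string[] = ["O(1)", "O(n)", "O(n log n)", "O(n²)", "unknown"];

// Runtime check that an arbitrary parsed value has the FunctionSummary shape.
function isFunctionSummary(value: unknown): value is FunctionSummary {
  if (typeof value !== "object" || value === null) return false;
  const { name, purpose, parameters, returns, complexity } = value as Record<string, unknown>;
  return (
    typeof name === "string" &&
    typeof purpose === "string" &&
    purpose.length <= 200 &&
    Array.isArray(parameters) &&
    parameters.every((p: unknown) => typeof p === "string") &&
    (typeof returns === "string" || returns === null) &&
    typeof complexity === "string" &&
    COMPLEXITIES.indexOf(complexity) !== -1
  );
}

// Returns the typed summary, or null if the text is not JSON or has the wrong shape.
function parseFunctionSummary(raw: string): FunctionSummary | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isFunctionSummary(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

A `null` result can then trigger a retry or a fallback prompt instead of letting a malformed summary propagate.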
569
+
570
+ ---
571
+
572
+ ## Common Pitfalls and Solutions
573
+
574
+ ### 1. Vague Prompts
575
+
576
+ **❌ Problem:**
577
+ ```markdown
578
+ Improve this code.
579
+ ```
580
+
581
+ **Why it fails:** The LLM doesn't know whether you want performance optimization, readability, bug fixes, new features, or all of the above.
582
+
583
+ **✅ Solution:**
584
+ ```markdown
585
+ Analyze this code for performance bottlenecks:
586
+ 1. Identify functions with O(n²) or worse complexity
587
+ 2. Suggest specific optimizations for each bottleneck
588
+ 3. Estimate performance improvement for each suggestion
589
+
590
+ Code:
591
+ [code here]
592
+ ```
593
+
594
+ ### 2. Vague References in Conversations
595
+
596
+ **❌ Problem:**
597
+ ```markdown
598
+ User: [Pastes 500 lines of code]
599
+ User: [Asks a question about function A]
600
+ User: [Asks a question about function B]
601
+ User: Refactor the above function
602
+ ```
603
+
604
+ **Why it fails:** "Above function" is ambiguous - which one? LLMs may lose track in long conversations.
605
+
606
+ **✅ Solution:**
607
+ ```markdown
608
+ Refactor the `processUserData` function (lines 45-67) to improve error handling.
609
+ ```
610
+
611
+ ### 3. Prompt Length Issues
612
+
613
+ **Research Finding (2026):** For simple, single-purpose tasks, prompts under 50 words have **higher success rates** than longer prompts.
614
+
615
+ **❌ Problem:**
616
+ ```markdown
617
+ I need you to analyze this complex codebase and think about all the different architectural patterns we could use and consider the trade-offs between microservices and monolithic approaches and also think about how we might scale this in the future and what database we should use and whether we should use REST or GraphQL and also...
618
+ [continues for 300 words]
619
+ ```
620
+
621
+ **✅ Solution: Prompt Chaining**
622
+ ```markdown
623
+ Step 1: What is the current architecture pattern used in this codebase?
624
+ [Get response]
625
+
626
+ Step 2: What are the main scalability bottlenecks?
627
+ [Get response]
628
+
629
+ Step 3: Suggest 2-3 specific architectural improvements addressing those bottlenecks.
630
+ [Get response]
631
+ ```
632
+
633
+ **Benefits:**
634
+ - Each prompt is focused and concise
635
+ - Higher accuracy per step
636
+ - Can course-correct between steps
637
+ - Better for complex, multi-faceted analysis
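The chaining pattern above can be sketched as a small loop in which each step builds a short prompt from the answers collected so far. This is only an illustration: `ModelCall` stands in for whatever client function sends a prompt and resolves with the model's text, and `runPromptChain` is a hypothetical name.

```typescript
// Placeholder for a real model client; not an actual library API.
type ModelCall = (prompt: string) => Promise<string>;

interface ChainStep {
  // Builds the prompt for this step from the answers collected so far.
  buildPrompt: (previousAnswers: string[]) => string;
}

async function runPromptChain(steps: ChainStep[], callModel: ModelCall): Promise<string[]> {
  const answers: string[] = [];
  for (const step of steps) {
    // Pass a copy so a step cannot mutate the accumulated answers.
    const prompt = step.buildPrompt(answers.slice());
    answers.push(await callModel(prompt));
  }
  return answers;
}
```

Because every intermediate answer is returned, the caller can inspect or correct a step before running the next one.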
638
+
639
+ ### 4. Semantic Errors and Hallucinations
640
+
641
+ **❌ Problem:** LLMs may generate plausible-looking code that contains logical errors (missing conditions, incorrect logic) or small inaccuracies (such as a missing minus sign).
642
+
643
+ **✅ Solutions:**
644
+
645
+ **a) Explicit Constraint Checking**
646
+ ```markdown
647
+ Analyze this sorting function and verify:
648
+ 1. Does it handle empty arrays correctly?
649
+ 2. Does it handle single-element arrays correctly?
650
+ 3. Does it maintain stability (equal elements preserve order)?
651
+ 4. What is its worst-case time complexity?
652
+
653
+ Flag any edge cases where it might fail.
654
+ ```
655
+
656
+ **b) Human-in-the-Loop for Critical Code**
657
+ ```markdown
658
+ Important: This code is used in a safety-critical medical device system.
659
+
660
+ Review this function for:
661
+ - Potential null pointer exceptions
662
+ - Integer overflow vulnerabilities
663
+ - Race conditions in concurrent access
664
+ - Edge cases that could cause incorrect dosage calculations
665
+
666
+ Mark each issue as: CRITICAL, HIGH, MEDIUM, or LOW severity.
667
+ ```
668
+
669
+ ### 5. Underspecification (Major 2026 Research Finding)
670
+
671
+ **Research Finding:** Underspecified prompts are **2x more likely to regress** across model or prompt changes, sometimes with accuracy drops of **more than 20%**.
672
+
673
+ **❌ Problem:**
674
+ ```markdown
675
+ Summarize this code.
676
+ ```
677
+
678
+ **Underspecified aspects:**
679
+ - How long should the summary be?
680
+ - What aspects to focus on (purpose, implementation, performance)?
681
+ - What format (prose, bullet points, JSON)?
682
+ - What audience (developers, managers, documentation)?
683
+
684
+ **✅ Solution: Comprehensive Specification**
685
+ ```markdown
686
+ ## Task
687
+ Create a technical summary for a senior developer code review.
688
+
689
+ ## Requirements
690
+ - Length: 3-5 bullet points
691
+ - Focus: Code correctness, performance, maintainability
692
+ - Flag: Any potential bugs or security issues
693
+ - Format: Markdown with code references
694
+
695
+ ## Code
696
+ [code here]
697
+
698
+ ## Output Template
699
+ - **Purpose**: [what this code does]
700
+ - **Implementation**: [key technical approach]
701
+ - **Performance**: [time/space complexity]
702
+ - **Issues**: [any concerns or improvements needed]
703
+ - **Dependencies**: [external libraries or modules]
704
+ ```
705
+
706
+ ### 6. Insufficient Context
707
+
708
+ **❌ Problem:**
709
+ ```markdown
710
+ Why is this function slow?
711
+
712
+ ```python
713
+ def process_data(data):
+     result = []
+     for item in data:
+         result.append(transform(item))
+     return result
718
+ ```
719
+ ```
720
+
721
+ **Missing context:**
722
+ - What is `transform()`? Is it I/O-bound, CPU-bound?
723
+ - How large is `data`? 10 items or 10 million?
724
+ - What are the performance requirements?
725
+
726
+ **✅ Solution:**
727
+ ```markdown
728
+ ## Context
729
+ - Function: process_data() in data_pipeline.py
730
+ - Typical input size: 100,000 - 1,000,000 items
731
+ - Current performance: ~30 seconds for 500k items
732
+ - Performance requirement: <5 seconds for 500k items
733
+ - transform() is a pure CPU-bound function (regex parsing)
734
+
735
+ ## Code
736
+ ```python
737
+ def process_data(data):
+     result = []
+     for item in data:
+         result.append(transform(item))
+     return result
742
+ ```
743
+
744
+ ## Task
745
+ Identify why this is slow and suggest specific optimizations.
746
+ ```
747
+
748
+ ### 7. Incorrect Output Format Handling
749
+
750
+ **❌ Problem:**
751
+ ```markdown
752
+ Return JSON with the code summary.
753
+
754
+ [LLM returns:]
755
+ Here's the JSON summary:
756
+ ```json
757
+ {
758
+ "summary": "..."
759
+ }
760
+ ```
761
+ I hope this helps!
762
+ ```
763
+
764
+ **Why it fails:** Extra text makes it harder to parse programmatically.
765
+
766
+ **✅ Solution:**
767
+ ```markdown
768
+ Return ONLY a valid JSON object, with no additional text before or after.
769
+ Do not include markdown code fences.
770
+ Do not include explanatory text.
771
+
772
+ Schema:
773
+ {
774
+ "summary": "string",
775
+ "complexity": "string"
776
+ }
777
+ ```
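Even with strict instructions, a defensive parser is a reasonable safety net for responses that still arrive wrapped in prose or code fences. The sketch below first tries the whole response and then falls back to the outermost `{ ... }` span. `extractJson` is a hypothetical helper, and the fallback is a heuristic that can fail if the surrounding prose itself contains braces.

```typescript
// Returns the parsed JSON value, or undefined if nothing parseable was found.
function extractJson(response: string): unknown {
  const trimmed = response.trim();

  // Ideal case: the response is exactly one JSON document.
  try {
    return JSON.parse(trimmed);
  } catch {
    // Fall through to the heuristic below.
  }

  // Heuristic fallback: take everything between the first "{" and the last "}".
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;

  try {
    return JSON.parse(trimmed.slice(start, end + 1));
  } catch {
    return undefined;
  }
}
```

`undefined` is used as the failure marker because `null` is itself a valid JSON value.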
778
+
779
+ ---
780
+
781
+ ## Prompt Templates for Code Summarization
782
+
783
+ ### Template 1: Single File Summary (Comprehensive)
784
+
785
+ ```markdown
786
+ ## Task
787
+ Analyze the following code file and provide a comprehensive technical summary.
788
+
789
+ ## Context
790
+ - File path: {FILE_PATH}
791
+ - Programming language: {LANGUAGE}
792
+ - Project: {PROJECT_NAME}
793
+ - Role in project: {FILE_ROLE}
794
+
795
+ ## Code
796
+ ```{LANGUAGE}
797
+ {CODE_CONTENT}
798
+ ```
799
+
800
+ ## Output Format
801
+ Return a JSON object with this structure:
802
+
803
+ ```json
804
+ {
805
+ "summary": {
806
+ "one_line": "Brief description (max 100 chars)",
807
+ "detailed": "2-3 sentence explanation of purpose and approach"
808
+ },
809
+ "components": {
810
+ "functions": [
811
+ {
812
+ "name": "function_name",
813
+ "purpose": "What it does",
814
+ "signature": "function signature",
815
+ "complexity": "Time/space complexity"
816
+ }
817
+ ],
818
+ "classes": [
819
+ {
820
+ "name": "ClassName",
821
+ "purpose": "What it represents",
822
+ "key_methods": ["method1", "method2"],
823
+ "design_pattern": "Pattern used (if applicable)"
824
+ }
825
+ ],
826
+ "constants": [
827
+ {
828
+ "name": "CONSTANT_NAME",
829
+ "purpose": "Why it exists"
830
+ }
831
+ ]
832
+ },
833
+ "dependencies": {
834
+ "external_libraries": ["lib1", "lib2"],
835
+ "internal_modules": ["module1", "module2"],
836
+ "coupling_level": "low|medium|high"
837
+ },
838
+ "quality_metrics": {
839
+ "lines_of_code": 0,
840
+ "cyclomatic_complexity": "low|medium|high",
841
+ "test_coverage_needed": true,
842
+ "code_smells": ["Issue 1", "Issue 2"]
843
+ },
844
+ "architecture_notes": "How this fits into the larger system"
845
+ }
846
+ ```
847
+
848
+ ## Requirements
849
+ - Be precise and technical
850
+ - Flag any potential issues or improvements
851
+ - Focus on code correctness and maintainability
852
+ - Return ONLY valid JSON
853
+ ```
854
+
855
+ ### Template 2: Quick Function Summary (Zero-Shot)
856
+
857
+ ```markdown
858
+ Analyze this {LANGUAGE} function and provide:
859
+ 1. Purpose (one sentence)
860
+ 2. Key inputs and outputs
861
+ 3. Time/space complexity
862
+ 4. Any edge cases or potential issues
863
+
864
+ ```{LANGUAGE}
865
+ {FUNCTION_CODE}
866
+ ```
867
+
868
+ Format as a concise paragraph (max 100 words).
869
+ ```
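Templates like the ones in this section use `{UPPER_CASE}` placeholders, which can be filled with a single pass over the template text. A minimal sketch follows, assuming placeholders consist only of uppercase letters, digits, and underscores. `fillTemplate` is an illustrative name, and placeholders written as alternatives (for example `{YES/NO}`) are deliberately left untouched.

```typescript
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{([A-Z][A-Z0-9_]*)\}/g, (placeholder: string, key: string): string => {
    const value = values[key];
    if (value === undefined) {
      // Fail loudly rather than send a prompt with an unfilled slot.
      throw new Error("Missing value for placeholder " + placeholder);
    }
    return value;
  });
}
```

Since the replacement is done in one pass, braces inside the substituted code are not rescanned, so source code containing `{NAME}`-like text is inserted verbatim.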
870
+
871
+ ### Template 3: Multi-File Batch Summary
872
+
873
+ ```markdown
874
+ ## Task
875
+ Analyze each of the following code files and provide a standardized summary for each.
876
+
877
+ ## Output Format
878
+ Return a JSON array where each element represents one file:
879
+
880
+ ```json
881
+ [
882
+ {
883
+ "file_path": "string",
884
+ "language": "string",
885
+ "primary_purpose": "string (max 200 chars)",
886
+ "key_exports": ["function/class names"],
887
+ "dependencies": ["list of imports"],
888
+ "complexity_score": 1-10,
889
+ "suggested_improvements": ["improvement 1", "improvement 2"]
890
+ }
891
+ ]
892
+ ```
893
+
894
+ ## Files to Analyze
895
+
896
+ ### File 1: {PATH_1}
897
+ ```{LANGUAGE_1}
898
+ {CODE_1}
899
+ ```
900
+
901
+ ### File 2: {PATH_2}
902
+ ```{LANGUAGE_2}
903
+ {CODE_2}
904
+ ```
905
+
906
+ ### File 3: {PATH_3}
907
+ ```{LANGUAGE_3}
908
+ {CODE_3}
909
+ ```
910
+
911
+ ## Requirements
912
+ - Process each file independently
913
+ - Be consistent in evaluation criteria
914
+ - Return ONLY the JSON array
915
+ ```
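The "Files to Analyze" section of the batch template is repetitive enough to generate from a list of files. The sketch below shows one way to do so. `SourceFile` and `buildFilesSection` are hypothetical names, and the function assumes the file contents do not themselves contain a line of three backticks.

```typescript
interface SourceFile {
  path: string;
  language: string;
  code: string;
}

// Three backticks, assembled piecewise so this snippet can itself sit inside a fenced block.
const FENCE = "`" + "`" + "`";

function buildFilesSection(files: SourceFile[]): string {
  return files
    .map((file, index) =>
      [
        "### File " + (index + 1) + ": " + file.path,
        FENCE + file.language,
        file.code,
        FENCE,
      ].join("\n")
    )
    .join("\n\n");
}
```

Numbering the files in the heading gives the model a stable way to match each array element in its JSON response to an input file.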
916
+
917
+ ### Template 4: Architecture-Focused Summary (Few-Shot)
918
+
919
+ ```markdown
920
+ ## Task
921
+ Analyze the code and summarize its architectural role using the pattern shown in the examples.
922
+
923
+ ## Example 1
924
+
925
+ **Code:**
926
+ ```typescript
927
+ export class UserRepository {
+   constructor(private db: Database) {}
+
+   async findById(id: string): Promise<User | null> {
+     return this.db.query('SELECT * FROM users WHERE id = ?', [id]);
+   }
+ }
934
+ ```
935
+
936
+ **Summary:**
937
+ - **Pattern**: Repository Pattern
938
+ - **Layer**: Data Access Layer
939
+ - **Responsibility**: Abstracts database operations for User entities
940
+ - **Dependencies**: Database connection interface
941
+ - **Testing**: Easily mockable via dependency injection
942
+ - **Scalability**: Can swap database implementation without changing business logic
943
+
944
+ ## Example 2
945
+
946
+ **Code:**
947
+ ```python
948
+ class EmailNotificationService:
+     def __init__(self, email_client):
+         self.client = email_client
+
+     def send_welcome_email(self, user):
+         template = self.load_template('welcome')
+         self.client.send(user.email, template.render(user))
955
+ ```
956
+
957
+ **Summary:**
958
+ - **Pattern**: Service Layer
959
+ - **Layer**: Application/Business Layer
960
+ - **Responsibility**: Orchestrates email sending with templating
961
+ - **Dependencies**: Email client abstraction, template engine
962
+ - **Testing**: Mock email client for unit tests
963
+ - **Scalability**: Can add queue for async sending without changing interface
964
+
965
+ ## Now analyze this code
966
+
967
+ **Code:**
968
+ ```{LANGUAGE}
969
+ {CODE}
970
+ ```
971
+
972
+ **Summary:**
973
+ ```
974
+
975
+ ### Template 5: Security-Focused Code Review
976
+
977
+ ```markdown
978
+ ## Task
979
+ Perform a security-focused analysis of this code.
980
+
981
+ ## Context
982
+ - Language: {LANGUAGE}
983
+ - Environment: {PRODUCTION/STAGING/DEV}
984
+ - Handles sensitive data: {YES/NO}
985
+ - Authentication/Authorization: {DESCRIPTION}
986
+
987
+ ## Code
988
+ ```{LANGUAGE}
989
+ {CODE}
990
+ ```
991
+
992
+ ## Analysis Requirements
993
+
994
+ Evaluate for:
995
+
996
+ 1. **Input Validation**
997
+ - Are all inputs validated?
998
+ - Is sanitization performed correctly?
999
+ - SQL injection risks?
1000
+ - XSS vulnerabilities?
1001
+
1002
+ 2. **Authentication & Authorization**
1003
+ - Proper auth checks?
1004
+ - Role-based access control?
1005
+ - Token handling security?
1006
+
1007
+ 3. **Data Protection**
1008
+ - Sensitive data encrypted?
1009
+ - Secrets hardcoded?
1010
+ - Logging sensitive info?
1011
+
1012
+ 4. **Error Handling**
1013
+ - Information leakage in errors?
1014
+ - Proper exception handling?
1015
+ - Stack traces exposed?
1016
+
1017
+ 5. **Dependencies**
1018
+ - Known vulnerabilities in libs?
1019
+ - Outdated packages?
1020
+
1021
+ ## Output Format
1022
+
1023
+ ```json
1024
+ {
1025
+ "overall_risk": "low|medium|high|critical",
1026
+ "vulnerabilities": [
1027
+ {
1028
+ "type": "SQL Injection|XSS|Auth Bypass|etc",
1029
+ "severity": "low|medium|high|critical",
1030
+ "location": "line number or function name",
1031
+ "description": "What the vulnerability is",
1032
+ "exploit_scenario": "How it could be exploited",
1033
+ "remediation": "How to fix it"
1034
+ }
1035
+ ],
1036
+ "best_practices_violated": ["Practice 1", "Practice 2"],
1037
+ "recommendations": ["Recommendation 1", "Recommendation 2"],
1038
+ "safe_for_production": true|false
1039
+ }
1040
+ ```
1041
+ ```
1042
+
1043
+ ### Template 6: Performance Optimization Analysis
1044
+
1045
+ ```markdown
1046
+ ## Task
1047
+ Analyze this code for performance optimization opportunities.
1048
+
1049
+ ## Context
1050
+ - Current performance: {METRICS}
1051
+ - Performance goal: {TARGET}
1052
+ - Typical input size: {SIZE}
1053
+ - Environment: {PROD_SPECS}
1054
+
1055
+ ## Code
1056
+ ```{LANGUAGE}
1057
+ {CODE}
1058
+ ```
1059
+
1060
+ ## Analysis Focus
1061
+
1062
+ 1. **Algorithmic Complexity**
1063
+ - Current time complexity
1064
+ - Current space complexity
1065
+ - Potential optimizations
1066
+
1067
+ 2. **Resource Usage**
1068
+ - Memory allocations
1069
+ - I/O operations
1070
+ - Network calls
1071
+ - Database queries
1072
+
1073
+ 3. **Concurrency**
1074
+ - Parallelization opportunities
1075
+ - Race conditions
1076
+ - Lock contention
1077
+
1078
+ ## Output Format
1079
+
1080
+ ```markdown
1081
+ # Performance Analysis: {FUNCTION/FILE_NAME}
1082
+
1083
+ ## Current Complexity
1084
+ - Time: O(?)
1085
+ - Space: O(?)
1086
+
1087
+ ## Bottlenecks Identified
1088
+
1089
+ ### Bottleneck 1: {NAME}
1090
+ - **Location**: Line X-Y or function name
1091
+ - **Issue**: [Description of the bottleneck]
1092
+ - **Impact**: [How much it slows things down]
1093
+ - **Optimization**: [Specific suggestion]
1094
+ - **Expected Improvement**: [Estimated speedup]
1095
+
1096
+ ### Bottleneck 2: {NAME}
1097
+ [Same structure]
1098
+
1099
+ ## Quick Wins
1100
+ 1. [Easy optimization with good ROI]
1101
+ 2. [Easy optimization with good ROI]
1102
+
1103
+ ## Long-term Improvements
1104
+ 1. [More complex optimization for significant gains]
1105
+ 2. [Architectural changes needed]
1106
+
1107
+ ## Estimated Overall Improvement
1108
+ [X%] faster with quick wins, [Y%] with all optimizations
1109
+ ```
1110
+ ```
1111
+
1112
+ ### Template 7: Dependency Analysis
1113
+
1114
+ ```markdown
1115
+ ## Task
1116
+ Map all dependencies for this code file and analyze coupling.
1117
+
1118
+ ## Code
1119
+ ```{LANGUAGE}
1120
+ {CODE}
1121
+ ```
1122
+
1123
+ ## Output Format
1124
+
1125
+ ```json
1126
+ {
1127
+ "file_path": "string",
1128
+ "dependencies": {
1129
+ "external": [
1130
+ {
1131
+ "package": "package-name",
1132
+ "version": "^1.2.3",
1133
+ "imports": ["specific", "imports"],
1134
+ "purpose": "Why this dependency is used",
1135
+ "alternatives": ["alternative-package"]
1136
+ }
1137
+ ],
1138
+ "internal": [
1139
+ {
1140
+ "module_path": "./relative/path",
1141
+ "imports": ["imports"],
1142
+ "coupling_type": "tight|loose",
1143
+ "circular_dependency": true|false
1144
+ }
1145
+ ]
1146
+ },
1147
+ "coupling_analysis": {
1148
+ "level": "low|medium|high",
1149
+ "issues": ["Circular dependency between X and Y"],
1150
+ "suggestions": ["Break circular dep by introducing interface"]
1151
+ },
1152
+ "dependency_graph": {
1153
+ "direct_dependencies": 5,
1154
+ "transitive_dependencies": 23,
1155
+ "depth": 4
1156
+ },
1157
+ "risks": [
1158
+ {
1159
+ "type": "outdated|security|unmaintained",
1160
+ "package": "package-name",
1161
+ "description": "Risk description"
1162
+ }
1163
+ ]
1164
+ }
1165
+ ```
1166
+ ```
1167
+
1168
+ ### Template 8: Test Coverage Planning
1169
+
1170
+ ```markdown
1171
+ ## Task
1172
+ Analyze this code and create a test coverage plan.
1173
+
1174
+ ## Code
1175
+ ```{LANGUAGE}
1176
+ {CODE}
1177
+ ```
1178
+
1179
+ ## Analysis Requirements
1180
+
1181
+ 1. Identify all testable units (functions, methods, classes)
1182
+ 2. Determine test cases needed for each unit
1183
+ 3. Identify edge cases and error scenarios
1184
+ 4. Suggest test doubles (mocks, stubs) needed
1185
+ 5. Estimate test coverage complexity
1186
+
1187
+ ## Output Format
1188
+
1189
+ ```markdown
1190
+ # Test Coverage Plan: {FILE_NAME}
1191
+
1192
+ ## Test Units
1193
+
1194
+ ### Unit 1: `{function/class_name}`
1195
+
1196
+ **Test Cases:**
1197
+ 1. **Happy Path**: [Description]
1198
+ - Input: [Example input]
1199
+ - Expected Output: [Expected result]
1200
+
1201
+ 2. **Edge Case**: [Description]
1202
+ - Input: [Example input]
1203
+ - Expected Output: [Expected result]
1204
+
1205
+ 3. **Error Case**: [Description]
1206
+ - Input: [Example input]
1207
+ - Expected Behavior: [Exception thrown, error returned, etc.]
1208
+
1209
+ **Test Doubles Needed:**
1210
+ - Mock: {dependency_name} - [Why]
1211
+ - Stub: {dependency_name} - [Why]
1212
+
1213
+ **Complexity**: Low|Medium|High
1214
+
1215
+ ---
1216
+
1217
+ ### Unit 2: `{function/class_name}`
1218
+ [Same structure]
1219
+
1220
+ ---
1221
+
1222
+ ## Overall Test Strategy
1223
+
1224
+ **Total Test Cases**: {NUMBER}
1225
+ **Estimated Coverage**: {PERCENTAGE}%
1226
+ **Priority Order**: [Which units to test first and why]
1227
+ **Integration Tests Needed**: [Cross-unit testing scenarios]
1228
+
1229
+ ## Testing Challenges
1230
+
1231
+ 1. [Challenge 1 and mitigation strategy]
1232
+ 2. [Challenge 2 and mitigation strategy]
1233
+ ```
1234
+ ```
1235
+
1236
+ ---
1237
+
1238
+ ## Implementation Guidelines for mdcontext
1239
+
1240
+ ### 1. Adaptive Prompting Strategy
1241
+
1242
+ ```typescript
1243
+ interface PromptConfig {
+   fileSize: number;
+   complexity: 'low' | 'medium' | 'high';
+   purpose: 'overview' | 'detailed' | 'security' | 'performance';
+ }
+
+ function selectPromptTemplate(config: PromptConfig): string {
+   // Small files: comprehensive analysis
+   if (config.fileSize < 1000) {
+     return config.purpose === 'overview'
+       ? TEMPLATE_QUICK_SUMMARY
+       : TEMPLATE_COMPREHENSIVE;
+   }
+
+   // Large files: focus on structure
+   if (config.fileSize > 5000) {
+     return TEMPLATE_STRUCTURAL_SUMMARY;
+   }
+
+   // Medium files: balanced approach
+   return TEMPLATE_STANDARD;
+ }
1265
+ ```
1266
+
1267
+ ### 2. Token Budget Management
1268
+
1269
+ ```typescript
1270
+ const TOKEN_LIMITS = {
+   SINGLE_FILE: 4000,
+   BATCH_PROCESSING: 8000,
+   CODEBASE_OVERVIEW: 16000,
+ };
+
+ function optimizeForTokens(code: string, limit: number): string {
+   const estimatedTokens = code.length / 4; // Rough estimate
+
+   if (estimatedTokens > limit * 0.8) {
+     // Use structural summary
+     return generateStructuralSummary(code);
+   }
+
+   return code; // Full code fits comfortably
+ }
1286
+ ```
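The snippet above leaves `generateStructuralSummary` undefined. One possible shape for it, shown purely as an assumption, is a line filter that keeps only declaration-like lines. The regular expression is a rough heuristic for TypeScript/JavaScript sources and misses forms such as `export const foo = () => ...`; a real implementation would more likely walk a parsed syntax tree.

```typescript
// Heuristic: lines that start a function, class, interface, type, enum, or import.
const DECLARATION = /^\s*(export\s+)?(async\s+)?(function|class|interface|type|enum|import)\b/;

// Hypothetical stand-in for generateStructuralSummary: keeps declaration lines only.
function sketchStructuralSummary(code: string): string {
  return code
    .split("\n")
    .filter((line) => DECLARATION.test(line))
    .map((line) => line.trim())
    .join("\n");
}
```

The result drops function bodies entirely, which is what lets a large file fit within the token budget at the cost of implementation detail.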
1287
+
1288
+ ### 3. Output Validation
1289
+
1290
+ ```typescript
1291
+ function validateSummaryOutput(output: string, expectedFormat: 'json' | 'markdown'): boolean {
+   if (expectedFormat === 'json') {
+     try {
+       const parsed = JSON.parse(output);
+       return validateSchema(parsed, EXPECTED_SCHEMA);
+     } catch {
+       return false;
+     }
+   }
+
+   // Validate markdown structure
+   return output.includes('##') && output.length > 50;
+ }
1304
+ ```
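The Markdown branch above only checks that `##` appears somewhere in the output. A somewhat stricter check, sketched here under the assumption that the prompt requested specific section headings (such as "Overview" or "Dependencies" from the Markdown output template), is to report which required headings are missing. `missingSections` is a hypothetical helper.

```typescript
// Returns the required heading titles that do not appear in the Markdown output.
function missingSections(markdown: string, required: string[]): string[] {
  const headingPattern = /^#{1,6}\s+/;
  const headings = markdown
    .split("\n")
    .filter((line) => headingPattern.test(line))
    .map((line) => line.replace(headingPattern, "").trim());
  return required.filter((title) => headings.indexOf(title) === -1);
}
```

An empty result means every requested section is present; a non-empty one can be fed back to the model as a targeted correction request.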
1305
+
1306
+ ### 4. Caching Strategy
1307
+
1308
+ ```typescript
1309
+ // Cache summaries with prompt version for invalidation
+ interface CachedSummary {
+   summary: string;
+   promptVersion: string;
+   timestamp: number;
+   fileHash: string;
+ }
+
+ // Invalidate if prompt or file changes.
+ // Assumes `cache` behaves like a Map<string, CachedSummary>.
+ async function getCachedOrGenerate(
+   file: CodeFile,
+   promptVersion: string
+ ): Promise<string> {
+   const cached = cache.get(file.path);
+
+   // Reuse the cached summary only if both the prompt and the file are unchanged
+   if (cached?.promptVersion === promptVersion &&
+       cached?.fileHash === file.hash) {
+     return cached.summary;
+   }
+
+   // Cache miss or stale entry: regenerate and store the fresh summary
+   const summary = await generateSummary(file, promptVersion);
+   cache.set(file.path, {
+     summary,
+     promptVersion,
+     timestamp: Date.now(),
+     fileHash: file.hash,
+   });
+   return summary;
+ }
1331
+ ```
1332
+
1333
+ ---
1334
+
1392
+
1393
+ ## Appendix: Quick Reference Checklist
1394
+
1395
+ ### Before Writing a Code Summarization Prompt
1396
+
1397
+ - [ ] Is the purpose clear? (overview, detailed analysis, security review, etc.)
1398
+ - [ ] Is the output format specified? (JSON, Markdown, prose)
1399
+ - [ ] Is the audience identified? (developers, managers, documentation)
1400
+ - [ ] Are constraints explicit? (length, focus areas, depth)
1401
+ - [ ] Is context provided? (file role, project type, dependencies)
1402
+ - [ ] Are examples included if using few-shot?
1403
+ - [ ] Is the prompt under 50 words for simple tasks?
1404
+ - [ ] Is code properly formatted with language tags?
1405
+ - [ ] Is token budget considered for large files?
1406
+ - [ ] Is output validation planned?
1407
+
1408
+ ### After Receiving LLM Output
1409
+
1410
+ - [ ] Does output match requested format?
1411
+ - [ ] Are all required fields present?
1412
+ - [ ] Is the summary accurate? (spot check against code)
1413
+ - [ ] Are edge cases and issues identified?
1414
+ - [ ] Is complexity analysis reasonable?
1415
+ - [ ] Are dependencies correctly identified?
1416
+ - [ ] Is technical terminology accurate?
1417
+ - [ ] Can this output be parsed programmatically (if needed)?
1418
+ - [ ] Is it suitable for the intended audience?
1419
+ - [ ] Should this be cached for future use?
1420
+
1421
+ ---
1422
+
1423
+ **Document Version:** 1.0
1424
+ **Last Updated:** January 26, 2026
1425
+ **Maintained by:** mdcontext project
1426
+ **Review Schedule:** Quarterly (next review: April 2026)