mdcontext 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (251)
  1. package/.changeset/config.json +9 -9
  2. package/.claude/settings.local.json +25 -0
  3. package/.github/workflows/claude-code-review.yml +44 -0
  4. package/.github/workflows/claude.yml +85 -0
  5. package/CONTRIBUTING.md +186 -0
  6. package/NOTES/NOTES +44 -0
  7. package/README.md +206 -3
  8. package/biome.json +1 -1
  9. package/dist/chunk-23UPXDNL.js +3044 -0
  10. package/dist/chunk-2W7MO2DL.js +1366 -0
  11. package/dist/chunk-3NUAZGMA.js +1689 -0
  12. package/dist/chunk-7TOWB2XB.js +366 -0
  13. package/dist/chunk-7XOTOADQ.js +3065 -0
  14. package/dist/chunk-AH2PDM2K.js +3042 -0
  15. package/dist/chunk-BNXWSZ63.js +3742 -0
  16. package/dist/chunk-BTL5DJVU.js +3222 -0
  17. package/dist/chunk-HDHYG7E4.js +104 -0
  18. package/dist/chunk-HLR4KZBP.js +3234 -0
  19. package/dist/chunk-IP3FRFEB.js +1045 -0
  20. package/dist/chunk-KHU56VDO.js +3042 -0
  21. package/dist/chunk-KRYIFLQR.js +85 -89
  22. package/dist/chunk-LBSDNLEM.js +287 -0
  23. package/dist/chunk-MNTQ7HCP.js +2643 -0
  24. package/dist/chunk-MUJELQQ6.js +1387 -0
  25. package/dist/chunk-MXJGMSLV.js +2199 -0
  26. package/dist/chunk-N6QJGC3Z.js +2636 -0
  27. package/dist/chunk-OBELGBPM.js +1713 -0
  28. package/dist/chunk-OT7R5XTA.js +3192 -0
  29. package/dist/chunk-P7X4RA2T.js +106 -0
  30. package/dist/chunk-PIDUQNC2.js +3185 -0
  31. package/dist/chunk-POGCDIH4.js +3187 -0
  32. package/dist/chunk-PSIEOQGZ.js +3043 -0
  33. package/dist/chunk-PVRT3IHA.js +3238 -0
  34. package/dist/chunk-QNN4TT23.js +1430 -0
  35. package/dist/chunk-RE3R45RJ.js +3042 -0
  36. package/dist/chunk-S7E6TFX6.js +718 -657
  37. package/dist/chunk-SG6GLU4U.js +1378 -0
  38. package/dist/chunk-SJCDV2ST.js +274 -0
  39. package/dist/chunk-SYE5XLF3.js +104 -0
  40. package/dist/chunk-T5VLYBZD.js +103 -0
  41. package/dist/chunk-TOQB7VWU.js +3238 -0
  42. package/dist/chunk-VFNMZ4ZQ.js +3228 -0
  43. package/dist/chunk-VVTGZNBT.js +1533 -1423
  44. package/dist/chunk-W7Q4RFEV.js +104 -0
  45. package/dist/chunk-XTYYVRLO.js +3190 -0
  46. package/dist/chunk-Y6MDYVJD.js +3063 -0
  47. package/dist/cli/main.js +4072 -629
  48. package/dist/index.d.ts +420 -33
  49. package/dist/index.js +8 -15
  50. package/dist/mcp/server.js +103 -7
  51. package/dist/schema-BAWSG7KY.js +22 -0
  52. package/dist/schema-E3QUPL26.js +20 -0
  53. package/dist/schema-EHL7WUT6.js +20 -0
  54. package/docs/019-USAGE.md +44 -5
  55. package/docs/020-current-implementation.md +8 -8
  56. package/docs/021-DOGFOODING-FINDINGS.md +1 -1
  57. package/docs/CONFIG.md +1123 -0
  58. package/docs/ERRORS.md +383 -0
  59. package/docs/summarization.md +320 -0
  60. package/justfile +40 -0
  61. package/package.json +39 -33
  62. package/research/INDEX.md +315 -0
  63. package/research/code-review/README.md +90 -0
  64. package/research/code-review/cli-error-handling-review.md +979 -0
  65. package/research/code-review/code-review-validation-report.md +464 -0
  66. package/research/code-review/main-ts-review.md +1128 -0
  67. package/research/config-docs/SUMMARY.md +357 -0
  68. package/research/config-docs/TEST-RESULTS.md +776 -0
  69. package/research/config-docs/TODO.md +542 -0
  70. package/research/config-docs/analysis.md +744 -0
  71. package/research/config-docs/fix-validation.md +502 -0
  72. package/research/config-docs/help-audit.md +264 -0
  73. package/research/config-docs/help-system-analysis.md +890 -0
  74. package/research/frontmatter/COMMENTS-ARE-SKIPPED.md +149 -0
  75. package/research/frontmatter/LLM-CODE-NAVIGATION.md +276 -0
  76. package/research/issue-review.md +603 -0
  77. package/research/llm-summarization/agent-cli-tools-2026.md +1082 -0
  78. package/research/llm-summarization/alternative-providers-2026.md +1428 -0
  79. package/research/llm-summarization/anthropic-2026.md +367 -0
  80. package/research/llm-summarization/claude-cli-integration.md +1706 -0
  81. package/research/llm-summarization/cli-integration-patterns.md +3155 -0
  82. package/research/llm-summarization/openai-2026.md +473 -0
  83. package/research/llm-summarization/openai-compatible-providers-2026.md +1022 -0
  84. package/research/llm-summarization/opencode-cli-integration.md +1552 -0
  85. package/research/llm-summarization/prompt-engineering-2026.md +1426 -0
  86. package/research/llm-summarization/prototype-results.md +56 -0
  87. package/research/llm-summarization/provider-switching-patterns-2026.md +2153 -0
  88. package/research/llm-summarization/typescript-llm-libraries-2026.md +2436 -0
  89. package/research/mdcontext-pudding/00-EXECUTIVE-SUMMARY.md +282 -0
  90. package/research/mdcontext-pudding/01-index-embed.md +956 -0
  91. package/research/mdcontext-pudding/02-search-COMMANDS.md +142 -0
  92. package/research/mdcontext-pudding/02-search-SUMMARY.md +146 -0
  93. package/research/mdcontext-pudding/02-search.md +970 -0
  94. package/research/mdcontext-pudding/03-context.md +779 -0
  95. package/research/mdcontext-pudding/04-navigation-and-analytics.md +803 -0
  96. package/research/mdcontext-pudding/04-tree.md +704 -0
  97. package/research/mdcontext-pudding/05-config.md +1038 -0
  98. package/research/mdcontext-pudding/06-links-summary.txt +87 -0
  99. package/research/mdcontext-pudding/06-links.md +679 -0
  100. package/research/mdcontext-pudding/07-stats.md +693 -0
  101. package/research/mdcontext-pudding/BUG-FIX-PLAN.md +388 -0
  102. package/research/mdcontext-pudding/P0-BUG-VALIDATION.md +167 -0
  103. package/research/mdcontext-pudding/README.md +168 -0
  104. package/research/mdcontext-pudding/TESTING-SUMMARY.md +128 -0
  105. package/research/research-quality-review.md +834 -0
  106. package/research/semantic-search/embedding-text-analysis.md +156 -0
  107. package/research/semantic-search/multi-word-failure-reproduction.md +171 -0
  108. package/research/semantic-search/query-processing-analysis.md +207 -0
  109. package/research/semantic-search/root-cause-and-solution.md +114 -0
  110. package/research/semantic-search/threshold-validation-report.md +69 -0
  111. package/research/semantic-search/vector-search-analysis.md +63 -0
  112. package/research/test-path-issues.md +276 -0
  113. package/review/ALP-76/1-error-type-design.md +962 -0
  114. package/review/ALP-76/2-error-handling-patterns.md +906 -0
  115. package/review/ALP-76/3-error-presentation.md +624 -0
  116. package/review/ALP-76/4-test-coverage.md +625 -0
  117. package/review/ALP-76/5-migration-completeness.md +440 -0
  118. package/review/ALP-76/6-effect-best-practices.md +755 -0
  119. package/scripts/apply-branch-protection.sh +47 -0
  120. package/scripts/branch-protection-templates.json +79 -0
  121. package/scripts/prototype-summarization.ts +346 -0
  122. package/scripts/rebuild-hnswlib.js +32 -37
  123. package/scripts/setup-branch-protection.sh +64 -0
  124. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/active-provider.json +7 -0
  125. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.json +541 -0
  126. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/bm25.meta.json +5 -0
  127. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/config.json +8 -0
  128. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  129. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  130. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/documents.json +60 -0
  131. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/links.json +13 -0
  132. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/.mdcontext/indexes/sections.json +1197 -0
  133. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/configuration-management.md +99 -0
  134. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/distributed-systems.md +92 -0
  135. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/error-handling.md +78 -0
  136. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/failure-automation.md +55 -0
  137. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/job-context.md +69 -0
  138. package/src/__tests__/fixtures/semantic-search/multi-word-corpus/process-orchestration.md +99 -0
  139. package/src/cli/argv-preprocessor.test.ts +2 -2
  140. package/src/cli/cli.test.ts +230 -33
  141. package/src/cli/commands/config-cmd.ts +642 -0
  142. package/src/cli/commands/context.ts +97 -9
  143. package/src/cli/commands/duplicates.ts +122 -0
  144. package/src/cli/commands/embeddings.ts +529 -0
  145. package/src/cli/commands/index-cmd.ts +210 -30
  146. package/src/cli/commands/index.ts +3 -0
  147. package/src/cli/commands/search.ts +894 -64
  148. package/src/cli/commands/stats.ts +3 -0
  149. package/src/cli/commands/tree.ts +26 -5
  150. package/src/cli/config-layer.ts +176 -0
  151. package/src/cli/error-handler.test.ts +235 -0
  152. package/src/cli/error-handler.ts +655 -0
  153. package/src/cli/flag-schemas.ts +66 -0
  154. package/src/cli/help.ts +209 -7
  155. package/src/cli/main.ts +348 -58
  156. package/src/cli/options.ts +10 -0
  157. package/src/cli/shared-error-handling.ts +199 -0
  158. package/src/cli/utils.ts +150 -17
  159. package/src/config/file-provider.test.ts +320 -0
  160. package/src/config/file-provider.ts +273 -0
  161. package/src/config/index.ts +72 -0
  162. package/src/config/integration.test.ts +667 -0
  163. package/src/config/precedence.test.ts +277 -0
  164. package/src/config/precedence.ts +451 -0
  165. package/src/config/schema.test.ts +414 -0
  166. package/src/config/schema.ts +603 -0
  167. package/src/config/service.test.ts +320 -0
  168. package/src/config/service.ts +243 -0
  169. package/src/config/testing.test.ts +264 -0
  170. package/src/config/testing.ts +110 -0
  171. package/src/core/types.ts +6 -33
  172. package/src/duplicates/detector.test.ts +183 -0
  173. package/src/duplicates/detector.ts +414 -0
  174. package/src/duplicates/index.ts +18 -0
  175. package/src/embeddings/embedding-namespace.test.ts +300 -0
  176. package/src/embeddings/embedding-namespace.ts +947 -0
  177. package/src/embeddings/heading-boost.test.ts +222 -0
  178. package/src/embeddings/hnsw-build-options.test.ts +198 -0
  179. package/src/embeddings/hyde.test.ts +272 -0
  180. package/src/embeddings/hyde.ts +264 -0
  181. package/src/embeddings/index.ts +2 -0
  182. package/src/embeddings/openai-provider.ts +332 -83
  183. package/src/embeddings/pricing.json +22 -0
  184. package/src/embeddings/provider-constants.ts +204 -0
  185. package/src/embeddings/provider-errors.test.ts +967 -0
  186. package/src/embeddings/provider-errors.ts +565 -0
  187. package/src/embeddings/provider-factory.test.ts +240 -0
  188. package/src/embeddings/provider-factory.ts +225 -0
  189. package/src/embeddings/provider-integration.test.ts +788 -0
  190. package/src/embeddings/query-preprocessing.test.ts +187 -0
  191. package/src/embeddings/semantic-search-threshold.test.ts +508 -0
  192. package/src/embeddings/semantic-search.ts +780 -93
  193. package/src/embeddings/types.ts +293 -16
  194. package/src/embeddings/vector-store.ts +486 -77
  195. package/src/embeddings/voyage-provider.ts +313 -0
  196. package/src/errors/errors.test.ts +845 -0
  197. package/src/errors/index.ts +533 -0
  198. package/src/index/ignore-patterns.test.ts +354 -0
  199. package/src/index/ignore-patterns.ts +305 -0
  200. package/src/index/indexer.ts +286 -48
  201. package/src/index/storage.ts +94 -30
  202. package/src/index/types.ts +40 -2
  203. package/src/index/watcher.ts +67 -9
  204. package/src/index.ts +22 -0
  205. package/src/integration/search-keyword.test.ts +678 -0
  206. package/src/mcp/server.ts +135 -6
  207. package/src/parser/parser.ts +18 -19
  208. package/src/parser/section-filter.test.ts +277 -0
  209. package/src/parser/section-filter.ts +125 -3
  210. package/src/search/__tests__/hybrid-search.test.ts +650 -0
  211. package/src/search/bm25-store.ts +366 -0
  212. package/src/search/cross-encoder.test.ts +253 -0
  213. package/src/search/cross-encoder.ts +406 -0
  214. package/src/search/fuzzy-search.test.ts +419 -0
  215. package/src/search/fuzzy-search.ts +273 -0
  216. package/src/search/hybrid-search.ts +448 -0
  217. package/src/search/path-matcher.test.ts +276 -0
  218. package/src/search/path-matcher.ts +33 -0
  219. package/src/search/searcher.test.ts +99 -1
  220. package/src/search/searcher.ts +189 -67
  221. package/src/search/wink-bm25.d.ts +30 -0
  222. package/src/summarization/cli-providers/claude.ts +202 -0
  223. package/src/summarization/cli-providers/detection.test.ts +273 -0
  224. package/src/summarization/cli-providers/detection.ts +118 -0
  225. package/src/summarization/cli-providers/index.ts +8 -0
  226. package/src/summarization/cost.test.ts +139 -0
  227. package/src/summarization/cost.ts +102 -0
  228. package/src/summarization/error-handler.test.ts +127 -0
  229. package/src/summarization/error-handler.ts +111 -0
  230. package/src/summarization/index.ts +102 -0
  231. package/src/summarization/pipeline.test.ts +498 -0
  232. package/src/summarization/pipeline.ts +231 -0
  233. package/src/summarization/prompts.test.ts +269 -0
  234. package/src/summarization/prompts.ts +133 -0
  235. package/src/summarization/provider-factory.test.ts +396 -0
  236. package/src/summarization/provider-factory.ts +178 -0
  237. package/src/summarization/types.ts +184 -0
  238. package/src/summarize/summarizer.ts +104 -35
  239. package/src/types/huggingface-transformers.d.ts +66 -0
  240. package/tests/fixtures/cli/.mdcontext/active-provider.json +7 -0
  241. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.bin +0 -0
  242. package/tests/fixtures/cli/.mdcontext/embeddings/openai_text-embedding-3-small_512/vectors.meta.bin +0 -0
  243. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +4 -4
  244. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +14 -0
  245. package/tests/integration/embed-index.test.ts +712 -0
  246. package/tests/integration/search-context.test.ts +469 -0
  247. package/tests/integration/search-semantic.test.ts +522 -0
  248. package/vitest.config.ts +1 -6
  249. package/AGENTS.md +0 -46
  250. package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
  251. package/tests/fixtures/cli/.mdcontext/vectors.meta.json +0 -1264
package/research/llm-summarization/alternative-providers-2026.md (new file)
@@ -0,0 +1,1428 @@
# Alternative LLM Providers for Code Summarization - 2026 Research

**Date**: January 26, 2026
**Author**: Research Team
**Purpose**: Evaluate alternative LLM providers for code summarization to reduce dependency on OpenAI and optimize costs

---

## Executive Summary

This document provides a comprehensive analysis of alternative LLM providers available in 2026 for code summarization tasks. Based on current market analysis, the following providers emerge as strong candidates:

**Top Recommendations:**
1. **DeepSeek** - Best overall value (95% cheaper than OpenAI)
2. **Google Gemini 2.5 Flash** - Best balance of speed, cost, and quality
3. **Qwen 2.5/3 Coder** - Best specialized code models (open source)
4. **Mistral Devstral 2** - Cost-efficient alternative to Claude Sonnet

**For Local Deployment:**
1. **Qwen 2.5 Coder 32B** - Industry-leading local code model
2. **DeepSeek R1 Distilled** - Strong reasoning capabilities
3. **Code Llama 34B** - Mature, well-supported option

---

## 1. Google Gemini

### Models & Versions

| Model | Context Window | Status | Release Date |
|-------|---------------|--------|--------------|
| Gemini 2.0 Flash | 1M tokens | Deprecated (EOL: March 31, 2026) | 2025 |
| Gemini 2.0 Flash-Lite | 1M tokens | Active | 2025 |
| Gemini 2.5 Flash | 1M tokens | Active | 2025 |
| Gemini 2.5 Flash-Lite | 1M tokens | Active | 2025 |
| Gemini 2.5 Pro | 1M tokens | Active | 2025 |
| Gemini 3 Pro Preview | 1M tokens | Preview | 2026 |

### Pricing (Per 1M Tokens)

| Model | Input | Output | Cache Reads (10%) |
|-------|-------|--------|-------------------|
| **2.0 Flash** | $0.10 | $0.40 | $0.01 |
| **2.0 Flash-Lite** | $0.10 | $0.40 | $0.01 |
| **2.5 Flash** | $0.30 | $2.50 | $0.03 |
| **2.5 Pro** | $1.25 | $10.00 | $0.125 |
| **2.5 Pro (>200K context)** | $2.50 | $20.00 | $0.25 |
| **3 Pro Preview** | $2.00 | $12.00 | $0.20 |

**Batch Processing Discount**: 50% off (Gemini 2.5 Pro drops to $0.625/$5.00)

### Code Understanding Capabilities

**Strengths:**
- Native tool use and function calling
- Excellent speed-to-quality ratio on Flash models
- Superior context handling (1M tokens = ~750K words)
- Unified pricing model (no short vs long context tiers on 2.0/2.5 Flash)
- Strong multimodal capabilities

**Performance:**
- Gemini 2.0 Flash delivers next-gen features with improved speed
- 2.5 Pro offers competitive reasoning with GPT-4 level performance
- Native code generation and understanding across multiple languages

### Pros & Cons

**Pros:**
- ✅ Extremely competitive pricing (Flash models)
- ✅ Massive 1M token context window
- ✅ Cache reads cost only 10% of base price (90% savings)
- ✅ Free tier available with generous rate limits
- ✅ Fast inference speeds
- ✅ Excellent for large codebase analysis

**Cons:**
- ❌ 2.0 Flash being deprecated (migration required)
- ❌ Pro models get expensive with >200K context
- ❌ Less specialized for code than dedicated code models
- ❌ API availability limited to certain regions

### Recommendations

**Best Use Cases:**
- **2.5 Flash**: Primary choice for most code summarization (best price/performance)
- **2.5 Pro**: Complex reasoning over large codebases (use with caching)
- **Batch API**: Non-urgent summarization jobs (50% cost reduction)

**Cost Optimization:**
- Leverage prompt caching for repeated codebase queries (90% savings)
- Use batch processing for large-scale summarization tasks
- Start with Flash-Lite for simple tasks, upgrade to Flash as needed

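
These savings compound when the same codebase is queried repeatedly. A quick sketch of the arithmetic, using the 2.5 Flash rates from the pricing table above (not live pricing):

```python
# Gemini 2.5 Flash rates per the table: $0.30/1M input, $2.50/1M output,
# cache reads at 10% of the input rate, 50% batch discount.
RATE_IN, RATE_OUT, CACHE_FACTOR, BATCH_FACTOR = 0.30, 2.50, 0.10, 0.50

def flash_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimated USD cost; cached_tokens is the cached share of input_tokens."""
    fresh = input_tokens - cached_tokens
    cost = (fresh * RATE_IN + cached_tokens * RATE_IN * CACHE_FACTOR
            + output_tokens * RATE_OUT) / 1_000_000
    return cost * (BATCH_FACTOR if batch else 1.0)

# Summarizing a 500K-token codebase 10 times, with the codebase cached
# after the first call: roughly $1.63 naive vs $0.41 with caching.
naive = 10 * flash_cost(500_000, 5_000)
cached = flash_cost(500_000, 5_000) + 9 * flash_cost(500_000, 5_000, cached_tokens=500_000)
```

The same function shows why batch mode suits non-urgent jobs: it halves the total regardless of cache behavior.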
---

## 2. Meta Llama (Open Source)

### Models & Versions

| Model | Parameters | Architecture | Release Date |
|-------|-----------|--------------|--------------|
| Llama 3.3 | 70B | Dense | 2025 |
| Llama 4 Scout | 17B active (MoE) | Mixture of Experts | April 2025 |
| Llama 4 Maverick | 17B active, 128 experts | MoE | April 2025 |
| Code Llama | 7B, 13B, 34B, 70B | Dense | 2023 (Legacy) |

### Pricing

**API Pricing (via cloud providers):**
- **AWS Bedrock**: Variable pricing based on region and throughput
- **Together AI**: ~$0.60/$0.80 per 1M tokens (Llama 3 70B)
- **Replicate**: Pay-per-second pricing

**Local Deployment**: Free (open source)

### Hardware Requirements (Local)

| Model | VRAM (FP16) | VRAM (Q4 Quantized) | Recommended GPU |
|-------|-------------|---------------------|-----------------|
| Llama 4 Maverick 17B | ~34GB | ~10-12GB | RTX 4090 (24GB) |
| Llama 3.3 70B | ~140GB | ~35-40GB | A100 (80GB) or multi-GPU |
| Code Llama 34B | ~68GB | ~20-24GB | RTX 4090 (24GB) |
| Code Llama 13B | ~26GB | ~8-10GB | RTX 4070 Ti (12GB) |

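
The VRAM figures in the table follow directly from bytes per parameter: FP16 stores 2 bytes per weight, 4-bit quantization roughly 0.5 bytes, plus headroom for KV cache and activations. A rough estimator (the 20% overhead allowance is an illustrative assumption, not a measured figure):

```python
def vram_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate in GB: weights * bytes-per-weight * overhead.

    overhead=1.2 is an illustrative allowance for KV cache and activations.
    """
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# Weights alone (overhead=1.0) reproduce the table's figures:
# 70B FP16 -> 140GB, 70B Q4 -> 35GB, 34B FP16 -> 68GB, 13B FP16 -> 26GB
for params, label in [(70, "Llama 3.3 70B"), (34, "Code Llama 34B"), (13, "Code Llama 13B")]:
    print(f"{label}: FP16 ~{vram_gb(params, 16, 1.0):.0f}GB, Q4 ~{vram_gb(params, 4, 1.0):.0f}GB")
```

This is why Q4/Q5 quantization is the standard route onto consumer GPUs: it cuts the weight footprint by roughly 4x relative to FP16.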
### Code Understanding Capabilities

**Llama 4 Maverick:**
- Achieves results comparable to DeepSeek V3 on coding benchmarks at less than half the active parameters
- Specialized for coding, chatbots, and technical assistants
- MoE architecture allows specialized experts for code/math

**Code Llama (Legacy but Proven):**
- Supports Python, C++, Java, PHP, TypeScript/JavaScript, C#, Bash
- Specialized training on code datasets
- Strong code completion and generation
- Good documentation understanding

**Performance:**
- 72.2% on SWE-bench Verified (Devstral 2 comparison metric)
- Comprehensive multi-document analysis
- Robust codebase reasoning
- Sophisticated data processing

### Pros & Cons

**Pros:**
- ✅ Fully open source (free for commercial use)
- ✅ Can run locally (no API costs)
- ✅ Strong community support
- ✅ MoE architecture efficient for inference
- ✅ Good code generation across major languages
- ✅ No vendor lock-in

**Cons:**
- ❌ Requires significant hardware for larger models
- ❌ Setup and maintenance overhead for local deployment
- ❌ Not as specialized as newer code models (Qwen, DeepSeek Coder)
- ❌ Maverick still new, less battle-tested than Llama 3

### Recommendations

**Best Use Cases:**
- Local deployment when privacy is critical
- Organizations wanting to avoid API costs at scale
- Teams with GPU infrastructure already in place
- Use Code Llama 13B/34B as a baseline code model

**Local Setup:**
- Deploy via Ollama for easy management
- Use quantized versions (Q4/Q5) to fit on consumer GPUs
- Llama 4 Maverick 17B is the sweet spot for code tasks (fits on an RTX 4090)

---

## 3. DeepSeek

### Models & Versions

| Model | Parameters | Specialization | Release Date |
|-------|-----------|---------------|--------------|
| DeepSeek V3 | 671B total, MoE | General reasoning | Dec 2024 |
| DeepSeek V3.2-Exp | 671B total, MoE | Experimental, optimized | Jan 2026 |
| DeepSeek R1 | Unknown (reasoning model) | Chain-of-thought reasoning | Jan 2025 |
| DeepSeek Coder V2 | 236B total, MoE | Code specialization | 2024 |
| DeepSeek R1 Distilled | Various sizes | Distilled reasoning | 2025 |

### Pricing (Per 1M Tokens)

| Model | Input | Output | Cache Hit |
|-------|-------|--------|-----------|
| **DeepSeek V3** | $0.28 | $1.12 | $0.028 (90% off) |
| **DeepSeek V3.2-Exp** | $0.14 | $0.56 | $0.014 (90% off) |
| **DeepSeek Chat** | $0.28 | $0.42 | $0.028 |
| **DeepSeek Reasoner** | Higher (thinking tokens) | Variable | N/A |

**New User Bonus**: 5M free tokens (~$8.40 value) valid for 30 days

### Code Understanding Capabilities

**Strengths:**
- **deepseek-chat**: Optimized for general tasks including classification, summarization, tool pipelines
- **deepseek-coder**: Specialized for code generation, explanation, debugging
- **deepseek-reasoner**: Complex code understanding with chain-of-thought
- 8K max output for chat mode
- Excellent at condensing long documents and code

**Performance:**
- V3.2-Exp: Up to 95% cheaper than GPT-5
- Competitive with leading models on code benchmarks
- Strong multi-language code support
- Can generate code snippets, explain complex sections, debug programs

### Pros & Cons

**Pros:**
- ✅ **Exceptional value** - 95% cheaper than OpenAI
- ✅ Cache hits provide 90% additional savings
- ✅ Strong code understanding and generation
- ✅ Generous free tier (5M tokens)
- ✅ Multiple specialized models (chat, coder, reasoner)
- ✅ Fast inference
- ✅ Recommended 70%+ cache hit rates in production

**Cons:**
- ❌ Newer provider, less established than OpenAI/Google
- ❌ API stability/uptime unclear
- ❌ Limited documentation compared to major providers
- ❌ Reasoning model has variable token costs
- ❌ Potential geopolitical concerns (Chinese company)

### Recommendations

**Best Use Cases:**
- **PRIMARY RECOMMENDATION** for cost-sensitive deployments
- High-volume code summarization (leverage caching)
- Batch processing of codebases
- Development/testing before production (use free tier)

**Cost Optimization:**
- Architect for high cache hit rates (70%+)
- Use deepseek-chat for straightforward summarization
- Reserve deepseek-reasoner for complex analysis
- V3.2-Exp offers 50% cost reduction over V3

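
The cache-hit guidance translates into a simple blended rate. At the V3 prices from the table above ($0.28/1M input, $0.028/1M on cache hits), a 70% hit rate cuts effective input cost by about 63%. A sketch of the arithmetic (rates copied from the table, not fetched live):

```python
def blended_input_rate(base_rate, cache_rate, hit_rate):
    """Effective $/1M input tokens given a prompt-cache hit rate."""
    return hit_rate * cache_rate + (1 - hit_rate) * base_rate

# DeepSeek V3 at a 70% cache hit rate:
v3_effective = blended_input_rate(0.28, 0.028, hit_rate=0.70)  # ~$0.104 per 1M
savings = 1 - v3_effective / 0.28                              # ~63% off input cost
```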
**Risk Mitigation:**
- Start with a pilot project to validate quality
- Have a fallback provider (Gemini Flash) for critical workloads
- Monitor API availability and response times

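
A fallback provider can be a thin wrapper that tries the primary and fails over on error. A minimal sketch; the two callables here stand in for real DeepSeek and Gemini client calls and are illustrative assumptions, not actual SDK APIs:

```python
def summarize_with_fallback(prompt, primary, fallback):
    """Try the primary provider; on any exception, fail over to the fallback.

    `primary` and `fallback` are callables taking a prompt and returning a
    summary string -- stand-ins for real provider client calls.
    """
    try:
        return primary(prompt), "primary"
    except Exception:
        return fallback(prompt), "fallback"

# Usage with stand-in callables simulating a primary outage:
def deepseek_stub(prompt):
    raise TimeoutError("simulated outage")

def gemini_stub(prompt):
    return f"summary of {len(prompt)} chars"

text, provider = summarize_with_fallback(
    "def add(a, b): return a + b", deepseek_stub, gemini_stub
)
```

In production this would also log which path was taken, feeding the availability monitoring suggested above.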
---

## 4. Mistral AI

### Models & Versions

| Model | Parameters | Specialization | Release Date |
|-------|-----------|---------------|--------------|
| Devstral 2 | 123B | Code agent/engineering | 2025 |
| Devstral Small 2 | 24B | Lightweight coding | 2025 |
| Mistral Medium 3 | Unknown | General purpose | 2025 |
| Mistral Large 2411 | Unknown | Flagship model | Nov 2024 |

### Pricing (Per 1M Tokens)

| Model | Input | Output |
|-------|-------|--------|
| **Devstral 2** | $0.40 | $2.00 |
| **Devstral Small 2** | $0.10 | $0.30 |
| **Mistral Medium 3** | $0.40 | $2.00 |
| **Mistral Large 2411** | $2.00 | $6.00 |

### Code Understanding Capabilities

**Devstral 2 Features:**
- **SWE-bench Verified**: 72.2% (Devstral 2), 68.0% (Small 2)
- **Frontier code agent** for solving software engineering tasks
- Explores codebases and orchestrates changes across multiple files
- Maintains architecture-level context
- Tracks framework dependencies
- Detects failures and retries with corrections
- Multi-file codebase reasoning

**Performance:**
- **7x more cost-efficient** than Claude Sonnet for real-world tasks
- Purpose-built for code generation and understanding
- Optimized for developer workflows
- Strong at complex refactoring and codebase analysis

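
The 7x figure is roughly consistent with list prices: Devstral 2 at $0.40/$2.00 versus Claude Sonnet 4.5 at $3.00/$15.00 (see the Claude pricing table later in this document) is a 7.5x gap for a typical workload. A quick check:

```python
def workload_cost(rate_in, rate_out, in_tokens=1_000_000, out_tokens=100_000):
    """USD for a workload of in_tokens input and out_tokens output."""
    return (in_tokens * rate_in + out_tokens * rate_out) / 1_000_000

devstral = workload_cost(0.40, 2.00)   # $0.60
sonnet = workload_cost(3.00, 15.00)    # $4.50
ratio = sonnet / devstral              # 7.5x at list prices
```

The ratio holds for any input/output mix here because both rates differ by the same factor.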
---

## 5. Qwen (Alibaba Cloud)

### Models & Versions

| Model | Parameters | Architecture | Release Date |
|-------|-----------|--------------|--------------|
| Qwen 2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Dense | 2024 |
| Qwen 3 Coder | Various sizes | Advanced | 2025/2026 |
| Qwen 3 Coder 480B-A35B | 480B total, 35B active | MoE | 2026 |

### Pricing (Via API Providers)

| Model | Input | Output | Provider |
|-------|-------|--------|----------|
| **Qwen 2.5 Coder 32B** | $0.03 | $0.11 | Together AI |
| **Qwen 2.5 Coder 32B** | $0.80 | (bundled) | Silicon Flow |
| **Qwen 3 Coder 480B** | $2.00 | (bundled) | Alibaba Cloud |

**Note**: Pricing varies significantly by provider. Alibaba Cloud has tiered pricing based on input tokens.

### Hardware Requirements (Local)

| Model | VRAM (FP16) | VRAM (Q4) | Recommended GPU |
|-------|-------------|-----------|-----------------|
| **Qwen 2.5 Coder 7B** | ~14GB | ~4-6GB | RTX 3060 (12GB) |
| **Qwen 2.5 Coder 14B** | ~28GB | ~8-10GB | RTX 4070 Ti |
| **Qwen 2.5 Coder 32B** | ~64GB | ~20-24GB | RTX 4090 (24GB) |
| **Qwen 3 Coder 480B** | ~960GB | ~240GB | Multi-GPU cluster |

### Code Understanding Capabilities

**Qwen 2.5 Coder:**
- **Best-in-class open source** code generation
- Beats GPT-4o on multiple benchmarks (EvalPlus, LiveCodeBench, BigCodeBench)
- **Code Repair**: 73.7 on Aider benchmark (comparable to GPT-4o)
- **Multi-language**: Excellent across 40+ programming languages
- **McEval Score**: 65.9 (multi-language evaluation)

**Qwen 3 Coder (2026):**
- **Powerful coding agent** capabilities
- Excellent at tool calling and environment interaction
- **Autonomous programming** features
- **480B-A35B**: Largest open-source coding model
- Trained with long-horizon RL across 20,000 parallel environments
- **Claude Sonnet 4 level performance** on complex tasks

**Key Strengths:**
- Code generation, reasoning, and fixing
- Comprehensive language support
- Local deployment friendly
- Production-ready quality

### Pros & Cons

**Pros:**
- ✅ **Best local code model** in 2026
- ✅ Open source (free for local use)
- ✅ Genuinely competes with GPT-4o
- ✅ 40+ programming languages
- ✅ Multiple size options (0.5B to 480B)
- ✅ Excellent code repair capabilities
- ✅ Strong benchmarks across the board
- ✅ Qwen 3 Coder has agent capabilities

**Cons:**
- ❌ Larger models require significant hardware
- ❌ API pricing varies widely by provider
- ❌ 480B model impractical for most local deployments
- ❌ Less mature ecosystem than Llama
- ❌ Documentation primarily in Chinese (improving)

### Recommendations

**Best Use Cases:**
- **PRIMARY RECOMMENDATION** for local code model deployment
- Multi-language code understanding and generation
- Code repair and debugging tasks
- When quality competitive with GPT-4o is required

**Model Selection (Local):**
- **7B**: Entry-level, runs on 8GB VRAM (quantized)
- **14B**: Mid-range, good quality on 16GB VRAM
- **32B**: **SWEET SPOT** - GPT-4o competitive, fits on RTX 4090
- **480B**: Research/enterprise only

**Deployment Strategy:**
- Deploy 32B via Ollama for easy management
- Use Q4 quantization to fit on consumer hardware
- For API, use Together AI for best pricing

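
Ollama exposes a local HTTP API, so a summarization call reduces to a single POST once the model is pulled. A minimal sketch against Ollama's `/api/generate` endpoint; the `qwen2.5-coder:32b` model tag and the prompt wording are assumptions for illustration:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_request(code, model="qwen2.5-coder:32b"):
    """Build the JSON body for a non-streaming Ollama /api/generate call."""
    return {
        "model": model,
        "prompt": f"Summarize what this code does in two sentences:\n\n{code}",
        "stream": False,
    }

def summarize(code):
    body = json.dumps(build_request(code)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama daemon with the model pulled):
#   print(summarize("def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)"))
```

Because the endpoint is local, the same wrapper works unchanged for any other Ollama-hosted model tag.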
---

## 6. Anthropic Claude (Baseline Comparison)

### Models & Versions

| Model | Context | Release Date |
|-------|---------|--------------|
| Claude 4.5 Haiku | 200K tokens | 2025 |
| Claude 4.5 Sonnet | 200K tokens | 2025 |
| Claude 4.5 Opus | 200K tokens | 2025 |
| Claude Opus 4.1 (Legacy) | 200K tokens | 2024 |

### Pricing (Per 1M Tokens)

| Model | Input | Output | Cache Read (90% off) |
|-------|-------|--------|---------------------|
| **Haiku 4.5** | $1.00 | $5.00 | $0.10 |
| **Sonnet 4.5** | $3.00 | $15.00 | $0.30 |
| **Opus 4.5** | $5.00 | $25.00 | $0.50 |
| **Opus 4.1 (Legacy)** | $15.00 | $75.00 | N/A |

**Batch Processing**: 50% discount (e.g., Opus 4.5 → $2.50/$12.50)

### Cost Optimization Features

**Prompt Caching:**
- Cache reads cost 90% less than base price
- Ideal for large codebases (chatting with 100-page PDFs or codebases)
- Example: Opus 4.5 cache read = $0.50/MTok vs $5.00/MTok

442
+ **Batch API:**
443
+ - 50% flat discount on all token costs
444
+ - For non-urgent tasks (overnight report generation)
445
+ - No quality degradation
446
+
447
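
These two discounts compound. A quick sketch using the table's Opus 4.5 rates (illustrative figures from this document, not live API prices):

```python
# Illustrative Claude Opus 4.5 cost model using the rates above:
# $5.00/MTok input, $25.00/MTok output, $0.50/MTok cache reads,
# and a flat 50% batch discount on every price.

def summary_cost(input_mtok, output_mtok, cached_frac=0.0, batch=False):
    """Cost in dollars, given millions of tokens in each direction."""
    in_price, out_price, cache_price = 5.00, 25.00, 0.50
    if batch:  # Batch API halves every token price
        in_price, out_price, cache_price = in_price / 2, out_price / 2, cache_price / 2
    cached = input_mtok * cached_frac        # input served from the prompt cache
    fresh = input_mtok - cached              # input billed at the full rate
    return fresh * in_price + cached * cache_price + output_mtok * out_price

baseline = summary_cost(100, 10)                               # no optimization
optimized = summary_cost(100, 10, cached_frac=0.9, batch=True)  # cache + batch
print(baseline, optimized)
```

Under these assumptions the optimized run costs under a quarter of the naive one; the exact ratio depends on the cache hit rate.
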
### Code Understanding Capabilities

**Strengths:**
- High-quality code generation and understanding
- Excellent at complex reasoning over code
- Strong at following coding conventions
- Good at explaining and documenting code
- Multi-file context understanding

**Use Cases:**
- Text/code summarization
- Image analysis of architecture diagrams
- Product feature development
- Classification and async job pipelines

### Pros & Cons

**Pros:**
- ✅ Excellent quality and reliability
- ✅ Strong enterprise adoption
- ✅ Good documentation and support
- ✅ Prompt caching (90% savings)
- ✅ Batch processing (50% savings)
- ✅ Opus 4.5 is 67% cheaper than 4.1

**Cons:**
- ❌ **Expensive** compared to alternatives
- ❌ 5-25x more expensive than DeepSeek
- ❌ 3-10x more expensive than Gemini Flash
- ❌ 200K context vs 1M for Gemini
- ❌ Higher costs limit batch processing use cases

### Recommendations

**When to Use Claude:**
- Highest quality requirements
- Complex reasoning over code
- When cost is secondary to quality
- Enterprise deployments with existing Claude contracts

**Cost Optimization:**
- Always use prompt caching for large codebases
- Batch API for non-urgent tasks
- Consider Haiku for simpler summarization tasks
- Use Sonnet as the baseline; upgrade to Opus only when needed

**Alternatives to Consider:**
- **Replace Sonnet with**: Mistral Devstral 2 (7x cheaper)
- **Replace Haiku with**: Gemini 2.5 Flash (3x cheaper)
- **Replace Opus with**: Qwen 3 Coder 480B (local) or DeepSeek V3

---

## 7. OpenAI GPT-5 (Baseline Comparison)

### Models & Versions

| Model | Specialization | Release Date |
|-------|---------------|--------------|
| GPT-5 | General flagship | August 2025 |
| GPT-5-mini | Lightweight | 2025 |
| GPT-5-nano | Ultra-lightweight | 2025 |
| GPT-5 Pro | Maximum capability | 2025 |
| GPT-5.2-Codex | Agentic coding | 2025 |
| GPT-5.1-Codex-Max | Complex refactoring | 2025 |
| GPT-5.1-Codex-Mini | Fast coding tasks | 2025 |

### Pricing (Per 1M Tokens)

| Model | Input | Output | Batch (50% off) |
|-------|-------|--------|-----------------|
| **GPT-5** | $1.25 | $10.00 | $0.625 / $5.00 |
| **GPT-5-mini** | $0.25 | $2.00 | $0.125 / $1.00 |
| **GPT-5-nano** | $0.05 | $0.40 | $0.025 / $0.20 |
| **GPT-5 Pro** | $15.00 | $120.00 | $7.50 / $60.00 |
| **GPT-4 Turbo** | $10.00 | $30.00 | N/A |

**ChatGPT Plus**: $20/month (includes GPT-5 access)

### Code Understanding Capabilities

**GPT-5.2-Codex:**
- Top-of-the-line agentic coding model
- Complex, multi-step tasks
- Autonomous assistant capabilities

**GPT-5.1-Codex-Max:**
- Optimized for long, complex problems
- Entire legacy codebase refactoring
- Handles massive context

**GPT-5.1-Codex-Mini:**
- Fast and efficient
- Function completion, unit tests
- Code snippet translation

**General Capabilities:**
- Improved reasoning vs GPT-4
- Better debugging and cross-language development
- Multi-step code generation
- Strong documentation generation

### Pros & Cons

**Pros:**
- ✅ Industry-leading quality
- ✅ Excellent documentation and ecosystem
- ✅ Specialized Codex models for code
- ✅ Strong multi-language support
- ✅ Reliable API uptime
- ✅ Nano model is competitively priced

**Cons:**
- ❌ **Very expensive** at the standard tier
- ❌ 12-80x more expensive than DeepSeek
- ❌ 4-25x more expensive than Gemini Flash
- ❌ Pro model is prohibitively expensive
- ❌ No prompt caching (unlike Claude/Gemini)
- ❌ Smaller context window than Gemini

### Recommendations

**When to Use OpenAI:**
- Existing OpenAI integrations
- When maximum quality is critical
- ChatGPT Plus subscription already in place
- Enterprise contracts with favorable pricing

**Cost Optimization:**
- Use the Batch API for a 50% discount (non-urgent tasks)
- GPT-5-nano for simple summarization ($0.05/$0.40)
- GPT-5-mini for most code tasks ($0.25/$2.00)
- Avoid GPT-5 Pro unless absolutely necessary

**Alternatives to Consider:**
- **Replace GPT-5 with**: DeepSeek V3.2 (10x cheaper, similar quality)
- **Replace GPT-5-mini with**: Gemini 2.5 Flash (comparable price, better context)
- **Replace Codex with**: Qwen 2.5 Coder 32B (local, free)

---

## 8. Cohere

### Models & Versions

| Model | Context | Specialization | Release Date |
|-------|---------|---------------|--------------|
| Command R | 128K tokens | General, efficient | Aug 2024 |
| Command R+ | 128K tokens | Complex reasoning | Aug 2024 |

### Pricing (Per 1M Tokens)

| Model | Input | Output |
|-------|-------|--------|
| **Command R** | $0.15 | $0.60 |
| **Command R+ (08-2024)** | $2.50 | $10.00 |

**Cost Comparison:**
- Switching from R+ to R saves 94% on input costs ($2.50 → $0.15)

### Code Understanding Capabilities

**Command R+:**
- Optimized for reasoning, summarization, and question answering
- Complex reasoning over long contexts (128K tokens)
- Multi-step tool usage
- Advanced reasoning capabilities

**Command R:**
- General chatbots, summaries, content generation
- Text generation, summarization, translation
- Text-based classification
- More efficient for standard tasks

**Use Cases:**
- Document summarization (including code)
- Long-context analysis
- Question answering over codebases
- Text classification and categorization

### Pros & Cons

**Pros:**
- ✅ Command R is very affordable ($0.15/$0.60)
- ✅ 128K context window
- ✅ Strong summarization capabilities
- ✅ Good long-context performance
- ✅ Clear model differentiation (R vs R+)

**Cons:**
- ❌ Not specialized for code (general-purpose)
- ❌ Command R+ expensive vs alternatives
- ❌ Smaller model selection
- ❌ Less community support than major providers
- ❌ Limited code-specific benchmarks
- ❌ 128K context < 1M for Gemini

### Recommendations

**Best Use Cases:**
- General text summarization (not code-specific)
- Long document analysis
- When 128K context is sufficient
- Budget-conscious deployments (use Command R)

**When to Avoid:**
- Code-specific tasks (use Qwen, DeepSeek Coder, Mistral Devstral)
- Need for >128K context (use Gemini)
- When cost is critical (DeepSeek is cheaper)

**Model Selection:**
- Use Command R for 94% cost savings when R+ isn't needed
- Reserve R+ for complex multi-step reasoning
- Consider alternatives for code-specific work

---

## 9. Local Deployment via Ollama

### Overview

Ollama is the **industry standard** for running LLMs locally in 2026. It provides a simple CLI, REST API, and Python/JavaScript SDKs for managing local models.

**Server Details:**
- Runs on `http://localhost:11434/` by default
- Easy integration into applications
- Offline operation (no network required)
- Complete data privacy

### Supported Models (Code-Focused)

| Model | Provider | Size Options | Code Specialization |
|-------|----------|--------------|---------------------|
| **Qwen 2.5 Coder** | Alibaba | 0.5B, 1.5B, 3B, 7B, 14B, 32B | ⭐⭐⭐⭐⭐ Best |
| **DeepSeek R1** | DeepSeek | Various, distilled versions | ⭐⭐⭐⭐⭐ Reasoning |
| **Code Llama** | Meta | 7B, 13B, 34B | ⭐⭐⭐⭐ Proven |
| **Mistral** | Mistral AI | 7B, 8x7B (MoE), 8x22B | ⭐⭐⭐ Good summarization |
| **Gemma** | Google | 2B, 7B, 9B, 27B | ⭐⭐⭐ Efficient |
| **Phi** | Microsoft | 3B, 3.5B | ⭐⭐ Lightweight |

### VRAM Requirements

**General Formula:**
- FP16 (2 bytes/parameter): ~2GB VRAM per billion parameters
- Q5 quantization (~5 bits/parameter): ~0.7GB VRAM per billion parameters
- Q4 quantization (~4.5 bits/parameter): ~0.6GB VRAM per billion parameters
- Add roughly 20% on top for the KV cache and runtime overhead at moderate context lengths

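
As a planning aid, the per-parameter factors implied by the model table in this section can be wrapped in a tiny estimator. The GB-per-billion-parameter values and the flat 20% headroom are rough approximations, not Ollama's actual allocator behavior:

```python
# Back-of-the-envelope VRAM estimator. The GB-per-billion-parameter
# factors approximate the model table in this section; real usage varies
# with runtime, context length, and the exact quantization variant.

GB_PER_B_PARAMS = {"fp16": 2.0, "q5": 0.7, "q4": 0.6}

def vram_gb(params_b: float, quant: str = "q4", overhead: float = 0.2) -> float:
    """Estimated VRAM in GB; `overhead` adds ~20% for KV cache and runtime."""
    return round(params_b * GB_PER_B_PARAMS[quant] * (1 + overhead), 1)

for size in (7, 14, 32):
    print(f"{size}B q4: ~{vram_gb(size)} GB")
```

A 32B model at Q4 lands around 23GB, which is why the RTX 4090's 24GB is the sweet spot called out in this report.
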
**Specific Models:**

| Model Size | FP16 VRAM | Q4 VRAM | Q5 VRAM | Recommended GPU |
|------------|-----------|---------|---------|-----------------|
| **3-4B** | 6-8GB | 4-5GB | 4.5-6GB | RTX 3060 (12GB) |
| **7B** | 14GB | 4-6GB | 5-7GB | RTX 3060 (12GB) |
| **13-14B** | 26-28GB | 8-10GB | 10-12GB | RTX 4070 Ti (12GB) |
| **32-34B** | 64-68GB | 20-24GB | 24-28GB | RTX 4090 (24GB) |
| **70B** | 140GB | 35-40GB | 42-48GB | A100 (80GB) or 2x RTX 4090 |

### Hardware Tiers

**Entry-Level (4-6GB VRAM):**
- Models: 3-4B parameters (Q4)
- Context: ~4K tokens
- Cost: ~$200-400 (used RTX 3050/3060)
- Use Case: Basic code completion

**Mid-Range (8-16GB VRAM):** ⭐ **SWEET SPOT**
- Models: 7-14B parameters (Q4/Q5)
- Context: 8K+ tokens
- GPUs: RTX 3060 (12GB), RTX 4060 Ti (16GB)
- Cost: ~$400-600
- **Recommended**: Qwen 2.5 Coder 7B or 14B
- Use Case: Most code summarization tasks

**High-End (12-24GB VRAM):** ⭐ **PROFESSIONAL**
- Models: 13-34B parameters (Q4/Q5)
- Context: 16K+ tokens
- GPUs: RTX 4070 Ti (12GB), RTX 4080 (16GB), **RTX 4090 (24GB)**
- Cost: ~$800-1,600
- **Recommended**: Qwen 2.5 Coder 32B (Q4)
- Use Case: Production-grade code understanding

**Enterprise (40GB+ VRAM):**
- Models: 70B+ parameters
- Context: 32K+ tokens
- GPUs: A100 (80GB), H100 (80GB), multi-GPU setups
- Cost: $10,000-30,000+
- Use Case: On-premise AI infrastructure

### Performance Considerations

**VRAM Spillover:**
- When a model exceeds VRAM, layers spill to system RAM
- **Performance hit**: Up to 30x slower
- **Critical**: Keep models within VRAM limits

**Quantization Quality:**
- **Q4**: ~5-10% quality loss, 50% VRAM savings
- **Q5**: ~2-5% quality loss, 40% VRAM savings
- **Q8**: ~1% quality loss, 20% VRAM savings
- **FP16**: Full quality, maximum VRAM usage

**Context Window vs VRAM:**
- Larger context = more VRAM needed
- 4K context: Baseline
- 8K context: +20-30% VRAM
- 16K context: +50-70% VRAM

### Integration Options

**1. Command Line Interface (CLI):**
```bash
# Download model
ollama pull qwen2.5-coder:32b

# Run model
ollama run qwen2.5-coder:32b

# Serve API
ollama serve
```

**2. REST API:**
```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5-coder:32b", "prompt": "Summarize this code..."}'
```

**3. Python SDK:**
```python
import ollama

response = ollama.chat(
    model='qwen2.5-coder:32b',
    messages=[{'role': 'user', 'content': 'Summarize...'}]
)
```

**4. LangChain Integration:**
```python
# The `langchain-ollama` package; the older
# `langchain_community.llms.Ollama` import is deprecated
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="qwen2.5-coder:32b")
response = llm.invoke("Summarize this code...")
```

### Fine-Tuning

**LoRA/QLoRA Adapters:**
- Train custom adapters on specific codebases
- Export as safetensors
- Load into an Ollama Modelfile
- Run with `ollama create` and `ollama run`

**Use Cases:**
- Domain-specific code patterns
- Company coding standards
- Legacy codebase understanding

### Pros & Cons

**Pros:**
- ✅ **Zero API costs** (one-time hardware investment)
- ✅ Complete data privacy (offline operation)
- ✅ No network latency
- ✅ No rate limits
- ✅ Customizable (fine-tuning, quantization)
- ✅ Easy setup and management
- ✅ Multi-model support
- ✅ Active community and updates

**Cons:**
- ❌ Significant upfront hardware cost
- ❌ Requires technical setup and maintenance
- ❌ Power consumption costs
- ❌ Limited to hardware capabilities
- ❌ Typically slower inference than cloud APIs
- ❌ No automatic scaling
- ❌ Model updates must be managed manually

### Recommendations

**When to Choose Local Deployment:**
- High-volume usage (ROI within 3-6 months)
- Strict privacy/security requirements
- Proprietary codebases that can't use cloud APIs
- Offline/air-gapped environments
- When you already have GPU infrastructure

**Hardware Recommendation:**
- **Budget**: RTX 4060 Ti 16GB (~$500) → Qwen 2.5 Coder 14B
- **Recommended**: **RTX 4090 24GB** (~$1,600) → Qwen 2.5 Coder 32B ⭐
- **Enterprise**: A100 80GB (~$15,000) → Qwen 3 Coder 480B or 70B models

**Model Recommendation:**
- **Start with**: Qwen 2.5 Coder 32B (Q4) on RTX 4090
- **Fallback**: Qwen 2.5 Coder 14B (Q4) on RTX 4060 Ti
- **Alternative**: Code Llama 34B or DeepSeek R1 Distilled

---

## Cost Comparison Matrix

### API-Based Providers (Per 1M Tokens)

| Provider | Model | Input | Output | Total (1:4 ratio)* | Relative Cost |
|----------|-------|-------|--------|--------------------|---------------|
| **DeepSeek** | V3.2-Exp | $0.14 | $0.56 | $2.38 | **1x** (Baseline) |
| **DeepSeek** | Chat | $0.28 | $0.42 | $1.96 | **0.8x** |
| **Google** | Gemini 2.5 Flash | $0.30 | $2.50 | $10.30 | **4.3x** |
| **Google** | Gemini 2.0 Flash | $0.10 | $0.40 | $1.70 | **0.7x** |
| **Mistral** | Devstral Small 2 | $0.10 | $0.30 | $1.30 | **0.5x** |
| **Mistral** | Devstral 2 | $0.40 | $2.00 | $8.40 | **3.5x** |
| **Qwen** | 2.5 Coder 32B | $0.03 | $0.11 | $0.47 | **0.2x** |
| **Cohere** | Command R | $0.15 | $0.60 | $2.55 | **1.1x** |
| **Cohere** | Command R+ | $2.50 | $10.00 | $42.50 | **17.9x** |
| **OpenAI** | GPT-5-nano | $0.05 | $0.40 | $1.65 | **0.7x** |
| **OpenAI** | GPT-5-mini | $0.25 | $2.00 | $8.25 | **3.5x** |
| **OpenAI** | GPT-5 | $1.25 | $10.00 | $41.25 | **17.3x** |
| **Anthropic** | Claude Haiku 4.5 | $1.00 | $5.00 | $21.00 | **8.8x** |
| **Anthropic** | Claude Sonnet 4.5 | $3.00 | $15.00 | $63.00 | **26.5x** |
| **Anthropic** | Claude Opus 4.5 | $5.00 | $25.00 | $105.00 | **44.1x** |

*Assumes a typical 1:4 input:output ratio for summarization tasks (1M input tokens generating 4M output tokens)
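
The Total column is just `input + 4 × output`. A minimal sketch reproducing it from the table's rates (prices are hardcoded from this document, not fetched from any API):

```python
# Reproduce the "Total (1:4 ratio)" column: the cost of 1M input tokens
# plus 4M output tokens at the listed per-MTok rates.

def blended_total(in_price: float, out_price: float, ratio: int = 4) -> float:
    """Blended cost of 1M input tokens plus `ratio` M output tokens."""
    return in_price + ratio * out_price

RATES = {  # $/MTok (input, output), copied from the table above
    "DeepSeek V3.2-Exp": (0.14, 0.56),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Opus 4.5": (5.00, 25.00),
}
for model, (i, o) in RATES.items():
    print(f"{model}: ${blended_total(i, o):.2f}")
```
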

### Cost at Scale (Processing 1TB of Code)

Assuming 1TB = ~250B tokens of code, generating 1B tokens of summaries:

| Provider | Model | Cost per 1TB | Annual Cost (100TB) |
|----------|-------|-------------|---------------------|
| **Qwen API** | 2.5 Coder 32B | $117.50 | $11,750 |
| **DeepSeek** | V3.2-Exp (cache) | $490 | $49,000 |
| **Mistral** | Devstral Small 2 | $1,625 | $162,500 |
| **DeepSeek** | V3 | $1,190 | $119,000 |
| **Google** | 2.5 Flash | $13,750 | $1,375,000 |
| **Mistral** | Devstral 2 | $11,000 | $1,100,000 |
| **OpenAI** | GPT-5-mini | $10,625 | $1,062,500 |
| **OpenAI** | GPT-5 | $52,500 | $5,250,000 |
| **Anthropic** | Haiku 4.5 | $27,500 | $2,750,000 |
| **Anthropic** | Sonnet 4.5 | $82,500 | $8,250,000 |
| **Anthropic** | Opus 4.5 | $137,500 | $13,750,000 |

### Local Deployment ROI Analysis

**One-Time Hardware Investment:**
- RTX 4090 24GB: $1,600
- Power consumption: ~$300/year (24/7 operation @ $0.12/kWh)
- Total Year 1: $1,900
- Total Years 2-3: $300/year

**Break-Even Analysis vs API:**

| API Provider | Monthly API Cost | Break-Even (Months) |
|--------------|------------------|---------------------|
| DeepSeek V3.2-Exp | $200 | 9.5 months |
| Gemini 2.5 Flash | $500 | 3.8 months |
| Mistral Devstral 2 | $400 | 4.8 months |
| GPT-5 | $2,000 | 1.0 month |
| Claude Sonnet 4.5 | $3,000 | 0.6 months |

**Conclusion**: Local deployment pays for itself within 1-10 months, depending on usage volume.
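
The break-even column follows from dividing first-year local cost ($1,600 hardware plus $300 power) by the monthly API bill it replaces. A sketch under those same assumptions:

```python
# Break-even in months: first-year local cost divided by the monthly API
# spend it replaces. Uses the ROI table's $1,600 GPU + $300/year power.

def break_even_months(monthly_api_cost: float,
                      hardware: float = 1600.0,
                      year1_power: float = 300.0) -> float:
    """Months of API spend needed to cover hardware plus first-year power."""
    return round((hardware + year1_power) / monthly_api_cost, 1)

for bill in (200, 500, 3000):
    print(f"${bill}/month API spend -> {break_even_months(bill)} months")
```
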

---

## Performance Comparison

### Code Generation Benchmarks (2026)

| Model | EvalPlus | LiveCodeBench | BigCodeBench | SWE-bench Verified | Aider |
|-------|----------|---------------|--------------|-------------------|-------|
| **Qwen 2.5 Coder 32B** | ⭐ Best OSS | ⭐ Best OSS | ⭐ Best OSS | N/A | 73.7 |
| **Qwen 3 Coder 480B** | N/A | N/A | N/A | N/A | ~Claude Sonnet 4 |
| **DeepSeek V3** | Strong | Strong | Strong | N/A | N/A |
| **Mistral Devstral 2** | N/A | N/A | N/A | 72.2% | N/A |
| **Mistral Devstral Small 2** | N/A | N/A | N/A | 68.0% | N/A |
| **Llama 4 Maverick** | Competitive | Competitive | Competitive | N/A | N/A |
| **GPT-4o** | Competitive | Competitive | Competitive | N/A | ~73.7 |
| **Claude Sonnet 4** | Strong | Strong | Strong | Strong | Strong |
| **GPT-5** | Strong | Strong | Strong | Strong | Strong |

**Key Takeaways:**
- Qwen 2.5 Coder 32B is best-in-class among open-source models
- Mistral Devstral models excel at software engineering tasks
- DeepSeek V3 is competitive with major proprietary models
- Llama 4 Maverick performs well at half the parameters of competitors

### Multi-Language Support

| Model | Languages | Notes |
|-------|-----------|-------|
| **Qwen 2.5 Coder** | 40+ | Best multi-language (McEval: 65.9) |
| **Code Llama** | 7+ | Python, C++, Java, PHP, TS, C#, Bash |
| **DeepSeek Coder** | 20+ | Strong across major languages |
| **GPT-5 Codex** | 50+ | Comprehensive language support |
| **Claude 4.5** | 40+ | Broad language coverage |
| **Gemini 2.5** | 30+ | Good multi-language support |

---

## Recommendations by Use Case

### 1. High-Volume Code Summarization (Best ROI)

**Primary Choice**: **DeepSeek V3.2-Exp**
- Cost: $0.14/$0.56 per 1M tokens
- With caching: $0.014 per 1M input tokens (90% off)
- **Why**: Unbeatable price/performance ratio

**Alternative**: **Google Gemini 2.5 Flash**
- Cost: $0.30/$2.50 per 1M tokens (cache reads ~5x cheaper)
- **Why**: More established provider, 1M context window

**Backup**: **Qwen 2.5 Coder 32B (via Together AI)**
- Cost: $0.03/$0.11 per 1M tokens
- **Why**: Code-specialized, cheapest API option

### 2. Maximum Quality (Cost Secondary)

**Primary Choice**: **Claude Opus 4.5**
- Cost: $5/$25 per 1M tokens
- **Why**: Industry-leading quality, proven reliability

**Alternative**: **GPT-5**
- Cost: $1.25/$10 per 1M tokens
- **Why**: Excellent quality, 5x cheaper than Opus

**Code-Specific**: **Mistral Devstral 2**
- Cost: $0.40/$2.00 per 1M tokens
- **Why**: 7x cheaper than Sonnet, code-specialized

### 3. Local Deployment (Privacy Critical)

**Primary Choice**: **Qwen 2.5 Coder 32B**
- Hardware: RTX 4090 24GB (~$1,600)
- **Why**: Best local code model, GPT-4o competitive

**Budget Option**: **Qwen 2.5 Coder 14B**
- Hardware: RTX 4060 Ti 16GB (~$500)
- **Why**: Good quality, affordable hardware

**Reasoning Tasks**: **DeepSeek R1 Distilled**
- Hardware: RTX 4090 24GB
- **Why**: Strong reasoning, 32B distilled version available

### 4. Large Codebase Analysis (Context Critical)

**Primary Choice**: **Google Gemini 2.5 Pro**
- Context: 1M tokens (~750K words)
- Cost: $1.25/$10 per 1M tokens
- **Why**: Massive context, competitive pricing with caching

**Alternative**: **Claude Opus 4.5**
- Context: 200K tokens
- Cost: $5/$25 per 1M tokens (with 90% cache-read discount)
- **Why**: Better quality, pairs well with prompt caching

**Budget Option**: **Gemini 2.5 Flash**
- Context: 1M tokens
- Cost: $0.30/$2.50 per 1M tokens
- **Why**: Same context as Pro, much cheaper

### 5. Multi-File Refactoring & Complex Changes

**Primary Choice**: **Mistral Devstral 2**
- Cost: $0.40/$2.00 per 1M tokens
- **Why**: Built for multi-file orchestration, architecture-aware

**Alternative**: **Qwen 3 Coder 480B**
- Cost: $2.00 per 1M tokens
- **Why**: Agent capabilities, autonomous programming

**Premium Option**: **GPT-5.2-Codex**
- Cost: Part of GPT-5 pricing
- **Why**: Agentic coding model, multi-step tasks

### 6. Budget-Conscious Production

**Primary Choice**: **DeepSeek Chat**
- Cost: $0.28/$0.42 per 1M tokens
- **Why**: Best overall value, reliable

**Alternative**: **Mistral Devstral Small 2**
- Cost: $0.10/$0.30 per 1M tokens
- **Why**: Code-specialized, very affordable

**Flash Option**: **Gemini 2.0 Flash-Lite**
- Cost: $0.10/$0.40 per 1M tokens
- **Why**: Google reliability, competitive pricing
- **Note**: Gemini 2.0 Flash is deprecated March 31, 2026

### 7. Batch Processing / Non-Urgent Tasks

**Primary Choice**: **Gemini 2.5 Pro (Batch API)**
- Cost: $0.625/$5.00 per 1M tokens (50% off)
- **Why**: Premium quality at mid-tier pricing

**Alternative**: **Claude Opus 4.5 (Batch API)**
- Cost: $2.50/$12.50 per 1M tokens (50% off)
- **Why**: Highest quality, reasonable with the batch discount

**Budget Option**: **OpenAI GPT-5 (Batch API)**
- Cost: $0.625/$5.00 per 1M tokens (50% off)
- **Why**: Same price as Gemini Pro batch, OpenAI ecosystem

---

## Implementation Strategy

### Phase 1: Evaluation & Pilot (Weeks 1-4)

**Objectives:**
- Test the top 3 providers with a production workload
- Measure quality, speed, and cost
- Identify edge cases and failure modes

**Recommended Providers to Test:**
1. **DeepSeek V3.2-Exp** (primary cost-saver)
2. **Gemini 2.5 Flash** (balanced option)
3. **Qwen 2.5 Coder 32B via Together AI** (code-specialized)

**Methodology:**
- Run the same 1,000 code files through each provider
- Compare summaries (quality, accuracy, completeness)
- Measure latency (p50, p95, p99)
- Calculate actual costs with production usage patterns
- Test caching effectiveness

**Success Criteria:**
- Quality: ≥90% as good as the current baseline
- Speed: p95 latency <3s for a typical file
- Cost: ≥50% reduction vs the current provider

### Phase 2: Multi-Provider Architecture (Weeks 5-8)

**Objectives:**
- Implement a provider abstraction layer
- Add fallback logic for reliability
- Enable A/B testing and gradual rollout

**Architecture:**
```typescript
interface LLMProvider {
  summarize(code: string, options: SummarizeOptions): Promise<Summary>
  estimateCost(tokens: number): number
  getCapabilities(): ProviderCapabilities
}

class ProviderManager {
  providers: Map<string, LLMProvider>

  async summarize(
    code: string,
    options: {
      preferredProvider?: string
      fallbackProviders?: string[]
      maxCost?: number
      qualityTier?: 'budget' | 'standard' | 'premium'
    }
  ): Promise<Summary>
}
```

**Provider Configuration:**
```yaml
providers:
  primary:
    - deepseek-v32-exp   # 95% of traffic
    - gemini-25-flash    # 5% for comparison

  fallback:
    - gemini-25-flash    # If DeepSeek fails
    - claude-haiku-45    # If both fail

  quality-tiers:
    budget:
      - deepseek-chat
      - mistral-devstral-small-2

    standard:
      - deepseek-v32-exp
      - gemini-25-flash

    premium:
      - claude-sonnet-45
      - gpt-5
```
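
The fallback chain in this configuration can be sketched in a few lines. This is an illustrative Python mock, assuming each provider is wrapped in a callable; the provider names, `ProviderError` type, and `flaky` stub are placeholders, not a real SDK:

```python
# Minimal fallback sketch: try each provider in order, return the first
# success, and surface the last error only if every provider fails.

class ProviderError(Exception):
    """Placeholder for provider-side failures (timeouts, 5xx, etc.)."""

def summarize_with_fallback(code, providers, order):
    """`providers` maps name -> callable(code) returning a summary string."""
    last_err = None
    for name in order:
        try:
            return name, providers[name](code)
        except ProviderError as err:
            last_err = err  # provider failed; fall through to the next one
    raise RuntimeError(f"all providers failed: {last_err}")

def flaky(code):
    raise ProviderError("timeout")  # simulated primary-provider outage

providers = {
    "deepseek-v32-exp": flaky,
    "gemini-25-flash": lambda code: f"summary({len(code)} chars)",
}
used, summary = summarize_with_fallback(
    "def f(): pass", providers, ["deepseek-v32-exp", "gemini-25-flash"]
)
print(used, summary)
```
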

### Phase 3: Production Rollout (Weeks 9-12)

**Gradual Rollout:**
- Week 9: 10% traffic to new providers
- Week 10: 25% traffic
- Week 11: 50% traffic
- Week 12: 100% traffic (with fallbacks)

**Monitoring:**
- Cost per 1M tokens (actual vs expected)
- Quality metrics (user feedback, manual review)
- Latency (p50, p95, p99, p99.9)
- Error rates by provider
- Cache hit rates (DeepSeek, Gemini)

**Optimization:**
- Tune caching strategies for DeepSeek/Gemini
- Implement request batching where applicable
- Route requests based on complexity (simple → budget, complex → premium)

### Phase 4: Local Deployment Evaluation (Weeks 13-16)

**Objectives:**
- Test Ollama + Qwen 2.5 Coder 32B locally
- Measure ROI vs API costs
- Evaluate for sensitive/proprietary code

**Hardware:**
- Acquire RTX 4090 24GB (~$1,600)
- Or RTX 4060 Ti 16GB (~$500) for the 14B model

**Testing:**
- Compare quality vs DeepSeek/Gemini
- Measure throughput (tokens/sec)
- Calculate power costs
- Determine the break-even point

**Use Cases for Local:**
- Proprietary codebases (can't use cloud APIs)
- High-volume processing (>$500/month API costs)
- Offline/air-gapped environments
- Development/testing (free local usage)

---

## Risk Mitigation

### Provider Reliability

**Risk**: Single provider outage disrupts service

**Mitigation:**
- Multi-provider architecture with automatic fallback
- Health checks and circuit breakers
- SLA monitoring and alerting

### Cost Overruns

**Risk**: Unexpected usage spikes cause budget overruns

**Mitigation:**
- Per-request cost tracking and alerting
- Rate limiting and quotas
- Monthly budget caps with notifications
- Caching strategies (up to 90% cost reduction)

### Quality Degradation

**Risk**: Cheaper providers produce lower-quality summaries

**Mitigation:**
- Automated quality metrics (length, coherence, key term coverage)
- Manual review sampling (5-10% of outputs)
- User feedback loops
- A/B testing infrastructure

### Data Privacy

**Risk**: Sensitive code sent to cloud providers

**Mitigation:**
- Code scanning for secrets/credentials
- Allowlist/blocklist for code patterns
- Local deployment for sensitive repositories
- Provider-specific data retention policies
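
The first mitigation can start as a cheap pre-flight check before any file leaves the machine. A toy sketch with illustrative regexes (`SECRET_PATTERNS` is nowhere near exhaustive; a real pipeline would use a dedicated scanner with maintained rulesets):

```python
# Toy pre-flight secret scan: refuse to send a source file to a cloud API
# if it matches obvious credential patterns. Patterns are illustrative.

import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                         # AWS access key id
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key
    re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{12,}"),
]

def safe_to_upload(source: str) -> bool:
    """True if no pattern matches; False means route to local deployment."""
    return not any(p.search(source) for p in SECRET_PATTERNS)

print(safe_to_upload("def add(a, b): return a + b"))
print(safe_to_upload('API_KEY = "sk-live-abcdef1234567890"'))
```
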

### Vendor Lock-In

**Risk**: Dependency on specific provider features

**Mitigation:**
- Provider abstraction layer
- Standard prompt templates
- Avoid provider-specific features
- Regular provider evaluation (quarterly)

---

## Monitoring & Metrics

### Cost Metrics

- **Cost per 1M tokens** (by provider)
- **Total monthly spend** (by provider, by project)
- **Cost per summary** (average, p50, p95)
- **Cache hit rate** (DeepSeek, Gemini, Claude)
- **Batch API usage** (% of requests, cost savings)

### Quality Metrics

- **Summary length** (tokens, characters)
- **Key term coverage** (% of important identifiers included)
- **User ratings** (thumbs up/down)
- **Manual review scores** (1-5 scale, sampled)
- **Edit distance** from baseline (A/B test)
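
Key term coverage is one of the few quality metrics here that needs no LLM judge. A rough sketch, with a regex standing in for a real parser (the identifier pattern and stop-word list are simplifications):

```python
# Approximate "key term coverage": the fraction of identifiers in the
# source that the summary actually mentions. Regex-based, not a parser.

import re

def key_term_coverage(code: str, summary: str) -> float:
    ids = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]{2,}", code))
    ids -= {"def", "return", "class", "import", "for", "while", "self"}
    if not ids:
        return 1.0
    mentioned = {i for i in ids if i in summary}
    return len(mentioned) / len(ids)

code = "def parse_config(path):\n    return load_yaml(path)"
print(key_term_coverage(code, "parse_config reads a file via load_yaml"))
```
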
+
1248
+ ### Performance Metrics
1249
+
1250
+ - **Latency p50/p95/p99** (by provider, by request size)
1251
+ - **Throughput** (requests/sec, tokens/sec)
1252
+ - **Error rate** (by provider, by error type)
1253
+ - **Retry rate** (failed requests requiring fallback)
1254
+ - **Timeout rate** (requests exceeding SLA)
1255
+
1256
+ ### Reliability Metrics
1257
+
1258
+ - **Uptime** (by provider, 99.9% SLA)
1259
+ - **Failover count** (primary → fallback transitions)
1260
+ - **Provider health score** (composite metric)
1261
+ - **MTTR** (mean time to recovery from provider outage)
1262
+
1263
+ ---
1264
+
1265
+ ## Future Considerations
1266
+
1267
+ ### Emerging Models (2026+)
1268
+
1269
+ **Watch List:**
1270
+ - **Qwen 4**: Expected 2026, likely further improvements
1271
+ - **Llama 5**: Meta's next generation
1272
+ - **Gemini 3**: Google's next flagship
1273
+ - **GPT-6**: OpenAI's future model
1274
+ - **Claude 5**: Anthropic's next generation
1275
+ - **DeepSeek V4**: Continuing aggressive pricing
1276
+
1277
+ ### Technology Trends
1278
+
1279
+ **Context Windows:**
1280
+ - Trend toward 1M+ tokens (Gemini leads)
1281
+ - Enables whole-codebase summarization
1282
+ - Reduces chunking complexity
1283
+
1284
+ **Specialized Code Models:**
1285
+ - More providers offering code-specific models
1286
+ - Better benchmark performance
1287
+ - Lower costs for code tasks
1288
+
1289
+ **Local Model Quality:**
1290
+ - Open source closing gap with proprietary
1291
+ - Qwen 3 Coder 480B matches Claude Sonnet 4
1292
+ - Hardware requirements decreasing (MoE architectures)
1293
+
1294
+ **Pricing Trends:**
1295
+ - Continued price competition (DeepSeek forcing reductions)
1296
+ - More optimization features (caching, batching)
1297
+ - Tiered pricing becoming standard
1298
+
1299
+ ### Integration Opportunities
1300
+
1301
+ **LangChain/LlamaIndex:**
1302
+ - Standard frameworks for LLM integration
1303
+ - Provider-agnostic abstractions
1304
+ - RAG and agent capabilities
1305
+
1306
+ **Ollama Ecosystem:**
1307
+ - Growing model library
1308
+ - Better quantization techniques
1309
+ - Improved inference engines
1310
+
1311
+ **Cloud Platforms:**
1312
+ - AWS Bedrock, Azure OpenAI, Google Vertex AI
1313
+ - Unified billing and management
1314
+ - Enterprise features (VPC, compliance)
1315
+
1316
+ ---
1317

## Appendix: Quick Reference

### Provider Selection Cheatsheet

**Need lowest cost?** → DeepSeek V3.2-Exp ($0.14/$0.56 per million input/output tokens)

**Need best code quality?** → Qwen 2.5 Coder 32B or Mistral Devstral 2

**Need largest context?** → Google Gemini 2.5 Pro (1M tokens)

**Need highest overall quality?** → Claude Opus 4.5

**Need local deployment?** → Qwen 2.5 Coder 32B via Ollama

**Need multi-file refactoring?** → Mistral Devstral 2

**Need batch processing?** → Gemini 2.5 Pro or Claude Opus 4.5 (Batch API)

**Need privacy/security?** → Local deployment (Ollama + RTX 4090)

**Need reliability?** → Google Gemini or OpenAI (established providers)
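The local-deployment route above can be sketched concretely. The example below builds the JSON request body for a local Ollama server's `/api/generate` endpoint (default port 11434); the model tag and prompt wording are illustrative assumptions, and actually sending the request requires a running Ollama instance with the model pulled:

```python
import json

# Default endpoint for a local Ollama server (assumption: standard install).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_summary_request(source_code: str, model: str = "qwen2.5-coder:32b") -> str:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        # Hypothetical summarization prompt; any instruction works here.
        "prompt": f"Summarize this file for a code index:\n\n{source_code}",
        "stream": False,  # one JSON response instead of a token stream
    }
    return json.dumps(payload)

body = build_summary_request("def add(a, b):\n    return a + b")
print(json.loads(body)["model"])  # → qwen2.5-coder:32b
```

Posting `body` to `OLLAMA_URL` (e.g. with `urllib.request` or `requests`) returns a JSON object whose `response` field holds the summary.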
1340
### Cost-Per-Summary Estimates

Assuming a 2K-token input file → 500-token summary (sorted by cost, cheapest first):

| Provider | Model | Cost per Summary |
|----------|-------|------------------|
| Qwen API | 2.5 Coder 32B | $0.000115 |
| Mistral | Devstral Small 2 | $0.000350 |
| DeepSeek | V3.2-Exp | $0.000560 |
| OpenAI | GPT-5-mini | $0.001500 |
| Mistral | Devstral 2 | $0.001800 |
| Google | 2.5 Flash | $0.001850 |
| Anthropic | Haiku 4.5 | $0.004500 |
| OpenAI | GPT-5 | $0.007500 |
| Anthropic | Sonnet 4.5 | $0.013500 |
| Local | Qwen 2.5 Coder 32B | $0.000000* |

*Excludes hardware amortization and electricity.
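The figures above follow from simple linear token pricing: tokens divided by one million, times the per-million rate, summed over input and output. A minimal sketch, checked against the DeepSeek V3.2-Exp row ($0.14/M input, $0.56/M output):

```python
def cost_per_summary(input_tokens: int, output_tokens: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Linear token pricing: (tokens / 1M) * price-per-million, input + output."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

# DeepSeek V3.2-Exp rates from the cheatsheet: $0.14/M input, $0.56/M output.
c = cost_per_summary(2_000, 500, 0.14, 0.56)
print(f"${c:.6f}")  # → $0.000560, matching the table row
```

Swap in any provider's per-million rates to reproduce (or update) the other rows.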
1358

### Hardware Recommendations

**Budget ($500)**: RTX 4060 Ti 16GB → Qwen 2.5 Coder 14B

**Recommended ($1,600)**: **RTX 4090 24GB** → Qwen 2.5 Coder 32B

**Enterprise ($15,000)**: A100 80GB → Qwen 3 Coder 480B or 70B models

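These pairings follow a common rule of thumb: model weights occupy roughly parameter count × bytes per weight at the chosen quantization, plus a margin for KV cache and runtime overhead. A hedged sketch (the flat 4 GB margin is an assumption, not a measured value):

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead_gb: float = 4.0) -> float:
    """Rough VRAM estimate: weight bytes at the given quantization plus a
    flat margin for KV cache and runtime overhead. Rule of thumb, not exact."""
    weight_gb = params_b * (bits_per_weight / 8)  # params in billions -> GB
    return weight_gb + overhead_gb

# Qwen 2.5 Coder 32B at 4-bit: ~16 GB of weights + margin ≈ 20 GB,
# which is why it targets a 24 GB card like the RTX 4090.
print(round(est_vram_gb(32, 4), 1))  # → 20.0
```

The same estimate puts the 14B model around 11 GB at 4-bit, comfortably inside the budget card's 16 GB.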
1367
---

## Sources

### Google Gemini
- [Gemini Developer API pricing](https://ai.google.dev/gemini-api/docs/pricing)
- [Gemini models documentation](https://ai.google.dev/gemini-api/docs/models)
- [Gemini API Pricing 2026: Complete Guide](https://www.aifreeapi.com/en/posts/gemini-api-pricing-2026)
- [Google Gemini API Pricing Guide - MetaCTO](https://www.metacto.com/blogs/the-true-cost-of-google-gemini-a-guide-to-api-pricing-and-integration)
- [Google Gemini 2.0 API Pricing](https://apidog.com/blog/google-gemini-2-0-api/)

### Meta Llama
- [Meta Llama 4 explained](https://www.techtarget.com/WhatIs/feature/Meta-Llama-4-explained-Everything-you-need-to-know)
- [The Llama 4 herd announcement](https://ai.meta.com/blog/llama-4-multimodal-intelligence/)
- [Code Llama announcement](https://ai.meta.com/blog/code-llama-large-language-model-coding/)
- [Llama 4: Features and Comparison](https://gpt-trainer.com/blog/llama+4+evolution+features+comparison)

### DeepSeek
- [DeepSeek Models & Pricing](https://api-docs.deepseek.com/quick_start/pricing)
- [DeepSeek V3.2-Exp pricing announcement](https://venturebeat.com/ai/deepseeks-new-v3-2-exp-model-cuts-api-pricing-in-half-to-less-than-3-cents)
- [DeepSeek API Pricing Calculator](https://costgoat.com/pricing/deepseek-api)
- [DeepSeek API Guide - DataCamp](https://www.datacamp.com/tutorial/deepseek-api)

### Mistral AI
- [Mistral AI Pricing](https://mistral.ai/pricing)
- [Devstral 2 announcement](https://mistral.ai/news/devstral-2-vibe-cli)
- [Mistral AI Models documentation](https://docs.mistral.ai/getting-started/models)
- [Mistral Large 2411 Pricing](https://pricepertoken.com/pricing-page/model/mistral-ai-mistral-large-2411)

### Ollama & Local Models
- [Complete Ollama Tutorial 2026](https://dev.to/proflead/complete-ollama-tutorial-2026-llms-via-cli-cloud-python-3m97)
- [Local AI Models for Coding 2026](https://failingfast.io/local-coding-ai-models/)
- [Best GPU for Local LLM 2026](https://nutstudio.imyfone.com/llm-tips/best-gpu-for-local-llm/)
- [Ollama VRAM Requirements Guide](https://localllm.in/blog/ollama-vram-requirements-for-local-llms)
- [Best Local LLMs for 16GB VRAM](https://localllm.in/blog/best-local-llms-16gb-vram)

### Anthropic Claude
- [Claude API Pricing](https://platform.claude.com/docs/en/about-claude/pricing)
- [Claude API Pricing 2026 - MetaCTO](https://www.metacto.com/blogs/anthropic-api-pricing-a-full-breakdown-of-costs-and-integration)
- [Claude Opus 4 & 4.5 API Pricing](https://www.cometapi.com/the-guide-to-claude-opus-4--4-5-api-pricing-in-2026/)
- [Anthropic Claude Review 2026](https://hackceleration.com/anthropic-review/)

### Qwen
- [Qwen-Coder model capabilities - Alibaba Cloud](https://www.alibabacloud.com/help/en/model-studio/qwen-coder)
- [Qwen2.5 Coder 32B Model Specs](https://blog.galaxy.ai/model/qwen-2-5-coder-32b-instruct)
- [Best Qwen Models in 2026](https://apidog.com/blog/best-qwen-models/)
- [Qwen 2.5 vs Llama 3.3 comparison](https://www.humai.blog/qwen-2-5-vs-llama-3-3-best-open-source-llms-for-2026/)

### OpenAI
- [OpenAI Pricing](https://openai.com/api/pricing/)
- [OpenAI Pricing in 2026](https://www.finout.io/blog/openai-pricing-in-2026)
- [GPT-5: Features, Pricing & Accessibility](https://research.aimultiple.com/gpt-5/)
- [GPT-5.2 Model documentation](https://platform.openai.com/docs/models/gpt-5.2)

### Cohere
- [Cohere Pricing](https://cohere.com/pricing)
- [Cohere API Pricing 2026 - MetaCTO](https://www.metacto.com/blogs/cohere-pricing-explained-a-deep-dive-into-integration-development-costs)
- [Cohere Command R+ documentation](https://docs.cohere.com/docs/command-r-plus)

---

**End of Report**