@llm-translate/cli 1.0.0-next.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (157)
  1. package/.dockerignore +51 -0
  2. package/.env.example +33 -0
  3. package/.github/workflows/docs-pages.yml +57 -0
  4. package/.github/workflows/release.yml +49 -0
  5. package/.translaterc.json +44 -0
  6. package/CLAUDE.md +243 -0
  7. package/Dockerfile +55 -0
  8. package/README.md +371 -0
  9. package/RFC.md +1595 -0
  10. package/dist/cli/index.d.ts +2 -0
  11. package/dist/cli/index.js +4494 -0
  12. package/dist/cli/index.js.map +1 -0
  13. package/dist/index.d.ts +1152 -0
  14. package/dist/index.js +3841 -0
  15. package/dist/index.js.map +1 -0
  16. package/docker-compose.yml +56 -0
  17. package/docs/.vitepress/config.ts +161 -0
  18. package/docs/api/agent.md +262 -0
  19. package/docs/api/engine.md +274 -0
  20. package/docs/api/index.md +171 -0
  21. package/docs/api/providers.md +304 -0
  22. package/docs/changelog.md +64 -0
  23. package/docs/cli/dir.md +243 -0
  24. package/docs/cli/file.md +213 -0
  25. package/docs/cli/glossary.md +273 -0
  26. package/docs/cli/index.md +129 -0
  27. package/docs/cli/init.md +158 -0
  28. package/docs/cli/serve.md +211 -0
  29. package/docs/glossary.json +235 -0
  30. package/docs/guide/chunking.md +272 -0
  31. package/docs/guide/configuration.md +139 -0
  32. package/docs/guide/cost-optimization.md +237 -0
  33. package/docs/guide/docker.md +371 -0
  34. package/docs/guide/getting-started.md +150 -0
  35. package/docs/guide/glossary.md +241 -0
  36. package/docs/guide/index.md +86 -0
  37. package/docs/guide/ollama.md +515 -0
  38. package/docs/guide/prompt-caching.md +221 -0
  39. package/docs/guide/providers.md +232 -0
  40. package/docs/guide/quality-control.md +206 -0
  41. package/docs/guide/vitepress-integration.md +265 -0
  42. package/docs/index.md +63 -0
  43. package/docs/ja/api/agent.md +262 -0
  44. package/docs/ja/api/engine.md +274 -0
  45. package/docs/ja/api/index.md +171 -0
  46. package/docs/ja/api/providers.md +304 -0
  47. package/docs/ja/changelog.md +64 -0
  48. package/docs/ja/cli/dir.md +243 -0
  49. package/docs/ja/cli/file.md +213 -0
  50. package/docs/ja/cli/glossary.md +273 -0
  51. package/docs/ja/cli/index.md +111 -0
  52. package/docs/ja/cli/init.md +158 -0
  53. package/docs/ja/guide/chunking.md +271 -0
  54. package/docs/ja/guide/configuration.md +139 -0
  55. package/docs/ja/guide/cost-optimization.md +30 -0
  56. package/docs/ja/guide/getting-started.md +150 -0
  57. package/docs/ja/guide/glossary.md +214 -0
  58. package/docs/ja/guide/index.md +32 -0
  59. package/docs/ja/guide/ollama.md +410 -0
  60. package/docs/ja/guide/prompt-caching.md +221 -0
  61. package/docs/ja/guide/providers.md +232 -0
  62. package/docs/ja/guide/quality-control.md +137 -0
  63. package/docs/ja/guide/vitepress-integration.md +265 -0
  64. package/docs/ja/index.md +58 -0
  65. package/docs/ko/api/agent.md +262 -0
  66. package/docs/ko/api/engine.md +274 -0
  67. package/docs/ko/api/index.md +171 -0
  68. package/docs/ko/api/providers.md +304 -0
  69. package/docs/ko/changelog.md +64 -0
  70. package/docs/ko/cli/dir.md +243 -0
  71. package/docs/ko/cli/file.md +213 -0
  72. package/docs/ko/cli/glossary.md +273 -0
  73. package/docs/ko/cli/index.md +111 -0
  74. package/docs/ko/cli/init.md +158 -0
  75. package/docs/ko/guide/chunking.md +271 -0
  76. package/docs/ko/guide/configuration.md +139 -0
  77. package/docs/ko/guide/cost-optimization.md +30 -0
  78. package/docs/ko/guide/getting-started.md +150 -0
  79. package/docs/ko/guide/glossary.md +214 -0
  80. package/docs/ko/guide/index.md +32 -0
  81. package/docs/ko/guide/ollama.md +410 -0
  82. package/docs/ko/guide/prompt-caching.md +221 -0
  83. package/docs/ko/guide/providers.md +232 -0
  84. package/docs/ko/guide/quality-control.md +137 -0
  85. package/docs/ko/guide/vitepress-integration.md +265 -0
  86. package/docs/ko/index.md +58 -0
  87. package/docs/zh/api/agent.md +262 -0
  88. package/docs/zh/api/engine.md +274 -0
  89. package/docs/zh/api/index.md +171 -0
  90. package/docs/zh/api/providers.md +304 -0
  91. package/docs/zh/changelog.md +64 -0
  92. package/docs/zh/cli/dir.md +243 -0
  93. package/docs/zh/cli/file.md +213 -0
  94. package/docs/zh/cli/glossary.md +273 -0
  95. package/docs/zh/cli/index.md +111 -0
  96. package/docs/zh/cli/init.md +158 -0
  97. package/docs/zh/guide/chunking.md +271 -0
  98. package/docs/zh/guide/configuration.md +139 -0
  99. package/docs/zh/guide/cost-optimization.md +30 -0
  100. package/docs/zh/guide/getting-started.md +150 -0
  101. package/docs/zh/guide/glossary.md +214 -0
  102. package/docs/zh/guide/index.md +32 -0
  103. package/docs/zh/guide/ollama.md +410 -0
  104. package/docs/zh/guide/prompt-caching.md +221 -0
  105. package/docs/zh/guide/providers.md +232 -0
  106. package/docs/zh/guide/quality-control.md +137 -0
  107. package/docs/zh/guide/vitepress-integration.md +265 -0
  108. package/docs/zh/index.md +58 -0
  109. package/package.json +91 -0
  110. package/release.config.mjs +15 -0
  111. package/schemas/glossary.schema.json +110 -0
  112. package/src/cli/commands/dir.ts +469 -0
  113. package/src/cli/commands/file.ts +291 -0
  114. package/src/cli/commands/glossary.ts +221 -0
  115. package/src/cli/commands/init.ts +68 -0
  116. package/src/cli/commands/serve.ts +60 -0
  117. package/src/cli/index.ts +64 -0
  118. package/src/cli/options.ts +59 -0
  119. package/src/core/agent.ts +1119 -0
  120. package/src/core/chunker.ts +391 -0
  121. package/src/core/engine.ts +634 -0
  122. package/src/errors.ts +188 -0
  123. package/src/index.ts +147 -0
  124. package/src/integrations/vitepress.ts +549 -0
  125. package/src/parsers/markdown.ts +383 -0
  126. package/src/providers/claude.ts +259 -0
  127. package/src/providers/interface.ts +109 -0
  128. package/src/providers/ollama.ts +379 -0
  129. package/src/providers/openai.ts +308 -0
  130. package/src/providers/registry.ts +153 -0
  131. package/src/server/index.ts +152 -0
  132. package/src/server/middleware/auth.ts +93 -0
  133. package/src/server/middleware/logger.ts +90 -0
  134. package/src/server/routes/health.ts +84 -0
  135. package/src/server/routes/translate.ts +210 -0
  136. package/src/server/types.ts +138 -0
  137. package/src/services/cache.ts +899 -0
  138. package/src/services/config.ts +217 -0
  139. package/src/services/glossary.ts +247 -0
  140. package/src/types/analysis.ts +164 -0
  141. package/src/types/index.ts +265 -0
  142. package/src/types/modes.ts +121 -0
  143. package/src/types/mqm.ts +157 -0
  144. package/src/utils/logger.ts +141 -0
  145. package/src/utils/tokens.ts +116 -0
  146. package/tests/fixtures/glossaries/ml-glossary.json +53 -0
  147. package/tests/fixtures/input/lynq-installation.ko.md +350 -0
  148. package/tests/fixtures/input/lynq-installation.md +350 -0
  149. package/tests/fixtures/input/simple.ko.md +27 -0
  150. package/tests/fixtures/input/simple.md +27 -0
  151. package/tests/unit/chunker.test.ts +229 -0
  152. package/tests/unit/glossary.test.ts +146 -0
  153. package/tests/unit/markdown.test.ts +205 -0
  154. package/tests/unit/tokens.test.ts +81 -0
  155. package/tsconfig.json +28 -0
  156. package/tsup.config.ts +34 -0
  157. package/vitest.config.ts +16 -0
package/docs/guide/ollama.md
@@ -0,0 +1,515 @@
+ # Local Translation with Ollama
+
+ ::: info Translations
+ All non-English documentation is automatically translated using Claude Sonnet 4.
+ :::
+
+ Run llm-translate completely offline using Ollama. No API keys required, complete privacy for sensitive documents.
+
+ ::: warning Quality Varies by Model
+ Ollama translation quality is **highly dependent on model selection**. For reliable translation results:
+
+ - **Minimum**: 14B-class models (e.g., `qwen2.5:14b`); `llama3.1:8b` is a workable lighter fallback
+ - **Recommended**: 32B+ models (e.g., `qwen2.5:32b`, `llama3.3:70b`)
+ - **Not recommended**: Models under 7B produce inconsistent and often unusable translations
+
+ Smaller models (3B, 7B) may work for simple content but frequently fail on technical documentation, produce incomplete outputs, or ignore formatting instructions.
+ :::
+
+ ## Why Ollama?
+
+ - **Privacy**: Documents never leave your machine
+ - **No API costs**: Unlimited translations after initial setup
+ - **Offline**: Works without internet connection
+ - **Customizable**: Fine-tune models for your domain
+
+ ## System Requirements
+
+ ### Minimum (14B models)
+
+ - **RAM**: 16GB (for 14B models like qwen2.5:14b)
+ - **Storage**: 20GB free space
+ - **CPU**: Modern multi-core processor
+
+ ### Recommended
+
+ - **RAM**: 32GB+ (for larger models like qwen2.5:32b)
+ - **GPU**: NVIDIA with 16GB+ VRAM or Apple Silicon (M2/M3/M4)
+ - **Storage**: 100GB+ for multiple models
+
+ ### GPU Support
+
+ | Platform | GPU | Support |
+ |----------|-----|---------|
+ | macOS | Apple Silicon (M1/M2/M3/M4) | Excellent |
+ | Linux | NVIDIA (CUDA) | Excellent |
+ | Linux | AMD (ROCm) | Good |
+ | Windows | NVIDIA (CUDA) | Good |
+ | Windows | AMD | Limited |
+
+ ## Installation
+
+ ### macOS
+
+ ```bash
+ # Using Homebrew (recommended)
+ brew install ollama
+
+ # Or download from https://ollama.ai
+ ```
+
+ ### Linux
+
+ ```bash
+ # One-line installer (works on most distributions, including Ubuntu/Debian)
+ curl -fsSL https://ollama.ai/install.sh | sh
+
+ # Arch Linux (AUR)
+ yay -S ollama
+ ```
+
+ ### Windows
+
+ Download the installer from [ollama.ai](https://ollama.ai/download/windows).
+
+ ### Docker
+
+ ```bash
+ # CPU only
+ docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+
+ # With NVIDIA GPU
+ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+ ```
+
+ ## Quick Start
+
+ ### 1. Start Ollama Server
+
+ ```bash
+ # Start the server (runs in background)
+ ollama serve
+ ```
+
+ ::: tip
+ On macOS and Windows, Ollama starts automatically as a background service after installation.
+ :::
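+
+ Before translating, you can confirm the server is reachable from code. This is a minimal sketch using Ollama's standard `GET /api/tags` endpoint (the same endpoint the Troubleshooting section probes with `curl`); it is not part of llm-translate itself and assumes Node 18+ for the built-in `fetch`:
+
+ ```typescript
+ // check-ollama.ts - verify the Ollama server is up and list installed models.
+ const baseUrl = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
+
+ async function listModels(): Promise<void> {
+   const res = await fetch(`${baseUrl}/api/tags`);
+   if (!res.ok) {
+     throw new Error(`Ollama server responded with ${res.status}`);
+   }
+   const data = (await res.json()) as { models: Array<{ name: string }> };
+   console.log(data.models.map((m) => m.name).join("\n"));
+ }
+
+ listModels().catch((err) => {
+   console.error(`Cannot reach Ollama at ${baseUrl}:`, err.message);
+   process.exit(1);
+ });
+ ```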
+
+ ### 2. Download a Model
+
+ ```bash
+ # Recommended: Qwen 2.5 14B (best multilingual support for local)
+ ollama pull qwen2.5:14b
+
+ # Alternative: Llama 3.2 (lighter, good for English-centric docs)
+ ollama pull llama3.2
+ ```
+
+ ### 3. Translate
+
+ ```bash
+ # Translate a README to Korean with qwen2.5:14b
+ llm-translate file README.md -o README.ko.md -s en -t ko --provider ollama --model qwen2.5:14b
+
+ # Translate to Japanese
+ llm-translate file doc.md -s en -t ja --provider ollama --model qwen2.5:14b
+ ```
+
+ ::: tip Qwen 2.5 for Translation
+ Qwen 2.5 supports 29 languages including Korean, Japanese, Chinese, and all major European languages. The 14B version offers excellent quality for translation tasks while running on 16GB RAM.
+ :::
+
+ ## Recommended Models for Translation
+
+ ### Best Quality (32B+)
+
+ | Model | Size | VRAM | Languages | Quality |
+ |-------|------|------|-----------|---------|
+ | `llama3.3` | 70B | 40GB+ | 100+ | Excellent |
+ | `qwen2.5:32b` | 32B | 20GB+ | 29 | Excellent |
+ | `llama3.1:70b` | 70B | 40GB+ | 8 | Very Good |
+
+ ### Lightweight with Best Language Support
+
+ For systems with limited resources, **Qwen2.5** offers the best multilingual support (29 languages).
+
+ | Model | Parameters | RAM | Languages | Quality | Best For |
+ |-------|-----------|-----|-----------|---------|----------|
+ | `qwen2.5:7b` | 7B | 6GB | 29 | Very Good | **Balanced (recommended)** |
+ | `qwen2.5:3b` | 3B | 3GB | 29 | Good | Minimal hardware |
+ | `gemma3:4b` | 4B | 4GB | Many | Good | Translation-optimized |
+ | `llama3.2` | 3B | 4GB | 8 | Good | English-centric docs |
+
+ ### Ultra-Lightweight (≤ 2GB RAM)
+
+ | Model | Parameters | RAM | Languages | Quality |
+ |-------|-----------|-----|-----------|---------|
+ | `qwen2.5:1.5b` | 1.5B | 2GB | 29 | Basic |
+ | `qwen2.5:0.5b` | 0.5B | 1GB | 29 | Basic |
+ | `gemma3:1b` | 1B | 1.5GB | Many | Basic |
+ | `llama3.2:1b` | 1B | 2GB | 8 | Basic |
+
+ ::: tip Qwen for Multilingual
+ For non-English translation work, Qwen2.5 is usually the best lightweight choice thanks to its broad multilingual training.
+ :::
+
+ ### Downloading Models
+
+ ```bash
+ # List installed models
+ ollama list
+
+ # Recommended for translation
+ ollama pull qwen2.5:14b    # Best multilingual (29 languages)
+ ollama pull qwen2.5:32b    # Higher quality, needs 32GB RAM
+ ollama pull llama3.1:8b    # Good quality, lighter
+
+ # Lightweight options (may have quality issues)
+ ollama pull qwen2.5:7b     # Better quality than 3B
+ ollama pull llama3.2       # Good for English-centric docs
+
+ # Other options
+ ollama pull mistral-nemo
+ ```
+
+ ## Configuration
+
+ ### Environment Variables
+
+ ```bash
+ # Default server URL (optional, this is the default)
+ export OLLAMA_BASE_URL=http://localhost:11434
+
+ # Custom server location
+ export OLLAMA_BASE_URL=http://192.168.1.100:11434
+ ```
+
+ ### Config File
+
+ ```json
+ {
+   "provider": {
+     "name": "ollama",
+     "model": "qwen2.5:14b",
+     "baseUrl": "http://localhost:11434"
+   },
+   "translation": {
+     "qualityThreshold": 75,
+     "maxIterations": 3
+   }
+ }
+ ```
+
+ ::: tip
+ For local models, a lower `qualityThreshold` (75) is recommended to avoid excessive refinement iterations. Use 14B+ models for reliable results.
+ :::
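+
+ The same settings can be used from the programmatic API. A hedged sketch: `createEngine` is a hypothetical entry point shown for illustration only (check the package's exported API for the real constructor); the `translateFile` call mirrors the example in the prompt caching guide:
+
+ ```typescript
+ // translate-local.ts - drive a local Ollama model from code (illustrative sketch).
+ // NOTE: `createEngine` is a hypothetical name, not a confirmed export.
+ import { createEngine } from "@llm-translate/cli";
+
+ async function main() {
+   const engine = createEngine({
+     provider: {
+       name: "ollama",
+       model: "qwen2.5:14b",
+       baseUrl: process.env.OLLAMA_BASE_URL ?? "http://localhost:11434",
+     },
+     translation: { qualityThreshold: 75, maxIterations: 3 },
+   });
+
+   const result = await engine.translateFile({
+     input: "doc.md",
+     output: "doc.ko.md",
+     targetLang: "ko",
+   });
+   console.log(result.metadata.tokensUsed);
+ }
+
+ main();
+ ```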
+
+ ### Model-Specific Settings
+
+ For different document types:
+
+ ```bash
+ # Solid default - qwen2.5:14b (recommended for most use cases)
+ llm-translate file api-spec.md -s en -t ko \
+   --provider ollama \
+   --model qwen2.5:14b \
+   --quality 75
+
+ # Higher quality with 32B model (requires 32GB RAM)
+ llm-translate file legal-doc.md -s en -t ko \
+   --provider ollama \
+   --model qwen2.5:32b \
+   --quality 80
+
+ # README files - lighter model for simple content
+ llm-translate file README.md -s en -t ko \
+   --provider ollama \
+   --model llama3.2 \
+   --quality 70
+
+ # Large documentation sets - balance speed and quality
+ llm-translate dir ./docs ./docs-ko -s en -t ko \
+   --provider ollama \
+   --model qwen2.5:14b \
+   --parallel 2
+ ```
+
+ ## Performance Optimization
+
+ ### GPU Acceleration
+
+ #### NVIDIA (Linux/Windows)
+
+ ```bash
+ # Check CUDA availability
+ nvidia-smi
+
+ # Ollama automatically uses CUDA if available
+ ollama serve
+ ```
+
+ #### Apple Silicon (macOS)
+
+ Metal acceleration is automatic on M1/M2/M3/M4 Macs.
+
+ ```bash
+ # Check GPU usage
+ sudo powermetrics --samplers gpu_power
+ ```
+
+ ### Memory Management
+
+ ```bash
+ # Pin Ollama to a specific GPU (Linux with NVIDIA)
+ CUDA_VISIBLE_DEVICES=0 ollama serve
+
+ # Limit how many requests are processed in parallel
+ OLLAMA_NUM_PARALLEL=2 ollama serve
+ ```
+
+ ### Optimizing for Large Documents
+
+ ```bash
+ # Reduce chunk size for memory-constrained systems
+ llm-translate file large-doc.md --target ko \
+   --provider ollama \
+   --chunk-size 512
+
+ # Disable caching to reduce memory usage
+ llm-translate file doc.md --target ko \
+   --provider ollama \
+   --no-cache
+
+ # Single-threaded processing for stability
+ llm-translate dir ./docs ./docs-ko --target ko \
+   --provider ollama \
+   --parallel 1
+ ```
+
+ ## Remote Ollama Server
+
+ ### Server Setup
+
+ On the server machine:
+
+ ```bash
+ # Allow external connections
+ OLLAMA_HOST=0.0.0.0 ollama serve
+ ```
+
+ ::: warning Security
+ Only expose Ollama on trusted networks. Consider using a VPN or SSH tunnel for remote access.
+ :::
+
+ ### SSH Tunnel (Recommended)
+
+ ```bash
+ # Create secure tunnel to remote server
+ ssh -L 11434:localhost:11434 user@remote-server
+
+ # Then use as normal
+ llm-translate file doc.md --target ko --provider ollama
+ ```
+
+ ### Direct Connection
+
+ ```bash
+ # Set remote server URL
+ export OLLAMA_BASE_URL=http://remote-server:11434
+
+ llm-translate file doc.md --target ko --provider ollama
+ ```
+
+ ### Docker Compose for Team Server
+
+ ```yaml
+ # docker-compose.yml
+ version: '3.8'
+ services:
+   ollama:
+     image: ollama/ollama
+     ports:
+       - "11434:11434"
+     volumes:
+       - ollama_data:/root/.ollama
+     deploy:
+       resources:
+         reservations:
+           devices:
+             - driver: nvidia
+               count: 1
+               capabilities: [gpu]
+     restart: unless-stopped
+
+ volumes:
+   ollama_data:
+ ```
+
+ ## Troubleshooting
+
+ ### Connection Errors
+
+ ```
+ Error: Cannot connect to Ollama server at http://localhost:11434
+ ```
+
+ **Solutions:**
+
+ ```bash
+ # Check if Ollama is running
+ curl http://localhost:11434/api/tags
+
+ # Start the server
+ ollama serve
+
+ # Check for port conflicts
+ lsof -i :11434
+ ```
+
+ ### Model Not Found
+
+ ```
+ Error: Model "llama3.2" not found. Pull it with: ollama pull llama3.2
+ ```
+
+ **Solution:**
+
+ ```bash
+ # Download the model
+ ollama pull llama3.2
+
+ # Verify installation
+ ollama list
+ ```
+
+ ### Out of Memory
+
+ ```
+ Error: Out of memory. Try a smaller model or reduce chunk size.
+ ```
+
+ **Solutions:**
+
+ ```bash
+ # Use a smaller model
+ ollama pull llama3.2:1b
+ llm-translate file doc.md --target ko --provider ollama --model llama3.2:1b
+
+ # Reduce chunk size
+ llm-translate file doc.md --target ko --provider ollama --chunk-size 256
+
+ # Close other applications to free RAM
+ ```
+
+ ### Slow Performance
+
+ **Solutions:**
+
+ 1. **Use GPU acceleration** - Ensure Ollama detects your GPU
+ 2. **Use a smaller model** - 7B models are much faster than 70B
+ 3. **Reduce the quality threshold** - Fewer refinement iterations
+ 4. **Increase chunk size** - Fewer API calls (if memory allows)
+
+ ```bash
+ # Check if the GPU is being used (--verbose prints timing stats)
+ ollama run llama3.2 --verbose
+
+ # Fast translation settings
+ llm-translate file doc.md --target ko \
+   --provider ollama \
+   --model llama3.2 \
+   --quality 70 \
+   --max-iterations 2
+ ```
+
+ ### Quality Issues
+
+ Local models may produce lower quality than cloud APIs. Tips to improve:
+
+ 1. **Use larger models** when possible
+ 2. **Use models with good multilingual training** (Qwen, Llama 3.2+)
+ 3. **Provide a glossary** for technical terms
+ 4. **Lower the quality threshold** to avoid excessive refinement iterations
+
+ ```bash
+ # High-quality local translation
+ llm-translate file doc.md --target ko \
+   --provider ollama \
+   --model qwen2.5:32b \
+   --glossary glossary.json \
+   --quality 80 \
+   --max-iterations 4
+ ```
+
+ ## Comparison: Cloud vs Local
+
+ | Aspect | Cloud (Claude/OpenAI) | Local (Ollama) |
+ |--------|----------------------|----------------|
+ | **Privacy** | Data sent to servers | Fully private |
+ | **Cost** | Per-token pricing | Free after setup |
+ | **Quality** | Excellent | Good to Very Good (model dependent) |
+ | **Speed** | Fast | Varies with hardware |
+ | **Offline** | No | Yes |
+ | **Setup** | API key only | Install + download model |
+ | **Context** | 200K tokens (Claude) | 32K-128K tokens |
+
+ ::: info Local Translation Considerations
+ Local models (14B+) can produce good translation results but may not match cloud API quality for complex or nuanced content. Use larger models (32B+) for better results.
+ :::
+
+ ### When to Use Ollama
+
+ - Sensitive/confidential documents
+ - Air-gapped environments
+ - High-volume translation (cost savings)
+ - Privacy-conscious organizations
+ - Simple to moderately complex documents
+
+ ### When to Use Cloud APIs
+
+ - Need for prompt caching (Claude - up to 90% cost reduction)
+ - Limited local hardware (< 16GB RAM)
+ - Need for the highest quality translations
+ - Complex technical or legal documents
+ - Occasional/low-volume translation
+
+ ## Advanced: Custom Models
+
+ ### Creating a Translation-Optimized Model
+
+ Create a `Modelfile` based on Qwen:
+
+ ```dockerfile
+ FROM qwen2.5:14b
+
+ PARAMETER temperature 0.3
+ PARAMETER num_ctx 32768
+
+ SYSTEM """You are a professional translator. Follow these rules:
+ 1. Maintain the original formatting (markdown, code blocks)
+ 2. Never translate code inside code blocks
+ 3. Keep URLs and file paths unchanged
+ 4. Translate naturally, not literally
+ 5. Use formal/polite register for Korean (경어체) and Japanese (です・ます調)
+ """
+ ```
+
+ Build and use:
+
+ ```bash
+ # Create the custom model
+ ollama create translator -f Modelfile
+
+ # Use it for translation
+ llm-translate file doc.md -s en -t ko --provider ollama --model translator
+ ```
+
+ ## Next Steps
+
+ - [Configure glossaries](./glossary) for consistent terminology
+ - [Optimize chunking](./chunking) for your documents
+ - [Set up quality control](./quality-control) thresholds
package/docs/guide/prompt-caching.md
@@ -0,0 +1,221 @@
+ # Prompt Caching
+
+ ::: info Translations
+ All non-English documentation is automatically translated using Claude Sonnet 4.
+ :::
+
+ Prompt caching is a cost optimization feature that reduces API costs by up to 90% for repeated content.
+
+ ## How It Works
+
+ When translating documents, certain parts of the prompt remain constant:
+
+ - **System instructions**: Translation rules and guidelines
+ - **Glossary**: Domain-specific terminology
+
+ These are cached and reused across multiple chunks, saving significant costs.
+
+ ```
+ Request 1 (First Chunk):
+ ┌─────────────────────────────────┐
+ │ System Instructions (CACHED)    │ ◀─ Written to cache
+ ├─────────────────────────────────┤
+ │ Glossary (CACHED)               │ ◀─ Written to cache
+ ├─────────────────────────────────┤
+ │ Source Text (NOT cached)        │
+ └─────────────────────────────────┘
+
+ Request 2+ (Subsequent Chunks):
+ ┌─────────────────────────────────┐
+ │ System Instructions (CACHED)    │ ◀─ Read from cache (90% off)
+ ├─────────────────────────────────┤
+ │ Glossary (CACHED)               │ ◀─ Read from cache (90% off)
+ ├─────────────────────────────────┤
+ │ Source Text (NOT cached)        │
+ └─────────────────────────────────┘
+ ```
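+
+ Under the hood, Claude's prompt caching works by marking the stable prefix blocks with `cache_control`. The following is an illustrative sketch using Anthropic's public SDK, not llm-translate's internal code; the model id and system text are placeholders:
+
+ ```typescript
+ // caching-sketch.ts - how a translation prompt can be made cacheable (sketch).
+ import Anthropic from "@anthropic-ai/sdk";
+
+ const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
+
+ async function translateChunk(chunk: string, glossary: string) {
+   return anthropic.messages.create({
+     model: "claude-sonnet-4-0", // placeholder model id
+     max_tokens: 4096,
+     system: [
+       // Stable blocks come first: identical prefixes hit the cache on later requests.
+       { type: "text", text: "You are a professional translator...", cache_control: { type: "ephemeral" } },
+       { type: "text", text: glossary, cache_control: { type: "ephemeral" } },
+     ],
+     // The source chunk changes every request, so it stays outside the cached prefix.
+     messages: [{ role: "user", content: chunk }],
+   });
+ }
+ ```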
+
+ ## Cost Impact
+
+ ### Pricing (Claude)
+
+ | Token Type | Cost Multiplier |
+ |------------|-----------------|
+ | Regular input | 1.0x |
+ | Cache write | 1.25x (first use) |
+ | Cache read | 0.1x (subsequent) |
+ | Output | 1.0x |
+
+ ### Example Calculation
+
+ For a 10-chunk document with a 500-token glossary:
+
+ **Without caching:**
+ ```
+ 10 chunks × 500 glossary tokens = 5,000 tokens
+ ```
+
+ **With caching:**
+ ```
+ First chunk: 500 × 1.25    =   625 tokens (cache write)
+ 9 chunks:    500 × 0.1 × 9 =   450 tokens (cache read)
+ Total:                       1,075 tokens (78% savings)
+ ```
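+
+ The same arithmetic as a small helper, using the multipliers from the pricing table above (a sketch for estimation, not the package's internal accounting):
+
+ ```typescript
+ // cache-savings.ts - estimate token-cost savings from caching a shared prefix.
+ function cacheSavings(chunks: number, prefixTokens: number): number {
+   const uncached = chunks * prefixTokens; // prefix re-sent with every chunk
+   const cached =
+     prefixTokens * 1.25 + // first chunk: cache write (1.25x)
+     prefixTokens * 0.1 * (chunks - 1); // remaining chunks: cache reads (0.1x)
+   return 1 - cached / uncached;
+ }
+
+ console.log(cacheSavings(10, 500)); // ≈ 0.785 → the 78% figure above
+ ```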
+
+ ## Requirements
+
+ ### Minimum Token Thresholds
+
+ Prompt caching requires a minimum cacheable prompt length:
+
+ | Model | Minimum Tokens |
+ |-------|---------------|
+ | Claude Haiku 4.5 | 4,096 |
+ | Claude Haiku 3.5 | 2,048 |
+ | Claude Sonnet | 1,024 |
+ | Claude Opus | 1,024 |
+
+ Content below these thresholds won't be cached.
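+
+ A rough pre-flight check looks like this. It's only a sketch: the ~4 characters per token ratio is a crude heuristic, and the package ships its own token utilities (`src/utils/tokens.ts`) whose API may differ:
+
+ ```typescript
+ // cache-threshold.ts - rough check that the stable prefix is large enough to cache.
+ const MIN_CACHE_TOKENS = 1024; // Claude Sonnet/Opus threshold from the table above
+
+ function estimateTokens(text: string): number {
+   return Math.ceil(text.length / 4); // crude ~4 chars/token heuristic
+ }
+
+ function willLikelyCache(systemPrompt: string, glossary: string): boolean {
+   return estimateTokens(systemPrompt + glossary) >= MIN_CACHE_TOKENS;
+ }
+ ```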
+
+ ### Provider Support
+
+ | Provider | Caching Support |
+ |----------|-----------------|
+ | Claude | ✅ Full support |
+ | OpenAI | ✅ Automatic |
+ | Ollama | ❌ Not available |
+
+ ## Configuration
+
+ Caching is enabled by default for Claude. To disable:
+
+ ```bash
+ llm-translate file doc.md -o doc.ko.md --target ko --no-cache
+ ```
+
+ Or in config:
+
+ ```json
+ {
+   "provider": {
+     "name": "claude",
+     "caching": false
+   }
+ }
+ ```
+
+ ## Monitoring Cache Performance
+
+ ### CLI Output
+
+ ```
+ ✓ Translation complete
+   Cache: 890 read / 234 written (78% hit rate)
+ ```
+
+ ### Verbose Mode
+
+ ```bash
+ llm-translate file doc.md -o doc.ko.md --target ko --verbose
+ ```
+
+ Shows per-chunk cache statistics:
+
+ ```
+ [Chunk 1/10] Cache: 0 read / 890 written
+ [Chunk 2/10] Cache: 890 read / 0 written
+ [Chunk 3/10] Cache: 890 read / 0 written
+ ...
+ ```
+
+ ### Programmatic Access
+
+ ```typescript
+ const result = await engine.translateFile({
+   input: 'doc.md',
+   output: 'doc.ko.md',
+   targetLang: 'ko',
+ });
+
+ console.log(result.metadata.tokensUsed);
+ // {
+ //   input: 5000,
+ //   output: 6000,
+ //   cacheRead: 8000,
+ //   cacheWrite: 1000
+ // }
+ ```
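+
+ From those numbers you can derive a hit rate and an effective input cost. One way to compute them, continuing the example above and using the Claude multipliers from this page (the exact formula behind the CLI's reported hit rate may differ):
+
+ ```typescript
+ // Continuing the example above: result.metadata.tokensUsed from translateFile().
+ const { input, cacheRead, cacheWrite } = result.metadata.tokensUsed;
+
+ // Share of prompt tokens that were served from cache.
+ const hitRate = cacheRead / (input + cacheRead + cacheWrite);
+
+ // Input cost in regular-token equivalents (reads at 0.1x, writes at 1.25x).
+ const effectiveInput = input + cacheRead * 0.1 + cacheWrite * 1.25;
+
+ console.log(`Cache hit rate: ${(hitRate * 100).toFixed(0)}%`);
+ console.log(`Effective input tokens: ${effectiveInput.toFixed(0)}`);
+ ```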
+
+ ## Maximizing Cache Efficiency
+
+ ### 1. Use Consistent Glossaries
+
+ The same glossary content produces the same cache key:
+
+ ```bash
+ # Good: Same glossary for all files
+ llm-translate dir ./docs ./docs-ko --target ko --glossary glossary.json
+
+ # Less efficient: Different glossary per file
+ llm-translate file a.md --glossary a-glossary.json
+ llm-translate file b.md --glossary b-glossary.json
+ ```
+
+ ### 2. Batch Process Related Files
+
+ The cache persists for about 5 minutes, so process related files together:
+
+ ```bash
+ # Efficient: Sequential processing shares the cache
+ llm-translate dir ./docs ./docs-ko --target ko
+ ```
+
+ ### 3. Order Files by Size
+
+ Start with larger files to warm the cache:
+
+ ```bash
+ # The cache is populated by the first file and reused by the rest
+ llm-translate file large-doc.md ...
+ llm-translate file small-doc.md ...
+ ```
+
+ ### 4. Use Larger Glossaries Strategically
+
+ Larger glossaries benefit more from caching:
+
+ | Glossary Size | Cache Savings |
+ |---------------|---------------|
+ | 100 tokens | ~70% |
+ | 500 tokens | ~78% |
+ | 1000+ tokens | ~80%+ |
+
+ ## Troubleshooting
+
+ ### Cache Not Working
+
+ **Symptoms:** No `cacheRead` tokens reported
+
+ **Causes:**
+ 1. Content below the minimum threshold
+ 2. Content changed between requests
+ 3. Cache TTL expired (5 minutes)
+
+ **Solutions:**
+ - Ensure the glossary + system prompt exceed the minimum token threshold
+ - Process files in quick succession
+ - Use verbose mode to debug
+
+ ### High Cache Write Costs
+
+ **Symptoms:** More `cacheWrite` tokens than expected
+
+ **Causes:**
+ 1. Many unique glossaries
+ 2. Files processed too far apart
+ 3. Cache invalidation between runs
+
+ **Solutions:**
+ - Consolidate glossaries
+ - Use batch processing
+ - Process within 5-minute windows