@llm-translate/cli 1.0.0-next.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.dockerignore +51 -0
- package/.env.example +33 -0
- package/.github/workflows/docs-pages.yml +57 -0
- package/.github/workflows/release.yml +49 -0
- package/.translaterc.json +44 -0
- package/CLAUDE.md +243 -0
- package/Dockerfile +55 -0
- package/README.md +371 -0
- package/RFC.md +1595 -0
- package/dist/cli/index.d.ts +2 -0
- package/dist/cli/index.js +4494 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/index.d.ts +1152 -0
- package/dist/index.js +3841 -0
- package/dist/index.js.map +1 -0
- package/docker-compose.yml +56 -0
- package/docs/.vitepress/config.ts +161 -0
- package/docs/api/agent.md +262 -0
- package/docs/api/engine.md +274 -0
- package/docs/api/index.md +171 -0
- package/docs/api/providers.md +304 -0
- package/docs/changelog.md +64 -0
- package/docs/cli/dir.md +243 -0
- package/docs/cli/file.md +213 -0
- package/docs/cli/glossary.md +273 -0
- package/docs/cli/index.md +129 -0
- package/docs/cli/init.md +158 -0
- package/docs/cli/serve.md +211 -0
- package/docs/glossary.json +235 -0
- package/docs/guide/chunking.md +272 -0
- package/docs/guide/configuration.md +139 -0
- package/docs/guide/cost-optimization.md +237 -0
- package/docs/guide/docker.md +371 -0
- package/docs/guide/getting-started.md +150 -0
- package/docs/guide/glossary.md +241 -0
- package/docs/guide/index.md +86 -0
- package/docs/guide/ollama.md +515 -0
- package/docs/guide/prompt-caching.md +221 -0
- package/docs/guide/providers.md +232 -0
- package/docs/guide/quality-control.md +206 -0
- package/docs/guide/vitepress-integration.md +265 -0
- package/docs/index.md +63 -0
- package/docs/ja/api/agent.md +262 -0
- package/docs/ja/api/engine.md +274 -0
- package/docs/ja/api/index.md +171 -0
- package/docs/ja/api/providers.md +304 -0
- package/docs/ja/changelog.md +64 -0
- package/docs/ja/cli/dir.md +243 -0
- package/docs/ja/cli/file.md +213 -0
- package/docs/ja/cli/glossary.md +273 -0
- package/docs/ja/cli/index.md +111 -0
- package/docs/ja/cli/init.md +158 -0
- package/docs/ja/guide/chunking.md +271 -0
- package/docs/ja/guide/configuration.md +139 -0
- package/docs/ja/guide/cost-optimization.md +30 -0
- package/docs/ja/guide/getting-started.md +150 -0
- package/docs/ja/guide/glossary.md +214 -0
- package/docs/ja/guide/index.md +32 -0
- package/docs/ja/guide/ollama.md +410 -0
- package/docs/ja/guide/prompt-caching.md +221 -0
- package/docs/ja/guide/providers.md +232 -0
- package/docs/ja/guide/quality-control.md +137 -0
- package/docs/ja/guide/vitepress-integration.md +265 -0
- package/docs/ja/index.md +58 -0
- package/docs/ko/api/agent.md +262 -0
- package/docs/ko/api/engine.md +274 -0
- package/docs/ko/api/index.md +171 -0
- package/docs/ko/api/providers.md +304 -0
- package/docs/ko/changelog.md +64 -0
- package/docs/ko/cli/dir.md +243 -0
- package/docs/ko/cli/file.md +213 -0
- package/docs/ko/cli/glossary.md +273 -0
- package/docs/ko/cli/index.md +111 -0
- package/docs/ko/cli/init.md +158 -0
- package/docs/ko/guide/chunking.md +271 -0
- package/docs/ko/guide/configuration.md +139 -0
- package/docs/ko/guide/cost-optimization.md +30 -0
- package/docs/ko/guide/getting-started.md +150 -0
- package/docs/ko/guide/glossary.md +214 -0
- package/docs/ko/guide/index.md +32 -0
- package/docs/ko/guide/ollama.md +410 -0
- package/docs/ko/guide/prompt-caching.md +221 -0
- package/docs/ko/guide/providers.md +232 -0
- package/docs/ko/guide/quality-control.md +137 -0
- package/docs/ko/guide/vitepress-integration.md +265 -0
- package/docs/ko/index.md +58 -0
- package/docs/zh/api/agent.md +262 -0
- package/docs/zh/api/engine.md +274 -0
- package/docs/zh/api/index.md +171 -0
- package/docs/zh/api/providers.md +304 -0
- package/docs/zh/changelog.md +64 -0
- package/docs/zh/cli/dir.md +243 -0
- package/docs/zh/cli/file.md +213 -0
- package/docs/zh/cli/glossary.md +273 -0
- package/docs/zh/cli/index.md +111 -0
- package/docs/zh/cli/init.md +158 -0
- package/docs/zh/guide/chunking.md +271 -0
- package/docs/zh/guide/configuration.md +139 -0
- package/docs/zh/guide/cost-optimization.md +30 -0
- package/docs/zh/guide/getting-started.md +150 -0
- package/docs/zh/guide/glossary.md +214 -0
- package/docs/zh/guide/index.md +32 -0
- package/docs/zh/guide/ollama.md +410 -0
- package/docs/zh/guide/prompt-caching.md +221 -0
- package/docs/zh/guide/providers.md +232 -0
- package/docs/zh/guide/quality-control.md +137 -0
- package/docs/zh/guide/vitepress-integration.md +265 -0
- package/docs/zh/index.md +58 -0
- package/package.json +91 -0
- package/release.config.mjs +15 -0
- package/schemas/glossary.schema.json +110 -0
- package/src/cli/commands/dir.ts +469 -0
- package/src/cli/commands/file.ts +291 -0
- package/src/cli/commands/glossary.ts +221 -0
- package/src/cli/commands/init.ts +68 -0
- package/src/cli/commands/serve.ts +60 -0
- package/src/cli/index.ts +64 -0
- package/src/cli/options.ts +59 -0
- package/src/core/agent.ts +1119 -0
- package/src/core/chunker.ts +391 -0
- package/src/core/engine.ts +634 -0
- package/src/errors.ts +188 -0
- package/src/index.ts +147 -0
- package/src/integrations/vitepress.ts +549 -0
- package/src/parsers/markdown.ts +383 -0
- package/src/providers/claude.ts +259 -0
- package/src/providers/interface.ts +109 -0
- package/src/providers/ollama.ts +379 -0
- package/src/providers/openai.ts +308 -0
- package/src/providers/registry.ts +153 -0
- package/src/server/index.ts +152 -0
- package/src/server/middleware/auth.ts +93 -0
- package/src/server/middleware/logger.ts +90 -0
- package/src/server/routes/health.ts +84 -0
- package/src/server/routes/translate.ts +210 -0
- package/src/server/types.ts +138 -0
- package/src/services/cache.ts +899 -0
- package/src/services/config.ts +217 -0
- package/src/services/glossary.ts +247 -0
- package/src/types/analysis.ts +164 -0
- package/src/types/index.ts +265 -0
- package/src/types/modes.ts +121 -0
- package/src/types/mqm.ts +157 -0
- package/src/utils/logger.ts +141 -0
- package/src/utils/tokens.ts +116 -0
- package/tests/fixtures/glossaries/ml-glossary.json +53 -0
- package/tests/fixtures/input/lynq-installation.ko.md +350 -0
- package/tests/fixtures/input/lynq-installation.md +350 -0
- package/tests/fixtures/input/simple.ko.md +27 -0
- package/tests/fixtures/input/simple.md +27 -0
- package/tests/unit/chunker.test.ts +229 -0
- package/tests/unit/glossary.test.ts +146 -0
- package/tests/unit/markdown.test.ts +205 -0
- package/tests/unit/tokens.test.ts +81 -0
- package/tsconfig.json +28 -0
- package/tsup.config.ts +34 -0
- package/vitest.config.ts +16 -0
@@ -0,0 +1,515 @@
# Local Translation with Ollama

::: info Translations
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

Run llm-translate completely offline using Ollama. No API keys are required, and sensitive documents stay completely private.

::: warning Quality Varies by Model
Ollama translation quality is **highly dependent on model selection**. For reliable translation results:

- **Minimum**: 14B-class models (e.g., `qwen2.5:14b`); `llama3.1:8b` is a lighter, borderline option
- **Recommended**: 32B+ models (e.g., `qwen2.5:32b`, `llama3.3:70b`)
- **Not recommended**: Models under 7B produce inconsistent and often unusable translations

Smaller models (3B, 7B) may work for simple content but frequently fail on technical documentation, produce incomplete outputs, or ignore formatting instructions.
:::

## Why Ollama?

- **Privacy**: Documents never leave your machine
- **No API costs**: Unlimited translations after initial setup
- **Offline**: Works without internet connection
- **Customizable**: Fine-tune models for your domain

## System Requirements

### Minimum (14B models)

- **RAM**: 16GB (for 14B models like qwen2.5:14b)
- **Storage**: 20GB free space
- **CPU**: Modern multi-core processor

### Recommended

- **RAM**: 32GB+ (for larger models like qwen2.5:32b)
- **GPU**: NVIDIA with 16GB+ VRAM or Apple Silicon (M2/M3/M4)
- **Storage**: 100GB+ for multiple models

### GPU Support

| Platform | GPU | Support |
|----------|-----|---------|
| macOS | Apple Silicon (M1/M2/M3/M4) | Excellent |
| Linux | NVIDIA (CUDA) | Excellent |
| Linux | AMD (ROCm) | Good |
| Windows | NVIDIA (CUDA) | Good |
| Windows | AMD | Limited |

## Installation

### macOS

```bash
# Using Homebrew (recommended)
brew install ollama

# Or download from https://ollama.ai
```

### Linux

```bash
# One-line installer
curl -fsSL https://ollama.ai/install.sh | sh

# Or using package managers
# Ubuntu/Debian
curl -fsSL https://ollama.ai/install.sh | sh

# Arch Linux
yay -S ollama
```

### Windows

Download the installer from [ollama.ai](https://ollama.ai/download/windows).

### Docker

```bash
# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

## Quick Start

### 1. Start Ollama Server

```bash
# Start the server (runs in background)
ollama serve
```

::: tip
On macOS and Windows, Ollama starts automatically as a background service after installation.
:::
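
To confirm the server is reachable before pulling models, you can query the local API (this assumes the default port, 11434):

```bash
# Should return a JSON list of installed models (empty on a fresh install)
curl http://localhost:11434/api/tags
```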

### 2. Download a Model

```bash
# Recommended: Qwen 2.5 14B (best multilingual support for local)
ollama pull qwen2.5:14b

# Alternative: Llama 3.2 (lighter, good for English-centric docs)
ollama pull llama3.2
```
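
Before kicking off a long translation run, a quick one-off prompt confirms the model loads and responds (the prompt here is just an example; the first run also loads the model into memory):

```bash
# One-shot sanity check
ollama run qwen2.5:14b "Translate to Korean: Hello, world."
```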

### 3. Translate

```bash
# Basic translation with qwen2.5:14b
llm-translate file README.md -o README.ko.md -s en -t ko --provider ollama --model qwen2.5:14b

# Different target language (Japanese)
llm-translate file doc.md -s en -t ja --provider ollama --model qwen2.5:14b
```

::: tip Qwen 2.5 for Translation
Qwen 2.5 supports 29 languages including Korean, Japanese, Chinese, and all major European languages. The 14B version offers excellent quality for translation tasks while running on 16GB RAM.
:::

## Recommended Models for Translation

### Best Quality (32B+)

| Model | Size | VRAM | Languages | Quality |
|-------|------|------|-----------|---------|
| `llama3.3` | 70B | 40GB+ | 100+ | Excellent |
| `qwen2.5:32b` | 32B | 20GB+ | 29 | Excellent |
| `llama3.1:70b` | 70B | 40GB+ | 8 | Very Good |

### Lightweight with Best Language Support

For systems with limited resources, **Qwen2.5** offers the best multilingual support (29 languages). Note that these models fall below the 14B guidance above, so expect weaker results on technical content.

| Model | Parameters | RAM | Languages | Quality | Best For |
|-------|-----------|-----|-----------|---------|----------|
| `qwen2.5:3b` | 3B | 3GB | 29 | Good | Balanced choice for low-RAM systems |
| `qwen2.5:7b` | 7B | 6GB | 29 | Very Good | Quality priority |
| `gemma3:4b` | 4B | 4GB | Many | Good | Translation-optimized |
| `llama3.2` | 3B | 4GB | 8 | Good | English-centric docs |

### Ultra-Lightweight (< 2GB RAM)

| Model | Parameters | RAM | Languages | Quality |
|-------|-----------|-----|-----------|---------|
| `qwen2.5:1.5b` | 1.5B | 2GB | 29 | Basic |
| `qwen2.5:0.5b` | 0.5B | 1GB | 29 | Basic |
| `gemma3:1b` | 1B | 1.5GB | Many | Basic |
| `llama3.2:1b` | 1B | 2GB | 8 | Basic |

::: tip Qwen for Multilingual
Qwen2.5 supports 29 languages including Korean, Japanese, Chinese, and all major European languages. For non-English translation work, Qwen is often the best lightweight choice.
:::

### Downloading Models

```bash
# List available models
ollama list

# Recommended for translation (14B+)
ollama pull qwen2.5:14b    # Best multilingual (29 languages)
ollama pull qwen2.5:32b    # Higher quality, needs 32GB RAM
ollama pull llama3.1:8b    # Lighter 8B alternative, still solid quality

# Lightweight options (may have quality issues)
ollama pull qwen2.5:7b     # Better quality than 3B
ollama pull llama3.2       # Good for English-centric docs

# Other options
ollama pull mistral-nemo
```

## Configuration

### Environment Variables

```bash
# Default server URL (optional, this is the default)
export OLLAMA_BASE_URL=http://localhost:11434

# Custom server location
export OLLAMA_BASE_URL=http://192.168.1.100:11434
```

### Config File

```json
{
  "provider": {
    "name": "ollama",
    "model": "qwen2.5:14b",
    "baseUrl": "http://localhost:11434"
  },
  "translation": {
    "qualityThreshold": 75,
    "maxIterations": 3
  }
}
```

::: tip
For local models, a lower `qualityThreshold` (75) is recommended to avoid excessive refinement iterations. Use 14B+ models for reliable results.
:::

### Model-Specific Settings

For different document types:

```bash
# Best quality - qwen2.5:14b (recommended for most use cases)
llm-translate file api-spec.md -s en -t ko \
  --provider ollama \
  --model qwen2.5:14b \
  --quality 75

# Higher quality with 32B model (requires 32GB RAM)
llm-translate file legal-doc.md -s en -t ko \
  --provider ollama \
  --model qwen2.5:32b \
  --quality 80

# README files - lighter model for simple content
llm-translate file README.md -s en -t ko \
  --provider ollama \
  --model llama3.2 \
  --quality 70

# Large documentation sets - balance speed and quality
llm-translate dir ./docs ./docs-ko -s en -t ko \
  --provider ollama \
  --model qwen2.5:14b \
  --parallel 2
```

## Performance Optimization

### GPU Acceleration

#### NVIDIA (Linux/Windows)

```bash
# Check CUDA availability
nvidia-smi

# Ollama automatically uses CUDA if available
ollama serve
```

#### Apple Silicon (macOS)

Metal acceleration is automatic on M1/M2/M3/M4 Macs.

```bash
# Check GPU usage
sudo powermetrics --samplers gpu_power
```

### Memory Management

```bash
# Pin Ollama to a specific GPU (Linux with NVIDIA)
CUDA_VISIBLE_DEVICES=0 ollama serve

# Limit the number of requests served in parallel
OLLAMA_NUM_PARALLEL=2 ollama serve
```

### Optimizing for Large Documents

```bash
# Reduce chunk size for memory-constrained systems
llm-translate file large-doc.md --target ko \
  --provider ollama \
  --chunk-size 512

# Disable caching to reduce memory usage
llm-translate file doc.md --target ko \
  --provider ollama \
  --no-cache

# Single-threaded processing for stability
llm-translate dir ./docs ./docs-ko --target ko \
  --provider ollama \
  --parallel 1
```

## Remote Ollama Server

### Server Setup

On the server machine:

```bash
# Allow external connections
OLLAMA_HOST=0.0.0.0 ollama serve
```

::: warning Security
Only expose Ollama on trusted networks. Consider using a VPN or SSH tunnel for remote access.
:::

### SSH Tunnel (Recommended)

```bash
# Create secure tunnel to remote server
ssh -L 11434:localhost:11434 user@remote-server

# Then use as normal
llm-translate file doc.md --target ko --provider ollama
```

### Direct Connection

```bash
# Set remote server URL
export OLLAMA_BASE_URL=http://remote-server:11434

llm-translate file doc.md --target ko --provider ollama
```

### Docker Compose for Team Server

```yaml
# docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama_data:
```
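
A freshly started container has no models, so you still need to pull one into the shared volume and point clients at the server. A minimal sketch, assuming Docker Compose v2 and a host reachable as `team-server` (adjust to your environment):

```bash
# Start the stack and pull a model inside the ollama service container
docker compose up -d
docker compose exec ollama ollama pull qwen2.5:14b

# On each team member's machine, point llm-translate at the shared server
export OLLAMA_BASE_URL=http://team-server:11434
llm-translate file doc.md --target ko --provider ollama --model qwen2.5:14b
```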

## Troubleshooting

### Connection Errors

```
Error: Cannot connect to Ollama server at http://localhost:11434
```

**Solutions:**

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start the server
ollama serve

# Check for port conflicts
lsof -i :11434
```

### Model Not Found

```
Error: Model "llama3.2" not found. Pull it with: ollama pull llama3.2
```

**Solution:**

```bash
# Download the model
ollama pull llama3.2

# Verify installation
ollama list
```

### Out of Memory

```
Error: Out of memory. Try a smaller model or reduce chunk size.
```

**Solutions:**

```bash
# Use a smaller model
ollama pull llama3.2:1b
llm-translate file doc.md --target ko --provider ollama --model llama3.2:1b

# Reduce chunk size
llm-translate file doc.md --target ko --provider ollama --chunk-size 256

# Close other applications to free RAM
```

### Slow Performance

**Solutions:**

1. **Use GPU acceleration** - Ensure Ollama detects your GPU
2. **Use a smaller model** - 7B models are much faster than 70B
3. **Reduce quality threshold** - Fewer refinement iterations
4. **Increase chunk size** - Fewer API calls (if memory allows)

```bash
# Check if GPU is being used
ollama run llama3.2 --verbose

# Fast translation settings
llm-translate file doc.md --target ko \
  --provider ollama \
  --model llama3.2 \
  --quality 70 \
  --max-iterations 2
```

### Quality Issues

Local models may produce lower quality than cloud APIs. Tips to improve:

1. **Use larger models** when possible
2. **Use models with good multilingual training** (Qwen, Llama 3.2+)
3. **Provide a glossary** for technical terms
4. **Lower the quality threshold** to avoid infinite refinement loops

```bash
# High-quality local translation
llm-translate file doc.md --target ko \
  --provider ollama \
  --model qwen2.5:32b \
  --glossary glossary.json \
  --quality 80 \
  --max-iterations 4
```

## Comparison: Cloud vs Local

| Aspect | Cloud (Claude/OpenAI) | Local (Ollama) |
|--------|----------------------|----------------|
| **Privacy** | Data sent to servers | Fully private |
| **Cost** | Per-token pricing | Free after setup |
| **Quality** | Excellent | Good to Very Good (model dependent) |
| **Speed** | Fast | Varies with hardware |
| **Offline** | No | Yes |
| **Setup** | API key only | Install + download model |
| **Context** | 200K (Claude) | 32K-128K |

::: info Local Translation Considerations
Local models (14B+) can produce good translation results but may not match cloud API quality for complex or nuanced content. Use larger models (32B+) for better results.
:::

### When to Use Ollama

- Sensitive/confidential documents
- Air-gapped environments
- High-volume translation (cost savings)
- Privacy-conscious organizations
- Simple to moderate complexity documents

### When to Use Cloud APIs

- Need for prompt caching (Claude - 90% cost reduction)
- Limited local hardware (< 16GB RAM)
- Need for highest-quality translations
- Complex technical or legal documents
- Occasional/low-volume translation

## Advanced: Custom Models

### Creating a Translation-Optimized Model

Create a `Modelfile` based on Qwen:

```dockerfile
FROM qwen2.5:14b

PARAMETER temperature 0.3
PARAMETER num_ctx 32768

SYSTEM """You are a professional translator. Follow these rules:
1. Maintain the original formatting (markdown, code blocks)
2. Never translate code inside code blocks
3. Keep URLs and file paths unchanged
4. Translate naturally, not literally
5. Use formal/polite register for Korean (경어체) and Japanese (です・ます調)
"""
```

Build and use:

```bash
# Create custom model
ollama create translator -f Modelfile

# Use for translation
llm-translate file doc.md -s en -t ko --provider ollama --model translator
```

## Next Steps

- [Configure glossaries](./glossary) for consistent terminology
- [Optimize chunking](./chunking) for your documents
- [Set up quality control](./quality-control) thresholds

@@ -0,0 +1,221 @@
# Prompt Caching

::: info Translations
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

Prompt caching is a cost optimization feature that reduces API costs by up to 90% for repeated content.

## How It Works

When translating documents, certain parts of the prompt remain constant:

- **System instructions**: Translation rules and guidelines
- **Glossary**: Domain-specific terminology

These are cached and reused across multiple chunks, saving significant costs.

```
Request 1 (First Chunk):
┌─────────────────────────────────┐
│ System Instructions (CACHED)    │ ◀─ Written to cache
├─────────────────────────────────┤
│ Glossary (CACHED)               │ ◀─ Written to cache
├─────────────────────────────────┤
│ Source Text (NOT cached)        │
└─────────────────────────────────┘

Request 2+ (Subsequent Chunks):
┌─────────────────────────────────┐
│ System Instructions (CACHED)    │ ◀─ Read from cache (90% off)
├─────────────────────────────────┤
│ Glossary (CACHED)               │ ◀─ Read from cache (90% off)
├─────────────────────────────────┤
│ Source Text (NOT cached)        │
└─────────────────────────────────┘
```
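
Under the hood, providers that support caching mark the stable prefix so it can be reused. A rough sketch of what this looks like with Anthropic's Messages API (llm-translate's Claude provider builds this request for you; the model id and prompt text below are only illustrative):

```bash
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 2048,
    "system": [
      {
        "type": "text",
        "text": "Translation rules and glossary go here...",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "Source text for this chunk..."}
    ]
  }'
```

The `cache_control` marker sets a cache breakpoint after the stable prefix; the per-chunk source text in `messages` falls outside it and is always billed at the normal input rate.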

## Cost Impact

### Pricing (Claude)

| Token Type | Cost Multiplier |
|------------|-----------------|
| Regular input | 1.0x |
| Cache write | 1.25x (first use) |
| Cache read | 0.1x (subsequent) |
| Output | 1.0x |

### Example Calculation

For a 10-chunk document with a 500-token glossary:

**Without caching:**
```
10 chunks × 500 glossary tokens = 5,000 tokens
```

**With caching:**
```
First chunk: 500 × 1.25 = 625 tokens (cache write)
9 chunks: 500 × 0.1 × 9 = 450 tokens (cache read)
Total: 1,075 tokens (78% savings)
```

## Requirements

### Minimum Token Thresholds

Prompt caching requires a minimum cacheable prompt length:

| Model | Minimum Tokens |
|-------|---------------|
| Claude Haiku 4.5 | 4,096 |
| Claude Haiku 3.5 | 2,048 |
| Claude Sonnet | 1,024 |
| Claude Opus | 1,024 |

Content below these thresholds won't be cached.
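
To gauge whether your glossary and system prompt clear the threshold, a rough character-count heuristic is usually enough (roughly 4 characters per token for English text; this is an approximation, not the tokenizer the API uses):

```bash
# Approximate token count of a glossary file (~4 chars per token)
echo $(( $(wc -c < glossary.json) / 4 ))
```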

### Provider Support

| Provider | Caching Support |
|----------|-----------------|
| Claude | ✅ Full support |
| OpenAI | ✅ Automatic |
| Ollama | ❌ Not available |

## Configuration

Caching is enabled by default for Claude. To disable:

```bash
llm-translate file doc.md -o doc.ko.md --target ko --no-cache
```

Or in config:

```json
{
  "provider": {
    "name": "claude",
    "caching": false
  }
}
```

## Monitoring Cache Performance

### CLI Output

```
✓ Translation complete
  Cache: 890 read / 234 written (78% hit rate)
```

### Verbose Mode

```bash
llm-translate file doc.md -o doc.ko.md --target ko --verbose
```

Shows per-chunk cache statistics:

```
[Chunk 1/10] Cache: 0 read / 890 written
[Chunk 2/10] Cache: 890 read / 0 written
[Chunk 3/10] Cache: 890 read / 0 written
...
```

### Programmatic Access

```typescript
const result = await engine.translateFile({
  input: 'doc.md',
  output: 'doc.ko.md',
  targetLang: 'ko',
});

console.log(result.metadata.tokensUsed);
// {
//   input: 5000,
//   output: 6000,
//   cacheRead: 8000,
//   cacheWrite: 1000
// }
```

## Maximizing Cache Efficiency

### 1. Use Consistent Glossaries

The same glossary content produces the same cache key, so reuse one glossary wherever possible:

```bash
# Good: Same glossary for all files
llm-translate dir ./docs ./docs-ko --target ko --glossary glossary.json

# Less efficient: Different glossary per file
llm-translate file a.md --glossary a-glossary.json
llm-translate file b.md --glossary b-glossary.json
```

### 2. Batch Process Related Files

Cache persists for ~5 minutes. Process files together:

```bash
# Efficient: Sequential processing shares cache
llm-translate dir ./docs ./docs-ko --target ko
```

### 3. Order Files by Size

Start with larger files to warm the cache:

```bash
# Cache is populated by first file, reused by rest
llm-translate file large-doc.md ...
llm-translate file small-doc.md ...
```

### 4. Use Larger Glossaries Strategically

Larger glossaries benefit more from caching:

| Glossary Size | Cache Savings |
|---------------|---------------|
| 100 tokens | ~70% |
| 500 tokens | ~78% |
| 1000+ tokens | ~80%+ |

## Troubleshooting

### Cache Not Working

**Symptoms:** No `cacheRead` tokens reported

**Causes:**
1. Content below minimum threshold
2. Content changed between requests
3. Cache TTL expired (5 minutes)

**Solutions:**
- Ensure glossary + system prompt exceed the minimum token threshold
- Process files in quick succession
- Use verbose mode to debug

### High Cache Write Costs

**Symptoms:** More `cacheWrite` tokens than expected

**Causes:**
1. Many unique glossaries
2. Files processed too far apart
3. Cache invalidation between runs

**Solutions:**
- Consolidate glossaries
- Use batch processing
- Process within 5-minute windows