@llm-translate/cli 1.0.0-next.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.dockerignore +51 -0
- package/.env.example +33 -0
- package/.github/workflows/docs-pages.yml +57 -0
- package/.github/workflows/release.yml +49 -0
- package/.translaterc.json +44 -0
- package/CLAUDE.md +243 -0
- package/Dockerfile +55 -0
- package/README.md +371 -0
- package/RFC.md +1595 -0
- package/dist/cli/index.d.ts +2 -0
- package/dist/cli/index.js +4494 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/index.d.ts +1152 -0
- package/dist/index.js +3841 -0
- package/dist/index.js.map +1 -0
- package/docker-compose.yml +56 -0
- package/docs/.vitepress/config.ts +161 -0
- package/docs/api/agent.md +262 -0
- package/docs/api/engine.md +274 -0
- package/docs/api/index.md +171 -0
- package/docs/api/providers.md +304 -0
- package/docs/changelog.md +64 -0
- package/docs/cli/dir.md +243 -0
- package/docs/cli/file.md +213 -0
- package/docs/cli/glossary.md +273 -0
- package/docs/cli/index.md +129 -0
- package/docs/cli/init.md +158 -0
- package/docs/cli/serve.md +211 -0
- package/docs/glossary.json +235 -0
- package/docs/guide/chunking.md +272 -0
- package/docs/guide/configuration.md +139 -0
- package/docs/guide/cost-optimization.md +237 -0
- package/docs/guide/docker.md +371 -0
- package/docs/guide/getting-started.md +150 -0
- package/docs/guide/glossary.md +241 -0
- package/docs/guide/index.md +86 -0
- package/docs/guide/ollama.md +515 -0
- package/docs/guide/prompt-caching.md +221 -0
- package/docs/guide/providers.md +232 -0
- package/docs/guide/quality-control.md +206 -0
- package/docs/guide/vitepress-integration.md +265 -0
- package/docs/index.md +63 -0
- package/docs/ja/api/agent.md +262 -0
- package/docs/ja/api/engine.md +274 -0
- package/docs/ja/api/index.md +171 -0
- package/docs/ja/api/providers.md +304 -0
- package/docs/ja/changelog.md +64 -0
- package/docs/ja/cli/dir.md +243 -0
- package/docs/ja/cli/file.md +213 -0
- package/docs/ja/cli/glossary.md +273 -0
- package/docs/ja/cli/index.md +111 -0
- package/docs/ja/cli/init.md +158 -0
- package/docs/ja/guide/chunking.md +271 -0
- package/docs/ja/guide/configuration.md +139 -0
- package/docs/ja/guide/cost-optimization.md +30 -0
- package/docs/ja/guide/getting-started.md +150 -0
- package/docs/ja/guide/glossary.md +214 -0
- package/docs/ja/guide/index.md +32 -0
- package/docs/ja/guide/ollama.md +410 -0
- package/docs/ja/guide/prompt-caching.md +221 -0
- package/docs/ja/guide/providers.md +232 -0
- package/docs/ja/guide/quality-control.md +137 -0
- package/docs/ja/guide/vitepress-integration.md +265 -0
- package/docs/ja/index.md +58 -0
- package/docs/ko/api/agent.md +262 -0
- package/docs/ko/api/engine.md +274 -0
- package/docs/ko/api/index.md +171 -0
- package/docs/ko/api/providers.md +304 -0
- package/docs/ko/changelog.md +64 -0
- package/docs/ko/cli/dir.md +243 -0
- package/docs/ko/cli/file.md +213 -0
- package/docs/ko/cli/glossary.md +273 -0
- package/docs/ko/cli/index.md +111 -0
- package/docs/ko/cli/init.md +158 -0
- package/docs/ko/guide/chunking.md +271 -0
- package/docs/ko/guide/configuration.md +139 -0
- package/docs/ko/guide/cost-optimization.md +30 -0
- package/docs/ko/guide/getting-started.md +150 -0
- package/docs/ko/guide/glossary.md +214 -0
- package/docs/ko/guide/index.md +32 -0
- package/docs/ko/guide/ollama.md +410 -0
- package/docs/ko/guide/prompt-caching.md +221 -0
- package/docs/ko/guide/providers.md +232 -0
- package/docs/ko/guide/quality-control.md +137 -0
- package/docs/ko/guide/vitepress-integration.md +265 -0
- package/docs/ko/index.md +58 -0
- package/docs/zh/api/agent.md +262 -0
- package/docs/zh/api/engine.md +274 -0
- package/docs/zh/api/index.md +171 -0
- package/docs/zh/api/providers.md +304 -0
- package/docs/zh/changelog.md +64 -0
- package/docs/zh/cli/dir.md +243 -0
- package/docs/zh/cli/file.md +213 -0
- package/docs/zh/cli/glossary.md +273 -0
- package/docs/zh/cli/index.md +111 -0
- package/docs/zh/cli/init.md +158 -0
- package/docs/zh/guide/chunking.md +271 -0
- package/docs/zh/guide/configuration.md +139 -0
- package/docs/zh/guide/cost-optimization.md +30 -0
- package/docs/zh/guide/getting-started.md +150 -0
- package/docs/zh/guide/glossary.md +214 -0
- package/docs/zh/guide/index.md +32 -0
- package/docs/zh/guide/ollama.md +410 -0
- package/docs/zh/guide/prompt-caching.md +221 -0
- package/docs/zh/guide/providers.md +232 -0
- package/docs/zh/guide/quality-control.md +137 -0
- package/docs/zh/guide/vitepress-integration.md +265 -0
- package/docs/zh/index.md +58 -0
- package/package.json +91 -0
- package/release.config.mjs +15 -0
- package/schemas/glossary.schema.json +110 -0
- package/src/cli/commands/dir.ts +469 -0
- package/src/cli/commands/file.ts +291 -0
- package/src/cli/commands/glossary.ts +221 -0
- package/src/cli/commands/init.ts +68 -0
- package/src/cli/commands/serve.ts +60 -0
- package/src/cli/index.ts +64 -0
- package/src/cli/options.ts +59 -0
- package/src/core/agent.ts +1119 -0
- package/src/core/chunker.ts +391 -0
- package/src/core/engine.ts +634 -0
- package/src/errors.ts +188 -0
- package/src/index.ts +147 -0
- package/src/integrations/vitepress.ts +549 -0
- package/src/parsers/markdown.ts +383 -0
- package/src/providers/claude.ts +259 -0
- package/src/providers/interface.ts +109 -0
- package/src/providers/ollama.ts +379 -0
- package/src/providers/openai.ts +308 -0
- package/src/providers/registry.ts +153 -0
- package/src/server/index.ts +152 -0
- package/src/server/middleware/auth.ts +93 -0
- package/src/server/middleware/logger.ts +90 -0
- package/src/server/routes/health.ts +84 -0
- package/src/server/routes/translate.ts +210 -0
- package/src/server/types.ts +138 -0
- package/src/services/cache.ts +899 -0
- package/src/services/config.ts +217 -0
- package/src/services/glossary.ts +247 -0
- package/src/types/analysis.ts +164 -0
- package/src/types/index.ts +265 -0
- package/src/types/modes.ts +121 -0
- package/src/types/mqm.ts +157 -0
- package/src/utils/logger.ts +141 -0
- package/src/utils/tokens.ts +116 -0
- package/tests/fixtures/glossaries/ml-glossary.json +53 -0
- package/tests/fixtures/input/lynq-installation.ko.md +350 -0
- package/tests/fixtures/input/lynq-installation.md +350 -0
- package/tests/fixtures/input/simple.ko.md +27 -0
- package/tests/fixtures/input/simple.md +27 -0
- package/tests/unit/chunker.test.ts +229 -0
- package/tests/unit/glossary.test.ts +146 -0
- package/tests/unit/markdown.test.ts +205 -0
- package/tests/unit/tokens.test.ts +81 -0
- package/tsconfig.json +28 -0
- package/tsup.config.ts +34 -0
- package/vitest.config.ts +16 -0
@@ -0,0 +1,410 @@
+# Local Translation with Ollama
+
+::: info Translation Note
+All non-English documentation is translated automatically with Claude Sonnet 4.
+:::
+
+Run llm-translate fully offline with Ollama. No API keys required, and sensitive documents stay completely private.
+
+::: warning Quality Varies by Model
+Ollama translation quality **depends heavily on model choice**. For reliable results:
+
+- **Minimum**: 14B+ parameter models (e.g. `qwen2.5:14b`, `llama3.1:8b`)
+- **Recommended**: 32B+ models (e.g. `qwen2.5:32b`, `llama3.3:70b`)
+- **Not recommended**: models below 7B produce inconsistent and often unusable translations
+
+Smaller models (3B, 7B) may work for simple content, but they frequently fail on technical documentation, produce incomplete output, or ignore formatting instructions.
+:::
+
+## Why Ollama?
+
+- **Privacy**: documents never leave your machine
+- **No API costs**: unlimited translation after the initial setup
+- **Works offline**: no internet connection required
+- **Customizable**: fine-tune models for your domain
+
+## System Requirements
+
+### Minimum (14B models)
+
+- **RAM**: 16GB (for 14B models such as qwen2.5:14b)
+- **Storage**: 20GB free space
+- **CPU**: modern multi-core processor
+
+### Recommended
+
+- **RAM**: 32GB+ (for larger models such as qwen2.5:32b)
+- **GPU**: NVIDIA with 16GB+ VRAM, or Apple Silicon (M2/M3/M4)
+- **Storage**: 100GB+ for multiple models
+
+### GPU Support
+
+| Platform | GPU | Support |
+|----------|-----|---------|
+| macOS | Apple Silicon (M1/M2/M3/M4) | Excellent |
+| Linux | NVIDIA (CUDA) | Excellent |
+| Linux | AMD (ROCm) | Good |
+| Windows | NVIDIA (CUDA) | Good |
+| Windows | AMD | Limited |
+
+## Installation
+
+### macOS
+
+```bash
+# Using Homebrew (recommended)
+brew install ollama
+
+# Or download from https://ollama.ai
+```
+
+### Linux
+
+```bash
+# One-line installer (Ubuntu/Debian and most distributions)
+curl -fsSL https://ollama.ai/install.sh | sh
+
+# Arch Linux
+yay -S ollama
+```
+
+### Windows
+
+Download the installer from [ollama.ai](https://ollama.ai/download/windows).
+
+### Docker
+
+```bash
+# CPU only
+docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+
+# With NVIDIA GPU
+docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```
+
+## Quick Start
+
+### 1. Start the Ollama Server
+
+```bash
+# Start the server
+ollama serve
+```
+
+::: tip
+On macOS and Windows, Ollama starts automatically as a background service after installation.
+:::
+
+### 2. Download a Model
+
+```bash
+# Recommended: Qwen 2.5 14B (best multilingual support for local use)
+ollama pull qwen2.5:14b
+
+# Alternative: Llama 3.2 (lighter; good for English-centric docs)
+ollama pull llama3.2
+```
+
+### 3. Translate
+
+```bash
+# Basic translation with qwen2.5:14b
+llm-translate file README.md -o README.ko.md -s en -t ko --provider ollama --model qwen2.5:14b
+
+# Another language pair with the same model
+llm-translate file doc.md -s en -t ja --provider ollama --model qwen2.5:14b
+```
+
+::: tip Qwen 2.5 for Translation
+Qwen 2.5 supports 29 languages, including Korean, Japanese, Chinese, and all major European languages. The 14B variant delivers excellent translation quality while running in 16GB of RAM.
+:::
+
+## Recommended Translation Models
+
+### Best Quality (32B+)
+
+| Model | Size | VRAM | Languages | Quality |
+|-------|------|------|-----------|---------|
+| `llama3.3` | 70B | 40GB+ | 100+ | Excellent |
+| `qwen2.5:32b` | 32B | 20GB+ | 29 | Excellent |
+| `llama3.1:70b` | 70B | 40GB+ | 8 | Very good |
+
+### Lightweight with the Best Language Coverage
+
+For resource-constrained systems, **Qwen2.5** offers the broadest multilingual support (29 languages).
+
+| Model | Params | RAM | Languages | Quality | Best for |
+|-------|--------|-----|-----------|---------|----------|
+| `qwen2.5:3b` | 3B | 3GB | 29 | Good | **Balanced (recommended)** |
+| `qwen2.5:7b` | 7B | 6GB | 29 | Very good | Quality first |
+| `gemma3:4b` | 4B | 4GB | Many | Good | Translation-tuned |
+| `llama3.2` | 3B | 4GB | 8 | Good | English-centric docs |
+
+### Ultra-Lightweight (< 2GB RAM)
+
+| Model | Params | RAM | Languages | Quality |
+|-------|--------|-----|-----------|---------|
+| `qwen2.5:1.5b` | 1.5B | 2GB | 29 | Basic |
+| `qwen2.5:0.5b` | 0.5B | 1GB | 29 | Basic |
+| `gemma3:1b` | 1B | 1.5GB | Many | Basic |
+| `llama3.2:1b` | 1B | 2GB | 8 | Basic |
+
+::: tip Qwen's Multilingual Coverage
+Qwen2.5 supports 29 languages, including Korean, Japanese, Chinese, and all major European languages. For non-English translation work, Qwen is usually the best lightweight choice.
+:::
+
+### Downloading Models
+
+```bash
+# List installed models
+ollama list
+
+# Recommended for translation (14B+)
+ollama pull qwen2.5:14b     # Best multilingual (29 languages)
+ollama pull qwen2.5:32b     # Higher quality, needs 32GB RAM
+ollama pull llama3.1:8b     # Good quality, lighter
+
+# Lightweight options (may have quality issues)
+ollama pull qwen2.5:7b      # Better quality than 3B
+ollama pull llama3.2        # Good for English-centric docs
+
+# Other options
+ollama pull mistral-nemo
+```
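As a rough rule of thumb, the model tiers above can be collapsed into a single lookup by available memory. The helper below is hypothetical (it is not part of the llm-translate API); the thresholds come from the RAM columns in the tables above.

```typescript
// Hypothetical helper: pick an Ollama model tag from the tiers documented
// above, given available RAM in GB. Not part of llm-translate itself.
function pickModel(ramGb: number): string {
  if (ramGb >= 32) return "qwen2.5:32b"; // best-quality tier
  if (ramGb >= 16) return "qwen2.5:14b"; // recommended minimum for reliable output
  if (ramGb >= 6) return "qwen2.5:7b";   // lightweight, quality first
  if (ramGb >= 3) return "qwen2.5:3b";   // balanced lightweight option
  return "qwen2.5:0.5b";                 // ultra-lightweight fallback
}

console.log(pickModel(16)); // → qwen2.5:14b
```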
+
+## Configuration
+
+### Environment Variables
+
+```bash
+# Default server URL (optional; this is the default)
+export OLLAMA_BASE_URL=http://localhost:11434
+
+# Custom server location
+export OLLAMA_BASE_URL=http://192.168.1.100:11434
+```
+
+### Configuration File
+
+```json
+{
+  "provider": {
+    "name": "ollama",
+    "model": "qwen2.5:14b",
+    "baseUrl": "http://localhost:11434"
+  },
+  "translation": {
+    "qualityThreshold": 75,
+    "maxIterations": 3
+  }
+}
+```
+
+::: tip
+For local models, a lower `qualityThreshold` (75) is recommended to avoid excessive refinement iterations. Use 14B+ models for reliable results.
+:::
+
+### Model-Specific Settings
+
+For different document types:
+
+```bash
+# Best quality - qwen2.5:14b (recommended for most use cases)
+llm-translate file api-spec.md -s en -t ko \
+  --provider ollama \
+  --model qwen2.5:14b \
+  --quality 75
+
+# Higher quality with a 32B model (requires 32GB RAM)
+llm-translate file legal-doc.md -s en -t ko \
+  --provider ollama \
+  --model qwen2.5:32b \
+  --quality 80
+
+# README files - lighter model for simple content
+llm-translate file README.md -s en -t ko \
+  --provider ollama \
+  --model llama3.2 \
+  --quality 70
+
+# Large documentation sets - balance speed and quality
+llm-translate dir ./docs ./docs-ko -s en -t ko \
+  --provider ollama \
+  --model qwen2.5:14b \
+  --parallel 2
+```
+
+## Performance Optimization
+
+### GPU Acceleration
+
+#### NVIDIA (Linux/Windows)
+
+```bash
+# Check CUDA availability
+nvidia-smi
+
+# Ollama automatically uses CUDA if available
+ollama serve
+```
+
+#### Apple Silicon (macOS)
+
+Metal acceleration is enabled automatically on M1/M2/M3/M4 Macs.
+
+```bash
+# Check GPU usage
+sudo powermetrics --samplers gpu_power
+```
+
+### Memory Management
+
+```bash
+# Pin Ollama to a single GPU (Linux with NVIDIA)
+CUDA_VISIBLE_DEVICES=0 ollama serve
+
+# Limit concurrent requests
+OLLAMA_NUM_PARALLEL=2 ollama serve
+```
+
+### Optimizing Large Documents
+
+```bash
+# Reduce chunk size for memory-constrained systems
+llm-translate file large-doc.md --target ko \
+  --provider ollama \
+  --chunk-size 512
+
+# Disable caching to reduce memory usage
+llm-translate file doc.md --target ko \
+  --provider ollama \
+  --no-cache
+
+# Single-threaded processing for stability
+llm-translate dir ./docs ./docs-ko --target ko \
+  --provider ollama \
+  --parallel 1
+```
+
+## Remote Ollama Servers
+
+### Server Setup
+
+On the server machine:
+
+```bash
+# Allow external connections
+OLLAMA_HOST=0.0.0.0 ollama serve
+```
+
+::: warning Security
+Only expose Ollama on trusted networks. Consider a VPN or SSH tunnel for remote access.
+:::
+
+### SSH Tunnel (Recommended)
+
+```bash
+# Create a secure tunnel to the remote server
+ssh -L 11434:localhost:11434 user@remote-server
+
+# Then use as normal
+llm-translate file doc.md --target ko --provider ollama
+```
+
+### Direct Connection
+
+```bash
+# Point at the remote server
+export OLLAMA_BASE_URL=http://remote-server:11434
+
+llm-translate file doc.md --target ko --provider ollama
+```
+
+### Docker Compose for a Team Server
+
+```yaml
+# docker-compose.yml
+version: '3.8'
+services:
+  ollama:
+    image: ollama/ollama
+    ports:
+      - "11434:11434"
+    volumes:
+      - ollama_data:/root/.ollama
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    restart: unless-stopped
+
+volumes:
+  ollama_data:
+```
+
+## Troubleshooting
+
+### Connection Errors
+
+```
+Error: Cannot connect to Ollama server at http://localhost:11434
+```
+
+**Solutions:**
+
+```bash
+# Check whether Ollama is running
+curl http://localhost:11434/api/tags
+
+# Start the server
+ollama serve
+
+# Check for port conflicts
+lsof -i :11434
+```
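The same connectivity check can be scripted. The sketch below uses Ollama's public REST endpoint `GET /api/tags`, which returns the locally installed models as `{ models: [{ name: string, ... }] }`; the helper names are illustrative and not part of llm-translate. It assumes Node 18+ (global `fetch`).

```typescript
// Health-check sketch against Ollama's REST API (illustrative helpers,
// not part of llm-translate). GET /api/tags lists installed models.
const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";

function modelNames(tags: { models: { name: string }[] }): string[] {
  return tags.models.map((m) => m.name);
}

async function checkOllama(): Promise<string[]> {
  const res = await fetch(`${OLLAMA_BASE_URL}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama server not reachable at ${OLLAMA_BASE_URL}`);
  }
  return modelNames(await res.json());
}
```

Calling `checkOllama()` before a long batch run fails fast with a clear error instead of partway through a directory translation.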
+
+### Model Not Found
+
+```
+Error: Model "llama3.2" not found. Pull it with: ollama pull llama3.2
+```
+
+**Solutions:**
+
+```bash
+# Download the model
+ollama pull llama3.2
+
+# Verify installation
+ollama list
+```
+
+### Out of Memory
+
+```
+Error: Out of memory. Try a smaller model or reduce chunk size.
+```
+
+**Solutions:**
+
+```bash
+# Use a smaller model
+ollama pull llama3.2:1b
+llm-translate file doc.md --target ko --provider ollama --model llama3.2:1b
+
+# Reduce chunk size
+llm-translate file doc.md --target ko --provider ollama --chunk-size 256
+
+# Close other applications to free RAM
+```
+
+### Slow Performance
+
+**Solutions:**
@@ -0,0 +1,221 @@
+# Prompt Caching
+
+::: info Translation Note
+All non-English documentation is translated automatically with Claude Sonnet 4.
+:::
+
+Prompt caching is a cost-optimization feature that reduces API costs by up to 90% on repeated content.
+
+## How It Works
+
+When translating a document, some parts of the prompt never change:
+
+- **System instructions**: translation rules and guidelines
+- **Glossary**: domain-specific terminology
+
+These parts are cached and reused across chunks, yielding significant savings.
+
+```
+Request 1 (First Chunk):
+┌─────────────────────────────────┐
+│ System Instructions (CACHED)    │ ◀─ Written to cache
+├─────────────────────────────────┤
+│ Glossary (CACHED)               │ ◀─ Written to cache
+├─────────────────────────────────┤
+│ Source Text (NOT cached)        │
+└─────────────────────────────────┘
+
+Request 2+ (Subsequent Chunks):
+┌─────────────────────────────────┐
+│ System Instructions (CACHED)    │ ◀─ Read from cache (90% off)
+├─────────────────────────────────┤
+│ Glossary (CACHED)               │ ◀─ Read from cache (90% off)
+├─────────────────────────────────┤
+│ Source Text (NOT cached)        │
+└─────────────────────────────────┘
+```
+
+## Cost Impact
+
+### Pricing (Claude)
+
+| Token type | Cost multiplier |
+|------------|-----------------|
+| Regular input | 1.0x |
+| Cache write | 1.25x (first use) |
+| Cache read | 0.1x (subsequent uses) |
+| Output | 1.0x |
+
+### Example Calculation
+
+For a 10-chunk document with a 500-token glossary:
+
+**Without caching:**
+```
+10 chunks × 500 glossary tokens = 5,000 tokens
+```
+
+**With caching:**
+```
+First chunk: 500 × 1.25    = 625 tokens (cache write)
+9 chunks:    500 × 0.1 × 9 = 450 tokens (cache read)
+Total:       1,075 tokens (78% savings)
+```
+
+## Requirements
+
+### Minimum Token Thresholds
+
+Prompt caching requires a minimum content length:
+
+| Model | Minimum tokens |
+|-------|----------------|
+| Claude Haiku 4.5 | 4,096 |
+| Claude Haiku 3.5 | 2,048 |
+| Claude Sonnet | 1,024 |
+| Claude Opus | 1,024 |
+
+Content below these thresholds is not cached.
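The threshold table can be expressed as a lookup. This is a hypothetical helper whose keys and values simply mirror the table above; the model identifiers are illustrative, not official API model IDs.

```typescript
// Hypothetical helper mirroring the threshold table above: does a cacheable
// prompt prefix of `tokens` length qualify for caching on a given model?
const MIN_CACHE_TOKENS: Record<string, number> = {
  "claude-haiku-4.5": 4096,
  "claude-haiku-3.5": 2048,
  "claude-sonnet": 1024,
  "claude-opus": 1024,
};

function isCacheable(model: string, tokens: number): boolean {
  const min = MIN_CACHE_TOKENS[model];
  return min !== undefined && tokens >= min;
}

console.log(isCacheable("claude-sonnet", 1500));   // → true
console.log(isCacheable("claude-haiku-4.5", 1500)); // → false
```

In practice this means a small glossary that caches fine on Sonnet may silently fall below the threshold on Haiku.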
+
+### Provider Support
+
+| Provider | Caching support |
+|----------|-----------------|
+| Claude | ✅ Full support |
+| OpenAI | ✅ Automatic |
+| Ollama | ❌ Not available |
+
+## Configuration
+
+Caching is enabled by default with Claude. To disable it:
+
+```bash
+llm-translate file doc.md -o doc.ko.md --target ko --no-cache
+```
+
+Or in the configuration file:
+
+```json
+{
+  "provider": {
+    "name": "claude",
+    "caching": false
+  }
+}
+```
+
+## Monitoring Cache Performance
+
+### CLI Output
+
+```
+✓ Translation complete
+  Cache: 890 read / 234 written (78% hit rate)
+```
+
+### Verbose Mode
+
+```bash
+llm-translate file doc.md -o doc.ko.md --target ko --verbose
+```
+
+Shows per-chunk cache statistics:
+
+```
+[Chunk 1/10] Cache: 0 read / 890 written
+[Chunk 2/10] Cache: 890 read / 0 written
+[Chunk 3/10] Cache: 890 read / 0 written
+...
+```
+
+### Programmatic Access
+
+```typescript
+const result = await engine.translateFile({
+  input: 'doc.md',
+  output: 'doc.ko.md',
+  targetLang: 'ko',
+});
+
+console.log(result.metadata.tokensUsed);
+// {
+//   input: 5000,
+//   output: 6000,
+//   cacheRead: 8000,
+//   cacheWrite: 1000
+// }
+```
+
+## Maximizing Cache Efficiency
+
+### 1. Use a Consistent Glossary
+
+Identical glossary content = identical cache key.
+
+```bash
+# Good: same glossary for all files
+llm-translate dir ./docs ./docs-ko --target ko --glossary glossary.json
+
+# Less efficient: a different glossary per file
+llm-translate file a.md --glossary a-glossary.json
+llm-translate file b.md --glossary b-glossary.json
+```
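One way to picture why the glossary must be byte-identical: a prompt cache keys on the exact content of the stable prefix, so any change produces a new key and a fresh cache write. The sketch below is illustrative only; llm-translate's actual cache implementation (`src/services/cache.ts`) may differ.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: a content-addressed cache key over the stable prompt
// prefix. Any byte-level change to the glossary yields a different key.
function cacheKey(systemPrompt: string, glossary: string): string {
  return createHash("sha256")
    .update(systemPrompt + "\u0000" + glossary)
    .digest("hex");
}

const a = cacheKey("Translate en to ko.", '{"token":"토큰"}');
const b = cacheKey("Translate en to ko.", '{"token":"토큰"}');
const c = cacheKey("Translate en to ko.", '{"token":"TOKEN"}');
console.log(a === b); // → true  (same glossary, key reused)
console.log(a === c); // → false (changed glossary, new cache write)
```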
+
+### 2. Batch Related Files
+
+The cache lasts about 5 minutes. Process files together:
+
+```bash
+# Efficient: sequential processing shares the cache
+llm-translate dir ./docs ./docs-ko --target ko
+```
+
+### 3. Order Files by Size
+
+Warm the cache with the larger files first:
+
+```bash
+# The cache is populated by the first file and reused by the rest
+llm-translate file large-doc.md ...
+llm-translate file small-doc.md ...
+```
+
+### 4. Use Larger Glossaries Strategically
+
+Larger glossaries benefit more from caching:
+
+| Glossary size | Cache savings |
+|---------------|---------------|
+| 100 tokens | ~70% |
+| 500 tokens | ~78% |
+| 1000+ tokens | ~80%+ |
+
+## Troubleshooting
+
+### Caching Not Working
+
+**Symptom:** no `cacheRead` tokens reported
+
+**Causes:**
+1. Content is below the minimum threshold
+2. Content changed between requests
+3. The cache TTL (5 minutes) expired
+
+**Solutions:**
+- Make sure glossary + system prompt exceed the minimum token count
+- Process files in quick succession
+- Debug with verbose mode
+
+### High Cache-Write Costs
+
+**Symptom:** `cacheWrite` is higher than expected
+
+**Causes:**
+1. Many unique glossaries
+2. Files processed too far apart
+3. Cache invalidated between runs
+
+**Solutions:**
+- Consolidate glossaries
+- Use batch processing
+- Process within the 5-minute window