@llm-translate/cli 1.0.0-next.1
- package/.dockerignore +51 -0
- package/.env.example +33 -0
- package/.github/workflows/docs-pages.yml +57 -0
- package/.github/workflows/release.yml +49 -0
- package/.translaterc.json +44 -0
- package/CLAUDE.md +243 -0
- package/Dockerfile +55 -0
- package/README.md +371 -0
- package/RFC.md +1595 -0
- package/dist/cli/index.d.ts +2 -0
- package/dist/cli/index.js +4494 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/index.d.ts +1152 -0
- package/dist/index.js +3841 -0
- package/dist/index.js.map +1 -0
- package/docker-compose.yml +56 -0
- package/docs/.vitepress/config.ts +161 -0
- package/docs/api/agent.md +262 -0
- package/docs/api/engine.md +274 -0
- package/docs/api/index.md +171 -0
- package/docs/api/providers.md +304 -0
- package/docs/changelog.md +64 -0
- package/docs/cli/dir.md +243 -0
- package/docs/cli/file.md +213 -0
- package/docs/cli/glossary.md +273 -0
- package/docs/cli/index.md +129 -0
- package/docs/cli/init.md +158 -0
- package/docs/cli/serve.md +211 -0
- package/docs/glossary.json +235 -0
- package/docs/guide/chunking.md +272 -0
- package/docs/guide/configuration.md +139 -0
- package/docs/guide/cost-optimization.md +237 -0
- package/docs/guide/docker.md +371 -0
- package/docs/guide/getting-started.md +150 -0
- package/docs/guide/glossary.md +241 -0
- package/docs/guide/index.md +86 -0
- package/docs/guide/ollama.md +515 -0
- package/docs/guide/prompt-caching.md +221 -0
- package/docs/guide/providers.md +232 -0
- package/docs/guide/quality-control.md +206 -0
- package/docs/guide/vitepress-integration.md +265 -0
- package/docs/index.md +63 -0
- package/docs/ja/api/agent.md +262 -0
- package/docs/ja/api/engine.md +274 -0
- package/docs/ja/api/index.md +171 -0
- package/docs/ja/api/providers.md +304 -0
- package/docs/ja/changelog.md +64 -0
- package/docs/ja/cli/dir.md +243 -0
- package/docs/ja/cli/file.md +213 -0
- package/docs/ja/cli/glossary.md +273 -0
- package/docs/ja/cli/index.md +111 -0
- package/docs/ja/cli/init.md +158 -0
- package/docs/ja/guide/chunking.md +271 -0
- package/docs/ja/guide/configuration.md +139 -0
- package/docs/ja/guide/cost-optimization.md +30 -0
- package/docs/ja/guide/getting-started.md +150 -0
- package/docs/ja/guide/glossary.md +214 -0
- package/docs/ja/guide/index.md +32 -0
- package/docs/ja/guide/ollama.md +410 -0
- package/docs/ja/guide/prompt-caching.md +221 -0
- package/docs/ja/guide/providers.md +232 -0
- package/docs/ja/guide/quality-control.md +137 -0
- package/docs/ja/guide/vitepress-integration.md +265 -0
- package/docs/ja/index.md +58 -0
- package/docs/ko/api/agent.md +262 -0
- package/docs/ko/api/engine.md +274 -0
- package/docs/ko/api/index.md +171 -0
- package/docs/ko/api/providers.md +304 -0
- package/docs/ko/changelog.md +64 -0
- package/docs/ko/cli/dir.md +243 -0
- package/docs/ko/cli/file.md +213 -0
- package/docs/ko/cli/glossary.md +273 -0
- package/docs/ko/cli/index.md +111 -0
- package/docs/ko/cli/init.md +158 -0
- package/docs/ko/guide/chunking.md +271 -0
- package/docs/ko/guide/configuration.md +139 -0
- package/docs/ko/guide/cost-optimization.md +30 -0
- package/docs/ko/guide/getting-started.md +150 -0
- package/docs/ko/guide/glossary.md +214 -0
- package/docs/ko/guide/index.md +32 -0
- package/docs/ko/guide/ollama.md +410 -0
- package/docs/ko/guide/prompt-caching.md +221 -0
- package/docs/ko/guide/providers.md +232 -0
- package/docs/ko/guide/quality-control.md +137 -0
- package/docs/ko/guide/vitepress-integration.md +265 -0
- package/docs/ko/index.md +58 -0
- package/docs/zh/api/agent.md +262 -0
- package/docs/zh/api/engine.md +274 -0
- package/docs/zh/api/index.md +171 -0
- package/docs/zh/api/providers.md +304 -0
- package/docs/zh/changelog.md +64 -0
- package/docs/zh/cli/dir.md +243 -0
- package/docs/zh/cli/file.md +213 -0
- package/docs/zh/cli/glossary.md +273 -0
- package/docs/zh/cli/index.md +111 -0
- package/docs/zh/cli/init.md +158 -0
- package/docs/zh/guide/chunking.md +271 -0
- package/docs/zh/guide/configuration.md +139 -0
- package/docs/zh/guide/cost-optimization.md +30 -0
- package/docs/zh/guide/getting-started.md +150 -0
- package/docs/zh/guide/glossary.md +214 -0
- package/docs/zh/guide/index.md +32 -0
- package/docs/zh/guide/ollama.md +410 -0
- package/docs/zh/guide/prompt-caching.md +221 -0
- package/docs/zh/guide/providers.md +232 -0
- package/docs/zh/guide/quality-control.md +137 -0
- package/docs/zh/guide/vitepress-integration.md +265 -0
- package/docs/zh/index.md +58 -0
- package/package.json +91 -0
- package/release.config.mjs +15 -0
- package/schemas/glossary.schema.json +110 -0
- package/src/cli/commands/dir.ts +469 -0
- package/src/cli/commands/file.ts +291 -0
- package/src/cli/commands/glossary.ts +221 -0
- package/src/cli/commands/init.ts +68 -0
- package/src/cli/commands/serve.ts +60 -0
- package/src/cli/index.ts +64 -0
- package/src/cli/options.ts +59 -0
- package/src/core/agent.ts +1119 -0
- package/src/core/chunker.ts +391 -0
- package/src/core/engine.ts +634 -0
- package/src/errors.ts +188 -0
- package/src/index.ts +147 -0
- package/src/integrations/vitepress.ts +549 -0
- package/src/parsers/markdown.ts +383 -0
- package/src/providers/claude.ts +259 -0
- package/src/providers/interface.ts +109 -0
- package/src/providers/ollama.ts +379 -0
- package/src/providers/openai.ts +308 -0
- package/src/providers/registry.ts +153 -0
- package/src/server/index.ts +152 -0
- package/src/server/middleware/auth.ts +93 -0
- package/src/server/middleware/logger.ts +90 -0
- package/src/server/routes/health.ts +84 -0
- package/src/server/routes/translate.ts +210 -0
- package/src/server/types.ts +138 -0
- package/src/services/cache.ts +899 -0
- package/src/services/config.ts +217 -0
- package/src/services/glossary.ts +247 -0
- package/src/types/analysis.ts +164 -0
- package/src/types/index.ts +265 -0
- package/src/types/modes.ts +121 -0
- package/src/types/mqm.ts +157 -0
- package/src/utils/logger.ts +141 -0
- package/src/utils/tokens.ts +116 -0
- package/tests/fixtures/glossaries/ml-glossary.json +53 -0
- package/tests/fixtures/input/lynq-installation.ko.md +350 -0
- package/tests/fixtures/input/lynq-installation.md +350 -0
- package/tests/fixtures/input/simple.ko.md +27 -0
- package/tests/fixtures/input/simple.md +27 -0
- package/tests/unit/chunker.test.ts +229 -0
- package/tests/unit/glossary.test.ts +146 -0
- package/tests/unit/markdown.test.ts +205 -0
- package/tests/unit/tokens.test.ts +81 -0
- package/tsconfig.json +28 -0
- package/tsup.config.ts +34 -0
- package/vitest.config.ts +16 -0
@@ -0,0 +1,158 @@

# llm-translate init

::: info Translation Note
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

Initialize a configuration file for your project.

## Synopsis

```bash
llm-translate init [options]
```

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--provider`, `-p` | claude | Default provider |
| `--model`, `-m` | Varies | Default model |
| `--quality` | 85 | Default quality threshold |
| `--glossary` | none | Create a glossary template |
| `--force`, `-f` | false | Overwrite an existing configuration |

## Examples

### Basic Initialization

```bash
llm-translate init
```

Creates `.translaterc.json`:

```json
{
  "provider": {
    "name": "claude",
    "model": "claude-haiku-4-5-20251001"
  },
  "translation": {
    "qualityThreshold": 85,
    "maxIterations": 4
  },
  "paths": {}
}
```

### Specifying a Provider

```bash
llm-translate init --provider openai --model gpt-4o
```

### With a Glossary Template

```bash
llm-translate init --glossary
```

Also creates `glossary.json`:

```json
{
  "sourceLanguage": "en",
  "version": "1.0.0",
  "terms": [
    {
      "source": "example",
      "targets": {
        "ko": "예시"
      },
      "context": "Replace with your terms"
    }
  ]
}
```
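
The template maps each `source` term to per-language `targets`. As a minimal sketch of how such a file could be consumed, assuming exactly the shape shown in the template, a lookup might look like the following. The `Glossary`/`GlossaryTerm` types and the `lookupTerm` helper are hypothetical and not part of the `@llm-translate/cli` API; case-insensitive matching is also an assumption.

```typescript
// Hypothetical types mirroring the glossary.json template shape.
interface GlossaryTerm {
  source: string;
  targets: Record<string, string>;
  context?: string;
}

interface Glossary {
  sourceLanguage: string;
  version: string;
  terms: GlossaryTerm[];
}

// Hypothetical helper: return the target-language rendering of a source term,
// or undefined when the term or the language is missing.
function lookupTerm(glossary: Glossary, source: string, lang: string): string | undefined {
  // Case-insensitive match on the source term (an assumption, not documented behavior)
  const term = glossary.terms.find(
    (t) => t.source.toLowerCase() === source.toLowerCase(),
  );
  return term?.targets[lang];
}

const glossary: Glossary = {
  sourceLanguage: "en",
  version: "1.0.0",
  terms: [
    { source: "example", targets: { ko: "예시" }, context: "Replace with your terms" },
  ],
};

console.log(lookupTerm(glossary, "Example", "ko")); // 예시
console.log(lookupTerm(glossary, "example", "ja")); // undefined
```
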

### Custom Quality Threshold

```bash
llm-translate init --quality 95
```

## Interactive Mode

When run without options, `init` runs interactively:

```
$ llm-translate init

llm-translate Configuration Setup

? Select provider: (Use arrow keys)
❯ claude
  openai
  ollama

? Select model: (Use arrow keys)
❯ claude-haiku-4-5-20251001 (fast, cost-effective)
  claude-sonnet-4-5-20250929 (balanced)
  claude-opus-4-5-20251101 (highest quality)

? Quality threshold: (85)
? Create glossary template? (y/N)

✓ Created .translaterc.json
```

## Output Files

### .translaterc.json

```json
{
  "$schema": "https://llm-translate.dev/schema.json",
  "provider": {
    "name": "claude",
    "model": "claude-haiku-4-5-20251001"
  },
  "translation": {
    "qualityThreshold": 85,
    "maxIterations": 4,
    "preserveFormatting": true
  },
  "chunking": {
    "maxTokens": 1024,
    "overlapTokens": 150
  },
  "paths": {
    "glossary": "./glossary.json",
    "cache": "./.translate-cache"
  }
}
```

### glossary.json (with `--glossary`)

```json
{
  "$schema": "https://llm-translate.dev/glossary-schema.json",
  "sourceLanguage": "en",
  "version": "1.0.0",
  "description": "Project glossary",
  "terms": []
}
```

## Overwriting an Existing Configuration

```bash
# Will fail if config exists
llm-translate init
# Error: .translaterc.json already exists. Use --force to overwrite.

# Force overwrite
llm-translate init --force
```

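
The overwrite rule (fail when the file exists unless `--force` is given) can be sketched with Node's `fs` module. This illustrates the behavior only; `writeConfig` is a hypothetical helper, not the CLI's actual implementation. Using the `"wx"` flag makes the existence check and the write a single step, so there is no window between checking and writing.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper mirroring `init` vs `init --force`: refuse to replace
// an existing .translaterc.json unless `force` is true.
function writeConfig(dir: string, config: object, force = false): string {
  const file = join(dir, ".translaterc.json");
  const data = JSON.stringify(config, null, 2) + "\n";
  try {
    // "wx" creates the file but fails with EEXIST if it already exists;
    // "w" creates or truncates it.
    writeFileSync(file, data, { flag: force ? "w" : "wx" });
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      throw new Error(".translaterc.json already exists. Use --force to overwrite.");
    }
    throw err;
  }
  return file;
}

// Usage in a throwaway directory
const dir = mkdtempSync(join(tmpdir(), "init-demo-"));
writeConfig(dir, { provider: { name: "claude" } });
try {
  writeConfig(dir, { provider: { name: "openai" } });
} catch (err) {
  console.log((err as Error).message); // .translaterc.json already exists. Use --force to overwrite.
}
writeConfig(dir, { provider: { name: "openai" } }, true);
console.log(readFileSync(join(dir, ".translaterc.json"), "utf8"));
```
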
@@ -0,0 +1,271 @@

# Chunking Strategy

::: info Translation Note
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

Large documents are split into chunks for translation. Understanding how chunking works helps you balance quality and cost.

## Why Chunking?

LLMs have context limits and perform better on focused content:

| Reason | Description |
|--------|-------------|
| **Context limits** | Models have a maximum input size |
| **Quality** | Smaller chunks get more focused processing |
| **Cost** | Enables caching of repeated content |
| **Progress** | Supports progress tracking and resumption |

## Default Configuration

```json
{
  "chunking": {
    "maxTokens": 1024,
    "overlapTokens": 150
  }
}
```

## Chunk Size Options

### maxTokens

Maximum number of tokens per chunk (excluding prompt overhead).

| Size | Best For | Trade-off |
|------|----------|-----------|
| 512 | High-quality requirements | More API calls |
| **1024** | General use (default) | Balanced |
| 2048 | Cost optimization | May reduce quality |

### overlapTokens

Context carried over from the previous chunk keeps the translation consistent across chunk boundaries.

```
Chunk 1: [Content A          ]
Chunk 2: [overlap][Content B          ]
Chunk 3: [overlap][Content C          ]
```

::: tip Recommended Overlap
Use 10-15% of your `maxTokens` value. For 1024 tokens, 100-150 overlap tokens work well.
:::
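
To make the interplay between `maxTokens` and `overlapTokens` concrete, here is a minimal sliding-window sketch over an array of tokens. It assumes the overlap counts toward each chunk's `maxTokens` budget, so every chunk after the first adds `maxTokens - overlapTokens` new tokens (874 with the defaults). `slidingChunks` is hypothetical; the actual chunker is AST-based and splits along Markdown structure rather than at arbitrary token positions.

```typescript
// Hypothetical sketch: split a token array into windows of at most
// `maxTokens`, where each window after the first starts with the last
// `overlapTokens` tokens of the previous window.
function slidingChunks<T>(tokens: T[], maxTokens: number, overlapTokens: number): T[][] {
  if (overlapTokens >= maxTokens) {
    throw new Error("overlapTokens must be smaller than maxTokens");
  }
  const step = maxTokens - overlapTokens; // new tokens contributed by each chunk
  const chunks: T[][] = [];
  for (let start = 0; ; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Ten "tokens", windows of 4 with an overlap of 1
const tokens = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"];
for (const chunk of slidingChunks(tokens, 4, 1)) {
  console.log(chunk.join(" "));
}
// t0 t1 t2 t3
// t3 t4 t5 t6
// t6 t7 t8 t9
```
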

## Markdown-Aware Chunking

llm-translate uses AST-based chunking that respects document structure.

### Preserved Boundaries

The chunker respects the boundaries of these elements:

| Element | Behavior |
|---------|----------|
| Headings | Preserve section boundaries |
| Code blocks | Always kept intact |
| Lists | Items grouped together when possible |
| Tables | Never split across chunks |
| Paragraphs | Split at natural boundaries |

### Example

::: details Click to view a chunking example

**Input document:**

```markdown
# Introduction

This is the introduction paragraph that explains
the purpose of the document.

## Getting Started

### Prerequisites

- Node.js 20+
- npm or yarn

### Installation

    npm install @llm-translate/cli
```

**Result:**

```
Chunk 1: # Introduction + paragraph
Chunk 2: ## Getting Started + ### Prerequisites + list
Chunk 3: ### Installation + code block
```

:::
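
The core idea behind structure-aware splitting is to cut only at headings and never inside a fenced code block, then pack small neighboring sections together up to a budget. The sketch below is a simplified, line-based stand-in (it does not build an AST like the real chunker); `splitSections` and `packSections` are hypothetical helpers, and a whitespace word count substitutes for a real tokenizer. With a budget of 15 "tokens" it happens to reproduce the grouping in the example above.

```typescript
// Hypothetical, simplified sketch: line-based rather than AST-based.

// Split Markdown into sections that begin at ATX headings ("#" to "######"),
// ignoring "#" lines that sit inside fenced code blocks.
function splitSections(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    // Toggle on any backtick or tilde fence line (simplification: fence types are not matched)
    if (/^(```|~~~)/.test(line)) inFence = !inFence;
    if (!inFence && /^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections;
}

// Greedily pack consecutive sections into chunks that stay under maxTokens.
// A section larger than the budget becomes its own chunk and is never cut.
function packSections(
  sections: string[],
  maxTokens: number,
  countTokens: (s: string) => number = (s) => s.split(/\s+/).filter(Boolean).length,
): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const section of sections) {
    const candidate = current ? `${current}\n${section}` : section;
    if (current && countTokens(candidate) > maxTokens) {
      chunks.push(current);
      current = section;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc = [
  "# Introduction",
  "",
  "This is the introduction paragraph that explains",
  "the purpose of the document.",
  "",
  "## Getting Started",
  "",
  "### Prerequisites",
  "",
  "- Node.js 20+",
  "- npm or yarn",
  "",
  "### Installation",
  "",
  "```bash",
  "npm install @llm-translate/cli",
  "```",
].join("\n");

const chunks = packSections(splitSections(doc), 15);
chunks.forEach((c, i) => console.log(`Chunk ${i + 1}: ${c.split("\n")[0]}`));
// Chunk 1: # Introduction
// Chunk 2: ## Getting Started
// Chunk 3: ### Installation
```
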

## Configuration

::: code-group

```bash [CLI]
llm-translate file doc.md --target ko --chunk-size 2048
```

```json [.translaterc.json]
{
  "chunking": {
    "maxTokens": 2048,
    "overlapTokens": 200,
    "preservePatterns": [
      "```[\\s\\S]*?```",
      "\\|[^\\n]+\\|"
    ]
  }
}
```

```typescript [Programmatic]
import { readFileSync } from 'node:fs';
import { chunkContent } from '@llm-translate/cli';

// Read the Markdown source to be chunked
const content = readFileSync('doc.md', 'utf8');

const chunks = chunkContent(content, {
  maxTokens: 1024,
  overlapTokens: 150,
});
```

:::
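
The two `preservePatterns` entries are JSON-escaped regular expressions: without the string escaping they read `` ```[\s\S]*?``` `` (a lazily matched fenced block) and `\|[^\n]+\|` (a single table row). As a sketch of how such patterns could be applied, assuming each is compiled with the global flag, here is a hypothetical `findProtectedSpans` helper (not the library's API). TypeScript string literals use the same escaping as the JSON config, so the strings can be copied verbatim.

```typescript
// Hypothetical helper: compile preservePatterns-style strings and return
// the spans of `text` they match, in document order.
function findProtectedSpans(text: string, patterns: string[]): string[] {
  const spans: { index: number; match: string }[] = [];
  for (const source of patterns) {
    // The "g" flag is required by matchAll; each pattern is scanned independently.
    for (const m of text.matchAll(new RegExp(source, "g"))) {
      spans.push({ index: m.index ?? 0, match: m[0] });
    }
  }
  return spans.sort((a, b) => a.index - b.index).map((s) => s.match);
}

// Same strings as in .translaterc.json
const patterns = ["```[\\s\\S]*?```", "\\|[^\\n]+\\|"];

const text = [
  "Intro paragraph.",
  "| Option | Default |",
  "```bash",
  "llm-translate init",
  "```",
].join("\n");

for (const span of findProtectedSpans(text, patterns)) {
  console.log(JSON.stringify(span));
}
```
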

## Optimization Presets

Choose based on your priorities:

::: code-group

```json [Quality Focus]
{
  "chunking": {
    "maxTokens": 512,
    "overlapTokens": 100
  }
}
```

```json [Cost Focus]
{
  "chunking": {
    "maxTokens": 2048,
    "overlapTokens": 50
  }
}
```

```json [Long Documents]
{
  "chunking": {
    "maxTokens": 1500,
    "overlapTokens": 150
  },
  "translation": {
    "maxIterations": 3
  }
}
```

:::

::: info When to Use Each Preset
- **Quality focus**: technical documentation, legal content
- **Cost focus**: blog posts, general content
- **Long documents**: books, comprehensive guides
:::
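
To compare presets before running anything, you can roughly estimate how many chunks (and therefore API calls per refinement pass) each one produces. The estimate assumes plain sliding-window chunking, where the first chunk holds `maxTokens` tokens and each later chunk adds `maxTokens - overlapTokens` new ones. Because the real chunker also follows Markdown boundaries, treat the numbers as approximate; `estimateChunks` is a hypothetical helper.

```typescript
interface ChunkingPreset {
  maxTokens: number;
  overlapTokens: number;
}

// Hypothetical estimate: 1 chunk if the document fits, otherwise the first
// chunk plus enough (maxTokens - overlapTokens)-sized steps to cover the rest.
function estimateChunks(totalTokens: number, { maxTokens, overlapTokens }: ChunkingPreset): number {
  if (totalTokens <= maxTokens) return 1;
  const step = maxTokens - overlapTokens;
  return 1 + Math.ceil((totalTokens - maxTokens) / step);
}

const presets: Record<string, ChunkingPreset> = {
  "Quality Focus": { maxTokens: 512, overlapTokens: 100 },
  "Default": { maxTokens: 1024, overlapTokens: 150 },
  "Long Documents": { maxTokens: 1500, overlapTokens: 150 },
  "Cost Focus": { maxTokens: 2048, overlapTokens: 50 },
};

// A ~5,200-token document
for (const [name, preset] of Object.entries(presets)) {
  console.log(`${name}: ~${estimateChunks(5200, preset)} chunks`);
}
// Quality Focus: ~13 chunks
// Default: ~6 chunks
// Long Documents: ~4 chunks
// Cost Focus: ~3 chunks
```
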

## Content Protection

### Protected Content

llm-translate automatically protects certain content from translation:

| Content Type | Example | Behavior |
|--------------|---------|----------|
| Code blocks | `` ``` ... ``` `` | Never translated |
| Inline code | `` `variable` `` | Preserved |
| URLs | `https://...` | Preserved |
| File paths | `./path/to/file` | Preserved |
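
A common way to keep such content untouched is to swap it for opaque placeholders before the text goes to the model and restore it afterwards. The sketch below assumes that approach, covers only code blocks, inline code, and URLs, and uses a `__PROTECTED_n__` token format of our own; `maskProtected` and `unmask` are hypothetical, and llm-translate's internal mechanism may differ. It also assumes the source text never contains a literal placeholder.

```typescript
// Hypothetical sketch: swap protected spans for placeholders before
// translation and put them back afterwards. Order matters: fenced blocks
// are masked first so their inner backticks and URLs are not re-matched.
const PROTECTED_PATTERNS: RegExp[] = [
  /```[\s\S]*?```/g,     // fenced code blocks
  /`[^`\n]+`/g,          // inline code
  /https?:\/\/[^\s)]+/g, // URLs (simplified: stops at whitespace or ")")
];

function maskProtected(text: string): { masked: string; saved: string[] } {
  const saved: string[] = [];
  let masked = text;
  for (const pattern of PROTECTED_PATTERNS) {
    masked = masked.replace(pattern, (match) => {
      saved.push(match);
      return `__PROTECTED_${saved.length - 1}__`;
    });
  }
  return { masked, saved };
}

function unmask(text: string, saved: string[]): string {
  return text.replace(/__PROTECTED_(\d+)__/g, (placeholder: string, index: string) =>
    saved[Number(index)] ?? placeholder,
  );
}

const source = "Run `llm-translate init`, then see https://llm-translate.dev/schema.json for details.";
const { masked, saved } = maskProtected(source);
console.log(masked); // Run __PROTECTED_0__, then see __PROTECTED_1__ for details.
console.log(unmask(masked, saved) === source); // true
```
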

### Link Handling

Link URLs are preserved, but link text is translated:

```markdown
[Getting Started](./getting-started.md)
↓
[시작하기](./getting-started.md)
```
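
The rule that link text is translated while the URL is kept can be sketched with a regular expression over inline Markdown links. This assumes simple `[text](url)` links without nested brackets or spaces in the URL; `translateLinkText` is hypothetical, and the `translate` callback stands in for a model call.

```typescript
// Hypothetical sketch: rewrite only the text part of inline Markdown links.
function translateLinkText(markdown: string, translate: (text: string) => string): string {
  // [text](url): group 1 is the link text, group 2 is the destination
  return markdown.replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, (_match: string, text: string, url: string) =>
    `[${translate(text)}](${url})`,
  );
}

// A dictionary lookup stands in for the model
const ko: Record<string, string> = { "Getting Started": "시작하기" };
const result = translateLinkText("[Getting Started](./getting-started.md)", (t) => ko[t] ?? t);
console.log(result); // [시작하기](./getting-started.md)
```
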

## Debugging

### Previewing Chunks

Use `--dry-run` to see how your document will be chunked:

```bash
llm-translate file doc.md --target ko --dry-run --verbose
```

Output:

```
Document Analysis:
  Total tokens: ~5,200
  Chunks: 6
  Average chunk size: ~867 tokens

Chunk breakdown:
  [1] Lines 1-45 (Introduction) - 823 tokens
  [2] Lines 46-89 (Getting Started) - 912 tokens
  [3] Lines 90-134 (Configuration) - 878 tokens
  ...
```

### Programmatic Inspection

```typescript
import { readFileSync } from 'node:fs';
import { chunkContent, getChunkStats } from '@llm-translate/cli';

const content = readFileSync('doc.md', 'utf8');
const chunks = chunkContent(content, { maxTokens: 1024 });
const stats = getChunkStats(chunks);

console.log(`Total chunks: ${stats.count}`);
console.log(`Average size: ${stats.avgTokens} tokens`);
```

## Troubleshooting

::: warning Chunks Too Small
**Symptom**: Many small chunks and too many API calls

**Solution**: Increase `maxTokens`
```json
{ "chunking": { "maxTokens": 2048 } }
```
:::

::: warning Context Lost Between Chunks
**Symptom**: Inconsistent terminology across sections

**Solution**: Increase the overlap or use a glossary
```json
{ "chunking": { "overlapTokens": 300 } }
```
:::

::: danger Code Blocks Split
**Symptom**: Syntax errors in the output

**Cause**: This should never happen. If it does, please [report an issue](https://github.com/selenehyun/llm-translate/issues).
:::

::: warning Tables Broken
**Symptom**: Table formatting is corrupted

**Solution**: Tables should be kept intact automatically. For very large tables (100+ rows), consider splitting them manually.
:::

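
If you split a very large table by hand, each piece needs its own header and separator row to remain a valid Markdown table. A small sketch, assuming a simple pipe table whose first two lines are the header and the separator; `splitTable` is a hypothetical helper, not part of llm-translate.

```typescript
// Hypothetical helper: split a pipe table into several tables of at most
// `rowsPerTable` body rows, repeating the header and separator rows.
function splitTable(table: string, rowsPerTable: number): string[] {
  const [header, separator, ...rows] = table.trim().split("\n");
  if (!header || !separator || rowsPerTable < 1) {
    throw new Error("expected a header row, a separator row, and rowsPerTable >= 1");
  }
  const parts: string[] = [];
  for (let i = 0; i < rows.length; i += rowsPerTable) {
    parts.push([header, separator, ...rows.slice(i, i + rowsPerTable)].join("\n"));
  }
  return parts;
}

const table = [
  "| Term | Translation |",
  "|------|-------------|",
  "| chunk | 청크 |",
  "| cache | 캐시 |",
  "| glossary | 용어집 |",
].join("\n");

// Separate the resulting tables with blank lines so they render independently
console.log(splitTable(table, 2).join("\n\n"));
```
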
@@ -0,0 +1,139 @@

# Configuration

::: info Translation
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

llm-translate uses a layered configuration system. Settings are applied in the following order, with later sources overriding earlier ones:

1. Built-in defaults
2. Configuration file (`.translaterc.json`)
3. Environment variables
4. CLI arguments
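
The precedence rule can be pictured as a deep merge applied from lowest to highest priority. A minimal sketch, assuming plain-object configs and a hypothetical `mergeLayers` helper (not llm-translate's actual loader):

```typescript
type ConfigValue = string | number | boolean | null | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue | undefined;
}

function isObject(value: unknown): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: merge layers left to right so later layers win.
// Nested objects are merged key by key; undefined values are skipped.
// Inputs are never mutated: nested merges always build a new object.
function mergeLayers(...layers: ConfigObject[]): ConfigObject {
  const result: ConfigObject = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      if (value === undefined) continue;
      const existing = result[key];
      result[key] = isObject(existing) && isObject(value) ? mergeLayers(existing, value) : value;
    }
  }
  return result;
}

const defaults = { provider: { name: "claude" }, translation: { qualityThreshold: 85, maxIterations: 4 } };
const file = { translation: { maxIterations: 6 } };
const env = { provider: { model: "claude-haiku-4-5-20251001" } };
const cli = { translation: { qualityThreshold: 90 } };

console.log(JSON.stringify(mergeLayers(defaults, file, env, cli)));
// {"provider":{"name":"claude","model":"claude-haiku-4-5-20251001"},"translation":{"qualityThreshold":90,"maxIterations":6}}
```
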

## Configuration File

Create `.translaterc.json` in your project root:

```json
{
  "provider": {
    "name": "claude",
    "model": "claude-haiku-4-5-20251001",
    "apiKey": null
  },
  "translation": {
    "qualityThreshold": 85,
    "maxIterations": 4,
    "preserveFormatting": true
  },
  "chunking": {
    "maxTokens": 1024,
    "overlapTokens": 150
  },
  "paths": {
    "glossary": "./glossary.json",
    "cache": "./.translate-cache"
  }
}
```

### Provider Settings

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `name` | string | `"claude"` | Provider name: `claude`, `openai`, `ollama` |
| `model` | string | Varies | Model identifier |
| `apiKey` | string | null | API key (using an environment variable is recommended) |
| `baseUrl` | string | null | Custom API endpoint |

### Translation Settings

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `qualityThreshold` | number | `85` | Minimum quality threshold (0-100) |
| `maxIterations` | number | `4` | Maximum number of refinement iterations |
| `preserveFormatting` | boolean | `true` | Preserve Markdown/HTML structure |

### Chunking Settings

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `maxTokens` | number | `1024` | Maximum tokens per chunk |
| `overlapTokens` | number | `150` | Context overlap between chunks |

### Path Settings

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `glossary` | string | null | Path to the glossary file |
| `cache` | string | null | Path to the translation cache |

## Environment Variables

```bash
# API Keys
ANTHROPIC_API_KEY=sk-ant-xxxxx
OPENAI_API_KEY=sk-xxxxx
OLLAMA_BASE_URL=http://localhost:11434

# Default Settings
LLM_TRANSLATE_PROVIDER=claude
LLM_TRANSLATE_MODEL=claude-haiku-4-5-20251001
LLM_TRANSLATE_QUALITY_THRESHOLD=85
```
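
Environment variables always arrive as strings, so a loader has to map each name to a config key and convert its type. A sketch of that step, assuming the mapping implied by the variable names above; `configFromEnv` and the `PartialConfig` shape are hypothetical, and only the variable names come from this page. In practice you would pass `process.env`.

```typescript
// Hypothetical sketch: turn LLM_TRANSLATE_* variables into a partial config.
interface PartialConfig {
  provider?: { name?: string; model?: string };
  translation?: { qualityThreshold?: number };
}

function configFromEnv(env: Record<string, string | undefined>): PartialConfig {
  const config: PartialConfig = {};
  const name = env.LLM_TRANSLATE_PROVIDER;
  const model = env.LLM_TRANSLATE_MODEL;
  if (name || model) {
    config.provider = {};
    if (name) config.provider.name = name;
    if (model) config.provider.model = model;
  }
  const threshold = env.LLM_TRANSLATE_QUALITY_THRESHOLD;
  if (threshold) {
    const value = Number(threshold);
    // Reject non-numeric or out-of-range values instead of silently using NaN
    if (!Number.isFinite(value) || value < 0 || value > 100) {
      throw new Error(`Invalid LLM_TRANSLATE_QUALITY_THRESHOLD: ${threshold}`);
    }
    config.translation = { qualityThreshold: value };
  }
  return config;
}

const fromEnv = configFromEnv({
  LLM_TRANSLATE_PROVIDER: "claude",
  LLM_TRANSLATE_QUALITY_THRESHOLD: "85",
});
console.log(JSON.stringify(fromEnv));
// {"provider":{"name":"claude"},"translation":{"qualityThreshold":85}}
```
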

## CLI Override Examples

```bash
# Override provider
llm-translate file doc.md -o doc.ko.md --target ko --provider openai

# Override model
llm-translate file doc.md -o doc.ko.md --target ko --model claude-sonnet-4-5-20250929

# Override quality threshold
llm-translate file doc.md -o doc.ko.md --target ko --quality 90

# Override max iterations
llm-translate file doc.md -o doc.ko.md --target ko --max-iterations 6
```

## Project-Level Configuration

For monorepos or multi-project setups, place a `.translaterc.json` in each project directory:

```
my-monorepo/
├── packages/
│   ├── frontend/
│   │   ├── .translaterc.json   # Frontend-specific terms
│   │   └── docs/
│   └── backend/
│       ├── .translaterc.json   # Backend-specific terms
│       └── docs/
└── .translaterc.json           # Shared defaults
```

llm-translate searches for a configuration file starting in the current directory and moving up through its parent directories.
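
An upward search of this kind can be sketched with Node's `fs` and `path` modules. This page does not say whether the first file found wins or whether parent configs are merged in as shared defaults; the sketch assumes the simpler first-match behavior, and `findConfig` is a hypothetical helper.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: walk from startDir toward the filesystem root and
// return the first .translaterc.json found, or null if there is none.
function findConfig(startDir: string, fileName = ".translaterc.json"): string | null {
  let dir = resolve(startDir);
  while (true) {
    const candidate = join(dir, fileName);
    if (existsSync(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Build a throwaway monorepo: a root config plus one package-level config
const root = mkdtempSync(join(tmpdir(), "monorepo-"));
const frontendDocs = join(root, "packages", "frontend", "docs");
const backendDocs = join(root, "packages", "backend", "docs");
mkdirSync(frontendDocs, { recursive: true });
mkdirSync(backendDocs, { recursive: true });
writeFileSync(join(root, ".translaterc.json"), "{}");
writeFileSync(join(root, "packages", "frontend", ".translaterc.json"), "{}");

console.log(findConfig(frontendDocs)); // .../packages/frontend/.translaterc.json
console.log(findConfig(backendDocs));  // .../.translaterc.json (the shared root config)
```
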

## Model Selection Guide

| Model | Speed | Quality | Cost | Best For |
|-------|-------|---------|------|----------|
| `claude-haiku-4-5-20251001` | Fast | Good | Low | General docs, high volume |
| `claude-sonnet-4-5-20250929` | Medium | Excellent | Medium | Technical docs, quality-critical content |
| `claude-opus-4-5-20251101` | Slow | Best | High | Complex content, nuanced text |
| `gpt-4o-mini` | Fast | Good | Low | Alternative to Haiku |
| `gpt-4o` | Medium | Excellent | Medium | Alternative to Sonnet |

## Quality Threshold Guide

| Threshold | Use Case |
|-----------|----------|
| 70-75 | Draft translations, internal docs |
| 80-85 | Standard documentation (default) |
| 90-95 | Public-facing, marketing content |
| 95+ | Legal, medical, regulated content |

Higher thresholds require more iterations and therefore cost more.

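
The link between threshold, iterations, and cost is easiest to see as a translate, score, refine loop that stops once the score reaches `qualityThreshold` or `maxIterations` is used up. A minimal sketch, assuming the first translation counts as iteration 1; `translateWithThreshold` and the `translate`/`score`/`refine` callbacks are hypothetical stand-ins for model calls, not llm-translate's implementation.

```typescript
interface RefineResult {
  text: string;
  score: number;
  iterations: number;
}

interface RefineSteps {
  translate: (source: string) => string;
  score: (source: string, draft: string) => number;
  refine: (source: string, draft: string) => string;
}

// Hypothetical loop: refine until the score reaches qualityThreshold
// or maxIterations is used up, whichever comes first.
function translateWithThreshold(
  source: string,
  options: { qualityThreshold: number; maxIterations: number },
  steps: RefineSteps,
): RefineResult {
  let text = steps.translate(source);
  let score = steps.score(source, text);
  let iterations = 1;
  while (score < options.qualityThreshold && iterations < options.maxIterations) {
    text = steps.refine(source, text);
    score = steps.score(source, text);
    iterations += 1;
  }
  return { text, score, iterations };
}

// Fake steps for illustration: start at 78 and gain 5 points per refinement
const fake: RefineSteps = {
  translate: () => "draft-0",
  refine: (_source, draft) => `draft-${Number(draft.split("-")[1]) + 1}`,
  score: (_source, draft) => 78 + 5 * Number(draft.split("-")[1]),
};

for (const qualityThreshold of [85, 95]) {
  const r = translateWithThreshold("Hello", { qualityThreshold, maxIterations: 4 }, fake);
  console.log(`threshold ${qualityThreshold}: ${r.iterations} iterations, score ${r.score}`);
}
// threshold 85: 3 iterations, score 88
// threshold 95: 4 iterations, score 93
```

Every extra iteration is another scoring and refinement round trip, which is where the higher cost of a stricter threshold comes from.
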
@@ -0,0 +1,30 @@

# Cost Optimization

::: info Translation Note
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

This guide covers strategies for minimizing API costs while maintaining translation quality.

## Cost Structure

### Token Pricing (as of 2025)

| Model | Input (per 1K) | Output (per 1K) | Cache Read (per 1K) | Cache Write (per 1K) |
|-------|----------------|-----------------|---------------------|----------------------|
| Claude Haiku 4.5 | $0.001 | $0.005 | $0.0001 | $0.00125 |
| Claude Sonnet 4.5 | $0.003 | $0.015 | $0.0003 | $0.00375 |
| Claude Opus 4.5 | $0.005 | $0.025 | $0.0005 | $0.00625 |
| GPT-4o-mini | $0.00015 | $0.0006 | Automatic | Automatic |
| GPT-4o | $0.0025 | $0.01 | Automatic | Automatic |
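
Because the table lists prices per 1K tokens, a job's cost is a weighted sum of its token counts. A minimal estimator, assuming you already know (or have estimated) the uncached input, cache-read, cache-write, and output token counts; `estimateCost` and the `Pricing`/`TokenUsage` shapes are hypothetical, and the two price rows are copied from the table.

```typescript
// Prices in USD per 1K tokens
interface Pricing {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

const PRICING = {
  "claude-haiku-4-5-20251001": { input: 0.001, output: 0.005, cacheRead: 0.0001, cacheWrite: 0.00125 },
  "claude-sonnet-4-5-20250929": { input: 0.003, output: 0.015, cacheRead: 0.0003, cacheWrite: 0.00375 },
};

interface TokenUsage {
  input: number;       // uncached input tokens
  output: number;      // generated tokens
  cacheRead?: number;  // input tokens served from the prompt cache
  cacheWrite?: number; // input tokens written to the prompt cache
}

// Hypothetical estimator: each token category times its per-1K price.
function estimateCost(p: Pricing, u: TokenUsage): number {
  return (
    (u.input * p.input +
      u.output * p.output +
      (u.cacheRead ?? 0) * p.cacheRead +
      (u.cacheWrite ?? 0) * p.cacheWrite) / 1000
  );
}

const usage: TokenUsage = { input: 100_000, output: 50_000 };
console.log(estimateCost(PRICING["claude-haiku-4-5-20251001"], usage).toFixed(3));  // 0.350
console.log(estimateCost(PRICING["claude-sonnet-4-5-20250929"], usage).toFixed(3)); // 1.050

// The same 100K input tokens on Haiku, but 80K of them read from the prompt cache
const cached: TokenUsage = { input: 15_000, cacheWrite: 5_000, cacheRead: 80_000, output: 50_000 };
console.log(estimateCost(PRICING["claude-haiku-4-5-20251001"], cached).toFixed(3)); // 0.279
```

Output tokens dominate in these examples because they are priced at five times the input rate, so caching mostly trims the input side of the bill.
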

### Cost Factors

- [ ] Use Haiku for standard documentation
- [ ] Set quality thresholds appropriately (no higher than needed)
- [ ] Enable and maximize prompt caching
- [ ] Process files in batches
- [ ] Keep glossaries lean
- [ ] Use the translation cache for incremental updates
- [ ] Monitor costs with verbose output
- [ ] Estimate costs before running large jobs