@llm-translate/cli 1.0.0-next.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.dockerignore +51 -0
- package/.env.example +33 -0
- package/.github/workflows/docs-pages.yml +57 -0
- package/.github/workflows/release.yml +49 -0
- package/.translaterc.json +44 -0
- package/CLAUDE.md +243 -0
- package/Dockerfile +55 -0
- package/README.md +371 -0
- package/RFC.md +1595 -0
- package/dist/cli/index.d.ts +2 -0
- package/dist/cli/index.js +4494 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/index.d.ts +1152 -0
- package/dist/index.js +3841 -0
- package/dist/index.js.map +1 -0
- package/docker-compose.yml +56 -0
- package/docs/.vitepress/config.ts +161 -0
- package/docs/api/agent.md +262 -0
- package/docs/api/engine.md +274 -0
- package/docs/api/index.md +171 -0
- package/docs/api/providers.md +304 -0
- package/docs/changelog.md +64 -0
- package/docs/cli/dir.md +243 -0
- package/docs/cli/file.md +213 -0
- package/docs/cli/glossary.md +273 -0
- package/docs/cli/index.md +129 -0
- package/docs/cli/init.md +158 -0
- package/docs/cli/serve.md +211 -0
- package/docs/glossary.json +235 -0
- package/docs/guide/chunking.md +272 -0
- package/docs/guide/configuration.md +139 -0
- package/docs/guide/cost-optimization.md +237 -0
- package/docs/guide/docker.md +371 -0
- package/docs/guide/getting-started.md +150 -0
- package/docs/guide/glossary.md +241 -0
- package/docs/guide/index.md +86 -0
- package/docs/guide/ollama.md +515 -0
- package/docs/guide/prompt-caching.md +221 -0
- package/docs/guide/providers.md +232 -0
- package/docs/guide/quality-control.md +206 -0
- package/docs/guide/vitepress-integration.md +265 -0
- package/docs/index.md +63 -0
- package/docs/ja/api/agent.md +262 -0
- package/docs/ja/api/engine.md +274 -0
- package/docs/ja/api/index.md +171 -0
- package/docs/ja/api/providers.md +304 -0
- package/docs/ja/changelog.md +64 -0
- package/docs/ja/cli/dir.md +243 -0
- package/docs/ja/cli/file.md +213 -0
- package/docs/ja/cli/glossary.md +273 -0
- package/docs/ja/cli/index.md +111 -0
- package/docs/ja/cli/init.md +158 -0
- package/docs/ja/guide/chunking.md +271 -0
- package/docs/ja/guide/configuration.md +139 -0
- package/docs/ja/guide/cost-optimization.md +30 -0
- package/docs/ja/guide/getting-started.md +150 -0
- package/docs/ja/guide/glossary.md +214 -0
- package/docs/ja/guide/index.md +32 -0
- package/docs/ja/guide/ollama.md +410 -0
- package/docs/ja/guide/prompt-caching.md +221 -0
- package/docs/ja/guide/providers.md +232 -0
- package/docs/ja/guide/quality-control.md +137 -0
- package/docs/ja/guide/vitepress-integration.md +265 -0
- package/docs/ja/index.md +58 -0
- package/docs/ko/api/agent.md +262 -0
- package/docs/ko/api/engine.md +274 -0
- package/docs/ko/api/index.md +171 -0
- package/docs/ko/api/providers.md +304 -0
- package/docs/ko/changelog.md +64 -0
- package/docs/ko/cli/dir.md +243 -0
- package/docs/ko/cli/file.md +213 -0
- package/docs/ko/cli/glossary.md +273 -0
- package/docs/ko/cli/index.md +111 -0
- package/docs/ko/cli/init.md +158 -0
- package/docs/ko/guide/chunking.md +271 -0
- package/docs/ko/guide/configuration.md +139 -0
- package/docs/ko/guide/cost-optimization.md +30 -0
- package/docs/ko/guide/getting-started.md +150 -0
- package/docs/ko/guide/glossary.md +214 -0
- package/docs/ko/guide/index.md +32 -0
- package/docs/ko/guide/ollama.md +410 -0
- package/docs/ko/guide/prompt-caching.md +221 -0
- package/docs/ko/guide/providers.md +232 -0
- package/docs/ko/guide/quality-control.md +137 -0
- package/docs/ko/guide/vitepress-integration.md +265 -0
- package/docs/ko/index.md +58 -0
- package/docs/zh/api/agent.md +262 -0
- package/docs/zh/api/engine.md +274 -0
- package/docs/zh/api/index.md +171 -0
- package/docs/zh/api/providers.md +304 -0
- package/docs/zh/changelog.md +64 -0
- package/docs/zh/cli/dir.md +243 -0
- package/docs/zh/cli/file.md +213 -0
- package/docs/zh/cli/glossary.md +273 -0
- package/docs/zh/cli/index.md +111 -0
- package/docs/zh/cli/init.md +158 -0
- package/docs/zh/guide/chunking.md +271 -0
- package/docs/zh/guide/configuration.md +139 -0
- package/docs/zh/guide/cost-optimization.md +30 -0
- package/docs/zh/guide/getting-started.md +150 -0
- package/docs/zh/guide/glossary.md +214 -0
- package/docs/zh/guide/index.md +32 -0
- package/docs/zh/guide/ollama.md +410 -0
- package/docs/zh/guide/prompt-caching.md +221 -0
- package/docs/zh/guide/providers.md +232 -0
- package/docs/zh/guide/quality-control.md +137 -0
- package/docs/zh/guide/vitepress-integration.md +265 -0
- package/docs/zh/index.md +58 -0
- package/package.json +91 -0
- package/release.config.mjs +15 -0
- package/schemas/glossary.schema.json +110 -0
- package/src/cli/commands/dir.ts +469 -0
- package/src/cli/commands/file.ts +291 -0
- package/src/cli/commands/glossary.ts +221 -0
- package/src/cli/commands/init.ts +68 -0
- package/src/cli/commands/serve.ts +60 -0
- package/src/cli/index.ts +64 -0
- package/src/cli/options.ts +59 -0
- package/src/core/agent.ts +1119 -0
- package/src/core/chunker.ts +391 -0
- package/src/core/engine.ts +634 -0
- package/src/errors.ts +188 -0
- package/src/index.ts +147 -0
- package/src/integrations/vitepress.ts +549 -0
- package/src/parsers/markdown.ts +383 -0
- package/src/providers/claude.ts +259 -0
- package/src/providers/interface.ts +109 -0
- package/src/providers/ollama.ts +379 -0
- package/src/providers/openai.ts +308 -0
- package/src/providers/registry.ts +153 -0
- package/src/server/index.ts +152 -0
- package/src/server/middleware/auth.ts +93 -0
- package/src/server/middleware/logger.ts +90 -0
- package/src/server/routes/health.ts +84 -0
- package/src/server/routes/translate.ts +210 -0
- package/src/server/types.ts +138 -0
- package/src/services/cache.ts +899 -0
- package/src/services/config.ts +217 -0
- package/src/services/glossary.ts +247 -0
- package/src/types/analysis.ts +164 -0
- package/src/types/index.ts +265 -0
- package/src/types/modes.ts +121 -0
- package/src/types/mqm.ts +157 -0
- package/src/utils/logger.ts +141 -0
- package/src/utils/tokens.ts +116 -0
- package/tests/fixtures/glossaries/ml-glossary.json +53 -0
- package/tests/fixtures/input/lynq-installation.ko.md +350 -0
- package/tests/fixtures/input/lynq-installation.md +350 -0
- package/tests/fixtures/input/simple.ko.md +27 -0
- package/tests/fixtures/input/simple.md +27 -0
- package/tests/unit/chunker.test.ts +229 -0
- package/tests/unit/glossary.test.ts +146 -0
- package/tests/unit/markdown.test.ts +205 -0
- package/tests/unit/tokens.test.ts +81 -0
- package/tsconfig.json +28 -0
- package/tsup.config.ts +34 -0
- package/vitest.config.ts +16 -0
@@ -0,0 +1,241 @@

# Glossary

::: info Translations
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

The glossary feature ensures consistent terminology across all your translations. Define terms once, and they'll be translated the same way every time.

## Glossary File Format

Create a `glossary.json` file:

```json
{
  "sourceLanguage": "en",
  "version": "1.0.0",
  "terms": [
    {
      "source": "component",
      "targets": {
        "ko": "컴포넌트",
        "ja": "コンポーネント",
        "zh": "组件"
      },
      "context": "UI component in React/Vue"
    }
  ]
}
```
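
As a minimal sketch of how such a file could be loaded and sanity-checked, the snippet below mirrors only the fields shown above. The `Glossary` and `GlossaryTerm` types and the `parseGlossary` helper are hypothetical names chosen for illustration; they are not taken from the package's API, and treating `sourceLanguage` and `version` as optional is an assumption.

```typescript
// Hypothetical types mirroring the fields shown in glossary.json above.
interface GlossaryTerm {
  source: string;
  targets?: Record<string, string>;
  context?: string;
}

interface Glossary {
  sourceLanguage?: string; // assumed optional for this sketch
  version?: string; // assumed optional for this sketch
  terms: GlossaryTerm[];
}

// Parse a JSON string and check the minimal shape; throws on malformed input.
function parseGlossary(json: string): Glossary {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("glossary must be a JSON object");
  }
  const terms: unknown = (data as { terms?: unknown }).terms;
  if (!Array.isArray(terms)) {
    throw new Error('"terms" must be an array');
  }
  terms.forEach((term: unknown, index: number) => {
    const source = (term as { source?: unknown } | null)?.source;
    if (typeof source !== "string" || source.length === 0) {
      throw new Error(`terms[${index}] needs a non-empty "source"`);
    }
  });
  return data as Glossary;
}
```

For real validation, the `glossary validate` command described under CLI Commands is the documented route; this sketch only shows the shape of the data.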

## Term Structure

### Basic Term

```json
{
  "source": "endpoint",
  "targets": {
    "ko": "엔드포인트",
    "ja": "エンドポイント"
  }
}
```

### With Context

Context helps the LLM understand how to use the term:

```json
{
  "source": "state",
  "targets": { "ko": "상태" },
  "context": "Application state in state management"
}
```

### Do Not Translate

Keep technical terms in English:

```json
{
  "source": "Kubernetes",
  "doNotTranslate": true
}
```

### Do Not Translate for Specific Languages

```json
{
  "source": "React",
  "doNotTranslateFor": ["ko", "ja"],
  "targets": {
    "zh": "React框架"
  }
}
```
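
One way to read the three term shapes above is as a precedence order: `doNotTranslate` first, then `doNotTranslateFor`, then `targets`. The sketch below encodes that reading. The `resolveTerm` function and the precedence itself are assumptions made for illustration, not the package's confirmed behavior.

```typescript
interface GlossaryTerm {
  source: string;
  targets?: Record<string, string>;
  doNotTranslate?: boolean;
  doNotTranslateFor?: string[];
}

// Returns the required rendering of a term for one target language,
// or undefined when the glossary has no rule for that language.
function resolveTerm(term: GlossaryTerm, lang: string): string | undefined {
  if (term.doNotTranslate) return term.source; // keep as-is everywhere
  if (term.doNotTranslateFor?.includes(lang)) return term.source; // keep as-is for this language
  return term.targets?.[lang]; // explicit translation, if any
}
```

With the `React` entry above, this resolves to `React` for `ko` and `ja`, to `React框架` for `zh`, and to `undefined` for a language the entry does not mention.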

### Case Sensitivity

```json
{
  "source": "API",
  "targets": { "ko": "API" },
  "caseSensitive": true
}
```
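
To illustrate what `caseSensitive` could mean when looking for a term in source text, here is a small matcher. It assumes that matching is case-insensitive by default and that terms are matched as whole words; neither assumption is confirmed by the documentation, and `termOccurs` is a hypothetical helper.

```typescript
// Escape regex metacharacters so the term is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when `source` occurs in `text` as a whole word.
// Assumption: case-insensitive unless caseSensitive is set.
function termOccurs(text: string, source: string, caseSensitive = false): boolean {
  const flags = caseSensitive ? "u" : "iu";
  // Reject matches that are glued to a letter or digit on either side.
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}])${escapeRegExp(source)}(?![\\p{L}\\p{N}])`,
    flags
  );
  return pattern.test(text);
}
```

Under these assumptions, `"The api is stable"` matches the term `API` only when `caseSensitive` is off, and `"rapid"` never matches because the letters sit inside a longer word.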

## Complete Example

```json
{
  "sourceLanguage": "en",
  "version": "1.0.0",
  "description": "Technical documentation glossary",
  "terms": [
    {
      "source": "component",
      "targets": { "ko": "컴포넌트", "ja": "コンポーネント" },
      "context": "UI component"
    },
    {
      "source": "prop",
      "targets": { "ko": "프롭", "ja": "プロップ" },
      "context": "React component property"
    },
    {
      "source": "hook",
      "targets": { "ko": "훅", "ja": "フック" },
      "context": "React hook (useState, useEffect, etc.)"
    },
    {
      "source": "state",
      "targets": { "ko": "상태", "ja": "ステート" },
      "context": "Component or application state"
    },
    {
      "source": "TypeScript",
      "doNotTranslate": true
    },
    {
      "source": "JavaScript",
      "doNotTranslate": true
    },
    {
      "source": "npm",
      "doNotTranslate": true
    },
    {
      "source": "API",
      "doNotTranslate": true,
      "caseSensitive": true
    }
  ]
}
```

## CLI Commands

### List Glossary Terms

```bash
llm-translate glossary list --glossary glossary.json
```

### Validate Glossary

```bash
llm-translate glossary validate --glossary glossary.json
```

### Add a Term

```bash
llm-translate glossary add "container" --target ko="컨테이너" --glossary glossary.json
```

### Remove a Term

```bash
llm-translate glossary remove "container" --glossary glossary.json
```

## Best Practices

### 1. Start with Common Technical Terms

```json
{
  "terms": [
    { "source": "API", "doNotTranslate": true },
    { "source": "SDK", "doNotTranslate": true },
    { "source": "CLI", "doNotTranslate": true },
    { "source": "URL", "doNotTranslate": true },
    { "source": "JSON", "doNotTranslate": true }
  ]
}
```

### 2. Include Product-Specific Terms

```json
{
  "source": "Workspace",
  "targets": { "ko": "워크스페이스" },
  "context": "Product-specific workspace feature"
}
```

### 3. Add Context for Ambiguous Terms

```json
{
  "source": "run",
  "targets": { "ko": "실행" },
  "context": "Execute a command or script"
}
```

### 4. Use `doNotTranslate` for Brand Names

```json
{
  "source": "GitHub",
  "doNotTranslate": true
}
```

### 5. Version Your Glossary

Track glossary changes alongside your documentation.

## How Terms Are Applied

1. **Before translation**: The glossary is injected into the prompt
2. **During translation**: The LLM sees required translations for each term
3. **Quality evaluation**: Glossary compliance is scored (20% of total)
4. **Refinement**: Missing terms are flagged for correction
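
The four steps above can be sketched roughly as two helpers: one that renders the glossary as prompt text for a target language, and one that flags terms used in the source whose required rendering is absent from the translation. The prompt line format, the plain substring matching, and the names `buildGlossaryPrompt` and `findMissingTerms` are all assumptions for illustration; the tool's actual prompt and scoring logic are not shown in this document.

```typescript
interface GlossaryTerm {
  source: string;
  targets?: Record<string, string>;
  doNotTranslate?: boolean;
  context?: string;
}

// Required rendering of a term in `lang`, or undefined if the glossary has none.
function requiredRendering(term: GlossaryTerm, lang: string): string | undefined {
  return term.doNotTranslate ? term.source : term.targets?.[lang];
}

// Steps 1-2: render the glossary as prompt lines (format is hypothetical).
function buildGlossaryPrompt(terms: GlossaryTerm[], lang: string): string {
  const lines: string[] = [];
  for (const term of terms) {
    const target = requiredRendering(term, lang);
    if (target === undefined) continue; // no rule for this language
    const note = term.context ? ` (${term.context})` : "";
    lines.push(`- "${term.source}" -> "${target}"${note}`);
  }
  return lines.join("\n");
}

// Steps 3-4: list terms that appear in the source text but whose
// required rendering is missing from the translation.
function findMissingTerms(
  terms: GlossaryTerm[],
  lang: string,
  sourceText: string,
  translation: string
): string[] {
  const missing: string[] = [];
  const sourceLower = sourceText.toLowerCase();
  for (const term of terms) {
    const target = requiredRendering(term, lang);
    if (target === undefined) continue;
    const used = sourceLower.includes(term.source.toLowerCase());
    if (used && !translation.includes(target)) missing.push(term.source);
  }
  return missing;
}
```

A compliance score could then be derived from the share of used terms that are not in the missing list, before it is weighted into the overall quality score.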

## Troubleshooting

### Term Not Being Applied

- Check case sensitivity settings
- Ensure the term appears in source text exactly as defined
- Verify the target language is in the `targets` object

### Inconsistent Translations

- Add more context to disambiguate
- Check for duplicate terms with different translations
- Increase quality threshold to enforce compliance
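
Checking for duplicate terms with different translations can be automated. The following is a minimal sketch, assuming duplicates are detected by comparing `source` values case-insensitively; `findConflictingTerms` is a hypothetical helper, not a command or function provided by the package.

```typescript
interface GlossaryTerm {
  source: string;
  targets?: Record<string, string>;
}

// Returns the lowercased sources that are defined more than once with
// different translations for at least one language.
function findConflictingTerms(terms: GlossaryTerm[]): string[] {
  const seen = new Map<string, Map<string, string>>();
  const conflicts = new Set<string>();
  for (const term of terms) {
    const key = term.source.toLowerCase();
    let known = seen.get(key);
    if (known === undefined) {
      known = new Map<string, string>();
      seen.set(key, known);
    }
    for (const [lang, value] of Object.entries(term.targets ?? {})) {
      const previous = known.get(lang);
      if (previous !== undefined && previous !== value) conflicts.add(key);
      known.set(lang, value);
    }
  }
  return Array.from(conflicts);
}
```

Entries that repeat a source but cover different languages are not reported, since they do not contradict each other.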

### Glossary Too Large

Large glossaries increase token usage. Consider:

- Splitting by domain/project
- Using selective glossary injection (coming soon)
- Removing rarely-used terms
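
Selective injection is listed as coming soon, so the snippet below is only an illustration of the general idea and says nothing about how the tool will implement it. It assumes a simple case-insensitive substring check per chunk; `selectTermsForChunk` is a hypothetical name.

```typescript
interface GlossaryTerm {
  source: string;
  targets?: Record<string, string>;
  doNotTranslate?: boolean;
}

// Keep only the terms whose source text appears in the chunk
// (case-insensitive substring match, an assumption of this sketch).
function selectTermsForChunk(terms: GlossaryTerm[], chunk: string): GlossaryTerm[] {
  const haystack = chunk.toLowerCase();
  return terms.filter((term) => haystack.includes(term.source.toLowerCase()));
}
```

Sending only the matching terms with each chunk would shrink the prompt, at the cost of changing the prompt prefix from chunk to chunk.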

@@ -0,0 +1,86 @@

# What is llm-translate?

::: info Translations
All non-English documentation is automatically translated using Claude Sonnet 4.
:::

llm-translate is a CLI tool for translating documents using Large Language Models. It's designed specifically for technical documentation where consistency, accuracy, and format preservation are critical.

## Key Features

### Glossary Enforcement

Define domain-specific terminology once and ensure it's translated consistently across all your documents.

```json
{
  "terms": [
    {
      "source": "API endpoint",
      "targets": { "ko": "API 엔드포인트", "ja": "APIエンドポイント" }
    },
    {
      "source": "Kubernetes",
      "doNotTranslate": true
    }
  ]
}
```

### Self-Refine Quality Control

llm-translate doesn't just translate once and call it done. It uses an iterative refinement process:

1. **Initial Translation** - Generate a first translation with the glossary
2. **Quality Evaluation** - Score against semantic accuracy, fluency, glossary compliance, and format preservation
3. **Reflection** - Identify specific issues if the quality threshold is not met
4. **Improvement** - Apply targeted fixes
5. **Repeat** - Continue until quality >= threshold or max iterations reached
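
The loop described by these five steps can be sketched as follows. This is a simplified, synchronous outline under stated assumptions: the `RefineSteps` interface, the `selfRefine` function, and the numeric score scale are hypothetical, and real provider calls would be asynchronous.

```typescript
interface Evaluation {
  score: number; // overall quality score (scale is an assumption)
  issues: string[]; // problems identified during reflection
}

// Hypothetical callbacks standing in for the LLM-backed steps.
interface RefineSteps {
  translate(source: string): string;
  evaluate(source: string, translation: string): Evaluation;
  improve(source: string, translation: string, issues: string[]): string;
}

interface RefineResult {
  translation: string;
  score: number;
  iterations: number;
}

function selfRefine(
  source: string,
  steps: RefineSteps,
  threshold: number,
  maxIterations: number
): RefineResult {
  let translation = steps.translate(source); // 1. initial translation
  let evaluation = steps.evaluate(source, translation); // 2. quality evaluation
  let iterations = 0;
  // 5. repeat until the score reaches the threshold or the budget runs out
  while (evaluation.score < threshold && iterations < maxIterations) {
    // 3. reflection: evaluation.issues lists what to fix
    // 4. improvement: apply targeted fixes
    translation = steps.improve(source, translation, evaluation.issues);
    evaluation = steps.evaluate(source, translation);
    iterations += 1;
  }
  return { translation, score: evaluation.score, iterations };
}
```

Capping the loop with `maxIterations` bounds the cost of a chunk that never reaches the threshold; the result then carries the last score so the caller can decide how to report it.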

### Prompt Caching

For Claude models, llm-translate automatically uses prompt caching to reduce costs:

- System instructions and glossary are cached across chunks
- Subsequent requests read cached tokens at a 90% discount
- Especially effective for large documents with many chunks
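
A rough estimate shows why the effect grows with the number of chunks. The sketch below compares input-token cost with and without caching of the shared prefix (system instructions plus glossary). It assumes at least one chunk, that the first request pays full price for the prefix, that later requests get the 90% discount, and it ignores output tokens and any surcharge for writing to the cache. The function name and the example price are illustrative, not real pricing.

```typescript
interface CostEstimate {
  uncached: number; // input cost without caching
  cached: number; // input cost with the prefix cached
  savings: number; // fraction saved, between 0 and 1
}

function estimateInputCost(
  chunks: number, // number of requests (assumed >= 1)
  prefixTokens: number, // shared system instructions + glossary
  chunkTokens: number, // tokens of document text per request
  pricePerMillionTokens: number, // illustrative input price
  cachedDiscount = 0.9 // discount on cached prefix reads
): CostEstimate {
  const perToken = pricePerMillionTokens / 1_000_000;
  const uncached = chunks * (prefixTokens + chunkTokens) * perToken;
  // First request pays the full prefix; the rest pay the discounted rate.
  const prefixCost =
    prefixTokens * perToken * (1 + (chunks - 1) * (1 - cachedDiscount));
  const cached = prefixCost + chunks * chunkTokens * perToken;
  return { uncached, cached, savings: 1 - cached / uncached };
}
```

For example, with 10 chunks, a 2,000-token prefix, 1,000 tokens of text per chunk, and an illustrative price of 3 per million tokens, the uncached cost is 0.09 and the cached cost is 0.0414, a saving of about 54% on input tokens.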

### Format Preservation

The tool uses AST-based parsing to preserve:

- Markdown formatting (headers, lists, tables, code blocks)
- HTML tags and attributes
- Links and images
- Document structure and hierarchy
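
To make the idea of protecting code blocks concrete, here is a deliberately simpler stand-in for the AST-based approach: a regex pass that swaps fenced code blocks for placeholders before translation and restores them afterwards. This is not how the tool is described as working; the placeholder format and the function names are invented for illustration.

```typescript
const FENCE = "`".repeat(3); // a Markdown code fence

interface ProtectedText {
  text: string; // Markdown with code blocks replaced by placeholders
  blocks: string[]; // original code blocks, indexed by placeholder number
}

// Replace each fenced code block with a numbered placeholder.
function protectCodeBlocks(markdown: string): ProtectedText {
  const blocks: string[] = [];
  const fenced = new RegExp(
    "^" + FENCE + "[^\\n]*\\n[\\s\\S]*?^" + FENCE + "[ \\t]*$",
    "gm"
  );
  const text = markdown.replace(fenced, (block: string) => {
    blocks.push(block);
    return `[[CODE_BLOCK_${blocks.length - 1}]]`;
  });
  return { text, blocks };
}

// Put the original code blocks back after translation.
function restoreCodeBlocks(text: string, blocks: string[]): string {
  return text.replace(
    /\[\[CODE_BLOCK_(\d+)\]\]/g,
    (match: string, index: string) => blocks[Number(index)] ?? match
  );
}
```

A regex pass like this misses nested or indented fences, which is one reason a parser that works on the document's syntax tree is the more robust choice.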

## Use Cases

- **Technical Documentation** - Translate README, API docs, user guides
- **Knowledge Bases** - Multilingual support articles
- **Product Content** - Release notes, changelogs, feature descriptions
- **Developer Resources** - Tutorials, guides, code comments

## Architecture

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Parser    │────▶│   Chunker    │────▶│    Agent    │
│  (MD/HTML)  │     │  (Semantic)  │     │(Self-Refine)│
└─────────────┘     └──────────────┘     └─────────────┘
                                                │
                    ┌──────────────┐            │
                    │   Provider   │◀───────────┘
                    │ (Claude/GPT) │
                    └──────────────┘
```

## Comparison

| Feature | llm-translate | Generic LLM | Traditional MT |
|---------|--------------|-------------|----------------|
| Glossary enforcement | ✅ | ❌ | ⚠️ Limited |
| Quality control | ✅ Self-refine | ❌ | ❌ |
| Format preservation | ✅ AST-based | ⚠️ Prompt-based | ❌ |
| Cost optimization | ✅ Caching | ❌ | N/A |
| Code block handling | ✅ Protected | ⚠️ | ❌ |