@llm-translate/cli 1.0.0-next.1

Files changed (157)
  1. package/.dockerignore +51 -0
  2. package/.env.example +33 -0
  3. package/.github/workflows/docs-pages.yml +57 -0
  4. package/.github/workflows/release.yml +49 -0
  5. package/.translaterc.json +44 -0
  6. package/CLAUDE.md +243 -0
  7. package/Dockerfile +55 -0
  8. package/README.md +371 -0
  9. package/RFC.md +1595 -0
  10. package/dist/cli/index.d.ts +2 -0
  11. package/dist/cli/index.js +4494 -0
  12. package/dist/cli/index.js.map +1 -0
  13. package/dist/index.d.ts +1152 -0
  14. package/dist/index.js +3841 -0
  15. package/dist/index.js.map +1 -0
  16. package/docker-compose.yml +56 -0
  17. package/docs/.vitepress/config.ts +161 -0
  18. package/docs/api/agent.md +262 -0
  19. package/docs/api/engine.md +274 -0
  20. package/docs/api/index.md +171 -0
  21. package/docs/api/providers.md +304 -0
  22. package/docs/changelog.md +64 -0
  23. package/docs/cli/dir.md +243 -0
  24. package/docs/cli/file.md +213 -0
  25. package/docs/cli/glossary.md +273 -0
  26. package/docs/cli/index.md +129 -0
  27. package/docs/cli/init.md +158 -0
  28. package/docs/cli/serve.md +211 -0
  29. package/docs/glossary.json +235 -0
  30. package/docs/guide/chunking.md +272 -0
  31. package/docs/guide/configuration.md +139 -0
  32. package/docs/guide/cost-optimization.md +237 -0
  33. package/docs/guide/docker.md +371 -0
  34. package/docs/guide/getting-started.md +150 -0
  35. package/docs/guide/glossary.md +241 -0
  36. package/docs/guide/index.md +86 -0
  37. package/docs/guide/ollama.md +515 -0
  38. package/docs/guide/prompt-caching.md +221 -0
  39. package/docs/guide/providers.md +232 -0
  40. package/docs/guide/quality-control.md +206 -0
  41. package/docs/guide/vitepress-integration.md +265 -0
  42. package/docs/index.md +63 -0
  43. package/docs/ja/api/agent.md +262 -0
  44. package/docs/ja/api/engine.md +274 -0
  45. package/docs/ja/api/index.md +171 -0
  46. package/docs/ja/api/providers.md +304 -0
  47. package/docs/ja/changelog.md +64 -0
  48. package/docs/ja/cli/dir.md +243 -0
  49. package/docs/ja/cli/file.md +213 -0
  50. package/docs/ja/cli/glossary.md +273 -0
  51. package/docs/ja/cli/index.md +111 -0
  52. package/docs/ja/cli/init.md +158 -0
  53. package/docs/ja/guide/chunking.md +271 -0
  54. package/docs/ja/guide/configuration.md +139 -0
  55. package/docs/ja/guide/cost-optimization.md +30 -0
  56. package/docs/ja/guide/getting-started.md +150 -0
  57. package/docs/ja/guide/glossary.md +214 -0
  58. package/docs/ja/guide/index.md +32 -0
  59. package/docs/ja/guide/ollama.md +410 -0
  60. package/docs/ja/guide/prompt-caching.md +221 -0
  61. package/docs/ja/guide/providers.md +232 -0
  62. package/docs/ja/guide/quality-control.md +137 -0
  63. package/docs/ja/guide/vitepress-integration.md +265 -0
  64. package/docs/ja/index.md +58 -0
  65. package/docs/ko/api/agent.md +262 -0
  66. package/docs/ko/api/engine.md +274 -0
  67. package/docs/ko/api/index.md +171 -0
  68. package/docs/ko/api/providers.md +304 -0
  69. package/docs/ko/changelog.md +64 -0
  70. package/docs/ko/cli/dir.md +243 -0
  71. package/docs/ko/cli/file.md +213 -0
  72. package/docs/ko/cli/glossary.md +273 -0
  73. package/docs/ko/cli/index.md +111 -0
  74. package/docs/ko/cli/init.md +158 -0
  75. package/docs/ko/guide/chunking.md +271 -0
  76. package/docs/ko/guide/configuration.md +139 -0
  77. package/docs/ko/guide/cost-optimization.md +30 -0
  78. package/docs/ko/guide/getting-started.md +150 -0
  79. package/docs/ko/guide/glossary.md +214 -0
  80. package/docs/ko/guide/index.md +32 -0
  81. package/docs/ko/guide/ollama.md +410 -0
  82. package/docs/ko/guide/prompt-caching.md +221 -0
  83. package/docs/ko/guide/providers.md +232 -0
  84. package/docs/ko/guide/quality-control.md +137 -0
  85. package/docs/ko/guide/vitepress-integration.md +265 -0
  86. package/docs/ko/index.md +58 -0
  87. package/docs/zh/api/agent.md +262 -0
  88. package/docs/zh/api/engine.md +274 -0
  89. package/docs/zh/api/index.md +171 -0
  90. package/docs/zh/api/providers.md +304 -0
  91. package/docs/zh/changelog.md +64 -0
  92. package/docs/zh/cli/dir.md +243 -0
  93. package/docs/zh/cli/file.md +213 -0
  94. package/docs/zh/cli/glossary.md +273 -0
  95. package/docs/zh/cli/index.md +111 -0
  96. package/docs/zh/cli/init.md +158 -0
  97. package/docs/zh/guide/chunking.md +271 -0
  98. package/docs/zh/guide/configuration.md +139 -0
  99. package/docs/zh/guide/cost-optimization.md +30 -0
  100. package/docs/zh/guide/getting-started.md +150 -0
  101. package/docs/zh/guide/glossary.md +214 -0
  102. package/docs/zh/guide/index.md +32 -0
  103. package/docs/zh/guide/ollama.md +410 -0
  104. package/docs/zh/guide/prompt-caching.md +221 -0
  105. package/docs/zh/guide/providers.md +232 -0
  106. package/docs/zh/guide/quality-control.md +137 -0
  107. package/docs/zh/guide/vitepress-integration.md +265 -0
  108. package/docs/zh/index.md +58 -0
  109. package/package.json +91 -0
  110. package/release.config.mjs +15 -0
  111. package/schemas/glossary.schema.json +110 -0
  112. package/src/cli/commands/dir.ts +469 -0
  113. package/src/cli/commands/file.ts +291 -0
  114. package/src/cli/commands/glossary.ts +221 -0
  115. package/src/cli/commands/init.ts +68 -0
  116. package/src/cli/commands/serve.ts +60 -0
  117. package/src/cli/index.ts +64 -0
  118. package/src/cli/options.ts +59 -0
  119. package/src/core/agent.ts +1119 -0
  120. package/src/core/chunker.ts +391 -0
  121. package/src/core/engine.ts +634 -0
  122. package/src/errors.ts +188 -0
  123. package/src/index.ts +147 -0
  124. package/src/integrations/vitepress.ts +549 -0
  125. package/src/parsers/markdown.ts +383 -0
  126. package/src/providers/claude.ts +259 -0
  127. package/src/providers/interface.ts +109 -0
  128. package/src/providers/ollama.ts +379 -0
  129. package/src/providers/openai.ts +308 -0
  130. package/src/providers/registry.ts +153 -0
  131. package/src/server/index.ts +152 -0
  132. package/src/server/middleware/auth.ts +93 -0
  133. package/src/server/middleware/logger.ts +90 -0
  134. package/src/server/routes/health.ts +84 -0
  135. package/src/server/routes/translate.ts +210 -0
  136. package/src/server/types.ts +138 -0
  137. package/src/services/cache.ts +899 -0
  138. package/src/services/config.ts +217 -0
  139. package/src/services/glossary.ts +247 -0
  140. package/src/types/analysis.ts +164 -0
  141. package/src/types/index.ts +265 -0
  142. package/src/types/modes.ts +121 -0
  143. package/src/types/mqm.ts +157 -0
  144. package/src/utils/logger.ts +141 -0
  145. package/src/utils/tokens.ts +116 -0
  146. package/tests/fixtures/glossaries/ml-glossary.json +53 -0
  147. package/tests/fixtures/input/lynq-installation.ko.md +350 -0
  148. package/tests/fixtures/input/lynq-installation.md +350 -0
  149. package/tests/fixtures/input/simple.ko.md +27 -0
  150. package/tests/fixtures/input/simple.md +27 -0
  151. package/tests/unit/chunker.test.ts +229 -0
  152. package/tests/unit/glossary.test.ts +146 -0
  153. package/tests/unit/markdown.test.ts +205 -0
  154. package/tests/unit/tokens.test.ts +81 -0
  155. package/tsconfig.json +28 -0
  156. package/tsup.config.ts +34 -0
  157. package/vitest.config.ts +16 -0
@@ -0,0 +1,241 @@
1
+ # Glossary
2
+
3
+ ::: info Translations
4
+ All non-English documentation is automatically translated using Claude Sonnet 4.
5
+ :::
6
+
7
+ The glossary feature ensures consistent terminology across all your translations. Define terms once, and they'll be translated the same way every time.
8
+
9
+ ## Glossary File Format
10
+
11
+ Create a `glossary.json` file:
12
+
13
+ ```json
14
+ {
15
+   "sourceLanguage": "en",
16
+   "version": "1.0.0",
17
+   "terms": [
18
+     {
19
+       "source": "component",
20
+       "targets": {
21
+         "ko": "컴포넌트",
22
+         "ja": "コンポーネント",
23
+         "zh": "组件"
24
+       },
25
+       "context": "UI component in React/Vue"
26
+     }
27
+   ]
28
+ }
29
+ ```
30
+
31
+ ## Term Structure
32
+
33
+ ### Basic Term
34
+
35
+ ```json
36
+ {
37
+   "source": "endpoint",
38
+   "targets": {
39
+     "ko": "엔드포인트",
40
+     "ja": "エンドポイント"
41
+   }
42
+ }
43
+ ```
44
+
45
+ ### With Context
46
+
47
+ Context helps the LLM understand how to use the term:
48
+
49
+ ```json
50
+ {
51
+   "source": "state",
52
+   "targets": { "ko": "상태" },
53
+   "context": "Application state in state management"
54
+ }
55
+ ```
56
+
57
+ ### Do Not Translate
58
+
59
+ Keep technical terms in English:
60
+
61
+ ```json
62
+ {
63
+   "source": "Kubernetes",
64
+   "doNotTranslate": true
65
+ }
66
+ ```
67
+
68
+ ### Do Not Translate for Specific Languages
69
+
70
+ ```json
71
+ {
72
+   "source": "React",
73
+   "doNotTranslateFor": ["ko", "ja"],
74
+   "targets": {
75
+     "zh": "React框架"
76
+   }
77
+ }
78
+ ```
79
+
80
+ ### Case Sensitivity
81
+
82
+ ```json
83
+ {
84
+   "source": "API",
85
+   "targets": { "ko": "API" },
86
+   "caseSensitive": true
87
+ }
88
+ ```
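The term fields shown above can be summarized as a type. The following is a minimal sketch, assuming the field names from the JSON examples; the `GlossaryTerm` interface and the `resolveTarget` helper are hypothetical names for illustration, not the package's exported API. The precedence it uses (`doNotTranslate` first, then `doNotTranslateFor`, then `targets`) is also an assumption.

```typescript
// Hypothetical type inferred from the JSON examples in this guide.
// The package's real type names and required/optional fields may differ.
interface GlossaryTerm {
  source: string;
  targets?: Record<string, string>;
  context?: string;
  doNotTranslate?: boolean;
  doNotTranslateFor?: string[];
  caseSensitive?: boolean;
}

// Returns the rendering a term should have in one target language,
// or undefined when the glossary says nothing for that language.
// Assumed precedence: doNotTranslate, then doNotTranslateFor, then targets.
function resolveTarget(term: GlossaryTerm, lang: string): string | undefined {
  if (term.doNotTranslate) return term.source;
  if (term.doNotTranslateFor?.includes(lang)) return term.source;
  return term.targets?.[lang];
}

const react: GlossaryTerm = {
  source: "React",
  doNotTranslateFor: ["ko", "ja"],
  targets: { zh: "React框架" },
};

console.log(resolveTarget(react, "ko")); // "React"
console.log(resolveTarget(react, "zh")); // "React框架"
console.log(resolveTarget(react, "fr")); // undefined
```

A term that resolves to `undefined` for a language corresponds to the troubleshooting case where the target language is missing from `targets`.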
89
+
90
+ ## Complete Example
91
+
92
+ ```json
93
+ {
94
+   "sourceLanguage": "en",
95
+   "version": "1.0.0",
96
+   "description": "Technical documentation glossary",
97
+   "terms": [
98
+     {
99
+       "source": "component",
100
+       "targets": { "ko": "컴포넌트", "ja": "コンポーネント" },
101
+       "context": "UI component"
102
+     },
103
+     {
104
+       "source": "prop",
105
+       "targets": { "ko": "프롭", "ja": "プロップ" },
106
+       "context": "React component property"
107
+     },
108
+     {
109
+       "source": "hook",
110
+       "targets": { "ko": "훅", "ja": "フック" },
111
+       "context": "React hook (useState, useEffect, etc.)"
112
+     },
113
+     {
114
+       "source": "state",
115
+       "targets": { "ko": "상태", "ja": "ステート" },
116
+       "context": "Component or application state"
117
+     },
118
+     {
119
+       "source": "TypeScript",
120
+       "doNotTranslate": true
121
+     },
122
+     {
123
+       "source": "JavaScript",
124
+       "doNotTranslate": true
125
+     },
126
+     {
127
+       "source": "npm",
128
+       "doNotTranslate": true
129
+     },
130
+     {
131
+       "source": "API",
132
+       "doNotTranslate": true,
133
+       "caseSensitive": true
134
+     }
135
+   ]
136
+ }
137
+ ```
138
+
139
+ ## CLI Commands
140
+
141
+ ### List Glossary Terms
142
+
143
+ ```bash
144
+ llm-translate glossary list --glossary glossary.json
145
+ ```
146
+
147
+ ### Validate Glossary
148
+
149
+ ```bash
150
+ llm-translate glossary validate --glossary glossary.json
151
+ ```
152
+
153
+ ### Add a Term
154
+
155
+ ```bash
156
+ llm-translate glossary add "container" --target ko="컨테이너" --glossary glossary.json
157
+ ```
158
+
159
+ ### Remove a Term
160
+
161
+ ```bash
162
+ llm-translate glossary remove "container" --glossary glossary.json
163
+ ```
164
+
165
+ ## Best Practices
166
+
167
+ ### 1. Start with Common Technical Terms
168
+
169
+ ```json
170
+ {
171
+   "terms": [
172
+     { "source": "API", "doNotTranslate": true },
173
+     { "source": "SDK", "doNotTranslate": true },
174
+     { "source": "CLI", "doNotTranslate": true },
175
+     { "source": "URL", "doNotTranslate": true },
176
+     { "source": "JSON", "doNotTranslate": true }
177
+   ]
178
+ }
179
+ ```
180
+
181
+ ### 2. Include Product-Specific Terms
182
+
183
+ ```json
184
+ {
185
+   "source": "Workspace",
186
+   "targets": { "ko": "워크스페이스" },
187
+   "context": "Product-specific workspace feature"
188
+ }
189
+ ```
190
+
191
+ ### 3. Add Context for Ambiguous Terms
192
+
193
+ ```json
194
+ {
195
+   "source": "run",
196
+   "targets": { "ko": "실행" },
197
+   "context": "Execute a command or script"
198
+ }
199
+ ```
200
+
201
+ ### 4. Use `doNotTranslate` for Brand Names
202
+
203
+ ```json
204
+ {
205
+   "source": "GitHub",
206
+   "doNotTranslate": true
207
+ }
208
+ ```
209
+
210
+ ### 5. Version Your Glossary
211
+
212
+ Track glossary changes alongside your documentation.
213
+
214
+ ## How Terms Are Applied
215
+
216
+ 1. **Before translation**: The glossary is injected into the prompt
217
+ 2. **During translation**: The LLM sees required translations for each term
218
+ 3. **Quality evaluation**: Glossary compliance is scored (20% of total)
219
+ 4. **Refinement**: Missing terms are flagged for correction
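To make steps 3 and 4 above concrete, here is a rough sketch of how glossary compliance could be measured and missing terms flagged. It is an illustration under stated assumptions, not the package's scoring code: the `Requirement` shape and `checkCompliance` function are hypothetical, matching is plain substring search, and how this score is combined into the overall quality score (the 20% share) is not shown.

```typescript
// Hypothetical shape: one glossary requirement already resolved for the
// target language (source term plus the rendering the output must contain).
interface Requirement {
  source: string;
  expected: string;
  caseSensitive?: boolean;
}

// Naive substring test; it does not respect word boundaries.
function contains(haystack: string, needle: string, caseSensitive = false): boolean {
  return caseSensitive
    ? haystack.includes(needle)
    : haystack.toLowerCase().includes(needle.toLowerCase());
}

// A requirement applies only if its source term occurs in the source text.
// The score is the share of applicable requirements whose expected
// rendering occurs in the translation (1 when nothing applies).
function checkCompliance(
  sourceText: string,
  translation: string,
  requirements: Requirement[],
): { score: number; missing: string[] } {
  const applicable = requirements.filter((r) =>
    contains(sourceText, r.source, r.caseSensitive),
  );
  const missing = applicable.filter(
    (r) => !contains(translation, r.expected, r.caseSensitive),
  );
  const score =
    applicable.length === 0 ? 1 : 1 - missing.length / applicable.length;
  return { score, missing: missing.map((r) => r.source) };
}

const report = checkCompliance(
  "Each component exposes an endpoint.",
  "각 컴포넌트는 엔드 포인트를 노출합니다.", // "엔드 포인트" has a stray space
  [
    { source: "component", expected: "컴포넌트" },
    { source: "endpoint", expected: "엔드포인트" },
  ],
);

console.log(report.score);   // 0.5
console.log(report.missing); // [ "endpoint" ]
```

The `missing` list is the kind of information a refinement pass could feed back to the model as terms to correct.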
220
+
221
+ ## Troubleshooting
222
+
223
+ ### Term Not Being Applied
224
+
225
+ - Check case sensitivity settings
226
+ - Ensure the term appears in the source text exactly as defined
227
+ - Verify the target language is in the `targets` object
228
+
229
+ ### Inconsistent Translations
230
+
231
+ - Add more context to disambiguate
232
+ - Check for duplicate terms with different translations
233
+ - Increase the quality threshold to enforce compliance
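Checking for duplicate terms with different translations can be automated. The sketch below is one possible approach, assuming the term fields from the JSON examples in this guide; `findConflicts` is a hypothetical helper, not a command or function the package is known to provide. As a simplification, a source term is case-folded unless that entry sets `caseSensitive`.

```typescript
// Hypothetical minimal term shape, following the JSON examples in this guide.
interface Term {
  source: string;
  targets?: Record<string, string>;
  caseSensitive?: boolean;
}

// Groups entries by source term and reports every language for which
// the same source term has more than one distinct target.
function findConflicts(terms: Term[]): string[] {
  const seen = new Map<string, Map<string, Set<string>>>();

  for (const term of terms) {
    const key = term.caseSensitive ? term.source : term.source.toLowerCase();
    const byLang = seen.get(key) ?? new Map<string, Set<string>>();
    seen.set(key, byLang);

    for (const lang of Object.keys(term.targets ?? {})) {
      const variants = byLang.get(lang) ?? new Set<string>();
      variants.add(term.targets![lang]);
      byLang.set(lang, variants);
    }
  }

  const conflicts: string[] = [];
  seen.forEach((byLang, source) => {
    byLang.forEach((variants, lang) => {
      if (variants.size > 1) {
        conflicts.push(`${source} [${lang}]: ${Array.from(variants).join(" vs ")}`);
      }
    });
  });
  return conflicts;
}

const conflicts = findConflicts([
  { source: "state", targets: { ko: "상태", ja: "ステート" } },
  { source: "State", targets: { ko: "스테이트" } },
  { source: "hook", targets: { ko: "훅" } },
]);

console.log(conflicts); // [ "state [ko]: 상태 vs 스테이트" ]
```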
234
+
235
+ ### Glossary Too Large
236
+
237
+ Large glossaries increase token usage. Consider:
238
+
239
+ - Splitting by domain/project
240
+ - Using selective glossary injection (coming soon)
241
+ - Removing rarely-used terms
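Until selective injection is available, one way to see which terms a given document actually needs is to filter the glossary against the source text. This is only a sketch of that idea, not the upcoming feature: `selectRelevantTerms` is a hypothetical helper, and it uses simple substring matching that ignores word boundaries.

```typescript
// Hypothetical minimal term shape, following the JSON examples in this guide.
interface SourceTerm {
  source: string;
  caseSensitive?: boolean;
}

// Keeps only the terms whose source string occurs in the text.
// Matching is case-insensitive unless the term sets caseSensitive.
function selectRelevantTerms<T extends SourceTerm>(terms: T[], text: string): T[] {
  const lowered = text.toLowerCase();
  return terms.filter((term) =>
    term.caseSensitive
      ? text.includes(term.source)
      : lowered.includes(term.source.toLowerCase()),
  );
}

const relevant = selectRelevantTerms(
  [
    { source: "API", caseSensitive: true },
    { source: "SDK" },
    { source: "npm" },
  ],
  "Install the CLI with npm, then call the API.",
);

console.log(relevant.map((term) => term.source)); // [ "API", "npm" ]
```

Terms that never survive this filter across a documentation set are candidates for removal or for a separate domain-specific glossary.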
@@ -0,0 +1,86 @@
1
+ # What is llm-translate?
2
+
3
+ ::: info Translations
4
+ All non-English documentation is automatically translated using Claude Sonnet 4.
5
+ :::
6
+
7
+ llm-translate is a CLI tool for translating documents using Large Language Models. It's designed specifically for technical documentation where consistency, accuracy, and format preservation are critical.
8
+
9
+ ## Key Features
10
+
11
+ ### Glossary Enforcement
12
+
13
+ Define domain-specific terminology once and ensure it's translated consistently across all your documents.
14
+
15
+ ```json
16
+ {
17
+   "terms": [
18
+     {
19
+       "source": "API endpoint",
20
+       "targets": { "ko": "API 엔드포인트", "ja": "APIエンドポイント" }
21
+     },
22
+     {
23
+       "source": "Kubernetes",
24
+       "doNotTranslate": true
25
+     }
26
+   ]
27
+ }
28
+ ```
29
+
30
+ ### Self-Refine Quality Control
31
+
32
+ llm-translate doesn't just translate once and call it done. It uses an iterative refinement process:
33
+
34
+ 1. **Initial Translation** - Generate first translation with glossary
35
+ 2. **Quality Evaluation** - Score against semantic accuracy, fluency, glossary compliance, and format preservation
36
+ 3. **Reflection** - Identify specific issues if the quality threshold is not met
37
+ 4. **Improvement** - Apply targeted fixes
38
+ 5. **Repeat** - Continue until quality >= threshold or max iterations reached
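The five steps above amount to a bounded loop. The following sketch shows the control flow only, under the assumption that each step is an async call supplied by the caller; the `RefineSteps` interface and `selfRefine` function are hypothetical names, and the real agent in `src/core/agent.ts` may be structured differently.

```typescript
// Hypothetical step functions; in practice each would call an LLM provider.
interface RefineSteps {
  translate(source: string): Promise<string>;
  evaluate(source: string, translation: string): Promise<number>;
  reflect(source: string, translation: string): Promise<string[]>;
  improve(source: string, translation: string, issues: string[]): Promise<string>;
}

interface RefineResult {
  translation: string;
  score: number;
  iterations: number;
}

// Translate once, then refine until the score reaches the threshold
// or the iteration budget is used up.
async function selfRefine(
  source: string,
  steps: RefineSteps,
  threshold: number,
  maxIterations: number,
): Promise<RefineResult> {
  let translation = await steps.translate(source);  // 1. initial translation
  let score = await steps.evaluate(source, translation); // 2. quality evaluation
  let iterations = 0;

  while (score < threshold && iterations < maxIterations) {
    const issues = await steps.reflect(source, translation);        // 3. reflection
    translation = await steps.improve(source, translation, issues); // 4. improvement
    score = await steps.evaluate(source, translation);              // 5. repeat
    iterations++;
  }
  return { translation, score, iterations };
}

// Stub steps with scripted scores, so the loop can be exercised offline.
function makeStubSteps(scores: number[]): RefineSteps {
  let call = 0;
  return {
    translate: async () => "v0",
    evaluate: async () => scores[Math.min(call++, scores.length - 1)],
    reflect: async () => ["glossary term missing"],
    improve: async (_source, translation) => translation + "+",
  };
}

selfRefine("Hello", makeStubSteps([60, 75, 90]), 85, 4).then((result) => {
  console.log(result); // { translation: "v0++", score: 90, iterations: 2 }
});
```

Passing the steps in as an interface keeps the loop independent of any particular provider and makes it testable with stubs like the one shown.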
39
+
40
+ ### Prompt Caching
41
+
42
+ For Claude models, llm-translate automatically uses prompt caching to reduce costs:
43
+
44
+ - System instructions and glossary are cached across chunks
45
+ - Subsequent requests use cached tokens at a 90% discount
46
+ - Especially effective for large documents with many chunks
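A small worked calculation shows why the effect grows with the number of chunks. This is a back-of-the-envelope sketch, assuming the cached prefix (system instructions plus glossary) is billed at 10% of the normal input rate on every request after the first, and ignoring any surcharge a provider may apply when the cache is first written. The function name and the token counts are made up for illustration.

```typescript
// Compares billed input-token equivalents with and without prompt caching.
// Assumption: cached prefix tokens are billed at `cachedReadPercent` percent
// of the normal rate on every request after the first; cache-write surcharges
// are ignored.
function estimateBilledInputTokens(
  prefixTokens: number,
  chunkTokens: number[],
  cachedReadPercent = 10,
): { uncached: number; cached: number } {
  let uncached = 0;
  let cached = 0;

  chunkTokens.forEach((tokens, index) => {
    // Without caching, the full prefix is billed on every request.
    uncached += prefixTokens + tokens;

    // With caching, only the first request pays the full prefix price.
    const prefixBilled =
      index === 0 ? prefixTokens : (prefixTokens * cachedReadPercent) / 100;
    cached += prefixBilled + tokens;
  });

  return { uncached, cached };
}

// Illustrative numbers: a 2,000-token prefix and ten 1,000-token chunks.
const tenChunks = new Array<number>(10).fill(1000);
const estimate = estimateBilledInputTokens(2000, tenChunks);

console.log(estimate.uncached); // 30000
console.log(estimate.cached);   // 13800
```

With these example numbers the billed input drops by a little more than half; with a single chunk there is no saving at all, which matches the note that caching pays off mainly for large documents.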
47
+
48
+ ### Format Preservation
49
+
50
+ The tool uses AST-based parsing to preserve:
51
+
52
+ - Markdown formatting (headers, lists, tables, code blocks)
53
+ - HTML tags and attributes
54
+ - Links and images
55
+ - Document structure and hierarchy
56
+
57
+ ## Use Cases
58
+
59
+ - **Technical Documentation** - Translate README, API docs, user guides
60
+ - **Knowledge Bases** - Multilingual support articles
61
+ - **Product Content** - Release notes, changelogs, feature descriptions
62
+ - **Developer Resources** - Tutorials, guides, code comments
63
+
64
+ ## Architecture
65
+
66
+ ```
67
+ ┌─────────────┐     ┌──────────────┐     ┌─────────────┐
68
+ │   Parser    │────▶│   Chunker    │────▶│    Agent    │
69
+ │  (MD/HTML)  │     │  (Semantic)  │     │(Self-Refine)│
70
+ └─────────────┘     └──────────────┘     └─────────────┘
71
+
72
+                     ┌──────────────┐            │
73
+                     │   Provider   │◀───────────┘
74
+                     │ (Claude/GPT) │
75
+                     └──────────────┘
76
+ ```
77
+
78
+ ## Comparison
79
+
80
+ | Feature | llm-translate | Generic LLM | Traditional MT |
81
+ |---------|--------------|-------------|----------------|
82
+ | Glossary enforcement | ✅ | ❌ | ⚠️ Limited |
83
+ | Quality control | ✅ Self-refine | ❌ | ❌ |
84
+ | Format preservation | ✅ AST-based | ⚠️ Prompt-based | ❌ |
85
+ | Cost optimization | ✅ Caching | ❌ | N/A |
86
+ | Code block handling | ✅ Protected | ⚠️ | ❌ |