llm_translate 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +0 -5
- data/README.zh.md +148 -61
- data/content/llm_translate.yml +1 -6
- data/lib/llm_translate/ai_client.rb +3 -3
- data/lib/llm_translate/config.rb +0 -13
- data/lib/llm_translate/translator_engine.rb +1 -140
- data/lib/llm_translate/version.rb +1 -1
- data/llm_translate.yml +1 -6
- data/test_llm_translate.yml +1 -5
- data/test_new_config.yml +1 -6
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c1231cf03fe5984a00c4bccf9596b2255dd83c4852bac5bd554068e3853f1fad
+  data.tar.gz: 208488f548d281ae103eb10ad3790201c41a62cff8acb0b24c36e786181da3ed
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bc4713b2292f56da626f79beafda11144a45f8fc84ed660f4f333afa2cf93b592285bb74b5746cb2610cf5bf41c81006ffb8ad648c6784365692404e4ff56495
+  data.tar.gz: 0f618ca45a6b97cf3fed6410d2dcf73bd80fd17abc0e91c443110021da5da0b484945b45f9813ef46aa671714c821ad1aa73f51530165cc9eceea437b32ed3e3
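The SHA256/SHA512 entries above cover the gem's `metadata.gz` and `data.tar.gz` artifacts. As a minimal sketch, a downloaded artifact can be checked against the published values with Ruby's standard `Digest` library (the file path in the usage comment is illustrative):

```ruby
require 'digest'

# Compute the two digests that checksums.yaml records for each gem artifact.
def artifact_checksums(path)
  bytes = File.binread(path)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Usage (illustrative): compare against the published checksum.
# artifact_checksums('data.tar.gz')['SHA256'] ==
#   '208488f548d281ae103eb10ad3790201c41a62cff8acb0b24c36e786181da3ed'
```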
data/README.md
CHANGED
@@ -177,11 +177,6 @@ translation:
   default_prompt: "Your custom prompt with {content} placeholder"
   preserve_formatting: true
   translate_code_comments: false
-  preserve_patterns:
-    - "```[\\s\\S]*?```" # Code blocks
-    - "`[^`]+`" # Inline code
-    - "\\[.*?\\]\\(.*?\\)" # Links
-    - "!\\[.*?\\]\\(.*?\\)" # Images
 
 # File Processing
 files:
data/README.zh.md
CHANGED
@@ -1,21 +1,21 @@
 # LlmTranslate
 
-
+AI 驱动的 Markdown 翻译工具,可在使用各种 AI 提供商翻译内容时保留格式。
 
-##
+## 功能特性
 
-- 🤖 **AI
-- 📝 **Markdown
-- 🔧 **灵活配置**:基于YAML的配置,支持环境变量
+- 🤖 **AI 驱动翻译**:支持 OpenAI、Anthropic 和 Ollama
+- 📝 **Markdown 格式保留**:保持代码块、链接、图片和格式完整
+- 🔧 **灵活配置**:基于 YAML 的配置,支持环境变量
 - 📁 **批量处理**:递归处理整个目录结构
-- 🚀 **CLI
-- 📊
-- ⚡
+- 🚀 **CLI 界面**:使用 Thor 的易用命令行界面
+- 📊 **进度跟踪**:内置日志记录和报告功能
+- ⚡ **错误处理**:带有重试机制的强大错误处理
 - 🎯 **可定制**:自定义提示、文件模式和输出策略
 
 ## 安装
 
-
+将以下行添加到您应用程序的 Gemfile 中:
 
 ```ruby
 gem 'llm_translate'
@@ -27,15 +27,15 @@ gem 'llm_translate'
 bundle install
 ```
 
-
+或者自行安装:
 
 ```bash
 gem install llm_translate
 ```
 
-##
+## 依赖项
 
-该gem
+该 gem 需要 `rubyllm` gem 进行 AI 集成:
 
 ```bash
 gem install rubyllm
@@ -48,22 +48,22 @@ gem install rubyllm
 llm_translate init
 ```
 
-2. **设置您的API密钥**:
+2. **设置您的 API 密钥**:
 ```bash
 export LLM_TRANSLATE_API_KEY="your-api-key-here"
 ```
 
-3. **翻译您的markdown文件**:
+3. **翻译您的 markdown 文件**:
 ```bash
-llm_translate translate --config ./
+llm_translate translate --config ./llm_translate.yml
 ```
 
 ## 配置
 
-翻译器使用YAML
+翻译器使用 YAML 配置文件。这是一个最小示例:
 
 ```yaml
-#
+# llm_translate.yml
 ai:
   api_key: ${LLM_TRANSLATE_API_KEY}
   provider: "openai"
@@ -91,7 +91,7 @@ logging:
   output: "console"
 ```
 
-### AI
+### AI 提供商
 
 #### OpenAI
 ```yaml
@@ -109,101 +109,188 @@ ai:
   model: "claude-3-sonnet-20240229"
 ```
 
-#### Ollama
-```
-
-
+#### Ollama (本地)
+```yaml
+ai:
+  provider: "ollama"
+  model: "llama2"
+  # 如果不使用默认设置,请设置 OLLAMA_HOST 环境变量
+```
 
-##
+## 使用方法
 
 ### 基本翻译
 
 #### 目录模式(默认)
 ```bash
-
-```
+llm_translate translate --config ./llm_translate.yml
+```
 
 #### 单文件模式
-
-llm_translate init
-```7output_file`:
+要翻译单个文件,请在配置中设置 `input_file` 和 `output_file`:
 
-```
-
-
+```yaml
+files:
+  # 单文件模式
+  input_file: "./README.md"
+  output_file: "./README.zh.md"
+```
 
-
-llm_translate init
-```7output_file`时,翻译器将以单文件模式运行,忽略与目录相关的设置。
+当同时指定 `input_file` 和 `output_file` 时,翻译器将以单文件模式运行,忽略与目录相关的设置。
 
 ### 命令行选项
 
 ```bash
-
-
+llm_translate translate [OPTIONS]
+
+Options:
+  -c, --config PATH    配置文件路径(默认:./llm_translate.yml)
+  -i, --input PATH     输入目录(覆盖配置)
+  -o, --output PATH    输出目录(覆盖配置)
+  -p, --prompt TEXT    自定义翻译提示(覆盖配置)
+  -v, --verbose        启用详细输出
+  -d, --dry-run        执行试运行,不进行实际翻译
+
+Other Commands:
+  llm_translate init      初始化新的配置文件
+  llm_translate version   显示版本信息
+```
 
 ### 配置文件结构
 
-```
-
-
-
-
+```yaml
+# AI 配置
+ai:
+  api_key: ${LLM_TRANSLATE_API_KEY}
+  provider: "openai" # openai, anthropic, ollama
+  model: "gpt-4"
+  temperature: 0.3
+  max_tokens: 4000
+  retry_attempts: 3
+  retry_delay: 2
+  timeout: 60
+
+# 翻译设置
+translation:
+  target_language: "zh-CN"
+  source_language: "auto"
+  default_prompt: "您的自定义提示,包含 {content} 占位符"
+  preserve_formatting: true
+  translate_code_comments: false
+
+# 文件处理
+files:
+  input_directory: "./docs"
+  output_directory: "./docs-translated"
+  filename_strategy: "suffix" # suffix, replace, directory
+  filename_suffix: ".zh"
+  include_patterns:
+    - "**/*.md"
+    - "**/*.markdown"
+  exclude_patterns:
+    - "**/node_modules/**"
+    - "**/.*"
+  preserve_directory_structure: true
+  overwrite_policy: "ask" # ask, overwrite, skip, backup
+  backup_directory: "./backups"
+
+# 日志记录
+logging:
+  level: "info" # debug, info, warn, error
+  output: "console" # console, file, both
+  file_path: "./logs/translator.log"
+  verbose_translation: false
+  error_log_path: "./logs/errors.log"
+
+# 错误处理
+error_handling:
+  on_error: "log_and_continue" # stop, log_and_continue, skip_file
+  max_consecutive_errors: 5
+  retry_on_failure: 2
+  generate_error_report: true
+  error_report_path: "./logs/error_report.md"
+
+# 性能
+performance:
+  concurrent_files: 3
+  batch_size: 5
+  request_interval: 1 # 请求之间的秒数
+  max_memory_mb: 500
+
+# 输出
+output:
+  show_progress: true
+  show_statistics: true
+  generate_report: true
+  report_path: "./reports/translation_report.md"
+  format: "markdown"
+  include_metadata: true
+```
 
 ## 示例
 
 ### 翻译文档
 
 ```bash
-
-
+# 将 ./docs 中的所有 markdown 文件翻译为中文
+llm_translate translate --input ./docs --output ./docs-zh
+
+# 使用自定义提示
+llm_translate translate --prompt "翻译以下内容为中文,保持技术术语不变: {content}"
+
+# 试运行以查看将要翻译的内容
+llm_translate translate --dry-run --verbose
+```
 
 ### 批量翻译
 
 ```bash
-
-
+# 翻译多种语言版本
+for lang in zh-CN ja-JP ko-KR; do
+  llm_translate translate --config "./configs/llm_translate-${lang}.yml"
+done
+```
 
 ## 开发
 
-
+检出仓库后,运行:
 
 ```bash
 bundle install
 ```
 
-
+运行测试:
 
 ```bash
-bundle
-```
+bundle exec rspec
+```
 
-
+运行代码检查:
 
 ```bash
-
-```
+bundle exec rubocop
+```
 
-
+将此 gem 安装到您的本地机器:
 
 ```bash
-
-```
+bundle exec rake install
+```
 
 ## 贡献
 
-欢迎在GitHub
+欢迎在 GitHub 上提交错误报告和拉取请求:https://github.com/llm_translate/llm_translate。
 
 ## 许可证
 
-该gem
+该 gem 根据 [MIT 许可证](https://opensource.org/licenses/MIT) 的条款作为开源软件提供。
 
 ## 更新日志
 
 ### v0.1.0
--
-- 支持OpenAI、Anthropic和Ollama
-- Markdown格式保留
+- 初始版本
+- 支持 OpenAI、Anthropic 和 Ollama 提供商
+- Markdown 格式保留
 - 可配置的翻译提示
 - 批量文件处理
--
+- 全面的错误处理和日志记录
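The configuration shown in the README above references environment variables with `${...}` syntax (`api_key: ${LLM_TRANSLATE_API_KEY}`). The gem's actual substitution code is not part of this diff; a hypothetical sketch of how such references could be expanded before the YAML is parsed:

```ruby
require 'yaml'

# Hypothetical helper: expand ${VAR} references in raw YAML text to the
# value of the corresponding environment variable (empty string if unset),
# mirroring the api_key: ${LLM_TRANSLATE_API_KEY} entry above.
def expand_env_refs(raw_yaml)
  raw_yaml.gsub(/\$\{(\w+)\}/) { ENV.fetch(Regexp.last_match(1), '') }
end

ENV['LLM_TRANSLATE_API_KEY'] = 'sk-example'
config = YAML.safe_load(expand_env_refs("ai:\n  api_key: ${LLM_TRANSLATE_API_KEY}\n"))
config.dig('ai', 'api_key')  # => "sk-example"
```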
data/content/llm_translate.yml
CHANGED
data/lib/llm_translate/ai_client.rb
CHANGED
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 require 'ruby_llm'
+# require 'pry'
+
 module LlmTranslate
   class AiClient
     attr_reader :config, :logger
@@ -56,16 +58,14 @@ module LlmTranslate
 
     def configure_ruby_llm
       RubyLLM.configure do |config_obj|
-        # For aihubmix.com or any custom host, use OpenAI-compatible API
        config_obj.openai_api_key = config.api_key
        config_obj.openai_api_base = config.ai_host
-        config_obj.default_model = config.ai_model
       end
     end
 
     def make_request(prompt)
       chat = RubyLLM.chat
-             .with_model(config.ai_model)
+             .with_model(config.ai_model, assume_exists: true, provider: config.ai_provider)
              .with_temperature(config.temperature)
 
       response = chat.ask(prompt)
data/lib/llm_translate/config.rb
CHANGED
@@ -75,10 +75,6 @@ module LlmTranslate
       data.dig('translation', 'translate_code_comments') == true
     end
 
-    def preserve_patterns
-      data.dig('translation', 'preserve_patterns') || default_preserve_patterns
-    end
-
     # File Configuration
     def input_directory
       cli_options[:input] || data.dig('files', 'input_directory') || './docs'
@@ -266,14 +262,5 @@ module LlmTranslate
         {content}
       PROMPT
     end
-
-    def default_preserve_patterns
-      [
-        '```[\\s\\S]*?```', # Code blocks
-        '`[^`]+`', # Inline code
-        '\\[.*?\\]\\(.*?\\)', # Links
-        '!\\[.*?\\]\\(.*?\\)' # Images
-      ]
-    end
   end
 end
data/lib/llm_translate/translator_engine.rb
CHANGED
@@ -87,147 +87,8 @@ module LlmTranslate
     end
 
     def translate_with_format_preservation(content)
-      # Extract and preserve special markdown elements
-      preserved_elements = extract_preserved_elements(content)
-
-      # Replace preserved elements with placeholders
-      content_with_placeholders = replace_with_placeholders(content, preserved_elements)
-
       # Translate the content with placeholders
-
-
-      # Restore preserved elements
-      restore_preserved_elements(translated_content, preserved_elements)
-    end
-
-    def extract_preserved_elements(content)
-      preserved = {}
-      pattern_index = 0
-
-      config.preserve_patterns.each do |pattern|
-        regex = Regexp.new(pattern, Regexp::MULTILINE)
-
-        content.scan(regex) do |match|
-          # Handle both single match and capture groups
-          match_text = match.is_a?(Array) ? match[0] : match
-          placeholder = "PRESERVED_ELEMENT_#{pattern_index}"
-          preserved[placeholder] = match_text
-          pattern_index += 1
-        end
-      end
-
-      preserved
-    end
-
-    def replace_with_placeholders(content, preserved_elements)
-      result = content.dup
-
-      preserved_elements.each do |placeholder, original_text|
-        # Escape special regex characters in the original text
-        escaped_text = Regexp.escape(original_text)
-        result = result.gsub(Regexp.new(escaped_text), placeholder)
-      end
-
-      result
-    end
-
-    def restore_preserved_elements(translated_content, preserved_elements)
-      result = translated_content.dup
-
-      preserved_elements.each do |placeholder, original_text|
-        result = result.gsub(placeholder, original_text)
-      end
-
-      result
-    end
-
-    # Additional helper methods for handling special cases
-
-    def split_large_content(content, max_size = 3000)
-      # Split content into chunks if it's too large for the AI model
-      return [content] if content.length <= max_size
-
-      chunks = []
-      lines = content.split("\n")
-      current_chunk = ''
-
-      lines.each do |line|
-        # If adding this line would exceed the limit, start a new chunk
-        if "#{current_chunk}#{line}\n".length > max_size && !current_chunk.empty?
-          chunks << current_chunk.strip
-          current_chunk = "#{line}\n"
-        else
-          current_chunk += "#{line}\n"
-        end
-      end
-
-      # Add the last chunk if it's not empty
-      chunks << current_chunk.strip unless current_chunk.strip.empty?
-
-      chunks
-    end
-
-    def translate_large_content(content)
-      chunks = split_large_content(content)
-
-      return ai_client.translate(content) if chunks.length == 1
-
-      logger.info "Splitting large content into #{chunks.length} chunks"
-
-      translated_chunks = chunks.map.with_index do |chunk, index|
-        logger.debug "Translating chunk #{index + 1}/#{chunks.length}"
-
-        translated = ai_client.translate(chunk)
-
-        # Add delay between chunks to avoid rate limiting
-        sleep(config.request_interval) if config.request_interval.positive? && index < chunks.length - 1
-
-        translated
-      end
-
-      translated_chunks.join("\n\n")
-    end
-
-    def detect_language(content)
-      # Simple language detection based on content
-      # This is a basic implementation - could be enhanced with a proper language detection library
-
-      # Check for common English words
-      english_indicators = %w[the and or but with from this that these those]
-      chinese_indicators = %w[的 在 是 和 或者 但是 这 那]
-
-      english_score = english_indicators.count { |word| content.downcase.include?(word) }
-      chinese_score = chinese_indicators.count { |word| content.include?(word) }
-
-      if chinese_score > english_score
-        'zh'
-      elsif english_score.positive?
-        'en'
-      else
-        config.source_language
-      end
-    end
-
-    def should_translate_content?(content)
-      # Skip translation if content is mostly code or already in target language
-
-      # Skip if content is mostly code blocks
-      code_block_pattern = /```[\s\S]*?```/m
-      code_blocks = content.scan(code_block_pattern)
-      code_length = code_blocks.join.length
-
-      if code_length > content.length * 0.8
-        logger.debug 'Skipping translation: content is mostly code blocks'
-        return false
-      end
-
-      # Skip if content is very short
-      if content.strip.length < 10
-        logger.debug 'Skipping translation: content too short'
-        return false
-      end
-
-      true
+      ai_client.translate(content)
     end
   end
 end
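The roughly 140 lines removed above implemented placeholder-based format preservation; in 0.2.0 the engine simply sends the content to the model and relies on the prompt to preserve formatting. The removed round-trip worked along these lines (a condensed sketch, not the exact code):

```ruby
# Condensed sketch of the removed protect/translate/restore round-trip:
# matches of a preserve pattern are swapped for placeholder tokens before
# translation and substituted back afterwards.
def protect(content, pattern)
  preserved = {}
  index = 0
  replaced = content.gsub(pattern) do |match|
    token = "PRESERVED_ELEMENT_#{index}"
    preserved[token] = match
    index += 1
    token
  end
  [replaced, preserved]
end

def restore(text, preserved)
  preserved.reduce(text) { |acc, (token, original)| acc.gsub(token, original) }
end

text, saved = protect('Run `bundle install` first.', /`[^`]+`/)
text                 # => "Run PRESERVED_ELEMENT_0 first."
restore(text, saved) # => "Run `bundle install` first."
```

A weakness of this scheme, and a plausible reason for its removal, is that the model must echo the placeholder tokens back verbatim for the restore step to work.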
data/llm_translate.yml
CHANGED
data/test_llm_translate.yml
CHANGED
data/test_new_config.yml
CHANGED