llm_translate 0.1.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +0 -5
- data/README.zh.md +149 -62
- data/content/llm_translate.yml +1 -6
- data/lib/llm_translate/ai_client.rb +3 -3
- data/lib/llm_translate/cli.rb +11 -16
- data/lib/llm_translate/config.rb +0 -53
- data/lib/llm_translate/logger.rb +4 -11
- data/lib/llm_translate/translator_engine.rb +62 -140
- data/lib/llm_translate/version.rb +1 -1
- data/llm_translate.gemspec +1 -0
- data/llm_translate.yml +1 -6
- data/test_llm_translate.yml +1 -5
- data/test_new_config.yml +1 -6
- metadata +16 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b3d7bffb10cabd77729e1e806a256a16bd7a036faa0ea52b7024e3a521dada5e
+  data.tar.gz: 50fcf8a22940afb311387913d2455dc519812abe8618be54c47541808193afd6
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 76e635e0838f893377ed9ba05e8e4e04c36053cbb4c68b2ab09cbd8c69d8a7433400886e1f817647d9b322fc7dad1e4643cc53b6471fc7ed2b6538df015108d7
+  data.tar.gz: 713bb859602fd5f511444f8e491bcd556beb7954f2f03cb36ff79938ff1e07e60c5419dd2c9818dc4d1f95136af8d4722b649a267e4217128598a26036b50532
data/README.md
CHANGED
@@ -177,11 +177,6 @@ translation:
   default_prompt: "Your custom prompt with {content} placeholder"
   preserve_formatting: true
   translate_code_comments: false
-  preserve_patterns:
-    - "```[\\s\\S]*?```"      # Code blocks
-    - "`[^`]+`"               # Inline code
-    - "\\[.*?\\]\\(.*?\\)"    # Links
-    - "!\\[.*?\\]\\(.*?\\)"   # Images
 
 # File Processing
 files:
data/README.zh.md
CHANGED
@@ -1,21 +1,21 @@
 # LlmTranslate
 
-
+AI 驱动的 Markdown 翻译工具,在使用各种 AI 提供商翻译内容的同时保持格式不变。
 
-##
+## 功能特性
 
-- 🤖 **AI
-- 📝 **Markdown
-- 🔧 **灵活配置**:基于YAML的配置,支持环境变量
+- 🤖 **AI 驱动翻译**:支持 OpenAI、Anthropic 和 Ollama
+- 📝 **Markdown 格式保留**:保持代码块、链接、图片和格式不变
+- 🔧 **灵活配置**:基于 YAML 的配置,支持环境变量
 - 📁 **批量处理**:递归处理整个目录结构
-- 🚀 **CLI
-- 📊
-- ⚡
-- 🎯
+- 🚀 **CLI 界面**:使用 Thor 实现的易用命令行界面
+- 📊 **进度跟踪**:内置日志记录和报告功能
+- ⚡ **错误处理**:具有重试机制的强大错误处理
+- 🎯 **可定制性**:自定义提示、文件模式和输出策略
 
 ## 安装
 
-
+将以下行添加到您应用程序的 Gemfile 中:
 
 ```ruby
 gem 'llm_translate'
@@ -27,15 +27,15 @@ gem 'llm_translate'
 bundle install
 ```
 
-
+或者自行安装:
 
 ```bash
 gem install llm_translate
 ```
 
-##
+## 依赖项
 
-该gem
+该 gem 需要 `rubyllm` gem 进行 AI 集成:
 
 ```bash
 gem install rubyllm
@@ -48,22 +48,22 @@ gem install rubyllm
 llm_translate init
 ```
 
-2. **设置您的API密钥**:
+2. **设置您的 API 密钥**:
 ```bash
 export LLM_TRANSLATE_API_KEY="your-api-key-here"
 ```
 
-3. **翻译您的markdown文件**:
+3. **翻译您的 markdown 文件**:
 ```bash
-llm_translate translate --config ./
+llm_translate translate --config ./llm_translate.yml
 ```
 
 ## 配置
 
-翻译器使用YAML
+翻译器使用 YAML 配置文件。这是一个最小示例:
 
 ```yaml
-#
+# llm_translate.yml
 ai:
   api_key: ${LLM_TRANSLATE_API_KEY}
   provider: "openai"
@@ -91,7 +91,7 @@ logging:
   output: "console"
 ```
 
-### AI
+### AI 提供商
 
 #### OpenAI
 ```yaml
@@ -109,101 +109,188 @@ ai:
   model: "claude-3-sonnet-20240229"
 ```
 
-#### Ollama
-```
-
-
+#### Ollama (本地)
+```yaml
+ai:
+  provider: "ollama"
+  model: "llama2"
+  # 如果不使用默认设置,请设置 OLLAMA_HOST 环境变量
+```
 
-##
+## 使用方法
 
 ### 基本翻译
 
 #### 目录模式(默认)
 ```bash
-
-```
+llm_translate translate --config ./llm_translate.yml
+```
 
 #### 单文件模式
-
-llm_translate init
-```7output_file`:
+要翻译单个文件,请在配置中设置 `input_file` 和 `output_file`:
 
-```
-
-
+```yaml
+files:
+  # 单文件模式
+  input_file: "./README.md"
+  output_file: "./README.zh.md"
+```
 
-
-llm_translate init
-```7output_file`时,翻译器将以单文件模式运行,忽略与目录相关的设置。
+当同时指定 `input_file` 和 `output_file` 时,翻译器将以单文件模式运行,忽略与目录相关的设置。
 
 ### 命令行选项
 
 ```bash
-
-
+llm_translate translate [OPTIONS]
+
+Options:
+  -c, --config PATH     配置文件路径(默认:./llm_translate.yml)
+  -i, --input PATH      输入目录(覆盖配置)
+  -o, --output PATH     输出目录(覆盖配置)
+  -p, --prompt TEXT     自定义翻译提示(覆盖配置)
+  -v, --verbose         启用详细输出
+  -d, --dry-run         执行试运行而不进行实际翻译
+
+Other Commands:
+  llm_translate init        初始化新的配置文件
+  llm_translate version     显示版本信息
+```
 
 ### 配置文件结构
 
-```
-
-
-
-
+```yaml
+# AI 配置
+ai:
+  api_key: ${LLM_TRANSLATE_API_KEY}
+  provider: "openai"  # openai, anthropic, ollama
+  model: "gpt-4"
+  temperature: 0.3
+  max_tokens: 4000
+  retry_attempts: 3
+  retry_delay: 2
+  timeout: 60
+
+# 翻译设置
+translation:
+  target_language: "zh-CN"
+  source_language: "auto"
+  default_prompt: "您的自定义提示,包含 {content} 占位符"
+  preserve_formatting: true
+  translate_code_comments: false
+
+# 文件处理
+files:
+  input_directory: "./docs"
+  output_directory: "./docs-translated"
+  filename_strategy: "suffix"  # suffix, replace, directory
+  filename_suffix: ".zh"
+  include_patterns:
+    - "**/*.md"
+    - "**/*.markdown"
+  exclude_patterns:
+    - "**/node_modules/**"
+    - "**/.*"
+  preserve_directory_structure: true
+  overwrite_policy: "ask"  # ask, overwrite, skip, backup
+  backup_directory: "./backups"
+
+# 日志记录
+logging:
+  level: "info"  # debug, info, warn, error
+  output: "console"  # console, file, both
+  file_path: "./logs/translator.log"
+  verbose_translation: false
+  error_log_path: "./logs/errors.log"
+
+# 错误处理
+error_handling:
+  on_error: "log_and_continue"  # stop, log_and_continue, skip_file
+  max_consecutive_errors: 5
+  retry_on_failure: 2
+  generate_error_report: true
+  error_report_path: "./logs/error_report.md"
+
+# 性能
+performance:
+  concurrent_files: 3
+  batch_size: 5
+  request_interval: 1  # 请求之间的秒数
+  max_memory_mb: 500
+
+# 输出
+output:
+  show_progress: true
+  show_statistics: true
+  generate_report: true
+  report_path: "./reports/translation_report.md"
+  format: "markdown"
+  include_metadata: true
+```
 
 ## 示例
 
 ### 翻译文档
 
 ```bash
-
-
+# 将 ./docs 中的所有 markdown 文件翻译为中文
+llm_translate translate --input ./docs --output ./docs-zh
+
+# 使用自定义提示
+llm_translate translate --prompt "翻译以下内容为中文,保持技术术语不变: {content}"
+
+# 试运行以查看将要翻译的内容
+llm_translate translate --dry-run --verbose
+```
 
 ### 批量翻译
 
 ```bash
-
-
+# 翻译多种语言版本
+for lang in zh-CN ja-JP ko-KR; do
+  llm_translate translate --config "./configs/llm_translate-${lang}.yml"
+done
+```
 
 ## 开发
 
-
+检出仓库后,运行:
 
 ```bash
 bundle install
 ```
 
-
+运行测试:
 
 ```bash
-bundle
-```
+bundle exec rspec
+```
 
-
+运行代码检查:
 
 ```bash
-
-```
+bundle exec rubocop
+```
 
-
+将此 gem 安装到您的本地机器:
 
 ```bash
-
-```
+bundle exec rake install
+```
 
 ## 贡献
 
-欢迎在GitHub
+欢迎在 GitHub 上提交错误报告和拉取请求:https://github.com/llm_translate/llm_translate。
 
 ## 许可证
 
-该gem
+该 gem 根据 [MIT 许可证](https://opensource.org/licenses/MIT) 的条款作为开源软件提供。
 
 ## 更新日志
 
 ### v0.1.0
--
-- 支持OpenAI、Anthropic和Ollama
-- Markdown格式保留
+- 初始版本
+- 支持 OpenAI、Anthropic 和 Ollama 提供商
+- Markdown 格式保留
 - 可配置的翻译提示
 - 批量文件处理
--
+- 全面的错误处理和日志记录
data/content/llm_translate.yml
CHANGED
data/lib/llm_translate/ai_client.rb
CHANGED
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 require 'ruby_llm'
+# require 'pry'
+
 module LlmTranslate
   class AiClient
     attr_reader :config, :logger
@@ -56,16 +58,14 @@ module LlmTranslate
 
     def configure_ruby_llm
       RubyLLM.configure do |config_obj|
-        # For aihubmix.com or any custom host, use OpenAI-compatible API
        config_obj.openai_api_key = config.api_key
        config_obj.openai_api_base = config.ai_host
-        config_obj.default_model = config.ai_model
       end
     end
 
     def make_request(prompt)
       chat = RubyLLM.chat
-                    .with_model(config.ai_model)
+                    .with_model(config.ai_model, assume_exists: true, provider: config.ai_provider)
                     .with_temperature(config.temperature)
 
       response = chat.ask(prompt)
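Net effect of the two ai_client.rb hunks: the global default_model is dropped from configure_ruby_llm, and make_request now names the model and provider on every chat, passing assume_exists: true (presumably so custom models served through an OpenAI-compatible host are not rejected by model lookup). A minimal standalone sketch of that call pattern, using only the rubyllm calls visible above; the host, model, and prompt values are illustrative, and what the returned response object exposes is not shown in this diff:

```ruby
# Sketch only: mirrors the configure / make_request pattern from the hunks above.
require 'ruby_llm'

RubyLLM.configure do |config_obj|
  config_obj.openai_api_key  = ENV.fetch('LLM_TRANSLATE_API_KEY')
  config_obj.openai_api_base = 'https://aihubmix.com/v1' # illustrative OpenAI-compatible host
end

chat = RubyLLM.chat
              .with_model('gpt-4', assume_exists: true, provider: 'openai')
              .with_temperature(0.3)

# The diff only shows the response being captured; how it is unpacked is not visible here.
response = chat.ask('Translate to zh-CN: Hello, world.')
```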
data/lib/llm_translate/cli.rb
CHANGED
@@ -70,24 +70,19 @@ module LlmTranslate
       end
 
       # Translate files
-
-
-
-
-
-
-          translator_engine.translate_file(file_path) unless options[:dry_run]
+      if options[:dry_run]
+        logger.info "DRY RUN: Would translate #{files.length} files with #{config.concurrent_files} concurrent threads"
+        success_count = files.length
+        error_count = 0
+      else
+        logger.info "Starting translation with #{config.concurrent_files} concurrent files"
 
-
-
-
-            error_count += 1
-            logger.error "✗ Failed to process #{file_path}: #{e.message}"
+        results = translator_engine.translate_files_concurrently(files)
+        success_count = results[:success].length
+        error_count = results[:error].length
 
-          if
-
-            break
-          end
+        # Check if we should stop on too many errors
+        logger.error "Stopping due to too many errors (#{error_count})" if config.should_stop_on_error?(error_count)
       end
 
       # Summary
data/lib/llm_translate/config.rb
CHANGED
@@ -75,10 +75,6 @@ module LlmTranslate
       data.dig('translation', 'translate_code_comments') == true
     end
 
-    def preserve_patterns
-      data.dig('translation', 'preserve_patterns') || default_preserve_patterns
-    end
-
     # File Configuration
     def input_directory
       cli_options[:input] || data.dig('files', 'input_directory') || './docs'
@@ -124,10 +120,6 @@ module LlmTranslate
       data.dig('files', 'overwrite_policy') || 'ask'
     end
 
-    def backup_directory
-      data.dig('files', 'backup_directory') || './backups'
-    end
-
     # Logging Configuration
     def log_level
       cli_options[:verbose] ? 'debug' : (data.dig('logging', 'level') || 'info')
@@ -137,18 +129,10 @@ module LlmTranslate
       data.dig('logging', 'output') || 'console'
     end
 
-    def log_file_path
-      data.dig('logging', 'file_path') || './logs/llm_translate.log'
-    end
-
     def verbose_translation?
       cli_options[:verbose] || data.dig('logging', 'verbose_translation') == true
     end
 
-    def error_log_path
-      data.dig('logging', 'error_log_path') || './logs/errors.log'
-    end
-
     # Error Handling Configuration
     def on_error
       data.dig('error_handling', 'on_error') || 'log_and_continue'
@@ -166,10 +150,6 @@ module LlmTranslate
       data.dig('error_handling', 'generate_error_report') != false
     end
 
-    def error_report_path
-      data.dig('error_handling', 'error_report_path') || './logs/error_report.md'
-    end
-
     def should_stop_on_error?(error_count)
       on_error == 'stop' || error_count >= max_consecutive_errors
     end
@@ -179,27 +159,11 @@ module LlmTranslate
       data.dig('performance', 'concurrent_files') || 3
     end
 
-    def batch_size
-      data.dig('performance', 'batch_size') || 5
-    end
-
     def request_interval
       data.dig('performance', 'request_interval') || 1
     end
 
-    def max_memory_mb
-      data.dig('performance', 'max_memory_mb') || 500
-    end
-
     # Output Configuration
-    def show_progress?
-      data.dig('output', 'show_progress') != false
-    end
-
-    def show_statistics?
-      data.dig('output', 'show_statistics') != false
-    end
-
     def generate_report?
       data.dig('output', 'generate_report') != false
     end
@@ -208,14 +172,6 @@ module LlmTranslate
       data.dig('output', 'report_path') || './reports/translation_report.md'
     end
 
-    def output_format
-      data.dig('output', 'format') || 'markdown'
-    end
-
-    def include_metadata?
-      data.dig('output', 'include_metadata') != false
-    end
-
     private
 
     def load_config_file(config_path)
@@ -266,14 +222,5 @@ module LlmTranslate
       {content}
     PROMPT
     end
-
-    def default_preserve_patterns
-      [
-        '```[\\s\\S]*?```',       # Code blocks
-        '`[^`]+`',                # Inline code
-        '\\[.*?\\]\\(.*?\\)',     # Links
-        '!\\[.*?\\]\\(.*?\\)'     # Images
-      ]
-    end
   end
 end
data/lib/llm_translate/logger.rb
CHANGED
@@ -67,7 +67,7 @@ module LlmTranslate
       when 'console'
         create_console_logger
       when 'file'
-        create_file_logger(
+        create_file_logger('./logs/llm_translate.log')
       when 'both'
         create_multi_logger
       else
@@ -76,15 +76,8 @@ module LlmTranslate
     end
 
     def create_error_logger
-      return nil
-
-      FileUtils.mkdir_p(File.dirname(config.error_log_path))
-      error_logger = ::Logger.new(config.error_log_path)
-      error_logger.level = log_level_constant
-      error_logger.formatter = proc do |severity, datetime, _progname, msg|
-        "[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
-      end
-      error_logger
+      # Error logger is no longer supported, return nil
+      nil
     end
 
     def create_console_logger
@@ -119,7 +112,7 @@ module LlmTranslate
 
     def create_multi_logger
       console_logger = create_console_logger
-      file_logger = create_file_logger(
+      file_logger = create_file_logger('./logs/llm_translate.log')
 
       MultiLogger.new([console_logger, file_logger])
     end
data/lib/llm_translate/translator_engine.rb
CHANGED
@@ -2,6 +2,7 @@
 
 require 'pathname'
 require 'fileutils'
+require 'async'
 
 module LlmTranslate
   class TranslatorEngine
@@ -53,6 +54,66 @@ module LlmTranslate
       sleep(config.request_interval) if config.request_interval.positive?
     end
 
+    def translate_files_concurrently(file_paths)
+      return translate_files_sequentially(file_paths) if config.concurrent_files <= 1
+
+      results = { success: [], error: [] }
+
+      # Use Async to run concurrent translation tasks
+      Async do |task|
+        # Process files in batches to limit concurrency
+        file_paths.each_slice(config.concurrent_files) do |batch|
+          # Create async tasks for the current batch
+          batch_tasks = batch.map.with_index do |file_path, _batch_index|
+            # Calculate overall index
+            overall_index = file_paths.index(file_path) + 1
+
+            task.async do
+              logger.info "[#{overall_index}/#{file_paths.length}] Processing: #{file_path}"
+
+              # Translate the file
+              translate_file(file_path)
+
+              # Collect successful result
+              results[:success] << file_path
+
+              logger.info "✓ Successfully processed: #{file_path}"
+              { status: :success, file: file_path }
+            rescue StandardError => e
+              # Collect error result
+              results[:error] << { file: file_path, error: e.message }
+
+              logger.error "✗ Failed to process #{file_path}: #{e.message}"
+              { status: :error, file: file_path, error: e.message }
+            end
+          end
+
+          # Wait for all tasks in this batch to complete before starting the next batch
+          batch_tasks.each(&:wait)
+        end
+      end
+
+      results
+    end
+
+    def translate_files_sequentially(file_paths)
+      results = { success: [], error: [] }
+
+      file_paths.each_with_index do |file_path, index|
+        logger.info "[#{index + 1}/#{file_paths.length}] Processing: #{file_path}"
+
+        translate_file(file_path)
+
+        results[:success] << file_path
+        logger.info "✓ Successfully processed: #{file_path}"
+      rescue StandardError => e
+        results[:error] << { file: file_path, error: e.message }
+        logger.error "✗ Failed to process #{file_path}: #{e.message}"
+      end
+
+      results
+    end
+
     def translate_content(content, file_path = nil)
       if config.preserve_formatting?
         translate_with_format_preservation(content)
@@ -87,147 +148,8 @@ module LlmTranslate
     end
 
     def translate_with_format_preservation(content)
-      # Extract and preserve special markdown elements
-      preserved_elements = extract_preserved_elements(content)
-
-      # Replace preserved elements with placeholders
-      content_with_placeholders = replace_with_placeholders(content, preserved_elements)
-
       # Translate the content with placeholders
-
-
-      # Restore preserved elements
-      restore_preserved_elements(translated_content, preserved_elements)
-    end
-
-    def extract_preserved_elements(content)
-      preserved = {}
-      pattern_index = 0
-
-      config.preserve_patterns.each do |pattern|
-        regex = Regexp.new(pattern, Regexp::MULTILINE)
-
-        content.scan(regex) do |match|
-          # Handle both single match and capture groups
-          match_text = match.is_a?(Array) ? match[0] : match
-          placeholder = "PRESERVED_ELEMENT_#{pattern_index}"
-          preserved[placeholder] = match_text
-          pattern_index += 1
-        end
-      end
-
-      preserved
-    end
-
-    def replace_with_placeholders(content, preserved_elements)
-      result = content.dup
-
-      preserved_elements.each do |placeholder, original_text|
-        # Escape special regex characters in the original text
-        escaped_text = Regexp.escape(original_text)
-        result = result.gsub(Regexp.new(escaped_text), placeholder)
-      end
-
-      result
-    end
-
-    def restore_preserved_elements(translated_content, preserved_elements)
-      result = translated_content.dup
-
-      preserved_elements.each do |placeholder, original_text|
-        result = result.gsub(placeholder, original_text)
-      end
-
-      result
-    end
-
-    # Additional helper methods for handling special cases
-
-    def split_large_content(content, max_size = 3000)
-      # Split content into chunks if it's too large for the AI model
-      return [content] if content.length <= max_size
-
-      chunks = []
-      lines = content.split("\n")
-      current_chunk = ''
-
-      lines.each do |line|
-        # If adding this line would exceed the limit, start a new chunk
-        if "#{current_chunk}#{line}\n".length > max_size && !current_chunk.empty?
-          chunks << current_chunk.strip
-          current_chunk = "#{line}\n"
-        else
-          current_chunk += "#{line}\n"
-        end
-      end
-
-      # Add the last chunk if it's not empty
-      chunks << current_chunk.strip unless current_chunk.strip.empty?
-
-      chunks
-    end
-
-    def translate_large_content(content)
-      chunks = split_large_content(content)
-
-      return ai_client.translate(content) if chunks.length == 1
-
-      logger.info "Splitting large content into #{chunks.length} chunks"
-
-      translated_chunks = chunks.map.with_index do |chunk, index|
-        logger.debug "Translating chunk #{index + 1}/#{chunks.length}"
-
-        translated = ai_client.translate(chunk)
-
-        # Add delay between chunks to avoid rate limiting
-        sleep(config.request_interval) if config.request_interval.positive? && index < chunks.length - 1
-
-        translated
-      end
-
-      translated_chunks.join("\n\n")
-    end
-
-    def detect_language(content)
-      # Simple language detection based on content
-      # This is a basic implementation - could be enhanced with a proper language detection library
-
-      # Check for common English words
-      english_indicators = %w[the and or but with from this that these those]
-      chinese_indicators = %w[的 在 是 和 或者 但是 这 那]
-
-      english_score = english_indicators.count { |word| content.downcase.include?(word) }
-      chinese_score = chinese_indicators.count { |word| content.include?(word) }
-
-      if chinese_score > english_score
-        'zh'
-      elsif english_score.positive?
-        'en'
-      else
-        config.source_language
-      end
-    end
-
-    def should_translate_content?(content)
-      # Skip translation if content is mostly code or already in target language
-
-      # Skip if content is mostly code blocks
-      code_block_pattern = /```[\s\S]*?```/m
-      code_blocks = content.scan(code_block_pattern)
-      code_length = code_blocks.join.length
-
-      if code_length > content.length * 0.8
-        logger.debug 'Skipping translation: content is mostly code blocks'
-        return false
-      end
-
-      # Skip if content is very short
-      if content.strip.length < 10
-        logger.debug 'Skipping translation: content too short'
-        return false
-      end
-
-      true
+      ai_client.translate(content)
     end
   end
 end
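For orientation, the new public surface added above is translate_files_concurrently(file_paths): it slices the list into groups of config.concurrent_files, runs each group as Async tasks, waits for the group to finish before starting the next, and returns a hash with :success and :error arrays (falling back to translate_files_sequentially when concurrency is 1 or lower). A hypothetical caller, matching how cli.rb consumes the result; how the engine and its config are constructed is not shown in this diff:

```ruby
# Hypothetical usage of the result hash returned by translate_files_concurrently.
files = Dir.glob('./docs/**/*.md')

results = translator_engine.translate_files_concurrently(files)

puts "#{results[:success].length} translated, #{results[:error].length} failed"
results[:error].each { |err| warn "#{err[:file]}: #{err[:error]}" }
```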
data/llm_translate.gemspec
CHANGED
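The gemspec hunk is not expanded in this view (+1 -0 in the summary above). Given the async (~> 2.0) runtime dependency added to the gem metadata at the end of this diff, the new line is most likely a dependency declaration along these lines; this is an assumption, not the actual hunk:

```ruby
# Assumed addition to llm_translate.gemspec, matching the async runtime dependency in the metadata.
spec.add_dependency "async", "~> 2.0"
```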
data/llm_translate.yml
CHANGED
data/test_llm_translate.yml
CHANGED
data/test_new_config.yml
CHANGED
metadata
CHANGED
@@ -1,15 +1,29 @@
 --- !ruby/object:Gem::Specification
 name: llm_translate
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.3.0
 platform: ruby
 authors:
 - LlmTranslate Team
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-
+date: 2025-09-01 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: async
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
 - !ruby/object:Gem::Dependency
   name: ruby_llm
   requirement: !ruby/object:Gem::Requirement