llm_translate 0.1.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 584007a41c9b59041a1ecd9e42c96231c66fdc606edb19567c58fd342d838a15
-   data.tar.gz: 1b9afce81578be82bcfd4f81a7fa6050073d9eef8848a7ee43ef9110b38ead94
+   metadata.gz: b3d7bffb10cabd77729e1e806a256a16bd7a036faa0ea52b7024e3a521dada5e
+   data.tar.gz: 50fcf8a22940afb311387913d2455dc519812abe8618be54c47541808193afd6
  SHA512:
-   metadata.gz: a959c5dfcf20bfc2bc94f787cf8c3e49ee75417a5be461a0b2650d023ce964670dd566e118e764b3152c3afdc2a22d2ac33605fdfca7909e55848e0761b0515c
-   data.tar.gz: d56df324476f82d6deb7e4b28bba026349661e4b5c389e1068e09dae36ecae7c38a8d38cee8cbfa861c686db3f063f3715cc0c18430e462d86f4ebf69079ce72
+   metadata.gz: 76e635e0838f893377ed9ba05e8e4e04c36053cbb4c68b2ab09cbd8c69d8a7433400886e1f817647d9b322fc7dad1e4643cc53b6471fc7ed2b6538df015108d7
+   data.tar.gz: 713bb859602fd5f511444f8e491bcd556beb7954f2f03cb36ff79938ff1e07e60c5419dd2c9818dc4d1f95136af8d4722b649a267e4217128598a26036b50532
data/README.md CHANGED
@@ -177,11 +177,6 @@ translation:
177
177
  default_prompt: "Your custom prompt with {content} placeholder"
178
178
  preserve_formatting: true
179
179
  translate_code_comments: false
180
- preserve_patterns:
181
- - "```[\\s\\S]*?```" # Code blocks
182
- - "`[^`]+`" # Inline code
183
- - "\\[.*?\\]\\(.*?\\)" # Links
184
- - "!\\[.*?\\]\\(.*?\\)" # Images
185
180
 
186
181
  # File Processing
187
182
  files:
data/README.zh.md CHANGED
@@ -1,21 +1,21 @@
1
1
  # LlmTranslate
2
2
 
3
- 一个由AI驱动的Markdown翻译器,能够在翻译内容时保持格式不变,同时使用各种AI提供者。
3
+ AI 驱动的 Markdown 翻译工具,在使用各种 AI 提供商翻译内容的同时保持格式不变。
4
4
 
5
- ## 特性
5
+ ## 功能特性
6
6
 
7
- - 🤖 **AI驱动的翻译**:支持OpenAI、Anthropic和Ollama
8
- - 📝 **Markdown格式保留**:保持代码块、链接、图像和格式不变
9
- - 🔧 **灵活配置**:基于YAML的配置,支持环境变量
7
+ - 🤖 **AI 驱动翻译**:支持 OpenAI、Anthropic Ollama
8
+ - 📝 **Markdown 格式保留**:保持代码块、链接、图片和格式不变
9
+ - 🔧 **灵活配置**:基于 YAML 的配置,支持环境变量
10
10
  - 📁 **批量处理**:递归处理整个目录结构
11
- - 🚀 **CLI接口**:易于使用的命令行接口,使用Thor
12
- - 📊 **进度跟踪**:内置日志记录和报告
13
- - ⚡ **错误处理**:强大的错误处理机制,带有重试机制
14
- - 🎯 **可定制**:自定义提示、文件模式和输出策略
11
+ - 🚀 **CLI 界面**:使用 Thor 实现的易用命令行界面
12
+ - 📊 **进度跟踪**:内置日志记录和报告功能
13
+ - ⚡ **错误处理**:具有重试机制的强大错误处理
14
+ - 🎯 **可定制性**:自定义提示、文件模式和输出策略
15
15
 
16
16
  ## 安装
17
17
 
18
- 将此行添加到您应用程序的Gemfile中:
18
+ 将以下行添加到您应用程序的 Gemfile 中:
19
19
 
20
20
  ```ruby
21
21
  gem 'llm_translate'
@@ -27,15 +27,15 @@ gem 'llm_translate'
27
27
  bundle install
28
28
  ```
29
29
 
30
- 或者您也可以自己安装:
30
+ 或者自行安装:
31
31
 
32
32
  ```bash
33
33
  gem install llm_translate
34
34
  ```
35
35
 
36
- ## 依赖
36
+ ## 依赖项
37
37
 
38
- 该gem需要`rubyllm` gem进行AI集成:
38
+ gem 需要 `rubyllm` gem 进行 AI 集成:
39
39
 
40
40
  ```bash
41
41
  gem install rubyllm
@@ -48,22 +48,22 @@ gem install rubyllm
48
48
  llm_translate init
49
49
  ```
50
50
 
51
- 2. **设置您的API密钥**:
51
+ 2. **设置您的 API 密钥**:
52
52
  ```bash
53
53
  export LLM_TRANSLATE_API_KEY="your-api-key-here"
54
54
  ```
55
55
 
56
- 3. **翻译您的markdown文件**:
56
+ 3. **翻译您的 markdown 文件**:
57
57
  ```bash
58
- llm_translate translate --config ./translator.yml
58
+ llm_translate translate --config ./llm_translate.yml
59
59
  ```
60
60
 
61
61
  ## 配置
62
62
 
63
- 翻译器使用YAML配置文件。以下是一个最小示例:
63
+ 翻译器使用 YAML 配置文件。这是一个最小示例:
64
64
 
65
65
  ```yaml
66
- # translator.yml
66
+ # llm_translate.yml
67
67
  ai:
68
68
  api_key: ${LLM_TRANSLATE_API_KEY}
69
69
  provider: "openai"
@@ -91,7 +91,7 @@ logging:
91
91
  output: "console"
92
92
  ```
93
93
 
94
- ### AI提供者
94
+ ### AI 提供商
95
95
 
96
96
  #### OpenAI
97
97
  ```yaml
@@ -109,101 +109,188 @@ ai:
109
109
  model: "claude-3-sonnet-20240229"
110
110
  ```
111
111
 
112
- #### Ollama(本地)
113
- ```bash
114
- bundle install
115
- ```0
112
+ #### Ollama (本地)
113
+ ```yaml
114
+ ai:
115
+ provider: "ollama"
116
+ model: "llama2"
117
+ # 如果不使用默认设置,请设置 OLLAMA_HOST 环境变量
118
+ ```
116
119
 
117
- ## 使用
120
+ ## 使用方法
118
121
 
119
122
  ### 基本翻译
120
123
 
121
124
  #### 目录模式(默认)
122
125
  ```bash
123
- bundle install
124
- ```1
126
+ llm_translate translate --config ./llm_translate.yml
127
+ ```
125
128
 
126
129
  #### 单文件模式
127
- 要翻译单个文件,请在配置中设置`input_file```bash
128
- llm_translate init
129
- ```7output_file`:
130
+ 要翻译单个文件,请在配置中设置 `input_file` 和 `output_file`:
130
131
 
131
- ```bash
132
- bundle install
133
- ```2
132
+ ```yaml
133
+ files:
134
+ # 单文件模式
135
+ input_file: "./README.md"
136
+ output_file: "./README.zh.md"
137
+ ```
134
138
 
135
- 当同时指定`input_file```bash
136
- llm_translate init
137
- ```7output_file`时,翻译器将以单文件模式运行,忽略与目录相关的设置。
139
+ 当同时指定 `input_file` 和 `output_file` 时,翻译器将以单文件模式运行,忽略与目录相关的设置。
138
140
 
139
141
  ### 命令行选项
140
142
 
141
143
  ```bash
142
- bundle install
143
- ```3
144
+ llm_translate translate [OPTIONS]
145
+
146
+ Options:
147
+ -c, --config PATH 配置文件路径(默认:./llm_translate.yml)
148
+ -i, --input PATH 输入目录(覆盖配置)
149
+ -o, --output PATH 输出目录(覆盖配置)
150
+ -p, --prompt TEXT 自定义翻译提示(覆盖配置)
151
+ -v, --verbose 启用详细输出
152
+ -d, --dry-run 执行试运行而不进行实际翻译
153
+
154
+ Other Commands:
155
+ llm_translate init 初始化新的配置文件
156
+ llm_translate version 显示版本信息
157
+ ```
144
158
 
145
159
  ### 配置文件结构
146
160
 
147
- ```bash
148
- bundle install
149
- ```4[\s\S]*?```bash
150
- bundle install
151
- ```5
161
+ ```yaml
162
+ # AI 配置
163
+ ai:
164
+ api_key: ${LLM_TRANSLATE_API_KEY}
165
+ provider: "openai" # openai, anthropic, ollama
166
+ model: "gpt-4"
167
+ temperature: 0.3
168
+ max_tokens: 4000
169
+ retry_attempts: 3
170
+ retry_delay: 2
171
+ timeout: 60
172
+
173
+ # 翻译设置
174
+ translation:
175
+ target_language: "zh-CN"
176
+ source_language: "auto"
177
+ default_prompt: "您的自定义提示,包含 {content} 占位符"
178
+ preserve_formatting: true
179
+ translate_code_comments: false
180
+
181
+ # 文件处理
182
+ files:
183
+ input_directory: "./docs"
184
+ output_directory: "./docs-translated"
185
+ filename_strategy: "suffix" # suffix, replace, directory
186
+ filename_suffix: ".zh"
187
+ include_patterns:
188
+ - "**/*.md"
189
+ - "**/*.markdown"
190
+ exclude_patterns:
191
+ - "**/node_modules/**"
192
+ - "**/.*"
193
+ preserve_directory_structure: true
194
+ overwrite_policy: "ask" # ask, overwrite, skip, backup
195
+ backup_directory: "./backups"
196
+
197
+ # 日志记录
198
+ logging:
199
+ level: "info" # debug, info, warn, error
200
+ output: "console" # console, file, both
201
+ file_path: "./logs/translator.log"
202
+ verbose_translation: false
203
+ error_log_path: "./logs/errors.log"
204
+
205
+ # 错误处理
206
+ error_handling:
207
+ on_error: "log_and_continue" # stop, log_and_continue, skip_file
208
+ max_consecutive_errors: 5
209
+ retry_on_failure: 2
210
+ generate_error_report: true
211
+ error_report_path: "./logs/error_report.md"
212
+
213
+ # 性能
214
+ performance:
215
+ concurrent_files: 3
216
+ batch_size: 5
217
+ request_interval: 1 # 请求之间的秒数
218
+ max_memory_mb: 500
219
+
220
+ # 输出
221
+ output:
222
+ show_progress: true
223
+ show_statistics: true
224
+ generate_report: true
225
+ report_path: "./reports/translation_report.md"
226
+ format: "markdown"
227
+ include_metadata: true
228
+ ```
152
229
 
153
230
  ## 示例
154
231
 
155
232
  ### 翻译文档
156
233
 
157
234
  ```bash
158
- bundle install
159
- ```6
235
+ # 将 ./docs 中的所有 markdown 文件翻译为中文
236
+ llm_translate translate --input ./docs --output ./docs-zh
237
+
238
+ # 使用自定义提示
239
+ llm_translate translate --prompt "翻译以下内容为中文,保持技术术语不变: {content}"
240
+
241
+ # 试运行以查看将要翻译的内容
242
+ llm_translate translate --dry-run --verbose
243
+ ```
160
244
 
161
245
  ### 批量翻译
162
246
 
163
247
  ```bash
164
- bundle install
165
- ```7
248
+ # 翻译多种语言版本
249
+ for lang in zh-CN ja-JP ko-KR; do
250
+ llm_translate translate --config "./configs/llm_translate-${lang}.yml"
251
+ done
252
+ ```
166
253
 
167
254
  ## 开发
168
255
 
169
- 克隆代码库后,运行:
256
+ 检出仓库后,运行:
170
257
 
171
258
  ```bash
172
259
  bundle install
173
260
  ```
174
261
 
175
- 要运行测试:
262
+ 运行测试:
176
263
 
177
264
  ```bash
178
- bundle install
179
- ```9
265
+ bundle exec rspec
266
+ ```
180
267
 
181
- 要运行代码检查:
268
+ 运行代码检查:
182
269
 
183
270
  ```bash
184
- gem install llm_translate
185
- ```0
271
+ bundle exec rubocop
272
+ ```
186
273
 
187
- 要将此gem安装到您的本地机器上:
274
+ 将此 gem 安装到您的本地机器:
188
275
 
189
276
  ```bash
190
- gem install llm_translate
191
- ```1
277
+ bundle exec rake install
278
+ ```
192
279
 
193
280
  ## 贡献
194
281
 
195
- 欢迎在GitHub上提交错误报告和拉取请求,地址为 https://github.com/translator/translator
282
+ 欢迎在 GitHub 上提交错误报告和拉取请求:https://github.com/llm_translate/llm_translate
196
283
 
197
284
  ## 许可证
198
285
 
199
- 该gem[MIT许可证](https://opensource.org/licenses/MIT)条款下作为开源软件提供。
286
+ gem 根据 [MIT 许可证](https://opensource.org/licenses/MIT) 的条款作为开源软件提供。
200
287
 
201
288
  ## 更新日志
202
289
 
203
290
  ### v0.1.0
204
- - 初始发布
205
- - 支持OpenAI、Anthropic和Ollama提供者
206
- - Markdown格式保留
291
+ - 初始版本
292
+ - 支持 OpenAI、Anthropic Ollama 提供商
293
+ - Markdown 格式保留
207
294
  - 可配置的翻译提示
208
295
  - 批量文件处理
209
- - 综合的错误处理和日志记录
296
+ - 全面的错误处理和日志记录
@@ -50,12 +50,7 @@ translation:
50
50
  # 是否翻译代码注释
51
51
  translate_code_comments: false
52
52
 
53
- # 需要保留不翻译的内容模式
54
- preserve_patterns:
55
- - "```[\\s\\S]*?```" # 代码块
56
- - "`[^`]+`" # 行内代码
57
- - "\\[.*?\\]\\(.*?\\)" # 链接
58
- - "!\\[.*?\\]\\(.*?\\)" # 图片
53
+
59
54
 
60
55
  # 文件处理配置
61
56
  files:
@@ -1,6 +1,8 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require 'ruby_llm'
4
+ # require 'pry'
5
+
4
6
  module LlmTranslate
5
7
  class AiClient
6
8
  attr_reader :config, :logger
@@ -56,16 +58,14 @@ module LlmTranslate
56
58
 
57
59
  def configure_ruby_llm
58
60
  RubyLLM.configure do |config_obj|
59
- # For aihubmix.com or any custom host, use OpenAI-compatible API
60
61
  config_obj.openai_api_key = config.api_key
61
62
  config_obj.openai_api_base = config.ai_host
62
- config_obj.default_model = config.ai_model
63
63
  end
64
64
  end
65
65
 
66
66
  def make_request(prompt)
67
67
  chat = RubyLLM.chat
68
- .with_model(config.ai_model)
68
+ .with_model(config.ai_model, assume_exists: true, provider: config.ai_provider)
69
69
  .with_temperature(config.temperature)
70
70
 
71
71
  response = chat.ask(prompt)
@@ -70,24 +70,19 @@ module LlmTranslate
70
70
  end
71
71
 
72
72
  # Translate files
73
- success_count = 0
74
- error_count = 0
75
-
76
- files.each_with_index do |file_path, index|
77
- logger.info "[#{index + 1}/#{files.length}] Processing: #{file_path}"
78
-
79
- translator_engine.translate_file(file_path) unless options[:dry_run]
73
+ if options[:dry_run]
74
+ logger.info "DRY RUN: Would translate #{files.length} files with #{config.concurrent_files} concurrent threads"
75
+ success_count = files.length
76
+ error_count = 0
77
+ else
78
+ logger.info "Starting translation with #{config.concurrent_files} concurrent files"
80
79
 
81
- success_count += 1
82
- logger.info "✓ Successfully processed: #{file_path}"
83
- rescue StandardError => e
84
- error_count += 1
85
- logger.error "✗ Failed to process #{file_path}: #{e.message}"
80
+ results = translator_engine.translate_files_concurrently(files)
81
+ success_count = results[:success].length
82
+ error_count = results[:error].length
86
83
 
87
- if config.should_stop_on_error?(error_count)
88
- logger.error 'Stopping due to too many consecutive errors'
89
- break
90
- end
84
+ # Check if we should stop on too many errors
85
+ logger.error "Stopping due to too many errors (#{error_count})" if config.should_stop_on_error?(error_count)
91
86
  end
92
87
 
93
88
  # Summary
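Review note: as far as this hunk shows, the new branch consults `should_stop_on_error?` only after `translate_files_concurrently` has returned, so the "Stopping" message is logged after the fact and no work is interrupted. The sketch below reproduces just the counting and the check, assuming the predicate that appears in the config hunk further down (`on_error == 'stop' || error_count >= max_consecutive_errors`). The method name `summarize` and its keyword arguments are hypothetical and not part of the gem.

```ruby
# Hypothetical helper; only the result shape and the predicate come from the diff.
def summarize(results, on_error:, max_consecutive_errors:)
  error_count = results[:error].length

  {
    success_count: results[:success].length,
    error_count: error_count,
    # Same boolean expression as should_stop_on_error? in the config hunk.
    stop_flagged: on_error == 'stop' || error_count >= max_consecutive_errors
  }
end

results = {
  success: ['docs/a.md', 'docs/b.md'],
  error: [{ file: 'docs/c.md', error: 'timeout' }]
}
p summarize(results, on_error: 'log_and_continue', max_consecutive_errors: 5)
```

One consequence of the predicate as written: with `on_error` set to `'stop'`, the flag is true even when the error list is empty, so the message would be logged on a fully successful run.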
@@ -75,10 +75,6 @@ module LlmTranslate
75
75
  data.dig('translation', 'translate_code_comments') == true
76
76
  end
77
77
 
78
- def preserve_patterns
79
- data.dig('translation', 'preserve_patterns') || default_preserve_patterns
80
- end
81
-
82
78
  # File Configuration
83
79
  def input_directory
84
80
  cli_options[:input] || data.dig('files', 'input_directory') || './docs'
@@ -124,10 +120,6 @@ module LlmTranslate
124
120
  data.dig('files', 'overwrite_policy') || 'ask'
125
121
  end
126
122
 
127
- def backup_directory
128
- data.dig('files', 'backup_directory') || './backups'
129
- end
130
-
131
123
  # Logging Configuration
132
124
  def log_level
133
125
  cli_options[:verbose] ? 'debug' : (data.dig('logging', 'level') || 'info')
@@ -137,18 +129,10 @@ module LlmTranslate
137
129
  data.dig('logging', 'output') || 'console'
138
130
  end
139
131
 
140
- def log_file_path
141
- data.dig('logging', 'file_path') || './logs/llm_translate.log'
142
- end
143
-
144
132
  def verbose_translation?
145
133
  cli_options[:verbose] || data.dig('logging', 'verbose_translation') == true
146
134
  end
147
135
 
148
- def error_log_path
149
- data.dig('logging', 'error_log_path') || './logs/errors.log'
150
- end
151
-
152
136
  # Error Handling Configuration
153
137
  def on_error
154
138
  data.dig('error_handling', 'on_error') || 'log_and_continue'
@@ -166,10 +150,6 @@ module LlmTranslate
166
150
  data.dig('error_handling', 'generate_error_report') != false
167
151
  end
168
152
 
169
- def error_report_path
170
- data.dig('error_handling', 'error_report_path') || './logs/error_report.md'
171
- end
172
-
173
153
  def should_stop_on_error?(error_count)
174
154
  on_error == 'stop' || error_count >= max_consecutive_errors
175
155
  end
@@ -179,27 +159,11 @@ module LlmTranslate
179
159
  data.dig('performance', 'concurrent_files') || 3
180
160
  end
181
161
 
182
- def batch_size
183
- data.dig('performance', 'batch_size') || 5
184
- end
185
-
186
162
  def request_interval
187
163
  data.dig('performance', 'request_interval') || 1
188
164
  end
189
165
 
190
- def max_memory_mb
191
- data.dig('performance', 'max_memory_mb') || 500
192
- end
193
-
194
166
  # Output Configuration
195
- def show_progress?
196
- data.dig('output', 'show_progress') != false
197
- end
198
-
199
- def show_statistics?
200
- data.dig('output', 'show_statistics') != false
201
- end
202
-
203
167
  def generate_report?
204
168
  data.dig('output', 'generate_report') != false
205
169
  end
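Review note: the accessors that survive in this file rely on three lookup idioms with different defaults (`|| value`, `!= false`, `== true`). The sketch below isolates them, assuming a plain nested Hash such as a parsed YAML file would produce. `SketchConfig` and `translate_code_comments?` are hypothetical names chosen for illustration; only the `dig` expressions are taken from the diff.

```ruby
# Hypothetical, cut-down reader; not the gem's Config class.
class SketchConfig
  def initialize(data)
    @data = data
  end

  # `|| 3`: a missing or nil key falls back to 3.
  def concurrent_files
    @data.dig('performance', 'concurrent_files') || 3
  end

  # `!= false`: enabled unless the key is explicitly false.
  def generate_report?
    @data.dig('output', 'generate_report') != false
  end

  # `== true`: disabled unless the key is explicitly true.
  def translate_code_comments?
    @data.dig('translation', 'translate_code_comments') == true
  end
end

empty = SketchConfig.new({})
p empty.concurrent_files
p empty.generate_report?
p empty.translate_code_comments?
```

The distinction matters for keys dropped in this release: a removed accessor no longer supplies any default, so a leftover key in an existing YAML file is simply never read.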
@@ -208,14 +172,6 @@ module LlmTranslate
208
172
  data.dig('output', 'report_path') || './reports/translation_report.md'
209
173
  end
210
174
 
211
- def output_format
212
- data.dig('output', 'format') || 'markdown'
213
- end
214
-
215
- def include_metadata?
216
- data.dig('output', 'include_metadata') != false
217
- end
218
-
219
175
  private
220
176
 
221
177
  def load_config_file(config_path)
@@ -266,14 +222,5 @@ module LlmTranslate
266
222
  {content}
267
223
  PROMPT
268
224
  end
269
-
270
- def default_preserve_patterns
271
- [
272
- '```[\\s\\S]*?```', # Code blocks
273
- '`[^`]+`', # Inline code
274
- '\\[.*?\\]\\(.*?\\)', # Links
275
- '!\\[.*?\\]\\(.*?\\)' # Images
276
- ]
277
- end
278
225
  end
279
226
  end
@@ -67,7 +67,7 @@ module LlmTranslate
67
67
  when 'console'
68
68
  create_console_logger
69
69
  when 'file'
70
- create_file_logger(config.log_file_path)
70
+ create_file_logger('./logs/llm_translate.log')
71
71
  when 'both'
72
72
  create_multi_logger
73
73
  else
@@ -76,15 +76,8 @@ module LlmTranslate
76
76
  end
77
77
 
78
78
  def create_error_logger
79
- return nil unless config.error_log_path
80
-
81
- FileUtils.mkdir_p(File.dirname(config.error_log_path))
82
- error_logger = ::Logger.new(config.error_log_path)
83
- error_logger.level = log_level_constant
84
- error_logger.formatter = proc do |severity, datetime, _progname, msg|
85
- "[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
86
- end
87
- error_logger
79
+ # Error logger is no longer supported, return nil
80
+ nil
88
81
  end
89
82
 
90
83
  def create_console_logger
@@ -119,7 +112,7 @@ module LlmTranslate
119
112
 
120
113
  def create_multi_logger
121
114
  console_logger = create_console_logger
122
- file_logger = create_file_logger(config.log_file_path)
115
+ file_logger = create_file_logger('./logs/llm_translate.log')
123
116
 
124
117
  MultiLogger.new([console_logger, file_logger])
125
118
  end
@@ -2,6 +2,7 @@
2
2
 
3
3
  require 'pathname'
4
4
  require 'fileutils'
5
+ require 'async'
5
6
 
6
7
  module LlmTranslate
7
8
  class TranslatorEngine
@@ -53,6 +54,66 @@ module LlmTranslate
53
54
  sleep(config.request_interval) if config.request_interval.positive?
54
55
  end
55
56
 
57
+ def translate_files_concurrently(file_paths)
58
+ return translate_files_sequentially(file_paths) if config.concurrent_files <= 1
59
+
60
+ results = { success: [], error: [] }
61
+
62
+ # Use Async to run concurrent translation tasks
63
+ Async do |task|
64
+ # Process files in batches to limit concurrency
65
+ file_paths.each_slice(config.concurrent_files) do |batch|
66
+ # Create async tasks for the current batch
67
+ batch_tasks = batch.map.with_index do |file_path, _batch_index|
68
+ # Calculate overall index
69
+ overall_index = file_paths.index(file_path) + 1
70
+
71
+ task.async do
72
+ logger.info "[#{overall_index}/#{file_paths.length}] Processing: #{file_path}"
73
+
74
+ # Translate the file
75
+ translate_file(file_path)
76
+
77
+ # Collect successful result
78
+ results[:success] << file_path
79
+
80
+ logger.info "✓ Successfully processed: #{file_path}"
81
+ { status: :success, file: file_path }
82
+ rescue StandardError => e
83
+ # Collect error result
84
+ results[:error] << { file: file_path, error: e.message }
85
+
86
+ logger.error "✗ Failed to process #{file_path}: #{e.message}"
87
+ { status: :error, file: file_path, error: e.message }
88
+ end
89
+ end
90
+
91
+ # Wait for all tasks in this batch to complete before starting the next batch
92
+ batch_tasks.each(&:wait)
93
+ end
94
+ end
95
+
96
+ results
97
+ end
98
+
99
+ def translate_files_sequentially(file_paths)
100
+ results = { success: [], error: [] }
101
+
102
+ file_paths.each_with_index do |file_path, index|
103
+ logger.info "[#{index + 1}/#{file_paths.length}] Processing: #{file_path}"
104
+
105
+ translate_file(file_path)
106
+
107
+ results[:success] << file_path
108
+ logger.info "✓ Successfully processed: #{file_path}"
109
+ rescue StandardError => e
110
+ results[:error] << { file: file_path, error: e.message }
111
+ logger.error "✗ Failed to process #{file_path}: #{e.message}"
112
+ end
113
+
114
+ results
115
+ end
116
+
56
117
  def translate_content(content, file_path = nil)
57
118
  if config.preserve_formatting?
58
119
  translate_with_format_preservation(content)
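Review note: in `translate_files_concurrently`, `file_paths.index(file_path) + 1` returns the position of the first match, so a path listed twice would log the same index both times, and each lookup scans the array. The sketch below shows the same batch-then-wait structure with the index derived from the batch number instead. It is a stand-in under stated assumptions: it uses `Thread` from the standard library rather than the `async` gem, and `process_in_batches` plus its block interface are hypothetical names, not part of the gem.

```ruby
# Hypothetical stand-in for the batching loop; Thread replaces Async here.
def process_in_batches(file_paths, batch_size)
  results = { success: [], error: [] }
  lock = Mutex.new # threads append concurrently, so guard the shared hash

  file_paths.each_slice(batch_size).with_index do |batch, batch_number|
    threads = batch.each_with_index.map do |file_path, offset|
      # Derived from the position, so duplicate paths get distinct indices.
      overall_index = (batch_number * batch_size) + offset + 1

      Thread.new do
        yield file_path, overall_index
        lock.synchronize { results[:success] << file_path }
      rescue StandardError => e
        lock.synchronize { results[:error] << { file: file_path, error: e.message } }
      end
    end

    # Wait for the whole batch before starting the next one.
    threads.each(&:join)
  end

  results
end

outcome = process_in_batches(%w[a.md b.md a.md], 2) do |path, index|
  puts "[#{index}/3] Processing: #{path}"
end
p outcome[:success].sort
```

The mutex is needed only because this sketch uses threads. Async tasks are fibers scheduled on a single thread, which is presumably why the code in the diff appends to `results` without a lock.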
@@ -87,147 +148,8 @@ module LlmTranslate
87
148
  end
88
149
 
89
150
  def translate_with_format_preservation(content)
90
- # Extract and preserve special markdown elements
91
- preserved_elements = extract_preserved_elements(content)
92
-
93
- # Replace preserved elements with placeholders
94
- content_with_placeholders = replace_with_placeholders(content, preserved_elements)
95
-
96
151
  # Translate the content with placeholders
97
- translated_content = ai_client.translate(content_with_placeholders)
98
-
99
- # Restore preserved elements
100
- restore_preserved_elements(translated_content, preserved_elements)
101
- end
102
-
103
- def extract_preserved_elements(content)
104
- preserved = {}
105
- pattern_index = 0
106
-
107
- config.preserve_patterns.each do |pattern|
108
- regex = Regexp.new(pattern, Regexp::MULTILINE)
109
-
110
- content.scan(regex) do |match|
111
- # Handle both single match and capture groups
112
- match_text = match.is_a?(Array) ? match[0] : match
113
- placeholder = "PRESERVED_ELEMENT_#{pattern_index}"
114
- preserved[placeholder] = match_text
115
- pattern_index += 1
116
- end
117
- end
118
-
119
- preserved
120
- end
121
-
122
- def replace_with_placeholders(content, preserved_elements)
123
- result = content.dup
124
-
125
- preserved_elements.each do |placeholder, original_text|
126
- # Escape special regex characters in the original text
127
- escaped_text = Regexp.escape(original_text)
128
- result = result.gsub(Regexp.new(escaped_text), placeholder)
129
- end
130
-
131
- result
132
- end
133
-
134
- def restore_preserved_elements(translated_content, preserved_elements)
135
- result = translated_content.dup
136
-
137
- preserved_elements.each do |placeholder, original_text|
138
- result = result.gsub(placeholder, original_text)
139
- end
140
-
141
- result
142
- end
143
-
144
- # Additional helper methods for handling special cases
145
-
146
- def split_large_content(content, max_size = 3000)
147
- # Split content into chunks if it's too large for the AI model
148
- return [content] if content.length <= max_size
149
-
150
- chunks = []
151
- lines = content.split("\n")
152
- current_chunk = ''
153
-
154
- lines.each do |line|
155
- # If adding this line would exceed the limit, start a new chunk
156
- if "#{current_chunk}#{line}\n".length > max_size && !current_chunk.empty?
157
- chunks << current_chunk.strip
158
- current_chunk = "#{line}\n"
159
- else
160
- current_chunk += "#{line}\n"
161
- end
162
- end
163
-
164
- # Add the last chunk if it's not empty
165
- chunks << current_chunk.strip unless current_chunk.strip.empty?
166
-
167
- chunks
168
- end
169
-
170
- def translate_large_content(content)
171
- chunks = split_large_content(content)
172
-
173
- return ai_client.translate(content) if chunks.length == 1
174
-
175
- logger.info "Splitting large content into #{chunks.length} chunks"
176
-
177
- translated_chunks = chunks.map.with_index do |chunk, index|
178
- logger.debug "Translating chunk #{index + 1}/#{chunks.length}"
179
-
180
- translated = ai_client.translate(chunk)
181
-
182
- # Add delay between chunks to avoid rate limiting
183
- sleep(config.request_interval) if config.request_interval.positive? && index < chunks.length - 1
184
-
185
- translated
186
- end
187
-
188
- translated_chunks.join("\n\n")
189
- end
190
-
191
- def detect_language(content)
192
- # Simple language detection based on content
193
- # This is a basic implementation - could be enhanced with a proper language detection library
194
-
195
- # Check for common English words
196
- english_indicators = %w[the and or but with from this that these those]
197
- chinese_indicators = %w[的 在 是 和 或者 但是 这 那]
198
-
199
- english_score = english_indicators.count { |word| content.downcase.include?(word) }
200
- chinese_score = chinese_indicators.count { |word| content.include?(word) }
201
-
202
- if chinese_score > english_score
203
- 'zh'
204
- elsif english_score.positive?
205
- 'en'
206
- else
207
- config.source_language
208
- end
209
- end
210
-
211
- def should_translate_content?(content)
212
- # Skip translation if content is mostly code or already in target language
213
-
214
- # Skip if content is mostly code blocks
215
- code_block_pattern = /```[\s\S]*?```/m
216
- code_blocks = content.scan(code_block_pattern)
217
- code_length = code_blocks.join.length
218
-
219
- if code_length > content.length * 0.8
220
- logger.debug 'Skipping translation: content is mostly code blocks'
221
- return false
222
- end
223
-
224
- # Skip if content is very short
225
- if content.strip.length < 10
226
- logger.debug 'Skipping translation: content too short'
227
- return false
228
- end
229
-
230
- true
152
+ ai_client.translate(content)
231
153
  end
232
154
  end
233
155
  end
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module LlmTranslate
-   VERSION = '0.1.0'
+   VERSION = '0.3.0'
  end
@@ -31,6 +31,7 @@ Gem::Specification.new do |spec|
    spec.require_paths = ['lib']

    # Dependencies
+   spec.add_dependency 'async', '~> 2.0'
    spec.add_dependency 'ruby_llm', '~> 1.6'
    spec.add_dependency 'thor', '~> 1.3'

data/llm_translate.yml CHANGED
@@ -50,12 +50,7 @@ translation:
50
50
  # 是否翻译代码注释
51
51
  translate_code_comments: false
52
52
 
53
- # 需要保留不翻译的内容模式
54
- preserve_patterns:
55
- - "```[\\s\\S]*?```" # 代码块
56
- - "`[^`]+`" # 行内代码
57
- - "\\[.*?\\]\\(.*?\\)" # 链接
58
- - "!\\[.*?\\]\\(.*?\\)" # 图片
53
+
59
54
 
60
55
  # 文件处理配置
61
56
  files:
@@ -48,11 +48,7 @@ translation:
48
48
  translate_code_comments: false
49
49
 
50
50
  # 需要保留不翻译的内容模式
51
- preserve_patterns:
52
- - "```[\\s\\S]*?```" # 代码块
53
- - "`[^`]+`" # 行内代码
54
- - "\\[.*?\\]\\(.*?\\)" # 链接
55
- - "!\\[.*?\\]\\(.*?\\)" # 图片
51
+
56
52
 
57
53
  # 文件处理配置
58
54
  files:
data/test_new_config.yml CHANGED
@@ -50,12 +50,7 @@ translation:
50
50
  # 是否翻译代码注释
51
51
  translate_code_comments: false
52
52
 
53
- # 需要保留不翻译的内容模式
54
- preserve_patterns:
55
- - "```[\\s\\S]*?```" # 代码块
56
- - "`[^`]+`" # 行内代码
57
- - "\\[.*?\\]\\(.*?\\)" # 链接
58
- - "!\\[.*?\\]\\(.*?\\)" # 图片
53
+
59
54
 
60
55
  # 文件处理配置
61
56
  files:
metadata CHANGED
@@ -1,15 +1,29 @@
  --- !ruby/object:Gem::Specification
  name: llm_translate
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.3.0
  platform: ruby
  authors:
  - LlmTranslate Team
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2025-08-19 00:00:00.000000000 Z
+ date: 2025-09-01 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+   name: async
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.0'
  - !ruby/object:Gem::Dependency
    name: ruby_llm
    requirement: !ruby/object:Gem::Requirement