llm_translate 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 584007a41c9b59041a1ecd9e42c96231c66fdc606edb19567c58fd342d838a15
- data.tar.gz: 1b9afce81578be82bcfd4f81a7fa6050073d9eef8848a7ee43ef9110b38ead94
+ metadata.gz: c1231cf03fe5984a00c4bccf9596b2255dd83c4852bac5bd554068e3853f1fad
+ data.tar.gz: 208488f548d281ae103eb10ad3790201c41a62cff8acb0b24c36e786181da3ed
  SHA512:
- metadata.gz: a959c5dfcf20bfc2bc94f787cf8c3e49ee75417a5be461a0b2650d023ce964670dd566e118e764b3152c3afdc2a22d2ac33605fdfca7909e55848e0761b0515c
- data.tar.gz: d56df324476f82d6deb7e4b28bba026349661e4b5c389e1068e09dae36ecae7c38a8d38cee8cbfa861c686db3f063f3715cc0c18430e462d86f4ebf69079ce72
+ metadata.gz: bc4713b2292f56da626f79beafda11144a45f8fc84ed660f4f333afa2cf93b592285bb74b5746cb2610cf5bf41c81006ffb8ad648c6784365692404e4ff56495
+ data.tar.gz: 0f618ca45a6b97cf3fed6410d2dcf73bd80fd17abc0e91c443110021da5da0b484945b45f9813ef46aa671714c821ad1aa73f51530165cc9eceea437b32ed3e3
data/README.md CHANGED
@@ -177,11 +177,6 @@ translation:
  default_prompt: "Your custom prompt with {content} placeholder"
  preserve_formatting: true
  translate_code_comments: false
- preserve_patterns:
- - "```[\\s\\S]*?```" # Code blocks
- - "`[^`]+`" # Inline code
- - "\\[.*?\\]\\(.*?\\)" # Links
- - "!\\[.*?\\]\\(.*?\\)" # Images
 
  # File Processing
  files:
data/README.zh.md CHANGED
@@ -1,21 +1,21 @@
  # LlmTranslate
 
- 一个由AI驱动的Markdown翻译器,能够在翻译内容时保持格式不变,同时使用各种AI提供者。
+ AI 驱动的 Markdown 翻译工具,可在使用各种 AI 提供商翻译内容时保留格式。
 
- ## 特性
+ ## 功能特性
 
- - 🤖 **AI驱动的翻译**:支持OpenAI、Anthropic和Ollama
- - 📝 **Markdown格式保留**:保持代码块、链接、图像和格式不变
- - 🔧 **灵活配置**:基于YAML的配置,支持环境变量
+ - 🤖 **AI 驱动翻译**:支持 OpenAI、Anthropic Ollama
+ - 📝 **Markdown 格式保留**:保持代码块、链接、图片和格式完整
+ - 🔧 **灵活配置**:基于 YAML 的配置,支持环境变量
  - 📁 **批量处理**:递归处理整个目录结构
- - 🚀 **CLI接口**:易于使用的命令行接口,使用Thor
- - 📊 **进度跟踪**:内置日志记录和报告
- - ⚡ **错误处理**:强大的错误处理机制,带有重试机制
+ - 🚀 **CLI 界面**:使用 Thor 的易用命令行界面
+ - 📊 **进度跟踪**:内置日志记录和报告功能
+ - ⚡ **错误处理**:带有重试机制的强大错误处理
  - 🎯 **可定制**:自定义提示、文件模式和输出策略
 
  ## 安装
 
- 将此行添加到您应用程序的Gemfile中:
+ 将以下行添加到您应用程序的 Gemfile 中:
 
  ```ruby
  gem 'llm_translate'
@@ -27,15 +27,15 @@ gem 'llm_translate'
  bundle install
  ```
 
- 或者您也可以自己安装:
+ 或者自行安装:
 
  ```bash
  gem install llm_translate
  ```
 
- ## 依赖
+ ## 依赖项
 
- 该gem需要`rubyllm` gem进行AI集成:
+ gem 需要 `rubyllm` gem 进行 AI 集成:
 
  ```bash
  gem install rubyllm
@@ -48,22 +48,22 @@ gem install rubyllm
  llm_translate init
  ```
 
- 2. **设置您的API密钥**:
+ 2. **设置您的 API 密钥**:
  ```bash
  export LLM_TRANSLATE_API_KEY="your-api-key-here"
  ```
 
- 3. **翻译您的markdown文件**:
+ 3. **翻译您的 markdown 文件**:
  ```bash
- llm_translate translate --config ./translator.yml
+ llm_translate translate --config ./llm_translate.yml
  ```
 
  ## 配置
 
- 翻译器使用YAML配置文件。以下是一个最小示例:
+ 翻译器使用 YAML 配置文件。这是一个最小示例:
 
  ```yaml
- # translator.yml
+ # llm_translate.yml
  ai:
  api_key: ${LLM_TRANSLATE_API_KEY}
  provider: "openai"
@@ -91,7 +91,7 @@ logging:
  output: "console"
  ```
 
- ### AI提供者
+ ### AI 提供商
 
  #### OpenAI
  ```yaml
@@ -109,101 +109,188 @@ ai:
  model: "claude-3-sonnet-20240229"
  ```
 
- #### Ollama(本地)
- ```bash
- bundle install
- ```0
+ #### Ollama (本地)
+ ```yaml
+ ai:
+ provider: "ollama"
+ model: "llama2"
+ # 如果不使用默认设置,请设置 OLLAMA_HOST 环境变量
+ ```
 
- ## 使用
+ ## 使用方法
 
  ### 基本翻译
 
  #### 目录模式(默认)
  ```bash
- bundle install
- ```1
+ llm_translate translate --config ./llm_translate.yml
+ ```
 
  #### 单文件模式
- 要翻译单个文件,请在配置中设置`input_file```bash
- llm_translate init
- ```7output_file`:
+ 要翻译单个文件,请在配置中设置 `input_file` 和 `output_file`:
 
- ```bash
- bundle install
- ```2
+ ```yaml
+ files:
+ # 单文件模式
+ input_file: "./README.md"
+ output_file: "./README.zh.md"
+ ```
 
- 当同时指定`input_file```bash
- llm_translate init
- ```7output_file`时,翻译器将以单文件模式运行,忽略与目录相关的设置。
+ 当同时指定 `input_file` 和 `output_file` 时,翻译器将以单文件模式运行,忽略与目录相关的设置。
 
  ### 命令行选项
 
  ```bash
- bundle install
- ```3
+ llm_translate translate [OPTIONS]
+
+ Options:
+ -c, --config PATH 配置文件路径(默认:./llm_translate.yml)
+ -i, --input PATH 输入目录(覆盖配置)
+ -o, --output PATH 输出目录(覆盖配置)
+ -p, --prompt TEXT 自定义翻译提示(覆盖配置)
+ -v, --verbose 启用详细输出
+ -d, --dry-run 执行试运行,不进行实际翻译
+
+ Other Commands:
+ llm_translate init 初始化新的配置文件
+ llm_translate version 显示版本信息
+ ```
 
  ### 配置文件结构
 
- ```bash
- bundle install
- ```4[\s\S]*?```bash
- bundle install
- ```5
+ ```yaml
+ # AI 配置
+ ai:
+ api_key: ${LLM_TRANSLATE_API_KEY}
+ provider: "openai" # openai, anthropic, ollama
+ model: "gpt-4"
+ temperature: 0.3
+ max_tokens: 4000
+ retry_attempts: 3
+ retry_delay: 2
+ timeout: 60
+
+ # 翻译设置
+ translation:
+ target_language: "zh-CN"
+ source_language: "auto"
+ default_prompt: "您的自定义提示,包含 {content} 占位符"
+ preserve_formatting: true
+ translate_code_comments: false
+
+ # 文件处理
+ files:
+ input_directory: "./docs"
+ output_directory: "./docs-translated"
+ filename_strategy: "suffix" # suffix, replace, directory
+ filename_suffix: ".zh"
+ include_patterns:
+ - "**/*.md"
+ - "**/*.markdown"
+ exclude_patterns:
+ - "**/node_modules/**"
+ - "**/.*"
+ preserve_directory_structure: true
+ overwrite_policy: "ask" # ask, overwrite, skip, backup
+ backup_directory: "./backups"
+
+ # 日志记录
+ logging:
+ level: "info" # debug, info, warn, error
+ output: "console" # console, file, both
+ file_path: "./logs/translator.log"
+ verbose_translation: false
+ error_log_path: "./logs/errors.log"
+
+ # 错误处理
+ error_handling:
+ on_error: "log_and_continue" # stop, log_and_continue, skip_file
+ max_consecutive_errors: 5
+ retry_on_failure: 2
+ generate_error_report: true
+ error_report_path: "./logs/error_report.md"
+
+ # 性能
+ performance:
+ concurrent_files: 3
+ batch_size: 5
+ request_interval: 1 # 请求之间的秒数
+ max_memory_mb: 500
+
+ # 输出
+ output:
+ show_progress: true
+ show_statistics: true
+ generate_report: true
+ report_path: "./reports/translation_report.md"
+ format: "markdown"
+ include_metadata: true
+ ```
 
  ## 示例
 
  ### 翻译文档
 
  ```bash
- bundle install
- ```6
+ # 将 ./docs 中的所有 markdown 文件翻译为中文
+ llm_translate translate --input ./docs --output ./docs-zh
+
+ # 使用自定义提示
+ llm_translate translate --prompt "翻译以下内容为中文,保持技术术语不变: {content}"
+
+ # 试运行以查看将要翻译的内容
+ llm_translate translate --dry-run --verbose
+ ```
 
  ### 批量翻译
 
  ```bash
- bundle install
- ```7
+ # 翻译多种语言版本
+ for lang in zh-CN ja-JP ko-KR; do
+ llm_translate translate --config "./configs/llm_translate-${lang}.yml"
+ done
+ ```
 
  ## 开发
 
- 克隆代码库后,运行:
+ 检出仓库后,运行:
 
  ```bash
  bundle install
  ```
 
- 要运行测试:
+ 运行测试:
 
  ```bash
- bundle install
- ```9
+ bundle exec rspec
+ ```
 
- 要运行代码检查:
+ 运行代码检查:
 
  ```bash
- gem install llm_translate
- ```0
+ bundle exec rubocop
+ ```
 
- 要将此gem安装到您的本地机器上:
+ 将此 gem 安装到您的本地机器:
 
  ```bash
- gem install llm_translate
- ```1
+ bundle exec rake install
+ ```
 
  ## 贡献
 
- 欢迎在GitHub上提交错误报告和拉取请求,地址为 https://github.com/translator/translator
+ 欢迎在 GitHub 上提交错误报告和拉取请求:https://github.com/llm_translate/llm_translate
 
  ## 许可证
 
- 该gem[MIT许可证](https://opensource.org/licenses/MIT)条款下作为开源软件提供。
+ gem 根据 [MIT 许可证](https://opensource.org/licenses/MIT) 的条款作为开源软件提供。
 
  ## 更新日志
 
  ### v0.1.0
- - 初始发布
- - 支持OpenAI、Anthropic和Ollama提供者
- - Markdown格式保留
+ - 初始版本
+ - 支持 OpenAI、Anthropic Ollama 提供商
+ - Markdown 格式保留
  - 可配置的翻译提示
  - 批量文件处理
- 综合的错误处理和日志记录
+ 全面的错误处理和日志记录
@@ -50,12 +50,7 @@ translation:
  # 是否翻译代码注释
  translate_code_comments: false
 
- # 需要保留不翻译的内容模式
- preserve_patterns:
- - "```[\\s\\S]*?```" # 代码块
- - "`[^`]+`" # 行内代码
- - "\\[.*?\\]\\(.*?\\)" # 链接
- - "!\\[.*?\\]\\(.*?\\)" # 图片
+
 
  # 文件处理配置
  files:
@@ -1,6 +1,8 @@
  # frozen_string_literal: true
 
  require 'ruby_llm'
+ # require 'pry'
+
  module LlmTranslate
  class AiClient
  attr_reader :config, :logger
@@ -56,16 +58,14 @@ module LlmTranslate
 
  def configure_ruby_llm
  RubyLLM.configure do |config_obj|
- # For aihubmix.com or any custom host, use OpenAI-compatible API
  config_obj.openai_api_key = config.api_key
  config_obj.openai_api_base = config.ai_host
- config_obj.default_model = config.ai_model
  end
  end
 
  def make_request(prompt)
  chat = RubyLLM.chat
- .with_model(config.ai_model)
+ .with_model(config.ai_model, assume_exists: true, provider: config.ai_provider)
  .with_temperature(config.temperature)
 
  response = chat.ask(prompt)
@@ -75,10 +75,6 @@ module LlmTranslate
  data.dig('translation', 'translate_code_comments') == true
  end
 
- def preserve_patterns
- data.dig('translation', 'preserve_patterns') || default_preserve_patterns
- end
-
  # File Configuration
  def input_directory
  cli_options[:input] || data.dig('files', 'input_directory') || './docs'
@@ -266,14 +262,5 @@ module LlmTranslate
  {content}
  PROMPT
  end
-
- def default_preserve_patterns
- [
- '```[\\s\\S]*?```', # Code blocks
- '`[^`]+`', # Inline code
- '\\[.*?\\]\\(.*?\\)', # Links
- '!\\[.*?\\]\\(.*?\\)' # Images
- ]
- end
  end
  end
@@ -87,147 +87,8 @@ module LlmTranslate
  end
 
  def translate_with_format_preservation(content)
- # Extract and preserve special markdown elements
- preserved_elements = extract_preserved_elements(content)
-
- # Replace preserved elements with placeholders
- content_with_placeholders = replace_with_placeholders(content, preserved_elements)
-
  # Translate the content with placeholders
- translated_content = ai_client.translate(content_with_placeholders)
-
- # Restore preserved elements
- restore_preserved_elements(translated_content, preserved_elements)
- end
-
- def extract_preserved_elements(content)
- preserved = {}
- pattern_index = 0
-
- config.preserve_patterns.each do |pattern|
- regex = Regexp.new(pattern, Regexp::MULTILINE)
-
- content.scan(regex) do |match|
- # Handle both single match and capture groups
- match_text = match.is_a?(Array) ? match[0] : match
- placeholder = "PRESERVED_ELEMENT_#{pattern_index}"
- preserved[placeholder] = match_text
- pattern_index += 1
- end
- end
-
- preserved
- end
-
- def replace_with_placeholders(content, preserved_elements)
- result = content.dup
-
- preserved_elements.each do |placeholder, original_text|
- # Escape special regex characters in the original text
- escaped_text = Regexp.escape(original_text)
- result = result.gsub(Regexp.new(escaped_text), placeholder)
- end
-
- result
- end
-
- def restore_preserved_elements(translated_content, preserved_elements)
- result = translated_content.dup
-
- preserved_elements.each do |placeholder, original_text|
- result = result.gsub(placeholder, original_text)
- end
-
- result
- end
-
- # Additional helper methods for handling special cases
-
- def split_large_content(content, max_size = 3000)
- # Split content into chunks if it's too large for the AI model
- return [content] if content.length <= max_size
-
- chunks = []
- lines = content.split("\n")
- current_chunk = ''
-
- lines.each do |line|
- # If adding this line would exceed the limit, start a new chunk
- if "#{current_chunk}#{line}\n".length > max_size && !current_chunk.empty?
- chunks << current_chunk.strip
- current_chunk = "#{line}\n"
- else
- current_chunk += "#{line}\n"
- end
- end
-
- # Add the last chunk if it's not empty
- chunks << current_chunk.strip unless current_chunk.strip.empty?
-
- chunks
- end
-
- def translate_large_content(content)
- chunks = split_large_content(content)
-
- return ai_client.translate(content) if chunks.length == 1
-
- logger.info "Splitting large content into #{chunks.length} chunks"
-
- translated_chunks = chunks.map.with_index do |chunk, index|
- logger.debug "Translating chunk #{index + 1}/#{chunks.length}"
-
- translated = ai_client.translate(chunk)
-
- # Add delay between chunks to avoid rate limiting
- sleep(config.request_interval) if config.request_interval.positive? && index < chunks.length - 1
-
- translated
- end
-
- translated_chunks.join("\n\n")
- end
-
- def detect_language(content)
- # Simple language detection based on content
- # This is a basic implementation - could be enhanced with a proper language detection library
-
- # Check for common English words
- english_indicators = %w[the and or but with from this that these those]
- chinese_indicators = %w[的 在 是 和 或者 但是 这 那]
-
- english_score = english_indicators.count { |word| content.downcase.include?(word) }
- chinese_score = chinese_indicators.count { |word| content.include?(word) }
-
- if chinese_score > english_score
- 'zh'
- elsif english_score.positive?
- 'en'
- else
- config.source_language
- end
- end
-
- def should_translate_content?(content)
- # Skip translation if content is mostly code or already in target language
-
- # Skip if content is mostly code blocks
- code_block_pattern = /```[\s\S]*?```/m
- code_blocks = content.scan(code_block_pattern)
- code_length = code_blocks.join.length
-
- if code_length > content.length * 0.8
- logger.debug 'Skipping translation: content is mostly code blocks'
- return false
- end
-
- # Skip if content is very short
- if content.strip.length < 10
- logger.debug 'Skipping translation: content too short'
- return false
- end
-
- true
+ ai_client.translate(content)
  end
  end
  end
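
Note on the hunk above: it removes the placeholder-based preservation step, so in 0.2.0 `translate_with_format_preservation` passes the content straight to `ai_client.translate`. For readers comparing the two behaviours, the following is a minimal standalone sketch of a mask, translate, restore round-trip in the spirit of the removed methods. It is not the gem's code: `PATTERNS`, `protect`, `restore`, `translate_with_placeholders`, and the lambda standing in for the AI client are all hypothetical names chosen for illustration. It also deliberately differs from the removed implementation in three ways: it masks with `gsub` and a block instead of `scan` followed by a second `gsub`, it ends each placeholder with an underscore so that one placeholder is never a prefix of another, and it restores in reverse order so nested matches unwind correctly.

```ruby
# Hypothetical sketch only; none of these names exist in llm_translate.
# The patterns mirror the removed defaults, with images listed before links
# so the link pattern does not swallow part of an image.
PATTERNS = [
  /`{3}[\s\S]*?`{3}/,  # fenced code blocks
  /`[^`]+`/,           # inline code
  /!\[.*?\]\(.*?\)/,   # images
  /\[.*?\]\(.*?\)/     # links
].freeze

# Replace every match with a unique placeholder and remember the original text.
def protect(content)
  preserved = {}
  masked = content.dup
  PATTERNS.each do |regex|
    masked = masked.gsub(regex) do |match|
      key = "PRESERVED_ELEMENT_#{preserved.size}_"
      preserved[key] = match
      key
    end
  end
  [masked, preserved]
end

# Put the originals back, newest placeholder first, so a placeholder that was
# captured inside a later match is restored after its container.
def restore(translated, preserved)
  preserved.to_a.reverse.reduce(translated) do |text, (key, original)|
    text.gsub(key) { original }
  end
end

# translator is any callable taking and returning a String (an assumption
# standing in for ai_client.translate).
def translate_with_placeholders(content, translator)
  masked, preserved = protect(content)
  restore(translator.call(masked), preserved)
end
```

For a quick check without an API key, pass a stub such as `->(text) { text.upcase }` as the translator: the prose is upcased while inline code and links come back unchanged. A real model may rewrite or drop placeholders, which is one limitation of this approach.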
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module LlmTranslate
- VERSION = '0.1.0'
+ VERSION = '0.2.0'
  end
data/llm_translate.yml CHANGED
@@ -50,12 +50,7 @@ translation:
  # 是否翻译代码注释
  translate_code_comments: false
 
- # 需要保留不翻译的内容模式
- preserve_patterns:
- - "```[\\s\\S]*?```" # 代码块
- - "`[^`]+`" # 行内代码
- - "\\[.*?\\]\\(.*?\\)" # 链接
- - "!\\[.*?\\]\\(.*?\\)" # 图片
+
 
  # 文件处理配置
  files:
@@ -48,11 +48,7 @@ translation:
  translate_code_comments: false
 
  # 需要保留不翻译的内容模式
- preserve_patterns:
- - "```[\\s\\S]*?```" # 代码块
- - "`[^`]+`" # 行内代码
- - "\\[.*?\\]\\(.*?\\)" # 链接
- - "!\\[.*?\\]\\(.*?\\)" # 图片
+
 
  # 文件处理配置
  files:
data/test_new_config.yml CHANGED
@@ -50,12 +50,7 @@ translation:
  # 是否翻译代码注释
  translate_code_comments: false
 
- # 需要保留不翻译的内容模式
- preserve_patterns:
- - "```[\\s\\S]*?```" # 代码块
- - "`[^`]+`" # 行内代码
- - "\\[.*?\\]\\(.*?\\)" # 链接
- - "!\\[.*?\\]\\(.*?\\)" # 图片
+
 
  # 文件处理配置
  files:
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm_translate
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  platform: ruby
  authors:
  - LlmTranslate Team