better_translate 0.4.2 → 1.0.0

Files changed (100)
  1. checksums.yaml +4 -4
  2. data/.env.example +14 -0
  3. data/.rspec +3 -0
  4. data/.rubocop.yml +8 -0
  5. data/.yardopts +10 -0
  6. data/CHANGELOG.md +125 -96
  7. data/CLAUDE.md +385 -0
  8. data/README.md +649 -187
  9. data/Rakefile +7 -1
  10. data/Steepfile +29 -0
  11. data/docs/implementation/00-overview.md +220 -0
  12. data/docs/implementation/01-setup_dependencies.md +668 -0
  13. data/docs/implementation/02-error_handling.md +65 -0
  14. data/docs/implementation/03-core_components.md +457 -0
  15. data/docs/implementation/03.5-variable_preservation.md +509 -0
  16. data/docs/implementation/04-provider_architecture.md +571 -0
  17. data/docs/implementation/05-translation_logic.md +1065 -0
  18. data/docs/implementation/06-main_module_api.md +122 -0
  19. data/docs/implementation/07-direct_translation_helpers.md +582 -0
  20. data/docs/implementation/08-rails_integration.md +323 -0
  21. data/docs/implementation/09-testing_suite.md +228 -0
  22. data/docs/implementation/10-documentation_examples.md +150 -0
  23. data/docs/implementation/11-quality_security.md +65 -0
  24. data/docs/implementation/12-cli_standalone.md +698 -0
  25. data/exe/better_translate +9 -0
  26. data/lib/better_translate/cache.rb +125 -0
  27. data/lib/better_translate/cli.rb +304 -0
  28. data/lib/better_translate/configuration.rb +201 -0
  29. data/lib/better_translate/direct_translator.rb +131 -0
  30. data/lib/better_translate/errors.rb +101 -0
  31. data/lib/better_translate/progress_tracker.rb +157 -0
  32. data/lib/better_translate/provider_factory.rb +45 -0
  33. data/lib/better_translate/providers/anthropic_provider.rb +154 -0
  34. data/lib/better_translate/providers/base_http_provider.rb +239 -0
  35. data/lib/better_translate/providers/chatgpt_provider.rb +138 -44
  36. data/lib/better_translate/providers/gemini_provider.rb +123 -61
  37. data/lib/better_translate/railtie.rb +18 -0
  38. data/lib/better_translate/rate_limiter.rb +90 -0
  39. data/lib/better_translate/strategies/base_strategy.rb +58 -0
  40. data/lib/better_translate/strategies/batch_strategy.rb +56 -0
  41. data/lib/better_translate/strategies/deep_strategy.rb +45 -0
  42. data/lib/better_translate/strategies/strategy_selector.rb +43 -0
  43. data/lib/better_translate/translator.rb +115 -284
  44. data/lib/better_translate/utils/hash_flattener.rb +104 -0
  45. data/lib/better_translate/validator.rb +105 -0
  46. data/lib/better_translate/variable_extractor.rb +259 -0
  47. data/lib/better_translate/version.rb +2 -9
  48. data/lib/better_translate/yaml_handler.rb +168 -0
  49. data/lib/better_translate.rb +97 -73
  50. data/lib/generators/better_translate/analyze/USAGE +12 -0
  51. data/lib/generators/better_translate/analyze/analyze_generator.rb +94 -0
  52. data/lib/generators/better_translate/install/USAGE +13 -0
  53. data/lib/generators/better_translate/install/install_generator.rb +71 -0
  54. data/lib/generators/better_translate/install/templates/README +20 -0
  55. data/lib/generators/better_translate/install/templates/initializer.rb.tt +47 -0
  56. data/lib/generators/better_translate/translate/USAGE +13 -0
  57. data/lib/generators/better_translate/translate/translate_generator.rb +114 -0
  58. data/lib/tasks/better_translate.rake +136 -0
  59. data/sig/better_translate/cache.rbs +28 -0
  60. data/sig/better_translate/cli.rbs +24 -0
  61. data/sig/better_translate/configuration.rbs +78 -0
  62. data/sig/better_translate/direct_translator.rbs +18 -0
  63. data/sig/better_translate/errors.rbs +46 -0
  64. data/sig/better_translate/progress_tracker.rbs +29 -0
  65. data/sig/better_translate/provider_factory.rbs +8 -0
  66. data/sig/better_translate/providers/anthropic_provider.rbs +27 -0
  67. data/sig/better_translate/providers/base_http_provider.rbs +44 -0
  68. data/sig/better_translate/providers/chatgpt_provider.rbs +25 -0
  69. data/sig/better_translate/providers/gemini_provider.rbs +22 -0
  70. data/sig/better_translate/railtie.rbs +7 -0
  71. data/sig/better_translate/rate_limiter.rbs +20 -0
  72. data/sig/better_translate/strategies/base_strategy.rbs +19 -0
  73. data/sig/better_translate/strategies/batch_strategy.rbs +13 -0
  74. data/sig/better_translate/strategies/deep_strategy.rbs +11 -0
  75. data/sig/better_translate/strategies/strategy_selector.rbs +10 -0
  76. data/sig/better_translate/translator.rbs +24 -0
  77. data/sig/better_translate/utils/hash_flattener.rbs +14 -0
  78. data/sig/better_translate/validator.rbs +14 -0
  79. data/sig/better_translate/variable_extractor.rbs +40 -0
  80. data/sig/better_translate/version.rbs +4 -0
  81. data/sig/better_translate/yaml_handler.rbs +29 -0
  82. data/sig/better_translate.rbs +32 -2
  83. data/sig/faraday.rbs +22 -0
  84. data/sig/generators/better_translate/analyze/analyze_generator.rbs +18 -0
  85. data/sig/generators/better_translate/install/install_generator.rbs +14 -0
  86. data/sig/generators/better_translate/translate/translate_generator.rbs +10 -0
  87. data/sig/optparse.rbs +9 -0
  88. data/sig/psych.rbs +5 -0
  89. data/sig/rails.rbs +34 -0
  90. metadata +89 -203
  91. data/lib/better_translate/helper.rb +0 -83
  92. data/lib/better_translate/providers/base_provider.rb +0 -102
  93. data/lib/better_translate/service.rb +0 -105
  94. data/lib/better_translate/similarity_analyzer.rb +0 -218
  95. data/lib/better_translate/utils.rb +0 -55
  96. data/lib/better_translate/writer.rb +0 -75
  97. data/lib/generators/better_translate/analyze_generator.rb +0 -57
  98. data/lib/generators/better_translate/install_generator.rb +0 -14
  99. data/lib/generators/better_translate/templates/better_translate.rb +0 -49
  100. data/lib/generators/better_translate/translate_generator.rb +0 -84
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 04ce5f4b9e73378af8eec2409688e26dfef6fb42268b8817800c8c3433877eaf
4
- data.tar.gz: 0b34308c1ef74dd3ad74b8cbc0b546cf831f2e43c3f3e2628cb636511ab33289
3
+ metadata.gz: 6a19ffdafe02b71cad6e1d93eba868299d8551adc3e7952450c709c3f71d1292
4
+ data.tar.gz: 7c5d7dde1f0b120124d13f75c31637b2a304845955a2974fc7f0c2f58eec8e76
5
5
  SHA512:
6
- metadata.gz: e63e571fa9f0a881c25d35d4cb31f5cfaa0fbed2e84d3847e50f9b6f811685b5a13666408b48f0e61666232916c8cbf803dda11a6e51bcca9813f1e97b696435
7
- data.tar.gz: 94f9664d32402395d9a575c2e1ae8f04b29781421fd7a4b266ae7bf0c7a2c21d345ab73762397d50028d01482255b0a10319c4799c25056256c39e0d43269df0
6
+ metadata.gz: a0c9c62cc2a35a0663d79af39c8e4a87cd86106b906b185c805f4ff9cba2bdfa82c1eaf8c366b5e91e70fe0c5ea87d5da852e03c32ae2a3959ae32ef2318d62a
7
+ data.tar.gz: 6fb2cac27269257f878d3055027213dc2bef5d75332132f559493064a4c83f1360903080571a3fd1e71c7c6feec73cf41ff619d8ecc8da180246779b5e893e59
data/.env.example ADDED
@@ -0,0 +1,14 @@
1
+ # API Keys for Translation Providers
2
+ # Copy this file to .env and replace with your actual API keys
3
+
4
+ # OpenAI API Key (for ChatGPT provider)
5
+ # Get your key from: https://platform.openai.com/api-keys
6
+ OPENAI_API_KEY=your_openai_api_key_here
7
+
8
+ # Google Gemini API Key
9
+ # Get your key from: https://aistudio.google.com/app/apikey
10
+ GEMINI_API_KEY=your_gemini_api_key_here
11
+
12
+ # Anthropic API Key (for Claude provider)
13
+ # Get your key from: https://console.anthropic.com/settings/keys
14
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,8 @@
1
+ AllCops:
2
+ TargetRubyVersion: 3.0
3
+
4
+ Style/StringLiterals:
5
+ EnforcedStyle: double_quotes
6
+
7
+ Style/StringLiteralsInInterpolation:
8
+ EnforcedStyle: double_quotes
data/.yardopts ADDED
@@ -0,0 +1,10 @@
1
+ --markup markdown
2
+ --readme README.md
3
+ --private
4
+ --protected
5
+ --output-dir doc
6
+ lib/**/*.rb
7
+ -
8
+ CHANGELOG.md
9
+ LICENSE.txt
10
+ docs/FEATURES.md
data/CHANGELOG.md CHANGED
@@ -5,109 +5,138 @@ All notable changes to BetterTranslate will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
- ## [0.4.2] - 2025-03-11
8
+ ## [1.0.0] - 2025-10-22
9
9
 
10
- ### Added
11
- - Comprehensive YARD-style documentation across the entire codebase:
12
- - Added detailed class and method documentation for all core components
13
- - Added parameter and return type documentation
14
- - Added usage examples for key classes and methods
15
- - Improved inline comments for complex logic
16
-
17
- ### Changed
18
- - Improved code readability and maintainability through better documentation
19
- - Enhanced developer experience with clearer API documentation
20
-
21
- ## [0.4.1] - 2025-03-11
22
-
23
- ### Fixed
24
- Improved the RSpec tests for greater reliability:
25
- Fixed the stubs for the translation providers
26
- Improved handling of HTTP requests in the tests
27
- Optimized the temporary YAML files for the similarity tests
28
- Resolved compatibility issues with WebMock
29
-
30
- ### Changed
31
- Replaced the specific stubbing approach with more flexible patterns
32
- Improved the structure of the tests for the SimilarityAnalyzer
33
-
34
- ## [0.4.0] - 2025-03-11
35
-
36
- ### Added
37
- - New Translation Similarity Analyzer:
38
- - Identifies similar translations across language files
39
- - Generates detailed JSON reports and human-readable summaries
40
- - Uses Levenshtein distance for similarity calculation
41
- - Configurable similarity threshold
42
- - New Rails generator: `rails generate better_translate:analyze`
43
- - Analyzes all YAML files in the locales directory
44
- - Provides immediate feedback in the console
45
- - Generates comprehensive similarity reports
46
-
47
- ## [0.3.1] - 2025-03-11
48
-
49
- ### Added
50
- - Comprehensive RSpec test suite covering:
51
- - Core translation functionality
52
- - LRU cache implementation
53
- - Provider selection and initialization
54
- - Error handling
55
- - Configuration management
56
- - Improved documentation with badges and testing information
57
-
58
- ### Changed
59
- - Made `translate` method public in `Service` class for better testability
60
- - Reorganized README.md with better structure and modern layout
61
-
62
- ## [0.3.0] - 2025-03-11
10
+ ### Complete Rewrite 🎉
63
11
 
64
- ### Added
65
- - New translation helper methods:
66
- - `translate_text_to_languages`: Translate single text to multiple languages
67
- - `translate_texts_to_languages`: Translate multiple texts to multiple languages
68
- - LRU caching for improved performance
69
-
70
- ### Changed
71
- - Enhanced error handling in translation providers
72
- - Improved method documentation
12
+ This version represents a complete architectural rewrite of BetterTranslate with improved design, better testing, and enhanced features.
73
13
 
74
- ## [0.2.0] - 2025-03-11
14
+ **Note:** This release is not backward compatible with versions 0.x.x
75
15
 
76
16
  ### Added
77
- - Two-step filtering process:
78
- - Global exclusions using `global_exclusions`
79
- - Language-specific exclusions using `exclusions_per_language`
80
-
81
- ### Changed
82
- - Improved filtering logic to handle language-specific exclusions independently
83
- - Enhanced documentation for exclusion configuration
84
17
 
85
- ### Fixed
86
- - Issue with language-specific exclusions affecting global filtering
87
-
88
- ## [0.1.1] - 2025-03-10
89
-
90
- ### Added
91
- - New Rails generator: `rails generate better_translate:translate`
92
- - Triggers translation process
18
+ #### Core Infrastructure
19
+ - **Configuration System**: Type-safe configuration with comprehensive validation
20
+ - Support for multiple providers (ChatGPT, Gemini, Anthropic)
21
+ - Configurable timeouts, retries, and concurrency
22
+ - Translation modes: override and incremental
23
+ - Optional translation context for domain-specific terminology
24
+
25
+ - **LRU Cache**: Intelligent caching system for improved performance
26
+ - Configurable capacity (default: 1000 items)
27
+ - Optional TTL (Time To Live) support
28
+ - Thread-safe with Mutex protection
29
+ - Cache key format: `"#{text}:#{target_lang_code}"`
30
+
31
+ - **Rate Limiter**: Thread-safe request throttling
32
+ - Configurable delay between requests (default: 0.5s)
33
+ - Prevents API overload and rate limit errors
34
+ - Mutex-based synchronization
35
+
36
+ - **Validator**: Comprehensive input validation
37
+ - Language code validation (2-letter ISO codes)
38
+ - Text validation for translation
39
+ - File path validation
40
+ - API key validation
41
+
42
+ - **Error Handling**: Custom exception hierarchy
43
+ - `ConfigurationError`: Configuration issues
44
+ - `ValidationError`: Input validation failures
45
+ - `TranslationError`: Translation failures
46
+ - `ProviderError`: Provider-specific errors
47
+ - `ApiError`: API call failures
48
+ - `RateLimitError`: Rate limit exceeded
49
+ - `FileError`: File operation failures
50
+ - `YamlError`: YAML parsing errors
51
+ - `ProviderNotFoundError`: Unknown provider
52
+
53
+ #### Provider Architecture
54
+ - **BaseHttpProvider**: Abstract base class for HTTP-based providers
55
+ - Faraday-based HTTP client (required for all providers)
56
+ - Retry logic with exponential backoff (3 attempts, 2s base delay, 60s max)
57
+ - Built-in rate limiting (0.5s between requests)
58
+ - Configurable timeouts (default: 30s)
59
+
60
+ - **ChatGPT Provider**: OpenAI GPT integration
61
+ - Model: GPT-5-nano
62
+ - Temperature: 1.0
63
+
64
+ - **Gemini Provider**: Google Gemini integration
65
+ - Model: gemini-2.0-flash-exp
66
+
67
+ - **Anthropic Provider**: Claude integration (planned)
68
+ - Model: claude-3-5-sonnet-20241022
69
+
70
+ #### Translation Features
71
+ - **Translation Strategies**: Automatic strategy selection based on content size
72
+ - Deep Translation (< 50 strings): Individual translation with detailed progress
73
+ - Batch Translation (≥ 50 strings): Processes in batches of 10 for performance
74
+
75
+ - **Exclusion System**: Two-tier exclusion mechanism
76
+ - Global exclusions: Apply to all target languages (e.g., brand names)
77
+ - Language-specific exclusions: Exclude keys only for specific languages
78
+
79
+ - **Translation Modes**:
80
+ - Override mode: Replaces entire target YAML files
81
+ - Incremental mode: Merges with existing files, only translates missing keys
82
+
83
+ - **Translation Context**: Domain-specific context for improved accuracy
84
+ - Medical terminology
85
+ - Legal terminology
86
+ - Financial terminology
87
+ - E-commerce
88
+ - Technical documentation
89
+
90
+ #### Rails Integration
91
+ - **Install Generator**: `rails generate better_translate:install`
92
+ - Creates initializer with example configuration
93
+ - Configures all supported providers
94
+
95
+ - **Translate Generator**: `rails generate better_translate:translate`
96
+ - Runs translation process
93
97
  - Displays progress messages
94
98
  - Integrates with existing configuration
95
99
 
96
- ## [0.1.0] - 2025-03-10
100
+ - **Analyze Generator**: `rails generate better_translate:analyze`
101
+ - Analyzes translation similarities using Levenshtein distance
102
+ - Generates detailed JSON reports
103
+ - Provides human-readable summaries
104
+ - Configurable similarity threshold
97
105
 
98
- ### Added
99
- - Initial release with core features:
100
- - YAML file translation from source to multiple target languages
101
- - Multiple provider support (ChatGPT and Google Gemini)
102
- - Progress tracking with ruby-progressbar
103
- - Centralized configuration via initializer
104
- - Two translation modes: override and incremental
105
- - Key exclusion system
106
- - Rails integration
107
-
108
- ### Configuration
109
- - API key management for providers
110
- - Source and target language settings
111
- - Output folder configuration
112
- - Translation mode selection
113
- - Exclusion patterns support
106
+ #### Utilities
107
+ - **HashFlattener**: Converts nested YAML to flat structure and vice versa
108
+ - Flatten with dot-notation keys
109
+ - Unflatten back to nested structure
110
+ - Preserves data types and structure
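The HashFlattener entry describes a round trip between nested YAML data and dot-notation keys. A minimal sketch of that idea follows; the module `FlattenSketch` and its method names are hypothetical and are not taken from the gem's `hash_flattener.rb`, whose actual API is not shown in this diff.

```ruby
# Hypothetical sketch; not the gem's HashFlattener implementation.
module FlattenSketch
  module_function

  # Turns { "a" => { "b" => 1 } } into { "a.b" => 1 }.
  def flatten(hash, prefix = nil)
    hash.each_with_object({}) do |(key, value), flat|
      full_key = prefix ? "#{prefix}.#{key}" : key.to_s
      if value.is_a?(Hash)
        flat.merge!(flatten(value, full_key))
      else
        flat[full_key] = value
      end
    end
  end

  # Rebuilds the nested hash from dot-notation keys.
  def unflatten(flat)
    flat.each_with_object({}) do |(full_key, value), nested|
      *parents, leaf = full_key.split(".")
      node = parents.reduce(nested) { |memo, part| memo[part] ||= {} }
      node[leaf] = value
    end
  end
end
```

In this sketch, keys that themselves contain a dot would not survive the round trip, and empty nested hashes are dropped; a real implementation would need to decide how to handle both cases.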
111
+
112
+ ### Development
113
+ - **YARD Documentation**: Comprehensive documentation for all public APIs
114
+ - `@param` with types
115
+ - `@return` with types
116
+ - `@raise` for exceptions
117
+ - `@example` blocks
118
+
119
+ - **RSpec Test Suite**: Full test coverage for core components
120
+ - Configuration tests
121
+ - Cache tests
122
+ - Rate limiter tests
123
+ - Validator tests
124
+ - Error handling tests
125
+ - Hash flattener tests
126
+
127
+ - **RuboCop**: Code style compliance
128
+ - Ruby 3.0+ target
129
+ - Frozen string literals required
130
+ - Double quotes for strings
131
+
132
+ ### Security
133
+ - Environment variable-based API key management
134
+ - No hardcoded credentials
135
+ - Input validation for all user-provided data
136
+ - VCR cassettes with automatic API key anonymization
137
+
138
+ ### Performance
139
+ - LRU caching reduces API costs
140
+ - Batch processing for large files (≥50 strings)
141
+ - Configurable concurrent requests (default: 3)
142
+ - Rate limiting prevents API overload
data/CLAUDE.md ADDED
@@ -0,0 +1,385 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Development Commands
6
+
7
+ ### Testing
8
+ ```bash
9
+ # Run all tests (unit + integration)
10
+ bundle exec rake spec
11
+ # or
12
+ bundle exec rspec
13
+
14
+ # Run only unit tests (fast, no API calls)
15
+ bundle exec rspec spec/better_translate/
16
+
17
+ # Run only integration tests (with real API calls via VCR)
18
+ bundle exec rspec spec/integration/ --tag integration
19
+
20
+ # Run specific test file
21
+ bundle exec rspec spec/better_translate_spec.rb
22
+
23
+ # Run with specific example (line number)
24
+ bundle exec rspec spec/better_translate_spec.rb:42
25
+ ```
26
+
27
+ ### VCR Cassettes & API Testing
28
+
29
+ **Setup API Keys**:
30
+ 1. Copy `.env.example` to `.env`:
31
+ ```bash
32
+ cp .env.example .env
33
+ ```
34
+ 2. Edit `.env` and add your real API keys:
35
+ ```env
36
+ OPENAI_API_KEY=sk-...
37
+ GEMINI_API_KEY=...
38
+ ANTHROPIC_API_KEY=sk-ant-...
39
+ ```
40
+ 3. **IMPORTANT**: Never commit `.env` file (already in `.gitignore`)
41
+
42
+ **VCR Cassette Modes**:
43
+ - `:once` (default): Replay existing cassettes; record only when no cassette file exists yet
44
+ - `:new_episodes`: Record new interactions, keep existing ones
45
+ - `:all`: Re-record all cassettes (use when API changes)
46
+
47
+ **Re-recording Cassettes**:
48
+ ```bash
49
+ # Delete existing cassettes and re-record with real API calls
50
+ rm -rf spec/vcr_cassettes/
51
+ bundle exec rspec spec/integration/ --tag integration
52
+
53
+ # Re-record specific provider
54
+ rm -rf spec/vcr_cassettes/chatgpt/
55
+ bundle exec rspec spec/integration/chatgpt_integration_spec.rb
56
+ ```
57
+
58
+ **Cassette Location**: `spec/vcr_cassettes/`
59
+ - Cassettes are automatically anonymized (API keys replaced with placeholders)
60
+ - Cassettes should be committed to git for CI/CD pipelines
61
+ - Tests run without API keys when cassettes exist
62
+
63
+ ### Code Quality
64
+ ```bash
65
+ # Run RuboCop linter
66
+ bundle exec rake rubocop
67
+ # or
68
+ bundle exec rubocop
69
+
70
+ # Auto-fix RuboCop violations
71
+ bundle exec rubocop -a
72
+
73
+ # Run type checking with Steep
74
+ bundle exec rake steep
75
+ # or
76
+ bundle exec steep check
77
+
78
+ # Run default rake task (runs spec, rubocop, and steep)
79
+ bundle exec rake
80
+ ```
81
+
82
+ ### Documentation
83
+ ```bash
84
+ # Generate YARD documentation
85
+ bundle exec yard doc
86
+
87
+ # Start YARD server (view docs at http://localhost:8808)
88
+ bundle exec yard server
89
+
90
+ # Check documentation coverage
91
+ bundle exec yard stats
92
+ ```
93
+
94
+ ### Security
95
+ ```bash
96
+ # Check for security vulnerabilities in dependencies
97
+ bundle exec bundler-audit check --update
98
+ ```
99
+
100
+ ### Type Checking (RBS/Steep)
101
+ ```bash
102
+ # Run type checking
103
+ bundle exec steep check
104
+
105
+ # Type check specific files
106
+ bundle exec steep check lib/better_translate/cache.rb
107
+
108
+ # Show statistics
109
+ bundle exec steep stats
110
+
111
+ # Validate RBS syntax only
112
+ bundle exec rbs validate
113
+ ```
114
+
115
+ **RBS Files**: Type signatures are in `sig/` directory
116
+ - All public APIs have RBS signatures
117
+ - Steep is integrated in CI/CD pipeline
118
+ - Default rake task includes type checking
119
+
120
+ **Status**: 51 type errors remaining (down from 112 initial)
121
+ - Most errors are related to empty collection annotations
122
+ - All critical paths are type-checked
123
+ - Continuous improvement in progress
124
+
125
+ ### Gem Management
126
+ ```bash
127
+ # Install dependencies
128
+ bundle install
129
+
130
+ # Install gem locally for testing
131
+ bundle exec rake install
132
+
133
+ # Interactive console with gem loaded
134
+ bin/console
135
+ ```
136
+
137
+ ## Architecture Overview
138
+
139
+ ### Provider-Based System
140
+ The gem uses a provider architecture to support multiple AI translation services:
141
+
142
+ - **BaseHttpProvider**: Abstract base class for all HTTP-based providers
143
+ - Uses Faraday for all HTTP connections (REQUIRED - do not use Net::HTTP or other libraries)
144
+ - Implements retry logic with exponential backoff (3 attempts, 2s base delay, 60s max)
145
+ - Handles rate limiting (0.5s between requests, thread-safe with Mutex)
146
+ - Configurable timeouts (default: 30s)
147
+
148
+ - **Providers**:
149
+ - ChatGPT (OpenAI): GPT-5-nano model, temperature=1.0
150
+ - Google Gemini: gemini-2.0-flash-exp model
151
+ - Anthropic Claude: Planned support
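The retry policy quoted for BaseHttpProvider (3 attempts, 2s base delay, 60s max) can be sketched as below. The module `RetrySketch`, the doubling factor, and the choice to retry on any `StandardError` are assumptions for illustration; the text only says "exponential backoff" and does not show which errors the gem retries.

```ruby
# Hypothetical helper; the gem's BaseHttpProvider may differ in details.
module RetrySketch
  module_function

  # Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped at max.
  def backoff_delay(attempt, base: 2.0, max: 60.0)
    [base * (2**(attempt - 1)), max].min
  end

  # Runs the block, retrying on StandardError up to `attempts` times in total.
  # `sleeper` is injectable so the sketch can be exercised without waiting.
  def with_retries(attempts: 3, base: 2.0, max: 60.0, sleeper: ->(s) { sleep(s) })
    tries = 0
    begin
      tries += 1
      yield tries
    rescue StandardError
      raise if tries >= attempts

      sleeper.call(backoff_delay(tries, base: base, max: max))
      retry
    end
  end
end
```

With the defaults, a call that fails twice and then succeeds would wait 2s and then 4s; the 60s cap only matters if the attempt count is raised.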
152
+
153
+ ### Translation Strategies
154
+ The gem automatically selects the optimal strategy based on content size:
155
+
156
+ - **Deep Translation** (< 50 strings): Individual translation with detailed progress
157
+ - **Batch Translation** (≥ 50 strings): Processes in batches of 10 for performance
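The size-based choice between the two strategies can be sketched as follows. The threshold of 50 and the batch size of 10 come from the text; the module `StrategySketch` and its methods are hypothetical and do not mirror the gem's `StrategySelector`.

```ruby
# Hypothetical sketch of threshold-based strategy selection.
module StrategySketch
  BATCH_THRESHOLD = 50 # from the text: >= 50 strings use batch translation
  BATCH_SIZE = 10      # from the text: batches of 10

  module_function

  # Picks a strategy name from the number of strings to translate.
  def choose(string_count)
    string_count >= BATCH_THRESHOLD ? :batch : :deep
  end

  # Splits flattened key/value pairs into the groups one request would cover:
  # one pair per group for :deep, up to BATCH_SIZE pairs per group for :batch.
  def plan(flat_strings)
    pairs = flat_strings.to_a
    return pairs.map { |pair| [pair] } if choose(pairs.size) == :deep

    pairs.each_slice(BATCH_SIZE).to_a
  end
end
```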
158
+
159
+ ### Configuration System
160
+ Type-safe `Configuration` class with mandatory validation:
161
+ - Required: provider, API keys, source language, target languages, file paths
162
+ - Optional: translation mode (override/incremental), context, caching, rate limiting
163
+ - Validation enforced via `config.validate!` before translation
164
+
165
+ ### Caching System
166
+ LRU cache implementation:
167
+ - Default capacity: 1000 items (configurable)
168
+ - Cache key format: `"#{text}:#{target_lang_code}"`
169
+ - Optional TTL support
170
+ - Thread-safe with Mutex protection
171
+ - Toggleable via `cache_enabled` config
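A minimal sketch of an LRU cache with the properties listed above (capacity limit, optional TTL, Mutex protection, and the quoted key format). The class `LruCacheSketch`, its method names, and the injectable `clock` are assumptions made for illustration; the gem's `cache.rb` is not shown in this diff and may be structured differently.

```ruby
# Hypothetical sketch; class and method names are illustrative only.
class LruCacheSketch
  def initialize(capacity: 1000, ttl: nil, clock: -> { Time.now.to_f })
    @capacity = capacity
    @ttl = ttl
    @clock = clock
    @store = {} # Ruby hashes keep insertion order, so the oldest entry is first
    @mutex = Mutex.new
  end

  # Builds the key format quoted in the text.
  def self.key_for(text, target_lang_code)
    "#{text}:#{target_lang_code}"
  end

  def get(key)
    @mutex.synchronize do
      entry = @store.delete(key)
      next nil if entry.nil? || expired?(entry)

      @store[key] = entry # re-insert to mark as most recently used
      entry[:value]
    end
  end

  def set(key, value)
    @mutex.synchronize do
      @store.delete(key)
      @store[key] = { value: value, stored_at: @clock.call }
      @store.shift while @store.size > @capacity # evict least recently used
      value
    end
  end

  private

  def expired?(entry)
    !@ttl.nil? && (@clock.call - entry[:stored_at]) > @ttl
  end
end
```

Relying on hash insertion order keeps the sketch short; reads refresh an entry's position but not its timestamp, so the TTL here counts from the last write.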
172
+
173
+ ### Exclusion System
174
+ Two-tier exclusion mechanism:
175
+ - **Global exclusions**: Apply to all target languages (e.g., brand names)
176
+ - **Language-specific exclusions**: Exclude keys only for specific languages (e.g., legal text that was manually translated)
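The two tiers can be read as a single filter applied per target language. The sketch below assumes flattened dot-notation keys; the option names `global_exclusions` and `exclusions_per_language` are borrowed from the 0.2.0 changelog entry, and whether 1.0.0 keeps those names is not confirmed by this diff. `ExclusionSketch` itself is hypothetical.

```ruby
# Hypothetical sketch of the two-tier exclusion filter.
module ExclusionSketch
  module_function

  # Returns the flattened strings that should be sent for translation
  # into `lang`, after applying both exclusion tiers.
  def filter(flat_strings, lang, global_exclusions: [], exclusions_per_language: {})
    excluded = global_exclusions + exclusions_per_language.fetch(lang, [])
    flat_strings.reject { |key, _value| excluded.include?(key) }
  end
end
```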
177
+
178
+ ### Translation Modes
179
+ - **Override**: Replaces entire target YAML files
180
+ - **Incremental**: Merges with existing files, only translates missing keys
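The difference between the two modes comes down to which keys are translated and whose values win on merge. A sketch over flattened hashes, assuming existing target values are always preserved in incremental mode; `ModeSketch` and its methods are hypothetical names, not the gem's API.

```ruby
# Hypothetical sketch of override vs. incremental behavior.
module ModeSketch
  module_function

  # Flattened keys that still need translating for the given mode.
  def keys_to_translate(source, existing, mode)
    return source.keys if mode == :override

    source.keys.reject { |key| existing.key?(key) }
  end

  # Final flattened target content: fresh translations only for :override,
  # fresh translations plus untouched existing values for :incremental.
  def merge(translated, existing, mode)
    return translated.dup if mode == :override

    translated.merge(existing)
  end
end
```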
181
+
182
+ ## Rails Integration
183
+
184
+ The gem provides three generators for Rails applications:
185
+
186
+ ```bash
187
+ # Generate initializer with example configuration
188
+ rails generate better_translate:install
189
+
190
+ # Run translation process
191
+ rails generate better_translate:translate
192
+
193
+ # Analyze translation similarities (Levenshtein distance)
194
+ rails generate better_translate:analyze
195
+ ```
196
+
197
+ Configuration is typically done in `config/initializers/better_translate.rb`.
198
+
199
+ ## Development Requirements
200
+
201
+ ### YARD Documentation (MANDATORY)
202
+ ALL public methods, classes, and modules must have comprehensive YARD documentation:
203
+
204
+ - Use `@param` for parameters with types (e.g., `@param text [String]`)
205
+ - Use `@return` for return values with types
206
+ - Use `@raise` for exceptions
207
+ - Provide `@example` blocks for public APIs
208
+ - Mark private methods with `@api private`
209
+
210
+ Example:
211
+ ```ruby
212
+ # Translates text to a target language
213
+ #
214
+ # @param text [String] The text to translate
215
+ # @param target_lang_code [String] Language code (e.g., "it", "fr")
216
+ # @return [String] The translated text
217
+ # @raise [ValidationError] If input is invalid
218
+ # @raise [TranslationError] If translation fails
219
+ #
220
+ # @example
221
+ # translate("Hello", "it") #=> "Ciao"
222
+ def translate(text, target_lang_code)
223
+ # ...
224
+ end
225
+ ```
226
+
227
+ ### HTTP Client (MANDATORY)
228
+ - Use Faraday for ALL HTTP connections
229
+ - Do NOT use Net::HTTP, HTTParty, or other HTTP libraries
230
+ - Implement retry logic and error handling as shown in BaseHttpProvider
231
+
232
+ ### Code Style
233
+ - RuboCop compliance required before commits
234
+ - String literals: Use double quotes (enforced by RuboCop)
235
+ - Target Ruby version: 3.0+
236
+ - Frozen string literals: Required at top of all files
237
+
238
+ ### Security
239
+ - NEVER hardcode API keys in code
240
+ - Use environment variables: `ENV["OPENAI_API_KEY"]`, `ENV["GEMINI_API_KEY"]`, `ENV["ANTHROPIC_API_KEY"]`
241
+ - VCR cassettes must anonymize API keys automatically
242
+ - Input validation required for all user-provided data (language codes, file paths, text)
243
+
244
+ ## Error Handling
245
+
246
+ Custom exception hierarchy (all inherit from `BetterTranslate::Error`):
247
+ - `ConfigurationError`: Configuration issues
248
+ - `ValidationError`: Input validation failures
249
+ - `TranslationError`: Translation failures
250
+ - `ProviderError`: Provider-specific errors
251
+ - `ApiError`: API call failures
252
+ - `RateLimitError`: Rate limit exceeded
253
+ - `FileError`: File operation failures
254
+ - `YamlError`: YAML parsing errors
255
+ - `ProviderNotFoundError`: Unknown provider
256
+
257
+ All errors include detailed messages and context hash for debugging.
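One way a base error can carry a context hash is sketched below. The `context:` keyword and the flat layout under `Error` are assumptions; the diff does not show the signatures in `errors.rb` or how the subclasses are nested. The namespace `ErrorSketch` is hypothetical and stands in for `BetterTranslate`.

```ruby
# Hypothetical sketch; not the gem's errors.rb.
module ErrorSketch
  # Base class carrying a context hash alongside the message.
  class Error < StandardError
    attr_reader :context

    def initialize(message = nil, context: {})
      super(message)
      @context = context
    end
  end

  # Subclasses are kept flat under Error here; the gem's actual nesting is not stated.
  class ApiError < Error; end
  class RateLimitError < Error; end
end
```

A caller could then rescue the base class and inspect `e.context` for details such as the provider or HTTP status.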
258
+
259
+ ## Testing Practices
260
+
261
+ ### Test-Driven Development (TDD) - MANDATORY
262
+ **ALWAYS write tests BEFORE implementing any new feature or bug fix.**
263
+
264
+ This is a strict requirement for all development in this project. Follow the Red-Green-Refactor cycle:
265
+
266
+ #### TDD Workflow (REQUIRED)
267
+ 1. **RED**: Write failing tests first
268
+ - Write RSpec tests that describe the desired behavior
269
+ - Run tests and verify they fail: `bundle exec rspec`
270
+ - Failing tests prove that the test is valid and catches the missing functionality
271
+
272
+ 2. **GREEN**: Implement minimum code to pass tests
273
+ - Write the simplest implementation that makes tests pass
274
+ - Run tests again and verify they pass: `bundle exec rspec`
275
+ - DO NOT add extra features beyond what tests require
276
+
277
+ 3. **REFACTOR**: Clean up code while keeping tests green
278
+ - Improve code quality, remove duplication
279
+ - Run tests after each refactoring to ensure nothing breaks
280
+ - Update documentation (YARD) as needed
281
+
282
+ #### Example TDD Workflow
283
+ ```bash
284
+ # 1. RED - Write failing test
285
+ # Edit spec/providers/new_provider_spec.rb with test cases
286
+ bundle exec rspec spec/providers/new_provider_spec.rb
287
+ # => Should see failures (RED)
288
+
289
+ # 2. GREEN - Implement feature
290
+ # Edit lib/better_translate/providers/new_provider.rb
291
+ bundle exec rspec spec/providers/new_provider_spec.rb
292
+ # => Should see passing tests (GREEN)
293
+
294
+ # 3. REFACTOR - Improve code
295
+ # Refactor implementation while keeping tests green
296
+ bundle exec rspec spec/providers/new_provider_spec.rb
297
+ # => Should still see passing tests (GREEN)
298
+ ```
299
+
300
+ #### Why TDD is Mandatory
301
+ - Ensures all code is testable by design
302
+ - Prevents regression bugs
303
+ - Provides living documentation of expected behavior
304
+ - Catches edge cases early
305
+ - Makes refactoring safer
306
+
307
+ #### Exceptions
308
+ The ONLY acceptable exception to writing tests first is for critical production hotfixes where immediate deployment is required. In such cases:
309
+ - Document the technical debt in code comments
310
+ - Create a GitHub issue to add tests
311
+ - Add tests within 24 hours of the hotfix
312
+
313
+ ### RSpec Setup
314
+
315
+ **Test Organization**:
316
+ - `spec/better_translate/`: Unit tests with WebMock stubs (fast, no API calls)
317
+ - `spec/integration/`: Integration tests with VCR cassettes (real API interactions)
318
+
319
+ **Unit Tests** (WebMock):
320
+ - Fast execution, no API keys required
321
+ - Test code structure, request formatting, and error handling
322
+ - Use `stub_request` to mock HTTP responses
323
+ - Example: `spec/better_translate/providers/chatgpt_provider_spec.rb`
324
+
325
+ **Integration Tests** (VCR):
326
+ - Test real API interactions
327
+ - Require API keys in `.env` file for first run (to record cassettes)
328
+ - Subsequent runs use recorded cassettes (no API keys needed)
329
+ - Tag with `:integration` and `:vcr`
330
+ - Example: `spec/integration/chatgpt_integration_spec.rb`
331
+
332
+ **Running Tests**:
333
+ ```bash
334
+ # Unit tests only (fast, recommended for TDD)
335
+ bundle exec rspec spec/better_translate/
336
+
337
+ # Integration tests only (slower, validates API compatibility)
338
+ bundle exec rspec spec/integration/ --tag integration
339
+
340
+ # All tests
341
+ bundle exec rspec
342
+ ```
343
+
344
+ ### VCR Configuration Details
345
+
346
+ VCR is configured in `spec_helper.rb` with:
347
+ - **Cassette library**: `spec/vcr_cassettes/`
348
+ - **Record mode**: `:once` (replay existing cassettes; record only when no cassette file exists)
349
+ - **API key filtering**: Automatically replaces keys with `<OPENAI_API_KEY>`, etc.
350
+ - **Match on**: HTTP method, URI, and request body
351
+
352
+ **When to Re-record Cassettes**:
353
+ 1. API response format changes
354
+ 2. Adding new test scenarios
355
+ 3. Provider updates model or endpoint
356
+ 4. Testing error conditions
357
+
358
+ **Cassette Workflow**:
359
+ 1. First run: Needs real API keys, records responses
360
+ 2. Subsequent runs: Uses cassettes, no API calls
361
+ 3. CI/CD: Uses committed cassettes, no secrets needed
362
+
363
+ ## Translation Context Feature
364
+
365
+ The `translation_context` configuration allows providing domain-specific context to improve translation accuracy:
366
+
367
+ ```ruby
368
+ config.translation_context = "Medical terminology for healthcare applications"
369
+ ```
370
+
371
+ This context is included in the AI system prompt, helping with specialized terminology in fields like:
372
+ - Medical/Healthcare
373
+ - Legal
374
+ - Financial
375
+ - E-commerce
376
+ - Technical documentation
377
+
378
+ ## Performance Considerations
379
+
380
+ - Enable caching for repeated translations to reduce API costs
381
+ - Use incremental mode to preserve manual corrections
382
+ - Monitor API usage through provider dashboards
383
+ - Batch processing automatically used for large files (≥50 strings)
384
+ - Rate limiting prevents API overload (configurable, default 0.5s between requests)
385
+ - Concurrent requests configurable via `max_concurrent_requests` (default: 3)
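The fixed delay between requests (default 0.5s, Mutex-protected) can be sketched as follows. `RateLimiterSketch`, its `wait` method, and the injectable `clock` and `sleeper` are assumptions for illustration and are not taken from the gem's `rate_limiter.rb`.

```ruby
# Hypothetical sketch of a fixed-delay, thread-safe rate limiter.
class RateLimiterSketch
  def initialize(delay: 0.5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @delay = delay
    @clock = clock
    @sleeper = sleeper
    @last_request_at = nil
    @mutex = Mutex.new
  end

  # Blocks until at least `delay` seconds have passed since the previous call.
  def wait
    @mutex.synchronize do
      unless @last_request_at.nil?
        remaining = @delay - (@clock.call - @last_request_at)
        @sleeper.call(remaining) if remaining.positive?
      end
      @last_request_at = @clock.call
    end
  end
end
```

Holding the mutex while sleeping serializes callers, so concurrent threads are spaced at least `delay` apart in this sketch rather than each sleeping independently.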