deepl_diff 1.0.2

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 601d9219d37903d18d03a30e85fee9ce783522fbfb75d616f2873ef4ca6da950
4
+ data.tar.gz: a0b23c609b7f6f48b74a83a818d96d3bb13665568f3563a6e67fe2c16fb42f44
5
+ SHA512:
6
+ metadata.gz: a739f33576f117fde32e6c6fd9527835a2f6aeb3fbdd4b3265aa4e0c913e519bc0f8bd536ed2abdcee50721823e75286aed909cbfc59fc889cea40538c83847c
7
+ data.tar.gz: 87beae80de1cf85db6308cc613e71b428b0008a4b797ef276ef4d16abd65f162aea2cfd09b454ddd6f2fb438ae3939af7c48c42e850f3f4772c2184be1c35839
data/.gitignore ADDED
@@ -0,0 +1,13 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+
11
+ # rspec failure tracking
12
+ .rspec_status
13
+ test.rb
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.rubocop.yml ADDED
@@ -0,0 +1,15 @@
1
+ Documentation:
2
+ Enabled: false
3
+
4
+ Style/StringLiterals:
5
+ EnforcedStyle: double_quotes
6
+
7
+ Style/ClassAndModuleChildren:
8
+ EnforcedStyle: compact
9
+
10
+ Metrics/BlockLength:
11
+ Exclude:
12
+ - spec/**/*
13
+
14
+ Metrics/LineLength:
15
+ Max: 120
data/.travis.yml ADDED
@@ -0,0 +1,12 @@
1
+ language: ruby
2
+ sudo: false
3
+ rvm:
4
+ - 2.3.3
5
+ cache: bundler
6
+
7
+ addons:
8
+ code_climate:
9
+ repo_token: "49b1afe0298c521f3d73377db0d20d1d4b9749ad533126bf3ed105fe9612eb0e"
10
+
11
+ after_success:
12
+ - bundle exec codeclimate-test-reporter
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ gemspec
data/README.md ADDED
@@ -0,0 +1,117 @@
1
+ # DeepLDiff
2
+
3
+ DeepL API wrapper helps to translate only changes between revisions of long texts.
4
+
5
+
6
+
7
+
8
+ **DeepLDiff** based on [GoogleTranslateDiff](https://github.com/gzigzigzeo/google_translate_diff)
9
+ ## Use case
10
+
11
+ Assume your project contains a significant amount of products descriptions which:
12
+ - Require retranslation each time user edits them.
13
+ - Have a lot of equal parts (like return policy).
14
+ - Change frequently.
15
+
16
+ If your user changes a single word within the long description, you will be charged for the retranslation of the whole text.
17
+
18
+ Much better approach is to try to translate every repeated structural element (sentence) in your texts array just once to save money. This gem helps to make it done.
19
+
20
+ ## Installation
21
+
22
+ Add this line to your application's Gemfile:
23
+
24
+ ```ruby
25
+ gem 'deepl_diff'
26
+ ```
27
+
28
+ And then execute:
29
+
30
+ $ bundle
31
+
32
+ Or install it yourself as:
33
+
34
+ $ gem install deepl_diff
35
+
36
+ ## Usage
37
+
38
+ ```ruby
39
+ require "deepl_diff"
40
+
41
+ # This dependencies are not included, as you might need to roll your own cache based on different store
42
+ require "redis"
43
+ require "connection_pool"
44
+ require "redis-namespace"
45
+ require "ratelimit" # Optional, if you will use
46
+
47
+ # Setup https://github.com/wikiti/deepl-rb
48
+ DeepL.configure do |config|
49
+ config.auth_key = 'your-api-token'
50
+ config.host = 'https://api-free.deepl.com' # Default value is 'https://api.deepl.com'
51
+ config.version = 'v1' # Default value is 'v2'
52
+ end
53
+
54
+ # I always use pool for redis
55
+ pool = ConnectionPool.new(size: 10, timeout: 5) { Redis.new }
56
+
57
+ # Pass DeepL for DeepLDiff
58
+ DeepLDiff.api = DeepL
59
+
60
+ DeepLDiff.cache_store =
61
+ DeepLDiff::RedisCacheStore.new(pool, timeout: 7.days, namespace: "t")
62
+
63
+ # Optional
64
+ DeepLDiff.rate_limiter =
65
+ DeepLDiff::RedisRateLimiter.new(
66
+ pool, threshold: 8000, interval: 60, namespace: "t"
67
+ )
68
+
69
+ DeepLDiff.translate("test translations", from: "en", to: "es")
70
+ ```
71
+
72
+ ## How it works
73
+
74
+ - Text nodes are extracted from HTML.
75
+ - Every text node is split into sentences (using `punkt-segmenter` gem).
76
+ - Cache is checked for the presence of each sentence (using language couple and a hash of string).
77
+ - Missing sentences are translated via API and cached.
78
+ - Original HTML is recombined from translations and cache data.
79
+
80
+ *NOTE:* if `:from` is not specified or equal to nil, then the DeepL API will be called twice, the first time a sample of text up to 100 characters long will be transferred to determine the language, and the second time the entire text will be transferred.
81
+ Try to specify `:from` explicitly
82
+
83
+ ## Input
84
+
85
+ `::translate` can receive string, array or deep hash and will return the same, but translated.
86
+
87
+ ```ruby
88
+ DeepLDiff.translate("test", from: "en", to: "es")
89
+ DeepLDiff.translate(%w[test language], from: "en", to: "es")
90
+ DeepLDiff.translate(
91
+ { title: "test", values: { type: "frequent" } }, from: "en", to: "es"
92
+ )
93
+ ```
94
+
95
+ See `DeepLDiff::Linearizer` for details.
96
+
97
+ ## HTML
98
+
99
+ You can pass HTML as like as plain text:
100
+
101
+ ```ruby
102
+ DeepLDiff.translate("<b>Black</b>", from: "en", to: "es")
103
+ ```
104
+
105
+ ## Very long texts
106
+
107
+ DeepL API has a limitation: query can not be longer than approximately 128 KB. If your text is really that long, multiple queries will be used to translate it automatically.
108
+
109
+ ## Development
110
+
111
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
112
+
113
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
114
+
115
+ ## Contributing
116
+
117
+ Bug reports and pull requests are welcome on GitHub at https://github.com/wikiti/deepl-rb.
data/Rakefile ADDED
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ task default: :spec
@@ -0,0 +1,55 @@
1
+ # frozen_string_literal: true
2
+
3
+ lib = File.expand_path("lib", __dir__)
4
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
5
+ require "deepl_diff/version"
6
+
7
+ # rubocop:disable Metrics/BlockLength
8
+ Gem::Specification.new do |spec|
9
+ spec.name = "deepl_diff"
10
+ spec.version = DeepLDiff::VERSION
11
+ spec.authors = ["Islam Gagiev"]
12
+ spec.email = ["omniacinis@gmail.com"]
13
+
14
+ spec.summary = %(
15
+ DeepL API wrapper for Ruby which helps to translate only changes
16
+ between revisions of long texts.
17
+
18
+ )
19
+ spec.description = %(
20
+ DeepL API wrapper for Ruby which helps to translate only changes
21
+ between revisions of long texts.
22
+ )
23
+ spec.homepage = "https://github.com/Halvanhelv/deepl_diff"
24
+
25
+ if spec.respond_to?(:metadata)
26
+ spec.metadata["allowed_push_host"] = "https://rubygems.org"
27
+ else
28
+ raise "RubyGems 2.0 or newer is required to protect against " \
29
+ "public gem pushes."
30
+ end
31
+
32
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
33
+ f.match(%r{^(test|spec|features)/})
34
+ end
35
+ spec.bindir = "exe"
36
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
37
+ spec.require_paths = ["lib"]
38
+
39
+ spec.add_development_dependency "bundler", "~> 1.14"
40
+ spec.add_development_dependency "codeclimate-test-reporter", "~> 1.0.0"
41
+ spec.add_development_dependency "rake", "~> 10.0"
42
+ spec.add_development_dependency "rspec", "~> 3.0"
43
+ spec.add_development_dependency "rubocop"
44
+ spec.add_development_dependency "simplecov"
45
+
46
+ spec.add_dependency "connection_pool"
47
+ spec.add_dependency "deepl-rb"
48
+ spec.add_dependency "dry-initializer"
49
+ spec.add_dependency "ox"
50
+ spec.add_dependency "punkt-segmenter"
51
+ spec.add_dependency "ratelimit"
52
+ spec.add_dependency "redis"
53
+ spec.add_dependency "redis-namespace"
54
+ end
55
+ # rubocop:enable Metrics/BlockLength
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::Cache
4
+ extend Dry::Initializer
5
+
6
+ param :from
7
+ param :to
8
+
9
+ def cached_and_missing(values)
10
+ keys = values.map { |v| key(v) }
11
+ cached = cache_store.read_multi(keys)
12
+ missing = values.map.with_index { |v, i| v if cached[i].nil? }.compact
13
+
14
+ [cached, missing]
15
+ end
16
+
17
+ def store(values, cached, updates)
18
+ cached.map.with_index do |value, index|
19
+ value || store_value(values[index], updates.shift)
20
+ end
21
+ end
22
+
23
+ private
24
+
25
+ def store_value(value, translation)
26
+ cache_store.write(key(value), translation)
27
+ translation
28
+ end
29
+
30
+ def key(value)
31
+ hash = Digest::MD5.hexdigest(value.strip) # No matter how much spaces
32
+ "#{from}:#{to}:#{hash}"
33
+ end
34
+
35
+ def cache_store
36
+ DeepLDiff.cache_store
37
+ end
38
+ end
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::Chunker
4
+ extend ::Dry::Initializer
5
+
6
+ class Error < StandardError; end
7
+
8
+ Chunk = Struct.new(:values, :size)
9
+
10
+ param :values
11
+ option :limit, default: proc { MAX_CHUNK_SIZE }
12
+ option :count_limit, default: proc { COUNT_LIMIT }
13
+
14
+ def call
15
+ chunks.map(&:values)
16
+ end
17
+
18
+ def chunks
19
+ values.each_with_object([]) do |value, chunks|
20
+ validate_value_size(value)
21
+
22
+ tail = chunks.last
23
+
24
+ if next_chunk?(tail, value)
25
+ chunks << Chunk.new([], 0)
26
+ tail = chunks.last
27
+ end
28
+
29
+ update_chunk(tail, value)
30
+ end
31
+ end
32
+
33
+ private
34
+
35
+ def next_chunk?(tail, value)
36
+ tail.nil? ||
37
+ (size(value) + tail.size > limit) ||
38
+ tail.values.size > count_limit
39
+ end
40
+
41
+ def size(text)
42
+ CGI.escape(text).size
43
+ end
44
+
45
+ def update_chunk(chunk, value)
46
+ chunk.values << value
47
+ chunk.size = chunk.size + value.size
48
+ end
49
+
50
+ def validate_value_size(value)
51
+ raise Error, "Too long part #{value.size} > #{limit}" if value.size > limit
52
+ end
53
+
54
+ MAX_CHUNK_SIZE = 1700
55
+ COUNT_LIMIT = 300
56
+ end
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::Linearizer
4
+ class << self
5
+ def linearize(struct, array = [])
6
+ case struct
7
+ when Hash
8
+ struct.each { |_k, v| linearize(v, array) }
9
+ when Array
10
+ struct.each { |v| linearize(v, array) }
11
+ else
12
+ array << struct
13
+ end
14
+
15
+ array
16
+ end
17
+
18
+ def restore(struct, array)
19
+ case struct
20
+ when Hash
21
+ struct.transform_values { |v| restore(v, array) }
22
+ when Array
23
+ struct.map { |v| restore(v, array) }
24
+ else
25
+ array.shift
26
+ end
27
+ end
28
+ end
29
+ end
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::RedisCacheStore
4
+ extend Dry::Initializer
5
+
6
+ param :connection_pool
7
+
8
+ option :timeout, default: proc { 60 * 60 * 24 * 7 }
9
+ option :namespace, default: proc { DeepLDiff::CACHE_NAMESPACE }
10
+
11
+ def read_multi(keys)
12
+ redis { |redis| redis.mget(*keys) }
13
+ end
14
+
15
+ def write(key, value)
16
+ redis { |redis| redis.setex(key, timeout, value) }
17
+ end
18
+
19
+ private
20
+
21
+ def redis
22
+ connection_pool.with do |redis|
23
+ yield Redis::Namespace.new(namespace, redis: redis)
24
+ end
25
+ end
26
+ end
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::RedisRateLimiter
4
+ extend Dry::Initializer
5
+
6
+ class RateLimitExceeded < StandardError; end
7
+
8
+ param :connection_pool
9
+ param :threshold, default: proc { 8000 }
10
+ param :interval, default: proc { 60 }
11
+
12
+ option :namespace, default: proc { DeepLDiff::CACHE_NAMESPACE }
13
+
14
+ def check(size)
15
+ connection_pool.with do |redis|
16
+ rate_limit = Ratelimit.new(namespace, redis: redis)
17
+ raise RateLimitExceeded if rate_limit.exceeded?("call", threshold: threshold, interval: interval)
18
+
19
+ rate_limit.add size
20
+ end
21
+ end
22
+ end
@@ -0,0 +1,157 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::Request
4
+ extend Dry::Initializer
5
+ extend Forwardable
6
+
7
+ param :values
8
+ param :options
9
+
10
+ def_delegators :DeepLDiff, :api, :cache_store, :rate_limiter
11
+ def_delegators :"DeepLDiff::Linearizer", :linearize, :restore
12
+
13
+ def call
14
+ validate_globals
15
+
16
+ return values if from == to || values.empty?
17
+
18
+ translation
19
+ end
20
+
21
+ private
22
+
23
+ def from
24
+ @from ||= options.delete(:from) || detect_language
25
+ end
26
+
27
+ def to
28
+ @to ||= options.delete(:to) { nil }
29
+ end
30
+
31
+ def detect_language
32
+ api.translate(text_tokens_texts.join(" ")[0..100], nil, to)
33
+ .detected_source_language.downcase
34
+ end
35
+
36
+ def validate_globals
37
+ raise "Set DeepLDiff.api before calling ::translate" unless api
38
+ return if cache_store
39
+
40
+ raise "Set DeepLDiff.cache_store before calling ::translate"
41
+ end
42
+
43
+ # Extracts flat text array
44
+ # => "Name", "<b>Good</b> boy"
45
+ #
46
+ # #values might be something like { name: "Name", bio: "<b>Good</b> boy" }
47
+ def texts
48
+ @texts ||= linearize(values)
49
+ end
50
+
51
+ # Converts each array item to token list
52
+ # => [..., [["<b>", :markup], ["Good", :text], ...]]
53
+ def tokens
54
+ @tokens ||= texts.map do |value|
55
+ DeepLDiff::Tokenizer.tokenize(value)
56
+ end
57
+ end
58
+
59
+ # Extracts text tokens from token list
60
+ # => { ..., "1_1" => "Good", 1_3 => "Boy", ... }
61
+ def text_tokens
62
+ @text_tokens ||= extract_text_tokens.to_h
63
+ end
64
+
65
+ def extract_text_tokens
66
+ tokens.each_with_object([]).with_index do |(group, result), group_index|
67
+ group.each_with_index do |(value, type), index|
68
+ result << ["#{group_index}_#{index}", value] if type == :text
69
+ end
70
+ end
71
+ end
72
+
73
+ # Extracts values from text tokens
74
+ # => [ ..., "Good", "Boy", ... ]
75
+ def text_tokens_texts
76
+ @text_tokens_texts ||= linearize(text_tokens).map(&:to_s).map(&:strip)
77
+ end
78
+
79
+ # Splits things requires translations to per-request chunks
80
+ # (groups less 2k sym)
81
+ # => [[ ..., "Good", "Boy", ... ]]
82
+ def chunks
83
+ @chunks ||= DeepLDiff::Chunker.new(text_tokens_texts).call
84
+ end
85
+
86
+ # Translates/loads from cache values from each chunk
87
+ # => [[ ..., "Horoshiy", "Malchik", ... ]]
88
+ def chunks_translated
89
+ @chunks_translated ||= chunks.map do |chunk|
90
+ cached, missing = cache.cached_and_missing(chunk)
91
+ if missing.empty?
92
+ cached
93
+ else
94
+ cache.store(chunk, cached, call_api(missing))
95
+ end
96
+ end
97
+ end
98
+
99
+ # Restores indexes for translated tokens
100
+ # => { ..., "1_1" => "Horoshiy", 1_3 => "Malchik", ... }
101
+ def text_tokens_translated
102
+ @text_tokens_texts_translated ||=
103
+ restore(text_tokens, chunks_translated.flatten)
104
+ end
105
+
106
+ # Restores tokens translated + adds same spacing as in source token
107
+ # => [[..., [ "Horoshiy", :text ], ...]]
108
+ # rubocop:disable Metrics/AbcSize
109
+ def tokens_translated
110
+ @tokens_translated ||= tokens.dup.tap do |tokens|
111
+ text_tokens_translated.each do |index, value|
112
+ group_index, index = index.split("_")
113
+ tokens[group_index.to_i][index.to_i][0] =
114
+ restore_spacing(tokens[group_index.to_i][index.to_i][0], value)
115
+ end
116
+ end
117
+ end
118
+ # rubocop:enable Metrics/AbcSize
119
+
120
+ def restore_spacing(source_value, value)
121
+ DeepLDiff::Spacing.restore(source_value, value)
122
+ end
123
+
124
+ # Restores texts from tokens
125
+ # [..., "<b>Horoshiy</b> Malchik", ...]
126
+ def texts_translated
127
+ @texts_translated ||= tokens_translated.map do |group|
128
+ group.map { |value, type| type == :text ? value : fix_ascii(value) }.join
129
+ end
130
+ end
131
+
132
+ # Final result
133
+ def translation
134
+ @translation ||= restore(values, texts_translated)
135
+ end
136
+
137
+ def call_api(values)
138
+ check_rate_limit(values)
139
+ [api.translate(values, from, to, **options)].flatten.map(&:text)
140
+ end
141
+
142
+ def cache
143
+ @cache ||= DeepLDiff::Cache.new(from, to)
144
+ end
145
+
146
+ def check_rate_limit(values)
147
+ return if rate_limiter.nil?
148
+
149
+ size = values.map(&:size).inject(0) { |sum, x| sum + x }
150
+ rate_limiter.check(size)
151
+ end
152
+
153
+ # Markup should not contain control characters
154
+ def fix_ascii(value)
155
+ value.gsub(/[\u0000-\u001F]/, " ")
156
+ end
157
+ end
@@ -0,0 +1,31 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Adds same count leading-trailing spaces left has to the right
4
+ class DeepLDiff::Spacing
5
+ class << self
6
+ # DeepLDiff::Spacing.restore(" a ", "Z") # => " Z "
7
+ def restore(left, right)
8
+ leading(left) + right.strip + trailing(left)
9
+ end
10
+
11
+ private
12
+
13
+ def spaces(count)
14
+ ([" "] * count).join
15
+ end
16
+
17
+ def leading(value)
18
+ pos = value =~ /[^[:space:]]+/ui
19
+ return "" if pos.nil? || pos.zero?
20
+
21
+ value[0..(pos - 1)]
22
+ end
23
+
24
+ def trailing(value)
25
+ pos = value =~ /[[:space:]]+\z/ui
26
+ return "" if pos.nil?
27
+
28
+ value[pos..]
29
+ end
30
+ end
31
+ end
@@ -0,0 +1,159 @@
1
+ # frozen_string_literal: true
2
+
3
+ class DeepLDiff::Tokenizer < ::Ox::Sax
4
+ def initialize(source)
5
+ @pos = nil
6
+ @source = source
7
+ @tokens = nil
8
+ @context = []
9
+ @sequence = []
10
+ @indicies = []
11
+ end
12
+
13
+ def instruct(target)
14
+ start_markup(target)
15
+ end
16
+
17
+ def end_instruct(target)
18
+ end_markup(target)
19
+ end
20
+
21
+ def start_element(name)
22
+ start_markup(name)
23
+ end
24
+
25
+ def end_element(name)
26
+ end_markup(name)
27
+ end
28
+
29
+ def attr(name, value)
30
+ return unless @context.last == :span
31
+ return unless name == :class && value == "notranslate"
32
+ return if notranslate?
33
+
34
+ @sequence[-1] = :notranslate
35
+ end
36
+
37
+ def text(value)
38
+ return if value == ""
39
+
40
+ @sequence << (SKIP.include?(@context.last) ? :markup : :text)
41
+ @indicies << @pos - 1
42
+ end
43
+
44
+ def tokens
45
+ @tokens ||= token_sequences_joined
46
+ .tap { |tokens| make_sentences_from_last_token(tokens) }
47
+ end
48
+
49
+ private
50
+
51
+ def token_sequences_joined
52
+ raw_tokens.each_with_object([]) do |token, tokens|
53
+ if tokens.empty? # Initial state
54
+ tokens << token
55
+ elsif tokens.last[1] == token[1]
56
+ # Join series of tokens of the same type into one
57
+ tokens.last[0].concat(token[0])
58
+ else
59
+ # If token before :markup is :text we need to split it into sentences
60
+ make_sentences_from_last_token(tokens)
61
+ tokens << token
62
+ end
63
+ end
64
+ end
65
+
66
+ def make_sentences_from_last_token(tokens)
67
+ return if tokens.empty?
68
+
69
+ tokens.concat(sentences(tokens.pop[0])) if tokens.last[1] == :text
70
+ end
71
+
72
+ # rubocop: disable Metrics/MethodLength
73
+ def sentences(value)
74
+ return [] if value.strip.empty?
75
+
76
+ boundaries =
77
+ Punkt::SentenceTokenizer
78
+ .new(value)
79
+ .sentences_from_text(value)
80
+
81
+ return [[value, :text]] if boundaries.size == 1
82
+
83
+ boundaries.map.with_index do |(left, right), index|
84
+ next_boundary = boundaries[index + 1]
85
+ right = next_boundary[0] - 1 if next_boundary
86
+
87
+ [value[left..right], :text]
88
+ end
89
+ end
90
+ # rubocop:enable Metrics/MethodLength
91
+
92
+ # Whether the sequence is between `:notranslate` and `:end_notranslate`
93
+ def notranslate?
94
+ @sequence.select { |item| item[/notranslate/] }.last == :notranslate
95
+ end
96
+
97
+ # Returns the item for last opened span
98
+ def end_span
99
+ return :markup unless notranslate?
100
+
101
+ opened_spans = @sequence
102
+ .reverse
103
+ .take_while { |item| item != :notranslate }
104
+ .map { |item| { span: 1, end_span: -1 }.fetch(item, 0) }
105
+ .reduce(0, :+)
106
+
107
+ opened_spans.positive? ? :end_span : :end_notranslate
108
+ end
109
+
110
+ def raw_tokens
111
+ @raw_tokens ||= @indicies.map.with_index do |i, n|
112
+ first = i
113
+ last = (@indicies[n + 1] || 0) - 1
114
+ value = fix_utf(@source.byteslice(first..last))
115
+ type = @sequence[n]
116
+ type = :text if INNER_SPANS.include?(type)
117
+ [value, type]
118
+ end
119
+ end
120
+
121
+ def fix_utf(value)
122
+ value.encode("UTF-8", undef: :replace, invalid: :replace, replace: " ")
123
+ end
124
+
125
+ def start_markup(name)
126
+ @context << name
127
+ @sequence << (if notranslate?
128
+ name == :span ? :span : :text
129
+ else
130
+ :markup
131
+ end)
132
+ @indicies << @pos - 1
133
+ end
134
+
135
+ def end_markup(name)
136
+ @context.pop
137
+ @sequence << (if notranslate?
138
+ name == :span ? end_span : :text
139
+ else
140
+ :markup
141
+ end)
142
+ @indicies << @pos - 1 unless @pos == @source.bytesize
143
+ end
144
+
145
+ class << self
146
+ def tokenize(value)
147
+ return [] if value.nil?
148
+
149
+ tokenizer = new(value).tap do |h|
150
+ Ox.sax_parse(h, StringIO.new(value), HTML_OPTIONS)
151
+ end
152
+ tokenizer.tokens
153
+ end
154
+ end
155
+
156
+ SKIP = %i[script style].freeze
157
+ INNER_SPANS = %i[notranslate span end_span end_notranslate].freeze
158
+ HTML_OPTIONS = { smart: true, skip: :skip_none }.freeze
159
+ end
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module DeepLDiff
4
+ VERSION = "1.0.2"
5
+ end
data/lib/deepl_diff.rb ADDED
@@ -0,0 +1,28 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "ox"
4
+ require "punkt-segmenter"
5
+ require "dry/initializer"
6
+ require "deepl"
7
+
8
+ require "deepl_diff/version"
9
+ require "deepl_diff/tokenizer"
10
+ require "deepl_diff/linearizer"
11
+ require "deepl_diff/chunker"
12
+ require "deepl_diff/spacing"
13
+ require "deepl_diff/cache"
14
+ require "deepl_diff/redis_cache_store"
15
+ require "deepl_diff/redis_rate_limiter"
16
+ require "deepl_diff/request"
17
+
18
+ module DeepLDiff
19
+ class << self
20
+ attr_accessor :api, :cache_store, :rate_limiter
21
+
22
+ def translate(*args)
23
+ Request.new(*args).call
24
+ end
25
+ end
26
+
27
+ CACHE_NAMESPACE = "deepl-diff"
28
+ end
metadata ADDED
@@ -0,0 +1,259 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: deepl_diff
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.2
5
+ platform: ruby
6
+ authors:
7
+ - Islam Gagiev
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2023-02-15 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.14'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.14'
27
+ - !ruby/object:Gem::Dependency
28
+ name: codeclimate-test-reporter
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: 1.0.0
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: 1.0.0
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '10.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '10.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '3.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '3.0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: rubocop
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: simplecov
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: connection_pool
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ type: :runtime
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ - !ruby/object:Gem::Dependency
112
+ name: deepl-rb
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :runtime
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: dry-initializer
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :runtime
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ - !ruby/object:Gem::Dependency
140
+ name: ox
141
+ requirement: !ruby/object:Gem::Requirement
142
+ requirements:
143
+ - - ">="
144
+ - !ruby/object:Gem::Version
145
+ version: '0'
146
+ type: :runtime
147
+ prerelease: false
148
+ version_requirements: !ruby/object:Gem::Requirement
149
+ requirements:
150
+ - - ">="
151
+ - !ruby/object:Gem::Version
152
+ version: '0'
153
+ - !ruby/object:Gem::Dependency
154
+ name: punkt-segmenter
155
+ requirement: !ruby/object:Gem::Requirement
156
+ requirements:
157
+ - - ">="
158
+ - !ruby/object:Gem::Version
159
+ version: '0'
160
+ type: :runtime
161
+ prerelease: false
162
+ version_requirements: !ruby/object:Gem::Requirement
163
+ requirements:
164
+ - - ">="
165
+ - !ruby/object:Gem::Version
166
+ version: '0'
167
+ - !ruby/object:Gem::Dependency
168
+ name: ratelimit
169
+ requirement: !ruby/object:Gem::Requirement
170
+ requirements:
171
+ - - ">="
172
+ - !ruby/object:Gem::Version
173
+ version: '0'
174
+ type: :runtime
175
+ prerelease: false
176
+ version_requirements: !ruby/object:Gem::Requirement
177
+ requirements:
178
+ - - ">="
179
+ - !ruby/object:Gem::Version
180
+ version: '0'
181
+ - !ruby/object:Gem::Dependency
182
+ name: redis
183
+ requirement: !ruby/object:Gem::Requirement
184
+ requirements:
185
+ - - ">="
186
+ - !ruby/object:Gem::Version
187
+ version: '0'
188
+ type: :runtime
189
+ prerelease: false
190
+ version_requirements: !ruby/object:Gem::Requirement
191
+ requirements:
192
+ - - ">="
193
+ - !ruby/object:Gem::Version
194
+ version: '0'
195
+ - !ruby/object:Gem::Dependency
196
+ name: redis-namespace
197
+ requirement: !ruby/object:Gem::Requirement
198
+ requirements:
199
+ - - ">="
200
+ - !ruby/object:Gem::Version
201
+ version: '0'
202
+ type: :runtime
203
+ prerelease: false
204
+ version_requirements: !ruby/object:Gem::Requirement
205
+ requirements:
206
+ - - ">="
207
+ - !ruby/object:Gem::Version
208
+ version: '0'
209
+ description: "\nDeepL API wrapper for Ruby which helps to translate only changes\nbetween
210
+ revisions of long texts.\n "
211
+ email:
212
+ - omniacinis@gmail.com
213
+ executables: []
214
+ extensions: []
215
+ extra_rdoc_files: []
216
+ files:
217
+ - ".gitignore"
218
+ - ".rspec"
219
+ - ".rubocop.yml"
220
+ - ".travis.yml"
221
+ - Gemfile
222
+ - README.md
223
+ - Rakefile
224
+ - deepl_diff.gemspec
225
+ - lib/deepl_diff.rb
226
+ - lib/deepl_diff/cache.rb
227
+ - lib/deepl_diff/chunker.rb
228
+ - lib/deepl_diff/linearizer.rb
229
+ - lib/deepl_diff/redis_cache_store.rb
230
+ - lib/deepl_diff/redis_rate_limiter.rb
231
+ - lib/deepl_diff/request.rb
232
+ - lib/deepl_diff/spacing.rb
233
+ - lib/deepl_diff/tokenizer.rb
234
+ - lib/deepl_diff/version.rb
235
+ homepage: https://github.com/Halvanhelv/deepl_diff
236
+ licenses: []
237
+ metadata:
238
+ allowed_push_host: https://rubygems.org
239
+ post_install_message:
240
+ rdoc_options: []
241
+ require_paths:
242
+ - lib
243
+ required_ruby_version: !ruby/object:Gem::Requirement
244
+ requirements:
245
+ - - ">="
246
+ - !ruby/object:Gem::Version
247
+ version: '0'
248
+ required_rubygems_version: !ruby/object:Gem::Requirement
249
+ requirements:
250
+ - - ">="
251
+ - !ruby/object:Gem::Version
252
+ version: '0'
253
+ requirements: []
254
+ rubygems_version: 3.1.6
255
+ signing_key:
256
+ specification_version: 4
257
+ summary: DeepL API wrapper for Ruby which helps to translate only changes between
258
+ revisions of long texts.
259
+ test_files: []