damerau-levenshtein 1.2.0 → 1.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: acbd86c82aab23d17130fda4d7e05aa39d10571d
-   data.tar.gz: 8421eaf8a995a2b217ca77a703b0ac328a729202
+   metadata.gz: 67c6faa0317240defb0210ab642c514942855fdb
+   data.tar.gz: 4ad7bd77b1eb7fec82c489e78d572fa75f25c2c9
  SHA512:
-   metadata.gz: 3ad4824365414744893442e7c64698f9c0bd0ca4b948b52d53b55f107a89fc5e96971e1bf81ac50a051ed63bf68dcb33493e08af96fbe16aa9d10bd030b0cae6
-   data.tar.gz: e1e72f50b29be357115bdf666bb7c7a57c9b88bd1eecf0457d25dba469ebe20cdd65e8b4b58c153d947041ff2dd5f2593743787257bd575d4689416a31065ee3
+   metadata.gz: fd15b89736839baa55d4023c15a5ff8511364b55b4e585c3bf68dcd4bb75e62e9e5dfdff1505691694c502c875f19b74c996691dec7261d184be97e0b0636ede
+   data.tar.gz: ae7e131dcd30ed4f3d98c78bfbe4359a63392c34f301c76aef975fbf94ec9f60abd9e61f500f71c0a1bfe9e493080eb6291c50a736d2c68802681524ad580f47
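The SHA1 and SHA512 entries above are digests of the two archives packed inside the `.gem` file (a `.gem` is a plain tar archive holding `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`). A minimal sketch for re-checking them with Ruby's standard `digest` library, assuming the gem has already been unpacked into the current directory (for example with `tar -xf damerau-levenshtein-1.3.0.gem`). The helper `digests_for` and the `EXPECTED` table are illustrative names, not part of the gem.

```ruby
require "digest"

# Hypothetical helper: SHA1 and SHA512 hex digests of a file, i.e. the two
# kinds of values recorded per archive in checksums.yaml.
def digests_for(path)
  data = File.binread(path)
  { sha1: Digest::SHA1.hexdigest(data), sha512: Digest::SHA512.hexdigest(data) }
end

# SHA1 values copied from the 1.3.0 side of the checksums.yaml diff.
EXPECTED = {
  "metadata.gz" => "67c6faa0317240defb0210ab642c514942855fdb",
  "data.tar.gz" => "4ad7bd77b1eb7fec82c489e78d572fa75f25c2c9"
}.freeze

# Assumes the archives were unpacked here; files that are missing are skipped.
EXPECTED.each do |file, sha1|
  next unless File.exist?(file)
  status = digests_for(file)[:sha1] == sha1 ? "OK" : "MISMATCH"
  puts "#{file}: #{status}"
end
```

The SHA512 values can be compared the same way through the `:sha512` key.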
data/.gitignore CHANGED
@@ -5,6 +5,10 @@ tmp
  *.o
  *.bundle
  *.gem
+ .nvimlog
+ .vim.custom
+ .byebug_history
+
  # rcov generated
  coverage

@@ -1 +1 @@
- 2.2.4
+ 2.4.1
@@ -2,7 +2,8 @@ rvm:
  - 2.0
  - 2.1
  - 2.2
- - 2.3.1
+ - 2.3
+ - 2.4
  before_install: "gem update bundler"
  script:
  - "bundle exec rake"
@@ -1,6 +1,8 @@
  damerau-levenshtein CHANGELOG
  =============================

+ 1.3.0 -- (issue #10) shows difference between two strings
+
  1.2.0 -- add edit distance for array of integers (by @azhi)

  1.1.3 -- add ruby 2.3.1 to travis tests by request from @greysteil
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
- source 'https://rubygems.org'
+ # frozen_string_literal: true
+
+ source "https://rubygems.org"

  gemspec
@@ -1,6 +1,6 @@
  The MIT License (MIT)

- Copyright (c) 2011-2016 Dmitry Mozzherin
+ Copyright (c) 2011-2017 Dmitry Mozzherin

  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,13 +1,12 @@
- damerau-levenshtein
- ===================
+ # damerau-levenshtein #

  [![Gem Version][gem_svg]][gem]
  [![Continuous Integration Status][ci_svg]][ci]
  [![Dependency Status][dep_svg]][dep]
  [![Coverage Status][cov_svg]][cov]

- The damerau-levenshtein gem allows to find edit distance between two UTF-8
- or ASCII encoded strings with O(N\*M) efficiency.
+ The damerau-levenshtein gem allows to find [edit distance][ed] between two
+ UTF-8 or ASCII encoded strings with O(N\*M) efficiency.

  This gem implements pure Levenshtein algorithm, Damerau modification of it
  (where 2 character transposition counts as 1 edit distance). It also includes
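The README describes two variants: plain Levenshtein, and the Damerau modification in which swapping two adjacent characters costs one edit. A pure-Ruby sketch of the latter, written as the common "optimal string alignment" recurrence. It is an illustration under that assumption, not a port of the gem's C extension, and it ignores the `block_size` and `max_distance` arguments. Because it only indexes with `[]`, it accepts both strings (split into UTF-8 characters) and arrays of integers. The name `osa_distance` is hypothetical.

```ruby
# Hypothetical helper: Damerau-Levenshtein distance in its "optimal string
# alignment" form (Levenshtein plus single adjacent transpositions).
# Does not model the gem's block_size or max_distance arguments.
def osa_distance(a, b)
  a = a.chars if a.is_a?(String)
  b = b.chars if b.is_a?(String)
  # d[i][j] = distance between the first i items of a and the first j of b
  d = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (0..a.size).each { |i| d[i][0] = i }
  (0..b.size).each { |j| d[0][j] = j }
  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1,          # deletion
                 d[i][j - 1] + 1,          # insertion
                 d[i - 1][j - 1] + cost].min # substitution or match
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min # adjacent transposition
      end
    end
  end
  d[a.size][b.size]
end

osa_distance("Something", "Smoething")   # => 1
osa_distance([1, 2, 3, 5], [1, 2, 3, 4]) # => 1
```

Both calls mirror pairs used in the README, where they are documented as returning 1.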
@@ -20,21 +19,28 @@ require "damerau-levenshtein"
  DamerauLevenshtein.distance("Something", "Smoething") #returns 1
  ```

+ It also returns a diff between two strings according to Levenshtein alrorithm.
+ The diff is expressed by tags `<ins>`, `<del>`, and `<subst>`
+
+ ```ruby
+ require "damerau-levenshtein"
+ differ = DamerauLevenshtein::Differ.new
+ differ.run("corn", "cron")
+ # output: ["c<subst>or</subst>n", "c<subst>ro</subst>n"]
+ ```
+
  Gem damerau-levenshtein is compatible with ruby versions 1.8.7
  and 1.9.2 and higher, as well as 2.0.0 and higher

- Dependencies
- -------------
+ ## Dependencies ##

  sudo apt-get install build-essential libgmp3-dev

- Installation
- ------------
+ ## Installation ##

  gem install damerau-levenshtein

- Examples
- --------
+ ## Examples ##

  ```ruby
  require "damerau-levenshtein"
@@ -65,25 +71,55 @@ dl.distance("Something", "meSothing", 2) #returns 2 instead of 4
  dl.distance("Sjöstedt", "Sjostedt") #returns 1
  ```

- API Description
- -----------
+ * compare two arrays
+
+ ```ruby
+ dl.array_distance([1,2,3,5], [1,2,3,4]) #returns 1
+ ```
+
+ * return diff between two strings
+
+ ```ruby
+ differ = DamerauLevenshtein::Differ.new
+ differ.run("Something", "smthg")
+ ```

- Gem defines two methods
+ * return diff between two strings in raw format
+
+ ```ruby
+ differ = DamerauLevenshtein::Differ.new
+ differ.format = :raw
+ differ.run("Something", "smthg")
+ ```
+
+ ## API Description ##
+
+ ### Methods ###
+
+ #### DamerauLevenshtein.version

  ```ruby
  DamerauLevenshtein.version
  #returns version number of the gem
+ ```
+
+ #### DamerauLevenshtein.distance

+ ```ruby
  DamerauLevenshtein.distance(string1, string2, block_size, max_distance)
- #returns [edit distance][ed] between 2 strings
- ```
+ #returns edit distance between 2 strings

+ DamerauLevenshtein.string_distance(string1, string2, block_size, max_distance)
+ # an alias for .distance

+ DamerauLevenshtein.array_distance(array1, array2, block_size, max_distance)
+ # returns edit distance between 2 arrays of integers
+ ```

- DamerauLevenshtein.distance takes 4 arguments:
+ `DamerauLevenshtein.distance` and `.array_distance` take 4 arguments:

- * `string1`
- * `string2`
+ * `string1` (`array1` for `.array_distance`)
+ * `string2` (`array2` for `.array_distance`)
  * `block_size` (default is 1)
  * `max_distance` (default is 10)

@@ -113,57 +149,63 @@ Levenshtein algorithm is expensive, so it makes sense to give up when edit
  distance is becoming too big. The argument max_distance does just that.

  ```ruby
+
  DamerauLevenshtein.distance("abcdefg", "1234567", 0, 3)
  # output: 4 -- it gave up when edit distance exceeded 3
+
  ```

- `DamerauLevenshtein.string_distance` is an alias of
- `DamerauLevenshtein.distance`
+ #### DamerauLevenshtein::Differ
+
+ `differ = DamerauLevenshtein::Differ.new` creates an instance of new differ class to return difference between two strings
+
+ `differ.format` shows current format for diff. Default is `:tag` format
+
+ `differ.format = :raw` changes current format for diffs. Possible values are `:tag` and `:raw`

- `DamerauLevenshtein.array_distance` has the same parameters as
- `DamerauLevenshtein.distance`, but operates on arrays of Integers.
+ `differ.run("String1", "String2")` returns difference between two strings.
+
+ For example:

  ```ruby
- DamerauLevenshtein.array_distance([1,2,4], [1,2,3])
- # output: 1
+ differ = DamerauLevenshtein::Differ.new
+ differ.run("Something", "smthng")
+ # output: ["<ins>S</ins><subst>o</subst>m<ins>e</ins>th<ins>i</ins>ng",
+ # "<del>S</del><subst>s</subst>m<del>e</del>th<del>i</del>ng"]
+
  ```

- Contributing to damerau-levenshtein
- -----------------------------------
+
+ ## Contributing to damerau-levenshtein ##

  * Check out the latest master to make sure the feature hasn't been
- implemented or the bug hasn't been fixed yet
+ implemented or the bug hasn't been fixed yet
  * Check out the issue tracker to make sure someone already hasn't requested
- it and/or contributed it
+ it and/or contributed it
  * Fork the project
  * Start a feature/bugfix branch
  * Commit and push until you are happy with your contribution
  * Make sure to add tests for it. This is important so I don't break it
- in a future version unintentionally.
+ in a future version unintentionally.
  * Please try not to mess with the Rakefile, version, or history. If you want
- to have your own version, or is otherwise necessary, that is fine, but please
- isolate to its own commit so I can cherry-pick around it.
+ to have your own version, or is otherwise necessary, that is fine, but please
+ isolate to its own commit so I can cherry-pick around it.

- Versioning
- ----------
+ ## Versioning ##

  This gem is following practices of [Semantic Versioning][semver]

- Authors
- -------
+ ## Authors ##

  [Dmitry Mozzherin][dimus]

- Contributors
- ------------
+ ## Contributors ##

- [lazylester][lazylester], [Ran Xie][skarlit], [Alexey Zapparov][ixti],
- [azhi][azhi]
+ [lazylester][lazylester], [Ran Xie][skarlit], [Alexey Zapparov][ixti], [azhi][azhi]

- Copyright
- ---------
+ ## Copyright ##

- Copyright (c) 2011-2016 Dmitry Mozzherin. See LICENSE.txt for
+ Copyright (c) 2011-2017 Dmitry Mozzherin. See LICENSE.txt for
  further details.

  [gem_svg]: https://badge.fury.io/rb/damerau-levenshtein.svg
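The README's `max_distance` argument gives up once the distance grows too large, and its example returns 4 for a limit of 3. A sketch of one way to get that behaviour, assuming "give up" means returning `max_distance + 1` (inferred from that single example, not from the gem's source): compute plain Levenshtein one row at a time and stop as soon as the smallest value in a row exceeds the limit. The row minimum is a safe lower bound only when there are no transpositions, so this sketch deliberately leaves them out. `capped_levenshtein` is a hypothetical name.

```ruby
# Hypothetical helper: plain Levenshtein with an early exit. Returns
# max_distance + 1 once the distance is known to exceed max_distance
# (an assumption modeled on the README's max_distance example).
def capped_levenshtein(a, b, max_distance)
  a = a.chars if a.is_a?(String)
  b = b.chars if b.is_a?(String)
  prev = (0..b.size).to_a # distances from the empty prefix of a
  a.each_with_index do |x, i|
    curr = [i + 1]
    b.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    # Every alignment passes through this row and costs never decrease
    # along it, so the row minimum is a lower bound on the final distance.
    return max_distance + 1 if curr.min > max_distance
    prev = curr
  end
  [prev.last, max_distance + 1].min
end

capped_levenshtein("abcdefg", "1234567", 3) # => 4 (gave up after row 4)
capped_levenshtein("kitten", "sitting", 10) # => 3
```

The early exit only saves time; for pairs within the limit the result is the ordinary Levenshtein distance.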
data/Rakefile CHANGED
@@ -1,25 +1,32 @@
+ # frozen_string_literal: true
+
  require "bundler/gem_tasks"
  require "rspec/core/rake_task"
- require 'cucumber/rake/task'
+ require "cucumber/rake/task"
  require "rubocop/rake_task"
- require 'rake/dsl_definition'
- require 'rake'
- require 'rake/extensiontask'
- require 'rspec'
+ require "rake/dsl_definition"
+ require "rake"
+ require "rake/extensiontask"
+ require "rspec"

  RSpec::Core::RakeTask.new(:spec) do |rspec|
-   rspec.pattern = FileList['spec/**/*_spec.rb']
+   rspec.pattern = FileList["spec/**/*_spec.rb"]
  end

  Cucumber::Rake::Task.new(:features)

- Rake::ExtensionTask.new('damerau_levenshtein') do |extension|
-   extension.ext_dir = 'ext/damerau_levenshtein'
-   extension.lib_dir = 'lib/damerau-levenshtein'
+ Rake::ExtensionTask.new("damerau_levenshtein") do |extension|
+   extension.ext_dir = "ext/damerau_levenshtein"
+   extension.lib_dir = "lib/damerau-levenshtein"
  end

  Rake::Task[:spec].prerequisites << :compile
  Rake::Task[:features].prerequisites << :compile

  RuboCop::RakeTask.new
- task :default => [:rubocop, :spec]
+ task default: %i[rubocop spec]
+
+ desc "open an irb session preloaded with this gem"
+ task :console do
+   sh "irb -r pp -r ./lib/damerau-levenshtein.rb"
+ end
@@ -1,7 +1,11 @@
1
+ # frozen_string_literal: true
2
+
1
3
  $LOAD_PATH.push File.expand_path("../lib", __FILE__)
2
4
 
3
5
  require "damerau-levenshtein/version"
4
6
 
7
+ # rubocop:disable Metrics/BlockLength
8
+
5
9
  Gem::Specification.new do |s|
6
10
  s.name = "damerau-levenshtein"
7
11
  s.version = DamerauLevenshtein::VERSION
@@ -25,6 +29,7 @@ Gem::Specification.new do |s|
25
29
  s.add_development_dependency "rspec", "~> 3.5"
26
30
  # activesupport >= 5.0 does not support Ruby < 2.2
27
31
  s.add_development_dependency "activesupport", "~> 4.2"
32
+ s.add_development_dependency "byebug", "~> 9.0"
28
33
  s.add_development_dependency "cucumber", "~> 2.4"
29
34
  s.add_development_dependency "ruby-prof", "~> 0.15"
30
35
  s.add_development_dependency "shoulda", "~> 3.5"
@@ -34,3 +39,5 @@ Gem::Specification.new do |s|
34
39
  s.add_development_dependency "rake", "~> 11.2"
35
40
  s.add_development_dependency "rake-compiler", "~> 1.0"
36
41
  end
42
+
43
+ # rubocop:enable all
@@ -1,3 +1,5 @@
+ # frozen_string_literal: true
+
  # Loads mkmf which is used to make makefiles for Ruby extensions
  require "mkmf"

@@ -1,7 +1,10 @@
  # encoding: UTF-8
+ # frozen_string_literal: true

- require "damerau-levenshtein/version"
- require "damerau-levenshtein/damerau_levenshtein"
+ require_relative "damerau-levenshtein/version"
+ require_relative "damerau-levenshtein/damerau_levenshtein"
+ require_relative "damerau-levenshtein/formatter"
+ require_relative "damerau-levenshtein/differ"

  # Damerau-Levenshtein algorithm
  module DamerauLevenshtein
@@ -0,0 +1,107 @@
+ # frozen_string_literal: true
+
+ module DamerauLevenshtein
+   # Shows the difference between two strings in character by character
+   # resolution
+   class Differ
+     FORMATS = %i[raw tag].freeze
+     attr_accessor :format
+
+     def initialize
+       @format = :tag
+       @matrix = []
+     end
+
+     def format=(new_format)
+       new_format = new_format.to_sym
+       @format = new_format if FORMATS.include?(new_format)
+     end
+
+     def run(str1, str2)
+       @len1 = str1.size.freeze
+       @len2 = str2.size.freeze
+       prepare_matrix
+       edit_distance(str1, str2)
+       raw = trace_back
+       formatter_factory.show(raw, str1, str2)
+     end
+
+     private
+
+     def formatter_factory
+       formatter =
+         case @format
+         when :tag
+           DamerauLevenshtein::FormatterTag
+         when :raw
+           DamerauLevenshtein::FormatterRaw
+         end
+       Formatter.new(formatter)
+     end
+
+     def edit_distance(str1, str2)
+       (1..@len2).each do |i|
+         (1..@len1).each do |j|
+           no_change(i, j) && next if str2[i - 1] == str1[j - 1]
+           @matrix[i][j] = [del(i, j), ins(i, j), subst(i, j)].min + 1
+         end
+       end
+     end
+
+     def trace_back
+       res = []
+       cell = [@len2, @len1]
+       while cell != [0, 0]
+         cell, char = char_data(cell)
+         res.unshift char
+       end
+       res
+     end
+
+     def char_data(cell)
+       char = { distance: @matrix[cell[0]][cell[1]] }
+       val = find_previous(cell)
+       previous_value = val[0][0]
+       char[:type] = previous_value == char[:distance] ? :same : val[1]
+       cell = val.pop
+       [cell, char]
+     end
+
+     def find_previous(cell)
+       candidates = [[[ins(*cell), 1], :ins, [cell[0], cell[1] - 1]],
+                     [[del(*cell), 2], :del, [cell[0] - 1, cell[1]]],
+                     [[subst(*cell), 0], :subst, [cell[0] - 1, cell[1] - 1]]]
+       select_cell(candidates)
+     end
+
+     def select_cell(candidates)
+       candidates.select { |e| e[-1][0] >= 0 && e[-1][1] >= 0 }.
+         sort_by(&:first).first
+     end
+
+     def del(i, j)
+       @matrix[i - 1][j]
+     end
+
+     def ins(i, j)
+       @matrix[i][j - 1]
+     end
+
+     def subst(i, j)
+       @matrix[i - 1][j - 1]
+     end
+
+     def no_change(i, j)
+       @matrix[i][j] = @matrix[i - 1][j - 1]
+     end
+
+     def prepare_matrix
+       @matrix = []
+       @matrix << (0..@len1).to_a
+       @len2.times do |i|
+         ary = [i + 1] + (1..@len1).map { nil }
+         @matrix << ary
+       end
+     end
+   end
+ end
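`Differ#run` above fills an edit-distance matrix and then walks it backwards from the bottom-right cell (`trace_back`), labelling each step. A simplified, self-contained sketch of that fill-then-trace-back idea follows. It is not the gem's code: the hypothetical `edit_ops` returns bare symbols instead of `{ distance:, type: }` hashes, uses its own tie-breaking (match, then substitution, then deletion, then insertion), and names `:del` for a character of `source` missing from `target` and `:ins` for the reverse, which need not match the orientation of the gem's `<ins>`/`<del>` tags.

```ruby
# Hypothetical helper: a simplified fill-then-trace-back diff, not the gem's
# Differ. Returns one symbol per alignment step.
def edit_ops(source, target)
  s = source.chars
  t = target.chars
  # d[i][j] = Levenshtein distance between s[0, i] and t[0, j]
  d = Array.new(s.size + 1) { Array.new(t.size + 1, 0) }
  (0..s.size).each { |i| d[i][0] = i }
  (0..t.size).each { |j| d[0][j] = j }
  (1..s.size).each do |i|
    (1..t.size).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
    end
  end

  # Walk back from the bottom-right cell, choosing a predecessor that
  # explains the current value, and prepend the matching operation.
  ops = []
  i = s.size
  j = t.size
  while i > 0 || j > 0
    diagonal = i > 0 && j > 0
    if diagonal && s[i - 1] == t[j - 1] && d[i][j] == d[i - 1][j - 1]
      ops.unshift(:same)
      i -= 1
      j -= 1
    elsif diagonal && d[i][j] == d[i - 1][j - 1] + 1
      ops.unshift(:subst)
      i -= 1
      j -= 1
    elsif i > 0 && d[i][j] == d[i - 1][j] + 1
      ops.unshift(:del) # s[i - 1] has no counterpart in target
      i -= 1
    else
      ops.unshift(:ins) # t[j - 1] has no counterpart in source
      j -= 1
    end
  end
  ops
end

edit_ops("corn", "cron") # => [:same, :subst, :subst, :same]
```

For this pair the sketch lines up with the README's `c<subst>or</subst>n` output: two substitutions framed by unchanged characters.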
@@ -0,0 +1,91 @@
+ # frozen_string_literal: true
+
+ module DamerauLevenshtein
+   # Formats supplied strings according to their differences
+   class Formatter
+     def initialize(formatter)
+       @formatter = formatter
+     end
+
+     def show(raw_format, str1, str2)
+       @formatter.show(raw_format, str1, str2)
+     end
+   end
+
+   # Outputs raw format for two strings
+   module FormatterRaw
+     def self.show(raw_format, _, _)
+       raw_format
+     end
+   end
+
+   # Outputs strings marked with tags
+   module FormatterTag
+     class << self
+       def show(raw_format, str1, str2)
+         inverted_raw_format = raw_format.map do |e|
+           type = invert_type(e[:type])
+           { distance: e[:distance], type: type }
+         end
+         [show_string(raw_format, str1, str2),
+          show_string(inverted_raw_format, str2, str1)]
+       end
+
+       private
+
+       def invert_type(type)
+         case type
+         when :del
+           :ins
+         when :ins
+           :del
+         else
+           type
+         end
+       end
+
+       def show_string(raw, str1, str2)
+         data = { res: [], type: nil, deletes: 0, inserts: 0,
+                  str1: str1, str2: str2 }
+         raw.each_with_index do |e, i|
+           process_entry(e, i, data)
+         end
+         data[:res] << format("</%s>", data[:type]) if data[:type] != :same
+         data[:res].join("")
+       end
+
+       def process_entry(e, i, data)
+         if data[:type] && e[:type] != data[:type]
+           insert_tags(e, data)
+         elsif data[:type].nil?
+           data[:res] << format("<%s>", e[:type]) if e[:type] != :same
+         end
+         insert_letter(e, i, data)
+       end
+
+       def insert_tags(entry, data)
+         data[:res] << format("</%s>", data[:type]) if data[:type] != :same
+         data[:res] << format("<%s>", entry[:type]) if entry[:type] != :same
+       end
+
+       def insert_letter(entry, index, data)
+         if entry[:type] == :del
+           insert_del(index, data)
+         else
+           insert_others(index, data)
+         end
+         data[:inserts] += 1 if entry[:type] == :ins
+         data[:type] = entry[:type]
+       end
+
+       def insert_del(i, data)
+         data[:res] << data[:str2][i - data[:inserts]]
+         data[:deletes] += 1
+       end
+
+       def insert_others(i, data)
+         data[:res] << data[:str1][i - data[:deletes]]
+       end
+     end
+   end
+ end
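`FormatterTag.show_string` above turns a per-character list of edit types into a string, opening a tag when the type changes and closing it when a run ends. The same run-grouping idea can be sketched in a few lines with `Enumerable#chunk_while`. The input shape (an array of `[type, char]` pairs for one string) and the name `tag_runs` are assumptions for illustration, and the sketch does not reproduce the gem's index bookkeeping for inserted and deleted characters.

```ruby
# Hypothetical helper: wraps each run of equal, non-:same types in
# <type>...</type>; :same characters are emitted untagged.
def tag_runs(pairs)
  runs = pairs.chunk_while { |left, right| left[0] == right[0] }
  pieces = runs.map do |run|
    type = run.first[0]
    text = run.map(&:last).join
    type == :same ? text : "<#{type}>#{text}</#{type}>"
  end
  pieces.join
end

tag_runs([[:same, "c"], [:subst, "o"], [:subst, "r"], [:same, "n"]])
# => "c<subst>or</subst>n"
```

Grouping runs first keeps the tag logic to a single branch, which is the main simplification compared with tracking the previous type by hand.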
@@ -1,4 +1,6 @@
+ # frozen_string_literal: true
+
  # Damerau Levenshtein algorithm
  module DamerauLevenshtein
-   VERSION = "1.2.0".freeze
+   VERSION = "1.3.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: damerau-levenshtein
  version: !ruby/object:Gem::Version
-   version: 1.2.0
+   version: 1.3.0
  platform: ruby
  authors:
  - Dmitry Mozzherin
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-09-15 00:00:00.000000000 Z
+ date: 2017-08-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rspec
@@ -38,6 +38,20 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '4.2'
+ - !ruby/object:Gem::Dependency
+   name: byebug
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '9.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '9.0'
  - !ruby/object:Gem::Dependency
    name: cucumber
    requirement: !ruby/object:Gem::Requirement
@@ -176,6 +190,8 @@ files:
  - ext/damerau_levenshtein/extconf.rb
  - lib/damerau-levenshtein.rb
  - lib/damerau-levenshtein/damerau_levenshtein.so
+ - lib/damerau-levenshtein/differ.rb
+ - lib/damerau-levenshtein/formatter.rb
  - lib/damerau-levenshtein/version.rb
  homepage: https://github.com/GlobalNamesArchitecture/damerau-levenshtein
  licenses:
@@ -198,7 +214,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.5.1
+ rubygems_version: 2.6.11
  signing_key:
  specification_version: 4
  summary: Calculation of editing distance for 2 strings using Levenshtein or Damerau-Levenshtein