damerau-levenshtein 1.2.0 → 1.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: acbd86c82aab23d17130fda4d7e05aa39d10571d
4
- data.tar.gz: 8421eaf8a995a2b217ca77a703b0ac328a729202
3
+ metadata.gz: 67c6faa0317240defb0210ab642c514942855fdb
4
+ data.tar.gz: 4ad7bd77b1eb7fec82c489e78d572fa75f25c2c9
5
5
  SHA512:
6
- metadata.gz: 3ad4824365414744893442e7c64698f9c0bd0ca4b948b52d53b55f107a89fc5e96971e1bf81ac50a051ed63bf68dcb33493e08af96fbe16aa9d10bd030b0cae6
7
- data.tar.gz: e1e72f50b29be357115bdf666bb7c7a57c9b88bd1eecf0457d25dba469ebe20cdd65e8b4b58c153d947041ff2dd5f2593743787257bd575d4689416a31065ee3
6
+ metadata.gz: fd15b89736839baa55d4023c15a5ff8511364b55b4e585c3bf68dcd4bb75e62e9e5dfdff1505691694c502c875f19b74c996691dec7261d184be97e0b0636ede
7
+ data.tar.gz: ae7e131dcd30ed4f3d98c78bfbe4359a63392c34f301c76aef975fbf94ec9f60abd9e61f500f71c0a1bfe9e493080eb6291c50a736d2c68802681524ad580f47
data/.gitignore CHANGED
@@ -5,6 +5,10 @@ tmp
5
5
  *.o
6
6
  *.bundle
7
7
  *.gem
8
+ .nvimlog
9
+ .vim.custom
10
+ .byebug_history
11
+
8
12
  # rcov generated
9
13
  coverage
10
14
 
@@ -1 +1 @@
1
- 2.2.4
1
+ 2.4.1
@@ -2,7 +2,8 @@ rvm:
2
2
  - 2.0
3
3
  - 2.1
4
4
  - 2.2
5
- - 2.3.1
5
+ - 2.3
6
+ - 2.4
6
7
  before_install: "gem update bundler"
7
8
  script:
8
9
  - "bundle exec rake"
@@ -1,6 +1,8 @@
1
1
  damerau-levenshtein CHANGELOG
2
2
  =============================
3
3
 
4
+ 1.3.0 -- (issue #10) shows difference between two strings
5
+
4
6
  1.2.0 -- add edit distance for array of integers (by @azhi)
5
7
 
6
8
  1.1.3 -- add ruby 2.3.1 to travis tests by request from @greysteil
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
- source 'https://rubygems.org'
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
2
4
 
3
5
  gemspec
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2011-2016 Dmitry Mozzherin
3
+ Copyright (c) 2011-2017 Dmitry Mozzherin
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining
6
6
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,13 +1,12 @@
1
- damerau-levenshtein
2
- ===================
1
+ # damerau-levenshtein #
3
2
 
4
3
  [![Gem Version][gem_svg]][gem]
5
4
  [![Continuous Integration Status][ci_svg]][ci]
6
5
  [![Dependency Status][dep_svg]][dep]
7
6
  [![Coverage Status][cov_svg]][cov]
8
7
 
9
- The damerau-levenshtein gem allows to find edit distance between two UTF-8
10
- or ASCII encoded strings with O(N\*M) efficiency.
8
+ The damerau-levenshtein gem allows to find [edit distance][ed] between two
9
+ UTF-8 or ASCII encoded strings with O(N\*M) efficiency.
11
10
 
12
11
  This gem implements pure Levenshtein algorithm, Damerau modification of it
13
12
  (where 2 character transposition counts as 1 edit distance). It also includes
@@ -20,21 +19,28 @@ require "damerau-levenshtein"
20
19
  DamerauLevenshtein.distance("Something", "Smoething") #returns 1
21
20
  ```
22
21
 
22
+ It also returns a diff between two strings according to the Levenshtein algorithm.
23
+ The diff is expressed by tags `<ins>`, `<del>`, and `<subst>`
24
+
25
+ ```ruby
26
+ require "damerau-levenshtein"
27
+ differ = DamerauLevenshtein::Differ.new
28
+ differ.run("corn", "cron")
29
+ # output: ["c<subst>or</subst>n", "c<subst>ro</subst>n"]
30
+ ```
31
+
23
32
  Gem damerau-levenshtein is compatible with ruby versions 1.8.7
24
33
  and 1.9.2 and higher, as well as 2.0.0 and higher
25
34
 
26
- Dependencies
27
- -------------
35
+ ## Dependencies ##
28
36
 
29
37
  sudo apt-get install build-essential libgmp3-dev
30
38
 
31
- Installation
32
- ------------
39
+ ## Installation ##
33
40
 
34
41
  gem install damerau-levenshtein
35
42
 
36
- Examples
37
- --------
43
+ ## Examples ##
38
44
 
39
45
  ```ruby
40
46
  require "damerau-levenshtein"
@@ -65,25 +71,55 @@ dl.distance("Something", "meSothing", 2) #returns 2 instead of 4
65
71
  dl.distance("Sjöstedt", "Sjostedt") #returns 1
66
72
  ```
67
73
 
68
- API Description
69
- -----------
74
+ * compare two arrays
75
+
76
+ ```ruby
77
+ dl.array_distance([1,2,3,5], [1,2,3,4]) #returns 1
78
+ ```
79
+
80
+ * return diff between two strings
81
+
82
+ ```ruby
83
+ differ = DamerauLevenshtein::Differ.new
84
+ differ.run("Something", "smthg")
85
+ ```
70
86
 
71
- Gem defines two methods
87
+ * return diff between two strings in raw format
88
+
89
+ ```ruby
90
+ differ = DamerauLevenshtein::Differ.new
91
+ differ.format = :raw
92
+ differ.run("Something", "smthg")
93
+ ```
94
+
95
+ ## API Description ##
96
+
97
+ ### Methods ###
98
+
99
+ #### DamerauLevenshtein.version
72
100
 
73
101
  ```ruby
74
102
  DamerauLevenshtein.version
75
103
  #returns version number of the gem
104
+ ```
105
+
106
+ #### DamerauLevenshtein.distance
76
107
 
108
+ ```ruby
77
109
  DamerauLevenshtein.distance(string1, string2, block_size, max_distance)
78
- #returns [edit distance][ed] between 2 strings
79
- ```
110
+ #returns edit distance between 2 strings
80
111
 
112
+ DamerauLevenshtein.string_distance(string1, string2, block_size, max_distance)
113
+ # an alias for .distance
81
114
 
115
+ DamerauLevenshtein.array_distance(array1, array2, block_size, max_distance)
116
+ # returns edit distance between 2 arrays of integers
117
+ ```
82
118
 
83
- DamerauLevenshtein.distance takes 4 arguments:
119
+ `DamerauLevenshtein.distance` and `.array_distance` take 4 arguments:
84
120
 
85
- * `string1`
86
- * `string2`
121
+ * `string1` (`array1` for `.array_distance`)
122
+ * `string2` (`array2` for `.array_distance`)
87
123
  * `block_size` (default is 1)
88
124
  * `max_distance` (default is 10)
89
125
 
@@ -113,57 +149,63 @@ Levenshtein algorithm is expensive, so it makes sense to give up when edit
113
149
  distance is becoming too big. The argument max_distance does just that.
114
150
 
115
151
  ```ruby
152
+
116
153
  DamerauLevenshtein.distance("abcdefg", "1234567", 0, 3)
117
154
  # output: 4 -- it gave up when edit distance exceeded 3
155
+
118
156
  ```
119
157
 
120
- `DamerauLevenshtein.string_distance` is an alias of
121
- `DamerauLevenshtein.distance`
158
+ #### DamerauLevenshtein::Differ
159
+
160
+ `differ = DamerauLevenshtein::Differ.new` creates an instance of new differ class to return difference between two strings
161
+
162
+ `differ.format` shows current format for diff. Default is `:tag` format
163
+
164
+ `differ.format = :raw` changes current format for diffs. Possible values are `:tag` and `:raw`
122
165
 
123
- `DamerauLevenshtein.array_distance` has the same parameters as
124
- `DamerauLevenshtein.distance`, but operates on arrays of Integers.
166
+ `differ.run("String1", "String2")` returns difference between two strings.
167
+
168
+ For example:
125
169
 
126
170
  ```ruby
127
- DamerauLevenshtein.array_distance([1,2,4], [1,2,3])
128
- # output: 1
171
+ differ = DamerauLevenshtein::Differ.new
172
+ differ.run("Something", "smthng")
173
+ # output: ["<ins>S</ins><subst>o</subst>m<ins>e</ins>th<ins>i</ins>ng",
174
+ # "<del>S</del><subst>s</subst>m<del>e</del>th<del>i</del>ng"]
175
+
129
176
  ```
130
177
 
131
- Contributing to damerau-levenshtein
132
- -----------------------------------
178
+
179
+ ## Contributing to damerau-levenshtein ##
133
180
 
134
181
  * Check out the latest master to make sure the feature hasn't been
135
- implemented or the bug hasn't been fixed yet
182
+ implemented or the bug hasn't been fixed yet
136
183
  * Check out the issue tracker to make sure someone already hasn't requested
137
- it and/or contributed it
184
+ it and/or contributed it
138
185
  * Fork the project
139
186
  * Start a feature/bugfix branch
140
187
  * Commit and push until you are happy with your contribution
141
188
  * Make sure to add tests for it. This is important so I don't break it
142
- in a future version unintentionally.
189
+ in a future version unintentionally.
143
190
  * Please try not to mess with the Rakefile, version, or history. If you want
144
- to have your own version, or is otherwise necessary, that is fine, but please
145
- isolate to its own commit so I can cherry-pick around it.
191
+ to have your own version, or is otherwise necessary, that is fine, but please
192
+ isolate to its own commit so I can cherry-pick around it.
146
193
 
147
- Versioning
148
- ----------
194
+ ## Versioning ##
149
195
 
150
196
  This gem is following practices of [Semantic Versioning][semver]
151
197
 
152
- Authors
153
- -------
198
+ ## Authors ##
154
199
 
155
200
  [Dmitry Mozzherin][dimus]
156
201
 
157
- Contributors
158
- ------------
202
+ ## Contributors ##
159
203
 
160
- [lazylester][lazylester], [Ran Xie][skarlit], [Alexey Zapparov][ixti],
161
- [azhi][azhi]
204
+ [lazylester][lazylester], [Ran Xie][skarlit], [Alexey Zapparov][ixti], [azhi][azhi]
162
205
 
163
- Copyright
164
- ---------
206
+ ## Copyright ##
165
207
 
166
- Copyright (c) 2011-2016 Dmitry Mozzherin. See LICENSE.txt for
208
+ Copyright (c) 2011-2017 Dmitry Mozzherin. See LICENSE.txt for
167
209
  further details.
168
210
 
169
211
  [gem_svg]: https://badge.fury.io/rb/damerau-levenshtein.svg
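
The README hunk above says that `max_distance` makes the computation give up once the edit distance grows too large, and its example returns 4 for a limit of 3. The gem's C extension is not part of this diff, so the following is only a minimal pure-Ruby sketch of that idea. The method name `bounded_levenshtein` is hypothetical, it omits `block_size` and transpositions, and it assumes the give-up value is `max_distance + 1`, as the README example suggests.

```ruby
# Hypothetical sketch, not the gem's implementation: plain Levenshtein
# distance that stops early once the distance must exceed max_distance.
def bounded_levenshtein(str1, str2, max_distance = 10)
  previous = (0..str2.size).to_a
  str1.each_char.with_index(1) do |char1, i|
    current = [i]
    str2.each_char.with_index(1) do |char2, j|
      cost = char1 == char2 ? 0 : 1
      current << [previous[j] + 1,            # skip a character of str1
                  current[j - 1] + 1,         # skip a character of str2
                  previous[j - 1] + cost].min # match or substitute
    end
    # Row minima never decrease, so the final distance cannot be smaller.
    return max_distance + 1 if current.min > max_distance

    previous = current
  end
  [previous.last, max_distance + 1].min
end

puts bounded_levenshtein("abcdefg", "1234567", 3)
```

Checking the row minimum after each row is what allows the early exit: no later cell can hold a smaller value than the smallest cell of an earlier row.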
data/Rakefile CHANGED
@@ -1,25 +1,32 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require "bundler/gem_tasks"
2
4
  require "rspec/core/rake_task"
3
- require 'cucumber/rake/task'
5
+ require "cucumber/rake/task"
4
6
  require "rubocop/rake_task"
5
- require 'rake/dsl_definition'
6
- require 'rake'
7
- require 'rake/extensiontask'
8
- require 'rspec'
7
+ require "rake/dsl_definition"
8
+ require "rake"
9
+ require "rake/extensiontask"
10
+ require "rspec"
9
11
 
10
12
  RSpec::Core::RakeTask.new(:spec) do |rspec|
11
- rspec.pattern = FileList['spec/**/*_spec.rb']
13
+ rspec.pattern = FileList["spec/**/*_spec.rb"]
12
14
  end
13
15
 
14
16
  Cucumber::Rake::Task.new(:features)
15
17
 
16
- Rake::ExtensionTask.new('damerau_levenshtein') do |extension|
17
- extension.ext_dir = 'ext/damerau_levenshtein'
18
- extension.lib_dir = 'lib/damerau-levenshtein'
18
+ Rake::ExtensionTask.new("damerau_levenshtein") do |extension|
19
+ extension.ext_dir = "ext/damerau_levenshtein"
20
+ extension.lib_dir = "lib/damerau-levenshtein"
19
21
  end
20
22
 
21
23
  Rake::Task[:spec].prerequisites << :compile
22
24
  Rake::Task[:features].prerequisites << :compile
23
25
 
24
26
  RuboCop::RakeTask.new
25
- task :default => [:rubocop, :spec]
27
+ task default: %i[rubocop spec]
28
+
29
+ desc "open an irb session preloaded with this gem"
30
+ task :console do
31
+ sh "irb -r pp -r ./lib/damerau-levenshtein.rb"
32
+ end
@@ -1,7 +1,11 @@
1
+ # frozen_string_literal: true
2
+
1
3
  $LOAD_PATH.push File.expand_path("../lib", __FILE__)
2
4
 
3
5
  require "damerau-levenshtein/version"
4
6
 
7
+ # rubocop:disable Metrics/BlockLength
8
+
5
9
  Gem::Specification.new do |s|
6
10
  s.name = "damerau-levenshtein"
7
11
  s.version = DamerauLevenshtein::VERSION
@@ -25,6 +29,7 @@ Gem::Specification.new do |s|
25
29
  s.add_development_dependency "rspec", "~> 3.5"
26
30
  # activesupport >= 5.0 does not support Ruby < 2.2
27
31
  s.add_development_dependency "activesupport", "~> 4.2"
32
+ s.add_development_dependency "byebug", "~> 9.0"
28
33
  s.add_development_dependency "cucumber", "~> 2.4"
29
34
  s.add_development_dependency "ruby-prof", "~> 0.15"
30
35
  s.add_development_dependency "shoulda", "~> 3.5"
@@ -34,3 +39,5 @@ Gem::Specification.new do |s|
34
39
  s.add_development_dependency "rake", "~> 11.2"
35
40
  s.add_development_dependency "rake-compiler", "~> 1.0"
36
41
  end
42
+
43
+ # rubocop:enable all
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  # Loads mkmf which is used to make makefiles for Ruby extensions
2
4
  require "mkmf"
3
5
 
@@ -1,7 +1,10 @@
1
1
  # encoding: UTF-8
2
+ # frozen_string_literal: true
2
3
 
3
- require "damerau-levenshtein/version"
4
- require "damerau-levenshtein/damerau_levenshtein"
4
+ require_relative "damerau-levenshtein/version"
5
+ require_relative "damerau-levenshtein/damerau_levenshtein"
6
+ require_relative "damerau-levenshtein/formatter"
7
+ require_relative "damerau-levenshtein/differ"
5
8
 
6
9
  # Damerau-Levenshtein algorithm
7
10
  module DamerauLevenshtein
@@ -0,0 +1,107 @@
+ # frozen_string_literal: true
+
+ module DamerauLevenshtein
+   # Shows the difference between two strings in character by character
+   # resolution
+   class Differ
+     FORMATS = %i[raw tag].freeze
+     attr_accessor :format
+
+     def initialize
+       @format = :tag
+       @matrix = []
+     end
+
+     def format=(new_format)
+       new_format = new_format.to_sym
+       @format = new_format if FORMATS.include?(new_format)
+     end
+
+     def run(str1, str2)
+       @len1 = str1.size.freeze
+       @len2 = str2.size.freeze
+       prepare_matrix
+       edit_distance(str1, str2)
+       raw = trace_back
+       formatter_factory.show(raw, str1, str2)
+     end
+
+     private
+
+     def formatter_factory
+       formatter =
+         case @format
+         when :tag
+           DamerauLevenshtein::FormatterTag
+         when :raw
+           DamerauLevenshtein::FormatterRaw
+         end
+       Formatter.new(formatter)
+     end
+
+     def edit_distance(str1, str2)
+       (1..@len2).each do |i|
+         (1..@len1).each do |j|
+           no_change(i, j) && next if str2[i - 1] == str1[j - 1]
+           @matrix[i][j] = [del(i, j), ins(i, j), subst(i, j)].min + 1
+         end
+       end
+     end
+
+     def trace_back
+       res = []
+       cell = [@len2, @len1]
+       while cell != [0, 0]
+         cell, char = char_data(cell)
+         res.unshift char
+       end
+       res
+     end
+
+     def char_data(cell)
+       char = { distance: @matrix[cell[0]][cell[1]] }
+       val = find_previous(cell)
+       previous_value = val[0][0]
+       char[:type] = previous_value == char[:distance] ? :same : val[1]
+       cell = val.pop
+       [cell, char]
+     end
+
+     def find_previous(cell)
+       candidates = [[[ins(*cell), 1], :ins, [cell[0], cell[1] - 1]],
+                     [[del(*cell), 2], :del, [cell[0] - 1, cell[1]]],
+                     [[subst(*cell), 0], :subst, [cell[0] - 1, cell[1] - 1]]]
+       select_cell(candidates)
+     end
+
+     def select_cell(candidates)
+       candidates.select { |e| e[-1][0] >= 0 && e[-1][1] >= 0 }.
+         sort_by(&:first).first
+     end
+
+     def del(i, j)
+       @matrix[i - 1][j]
+     end
+
+     def ins(i, j)
+       @matrix[i][j - 1]
+     end
+
+     def subst(i, j)
+       @matrix[i - 1][j - 1]
+     end
+
+     def no_change(i, j)
+       @matrix[i][j] = @matrix[i - 1][j - 1]
+     end
+
+     def prepare_matrix
+       @matrix = []
+       @matrix << (0..@len1).to_a
+       @len2.times do |i|
+         ary = [i + 1] + (1..@len1).map { nil }
+         @matrix << ary
+       end
+     end
+   end
+ end
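
`Differ#run` above fills an edit-distance matrix and then walks back from the bottom-right cell to recover one edit operation per step. The sketch below illustrates the same two phases in standalone Ruby. It is not the gem's code: the method names and the operation labels `:drop_source_char` and `:add_target_char` are hypothetical and deliberately differ from the gem's `:ins`/`:del`, and the tie-breaking order between candidate cells is an assumption.

```ruby
# Hypothetical standalone sketch of "fill the matrix, then trace back".
# Rows follow the target string, columns follow the source string.
def levenshtein_matrix(source, target)
  matrix = Array.new(target.size + 1) { Array.new(source.size + 1, 0) }
  (0..source.size).each { |j| matrix[0][j] = j }
  (0..target.size).each { |i| matrix[i][0] = i }
  (1..target.size).each do |i|
    (1..source.size).each do |j|
      matrix[i][j] =
        if target[i - 1] == source[j - 1]
          matrix[i - 1][j - 1] # equal characters cost nothing
        else
          [matrix[i - 1][j], matrix[i][j - 1], matrix[i - 1][j - 1]].min + 1
        end
    end
  end
  matrix
end

# Walks from the bottom-right cell to [0, 0], collecting one label per step.
def edit_script(source, target)
  matrix = levenshtein_matrix(source, target)
  i = target.size
  j = source.size
  ops = []
  while i.positive? || j.positive?
    if i.positive? && j.positive? && source[j - 1] == target[i - 1]
      ops.unshift(:same)
      i -= 1
      j -= 1
    elsif i.positive? && j.positive? && matrix[i][j] == matrix[i - 1][j - 1] + 1
      ops.unshift(:subst)
      i -= 1
      j -= 1
    elsif j.positive? && matrix[i][j] == matrix[i][j - 1] + 1
      ops.unshift(:drop_source_char)
      j -= 1
    else
      ops.unshift(:add_target_char)
      i -= 1
    end
  end
  ops
end

p edit_script("corn", "cron") # => [:same, :subst, :subst, :same]
```

For "corn" and "cron" the script contains two substitutions between two unchanged characters, which lines up with the `c<subst>or</subst>n` example in the README hunk.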
@@ -0,0 +1,91 @@
+ # frozen_string_literal: true
+
+ module DamerauLevenshtein
+   # Formats supplied strings according to their differences
+   class Formatter
+     def initialize(formatter)
+       @formatter = formatter
+     end
+
+     def show(raw_format, str1, str2)
+       @formatter.show(raw_format, str1, str2)
+     end
+   end
+
+   # Outputs raw format for two strings
+   module FormatterRaw
+     def self.show(raw_format, _, _)
+       raw_format
+     end
+   end
+
+   # Outputs strings marked with tags
+   module FormatterTag
+     class << self
+       def show(raw_format, str1, str2)
+         inverted_raw_format = raw_format.map do |e|
+           type = invert_type(e[:type])
+           { distance: e[:distance], type: type }
+         end
+         [show_string(raw_format, str1, str2),
+          show_string(inverted_raw_format, str2, str1)]
+       end
+
+       private
+
+       def invert_type(type)
+         case type
+         when :del
+           :ins
+         when :ins
+           :del
+         else
+           type
+         end
+       end
+
+       def show_string(raw, str1, str2)
+         data = { res: [], type: nil, deletes: 0, inserts: 0,
+                  str1: str1, str2: str2 }
+         raw.each_with_index do |e, i|
+           process_entry(e, i, data)
+         end
+         data[:res] << format("</%s>", data[:type]) if data[:type] != :same
+         data[:res].join("")
+       end
+
+       def process_entry(e, i, data)
+         if data[:type] && e[:type] != data[:type]
+           insert_tags(e, data)
+         elsif data[:type].nil?
+           data[:res] << format("<%s>", e[:type]) if e[:type] != :same
+         end
+         insert_letter(e, i, data)
+       end
+
+       def insert_tags(entry, data)
+         data[:res] << format("</%s>", data[:type]) if data[:type] != :same
+         data[:res] << format("<%s>", entry[:type]) if entry[:type] != :same
+       end
+
+       def insert_letter(entry, index, data)
+         if entry[:type] == :del
+           insert_del(index, data)
+         else
+           insert_others(index, data)
+         end
+         data[:inserts] += 1 if entry[:type] == :ins
+         data[:type] = entry[:type]
+       end
+
+       def insert_del(i, data)
+         data[:res] << data[:str2][i - data[:inserts]]
+         data[:deletes] += 1
+       end
+
+       def insert_others(i, data)
+         data[:res] << data[:str1][i - data[:deletes]]
+       end
+     end
+   end
+ end
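
`FormatterTag` above opens a tag when the edit type changes and closes it when the run ends, so consecutive characters of the same type share one tag pair. A compact way to express that grouping is sketched below with `Enumerable#chunk_while`. The helper `tag_runs` is hypothetical and takes characters and types that are already aligned one to one, so it skips the index bookkeeping (`:inserts`, `:deletes`) that the gem's formatter performs.

```ruby
# Hypothetical sketch: wrap each run of equal edit types in one tag pair.
# chars and types must have the same length; :same runs stay untagged.
def tag_runs(chars, types)
  chars.zip(types)
       .chunk_while { |left, right| left[1] == right[1] }
       .map do |run|
         text = run.map(&:first).join
         type = run.first[1]
         type == :same ? text : "<#{type}>#{text}</#{type}>"
       end
       .join
end

puts tag_runs("corn".chars, %i[same subst subst same])
# prints: c<subst>or</subst>n
```

Grouping first and tagging second keeps the open and close tags paired by construction, at the cost of requiring the aligned input described above.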
@@ -1,4 +1,6 @@
1
+ # frozen_string_literal: true
2
+
1
3
  # Damerau Levenshtein algorithm
2
4
  module DamerauLevenshtein
3
- VERSION = "1.2.0".freeze
5
+ VERSION = "1.3.0"
4
6
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: damerau-levenshtein
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.0
4
+ version: 1.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dmitry Mozzherin
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-09-15 00:00:00.000000000 Z
11
+ date: 2017-08-07 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
40
  version: '4.2'
41
+ - !ruby/object:Gem::Dependency
42
+ name: byebug
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '9.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '9.0'
41
55
  - !ruby/object:Gem::Dependency
42
56
  name: cucumber
43
57
  requirement: !ruby/object:Gem::Requirement
@@ -176,6 +190,8 @@ files:
176
190
  - ext/damerau_levenshtein/extconf.rb
177
191
  - lib/damerau-levenshtein.rb
178
192
  - lib/damerau-levenshtein/damerau_levenshtein.so
193
+ - lib/damerau-levenshtein/differ.rb
194
+ - lib/damerau-levenshtein/formatter.rb
179
195
  - lib/damerau-levenshtein/version.rb
180
196
  homepage: https://github.com/GlobalNamesArchitecture/damerau-levenshtein
181
197
  licenses:
@@ -198,7 +214,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
198
214
  version: '0'
199
215
  requirements: []
200
216
  rubyforge_project:
201
- rubygems_version: 2.4.5.1
217
+ rubygems_version: 2.6.11
202
218
  signing_key:
203
219
  specification_version: 4
204
220
  summary: Calculation of editing distance for 2 strings using Levenshtein or Damerau-Levenshtein