loose_tight_dictionary 1.0.2 → 1.0.3

Files changed (12)

data/README.rdoc +24 -6
data/examples/bts_aircraft/test_bts_aircraft.rb +0 -5
data/examples/first_name_matching.rb +1 -1
data/lib/loose_tight_dictionary/result.rb +1 -0
data/lib/loose_tight_dictionary/score.rb +66 -20
data/lib/loose_tight_dictionary/similarity.rb +7 -6
data/lib/loose_tight_dictionary/stop_word.rb +19 -0
data/lib/loose_tight_dictionary/version.rb +1 -1
data/lib/loose_tight_dictionary/wrapper.rb +28 -11
data/lib/loose_tight_dictionary.rb +48 -48
data/test/test_loose_tight_dictionary.rb +20 -5
metadata +21 -20

data/README.rdoc CHANGED

@@ -1,6 +1,6 @@
 = loose_tight_dictionary
-Match things based on string similarity (using the Pair Distance algorithm) and regular expressions.
+Find a needle in a haystack based on string similarity (using the Pair Distance algorithm and Levenshtein distance) and regular expressions.
 == Quickstart
@@ -11,7 +11,20 @@ Match things based on string similarity (using the Pair Distance algorithm) and
 == String similarity matching
-Exclusively uses {Dice's Coefficient}[http://en.wikipedia.org/wiki/Dice's_coefficient] algorithm (aka Pair Distance).
+Uses {Dice's Coefficient}[http://en.wikipedia.org/wiki/Dice's_coefficient] algorithm (aka Pair Distance).
+If that judges two strings to be be equally similar to a third string, then Levenshtein distance is used. For example, pair distance considers "RATZ" and "CATZ" to be equally similar to "RITZ" so we invoke Levenshtein.
+    >> require 'amatch'
+    => true
+    >> 'RITZ'.pair_distance_similar 'RATZ'
+    => 0.3333333333333333
+    >> 'RITZ'.pair_distance_similar 'CATZ'  # <-- pair distance can't tell the difference, so we fall back to levenshtein...
+    => 0.3333333333333333
+    >> 'RITZ'.levenshtein_similar 'RATZ'
+    => 0.75
+    >> 'RITZ'.levenshtein_similar 'CATZ'    # <-- which properly shows that RATZ should win
+    => 0.5
 == Production use
@@ -36,6 +49,7 @@ You can improve the default matchings with regular expressions.
 * Emphasize important words using <b>blockings</b> and <b>tighteners</b>
 * Filter out stop words with <b>tighteners</b>
 * Prevent impossible matches with <b>blockings</b> and <b>identities</b>
+* Ignore words with <b>stop words</b>
 === Blockings
@@ -49,19 +63,23 @@ Adding a tightener like <tt>/(boeing).*(7\d\d)/i</tt> will cause "BOEING COMPANY
 Adding an identity like <tt>/(F)\-?(\d50)/</tt> ensures that "Ford F-150" and "Ford F-250" never match.
+=== Stop words
+Adding a stop word like <tt>THE</tt> ensures that it is not taken into account when comparing "THE CAT", "THE DAT", and "THE CATT"
 == Case sensitivity
-Scoring is case-insensitive. Everything is downcased before scoring. This is a change from previous versions.
+Scoring is case-insensitive. Everything is downcased before scoring. This is a change from previous versions. Your regexps may still be case-sensitive, though.
 == Examples
 Check out the tests.
-== Speed
+== Speed (and who to thank for the algorithms)
-If you add the amatch[http://flori.github.com/amatch/] gem to your Gemfile, it will use that, which is much faster (but {segfaults have been seen in the wild}[https://github.com/flori/amatch/issues/3]). Thanks Flori!
+If you add the amatch[http://flori.github.com/amatch/] gem to your Gemfile, it will use that, which is much faster (but {segfaults have been seen in the wild}[https://github.com/flori/amatch/issues/3]). Thanks {Flori}[https://github.com/flori]!
-Otherwise, a pure ruby version derived from the {answer to a StackOverflow question}[http://stackoverflow.com/questions/653157/a-better-similarity-ranking-algorithm-for-variable-length-strings] is used. Thanks {marzagao}[http://stackoverflow.com/users/10997/marzagao]!
+Otherwise, pure ruby versions of the string similarity algorithms derived from the {answer to a StackOverflow question}[http://stackoverflow.com/questions/653157/a-better-similarity-ranking-algorithm-for-variable-length-strings] and {the text gem}[https://github.com/threedaymonk/text/blob/master/lib/text/levenshtein.rb] are used. Thanks {marzagao}[http://stackoverflow.com/users/10997/marzagao] and {threedaymonk}[https://github.com/threedaymonk]!
 == Authors
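
Note on the README change above: the tie between "RATZ" and "CATZ" can be reproduced without amatch. The following is a minimal sketch, assuming single-word inputs and plain character bigrams; `bigrams` and `dice_similarity` are hypothetical names, not part of the gem, and the gem's own pure-Ruby version (and amatch's `pair_distance_similar`) handle whitespace differently.

```ruby
# Hypothetical helper: character bigrams of a single word.
# Whitespace is NOT treated specially here (an assumption of this sketch).
def bigrams(str)
  str.downcase.chars.each_cons(2).map(&:join)
end

# Dice's coefficient over bigrams: 2 * shared / (pairs in a + pairs in b).
def dice_similarity(a, b)
  return 1.0 if a.downcase == b.downcase

  pairs_a = bigrams(a)
  pairs_b = bigrams(b)
  total = pairs_a.length + pairs_b.length
  return 0.0 if total.zero?

  # Consume each matched pair so duplicates are not counted twice.
  remaining = pairs_b.dup
  shared = pairs_a.count do |pair|
    index = remaining.index(pair)
    remaining.delete_at(index) if index
  end

  (2.0 * shared) / total
end

ratz = dice_similarity('RITZ', 'RATZ')
catz = dice_similarity('RITZ', 'CATZ')
puts ratz == catz # both share only the "tz" bigram with "ritz"
```

Both candidates share exactly one bigram out of six with "RITZ", which is why Dice's coefficient alone cannot rank them.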

data/examples/bts_aircraft/test_bts_aircraft.rb CHANGED

@@ -71,11 +71,6 @@ FINAL_OPTIONS = {
 }
 class TestBtsAircraft < Test::Unit::TestCase
-  should "store the records somewhere" do
-    d = LooseTightDictionary.new HAYSTACK
-    assert d.records.grep(/BOEING 707-100/)
-  end
   should "understand records by using the haystack reader" do
     d = LooseTightDictionary.new HAYSTACK, FINAL_OPTIONS
     assert d.haystack.map { |record| record.to_str }.include?('boeing boeing 707-100')

data/examples/first_name_matching.rb CHANGED

@@ -8,7 +8,7 @@ require 'loose_tight_dictionary'
 haystack = [ 'seamus', 'andy', 'ben' ]
 needles = [ 'Mr. Seamus', 'Sr. Andy', 'Master BenT', 'Shamus Heaney' ]
-d = LooseTightDictionary.new haystack, :log => $stdout
+d = LooseTightDictionary.new haystack
 needles.each do |needle|
   d.explain needle
   puts

data/lib/loose_tight_dictionary/result.rb CHANGED

@@ -4,6 +4,7 @@ class LooseTightDictionary
     attr_accessor :tighteners
     attr_accessor :blockings
     attr_accessor :identities
+    attr_accessor :candidates
     attr_accessor :joint
     attr_accessor :disjoint
     attr_accessor :possibly_identical

data/lib/loose_tight_dictionary/score.rb CHANGED

@@ -9,40 +9,44 @@ class LooseTightDictionary
     attr_reader :str1, :str2
     def initialize(str1, str2)
-      @str1 = str1
-      @str2 = str2
-    end
-    def to_f
-      @to_f ||= dices_coefficient(str1, str2)
+      @str1 = str1.downcase
+      @str2 = str2.downcase
     end
     def inspect
-      %{#<Score: to_f=#{to_f}>}
+      %{#<Score: dices_coefficient=#{dices_coefficient} levenshtein=#{levenshtein}>}
     end
     def <=>(other)
-      to_f <=> other.to_f
+      by_dices_coefficient = (dices_coefficient <=> other.dices_coefficient)
+      if by_dices_coefficient == 0
+        levenshtein <=> other.levenshtein
+      else
+        by_dices_coefficient
+      end
     end
-    def ==(other)
-      to_f == other.to_f
+    def utf8?
+      return @utf8_query[0] if @utf8_query.is_a?(::Array)
+      @utf8_query = [ (defined?(::Encoding) ? str1.encoding.to_s : $KCODE).downcase.start_with?('u') ]
+      @utf8_query[0]
     end
-    private
-    # http://stackoverflow.com/questions/653157/a-better-similarity-ranking-algorithm-for-variable-length-strings
     if defined?(::Amatch)
-      def dices_coefficient(str1, str2)
-        str1 = str1.downcase
-        str2 = str2.downcase
+      def dices_coefficient
         str1.pair_distance_similar str2
       end
+      def levenshtein
+        str1.levenshtein_similar str2
+      end
     else
       SPACE = ' '
-      def dices_coefficient(str1, str2)
-        str1 = str1.downcase
-        str2 = str2.downcase
+      # http://stackoverflow.com/questions/653157/a-better-similarity-ranking-algorithm-for-variable-length-strings
+      def dices_coefficient
         if str1 == str2
           return 1.0
         elsif str1.length == 1 and str2.length == 1
@@ -71,6 +75,48 @@ class LooseTightDictionary
         end
         (2.0 * intersection) / union
       end
+      # extracted/adapted from the text gem version 1.0.2
+      # normalization added for utf-8 strings
+      # lib/text/levenshtein.rb
+      def levenshtein
+        if utf8?
+          unpack_rule = 'U*'
+        else
+          unpack_rule = 'C*'
+        end
+        s = str1.unpack(unpack_rule)
+        t = str2.unpack(unpack_rule)
+        n = s.length
+        m = t.length
+        if n == 0 or m == 0
+          return 0.0
+        end
+        d = (0..m).to_a
+        x = nil
+        (0...n).each do |i|
+          e = i+1
+          (0...m).each do |j|
+            cost = (s[i] == t[j]) ? 0 : 1
+            x = [
+              d[j+1] + 1, # insertion
+              e + 1,      # deletion
+              d[j] + cost # substitution
+            ].min
+            d[j] = e
+            e = x
+          end
+          d[m] = x
+        end
+        # normalization logic from https://github.com/flori/amatch/blob/master/ext/amatch_ext.c#L301
+        # if (b_len > a_len) {
+        #     result = rb_float_new(1.0 - ((double) v[p][b_len]) / b_len);
+        # } else {
+        #     result = rb_float_new(1.0 - ((double) v[p][b_len]) / a_len);
+        # }
+        1.0 - x.to_f / [n, m].max
+      end
     end
   end
 end
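
Note on the score.rb change above: the new `<=>` orders by Dice's coefficient first and falls back to normalized Levenshtein similarity on a tie. The sketch below is one way to check that behavior in isolation. It uses a two-row Levenshtein for readability rather than the single-row variant in the diff, works on characters via `String#chars` instead of `unpack`, and compares `[dice, levenshtein]` arrays as a stand-in for the nested `if`. The names `levenshtein_distance` and `levenshtein_similarity` are hypothetical.

```ruby
# Two-row Levenshtein edit distance over characters.
def levenshtein_distance(a, b)
  s = a.chars
  t = b.chars
  return t.length if s.empty?
  return s.length if t.empty?

  previous = (0..t.length).to_a
  s.each_with_index do |s_char, i|
    current = [i + 1]
    t.each_with_index do |t_char, j|
      cost = s_char == t_char ? 0 : 1
      current << [
        previous[j + 1] + 1, # deletion
        current[j] + 1,      # insertion
        previous[j] + cost   # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

# Same normalization as the diff: 1 - distance / length of the longer string,
# and 0.0 when either string is empty.
def levenshtein_similarity(a, b)
  return 0.0 if a.empty? || b.empty?

  1.0 - levenshtein_distance(a, b).to_f / [a.length, b.length].max
end

# Dice's coefficient is 1/3 for both candidates (see the README hunk),
# so the second array element decides the ordering.
ratz = [1.0 / 3, levenshtein_similarity('ritz', 'ratz')]
catz = [1.0 / 3, levenshtein_similarity('ritz', 'catz')]
puts (ratz <=> catz) # positive: "ratz" sorts above "catz"
```

Array comparison is element-wise, so it only reaches the Levenshtein value when the Dice values are equal, which mirrors the tie-break in the new `Score#<=>`.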

data/lib/loose_tight_dictionary/similarity.rb CHANGED

@@ -9,16 +9,17 @@ class LooseTightDictionary
     end
     def <=>(other)
-      if best_score != other.best_score
-        best_score <=> other.best_score
+      by_score = best_score <=> other.best_score
+      if by_score == 0
+        original_weight <=> other.original_weight
       else
-        weight <=> other.weight
+        by_score
       end
     end
     # Weight things towards short original strings
-    def weight
-      @weight ||= (1.0 / (wrapper1.to_str.length * wrapper2.to_str.length))
+    def original_weight
+      @original_weight ||= (1.0 / (wrapper1.render.length * wrapper2.render.length))
     end
     def best_score
@@ -46,7 +47,7 @@ class LooseTightDictionary
     end
     def inspect
-      %{#<Similarity "#{wrapper2.to_str}"=>"#{best_wrapper2_variant}" versus "#{wrapper1.to_str}"=>"#{best_wrapper1_variant}" weight=#{"%0.5f" % weight} best_score=#{"%0.5f" % best_score.to_f}>}
+      %{#<Similarity "#{wrapper2.render}"=>"#{best_wrapper2_variant}" versus "#{wrapper1.render}"=>"#{best_wrapper1_variant}" original_weight=#{"%0.5f" % original_weight} best_score=#{best_score.inspect}>}
     end
   end
 end
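
Note on the similarity.rb change above: when two similarities have equal best scores, `original_weight` favors the pair whose rendered strings are shorter. A small sketch of that formula, assuming plain strings in place of wrappers (the free function `original_weight` here is a stand-in for the method in the diff):

```ruby
# Inverse of the product of the two rendered lengths, as in the diff.
def original_weight(rendered1, rendered2)
  1.0 / (rendered1.length * rendered2.length)
end

needle = 'boeing 747'
tied = ['boeing 747 freighter', 'boeing 747sr']

# Among equally scored candidates, the shorter rendering has the larger weight.
winner = tied.max_by { |straw| original_weight(needle, straw) }
puts winner
```

One thing worth checking in review: if stop words reduce a rendered string to empty, the product is zero and the Float division yields `Infinity` rather than raising.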

data/lib/loose_tight_dictionary/stop_word.rb ADDED

@@ -0,0 +1,19 @@
+class LooseTightDictionary
+  # A stop word is ignored
+  class StopWord
+    attr_reader :regexp
+    def initialize(regexp_or_str)
+      @regexp = regexp_or_str.to_regexp
+    end
+    # Destructively remove stop words from the string
+    def apply!(str)
+      str.gsub! regexp, ''
+    end
+    def inspect
+      "#<StopWord regexp=#{regexp.inspect}>"
+    end
+  end
+end
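
Note on the new stop_word.rb above: `apply!` delegates to `String#gsub!`, which mutates its receiver and returns `nil` when nothing was replaced, so callers should use the mutated string rather than the return value. The sketch below illustrates this with a hypothetical `StopWordSketch` class. It does not depend on the `to_regexp` gem; converting a plain string by escaping it is an assumption of this sketch and may differ from what `to_regexp` does.

```ruby
class StopWordSketch
  attr_reader :regexp

  # Assumption: plain strings are matched literally. The gem itself
  # calls regexp_or_str.to_regexp from the to_regexp gem instead.
  def initialize(regexp_or_str)
    @regexp =
      if regexp_or_str.is_a?(Regexp)
        regexp_or_str
      else
        Regexp.new(Regexp.escape(regexp_or_str.to_s))
      end
  end

  # Destructively remove stop words; returns nil if nothing matched.
  def apply!(str)
    str.gsub!(regexp, '')
  end
end

stop_word = StopWordSketch.new(/HO?TE?L/)

rendered = 'A HOTEL'.dup
stop_word.apply!(rendered)
rendered.strip!
puts rendered.inspect

untouched = 'A MOTEL'.dup
puts stop_word.apply!(untouched).inspect # nil, and the string is unchanged
```

With the stop word removed, "A HOTEL" and "A HTL" both render as "A", which is what lets test_020_stop_words pick "A HOTEL".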

data/lib/loose_tight_dictionary/version.rb CHANGED

@@ -1,3 +1,3 @@
 class LooseTightDictionary
-  VERSION = '1.0.2'
+  VERSION = '1.0.3'
 end

data/lib/loose_tight_dictionary/wrapper.rb CHANGED

@@ -1,22 +1,23 @@
 class LooseTightDictionary
   # Wrappers are the tokens that are passed around when doing scoring and optimizing.
   class Wrapper #:nodoc: all
-    attr_reader :parent
+    attr_reader :loose_tight_dictionary
     attr_reader :record
     attr_reader :read
-    def initialize(parent, record, read = nil)
-      @parent = parent
+    def initialize(loose_tight_dictionary, record, read = nil)
+      @loose_tight_dictionary = loose_tight_dictionary
       @record = record
       @read = read
     end
     def inspect
-      "#<Wrapper to_str=#{to_str} variants=#{variants.length}>"
+      "#<Wrapper render=#{render} variants=#{variants.length}>"
     end
-    def to_str
-      @to_str ||= case read
+    def render
+      return @render if rendered?
+      str = case read
       when ::Proc
         read.call record
       when ::Symbol
@@ -29,22 +30,38 @@ class LooseTightDictionary
         record
       else
         record[read]
-      end.to_s
+      end.to_s.dup
+      loose_tight_dictionary.stop_words.each do |stop_word|
+        stop_word.apply! str
+      end
+      str.strip!
+      @render = str.freeze
+      @rendered = true
+      @render
     end
-    alias :to_s :to_str
+    alias :to_str :render
+    WORD_BOUNDARY = %r{\s*\b\s*}
+    def words
+      @words ||= render.split(WORD_BOUNDARY)
+    end
     def similarity(other)
       Similarity.new self, other
     end
     def variants
-      @variants ||= parent.tighteners.inject([ to_str ]) do |memo, tightener|
-        if tightener.apply? to_str
-          memo.push tightener.apply(to_str)
+      @variants ||= loose_tight_dictionary.tighteners.inject([ render ]) do |memo, tightener|
+        if tightener.apply? render
+          memo.push tightener.apply(render)
         end
         memo
       end.uniq
     end
+    def rendered?
+      @rendered == true
+    end
   end
 end
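
Note on the wrapper.rb change above: `render` now copies the string before applying stop words. The copy matters because `String#to_s` returns the receiver itself, so without `.dup` the destructive `gsub!` and `strip!` calls would modify the caller's record. A minimal sketch of the render pipeline, assuming raw regexps in place of `StopWord` objects and only the plain-string `read` case (`WrapperSketch` is a hypothetical name):

```ruby
class WrapperSketch
  def initialize(record, stop_word_regexps)
    @record = record
    @stop_word_regexps = stop_word_regexps
  end

  # Copy, strip stop words, trim, freeze, and memoize behind a flag.
  def render
    return @render if @rendered

    str = @record.to_s.dup
    @stop_word_regexps.each { |regexp| str.gsub!(regexp, '') }
    str.strip!
    @render = str.freeze
    @rendered = true
    @render
  end
end

record = 'THE CAT'.dup
wrapper = WrapperSketch.new(record, [/THE/])

puts wrapper.render.inspect
puts record.inspect # the original record is left intact
```

Freezing the memoized string guards against later callers mutating the cached value in place.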

data/lib/loose_tight_dictionary.rb CHANGED

@@ -8,6 +8,7 @@ require 'to_regexp'
 # See the README for more information.
 class LooseTightDictionary
   autoload :Tightener, 'loose_tight_dictionary/tightener'
+  autoload :StopWord, 'loose_tight_dictionary/stop_word'
   autoload :Blocking, 'loose_tight_dictionary/blocking'
   autoload :Identity, 'loose_tight_dictionary/identity'
   autoload :Result, 'loose_tight_dictionary/result'
@@ -16,19 +17,31 @@ class LooseTightDictionary
   autoload :Score, 'loose_tight_dictionary/score'
   autoload :CachedResult, 'loose_tight_dictionary/cached_result'
-  attr_reader :options
   attr_reader :haystack
-  attr_reader :records
+  attr_reader :blockings
+  attr_reader :identities
+  attr_reader :tighteners
+  attr_reader :stop_words
+  attr_reader :first_blocking_decides
+  attr_reader :must_match_blocking
+  attr_reader :must_match_at_least_one_word
   # haystack - a bunch of records
   # options
   # * tighteners: regexps (see readme)
   # * identities: regexps
   # * blockings: regexps
+  # * stop_words: regexps
   # * read: how to interpret each entry in the 'haystack', either a Proc or a symbol
   def initialize(records, options = {})
-    @options = options.symbolize_keys
-    @records = records
+    options = options.symbolize_keys
+    @first_blocking_decides = options.fetch :first_blocking_decides, false
+    @must_match_blocking = options.fetch :must_match_blocking, false
+    @must_match_at_least_one_word = options.fetch :must_match_at_least_one_word, false
+    @blockings = options.fetch(:blockings, []).map { |regexp_or_str| Blocking.new regexp_or_str }
+    @identities = options.fetch(:identities, []).map { |regexp_or_str| Identity.new regexp_or_str }
+    @tighteners = options.fetch(:tighteners, []).map { |regexp_or_str| Tightener.new regexp_or_str }
+    @stop_words = options.fetch(:stop_words, []).map { |regexp_or_str| StopWord.new regexp_or_str }
     read = options[:read] || options[:haystack_reader]
     @haystack = records.map { |record| Wrapper.new self, record, read }
   end
@@ -37,10 +50,6 @@ class LooseTightDictionary
     @last_result || raise(::RuntimeError, "[loose_tight_dictionary] You can't access the last result until you've run a find with :gather_last_result => true")
   end
-  def log(str = '') #:nodoc:
-    (options[:log] || $stderr).puts str unless options[:log] == false
-  end
   def find_all(needle, options = {})
     options = options.symbolize_keys.merge(:find_all => true)
     find needle, options
@@ -50,11 +59,13 @@ class LooseTightDictionary
     raise ::RuntimeError, "[loose_tight_dictionary] Dictionary has already been freed, can't perform more finds" if freed?
     options = options.symbolize_keys
-    if gather_last_result = options.fetch(:gather_last_result, false)
+    gather_last_result = options.fetch(:gather_last_result, false)
+    is_find_all = options.fetch(:find_all, false)
+    if gather_last_result
       free_last_result
       @last_result = Result.new
     end
-    find_all = options.fetch(:find_all, false)
     if gather_last_result
       last_result.tighteners = tighteners
@@ -69,15 +80,27 @@ class LooseTightDictionary
     end
     if must_match_blocking and blockings.any? and blockings.none? { |blocking| blocking.match? needle }
-      if find_all
+      if is_find_all
         return []
       else
         return nil
       end
     end
+    candidates = if must_match_at_least_one_word
+      haystack.select do |straw|
+        needle.words.any? { |w| straw.render.include? w }
+      end
+    else
+      haystack
+    end
+    if gather_last_result
+      last_result.candidates = candidates
+    end
     joint, disjoint = if blockings.any?
-      haystack.partition do |straw|
+      candidates.partition do |straw|
         if first_blocking_decides
           blockings.detect { |blocking| blocking.match? needle }.try :join?, needle, straw
         else
@@ -85,7 +108,7 @@ class LooseTightDictionary
         end
       end
     else
-      [ haystack.dup, [] ]
+      [ candidates.dup, [] ]
     end
     # special case: the needle didn't fit anywhere, but must_match_blocking is false, so we'll try it against everything
@@ -115,7 +138,7 @@ class LooseTightDictionary
       last_result.certainly_different = certainly_different
     end
-    if find_all
+    if is_find_all
       return possibly_identical.map { |straw| straw.record }
     end
@@ -125,12 +148,11 @@ class LooseTightDictionary
       last_result.similarities = similarities
     end
-    if best_similarity = similarities[-1] and best_similarity.best_score.to_f > 0
+    if best_similarity = similarities[-1] and best_similarity.best_score.dices_coefficient > 0
       record = best_similarity.wrapper2.record
       if gather_last_result
         last_result.record = record
-        last_result.score = best_similarity.best_score.to_f
+        last_result.score = best_similarity.best_score.dices_coefficient
       end
       record
     end
@@ -148,11 +170,11 @@ class LooseTightDictionary
     log
     log "Needle"
     log "-" * 150
-    log last_result.needle.to_str
+    log last_result.needle.render
     log
     log "Haystack"
     log "-" * 150
-    log last_result.haystack.map { |record| record.to_str }.join("\n")
+    log last_result.haystack.map { |record| record.render }.join("\n")
     log
     log "Tighteners"
     log "-" * 150
@@ -168,19 +190,19 @@ class LooseTightDictionary
     log
     log "Joint"
     log "-" * 150
-    log last_result.joint.blank? ? '(none)' : last_result.joint.map { |joint| joint.to_str }.join("\n")
+    log last_result.joint.blank? ? '(none)' : last_result.joint.map { |joint| joint.render }.join("\n")
     log
     log "Disjoint"
     log "-" * 150
-    log last_result.disjoint.blank? ? '(none)' : last_result.disjoint.map { |disjoint| disjoint.to_str }.join("\n")
+    log last_result.disjoint.blank? ? '(none)' : last_result.disjoint.map { |disjoint| disjoint.render }.join("\n")
     log
     log "Possibly identical"
     log "-" * 150
-    log last_result.possibly_identical.blank? ? '(none)' : last_result.possibly_identical.map { |possibly_identical| possibly_identical.to_str }.join("\n")
+    log last_result.possibly_identical.blank? ? '(none)' : last_result.possibly_identical.map { |possibly_identical| possibly_identical.render }.join("\n")
     log
     log "Certainly different"
     log "-" * 150
-    log last_result.certainly_different.blank? ? '(none)' : last_result.certainly_different.map { |certainly_different| certainly_different.to_str }.join("\n")
+    log last_result.certainly_different.blank? ? '(none)' : last_result.certainly_different.map { |certainly_different| certainly_different.render }.join("\n")
     log
     log "Similarities"
     log "-" * 150
@@ -190,33 +212,11 @@ class LooseTightDictionary
     log "-" * 150
     log record.inspect
   end
-  def must_match_blocking
-    options.fetch :must_match_blocking, false
-  end
-  def first_blocking_decides
-    options.fetch :first_blocking_decides, false
-  end
-  def tighteners
-    @tighteners ||= (options[:tighteners] || []).map do |regexp_or_str|
-      Tightener.new regexp_or_str
-    end
-  end
-  def identities
-    @identities ||= (options[:identities] || []).map do |regexp_or_str|
-      Identity.new regexp_or_str
-    end
-  end
-  def blockings
-    @blockings ||= (options[:blockings] || []).map do |regexp_or_str|
-      Blocking.new regexp_or_str
-    end
+  def log(str = '') #:nodoc:
+    $stderr.puts str
   end
   def freed?
     @freed == true
   end
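
Note on the `must_match_at_least_one_word` filter above: candidates are kept when any needle word occurs in the straw's rendered string via `String#include?`, which is a substring test rather than a whole-word test. The sketch below assumes whitespace splitting instead of the gem's `WORD_BOUNDARY` regexp and plain strings instead of wrappers; `candidates_sharing_a_word` is a hypothetical name.

```ruby
# Keep haystack entries that contain at least one needle word as a substring.
def candidates_sharing_a_word(needle, haystack)
  words = needle.split
  haystack.select do |straw|
    words.any? { |word| straw.include?(word) }
  end
end

# Mirrors test_019: no word of the needle appears in either straw.
puts candidates_sharing_a_word('RITZ', %w[RATZ CATZ]).inspect

# Mirrors the first half of test_020: "A" and "HTL" each hit one straw.
puts candidates_sharing_a_word('A HTL', ['A HOTEL', 'B HTL']).inspect

# Substring semantics: a short word also matches inside longer words.
puts candidates_sharing_a_word('A', ['CAT']).inspect
```

Because of the substring semantics, short needle words such as "A" filter out very little, so the option is most effective when combined with stop words.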

data/test/test_loose_tight_dictionary.rb CHANGED

@@ -1,3 +1,4 @@
+# -*- encoding: utf-8 -*-
 require 'helper'
 class TestLooseTightDictionary < Test::Unit::TestCase
@@ -11,8 +12,9 @@ class TestLooseTightDictionary < Test::Unit::TestCase
   # end
   def test_001_find
-    d = LooseTightDictionary.new %w{ NISSAN HONDA }
-    assert_equal 'NISSAN', d.find('MISSAM')
+    d = LooseTightDictionary.new %w{ RATZ CATZ }
+    assert_equal 'RATZ', d.find('RITZ')
+    assert_equal 'RATZ', d.find('RíTZ')
     d = LooseTightDictionary.new [ 'X' ]
     assert_equal 'X', d.find('X')
@@ -46,7 +48,7 @@ class TestLooseTightDictionary < Test::Unit::TestCase
     d = LooseTightDictionary.new ['BOEING 737-100/200', 'BOEING 737-900'], :tighteners => tighteners
     assert_equal 'BOEING 737-100/200', d.find('BOEING 737100 number 900')
   end
   def test_008_false_positive_without_identity
     d = LooseTightDictionary.new %w{ foo bar }
     assert_equal 'bar', d.find('baz')
@@ -63,7 +65,7 @@ class TestLooseTightDictionary < Test::Unit::TestCase
     assert_equal 'X', d.find('X')
     assert_equal nil, d.find('A')
   end
   # TODO this is not very helpful
   def test_0095_must_match_blocking
     d = LooseTightDictionary.new [ 'X' ], :blockings => [ /X/, /Y/ ], :must_match_blocking => true
@@ -98,7 +100,7 @@ class TestLooseTightDictionary < Test::Unit::TestCase
     d = LooseTightDictionary.new [ 'Boeing 747', 'Boeing 747SR', 'Boeing ER6' ], :blockings => [ /(boeing \d{3})/i, /boeing (7|E)/i, /boeing/i ], :first_blocking_decides => true
     assert_equal [ 'Boeing ER6' ], d.find_all('Boeing ER6')
     # or equivalently with an identity
     d = LooseTightDictionary.new [ 'Boeing 747', 'Boeing 747SR', 'Boeing ER6' ], :blockings => [ /(boeing \d{3})/i, /boeing/i ], :first_blocking_decides => true, :identities => [ /boeing (7|E)/i ]
     assert_equal [ 'Boeing ER6' ], d.find_all('Boeing ER6')
@@ -153,4 +155,17 @@ class TestLooseTightDictionary < Test::Unit::TestCase
   def test_018_no_result_if_best_score_is_zero
     assert_equal nil, LooseTightDictionary.new(['a']).find('b')
   end
+  def test_019_must_match_at_least_one_word
+    d = LooseTightDictionary.new %w{ RATZ CATZ }, :must_match_at_least_one_word => true
+    assert_equal nil, d.find('RITZ')
+  end
+  def test_020_stop_words
+    d = LooseTightDictionary.new [ 'A HOTEL', 'B HTL' ], :must_match_at_least_one_word => true
+    assert_equal 'B HTL', d.find('A HTL')
+    d = LooseTightDictionary.new [ 'A HOTEL', 'B HTL' ], :must_match_at_least_one_word => true, :stop_words => [ %r{HO?TE?L} ]
+    assert_equal 'A HOTEL', d.find('A HTL')
+  end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: loose_tight_dictionary
 version: !ruby/object:Gem::Version
-  version: 1.0.2
+  version: 1.0.3
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-12-04 00:00:00.000000000Z
+date: 2011-12-06 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: shoulda
-  requirement: &2178228360 !ruby/object:Gem::Requirement
+  requirement: &2185313400 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2178228360
+  version_requirements: *2185313400
 - !ruby/object:Gem::Dependency
   name: remote_table
-  requirement: &2178227820 !ruby/object:Gem::Requirement
+  requirement: &2185284260 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2178227820
+  version_requirements: *2185284260
 - !ruby/object:Gem::Dependency
   name: activerecord
-  requirement: &2178227200 !ruby/object:Gem::Requirement
+  requirement: &2185283700 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -43,10 +43,10 @@ dependencies:
         version: '3'
   type: :development
   prerelease: false
-  version_requirements: *2178227200
+  version_requirements: *2185283700
 - !ruby/object:Gem::Dependency
   name: mysql
-  requirement: &2178197080 !ruby/object:Gem::Requirement
+  requirement: &2185283260 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -54,10 +54,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2178197080
+  version_requirements: *2185283260
 - !ruby/object:Gem::Dependency
   name: cohort_scope
-  requirement: &2178196620 !ruby/object:Gem::Requirement
+  requirement: &2185282760 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -65,10 +65,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2178196620
+  version_requirements: *2185282760
 - !ruby/object:Gem::Dependency
   name: weighted_average
-  requirement: &2178196200 !ruby/object:Gem::Requirement
+  requirement: &2185282340 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -76,10 +76,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2178196200
+  version_requirements: *2185282340
 - !ruby/object:Gem::Dependency
   name: rake
-  requirement: &2178195780 !ruby/object:Gem::Requirement
+  requirement: &2185281880 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -87,10 +87,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2178195780
+  version_requirements: *2185281880
 - !ruby/object:Gem::Dependency
   name: activesupport
-  requirement: &2178195140 !ruby/object:Gem::Requirement
+  requirement: &2185281260 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -98,10 +98,10 @@ dependencies:
         version: '3'
   type: :runtime
   prerelease: false
-  version_requirements: *2178195140
+  version_requirements: *2185281260
 - !ruby/object:Gem::Dependency
   name: to_regexp
-  requirement: &2178194560 !ruby/object:Gem::Requirement
+  requirement: &2185280640 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -109,7 +109,7 @@ dependencies:
         version: 0.0.3
   type: :runtime
   prerelease: false
-  version_requirements: *2178194560
+  version_requirements: *2185280640
 description: Create dictionaries that link rows between two tables using loose matching
   (string similarity) by default and tight matching (regexp) by request.
 email:
@@ -150,6 +150,7 @@ files:
 - lib/loose_tight_dictionary/result.rb
 - lib/loose_tight_dictionary/score.rb
 - lib/loose_tight_dictionary/similarity.rb
+- lib/loose_tight_dictionary/stop_word.rb
 - lib/loose_tight_dictionary/tightener.rb
 - lib/loose_tight_dictionary/version.rb
 - lib/loose_tight_dictionary/wrapper.rb