fuzzy_search 0.3 → 0.4

data/README.md CHANGED
@@ -2,7 +2,7 @@
 
 Search through your models while tolerating slight mis-spellings. If you have a Person in your database named O'Reilly, you want your users to be able to find it even if they type "OReilly" or "O'Rielly".
 
-This gem is not as powerful as dedicated search tools like Solr, but it's much quicker and easier to set up. It uses your regular database for indexing, rather than an external service that has to be maintained separately.
+This gem is not as powerful as dedicated search tools like Solr, but it's much easier to set up and more appropriate for searching small strings, such as names of people or products. It uses your regular database, rather than an external service that has to be maintained separately.
 
 Currently only Rails 2 is supported. I welcome any contributions that resolve this!
 
@@ -10,14 +10,9 @@ Currently only Rails 2 is supported. I welcome any contributions that resolve th
 
 Add `fuzzy_search` to your Rails project's Gemfile, and do the usual `bundle install` dance.
 
-Then, run the generator and migrate to create the search table:
-
-    $ ./script/generate fuzzy_search_setup
-    $ rake db:migrate
-
 ## Example
 
-To allow a model to be searched, specify which columns are to be indexed:
+To allow a model to be searched, specify which columns are to be searched on:
 
 ```ruby
 class Person < ActiveRecord::Base
@@ -26,24 +21,37 @@ class Person < ActiveRecord::Base
   # ...
 end
 ```
-Now, the gem will update the index whenever a Person is saved. To index all the existing records in a model, do this:
+Then create a search table for that model, along with its initial trigrams:
 
-```ruby
-Person.rebuild_fuzzy_search_index!
-```
+    $ ./script/generate fuzzy_search_table Person
+    $ rake db:migrate
 
-The fuzzy_search method returns arrays:
+The fuzzy_search method returns search results:
 
 ```ruby
 people = Person.fuzzy_search "OReilly"
 ```
-
-Fuzzy find works on scopes too, including named_scopes and on-the-fly scopes:
+It works through scopes too, including named_scopes and on-the-fly scopes:
 
 ```ruby
 people = Person.scoped({:conditions => ["state='active'"]}).fuzzy_search("OReilly")
 ```
 
+If you have a very large data set but are typically searching for items
+within a scoped subset of that data, you can get a significant performance
+boost for those searches by having FuzzySearch include the scope-defining
+field (which currently must be an integer) in the search table:
+
+```ruby
+class Person < ActiveRecord::Base
+  # ...
+  fuzzy_searchable_on :first_name, :last_name, :subset_on => :zipcode
+  # ...
+end
+
+bev_hills_people = Person.fuzzy_search("OReilly", :subset => {:zipcode => 90210})
+```
+
 ## Licence and credits
 
 This gem is based on the rails-fuzzy-search plugin by iulianu
data/TODO ADDED
@@ -0,0 +1,37 @@
+* Have the test app's fuzzy search table migrations generated
+  automatically by the test suite, instead of having to manually
+  update them every time I change the template.
+
+* MyISAM: Consider setting concurrent_insert to 2 on MySQL to make sure we don't
+  have nasty locking scenarios when there are holes in the middle
+  of the trigrams table. See:
+  http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_concurrent_insert
+
+  Or, as an alternative way of doing defragmentation that doesn't block the server
+  for ages like OPTIMIZE, perhaps we can just rebuild trigrams for the most recently
+  inserted trigram rows until the fragmentation is gone? In concurrent_insert
+  mode 1, this should cause MySQL to reallocate those trigram rows backwards into the hole.
+
+* Watch out for interactions between fuzzy_search_limit and scopes.
+  If the first 25 results from the search don't fit the scope, the
+  user will end up with an empty result set.
+
+* If it doesn't slow things down too much, prefer short matches when given
+  short query strings, e.g. "ama" should rank "Amad" higher than
+  "Amalamadingdongwitcherydoo". Can do this by using the total
+  number of trigrams for the given record (or maybe a given word?)
+  as a secondary order field.
+
+* Phonetic coding (e.g. metaphone). Needs to be all-or-nothing
+  for any given AR model, otherwise we'd have to try searching
+  on both coded and raw versions of each query string to match
+  against both coded and non-coded properties, which would
+  throw away the optimization (although it would still allow
+  for phonetic near-matches, which is cool).
+
+  Maybe we can make this a user-adjustable thing by bringing
+  back the 'specify a normalize method' feature.
+
+* Maybe allow a [trigram,rec_id] pair to appear multiple times
+  under different subsets? Just have to make sure my scoring
+  rule counts only unique trigram hits.
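The short-match preference in the TODO above could be prototyped outside the database first. The following is a hypothetical plain-Ruby sketch, not code from the gem. It counts unique trigram hits per name, then breaks ties by each name's total trigram count so that shorter names win. The names `pad_trigrams` and `rank_preferring_short` are made up for illustration. The splitter here handles single words only and skips the gem's Unicode normalization.

```ruby
require 'set'

# Hypothetical helper: padded, deduplicated trigrams of one word.
# Unlike FuzzySearch::split_trigrams, this does no Unicode normalization.
def pad_trigrams(word)
  chars = " #{word.downcase} "
  (0..chars.length - 3).map { |idx| chars[idx, 3] }.uniq
end

# Rank names by unique trigram hits (descending), then by total trigram
# count (ascending), so a short name beats a long one with equal hits.
def rank_preferring_short(query, names)
  wanted = pad_trigrams(query).to_set
  scored = names.map do |name|
    own = pad_trigrams(name)
    hits = own.count { |tri| wanted.include?(tri) } # own is uniq'd, so hits are unique
    [name, hits, own.size]
  end
  scored.select { |_name, hits, _total| hits > 0 }
        .sort_by { |name, hits, total| [-hits, total, name] }
        .map(&:first)
end

p rank_preferring_short("ama", ["Amalamadingdongwitcherydoo", "Amad"])
# prints ["Amad", "Amalamadingdongwitcherydoo"]
```

Both names share " am" and "ama" with the query, so they tie on hits. "Amad" then wins on its smaller trigram count (4).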
data/gen_fake_data.rb ADDED
@@ -0,0 +1,26 @@
+require 'rubygems'
+require 'faker'
+require 'active_record'
+require 'active_support'
+require 'lib/split_trigrams.rb'
+
+$KCODE = 'utf-8'
+
+people_f = open("preloaded_people.csv", "w")
+trigrams_f = open("preloaded_person_fuzzy_search_trigrams.csv", "w")
+
+srand(1234)
+
+1_000_000.times do |i|
+  idx = i+1
+  first_name, last_name = Faker::Name.first_name, Faker::Name.last_name
+  fav_num = rand(100)+1
+  people_f.puts([idx, first_name, last_name, 'Model airplanes', fav_num].join(","))
+
+  trigrams = FuzzySearch::split_trigrams([first_name, last_name])
+  trigrams.each do |tri|
+    trigrams_f.puts([fav_num, tri, idx].join(","))
+  end
+end
+
+[people_f, trigrams_f].each(&:close)
@@ -1,111 +1,71 @@
+require 'set'
+
 module FuzzySearch
-  module ModelExtensions
+  module FuzzyModelExtensions
     def self.included(base)
-      base.extend ClassMethods
-
-      base.write_inheritable_attribute :fuzzy_search_properties, []
-      base.class_inheritable_reader :fuzzy_search_properties
+      {
+        :fuzzy_search_properties => [],
+        :fuzzy_search_limit => 25,
+        :fuzzy_search_subset_property => nil
+      }.each do |key, value|
+        base.write_inheritable_attribute key, value
+        base.class_inheritable_reader key
+      end
 
-      base.write_inheritable_attribute :fuzzy_search_threshold, 5
-      base.class_inheritable_reader :fuzzy_search_threshold
+      base.extend ClassMethods
     end
 
     module ClassMethods
       def fuzzy_searchable_on(*properties)
         # TODO: Complain if fuzzy_searchable_on is called more than once
+        # TODO: Complain if no properties were given
+        options = properties.last.is_a?(Hash) ? properties.pop : {}
         write_inheritable_attribute :fuzzy_search_properties, properties
-        has_many :fuzzy_search_trigrams, :as => :rec, :dependent => :destroy
-        after_save :update_fuzzy_search_trigrams!
-        named_scope :fuzzy_search_scope, lambda { |words| generate_fuzzy_search_scope_params(words) }
-        extend WordNormalizerClassMethod unless respond_to? :normalize
-        include InstanceMethods
-      end
-
-      def fuzzy_search(words)
-        # TODO: If fuzzy_search_scope doesn't exist, provide a useful error
-        fuzzy_search_scope(words).all
-      end
-
-      def rebuild_fuzzy_search_index!
-        FuzzySearchTrigram.delete_all(:rec_type => self.class.name)
-        all.each do |rec|
-          rec.update_fuzzy_search_trigrams!
+        if options[:subset_on]
+          write_inheritable_attribute :fuzzy_search_subset_property, options[:subset_on]
+          options.delete(:subset_on)
         end
-      end
-
-      private
 
-      def generate_fuzzy_search_scope_params(words)
-        no_results = {:conditions => "0 = 1"}
-        return no_results unless words != nil
-        words = words.strip.to_s.split(/[\s\-]+/) unless words.instance_of? Array
-        return no_results unless words.size > 0
-
-        trigrams = []
-        words.each do |w|
-          word = ' ' + normalize(w) + ' '
-          word_as_chars = word.mb_chars
-          trigrams << (0..word_as_chars.length-3).collect {|idx| word_as_chars[idx,3].to_s}
+        unless options.empty?
+          # TODO Test me
+          raise "Invalid options: #{options.keys.join(",")}"
         end
-        trigrams = trigrams.flatten.uniq
-
-        # Transform the list of columns in the searchable entity into
-        # a SQL fragment like:
-        # "table_name.id, table_name.field1, table_name.field2, ..."
-        entity_fields = columns.map {|col| table_name + "." + col.name}.join(", ")
 
-        # The SQL expression for calculating fuzzy_score
-        # Has to be used multiple times because some databases (i.e. Postgres) do not support HAVING on named SELECT fields
-        # TODO: See if we can't get the count(*) out of here, that's a non-trivial operation in some databases
-        fuzzy_score_expr = "(((count(*)*100.0)/#{trigrams.size}) + " +
-          "((count(*)*100.0)/(SELECT count(*) FROM fuzzy_search_trigrams WHERE rec_id = #{table_name}.#{primary_key} AND rec_type = '#{name}')))/2.0"
-
-        # TODO: Optimize this query. In a large trigram table, this is going to go through a lot of dead ends.
-        # Maybe I need to just bite the bullet and learn how to do procedures? That would break cross-database compatibility, though...
-        return {
-          :select => "#{fuzzy_score_expr} AS fuzzy_score, #{entity_fields}",
-          :joins => ["LEFT OUTER JOIN fuzzy_search_trigrams ON fuzzy_search_trigrams.rec_id = #{table_name}.#{primary_key}"],
-          :conditions => ["fuzzy_search_trigrams.token IN (?) AND rec_type = '#{name}'", trigrams],
-          :group => "#{table_name}.#{primary_key}",
-          :order => "fuzzy_score DESC",
-          :having => "#{fuzzy_score_expr} >= #{fuzzy_search_threshold}"
+        named_scope :fuzzy_search_scope, lambda { |words|
+          fuzzy_search_scope_with_opts(words, {})
+        }
+        named_scope :fuzzy_search_scope_with_opts, lambda { |words, opts|
+          self::FuzzySearchTrigram.params_for_search(words, opts)
         }
+        extend FuzzySearchClassMethods
+        include InstanceMethods
+        after_save :update_fuzzy_search_trigrams!
+        after_destroy :delete_fuzzy_search_trigrams!
+
+        const_set(:FuzzySearchTrigram, Class.new(ActiveRecord::Base))
+        self::FuzzySearchTrigram.extend TrigramModelExtensions
+        self::FuzzySearchTrigram.set_target_class self
+        self::FuzzySearchTrigram.table_name = "#{name.underscore}_fuzzy_search_trigrams"
       end
     end
 
-    module WordNormalizerClassMethod
-      def normalize(word)
-        word.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
+    module FuzzySearchClassMethods
+      def fuzzy_search(words, opts = {})
+        fuzzy_search_scope_with_opts(words, opts)
+      end
+
+      def rebuild_fuzzy_search_index!
+        self::FuzzySearchTrigram.rebuild_index
       end
     end
 
     module InstanceMethods
       def update_fuzzy_search_trigrams!
-        FuzzySearchTrigram.delete_all(:rec_id => self.id, :rec_type => self.class.name)
-
-        # to avoid double entries
-        tokens = []
-        self.class.fuzzy_search_properties.each do |prop|
-          prop_value = send(prop)
-          next if prop_value.nil?
-          # split the property into words (which are separated by whitespaces)
-          # and generate the trigrams for each word
-          prop_value.to_s.split(/[\s\-]+/).each do |p|
-            # put a space in front and at the end to emphasize the endings
-            word = ' ' + self.class.normalize(p) + ' '
-            word_as_chars = word.mb_chars
-            (0..word_as_chars.length - 3).each do |idx|
-              token = word_as_chars[idx, 3].to_s
-              tokens << token unless tokens.member?(token)
-            end
-          end
-        end
+        self.class::FuzzySearchTrigram.update_trigrams(self)
+      end
 
-        FuzzySearchTrigram.import(
-          [:token, :rec_id, :rec_type],
-          tokens.map{|t| [t, self.id, self.class.name]},
-          :validate => false
-        )
+      def delete_fuzzy_search_trigrams!
+        self.class::FuzzySearchTrigram.delete_trigrams(self)
       end
     end
   end
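The new `fuzzy_searchable_on` pulls a trailing options hash off its arguments. It then creates a separate `FuzzySearchTrigram` class inside each searchable model, so `Person` and `Email` each get their own trigram table.

Below is a minimal plain-Ruby sketch of that pattern, assuming no ActiveRecord:

- The anonymous class stands in for `Class.new(ActiveRecord::Base)`.
- `underscore` is a simplified stand-in for ActiveSupport's `String#underscore`.
- `FuzzySearchable` is a hypothetical module name.
- Unlike the gem, which raises a plain `RuntimeError`, the sketch raises `ArgumentError` for unknown options.

```ruby
# Hypothetical module illustrating the per-model nested index class pattern
module FuzzySearchable
  VALID_OPTIONS = [:subset_on].freeze

  attr_reader :fuzzy_search_properties, :fuzzy_search_subset_property

  def fuzzy_searchable_on(*properties)
    # A trailing Hash is treated as options, as in the gem
    options = properties.last.is_a?(Hash) ? properties.pop : {}
    unknown = options.keys - VALID_OPTIONS
    raise ArgumentError, "Invalid options: #{unknown.join(',')}" unless unknown.empty?

    @fuzzy_search_properties = properties
    @fuzzy_search_subset_property = options[:subset_on]

    # Stand-in for Class.new(ActiveRecord::Base): a bare class with
    # class-level accessors for its table name and target model
    index_class = Class.new do
      class << self
        attr_accessor :table_name, :target_class
      end
    end
    const_set(:FuzzySearchTrigram, index_class) # e.g. Person::FuzzySearchTrigram
    index_class.target_class = self
    index_class.table_name = "#{underscore(name)}_fuzzy_search_trigrams"
  end

  private

  # Simplified version of ActiveSupport's String#underscore
  def underscore(str)
    str.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
end

class Person
  extend FuzzySearchable
  fuzzy_searchable_on :first_name, :last_name, :subset_on => :zipcode
end

class Email
  extend FuzzySearchable
  fuzzy_searchable_on :address
end

puts Person::FuzzySearchTrigram.table_name # prints person_fuzzy_search_trigrams
```

Because each model owns its own constant, writing Person trigrams cannot touch Email's table. The new test "only updates the appropriate trigram table" checks exactly this property.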
data/lib/fuzzy_search.rb CHANGED
@@ -11,7 +11,8 @@ module ActiveRecord # :nodoc:
 end
 
 require 'fuzzy_model_extensions'
-require 'fuzzy_search_trigram'
 require 'fuzzy_search_ver'
+require 'split_trigrams'
+require 'trigram_model_extensions'
 
-ActiveRecord::Base.send(:include, FuzzySearch::ModelExtensions)
+ActiveRecord::Base.send(:include, FuzzySearch::FuzzyModelExtensions)
@@ -1,3 +1,3 @@
 module FuzzySearch
-  VERSION = "0.3"
+  VERSION = "0.4"
 end
@@ -0,0 +1,16 @@
+module FuzzySearch
+  def self.split_trigrams(s)
+    s = s.join(" ") if s.is_a?(Array)
+    return [] unless s and s.respond_to?(:to_s)
+    words = s.to_s.strip.split(/[\s\-]+/)
+    trigrams = Set.new
+    words.each do |w|
+      chars = w.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.strip
+      chars = " " + chars + " "
+      (0..chars.length-3).each do |idx|
+        trigrams << chars[idx,3].to_s
+      end
+    end
+    return trigrams.to_a
+  end
+end
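To see what the new splitter produces without loading Rails, here is a plain-Ruby sketch of the same steps. It assumes Ruby 2.2 or later, where `String#unicode_normalize(:nfkd)` can stand in for ActiveSupport's `mb_chars.normalize(:kd)`. The top-level `split_trigrams` method is a hypothetical port, not the gem's code.

Each word goes through four steps:

1. Decompose it (NFKD).
2. Strip non-ASCII marks and lowercase it.
3. Pad it with one space on each side.
4. Cut it into overlapping three-character windows.

```ruby
require 'set'

# Hypothetical port of FuzzySearch::split_trigrams (Ruby 2.2+, no ActiveSupport)
def split_trigrams(s)
  s = s.join(" ") if s.is_a?(Array)
  return [] if s.nil?
  trigrams = Set.new # a Set drops duplicate trigrams across words
  s.to_s.strip.split(/[\s\-]+/).each do |word|
    # NFKD splits "ü" into "u" plus a combining mark; the gsub drops the mark
    chars = word.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase.strip
    # Padding gives the start and end of each word their own trigrams
    chars = " #{chars} "
    (0..chars.length - 3).each { |idx| trigrams << chars[idx, 3] }
  end
  trigrams.to_a
end

p split_trigrams("david") # prints [" da", "dav", "avi", "vid", "id "]

# A misspelled query still shares most trigrams with the stored name
shared = split_trigrams("OReilly") & split_trigrams("O'Reilly")
p shared.size # prints 5
```

This overlap is what lets "OReilly" find "O'Reilly" in the README example. The apostrophe only changes the trigrams at the start of the word.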
@@ -0,0 +1,102 @@
+module FuzzySearch
+  module TrigramModelExtensions
+    def set_target_class(cls)
+      write_inheritable_attribute :target_class, cls
+      class_inheritable_reader :target_class
+    end
+
+    def rebuild_index
+      reset_column_information
+      delete_all
+      target_class.find_each do |rec|
+        # Maybe can make this more efficient by updating trigrams for
+        # batches of records...
+        update_trigrams(rec)
+      end
+    end
+
+    def params_for_search(search_term, opts = {})
+      trigrams = FuzzySearch::split_trigrams(search_term)
+      # No results for empty search string
+      return {:conditions => "0 = 1"} unless trigrams and !trigrams.empty?
+
+      subset = nil
+      if opts[:subset]
+        if (
+          opts[:subset].size == 1 &&
+          opts[:subset].keys.first == target_class.fuzzy_search_subset_property
+        )
+          subset = opts[:subset].values.first.to_i
+          opts.delete(:subset)
+        else
+          # TODO Test me
+          raise "Invalid subset argument #{opts[:subset]}"
+        end
+      end
+
+      unless opts.empty?
+        # TODO Test me
+        raise "Invalid options: #{opts.keys.join(",")}"
+      end
+
+      # Retrieve the IDs of the matching items
+      search_result = connection.select_rows(
+        "SELECT rec_id, count(*) FROM #{i(table_name)} " +
+        (connection.adapter_name.downcase == 'mysql' ?
+          "IGNORE INDEX (index_#{table_name}_on_rec_id) " : ""
+        ) +
+        "WHERE token IN (#{trigrams.map{|t| v(t)}.join(',')}) " +
+        (subset ? "AND subset = #{subset} " : "") +
+        "GROUP by rec_id " +
+        "ORDER BY count(*) DESC " +
+        "LIMIT #{target_class.send(:fuzzy_search_limit)}"
+      )
+      return {:conditions => "0 = 1"} if search_result.empty?
+
+      # Perform a join between the target table and a fake table of matching ids
+      static_sql_union = search_result.map{|rec_id, count|
+        "SELECT #{v(rec_id)} AS id, #{count} AS score"
+      }.join(" UNION ");
+      primary_key_expr = "#{i(target_class.table_name)}.#{i(target_class.primary_key)}"
+      return {
+        :joins => "INNER JOIN (#{static_sql_union}) AS fuzzy_search_results ON " +
+          "fuzzy_search_results.id = #{primary_key_expr}",
+        :order => "fuzzy_search_results.score DESC"
+      }
+    end
+
+    def update_trigrams(rec)
+      delete_trigrams(rec)
+
+      values = target_class.fuzzy_search_properties.map{|p| rec.send(p)}
+      values = values.select{|p| p and p.respond_to?(:to_s)}
+      trigrams = FuzzySearch::split_trigrams(values)
+
+      subset_prop = target_class.fuzzy_search_subset_property
+      subset = subset_prop ? rec.send(subset_prop) : 0
+
+      # Ar-extensions import, much much faster than individual creates
+      import(
+        [:subset, :token, :rec_id],
+        trigrams.map{|t| [subset, t, rec.id]},
+        :validate => false
+      )
+    end
+
+    def delete_trigrams(rec)
+      delete_all(:rec_id => rec.id)
+    end
+
+    private
+
+    # Quote SQL identifier (i.e. table or column name)
+    def i(s)
+      connection.quote_column_name(s)
+    end
+
+    # Quote SQL value (i.e. a string or number)
+    def v(s)
+      connection.quote(s)
+    end
+  end
+end
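`params_for_search` ranks matches in two steps:

1. One grouped query over the trigram table returns the best-matching `rec_id`s with their hit counts.
2. Those pairs become a literal `SELECT ... UNION SELECT ...` table, which is joined against the model's table and ordered by score.

The sketch below mimics both steps in plain Ruby under these assumptions:

- Rows are `[subset, token, rec_id]` triples held in an array instead of a database table.
- `Integer()` replaces the adapter's quoting.
- `rank_matches` and `union_join_sql` are hypothetical names.

Score ties are broken by `rec_id` only to keep the sketch deterministic; the SQL leaves tie order unspecified.

```ruby
require 'set'

# In-memory stand-in for:
#   SELECT rec_id, count(*) FROM <trigram table>
#   WHERE token IN (...) [AND subset = ?]
#   GROUP BY rec_id ORDER BY count(*) DESC LIMIT <limit>
def rank_matches(rows, query_trigrams, subset: nil, limit: 25)
  wanted = query_trigrams.to_set
  counts = Hash.new(0)
  rows.each do |row_subset, token, rec_id|
    next unless wanted.include?(token)
    next if subset && row_subset != subset
    counts[rec_id] += 1
  end
  # Tie-break on rec_id only to keep this sketch deterministic
  counts.sort_by { |rec_id, count| [-count, rec_id] }.first(limit)
end

# Turns [[rec_id, count], ...] into the join clause used to order results.
# Returns nil when nothing matched (the gem uses :conditions => "0 = 1").
def union_join_sql(results, table, primary_key = "id")
  return nil if results.empty?
  union = results.map { |rec_id, count|
    "SELECT #{Integer(rec_id)} AS id, #{Integer(count)} AS score"
  }.join(" UNION ")
  "INNER JOIN (#{union}) AS fuzzy_search_results " \
    "ON fuzzy_search_results.id = #{table}.#{primary_key}"
end

rows = [
  [1, "abc", 10], [1, "bcd", 10], [1, "cde", 10],
  [2, "abc", 20], [2, "bcd", 20],
  [1, "abc", 30]
]
hits = rank_matches(rows, ["abc", "bcd", "cde"])
p hits                                   # prints [[10, 3], [20, 2], [30, 1]]
p rank_matches(rows, ["abc"], subset: 2) # prints [[20, 1]]
puts union_join_sql(hits.first(2), "people")
```

The subset filter is why the README's `:subset_on` option helps on large tables. The migration's `[:token, :subset, :rec_id]` index can then narrow the scan to one subset before grouping.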
@@ -0,0 +1,21 @@
+class FuzzySearchTableGenerator < Rails::Generator::NamedBase
+  attr_reader :target_model_name
+  attr_reader :table_name
+  attr_reader :migration_filename
+  attr_reader :migration_name
+
+  def initialize(runtime_args, runtime_options = {})
+    super
+    @target_model_name = name.classify
+    @table_name = "#{name.underscore}_fuzzy_search_trigrams"
+    @migration_filename = "create_#{name.underscore}_fuzzy_search_table"
+    @migration_name = migration_filename.classify
+  end
+
+  def manifest
+    record do |m|
+      m.migration_template "create_fuzzy_search_table.rb", "db/migrate",
+        :migration_file_name => migration_filename
+    end
+  end
+end
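The generator derives every name from the model name passed on the command line. That is why the test app's migrations are named `create_person_fuzzy_search_table` and `create_email_fuzzy_search_table`.

Below is a plain-Ruby sketch of that derivation:

- `underscore` is a simplified stand-in for ActiveSupport's `String#underscore`.
- `camelize` is a simplified stand-in for `String#classify`. The real `classify` also singularizes, which is a no-op for these names.
- `fuzzy_table_names` is a hypothetical helper.

```ruby
# Simplified stand-in for ActiveSupport's String#underscore
def underscore(str)
  str.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Simplified stand-in for String#classify, minus singularization
def camelize(str)
  str.split('_').map(&:capitalize).join
end

# Hypothetical helper mirroring the generator's instance variables
def fuzzy_table_names(model_name)
  base = underscore(model_name)
  migration_filename = "create_#{base}_fuzzy_search_table"
  {
    :target_model_name => camelize(base),
    :table_name => "#{base}_fuzzy_search_trigrams",
    :migration_filename => migration_filename,
    :migration_name => camelize(migration_filename)
  }
end

p fuzzy_table_names("Person")
```

The derived table name must match the one `fuzzy_searchable_on` assigns to `FuzzySearchTrigram`. Both use `"#{name.underscore}_fuzzy_search_trigrams"`.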
@@ -0,0 +1,30 @@
+class <%= migration_name %> < ActiveRecord::Migration
+  def self.up
+    is_mysql = ActiveRecord::Base.connection.adapter_name.downcase == 'mysql'
+    table = '<%= table_name %>'.to_sym
+
+    create_table table, :id => false do |t|
+      t.column :subset, :integer, :limit => 4, :null => false
+      if is_mysql
+        t.column :token, "binary(3)", :null => false
+      else
+        t.column :token, :binary, :limit => 3, :null => false
+      end
+      t.column :rec_id, :integer, :null => false
+    end
+
+    if is_mysql
+      ActiveRecord::Base.connection.execute(
+        "ALTER TABLE #{table.to_s} ENGINE = MyISAM"
+      )
+    end
+    add_index table, [:token, :subset, :rec_id], :name => "full_cover"
+    add_index table, [:rec_id]
+
+    <%= target_model_name %>.rebuild_fuzzy_search_index!
+  end
+
+  def self.down
+    drop_table '<%= table_name %>'.to_sym
+  end
+end
@@ -1,3 +1,3 @@
 class Person < ActiveRecord::Base
-  fuzzy_searchable_on :first_name, :last_name
+  fuzzy_searchable_on :first_name, :last_name, :subset_on => :favorite_number
 end
@@ -1,6 +1,11 @@
+#test:
+# adapter: sqlite3
+# database: ":memory:"
+# verbosity: quiet
+# pool: 5
+# timeout: 5000
+
 test:
-  adapter: sqlite3
-  database: ":memory:"
-  verbosity: quiet
-  pool: 5
-  timeout: 5000
+  adapter: mysql
+  database: "fuzzy"
+  username: root
@@ -8,6 +8,7 @@ class CreateTables < ActiveRecord::Migration
       t.string :first_name
       t.string :last_name
       t.string :hobby
+      t.integer :favorite_number
     end
   end
 
@@ -0,0 +1,30 @@
+class CreatePersonFuzzySearchTable < ActiveRecord::Migration
+  def self.up
+    is_mysql = ActiveRecord::Base.connection.adapter_name.downcase == 'mysql'
+    table = 'person_fuzzy_search_trigrams'.to_sym
+
+    create_table table, :id => false do |t|
+      t.column :subset, :integer, :limit => 4, :null => false
+      if is_mysql
+        t.column :token, "binary(3)", :null => false
+      else
+        t.column :token, :binary, :limit => 3, :null => false
+      end
+      t.column :rec_id, :integer, :null => false
+    end
+
+    if is_mysql
+      ActiveRecord::Base.connection.execute(
+        "ALTER TABLE #{table.to_s} ENGINE = MyISAM"
+      )
+    end
+    add_index table, [:token, :subset, :rec_id], :name => "full_cover"
+    add_index table, [:rec_id]
+
+    Person.rebuild_fuzzy_search_index!
+  end
+
+  def self.down
+    drop_table 'person_fuzzy_search_trigrams'.to_sym
+  end
+end
@@ -0,0 +1,30 @@
+class CreateEmailFuzzySearchTable < ActiveRecord::Migration
+  def self.up
+    is_mysql = ActiveRecord::Base.connection.adapter_name.downcase == 'mysql'
+    table = 'email_fuzzy_search_trigrams'.to_sym
+
+    create_table table, :id => false do |t|
+      t.column :subset, :integer, :limit => 4, :null => false
+      if is_mysql
+        t.column :token, "binary(3)", :null => false
+      else
+        t.column :token, :binary, :limit => 3, :null => false
+      end
+      t.column :rec_id, :integer, :null => false
+    end
+
+    if is_mysql
+      ActiveRecord::Base.connection.execute(
+        "ALTER TABLE #{table.to_s} ENGINE = MyISAM"
+      )
+    end
+    add_index table, [:token, :subset, :rec_id], :name => "full_cover"
+    add_index table, [:rec_id]
+
+    Email.rebuild_fuzzy_search_index!
+  end
+
+  def self.down
+    drop_table 'email_fuzzy_search_trigrams'.to_sym
+  end
+end
data/test/factories.rb CHANGED
@@ -5,6 +5,7 @@ FactoryGirl.define do
     first_name 'John'
     last_name 'Doe'
     hobby 'Flying kites'
+    sequence(:favorite_number) {|n| (n%3) + 1 }
   end
 
   factory :email do
data/test/test_helper.rb CHANGED
@@ -18,6 +18,7 @@ end
 
 require 'rubygems'
 require 'minitest/autorun'
+require 'minitest/benchmark' if ENV["BENCH"]
 require 'redgreen'
 require 'pp'
 
@@ -61,9 +62,21 @@ module MiniTest
 end
 
 require 'factories'
+require 'faker'
 MiniTest::Unit::TestCase.send(:include, Factory::Syntax::Methods)
 
 MiniTest::Unit::TestCase.add_setup_hook do
   ActiveRecord::Migration.verbose = false
   ActiveRecord::Migrator.migrate("#{Rails.root}/db/migrate") # Migrations in the test app
 end
+
+# From http://stackoverflow.com/questions/1090801/1091106
+ActiveRecord::ConnectionAdapters::AbstractAdapter.class_eval do
+  attr_reader :last_query
+
+  def log_with_last_query(sql, name, &block)
+    @last_query = [sql, name]
+    log_without_last_query(sql, name, &block)
+  end
+  alias_method_chain :log, :last_query
+end
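The `alias_method_chain` patch above records the last SQL statement an adapter logged so that tests can inspect it. On newer Rubies that provide `Module#prepend`, the same wrapping can be sketched without aliasing, as below. `FakeAdapter` is a hypothetical stand-in for `ActiveRecord::ConnectionAdapters::AbstractAdapter`. Whether a given ActiveRecord version's `log` method has exactly this `(sql, name, &block)` signature is an assumption.

```ruby
# Hypothetical stand-in for an adapter whose log method yields to run the query
class FakeAdapter
  def log(sql, name)
    yield if block_given?
  end
end

# Prepended module: its log runs first, then super calls the original
module LastQueryRecorder
  attr_reader :last_query

  def log(sql, name, &block)
    @last_query = [sql, name]
    super # bare super forwards sql, name and the block unchanged
  end
end

FakeAdapter.prepend(LastQueryRecorder)

adapter = FakeAdapter.new
result = adapter.log("SELECT 1", "Test") { :query_ran }
p result             # prints :query_ran
p adapter.last_query # prints ["SELECT 1", "Test"]
```

Prepending leaves the original `log` untouched and reachable through `super`. This avoids the `log_with_...`/`log_without_...` method pair that `alias_method_chain` creates.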
@@ -2,7 +2,7 @@ require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
 
 describe "fuzzy_search" do
   before do
-    create(:person, :last_name => "meier", :first_name => "kristian")
+    @kris = create(:person, :last_name => "meier", :first_name => "kristian")
     create(:person, :last_name => "meyer", :first_name => "christian", :hobby => "Bicycling")
     create(:person, :last_name => "mayr", :first_name => "Chris")
     create(:person, :last_name => "maier", :first_name => "christoph", :hobby => "Bicycling")
@@ -18,12 +18,13 @@ describe "fuzzy_search" do
   after do
     Person.delete_all
     Email.delete_all
-    FuzzySearchTrigram.delete_all
+    Person::FuzzySearchTrigram.delete_all
+    Email::FuzzySearchTrigram.delete_all
   end
 
   it "can search for records with similar strings to a query" do
-    assert_equal 3, Person.fuzzy_search("meyr").size
-    assert_equal 1, Person.fuzzy_search("myr").size
+    refute_empty Person.fuzzy_search("maier")
+    refute_empty Person.fuzzy_search("ather")
   end
 
   it "can search on multiple columns" do
@@ -32,14 +33,10 @@ describe "fuzzy_search" do
     assert_equal "meier", result[0].last_name
   end
 
-  it "sorts results by their fuzzy match score" do
+  it "sorts results with the best match first" do
     result = Person.fuzzy_search("kristian meier")
-    assert_equal 100, result[0].fuzzy_score
-    prior = 100
-    (1..3).each do |idx|
-      assert result[idx].fuzzy_score <= prior
-      prior = result[idx].fuzzy_score
-    end
+    assert_equal @kris, result.first
+    assert result.size > 1
   end
 
   it "returns an empty result set when given an empty query string" do
@@ -48,9 +45,16 @@ describe "fuzzy_search" do
   end
 
   it "updates the search index automatically when a new record is saved" do
-    assert_empty Person.fuzzy_search("Dave")
+    assert_empty Person.fuzzy_search("David")
     create(:person, :first_name => "David", :last_name => "Simon")
-    refute_empty Person.fuzzy_search("Dave")
+    refute_empty Person.fuzzy_search("David")
+  end
+
+  it "only updates the appropriate trigram table" do
+    original_state = Email::FuzzySearchTrigram.all.to_a
+    create(:person, :first_name => "David", :last_name => "Simon")
+    new_state = Email::FuzzySearchTrigram.all.to_a
+    assert_equal original_state, new_state
   end
 
   it "updates the search index automatically when a record is updated" do
@@ -80,12 +84,8 @@ describe "fuzzy_search" do
     refute_empty Email.fuzzy_search("oscar")
   end
 
-  it "can normalize strings" do
-    assert_equal("aaaaaa", Person.normalize("ÀÁÂÃÄÅ"))
-  end
-
   it "normalizes strings before searching on them" do
-    assert_equal 1, Person.fuzzy_search("Müll").size
+    assert_equal 1, Person.fuzzy_search("Müell").size
     assert_equal 1, Email.fuzzy_search("öscar").size
   end
 
@@ -95,12 +95,21 @@ describe "fuzzy_search" do
 
   it "can search through a scope" do
     scope = Person.scoped({:conditions => {:hobby => "Bicycling"}})
-    assert_equal 4, Person.fuzzy_search("chris").size
-    assert_equal 2, scope.fuzzy_search("chris").size
+    full = Person.fuzzy_search("chris")
+    subset = scope.fuzzy_search("chris")
+    assert full.size > subset.size
+    assert subset.size > 0
+  end
+
+  it "can use the subset field to narrow the search range" do
+    full = Person.fuzzy_search("chris")
+    subset = Person.fuzzy_search("chris", :subset => {:favorite_number => 2})
+    assert full.size > subset.size
+    assert subset.size > 0
   end
 
   it "can rebuild the search index from scratch" do
-    FuzzySearchTrigram.delete_all
+    Person::FuzzySearchTrigram.delete_all
     assert_empty Person.fuzzy_search("chris")
     Person.rebuild_fuzzy_search_index!
     refute_empty Person.fuzzy_search("chris")
@@ -0,0 +1,51 @@
+require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
+
+describe "fuzzy_search" do
+  if ENV["BENCH"]
+    before do
+      Person.set_table_name "preloaded_people"
+      Person::FuzzySearchTrigram.set_table_name "preloaded_person_fuzzy_search_trigrams"
+
+      # Force Person to reload its fuzzy type id from preloaded_types
+      Person.send(:write_inheritable_attribute, :fuzzy_search_cached_type_id, nil)
+    end
+
+    after do
+      Person.set_table_name "people"
+      Person::FuzzySearchTrigram.set_table_name "person_fuzzy_search_trigrams"
+    end
+
+    bench_range do
+      [1, 10, 50]
+    end
+
+    bench_performance_linear "regular queries", 0.9 do |n|
+      c = 0
+      n.times do
+        result = Person.scoped(:limit => 20).fuzzy_search(Faker::Name.last_name)
+        c += result.size
+      end
+      if n >= 30
+        rpq = c/(n.to_f)
+        if rpq < 2
+          raise "Sanity check failure, average results per query: #{rpq}"
+        end
+      end
+    end
+
+    bench_performance_linear "subset queries", 0.9 do |n|
+      c = 0
+      n.times do
+        result = Person.scoped(:limit => 20).fuzzy_search(Faker::Name.last_name,
+          :subset => {:favorite_number => rand(100)+1})
+        c += result.size
+      end
+      if n >= 30
+        rpq = c/(n.to_f)
+        if rpq < 0.1
+          raise "Sanity check failure, average results per query: #{rpq}"
+        end
+      end
+    end
+  end
+end
@@ -0,0 +1,18 @@
+require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
+require 'set'
+
+describe "fuzzy search trigram splitter" do
+  it "can split a string into trigrams with emphasized edges" do
+    t = [" da", "dav", "avi", "vid", "id "]
+    assert_equal Set.new(t), Set.new(FuzzySearch::split_trigrams("david"))
+  end
+
+  it "can normalize strings" do
+    assert_equal([" a "], FuzzySearch::split_trigrams("À"))
+  end
+
+  it "can handle an array of strings" do
+    t = [" x ", " y ", " zi", "zig", "ig "]
+    assert_equal Set.new(t), Set.new(FuzzySearch::split_trigrams(["x", "y", "zig"]))
+  end
+end
metadata CHANGED
@@ -1,12 +1,12 @@
 --- !ruby/object:Gem::Specification
 name: fuzzy_search
 version: !ruby/object:Gem::Version
-  hash: 13
+  hash: 3
   prerelease:
   segments:
   - 0
-  - 3
-  version: "0.3"
+  - 4
+  version: "0.4"
 platform: ruby
 authors:
 - Kristian Meier
@@ -15,7 +15,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-10-13 00:00:00 Z
+date: 2011-10-27 00:00:00 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ar-extensions
@@ -45,12 +45,15 @@ files:
 - MIT-LICENSE
 - README.md
 - Rakefile
+- TODO
+- gen_fake_data.rb
 - lib/fuzzy_model_extensions.rb
 - lib/fuzzy_search.rb
-- lib/fuzzy_search_trigram.rb
 - lib/fuzzy_search_ver.rb
-- rails_generators/fuzzy_search_setup/fuzzy_search_setup_generator.rb
-- rails_generators/fuzzy_search_setup/templates/create_fuzzy_search_trigrams.rb
+- lib/split_trigrams.rb
+- lib/trigram_model_extensions.rb
+- rails_generators/fuzzy_search_table/fuzzy_search_table_generator.rb
+- rails_generators/fuzzy_search_table/templates/create_fuzzy_search_table.rb
 - test/app_root/app/models/email.rb
 - test/app_root/app/models/person.rb
 - test/app_root/config/boot.rb
@@ -58,13 +61,16 @@ files:
 - test/app_root/config/environment.rb
 - test/app_root/config/environments/test.rb
 - test/app_root/config/routes.rb
-- test/app_root/db/migrate/20100529235049_create_tables.rb
-- test/app_root/db/migrate/20111013132330_create_fuzzy_search_trigrams.rb
+- test/app_root/db/migrate/2_create_tables.rb
+- test/app_root/db/migrate/3_create_person_fuzzy_search_table.rb
+- test/app_root/db/migrate/4_create_email_fuzzy_search_table.rb
 - test/app_root/vendor/plugins/fuzzy_search/init.rb
 - test/factories.rb
 - test/test.watchr
 - test/test_helper.rb
 - test/unit/fuzzy_search_test.rb
+- test/unit/performance_test.rb
+- test/unit/split_trigrams_test.rb
 homepage: http://github.com/DavidMikeSimon/fuzzy_search
 licenses: []
 
@@ -1,2 +0,0 @@
-class FuzzySearchTrigram < ActiveRecord::Base
-end
@@ -1,8 +0,0 @@
-class FuzzySearchSetupGenerator < Rails::Generator::Base
-  def manifest
-    record do |m|
-      m.migration_template "create_fuzzy_search_trigrams.rb", "db/migrate",
-        :migration_file_name => "create_fuzzy_search_trigrams"
-    end
-  end
-end
@@ -1,15 +0,0 @@
-class CreateFuzzySearchTrigrams < ActiveRecord::Migration
-  def self.up
-    create_table :fuzzy_search_trigrams, :id => false do |t|
-      t.column :token, :string, :limit => 3
-      t.column :rec_type, :string
-      t.column :rec_id, :integer
-    end
-
-    add_index :fuzzy_search_trigrams, [:rec_type, :token]
-  end
-
-  def self.down
-    drop_table :fuzzy_search_trigrams
-  end
-end
@@ -1,15 +0,0 @@
-class CreateFuzzySearchTrigrams < ActiveRecord::Migration
-  def self.up
-    create_table :fuzzy_search_trigrams, :id => false do |t|
-      t.column :token, :string, :limit => 3
-      t.column :rec_type, :string
-      t.column :rec_id, :integer
-    end
-
-    add_index :fuzzy_search_trigrams, [:rec_type, :token]
-  end
-
-  def self.down
-    drop_table :fuzzy_search_trigrams
-  end
-end