fuzzy_search 0.3 → 0.4
- data/README.md +22 -14
- data/TODO +37 -0
- data/gen_fake_data.rb +26 -0
- data/lib/fuzzy_model_extensions.rb +45 -85
- data/lib/fuzzy_search.rb +3 -2
- data/lib/fuzzy_search_ver.rb +1 -1
- data/lib/split_trigrams.rb +16 -0
- data/lib/trigram_model_extensions.rb +102 -0
- data/rails_generators/fuzzy_search_table/fuzzy_search_table_generator.rb +21 -0
- data/rails_generators/fuzzy_search_table/templates/create_fuzzy_search_table.rb +30 -0
- data/test/app_root/app/models/person.rb +1 -1
- data/test/app_root/config/database.yml +10 -5
- data/test/app_root/db/migrate/{20100529235049_create_tables.rb → 2_create_tables.rb} +1 -0
- data/test/app_root/db/migrate/3_create_person_fuzzy_search_table.rb +30 -0
- data/test/app_root/db/migrate/4_create_email_fuzzy_search_table.rb +30 -0
- data/test/factories.rb +1 -0
- data/test/test_helper.rb +13 -0
- data/test/unit/fuzzy_search_test.rb +30 -21
- data/test/unit/performance_test.rb +51 -0
- data/test/unit/split_trigrams_test.rb +18 -0
- metadata +15 -9
- data/lib/fuzzy_search_trigram.rb +0 -2
- data/rails_generators/fuzzy_search_setup/fuzzy_search_setup_generator.rb +0 -8
- data/rails_generators/fuzzy_search_setup/templates/create_fuzzy_search_trigrams.rb +0 -15
- data/test/app_root/db/migrate/20111013132330_create_fuzzy_search_trigrams.rb +0 -15
data/README.md
CHANGED
@@ -2,7 +2,7 @@
|
|
2
2
|
|
3
3
|
Search through your models while tolerating slight mis-spellings. If you have a Person in your database named O'Reilly, you want your users to be able to find it even if they type "OReilly" or "O'Rielly".
|
4
4
|
|
5
|
-
This gem is not as powerful as dedicated search tools like Solr, but it's much
|
5
|
+
This gem is not as powerful as dedicated search tools like Solr, but it's much easier to set up and more appropriate for searching small strings, such as names of people or products. It uses your regular database, rather than an external service that has to be maintained separately.
|
6
6
|
|
7
7
|
Currently only Rails 2 is supported. I welcome any contributions that resolve this!
|
8
8
|
|
@@ -10,14 +10,9 @@ Currently only Rails 2 is supported. I welcome any contributions that resolve th
|
|
10
10
|
|
11
11
|
Add `fuzzy_search` to your Rails project's Gemfile, and do the usual `bundle install` dance.
|
12
12
|
|
13
|
-
Then, run the generator and migrate to create the search table:
|
14
|
-
|
15
|
-
$ ./script/generate fuzzy_search_setup
|
16
|
-
$ rake db:migrate
|
17
|
-
|
18
13
|
## Example
|
19
14
|
|
20
|
-
To allow a model to be searched, specify which columns are to be
|
15
|
+
To allow a model to be searched, specify which columns are to be searched on:
|
21
16
|
|
22
17
|
```ruby
|
23
18
|
class Person < ActiveRecord::Base
|
@@ -26,24 +21,37 @@ class Person < ActiveRecord::Base
|
|
26
21
|
# ...
|
27
22
|
end
|
28
23
|
```
|
29
|
-
|
24
|
+
And then create a search table, and the initial trigrams, for that model:
|
30
25
|
|
31
|
-
|
32
|
-
|
33
|
-
```
|
26
|
+
$ ./script/generate fuzzy_search_table Person
|
27
|
+
$ rake db:migrate
|
34
28
|
|
35
|
-
The fuzzy_search method returns
|
29
|
+
The fuzzy_search method returns search results:
|
36
30
|
|
37
31
|
```ruby
|
38
32
|
people = Person.fuzzy_search "OReilly"
|
39
33
|
```
|
40
|
-
|
41
|
-
Fuzzy find works on scopes too, including named_scopes and on-the-fly scopes:
|
34
|
+
It works through scopes too, including named_scopes and on-the-fly scopes:
|
42
35
|
|
43
36
|
```ruby
|
44
37
|
people = Person.scoped({:conditions => ["state='active'"]}).fuzzy_search("OReilly")
|
45
38
|
```
|
46
39
|
|
40
|
+
If you have a very large data set but are typically searching for items
|
41
|
+
within a scoped subset of that data, you can get a significant performance
|
42
|
+
boost for those searches by having FuzzySearch include the scope-defining
|
43
|
+
field (which currently must be an integer) in the search table:
|
44
|
+
|
45
|
+
```ruby
|
46
|
+
class Person < ActiveRecord::Base
|
47
|
+
# ...
|
48
|
+
fuzzy_searchable_on :first_name, :last_name, :subset_on => :zipcode
|
49
|
+
# ...
|
50
|
+
end
|
51
|
+
|
52
|
+
bev_hills_people = Person.fuzzy_search("OReilly", :subset => {:zipcode => 90210})
|
53
|
+
```
|
54
|
+
|
47
55
|
## Licence and credits
|
48
56
|
|
49
57
|
This gem is based on the rails-fuzzy-search plugin by iulianu
|
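The README's O'Reilly example works because of trigram overlap: a misspelled query still shares most of its three-character windows with the stored name. What follows is a simplified sketch in plain Ruby, not the gem's code. The helpers `trigrams` and `shared_trigrams` are hypothetical, and the sketch leaves out the accent normalization that `FuzzySearch::split_trigrams` performs.

```ruby
require 'set'

# Simplified trigram splitter (hypothetical helper): lowercase the text,
# split on whitespace and hyphens, pad each word with a space on both
# sides, and collect every three-character window.
def trigrams(text)
  text.downcase.strip.split(/[\s\-]+/).each_with_object(Set.new) do |word, set|
    padded = " #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Count how many distinct query trigrams also occur in the candidate.
def shared_trigrams(query, candidate)
  (trigrams(query) & trigrams(candidate)).size
end

stored = "O'Reilly"
["OReilly", "O'Rielly", "Smith"].each do |query|
  puts "#{query}: #{shared_trigrams(query, stored)} shared trigrams"
end
```

This reports 5 shared trigrams for "OReilly", 4 for "O'Rielly" and 0 for "Smith". Both misspellings score well above an unrelated name, and that gap is what lets them rank near the top.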
data/TODO
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
* Have the test app's fuzzy search table migrations generated
|
2
|
+
automatically by the test suite, instead of having to manually
|
3
|
+
update them every time I change the template.
|
4
|
+
|
5
|
+
* MyISAM: Consider setting concurrent_insert to 2 on MySQL to make sure we don't
|
6
|
+
have nasty locking scenarios when there are holes in the middle
|
7
|
+
of the trigrams table. See:
|
8
|
+
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_concurrent_insert
|
9
|
+
|
10
|
+
Or, as an alternate way of doing defragmentation that doesn't block the server
|
11
|
+
for ages like OPTIMIZE, perhaps we can just rebuild trigrams for the most-recently
|
12
|
+
inserted trigram rows until fragmentation is no longer in place? In concurrent_insert
|
13
|
+
mode 1, this should cause it to reallocate those trigram rows backwards into the hole.
|
14
|
+
|
15
|
+
* Watch out for interactions between fuzzy_search_limit and scopes.
|
16
|
+
If the first 25 results from the search don't fit the scope, the
|
17
|
+
user will end up with an empty result set.
|
18
|
+
|
19
|
+
* If it doesn't slow things up too much, prefer short matches when given
|
20
|
+
short query strings, i.e. "ama" should rank "Amad" higher than
|
21
|
+
"Amalamadingdongwitcherydoo". Can do this by using the total
|
22
|
+
number of trigrams for the given record (or maybe a given word?)
|
23
|
+
as a secondary order field.
|
24
|
+
|
25
|
+
* Phonetic coding (e.g. metaphone). Needs to be all-or-nothing
|
26
|
+
for any given AR model, otherwise we'd have to try searching
|
27
|
+
on both coded and raw versions of each query string to match
|
28
|
+
against both coded and non-coded properties, which would
|
29
|
+
throw away the optimization (although it would still allow
|
30
|
+
for phonetic near-matches, which is cool).
|
31
|
+
|
32
|
+
Maybe we can make this a user-adjustable thing by bringing
|
33
|
+
back the 'specify a normalize method' feature.
|
34
|
+
|
35
|
+
* Maybe allow a [trigram,rec_id] to appear multiple times
|
36
|
+
under different subsets? Just have to make sure my scoring
|
37
|
+
rule counts only unique trigram hits.
|
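One TODO item proposes using a record's total trigram count as a secondary sort key, so that short matches win ties when the query is short. The in-memory sketch below shows only that ordering rule. It is hypothetical: the gem does not implement it, a real version would have to compute the count in SQL, and `trigrams_of` and `rank_preferring_short` are made-up names.

```ruby
require 'set'

# Hypothetical helper: padded trigrams of every word, as a Set.
def trigrams_of(text)
  text.downcase.strip.split(/[\s\-]+/).each_with_object(Set.new) do |word, set|
    padded = " #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Primary key: trigram hits against the query (more is better).
# Secondary key: the candidate's total trigram count (fewer is better),
# so a short record beats a long one with the same number of hits.
def rank_preferring_short(query, candidates)
  query_trigrams = trigrams_of(query)
  candidates.sort_by do |name|
    name_trigrams = trigrams_of(name)
    [-(query_trigrams & name_trigrams).size, name_trigrams.size]
  end
end

p rank_preferring_short("ama", ["Amalamadingdongwitcherydoo", "Amad"])
```

Both names share " am" and "ama" with the query, so they tie on hits. "Amad" has only four trigrams, which puts it first.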
data/gen_fake_data.rb
ADDED
@@ -0,0 +1,26 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'faker'
|
3
|
+
require 'active_record'
|
4
|
+
require 'active_support'
|
5
|
+
require 'lib/split_trigrams.rb'
|
6
|
+
|
7
|
+
$KCODE = 'utf-8'
|
8
|
+
|
9
|
+
people_f = open("preloaded_people.csv", "w")
|
10
|
+
trigrams_f = open("preloaded_person_fuzzy_search_trigrams.csv", "w")
|
11
|
+
|
12
|
+
srand(1234)
|
13
|
+
|
14
|
+
1_000_000.times do |i|
|
15
|
+
idx = i+1
|
16
|
+
first_name, last_name = Faker::Name.first_name, Faker::Name.last_name
|
17
|
+
fav_num = rand(100)+1
|
18
|
+
people_f.puts([idx, first_name, last_name, 'Model airplanes', fav_num].join(","))
|
19
|
+
|
20
|
+
trigrams = FuzzySearch::split_trigrams([first_name, last_name])
|
21
|
+
trigrams.each do |tri|
|
22
|
+
trigrams_f.puts([fav_num, tri, idx].join(","))
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
[people_f, trigrams_f].each(&:close)
|
data/lib/fuzzy_model_extensions.rb
CHANGED
@@ -1,111 +1,71 @@
|
|
1
|
+
require 'set'
|
2
|
+
|
1
3
|
module FuzzySearch
|
2
|
-
module
|
4
|
+
module FuzzyModelExtensions
|
3
5
|
def self.included(base)
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
6
|
+
{
|
7
|
+
:fuzzy_search_properties => [],
|
8
|
+
:fuzzy_search_limit => 25,
|
9
|
+
:fuzzy_search_subset_property => nil
|
10
|
+
}.each do |key, value|
|
11
|
+
base.write_inheritable_attribute key, value
|
12
|
+
base.class_inheritable_reader key
|
13
|
+
end
|
8
14
|
|
9
|
-
base.
|
10
|
-
base.class_inheritable_reader :fuzzy_search_threshold
|
15
|
+
base.extend ClassMethods
|
11
16
|
end
|
12
17
|
|
13
18
|
module ClassMethods
|
14
19
|
def fuzzy_searchable_on(*properties)
|
15
20
|
# TODO: Complain if fuzzy_searchable_on is called more than once
|
21
|
+
# TODO: Complain if no properties were given
|
22
|
+
options = properties.last.is_a?(Hash) ? properties.pop : {}
|
16
23
|
write_inheritable_attribute :fuzzy_search_properties, properties
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
extend WordNormalizerClassMethod unless respond_to? :normalize
|
21
|
-
include InstanceMethods
|
22
|
-
end
|
23
|
-
|
24
|
-
def fuzzy_search(words)
|
25
|
-
# TODO: If fuzzy_search_scope doesn't exist, provide a useful error
|
26
|
-
fuzzy_search_scope(words).all
|
27
|
-
end
|
28
|
-
|
29
|
-
def rebuild_fuzzy_search_index!
|
30
|
-
FuzzySearchTrigram.delete_all(:rec_type => self.class.name)
|
31
|
-
all.each do |rec|
|
32
|
-
rec.update_fuzzy_search_trigrams!
|
24
|
+
if options[:subset_on]
|
25
|
+
write_inheritable_attribute :fuzzy_search_subset_property, options[:subset_on]
|
26
|
+
options.delete(:subset_on)
|
33
27
|
end
|
34
|
-
end
|
35
|
-
|
36
|
-
private
|
37
28
|
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
words = words.strip.to_s.split(/[\s\-]+/) unless words.instance_of? Array
|
42
|
-
return no_results unless words.size > 0
|
43
|
-
|
44
|
-
trigrams = []
|
45
|
-
words.each do |w|
|
46
|
-
word = ' ' + normalize(w) + ' '
|
47
|
-
word_as_chars = word.mb_chars
|
48
|
-
trigrams << (0..word_as_chars.length-3).collect {|idx| word_as_chars[idx,3].to_s}
|
29
|
+
unless options.empty?
|
30
|
+
# TODO Test me
|
31
|
+
raise "Invalid options: #{options.keys.join(",")}"
|
49
32
|
end
|
50
|
-
trigrams = trigrams.flatten.uniq
|
51
|
-
|
52
|
-
# Transform the list of columns in the searchable entity into
|
53
|
-
# a SQL fragment like:
|
54
|
-
# "table_name.id, table_name.field1, table_name.field2, ..."
|
55
|
-
entity_fields = columns.map {|col| table_name + "." + col.name}.join(", ")
|
56
33
|
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
# TODO: Optimize this query. In a large trigram table, this is going to go through a lot of dead ends.
|
64
|
-
# Maybe I need to just bite the bullet and learn how to do procedures? That would break cross-database compatibility, though...
|
65
|
-
return {
|
66
|
-
:select => "#{fuzzy_score_expr} AS fuzzy_score, #{entity_fields}",
|
67
|
-
:joins => ["LEFT OUTER JOIN fuzzy_search_trigrams ON fuzzy_search_trigrams.rec_id = #{table_name}.#{primary_key}"],
|
68
|
-
:conditions => ["fuzzy_search_trigrams.token IN (?) AND rec_type = '#{name}'", trigrams],
|
69
|
-
:group => "#{table_name}.#{primary_key}",
|
70
|
-
:order => "fuzzy_score DESC",
|
71
|
-
:having => "#{fuzzy_score_expr} >= #{fuzzy_search_threshold}"
|
34
|
+
named_scope :fuzzy_search_scope, lambda { |words|
|
35
|
+
fuzzy_search_scope_with_opts(words, {})
|
36
|
+
}
|
37
|
+
named_scope :fuzzy_search_scope_with_opts, lambda { |words, opts|
|
38
|
+
self::FuzzySearchTrigram.params_for_search(words, opts)
|
72
39
|
}
|
40
|
+
extend FuzzySearchClassMethods
|
41
|
+
include InstanceMethods
|
42
|
+
after_save :update_fuzzy_search_trigrams!
|
43
|
+
after_destroy :delete_fuzzy_search_trigrams!
|
44
|
+
|
45
|
+
const_set(:FuzzySearchTrigram, Class.new(ActiveRecord::Base))
|
46
|
+
self::FuzzySearchTrigram.extend TrigramModelExtensions
|
47
|
+
self::FuzzySearchTrigram.set_target_class self
|
48
|
+
self::FuzzySearchTrigram.table_name = "#{name.underscore}_fuzzy_search_trigrams"
|
73
49
|
end
|
74
50
|
end
|
75
51
|
|
76
|
-
module
|
77
|
-
def
|
78
|
-
|
52
|
+
module FuzzySearchClassMethods
|
53
|
+
def fuzzy_search(words, opts = {})
|
54
|
+
fuzzy_search_scope_with_opts(words, opts)
|
55
|
+
end
|
56
|
+
|
57
|
+
def rebuild_fuzzy_search_index!
|
58
|
+
self::FuzzySearchTrigram.rebuild_index
|
79
59
|
end
|
80
60
|
end
|
81
61
|
|
82
62
|
module InstanceMethods
|
83
63
|
def update_fuzzy_search_trigrams!
|
84
|
-
FuzzySearchTrigram.
|
85
|
-
|
86
|
-
# to avoid double entries
|
87
|
-
tokens = []
|
88
|
-
self.class.fuzzy_search_properties.each do |prop|
|
89
|
-
prop_value = send(prop)
|
90
|
-
next if prop_value.nil?
|
91
|
-
# split the property into words (which are separated by whitespaces)
|
92
|
-
# and generate the trigrams for each word
|
93
|
-
prop_value.to_s.split(/[\s\-]+/).each do |p|
|
94
|
-
# put a space in front and at the end to emphasize the endings
|
95
|
-
word = ' ' + self.class.normalize(p) + ' '
|
96
|
-
word_as_chars = word.mb_chars
|
97
|
-
(0..word_as_chars.length - 3).each do |idx|
|
98
|
-
token = word_as_chars[idx, 3].to_s
|
99
|
-
tokens << token unless tokens.member?(token)
|
100
|
-
end
|
101
|
-
end
|
102
|
-
end
|
64
|
+
self.class::FuzzySearchTrigram.update_trigrams(self)
|
65
|
+
end
|
103
66
|
|
104
|
-
|
105
|
-
|
106
|
-
tokens.map{|t| [t, self.id, self.class.name]},
|
107
|
-
:validate => false
|
108
|
-
)
|
67
|
+
def delete_fuzzy_search_trigrams!
|
68
|
+
self.class::FuzzySearchTrigram.delete_trigrams(self)
|
109
69
|
end
|
110
70
|
end
|
111
71
|
end
|
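In 0.4, `fuzzy_searchable_on` accepts a trailing options hash. It consumes `:subset_on` and raises on any other key. The same argument handling is sketched below in plain Ruby, outside ActiveRecord. Three things differ from the gem: `parse_searchable_args` is a hypothetical name, it raises `ArgumentError` where the gem raises a plain `RuntimeError`, and its empty-properties check implements one of the file's TODO comments rather than current behavior.

```ruby
# Hypothetical stand-alone version of the argument handling in
# fuzzy_searchable_on: a trailing Hash is treated as options.
def parse_searchable_args(*properties)
  options = properties.last.is_a?(Hash) ? properties.pop.dup : {}
  # :subset_on names the (integer) column copied into the trigram table.
  subset_property = options.delete(:subset_on)
  unless options.empty?
    raise ArgumentError, "Invalid options: #{options.keys.join(",")}"
  end
  # Not in the gem yet; mirrors "TODO: Complain if no properties were given".
  raise ArgumentError, "No properties given" if properties.empty?
  { :properties => properties, :subset_property => subset_property }
end

p parse_searchable_args(:first_name, :last_name, :subset_on => :zipcode)
```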
data/lib/fuzzy_search.rb
CHANGED
@@ -11,7 +11,8 @@ module ActiveRecord # :nodoc:
|
|
11
11
|
end
|
12
12
|
|
13
13
|
require 'fuzzy_model_extensions'
|
14
|
-
require 'fuzzy_search_trigram'
|
15
14
|
require 'fuzzy_search_ver'
|
15
|
+
require 'split_trigrams'
|
16
|
+
require 'trigram_model_extensions'
|
16
17
|
|
17
|
-
ActiveRecord::Base.send(:include, FuzzySearch::
|
18
|
+
ActiveRecord::Base.send(:include, FuzzySearch::FuzzyModelExtensions)
|
data/lib/fuzzy_search_ver.rb
CHANGED
data/lib/split_trigrams.rb
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
module FuzzySearch
|
2
|
+
def self.split_trigrams(s)
|
3
|
+
s = s.join(" ") if s.is_a?(Array)
|
4
|
+
return [] unless s and s.respond_to?(:to_s)
|
5
|
+
words = s.to_s.strip.split(/[\s\-]+/)
|
6
|
+
trigrams = Set.new
|
7
|
+
words.each do |w|
|
8
|
+
chars = w.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.strip
|
9
|
+
chars = " " + chars + " "
|
10
|
+
(0..chars.length-3).each do |idx|
|
11
|
+
trigrams << chars[idx,3].to_s
|
12
|
+
end
|
13
|
+
end
|
14
|
+
return trigrams.to_a
|
15
|
+
end
|
16
|
+
end
|
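`FuzzySearch::split_trigrams` processes each word in five steps: it decomposes the word (compatibility decomposition), drops anything outside ASCII so that accented letters fall back to their base letter, lowercases, pads with spaces and takes every three-character window. The sketch below repeats those steps using only the Ruby standard library. It assumes Ruby 2.2 or later, so that `String#unicode_normalize(:nfkd)` can stand in for ActiveSupport's `mb_chars.normalize(:kd)`. The module name `TrigramSketch` is hypothetical.

```ruby
require 'set'

module TrigramSketch
  # Split a string, or an array of strings, into unique padded trigrams.
  def self.split_trigrams(s)
    s = s.join(" ") if s.is_a?(Array)
    return [] unless s
    trigrams = Set.new
    s.to_s.strip.split(/[\s\-]+/).each do |w|
      # NFKD turns "À" into "A" followed by a combining accent (U+0300);
      # deleting non-ASCII characters then leaves just "A".
      chars = w.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase.strip
      chars = " #{chars} "
      (0..chars.length - 3).each { |idx| trigrams << chars[idx, 3] }
    end
    trigrams.to_a
  end
end

p TrigramSketch.split_trigrams("Müell")
```

For "Müell" this yields `[" mu", "mue", "uel", "ell", "ll "]`: the umlaut is gone before any trigram is taken, so the query is compared against the same ASCII form that gets stored.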
data/lib/trigram_model_extensions.rb
ADDED
@@ -0,0 +1,102 @@
|
|
1
|
+
module FuzzySearch
|
2
|
+
module TrigramModelExtensions
|
3
|
+
def set_target_class(cls)
|
4
|
+
write_inheritable_attribute :target_class, cls
|
5
|
+
class_inheritable_reader :target_class
|
6
|
+
end
|
7
|
+
|
8
|
+
def rebuild_index
|
9
|
+
reset_column_information
|
10
|
+
delete_all
|
11
|
+
target_class.find_each do |rec|
|
12
|
+
# Maybe can make this more efficient by updating trigrams for
|
13
|
+
# batches of records...
|
14
|
+
update_trigrams(rec)
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
def params_for_search(search_term, opts = {})
|
19
|
+
trigrams = FuzzySearch::split_trigrams(search_term)
|
20
|
+
# No results for empty search string
|
21
|
+
return {:conditions => "0 = 1"} unless trigrams and !trigrams.empty?
|
22
|
+
|
23
|
+
subset = nil
|
24
|
+
if opts[:subset]
|
25
|
+
if (
|
26
|
+
opts[:subset].size == 1 &&
|
27
|
+
opts[:subset].keys.first == target_class.fuzzy_search_subset_property
|
28
|
+
)
|
29
|
+
subset = opts[:subset].values.first.to_i
|
30
|
+
opts.delete(:subset)
|
31
|
+
else
|
32
|
+
# TODO Test me
|
33
|
+
raise "Invalid subset argument #{opts[:subset]}"
|
34
|
+
end
|
35
|
+
end
|
36
|
+
|
37
|
+
unless opts.empty?
|
38
|
+
# TODO Test me
|
39
|
+
raise "Invalid options: #{opts.keys.join(",")}"
|
40
|
+
end
|
41
|
+
|
42
|
+
# Retrieve the IDs of the matching items
|
43
|
+
search_result = connection.select_rows(
|
44
|
+
"SELECT rec_id, count(*) FROM #{i(table_name)} " +
|
45
|
+
(connection.adapter_name.downcase == 'mysql' ?
|
46
|
+
"IGNORE INDEX (index_#{table_name}_on_rec_id) " : ""
|
47
|
+
) +
|
48
|
+
"WHERE token IN (#{trigrams.map{|t| v(t)}.join(',')}) " +
|
49
|
+
(subset ? "AND subset = #{subset} " : "") +
|
50
|
+
"GROUP by rec_id " +
|
51
|
+
"ORDER BY count(*) DESC " +
|
52
|
+
"LIMIT #{target_class.send(:fuzzy_search_limit)}"
|
53
|
+
)
|
54
|
+
return {:conditions => "0 = 1"} if search_result.empty?
|
55
|
+
|
56
|
+
# Perform a join between the target table and a fake table of matching ids
|
57
|
+
static_sql_union = search_result.map{|rec_id, count|
|
58
|
+
"SELECT #{v(rec_id)} AS id, #{count} AS score"
|
59
|
+
}.join(" UNION ");
|
60
|
+
primary_key_expr = "#{i(target_class.table_name)}.#{i(target_class.primary_key)}"
|
61
|
+
return {
|
62
|
+
:joins => "INNER JOIN (#{static_sql_union}) AS fuzzy_search_results ON " +
|
63
|
+
"fuzzy_search_results.id = #{primary_key_expr}",
|
64
|
+
:order => "fuzzy_search_results.score DESC"
|
65
|
+
}
|
66
|
+
end
|
67
|
+
|
68
|
+
def update_trigrams(rec)
|
69
|
+
delete_trigrams(rec)
|
70
|
+
|
71
|
+
values = target_class.fuzzy_search_properties.map{|p| rec.send(p)}
|
72
|
+
values = values.select{|p| p and p.respond_to?(:to_s)}
|
73
|
+
trigrams = FuzzySearch::split_trigrams(values)
|
74
|
+
|
75
|
+
subset_prop = target_class.fuzzy_search_subset_property
|
76
|
+
subset = subset_prop ? rec.send(subset_prop) : 0
|
77
|
+
|
78
|
+
# Ar-extensions import, much much faster than individual creates
|
79
|
+
import(
|
80
|
+
[:subset, :token, :rec_id],
|
81
|
+
trigrams.map{|t| [subset, t, rec.id]},
|
82
|
+
:validate => false
|
83
|
+
)
|
84
|
+
end
|
85
|
+
|
86
|
+
def delete_trigrams(rec)
|
87
|
+
delete_all(:rec_id => rec.id)
|
88
|
+
end
|
89
|
+
|
90
|
+
private
|
91
|
+
|
92
|
+
# Quote SQL identifier (i.e. table or column name)
|
93
|
+
def i(s)
|
94
|
+
connection.quote_column_name(s)
|
95
|
+
end
|
96
|
+
|
97
|
+
# Quote SQL value (i.e. a string or number)
|
98
|
+
def v(s)
|
99
|
+
connection.quote(s)
|
100
|
+
end
|
101
|
+
end
|
102
|
+
end
|
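`params_for_search` ranks in SQL. It counts the matching trigram rows per `rec_id` (optionally restricted to one `subset` value), orders by that count and applies `fuzzy_search_limit`. It then joins the target table against a literal `UNION` of `SELECT id, score` rows. The sketch below is a rough in-memory model of those two steps, useful for reasoning about results without a database. `Row`, `word_trigrams`, `rank` and `union_sql` are hypothetical names. Ties are broken by `rec_id` only to keep the sketch deterministic, since the SQL leaves tie order unspecified. `union_sql` assumes integer ids instead of quoting values through the connection.

```ruby
require 'set'

# One row of a <model>_fuzzy_search_trigrams table.
Row = Struct.new(:subset, :token, :rec_id)

# Padded trigrams of a single word (lowercased, no accent normalization).
def word_trigrams(word)
  padded = " #{word.downcase} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.uniq
end

# Roughly: SELECT rec_id, count(*) ... WHERE token IN (...) [AND subset = ?]
#          GROUP BY rec_id ORDER BY count(*) DESC LIMIT limit
def rank(rows, query_trigrams, limit: 25, subset: nil)
  wanted = Set.new(query_trigrams)
  hits = rows.select { |r| wanted.include?(r.token) && (subset.nil? || r.subset == subset) }
  counts = hits.group_by(&:rec_id).map { |rec_id, matched| [rec_id, matched.size] }
  counts.sort_by { |rec_id, count| [-count, rec_id] }.first(limit)
end

# Literal result table that the target model is INNER JOINed against.
def union_sql(results)
  results.map { |rec_id, score| "SELECT #{Integer(rec_id)} AS id, #{Integer(score)} AS score" }
         .join(" UNION ")
end

# Hypothetical index entries: [rec_id, subset, last_name]
rows = [[1, 2, "meier"], [2, 7, "meyer"], [3, 2, "smith"]].flat_map do |rec_id, subset, name|
  word_trigrams(name).map { |t| Row.new(subset, t, rec_id) }
end

results = rank(rows, word_trigrams("meier"))
puts union_sql(results)
```

The limit is applied inside the ranking step, so any outer scope only ever filters those top results. This is the `fuzzy_search_limit` interaction that the TODO file warns about.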
data/rails_generators/fuzzy_search_table/fuzzy_search_table_generator.rb
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
class FuzzySearchTableGenerator < Rails::Generator::NamedBase
|
2
|
+
attr_reader :target_model_name
|
3
|
+
attr_reader :table_name
|
4
|
+
attr_reader :migration_filename
|
5
|
+
attr_reader :migration_name
|
6
|
+
|
7
|
+
def initialize(runtime_args, runtime_options = {})
|
8
|
+
super
|
9
|
+
@target_model_name = name.classify
|
10
|
+
@table_name = "#{name.underscore}_fuzzy_search_trigrams"
|
11
|
+
@migration_filename = "create_#{name.underscore}_fuzzy_search_table"
|
12
|
+
@migration_name = migration_filename.classify
|
13
|
+
end
|
14
|
+
|
15
|
+
def manifest
|
16
|
+
record do |m|
|
17
|
+
m.migration_template "create_fuzzy_search_table.rb", "db/migrate",
|
18
|
+
:migration_file_name => migration_filename
|
19
|
+
end
|
20
|
+
end
|
21
|
+
end
|
data/rails_generators/fuzzy_search_table/templates/create_fuzzy_search_table.rb
ADDED
@@ -0,0 +1,30 @@
|
|
1
|
+
class <%= migration_name %> < ActiveRecord::Migration
|
2
|
+
def self.up
|
3
|
+
is_mysql = ActiveRecord::Base.connection.adapter_name.downcase == 'mysql'
|
4
|
+
table = '<%= table_name %>'.to_sym
|
5
|
+
|
6
|
+
create_table table, :id => false do |t|
|
7
|
+
t.column :subset, :integer, :limit => 4, :null => false
|
8
|
+
if is_mysql
|
9
|
+
t.column :token, "binary(3)", :null => false
|
10
|
+
else
|
11
|
+
t.column :token, :binary, :limit => 3, :null => false
|
12
|
+
end
|
13
|
+
t.column :rec_id, :integer, :null => false
|
14
|
+
end
|
15
|
+
|
16
|
+
if is_mysql
|
17
|
+
ActiveRecord::Base.connection.execute(
|
18
|
+
"ALTER TABLE #{table.to_s} ENGINE = MyISAM"
|
19
|
+
)
|
20
|
+
end
|
21
|
+
add_index table, [:token, :subset, :rec_id], :name => "full_cover"
|
22
|
+
add_index table, [:rec_id]
|
23
|
+
|
24
|
+
<%= target_model_name %>.rebuild_fuzzy_search_index!
|
25
|
+
end
|
26
|
+
|
27
|
+
def self.down
|
28
|
+
drop_table '<%= table_name %>'.to_sym
|
29
|
+
end
|
30
|
+
end
|
data/test/app_root/config/database.yml
CHANGED
@@ -1,6 +1,11 @@
|
|
1
|
+
#test:
|
2
|
+
# adapter: sqlite3
|
3
|
+
# database: ":memory:"
|
4
|
+
# verbosity: quiet
|
5
|
+
# pool: 5
|
6
|
+
# timeout: 5000
|
7
|
+
|
1
8
|
test:
|
2
|
-
adapter:
|
3
|
-
database: "
|
4
|
-
|
5
|
-
pool: 5
|
6
|
-
timeout: 5000
|
9
|
+
adapter: mysql
|
10
|
+
database: "fuzzy"
|
11
|
+
username: root
|
data/test/app_root/db/migrate/3_create_person_fuzzy_search_table.rb
ADDED
@@ -0,0 +1,30 @@
|
|
1
|
+
class CreatePersonFuzzySearchTable < ActiveRecord::Migration
|
2
|
+
def self.up
|
3
|
+
is_mysql = ActiveRecord::Base.connection.adapter_name.downcase == 'mysql'
|
4
|
+
table = 'person_fuzzy_search_trigrams'.to_sym
|
5
|
+
|
6
|
+
create_table table, :id => false do |t|
|
7
|
+
t.column :subset, :integer, :limit => 4, :null => false
|
8
|
+
if is_mysql
|
9
|
+
t.column :token, "binary(3)", :null => false
|
10
|
+
else
|
11
|
+
t.column :token, :binary, :limit => 3, :null => false
|
12
|
+
end
|
13
|
+
t.column :rec_id, :integer, :null => false
|
14
|
+
end
|
15
|
+
|
16
|
+
if is_mysql
|
17
|
+
ActiveRecord::Base.connection.execute(
|
18
|
+
"ALTER TABLE #{table.to_s} ENGINE = MyISAM"
|
19
|
+
)
|
20
|
+
end
|
21
|
+
add_index table, [:token, :subset, :rec_id], :name => "full_cover"
|
22
|
+
add_index table, [:rec_id]
|
23
|
+
|
24
|
+
Person.rebuild_fuzzy_search_index!
|
25
|
+
end
|
26
|
+
|
27
|
+
def self.down
|
28
|
+
drop_table 'person_fuzzy_search_trigrams'.to_sym
|
29
|
+
end
|
30
|
+
end
|
data/test/app_root/db/migrate/4_create_email_fuzzy_search_table.rb
ADDED
@@ -0,0 +1,30 @@
|
|
1
|
+
class CreateEmailFuzzySearchTable < ActiveRecord::Migration
|
2
|
+
def self.up
|
3
|
+
is_mysql = ActiveRecord::Base.connection.adapter_name.downcase == 'mysql'
|
4
|
+
table = 'email_fuzzy_search_trigrams'.to_sym
|
5
|
+
|
6
|
+
create_table table, :id => false do |t|
|
7
|
+
t.column :subset, :integer, :limit => 4, :null => false
|
8
|
+
if is_mysql
|
9
|
+
t.column :token, "binary(3)", :null => false
|
10
|
+
else
|
11
|
+
t.column :token, :binary, :limit => 3, :null => false
|
12
|
+
end
|
13
|
+
t.column :rec_id, :integer, :null => false
|
14
|
+
end
|
15
|
+
|
16
|
+
if is_mysql
|
17
|
+
ActiveRecord::Base.connection.execute(
|
18
|
+
"ALTER TABLE #{table.to_s} ENGINE = MyISAM"
|
19
|
+
)
|
20
|
+
end
|
21
|
+
add_index table, [:token, :subset, :rec_id], :name => "full_cover"
|
22
|
+
add_index table, [:rec_id]
|
23
|
+
|
24
|
+
Email.rebuild_fuzzy_search_index!
|
25
|
+
end
|
26
|
+
|
27
|
+
def self.down
|
28
|
+
drop_table 'email_fuzzy_search_trigrams'.to_sym
|
29
|
+
end
|
30
|
+
end
|
data/test/factories.rb
CHANGED
data/test/test_helper.rb
CHANGED
@@ -18,6 +18,7 @@ end
|
|
18
18
|
|
19
19
|
require 'rubygems'
|
20
20
|
require 'minitest/autorun'
|
21
|
+
require 'minitest/benchmark' if ENV["BENCH"]
|
21
22
|
require 'redgreen'
|
22
23
|
require 'pp'
|
23
24
|
|
@@ -61,9 +62,21 @@ module MiniTest
|
|
61
62
|
end
|
62
63
|
|
63
64
|
require 'factories'
|
65
|
+
require 'faker'
|
64
66
|
MiniTest::Unit::TestCase.send(:include, Factory::Syntax::Methods)
|
65
67
|
|
66
68
|
MiniTest::Unit::TestCase.add_setup_hook do
|
67
69
|
ActiveRecord::Migration.verbose = false
|
68
70
|
ActiveRecord::Migrator.migrate("#{Rails.root}/db/migrate") # Migrations in the test app
|
69
71
|
end
|
72
|
+
|
73
|
+
# From http://stackoverflow.com/questions/1090801/1091106
|
74
|
+
ActiveRecord::ConnectionAdapters::AbstractAdapter.class_eval do
|
75
|
+
attr_reader :last_query
|
76
|
+
|
77
|
+
def log_with_last_query(sql, name, &block)
|
78
|
+
@last_query = [sql, name]
|
79
|
+
log_without_last_query(sql, name, &block)
|
80
|
+
end
|
81
|
+
alias_method_chain :log, :last_query
|
82
|
+
end
|
data/test/unit/fuzzy_search_test.rb
CHANGED
@@ -2,7 +2,7 @@ require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
|
|
2
2
|
|
3
3
|
describe "fuzzy_search" do
|
4
4
|
before do
|
5
|
-
create(:person, :last_name => "meier", :first_name => "kristian")
|
5
|
+
@kris = create(:person, :last_name => "meier", :first_name => "kristian")
|
6
6
|
create(:person, :last_name => "meyer", :first_name => "christian", :hobby => "Bicycling")
|
7
7
|
create(:person, :last_name => "mayr", :first_name => "Chris")
|
8
8
|
create(:person, :last_name => "maier", :first_name => "christoph", :hobby => "Bicycling")
|
@@ -18,12 +18,13 @@ describe "fuzzy_search" do
|
|
18
18
|
after do
|
19
19
|
Person.delete_all
|
20
20
|
Email.delete_all
|
21
|
-
FuzzySearchTrigram.delete_all
|
21
|
+
Person::FuzzySearchTrigram.delete_all
|
22
|
+
Email::FuzzySearchTrigram.delete_all
|
22
23
|
end
|
23
24
|
|
24
25
|
it "can search for records with similar strings to a query" do
|
25
|
-
|
26
|
-
|
26
|
+
refute_empty Person.fuzzy_search("maier")
|
27
|
+
refute_empty Person.fuzzy_search("ather")
|
27
28
|
end
|
28
29
|
|
29
30
|
it "can search on multiple columns" do
|
@@ -32,14 +33,10 @@ describe "fuzzy_search" do
|
|
32
33
|
assert_equal "meier", result[0].last_name
|
33
34
|
end
|
34
35
|
|
35
|
-
it "sorts results
|
36
|
+
it "sorts results with the best match first" do
|
36
37
|
result = Person.fuzzy_search("kristian meier")
|
37
|
-
assert_equal
|
38
|
-
|
39
|
-
(1..3).each do |idx|
|
40
|
-
assert result[idx].fuzzy_score <= prior
|
41
|
-
prior = result[idx].fuzzy_score
|
42
|
-
end
|
38
|
+
assert_equal @kris, result.first
|
39
|
+
assert result.size > 1
|
43
40
|
end
|
44
41
|
|
45
42
|
it "returns an empty result set when given an empty query string" do
|
@@ -48,9 +45,16 @@ describe "fuzzy_search" do
|
|
48
45
|
end
|
49
46
|
|
50
47
|
it "updates the search index automatically when a new record is saved" do
|
51
|
-
assert_empty Person.fuzzy_search("
|
48
|
+
assert_empty Person.fuzzy_search("David")
|
52
49
|
create(:person, :first_name => "David", :last_name => "Simon")
|
53
|
-
refute_empty Person.fuzzy_search("
|
50
|
+
refute_empty Person.fuzzy_search("David")
|
51
|
+
end
|
52
|
+
|
53
|
+
it "only updates the appropriate trigram table" do
|
54
|
+
original_state = Email::FuzzySearchTrigram.all.to_a
|
55
|
+
create(:person, :first_name => "David", :last_name => "Simon")
|
56
|
+
new_state = Email::FuzzySearchTrigram.all.to_a
|
57
|
+
assert_equal original_state, new_state
|
54
58
|
end
|
55
59
|
|
56
60
|
it "updates the search index automatically when a record is updated" do
|
@@ -80,12 +84,8 @@ describe "fuzzy_search" do
|
|
80
84
|
refute_empty Email.fuzzy_search("oscar")
|
81
85
|
end
|
82
86
|
|
83
|
-
it "can normalize strings" do
|
84
|
-
assert_equal("aaaaaa", Person.normalize("ÀÁÂÃÄÅ"))
|
85
|
-
end
|
86
|
-
|
87
87
|
it "normalizes strings before searching on them" do
|
88
|
-
assert_equal 1, Person.fuzzy_search("
|
88
|
+
assert_equal 1, Person.fuzzy_search("Müell").size
|
89
89
|
assert_equal 1, Email.fuzzy_search("öscar").size
|
90
90
|
end
|
91
91
|
|
@@ -95,12 +95,21 @@ describe "fuzzy_search" do
|
|
95
95
|
|
96
96
|
it "can search through a scope" do
|
97
97
|
scope = Person.scoped({:conditions => {:hobby => "Bicycling"}})
|
98
|
-
|
99
|
-
|
98
|
+
full = Person.fuzzy_search("chris")
|
99
|
+
subset = scope.fuzzy_search("chris")
|
100
|
+
assert full.size > subset.size
|
101
|
+
assert subset.size > 0
|
102
|
+
end
|
103
|
+
|
104
|
+
it "can use the subset field to narrow the search range" do
|
105
|
+
full = Person.fuzzy_search("chris")
|
106
|
+
subset = Person.fuzzy_search("chris", :subset => {:favorite_number => 2})
|
107
|
+
assert full.size > subset.size
|
108
|
+
assert subset.size > 0
|
100
109
|
end
|
101
110
|
|
102
111
|
it "can rebuild the search index from scratch" do
|
103
|
-
FuzzySearchTrigram.delete_all
|
112
|
+
Person::FuzzySearchTrigram.delete_all
|
104
113
|
assert_empty Person.fuzzy_search("chris")
|
105
114
|
Person.rebuild_fuzzy_search_index!
|
106
115
|
refute_empty Person.fuzzy_search("chris")
|
data/test/unit/performance_test.rb
ADDED
@@ -0,0 +1,51 @@
|
|
1
|
+
require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
|
2
|
+
|
3
|
+
describe "fuzzy_search" do
|
4
|
+
if ENV["BENCH"]
|
5
|
+
before do
|
6
|
+
Person.set_table_name "preloaded_people"
|
7
|
+
Person::FuzzySearchTrigram.set_table_name "preloaded_person_fuzzy_search_trigrams"
|
8
|
+
|
9
|
+
# Force Person to reload its fuzzy type id from preloaded_types
|
10
|
+
Person.send(:write_inheritable_attribute, :fuzzy_search_cached_type_id, nil)
|
11
|
+
end
|
12
|
+
|
13
|
+
after do
|
14
|
+
Person.set_table_name "people"
|
15
|
+
Person::FuzzySearchTrigram.set_table_name "person_fuzzy_search_trigrams"
|
16
|
+
end
|
17
|
+
|
18
|
+
bench_range do
|
19
|
+
[1, 10, 50]
|
20
|
+
end
|
21
|
+
|
22
|
+
bench_performance_linear "regular queries", 0.9 do |n|
|
23
|
+
c = 0
|
24
|
+
n.times do
|
25
|
+
result = Person.scoped(:limit => 20).fuzzy_search(Faker::Name.last_name)
|
26
|
+
c += result.size
|
27
|
+
end
|
28
|
+
if n >= 30
|
29
|
+
rpq = c/(n.to_f)
|
30
|
+
if rpq < 2
|
31
|
+
raise "Sanity check failure, average results per query: #{rpq}"
|
32
|
+
end
|
33
|
+
end
|
34
|
+
end
|
35
|
+
|
36
|
+
bench_performance_linear "subset queries", 0.9 do |n|
|
37
|
+
c = 0
|
38
|
+
n.times do
|
39
|
+
result = Person.scoped(:limit => 20).fuzzy_search(Faker::Name.last_name,
|
40
|
+
:subset => {:favorite_number => rand(100)+1})
|
41
|
+
c += result.size
|
42
|
+
end
|
43
|
+
if n >= 30
|
44
|
+
rpq = c/(n.to_f)
|
45
|
+
if rpq < 0.1
|
46
|
+
raise "Sanity check failure, average results per query: #{rpq}"
|
47
|
+
end
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
51
|
+
end
|
data/test/unit/split_trigrams_test.rb
ADDED
@@ -0,0 +1,18 @@
|
|
1
|
+
require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
|
2
|
+
require 'set'
|
3
|
+
|
4
|
+
describe "fuzzy search trigram splitter" do
|
5
|
+
it "can split a string into trigrams with emphasized edges" do
|
6
|
+
t = [" da", "dav", "avi", "vid", "id "]
|
7
|
+
assert_equal Set.new(t), Set.new(FuzzySearch::split_trigrams("david"))
|
8
|
+
end
|
9
|
+
|
10
|
+
it "can normalize strings" do
|
11
|
+
assert_equal([" a "], FuzzySearch::split_trigrams("À"))
|
12
|
+
end
|
13
|
+
|
14
|
+
it "can handle an array of strings" do
|
15
|
+
t = [" x ", " y ", " zi", "zig", "ig "]
|
16
|
+
assert_equal Set.new(t), Set.new(FuzzySearch::split_trigrams(["x", "y", "zig"]))
|
17
|
+
end
|
18
|
+
end
|
metadata
CHANGED
@@ -1,12 +1,12 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: fuzzy_search
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
hash:
|
4
|
+
hash: 3
|
5
5
|
prerelease:
|
6
6
|
segments:
|
7
7
|
- 0
|
8
|
-
-
|
9
|
-
version: "0.
|
8
|
+
- 4
|
9
|
+
version: "0.4"
|
10
10
|
platform: ruby
|
11
11
|
authors:
|
12
12
|
- Kristian Meier
|
@@ -15,7 +15,7 @@ autorequire:
|
|
15
15
|
bindir: bin
|
16
16
|
cert_chain: []
|
17
17
|
|
18
|
-
date: 2011-10-
|
18
|
+
date: 2011-10-27 00:00:00 Z
|
19
19
|
dependencies:
|
20
20
|
- !ruby/object:Gem::Dependency
|
21
21
|
name: ar-extensions
|
@@ -45,12 +45,15 @@ files:
|
|
45
45
|
- MIT-LICENSE
|
46
46
|
- README.md
|
47
47
|
- Rakefile
|
48
|
+
- TODO
|
49
|
+
- gen_fake_data.rb
|
48
50
|
- lib/fuzzy_model_extensions.rb
|
49
51
|
- lib/fuzzy_search.rb
|
50
|
-
- lib/fuzzy_search_trigram.rb
|
51
52
|
- lib/fuzzy_search_ver.rb
|
52
|
-
-
|
53
|
-
-
|
53
|
+
- lib/split_trigrams.rb
|
54
|
+
- lib/trigram_model_extensions.rb
|
55
|
+
- rails_generators/fuzzy_search_table/fuzzy_search_table_generator.rb
|
56
|
+
- rails_generators/fuzzy_search_table/templates/create_fuzzy_search_table.rb
|
54
57
|
- test/app_root/app/models/email.rb
|
55
58
|
- test/app_root/app/models/person.rb
|
56
59
|
- test/app_root/config/boot.rb
|
@@ -58,13 +61,16 @@ files:
|
|
58
61
|
- test/app_root/config/environment.rb
|
59
62
|
- test/app_root/config/environments/test.rb
|
60
63
|
- test/app_root/config/routes.rb
|
61
|
-
- test/app_root/db/migrate/
|
62
|
-
- test/app_root/db/migrate/
|
64
|
+
- test/app_root/db/migrate/2_create_tables.rb
|
65
|
+
- test/app_root/db/migrate/3_create_person_fuzzy_search_table.rb
|
66
|
+
- test/app_root/db/migrate/4_create_email_fuzzy_search_table.rb
|
63
67
|
- test/app_root/vendor/plugins/fuzzy_search/init.rb
|
64
68
|
- test/factories.rb
|
65
69
|
- test/test.watchr
|
66
70
|
- test/test_helper.rb
|
67
71
|
- test/unit/fuzzy_search_test.rb
|
72
|
+
- test/unit/performance_test.rb
|
73
|
+
- test/unit/split_trigrams_test.rb
|
68
74
|
homepage: http://github.com/DavidMikeSimon/fuzzy_search
|
69
75
|
licenses: []
|
70
76
|
|
data/lib/fuzzy_search_trigram.rb
DELETED
data/rails_generators/fuzzy_search_setup/templates/create_fuzzy_search_trigrams.rb
DELETED
@@ -1,15 +0,0 @@
|
|
1
|
-
class CreateFuzzySearchTrigrams < ActiveRecord::Migration
|
2
|
-
def self.up
|
3
|
-
create_table :fuzzy_search_trigrams, :id => false do |t|
|
4
|
-
t.column :token, :string, :limit => 3
|
5
|
-
t.column :rec_type, :string
|
6
|
-
t.column :rec_id, :integer
|
7
|
-
end
|
8
|
-
|
9
|
-
add_index :fuzzy_search_trigrams, [:rec_type, :token]
|
10
|
-
end
|
11
|
-
|
12
|
-
def self.down
|
13
|
-
drop_table :fuzzy_search_trigrams
|
14
|
-
end
|
15
|
-
end
|
data/test/app_root/db/migrate/20111013132330_create_fuzzy_search_trigrams.rb
DELETED
@@ -1,15 +0,0 @@
|
|
1
|
-
class CreateFuzzySearchTrigrams < ActiveRecord::Migration
|
2
|
-
def self.up
|
3
|
-
create_table :fuzzy_search_trigrams, :id => false do |t|
|
4
|
-
t.column :token, :string, :limit => 3
|
5
|
-
t.column :rec_type, :string
|
6
|
-
t.column :rec_id, :integer
|
7
|
-
end
|
8
|
-
|
9
|
-
add_index :fuzzy_search_trigrams, [:rec_type, :token]
|
10
|
-
end
|
11
|
-
|
12
|
-
def self.down
|
13
|
-
drop_table :fuzzy_search_trigrams
|
14
|
-
end
|
15
|
-
end
|