redis-textsearch 0.1.0
- data/README.rdoc +93 -0
- data/lib/redis/text_search/collection.rb +146 -0
- data/lib/redis/text_search.rb +289 -0
- data/spec/redis_text_search_core_spec.rb +143 -0
- data/spec/spec_helper.rb +5 -0
- metadata +69 -0
data/README.rdoc
ADDED
@@ -0,0 +1,93 @@
+= Redis::TextSearch - Use Redis to perform text search from any type of class
+
+This gem implements an extremely fast text search using Redis, based on the patterns from
+James Gray's {lists and sets in Redis post}[http://blog.grayproductions.net/articles/lists_and_sets_in_redis]
+as well as Antirez's {text search gist}[http://gist.github.com/120067]. You can use it with any type of class,
+whether it be ActiveRecord, DataMapper, MongoRecord, or even a class having nothing to do with an ORM.
+
+This is not intended to be the most full-featured text search available. For that, look into
+{Sphinx}[http://www.sphinxsearch.com/], {Solr}[http://lucene.apache.org/solr/], or
+{other alternatives}[http://en.wikipedia.org/wiki/Full_text_search#Open_Source_projects].
+This gem is designed to (a) be extremely fast and (b) handle any (or multiple) data stores.
+
+The only requirement this gem has is that your class must provide an +id+ instance method
+which returns the ID for that instance. ActiveRecord, DataMapper, and MongoRecord all have +id+ methods which
+are known to be suitable. Since "ID" can be any data type, you can even write an +id+ method of your own
+that just returns a string, or an MD5 of a filename, or something else unique.
+
+== Installation
+
+  gem install gemcutter
+  gem tumble
+  gem install redis-textsearch
+
+== Initialization
+
+If on Rails, config/initializers/redis.rb is a good place for this:
+
+  require 'redis'
+  require 'redis/text_search'
+  Redis::TextSearch.redis = Redis.new(:host => 'localhost', :port => 6379)
+
+== Example
+
+Model class:
+
+  class Post < ActiveRecord::Base
+    include Redis::TextSearch
+
+    text_index :title, :minlength => 2
+    text_index :tags, :exact => true
+
+    # Using AR callback (you can call update_text_indexes anywhere on your instance)
+    after_save do |r|
+      r.update_text_indexes
+    end
+    after_destroy do |r|
+      r.delete_text_indexes
+    end
+  end
+
+Create posts:
+
+  Post.create(:title => "All About Bacon", :tags => "chef nontechnical")
+  Post.create(:title => "All About Bacon - Part 2", :tags => "chef nontechnical")
+  Post.create(:title => "Homemade Belgian Waffles", :tags => "chef nontechnical")
+  Post.create(:title => "Using Redis with Ruby", :tags => "technical ruby redis")
+  Post.create(:title => "Installing Redis on Linux", :tags => "technical redis linux")
+  Post.create(:title => "Chef Deployment Recipes", :tags => "ruby howto chef")
+
+Then search for them:
+
+  @posts = Post.text_search('bacon') # 2 results
+  @posts = Post.text_search('chef') # 4 results (tags and title)
+  @posts = Post.text_search('technical', 'ruby') # AND search (1 result)
+  @posts = Post.text_search('chef', :fields => :tags) # 4 results (only search tags)
+
+You can pass options through to the +find+ method:
+
+  @posts = Post.text_search('redis', :order => 'updated_at desc')
+  @posts = Post.text_search('redis', :select => 'id,title', :limit => 50)
+
+And do pagination (adapted from +will_paginate+):
+
+  @posts = Post.text_search('redis', :page => 1, :per_page => 10)
+  @posts = Post.text_search('redis', :page => 1) # uses class.per_page like will_paginate
+
+You can also specify specific fields to search as a hash:
+
+  @posts = Post.text_search(:tags => 'chef') # 4 results
+  @posts = Post.text_search(:tags => ['technical','ruby']) # AND (1 result)
+  @posts = Post.text_search(:tags => 'redis', :title => 'linux') # 1 result
+  @posts = Post.text_search(:title => 'chef') # 1 result
+
+Note that if you need to pass options to +find+ AND search specific fields, the first
+hash must be wrapped in braces:
+
+  @posts = Post.text_search({:tags => 'chef', :title => 'deployment'},
+                            :order => 'updated_at desc')
+
+== Author
+
+Copyright (c) 2009 {Nate Wiger}[http://nate.wiger.org]. All Rights Reserved.
+Released under the {Artistic License}[http://www.opensource.org/licenses/artistic-license-2.0.php].
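The README's result counts follow from two rules stated in the library source: string arguments are ANDed within a field and ORed across fields, while the hash form is one AND across every field/value pair. The sketch below models that with plain Ruby `Set` objects instead of Redis. The helper names (`build_index`, `ids_for`, `search_terms`, `search_fields`) and the numeric ids are assumptions made for illustration, and the exclude list is not modelled.

```ruby
require 'set'

# The six posts from the README example, keyed by a made-up numeric id.
POSTS = {
  1 => { title: "All About Bacon",           tags: "chef nontechnical" },
  2 => { title: "All About Bacon - Part 2",  tags: "chef nontechnical" },
  3 => { title: "Homemade Belgian Waffles",  tags: "chef nontechnical" },
  4 => { title: "Using Redis with Ruby",     tags: "technical ruby redis" },
  5 => { title: "Installing Redis on Linux", tags: "technical redis linux" },
  6 => { title: "Chef Deployment Recipes",   tags: "ruby howto chef" }
}

# Hypothetical in-memory stand-in for the Redis sets: "field:term" => Set of ids.
# Titles are indexed by word prefix (two characters or more), tags by exact word.
def build_index(posts)
  index = {}
  posts.each do |id, post|
    words = post[:title].downcase.gsub(/[^\w\s]+/, '').split(/\s+/)
    words.each do |word|
      (2..word.length).each { |n| (index["title:#{word[0, n]}"] ||= Set.new) << id }
    end
    post[:tags].split(/\s+/).each { |tag| (index["tags:#{tag}"] ||= Set.new) << id }
  end
  index
end

# Look up one set; an unknown term yields an empty set.
def ids_for(index, field, term)
  index.fetch("#{field}:#{term.downcase}", Set.new)
end

# String form: AND the terms within each field, then OR across the fields.
def search_terms(index, terms, fields = [:title, :tags])
  fields.map { |f| terms.map { |t| ids_for(index, f, t) }.reduce(:&) }.reduce(:|)
end

# Hash form: a single AND across every field/value pair.
def search_fields(index, criteria)
  criteria.flat_map { |f, vals| Array(vals).map { |v| ids_for(index, f, v) } }.reduce(:&)
end

INDEX = build_index(POSTS)
p search_terms(INDEX, ['bacon']).sort                               # [1, 2]
p search_terms(INDEX, ['chef']).sort                                # [1, 2, 3, 6]
p search_terms(INDEX, ['technical', 'ruby']).sort                   # [4]
p search_fields(INDEX, { tags: 'redis', title: 'linux' }).sort      # [5]
```

Because tags are indexed by exact word in this model, `'technical'` does not match the `nontechnical` tag, which is why the AND search for `'technical'` and `'ruby'` returns a single post.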
data/lib/redis/text_search/collection.rb
ADDED
@@ -0,0 +1,146 @@
+module Redis::TextSearch
+  # = Invalid page number error
+  # This is an ArgumentError raised in case a page was requested that is either
+  # zero or a negative number. You should decide how to deal with such errors in
+  # the controller.
+  #
+  # If you're using Rails 2, then this error will automatically get handled like
+  # 404 Not Found. The hook is in "will_paginate.rb":
+  #
+  #   ActionController::Base.rescue_responses['Redis::TextSearch::InvalidPage'] = :not_found
+  #
+  # If you don't like this, use your preferred method of rescuing exceptions in
+  # public from your controllers to handle this differently. The +rescue_from+
+  # method is a nice addition to Rails 2.
+  #
+  # This error is *not* raised when a page further than the last page is
+  # requested. Use <tt>Redis::TextSearch::Collection#out_of_bounds?</tt> method to
+  # check for those cases and manually deal with them as you see fit.
+  class InvalidPage < ArgumentError
+    def initialize(page, page_num)
+      super "#{page.inspect} given as value, which translates to '#{page_num}' as page number"
+    end
+  end
+
+  # = The key to pagination
+  # Arrays returned from paginating finds are, in fact, instances of this little
+  # class. You may think of Redis::TextSearch::Collection as an ordinary array with
+  # some extra properties. Those properties are used by view helpers to generate
+  # correct page links.
+  #
+  # Redis::TextSearch::Collection also assists in rolling out your own pagination
+  # solutions: see +create+.
+  #
+  # If you are writing a library that provides a collection which you would like
+  # to conform to this API, you don't have to copy these methods over; simply
+  # make your plugin/gem dependent on the "mislav-will_paginate" gem:
+  #
+  #   gem 'mislav-will_paginate'
+  #   require 'will_paginate/collection'
+  #
+  #   # Redis::TextSearch::Collection is now available for use
+  class Collection < Array
+    attr_reader :current_page, :per_page, :total_entries, :total_pages
+
+    # Arguments to the constructor are the current page number, per-page limit
+    # and the total number of entries. The last argument is optional because it
+    # is best to do lazy counting; in other words, count *conditionally* after
+    # populating the collection using the +replace+ method.
+    def initialize(page, per_page, total = nil)
+      @current_page = page.to_i
+      raise InvalidPage.new(page, @current_page) if @current_page < 1
+      @per_page = per_page.to_i
+      raise ArgumentError, "`per_page` setting cannot be less than 1 (#{@per_page} given)" if @per_page < 1
+
+      self.total_entries = total if total
+    end
+
+    # Just like +new+, but yields the object after instantiation and returns it
+    # afterwards. This is very useful for manual pagination:
+    #
+    #   @entries = Redis::TextSearch::Collection.create(1, 10) do |pager|
+    #     result = Post.find(:all, :limit => pager.per_page, :offset => pager.offset)
+    #     # inject the result array into the paginated collection:
+    #     pager.replace(result)
+    #
+    #     unless pager.total_entries
+    #       # the pager didn't manage to guess the total count, do it manually
+    #       pager.total_entries = Post.count
+    #     end
+    #   end
+    #
+    # The possibilities with this are endless. For another example, here is how
+    # Redis::TextSearch used to define pagination for Array instances:
+    #
+    #   Array.class_eval do
+    #     def paginate(page = 1, per_page = 15)
+    #       Redis::TextSearch::Collection.create(page, per_page, size) do |pager|
+    #         pager.replace self[pager.offset, pager.per_page].to_a
+    #       end
+    #     end
+    #   end
+    #
+    # The Array#paginate API has since then changed, but this still serves as a
+    # fine example of Redis::TextSearch::Collection usage.
+    def self.create(page, per_page, total = nil)
+      pager = new(page, per_page, total)
+      yield pager
+      pager
+    end
+
+    # Helper method that is true when someone tries to fetch a page with a
+    # larger number than the last page. Can be used in combination with flashes
+    # and redirecting.
+    def out_of_bounds?
+      current_page > total_pages
+    end
+
+    # Current offset of the paginated collection. If we're on the first page,
+    # it is always 0. If we're on the 2nd page and there are 30 entries per page,
+    # the offset is 30. This property is useful if you want to render ordinals
+    # side by side with records in the view: simply start with offset + 1.
+    def offset
+      (current_page - 1) * per_page
+    end
+
+    # current_page - 1 or nil if there is no previous page
+    def previous_page
+      current_page > 1 ? (current_page - 1) : nil
+    end
+
+    # current_page + 1 or nil if there is no next page
+    def next_page
+      current_page < total_pages ? (current_page + 1) : nil
+    end
+
+    # sets the <tt>total_entries</tt> property and calculates <tt>total_pages</tt>
+    def total_entries=(number)
+      @total_entries = number.to_i
+      @total_pages = (@total_entries / per_page.to_f).ceil
+    end
+
+    # This is a magic wrapper for the original Array#replace method. It serves
+    # for populating the paginated collection after initialization.
+    #
+    # Why magic? Because it tries to guess the total number of entries judging
+    # by the size of given array. If it is shorter than +per_page+ limit, then we
+    # know we're on the last page. This trick is very useful for avoiding
+    # unnecessary hits to the database to do the counting after we fetched the
+    # data for the current page.
+    #
+    # However, after using +replace+ you should always test the value of
+    # +total_entries+ and set it to a proper value if it's +nil+. See the example
+    # in +create+.
+    def replace(array)
+      result = super
+
+      # The collection is shorter than page limit? Rejoice, because
+      # then we know that we are on the last page!
+      if total_entries.nil? and length < per_page and (current_page == 1 or length > 0)
+        self.total_entries = offset + length
+      end
+
+      result
+    end
+  end
+end
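The pagination arithmetic in `Collection` is small enough to check by hand. The sketch below restates it outside the class: `PageMath` and `guess_total` are hypothetical names used only for this illustration, and the struct is a stand-in for the calculations, not a replacement for `Collection`.

```ruby
# Hypothetical stand-in for the arithmetic in Collection; not the class itself.
PageMath = Struct.new(:page, :per_page) do
  # Zero-based index of the first record on this page.
  def offset
    (page - 1) * per_page
  end

  # Number of pages needed to hold +total+ entries.
  def total_pages(total)
    (total / per_page.to_f).ceil
  end

  # Mirrors the guess made in Collection#replace: a page that came back
  # shorter than per_page must be the last one, so the total is known.
  # Returns nil when the total cannot be inferred from the page alone.
  def guess_total(fetched)
    return nil unless fetched < per_page && (page == 1 || fetched > 0)
    offset + fetched
  end
end

second = PageMath.new(2, 30)
puts second.offset                      # 30
puts second.total_pages(65)             # 3
p PageMath.new(3, 10).guess_total(4)    # 24
p PageMath.new(2, 10).guess_total(10)   # nil
puts PageMath.new(15, 3).offset         # 42
```

A full page (10 of 10) gives no information about what follows, which is why the guess returns `nil` there and the caller is expected to count explicitly.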
data/lib/redis/text_search.rb
ADDED
@@ -0,0 +1,289 @@
+# Redis::TextSearch - Use Redis to add text search to your app.
+class Redis
+  #
+  # Redis::TextSearch enables high-performance text search in your app using
+  # Redis. You can perform text search on any type of data store or ORM.
+  #
+  module TextSearch
+    class NoFinderMethod < StandardError; end
+    class BadTextIndex < StandardError; end
+
+    DEFAULT_EXCLUDE_LIST = %w(a an and as at but by for in into of on onto to the)
+
+    dir = File.expand_path(__FILE__.sub(/\.rb$/,''))
+    autoload :Collection, File.join(dir, 'collection')
+
+    class << self
+      def redis=(conn) @redis = conn end
+      def redis
+        @redis ||= $redis || raise(NotConnected, "Redis::TextSearch.redis not set to a Redis.new connection")
+      end
+
+      def included(klass)
+        klass.instance_variable_set('@redis', @redis)
+        klass.instance_variable_set('@text_indexes', {})
+        klass.instance_variable_set('@text_search_find', nil)
+        klass.instance_variable_set('@text_index_exclude_list', DEFAULT_EXCLUDE_LIST)
+        klass.send :include, InstanceMethods
+        klass.extend ClassMethods
+        klass.guess_text_search_find
+        class << klass
+          define_method(:per_page) { 30 } unless respond_to?(:per_page)
+        end
+      end
+    end
+
+    # These class methods are added to the class when you include Redis::TextSearch.
+    module ClassMethods
+      attr_accessor :redis
+      attr_reader :text_indexes
+
+      # Words to exclude from text indexing. By default, includes common
+      # English articles, conjunctions, and prepositions like "a", "an", "the", "and", "but", etc.
+      # This is an accessor to an array, so you can use += or << to add to it.
+      # See the constant +DEFAULT_EXCLUDE_LIST+ for the default list.
+      attr_accessor :text_index_exclude_list
+
+      # Set the Redis prefix to use. Defaults to model_name
+      def prefix=(prefix) @prefix = prefix end
+      def prefix #:nodoc:
+        @prefix ||= self.name.to_s.
+          sub(%r{(.*::)}, '').
+          gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
+          gsub(/([a-z\d])([A-Z])/,'\1_\2').
+          downcase
+      end
+
+      def field_key(name, id) #:nodoc:
+        "#{prefix}:#{id}:#{name}"
+      end
+
+      # This is called when the module is included, and uses reflection to guess
+      # how to retrieve records. You can override it by explicitly defining a
+      # +text_search_find+ class method that takes an array of IDs as an argument.
+      def guess_text_search_find
+        if defined?(ActiveRecord::Base) and ancestors.include?(ActiveRecord::Base)
+          instance_eval <<-EndMethod
+            def text_search_find(ids, options)
+              all(options.merge(:conditions => {:#{primary_key} => ids}))
+            end
+          EndMethod
+        elsif defined?(MongoRecord::Base) and ancestors.include?(MongoRecord::Base)
+          instance_eval <<-EndMethod
+            def text_search_find(ids, options)
+              all(options.merge(:conditions => {:#{primary_key} => ids}))
+            end
+          EndMethod
+        elsif respond_to?(:get)
+          # DataMapper::Resource is an include, so is_a? won't work
+          instance_eval <<-EndMethod
+            def text_search_find(ids, options)
+              get(ids, options)
+            end
+          EndMethod
+        end
+      end
+
+      # Define fields to be indexed for text search. To update the index, you must call
+      # update_text_indexes after record save or at the appropriate point.
+      def text_index(*args)
+        options = args.last.is_a?(Hash) ? args.pop : {}
+        options[:minlength] ||= 1
+        options[:split] ||= /\s+/
+        raise ArgumentError, "Must specify fields to index to #{self.name}.text_index" unless args.length > 0
+        args.each do |name|
+          @text_indexes[name.to_sym] = options.merge(:key => field_key(name, 'text_index'))
+        end
+      end
+
+      # Perform text search and return results from database. Options:
+      #
+      #   'string', 'string2'
+      #   :fields
+      #   :page
+      #   :per_page
+      def text_search(*args)
+        options = args.length > 1 && args.last.is_a?(Hash) ? args.pop : {}
+        fields = Array(options.delete(:fields) || @text_indexes.keys)
+        finder = options.delete(:finder)
+        unless finder
+          unless respond_to?(:text_search_find)
+            raise NoFinderMethod, "Could not detect how to find records; you must def text_search_find()"
+          end
+          finder = :text_search_find
+        end
+
+        #
+        # Assemble set names for our intersection.
+        # Accept two ways of doing search: either {:field => ['value','value'], :field => 'value'},
+        # or 'value','value', :fields => [:field, :field]. The first is an AND, the latter an OR.
+        #
+        ids = []
+        if args.empty?
+          raise ArgumentError, "Must specify search string(s) to #{self.name}.text_search"
+        elsif args.first.is_a?(Hash)
+          sets = []
+          args.first.each do |f,v|
+            sets += text_search_sets_for(f,v)
+          end
+          # Execute single intersection (AND)
+          ids = redis.set_intersect(*sets)
+        else
+          fields.each do |f|
+            sets = text_search_sets_for(f,args)
+            # Execute intersection per loop (OR)
+            ids += redis.set_intersect(*sets)
+          end
+        end
+
+        # Calculate pagination if applicable. Presence of :page indicates we want pagination.
+        # Adapted from will_paginate/finder.rb
+        if options.has_key?(:page)
+          page = options.delete(:page) || 1
+          per_page = options.delete(:per_page) || self.per_page
+          total = ids.length
+
+          Redis::TextSearch::Collection.create(page, per_page, total) do |pager|
+            # Convert page/per_page to limit/offset
+            options.merge!(:offset => pager.offset, :limit => pager.per_page)
+            pager.replace(send(finder, ids, options){ |*a| yield(*a) if block_given? })
+          end
+        else
+          # Execute finder directly
+          send(finder, ids, options)
+        end
+      end
+
+      # Filter and return self. Chainable.
+      def text_filter(field)
+        raise NotImplementedError
+      end
+
+      # Delete all text indexes for the given id.
+      def delete_text_indexes(id, *fields)
+        fields = @text_indexes.keys if fields.empty?
+        fields.each do |field|
+          redis.pipelined do |pipe|
+            text_indexes_for(id, field).each do |key|
+              pipe.srem(key, id)
+            end
+            pipe.del field_key("#{field}_indexes", id)
+          end
+        end
+      end
+
+      def text_indexes_for(id, field) #:nodoc:
+        (redis.get(field_key("#{field}_indexes", id)) || '').split(';')
+      end
+
+      def text_search_sets_for(field, values)
+        key = @text_indexes[field][:key]
+        Array(values).collect do |val|
+          str = val.downcase.gsub(/[^\w\s]+/,'').gsub(/\s+/, '.') # can't have " " in Redis cmd string
+          "#{key}:#{str}"
+        end
+      end
+    end
+
+    module InstanceMethods #:nodoc:
+      def redis() self.class.redis end
+      def field_key(name) #:nodoc:
+        self.class.field_key(name, id)
+      end
+
+      # Retrieve the options for the given field
+      def text_index_options_for(field)
+        self.class.text_indexes[field] ||
+          raise(BadTextIndex, "No such text index #{field} in #{self.class.name}")
+      end
+
+      # Retrieve the reverse-mapping of text indexes for a given field. Designed
+      # as a utility method but maybe you will find it useful.
+      def text_indexes_for(field)
+        self.class.text_indexes_for(id, field)
+      end
+
+      # Retrieve all text indexes
+      def text_indexes
+        fields = self.class.text_indexes.keys
+        fields.collect{|f| text_indexes_for(f)}.flatten
+      end
+
+      # Update all text indexes for the given object. Should be used in an +after_save+ hook
+      # or other applicable area, for example r.update_text_indexes. Can pass an array of
+      # field names to restrict updates just to those fields.
+      def update_text_indexes(*fields)
+        fields = self.class.text_indexes.keys if fields.empty?
+        fields.each do |field|
+          options = self.class.text_indexes[field]
+          value = self.send(field)
+          return false if value.length < options[:minlength] # too short to index
+          indexes = []
+
+          # If values is array, like :tags => ["a", "b"], use as-is
+          # Otherwise, split words on /\s+/ so :title => "Hey there" => ["Hey", "there"]
+          values = value.is_a?(Array) ? value : options[:split] ? value.split(options[:split]) : value
+          values.each do |val|
+            val.gsub!(/[^\w\s]+/,'')
+            val.downcase!
+            next if val.length < options[:minlength]
+            next if self.class.text_index_exclude_list.include? val
+            if options[:exact]
+              str = val.gsub(/\s+/, '.') # can't have " " in Redis cmd string
+              indexes << "#{options[:key]}:#{str}"
+            else
+              len = options[:minlength]
+              while len < val.length
+                str = val[0..len].gsub(/\s+/, '.') # can't have " " in Redis cmd string
+                indexes << "#{options[:key]}:#{str}"
+                len += 1
+              end
+            end
+          end
+
+          # Determine what, if anything, needs to be done. If the indexes are unchanged,
+          # don't make any trips to Redis. Saves tons of useless network calls.
+          old_indexes = text_indexes_for(field)
+          new_indexes = indexes - old_indexes
+          del_indexes = old_indexes - indexes
+
+          # No change, so skip
+          # puts "[#{field}] old=#{old_indexes.inspect} / idx=#{indexes.inspect} / new=#{new_indexes.inspect} / del=#{del_indexes.inspect}"
+          next if new_indexes.empty? and del_indexes.empty?
+
+          # Add new indexes
+          exec_pipelined_index_cmd(:sadd, new_indexes)
+
+          # Delete indexes no longer used
+          exec_pipelined_index_cmd(:srem, del_indexes)
+
+          # Replace our reverse map of indexes
+          redis.set field_key("#{field}_indexes"), indexes.join(';')
+        end # fields.each
+      end
+
+      # Delete all text indexes that the object is a member of. Should be used in
+      # an +after_destroy+ hook to remove the dead object.
+      def delete_text_indexes(*fields)
+        fields = self.class.text_indexes.keys if fields.empty?
+        fields.each do |field|
+          del_indexes = text_indexes_for(field)
+          exec_pipelined_index_cmd(:srem, del_indexes)
+          redis.del field_key("#{field}_indexes")
+        end
+      end
+
+      private
+
+      def exec_pipelined_index_cmd(cmd, indexes)
+        return if indexes.empty?
+        redis.pipelined do |pipe|
+          indexes.each do |key|
+            pipe.send(cmd, key, id)
+          end
+        end
+      end
+
+    end
+  end
+end
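Two pieces of `update_text_indexes` are easier to follow in isolation: how one word expands into prefix keys, and how the old and new key lists are compared so that only the differences are sent to Redis. The sketch below reproduces both without a Redis connection. `derive_prefix` and `prefix_keys` are hypothetical helper names introduced here, and `'Blog::Post'` is an invented class name.

```ruby
# Hypothetical helper: the same substitutions the library applies to a class
# name to build its default key prefix (strip namespace, underscore, downcase).
def derive_prefix(class_name)
  class_name.sub(/.*::/, '').
    gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2').
    gsub(/([a-z\d])([A-Z])/, '\1_\2').
    downcase
end

# Hypothetical helper: the keys one word contributes to a non-exact index.
# Follows the loop in update_text_indexes, where val[0..len] holds len + 1
# characters, so the shortest indexed prefix is minlength + 1 characters long.
def prefix_keys(base_key, word, minlength = 1)
  val = word.downcase.gsub(/[^\w\s]+/, '')
  keys = []
  len = minlength
  while len < val.length
    keys << "#{base_key}:#{val[0..len]}"
    len += 1
  end
  keys
end

base = "#{derive_prefix('Blog::Post')}:text_index:title"
old_keys = prefix_keys(base, 'Some')   # keys ending in so, som, some
new_keys = prefix_keys(base, 'Sole')   # keys ending in so, sol, sole

# Only the differences need a round trip: SADD the first list, SREM the second.
to_add    = new_keys - old_keys
to_remove = old_keys - new_keys

p base        # "post:text_index:title"
p to_add      # ["post:text_index:title:sol", "post:text_index:title:sole"]
p to_remove   # ["post:text_index:title:som", "post:text_index:title:some"]
```

The shared `so` prefix appears in neither list, which is the saving the source comment describes: unchanged keys cost no network calls.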
data/spec/redis_text_search_core_spec.rb
ADDED
@@ -0,0 +1,143 @@
+
+require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
+
+class Post
+  include Redis::TextSearch
+
+  text_index :title
+  text_index :tags, :exact => true
+
+  def self.text_search_find(ids, options)
+    options.empty? ? ids : [ids, options]
+  end
+
+  def self.first(ids, options)
+
+  end
+
+  def initialize(attrib)
+    @attrib = attrib
+    @id = attrib[:id] || 1
+  end
+  def id; @id; end
+  def method_missing(name, *args)
+    @attrib[name] || super
+  end
+end
+
+TITLES = [
+  'Some plain text',
+  'More plain textstring comments',
+  'Come get somebody personal comments',
+  '*Welcome to Nate\'s new BLOG!!',
+]
+
+TAGS = [
+  ['personal', 'nontechnical'],
+  ['mysql', 'technical'],
+  ['gaming','technical']
+]
+
+
+describe Redis::TextSearch do
+  before :all do
+    @post = Post.new(:title => TITLES[0], :tags => TAGS[0], :id => 1)
+    @post2 = Post.new(:title => TITLES[1], :tags => TAGS[1], :id => 2)
+    @post3 = Post.new(:title => TITLES[2], :tags => TAGS[2], :id => 3)
+
+    @post.delete_text_indexes
+    @post2.delete_text_indexes
+    Post.delete_text_indexes(3)
+  end
+
+  it "should define text indexes in the class" do
+    Post.text_indexes[:title][:key].should == 'post:text_index:title'
+    Post.text_indexes[:tags][:key].should == 'post:text_index:tags'
+  end
+
+  it "should update text indexes correctly" do
+    @post.update_text_indexes
+    @post2.update_text_indexes
+
+    Post.redis.set_members('post:text_index:title:so').should == ['1']
+    Post.redis.set_members('post:text_index:title:som').should == ['1']
+    Post.redis.set_members('post:text_index:title:some').should == ['1']
+    Post.redis.set_members('post:text_index:title:pl').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:pla').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:plai').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:plain').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:te').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:tex').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:text').sort.should == ['1','2']
+    Post.redis.set_members('post:text_index:title:texts').should == ['2']
+    Post.redis.set_members('post:text_index:title:textst').should == ['2']
+    Post.redis.set_members('post:text_index:title:textstr').should == ['2']
+    Post.redis.set_members('post:text_index:title:textstri').should == ['2']
+    Post.redis.set_members('post:text_index:title:textstrin').should == ['2']
+    Post.redis.set_members('post:text_index:title:textstring').should == ['2']
+    Post.redis.set_members('post:text_index:tags:pe').should == []
+    Post.redis.set_members('post:text_index:tags:per').should == []
+    Post.redis.set_members('post:text_index:tags:pers').should == []
+    Post.redis.set_members('post:text_index:tags:perso').should == []
+    Post.redis.set_members('post:text_index:tags:person').should == []
+    Post.redis.set_members('post:text_index:tags:persona').should == []
+    Post.redis.set_members('post:text_index:tags:personal').should == ['1']
+    Post.redis.set_members('post:text_index:tags:no').should == []
+    Post.redis.set_members('post:text_index:tags:non').should == []
+    Post.redis.set_members('post:text_index:tags:nont').should == []
+    Post.redis.set_members('post:text_index:tags:nonte').should == []
+    Post.redis.set_members('post:text_index:tags:nontec').should == []
+    Post.redis.set_members('post:text_index:tags:nontech').should == []
+    Post.redis.set_members('post:text_index:tags:nontechn').should == []
+    Post.redis.set_members('post:text_index:tags:nontechni').should == []
+    Post.redis.set_members('post:text_index:tags:nontechnic').should == []
+    Post.redis.set_members('post:text_index:tags:nontechnica').should == []
+    Post.redis.set_members('post:text_index:tags:nontechnical').should == ['1']
+  end
+
+  it "should search text indexes and return records" do
+    Post.text_search('some').should == ['1']
+    @post3.update_text_indexes
+    Post.text_search('some').sort.should == ['1','3']
+    Post.text_search('plain').sort.should == ['1','2']
+    Post.text_search('plain','text').sort.should == ['1','2']
+    Post.text_search('plain','textstr').sort.should == ['2']
+    Post.text_search('some','TExt').sort.should == ['1']
+    Post.text_search('techNIcal').sort.should == ['2','3']
+    Post.text_search('nontechnical').sort.should == ['1']
+    Post.text_search('personal').sort.should == ['1','3']
+    Post.text_search('personAL', :fields => :tags).sort.should == ['1']
+    Post.text_search('PERsonal', :fields => [:tags]).sort.should == ['1']
+    Post.text_search('nontechnical', :fields => [:title]).sort.should == []
+  end
+
+  it "should pass options thru to find" do
+    Post.text_search('some', :order => 'updated_at desc').should == [['3','1'], {:order=>"updated_at desc"}]
+    Post.text_search('some', :select => 'id,username').should == [['3','1'], {:select => 'id,username'}]
+  end
+
+  it "should handle pagination" do
+    Post.text_search('some', :page => 1).should == [['3','1'], {:offset=>0, :limit=>30}]
+    Post.text_search('some', :page => 2, :per_page => 5).should == [['3','1'], {:offset=>5, :limit=>5}]
+    Post.text_search('some', :page => 15, :per_page => 3).should == [['3','1'], {:offset=>42, :limit=>3}]
+  end
+
+  it "should support a hash to the text_search method" do
+    Post.text_search(:tags => 'technical').sort.should == ['2','3']
+    Post.text_search(:tags => 'nontechnical').sort.should == ['1']
+    Post.text_search(:tags => 'technical', :title => 'plain').should == ['2']
+    Post.text_search(:tags => ['technical','MYsql'], :title => 'Mo').should == ['2']
+    Post.text_search(:tags => ['technical','MYsql'], :title => 'some').should == []
+    Post.text_search(:tags => 'technical', :title => 'comments').sort.should == ['2','3']
+  end
+
+  # MUST BE LAST!!!!!!
+  it "should delete text indexes" do
+    @post.delete_text_indexes
+    @post2.delete_text_indexes
+    Post.delete_text_indexes(3)
+    @post.text_indexes.should == []
+    @post2.text_indexes.should == []
+    @post3.text_indexes.should == []
+  end
+end
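The spec's `Post` class relies on the finder contract described in the library source: a `text_search_find` class method that receives the matched ids plus an options hash, which carries `:offset` and `:limit` when pagination is requested. A minimal sketch of such a finder for a class with no ORM might look like the following. The `Note` class and its in-memory `STORE` are assumptions for illustration and are not part of the gem.

```ruby
# Hypothetical non-ORM class with a hand-written finder.
class Note
  STORE = {}   # made-up in-memory "database": id string => Note

  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
    STORE[id.to_s] = self
  end

  # Finder contract: take the ids that matched in Redis (returned as strings)
  # plus an options hash, and return the corresponding records.
  def self.text_search_find(ids, options = {})
    records = ids.map { |i| STORE[i.to_s] }.compact
    offset  = options.fetch(:offset, 0)
    limit   = options.fetch(:limit, records.length)
    records[offset, limit] || []   # Array#[] returns nil past the end
  end
end

Note.new(1, 'first')
Note.new(2, 'second')
Note.new(3, 'third')

p Note.text_search_find(['1', '3']).map(&:title)                                  # ["first", "third"]
p Note.text_search_find(['1', '2', '3'], :offset => 1, :limit => 1).map(&:title)  # ["second"]
p Note.text_search_find(['1'], :offset => 5, :limit => 2)                         # []
```

Ids that no longer exist in the store are dropped by `compact`, so a stale index entry yields fewer records instead of an error.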
data/spec/spec_helper.rb
ADDED
metadata
ADDED
@@ -0,0 +1,69 @@
+--- !ruby/object:Gem::Specification
+name: redis-textsearch
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Nate Wiger
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2009-12-03 00:00:00 -08:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: redis
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0.1"
+    version:
+description: Crazy fast text search using Redis. Works with any ORM or data store.
+email: nate@wiger.org
+executables: []
+
+extensions: []
+
+extra_rdoc_files:
+- README.rdoc
+files:
+- lib/redis/text_search/collection.rb
+- lib/redis/text_search.rb
+- spec/redis_text_search_core_spec.rb
+- spec/spec_helper.rb
+- README.rdoc
+has_rdoc: true
+homepage: http://github.com/nateware/redis-textsearch
+licenses: []
+
+post_install_message:
+rdoc_options:
+- --title
+- Redis::TextSearch -- Crazy fast text search using Redis
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements:
+- redis, v0.1 or greater
+rubyforge_project: redis-textsearch
+rubygems_version: 1.3.5
+signing_key:
+specification_version: 3
+summary: Fast text search and word indexes using Redis
+test_files: []
+