redis-textsearch 0.1.0
- data/README.rdoc +93 -0
- data/lib/redis/text_search/collection.rb +146 -0
- data/lib/redis/text_search.rb +289 -0
- data/spec/redis_text_search_core_spec.rb +143 -0
- data/spec/spec_helper.rb +5 -0
- metadata +69 -0
data/README.rdoc
ADDED
@@ -0,0 +1,93 @@
= Redis::TextSearch - Use Redis to perform text search from any type of class

This gem implements an extremely fast text search using Redis, based on the patterns from
James Gray's {lists and sets in Redis post}[http://blog.grayproductions.net/articles/lists_and_sets_in_redis]
as well as Antirez's {text search gist}[http://gist.github.com/120067]. You can use it with any type of class,
whether it be ActiveRecord, DataMapper, MongoRecord, or even a class having nothing to do with an ORM.

This is not intended to be the most full-featured text search available. For that, look into
{Sphinx}[http://www.sphinxsearch.com/], {Solr}[http://lucene.apache.org/solr/], or
{other alternatives}[http://en.wikipedia.org/wiki/Full_text_search#Open_Source_projects].
This gem is designed to (a) be extremely fast and (b) handle any (or multiple) data stores.

The only requirement this gem has is that your class must provide an +id+ instance method
which returns the ID for that instance. ActiveRecord, DataMapper, and MongoRecord all have +id+ methods which
are known to be suitable. Since "ID" can be any data type, you can even write an +id+ method of your own
that just returns a string, or an MD5 of a filename, or something else unique.
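As a minimal sketch of that last point, a plain Ruby class with no ORM could derive its +id+ from an MD5 of a file path. The +Document+ class and its +path+ attribute are hypothetical names used only for illustration; the one thing taken from this README is that +id+ must return a stable, unique value.

```ruby
require 'digest/md5'

# Hypothetical non-ORM class; only the +id+ method matters to the gem.
class Document
  attr_reader :path

  def initialize(path)
    @path = path
  end

  # A stable identifier derived from the file name: the same path always
  # yields the same 32-character hex digest.
  def id
    Digest::MD5.hexdigest(path)
  end
end

doc = Document.new('/tmp/report.txt')
doc.id.length  # 32 hex characters
```

A class like this would still need to <tt>include Redis::TextSearch</tt> and supply its own +text_search_find+ class method, since there is no ORM finder for the gem to detect.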

== Installation

    gem install gemcutter
    gem tumble
    gem install redis-textsearch

== Initialization

If on Rails, config/initializers/redis.rb is a good place for this:

    require 'redis'
    require 'redis/text_search'
    Redis::TextSearch.redis = Redis.new(:host => 'localhost', :port => 6379)

== Example

Model class:

    class Post < ActiveRecord::Base
      include Redis::TextSearch

      text_index :title, :minlength => 2
      text_index :tags,  :exact => true

      # Using AR callback (you can call update_text_indexes anywhere on your instance)
      after_save do |r|
        r.update_text_indexes
      end
      after_destroy do |r|
        r.delete_text_indexes
      end
    end
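To make the two +text_index+ options more concrete, here is a rough sketch of the index terms a single word contributes. It mirrors the prefix loop in +update_text_indexes+ as it reads in this release; +index_terms+ is a hypothetical helper written for this illustration and is not part of the gem's API. The exclude list and the minimum-length skip are left out.

```ruby
# Hypothetical helper, not part of the gem: lists the index terms that one
# word contributes, following the loop in update_text_indexes.
def index_terms(word, minlength: 1, exact: false)
  # Strip punctuation and lowercase, as the gem does before indexing.
  val = word.gsub(/[^\w\s]+/, '').downcase
  return [val] if exact   # :exact => true stores only the whole word

  terms = []
  len = minlength
  while len < val.length
    terms << val[0..len]  # prefix of len + 1 characters
    len += 1
  end
  terms
end

index_terms('Bacon', minlength: 2)  # => ["bac", "baco", "bacon"]
index_terms('chef', exact: true)    # => ["chef"]
```

Each term becomes the suffix of a Redis set key, and the record's +id+ is added to every one of those sets. As the loop is written, the shortest stored prefix is one character longer than +:minlength+.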

Create posts:

    Post.create(:title => "All About Bacon", :tags => "chef nontechnical")
    Post.create(:title => "All About Bacon - Part 2", :tags => "chef nontechnical")
    Post.create(:title => "Homemade Belgian Waffles", :tags => "chef nontechnical")
    Post.create(:title => "Using Redis with Ruby", :tags => "technical ruby redis")
    Post.create(:title => "Installing Redis on Linux", :tags => "technical redis linux")
    Post.create(:title => "Chef Deployment Recipes", :tags => "ruby howto chef")

Then search for them:

    @posts = Post.text_search('bacon')                    # 2 results
    @posts = Post.text_search('chef')                     # 4 results (tags and title)
    @posts = Post.text_search('technical', 'ruby')        # AND search (1 result)
    @posts = Post.text_search('chef', :fields => :tags)   # 4 results (only search tags)
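The way several search strings combine can be sketched without Redis. In the library source, the terms are intersected within each field (AND) and the per-field results are then concatenated (OR). The sketch below imitates that with Ruby's +Set+; the +INDEX+ hash, its keys and IDs, and the two helper methods are illustrative stand-ins, not the gem's API.

```ruby
require 'set'

# Toy in-memory stand-in for the Redis sets; keys and IDs are illustrative.
INDEX = {
  'post:text_index:tags:technical' => Set['4', '5'],
  'post:text_index:tags:ruby'      => Set['4', '6'],
  'post:text_index:title:ruby'     => Set['4']
}

# AND: intersect the sets for every term within one field.
def ids_for_field(field, terms)
  sets = terms.map { |t| INDEX.fetch("post:text_index:#{field}:#{t}", Set.new) }
  sets.reduce(:&).to_a
end

# OR: concatenate the per-field results.
def search(terms, fields)
  fields.flat_map { |f| ids_for_field(f, terms) }
end

search(%w(technical ruby), [:title, :tags])  # => ["4"]
```

Because the per-field results are simply concatenated, an ID that matches in more than one field can appear more than once in the list handed to the finder.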

You can pass options through to the +find+ method:

    @posts = Post.text_search('redis', :order => 'updated_at desc')
    @posts = Post.text_search('redis', :select => 'id,title', :limit => 50)

And do pagination (adapted from +will_paginate+):

    @posts = Post.text_search('redis', :page => 1, :per_page => 10)
    @posts = Post.text_search('redis', :page => 1)  # uses class.per_page like will_paginate
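Under the hood, +:page+ and +:per_page+ are turned into the +:offset+ and +:limit+ options that reach the finder. The arithmetic is small enough to sketch on its own; +page_window+ is a hypothetical helper for illustration, and the real work happens inside Redis::TextSearch::Collection.

```ruby
# Hypothetical helper showing the arithmetic only; the gem does this inside
# Redis::TextSearch::Collection.
def page_window(page, per_page, total)
  page = page.to_i
  raise ArgumentError, 'page must be 1 or greater' if page < 1

  {
    :offset      => (page - 1) * per_page,         # records to skip
    :limit       => per_page,                      # records to fetch
    :total_pages => (total / per_page.to_f).ceil   # last page may be partial
  }
end

window = page_window(2, 5, 12)
window[:offset]       # 5
window[:total_pages]  # 3
```

Requesting a page beyond +total_pages+ is not an error in this scheme; only a page below 1 is rejected.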

You can also specify the fields to search as a hash:

    @posts = Post.text_search(:tags => 'chef')                      # 4 results
    @posts = Post.text_search(:tags => ['technical','ruby'])        # AND (1 result)
    @posts = Post.text_search(:tags => 'redis', :title => 'linux')  # 1 result
    @posts = Post.text_search(:title => 'chef')                     # 1 result

Note that if you need to pass options to +find+ AND search specific fields, the first
hash must be wrapped in curly braces:

    @posts = Post.text_search({:tags => 'chef', :title => 'deployment'},
                              :order => 'updated_at desc')
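The reason for the braces is how Ruby collects trailing hash arguments. The sketch below copies the one line +text_search+ uses to separate search arguments from options; +split_args+ itself is a hypothetical stand-in written for this illustration.

```ruby
# Hypothetical stand-in that mirrors how text_search separates its arguments.
def split_args(*args)
  options = args.length > 1 && args.last.is_a?(Hash) ? args.pop : {}
  [args, options]
end

# Without braces Ruby folds every key into one hash, so :order would be
# treated as a field to search.
merged_args, merged_opts = split_args(:tags => 'chef', :order => 'updated_at desc')
merged_args.length  # 1 (a single hash holding both keys)
merged_opts.empty?  # true

# With braces the first hash is the field search and the rest are options.
field_args, find_opts = split_args({:tags => 'chef'}, :order => 'updated_at desc')
field_args.first.keys  # [:tags]
find_opts.keys         # [:order]
```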

== Author

Copyright (c) 2009 {Nate Wiger}[http://nate.wiger.org]. All Rights Reserved.
Released under the {Artistic License}[http://www.opensource.org/licenses/artistic-license-2.0.php].
@@ -0,0 +1,146 @@
module Redis::TextSearch
  # = Invalid page number error
  # This is an ArgumentError raised in case a page was requested that is either
  # zero or a negative number. You should decide how to deal with such errors in
  # the controller.
  #
  # will_paginate maps its own InvalidPage to 404 Not Found in Rails 2. This gem
  # registers no such hook; to get the same behavior, add one yourself:
  #
  #   ActionController::Base.rescue_responses['Redis::TextSearch::InvalidPage'] = :not_found
  #
  # If you don't like this, use your preferred method of rescuing exceptions in
  # public from your controllers to handle this differently. The +rescue_from+
  # method is a nice addition to Rails 2.
  #
  # This error is *not* raised when a page further than the last page is
  # requested. Use the <tt>Redis::TextSearch::Collection#out_of_bounds?</tt> method to
  # check for those cases and manually deal with them as you see fit.
  class InvalidPage < ArgumentError
    def initialize(page, page_num)
      super "#{page.inspect} given as value, which translates to '#{page_num}' as page number"
    end
  end

  # = The key to pagination
  # Arrays returned from paginating finds are, in fact, instances of this little
  # class. You may think of Redis::TextSearch::Collection as an ordinary array with
  # some extra properties. Those properties are used by view helpers to generate
  # correct page links.
  #
  # Redis::TextSearch::Collection also assists in rolling out your own pagination
  # solutions: see +create+.
  #
  # If you are writing a library that provides a collection which you would like
  # to conform to this API, you don't have to copy these methods over; simply
  # make your plugin/gem dependent on the "mislav-will_paginate" gem:
  #
  #   gem 'mislav-will_paginate'
  #   require 'will_paginate/collection'
  #
  #   # WillPaginate::Collection is now available for use
  class Collection < Array
    attr_reader :current_page, :per_page, :total_entries, :total_pages

    # Arguments to the constructor are the current page number, per-page limit
    # and the total number of entries. The last argument is optional because it
    # is best to do lazy counting; in other words, count *conditionally* after
    # populating the collection using the +replace+ method.
    def initialize(page, per_page, total = nil)
      @current_page = page.to_i
      raise InvalidPage.new(page, @current_page) if @current_page < 1
      @per_page = per_page.to_i
      raise ArgumentError, "`per_page` setting cannot be less than 1 (#{@per_page} given)" if @per_page < 1

      self.total_entries = total if total
    end

    # Just like +new+, but yields the object after instantiation and returns it
    # afterwards. This is very useful for manual pagination:
    #
    #   @entries = Redis::TextSearch::Collection.create(1, 10) do |pager|
    #     result = Post.find(:all, :limit => pager.per_page, :offset => pager.offset)
    #     # inject the result array into the paginated collection:
    #     pager.replace(result)
    #
    #     unless pager.total_entries
    #       # the pager didn't manage to guess the total count, do it manually
    #       pager.total_entries = Post.count
    #     end
    #   end
    #
    # The possibilities with this are endless. For another example, here is how
    # will_paginate used to define pagination for Array instances, shown with this class:
    #
    #   Array.class_eval do
    #     def paginate(page = 1, per_page = 15)
    #       Redis::TextSearch::Collection.create(page, per_page, size) do |pager|
    #         pager.replace self[pager.offset, pager.per_page].to_a
    #       end
    #     end
    #   end
    #
    # The Array#paginate API has since changed, but this still serves as a
    # fine example of Redis::TextSearch::Collection usage.
    def self.create(page, per_page, total = nil)
      pager = new(page, per_page, total)
      yield pager
      pager
    end

    # Helper method that is true when someone tries to fetch a page with a
    # larger number than the last page. Can be used in combination with flashes
    # and redirecting.
    def out_of_bounds?
      current_page > total_pages
    end

    # Current offset of the paginated collection. If we're on the first page,
    # it is always 0. If we're on the 2nd page and there are 30 entries per page,
    # the offset is 30. This property is useful if you want to render ordinals
    # side by side with records in the view: simply start with offset + 1.
    def offset
      (current_page - 1) * per_page
    end

    # current_page - 1 or nil if there is no previous page
    def previous_page
      current_page > 1 ? (current_page - 1) : nil
    end

    # current_page + 1 or nil if there is no next page
    def next_page
      current_page < total_pages ? (current_page + 1) : nil
    end

    # sets the <tt>total_entries</tt> property and calculates <tt>total_pages</tt>
    def total_entries=(number)
      @total_entries = number.to_i
      @total_pages = (@total_entries / per_page.to_f).ceil
    end

    # This is a magic wrapper for the original Array#replace method. It serves
    # for populating the paginated collection after initialization.
    #
    # Why magic? Because it tries to guess the total number of entries judging
    # by the size of given array. If it is shorter than +per_page+ limit, then we
    # know we're on the last page. This trick is very useful for avoiding
    # unnecessary hits to the database to do the counting after we fetched the
    # data for the current page.
    #
    # However, after using +replace+ you should always test the value of
    # +total_entries+ and set it to a proper value if it's +nil+. See the example
    # in +create+.
    def replace(array)
      result = super

      # The collection is shorter than the page limit? Rejoice, because
      # then we know that we are on the last page!
      if total_entries.nil? and length < per_page and (current_page == 1 or length > 0)
        self.total_entries = offset + length
      end

      result
    end
  end
end
@@ -0,0 +1,289 @@
# Redis::TextSearch - Use Redis to add text search to your app.
class Redis
  #
  # Redis::TextSearch enables high-performance text search in your app using
  # Redis. You can perform text search on any type of data store or ORM.
  #
  module TextSearch
    class NoFinderMethod < StandardError; end
    class BadTextIndex < StandardError; end
    # Raised by +redis+ when no connection has been configured.
    class NotConnected < StandardError; end

    DEFAULT_EXCLUDE_LIST = %w(a an and as at but by for in into of on onto to the)

    dir = File.expand_path(__FILE__.sub(/\.rb$/,''))
    autoload :Collection, File.join(dir, 'collection')

    class << self
      def redis=(conn) @redis = conn end
      def redis
        @redis ||= $redis || raise(NotConnected, "Redis::TextSearch.redis not set to a Redis.new connection")
      end

      def included(klass)
        klass.instance_variable_set('@redis', @redis)
        klass.instance_variable_set('@text_indexes', {})
        klass.instance_variable_set('@text_search_find', nil)
        klass.instance_variable_set('@text_index_exclude_list', DEFAULT_EXCLUDE_LIST)
        klass.send :include, InstanceMethods
        klass.extend ClassMethods
        klass.guess_text_search_find
        class << klass
          define_method(:per_page) { 30 } unless respond_to?(:per_page)
        end
      end
    end

    # These class methods are added to the class when you include Redis::TextSearch.
    module ClassMethods
      attr_accessor :redis
      attr_reader :text_indexes

      # Words to exclude from text indexing. By default, includes common
      # English articles, conjunctions, and prepositions like "a", "the", "and", "of", etc.
      # This is an accessor to an array, so you can use += or << to add to it.
      # See the constant +DEFAULT_EXCLUDE_LIST+ for the default list.
      attr_accessor :text_index_exclude_list

      # Set the Redis prefix to use. Defaults to the underscored class name.
      def prefix=(prefix) @prefix = prefix end
      def prefix #:nodoc:
        @prefix ||= self.name.to_s.
          sub(%r{(.*::)}, '').
          gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
          gsub(/([a-z\d])([A-Z])/,'\1_\2').
          downcase
      end

      def field_key(name, id) #:nodoc:
        "#{prefix}:#{id}:#{name}"
      end

      # This is called when the module is included, and uses reflection to guess
      # how to retrieve records. You can override it by explicitly defining a
      # +text_search_find+ class method that takes an array of IDs and an options hash.
      def guess_text_search_find
        # self is a class here, so test inheritance with <, not is_a?
        if defined?(ActiveRecord::Base) and self < ActiveRecord::Base
          instance_eval <<-EndMethod
            def text_search_find(ids, options)
              all(options.merge(:conditions => {:#{primary_key} => ids}))
            end
          EndMethod
        elsif defined?(MongoRecord::Base) and self < MongoRecord::Base
          instance_eval <<-EndMethod
            def text_search_find(ids, options)
              all(options.merge(:conditions => {:#{primary_key} => ids}))
            end
          EndMethod
        elsif respond_to?(:get)
          # DataMapper models mix in DataMapper::Resource rather than inherit, so detect +get+
          instance_eval <<-EndMethod
            def text_search_find(ids, options)
              get(ids, options)
            end
          EndMethod
        end
      end

      # Define fields to be indexed for text search. To update the index, you must call
      # update_text_indexes after record save or at the appropriate point.
      def text_index(*args)
        options = args.last.is_a?(Hash) ? args.pop : {}
        options[:minlength] ||= 1
        options[:split] ||= /\s+/
        raise ArgumentError, "Must specify fields to index to #{self.name}.text_index" unless args.length > 0
        args.each do |name|
          @text_indexes[name.to_sym] = options.merge(:key => field_key(name, 'text_index'))
        end
      end

      # Perform text search and return results from database. Options:
      #
      #   'string', 'string2'
      #   :fields
      #   :page
      #   :per_page
      def text_search(*args)
        options = args.length > 1 && args.last.is_a?(Hash) ? args.pop : {}
        fields = Array(options.delete(:fields) || @text_indexes.keys)
        finder = options.delete(:finder)
        unless finder
          # defined?(:text_search_find) is always truthy for a symbol literal, so ask the class
          unless respond_to?(:text_search_find)
            raise NoFinderMethod, "Could not detect how to find records; you must def text_search_find()"
          end
          finder = :text_search_find
        end

        #
        # Assemble set names for our intersection.
        # Accept two ways of doing search: either {:field => ['value','value'], :field => 'value'},
        # or 'value','value', :fields => [:field, :field]. The first ANDs every field/value pair;
        # the latter ANDs the values within each field and ORs the results across fields.
        #
        ids = []
        if args.empty?
          raise ArgumentError, "Must specify search string(s) to #{self.name}.text_search"
        elsif args.first.is_a?(Hash)
          sets = []
          args.first.each do |f,v|
            sets += text_search_sets_for(f,v)
          end
          # Execute single intersection (AND)
          ids = redis.set_intersect(*sets)
        else
          fields.each do |f|
            sets = text_search_sets_for(f,args)
            # Execute intersection per loop (OR)
            ids += redis.set_intersect(*sets)
          end
        end

        # Calculate pagination if applicable. Presence of :page indicates we want pagination.
        # Adapted from will_paginate/finder.rb
        if options.has_key?(:page)
          page = options.delete(:page) || 1
          per_page = options.delete(:per_page) || self.per_page
          total = ids.length

          Redis::TextSearch::Collection.create(page, per_page, total) do |pager|
            # Convert page/per_page to limit/offset
            options.merge!(:offset => pager.offset, :limit => pager.per_page)
            pager.replace(send(finder, ids, options){ |*a| yield(*a) if block_given? })
          end
        else
          # Execute finder directly
          send(finder, ids, options)
        end
      end

      # Filter and return self. Chainable. Not implemented in this release.
      def text_filter(field)
        raise NotImplementedError
      end

      # Delete all text indexes for the given id.
      def delete_text_indexes(id, *fields)
        fields = @text_indexes.keys if fields.empty?
        fields.each do |field|
          redis.pipelined do |pipe|
            text_indexes_for(id, field).each do |key|
              pipe.srem(key, id)
            end
            pipe.del field_key("#{field}_indexes", id)
          end
        end
      end

      def text_indexes_for(id, field) #:nodoc:
        (redis.get(field_key("#{field}_indexes", id)) || '').split(';')
      end

      def text_search_sets_for(field, values)
        key = @text_indexes[field][:key]
        Array(values).collect do |val|
          str = val.downcase.gsub(/[^\w\s]+/,'').gsub(/\s+/, '.')  # can't have " " in Redis cmd string
          "#{key}:#{str}"
        end
      end
    end

    module InstanceMethods #:nodoc:
      def redis() self.class.redis end
      def field_key(name) #:nodoc:
        self.class.field_key(name, id)
      end

      # Retrieve the options for the given field
      def text_index_options_for(field)
        self.class.text_indexes[field] ||
          raise(BadTextIndex, "No such text index #{field} in #{self.class.name}")
      end

      # Retrieve the reverse-mapping of text indexes for a given field. Designed
      # as a utility method but maybe you will find it useful.
      def text_indexes_for(field)
        self.class.text_indexes_for(id, field)
      end

      # Retrieve all text indexes
      def text_indexes
        fields = self.class.text_indexes.keys
        fields.collect{|f| text_indexes_for(f)}.flatten
      end

      # Update all text indexes for the given object. Should be used in an +after_save+ hook
      # or other applicable area, for example r.update_text_indexes. Can pass a list of
      # field names to restrict updates just to those fields.
      def update_text_indexes(*fields)
        fields = self.class.text_indexes.keys if fields.empty?
        fields.each do |field|
          options = self.class.text_indexes[field]
          value = self.send(field)
          return false if value.length < options[:minlength]  # too short to index
          indexes = []

          # If values is array, like :tags => ["a", "b"], use as-is
          # Otherwise, split words on /\s+/ so :title => "Hey there" => ["Hey", "there"]
          values = value.is_a?(Array) ? value : options[:split] ? value.split(options[:split]) : value
          values.each do |val|
            val.gsub!(/[^\w\s]+/,'')
            val.downcase!
            # Check each word, not the whole field value
            next if val.length < options[:minlength]
            next if self.class.text_index_exclude_list.include? val
            if options[:exact]
              str = val.gsub(/\s+/, '.')  # can't have " " in Redis cmd string
              indexes << "#{options[:key]}:#{str}"
            else
              len = options[:minlength]
              while len < val.length
                str = val[0..len].gsub(/\s+/, '.')  # can't have " " in Redis cmd string
                indexes << "#{options[:key]}:#{str}"
                len += 1
              end
            end
          end

          # Determine what, if anything, needs to be done. If the indexes are unchanged,
          # don't make any trips to Redis. Saves tons of useless network calls.
          old_indexes = text_indexes_for(field)
          new_indexes = indexes - old_indexes
          del_indexes = old_indexes - indexes

          # No change, so skip
          # puts "[#{field}] old=#{old_indexes.inspect} / idx=#{indexes.inspect} / new=#{new_indexes.inspect} / del=#{del_indexes.inspect}"
          next if new_indexes.empty? and del_indexes.empty?

          # Add new indexes
          exec_pipelined_index_cmd(:sadd, new_indexes)

          # Delete indexes no longer used
          exec_pipelined_index_cmd(:srem, del_indexes)

          # Replace our reverse map of indexes
          redis.set field_key("#{field}_indexes"), indexes.join(';')
        end # fields.each
      end

      # Delete all text indexes that the object is a member of. Should be used in
      # an +after_destroy+ hook to remove the dead object.
      def delete_text_indexes(*fields)
        fields = self.class.text_indexes.keys if fields.empty?
        fields.each do |field|
          del_indexes = text_indexes_for(field)
          exec_pipelined_index_cmd(:srem, del_indexes)
          redis.del field_key("#{field}_indexes")
        end
      end

      private

      def exec_pipelined_index_cmd(cmd, indexes)
        return if indexes.empty?
        redis.pipelined do |pipe|
          indexes.each do |key|
            pipe.send(cmd, key, id)
          end
        end
      end

    end
  end
end
@@ -0,0 +1,143 @@
require File.expand_path(File.dirname(__FILE__) + '/spec_helper')

class Post
  include Redis::TextSearch

  text_index :title
  text_index :tags, :exact => true

  def self.text_search_find(ids, options)
    options.empty? ? ids : [ids, options]
  end

  def self.first(ids, options)

  end

  def initialize(attrib)
    @attrib = attrib
    @id = attrib[:id] || 1
  end
  def id; @id; end
  def method_missing(name, *args)
    @attrib[name] || super
  end
end

TITLES = [
  'Some plain text',
  'More plain textstring comments',
  'Come get somebody personal comments',
  '*Welcome to Nate\'s new BLOG!!',
]

TAGS = [
  ['personal', 'nontechnical'],
  ['mysql', 'technical'],
  ['gaming','technical']
]


describe Redis::TextSearch do
  before :all do
    @post  = Post.new(:title => TITLES[0], :tags => TAGS[0], :id => 1)
    @post2 = Post.new(:title => TITLES[1], :tags => TAGS[1], :id => 2)
    @post3 = Post.new(:title => TITLES[2], :tags => TAGS[2], :id => 3)

    @post.delete_text_indexes
    @post2.delete_text_indexes
    Post.delete_text_indexes(3)
  end

  it "should define text indexes in the class" do
    Post.text_indexes[:title][:key].should == 'post:text_index:title'
    Post.text_indexes[:tags][:key].should == 'post:text_index:tags'
  end

  it "should update text indexes correctly" do
    @post.update_text_indexes
    @post2.update_text_indexes

    Post.redis.set_members('post:text_index:title:so').should == ['1']
    Post.redis.set_members('post:text_index:title:som').should == ['1']
    Post.redis.set_members('post:text_index:title:some').should == ['1']
    Post.redis.set_members('post:text_index:title:pl').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:pla').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:plai').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:plain').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:te').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:tex').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:text').sort.should == ['1','2']
    Post.redis.set_members('post:text_index:title:texts').should == ['2']
    Post.redis.set_members('post:text_index:title:textst').should == ['2']
    Post.redis.set_members('post:text_index:title:textstr').should == ['2']
    Post.redis.set_members('post:text_index:title:textstri').should == ['2']
    Post.redis.set_members('post:text_index:title:textstrin').should == ['2']
    Post.redis.set_members('post:text_index:title:textstring').should == ['2']
    Post.redis.set_members('post:text_index:tags:pe').should == []
    Post.redis.set_members('post:text_index:tags:per').should == []
    Post.redis.set_members('post:text_index:tags:pers').should == []
    Post.redis.set_members('post:text_index:tags:perso').should == []
    Post.redis.set_members('post:text_index:tags:person').should == []
    Post.redis.set_members('post:text_index:tags:persona').should == []
    Post.redis.set_members('post:text_index:tags:personal').should == ['1']
    Post.redis.set_members('post:text_index:tags:no').should == []
    Post.redis.set_members('post:text_index:tags:non').should == []
    Post.redis.set_members('post:text_index:tags:nont').should == []
    Post.redis.set_members('post:text_index:tags:nonte').should == []
    Post.redis.set_members('post:text_index:tags:nontec').should == []
    Post.redis.set_members('post:text_index:tags:nontech').should == []
    Post.redis.set_members('post:text_index:tags:nontechn').should == []
    Post.redis.set_members('post:text_index:tags:nontechni').should == []
    Post.redis.set_members('post:text_index:tags:nontechnic').should == []
    Post.redis.set_members('post:text_index:tags:nontechnica').should == []
    Post.redis.set_members('post:text_index:tags:nontechnical').should == ['1']
  end

  it "should search text indexes and return records" do
    Post.text_search('some').should == ['1']
    @post3.update_text_indexes
    Post.text_search('some').sort.should == ['1','3']
    Post.text_search('plain').sort.should == ['1','2']
    Post.text_search('plain','text').sort.should == ['1','2']
    Post.text_search('plain','textstr').sort.should == ['2']
    Post.text_search('some','TExt').sort.should == ['1']
    Post.text_search('techNIcal').sort.should == ['2','3']
    Post.text_search('nontechnical').sort.should == ['1']
    Post.text_search('personal').sort.should == ['1','3']
    Post.text_search('personAL', :fields => :tags).sort.should == ['1']
    Post.text_search('PERsonal', :fields => [:tags]).sort.should == ['1']
    Post.text_search('nontechnical', :fields => [:title]).sort.should == []
  end

  it "should pass options thru to find" do
    Post.text_search('some', :order => 'updated_at desc').should == [['3','1'], {:order=>"updated_at desc"}]
    Post.text_search('some', :select => 'id,username').should == [['3','1'], {:select => 'id,username'}]
  end

  it "should handle pagination" do
    Post.text_search('some', :page => 1).should == [['3','1'], {:offset=>0, :limit=>30}]
    Post.text_search('some', :page => 2, :per_page => 5).should == [['3','1'], {:offset=>5, :limit=>5}]
    Post.text_search('some', :page => 15, :per_page => 3).should == [['3','1'], {:offset=>42, :limit=>3}]
  end

  it "should support a hash to the text_search method" do
    Post.text_search(:tags => 'technical').sort.should == ['2','3']
    Post.text_search(:tags => 'nontechnical').sort.should == ['1']
    Post.text_search(:tags => 'technical', :title => 'plain').should == ['2']
    Post.text_search(:tags => ['technical','MYsql'], :title => 'Mo').should == ['2']
    Post.text_search(:tags => ['technical','MYsql'], :title => 'some').should == []
    Post.text_search(:tags => 'technical', :title => 'comments').sort.should == ['2','3']
  end

  # MUST BE LAST!!!!!!
  it "should delete text indexes" do
    @post.delete_text_indexes
    @post2.delete_text_indexes
    Post.delete_text_indexes(3)
    @post.text_indexes.should == []
    @post2.text_indexes.should == []
    @post3.text_indexes.should == []
  end
end
data/spec/spec_helper.rb
ADDED
metadata
ADDED
@@ -0,0 +1,69 @@
--- !ruby/object:Gem::Specification
name: redis-textsearch
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- Nate Wiger
autorequire:
bindir: bin
cert_chain: []

date: 2009-12-03 00:00:00 -08:00
default_executable:
dependencies:
- !ruby/object:Gem::Dependency
  name: redis
  type: :runtime
  version_requirement:
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0.1"
    version:
description: Crazy fast text search using Redis. Works with any ORM or data store.
email: nate@wiger.org
executables: []

extensions: []

extra_rdoc_files:
- README.rdoc
files:
- lib/redis/text_search/collection.rb
- lib/redis/text_search.rb
- spec/redis_text_search_core_spec.rb
- spec/spec_helper.rb
- README.rdoc
has_rdoc: true
homepage: http://github.com/nateware/redis-textsearch
licenses: []

post_install_message:
rdoc_options:
- --title
- Redis::TextSearch -- Crazy fast text search using Redis
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
  version:
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
  version:
requirements:
- redis, v0.1 or greater
rubyforge_project: redis-textsearch
rubygems_version: 1.3.5
signing_key:
specification_version: 3
summary: Fast text search and word indexes using Redis
test_files: []