acts_as_indexed 0.6.3 → 0.6.4
- data/CHANGELOG +15 -10
- data/README.rdoc +10 -7
- data/VERSION +1 -1
- data/acts_as_indexed.gemspec +2 -2
- data/lib/acts_as_indexed.rb +6 -19
- data/lib/acts_as_indexed/search_atom.rb +12 -3
- data/lib/acts_as_indexed/search_index.rb +135 -78
- data/test/abstract_unit.rb +0 -1
- data/test/acts_as_indexed_test.rb +44 -14
- data/test/configuration_test.rb +1 -1
- data/test/fixtures/posts.yml +37 -37
- data/test/schema.rb +13 -7
- data/test/search_atom_test.rb +13 -1
- data/test/search_index_test.rb +3 -3
- metadata +4 -4
data/CHANGELOG
CHANGED
@@ -1,11 +1,16 @@
+===0.6.4 [16th August 2010]
+- Added starts-with query type [nilbus - Edward Anderson]
+- Various fixes and improvements.
+- Real names given for all contributors.
+
 ===0.6.3 [5th July 2010]
-- index file path can now be definited as a Pathname as well as an array. [parndt]
-- Can now define which records are indexed and which are not via an :if proc. [madpilot]
-- Lots of tidying up. [parndt]
-- Rails 3 fixes. [myabc]
+- index file path can now be definited as a Pathname as well as an array. [parndt - Philip Arndt]
+- Can now define which records are indexed and which are not via an :if proc. [madpilot - Myles Eftos]
+- Lots of tidying up. [parndt - Philip Arndt]
+- Rails 3 fixes. [myabc - Alex Coles]
 
 ===0.6.2 [11th June 2010]
-- Now available as a Gem as well as the original plugin. [parndt - Thanks for doing most of the hard work.]
+- Now available as a Gem as well as the original plugin. [parndt - Philip Arndt - Thanks for doing most of the hard work.]
 
 ===0.6.0 [10th June 2010]
 - Now supports Rails 3.x.x as well as Rails 2.x.x.
@@ -14,11 +19,11 @@
 - Deprecated find_with_index and will_paginate_search methods.
 
 ===0.5.3 [6th June 2010]
-- Now supports non-standard table names automatically. [nandalopes]
+- Now supports non-standard table names automatically. [nandalopes - Fernanda Lopes]
 
 ===0.5.2 [3rd May 2010]
-- Fix for Errno::ERANGE error related to certain Math.log calculations. [parndt]
-- Improved index detection in a shared-directory environment. [bob-p]
+- Fix for Errno::ERANGE error related to certain Math.log calculations. [parndt - Philip Arndt]
+- Improved index detection in a shared-directory environment. [bob-p - Thomas Pomfret]
 
 ===0.5.1 [11 June 2009]
 - Fixed Ruby 1.8.6 compatibility.
@@ -67,7 +72,7 @@
 
 ===0.3.0 [18 September 2007]
 - Minor bug fixes.
-- min_word_size now works properly, with
+- min_word_size now works properly, with queries containing small words in
   quotes or being preceded by a '+' symbol are now searched on.
 
 ===0.2.2 [06 September 2007]
@@ -93,4 +98,4 @@
 
 ===0.1 [31 August 2007]
 
-- Initial release.
+- Initial release.
data/README.rdoc
CHANGED
@@ -18,13 +18,13 @@ app with no dependencies and minimal setup.
 
 === Install
 
-
+==== Rails 2.x.x
   ./script/plugin install git://github.com/dougal/acts_as_indexed.git
 
-
+==== Rails 3.x.x
   rails plugin install git://github.com/dougal/acts_as_indexed.git
 
-
+==== As a Gem
 Despite this being slightly against the the original ethos of the project,
 acts_as_indexed is now available as a Gem as several people have requested it.
 
@@ -32,6 +32,8 @@ acts_as_indexed is now available as a Gem as several people have requested it.
 
 Make sure to specify the Gem in your environment.rb file (Rails 2.x.x), or the Gemfile (Rails 3.x.x).
 
+==== No Git?
+
 If you don't have git installed, you can download the plugin from the GitHub
 page (http://github.com/dougal/acts_as_indexed) and unpack it into the
 <tt>vendor/plugins</tt> directory of your rails app.
@@ -88,7 +90,6 @@ an argument.
   # Chain it with any number of ActiveRecord methods and named_scopes.
   my_search_results = Post.public.with_query('my search query').find(:all, :limit => 10) # return the first 10 matches which are public.
 
-
 === Query Options
 
 The following query operators are supported:
@@ -97,6 +98,8 @@ The following query operators are supported:
 * NOT:: 'cat -dog' will find records matching 'cat' AND NOT 'dog'
 * INCLUDE:: 'cat +me' will find records matching 'cat' and 'me', even if 'me' is smaller than the +min_word_size+
 * "":: Quoted terms are matched as phrases. '"cat dog"' will find records matching the whole phrase. Quoted terms can be preceded by the NOT operator; 'cat -"big dog"' etc. Quoted terms can include words shorter than the +min_word_size+.
+* ^:: Terms that begin with ^ will match records that contain a word starting with the term. '^cat' will find matches containing 'cat', 'catapult', 'caterpillar' etc.
+* ^"":: A quoted term that begins with ^ matches any phrase that begin with this phrase. '^"cat d"' will find records matching the whole phrases "cat dog" and "cat dinner". This type of search is useful for autocomplete inputs.
 
 === Pagination
 
@@ -137,7 +140,7 @@ All of the above are most welcome. mailto:dougal.s@gmail.com
 
 == Credits
 
-Douglas F Shearer - http
+Douglas F Shearer - http://douglasfshearer.com
 
 == Future Releases
 
@@ -146,5 +149,5 @@ Future releases will be looking to add the following features:
 * Optional html scrubbing during indexing.
 * Ranking affected by field weightings.
 * Support for DataMapper, Sequel and the various MongoDB ORMs.
-* UTF-8 support. See the current solution
-  https://gist.github.com/193903bb4e0d6e5debe1
+* UTF-8 support. See the current solution in the following Gist:
+  https://gist.github.com/193903bb4e0d6e5debe1
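The starts-with behaviour of the new `^` operator can be sketched against a plain word index. A minimal sketch only; the `index` hash and the `prefix_match` helper below are illustrative stand-ins, not part of the acts_as_indexed API:

```ruby
# Hypothetical word index: word => IDs of records containing it.
index = {
  'cat'         => [1],
  'catapult'    => [2],
  'caterpillar' => [3],
  'dog'         => [1, 4]
}

# A '^cat' term becomes an anchored Regexp grepped against the indexed words.
def prefix_match(index, term)
  pattern = /^#{Regexp.escape(term)}/
  index.keys.grep(pattern).flat_map { |word| index[word] }.uniq.sort
end

prefix_match(index, 'cat') # => [1, 2, 3]
prefix_match(index, 'do')  # => [1, 4]
```

This is why '^cat' matches 'cat', 'catapult' and 'caterpillar': any indexed word whose start matches the term contributes its records.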
data/VERSION
CHANGED
@@ -1 +1 @@
-0.6.3
+0.6.4
data/acts_as_indexed.gemspec
CHANGED
@@ -5,11 +5,11 @@
 
 Gem::Specification.new do |s|
   s.name = %q{acts_as_indexed}
-  s.version = "0.6.3"
+  s.version = "0.6.4"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Douglas F Shearer"]
-  s.date = %q{2010-
+  s.date = %q{2010-08-16}
   s.description = %q{Acts As Indexed is a plugin which provides a pain-free way to add fulltext search to your Ruby on Rails app}
   s.email = %q{dougal.s@gmail.com}
   s.extra_rdoc_files = [
data/lib/acts_as_indexed.rb
CHANGED
@@ -95,9 +95,7 @@ module ActsAsIndexed #:nodoc:
       build_index unless aai_config.index_file.directory?
       index = SearchIndex.new(aai_config.index_file, aai_config.index_file_depth, aai_fields, aai_config.min_word_size, aai_config.if_proc)
       index.add_record(record)
-      index.save
       @query_cache = {}
-      true
     end
 
     # Removes the passed +record+ from the index. Clears the query cache.
@@ -105,11 +103,9 @@ module ActsAsIndexed #:nodoc:
     def index_remove(record)
       index = SearchIndex.new(aai_config.index_file, aai_config.index_file_depth, aai_fields, aai_config.min_word_size, aai_config.if_proc)
       # record won't be in index if it doesn't exist. Just return true.
-      return
+      return unless index.exists?
       index.remove_record(record)
-      index.save
       @query_cache = {}
-      true
     end
 
     # Updates the index.
@@ -119,12 +115,8 @@ module ActsAsIndexed #:nodoc:
     def index_update(record)
       build_index unless aai_config.index_file.directory?
       index = SearchIndex.new(aai_config.index_file, aai_config.index_file_depth, aai_fields, aai_config.min_word_size, aai_config.if_proc)
-      #index.remove_record(find(record.id))
-      #index.add_record(record)
       index.update_record(record,find(record.id))
-      index.save
       @query_cache = {}
-      true
     end
 
     # Finds instances matching the terms passed in +query+. Terms are ANDed by
@@ -153,12 +145,12 @@ module ActsAsIndexed #:nodoc:
       else
         logger.debug('Query held in cache.')
       end
-      return @query_cache[query].sort.reverse.map
+      return @query_cache[query].sort.reverse.map{|r| r.first} if options[:ids_only] || @query_cache[query].empty?
 
       # slice up the results by offset and limit
       offset = find_options[:offset] || 0
       limit = find_options.include?(:limit) ? find_options[:limit] : @query_cache[query].size
-      part_query = @query_cache[query].sort.reverse.slice(offset,limit).map
+      part_query = @query_cache[query].sort.reverse.slice(offset,limit).map{|r| r.first}
 
       # Set these to nil as we are dealing with the pagination by setting
       # exactly what records we want.
@@ -179,7 +171,7 @@ module ActsAsIndexed #:nodoc:
         ranked_records[r] = @query_cache[query][r.id]
       end
 
-      ranked_records.to_a.sort_by{|a| a.last }.reverse.map
+      ranked_records.to_a.sort_by{|a| a.last }.reverse.map{|r| r.first}
     end
   end
 
@@ -189,14 +181,9 @@ module ActsAsIndexed #:nodoc:
 
     # Builds an index from scratch for the current model class.
     def build_index
-
-
-      while (records = find(:all, :limit => increment, :offset => offset)).size > 0
-        #p "offset is #{offset}, increment is #{increment}"
-        index = SearchIndex.new(aai_config.index_file, aai_config.index_file_depth, aai_fields, aai_config.min_word_size, aai_config.if_proc)
-        offset += increment
+      index = SearchIndex.new(aai_config.index_file, aai_config.index_file_depth, aai_fields, aai_config.min_word_size, aai_config.if_proc)
+      find_in_batches({ :batch_size => 500 }) do |records|
         index.add_records(records)
-        index.save
       end
     end
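The reworked build_index hands batching to find_in_batches and saves once per batch (via add_records) instead of once per record. A rough stand-in for that pattern, using Array#each_slice in place of ActiveRecord's find_in_batches; all names here are illustrative:

```ruby
# Index records in fixed-size batches, saving once per batch rather than
# once per record; each_slice stands in for ActiveRecord's find_in_batches.
records = (1..12).to_a
index   = []
saves   = 0

records.each_slice(5) do |batch|
  index.concat(batch)  # add_records(batch)
  saves += 1           # add_records ends with a single save
end

saves      # => 3
index.size # => 12
```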
|
@@ -14,8 +14,10 @@ module ActsAsIndexed #:nodoc:
|
|
14
14
|
# W(T, D) = tf(T, D) * log ( DN / df(T))
|
15
15
|
# weighting = frequency_in_this_record * log (total_number_of_records / number_of_matching_records)
|
16
16
|
|
17
|
-
|
18
|
-
|
17
|
+
attr_reader :records
|
18
|
+
|
19
|
+
def initialize(records={})
|
20
|
+
@records = records
|
19
21
|
end
|
20
22
|
|
21
23
|
# Returns true if the given record is present.
|
@@ -49,6 +51,13 @@ module ActsAsIndexed #:nodoc:
|
|
49
51
|
@records.delete(record_id)
|
50
52
|
end
|
51
53
|
|
54
|
+
# Creates a new SearchAtom with the combined records from self and other
|
55
|
+
def +(other)
|
56
|
+
SearchAtom.new(@records.clone.merge!(other.records) { |key, _old, _new|
|
57
|
+
_old + _new
|
58
|
+
})
|
59
|
+
end
|
60
|
+
|
52
61
|
# Returns at atom containing the records and positions of +self+ preceded by +former+
|
53
62
|
# "former latter" or "big dog" where "big" is the former and "dog" is the latter.
|
54
63
|
def preceded_by(former)
|
@@ -101,4 +110,4 @@ module ActsAsIndexed #:nodoc:
|
|
101
110
|
end
|
102
111
|
|
103
112
|
end
|
104
|
-
end
|
113
|
+
end
|
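The effect of the new SearchAtom#+ merge block can be seen with plain hashes of the shape SearchAtom stores (record ID to word-position list); the sample data below is made up:

```ruby
# Positions hashes of the shape SearchAtom keeps: record_id => [positions].
a = { 1 => [0, 4], 2 => [7] }
b = { 1 => [9],    3 => [2] }

# The merge block fires only for record IDs present in both atoms,
# concatenating their position lists; other IDs pass through untouched.
merged = a.clone.merge!(b) { |_record_id, old_pos, new_pos| old_pos + new_pos }

merged # => {1=>[0, 4, 9], 2=>[7], 3=>[2]}
```

Cloning first keeps the receiver's own records hash unchanged, so `+` returns a fresh atom rather than mutating either operand.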
data/lib/acts_as_indexed/search_index.rb
CHANGED
@@ -22,19 +22,21 @@ module ActsAsIndexed #:nodoc:
     end
 
     # Adds +record+ to the index.
-    def add_record(record)
-      return
+    def add_record(record, no_save=false)
+      return unless @if_proc.call(record)
       condensed_record = condense_record(record)
       load_atoms(condensed_record)
       add_occurences(condensed_record,record.id)
       @records_size += 1
+      self.save unless no_save
     end
 
     # Adds multiple records to the index. Accepts an array of +records+.
     def add_records(records)
       records.each do |record|
-        add_record(record)
+        add_record(record, true)
       end
+      self.save
     end
 
     # Removes +record+ from the index.
@@ -44,29 +46,13 @@ module ActsAsIndexed #:nodoc:
       atoms.each do |a|
         @atoms[a].remove_record(record.id) if @atoms.has_key?(a)
         @records_size -= 1
-        #p "removing #{record.id} from #{a}"
       end
+      self.save
     end
 
     def update_record(record_new, record_old)
-
-
-      old_atoms = condense_record(record_old)
-      new_atoms = condense_record(record_new)
-
-      # Remove the old version from the appropriate atoms.
-      load_atoms(old_atoms)
-      old_atoms.each do |a|
-        @atoms[a].remove_record(record_new.id) if @atoms.has_key?(a)
-      end
-
-      if @if_proc.call(record_new)
-        # Add the new version to the appropriate atoms.
-        load_atoms(new_atoms)
-        # TODO: Make a version of this method that takes the
-        # atomised version of the record.
-        add_occurences(new_atoms, record_new.id)
-      end
+      remove_record(record_old)
+      add_record(record_new)
     end
 
     # Saves the current index partitions to the filesystem.
@@ -77,7 +63,6 @@ module ActsAsIndexed #:nodoc:
         (atoms_sorted[encoded_prefix(atom_name)] ||= {})[atom_name] = records
       end
       atoms_sorted.each do |e_p, atoms|
-        #p "Saving #{e_p}."
         @root.join(e_p.to_s).open("w+") do |f|
           Marshal.dump(atoms,f)
         end
@@ -95,29 +80,55 @@ module ActsAsIndexed #:nodoc:
     # Returns an array of IDs for records matching +query+.
     def search(query)
       return [] if query.nil?
-
+      load_options = { :start => true } if query[/\^/]
+      load_atoms(cleanup_atoms(query), load_options || {})
       queries = parse_query(query.dup)
       positive = run_queries(queries[:positive])
       positive_quoted = run_quoted_queries(queries[:positive_quoted])
       negative = run_queries(queries[:negative])
       negative_quoted = run_quoted_queries(queries[:negative_quoted])
+      starts_with = run_queries(queries[:starts_with], true)
+      start_quoted = run_quoted_queries(queries[:start_quoted], true)
 
-
-
-
-      results =
-
-
-
-      results =
+      results = {}
+
+      if queries[:start_quoted].any?
+        results = merge_query_results(results, start_quoted)
+      end
+
+      if queries[:starts_with].any?
+        results = merge_query_results(results, starts_with)
+      end
+
+      if queries[:positive_quoted].any?
+        results = merge_query_results(results, positive_quoted)
+      end
+
+      if queries[:positive].any?
+        results = merge_query_results(results, positive)
       end
 
       negative_results = (negative.keys + negative_quoted.keys)
       results.delete_if { |r_id, w| negative_results.include?(r_id) }
-      #p results
       results
     end
-
+
+    def merge_query_results(results1, results2)
+      # Return the other if one is empty.
+      return results1 if results2.empty?
+      return results2 if results1.empty?
+
+      # Delete any records from results 1 that are not in results 2.
+      r1 = results1.delete_if{ |r_id,w| results2.exclude?(r_id) }
+
+      # Delete any records from results 2 that are not in results 1.
+      r2 = results2.delete_if{ |r_id,w| results1.exclude?(r_id) }
+
+      # Merge the results by adding their respective scores.
+      r1.merge(r2) { |r_id,old_val,new_val| old_val + new_val}
+    end
+
     # Returns true if the index root exists on the FS.
     #--
     # TODO: Make a private method called 'root_exists?' which checks for the root directory.
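The new merge_query_results keeps only record IDs present in both result sets (AND semantics) and sums their weightings. A minimal sketch with plain Ruby hashes; integer scores keep the example exact (real weightings are floats), and plain `key?` stands in for the ActiveSupport `exclude?` used in the gem:

```ruby
# Result sets as run_queries produces them: record_id => weighting.
r1 = { 1 => 5, 2 => 3, 4 => 9 }
r2 = { 1 => 2, 2 => 1, 5 => 7 }

# Keep only record IDs present in both sets (AND semantics)...
r1 = r1.reject { |id, _w| !r2.key?(id) }
r2 = r2.reject { |id, _w| !r1.key?(id) }

# ...then sum the weightings of the survivors.
merged = r1.merge(r2) { |_id, old_w, new_w| old_w + new_w }

merged # => {1=>7, 2=>4}
```

Records 4 and 5 drop out because each appears in only one result set; the survivors carry the combined score used for ranking.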
@@ -129,7 +140,6 @@ module ActsAsIndexed #:nodoc:
 
     # Gets the size file from the index.
     def load_record_size
-      #p "About to load #{@root.join('size')}"
       @root.join('size').open do |f|
         Marshal.load(f)
       end
@@ -144,7 +154,11 @@ module ActsAsIndexed #:nodoc:
 
     # Returns true if the given atom is present.
     def include_atom?(atom)
-
+      if atom.is_a? Regexp
+        @atoms.keys.grep(atom).any?
+      else
+        @atoms.has_key?(atom)
+      end
     end
 
     # Returns true if all the given atoms are present.
@@ -170,7 +184,6 @@ module ActsAsIndexed #:nodoc:
       condensed_record.each_with_index do |atom, i|
         add_atom(atom)
         @atoms[atom].add_position(record_id, i)
-        #p "adding #{record.id} to #{atom}"
       end
     end
 
@@ -197,6 +210,12 @@ module ActsAsIndexed #:nodoc:
 
     def parse_query(s)
 
+      # Find ^"foo bar".
+      start_quoted = []
+      while st_quoted = s.slice!(/\^\"[^\"]*\"/)
+        start_quoted << cleanup_atoms(st_quoted)
+      end
+
       # Find -"foo bar".
       negative_quoted = []
       while neg_quoted = s.slice!(/-\"[^\"]*\"/)
@@ -209,6 +228,12 @@ module ActsAsIndexed #:nodoc:
         positive_quoted << cleanup_atoms(pos_quoted)
       end
 
+      # Find ^foo.
+      starts_with = []
+      while st_with = s.slice!(/\^[\S]*/)
+        starts_with << cleanup_atoms(st_with).first
+      end
+
       # Find -foo.
       negative = []
       while neg = s.slice!(/-[\S]*/)
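The parse_query additions peel operator tokens off the query string destructively with String#slice!, leaving the plain terms behind for the positive pass. A standalone sketch of the `^foo` pass; the `sub` call is a stand-in for the gem's cleanup_atoms:

```ruby
# Peel '^foo' tokens off a query string destructively, as parse_query does.
s = '^cran ship -dog'
starts_with = []
while (token = s.slice!(/\^[\S]*/))
  starts_with << token.sub(/\A\^/, '')  # strip the operator (stand-in for cleanup_atoms)
end

starts_with # => ["cran"]
s           # => " ship -dog"
```

Because each pass mutates `s`, later passes (negative terms, plain terms) never see tokens an earlier pass has already claimed.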
@@ -224,74 +249,106 @@ module ActsAsIndexed #:nodoc:
       # Find all other terms.
       positive += cleanup_atoms(s,true)
 
-      {
+      { :start_quoted => start_quoted,
+        :negative_quoted => negative_quoted,
+        :positive_quoted => positive_quoted,
+        :starts_with => starts_with,
+        :negative => negative,
+        :positive => positive }
     end
-
-    def run_queries(atoms)
+
+    def run_queries(atoms, starts_with=false)
       results = {}
-      atoms.
+      atoms.each do |atom|
         interim_results = {}
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+        # If these atoms are to be run as 'starts with', make them a Regexp
+        # with a carat.
+        atom = /^#{atom}/ if starts_with
+
+        # Get the resulting matches, and break if none exist.
+        matches = get_atom_results(@atoms.keys, atom)
+        break if matches.nil?
+
+        # Grab the record IDs and weightings.
+        interim_results = matches.weightings(@records_size)
+
+        # Merge them with the results obtained already, if any.
+        results = results.empty? ? interim_results : merge_query_results(results, interim_results)
+
+        break if results.empty?
+
       end
-      #p results
       results
     end
-
-    def run_quoted_queries(quoted_atoms)
+
+    def run_quoted_queries(quoted_atoms, starts_with=false)
       results = {}
       quoted_atoms.each do |quoted_atom|
         interim_results = {}
+
+        break if quoted_atom.empty?
+
+        # If these atoms are to be run as 'starts with', make the final atom a
+        # Regexp with a line-start anchor.
+        quoted_atom[-1] = /^#{quoted_atom.last}/ if starts_with
+
+        # Little bit of memoization.
+        atoms_keys = @atoms.keys
+
+        # Get the matches for the first atom.
+        matches = get_atom_results(atoms_keys, quoted_atom.first)
+        break if matches.nil?
+
         # Check the index contains all the required atoms.
-        # match_atom = first_word_atom
         # for each of the others
         #   return atom containing records + positions where current atom is preceded by following atom.
         # end
-        #
-        next unless include_atoms?(quoted_atom)
-        matches = @atoms[quoted_atom.first]
+        # Return records from final atom.
         quoted_atom[1..-1].each do |atom_name|
-
+          interim_matches = get_atom_results(atoms_keys, atom_name)
+          if interim_matches.nil?
+            matches = nil
+            break
+          end
+          matches = interim_matches.preceded_by(matches)
         end
-        #results += matches.record_ids
 
+        break if matches.nil?
+        # Grab the record IDs and weightings.
         interim_results = matches.weightings(@records_size)
-        if results.empty?
-          results = interim_results
-        else
-          rr = {}
-          interim_results.each do |r,w|
-            rr[r] = w + results[r] if results[r]
-          end
-          #p results.class
-          results = rr
-        end
 
+        # Merge them with the results obtained already, if any.
+        results = results.empty? ? interim_results : merge_query_results(results, interim_results)
+
+        break if results.empty?
+
       end
       results
     end
 
-    def
+    def get_atom_results(atoms_keys, atom)
+      if atom.is_a? Regexp
+        matching_keys = atoms_keys.grep(atom)
+        results = SearchAtom.new
+        matching_keys.each do |key|
+          results += @atoms[key]
+        end
+        results
+      else
+        @atoms[atom]
+      end
+    end
+
+    def load_atoms(atoms, options={})
       # Remove duplicate atoms.
       # Remove atoms already in index.
      # Calculate prefixes.
       # Remove duplicate prefixes.
       atoms.uniq.reject{|a| include_atom?(a)}.collect{|a| encoded_prefix(a)}.uniq.each do |name|
-
+        pattern = @root.join(name.to_s).to_s
+        pattern += '*' if options[:start]
+        Pathname.glob(pattern).each do |atom_file|
         atom_file.open do |f|
           @atoms.merge!(Marshal.load(f))
         end
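For a Regexp atom, get_atom_results greps the atom keys and combines every matching SearchAtom with the new `+`. The same combination can be sketched with hashes standing in for SearchAtoms; the `atoms` data below is made up:

```ruby
# Atom store as the index holds it: word => { record_id => [positions] }.
atoms = {
  'crane'  => { 5 => [0], 6 => [1] },
  'cranes' => { 7 => [3] },
  'ship'   => { 5 => [2] }
}

# A Regexp atom (from '^cran') is grepped against the keys and the
# matching entries are combined into one result set.
matching = atoms.keys.grep(/^cran/)
results = matching.map { |k| atoms[k] }
                  .reduce({}) { |acc, recs| acc.merge(recs) { |_id, a, b| a + b } }

results # => {5=>[0], 6=>[1], 7=>[3]}
```

A plain String atom skips all of this and is just looked up directly, which is why `include_atom?` and `get_atom_results` both branch on `is_a? Regexp`.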
data/test/abstract_unit.rb
CHANGED
data/test/acts_as_indexed_test.rb
CHANGED
@@ -1,4 +1,4 @@
-require File.
+require File.dirname(__FILE__) + '/abstract_unit'
 
 class ActsAsIndexedTest < ActiveSupport::TestCase
   fixtures :posts
@@ -12,10 +12,10 @@ class ActsAsIndexedTest < ActiveSupport::TestCase
   def test_adds_to_index
     original_post_count = Post.count
     assert_equal [], Post.find_with_index('badger')
-
-    assert
+    post = Post.new(:title => 'badger', :body => 'Thousands of them!')
+    assert post.save
     assert_equal original_post_count+1, Post.count
-    assert_equal [
+    assert_equal [post.id], Post.find_with_index('badger',{},{:ids_only => true})
   end
 
   def test_removes_from_index
@@ -66,10 +66,10 @@ class ActsAsIndexedTest < ActiveSupport::TestCase
   def test_scoped_simple_queries
     assert_equal [], Post.find_with_index(nil)
     assert_equal [], Post.with_query('')
-    assert_equal [5, 6], Post.with_query('ship').map
-    assert_equal [6], Post.with_query('foo').map
-    assert_equal [6], Post.with_query('foo ship').map
-    assert_equal [6], Post.with_query('ship foo').map
+    assert_equal [5, 6], Post.with_query('ship').map{|r| r.id}.sort
+    assert_equal [6], Post.with_query('foo').map{|r| r.id}
+    assert_equal [6], Post.with_query('foo ship').map{|r| r.id}
+    assert_equal [6], Post.with_query('ship foo').map{|r| r.id}
   end
 
   def test_negative_queries
@@ -80,9 +80,9 @@ class ActsAsIndexedTest < ActiveSupport::TestCase
   end
 
   def test_scoped_negative_queries
-    assert_equal [5, 6], Post.with_query('crane').map
-    assert_equal [5], Post.with_query('crane -foo').map
-    assert_equal [5], Post.with_query('-foo crane').map
+    assert_equal [5, 6], Post.with_query('crane').map{|r| r.id}.sort
+    assert_equal [5], Post.with_query('crane -foo').map{|r| r.id}
+    assert_equal [5], Post.with_query('-foo crane').map{|r| r.id}
     assert_equal [], Post.with_query('-foo') #Edgecase
   end
 
@@ -94,8 +94,8 @@ class ActsAsIndexedTest < ActiveSupport::TestCase
   end
 
   def test_scoped_quoted_queries
-    assert_equal [5], Post.with_query('"crane ship"').map
-    assert_equal [6], Post.with_query('"crane big"').map
+    assert_equal [5], Post.with_query('"crane ship"').map{|r| r.id}
+    assert_equal [6], Post.with_query('"crane big"').map{|r| r.id}
     assert_equal [], Post.with_query('foo "crane ship"')
     assert_equal [], Post.with_query('"crane badger"')
   end
@@ -106,10 +106,40 @@ class ActsAsIndexedTest < ActiveSupport::TestCase
   end
 
   def test_scoped_negative_quoted_queries
-    assert_equal [6], Post.with_query('crane -"crane ship"').map
+    assert_equal [6], Post.with_query('crane -"crane ship"').map{|r| r.id}
     assert_equal [], Post.with_query('-"crane big"') # Edgecase
   end
 
+  def test_start_queries
+    assert_equal [6,5], Post.find_with_index('ship ^crane',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^crane ship',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^ship ^crane',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^crane ^ship',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^ship crane',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('crane ^ship',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^crane',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^cran',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^cra',{},{:ids_only => true})
+    assert_equal [6,5,4], Post.find_with_index('^cr',{},{:ids_only => true})
+    assert_equal [6,5,4,3,2,1], Post.find_with_index('^c',{},{:ids_only => true})
+    assert_equal [], Post.find_with_index('^notthere',{},{:ids_only => true})
+  end
+
+  def test_start_quoted_queries
+    assert_equal [6,5], Post.find_with_index('^"crane" ship',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('ship ^"crane"',{},{:ids_only => true})
+    assert_equal [5], Post.find_with_index('^"crane ship"',{},{:ids_only => true})
+    assert_equal [5], Post.find_with_index('^"crane shi"',{},{:ids_only => true})
+    assert_equal [5], Post.find_with_index('^"crane sh"',{},{:ids_only => true})
+    assert_equal [5], Post.find_with_index('^"crane s"',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^"crane "',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^"crane"',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^"cran"',{},{:ids_only => true})
+    assert_equal [6,5], Post.find_with_index('^"cra"',{},{:ids_only => true})
+    assert_equal [6,5,4], Post.find_with_index('^"cr"',{},{:ids_only => true})
+    assert_equal [6,5,4,3,2,1], Post.find_with_index('^"c"',{},{:ids_only => true})
+  end
+
   def test_find_options
     all_results = Post.find_with_index('crane',{},{:ids_only => true})
     first_result = Post.find_with_index('crane',{:limit => 1})
data/test/configuration_test.rb
CHANGED
data/test/fixtures/posts.yml
CHANGED
@@ -1,37 +1,37 @@
-# Content generated using the random article feature on Wikipedia, http://en.wikipedia.org/wiki/Special:Random
-# Wikipedia content may be redistributed under the GNU Free Documentation License; http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License
-wikipedia_article_1:
-  id: 1
-  title: Body Count (video game)
-  body: Body Count is a 1994 First-person shooter for the Sega Mega Drive. It is one of the few games that make use of the Menacer light gun and the Mega Mouse. \n In the U.S. the game was released on the Sega Channel.
-  visible: 1
-
-wikipedia_article_2:
-  id: 2
-  title: Julien Ellis
-  body: Julien Ellis is a ice hockey goalie, born in Sorel, Quebec on January 27, 1986. He is currently 6'0" and weighs approximately 177 pounds. He wears number 34 and catches left. \n Julien played his entire junior hockey career in the QMJHL with the Shawinigan Cataractes. He was there from 2002 through the 2006 season, and played a total of 173 regular season games for them. During his time there, he recorded eight shutouts, as well as a career high .921 save percentage and 2.41 goals against average. \n Julien was chosen in round six of the 2004 NHL Entry Draft by the Vancouver Canucks, making him 189th overall pick and the 5th pick for Vancouver. \n His 2006-07 season was spent with the Victoria Salmon Kings of the ECHL, where he played 37 games and made 1,212 saves. Julien was called up the the Manitoba Moose of the AHL several times, where he played eight games.
-  visible: 1
-
-wikipedia_article_3:
-  id: 3
-  title: Tuen Mun River
-  body: The Tuen Mun River is a river in Tuen Mun, New Territories, Hong Kong. It has many tributaries, with major ones coming from Lam Tei, Kau Keng Shan, Hung Shui Hang and Nai Wai. It flows south, splitting Tuen Mun into a west side and an east side. It eventually feeds into the Tuen Mun Typhoon Shelter, which is part of Castle Peak Bay.
-  visible: 0
-
-wikipedia_article_4:
-  id: 4
-  title: So Happily Unsatisfied
-  body: So Happily Unsatisfied is an album that was recorded by the band Nine Days. It was intended to be the follow-up to their successful major-label debut, The Madding Crowd from 2000. The release date of the album was repeatedly delayed by Sony until the band was ultimately dropped. In the interim, the album had leaked onto the internet. The band has also put the whole album on their official website for the public to download.
-  visible: 1
-
-wikipedia_article_5:
-  id: 5
-  title: SS Cornhusker State (T-ACS-6)
-  body: SS Cornhusker State (T-ACS-6) is a crane ship in ready reserve for the United States Navy. She is stationed at Cheatham Annex in Williamsburg, Virginia and is in ready reserve under the Military Sealift Command. The ship was named for the state of Nebraska, which is also known as the Cornhusker State. \n The ship was built by the Bath Iron Works. Her keel was laid on 27 November 1967, launched on 2 November 1968, and delivered 20 June 1969 as CV Stag Hound (MA 207). \n Stag Hound was acquired by the US Navy from the Maritime Administration in 1986 and was converted throughout 1987. She re-entered service as Cornhusker State on 12 March 1988, and has been in ready reserve since 1993.
-  visible: 0
-
-article_similar_to_5:
-  id: 6
-  title: An article I made up by myself!
-  body: crane crane big ship foo
-  visible: 1
+# Content generated using the random article feature on Wikipedia, http://en.wikipedia.org/wiki/Special:Random
+# Wikipedia content may be redistributed under the GNU Free Documentation License; http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License
+wikipedia_article_1:
+  id: 1
+  title: Body Count (video game)
+  body: Body Count is a 1994 First-person shooter for the Sega Mega Drive. It is one of the few games that make use of the Menacer light gun and the Mega Mouse. \n In the U.S. the game was released on the Sega Channel.
+  visible: 1
+
+wikipedia_article_2:
+  id: 2
+  title: Julien Ellis
+  body: Julien Ellis is a ice hockey goalie, born in Sorel, Quebec on January 27, 1986. He is currently 6'0" and weighs approximately 177 pounds. He wears number 34 and catches left. \n Julien played his entire junior hockey career in the QMJHL with the Shawinigan Cataractes. He was there from 2002 through the 2006 season, and played a total of 173 regular season games for them. During his time there, he recorded eight shutouts, as well as a career high .921 save percentage and 2.41 goals against average. \n Julien was chosen in round six of the 2004 NHL Entry Draft by the Vancouver Canucks, making him 189th overall pick and the 5th pick for Vancouver. \n His 2006-07 season was spent with the Victoria Salmon Kings of the ECHL, where he played 37 games and made 1,212 saves. Julien was called up the the Manitoba Moose of the AHL several times, where he played eight games.
+  visible: 1
+
+wikipedia_article_3:
+  id: 3
+  title: Tuen Mun River
+  body: The Tuen Mun River is a river in Tuen Mun, New Territories, Hong Kong. It has many tributaries, with major ones coming from Lam Tei, Kau Keng Shan, Hung Shui Hang and Nai Wai. It flows south, splitting Tuen Mun into a west side and an east side. It eventually feeds into the Tuen Mun Typhoon Shelter, which is part of Castle Peak Bay.
|
19
|
+
visible: 0
|
20
|
+
|
21
|
+
wikipedia_article_4:
|
22
|
+
id: 4
|
23
|
+
title: So Happily Unsatisfied
|
24
|
+
body: So Happily Unsatisfied is an album that was recorded by the band Nine Days. It was intended to be the follow-up to their successful major-label debut, The Madding Crowd from 2000. The release date of the album was repeatedly delayed by Sony until the band was ultimately dropped. In the interim, the album had leaked onto the internet. The band has also put the whole album on their official website for the public to download.
|
25
|
+
visible: 1
|
26
|
+
|
27
|
+
wikipedia_article_5:
|
28
|
+
id: 5
|
29
|
+
title: SS Cornhusker State (T-ACS-6)
|
30
|
+
body: SS Cornhusker State (T-ACS-6) is a crane ship in ready reserve for the United States Navy. She is stationed at Cheatham Annex in Williamsburg, Virginia and is in ready reserve under the Military Sealift Command. The ship was named for the state of Nebraska, which is also known as the Cornhusker State. \n The ship was built by the Bath Iron Works. Her keel was laid on 27 November 1967, launched on 2 November 1968, and delivered 20 June 1969 as CV Stag Hound (MA 207). \n Stag Hound was acquired by the US Navy from the Maritime Administration in 1986 and was converted throughout 1987. She re-entered service as Cornhusker State on 12 March 1988, and has been in ready reserve since 1993.
|
31
|
+
visible: 0
|
32
|
+
|
33
|
+
article_similar_to_5:
|
34
|
+
id: 6
|
35
|
+
title: An article I made up by myself!
|
36
|
+
body: crane crane big ship foo
|
37
|
+
visible: 1
|
data/test/schema.rb
CHANGED
@@ -1,7 +1,13 @@
-ActiveRecord::Schema.define :version => 0 do
-  create_table :posts, :force => true do |t|
-    t.column :title, :string
-    t.column :body, :text
-    t.column :visible, :boolean
-  end
-end
+ActiveRecord::Schema.define :version => 0 do
+  create_table :posts, :force => true do |t|
+    t.column :title, :string
+    t.column :body, :text
+    t.column :visible, :boolean
+  end
+
+  create_table :sources, :force => true do |t|
+    t.column :name, :string
+    t.column :url, :string
+    t.column :description, :text
+  end
+end
data/test/search_atom_test.rb
CHANGED
@@ -1,4 +1,4 @@
-require File.
+require File.dirname(__FILE__) + '/abstract_unit'
 include ActsAsIndexed
 
 class SearchAtomTest < ActiveSupport::TestCase
@@ -81,6 +81,18 @@ class SearchAtomTest < ActiveSupport::TestCase
     assert_in_delta(3.219, weightings[1], 2 ** -10)
     assert_in_delta(1.609, weightings[2], 2 ** -10)
   end
+
+  def test_adding_with_recursive_merge
+    sa0 = SearchAtom.new()
+    sa1 = SearchAtom.new({1=>[1]})
+    sa2 = SearchAtom.new({1=>[2], 2=>[3]})
+
+    assert_equal (sa0 + sa1).records, {1=>[1]}
+    assert_equal (sa0 + sa2).records, {1=>[2], 2=>[3]}
+
+    assert_equal (sa1 + sa2).records, {1=>[1,2], 2=>[3]}
+    assert_equal (sa2 + sa1).records, {1=>[2,1], 2=>[3]}
+  end
 
   private
 
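The new `test_adding_with_recursive_merge` test exercises `SearchAtom#+`, which combines two atoms' `records` hashes (record id => list of word positions): an id present in both atoms gets its position arrays concatenated, left operand first. A minimal stand-in sketch of that merge rule — `MiniAtom` is illustrative only, not the gem's `SearchAtom`:

```ruby
# Illustrative stand-in for the merge semantics the new test asserts.
# NOT the gem's SearchAtom; records maps record_id => [word positions].
class MiniAtom
  attr_reader :records

  def initialize(records = {})
    @records = records
  end

  # Merge two atoms. Hash#merge with a block resolves duplicate keys by
  # concatenating the two position arrays, left operand's positions first.
  def +(other)
    MiniAtom.new(records.merge(other.records) { |_id, mine, theirs| mine + theirs })
  end
end

sa1 = MiniAtom.new(1 => [1])
sa2 = MiniAtom.new(1 => [2], 2 => [3])

merged = (sa1 + sa2).records
puts merged == { 1 => [1, 2], 2 => [3] }  # => true
```

Note the asymmetry the test pins down: `sa1 + sa2` yields positions `[1, 2]` for record 1, while `sa2 + sa1` yields `[2, 1]`.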
data/test/search_index_test.rb
CHANGED
@@ -1,4 +1,4 @@
-require File.
+require File.dirname(__FILE__) + '/abstract_unit'
 include ActsAsIndexed
 
 class SearchIndexTest < ActiveSupport::TestCase
@@ -35,8 +35,8 @@ class SearchIndexTest < ActiveSupport::TestCase
     search_index = build_search_index
     mock_records = ['record0', 'record1']
 
-    search_index.expects(:add_record).with('record0')
-    search_index.expects(:add_record).with('record1')
+    search_index.expects(:add_record).with('record0', true)
+    search_index.expects(:add_record).with('record1', true)
 
     search_index.add_records(mock_records)
   end
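The updated Mocha expectations show that `add_records` now calls `add_record` with a second, boolean argument. A hedged sketch of the delegation shape those expectations describe — the class and flag name here are invented for illustration, and the flag's actual meaning in the gem is not shown in this diff:

```ruby
# Hypothetical sketch, not the gem's code: add_records forwards each record
# to add_record together with a boolean flag, so a mock expecting
# add_record('record0', true) would be satisfied by this call pattern.
class TinyIndex
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Records each invocation so the delegation can be inspected.
  def add_record(record, flag = true)
    @calls << [record, flag]
  end

  def add_records(records)
    records.each { |record| add_record(record, true) }
  end
end

index = TinyIndex.new
index.add_records(%w[record0 record1])
puts index.calls == [['record0', true], ['record1', true]]  # => true
```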
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: acts_as_indexed
 version: !ruby/object:Gem::Version
-  hash:
+  hash: 15
   prerelease: false
   segments:
   - 0
   - 6
-  -
-  version: 0.6.
+  - 4
+  version: 0.6.4
 platform: ruby
 authors:
 - Douglas F Shearer
@@ -15,7 +15,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-
+date: 2010-08-16 00:00:00 +01:00
 default_executable:
 dependencies: []
 