xapian_db 0.3.4 → 0.4.0
- data/CHANGELOG.md +23 -0
- data/README.rdoc +34 -10
- data/lib/xapian_db/adapters/base_adapter.rb +21 -2
- data/lib/xapian_db/database.rb +34 -1
- data/lib/xapian_db/document_blueprint.rb +42 -13
- data/lib/xapian_db/indexer.rb +24 -5
- data/lib/xapian_db/railtie.rb +6 -0
- data/lib/xapian_db/resultset.rb +31 -4
- data/lib/xapian_db.rb +7 -0
- metadata +3 -3
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,23 @@
+##0.4.0 (December 15th, 2010)
+
+Features:
+
+- Simple facets implementation. The only facet supported is the class name of the indexed objects
+- Support for sorting (only for class searches, not for global searches)
+- The result of a search can be used with will_paginate
+
+Bugfixes:
+
+- removed the class scope expression from the spelling suggestion when searching on a class
+- keys of the attributes and index hashes are now sorted to be compatible with ruby 1.8 (which does
+not preserve the order of the keys in a hash)
+- Fixed the problem that blueprint configurations got lost after the first request in the development
+env (Rails only). You should put your blueprints either into a class that is loaded by Rails or into
+the file config/xapian_blueprints.rb wich is loaded automatically by XapianDb
+
+**Since the internal structure of the index has changed, you must reindex your objects if you come from an
+earlier version of XapianDb!**
+
 ##0.3.4 (December 14th, 2010)
 
 Features:
@@ -5,6 +25,9 @@ Features:
 - perform searches on indexed classes to scope the search to objects of a specific class
 - specify multiple blueprint attributes and index methods in one statement (without specifying options)
 - use blocks for complex attribute or index specifications
+
+Changes:
+
 - changed the implementation of Resultset.size to get more accurate estimations
 - changed the indexing of active_record or datamapper models when declared as attributes or indexes
 in a blueprint (indexes now all attributes of the object instead of using to_s)
data/README.rdoc
CHANGED
@@ -137,7 +137,8 @@ Use blocks for complex evaluations of attributes or indexed values:
     end
   end
 
-
+place these configurations either into the corrsepondig class or - I prefer to have the index configurations outside
+the models - into the file config/xapian_blueprints.rb.
 
 === Update the index
 
@@ -157,23 +158,30 @@ In verbose mode, XapianDb will use the progressbar gem if available.
 
 A simple query looks like this:
 
-  results = XapianDb.search
+  results = XapianDb.search "Foo"
 
 You can use wildcards and boolean operators:
 
-  results = XapianDb.search
+  results = XapianDb.search "fo* or baz"
 
 You can query attributes:
 
-  results = XapianDb.search
+  results = XapianDb.search "name:Foo"
 
 You can query objects of a specific class:
 
-  results = Person.search
+  results = Person.search "name:Foo"
 
 If you want to override the default of 10 docs per page, pass the :per_page argument:
 
-  results = Person.search
+  results = Person.search "name:Foo", :per_page => 20
+
+On class queries you can specifiy order options:
+
+  results = Person.search "name:Foo", :order => :first_name
+  results = Person.search "Fo*", :order => [:name, :first_name], :sort_decending => true
+
+Please note that the order option is not avaliable for global searches (XapianDb.search...)
 
 === Process the results
 
@@ -189,8 +197,8 @@ If you use a persistent database, the resultset may contain a spelling correctio
 
 To access the found documents, get a page from the resultset:
 
-  page =
-  page =
+  page = results.paginate # Get the first page
+  page = results.paginate :page => 2 # Get the second page
 
 Now you can access the documents:
 
@@ -199,9 +207,25 @@ Now you can access the documents:
   puts doc.name # We can access the configured attributes
   person = doc.indexed_object # Access the object behind this doc (lazy loaded)
 
+Use a search result with will_paginate in a view:
+
+  <%= will_paginate @results %>
+
+=== Facets
+
+If you want to implement a simple drilldown for your searches, you can use a facets query:
+
+  search_expression = "Foo"
+  facets = XapianDb.facets(search_expression)
+  facets.each do |klass, count|
+    puts "#{klass.name}: #{count} hits"
+
+    # This is how you would get all documents for the facet
+    # doc = klass.search search_expression
+  end
+
+Facet support in XapianDb is very limited. The only available facet is the class of the indexed objects. In many cases that's all that's needed. Therefore, it is very likely that I won't add more options for facets (since I'm not a fan of facets anyway). However, if you desperately need advanced facets, let me know. Or - even better - send me a pull request with a nice implementation ;-)
 
 == What to expect from future releases
 
-* facet support
-* will_paginate support
 * asynchronous index writer based on {resque}[https://github.com/defunkt/resque] for production environments
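The `:order` and `:sort_decending` options documented in the README changes above can be pictured with plain Ruby. This is only a sketch that emulates a multi-key sort on sample hashes; the real search hands sorting to Xapian over stored document values, so the ordering of non-string values may differ. The `people` data and the variable names are made up for illustration.

```ruby
# Sample records standing in for indexed Person objects (hypothetical data)
people = [
  { :name => "Meier",  :first_name => "Hans" },
  { :name => "Kogler", :first_name => "Gernot" },
  { :name => "Meier",  :first_name => "Anna" }
]

# Equivalent of :order => [:name, :first_name]
order = [:name, :first_name]

# Sort by the first key, then use the second key to break ties
ascending = people.sort_by { |person| order.map { |key| person[key] } }

# Equivalent of adding :sort_decending => true
descending = ascending.reverse

ascending.each  { |person| puts "#{person[:name]}, #{person[:first_name]}" }
descending.each { |person| puts "#{person[:name]}, #{person[:first_name]}" }
```

Passing a single symbol works the same way once it is wrapped in an array, which is what `[options[:order]].flatten` does in the adapter.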
data/lib/xapian_db/adapters/base_adapter.rb
CHANGED
@@ -19,8 +19,27 @@ module XapianDb
 klass.class_eval do
 
   # Add a method to search models of this class
-
-
+  # Options:
+  # - :order (Array<Symbol>) Accepts an array of attribute names for sorting
+  # - :sort_decending (Boolean) Allows to reverse the sorting
+  define_singleton_method(:search) do |expression, options={}|
+    options = {:sort_decending => false}.merge options
+    class_scope = "indexed_class:#{klass.name.downcase}"
+
+    if options[:order]
+      attr_names = [options[:order]].flatten
+      blueprint = XapianDb::DocumentBlueprint.blueprint_for klass
+      sort_indices = attr_names.map {|attr_name| blueprint.value_index_for(attr_name)}
+      options[:sort_indices] = attr_names.map {|attr_name| blueprint.value_index_for(attr_name)}
+    end
+    result = XapianDb.database.search "#{class_scope} and (#{expression})", options
+
+    # Remove the class scope from the spelling suggestion (if any)
+    unless result.spelling_suggestion.empty?
+      scope_length = "#{class_scope} and (".size
+      result.spelling_suggestion = result.spelling_suggestion.slice scope_length..-2
+    end
+    result
   end
 
 end
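The spelling suggestion cleanup in base_adapter.rb is plain string arithmetic. Here is a minimal standalone sketch, assuming the suggestion comes back in the same `"<class_scope> and (<expression>)"` shape as the query that was sent. The helper name `strip_class_scope` and the sample strings are illustrative and not part of XapianDb.

```ruby
# Hypothetical helper mirroring the slice used in the class-level search method
def strip_class_scope(suggestion, class_scope)
  return suggestion if suggestion.empty?

  # Length of the prefix that was put in front of the user's expression
  scope_length = "#{class_scope} and (".size

  # Keep everything after the prefix and drop the closing parenthesis
  suggestion.slice(scope_length..-2)
end

class_scope = "indexed_class:person"
suggestion  = "#{class_scope} and (name:food)"

puts strip_class_scope(suggestion, class_scope)
```

The `..-2` upper bound stops one character before the end of the string, which removes the `)` that closed the wrapped expression.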
data/lib/xapian_db/database.rb
CHANGED
@@ -44,6 +44,10 @@ module XapianDb
 # @param [String] expression A valid search expression.
 # @param [Hash] options
 # @option options [Integer] :per_page (10) How many docs per page?
+# @option options [Array<Integer>] :sort_indices (nil) An array of attribute indices to sort by. This
+# option is used internally by the search method implemented on configured classes. Do not use it
+# directly unless
+# you know what you do
 # @example Simple Query
 # resultset = db.search("foo")
 # @example Wildcard Query
@@ -54,16 +58,45 @@ module XapianDb
   #   resultset = db.search("name:foo")
   # @return [XapianDb::Resultset] The resultset
   def search(expression, options={})
-    opts = {:per_page => 10}.merge(options)
+    opts = {:per_page => 10, :sort_decending => false}.merge(options)
     @query_parser ||= QueryParser.new(self)
     query = @query_parser.parse(expression)
     enquiry = Xapian::Enquire.new(reader)
     enquiry.query = query
+
+    if opts[:sort_indices]
+      raise ArgumentError.new("Sorting is available for class scoped searches only") unless expression =~ /^indexed_class:/
+      sorter = Xapian::MultiValueSorter.new
+      options[:sort_indices].each do |index|
+        sorter.add(index, opts[:sort_decending])
+      end
+      enquiry.set_sort_by_key_then_relevance(sorter)
+    end
+
     opts[:spelling_suggestion] = @query_parser.spelling_suggestion
     opts[:db_size] = self.size
     Resultset.new(enquiry, opts)
   end
 
+  # A very simple implementation of facets limited to the class facets.
+  # @param [String] expression A valid search expression (see {#search} for examples).
+  # @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
+  def facets(expression)
+    @query_parser ||= QueryParser.new(self)
+    query = @query_parser.parse(expression)
+    enquiry = Xapian::Enquire.new(reader)
+    enquiry.query = query
+    enquiry.collapse_key = 0 # Value 0 always contains the class name
+    facets = {}
+    enquiry.mset(0, self.size).matches.each do |match|
+      class_name = match.document.values[0].value
+      # We must add 1 to the collapse_count since collapse_count means
+      # "how many other matches are there?"
+      facets[Kernel.const_get(class_name)] = match.collapse_count + 1
+    end
+    facets
+  end
+
 end
 
 # In Memory database
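The `+ 1` in the `facets` method follows from what a collapsed match reports: one representative document plus a count of the other matches folded into it. The sketch below emulates that bookkeeping in plain Ruby on sample data. It does not call Xapian, and the hash keys `:class_name` and `:collapse_count` are stand-ins chosen for illustration.

```ruby
# Class names as they would be stored in value slot 0 of each match (sample data)
matched_class_names = %w(Person Person Article Person Article)

# Emulate collapsing on slot 0: keep one representative per class name and
# remember how many *other* matches shared the same value.
collapsed = matched_class_names.group_by { |name| name }.map do |name, group|
  { :class_name => name, :collapse_count => group.size - 1 }
end

# Rebuild the hit count per class the same way the facets method does
facets = {}
collapsed.each do |match|
  facets[match[:class_name]] = match[:collapse_count] + 1
end

facets.each { |name, count| puts "#{name}: #{count} hits" }
```

The real method additionally resolves each class name to a constant with `Kernel.const_get`, which this sketch skips so that it runs without any model classes defined.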
data/lib/xapian_db/document_blueprint.rb
CHANGED
@@ -72,10 +72,48 @@ module XapianDb
 # Instance methods
 # ---------------------------------------------------------------------------------
 
+# Get the names of all configured attributes sorted alphabetically
+# @return [Array<Symbol>] The names of the attributes
+def attribute_names
+  @attributes_hash.keys.sort
+end
+
+# Get the block associated with an attribute
+# @param [Symbol] attribute The name of the attribute
+# @return [Block] The block
+def block_for_attribute(attribute)
+  @attributes_hash[attribute]
+end
+
+# Get the names of all configured index methods sorted alphabetically
+# @return [Array<Symbol>] The names of the index_methods
+def indexed_method_names
+  @indexed_methods_hash.keys.sort
+end
+
+# Get the options for an indexed method
+# @param [Symbol] method The name of the method
+# @return [IndexOptions] The options
+def options_for_indexed_method(method)
+  @indexed_methods_hash[method]
+end
+
+# Return the value index of an attribute. Needed to access the value of an attribute
+# from a Xapian document.
+# @param [String, Symbol] attribute_name The name of the attribute
+# @return [Integer] The value index of the attribute
+# @raise ArgumentError if the attribute name is unknown
+def value_index_for(attribute_name)
+  index = attribute_names.index attribute_name.to_sym
+  raise ArgumentError.new("Attribute #{attribute_name} unknown") unless index
+  # We add 1 because value slot 0 is reserved for the class name
+  index + 1
+end
+
 # Return an array of all configured text methods in this blueprint
 # @return [Array<String>] All searchable prefixes
 def searchable_prefixes
-  @prefixes ||= indexed_methods_hash.keys
+  @prefixes ||= @indexed_methods_hash.keys
 end
 
 # Lazily build and return a module that implements accessors for each field
@@ -91,7 +129,7 @@ module XapianDb
   end
 end
 
-@attributes_hash.keys.each_with_index do |field, index|
+@attributes_hash.keys.sort.each_with_index do |field, index|
   @accessors_module.instance_eval do
     define_method field do
       YAML::load(self.values[index+1].value)
@@ -112,15 +150,6 @@ module XapianDb
 # configured class must implement this method.
 attr_reader :lang_method
 
-# Collection of the configured attribute methods
-# @return [Array<Symbol>] The names of the configured attribute methods
-attr_reader :attributes_hash
-
-# Collection of the configured index methods
-# @return [Hash<Symbol, IndexOptions>] A hashtable containing all index methods as
-# keys and IndexOptions as values
-attr_reader :indexed_methods_hash
-
 # Set / read a custom adapter.
 # Use this configuration option if you need a specific adapter for an indexed class.
 # If set, it overrides the globally configured adapter (see also {Config#adapter})
@@ -166,8 +195,8 @@ module XapianDb
   self.index(name, opts, &block) if opts[:index]
 end
 
-# Add list of attributes to the blueprint. Attributes will be stored in the xapian documents
-# accessed from a search result.
+# Add a list of attributes to the blueprint. Attributes will be stored in the xapian documents ans
+# can be accessed from a search result.
 # @param [Array] attributes An array of method names that deliver the values for the attributes
 # @todo Make sure the name does not collide with a method name of Xapian::Document
 def attributes(*attributes)
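The `value_index_for` method added to document_blueprint.rb maps an attribute name to a value slot by its position in the sorted list of names. A small sketch of that mapping, using a lambda and a made-up attribute hash in place of the blueprint's instance state:

```ruby
# Stand-in for the blueprint's internal attribute hash (names are made up)
attributes_hash = { :name => nil, :first_name => nil, :age => nil }

# Sorting gives a stable order that does not depend on hash insertion order
attribute_names = attributes_hash.keys.sort

# Slot 0 holds the class name, so attribute slots start at 1
value_index_for = lambda do |attribute_name|
  index = attribute_names.index(attribute_name.to_sym)
  raise ArgumentError, "Attribute #{attribute_name} unknown" unless index
  index + 1
end

puts value_index_for.call(:age)
puts value_index_for.call("name")
```

Because the slot number depends on the full sorted list, adding or removing an attribute shifts the slots of the others, which fits the changelog's note that existing objects must be reindexed.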
data/lib/xapian_db/indexer.rb
CHANGED
@@ -35,15 +35,14 @@ module XapianDb
   # We store the class name of the object at position 0
   @xapian_doc.add_value(0, @obj.class.name)
 
-
-
+  @blueprint.attribute_names.each do |attribute|
+    block = @blueprint.block_for_attribute attribute
     if block
       value = @obj.instance_eval(&block)
     else
       value = @obj.send(attribute)
     end
-    @xapian_doc.add_value(
-    pos += 1
+    @xapian_doc.add_value(@blueprint.value_index_for(attribute), value.to_yaml)
   end
 end
 
@@ -68,7 +67,26 @@ module XapianDb
 @xapian_doc.add_term("C#{@obj.class}")
 
 
-@blueprint.indexed_methods_hash.each do |method
+# @blueprint.indexed_methods_hash.keys.sort.each do |method|
+#   options = @blueprint.indexed_methods_hash[method]
+#   if options.block
+#     obj = @obj.instance_eval(&options.block)
+#   else
+#     obj = @obj.send(method)
+#   end
+#   unless obj.nil?
+#     values = get_values_to_index_from obj
+#     values.each do |value|
+#       # Add value with field name
+#       term_generator.index_text(value.to_s.downcase, options.weight, "X#{method.upcase}")
+#       # Add value without field name
+#       term_generator.index_text(value.to_s.downcase)
+#     end
+#   end
+# end
+
+@blueprint.indexed_method_names.each do |method|
+  options = @blueprint.options_for_indexed_method method
   if options.block
     obj = @obj.instance_eval(&options.block)
   else
@@ -84,6 +102,7 @@ module XapianDb
       end
     end
   end
+
 end
 
 private
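The indexer writes each attribute with `value.to_yaml`, and the generated accessors in document_blueprint.rb read it back with `YAML::load`. The round trip can be sketched without Xapian by using a plain hash as a stand-in for a document's value slots. The `slots` hash and the sample values are assumptions made for this example.

```ruby
require "yaml"

# Stand-in for the value slots of a Xapian document
slots = {}

# Slot 0 carries the class name as a plain string
slots[0] = "Person"

# Attribute values are serialized to YAML before they are stored
slots[1] = 42.to_yaml
slots[2] = "Kogler".to_yaml

# Reading a slot back restores the original Ruby type
age  = YAML.load(slots[1])
name = YAML.load(slots[2])

puts age + 1
puts name.upcase
```

Serializing to YAML is what lets an accessor return an Integer rather than a string, at the cost of storing the YAML text in the slot.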
data/lib/xapian_db/railtie.rb
CHANGED
@@ -40,5 +40,11 @@ module XapianDb
 
     end
 
+    config.to_prepare do
+      # Load a blueprint config if there is one
+      blueprints_file_path = "#{Rails.root}/config/xapian_blueprints.rb"
+      load blueprints_file_path if File.exist?(blueprints_file_path)
+    end
+
   end
 end
data/lib/xapian_db/resultset.rb
CHANGED
@@ -4,10 +4,13 @@ module XapianDb
 
 # The resultset encapsulates a Xapian::Query object and allows paged access
 # to the found documents.
+# The resultset is compatible with will_paginate.
 # @example Process the first page of a resultsest
 #   resultset.paginate(:page => 1, :per_page => 10).each do |doc|
 #     # do something with the xapian document
 #   end
+# @example Use the resultset and will_paginate in a view
+#   <%= will_paginate resultset %>
 # @author Gernot Kogler
 class Resultset
 
@@ -15,9 +18,17 @@ module XapianDb
 # @return [Integer]
 attr_reader :size
 
+# The number of pages
+# @return [Integer]
+attr_reader :total_pages
+
+# The current page
+# @return [Integer]
+attr_reader :current_page
+
 # The spelling corrected query (if a language is configured)
 # @return [String]
-
+attr_accessor :spelling_suggestion
 
 # Constructor
 # @param [Xapian::Enquire] enquiry a Xapian query result (see http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html)
@@ -29,15 +40,30 @@ module XapianDb
   # To get more accurate results, we pass the doc count to the mset method
   @size = enquiry.mset(0, options[:db_size]).matches_estimated
   @spelling_suggestion = options[:spelling_suggestion]
-  @per_page
+  @per_page = options[:per_page]
+  @total_pages = (@size / @per_page.to_f).ceil
+  @current_page = 1
 end
 
 # Paginate the result
 # @param [Hash] opts Options for the persistent database
 # @option opts [Integer] :page (1) The page to access
 def paginate(opts={})
-  options
-
+  options = {:page => 1}.merge(opts)
+  @current_page = options[:page].to_i
+  build_page(@current_page)
+end
+
+# The previous page number
+# @return [Integer] The number of the previous page or nil, if we are at page 1
+def previous_page
+  @current_page > 1 ? (@current_page - 1) : nil
+end
+
+# The next page number
+# @return [Integer] The number of the next page or nil, if we are at the last page
+def next_page
+  @current_page < @total_pages ? (@current_page + 1): nil
 end
 
 private
@@ -45,6 +71,7 @@ module XapianDb
 # Build a page of Xapian documents
 # @return [Array<Xapian::Document>] An array of xapian documents
 def build_page(page)
+  page.nil? ? page = 1 : page = page.to_i
   docs = []
   offset = (page - 1) * @per_page
   return [] if offset > @size
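The page bookkeeping added to resultset.rb is small enough to try in isolation. The class below is a hypothetical stand-in named `PageInfo`. It copies only the arithmetic from the diff and has no connection to Xapian or will_paginate.

```ruby
# Hypothetical stand-in for the paging state kept by the resultset
class PageInfo
  attr_reader :total_pages, :current_page

  def initialize(size, per_page, current_page = 1)
    # Float division so that a partially filled last page still counts
    @total_pages  = (size / per_page.to_f).ceil
    @current_page = current_page
  end

  # nil signals that there is no page before the first one
  def previous_page
    @current_page > 1 ? @current_page - 1 : nil
  end

  # nil signals that there is no page after the last one
  def next_page
    @current_page < @total_pages ? @current_page + 1 : nil
  end
end

info = PageInfo.new(45, 20, 3)
puts info.total_pages
p info.previous_page
p info.next_page
```

With 45 hits and 20 per page, the last page holds only 5 documents, which is why the division has to round up rather than truncate.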
data/lib/xapian_db.rb
CHANGED
@@ -77,6 +77,13 @@ module XapianDb
     XapianDb::Config.database.search(expression)
   end
 
+  # Get facets from the configured database.
+  # See {XapianDb::Database#facets} for options
+  # @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
+  def self.facets(expression)
+    XapianDb::Config.database.facets(expression)
+  end
+
 end
 
 do_not_require = %w(update_stopwords.rb railtie.rb base_adapter.rb)
metadata
CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
   prerelease: false
   segments:
   - 0
-  - 3
   - 4
-
+  - 0
+  version: 0.4.0
 platform: ruby
 authors:
 - Gernot Kogler
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-12-
+date: 2010-12-15 00:00:00 +01:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency