xapian_db 1.0 → 1.1
- data/CHANGELOG.md +19 -0
- data/README.rdoc +53 -4
- data/lib/type_codec.rb +124 -0
- data/lib/xapian_db/adapters/active_record_adapter.rb +2 -7
- data/lib/xapian_db/adapters/base_adapter.rb +6 -26
- data/lib/xapian_db/config.rb +2 -3
- data/lib/xapian_db/database.rb +16 -17
- data/lib/xapian_db/document_blueprint.rb +77 -34
- data/lib/xapian_db/index_writers/beanstalk_worker.rb +2 -2
- data/lib/xapian_db/indexer.rb +4 -5
- data/lib/xapian_db/query_parser.rb +15 -2
- data/lib/xapian_db/utilities.rb +6 -0
- data/lib/xapian_db.rb +36 -15
- metadata +13 -12
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,22 @@
|
|
1
|
+
##1.1 (September 7th, 2011)
|
2
|
+
|
3
|
+
Fixes:
|
4
|
+
|
5
|
+
- better handling of the beanstalk-client dependency
|
6
|
+
- recreate the xapian index database if the configured path exists but does not contain a valid xapian index
|
7
|
+
- support for non-integer primary keys (removed unnecessary to_i conversion)
|
8
|
+
|
9
|
+
Features:
|
10
|
+
|
11
|
+
- rails sample app upgraded to 3.1
|
12
|
+
- support for value range queries (strings, dates, numbers)
|
13
|
+
- sorting now works on a global query, too (XapianDb.search...)
|
14
|
+
- global facet queries now have the same options as class-scoped facet queries
|
15
|
+
- Support for custom serialization into xapian documents; overwrite the serialization implementation in type_codec.rb or implement your own serialization for specific types (see examples/custom_serialization.rb)
|
16
|
+
- support for reindexing a single object while evaluating an ignore_if block (if present)
|
17
|
+
|
18
|
+
IMPORTANT: YOU MUST REBUILD YOUR XAPIAN INDEX DATABASE SINCE THE INDEX STRUCTURE HAS CHANGED!
|
19
|
+
|
1
20
|
##1.0 (August 17th, 2011)
|
2
21
|
|
3
22
|
Features:
|
data/README.rdoc
CHANGED
@@ -1,5 +1,9 @@
|
|
1
1
|
= XapianDb
|
2
2
|
|
3
|
+
== Important Information
|
4
|
+
|
5
|
+
If you upgrade from an earlier version of xapian_db to 1.1, you MUST rebuild your entire index (XapianDb.rebuild_xapian_index)!
|
6
|
+
|
3
7
|
== What's in the box?
|
4
8
|
|
5
9
|
XapianDb is a ruby gem that combines features of nosql databases and fulltext indexing into one piece. The result: Rich documents and very fast queries. It is based on {Xapian}[http://xapian.org/], an efficient and powerful indexing library.
|
@@ -116,6 +120,14 @@ You may add a filter expression to exclude objects from the index. This is handy
|
|
116
120
|
blueprint.ignore_if {active == false}
|
117
121
|
end
|
118
122
|
|
123
|
+
You can add type information to an attribute. Currently the special types :string, :date and :number are supported (and required for range queries):
|
124
|
+
|
125
|
+
XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
|
126
|
+
blueprint.attribute :age, :as => :number
|
127
|
+
blueprint.attribute :date_of_birth, :as => :date
|
128
|
+
blueprint.attribute :name, :as => :string
|
129
|
+
end
|
130
|
+
|
119
131
|
You can override the global adapter configuration in a specific blueprint. Let's say you use ActiveRecord, but you have
|
120
132
|
one more class that is not stored in the database, but you want it to be indexed:
|
121
133
|
|
@@ -145,6 +157,10 @@ To rebuild the index for all blueprints, use
|
|
145
157
|
|
146
158
|
XapianDb.rebuild_xapian_index
|
147
159
|
|
160
|
+
You can update the index for a single object, too (e.g. to reevaluate an ignore_if block without modifying and saving the object):
|
161
|
+
|
162
|
+
XapianDb.reindex object
|
163
|
+
|
148
164
|
=== Query the index
|
149
165
|
|
150
166
|
A simple query looks like this:
|
@@ -180,7 +196,26 @@ On class queries you can specifiy order options:
|
|
180
196
|
results = Person.search "name:Foo", :order => :first_name
|
181
197
|
results = Person.search "Fo*", :order => [:name, :first_name], :sort_decending => true
|
182
198
|
|
183
|
-
|
199
|
+
If you define an attribute with a supported type, you can do range searches:
|
200
|
+
|
201
|
+
XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
|
202
|
+
blueprint.attribute :age, :as => :number
|
203
|
+
blueprint.attribute :date_of_birth, :as => :date
|
204
|
+
blueprint.attribute :name, :as => :string
|
205
|
+
end
|
206
|
+
|
207
|
+
result = XapianDb.search("date_of_birth:2011-01-01..2011-12-31")
|
208
|
+
result = XapianDb.search("age:30..40")
|
209
|
+
result = XapianDb.search("name:Adam..Chris")
|
210
|
+
|
211
|
+
Open ranges are supported, too:
|
212
|
+
|
213
|
+
result = XapianDb.search("age:..40")
|
214
|
+
result = XapianDb.search("age:30..")
|
215
|
+
|
216
|
+
You can combine range query expressions with other expressions:
|
217
|
+
|
218
|
+
result = XapianDb.search("age:30..40 AND city:Aarau")
|
184
219
|
|
185
220
|
=== Process the results
|
186
221
|
|
@@ -216,7 +251,15 @@ Or with kaminari:
|
|
216
251
|
If you want to implement a simple drilldown for your searches, you can use a global facets query:
|
217
252
|
|
218
253
|
search_expression = "Foo"
|
219
|
-
facets = XapianDb.facets(search_expression)
|
254
|
+
facets = XapianDb.facets(:name, search_expression)
|
255
|
+
facets.each do |name, count|
|
256
|
+
puts "#{name}: #{count} hits"
|
257
|
+
end
|
258
|
+
|
259
|
+
If you want the facets based on the indexed class, use the special attribute :indexed_class:
|
260
|
+
|
261
|
+
search_expression = "Foo"
|
262
|
+
facets = XapianDb.facets(:indexed_class, search_expression)
|
220
263
|
facets.each do |klass, count|
|
221
264
|
puts "#{klass.name}: #{count} hits"
|
222
265
|
|
@@ -224,7 +267,7 @@ If you want to implement a simple drilldown for your searches, you can use a glo
|
|
224
267
|
# doc = klass.search search_expression
|
225
268
|
end
|
226
269
|
|
227
|
-
A
|
270
|
+
A class level facet query is possible, too:
|
228
271
|
|
229
272
|
search_expression = "Foo"
|
230
273
|
facets = Person.facets(:name, search_expression)
|
@@ -232,7 +275,7 @@ A global facet search always groups the results by the class of the indexed obje
|
|
232
275
|
puts "#{name}: #{count} hits"
|
233
276
|
end
|
234
277
|
|
235
|
-
|
278
|
+
Any attribute declared in a blueprint can be used for a facet query. Use facet queries on attributes that store atomic values like strings, numbers or dates.
|
236
279
|
If you use it on attributes that contain collections (like an array of strings), you might get unexpected results.
|
237
280
|
|
238
281
|
=== Find similar documents
|
@@ -269,6 +312,12 @@ you can use the auto_indexing_disabled method with a block and rebuild the whole
|
|
269
312
|
end
|
270
313
|
Person.rebuild_xapian_index
|
271
314
|
|
315
|
+
== Add your own serializers for special objects
|
316
|
+
|
317
|
+
XapianDb serializes objects to xapian documents using YAML by default. This way, type information is preserved and you get back what you put into a xapian document, not just a string.
|
318
|
+
|
319
|
+
However, dates need special handling to support date range queries. To support date range queries and allow the addition of other custom data types in the future, XapianDb uses a simple, extensible mechanism to serialize and deserialize your objects. An example of how to extend this mechanism is provided in examples/custom_serialization.rb.
|
320
|
+
|
272
321
|
== Production setup
|
273
322
|
|
274
323
|
Since Xapian allows only one database instance to write to the index, the default setup of XapianDb will not work
|
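The README changes above introduce range expressions such as "age:30..40", "age:..40" and "age:30..". As a rough illustration of that syntax only, the following sketch splits such an expression into its parts. It is not how Xapian or XapianDb parse queries; `RANGE_EXPRESSION` and `split_range` are hypothetical names made up for this example.

```ruby
# Illustrative only: NOT the parser used by Xapian or XapianDb.
# Splits "prefix:lower..upper" into its three parts.
RANGE_EXPRESSION = /\A(\w+):(.*?)\.\.(.*)\z/

def split_range(expression)
  match = RANGE_EXPRESSION.match(expression)
  return nil unless match # not a range expression (e.g. "name:Foo")

  prefix, lower, upper = match.captures
  {
    :prefix => prefix.to_sym,
    :lower  => lower.empty? ? nil : lower, # nil marks an open lower bound
    :upper  => upper.empty? ? nil : upper  # nil marks an open upper bound
  }
end

closed     = split_range("date_of_birth:2011-01-01..2011-12-31")
open_lower = split_range("age:..40")
open_upper = split_range("age:30..")
no_range   = split_range("name:Foo")
```

In the gem itself the bounds are handled by Xapian value range processors registered per attribute type, as the query_parser.rb changes further down show.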
data/lib/type_codec.rb
ADDED
@@ -0,0 +1,124 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
# This class is responsible for encoding and decoding values depending on their
|
4
|
+
# type
|
5
|
+
|
6
|
+
require "bigdecimal"
|
7
|
+
|
8
|
+
module XapianDb
|
9
|
+
|
10
|
+
class TypeCodec
|
11
|
+
|
12
|
+
extend XapianDb::Utilities
|
13
|
+
|
14
|
+
# Get the codec for a type
|
15
|
+
# @param [Symbol] type a supported type as a string or symbol.
|
16
|
+
# The following types are supported:
|
17
|
+
# - :date
|
18
|
+
# @return [DateCodec]
|
19
|
+
def self.codec_for(type)
|
20
|
+
begin
|
21
|
+
constantize "XapianDb::TypeCodec::#{camelize("#{type}_codec")}"
|
22
|
+
rescue NameError
|
23
|
+
raise ArgumentError.new "no codec defined for type #{type}"
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
class GenericCodec
|
28
|
+
|
29
|
+
# Encode an object to its yaml representation
|
30
|
+
# @param [Object] object an object to encode
|
31
|
+
# @return [String] the yaml string
|
32
|
+
def self.encode(object)
|
33
|
+
begin
|
34
|
+
if object.respond_to?(:attributes)
|
35
|
+
object.attributes.to_yaml
|
36
|
+
else
|
37
|
+
object.to_yaml
|
38
|
+
end
|
39
|
+
rescue NoMethodError
|
40
|
+
raise ArgumentError.new "#{object} does not support yaml serialization"
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
# Decode an object from a yaml string
|
45
|
+
# @param [String] yaml_string a yaml string representing the object
|
46
|
+
# @return [Object] the parsed object
|
47
|
+
def self.decode(yaml_string)
|
48
|
+
begin
|
49
|
+
YAML::load yaml_string
|
50
|
+
rescue TypeError
|
51
|
+
raise ArgumentError.new "'#{yaml_string}' cannot be loaded by YAML"
|
52
|
+
end
|
53
|
+
end
|
54
|
+
end
|
55
|
+
|
56
|
+
class StringCodec
|
57
|
+
|
58
|
+
# Encode an object to a string
|
59
|
+
# @param [Object] object an object to encode
|
60
|
+
# @return [String] the string
|
61
|
+
def self.encode(object)
|
62
|
+
object.to_s
|
63
|
+
end
|
64
|
+
|
65
|
+
# Decode a string
|
66
|
+
# @param [String] string a string
|
67
|
+
# @return [String] the string
|
68
|
+
def self.decode(string)
|
69
|
+
string
|
70
|
+
end
|
71
|
+
end
|
72
|
+
|
73
|
+
class DateCodec
|
74
|
+
|
75
|
+
# Encode a date to a string in the format 'yyyymmdd'
|
76
|
+
# @param [Date] date a date object to encode
|
77
|
+
# @return [String] the encoded date
|
78
|
+
def self.encode(date)
|
79
|
+
begin
|
80
|
+
date.strftime "%Y%m%d"
|
81
|
+
rescue NoMethodError
|
82
|
+
raise ArgumentError.new "#{date} was expected to be a date"
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
# Decode a string to a date
|
87
|
+
# @param [String] date_as_string a string representing a date
|
88
|
+
# @return [Date] the parsed date
|
89
|
+
def self.decode(date_as_string)
|
90
|
+
begin
|
91
|
+
Date.parse date_as_string
|
92
|
+
rescue ArgumentError
|
93
|
+
raise ArgumentError.new "'#{date_as_string}' cannot be converted to a date"
|
94
|
+
end
|
95
|
+
end
|
96
|
+
end
|
97
|
+
|
98
|
+
class NumberCodec
|
99
|
+
|
100
|
+
# Encode a number to a sortable string
|
101
|
+
# @param [Integer, BigDecimal, Float] number a number object to encode
|
102
|
+
# @return [String] the encoded number
|
103
|
+
def self.encode(number)
|
104
|
+
begin
|
105
|
+
Xapian::sortable_serialise number
|
106
|
+
rescue TypeError
|
107
|
+
raise ArgumentError.new "#{number} was expected to be a number"
|
108
|
+
end
|
109
|
+
end
|
110
|
+
|
111
|
+
# Decode a string to a BigDecimal
|
112
|
+
# @param [String] encoded_number a string representing an encoded number
|
113
|
+
# @return [BigDecimal] the decoded number
|
114
|
+
def self.decode(encoded_number)
|
115
|
+
begin
|
116
|
+
BigDecimal.new(Xapian::sortable_unserialise(encoded_number).to_s)
|
117
|
+
rescue TypeError
|
118
|
+
raise ArgumentError.new "#{encoded_number} cannot be unserialized"
|
119
|
+
end
|
120
|
+
end
|
121
|
+
end
|
122
|
+
|
123
|
+
end
|
124
|
+
end
|
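To make the date handling above more concrete, here is a minimal sketch that mirrors the DateCodec from the diff in a standalone module. `DateCodecSketch` is a hypothetical name used so the example runs without XapianDb loaded. It shows that the 'yyyymmdd' encoding sorts as plain strings in chronological order, which is presumably what makes it suitable for range queries.

```ruby
require "date"

# Hypothetical stand-in that mirrors the DateCodec shown in the diff above.
module DateCodecSketch
  # Encode a date as 'yyyymmdd'
  def self.encode(date)
    date.strftime "%Y%m%d"
  rescue NoMethodError
    raise ArgumentError, "#{date} was expected to be a date"
  end

  # Decode a 'yyyymmdd' string back into a Date
  def self.decode(date_as_string)
    Date.parse date_as_string
  rescue ArgumentError
    raise ArgumentError, "'#{date_as_string}' cannot be converted to a date"
  end
end

dates   = [Date.new(2011, 9, 7), Date.new(2011, 8, 17), Date.new(2010, 12, 31)]
encoded = dates.map { |date| DateCodecSketch.encode(date) }

# Sorting the encoded strings yields the same order as sorting the dates
sorted_round_trip = encoded.sort.map { |string| DateCodecSketch.decode(string) }
```

Passing something that is not a date to `encode`, or an unparsable string to `decode`, raises an ArgumentError, as in the original codec.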
data/lib/xapian_db/adapters/active_record_adapter.rb
CHANGED
@@ -37,14 +37,9 @@ module XapianDb
|
|
37
37
|
|
38
38
|
klass.class_eval do
|
39
39
|
|
40
|
-
# add the after
|
40
|
+
# add the after commit logic
|
41
41
|
after_commit do
|
42
|
-
|
43
|
-
if blueprint.should_index?(self)
|
44
|
-
XapianDb.index(self)
|
45
|
-
else
|
46
|
-
XapianDb.delete_doc_with(self.xapian_id)
|
47
|
-
end
|
42
|
+
XapianDb.reindex(self)
|
48
43
|
end
|
49
44
|
|
50
45
|
# add the after destroy logic
|
data/lib/xapian_db/adapters/base_adapter.rb
CHANGED
@@ -33,9 +33,9 @@ module XapianDb
|
|
33
33
|
order = options.delete :order
|
34
34
|
if order
|
35
35
|
attr_names = [order].flatten
|
36
|
-
|
37
|
-
|
38
|
-
options[:sort_indices] = attr_names.map {|attr_name|
|
36
|
+
undefined_attrs = attr_names - XapianDb::DocumentBlueprint.attributes
|
37
|
+
raise ArgumentError.new "invalid order clause: attributes #{undefined_attrs.inspect} are not defined" unless undefined_attrs.empty?
|
38
|
+
options[:sort_indices] = attr_names.map {|attr_name| XapianDb::DocumentBlueprint.value_number_for(attr_name) }
|
39
39
|
end
|
40
40
|
result = XapianDb.database.search "#{class_scope} and (#{expression})", options
|
41
41
|
|
@@ -52,29 +52,9 @@ module XapianDb
|
|
52
52
|
end
|
53
53
|
|
54
54
|
# Add a method to search attribute facets of this class
|
55
|
-
define_singleton_method(:facets) do |
|
56
|
-
|
57
|
-
|
58
|
-
return {} if expression.nil? || expression.strip.empty?
|
59
|
-
|
60
|
-
class_scope = "indexed_class:#{klass.name.downcase}"
|
61
|
-
blueprint = XapianDb::DocumentBlueprint.blueprint_for klass
|
62
|
-
value_index = blueprint.value_index_for attr_name.to_sym
|
63
|
-
|
64
|
-
query_parser = QueryParser.new(XapianDb.database)
|
65
|
-
query = query_parser.parse("#{class_scope} and (#{expression})")
|
66
|
-
enquiry = Xapian::Enquire.new(XapianDb.database.reader)
|
67
|
-
enquiry.query = query
|
68
|
-
enquiry.collapse_key = value_index
|
69
|
-
facets = {}
|
70
|
-
enquiry.mset(0, XapianDb.database.size).matches.each do |match|
|
71
|
-
facet_value = YAML::load match.document.values[value_index].value
|
72
|
-
# We must add 1 to the collapse_count since collapse_count means
|
73
|
-
# "how many other matches are there?"
|
74
|
-
facets[facet_value] = match.collapse_count + 1
|
75
|
-
end
|
76
|
-
facets
|
77
|
-
|
55
|
+
define_singleton_method(:facets) do |attribute, expression|
|
56
|
+
class_scope = "indexed_class:#{klass.name.downcase}"
|
57
|
+
XapianDb.database.facets attribute, "#{class_scope} and (#{expression})"
|
78
58
|
end
|
79
59
|
|
80
60
|
end
|
data/lib/xapian_db/config.rb
CHANGED
@@ -67,10 +67,9 @@ module XapianDb
|
|
67
67
|
if path.to_sym == :memory
|
68
68
|
@_database = XapianDb.create_db
|
69
69
|
else
|
70
|
-
|
70
|
+
begin
|
71
71
|
@_database = XapianDb.open_db :path => path
|
72
|
-
|
73
|
-
# Database does not exist; create it
|
72
|
+
rescue IOError
|
74
73
|
@_database = XapianDb.create_db :path => path
|
75
74
|
end
|
76
75
|
end
|
data/lib/xapian_db/database.rb
CHANGED
@@ -75,13 +75,9 @@ module XapianDb
|
|
75
75
|
sort_decending = opts.delete :sort_decending
|
76
76
|
|
77
77
|
if sort_indices
|
78
|
-
|
79
|
-
sorter
|
80
|
-
|
81
|
-
sort_indices.each do |index|
|
82
|
-
sorter.add(index, sort_decending)
|
83
|
-
end
|
84
|
-
enquiry.set_sort_by_key_then_relevance(sorter)
|
78
|
+
sorter = Xapian::MultiValueKeyMaker.new
|
79
|
+
sort_indices.each { |index| sorter.add_value index }
|
80
|
+
enquiry.set_sort_by_key_then_relevance(sorter, sort_decending)
|
85
81
|
end
|
86
82
|
|
87
83
|
opts[:spelling_suggestion] = @query_parser.spelling_suggestion
|
@@ -123,25 +119,28 @@ module XapianDb
|
|
123
119
|
Resultset.new(enquiry, :db_size => self.size)
|
124
120
|
end
|
125
121
|
|
126
|
-
# A very simple implementation of facets
|
122
|
+
# A very simple implementation of facets using Xapian collapse key.
|
123
|
+
# @param [Symbol, String] attribute the name of an attribute declared in one or more blueprints
|
127
124
|
# @param [String] expression A valid search expression (see {#search} for examples).
|
128
125
|
# @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
|
129
|
-
def facets(expression)
|
130
|
-
|
131
|
-
|
132
|
-
|
126
|
+
def facets(attribute, expression)
|
127
|
+
# return an empty hash if no search expression is given
|
128
|
+
return {} if expression.nil? || expression.strip.empty?
|
129
|
+
value_number = XapianDb::DocumentBlueprint.value_number_for(attribute)
|
130
|
+
query_parser = QueryParser.new(XapianDb.database)
|
131
|
+
query = query_parser.parse(expression)
|
132
|
+
enquiry = Xapian::Enquire.new(XapianDb.database.reader)
|
133
133
|
enquiry.query = query
|
134
|
-
enquiry.collapse_key =
|
134
|
+
enquiry.collapse_key = value_number
|
135
135
|
facets = {}
|
136
|
-
enquiry.mset(0,
|
137
|
-
|
136
|
+
enquiry.mset(0, XapianDb.database.size).matches.each do |match|
|
137
|
+
facet_value = YAML::load match.document.value(value_number)
|
138
138
|
# We must add 1 to the collapse_count since collapse_count means
|
139
139
|
# "how many other matches are there?"
|
140
|
-
facets[
|
140
|
+
facets[facet_value] = match.collapse_count + 1
|
141
141
|
end
|
142
142
|
facets
|
143
143
|
end
|
144
|
-
|
145
144
|
end
|
146
145
|
|
147
146
|
# In Memory database
|
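The facets method above adds 1 to each collapse_count because the count covers only the other matches sharing a facet value. The following plain Ruby stand-in imitates that bookkeeping without Xapian. `facet_counts` is a hypothetical helper, and each array element plays the role of the facet value of one match.

```ruby
# Plain Ruby stand-in for the collapse-key bookkeeping; no Xapian involved.
def facet_counts(values)
  facets = {}
  values.group_by { |value| value }.each do |value, hits|
    # One hit acts as the representative match, the rest are "collapsed"
    collapse_count = hits.size - 1
    facets[value] = collapse_count + 1
  end
  facets
end

counts = facet_counts(["Kogler", "Foo", "Kogler"])
```

The result maps each distinct value to its total number of hits, which is the shape of hash that `XapianDb.facets(:name, search_expression)` is documented to return.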
data/lib/xapian_db/document_blueprint.rb
CHANGED
@@ -35,12 +35,17 @@ module XapianDb
|
|
35
35
|
@blueprints ||= {}
|
36
36
|
blueprint = DocumentBlueprint.new
|
37
37
|
yield blueprint if block_given? # configure the blueprint through the block
|
38
|
+
validate_type_consistency_on blueprint
|
38
39
|
# Remove a previously loaded blueprint for this class to avoid stale blueprint definitions
|
39
|
-
@blueprints.delete_if { |
|
40
|
+
@blueprints.delete_if { |indexed_class, blueprint| indexed_class.name == klass.name }
|
40
41
|
@blueprints[klass] = blueprint
|
41
42
|
@_adapter = blueprint._adapter || XapianDb::Config.adapter || Adapters::GenericAdapter
|
42
43
|
@_adapter.add_class_helper_methods_to klass
|
43
|
-
|
44
|
+
|
45
|
+
@searchable_prefixes = @blueprints.values.map { |blueprint| blueprint.searchable_prefixes }.flatten.compact.uniq || []
|
46
|
+
# We can always do a field search on the name of the indexed class
|
47
|
+
@searchable_prefixes << "indexed_class"
|
48
|
+
@attributes = @blueprints.values.map { |blueprint| blueprint.attribute_names}.flatten.compact.uniq.sort || []
|
44
49
|
end
|
45
50
|
|
46
51
|
# Get all configured classes
|
@@ -63,14 +68,54 @@ module XapianDb
|
|
63
68
|
raise "Blueprint for class #{klass} is not defined"
|
64
69
|
end
|
65
70
|
|
71
|
+
# Get the value number for an attribute. Please note that this is not the index in the values
|
72
|
+
# array of a xapian document but the valueno. Therefore, document.values[value_number] returns
|
73
|
+
# the wrong data, use document.value(value_number) instead.
|
74
|
+
# @param [attribute] The name of an attribute
|
75
|
+
# @return [Integer] The value number
|
76
|
+
def value_number_for(attribute)
|
77
|
+
raise ArgumentError.new "attribute #{attribute} is not configured in any blueprint" if @attributes.nil?
|
78
|
+
return 0 if attribute.to_sym == :indexed_class
|
79
|
+
position = @attributes.index attribute.to_sym
|
80
|
+
if position
|
81
|
+
# We add 1 because value slot 0 is reserved for the class name
|
82
|
+
return position + 1
|
83
|
+
else
|
84
|
+
raise ArgumentError.new "attribute #{attribute} is not configured in any blueprint"
|
85
|
+
end
|
86
|
+
end
|
87
|
+
|
88
|
+
# Get the type info of an attribute
|
89
|
+
# @param [attribute] The name of an indexed method
|
90
|
+
# @return [Symbol, nil] The defined type or nil if no type is defined
|
91
|
+
def type_info_for(attribute)
|
92
|
+
return nil if @blueprints.nil?
|
93
|
+
@blueprints.values.each do |blueprint|
|
94
|
+
return blueprint.type_map[attribute] if blueprint.type_map.has_key?(attribute)
|
95
|
+
end
|
96
|
+
nil
|
97
|
+
end
|
98
|
+
|
66
99
|
# Return an array of all configured text methods in any blueprint
|
67
100
|
# @return [Array<String>] All searchable prefixes
|
68
101
|
def searchable_prefixes
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
102
|
+
@searchable_prefixes || []
|
103
|
+
end
|
104
|
+
|
105
|
+
# Return an array of all defined attributes
|
106
|
+
# @return [Array<Symbol>] All defined attributes
|
107
|
+
def attributes
|
108
|
+
@attributes || []
|
109
|
+
end
|
110
|
+
|
111
|
+
private
|
112
|
+
|
113
|
+
def validate_type_consistency_on(blueprint)
|
114
|
+
blueprint.type_map.each do |method_name, type|
|
115
|
+
if type_info_for(method_name) && type_info_for(method_name) != type
|
116
|
+
raise ArgumentError.new "ambigous type definition for #{method_name} detected (#{type_info_for(method_name)}, #{type})"
|
117
|
+
end
|
118
|
+
end
|
74
119
|
end
|
75
120
|
|
76
121
|
end
|
@@ -79,6 +124,8 @@ module XapianDb
|
|
79
124
|
# Instance methods
|
80
125
|
# ---------------------------------------------------------------------------------
|
81
126
|
|
127
|
+
attr_reader :type_map
|
128
|
+
|
82
129
|
# Get the names of all configured attributes sorted alphabetically
|
83
130
|
# @return [Array<Symbol>] The names of the attributes
|
84
131
|
def attribute_names
|
@@ -89,7 +136,7 @@ module XapianDb
|
|
89
136
|
# @param [Symbol] attribute The name of the attribute
|
90
137
|
# @return [Block] The block
|
91
138
|
def block_for_attribute(attribute)
|
92
|
-
@attributes_hash[attribute]
|
139
|
+
@attributes_hash[attribute][:block]
|
93
140
|
end
|
94
141
|
|
95
142
|
# Get the names of all configured index methods sorted alphabetically
|
@@ -105,22 +152,10 @@ module XapianDb
|
|
105
152
|
@indexed_methods_hash[method]
|
106
153
|
end
|
107
154
|
|
108
|
-
# Return the value index of an attribute. Needed to access the value of an attribute
|
109
|
-
# from a Xapian document.
|
110
|
-
# @param [String, Symbol] attribute_name The name of the attribute
|
111
|
-
# @return [Integer] The value index of the attribute
|
112
|
-
# @raise ArgumentError if the attribute name is unknown
|
113
|
-
def value_index_for(attribute_name)
|
114
|
-
index = attribute_names.index attribute_name.to_sym
|
115
|
-
raise ArgumentError.new("Attribute #{attribute_name} unknown") unless index
|
116
|
-
# We add 1 because value slot 0 is reserved for the class name
|
117
|
-
index + 1
|
118
|
-
end
|
119
|
-
|
120
155
|
# Return an array of all configured text methods in this blueprint
|
121
156
|
# @return [Array<String>] All searchable prefixes
|
122
157
|
def searchable_prefixes
|
123
|
-
@
|
158
|
+
@searchable_prefixes ||= indexed_method_names
|
124
159
|
end
|
125
160
|
|
126
161
|
# Should the object go into the index? Evaluates an ignore expression,
|
@@ -151,10 +186,11 @@ module XapianDb
|
|
151
186
|
|
152
187
|
# Add an accessor for each attribute
|
153
188
|
attribute_names.each do |attribute|
|
154
|
-
index =
|
189
|
+
index = DocumentBlueprint.value_number_for(attribute)
|
190
|
+
codec = XapianDb::TypeCodec.codec_for @type_map[attribute]
|
155
191
|
@accessors_module.instance_eval do
|
156
192
|
define_method attribute do
|
157
|
-
|
193
|
+
codec.decode self.value(index)
|
158
194
|
end
|
159
195
|
end
|
160
196
|
end
|
@@ -174,8 +210,9 @@ module XapianDb
|
|
174
210
|
|
175
211
|
# Construct the blueprint
|
176
212
|
def initialize
|
177
|
-
@attributes_hash =
|
213
|
+
@attributes_hash = {}
|
178
214
|
@indexed_methods_hash = {}
|
215
|
+
@type_map = {}
|
179
216
|
end
|
180
217
|
|
181
218
|
# Set the adapter
|
@@ -194,6 +231,7 @@ module XapianDb
|
|
194
231
|
# @param [Hash] options
|
195
232
|
# @option options [Integer] :weight (1) The weight for this attribute.
|
196
233
|
# @option options [Boolean] :index (true) Should the attribute be indexed?
|
234
|
+
# @option options [Symbol] :as should add type info for range queries (:string, :date, :number)
|
197
235
|
# @example For complex attribute configurations you may pass a block:
|
198
236
|
# XapianDb::DocumentBlueprint.setup(IndexedObject) do |blueprint|
|
199
237
|
# blueprint.attribute :complex do
|
@@ -206,13 +244,15 @@ module XapianDb
|
|
206
244
|
# end
|
207
245
|
def attribute(name, options={}, &block)
|
208
246
|
raise ArgumentError.new("You cannot use #{name} as an attribute name since it is a reserved method name of Xapian::Document") if reserved_method_name?(name)
|
209
|
-
|
247
|
+
do_not_index = options.delete(:index) == false
|
248
|
+
@type_map[name] = (options.delete(:as) || :generic)
|
249
|
+
|
210
250
|
if block_given?
|
211
|
-
@attributes_hash[name] = block
|
251
|
+
@attributes_hash[name] = {:block => block}.merge(options)
|
212
252
|
else
|
213
|
-
@attributes_hash[name] =
|
253
|
+
@attributes_hash[name] = options
|
214
254
|
end
|
215
|
-
self.index(name,
|
255
|
+
self.index(name, options, &block) unless do_not_index
|
216
256
|
end
|
217
257
|
|
218
258
|
# Add a list of attributes to the blueprint. Attributes will be stored in the xapian documents and
|
@@ -221,7 +261,8 @@ module XapianDb
|
|
221
261
|
def attributes(*attributes)
|
222
262
|
attributes.each do |attr|
|
223
263
|
raise ArgumentError.new("You cannot use #{attr} as an attribute name since it is a reserved method name of Xapian::Document") if reserved_method_name?(attr)
|
224
|
-
@attributes_hash[attr] =
|
264
|
+
@attributes_hash[attr] = {}
|
265
|
+
@type_map[attr] = :generic
|
225
266
|
self.index attr
|
226
267
|
end
|
227
268
|
end
|
@@ -249,7 +290,9 @@ module XapianDb
|
|
249
290
|
when 2
|
250
291
|
# Is it a method name with options?
|
251
292
|
if args.last.is_a? Hash
|
252
|
-
|
293
|
+
options = args.last
|
294
|
+
assert_valid_keys options, :weight
|
295
|
+
@indexed_methods_hash[args.first] = IndexOptions.new(options.merge(:block => block))
|
253
296
|
else
|
254
297
|
add_indexes_from args
|
255
298
|
end
|
@@ -266,16 +309,16 @@ module XapianDb
|
|
266
309
|
# Options for an indexed method
|
267
310
|
class IndexOptions
|
268
311
|
|
269
|
-
|
270
|
-
attr_accessor :weight, :block
|
312
|
+
attr_reader :weight, :block
|
271
313
|
|
272
314
|
# Constructor
|
273
315
|
# @param [Hash] options
|
274
316
|
# @option options [Integer] :weight (1) The weight for the indexed value
|
275
|
-
def initialize(options)
|
317
|
+
def initialize(options = {})
|
276
318
|
@weight = options[:weight] || 1
|
277
319
|
@block = options[:block]
|
278
320
|
end
|
321
|
+
|
279
322
|
end
|
280
323
|
|
281
324
|
private
|
@@ -295,4 +338,4 @@ module XapianDb
|
|
295
338
|
|
296
339
|
end
|
297
340
|
|
298
|
-
end
|
341
|
+
end
|
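The value_number_for logic above assigns value slots from one global, alphabetically sorted attribute list across all blueprints, with slot 0 reserved for the class name. A minimal sketch of that numbering, using a hypothetical `ValueSlotsSketch` class instead of the real DocumentBlueprint:

```ruby
# Hypothetical stand-in for the class-level bookkeeping in DocumentBlueprint.
class ValueSlotsSketch
  # Each argument is the attribute list of one blueprint
  def initialize(*attribute_lists)
    @attributes = attribute_lists.flatten.compact.uniq.sort
  end

  def value_number_for(attribute)
    return 0 if attribute.to_sym == :indexed_class

    position = @attributes.index(attribute.to_sym)
    raise ArgumentError, "attribute #{attribute} is not configured in any blueprint" unless position

    # Slot 0 is reserved for the class name, so attribute slots start at 1
    position + 1
  end
end

slots = ValueSlotsSketch.new([:name, :age], [:name, :city])
age_slot   = slots.value_number_for(:age)    # => 1
city_slot  = slots.value_number_for(:city)   # => 2
name_slot  = slots.value_number_for("name")  # => 3
class_slot = slots.value_number_for(:indexed_class) # => 0
```

Because the numbering depends on the combined attribute list, the same attribute name shares one slot across classes, which is what lets a global query sort or facet on it.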
data/lib/xapian_db/index_writers/beanstalk_worker.rb
CHANGED
@@ -3,7 +3,7 @@
|
|
3
3
|
module XapianDb
|
4
4
|
module IndexWriters
|
5
5
|
|
6
|
-
# Worker to update the Xapian index; the worker is used in the beanstalk worker
|
6
|
+
# Worker to update the Xapian index; the worker is used in the beanstalk worker script
|
7
7
|
# and uses the DirectWriter to do the real work
|
8
8
|
# @author Gernot Kogler
|
9
9
|
class BeanstalkWorker
|
@@ -12,7 +12,7 @@ module XapianDb
|
|
12
12
|
|
13
13
|
def index_task(options)
|
14
14
|
klass = constantize options[:class]
|
15
|
-
obj = klass.respond_to?(:get) ? klass.get(options[:id]
|
15
|
+
obj = klass.respond_to?(:get) ? klass.get(options[:id]) : klass.find(options[:id])
|
16
16
|
DirectWriter.index obj
|
17
17
|
end
|
18
18
|
|
data/lib/xapian_db/indexer.rb
CHANGED
@@ -43,10 +43,9 @@ module XapianDb
|
|
43
43
|
value = @obj.send(attribute)
|
44
44
|
end
|
45
45
|
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
@xapian_doc.add_value(@blueprint.value_index_for(attribute), yaml)
|
46
|
+
codec = XapianDb::TypeCodec.codec_for @blueprint.type_map[attribute]
|
47
|
+
encoded_string = codec.encode value
|
48
|
+
@xapian_doc.add_value DocumentBlueprint.value_number_for(attribute), encoded_string
|
50
49
|
end
|
51
50
|
end
|
52
51
|
|
@@ -105,4 +104,4 @@ module XapianDb
|
|
105
104
|
|
106
105
|
end
|
107
106
|
|
108
|
-
end
|
107
|
+
end
|
data/lib/xapian_db/query_parser.rb
CHANGED
@@ -39,7 +39,20 @@ module XapianDb
|
|
39
39
|
|
40
40
|
# Add the searchable prefixes to allow searches by field
|
41
41
|
# (like "name:Kogler")
|
42
|
-
XapianDb::DocumentBlueprint.searchable_prefixes.each
|
42
|
+
XapianDb::DocumentBlueprint.searchable_prefixes.each do |prefix|
|
43
|
+
parser.add_prefix(prefix.to_s.downcase, "X#{prefix.to_s.upcase}")
|
44
|
+
type_info = XapianDb::DocumentBlueprint.type_info_for(prefix)
|
45
|
+
next if type_info.nil? || type_info == :generic
|
46
|
+
value_number = XapianDb::DocumentBlueprint.value_number_for(prefix)
|
47
|
+
case type_info
|
48
|
+
when :date
|
49
|
+
parser.add_valuerangeprocessor Xapian::DateValueRangeProcessor.new(value_number, "#{prefix}:")
|
50
|
+
when :number
|
51
|
+
parser.add_valuerangeprocessor Xapian::NumberValueRangeProcessor.new(value_number, "#{prefix}:")
|
52
|
+
when :string
|
53
|
+
parser.add_valuerangeprocessor Xapian::StringValueRangeProcessor.new(value_number, "#{prefix}:")
|
54
|
+
end
|
55
|
+
end
|
43
56
|
query = parser.parse_query(expression, @query_flags)
|
44
57
|
@spelling_suggestion = parser.get_corrected_query_string.force_encoding("UTF-8")
|
45
58
|
@spelling_suggestion = nil if @spelling_suggestion.empty?
|
@@ -48,4 +61,4 @@ module XapianDb
|
|
48
61
|
|
49
62
|
end
|
50
63
|
|
51
|
-
end
|
64
|
+
end
|
data/lib/xapian_db/utilities.rb
CHANGED
@@ -24,5 +24,11 @@ module XapianDb
|
|
24
24
|
constant
|
25
25
|
end
|
26
26
|
|
27
|
+
# Taken from Rails
|
28
|
+
def assert_valid_keys(hash, *valid_keys)
|
29
|
+
unknown_keys = hash.keys - [valid_keys].flatten
|
30
|
+
raise(ArgumentError, "Unsupported option(s) detected: #{unknown_keys.join(", ")}") unless unknown_keys.empty?
|
31
|
+
end
|
32
|
+
|
27
33
|
end
|
28
34
|
end
|
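A short usage sketch for the assert_valid_keys helper added above. The method body is copied from the diff; the wrapping module `UtilitiesSketch` and the `:boost` option are hypothetical and exist only so the example runs on its own.

```ruby
# Hypothetical wrapper module; the method body matches the diff above.
module UtilitiesSketch
  extend self

  def assert_valid_keys(hash, *valid_keys)
    unknown_keys = hash.keys - [valid_keys].flatten
    raise(ArgumentError, "Unsupported option(s) detected: #{unknown_keys.join(", ")}") unless unknown_keys.empty?
  end
end

# Only known keys: no error is raised
UtilitiesSketch.assert_valid_keys({ :weight => 10 }, :weight)

# An unknown key (:boost is made up for this example) raises ArgumentError
message = begin
  UtilitiesSketch.assert_valid_keys({ :weight => 10, :boost => 2 }, :weight)
  nil
rescue ArgumentError => error
  error.message
end
```

In the blueprint's index method this check rejects anything other than :weight, so a misspelled option fails early instead of being silently ignored.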
data/lib/xapian_db.rb
CHANGED
@@ -9,6 +9,22 @@
|
|
9
9
|
require 'xapian'
|
10
10
|
require 'yaml'
|
11
11
|
|
12
|
+
do_not_require = %w(update_stopwords.rb railtie.rb base_adapter.rb beanstalk_writer.rb utilities.rb install_generator.rb)
|
13
|
+
files = Dir.glob("#{File.dirname(__FILE__)}/**/*.rb").reject{|path| do_not_require.include?(File.basename(path))}
|
14
|
+
# Require these first
|
15
|
+
require "#{File.dirname(__FILE__)}/xapian_db/utilities"
|
16
|
+
require "#{File.dirname(__FILE__)}/xapian_db/adapters/base_adapter"
|
17
|
+
files.each {|file| require file}
|
18
|
+
|
19
|
+
# Configure XapianDB if we are in a Rails app
|
20
|
+
require File.dirname(__FILE__) + '/xapian_db/railtie' if defined?(Rails)
|
21
|
+
|
22
|
+
# Try to require the beanstalk writer (depends on beanstalk-client)
|
23
|
+
begin
|
24
|
+
require File.dirname(__FILE__) + '/xapian_db/index_writers/beanstalk_writer'
|
25
|
+
rescue LoadError
|
26
|
+
end
|
27
|
+
|
12
28
|
module XapianDb
|
13
29
|
|
14
30
|
# Supported languages
|
@@ -75,14 +91,21 @@ module XapianDb
|
|
75
91
|
# See {XapianDb::Database#search} for options
|
76
92
|
# @return [XapianDb::Resultset]
|
77
93
|
def self.search(expression, options={})
|
94
|
+
order = options.delete :order
|
95
|
+
if order
|
96
|
+
attr_names = [order].flatten
|
97
|
+
undefined_attrs = attr_names - XapianDb::DocumentBlueprint.attributes
|
98
|
+
raise ArgumentError.new "invalid order clause: attributes #{undefined_attrs.inspect} are not defined" unless undefined_attrs.empty?
|
99
|
+
options[:sort_indices] = attr_names.map {|attr_name| XapianDb::DocumentBlueprint.value_number_for(attr_name) }
|
100
|
+
end
|
78
101
|
XapianDb::Config.database.search(expression, options)
|
79
102
|
end
|
80
103
|
|
81
104
|
# Get facets from the configured database.
|
82
105
|
# See {XapianDb::Database#facets} for options
|
83
106
|
# @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
|
84
|
-
def self.facets(expression)
|
85
|
-
XapianDb::Config.database.facets
|
107
|
+
def self.facets(attribute, expression)
|
108
|
+
XapianDb::Config.database.facets attribute, expression
|
86
109
|
end
|
87
110
|
|
88
111
|
# Update an object in the index
|
@@ -99,6 +122,17 @@ module XapianDb
|
|
99
122
|
writer.delete_doc_with xapian_id
|
100
123
|
end
|
101
124
|
|
125
|
+
# Update or delete a xapian document belonging to an object depending on the ignore_if logic (if present)
|
126
|
+
# @param [Object] object An instance of a class with a blueprint configuration
|
127
|
+
def self.reindex(object)
|
128
|
+
blueprint = XapianDb::DocumentBlueprint.blueprint_for object.class
|
129
|
+
if blueprint.should_index?(object)
|
130
|
+
XapianDb.index object
|
131
|
+
else
|
132
|
+
XapianDb.delete_doc_with object.xapian_id
|
133
|
+
end
|
134
|
+
end
|
135
|
+
|
102
136
|
# Reindex all objects of a given class
|
103
137
|
# @param [Class] klass The class to reindex
|
104
138
|
# @param [Hash] options Options for reindexing
|
@@ -161,16 +195,3 @@ module XapianDb
|
|
161
195
|
end
|
162
196
|
|
163
197
|
end
|
164
|
-
|
165
|
-
do_not_require = %w(update_stopwords.rb railtie.rb base_adapter.rb beanstalk_writer.rb utilities.rb install_generator.rb)
|
166
|
-
files = Dir.glob("#{File.dirname(__FILE__)}/**/*.rb").reject{|path| do_not_require.include?(File.basename(path))}
|
167
|
-
# Require these first
|
168
|
-
require "#{File.dirname(__FILE__)}/xapian_db/utilities"
|
169
|
-
require "#{File.dirname(__FILE__)}/xapian_db/adapters/base_adapter"
|
170
|
-
files.each {|file| require file}
|
171
|
-
|
172
|
-
# Configure XapianDB if we are in a Rails app
|
173
|
-
require File.dirname(__FILE__) + '/xapian_db/railtie' if defined?(Rails)
|
174
|
-
|
175
|
-
# Require the beanstalk writer is beanstalk-client is installed
|
176
|
-
require File.dirname(__FILE__) + '/xapian_db/index_writers/beanstalk_writer' if Gem.available?('beanstalk-client')
|
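The change above replaces the Gem.available? check with a require that rescues LoadError. A minimal sketch of that pattern follows, as a variation that returns a flag instead of silently swallowing the error. `optional_require` and the gem name "hypothetical_optional_gem_for_sketch" are made up for this example.

```ruby
# Try to load an optional dependency; report whether it is usable.
def optional_require(feature)
  require feature
  true
rescue LoadError
  false
end

# Made-up name that should not resolve, so the rescue branch runs
writer_available = optional_require("hypothetical_optional_gem_for_sketch")

# A standard library feature that is expected to load
set_available = optional_require("set")
```

Rescuing LoadError avoids depending on RubyGems APIs and also works when the gem is provided through other means such as Bundler.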
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: xapian_db
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: '1.
|
4
|
+
version: '1.1'
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,12 +9,12 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2011-
|
12
|
+
date: 2011-09-07 00:00:00.000000000 +02:00
|
13
13
|
default_executable:
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
16
16
|
name: daemons
|
17
|
-
requirement: &
|
17
|
+
requirement: &70318028242380 !ruby/object:Gem::Requirement
|
18
18
|
none: false
|
19
19
|
requirements:
|
20
20
|
- - ! '>='
|
@@ -22,10 +22,10 @@ dependencies:
|
|
22
22
|
version: 1.0.10
|
23
23
|
type: :runtime
|
24
24
|
prerelease: false
|
25
|
-
version_requirements: *
|
25
|
+
version_requirements: *70318028242380
|
26
26
|
- !ruby/object:Gem::Dependency
|
27
27
|
name: xapian-ruby
|
28
|
-
requirement: &
|
28
|
+
requirement: &70318028241920 !ruby/object:Gem::Requirement
|
29
29
|
none: false
|
30
30
|
requirements:
|
31
31
|
- - ! '>='
|
@@ -33,10 +33,10 @@ dependencies:
|
|
33
33
|
version: 1.2.6
|
34
34
|
type: :runtime
|
35
35
|
prerelease: false
|
36
|
-
version_requirements: *
|
36
|
+
version_requirements: *70318028241920
|
37
37
|
- !ruby/object:Gem::Dependency
|
38
38
|
name: rspec
|
39
|
-
requirement: &
|
39
|
+
requirement: &70318028241460 !ruby/object:Gem::Requirement
|
40
40
|
none: false
|
41
41
|
requirements:
|
42
42
|
- - ! '>='
|
@@ -44,10 +44,10 @@ dependencies:
|
|
44
44
|
version: 2.3.1
|
45
45
|
type: :development
|
46
46
|
prerelease: false
|
47
|
-
version_requirements: *
|
47
|
+
version_requirements: *70318028241460
|
48
48
|
- !ruby/object:Gem::Dependency
|
49
49
|
name: simplecov
|
50
|
-
requirement: &
|
50
|
+
requirement: &70318028241000 !ruby/object:Gem::Requirement
|
51
51
|
none: false
|
52
52
|
requirements:
|
53
53
|
- - ! '>='
|
@@ -55,10 +55,10 @@ dependencies:
|
|
55
55
|
version: 0.3.7
|
56
56
|
type: :development
|
57
57
|
prerelease: false
|
58
|
-
version_requirements: *
|
58
|
+
version_requirements: *70318028241000
|
59
59
|
- !ruby/object:Gem::Dependency
|
60
60
|
name: beanstalk-client
|
61
|
-
requirement: &
|
61
|
+
requirement: &70318028240520 !ruby/object:Gem::Requirement
|
62
62
|
none: false
|
63
63
|
requirements:
|
64
64
|
- - ! '>='
|
@@ -66,7 +66,7 @@ dependencies:
|
|
66
66
|
version: 1.1.0
|
67
67
|
type: :development
|
68
68
|
prerelease: false
|
69
|
-
version_requirements: *
|
69
|
+
version_requirements: *70318028240520
|
70
70
|
description: XapianDb is a ruby gem that combines features of nosql databases and
|
71
71
|
fulltext indexing. It is based on Xapian, an efficient and powerful indexing library
|
72
72
|
email: gernot.kogler (at) garaio (dot) com
|
@@ -76,6 +76,7 @@ extra_rdoc_files: []
|
|
76
76
|
files:
|
77
77
|
- lib/generators/install_generator.rb
|
78
78
|
- lib/generators/templates/beanstalk_worker
|
79
|
+
- lib/type_codec.rb
|
79
80
|
- lib/xapian_db/adapters/active_record_adapter.rb
|
80
81
|
- lib/xapian_db/adapters/base_adapter.rb
|
81
82
|
- lib/xapian_db/adapters/datamapper_adapter.rb
|