xapian_db 1.0 → 1.1
- data/CHANGELOG.md +19 -0
- data/README.rdoc +53 -4
- data/lib/type_codec.rb +124 -0
- data/lib/xapian_db/adapters/active_record_adapter.rb +2 -7
- data/lib/xapian_db/adapters/base_adapter.rb +6 -26
- data/lib/xapian_db/config.rb +2 -3
- data/lib/xapian_db/database.rb +16 -17
- data/lib/xapian_db/document_blueprint.rb +77 -34
- data/lib/xapian_db/index_writers/beanstalk_worker.rb +2 -2
- data/lib/xapian_db/indexer.rb +4 -5
- data/lib/xapian_db/query_parser.rb +15 -2
- data/lib/xapian_db/utilities.rb +6 -0
- data/lib/xapian_db.rb +36 -15
- metadata +13 -12
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,22 @@
|
|
1
|
+
##1.1 (September 7th, 2011)
|
2
|
+
|
3
|
+
Fixes:
|
4
|
+
|
5
|
+
- better handling of the beanstalk-client dependency
|
6
|
+
- recreate the xapian index database if the configured path exists but does not contain a valid xapian index
|
7
|
+
- support for non-integer primary keys (removed unnecessary to_i conversion)
|
8
|
+
|
9
|
+
Features:
|
10
|
+
|
11
|
+
- rails sample app upgraded to 3.1
|
12
|
+
- support for value range queries (strings, dates, numbers)
|
13
|
+
- sorting now works on a global query, too (XapianDb.search...)
|
14
|
+
- global facet queries now have the same options as class scoped facet queries
|
15
|
+
- Support for custom serialization into xapian documents; overwrite the serialization implementation in type_codec.rb or implement your own serialization for specific types (see examples/custom_serialization.rb)
|
16
|
+
- support to reindex a single object while evaluating an ignore_if block (if present)
|
17
|
+
|
18
|
+
IMPORTANT: YOU MUST REBUILD YOUR XAPIAN INDEX DATABASE SINCE THE INDEX STRUCTURE HAS CHANGED!
|
19
|
+
|
1
20
|
##1.0 (August 17th, 2011)
|
2
21
|
|
3
22
|
Features:
|
data/README.rdoc
CHANGED
@@ -1,5 +1,9 @@
|
|
1
1
|
= XapianDb
|
2
2
|
|
3
|
+
== Important Information
|
4
|
+
|
5
|
+
If you upgrade from an earlier version of xapian_db to 1.1, you MUST rebuild your entire index (XapianDb.rebuild_xapian_index)!
|
6
|
+
|
3
7
|
== What's in the box?
|
4
8
|
|
5
9
|
XapianDb is a ruby gem that combines features of nosql databases and fulltext indexing into one piece. The result: Rich documents and very fast queries. It is based on {Xapian}[http://xapian.org/], an efficient and powerful indexing library.
|
@@ -116,6 +120,14 @@ You may add a filter expression to exclude objects from the index. This is handy
|
|
116
120
|
blueprint.ignore_if {active == false}
|
117
121
|
end
|
118
122
|
|
123
|
+
You can add type information to an attribute. As of now the special types :string, :date and :number are supported (and required for range queries):
|
124
|
+
|
125
|
+
XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
|
126
|
+
blueprint.attribute :age, :as => :number
|
127
|
+
blueprint.attribute :date_of_birth, :as => :date
|
128
|
+
blueprint.attribute :name, :as => :string
|
129
|
+
end
|
130
|
+
|
119
131
|
You can override the global adapter configuration in a specific blueprint. Let's say you use ActiveRecord, but you have
|
120
132
|
one more class that is not stored in the database, but you want it to be indexed:
|
121
133
|
|
@@ -145,6 +157,10 @@ To rebuild the index for all blueprints, use
|
|
145
157
|
|
146
158
|
XapianDb.rebuild_xapian_index
|
147
159
|
|
160
|
+
You can update the index for a single object, too (e.g. to reevaluate an ignore_if block without modifying and saving the object):
|
161
|
+
|
162
|
+
XapianDb.reindex object
|
163
|
+
|
148
164
|
=== Query the index
|
149
165
|
|
150
166
|
A simple query looks like this:
|
@@ -180,7 +196,26 @@ On class queries you can specifiy order options:
|
|
180
196
|
results = Person.search "name:Foo", :order => :first_name
|
181
197
|
results = Person.search "Fo*", :order => [:name, :first_name], :sort_decending => true
|
182
198
|
|
183
|
-
|
199
|
+
If you define an attribute with a supported type, you can do range searches:
|
200
|
+
|
201
|
+
XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
|
202
|
+
blueprint.attribute :age, :as => :number
|
203
|
+
blueprint.attribute :date_of_birth, :as => :date
|
204
|
+
blueprint.attribute :name, :as => :string
|
205
|
+
end
|
206
|
+
|
207
|
+
result = XapianDb.search("date_of_birth:2011-01-01..2011-12-31")
|
208
|
+
result = XapianDb.search("age:30..40")
|
209
|
+
result = XapianDb.search("name:Adam..Chris")
|
210
|
+
|
211
|
+
Open Ranges are supported, too:
|
212
|
+
|
213
|
+
result = XapianDb.search("age:..40")
|
214
|
+
result = XapianDb.search("age:30..")
|
215
|
+
|
216
|
+
You can combine range query expressions with other expressions:
|
217
|
+
|
218
|
+
result = XapianDb.search("age:30..40 AND city:Aarau")
|
184
219
|
|
185
220
|
=== Process the results
|
186
221
|
|
@@ -216,7 +251,15 @@ Or with kaminari:
|
|
216
251
|
If you want to implement a simple drilldown for your searches, you can use a global facets query:
|
217
252
|
|
218
253
|
search_expression = "Foo"
|
219
|
-
facets = XapianDb.facets(search_expression)
|
254
|
+
facets = XapianDb.facets(:name, search_expression)
|
255
|
+
facets.each do |name, count|
|
256
|
+
puts "#{name}: #{count} hits"
|
257
|
+
end
|
258
|
+
|
259
|
+
If you want the facets based on the indexed class, use the special attribute :indexed_class:
|
260
|
+
|
261
|
+
search_expression = "Foo"
|
262
|
+
facets = XapianDb.facets(:indexed_class, search_expression)
|
220
263
|
facets.each do |klass, count|
|
221
264
|
puts "#{klass.name}: #{count} hits"
|
222
265
|
|
@@ -224,7 +267,7 @@ If you want to implement a simple drilldown for your searches, you can use a glo
|
|
224
267
|
# doc = klass.search search_expression
|
225
268
|
end
|
226
269
|
|
227
|
-
A
|
270
|
+
A class level facet query is possible, too:
|
228
271
|
|
229
272
|
search_expression = "Foo"
|
230
273
|
facets = Person.facets(:name, search_expression)
|
@@ -232,7 +275,7 @@ A global facet search always groups the results by the class of the indexed obje
|
|
232
275
|
puts "#{name}: #{count} hits"
|
233
276
|
end
|
234
277
|
|
235
|
-
|
278
|
+
Any attribute declared in a blueprint can be used for a facet query. Use facet queries on attributes that store atomic values like strings, numbers or dates.
|
236
279
|
If you use it on attributes that contain collections (like an array of strings), you might get unexpected results.
|
237
280
|
|
238
281
|
=== Find similar documents
|
@@ -269,6 +312,12 @@ you can use the auto_indexing_disabled method with a block and rebuild the whole
|
|
269
312
|
end
|
270
313
|
Person.rebuild_xapian_index
|
271
314
|
|
315
|
+
== Add your own serializers for special objects
|
316
|
+
|
317
|
+
XapianDb serializes objects to xapian documents using YAML by default. This way, type information is preserved and you get back what you put into a xapian document, not just a string.
|
318
|
+
|
319
|
+
However, dates need special handling to support date range queries. To support date range queries and allow the addition of other custom data types in the future, XapianDb uses a simple, extensible mechanism to serialize / deserialize your objects. An example on how to extend this mechanism is provided in examples/custom_serialization.rb.
|
320
|
+
|
272
321
|
== Production setup
|
273
322
|
|
274
323
|
Since Xapian allows only one database instance to write to the index, the default setup of XapianDb will not work
|
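The README changes above state that YAML is the default serialization so that type information survives a round trip. The following is a minimal sketch of that idea using only Ruby's standard yaml library, not the gem itself; the sample values are made up for illustration.

```ruby
require "yaml"

# Sample attribute values; purely illustrative
values = [42, 3.5, "Aarau", ["ruby", "xapian"]]

# Round trip through YAML: each value comes back with its original class
round_tripped = values.map { |value| YAML.load(value.to_yaml) }

# Round trip through to_s: every value ends up as a String
stringified = values.map { |value| value.to_s }

round_tripped.each { |value| puts "#{value.inspect} (#{value.class})" }
```

The trade-off is that a YAML string does not sort like the value it represents, which is presumably why the typed codecs added in this release exist for range queries.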
data/lib/type_codec.rb
ADDED
@@ -0,0 +1,124 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
# This class is responsible for encoding and decoding values depending on their
|
4
|
+
# type
|
5
|
+
|
6
|
+
require "bigdecimal"
|
7
|
+
|
8
|
+
module XapianDb
|
9
|
+
|
10
|
+
class TypeCodec
|
11
|
+
|
12
|
+
extend XapianDb::Utilities
|
13
|
+
|
14
|
+
# Get the codec for a type
|
15
|
+
# @param [Symbol] type a supported type as a string or symbol.
|
16
|
+
# The following types are supported:
|
17
|
+
# - :date
|
18
|
+
# @return [DateCodec]
|
19
|
+
def self.codec_for(type)
|
20
|
+
begin
|
21
|
+
constantize "XapianDb::TypeCodec::#{camelize("#{type}_codec")}"
|
22
|
+
rescue NameError
|
23
|
+
raise ArgumentError.new "no codec defined for type #{type}"
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
class GenericCodec
|
28
|
+
|
29
|
+
# Encode an object to its yaml representation
|
30
|
+
# @param [Object] object an object to encode
|
31
|
+
# @return [String] the yaml string
|
32
|
+
def self.encode(object)
|
33
|
+
begin
|
34
|
+
if object.respond_to?(:attributes)
|
35
|
+
object.attributes.to_yaml
|
36
|
+
else
|
37
|
+
object.to_yaml
|
38
|
+
end
|
39
|
+
rescue NoMethodError
|
40
|
+
raise ArgumentError.new "#{object} does not support yaml serialization"
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
# Decode an object from a yaml string
|
45
|
+
# @param [String] yaml_string a yaml string representing the object
|
46
|
+
# @return [Object] the parsed object
|
47
|
+
def self.decode(yaml_string)
|
48
|
+
begin
|
49
|
+
YAML::load yaml_string
|
50
|
+
rescue TypeError
|
51
|
+
raise ArgumentError.new "'#{yaml_string}' cannot be loaded by YAML"
|
52
|
+
end
|
53
|
+
end
|
54
|
+
end
|
55
|
+
|
56
|
+
class StringCodec
|
57
|
+
|
58
|
+
# Encode an object to a string
|
59
|
+
# @param [Object] object an object to encode
|
60
|
+
# @return [String] the string
|
61
|
+
def self.encode(object)
|
62
|
+
object.to_s
|
63
|
+
end
|
64
|
+
|
65
|
+
# Decode a string
|
66
|
+
# @param [String] string a string
|
67
|
+
# @return [String] the string
|
68
|
+
def self.decode(string)
|
69
|
+
string
|
70
|
+
end
|
71
|
+
end
|
72
|
+
|
73
|
+
class DateCodec
|
74
|
+
|
75
|
+
# Encode a date to a string in the format 'yyyymmdd'
|
76
|
+
# @param [Date] date a date object to encode
|
77
|
+
# @return [String] the encoded date
|
78
|
+
def self.encode(date)
|
79
|
+
begin
|
80
|
+
date.strftime "%Y%m%d"
|
81
|
+
rescue NoMethodError
|
82
|
+
raise ArgumentError.new "#{date} was expected to be a date"
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
# Decode a string to a date
|
87
|
+
# @param [String] date_as_string a string representing a date
|
88
|
+
# @return [Date] the parsed date
|
89
|
+
def self.decode(date_as_string)
|
90
|
+
begin
|
91
|
+
Date.parse date_as_string
|
92
|
+
rescue ArgumentError
|
93
|
+
raise ArgumentError.new "'#{date_as_string}' cannot be converted to a date"
|
94
|
+
end
|
95
|
+
end
|
96
|
+
end
|
97
|
+
|
98
|
+
class NumberCodec
|
99
|
+
|
100
|
+
# Encode a number to a sortable string
|
101
|
+
# @param [Integer, BigDecimal, Float] number a number object to encode
|
102
|
+
# @return [String] the encoded number
|
103
|
+
def self.encode(number)
|
104
|
+
begin
|
105
|
+
Xapian::sortable_serialise number
|
106
|
+
rescue TypeError
|
107
|
+
raise ArgumentError.new "#{number} was expected to be a number"
|
108
|
+
end
|
109
|
+
end
|
110
|
+
|
111
|
+
# Decode a string to a BigDecimal
|
112
|
+
# @param [String] number_as_string a string representing a number
|
113
|
+
# @return [BigDecimal] the decoded number
|
114
|
+
def self.decode(encoded_number)
|
115
|
+
begin
|
116
|
+
BigDecimal.new(Xapian::sortable_unserialise(encoded_number).to_s)
|
117
|
+
rescue TypeError
|
118
|
+
raise ArgumentError.new "#{encoded_number} cannot be unserialized"
|
119
|
+
end
|
120
|
+
end
|
121
|
+
end
|
122
|
+
|
123
|
+
end
|
124
|
+
end
|
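The DateCodec above encodes dates as 'yyyymmdd'. A short sketch of why that format is useful for range queries: the encoded strings sort in the same order as the dates themselves. DateCodecSketch is a hypothetical stand-in that mirrors the two methods shown in the diff; it is not the gem's class, and the range check is plain string comparison rather than anything Xapian does internally.

```ruby
require "date"

# Hypothetical stand-in mirroring DateCodec.encode / DateCodec.decode from the diff
module DateCodecSketch
  def self.encode(date)
    date.strftime "%Y%m%d"
  end

  def self.decode(date_as_string)
    Date.parse date_as_string
  end
end

dates   = [Date.new(2011, 9, 7), Date.new(2011, 1, 1), Date.new(2010, 12, 31)]
encoded = dates.map { |date| DateCodecSketch.encode(date) }

# Sorting the encoded strings gives the same order as sorting the dates
sorted_as_strings = encoded.sort
sorted_as_dates   = dates.sort.map { |date| DateCodecSketch.encode(date) }

# A range check on the encoded form is a plain string comparison
in_2011 = encoded.select { |value| value >= "20110101" && value <= "20111231" }
```

A custom codec would follow the same shape: a class-level encode that returns a string and a decode that reverses it.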
data/lib/xapian_db/adapters/active_record_adapter.rb
CHANGED
@@ -37,14 +37,9 @@ module XapianDb
|
|
37
37
|
|
38
38
|
klass.class_eval do
|
39
39
|
|
40
|
-
# add the after
|
40
|
+
# add the after commit logic
|
41
41
|
after_commit do
|
42
|
-
|
43
|
-
if blueprint.should_index?(self)
|
44
|
-
XapianDb.index(self)
|
45
|
-
else
|
46
|
-
XapianDb.delete_doc_with(self.xapian_id)
|
47
|
-
end
|
42
|
+
XapianDb.reindex(self)
|
48
43
|
end
|
49
44
|
|
50
45
|
# add the after destroy logic
|
data/lib/xapian_db/adapters/base_adapter.rb
CHANGED
@@ -33,9 +33,9 @@ module XapianDb
|
|
33
33
|
order = options.delete :order
|
34
34
|
if order
|
35
35
|
attr_names = [order].flatten
|
36
|
-
|
37
|
-
|
38
|
-
options[:sort_indices] = attr_names.map {|attr_name|
|
36
|
+
undefined_attrs = attr_names - XapianDb::DocumentBlueprint.attributes
|
37
|
+
raise ArgumentError.new "invalid order clause: attributes #{undefined_attrs.inspect} are not defined" unless undefined_attrs.empty?
|
38
|
+
options[:sort_indices] = attr_names.map {|attr_name| XapianDb::DocumentBlueprint.value_number_for(attr_name) }
|
39
39
|
end
|
40
40
|
result = XapianDb.database.search "#{class_scope} and (#{expression})", options
|
41
41
|
|
@@ -52,29 +52,9 @@ module XapianDb
|
|
52
52
|
end
|
53
53
|
|
54
54
|
# Add a method to search atribute facets of this class
|
55
|
-
define_singleton_method(:facets) do |
|
56
|
-
|
57
|
-
|
58
|
-
return {} if expression.nil? || expression.strip.empty?
|
59
|
-
|
60
|
-
class_scope = "indexed_class:#{klass.name.downcase}"
|
61
|
-
blueprint = XapianDb::DocumentBlueprint.blueprint_for klass
|
62
|
-
value_index = blueprint.value_index_for attr_name.to_sym
|
63
|
-
|
64
|
-
query_parser = QueryParser.new(XapianDb.database)
|
65
|
-
query = query_parser.parse("#{class_scope} and (#{expression})")
|
66
|
-
enquiry = Xapian::Enquire.new(XapianDb.database.reader)
|
67
|
-
enquiry.query = query
|
68
|
-
enquiry.collapse_key = value_index
|
69
|
-
facets = {}
|
70
|
-
enquiry.mset(0, XapianDb.database.size).matches.each do |match|
|
71
|
-
facet_value = YAML::load match.document.values[value_index].value
|
72
|
-
# We must add 1 to the collapse_count since collapse_count means
|
73
|
-
# "how many other matches are there?"
|
74
|
-
facets[facet_value] = match.collapse_count + 1
|
75
|
-
end
|
76
|
-
facets
|
77
|
-
|
55
|
+
define_singleton_method(:facets) do |attribute, expression|
|
56
|
+
class_scope = "indexed_class:#{klass.name.downcase}"
|
57
|
+
XapianDb.database.facets attribute, "#{class_scope} and (#{expression})"
|
78
58
|
end
|
79
59
|
|
80
60
|
end
|
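The order handling added to the class-level search above normalizes the :order option to an array and rejects attributes that no blueprint declares. A minimal standalone sketch of that check, where validated_order is a hypothetical helper name and defined_attributes stands in for XapianDb::DocumentBlueprint.attributes:

```ruby
# Normalize an :order option and reject attributes that are not declared.
# defined_attributes is an assumption standing in for DocumentBlueprint.attributes.
def validated_order(order, defined_attributes)
  attr_names = [order].flatten            # accept a single symbol or an array
  undefined_attrs = attr_names - defined_attributes
  unless undefined_attrs.empty?
    raise ArgumentError, "invalid order clause: attributes #{undefined_attrs.inspect} are not defined"
  end
  attr_names
end

defined = [:age, :first_name, :name]
single  = validated_order(:name, defined)
several = validated_order([:name, :first_name], defined)
```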
data/lib/xapian_db/config.rb
CHANGED
@@ -67,10 +67,9 @@ module XapianDb
|
|
67
67
|
if path.to_sym == :memory
|
68
68
|
@_database = XapianDb.create_db
|
69
69
|
else
|
70
|
-
|
70
|
+
begin
|
71
71
|
@_database = XapianDb.open_db :path => path
|
72
|
-
|
73
|
-
# Database does not exist; create it
|
72
|
+
rescue IOError
|
74
73
|
@_database = XapianDb.create_db :path => path
|
75
74
|
end
|
76
75
|
end
|
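The config change above wraps the open call in begin/rescue so that a path without a valid index falls back to creating one. The pattern can be sketched without Xapian as follows; open_index and create_index are hypothetical stand-ins for XapianDb.open_db and XapianDb.create_db, and the list of valid paths exists only to drive the example.

```ruby
# Hypothetical stand-in for XapianDb.open_db: fails unless the path holds an index
def open_index(path, valid_paths)
  raise IOError, "#{path} does not contain a valid index" unless valid_paths.include?(path)
  "opened #{path}"
end

# Hypothetical stand-in for XapianDb.create_db
def create_index(path)
  "created #{path}"
end

# Try to open first; fall back to creating the index when opening fails
def open_or_create(path, valid_paths)
  open_index(path, valid_paths)
rescue IOError
  create_index(path)
end

valid_paths = ["/tmp/good_index"]
existing = open_or_create("/tmp/good_index", valid_paths)
missing  = open_or_create("/tmp/empty_dir", valid_paths)
```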
data/lib/xapian_db/database.rb
CHANGED
@@ -75,13 +75,9 @@ module XapianDb
|
|
75
75
|
sort_decending = opts.delete :sort_decending
|
76
76
|
|
77
77
|
if sort_indices
|
78
|
-
|
79
|
-
sorter
|
80
|
-
|
81
|
-
sort_indices.each do |index|
|
82
|
-
sorter.add(index, sort_decending)
|
83
|
-
end
|
84
|
-
enquiry.set_sort_by_key_then_relevance(sorter)
|
78
|
+
sorter = Xapian::MultiValueKeyMaker.new
|
79
|
+
sort_indices.each { |index| sorter.add_value index }
|
80
|
+
enquiry.set_sort_by_key_then_relevance(sorter, sort_decending)
|
85
81
|
end
|
86
82
|
|
87
83
|
opts[:spelling_suggestion] = @query_parser.spelling_suggestion
|
@@ -123,25 +119,28 @@ module XapianDb
|
|
123
119
|
Resultset.new(enquiry, :db_size => self.size)
|
124
120
|
end
|
125
121
|
|
126
|
-
# A very simple implementation of facets
|
122
|
+
# A very simple implementation of facets using Xapian collapse key.
|
123
|
+
# @param [Symbol, String] attribute the name of an attribute declared in one or more blueprints
|
127
124
|
# @param [String] expression A valid search expression (see {#search} for examples).
|
128
125
|
# @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
|
129
|
-
def facets(expression)
|
130
|
-
|
131
|
-
|
132
|
-
|
126
|
+
def facets(attribute, expression)
|
127
|
+
# return an empty hash if no search expression is given
|
128
|
+
return {} if expression.nil? || expression.strip.empty?
|
129
|
+
value_number = XapianDb::DocumentBlueprint.value_number_for(attribute)
|
130
|
+
query_parser = QueryParser.new(XapianDb.database)
|
131
|
+
query = query_parser.parse(expression)
|
132
|
+
enquiry = Xapian::Enquire.new(XapianDb.database.reader)
|
133
133
|
enquiry.query = query
|
134
|
-
enquiry.collapse_key =
|
134
|
+
enquiry.collapse_key = value_number
|
135
135
|
facets = {}
|
136
|
-
enquiry.mset(0,
|
137
|
-
|
136
|
+
enquiry.mset(0, XapianDb.database.size).matches.each do |match|
|
137
|
+
facet_value = YAML::load match.document.value(value_number)
|
138
138
|
# We must add 1 to the collapse_count since collapse_count means
|
139
139
|
# "how many other matches are there?"
|
140
|
-
facets[
|
140
|
+
facets[facet_value] = match.collapse_count + 1
|
141
141
|
end
|
142
142
|
facets
|
143
143
|
end
|
144
|
-
|
145
144
|
end
|
146
145
|
|
147
146
|
# In Memory database
|
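The facets method above relies on collapsing matches by a value slot and then adding 1 to the collapse count. The counting logic can be sketched in plain Ruby as below; the list of matches is made-up data in which each entry is simply the facet value of one matching document, and no Xapian calls are involved.

```ruby
# Each entry is the facet value stored in one matching document (made-up data)
matches = ["Kogler", "Foo", "Kogler", "Kogler", "Foo", "Bar"]

# Simulate collapsing: keep one representative per value and count the others
collapsed = {}
matches.each do |value|
  if collapsed.key?(value)
    collapsed[value] += 1   # one more match hidden behind the representative
  else
    collapsed[value] = 0    # representative found, nothing collapsed yet
  end
end

# The collapse count means "how many other matches are there?", so add 1
facets = {}
collapsed.each { |value, collapse_count| facets[value] = collapse_count + 1 }
```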
data/lib/xapian_db/document_blueprint.rb
CHANGED
@@ -35,12 +35,17 @@ module XapianDb
|
|
35
35
|
@blueprints ||= {}
|
36
36
|
blueprint = DocumentBlueprint.new
|
37
37
|
yield blueprint if block_given? # configure the blueprint through the block
|
38
|
+
validate_type_consistency_on blueprint
|
38
39
|
# Remove a previously loaded blueprint for this class to avoid stale blueprint definitions
|
39
|
-
@blueprints.delete_if { |
|
40
|
+
@blueprints.delete_if { |indexed_class, blueprint| indexed_class.name == klass.name }
|
40
41
|
@blueprints[klass] = blueprint
|
41
42
|
@_adapter = blueprint._adapter || XapianDb::Config.adapter || Adapters::GenericAdapter
|
42
43
|
@_adapter.add_class_helper_methods_to klass
|
43
|
-
|
44
|
+
|
45
|
+
@searchable_prefixes = @blueprints.values.map { |blueprint| blueprint.searchable_prefixes }.flatten.compact.uniq || []
|
46
|
+
# We can always do a field search on the name of the indexed class
|
47
|
+
@searchable_prefixes << "indexed_class"
|
48
|
+
@attributes = @blueprints.values.map { |blueprint| blueprint.attribute_names}.flatten.compact.uniq.sort || []
|
44
49
|
end
|
45
50
|
|
46
51
|
# Get all configured classes
|
@@ -63,14 +68,54 @@ module XapianDb
|
|
63
68
|
raise "Blueprint for class #{klass} is not defined"
|
64
69
|
end
|
65
70
|
|
71
|
+
# Get the value number for an attribute. Please note that this is not the index in the values
|
72
|
+
# array of a xapian document but the valueno. Therefore, document.values[value_number] returns
|
73
|
+
# the wrong data, use document.value(value_number) instead.
|
74
|
+
# @param [attribute] The name of an attribute
|
75
|
+
# @return [Integer] The value number
|
76
|
+
def value_number_for(attribute)
|
77
|
+
raise ArgumentError.new "attribute #{attribute} is not configured in any blueprint" if @attributes.nil?
|
78
|
+
return 0 if attribute.to_sym == :indexed_class
|
79
|
+
position = @attributes.index attribute.to_sym
|
80
|
+
if position
|
81
|
+
# We add 1 because value slot 0 is reserved for the class name
|
82
|
+
return position + 1
|
83
|
+
else
|
84
|
+
raise ArgumentError.new "attribute #{attribute} is not configured in any blueprint"
|
85
|
+
end
|
86
|
+
end
|
87
|
+
|
88
|
+
# Get the type info of an attribute
|
89
|
+
# @param [attribute] The name of an indexed method
|
90
|
+
# @return [Symbol] The defined type or :untyped if no type is defined
|
91
|
+
def type_info_for(attribute)
|
92
|
+
return nil if @blueprints.nil?
|
93
|
+
@blueprints.values.each do |blueprint|
|
94
|
+
return blueprint.type_map[attribute] if blueprint.type_map.has_key?(attribute)
|
95
|
+
end
|
96
|
+
nil
|
97
|
+
end
|
98
|
+
|
66
99
|
# Return an array of all configured text methods in any blueprint
|
67
100
|
# @return [Array<String>] All searchable prefixes
|
68
101
|
def searchable_prefixes
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
102
|
+
@searchable_prefixes || []
|
103
|
+
end
|
104
|
+
|
105
|
+
# Return an array of all defined attributes
|
106
|
+
# @return [Array<Symbol>] All defined attributes
|
107
|
+
def attributes
|
108
|
+
@attributes || []
|
109
|
+
end
|
110
|
+
|
111
|
+
private
|
112
|
+
|
113
|
+
def validate_type_consistency_on(blueprint)
|
114
|
+
blueprint.type_map.each do |method_name, type|
|
115
|
+
if type_info_for(method_name) && type_info_for(method_name) != type
|
116
|
+
raise ArgumentError.new "ambigous type definition for #{method_name} detected (#{type_info_for(method_name)}, #{type})"
|
117
|
+
end
|
118
|
+
end
|
74
119
|
end
|
75
120
|
|
76
121
|
end
|
@@ -79,6 +124,8 @@ module XapianDb
|
|
79
124
|
# Instance methods
|
80
125
|
# ---------------------------------------------------------------------------------
|
81
126
|
|
127
|
+
attr_reader :type_map
|
128
|
+
|
82
129
|
# Get the names of all configured attributes sorted alphabetically
|
83
130
|
# @return [Array<Symbol>] The names of the attributes
|
84
131
|
def attribute_names
|
@@ -89,7 +136,7 @@ module XapianDb
|
|
89
136
|
# @param [Symbol] attribute The name of the attribute
|
90
137
|
# @return [Block] The block
|
91
138
|
def block_for_attribute(attribute)
|
92
|
-
@attributes_hash[attribute]
|
139
|
+
@attributes_hash[attribute][:block]
|
93
140
|
end
|
94
141
|
|
95
142
|
# Get the names of all configured index methods sorted alphabetically
|
@@ -105,22 +152,10 @@ module XapianDb
|
|
105
152
|
@indexed_methods_hash[method]
|
106
153
|
end
|
107
154
|
|
108
|
-
# Return the value index of an attribute. Needed to access the value of an attribute
|
109
|
-
# from a Xapian document.
|
110
|
-
# @param [String, Symbol] attribute_name The name of the attribute
|
111
|
-
# @return [Integer] The value index of the attribute
|
112
|
-
# @raise ArgumentError if the attribute name is unknown
|
113
|
-
def value_index_for(attribute_name)
|
114
|
-
index = attribute_names.index attribute_name.to_sym
|
115
|
-
raise ArgumentError.new("Attribute #{attribute_name} unknown") unless index
|
116
|
-
# We add 1 because value slot 0 is reserved for the class name
|
117
|
-
index + 1
|
118
|
-
end
|
119
|
-
|
120
155
|
# Return an array of all configured text methods in this blueprint
|
121
156
|
# @return [Array<String>] All searchable prefixes
|
122
157
|
def searchable_prefixes
|
123
|
-
@
|
158
|
+
@searchable_prefixes ||= indexed_method_names
|
124
159
|
end
|
125
160
|
|
126
161
|
# Should the object go into the index? Evaluates an ignore expression,
|
@@ -151,10 +186,11 @@ module XapianDb
|
|
151
186
|
|
152
187
|
# Add an accessor for each attribute
|
153
188
|
attribute_names.each do |attribute|
|
154
|
-
index =
|
189
|
+
index = DocumentBlueprint.value_number_for(attribute)
|
190
|
+
codec = XapianDb::TypeCodec.codec_for @type_map[attribute]
|
155
191
|
@accessors_module.instance_eval do
|
156
192
|
define_method attribute do
|
157
|
-
|
193
|
+
codec.decode self.value(index)
|
158
194
|
end
|
159
195
|
end
|
160
196
|
end
|
@@ -174,8 +210,9 @@ module XapianDb
|
|
174
210
|
|
175
211
|
# Construct the blueprint
|
176
212
|
def initialize
|
177
|
-
@attributes_hash =
|
213
|
+
@attributes_hash = {}
|
178
214
|
@indexed_methods_hash = {}
|
215
|
+
@type_map = {}
|
179
216
|
end
|
180
217
|
|
181
218
|
# Set the adapter
|
@@ -194,6 +231,7 @@ module XapianDb
|
|
194
231
|
# @param [Hash] options
|
195
232
|
# @option options [Integer] :weight (1) The weight for this attribute.
|
196
233
|
# @option options [Boolean] :index (true) Should the attribute be indexed?
|
234
|
+
# @option options [Symbol] :as should add type info for range queries (:date, :numeric)
|
197
235
|
# @example For complex attribute configurations you may pass a block:
|
198
236
|
# XapianDb::DocumentBlueprint.setup(IndexedObject) do |blueprint|
|
199
237
|
# blueprint.attribute :complex do
|
@@ -206,13 +244,15 @@ module XapianDb
|
|
206
244
|
# end
|
207
245
|
def attribute(name, options={}, &block)
|
208
246
|
raise ArgumentError.new("You cannot use #{name} as an attribute name since it is a reserved method name of Xapian::Document") if reserved_method_name?(name)
|
209
|
-
|
247
|
+
do_not_index = options.delete(:index) == false
|
248
|
+
@type_map[name] = (options.delete(:as) || :generic)
|
249
|
+
|
210
250
|
if block_given?
|
211
|
-
@attributes_hash[name] = block
|
251
|
+
@attributes_hash[name] = {:block => block}.merge(options)
|
212
252
|
else
|
213
|
-
@attributes_hash[name] =
|
253
|
+
@attributes_hash[name] = options
|
214
254
|
end
|
215
|
-
self.index(name,
|
255
|
+
self.index(name, options, &block) unless do_not_index
|
216
256
|
end
|
217
257
|
|
218
258
|
# Add a list of attributes to the blueprint. Attributes will be stored in the xapian documents ans
|
@@ -221,7 +261,8 @@ module XapianDb
|
|
221
261
|
def attributes(*attributes)
|
222
262
|
attributes.each do |attr|
|
223
263
|
raise ArgumentError.new("You cannot use #{attr} as an attribute name since it is a reserved method name of Xapian::Document") if reserved_method_name?(attr)
|
224
|
-
@attributes_hash[attr] =
|
264
|
+
@attributes_hash[attr] = {}
|
265
|
+
@type_map[attr] = :generic
|
225
266
|
self.index attr
|
226
267
|
end
|
227
268
|
end
|
@@ -249,7 +290,9 @@ module XapianDb
|
|
249
290
|
when 2
|
250
291
|
# Is it a method name with options?
|
251
292
|
if args.last.is_a? Hash
|
252
|
-
|
293
|
+
options = args.last
|
294
|
+
assert_valid_keys options, :weight
|
295
|
+
@indexed_methods_hash[args.first] = IndexOptions.new(options.merge(:block => block))
|
253
296
|
else
|
254
297
|
add_indexes_from args
|
255
298
|
end
|
@@ -266,16 +309,16 @@ module XapianDb
|
|
266
309
|
# Options for an indexed method
|
267
310
|
class IndexOptions
|
268
311
|
|
269
|
-
|
270
|
-
attr_accessor :weight, :block
|
312
|
+
attr_reader :weight, :block
|
271
313
|
|
272
314
|
# Constructor
|
273
315
|
# @param [Hash] options
|
274
316
|
# @option options [Integer] :weight (1) The weight for the indexed value
|
275
|
-
def initialize(options)
|
317
|
+
def initialize(options = {})
|
276
318
|
@weight = options[:weight] || 1
|
277
319
|
@block = options[:block]
|
278
320
|
end
|
321
|
+
|
279
322
|
end
|
280
323
|
|
281
324
|
private
|
@@ -295,4 +338,4 @@ module XapianDb
|
|
295
338
|
|
296
339
|
end
|
297
340
|
|
298
|
-
end
|
341
|
+
end
|
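The new class-level value_number_for above assigns value numbers from the sorted, de-duplicated attribute names of all blueprints, with slot 0 reserved for the class name. A standalone sketch of that numbering, where the blueprints hash (class name to attribute list) is a simplified assumption and not the gem's data structure:

```ruby
# blueprints maps a class name to its attribute names (simplified assumption)
def value_number_for(attribute, blueprints)
  return 0 if attribute.to_sym == :indexed_class   # slot 0 holds the class name
  all_attributes = blueprints.values.flatten.uniq.sort
  position = all_attributes.index(attribute.to_sym)
  raise ArgumentError, "attribute #{attribute} is not configured in any blueprint" unless position
  position + 1
end

blueprints = {
  "Person" => [:name, :age],
  "City"   => [:name, :zip]
}

name_slot = value_number_for(:name, blueprints)
zip_slot  = value_number_for("zip", blueprints)
```

Because the numbering is global and alphabetical, declaring a new attribute in any blueprint can shift the value numbers of existing attributes.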
data/lib/xapian_db/index_writers/beanstalk_worker.rb
CHANGED
@@ -3,7 +3,7 @@
|
|
3
3
|
module XapianDb
|
4
4
|
module IndexWriters
|
5
5
|
|
6
|
-
# Worker to update the Xapian index; the worker is used in the beanstalk worker
|
6
|
+
# Worker to update the Xapian index; the worker is used in the beanstalk worker script
|
7
7
|
# and uses the DirectWriter to do the real work
|
8
8
|
# @author Gernot Kogler
|
9
9
|
class BeanstalkWorker
|
@@ -12,7 +12,7 @@ module XapianDb
|
|
12
12
|
|
13
13
|
def index_task(options)
|
14
14
|
klass = constantize options[:class]
|
15
|
-
obj = klass.respond_to?(:get) ? klass.get(options[:id]
|
15
|
+
obj = klass.respond_to?(:get) ? klass.get(options[:id]) : klass.find(options[:id])
|
16
16
|
DirectWriter.index obj
|
17
17
|
end
|
18
18
|
|
data/lib/xapian_db/indexer.rb
CHANGED
@@ -43,10 +43,9 @@ module XapianDb
|
|
43
43
|
value = @obj.send(attribute)
|
44
44
|
end
|
45
45
|
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
@xapian_doc.add_value(@blueprint.value_index_for(attribute), yaml)
|
46
|
+
codec = XapianDb::TypeCodec.codec_for @blueprint.type_map[attribute]
|
47
|
+
encoded_string = codec.encode value
|
48
|
+
@xapian_doc.add_value DocumentBlueprint.value_number_for(attribute), encoded_string
|
50
49
|
end
|
51
50
|
end
|
52
51
|
|
@@ -105,4 +104,4 @@ module XapianDb
|
|
105
104
|
|
106
105
|
end
|
107
106
|
|
108
|
-
end
|
107
|
+
end
|
data/lib/xapian_db/query_parser.rb
CHANGED
@@ -39,7 +39,20 @@ module XapianDb
|
|
39
39
|
|
40
40
|
# Add the searchable prefixes to allow searches by field
|
41
41
|
# (like "name:Kogler")
|
42
|
-
XapianDb::DocumentBlueprint.searchable_prefixes.each
|
42
|
+
XapianDb::DocumentBlueprint.searchable_prefixes.each do |prefix|
|
43
|
+
parser.add_prefix(prefix.to_s.downcase, "X#{prefix.to_s.upcase}")
|
44
|
+
type_info = XapianDb::DocumentBlueprint.type_info_for(prefix)
|
45
|
+
next if type_info.nil? || type_info == :generic
|
46
|
+
value_number = XapianDb::DocumentBlueprint.value_number_for(prefix)
|
47
|
+
case type_info
|
48
|
+
when :date
|
49
|
+
parser.add_valuerangeprocessor Xapian::DateValueRangeProcessor.new(value_number, "#{prefix}:")
|
50
|
+
when :number
|
51
|
+
parser.add_valuerangeprocessor Xapian::NumberValueRangeProcessor.new(value_number, "#{prefix}:")
|
52
|
+
when :string
|
53
|
+
parser.add_valuerangeprocessor Xapian::StringValueRangeProcessor.new(value_number, "#{prefix}:")
|
54
|
+
end
|
55
|
+
end
|
43
56
|
query = parser.parse_query(expression, @query_flags)
|
44
57
|
@spelling_suggestion = parser.get_corrected_query_string.force_encoding("UTF-8")
|
45
58
|
@spelling_suggestion = nil if @spelling_suggestion.empty?
|
@@ -48,4 +61,4 @@ module XapianDb
|
|
48
61
|
|
49
62
|
end
|
50
63
|
|
51
|
-
end
|
64
|
+
end
|
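The query parser change above registers a value range processor per typed attribute, and numbers are stored through Xapian::sortable_serialise rather than as decimal strings. The sketch below illustrates why a sortable encoding matters for a query like "age:5..40". It uses zero padding as a simple stand-in and makes no claim about how Xapian's own serialization works; it also handles non-negative integers only.

```ruby
ages = [9, 10, 100, 30]

# Naive encoding: compare the decimal strings directly
naive_hits = ages.select { |age| age.to_s >= "5" && age.to_s <= "40" }

# Stand-in sortable encoding: fixed-width, zero-padded (non-negative integers only)
encode = lambda { |number| format("%010d", number) }
lower, upper = encode.call(5), encode.call(40)
padded_hits = ages.select { |age| encode.call(age).between?(lower, upper) }

puts "naive:  #{naive_hits.inspect}"
puts "padded: #{padded_hits.inspect}"
```

With plain decimal strings the comparison is lexicographic, so the naive range check misses every value here, while the padded encoding finds 9, 10 and 30.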
data/lib/xapian_db/utilities.rb
CHANGED
@@ -24,5 +24,11 @@ module XapianDb
|
|
24
24
|
constant
|
25
25
|
end
|
26
26
|
|
27
|
+
# Taken from Rails
|
28
|
+
def assert_valid_keys(hash, *valid_keys)
|
29
|
+
unknown_keys = hash.keys - [valid_keys].flatten
|
30
|
+
raise(ArgumentError, "Unsupported option(s) detected: #{unknown_keys.join(", ")}") unless unknown_keys.empty?
|
31
|
+
end
|
32
|
+
|
27
33
|
end
|
28
34
|
end
|
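The assert_valid_keys helper added above is used in the blueprint to reject unsupported index options. A small usage sketch follows, with the method body copied from the diff and placed at top level so it runs on its own; the :boost option is a made-up example of an unsupported key.

```ruby
# Method body as shown in the diff, defined at top level for this sketch
def assert_valid_keys(hash, *valid_keys)
  unknown_keys = hash.keys - [valid_keys].flatten
  raise(ArgumentError, "Unsupported option(s) detected: #{unknown_keys.join(", ")}") unless unknown_keys.empty?
end

# Only supported keys: passes silently
assert_valid_keys({ :weight => 10 }, :weight)

# An unsupported key (:boost is hypothetical) raises an ArgumentError
message = nil
begin
  assert_valid_keys({ :weight => 10, :boost => 2 }, :weight)
rescue ArgumentError => e
  message = e.message
end
```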
data/lib/xapian_db.rb
CHANGED
@@ -9,6 +9,22 @@
|
|
9
9
|
require 'xapian'
|
10
10
|
require 'yaml'
|
11
11
|
|
12
|
+
do_not_require = %w(update_stopwords.rb railtie.rb base_adapter.rb beanstalk_writer.rb utilities.rb install_generator.rb)
|
13
|
+
files = Dir.glob("#{File.dirname(__FILE__)}/**/*.rb").reject{|path| do_not_require.include?(File.basename(path))}
|
14
|
+
# Require these first
|
15
|
+
require "#{File.dirname(__FILE__)}/xapian_db/utilities"
|
16
|
+
require "#{File.dirname(__FILE__)}/xapian_db/adapters/base_adapter"
|
17
|
+
files.each {|file| require file}
|
18
|
+
|
19
|
+
# Configure XapianDB if we are in a Rails app
|
20
|
+
require File.dirname(__FILE__) + '/xapian_db/railtie' if defined?(Rails)
|
21
|
+
|
22
|
+
# Try to require the beanstalk writer (depends on beanstalk-client)
|
23
|
+
begin
|
24
|
+
require File.dirname(__FILE__) + '/xapian_db/index_writers/beanstalk_writer'
|
25
|
+
rescue LoadError
|
26
|
+
end
|
27
|
+
|
12
28
|
module XapianDb
|
13
29
|
|
14
30
|
# Supported languages
|
@@ -75,14 +91,21 @@ module XapianDb
|
|
75
91
|
# See {XapianDb::Database#search} for options
|
76
92
|
# @return [XapianDb::Resultset]
|
77
93
|
def self.search(expression, options={})
|
94
|
+
order = options.delete :order
|
95
|
+
if order
|
96
|
+
attr_names = [order].flatten
|
97
|
+
undefined_attrs = attr_names - XapianDb::DocumentBlueprint.attributes
|
98
|
+
raise ArgumentError.new "invalid order clause: attributes #{undefined_attrs.inspect} are not defined" unless undefined_attrs.empty?
|
99
|
+
options[:sort_indices] = attr_names.map {|attr_name| XapianDb::DocumentBlueprint.value_number_for(attr_name) }
|
100
|
+
end
|
78
101
|
XapianDb::Config.database.search(expression, options)
|
79
102
|
end
|
80
103
|
|
81
104
|
# Get facets from the configured database.
|
82
105
|
# See {XapianDb::Database#facets} for options
|
83
106
|
# @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
|
84
|
-
def self.facets(expression)
|
85
|
-
XapianDb::Config.database.facets
|
107
|
+
def self.facets(attribute, expression)
|
108
|
+
XapianDb::Config.database.facets attribute, expression
|
86
109
|
end
|
87
110
|
|
88
111
|
# Update an object in the index
|
@@ -99,6 +122,17 @@ module XapianDb
|
|
99
122
|
writer.delete_doc_with xapian_id
|
100
123
|
end
|
101
124
|
|
125
|
+
# Update or delete a xapian document belonging to an object depending on the ignore_if logic(if present)
|
126
|
+
# @param [Object] object An instance of a class with a blueprint configuration
|
127
|
+
def self.reindex(object)
|
128
|
+
blueprint = XapianDb::DocumentBlueprint.blueprint_for object.class
|
129
|
+
if blueprint.should_index?(object)
|
130
|
+
XapianDb.index object
|
131
|
+
else
|
132
|
+
XapianDb.delete_doc_with object.xapian_id
|
133
|
+
end
|
134
|
+
end
|
135
|
+
|
102
136
|
# Reindex all objects of a given class
|
103
137
|
# @param [Class] klass The class to reindex
|
104
138
|
# @param [Hash] options Options for reindexing
|
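The `reindex` method added in the hunk above either indexes an object or removes its document, depending on `should_index?`. The sketch below reproduces only that branch in isolation. `BlueprintSketch`, `IndexSketch` and `Person` are hypothetical stand-ins, not part of xapian_db, and the assumption that the ignore_if block receives the object as an argument is made here for simplicity; the gem may evaluate the block differently.

```ruby
# Hypothetical stand-in for a blueprint: an object is indexed unless the
# optional ignore_if block returns true for it (assumption, see lead-in).
class BlueprintSketch
  def initialize(&ignore_if)
    @ignore_if = ignore_if
  end

  def should_index?(object)
    return true unless @ignore_if
    !@ignore_if.call(object)
  end
end

# Hypothetical in-memory index keyed by xapian_id; it stands in for the
# configured writer so that the branch can be observed without Xapian.
class IndexSketch
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def index(object)
    @docs[object.xapian_id] = object
  end

  def delete_doc_with(xapian_id)
    @docs.delete(xapian_id)
  end

  # Same decision as the added XapianDb.reindex: update or delete.
  def reindex(object, blueprint)
    if blueprint.should_index?(object)
      index object
    else
      delete_doc_with object.xapian_id
    end
  end
end

Person = Struct.new(:xapian_id, :active)

blueprint = BlueprintSketch.new { |person| !person.active }
index     = IndexSketch.new
person    = Person.new("Person-1", true)

index.reindex(person, blueprint)
indexed_while_active = index.docs.key?("Person-1")

person.active = false
index.reindex(person, blueprint)
indexed_after_deactivation = index.docs.key?("Person-1")
```

Calling `reindex` a second time after the object starts matching the ignore condition removes its document rather than leaving a stale entry behind, which appears to be the point of the new method.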
@@ -161,16 +195,3 @@ module XapianDb
|
|
161
195
|
end
|
162
196
|
|
163
197
|
end
|
164
|
-
|
165
|
-
do_not_require = %w(update_stopwords.rb railtie.rb base_adapter.rb beanstalk_writer.rb utilities.rb install_generator.rb)
|
166
|
-
files = Dir.glob("#{File.dirname(__FILE__)}/**/*.rb").reject{|path| do_not_require.include?(File.basename(path))}
|
167
|
-
# Require these first
|
168
|
-
require "#{File.dirname(__FILE__)}/xapian_db/utilities"
|
169
|
-
require "#{File.dirname(__FILE__)}/xapian_db/adapters/base_adapter"
|
170
|
-
files.each {|file| require file}
|
171
|
-
|
172
|
-
# Configure XapianDB if we are in a Rails app
|
173
|
-
require File.dirname(__FILE__) + '/xapian_db/railtie' if defined?(Rails)
|
174
|
-
|
175
|
-
# Require the beanstalk writer is beanstalk-client is installed
|
176
|
-
require File.dirname(__FILE__) + '/xapian_db/index_writers/beanstalk_writer' if Gem.available?('beanstalk-client')
|
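The lines added to `XapianDb.search` in the diff above translate an `:order` option into `:sort_indices` before delegating to the database. A minimal sketch of that translation follows. `BlueprintStub` and `resolve_order!` are hypothetical names used only for illustration, and the slot numbering inside the stub is an assumption; the real `XapianDb::DocumentBlueprint` derives both values from the configured blueprints.

```ruby
# Hypothetical stand-in for XapianDb::DocumentBlueprint.
module BlueprintStub
  ATTRIBUTES = [:name, :date_of_birth, :city].freeze

  def self.attributes
    ATTRIBUTES
  end

  # Assumption: value slots follow the attribute position, offset by 1.
  def self.value_number_for(attr_name)
    ATTRIBUTES.index(attr_name) + 1
  end
end

# Mirrors the added lines: remove :order from the options, reject unknown
# attributes, and store the matching value numbers under :sort_indices.
def resolve_order!(options, blueprint = BlueprintStub)
  order = options.delete :order
  return options unless order

  attr_names      = [order].flatten
  undefined_attrs = attr_names - blueprint.attributes
  unless undefined_attrs.empty?
    raise ArgumentError, "invalid order clause: attributes #{undefined_attrs.inspect} are not defined"
  end

  options[:sort_indices] = attr_names.map { |attr_name| blueprint.value_number_for(attr_name) }
  options
end

# A single attribute and a list of attributes are both accepted,
# because [order].flatten normalizes either form to an array.
single = resolve_order!({ :order => :city })
multi  = resolve_order!({ :order => [:name, :city], :per_page => 10 })
```

Other options such as `:per_page` pass through untouched, so the database only ever sees `:sort_indices` and never the attribute names.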
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: xapian_db
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: '1.
|
4
|
+
version: '1.1'
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,12 +9,12 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2011-
|
12
|
+
date: 2011-09-07 00:00:00.000000000 +02:00
|
13
13
|
default_executable:
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
16
16
|
name: daemons
|
17
|
-
requirement: &
|
17
|
+
requirement: &70318028242380 !ruby/object:Gem::Requirement
|
18
18
|
none: false
|
19
19
|
requirements:
|
20
20
|
- - ! '>='
|
@@ -22,10 +22,10 @@ dependencies:
|
|
22
22
|
version: 1.0.10
|
23
23
|
type: :runtime
|
24
24
|
prerelease: false
|
25
|
-
version_requirements: *
|
25
|
+
version_requirements: *70318028242380
|
26
26
|
- !ruby/object:Gem::Dependency
|
27
27
|
name: xapian-ruby
|
28
|
-
requirement: &
|
28
|
+
requirement: &70318028241920 !ruby/object:Gem::Requirement
|
29
29
|
none: false
|
30
30
|
requirements:
|
31
31
|
- - ! '>='
|
@@ -33,10 +33,10 @@ dependencies:
|
|
33
33
|
version: 1.2.6
|
34
34
|
type: :runtime
|
35
35
|
prerelease: false
|
36
|
-
version_requirements: *
|
36
|
+
version_requirements: *70318028241920
|
37
37
|
- !ruby/object:Gem::Dependency
|
38
38
|
name: rspec
|
39
|
-
requirement: &
|
39
|
+
requirement: &70318028241460 !ruby/object:Gem::Requirement
|
40
40
|
none: false
|
41
41
|
requirements:
|
42
42
|
- - ! '>='
|
@@ -44,10 +44,10 @@ dependencies:
|
|
44
44
|
version: 2.3.1
|
45
45
|
type: :development
|
46
46
|
prerelease: false
|
47
|
-
version_requirements: *
|
47
|
+
version_requirements: *70318028241460
|
48
48
|
- !ruby/object:Gem::Dependency
|
49
49
|
name: simplecov
|
50
|
-
requirement: &
|
50
|
+
requirement: &70318028241000 !ruby/object:Gem::Requirement
|
51
51
|
none: false
|
52
52
|
requirements:
|
53
53
|
- - ! '>='
|
@@ -55,10 +55,10 @@ dependencies:
|
|
55
55
|
version: 0.3.7
|
56
56
|
type: :development
|
57
57
|
prerelease: false
|
58
|
-
version_requirements: *
|
58
|
+
version_requirements: *70318028241000
|
59
59
|
- !ruby/object:Gem::Dependency
|
60
60
|
name: beanstalk-client
|
61
|
-
requirement: &
|
61
|
+
requirement: &70318028240520 !ruby/object:Gem::Requirement
|
62
62
|
none: false
|
63
63
|
requirements:
|
64
64
|
- - ! '>='
|
@@ -66,7 +66,7 @@ dependencies:
|
|
66
66
|
version: 1.1.0
|
67
67
|
type: :development
|
68
68
|
prerelease: false
|
69
|
-
version_requirements: *
|
69
|
+
version_requirements: *70318028240520
|
70
70
|
description: XapianDb is a ruby gem that combines features of nosql databases and
|
71
71
|
fulltext indexing. It is based on Xapian, an efficient and powerful indexing library
|
72
72
|
email: gernot.kogler (at) garaio (dot) com
|
@@ -76,6 +76,7 @@ extra_rdoc_files: []
|
|
76
76
|
files:
|
77
77
|
- lib/generators/install_generator.rb
|
78
78
|
- lib/generators/templates/beanstalk_worker
|
79
|
+
- lib/type_codec.rb
|
79
80
|
- lib/xapian_db/adapters/active_record_adapter.rb
|
80
81
|
- lib/xapian_db/adapters/base_adapter.rb
|
81
82
|
- lib/xapian_db/adapters/datamapper_adapter.rb
|