xapian-fu 1.3.2 → 1.4.0

data/CHANGELOG.rdoc CHANGED
@@ -1,3 +1,16 @@
1
+ === 1.4.0 (13th March 2012)
2
+
3
+ * Support for indexing Arrays properly
4
+
5
+ * Support returning all and no documents by using Xapian's special
6
+ queries `MatchAll` and `MatchNothing`.
7
+
8
+ * Add boolean terms and faceted queries.
9
+
10
+ See [http://bit.ly/rMuA4M](http://bit.ly/rMuA4M).
11
+
12
+ * Fix number range queries when no prefixes are given.
13
+
1
14
  === 1.3.2 (4th December 2011)
2
15
 
3
16
  * Number range queries (Damian Janowski)
data/README.rdoc CHANGED
@@ -17,6 +17,14 @@ Hash with full text indexing (and ACID transactions).
17
17
 
18
18
  sudo gem install xapian-fu
19
19
 
20
+ === Xapian Bindings
21
+
22
+ Xapian Fu requires the Xapian Ruby bindings to be available. On
23
+ Debian/Ubuntu, you can install the `libxapian-ruby1.8` package to get
24
+ them. Alternatively, you can install the `xapian-ruby` gem, which
25
+ reportedly will also provide them. You can also just get the
26
+ upstream Xapian release and manually install it.
27
+
20
28
  == Documentation
21
29
 
22
30
  XapianFu::XapianDb is the corner-stone of XapianFu. A XapianDb
@@ -97,7 +105,7 @@ type of object is instantiated when returning those stored values.
97
105
  And in the case of Fixnum and Bignum, allows you to order search
98
106
  results without worrying about leading zeros.
99
107
 
100
- db = XapianDb.new(:fields => {
108
+ db = XapianDb.new(:fields => {
101
109
  :title => { :store => true },
102
110
  :released => { :type => Date, :store => true },
103
111
  :votes => { :type => Fixnum, :store => true }
@@ -113,6 +121,13 @@ Find the document with the highest :year value
113
121
 
114
122
  db.documents.max(:year)
115
123
 
124
+ == Special queries
125
+
126
+ XapianFu supports Xapian's `MatchAll` and `MatchNothing` queries:
127
+
128
+ db.search(:all)
129
+ db.search(:nothing)
130
+
116
131
  == Search examples
117
132
 
118
133
  Search on particular fields
@@ -145,6 +160,54 @@ And any combinations of the above:
145
160
 
146
161
  db.search("(ruby OR sinatra) -rails xap*")
147
162
 
163
+ == Boolean terms
164
+
165
+ If you want to implement boolean filters like the ones described in the [Getting Started with Xapian guide](http://getting-started-with-xapian.readthedocs.org/en/latest/howtos/boolean_filters.html#searching),
166
+ then:
167
+
168
+ db = XapianDb.new(
169
+ fields: {
170
+ name: {:index => true},
171
+ colors: {:boolean => true}
172
+ }
173
+ )
174
+
175
+ db << {name: "Foo", colors: ["red", "black"]}
176
+ db << {name: "Foo", colors: ["red", "green"]}
177
+ db << {name: "Foo", colors: ["blue", "yellow"]}
178
+
179
+ db.search("foo", filter: {:colors => ["red"]})
180
+
181
+ The key point is that filtering by color restricts which documents match without affecting the relevance ranking of the documents returned.
182
+
183
+ == Facets
184
+
185
+ Many times you want to allow users to narrow down the search results by restricting the query
186
+ to specific values of a given category. This is called [faceted search](http://readthedocs.org/docs/getting-started-with-xapian/en/latest/xapian-core-rst/facets.html).
187
+
188
+ To find out which values you can display to your users, you can do something like this:
189
+
190
+ results = db.search("foo", facets: [:colors, :year])
191
+
192
+ results.facets
193
+ # {
194
+ # :colors => [
195
+ # ["blue", 4],
196
+ # ["red", 1]
197
+ # ],
198
+ #
199
+ # :year => [
200
+ # [2010, 3],
201
+ # [2011, 2],
202
+ # [2012, 1]
203
+ # ]
204
+ # }
205
+
206
+ When filtering by one of these values, it's best to define the field as
207
+ boolean (see section above) and then use `:filter`:
208
+
209
+ db.search("foo", filter: {colors: ["blue"], year: [2010]})
210
+
148
211
  == ActiveRecord Integration
149
212
 
150
213
  XapianFu always stores the :id field, so you can easily use it with
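The facet counts documented in this README hunk can be turned back into a `:filter` hash once a user picks some values. The following is a minimal sketch in plain Ruby, with no Xapian required. The `facets` hash is sample data in the documented shape, and `filter_from_selection` is a hypothetical helper that is not part of xapian-fu.

```ruby
# Sample data in the shape returned by results.facets (not real output).
facets = {
  :colors => [["blue", 4], ["red", 1]],
  :year   => [[2010, 3], [2011, 2], [2012, 1]]
}

# Hypothetical helper: keep only the selected values that actually occur
# among the facet values, and drop fields with nothing left.
def filter_from_selection(facets, selection)
  selection.each_with_object({}) do |(field, values), filter|
    known  = (facets[field] || []).map(&:first)
    chosen = Array(values) & known
    filter[field] = chosen unless chosen.empty?
  end
end

filter = filter_from_selection(facets, { :colors => "blue", :year => [2010, 1999] })
# filter is now { :colors => ["blue"], :year => [2010] }
```

The resulting hash has the same shape as the `filter:` argument shown in the README, so it could be passed to `db.search` as is.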
@@ -176,7 +239,7 @@ perhaps by reindexing once in a while.
176
239
 
177
240
  db = XapianDb.new(:dir => 'posts.db')
178
241
  deleted_posts = Post.find(:all, :conditions => 'deleted_at is not null')
179
- deleted_posts.each do |post|
242
+ deleted_posts.each do |post|
180
243
  db.documents.delete(post.id)
181
244
  post.destroy
182
245
  end
@@ -184,8 +247,8 @@ perhaps by reindexing once in a while.
184
247
  = More Info
185
248
 
186
249
  Author:: John Leach (mailto:john@johnleach.co.uk)
187
- Copyright:: Copyright (c) 2009-2011 John Leach
250
+ Copyright:: Copyright (c) 2009-2012 John Leach
188
251
  License:: MIT (The Xapian library is GPL)
189
252
  Mailing list:: http://rubyforge.org/mailman/listinfo/xapian-fu-discuss
190
253
  Web page:: http://johnleach.co.uk/documents/xapian-fu
191
- Github:: http://github.com/johnl/xapian-fu/tree/master
254
+ Github:: http://github.com/johnl/xapian-fu/tree/master
@@ -49,15 +49,15 @@ module XapianFu #:nodoc:
49
49
  # or false.
50
50
  #
51
51
  class QueryParser #:notnew:
52
-
52
+
53
53
  # The stemming strategy to use when generating terms from a query.
54
54
  # Defaults to <tt>:some</tt>
55
55
  attr_accessor :stemming_strategy
56
-
56
+
57
57
  # The default operation when combining search terms. Defaults to
58
58
  # <tt>:and</tt>
59
59
  attr_accessor :default_op
60
-
60
+
61
61
  # The database that this query is against, used for setting up
62
62
  # fields, stemming, stopping and spelling.
63
63
  attr_accessor :database
@@ -72,9 +72,17 @@ module XapianFu #:nodoc:
72
72
  self.database = @options[:database]
73
73
  end
74
74
 
75
- # Parse the given query string and return a Xapian::Query object
75
+ # Parse the given query and return a Xapian::Query object
76
+ # Accepts either a string or a special query
76
77
  def parse_query(q)
77
- query_parser.parse_query(q, xapian_flags)
78
+ case q
79
+ when :all
80
+ Xapian::Query.new("")
81
+ when :nothing
82
+ Xapian::Query.new()
83
+ else
84
+ query_parser.parse_query(q, xapian_flags)
85
+ end
78
86
  end
79
87
 
80
88
  # Return the query string with any spelling corrections made
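The dispatch added to `parse_query` can be followed without Xapian installed. This is a minimal sketch: `classify_query`, `normalize_query` and the returned symbols are hypothetical names used only for illustration, and no `Xapian::Query` is built.

```ruby
# Hypothetical stand-in for the case statement in QueryParser#parse_query.
# It only reports which branch would be taken.
def classify_query(q)
  case q
  when :all     then :match_all      # diff: Xapian::Query.new("")
  when :nothing then :match_nothing  # diff: Xapian::Query.new()
  else               :parsed         # diff: query_parser.parse_query(q, xapian_flags)
  end
end

# Mirrors the call site in XapianDb#search: symbols pass through untouched,
# everything else is coerced to a String first.
def normalize_query(q)
  q.is_a?(Symbol) ? q : q.to_s
end

classify_query(normalize_query(:all))   # => :match_all
classify_query(normalize_query("all"))  # => :parsed
```

Only the symbols are special: the string `"all"` is still parsed as an ordinary search term.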
@@ -93,16 +101,32 @@ module XapianFu #:nodoc:
93
101
  qp.stemmer = database.stemmer if database
94
102
  qp.default_op = xapian_default_op
95
103
  qp.stemming_strategy = xapian_stemming_strategy
104
+
96
105
  fields.each do |name, type|
106
+ next if database && database.boolean_fields.include?(name)
97
107
  qp.add_prefix(name.to_s.downcase, "X" + name.to_s.upcase)
98
108
  end
99
109
 
110
+ database.boolean_fields.each do |name|
111
+ qp.add_boolean_prefix(name.to_s.downcase, "X#{name.to_s.upcase}")
112
+ end if database
113
+
100
114
  database.sortable_fields.each do |field, opts|
101
- if opts[:range_prefix]
102
- qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(XapianDocValueAccessor.value_key(field), opts[:range_prefix], true))
115
+ prefix, string = nil
116
+
117
+ if opts[:range_postfix]
118
+ prefix = false
119
+ string = opts[:range_postfix]
103
120
  else
104
- qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(XapianDocValueAccessor.value_key(field), opts[:range_postfix], false))
121
+ prefix = true
122
+ string = opts[:range_prefix] || "#{field.to_s.downcase}:"
105
123
  end
124
+
125
+ qp.add_valuerangeprocessor(Xapian::NumberValueRangeProcessor.new(
126
+ XapianDocValueAccessor.value_key(field),
127
+ string,
128
+ prefix
129
+ ))
106
130
  end if database
107
131
 
108
132
  @query_parser = qp
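The range-processor change in this hunk amounts to choosing a string and a prefix flag per sortable field. This is a minimal sketch of that choice in plain Ruby; `range_processor_args` is a hypothetical helper, and the real code passes the pair on to `Xapian::NumberValueRangeProcessor`.

```ruby
# Hypothetical helper mirroring the branch in the hunk: returns the
# [string, prefix_flag] pair chosen for a sortable field.
def range_processor_args(field, opts)
  if opts[:range_postfix]
    [opts[:range_postfix], false]
  else
    # With no explicit prefix, fall back to "<fieldname>:".
    [opts[:range_prefix] || "#{field.to_s.downcase}:", true]
  end
end

range_processor_args(:price,  { :range_prefix => "$" })   # => ["$", true]
range_processor_args(:age,    {})                          # => ["age:", true]
range_processor_args(:weight, { :range_postfix => "kg" })  # => ["kg", false]
```

The second call shows the case the changelog entry refers to: a field with neither option now gets a `field:` prefix, which is what lets a query such as `age:40..50` work.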
@@ -12,12 +12,23 @@ module XapianFu
12
12
  # by :corrected_query, otherwise this is empty.
13
13
  attr_reader :corrected_query
14
14
 
15
+ attr_reader :facets
16
+
15
17
  # nodoc
16
18
  def initialize(options = { })
17
19
  @mset = options[:mset]
18
20
  @current_page = options[:current_page]
19
21
  @per_page = options[:per_page]
20
22
  @corrected_query = options[:corrected_query]
23
+ @facets = {}
24
+ @db = options[:xapian_db]
25
+
26
+ options[:spies].each do |name, spy|
27
+ @facets[name] = spy.values.map do |value|
28
+ [@db.unserialize_value(name, value.term), value.termfreq]
29
+ end
30
+ end if options[:spies]
31
+
21
32
  doc_options = {:xapian_db => options[:xapian_db] }
22
33
  concat mset.matches.collect { |m| XapianDoc.new(m, doc_options) }
23
34
  end
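The loop that fills `@facets` can be tried with stand-in objects. This is a minimal sketch under stated assumptions: `SpyValue` and `FakeSpy` are hypothetical structs imitating what a match spy yields, and the block passed to `build_facets` stands in for `XapianDb#unserialize_value`.

```ruby
# Hypothetical stand-ins for the objects a match spy yields.
SpyValue = Struct.new(:term, :termfreq)
FakeSpy  = Struct.new(:values)

# Builds the same { field => [[value, count], ...] } shape as ResultSet#facets.
# The method's block converts a raw stored value back into a Ruby value.
def build_facets(spies)
  facets = {}
  spies.each do |name, spy|
    facets[name] = spy.values.map do |value|
      [yield(name, value.term), value.termfreq]
    end
  end
  facets
end

spies  = { :year => FakeSpy.new([SpyValue.new("2010", 3), SpyValue.new("2011", 2)]) }
facets = build_facets(spies) { |_name, raw| raw.to_i }
# facets is now { :year => [[2010, 3], [2011, 2]] }
```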
@@ -1,3 +1,3 @@
1
1
  module XapianFu #:nodoc:
2
- VERSION = "1.3.2"
2
+ VERSION = "1.4.0"
3
3
  end
@@ -100,7 +100,7 @@ module XapianFu #:nodoc:
100
100
  #
101
101
  class XapianDb # :notnew:
102
102
  # Path to the on-disk database. Nil if in-memory database
103
- attr_reader :dir
103
+ attr_reader :dir
104
104
  attr_reader :db_flag #:nodoc:
105
105
  # An array of the fields that will be stored in the Xapian
106
106
  attr_reader :store_values
@@ -112,6 +112,8 @@ module XapianFu #:nodoc:
112
112
  attr_reader :fields
113
113
  # An array of fields that will not be indexed
114
114
  attr_reader :unindexed_fields
115
+ # An array of fields that will be treated as boolean terms
116
+ attr_reader :boolean_fields
115
117
  # Whether this db will generate a spelling dictionary during indexing
116
118
  attr_reader :spelling
117
119
  attr_reader :sortable_fields
@@ -130,7 +132,7 @@ module XapianFu #:nodoc:
130
132
  setup_fields(@options[:fields])
131
133
  @store_values << @options[:store]
132
134
  @store_values << @options[:sortable]
133
- @store_values << @options[:collapsible]
135
+ @store_values << @options[:collapsible]
134
136
  @store_values = @store_values.flatten.uniq.compact
135
137
  @spelling = @options[:spelling]
136
138
  end
@@ -195,10 +197,10 @@ module XapianFu #:nodoc:
195
197
  #
196
198
  # The <tt>:page</tt> option sets which page of results to return.
197
199
  # Defaults to 1.
198
- #
200
+ #
199
201
  # The <tt>:order</tt> option specifies the stored field to order
200
202
  # the results by (instead of the default search result weight).
201
- #
203
+ #
202
204
  # The <tt>:reverse</tt> option reverses the order of the results,
203
205
  # so lowest search weight first (or lowest stored field value
204
206
  # first).
@@ -213,9 +215,13 @@ module XapianFu #:nodoc:
213
215
  # enabled, spelling suggestions are available using the
214
216
  # XapianFu::ResultSet <tt>corrected_query</tt> method.
215
217
  #
218
+ # The first parameter can also be <tt>:all</tt> or
219
+ # <tt>:nothing</tt>, to match all documents or no documents
220
+ # respectively.
221
+ #
216
222
  # For additional options on how the query is parsed, see
217
223
  # XapianFu::QueryParser
218
-
224
+
219
225
  def search(q, options = {})
220
226
  defaults = { :page => 1, :reverse => false,
221
227
  :boolean => true, :boolean_anycase => true, :wildcards => true,
@@ -227,15 +233,29 @@ module XapianFu #:nodoc:
227
233
  per_page = per_page.to_i rescue 10
228
234
  offset = page * per_page
229
235
  qp = XapianFu::QueryParser.new({ :database => self }.merge(options))
230
- query = qp.parse_query(q.to_s)
236
+ query = qp.parse_query(q.is_a?(Symbol) ? q : q.to_s)
237
+ query = filter_query(query, options[:filter]) if options[:filter]
231
238
  enquiry = Xapian::Enquire.new(ro)
232
239
  setup_ordering(enquiry, options[:order], options[:reverse])
233
240
  if options[:collapse]
234
241
  enquiry.collapse_key = XapianDocValueAccessor.value_key(options[:collapse])
235
242
  end
243
+ if options[:facets]
244
+ spies = options[:facets].inject({}) do |accum, name|
245
+ accum[name] = spy = Xapian::ValueCountMatchSpy.new(XapianDocValueAccessor.value_key(name))
246
+ enquiry.add_matchspy(spy)
247
+ accum
248
+ end
249
+ end
236
250
  enquiry.query = query
237
- ResultSet.new(:mset => enquiry.mset(offset, per_page), :current_page => page + 1,
238
- :per_page => per_page, :corrected_query => qp.corrected_query, :xapian_db => self)
251
+
252
+ ResultSet.new(:mset => enquiry.mset(offset, per_page),
253
+ :current_page => page + 1,
254
+ :per_page => per_page,
255
+ :corrected_query => qp.corrected_query,
256
+ :spies => spies,
257
+ :xapian_db => self
258
+ )
239
259
  end
240
260
 
241
261
  # Run the given block in a XapianDB transaction. Any changes to the
@@ -272,6 +292,22 @@ module XapianFu #:nodoc:
272
292
  ro.reopen
273
293
  end
274
294
 
295
+ def serialize_value(field, value, type = nil)
296
+ if sortable_fields.include?(field)
297
+ Xapian.sortable_serialise(value)
298
+ else
299
+ (type || fields[field] || Object).to_xapian_fu_storage_value(value)
300
+ end
301
+ end
302
+
303
+ def unserialize_value(field, value, type = nil)
304
+ if sortable_fields.include?(field)
305
+ Xapian.sortable_unserialise(value)
306
+ else
307
+ (type || fields[field] || Object).from_xapian_fu_storage_value(value)
308
+ end
309
+ end
310
+
275
311
  private
276
312
 
277
313
  # Setup the writable database
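For non-sortable fields, `serialize_value` and `unserialize_value` pick a converter through the chain `type || fields[field] || Object`. This is a minimal sketch of that lookup in plain Ruby. `DefaultCodec`, `IntegerCodec` and `pick_codec` are hypothetical names. In the diff the converters are the `to_xapian_fu_storage_value` and `from_xapian_fu_storage_value` class methods, and sortable fields bypass the chain by using `Xapian.sortable_serialise`.

```ruby
# Hypothetical codec playing the role of the Object fallback in the diff.
module DefaultCodec
  def self.dump(value)
    value.to_s
  end

  def self.load(raw)
    raw
  end
end

# Hypothetical codec for a field declared with an integer type.
module IntegerCodec
  def self.dump(value)
    value.to_s
  end

  def self.load(raw)
    raw.to_i
  end
end

# Explicit type wins, then the field's declared type, then the default.
def pick_codec(field, field_types, explicit = nil)
  explicit || field_types[field] || DefaultCodec
end

types = { :age => IntegerCodec }
pick_codec(:age, types).load("42")                # => 42
pick_codec(:city, types).load("Leeds")            # => "Leeds"
pick_codec(:age, types, DefaultCodec).load("42")  # => "42"
```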
@@ -317,12 +353,17 @@ module XapianFu #:nodoc:
317
353
  @unindexed_fields = []
318
354
  @store_values = []
319
355
  @sortable_fields = {}
356
+ @boolean_fields = []
320
357
  return nil if field_options.nil?
321
358
  default_opts = {
322
359
  :store => true,
323
360
  :index => true,
324
361
  :type => String
325
362
  }
363
+ boolean_default_opts = default_opts.merge(
364
+ :store => false,
365
+ :index => false
366
+ )
326
367
  # Convert array argument to hash, with String as default type
327
368
  if field_options.is_a? Array
328
369
  fohash = { }
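The `boolean_default_opts` introduced in this hunk are used later in `setup_fields` as the base for any field declared with `:boolean => true`. The effect of that merge can be sketched in plain Ruby. The constant and method names below are hypothetical.

```ruby
# Hypothetical constants mirroring default_opts and boolean_default_opts.
DEFAULT_OPTS         = { :store => true, :index => true, :type => String }
BOOLEAN_DEFAULT_OPTS = DEFAULT_OPTS.merge(:store => false, :index => false)

# Per-field options always win over whichever defaults apply.
def effective_opts(opts)
  base = opts[:boolean] ? BOOLEAN_DEFAULT_OPTS : DEFAULT_OPTS
  base.merge(opts)
end

effective_opts({ :boolean => true })
# => { :store => false, :index => false, :type => String, :boolean => true }
effective_opts({ :boolean => true, :index => true })[:index]  # => true
```

So a boolean field is neither stored nor indexed as free text unless the field definition asks for it, which matches the two boolean specs in this diff.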
@@ -332,16 +373,62 @@ module XapianFu #:nodoc:
332
373
  field_options.each do |name,opts|
333
374
  # Handle simple setup by type only
334
375
  opts = { :type => opts } unless opts.is_a? Hash
335
- opts = default_opts.merge(opts)
376
+ if opts[:boolean]
377
+ opts = boolean_default_opts.merge(opts)
378
+ else
379
+ opts = default_opts.merge(opts)
380
+ end
336
381
  @store_values << name if opts[:store]
337
382
  @sortable_fields[name] = {:range_prefix => opts[:range_prefix], :range_postfix => opts[:range_postfix]} if opts[:sortable]
338
383
  @unindexed_fields << name if opts[:index] == false
384
+ @boolean_fields << name if opts[:boolean]
339
385
  @fields[name] = opts[:type]
340
386
  end
341
387
  @fields
342
388
  end
343
-
389
+
390
+ def filter_query(query, filter)
391
+ subqueries = filter.map do |field, values|
392
+ values = Array(values)
393
+
394
+ if sortable_fields[field]
395
+ sortable_filter_query(field, values)
396
+ elsif boolean_fields.include?(field)
397
+ boolean_filter_query(field, values)
398
+ end
399
+ end
400
+
401
+ combined_subqueries = Xapian::Query.new(Xapian::Query::OP_AND, subqueries)
402
+
403
+ Xapian::Query.new(Xapian::Query::OP_FILTER, query, combined_subqueries)
404
+ end
405
+
406
+ def sortable_filter_query(field, values)
407
+ subqueries = values.map do |value|
408
+ from, to = value.split("..")
409
+ slot = XapianDocValueAccessor.value_key(field)
410
+
411
+ if from.empty?
412
+ Xapian::Query.new(Xapian::Query::OP_VALUE_LE, slot, Xapian.sortable_serialise(to.to_f))
413
+ elsif to.nil?
414
+ Xapian::Query.new(Xapian::Query::OP_VALUE_GE, slot, Xapian.sortable_serialise(from.to_f))
415
+ else
416
+ Xapian::Query.new(Xapian::Query::OP_VALUE_RANGE, slot, Xapian.sortable_serialise(from.to_f), Xapian.sortable_serialise(to.to_f))
417
+ end
418
+ end
419
+
420
+ Xapian::Query.new(Xapian::Query::OP_OR, subqueries)
421
+ end
422
+
423
+ def boolean_filter_query(field, values)
424
+ subqueries = values.map do |value|
425
+ Xapian::Query.new("X#{field.to_s.upcase}#{value.to_s.downcase}")
426
+ end
427
+
428
+ Xapian::Query.new(Xapian::Query::OP_OR, subqueries)
429
+ end
430
+
344
431
  end
345
-
432
+
346
433
  end
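The string handling in `sortable_filter_query` can be checked in isolation. This is a minimal sketch: `parse_range` is a hypothetical helper that returns a plain array naming the branch taken, where the real method builds an `OP_VALUE_LE`, `OP_VALUE_GE` or `OP_VALUE_RANGE` query.

```ruby
# Hypothetical helper: classifies a filter string the way
# sortable_filter_query does.
def parse_range(value)
  from, to = value.split("..")

  if from.empty?
    [:le, to.to_f]                 # "..35"   => upper bound only
  elsif to.nil?
    [:ge, from.to_f]               # "35.."   => lower bound only
  else
    [:range, from.to_f, to.to_f]   # "10..30" => closed range
  end
end

parse_range("10..30")  # => [:range, 10.0, 30.0]
parse_range("35..")    # => [:ge, 35.0]
parse_range("..35")    # => [:le, 35.0]
parse_range("35")      # => [:ge, 35.0]
```

The last call shows that a bare value such as `"35"` takes the `to.nil?` branch, so it acts as a lower bound rather than an exact match.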
347
434
 
@@ -18,9 +18,15 @@ class DateTime #:nodoc:
18
18
  end
19
19
  end
20
20
 
21
+ class Array #:nodoc:
22
+ def to_xapian_fu_string
23
+ join(" ")
24
+ end
25
+ end
26
+
21
27
  module XapianFu #:nodoc:
22
28
  require 'xapian_doc_value_accessor'
23
-
29
+
24
30
  # Raised whenever a XapianDb is needed but has not been provided,
25
31
  # such as when retrieving the terms list for a document
26
32
  class XapianDbNotSet < XapianFuError ; end
@@ -33,26 +39,26 @@ module XapianFu #:nodoc:
33
39
  # documents to the database. You usually don't need to instantiate
34
40
  # them yourself unless you're doing something a bit advanced.
35
41
  class XapianDoc
36
-
42
+
37
43
  # A hash of the fields given to this object on initialize
38
44
  attr_reader :fields
39
-
45
+
40
46
  # An abitrary blob of data stored alongside the document in the
41
47
  # Xapian database.
42
48
  attr_reader :data
43
-
49
+
44
50
  # The search score of this document when returned as part of a
45
51
  # search result
46
52
  attr_reader :weight
47
-
53
+
48
54
  # The Xapian::Match object for this document when returned as part
49
55
  # of a search result.
50
56
  attr_reader :match
51
-
57
+
52
58
  # The unsigned integer "primary key" for this document in the
53
59
  # Xapian database.
54
60
  attr_accessor :id
55
-
61
+
56
62
  # The XapianDb object that this document was retrieved from, or
57
63
  # should be stored in.
58
64
  attr_accessor :db
@@ -66,7 +72,7 @@ module XapianFu #:nodoc:
66
72
  # and term enumeration.
67
73
  def initialize(doc, options = {})
68
74
  @options = options
69
-
75
+
70
76
  @fields = {}
71
77
  if doc.is_a? Xapian::Match
72
78
  match = doc
@@ -84,6 +90,9 @@ module XapianFu #:nodoc:
84
90
  elsif doc.respond_to?(:has_key?) and doc.respond_to?("[]")
85
91
  @fields = doc
86
92
  @id = doc[:id] if doc.has_key?(:id)
93
+ # Handle initialisation from an object with a to_xapian_fu_string method
94
+ elsif doc.respond_to?(:to_xapian_fu_string)
95
+ @fields = { :content => doc.to_xapian_fu_string }
87
96
  # Handle initialisation from anything else that can be coerced
88
97
  # into a string
89
98
  elsif doc.respond_to? :to_s
@@ -107,10 +116,10 @@ module XapianFu #:nodoc:
107
116
  def values
108
117
  @value_accessor ||= XapianDocValueAccessor.new(self)
109
118
  end
110
-
119
+
111
120
  # Return a list of terms that the db has for this document.
112
121
  def terms
113
- raise XapianFu::XapianDbNotSet unless db
122
+ raise XapianFu::XapianDbNotSet unless db
114
123
  db.ro.termlist(id) if db.respond_to?(:ro) and db.ro and id
115
124
  end
116
125
 
@@ -123,7 +132,7 @@ module XapianFu #:nodoc:
123
132
  xapian_document.clear_values
124
133
  add_values_to_xapian_document
125
134
  # Clear and add terms
126
- xapian_document.clear_terms
135
+ xapian_document.clear_terms
127
136
  generate_terms
128
137
  xapian_document
129
138
  end
@@ -135,7 +144,7 @@ module XapianFu #:nodoc:
135
144
  def xapian_document
136
145
  @xapian_document ||= Xapian::Document.new
137
146
  end
138
-
147
+
139
148
  # Compare IDs with another XapianDoc
140
149
  def ==(b)
141
150
  if b.is_a?(XapianDoc)
@@ -144,7 +153,7 @@ module XapianFu #:nodoc:
144
153
  super(b)
145
154
  end
146
155
  end
147
-
156
+
148
157
  def inspect
149
158
  s = ["<#{self.class.to_s} id=#{id}"]
150
159
  s << "weight=%.5f" % weight if weight
@@ -167,7 +176,7 @@ module XapianFu #:nodoc:
167
176
  def update
168
177
  db.rw.replace_document(id, to_xapian_document)
169
178
  end
170
-
179
+
171
180
  # Set the stemmer to use for this document. Accepts any string
172
181
  # that the Xapian::Stem class accepts (Either the English name for
173
182
  # the language or the two letter ISO639 code). Can also be an
@@ -183,7 +192,7 @@ module XapianFu #:nodoc:
183
192
  if @stemmer
184
193
  @stemmer
185
194
  else
186
- @stemmer =
195
+ @stemmer =
187
196
  if ! @options[:stemmer].nil?
188
197
  @options[:stemmer]
189
198
  elsif @options[:language]
@@ -196,7 +205,7 @@ module XapianFu #:nodoc:
196
205
  @stemmer = StemFactory.stemmer_for(@stemmer)
197
206
  end
198
207
  end
199
-
208
+
200
209
  # Return the stopper for this document. If not set on initialize
201
210
  # by the :stopper or :language option, it will try the database's
202
211
  # stopper and otherwise default to an English stopper.
@@ -218,7 +227,7 @@ module XapianFu #:nodoc:
218
227
  end
219
228
  end
220
229
 
221
- # Return this document's language which is set on initialize, inherited
230
+ # Return this document's language which is set on initialize, inherited
222
231
  # from the database or defaults to :english
223
232
  def language
224
233
  if @language
@@ -234,14 +243,14 @@ module XapianFu #:nodoc:
234
243
  end
235
244
  end
236
245
  end
237
-
246
+
238
247
  private
239
-
248
+
240
249
  # Array of field names not to run through the TermGenerator
241
250
  def unindexed_fields
242
251
  db ? db.unindexed_fields : []
243
252
  end
244
-
253
+
245
254
  # Add all the fields to be stored as XapianDb values
246
255
  def add_values_to_xapian_document
247
256
  db.store_values.collect do |key|
@@ -271,12 +280,19 @@ module XapianFu #:nodoc:
271
280
  # add value without field name
272
281
  tg.send(index_method, v)
273
282
  end
283
+
284
+ db.boolean_fields.each do |name|
285
+ Array(fields[name]).each do |value|
286
+ xapian_document.add_boolean_term("X#{name.to_s.upcase}#{value.to_s.downcase}")
287
+ end
288
+ end
289
+
274
290
  xapian_document
275
291
  end
276
-
292
+
277
293
  end
278
-
279
-
294
+
295
+
280
296
  class StemFactory
281
297
  # Return a Xapian::Stem object for the given option. Accepts any
282
298
  # string that the Xapian::Stem class accepts (Either the English
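The boolean terms added in `generate_terms` use the same `X` + upper-cased field name + lower-cased value format that `boolean_filter_query` looks up. This is a minimal sketch of the term construction in plain Ruby. `boolean_terms` is a hypothetical helper that returns the strings, where the real code calls `add_boolean_term` on the Xapian document.

```ruby
# Hypothetical helper: lists the boolean terms that would be added for a
# document's fields, given the names of the boolean fields.
def boolean_terms(fields, boolean_fields)
  boolean_fields.flat_map do |name|
    # Array() lets a field hold one value, many values, or nil.
    Array(fields[name]).map do |value|
      "X#{name.to_s.upcase}#{value.to_s.downcase}"
    end
  end
end

boolean_terms({ :name => "Foo", :colors => [:red, "Black"] }, [:colors])
# => ["XCOLORSred", "XCOLORSblack"]
boolean_terms({ :age => 10 }, [:age])
# => ["XAGE10"]
```

Because values are lower-cased both when indexing and when filtering, a filter on `"Liverpool"` matches a document indexed with `"liverpool"` and vice versa.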
@@ -296,5 +312,5 @@ module XapianFu #:nodoc:
296
312
  end
297
313
  end
298
314
  end
299
-
315
+
300
316
  end
@@ -63,20 +63,31 @@ class Date #:nodoc:
63
63
  end
64
64
  end
65
65
 
66
+ class Object
67
+ def self.to_xapian_fu_storage_value(value)
68
+ value.to_s
69
+ end
70
+
71
+ def self.from_xapian_fu_storage_value(value)
72
+ value
73
+ end
74
+ end
75
+
66
76
  module XapianFu #:nodoc:
67
-
77
+
68
78
  class ValueOutOfBounds < XapianFuError
69
79
  end
70
-
80
+
71
81
  # A XapianDocValueAccessor is used to provide the XapianDoc#values
72
82
  # interface to read and write field values to a XapianDb. It is
73
83
  # usually set up by a XapianDoc so you shouldn't need to set up your
74
84
  # own.
75
85
  class XapianDocValueAccessor
86
+
76
87
  def initialize(xapian_doc)
77
88
  @doc = xapian_doc
78
89
  end
79
-
90
+
80
91
  # Add the given <tt>value</tt> with the given <tt>key</tt> to the
81
92
  # XapianDoc. If the value has a
82
93
  # <tt>to_xapian_fu_storage_value</tt> method then it is used to
@@ -84,23 +95,12 @@ module XapianFu #:nodoc:
84
95
  # is used. This is usually paired with a
85
96
  # <tt>from_xapian_fu_storage_value</tt> class method on retrieval.
86
97
  def store(key, value, type = nil)
87
- type = @doc.db.fields[key] if type.nil? and @doc.db
88
-
89
- if @doc.db && @doc.db.sortable_fields.include?(key)
90
- converted_value = Xapian.sortable_serialise(value)
91
- else
92
- if type and type.respond_to?(:to_xapian_fu_storage_value)
93
- converted_value = type.to_xapian_fu_storage_value(value)
94
- else
95
- converted_value = value.to_s
96
- end
97
- end
98
-
98
+ converted_value = @doc.db.serialize_value(key, value, type)
99
99
  @doc.xapian_document.add_value(XapianDocValueAccessor.value_key(key), converted_value)
100
100
  value
101
101
  end
102
102
  alias_method "[]=", :store
103
-
103
+
104
104
  # Retrieve the value with the given <tt>key</tt> from the
105
105
  # XapianDoc. <tt>key</tt> can be a symbol or string, in which case
106
106
  # it's hashed to get an integer value number. Or you can give the
@@ -117,20 +117,15 @@ module XapianFu #:nodoc:
117
117
  # empty string is returned.
118
118
  def fetch(key, type = nil)
119
119
  value = @doc.xapian_document.value(XapianDocValueAccessor.value_key(key))
120
- type = @doc.db.fields[key] if type.nil? and @doc.db
121
- if type and type.respond_to?(:from_xapian_fu_storage_value)
122
- type.from_xapian_fu_storage_value(value)
123
- else
124
- value
125
- end
120
+ @doc.db.unserialize_value(key, value, type)
126
121
  end
127
122
  alias_method "[]", :fetch
128
-
123
+
129
124
  # Count the values stored in the XapianDoc
130
125
  def size
131
126
  @doc.xapian_document.values_count
132
127
  end
133
-
128
+
134
129
  # Remove the value with the given key from the XapianDoc and return it
135
130
  def delete(key)
136
131
  value = fetch(key)
@@ -142,6 +137,6 @@ module XapianFu #:nodoc:
142
137
  # value number
143
138
  def self.value_key(key)
144
139
  (key.is_a?(Integer) ? key : Zlib.crc32(key.to_s))
145
- end
140
+ end
146
141
  end
147
142
  end
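Both the value accessor and the new facet spies address Xapian value slots through `value_key`, which maps a field name to an integer with CRC32. The expression from the context lines can be run on its own with Ruby's standard `zlib` library. Here it is wrapped in a top-level method for illustration, not as the class method the library defines.

```ruby
require 'zlib'

# Same expression as XapianDocValueAccessor.value_key, as a plain method.
def value_key(key)
  key.is_a?(Integer) ? key : Zlib.crc32(key.to_s)
end

value_key(7)                         # => 7 (integers are used as-is)
value_key(:age) == value_key("age")  # => true (symbols and strings agree)
```

Since the slot number is derived from the name alone, a field passed to `:facets` resolves to the same slot the value was stored in, with no lookup table needed.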
@@ -0,0 +1,34 @@
1
+ require File.expand_path('../lib/xapian_fu.rb', File.dirname(__FILE__))
2
+
3
+ tmp_dir = '/tmp/xapian_fu_test.db'
4
+
5
+ describe "Facets support" do
6
+
7
+ before do
8
+ @xdb = XapianFu::XapianDb.new(
9
+ :dir => tmp_dir, :create => true, :overwrite => true,
10
+ :fields => {
11
+ :name => { :index => true },
12
+ :age => { :type => Integer, :sortable => true },
13
+ :height => { :type => Float, :sortable => true }
14
+ }
15
+ )
16
+
17
+ @xdb << {:name => "John A", :age => 30, :height => 1.8}
18
+ @xdb << {:name => "John B", :age => 35, :height => 1.8}
19
+ @xdb << {:name => "John C", :age => 40, :height => 1.7}
20
+ @xdb << {:name => "John D", :age => 40, :height => 1.7}
21
+ @xdb << {:name => "Markus", :age => 35, :height => 1.7}
22
+ @xdb.flush
23
+ end
24
+
25
+ it "should expose facets when searching" do
26
+ results = @xdb.search("john", {:facets => [:age, :height]})
27
+
28
+ results.facets[:age].should == [[30, 1], [35, 1], [40, 2]]
29
+ results.facets[:height].should == [[1.7, 2], [1.8, 2]]
30
+
31
+ results.facets.keys.map(&:to_s).sort.should == %w(age height)
32
+ end
33
+
34
+ end
@@ -37,6 +37,13 @@ describe QueryParser do
37
37
  terms.should_not include "john"
38
38
  end
39
39
 
40
+ it "should turn :all into a query with no terms" do
41
+ qp = QueryParser.new
42
+ qp.parse_query(:all).terms.should == []
43
+ qp.parse_query(:all).empty?.should be_false
44
+ qp.parse_query(:nothing).empty?.should be_true
45
+ end
46
+
40
47
  end
41
48
 
42
49
  end
@@ -202,6 +202,46 @@ describe XapianDb do
202
202
  xdb.size.should == 2
203
203
  end
204
204
 
205
+ it "should generate boolean terms for multiple values" do
206
+ xdb = XapianDb.new(:dir => tmp_dir, :create => true,
207
+ :fields => {
208
+ :name => { :index => true },
209
+ :colors => { :boolean => true }
210
+ }
211
+ )
212
+
213
+ xdb << {:name => "Foo", :colors => [:red, :black]}
214
+ xdb << {:name => "Foo", :colors => [:red, :green]}
215
+ xdb << {:name => "Foo", :colors => [:blue, :yellow]}
216
+
217
+ xdb.flush
218
+
219
+ xdb.search("foo", :filter => {:colors => [:red]}).map(&:id).should == [1, 2]
220
+ xdb.search("foo", :filter => {:colors => [:black, :green]}).map(&:id).should == [1, 2]
221
+
222
+ xdb.search("red").should be_empty
223
+ end
224
+
225
+ it "should index boolean terms if asked to" do
226
+ xdb = XapianDb.new(:dir => tmp_dir, :create => true,
227
+ :fields => {
228
+ :name => { :index => true },
229
+ :colors => { :index => true, :boolean => true }
230
+ }
231
+ )
232
+
233
+ xdb << {:name => "Foo", :colors => [:red, :black]}
234
+ xdb << {:name => "Foo", :colors => [:red, :green]}
235
+ xdb << {:name => "Foo", :colors => [:blue, :yellow]}
236
+
237
+ xdb.flush
238
+
239
+ xdb.search("foo", :filter => {:colors => [:red]}).map(&:id).should == [1, 2]
240
+ xdb.search("foo", :filter => {:colors => [:black, :green]}).map(&:id).should == [1, 2]
241
+
242
+ xdb.search("red").map(&:id).should == [1, 2]
243
+ end
244
+
205
245
  end
206
246
 
207
247
  describe "search" do
@@ -394,6 +434,85 @@ describe XapianDb do
394
434
  xdb.search("jon").should be_empty
395
435
  xdb.search("jon", :synonyms => true).should_not be_empty
396
436
  end
437
+
438
+ describe "with special queries" do
439
+ before do
440
+ @xdb = XapianDb.new
441
+ @xdb << "Doc 1"
442
+ @xdb << "Doc 2"
443
+ end
444
+
445
+ it "should return empty array on :nothing" do
446
+ @xdb.search(:nothing).should be_empty
447
+ end
448
+
449
+ it "should return all documents on :all" do
450
+ @xdb.search(:all).length.should eq 2
451
+ end
452
+ end
453
+
454
+ it "should allow to search by boolean terms" do
455
+ xdb = XapianDb.new(:dir => tmp_dir, :create => true,
456
+ :fields => {
457
+ :name => { :index => true },
458
+ :age => { :boolean => true },
459
+ :city => { :boolean => true }
460
+ }
461
+ )
462
+
463
+ xdb << {:name => "John A", :age => 10, :city => "London"}
464
+ xdb << {:name => "John B", :age => 11, :city => "Liverpool"}
465
+ xdb << {:name => "John C", :age => 12, :city => "Liverpool"}
466
+
467
+ xdb.flush
468
+
469
+ xdb.search("john").size.should == 3
470
+ xdb.search("john", :filter => {:age => 10}).map(&:id).should == [1]
471
+ xdb.search("john", :filter => {:age => [10, 12]}).map(&:id).should == [1, 3]
472
+
473
+ xdb.search("john", :filter => {:age => 10, :city => "Liverpool"}).map(&:id).should == []
474
+ xdb.search("john", :filter => {:city => "Liverpool"}).map(&:id).should == [2, 3]
475
+ xdb.search("john", :filter => {:age => 11..15, :city => "Liverpool"}).map(&:id).should == [2, 3]
476
+
477
+ xdb.search("liverpool").should be_empty
478
+ xdb.search("city:liverpool").map(&:id).should == [2, 3]
479
+ end
480
+ end
481
+
482
+ describe "filtering" do
483
+ before do
484
+ @xdb = XapianDb.new(
485
+ :dir => tmp_dir, :create => true, :overwrite => true,
486
+ :fields => {
487
+ :name => { :index => true },
488
+ :age => { :type => Integer, :sortable => true },
489
+ :height => { :type => Float, :sortable => true }
490
+ }
491
+ )
492
+ end
493
+
494
+ it "should filter results using value ranges" do
495
+ @xdb << {:name => "John", :age => 30, :height => 1.8}
496
+ @xdb << {:name => "John", :age => 35, :height => 1.9}
497
+ @xdb << {:name => "John", :age => 40, :height => 1.7}
498
+ @xdb << {:name => "Markus", :age => 35, :height => 1.7}
499
+ @xdb.flush
500
+
501
+ # Make sure we're combining queries using OP_FILTER by comparing
502
+ # the weights with and without filtering.
503
+ @xdb.search("markus")[0].weight.should == @xdb.search("markus", :filter => {:age => "35"})[0].weight
504
+
505
+ @xdb.search("john", :filter => {:age => "10..20"}).should be_empty
506
+
507
+ @xdb.search("john", :filter => {:age => "10..30"}).map(&:id).should == [1]
508
+ @xdb.search("john", :filter => {:age => "35.."}).map(&:id).should == [2, 3]
509
+ @xdb.search("john", :filter => {:age => "..35"}).map(&:id).should == [1, 2]
510
+ @xdb.search("john", :filter => {:age => ["..30", "40.."]}).map(&:id).should == [1, 3]
511
+
512
+ @xdb.search("john", :filter => {:age => "10..30", :height => "1.8"}).map(&:id).should == [1]
513
+ @xdb.search("john", :filter => {:age => "10..30", :height => "..1.8"}).map(&:id).should == [1]
514
+ @xdb.search("john", :filter => {:age => "10..30", :height => "1.9.."}).should be_empty
515
+ end
397
516
  end
398
517
 
399
518
  describe "add_doc" do
@@ -468,6 +587,21 @@ describe XapianDb do
468
587
  docs.map { |d| d.id }.should == [1, 3]
469
588
  end
470
589
 
590
+ it "should allow range queries without prefixes" do
591
+ xdb = XapianDb.new(:fields => {
592
+ :price => { :type => Integer, :sortable => true, :range_prefix => "$" },
593
+ :age => { :type => Integer, :sortable => true }
594
+ })
595
+
596
+ xdb << XapianDoc.new(:price => 10, :age => 40)
597
+ xdb << XapianDoc.new(:price => 20, :age => 35)
598
+ xdb << XapianDoc.new(:price => 45, :age => 30)
599
+
600
+ docs = xdb.search("$20..40 OR age:40..50")
601
+
602
+ docs.map { |d| d.id }.should == [1, 2]
603
+ end
604
+
471
605
  it "should store values declared as to be collapsible" do
472
606
  xdb = XapianDb.new(:collapsible => :group_id)
473
607
  xdb << XapianDoc.new(:group_id => "666", :author => "Jim Jones")
@@ -37,6 +37,30 @@ describe XapianDoc do
37
37
  xdb.ro.positionlist(doc.id, "once").first.should == nil
38
38
  end
39
39
 
40
+ it "should tokenize an array given as a field" do
41
+ xdb = XapianDb.new
42
+ xdoc = xdb.documents.new(:colors => [:red, :green, :blue]).to_xapian_document
43
+ xdoc.terms.should be_a_kind_of Array
44
+ xdoc.terms.last.should be_a_kind_of Xapian::Term
45
+ terms = xdoc.terms.collect { |t| t.term }
46
+ terms.should include "red"
47
+ terms.should include "green"
48
+ terms.should include "blue"
49
+ terms.should_not include "redgreenblue"
50
+ end
51
+
52
+ it "should tokenize an array given as the content" do
53
+ xdb = XapianDb.new
54
+ xdoc = xdb.documents.new([:red, :green, :blue]).to_xapian_document
55
+ xdoc.terms.should be_a_kind_of Array
56
+ xdoc.terms.last.should be_a_kind_of Xapian::Term
57
+ terms = xdoc.terms.collect { |t| t.term }
58
+ terms.should include "red"
59
+ terms.should include "green"
60
+ terms.should include "blue"
61
+ terms.should_not include "redgreenblue"
62
+ end
63
+
40
64
  it "should tokenize a hash" do
41
65
  xdb = XapianDb.new
42
66
  xdoc = xdb.documents.new(:title => 'once upon a time').to_xapian_document
@@ -12,8 +12,14 @@ describe XapianDocValueAccessor do
12
12
  end
13
13
  end
14
14
 
15
+ before do
16
+ @xdb = XapianDb.new(:fields => {:name => {:index => true}, :city => {:store => true}})
17
+ @xdb << {:name => "John"}
18
+ @xdb.flush
19
+ end
20
+
15
21
  it "should store and fetch values like a hash" do
16
- values = XapianDocValueAccessor.new(XapianDoc.new(nil))
22
+ values = @xdb.documents.find(1).values
17
23
  values.store(:city, "Leeds").should == "Leeds"
18
24
  values.fetch(:city).should == "Leeds"
19
25
  values[:city] = "London"
@@ -21,7 +27,7 @@ describe XapianDocValueAccessor do
21
27
  end
22
28
 
23
29
  it "should add and retrieve values from the Xapian::Document" do
24
- doc = XapianDoc.new(nil)
30
+ doc = @xdb.documents.find(1)
25
31
  values = XapianDocValueAccessor.new(doc)
26
32
  lambda { values[:city] = "London" }.should change(doc.xapian_document, :values_count).by(1)
27
33
  end
@@ -95,12 +101,13 @@ describe XapianDocValueAccessor do
95
101
  end
96
102
 
97
103
  it "should count the stored values when size is called" do
98
- doc = XapianDoc.new(nil)
104
+ doc = @xdb.documents.find(1)
99
105
  lambda { doc.values[:city] = "London" }.should change(doc.values, :size).by(1)
100
106
  end
101
107
 
102
108
  it "should delete values from the Xapian::Document" do
103
- doc = XapianDoc.new(nil)
109
+ doc = @xdb.documents.find(1)
110
+ values = doc.values
104
111
  doc.values[:city] = "Leeds"
105
112
  lambda { doc.values.delete(:city) }.should change(doc.values, :size).by(-1)
106
113
  doc.values[:city] = "London"
metadata CHANGED
@@ -1,21 +1,22 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: xapian-fu
3
3
  version: !ruby/object:Gem::Version
4
- hash: 31
4
+ hash: 7
5
5
  prerelease:
6
6
  segments:
7
7
  - 1
8
- - 3
9
- - 2
10
- version: 1.3.2
8
+ - 4
9
+ - 0
10
+ version: 1.4.0
11
11
  platform: ruby
12
12
  authors:
13
13
  - John Leach
14
+ - Damian Janowski
14
15
  autorequire:
15
16
  bindir: bin
16
17
  cert_chain: []
17
18
 
18
- date: 2011-12-04 00:00:00 Z
19
+ date: 2012-03-13 00:00:00 Z
19
20
  dependencies:
20
21
  - !ruby/object:Gem::Dependency
21
22
  name: rspec
@@ -107,6 +108,7 @@ files:
107
108
  - spec/xapian_doc_spec.rb
108
109
  - spec/xapian_db_spec.rb
109
110
  - spec/stopper_factory_spec.rb
111
+ - spec/facets_spec.rb
110
112
  - spec/fixtures/film_data/x86_64-linux~1.8.7/value.baseA
111
113
  - spec/fixtures/film_data/x86_64-linux~1.8.7/position.baseA
112
114
  - spec/fixtures/film_data/x86_64-linux~1.8.7/record.baseA
@@ -203,6 +205,7 @@ test_files:
203
205
  - spec/xapian_doc_spec.rb
204
206
  - spec/xapian_db_spec.rb
205
207
  - spec/stopper_factory_spec.rb
208
+ - spec/facets_spec.rb
206
209
  - spec/fixtures/film_data/x86_64-linux~1.8.7/value.baseA
207
210
  - spec/fixtures/film_data/x86_64-linux~1.8.7/position.baseA
208
211
  - spec/fixtures/film_data/x86_64-linux~1.8.7/record.baseA