xapian_db 0.3.4 → 0.4.0

data/CHANGELOG.md CHANGED
@@ -1,3 +1,23 @@
1
+ ##0.4.0 (December 15th, 2010)
2
+
3
+ Features:
4
+
5
+ - Simple facets implementation. The only facet supported is the class name of the indexed objects
6
+ - Support for sorting (only for class searches, not for global searches)
7
+ - The result of a search can be used with will_paginate
8
+
9
+ Bugfixes:
10
+
11
+ - removed the class scope expression from the spelling suggestion when searching on a class
12
+ - keys of the attributes and index hashes are now sorted to be compatible with ruby 1.8 (which does
13
+ not preserve the order of the keys in a hash)
14
+ - Fixed the problem that blueprint configurations got lost after the first request in the development
15
+ env (Rails only). You should put your blueprints either into a class that is loaded by Rails or into
16
+ the file config/xapian_blueprints.rb which is loaded automatically by XapianDb
17
+
18
+ **Since the internal structure of the index has changed, you must reindex your objects if you come from an
19
+ earlier version of XapianDb!**
20
+
1
21
  ##0.3.4 (December 14th, 2010)
2
22
 
3
23
  Features:
@@ -5,6 +25,9 @@ Features:
5
25
  - perform searches on indexed classes to scope the search to objects of a specific class
6
26
  - specify multiple blueprint attributes and index methods in one statement (without specifying options)
7
27
  - use blocks for complex attribute or index specifications
28
+
29
+ Changes:
30
+
8
31
  - changed the implementation of Resultset.size to get more accurate estimations
9
32
  - changed the indexing of active_record or datamapper models when declared as attributes or indexes
10
33
  in a blueprint (indexes now all attributes of the object instead of using to_s)
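The Ruby 1.8 bugfix and the reindex warning in the 0.4.0 changelog both follow from how attribute values are mapped to Xapian value slots: the slot number now depends on the alphabetical position of the attribute name. The following is a minimal sketch of that mapping, assuming a hypothetical stand-in class `BlueprintSketch` (not part of XapianDb) that only mirrors the `attribute_names` and `value_index_for` logic shown further down in this diff.

```ruby
# Hypothetical stand-in for XapianDb::DocumentBlueprint.
# It mirrors only the slot-numbering logic from this diff.
class BlueprintSketch
  def initialize(attributes_hash)
    @attributes_hash = attributes_hash
  end

  # Sorting the keys gives a stable order even when the hash
  # does not preserve insertion order.
  def attribute_names
    @attributes_hash.keys.sort
  end

  # Value slot 0 is reserved for the class name, so attributes start at 1.
  def value_index_for(attribute_name)
    index = attribute_names.index(attribute_name.to_sym)
    raise ArgumentError, "Attribute #{attribute_name} unknown" unless index
    index + 1
  end
end

blueprint = BlueprintSketch.new(:name => nil, :first_name => nil)
first_name_slot = blueprint.value_index_for(:first_name) # => 1
name_slot       = blueprint.value_index_for("name")      # => 2
```

Because the slot of an attribute can differ from the insertion-order numbering used before, documents written by an earlier version may hold values in different slots, which is consistent with the reindex requirement stated above.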
data/README.rdoc CHANGED
@@ -137,7 +137,8 @@ Use blocks for complex evaluations of attributes or indexed values:
137
137
  end
138
138
  end
139
139
 
140
- You can place this configuration anywhere, e.g. in an initializer.
140
+ Place these configurations either into the corresponding class or - I prefer to have the index configurations outside
141
+ the models - into the file config/xapian_blueprints.rb.
141
142
 
142
143
  === Update the index
143
144
 
@@ -157,23 +158,30 @@ In verbose mode, XapianDb will use the progressbar gem if available.
157
158
 
158
159
  A simple query looks like this:
159
160
 
160
- results = XapianDb.search("Foo")
161
+ results = XapianDb.search "Foo"
161
162
 
162
163
  You can use wildcards and boolean operators:
163
164
 
164
- results = XapianDb.search("fo* or baz")
165
+ results = XapianDb.search "fo* or baz"
165
166
 
166
167
  You can query attributes:
167
168
 
168
- results = XapianDb.search("name:Foo")
169
+ results = XapianDb.search "name:Foo"
169
170
 
170
171
  You can query objects of a specific class:
171
172
 
172
- results = Person.search("name:Foo")
173
+ results = Person.search "name:Foo"
173
174
 
174
175
  If you want to override the default of 10 docs per page, pass the :per_page argument:
175
176
 
176
- results = Person.search("name:Foo", :per_page => 20)
177
+ results = Person.search "name:Foo", :per_page => 20
178
+
179
+ On class queries you can specify order options:
180
+
181
+ results = Person.search "name:Foo", :order => :first_name
182
+ results = Person.search "Fo*", :order => [:name, :first_name], :sort_decending => true
183
+
184
+ Please note that the order option is not available for global searches (XapianDb.search...)
177
185
 
178
186
  === Process the results
179
187
 
@@ -189,8 +197,8 @@ If you use a persistent database, the resultset may contain a spelling correctio
189
197
 
190
198
  To access the found documents, get a page from the resultset:
191
199
 
192
- page = result.paginate # Get the first page
193
- page = result.paginate :page => 2 # Get the second page
200
+ page = results.paginate # Get the first page
201
+ page = results.paginate :page => 2 # Get the second page
194
202
 
195
203
  Now you can access the documents:
196
204
 
@@ -199,9 +207,25 @@ Now you can access the documents:
199
207
  puts doc.name # We can access the configured attributes
200
208
  person = doc.indexed_object # Access the object behind this doc (lazy loaded)
201
209
 
210
+ Use a search result with will_paginate in a view:
211
+
212
+ <%= will_paginate @results %>
213
+
214
+ === Facets
215
+
216
+ If you want to implement a simple drilldown for your searches, you can use a facets query:
217
+
218
+ search_expression = "Foo"
219
+ facets = XapianDb.facets(search_expression)
220
+ facets.each do |klass, count|
221
+ puts "#{klass.name}: #{count} hits"
222
+
223
+ # This is how you would get all documents for the facet
224
+ # doc = klass.search search_expression
225
+ end
226
+
227
+ Facet support in XapianDb is very limited. The only available facet is the class of the indexed objects. In many cases that's all that's needed. Therefore, it is very likely that I won't add more options for facets (since I'm not a fan of facets anyway). However, if you desperately need advanced facets, let me know. Or - even better - send me a pull request with a nice implementation ;-)
202
228
 
203
229
  == What to expect from future releases
204
230
 
205
- * facet support
206
- * will_paginate support
207
231
  * asynchronous index writer based on {resque}[https://github.com/defunkt/resque] for production environments
@@ -19,8 +19,27 @@ module XapianDb
19
19
  klass.class_eval do
20
20
 
21
21
  # Add a method to search models of this class
22
- define_singleton_method(:search) do |expression|
23
- XapianDb.database.search "indexed_class:#{klass.name.downcase} and (#{expression})"
22
+ # Options:
23
+ # - :order (Array<Symbol>) Accepts an array of attribute names for sorting
24
+ # - :sort_decending (Boolean) Allows reversing the sort order
25
+ define_singleton_method(:search) do |expression, options={}|
26
+ options = {:sort_decending => false}.merge options
27
+ class_scope = "indexed_class:#{klass.name.downcase}"
28
+
29
+ if options[:order]
30
+ attr_names = [options[:order]].flatten
31
+ blueprint = XapianDb::DocumentBlueprint.blueprint_for klass
32
+ sort_indices = attr_names.map {|attr_name| blueprint.value_index_for(attr_name)}
33
+ options[:sort_indices] = attr_names.map {|attr_name| blueprint.value_index_for(attr_name)}
34
+ end
35
+ result = XapianDb.database.search "#{class_scope} and (#{expression})", options
36
+
37
+ # Remove the class scope from the spelling suggestion (if any)
38
+ unless result.spelling_suggestion.empty?
39
+ scope_length = "#{class_scope} and (".size
40
+ result.spelling_suggestion = result.spelling_suggestion.slice scope_length..-2
41
+ end
42
+ result
24
43
  end
25
44
 
26
45
  end
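The spelling suggestion fix in the hunk above relies on plain string slicing: the class scope prefix and the closing parenthesis are cut off the suggestion so that only the user's own expression remains. A minimal sketch, assuming a hypothetical suggestion string (the real one comes from the query parser):

```ruby
class_scope = "indexed_class:person"

# Hypothetical suggestion, shaped like the scoped query built in the diff
suggestion = "#{class_scope} and (name:food)"

# Skip the scope prefix including " and (" and drop the trailing ")"
scope_length = "#{class_scope} and (".size
user_suggestion = suggestion.slice(scope_length..-2)

puts user_suggestion # prints "name:food"
```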
@@ -44,6 +44,10 @@ module XapianDb
44
44
  # @param [String] expression A valid search expression.
45
45
  # @param [Hash] options
46
46
  # @option options [Integer] :per_page (10) How many docs per page?
47
+ # @option options [Array<Integer>] :sort_indices (nil) An array of attribute indices to sort by. This
48
+ # option is used internally by the search method implemented on configured classes. Do not use it
49
+ # directly unless
50
+ # you know what you are doing
47
51
  # @example Simple Query
48
52
  # resultset = db.search("foo")
49
53
  # @example Wildcard Query
@@ -54,16 +58,45 @@ module XapianDb
54
58
  # resultset = db.search("name:foo")
55
59
  # @return [XapianDb::Resultset] The resultset
56
60
  def search(expression, options={})
57
- opts = {:per_page => 10}.merge(options)
61
+ opts = {:per_page => 10, :sort_decending => false}.merge(options)
58
62
  @query_parser ||= QueryParser.new(self)
59
63
  query = @query_parser.parse(expression)
60
64
  enquiry = Xapian::Enquire.new(reader)
61
65
  enquiry.query = query
66
+
67
+ if opts[:sort_indices]
68
+ raise ArgumentError.new("Sorting is available for class scoped searches only") unless expression =~ /^indexed_class:/
69
+ sorter = Xapian::MultiValueSorter.new
70
+ options[:sort_indices].each do |index|
71
+ sorter.add(index, opts[:sort_decending])
72
+ end
73
+ enquiry.set_sort_by_key_then_relevance(sorter)
74
+ end
75
+
62
76
  opts[:spelling_suggestion] = @query_parser.spelling_suggestion
63
77
  opts[:db_size] = self.size
64
78
  Resultset.new(enquiry, opts)
65
79
  end
66
80
 
81
+ # A very simple implementation of facets limited to the class facets.
82
+ # @param [String] expression A valid search expression (see {#search} for examples).
83
+ # @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
84
+ def facets(expression)
85
+ @query_parser ||= QueryParser.new(self)
86
+ query = @query_parser.parse(expression)
87
+ enquiry = Xapian::Enquire.new(reader)
88
+ enquiry.query = query
89
+ enquiry.collapse_key = 0 # Value 0 always contains the class name
90
+ facets = {}
91
+ enquiry.mset(0, self.size).matches.each do |match|
92
+ class_name = match.document.values[0].value
93
+ # We must add 1 to the collapse_count since collapse_count means
94
+ # "how many other matches are there?"
95
+ facets[Kernel.const_get(class_name)] = match.collapse_count + 1
96
+ end
97
+ facets
98
+ end
99
+
67
100
  end
68
101
 
69
102
  # In Memory database
@@ -72,10 +72,48 @@ module XapianDb
72
72
  # Instance methods
73
73
  # ---------------------------------------------------------------------------------
74
74
 
75
+ # Get the names of all configured attributes sorted alphabetically
76
+ # @return [Array<Symbol>] The names of the attributes
77
+ def attribute_names
78
+ @attributes_hash.keys.sort
79
+ end
80
+
81
+ # Get the block associated with an attribute
82
+ # @param [Symbol] attribute The name of the attribute
83
+ # @return [Block] The block
84
+ def block_for_attribute(attribute)
85
+ @attributes_hash[attribute]
86
+ end
87
+
88
+ # Get the names of all configured index methods sorted alphabetically
89
+ # @return [Array<Symbol>] The names of the index_methods
90
+ def indexed_method_names
91
+ @indexed_methods_hash.keys.sort
92
+ end
93
+
94
+ # Get the options for an indexed method
95
+ # @param [Symbol] method The name of the method
96
+ # @return [IndexOptions] The options
97
+ def options_for_indexed_method(method)
98
+ @indexed_methods_hash[method]
99
+ end
100
+
101
+ # Return the value index of an attribute. Needed to access the value of an attribute
102
+ # from a Xapian document.
103
+ # @param [String, Symbol] attribute_name The name of the attribute
104
+ # @return [Integer] The value index of the attribute
105
+ # @raise ArgumentError if the attribute name is unknown
106
+ def value_index_for(attribute_name)
107
+ index = attribute_names.index attribute_name.to_sym
108
+ raise ArgumentError.new("Attribute #{attribute_name} unknown") unless index
109
+ # We add 1 because value slot 0 is reserved for the class name
110
+ index + 1
111
+ end
112
+
75
113
  # Return an array of all configured text methods in this blueprint
76
114
  # @return [Array<String>] All searchable prefixes
77
115
  def searchable_prefixes
78
- @prefixes ||= indexed_methods_hash.keys
116
+ @prefixes ||= @indexed_methods_hash.keys
79
117
  end
80
118
 
81
119
  # Lazily build and return a module that implements accessors for each field
@@ -91,7 +129,7 @@ module XapianDb
91
129
  end
92
130
  end
93
131
 
94
- @attributes_hash.keys.each_with_index do |field, index|
132
+ @attributes_hash.keys.sort.each_with_index do |field, index|
95
133
  @accessors_module.instance_eval do
96
134
  define_method field do
97
135
  YAML::load(self.values[index+1].value)
@@ -112,15 +150,6 @@ module XapianDb
112
150
  # configured class must implement this method.
113
151
  attr_reader :lang_method
114
152
 
115
- # Collection of the configured attribute methods
116
- # @return [Array<Symbol>] The names of the configured attribute methods
117
- attr_reader :attributes_hash
118
-
119
- # Collection of the configured index methods
120
- # @return [Hash<Symbol, IndexOptions>] A hashtable containing all index methods as
121
- # keys and IndexOptions as values
122
- attr_reader :indexed_methods_hash
123
-
124
153
  # Set / read a custom adapter.
125
154
  # Use this configuration option if you need a specific adapter for an indexed class.
126
155
  # If set, it overrides the globally configured adapter (see also {Config#adapter})
@@ -166,8 +195,8 @@ module XapianDb
166
195
  self.index(name, opts, &block) if opts[:index]
167
196
  end
168
197
 
169
- # Add list of attributes to the blueprint. Attributes will be stored in the xapian documents an can be
170
- # accessed from a search result.
198
+ # Add a list of attributes to the blueprint. Attributes will be stored in the xapian documents and
199
+ # can be accessed from a search result.
171
200
  # @param [Array] attributes An array of method names that deliver the values for the attributes
172
201
  # @todo Make sure the name does not collide with a method name of Xapian::Document
173
202
  def attributes(*attributes)
@@ -35,15 +35,14 @@ module XapianDb
35
35
  # We store the class name of the object at position 0
36
36
  @xapian_doc.add_value(0, @obj.class.name)
37
37
 
38
- pos = 1
39
- @blueprint.attributes_hash.each do |attribute, block|
38
+ @blueprint.attribute_names.each do |attribute|
39
+ block = @blueprint.block_for_attribute attribute
40
40
  if block
41
41
  value = @obj.instance_eval(&block)
42
42
  else
43
43
  value = @obj.send(attribute)
44
44
  end
45
- @xapian_doc.add_value(pos, value.to_yaml)
46
- pos += 1
45
+ @xapian_doc.add_value(@blueprint.value_index_for(attribute), value.to_yaml)
47
46
  end
48
47
  end
49
48
 
@@ -68,7 +67,26 @@ module XapianDb
68
67
  @xapian_doc.add_term("C#{@obj.class}")
69
68
 
70
69
 
71
- @blueprint.indexed_methods_hash.each do |method, options|
70
+ # @blueprint.indexed_methods_hash.keys.sort.each do |method|
71
+ # options = @blueprint.indexed_methods_hash[method]
72
+ # if options.block
73
+ # obj = @obj.instance_eval(&options.block)
74
+ # else
75
+ # obj = @obj.send(method)
76
+ # end
77
+ # unless obj.nil?
78
+ # values = get_values_to_index_from obj
79
+ # values.each do |value|
80
+ # # Add value with field name
81
+ # term_generator.index_text(value.to_s.downcase, options.weight, "X#{method.upcase}")
82
+ # # Add value without field name
83
+ # term_generator.index_text(value.to_s.downcase)
84
+ # end
85
+ # end
86
+ # end
87
+
88
+ @blueprint.indexed_method_names.each do |method|
89
+ options = @blueprint.options_for_indexed_method method
72
90
  if options.block
73
91
  obj = @obj.instance_eval(&options.block)
74
92
  else
@@ -84,6 +102,7 @@ module XapianDb
84
102
  end
85
103
  end
86
104
  end
105
+
87
106
  end
88
107
 
89
108
  private
@@ -40,5 +40,11 @@ module XapianDb
40
40
 
41
41
  end
42
42
 
43
+ config.to_prepare do
44
+ # Load a blueprint config if there is one
45
+ blueprints_file_path = "#{Rails.root}/config/xapian_blueprints.rb"
46
+ load blueprints_file_path if File.exist?(blueprints_file_path)
47
+ end
48
+
43
49
  end
44
50
  end
@@ -4,10 +4,13 @@ module XapianDb
4
4
 
5
5
  # The resultset encapsulates a Xapian::Query object and allows paged access
6
6
  # to the found documents.
7
+ # The resultset is compatible with will_paginate.
7
8
  # @example Process the first page of a resultset
8
9
  # resultset.paginate(:page => 1, :per_page => 10).each do |doc|
9
10
  # # do something with the xapian document
10
11
  # end
12
+ # @example Use the resultset and will_paginate in a view
13
+ # <%= will_paginate resultset %>
11
14
  # @author Gernot Kogler
12
15
  class Resultset
13
16
 
@@ -15,9 +18,17 @@ module XapianDb
15
18
  # @return [Integer]
16
19
  attr_reader :size
17
20
 
21
+ # The number of pages
22
+ # @return [Integer]
23
+ attr_reader :total_pages
24
+
25
+ # The current page
26
+ # @return [Integer]
27
+ attr_reader :current_page
28
+
18
29
  # The spelling corrected query (if a language is configured)
19
30
  # @return [String]
20
- attr_reader :spelling_suggestion
31
+ attr_accessor :spelling_suggestion
21
32
 
22
33
  # Constructor
23
34
  # @param [Xapian::Enquire] enquiry a Xapian query result (see http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html)
@@ -29,15 +40,30 @@ module XapianDb
29
40
  # To get more accurate results, we pass the doc count to the mset method
30
41
  @size = enquiry.mset(0, options[:db_size]).matches_estimated
31
42
  @spelling_suggestion = options[:spelling_suggestion]
32
- @per_page = options[:per_page]
43
+ @per_page = options[:per_page]
44
+ @total_pages = (@size / @per_page.to_f).ceil
45
+ @current_page = 1
33
46
  end
34
47
 
35
48
  # Paginate the result
36
49
  # @param [Hash] opts Options for the persistent database
37
50
  # @option opts [Integer] :page (1) The page to access
38
51
  def paginate(opts={})
39
- options = {:page => 1}.merge(opts)
40
- build_page(options[:page])
52
+ options = {:page => 1}.merge(opts)
53
+ @current_page = options[:page].to_i
54
+ build_page(@current_page)
55
+ end
56
+
57
+ # The previous page number
58
+ # @return [Integer] The number of the previous page or nil, if we are at page 1
59
+ def previous_page
60
+ @current_page > 1 ? (@current_page - 1) : nil
61
+ end
62
+
63
+ # The next page number
64
+ # @return [Integer] The number of the next page or nil, if we are at the last page
65
+ def next_page
66
+ @current_page < @total_pages ? (@current_page + 1): nil
41
67
  end
42
68
 
43
69
  private
@@ -45,6 +71,7 @@ module XapianDb
45
71
  # Build a page of Xapian documents
46
72
  # @return [Array<Xapian::Document>] An array of xapian documents
47
73
  def build_page(page)
74
+ page.nil? ? page = 1 : page = page.to_i
48
75
  docs = []
49
76
  offset = (page - 1) * @per_page
50
77
  return [] if offset > @size
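The paging attributes added to the resultset (`total_pages`, `current_page`, `previous_page`, `next_page`) are simple arithmetic on the estimated size and the page size. A minimal sketch of that arithmetic, assuming a hypothetical stand-in class `PagingSketch` with a helper `go_to` that is not part of XapianDb:

```ruby
# Hypothetical stand-in; mirrors only the paging arithmetic from the diff.
class PagingSketch
  attr_reader :total_pages, :current_page

  def initialize(size, per_page)
    # Round up so a partially filled last page still counts
    @total_pages  = (size / per_page.to_f).ceil
    @current_page = 1
  end

  # Hypothetical helper; the real class sets the page inside paginate
  def go_to(page)
    @current_page = page.to_i
  end

  # nil signals that there is no previous page
  def previous_page
    @current_page > 1 ? @current_page - 1 : nil
  end

  # nil signals that there is no next page
  def next_page
    @current_page < @total_pages ? @current_page + 1 : nil
  end
end

paging = PagingSketch.new(45, 20)
pages  = paging.total_pages # => 3
paging.go_to("2")           # page numbers may arrive as strings from a request
```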
data/lib/xapian_db.rb CHANGED
@@ -77,6 +77,13 @@ module XapianDb
77
77
  XapianDb::Config.database.search(expression)
78
78
  end
79
79
 
80
+ # Get facets from the configured database.
81
+ # See {XapianDb::Database#facets} for options
82
+ # @return [Hash<Class, Integer>] A hash containing the classes and the hits per class
83
+ def self.facets(expression)
84
+ XapianDb::Config.database.facets(expression)
85
+ end
86
+
80
87
  end
81
88
 
82
89
  do_not_require = %w(update_stopwords.rb railtie.rb base_adapter.rb)
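The facets implementation collapses all matches on value slot 0 (the class name) and adds 1 to the collapse count, since that count only covers the other matches of the same class. The counting can be sketched without Xapian as follows. The list of class names is hypothetical sample data, and Ruby core classes stand in for indexed model classes.

```ruby
# Hypothetical sample: the class name stored in value slot 0 of each match
matched_class_names = %w(String Array String String)

facets = {}
matched_class_names.group_by { |name| name }.each do |class_name, group|
  # One representative match survives the collapse; the rest are "other matches"
  collapse_count = group.size - 1
  facets[Object.const_get(class_name)] = collapse_count + 1
end

facets.each do |klass, count|
  puts "#{klass.name}: #{count} hits"
end
```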
metadata CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
4
4
  prerelease: false
5
5
  segments:
6
6
  - 0
7
- - 3
8
7
  - 4
9
- version: 0.3.4
8
+ - 0
9
+ version: 0.4.0
10
10
  platform: ruby
11
11
  authors:
12
12
  - Gernot Kogler
@@ -14,7 +14,7 @@ autorequire:
14
14
  bindir: bin
15
15
  cert_chain: []
16
16
 
17
- date: 2010-12-13 00:00:00 +01:00
17
+ date: 2010-12-15 00:00:00 +01:00
18
18
  default_executable:
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency