psq-dm-xapian 0.3.1

Files changed (8)
  1. data/CHANGES.txt +4 -0
  2. data/LICENSE +51 -0
  3. data/README.txt +57 -0
  4. data/Rakefile +57 -0
  5. data/SETUP.txt +91 -0
  6. data/TODO +0 -0
  7. data/lib/dm-xapian.rb +702 -0
  8. metadata +71 -0
data/CHANGES.txt ADDED
@@ -0,0 +1,4 @@
1
+ * 2008-11-08
2
+ - version is now 0.3
3
+ - applied patch from Emmanuel Surleau: make the rake tasks work with merb 0.9.13 (should also work with 1.0, I assume, but I haven't upgraded yet). In addition, rebuild_index complains in an explicit way when a class name is not a model or indexed by Xapian
4
+
data/LICENSE ADDED
@@ -0,0 +1,51 @@
1
+ dm-xapian is released under the MIT License.
2
+ Copyright (c) 2008 Joshaven Potter <yourtech@gmail.com>
3
+ Copyright (c) 2008 Pascal Belloncle <psq@nanorails.com>
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23
+
24
+
25
+
26
+
27
+
28
+ This software is a Fork of http://github.com/frabcus/acts_as_xapian which
29
+ was licensed under the following:
30
+
31
+ acts_as_xapian is released under the MIT License.
32
+
33
+ Copyright (c) 2008 UK Citizens Online Democracy.
34
+
35
+ Permission is hereby granted, free of charge, to any person obtaining a copy
36
+ of the acts_as_xapian software and associated documentation files (the
37
+ "Software"), to deal in the Software without restriction, including without
38
+ limitation the rights to use, copy, modify, merge, publish, distribute,
39
+ sublicense, and/or sell copies of the Software, and to permit persons to whom
40
+ the Software is furnished to do so, subject to the following conditions:
41
+
42
+ The above copyright notice and this permission notice shall be included in all
43
+ copies or substantial portions of the Software.
44
+
45
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
46
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
47
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
48
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
49
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
50
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
51
+ THE SOFTWARE.
data/README.txt ADDED
@@ -0,0 +1,57 @@
1
+ CONTRIBUTORS WELCOME
2
+
3
+ Currently somewhat tested, but, ahem, no spec:
4
+
5
+ * rake xapian:update_index models="model1 model2 ..."
6
+ * rake xapian:rebuild_index models="model1 model2 ..."
7
+ * rake xapian:query models="model1 model2 ..." query="..."
8
+ * tracking of models that need reindexing after being added or updated (processed by rake xapian:update_index)
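The `models="model1 model2 ..."` argument of the rake tasks above is a space-separated list of model class names. The task code is not part of this listing, so the following is only a sketch of how such an argument can be turned into model classes. `indexed_models_from` is a hypothetical helper; it fails loudly when a name is not a class or the class never called `is_indexed` (which defines `xapian_options`), in the spirit of the CHANGES.txt note about `rebuild_index`.

```ruby
# Hypothetical helper (not part of dm-xapian): turn the models="..." rake
# argument into model classes, complaining explicitly about bad names.
def indexed_models_from(models_arg)
  raise "pass models=\"Model1 Model2 ...\"" if models_arg.nil? || models_arg.strip.empty?

  models_arg.split.map do |name|
    klass = begin
      Object.const_get(name)
    rescue NameError
      raise "#{name} is not a model"
    end
    # is_indexed defines xapian_options on the model class
    raise "#{name} is not indexed by Xapian (no is_indexed call)" unless klass.respond_to?(:xapian_options)
    klass
  end
end
```

Inside a task it would be called as `indexed_models_from(ENV['models'])`.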
9
+
10
+
11
+ dm-xapian
12
+ =========
13
+
14
+ Merb plugin that provides use of the Ruby Xapian search engine library.
15
+
16
+
17
+ Setup
18
+ =====
19
+
20
+ For setup instructions, read through SETUP.txt; it's short and tells you what else you need, where to get it, and what to do with it.
21
+
22
+
23
+ Xapian
24
+ ======
25
+
26
+ Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby (so far!)
27
+
28
+ Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
29
+
30
+ If you're after a packaged search engine for your website, you should take a look at Omega, an application the Xapian project supplies built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.
31
+
32
+ At the time of writing, the latest stable version is 1.0.7, released on 2008-07-15.
33
+
34
+ http://xapian.org/
35
+
36
+
37
+ Xapian Bindings for Ruby
38
+ ========================
39
+ The Ruby bindings for Xapian are packaged in the xapian module.
40
+
41
+ General info: http://xapian.org/docs/bindings/ruby/
42
+ API Docs: http://xapian.org/docs/bindings/ruby/rdocs/
43
+
44
+ To Use
45
+ ======
46
+ * install gem via "rake install"
47
+ * add to config/init.rb:
48
+ dependencies "dm-xapian"
49
+ * add the dm-xapian models to the database:
50
+ rake dm:automigrate
51
+ * Add to each model:
52
+ is_indexed :texts => [ :name, :region, :country, :varietal ],
53
+ :values => [[:price, 0, "price", :number], [:ean, 1, "ean", :string]],
54
+ :terms => [ [ :winery, 'W', "winery" ] ]
55
+
56
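+ * Term codes and value indexes are global across all models: a code such as 'W' or a value index such as 0 must map to the same prefix in every model that uses it
57
+ * texts, values and terms are all read from model properties; values are [property, value index, prefix, type] with type :date, :string or :number, and terms are [property, capital-letter code, prefix]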
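When the first search runs, `init_query_parser` in lib/dm-xapian.rb checks the `is_indexed` options of every model and raises on the first problem. The sketch below repeats those checks in plain Ruby so an options hash can be tried out without Xapian installed. `check_xapian_options` is a hypothetical helper, not part of dm-xapian, and it is slightly stricter than the original: it rejects an unknown value type every time, whereas the original only checks the type the first time a value index is seen.

```ruby
# Hypothetical validator mirroring the checks in ActsAsXapian.init_query_parser.
# options_by_model maps a model name to the hash passed to is_indexed.
VALUE_TYPES = [:date, :string, :number]

def check_xapian_options(options_by_model)
  prefix_for_code  = {} # term code (e.g. "W") => prefix (e.g. "winery")
  prefix_for_value = {} # value index (e.g. 0) => prefix (e.g. "price")

  options_by_model.each do |model, options|
    (options[:terms] || []).each do |_property, code, prefix|
      raise "#{model}: use a single capital letter for term code" unless code =~ /\A[A-Z]\z/
      raise "#{model}: M, I and Z are reserved term codes" if %w(M I Z).include?(code)
      raise "#{model}: model and modelid are reserved prefixes" if %w(model modelid).include?(prefix)
      if prefix_for_code.key?(code) && prefix_for_code[code] != prefix
        raise "#{model}: code '#{code}' already used with prefix '#{prefix_for_code[code]}'"
      end
      prefix_for_code[code] = prefix
    end

    (options[:values] || []).each do |_property, index, prefix, type|
      raise "#{model}: value index must be an integer" unless index.is_a?(Integer)
      raise "#{model}: unknown value type '#{type}'" unless VALUE_TYPES.include?(type)
      if prefix_for_value.key?(index) && prefix_for_value[index] != prefix
        raise "#{model}: value index #{index} already used with prefix '#{prefix_for_value[index]}'"
      end
      prefix_for_value[index] = prefix
    end
  end
  true
end
```

The value type must be one of `:date`, `:string` or `:number`; any other symbol (such as `:float`) makes `init_query_parser` raise "Unknown value type".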
data/Rakefile ADDED
@@ -0,0 +1,57 @@
1
+ require 'rubygems'
2
+ require 'rake/gempackagetask'
3
+ require 'rubygems/specification'
4
+ require 'date'
5
+ require 'merb-core/version'
6
+ require 'merb-core/tasks/merb_rake_helper'
7
+
8
+ NAME = "dm-xapian"
9
+ GEM_VERSION = "0.3"
10
+ AUTHOR = "Joshaven Potter, Pascal Belloncle"
11
+ EMAIL = "yourtech@gmail.com, psq@nanorails.com"
12
+ HOMEPAGE = "http://github.com/psq/dm-xapian"
13
+ SUMMARY = "Merb plugin that provides use of the Ruby Xapian search engine library"
14
+
15
+ spec = Gem::Specification.new do |s|
16
+ s.rubyforge_project = 'merb'
17
+ s.name = NAME
18
+ s.version = GEM_VERSION
19
+ s.platform = Gem::Platform::RUBY
20
+ s.has_rdoc = true
21
+ s.extra_rdoc_files = ["README.txt", "LICENSE", 'TODO', 'SETUP.txt', 'CHANGES.txt']
22
+ s.summary = SUMMARY
23
+ s.description = s.summary
24
+ s.author = AUTHOR
25
+ s.email = EMAIL
26
+ s.homepage = HOMEPAGE
27
+ s.add_dependency('merb', '>= 0.9.7')
28
+ s.require_path = 'lib'
29
+ s.files = %w(LICENSE README.txt Rakefile TODO CHANGES.txt SETUP.txt) + Dir.glob("{lib,spec}/**/*")
30
+
31
+ end
32
+
33
+ Rake::GemPackageTask.new(spec) do |pkg|
34
+ pkg.gem_spec = spec
35
+ end
36
+
37
+ desc "install the plugin locally"
38
+ task :install => [:package] do
39
+ Merb::RakeHelper.install(NAME, :version => GEM_VERSION)
40
+ # sudo "gem install #{install_home} pkg/#{NAME}-#{GEM_VERSION} --no-update-sources"
41
+ end
42
+
43
+ desc "create a gemspec file"
44
+ task :make_spec do
45
+ File.open("#{NAME}.gemspec", "w") do |file|
46
+ file.puts spec.to_ruby
47
+ end
48
+ end
49
+
50
+ namespace :jruby do
51
+
52
+ desc "Run :package and install the resulting .gem with jruby"
53
+ task :install => :package do
54
+ sudo "jruby -S gem install #{install_home} pkg/#{NAME}-#{GEM_VERSION}.gem --no-rdoc --no-ri"
55
+ end
56
+
57
+ end
data/SETUP.txt ADDED
@@ -0,0 +1,91 @@
1
+ Overview
2
+ ========
3
+
4
+ I have basic copy and paste instructions below... *You may want to set the SRC bash variable*
5
+ The same instructions with extra text around them are here: http://xapian.org/docs/install.html
6
+
7
+ If you received the following error, read the appropriate section, 'On OS X' or 'On Linux':
8
+ acts_as_xapian: No Ruby bindings for Xapian installed
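To find out whether you need the sections below, you can ask Ruby directly. This minimal sketch assumes only what lib/dm-xapian.rb itself does, namely that the bindings load with `require 'xapian'`; save it to a file (for example a hypothetical check_xapian.rb) and run it with `ruby`.

```ruby
# Returns true if the Xapian Ruby bindings can be loaded, false otherwise.
def xapian_bindings_available?
  require 'xapian'
  true
rescue LoadError
  false
end

puts(xapian_bindings_available? ? "Xapian Ruby bindings found" : "No Ruby bindings for Xapian installed")
```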
9
+
10
+
11
+
12
+
13
+ Everyone
14
+ ========
15
+
16
+ After downloading and installing Xapian and its bindings, you'll need to drop this plugin in your plugins folder.
17
+ TODO: Complete this doc & verify drop-in functionality mentioned above
18
+
19
+
20
+
21
+
22
+ On OS X (with Xcode so you can compile)
23
+ =======================================
24
+
25
+ # navigate to your source directory or create one with mkdir ~/src
26
+ SRC=~/src
27
+ cd $SRC
28
+
29
+ # download packages (you're welcome to check for newer releases, but don't blame me if they don't work)
30
+ curl -O http://oligarchy.co.uk/xapian/1.0.7/xapian-core-1.0.7.tar.gz
31
+ curl -O http://oligarchy.co.uk/xapian/1.0.7/xapian-bindings-1.0.7.tar.gz
32
+
33
+ # uncompress the downloads
34
+ tar zxvf xapian-core-1.0.7.tar.gz
35
+ tar zxvf xapian-bindings-1.0.7.tar.gz
36
+
37
+ # compile & install xapian-core
38
+ cd $SRC/xapian-core-1.0.7
39
+ ./configure --prefix=/opt
40
+ make
41
+ sudo make install
42
+
43
+
44
+ # compile & install xapian-bindings ** the ruby goes here **
45
+ # You can find the Xapian API here: http://xapian.org/docs/bindings/ruby/rdocs/classes/Xapian.html
46
+ cd $SRC/xapian-bindings-1.0.7
47
+ ./configure XAPIAN_CONFIG=/opt/bin/xapian-config
48
+ make
49
+ sudo make install
50
+ if [ $? = 0 ]; then echo; echo; echo "All set, have fun."; echo "You can find the Xapian API here: http://xapian.org/docs/bindings/ruby/rdocs/classes/Xapian.html"; else echo; echo; echo "Something went wrong... try slower this time."; fi; echo; echo
51
+
52
+
53
+
54
+
55
+ On Linux
56
+ ========
57
+
58
+ SRC=~/src
59
+ cd $SRC
60
+ wget http://oligarchy.co.uk/xapian/1.0.7/xapian-core-1.0.7.tar.gz
61
+ wget http://oligarchy.co.uk/xapian/1.0.7/xapian-bindings-1.0.7.tar.gz
62
+
63
+ echo uncompressing the downloads
64
+ tar zxvf xapian-core-1.0.7.tar.gz
65
+ tar zxvf xapian-bindings-1.0.7.tar.gz
66
+
67
+ echo "compile & install xapian-core"
68
+ cd $SRC/xapian-core-1.0.7
69
+ ./configure --prefix=/opt
70
+ make
71
+ sudo make install
72
+
73
+
74
+ echo "compile & install xapian-bindings ** the ruby goes here **"
75
+ cd $SRC/xapian-bindings-1.0.7
76
+ ./configure XAPIAN_CONFIG=/opt/bin/xapian-config
77
+ make
78
+ sudo make install
79
+
80
+ if [ $? = 0 ]; then echo; echo; echo "All set, have fun."; echo "You can find the Xapian API here: http://xapian.org/docs/bindings/ruby/rdocs/classes/Xapian.html"; else echo; echo; echo "Something went wrong... try slower this time."; fi; echo; echo
81
+
82
+
83
+
84
+
85
+ On Windows
86
+ ==========
87
+ Click: Start
88
+ Click: Run
89
+ No, wait...
90
+ Save yourself time: get a Mac and follow the instructions above, or at least use a VM to install on Ubuntu
91
+
data/TODO ADDED
File without changes
data/lib/dm-xapian.rb ADDED
@@ -0,0 +1,702 @@
1
+ # acts_as_xapian/lib/acts_as_xapian.rb:
2
+ # Xapian full text search in Ruby on Rails.
3
+ #
4
+ # Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
5
+ # Email: francis@mysociety.org; WWW: http://www.mysociety.org/
6
+ #
7
+ # Documentation
8
+ # =============
9
+ #
10
+ # See ../README.txt for documentation. Please update that file if you edit
11
+ # code.
12
+
13
+ # Make it so if Xapian isn't installed, the Rails app doesn't fail completely,
14
+ # just when somebody does a search.
15
+ begin
16
+ require 'xapian'
17
+ $acts_as_xapian_bindings_available = true
18
+ rescue LoadError
19
+ STDERR.puts "acts_as_xapian: No Ruby bindings for Xapian installed"
20
+ $acts_as_xapian_bindings_available = false
21
+ end
22
+
23
+ module ActsAsXapian
24
+ ######################################################################
25
+ # Module level variables
26
+ # XXX must be some kind of cattr_accessor that can do this better
27
+ def ActsAsXapian.bindings_available
28
+ $acts_as_xapian_bindings_available
29
+ end
30
+ class NoXapianRubyBindingsError < StandardError
31
+ end
32
+
33
+ # XXX global class initializers here get loaded more than once, don't know why. Protect them.
34
+ if not $acts_as_xapian_class_var_init
35
+ @@db = nil
36
+ @@db_path = nil
37
+ @@writable_db = nil
38
+ @@writable_suffix = nil
39
+ @@init_values = {}
40
+ $acts_as_xapian_class_var_init = true
41
+ end
42
+ def ActsAsXapian.db
43
+ @@db
44
+ end
45
+ def ActsAsXapian.db_path
46
+ @@db_path
47
+ end
48
+ def ActsAsXapian.writable_db
49
+ @@writable_db
50
+ end
51
+ def ActsAsXapian.stemmer
52
+ @@stemmer
53
+ end
54
+ def ActsAsXapian.term_generator
55
+ @@term_generator
56
+ end
57
+ def ActsAsXapian.enquire
58
+ @@enquire
59
+ end
60
+ def ActsAsXapian.query_parser
61
+ @@query_parser
62
+ end
63
+ def ActsAsXapian.values_by_prefix
64
+ @@values_by_prefix
65
+ end
66
+ def ActsAsXapian.config
67
+ @@config
68
+ end
69
+
70
+ def ActsAsXapian.configure(env, root)
71
+ @@environment = env
72
+ @@root = root
73
+ end
74
+
75
+ ######################################################################
76
+ def ActsAsXapian.set_options(classname, options)
77
+ if not classname.nil?
78
+ # store class and options for use later, when we open the db in readable_init
79
+ @@init_values[classname] = options
80
+ end
81
+ end
82
+
83
+ # Reads the config file (if any) and sets up the path to the database we'll be using
84
+ def ActsAsXapian.prepare_environment
85
+ return unless @@db_path.nil?
86
+
87
+ # barf if we can't figure out the environment
88
+ raise "Call ActsAsXapian.configure(env, root) first, so acts_as_xapian can find the right Xapian database" if not @@environment
89
+
90
+ # check for a config file
91
+ config_file = @@root + "/config/xapian.yml"
92
+ @@config = File.exists?(config_file) ? (YAML.load_file(config_file)[@@environment] || {}) : {}
93
+
94
+ # figure out where the DBs should go
95
+ db_parent_path = File.join(@@root, config['base_db_path'] || 'xapiandbs/')
96
+
97
+ # make the directory for the xapian databases to go in
98
+ Dir.mkdir(db_parent_path) unless File.exists?(db_parent_path)
99
+
100
+ @@db_path = File.join(db_parent_path, @@environment)
101
+
102
+ # make some things that don't depend on the db
103
+ # XXX this gets made once for each acts_as_xapian. Oh well.
104
+ @@stemmer = Xapian::Stem.new('english')
105
+ end
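`prepare_environment` places the database at `<root>/<base_db_path or "xapiandbs/">/<environment>`, where `base_db_path` comes from the current environment's section of config/xapian.yml. The sketch below reproduces only that path logic so it can be tried without Xapian; `xapian_db_path` is a hypothetical helper, and `yaml_text` stands in for the contents of config/xapian.yml (`nil` meaning there is no such file).

```ruby
require 'yaml'

# Hypothetical helper reproducing the database-path logic of
# ActsAsXapian.prepare_environment.
def xapian_db_path(root, env, yaml_text = nil)
  all_envs = yaml_text ? (YAML.load(yaml_text) || {}) : {}
  config = all_envs[env] || {} # a file without this environment falls back to defaults
  parent = File.join(root, config['base_db_path'] || 'xapiandbs/')
  File.join(parent, env)
end
```

Because the path is built with `File.join`, even a `base_db_path` that starts with `/` ends up under the application root (`File.join("/app", "/var/x")` is `"/app/var/x"`).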
106
+
107
+ # Opens / reopens the db for reading
108
+ # XXX we perhaps don't need to rebuild database and enquire and queryparser -
109
+ # but db.reopen wasn't enough by itself, so just do everything it's easier.
110
+ def ActsAsXapian.readable_init
111
+ raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
112
+ raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
113
+
114
+ # if DB is not nil, then we're already initialised, so don't do it again
115
+ # XXX we need to reopen the database each time, so Xapian gets changes to it.
116
+ # Hopefully in later version of Xapian it will autodetect this, and this can
117
+ # be commented back in again.
118
+ # return unless @@db.nil?
119
+
120
+ prepare_environment
121
+
122
+ # basic Xapian objects
123
+ begin
124
+ @@db = Xapian::Database.new(@@db_path)
125
+ @@enquire = Xapian::Enquire.new(@@db)
126
+ rescue IOError
127
+ raise "Xapian database not opened; have you built it with rake xapian:rebuild_index?"
128
+ end
129
+
130
+ init_query_parser
131
+ end
132
+
133
+ # Make a new query parser
134
+ def ActsAsXapian.init_query_parser
135
+ # for queries
136
+ @@query_parser = Xapian::QueryParser.new
137
+ @@query_parser.stemmer = @@stemmer
138
+ @@query_parser.stemming_strategy = Xapian::QueryParser::STEM_SOME
139
+ @@query_parser.database = @@db
140
+ @@query_parser.default_op = Xapian::Query::OP_AND
141
+
142
+ @@terms_by_capital = {}
143
+ @@values_by_number = {}
144
+ @@values_by_prefix = {}
145
+ @@value_ranges_store = []
146
+
147
+ @@init_values.each do |classname, options|
148
+
149
+ # go through the various field types, and tell query parser about them,
150
+ # and error check them - i.e. check for consistency between models
151
+ @@query_parser.add_boolean_prefix("model", "M")
152
+ @@query_parser.add_boolean_prefix("modelid", "I")
153
+ if options[:terms]
154
+ for term in options[:terms]
155
+ raise "Use a single capital letter for term code" if not term[1].match(/^[A-Z]$/)
156
+ raise "M and I are reserved for use as the model/id term" if term[1] == "M" or term[1] == "I"
157
+ raise "model and modelid are reserved for use as the model/id prefixes" if term[2] == "model" or term[2] == "modelid"
158
+ raise "Z is reserved for stemming terms" if term[1] == "Z"
159
+ raise "Already have code '" + term[1] + "' in another model but with different prefix '" + @@terms_by_capital[term[1]] + "'" if @@terms_by_capital.include?(term[1]) && @@terms_by_capital[term[1]] != term[2]
160
+ @@terms_by_capital[term[1]] = term[2]
161
+ @@query_parser.add_prefix(term[2], term[1])
162
+ end
163
+ end
164
+ if options[:values]
165
+ for value in options[:values]
166
+ raise "Value index '" + value[1].to_s + "' must be an integer, is " + value[1].class.to_s unless value[1].is_a?(Integer)
167
+ raise "Already have value index '" + value[1].to_s + "' in another model but with different prefix '" + @@values_by_number[value[1]].to_s + "'" if @@values_by_number.include?(value[1]) && @@values_by_number[value[1]] != value[2]
168
+
169
+ # register a value range processor only the first time a value index is seen; the type given by that first model is the one used
170
+ if !@@values_by_number.include?(value[1])
171
+ if value[3] == :date
172
+ value_range = Xapian::DateValueRangeProcessor.new(value[1])
173
+ elsif value[3] == :string
174
+ value_range = Xapian::StringValueRangeProcessor.new(value[1])
175
+ elsif value[3] == :number
176
+ value_range = Xapian::NumberValueRangeProcessor.new(value[1])
177
+ else
178
+ raise "Unknown value type '" + value[3].to_s + "'"
179
+ end
180
+
181
+ @@query_parser.add_valuerangeprocessor(value_range)
182
+
183
+ # stop it being garbage collected, as
184
+ # add_valuerangeprocessor ref is outside Ruby's GC
185
+ @@value_ranges_store.push(value_range)
186
+ end
187
+
188
+ @@values_by_number[value[1]] = value[2]
189
+ @@values_by_prefix[value[2]] = value[1]
190
+ end
191
+ end
192
+ end
193
+ end
194
+
195
+ def ActsAsXapian.writable_init(suffix = "")
196
+ raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
197
+ raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
198
+
199
+ # if DB is not nil, then we're already initialised, so don't do it again
200
+ return unless @@writable_db.nil?
201
+
202
+ prepare_environment
203
+
204
+ new_path = @@db_path + suffix
205
+ raise "writable_suffix/suffix inconsistency" if @@writable_suffix && @@writable_suffix != suffix
206
+ if @@writable_db.nil?
207
+ # for indexing
208
+ @@writable_db = Xapian::WritableDatabase.new(new_path, Xapian::DB_CREATE_OR_OPEN)
209
+ @@term_generator = Xapian::TermGenerator.new()
210
+ @@term_generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING, 0)
211
+ @@term_generator.database = @@writable_db
212
+ @@term_generator.stemmer = @@stemmer
213
+ @@writable_suffix = suffix
214
+ end
215
+ end
216
+
217
+ ######################################################################
218
+ # Search with a query or for similar models
219
+
220
+ # Base class for Search and Similar below
221
+ class QueryBase
222
+ attr_accessor :offset
223
+ attr_accessor :limit
224
+ attr_accessor :query
225
+ attr_accessor :matches
226
+ attr_accessor :query_models
227
+
228
+ def initialize_db
229
+ ActsAsXapian.readable_init
230
+ if ActsAsXapian.db.nil?
231
+ raise "ActsAsXapian not initialized"
232
+ end
233
+ end
234
+
235
+ # Set self.query before calling this
236
+ def initialize_query(options)
237
+ #raise options.to_yaml
238
+
239
+ offset = (options[:offset] || 0).to_i
240
+ limit = options[:limit]
241
+ raise "please specify maximum number of results to return with parameter :limit" if not limit
242
+ limit = limit.to_i
243
+ sort_by_prefix = options[:sort_by_prefix] || nil
244
+ sort_by_ascending = options[:sort_by_ascending].nil? ? true : options[:sort_by_ascending]
245
+ collapse_by_prefix = options[:collapse_by_prefix] || nil
246
+
247
+ ActsAsXapian.enquire.query = self.query
248
+
249
+ if sort_by_prefix.nil?
250
+ ActsAsXapian.enquire.sort_by_relevance!
251
+ else
252
+ value = ActsAsXapian.values_by_prefix[sort_by_prefix]
253
+ raise "couldn't find prefix '" + sort_by_prefix + "'" if value.nil?
254
+ ActsAsXapian.enquire.sort_by_value_then_relevance!(value, sort_by_ascending)
255
+ end
256
+ if collapse_by_prefix.nil?
257
+ ActsAsXapian.enquire.collapse_key = Xapian.BAD_VALUENO
258
+ else
259
+ value = ActsAsXapian.values_by_prefix[collapse_by_prefix]
260
+ raise "couldn't find prefix '" + collapse_by_prefix + "'" if value.nil?
261
+ ActsAsXapian.enquire.collapse_key = value
262
+ end
263
+
264
+ self.matches = ActsAsXapian.enquire.mset(offset, limit, 100)
265
+ @cached_results = nil
266
+ end
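`initialize_query` reads five options: `:offset` (default 0), `:limit` (required), `:sort_by_prefix`, `:sort_by_ascending` (default true) and `:collapse_by_prefix`. The hypothetical helper below mirrors only that normalization step, without touching any Xapian objects, to show how string values from request parameters are coerced and why the ascending flag needs a `nil` check instead of `||`.

```ruby
# Hypothetical helper mirroring the option handling in QueryBase#initialize_query.
def normalize_query_options(options)
  limit = options[:limit]
  raise "please specify maximum number of results to return with parameter :limit" unless limit

  {
    :offset => (options[:offset] || 0).to_i,
    :limit => limit.to_i,
    :sort_by_prefix => options[:sort_by_prefix],
    # `options[:sort_by_ascending] || true` would turn an explicit false into true
    :sort_by_ascending => options[:sort_by_ascending].nil? ? true : options[:sort_by_ascending],
    :collapse_by_prefix => options[:collapse_by_prefix]
  }
end
```

With a hypothetical `Wine` model indexed using the `is_indexed` call from README.txt, a search sorted by price, highest first, would be `ActsAsXapian::Search.new([Wine], "merlot", :limit => 10, :sort_by_prefix => "price", :sort_by_ascending => false)`.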
267
+
268
+ # Return a description of the query
269
+ def description
270
+ self.query.description
271
+ end
272
+
273
+ # Estimate total number of results
274
+ def matches_estimated
275
+ self.matches.matches_estimated
276
+ end
277
+
278
+ def estimate_is_exact
279
+ self.matches.estimate_is_exact
280
+ end
281
+
282
+ # Return query string with spelling correction
283
+ def spelling_correction
284
+ correction = ActsAsXapian.query_parser.get_corrected_query_string
285
+ if correction.empty?
286
+ return nil
287
+ end
288
+ return correction
289
+ end
290
+
291
+ # Return array of models found
292
+ def results
293
+ # If they've already pulled out the results, just return them.
294
+ if not @cached_results.nil?
295
+ return @cached_results
296
+ end
297
+
298
+ # Pull out all the results
299
+ docs = []
300
+ iter = self.matches._begin
301
+ while not iter.equals(self.matches._end)
302
+ docs.push({:data => iter.document.data,
303
+ :percent => iter.percent,
304
+ :weight => iter.weight,
305
+ :collapse_count => iter.collapse_count})
306
+ iter.next
307
+ end
308
+
309
+ # Look up without too many SQL queries
310
+ lhash = {}
311
+ lhash.default = []
312
+ for doc in docs
313
+ k = doc[:data].split('-')
314
+ lhash[k[0]] = lhash[k[0]] + [k[1]]
315
+ end
316
+ # for each class, look up all ids
317
+ chash = {}
318
+ for cls, ids in lhash
319
+ # conditions = [ "#{cls.constantize.table_name}.#{cls.constantize.primary_key} in (?)", ids ]
320
+ # found = cls.constantize.find(:all, :conditions => conditions, :include => cls.constantize.xapian_options[:eager_load])
321
+ found = Object.full_const_get(cls).all(:id.in => ids)
322
+ for f in found
323
+ chash[[cls, f.id]] = f
324
+ end
325
+ end
326
+ # now get them in right order again
327
+ results = []
328
+ docs.each{|doc| k = doc[:data].split('-'); results << { :model => chash[[k[0], k[1].to_i]],
329
+ :percent => doc[:percent], :weight => doc[:weight], :collapse_count => doc[:collapse_count] } }
330
+ @cached_results = results
331
+ return results
332
+ end
333
+ end
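`QueryBase#results` avoids one database query per hit: each Xapian document stores `"ClassName-id"` as its data, so the ids are grouped by class, each class is loaded once, and the records are then put back into Xapian's ranking order. The sketch below shows the same strategy in plain Ruby; `load_in_rank_order` is hypothetical, and `fetch` stands in for `Model.all(:id.in => ids)`. A default-block Hash replaces the original's `lhash.default = []`, which only works there because `lhash[k] + [id]` builds a new array instead of mutating the shared default.

```ruby
# Group "ClassName-id" strings by class, fetch each class once, and
# return the records in the original (ranking) order.
def load_in_rank_order(doc_data, fetch)
  ids_by_class = Hash.new { |h, k| h[k] = [] }
  doc_data.each do |data|
    cls, id = data.split('-')
    ids_by_class[cls] << id.to_i
  end

  found = {}
  ids_by_class.each do |cls, ids|
    fetch.call(cls, ids).each { |record| found[[cls, record[:id]]] = record }
  end

  doc_data.map do |data|
    cls, id = data.split('-')
    found[[cls, id.to_i]] # nil if the row has since been deleted
  end
end
```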
334
+
335
+ # Search for a query string, returns an array of hashes in result order.
336
+ # Each hash contains the actual DataMapper model instance in :model, and other details
337
+ # about relevancy etc. in other keys.
338
+ class Search < QueryBase
339
+ attr_accessor :query_string
340
+
341
+ # Note that model_classes is not only sometimes useful here - it's
342
+ # essential to make sure the classes have been loaded, and thus
343
+ # acts_as_xapian called on them, so we know the fields for the query
344
+ # parser.
345
+
346
+ # model_classes - model classes to search within, e.g. [PublicBody,
347
+ # User]. Can take a single model class, or you can express the model
348
+ # class names in strings if you like.
349
+ # query_string - user-input query string, with syntax much like Google Search
350
+ def initialize(model_classes, query_string, options = {})
351
+ # Check parameters, convert to actual array of model classes
352
+ new_model_classes = []
353
+ model_classes = [model_classes] if model_classes.class != Array
354
+ for model_class in model_classes
355
+ raise "pass in the model class itself, or a string containing its name" if model_class.class != Class && model_class.class != String
356
+ model_class = Object.full_const_get(model_class) if model_class.class == String
357
+ new_model_classes.push(model_class)
358
+ end
359
+ model_classes = new_model_classes
360
+
361
+ # Set things up
362
+ self.initialize_db
363
+
364
+ # Case of a string, searching for a Google-like syntax query
365
+ self.query_string = query_string
366
+
367
+ # Construct query which only finds things from specified models
368
+ model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
369
+ user_query = ActsAsXapian.query_parser.parse_query(self.query_string,
370
+ Xapian::QueryParser::FLAG_BOOLEAN | Xapian::QueryParser::FLAG_PHRASE |
371
+ Xapian::QueryParser::FLAG_LOVEHATE | Xapian::QueryParser::FLAG_WILDCARD |
372
+ Xapian::QueryParser::FLAG_SPELLING_CORRECTION)
373
+ self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, user_query)
374
+
375
+ # Call base class constructor
376
+ self.initialize_query(options)
377
+ end
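`Search#initialize` accepts a single model class, a class name as a string, or an array of either, and restricts the search with the boolean terms `"M" + class name`. The sketch below isolates that argument handling. `normalize_model_classes` and `model_terms` are hypothetical helpers; they use Ruby's own `Object.const_get` instead of `Object.full_const_get`, which the original presumably gets from extlib, so that the sketch runs standalone.

```ruby
# Accept a class, a class name, or an array of either; return classes.
def normalize_model_classes(model_classes)
  Array(model_classes).map do |mc|
    case mc
    when Class  then mc
    when String then Object.const_get(mc)
    else raise "pass in the model class itself, or a string containing its name"
    end
  end
end

# Boolean terms used to restrict a search to those models ("M" + class name).
def model_terms(model_classes)
  normalize_model_classes(model_classes).map { |mc| "M" + mc.to_s }
end
```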
378
+
379
+ # Return just the normal words in the query, i.e. not operators, and not
380
+ # words in date ranges or similar. Use this for cheap highlighting with
381
+ # TextHelper::highlight, and excerpt.
382
+ def words_to_highlight
383
+ query_nopunc = self.query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
384
+ query_nopunc = query_nopunc.gsub(/\s+/, " ")
385
+ words = query_nopunc.split(" ")
386
+ # Remove anything with a :, . or / in it
387
+ words = words.find_all {|o| !o.match(/(:|\.|\/)/) }
388
+ words = words.find_all {|o| !o.match(/^(AND|NOT|OR|XOR)$/) }
389
+ return words
390
+ end
391
+
392
+ end
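To see what `words_to_highlight` keeps, here is its logic copied into a standalone function, so that it runs without Xapian. Punctuation becomes spaces, anything containing `:`, `.` or `/` (prefixed terms such as `winery:Foo`, ranges such as `2005..2008`) is dropped, and so are the upper-case boolean operators.

```ruby
# Standalone copy of Search#words_to_highlight.
def words_to_highlight(query_string)
  query_nopunc = query_string.gsub(/[^a-z0-9:\.\/_]/i, " ").gsub(/\s+/, " ")
  words = query_nopunc.split(" ")
  # Drop prefixed terms and ranges (anything containing :, . or /) ...
  words = words.reject { |w| w =~ /(:|\.|\/)/ }
  # ... and the boolean operators
  words.reject { |w| w =~ /^(AND|NOT|OR|XOR)$/ }
end

p words_to_highlight('merlot AND winery:Foo 2005..2008 "red wine"')
```

The operator filter is case-sensitive, so a lower-case `and` is kept as a word to highlight.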
393
+
394
+ # Search for models which contain the important terms taken from a specified
395
+ # list of models. i.e. Use to find documents similar to one (or more)
396
+ # documents, or use to refine searches.
397
+ class Similar < QueryBase
398
+ attr_accessor :query_models
399
+ attr_accessor :important_terms
400
+
401
+ # model_classes - model classes to search within, e.g. [PublicBody, User]
402
+ # query_models - list of models you want to find things similar to
403
+ def initialize(model_classes, query_models, options = {})
404
+ self.initialize_db
405
+
406
+ # Case of an array, searching for models similar to those models in the array
407
+ self.query_models = query_models
408
+
409
+ # Find the documents by their unique term
410
+ input_models_query = Xapian::Query.new(Xapian::Query::OP_OR, query_models.map{|m| "I" + m.xapian_document_term})
411
+ ActsAsXapian.enquire.query = input_models_query
412
+ matches = ActsAsXapian.enquire.mset(0, 100, 100) # XXX so this whole method will only work with 100 docs
413
+
414
+ # Get set of relevant terms for those documents
415
+ selection = Xapian::RSet.new()
416
+ iter = matches._begin
417
+ while not iter.equals(matches._end)
418
+ selection.add_document(iter)
419
+ iter.next
420
+ end
421
+
422
+ # Bit weird that the function to make esets is part of the enquire
423
+ # object. This explains what exactly it does, which is to exclude
424
+ # terms in the existing query.
425
+ # http://thread.gmane.org/gmane.comp.search.xapian.general/3673/focus=3681
426
+ eset = ActsAsXapian.enquire.eset(40, selection)
427
+
428
+ # Do main search for them
429
+ self.important_terms = []
430
+ iter = eset._begin
431
+ while not iter.equals(eset._end)
432
+ self.important_terms.push(iter.term)
433
+ iter.next
434
+ end
435
+ similar_query = Xapian::Query.new(Xapian::Query::OP_OR, self.important_terms)
436
+ # Exclude original
437
+ combined_query = Xapian::Query.new(Xapian::Query::OP_AND_NOT, similar_query, input_models_query)
438
+
439
+ # Restrain to model classes
440
+ model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
441
+ self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, combined_query)
442
+
443
+ # Call base class constructor
444
+ self.initialize_query(options)
445
+ end
446
+ end
447
+
448
+ ######################################################################
449
+ # Index
450
+
451
+ # Offline indexing job queue model; create its table with
452
+ # "rake dm:automigrate" as described in ../README.txt
453
+ class ActsAsXapianJob
454
+ include ::DataMapper::Resource
455
+ property :id, Integer, :serial => true
456
+ property :model_class, String
457
+ property :model_id, Integer
458
+ property :action, String
459
+ # add_index :acts_as_xapian_jobs, [:model_class, :model_id], :unique => true
460
+ end
461
+
462
+ # Update index with any changes needed, call this offline. Only call it
463
+ # from a script that exits - otherwise Xapian's writable database won't
464
+ # flush your changes. Specifying flush will reduce performance, but
465
+ # make sure that each index update is definitely saved to disk before
466
+ # logging in the database that it has been.
467
+ def ActsAsXapian.update_index(flush = false, verbose = false)
468
+ ActsAsXapian.writable_init
469
+
470
+ ids_to_refresh = ActsAsXapianJob.all.map { |job| job.id }
471
+ for id in ids_to_refresh
472
+ begin
473
+ # TODO: transaction
474
+ # ActiveRecord::Base.transaction do
475
+ job = ActsAsXapianJob.first(:id => id) #, :lock =>true)
476
+ STDOUT.puts("ActsAsXapian.update_index #{job.action} #{job.model_class} #{job.model_id.to_s}") if verbose
477
+ if job.action == 'update'
478
+ # XXX Index functions may reference other models, so we could eager load here too?
479
+ model = Object.full_const_get(job.model_class).first(:id => job.model_id) # :include => cls.constantize.xapian_options[:include]
480
+ model.xapian_index
481
+ elsif job.action == 'destroy'
482
+ # Make dummy model with right id, just for destruction
483
+ model = Object.full_const_get(job.model_class).new
484
+ model.id = job.model_id
485
+ model.xapian_destroy
486
+ else
487
+ raise "unknown ActsAsXapianJob action '" + job.action + "'"
488
+ end
489
+ job.destroy
490
+
491
+ if flush
492
+ ActsAsXapian.writable_db.flush
493
+ end
494
+ # end
495
+ rescue => detail
496
+ # print any error, and carry on so other things are indexed
497
+ # XXX If item is later deleted, this should give up, and it
498
+ # won't. It will keep trying (assuming update_index called from
499
+ # regular cron job) and mayhap cause trouble.
500
+ STDERR.puts(detail.backtrace.join("\n") + "\nFAILED ActsAsXapian.update_index job #{id} #{$!}")
501
+ end
502
+ end
503
+ end
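The indexing queue works in two halves: the `after :save` hook in `DataMapper::Xapian#is_indexed` deletes any pending job for the model and creates a single "update" job (the `after :destroy` hook, cut off in this listing, starts the same way), and `update_index` deletes a job only after processing it without error, so failures are retried on the next run. `JobQueue` below is a hypothetical in-memory model of that behaviour, not the real `ActsAsXapianJob` table.

```ruby
# In-memory model of the acts_as_xapian_jobs queue.
class JobQueue
  Job = Struct.new(:model_class, :model_id, :action)

  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Like the after :save hook: at most one pending job per model instance.
  def enqueue(model_class, model_id, action)
    @jobs.reject! { |j| j.model_class == model_class && j.model_id == model_id }
    @jobs << Job.new(model_class, model_id, action)
  end

  # Like ActsAsXapian.update_index: report failures and leave them queued.
  def drain
    @jobs.dup.each do |job|
      begin
        yield job
        @jobs.delete(job)
      rescue => e
        warn "FAILED job #{job.model_class} #{job.model_id}: #{e.message}"
      end
    end
  end
end
```

As the XXX comment in `update_index` notes, a job that keeps failing stays queued forever; a real fix would need a retry limit or similar.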
504
+
505
+ # You must specify *all* the models here, this totally rebuilds the Xapian database.
506
+ # You'll want any readers to reopen the database after this.
507
+ def ActsAsXapian.rebuild_index(model_classes, verbose = false)
508
+ raise "when rebuilding all, please call as first and only thing done in process / task" if not ActsAsXapian.writable_db.nil?
509
+
510
+ prepare_environment
511
+
512
+ # Delete any existing .new database, and open a new one
513
+ new_path = ActsAsXapian.db_path + ".new"
514
+ if File.exist?(new_path)
515
+ raise "found existing " + new_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(new_path, "iamflint"))
516
+ FileUtils.rm_r(new_path)
517
+ end
518
+ ActsAsXapian.writable_init(".new")
519
+
520
+ # Index everything
521
+ # XXX not a good place to do this destroy, as unindexed list is lost if
522
+ # process is aborted and old database carries on being used. Perhaps do in
523
+ # transaction and commit after rename below? Not sure if the locking would then be bad
524
+ # for a live website running at the same time.
525
+
526
+ ActsAsXapianJob.all.destroy!
527
+
528
+ batch_size = 1000
529
+ for model_class in model_classes
530
+ # TODO: transaction
531
+ # model_class.transaction do
532
+ 0.step(model_class.count, batch_size) do |i|
533
+ STDOUT.puts("ActsAsXapian: New batch. From #{i} to #{i + batch_size}") if verbose
534
+ models = model_class.all(:limit => batch_size, :offset => i)
535
+ for model in models
536
+ STDOUT.puts("ActsAsXapian.rebuild_index #{model_class} #{model.id}") if verbose
537
+ model.xapian_index
538
+ end
539
+ end
540
+ # end
541
+ end
542
+
543
+ ActsAsXapian.writable_db.flush
544
+
545
+ # Rename into place
546
+ old_path = ActsAsXapian.db_path
547
+ temp_path = ActsAsXapian.db_path + ".tmp"
548
+ if File.exist?(temp_path)
549
+ raise "temporary database found " + temp_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
550
+ FileUtils.rm_r(temp_path)
551
+ end
552
+ if File.exist?(old_path)
553
+ FileUtils.mv old_path, temp_path
554
+ end
555
+ FileUtils.mv new_path, old_path
556
+
557
+ # Delete old database
558
+ if File.exist?(temp_path)
559
+ raise "old database now at " + temp_path + " is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
560
+ FileUtils.rm_r(temp_path)
561
+ end
562
+
563
+ # You'll want to restart your FastCGI or Mongrel processes after this,
564
+ # so they get the new db
565
+ end
566
+
567
+ end
568
+
569
+ module DataMapper
570
+ module Xapian
571
+
572
+ def is_indexed(options)
573
+ cattr_accessor :xapian_options
574
+
575
+ ActsAsXapian.set_options(self.to_s, options)
576
+ self.xapian_options = options
577
+
578
+ self.after :save do
579
+ model_class = self.class.to_s
580
+ model_id = self.id
581
+ # TODO: transaction
582
+ # ActiveRecord::Base.transaction do
583
+ found = ::ActsAsXapian::ActsAsXapianJob.all(:model_class => model_class, :model_id => model_id).destroy!
584
+ job = ::ActsAsXapian::ActsAsXapianJob.create(:model_class => model_class, :model_id => model_id, :action => "update")
585
+ # job.save!
586
+ # end
587
+ end
588
+
589
+ self.after :destroy do
590
+ # queue a "destroy" job so update_index removes this record from Xapian
591
+ model_class = self.class.to_s
592
+ model_id = self.id
593
+ # TODO: transaction
594
+ # ActiveRecord::Base.transaction do
595
+ found = ::ActsAsXapian::ActsAsXapianJob.all(:model_class => model_class, :model_id => model_id).destroy!
596
+ job = ::ActsAsXapian::ActsAsXapianJob.create(:model_class => model_class, :model_id => model_id, :action => "destroy")
597
+ # end
598
+ end
599
+
600
+ include DataMapper::Xapian::InstanceMethods
601
+ end
602
+
603
+ module InstanceMethods
604
+ # Used internally
605
+ def xapian_document_term
606
+ self.class.to_s + "-" + self.id.to_s
607
+ end
608
+
609
+ # Extract value of a field from the model
610
+ def xapian_value(field, type = nil)
611
+ # value = self[field] || self.send(field.to_sym)
612
+ value = self.send(field.to_sym)
613
+ if type == :date
614
+ if value.kind_of?(Time)
615
+ value.utc.strftime("%Y%m%d")
616
+ elsif value.kind_of?(Date)
617
+ value.strftime("%Y%m%d") # a Date has no time zone; converting via local midnight to UTC can shift it back a day
618
+ else
619
+ raise "Only Time or Date types supported by acts_as_xapian for :date fields, got " + value.class.to_s
620
+ end
621
+ elsif type == :boolean
622
+ value ? true : false
623
+ else
624
+ value.to_s
625
+ end
626
+ end
627
+
628
+ # Store record in the Xapian database
629
+ def xapian_index
630
+ # if we have a conditional function for indexing, call it and remove the object from the index if it returns false
631
+ if self.class.xapian_options.include?(:if)
632
+ if_value = xapian_value(self.class.xapian_options[:if], :boolean)
633
+ if not if_value
634
+ self.xapian_destroy
635
+ return
636
+ end
637
+ end
638
+
639
+ # otherwise (re)write the Xapian record for the object
640
+ doc = ::Xapian::Document.new
641
+ ActsAsXapian.term_generator.document = doc
642
+
643
+ doc.data = self.xapian_document_term
644
+
645
+ doc.add_term("M" + self.class.to_s)
646
+ doc.add_term("I" + doc.data)
647
+ if self.xapian_options[:terms]
648
+ for term in self.xapian_options[:terms]
649
+ ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
650
+ ActsAsXapian.term_generator.index_text(xapian_value(term[0]), 1, term[1])
651
+ end
652
+ end
653
+ if self.xapian_options[:values]
654
+ for value in self.xapian_options[:values]
655
+ doc.add_value(value[1], xapian_value(value[0], value[3]))
656
+ end
657
+ end
658
+ if self.xapian_options[:texts]
659
+ for text in self.xapian_options[:texts]
660
+ ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
661
+ # XXX the "1" here is a weight that could be varied for a boost function
662
+ ActsAsXapian.term_generator.index_text(xapian_value(text), 1)
663
+ end
664
+ end
665
+
666
+ ActsAsXapian.writable_db.replace_document("I" + doc.data, doc)
667
+ end
668
+
669
+ # Delete record from the Xapian database
670
+ def xapian_destroy
671
+ ActsAsXapian.writable_db.delete_document("I" + self.xapian_document_term)
672
+ end
673
+
674
+ end
675
+ end
676
+ end
677
+
678
+ module DataMapper
679
+ module Resource
680
+ module ClassMethods
681
+ include DataMapper::Xapian
682
+ end
683
+ end
684
+ end
685
+
686
+ if defined?(Merb::Plugins)
687
+ # Merb gives you a Merb::Plugins.config hash...feel free to put your stuff in your piece of it
688
+ # Merb::Plugins.config[:dm_xapian] = {
689
+ # :chickens => false
690
+ # }
691
+
692
+ # Merb::BootLoader.before_app_loads do
693
+ #
694
+ # end
695
+
696
+ Merb::BootLoader.after_app_loads do
697
+ # code that can be required after the application loads
698
+ ActsAsXapian.configure(Merb.env||'development', Merb.root)
699
+ end
700
+
701
+ Merb::Plugins.add_rakefiles "dm-xapian/merbtasks"
702
+ end
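The `:date` / `:boolean` conversion performed by `xapian_value` can be tried out without a model or a Xapian install. The sketch below is a hypothetical standalone helper (`xapian_value_for` is not part of dm-xapian) that takes the raw value instead of a field name. It uses `getutc` rather than `utc`, because `Time#utc` converts the receiver in place, and it formats a `Date` directly, since a `Date` carries no time zone.

```ruby
require 'date'

# Hypothetical helper mirroring the conversion rules of
# DataMapper::Xapian::InstanceMethods#xapian_value (sketch, not the plugin's API).
def xapian_value_for(value, type = nil)
  case type
  when :date
    if value.kind_of?(Time)
      # getutc returns a new Time, leaving the caller's value untouched
      value.getutc.strftime("%Y%m%d")
    elsif value.kind_of?(Date)
      # Date (and DateTime) values are formatted as-is, no UTC shift
      value.strftime("%Y%m%d")
    else
      raise ArgumentError, "only Time or Date supported for :date fields, got #{value.class}"
    end
  when :boolean
    # any truthy value indexes as true, nil/false as false
    value ? true : false
  else
    # everything else is stored as its string form
    value.to_s
  end
end
```

Dates come out as `YYYYMMDD` strings, so they sort correctly when compared as Xapian value strings.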
metadata ADDED
@@ -0,0 +1,71 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: psq-dm-xapian
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.3.1
5
+ platform: ruby
6
+ authors:
7
+ - Joshaven Potter, Pascal Belloncle
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2008-11-08 00:00:00 -08:00
13
+ default_executable:
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: merb
17
+ version_requirement:
18
+ version_requirements: !ruby/object:Gem::Requirement
19
+ requirements:
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 0.9.7
23
+ version:
24
+ description: Merb plugin that provides access to the Ruby Xapian search engine library
25
+ email: yourtech@gmail.com, psq@nanorails.com
26
+ executables: []
27
+
28
+ extensions: []
29
+
30
+ extra_rdoc_files:
31
+ - README.txt
32
+ - LICENSE
33
+ - TODO
34
+ - SETUP.txt
35
+ - CHANGES.txt
36
+ files:
37
+ - LICENSE
38
+ - README.txt
39
+ - Rakefile
40
+ - TODO
41
+ - lib/dm-xapian.rb
42
+ - SETUP.txt
43
+ - CHANGES.txt
44
+ has_rdoc: true
45
+ homepage: http://github.com/psq/dm-xapian
46
+ post_install_message:
47
+ rdoc_options: []
48
+
49
+ require_paths:
50
+ - lib
51
+ required_ruby_version: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: "0"
56
+ version:
57
+ required_rubygems_version: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: "0"
62
+ version:
63
+ requirements: []
64
+
65
+ rubyforge_project: merb
66
+ rubygems_version: 1.2.0
67
+ signing_key:
68
+ specification_version: 2
69
+ summary: Merb plugin that provides access to the Ruby Xapian search engine library
70
+ test_files: []
71
+
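For comparison, the YAML metadata above corresponds roughly to the following `Gem::Specification` written in Ruby. This is a sketch, not the project's actual Rakefile or gemspec. The values are copied from the metadata, except that the combined author and email strings are split into arrays. Fields such as `has_rdoc` and `rubyforge_project`, which newer RubyGems versions no longer use, are left out.

```ruby
require 'rubygems'

# Sketch of a gemspec equivalent to the generated metadata above.
spec = Gem::Specification.new do |s|
  s.name             = "psq-dm-xapian"
  s.version          = "0.3.1"
  s.platform         = Gem::Platform::RUBY
  s.authors          = ["Joshaven Potter", "Pascal Belloncle"]
  s.email            = ["yourtech@gmail.com", "psq@nanorails.com"]
  s.homepage         = "http://github.com/psq/dm-xapian"
  s.summary          = "Merb plugin that provides access to the Ruby Xapian search engine library"
  s.description      = s.summary
  s.extra_rdoc_files = %w[README.txt LICENSE TODO SETUP.txt CHANGES.txt]
  s.files            = %w[LICENSE README.txt Rakefile TODO lib/dm-xapian.rb SETUP.txt CHANGES.txt]
  s.require_paths    = ["lib"]
  # runtime dependency, as in the "dependencies:" section of the metadata
  s.add_dependency("merb", ">= 0.9.7")
end
```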