psq-dm-xapian 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (8)
  1. data/CHANGES.txt +4 -0
  2. data/LICENSE +51 -0
  3. data/README.txt +57 -0
  4. data/Rakefile +57 -0
  5. data/SETUP.txt +91 -0
  6. data/TODO +0 -0
  7. data/lib/dm-xapian.rb +702 -0
  8. metadata +71 -0
data/CHANGES.txt ADDED
@@ -0,0 +1,4 @@
1
+ * 2008-11-08
2
+ - version is now 0.3
3
+ - applied patch from Emmanuel Surleau: make the rake tasks work with merb 0.9.13 (should also work with 1.0, I assume, but I haven't upgraded yet). In addition, rebuild_index now reports an explicit error when a class name is not a model or is not indexed by Xapian
4
+
data/LICENSE ADDED
@@ -0,0 +1,51 @@
1
+ dm-xapian is released under the MIT License.
2
+ Copyright (c) 2008 Joshaven Potter <yourtech@gmail.com>
3
+ Copyright (c) 2008 Pascal Belloncle <psq@nanorails.com>
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23
+
24
+
25
+
26
+
27
+
28
+ This software is a Fork of http://github.com/frabcus/acts_as_xapian which
29
+ was licensed under the following:
30
+
31
+ acts_as_xapian is released under the MIT License.
32
+
33
+ Copyright (c) 2008 UK Citizens Online Democracy.
34
+
35
+ Permission is hereby granted, free of charge, to any person obtaining a copy
36
+ of the acts_as_xapian software and associated documentation files (the
37
+ "Software"), to deal in the Software without restriction, including without
38
+ limitation the rights to use, copy, modify, merge, publish, distribute,
39
+ sublicense, and/or sell copies of the Software, and to permit persons to whom
40
+ the Software is furnished to do so, subject to the following conditions:
41
+
42
+ The above copyright notice and this permission notice shall be included in all
43
+ copies or substantial portions of the Software.
44
+
45
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
46
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
47
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
48
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
49
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
50
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
51
+ THE SOFTWARE.
data/README.txt ADDED
@@ -0,0 +1,57 @@
1
+ CONTRIBUTORS WELCOME
2
+
3
+ Currently somewhat tested, but, ahem, no spec:
4
+
5
+ * rake xapian:update_index models="model1 model2 ..."
6
+ * rake xapian:rebuild_index models="model1 model2 ..."
7
+ * rake xapian:query models="model1 model2 ..." query="..."
8
+ * tracking of models that need reindexing after being added or updated (processed via rake xapian:update_index)
9
+
10
+
11
+ dm-xapian
12
+ =========
13
+
14
+ Merb plugin that provides use of the Ruby Xapian search engine library.
15
+
16
+
17
+ Setup
18
+ =====
19
+
20
+ For setup instructions read through SETUP.txt; it's short and tells you where to get what else you need and what to do with it.
21
+
22
+
23
+ Xapian
24
+ ======
25
+
26
+ Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby (so far!)
27
+
28
+ Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
29
+
30
+ If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.
31
+
32
+ The latest stable version is 1.0.7, released on 2008-07-15.
33
+
34
+ http://xapian.org/
35
+
36
+
37
+ Xapian Bindings for Ruby
38
+ ========================
39
+ The Ruby bindings for Xapian are packaged in the xapian module.
40
+
41
+ General info: http://xapian.org/docs/bindings/ruby/
42
+ API Docs: http://xapian.org/docs/bindings/ruby/rdocs/
43
+
44
+ To Use
45
+ ======
46
+ * install gem via "rake install"
47
+ * add to config/init.rb:
48
+ dependencies "dm-xapian"
49
+ * add the dm-xapian models to the database:
50
+ rake dm:automigrate
51
+ * Add to each model:
52
+ is_indexed :texts => [ :name, :region, :country, :varietal ],
53
+ :values => [[:price, 0, "price", :float], [:ean, 1, "ean", :string]],
54
+ :terms => [ [ :winery, 'W', "winery" ] ]
55
+
56
+ * Terms are global across all models
57
+ * texts, values and terms based on properties
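
The constraints on `:terms` and `:values` can be sketched in plain Ruby. This is a minimal illustration, not part of dm-xapian: the helper `index_option_errors` and its constants are hypothetical names, and the rules are copied from the checks visible in `ActsAsXapian.init_query_parser` in the portion of lib/dm-xapian.rb shown in this diff (single capital letter term codes, reserved codes M, I and Z, reserved prefixes, integer value indexes, and the value types :date, :string and :number). Under those visible checks, the `:float` type used in the README example would be rejected, although code outside this diff may treat it differently.

```ruby
# Hypothetical helper (not part of dm-xapian): collects the problems that the
# checks in ActsAsXapian.init_query_parser would raise for an is_indexed hash.
RESERVED_TERM_CODES = %w[M I Z].freeze          # model, model id, stemmed terms
RESERVED_PREFIXES   = %w[model modelid].freeze
KNOWN_VALUE_TYPES   = [:date, :string, :number].freeze

def index_option_errors(options)
  errors = []

  # Each term is [property, single capital letter code, query prefix].
  (options[:terms] || []).each do |_property, code, prefix|
    unless code.is_a?(String) && code.match?(/\A[A-Z]\z/)
      errors << "term code #{code.inspect} must be a single capital letter"
    end
    errors << "term code #{code} is reserved" if RESERVED_TERM_CODES.include?(code)
    errors << "prefix #{prefix} is reserved" if RESERVED_PREFIXES.include?(prefix)
  end

  # Each value is [property, integer slot index, query prefix, type].
  (options[:values] || []).each do |_property, index, _prefix, type|
    errors << "value index #{index.inspect} must be an integer" unless index.is_a?(Integer)
    errors << "unknown value type #{type.inspect}" unless KNOWN_VALUE_TYPES.include?(type)
  end

  errors
end

# The options hash from the README example above.
readme_options = {
  :texts  => [:name, :region, :country, :varietal],
  :values => [[:price, 0, "price", :float], [:ean, 1, "ean", :string]],
  :terms  => [[:winery, 'W', "winery"]]
}

puts index_option_errors(readme_options)
```

Returning a list of messages rather than raising on the first problem is a choice made for this sketch only; the library code raises as soon as one check fails.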
data/Rakefile ADDED
@@ -0,0 +1,57 @@
1
+ require 'rubygems'
2
+ require 'rake/gempackagetask'
3
+ require 'rubygems/specification'
4
+ require 'date'
5
+ require 'merb-core/version'
6
+ require 'merb-core/tasks/merb_rake_helper'
7
+
8
+ NAME = "dm-xapian"
9
+ GEM_VERSION = "0.3"
10
+ AUTHOR = "Joshaven Potter, Pascal Belloncle"
11
+ EMAIL = "yourtech@gmail.com, psq@nanorails.com"
12
+ HOMEPAGE = "http://github.com/psq/dm-xapian"
13
+ SUMMARY = "Merb plugin that provides use of the Ruby Xapian search engine library"
14
+
15
+ spec = Gem::Specification.new do |s|
16
+ s.rubyforge_project = 'merb'
17
+ s.name = NAME
18
+ s.version = GEM_VERSION
19
+ s.platform = Gem::Platform::RUBY
20
+ s.has_rdoc = true
21
+ s.extra_rdoc_files = ["README.txt", "LICENSE", 'TODO', 'SETUP.txt', 'CHANGES.txt']
22
+ s.summary = SUMMARY
23
+ s.description = s.summary
24
+ s.author = AUTHOR
25
+ s.email = EMAIL
26
+ s.homepage = HOMEPAGE
27
+ s.add_dependency('merb', '>= 0.9.7')
28
+ s.require_path = 'lib'
29
+ s.files = %w(LICENSE README.txt Rakefile TODO CHANGES.txt SETUP.txt) + Dir.glob("{lib,spec}/**/*")
30
+
31
+ end
32
+
33
+ Rake::GemPackageTask.new(spec) do |pkg|
34
+ pkg.gem_spec = spec
35
+ end
36
+
37
+ desc "install the plugin locally"
38
+ task :install => [:package] do
39
+ Merb::RakeHelper.install(NAME, :version => GEM_VERSION)
40
+ # sudo "gem install #{install_home} pkg/#{NAME}-#{GEM_VERSION} --no-update-sources"
41
+ end
42
+
43
+ desc "create a gemspec file"
44
+ task :make_spec do
45
+ File.open("#{NAME}.gemspec", "w") do |file|
46
+ file.puts spec.to_ruby
47
+ end
48
+ end
49
+
50
+ namespace :jruby do
51
+
52
+ desc "Run :package and install the resulting .gem with jruby"
53
+ task :install => :package do
54
+ sudo "jruby -S gem install #{install_home} pkg/#{NAME}-#{GEM_VERSION}.gem --no-rdoc --no-ri"
55
+ end
56
+
57
+ end
data/SETUP.txt ADDED
@@ -0,0 +1,91 @@
1
+ Overview
2
+ ========
3
+
4
+ I have basic copy and paste instructions below... *You may want to set the SRC bash variable*
5
+ The same instructions with extra text around them are here: http://xapian.org/docs/install.html
6
+
7
+ If you received the following error then read the appropriate section, 'On OS X' or 'On Linux'
8
+ dm_xapian: No Ruby bindings for Xapian installed. Please follow setup instructions.
9
+
10
+
11
+
12
+
13
+ Everyone
14
+ ========
15
+
16
+ After downloading & installing Xapian and its bindings you'll need to drop this in your plugins folder
17
+ TODO: Complete this doc & verify drop-in functionality mentioned above
18
+
19
+
20
+
21
+
22
+ On OS X (with Xcode so you can compile)
23
+ =======================================
24
+
25
+ # navigate to your source directory or create one with mkdir ~/src
26
+ SRC=~/src
27
+ cd $SRC
28
+
29
+ # download packages (you're welcome to check for newer code but don't blame me if it doesn't work)
30
+ curl -O http://oligarchy.co.uk/xapian/1.0.7/xapian-core-1.0.7.tar.gz
31
+ curl -O http://oligarchy.co.uk/xapian/1.0.7/xapian-bindings-1.0.7.tar.gz
32
+
33
+ # uncompress the downloads
34
+ tar zxvf xapian-core-1.0.7.tar.gz
35
+ tar zxvf xapian-bindings-1.0.7.tar.gz
36
+
37
+ # compile & install xapian-core
38
+ cd $SRC/xapian-core-1.0.7
39
+ ./configure --prefix=/opt
40
+ make
41
+ sudo make install
42
+
43
+
44
+ # compile & install xapian-bindings ** the ruby goes here **
45
+ # You can find the Xapian API here: http://xapian.org/docs/bindings/ruby/rdocs/classes/Xapian.html
46
+ cd $SRC/xapian-bindings-1.0.7
47
+ ./configure XAPIAN_CONFIG=/opt/bin/xapian-config
48
+ make
49
+ sudo make install
50
+ if [ $? = 0 ];then echo;echo;echo "All set, have fun.";echo You can find the Xapian API here: http://xapian.org/docs/bindings/ruby/rdocs/classes/Xapian.html;else echo;echo;echo '\n\nSomething went wrong... try slower this time.';fi;echo;echo
51
+
52
+
53
+
54
+
55
+ On Linux
56
+ ========
57
+
58
+ SRC=~/src
59
+ cd $SRC
60
+ wget http://oligarchy.co.uk/xapian/1.0.7/xapian-core-1.0.7.tar.gz
61
+ wget http://oligarchy.co.uk/xapian/1.0.7/xapian-bindings-1.0.7.tar.gz
62
+
63
+ echo uncompressing the downloads
64
+ tar zxvf xapian-core-1.0.7.tar.gz
65
+ tar zxvf xapian-bindings-1.0.7.tar.gz
66
+
67
+ echo "compile & install xapian-core"
68
+ cd $SRC/xapian-core-1.0.7
69
+ ./configure --prefix=/opt
70
+ make
71
+ sudo make install
72
+
73
+
74
+ echo "compile & install xapian-bindings ** the ruby goes here **"
75
+ cd $SRC/xapian-bindings-1.0.7
76
+ ./configure XAPIAN_CONFIG=/opt/bin/xapian-config
77
+ make
78
+ sudo make install
79
+
80
+ if [ $? = 0 ];then echo;echo;echo "All set, have fun.";echo You can find the Xapian API here: http://xapian.org/docs/bindings/ruby/rdocs/classes/Xapian.html;else echo;echo;echo '\n\nSomething went wrong... try slower this time.';fi;echo;echo
81
+
82
+
83
+
84
+
85
+ On Windows
86
+ ==========
87
+ Click: Start
88
+ Click: Run
89
+ No, wait...
90
+ Save yourself time and get a Mac then follow the instructions above or at least use a VM to install on Ubuntu
91
+
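
Once the steps above have finished, a quick way to confirm that Ruby can see the bindings is to try loading them. The sketch below is an illustration only: the method name `xapian_bindings_available?` is hypothetical, and the approach mirrors the `require 'xapian'` / `rescue LoadError` guard at the top of lib/dm-xapian.rb.

```ruby
# Hypothetical check (not part of dm-xapian): reports whether the Xapian Ruby
# bindings can be loaded by the Ruby interpreter running this script.
def xapian_bindings_available?
  require 'xapian'
  true
rescue LoadError
  false
end

if xapian_bindings_available?
  puts "Xapian Ruby bindings found."
else
  puts "No Ruby bindings for Xapian installed. Please follow the setup instructions."
end
```

If the check fails even though the build succeeded, one possible cause is that the bindings were installed for a different Ruby than the one running the script.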
data/TODO ADDED
File without changes
data/lib/dm-xapian.rb ADDED
@@ -0,0 +1,702 @@
1
+ # acts_as_xapian/lib/acts_as_xapian.rb:
2
+ # Xapian full text search in Ruby on Rails.
3
+ #
4
+ # Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
5
+ # Email: francis@mysociety.org; WWW: http://www.mysociety.org/
6
+ #
7
+ # Documentation
8
+ # =============
9
+ #
10
+ # See ../README.txt for documentation. Please update that file if you edit
11
+ # code.
12
+
13
+ # Make it so if Xapian isn't installed, the Rails app doesn't fail completely,
14
+ # just when somebody does a search.
15
+ begin
16
+ require 'xapian'
17
+ $acts_as_xapian_bindings_available = true
18
+ rescue LoadError
19
+ STDERR.puts "acts_as_xapian: No Ruby bindings for Xapian installed"
20
+ $acts_as_xapian_bindings_available = false
21
+ end
22
+
23
+ module ActsAsXapian
24
+ ######################################################################
25
+ # Module level variables
26
+ # XXX must be some kind of cattr_accessor that can do this better
27
+ def ActsAsXapian.bindings_available
28
+ $acts_as_xapian_bindings_available
29
+ end
30
+ class NoXapianRubyBindingsError < StandardError
31
+ end
32
+
33
+ # XXX global class initializers here get loaded more than once, don't know why. Protect them.
34
+ if not $acts_as_xapian_class_var_init
35
+ @@db = nil
36
+ @@db_path = nil
37
+ @@writable_db = nil
38
+ @@writable_suffix = nil
39
+ @@init_values = {}
40
+ $acts_as_xapian_class_var_init = true
41
+ end
42
+ def ActsAsXapian.db
43
+ @@db
44
+ end
45
+ def ActsAsXapian.db_path
46
+ @@db_path
47
+ end
48
+ def ActsAsXapian.writable_db
49
+ @@writable_db
50
+ end
51
+ def ActsAsXapian.stemmer
52
+ @@stemmer
53
+ end
54
+ def ActsAsXapian.term_generator
55
+ @@term_generator
56
+ end
57
+ def ActsAsXapian.enquire
58
+ @@enquire
59
+ end
60
+ def ActsAsXapian.query_parser
61
+ @@query_parser
62
+ end
63
+ def ActsAsXapian.values_by_prefix
64
+ @@values_by_prefix
65
+ end
66
+ def ActsAsXapian.config
67
+ @@config
68
+ end
69
+
70
+ def ActsAsXapian.configure(env, root)
71
+ @@environment = env
72
+ @@root = root
73
+ end
74
+
75
+ ######################################################################
76
+ def ActsAsXapian.set_options(classname, options)
77
+ if not classname.nil?
78
+ # store class and options for use later, when we open the db in readable_init
79
+ @@init_values[classname] = options
80
+ end
81
+ end
82
+
83
+ # Reads the config file (if any) and sets up the path to the database we'll be using
84
+ def ActsAsXapian.prepare_environment
85
+ return unless @@db_path.nil?
86
+
87
+ # barf if we can't figure out the environment
88
+ raise "Set RAILS_ENV, so acts_as_xapian can find the right Xapian database" if not @@environment
89
+
90
+ # check for a config file
91
+ config_file = @@root + "/config/xapian.yml"
92
+ @@config = File.exist?(config_file) ? YAML.load_file(config_file)[@@environment] : {}
93
+
94
+ # figure out where the DBs should go
95
+ db_parent_path = File.join(@@root, config['base_db_path'] ? config['base_db_path']: 'xapiandbs/')
96
+
97
+ # make the directory for the xapian databases to go in
98
+ Dir.mkdir(db_parent_path) unless File.exist?(db_parent_path)
99
+
100
+ @@db_path = File.join(db_parent_path, @@environment)
101
+
102
+ # make some things that don't depend on the db
103
+ # XXX this gets made once for each acts_as_xapian. Oh well.
104
+ @@stemmer = Xapian::Stem.new('english')
105
+ end
106
+
107
+ # Opens / reopens the db for reading
108
+ # XXX we perhaps don't need to rebuild database and enquire and queryparser -
109
+ # but db.reopen wasn't enough by itself, so just do everything; it's easier.
110
+ def ActsAsXapian.readable_init
111
+ raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
112
+ raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
113
+
114
+ # if DB is not nil, then we're already initialised, so don't do it again
115
+ # XXX we need to reopen the database each time, so Xapian gets changes to it.
116
+ # Hopefully in later version of Xapian it will autodetect this, and this can
117
+ # be commented back in again.
118
+ # return unless @@db.nil?
119
+
120
+ prepare_environment
121
+
122
+ # basic Xapian objects
123
+ begin
124
+ @@db = Xapian::Database.new(@@db_path)
125
+ @@enquire = Xapian::Enquire.new(@@db)
126
+ rescue IOError
127
+ raise "Xapian database not opened; have you built it with scripts/rebuild-xapian-index ?"
128
+ end
129
+
130
+ init_query_parser
131
+ end
132
+
133
+ # Make a new query parser
134
+ def ActsAsXapian.init_query_parser
135
+ # for queries
136
+ @@query_parser = Xapian::QueryParser.new
137
+ @@query_parser.stemmer = @@stemmer
138
+ @@query_parser.stemming_strategy = Xapian::QueryParser::STEM_SOME
139
+ @@query_parser.database = @@db
140
+ @@query_parser.default_op = Xapian::Query::OP_AND
141
+
142
+ @@terms_by_capital = {}
143
+ @@values_by_number = {}
144
+ @@values_by_prefix = {}
145
+ @@value_ranges_store = []
146
+
147
+ @@init_values.each do |classname, options|
148
+
149
+ # go through the various field types, and tell query parser about them,
150
+ # and error check them - i.e. check for consistency between models
151
+ @@query_parser.add_boolean_prefix("model", "M")
152
+ @@query_parser.add_boolean_prefix("modelid", "I")
153
+ if options[:terms]
154
+ for term in options[:terms]
155
+ raise "Use a single capital letter for term code" if not term[1].match(/^[A-Z]$/)
156
+ raise "M and I are reserved for use as the model/id term" if term[1] == "M" or term[1] == "I"
157
+ raise "model and modelid are reserved for use as the model/id prefixes" if term[2] == "model" or term[2] == "modelid"
158
+ raise "Z is reserved for stemming terms" if term[1] == "Z"
159
+ raise "Already have code '" + term[1] + "' in another model but with different prefix '" + @@terms_by_capital[term[1]] + "'" if @@terms_by_capital.include?(term[1]) && @@terms_by_capital[term[1]] != term[2]
160
+ @@terms_by_capital[term[1]] = term[2]
161
+ @@query_parser.add_prefix(term[2], term[1])
162
+ end
163
+ end
164
+ if options[:values]
165
+ for value in options[:values]
166
+ raise "Value index '"+value[1].to_s+"' must be an integer, is " + value[1].class.to_s if value[1].class != 1.class
167
+ raise "Already have value index '" + value[1].to_s + "' in another model but with different prefix '" + @@values_by_number[value[1]].to_s + "'" if @@values_by_number.include?(value[1]) && @@values_by_number[value[1]] != value[2]
168
+
169
+ # date types are special, mark them so the first model they're seen for
170
+ if !@@values_by_number.include?(value[1])
171
+ if value[3] == :date
172
+ value_range = Xapian::DateValueRangeProcessor.new(value[1])
173
+ elsif value[3] == :string
174
+ value_range = Xapian::StringValueRangeProcessor.new(value[1])
175
+ elsif value[3] == :number
176
+ value_range = Xapian::NumberValueRangeProcessor.new(value[1])
177
+ else
178
+ raise "Unknown value type '" + value[3].to_s + "'"
179
+ end
180
+
181
+ @@query_parser.add_valuerangeprocessor(value_range)
182
+
183
+ # stop it being garbage collected, as
184
+ # add_valuerangeprocessor ref is outside Ruby's GC
185
+ @@value_ranges_store.push(value_range)
186
+ end
187
+
188
+ @@values_by_number[value[1]] = value[2]
189
+ @@values_by_prefix[value[2]] = value[1]
190
+ end
191
+ end
192
+ end
193
+ end
194
+
195
+ def ActsAsXapian.writable_init(suffix = "")
196
+ raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
197
+ raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
198
+
199
+ # if DB is not nil, then we're already initialised, so don't do it again
200
+ return unless @@writable_db.nil?
201
+
202
+ prepare_environment
203
+
204
+ new_path = @@db_path + suffix
205
+ raise "writable_suffix/suffix inconsistency" if @@writable_suffix && @@writable_suffix != suffix
206
+ if @@writable_db.nil?
207
+ # for indexing
208
+ @@writable_db = Xapian::WritableDatabase.new(new_path, Xapian::DB_CREATE_OR_OPEN)
209
+ @@term_generator = Xapian::TermGenerator.new()
210
+ @@term_generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING, 0)
211
+ @@term_generator.database = @@writable_db
212
+ @@term_generator.stemmer = @@stemmer
213
+ @@writable_suffix = suffix
214
+ end
215
+ end
216
+
217
+ ######################################################################
218
+ # Search with a query or for similar models
219
+
220
+ # Base class for Search and Similar below
221
+ class QueryBase
222
+ attr_accessor :offset
223
+ attr_accessor :limit
224
+ attr_accessor :query
225
+ attr_accessor :matches
226
+ attr_accessor :query_models
227
+
228
+ def initialize_db
229
+ ActsAsXapian.readable_init
230
+ if ActsAsXapian.db.nil?
231
+ raise "ActsAsXapian not initialized"
232
+ end
233
+ end
234
+
235
+ # Set self.query before calling this
236
+ def initialize_query(options)
237
+ #raise options.to_yaml
238
+
239
+ offset = options[:offset] || 0; offset = offset.to_i
240
+ limit = options[:limit]
241
+ raise "please specify maximum number of results to return with parameter :limit" if not limit
242
+ limit = limit.to_i
243
+ sort_by_prefix = options[:sort_by_prefix] || nil
244
+ sort_by_ascending = options[:sort_by_ascending].nil? ? true : options[:sort_by_ascending]
245
+ collapse_by_prefix = options[:collapse_by_prefix] || nil
246
+
247
+ ActsAsXapian.enquire.query = self.query
248
+
249
+ if sort_by_prefix.nil?
250
+ ActsAsXapian.enquire.sort_by_relevance!
251
+ else
252
+ value = ActsAsXapian.values_by_prefix[sort_by_prefix]
253
+ raise "couldn't find prefix '" + sort_by_prefix + "'" if value.nil?
254
+ ActsAsXapian.enquire.sort_by_value_then_relevance!(value, sort_by_ascending)
255
+ end
256
+ if collapse_by_prefix.nil?
257
+ ActsAsXapian.enquire.collapse_key = Xapian.BAD_VALUENO
258
+ else
259
+ value = ActsAsXapian.values_by_prefix[collapse_by_prefix]
260
+ raise "couldn't find prefix '" + collapse_by_prefix + "'" if value.nil?
261
+ ActsAsXapian.enquire.collapse_key = value
262
+ end
263
+
264
+ self.matches = ActsAsXapian.enquire.mset(offset, limit, 100)
265
+ @cached_results = nil
266
+ end
267
+
268
+ # Return a description of the query
269
+ def description
270
+ self.query.description
271
+ end
272
+
273
+ # Estimate total number of results
274
+ def matches_estimated
275
+ self.matches.matches_estimated
276
+ end
277
+
278
+ def estimate_is_exact
279
+ self.matches.estimate_is_exact
280
+ end
281
+
282
+ # Return query string with spelling correction
283
+ def spelling_correction
284
+ correction = ActsAsXapian.query_parser.get_corrected_query_string
285
+ if correction.empty?
286
+ return nil
287
+ end
288
+ return correction
289
+ end
290
+
291
+ # Return array of models found
292
+ def results
293
+ # If they've already pulled out the results, just return them.
294
+ if not @cached_results.nil?
295
+ return @cached_results
296
+ end
297
+
298
+ # Pull out all the results
299
+ docs = []
300
+ iter = self.matches._begin
301
+ while not iter.equals(self.matches._end)
302
+ docs.push({:data => iter.document.data,
303
+ :percent => iter.percent,
304
+ :weight => iter.weight,
305
+ :collapse_count => iter.collapse_count})
306
+ iter.next
307
+ end
308
+
309
+ # Look up without too many SQL queries
310
+ lhash = {}
311
+ lhash.default = []
312
+ for doc in docs
313
+ k = doc[:data].split('-')
314
+ lhash[k[0]] = lhash[k[0]] + [k[1]]
315
+ end
316
+ # for each class, look up all ids
317
+ chash = {}
318
+ for cls, ids in lhash
319
+ # conditions = [ "#{cls.constantize.table_name}.#{cls.constantize.primary_key} in (?)", ids ]
320
+ # found = cls.constantize.find(:all, :conditions => conditions, :include => cls.constantize.xapian_options[:eager_load])
321
+ found = Object.full_const_get(cls).all(:id.in => ids)
322
+ for f in found
323
+ chash[[cls, f.id]] = f
324
+ end
325
+ end
326
+ # now get them in right order again
327
+ results = []
328
+ docs.each{|doc| k = doc[:data].split('-'); results << { :model => chash[[k[0], k[1].to_i]],
329
+ :percent => doc[:percent], :weight => doc[:weight], :collapse_count => doc[:collapse_count] } }
330
+ @cached_results = results
331
+ return results
332
+ end
333
+ end
334
+
335
+ # Search for a query string, returns an array of hashes in result order.
336
+ # Each hash contains the actual Rails object in :model, and other detail
337
+ # about relevancy etc. in other keys.
338
+ class Search < QueryBase
339
+ attr_accessor :query_string
340
+
341
+ # Note that model_classes is not only sometimes useful here - it's
342
+ # essential to make sure the classes have been loaded, and thus
343
+ # acts_as_xapian called on them, so we know the fields for the query
344
+ # parser.
345
+
346
+ # model_classes - model classes to search within, e.g. [PublicBody,
347
+ # User]. Can take a single model class, or you can express the model
348
+ # class names in strings if you like.
349
+ # query_string - user-entered query string, with syntax much like Google Search
350
+ def initialize(model_classes, query_string, options = {})
351
+ # Check parameters, convert to actual array of model classes
352
+ new_model_classes = []
353
+ model_classes = [model_classes] if model_classes.class != Array
354
+ for model_class in model_classes
355
+ raise "pass in the model class itself, or a string containing its name" if model_class.class != Class && model_class.class != String
356
+ model_class = Object.full_const_get(model_class) if model_class.class == String
357
+ new_model_classes.push(model_class)
358
+ end
359
+ model_classes = new_model_classes
360
+
361
+ # Set things up
362
+ self.initialize_db
363
+
364
+ # Case of a string, searching for a Google-like syntax query
365
+ self.query_string = query_string
366
+
367
+ # Construct query which only finds things from specified models
368
+ model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
369
+ user_query = ActsAsXapian.query_parser.parse_query(self.query_string,
370
+ Xapian::QueryParser::FLAG_BOOLEAN | Xapian::QueryParser::FLAG_PHRASE |
371
+ Xapian::QueryParser::FLAG_LOVEHATE | Xapian::QueryParser::FLAG_WILDCARD |
372
+ Xapian::QueryParser::FLAG_SPELLING_CORRECTION)
373
+ self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, user_query)
374
+
375
+ # Call base class constructor
376
+ self.initialize_query(options)
377
+ end
378
+
379
+ # Return just normal words in the query i.e. Not operators, ones in
380
+ # date ranges or similar. Use this for cheap highlighting with
381
+ # TextHelper::highlight, and excerpt.
382
+ def words_to_highlight
383
+ query_nopunc = self.query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
384
+ query_nopunc = query_nopunc.gsub(/\s+/, " ")
385
+ words = query_nopunc.split(" ")
386
+ # Remove anything with a :, . or / in it
387
+ words = words.find_all {|o| !o.match(/(:|\.|\/)/) }
388
+ words = words.find_all {|o| !o.match(/^(AND|NOT|OR|XOR)$/) }
389
+ return words
390
+ end
391
+
392
+ end
393
+
394
+ # Search for models which contain the important terms taken from a specified
395
+ # list of models. i.e. Use to find documents similar to one (or more)
396
+ # documents, or use to refine searches.
397
+ class Similar < QueryBase
398
+ attr_accessor :query_models
399
+ attr_accessor :important_terms
400
+
401
+ # model_classes - model classes to search within, e.g. [PublicBody, User]
402
+ # query_models - list of models you want to find things similar to
403
+ def initialize(model_classes, query_models, options = {})
404
+ self.initialize_db
405
+
406
+ # Case of an array, searching for models similar to those models in the array
407
+ self.query_models = query_models
408
+
409
+ # Find the documents by their unique term
410
+ input_models_query = Xapian::Query.new(Xapian::Query::OP_OR, query_models.map{|m| "I" + m.xapian_document_term})
411
+ ActsAsXapian.enquire.query = input_models_query
412
+ matches = ActsAsXapian.enquire.mset(0, 100, 100) # XXX so this whole method will only work with 100 docs
413
+
414
+ # Get set of relevant terms for those documents
415
+ selection = Xapian::RSet.new()
416
+ iter = matches._begin
417
+ while not iter.equals(matches._end)
418
+ selection.add_document(iter)
419
+ iter.next
420
+ end
421
+
422
+ # Bit weird that the function to make esets is part of the enquire
423
+ # object. This explains what exactly it does, which is to exclude
424
+ # terms in the existing query.
425
+ # http://thread.gmane.org/gmane.comp.search.xapian.general/3673/focus=3681
426
+ eset = ActsAsXapian.enquire.eset(40, selection)
427
+
428
+ # Do main search for them
429
+ self.important_terms = []
430
+ iter = eset._begin
431
+ while not iter.equals(eset._end)
432
+ self.important_terms.push(iter.term)
433
+ iter.next
434
+ end
435
+ similar_query = Xapian::Query.new(Xapian::Query::OP_OR, self.important_terms)
436
+ # Exclude original
437
+ combined_query = Xapian::Query.new(Xapian::Query::OP_AND_NOT, similar_query, input_models_query)
438
+
439
+ # Restrain to model classes
440
+ model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
441
+ self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, combined_query)
442
+
443
+ # Call base class constructor
444
+ self.initialize_query(options)
445
+ end
446
+ end
447
+
448
+ ######################################################################
449
+ # Index
450
+
451
+ # Offline indexing job queue model, create with migration made
452
+ # using "script/generate acts_as_xapian" as described in ../README.txt
453
+ class ActsAsXapianJob
454
+ include ::DataMapper::Resource
455
+ property :id, Integer, :serial => true
456
+ property :model_class, String
457
+ property :model_id, Integer
458
+ property :action, String
459
+ # add_index :acts_as_xapian_jobs, [:model_class, :model_id], :unique => true
460
+ end
461
+
462
+ # Update index with any changes needed, call this offline. Only call it
463
+ # from a script that exits - otherwise Xapian's writable database won't
464
+ # flush your changes. Specifying flush will reduce performance, but
465
+ # make sure that each index update is definitely saved to disk before
466
+ # logging in the database that it has been.
467
+ def ActsAsXapian.update_index(flush = false, verbose = false)
468
+ ActsAsXapian.writable_init
469
+
470
+ ids_to_refresh = ActsAsXapianJob.all.map() { |i| i.id }
471
+ for id in ids_to_refresh
472
+ begin
473
+ # TODO: transaction
474
+ # ActiveRecord::Base.transaction do
475
+ job = ActsAsXapianJob.first(:id => id) #, :lock =>true)
476
+ STDOUT.puts("ActsAsXapian.update_index #{job.action} #{job.model_class} #{job.model_id.to_s}") if verbose
477
+ if job.action == 'update'
478
+ # XXX Index functions may reference other models, so we could eager load here too?
479
+ model = Object.full_const_get(job.model_class).first(:id => job.model_id) # :include => cls.constantize.xapian_options[:include]
480
+ model.xapian_index
481
+ elsif job.action == 'destroy'
482
+ # Make dummy model with right id, just for destruction
483
+ model = Object.full_const_get(job.model_class).new
484
+ model.id = job.model_id
485
+ model.xapian_destroy
486
+ else
487
+ raise "unknown ActsAsXapianJob action '" + job.action + "'"
488
+ end
489
+ job.destroy
490
+
491
+ if flush
492
+ ActsAsXapian.writable_db.flush
493
+ end
494
+ # end
495
+ rescue => detail
496
+ # print any error, and carry on so other things are indexed
497
+ # XXX If item is later deleted, this should give up, and it
498
+ # won't. It will keep trying (assuming update_index called from
499
+ # regular cron job) and mayhap cause trouble.
500
+ STDERR.puts(detail.backtrace.join("\n") + "\nFAILED ActsAsXapian.update_index job #{id} #{$!}")
501
+ end
502
+ end
503
+ end
504
+
505
+ # You must specify *all* the models here, this totally rebuilds the Xapian database.
506
+ # You'll want any readers to reopen the database after this.
507
+ def ActsAsXapian.rebuild_index(model_classes, verbose = false)
508
+ raise "when rebuilding all, please call as first and only thing done in process / task" if not ActsAsXapian.writable_db.nil?
509
+
510
+ prepare_environment
511
+
512
+ # Delete any existing .new database, and open a new one
513
+ new_path = ActsAsXapian.db_path + ".new"
514
+ if File.exist?(new_path)
515
+ raise "found existing " + new_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(new_path, "iamflint"))
516
+ FileUtils.rm_r(new_path)
517
+ end
518
+ ActsAsXapian.writable_init(".new")
519
+
520
+ # Index everything
521
+ # XXX not a good place to do this destroy, as unindexed list is lost if
522
+ # process is aborted and old database carries on being used. Perhaps do in
523
+ # transaction and commit after rename below? Not sure if the locking is then bad
524
+ # for live website running at same time.
525
+
526
+ ActsAsXapianJob.all.destroy!
527
+
528
+ batch_size = 1000
529
+ for model_class in model_classes
530
+ # TODO: transaction
531
+ # model_class.transaction do
532
+ 0.step(model_class.count, batch_size) do |i|
533
+ STDOUT.puts("ActsAsXapian: New batch. From #{i} to #{i + batch_size}") if verbose
534
+ models = model_class.all(:limit => batch_size, :offset => i)
535
+ for model in models
536
+ STDOUT.puts("ActsAsXapian.rebuild_index #{model_class} #{model.id}") if verbose
537
+ model.xapian_index
538
+ end
539
+ end
540
+ # end
541
+ end
542
+
543
+ ActsAsXapian.writable_db.flush
544
+
545
+ # Rename into place
546
+ old_path = ActsAsXapian.db_path
547
+ temp_path = ActsAsXapian.db_path + ".tmp"
548
+ if File.exist?(temp_path)
549
+ raise "temporary database found " + temp_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
550
+ FileUtils.rm_r(temp_path)
551
+ end
552
+ if File.exist?(old_path)
553
+ FileUtils.mv old_path, temp_path
554
+ end
555
+ FileUtils.mv new_path, old_path
556
+
557
+ # Delete old database
558
+ if File.exist?(temp_path)
559
+ raise "old database now at " + temp_path + " is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
560
+ FileUtils.rm_r(temp_path)
561
+ end
562
+
563
+ # You'll want to restart your FastCGI or Mongrel processes after this,
564
+ # so they get the new db
565
+ end
566
+
567
+ end
568
+
569
+ module DataMapper
570
+ module Xapian
571
+
572
+ def is_indexed(options)
573
+ cattr_accessor :xapian_options
574
+
575
+ ActsAsXapian.set_options(self.to_s, options)
576
+ self.xapian_options = options
577
+
578
+ self.after :save do
579
+ model_class = self.class.to_s
580
+ model_id = self.id
581
+ # TODO: transaction
582
+ # ActiveRecord::Base.transaction do
583
+ found = ::ActsAsXapian::ActsAsXapianJob.all(:model_class => model_class, :model_id => model_id).destroy!
584
+ job = ::ActsAsXapian::ActsAsXapianJob.create(:model_class => model_class, :model_id => model_id, :action => "update")
585
+ # job.save!
586
+ # end
587
+ end
588
+
589
+ self.after :destroy do
590
+ :xapian_mark_needs_destroy
591
+ model_class = self.class.to_s
592
+ model_id = self.id
593
+ # TODO: transaction
594
+ # ActiveRecord::Base.transaction do
595
+ found = ::ActsAsXapian::ActsAsXapianJob.all(:model_class => model_class, :model_id => model_id).destroy!
596
+ job = ::ActsAsXapian::ActsAsXapianJob.create(:model_class => model_class, :model_id => model_id, :action => "destroy")
597
+ # end
598
+ end
599
+
600
+ include DataMapper::Xapian::InstanceMethods
601
+ end
602
+
603
+ module InstanceMethods
604
+ # Used internally
605
+ def xapian_document_term
606
+ self.class.to_s + "-" + self.id.to_s
607
+ end
608
+
609
+ # Extract value of a field from the model
610
+ def xapian_value(field, type = nil)
611
+ # value = self[field] || self.send(field.to_sym)
612
+ value = self.send(field.to_sym)
613
+ if type == :date
614
+ if value.kind_of?(Time)
615
+ value.utc.strftime("%Y%m%d")
616
+ elsif value.kind_of?(Date)
617
+ value.to_time.utc.strftime("%Y%m%d")
618
+ else
619
+ raise "Only Time or Date types supported by acts_as_xapian for :date fields, got " + value.class.to_s
620
+ end
621
+ elsif type == :boolean
622
+ value ? true : false
623
+ else
624
+ value.to_s
625
+ end
626
+ end
627
+
628
+ # Store record in the Xapian database
629
+ def xapian_index
630
+ # if we have a conditional function for indexing, call it and destroy object if failed
631
+ if self.class.xapian_options.include?(:if)
632
+ if_value = xapian_value(self.class.xapian_options[:if], :boolean)
633
+ if not if_value
634
+ self.xapian_destroy
635
+ return
636
+ end
637
+ end
638
+
639
+ # otherwise (re)write the Xapian record for the object
640
+ doc = ::Xapian::Document.new
641
+ ActsAsXapian.term_generator.document = doc
642
+
643
+ doc.data = self.xapian_document_term
644
+
645
+ doc.add_term("M" + self.class.to_s)
646
+ doc.add_term("I" + doc.data)
647
+ if self.xapian_options[:terms]
648
+ for term in self.xapian_options[:terms]
649
+ ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
650
+ ActsAsXapian.term_generator.index_text(xapian_value(term[0]), 1, term[1])
651
+ end
652
+ end
653
+ if self.xapian_options[:values]
654
+ for value in self.xapian_options[:values]
655
+ doc.add_value(value[1], xapian_value(value[0], value[3]))
656
+ end
657
+ end
658
+ if self.xapian_options[:texts]
659
+ for text in self.xapian_options[:texts]
660
+ ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
661
+ # XXX the "1" here is a weight that could be varied for a boost function
662
+ ActsAsXapian.term_generator.index_text(xapian_value(text), 1)
663
+ end
664
+ end
665
+
666
+ ActsAsXapian.writable_db.replace_document("I" + doc.data, doc)
667
+ end
668
+
669
+ # Delete record from the Xapian database
670
+ def xapian_destroy
671
+ ActsAsXapian.writable_db.delete_document("I" + self.xapian_document_term)
672
+ end
673
+
674
+ end
675
+ end
676
+ end
677
+
678
+ module DataMapper
679
+ module Resource
680
+ module ClassMethods
681
+ include DataMapper::Xapian
682
+ end
683
+ end
684
+ end
685
+
686
+ if defined?(Merb::Plugins)
687
+ # Merb gives you a Merb::Plugins.config hash...feel free to put your stuff in your piece of it
688
+ # Merb::Plugins.config[:dm_xapian] = {
689
+ # :chickens => false
690
+ # }
691
+
692
+ # Merb::BootLoader.before_app_loads do
693
+ #
694
+ # end
695
+
696
+ Merb::BootLoader.after_app_loads do
697
+ # code that can be required after the application loads
698
+ ActsAsXapian.configure(Merb.env||'development', Merb.root)
699
+ end
700
+
701
+ Merb::Plugins.add_rakefiles "dm-xapian/merbtasks"
702
+ end
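
The "rename into place" step at the end of `ActsAsXapian.rebuild_index` can be read as a small directory swap: build at `db_path + ".new"`, move the live database to `db_path + ".tmp"`, move the new one to `db_path`, then delete the old copy. The following is a minimal sketch of that sequence on plain directories, not the gem's code. `swap_into_place` and `demo_swap` are hypothetical names, and the sketch leaves out the `iamflint` safety check that the original performs before each `rm_r`.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not part of dm-xapian: swap a freshly built directory
# into place, mirroring the ".new" / ".tmp" naming used by rebuild_index.
def swap_into_place(db_path)
  new_path  = db_path + ".new"
  temp_path = db_path + ".tmp"

  raise "nothing to swap in: #{new_path} is missing" unless File.exist?(new_path)

  # Clear out a leftover from an earlier, interrupted swap.
  FileUtils.rm_r(temp_path) if File.exist?(temp_path)

  # Move the live database aside, then move the new one into its place.
  FileUtils.mv(db_path, temp_path) if File.exist?(db_path)
  FileUtils.mv(new_path, db_path)

  # The old database is no longer needed once the new one is live.
  FileUtils.rm_r(temp_path) if File.exist?(temp_path)
end

# Exercise the swap in a throwaway directory and report what is left behind.
def demo_swap
  Dir.mktmpdir do |dir|
    db_path = File.join(dir, "xapiandb")
    FileUtils.mkdir_p(db_path)
    File.write(File.join(db_path, "marker"), "old")
    FileUtils.mkdir_p(db_path + ".new")
    File.write(File.join(db_path + ".new", "marker"), "new")

    swap_into_place(db_path)

    [File.read(File.join(db_path, "marker")),
     File.exist?(db_path + ".new"),
     File.exist?(db_path + ".tmp")]
  end
end
```

Between the two `mv` calls there is a short window in which nothing exists at `db_path`, so the swap is not atomic. This is consistent with the source comments advising that readers reopen the database, or that server processes be restarted, after a rebuild.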
metadata ADDED
@@ -0,0 +1,71 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: psq-dm-xapian
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.3.1
5
+ platform: ruby
6
+ authors:
7
+ - Joshaven Potter, Pascal Belloncle
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2008-11-08 00:00:00 -08:00
13
+ default_executable:
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: merb
17
+ version_requirement:
18
+ version_requirements: !ruby/object:Gem::Requirement
19
+ requirements:
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 0.9.7
23
+ version:
24
+ description: Merb plugin that provides access to the Ruby Xapian search engine library
25
+ email: yourtech@gmail.com, psq@nanorails.com
26
+ executables: []
27
+
28
+ extensions: []
29
+
30
+ extra_rdoc_files:
31
+ - README.txt
32
+ - LICENSE
33
+ - TODO
34
+ - SETUP.txt
35
+ - CHANGES.txt
36
+ files:
37
+ - LICENSE
38
+ - README.txt
39
+ - Rakefile
40
+ - TODO
41
+ - lib/dm-xapian.rb
42
+ - SETUP.txt
43
+ - CHANGES.txt
44
+ has_rdoc: true
45
+ homepage: http://github.com/psq/dm-xapian
46
+ post_install_message:
47
+ rdoc_options: []
48
+
49
+ require_paths:
50
+ - lib
51
+ required_ruby_version: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: "0"
56
+ version:
57
+ required_rubygems_version: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: "0"
62
+ version:
63
+ requirements: []
64
+
65
+ rubyforge_project: merb
66
+ rubygems_version: 1.2.0
67
+ signing_key:
68
+ specification_version: 2
69
+ summary: Merb plugin that provides access to the Ruby Xapian search engine library
70
+ test_files: []
71
+
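
For readers more used to `.gemspec` files than to the serialized YAML above, the same core fields can be written with the RubyGems API. This is a sketch reconstructed from the metadata shown, not the project's actual gemspec (which is not part of this diff). It omits legacy fields such as `has_rdoc` and `rubyforge_project`, and it assumes the `merb` dependency is an ordinary runtime dependency.

```ruby
require "rubygems"

# Reconstructed from the metadata above; field values are copied verbatim.
spec = Gem::Specification.new do |s|
  s.name          = "psq-dm-xapian"
  s.version       = "0.3.1"
  s.authors       = ["Joshaven Potter, Pascal Belloncle"]
  s.email         = "yourtech@gmail.com, psq@nanorails.com"
  s.homepage      = "http://github.com/psq/dm-xapian"
  s.summary       = "Merb plugin that provides access to the Ruby Xapian search engine library"
  s.description   = s.summary
  s.files         = %w[LICENSE README.txt Rakefile TODO lib/dm-xapian.rb SETUP.txt CHANGES.txt]
  s.require_paths = ["lib"]

  # Assumption: treated as a runtime dependency; the metadata gives no type.
  s.add_dependency "merb", ">= 0.9.7"
end
```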