wbzyl-acts_as_xapian 0.0.2

@@ -0,0 +1,2 @@
1
+
2
+ * Done with gemifying acts_as_xapian plugin.
@@ -0,0 +1,21 @@
1
+ acts_as_xapian is released under the MIT License.
2
+
3
+ Copyright (c) 2008 UK Citizens Online Democracy.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of the acts_as_xapian software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including without
8
+ limitation the rights to use, copy, modify, merge, publish, distribute,
9
+ sublicense, and/or sell copies of the Software, and to permit persons to whom
10
+ the Software is furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
@@ -0,0 +1,267 @@
1
+ Do patch this file if there is documentation missing / wrong. It's called
2
+ README.txt and is in git, using Textile formatting. The wiki page is just
3
+ copied from the README.txt file.
4
+
5
+ Contents
6
+ ========
7
+
8
+ * a. Introduction to acts_as_xapian
9
+ * b. Installation
10
+ * c. Comparison to acts_as_solr (as on 24 April 2008)
11
+ * d. Documentation - indexing
12
+ * e. Documentation - querying
13
+ * f. Configuration
14
+ * g. Performance
15
+ * h. Support
16
+
17
+
18
+ a. Introduction to acts_as_xapian
19
+ =================================
20
+
21
+ "Xapian":http://www.xapian.org is a full text search engine library which has
22
+ Ruby bindings. acts_as_xapian adds support for it to Rails. It is an
23
+ alternative to acts_as_solr, acts_as_ferret, Ultrasphinx, acts_as_indexed,
24
+ acts_as_searchable or acts_as_tsearch.
25
+
26
+ acts_as_xapian is deployed in production on these websites.
27
+ * "WhatDoTheyKnow":http://www.whatdotheyknow.com
28
+ * "MindBites":http://www.mindbites.com
29
+
30
+ The section "c. Comparison to acts_as_solr" below will give you an idea of
31
+ acts_as_xapian's features.
32
+
33
+ acts_as_xapian was started by Francis Irving in May 2008 for search and email
34
+ alerts in WhatDoTheyKnow, and so was supported by "mySociety":http://www.mysociety.org
35
+ and initially paid for by the "JRSST Charitable Trust":http://www.jrrt.org.uk/jrsstct.htm
36
+
37
+
38
+ b. Installation
39
+ ===============
40
+
41
+ Retrieve the plugin directly from the git version control system by running
42
+ this command within your Rails app.
43
+
44
+ git clone git://github.com/frabcus/acts_as_xapian.git vendor/plugins/acts_as_xapian
45
+
46
+ Xapian 1.0.5 and associated Ruby bindings are also required.
47
+
48
+ Debian or Ubuntu - install the packages libxapian15 and libxapian-ruby1.8.
49
+
50
+ Mac OS X - follow the instructions for installing from source on
51
+ the "Installing Xapian":http://xapian.org/docs/install.html page - you need the
52
+ Xapian library and bindings (you don't need Omega).
53
+
54
+ There is no Ruby Gem for Xapian; it would be great if you could make one!
55
+
56
+
57
+ c. Comparison to acts_as_solr (as on 24 April 2008)
58
+ ===================================================
59
+
60
+ * Offline indexing only mode - which is a minus if you want changes
61
+ immediately reflected in the search index, and a plus if you were going to
62
+ have to implement your own offline indexing anyway.
63
+
64
+ * Collapsing - the equivalent of SQL's "group by". You can specify a field
65
+ to collapse on, and only the most relevant result from each value of that
66
+ field is returned, along with a count of how many there are in total.
67
+ acts_as_solr doesn't have this.
68
+
69
+ * No highlighting - Xapian can't return you text highlighted with a search
70
+ query. You can try and make do with TextHelper::highlight (combined with
71
+ words_to_highlight below). I found the highlighting in acts_as_solr didn't
72
+ really understand the query anyway.
73
+
74
+ * Date range searching - this exists in acts_as_solr, but I found it
75
+ wasn't documented well enough, and was hard to get working.
76
+
77
+ * Spelling correction - "did you mean?" built in and just works.
78
+
79
+ * Similar documents - acts_as_xapian has a simple command to find other models
80
+ that are like a specified model.
81
+
82
+ * Multiple models - acts_as_xapian searches multiple types of model if you
83
+ like, returning them mixed up together by relevancy. This is like
84
+ multi_solr_search, only it is the default mode of operation and is properly
85
+ supported.
86
+
87
+ * No daemons - However, if you have more than one web server, you'll need to
88
+ work out how to use "Xapian's remote backend":http://xapian.org/docs/remote.html.
89
+
90
+ * One layer - full-powered Xapian is called directly from Ruby, without
91
+ Solr getting in the way whenever you want to use a new feature from Lucene.
92
+
93
+ * No Java - an advantage if you're more used to working in the rest of the
94
+ open source world. acts_as_xapian is pure Ruby and C++.
95
+
96
+ * Xapian's awesome email list - the kids over at
97
+ "xapian-discuss":http://lists.xapian.org/mailman/listinfo/xapian-discuss
98
+ are super helpful. Useful if you need to extend and improve acts_as_xapian. The
99
+ Ruby bindings are mature and well maintained as part of Xapian.
100
+
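To make the "collapsing" bullet above concrete, here is a toy illustration in plain Ruby. It does not call Xapian, the data is made up, and `collapse_by` is a hypothetical helper. It keeps the best-scoring result for each value of a field and counts how many others were dropped; whether Xapian's own collapse count matches this definition exactly is an assumption of the sketch.

```ruby
# Toy illustration of collapsing: keep the highest-weighted result for each
# value of a field, and record how many other results shared that value.
def collapse_by(results, key)
  groups = results.group_by { |r| r[key] }
  collapsed = groups.map do |_value, rows|
    best = rows.max_by { |r| r[:weight] }
    best.merge(:collapse_count => rows.size - 1)
  end
  # Most relevant first, as a search result list would be
  collapsed.sort_by { |r| -r[:weight] }
end

# Made-up results, shaped loosely like search hits
results = [
  { :title => "Budget 2007", :author => "treasury",  :weight => 0.9 },
  { :title => "Budget 2008", :author => "treasury",  :weight => 0.7 },
  { :title => "Road plans",  :author => "transport", :weight => 0.8 }
]

collapsed = collapse_by(results, :author)
collapsed.each { |r| puts "#{r[:title]} (+#{r[:collapse_count]} more)" }
```

In acts_as_xapian itself the grouping is done by Xapian on an indexed value, selected with the :collapse_by_prefix option.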
101
+
102
+ d. Documentation - indexing
103
+ ===========================
104
+
105
+ Xapian is an *offline indexing* search library - only one process can have the
106
+ Xapian database open for writing at once, and others that try meanwhile are
107
+ unceremoniously kicked out. For this reason, acts_as_xapian does not support
108
+ immediate writing to the database when your models change.
109
+
110
+ Instead, there is an ActsAsXapianJob model which stores which models need
111
+ updating or deleting in the search index. A rake task 'xapian:update_index'
112
+ then performs the updates queued since it was last run. You can run it on a cron job, or
113
+ similar.
114
+
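The deferred-indexing idea can be sketched with a toy, in-memory queue. This is not the plugin's implementation: `PendingJobs` and the action names 'update' and 'destroy' are assumptions made for illustration. The one detail taken from the gem is that its migration declares a unique index on the model and model_id columns, so a record has at most one pending job.

```ruby
# Toy stand-in for the job table. Keyed on [model, model_id] so that a
# record has at most one pending job, mirroring a unique index on those
# two columns.
class PendingJobs
  def initialize
    @jobs = {}
  end

  # action is 'update' or 'destroy' (names assumed for this sketch)
  def enqueue(model, model_id, action)
    @jobs[[model, model_id]] = action
  end

  # Hand every pending job to the block, then clear the queue.
  def drain
    processed = @jobs.map { |(model, model_id), action| yield(model, model_id, action) }
    @jobs.clear
    processed
  end

  def size
    @jobs.size
  end
end

queue = PendingJobs.new
queue.enqueue("PublicBody", 1, "update")
queue.enqueue("PublicBody", 1, "destroy") # replaces the earlier job
queue.enqueue("User", 7, "update")

# In the real plugin this step is what the periodic rake task does.
log = queue.drain { |model, id, action| "#{action} #{model}-#{id}" }
puts log
```

The web processes only ever add rows to the queue; a single periodic process drains it, which fits the rule that only one process can have the database open for writing.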
115
+ Here's how to add indexing to your Rails app:
116
+
117
+ 1. Put acts_as_xapian in your models that need search indexing. e.g.
118
+
119
+ acts_as_xapian :texts => [ :name, :short_name ],
120
+ :values => [ [ :created_at, 0, "created_at", :date ] ],
121
+ :terms => [ [ :variety, 'V', "variety" ] ]
122
+
123
+ Options must include:
124
+
125
+ * :texts, an array of fields for indexing with full text search.
126
+ e.g. :texts => [ :title, :body ]
127
+
128
+ * :values, things which have a range of values for sorting, or for collapsing.
129
+ Specify an array quadruple of [ field, identifier, prefix, type ] where
130
+ ** identifier is an arbitrary numeric identifier for use in the Xapian database
131
+ ** prefix is the part to use in search queries that goes before the :
132
+ ** type can be any of :string, :number or :date
133
+
134
+ e.g. :values => [ [ :created_at, 0, "created_at", :date ],
135
+ [ :size, 1, "size", :string ] ]
136
+
137
+ * :terms, things which come with a prefix (before a :) in search queries.
138
+ Specify an array triple of [ field, char, prefix ] where
139
+ ** char is an arbitrary single upper case char used in the Xapian database, just
140
+ pick any single uppercase character, but use a different one for each prefix.
141
+ ** prefix is the part to use in search queries that goes before the :
142
+ For example, if you were making Google and indexing to be able to later do a
143
+ query like "site:www.whatdotheyknow.com", then the prefix would be "site".
144
+
145
+ e.g. :terms => [ [ :variety, 'V', "variety" ] ]
146
+
147
+ A 'field' is a symbol referring to either an attribute or a function which
148
+ returns the text, date or number to index. Both 'identifier' and 'char' must be
149
+ the same for the same prefix in different models.
150
+
151
+ Options may include:
152
+ * :eager_load, added as an :include clause when looking up search results in
153
+ database
154
+ * :if, either an attribute or a function; if it returns false the
155
+ object isn't indexed
156
+
157
+ 2. Generate a database migration to create the ActsAsXapianJob model:
158
+
159
+ script/generate acts_as_xapian
160
+ rake db:migrate
161
+
162
+ 3. Call 'rake xapian:rebuild_index models="ModelName1 ModelName2"' to build the index
163
+ the first time (you must specify all your indexed models). It's put in a
164
+ development/test/production dir in acts_as_xapian/xapiandbs. See f. Configuration
165
+ below if you want to change this.
166
+
167
+ 4. Then, from a cron job or a daemon (or regularly by hand), call 'rake xapian:update_index'.
168
+
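The rules for :terms and :values in step 1 are easy to get wrong, so here is a minimal sketch of a checker. `xapian_option_problems` is a hypothetical helper, not part of acts_as_xapian. The reserved codes (M, I, Z) and reserved prefixes (model, modelid) follow the checks in lib/acts_as_xapian.rb; the cross-model consistency check is left out.

```ruby
# Hypothetical helper: checks an acts_as_xapian options hash against the
# rules described in step 1 and returns a list of problems (empty if none).
RESERVED_TERM_CODES = %w[M I Z]          # reserved by the plugin
RESERVED_PREFIXES   = %w[model modelid]  # reserved by the plugin
VALUE_TYPES         = [:string, :number, :date]

def xapian_option_problems(options)
  problems = []
  (options[:terms] || []).each do |_field, char, prefix|
    problems << "term code #{char.inspect} must be one capital letter" unless char.to_s =~ /\A[A-Z]\z/
    problems << "term code #{char} is reserved" if RESERVED_TERM_CODES.include?(char)
    problems << "prefix #{prefix} is reserved" if RESERVED_PREFIXES.include?(prefix)
  end
  (options[:values] || []).each do |_field, identifier, _prefix, type|
    problems << "value identifier #{identifier.inspect} must be an integer" unless identifier.is_a?(Integer)
    problems << "unknown value type #{type.inspect}" unless VALUE_TYPES.include?(type)
  end
  problems
end

good = { :texts  => [:name, :short_name],
         :values => [[:created_at, 0, "created_at", :date]],
         :terms  => [[:variety, 'V', "variety"]] }
bad  = { :texts  => [:name],
         :values => [[:size, "1", "size", :text]],
         :terms  => [[:kind, 'M', "model"]] }

p xapian_option_problems(good)
puts xapian_option_problems(bad)
```

The plugin raises on the first problem it finds when the query parser is set up; collecting every problem, as here, is a design choice of the sketch.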
169
+
170
+ e. Documentation - querying
171
+ ===========================
172
+
173
+ Testing indexing
174
+ ----------------
175
+
176
+ If you just want to test indexing is working, you'll find this rake task
177
+ useful (it has more options, see tasks/xapian.rake)
178
+
179
+ rake xapian:query models="PublicBody User" query="moo"
180
+
181
+ Performing a query
182
+ ------------------
183
+
184
+ To perform a query from code call ActsAsXapian::Search.new. This takes in turn:
185
+ * model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
186
+ * query_string - Google like syntax, see below
187
+
188
+ And then a hash of options:
189
+ * :offset - Offset of first result (default 0)
190
+ * :limit - Number of results per page
191
+ * :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
192
+ * :sort_by_ascending - Default true (documents with higher values better/earlier), set to false for descending sort
193
+ * :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)
194
+
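As a sketch of how :offset and :limit support paging, the helper below turns a 1-based page number into an options hash. `search_page_options` and its default of 25 results per page are assumptions for illustration; only the option names come from the list above.

```ruby
# Hypothetical helper: build the options hash for one page of results.
def search_page_options(page, per_page = 25, extra = {})
  page = [page.to_i, 1].max # treat missing or invalid pages as page 1
  { :offset => (page - 1) * per_page, :limit => per_page }.merge(extra)
end

opts = search_page_options(3, 20, :sort_by_prefix => "created_at",
                                  :sort_by_ascending => false)
p opts

# With the plugin loaded, the hash would be passed as the third argument:
#   ActsAsXapian::Search.new([PublicBody], "moo", opts)
```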
195
+ Google like query syntax is as described in
196
+ "Xapian::QueryParser Syntax":http://www.xapian.org/docs/queryparser.html
197
+ Queries can include prefix:value parts, according to what you indexed in the
198
+ acts_as_xapian part above. You can also say things like model:InfoRequestEvent
199
+ to constrain by model in more complex ways than the model_classes parameter, or
200
+ modelid:InfoRequestEvent-100 to only find one specific object.
201
+
202
+ Returns an ActsAsXapian::Search object. Useful methods are:
203
+ * description - a techy one, to check how the query has been parsed
204
+ * matches_estimated - a guesstimate at the total number of hits
205
+ * spelling_correction - the corrected query string if there is a correction, otherwise nil
206
+ * words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight
207
+ * results - an array of hashes each containing:
208
+ ** :model - your Rails model, this is what you most want!
209
+ ** :weight - relevancy measure
210
+ ** :percent - the weight as a %, 0 meaning the item did not match the query at all
211
+ ** :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix
212
+
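To see what words_to_highlight returns without a Rails app, the steps it takes in lib/acts_as_xapian.rb can be copied into a standalone method. This is a sketch for experimenting, taking the query string as an argument instead of reading it from a Search object.

```ruby
# Standalone version of the word-extraction steps, for use outside Rails.
def words_to_highlight(query_string)
  # Replace everything except letters, digits and : . / _ with spaces
  cleaned = query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
  words = cleaned.split(" ")
  # Drop anything containing : . or / (prefix:value parts, hostnames, paths)
  words = words.reject { |w| w =~ /[:.\/]/ }
  # Drop boolean operators
  words.reject { |w| w =~ /\A(AND|NOT|OR|XOR)\z/ }
end

p words_to_highlight("moo site:www.example.com AND cow")
# => ["moo", "cow"]
```

Punctuation such as a leading minus sign is stripped, not interpreted, so an excluded word would still appear in the list.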
213
+ Finding similar models
214
+ ----------------------
215
+
216
+ To find models that are similar to a given set of models call ActsAsXapian::Similar.new. This takes:
217
+ * model_classes - list of model classes to return models from within
218
+ * models - list of models that you want to find related ones to
219
+
220
+ Returns an ActsAsXapian::Similar object. Has all methods from ActsAsXapian::Search above, except
221
+ for words_to_highlight. In addition has:
222
+ * important_terms - the terms extracted from the input models, that were used to search for output
223
+ Call the results method to get the similar models.
224
+
225
+
226
+ f. Configuration
227
+ ================
228
+
229
+ If you want to customise the configuration of acts_as_xapian, it will look for
230
+ a file called 'xapian.yml' under RAILS_ROOT/config. As is familiar from the
231
+ format of the database.yml file, separate :development, :test and :production
232
+ sections are expected.
233
+
234
+ The following options are available:
235
+ * base_db_path - specifies the directory, relative to RAILS_ROOT, in which
236
+ acts_as_xapian stores its search index databases. Default is the directory
237
+ xapiandbs within the acts_as_xapian directory.
238
+
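How base_db_path turns into an actual directory can be sketched as follows. `xapian_db_path` is a hypothetical helper; the logic follows prepare_environment in lib/acts_as_xapian.rb, which joins the parent directory with the environment name. The paths and YAML text are made up for the example.

```ruby
require 'yaml'

# Hypothetical helper: work out where the index for one environment lives.
# yaml_text is the contents of config/xapian.yml, or nil if there is no file.
def xapian_db_path(rails_root, environment, yaml_text, default_parent)
  config = (yaml_text ? YAML.load(yaml_text)[environment] : nil) || {}
  parent = if config['base_db_path']
             File.join(rails_root, config['base_db_path'])
           else
             default_parent
           end
  File.join(parent, environment)
end

yaml_text = "production:\n  base_db_path: db/xapian\n"

puts xapian_db_path("/srv/app", "production", yaml_text, "/srv/app/vendor/xapiandbs")
puts xapian_db_path("/srv/app", "production", nil, "/srv/app/vendor/xapiandbs")
```

Each environment gets its own subdirectory under the parent, so development, test and production indexes never share a database.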
239
+
240
+ g. Performance
241
+ ==============
242
+
243
+ On development sites, acts_as_xapian automatically logs the time taken to do
244
+ searches. The time displayed is for the Xapian parts of the query; the Rails
245
+ database model lookups will be logged separately by ActiveRecord. Example:
246
+
247
+ Xapian query (0.00029s) Search: hello
248
+
249
+ To enable this, and other performance logging, on a production site,
250
+ temporarily add this to the end of your config/environment.rb
251
+
252
+ ActiveRecord::Base.logger = Logger.new(STDOUT)
253
+
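The log line shown above can be reproduced in isolation with a short sketch. The `sleep` call stands in for the Xapian work, and the logger writes to a string (an assumption of this example) so that the output can be inspected; the message format follows the one used in lib/acts_as_xapian.rb.

```ruby
require 'benchmark'
require 'logger'
require 'stringio'

# Log to an in-memory string instead of STDOUT, with a bare formatter so
# only the message itself is written.
output = StringIO.new
logger = Logger.new(output)
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

# Time the (stand-in) search work, then log it at DEBUG level.
runtime = Benchmark.realtime { sleep 0.001 }
logger.add(Logger::DEBUG, "  Xapian query (#{'%.5fs' % runtime}) Search: hello")

puts output.string
```

Because the message is logged at DEBUG level, it only appears when the logger's level allows debug output, which is why a fresh Logger is suggested for production.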
254
+
255
+ h. Support
256
+ ==========
257
+
258
+ Please ask any questions on the
259
+ "acts_as_xapian Google Group":http://groups.google.com/group/acts_as_xapian
260
+
261
+ The official home page and repository for acts_as_xapian are the
262
+ "acts_as_xapian github page":http://github.com/frabcus/acts_as_xapian/wikis
263
+
264
+ For more details about anything, see source code in lib/acts_as_xapian.rb
265
+
266
+ For instructions on merging source, see "Using git for collaboration" here:
267
+ http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
@@ -0,0 +1,14 @@
1
+ require 'rake'
2
+ require 'rubygems'
3
+ require 'echoe'
4
+
5
+ Echoe.new('acts_as_xapian', '0.0.2') do |p|
6
+ p.description = "Acts_as_xapian is a full text search gem/plugin for Ruby on Rails."
7
+ p.url = "http://github.com/Overbryd/acts_as_xapian"
8
+ p.author = "Lukas Rieder (original author: Francis Irving)"
9
+ p.email = "l.rieder@gmail.com"
10
+ p.ignore_pattern = ["tmp/*", "script/*"]
11
+ p.development_dependencies = []
12
+ end
13
+
14
+ Dir["#{File.dirname(__FILE__)}/tasks/*.rake"].sort.each { |ext| load ext }
@@ -0,0 +1,31 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = %q{acts_as_xapian}
5
+ s.version = "0.0.2"
6
+
7
+ s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
8
+ s.authors = ["Lukas Rieder (original author: Francis Irving)"]
9
+ s.date = %q{2008-12-09}
10
+ s.description = %q{Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.}
11
+ s.email = %q{l.rieder@gmail.com}
12
+ s.extra_rdoc_files = ["lib/acts_as_xapian.rb", "README.textile", "LICENSE.txt", "tasks/xapian.rake", "CHANGELOG"]
13
+ s.files = ["Rakefile", "acts_as_xapian.gemspec", "lib/acts_as_xapian.rb", "init.rb", "Manifest", "generators/acts_as_xapian/templates/migration.rb", "generators/acts_as_xapian/templates/xapian.yml", "generators/acts_as_xapian/USAGE", "generators/acts_as_xapian/acts_as_xapian_generator.rb", "README.textile", "LICENSE.txt", "tasks/xapian.rake", "CHANGELOG"]
14
+ s.has_rdoc = true
15
+ s.homepage = %q{http://github.com/Overbryd/acts_as_xapian}
16
+ s.rdoc_options = ["--line-numbers", "--inline-source", "--title", "Acts_as_xapian", "--main", "README.textile"]
17
+ s.require_paths = ["lib"]
18
+ s.rubyforge_project = %q{acts_as_xapian}
19
+ s.rubygems_version = %q{1.3.1}
20
+ s.summary = %q{Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.}
21
+
22
+ if s.respond_to? :specification_version then
23
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
24
+ s.specification_version = 2
25
+
26
+ if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
27
+ else
28
+ end
29
+ else
30
+ end
31
+ end
@@ -0,0 +1 @@
1
+ ./script/generate acts_as_xapian
@@ -0,0 +1,17 @@
1
+ class ActsAsXapianGenerator < Rails::Generator::Base
2
+
3
+ def manifest
4
+ record do |m|
5
+ m.migration_template 'migration.rb', 'db/migrate',
6
+ :migration_file_name => "create_acts_as_xapian"
7
+ m.file 'xapian.yml', 'config/xapian.yml'
8
+ end
9
+ end
10
+
11
+ protected
12
+
13
+ def banner
14
+ "Usage: #{$0} acts_as_xapian"
15
+ end
16
+
17
+ end
@@ -0,0 +1,14 @@
1
+ class CreateActsAsXapian < ActiveRecord::Migration
2
+ def self.up
3
+ create_table :acts_as_xapian_jobs do |t|
4
+ t.column :model, :string, :null => false
5
+ t.column :model_id, :integer, :null => false
6
+ t.column :action, :string, :null => false
7
+ end
8
+ add_index :acts_as_xapian_jobs, [:model, :model_id], :unique => true
9
+ end
10
+ def self.down
11
+ drop_table :acts_as_xapian_jobs
12
+ end
13
+ end
14
+
@@ -0,0 +1,10 @@
1
+ # acts_as_xapian config
2
+
3
+ development:
4
+ base_db_path: db/xapian
5
+
6
+ production:
7
+ base_db_path: db/xapian
8
+
9
+ test:
10
+ base_db_path: db/xapian
data/init.rb ADDED
@@ -0,0 +1,9 @@
1
+ # acts_as_xapian/init.rb:
2
+ #
3
+ # Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
4
+ # Email: francis@mysociety.org; WWW: http://www.mysociety.org/
5
+ #
6
+ # $Id: init.rb,v 1.1 2008/04/23 13:33:50 francis Exp $
7
+
8
+ require 'acts_as_xapian'
9
+
@@ -0,0 +1,778 @@
1
+ # acts_as_xapian/lib/acts_as_xapian.rb:
2
+ # Xapian full text search in Ruby on Rails.
3
+ #
4
+ # Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
5
+ # Email: francis@mysociety.org; WWW: http://www.mysociety.org/
6
+ #
7
+ # Documentation
8
+ # =============
9
+ #
10
+ # See ../README.txt for documentation. Please update that file if you edit
11
+ # code.
12
+
13
+ # Make it so if Xapian isn't installed, the Rails app doesn't fail completely,
14
+ # just when somebody does a search.
15
+ begin
16
+ require 'xapian'
17
+ $acts_as_xapian_bindings_available = true
18
+ rescue LoadError
19
+ STDERR.puts "acts_as_xapian: No Ruby bindings for Xapian installed"
20
+ $acts_as_xapian_bindings_available = false
21
+ end
22
+
23
+ module ActsAsXapian
24
+ ######################################################################
25
+ # Module level variables
26
+ # XXX must be some kind of cattr_accessor that can do this better
27
+ def ActsAsXapian.bindings_available
28
+ $acts_as_xapian_bindings_available
29
+ end
30
+ class NoXapianRubyBindingsError < StandardError
31
+ end
32
+
33
+ # XXX global class initializers here get loaded more than once, don't know why. Protect them.
34
+ if not $acts_as_xapian_class_var_init
35
+ @@db = nil
36
+ @@db_path = nil
37
+ @@writable_db = nil
38
+ @@writable_suffix = nil
39
+ @@init_values = []
40
+ $acts_as_xapian_class_var_init = true
41
+ end
42
+ def ActsAsXapian.db
43
+ @@db
44
+ end
45
+ def ActsAsXapian.db_path
46
+ @@db_path
47
+ end
48
+ def ActsAsXapian.writable_db
49
+ @@writable_db
50
+ end
51
+ def ActsAsXapian.stemmer
52
+ @@stemmer
53
+ end
54
+ def ActsAsXapian.term_generator
55
+ @@term_generator
56
+ end
57
+ def ActsAsXapian.enquire
58
+ @@enquire
59
+ end
60
+ def ActsAsXapian.query_parser
61
+ @@query_parser
62
+ end
63
+ def ActsAsXapian.values_by_prefix
64
+ @@values_by_prefix
65
+ end
66
+ def ActsAsXapian.config
67
+ @@config
68
+ end
69
+
70
+ ######################################################################
71
+ # Initialisation
72
+ def ActsAsXapian.init(classname = nil, options = nil)
73
+ if not classname.nil?
74
+ # store class and options for use later, when we open the db in readable_init
75
+ @@init_values.push([classname,options])
76
+ end
77
+ end
78
+
79
+ # Reads the config file (if any) and sets up the path to the database we'll be using
80
+ def ActsAsXapian.prepare_environment
81
+ return unless @@db_path.nil?
82
+
83
+ # barf if we can't figure out the environment
84
+ environment = (ENV['RAILS_ENV'] or RAILS_ENV)
85
+ raise "Set RAILS_ENV, so acts_as_xapian can find the right Xapian database" if not environment
86
+
87
+ # check for a config file
88
+ config_file = RAILS_ROOT + "/config/xapian.yml"
89
+ @@config = File.exists?(config_file) ? YAML.load_file(config_file)[environment] : {}
90
+
91
+ # figure out where the DBs should go
92
+ if config['base_db_path']
93
+ db_parent_path = RAILS_ROOT + "/" + config['base_db_path']
94
+ else
95
+ db_parent_path = File.join(File.dirname(__FILE__), '../xapiandbs/')
96
+ end
97
+
98
+ # make the directory for the xapian databases to go in
99
+ Dir.mkdir(db_parent_path) unless File.exists?(db_parent_path)
100
+
101
+ @@db_path = File.join(db_parent_path, environment)
102
+
103
+ # make some things that don't depend on the db
104
+ # XXX this gets made once for each acts_as_xapian. Oh well.
105
+ @@stemmer = Xapian::Stem.new('english')
106
+ end
107
+
108
+ # Opens / reopens the db for reading
109
+ # XXX we perhaps don't need to rebuild database and enquire and queryparser -
110
+ # but db.reopen wasn't enough by itself, so just redo everything; it's easier.
111
+ def ActsAsXapian.readable_init
112
+ raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
113
+ raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
114
+
115
+ # if DB is not nil, then we're already initialised, so don't do it again
116
+ # XXX we need to reopen the database each time, so Xapian gets changes to it.
117
+ # Hopefully in later version of Xapian it will autodetect this, and this can
118
+ # be commented back in again.
119
+ # return unless @@db.nil?
120
+
121
+ prepare_environment
122
+
123
+ # basic Xapian objects
124
+ begin
125
+ @@db = Xapian::Database.new(@@db_path)
126
+ @@enquire = Xapian::Enquire.new(@@db)
127
+ rescue IOError
128
+ raise "Xapian database not opened; have you built it with 'rake xapian:rebuild_index'?"
129
+ end
130
+
131
+ init_query_parser
132
+ end
133
+
134
+ # Make a new query parser
135
+ def ActsAsXapian.init_query_parser
136
+ # for queries
137
+ @@query_parser = Xapian::QueryParser.new
138
+ @@query_parser.stemmer = @@stemmer
139
+ @@query_parser.stemming_strategy = Xapian::QueryParser::STEM_SOME
140
+ @@query_parser.database = @@db
141
+ @@query_parser.default_op = Xapian::Query::OP_AND
142
+
143
+ @@terms_by_capital = {}
144
+ @@values_by_number = {}
145
+ @@values_by_prefix = {}
146
+ @@value_ranges_store = []
147
+
148
+ for init_value_pair in @@init_values
149
+ classname = init_value_pair[0]
150
+ options = init_value_pair[1]
151
+
152
+ # go through the various field types, and tell query parser about them,
153
+ # and error check them - i.e. check for consistency between models
154
+ @@query_parser.add_boolean_prefix("model", "M")
155
+ @@query_parser.add_boolean_prefix("modelid", "I")
156
+ if options[:terms]
157
+ for term in options[:terms]
158
+ raise "Use a single capital letter for term code" if not term[1].match(/^[A-Z]$/)
159
+ raise "M and I are reserved for use as the model/id term" if term[1] == "M" or term[1] == "I"
160
+ raise "model and modelid are reserved for use as the model/id prefixes" if term[2] == "model" or term[2] == "modelid"
161
+ raise "Z is reserved for stemming terms" if term[1] == "Z"
162
+ raise "Already have code '" + term[1] + "' in another model but with different prefix '" + @@terms_by_capital[term[1]] + "'" if @@terms_by_capital.include?(term[1]) && @@terms_by_capital[term[1]] != term[2]
163
+ @@terms_by_capital[term[1]] = term[2]
164
+ @@query_parser.add_prefix(term[2], term[1])
165
+ end
166
+ end
167
+ if options[:values]
168
+ for value in options[:values]
169
+ raise "Value index '"+value[1].to_s+"' must be an integer, is " + value[1].class.to_s if value[1].class != 1.class
170
+ raise "Already have value index '" + value[1].to_s + "' in another model but with different prefix '" + @@values_by_number[value[1]].to_s + "'" if @@values_by_number.include?(value[1]) && @@values_by_number[value[1]] != value[2]
171
+
172
+ # value range processors are created only once per value index, the first time a model declares it
173
+ if !@@values_by_number.include?(value[1])
174
+ if value[3] == :date
175
+ value_range = Xapian::DateValueRangeProcessor.new(value[1])
176
+ elsif value[3] == :string
177
+ value_range = Xapian::StringValueRangeProcessor.new(value[1])
178
+ elsif value[3] == :number
179
+ value_range = Xapian::NumberValueRangeProcessor.new(value[1])
180
+ else
181
+ raise "Unknown value type '" + value[3].to_s + "'"
182
+ end
183
+
184
+ @@query_parser.add_valuerangeprocessor(value_range)
185
+
186
+ # stop it being garbage collected, as
187
+ # add_valuerangeprocessor ref is outside Ruby's GC
188
+ @@value_ranges_store.push(value_range)
189
+ end
190
+
191
+ @@values_by_number[value[1]] = value[2]
192
+ @@values_by_prefix[value[2]] = value[1]
193
+ end
194
+ end
195
+ end
196
+ end
197
+
198
+ def ActsAsXapian.writable_init(suffix = "")
199
+ raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
200
+ raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
201
+
202
+ # if DB is not nil, then we're already initialised, so don't do it again
203
+ return unless @@writable_db.nil?
204
+
205
+ prepare_environment
206
+
207
+ new_path = @@db_path + suffix
208
+ raise "writable_suffix/suffix inconsistency" if @@writable_suffix && @@writable_suffix != suffix
209
+ if @@writable_db.nil?
210
+ # for indexing
211
+ @@writable_db = Xapian::WritableDatabase.new(new_path, Xapian::DB_CREATE_OR_OPEN)
212
+ @@term_generator = Xapian::TermGenerator.new()
213
+ @@term_generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING, 0)
214
+ @@term_generator.database = @@writable_db
215
+ @@term_generator.stemmer = @@stemmer
216
+ @@writable_suffix = suffix
217
+ end
218
+ end
219
+
220
+ ######################################################################
221
+ # Search with a query or for similar models
222
+
223
+ # Base class for Search and Similar below
224
+ class QueryBase
225
+ attr_accessor :offset
226
+ attr_accessor :limit
227
+ attr_accessor :query
228
+ attr_accessor :matches
229
+ attr_accessor :query_models
230
+ attr_accessor :runtime
231
+ attr_accessor :cached_results
232
+
233
+ def initialize_db
234
+ self.runtime = 0.0
235
+
236
+ ActsAsXapian.readable_init
237
+ if ActsAsXapian.db.nil?
238
+ raise "ActsAsXapian not initialized"
239
+ end
240
+ end
241
+
242
+ # Set self.query before calling this
243
+ def initialize_query(options)
244
+ #raise options.to_yaml
245
+
246
+ self.runtime += Benchmark::realtime {
247
+ offset = options[:offset] || 0; offset = offset.to_i
248
+ limit = options[:limit] || -1
249
+ #raise "please specify maximum number of results to return with parameter :limit" if not limit
250
+ limit = limit.to_i
251
+ sort_by_prefix = options[:sort_by_prefix] || nil
252
+ sort_by_ascending = options[:sort_by_ascending].nil? ? true : options[:sort_by_ascending]
253
+ collapse_by_prefix = options[:collapse_by_prefix] || nil
254
+
255
+ ActsAsXapian.enquire.query = self.query
256
+
257
+ if sort_by_prefix.nil?
258
+ ActsAsXapian.enquire.sort_by_relevance!
259
+ else
260
+ value = ActsAsXapian.values_by_prefix[sort_by_prefix]
261
+ raise "couldn't find prefix '" + sort_by_prefix + "'" if value.nil?
262
+ ActsAsXapian.enquire.sort_by_value_then_relevance!(value, sort_by_ascending)
263
+ end
264
+ if collapse_by_prefix.nil?
265
+ ActsAsXapian.enquire.collapse_key = Xapian.BAD_VALUENO
266
+ else
267
+ value = ActsAsXapian.values_by_prefix[collapse_by_prefix]
268
+ raise "couldn't find prefix '" + collapse_by_prefix + "'" if value.nil?
269
+ ActsAsXapian.enquire.collapse_key = value
270
+ end
271
+
272
+ self.matches = ActsAsXapian.enquire.mset(offset, limit, 100)
273
+ self.cached_results = nil
274
+ }
275
+ end
276
+
277
+ # Return a description of the query
278
+ def description
279
+ self.query.description
280
+ end
281
+
282
+ # Estimate total number of results
283
+ def matches_estimated
284
+ self.matches.matches_estimated
285
+ end
286
+
287
+ # Return query string with spelling correction
288
+ def spelling_correction
289
+ correction = ActsAsXapian.query_parser.get_corrected_query_string
290
+ if correction.empty?
291
+ return nil
292
+ end
293
+ return correction
294
+ end
295
+
296
+ # Return array of models found
297
+ def results
298
+ # If they've already pulled out the results, just return them.
299
+ if !self.cached_results.nil?
300
+ return self.cached_results
301
+ end
302
+
303
+ docs = []
304
+ self.runtime += Benchmark::realtime {
305
+ # Pull out all the results
306
+ iter = self.matches._begin
307
+ while not iter.equals(self.matches._end)
308
+ docs.push({:data => iter.document.data,
309
+ :percent => iter.percent,
310
+ :weight => iter.weight,
311
+ :collapse_count => iter.collapse_count})
312
+ iter.next
313
+ end
314
+ }
315
+
316
+ # Log time taken, excluding database lookups below which will be displayed separately by ActiveRecord
317
+ if ActiveRecord::Base.logger
318
+ ActiveRecord::Base.logger.add(Logger::DEBUG, " Xapian query (#{'%.5fs' % self.runtime}) #{self.log_description}")
319
+ end
320
+
321
+ # Look up without too many SQL queries
322
+ lhash = {}
323
+ lhash.default = []
324
+ for doc in docs
325
+ k = doc[:data].split('-')
326
+ lhash[k[0]] = lhash[k[0]] + [k[1]]
327
+ end
328
+ # for each class, look up all ids
329
+ chash = {}
330
+ for cls, ids in lhash
331
+ conditions = [ "#{cls.constantize.table_name}.#{cls.constantize.primary_key} in (?)", ids ]
332
+ found = cls.constantize.find(:all, :conditions => conditions, :include => cls.constantize.xapian_options[:eager_load])
333
+ for f in found
334
+ chash[[cls, f.id]] = f
335
+ end
336
+ end
337
+ # now get them in right order again
338
+ results = []
339
+ docs.each{|doc| k = doc[:data].split('-'); results << { :model => chash[[k[0], k[1].to_i]],
340
+ :percent => doc[:percent], :weight => doc[:weight], :collapse_count => doc[:collapse_count] } }
341
+ self.cached_results = results
342
+ return results
343
+ end
344
+ end
345
+
346
+ # Search for a query string, returns an array of hashes in result order.
347
+ # Each hash contains the actual Rails object in :model, and other detail
348
+ # about relevancy etc. in other keys.
349
+ class Search < QueryBase
350
+ attr_accessor :query_string
351
+
352
+ # Note that model_classes is not only sometimes useful here - it's
353
+ # essential to make sure the classes have been loaded, and thus
354
+ # acts_as_xapian called on them, so we know the fields for the query
355
+ # parser.
356
+
357
+ # model_classes - model classes to search within, e.g. [PublicBody,
358
+ # User]. Can take a single model class, or you can express the model
359
+ # class names in strings if you like.
360
+ # query_string - user-entered query string, with syntax much like Google Search
361
+ def initialize(model_classes, query_string, options = {})
362
+ # Check parameters, convert to actual array of model classes
363
+ new_model_classes = []
364
+ model_classes = [model_classes] if model_classes.class != Array
365
+ for model_class in model_classes
366
+ raise "pass in the model class itself, or a string containing its name" if model_class.class != Class && model_class.class != String
367
+ model_class = model_class.constantize if model_class.class == String
368
+ new_model_classes.push(model_class)
369
+ end
370
+ model_classes = new_model_classes
371
+
372
+ # Set things up
373
+ self.initialize_db
374
+
375
+ # Case of a string, searching for a Google-like syntax query
376
+ self.query_string = query_string
377
+
378
+ # Construct query which only finds things from specified models
379
+ model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
380
+ user_query = ActsAsXapian.query_parser.parse_query(self.query_string,
381
+ Xapian::QueryParser::FLAG_BOOLEAN | Xapian::QueryParser::FLAG_PHRASE |
382
+ Xapian::QueryParser::FLAG_LOVEHATE | Xapian::QueryParser::FLAG_WILDCARD |
383
+ Xapian::QueryParser::FLAG_SPELLING_CORRECTION)
384
+ self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, user_query)
385
+
386
+ # Call base class constructor
387
+ self.initialize_query(options)
388
+ end
389
+
390
+ # Return just the normal words in the query, i.e. not operators, ones in
391
+ # date ranges or similar. Use this for cheap highlighting with
392
+ # TextHelper::highlight, and excerpt.
393
+ def words_to_highlight
394
+ query_nopunc = self.query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
395
+ query_nopunc = query_nopunc.gsub(/\s+/, " ")
396
+ words = query_nopunc.split(" ")
397
+ # Remove anything with a :, . or / in it
398
+ words = words.find_all {|o| !o.match(/(:|\.|\/)/) }
399
+ words = words.find_all {|o| !o.match(/^(AND|NOT|OR|XOR)$/) }
400
+ return words
401
+ end
402
+
403
+ # Text for lines in log file
404
+ def log_description
405
+ "Search: " + self.query_string
406
+ end
407
+
408
+ end
409
+
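The filtering in `words_to_highlight` can be tried outside Rails. The following is a minimal sketch that mirrors the method above as a free function; the name `highlightable_words` and the sample query are assumptions made for illustration.

```ruby
# Standalone sketch of the filtering done by Search#words_to_highlight.
# The function name `highlightable_words` is hypothetical.
def highlightable_words(query_string)
  # Replace everything except letters, digits, ':', '.', '/' and '_' with spaces
  cleaned = query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
  words = cleaned.split(" ")
  # Drop prefixed terms, URLs and decimals (anything containing ':', '.' or '/')
  words = words.reject { |w| w =~ /[:.\/]/ }
  # Drop the boolean operators understood by the query parser
  words.reject { |w| w =~ /\A(AND|NOT|OR|XOR)\z/ }
end

p highlightable_words('"cheese shop" AND open site:example.com')
# => ["cheese", "shop", "open"]
```

Only upper-case operators are removed, so a lower-case "and" typed by the user is kept as an ordinary word to highlight.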
410
+ # Search for models which contain the important terms taken from a specified
411
+ # list of models. i.e. Use to find documents similar to one (or more)
412
+ # documents, or use to refine searches.
413
+ class Similar < QueryBase
414
+ attr_accessor :query_models
415
+ attr_accessor :important_terms
416
+
417
+ # model_classes - model classes to search within, e.g. [PublicBody, User]
418
+ # query_models - list of models you want to find things similar to
419
+ def initialize(model_classes, query_models, options = {})
420
+ self.initialize_db
421
+
422
+ self.runtime += Benchmark::realtime {
423
+ # Case of an array, searching for models similar to those models in the array
424
+ self.query_models = query_models
425
+
426
+ # Find the documents by their unique term
427
+ input_models_query = Xapian::Query.new(Xapian::Query::OP_OR, query_models.map{|m| "I" + m.xapian_document_term})
428
+ ActsAsXapian.enquire.query = input_models_query
429
+ matches = ActsAsXapian.enquire.mset(0, 100, 100) # XXX so this whole method will only work with 100 docs
430
+
431
+ # Get set of relevant terms for those documents
432
+ selection = Xapian::RSet.new()
433
+ iter = matches._begin
434
+ while not iter.equals(matches._end)
435
+ selection.add_document(iter)
436
+ iter.next
437
+ end
438
+
439
+ # Bit weird that the function to make esets is part of the enquire
440
+ # object. This explains what exactly it does, which is to exclude
441
+ # terms in the existing query.
442
+ # http://thread.gmane.org/gmane.comp.search.xapian.general/3673/focus=3681
443
+ eset = ActsAsXapian.enquire.eset(40, selection)
444
+
445
+ # Do main search for them
446
+ self.important_terms = []
447
+ iter = eset._begin
448
+ while not iter.equals(eset._end)
449
+ self.important_terms.push(iter.term)
450
+ iter.next
451
+ end
452
+ similar_query = Xapian::Query.new(Xapian::Query::OP_OR, self.important_terms)
453
+ # Exclude original
454
+ combined_query = Xapian::Query.new(Xapian::Query::OP_AND_NOT, similar_query, input_models_query)
455
+
456
+ # Restrain to model classes
457
+ model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
458
+ self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, combined_query)
459
+ }
460
+
461
+ # Call base class constructor
462
+ self.initialize_query(options)
463
+ end
464
+
465
+ # Text for lines in log file
466
+ def log_description
467
+ "Similar: " + self.query_models.to_s
468
+ end
469
+ end
470
+
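The boolean shape of the query built by `Similar` (OR over the important terms, AND NOT the input documents, AND restricted to the model classes) can be illustrated without Xapian. This is only a toy sketch over an in-memory hash: the index data and the name `similar_documents` are hypothetical, and it ignores the relevance weighting and expand-set selection that Xapian performs.

```ruby
require "set"

# Toy in-memory index: document term => { :model => class name, :terms => Set }.
# All names and data here are hypothetical.
INDEX = {
  "Post-1" => { :model => "Post", :terms => Set["xapian", "search", "ruby"] },
  "Post-2" => { :model => "Post", :terms => Set["xapian", "index"] },
  "User-7" => { :model => "User", :terms => Set["ruby", "search"] },
  "Post-3" => { :model => "Post", :terms => Set["cooking"] },
}

def similar_documents(index, model_names, input_ids, important_terms)
  matches = index.select do |id, doc|
    in_models   = model_names.include?(doc[:model])                     # model_query
    shares_term = important_terms.any? { |t| doc[:terms].include?(t) }  # OR over terms
    is_input    = input_ids.include?(id)                                # input_models_query
    in_models && shares_term && !is_input
  end
  matches.keys
end

p similar_documents(INDEX, ["Post"], ["Post-1"], ["xapian", "search"])
# => ["Post-2"]
```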
471
+ ######################################################################
472
+ # Index
473
+
474
+ # Offline indexing job queue model, create with migration made
475
+ # using "script/generate acts_as_xapian" as described in ../README.txt
476
+ class ActsAsXapianJob < ActiveRecord::Base
477
+ end
478
+
479
+ # Update index with any changes needed, call this offline. Only call it
480
+ # from a script that exits - otherwise Xapian's writable database won't
481
+ # flush your changes. Specifying flush will reduce performance, but
482
+ # make sure that each index update is definitely saved to disk before
483
+ # logging in the database that it has been.
484
+ def ActsAsXapian.update_index(flush = false, verbose = false)
485
+ # Before calling writable_init we have to make sure every model class has been initialized.
486
+ # i.e. has had its class code loaded, so acts_as_xapian has been called inside it, and
487
+ # we have the info from acts_as_xapian.
488
+ model_classes = ActsAsXapianJob.find_by_sql("select model from acts_as_xapian_jobs group by model").map {|a| a.model.constantize}
489
+ # If there are no models in the queue, then nothing to do
490
+ return if model_classes.size == 0
491
+
492
+ ActsAsXapian.writable_init
493
+
494
+ ids_to_refresh = ActsAsXapianJob.find(:all).map() { |i| i.id }
495
+ for id in ids_to_refresh
496
+ begin
497
+ ActiveRecord::Base.transaction do
498
+ job = ActsAsXapianJob.find(id, :lock => true)
499
+ STDOUT.puts("ActsAsXapian.update_index #{job.action} #{job.model} #{job.model_id.to_s}") if verbose
500
+ if job.action == 'update'
501
+ # XXX Index functions may reference other models, so we could eager load here too?
502
+ model = job.model.constantize.find(job.model_id) # :include => cls.constantize.xapian_options[:include]
503
+ model.xapian_index
504
+ elsif job.action == 'destroy'
505
+ # Make dummy model with right id, just for destruction
506
+ model = job.model.constantize.new
507
+ model.id = job.model_id
508
+ model.xapian_destroy
509
+ else
510
+ raise "unknown ActsAsXapianJob action '" + job.action + "'"
511
+ end
512
+ job.destroy
513
+
514
+ if flush
515
+ ActsAsXapian.writable_db.flush
516
+ end
517
+ end
518
+ rescue => detail
519
+ # print any error, and carry on so other things are indexed
520
+ # XXX If item is later deleted, this should give up, and it
521
+ # won't. It will keep trying (assuming update_index called from
522
+ # regular cron job) and mayhap cause trouble.
523
+ STDERR.puts(detail.backtrace.join("\n") + "\nFAILED ActsAsXapian.update_index job #{id} #{$!}")
524
+ end
525
+ end
526
+ end
527
+
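The control flow of `update_index` is a queue drain in which one failing job must not stop the rest. A minimal plain-Ruby sketch of that pattern follows; the `Job` struct, the `handlers` hash and `process_jobs` are hypothetical stand-ins for the ActiveRecord job model and the Xapian calls, and no transaction or row locking is modelled.

```ruby
Job = Struct.new(:id, :action, :model_id)

# Process every queued job; a failing job is reported and skipped so
# that later jobs still run. `handlers` maps an action name to a callable.
def process_jobs(jobs, handlers, err = $stderr)
  done = []
  jobs.each do |job|
    begin
      handler = handlers.fetch(job.action) do
        raise "unknown job action '#{job.action}'"
      end
      handler.call(job.model_id)
      done << job.id   # the real code destroys the job row at this point
    rescue => e
      err.puts("FAILED job #{job.id}: #{e.message}")
    end
  end
  done
end

indexed = []
handlers = {
  "update"  => lambda { |model_id| indexed << model_id },
  "destroy" => lambda { |model_id| indexed.delete(model_id) },
}
jobs = [Job.new(1, "update", 10), Job.new(2, "bogus", 11), Job.new(3, "update", 12)]
p process_jobs(jobs, handlers)
# => [1, 3]
```

As the XXX comment above notes, a job that fails is left in the queue, so a permanently broken job would be retried on every run.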
528
+ # You must specify *all* the models here; this totally rebuilds the Xapian database.
529
+ # You'll want any readers to reopen the database after this.
530
+ def ActsAsXapian.rebuild_index(model_classes, verbose = false)
531
+ raise "when rebuilding all, please call as first and only thing done in process / task" if not ActsAsXapian.writable_db.nil?
532
+
533
+ prepare_environment
534
+
535
+ # Delete any existing .new database, and open a new one
536
+ new_path = ActsAsXapian.db_path + ".new"
537
+ if File.exist?(new_path)
538
+ raise "found existing " + new_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(new_path, "iamflint"))
539
+ FileUtils.rm_r(new_path)
540
+ end
541
+ ActsAsXapian.writable_init(".new")
542
+
543
+ # Index everything
544
+ # XXX not a good place to do this destroy, as unindexed list is lost if
545
+ # process is aborted and old database carries on being used. Perhaps do in
546
+ # transaction and commit after rename below? Not sure if the locking is then bad
547
+ # for live website running at same time.
548
+
549
+ ActsAsXapianJob.destroy_all
550
+ batch_size = 1000
551
+ for model_class in model_classes
552
+ model_class.transaction do
553
+ 0.step(model_class.count, batch_size) do |i|
554
+ STDOUT.puts("ActsAsXapian: New batch. From #{i} to #{i + batch_size}") if verbose
555
+ models = model_class.find(:all, :limit => batch_size, :offset => i, :order => :id)
556
+ for model in models
557
+ STDOUT.puts("ActsAsXapian.rebuild_index #{model_class} #{model.id}") if verbose
558
+ model.xapian_index
559
+ end
560
+ end
561
+ end
562
+ end
563
+
564
+ ActsAsXapian.writable_db.flush
565
+
566
+ # Rename into place
567
+ old_path = ActsAsXapian.db_path
568
+ temp_path = ActsAsXapian.db_path + ".tmp"
569
+ if File.exist?(temp_path)
570
+ raise "temporary database found " + temp_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
571
+ FileUtils.rm_r(temp_path)
572
+ end
573
+ if File.exist?(old_path)
574
+ FileUtils.mv old_path, temp_path
575
+ end
576
+ FileUtils.mv new_path, old_path
577
+
578
+ # Delete old database
579
+ if File.exist?(temp_path)
580
+ raise "old database now at " + temp_path + " is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
581
+ FileUtils.rm_r(temp_path)
582
+ end
583
+
584
+ # You'll want to restart your FastCGI or Mongrel processes after this,
585
+ # so they get the new db
586
+ end
587
+
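The "rename into place" step at the end of `rebuild_index` can be exercised on its own with throwaway directories. The sketch below assumes hypothetical helper names (`assert_marked!`, `swap_into_place`) and only imitates the `iamflint` marker check; it does not touch a real Xapian database.

```ruby
require "fileutils"
require "tmpdir"

# Refuse to delete a directory that does not carry the expected marker file.
def assert_marked!(path, marker)
  return unless File.exist?(path)
  unless File.exist?(File.join(path, marker))
    raise "#{path} lacks #{marker}, refusing to delete"
  end
end

# Replace the directory at live_path with the freshly built live_path + ".new".
def swap_into_place(live_path, marker = "iamflint")
  new_path  = live_path + ".new"
  temp_path = live_path + ".tmp"

  # Clear out a stale temporary copy left by an earlier aborted run
  assert_marked!(temp_path, marker)
  FileUtils.rm_r(temp_path) if File.exist?(temp_path)

  # Move the old database aside, then move the new one into place
  FileUtils.mv(live_path, temp_path) if File.exist?(live_path)
  FileUtils.mv(new_path, live_path)

  # Finally delete the old database
  assert_marked!(temp_path, marker)
  FileUtils.rm_r(temp_path) if File.exist?(temp_path)
end

Dir.mktmpdir do |dir|
  live = File.join(dir, "xapiandb")
  [live, live + ".new"].each do |path|
    FileUtils.mkdir_p(path)
    FileUtils.touch(File.join(path, "iamflint"))
  end
  File.write(File.join(live + ".new", "generation"), "2")
  swap_into_place(live)
  puts File.read(File.join(live, "generation"))   # prints 2
end
```

Between the two `mv` calls there is a short window in which nothing exists at the live path, which is one reason readers should reopen the database after a rebuild.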
588
+ ######################################################################
589
+ # Instance methods that get injected into your model.
590
+
591
+ module InstanceMethods
592
+ # Used internally
593
+ def xapian_document_term
594
+ self.class.to_s + "-" + self.id.to_s
595
+ end
596
+
597
+ # Extract value of a field from the model
598
+ def xapian_value(field, type = nil)
599
+ value = self[field] || self.send(field.to_sym)
600
+ if type == :date
601
+ if value.kind_of?(Time)
602
+ value.utc.strftime("%Y%m%d")
603
+ elsif value.kind_of?(Date)
604
+ value.to_time.utc.strftime("%Y%m%d")
605
+ else
606
+ raise "Only Time or Date types supported by acts_as_xapian for :date fields, got " + value.class.to_s
607
+ end
608
+ elsif type == :boolean
609
+ value ? true : false
610
+ else
611
+ value.to_s
612
+ end
613
+ end
614
+
615
+ # Store record in the Xapian database
616
+ def xapian_index
617
+ # if we have a conditional function for indexing, call it and destroy object if failed
618
+ if self.class.xapian_options.include?(:if)
619
+ if_value = xapian_value(self.class.xapian_options[:if], :boolean)
620
+ if not if_value
621
+ self.xapian_destroy
622
+ return
623
+ end
624
+ end
625
+
626
+ # otherwise (re)write the Xapian record for the object
627
+ doc = Xapian::Document.new
628
+ ActsAsXapian.term_generator.document = doc
629
+
630
+ doc.data = self.xapian_document_term
631
+
632
+ doc.add_term("M" + self.class.to_s)
633
+ doc.add_term("I" + doc.data)
634
+ if self.xapian_options[:terms]
635
+ for term in self.xapian_options[:terms]
636
+ ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
637
+ ActsAsXapian.term_generator.index_text(xapian_value(term[0]), 1, term[1])
638
+ end
639
+ end
640
+ if self.xapian_options[:values]
641
+ for value in self.xapian_options[:values]
642
+ doc.add_value(value[1], xapian_value(value[0], value[3]))
643
+ end
644
+ end
645
+ if self.xapian_options[:texts]
646
+ for text in self.xapian_options[:texts]
647
+ ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
648
+ # XXX the "1" here is a weight that could be varied for a boost function
649
+ ActsAsXapian.term_generator.index_text(xapian_value(text), 1)
650
+ end
651
+ end
652
+
653
+ ActsAsXapian.writable_db.replace_document("I" + doc.data, doc)
654
+ end
655
+
656
+ # Delete record from the Xapian database
657
+ def xapian_destroy
658
+ ActsAsXapian.writable_db.delete_document("I" + self.xapian_document_term)
659
+ end
660
+
661
+ # Used to mark changes needed by batch indexer
662
+ def xapian_mark_needs_index
663
+ model = self.class.base_class.to_s
664
+ model_id = self.id
665
+ ActiveRecord::Base.transaction do
666
+ found = ActsAsXapianJob.delete_all([ "model = ? and model_id = ?", model, model_id])
667
+ job = ActsAsXapianJob.new
668
+ job.model = model
669
+ job.model_id = model_id
670
+ job.action = 'update'
671
+ job.save!
672
+ end
673
+ end
674
+ def xapian_mark_needs_destroy
675
+ model = self.class.base_class.to_s
676
+ model_id = self.id
677
+ ActiveRecord::Base.transaction do
678
+ found = ActsAsXapianJob.delete_all([ "model = ? and model_id = ?", model, model_id])
679
+ job = ActsAsXapianJob.new
680
+ job.model = model
681
+ job.model_id = model_id
682
+ job.action = 'destroy'
683
+ job.save!
684
+ end
685
+ end
686
+ end
687
+
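The type handling in `xapian_value` is easy to check in isolation. This is a minimal sketch of the same branches as a free function; `format_for_xapian` is a hypothetical name, and the value is passed in directly rather than read from a model attribute.

```ruby
require "date"

# Standalone version of the type handling in xapian_value.
def format_for_xapian(value, type = nil)
  case type
  when :date
    case value
    when Time then value.utc.strftime("%Y%m%d")
    when Date then value.to_time.utc.strftime("%Y%m%d")
    else
      raise "Only Time or Date types supported for :date fields, got #{value.class}"
    end
  when :boolean
    value ? true : false
  else
    value.to_s
  end
end

puts format_for_xapian(Time.utc(2008, 12, 9, 15, 30), :date)  # prints 20081209
p format_for_xapian(nil, :boolean)                            # => false
p format_for_xapian(42)                                       # => "42"
```

The `YYYYMMDD` form sorts lexically in date order, which is what makes it usable as a Xapian value for range queries and sorting.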
688
+ module ClassMethods
689
+
690
+ # Model.find_with_xapian("Search Term OR Phrase")
691
+ # => Array of Records
692
+ #
693
+ # this can be used through association proxies /!\ DANGEROUS MAGIC /!\
694
+ # example:
695
+ # @document = Document.find(params[:id])
696
+ # @document_pages = @document.pages.find_with_xapian("Search Term OR Phrase").compact # NOTE THE compact which removes nil objects from the array
697
+ #
698
+ # as seen here: http://pastie.org/270114
699
+ def find_with_xapian(search_term, options = {})
700
+ search_with_xapian(search_term, options).results.collect{|x| x[:model]}
701
+ end
702
+
703
+ def search_with_xapian(search_term, options = {})
704
+ ActsAsXapian::Search.new([self], search_term, options)
705
+ end
706
+
707
+ # This method should return true if the integration of xapian on self is complete
708
+ def xapian?
709
+ self.included_modules.include?(InstanceMethods) && self.extended_by.include?(ClassMethods)
710
+ end
711
+
712
+ end
713
+
714
+ module ProxyFinder
715
+
716
+ def find_with_xapian(search_term, options = {})
717
+ search_with_xapian(search_term, options).results.collect{|x| x[:model]}
718
+ end
719
+
720
+ def search_with_xapian(search_term, options = {})
721
+ ActsAsXapian::Search.new([proxy_reflection.klass], "#{proxy_reflection.primary_key_name}:#{proxy_owner.id} #{search_term}", options)
722
+ end
723
+
724
+ end
725
+
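`ProxyFinder#search_with_xapian` scopes a search to the owning record by prepending a prefixed term to the user's query. A minimal sketch of just that string construction, with a hypothetical helper name and hypothetical key and id values:

```ruby
# Builds the kind of query string that ProxyFinder#search_with_xapian passes on.
# `scoped_query` is a hypothetical helper name.
def scoped_query(foreign_key, owner_id, search_term)
  "#{foreign_key}:#{owner_id} #{search_term}"
end

puts scoped_query("document_id", 42, "budget OR spending")
# prints: document_id:42 budget OR spending
```

This only narrows results if the associated model indexes the foreign key as a term under a prefix of the same name, which the code assumes rather than enforces.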
726
+ ######################################################################
727
+ # Main entry point, add acts_as_xapian to your model.
728
+
729
+ module ActsMethods
730
+ # See top of this file for docs
731
+ def acts_as_xapian(options)
732
+ # Give error only on queries if bindings not available
733
+ return unless ActsAsXapian.bindings_available
734
+
735
+ include InstanceMethods
736
+ extend ClassMethods
737
+
738
+ # extend has_many and has_and_belongs_to_many associations with our ProxyFinder to get scoped results
739
+ # I've written a small report in the discussion group why this is the proper way of doing this.
740
+ # see here: XXX - report not yet written
741
+ self.reflections.each do |association_name, r|
742
+ # skip if the associated model isn't indexed by acts_as_xapian
743
+ next unless r.klass.respond_to?(:xapian?) && r.klass.xapian?
744
+ # skip all associations except has_many and habtm
745
+ next unless [:has_many, :has_and_belongs_to_many].include?(r.macro)
746
+
747
+ # XXX todo:
748
+ # extend the associated model xapian options with this term:
749
+ # [proxy_reflection.primary_key_name.to_sym, <magically find a free capital letter>, proxy_reflection.primary_key_name]
750
+ # otherwise this assumes that the associated & indexed model indexes this kind of term
751
+
752
+ # but before you do the above, rewrite the options syntax... which imho is actually very ugly
753
+
754
+ # XXX test this nifty feature on habtm!
755
+
756
+ if r.options[:extend].nil?
757
+ r.options[:extend] = [ProxyFinder]
758
+ elsif not r.options[:extend].include?(ProxyFinder)
759
+ r.options[:extend] << ProxyFinder
760
+ end
761
+ end
762
+
763
+ cattr_accessor :xapian_options
764
+ self.xapian_options = options
765
+
766
+ ActsAsXapian.init(self.class.to_s, options)
767
+
768
+ after_save :xapian_mark_needs_index
769
+ after_destroy :xapian_mark_needs_destroy
770
+ end
771
+ end
772
+
773
+ end
774
+
775
+ # Reopen ActiveRecord and include the acts_as_xapian method
776
+ ActiveRecord::Base.extend ActsAsXapian::ActsMethods
777
+
778
+
@@ -0,0 +1,43 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+ require 'rake/testtask'
4
+ require 'activerecord'
5
+ require 'acts_as_xapian'
6
+
7
+ namespace :xapian do
8
+ # Parameters - specify "flush=true" to save changes to the Xapian database
9
+ # after each model that is updated. This is safer, but slower. Specify
10
+ # "verbose=true" to print model name as it is run.
11
+ desc 'Updates Xapian search index with changes to models since last call'
12
+ task(:update_index => :environment) do
13
+ ActsAsXapian.update_index(ENV['flush'] ? true : false, ENV['verbose'] ? true : false)
14
+ end
15
+
16
+ # Parameters - specify 'models="PublicBody User"' to say which models
17
+ # you index with Xapian.
18
+ # This totally rebuilds the database, so you will want to restart any
19
+ # web server afterwards to make sure it gets the changes, rather than
20
+ # still pointing to the old deleted database. Specify "verbose=true" to
21
+ # print model name as it is run.
22
+ desc 'Completely rebuilds Xapian search index (must specify all models)'
23
+ task(:rebuild_index => :environment) do
24
+ raise "specify ALL your models with models=\"ModelName1 ModelName2\" as parameter" if ENV['models'].nil?
25
+ ActsAsXapian.rebuild_index(ENV['models'].split(" ").map{|m| m.constantize}, ENV['verbose'] ? true : false)
26
+ end
27
+
28
+ # Parameters - are models, query, offset, limit, sort_by_prefix,
29
+ # collapse_by_prefix
30
+ desc 'Run a query, return YAML of results'
31
+ task(:query => :environment) do
32
+ raise "specify models=\"ModelName1 ModelName2\" as parameter" if ENV['models'].nil?
33
+ raise "specify query=\"your terms\" as parameter" if ENV['query'].nil?
34
+ s = ActsAsXapian::Search.new(ENV['models'].split(" ").map{|m| m.constantize},
35
+ ENV['query'],
36
+ :offset => (ENV['offset'] || 0), :limit => (ENV['limit'] || 10),
37
+ :sort_by_prefix => (ENV['sort_by_prefix'] || nil),
38
+ :collapse_by_prefix => (ENV['collapse_by_prefix'] || nil)
39
+ )
40
+ STDOUT.puts(s.results.to_yaml)
41
+ end
42
+ end
43
+
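The rake tasks read their parameters from `ENV`, where every value is a string. The sketch below shows how the `ENV[...] ? true : false` test and the model list parsing behave; it takes a plain hash instead of `ENV`, and uses `Object.const_get` as a plain-Ruby stand-in for ActiveSupport's `constantize`. The helper names are hypothetical.

```ruby
# The tasks treat a parameter as "on" whenever it is set at all.
def env_flag(env, name)
  env[name] ? true : false
end

# Resolve space-separated class names into classes.
def parse_models(env)
  raise 'specify models="ModelName1 ModelName2" as parameter' if env["models"].nil?
  env["models"].split(" ").map { |name| Object.const_get(name) }
end

p env_flag({ "flush" => "true" }, "flush")      # => true
p env_flag({ "flush" => "false" }, "flush")     # => true
p env_flag({}, "flush")                         # => false
p parse_models({ "models" => "String Array" })  # => [String, Array]
```

Because any non-nil string is truthy in Ruby, `flush=false` on the command line still enables flushing; leave the parameter out to disable it.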
metadata ADDED
@@ -0,0 +1,74 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: wbzyl-acts_as_xapian
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.2
5
+ platform: ruby
6
+ authors:
7
+ - "Lukas Rieder (original author: Francis Irving)"
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2008-12-09 00:00:00 -08:00
13
+ default_executable:
14
+ dependencies: []
15
+
16
+ description: Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.
17
+ email: l.rieder@gmail.com
18
+ executables: []
19
+
20
+ extensions: []
21
+
22
+ extra_rdoc_files:
23
+ - lib/acts_as_xapian.rb
24
+ - README.textile
25
+ - LICENSE.txt
26
+ - tasks/xapian.rake
27
+ - CHANGELOG
28
+ files:
29
+ - Rakefile
30
+ - acts_as_xapian.gemspec
31
+ - lib/acts_as_xapian.rb
32
+ - init.rb
33
+ - Manifest
34
+ - generators/acts_as_xapian/templates/migration.rb
35
+ - generators/acts_as_xapian/templates/xapian.yml
36
+ - generators/acts_as_xapian/USAGE
37
+ - generators/acts_as_xapian/acts_as_xapian_generator.rb
38
+ - README.textile
39
+ - LICENSE.txt
40
+ - tasks/xapian.rake
41
+ - CHANGELOG
42
+ has_rdoc: true
43
+ homepage: http://github.com/Overbryd/acts_as_xapian
44
+ post_install_message:
45
+ rdoc_options:
46
+ - --line-numbers
47
+ - --inline-source
48
+ - --title
49
+ - Acts_as_xapian
50
+ - --main
51
+ - README.textile
52
+ require_paths:
53
+ - lib
54
+ required_ruby_version: !ruby/object:Gem::Requirement
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ version: "0"
59
+ version:
60
+ required_rubygems_version: !ruby/object:Gem::Requirement
61
+ requirements:
62
+ - - ">="
63
+ - !ruby/object:Gem::Version
64
+ version: "1.2"
65
+ version:
66
+ requirements: []
67
+
68
+ rubyforge_project: acts_as_xapian
69
+ rubygems_version: 1.2.0
70
+ signing_key:
71
+ specification_version: 2
72
+ summary: Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.
73
+ test_files: []
74
+