wbzyl-acts_as_xapian 0.0.2
- data/CHANGELOG +2 -0
- data/LICENSE.txt +21 -0
- data/README.textile +267 -0
- data/Rakefile +14 -0
- data/acts_as_xapian.gemspec +31 -0
- data/generators/acts_as_xapian/USAGE +1 -0
- data/generators/acts_as_xapian/acts_as_xapian_generator.rb +17 -0
- data/generators/acts_as_xapian/templates/migration.rb +14 -0
- data/generators/acts_as_xapian/templates/xapian.yml +10 -0
- data/init.rb +9 -0
- data/lib/acts_as_xapian.rb +778 -0
- data/tasks/xapian.rake +43 -0
- metadata +74 -0
data/CHANGELOG
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
acts_as_xapian is released under the MIT License.

Copyright (c) 2008 UK Citizens Online Democracy.

Permission is hereby granted, free of charge, to any person obtaining a copy
of the acts_as_xapian software and associated documentation files (the
"Software"), to deal in the Software without restriction, including without
limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.textile
ADDED
@@ -0,0 +1,267 @@
Do patch this file if there is documentation missing / wrong. It's called
README.txt and is in git, using Textile formatting. The wiki page is just
copied from the README.txt file.

Contents
========

* a. Introduction to acts_as_xapian
* b. Installation
* c. Comparison to acts_as_solr (as on 24 April 2008)
* d. Documentation - indexing
* e. Documentation - querying
* f. Configuration
* g. Performance
* h. Support


a. Introduction to acts_as_xapian
=================================

"Xapian":http://www.xapian.org is a full text search engine library which has
Ruby bindings. acts_as_xapian adds support for it to Rails. It is an
alternative to acts_as_solr, acts_as_ferret, Ultrasphinx, acts_as_indexed,
acts_as_searchable or acts_as_tsearch.

acts_as_xapian is deployed in production on these websites.
* "WhatDoTheyKnow":http://www.whatdotheyknow.com
* "MindBites":http://www.mindbites.com

The section "c. Comparison to acts_as_solr" below will give you an idea of
acts_as_xapian's features.

acts_as_xapian was started by Francis Irving in May 2008 for search and email
alerts in WhatDoTheyKnow, and so was supported by "mySociety":http://www.mysociety.org
and initially paid for by the "JRSST Charitable Trust":http://www.jrrt.org.uk/jrsstct.htm


b. Installation
===============

Retrieve the plugin directly from the git version control system by running
this command within your Rails app.

    git clone git://github.com/frabcus/acts_as_xapian.git vendor/plugins/acts_as_xapian

Xapian 1.0.5 and associated Ruby bindings are also required.

Debian or Ubuntu - install the packages libxapian15 and libxapian-ruby1.8.

Mac OS X - follow the instructions for installing from source on
the "Installing Xapian":http://xapian.org/docs/install.html page - you need the
Xapian library and bindings (you don't need Omega).

There is no Ruby Gem for Xapian; it would be great if you could make one!


c. Comparison to acts_as_solr (as on 24 April 2008)
=============================

* Offline indexing only mode - which is a minus if you want changes
  immediately reflected in the search index, and a plus if you were going to
  have to implement your own offline indexing anyway.

* Collapsing - the equivalent of SQL's "group by". You can specify a field
  to collapse on, and only the most relevant result from each value of that
  field is returned, along with a count of how many there are in total.
  acts_as_solr doesn't have this.

* No highlighting - Xapian can't return you text highlighted with a search
  query. You can try and make do with TextHelper::highlight (combined with
  words_to_highlight below). I found the highlighting in acts_as_solr didn't
  really understand the query anyway.

* Date range searching - this exists in acts_as_solr, but I found it
  wasn't documented well enough, and was hard to get working.

* Spelling correction - "did you mean?" built in and just works.

* Similar documents - acts_as_xapian has a simple command to find other models
  that are like a specified model.

* Multiple models - acts_as_xapian searches multiple types of model if you
  like, returning them mixed up together by relevancy. This is like
  multi_solr_search, only it is the default mode of operation and is properly
  supported.

* No daemons - However, if you have more than one web server, you'll need to
  work out how to use "Xapian's remote backend":http://xapian.org/docs/remote.html.

* One layer - full-powered Xapian is called directly from the Ruby, without
  Solr getting in the way whenever you want to use a new feature from Lucene.

* No Java - an advantage if you're more used to working in the rest of the
  open source world. acts_as_xapian is pure Ruby and C++.

* Xapian's awesome email list - the kids over at
  "xapian-discuss":http://lists.xapian.org/mailman/listinfo/xapian-discuss
  are super helpful. Useful if you need to extend and improve acts_as_xapian. The
  Ruby bindings are mature and well maintained as part of Xapian.


d. Documentation - indexing
===========================

Xapian is an *offline indexing* search library - only one process can have the
Xapian database open for writing at once, and others that try meanwhile are
unceremoniously kicked out. For this reason, acts_as_xapian does not support
immediate writing to the database when your models change.

Instead, there is an ActsAsXapianJob model which stores which models need
updating or deleting in the search index. A rake task 'xapian:update_index'
then performs the updates queued since it last ran. You can run it on a cron
job, or similar.

Here's how to add indexing to your Rails app:

1. Put acts_as_xapian in your models that need search indexing. e.g.

    acts_as_xapian :texts => [ :name, :short_name ],
        :values => [ [ :created_at, 0, "created_at", :date ] ],
        :terms => [ [ :variety, 'V', "variety" ] ]

Options must include:

* :texts, an array of fields for indexing with full text search.
  e.g. :texts => [ :title, :body ]

* :values, things which have a range of values for sorting, or for collapsing.
  Specify an array quadruple of [ field, identifier, prefix, type ] where
** identifier is an arbitrary numeric identifier for use in the Xapian database
** prefix is the part to use in search queries that goes before the :
** type can be any of :string, :number or :date

  e.g. :values => [ [ :created_at, 0, "created_at", :date ],
                    [ :size, 1, "size", :string ] ]

* :terms, things which come with a prefix (before a :) in search queries.
  Specify an array triple of [ field, char, prefix ] where
** char is an arbitrary single upper case char used in the Xapian database, just
   pick any single uppercase character, but use a different one for each prefix.
** prefix is the part to use in search queries that goes before the :
   For example, if you were making Google and indexing to be able to later do a
   query like "site:www.whatdotheyknow.com", then the prefix would be "site".

  e.g. :terms => [ [ :variety, 'V', "variety" ] ]

A 'field' is a symbol referring to either an attribute or a function which
returns the text, date or number to index. Both 'identifier' and 'char' must be
the same for the same prefix in different models.

Options may include:
* :eager_load, added as an :include clause when looking up search results in
  database
* :if, either an attribute or a function which, if it returns false, means the
  object isn't indexed
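
The option rules above can be checked before a model is ever indexed. The following is only a sketch: `check_xapian_options`, `RESERVED_CHARS` and `RESERVED_PREFIXES` are hypothetical names, not part of acts_as_xapian. The checks themselves follow what the README and lib/acts_as_xapian.rb describe (integer value identifiers, single capital term chars, M, I and Z reserved, and the model/modelid prefixes reserved). It treats :values and :terms as optional, which is an assumption based on how the library code tests for them.

```ruby
# Hypothetical helper, not part of acts_as_xapian: returns a list of problems
# found in an options hash of the shape passed to acts_as_xapian.
RESERVED_CHARS = %w[M I Z].freeze            # model, model id, stemmed terms
RESERVED_PREFIXES = %w[model modelid].freeze
VALUE_TYPES = [:string, :number, :date].freeze

def check_xapian_options(options)
  errors = []
  errors << ":texts must be an array of fields" unless options[:texts].is_a?(Array)

  (options[:values] || []).each do |field, identifier, _prefix, type|
    errors << "value identifier for #{field} must be an integer" unless identifier.is_a?(Integer)
    errors << "unknown value type #{type.inspect} for #{field}" unless VALUE_TYPES.include?(type)
  end

  (options[:terms] || []).each do |field, char, prefix|
    unless char.is_a?(String) && char.match?(/\A[A-Z]\z/)
      errors << "term char for #{field} must be a single capital letter"
    end
    errors << "term char #{char} is reserved" if RESERVED_CHARS.include?(char)
    errors << "prefix #{prefix} is reserved" if RESERVED_PREFIXES.include?(prefix)
  end

  errors
end

options = {
  :texts  => [:name, :short_name],
  :values => [[:created_at, 0, "created_at", :date]],
  :terms  => [[:variety, 'V', "variety"]]
}
puts check_xapian_options(options).inspect
```

An empty list means the options match the documented shape; the plugin itself raises on the same problems when the query parser is initialised.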

2. Generate a database migration to create the ActsAsXapianJob model:

    script/generate acts_as_xapian
    rake db:migrate

3. Call 'rake xapian:rebuild_index models="ModelName1 ModelName2"' to build the index
the first time (you must specify all your indexed models). It's put in a
development/test/production dir in acts_as_xapian/xapiandbs. See f. Configuration
below if you want to change this.

4. Then call 'rake xapian:update_index' regularly, from a cron job or a daemon, or by hand.


e. Documentation - querying
===========================

Testing indexing
----------------

If you just want to test indexing is working, you'll find this rake task
useful (it has more options, see tasks/xapian.rake)

    rake xapian:query models="PublicBody User" query="moo"

Performing a query
------------------

To perform a query from code call ActsAsXapian::Search.new. This takes in turn:
* model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
* query_string - Google like syntax, see below

And then a hash of options:
* :offset - Offset of first result (default 0)
* :limit - Number of results per page
* :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
* :sort_by_ascending - Default true (documents with higher values better/earlier), set to false for descending sort
* :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)

Google like query syntax is as described in
"Xapian::QueryParser Syntax":http://www.xapian.org/docs/queryparser.html
Queries can include prefix:value parts, according to what you indexed in the
acts_as_xapian part above. You can also say things like model:InfoRequestEvent
to constrain by model in more complex ways than the model_classes parameter, or
modelid:InfoRequestEvent-100 to only find one specific object.
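
As a rough illustration of the prefix:value syntax, a query string can be assembled from free text plus a set of filters. `build_query_string` is a hypothetical helper, not something the plugin provides, and the "variety" prefix only works if a model declares it under :terms.

```ruby
# Hypothetical helper: append prefix:value filters to free-text user input.
def build_query_string(text, filters = {})
  parts = [text.to_s.strip]
  filters.each { |prefix, value| parts << "#{prefix}:#{value}" }
  parts.reject(&:empty?).join(" ")
end

query = build_query_string("moo", "variety" => "sent", "model" => "InfoRequestEvent")
puts query
```

The resulting string is what would be passed as the query_string argument of ActsAsXapian::Search.new.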

Returns an ActsAsXapian::Search object. Useful methods are:
* description - a techy one, to check how the query has been parsed
* matches_estimated - a guesstimate at the total number of hits
* spelling_correction - the corrected query string if there is a correction, otherwise nil
* words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight
* results - an array of hashes each containing:
** :model - your Rails model, this is what you most want!
** :weight - relevancy measure
** :percent - the weight as a %, 0 meaning the item did not match the query at all
** :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix
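
A minimal sketch of paging through and displaying results, assuming the hash shape listed above. It runs without Xapian: the `results` array is hand-built stand-in data, and `FakeBody`, `search_options_for_page` and `result_line` are hypothetical names. In a real app the array would come from something like `ActsAsXapian::Search.new([PublicBody], "moo", search_options_for_page(1, 25)).results`.

```ruby
# Stand-in for a real model instance; in an app this would be the
# ActiveRecord object returned under the :model key.
FakeBody = Struct.new(:name)

# Hypothetical helper: turn a 1-based page number into :offset and :limit.
def search_options_for_page(page, per_page)
  { :offset => (page - 1) * per_page, :limit => per_page }
end

# Hypothetical helper: format one result hash for display.
def result_line(result)
  line = "#{result[:percent]}% #{result[:model].name}"
  line += " (+#{result[:collapse_count]} more)" if result[:collapse_count].to_i > 0
  line
end

# Hand-built data in the shape the README describes for `results`.
results = [
  { :model => FakeBody.new("Cabinet Office"), :weight => 2.5, :percent => 100, :collapse_count => 0 },
  { :model => FakeBody.new("Home Office"),    :weight => 1.2, :percent => 48,  :collapse_count => 3 }
]

lines = results.map { |r| result_line(r) }
puts lines
```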

Finding similar models
----------------------

To find models that are similar to a given set of models call ActsAsXapian::Similar.new. This takes:
* model_classes - list of model classes to return models from within
* models - list of models that you want to find related ones to

Returns an ActsAsXapian::Similar object. Has all methods from ActsAsXapian::Search above, except
for words_to_highlight. In addition has:
* important_terms - the terms extracted from the input models, that were used to search for output
Use the results method to get the similar models.


f. Configuration
================

If you want to customise the configuration of acts_as_xapian, it will look for
a file called 'xapian.yml' under RAILS_ROOT/config. As is familiar from the
format of the database.yml file, separate :development, :test and :production
sections are expected.

The following options are available:
* base_db_path - specifies the directory, relative to RAILS_ROOT, in which
  acts_as_xapian stores its search index databases. Default is the directory
  xapiandbs within the acts_as_xapian directory.
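
The lookup can be sketched in plain Ruby. This is an illustration under assumptions, not the plugin's code: `xapian_db_path` is a hypothetical helper, the paths in the sample YAML are made up, and the sections are written with plain string keys because the library looks the section up by the environment name. As in lib/acts_as_xapian.rb, the per-environment database ends up in a directory named after the environment.

```ruby
require 'yaml'

# Example contents for config/xapian.yml; the path values are made up.
SAMPLE_XAPIAN_YML = <<~CONFIG
  development:
    base_db_path: tmp/xapian
  test:
    base_db_path: tmp/xapian
  production:
    base_db_path: ../shared/xapiandbs
CONFIG

# Hypothetical helper mirroring the lookup described above: use base_db_path
# (relative to the Rails root) when set, otherwise fall back to a xapiandbs
# directory inside the plugin, then add a subdirectory per environment.
def xapian_db_path(yaml_text, rails_root, environment, plugin_dir)
  config = (YAML.load(yaml_text) || {})[environment] || {}
  parent = if config['base_db_path']
    File.join(rails_root, config['base_db_path'])
  else
    File.join(plugin_dir, 'xapiandbs')
  end
  File.join(parent, environment)
end

puts xapian_db_path(SAMPLE_XAPIAN_YML, "/srv/app", "development",
                    "/srv/app/vendor/plugins/acts_as_xapian")
```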


g. Performance
==============

On development sites, acts_as_xapian automatically logs the time taken to do
searches. The time displayed is for the Xapian parts of the query; the Rails
database model lookups will be logged separately by ActiveRecord. Example:

    Xapian query (0.00029s) Search: hello

To enable this, and other performance logging, on a production site,
temporarily add this to the end of your config/environment.rb

    ActiveRecord::Base.logger = Logger.new(STDOUT)


h. Support
==========

Please ask any questions on the
"acts_as_xapian Google Group":http://groups.google.com/group/acts_as_xapian

The official home page and repository for acts_as_xapian are the
"acts_as_xapian github page":http://github.com/frabcus/acts_as_xapian/wikis

For more details about anything, see source code in lib/acts_as_xapian.rb

Merging source instructions "Using git for collaboration" here:
http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
data/Rakefile
ADDED
@@ -0,0 +1,14 @@
require 'rake'
require 'rubygems'
require 'echoe'

Echoe.new('acts_as_xapian', '0.0.2') do |p|
  p.description = "Acts_as_xapian is a full text search gem/plugin for Ruby on Rails."
  p.url = "http://github.com/Overbryd/acts_as_xapian"
  p.author = "Lukas Rieder (original author: Francis Irving)"
  p.email = "l.rieder@gmail.com"
  p.ignore_pattern = ["tmp/*", "script/*"]
  p.development_dependencies = []
end

Dir["#{File.dirname(__FILE__)}/tasks/*.rake"].sort.each { |ext| load ext }
data/acts_as_xapian.gemspec
ADDED
@@ -0,0 +1,31 @@
# -*- encoding: utf-8 -*-

Gem::Specification.new do |s|
  s.name = %q{acts_as_xapian}
  s.version = "0.0.2"

  s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
  s.authors = ["Lukas Rieder (original author: Francis Irving)"]
  s.date = %q{2008-12-09}
  s.description = %q{Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.}
  s.email = %q{l.rieder@gmail.com}
  s.extra_rdoc_files = ["lib/acts_as_xapian.rb", "README.textile", "LICENSE.txt", "tasks/xapian.rake", "CHANGELOG"]
  s.files = ["Rakefile", "acts_as_xapian.gemspec", "lib/acts_as_xapian.rb", "init.rb", "Manifest", "generators/acts_as_xapian/templates/migration.rb", "generators/acts_as_xapian/templates/xapian.yml", "generators/acts_as_xapian/USAGE", "generators/acts_as_xapian/acts_as_xapian_generator.rb", "README.textile", "LICENSE.txt", "tasks/xapian.rake", "CHANGELOG"]
  s.has_rdoc = true
  s.homepage = %q{http://github.com/Overbryd/acts_as_xapian}
  s.rdoc_options = ["--line-numbers", "--inline-source", "--title", "Acts_as_xapian", "--main", "README.textile"]
  s.require_paths = ["lib"]
  s.rubyforge_project = %q{acts_as_xapian}
  s.rubygems_version = %q{1.3.1}
  s.summary = %q{Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.}

  if s.respond_to? :specification_version then
    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
    s.specification_version = 2

    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
    else
    end
  else
  end
end
data/generators/acts_as_xapian/USAGE
ADDED
@@ -0,0 +1 @@
./script/generate acts_as_xapian
data/generators/acts_as_xapian/acts_as_xapian_generator.rb
ADDED
@@ -0,0 +1,17 @@
class ActsAsXapianGenerator < Rails::Generator::Base

  def manifest
    record do |m|
      m.migration_template 'migration.rb', 'db/migrate',
        :migration_file_name => "create_acts_as_xapian"
      m.file 'xapian.yml', 'config/xapian.yml'
    end
  end

  protected

  def banner
    "Usage: #{$0} acts_as_xapian"
  end

end
data/generators/acts_as_xapian/templates/migration.rb
ADDED
@@ -0,0 +1,14 @@
class CreateActsAsXapian < ActiveRecord::Migration
  def self.up
    create_table :acts_as_xapian_jobs do |t|
      t.column :model, :string, :null => false
      t.column :model_id, :integer, :null => false
      t.column :action, :string, :null => false
    end
    add_index :acts_as_xapian_jobs, [:model, :model_id], :unique => true
  end
  def self.down
    drop_table :acts_as_xapian_jobs
  end
end

data/init.rb
ADDED
@@ -0,0 +1,778 @@
|
|
1
|
+
# acts_as_xapian/lib/acts_as_xapian.rb:
|
2
|
+
# Xapian full text search in Ruby on Rails.
|
3
|
+
#
|
4
|
+
# Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
|
5
|
+
# Email: francis@mysociety.org; WWW: http://www.mysociety.org/
|
6
|
+
#
|
7
|
+
# Documentation
|
8
|
+
# =============
|
9
|
+
#
|
10
|
+
# See ../README.txt foocumentation. Please update that file if you edit
|
11
|
+
# code.
|
12
|
+
|
13
|
+
# Make it so if Xapian isn't installed, the Rails app doesn't fail completely,
|
14
|
+
# just when somebody does a search.
|
15
|
+
begin
|
16
|
+
require 'xapian'
|
17
|
+
$acts_as_xapian_bindings_available = true
|
18
|
+
rescue LoadError
|
19
|
+
STDERR.puts "acts_as_xapian: No Ruby bindings for Xapian installed"
|
20
|
+
$acts_as_xapian_bindings_available = false
|
21
|
+
end
|
22
|
+
|
23
|
+
module ActsAsXapian
|
24
|
+
######################################################################
|
25
|
+
# Module level variables
|
26
|
+
# XXX must be some kind of cattr_accessor that can do this better
|
27
|
+
def ActsAsXapian.bindings_available
|
28
|
+
$acts_as_xapian_bindings_available
|
29
|
+
end
|
30
|
+
class NoXapianRubyBindingsError < StandardError
|
31
|
+
end
|
32
|
+
|
33
|
+
# XXX global class intializers here get loaded more than once, don't know why. Protect them.
|
34
|
+
if not $acts_as_xapian_class_var_init
|
35
|
+
@@db = nil
|
36
|
+
@@db_path = nil
|
37
|
+
@@writable_db = nil
|
38
|
+
@@writable_suffix = nil
|
39
|
+
@@init_values = []
|
40
|
+
$acts_as_xapian_class_var_init = true
|
41
|
+
end
|
42
|
+
def ActsAsXapian.db
|
43
|
+
@@db
|
44
|
+
end
|
45
|
+
def ActsAsXapian.db_path
|
46
|
+
@@db_path
|
47
|
+
end
|
48
|
+
def ActsAsXapian.writable_db
|
49
|
+
@@writable_db
|
50
|
+
end
|
51
|
+
def ActsAsXapian.stemmer
|
52
|
+
@@stemmer
|
53
|
+
end
|
54
|
+
def ActsAsXapian.term_generator
|
55
|
+
@@term_generator
|
56
|
+
end
|
57
|
+
def ActsAsXapian.enquire
|
58
|
+
@@enquire
|
59
|
+
end
|
60
|
+
def ActsAsXapian.query_parser
|
61
|
+
@@query_parser
|
62
|
+
end
|
63
|
+
def ActsAsXapian.values_by_prefix
|
64
|
+
@@values_by_prefix
|
65
|
+
end
|
66
|
+
def ActsAsXapian.config
|
67
|
+
@@config
|
68
|
+
end
|
69
|
+
|
70
|
+
######################################################################
|
71
|
+
# Initialisation
|
72
|
+
def ActsAsXapian.init(classname = nil, options = nil)
|
73
|
+
if not classname.nil?
|
74
|
+
# store class and options for use later, when we open the db in readable_init
|
75
|
+
@@init_values.push([classname,options])
|
76
|
+
end
|
77
|
+
end
|
78
|
+
|
79
|
+
# Reads the config file (if any) and sets up the path to the database we'll be using
|
80
|
+
def ActsAsXapian.prepare_environment
|
81
|
+
return unless @@db_path.nil?
|
82
|
+
|
83
|
+
# barf if we can't figure out the environment
|
84
|
+
environment = (ENV['RAILS_ENV'] or RAILS_ENV)
|
85
|
+
raise "Set RAILS_ENV, so acts_as_xapian can find the right Xapian database" if not environment
|
86
|
+
|
87
|
+
# check for a config file
|
88
|
+
config_file = RAILS_ROOT + "/config/xapian.yml"
|
89
|
+
@@config = File.exists?(config_file) ? YAML.load_file(config_file)[environment] : {}
|
90
|
+
|
91
|
+
# figure out where the DBs should go
|
92
|
+
if config['base_db_path']
|
93
|
+
db_parent_path = RAILS_ROOT + "/" + config['base_db_path']
|
94
|
+
else
|
95
|
+
db_parent_path = File.join(File.dirname(__FILE__), '../xapiandbs/')
|
96
|
+
end
|
97
|
+
|
98
|
+
# make the directory for the xapian databases to go in
|
99
|
+
Dir.mkdir(db_parent_path) unless File.exists?(db_parent_path)
|
100
|
+
|
101
|
+
@@db_path = File.join(db_parent_path, environment)
|
102
|
+
|
103
|
+
# make some things that don't depend on the db
|
104
|
+
# XXX this gets made once for each acts_as_xapian. Oh well.
|
105
|
+
@@stemmer = Xapian::Stem.new('english')
|
106
|
+
end
|
107
|
+
|
108
|
+
# Opens / reopens the db for reading
|
109
|
+
# XXX we perhaps don't need to rebuild database and enquire and queryparser -
|
110
|
+
# but db.reopen wasn't enough by itself, so just do everything it's easier.
|
111
|
+
def ActsAsXapian.readable_init
|
112
|
+
raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
|
113
|
+
raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
|
114
|
+
|
115
|
+
# if DB is not nil, then we're already initialised, so don't do it again
|
116
|
+
# XXX we need to reopen the database each time, so Xapian gets changes to it.
|
117
|
+
# Hopefully in later version of Xapian it will autodetect this, and this can
|
118
|
+
# be commented back in again.
|
119
|
+
# return unless @@db.nil?
|
120
|
+
|
121
|
+
prepare_environment
|
122
|
+
|
123
|
+
# basic Xapian objects
|
124
|
+
begin
|
125
|
+
@@db = Xapian::Database.new(@@db_path)
|
126
|
+
@@enquire = Xapian::Enquire.new(@@db)
|
127
|
+
rescue IOError
|
128
|
+
raise "Xapian database not opened; have you built it with scripts/rebuild-xapian-index ?"
|
129
|
+
end
|
130
|
+
|
131
|
+
init_query_parser
|
132
|
+
end
|
133
|
+
|
134
|
+
# Make a new query parser
|
135
|
+
def ActsAsXapian.init_query_parser
|
136
|
+
# for queries
|
137
|
+
@@query_parser = Xapian::QueryParser.new
|
138
|
+
@@query_parser.stemmer = @@stemmer
|
139
|
+
@@query_parser.stemming_strategy = Xapian::QueryParser::STEM_SOME
|
140
|
+
@@query_parser.database = @@db
|
141
|
+
@@query_parser.default_op = Xapian::Query::OP_AND
|
142
|
+
|
143
|
+
@@terms_by_capital = {}
|
144
|
+
@@values_by_number = {}
|
145
|
+
@@values_by_prefix = {}
|
146
|
+
@@value_ranges_store = []
|
147
|
+
|
148
|
+
for init_value_pair in @@init_values
|
149
|
+
classname = init_value_pair[0]
|
150
|
+
options = init_value_pair[1]
|
151
|
+
|
152
|
+
# go through the various field types, and tell query parser about them,
|
153
|
+
# and error check them - i.e. check for consistency between models
|
154
|
+
@@query_parser.add_boolean_prefix("model", "M")
|
155
|
+
@@query_parser.add_boolean_prefix("modelid", "I")
|
156
|
+
if options[:terms]
|
157
|
+
for term in options[:terms]
|
158
|
+
raise "Use a single capital letter for term code" if not term[1].match(/^[A-Z]$/)
|
159
|
+
raise "M and I are reserved for use as the model/id term" if term[1] == "M" or term[1] == "I"
|
160
|
+
raise "model and modelid are reserved for use as the model/id prefixes" if term[2] == "model" or term[2] == "modelid"
|
161
|
+
raise "Z is reserved for stemming terms" if term[1] == "Z"
|
162
|
+
raise "Already have code '" + term[1] + "' in another model but with different prefix '" + @@terms_by_capital[term[1]] + "'" if @@terms_by_capital.include?(term[1]) && @@terms_by_capital[term[1]] != term[2]
|
163
|
+
@@terms_by_capital[term[1]] = term[2]
|
164
|
+
@@query_parser.add_prefix(term[2], term[1])
|
165
|
+
end
|
166
|
+
end
|
167
|
+
if options[:values]
|
168
|
+
for value in options[:values]
|
169
|
+
raise "Value index '"+value[1].to_s+"' must be an integer, is " + value[1].class.to_s if value[1].class != 1.class
|
170
|
+
raise "Already have value index '" + value[1].to_s + "' in another model but with different prefix '" + @@values_by_number[value[1]].to_s + "'" if @@values_by_number.include?(value[1]) && @@values_by_number[value[1]] != value[2]
|
171
|
+
|
172
|
+
# date types are special, mark them so the first model they're seen for
|
173
|
+
if !@@values_by_number.include?(value[1])
|
174
|
+
if value[3] == :date
|
175
|
+
value_range = Xapian::DateValueRangeProcessor.new(value[1])
|
176
|
+
elsif value[3] == :string
|
177
|
+
value_range = Xapian::StringValueRangeProcessor.new(value[1])
|
178
|
+
elsif value[3] == :number
|
179
|
+
value_range = Xapian::NumberValueRangeProcessor.new(value[1])
|
180
|
+
else
|
181
|
+
raise "Unknown value type '" + value[3].to_s + "'"
|
182
|
+
end
|
183
|
+
|
184
|
+
@@query_parser.add_valuerangeprocessor(value_range)
|
185
|
+
|
186
|
+
# stop it being garbage collected, as
|
187
|
+
# add_valuerangeprocessor ref is outside Ruby's GC
|
188
|
+
@@value_ranges_store.push(value_range)
|
189
|
+
end
|
190
|
+
|
191
|
+
@@values_by_number[value[1]] = value[2]
|
192
|
+
@@values_by_prefix[value[2]] = value[1]
|
193
|
+
end
|
194
|
+
end
|
195
|
+
end
|
196
|
+
end
|
197
|
+
|
198
|
+
def ActsAsXapian.writable_init(suffix = "")
|
199
|
+
raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available
|
200
|
+
raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty?
|
201
|
+
|
202
|
+
# if DB is not nil, then we're already initialised, so don't do it again
|
203
|
+
return unless @@writable_db.nil?
|
204
|
+
|
205
|
+
prepare_environment
|
206
|
+
|
207
|
+
new_path = @@db_path + suffix
|
208
|
+
raise "writable_suffix/suffix inconsistency" if @@writable_suffix && @@writable_suffix != suffix
|
209
|
+
if @@writable_db.nil?
|
210
|
+
# for indexing
|
211
|
+
@@writable_db = Xapian::WritableDatabase.new(new_path, Xapian::DB_CREATE_OR_OPEN)
|
212
|
+
@@term_generator = Xapian::TermGenerator.new()
|
213
|
+
@@term_generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING, 0)
|
214
|
+
@@term_generator.database = @@writable_db
|
215
|
+
@@term_generator.stemmer = @@stemmer
|
216
|
+
@@writable_suffix = suffix
|
217
|
+
end
|
218
|
+
end
|
219
|
+
|
220
|
+
######################################################################
|
221
|
+
# Search with a query or for similar models
|
222
|
+
|
223
|
+
# Base class for Search and Similar below
|
224
|
+
class QueryBase
|
225
|
+
attr_accessor :offset
|
226
|
+
attr_accessor :limit
|
227
|
+
attr_accessor :query
|
228
|
+
attr_accessor :matches
|
229
|
+
attr_accessor :query_models
|
230
|
+
attr_accessor :runtime
|
231
|
+
attr_accessor :cached_results
|
232
|
+
|
233
|
+
def initialize_db
|
234
|
+
self.runtime = 0.0
|
235
|
+
|
236
|
+
ActsAsXapian.readable_init
|
237
|
+
if ActsAsXapian.db.nil?
|
238
|
+
raise "ActsAsXapian not initialized"
|
239
|
+
end
|
240
|
+
end
|
241
|
+
|
242
|
+
# Set self.query before calling this
|
243
|
+
def initialize_query(options)
|
244
|
+
#raise options.to_yaml
|
245
|
+
|
246
|
+
self.runtime += Benchmark::realtime {
|
247
|
+
offset = options[:offset] || 0; offset = offset.to_i
|
248
|
+
limit = options[:limit] || -1
|
249
|
+
#raise "please specifiy maximum number of results to return with parameter :limit" if not limit
|
250
|
+
limit = limit.to_i
|
251
|
+
sort_by_prefix = options[:sort_by_prefix] || nil
|
252
|
+
sort_by_ascending = options[:sort_by_ascending].nil? ? true : options[:sort_by_ascending]
|
253
|
+
collapse_by_prefix = options[:collapse_by_prefix] || nil
|
254
|
+
|
255
|
+
ActsAsXapian.enquire.query = self.query
|
256
|
+
|
257
|
+
if sort_by_prefix.nil?
|
258
|
+
ActsAsXapian.enquire.sort_by_relevance!
|
259
|
+
else
|
260
|
+
value = ActsAsXapian.values_by_prefix[sort_by_prefix]
|
261
|
+
raise "couldn't find prefix '" + sort_by_prefix + "'" if value.nil?
|
262
|
+
ActsAsXapian.enquire.sort_by_value_then_relevance!(value, sort_by_ascending)
|
263
|
+
end
|
264
|
+
if collapse_by_prefix.nil?
|
265
|
+
ActsAsXapian.enquire.collapse_key = Xapian.BAD_VALUENO
|
266
|
+
else
|
267
|
+
value = ActsAsXapian.values_by_prefix[collapse_by_prefix]
|
268
|
+
raise "couldn't find prefix '" + collapse_by_prefix + "'" if value.nil?
|
269
|
+
ActsAsXapian.enquire.collapse_key = value
|
270
|
+
end
|
271
|
+
|
272
|
+
self.matches = ActsAsXapian.enquire.mset(offset, limit, 100)
|
273
|
+
self.cached_results = nil
|
274
|
+
}
|
275
|
+
end
|
276
|
+
|
277
|
+
# Return a description of the query
|
278
|
+
def description
|
279
|
+
self.query.description
|
280
|
+
end
|
281
|
+
|
282
|
+
# Estimate total number of results
|
283
|
+
def matches_estimated
|
284
|
+
self.matches.matches_estimated
|
285
|
+
end
|
286
|
+
|
287
|
+
# Return query string with spelling correction
|
288
|
+
def spelling_correction
|
289
|
+
correction = ActsAsXapian.query_parser.get_corrected_query_string
|
290
|
+
if correction.empty?
|
291
|
+
return nil
|
292
|
+
end
|
293
|
+
return correction
|
294
|
+
end
|
295
|
+
|
296
|
+
# Return array of models found
|
297
|
+
def results
|
298
|
+
# If they've already pulled out the results, just return them.
|
299
|
+
if !self.cached_results.nil?
|
300
|
+
return self.cached_results
|
301
|
+
end
|
302
|
+
|
303
|
+
docs = []
|
304
|
+
self.runtime += Benchmark::realtime {
|
305
|
+
# Pull out all the results
|
306
|
+
iter = self.matches._begin
|
307
|
+
while not iter.equals(self.matches._end)
|
308
|
+
docs.push({:data => iter.document.data,
|
309
|
+
:percent => iter.percent,
|
310
|
+
:weight => iter.weight,
|
311
|
+
:collapse_count => iter.collapse_count})
|
312
|
+
iter.next
|
313
|
+
end
|
314
|
+
}
|
315
|
+
|
316
|
+
# Log time taken, excluding database lookups below which will be displayed separately by ActiveRecord
|
317
|
+
if ActiveRecord::Base.logger
|
318
|
+
ActiveRecord::Base.logger.add(Logger::DEBUG, " Xapian query (#{'%.5fs' % self.runtime}) #{self.log_description}")
|
319
|
+
end
|
320
|
+
|
321
|
+
# Look up without too many SQL queries
|
322
|
+
lhash = {}
|
323
|
+
lhash.default = []
|
324
|
+
for doc in docs
|
325
|
+
k = doc[:data].split('-')
|
326
|
+
lhash[k[0]] = lhash[k[0]] + [k[1]]
|
327
|
+
end
|
328
|
+
# for each class, look up all ids
|
329
|
+
chash = {}
|
330
|
+
for cls, ids in lhash
|
331
|
+
conditions = [ "#{cls.constantize.table_name}.#{cls.constantize.primary_key} in (?)", ids ]
|
332
|
+
found = cls.constantize.find(:all, :conditions => conditions, :include => cls.constantize.xapian_options[:eager_load])
|
333
|
+
for f in found
|
334
|
+
chash[[cls, f.id]] = f
|
335
|
+
end
|
336
|
+
end
|
337
|
+
# now get them in right order again
|
338
|
+
results = []
|
339
|
+
docs.each{|doc| k = doc[:data].split('-'); results << { :model => chash[[k[0], k[1].to_i]],
|
340
|
+
:percent => doc[:percent], :weight => doc[:weight], :collapse_count => doc[:collapse_count] } }
|
341
|
+
self.cached_results = results
|
342
|
+
return results
|
343
|
+
end
|
344
|
+
end
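The `results` method above avoids one SQL query per hit: it groups the "Class-id" document data by class, fetches each class in one batch, then walks the hits again to restore relevance order. A minimal plain-Ruby sketch of that idea follows. `FAKE_TABLES` and `fetch_in_rank_order` are hypothetical names for illustration; the in-memory hash stands in for ActiveRecord and is not part of the gem.

```ruby
# Hypothetical stand-in for database tables: class name => { id => record }.
FAKE_TABLES = {
  "User"       => { 1 => "user one", 2 => "user two" },
  "PublicBody" => { 7 => "body seven" }
}

# Takes document data strings such as "User-2" in rank order and returns
# the matching records in the same order, doing one lookup per class.
def fetch_in_rank_order(doc_data)
  # Group ids by class name, e.g. { "User" => [2, 1], "PublicBody" => [7] }.
  ids_by_class = Hash.new { |hash, key| hash[key] = [] }
  doc_data.each do |data|
    cls, id = data.split("-")
    ids_by_class[cls] << id.to_i
  end

  # One batched lookup per class (the real code issues a single
  # "primary_key in (?)" query per class instead).
  found = {}
  ids_by_class.each do |cls, ids|
    table = FAKE_TABLES.fetch(cls, {})
    ids.each { |id| found[[cls, id]] = table[id] }
  end

  # Walk the original list again so relevance order is preserved.
  doc_data.map do |data|
    cls, id = data.split("-")
    found[[cls, id.to_i]]
  end
end
```

A hit whose record no longer exists comes back as `nil` in this sketch, which matches the advice further down to call `compact` on the array returned by `find_with_xapian`.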
|
345
|
+
|
346
|
+
# Search for a query string, returns an array of hashes in result order.
|
347
|
+
# Each hash contains the actual Rails object in :model, and other detail
|
348
|
+
# about relevancy etc. in other keys.
|
349
|
+
class Search < QueryBase
|
350
|
+
attr_accessor :query_string
|
351
|
+
|
352
|
+
# Note that model_classes is not only sometimes useful here - it's
|
353
|
+
# essential to make sure the classes have been loaded, and thus
|
354
|
+
# acts_as_xapian called on them, so we know the fields for the query
|
355
|
+
# parser.
|
356
|
+
|
357
|
+
# model_classes - model classes to search within, e.g. [PublicBody,
|
358
|
+
# User]. Can take a single model class, or you can express the model
|
359
|
+
# class names in strings if you like.
|
360
|
+
# query_string - user-entered query string, with syntax much like Google Search
|
361
|
+
def initialize(model_classes, query_string, options = {})
|
362
|
+
# Check parameters, convert to actual array of model classes
|
363
|
+
new_model_classes = []
|
364
|
+
model_classes = [model_classes] if model_classes.class != Array
|
365
|
+
for model_class in model_classes
|
366
|
+
raise "pass in the model class itself, or a string containing its name" if model_class.class != Class && model_class.class != String
|
367
|
+
model_class = model_class.constantize if model_class.class == String
|
368
|
+
new_model_classes.push(model_class)
|
369
|
+
end
|
370
|
+
model_classes = new_model_classes
|
371
|
+
|
372
|
+
# Set things up
|
373
|
+
self.initialize_db
|
374
|
+
|
375
|
+
# Case of a string, searching for a Google-like syntax query
|
376
|
+
self.query_string = query_string
|
377
|
+
|
378
|
+
# Construct query which only finds things from specified models
|
379
|
+
model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
|
380
|
+
user_query = ActsAsXapian.query_parser.parse_query(self.query_string,
|
381
|
+
Xapian::QueryParser::FLAG_BOOLEAN | Xapian::QueryParser::FLAG_PHRASE |
|
382
|
+
Xapian::QueryParser::FLAG_LOVEHATE | Xapian::QueryParser::FLAG_WILDCARD |
|
383
|
+
Xapian::QueryParser::FLAG_SPELLING_CORRECTION)
|
384
|
+
self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, user_query)
|
385
|
+
|
386
|
+
# Call base class constructor
|
387
|
+
self.initialize_query(options)
|
388
|
+
end
|
389
|
+
|
390
|
+
# Return just the normal words in the query, i.e. not operators, ones in
|
391
|
+
# date ranges or similar. Use this for cheap highlighting with
|
392
|
+
# TextHelper::highlight, and excerpt.
|
393
|
+
def words_to_highlight
|
394
|
+
query_nopunc = self.query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
|
395
|
+
query_nopunc = query_nopunc.gsub(/\s+/, " ")
|
396
|
+
words = query_nopunc.split(" ")
|
397
|
+
# Remove anything with a :, . or / in it
|
398
|
+
words = words.find_all {|o| !o.match(/(:|\.|\/)/) }
|
399
|
+
words = words.find_all {|o| !o.match(/^(AND|NOT|OR|XOR)$/) }
|
400
|
+
return words
|
401
|
+
end
|
402
|
+
|
403
|
+
# Text for lines in log file
|
404
|
+
def log_description
|
405
|
+
"Search: " + self.query_string
|
406
|
+
end
|
407
|
+
|
408
|
+
end
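The filtering done by `words_to_highlight` can be tried without Xapian or Rails. The sketch below reuses the same regular expressions on a plain string; the method name `plain_query_words` and the constant `XAPIAN_BOOLEAN_OPERATORS` are made up for this illustration.

```ruby
# Boolean operators as matched by the code above (uppercase only).
XAPIAN_BOOLEAN_OPERATORS = /^(AND|NOT|OR|XOR)$/

# Returns the plain words of a query string: punctuation is blanked out,
# prefixed or path-like terms (containing ":", "." or "/") are dropped,
# and so are the boolean operators.
def plain_query_words(query_string)
  cleaned = query_string.gsub(/[^a-z0-9:\.\/_]/i, " ")
  words = cleaned.split(" ")
  words = words.reject { |word| word =~ /[:.\/]/ }
  words.reject { |word| word =~ XAPIAN_BOOLEAN_OPERATORS }
end
```

Because the operator match is case-sensitive, a lowercase "or" is kept as an ordinary word, which is the same behaviour as the original method.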
|
409
|
+
|
410
|
+
# Search for models which contain the important terms taken from a specified
|
411
|
+
# list of models. i.e. Use to find documents similar to one (or more)
|
412
|
+
# documents, or use to refine searches.
|
413
|
+
class Similar < QueryBase
|
414
|
+
attr_accessor :query_models
|
415
|
+
attr_accessor :important_terms
|
416
|
+
|
417
|
+
# model_classes - model classes to search within, e.g. [PublicBody, User]
|
418
|
+
# query_models - list of models you want to find things similar to
|
419
|
+
def initialize(model_classes, query_models, options = {})
|
420
|
+
self.initialize_db
|
421
|
+
|
422
|
+
self.runtime += Benchmark::realtime {
|
423
|
+
# Case of an array, searching for models similar to those models in the array
|
424
|
+
self.query_models = query_models
|
425
|
+
|
426
|
+
# Find the documents by their unique term
|
427
|
+
input_models_query = Xapian::Query.new(Xapian::Query::OP_OR, query_models.map{|m| "I" + m.xapian_document_term})
|
428
|
+
ActsAsXapian.enquire.query = input_models_query
|
429
|
+
matches = ActsAsXapian.enquire.mset(0, 100, 100) # XXX so this whole method will only work with 100 docs
|
430
|
+
|
431
|
+
# Get set of relevant terms for those documents
|
432
|
+
selection = Xapian::RSet.new()
|
433
|
+
iter = matches._begin
|
434
|
+
while not iter.equals(matches._end)
|
435
|
+
selection.add_document(iter)
|
436
|
+
iter.next
|
437
|
+
end
|
438
|
+
|
439
|
+
# Bit weird that the function to make esets is part of the enquire
|
440
|
+
# object. This explains what exactly it does, which is to exclude
|
441
|
+
# terms in the existing query.
|
442
|
+
# http://thread.gmane.org/gmane.comp.search.xapian.general/3673/focus=3681
|
443
|
+
eset = ActsAsXapian.enquire.eset(40, selection)
|
444
|
+
|
445
|
+
# Do main search for them
|
446
|
+
self.important_terms = []
|
447
|
+
iter = eset._begin
|
448
|
+
while not iter.equals(eset._end)
|
449
|
+
self.important_terms.push(iter.term)
|
450
|
+
iter.next
|
451
|
+
end
|
452
|
+
similar_query = Xapian::Query.new(Xapian::Query::OP_OR, self.important_terms)
|
453
|
+
# Exclude original
|
454
|
+
combined_query = Xapian::Query.new(Xapian::Query::OP_AND_NOT, similar_query, input_models_query)
|
455
|
+
|
456
|
+
# Restrict to model classes
|
457
|
+
model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s})
|
458
|
+
self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, combined_query)
|
459
|
+
}
|
460
|
+
|
461
|
+
# Call base class constructor
|
462
|
+
self.initialize_query(options)
|
463
|
+
end
|
464
|
+
|
465
|
+
# Text for lines in log file
|
466
|
+
def log_description
|
467
|
+
"Similar: " + self.query_models.to_s
|
468
|
+
end
|
469
|
+
end
|
470
|
+
|
471
|
+
######################################################################
|
472
|
+
# Index
|
473
|
+
|
474
|
+
# Offline indexing job queue model, create with migration made
|
475
|
+
# using "script/generate acts_as_xapian" as described in ../README.textile
|
476
|
+
class ActsAsXapianJob < ActiveRecord::Base
|
477
|
+
end
|
478
|
+
|
479
|
+
# Update index with any changes needed, call this offline. Only call it
|
480
|
+
# from a script that exits - otherwise Xapian's writable database won't
|
481
|
+
# flush your changes. Specifying flush will reduce performance, but
|
482
|
+
# make sure that each index update is definitely saved to disk before
|
483
|
+
# logging in the database that it has been.
|
484
|
+
def ActsAsXapian.update_index(flush = false, verbose = false)
|
485
|
+
# Before calling writable_init we have to make sure every model class has been initialized.
|
486
|
+
# i.e. has had its class code loaded, so acts_as_xapian has been called inside it, and
|
487
|
+
# we have the info from acts_as_xapian.
|
488
|
+
model_classes = ActsAsXapianJob.find_by_sql("select model from acts_as_xapian_jobs group by model").map {|a| a.model.constantize}
|
489
|
+
# If there are no models in the queue, then nothing to do
|
490
|
+
return if model_classes.size == 0
|
491
|
+
|
492
|
+
ActsAsXapian.writable_init
|
493
|
+
|
494
|
+
ids_to_refresh = ActsAsXapianJob.find(:all).map() { |i| i.id }
|
495
|
+
for id in ids_to_refresh
|
496
|
+
begin
|
497
|
+
ActiveRecord::Base.transaction do
|
498
|
+
job = ActsAsXapianJob.find(id, :lock =>true)
|
499
|
+
STDOUT.puts("ActsAsXapian.update_index #{job.action} #{job.model} #{job.model_id.to_s}") if verbose
|
500
|
+
if job.action == 'update'
|
501
|
+
# XXX Index functions may reference other models, so we could eager load here too?
|
502
|
+
model = job.model.constantize.find(job.model_id) # :include => cls.constantize.xapian_options[:include]
|
503
|
+
model.xapian_index
|
504
|
+
elsif job.action == 'destroy'
|
505
|
+
# Make dummy model with right id, just for destruction
|
506
|
+
model = job.model.constantize.new
|
507
|
+
model.id = job.model_id
|
508
|
+
model.xapian_destroy
|
509
|
+
else
|
510
|
+
raise "unknown ActsAsXapianJob action '" + job.action + "'"
|
511
|
+
end
|
512
|
+
job.destroy
|
513
|
+
|
514
|
+
if flush
|
515
|
+
ActsAsXapian.writable_db.flush
|
516
|
+
end
|
517
|
+
end
|
518
|
+
rescue => detail
|
519
|
+
# print any error, and carry on so other things are indexed
|
520
|
+
# XXX If item is later deleted, this should give up, and it
|
521
|
+
# won't. It will keep trying (assuming update_index called from
|
522
|
+
# regular cron job) and mayhap cause trouble.
|
523
|
+
STDERR.puts(detail.backtrace.join("\n") + "\nFAILED ActsAsXapian.update_index job #{id} #{$!}")
|
524
|
+
end
|
525
|
+
end
|
526
|
+
end
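`update_index` wraps each job in its own `begin`/`rescue`, so one failing record does not stop the rest of the queue from being indexed. A minimal sketch of that control flow, using plain hashes instead of `ActsAsXapianJob` rows; `process_jobs` and the `handlers` table are hypothetical and only illustrate the pattern.

```ruby
# Runs every job through the handler registered for its action.
# A failure is recorded and the loop carries on with the next job.
# Returns [ids_done, failures], where failures is a list of [id, message].
def process_jobs(jobs, handlers)
  done = []
  failed = []
  jobs.each do |job|
    begin
      handler = handlers[job[:action]]
      raise "unknown job action '#{job[:action]}'" if handler.nil?
      handler.call(job)
      done << job[:id]
    rescue => error
      # Report and continue, so other jobs still get processed.
      failed << [job[:id], error.message]
    end
  end
  [done, failed]
end
```

As the XXX note in the original code points out, a job that fails stays in the queue and is retried on every run; this sketch only reports failures and does not address that.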
|
527
|
+
|
528
|
+
# You must specify *all* the models here, because this totally rebuilds the Xapian database.
|
529
|
+
# You'll want any readers to reopen the database after this.
|
530
|
+
def ActsAsXapian.rebuild_index(model_classes, verbose = false)
|
531
|
+
raise "when rebuilding all, please call as first and only thing done in process / task" if not ActsAsXapian.writable_db.nil?
|
532
|
+
|
533
|
+
prepare_environment
|
534
|
+
|
535
|
+
# Delete any existing .new database, and open a new one
|
536
|
+
new_path = ActsAsXapian.db_path + ".new"
|
537
|
+
if File.exist?(new_path)
|
538
|
+
raise "found existing " + new_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(new_path, "iamflint"))
|
539
|
+
FileUtils.rm_r(new_path)
|
540
|
+
end
|
541
|
+
ActsAsXapian.writable_init(".new")
|
542
|
+
|
543
|
+
# Index everything
|
544
|
+
# XXX not a good place to do this destroy, as unindexed list is lost if
|
545
|
+
# process is aborted and old database carries on being used. Perhaps do in
|
546
|
+
# transaction and commit after rename below? Not sure if the locking is then bad
|
547
|
+
# for live website running at same time.
|
548
|
+
|
549
|
+
ActsAsXapianJob.destroy_all
|
550
|
+
batch_size = 1000
|
551
|
+
for model_class in model_classes
|
552
|
+
model_class.transaction do
|
553
|
+
0.step(model_class.count, batch_size) do |i|
|
554
|
+
STDOUT.puts("ActsAsXapian: New batch. From #{i} to #{i + batch_size}") if verbose
|
555
|
+
models = model_class.find(:all, :limit => batch_size, :offset => i, :order => :id)
|
556
|
+
for model in models
|
557
|
+
STDOUT.puts("ActsAsXapian.rebuild_index #{model_class} #{model.id}") if verbose
|
558
|
+
model.xapian_index
|
559
|
+
end
|
560
|
+
end
|
561
|
+
end
|
562
|
+
end
|
563
|
+
|
564
|
+
ActsAsXapian.writable_db.flush
|
565
|
+
|
566
|
+
# Rename into place
|
567
|
+
old_path = ActsAsXapian.db_path
|
568
|
+
temp_path = ActsAsXapian.db_path + ".tmp"
|
569
|
+
if File.exist?(temp_path)
|
570
|
+
raise "temporary database found " + temp_path + " which is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
|
571
|
+
FileUtils.rm_r(temp_path)
|
572
|
+
end
|
573
|
+
if File.exist?(old_path)
|
574
|
+
FileUtils.mv old_path, temp_path
|
575
|
+
end
|
576
|
+
FileUtils.mv new_path, old_path
|
577
|
+
|
578
|
+
# Delete old database
|
579
|
+
if File.exist?(temp_path)
|
580
|
+
raise "old database now at " + temp_path + " is not Xapian flint database, please delete for me" if not File.exist?(File.join(temp_path, "iamflint"))
|
581
|
+
FileUtils.rm_r(temp_path)
|
582
|
+
end
|
583
|
+
|
584
|
+
# You'll want to restart your FastCGI or Mongrel processes after this,
|
585
|
+
# so they get the new db
|
586
|
+
end
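The last part of `rebuild_index` swaps the freshly built ".new" database into place by moving the live one aside to ".tmp", renaming ".new" to the live path, and then deleting ".tmp". The sketch below shows just that rename sequence on ordinary directories. `swap_new_into_place` is a hypothetical helper, and it leaves out the "iamflint" safety checks that the original performs before each `rm_r`.

```ruby
require 'fileutils'

# Moves live_path + ".new" to live_path, keeping the old copy as
# live_path + ".tmp" only for the duration of the swap.
def swap_new_into_place(live_path)
  new_path  = live_path + ".new"
  temp_path = live_path + ".tmp"

  # Clear any leftover from an earlier, interrupted swap.
  FileUtils.rm_r(temp_path) if File.exist?(temp_path)

  # Move the current database aside, then rename the new one into place.
  FileUtils.mv(live_path, temp_path) if File.exist?(live_path)
  FileUtils.mv(new_path, live_path)

  # Finally remove the old copy.
  FileUtils.rm_r(temp_path) if File.exist?(temp_path)
end
```

Between the two `mv` calls there is a short window in which no directory exists at the live path, so readers opening the database at that instant could fail; the original has the same window.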
|
587
|
+
|
588
|
+
######################################################################
|
589
|
+
# Instance methods that get injected into your model.
|
590
|
+
|
591
|
+
module InstanceMethods
|
592
|
+
# Used internally
|
593
|
+
def xapian_document_term
|
594
|
+
self.class.to_s + "-" + self.id.to_s
|
595
|
+
end
|
596
|
+
|
597
|
+
# Extract value of a field from the model
|
598
|
+
def xapian_value(field, type = nil)
|
599
|
+
value = self[field] || self.send(field.to_sym)
|
600
|
+
if type == :date
|
601
|
+
if value.kind_of?(Time)
|
602
|
+
value.utc.strftime("%Y%m%d")
|
603
|
+
elsif value.kind_of?(Date)
|
604
|
+
value.to_time.utc.strftime("%Y%m%d")
|
605
|
+
else
|
606
|
+
raise "Only Time or Date types supported by acts_as_xapian for :date fields, got " + value.class.to_s
|
607
|
+
end
|
608
|
+
elsif type == :boolean
|
609
|
+
value ? true : false
|
610
|
+
else
|
611
|
+
value.to_s
|
612
|
+
end
|
613
|
+
end
|
614
|
+
|
615
|
+
# Store record in the Xapian database
|
616
|
+
def xapian_index
|
617
|
+
# if we have a conditional function for indexing, call it and destroy the object if it fails
|
618
|
+
if self.class.xapian_options.include?(:if)
|
619
|
+
if_value = xapian_value(self.class.xapian_options[:if], :boolean)
|
620
|
+
if not if_value
|
621
|
+
self.xapian_destroy
|
622
|
+
return
|
623
|
+
end
|
624
|
+
end
|
625
|
+
|
626
|
+
# otherwise (re)write the Xapian record for the object
|
627
|
+
doc = Xapian::Document.new
|
628
|
+
ActsAsXapian.term_generator.document = doc
|
629
|
+
|
630
|
+
doc.data = self.xapian_document_term
|
631
|
+
|
632
|
+
doc.add_term("M" + self.class.to_s)
|
633
|
+
doc.add_term("I" + doc.data)
|
634
|
+
if self.xapian_options[:terms]
|
635
|
+
for term in self.xapian_options[:terms]
|
636
|
+
ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
|
637
|
+
ActsAsXapian.term_generator.index_text(xapian_value(term[0]), 1, term[1])
|
638
|
+
end
|
639
|
+
end
|
640
|
+
if self.xapian_options[:values]
|
641
|
+
for value in self.xapian_options[:values]
|
642
|
+
doc.add_value(value[1], xapian_value(value[0], value[3]))
|
643
|
+
end
|
644
|
+
end
|
645
|
+
if self.xapian_options[:texts]
|
646
|
+
for text in self.xapian_options[:texts]
|
647
|
+
ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields
|
648
|
+
# XXX the "1" here is a weight that could be varied for a boost function
|
649
|
+
ActsAsXapian.term_generator.index_text(xapian_value(text), 1)
|
650
|
+
end
|
651
|
+
end
|
652
|
+
|
653
|
+
ActsAsXapian.writable_db.replace_document("I" + doc.data, doc)
|
654
|
+
end
|
655
|
+
|
656
|
+
# Delete record from the Xapian database
|
657
|
+
def xapian_destroy
|
658
|
+
ActsAsXapian.writable_db.delete_document("I" + self.xapian_document_term)
|
659
|
+
end
|
660
|
+
|
661
|
+
# Used to mark changes needed by batch indexer
|
662
|
+
def xapian_mark_needs_index
|
663
|
+
model = self.class.base_class.to_s
|
664
|
+
model_id = self.id
|
665
|
+
ActiveRecord::Base.transaction do
|
666
|
+
found = ActsAsXapianJob.delete_all([ "model = ? and model_id = ?", model, model_id])
|
667
|
+
job = ActsAsXapianJob.new
|
668
|
+
job.model = model
|
669
|
+
job.model_id = model_id
|
670
|
+
job.action = 'update'
|
671
|
+
job.save!
|
672
|
+
end
|
673
|
+
end
|
674
|
+
def xapian_mark_needs_destroy
|
675
|
+
model = self.class.base_class.to_s
|
676
|
+
model_id = self.id
|
677
|
+
ActiveRecord::Base.transaction do
|
678
|
+
found = ActsAsXapianJob.delete_all([ "model = ? and model_id = ?", model, model_id])
|
679
|
+
job = ActsAsXapianJob.new
|
680
|
+
job.model = model
|
681
|
+
job.model_id = model_id
|
682
|
+
job.action = 'destroy'
|
683
|
+
job.save!
|
684
|
+
end
|
685
|
+
end
|
686
|
+
end
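`xapian_value` turns `:date` fields into fixed-width "YYYYMMDD" strings, presumably so that they compare correctly as plain strings when stored as Xapian values. Below is a minimal sketch of the same conversion outside a model. `sortable_value` is a hypothetical name, and two details differ from the code above and are assumptions of this sketch: it uses `Time#getutc` so the caller's object is not modified, and it formats `Date` values directly rather than via `to_time.utc`, which can shift the day depending on the local timezone.

```ruby
require 'date'

# Converts a field value into the form stored in the index.
def sortable_value(value, type = nil)
  case type
  when :date
    if value.kind_of?(Time)
      value.getutc.strftime("%Y%m%d")  # getutc returns a new Time in UTC
    elsif value.kind_of?(Date)
      value.strftime("%Y%m%d")         # no timezone conversion involved
    else
      raise ArgumentError, "Only Time or Date supported for :date fields, got #{value.class}"
    end
  when :boolean
    value ? true : false
  else
    value.to_s
  end
end
```

Because the strings are fixed width, lexicographic order matches chronological order, for example "20081209" sorts before "20090101".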
|
687
|
+
|
688
|
+
module ClassMethods
|
689
|
+
|
690
|
+
# Model.find_with_xapian("Search Term OR Phrase")
|
691
|
+
# => Array of Records
|
692
|
+
#
|
693
|
+
# this can be used through association proxies /!\ DANGEROUS MAGIC /!\
|
694
|
+
# example:
|
695
|
+
# @document = Document.find(params[:id])
|
696
|
+
# @document_pages = @document.pages.find_with_xapian("Search Term OR Phrase").compact # NOTE: compact removes nil objects from the array
|
697
|
+
#
|
698
|
+
# as seen here: http://pastie.org/270114
|
699
|
+
def find_with_xapian(search_term, options = {})
|
700
|
+
search_with_xapian(search_term, options).results.collect{|x| x[:model]}
|
701
|
+
end
|
702
|
+
|
703
|
+
def search_with_xapian(search_term, options = {})
|
704
|
+
ActsAsXapian::Search.new([self], search_term, options)
|
705
|
+
end
|
706
|
+
|
707
|
+
# Returns true if the integration of xapian on self is complete
|
708
|
+
def xapian?
|
709
|
+
self.included_modules.include?(InstanceMethods) && self.extended_by.include?(ClassMethods)
|
710
|
+
end
|
711
|
+
|
712
|
+
end
|
713
|
+
|
714
|
+
module ProxyFinder
|
715
|
+
|
716
|
+
def find_with_xapian(search_term, options = {})
|
717
|
+
search_with_xapian(search_term, options).results.collect{|x| x[:model]}
|
718
|
+
end
|
719
|
+
|
720
|
+
def search_with_xapian(search_term, options = {})
|
721
|
+
ActsAsXapian::Search.new([proxy_reflection.klass], "#{proxy_reflection.primary_key_name}:#{proxy_owner.id} #{search_term}", options)
|
722
|
+
end
|
723
|
+
|
724
|
+
end
|
725
|
+
|
726
|
+
######################################################################
|
727
|
+
# Main entry point, add acts_as_xapian to your model.
|
728
|
+
|
729
|
+
module ActsMethods
|
730
|
+
# See top of this file for docs
|
731
|
+
def acts_as_xapian(options)
|
732
|
+
# Give error only on queries if bindings not available
|
733
|
+
return unless ActsAsXapian.bindings_available
|
734
|
+
|
735
|
+
include InstanceMethods
|
736
|
+
extend ClassMethods
|
737
|
+
|
738
|
+
# extend has_many and has_and_belongs_to_many associations with our ProxyFinder to get scoped results
|
739
|
+
# I've written a small report in the discussion group why this is the proper way of doing this.
|
740
|
+
# see here: XXX - link to the write-up still to be added
|
741
|
+
self.reflections.each do |association_name, r|
|
742
|
+
# skip if the associated model isn't indexed by acts_as_xapian
|
743
|
+
next unless r.klass.respond_to?(:xapian?) && r.klass.xapian?
|
744
|
+
# skip all associations except has_many and habtm
|
745
|
+
next unless [:has_many, :has_and_belongs_to_many].include?(r.macro)
|
746
|
+
|
747
|
+
# XXX todo:
|
748
|
+
# extend the associated model xapian options with this term:
|
749
|
+
# [proxy_reflection.primary_key_name.to_sym, <magically find a free capital letter>, proxy_reflection.primary_key_name]
|
750
|
+
# otherwise this assumes that the associated & indexed model indexes this kind of term
|
751
|
+
|
752
|
+
# but before you do the above, rewrite the options syntax... which imho is actually very ugly
|
753
|
+
|
754
|
+
# XXX test this nifty feature on habtm!
|
755
|
+
|
756
|
+
if r.options[:extend].nil?
|
757
|
+
r.options[:extend] = [ProxyFinder]
|
758
|
+
elsif not r.options[:extend].include?(ProxyFinder)
|
759
|
+
r.options[:extend] << ProxyFinder
|
760
|
+
end
|
761
|
+
end
|
762
|
+
|
763
|
+
cattr_accessor :xapian_options
|
764
|
+
self.xapian_options = options
|
765
|
+
|
766
|
+
ActsAsXapian.init(self.class.to_s, options)
|
767
|
+
|
768
|
+
after_save :xapian_mark_needs_index
|
769
|
+
after_destroy :xapian_mark_needs_destroy
|
770
|
+
end
|
771
|
+
end
|
772
|
+
|
773
|
+
end
|
774
|
+
|
775
|
+
# Reopen ActiveRecord and include the acts_as_xapian method
|
776
|
+
ActiveRecord::Base.extend ActsAsXapian::ActsMethods
|
777
|
+
|
778
|
+
|
data/tasks/xapian.rake
ADDED
@@ -0,0 +1,43 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake'
|
3
|
+
require 'rake/testtask'
|
4
|
+
require 'active_record'
|
5
|
+
require 'acts_as_xapian'
|
6
|
+
|
7
|
+
namespace :xapian do
|
8
|
+
# Parameters - specify "flush=true" to save changes to the Xapian database
|
9
|
+
# after each model that is updated. This is safer, but slower. Specify
|
10
|
+
# "verbose=true" to print model name as it is run.
|
11
|
+
desc 'Updates Xapian search index with changes to models since last call'
|
12
|
+
task(:update_index => :environment) do
|
13
|
+
ActsAsXapian.update_index(ENV['flush'] ? true : false, ENV['verbose'] ? true : false)
|
14
|
+
end
|
15
|
+
|
16
|
+
# Parameters - specify 'models="PublicBody User"' to say which models
|
17
|
+
# you index with Xapian.
|
18
|
+
# This totally rebuilds the database, so you will want to restart any
|
19
|
+
# web server afterwards to make sure it gets the changes, rather than
|
20
|
+
# still pointing to the old deleted database. Specify "verbose=true" to
|
21
|
+
# print model name as it is run.
|
22
|
+
desc 'Completely rebuilds Xapian search index (must specify all models)'
|
23
|
+
task(:rebuild_index => :environment) do
|
24
|
+
raise "specify ALL your models with models=\"ModelName1 ModelName2\" as parameter" if ENV['models'].nil?
|
25
|
+
ActsAsXapian.rebuild_index(ENV['models'].split(" ").map{|m| m.constantize}, ENV['verbose'] ? true : false)
|
26
|
+
end
|
27
|
+
|
28
|
+
# Parameters - models, query, offset, limit, sort_by_prefix,
|
29
|
+
# collapse_by_prefix
|
30
|
+
desc 'Run a query, return YAML of results'
|
31
|
+
task(:query => :environment) do
|
32
|
+
raise "specify models=\"ModelName1 ModelName2\" as parameter" if ENV['models'].nil?
|
33
|
+
raise "specify query=\"your terms\" as parameter" if ENV['query'].nil?
|
34
|
+
s = ActsAsXapian::Search.new(ENV['models'].split(" ").map{|m| m.constantize},
|
35
|
+
ENV['query'],
|
36
|
+
:offset => (ENV['offset'] || 0), :limit => (ENV['limit'] || 10),
|
37
|
+
:sort_by_prefix => (ENV['sort_by_prefix'] || nil),
|
38
|
+
:collapse_by_prefix => (ENV['collapse_by_prefix'] || nil)
|
39
|
+
)
|
40
|
+
STDOUT.puts(s.results.to_yaml)
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
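The `xapian:query` task above reads all of its parameters from environment variables given on the rake command line. The sketch below shows that parsing step in isolation, taking a plain hash in place of `ENV`. `parse_query_params` is a hypothetical helper, and converting offset and limit with `Integer()` is an assumption of this sketch; the task itself passes the raw strings on.

```ruby
# Builds the option hash for a search from ENV-style string parameters.
def parse_query_params(env)
  raise ArgumentError, 'specify models="ModelName1 ModelName2" as parameter' if env['models'].nil?
  raise ArgumentError, 'specify query="your terms" as parameter' if env['query'].nil?

  {
    :models             => env['models'].split(" "),  # names only, not yet constantized
    :query              => env['query'],
    :offset             => Integer(env['offset'] || 0),
    :limit              => Integer(env['limit'] || 10),
    :sort_by_prefix     => env['sort_by_prefix'],
    :collapse_by_prefix => env['collapse_by_prefix']
  }
end
```

Note that the boolean parameters in the other tasks are tested with `ENV['flush'] ? true : false`, so any value at all, including "flush=false", turns the option on.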
metadata
ADDED
@@ -0,0 +1,74 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: wbzyl-acts_as_xapian
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.0.2
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- "Lukas Rieder (original author: Francis Irving)"
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
|
12
|
+
date: 2008-12-09 00:00:00 -08:00
|
13
|
+
default_executable:
|
14
|
+
dependencies: []
|
15
|
+
|
16
|
+
description: Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.
|
17
|
+
email: l.rieder@gmail.com
|
18
|
+
executables: []
|
19
|
+
|
20
|
+
extensions: []
|
21
|
+
|
22
|
+
extra_rdoc_files:
|
23
|
+
- lib/acts_as_xapian.rb
|
24
|
+
- README.textile
|
25
|
+
- LICENSE.txt
|
26
|
+
- tasks/xapian.rake
|
27
|
+
- CHANGELOG
|
28
|
+
files:
|
29
|
+
- Rakefile
|
30
|
+
- acts_as_xapian.gemspec
|
31
|
+
- lib/acts_as_xapian.rb
|
32
|
+
- init.rb
|
33
|
+
- Manifest
|
34
|
+
- generators/acts_as_xapian/templates/migration.rb
|
35
|
+
- generators/acts_as_xapian/templates/xapian.yml
|
36
|
+
- generators/acts_as_xapian/USAGE
|
37
|
+
- generators/acts_as_xapian/acts_as_xapian_generator.rb
|
38
|
+
- README.textile
|
39
|
+
- LICENSE.txt
|
40
|
+
- tasks/xapian.rake
|
41
|
+
- CHANGELOG
|
42
|
+
has_rdoc: true
|
43
|
+
homepage: http://github.com/Overbryd/acts_as_xapian
|
44
|
+
post_install_message:
|
45
|
+
rdoc_options:
|
46
|
+
- --line-numbers
|
47
|
+
- --inline-source
|
48
|
+
- --title
|
49
|
+
- Acts_as_xapian
|
50
|
+
- --main
|
51
|
+
- README.textile
|
52
|
+
require_paths:
|
53
|
+
- lib
|
54
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
55
|
+
requirements:
|
56
|
+
- - ">="
|
57
|
+
- !ruby/object:Gem::Version
|
58
|
+
version: "0"
|
59
|
+
version:
|
60
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
61
|
+
requirements:
|
62
|
+
- - ">="
|
63
|
+
- !ruby/object:Gem::Version
|
64
|
+
version: "1.2"
|
65
|
+
version:
|
66
|
+
requirements: []
|
67
|
+
|
68
|
+
rubyforge_project: acts_as_xapian
|
69
|
+
rubygems_version: 1.2.0
|
70
|
+
signing_key:
|
71
|
+
specification_version: 2
|
72
|
+
summary: Acts_as_xapian is a full text search gem/plugin for Ruby on Rails.
|
73
|
+
test_files: []
|
74
|
+
|