acts_as_xapian 0.1.1
- data/.document +5 -0
- data/.gitignore +21 -0
- data/LICENSE +20 -0
- data/README.rdoc +148 -0
- data/Rakefile +46 -0
- data/VERSION +1 -0
- data/acts_as_xapian.gemspec +70 -0
- data/generators/acts_as_xapian/USAGE +1 -0
- data/generators/acts_as_xapian/acts_as_xapian_generator.rb +14 -0
- data/generators/acts_as_xapian/templates/migrations/migration.rb +14 -0
- data/generators/acts_as_xapian/templates/tasks/xapian.rake +42 -0
- data/lib/acts_as_xapian/base.rb +215 -0
- data/lib/acts_as_xapian/core_ext/array.rb +24 -0
- data/lib/acts_as_xapian/index.rb +45 -0
- data/lib/acts_as_xapian/query_base.rb +159 -0
- data/lib/acts_as_xapian/readable_index.rb +117 -0
- data/lib/acts_as_xapian/search.rb +67 -0
- data/lib/acts_as_xapian/similar.rb +61 -0
- data/lib/acts_as_xapian/writeable_index.rb +152 -0
- data/lib/acts_as_xapian.rb +5 -0
- data/spec/acts_as_xapian_spec.rb +7 -0
- data/spec/spec.opts +1 -0
- data/spec/spec_helper.rb +9 -0
- metadata +111 -0
data/.document
ADDED
data/.gitignore
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
Copyright (c) 2009 Mike Nelson

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc
ADDED
@@ -0,0 +1,148 @@
= acts_as_xapian_gem / acts_as_xapian

== Introduction

Xapian[http://www.xapian.org] is a full text search engine library which has Ruby bindings. acts_as_xapian adds support for it to Rails. It is an alternative to acts_as_solr, acts_as_ferret, Ultrasphinx, acts_as_indexed, acts_as_searchable or acts_as_tsearch.

acts_as_xapian is deployed in production on these websites:
* WhatDoTheyKnow[http://www.whatdotheyknow.com]
* MindBites[http://www.mindbites.com]

== A Quick Note

This gem was created directly from the acts_as_xapian plugin. There were very few changes, the majority of which were to make the gem handle installation better. If you'd like more information about the original plugin, go here[http://www.github.com/frabcus/acts_as_xapian/]. If I've left something crucial out, send me a message via GitHub.

== Installation

Install Xapian with the Ruby bindings on your box. For you OS X users, I'd recommend using Homebrew[http://github.com/mxcl/homebrew].

Then install the gem
  sudo gem install acts_as_xapian

Navigate to your project and generate the required files
  script/generate acts_as_xapian

Migrate your database
  rake db:migrate

== Usage

Xapian is an offline indexing search library - only one process can have the Xapian database open for writing at once, and others that try meanwhile are unceremoniously kicked out. For this reason, acts_as_xapian does not support immediate writing to the database when your models change.

Instead, there is an ActsAsXapianJob model which records which models need updating or deleting in the search index. A rake task 'xapian:update_index' then applies the changes queued since it last ran. You can run it from a cron job, or similar.
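
The queueing idea can be sketched in plain Ruby without Rails or Xapian. This is only an illustration: PendingJobs is a hypothetical stand-in for the acts_as_xapian_jobs table, and the real gem stores its jobs through ActiveRecord. The point it shows is that each record has at most one pending job, and a later change replaces an earlier one.

```ruby
# Hypothetical in-memory stand-in for the acts_as_xapian_jobs table.
# At most one pending job is kept per (model, model_id) pair.
class PendingJobs
  def initialize
    @jobs = {} # [model, model_id] => action
  end

  # Record that a model instance needs (re)indexing or removal.
  def mark(model, model_id, action)
    key = [model, model_id]
    @jobs.delete(key)    # drop any job already queued for this record
    @jobs[key] = action  # then queue the latest action
  end

  # Hand every queued job to the block and empty the queue, roughly what a
  # periodic index update run has to do with the job table.
  def drain
    jobs = @jobs.map { |(model, model_id), action| [model, model_id, action] }
    @jobs.clear
    jobs.each { |job| yield(*job) }
    jobs.size
  end
end

queue = PendingJobs.new
queue.mark("User", 1, "update")
queue.mark("User", 1, "update")   # saved twice before the next index run
queue.mark("User", 2, "update")
queue.mark("User", 2, "destroy")  # then deleted

processed = []
queue.drain { |model, id, action| processed << "#{action} #{model}-#{id}" }
puts processed
```

Because only the latest action per record survives, the index update does one unit of work per changed record, however many times it was saved in between.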

Here's how to add indexing to your Rails app:

Put acts_as_xapian in your models that need search indexing. e.g.

  acts_as_xapian :texts => [:name, :short_name],
                 :values => [[ :created_at, 0, "created_at", :date ]],
                 :terms => [[ :variety, 'V', "variety" ]]

Options must include:

* :texts, an array of fields for indexing with full text search.
  e.g. :texts => [ :title, :body ]

* :values, things which have a range of values for sorting or collapsing. Specify an array quadruple of [ field, identifier, prefix, type ] where _identifier_ is an arbitrary numeric identifier for use in the Xapian database, _prefix_ is the part to use in search queries that goes before the : , and _type_ can be any of :string, :number or :date.
  e.g. :values => [[ :created_at, 0, "created_at", :date ], [ :size, 1, "size", :string ]]

* :terms, things which come with a prefix (before a ':') in search queries. Specify an array triple of [ field, char, prefix ] where _char_ is an arbitrary single uppercase character used in the Xapian database; pick any one, but use a different one for each prefix. _prefix_ is the part to use in search queries that goes before the : . For example, if you were building Google and wanted to support a query like "site:www.whatdotheyknow.com", then the prefix would be "site".
  e.g. :terms => [ [ :variety, 'V', "variety" ] ]

A 'field' is a symbol referring to either an attribute or a function which returns the text, date or number to index. Both 'identifier' and 'char' must be the same for the same prefix in different models.
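
The last rule is easy to break once several models are indexed. As a rough sketch, the check below walks a set of option hashes and reports any prefix that is given a different identifier or char in different models. conflicting_prefixes is a hypothetical helper written for this example, not something the gem provides, and it does not check the other rule (a different char for each prefix).

```ruby
# Hypothetical helper: report prefixes whose identifier (:values) or
# char (:terms) differs between models.
# options_by_model maps a model name to its acts_as_xapian options hash.
def conflicting_prefixes(options_by_model)
  seen = {}       # [kind, prefix] => [slot, model name that used it first]
  conflicts = []
  options_by_model.each do |model_name, options|
    [:values, :terms].each do |kind|
      # In both option types the slot sits at index 1 and the prefix at index 2.
      (options[kind] || []).each do |_field, slot, prefix|
        key = [kind, prefix]
        if !seen.key?(key)
          seen[key] = [slot, model_name]
        elsif seen[key][0] != slot
          first_slot, first_model = seen[key]
          conflicts << "#{kind} prefix #{prefix.inspect}: #{first_model} uses " \
                       "#{first_slot.inspect}, #{model_name} uses #{slot.inspect}"
        end
      end
    end
  end
  conflicts
end

options = {
  "PublicBody" => { :texts => [:name],
                    :terms => [[:variety, 'V', "variety"]] },
  "User"       => { :texts => [:name],
                    :values => [[:created_at, 0, "created_at", :date]],
                    :terms => [[:variety, 'W', "variety"]] }
}

conflicts = conflicting_prefixes(options)
puts conflicts
```

Here the "variety" prefix is mapped to 'V' in one model and 'W' in the other, so the helper reports one conflict.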

Options may include:

* :eager_load, added as an :include clause when looking up search results in database
* :if, either an attribute or a function; if it returns false, the object isn't indexed

To build the index the first time, call:
  rake xapian:rebuild_index

It puts the db in the development/test/production directory in your db directory. See the configuration section below if you want to change this.

Then from a cron job or a daemon, or regularly by hand, call:
  rake xapian:update_index


== Querying


=== Testing indexing

If you just want to test indexing is working, you'll find this rake task useful:
  rake xapian:query q="moo"

You have a few more options here:
* models - the models to query (ex: models="User Company"). Omitting it searches all Xapian models
* offset - the offset of the results
* limit - the limiting number of results
* sort_by_prefix - sort by the prefix specified in value field of the acts_as_xapian call
* collapse_by_prefix - collapse the results based on best result for its prefix

=== Performing a query

To perform a query from code call ActsAsXapian::Search.new. This takes in turn:
* model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
* query_string - Google-like syntax, see below

And then a hash of options:
* :offset - Offset of first result (default 0)
* :limit - Number of results per page
* :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
* :sort_by_ascending - Default true (documents with higher values better/earlier), set to false for descending sort
* :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)

Google-like query syntax is as described in {Xapian::QueryParser Syntax}[http://www.xapian.org/docs/queryparser.html]. Queries can include prefix:value parts, according to what you indexed in the acts_as_xapian part above. You can also say things like model:InfoRequestEvent to constrain by model in more complex ways than the :model parameter, or modelid:InfoRequestEvent-100 to only find one specific object.
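
Since a query is just a string, prefix:value filters can be appended to whatever the user typed before handing it to the search. The sketch below shows one way to do that; build_query is a hypothetical helper for illustration only, and it assumes the filter values contain no spaces or quotes (it does no escaping).

```ruby
# Hypothetical helper: append prefix:value filters to free text.
# Assumes values need no quoting or escaping.
def build_query(text, filters = {})
  parts = [text.to_s.strip]
  filters.each { |prefix, value| parts << "#{prefix}:#{value}" }
  parts.reject(&:empty?).join(" ")
end

query = build_query("moo", "variety" => "jersey", "model" => "InfoRequestEvent")
puts query

only_one = build_query("", "modelid" => "InfoRequestEvent-100")
puts only_one
```

The resulting string is what you would pass as query_string. The prefixes only mean something if they were declared in the :terms or :values options of the models being searched.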

Returns an ActsAsXapian::Search object. Useful methods are:
* description - a techy one, to check how the query has been parsed
* matches_estimated - a guesstimate at the total number of hits
* spelling_correction - the corrected query string if there is a correction, otherwise nil
* words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight
* results - an array of hashes each structured like:
  {:model => YourModel, :weight => 3.92, :percent => 100, :collapse_count => 0}
  * :model - your Rails model, this is what you most want!
  * :weight - relevancy measure
  * :percent - the weight as a %, 0 meaning the item did not match the query at all
  * :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix
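
As a small illustration of working with that structure, the snippet below uses a hand-written array shaped like the results described above. The data is made up, and plain strings stand in for the model instances a real search would return.

```ruby
# Made-up results in the documented shape; strings stand in for models.
results = [
  { :model => "PublicBody-1", :weight => 3.92, :percent => 100, :collapse_count => 0 },
  { :model => "PublicBody-7", :weight => 1.40, :percent => 36,  :collapse_count => 2 },
  { :model => "PublicBody-3", :weight => 2.15, :percent => 55,  :collapse_count => 0 }
]

# Usually all you want is the models themselves, in result order.
models = results.map { |r| r[:model] }

# Keep only reasonably strong matches (the 50% cut-off is arbitrary).
strong = results.select { |r| r[:percent] >= 50 }.map { |r| r[:model] }

puts models.inspect
puts strong.inspect
```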

=== Finding similar models

To find models that are similar to a given set of models call ActsAsXapian::Similar.new. This takes:
* model_classes - list of model classes to return models from within
* models - list of models that you want to find related ones to

Returns an ActsAsXapian::Similar object. Has all methods from ActsAsXapian::Search above, except for words_to_highlight. In addition has:
* important_terms - the terms extracted from the input models, that were used to search for output. You need the results method to get the similar models.


== Configuration


If you want to customise the configuration of acts_as_xapian, it will look for a file called 'xapian.yml' under RAILS_ROOT/config. As is familiar from the format of the database.yml file, separate :development, :test and :production sections are expected.

The following options are available:
* base_db_path - specifies the directory, relative to RAILS_ROOT, in which acts_as_xapian stores its search index databases. Default is the xapiandbs directory within the db directory.
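
To make the shape of such a file concrete, here is a rough sketch that parses an example xapian.yml and works out a database directory per environment. The string keys, the example paths and the xapian_db_path helper are all assumptions made for this illustration; check lib/acts_as_xapian/*.rb for how the gem actually reads the file.

```ruby
require 'yaml'

# Example file contents. Key names follow the README; the string-key
# layout and the paths are assumptions for this sketch.
XAPIAN_YML = <<-YAML
development:
  base_db_path: db/xapiandbs
production:
  base_db_path: ../shared/xapiandbs
YAML

# Hypothetical helper, not part of the gem: resolve the index directory
# for one environment, falling back to db/xapiandbs when unset.
def xapian_db_path(yaml_text, environment, rails_root)
  config = (YAML.load(yaml_text) || {})[environment] || {}
  base = config['base_db_path'] || 'db/xapiandbs'
  File.expand_path(File.join(base, environment), rails_root)
end

puts xapian_db_path(XAPIAN_YML, 'development', '/srv/app')
puts xapian_db_path(XAPIAN_YML, 'production', '/srv/app')
puts xapian_db_path(XAPIAN_YML, 'test', '/srv/app')
```

Pointing production at a directory outside the release tree, as in the example, is one reason to set base_db_path: the index then survives a redeploy.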


== Performance

On development sites, acts_as_xapian automatically logs the time taken to do searches. The time displayed is for the Xapian parts of the query; the Rails database model lookups will be logged separately by ActiveRecord. Example:

  Xapian query (0.00029s) Search: hello

To enable this, and other performance logging, on a production site, temporarily add this to the end of your config/environment.rb

  ActiveRecord::Base.logger = Logger.new(STDOUT)


== Support

Please ask any questions on the {acts_as_xapian Google Group}[http://groups.google.com/group/acts_as_xapian]

The official home page and repository for acts_as_xapian are the {acts_as_xapian github page}[http://github.com/frabcus/acts_as_xapian/wikis]

For more details about anything, see source code in lib/acts_as_xapian/*.rb
data/Rakefile
ADDED
@@ -0,0 +1,46 @@
require 'rubygems'
require 'rake'

begin
  require 'jeweler'
  Jeweler::Tasks.new do |gem|
    gem.name = "acts_as_xapian"
    gem.summary = %Q{A gem for interacting with the Xapian full text search engine}
    gem.description = %Q{A gem for interacting with the Xapian full text search engine. Completely based on the acts_as_xapian plugin.}
    gem.email = "mdnelson30@gmail.com"
    gem.homepage = "http://github.com/mnelson/acts_as_xapian_gem"
    gem.authors = ["Mike Nelson"]
    gem.add_development_dependency "rspec", ">= 1.2.9"
    gem.add_development_dependency "active_record"
    # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
  end
  Jeweler::GemcutterTasks.new
rescue LoadError
  puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
end

require 'spec/rake/spectask'
Spec::Rake::SpecTask.new(:spec) do |spec|
  spec.libs << 'lib' << 'spec'
  spec.spec_files = FileList['spec/**/*_spec.rb']
end

Spec::Rake::SpecTask.new(:rcov) do |spec|
  spec.libs << 'lib' << 'spec'
  spec.pattern = 'spec/**/*_spec.rb'
  spec.rcov = true
end

task :spec => :check_dependencies

task :default => :spec

require 'rake/rdoctask'
Rake::RDocTask.new do |rdoc|
  version = File.exist?('VERSION') ? File.read('VERSION') : ""

  rdoc.rdoc_dir = 'rdoc'
  rdoc.title = "acts_as_xapian #{version}"
  rdoc.rdoc_files.include('README*')
  rdoc.rdoc_files.include('lib/**/*.rb')
end
data/VERSION
ADDED
@@ -0,0 +1 @@
0.1.1
data/acts_as_xapian.gemspec
ADDED
@@ -0,0 +1,70 @@
# Generated by jeweler
# DO NOT EDIT THIS FILE DIRECTLY
# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
# -*- encoding: utf-8 -*-

Gem::Specification.new do |s|
  s.name = %q{acts_as_xapian}
  s.version = "0.1.1"

  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Mike Nelson"]
  s.date = %q{2010-03-17}
  s.description = %q{A gem for interacting with the Xapian full text search engine. Completely based on the acts_as_xapian plugin.}
  s.email = %q{mdnelson30@gmail.com}
  s.extra_rdoc_files = [
    "LICENSE",
    "README.rdoc"
  ]
  s.files = [
    ".document",
    ".gitignore",
    "LICENSE",
    "README.rdoc",
    "Rakefile",
    "VERSION",
    "acts_as_xapian.gemspec",
    "generators/acts_as_xapian/USAGE",
    "generators/acts_as_xapian/acts_as_xapian_generator.rb",
    "generators/acts_as_xapian/templates/migrations/migration.rb",
    "generators/acts_as_xapian/templates/tasks/xapian.rake",
    "lib/acts_as_xapian.rb",
    "lib/acts_as_xapian/base.rb",
    "lib/acts_as_xapian/core_ext/array.rb",
    "lib/acts_as_xapian/index.rb",
    "lib/acts_as_xapian/query_base.rb",
    "lib/acts_as_xapian/readable_index.rb",
    "lib/acts_as_xapian/search.rb",
    "lib/acts_as_xapian/similar.rb",
    "lib/acts_as_xapian/writeable_index.rb",
    "spec/acts_as_xapian_spec.rb",
    "spec/spec.opts",
    "spec/spec_helper.rb"
  ]
  s.homepage = %q{http://github.com/mnelson/acts_as_xapian_gem}
  s.rdoc_options = ["--charset=UTF-8"]
  s.require_paths = ["lib"]
  s.rubygems_version = %q{1.3.6}
  s.summary = %q{A gem for interacting with the Xapian full text search engine}
  s.test_files = [
    "spec/acts_as_xapian_spec.rb",
    "spec/spec_helper.rb"
  ]

  if s.respond_to? :specification_version then
    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
    s.specification_version = 3

    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
      s.add_development_dependency(%q<rspec>, [">= 1.2.9"])
      s.add_development_dependency(%q<active_record>, [">= 0"])
    else
      s.add_dependency(%q<rspec>, [">= 1.2.9"])
      s.add_dependency(%q<active_record>, [">= 0"])
    end
  else
    s.add_dependency(%q<rspec>, [">= 1.2.9"])
    s.add_dependency(%q<active_record>, [">= 0"])
  end
end

data/generators/acts_as_xapian/USAGE
ADDED
@@ -0,0 +1 @@
./script/generate acts_as_xapian
data/generators/acts_as_xapian/acts_as_xapian_generator.rb
ADDED
@@ -0,0 +1,14 @@
class ActsAsXapianGenerator < Rails::Generator::Base
  def manifest
    record do |m|
      m.migration_template 'migrations/migration.rb', 'db/migrate',
        :migration_file_name => "create_acts_as_xapian"
      m.file "tasks/xapian.rake", "lib/tasks/xapian.rake"
    end
  end

  protected
  def banner
    "Usage: #{$0} acts_as_xapian"
  end
end
data/generators/acts_as_xapian/templates/migrations/migration.rb
ADDED
@@ -0,0 +1,14 @@
class CreateActsAsXapian < ActiveRecord::Migration
  def self.up
    create_table :acts_as_xapian_jobs do |t|
      t.column :model, :string, :null => false
      t.column :model_id, :integer, :null => false
      t.column :action, :string, :null => false
    end
    add_index :acts_as_xapian_jobs, [:model, :model_id], :unique => true
  end
  def self.down
    drop_table :acts_as_xapian_jobs
  end
end

data/generators/acts_as_xapian/templates/tasks/xapian.rake
ADDED
@@ -0,0 +1,42 @@

namespace :xapian do

  # Parameters - specify "flush=true" to save changes to the Xapian database
  # after each model that is updated. This is safer, but slower. Specify
  # "verbose=true" to print model name as it is run.
  desc 'Updates Xapian search index with changes to models since last call'
  task :update_index => :environment do
    ActsAsXapian::WriteableIndex.update_index(ENV['flush'] ? true : false, ENV['verbose'] ? true : false)
  end

  desc 'Pulls all the xapian models from either the params or the project itself'
  task :retrieve_models => :environment do
    @models = (ENV['models'] || ENV['m']) && (ENV['models'] || ENV['m']).split(" ").map{|m| m.constantize} || ActiveRecord::Base.send(:subclasses).select{|klazz| klazz.respond_to?(:xapian?)}
    STDOUT.puts("Found Xapian Models: #{@models.map(&:name).join(', ')}")
  end
  # Parameters - specify 'models="PublicBody User"' to say which models
  # you index with Xapian.
  # This totally rebuilds the database, so you will want to restart any
  # web server afterwards to make sure it gets the changes, rather than
  # still pointing to the old deleted database. Specify "verbose=true" to
  # print model name as it is run.
  desc 'Completely rebuilds Xapian search index (must specify all models)'
  task :rebuild_index => :retrieve_models do
    ActsAsXapian::WriteableIndex.rebuild_index(@models, ENV['verbose'] ? true : false)
  end

  # Parameters - are models, query, offset, limit, sort_by_prefix,
  # collapse_by_prefix
  desc 'Run a query, return YAML of results'
  task :query => :retrieve_models do
    raise "specify q=\"your terms\" as parameter" if (ENV['query'] || ENV['q']).nil?
    s = ActsAsXapian::Search.new(@models,
      (ENV['query'] || ENV['q']),
      :offset => (ENV['offset'] || 0), :limit => (ENV['limit'] || 10),
      :sort_by_prefix => (ENV['sort_by_prefix'] || nil),
      :collapse_by_prefix => (ENV['collapse_by_prefix'] || nil)
    )
    STDOUT.puts(s.results.to_yaml)
  end
end

data/lib/acts_as_xapian/base.rb
ADDED
@@ -0,0 +1,215 @@
# acts_as_xapian/lib/acts_as_xapian.rb:
# Xapian full text search in Ruby on Rails.
#
# Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
# Email: francis@mysociety.org; WWW: http://www.mysociety.org/
#
# Documentation
# =============
#
# See ../README.txt for documentation. Please update that file if you edit
# code.

# Make it so if Xapian isn't installed, the Rails app doesn't fail completely,
# just when somebody does a search.
begin
  require 'xapian'
  $acts_as_xapian_bindings_available = true
rescue LoadError
  STDERR.puts "acts_as_xapian: No Ruby bindings for Xapian installed"
  $acts_as_xapian_bindings_available = false
end

module ActsAsXapian
  class NoXapianRubyBindingsError < StandardError; end

  # Offline indexing job queue model, create with migration made
  # using "script/generate acts_as_xapian" as described in ../README.txt
  class ActsAsXapianJob < ActiveRecord::Base; end

  ######################################################################
  # Module level variables
  # XXX must be some kind of cattr_accessor that can do this better
  def self.bindings_available
    $acts_as_xapian_bindings_available
  end

  ######################################################################
  # Main entry point, add acts_as_xapian to your model.

  module ActsMethods
    # See top of this file for docs
    def acts_as_xapian(options)
      # Give error only on queries if bindings not available
      return unless ActsAsXapian.bindings_available

      include InstanceMethods
      extend ClassMethods

      class_eval('def xapian_boost(term_type, term); 1; end') unless self.instance_methods.include?('xapian_boost')

      # extend has_many && has_and_belongs_to_many associations with our ProxyFinder to get scoped results
      # I've written a small report in the discussion group why this is the proper way of doing this.
      # see here: XXX - write it you lazy douche bag!
      self.reflections.each do |association_name, r|
        # skip if the associated model isn't indexed by acts_as_xapian
        next unless r.klass.respond_to?(:xapian?) && r.klass.xapian?
        # skip all associations except has_many and habtm
        next unless [:has_many, :has_and_belongs_to_many].include?(r.macro)

        # XXX todo:
        # extend the associated model xapian options with this term:
        # [proxy_reflection.primary_key_name.to_sym, <magically find a free capital letter>, proxy_reflection.primary_key_name]
        # otherwise this assumes that the associated & indexed model indexes this kind of term

        # but before you do the above, rewrite the options syntax... which imho is actually very ugly

        # XXX test this nifty feature on habtm!

        if r.options[:extend].nil?
          r.options[:extend] = [ProxyFinder]
        elsif !r.options[:extend].include?(ProxyFinder)
          r.options[:extend] << ProxyFinder
        end
      end

      cattr_accessor :xapian_options
      self.xapian_options = options

      ActsAsXapian::Index.init(self.class.to_s, options)

      after_save :xapian_mark_needs_index
      after_destroy :xapian_mark_needs_destroy
    end
  end

  module ClassMethods
    # Model.find_with_xapian("Search Term OR Phrase")
    # => Array of Records
    #
    # this can be used through association proxies /!\ DANGEROUS MAGIC /!\
    # example:
    # @document = Document.find(params[:id])
    # @document_pages = @document.pages.find_with_xapian("Search Term OR Phrase").compact # NOTE THE compact which removes nil objects from the array
    #
    # as seen here: http://pastie.org/270114
    def find_with_xapian(search_term, options = {})
      search_with_xapian(search_term, options).results.map {|x| x[:model] }
    end

    def search_with_xapian(search_term, options = {})
      ActsAsXapian::Search.new([self], search_term, options)
    end

    def with_xapian_scope(ids)
      with_scope(:find => {:conditions => {"#{self.table_name}.#{self.primary_key}" => ids}, :include => self.xapian_options[:eager_load]}) { yield }
    end

    # this method should return true if the integration of xapian on self is complete
    def xapian?
      self.included_modules.include?(InstanceMethods) && self.extended_by.include?(ClassMethods)
    end
  end

  ######################################################################
  # Instance methods that get injected into your model.

  module InstanceMethods
    # Used internally
    def xapian_document_term
      "#{self.class}-#{self.id}"
    end

    # Extract value of a field from the model
    def xapian_value(field, type = nil)
      value = self.respond_to?(field) ? self.send(field) : self[field] # Give preference to method if it exists
      case type
      when :date
        value = value.to_time if value.kind_of?(Date)
        raise "Only Time or Date types supported by acts_as_xapian for :date fields, got #{value.class}" unless value.kind_of?(Time)
        value.utc.strftime("%Y%m%d")
      when :boolean
        value ? true : false
      when :number
        value.nil? ? "" : Xapian::sortable_serialise(value.to_f)
      else
        value.to_s
      end
    end

    # Store record in the Xapian database
    def xapian_index
      # if we have a conditional function for indexing, call it and destroy the object's index entry if it fails
      if self.class.xapian_options.key?(:if) && !xapian_value(self.class.xapian_options[:if], :boolean)
        self.xapian_destroy
        return
      end

      # otherwise (re)write the Xapian record for the object
      doc = Xapian::Document.new
      WriteableIndex.term_generator.document = doc

      doc.data = self.xapian_document_term

      doc.add_term("M#{self.class}")
      doc.add_term("I#{doc.data}")
      (self.xapian_options[:terms] || []).each do |term|
        WriteableIndex.term_generator.increase_termpos # stop phrases spanning different text fields
        WriteableIndex.term_generator.index_text(xapian_value(term[0]), self.xapian_boost(:term, term[0]), term[1])
      end
      (self.xapian_options[:values] || []).each {|value| doc.add_value(value[1], xapian_value(value[0], value[3])) }
      (self.xapian_options[:texts] || []).each do |text|
        WriteableIndex.term_generator.increase_termpos # stop phrases spanning different text fields
        WriteableIndex.term_generator.index_text(xapian_value(text), self.xapian_boost(:text, text))
      end

      WriteableIndex.replace_document("I#{doc.data}", doc)
    end

    # Delete record from the Xapian database
    def xapian_destroy
      WriteableIndex.delete_document("I#{self.xapian_document_term}")
    end

    # Used to mark changes needed by batch indexer
    def xapian_mark_needs_index
      model = self.class.base_class.to_s
      model_id = self.id
      return false unless model_id # After save gets called even if save fails
      ActiveRecord::Base.transaction do
        found = ActsAsXapianJob.delete_all(["model = ? and model_id = ?", model, model_id])
        job = ActsAsXapianJob.new
        job.model = model
        job.model_id = model_id
        job.action = 'update'
        job.save!
      end
    end

    def xapian_mark_needs_destroy
      model = self.class.base_class.to_s
      model_id = self.id
      ActiveRecord::Base.transaction do
        found = ActsAsXapianJob.delete_all(["model = ? and model_id = ?", model, model_id])
        job = ActsAsXapianJob.new
        job.model = model
        job.model_id = model_id
        job.action = 'destroy'
        job.save!
      end
    end
  end

  module ProxyFinder
    def find_with_xapian(search_term, options = {})
      search_with_xapian(search_term, options).results.map {|x| x[:model] }
    end

    def search_with_xapian(search_term, options = {})
      ActsAsXapian::Search.new([proxy_reflection.klass], "#{proxy_reflection.primary_key_name}:#{proxy_owner.id} #{search_term}", options)
    end
  end
end

# Reopen ActiveRecord and include the acts_as_xapian method
ActiveRecord::Base.extend ActsAsXapian::ActsMethods
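
The :date branch of xapian_value stores dates as "%Y%m%d" strings in UTC, which works because such strings sort lexicographically in the same order as the dates themselves. The standalone sketch below reproduces that branch in plain Ruby so it can be run without Rails or Xapian. date_sort_key is a name made up for this example, and it uses getutc rather than utc so the caller's Time object is left untouched.

```ruby
require 'date'

# Standalone sketch of the :date handling in xapian_value.
# date_sort_key is a hypothetical name used only in this example.
def date_sort_key(value)
  value = value.to_time if value.kind_of?(Date)
  raise ArgumentError, "expected Time or Date, got #{value.class}" unless value.kind_of?(Time)
  value.getutc.strftime("%Y%m%d") # getutc returns a copy; the input is not modified
end

times = [
  Time.utc(2010, 3, 17, 23, 59),
  Time.utc(2009, 12, 31),
  Time.utc(2010, 1, 5)
]

keys = times.map { |t| date_sort_key(t) }
puts keys.inspect

# Sorting the string keys gives the same order as sorting the times.
puts keys.sort.inspect
puts times.sort.map { |t| date_sort_key(t) }.inspect
```

One thing to watch: a Date is converted with to_time, which gives local midnight, so in time zones ahead of UTC the stored key can land on the previous day.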
data/lib/acts_as_xapian/core_ext/array.rb
ADDED
@@ -0,0 +1,24 @@
module ActsAsXapian
  module ArrayExt
    # Creates an ActsAsXapian::Similar search passing through the options to the search
    #
    # The model classes to search are automatically generated off the classes of the
    # entries in the array. If you want more control over which models to search,
    # specify option :models and it will override the default behavior
    def search_similar(options = {})
      raise "All entries must be xapian models" unless all? {|i| i.class.respond_to?(:xapian?) && i.class.xapian? }
      models = options.delete(:models)
      ActsAsXapian::Similar.new(models || map {|i| i.class }.uniq, self, options)
    end

    # Runs an ActsAsXapian::Similar search passing back the returned models instead of the
    # search object. Takes all the same options as search_similar
    def find_similar(options = {})
      search_similar(options).results.map {|x| x[:model] }
    end
  end
end

Array.class_eval do
  include ActsAsXapian::ArrayExt
end