xapian_db 0.3.1 → 0.3.2

data/CHANGELOG.md ADDED
@@ -0,0 +1,33 @@
1
+ ##0.3.2 (December 10th, 2010)
2
+
3
+ Features:
4
+ - Moved the per_page option from Resultset.paginate to Database.search
5
+ - Added support for language settings (global and dynamic per object)
6
+ - Added support for xapian stemmers
7
+ - Removed the dependency on progressbar (but it is still used if available)
8
+ - Made the rebuild_xapian_index method silent by default (use :verbose => true to get status info)
9
+
10
+ ##0.3.1 (December 6th, 2010)
11
+
12
+ Bugfixes:
13
+
14
+ - Fixed the gemspec
15
+
16
+ ##0.3.0 (December 4th, 2010)
17
+
18
+ Features:
19
+ - Rails integration with configuration file (config/xapian_db.yml) and automatic setup
20
+
21
+ ##0.2.0 (December 1st, 2010)
22
+
23
+ Features:
24
+
25
+ - Blueprint configuration extended
26
+ - Adapter for Datamapper
27
+ - Search by attribute names
28
+ - Search with wildcards
29
+ - Document attributes can carry anything that is serializable by YAML
30
+
31
+ ##0.1.0 (November 23rd, 2010)
32
+
33
+ Proof of concept, not really useful for real world usage
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2010 Gernot Kogler
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,168 @@
1
+ = XapianDb
2
+
3
+ == What's in the box?
4
+
5
+ XapianDb is a ruby gem that combines features of nosql databases and fulltext indexing into one piece. The result: Rich documents and very fast queries.
6
+ It is based on {Xapian}[http://xapian.org/], an efficient and powerful indexing library.
7
+ The gem is in very early development and not production ready yet.
8
+
9
+ XapianDb is inspired by {xapian-fu}[https://github.com/johnl/xapian-fu] and {xapit}[https://github.com/ryanb/xapit].
10
+ Thank you John and Ryan for your great work. It helped me learn to understand the Xapian library, and I borrowed an idea
11
+ or two from you ;-)
12
+
13
+ == Why yet another indexing gem?
14
+
15
+ In the good old days I used {ferret}[https://github.com/dbalmain/ferret] and {acts_as_ferret}[https://github.com/jkraemer/acts_as_ferret]
16
+ as my fulltext indexing solution and everything was fine. But time moved on and Ferret didn't.
17
+
18
+ So I started to rethink fulltext indexing again. I looked for something that
19
+
20
+ * is under active development
21
+ * is fast
22
+ * is lightweight and easy to install / deploy
23
+ * is framework and database agnostic and works with pure POROs (plain old Ruby objects)
24
+ * is configurable anywhere, not just inside the model classes; I think that index configurations should not be part of the domain model
25
+ * supports document configuration at the class level, not the database level; each class has its own document structure
26
+ * integrates with popular Ruby / Rails ORMs like ActiveRecord or Datamapper through a plugin architecture
27
+ * returns rich document objects that do not necessarily need a database roundtrip to render the search results (but know how to get the underlying object, if needed)
28
+ * updates the index in real time (no scheduled reindexing jobs)
29
+ * supports all major features of a full text indexer, namely wildcards!!
30
+
31
+ I tried hard but I couldn't find such a thing so I decided to write it, based on the Xapian library.
32
+
33
+ == Getting started
34
+
35
+ If you want to use xapian_db in a Rails app, you need Rails 3 or newer.
36
+
37
+ === Install Xapian if not already installed
38
+
39
+ To use xapian_db, make sure you have the Xapian library and ruby bindings installed. At the time of this writing, the newest release of Xapian was 1.2.3. You might
40
+ want to adjust the URLs below to load the most current release of Xapian.
41
+ The example code works for OS X. On Linux you might want to use wget instead of curl.
42
+
43
+ A future release of xapian_db might include the Xapian binaries and make this step obsolete.
44
+
45
+ ==== Install Xapian
46
+ curl -O http://oligarchy.co.uk/xapian/1.2.3/xapian-core-1.2.3.tar.gz
47
+ tar xzvf xapian-core-1.2.3.tar.gz
48
+ cd xapian-core-1.2.3
49
+ ./configure --prefix=/usr/local
50
+ make
51
+ sudo make install
52
+
53
+ ==== Install ruby bindings for Xapian
54
+ curl -O http://oligarchy.co.uk/xapian/1.2.3/xapian-bindings-1.2.3.tar.gz
55
+ tar xzvf xapian-bindings-1.2.3.tar.gz
56
+ cd xapian-bindings-1.2.3
57
+ ./configure --prefix=/usr/local XAPIAN_CONFIG=/usr/local/bin/xapian-config
58
+ make
59
+ sudo make install
60
+
61
+ The following steps assume that you are using xapian_db within a Rails app. The gem has an
62
+ example in the examples folder that shows how you can use xapian_db without Rails.
63
+
64
+ === Configure your databases
65
+
66
+ Without a config file, xapian_db creates the database in the db folder for development and production
67
+ environments. If you are in the test environment, xapian_db creates an in memory database.
68
+ It assumes you are using ActiveRecord.
69
+
70
+ You can override these defaults by placing a config file named 'xapian_db.yml' into your config folder. Here's an example:
71
+
72
+ # XapianDb configuration
73
+ defaults: &defaults
74
+ adapter: datamapper # Available adapters: :active_record, :datamapper
75
+ language: de # Global language; can be overridden for specific blueprints
76
+
77
+ development:
78
+ database: db/xapian_db/development
79
+ <<: *defaults
80
+
81
+ test:
82
+ database: ":memory:" # Use an in memory database for tests
83
+ <<: *defaults
84
+
85
+ production:
86
+ database: db/xapian_db/production
87
+ <<: *defaults
88
+
89
+ === Configure an index blueprint
90
+
91
+ In order to get your models indexed, you must configure a document blueprint for each class you want to index:
92
+
93
+ XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
94
+ blueprint.attribute :name, :weight => 10
95
+ blueprint.attribute :first_name
96
+ end
97
+
98
+ The example above assumes that you have a class <code>Person</code> with the methods <code>name</code> and <code>first_name</code>.
99
+ Attributes will get indexed and are stored in the documents. You will be able to access the name and the first name in your search results.
100
+
101
+ If you want to index additional data but do not need access to it from a search result, use the index method:
102
+
103
+ blueprint.index :remarks, :weight => 5
104
+
105
+ If you configure a class that has a language property, e.g.
106
+
107
+ class Person
108
+ attr_reader :language
109
+ end
110
+
111
+ you can configure the blueprint to use the language of the object when indexing:
112
+
113
+ XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
114
+ blueprint.language_method :language
115
+ end
116
+
117
+ Don't worry if you have languages in your database that are not supported by Xapian. If the language is not supported, XapianDb
118
+ will fall back to the global language configuration or none, if you haven't configured one.
119
+
120
+ You can place this configuration anywhere, e.g. in an initializer.
121
+
122
+ === Update the index
123
+
124
+ xapian_db injects some helper methods into your configured model classes that update the index automatically
125
+ for you when you create, save or destroy models. If you already have models that should now go into the index,
126
+ use the method <code>rebuild_xapian_index</code>:
127
+
128
+ Person.rebuild_xapian_index
129
+
130
+ === Query the index
131
+
132
+ A simple query looks like this:
133
+
134
+ results = XapianDb.search("Foo")
135
+
136
+ You can use wildcards and boolean operators:
137
+
138
+ results = XapianDb.search("Fo* OR Baz")
139
+
140
+ You can query attributes:
141
+
142
+ results = XapianDb.search("name:Foo")
143
+
144
+ === Process the results
145
+
146
+ <code>XapianDb.search</code> returns a resultset object. You can access the number of hits directly:
147
+
148
+ results.size # Very fast, does not load the resulting documents
149
+
150
+ To access the found documents, get a page from the resultset:
151
+
152
+ page = results.paginate # Get the first page with 10 documents
153
+ page = results.paginate(:page => 2, :per_page => 20) # Get the second page with documents 21-40
154
+
155
+ Now you can access the documents:
156
+
157
+ doc = page.first
158
+ puts doc.domain_class # Get the type of the indexed object, e.g. "Person"
159
+ puts doc.name # We can access the configured attributes
160
+ person = doc.indexed_object # Access the object behind this doc (lazy loaded)
161
+
162
+
163
+ == What to expect from future releases
164
+
165
+ * multi language support (spelling correction, stop words)
166
+ * facet support
167
+ * will_paginate support
168
+ * asynchronous index writer based on {resque}[https://github.com/defunkt/resque] for production environments
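The `xapian_db.yml` example in the README above relies on a YAML anchor (`&defaults`) and merge keys (`<<: *defaults`). The following is a minimal sketch of how such a file resolves per environment. It uses only Ruby's standard `yaml` library and is not xapian_db's own loader, which this diff does not show. The `settings_for` helper and the `CONFIG_YAML` constant are hypothetical names introduced for illustration.

```ruby
require "yaml"

# The same structure as the README example, inlined so the sketch is self-contained.
CONFIG_YAML = <<~YAML
  defaults: &defaults
    adapter: datamapper
    language: de

  development:
    database: db/xapian_db/development
    <<: *defaults

  test:
    database: ":memory:"
    <<: *defaults
YAML

# Hypothetical helper: return the merged settings for one environment.
# aliases: true is required because the document uses an anchor and aliases.
def settings_for(env, yaml = CONFIG_YAML)
  YAML.safe_load(yaml, aliases: true).fetch(env)
end

dev = settings_for("development")
puts dev["adapter"]   # inherited from the defaults block
puts dev["database"]  # defined by the environment itself
```

Each environment ends up with its own `database` plus every key from `defaults`, so shared options only need to be written once.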
@@ -1,16 +1,27 @@
1
1
  # encoding: utf-8
2
2
 
3
- # Adapter for ActiveRecord. To use it, simply set it as the
4
- # default for any DocumentBlueprint or a specific DocumentBlueprint
5
-
6
3
  module XapianDb
7
4
  module Adapters
8
-
5
+
6
+ # Adapter for ActiveRecord. To use it, configure it like this:
7
+ # XapianDb::Config.setup do |config|
8
+ # config.adapter :active_record
9
+ # end
10
+ # This adapter does the following:
11
+ # - adds the instance method <code>xapian_id</code> to an indexed class
12
+ # - adds the class method <code>rebuild_xapian_index</code> to an indexed class
13
+ # - adds an after save block to an indexed class to update the index
14
+ # - adds an after destroy block to an indexed class to update the index
15
+ # - adds the instance method <code>indexed_object</code> to the module that will be included
16
+ # in every found xapian document
17
+ # @author Gernot Kogler
18
+
9
19
  class ActiveRecordAdapter
10
20
 
11
21
  class << self
12
-
22
+
13
23
  # Implement the class helper methods
24
+ # @param [Class] klass The class to add the helper methods to
14
25
  def add_class_helper_methods_to(klass)
15
26
 
16
27
  klass.instance_eval do
@@ -18,56 +29,45 @@ module XapianDb
18
29
  define_method(:xapian_id) do
19
30
  "#{self.class}-#{self.id}"
20
31
  end
21
-
32
+
22
33
  end
23
-
34
+
24
35
  klass.class_eval do
25
-
36
+
26
37
  # add the after save logic
27
38
  after_save do
28
39
  XapianDb::Config.writer.index(self)
29
40
  end
30
-
41
+
31
42
  # add the after destroy logic
32
43
  after_destroy do
33
44
  XapianDb::Config.writer.unindex(self)
34
45
  end
35
46
 
36
47
  # Add a method to reindex all models of this class
37
- define_singleton_method(:rebuild_xapian_index) do
38
- # db = XapianDb::Adapters::ActiveRecordAdapter.database
39
- # # First, delete all docs of this class
40
- # db.delete_docs_of_class(klass)
41
- # obj_count = klass.count
42
- # puts "Reindexing #{obj_count} objects..."
43
- # pbar = ProgressBar.new("Status", obj_count)
44
- # klass.all.each do |obj|
45
- # doc = @@blueprint.indexer.build_document_for(obj)
46
- # db.store_doc(doc)
47
- # pbar.inc
48
- # end
49
- # db.commit
50
- XapianDb::Config.writer.reindex_class(klass)
48
+ define_singleton_method(:rebuild_xapian_index) do |options={}|
49
+ XapianDb::Config.writer.reindex_class(klass, options)
51
50
  end
52
51
  end
53
-
52
+
54
53
  end
55
-
56
- # Implement the document helper methods
54
+
55
+ # Implement the document helper methods on a module
56
+ # @param [Module] a_module The module to add the helper methods to
57
57
  def add_doc_helper_methods_to(a_module)
58
58
  a_module.instance_eval do
59
59
  # Implement access to the indexed object
60
60
  define_method :indexed_object do
61
- return @indexed_object unless @indexed_object.nil?
62
- # retrieve the object id from data
61
+ return @indexed_object unless @indexed_object.nil?
62
+ # retrieve the class and id from data
63
63
  klass_name, id = data.split("-")
64
64
  klass = Kernel.const_get(klass_name)
65
65
  @indexed_object = klass.find(id.to_i)
66
66
  end
67
67
  end
68
-
68
+
69
69
  end
70
-
70
+
71
71
  end
72
72
  end
73
73
  end
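The adapter above stores a key of the form `"#{self.class}-#{self.id}"` and later splits it again in `indexed_object`. The round trip can be sketched without ActiveRecord or Xapian. `Article` and `FakeDocument` below are hypothetical stand-ins, and `Article.find` only mimics the finder the real adapter calls.

```ruby
# Stand-in for an ActiveRecord model; only `id` and `find` are needed here.
class Article
  attr_reader :id, :title

  STORE = {}

  def initialize(id, title)
    @id = id
    @title = title
    STORE[id] = self
  end

  # Mimics Model.find(id)
  def self.find(id)
    STORE.fetch(id)
  end

  # Same expression the adapter installs via define_method(:xapian_id)
  def xapian_id
    "#{self.class}-#{id}"
  end
end

# Hypothetical stand-in for a found document; `data` holds the stored key.
class FakeDocument
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Same steps as the adapter: split the key, resolve the class, find, memoize.
  def indexed_object
    return @indexed_object unless @indexed_object.nil?
    klass_name, id = data.split("-")
    klass = Kernel.const_get(klass_name)
    @indexed_object = klass.find(id.to_i)
  end
end

article = Article.new(42, "Indexing with Xapian")
doc = FakeDocument.new(article.xapian_id)
puts doc.data                  # the key stored with the document
puts doc.indexed_object.title  # the object resolved from that key
```

Because the key is split on `-` and the id is converted with `to_i`, this scheme assumes integer ids; keys that themselves contain a hyphen would not survive the round trip.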
@@ -1,16 +1,26 @@
1
1
  # encoding: utf-8
2
2
 
3
- # Adapter for datamapper. To use it, simply set it as the
4
- # default for any DocumentBlueprint or a specific DocumentBlueprint
5
-
6
3
  module XapianDb
7
4
  module Adapters
8
-
5
+
6
+ # Adapter for Datamapper. To use it, configure it like this:
7
+ # XapianDb::Config.setup do |config|
8
+ # config.adapter :datamapper
9
+ # end
10
+ # This adapter does the following:
11
+ # - adds the instance method <code>xapian_id</code> to an indexed class
12
+ # - adds the class method <code>rebuild_xapian_index</code> to an indexed class
13
+ # - adds an after save block to an indexed class to update the index
14
+ # - adds an after destroy block to an indexed class to update the index
15
+ # - adds the instance method <code>indexed_object</code> to the module that will be included
16
+ # in every found xapian document
17
+ # @author Gernot Kogler
9
18
  class DatamapperAdapter
10
19
 
11
20
  class << self
12
-
21
+
13
22
  # Implement the class helper methods
23
+ # @param [Class] klass The class to add the helper methods to
14
24
  def add_class_helper_methods_to(klass)
15
25
 
16
26
  klass.instance_eval do
@@ -18,44 +28,45 @@ module XapianDb
18
28
  define_method(:xapian_id) do
19
29
  "#{self.class}-#{self.id}"
20
30
  end
21
-
31
+
22
32
  end
23
-
33
+
24
34
  klass.class_eval do
25
-
35
+
26
36
  # add the after save logic
27
37
  after :save do
28
38
  XapianDb::Config.writer.index(self)
29
39
  end
30
-
40
+
31
41
  # add the after destroy logic
32
42
  after :destroy do
33
43
  XapianDb::Config.writer.unindex(self)
34
44
  end
35
45
 
36
46
  # Add a method to reindex all models of this class
37
- define_singleton_method(:rebuild_xapian_index) do
38
- XapianDb::Config.writer.reindex_class(self)
47
+ define_singleton_method(:rebuild_xapian_index) do |options={}|
48
+ XapianDb::Config.writer.reindex_class(self, options)
39
49
  end
40
50
  end
41
-
51
+
42
52
  end
43
-
44
- # Implement the document helper methods
53
+
54
+ # Implement the document helper methods on a module
55
+ # @param [Module] a_module The module to add the helper methods to
45
56
  def add_doc_helper_methods_to(a_module)
46
57
  a_module.instance_eval do
47
58
  # Implement access to the indexed object
48
59
  define_method :indexed_object do
49
- return @indexed_object unless @indexed_object.nil?
50
- # retrieve the object id from data
60
+ return @indexed_object unless @indexed_object.nil?
61
+ # retrieve the class and id from data
51
62
  klass_name, id = data.split("-")
52
63
  klass = Kernel.const_get(klass_name)
53
64
  @indexed_object = klass.get(id.to_i)
54
65
  end
55
66
  end
56
-
67
+
57
68
  end
58
-
69
+
59
70
  end
60
71
  end
61
72
  end
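Both adapters now define `rebuild_xapian_index` with an optional options hash and hand it on to the writer. A minimal sketch of that pattern follows. `FakeWriter` is a hypothetical stand-in for `XapianDb::Config.writer`, and the way it treats `:verbose` is an assumption based on the changelog entry, not the gem's code.

```ruby
# Hypothetical stand-in for the index writer; the real writer is not part of this diff.
module FakeWriter
  def self.reindex_class(klass, options = {})
    # Assumed default: silent unless :verbose => true is passed
    opts = { :verbose => false }.merge(options)
    puts "Reindexing #{klass.name}..." if opts[:verbose]
    opts
  end
end

class Recipe; end

Recipe.class_eval do
  # Mirrors the adapters: a class-level method with an optional options hash
  define_singleton_method(:rebuild_xapian_index) do |options = {}|
    FakeWriter.reindex_class(self, options)
  end
end

silent  = Recipe.rebuild_xapian_index                    # prints nothing
verbose = Recipe.rebuild_xapian_index(:verbose => true)  # prints a status line
```

Giving the block parameter a default of `{}` keeps existing calls without arguments working while allowing new options to be passed through.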
@@ -1,22 +1,30 @@
1
1
  # encoding: utf-8
2
2
 
3
- # The generic adapter is a universal adapater that can be used for any
4
- # ruby class. To use the generic adapter (which is the default),
5
- # configure the expression that generates a unique key from your objects
6
- # using the method 'unique_key'.
7
3
  module XapianDb
8
4
  module Adapters
9
-
5
+
6
+ # The generic adapter is a universal adapter that can be used for any
7
+ # ruby class. To use the generic adapter (which is the default),
8
+ # configure the expression that generates a unique key from your objects
9
+ # using the method 'unique_key'.
10
+ # This adapter does the following:
11
+ # - adds the instance method <code>xapian_id</code> to an indexed class
12
+ # @author Gernot Kogler
10
13
  class GenericAdapter
11
14
 
12
15
  class << self
13
-
16
+
14
17
  # Define the unique key expression
18
+ # @example Use the same unique expression as the active record adapter (assuming your objects have an id)
19
+ # XapianDb::Adapters::GenericAdapter.unique_key do
20
+ # "#{self.class}-#{self.id}"
21
+ # end
15
22
  def unique_key(&block)
16
23
  @unique_key_block = block
17
24
  end
18
-
25
+
19
26
  # Implement the class helper methods
27
+ # @param [Class] klass The class to add the helper methods to
20
28
  def add_class_helper_methods_to(klass)
21
29
  raise "Unique key is not configured for generic adapter!" if @unique_key_block.nil?
22
30
  expression = @unique_key_block
@@ -26,16 +34,17 @@ module XapianDb
26
34
  end
27
35
  end
28
36
  end
29
-
30
- # Implement the document helper methods
37
+
38
+ # Implement the document helper methods on a module. So far there are none
39
+ # @param [Module] a_module The module to add the helper methods to
31
40
  def add_doc_helper_methods_to(obj)
32
41
  # We have none so far
33
42
  end
34
-
43
+
35
44
  end
36
45
 
37
46
  end
38
-
47
+
39
48
  end
40
-
49
+
41
50
  end
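The hunk that turns the stored `unique_key` block into a `xapian_id` method is not shown in this diff. As a hedged sketch of one way such a block can become an instance method, the following uses `define_method`, which runs the block with `self` set to the instance. `MiniGenericAdapter` and `Book` are hypothetical and are not the gem's implementation.

```ruby
# Hypothetical miniature of a generic adapter; not the gem's actual code.
class MiniGenericAdapter
  class << self
    # Remember the block that builds a unique key
    def unique_key(&block)
      @unique_key_block = block
    end

    def add_class_helper_methods_to(klass)
      raise "Unique key is not configured for generic adapter!" if @unique_key_block.nil?
      expression = @unique_key_block
      # define_method evaluates the block in the context of each instance
      klass.send(:define_method, :xapian_id, &expression)
    end
  end
end

Book = Struct.new(:id)

MiniGenericAdapter.unique_key { "#{self.class}-#{self.id}" }
MiniGenericAdapter.add_class_helper_methods_to(Book)
puts Book.new(7).xapian_id
```

The block is written once at configuration time but evaluated per object, which is why it can refer to `self.class` and `self.id`.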
@@ -1,46 +1,61 @@
1
1
  # encoding: utf-8
2
2
 
3
- # Global configuration for XapianDb
4
- # @author Gernot Kogler
5
-
6
3
  module XapianDb
7
-
4
+
5
+ # Global configuration for XapianDb
6
+ # @example A typical configuration might look like this:
7
+ # XapianDb::Config.setup do |config|
8
+ # config.adapter :active_record
9
+ # config.writer :direct
10
+ # config.database "db/xapian_db"
11
+ # end
12
+ # @author Gernot Kogler
8
13
  class Config
9
14
 
10
- # ---------------------------------------------------------------------------------
15
+ # ---------------------------------------------------------------------------------
11
16
  # Singleton methods
12
- # ---------------------------------------------------------------------------------
17
+ # ---------------------------------------------------------------------------------
13
18
  class << self
14
19
 
20
+ # Configure global options for XapianDb.
21
+ # Available options:
22
+ # - adapter (see {XapianDb::Config#adapter})
23
+ # - writer (see {XapianDb::Config#writer})
24
+ # - database (see {XapianDb::Config#database})
25
+ # - language (see {XapianDb::Config#language})
26
+ # In a Rails app, you can configure XapianDb using a config file. See the README for the details
15
27
  def setup(&block)
16
28
  @config ||= Config.new
17
29
  yield @config if block_given?
18
30
  end
19
-
31
+
20
32
  # Install delegates for the config instance variables
21
- [:database, :adapter, :writer].each do |attr|
33
+ [:database, :adapter, :writer, :stemmer].each do |attr|
22
34
  define_method attr do
23
35
  @config.nil? ? nil : @config.instance_variable_get("@_#{attr}")
24
36
  end
25
- end
26
- end
37
+ end
38
+ end
27
39
 
28
- # ---------------------------------------------------------------------------------
40
+ # ---------------------------------------------------------------------------------
29
41
  # DSL methods
30
- # ---------------------------------------------------------------------------------
31
- attr_reader :_database, :_adapter, :_writer
32
-
33
- # Set the database; either pass a path to the file system or
34
- # the symbolic name "memory"
42
+ # ---------------------------------------------------------------------------------
43
+
44
+ #
45
+ attr_reader :_database, :_adapter, :_writer, :_stemmer
46
+
47
+ # Set the global database to use
48
+ # @param [String] path The path to the database. Either pass a file system path or :memory
49
+ # for an in memory database
35
50
  def database(path)
36
-
51
+
37
52
  # If the current database is a persistent database, we must release the
38
53
  # database and run the garbage collector to remove the write lock
39
54
  if @_database.is_a?(XapianDb::PersistentDatabase)
40
55
  @_database = nil
41
56
  GC.start
42
57
  end
43
-
58
+
44
59
  if path.to_sym == :memory
45
60
  @_database = XapianDb.create_db
46
61
  else
@@ -52,31 +67,49 @@ module XapianDb
52
67
  end
53
68
  end
54
69
  end
55
-
56
- # Define the adapter to use; the following adapters are available:
57
- # - :generic
58
- # - :active_record
59
- # - :datamapper
70
+
71
+ # Set the adapter
72
+ # @param [Symbol] type The adapter type; the following adapters are available:
73
+ # - :generic ({XapianDb::Adapters::GenericAdapter})
74
+ # - :active_record ({XapianDb::Adapters::ActiveRecordAdapter})
75
+ # - :datamapper ({XapianDb::Adapters::DatamapperAdapter})
60
76
  def adapter(type)
61
77
  # We try to guess the adapter name
62
78
  @_adapter = XapianDb::Adapters.const_get("#{camelize(type.to_s)}Adapter")
63
79
  end
64
80
 
65
- # Define the writer to use; the following adapters are available:
66
- # - :direct
67
- # More to come in a future release :-)
81
+ # Set the index writer
82
+ # @param [Symbol] type The writer type; the following writers are available:
83
+ # - :direct ({XapianDb::IndexWriters::DirectWriter})
84
+ # More to come in a future release
68
85
  def writer(type)
69
86
  # We try to guess the writer name
70
87
  @_writer = XapianDb::IndexWriters.const_get("#{camelize(type.to_s)}Writer")
71
88
  end
72
-
89
+
90
+ # Set the language
91
+ # @param [Symbol] lang The language; either pass the English name of the language
92
+ # or the two-letter ISO 639 code
93
+ # @example Use the English name of the language
94
+ # XapianDb::Config.setup do |config|
95
+ # config.language :german
96
+ # end
97
+ # @example Use the ISO code of the language
98
+ # XapianDb::Config.setup do |config|
99
+ # config.language :de
100
+ # end
101
+ # see http://xapian.org/docs/apidoc/html/classXapian_1_1Stem.html for supported languages
102
+ def language(lang)
103
+ @_stemmer = Xapian::Stem.new(lang.to_s)
104
+ end
105
+
73
106
  private
74
-
107
+
75
108
  # TODO: move this to a helper module
76
109
  def camelize(string)
77
110
  string.split(/[^a-z0-9]/i).map{|w| w.capitalize}.join
78
111
  end
79
-
112
+
80
113
  end
81
-
114
+
82
115
  end
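`Config#adapter` and `Config#writer` both "guess" a class name by camelizing the given symbol and looking the result up with `const_get`. The lookup can be sketched in isolation as follows. The `Demo` namespace, its empty adapter classes and the `adapter_for` helper are hypothetical; only the `camelize` expression is taken from the code above.

```ruby
module Demo
  module Adapters
    class GenericAdapter; end
    class ActiveRecordAdapter; end
    class DatamapperAdapter; end
  end

  # Same expression as the private camelize helper in Config
  def self.camelize(string)
    string.split(/[^a-z0-9]/i).map { |w| w.capitalize }.join
  end

  # Resolve e.g. :active_record to Demo::Adapters::ActiveRecordAdapter
  def self.adapter_for(type)
    Adapters.const_get("#{camelize(type.to_s)}Adapter")
  end
end

puts Demo.adapter_for(:active_record)
puts Demo.adapter_for(:datamapper)
```

An unknown symbol makes `const_get` raise a `NameError`, so a misspelled adapter name fails at configuration time rather than later during indexing.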