xapian_db 0.3.1 → 0.3.2

data/CHANGELOG.md ADDED
@@ -0,0 +1,33 @@
1
+ ##0.3.2 (December 10th, 2010)
2
+
3
+ Features:
4
+ - Moved the per_page option from Resultset.paginate to Database.search
5
+ - Added support for language settings (global and dynamic per object)
6
+ - Added support for xapian stemmers
7
+ - Removed the dependency on progressbar (but it is still used if available)
8
+ - Made the rebuild_xapian_index method silent by default (use :verbose => true to get status info)
9
+
10
+ ##0.3.1 (December 6th, 2010)
11
+
12
+ Bugfixes:
13
+
14
+ - Fixed the gemspec
15
+
16
+ ##0.3.0 (December 4th, 2010)
17
+
18
+ Features:
19
+ - Rails integration with configuration file (config/xapian_db.yml) and automatic setup
20
+
21
+ ##0.2.0 (December 1st, 2010)
22
+
23
+ Features:
24
+
25
+ - Blueprint configuration extended
26
+ - Adapter for Datamapper
27
+ - Search by attribute names
28
+ - Search with wildcards
29
+ - Document attributes can carry anything that is serializable by YAML
30
+
31
+ ##0.1.0 (November 23rd, 2010)
32
+
33
+ Proof of concept, not really useful for real world usage
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2010 Gernot Kogler
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,168 @@
1
+ = XapianDb
2
+
3
+ == What's in the box?
4
+
5
+ XapianDb is a Ruby gem that combines features of NoSQL databases and fulltext indexing into one piece. The result: Rich documents and very fast queries.
6
+ It is based on {Xapian}[http://xapian.org/], an efficient and powerful indexing library.
7
+ The gem is in very early development and not production ready yet.
8
+
9
+ XapianDb is inspired by {xapian-fu}[https://github.com/johnl/xapian-fu] and {xapit}[https://github.com/ryanb/xapit].
10
+ Thank you John and Ryan for your great work. It helped me learn to understand the Xapian library and I borrowed an idea
11
+ or two from you ;-)
12
+
13
+ == Why yet another indexing gem?
14
+
15
+ In the good old days I used {ferret}[https://github.com/dbalmain/ferret] and {acts_as_ferret}[https://github.com/jkraemer/acts_as_ferret]
16
+ as my fulltext indexing solution and everything was fine. But time moved on and Ferret didn't.
17
+
18
+ So I started to rethink fulltext indexing again. I looked for something that
19
+
20
+ * is under active development
21
+ * is fast
22
+ * is lightweight and easy to install / deploy
23
+ * is framework and database agnostic and works with pure POROs (plain old Ruby objects)
24
+ * is configurable anywhere, not just inside the model classes; I think that index configurations should not be part of the domain model
25
+ * supports document configuration at the class level, not the database level; each class has its own document structure
26
+ * integrates with popular Ruby / Rails ORMs like ActiveRecord or Datamapper through a plugin architecture
27
+ * returns rich document objects that do not necessarily need a database roundtrip to render the search results (but know how to get the underlying object, if needed)
28
+ * updates the index realtime (no scheduled reindexing jobs)
29
+ * supports all major features of a full text indexer, namely wildcards!!
30
+
31
+ I tried hard but I couldn't find such a thing so I decided to write it, based on the Xapian library.
32
+
33
+ == Getting started
34
+
35
+ If you want to use xapian_db in a Rails app, you need Rails 3 or newer.
36
+
37
+ === Install Xapian if not already installed
38
+
39
+ To use xapian_db, make sure you have the Xapian library and ruby bindings installed. At the time of this writing, the newest release of Xapian was 1.2.3. You might
40
+ want to adjust the URLs below to load the most current release of Xapian.
41
+ The example code works for OSX. On linux you might want to use wget instead of curl.
42
+
43
+ A future release of xapian_db might include the Xapian binaries and make this step obsolete.
44
+
45
+ ==== Install Xapian
46
+ curl -O http://oligarchy.co.uk/xapian/1.2.3/xapian-core-1.2.3.tar.gz
47
+ tar xzvf xapian-core-1.2.3.tar.gz
48
+ cd xapian-core-1.2.3
49
+ ./configure --prefix=/usr/local
50
+ make
51
+ sudo make install
52
+
53
+ ==== Install ruby bindings for Xapian
54
+ curl -O http://oligarchy.co.uk/xapian/1.2.3/xapian-bindings-1.2.3.tar.gz
55
+ tar xzvf xapian-bindings-1.2.3.tar.gz
56
+ cd xapian-bindings-1.2.3
57
+ ./configure --prefix=/usr/local XAPIAN_CONFIG=/usr/local/bin/xapian-config
58
+ make
59
+ sudo make install
60
+
61
+ The following steps assume that you are using xapian_db within a Rails app. The gem has an
62
+ example in the examples folder that shows how you can use xapian_db without Rails.
63
+
64
+ === Configure your databases
65
+
66
+ Without a config file, xapian_db creates the database in the db folder for development and production
67
+ environments. If you are in the test environment, xapian_db creates an in memory database.
68
+ It assumes you are using ActiveRecord.
69
+
70
+ You can override these defaults by placing a config file named 'xapian_db.yml' into your config folder. Here's an example:
71
+
72
+ # XapianDb configuration
73
+ defaults: &defaults
74
+ adapter: datamapper # Available adapters: :active_record, :datamapper
75
+ language: de # Global language; can be overridden for specific blueprints
76
+
77
+ development:
78
+ database: db/xapian_db/development
79
+ <<: *defaults
80
+
81
+ test:
82
+ database: ":memory:" # Use an in memory database for tests
83
+ <<: *defaults
84
+
85
+ production:
86
+ database: db/xapian_db/production
87
+ <<: *defaults
88
+
89
+ === Configure an index blueprint
90
+
91
+ In order to get your models indexed, you must configure a document blueprint for each class you want to index:
92
+
93
+ XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
94
+ blueprint.attribute :name, :weight => 10
95
+ blueprint.attribute :first_name
96
+ end
97
+
98
+ The example above assumes that you have a class <code>Person</code> with the methods <code>name</code> and <code>first_name</code>.
99
+ Attributes will get indexed and are stored in the documents. You will be able to access the name and the first name in your search results.
100
+
101
+ If you want to index additional data but do not need access to it from a search result, use the index method:
102
+
103
+ blueprint.index :remarks, :weight => 5
104
+
105
+ If you configure a class that has a language property, e.g.
106
+
107
+ class Person
108
+ attr_reader :language
109
+ end
110
+
111
+ you can configure the blueprint to use the language of the object when indexing:
112
+
113
+ XapianDb::DocumentBlueprint.setup(Person) do |blueprint|
114
+ blueprint.language_method :language
115
+ end
116
+
117
+ Don't worry if you have languages in your database that are not supported by Xapian. If the language is not supported, XapianDb
118
+ will fall back to the global language configuration or none, if you haven't configured one.
119
+
120
+ You can place this configuration anywhere, e.g. in an initializer.
121
+
122
+ === Update the index
123
+
124
+ xapian_db injects some helper methods into your configured model classes that update the index automatically
125
+ for you when you create, save or destroy models. If you already have models that should now go into the index,
126
+ use the method <code>rebuild_xapian_index</code>:
127
+
128
+ Person.rebuild_xapian_index
129
+
130
+ === Query the index
131
+
132
+ A simple query looks like this:
133
+
134
+ results = XapianDb.search("Foo")
135
+
136
+ You can use wildcards and boolean operators:
137
+
138
+ results = XapianDb.search("Fo* OR Baz")
139
+
140
+ You can query attributes:
141
+
142
+ results = XapianDb.search("name:Foo")
143
+
144
+ === Process the results
145
+
146
+ <code>XapianDb.search</code> returns a resultset object. You can access the number of hits directly:
147
+
148
+ results.size # Very fast, does not load the resulting documents
149
+
150
+ To access the found documents, get a page from the resultset:
151
+
152
+ page = results.paginate # Get the first page with 10 documents
153
+ page = results.paginate(:page => 2, :per_page => 20) # Get the second page with documents 21-40
154
+
155
+ Now you can access the documents:
156
+
157
+ doc = page.first
158
+ puts doc.domain_class # Get the type of the indexed object, e.g. "Person"
159
+ puts doc.name # We can access the configured attributes
160
+ person = doc.indexed_object # Access the object behind this doc (lazy loaded)
161
+
162
+
163
+ == What to expect from future releases
164
+
165
+ * multi language support (spelling correction, stop words)
166
+ * facet support
167
+ * will_paginate support
168
+ * asynchronous index writer based on {resque}[https://github.com/defunkt/resque] for production environments
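
The example `xapian_db.yml` in the README above relies on a YAML anchor (`&defaults`) and merge keys (`<<: *defaults`) so that each environment inherits the shared settings. As a minimal sketch of how such a file resolves per environment, using only Ruby's standard YAML library: the `load_xapian_config` helper below is hypothetical and not part of xapian_db, and the config text is inlined only to keep the example self-contained.

```ruby
require "yaml"

# Same structure as the README example, inlined for a self-contained sketch.
CONFIG_TEXT = <<-YAML
defaults: &defaults
  adapter: datamapper
  language: de

development:
  database: db/xapian_db/development
  <<: *defaults

test:
  database: ":memory:"
  <<: *defaults
YAML

# Hypothetical helper (not xapian_db API): returns the settings for one environment.
def load_xapian_config(text, env)
  # Newer Psych versions reject aliases in YAML.load, so prefer unsafe_load when it exists.
  all = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
  all.fetch(env)
end

TEST_CONFIG = load_xapian_config(CONFIG_TEXT, "test")
puts TEST_CONFIG.inspect
```

Each environment hash ends up with its own `database` entry plus the merged `adapter` and `language` values from `defaults`.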
@@ -1,16 +1,27 @@
1
1
  # encoding: utf-8
2
2
 
3
- # Adapter for ActiveRecord. To use it, simply set it as the
4
- # default for any DocumentBlueprint or a specific DocumentBlueprint
5
-
6
3
  module XapianDb
7
4
  module Adapters
8
-
5
+
6
+ # Adapter for ActiveRecord. To use it, configure it like this:
7
+ # XapianDb::Config.setup do |config|
8
+ # config.adapter :active_record
9
+ # end
10
+ # This adapter does the following:
11
+ # - adds the instance method <code>xapian_id</code> to an indexed class
12
+ # - adds the class method <code>rebuild_xapian_index</code> to an indexed class
13
+ # - adds an after save block to an indexed class to update the index
14
+ # - adds an after destroy block to an indexed class to update the index
15
+ # - adds the instance method <code>indexed_object</code> to the module that will be included
16
+ # in every found xapian document
17
+ # @author Gernot Kogler
18
+
9
19
  class ActiveRecordAdapter
10
20
 
11
21
  class << self
12
-
22
+
13
23
  # Implement the class helper methods
24
+ # @param [Class] klass The class to add the helper methods to
14
25
  def add_class_helper_methods_to(klass)
15
26
 
16
27
  klass.instance_eval do
@@ -18,56 +29,45 @@ module XapianDb
18
29
  define_method(:xapian_id) do
19
30
  "#{self.class}-#{self.id}"
20
31
  end
21
-
32
+
22
33
  end
23
-
34
+
24
35
  klass.class_eval do
25
-
36
+
26
37
  # add the after save logic
27
38
  after_save do
28
39
  XapianDb::Config.writer.index(self)
29
40
  end
30
-
41
+
31
42
  # add the after destroy logic
32
43
  after_destroy do
33
44
  XapianDb::Config.writer.unindex(self)
34
45
  end
35
46
 
36
47
  # Add a method to reindex all models of this class
37
- define_singleton_method(:rebuild_xapian_index) do
38
- # db = XapianDb::Adapters::ActiveRecordAdapter.database
39
- # # First, delete all docs of this class
40
- # db.delete_docs_of_class(klass)
41
- # obj_count = klass.count
42
- # puts "Reindexing #{obj_count} objects..."
43
- # pbar = ProgressBar.new("Status", obj_count)
44
- # klass.all.each do |obj|
45
- # doc = @@blueprint.indexer.build_document_for(obj)
46
- # db.store_doc(doc)
47
- # pbar.inc
48
- # end
49
- # db.commit
50
- XapianDb::Config.writer.reindex_class(klass)
48
+ define_singleton_method(:rebuild_xapian_index) do |options={}|
49
+ XapianDb::Config.writer.reindex_class(klass, options)
51
50
  end
52
51
  end
53
-
52
+
54
53
  end
55
-
56
- # Implement the document helper methods
54
+
55
+ # Implement the document helper methods on a module
56
+ # @param [Module] a_module The module to add the helper methods to
57
57
  def add_doc_helper_methods_to(a_module)
58
58
  a_module.instance_eval do
59
59
  # Implement access to the indexed object
60
60
  define_method :indexed_object do
61
- return @indexed_object unless @indexed_object.nil?
62
- # retrieve the object id from data
61
+ return @indexed_object unless @indexed_object.nil?
62
+ # retrieve the class and id from data
63
63
  klass_name, id = data.split("-")
64
64
  klass = Kernel.const_get(klass_name)
65
65
  @indexed_object = klass.find(id.to_i)
66
66
  end
67
67
  end
68
-
68
+
69
69
  end
70
-
70
+
71
71
  end
72
72
  end
73
73
  end
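
The `options` hash that `rebuild_xapian_index` now forwards to the writer is what allows the rebuild to be silent by default. The following is a rough sketch of how a writer could honor `:verbose` while treating progressbar as an optional dependency. The `reindex_all` method, its `out` parameter and the block argument are hypothetical stand-ins; this is not the gem's `reindex_class` implementation.

```ruby
# Optional dependency: load progressbar if the gem is installed, carry on if not.
begin
  require "progressbar"
rescue LoadError
  # No progress bar available; this sketch only prints plain status lines.
end

# Hypothetical stand-in for a writer's reindex logic (not the gem's code).
# objects: an array of objects to index; the block receives each object.
# out: any object responding to puts (defaults to standard output).
def reindex_all(objects, options = {}, out = $stdout)
  opts = { :verbose => false }.merge(options)
  out.puts "Reindexing #{objects.size} objects..." if opts[:verbose]
  indexed = 0
  objects.each do |obj|
    yield obj          # the real writer would build and store a document here
    indexed += 1
  end
  out.puts "Done (#{indexed} indexed)." if opts[:verbose]
  indexed
end

reindex_all(%w[a b c], :verbose => true) { |obj| obj }
```

Called without options, the method prints nothing, which matches the "silent by default" behaviour described in the changelog.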
@@ -1,16 +1,26 @@
1
1
  # encoding: utf-8
2
2
 
3
- # Adapter for datamapper. To use it, simply set it as the
4
- # default for any DocumentBlueprint or a specific DocumentBlueprint
5
-
6
3
  module XapianDb
7
4
  module Adapters
8
-
5
+
6
+ # Adapter for Datamapper. To use it, configure it like this:
7
+ # XapianDb::Config.setup do |config|
8
+ # config.adapter :datamapper
9
+ # end
10
+ # This adapter does the following:
11
+ # - adds the instance method <code>xapian_id</code> to an indexed class
12
+ # - adds the class method <code>rebuild_xapian_index</code> to an indexed class
13
+ # - adds an after save block to an indexed class to update the index
14
+ # - adds an after destroy block to an indexed class to update the index
15
+ # - adds the instance method <code>indexed_object</code> to the module that will be included
16
+ # in every found xapian document
17
+ # @author Gernot Kogler
9
18
  class DatamapperAdapter
10
19
 
11
20
  class << self
12
-
21
+
13
22
  # Implement the class helper methods
23
+ # @param [Class] klass The class to add the helper methods to
14
24
  def add_class_helper_methods_to(klass)
15
25
 
16
26
  klass.instance_eval do
@@ -18,44 +28,45 @@ module XapianDb
18
28
  define_method(:xapian_id) do
19
29
  "#{self.class}-#{self.id}"
20
30
  end
21
-
31
+
22
32
  end
23
-
33
+
24
34
  klass.class_eval do
25
-
35
+
26
36
  # add the after save logic
27
37
  after :save do
28
38
  XapianDb::Config.writer.index(self)
29
39
  end
30
-
40
+
31
41
  # add the after destroy logic
32
42
  after :destroy do
33
43
  XapianDb::Config.writer.unindex(self)
34
44
  end
35
45
 
36
46
  # Add a method to reindex all models of this class
37
- define_singleton_method(:rebuild_xapian_index) do
38
- XapianDb::Config.writer.reindex_class(self)
47
+ define_singleton_method(:rebuild_xapian_index) do |options={}|
48
+ XapianDb::Config.writer.reindex_class(self, options)
39
49
  end
40
50
  end
41
-
51
+
42
52
  end
43
-
44
- # Implement the document helper methods
53
+
54
+ # Implement the document helper methods on a module
55
+ # @param [Module] a_module The module to add the helper methods to
45
56
  def add_doc_helper_methods_to(a_module)
46
57
  a_module.instance_eval do
47
58
  # Implement access to the indexed object
48
59
  define_method :indexed_object do
49
- return @indexed_object unless @indexed_object.nil?
50
- # retrieve the object id from data
60
+ return @indexed_object unless @indexed_object.nil?
61
+ # retrieve the class and id from data
51
62
  klass_name, id = data.split("-")
52
63
  klass = Kernel.const_get(klass_name)
53
64
  @indexed_object = klass.get(id.to_i)
54
65
  end
55
66
  end
56
-
67
+
57
68
  end
58
-
69
+
59
70
  end
60
71
  end
61
72
  end
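
Both ORM adapters above build the document key as `"#{self.class}-#{self.id}"` and later recover the object by splitting that key on `-`. A small sketch of the round trip, assuming a hypothetical `Person` class whose in-memory `get` stands in for an ORM finder such as Datamapper's `get` or ActiveRecord's `find`:

```ruby
# Hypothetical model with an in-memory lookup standing in for an ORM finder.
class Person
  REGISTRY = {}
  attr_reader :id, :name

  def initialize(id, name)
    @id, @name = id, name
    REGISTRY[id] = self
  end

  def self.get(id)
    REGISTRY[id]
  end

  # Same key format as in the adapters: "<class name>-<id>"
  def xapian_id
    "#{self.class}-#{id}"
  end
end

# Mirrors the lookup in indexed_object: split the key, resolve the class, load by id.
def resolve_indexed_object(key)
  klass_name, id = key.split("-")
  klass = Kernel.const_get(klass_name)
  klass.get(id.to_i)
end

person = Person.new(42, "Kogler")
puts person.xapian_id
puts resolve_indexed_object(person.xapian_id).name
```

Because the key is split on `-` and the id is converted with `to_i`, this format presumes integer ids; an id that itself contains a hyphen would not survive the round trip.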
@@ -1,22 +1,30 @@
1
1
  # encoding: utf-8
2
2
 
3
- # The generic adapter is a universal adapter that can be used for any
4
- # ruby class. To use the generic adapter (which is the default),
5
- # configure the expression that generates a unique key from your objects
6
- # using the method 'unique_key'.
7
3
  module XapianDb
8
4
  module Adapters
9
-
5
+
6
+ # The generic adapter is a universal adapter that can be used for any
7
+ # ruby class. To use the generic adapter (which is the default),
8
+ # configure the expression that generates a unique key from your objects
9
+ # using the method 'unique_key'.
10
+ # This adapter does the following:
11
+ # - adds the instance method <code>xapian_id</code> to an indexed class
12
+ # @author Gernot Kogler
10
13
  class GenericAdapter
11
14
 
12
15
  class << self
13
-
16
+
14
17
  # Define the unique key expression
18
+ # @example Use the same unique key expression as the active record adapter (assuming your objects have an id)
19
+ # XapianDb::Adapters::GenericAdapter.unique_key do
20
+ # "#{self.class}-#{self.id}"
21
+ # end
15
22
  def unique_key(&block)
16
23
  @unique_key_block = block
17
24
  end
18
-
25
+
19
26
  # Implement the class helper methods
27
+ # @param [Class] klass The class to add the helper methods to
20
28
  def add_class_helper_methods_to(klass)
21
29
  raise "Unique key is not configured for generic adapter!" if @unique_key_block.nil?
22
30
  expression = @unique_key_block
@@ -26,16 +34,17 @@ module XapianDb
26
34
  end
27
35
  end
28
36
  end
29
-
30
- # Implement the document helper methods
37
+
38
+ # Implement the document helper methods on a module. So far there are none
39
+ # @param [Module] obj The module to add the helper methods to
31
40
  def add_doc_helper_methods_to(obj)
32
41
  # We have none so far
33
42
  end
34
-
43
+
35
44
  end
36
45
 
37
46
  end
38
-
47
+
39
48
  end
40
-
49
+
41
50
  end
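
The generic adapter stores the block given to `unique_key` and uses it later to give indexed classes a `xapian_id`. The lines that evaluate the block are not part of the hunks above, so the following is only a simplified re-creation of the pattern under that assumption: `KeySketch` and `Book` are hypothetical names, and evaluating the block with `instance_eval` is an assumed mechanism rather than the gem's confirmed source.

```ruby
# Simplified, hypothetical re-creation of the pattern (not the gem's source):
# a block is stored once and later evaluated in the context of each object.
module KeySketch
  class << self
    def unique_key(&block)
      @unique_key_block = block
    end

    def add_class_helper_methods_to(klass)
      raise "Unique key is not configured!" if @unique_key_block.nil?
      expression = @unique_key_block
      klass.class_eval do
        # instance_eval makes `self` inside the block the indexed object
        define_method(:xapian_id) { instance_eval(&expression) }
      end
    end
  end
end

# Hypothetical plain Ruby class without an ORM
class Book
  attr_reader :isbn

  def initialize(isbn)
    @isbn = isbn
  end
end

KeySketch.unique_key { "#{self.class}-#{isbn}" }
KeySketch.add_class_helper_methods_to(Book)
puts Book.new("12345").xapian_id
```

Storing a block rather than a method name lets the key be any expression over the object's state, which is what makes the adapter usable for classes that have no `id`.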
@@ -1,46 +1,61 @@
1
1
  # encoding: utf-8
2
2
 
3
- # Global configuration for XapianDb
4
- # @author Gernot Kogler
5
-
6
3
  module XapianDb
7
-
4
+
5
+ # Global configuration for XapianDb
6
+ # @example A typical configuration might look like this:
7
+ # XapianDb::Config.setup do |config|
8
+ # config.adapter :active_record
9
+ # config.writer :direct
10
+ # config.database "db/xapian_db"
11
+ # end
12
+ # @author Gernot Kogler
8
13
  class Config
9
14
 
10
- # ---------------------------------------------------------------------------------
15
+ # ---------------------------------------------------------------------------------
11
16
  # Singleton methods
12
- # ---------------------------------------------------------------------------------
17
+ # ---------------------------------------------------------------------------------
13
18
  class << self
14
19
 
20
+ # Configure global options for XapianDb.
21
+ # Availabe options:
22
+ # - adapter (see {XapianDb::Config#adapter})
23
+ # - writer (see {XapianDb::Config#writer})
24
+ # - database (see {XapianDb::Config#database})
25
+ # - language (see {XapianDb::Config#language})
26
+ # In a Rails app, you can configure XapianDb using a config file. See the README for the details
15
27
  def setup(&block)
16
28
  @config ||= Config.new
17
29
  yield @config if block_given?
18
30
  end
19
-
31
+
20
32
  # Install delegates for the config instance variables
21
- [:database, :adapter, :writer].each do |attr|
33
+ [:database, :adapter, :writer, :stemmer].each do |attr|
22
34
  define_method attr do
23
35
  @config.nil? ? nil : @config.instance_variable_get("@_#{attr}")
24
36
  end
25
- end
26
- end
37
+ end
38
+ end
27
39
 
28
- # ---------------------------------------------------------------------------------
40
+ # ---------------------------------------------------------------------------------
29
41
  # DSL methods
30
- # ---------------------------------------------------------------------------------
31
- attr_reader :_database, :_adapter, :_writer
32
-
33
- # Set the database; either pass a path to the file system or
34
- # the symbolic name "memory"
42
+ # ---------------------------------------------------------------------------------
43
+
44
+ #
45
+ attr_reader :_database, :_adapter, :_writer, :_stemmer
46
+
47
+ # Set the global database to use
48
+ # @param [String] path The path to the database. Either pass a file system path or :memory
49
+ # for an in memory database
35
50
  def database(path)
36
-
51
+
37
52
  # If the current database is a persistent database, we must release the
38
53
  # database and run the garbage collector to remove the write lock
39
54
  if @_database.is_a?(XapianDb::PersistentDatabase)
40
55
  @_database = nil
41
56
  GC.start
42
57
  end
43
-
58
+
44
59
  if path.to_sym == :memory
45
60
  @_database = XapianDb.create_db
46
61
  else
@@ -52,31 +67,49 @@ module XapianDb
52
67
  end
53
68
  end
54
69
  end
55
-
56
- # Define the adapter to use; the following adapters are available:
57
- # - :generic
58
- # - :active_record
59
- # - :datamapper
70
+
71
+ # Set the adapter
72
+ # @param [Symbol] type The adapter type; the following adapters are available:
73
+ # - :generic ({XapianDb::Adapters::GenericAdapter})
74
+ # - :active_record ({XapianDb::Adapters::ActiveRecordAdapter})
75
+ # - :datamapper ({XapianDb::Adapters::DatamapperAdapter})
60
76
  def adapter(type)
61
77
  # We try to guess the adapter name
62
78
  @_adapter = XapianDb::Adapters.const_get("#{camelize(type.to_s)}Adapter")
63
79
  end
64
80
 
65
- # Define the writer to use; the following adapters are available:
66
- # - :direct
67
- # More to come in a future release :-)
81
+ # Set the index writer
82
+ # @param [Symbol] type The writer type; the following writers are available:
83
+ # - :direct ({XapianDb::IndexWriters::DirectWriter})
84
+ # More to come in a future release
68
85
  def writer(type)
69
86
  # We try to guess the writer name
70
87
  @_writer = XapianDb::IndexWriters.const_get("#{camelize(type.to_s)}Writer")
71
88
  end
72
-
89
+
90
+ # Set the language
91
+ # @param [Symbol] lang The language; either pass the English name of the language
92
+ # or the two-letter ISO 639 code
93
+ # @example Use the English name of the language
94
+ # XapianDb::Config.setup do |config|
95
+ # config.language :german
96
+ # end
97
+ # @example Use the ISO code of the language
98
+ # XapianDb::Config.setup do |config|
99
+ # config.language :de
100
+ # end
101
+ # see http://xapian.org/docs/apidoc/html/classXapian_1_1Stem.html for supported languages
102
+ def language(lang)
103
+ @_stemmer = Xapian::Stem.new(lang.to_s)
104
+ end
105
+
73
106
  private
74
-
107
+
75
108
  # TODO: move this to a helper module
76
109
  def camelize(string)
77
110
  string.split(/[^a-z0-9]/i).map{|w| w.capitalize}.join
78
111
  end
79
-
112
+
80
113
  end
81
-
114
+
82
115
  end
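
`Config#adapter` and `Config#writer` both turn a symbol into a class by camelizing it and looking the result up with `const_get`. A self-contained sketch of that lookup, using a hypothetical `SketchAdapters` namespace in place of `XapianDb::Adapters` and a hypothetical `adapter_for` helper:

```ruby
# Hypothetical namespace standing in for XapianDb::Adapters
module SketchAdapters
  class GenericAdapter; end
  class ActiveRecordAdapter; end
  class DatamapperAdapter; end
end

# Same expression as the private camelize helper in the diff
def camelize(string)
  string.split(/[^a-z0-9]/i).map { |w| w.capitalize }.join
end

# Resolve an adapter symbol to a class by name, as Config#adapter does with const_get
def adapter_for(type)
  SketchAdapters.const_get("#{camelize(type.to_s)}Adapter")
end

puts camelize("active_record")
puts adapter_for(:datamapper)
```

An unknown symbol makes `const_get` raise a `NameError`, so a misspelled adapter name fails at configuration time rather than at the first indexing call.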