mongoid-sphinx 0.0.1

data/README.markdown ADDED
@@ -0,0 +1,135 @@
1
+ # MongoidSphinx
2
+
3
+ ## Fork info
4
+
5
+ This is a fork of http://github.com/burke/mongosphinx with many changes to simplify and support Mongoid.
6
+
7
+ ## General info
8
+
9
+ The MongoidSphinx library implements an interface between MongoDB and Sphinx
10
+ supporting Mongoid to automatically index objects in Sphinx. It tries to
11
+ be as transparent as possible: just an additional method in Mongoid
12
+ and some Sphinx configuration are needed to get going.
13
+
14
+ ## Prerequisites
15
+
16
+ MongoidSphinx needs gems Mongoid and Riddle as well as a running Sphinx
17
+ and a MongoDB installation. Just add this to your Gemfile:
18
+
19
+ gem 'riddle'
20
+ gem 'mongoid'
21
+ gem 'mongoid-sphinx', :require => 'mongoid_sphinx'
22
+
23
+ No additional configuration is needed for interfacing with MongoDB: setup is
24
+ done when Mongoid is able to talk to the MongoDB server.
25
+
26
+ A proper "sphinx.conf" file and a script for retrieving index data have to
27
+ be provided for interfacing with Sphinx: sorry, no ThinkingSphinx-like
28
+ magic... :-) Depending on the amount of data, more than one index may be used
29
+ and indexes may be consolidated from time to time.
30
+
31
+ This is a sample configuration for a single "main" index:
32
+
33
+ searchd {
34
+ address = 0.0.0.0
35
+ port = 3312
36
+
37
+ log = ./sphinx/searchd.log
38
+ query_log = ./sphinx/query.log
39
+ pid_file = ./sphinx/searchd.pid
40
+ }
41
+
42
+ source mongoblog {
43
+ type = xmlpipe2
44
+
45
+ xmlpipe_command = rake sphinx:genxml --silent
46
+ }
47
+
48
+ index mongoblog {
49
+ source = mongoblog
50
+
51
+ charset_type = utf-8
52
+ path = ./sphinx/sphinx_index_main
53
+ }
54
+
55
+ Notice the line "xmlpipe_command =". This is what the indexer runs to generate
56
+ its input. You can change this to whatever works best for you, but I set it up as
57
+ a rake task in `lib/tasks/sphinx.rake`, shown below.
58
+
59
+ Only the fields passed to `search_index` in the model are exported. Performance tends to suffer if you export
60
+ everything, so you'll probably want to list just the fields you're indexing.
61
+
62
+ namespace :sphinx do
63
+ task :genxml => :environment do
64
+ MongoidSphinx::Indexer::XMLDocset.stream(Food)
65
+ end
66
+ end
67
+
68
+ This streams the collection with a MongoDB cursor instead of paging through it with skip/offset. See: http://groups.google.com/group/mongodb-user/browse_thread/thread/35f01db45ea3b0bd/96ebc49b511a6b41?lnk=gst&q=skip#96ebc49b511a6b41
69
+
70
+ ## Models
71
+
72
+ Use method _search_index_ to enable indexing of a model. You must provide a list of
73
+ attribute keys.
74
+
75
+ A side effect of loading MongoidSphinx is that it overrides Mongoid's
76
+ default ID generation: Sphinx only allows numeric IDs, so MongoidSphinx
77
+ gives new objects an ID made of the constant 100000000000000000000000 plus a
78
+ random integer below 4294967294. Again: only objects with such IDs can be
79
+ indexed, due to internal restrictions of Sphinx.
80
+
81
+ Sample:
82
+
83
+ class Post
84
+ include Mongoid::Sphinx
85
+
86
+ field :title
87
+ field :body
88
+
89
+ search_index :title, :body
90
+ end
91
+
92
+ You must also create a config/sphinx.yml file with the host and port of your searchd process like so:
93
+
94
+ development:
95
+ address: localhost
96
+ port: 3312
97
+
98
+ staging:
99
+ address: localhost
100
+ port: 3312
101
+
102
+ production:
103
+ address: localhost
104
+ port: 3312
105
+
106
+ ## Queries
107
+
108
+ An additional class method `search` is added to each
109
+ search indexed model. This method takes a Sphinx query like
110
+ `foo @title bar`, runs it within the context of the current class and returns
111
+ an Array of matching MongoDB documents.
112
+
113
+ Samples:
114
+
115
+ Post.search('first')
116
+ => [...]
117
+
118
+ post = Post.search('this is @title post').first
119
+ post.title
120
+ => "First Post"
121
+ post.class
122
+ => Post
123
+
124
+ Additional options _:match_mode_, _:limit_, _:max_matches_ and
125
+ _:sort_by_ can be provided to customize the behavior of Riddle.
126
+ Option _:raw_ can be set to _true_ to do no lookup of the
127
+ document IDs but return the raw IDs instead.
128
+
129
+ Sample:
130
+
131
+ Post.search('my post', :limit => 100)
132
+
133
+ ## Copyright
134
+
135
+ Copyright (c) 2010 RedBeard Tech. See LICENSE for details.
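A note on the `search_index` call from the Models section of the README: the gem's own module (in `lib/mongoid_sphinx/mongoid/sphinx.rb`) relies on `ActiveSupport::Concern` and `cattr_accessor`. The dependency-free sketch below only illustrates the general idea of a class-level macro that records which fields to export. The names `SearchIndexSketch` and the plain `Post` class are hypothetical stand-ins, and the sketch stores the list per class rather than in a class variable.

```ruby
# Hypothetical stand-in for Mongoid::Sphinx; not the gem's actual module.
module SearchIndexSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Record the attribute keys that should be exported to Sphinx.
    def search_index(*fields)
      @search_fields = fields
    end

    # Fields registered via search_index, or an empty list if none were.
    def search_fields
      @search_fields || []
    end
  end
end

# Plain Ruby class standing in for a Mongoid document.
class Post
  include SearchIndexSketch
  search_index :title, :body
end
```

With this in place, `Post.search_fields` returns the list that an indexer would iterate over when writing the schema and the documents.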
@@ -0,0 +1,8 @@
1
+ require "mongoid"
2
+ require "riddle"
3
+
4
+ require 'mongoid_sphinx/configuration'
5
+ require 'mongoid_sphinx/indexer'
6
+ require 'mongoid_sphinx/multi_attribute'
7
+ require 'mongoid_sphinx/mongoid/identity'
8
+ require 'mongoid_sphinx/mongoid/sphinx'
@@ -0,0 +1,31 @@
1
+ require 'erb'
2
+ require 'singleton'
3
+
4
+ module MongoidSphinx
5
+ class Configuration
6
+ include Singleton
7
+
8
+ attr_accessor :address, :port, :configuration
9
+
10
+ def client
11
+ @configuration ||= parse_config
12
+ Riddle::Client.new address, port
13
+ end
14
+
15
+ private
16
+
17
+ # Parse the config/sphinx.yml file - if it exists
18
+ #
19
+ def parse_config
20
+ path = "#{Rails.root}/config/sphinx.yml"
21
+ return unless File.exist?(path)
22
+
23
+ conf = YAML::load(ERB.new(IO.read(path)).result)[Rails.env]
24
+
25
+ conf.each do |key,value|
26
+ self.send("#{key}=", value) if self.respond_to?("#{key}=")
27
+ end
28
+ end
29
+
30
+ end
31
+ end
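The configuration class above renders `config/sphinx.yml` through ERB, picks the section for the current Rails environment and assigns each key that has a matching setter. A minimal sketch of that flow without Rails might look like the following; `load_sphinx_settings`, `apply_settings`, the `Settings` struct and the inline sample are all hypothetical names introduced for illustration, and a missing environment section is assumed to mean "no settings".

```ruby
require 'erb'
require 'yaml'

# Hypothetical helper: render ERB, parse the YAML, return one environment's section.
def load_sphinx_settings(yaml_text, env)
  all = YAML.load(ERB.new(yaml_text).result)
  section = all.is_a?(Hash) ? all[env] : nil
  section || {} # a missing section yields no settings instead of raising
end

# Hypothetical target object; the gem uses its Configuration singleton instead.
Settings = Struct.new(:address, :port)

# Assign every key that has a matching setter and silently skip the rest.
def apply_settings(target, settings)
  settings.each do |key, value|
    setter = "#{key}="
    target.public_send(setter, value) if target.respond_to?(setter)
  end
  target
end

SAMPLE_CONFIG = <<~CONFIG
  development:
    address: localhost
    port: <%= 3300 + 12 %>
CONFIG

settings = apply_settings(Settings.new, load_sphinx_settings(SAMPLE_CONFIG, 'development'))
```

Returning an empty hash for an unknown environment keeps the caller from iterating over `nil`, which is one way to guard against a `sphinx.yml` that lacks the current environment.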
@@ -0,0 +1,123 @@
1
+ # MongoidSphinx, a full text indexing extension for MongoDB using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the MongoidSphinx::Indexer::XMLDocset and
5
+ # MongoidSphinx::Indexer::XMLDoc classes.
6
+
7
+ module MongoidSphinx #:nodoc:
8
+
9
+ # Module Indexer contains classes for creating XML input documents for the
10
+ # indexer. Each Sphinx index consists of a single "sphinx:docset" with any
11
+ # number of "sphinx:document" tags.
12
+ #
13
+ # The XML source can be generated from an array of CouchRest objects or from
14
+ # an array of Hashes containing at least fields "classname" and "_id"
15
+ # as returned by MongoDB view "MongoSphinxIndex/couchrests_by_timestamp".
16
+ #
17
+ # Sample:
18
+ #
19
+ # rows = [{ 'name' => 'John', 'phone' => '199 43828',
20
+ # 'classname' => 'Address', '_id' => 'Address-234164'
21
+ # },
22
+ # { 'name' => 'Sue', 'mobile' => '828 19439',
23
+ # 'classname' => 'Address', '_id' => 'Address-422433'
24
+ # }
25
+ # ]
26
+ # puts MongoSphinx::Indexer::XMLDocset.new(rows).to_s
27
+ #
28
+ # <?xml version="1.0" encoding="utf-8"?>
29
+ # <sphinx:docset>
30
+ # <sphinx:schema>
31
+ # <sphinx:attr name="csphinx-class" type="multi"/>
32
+ # <sphinx:field name="classname"/>
33
+ # <sphinx:field name="name"/>
34
+ # <sphinx:field name="phone"/>
35
+ # <sphinx:field name="mobile"/>
36
+ # <sphinx:field name="created_at"/>
37
+ # </sphinx:schema>
38
+ # <sphinx:document id="234164">
39
+ # <csphinx-class>336,623,883,1140</csphinx-class>
40
+ # <classname>Address</classname>
41
+ # <name><![CDATA[[John]]></name>
42
+ # <phone><![CDATA[[199 422433]]></phone>
43
+ # <mobile><![CDATA[[]]></mobile>
44
+ # <created_at><![CDATA[[]]></created_at>
45
+ # </sphinx:document>
46
+ # <sphinx:document id="423423">
47
+ # <csphinx-class>336,623,883,1140</csphinx-class>
48
+ # <classname>Address</classname>
49
+ # <name><![CDATA[[Sue]]></name>
50
+ # <phone><![CDATA[[]]></phone>
51
+ # <mobile><![CDATA[[828 19439]]></mobile>
52
+ # <created_at><![CDATA[[]]></created_at>
53
+ # </sphinx:document>
54
+ # </sphinx:docset>"
55
+
56
+ module Indexer
57
+
58
+ # Class XMLDocset wraps the XML representation of a document to index. It
59
+ # contains a complete "sphinx:docset" including its schema definition.
60
+
61
+ class XMLDocset
62
+
63
+ # Streams xml of all objects in a klass to the stdout. This makes sure you can process large collections.
64
+ #
65
+ # Fields:
66
+ # The exported fields are taken from klass.search_fields (set via search_index).
67
+ #
68
+ # Example:
69
+ # MongoidSphinx::Indexer::XMLDocset.stream(Document)
70
+ # This will create an XML stream to stdout.
71
+ #
72
+ # Configure in your sphinx.conf like
73
+ # xmlpipe_command = ./script/runner "MongoidSphinx::Indexer::XMLDocset.stream(Document)"
74
+ #
75
+ def self.stream(klass)
76
+ STDOUT.sync = true # Make sure we really stream..
77
+
78
+ puts '<?xml version="1.0" encoding="utf-8"?>'
79
+ puts '<sphinx:docset>'
80
+
81
+ # Schema
82
+ puts '<sphinx:schema>'
83
+ klass.search_fields.each do |key|
84
+ puts "<sphinx:field name=\"#{key}\"/>"
85
+ end
86
+ # FIXME: What is this attribute?
87
+ puts '<sphinx:field name="classname"/>'
88
+ puts '<sphinx:attr name="csphinx-class" type="multi"/>'
89
+ puts '</sphinx:schema>'
90
+
91
+ collection = Mongoid.database.collection(klass.collection.name)
92
+ collection.find.each do |document_hash|
93
+ XMLDoc.stream_for_hash(document_hash, klass)
94
+ end
95
+
96
+ puts '</sphinx:docset>'
97
+ end
98
+
99
+ end
100
+
101
+ class XMLDoc
102
+
103
+ def self.stream_for_hash(hash, klass)
104
+ sphinx_compatible_id = hash['_id'].to_s.to_i - 100000000000000000000000
105
+
106
+ puts "<sphinx:document id=\"#{sphinx_compatible_id}\">"
107
+ # FIXME: Should we include this?
108
+ puts '<csphinx-class>'
109
+ puts MongoidSphinx::MultiAttribute.encode(klass.to_s)
110
+ puts '</csphinx-class>'
111
+ puts "<classname>#{klass.to_s}</classname>"
112
+
113
+ klass.search_fields.each do |key|
114
+ value = hash[key.to_s]
115
+ puts "<#{key}><![CDATA[#{value}]]></#{key}>"
116
+ end
117
+
118
+ puts '</sphinx:document>'
119
+ end
120
+
121
+ end
122
+ end
123
+ end
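The indexer above writes an xmlpipe2 docset straight to stdout. To make the document shape easier to inspect, here is a small sketch that builds the same kind of docset as a string from plain hashes. `build_docset` and `cdata` are hypothetical helpers, the `'id'` key is an assumption of this sketch, and the `csphinx-class` attribute is left out for brevity. The `cdata` helper also splits any `]]>` inside a value, which the streaming code above does not attempt.

```ruby
# Wrap text in a CDATA section, splitting any "]]>" so it cannot end the section early.
def cdata(text)
  "<![CDATA[#{text.to_s.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

# Hypothetical helper: build an xmlpipe2 docset from hashes that carry a numeric 'id'.
def build_docset(rows, fields)
  out = []
  out << '<?xml version="1.0" encoding="utf-8"?>'
  out << '<sphinx:docset>'
  out << '<sphinx:schema>'
  fields.each { |field| out << "<sphinx:field name=\"#{field}\"/>" }
  out << '</sphinx:schema>'
  rows.each do |row|
    out << "<sphinx:document id=\"#{row['id']}\">"
    fields.each { |field| out << "<#{field}>#{cdata(row[field.to_s])}</#{field}>" }
    out << '</sphinx:document>'
  end
  out << '</sphinx:docset>'
  out.join("\n")
end

xml = build_docset([{ 'id' => 1, 'title' => 'First Post', 'body' => 'a ]]> b' }], [:title, :body])
```

Building a string is only practical for small collections; for large ones the streaming approach used by the gem avoids holding the whole docset in memory.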
@@ -0,0 +1,23 @@
1
+ module Mongoid
2
+ class Identity
3
+
4
+ protected
5
+
6
+ # Return an id that is sphinx compatible
7
+ def generate_id
8
+ while true
9
+ id = 100000000000000000000000 + rand(4294967294) # 4,294,967,295 is the theoretical max number of documents a 32 bit sphinx install can index
10
+ candidate = id.to_s
11
+
12
+ begin
13
+ @document.class.find(candidate) # Raises DocumentNotFound when the candidate ID is still unused
14
+ rescue Mongoid::Errors::DocumentNotFound
15
+ id = BSON::ObjectId.from_string(candidate)
16
+ break
17
+ end
18
+ end
19
+ @document.using_object_ids? ? id : id.to_s
20
+ end
21
+
22
+ end
23
+ end
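The ID scheme ties two files together: `generate_id` adds a large constant to a random integer so the result is a 24-digit string that is also a valid ObjectId, and the indexer subtracts the same constant to obtain a small numeric Sphinx document ID. The sketch below shows that round trip in isolation. `ID_OFFSET`, `to_sphinx_id` and `to_mongo_id` are hypothetical names, and `10**23` is assumed to equal the literal used in the gem.

```ruby
# Assumed equal to the gem's literal: 1 followed by 23 zeros (24 decimal digits).
ID_OFFSET = 10**23

# Sphinx document ID for a MongoDB ID given as a 24-digit decimal string.
def to_sphinx_id(mongo_id)
  mongo_id.to_s.to_i - ID_OFFSET
end

# MongoDB ID string for a Sphinx document ID (inverse of to_sphinx_id).
def to_mongo_id(sphinx_id)
  (ID_OFFSET + sphinx_id).to_s
end

mongo_id = to_mongo_id(38_497_238)
```

Any code that looks documents up from Sphinx results needs the second direction, since Sphinx only knows the small integer.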
@@ -0,0 +1,49 @@
1
+ # MongoidSphinx, a full text indexing extension for MongoDB/Mongoid using
2
+ # Sphinx.
3
+
4
+ module Mongoid
5
+ module Sphinx
6
+ extend ActiveSupport::Concern
7
+ included do
8
+ cattr_accessor :search_fields
9
+ end
10
+
11
+ module ClassMethods
12
+ def search_index(*fields)
13
+ self.search_fields = fields
14
+ end
15
+
16
+ def search(query, options = {})
17
+ client = MongoidSphinx::Configuration.instance.client
18
+
19
+ query = "#{query} @classname #{self.to_s}"
20
+
21
+ client.match_mode = options[:match_mode] || :extended
22
+ client.limit = options[:limit] if options.key?(:limit)
23
+ client.max_matches = options[:max_matches] if options.key?(:max_matches)
24
+
25
+ if options.key?(:sort_by)
26
+ client.sort_mode = :extended
27
+ client.sort_by = options[:sort_by]
28
+ end
29
+
30
+ result = client.query(query)
31
+
32
+ #TODO
33
+ if result and result[:status] == 0 and (matches = result[:matches])
34
+ classname = nil
35
+ ids = matches.collect do |row|
36
+ classname = MongoidSphinx::MultiAttribute.decode(row[:attributes]['csphinx-class'])
37
+ (100000000000000000000000 + row[:doc]).to_s rescue nil # re-add the offset subtracted by the indexer
38
+ end.compact
39
+
40
+ return ids if options[:raw]
41
+ return Object.const_get(classname).find(ids)
42
+ else
43
+ return []
44
+ end
45
+ end
46
+ end
47
+
48
+ end
49
+ end
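The `search` method above does two things before querying: it scopes the query to one class through the `classname` field and it copies selected options onto the client. The sketch below reproduces that mapping against a stand-in object so it can run without a Sphinx daemon. `FakeClient`, `scoped_query` and `apply_search_options` are hypothetical names; the struct only mimics the few client settings the method touches.

```ruby
# Stand-in for the handful of client settings used here (hypothetical).
FakeClient = Struct.new(:match_mode, :limit, :max_matches, :sort_mode, :sort_by)

# Restrict a Sphinx extended query to documents of one class.
def scoped_query(query, class_name)
  "#{query} @classname #{class_name}"
end

# Copy the supported options onto the client, defaulting to extended matching.
def apply_search_options(client, options = {})
  client.match_mode = options[:match_mode] || :extended
  client.limit = options[:limit] if options.key?(:limit)
  client.max_matches = options[:max_matches] if options.key?(:max_matches)
  if options.key?(:sort_by)
    client.sort_mode = :extended
    client.sort_by = options[:sort_by]
  end
  client
end

client = apply_search_options(FakeClient.new, limit: 100, sort_by: 'created_at DESC')
query = scoped_query('my post', 'Post')
```

Options that are not passed stay untouched, so the client's own defaults apply to them.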
@@ -0,0 +1,57 @@
1
+ # MongoidSphinx, a full text indexing extension for MongoDB using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the MongoidSphinx::MultiAttribute module.
5
+
6
+ # Namespace module for the MongoidSphinx gem.
7
+
8
+ module MongoidSphinx #:nodoc:
9
+
10
+ # Module MultiAttribute implements helpers to translate back and
11
+ # forth between Ruby Strings and an array of integers suitable for Sphinx
12
+ # attributes of type "multi".
13
+ #
14
+ # Background: Getting an ID as result for a query is OK, but for example to
15
+ # allow cast safety, we need an additional attribute. Sphinx supports
16
+ # attributes which are returned together with the ID, but they behave a
17
+ # little differently than expected: Instead we can use arrays of integers with
18
+ # ASCII character codes. These values are returned in ascending (!) order of
19
+ # value (yes, sounds funny but is reasonable from an internal view to
20
+ # Sphinx). So we mask each byte with 0x0100++ to keep the order...
21
+ #
22
+ # Sample:
23
+ #
24
+ # MongoidSphinx::MultiAttribute.encode('Hello')
25
+ # => "328,613,876,1132,1391"
26
+ # MongoidSphinx::MultiAttribute.decode('328,613,876,1132,1391')
27
+ # => "Hello"
28
+
29
+ module MultiAttribute
30
+
31
+ # Returns a numeric representation of a Ruby String suitable for "multi"
32
+ # attributes of Sphinx.
33
+ #
34
+ # Parameters:
35
+ #
36
+ # [str] String to translate
37
+
38
+ def self.encode(str)
39
+ offset = 0
40
+ return str.bytes.collect { |c| (offset+= 0x0100) + c }.join(',')
41
+ end
42
+
43
+ # Returns the original Ruby String decoded from a Sphinx "multi" attribute. Only works if
44
+ # the attribute was created with encode before!
45
+ #
46
+ # Parameters:
47
+ #
48
+ # [multi] Sphinx "multi" attribute to translate back
49
+
50
+ def self.decode(multi)
51
+ offset = 0
52
+ multi = multi.split(',') if not multi.kind_of? Array
53
+
54
+ return multi.collect {|x| (x.to_i-(offset+=0x0100)).chr}.join
55
+ end
56
+ end
57
+ end
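The encoding described in the comments above adds a growing multiple of 0x0100 to each byte, so the values stay in their original order even though Sphinx returns "multi" attribute values sorted ascending. Below is a standalone sketch of the same scheme, written with explicit positions instead of a running offset. `MultiAttributeSketch` is a hypothetical name, and the sketch is only intended for ASCII class names.

```ruby
# Hypothetical re-statement of the encode/decode scheme described above.
module MultiAttributeSketch
  STEP = 0x0100

  # "Hello" -> "328,613,876,1132,1391": byte value plus STEP times (position + 1).
  def self.encode(str)
    str.bytes.each_with_index.map { |byte, i| byte + STEP * (i + 1) }.join(',')
  end

  # Accepts either the comma-separated string or an array of integers.
  def self.decode(multi)
    values = multi.is_a?(Array) ? multi : multi.split(',')
    values.each_with_index.map { |value, i| (value.to_i - STEP * (i + 1)).chr }.join
  end
end

encoded = MultiAttributeSketch.encode('Hello')
decoded = MultiAttributeSketch.decode(encoded)
```

Because every byte is below 0x0100, each encoded value is strictly larger than the one before it, which is what makes the ascending sort harmless.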
@@ -0,0 +1,3 @@
1
+ module MongoidSphinx #:nodoc:
2
+ VERSION = "0.0.1"
3
+ end
metadata ADDED
@@ -0,0 +1,110 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mongoid-sphinx
3
+ version: !ruby/object:Gem::Version
4
+ hash: 29
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 0
9
+ - 1
10
+ version: 0.0.1
11
+ platform: ruby
12
+ authors:
13
+ - Matt Hodgson
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-12-21 00:00:00 -05:00
19
+ default_executable:
20
+ dependencies:
21
+ - !ruby/object:Gem::Dependency
22
+ version_requirements: &id001 !ruby/object:Gem::Requirement
23
+ none: false
24
+ requirements:
25
+ - - "="
26
+ - !ruby/object:Gem::Version
27
+ hash: 62196421
28
+ segments:
29
+ - 2
30
+ - 0
31
+ - 0
32
+ - beta
33
+ - 19
34
+ version: 2.0.0.beta.19
35
+ requirement: *id001
36
+ type: :runtime
37
+ name: mongoid
38
+ prerelease: false
39
+ - !ruby/object:Gem::Dependency
40
+ version_requirements: &id002 !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ hash: 19
46
+ segments:
47
+ - 1
48
+ - 1
49
+ - 0
50
+ version: 1.1.0
51
+ requirement: *id002
52
+ type: :runtime
53
+ name: riddle
54
+ prerelease: false
55
+ description: A full text indexing extension for MongoDB using Sphinx and Mongoid.
56
+ email:
57
+ - mhodgson@scenario4.com
58
+ executables: []
59
+
60
+ extensions: []
61
+
62
+ extra_rdoc_files: []
63
+
64
+ files:
65
+ - lib/mongoid_sphinx/configuration.rb
66
+ - lib/mongoid_sphinx/indexer.rb
67
+ - lib/mongoid_sphinx/mongoid/identity.rb
68
+ - lib/mongoid_sphinx/mongoid/sphinx.rb
69
+ - lib/mongoid_sphinx/multi_attribute.rb
70
+ - lib/mongoid_sphinx/version.rb
71
+ - lib/mongoid_sphinx.rb
72
+ - README.markdown
73
+ has_rdoc: true
74
+ homepage: http://github.com/mhodgson/mongoid-sphinx
75
+ licenses: []
76
+
77
+ post_install_message:
78
+ rdoc_options: []
79
+
80
+ require_paths:
81
+ - lib
82
+ required_ruby_version: !ruby/object:Gem::Requirement
83
+ none: false
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ hash: 3
88
+ segments:
89
+ - 0
90
+ version: "0"
91
+ required_rubygems_version: !ruby/object:Gem::Requirement
92
+ none: false
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ hash: 23
97
+ segments:
98
+ - 1
99
+ - 3
100
+ - 6
101
+ version: 1.3.6
102
+ requirements: []
103
+
104
+ rubyforge_project:
105
+ rubygems_version: 1.3.7
106
+ signing_key:
107
+ specification_version: 3
108
+ summary: A full text indexing extension for MongoDB using Sphinx and Mongoid.
109
+ test_files: []
110
+