mongoid-sphinx 0.0.1

data/README.markdown ADDED
@@ -0,0 +1,135 @@
1
+ # MongoidSphinx
2
+
3
+ ## Fork info
4
+
5
+ This is a fork of http://github.com/burke/mongosphinx with many changes to simplify and support Mongoid.
6
+
7
+ ## General info
8
+
9
+ The MongoidSphinx library implements an interface between MongoDB and Sphinx
10
+ supporting Mongoid to automatically index objects in Sphinx. It tries to
11
+ act as transparently as possible: just an additional method in Mongoid
12
+ and some Sphinx configuration are needed to get going.
13
+
14
+ ## Prerequisites
15
+
16
+ MongoidSphinx needs gems Mongoid and Riddle as well as a running Sphinx
17
+ and a MongoDB installation. Just add this to your Gemfile:
18
+
19
+ gem 'riddle'
20
+ gem 'mongoid'
21
+ gem 'mongoid-sphinx', :require => 'mongoid_sphinx'
22
+
23
+ No additional configuration is needed for interfacing with MongoDB: setup is
24
+ done when Mongoid is able to talk to the MongoDB server.
25
+
26
+ A proper "sphinx.conf" file and a script for retrieving index data have to
27
+ be provided for interfacing with Sphinx: Sorry, no ThinkingSphinx-like
28
+ magic... :-) Depending on the amount of data, more than one index may be used
29
+ and indexes may be consolidated from time to time.
30
+
31
+ This is a sample configuration for a single "main" index:
32
+
33
+ searchd {
34
+ address = 0.0.0.0
35
+ port = 3312
36
+
37
+ log = ./sphinx/searchd.log
38
+ query_log = ./sphinx/query.log
39
+ pid_file = ./sphinx/searchd.pid
40
+ }
41
+
42
+ source mongoblog {
43
+ type = xmlpipe2
44
+
45
+ xmlpipe_command = rake sphinx:genxml --silent
46
+ }
47
+
48
+ index mongoblog {
49
+ source = mongoblog
50
+
51
+ charset_type = utf-8
52
+ path = ./sphinx/sphinx_index_main
53
+ }
54
+
55
+ Notice the line "xmlpipe_command =". This is what the indexer runs to generate
56
+ its input. You can change this to whatever works best for you, but I set it up as
57
+ a rake task, with the following in `lib/tasks/sphinx.rake` .
58
+
59
+ The exported fields are the ones declared with search_index in the model (see Models below). Performance tends to suffer if you export
60
+ everything, so you'll probably want to just list the fields you're indexing.
61
+
62
+ namespace :sphinx do
63
+ task :genxml => :environment do
64
+ MongoidSphinx::Indexer::XMLDocset.stream(Food)
65
+ end
66
+ end
67
+
68
+ This uses a MongoDB cursor rather than an offset (skip) to stream the collection. See: http://groups.google.com/group/mongodb-user/browse_thread/thread/35f01db45ea3b0bd/96ebc49b511a6b41?lnk=gst&q=skip#96ebc49b511a6b41
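To make the shape of the indexer input concrete, here is a minimal standalone sketch of an xmlpipe2 docset built from plain hashes. The helper name `docset_xml` and the `'id'` key are assumptions made for illustration only; the gem itself streams to stdout from a Mongoid collection, also emits a `classname` field and a `csphinx-class` attribute, and wraps values slightly differently.

```ruby
# Hypothetical helper (not part of the gem): builds an xmlpipe2 docset
# as a single string from a list of field names and row hashes.
def docset_xml(fields, rows)
  lines = [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<sphinx:docset>',
    '<sphinx:schema>'
  ]
  # One <sphinx:field> declaration per full-text field.
  fields.each { |field| lines << %(<sphinx:field name="#{field}"/>) }
  lines << '</sphinx:schema>'

  rows.each do |row|
    # Sphinx requires a numeric document id; 'id' is an assumed key here.
    lines << %(<sphinx:document id="#{row['id']}">)
    fields.each do |field|
      # Values go into CDATA so markup characters in the text are not parsed.
      lines << "<#{field}><![CDATA[#{row[field.to_s]}]]></#{field}>"
    end
    lines << '</sphinx:document>'
  end

  lines << '</sphinx:docset>'
  lines.join("\n")
end

puts docset_xml([:title, :body],
                [{ 'id' => 1, 'title' => 'First Post', 'body' => 'Hello' }])
```

Building the whole string in memory is fine for a sketch; for large collections, printing each document as it is read from the cursor (as the gem does) avoids holding everything in memory.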
69
+
70
+ ## Models
71
+
72
+ Use method _search_index_ to enable indexing of a model. You must provide a list of
73
+ attribute keys.
74
+
75
+ A side effect of calling this method is that MongoidSphinx overrides the
76
+ default of letting MongoDB create new IDs: Sphinx only allows numeric IDs and
77
+ MongoidSphinx forces new objects with the name of the class, a hyphen and an
78
+ integer as ID (e.g. _Post-38497238_). Again: Only these objects are
79
+ indexed due to internal restrictions of Sphinx.
80
+
81
+ Sample:
82
+
83
+ class Post
84
+ include Mongoid::Sphinx
85
+
86
+ field :title
87
+ field :body
88
+
89
+ search_index :title, :body
90
+ end
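As a rough illustration of the numeric-ID requirement described above, the sketch below mirrors the arithmetic used in this version's `identity.rb` and `indexer.rb`: a fixed 24-digit offset plus a random number on the MongoDB side, and the offset subtracted again for Sphinx. The constant and helper names are hypothetical; only the two numbers come from the source.

```ruby
# Offset used in the source: 10**23, a 24-digit decimal number, so the
# resulting string also looks like a 24-character hex ObjectId string.
ID_OFFSET = 100000000000000000000000

# Upper bound the source passes to rand for the random part of the ID.
MAX_RANDOM_PART = 4294967294

# Hypothetical helper: MongoDB-side ID string for a given Sphinx document ID.
def mongo_id_for(sphinx_id)
  (ID_OFFSET + sphinx_id).to_s
end

# Hypothetical helper: Sphinx document ID recovered from the MongoDB-side ID.
def sphinx_id_for(mongo_id)
  mongo_id.to_s.to_i - ID_OFFSET
end

sphinx_id = rand(MAX_RANDOM_PART)
mongo_id  = mongo_id_for(sphinx_id)

puts mongo_id
puts sphinx_id_for(mongo_id) == sphinx_id
```

The sketch leaves out the uniqueness check; the gem retries with a new random number until no existing document has the candidate ID.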
91
+
92
+ You must also create a config/sphinx.yml file with the host and port of your searchd process like so:
93
+
94
+ development:
95
+ address: localhost
96
+ port: 3312
97
+
98
+ staging:
99
+ address: localhost
100
+ port: 3312
101
+
102
+ production:
103
+ address: localhost
104
+ port: 3312
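For orientation, this is roughly how such a file can be read: rendered through ERB first, parsed as YAML, then narrowed to the current environment. The sketch is standalone and uses an inline string instead of `config/sphinx.yml`; the helper name `sphinx_settings` and the host `search.internal` are made up for the example.

```ruby
require 'erb'
require 'yaml'

# Inline stand-in for config/sphinx.yml. ERB tags are allowed, which is
# handy for values that come from the environment.
SPHINX_YML = <<~CONFIG
  development:
    address: localhost
    port: <%= 3312 %>
  production:
    address: search.internal
    port: 3312
CONFIG

# Hypothetical helper: returns the settings hash for one environment,
# or an empty hash when the environment is not listed.
def sphinx_settings(yaml_text, env)
  YAML.load(ERB.new(yaml_text).result)[env] || {}
end

settings = sphinx_settings(SPHINX_YML, 'development')
puts settings['address']
puts settings['port']
```

Returning an empty hash for an unknown environment is a choice made in this sketch so that callers can iterate over the result without a nil check.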
105
+
106
+ ## Queries
107
+
108
+ An additional class method <tt>search</tt> is added for each
109
+ search indexed model. This method takes a Sphinx query like
110
+ `foo @title bar`, runs it within the context of the current class and returns
111
+ an Array of matching MongoDB documents.
112
+
113
+ Samples:
114
+
115
+ Post.search('first')
116
+ => [...]
117
+
118
+ post = Post.search('this is @title post').first
119
+ post.title
120
+ => "First Post"
121
+ post.class
122
+ => Post
123
+
124
+ Additional options _:match_mode_, _:limit_ and
125
+ _:max_matches_ can be provided to customize the behavior of Riddle.
126
+ Option _:raw_ can be set to _true_ to do no lookup of the
127
+ document IDs but return the raw IDs instead.
128
+
129
+ Sample:
130
+
131
+ Post.search('my post', :limit => 100)
132
+
133
+ ## Copyright
134
+
135
+ Copyright (c) 2010 RedBeard Tech. See LICENSE for details.
@@ -0,0 +1,8 @@
1
+ require "mongoid"
2
+ require "riddle"
3
+
4
+ require 'mongoid_sphinx/configuration'
5
+ require 'mongoid_sphinx/indexer'
6
+ require 'mongoid_sphinx/multi_attribute'
7
+ require 'mongoid_sphinx/mongoid/identity'
8
+ require 'mongoid_sphinx/mongoid/sphinx'
@@ -0,0 +1,31 @@
1
+ require 'erb'
2
+ require 'singleton'
3
+
4
+ module MongoidSphinx
5
+ class Configuration
6
+ include Singleton
7
+
8
+ attr_accessor :address, :port, :configuration
9
+
10
+ def client
11
+ @configuration ||= parse_config
12
+ Riddle::Client.new address, port
13
+ end
14
+
15
+ private
16
+
17
+ # Parse the config/sphinx.yml file - if it exists
18
+ #
19
+ def parse_config
20
+ path = "#{Rails.root}/config/sphinx.yml"
21
+ return unless File.exist?(path)
22
+
23
+ conf = YAML::load(ERB.new(IO.read(path)).result)[Rails.env]
24
+
25
+ conf.each do |key,value|
26
+ self.send("#{key}=", value) if self.respond_to?("#{key}=")
27
+ end
28
+ end
29
+
30
+ end
31
+ end
@@ -0,0 +1,123 @@
1
+ # MongoidSphinx, a full text indexing extension for MongoDB using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the MongoidSphinx::Indexer::XMLDocset and
5
+ # MongoidSphinx::Indexer::XMLDoc classes.
6
+
7
+ module MongoidSphinx #:nodoc:
8
+
9
+ # Module Indexer contains classes for creating XML input documents for the
10
+ # indexer. Each Sphinx index consists of a single "sphinx:docset" with any
11
+ # number of "sphinx:document" tags.
12
+ #
13
+ # The XML source can be generated from an array of CouchRest objects or from
14
+ # an array of Hashes containing at least fields "classname" and "_id"
15
+ # as returned by MongoDB view "MongoSphinxIndex/couchrests_by_timestamp".
16
+ #
17
+ # Sample:
18
+ #
19
+ # rows = [{ 'name' => 'John', 'phone' => '199 43828',
20
+ # 'classname' => 'Address', '_id' => 'Address-234164'
21
+ # },
22
+ # { 'name' => 'Sue', 'mobile' => '828 19439',
23
+ # 'classname' => 'Address', '_id' => 'Address-422433'
24
+ # }
25
+ # ]
26
+ # puts MongoidSphinx::Indexer::XMLDocset.new(rows).to_s
27
+ #
28
+ # <?xml version="1.0" encoding="utf-8"?>
29
+ # <sphinx:docset>
30
+ # <sphinx:schema>
31
+ # <sphinx:attr name="csphinx-class" type="multi"/>
32
+ # <sphinx:field name="classname"/>
33
+ # <sphinx:field name="name"/>
34
+ # <sphinx:field name="phone"/>
35
+ # <sphinx:field name="mobile"/>
36
+ # <sphinx:field name="created_at"/>
37
+ # </sphinx:schema>
38
+ # <sphinx:document id="234164">
39
+ # <csphinx-class>336,623,883,1140</csphinx-class>
40
+ # <classname>Address</classname>
41
+ # <name><![CDATA[[John]]></name>
42
+ # <phone><![CDATA[[199 43828]]></phone>
43
+ # <mobile><![CDATA[[]]></mobile>
44
+ # <created_at><![CDATA[[]]></created_at>
45
+ # </sphinx:document>
46
+ # <sphinx:document id="422433">
47
+ # <csphinx-class>336,623,883,1140</csphinx-class>
48
+ # <classname>Address</classname>
49
+ # <name><![CDATA[[Sue]]></name>
50
+ # <phone><![CDATA[[]]></phone>
51
+ # <mobile><![CDATA[[828 19439]]></mobile>
52
+ # <created_at><![CDATA[[]]></created_at>
53
+ # </sphinx:document>
54
+ # </sphinx:docset>"
55
+
56
+ module Indexer
57
+
58
+ # Class XMLDocset wraps the XML representation of a document to index. It
59
+ # contains a complete "sphinx:docset" including its schema definition.
60
+
61
+ class XMLDocset
62
+
63
+ # Streams xml of all objects in a klass to the stdout. This makes sure you can process large collections.
64
+ #
65
+ # The indexed fields are taken from klass.search_fields, as declared
66
+ # with search_index in the model.
67
+ #
68
+ # Example:
69
+ # MongoidSphinx::Indexer::XMLDocset.stream(Document)
70
+ # This will create an XML stream to stdout.
71
+ #
72
+ # Configure in your sphinx.conf like
73
+ # xmlpipe_command = ./script/runner "MongoidSphinx::Indexer::XMLDocset.stream(Document)"
74
+ #
75
+ def self.stream(klass)
76
+ STDOUT.sync = true # Make sure we really stream..
77
+
78
+ puts '<?xml version="1.0" encoding="utf-8"?>'
79
+ puts '<sphinx:docset>'
80
+
81
+ # Schema
82
+ puts '<sphinx:schema>'
83
+ klass.search_fields.each do |key, value|
84
+ puts "<sphinx:field name=\"#{key}\"/>"
85
+ end
86
+ # FIXME: What is this attribute?
87
+ puts '<sphinx:field name="classname"/>'
88
+ puts '<sphinx:attr name="csphinx-class" type="multi"/>'
89
+ puts '</sphinx:schema>'
90
+
91
+ collection = Mongoid.database.collection(klass.collection.name)
92
+ collection.find.each do |document_hash|
93
+ XMLDoc.stream_for_hash(document_hash, klass)
94
+ end
95
+
96
+ puts '</sphinx:docset>'
97
+ end
98
+
99
+ end
100
+
101
+ class XMLDoc
102
+
103
+ def self.stream_for_hash(hash, klass)
104
+ sphinx_compatible_id = hash['_id'].to_s.to_i - 100000000000000000000000
105
+
106
+ puts "<sphinx:document id=\"#{sphinx_compatible_id}\">"
107
+ # FIXME: Should we include this?
108
+ puts '<csphinx-class>'
109
+ puts MongoidSphinx::MultiAttribute.encode(klass.to_s)
110
+ puts '</csphinx-class>'
111
+ puts "<classname>#{klass.to_s}</classname>"
112
+
113
+ klass.search_fields.each do |key|
114
+ value = hash[key.to_s]
115
+ puts "<#{key}><![CDATA[[#{value}]]></#{key}>"
116
+ end
117
+
118
+ puts '</sphinx:document>'
119
+ end
120
+
121
+ end
122
+ end
123
+ end
@@ -0,0 +1,23 @@
1
+ module Mongoid
2
+ class Identity
3
+
4
+ protected
5
+
6
+ # Return an id that is sphinx compatible
7
+ def generate_id
8
+ while true
9
+ id = 100000000000000000000000 + rand(4294967294) # 4,294,967,295 is the theoretical max number of documents a 32 bit sphinx install can index
10
+ candidate = id.to_s
11
+
12
+ begin
13
+ @document.class.find(candidate) # Raises DocumentNotFound if the candidate ID is still unused
14
+ rescue Mongoid::Errors::DocumentNotFound
15
+ id = BSON::ObjectId.from_string(candidate)
16
+ break
17
+ end
18
+ end
19
+ @document.using_object_ids? ? id : id.to_s
20
+ end
21
+
22
+ end
23
+ end
@@ -0,0 +1,49 @@
1
+ # MongoidSphinx, a full text indexing extension for MongoDB/Mongoid using
2
+ # Sphinx.
3
+
4
+ module Mongoid
5
+ module Sphinx
6
+ extend ActiveSupport::Concern
7
+ included do
8
+ cattr_accessor :search_fields
9
+ end
10
+
11
+ module ClassMethods
12
+ def search_index(*fields)
13
+ self.search_fields = fields
14
+ end
15
+
16
+ def search(query, options = {})
17
+ client = MongoidSphinx::Configuration.instance.client
18
+
19
+ query = query + " @classname #{self.to_s}"
20
+
21
+ client.match_mode = options[:match_mode] || :extended
22
+ client.limit = options[:limit] if options.key?(:limit)
23
+ client.max_matches = options[:max_matches] if options.key?(:max_matches)
24
+
25
+ if options.key?(:sort_by)
26
+ client.sort_mode = :extended
27
+ client.sort_by = options[:sort_by]
28
+ end
29
+
30
+ result = client.query(query)
31
+
32
+ #TODO
33
+ if result and result[:status] == 0 and (matches = result[:matches])
34
+ classname = nil
35
+ ids = matches.collect do |row|
36
+ classname = MongoidSphinx::MultiAttribute.decode(row[:attributes]['csphinx-class'])
37
+ row[:doc].to_s rescue nil
38
+ end.compact
39
+
40
+ return ids if options[:raw]
41
+ return Object.const_get(classname).find(ids)
42
+ else
43
+ return []
44
+ end
45
+ end
46
+ end
47
+
48
+ end
49
+ end
@@ -0,0 +1,57 @@
1
+ # MongoidSphinx, a full text indexing extension for MongoDB using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the MongoidSphinx::MultiAttribute module.
5
+
6
+ # Namespace module for the MongoidSphinx gem.
7
+
8
+ module MongoidSphinx #:nodoc:
9
+
10
+ # Module MultiAttribute implements helpers to translate back and
11
+ # forth between Ruby Strings and an array of integers suitable for Sphinx
12
+ # attributes of type "multi".
13
+ #
14
+ # Background: Getting an ID as result for a query is OK, but for example to
15
+ # allow cast safety, we need an additional attribute. Sphinx supports
16
+ # attributes which are returned together with the ID, but they behave a
17
+ # little different than expected: Instead we can use arrays of integers with
18
+ # ASCII character codes. These values are returned in ascending (!) order of
19
+ # value (yes, sounds funny but is reasonable from an internal view to
20
+ # Sphinx). So we mask each byte with 0x0100++ to keep the order...
21
+ #
22
+ # Sample:
23
+ #
24
+ # MongoidSphinx::MultiAttribute.encode('Hello')
25
+ # => "328,613,876,1132,1391"
26
+ # MongoidSphinx::MultiAttribute.decode('328,613,876,1132,1391')
27
+ # => "Hello"
28
+
29
+ module MultiAttribute
30
+
31
+ # Returns a numeric representation of a Ruby String suitable for "multi"
32
+ # attributes of Sphinx.
33
+ #
34
+ # Parameters:
35
+ #
36
+ # [str] String to translate
37
+
38
+ def self.encode(str)
39
+ offset = 0
40
+ return str.bytes.collect { |c| (offset+= 0x0100) + c }.join(',')
41
+ end
42
+
43
+ # Returns the original Ruby String decoded from a Sphinx "multi" attribute.
44
+ # Only works if the attribute was created with encode before!
45
+ #
46
+ # Parameters:
47
+ #
48
+ # [multi] Sphinx "multi" attribute to translate back
49
+
50
+ def self.decode(multi)
51
+ offset = 0
52
+ multi = multi.split(',') if not multi.kind_of? Array
53
+
54
+ return multi.collect {|x| (x.to_i-(offset+=0x0100)).chr}.join
55
+ end
56
+ end
57
+ end
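The encoding described in the comments of this file can be tried out in isolation. The following is a minimal sketch under the assumption of ASCII-only input; the module name `MultiAttributeSketch` is hypothetical and chosen to avoid clashing with the gem's own module. Each byte gets a growing multiple of 0x0100 added, so the values stay in string order even after Sphinx sorts them ascending.

```ruby
# Standalone sketch of the "multi" attribute encoding (ASCII input assumed).
module MultiAttributeSketch
  STEP = 0x0100

  # Adds (position + 1) * 0x0100 to each byte and joins the results.
  def self.encode(str)
    str.bytes.each_with_index.map { |byte, i| (i + 1) * STEP + byte }.join(',')
  end

  # Accepts either a comma-separated String or an Array of values and
  # subtracts the per-position offset again to recover the characters.
  def self.decode(multi)
    values = multi.is_a?(Array) ? multi : multi.split(',')
    values.each_with_index.map { |value, i| (value.to_i - (i + 1) * STEP).chr }.join
  end
end

puts MultiAttributeSketch.encode('Hello')                 # prints 328,613,876,1132,1391
puts MultiAttributeSketch.decode('328,613,876,1132,1391') # prints Hello
```

The sketch uses `join` rather than `to_s` to glue the decoded characters together, since `Array#to_s` returns the inspected array on Ruby 1.9 and later.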
@@ -0,0 +1,3 @@
1
+ module MongoidSphinx #:nodoc:
2
+ VERSION = "0.0.1"
3
+ end
metadata ADDED
@@ -0,0 +1,110 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mongoid-sphinx
3
+ version: !ruby/object:Gem::Version
4
+ hash: 29
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 0
9
+ - 1
10
+ version: 0.0.1
11
+ platform: ruby
12
+ authors:
13
+ - Matt Hodgson
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-12-21 00:00:00 -05:00
19
+ default_executable:
20
+ dependencies:
21
+ - !ruby/object:Gem::Dependency
22
+ version_requirements: &id001 !ruby/object:Gem::Requirement
23
+ none: false
24
+ requirements:
25
+ - - "="
26
+ - !ruby/object:Gem::Version
27
+ hash: 62196421
28
+ segments:
29
+ - 2
30
+ - 0
31
+ - 0
32
+ - beta
33
+ - 19
34
+ version: 2.0.0.beta.19
35
+ requirement: *id001
36
+ type: :runtime
37
+ name: mongoid
38
+ prerelease: false
39
+ - !ruby/object:Gem::Dependency
40
+ version_requirements: &id002 !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ hash: 19
46
+ segments:
47
+ - 1
48
+ - 1
49
+ - 0
50
+ version: 1.1.0
51
+ requirement: *id002
52
+ type: :runtime
53
+ name: riddle
54
+ prerelease: false
55
+ description: A full text indexing extension for MongoDB using Sphinx and Mongoid.
56
+ email:
57
+ - mhodgson@scenario4.com
58
+ executables: []
59
+
60
+ extensions: []
61
+
62
+ extra_rdoc_files: []
63
+
64
+ files:
65
+ - lib/mongoid_sphinx/configuration.rb
66
+ - lib/mongoid_sphinx/indexer.rb
67
+ - lib/mongoid_sphinx/mongoid/identity.rb
68
+ - lib/mongoid_sphinx/mongoid/sphinx.rb
69
+ - lib/mongoid_sphinx/multi_attribute.rb
70
+ - lib/mongoid_sphinx/version.rb
71
+ - lib/mongoid_sphinx.rb
72
+ - README.markdown
73
+ has_rdoc: true
74
+ homepage: http://github.com/mhodgson/mongoid-sphinx
75
+ licenses: []
76
+
77
+ post_install_message:
78
+ rdoc_options: []
79
+
80
+ require_paths:
81
+ - lib
82
+ required_ruby_version: !ruby/object:Gem::Requirement
83
+ none: false
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ hash: 3
88
+ segments:
89
+ - 0
90
+ version: "0"
91
+ required_rubygems_version: !ruby/object:Gem::Requirement
92
+ none: false
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ hash: 23
97
+ segments:
98
+ - 1
99
+ - 3
100
+ - 6
101
+ version: 1.3.6
102
+ requirements: []
103
+
104
+ rubyforge_project:
105
+ rubygems_version: 1.3.7
106
+ signing_key:
107
+ specification_version: 3
108
+ summary: A full text indexing extension for MongoDB using Sphinx and Mongoid.
109
+ test_files: []
110
+