mongosphinx 0.1
- data/README.rdoc +164 -0
- data/lib/indexer.rb +217 -0
- data/lib/mixins/.indexer.rb.swp +0 -0
- data/lib/mixins/indexer.rb +181 -0
- data/lib/mixins/properties.rb +64 -0
- data/lib/multi_attribute.rb +57 -0
- data/mongosphinx.rb +41 -0
- metadata +69 -0
data/README.rdoc
ADDED
@@ -0,0 +1,164 @@
|
|
1
|
+
= MongoSphinx
|
2
|
+
|
3
|
+
The MongoSphinx library implements an interface between MongoDB and Sphinx
|
4
|
+
supporting MongoMapper to automatically index objects in Sphinx. It tries to
|
5
|
+
act as transparently as possible: Just an additional method in MongoMapper
|
6
|
+
and some Sphinx configuration are needed to get going.
|
7
|
+
|
8
|
+
== Prerequisites
|
9
|
+
|
10
|
+
MongoSphinx needs gems MongoMapper and Riddle as well as a running Sphinx
|
11
|
+
and a MongoDB installation.
|
12
|
+
|
13
|
+
sudo gem sources -a http://gems.github.com # Only needed once!
|
14
|
+
sudo gem install riddle
|
15
|
+
sudo gem install mongomapper
|
16
|
+
sudo gem install burke-mongosphinx
|
17
|
+
|
18
|
+
No additional configuration is needed for interfacing with MongoDB: Setup is
|
19
|
+
done when MongoMapper is able to talk to the MongoDB server.
|
20
|
+
|
21
|
+
A proper "sphinx.conf" file and a script for retrieving index data have to
|
22
|
+
be provided for interfacing with Sphinx: Sorry, no UltraSphinx-like
|
23
|
+
magic... :-) Depending on the amount of data, more than one index may be used
|
24
|
+
and indexes may be consolidated from time to time.
|
25
|
+
|
26
|
+
This is a sample configuration for a single "main" index:
|
27
|
+
|
28
|
+
searchd {
|
29
|
+
address = 0.0.0.0
|
30
|
+
port = 3312
|
31
|
+
|
32
|
+
log = ./sphinx/searchd.log
|
33
|
+
query_log = ./sphinx/query.log
|
34
|
+
pid_file = ./sphinx/searchd.pid
|
35
|
+
}
|
36
|
+
|
37
|
+
source mongoblog {
|
38
|
+
type = xmlpipe2
|
39
|
+
|
40
|
+
xmlpipe_command = "rake sphinx:genxml"
|
41
|
+
}
|
42
|
+
|
43
|
+
index mongoblog {
|
44
|
+
source = mongoblog
|
45
|
+
|
46
|
+
charset_type = utf-8
|
47
|
+
path = ./sphinx/sphinx_index_main
|
48
|
+
}
|
49
|
+
|
50
|
+
Notice the line "xmlpipe_command =". This is what the indexer runs to generate
|
51
|
+
its input. You can change this to whatever works best for you, but I set it up as
|
52
|
+
a rake task, with the following in `lib/tasks/sphinx.rake`.
|
53
|
+
|
54
|
+
Here, :fields is a list of fields to export. Performance tends to suffer if you export
|
55
|
+
everything, so you'll probably want to just list the fields you're indexing.
|
56
|
+
|
57
|
+
namespace :sphinx do
|
58
|
+
task :genxml => :environment do
|
59
|
+
puts MongoSphinx::Indexer::XMLDocset.new(Food.all(:fields => 'name')).to_s
|
60
|
+
end
|
61
|
+
end
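For orientation, the rake task above prints an xmlpipe2 document set that the indexer reads. The following is a minimal sketch in plain Ruby of the general shape of that output, assuming a single fulltext field; <tt>docset_xml</tt> is a hypothetical helper written for illustration, not part of MongoSphinx, and it omits the <tt>csphinx-class</tt> attribute that the real XMLDocset adds.

```ruby
# Hypothetical helper (not part of MongoSphinx): builds an xmlpipe2-style
# docset from plain hashes. Each row needs a numeric :id; the listed
# fields are emitted as fulltext fields.
def docset_xml(rows, fields)
  xml = '<?xml version="1.0" encoding="utf-8"?>'
  xml += '<sphinx:docset><sphinx:schema>'
  fields.each { |f| xml += "<sphinx:field name=\"#{f}\"/>" }
  xml += '</sphinx:schema>'

  rows.each do |row|
    xml += "<sphinx:document id=\"#{row[:id]}\">"
    fields.each { |f| xml += "<#{f}><![CDATA[#{row[f]}]]></#{f}>" }
    xml += '</sphinx:document>'
  end

  xml + '</sphinx:docset>'
end

puts docset_xml([{ :id => 234164, :name => 'Pizza' }], [:name])
```

Listing only the indexed fields keeps this document small, which is the reason for passing <tt>:fields</tt> in the rake task.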
|
62
|
+
|
63
|
+
== Models
|
64
|
+
|
65
|
+
Use method <tt>fulltext_index</tt> to enable indexing of a model. The
|
66
|
+
default is to index all attributes but it is recommended to provide a list of
|
67
|
+
attribute keys.
|
68
|
+
|
69
|
+
A side effect of calling this method is that MongoSphinx overrides the
|
70
|
+
default of letting MongoDB create new IDs: Sphinx only allows numeric IDs and
|
71
|
+
MongoSphinx forces new objects with the name of the class, a hyphen and an
|
72
|
+
integer as ID (e.g. <tt>Post-38497238</tt>). Again: Only these objects are
|
73
|
+
indexed due to internal restrictions of Sphinx.
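The ID scheme can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the library code: <tt>candidate_id</tt> and <tt>numeric_part</tt> are hypothetical names, the uniqueness check against the database is left out, and the pattern here is anchored more strictly than the one in the gem.

```ruby
# Sketch of the ID scheme: "<ClassName>-<integer>" is used as the document
# ID, and only the integer part is handed to Sphinx.
def candidate_id(class_name, idsize = 32)
  limit = (1 << idsize) - 1 # upper bound for the given bit width
  "#{class_name}-#{rand(limit)}"
end

# Returns the numeric part as a String, or nil if the ID has another format.
def numeric_part(class_name, id)
  match = id.to_s.match(/\A#{Regexp.escape(class_name)}-([0-9]+)\z/)
  match && match[1]
end

id = candidate_id('Post')
puts id
puts numeric_part('Post', id)
p numeric_part('Post', '4ad8b9f0e1c2') # not in the expected format
```

Objects whose ID does not match the pattern have no numeric part, which is why they cannot be indexed.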
|
74
|
+
|
75
|
+
Sample:
|
76
|
+
|
77
|
+
class Post
|
78
|
+
include MongoMapper::Document
|
79
|
+
|
80
|
+
key :title, String
|
81
|
+
key :body, String
|
82
|
+
|
83
|
+
fulltext_index :title, :body
|
84
|
+
end
|
85
|
+
|
86
|
+
Add options <tt>:server</tt> and <tt>:port</tt> to <tt>fulltext_index</tt> if
|
87
|
+
the Sphinx server to query is running on a different server (defaults to
|
88
|
+
"localhost" with port 3312).
|
89
|
+
|
90
|
+
Here is a full-featured sample setting additional options:
|
91
|
+
|
92
|
+
fulltext_index :title, :body, :server => 'my.other.server', :port => 3313
|
93
|
+
|
94
|
+
== Indexing
|
95
|
+
|
96
|
+
Automatically starting the reindexing of objects the moment new objects are
|
97
|
+
created can be implemented by adding a filter to the model class:
|
98
|
+
|
99
|
+
after_save :reindex
|
100
|
+
def reindex
|
101
|
+
`sudo indexer --all --rotate` # Configure sudo to allow this call...
|
102
|
+
end
|
103
|
+
|
104
|
+
This or a similar callback should be added to all models needing instant
|
105
|
+
indexing. If indexing is not that crucial or load is high, some additional
|
106
|
+
checks for the time of the last call should be added.
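One way to add such a check is a small throttle that remembers when it last ran. This is a minimal sketch under assumptions: <tt>ReindexThrottle</tt> is a hypothetical class, the state lives in one Ruby process only, and the indexer call is replaced by a <tt>puts</tt>.

```ruby
# Hypothetical throttle: runs the given block at most once per `interval`
# seconds. The clock is injectable so the behaviour can be checked in tests.
class ReindexThrottle
  def initialize(interval, clock = Time)
    @interval = interval
    @clock = clock
    @last_run = nil
  end

  # Yields and returns true if enough time has passed, otherwise returns
  # false without yielding.
  def call
    now = @clock.now
    return false if @last_run && (now - @last_run) < @interval

    @last_run = now
    yield
    true
  end
end

throttle = ReindexThrottle.new(300)
throttle.call { puts 'would run: indexer --all --rotate' }
throttle.call { puts 'skipped, so this line is not printed' }
```

With several application processes, the timestamp would have to live somewhere shared (a file or the database) for the check to be effective.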
|
107
|
+
|
108
|
+
Keep in mind that reindexing is not incremental, and XML is generated to pass
|
109
|
+
data from MongoDB to Sphinx. It's not a speedy operation on large datasets.
|
110
|
+
|
111
|
+
|
112
|
+
== Queries
|
113
|
+
|
114
|
+
An additional class method <tt>by_fulltext_index</tt> is added for each
|
115
|
+
fulltext indexed model. This method takes a Sphinx query like
|
116
|
+
"foo @title bar", runs it within the context of the current class and returns
|
117
|
+
an Array of matching MongoDB documents.
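Running a query "within the context of the current class" amounts to appending a condition on the <tt>classname</tt> field before the query is sent. A minimal sketch, where <tt>scoped_query</tt> is a hypothetical helper name used only for illustration:

```ruby
# Hypothetical helper mirroring how a query is narrowed to one model:
# the class name is appended as a condition on the "classname" field.
def scoped_query(query, class_name)
  "#{query} @classname #{class_name}"
end

puts scoped_query('foo @title bar', 'Post')
```

The <tt>@field</tt> syntax relies on the extended match mode of Sphinx, which is the default match mode used by <tt>by_fulltext_index</tt>.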
|
118
|
+
|
119
|
+
Samples:
|
120
|
+
|
121
|
+
Post.by_fulltext_index('first')
|
122
|
+
=> [...]
|
123
|
+
|
124
|
+
post = Post.by_fulltext_index('this is @title post').first
|
125
|
+
post.title
|
126
|
+
=> "First Post"
|
127
|
+
post.class
|
128
|
+
=> Post
|
129
|
+
|
130
|
+
Additional options <tt>:match_mode</tt>, <tt>:limit</tt> and
|
131
|
+
<tt>:max_matches</tt> can be provided to customize the behaviour of Riddle.
|
132
|
+
Option <tt>:raw</tt> can be set to <tt>true</tt> to do no lookup of the
|
133
|
+
document IDs but return the raw IDs instead.
|
134
|
+
|
135
|
+
Sample:
|
136
|
+
|
137
|
+
Post.by_fulltext_index('my post', :limit => 100)
|
138
|
+
|
139
|
+
== Copyright & License
|
140
|
+
|
141
|
+
Copyright (c) 2009 Burke Libbey, Ryan Neufeld
|
142
|
+
|
143
|
+
CouchSphinx Copyright (c) 2009 Holtzbrinck Digital GmbH, Jan Ulbrich
|
144
|
+
|
145
|
+
Permission is hereby granted, free of charge, to any person
|
146
|
+
obtaining a copy of this software and associated documentation
|
147
|
+
files (the "Software"), to deal in the Software without
|
148
|
+
restriction, including without limitation the rights to use,
|
149
|
+
copy, modify, merge, publish, distribute, sublicense, and/or sell
|
150
|
+
copies of the Software, and to permit persons to whom the
|
151
|
+
Software is furnished to do so, subject to the following
|
152
|
+
conditions:
|
153
|
+
|
154
|
+
The above copyright notice and this permission notice shall be
|
155
|
+
included in all copies or substantial portions of the Software.
|
156
|
+
|
157
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
158
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
|
159
|
+
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
160
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
|
161
|
+
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
162
|
+
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
163
|
+
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
|
164
|
+
OTHER DEALINGS IN THE SOFTWARE.
|
data/lib/indexer.rb
ADDED
@@ -0,0 +1,217 @@
|
|
1
|
+
# MongoSphinx, a full text indexing extension for MongoDB using
|
2
|
+
# Sphinx.
|
3
|
+
#
|
4
|
+
# This file contains the MongoSphinx::Indexer::XMLDocset and
|
5
|
+
# MongoSphinx::Indexer::XMLDoc classes.
|
6
|
+
|
7
|
+
# Namespace module for the MongoSphinx gem.
|
8
|
+
|
9
|
+
module MongoSphinx #:nodoc:
|
10
|
+
|
11
|
+
# Module Indexer contains classes for creating XML input documents for the
|
12
|
+
# indexer. Each Sphinx index consists of a single "sphinx:docset" with any
|
13
|
+
# number of "sphinx:document" tags.
|
14
|
+
#
|
15
|
+
# The XML source can be generated from an array of MongoMapper documents or from
|
16
|
+
# an array of Hashes containing at least fields "classname" and "_id"
|
17
|
+
# as returned by MongoDB view "MongoSphinxIndex/couchrests_by_timestamp".
|
18
|
+
#
|
19
|
+
# Sample:
|
20
|
+
#
|
21
|
+
# rows = [{ 'name' => 'John', 'phone' => '199 43828',
|
22
|
+
# 'classname' => 'Address', '_id' => 'Address-234164'
|
23
|
+
# },
|
24
|
+
# { 'name' => 'Sue', 'mobile' => '828 19439',
|
25
|
+
# 'classname' => 'Address', '_id' => 'Address-422433'
|
26
|
+
# }
|
27
|
+
# ]
|
28
|
+
# puts MongoSphinx::Indexer::XMLDocset.new(rows).to_s
|
29
|
+
#
|
30
|
+
# <?xml version="1.0" encoding="utf-8"?>
|
31
|
+
# <sphinx:docset>
|
32
|
+
# <sphinx:schema>
|
33
|
+
# <sphinx:attr name="csphinx-class" type="multi"/>
|
34
|
+
# <sphinx:field name="classname"/>
|
35
|
+
# <sphinx:field name="name"/>
|
36
|
+
# <sphinx:field name="phone"/>
|
37
|
+
# <sphinx:field name="mobile"/>
|
38
|
+
# <sphinx:field name="created_at"/>
|
39
|
+
# </sphinx:schema>
|
40
|
+
# <sphinx:document id="234164">
|
41
|
+
# <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
|
42
|
+
# <classname>Address</classname>
|
43
|
+
# <name><![CDATA[[John]]></name>
|
44
|
+
# <phone><![CDATA[[199 43828]]></phone>
|
45
|
+
# <mobile><![CDATA[[]]></mobile>
|
46
|
+
# <created_at><![CDATA[[]]></created_at>
|
47
|
+
# </sphinx:document>
|
48
|
+
# <sphinx:document id="422433">
|
49
|
+
# <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
|
50
|
+
# <classname>Address</classname>
|
51
|
+
# <name><![CDATA[[Sue]]></name>
|
52
|
+
# <phone><![CDATA[[]]></phone>
|
53
|
+
# <mobile><![CDATA[[828 19439]]></mobile>
|
54
|
+
# <created_at><![CDATA[[]]></created_at>
|
55
|
+
# </sphinx:document>
|
56
|
+
# </sphinx:docset>
|
57
|
+
|
58
|
+
module Indexer
|
59
|
+
|
60
|
+
# Class XMLDocset wraps the XML representation of a document to index. It
|
61
|
+
# contains a complete "sphinx:docset" including its schema definition.
|
62
|
+
|
63
|
+
class XMLDocset
|
64
|
+
|
65
|
+
# Objects contained in document set.
|
66
|
+
|
67
|
+
attr_reader :xml_docs
|
68
|
+
|
69
|
+
# XML generated for opening the document.
|
70
|
+
|
71
|
+
attr_reader :xml_header
|
72
|
+
|
73
|
+
# XML generated for closing the document.
|
74
|
+
|
75
|
+
attr_reader :xml_footer
|
76
|
+
|
77
|
+
# Creates an XMLDocset object from the provided data. It defines a
|
78
|
+
# superset of all fields of the classes to index objects for. The class
|
79
|
+
# names are collected from the provided objects as well.
|
80
|
+
#
|
81
|
+
# Parameters:
|
82
|
+
#
|
83
|
+
# [rows] Array with objects of type MongoMapper::Document or Hash to create XML for
|
84
|
+
|
85
|
+
def initialize(rows = [])
|
86
|
+
raise ArgumentError, 'Missing class names' if rows.nil?
|
87
|
+
|
88
|
+
xml = '<?xml version="1.0" encoding="utf-8"?>'
|
89
|
+
|
90
|
+
xml << '<sphinx:docset><sphinx:schema>'
|
91
|
+
|
92
|
+
@xml_docs = []
|
93
|
+
classes = []
|
94
|
+
|
95
|
+
rows.each do |row|
|
96
|
+
object = nil
|
97
|
+
|
98
|
+
if row.kind_of? MongoMapper::Document
|
99
|
+
object = row
|
100
|
+
elsif row.kind_of? Hash
|
101
|
+
row = row['value'] if row['classname'].nil?
|
102
|
+
|
103
|
+
if row and (class_name = row['classname'])
|
104
|
+
object = eval(class_name.to_s).new(row) rescue nil
|
105
|
+
end
|
106
|
+
end
|
107
|
+
|
108
|
+
if object and object.sphinx_id
|
109
|
+
classes << object.class if not classes.include? object.class
|
110
|
+
@xml_docs << XMLDoc.from_object(object)
|
111
|
+
end
|
112
|
+
end
|
113
|
+
|
114
|
+
field_names = classes.collect { |clas| clas.fulltext_keys rescue []
|
115
|
+
}.flatten.uniq
|
116
|
+
|
117
|
+
field_names.each do |key, value|
|
118
|
+
xml << "<sphinx:field name=\"#{key}\"/>"
|
119
|
+
end
|
120
|
+
|
121
|
+
xml << '<sphinx:field name="classname"/>'
|
122
|
+
xml << '<sphinx:attr name="csphinx-class" type="multi"/>'
|
123
|
+
|
124
|
+
xml << '</sphinx:schema>'
|
125
|
+
|
126
|
+
@xml_header = xml
|
127
|
+
@xml_footer = '</sphinx:docset>'
|
128
|
+
end
|
129
|
+
|
130
|
+
# Returns the encoded data as XML.
|
131
|
+
|
132
|
+
def to_xml
|
133
|
+
return to_s
|
134
|
+
end
|
135
|
+
|
136
|
+
# Returns the encoded data as XML.
|
137
|
+
|
138
|
+
def to_s
|
139
|
+
return self.xml_header + self.xml_docs.join + self.xml_footer
|
140
|
+
end
|
141
|
+
end
|
142
|
+
|
143
|
+
# Class XMLDoc wraps the XML representation of a single document to index
|
144
|
+
# and contains a complete "sphinx:document" tag.
|
145
|
+
|
146
|
+
class XMLDoc
|
147
|
+
|
148
|
+
# Returns the ID of the encoded data.
|
149
|
+
|
150
|
+
attr_reader :id
|
151
|
+
|
152
|
+
# Returns the class name of the encoded data.
|
153
|
+
|
154
|
+
attr_reader :class_name
|
155
|
+
|
156
|
+
# Returns the encoded data.
|
157
|
+
|
158
|
+
attr_reader :xml
|
159
|
+
|
160
|
+
# Creates an XMLDoc object from the provided MongoMapper object.
|
161
|
+
#
|
162
|
+
# Parameters:
|
163
|
+
#
|
164
|
+
# [object] Object to index
|
165
|
+
|
166
|
+
def self.from_object(object)
|
167
|
+
raise ArgumentError, 'Missing object' if object.nil?
|
168
|
+
raise ArgumentError, 'No compatible ID' if (id = object.sphinx_id).nil?
|
169
|
+
|
170
|
+
return new(id, object.class.to_s, object.fulltext_attributes)
|
171
|
+
end
|
172
|
+
|
173
|
+
# Creates an XMLDoc object from the provided ID, class name and data.
|
174
|
+
#
|
175
|
+
# Parameters:
|
176
|
+
#
|
177
|
+
# [id] ID of the object to index
|
178
|
+
# [class_name] Name of the class
|
179
|
+
# [properties] Hash with the properties to index
|
180
|
+
|
181
|
+
def initialize(id, class_name, properties)
|
182
|
+
raise ArgumentError, 'Missing id' if id.nil?
|
183
|
+
raise ArgumentError, 'Missing class_name' if class_name.nil?
|
184
|
+
|
185
|
+
xml = "<sphinx:document id=\"#{id}\">"
|
186
|
+
|
187
|
+
xml << '<csphinx-class>'
|
188
|
+
xml << MongoSphinx::MultiAttribute.encode(class_name)
|
189
|
+
xml << '</csphinx-class>'
|
190
|
+
xml << "<classname>#{class_name}</classname>"
|
191
|
+
|
192
|
+
properties.each do |key, value|
|
193
|
+
xml << "<#{key}><![CDATA[[#{value}]]></#{key}>"
|
194
|
+
end
|
195
|
+
|
196
|
+
xml << '</sphinx:document>'
|
197
|
+
|
198
|
+
@xml = xml
|
199
|
+
|
200
|
+
@id = id
|
201
|
+
@class_name = class_name
|
202
|
+
end
|
203
|
+
|
204
|
+
# Returns the encoded data as XML.
|
205
|
+
|
206
|
+
def to_xml
|
207
|
+
return to_s
|
208
|
+
end
|
209
|
+
|
210
|
+
# Returns the encoded data as XML.
|
211
|
+
|
212
|
+
def to_s
|
213
|
+
return self.xml
|
214
|
+
end
|
215
|
+
end
|
216
|
+
end
|
217
|
+
end
|
data/lib/mixins/.indexer.rb.swp
ADDED
Binary file
data/lib/mixins/indexer.rb
ADDED
@@ -0,0 +1,181 @@
|
|
1
|
+
# MongoSphinx, a full text indexing extension for MongoDB/MongoMapper using
|
2
|
+
# Sphinx.
|
3
|
+
#
|
4
|
+
# This file contains the MongoMapper::Mixins::Indexer module which in turn
|
5
|
+
# includes MongoMapper::Mixins::Indexer::ClassMethods.
|
6
|
+
|
7
|
+
module MongoMapper # :nodoc:
|
8
|
+
module Mixins # :nodoc:
|
9
|
+
|
10
|
+
# Mixin for MongoMapper adding indexing stuff. See class ClassMethods for
|
11
|
+
# details.
|
12
|
+
|
13
|
+
module Indexer #:nodoc:
|
14
|
+
|
15
|
+
# Bootstrap method to include patches with.
|
16
|
+
#
|
17
|
+
# Parameters:
|
18
|
+
#
|
19
|
+
# [base] Class to include class methods of module into
|
20
|
+
|
21
|
+
def self.included(base)
|
22
|
+
base.extend(ClassMethods)
|
23
|
+
end
|
24
|
+
|
25
|
+
# Patches to the MongoMapper Document module: Adds the
|
26
|
+
# "fulltext_index" method for enabling indexing and defining the fields
|
27
|
+
# to include as a domain specific extension. This method also assures
|
28
|
+
# the existence of a special design document used to generate indexes
|
29
|
+
# from.
|
30
|
+
#
|
31
|
+
# An additional save callback sets an ID like "Post-123123" (class name
|
32
|
+
# plus pure numeric ID compatible with Sphinx) for new objects.
|
33
|
+
#
|
34
|
+
# Last but not least method "by_fulltext_index" is defined allowing a
|
35
|
+
|
36
|
+
# full text search like "foo @title bar" within the context of the
|
37
|
+
# current class.
|
38
|
+
#
|
39
|
+
# Samples:
|
40
|
+
#
|
41
|
+
# class Post
|
42
|
+
# include MongoMapper::Document
|
43
|
+
#
|
44
|
+
# key :title, String
|
45
|
+
# key :body, String
|
46
|
+
#
|
47
|
+
# fulltext_index :title, :body
|
48
|
+
# end
|
49
|
+
#
|
50
|
+
# Post.by_fulltext_index('first')
|
51
|
+
# => [...]
|
52
|
+
# post = Post.by_fulltext_index('this is @title post').first
|
53
|
+
# post.title
|
54
|
+
# => "First Post"
|
55
|
+
# post.class
|
56
|
+
# => Post
|
57
|
+
|
58
|
+
def save_callback()
|
59
|
+
object = self
|
60
|
+
if object.id.nil?
|
61
|
+
idsize = fulltext_opts[:idsize] || 32
|
62
|
+
limit = (1 << idsize) - 1
|
63
|
+
|
64
|
+
while true
|
65
|
+
id = rand(limit)
|
66
|
+
candidate = "#{self.class.to_s}-#{id}"
|
67
|
+
|
68
|
+
begin
|
69
|
+
object.class.find(candidate) # Resource not found exception if available
|
70
|
+
rescue MongoMapper::DocumentNotFound
|
71
|
+
object.id = candidate
|
72
|
+
break
|
73
|
+
end
|
74
|
+
end
|
75
|
+
end
|
76
|
+
end
|
77
|
+
|
78
|
+
|
79
|
+
|
80
|
+
module ClassMethods
|
81
|
+
|
82
|
+
# Method for enabling fulltext indexing and for defining the fields to
|
83
|
+
# include.
|
84
|
+
#
|
85
|
+
# Parameters:
|
86
|
+
#
|
87
|
+
# [keys] Array of field keys to include plus options Hash
|
88
|
+
#
|
89
|
+
# Options:
|
90
|
+
#
|
91
|
+
# [:server] Server name (defaults to localhost)
|
92
|
+
# [:port] Server port (defaults to 3312)
|
93
|
+
# [:idsize] Number of bits for the ID to generate (defaults to 32)
|
94
|
+
|
95
|
+
def fulltext_index(*keys)
|
96
|
+
opts = keys.pop if keys.last.is_a?(Hash)
|
97
|
+
opts ||= {} # Handle some options: Future use... :-)
|
98
|
+
|
99
|
+
# Save the keys to index and the options for later use in callback.
|
100
|
+
# Helper method cattr_accessor is already bootstrapped by couchrest
|
101
|
+
# gem.
|
102
|
+
|
103
|
+
cattr_accessor :fulltext_keys
|
104
|
+
cattr_accessor :fulltext_opts
|
105
|
+
|
106
|
+
self.fulltext_keys = keys
|
107
|
+
self.fulltext_opts = opts
|
108
|
+
|
109
|
+
# Overwrite setting of new ID to do something compatible with
|
110
|
+
# Sphinx. If an ID already exists, we try to match it with our
|
111
|
+
# Schema and cowardly ignore if not.
|
112
|
+
|
113
|
+
before_save :save_callback
|
114
|
+
|
115
|
+
end
|
116
|
+
|
117
|
+
# Searches for an object of this model class (e.g. Post, Comment) and
|
118
|
+
# the requested query string. The query string may contain any query
|
119
|
+
# provided by Sphinx.
|
120
|
+
#
|
121
|
+
# Call MongoMapper::Document.by_fulltext_index() to query
|
122
|
+
# without reducing to a single class type.
|
123
|
+
#
|
124
|
+
# Parameters:
|
125
|
+
#
|
126
|
+
# [query] Query string like "foo @title bar"
|
127
|
+
# [options] Additional options to set
|
128
|
+
#
|
129
|
+
# Options:
|
130
|
+
#
|
131
|
+
# [:match_mode] Optional Riddle match mode (defaults to :extended)
|
132
|
+
# [:limit] Optional Riddle limit (Riddle default)
|
133
|
+
# [:max_matches] Optional Riddle max_matches (Riddle default)
|
134
|
+
# [:sort_by] Optional Riddle sort order (also sets sort_mode to :extended)
|
135
|
+
# [:raw] Flag to return only IDs and do not lookup objects (defaults to false)
|
136
|
+
|
137
|
+
def by_fulltext_index(query, options = {})
|
138
|
+
if self == Document
|
139
|
+
client = Riddle::Client.new
|
140
|
+
else
|
141
|
+
client = Riddle::Client.new(fulltext_opts[:server],
|
142
|
+
fulltext_opts[:port])
|
143
|
+
|
144
|
+
query = query + " @classname #{self}"
|
145
|
+
end
|
146
|
+
|
147
|
+
client.match_mode = options[:match_mode] || :extended
|
148
|
+
|
149
|
+
if (limit = options[:limit])
|
150
|
+
client.limit = limit
|
151
|
+
end
|
152
|
+
|
153
|
+
if (max_matches = options[:max_matches])
|
154
|
+
client.max_matches = max_matches
|
155
|
+
end
|
156
|
+
|
157
|
+
if (sort_by = options[:sort_by])
|
158
|
+
client.sort_mode = :extended
|
159
|
+
client.sort_by = sort_by
|
160
|
+
end
|
161
|
+
|
162
|
+
result = client.query(query)
|
163
|
+
|
164
|
+
#TODO
|
165
|
+
if result and result[:status] == 0 and (matches = result[:matches])
|
166
|
+
classname = nil
|
167
|
+
ids = matches.collect do |row|
|
168
|
+
classname = MongoSphinx::MultiAttribute.decode(row[:attributes]['csphinx-class'])
|
169
|
+
(classname + '-' + row[:doc].to_s) rescue nil
|
170
|
+
end.compact
|
171
|
+
|
172
|
+
return ids if options[:raw]
|
173
|
+
return Object.const_get(classname).find(ids)
|
174
|
+
else
|
175
|
+
return []
|
176
|
+
end
|
177
|
+
end
|
178
|
+
end
|
179
|
+
end
|
180
|
+
end
|
181
|
+
end
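The result handling in by_fulltext_index turns each Sphinx match into a document ID of the form "ClassName-number". The following is a stand-alone sketch of that step with hand-written match rows; the row shape (<tt>:doc</tt> and <tt>:attributes</tt>) is taken from how the method reads results, while <tt>decode_class</tt> and the sample values are assumptions made for illustration.

```ruby
# Decodes a "multi" attribute back into a String, using the same offset
# scheme as MongoSphinx::MultiAttribute (n-th value minus n * 0x0100).
def decode_class(multi)
  offset = 0
  multi.map { |x| (x.to_i - (offset += 0x0100)).chr }.join
end

# Hypothetical, hand-written match rows in the shape the method reads:
# row[:doc] and row[:attributes]['csphinx-class'].
matches = [
  { :doc => 38497238,  :attributes => { 'csphinx-class' => [336, 623, 883, 1140] } },
  { :doc => 921744775, :attributes => { 'csphinx-class' => [336, 623, 883, 1140] } }
]

# Rebuild the document IDs from the decoded class name and the Sphinx ID.
ids = matches.map do |row|
  "#{decode_class(row[:attributes]['csphinx-class'])}-#{row[:doc]}"
end

p ids
```

With the <tt>:raw</tt> option these IDs are returned as they are; otherwise they are passed to <tt>find</tt> on the decoded class.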
|
data/lib/mixins/properties.rb
ADDED
@@ -0,0 +1,64 @@
|
|
1
|
+
# MongoSphinx, a full text indexing extension for MongoDB using
|
2
|
+
# Sphinx.
|
3
|
+
#
|
4
|
+
# This file contains the MongoMapper::Mixins::Properties module.
|
5
|
+
|
6
|
+
# Patches to the CouchRest library.
|
7
|
+
|
8
|
+
module MongoMapper # :nodoc:
|
9
|
+
module Mixins # :nodoc:
|
10
|
+
|
11
|
+
# Patches to the CouchRest Properties module: Adds the "attributes" method
|
12
|
+
# plus some fulltext relevant stuff.
|
13
|
+
#
|
14
|
+
# Samples:
|
15
|
+
#
|
16
|
+
# data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
|
17
|
+
# rows = data['rows']
|
18
|
+
# post = Post.new(rows.first)
|
19
|
+
#
|
20
|
+
# post.attributes
|
21
|
+
# => {:tags=>"one, two, three", :updated_at=>Tue Jun 09 14:45:00 +0200 2009,
|
22
|
+
# :author=>nil, :title=>"First Post",
|
23
|
+
# :created_at=>Tue Jun 09 14:45:00 +0200 2009,
|
24
|
+
# :body=>"This is the first post. This is the [...] first post. "}
|
25
|
+
#
|
26
|
+
# post.fulltext_attributes
|
27
|
+
# => {:title=>"First Post", :author=>nil,
|
28
|
+
# :created_at=>Tue Jun 09 14:45:00 +0200 2009
|
29
|
+
# :body=>"This is the first post. This is the [...] first post. "}
|
30
|
+
#
|
31
|
+
# post.sphinx_id
|
32
|
+
# => "921744775"
|
33
|
+
# post.id
|
34
|
+
# => "Post-921744775"
|
35
|
+
|
36
|
+
module Properties
|
37
|
+
|
38
|
+
# Returns a Hash of all attributes allowed to be indexed. As a side
|
39
|
+
# effect it sets the fulltext_keys variable if still blank or empty.
|
40
|
+
|
41
|
+
def fulltext_attributes
|
42
|
+
clas = self.class
|
43
|
+
|
44
|
+
if not clas.fulltext_keys or clas.fulltext_keys.empty?
|
45
|
+
clas.fulltext_keys = self.attributes.collect { |k,v| k.intern }
|
46
|
+
end
|
47
|
+
|
48
|
+
return self.attributes.reject do |k, v|
|
49
|
+
not (clas.fulltext_keys.include? k.intern)
|
50
|
+
end
|
51
|
+
end
|
52
|
+
|
53
|
+
# Returns the numeric part of the document ID (compatible to Sphinx).
|
54
|
+
|
55
|
+
def sphinx_id
|
56
|
+
if (match = self.id.match(/#{self.class}-([0-9]+)/))
|
57
|
+
return match[1]
|
58
|
+
else
|
59
|
+
return nil
|
60
|
+
end
|
61
|
+
end
|
62
|
+
end
|
63
|
+
end
|
64
|
+
end
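The filtering done by fulltext_attributes can be illustrated on a plain Hash. This is a sketch under assumptions: <tt>filter_attributes</tt> is a hypothetical function, and unlike the method above it does not store the derived key list on the class.

```ruby
# Hypothetical stand-in for the filtering step: keep only attributes whose
# key is listed in fulltext_keys. An empty key list means "index everything".
def filter_attributes(attributes, fulltext_keys)
  keys = fulltext_keys.empty? ? attributes.keys.map(&:to_sym) : fulltext_keys
  attributes.select { |k, _| keys.include?(k.to_sym) }
end

attrs = {
  'title' => 'First Post',
  'body'  => 'This is the first post.',
  'tags'  => 'one, two, three'
}

p filter_attributes(attrs, [:title, :body])
p filter_attributes(attrs, [])
```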
|
data/lib/multi_attribute.rb
ADDED
@@ -0,0 +1,57 @@
|
|
1
|
+
# MongoSphinx, a full text indexing extension for MongoDB using
|
2
|
+
# Sphinx.
|
3
|
+
#
|
4
|
+
# This file contains the MongoSphinx::MultiAttribute class.
|
5
|
+
|
6
|
+
# Namespace module for the MongoSphinx gem.
|
7
|
+
|
8
|
+
module MongoSphinx #:nodoc:
|
9
|
+
|
10
|
+
# Module MultiAttribute implements helpers to translate back and
|
11
|
+
# forth between Ruby Strings and an array of integers suitable for Sphinx
|
12
|
+
# attributes of type "multi".
|
13
|
+
#
|
14
|
+
# Background: Getting an ID as result for a query is OK, but for example to
|
15
|
+
# allow cast safety, we need an additional attribute. Sphinx supports
|
16
|
+
# attributes which are returned together with the ID, but they behave a
|
17
|
+
# little different than expected: Instead we can use arrays of integers with
|
18
|
+
# ASCII character codes. These values are returned in ascending (!) order of
|
19
|
+
# value (yes, sounds funny but is reasonable from an internal view to
|
20
|
+
# Sphinx). So we mask each byte with 0x0100++ to keep the order...
|
21
|
+
#
|
22
|
+
# Sample:
|
23
|
+
#
|
24
|
+
# MongoSphinx::MultiAttribute.encode('Hello')
|
25
|
+
# => "328,613,876,1132,1391"
|
26
|
+
# MongoSphinx::MultiAttribute.decode('328,613,876,1132,1391')
|
27
|
+
# => "Hello"
|
28
|
+
|
29
|
+
module MultiAttribute
|
30
|
+
|
31
|
+
# Returns a numeric representation of a Ruby String suitable for "multi"
|
32
|
+
# attributes of Sphinx.
|
33
|
+
#
|
34
|
+
# Parameters:
|
35
|
+
#
|
36
|
+
# [str] String to translate
|
37
|
+
|
38
|
+
def self.encode(str)
|
39
|
+
offset = 0
|
40
|
+
return str.bytes.collect { |c| (offset+= 0x0100) + c }.join(',')
|
41
|
+
end
|
42
|
+
|
43
|
+
# Returns the original String decoded from a Sphinx "multi" attribute. Only
|
44
|
+
# works if the attribute was created by the encode method before!
|
45
|
+
#
|
46
|
+
# Parameters:
|
47
|
+
#
|
48
|
+
# [multi] Sphinx "multi" attribute to translate back
|
49
|
+
|
50
|
+
def self.decode(multi)
|
51
|
+
offset = 0
|
52
|
+
multi = multi.split(',') if not multi.kind_of? Array
|
53
|
+
|
54
|
+
return multi.collect {|x| (x.to_i-(offset+=0x0100)).chr}.join
|
55
|
+
end
|
56
|
+
end
|
57
|
+
end
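The offset scheme used by MultiAttribute can be traced by hand with a stand-alone sketch. The function names <tt>encode_multi</tt> and <tt>decode_multi</tt> are hypothetical; the arithmetic follows the module above: the n-th byte (counting from 1) is stored as byte + n * 0x0100, so sorting the values in ascending order keeps the original character order.

```ruby
# Stand-alone sketch of the offset scheme. Because each byte is below
# 0x0100, the encoded values are strictly increasing.
def encode_multi(str)
  offset = 0
  str.bytes.map { |c| (offset += 0x0100) + c }.join(',')
end

# Accepts either a comma separated String or an Array of values.
def decode_multi(multi)
  offset = 0
  values = multi.is_a?(Array) ? multi : multi.split(',')
  values.map { |x| (x.to_i - (offset += 0x0100)).chr }.join
end

encoded = encode_multi('Hello')
puts encoded # => 328,613,876,1132,1391

# Sorting ascending, as Sphinx does, leaves the order unchanged.
sorted = encoded.split(',').sort_by(&:to_i).join(',')
puts decode_multi(sorted) # => Hello
```

For example, 'H' is byte 72 and becomes 72 + 256 = 328, while 'e' is byte 101 and becomes 101 + 512 = 613.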
|
data/mongosphinx.rb
ADDED
@@ -0,0 +1,41 @@
|
|
1
|
+
# MongoSphinx, a full text indexing extension for MongoDB using
|
2
|
+
# Sphinx.
|
3
|
+
#
|
4
|
+
# This file contains the includes implementing this library. Have a look at
|
5
|
+
# the README.rdoc as a starting point.
|
6
|
+
|
7
|
+
begin
|
8
|
+
require 'rubygems'
|
9
|
+
rescue LoadError; end
|
10
|
+
require 'mongomapper'
|
11
|
+
require 'riddle'
|
12
|
+
|
13
|
+
|
14
|
+
module MongoSphinx
|
15
|
+
if (match = __FILE__.match(/mongosphinx-([0-9.-]*)/))
|
16
|
+
VERSION = match[1]
|
17
|
+
else
|
18
|
+
VERSION = 'unknown'
|
19
|
+
end
|
20
|
+
end
|
21
|
+
|
22
|
+
require 'lib/multi_attribute'
|
23
|
+
require 'lib/indexer'
|
24
|
+
require 'lib/mixins/indexer'
|
25
|
+
require 'lib/mixins/properties'
|
26
|
+
|
27
|
+
|
28
|
+
# Include the Indexer mixin from the original Document class of
|
29
|
+
# MongoMapper which adds a few methods and allows calling method fulltext_index.
|
30
|
+
|
31
|
+
module MongoMapper # :nodoc:
|
32
|
+
module Document # :nodoc:
|
33
|
+
include MongoMapper::Mixins::Indexer
|
34
|
+
module InstanceMethods
|
35
|
+
include MongoMapper::Mixins::Properties
|
36
|
+
end
|
37
|
+
module ClassMethods
|
38
|
+
include MongoMapper::Mixins::Indexer::ClassMethods
|
39
|
+
end
|
40
|
+
end
|
41
|
+
end
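The version detection at the top of this file depends on the directory the file was loaded from. A small sketch of the same pattern applied to sample paths; <tt>version_from_path</tt> and both paths are hypothetical and only serve as illustration.

```ruby
# Extracts the version from a path containing "mongosphinx-<version>",
# falling back to 'unknown' when the pattern is absent.
def version_from_path(path)
  match = path.match(/mongosphinx-([0-9.-]*)/)
  match ? match[1] : 'unknown'
end

puts version_from_path('/gems/mongosphinx-0.1/mongosphinx.rb')
puts version_from_path('/app/vendor/mongosphinx.rb')
```

When the library is loaded from a directory without the version in its name, such as a vendored copy, the constant ends up as 'unknown'.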
|
metadata
ADDED
@@ -0,0 +1,69 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: mongosphinx
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: "0.1"
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Burke Libbey
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
|
12
|
+
date: 2009-10-14 00:00:00 -05:00
|
13
|
+
default_executable:
|
14
|
+
dependencies: []
|
15
|
+
|
16
|
+
description:
|
17
|
+
email: burke@burkelibbey.org
|
18
|
+
executables: []
|
19
|
+
|
20
|
+
extensions: []
|
21
|
+
|
22
|
+
extra_rdoc_files:
|
23
|
+
- README.rdoc
|
24
|
+
files:
|
25
|
+
- README.rdoc
|
26
|
+
- mongosphinx.rb
|
27
|
+
- lib/multi_attribute.rb
|
28
|
+
- lib/mixins/properties.rb
|
29
|
+
- lib/mixins/indexer.rb
|
30
|
+
- lib/mixins/.indexer.rb.swp
|
31
|
+
- lib/indexer.rb
|
32
|
+
has_rdoc: true
|
33
|
+
homepage: http://github.com/burke/mongosphinx
|
34
|
+
licenses: []
|
35
|
+
|
36
|
+
post_install_message:
|
37
|
+
rdoc_options:
|
38
|
+
- --exclude
|
39
|
+
- pkg
|
40
|
+
- --exclude
|
41
|
+
- tmp
|
42
|
+
- --all
|
43
|
+
- --title
|
44
|
+
- MongoSphinx
|
45
|
+
- --main
|
46
|
+
- README.rdoc
|
47
|
+
require_paths:
|
48
|
+
- .
|
49
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
50
|
+
requirements:
|
51
|
+
- - ">="
|
52
|
+
- !ruby/object:Gem::Version
|
53
|
+
version: "0"
|
54
|
+
version:
|
55
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
56
|
+
requirements:
|
57
|
+
- - ">="
|
58
|
+
- !ruby/object:Gem::Version
|
59
|
+
version: "0"
|
60
|
+
version:
|
61
|
+
requirements: []
|
62
|
+
|
63
|
+
rubyforge_project:
|
64
|
+
rubygems_version: 1.3.5
|
65
|
+
signing_key:
|
66
|
+
specification_version: 3
|
67
|
+
summary: A full text indexing extension for MongoDB using Sphinx.
|
68
|
+
test_files: []
|
69
|
+
|