couchsphinx 0.1
- data/README.rdoc +171 -0
- data/couchsphinx.rb +28 -0
- data/lib/indexer.rb +217 -0
- data/lib/mixins/indexer.rb +256 -0
- data/lib/mixins/properties.rb +75 -0
- data/lib/multi_attribute.rb +57 -0
- metadata +68 -0
data/README.rdoc
ADDED
|
@@ -0,0 +1,171 @@
|
|
|
1
|
+
= CouchSphinx
|
|
2
|
+
|
|
3
|
+
The CouchSphinx library implements an interface between CouchDB and Sphinx
|
|
4
|
+
supporting CouchRest to automatically index objects in Sphinx. It tries to
|
|
5
|
+
act as transparently as possible: Just an additional method in the CouchRest
|
|
6
|
+
domain specific language and some Sphinx configuration are needed to get
|
|
7
|
+
going.
|
|
8
|
+
|
|
9
|
+
== Prerequisites
|
|
10
|
+
|
|
11
|
+
CouchSphinx needs gems CouchRest and Riddle as well as a running Sphinx
|
|
12
|
+
and a CouchDB installation.
|
|
13
|
+
|
|
14
|
+
sudo gem sources -a http://gems.github.com # Only needed once!
|
|
15
|
+
sudo gem install riddle
|
|
16
|
+
sudo gem install couchrest
|
|
17
|
+
sudo gem install ulbrich-couchsphinx
|
|
18
|
+
|
|
19
|
+
No additional configuration is needed for interfacing with CouchDB: Setup is
|
|
20
|
+
done when CouchRest is able to talk to the CouchDB server.
|
|
21
|
+
|
|
22
|
+
A proper "sphinx.conf" file and a script for retrieving index data have to
|
|
23
|
+
be provided for interfacing with Sphinx: Sorry, no UltraSphinx-like
|
|
24
|
+
magic... :-) Depending on the amount of data, more than one index may be used
|
|
25
|
+
and indexes may be consolidated from time to time.
|
|
26
|
+
|
|
27
|
+
This is a sample configuration for a single "main" index:
|
|
28
|
+
|
|
29
|
+
searchd {
|
|
30
|
+
address = 0.0.0.0
|
|
31
|
+
port = 3312
|
|
32
|
+
|
|
33
|
+
log = ./sphinx/searchd.log
|
|
34
|
+
query_log = ./sphinx/query.log
|
|
35
|
+
pid_file = ./sphinx/searchd.pid
|
|
36
|
+
}
|
|
37
|
+
|
|
38
|
+
source couchblog {
|
|
39
|
+
type = xmlpipe2
|
|
40
|
+
|
|
41
|
+
xmlpipe_command = ./sphinxsource.rb
|
|
42
|
+
}
|
|
43
|
+
|
|
44
|
+
index couchblog {
|
|
45
|
+
source = couchblog
|
|
46
|
+
|
|
47
|
+
charset_type = utf-8
|
|
48
|
+
path = ./sphinx/sphinx_index_main
|
|
49
|
+
}
|
|
50
|
+
|
|
51
|
+
The script "sphinxsource.rb" providing the data to index may vary
|
|
52
|
+
depending on the number of CouchDB instances it talks to. This is a simple
|
|
53
|
+
script interfacing with one single instance:
|
|
54
|
+
|
|
55
|
+
#!/usr/bin/env ruby
|
|
56
|
+
|
|
57
|
+
require 'rubygems'
|
|
58
|
+
require 'lib/models' # Depends on location of model files
|
|
59
|
+
|
|
60
|
+
data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
|
|
61
|
+
rows = data['rows'] rescue []
|
|
62
|
+
|
|
63
|
+
puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
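
For orientation, here is a minimal sketch of the kind of xmlpipe2 stream such a script prints. It is not the code of <tt>XMLDocset</tt>: the method name <tt>docset_xml</tt> and the row layout (a Hash with an <tt>'id'</tt> key) are assumptions made for illustration only.

```ruby
require 'cgi'

# Hypothetical helper, not part of CouchSphinx: builds a tiny xmlpipe2
# document set from plain Hashes.
def docset_xml(rows, fields)
  parts = ['<?xml version="1.0" encoding="utf-8"?>']
  parts << '<sphinx:docset><sphinx:schema>'

  # One <sphinx:field> per full text field declared in the schema.
  fields.each { |name| parts << %(<sphinx:field name="#{name}"/>) }
  parts << '</sphinx:schema>'

  rows.each do |row|
    # Sphinx requires a numeric document ID here.
    parts << %(<sphinx:document id="#{row['id']}">)
    fields.each do |name|
      parts << "<#{name}>#{CGI.escapeHTML(row[name].to_s)}</#{name}>"
    end
    parts << '</sphinx:document>'
  end

  parts << '</sphinx:docset>'
  parts.join
end

puts docset_xml([{ 'id' => 1, 'title' => 'First Post' }], ['title'])
```

Sphinx reads this stream from the standard output of the command named in <tt>xmlpipe_command</tt>.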
|
|
64
|
+
|
|
65
|
+
== Models
|
|
66
|
+
|
|
67
|
+
Use method <tt>fulltext_index</tt> to enable indexing of a model. The
|
|
68
|
+
default is to index all attributes but it is recommended to provide a list of
|
|
69
|
+
attribute keys.
|
|
70
|
+
|
|
71
|
+
A side effect of calling this method is that CouchSphinx overrides the
|
|
72
|
+
default of letting CouchDB create new IDs: Sphinx only allows numeric IDs and
|
|
73
|
+
CouchSphinx forces new objects with the name of the class, a hyphen and an
|
|
74
|
+
integer as ID (e.g. <tt>Post-38497238</tt>). Again: Only these objects are
|
|
75
|
+
indexed due to internal restrictions of Sphinx.
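
The ID scheme described above can be sketched as follows. The helper names <tt>build_document_id</tt> and <tt>numeric_part</tt> are hypothetical and not part of CouchSphinx, and unlike the library this sketch does not check CouchDB for an already existing ID.

```ruby
# Hypothetical helpers illustrating the "ClassName-integer" ID scheme.

# Builds an ID such as "Post-38497238" with a random integer below 2**idsize - 1.
def build_document_id(class_name, idsize = 32)
  limit = (1 << idsize) - 1
  "#{class_name}-#{rand(limit)}"
end

# Returns the numeric part as a String, or nil if the ID does not follow
# the scheme (such documents cannot be handed to Sphinx).
def numeric_part(document_id, class_name)
  match = document_id.match(/\A#{Regexp.escape(class_name)}-([0-9]+)\z/)
  match && match[1]
end

id = build_document_id('Post')
puts id                          # "Post-" followed by a random integer
puts numeric_part(id, 'Post')    # only the integer part
p numeric_part('a1b2c3', 'Post') # prints nil: not indexable
```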
|
|
76
|
+
|
|
77
|
+
Sample:
|
|
78
|
+
|
|
79
|
+
class Post < CouchRest::ExtendedDocument
|
|
80
|
+
use_database SERVER.default_database
|
|
81
|
+
|
|
82
|
+
property :title
|
|
83
|
+
property :body
|
|
84
|
+
|
|
85
|
+
fulltext_index :title, :body
|
|
86
|
+
end
|
|
87
|
+
|
|
88
|
+
Add options <tt>:server</tt> and <tt>:port</tt> to <tt>fulltext_index</tt> if
|
|
89
|
+
the Sphinx server to query is running on a different server (defaults to
|
|
90
|
+
"localhost" with port 3312).
|
|
91
|
+
|
|
92
|
+
If you are sure your Sphinx is compiled with 64-bit support, you may add
|
|
93
|
+
option <tt>:idsize</tt> with value <tt>64</tt> to generate 64-bit IDs for
|
|
94
|
+
CouchDB (defaults to 32 bits).
|
|
95
|
+
|
|
96
|
+
Here is a full-featured sample setting additional options:
|
|
97
|
+
|
|
98
|
+
fulltext_index :title, :body, :server => 'my.other.server', :port => 3313,
|
|
99
|
+
:idsize => 64
|
|
100
|
+
|
|
101
|
+
== Indexing
|
|
102
|
+
|
|
103
|
+
CouchSphinx also adds a new design document to CouchDB: It needs to collect
|
|
104
|
+
all relevant objects for running the Sphinx indexer and adds its own views
|
|
105
|
+
to do so. Have a look at CouchDB design document "CouchSphinxIndex" for
|
|
106
|
+
details.
|
|
107
|
+
|
|
108
|
+
Automatically starting the reindexing of objects the moment new objects are
|
|
109
|
+
created can be implemented by adding a save_callback to the model class:
|
|
110
|
+
|
|
111
|
+
save_callback :after do |object|
|
|
112
|
+
`sudo indexer --all --rotate` # Configure sudo to allow this call...
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
This or a similar callback should be added to all models needing instant
|
|
116
|
+
indexing. If indexing is not that crucial or load is high, some additional
|
|
117
|
+
checks for the time of the last call should be added.
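
One possible shape for such a check is sketched below. The class <tt>ReindexThrottle</tt> and the 60 second interval are assumptions for illustration; the state lives in one Ruby process only, so a setup with several processes would need a shared timestamp instead.

```ruby
# Hypothetical throttle, not part of CouchSphinx: runs the given block only
# if the previous run is at least min_interval_seconds ago.
class ReindexThrottle
  def initialize(min_interval_seconds)
    @min_interval = min_interval_seconds
    @last_run = nil
  end

  # Returns true if the block was executed, false if the call was skipped.
  def run(now = Time.now)
    return false if @last_run && (now - @last_run) < @min_interval

    @last_run = now
    yield
    true
  end
end

throttle = ReindexThrottle.new(60)
throttle.run { puts 'reindex would start here' } # e.g. the indexer call
throttle.run { puts 'not reached' }              # skipped: within 60 seconds
```

Inside a save callback the block would contain the call to the Sphinx indexer shown above.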
|
|
118
|
+
|
|
119
|
+
== Queries
|
|
120
|
+
|
|
121
|
+
An additional class method <tt>by_fulltext_index</tt> is added for each
|
|
122
|
+
fulltext indexed model. This method takes a Sphinx query like
|
|
123
|
+
"foo @title bar", runs it within the context of the current class and returns
|
|
124
|
+
an Array of matching CouchDB documents. Use
|
|
125
|
+
<tt>CouchRest::ExtendedDocument.by_fulltext_index</tt> if you want to find
|
|
126
|
+
any document matching the query and not only a certain class.
|
|
127
|
+
|
|
128
|
+
Samples:
|
|
129
|
+
|
|
130
|
+
Post.by_fulltext_index('first')
|
|
131
|
+
=> [...]
|
|
132
|
+
|
|
133
|
+
post = Post.by_fulltext_index('this is @title post').first
|
|
134
|
+
post.title
|
|
135
|
+
=> "First Post"
|
|
136
|
+
post.class
|
|
137
|
+
=> Post
|
|
138
|
+
|
|
139
|
+
Additional options <tt>:match_mode</tt>, <tt>:limit</tt> and
|
|
140
|
+
<tt>:max_matches</tt> can be provided to customize the behaviour of Riddle.
|
|
141
|
+
Option <tt>:raw</tt> can be set to <tt>true</tt> to do no lookup of the
|
|
142
|
+
document IDs but return the raw IDs instead.
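
As a rough sketch of what the raw IDs look like: the library source combines the decoded "csphinx-class" attribute of each match with its numeric Sphinx document ID. The result Hash below is made up but shaped like the one the library reads from Riddle; <tt>decode_class</tt> and <tt>couchdb_keys</tt> are hypothetical names.

```ruby
# Hypothetical helpers illustrating how CouchDB IDs are rebuilt from matches.

# Turns the integer list of a "multi" attribute back into a class name.
def decode_class(values)
  values.sort.each_with_index.map { |v, i| (v - (i + 1) * 0x0100).chr }.join
end

# Builds "ClassName-number" keys from a Riddle-style result Hash.
def couchdb_keys(result)
  return [] unless result && result[:status] == 0 && result[:matches]

  result[:matches].map do |row|
    "#{decode_class(row[:attributes]['csphinx-class'])}-#{row[:doc]}"
  end
end

result = {
  :status  => 0,
  :matches => [
    { :doc => 38497238,
      :attributes => { 'csphinx-class' => [336, 623, 883, 1140] } }
  ]
}

puts couchdb_keys(result).inspect # prints ["Post-38497238"]
```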
|
|
143
|
+
|
|
144
|
+
Sample:
|
|
145
|
+
|
|
146
|
+
Post.by_fulltext_index('my post', :limit => 100)
|
|
147
|
+
|
|
148
|
+
== Copyright & License
|
|
149
|
+
|
|
150
|
+
Copyright (c) 2009 Holtzbrinck Digital GmbH, Jan Ulbrich
|
|
151
|
+
|
|
152
|
+
Permission is hereby granted, free of charge, to any person
|
|
153
|
+
obtaining a copy of this software and associated documentation
|
|
154
|
+
files (the "Software"), to deal in the Software without
|
|
155
|
+
restriction, including without limitation the rights to use,
|
|
156
|
+
copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
157
|
+
copies of the Software, and to permit persons to whom the
|
|
158
|
+
Software is furnished to do so, subject to the following
|
|
159
|
+
conditions:
|
|
160
|
+
|
|
161
|
+
The above copyright notice and this permission notice shall be
|
|
162
|
+
included in all copies or substantial portions of the Software.
|
|
163
|
+
|
|
164
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
|
165
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
|
|
166
|
+
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
|
167
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
|
|
168
|
+
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
|
169
|
+
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
|
170
|
+
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
|
|
171
|
+
OTHER DEALINGS IN THE SOFTWARE.
|
data/couchsphinx.rb
ADDED
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
# CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
|
|
2
|
+
# Sphinx.
|
|
3
|
+
#
|
|
4
|
+
# This file contains the includes implementing this library. Have a look at
|
|
5
|
+
# the README.rdoc as a starting point.
|
|
6
|
+
|
|
7
|
+
require 'rubygems'
|
|
8
|
+
|
|
9
|
+
require 'couchrest'
|
|
10
|
+
require 'riddle'
|
|
11
|
+
|
|
12
|
+
# Version number to use for updating CouchDB design document CouchSphinxIndex
|
|
13
|
+
# if needed.
|
|
14
|
+
|
|
15
|
+
module CouchSphinx
|
|
16
|
+
if (match = __FILE__.match(/couchsphinx-([0-9.-]*)/))
|
|
17
|
+
VERSION = match[1]
|
|
18
|
+
else
|
|
19
|
+
VERSION = 'unknown'
|
|
20
|
+
end
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
# Require the stuff implementing this library...
|
|
24
|
+
|
|
25
|
+
require 'lib/multi_attribute'
|
|
26
|
+
require 'lib/indexer'
|
|
27
|
+
require 'lib/mixins/indexer'
|
|
28
|
+
require 'lib/mixins/properties'
|
data/lib/indexer.rb
ADDED
|
@@ -0,0 +1,217 @@
|
|
|
1
|
+
# CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
|
|
2
|
+
# Sphinx.
|
|
3
|
+
#
|
|
4
|
+
# This file contains the CouchSphinx::Indexer::XMLDocset and
|
|
5
|
+
# CouchSphinx::Indexer::XMLDoc classes.
|
|
6
|
+
|
|
7
|
+
# Namespace module for the CouchSphinx gem.
|
|
8
|
+
|
|
9
|
+
module CouchSphinx #:nodoc:
|
|
10
|
+
|
|
11
|
+
# Module Indexer contains classes for creating XML input documents for the
|
|
12
|
+
# indexer. Each Sphinx index consists of a single "sphinx:docset" with any
|
|
13
|
+
# number of "sphinx:document" tags.
|
|
14
|
+
#
|
|
15
|
+
# The XML source can be generated from an array of CouchRest objects or from
|
|
16
|
+
# an array of Hashes containing at least fields "couchrest-type" and "_id"
|
|
17
|
+
# as returned by CouchDB view "CouchSphinxIndex/couchrests_by_timestamp".
|
|
18
|
+
#
|
|
19
|
+
# Sample:
|
|
20
|
+
#
|
|
21
|
+
# rows = [{ 'name' => 'John', 'phone' => '199 43828',
|
|
22
|
+
# 'couchrest-type' => 'Address', '_id' => 'Address-234164'
|
|
23
|
+
# },
|
|
24
|
+
# { 'name' => 'Sue', 'mobile' => '828 19439',
|
|
25
|
+
# 'couchrest-type' => 'Address', '_id' => 'Address-422433'
|
|
26
|
+
# }
|
|
27
|
+
# ]
|
|
28
|
+
# puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
|
|
29
|
+
#
|
|
30
|
+
# <?xml version="1.0" encoding="utf-8"?>
|
|
31
|
+
# <sphinx:docset>
|
|
32
|
+
# <sphinx:schema>
|
|
33
|
+
# <sphinx:attr name="csphinx-class" type="multi"/>
|
|
34
|
+
# <sphinx:field name="couchrest-type"/>
|
|
35
|
+
# <sphinx:field name="name"/>
|
|
36
|
+
# <sphinx:field name="phone"/>
|
|
37
|
+
# <sphinx:field name="mobile"/>
|
|
38
|
+
# <sphinx:field name="created_at"/>
|
|
39
|
+
# </sphinx:schema>
|
|
40
|
+
# <sphinx:document id="234164">
|
|
41
|
+
# <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
|
|
42
|
+
# <couchrest-type>Address</couchrest-type>
|
|
43
|
+
# <name><![CDATA[[John]]></name>
|
|
44
|
+
# <phone><![CDATA[[199 43828]]></phone>
|
|
45
|
+
# <mobile><![CDATA[[]]></mobile>
|
|
46
|
+
# <created_at><![CDATA[[]]></created_at>
|
|
47
|
+
# </sphinx:document>
|
|
48
|
+
# <sphinx:document id="422433">
|
|
49
|
+
# <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
|
|
50
|
+
# <couchrest-type>Address</couchrest-type>
|
|
51
|
+
# <name><![CDATA[[Sue]]></name>
|
|
52
|
+
# <phone><![CDATA[[]]></phone>
|
|
53
|
+
# <mobile><![CDATA[[828 19439]]></mobile>
|
|
54
|
+
# <created_at><![CDATA[[]]></created_at>
|
|
55
|
+
# </sphinx:document>
|
|
56
|
+
# </sphinx:docset>
|
|
57
|
+
|
|
58
|
+
module Indexer
|
|
59
|
+
|
|
60
|
+
# Class XMLDocset wraps the XML representation of a document to index. It
|
|
61
|
+
# contains a complete "sphinx:docset" including its schema definition.
|
|
62
|
+
|
|
63
|
+
class XMLDocset
|
|
64
|
+
|
|
65
|
+
# Objects contained in document set.
|
|
66
|
+
|
|
67
|
+
attr_reader :xml_docs
|
|
68
|
+
|
|
69
|
+
# XML generated for opening the document.
|
|
70
|
+
|
|
71
|
+
attr_reader :xml_header
|
|
72
|
+
|
|
73
|
+
# XML generated for closing the document.
|
|
74
|
+
|
|
75
|
+
attr_reader :xml_footer
|
|
76
|
+
|
|
77
|
+
# Creates a XMLDocset object from the provided data. It defines a
|
|
78
|
+
# superset of all fields of the classes to index objects for. The class
|
|
79
|
+
# names are collected from the provided objects as well.
|
|
80
|
+
#
|
|
81
|
+
# Parameters:
|
|
82
|
+
#
|
|
83
|
+
# [rows] Array with objects of type CouchRest::Document or Hash to create XML for
|
|
84
|
+
|
|
85
|
+
def initialize(rows = [])
|
|
86
|
+
raise ArgumentError, 'Missing rows' if rows.nil?
|
|
87
|
+
|
|
88
|
+
xml = '<?xml version="1.0" encoding="utf-8"?>'
|
|
89
|
+
|
|
90
|
+
xml << '<sphinx:docset><sphinx:schema>'
|
|
91
|
+
|
|
92
|
+
@xml_docs = []
|
|
93
|
+
classes = []
|
|
94
|
+
|
|
95
|
+
rows.each do |row|
|
|
96
|
+
object = nil
|
|
97
|
+
|
|
98
|
+
if row.kind_of? CouchRest::Document
|
|
99
|
+
object = row
|
|
100
|
+
elsif row.kind_of? Hash
|
|
101
|
+
row = row['value'] if row['couchrest-type'].nil?
|
|
102
|
+
|
|
103
|
+
if row and (class_name = row['couchrest-type'])
|
|
104
|
+
object = Object.const_get(class_name.to_s).new(row) rescue nil
|
|
105
|
+
end
|
|
106
|
+
end
|
|
107
|
+
|
|
108
|
+
if object and object.sphinx_id
|
|
109
|
+
classes << object.class if not classes.include? object.class
|
|
110
|
+
@xml_docs << XMLDoc.from_object(object)
|
|
111
|
+
end
|
|
112
|
+
end
|
|
113
|
+
|
|
114
|
+
field_names = classes.collect { |clas| clas.fulltext_keys rescue []
|
|
115
|
+
}.flatten.uniq
|
|
116
|
+
|
|
117
|
+
field_names.each do |key|
|
|
118
|
+
xml << "<sphinx:field name=\"#{key}\"/>"
|
|
119
|
+
end
|
|
120
|
+
|
|
121
|
+
xml << '<sphinx:field name="couchrest-type"/>'
|
|
122
|
+
xml << '<sphinx:attr name="csphinx-class" type="multi"/>'
|
|
123
|
+
|
|
124
|
+
xml << '</sphinx:schema>'
|
|
125
|
+
|
|
126
|
+
@xml_header = xml
|
|
127
|
+
@xml_footer = '</sphinx:docset>'
|
|
128
|
+
end
|
|
129
|
+
|
|
130
|
+
# Returns the encoded data as XML.
|
|
131
|
+
|
|
132
|
+
def to_xml
|
|
133
|
+
return to_s
|
|
134
|
+
end
|
|
135
|
+
|
|
136
|
+
# Returns the encoded data as XML.
|
|
137
|
+
|
|
138
|
+
def to_s
|
|
139
|
+
return self.xml_header + self.xml_docs.join + self.xml_footer
|
|
140
|
+
end
|
|
141
|
+
end
|
|
142
|
+
|
|
143
|
+
# Class XMLDoc wraps the XML representation of a single document to index
|
|
144
|
+
# and contains a complete "sphinx:document" tag.
|
|
145
|
+
|
|
146
|
+
class XMLDoc
|
|
147
|
+
|
|
148
|
+
# Returns the ID of the encoded data.
|
|
149
|
+
|
|
150
|
+
attr_reader :id
|
|
151
|
+
|
|
152
|
+
# Returns the class name of the encoded data.
|
|
153
|
+
|
|
154
|
+
attr_reader :class_name
|
|
155
|
+
|
|
156
|
+
# Returns the encoded data.
|
|
157
|
+
|
|
158
|
+
attr_reader :xml
|
|
159
|
+
|
|
160
|
+
# Creates a XMLDoc object from the provided CouchRest object.
|
|
161
|
+
#
|
|
162
|
+
# Parameters:
|
|
163
|
+
#
|
|
164
|
+
# [object] Object to index
|
|
165
|
+
|
|
166
|
+
def self.from_object(object)
|
|
167
|
+
raise ArgumentError, 'Missing object' if object.nil?
|
|
168
|
+
raise ArgumentError, 'No compatible ID' if (id = object.sphinx_id).nil?
|
|
169
|
+
|
|
170
|
+
return new(id, object.class.to_s, object.fulltext_attributes)
|
|
171
|
+
end
|
|
172
|
+
|
|
173
|
+
# Creates a XMLDoc object from the provided ID, class name and data.
|
|
174
|
+
#
|
|
175
|
+
# Parameters:
|
|
176
|
+
#
|
|
177
|
+
# [id] ID of the object to index
|
|
178
|
+
# [class_name] Name of the class
|
|
179
|
+
# [properties] Hash with the properties to index
|
|
180
|
+
|
|
181
|
+
def initialize(id, class_name, properties)
|
|
182
|
+
raise ArgumentError, 'Missing id' if id.nil?
|
|
183
|
+
raise ArgumentError, 'Missing class_name' if class_name.nil?
|
|
184
|
+
|
|
185
|
+
xml = "<sphinx:document id=\"#{id}\">"
|
|
186
|
+
|
|
187
|
+
xml << '<csphinx-class>'
|
|
188
|
+
xml << CouchSphinx::MultiAttribute.encode(class_name)
|
|
189
|
+
xml << '</csphinx-class>'
|
|
190
|
+
xml << "<couchrest-type>#{class_name}</couchrest-type>"
|
|
191
|
+
|
|
192
|
+
properties.each do |key, value|
|
|
193
|
+
xml << "<#{key}><![CDATA[[#{value}]]></#{key}>"
|
|
194
|
+
end
|
|
195
|
+
|
|
196
|
+
xml << '</sphinx:document>'
|
|
197
|
+
|
|
198
|
+
@xml = xml
|
|
199
|
+
|
|
200
|
+
@id = id
|
|
201
|
+
@class_name = class_name
|
|
202
|
+
end
|
|
203
|
+
|
|
204
|
+
# Returns the encoded data as XML.
|
|
205
|
+
|
|
206
|
+
def to_xml
|
|
207
|
+
return to_s
|
|
208
|
+
end
|
|
209
|
+
|
|
210
|
+
# Returns the encoded data as XML.
|
|
211
|
+
|
|
212
|
+
def to_s
|
|
213
|
+
return self.xml
|
|
214
|
+
end
|
|
215
|
+
end
|
|
216
|
+
end
|
|
217
|
+
end
|
data/lib/mixins/indexer.rb
ADDED
|
@@ -0,0 +1,256 @@
|
|
|
1
|
+
# CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
|
|
2
|
+
# Sphinx.
|
|
3
|
+
#
|
|
4
|
+
# This file contains the CouchRest::Mixins::Indexer module which in turn
|
|
5
|
+
# includes CouchRest::Mixins::Indexer::ClassMethods.
|
|
6
|
+
|
|
7
|
+
# Patches to the CouchRest library.
|
|
8
|
+
|
|
9
|
+
module CouchRest # :nodoc:
|
|
10
|
+
module Mixins # :nodoc:
|
|
11
|
+
|
|
12
|
+
# Mixin for CouchRest adding indexing stuff. See class ClassMethods for
|
|
13
|
+
# details.
|
|
14
|
+
|
|
15
|
+
module Indexer #:nodoc:
|
|
16
|
+
|
|
17
|
+
# Bootstrap method to include patches with.
|
|
18
|
+
#
|
|
19
|
+
# Parameters:
|
|
20
|
+
#
|
|
21
|
+
# [base] Class to include class methods of module into
|
|
22
|
+
|
|
23
|
+
def self.included(base)
|
|
24
|
+
base.extend(ClassMethods)
|
|
25
|
+
end
|
|
26
|
+
|
|
27
|
+
# Patches to the CouchRest ExtendedDocument module: Adds the
|
|
28
|
+
# "fulltext_index" method for enabling indexing and defining the fields
|
|
29
|
+
# to include as a domain specific extension. This method also assures
|
|
30
|
+
# the existence of a special design document used to generate indexes
|
|
31
|
+
# from.
|
|
32
|
+
#
|
|
33
|
+
# An additional save callback sets an ID like "Post-123123" (class name
|
|
34
|
+
# plus pure numeric ID compatible with Sphinx) for new objects.
|
|
35
|
+
#
|
|
36
|
+
# Last but not least method "by_fulltext_index" is defined allowing a
|
|
37
|
+
# full text search like "foo @title bar" within the context of the
|
|
38
|
+
# current class.
|
|
39
|
+
#
|
|
40
|
+
# Samples:
|
|
41
|
+
#
|
|
42
|
+
# class Post < CouchRest::ExtendedDocument
|
|
43
|
+
# use_database SERVER.default_database
|
|
44
|
+
#
|
|
45
|
+
# property :title
|
|
46
|
+
# property :body
|
|
47
|
+
#
|
|
48
|
+
# fulltext_index :title, :body
|
|
49
|
+
# end
|
|
50
|
+
#
|
|
51
|
+
# Post.by_fulltext_index('first')
|
|
52
|
+
# => [...]
|
|
53
|
+
# post = Post.by_fulltext_index('this is @title post').first
|
|
54
|
+
# post.title
|
|
55
|
+
# => "First Post"
|
|
56
|
+
# post.class
|
|
57
|
+
# => Post
|
|
58
|
+
|
|
59
|
+
module ClassMethods
|
|
60
|
+
|
|
61
|
+
# Method for enabling fulltext indexing and for defining the fields to
|
|
62
|
+
# include.
|
|
63
|
+
#
|
|
64
|
+
# Parameters:
|
|
65
|
+
#
|
|
66
|
+
# [keys] Array of field keys to include plus options Hash
|
|
67
|
+
#
|
|
68
|
+
# Options:
|
|
69
|
+
#
|
|
70
|
+
# [:server] Server name (defaults to localhost)
|
|
71
|
+
# [:port] Server port (defaults to 3312)
|
|
72
|
+
# [:idsize] Number of bits for the ID to generate (defaults to 32)
|
|
73
|
+
|
|
74
|
+
def fulltext_index(*keys)
|
|
75
|
+
opts = keys.pop if keys.last.is_a?(Hash)
|
|
76
|
+
opts ||= {} # Handle some options: Future use... :-)
|
|
77
|
+
|
|
78
|
+
# Save the keys to index and the options for later use in callback.
|
|
79
|
+
# Helper method cattr_accessor is already bootstrapped by couchrest
|
|
80
|
+
# gem.
|
|
81
|
+
|
|
82
|
+
cattr_accessor :fulltext_keys
|
|
83
|
+
cattr_accessor :fulltext_opts
|
|
84
|
+
|
|
85
|
+
self.fulltext_keys = keys
|
|
86
|
+
self.fulltext_opts = opts
|
|
87
|
+
|
|
88
|
+
# We add a few new functions to CouchDB for retrieving modified
|
|
89
|
+
# documents...
|
|
90
|
+
|
|
91
|
+
assure_existing_couch_index
|
|
92
|
+
|
|
93
|
+
# Overwrite setting of new ID to do something compatible with
|
|
94
|
+
# Sphinx. If an ID already exists, we try to match it with our
|
|
95
|
+
# Schema and cowardly ignore if not.
|
|
96
|
+
|
|
97
|
+
save_callback :before do |object|
|
|
98
|
+
if object.id.nil?
|
|
99
|
+
idsize = fulltext_opts[:idsize] || 32
|
|
100
|
+
limit = (1 << idsize) - 1
|
|
101
|
+
|
|
102
|
+
while true
|
|
103
|
+
id = rand(limit)
|
|
104
|
+
candidate = "#{object.class}-#{id}"
|
|
105
|
+
|
|
106
|
+
begin
|
|
107
|
+
object.class.get(candidate) # Resource not found exception if available
|
|
108
|
+
rescue RestClient::ResourceNotFound
|
|
109
|
+
object['_id'] = candidate
|
|
110
|
+
break
|
|
111
|
+
end
|
|
112
|
+
end
|
|
113
|
+
end
|
|
114
|
+
end
|
|
115
|
+
end
|
|
116
|
+
|
|
117
|
+
# Searches for an object of this model class (e.g. Post, Comment) and
|
|
118
|
+
# the requested query string. The query string may contain any query
|
|
119
|
+
# provided by Sphinx.
|
|
120
|
+
#
|
|
121
|
+
# Call CouchRest::ExtendedDocument.by_fulltext_index() to query
|
|
122
|
+
# without reducing to a single class type.
|
|
123
|
+
#
|
|
124
|
+
# Parameters:
|
|
125
|
+
#
|
|
126
|
+
# [query] Query string like "foo @title bar"
|
|
127
|
+
# [options] Additional options to set
|
|
128
|
+
#
|
|
129
|
+
# Options:
|
|
130
|
+
#
|
|
131
|
+
# [:match_mode] Optional Riddle match mode (defaults to :extended)
|
|
132
|
+
# [:limit] Optional Riddle limit (Riddle default)
|
|
133
|
+
# [:max_matches] Optional Riddle max_matches (Riddle default)
|
|
134
|
+
# [:sort_by] Optional Riddle sort order (also sets sort_mode to :extended)
|
|
135
|
+
# [:raw] Flag to return only IDs and do not lookup objects (defaults to false)
|
|
136
|
+
|
|
137
|
+
def by_fulltext_index(query, options = {})
|
|
138
|
+
if self == ExtendedDocument
|
|
139
|
+
client = Riddle::Client.new
|
|
140
|
+
else
|
|
141
|
+
client = Riddle::Client.new(fulltext_opts[:server],
|
|
142
|
+
fulltext_opts[:port])
|
|
143
|
+
|
|
144
|
+
query = query + " @couchrest-type #{self}"
|
|
145
|
+
end
|
|
146
|
+
|
|
147
|
+
client.match_mode = options[:match_mode] || :extended
|
|
148
|
+
|
|
149
|
+
if (limit = options[:limit])
|
|
150
|
+
client.limit = limit
|
|
151
|
+
end
|
|
152
|
+
|
|
153
|
+
if (max_matches = options[:max_matches])
|
|
154
|
+
client.max_matches = max_matches
|
|
155
|
+
end
|
|
156
|
+
|
|
157
|
+
if (sort_by = options[:sort_by])
|
|
158
|
+
client.sort_mode = :extended
|
|
159
|
+
client.sort_by = sort_by
|
|
160
|
+
end
|
|
161
|
+
|
|
162
|
+
result = client.query(query)
|
|
163
|
+
|
|
164
|
+
if result and result[:status] == 0 and (matches = result[:matches])
|
|
165
|
+
keys = matches.collect { |row| (CouchSphinx::MultiAttribute.decode(
|
|
166
|
+
row[:attributes]['csphinx-class']) +
|
|
167
|
+
'-' + row[:doc].to_s) rescue nil }.compact
|
|
168
|
+
|
|
169
|
+
return keys if options[:raw]
|
|
170
|
+
return multi_get(keys)
|
|
171
|
+
else
|
|
172
|
+
return []
|
|
173
|
+
end
|
|
174
|
+
end
|
|
175
|
+
|
|
176
|
+
# Returns objects for all provided keys not reducing lookup to a
|
|
177
|
+
# certain type. Casts to a CouchRest object if possible.
|
|
178
|
+
#
|
|
179
|
+
# Parameters:
|
|
180
|
+
#
|
|
181
|
+
# [ids] Array of document IDs to retrieve
|
|
182
|
+
|
|
183
|
+
def multi_get(ids)
|
|
184
|
+
result = CouchRest.post(SERVER.default_database.to_s +
|
|
185
|
+
'/_all_docs?include_docs=true', :keys => ids)
|
|
186
|
+
|
|
187
|
+
return result['rows'].collect { |row|
|
|
188
|
+
row = row['doc'] if row['couchrest-type'].nil?
|
|
189
|
+
|
|
190
|
+
if row and (class_name = row['couchrest-type'])
|
|
191
|
+
Object.const_get(class_name.to_s).new(row) rescue row
|
|
192
|
+
else
|
|
193
|
+
row
|
|
194
|
+
end
|
|
195
|
+
}
|
|
196
|
+
end
|
|
197
|
+
|
|
198
|
+
# Defines a design document with the functions needed to lookup
|
|
199
|
+
# modified documents. If the current version is too old, a new version
|
|
200
|
+
# of the design document is stored.
|
|
201
|
+
|
|
202
|
+
def assure_existing_couch_index
|
|
203
|
+
if (doc = database.get("_design/CouchSphinxIndex") rescue nil)
|
|
204
|
+
return if (ver = doc['lib_version']) and ver == CouchSphinx::VERSION
|
|
205
|
+
|
|
206
|
+
database.delete_doc(doc)
|
|
207
|
+
end
|
|
208
|
+
|
|
209
|
+
all_couchrests = {
|
|
210
|
+
:map => 'function(doc) {
|
|
211
|
+
if(doc["couchrest-type"] && (doc["created_at"] || doc["updated_at"])) {
|
|
212
|
+
var date = doc["updated_at"];
|
|
213
|
+
|
|
214
|
+
if(date == null)
|
|
215
|
+
date = doc["created_at"];
|
|
216
|
+
|
|
217
|
+
emit(doc._id, doc);
|
|
218
|
+
}
|
|
219
|
+
}'
|
|
220
|
+
}
|
|
221
|
+
|
|
222
|
+
couchrests_by_timestamp = {
|
|
223
|
+
:map => 'function(doc) {
|
|
224
|
+
if(doc["couchrest-type"] && (doc["created_at"] || doc["updated_at"])) {
|
|
225
|
+
var date = doc["updated_at"];
|
|
226
|
+
|
|
227
|
+
if(date == null)
|
|
228
|
+
date = doc["created_at"];
|
|
229
|
+
|
|
230
|
+
emit(Date.parse(date), doc);
|
|
231
|
+
}
|
|
232
|
+
}'
|
|
233
|
+
}
|
|
234
|
+
|
|
235
|
+
database.save_doc({
|
|
236
|
+
"_id" => "_design/CouchSphinxIndex",
|
|
237
|
+
:lib_version => CouchSphinx::VERSION,
|
|
238
|
+
:views => {
|
|
239
|
+
:all_couchrests => all_couchrests,
|
|
240
|
+
:couchrests_by_timestamp => couchrests_by_timestamp
|
|
241
|
+
}
|
|
242
|
+
})
|
|
243
|
+
end
|
|
244
|
+
end
|
|
245
|
+
end
|
|
246
|
+
end
|
|
247
|
+
end
|
|
248
|
+
|
|
249
|
+
# Include the Indexer mixin from the original ExtendedDocument class of
|
|
250
|
+
# CouchRest which adds a few methods and allows calling method fulltext_index.
|
|
251
|
+
|
|
252
|
+
module CouchRest # :nodoc:
|
|
253
|
+
class ExtendedDocument # :nodoc:
|
|
254
|
+
include CouchRest::Mixins::Indexer
|
|
255
|
+
end
|
|
256
|
+
end
|
data/lib/mixins/properties.rb
ADDED
|
@@ -0,0 +1,75 @@
|
|
|
1
|
+
# CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
|
|
2
|
+
# Sphinx.
|
|
3
|
+
#
|
|
4
|
+
# This file contains the CouchRest::Mixins::Properties module.
|
|
5
|
+
|
|
6
|
+
# Patches to the CouchRest library.
|
|
7
|
+
|
|
8
|
+
module CouchRest # :nodoc:
|
|
9
|
+
module Mixins # :nodoc:
|
|
10
|
+
|
|
11
|
+
# Patches to the CouchRest Properties module: Adds the "attributes" method
|
|
12
|
+
# plus some fulltext relevant stuff.
|
|
13
|
+
#
|
|
14
|
+
# Samples:
|
|
15
|
+
#
|
|
16
|
+
# data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
|
|
17
|
+
# rows = data['rows']
|
|
18
|
+
# post = Post.new(rows.first)
|
|
19
|
+
#
|
|
20
|
+
# post.attributes
|
|
21
|
+
# => {:tags=>"one, two, three", :updated_at=>Tue Jun 09 14:45:00 +0200 2009,
|
|
22
|
+
# :author=>nil, :title=>"First Post",
|
|
23
|
+
# :created_at=>Tue Jun 09 14:45:00 +0200 2009,
|
|
24
|
+
# :body=>"This is the first post. This is the [...] first post. "}
|
|
25
|
+
#
|
|
26
|
+
# post.fulltext_attributes
|
|
27
|
+
# => {:title=>"First Post", :author=>nil,
|
|
28
|
+
# :created_at=>Tue Jun 09 14:45:00 +0200 2009,
|
|
29
|
+
# :body=>"This is the first post. This is the [...] first post. "}
|
|
30
|
+
#
|
|
31
|
+
# post.sphinx_id
|
|
32
|
+
# => "921744775"
|
|
33
|
+
# post.id
|
|
34
|
+
# => "Post-921744775"
|
|
35
|
+
|
|
36
|
+
module Properties
|
|
37
|
+
|
|
38
|
+
# Returns a Hash of all properties plus the ID of the document.
|
|
39
|
+
|
|
40
|
+
def attributes
|
|
41
|
+
data = {}
|
|
42
|
+
|
|
43
|
+
self.properties.collect { |p|
|
|
44
|
+
{ p.name.intern => self.send(p.name) } }.each { |h|
|
|
45
|
+
data.merge! h }
|
|
46
|
+
|
|
47
|
+
return data
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
# Returns a Hash of all attributes allowed to be indexed. As a side
|
|
51
|
+
# effect it sets the fulltext_keys variable if still blank or empty.
|
|
52
|
+
|
|
53
|
+
def fulltext_attributes
|
|
54
|
+
clas = self.class
|
|
55
|
+
|
|
56
|
+
if not clas.fulltext_keys or clas.fulltext_keys.empty?
|
|
57
|
+
clas.fulltext_keys = self.properties.collect { |p| p.name.intern }
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
return self.attributes.reject { |k, v|
|
|
61
|
+
not (clas.fulltext_keys.include? k) }
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
# Returns the numeric part of the document ID (compatible to Sphinx).
|
|
65
|
+
|
|
66
|
+
def sphinx_id
|
|
67
|
+
if (match = self.id.match(/#{self.class}-([0-9]+)/))
|
|
68
|
+
return match[1]
|
|
69
|
+
else
|
|
70
|
+
return nil
|
|
71
|
+
end
|
|
72
|
+
end
|
|
73
|
+
end
|
|
74
|
+
end
|
|
75
|
+
end
|
data/lib/multi_attribute.rb
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
|
|
2
|
+
# Sphinx.
|
|
3
|
+
#
|
|
4
|
+
# This file contains the CouchSphinx::MultiAttribute class.
|
|
5
|
+
|
|
6
|
+
# Namespace module for the CouchSphinx gem.
|
|
7
|
+
|
|
8
|
+
module CouchSphinx #:nodoc:
|
|
9
|
+
|
|
10
|
+
# Module MultiAttribute implements helpers to translate back and
|
|
11
|
+
# forth between Ruby Strings and an array of integers suitable for Sphinx
|
|
12
|
+
# attributes of type "multi".
|
|
13
|
+
#
|
|
14
|
+
# Background: Getting an ID as result for a query is OK, but for example to
|
|
15
|
+
# allow cast safety, we need an additional attribute. Sphinx supports
|
|
16
|
+
# attributes which are returned together with the ID, but they behave a
|
|
17
|
+
# little different than expected: Instead we can use arrays of integers with
|
|
18
|
+
# ASCII character codes. These values are returned in ascending (!) order of
|
|
19
|
+
# value (yes, sounds funny but is reasonable from an internal view to
|
|
20
|
+
# Sphinx). So we mask each byte with 0x0100++ to keep the order...
|
|
21
|
+
#
|
|
22
|
+
# Sample:
|
|
23
|
+
#
|
|
24
|
+
# CouchSphinx::MultiAttribute.encode('Hello')
|
|
25
|
+
# => "328,613,876,1132,1391"
|
|
26
|
+
# CouchSphinx::MultiAttribute.decode('328,613,876,1132,1391')
|
|
27
|
+
# => "Hello"
|
|
28
|
+
|
|
29
|
+
module MultiAttribute
|
|
30
|
+
|
|
31
|
+
# Returns a numeric representation of a Ruby String suitable for "multi"
|
|
32
|
+
# attributes of Sphinx.
|
|
33
|
+
#
|
|
34
|
+
# Parameters:
|
|
35
|
+
#
|
|
36
|
+
# [str] String to translate
|
|
37
|
+
|
|
38
|
+
def self.encode(str)
|
|
39
|
+
offset = 0
|
|
40
|
+
return str.bytes.collect { |c| (offset+= 0x0100) + c }.join(',')
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
# Returns the original Ruby String for a Sphinx "multi" attribute. Only works
|
|
44
|
+
# if the attribute was created with encode before!
|
|
45
|
+
#
|
|
46
|
+
# Parameters:
|
|
47
|
+
#
|
|
48
|
+
# [multi] Sphinx "multi" attribute to translate back
|
|
49
|
+
|
|
50
|
+
def self.decode(multi)
|
|
51
|
+
offset = 0
|
|
52
|
+
multi = multi.split(',') if not multi.kind_of? Array
|
|
53
|
+
|
|
54
|
+
return multi.collect {|x| (x.to_i-(offset+=0x0100)).chr}.join
|
|
55
|
+
end
|
|
56
|
+
end
|
|
57
|
+
end
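
The masking scheme described in the comments of lib/multi_attribute.rb can be illustrated with a small stand-alone sketch. The module name <tt>MultiAttributeSketch</tt> is hypothetical, the sketch assumes ASCII input, and unlike the library it works on arrays of integers instead of comma separated strings. It sorts before decoding to show why the growing offset keeps the characters in order.

```ruby
# Hypothetical stand-alone illustration, not the library module itself.
module MultiAttributeSketch
  STEP = 0x0100 # offset added once more for every character position

  # Each byte gets (position + 1) * STEP added, so the values ascend
  # with the position in the string.
  def self.encode(str)
    str.bytes.each_with_index.map { |byte, i| (i + 1) * STEP + byte }
  end

  # Sorting restores the original character order before the offsets
  # are removed again.
  def self.decode(values)
    values.map(&:to_i).sort.each_with_index.map { |v, i|
      (v - (i + 1) * STEP).chr
    }.join
  end
end

shuffled = MultiAttributeSketch.encode('Hello').shuffle
puts MultiAttributeSketch.decode(shuffled) # prints "Hello" for any order
```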
|
metadata
ADDED
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: couchsphinx
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: "0.1"
|
|
5
|
+
platform: ruby
|
|
6
|
+
authors:
|
|
7
|
+
- Jan Ulbrich
|
|
8
|
+
autorequire:
|
|
9
|
+
bindir: bin
|
|
10
|
+
cert_chain: []
|
|
11
|
+
|
|
12
|
+
date: 2009-11-04 00:00:00 +01:00
|
|
13
|
+
default_executable:
|
|
14
|
+
dependencies: []
|
|
15
|
+
|
|
16
|
+
description:
|
|
17
|
+
email: jan.ulbrich @nospam@ holtzbrinck.com
|
|
18
|
+
executables: []
|
|
19
|
+
|
|
20
|
+
extensions: []
|
|
21
|
+
|
|
22
|
+
extra_rdoc_files:
|
|
23
|
+
- README.rdoc
|
|
24
|
+
files:
|
|
25
|
+
- README.rdoc
|
|
26
|
+
- couchsphinx.rb
|
|
27
|
+
- lib/multi_attribute.rb
|
|
28
|
+
- lib/mixins/properties.rb
|
|
29
|
+
- lib/mixins/indexer.rb
|
|
30
|
+
- lib/indexer.rb
|
|
31
|
+
has_rdoc: true
|
|
32
|
+
homepage: http://github.com/ulbrich/couchsphinx
|
|
33
|
+
licenses: []
|
|
34
|
+
|
|
35
|
+
post_install_message:
|
|
36
|
+
rdoc_options:
|
|
37
|
+
- --exclude
|
|
38
|
+
- pkg
|
|
39
|
+
- --exclude
|
|
40
|
+
- tmp
|
|
41
|
+
- --all
|
|
42
|
+
- --title
|
|
43
|
+
- CouchSphinx
|
|
44
|
+
- --main
|
|
45
|
+
- README.rdoc
|
|
46
|
+
require_paths:
|
|
47
|
+
- .
|
|
48
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
49
|
+
requirements:
|
|
50
|
+
- - ">="
|
|
51
|
+
- !ruby/object:Gem::Version
|
|
52
|
+
version: "0"
|
|
53
|
+
version:
|
|
54
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
55
|
+
requirements:
|
|
56
|
+
- - ">="
|
|
57
|
+
- !ruby/object:Gem::Version
|
|
58
|
+
version: "0"
|
|
59
|
+
version:
|
|
60
|
+
requirements: []
|
|
61
|
+
|
|
62
|
+
rubyforge_project:
|
|
63
|
+
rubygems_version: 1.3.5
|
|
64
|
+
signing_key:
|
|
65
|
+
specification_version: 3
|
|
66
|
+
summary: A full text indexing extension for CouchDB/CouchRest using Sphinx.
|
|
67
|
+
test_files: []
|
|
68
|
+
|