couchsphinx 0.1

@@ -0,0 +1,171 @@
1
+ = CouchSphinx
2
+
3
+ The CouchSphinx library implements an interface between CouchDB and Sphinx
4
+ supporting CouchRest to automatically index objects in Sphinx. It tries to
5
+ act as transparently as possible: Just an additional method in the CouchRest
6
+ domain specific language and some Sphinx configuration are needed to get
7
+ going.
8
+
9
+ == Prerequisites
10
+
11
+ CouchSphinx needs gems CouchRest and Riddle as well as a running Sphinx
12
+ and a CouchDB installation.
13
+
14
+ sudo gem sources -a http://gems.github.com # Only needed once!
15
+ sudo gem install riddle
16
+ sudo gem install couchrest
17
+ sudo gem install ulbrich-couchsphinx
18
+
19
+ No additional configuration is needed for interfacing with CouchDB: Setup is
20
+ done when CouchRest is able to talk to the CouchDB server.
21
+
22
+ A proper "sphinx.conf" file and a script for retrieving index data have to
23
+ be provided for interfacing with Sphinx: Sorry, no UltraSphinx-like
24
+ magic... :-) Depending on the amount of data, more than one index may be used
25
+ and indexes may be consolidated from time to time.
26
+
27
+ This is a sample configuration for a single "main" index:
28
+
29
+ searchd {
30
+ address = 0.0.0.0
31
+ port = 3312
32
+
33
+ log = ./sphinx/searchd.log
34
+ query_log = ./sphinx/query.log
35
+ pid_file = ./sphinx/searchd.pid
36
+ }
37
+
38
+ source couchblog {
39
+ type = xmlpipe2
40
+
41
+ xmlpipe_command = ./sphinxsource.rb
42
+ }
43
+
44
+ index couchblog {
45
+ source = couchblog
46
+
47
+ charset_type = utf-8
48
+ path = ./sphinx/sphinx_index_main
49
+ }
50
+
51
+ The script "sphinxsource.rb" providing the data to index may vary
52
+ depending on the number of CouchDB instances it talks to. This is a simple
53
+ script interfacing with one single instance:
54
+
55
+ #!/usr/bin/env ruby
56
+
57
+ require 'rubygems'
58
+ require 'lib/models' # Depends on location of model files
59
+
60
+ data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
61
+ rows = data['rows'] rescue []
62
+
63
+ puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
64
+
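To give an idea of what the indexer receives from "xmlpipe_command", here is a minimal, standalone sketch of an xmlpipe2 document set. It is not the library's XMLDocset class: the helper name <tt>docset_for</tt> and the sample rows are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for rows as a CouchDB view might return them.
rows = [
  { '_id' => 'Post-1', 'couchrest-type' => 'Post', 'title' => 'First Post' },
  { '_id' => 'Post-2', 'couchrest-type' => 'Post', 'title' => 'Second Post' }
]

# Builds a minimal xmlpipe2 document set for the given rows and field names.
# (docset_for is a hypothetical helper, not part of CouchSphinx.)
def docset_for(rows, fields)
  parts = ['<?xml version="1.0" encoding="utf-8"?>',
           '<sphinx:docset>', '<sphinx:schema>']

  # Declare every full text field once in the schema.
  fields.each { |f| parts << %(<sphinx:field name="#{f}"/>) }
  parts << '</sphinx:schema>'

  rows.each do |row|
    # Sphinx needs a numeric document ID, so take the digits after the hyphen.
    numeric_id = row['_id'].to_s[/-(\d+)\z/, 1]
    next if numeric_id.nil? # Skip rows without a compatible ID

    parts << %(<sphinx:document id="#{numeric_id}">)
    fields.each { |f| parts << "<#{f}><![CDATA[#{row[f]}]]></#{f}>" }
    parts << '</sphinx:document>'
  end

  parts << '</sphinx:docset>'
  parts.join
end

xml = docset_for(rows, %w[title])
puts xml
```

The real script delegates all of this to <tt>CouchSphinx::Indexer::XMLDocset</tt>; the sketch only shows the overall shape (schema first, then one "sphinx:document" per object).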
65
+ == Models
66
+
67
+ Use method <tt>fulltext_index</tt> to enable indexing of a model. The
68
+ default is to index all attributes but it is recommended to provide a list of
69
+ attribute keys.
70
+
71
+ A side effect of calling this method is that CouchSphinx overrides the
72
+ default of letting CouchDB create new IDs: Sphinx only allows numeric IDs and
73
+ CouchSphinx forces new objects with the name of the class, a hyphen and an
74
+ integer as ID (e.g. <tt>Post-38497238</tt>). Again: Only these objects are
75
+ indexed due to internal restrictions of Sphinx.
76
+
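The ID scheme can be sketched in a few lines of plain Ruby. The helper names <tt>candidate_id</tt> and <tt>numeric_part</tt> are hypothetical; CouchSphinx does the equivalent work inside a save callback and additionally checks CouchDB for collisions, which is left out here.

```ruby
# Hypothetical helper: builds an ID like "Post-38497238" from a class name
# and a random integer that fits into the given number of bits.
def candidate_id(class_name, idsize = 32)
  limit = (1 << idsize) - 1
  "#{class_name}-#{rand(limit)}"
end

# Hypothetical helper: returns the numeric part of an ID as a String, or nil
# if the ID does not follow the "ClassName-123" scheme.
def numeric_part(class_name, id)
  match = id.to_s.match(/\A#{Regexp.escape(class_name)}-([0-9]+)\z/)
  match && match[1]
end

id = candidate_id('Post')
puts id                                       # e.g. Post-38497238
puts numeric_part('Post', id)                 # numeric part handed to Sphinx
puts numeric_part('Post', 'some-uuid').inspect
```

Documents whose ID does not match the scheme yield <tt>nil</tt> in this sketch, which mirrors why only objects with generated IDs end up in the index.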
77
+ Sample:
78
+
79
+ class Post < CouchRest::ExtendedDocument
80
+ use_database SERVER.default_database
81
+
82
+ property :title
83
+ property :body
84
+
85
+ fulltext_index :title, :body
86
+ end
87
+
88
+ Add options <tt>:server</tt> and <tt>:port</tt> to <tt>fulltext_index</tt> if
89
+ the Sphinx server to query is running on a different server (defaults to
90
+ "localhost" with port 3312).
91
+
92
+ If you are sure your Sphinx is compiled with 64-bit support, you may add
93
+ option <tt>:idsize</tt> with value <tt>64</tt> to generate 64-bit IDs for
94
+ CouchDB (defaults to 32 bits).
95
+
96
+ Here is a full-featured sample setting additional options:
97
+
98
+ fulltext_index :title, :body, :server => 'my.other.server', :port => 3313,
99
+ :idsize => 64
100
+
101
+ == Indexing
102
+
103
+ CouchSphinx also adds a new design document to CouchDB: It needs to collect
104
+ all relevant objects for running the Sphinx indexer and adds its own views
105
+ to do so. Have a look at CouchDB design document "CouchSphinxIndex" for
106
+ details.
107
+
108
+ Automatically starting the reindexing of objects the moment new objects are
109
+ created can be implemented by adding a save_callback to the model class:
110
+
111
+ save_callback :after do |object|
112
+ `sudo indexer --all --rotate` # Configure sudo to allow this call...
113
+ end
114
+
115
+ This or a similar callback should be added to all models needing instant
116
+ indexing. If indexing is not that crucial or load is high, some additional
117
+ checks for the time of the last call should be added.
118
+
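One possible shape for such a check is a small throttle object that remembers the time of the last run. This is only a sketch: the class <tt>IndexerThrottle</tt> is hypothetical and not part of CouchSphinx, and it keeps its state in memory of a single process.

```ruby
# Hypothetical throttle: runs the given block at most once per interval.
class IndexerThrottle
  def initialize(min_interval_seconds)
    @min_interval = min_interval_seconds
    @last_run = nil
  end

  # Yields and returns true if enough time has passed since the last run,
  # otherwise returns false without yielding.
  def run(now = Time.now)
    return false if @last_run && (now - @last_run) < @min_interval

    @last_run = now
    yield
    true
  end
end

throttle = IndexerThrottle.new(60)
calls = 0
t0 = Time.at(0)

throttle.run(t0)      { calls += 1 } # First call always runs
throttle.run(t0 + 10) { calls += 1 } # Skipped: only 10 seconds later
throttle.run(t0 + 61) { calls += 1 } # Runs again: interval has passed

puts calls
```

Inside a save_callback the block would contain the call to the indexer. With several application processes, the timestamp would have to live somewhere shared (a file or a CouchDB document, for example) instead of an instance variable.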
119
+ == Queries
120
+
121
+ An additional class method <tt>by_fulltext_index</tt> is added for each
122
+ fulltext indexed model. This method takes a Sphinx query like
123
+ "foo @title bar", runs it within the context of the current class and returns
124
+ an Array of matching CouchDB documents. Use
125
+ <tt>CouchRest::ExtendedDocument.by_fulltext_index</tt> if you want to find
126
+ any document matching the query and not only a certain class.
127
+
128
+ Samples:
129
+
130
+ Post.by_fulltext_index('first')
131
+ => [...]
132
+
133
+ post = Post.by_fulltext_index('this is @title post').first
134
+ post.title
135
+ => "First Post"
136
+ post.class
137
+ => Post
138
+
139
+ Additional options <tt>:match_mode</tt>, <tt>:limit</tt> and
140
+ <tt>:max_matches</tt> can be provided to customize the behaviour of Riddle.
141
+ Option <tt>:raw</tt> can be set to <tt>true</tt> to do no lookup of the
142
+ document IDs but return the raw IDs instead.
143
+
144
+ Sample:
145
+
146
+ Post.by_fulltext_index('my post', :limit => 100)
147
+
148
+ == Copyright & License
149
+
150
+ Copyright (c) 2009 Holtzbrinck Digital GmbH, Jan Ulbrich
151
+
152
+ Permission is hereby granted, free of charge, to any person
153
+ obtaining a copy of this software and associated documentation
154
+ files (the "Software"), to deal in the Software without
155
+ restriction, including without limitation the rights to use,
156
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
157
+ copies of the Software, and to permit persons to whom the
158
+ Software is furnished to do so, subject to the following
159
+ conditions:
160
+
161
+ The above copyright notice and this permission notice shall be
162
+ included in all copies or substantial portions of the Software.
163
+
164
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
165
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
166
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
167
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
168
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
169
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
170
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
171
+ OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,28 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the includes implementing this library. Have a look at
5
+ # the README.rdoc as a starting point.
6
+
7
+ require 'rubygems'
8
+
9
+ require 'couchrest'
10
+ require 'riddle'
11
+
12
+ # Version number to use for updating CouchDB design document CouchSphinxIndex
13
+ # if needed.
14
+
15
+ module CouchSphinx
16
+ if (match = __FILE__.match(/couchsphinx-([0-9.-]*)/))
17
+ VERSION = match[1]
18
+ else
19
+ VERSION = 'unknown'
20
+ end
21
+ end
22
+
23
+ # Require the stuff implementing this library...
24
+
25
+ require 'lib/multi_attribute'
26
+ require 'lib/indexer'
27
+ require 'lib/mixins/indexer'
28
+ require 'lib/mixins/properties'
@@ -0,0 +1,217 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchSphinx::Indexer::XMLDocset and
5
+ # CouchSphinx::Indexer::XMLDoc classes.
6
+
7
+ # Namespace module for the CouchSphinx gem.
8
+
9
+ module CouchSphinx #:nodoc:
10
+
11
+ # Module Indexer contains classes for creating XML input documents for the
12
+ # indexer. Each Sphinx index consists of a single "sphinx:docset" with any
13
+ # number of "sphinx:document" tags.
14
+ #
15
+ # The XML source can be generated from an array of CouchRest objects or from
16
+ # an array of Hashes containing at least fields "couchrest-type" and "_id"
17
+ # as returned by CouchDB view "CouchSphinxIndex/couchrests_by_timestamp".
18
+ #
19
+ # Sample:
20
+ #
21
+ # rows = [{ 'name' => 'John', 'phone' => '199 43828',
22
+ # 'couchrest-type' => 'Address', '_id' => 'Address-234164'
23
+ # },
24
+ # { 'name' => 'Sue', 'mobile' => '828 19439',
25
+ # 'couchrest-type' => 'Address', '_id' => 'Address-422433'
26
+ # }
27
+ # ]
28
+ # puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
29
+ #
30
+ # <?xml version="1.0" encoding="utf-8"?>
31
+ # <sphinx:docset>
32
+ # <sphinx:schema>
33
+ # <sphinx:attr name="csphinx-class" type="multi"/>
34
+ # <sphinx:field name="couchrest-type"/>
35
+ # <sphinx:field name="name"/>
36
+ # <sphinx:field name="phone"/>
37
+ # <sphinx:field name="mobile"/>
38
+ # <sphinx:field name="created_at"/>
39
+ # </sphinx:schema>
40
+ # <sphinx:document id="234164">
41
+ # <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
42
+ # <couchrest-type>Address</couchrest-type>
43
+ # <name><![CDATA[[John]]></name>
44
+ # <phone><![CDATA[[199 43828]]></phone>
45
+ # <mobile><![CDATA[[]]></mobile>
46
+ # <created_at><![CDATA[[]]></created_at>
47
+ # </sphinx:document>
48
+ # <sphinx:document id="422433">
49
+ # <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
50
+ # <couchrest-type>Address</couchrest-type>
51
+ # <name><![CDATA[[Sue]]></name>
52
+ # <phone><![CDATA[[]]></phone>
53
+ # <mobile><![CDATA[[828 19439]]></mobile>
54
+ # <created_at><![CDATA[[]]></created_at>
55
+ # </sphinx:document>
56
+ # </sphinx:docset>
57
+
58
+ module Indexer
59
+
60
+ # Class XMLDocset wraps the XML representation of a document to index. It
61
+ # contains a complete "sphinx:docset" including its schema definition.
62
+
63
+ class XMLDocset
64
+
65
+ # Objects contained in document set.
66
+
67
+ attr_reader :xml_docs
68
+
69
+ # XML generated for opening the document.
70
+
71
+ attr_reader :xml_header
72
+
73
+ # XML generated for closing the document.
74
+
75
+ attr_reader :xml_footer
76
+
77
+ # Creates an XMLDocset object from the provided data. It defines a
78
+ # superset of all fields of the classes to index objects for. The class
79
+ # names are collected from the provided objects as well.
80
+ #
81
+ # Parameters:
82
+ #
83
+ # [rows] Array with objects of type CouchRest::Document or Hash to create XML for
84
+
85
+ def initialize(rows = [])
86
+ raise ArgumentError, 'Missing rows' if rows.nil?
87
+
88
+ xml = '<?xml version="1.0" encoding="utf-8"?>'
89
+
90
+ xml << '<sphinx:docset><sphinx:schema>'
91
+
92
+ @xml_docs = []
93
+ classes = []
94
+
95
+ rows.each do |row|
96
+ object = nil
97
+
98
+ if row.kind_of? CouchRest::Document
99
+ object = row
100
+ elsif row.kind_of? Hash
101
+ row = row['value'] if row['couchrest-type'].nil?
102
+
103
+ if row and (class_name = row['couchrest-type'])
104
+ object = eval(class_name.to_s).new(row) rescue nil
105
+ end
106
+ end
107
+
108
+ if object and object.sphinx_id
109
+ classes << object.class if not classes.include? object.class
110
+ @xml_docs << XMLDoc.from_object(object)
111
+ end
112
+ end
113
+
114
+ field_names = classes.collect { |clas| clas.fulltext_keys rescue []
115
+ }.flatten.uniq
116
+
117
+ field_names.each do |key|
118
+ xml << "<sphinx:field name=\"#{key}\"/>"
119
+ end
120
+
121
+ xml << '<sphinx:field name="couchrest-type"/>'
122
+ xml << '<sphinx:attr name="csphinx-class" type="multi"/>'
123
+
124
+ xml << '</sphinx:schema>'
125
+
126
+ @xml_header = xml
127
+ @xml_footer = '</sphinx:docset>'
128
+ end
129
+
130
+ # Returns the encoded data as XML.
131
+
132
+ def to_xml
133
+ return to_s
134
+ end
135
+
136
+ # Returns the encoded data as XML.
137
+
138
+ def to_s
139
+ return self.xml_header + self.xml_docs.join + self.xml_footer
140
+ end
141
+ end
142
+
143
+ # Class XMLDoc wraps the XML representation of a single document to index
144
+ # and contains a complete "sphinx:document" tag.
145
+
146
+ class XMLDoc
147
+
148
+ # Returns the ID of the encoded data.
149
+
150
+ attr_reader :id
151
+
152
+ # Returns the class name of the encoded data.
153
+
154
+ attr_reader :class_name
155
+
156
+ # Returns the encoded data.
157
+
158
+ attr_reader :xml
159
+
160
+ # Creates an XMLDoc object from the provided CouchRest object.
161
+ #
162
+ # Parameters:
163
+ #
164
+ # [object] Object to index
165
+
166
+ def self.from_object(object)
167
+ raise ArgumentError, 'Missing object' if object.nil?
168
+ raise ArgumentError, 'No compatible ID' if (id = object.sphinx_id).nil?
169
+
170
+ return new(id, object.class.to_s, object.fulltext_attributes)
171
+ end
172
+
173
+ # Creates an XMLDoc object from the provided ID, class name and data.
174
+ #
175
+ # Parameters:
176
+ #
177
+ # [id] ID of the object to index
178
+ # [class_name] Name of the class
179
+ # [properties] Hash with the properties to index
180
+
181
+ def initialize(id, class_name, properties)
182
+ raise ArgumentError, 'Missing id' if id.nil?
183
+ raise ArgumentError, 'Missing class_name' if class_name.nil?
184
+
185
+ xml = "<sphinx:document id=\"#{id}\">"
186
+
187
+ xml << '<csphinx-class>'
188
+ xml << CouchSphinx::MultiAttribute.encode(class_name)
189
+ xml << '</csphinx-class>'
190
+ xml << "<couchrest-type>#{class_name}</couchrest-type>"
191
+
192
+ properties.each do |key, value|
193
+ xml << "<#{key}><![CDATA[[#{value}]]></#{key}>"
194
+ end
195
+
196
+ xml << '</sphinx:document>'
197
+
198
+ @xml = xml
199
+
200
+ @id = id
201
+ @class_name = class_name
202
+ end
203
+
204
+ # Returns the encoded data as XML.
205
+
206
+ def to_xml
207
+ return to_s
208
+ end
209
+
210
+ # Returns the encoded data as XML.
211
+
212
+ def to_s
213
+ return self.xml
214
+ end
215
+ end
216
+ end
217
+ end
@@ -0,0 +1,256 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchRest::Mixins::Indexer module which in turn
5
+ # includes CouchRest::Mixins::Indexer::ClassMethods.
6
+
7
+ # Patches to the CouchRest library.
8
+
9
+ module CouchRest # :nodoc:
10
+ module Mixins # :nodoc:
11
+
12
+ # Mixin for CouchRest adding indexing stuff. See class ClassMethods for
13
+ # details.
14
+
15
+ module Indexer #:nodoc:
16
+
17
+ # Bootstrap method to include patches with.
18
+ #
19
+ # Parameters:
20
+ #
21
+ # [base] Class to include class methods of module into
22
+
23
+ def self.included(base)
24
+ base.extend(ClassMethods)
25
+ end
26
+
27
+ # Patches to the CouchRest ExtendedDocument module: Adds the
28
+ # "fulltext_index" method for enabling indexing and defining the fields
29
+ # to include as a domain specific extension. This method also assures
30
+ # the existence of a special design document used to generate indexes
31
+ # from.
32
+ #
33
+ # An additional save callback sets an ID like "Post-123123" (class name
34
+ # plus pure numeric ID compatible with Sphinx) for new objects.
35
+ #
36
+ # Last but not least method "by_fulltext_index" is defined allowing a
37
+ # full text search like "foo @title bar" within the context of the
38
+ # current class.
39
+ #
40
+ # Samples:
41
+ #
42
+ # class Post < CouchRest::ExtendedDocument
43
+ # use_database SERVER.default_database
44
+ #
45
+ # property :title
46
+ # property :body
47
+ #
48
+ # fulltext_index :title, :body
49
+ # end
50
+ #
51
+ # Post.by_fulltext_index('first')
52
+ # => [...]
53
+ # post = Post.by_fulltext_index('this is @title post').first
54
+ # post.title
55
+ # => "First Post"
56
+ # post.class
57
+ # => Post
58
+
59
+ module ClassMethods
60
+
61
+ # Method for enabling fulltext indexing and for defining the fields to
62
+ # include.
63
+ #
64
+ # Parameters:
65
+ #
66
+ # [keys] Array of field keys to include plus options Hash
67
+ #
68
+ # Options:
69
+ #
70
+ # [:server] Server name (defaults to localhost)
71
+ # [:port] Server port (defaults to 3312)
72
+ # [:idsize] Number of bits for the ID to generate (defaults to 32)
73
+
74
+ def fulltext_index(*keys)
75
+ opts = keys.pop if keys.last.is_a?(Hash)
76
+ opts ||= {} # Handle some options: Future use... :-)
77
+
78
+ # Save the keys to index and the options for later use in callback.
79
+ # Helper method cattr_accessor is already bootstrapped by couchrest
80
+ # gem.
81
+
82
+ cattr_accessor :fulltext_keys
83
+ cattr_accessor :fulltext_opts
84
+
85
+ self.fulltext_keys = keys
86
+ self.fulltext_opts = opts
87
+
88
+ # We add a few new functions to CouchDB for retrieving modified
89
+ # documents...
90
+
91
+ assure_existing_couch_index
92
+
93
+ # Overwrite setting of new ID to do something compatible with
94
+ # Sphinx. If an ID already exists, we try to match it with our
95
+ # Schema and cowardly ignore if not.
96
+
97
+ save_callback :before do |object|
98
+ if object.id.nil?
99
+ idsize = fulltext_opts[:idsize] || 32
100
+ limit = (1 << idsize) - 1
101
+
102
+ while true
103
+ id = rand(limit)
104
+ candidate = "#{object.class}-#{id}" # Use the class of the saved object
105
+
106
+ begin
107
+ object.class.get(candidate) # Resource not found exception if available
108
+ rescue RestClient::ResourceNotFound
109
+ object['_id'] = candidate
110
+ break
111
+ end
112
+ end
113
+ end
114
+ end
115
+ end
116
+
117
+ # Searches for an object of this model class (e.g. Post, Comment) and
118
+ # the requested query string. The query string may contain any query
119
+ # provided by Sphinx.
120
+ #
121
+ # Call CouchRest::ExtendedDocument.by_fulltext_index() to query
122
+ # without reducing to a single class type.
123
+ #
124
+ # Parameters:
125
+ #
126
+ # [query] Query string like "foo @title bar"
127
+ # [options] Additional options to set
128
+ #
129
+ # Options:
130
+ #
131
+ # [:match_mode] Optional Riddle match mode (defaults to :extended)
132
+ # [:limit] Optional Riddle limit (Riddle default)
133
+ # [:max_matches] Optional Riddle max_matches (Riddle default)
134
+ # [:sort_by] Optional Riddle sort order (also sets sort_mode to :extended)
135
+ # [:raw] Flag to return only IDs and do not lookup objects (defaults to false)
136
+
137
+ def by_fulltext_index(query, options = {})
138
+ if self == ExtendedDocument
139
+ client = Riddle::Client.new
140
+ else
141
+ client = Riddle::Client.new(fulltext_opts[:server],
142
+ fulltext_opts[:port])
143
+
144
+ query = query + " @couchrest-type #{self}"
145
+ end
146
+
147
+ client.match_mode = options[:match_mode] || :extended
148
+
149
+ if (limit = options[:limit])
150
+ client.limit = limit
151
+ end
152
+
153
+ if (max_matches = options[:max_matches])
154
+ client.max_matches = max_matches
155
+ end
156
+
157
+ if (sort_by = options[:sort_by])
158
+ client.sort_mode = :extended
159
+ client.sort_by = sort_by
160
+ end
161
+
162
+ result = client.query(query)
163
+
164
+ if result and result[:status] == 0 and (matches = result[:matches])
165
+ keys = matches.collect { |row| (CouchSphinx::MultiAttribute.decode(
166
+ row[:attributes]['csphinx-class']) +
167
+ '-' + row[:doc].to_s) rescue nil }.compact
168
+
169
+ return keys if options[:raw]
170
+ return multi_get(keys)
171
+ else
172
+ return []
173
+ end
174
+ end
175
+
176
+ # Returns objects for all provided keys not reducing lookup to a
177
+ # certain type. Casts to a CouchRest object if possible.
178
+ #
179
+ # Parameters:
180
+ #
181
+ # [ids] Array of document IDs to retrieve
182
+
183
+ def multi_get(ids)
184
+ result = CouchRest.post(SERVER.default_database.to_s +
185
+ '/_all_docs?include_docs=true', :keys => ids)
186
+
187
+ return result['rows'].collect { |row|
188
+ row = row['doc'] if row['couchrest-type'].nil?
189
+
190
+ if row and (class_name = row['couchrest-type'])
191
+ eval(class_name.to_s).new(row) rescue row
192
+ else
193
+ row
194
+ end
195
+ }
196
+ end
197
+
198
+ # Defines a design document with the functions needed to lookup
199
+ # modified documents. If the current version is too old, a new version
200
+ # of the design document is stored.
201
+
202
+ def assure_existing_couch_index
203
+ if (doc = database.get("_design/CouchSphinxIndex") rescue nil)
204
+ return if (ver = doc['lib_version']) and ver == CouchSphinx::VERSION
205
+
206
+ database.delete_doc(doc)
207
+ end
208
+
209
+ all_couchrests = {
210
+ :map => 'function(doc) {
211
+ if(doc["couchrest-type"] && (doc["created_at"] || doc["updated_at"])) {
212
+ var date = doc["updated_at"];
213
+
214
+ if(date == null)
215
+ date = doc["created_at"];
216
+
217
+ emit(doc._id, doc);
218
+ }
219
+ }'
220
+ }
221
+
222
+ couchrests_by_timestamp = {
223
+ :map => 'function(doc) {
224
+ if(doc["couchrest-type"] && (doc["created_at"] || doc["updated_at"])) {
225
+ var date = doc["updated_at"];
226
+
227
+ if(date == null)
228
+ date = doc["created_at"];
229
+
230
+ emit(Date.parse(date), doc);
231
+ }
232
+ }'
233
+ }
234
+
235
+ database.save_doc({
236
+ "_id" => "_design/CouchSphinxIndex",
237
+ :lib_version => CouchSphinx::VERSION,
238
+ :views => {
239
+ :all_couchrests => all_couchrests,
240
+ :couchrests_by_timestamp => couchrests_by_timestamp
241
+ }
242
+ })
243
+ end
244
+ end
245
+ end
246
+ end
247
+ end
248
+
249
+ # Include the Indexer mixin into the original ExtendedDocument class of
250
+ # CouchRest which adds a few methods and allows calling method fulltext_index.
251
+
252
+ module CouchRest # :nodoc:
253
+ class ExtendedDocument # :nodoc:
254
+ include CouchRest::Mixins::Indexer
255
+ end
256
+ end
@@ -0,0 +1,75 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchRest::Mixins::Properties module.
5
+
6
+ # Patches to the CouchRest library.
7
+
8
+ module CouchRest # :nodoc:
9
+ module Mixins # :nodoc:
10
+
11
+ # Patches to the CouchRest Properties module: Adds the "attributes" method
12
+ # plus some fulltext relevant stuff.
13
+ #
14
+ # Samples:
15
+ #
16
+ # data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
17
+ # rows = data['rows']
18
+ # post = Post.new(rows.first)
19
+ #
20
+ # post.attributes
21
+ # => {:tags=>"one, two, three", :updated_at=>Tue Jun 09 14:45:00 +0200 2009,
22
+ # :author=>nil, :title=>"First Post",
23
+ # :created_at=>Tue Jun 09 14:45:00 +0200 2009,
24
+ # :body=>"This is the first post. This is the [...] first post. "}
25
+ #
26
+ # post.fulltext_attributes
27
+ # => {:title=>"First Post", :author=>nil,
28
+ # :created_at=>Tue Jun 09 14:45:00 +0200 2009,
29
+ # :body=>"This is the first post. This is the [...] first post. "}
30
+ #
31
+ # post.sphinx_id
32
+ # => "921744775"
33
+ # post.id
34
+ # => "Post-921744775"
35
+
36
+ module Properties
37
+
38
+ # Returns a Hash of all declared properties of the document.
39
+
40
+ def attributes
41
+ data = {}
42
+
43
+ self.properties.collect { |p|
44
+ { p.name.intern => self.send(p.name) } }.each { |h|
45
+ data.merge! h }
46
+
47
+ return data
48
+ end
49
+
50
+ # Returns a Hash of all attributes allowed to be indexed. As a side
51
+ # effect it sets the fulltext_keys variable if still blank or empty.
52
+
53
+ def fulltext_attributes
54
+ clas = self.class
55
+
56
+ if not clas.fulltext_keys or clas.fulltext_keys.empty?
57
+ clas.fulltext_keys = self.properties.collect { |p| p.name.intern }
58
+ end
59
+
60
+ return self.attributes.reject { |k, v|
61
+ not (clas.fulltext_keys.include? k) }
62
+ end
63
+
64
+ # Returns the numeric part of the document ID (compatible to Sphinx).
65
+
66
+ def sphinx_id
67
+ if (match = self.id.match(/#{self.class}-([0-9]+)/))
68
+ return match[1]
69
+ else
70
+ return nil
71
+ end
72
+ end
73
+ end
74
+ end
75
+ end
@@ -0,0 +1,57 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchSphinx::MultiAttribute class.
5
+
6
+ # Namespace module for the CouchSphinx gem.
7
+
8
+ module CouchSphinx #:nodoc:
9
+
10
+ # Module MultiAttribute implements helpers to translate back and
11
+ # forth between Ruby Strings and an array of integers suitable for Sphinx
12
+ # attributes of type "multi".
13
+ #
14
+ # Background: Getting an ID as result for a query is OK, but for example to
15
+ # allow cast safety, we need an additional attribute. Sphinx supports
16
+ # attributes which are returned together with the ID, but they behave a
17
+ # little differently than expected: Instead we can use arrays of integers with
18
+ # ASCII character codes. These values are returned in ascending (!) order of
19
+ # value (yes, sounds funny but is reasonable from an internal view to
20
+ # Sphinx). So we mask each byte with 0x0100++ to keep the order...
21
+ #
22
+ # Sample:
23
+ #
24
+ # CouchSphinx::MultiAttribute.encode('Hello')
25
+ # => "328,613,876,1132,1391"
26
+ # CouchSphinx::MultiAttribute.decode('328,613,876,1132,1391')
27
+ # => "Hello"
28
+
29
+ module MultiAttribute
30
+
31
+ # Returns a numeric representation of a Ruby String suitable for "multi"
32
+ # attributes of Sphinx.
33
+ #
34
+ # Parameters:
35
+ #
36
+ # [str] String to translate
37
+
38
+ def self.encode(str)
39
+ offset = 0
40
+ return str.bytes.collect { |c| (offset+= 0x0100) + c }.join(',')
41
+ end
42
+
43
+ # Returns the original Ruby String for a Sphinx "multi" attribute. Only works
44
+ # if the attribute was created with method encode before!
45
+ #
46
+ # Parameters:
47
+ #
48
+ # [multi] Sphinx "multi" attribute to translate back
49
+
50
+ def self.decode(multi)
51
+ offset = 0
52
+ multi = multi.split(',') if not multi.kind_of? Array
53
+
54
+ return multi.collect {|x| (x.to_i-(offset+=0x0100)).chr}.join
55
+ end
56
+ end
57
+ end
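
The order-preserving offset described in the MultiAttribute comments can be tried out in isolation. The following is a standalone sketch with hypothetical helper names (<tt>encode_multi</tt>, <tt>decode_multi</tt>); it re-implements the idea rather than calling the module itself, and it sorts the values to simulate Sphinx returning them in ascending order.

```ruby
# Hypothetical re-implementation: each byte gets an offset of 0x0100 times
# its (1-based) position, so later characters always have larger values.
def encode_multi(str)
  str.bytes.each_with_index.map { |byte, i| (i + 1) * 0x0100 + byte }.join(',')
end

# Reverses encode_multi. Accepts a comma separated String or an Array and
# expects the values in ascending order.
def decode_multi(multi)
  values = multi.is_a?(Array) ? multi : multi.split(',')
  values.each_with_index.map { |v, i| (v.to_i - (i + 1) * 0x0100).chr }.join
end

encoded = encode_multi('Hello')
puts encoded

# Simulate an arbitrary storage order followed by an ascending sort.
shuffled = encoded.split(',').map { |v| v.to_i }.shuffle
decoded = decode_multi(shuffled.sort)
puts decoded
```

Because every byte is smaller than 0x0100, the value ranges of neighbouring positions never overlap, so sorting by value restores the original character order.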
metadata ADDED
@@ -0,0 +1,68 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: couchsphinx
3
+ version: !ruby/object:Gem::Version
4
+ version: "0.1"
5
+ platform: ruby
6
+ authors:
7
+ - Jan Ulbrich
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-11-04 00:00:00 +01:00
13
+ default_executable:
14
+ dependencies: []
15
+
16
+ description:
17
+ email: jan.ulbrich @nospam@ holtzbrinck.com
18
+ executables: []
19
+
20
+ extensions: []
21
+
22
+ extra_rdoc_files:
23
+ - README.rdoc
24
+ files:
25
+ - README.rdoc
26
+ - couchsphinx.rb
27
+ - lib/multi_attribute.rb
28
+ - lib/mixins/properties.rb
29
+ - lib/mixins/indexer.rb
30
+ - lib/indexer.rb
31
+ has_rdoc: true
32
+ homepage: http://github.com/ulbrich/couchsphinx
33
+ licenses: []
34
+
35
+ post_install_message:
36
+ rdoc_options:
37
+ - --exclude
38
+ - pkg
39
+ - --exclude
40
+ - tmp
41
+ - --all
42
+ - --title
43
+ - CouchSphinx
44
+ - --main
45
+ - README.rdoc
46
+ require_paths:
47
+ - .
48
+ required_ruby_version: !ruby/object:Gem::Requirement
49
+ requirements:
50
+ - - ">="
51
+ - !ruby/object:Gem::Version
52
+ version: "0"
53
+ version:
54
+ required_rubygems_version: !ruby/object:Gem::Requirement
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ version: "0"
59
+ version:
60
+ requirements: []
61
+
62
+ rubyforge_project:
63
+ rubygems_version: 1.3.5
64
+ signing_key:
65
+ specification_version: 3
66
+ summary: A full text indexing extension for CouchDB/CouchRest using Sphinx.
67
+ test_files: []
68
+