ulbrich-couchsphinx 0.2

@@ -0,0 +1,171 @@
1
+ = CouchSphinx
2
+
3
+ The CouchSphinx library implements an interface between CouchDB and Sphinx
4
+ supporting CouchRest to automatically index objects in Sphinx. It tries to
5
+ be as transparent as possible: Just an additional method in the CouchRest
6
+ domain specific language and some Sphinx configuration are needed to get
7
+ going.
8
+
9
+ == Prerequisites
10
+
11
+ CouchSphinx needs gems CouchRest and Riddle as well as a running Sphinx
12
+ and a CouchDB installation.
13
+
14
+ sudo gem sources -a http://gems.github.com # Only needed once!
15
+ sudo gem install riddle
16
+ sudo gem install couchrest
17
+ sudo gem install ulbrich-couchsphinx
18
+
19
+ No additional configuration is needed for interfacing with CouchDB: Setup is
20
+ done when CouchRest is able to talk to the CouchDB server.
21
+
22
+ A proper "sphinx.conf" file and a script for retrieving index data have to
23
+ be provided for interfacing with Sphinx: Sorry, no UltraSphinx-like
24
+ magic... :-) Depending on the amount of data, more than one index may be used
25
+ and indexes may be consolidated from time to time.
26
+
27
+ This is a sample configuration for a single "main" index:
28
+
29
+ searchd {
30
+ address = 0.0.0.0
31
+ port = 3312
32
+
33
+ log = ./sphinx/searchd.log
34
+ query_log = ./sphinx/query.log
35
+ pid_file = ./sphinx/searchd.pid
36
+ }
37
+
38
+ source couchblog {
39
+ type = xmlpipe2
40
+
41
+ xmlpipe_command = ./sphinxsource.rb
42
+ }
43
+
44
+ index couchblog {
45
+ source = couchblog
46
+
47
+ charset_type = utf-8
48
+ path = ./sphinx/sphinx_index_main
49
+ }
50
+
51
+ The script "sphinxsource.rb" providing the data to index may vary
52
+ depending on the number of CouchDB instances it talks to. This is a simple
53
+ script interfacing with one single instance:
54
+
55
+ #!/usr/bin/env ruby
56
+
57
+ require 'rubygems'
58
+ require 'lib/models' # Depends on location of model files
59
+
60
+ data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
61
+ rows = data['rows'] rescue []
62
+
63
+ puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
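For orientation, the script above has to write an xmlpipe2 document set to standard output. The following is only a minimal sketch of that output format using plain Hashes instead of CouchRest objects. The helpers <tt>xml_escape</tt> and <tt>docset_xml</tt> are hypothetical and not part of CouchSphinx; the real XMLDocset also emits the "couchrest-type" field and the "csphinx-class" attribute and wraps values in CDATA sections.

```ruby
# Hypothetical helper (not CouchSphinx API): escapes XML special characters.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Hypothetical stand-in for XMLDocset: builds an xmlpipe2 docset from Hashes
# that carry a numeric :id plus any of the given String field names.
def docset_xml(field_names, docs)
  xml = String.new('<?xml version="1.0" encoding="utf-8"?>')
  xml << '<sphinx:docset><sphinx:schema>'
  field_names.each { |name| xml << "<sphinx:field name=\"#{name}\"/>" }
  xml << '</sphinx:schema>'

  docs.each do |doc|
    xml << "<sphinx:document id=\"#{doc[:id]}\">"
    # Every document lists every field; missing values become empty tags.
    field_names.each do |name|
      xml << "<#{name}>#{xml_escape(doc[name])}</#{name}>"
    end
    xml << '</sphinx:document>'
  end

  xml << '</sphinx:docset>'
end

docs = [
  { :id => 234164, 'name' => 'John',     'phone'  => '199 43828' },
  { :id => 422433, 'name' => 'Sue & Co', 'mobile' => '828 19439' }
]

puts docset_xml(%w[name phone mobile], docs)
```

The schema lists the superset of all fields, which is why each document carries empty tags for fields it does not have.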
64
+
65
+ == Models
66
+
67
+ Use method <tt>fulltext_index</tt> to enable indexing of a model. The
68
+ default is to index all attributes but it is recommended to provide a list of
69
+ attribute keys.
70
+
71
+ A side effect of calling this method is that CouchSphinx overrides the
72
+ default of letting CouchDB create new IDs: Sphinx only allows numeric IDs, so
73
+ CouchSphinx gives new objects an ID made of the class name, a hyphen and an
74
+ integer (e.g. <tt>Post-38497238</tt>). Again: Only objects with such IDs are
75
+ indexed due to internal restrictions of Sphinx.
76
+
77
+ Sample:
78
+
79
+ class Post < CouchRest::ExtendedDocument
80
+ use_database SERVER.default_database
81
+
82
+ property :title
83
+ property :body
84
+
85
+ fulltext_index :title, :body
86
+ end
87
+
88
+ Add options <tt>:server</tt> and <tt>:port</tt> to <tt>fulltext_index</tt> if
89
+ the Sphinx server to query is running on a different server (defaults to
90
+ "localhost" with port 3312).
91
+
92
+ If you are sure your Sphinx is compiled with 64-bit support, you may add
93
+ option <tt>:idsize</tt> with value <tt>64</tt> to generate 64-bit IDs for
94
+ CouchDB (defaults to 32 bits).
95
+
96
+ Here is a full-featured sample setting additional options:
97
+
98
+ fulltext_index :title, :body, :server => 'my.other.server', :port => 3313,
99
+ :idsize => 64
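The ID scheme described in this section can be sketched in plain Ruby. This is an illustration under assumptions, not the library code: the helper names <tt>candidate_id</tt> and <tt>numeric_part</tt> are hypothetical, the regular expression is anchored more strictly than the one in the library, and the check against CouchDB for an already used ID is left out.

```ruby
# Hypothetical helpers that only illustrate the ID scheme.

# Builds a candidate document ID such as "Post-38497238". The upper bound
# follows the :idsize option (32 or 64 bits).
def candidate_id(class_name, idsize = 32)
  limit = (1 << idsize) - 1
  "#{class_name}-#{rand(limit)}"
end

# Returns the numeric part handed to Sphinx as document ID, or nil if the
# ID does not follow the "<class name>-<integer>" pattern.
def numeric_part(class_name, id)
  match = id.match(/\A#{Regexp.escape(class_name)}-([0-9]+)\z/)
  match && match[1]
end

id = candidate_id('Post')
puts id
puts numeric_part('Post', id)
p numeric_part('Post', 'some-couchdb-uuid')   # not indexable by Sphinx
```

Documents whose ID does not match the pattern yield no numeric part, which is the reason they are skipped during indexing.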
100
+
101
+ == Indexing
102
+
103
+ CouchSphinx also adds a new design document to CouchDB: It needs to collect
104
+ all relevant objects for running the Sphinx indexer and adds its own views
105
+ to do so. Have a look at CouchDB design document "CouchSphinxIndex" for
106
+ details.
107
+
108
+ Automatically starting the reindexing of objects the moment new objects are
109
+ created can be implemented by adding a save_callback to the model class:
110
+
111
+ save_callback :after do |object|
112
+ `sudo indexer --all --rotate` # Configure sudo to allow this call...
113
+ end
114
+
115
+ This or a similar callback should be added to all models needing instant
116
+ indexing. If indexing is not that crucial or load is high, some additional
117
+ checks for the time of the last call should be added.
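One possible form of such a check is a small throttle that runs the indexer at most once per interval. The class <tt>ReindexThrottle</tt> below is a hypothetical sketch and not part of CouchSphinx. It keeps its state in memory, so it only throttles within one Ruby process; several processes would need a shared marker instead.

```ruby
# Hypothetical helper: runs the given block at most once per interval.
class ReindexThrottle
  def initialize(min_interval_seconds)
    @min_interval = min_interval_seconds
    @last_run = nil
  end

  # Yields and returns true if enough time has passed since the last run,
  # otherwise skips the block and returns false.
  def run(now = Time.now)
    return false if @last_run && (now - @last_run) < @min_interval

    @last_run = now
    yield
    true
  end
end

throttle = ReindexThrottle.new(60)
calls = 0
start = Time.now

throttle.run(start)      { calls += 1 }   # runs
throttle.run(start + 10) { calls += 1 }   # skipped, only 10 seconds later
throttle.run(start + 61) { calls += 1 }   # runs again

puts calls
```

Inside a save callback, the block passed to <tt>run</tt> would contain the indexer call instead of the counter.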
118
+
119
+ == Queries
120
+
121
+ An additional class method <tt>by_fulltext_index</tt> is added for each
122
+ fulltext indexed model. This method takes a Sphinx query like
123
+ "foo @title bar", runs it within the context of the current class and returns
124
+ an Array of matching CouchDB documents. Use
125
+ <tt>CouchRest::ExtendedDocument.by_fulltext_index</tt> if you want to find
126
+ any document matching the query and not only a certain class.
127
+
128
+ Samples:
129
+
130
+ Post.by_fulltext_index('first')
131
+ => [...]
132
+
133
+ post = Post.by_fulltext_index('this is @title post').first
134
+ post.title
135
+ => "First Post"
136
+ post.class
137
+ => Post
138
+
139
+ Additional options <tt>:match_mode</tt>, <tt>:limit</tt> and
140
+ <tt>:max_matches</tt> can be provided to customize the behaviour of Riddle.
141
+ Option <tt>:raw</tt> can be set to <tt>true</tt> to skip loading the
142
+ documents and return the raw document IDs instead.
143
+
144
+ Sample:
145
+
146
+ Post.by_fulltext_index('my post', :limit => 100)
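The raw IDs are rebuilt from what Sphinx returns: the numeric document ID plus the "csphinx-class" attribute, which encodes the class name as integers. The sketch below shows that step in isolation. The match rows are made-up sample data shaped like the entries the library reads from the Riddle result, and <tt>decode_class_name</tt> is a hypothetical helper mirroring CouchSphinx::MultiAttribute.decode.

```ruby
# Made-up sample rows: a numeric :doc and the 'csphinx-class' attribute.
matches = [
  { :doc => 921744775, :attributes => { 'csphinx-class' => [336, 623, 883, 1140] } },
  { :doc => 38497238,  :attributes => { 'csphinx-class' => [336, 623, 883, 1140] } }
]

# Hypothetical helper: turns the integer list back into the class name by
# removing the per-position offset of 0x0100, 0x0200, ...
def decode_class_name(values)
  values.each_with_index.map { |value, i| (value.to_i - ((i + 1) << 8)).chr }.join
end

# Rebuilds CouchDB document IDs of the form "<class name>-<number>".
raw_ids = matches.map do |row|
  "#{decode_class_name(row[:attributes]['csphinx-class'])}-#{row[:doc]}"
end

p raw_ids
```

With <tt>:raw</tt> set, a list like this is returned directly; otherwise the IDs are used to fetch the documents from CouchDB.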
147
+
148
+ == Copyright & License
149
+
150
+ Copyright (c) 2009 Holtzbrinck Digital GmbH, Jan Ulbrich
151
+
152
+ Permission is hereby granted, free of charge, to any person
153
+ obtaining a copy of this software and associated documentation
154
+ files (the "Software"), to deal in the Software without
155
+ restriction, including without limitation the rights to use,
156
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
157
+ copies of the Software, and to permit persons to whom the
158
+ Software is furnished to do so, subject to the following
159
+ conditions:
160
+
161
+ The above copyright notice and this permission notice shall be
162
+ included in all copies or substantial portions of the Software.
163
+
164
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
165
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
166
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
167
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
168
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
169
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
170
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
171
+ OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,28 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the includes implementing this library. Have a look at
5
+ # the README.rdoc as a starting point.
6
+
7
+ require 'rubygems'
8
+
9
+ require 'couchrest'
10
+ require 'riddle'
11
+
12
+ # Version number to use for updating CouchDB design document CouchSphinxIndex
13
+ # if needed.
14
+
15
+ module CouchSphinx
16
+ if (match = __FILE__.match(/couchsphinx-([0-9.-]*)/))
17
+ VERSION = match[1]
18
+ else
19
+ VERSION = 'unknown'
20
+ end
21
+ end
22
+
23
+ # Require the stuff implementing this library...
24
+
25
+ require 'lib/multi_attribute'
26
+ require 'lib/indexer'
27
+ require 'lib/mixins/indexer'
28
+ require 'lib/mixins/properties'
@@ -0,0 +1,217 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchSphinx::Indexer::XMLDocset and
5
+ # CouchSphinx::Indexer::XMLDoc classes.
6
+
7
+ # Namespace module for the CouchSphinx gem.
8
+
9
+ module CouchSphinx #:nodoc:
10
+
11
+ # Module Indexer contains classes for creating XML input documents for the
12
+ # indexer. Each Sphinx index consists of a single "sphinx:docset" with any
13
+ # number of "sphinx:document" tags.
14
+ #
15
+ # The XML source can be generated from an array of CouchRest objects or from
16
+ # an array of Hashes containing at least fields "couchrest-type" and "_id"
17
+ # as returned by CouchDB view "CouchSphinxIndex/couchrests_by_timestamp".
18
+ #
19
+ # Sample:
20
+ #
21
+ # rows = [{ 'name' => 'John', 'phone' => '199 43828',
22
+ # 'couchrest-type' => 'Address', '_id' => 'Address-234164'
23
+ # },
24
+ # { 'name' => 'Sue', 'mobile' => '828 19439',
25
+ # 'couchrest-type' => 'Address', '_id' => 'Address-422433'
26
+ # }
27
+ # ]
28
+ # puts CouchSphinx::Indexer::XMLDocset.new(rows).to_s
29
+ #
30
+ # <?xml version="1.0" encoding="utf-8"?>
31
+ # <sphinx:docset>
32
+ # <sphinx:schema>
33
+ # <sphinx:attr name="csphinx-class" type="multi"/>
34
+ # <sphinx:field name="couchrest-type"/>
35
+ # <sphinx:field name="name"/>
36
+ # <sphinx:field name="phone"/>
37
+ # <sphinx:field name="mobile"/>
38
+ # <sphinx:field name="created_at"/>
39
+ # </sphinx:schema>
40
+ # <sphinx:document id="234164">
41
+ # <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
42
+ # <couchrest-type>Address</couchrest-type>
43
+ # <name><![CDATA[[John]]></name>
44
+ # <phone><![CDATA[[199 43828]]></phone>
45
+ # <mobile><![CDATA[[]]></mobile>
46
+ # <created_at><![CDATA[[]]></created_at>
47
+ # </sphinx:document>
48
+ # <sphinx:document id="422433">
49
+ # <csphinx-class>321,612,868,1138,1381,1651,1907</csphinx-class>
50
+ # <couchrest-type>Address</couchrest-type>
51
+ # <name><![CDATA[[Sue]]></name>
52
+ # <phone><![CDATA[[]]></phone>
53
+ # <mobile><![CDATA[[828 19439]]></mobile>
54
+ # <created_at><![CDATA[[]]></created_at>
55
+ # </sphinx:document>
56
+ # </sphinx:docset>
57
+
58
+ module Indexer
59
+
60
+ # Class XMLDocset wraps the XML representation of a document to index. It
61
+ # contains a complete "sphinx:docset" including its schema definition.
62
+
63
+ class XMLDocset
64
+
65
+ # Objects contained in document set.
66
+
67
+ attr_reader :xml_docs
68
+
69
+ # XML generated for opening the document.
70
+
71
+ attr_reader :xml_header
72
+
73
+ # XML generated for closing the document.
74
+
75
+ attr_reader :xml_footer
76
+
77
+ # Creates a XMLDocset object from the provided data. It defines a
78
+ # superset of all fields of the classes to index objects for. The class
79
+ # names are collected from the provided objects as well.
80
+ #
81
+ # Parameters:
82
+ #
83
+ # [rows] Array with objects of type CouchRest::Document or Hash to create XML for
84
+
85
+ def initialize(rows = [])
86
+ raise ArgumentError, 'Missing rows' if rows.nil?
87
+
88
+ xml = '<?xml version="1.0" encoding="utf-8"?>'
89
+
90
+ xml << '<sphinx:docset><sphinx:schema>'
91
+
92
+ @xml_docs = []
93
+ classes = []
94
+
95
+ rows.each do |row|
96
+ object = nil
97
+
98
+ if row.kind_of? CouchRest::Document
99
+ object = row
100
+ elsif row.kind_of? Hash
101
+ row = row['value'] if row['couchrest-type'].nil?
102
+
103
+ if row and (class_name = row['couchrest-type'])
104
+ object = eval(class_name.to_s).new(row) rescue nil
105
+ end
106
+ end
107
+
108
+ if object and object.sphinx_id
109
+ classes << object.class if not classes.include? object.class
110
+ @xml_docs << XMLDoc.from_object(object)
111
+ end
112
+ end
113
+
114
+ field_names = classes.collect { |clas| clas.fulltext_keys rescue []
115
+ }.flatten.uniq
116
+
117
+ field_names.each do |key, value|
118
+ xml << "<sphinx:field name=\"#{key}\"/>"
119
+ end
120
+
121
+ xml << '<sphinx:field name="couchrest-type"/>'
122
+ xml << '<sphinx:attr name="csphinx-class" type="multi"/>'
123
+
124
+ xml << '</sphinx:schema>'
125
+
126
+ @xml_header = xml
127
+ @xml_footer = '</sphinx:docset>'
128
+ end
129
+
130
+ # Returns the encoded data as XML.
131
+
132
+ def to_xml
133
+ return to_s
134
+ end
135
+
136
+ # Returns the encoded data as XML.
137
+
138
+ def to_s
139
+ return self.xml_header + self.xml_docs.join + self.xml_footer
140
+ end
141
+ end
142
+
143
+ # Class XMLDoc wraps the XML representation of a single document to index
144
+ # and contains a complete "sphinx:document" tag.
145
+
146
+ class XMLDoc
147
+
148
+ # Returns the ID of the encoded data.
149
+
150
+ attr_reader :id
151
+
152
+ # Returns the class name of the encoded data.
153
+
154
+ attr_reader :class_name
155
+
156
+ # Returns the encoded data.
157
+
158
+ attr_reader :xml
159
+
160
+ # Creates a XMLDoc object from the provided CouchRest object.
161
+ #
162
+ # Parameters:
163
+ #
164
+ # [object] Object to index
165
+
166
+ def self.from_object(object)
167
+ raise ArgumentError, 'Missing object' if object.nil?
168
+ raise ArgumentError, 'No compatible ID' if (id = object.sphinx_id).nil?
169
+
170
+ return new(id, object.class.to_s, object.fulltext_attributes)
171
+ end
172
+
173
+ # Creates a XMLDoc object from the provided ID, class name and data.
174
+ #
175
+ # Parameters:
176
+ #
177
+ # [id] ID of the object to index
178
+ # [class_name] Name of the class
179
+ # [properties] Hash with the properties to index
180
+
181
+ def initialize(id, class_name, properties)
182
+ raise ArgumentError, 'Missing id' if id.nil?
183
+ raise ArgumentError, 'Missing class_name' if class_name.nil?
184
+
185
+ xml = "<sphinx:document id=\"#{id}\">"
186
+
187
+ xml << '<csphinx-class>'
188
+ xml << CouchSphinx::MultiAttribute.encode(class_name)
189
+ xml << '</csphinx-class>'
190
+ xml << "<couchrest-type>#{class_name}</couchrest-type>"
191
+
192
+ properties.each do |key, value|
193
+ xml << "<#{key}><![CDATA[[#{value}]]></#{key}>"
194
+ end
195
+
196
+ xml << '</sphinx:document>'
197
+
198
+ @xml = xml
199
+
200
+ @id = id
201
+ @class_name = class_name
202
+ end
203
+
204
+ # Returns the encoded data as XML.
205
+
206
+ def to_xml
207
+ return to_s
208
+ end
209
+
210
+ # Returns the encoded data as XML.
211
+
212
+ def to_s
213
+ return self.xml
214
+ end
215
+ end
216
+ end
217
+ end
@@ -0,0 +1,256 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchRest::Mixins::Indexer module which in turn
5
+ # includes CouchRest::Mixins::Indexer::ClassMethods.
6
+
7
+ # Patches to the CouchRest library.
8
+
9
+ module CouchRest # :nodoc:
10
+ module Mixins # :nodoc:
11
+
12
+ # Mixin for CouchRest adding indexing stuff. See class ClassMethods for
13
+ # details.
14
+
15
+ module Indexer #:nodoc:
16
+
17
+ # Bootstrap method to include patches with.
18
+ #
19
+ # Parameters:
20
+ #
21
+ # [base] Class to include class methods of module into
22
+
23
+ def self.included(base)
24
+ base.extend(ClassMethods)
25
+ end
26
+
27
+ # Patches to the CouchRest ExtendedDocument module: Adds the
28
+ # "fulltext_index" method for enabling indexing and defining the fields
29
+ # to include as a domain specific extension. This method also assures
30
+ # the existence of a special design document used to generate indexes
31
+ # from.
32
+ #
33
+ # An additional save callback sets an ID like "Post-123123" (class name
34
+ # plus pure numeric ID compatible with Sphinx) for new objects.
35
+ #
36
+ # Last but not least method "by_fulltext_index" is defined allowing a
37
+ # full text search like "foo @title bar" within the context of the
38
+ # current class.
39
+ #
40
+ # Samples:
41
+ #
42
+ # class Post < CouchRest::ExtendedDocument
43
+ # use_database SERVER.default_database
44
+ #
45
+ # property :title
46
+ # property :body
47
+ #
48
+ # fulltext_index :title, :body
49
+ # end
50
+ #
51
+ # Post.by_fulltext_index('first')
52
+ # => [...]
53
+ # post = Post.by_fulltext_index('this is @title post').first
54
+ # post.title
55
+ # => "First Post"
56
+ # post.class
57
+ # => Post
58
+
59
+ module ClassMethods
60
+
61
+ # Method for enabling fulltext indexing and for defining the fields to
62
+ # include.
63
+ #
64
+ # Parameters:
65
+ #
66
+ # [keys] Array of field keys to include plus options Hash
67
+ #
68
+ # Options:
69
+ #
70
+ # [:server] Server name (defaults to localhost)
71
+ # [:port] Server port (defaults to 3312)
72
+ # [:idsize] Number of bits for the ID to generate (defaults to 32)
73
+
74
+ def fulltext_index(*keys)
75
+ opts = keys.pop if keys.last.is_a?(Hash)
76
+ opts ||= {} # Handle some options: Future use... :-)
77
+
78
+ # Save the keys to index and the options for later use in callback.
79
+ # Helper method cattr_accessor is already bootstrapped by couchrest
80
+ # gem.
81
+
82
+ cattr_accessor :fulltext_keys
83
+ cattr_accessor :fulltext_opts
84
+
85
+ self.fulltext_keys = keys
86
+ self.fulltext_opts = opts
87
+
88
+ # We add a few new functions to CouchDB for retrieving modified
89
+ # documents...
90
+
91
+ assure_existing_couch_index
92
+
93
+ # Overwrite setting of new ID to do something compatible with
94
+ # Sphinx. If an ID already exists, we try to match it with our
95
+ # Schema and cowardly ignore if not.
96
+
97
+ save_callback :before do |object|
98
+ if object.id.nil?
99
+ idsize = fulltext_opts[:idsize] || 32
100
+ limit = (1 << idsize) - 1
101
+
102
+ while true
103
+ id = rand(limit)
104
+ candidate = "#{object.class.to_s}-#{id}"
105
+
106
+ begin
107
+ object.class.get(candidate) # Raises ResourceNotFound if the ID is still unused
108
+ rescue RestClient::ResourceNotFound
109
+ object['_id'] = candidate
110
+ break
111
+ end
112
+ end
113
+ end
114
+ end
115
+ end
116
+
117
+ # Searches for an object of this model class (e.g. Post, Comment) and
118
+ # the requested query string. The query string may contain any query
119
+ # provided by Sphinx.
120
+ #
121
+ # Call CouchRest::ExtendedDocument.by_fulltext_index() to query
122
+ # without reducing to a single class type.
123
+ #
124
+ # Parameters:
125
+ #
126
+ # [query] Query string like "foo @title bar"
127
+ # [options] Additional options to set
128
+ #
129
+ # Options:
130
+ #
131
+ # [:match_mode] Optional Riddle match mode (defaults to :extended)
132
+ # [:limit] Optional Riddle limit (Riddle default)
133
+ # [:max_matches] Optional Riddle max_matches (Riddle default)
134
+ # [:sort_by] Optional Riddle sort order (also sets sort_mode to :extended)
135
+ # [:raw] Flag to return only IDs and not look up objects (defaults to false)
136
+
137
+ def by_fulltext_index(query, options = {})
138
+ if self == ExtendedDocument
139
+ client = Riddle::Client.new
140
+ else
141
+ client = Riddle::Client.new(fulltext_opts[:server],
142
+ fulltext_opts[:port])
143
+
144
+ query = query + " @couchrest-type #{self}"
145
+ end
146
+
147
+ client.match_mode = options[:match_mode] || :extended
148
+
149
+ if (limit = options[:limit])
150
+ client.limit = limit
151
+ end
152
+
153
+ if (max_matches = options[:max_matches])
154
+ client.max_matches = max_matches
155
+ end
156
+
157
+ if (sort_by = options[:sort_by])
158
+ client.sort_mode = :extended
159
+ client.sort_by = sort_by
160
+ end
161
+
162
+ result = client.query(query)
163
+
164
+ if result and result[:status] == 0 and (matches = result[:matches])
165
+ keys = matches.collect { |row| (CouchSphinx::MultiAttribute.decode(
166
+ row[:attributes]['csphinx-class']) +
167
+ '-' + row[:doc].to_s) rescue nil }.compact
168
+
169
+ return keys if options[:raw]
170
+ return multi_get(keys)
171
+ else
172
+ return []
173
+ end
174
+ end
175
+
176
+ # Returns objects for all provided keys not reducing lookup to a
177
+ # certain type. Casts to a CouchRest object if possible.
178
+ #
179
+ # Parameters:
180
+ #
181
+ # [ids] Array of document IDs to retrieve
182
+
183
+ def multi_get(ids)
184
+ result = CouchRest.post(SERVER.default_database.to_s +
185
+ '/_all_docs?include_docs=true', :keys => ids)
186
+
187
+ return result['rows'].collect { |row|
188
+ row = row['doc'] if row['couchrest-type'].nil?
189
+
190
+ if row and (class_name = row['couchrest-type'])
191
+ eval(class_name.to_s).new(row) rescue row
192
+ else
193
+ row
194
+ end
195
+ }
196
+ end
197
+
198
+ # Defines a design document with the functions needed to lookup
199
+ # modified documents. If the current version is too old, a new version
200
+ # of the design document is stored.
201
+
202
+ def assure_existing_couch_index
203
+ if (doc = database.get("_design/CouchSphinxIndex") rescue nil)
204
+ return if (ver = doc['lib_version']) and ver == CouchSphinx::VERSION
205
+
206
+ database.delete_doc(doc)
207
+ end
208
+
209
+ all_couchrests = {
210
+ :map => 'function(doc) {
211
+ if(doc["couchrest-type"] && (doc["created_at"] || doc["updated_at"])) {
212
+ var date = doc["updated_at"];
213
+
214
+ if(date == null)
215
+ date = doc["created_at"];
216
+
217
+ emit(doc._id, doc);
218
+ }
219
+ }'
220
+ }
221
+
222
+ couchrests_by_timestamp = {
223
+ :map => 'function(doc) {
224
+ if(doc["couchrest-type"] && (doc["created_at"] || doc["updated_at"])) {
225
+ var date = doc["updated_at"];
226
+
227
+ if(date == null)
228
+ date = doc["created_at"];
229
+
230
+ emit(Date.parse(date), doc);
231
+ }
232
+ }'
233
+ }
234
+
235
+ database.save_doc({
236
+ "_id" => "_design/CouchSphinxIndex",
237
+ :lib_version => CouchSphinx::VERSION,
238
+ :views => {
239
+ :all_couchrests => all_couchrests,
240
+ :couchrests_by_timestamp => couchrests_by_timestamp
241
+ }
242
+ })
243
+ end
244
+ end
245
+ end
246
+ end
247
+ end
248
+
249
+ # Include the Indexer mixin from the original ExtendedDocument class of
250
+ # CouchRest which adds a few methods and allows calling method fulltext_index.
251
+
252
+ module CouchRest # :nodoc:
253
+ class ExtendedDocument # :nodoc:
254
+ include CouchRest::Mixins::Indexer
255
+ end
256
+ end
@@ -0,0 +1,75 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchRest::Mixins::Properties module.
5
+
6
+ # Patches to the CouchRest library.
7
+
8
+ module CouchRest # :nodoc:
9
+ module Mixins # :nodoc:
10
+
11
+ # Patches to the CouchRest Properties module: Adds the "attributes" method
12
+ # plus some fulltext relevant stuff.
13
+ #
14
+ # Samples:
15
+ #
16
+ # data = SERVER.default_database.view('CouchSphinxIndex/couchrests_by_timestamp')
17
+ # rows = data['rows']
18
+ # post = Post.new(rows.first)
19
+ #
20
+ # post.attributes
21
+ # => {:tags=>"one, two, three", :updated_at=>Tue Jun 09 14:45:00 +0200 2009,
22
+ # :author=>nil, :title=>"First Post",
23
+ # :created_at=>Tue Jun 09 14:45:00 +0200 2009,
24
+ # :body=>"This is the first post. This is the [...] first post. "}
25
+ #
26
+ # post.fulltext_attributes
27
+ # => {:title=>"First Post", :author=>nil,
28
+ # :created_at=>Tue Jun 09 14:45:00 +0200 2009,
29
+ # :body=>"This is the first post. This is the [...] first post. "}
30
+ #
31
+ # post.sphinx_id
32
+ # => "921744775"
33
+ # post.id
34
+ # => "Post-921744775"
35
+
36
+ module Properties
37
+
38
+ # Returns a Hash of all properties plus the ID of the document.
39
+
40
+ def attributes
41
+ data = {}
42
+
43
+ self.properties.collect { |p|
44
+ { p.name.intern => self.send(p.name) } }.each { |h|
45
+ data.merge! h }
46
+
47
+ return data
48
+ end
49
+
50
+ # Returns a Hash of all attributes allowed to be indexed. As a side
51
+ # effect it sets the fulltext_keys variable if still blank or empty.
52
+
53
+ def fulltext_attributes
54
+ clas = self.class
55
+
56
+ if not clas.fulltext_keys or clas.fulltext_keys.empty?
57
+ clas.fulltext_keys = self.properties.collect { |p| p.name.intern }
58
+ end
59
+
60
+ return self.attributes.reject { |k, v|
61
+ not (clas.fulltext_keys.include? k) }
62
+ end
63
+
64
+ # Returns the numeric part of the document ID (compatible to Sphinx).
65
+
66
+ def sphinx_id
67
+ if (match = self.id.match(/#{self.class}-([0-9]+)/))
68
+ return match[1]
69
+ else
70
+ return nil
71
+ end
72
+ end
73
+ end
74
+ end
75
+ end
@@ -0,0 +1,57 @@
1
+ # CouchSphinx, a full text indexing extension for CouchDB/CouchRest using
2
+ # Sphinx.
3
+ #
4
+ # This file contains the CouchSphinx::MultiAttribute module.
5
+
6
+ # Namespace module for the CouchSphinx gem.
7
+
8
+ module CouchSphinx #:nodoc:
9
+
10
+ # Module MultiAttribute implements helpers to translate back and
11
+ # forth between Ruby Strings and an array of integers suitable for Sphinx
12
+ # attributes of type "multi".
13
+ #
14
+ # Background: Getting an ID as result for a query is OK, but for example to
15
+ # allow cast safety, we need an additional attribute. Sphinx supports
16
+ # attributes which are returned together with the ID, but they behave a
17
+ # little differently than expected: Instead we can use arrays of integers with
18
+ # ASCII character codes. These values are returned in ascending (!) order of
19
+ # value (yes, sounds funny but is reasonable from an internal view to
20
+ # Sphinx). So we mask each byte with 0x0100++ to keep the order...
21
+ #
22
+ # Sample:
23
+ #
24
+ # CouchSphinx::MultiAttribute.encode('Hello')
25
+ # => "328,613,876,1132,1391"
26
+ # CouchSphinx::MultiAttribute.decode('328,613,876,1132,1391')
27
+ # => "Hello"
28
+
29
+ module MultiAttribute
30
+
31
+ # Returns a numeric representation of a Ruby String suitable for "multi"
32
+ # attributes of Sphinx.
33
+ #
34
+ # Parameters:
35
+ #
36
+ # [str] String to translate
37
+
38
+ def self.encode(str)
39
+ offset = 0
40
+ return str.bytes.collect { |c| (offset+= 0x0100) + c }.join(',')
41
+ end
42
+
43
+ # Returns the original Ruby String for a Sphinx "multi" attribute. Only
44
+ # works if the attribute was created by encode before!
45
+ #
46
+ # Parameters:
47
+ #
48
+ # [multi] Sphinx "multi" attribute to translate back
49
+
50
+ def self.decode(multi)
51
+ offset = 0
52
+ multi = multi.split(',') if not multi.kind_of? Array
53
+
54
+ return multi.collect {|x| (x.to_i-(offset+=0x0100)).chr}.join
55
+ end
56
+ end
57
+ end
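The encoding idea can be tried out without Sphinx. The following is a sketch under assumptions, not the module itself: the method names <tt>encode_multi</tt> and <tt>decode_multi</tt> are hypothetical, and the decoder sorts its input first, which the module does not do because it relies on Sphinx returning the values in ascending order. The sketch is only meant for ASCII strings such as class names.

```ruby
# Hypothetical sketch of the "multi" attribute encoding.

# Adds 0x0100 for the first byte, 0x0200 for the second and so on, so that
# the values grow with the position of the byte in the string.
def encode_multi(str)
  str.bytes.each_with_index.map { |byte, i| ((i + 1) << 8) + byte }.join(',')
end

# Accepts a comma separated String or an Array. Sorting restores the
# original byte order because every offset is larger than any byte value.
def decode_multi(multi)
  values = multi.is_a?(Array) ? multi : multi.split(',')
  sorted = values.map { |value| value.to_i }.sort
  sorted.each_with_index.map { |value, i| (value - ((i + 1) << 8)).chr }.join
end

encoded = encode_multi('Hello')
puts encoded
puts decode_multi(encoded)
puts decode_multi([1391, 328, 876, 613, 1132])   # shuffled input
```

Because the offset of each position exceeds the largest possible byte value, the ascending order Sphinx imposes on the values is also the order of the characters.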
metadata ADDED
@@ -0,0 +1,66 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: ulbrich-couchsphinx
3
+ version: !ruby/object:Gem::Version
4
+ version: "0.2"
5
+ platform: ruby
6
+ authors:
7
+ - Jan Ulbrich
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-05-16 00:00:00 -07:00
13
+ default_executable:
14
+ dependencies: []
15
+
16
+ description:
17
+ email: jan.ulbrich @nospam@ holtzbrinck.com
18
+ executables: []
19
+
20
+ extensions: []
21
+
22
+ extra_rdoc_files:
23
+ - README.rdoc
24
+ files:
25
+ - README.rdoc
26
+ - couchsphinx.rb
27
+ - lib/multi_attribute.rb
28
+ - lib/mixins/properties.rb
29
+ - lib/mixins/indexer.rb
30
+ - lib/indexer.rb
31
+ has_rdoc: true
32
+ homepage: http://github.com/ulbrich/couchsphinx
33
+ post_install_message:
34
+ rdoc_options:
35
+ - --exclude
36
+ - pkg
37
+ - --exclude
38
+ - tmp
39
+ - --all
40
+ - --title
41
+ - CouchSphinx
42
+ - --main
43
+ - README.rdoc
44
+ require_paths:
45
+ - .
46
+ required_ruby_version: !ruby/object:Gem::Requirement
47
+ requirements:
48
+ - - ">="
49
+ - !ruby/object:Gem::Version
50
+ version: "0"
51
+ version:
52
+ required_rubygems_version: !ruby/object:Gem::Requirement
53
+ requirements:
54
+ - - ">="
55
+ - !ruby/object:Gem::Version
56
+ version: "0"
57
+ version:
58
+ requirements: []
59
+
60
+ rubyforge_project:
61
+ rubygems_version: 1.2.0
62
+ signing_key:
63
+ specification_version: 2
64
+ summary: A full text indexing extension for CouchDB/CouchRest using Sphinx.
65
+ test_files: []
66
+