bdimcheff-dm-sphinx-adapter 0.8.0

@@ -0,0 +1,200 @@
1
+ = DataMapper Sphinx Adapter
2
+
3
+ * http://dm-sphinx.rubyforge.org
4
+ * http://rubyforge.org/projects/dm-sphinx
5
+ * http://github.com/shanna/dm-sphinx-adapter/tree/master
6
+
7
+ == Description
8
+
9
+ A DataMapper Sphinx adapter.
10
+
11
+ == Dependencies
12
+
13
+ Ruby::
14
+ * dm-core ~> 0.9.7
15
+ * dm-is-searchable ~> 0.9.7 (optional)
16
+
17
+ I'd recommend using the dm-more plugin dm-is-searchable instead of fetching the document ids yourself.
18
+
19
+ Sphinx::
20
+ * 0.9.8-r871
21
+ * 0.9.8-r909
22
+ * 0.9.8-r985
23
+ * 0.9.8-r1065
24
+ * 0.9.8-r1112
25
+ * 0.9.8-rc1 (gem version: 0.9.8.1198)
26
+ * 0.9.8-rc2 (gem version: 0.9.8.1231)
27
+ * 0.9.8 (gem version: 0.9.8.1371)
28
+
29
+ Internally the Riddle client library is used.
30
+
31
+ == Install
32
+
33
+ * Via git: git clone git://github.com/shanna/dm-sphinx-adapter.git
34
+ * Via gem: gem install shanna-dm-sphinx-adapter -s http://gems.github.com
35
+
36
+ == Synopsis
37
+
38
+ DataMapper uses URIs or a connection hash to connect to your data-stores. In this case the sphinx search daemon
39
+ <tt>searchd</tt>.
40
+
41
+ On its own this adapter will only return an array of document hashes when queried. The DataMapper library
42
+ <tt>dm-is-searchable</tt> however provides a common interface to search one adapter and load documents from another. My
43
+ preference is to use this adapter in tandem with <tt>dm-is-searchable</tt>. See further examples in the synopsis for
44
+ usage with <tt>dm-is-searchable</tt>.
45
+
46
+ Like all DataMapper adapters you can connect with a Hash or URI.
47
+
48
+ A URI:
49
+ DataMapper.setup(:search, 'sphinx://localhost')
50
+
51
+ The breakdown is:
52
+ "#{adapter}://#{host}:#{port}/#{config}"
53
+ - adapter Must be :sphinx
54
+ - host Hostname (default: localhost)
55
+ - port Optional port number (default: 3312)
56
+
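As a rough illustration of the breakdown above, the following sketch splits a connection URI with Ruby's standard URI library and applies the documented defaults. The helper name <tt>sphinx_options</tt> is hypothetical and not part of the adapter, which does its own parsing. Treating the URI path as the config file location is an assumption based on the template above.

```ruby
require 'uri'

# Hypothetical helper, for illustration only; the adapter parses URIs itself.
def sphinx_options(uri_string)
  uri  = URI.parse(uri_string)
  path = uri.path.to_s.empty? ? nil : uri.path
  {
    :adapter => uri.scheme,              # must be 'sphinx'
    :host    => uri.host || 'localhost', # documented default host
    :port    => uri.port || 3312,        # documented default port
    :path    => path                     # assumed to be the config file location
  }
end

options = sphinx_options('sphinx://localhost')
puts options.inspect
```

Because a generic URI has no default port, <tt>uri.port</tt> is nil when the port is omitted, so the documented default of 3312 is applied.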
57
+ Alternatively supply a Hash:
58
+ DataMapper.setup(:search, {
59
+ :adapter => 'sphinx', # required
60
+ :config => './sphinx.conf', # optional. Recommended though.
61
+ :host => 'localhost', # optional. Default: localhost
62
+ :port => 3312 # optional. Default: 3312
63
+ })
64
+
65
+ === DataMapper
66
+
67
+ require 'rubygems'
68
+ require 'dm-sphinx-adapter'
69
+
70
+ DataMapper.setup(:default, 'sqlite3::memory:')
71
+ DataMapper.setup(:search, 'sphinx://localhost:3312')
72
+
73
+ class Item
74
+ include DataMapper::Resource
75
+ property :id, Serial
76
+ property :name, String
77
+ end
78
+
79
+ # Fire up your sphinx search daemon and start searching.
80
+ docs = repository(:search){ Item.all(:name => 'barney') } # Search 'items' index for '@name barney'
81
+ ids = docs.map{|doc| doc[:id]}
82
+ items = Item.all(:id => ids) # Search :default for all the document ids returned by sphinx.
83
+
84
+ === DataMapper and IsSearchable
85
+
86
+ IsSearchable is a DataMapper plugin that provides a common search interface when searching from one adapter and reading
87
+ documents from another.
88
+
89
+ IsSearchable will read resources from your <tt>:default</tt> repository on behalf of a search adapter such as
90
+ <tt>dm-sphinx-adapter</tt> and <tt>dm-ferret-adapter</tt>. This saves some of the grunt work (as shown in the previous
91
+ example) by mapping the resulting document ids from a search with your <tt>:search</tt> adapter into a suitable
92
+ <tt>#first</tt> or <tt>#all</tt> query for your <tt>:default</tt> repository.
93
+
94
+ IsSearchable adds a single class method to your resource. The first argument is a <tt>Hash</tt> of
95
+ <tt>DataMapper::Query</tt> conditions to pass to your search adapter (in this case <tt>dm-sphinx-adapter</tt>). An
96
+ optional second <tt>Hash</tt> of <tt>DataMapper::Query</tt> conditions can also be passed and will be appended to the
97
+ query on your <tt>:default</tt> database. This can be handy if you need to add extra exclusions that aren't possible
98
+ using <tt>dm-sphinx-adapter</tt> such as <tt>#gt</tt> or <tt>#lt</tt> conditions.
99
+
100
+ require 'rubygems'
101
+ require 'dm-core'
102
+ require 'dm-is-searchable'
103
+ require 'dm-sphinx-adapter'
104
+
105
+ # Connections.
106
+ DataMapper.setup(:default, 'sqlite3::memory:')
107
+ DataMapper.setup(:search, 'sphinx://localhost:3312')
108
+
109
+ class Item
110
+ include DataMapper::Resource
111
+ property :id, Serial
112
+ property :name, String
113
+
114
+ is :searchable # defaults to :search repository though you can be explicit:
115
+ # is :searchable, :repository => :sphinx
116
+ end
117
+
118
+ # Fire up your sphinx search daemon and start searching.
119
+ items = Item.search(:name => 'barney') # Search 'items' index for '@name barney'
120
+
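The two-step flow that IsSearchable automates (search one repository for document ids, then load the matching resources from <tt>:default</tt> with any extra conditions appended) can be sketched in plain Ruby without DataMapper. Everything here is a stand-in: the arrays of hashes play the role of the two repositories, and the method name <tt>search_then_load</tt> is hypothetical.

```ruby
# Hypothetical sketch of "search in one place, load from another".
# index: documents known to the search repository.
# rows:  records held in the :default repository.
def search_then_load(index, rows, search_conditions, default_conditions = {})
  # Step 1: the search repository returns matching document hashes.
  docs = index.select { |doc| search_conditions.all? { |key, value| doc[key] == value } }
  ids  = docs.map { |doc| doc[:id] }

  # Step 2: load those ids from :default, applying any extra conditions.
  # Using === lets a condition be a plain value or a Range.
  rows.select do |row|
    ids.include?(row[:id]) &&
      default_conditions.all? { |key, matcher| matcher === row[key] }
  end
end

index = [{:id => 1, :name => 'barney'}, {:id => 3, :name => 'barney'}]
rows  = [
  {:id => 1, :name => 'barney', :age => 30},
  {:id => 2, :name => 'fred',   :age => 40},
  {:id => 3, :name => 'barney', :age => 5}
]

adults = search_then_load(index, rows, {:name => 'barney'}, {:age => 18..99})
puts adults.map { |row| row[:id] }.inspect
```

The second hash is where range-style restrictions go in this sketch, mirroring the note above about conditions the search adapter cannot express.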
121
+ === Merb, DataMapper and IsSearchable
122
+
123
+ # config/init.rb
124
+ dependency 'dm-is-searchable'
125
+ dependency 'dm-sphinx-adapter'
126
+
127
+ # config/database.yml
128
+ ---
129
+ development: &defaults
130
+ repositories:
131
+ search:
132
+ adapter: sphinx
133
+ host: localhost
134
+ port: 3312
135
+
136
+ # app/models/item.rb
137
+ class Item
138
+ include DataMapper::Resource
139
+ property :id, Serial
140
+ property :name, String
141
+
142
+ is :searchable # defaults to :search repository though you can be explicit:
143
+ # is :searchable, :repository => :sphinx
144
+ end # Item
145
+
146
+ # Fire up your sphinx search daemon and start searching.
147
+ Item.search(:name => 'barney') # Search 'items' index for '@name barney'
148
+
149
+ === DataMapper, IsSearchable and DataMapper::SphinxResource
150
+
151
+ For finer grained control you can include DataMapper::SphinxResource. For instance you can search one or more indexes
152
+ and sort, include or exclude by attributes defined in your sphinx configuration:
153
+
154
+ class Item
155
+ include DataMapper::SphinxResource
156
+ property :id, Serial
157
+ property :name, String
158
+
159
+ is :searchable
160
+ repository(:search) do
161
+ index :items
162
+ index :items_delta, :delta => true
163
+
164
+ # Sphinx attributes to sort include/exclude by.
165
+ attribute :updated_on, DateTime
166
+ end
167
+
168
+ end # Item
169
+
170
+ # Search 'items, items_delta' index for '@name barney' updated in the last 30 minutes.
171
+ Item.search(:name => 'barney', :updated_on => (Time.now - 1800 .. Time.now))
172
+
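To make the include/exclude behaviour more concrete, here is a small sketch of how conditions on attributes could be turned into filters. It follows the operator handling in the adapter's <tt>search_filters</tt> method (<tt>:eql</tt> and <tt>:like</tt> include, <tt>:not</tt> excludes, anything else is unsupported), but the plain hashes used as filters and the method name <tt>attribute_filters</tt> are assumptions made for illustration; the adapter itself builds Riddle filter objects.

```ruby
# Hypothetical sketch: map [operator, attribute, value] conditions to filters.
def attribute_filters(conditions)
  conditions.map do |operator, attribute, value|
    case operator
    when :eql, :like then {:attribute => attribute, :values => value, :exclude => false}
    when :not        then {:attribute => attribute, :values => value, :exclude => true}
    else
      raise NotImplementedError, "attributes do not support the #{operator} operator"
    end
  end
end

now     = Time.now
filters = attribute_filters([
  [:eql, :updated_on, (now - 1800)..now], # include: updated in the last 30 minutes
  [:not, :id, [5]]                        # exclude: document id 5
])
puts filters.map { |filter| [filter[:attribute], filter[:exclude]] }.inspect
```

Comparison operators such as <tt>:gt</tt> or <tt>:lt</tt> raise in this sketch, which is why such conditions are better passed as the second hash to <tt>Item.search</tt>.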
173
+ == Sphinx Configuration
174
+
175
+ No limitations, restrictions or requirements are imposed on your sphinx configuration. The adapter will not generate nor
176
+ overwrite your finely crafted config file.
177
+
178
+ == Searchd
179
+
180
+ To keep things simple, this adapter does not manage your sphinx server. Try one of these fine offerings:
181
+
182
+ * god[http://god.rubyforge.org]
183
+ * daemon_controller[http://github.com/FooBarWidget/daemon_controller/tree/master]
184
+ * monit[http://www.tildeslash.com/monit]
185
+
186
+ == Indexer and Live(ish) updates.
187
+
188
+ As of 0.3 the indexer will no longer be fired on create/update even if you have delta indexes defined. Sphinx indexing
189
+ is blazing fast but unless your resource sees very little activity you will run the risk of lock errors on
190
+ the temporary delta index files (.tmp.spl) and your delta index won't be updated. Given this functionality is
191
+ unreliable at best I've chosen to remove it.
192
+
193
+ For reliable live(ish) updates in a main + delta scheme it's probably best you schedule them outside of your ORM.
194
+ Andrew (Shodan) Aksyonoff of Sphinx suggests a cronjob or alternatively if you need even less lag to "run indexer in
195
+ an endless loop, with a few seconds of sleep in between to allow searchd some headroom to pick up the changes".
196
+
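The "endless loop" suggestion quoted above could look something like the following sketch. It assumes the Sphinx <tt>indexer</tt> binary is on your PATH and that your delta index is named <tt>items_delta</tt> as in the earlier example. The method name, the pause length and the injectable <tt>runner</tt> (there only so the loop can be exercised without Sphinx installed) are hypothetical.

```ruby
# Hypothetical sketch: re-run the indexer on a delta index with a pause
# between runs. Pass iterations = nil to loop forever.
def run_indexer_loop(index, pause = 5, iterations = nil, runner = nil)
  runner ||= lambda { |*command| system(*command) }
  count = 0
  loop do
    # --rotate asks a running searchd to pick up the freshly built index.
    runner.call('indexer', '--rotate', index)
    count += 1
    break if iterations && count >= iterations
    sleep pause # give searchd some headroom before the next run
  end
  count
end

# Dry run that only records the commands it would execute.
commands = []
run_indexer_loop('items_delta', 0, 2, lambda { |*command| commands << command })
puts commands.inspect
```

A cronjob invoking the same <tt>indexer --rotate</tt> command is the simpler option when a lag of a minute or more is acceptable.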
197
+ == Contributing
198
+
199
+ Go nuts. Just send me a pull request (github or otherwise) when you are happy with your code.
200
+
@@ -0,0 +1,49 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+
4
+ begin
5
+ require 'jeweler'
6
+ Jeweler::Tasks.new do |gem|
7
+ gem.name = "dm-sphinx-adapter"
8
+ gem.summary = %q{A DataMapper Sphinx adapter.}
9
+ gem.email = "shane.hanna@gmail.com"
10
+ gem.homepage = "http://github.com/shanna/dm-sphinx-adapter"
11
+ gem.authors = ["Shane Hanna"]
12
+ gem.add_dependency 'dm-core', ['~> 0.9']
13
+ gem.files.reject!{|f| f=~ %r{test/files/tmp/.*}}
14
+ # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
15
+ end
16
+ rescue LoadError
17
+ puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
18
+ end
19
+
20
+ require 'rake/rdoctask'
21
+ Rake::RDocTask.new do |rdoc|
22
+ rdoc.rdoc_dir = 'rdoc'
23
+ rdoc.title = 'dm-sphinx-adapter'
24
+ rdoc.options << '--line-numbers' << '--inline-source'
25
+ rdoc.rdoc_files.include('README*')
26
+ rdoc.rdoc_files.include('lib/**/*.rb')
27
+ end
28
+
29
+ require 'rake/testtask'
30
+ Rake::TestTask.new(:test) do |test|
31
+ test.libs << 'lib' << 'test'
32
+ test.pattern = 'test/**/test_*.rb'
33
+ test.verbose = true
34
+ end
35
+
36
+ begin
37
+ require 'rcov/rcovtask'
38
+ Rcov::RcovTask.new do |test|
39
+ test.libs << 'test'
40
+ test.pattern = 'test/**/test_*.rb'
41
+ test.verbose = true
42
+ end
43
+ rescue LoadError
44
+ task :rcov do
45
+ abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
46
+ end
47
+ end
48
+
49
+ task :default => :test
@@ -0,0 +1,4 @@
1
+ ---
2
+ :major: 0
3
+ :minor: 8
4
+ :patch: 0
@@ -0,0 +1,23 @@
1
+ require 'rubygems'
2
+
3
+ # TODO: Hide the shitload of dm-core warnings or at least try to?
4
+ old_verbose, $VERBOSE = $VERBOSE, nil
5
+ gem 'dm-core', '~> 0.9.8'
6
+ require 'dm-core'
7
+ $VERBOSE = old_verbose
8
+
9
+ require 'pathname'
10
+ lib = Pathname(__FILE__).dirname.expand_path
11
+ dir = lib / 'dm-sphinx-adapter'
12
+
13
+ # Bundled Riddle since the gem is very old and we don't need any of the config generation stuff.
14
+ $:.unshift lib
15
+ require 'riddle'
16
+
17
+ # TODO: Require farms suck. Do something about it.
18
+ require dir / 'adapter'
19
+ require dir / 'attribute'
20
+ require dir / 'collection'
21
+ require dir / 'index'
22
+ require dir / 'query'
23
+ require dir / 'resource'
@@ -0,0 +1,200 @@
1
+ module DataMapper
2
+ module Adapters
3
+ module Sphinx
4
+ # == Synopsis
5
+ #
6
+ # DataMapper uses URIs or a connection hash to connect to your data-stores. In this case the sphinx search daemon
7
+ # <tt>searchd</tt>.
8
+ #
9
+ # On its own this adapter will only return an array of document hashes when queried. The DataMapper library dm-more
10
+ # however provides dm-is-searchable, a common interface to search one adapter and load documents from another. My
11
+ # preference is to use this adapter in tandem with dm-is-searchable.
12
+ #
13
+ # Like all DataMapper adapters you can connect with a Hash or URI.
14
+ #
15
+ # A URI:
16
+ # DataMapper.setup(:search, 'sphinx://localhost')
17
+ #
18
+ # The breakdown is:
19
+ # "#{adapter}://#{host}:#{port}/#{config}"
20
+ # - adapter Must be :sphinx
21
+ # - host Hostname (default: localhost)
22
+ # - port Optional port number (default: 3312)
23
+ #
24
+ # Alternatively supply a Hash:
25
+ # DataMapper.setup(:search, {
26
+ # :adapter => 'sphinx', # required
27
+ # :host => 'localhost', # optional. Default: localhost
28
+ # :port => 3312 # optional. Default: 3312
29
+ # })
30
+ class Adapter < AbstractAdapter
31
+
32
+ # ==== See
33
+ # * DataMapper::Adapters::AbstractAdapter
34
+ #
35
+ # ==== Parameters
36
+ # uri_or_options<URI, DataObjects::URI, Addressable::URI, String, Hash, Pathname>::
37
+ # DataMapper uri or options hash.
38
+ def initialize(name, uri_or_options)
39
+ super # Set up defaults.
40
+ @options = normalize_options(uri_or_options)
41
+ end
42
+
43
+ def create(resources) #:nodoc:
44
+ 0
45
+ end
46
+
47
+ def delete(query) #:nodoc:
48
+ 0
49
+ end
50
+
51
+ # Query your Sphinx repository and return all matching documents.
52
+ #
53
+ # ==== Notes
54
+ #
55
+ # These methods are public but normally called indirectly through DataMapper::Resource#get,
56
+ # DataMapper::Resource#first or DataMapper::Resource#all.
57
+ #
58
+ # The document hashes returned are those from Riddle::Client.
59
+ #
60
+ # ==== Parameters
61
+ # query<DataMapper::Query>:: The query object.
62
+ #
63
+ # ==== Returns
64
+ # Array<Hash>:: An array of document hashes. <tt>[{:id => 1, ...}, {:id => 2, ...}]</tt>
65
+ # Array<>:: An empty array if no documents match.
66
+ def read_many(query)
67
+ read(query)
68
+ end
69
+
70
+ # Query your Sphinx repository and return the first document matched.
71
+ #
72
+ # ==== Notes
73
+ #
74
+ # These methods are public but normally called indirectly through DataMapper::Resource#get,
75
+ # DataMapper::Resource#first or DataMapper::Resource#all.
76
+ #
77
+ # ==== Parameters
78
+ # query<DataMapper::Query>:: The query object.
79
+ #
80
+ # ==== Returns
81
+ # Hash:: A document hash of the first document matched. <tt>{:id => 1, ...}</tt>
82
+ # Nil:: If no documents match.
83
+ def read_one(query)
84
+ read(query).first
85
+ end
86
+
87
+ protected
88
+ # List sphinx indexes to search.
89
+ #
90
+ # If no indexes are explicitly declared using DataMapper::Adapters::Sphinx::Resource then the default storage
91
+ # name is used.
92
+ #
93
+ # ==== See
94
+ # * DataMapper::Adapters::Sphinx::Resource::ClassMethods#sphinx_indexes
95
+ #
96
+ # ==== Parameters
97
+ # query<DataMapper::Query>:: The query object; indexes are read from its model.
98
+ #
99
+ # ==== Returns
100
+ # Array<DataMapper::Adapters::Sphinx::Index>:: Index objects from the model.
101
+ def indexes(query)
102
+ indexes = query.model.sphinx_indexes(name) if query.model.respond_to?(:sphinx_indexes)
103
+ if indexes.nil? or indexes.empty?
104
+ indexes = [Index.new(query.model, query.model.storage_name(name))]
105
+ end
106
+ indexes
107
+ end
108
+
109
+ # Query sphinx for a list of document IDs.
110
+ #
111
+ # ==== Parameters
112
+ # query<DataMapper::Query>:: The query object.
113
+ #
114
+ # ==== Returns
115
+ # Array<Hash>:: An array of document hashes. <tt>[{:id => 1, ...}, {:id => 2, ...}]</tt>
116
+ # Array<>:: An empty array if no documents match.
117
+ def read(query)
118
+ from = indexes(query).map{|index| index.name}.join(', ')
119
+ search = Sphinx::Query.new(query).to_s
120
+ client = Riddle::Client.new(@options[:host], @options[:port])
121
+
122
+ # You can set some options that aren't set by the adapter.
123
+ @options.except(:host, :port, :match_mode, :limit, :offset, :sort_mode, :sort_by).each do |k, v|
124
+ client.method("#{k}=".to_sym).call(v) if client.respond_to?("#{k}=".to_sym)
125
+ end
126
+
127
+ client.match_mode = :extended
128
+ client.filters = search_filters(query) # By attribute.
129
+ client.limit = query.limit.to_i if query.limit
130
+ client.offset = query.offset.to_i if query.offset
131
+
132
+ if order = search_order(query)
133
+ client.sort_mode = :extended
134
+ client.sort_by = order
135
+ end
136
+
137
+ result = client.query(search, from)
138
+ raise result[:error] unless result[:error].nil?
139
+
140
+ DataMapper.logger.info(
141
+ %q{Sphinx (%.3f): search '%s' in '%s' found %d documents} % [result[:time], search, from, result[:total]]
142
+ )
143
+ # TODO: Confusing, call it something other than collection?
144
+ Collection.new(result)
145
+ # result[:matches].map{|doc| doc[:id] = doc[:doc]; doc}
146
+ end
147
+
148
+
149
+ # Riddle search filters for attributes.
150
+ def search_filters(query) #:nodoc:
151
+ filters = []
152
+ query.conditions.each do |operator, attribute, value|
153
+ next unless attribute.kind_of? Sphinx::Attribute
154
+ filters << case operator
155
+ when :eql, :like then attribute.filter(value)
156
+ when :not then attribute.filter(value, false)
157
+ else raise NotImplementedError.new("Sphinx: Query attributes do not support the #{operator} operator")
158
+ end
159
+ end
160
+ filters
161
+ end
162
+
163
+ # TODO: How do you tell the difference between the default query order and someone explicitly asking for
164
+ # sorting by the primary key? I don't think you can at the moment.
165
+ def search_order(query) #:nodoc:
166
+ by = []
167
+ query.order.each do |order|
168
+ next unless order.property.kind_of? Sphinx::Attribute
169
+ by << [order.property.field, order.direction].join(' ')
170
+ end
171
+ by.empty? ? nil : by.join(', ')
172
+ end
173
+
174
+ # Coerce +uri_or_options+ into a +Hash+ of options.
175
+ #
176
+ # ==== Parameters
177
+ # uri_or_options<URI, DataObjects::URI, Addressable::URI, String, Hash, Pathname>::
178
+ # DataMapper uri or options hash.
179
+ #
180
+ # ==== Returns
181
+ # Hash
182
+ def normalize_options(uri_or_options)
183
+ case uri_or_options
184
+ when String, Addressable::URI then DataObjects::URI.parse(uri_or_options).attributes
185
+ when DataObjects::URI then uri_or_options.attributes
186
+ when Pathname then {:path => uri_or_options}
187
+ else
188
+ uri_or_options[:path] ||= uri_or_options.delete(:config) || uri_or_options.delete(:database)
189
+ uri_or_options
190
+ end
191
+ end
192
+
193
+ end # Adapter
194
+ end # Sphinx
195
+
196
+ # Keep magic in DataMapper#setup happy.
197
+ SphinxAdapter = Sphinx::Adapter
198
+ end # Adapters
199
+ end # DataMapper
200
+
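The Hash branch of <tt>normalize_options</tt> above aliases <tt>:config</tt> (or <tt>:database</tt>) to <tt>:path</tt>, and it does so by modifying the hash it was given. A minimal sketch of the same aliasing on a copy, assuming you would rather leave the caller's hash untouched, might look like this. The method name <tt>normalize_hash_options</tt> is hypothetical and not part of the adapter.

```ruby
# Hypothetical, non-mutating variant of the Hash branch of normalize_options.
def normalize_hash_options(options)
  options = options.dup # work on a shallow copy so the caller's hash is unchanged
  # Prefer an explicit :path, then fall back to :config, then :database.
  options[:path] ||= options.delete(:config) || options.delete(:database)
  options
end

original   = {:adapter => 'sphinx', :config => './sphinx.conf', :port => 3312}
normalized = normalize_hash_options(original)
puts normalized[:path]
puts original.key?(:config)
```

As in the adapter, an explicit <tt>:path</tt> wins and short-circuits the fallbacks, so a <tt>:config</tt> key supplied alongside it is left in place.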