gsolr 0.12.2

data/.gitignore ADDED
@@ -0,0 +1,3 @@
1
+ pkg/*
2
+ *.gem
3
+ .bundle
data/Gemfile ADDED
@@ -0,0 +1,11 @@
1
+ source "http://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in gsolr.gemspec
4
+ gemspec
5
+
6
+ # gem 'builder'
7
+
8
+ group :test do
9
+ gem 'rspec'
10
+ gem 'rspec-core'
11
+ end
data/Gemfile.lock ADDED
@@ -0,0 +1,30 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ gsolr (0.0.1)
5
+ json (~> 1.4.6)
6
+
7
+ GEM
8
+ remote: http://rubygems.org/
9
+ specs:
10
+ diff-lcs (1.1.2)
11
+ json (1.4.6)
12
+ rspec (2.0.1)
13
+ rspec-core (~> 2.0.1)
14
+ rspec-expectations (~> 2.0.1)
15
+ rspec-mocks (~> 2.0.1)
16
+ rspec-core (2.0.1)
17
+ rspec-expectations (2.0.1)
18
+ diff-lcs (>= 1.1.2)
19
+ rspec-mocks (2.0.1)
20
+ rspec-core (~> 2.0.1)
21
+ rspec-expectations (~> 2.0.1)
22
+
23
+ PLATFORMS
24
+ ruby
25
+
26
+ DEPENDENCIES
27
+ gsolr!
28
+ json (~> 1.4.6)
29
+ rspec
30
+ rspec-core
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ Copyright 2008-2010 Matt Mitchell
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,160 @@
1
+ # GSolr
2
+
3
+ A simple, extensible Ruby client for the Solr interface. Capable of talking to Solr and to Riak.
4
+
5
+ ## Installation:
6
+ sudo gem install gsolr
7
+
8
+ ## Example:
9
+ require 'rubygems'
10
+ require 'gsolr'
11
+ solr = GSolr.connect :url => "http://solrserver.com"
12
+
13
+ # send a request to /select
14
+ response = solr.select :q=>'*:*'
15
+
16
+ # send a request to a custom request handler; /catalog
17
+ response = solr.request '/catalog', :q=>'*:*'
18
+
19
+ # alternative to above:
20
+ response = solr.catalog :q=>'*:*'
21
+
22
+ ## Querying
23
+ Use the #select method to send requests to the /select handler:
24
+
25
+ response = solr.select(
26
+ :q=>'washington',
27
+ :start=>0,
28
+ :rows=>10
29
+ )
30
+
31
+ The params sent into the method are sent to Solr as-is. The one exception is if a value is an array. When an array is used, multiple parameters *with the same name* are generated for the Solr query. Example:
32
+
33
+ solr.select(
34
+ :q => 'roses',
35
+ :fq => ['red', 'violet']
36
+ )
37
+
38
+ The above statement generates this Solr query:
39
+
40
+ .../?q=roses&fq=red&fq=violet
41
+
42
+ Use the #request method for a custom request handler path:
43
+
44
+ response = solr.request '/documents', :q=>'test'
45
+
46
+ As a shortcut for the above example, use a method call instead:
47
+
48
+ response = solr.documents :q=>'test'
49
+
50
+
51
+ ## Updating Solr
52
+ Updating uses native Ruby structures. Hashes are used for single documents and arrays are used for a collection of documents (hashes). These structures get turned into simple XML "messages". Raw XML strings can also be used.
53
+
54
+ Raw XML via #update
55
+
56
+ solr.update '<commit/>'
57
+ solr.update '<optimize/>'
58
+
59
+ Single document via #add
60
+
61
+ solr.add(
62
+ :id => 1,
63
+ :price => 1.00
64
+ )
65
+
66
+ Multiple documents via #add
67
+
68
+ documents = [{
69
+ :id => 1,
70
+ :price => 1.00
71
+ }, {
72
+ :id => 2,
73
+ :price => 10.50
74
+ }]
75
+
76
+ solr.add documents
77
+
78
+ When adding, you can also supply "add" xml element attributes and/or a block for manipulating other "add" related elements (docs and fields) when using the #add method:
79
+
80
+ add_doc = {:id=>1, :price=>1.00}
81
+ add_attr = {:allowDups=>false, :commitWithin=>10.0}
82
+
83
+ solr.add(add_doc, add_attr) do |doc|
84
+ # boost each document
85
+ doc.attrs[:boost] = 1.5
86
+ # boost the price field:
87
+ doc.field_by_name(:price).attrs[:boost] = 2.0
88
+ end
89
+
90
+ Delete by id
91
+
92
+ solr.delete_by_id 1
93
+
94
+ or an array of ids
95
+
96
+ solr.delete_by_id [1, 2, 3, 4]
97
+
98
+ Delete by query:
99
+
100
+ solr.delete_by_query 'price:1.00'
101
+
102
+ Delete by array of queries
103
+
104
+ solr.delete_by_query ['price:1.00', 'price:10.00']
105
+
106
+ Commit & optimize shortcuts
107
+
108
+ solr.commit
109
+ solr.optimize
110
+
111
+ ## Response Formats
112
+ The default response format is Ruby. When the :wt param is set to :ruby, the response is eval'd, resulting in a Hash. To get the raw response instead, set :wt to "ruby" (the String, not the Symbol). GSolr will eval the Ruby string ONLY if the :wt value is :ruby. All other response formats are returned as raw strings, e.g. :wt=>'xml'.
113
+
114
+ ### Evaluated Ruby (default)
115
+
116
+ solr.select(:wt=>:ruby) # notice :ruby is a Symbol
117
+
118
+ ### Raw Ruby
119
+
120
+ solr.select(:wt=>'ruby') # notice 'ruby' is a String
121
+
122
+ ### XML:
123
+
124
+ solr.select(:wt=>:xml)
125
+
126
+ ### JSON:
127
+
128
+ solr.select(:wt=>:json)
129
+
130
+ You can access the original request context (path, params, url etc.) by calling the #raw method:
131
+
132
+ response = solr.select :q=>'*:*'
133
+
134
+ response.raw[:status_code]
135
+ response.raw[:body]
136
+ response.raw[:url]
137
+
138
+ The raw value is a hash that contains the generated params, url, path, post data, headers etc., which is useful for debugging and testing.
139
+
140
+ ## Related Resources & Projects
141
+ * The Apache Solr project
142
+ * [Solr](http://lucene.apache.org/solr/)
143
+ * The original Solr Ruby Gem
144
+ * [solr-ruby](http://wiki.apache.org/solr/solr-ruby)
145
+ * The RSolr Gem, from which this was hijacked
146
+ * [RSolr](https://github.com/mwmitchell/rsolr)
147
+
148
+ ## Note on Patches/Pull Requests
149
+ * Fork the project.
150
+ * Add tests for your contribution.
151
+ * Write your contribution.
152
+ * Commit only that contribution. Changes to the Rakefile, version, or history should go in a separate commit.
153
+ * Send a pull request.
154
+
155
+ ## Contributors (to the RSolr project, who therefore contributed to this)
156
+ * mperham
157
+ * Mat Brown
158
+ * shairontoledo
159
+ * Matthew Rudy
160
+ * Fouad Mardini
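The README's "Querying" section says that `solr.documents :q=>'test'` is a shortcut for `solr.request '/documents', :q=>'test'`. A minimal sketch of how such a shortcut can work is shown here, assuming a `method_missing` hook. `HandlerProxy` and its `calls` log are hypothetical names used only for illustration; the class records requests instead of talking to a Solr server.

```ruby
# Hypothetical stand-in for a Solr client: it records requests
# instead of sending them over HTTP.
class HandlerProxy
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Pretend to send a request; only remember the path and params.
  def request(path, params = {})
    @calls << [path, params]
    { :path => path, :params => params }
  end

  # Any unknown method name is turned into a request-handler path,
  # so proxy.documents(...) behaves like proxy.request('/documents', ...).
  def method_missing(name, *args, &block)
    request("/#{name}", *args, &block)
  end
end

proxy = HandlerProxy.new
proxy.documents(:q => 'test')              # shortcut form
proxy.request('/documents', :q => 'test')  # explicit form
```

Both calls end up recording the same path and params. One trade-off of this design is that a misspelled method name becomes a request to a handler that may not exist, rather than a `NoMethodError`.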
data/Rakefile ADDED
@@ -0,0 +1,40 @@
1
+ $:.unshift(File.dirname(__FILE__)) unless
2
+ $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
3
+
4
+ require 'bundler'
5
+
6
+ require 'rake'
7
+ require 'rake/testtask'
8
+
9
+ require 'rake/gempackagetask'
10
+
11
+ gemspec = eval File.read('gsolr.gemspec')
12
+
13
+ # Gem packaging tasks
14
+ Rake::GemPackageTask.new(gemspec) do |pkg|
15
+ pkg.need_zip = false
16
+ pkg.need_tar = false
17
+ end
18
+
19
+ task :gem => :gemspec
20
+
21
+ desc %{Build the gemspec file.}
22
+ task :gemspec do
23
+ gemspec.validate
24
+ end
25
+
26
+ desc %{Release the gem to RubyGems.org}
27
+ task :release => :gem do
28
+ system "gem push pkg/#{gemspec.name}-#{gemspec.version}.gem"
29
+ end
30
+
31
+
32
+ ENV['RUBYOPT'] = '-W1'
33
+
34
+ task :environment do
35
+ require File.dirname(__FILE__) + '/lib/gsolr'
36
+ end
37
+
38
+ Dir['tasks/**/*.rake'].each { |t| load t }
39
+
40
+ task :default => ['spec:api']
data/gsolr.gemspec ADDED
@@ -0,0 +1,25 @@
1
+ # -*- encoding: utf-8 -*-
2
+ $:.push File.expand_path("../lib", __FILE__)
3
+ require "gsolr/version"
4
+
5
+ Gem::Specification.new do |s|
6
+ s.name = "gsolr"
7
+ s.version = Gsolr::VERSION
8
+ s.platform = Gem::Platform::RUBY
9
+ s.authors = ["Scott Gonyea"]
10
+ s.email = ["me@sgonyea.com"]
11
+ s.homepage = "http://rubygems.org/gems/gsolr"
12
+ s.summary = %q{Generic Solr Client}
13
+ s.description = %q{This is a generic solr client, capable of talking to Solr, as well as Riak}
14
+
15
+ s.rubyforge_project = "gsolr"
16
+
17
+ s.add_dependency('json', '~>1.4.6')
18
+
19
+ s.add_development_dependency "rspec"
20
+
21
+ s.files = `git ls-files`.split("\n")
22
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
23
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
24
+ s.require_paths = ["lib"]
25
+ end
@@ -0,0 +1,121 @@
1
+ module GSolr
2
+ class Client
3
+
4
+ attr_reader :connection
5
+
6
+ # "connection" is instance of:
7
+ # GSolr::Adapter::HTTP
8
+ # GSolr::Adapter::Direct (jRuby only)
9
+ # or any other class that uses the connection "interface"
10
+ def initialize(connection)
11
+ @connection = connection
12
+ end
13
+
14
+ # Send a request to a request handler using the method name.
15
+ # For example, solr.documents(:q=>'test') sends a request to /documents
16
+ def method_missing(method_name, *args, &blk)
17
+ request("/#{method_name}", *args, &blk)
18
+ end
19
+
20
+ # sends data to the update handler
21
+ # data can be a string of xml, or an object that returns xml from its #to_xml method
22
+ def update(data, params={})
23
+ request '/update', params, data
24
+ end
25
+
26
+ # send a request to solr
27
+ # params is hash with valid solr request params (:q, :fl, :qf etc..)
28
+ # if params[:wt] is not set, the default is :ruby
29
+ # if :wt is something other than :ruby, the raw response body is used
30
+ # otherwise, a simple Hash is returned
31
+ # NOTE: to get raw ruby, use :wt=>'ruby' <- a string, not a symbol like :ruby
32
+ #
33
+ def request(path, params={}, *extra)
34
+ response = @connection.request(path, map_params(params), *extra)
35
+ adapt_response(response)
36
+ end
37
+
38
+ #
39
+ # single record:
40
+ # solr.add(:id=>1, :name=>'one')
41
+ #
42
+ # add using an array
43
+ # solr.add([{:id=>1, :name=>'one'}, {:id=>2, :name=>'two'}])
44
+ #
45
+ def add(doc, &block)
46
+ update message.add(doc, &block)
47
+ end
48
+
49
+ # send <commit/>
50
+ def commit
51
+ update message.commit
52
+ end
53
+
54
+ # send <optimize/>
55
+ def optimize
56
+ update message.optimize
57
+ end
58
+
59
+ # send <rollback/>
60
+ # NOTE: solr 1.4 only
61
+ def rollback
62
+ update message.rollback
63
+ end
64
+
65
+ # Delete one or many documents by id
66
+ # solr.delete_by_id 10
67
+ # solr.delete_by_id([12, 41, 199])
68
+ def delete_by_id(id)
69
+ update message.delete_by_id(id)
70
+ end
71
+
72
+ # delete one or many documents by query
73
+ # solr.delete_by_query 'available:0'
74
+ # solr.delete_by_query ['quantity:0', 'manu:"FQ"']
75
+ def delete_by_query(query)
76
+ update message.delete_by_query(query)
77
+ end
78
+
79
+ # shortcut to GSolr::Message::Generator
80
+ def message *opts
81
+ @message ||= GSolr::Message::Generator.new
82
+ end
83
+
84
+ protected
85
+
86
+ # sets default params etc.. - could be used as a mapping hook
87
+ # type of request should be passed in here? -> map_params(:query, {})
88
+ def map_params(params)
89
+ params||={}
90
+ {:wt=>:json}.merge(params)
91
+ end
92
+
93
+ # "connection_response" must be a hash with the following keys:
94
+ # :params - a sub hash of standard solr params
95
+ # :body - the raw response body from the solr server
96
+ # This method will evaluate the :body value if the params[:wt] == :ruby
97
+ # otherwise, the body is returned
98
+ # The return object has a special method attached called #raw
99
+ # This method gives you access to the original response from the connection,
100
+ # so you can access things like the actual :url sent to solr,
101
+ # the raw :body, original :params and original :data
102
+ def adapt_response(connection_response)
103
+ data = connection_response[:body]
104
+
105
+ # if the wt is :ruby, evaluate the ruby string response
106
+ if connection_response[:params][:wt] == :ruby
107
+ data = Kernel.eval(data)
108
+ end
109
+
110
+ # attach a method called #raw that returns the original connection response value
111
+ def data.raw
112
+ @raw
113
+ end
114
+
115
+ data.send(:instance_variable_set, '@raw', connection_response)
116
+
117
+ return data
118
+ end
119
+
120
+ end # class Client
121
+ end # module GSolr
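The comments on `adapt_response` above describe a `#raw` method attached to the returned object. The following is a small sketch of that idea using `define_singleton_method`, which is a different spelling from the `def data.raw` form in the diff. The `with_raw` helper and the sample response hash are hypothetical and exist only for this illustration.

```ruby
# Hypothetical connection response, shaped like the hash described
# in the comments above (values are made up for illustration).
connection_response = {
  :status_code => 200,
  :url    => 'http://127.0.0.1:8983/solr/select?q=*:*&wt=json',
  :params => { :q => '*:*', :wt => :json },
  :body   => '{"response":{"numFound":0,"docs":[]}}'
}

# Attach a #raw reader to this one object only; its class is untouched.
def with_raw(data, raw)
  data.instance_variable_set(:@raw, raw)
  data.define_singleton_method(:raw) { @raw }
  data
end

# dup so the sketch works on an unfrozen copy of the body
body = with_raw(connection_response[:body].dup, connection_response)

body.raw[:status_code]
body.raw[:url]
```

The returned object still behaves like an ordinary String (or Hash, when the body is evaluated), with the original context available on the side. This approach only works for objects that accept singleton methods, so it would fail for frozen values.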
@@ -0,0 +1,56 @@
1
+ require 'net/http'
2
+
3
+ #
4
+ # Connection for standard HTTP Solr server
5
+ #
6
+ module GSolr
7
+ module Connection
8
+ class NetHttp
9
+
10
+ include GSolr::Connection::Requestable
11
+
12
+ def connection
13
+ @connection ||= Net::HTTP.new(@uri.host, @uri.port)
14
+ end
15
+
16
+ def get(path, params={})
17
+ url = self.build_url path, params
18
+ net_http_response = self.connection.get url
19
+ create_http_context net_http_response, url, path, params
20
+ end
21
+
22
+ def post(path, data, params={}, headers={})
23
+ url = self.build_url path, params
24
+ net_http_response = self.connection.post url, data, headers
25
+ create_http_context net_http_response, url, path, params, data, headers
26
+ end
27
+
28
+ def create_http_context(net_http_response, url, path, params, data=nil, headers={})
29
+ full_url = "#{@uri.scheme}://#{@uri.host}"
30
+
31
+ full_url += ":#{@uri.port}" if @uri.port
32
+
33
+ full_url += url
34
+
35
+ return {
36
+ :status_code => net_http_response.code.to_i,
37
+ :url => full_url,
38
+ :body => encode_utf8(net_http_response.body),
39
+ :path => path,
40
+ :params => params,
41
+ :data => data,
42
+ :headers => headers,
43
+ :message => net_http_response.message
44
+ }
45
+ end
46
+
47
+ # accepts a path/string and optional hash of query params
48
+ def build_url(path, params={})
49
+ full_path = @uri.path + path
50
+
51
+ super full_path, params, @uri.query
52
+ end
53
+
54
+ end # class NetHttp
55
+ end # module Connection
56
+ end # module GSolr
@@ -0,0 +1,48 @@
1
+ # A module that defines the interface and top-level logic for http based connection classes.
2
+ module GSolr
3
+ module Connection
4
+ module Requestable
5
+
6
+ include GSolr::Connection::Utils
7
+
8
+ attr_reader :opts, :uri
9
+
10
+ # opts can have:
11
+ # :url => 'http://localhost:8080/solr'
12
+ def initialize(opts={})
13
+ opts[:url] ||= 'http://127.0.0.1:8983/solr'
14
+ @opts = opts
15
+ @uri = URI.parse opts[:url]
16
+ end
17
+
18
+ # send a request to the connection
19
+ # request '/select', :q=>'*:*'
20
+ #
21
+ # request '/update', {:wt=>:xml}, '<commit/>'
21
+ #
22
+ # force a post where the post body is the param query
23
+ # request '/select', {:q=>'*:*'}, :method=>:post
25
+ #
26
+ def request(path, params={}, *extra)
27
+ opts = extra[-1].kind_of?(Hash) ? extra.pop : {}
28
+ data = extra[0]
29
+ # force a POST, use the query string as the POST body
30
+ if opts[:method] == :post and data.to_s.empty?
31
+ http_context = self.post(path, hash_to_query(params), {}, {'Content-Type' => 'application/x-www-form-urlencoded'})
32
+ else
33
+ if data
34
+ # standard POST, using "data" as the POST body
35
+ http_context = self.post(path, data, params, {"Content-Type" => 'text/xml; charset=utf-8'})
36
+ else
37
+ # standard GET
38
+ http_context = self.get(path, params)
39
+ end
40
+ end
41
+
42
+ raise GSolr::RequestError.new("Solr Response: #{http_context[:message]}") unless http_context[:status_code] == 200
43
+
44
+ return http_context
45
+ end
46
+ end # module Requestable
47
+ end # module Connection
48
+ end # module GSolr
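The branching in `Requestable#request` above picks between a GET, an XML POST, and a form-encoded POST. A rough sketch of just that decision, with no HTTP involved, might look like the following. `plan_request` and the symbols it returns are hypothetical names for illustration, not part of GSolr.

```ruby
# Hypothetical helper: reports which HTTP verb and body type the
# dispatch rules above would choose, without sending anything.
def plan_request(params = {}, *extra)
  # A trailing Hash in the extra arguments is treated as options.
  opts = extra[-1].kind_of?(Hash) ? extra.pop : {}
  data = extra[0]

  if opts[:method] == :post && data.to_s.empty?
    # forced POST: the query params become the form-encoded body
    [:post, :form_encoded_params]
  elsif data
    # standard POST: "data" is sent as the XML body
    [:post, :xml_body]
  else
    # standard GET: params go into the query string
    [:get, nil]
  end
end

plan_request({ :q => '*:*' })                    # plain query
plan_request({ :wt => :xml }, '<commit/>')       # update message
plan_request({ :q => '*:*' }, :method => :post)  # forced POST
```

Forcing a POST can be useful when a query string would be too long for a GET request.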
@@ -0,0 +1,82 @@
1
+ # Helpful utility methods for building queries to a Solr server
2
+ # This includes helpers that the Direct connection can use.
3
+ module GSolr
4
+ module Connection
5
+ module Utils
6
+ # Performs URI escaping so that you can construct proper
7
+ # query strings faster. Use this rather than the cgi.rb
8
+ # version since it's faster. (Stolen from Rack).
9
+ def escape(s)
10
+ s.to_s.gsub(/([^ a-zA-Z0-9_.-]+)/n) {
11
+ #'%'+$1.unpack('H2'*$1.size).join('%').upcase
12
+ '%'+$1.unpack('H2'*bytesize($1)).join('%').upcase
13
+ }.tr(' ', '+')
14
+ end
15
+
16
+ # encodes the string as utf-8 in Ruby 1.9
17
+ # returns the unaltered string in Ruby 1.8
18
+ def encode_utf8(string)
19
+ (string.respond_to?(:force_encoding) and string.respond_to?(:encoding)) ?
20
+ string.force_encoding(Encoding::UTF_8) : string
21
+ end
22
+
23
+ # Return the bytesize of String; uses String#length under Ruby 1.8 and
24
+ # String#bytesize under 1.9.
25
+ if ''.respond_to?(:bytesize)
26
+ def bytesize(string)
27
+ string.bytesize
28
+ end
29
+ else
30
+ def bytesize(string)
31
+ string.size
32
+ end
33
+ end
34
+
35
+ # creates and returns a url as a string
36
+ # "url" is the base url
37
+ # "params" is an optional hash of GET style query params
38
+ # "string_query" is an extra query string that will be appended to the
39
+ # result of "url" and "params".
40
+ def build_url(url='', params={}, string_query='')
41
+ queries = [string_query, hash_to_query(params)]
42
+
43
+ queries.delete_if{|q_elem|
44
+ q_elem.to_s.empty?
45
+ }
46
+
47
+ url += "?#{queries.join('&')}" unless queries.empty?
48
+
49
+ return url
50
+ end
51
+
52
+ # converts a key value pair to an escaped string:
53
+ # Example:
54
+ # build_param(:id, 1) == "id=1"
55
+ def build_param(k,v)
56
+ "#{escape(k)}=#{escape(v)}"
57
+ end
58
+
59
+ #
60
+ # converts hash into URL query string
61
+ # if a value is an array, the array values get mapped to the same key:
62
+ # hash_to_query(:q=>'blah', :fq=>['blah', 'blah'], 'facet.field'=>['location_facet', 'format_facet'])
63
+ # returns:
64
+ # q=blah&fq=blah&fq=blah&facet.field=location_facet&facet.field=format_facet
65
+ #
66
+ # if a value is empty/nil etc., it is not added
67
+ def hash_to_query(params)
68
+ mapped = params.map do |key, val|
69
+ next if val.to_s.empty?
70
+
71
+ if val.class == Array
72
+ hash_to_query(val.map { |elem| [key, elem] })
73
+ else
74
+ build_param key, val
75
+ end
76
+ end
77
+
78
+ mapped.compact.join("&")
79
+ end
80
+ end # module Utils
81
+ end # module Connection
82
+ end # module GSolr
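To make the escaping and array-expansion rules in `Utils` concrete, here is a standalone sketch under a few assumptions. `escape_param` and `to_query` are hypothetical names, and the sketch works on the binary form of the string (`String#b`) rather than using the `/n` regex flag, so it is a variation on the code above and not a copy of it.

```ruby
# Percent-encode every byte outside [a-zA-Z0-9_.-]; spaces become "+".
def escape_param(value)
  value.to_s.b.gsub(/[^ a-zA-Z0-9_.-]+/) { |match|
    '%' + match.unpack('H2' * match.bytesize).join('%').upcase
  }.tr(' ', '+')
end

# Build a query string; array values repeat the same key.
def to_query(params)
  params.flat_map { |key, val|
    next [] if val.nil? || val.to_s.empty?   # skip empty values
    values = val.is_a?(Array) ? val : [val]
    values.map { |elem| "#{escape_param(key)}=#{escape_param(elem)}" }
  }.join('&')
end

escape_param('a b&c')
to_query(:q => 'roses', :fq => ['red', 'violet'])
```

With these definitions, the `roses` example from the README expands to one `q` parameter followed by two `fq` parameters.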
@@ -0,0 +1,9 @@
1
+ require 'uri'
2
+
3
+ module GSolr
4
+ module Connection
5
+ autoload :NetHttp, 'gsolr/connection/net_http'
6
+ autoload :Utils, 'gsolr/connection/utils'
7
+ autoload :Requestable, 'gsolr/connection/requestable'
8
+ end
9
+ end
@@ -0,0 +1,52 @@
1
+ # A class that represents a "doc" xml element for a solr update
2
+ module GSolr
3
+ module Message
4
+ class Document
5
+
6
+ # "attrs" is a hash for setting the "doc" xml attributes
7
+ # "fields" is an array of Field objects
8
+ attr_accessor :attrs, :fields
9
+
10
+ # "doc_hash" must be a Hash/Mash object
11
+ # If a value in the "doc_hash" is an array,
12
+ # a field object is created for each value...
13
+ def initialize(doc_hash = {})
14
+ @fields = []
15
+ doc_hash.each_pair do |field,values|
16
+ # create a new field for each value (multi-valued)
17
+ # put non-array values into an array
18
+ values = [values] unless values.is_a?(Array)
19
+ values.each do |v|
20
+ next if v.to_s.empty?
21
+ @fields << GSolr::Message::Field.new({:name=>field}, v.to_s)
22
+ end
23
+ end
24
+ @attrs={}
25
+ end
26
+
27
+ # returns an array of fields that match the "name" arg
28
+ def fields_by_name(name)
29
+ @fields.select{|f|f.name==name}
30
+ end
31
+
32
+ # returns the *first* field that matches the "name" arg
33
+ def field_by_name(name)
34
+ @fields.detect{|f|f.name==name}
35
+ end
36
+
37
+ #
38
+ # Add a field value to the document. Options map directly to
39
+ # XML attributes in the Solr <field> node.
40
+ # See http://wiki.apache.org/solr/UpdateXmlMessages#head-8315b8028923d028950ff750a57ee22cbf7977c6
41
+ #
42
+ # === Example:
43
+ #
44
+ # document.add_field('title', 'A Title', :boost => 2.0)
45
+ #
46
+ def add_field(name, value, options = {})
47
+ @fields << GSolr::Message::Field.new(options.merge({:name=>name}), value)
48
+ end
49
+
50
+ end # class Document
51
+ end # module Message
52
+ end # module GSolr
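The `Document#initialize` comments above explain that array values produce one field per element and that empty values are skipped. A compact sketch of that expansion follows. `SketchField` and `fields_for` are hypothetical stand-ins so the example runs without the GSolr classes.

```ruby
# Hypothetical stand-in for a field: a hash of attributes plus a value.
SketchField = Struct.new(:attrs, :value)

# Expand a document hash into a flat list of fields.
def fields_for(doc_hash)
  doc_hash.flat_map do |name, values|
    # put non-array values into an array so both cases share one path
    values = [values] unless values.is_a?(Array)
    values.reject { |v| v.to_s.empty? }   # nil and '' produce no field
          .map { |v| SketchField.new({ :name => name }, v.to_s) }
  end
end

fields = fields_for(:id => 1, :cat => ['a', 'b'], :note => nil)
fields.map { |f| [f.attrs[:name], f.value] }
```

Here `:cat` yields two fields with the same name, which is how a multi-valued Solr field is expressed, and `:note` yields none.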
@@ -0,0 +1,23 @@
1
+ # A class that represents a "doc"/"field" xml element for a solr update
2
+ module GSolr
3
+ module Message
4
+ class Field
5
+
6
+ # "attrs" is a hash for setting the "field" xml attributes
7
+ # "value" is the text value for the node
8
+ attr_accessor :attrs, :value
9
+
10
+ # "attrs" must be a hash
11
+ # "value" should be something that responds to #to_s
12
+ def initialize(attrs, value)
13
+ @attrs = attrs
14
+ @value = value
15
+ end
16
+
17
+ # the value of the "name" attribute
18
+ def name
19
+ @attrs[:name]
20
+ end
21
+ end # class Field
22
+ end # module Message
23
+ end # module GSolr
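The message generator that turns `Field` objects into XML is not part of the files shown here. As an illustration only, one way a field's `attrs` and `value` could be rendered is sketched below. `xml_escape` and `field_xml` are hypothetical helpers, and the exact output format of GSolr's own generator is an assumption.

```ruby
# Escape the characters that are special in XML text and attributes.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;')
      .gsub('>', '&gt;').gsub('"', '&quot;')
end

# Render one <field> element: attrs become XML attributes,
# value becomes the element text.
def field_xml(attrs, value)
  attr_text = attrs.map { |k, v| %( #{k}="#{xml_escape(v)}") }.join
  "<field#{attr_text}>#{xml_escape(value)}</field>"
end

field_xml({ :name => :price, :boost => 2.0 }, 1.0)
field_xml({ :name => 'title' }, 'A & B <c>')
```

Escaping matters because field values come from application data and may contain characters such as `&` or `<` that would otherwise break the update message.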