cheap_skate 0.0.1

data/LICENSE ADDED
@@ -0,0 +1,16 @@
1
+ CheapSkate: a simple Solr emulator for Ruby
2
+ Copyright (C) 2010 Ross Singer (rossfsinger@gmail.com), Talis (http://talis.com/)
3
+
4
+ This program is free software; you can redistribute it and/or
5
+ modify it under the terms of the GNU General Public License
6
+ as published by the Free Software Foundation; either version 2
7
+ of the License, or (at your option) any later version.
8
+
9
+ This program is distributed in the hope that it will be useful,
10
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
11
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12
+ GNU General Public License for more details.
13
+
14
+ You should have received a copy of the GNU General Public License
15
+ along with this program; if not, write to the Free Software
16
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
data/README ADDED
@@ -0,0 +1,25 @@
1
+ CheapSkate is a Solr emulator intended for situations where the basic functionality of Solr is needed or desired but a Java application server is not feasible (e.g., a cheap shared web hosting account). It uses Ferret (a Ruby Lucene clone) for full-text indexing and faceting.
2
+
3
+ CheapSkate is very much a work in progress, and development is driven by specific needs rather than by an attempt to build a complete Solr clone (hence the current lack of an XML responseWriter, for example).
4
+
5
+ Requirements:
6
+ sinatra
7
+ ferret
8
+ hpricot
9
+ uuid
10
+ json
11
+ fastercsv
12
+
13
+ Caveats:
14
+
15
+ CheapSkate isn't really intended to replace Solr. If you need something that scales to millions of documents or performs replication or does all the neat things Solr does, use Solr.
16
+
17
+ Faceting causes a serious performance hit on large result sets because the facet results require every document in the index to be loaded.
18
+
19
+ Todos:
20
+
21
+ Provide field boosting and analyzers at both index time and query time.
22
+
23
+ Add the MoreLikeThis analyzer from acts_as_ferret.
24
+
25
+ Add more responseWriters (starting with XML).
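Since CheapSkate imitates Solr's HTTP interface, a client addresses it with ordinary Solr-style query parameters. The sketch below only builds such a request URL and does not contact a server. The host and port are assumptions, the `/solr` prefix follows the sample configuration shipped with this release, and `select_url` is a hypothetical helper that is not part of the gem.

```ruby
require 'uri'

# Hypothetical helper: build a Solr-style select URL for a CheapSkate instance.
# `base` is an assumed deployment address; the gem itself does not define one.
def select_url(base, query, opts = {})
  params = [["q", query],
            ["wt", opts.fetch(:wt, "json")],
            ["start", opts.fetch(:start, 0)],
            ["rows", opts.fetch(:rows, 10)]]
  facet_fields = Array(opts[:facet_fields])
  unless facet_fields.empty?
    params << ["facet", "true"]
    # Solr repeats facet.field once per faceted field.
    facet_fields.each { |f| params << ["facet.field", f] }
  end
  "#{base}/select/?#{URI.encode_www_form(params)}"
end

url = select_url("http://localhost:9292/solr", "name:ipod", :facet_fields => ["cat", "manu_exact"])
puts url
```

The defaults (`wt=json`, `start=0`, `rows=10`) mirror the fallbacks used by the application code in this release.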
data/conf/cheapskate.yml-dist ADDED
@@ -0,0 +1,7 @@
1
+ :default:
2
+   :prefix_path: /solr
3
+   :ferret:
4
+     :path: db/skate
5
+   :facet_score_threshold: 0.0
6
+   :schema: conf/schema.yaml
7
+
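The application reads this file with YAML and then picks the entry for the configured site. A minimal sketch of that lookup follows, assuming every setting nests under the `:default` site key and that `:ferret` holds only `:path` (the placement of `:facet_score_threshold` is a guess). `site_config` and `CONFIG_TEXT` are hypothetical names used for illustration.

```ruby
require 'yaml'

# Sample configuration text; the nesting shown here is an assumption.
CONFIG_TEXT = <<~YAML
  :default:
    :prefix_path: /solr
    :ferret:
      :path: db/skate
    :facet_score_threshold: 0.0
    :schema: conf/schema.yaml
YAML

# Hypothetical helper: return the settings for one site, or raise if absent.
def site_config(yaml_text, site)
  # The keys are Ruby symbols, which safe_load only accepts when permitted
  # explicitly (keyword form available from Ruby 2.6 on).
  all = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  all.fetch(site) { raise KeyError, "no configuration for site #{site.inspect}" }
end

config = site_config(CONFIG_TEXT, :default)
puts config[:prefix_path]
puts config[:ferret][:path]
```

Note that the sample config.ru in this release selects the site `:kochief`, so a deployment would presumably either rename the `:default` key or change `set :site`.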
data/conf/schema.yml-dist ADDED
@@ -0,0 +1,290 @@
1
+ schema:
2
+ name: example
3
+ copyFields:
4
+ - cat: text
5
+ - name: text
6
+ - manu: text
7
+ - features: text
8
+ - includes: text
9
+ - manu: manu_exact
10
+ uniqueKey: id
11
+ fields:
12
+ price:
13
+ stored: true
14
+ indexed: true
15
+ type: float
16
+ cat:
17
+ multiValued: true
18
+ stored: true
19
+ indexed: true
20
+ type: text_ws
21
+ omitNorms: true
22
+ name:
23
+ stored: true
24
+ indexed: true
25
+ type: textgen
26
+ category:
27
+ stored: true
28
+ indexed: true
29
+ type: textgen
30
+ popularity:
31
+ stored: true
32
+ indexed: true
33
+ type: int
34
+ content_type:
35
+ multiValued: true
36
+ stored: true
37
+ indexed: true
38
+ type: string
39
+ author:
40
+ stored: true
41
+ indexed: true
42
+ type: textgen
43
+ comments:
44
+ stored: true
45
+ indexed: true
46
+ type: text
47
+ title:
48
+ multiValued: true
49
+ stored: true
50
+ indexed: true
51
+ type: text
52
+ includes:
53
+ termOffsets: true
54
+ stored: true
55
+ termVectors: true
56
+ indexed: true
57
+ type: text
58
+ termPositions: true
59
+ text:
60
+ multiValued: true
61
+ stored: false
62
+ indexed: true
63
+ type: text
64
+ weight:
65
+ stored: true
66
+ indexed: true
67
+ type: float
68
+ subject:
69
+ stored: true
70
+ indexed: true
71
+ type: text
72
+ id:
73
+ required: true
74
+ stored: true
75
+ indexed: true
76
+ type: string
77
+ text_rev:
78
+ multiValued: true
79
+ stored: false
80
+ indexed: true
81
+ type: text_rev
82
+ manu_exact:
83
+ stored: false
84
+ indexed: true
85
+ type: string
86
+ links:
87
+ multiValued: true
88
+ stored: true
89
+ indexed: true
90
+ type: string
91
+ features:
92
+ multiValued: true
93
+ stored: true
94
+ indexed: true
95
+ type: text
96
+ sku:
97
+ stored: true
98
+ indexed: true
99
+ type: textTight
100
+ omitNorms: true
101
+ description:
102
+ stored: true
103
+ indexed: true
104
+ type: text
105
+ inStock:
106
+ stored: true
107
+ indexed: true
108
+ type: boolean
109
+ payloads:
110
+ stored: true
111
+ indexed: true
112
+ type: payloads
113
+ last_modified:
114
+ stored: true
115
+ indexed: true
116
+ type: date
117
+ manu:
118
+ stored: true
119
+ indexed: true
120
+ type: textgen
121
+ omitNorms: true
122
+ alphaNameSort:
123
+ stored: false
124
+ indexed: true
125
+ type: alphaOnlySort
126
+ keywords:
127
+ stored: true
128
+ indexed: true
129
+ type: textgen
130
+ version: "1.2"
131
+ dynamic_fields:
132
+ "*_tf":
133
+ stored: true
134
+ indexed: true
135
+ type: tfloat
136
+ "*_l":
137
+ stored: true
138
+ indexed: true
139
+ type: long
140
+ "*_b":
141
+ stored: true
142
+ indexed: true
143
+ type: boolean
144
+ random_*:
145
+ type: random
146
+ "*_ti":
147
+ stored: true
148
+ indexed: true
149
+ type: tint
150
+ "*_d":
151
+ stored: true
152
+ indexed: true
153
+ type: double
154
+ "*_tdt":
155
+ stored: true
156
+ indexed: true
157
+ type: tdate
158
+ "*_tl":
159
+ stored: true
160
+ indexed: true
161
+ type: tlong
162
+ "*_f":
163
+ stored: true
164
+ indexed: true
165
+ type: float
166
+ "*_pi":
167
+ stored: true
168
+ indexed: true
169
+ type: pint
170
+ attr_*:
171
+ multiValued: true
172
+ stored: true
173
+ indexed: true
174
+ type: textgen
175
+ "*_s":
176
+ stored: true
177
+ indexed: true
178
+ type: string
179
+ ignored_*:
180
+ multiValued: true
181
+ type: ignored
182
+ "*_td":
183
+ stored: true
184
+ indexed: true
185
+ type: tdouble
186
+ "*_dt":
187
+ stored: true
188
+ indexed: true
189
+ type: date
190
+ "*_t":
191
+ stored: true
192
+ indexed: true
193
+ type: text
194
+ "*_i":
195
+ stored: true
196
+ indexed: true
197
+ type: int
198
+ types:
199
+ pint:
200
+ :type: :int
201
+ :index: :untokenized_omit_norms
202
+ tfloat:
203
+ :type:
204
+ :index: :untokenized_omit_norms
205
+ boolean:
206
+ :type: :bool
207
+ :index: :untokenized_omit_norms
208
+ phonetic:
209
+ :type: :text
210
+ sfloat:
211
+ :type:
212
+ :index: :untokenized_omit_norms
213
+ tdate:
214
+ :type:
215
+ :index: :untokenized_omit_norms
216
+ binary:
217
+ :type:
218
+ :index: :untokenized
219
+ pdouble:
220
+ :type:
221
+ :index: :untokenized_omit_norms
222
+ pfloat:
223
+ :type: :float
224
+ :index: :untokenized_omit_norms
225
+ plong:
226
+ :type:
227
+ :index: :untokenized_omit_norms
228
+ lowercase:
229
+ :type: :text
230
+ text:
231
+ :type: :text
232
+ int:
233
+ :type:
234
+ :index: :untokenized_omit_norms
235
+ date:
236
+ :type:
237
+ :index: :untokenized_omit_norms
238
+ text_rev:
239
+ :type: :text
240
+ slong:
241
+ :type:
242
+ :index: :untokenized_omit_norms
243
+ sint:
244
+ :type:
245
+ :index: :untokenized_omit_norms
246
+ textgen:
247
+ :type: :text
248
+ text_ws:
249
+ :type: :text
250
+ random:
251
+ :type:
252
+ :index: :untokenized
253
+ tdouble:
254
+ :type:
255
+ :index: :untokenized_omit_norms
256
+ tlong:
257
+ :type:
258
+ :index: :untokenized_omit_norms
259
+ tint:
260
+ :type:
261
+ :index: :untokenized_omit_norms
262
+ alphaOnlySort:
263
+ :type: :text
264
+ :index: :omit_norms
265
+ sdouble:
266
+ :type:
267
+ :index: :untokenized_omit_norms
268
+ pdate:
269
+ :type: :date
270
+ :index: :untokenized_omit_norms
271
+ double:
272
+ :type:
273
+ :index: :untokenized_omit_norms
274
+ string:
275
+ :type: :string
276
+ :index: :untokenized_omit_norms
277
+ ignored:
278
+ :type: :string
279
+ :index: :untokenized
280
+ payloads:
281
+ :type: :text
282
+ textTight:
283
+ :type: :text
284
+ long:
285
+ :type:
286
+ :index: :untokenized_omit_norms
287
+ float:
288
+ :type:
289
+ :index: :untokenized_omit_norms
290
+ defaultSearchField: text
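The `copyFields` entry is a list of one-pair mappings, and a source field may appear more than once (`manu` is copied to both `text` and `manu_exact`). The sketch below shows one way to collapse that list into a source-to-destinations map, which is roughly what the schema loader in this release does. `copy_field_map` is a hypothetical name.

```ruby
# copyFields as listed in the schema: a list of one-entry hashes.
COPY_FIELDS = [
  { "cat" => "text" }, { "name" => "text" }, { "manu" => "text" },
  { "features" => "text" }, { "includes" => "text" }, { "manu" => "manu_exact" }
]

# Collapse the list into source => [destinations], with symbols as keys.
# A nil list (schema without copyFields) yields an empty map.
def copy_field_map(copy_fields)
  map = Hash.new { |h, k| h[k] = [] }
  Array(copy_fields).each do |pair|
    pair.each_pair { |source, dest| map[source.to_sym] << dest.to_sym }
  end
  map
end

p copy_field_map(COPY_FIELDS)[:manu]
```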
data/config.ru-dist ADDED
@@ -0,0 +1,19 @@
1
+ require 'rubygems'
2
+ require 'sinatra'
3
+ require 'fileutils'
4
+ root_dir = File.dirname(__FILE__)
5
+
6
+ set :environment, :production
7
+ set :configuration, root_dir+"/conf/cheapskate.yml"
8
+ set :site, :kochief
9
+ set :root, root_dir
10
+
11
+ set :logging, false
12
+ disable :run
13
+
14
+ FileUtils.mkdir_p 'log' unless File.exist?('log')
15
+ log = File.new("log/#{Sinatra::Application.site}.log", "a")
16
+ STDOUT.reopen(log)
17
+ STDERR.reopen(log)
18
+ require 'cheap_skate'
19
+ run CheapSkate::Application
data/lib/cheap_skate.rb ADDED
@@ -0,0 +1,23 @@
1
+ module CheapSkate
2
+ $KCODE = 'u'
3
+ #stdlib dependencies
4
+ require "rubygems"
5
+ require 'jcode'
6
+ require 'yaml'
7
+ require 'cgi'
8
+
9
+ #gem dependencies
10
+ require 'faster_csv'
11
+ require "uuid"
12
+ require 'ferret'
13
+ require 'json'
14
+ require 'hpricot'
15
+ require 'sinatra'
16
+
17
+ #project files
18
+ require File.dirname(__FILE__)+'/cheap_skate/models'
19
+ require File.dirname(__FILE__)+'/cheap_skate/schema'
20
+ require File.dirname(__FILE__)+'/cheap_skate/index'
21
+ require File.dirname(__FILE__)+'/cheap_skate/application'
22
+
23
+ end
data/lib/cheap_skate/application.rb ADDED
@@ -0,0 +1,106 @@
1
+ module CheapSkate
2
+ class Application < Sinatra::Base
3
+
4
+ configure do
5
+
6
+ config = YAML.load_file(Sinatra::Application.configuration)[Sinatra::Application.site]
7
+
8
+ i = CheapSkate::Index.new(config[:ferret]||{}, Schema.new_from_config(YAML.load_file(config[:schema])))
9
+ i.set_fields_from_schema
10
+ set :index, i
11
+ set :prefix_path, (config[:prefix_path]||nil)
12
+ STDOUT.puts "CheapSkate starting with index schema: #{i.schema.name}"
13
+ STDOUT.puts "#{i.reader.num_docs} documents currently indexed."
14
+ end
15
+
16
+ get '/' do
17
+ "Welcome to CheapSkate"
18
+ end
19
+
20
+
21
+ get '/select/' do
22
+ results = select(params)
23
+ wt = params["wt"] || "json"
24
+ results.query_time = qtime
25
+ if wt == "json"
26
+ content_type 'application/json', :charset => 'utf-8'
27
+ end
28
+ results.send("as_#{wt}")
29
+ end
30
+
31
+ post '/select/' do
32
+ results = select(params)
33
+ wt = params["wt"] || "json"
34
+ results.query_time = qtime
35
+ results.send("as_#{wt}")
36
+ end
37
+
38
+ get '/update/csv/' do
39
+ csv = CSVLoader.new(params, settings.index)
40
+ csv.parse
41
+ out = <<END
42
+ <?xml version="1.0" encoding="UTF-8"?>
43
+ <response>
44
+ <lst name="responseHeader"><int name="status">0</int><int name="QTime">#{qtime}</int></lst>
45
+ </response>
46
+ END
47
+ end
48
+
49
+ post '/update/' do
50
+ i = InputDocument.new(request.env["rack.input"].read, settings.index)
51
+ i.parse
52
+ wt = params["wt"] || "json"
53
+ i.query_time = qtime
54
+ i.send("as_#{wt}")
55
+ end
56
+
57
+ get '/admin/ping/' do
58
+ '<response>
59
+ <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="echoParams">all</str><str name="echoParams">all</str><str name="q">solrpingquery</str><str name="qt">standard</str></lst></lst><str name="status">OK</str>
60
+ </response>'
61
+ end
62
+
63
+ before do
64
+ if settings.prefix_path
65
+ request.path_info.sub!(/^#{Regexp.escape(settings.prefix_path)}/,'')
66
+ end
67
+ unless request.path_info[-1,1] == "/"
68
+ request.path_info << "/"
69
+ end
70
+ @time = Time.now
71
+ end
72
+
73
+ helpers do
74
+ def select(params)
75
+ qry = request.env["rack.input"].read
76
+ if qry.empty?
77
+ qry = request.env["rack.request.query_string"]
78
+ end
79
+
80
+ parm = CGI.parse(qry)
81
+ qt = (params['qt'] || "standard")
82
+ query = settings.index.send("parse_#{qt}_query".to_sym, parm)
83
+
84
+ opts = {}
85
+ opts[:offset] = (params["start"] || 0).to_i
86
+ opts[:limit] = (params["rows"] || 10).to_i
87
+ if params["sort"]
88
+ opts[:sort] = params["sort"]
89
+ opts[:sort].sub!(/ asc/,"")
90
+ opts[:sort].sub!(/ desc/,"DESC")
91
+ end
92
+ if params["facet"] == "true"
93
+ query.extend(Facet)
94
+ query.add_facets_to_query(parm)
95
+ query.parse_facet_query(parm)
96
+ end
97
+ results = settings.index.search(query, opts)
98
+ results
99
+ end
100
+
101
+ def qtime
102
+ ((Time.now - @time) * 1000).to_i
103
+ end
104
+ end
105
+ end
106
+ end
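The `before` filter in this file does two things to the request path: it strips the configured prefix and it appends a trailing slash so that the routes, which are all declared with one, still match. A stand-alone sketch of that normalisation is shown below; `normalize_path` is a hypothetical function, and unlike the filter it returns a new string rather than modifying the request in place.

```ruby
# Hypothetical stand-alone version of the request path normalisation.
def normalize_path(path_info, prefix = nil)
  path = path_info.dup
  # Strip the deployment prefix (e.g. "/solr") only at the start of the path.
  path = path.sub(/\A#{Regexp.escape(prefix)}/, '') if prefix && !prefix.empty?
  # Routes are declared with a trailing slash, so add one when it is missing.
  path += "/" unless path.end_with?("/")
  path
end

puts normalize_path("/solr/select", "/solr")
puts normalize_path("/admin/ping/")
```

Escaping the prefix is a defensive choice in this sketch, so that characters such as `.` in a prefix are matched literally.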
data/lib/cheap_skate/index.rb ADDED
@@ -0,0 +1,181 @@
1
+ module CheapSkate
2
+ class Index < Ferret::Index::Index
3
+ attr_accessor :schema
4
+ def initialize(opts={}, schema=CheapSkate::Schema.new)
5
+ super(opts)
6
+ @schema = schema
7
+ end
8
+
9
+ def set_fields_from_schema
10
+ index_schema_changed = false
11
+ schema.field_names.each do |fld|
12
+ f = schema.field_to_field_info(fld)
13
+ if !field_infos[f.name]
14
+ self.writer.field_infos << f
15
+ self.reader.field_infos << f
16
+ index_schema_changed = true
17
+ end
18
+ end
19
+ puts "Schema has changed" if index_schema_changed
20
+
21
+ end
22
+
23
+ def set_dynamic_field(field)
24
+ return if field_infos[field]
25
+
26
+ return if schema.fields[field]
27
+
28
+ dyn_field = nil
29
+ schema.dynamic_fields.keys.each do |dyn|
30
+ if dyn.to_s =~ /^\*/
31
+ r = Regexp.new(dyn.to_s.sub(/^\*/,".*") + "$")
32
+ elsif dyn.to_s =~ /\*$/
33
+ r = Regexp.new("^" + dyn.to_s.sub(/\*$/,".*"))
34
+ end
35
+ unless (field.to_s =~ r).nil?
36
+ dyn_field = dyn
37
+ break
38
+ else
39
+ puts "Unable to match #{field.to_s} against a dynamic field pattern"
40
+ end
41
+ end
42
+ return unless dyn_field
43
+ opts = {}
44
+ if schema.dynamic_fields[dyn_field][:index] == :no
45
+ opts[:index] = :no
46
+ opts[:term_vector] = :no
47
+ elsif schema.field_types[schema.dynamic_fields[dyn_field][:field_type]][:index]
48
+ opts[:index] = schema.field_types[schema.dynamic_fields[dyn_field][:field_type]][:index]
49
+ end
50
+ if schema.dynamic_fields[dyn_field][:store] == :no
51
+ opts[:store] = :no
52
+ end
53
+ puts "Adding dynamic field: #{field}"
54
+ writer.field_infos.add_field(field, opts)
55
+ end
56
+
57
+ def search(query, opts={})
58
+ results = ResultSet.new
59
+ results.offset = opts[:offset]
60
+ results.limit = opts[:limit]
61
+ results.query = query.query.to_s
62
+ if opts[:limit] < 1
63
+ opts[:limit] = 1
64
+ end
65
+ if query.filter
66
+ opts[:filter] = query.filter
67
+ end
68
+
69
+ if query.filter_proc
70
+ if query.query.is_a?(Ferret::Search::MatchAllQuery) && query.filter.nil?
71
+ get_facets_from_index_terms(query)
72
+ else
73
+ opts[:filter_proc] = query.filter_proc
74
+ end
75
+ end
76
+
77
+ results.total = self.search_each(query.query, opts) do |id, score|
78
+ results << @schema.typed_document(self[id])
79
+ end
80
+ if query.respond_to?(:facet_fields)
81
+ facets = {}
82
+ query.facet_fields.each do | facet, values |
83
+ facets[facet] = values.sort{|a,b| b[1]<=>a[1]}[query.facet_offset, query.facet_limit]
84
+ end
85
+ query.facet_fields = facets
86
+ if query.facet_queries
87
+ query.facet_queries.each do |fq|
88
+ fq[:results] = search_each(fq[:query].query, :filter=>fq[:query].filter, :limit=>1) {|id,score|}
89
+ end
90
+ end
91
+ results.extend(Facet)
92
+ results.add_facets_to_results(query)
93
+ end
94
+ results
95
+ end
96
+
97
+ def get_facets_from_index_terms(query)
98
+ puts query.facet_fields.inspect
99
+ query.facet_fields.keys.each do |field|
100
+ field_terms = reader.terms(field)
101
+ next unless field_terms
102
+ field_terms.each do |term, count|
103
+ query.facet_fields[field][term] = count
104
+ end
105
+ end
106
+ end
107
+
108
+
109
+ def parse_standard_query(params)
110
+ or_and = case
111
+ when ![*params["q.op"]].empty? then [*params["q.op"]].first
112
+ else schema.default_operator
113
+ end
114
+
115
+ dflt_field = case
116
+ when ![*params["df"]].empty? then [*params["df"]].first
117
+ else schema.default_field
118
+ end
119
+ parser = Ferret::QueryParser.new(:default_field=>dflt_field, :fields=>reader.tokenized_fields, :or_default=>(or_and=="OR"))
120
+
121
+ query = CheapSkate::Query.new
122
+ q = case params["q"].class.name
123
+ when "Array" then params["q"].first
124
+ when "String" then params["q"]
125
+ else nil
126
+ end
127
+ if q && !q.empty? && q != "*:*"
128
+ query.query = parser.parse(q)
129
+ else
130
+ query.query = Ferret::Search::MatchAllQuery.new
131
+ end
132
+ if params['fq']
133
+ query.filter = parse_filtered_query(params)
134
+ end
135
+ query
136
+ end
137
+
138
+ def parse_filtered_query(params)
139
+ or_and = case
140
+ when ![*params["q.op"]].empty? then [*params["q.op"]].first
141
+ else schema.default_operator
142
+ end
143
+
144
+ dflt_field = case
145
+ when ![*params["df"]].empty? then [*params["df"]].first
146
+ else schema.default_field
147
+ end
148
+
149
+ strict_parser = Ferret::QueryParser.new(:default_field=>dflt_field, :fields=>reader.tokenized_fields, :validate_fields=>true, :or_default=>(or_and=="OR"), :handle_parse_errors=>false)
150
+ bool = Ferret::Search::BooleanQuery.new
151
+ [*params['fq']].each do |fq|
152
+ if (filtq = strict_parser.parse(fq)) && !filtq.to_s.empty?
153
+ bool.add_query(filtq, :must)
154
+ else
155
+ (idx, term) = fq.split(":")
156
+ term = term.to_s.sub(/^\"/,'').sub(/\"$/,'')
157
+ bool.add_query(Ferret::Search::TermQuery.new(idx.to_sym, term), :must)
158
+ end
159
+ end
160
+ unless bool.to_s.empty?
161
+ return Ferret::Search::QueryFilter.new(bool)
162
+ end
163
+ nil
164
+ end
165
+
166
+
167
+ def parse_dismax_query(params)
168
+ parse_standard_query(params)
169
+ end
170
+
171
+ def parse_morelikethis_query(params)
172
+
173
+ end
174
+
175
+ def create_document(id=UUID.generate, boost=1.0)
176
+ d = CheapSkate::Document.new(id, boost)
177
+ d.index = self
178
+ d
179
+ end
180
+ end
181
+ end
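`set_dynamic_field` maps an unknown field name onto one of the schema's dynamic field patterns, which are Solr-style globs with a leading or trailing `*`. The sketch below shows one way to do that matching with anchored regular expressions, so that `*_t` does not accidentally claim a field such as `price_tf`. Anchoring is a choice made for this sketch, not something the release documents, and both function names are hypothetical.

```ruby
# Convert a Solr dynamic-field glob ("*_s" or "attr_*") into an anchored Regexp.
def glob_to_regexp(glob)
  # Escape everything, then turn the escaped "*" back into a wildcard.
  Regexp.new('\A' + Regexp.escape(glob.to_s).gsub('\*', '.*') + '\z')
end

# Return the first pattern that matches the field name, or nil if none does.
def match_dynamic_field(field, patterns)
  patterns.find { |glob| glob_to_regexp(glob).match?(field.to_s) }
end

patterns = ["*_t", "*_tf", "*_s", "attr_*"]
p match_dynamic_field(:title_s, patterns)
p match_dynamic_field("price_tf", patterns)
p match_dynamic_field("plain", patterns)
```

Accepting both symbols and strings (via `to_s`) matters here because the schema loader stores pattern names as symbols.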
data/lib/cheap_skate/models.rb ADDED
@@ -0,0 +1,286 @@
1
+ module CheapSkate
2
+ class Document < Ferret::Document
3
+ attr_accessor :index, :doc_id
4
+ def initialize(doc_id=UUID.generate, boost=1.0)
5
+ @doc_id = doc_id
6
+ super(boost)
7
+ end
8
+
9
+
10
+ def add_field(key, value)
11
+ @index.set_dynamic_field(key.to_sym) unless @index.schema.field_names.index(key.to_sym)
12
+ if value.is_a?(Array)
13
+ value.each do |v|
14
+ add_field(key, v)
15
+ end
16
+ else
17
+ self[key.to_sym] ||= []
18
+ self[key.to_sym] << value
19
+ end
20
+ if copy_fields = @index.schema.copy_fields[key.to_sym]
21
+ copy_fields.each do |field|
22
+ add_field(field, value)
23
+ end
24
+ end
25
+ end
26
+ end
27
+
28
+ class ResultSet
29
+ attr_accessor :total, :docs, :query, :limit, :offset, :facets, :query_time
30
+ def <<(obj)
31
+ @docs ||=[]
32
+ @docs << obj
33
+ end
34
+
35
+ def to_hash()
36
+ response = {"responseHeader"=>{"status"=>0, "QTime"=>self.query_time, "params"=>{"q"=>self.query, "version"=>"2.2", "rows"=>self.limit}}}
37
+ response["response"] = {"numFound"=>self.total, "start"=>self.offset, "docs"=>[]}
38
+ if self.docs
39
+ self.docs.each do |doc|
40
+ response["response"]["docs"] << doc
41
+ end
42
+ end
43
+
44
+ response
45
+ end
46
+
47
+ def as_ruby
48
+ response = self.to_hash
49
+ response["responseHeader"]["wt"] = "ruby"
50
+ response.inspect
51
+ end
52
+ def as_json
53
+ response = self.to_hash
54
+ response["responseHeader"]["wt"] = "json"
55
+ response.to_json
56
+ end
57
+ end
58
+
59
+ module Facet
60
+ attr_accessor :facet_queries, :facet_limit, :facet_fields, :facet_offset, :facets, :facet_total
61
+ def add_facets_to_query(params)
62
+
63
+ if params['facet.limit'] && !params['facet.limit'].empty?
64
+ @facet_limit = params['facet.limit'].first.to_i
65
+ else
66
+ @facet_limit = 10
67
+ end
68
+ if params['facet.offset'] && !params['facet.offset'].empty?
69
+ @facet_offset = params['facet.offset'].first.to_i
70
+ else
71
+ @facet_offset = 0
72
+ end
73
+
74
+ @facet_fields = {}
75
+ params["facet.field"].each do | field |
76
+ @facet_fields[field.to_sym] = {}
77
+ end
78
+
79
+ @filter_proc = lambda do |doc,score,searcher|
80
+ @facet_fields.keys.each do |field|
81
+ [*searcher[doc][field]].each do |term|
82
+ next if term.nil?
83
+ @facet_fields[field][term] ||=0
84
+ @facet_fields[field][term] += 1
85
+ end
86
+ end
87
+ end
88
+
89
+ end
90
+
91
+ def add_facet_query(query, query_string)
92
+ @facet_queries ||= []
93
+ @facet_queries << {:query=>query, :results=>0, :query_string=>query_string}
94
+ end
95
+
96
+ def add_facets_to_results(query)
97
+ @facet_fields = query.facet_fields
98
+ @facet_limit = query.facet_limit
99
+ @facet_queries = query.facet_queries
100
+ @facet_offset = query.facet_offset
101
+ end
102
+
103
+ def parse_facet_query(params)
104
+ [*params['facet.query']].each do |q|
105
+ next unless q
106
+ bool = Ferret::Search::BooleanQuery.new
107
+
108
+
109
+ (idx, term) = q.split(":")
110
+ term = term.to_s.sub(/^\"/,'').sub(/\"$/,'')
111
+ bool.add_query(Ferret::Search::TermQuery.new(idx.to_sym, term), :must)
112
+
113
+ unless bool.to_s.empty?
114
+ if @filter
115
+ bool = Ferret::Search::FilteredQuery.new(bool, @filter)
116
+ end
117
+ query = CheapSkate::Query.new; query.query = @query
118
+ query.filter = Ferret::Search::QueryFilter.new(bool)
119
+ add_facet_query(query, q)
120
+ end
121
+ end
122
+
123
+ end
124
+
125
+ def to_hash
126
+ r = super
127
+ p = r["responseHeader"]["params"]
128
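+ p["facet"] = "true"
129
+ p["facet.field"] = (@facet_fields || {}).keys
130
+ p["facet.limit"] = @facet_limit
131
+ p["facet.offset"] = @facet_offset
132
+ facet_query_strings = (@facet_queries || []).map { |fq| fq[:query_string] }
133
+ p["facet.query"] = facet_query_strings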
134
+ r["facet_counts"] = {"facet_fields"=>@facet_fields, "facet_queries"=>[]}
135
+ if @facet_queries
136
+ @facet_queries.each do | fq |
137
+ r["facet_counts"]["facet_queries"] << [fq[:query_string], fq[:results]]
138
+ end
139
+ end
140
+ r
141
+ end
142
+
143
+ end
144
+
145
+
146
+
147
+ class FacetResponse
148
+ attr_accessor :query, :limit, :fields, :offset, :facets, :total
149
+ def initialize
150
+ @fields = []
151
+ @facets = {}
152
+ end
153
+
154
+ def add_facets(r)
155
+
156
+ end
157
+ end
158
+
159
+ class InputDocument
160
+ attr_reader :doc
161
+ attr_accessor :query_time
162
+ def initialize(doc, index)
163
+ doc.sub!(/^[^\<]*/,'')
164
+ @doc = Hpricot::XML(doc)
165
+ @index = index
166
+ end
167
+
168
+ def parse
169
+ action = @doc.root.name
170
+ self.send(action)
171
+ end
172
+
173
+ def add
174
+ (@doc/'/add/doc').each do |doc|
175
+ document = @index.create_document
176
+ (doc/'field').each do |elem|
177
+ field = elem.attributes['name']
178
+ value = nil
179
+ value = elem.inner_html
180
+ if field and value
181
+ if field == "id"
182
+ document[@index.schema.id_field] = value
183
+ else
184
+ document.add_field(field, value)
185
+ end
186
+ end
187
+ end
188
+ @index << document
189
+ end
190
+ end
191
+
192
+ def commit
193
+ @index.flush
194
+ end
195
+
196
+ def delete
197
+ ids = []
198
+ (@doc/"/delete/id").each do |del|
199
+ ids << del.inner_html
200
+ end
201
+ (@doc/"/delete/query").each do |del|
202
+ @index.search_each(del.attributes['q'], :limit=>:all) do |id,score|
203
+ ids << id
204
+ end
205
+ end
206
+ unless ids.empty?
207
+ @index.delete(ids)
208
+ end
209
+ end
210
+
211
+ def optimize
212
+ @index.optimize
213
+ end
214
+
215
+ def to_hash
216
+ return {"responseHeader"=>{"QTime"=>self.query_time, "status"=>0}}
217
+ end
218
+
219
+ def as_ruby
220
+ return to_hash.inspect
221
+ end
222
+
223
+ def as_json
224
+ return to_hash.to_json
225
+ end
226
+ end
227
+
228
+
229
+ class CSVLoader
230
+ attr_reader :fields, :filename, :file, :field_meta
231
+ def initialize(params, index)
232
+ @filename = params['stream.file']
233
+ if @filename
234
+ @file = File.open(@filename)
235
+ end
236
+ @field_meta = {}
237
+ params.each_pair do |key, val|
238
+ next unless key =~ /^f\./
239
+ (f,field,arg) = key.split(".")
240
+ @field_meta[field] ||={}
241
+ @field_meta[field][arg] = val
242
+ end
243
+ @index = index
244
+ end
245
+
246
+ def parse
247
+ if @file
248
+ parse_file
249
+ end
250
+ end
251
+
252
+ def parse_file
253
+ @fields = @file.gets.chomp.split(",")
254
+ documents = []
255
+ while line = @file.gets
256
+ line.chomp!
257
+ doc = @index.create_document
258
+ # keep track of where we are on the row
259
+ i = 0
260
+ FasterCSV.parse_line(line).each do |field|
261
+ unless field && @fields[i]
262
+ i+= 1
263
+ next
264
+ end
265
+ if @fields[i] == "id"
266
+ doc[@index.schema.id_field] = field
267
+ else
268
+ if @field_meta[@fields[i]] && @field_meta[@fields[i]]["split"] == "true"
269
+ field.split(@field_meta[@fields[i]]["separator"]).each do |f|
270
+ doc.add_field(@fields[i], f.strip)
271
+ end
272
+ else
273
+ doc.add_field(@fields[i], field.strip)
274
+ end
275
+ end
276
+ i+=1
277
+ end
278
+ @index << doc
279
+ end
280
+ end
281
+ end
282
+
283
+ class Query
284
+ attr_accessor :parser, :query, :filter, :filter_proc
285
+ end
286
+ end
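The `Facet` module counts facet values by visiting each matching document, tallying the terms it finds in the faceted fields, and then sorting and slicing the tallies by `facet.offset` and `facet.limit`. A minimal sketch of that counting follows, with plain hashes standing in for Ferret documents. `facet_counts` is a hypothetical function, and the tie-breaking by term is an addition made here for deterministic output.

```ruby
# Tally facet values over a set of documents (plain hashes stand in for
# Ferret documents), then return the top entries for each field.
def facet_counts(docs, fields, limit = 10, offset = 0)
  counts = {}
  fields.each do |field|
    tally = Hash.new(0)
    docs.each do |doc|
      # A field may hold one value or several; nil values are skipped.
      Array(doc[field]).compact.each { |term| tally[term] += 1 }
    end
    # Highest count first; ties are broken by term so the order is stable.
    sorted = tally.sort_by { |term, count| [-count, term.to_s] }
    counts[field] = sorted[offset, limit] || []
  end
  counts
end

docs = [
  { :cat => ["electronics", "music"], :manu => "Apple" },
  { :cat => ["electronics"], :manu => "Belkin" },
  { :cat => "music", :manu => "Apple" }
]
p facet_counts(docs, [:cat, :manu])
```

Because every matching document is visited once per request, the cost grows with the size of the result set, which is the performance caveat the README mentions.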
data/lib/cheap_skate/schema.rb ADDED
@@ -0,0 +1,227 @@
1
+ require 'rexml/document'
2
+ require 'yaml'
3
+ class CheapSkate::Schema
4
+ include CheapSkate
5
+ attr_reader :name, :fields, :config, :field_types, :id_field, :copy_fields, :dynamic_fields, :default_field, :default_operator
6
+ def self.xml_to_yaml(xml)
7
+ doc = REXML::Document.new xml
8
+ y = {"schema"=>{"types"=>{}, "fields"=>{}}}
9
+ y["schema"]["name"] = doc.root.attributes["name"]
10
+ y["schema"]["version"] = doc.root.attributes["version"]
11
+ doc.each_element("/schema/fields/field") do |field|
12
+ f = {}
13
+ field.attributes.each do |a,v|
14
+ next if a == "name"
15
+ f[a] = case v
16
+ when "true" then true
17
+ when "false" then false
18
+ else v
19
+ end
20
+ end
21
+ y["schema"]["fields"][field.attributes['name']] = f
22
+ end
23
+ doc.each_element("/schema/fields/dynamicField") do |dyn_field|
24
+ f = {}
25
+ dyn_field.attributes.each do |a,v|
26
+ next if a == "name"
27
+ f[a] = case v
28
+ when "true" then true
29
+ when "false" then false
30
+ else v
31
+ end
32
+ end
33
+ y["schema"]["dynamic_fields"] ||= {}
34
+ y["schema"]["dynamic_fields"][dyn_field.attributes['name']] = f
35
+ end
36
+ doc.each_element("/schema/types/fieldType") do |type|
37
+ t = {}
38
+ t[:type] = case type.attributes['class']
39
+ when "solr.StrField" then :string
40
+ when "solr.TextField" then :text
41
+ when "solr.IntField" then :int
42
+ when "solr.FloatField" then :float
43
+ when "solr.BoolField" then :bool
44
+ when "solr.DateField" then :date
45
+ end
46
+ if type.attributes['omitNorms'] && type.attributes['omitNorms'] == "true"
47
+ t[:index] = :omit_norms
48
+ end
49
+ unless t[:type] == :text
50
+ if t[:index] == :omit_norms
51
+ t[:index] = :untokenized_omit_norms
52
+ else
53
+ t[:index] = :untokenized
54
+ end
55
+ end
56
+ y["schema"]["types"][type.attributes['name']] = t
57
+ end
58
+ doc.each_element("/schema/types/fieldtype") do |type|
59
+ t = {}
60
+ t[:type] = case type.attributes['class']
61
+ when "solr.StrField" then :string
62
+ when "solr.TextField" then :text
63
+ when "solr.IntField" then :int
64
+ when "solr.FloatField" then :float
65
+ when "solr.BoolField" then :bool
66
+ when "solr.DateField" then :date
67
+ end
68
+ if type.attributes['omitNorms'] && type.attributes['omitNorms'] == "true"
69
+ t[:index] = :omit_norms
70
+ end
71
+ unless t[:type] == :text
72
+ if t[:index] == :omit_norms
73
+ t[:index] = :untokenized_omit_norms
74
+ else
75
+ t[:index] = :untokenized
76
+ end
77
+ end
78
+ y["schema"]["types"][type.attributes['name']] = t
79
+ end
80
+ if dflt = doc.elements['/schema/defaultSearchField']
81
+ y["schema"]["defaultSearchField"] = dflt.get_text.value if dflt.has_text?
82
+ end
83
+ if uniq_key = doc.elements['/schema/uniqueKey']
84
+ y["schema"]["uniqueKey"] = uniq_key.get_text.value if uniq_key.has_text?
85
+ end
86
+ copy_fields = []
87
+ doc.each_element("/schema/copyField") do |copy|
88
+ copy_fields << {copy.attributes['source']=>copy.attributes['dest']}
89
+ end
90
+ unless copy_fields.empty?
91
+ y["schema"]["copyFields"] = copy_fields
92
+ end
93
+ y.to_yaml
94
+ end
95
+
96
+ def self.new_from_config(config_hash)
97
+ schema = self.new
98
+ schema.load_from_conf(config_hash)
99
+ schema
100
+ end
101
+
102
+ def initialize
103
+ @copy_fields = {}
104
+ end
105
+
106
+ def load_from_conf(conf)
107
+ @fields ={}
108
+ @field_types ={}
109
+ @name = conf['schema']['name']
110
+ conf['schema']['fields'].keys.each do |field|
111
+ @fields[field.to_sym] = {}
112
+ fld = conf['schema']['fields'][field]
113
+ @fields[field.to_sym][:field_type] = fld['type'].to_sym
114
+ if fld['indexed'] == false
115
+ @fields[field.to_sym][:index] = :no
116
+ end
117
+ if fld['stored'] == false
118
+ @fields[field.to_sym][:store] = :no
119
+ end
120
+ @fields[field.to_sym][:multi_valued] = fld['multiValued']||false
121
+ end
122
+ if conf['schema']['dynamic_fields']
123
+ conf['schema']['dynamic_fields'].keys.each do |field|
124
+ @dynamic_fields ||= {}
125
+ @dynamic_fields[field.to_sym] = {}
126
+ fld = conf['schema']['dynamic_fields'][field]
127
+ @dynamic_fields[field.to_sym][:field_type] = fld['type'].to_sym
128
+ if fld['indexed'] == false
129
+ @dynamic_fields[field.to_sym][:index] = :no
130
+ end
131
+ if fld['stored'] == false
132
+ @dynamic_fields[field.to_sym][:store] = :no
133
+ end
134
+ @dynamic_fields[field.to_sym][:multi_valued] = fld['multiValued']||false
135
+ end
136
+ end
137
+ conf['schema']['types'].keys.each do |type|
138
+ @field_types[type.to_sym] = conf['schema']['types'][type]
139
+ end
140
+ (conf['schema']['copyFields'] || []).each do |copy|
141
+ copy.each_pair do | orig, dest|
142
+ @copy_fields[orig.to_sym] ||= []
143
+ @copy_fields[orig.to_sym] << dest.to_sym
144
+ end
145
+ end
146
+ @id_field = (conf['schema']['uniqueKey'] || "id").to_sym
147
+ @default_field = (conf['schema']['defaultSearchField']||"*").to_sym
148
+ @default_operator = (conf['schema']['defaultOperator']||"OR")
149
+ end
150
+
151
+ def typed_document(lazy_doc)
152
+ doc = {}
153
+ lazy_doc.fields.each do |field|
154
+ [*lazy_doc[field]].each do |fld|
155
+ if doc[field]
156
+ doc[field] = [*doc[field]]
157
+ doc[field] << type_field(field, fld)
158
+ elsif multi_valued?(field)
159
+ doc[field] = [type_field(field, fld)]
160
+ else
161
+ doc[field] = type_field(field, fld)
162
+ end
163
+ end
164
+ end
165
+ doc
166
+ end
167
+
168
+ def multi_valued?(field)
169
+ if @fields[field]
170
+ return @fields[field][:multi_valued]
171
+ else
172
+ dyn_field = nil
173
+ (@dynamic_fields || {}).keys.each do |dyn|
174
+ if dyn.to_s =~ /^\*/
175
+ r = Regexp.new(dyn.to_s.sub(/^\*/,".*") + "$")
176
+ elsif dyn.to_s =~ /\*$/
177
+ r = Regexp.new("^" + dyn.to_s.sub(/\*$/,".*"))
178
+ end
179
+ if r && field.to_s =~ r
180
+ dyn_field = dyn
181
+ break
182
+ end
183
+ end
184
+ return @dynamic_fields[dyn_field][:multi_valued] if dyn_field
185
+ end
186
+ false
187
+ end
188
+
189
+ def type_field(field_name, value)
190
+ return value.to_s unless @fields[field_name]
191
+ val = case @field_types[@fields[field_name][:field_type]][:type]
192
+ when :string then value.to_s
193
+ when :text then value.to_s
194
+ when :int then value.to_i
195
+ when :float then value.to_f
196
+ when :date then Date.parse(value)
197
+ when :bool
198
+ if value == "true"
199
+ true
200
+ else
201
+ false
202
+ end
203
+ else
204
+ value.to_s
205
+ end
206
+ val
207
+ end
208
+
209
+ def field_names
210
+ return @fields.keys
211
+ end
212
+
213
+ def field_to_field_info(field_name)
214
+ opts = {}
215
+ if @fields[field_name][:index] == :no
216
+ opts[:index] = :no
217
+ opts[:term_vector] = :no
218
+ elsif @field_types[@fields[field_name][:field_type]][:index]
219
+ opts[:index] = @field_types[@fields[field_name][:field_type]][:index]
220
+ end
221
+ if @fields[field_name][:store] == :no
222
+ opts[:store] = :no
223
+ end
224
+ Ferret::Index::FieldInfo.new(field_name, opts)
225
+ end
226
+
227
+ end
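Ferret returns stored field values as strings, so `type_field` converts each one back to the Ruby type that the schema declares for it. The sketch below isolates that coercion step, assuming the six type symbols produced by `xml_to_yaml` (`:string`, `:text`, `:int`, `:float`, `:date`, `:bool`). `coerce` is a hypothetical name, and unknown types fall back to a string.

```ruby
require 'date'

# Coerce a stored string to the Ruby type named by the schema type symbol.
def coerce(type, value)
  case type
  when :int   then value.to_i
  when :float then value.to_f
  when :date  then Date.parse(value.to_s)
  when :bool  then value.to_s == "true"
  else value.to_s # :string, :text, and unmapped types stay strings
  end
end

p coerce(:int, "42")
p coerce(:bool, "true")
p coerce(:date, "2010-05-28")
```

Many entries under `types` in the sample schema have an empty `:type`, so in practice the fallback branch would handle a large share of fields.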
metadata ADDED
@@ -0,0 +1,143 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: cheap_skate
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 0
8
+ - 1
9
+ version: 0.0.1
10
+ platform: ruby
11
+ authors:
12
+ - Ross Singer
13
+ autorequire:
14
+ bindir: bin
15
+ cert_chain: []
16
+
17
+ date: 2010-05-28 00:00:00 -04:00
18
+ default_executable:
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: fastercsv
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - ">="
26
+ - !ruby/object:Gem::Version
27
+ segments:
28
+ - 0
29
+ version: "0"
30
+ type: :runtime
31
+ version_requirements: *id001
32
+ - !ruby/object:Gem::Dependency
33
+ name: uuid
34
+ prerelease: false
35
+ requirement: &id002 !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - ">="
38
+ - !ruby/object:Gem::Version
39
+ segments:
40
+ - 0
41
+ version: "0"
42
+ type: :runtime
43
+ version_requirements: *id002
44
+ - !ruby/object:Gem::Dependency
45
+ name: ferret
46
+ prerelease: false
47
+ requirement: &id003 !ruby/object:Gem::Requirement
48
+ requirements:
49
+ - - ">="
50
+ - !ruby/object:Gem::Version
51
+ segments:
52
+ - 0
53
+ version: "0"
54
+ type: :runtime
55
+ version_requirements: *id003
56
+ - !ruby/object:Gem::Dependency
57
+ name: json
58
+ prerelease: false
59
+ requirement: &id004 !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - ">="
62
+ - !ruby/object:Gem::Version
63
+ segments:
64
+ - 0
65
+ version: "0"
66
+ type: :runtime
67
+ version_requirements: *id004
68
+ - !ruby/object:Gem::Dependency
69
+ name: hpricot
70
+ prerelease: false
71
+ requirement: &id005 !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ segments:
76
+ - 0
77
+ version: "0"
78
+ type: :runtime
79
+ version_requirements: *id005
80
+ - !ruby/object:Gem::Dependency
81
+ name: sinatra
82
+ prerelease: false
83
+ requirement: &id006 !ruby/object:Gem::Requirement
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ segments:
88
+ - 0
89
+ version: "0"
90
+ type: :runtime
91
+ version_requirements: *id006
92
+ description: A Solr-like interface for situations where running a Java application server is not an option (such as shared web hosting).
93
+ email: rossfsinger@gmail.com
94
+ executables: []
95
+
96
+ extensions: []
97
+
98
+ extra_rdoc_files:
99
+ - LICENSE
100
+ - README
101
+ files:
102
+ - conf/cheapskate.yml-dist
103
+ - conf/schema.yml-dist
104
+ - config.ru-dist
105
+ - lib/cheap_skate.rb
106
+ - lib/cheap_skate/application.rb
107
+ - lib/cheap_skate/index.rb
108
+ - lib/cheap_skate/models.rb
109
+ - lib/cheap_skate/schema.rb
110
+ - LICENSE
111
+ - README
112
+ has_rdoc: true
113
+ homepage: http://github.com/rsinger/CheapSkate
114
+ licenses: []
115
+
116
+ post_install_message:
117
+ rdoc_options:
118
+ - --charset=UTF-8
119
+ require_paths:
120
+ - lib
121
+ required_ruby_version: !ruby/object:Gem::Requirement
122
+ requirements:
123
+ - - ">="
124
+ - !ruby/object:Gem::Version
125
+ segments:
126
+ - 0
127
+ version: "0"
128
+ required_rubygems_version: !ruby/object:Gem::Requirement
129
+ requirements:
130
+ - - ">="
131
+ - !ruby/object:Gem::Version
132
+ segments:
133
+ - 0
134
+ version: "0"
135
+ requirements: []
136
+
137
+ rubyforge_project:
138
+ rubygems_version: 1.3.6
139
+ signing_key:
140
+ specification_version: 3
141
+ summary: A very simple Solr emulator in Ruby
142
+ test_files: []
143
+
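The serialized metadata above is what RubyGems generates from a gemspec. As a rough guide, a gemspec that would yield the same name, version, and runtime dependencies might look like the sketch below. The original gemspec is not part of this diff, so this is a reconstruction from the metadata fields and not the author's file.

```ruby
require 'rubygems'

# A sketch of a gemspec matching the metadata above (reconstructed, not original).
spec = Gem::Specification.new do |s|
  s.name          = "cheap_skate"
  s.version       = "0.0.1"
  s.authors       = ["Ross Singer"]
  s.email         = "rossfsinger@gmail.com"
  s.homepage      = "http://github.com/rsinger/CheapSkate"
  s.summary       = "A very simple Solr emulator in Ruby"
  s.require_paths = ["lib"]
  # Every dependency in the metadata is a runtime dependency with ">= 0".
  %w[fastercsv uuid ferret json hpricot sinatra].each do |dep|
    s.add_dependency dep, ">= 0"
  end
end

puts spec.dependencies.map(&:name).join(", ")
```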