em-voldemort 0.1.5

data/.gitignore ADDED
@@ -0,0 +1 @@
1
+ /Gemfile.lock
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2013 LinkedIn Corp.
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,78 @@
1
+ EM::Voldemort
2
+ =============
3
+
4
+ A Ruby client for [Voldemort](http://www.project-voldemort.com/), implemented using
5
+ [EventMachine](http://rubyeventmachine.com/).
6
+
7
+ Features:
8
+
9
+ * High-performance, non-blocking access to Voldemort clusters.
10
+ * Fault-tolerant: automatically retries failed requests, routes requests to replica nodes if the
11
+ primary node is down, reconnects when a connection is lost, etc.
12
+ * Supports client-side routing using the same consistent hashing algorithm as the Java client.
13
+ * Keys and values in Voldemort's [Binary JSON](https://github.com/voldemort/voldemort/wiki/Binary-JSON-Serialization)
14
+ are automatically deserialized to (and serialized from) Ruby hashes and arrays.
15
+ * Transparent gzip compression (like the Java client).
16
+
17
+ Limitations:
18
+
19
+ * Can only be used to access
20
+ [read-only stores](https://github.com/voldemort/voldemort/wiki/Build-and-Push-Jobs-for-Voldemort-Read-Only-Stores)
21
+ (the type of store that you build in a batch job in Hadoop and bulk-load into the Voldemort
22
+ cluster). Accessing read-write stores (which use BDB or MySQL as storage engine) is not currently
23
+ supported. This cuts out a lot of complexity (quorum reads/writes, conflict resolution, zoning,
24
+ etc.).
25
+ * Currently supports only gzip-compressed or uncompressed data; the other compression codecs are not supported.
26
+ * Currently doesn't support serialization formats other than Binary JSON and raw bytes.
27
+
28
+ Compatibility:
29
+
30
+ * Ruby 1.9 and above (not compatible with 1.8).
31
+ * Only tested on MRI, but ought to work on any Ruby implementation.
32
+ * Should work with a wide range of Voldemort versions (this client uses the `pb0` Protocol
33
+ Buffers-based protocol).
34
+
35
+
36
+ Usage
37
+ -----
38
+
39
+ `gem install em-voldemort` or add `gem 'em-voldemort'` to your Gemfile.
40
+
41
+ The client is initialized by giving it the hostname and port of any node in the cluster. That node
42
+ will be contacted to discover the other nodes and the configuration of the stores.
43
+ `EM::Voldemort::Cluster` is a client for an entire Voldemort cluster, which may have many stores;
44
+ `EM::Voldemort::Store` is the preferred way of accessing one particular store.
45
+
46
+ You get the store object from the cluster (you can do this during the initialization of your app --
47
+ EventMachine doesn't need to be running yet):
48
+
49
+ require 'em-voldemort'
50
+ MY_VOLDEMORT_CLUSTER = EM::Voldemort::Cluster.new(:host => 'voldemort.example.com', :port => 6666)
51
+ MY_VOLDEMORT_STORE = MY_VOLDEMORT_CLUSTER.store('my_store')
52
+
53
+ # Alternative convenience method, using a URL:
54
+ MY_VOLDEMORT_STORE = EM::Voldemort::Store.from_url('voldemort://voldemort.example.com:6666/my_store')
55
+
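As a rough illustration of what the URL form carries, the sketch below splits a `voldemort://` URL into host, port and store name using Ruby's standard `URI` library. The helper name `split_voldemort_url` is hypothetical, and this is not necessarily how `EM::Voldemort::Store.from_url` parses its argument.

```ruby
require 'uri'

# Hypothetical helper (not part of em-voldemort): splits a voldemort:// URL
# into the pieces that Cluster.new and Cluster#store take separately.
def split_voldemort_url(url)
  uri = URI.parse(url)
  raise ArgumentError, "not a voldemort:// URL: #{url}" unless uri.scheme == 'voldemort'
  # The store name is the URL path without its leading slash.
  { :host => uri.host, :port => uri.port, :store => uri.path.sub(%r{\A/}, '') }
end

parts = split_voldemort_url('voldemort://voldemort.example.com:6666/my_store')
puts parts.inspect
```

The `:host` and `:port` entries correspond to the options passed to `EM::Voldemort::Cluster.new` above, and `:store` to the argument of `store`.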
56
+ Making requests is then straightforward:
57
+
58
+ request = MY_VOLDEMORT_STORE.get('key-to-look-up')
59
+ request.callback {|response| puts "value: #{response}" }
60
+ request.errback {|error| puts "request failed: #{error}" }
61
+
62
+ On successful requests, the value passed to the callback is fully decoded (gzip decompressed and/or
63
+ Binary JSON decoded, if appropriate). On failed requests, an exception object is passed to the
64
+ errback. The exception object is of one of the following types:
65
+
66
+ * `EM::Voldemort::ClientError` -- like an HTTP 400-series error. Something is wrong with the request,
67
+ and retrying it won't help.
68
+ * `EM::Voldemort::KeyNotFound` -- subclass of `ClientError`, indicates that the given key was not
69
+ found in the store (like HTTP 404).
70
+ * `EM::Voldemort::ServerError` -- like an HTTP 500-series error or a network error. We were not able to
71
+ get a valid response from the cluster. This gem automatically retries requests, so there's no
72
+ point in immediately retrying the request in application code (though you may want to retry after
73
+ a delay, if your application allows).
74
+
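The error types listed above suggest a simple branching pattern inside an errback. The following is a minimal sketch that uses stand-in classes so it runs without the gem installed. The module `VoldemortErrorsSketch` and the helper `classify_failure` are hypothetical. Only the `KeyNotFound < ClientError` relationship comes from the README, and deriving the stand-ins from `StandardError` is an assumption.

```ruby
# Stand-in error classes so the sketch runs without em-voldemort installed.
# ASSUMPTION: StandardError as the base class is a guess; only the
# KeyNotFound < ClientError relationship is stated in the README.
module VoldemortErrorsSketch
  class ClientError < StandardError; end
  class KeyNotFound < ClientError; end
  class ServerError < StandardError; end
end

# Hypothetical helper: maps an error object to what the application might do.
# KeyNotFound is tested before ClientError because it is a subclass of it.
def classify_failure(error)
  case error
  when VoldemortErrorsSketch::KeyNotFound then :missing_key  # treat as "no value"
  when VoldemortErrorsSketch::ClientError then :bad_request  # retrying won't help
  when VoldemortErrorsSketch::ServerError then :retry_later  # maybe retry after a delay
  else :unknown
  end
end

puts classify_failure(VoldemortErrorsSketch::KeyNotFound.new('no such key'))
```

In an application, the same `case` would sit in the body of `request.errback {|error| ... }` and match on the gem's own `EM::Voldemort::` classes instead of the stand-ins.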
75
+ To shut down the client gracefully (which allows any requests in flight to complete, but
76
+ stops any further requests from being made), close the cluster:
77
+
78
+ MY_VOLDEMORT_CLUSTER.close
data/em-voldemort.gemspec ADDED
@@ -0,0 +1,23 @@
1
+ require 'date'
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = 'em-voldemort'
5
+ s.authors = ['Martin Kleppmann']
6
+ s.email = 'martin@kleppmann.com'
7
+ s.version = '0.1.5'
8
+ s.summary = %q{Client for Voldemort}
9
+ s.description = %q{EventMachine implementation of a Voldemort client. Currently limited to read-only stores.}
10
+ s.homepage = 'https://github.com/rapportive-oss/em-voldemort'
11
+ s.date = Date.today.to_s
12
+ s.files = `git ls-files`.split("\n")
13
+ s.require_paths = %w(lib)
14
+ s.license = 'MIT'
15
+
16
+ s.add_dependency 'eventmachine'
17
+ s.add_dependency 'beefcake'
18
+ s.add_dependency 'nokogiri'
19
+ s.add_dependency 'json'
20
+
21
+ s.add_development_dependency 'rspec'
22
+ s.add_development_dependency 'timecop'
23
+ end
data/lib/em-voldemort.rb ADDED
@@ -0,0 +1,11 @@
1
+ require 'uri'
2
+ require 'zlib'
3
+ require 'logger'
4
+ require 'eventmachine'
5
+ require 'beefcake'
6
+ require 'nokogiri'
7
+ require 'json'
8
+
9
+ %w(protobuf protocol errors store router connection cluster compressor binary_json).each do |file|
10
+ require File.join(File.dirname(__FILE__), 'em-voldemort', file)
11
+ end
data/lib/em-voldemort/binary_json.rb ADDED
@@ -0,0 +1,330 @@
1
+ module EM::Voldemort
2
+ # Codec for Voldemort's custom binary serialization format. The Voldemort codebase itself refers
3
+ # to this format as "json", even though it has virtually nothing in common with JSON. It's
4
+ # actually more like Avro, but with less sophisticated schema evolution, and less compact. We're
5
+ # only using it because the Hadoop job for building read-only stores requires it. The format is
6
+ # roughly documented at https://github.com/voldemort/voldemort/wiki/Binary-JSON-Serialization
7
+ #
8
+ # This code is adapted from Alejandro Crosa's voldemort-rb gem (MIT License).
9
+ # https://github.com/acrosa/voldemort-rb
10
+ class BinaryJson
11
+
12
+ attr_reader :has_version_tag
13
+ attr_reader :schema_versions
14
+
15
+ BYTE_MIN_VAL = -2**7
16
+ BYTE_MAX_VAL = 2**7 - 1
17
+ SHORT_MIN_VAL = -2**15
18
+ SHORT_MAX_VAL = 2**15 - 1
19
+ INT_MIN_VAL = -2**31
20
+ INT_MAX_VAL = 2**31 - 1
21
+ LONG_MIN_VAL = -2**63
22
+ LONG_MAX_VAL = 2**63 - 1
23
+ FLOAT_MIN_VAL = 2.0**-149
24
+ DOUBLE_MIN_VAL = 2.0**-1074
25
+ STRING_MAX_LEN = 0x3FFFFFFF
26
+
27
+ def initialize(schema_by_version, has_version_tag=true)
28
+ @has_version_tag = has_version_tag
29
+ @schema_versions = schema_by_version.each_with_object({}) do |(version, schema), hash|
30
+ hash[version.to_i] = parse_schema(schema)
31
+ end
32
+ end
33
+
34
+ # Serializes a Ruby object to binary JSON
35
+ def encode(object)
36
+ ''.force_encoding(Encoding::BINARY).tap do |bytes|
37
+ newest_version = schema_versions.keys.max
38
+ schema = schema_versions[newest_version]
39
+ bytes << newest_version.chr if has_version_tag
40
+ write(object, bytes, schema)
41
+ end
42
+ end
43
+
44
+ # Parses a binary JSON string into Ruby objects
45
+ def decode(bytes)
46
+ bytes.force_encoding(Encoding::BINARY)
47
+ input = StringIO.new(bytes)
48
+ version = has_version_tag ? input.read(1).ord : 0
49
+ schema = schema_versions[version]
50
+ raise ClientError, "no registered schema for version #{version}" unless schema
51
+ read(input, schema)
52
+ end
53
+
54
+ private
55
+
56
+ def parse_schema(schema)
57
+ # tolerate use of single quotes in place of double quotes in the schema
58
+ schema = schema.gsub("'", '"')
59
+
60
+ if schema =~ /\A[\{\[]/
61
+ # check if the schema is a JSON object (map) or array (list), since these are
62
+ # the only ones that JSON.parse() will work with
63
+ JSON.parse(schema)
64
+ else
65
+ # otherwise it's a primitive, so just strip the quotes
66
+ schema.gsub('"', '')
67
+ end
68
+ end
69
+
70
+ # serialization
71
+
72
+ def write(object, bytes, schema)
73
+ case schema
74
+ when Hash
75
+ if object.is_a? Hash
76
+ write_map(object, bytes, schema)
77
+ else
78
+ raise ClientError, "serialization error: #{object.inspect} does not match schema #{schema.inspect}"
79
+ end
80
+ when Array
81
+ if object.is_a? Array
82
+ write_list(object, bytes, schema)
83
+ else
84
+ raise ClientError, "serialization error: #{object.inspect} does not match schema #{schema.inspect}"
85
+ end
86
+ when 'string' then write_bytes( object, bytes)
87
+ when 'int8' then write_int8( object, bytes)
88
+ when 'int16' then write_int16( object, bytes)
89
+ when 'int32' then write_int32( object, bytes)
90
+ when 'int64' then write_int64( object, bytes)
91
+ when 'float32' then write_float32(object, bytes)
92
+ when 'float64' then write_float64(object, bytes)
93
+ when 'date' then write_date( object, bytes)
94
+ when 'bytes' then write_bytes( object, bytes)
95
+ when 'boolean' then write_boolean(object, bytes)
96
+ else raise ClientError, "unrecognised binary json schema: #{schema.inspect}"
97
+ end
98
+ end
99
+
100
+ def write_boolean(object, bytes)
101
+ if object.nil?
102
+ bytes << [BYTE_MIN_VAL].pack('c')
103
+ elsif object
104
+ bytes << 1.chr
105
+ else
106
+ bytes << 0.chr
107
+ end
108
+ end
109
+
110
+ def write_string(object, bytes)
111
+ write_bytes(object, bytes)
112
+ end
113
+
114
+ def write_int8(object, bytes)
115
+ if object.nil?
116
+ bytes << [BYTE_MIN_VAL].pack('c')
117
+ elsif object > BYTE_MIN_VAL && object <= BYTE_MAX_VAL
118
+ bytes << [object].pack('c')
119
+ else
120
+ raise ClientError, "value out of int8 range: #{object}"
121
+ end
122
+ end
123
+
124
+ def write_int16(object, bytes)
125
+ if object.nil?
126
+ bytes << [SHORT_MIN_VAL].pack('n')
127
+ elsif object > SHORT_MIN_VAL && object <= SHORT_MAX_VAL
128
+ bytes << [object].pack('n')
129
+ else
130
+ raise ClientError, "value out of int16 range: #{object}"
131
+ end
132
+ end
133
+
134
+ def write_int32(object, bytes)
135
+ if object.nil?
136
+ bytes << [INT_MIN_VAL].pack('N')
137
+ elsif object > INT_MIN_VAL && object <= INT_MAX_VAL
138
+ bytes << [object].pack('N')
139
+ else
140
+ raise ClientError, "value out of int32 range: #{object}"
141
+ end
142
+ end
143
+
144
+ def write_int64(object, bytes)
145
+ if object.nil?
146
+ bytes << [INT_MIN_VAL, 0].pack('NN')
147
+ elsif object > LONG_MIN_VAL && object <= LONG_MAX_VAL
148
+ bytes << [object / 2**32, object % 2**32].pack('NN')
149
+ else
150
+ raise ClientError, "value out of int64 range: #{object}"
151
+ end
152
+ end
153
+
154
+ def write_float32(object, bytes)
155
+ if object == FLOAT_MIN_VAL
156
+ raise ClientError, "Can't use #{FLOAT_MIN_VAL} because it is used to represent nil"
157
+ else
158
+ bytes << [object || FLOAT_MIN_VAL].pack('g')
159
+ end
160
+ end
161
+
162
+ def write_float64(object, bytes)
163
+ if object == DOUBLE_MIN_VAL
164
+ raise ClientError, "Can't use #{DOUBLE_MIN_VAL} because it is used to represent nil"
165
+ else
166
+ bytes << [object || DOUBLE_MIN_VAL].pack('G')
167
+ end
168
+ end
169
+
170
+ def write_date(object, bytes)
171
+ if object.nil?
172
+ write_int64(nil, bytes)
173
+ else
174
+ write_int64((object.to_f * 1000).to_i, bytes)
175
+ end
176
+ end
177
+
178
+ def write_length(length, bytes)
179
+ if length < SHORT_MAX_VAL
180
+ bytes << [length].pack('n')
181
+ elsif length < STRING_MAX_LEN
182
+ bytes << [length | 0xC0000000].pack('N')
183
+ else
184
+ raise ClientError, 'string is too long to be serialized'
185
+ end
186
+ end
187
+
188
+ def write_bytes(object, bytes)
189
+ if object.nil?
190
+ write_int16(-1, bytes)
191
+ else
192
+ write_length(object.bytesize, bytes) # byte count, not character count
193
+ bytes << object.dup.force_encoding(Encoding::BINARY)
194
+ end
195
+ end
196
+
197
+ def write_map(object, bytes, schema)
198
+ if object.nil?
199
+ bytes << [-1].pack('c')
200
+ else
201
+ bytes << [1].pack('c')
202
+ if object.size != schema.size
203
+ raise ClientError, "Fields of object #{object.inspect} do not match schema #{schema.inspect}"
204
+ end
205
+
206
+ schema.sort.each do |key, value_type|
207
+ if object.has_key?(key.to_s)
208
+ write(object[key.to_s], bytes, value_type)
209
+ elsif object.has_key?(key.to_sym)
210
+ write(object[key.to_sym], bytes, value_type)
211
+ else
212
+ raise ClientError, "Object #{object.inspect} does not have #{key} field required by the schema"
213
+ end
214
+ end
215
+ end
216
+ end
217
+
218
+ def write_list(object, bytes, schema)
219
+ if schema.length != 1
220
+ raise ClientError, "Schema error: a list must have one item, unlike #{schema.inspect}"
221
+ elsif object.nil?
222
+ write_int16(-1, bytes)
223
+ else
224
+ write_length(object.length, bytes)
225
+ object.each {|item| write(item, bytes, schema.first) }
226
+ end
227
+ end
228
+
229
+ # parsing
230
+
231
+ def read(input, schema)
232
+ case schema
233
+ when Hash then read_map(input, schema)
234
+ when Array then read_list(input, schema)
235
+ when 'string' then read_bytes(input)
236
+ when 'int8' then read_int8(input)
237
+ when 'int16' then read_int16(input)
238
+ when 'int32' then read_int32(input)
239
+ when 'int64' then read_int64(input)
240
+ when 'float32' then read_float32(input)
241
+ when 'float64' then read_float64(input)
242
+ when 'date' then read_date(input)
243
+ when 'bytes' then read_bytes(input)
244
+ when 'boolean' then read_boolean(input)
245
+ else raise ClientError, "unrecognised binary json schema: #{schema.inspect}"
246
+ end
247
+ end
248
+
249
+ def read_map(input, schema)
250
+ return nil if input.read(1).unpack('c') == [-1]
251
+ schema.sort.each_with_object({}) do |(key, value_type), object|
252
+ object[key.to_sym] = read(input, value_type)
253
+ end
254
+ end
255
+
256
+ def read_length(input)
257
+ size = input.read(2).unpack('n').first
258
+ if size == 0xFFFF
259
+ -1
260
+ elsif size & 0x8000 > 0
261
+ (size & 0x3FFF) << 16 | input.read(2).unpack('n').first
262
+ else
263
+ size
264
+ end
265
+ end
266
+
267
+ def read_list(input, schema)
268
+ size = read_length(input)
269
+ return nil if size < 0
270
+ [].tap do |object|
271
+ size.times { object << read(input, schema.first) }
272
+ end
273
+ end
274
+
275
+ def read_boolean(input)
276
+ value = input.read(1).unpack('c').first
277
+ return nil if value < 0
278
+ value > 0
279
+ end
280
+
281
+ def read_int8(input)
282
+ value = input.read(1).unpack('c').first
283
+ value unless value == BYTE_MIN_VAL
284
+ end
285
+
286
+ def to_signed(value, bits)
287
+ if value >= 2 ** (bits - 1)
288
+ value - 2 ** bits
289
+ else
290
+ value
291
+ end
292
+ end
293
+
294
+ def read_int16(input)
295
+ value = to_signed(input.read(2).unpack('n').first, 16)
296
+ value unless value == SHORT_MIN_VAL
297
+ end
298
+
299
+ def read_int32(input)
300
+ value = to_signed(input.read(4).unpack('N').first, 32)
301
+ value unless value == INT_MIN_VAL
302
+ end
303
+
304
+ def read_int64(input)
305
+ high, low = input.read(8).unpack('NN')
306
+ value = to_signed(high << 32 | low, 64)
307
+ value unless value == LONG_MIN_VAL
308
+ end
309
+
310
+ def read_float32(input)
311
+ value = input.read(4).unpack('g').first
312
+ value unless value == FLOAT_MIN_VAL
313
+ end
314
+
315
+ def read_float64(input)
316
+ value = input.read(8).unpack('G').first
317
+ value unless value == DOUBLE_MIN_VAL
318
+ end
319
+
320
+ def read_date(input)
321
+ timestamp = read_int64(input)
322
+ timestamp && Time.at(timestamp / 1000.0)
323
+ end
324
+
325
+ def read_bytes(input)
326
+ length = read_length(input)
327
+ input.read(length) if length >= 0
328
+ end
329
+ end
330
+ end
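To make the wire layout used by `write_length`/`read_length` and `write_int64`/`read_int64` easier to follow, here is a standalone sketch of the same two encodings with a round trip. The function names (`encode_length`, `decode_length`, `encode_int64`, `decode_int64`) and constants are hypothetical and exist only for this illustration. The logic mirrors the class above but is not a substitute for it.

```ruby
require 'stringio'

SKETCH_SHORT_MAX = 2**15 - 1   # same threshold as SHORT_MAX_VAL above
SKETCH_MAX_LEN   = 0x3FFFFFFF  # same limit as STRING_MAX_LEN above

# Lengths below the threshold take 2 bytes; longer ones take 4 bytes
# with the top two bits set to mark the long form.
def encode_length(length)
  if length < SKETCH_SHORT_MAX
    [length].pack('n')
  elsif length < SKETCH_MAX_LEN
    [length | 0xC0000000].pack('N')
  else
    raise ArgumentError, 'sequence is too long to be serialized'
  end
end

# Reads a length prefix; returns -1 for the nil marker (0xFFFF).
def decode_length(io)
  first = io.read(2).unpack('n').first
  return -1 if first == 0xFFFF
  return first if (first & 0x8000) == 0             # short form
  ((first & 0x3FFF) << 16) | io.read(2).unpack('n').first
end

# A signed 64-bit integer is written as two big-endian 32-bit halves.
def encode_int64(value)
  [value / 2**32, value % 2**32].pack('NN')
end

# Reassembles the halves and converts from unsigned to two's complement.
def decode_int64(bytes)
  high, low = bytes.unpack('NN')
  unsigned = (high << 32) | low
  unsigned >= 2**63 ? unsigned - 2**64 : unsigned
end

[5, 100_000].each do |n|
  wire = encode_length(n)
  puts "length #{n}: #{wire.bytesize} bytes on the wire, decodes to #{decode_length(StringIO.new(wire))}"
end
puts decode_int64(encode_int64(-5))
```

The split into `value / 2**32` and `value % 2**32` relies on Ruby's floored integer division, which is why negative numbers survive the round trip.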