em-voldemort 0.1.5

data/.gitignore ADDED
@@ -0,0 +1 @@
+ /Gemfile.lock
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source 'https://rubygems.org'
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright (c) 2013 LinkedIn Corp.
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,78 @@
+ EM::Voldemort
+ =============
+
+ A Ruby client for [Voldemort](http://www.project-voldemort.com/), implemented using
+ [EventMachine](http://rubyeventmachine.com/).
+
+ Features:
+
+ * High-performance, non-blocking access to Voldemort clusters.
+ * Fault-tolerant: automatically retries failed requests, routes requests to replica nodes if the
+   primary node is down, reconnects when a connection is lost, etc.
+ * Supports client-side routing using the same consistent hashing algorithm as the Java client.
+ * Keys and values in Voldemort's [Binary JSON](https://github.com/voldemort/voldemort/wiki/Binary-JSON-Serialization)
+   are automatically serialized from and deserialized to Ruby hashes and arrays.
+ * Transparent gzip compression (like the Java client).
+
+ Limitations:
+
+ * Can only be used to access
+   [read-only stores](https://github.com/voldemort/voldemort/wiki/Build-and-Push-Jobs-for-Voldemort-Read-Only-Stores)
+   (the type of store that you build in a batch job in Hadoop and bulk-load into the Voldemort
+   cluster). Accessing read-write stores (which use BDB or MySQL as the storage engine) is not currently
+   supported. This cuts out a lot of complexity (quorum reads/writes, conflict resolution, zoning,
+   etc.).
+ * Currently only supports gzip or uncompressed data, none of the other compression codecs.
+ * Currently doesn't support serialization formats other than Binary JSON and raw bytes.
+
+ Compatibility:
+
+ * Ruby 1.9 and above (not compatible with 1.8).
+ * Only tested on MRI, but ought to work on any Ruby implementation.
+ * Should work with a wide range of Voldemort versions (this client uses the `pb0` Protocol
+   Buffers-based protocol).
+
+
+ Usage
+ -----
+
+ `gem install em-voldemort` or add `gem 'em-voldemort'` to your Gemfile.
+
+ The client is initialized by giving it the hostname and port of any node in the cluster. That node
+ will be contacted to discover the other nodes and the configuration of the stores.
+ `EM::Voldemort::Cluster` is a client for an entire Voldemort cluster, which may have many stores;
+ `EM::Voldemort::Store` is the preferred way of accessing one particular store.
+
+ You get the store object from the cluster (you can do this during the initialization of your app --
+ EventMachine doesn't need to be running yet):
+
+     require 'em-voldemort'
+     MY_VOLDEMORT_CLUSTER = EM::Voldemort::Cluster.new(:host => 'voldemort.example.com', :port => 6666)
+     MY_VOLDEMORT_STORE = MY_VOLDEMORT_CLUSTER.store('my_store')
+
+     # Alternative convenience method, using a URL:
+     MY_VOLDEMORT_STORE = EM::Voldemort::Store.from_url('voldemort://voldemort.example.com:6666/my_store')
+
+ Making requests is then straightforward:
+
+     request = MY_VOLDEMORT_STORE.get('key-to-look-up')
+     request.callback {|response| puts "value: #{response}" }
+     request.errback {|error| puts "request failed: #{error}" }
+
+ On successful requests, the value passed to the callback is fully decoded (gzip decompressed and/or
+ Binary JSON decoded, if appropriate). On failed requests, an exception object is passed to the
+ errback. The exception object is of one of the following types:
+
+ * `EM::Voldemort::ClientError` -- like an HTTP 400 series error. Something is wrong with the request,
+   and retrying it won't help.
+ * `EM::Voldemort::KeyNotFound` -- subclass of `ClientError`, indicates that the given key was not
+   found in the store (like HTTP 404).
+ * `EM::Voldemort::ServerError` -- like an HTTP 500 series error or network error. We were not able to
+   get a valid response from the cluster. This gem automatically retries requests, so there's no
+   point in immediately retrying the request in application code (though you may want to retry after
+   a delay, if your application allows).
+
+ To gracefully shut down the client (which allows any requests in flight to complete, but
+ stops any further requests from being made), close the cluster:
+
+     MY_VOLDEMORT_CLUSTER.close
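The README's error list implies an ordering concern when dispatching on the exception class in an errback: because `KeyNotFound` is a subclass of `ClientError`, it has to be matched first. The following is only a minimal sketch of that dispatch, not code from the gem. The stand-in classes mirror the hierarchy the README describes (the `StandardError` superclasses are an assumption), and the `action_for` helper and its return symbols are hypothetical names chosen for illustration.

```ruby
# Stand-in classes mirroring the hierarchy described in the README.
# In a real application these come from the em-voldemort gem; the
# StandardError superclasses here are an assumption.
module VoldemortErrorSketch
  class ClientError < StandardError; end
  class KeyNotFound < ClientError; end
  class ServerError < StandardError; end

  # Hypothetical helper: maps the exception passed to an errback to an
  # application-level decision.
  def self.action_for(error)
    case error
    when KeyNotFound then :treat_as_missing   # like HTTP 404; must precede ClientError
    when ClientError then :fix_request        # retrying the same request won't help
    when ServerError then :retry_after_delay  # the gem has already retried internally
    else :unknown
    end
  end
end
```

In an errback this could be used as `request.errback {|error| handle(VoldemortErrorSketch.action_for(error)) }`, where `handle` is whatever the application defines.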
@@ -0,0 +1,23 @@
+ require 'date'
+
+ Gem::Specification.new do |s|
+   s.name = 'em-voldemort'
+   s.authors = ['Martin Kleppmann']
+   s.email = 'martin@kleppmann.com'
+   s.version = '0.1.5'
+   s.summary = %q{Client for Voldemort}
+   s.description = %q{EventMachine implementation of a Voldemort client. Currently limited to read-only stores.}
+   s.homepage = 'https://github.com/rapportive-oss/em-voldemort'
+   s.date = Date.today.to_s
+   s.files = `git ls-files`.split("\n")
+   s.require_paths = %w(lib)
+   s.license = 'MIT'
+
+   s.add_dependency 'eventmachine'
+   s.add_dependency 'beefcake'
+   s.add_dependency 'nokogiri'
+   s.add_dependency 'json'
+
+   s.add_development_dependency 'rspec'
+   s.add_development_dependency 'timecop'
+ end
@@ -0,0 +1,11 @@
+ require 'uri'
+ require 'zlib'
+ require 'logger'
+ require 'eventmachine'
+ require 'beefcake'
+ require 'nokogiri'
+ require 'json'
+
+ %w(protobuf protocol errors store router connection cluster compressor binary_json).each do |file|
+   require File.join(File.dirname(__FILE__), 'em-voldemort', file)
+ end
@@ -0,0 +1,330 @@
+ module EM::Voldemort
+   # Codec for Voldemort's custom binary serialization format. The Voldemort codebase itself refers
+   # to this format as "json", even though it has virtually nothing in common with JSON. It's
+   # actually more like Avro, but with less sophisticated schema evolution, and less compact. We're
+   # only using it because the Hadoop job for building read-only stores requires it. The format is
+   # roughly documented at https://github.com/voldemort/voldemort/wiki/Binary-JSON-Serialization
+   #
+   # This code is adapted from Alejandro Crosa's voldemort-rb gem (MIT License).
+   # https://github.com/acrosa/voldemort-rb
+   class BinaryJson
+
+     attr_reader :has_version_tag
+     attr_reader :schema_versions
+
+     BYTE_MIN_VAL = -2**7
+     BYTE_MAX_VAL = 2**7 - 1
+     SHORT_MIN_VAL = -2**15
+     SHORT_MAX_VAL = 2**15 - 1
+     INT_MIN_VAL = -2**31
+     INT_MAX_VAL = 2**31 - 1
+     LONG_MIN_VAL = -2**63
+     LONG_MAX_VAL = 2**63 - 1
+     FLOAT_MIN_VAL = 2.0**-149
+     DOUBLE_MIN_VAL = 2.0**-1074
+     STRING_MAX_LEN = 0x3FFFFFFF
+
+     def initialize(schema_by_version, has_version_tag=true)
+       @has_version_tag = has_version_tag
+       @schema_versions = schema_by_version.each_with_object({}) do |(version, schema), hash|
+         hash[version.to_i] = parse_schema(schema)
+       end
+     end
+
+     # Serializes a Ruby object to binary JSON
+     def encode(object)
+       ''.force_encoding(Encoding::BINARY).tap do |bytes|
+         newest_version = schema_versions.keys.max
+         schema = schema_versions[newest_version]
+         bytes << newest_version.chr if has_version_tag
+         write(object, bytes, schema)
+       end
+     end
+
+     # Parses a binary JSON string into Ruby objects
+     def decode(bytes)
+       bytes.force_encoding(Encoding::BINARY)
+       input = StringIO.new(bytes)
+       version = has_version_tag ? input.read(1).ord : 0
+       schema = schema_versions[version]
+       raise ClientError, "no registered schema for version #{version}" unless schema
+       read(input, schema)
+     end
+
+     private
+
+     def parse_schema(schema)
+       # tolerate use of single quotes in place of double quotes in the schema
+       schema = schema.gsub("'", '"')
+
+       if schema =~ /\A[\{\[]/
+         # the schema is a JSON object (map) or list, since these are
+         # the only top-level values that JSON.parse() will work with
+         JSON.parse(schema)
+       else
+         # otherwise it's a primitive, so just strip the quotes
+         schema.gsub('"', '')
+       end
+     end
+
+     # serialization
+
+     def write(object, bytes, schema)
+       case schema
+       when Hash
+         if object.is_a? Hash
+           write_map(object, bytes, schema)
+         else
+           raise ClientError, "serialization error: #{object.inspect} does not match schema #{schema.inspect}"
+         end
+       when Array
+         if object.is_a? Array
+           write_list(object, bytes, schema)
+         else
+           raise ClientError, "serialization error: #{object.inspect} does not match schema #{schema.inspect}"
+         end
+       when 'string' then write_bytes(object, bytes)
+       when 'int8' then write_int8(object, bytes)
+       when 'int16' then write_int16(object, bytes)
+       when 'int32' then write_int32(object, bytes)
+       when 'int64' then write_int64(object, bytes)
+       when 'float32' then write_float32(object, bytes)
+       when 'float64' then write_float64(object, bytes)
+       when 'date' then write_date(object, bytes)
+       when 'bytes' then write_bytes(object, bytes)
+       when 'boolean' then write_boolean(object, bytes)
+       else raise ClientError, "unrecognised binary json schema: #{schema.inspect}"
+       end
+     end
+
+     def write_boolean(object, bytes)
+       if object.nil?
+         bytes << [BYTE_MIN_VAL].pack('c')
+       elsif object
+         bytes << 1.chr
+       else
+         bytes << 0.chr
+       end
+     end
+
+     def write_string(object, bytes)
+       write_bytes(object, bytes)
+     end
+
+     def write_int8(object, bytes)
+       if object.nil?
+         bytes << [BYTE_MIN_VAL].pack('c')
+       elsif object > BYTE_MIN_VAL && object <= BYTE_MAX_VAL
+         bytes << [object].pack('c')
+       else
+         raise ClientError, "value out of int8 range: #{object}"
+       end
+     end
+
+     def write_int16(object, bytes)
+       if object.nil?
+         bytes << [SHORT_MIN_VAL].pack('n')
+       elsif object > SHORT_MIN_VAL && object <= SHORT_MAX_VAL
+         bytes << [object].pack('n')
+       else
+         raise ClientError, "value out of int16 range: #{object}"
+       end
+     end
+
+     def write_int32(object, bytes)
+       if object.nil?
+         bytes << [INT_MIN_VAL].pack('N')
+       elsif object > INT_MIN_VAL && object <= INT_MAX_VAL
+         bytes << [object].pack('N')
+       else
+         raise ClientError, "value out of int32 range: #{object}"
+       end
+     end
+
+     def write_int64(object, bytes)
+       if object.nil?
+         bytes << [INT_MIN_VAL, 0].pack('NN')
+       elsif object > LONG_MIN_VAL && object <= LONG_MAX_VAL
+         bytes << [object / 2**32, object % 2**32].pack('NN')
+       else
+         raise ClientError, "value out of int64 range: #{object}"
+       end
+     end
+
+     def write_float32(object, bytes)
+       if object == FLOAT_MIN_VAL
+         raise ClientError, "Can't use #{FLOAT_MIN_VAL} because it is used to represent nil"
+       else
+         bytes << [object || FLOAT_MIN_VAL].pack('g')
+       end
+     end
+
+     def write_float64(object, bytes)
+       if object == DOUBLE_MIN_VAL
+         raise ClientError, "Can't use #{DOUBLE_MIN_VAL} because it is used to represent nil"
+       else
+         bytes << [object || DOUBLE_MIN_VAL].pack('G')
+       end
+     end
+
+     def write_date(object, bytes)
+       if object.nil?
+         write_int64(nil, bytes)
+       else
+         write_int64((object.to_f * 1000).to_i, bytes)
+       end
+     end
+
+     def write_length(length, bytes)
+       if length < SHORT_MAX_VAL
+         bytes << [length].pack('n')
+       elsif length < STRING_MAX_LEN
+         bytes << [length | 0xC0000000].pack('N')
+       else
+         raise ClientError, 'string is too long to be serialized'
+       end
+     end
+
+     def write_bytes(object, bytes)
+       if object.nil?
+         write_int16(-1, bytes)
+       else
+         write_length(object.bytesize, bytes) # byte count, not character count
+         bytes << object.dup.force_encoding(Encoding::BINARY) # keep the buffer binary
+       end
+     end
+
+     def write_map(object, bytes, schema)
+       if object.nil?
+         bytes << [-1].pack('c')
+       else
+         bytes << [1].pack('c')
+         if object.size != schema.size
+           raise ClientError, "Fields of object #{object.inspect} do not match schema #{schema.inspect}"
+         end
+
+         schema.sort.each do |key, value_type|
+           if object.has_key?(key.to_s)
+             write(object[key.to_s], bytes, value_type)
+           elsif object.has_key?(key.to_sym)
+             write(object[key.to_sym], bytes, value_type)
+           else
+             raise ClientError, "Object #{object.inspect} does not have #{key} field required by the schema"
+           end
+         end
+       end
+     end
+
+     def write_list(object, bytes, schema)
+       if schema.length != 1
+         raise ClientError, "Schema error: a list must have one item, unlike #{schema.inspect}"
+       elsif object.nil?
+         write_int16(-1, bytes)
+       else
+         write_length(object.length, bytes)
+         object.each {|item| write(item, bytes, schema.first) }
+       end
+     end
+
+     # parsing
+
+     def read(input, schema)
+       case schema
+       when Hash then read_map(input, schema)
+       when Array then read_list(input, schema)
+       when 'string' then read_bytes(input)
+       when 'int8' then read_int8(input)
+       when 'int16' then read_int16(input)
+       when 'int32' then read_int32(input)
+       when 'int64' then read_int64(input)
+       when 'float32' then read_float32(input)
+       when 'float64' then read_float64(input)
+       when 'date' then read_date(input)
+       when 'bytes' then read_bytes(input)
+       when 'boolean' then read_boolean(input)
+       else raise ClientError, "unrecognised binary json schema: #{schema.inspect}"
+       end
+     end
+
+     def read_map(input, schema)
+       return nil if input.read(1).unpack('c') == [-1]
+       schema.sort.each_with_object({}) do |(key, value_type), object|
+         object[key.to_sym] = read(input, value_type)
+       end
+     end
+
+     def read_length(input)
+       size = input.read(2).unpack('n').first
+       if size == 0xFFFF
+         -1
+       elsif size & 0x8000 > 0
+         (size & 0x3FFF) << 16 | input.read(2).unpack('n').first
+       else
+         size
+       end
+     end
+
+     def read_list(input, schema)
+       size = read_length(input)
+       return nil if size < 0
+       [].tap do |object|
+         size.times { object << read(input, schema.first) }
+       end
+     end
+
+     def read_boolean(input)
+       value = input.read(1).unpack('c').first
+       return nil if value < 0
+       value > 0
+     end
+
+     def read_int8(input)
+       value = input.read(1).unpack('c').first
+       value unless value == BYTE_MIN_VAL
+     end
+
+     def to_signed(value, bits)
+       if value >= 2 ** (bits - 1)
+         value - 2 ** bits
+       else
+         value
+       end
+     end
+
+     def read_int16(input)
+       value = to_signed(input.read(2).unpack('n').first, 16)
+       value unless value == SHORT_MIN_VAL
+     end
+
+     def read_int32(input)
+       value = to_signed(input.read(4).unpack('N').first, 32)
+       value unless value == INT_MIN_VAL
+     end
+
+     def read_int64(input)
+       high, low = input.read(8).unpack('NN')
+       value = to_signed(high << 32 | low, 64)
+       value unless value == LONG_MIN_VAL
+     end
+
+     def read_float32(input)
+       value = input.read(4).unpack('g').first
+       value unless value == FLOAT_MIN_VAL
+     end
+
+     def read_float64(input)
+       value = input.read(8).unpack('G').first
+       value unless value == DOUBLE_MIN_VAL
+     end
+
+     def read_date(input)
+       timestamp = read_int64(input)
+       timestamp && Time.at(timestamp / 1000.0)
+     end
+
+     def read_bytes(input)
+       length = read_length(input)
+       input.read(length) if length >= 0
+     end
+   end
+ end
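The variable-width length prefix used by `write_length` and `read_length` in the file above is easier to follow in isolation. The sketch below re-implements just that pair outside the gem, using only the standard library, to show the round trip: lengths below `SHORT_MAX_VAL` take two bytes, larger ones take four bytes with the top two bits set, and `0xFFFF` marks nil. The module name `LengthPrefixSketch` and the use of `ArgumentError` are illustrative choices, not part of the package; the gem's methods append to a shared buffer and raise `ClientError` instead.

```ruby
require 'stringio'

# Standalone sketch of the length-prefix scheme; names are hypothetical.
module LengthPrefixSketch
  SHORT_MAX_VAL  = 2**15 - 1
  STRING_MAX_LEN = 0x3FFFFFFF

  # Returns the encoded prefix as a binary string (2 or 4 bytes, big-endian).
  def self.write_length(length)
    if length < SHORT_MAX_VAL
      [length].pack('n')                 # short form: top bit clear
    elsif length < STRING_MAX_LEN
      [length | 0xC0000000].pack('N')    # long form: top two bits set
    else
      raise ArgumentError, 'string is too long to be serialized'
    end
  end

  # Reads a prefix from an IO-like object; returns -1 for the nil marker.
  def self.read_length(input)
    size = input.read(2).unpack('n').first
    if size == 0xFFFF
      -1                                 # nil marker
    elsif size & 0x8000 > 0
      # long form: mask off the flag bits, then append the low 16 bits
      (size & 0x3FFF) << 16 | input.read(2).unpack('n').first
    else
      size
    end
  end
end

[0, 5, 32_766, 32_767, 100_000].each do |n|
  encoded = LengthPrefixSketch.write_length(n)
  decoded = LengthPrefixSketch.read_length(StringIO.new(encoded))
  puts "#{n}: #{encoded.bytesize} bytes, decodes to #{decoded}"
end
```

Note the boundary: 32,766 still fits the two-byte form, while 32,767 (`SHORT_MAX_VAL`) already uses four bytes, because the comparison is strict.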