consul-mutex 0.1.0

data/README.md ADDED
@@ -0,0 +1,95 @@
1
+ Sometimes, you just want some code to run on only one machine in a cluster
2
+ at any particular time. Perhaps you only need one copy running, and you'd
3
+ like to have something ready to fail over, or maybe you want to make sure you
4
+ don't take down all your machines simultaneously for a code upgrade.
5
+
6
+ Either way, `Consul::Mutex` has got you covered.
7
+
8
+
9
+ # Installation
10
+
11
+ It's a gem:
12
+
13
+ gem install consul-mutex
14
+
15
+ There's also the wonders of [the Gemfile](http://bundler.io):
16
+
17
+ gem 'consul-mutex'
18
+
19
+ If you're the sturdy type that likes to run from git:
20
+
21
+ rake install
22
+
23
+ Or, if you've eschewed the convenience of Rubygems entirely, then you
24
+ presumably know what to do already.
25
+
26
+
27
+ # Usage
28
+
29
+ Simply instantiate a new `Consul::Mutex`, giving it the key you want to use
30
+ as the "lock":
31
+
32
+ require 'consul/mutex'
33
+
34
+ mutex = Consul::Mutex.new('/my/something/weird')
35
+
36
+ Then, whenever you want to only have one thing running at once, run the code
37
+ inside a block passed to `#synchronize`:
38
+
39
+ mutex.synchronize { print "There can be"; sleep 5; puts " only one." }
40
+
41
+ If your Consul server is not accessible via `http://localhost:8500`, you'll
42
+ need to tell `Consul::Mutex` where to find it:
43
+
44
+ mutex = Consul::Mutex.new('/some/key', consul_url: 'http://consul:8500')
45
+
46
+ By default, the "value" of the lock resource will be the hostname of the
47
+ machine that it's running on (so you know who has the lock). If, for some
48
+ reason, you'd like to set the value to something else, you can do that, too:
49
+
50
+ mutex = Consul::Mutex.new('/some/key', value: "It is now #{Time.now}")
51
+
52
+
53
+ ## Failure Is An Option
54
+
55
+ One thing that is a bit unsettling about Consul-mediated mutexes is that
56
+ the lock can be "lost" for a variety of reasons. The most common one, of
57
+ course, is simple communications failure -- if you're on the wrong side of a
58
+ split-brain, the rest of the cluster can recover and continue on, and you no
59
+ longer have the lock, because your half of the cluster is dead. Consul also
60
+ allows locks to be force-unlocked by an operator, because otherwise, if
61
+ something died without unlocking, the lock would be held *forever*.
62
+
63
+ As a result of all this, the block of code that you pass to `#synchronize`
64
+ runs on a separate thread, and *can be killed without warning* if the mutex
65
+ determines that it no longer holds the lock. That means you want to be a
66
+ bit extra-careful about idempotence and releasing resources you acquire (via
67
+ `ensure` blocks).
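
To make the warning above concrete, here is a minimal sketch of why `ensure` matters when a thread can be killed from outside. It uses only plain Ruby threads and does not touch Consul or `Consul::Mutex` at all; the names `events`, `worker`, and `log` are illustrative assumptions, and the worker body simply stands in for a block you might pass to `#synchronize`.

```ruby
require 'thread'

# Illustrative stand-in for a block passed to #synchronize. Nothing here
# comes from consul-mutex; it only shows how Thread#kill and `ensure` interact.
events = Queue.new

worker = Thread.new do
  begin
    events << :acquired_resource
    sleep                      # pretend to do long-running work
    events << :finished        # never reached, because the thread is killed
  ensure
    # Cleanup placed here still runs when the thread is killed.
    events << :released_resource
  end
end

started = events.pop           # block until the worker has really started
worker.kill                    # what a supervisor might do on losing a lock
worker.join

log = [started]
log << events.pop until events.empty?
```

After this runs, `log` records the acquisition and the cleanup but not `:finished`, which is the situation your own blocks should be written to tolerate.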
68
+
69
+
70
+ # Contributing
71
+
72
+ Bug reports should be sent to the [GitHub issue
73
+ tracker](https://github.com/mpalmer/consul-mutex/issues), or
74
+ [e-mailed](mailto:theshed+consul-mutex@hezmatt.org). Patches can be sent as a
75
+ GitHub pull request, or [e-mailed](mailto:theshed+consul-mutex@hezmatt.org).
76
+
77
+
78
+ # Licence
79
+
80
+ Unless otherwise stated, everything in this repo is covered by the following
81
+ copyright notice:
82
+
83
+ Copyright (C) 2015 Civilized Discourse Construction Kit Inc.
84
+
85
+ This program is free software: you can redistribute it and/or modify it
86
+ under the terms of the GNU General Public License version 3, as
87
+ published by the Free Software Foundation.
88
+
89
+ This program is distributed in the hope that it will be useful,
90
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
91
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
92
+ GNU General Public License for more details.
93
+
94
+ You should have received a copy of the GNU General Public License
95
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
@@ -0,0 +1,38 @@
1
+ begin
2
+ require 'git-version-bump'
3
+ rescue LoadError
4
+ nil
5
+ end
6
+
7
+ Gem::Specification.new do |s|
8
+ s.name = "consul-mutex"
9
+
10
+ s.version = GVB.version rescue "0.0.0.1.NOGVB"
11
+ s.date = GVB.date rescue Time.now.strftime("%Y-%m-%d")
12
+
13
+ s.platform = Gem::Platform::RUBY
14
+
15
+ s.summary = "Manage distributed code with a Consul-mediated mutex"
16
+
17
+ s.authors = ["Matt Palmer"]
18
+ s.email = ["theshed+consul-mutex@hezmatt.org"]
19
+ s.homepage = "http://theshed.hezmatt.org/consul-mutex"
20
+
21
+ s.files = `git ls-files -z`.split("\0").reject { |f| f =~ /^(G|spec|Rakefile)/ }
22
+
23
+ s.required_ruby_version = ">= 1.9.3"
24
+
25
+ s.add_runtime_dependency 'nesty', '~> 1.0'
26
+
27
+ s.add_development_dependency 'bundler'
28
+ s.add_development_dependency 'github-release'
29
+ s.add_development_dependency 'guard-spork'
30
+ s.add_development_dependency 'guard-rspec'
31
+ s.add_development_dependency 'rake', '~> 10.4', '>= 10.4.2'
32
+ # Needed for guard
33
+ s.add_development_dependency 'rb-inotify', '~> 0.9'
34
+ s.add_development_dependency 'redcarpet'
35
+ s.add_development_dependency 'rspec'
36
+ s.add_development_dependency 'webmock'
37
+ s.add_development_dependency 'yard'
38
+ end
data/lib/.gitkeep ADDED
File without changes
@@ -0,0 +1,452 @@
1
+ require 'json'
2
+ require 'socket'
3
+ require 'uri'
4
+ require 'thwait'
5
+ require 'net/http'
6
+ require 'nesty'
7
+
8
+ if !Kernel.const_defined?(:Consul)
9
+ #:nodoc:
10
+ module Consul; end
11
+ end
12
+
13
+ # A Consul-mediated distributed mutex
14
+ #
15
+ # Sometimes, you just want some code to run on only one machine in a cluster
16
+ # at any particular time. Perhaps you only need one copy running, and you'd
17
+ # like to have something ready to fail over, or maybe you want to make sure
18
+ # you don't take down all your machines simultaneously for a code upgrade.
19
+ #
20
+ # Either way, `Consul::Mutex` has got you covered.
21
+ #
22
+ class Consul::Mutex
23
+ # Indicates something went wrong in the worker thread.
24
+ #
25
+ # Catch this exception if you specifically want to know that your worker
26
+ # thread lost its mind and errored out. The *actual* exception which
27
+ # caused the thread to terminate is available in `#nested`.
28
+ #
29
+ ThreadExceptionError = Class.new(Nesty::NestedStandardError)
30
+
31
+ # Indicates some sort of problem communicating with Consul.
32
+ #
33
+ # Something has gone terribly, terribly wrong, and I need to tell you all
34
+ # about it.
35
+ #
36
+ ConsulError = Class.new(RuntimeError)
37
+
38
+ # Indicates that the worker thread was terminated because we lost the
39
+ # distributed lock.
40
+ #
41
+ # You can't assume *anything* about what state anything is in that you
42
+ # haven't explicitly made true using `ensure`. In general, if you're
43
+ # getting this exception, something is doing terrible, terrible things to
44
+ # your Consul cluster.
45
+ #
46
+ LostLockError = Class.new(RuntimeError)
47
+
48
+ # Internal-only class to abstract out the details of a Consul KV key,
49
+ # parsing it from the HTTP response into something a little easier
50
+ # to deal with.
51
+ #
52
+ class Key
53
+ attr_reader :consul_index, :session, :value
54
+
55
+ def initialize(http_response)
56
+ begin
57
+ json = JSON.parse(http_response.body)
58
+ rescue JSON::ParserError => ex
59
+ raise ConsulError,
60
+ "Consul returned unparseable JSON: #{ex.message}"
61
+ end
62
+
63
+ unless json.is_a?(Array)
64
+ raise ConsulError,
65
+ "Consul did not return an array; instead, it is a #{json.class}"
66
+ end
67
+
68
+ if json.length != 1
69
+ raise ConsulError,
70
+ "Invalid number of objects returned: expected 1, got #{json.length}"
71
+ end
72
+
73
+ json = json.first
74
+
75
+ @consul_index = http_response['X-Consul-Index']
76
+ @session = json['Session']
77
+ @value = json['Value'].nil? ? nil : json['Value'].unpack("m").first
78
+
79
+ @last_cas_index = nil
80
+ end
81
+ end
82
+ private_constant :Key
83
+
84
+ # Create a new Consul-mediated distributed mutex.
85
+ #
86
+ # @param key [String] the path (within the Consul KV store namespace,
87
+ # `/v1/kv`) which will be used as the lock key for all operations on
88
+ # this mutex. Every mutex created with the same key excludes
89
+ # all other mutexes created with the same key.
90
+ #
91
+ # @option opts [String] :value (hostname) the value to set on the lock
92
+ # key when we acquire the lock. This can be anything you like, but
93
+ # generally you'll want to set something that uniquely identifies who
94
+ # is currently holding the lock. The default, the local system's
95
+ # hostname, is generally a good choice.
96
+ #
97
+ # @option opts [String] :consul_url ('http://localhost:8500') where to
98
+ # connect to in order to talk to your local Consul cluster.
99
+ #
100
+ def initialize(key, opts = {})
101
+ @key = key
102
+ @value = opts.fetch(:value, Socket.gethostname)
103
+ @consul_url = URI(opts.fetch(:consul_url, 'http://localhost:8500'))
104
+ end
105
+
106
+ # Run code under the protection of this mutex.
107
+ #
108
+ # This method works similarly to the `Mutex#synchronize` method in the
109
+ # Ruby stdlib. You pass it a block, and only one instance of the code
110
+ # in the block will be running at any given moment.
111
+ #
112
+ # The big difference is that the blocks of code being mutually excluded
113
+ # can be running in separate processes, on separate machines. We are
114
+ # truly living in the future.
115
+ #
116
+ # The slightly smaller difference is that the block of code you specify
117
+ # *may* not actually run, or it might be killed whilst running. You'll
118
+ # always get an exception raised if that occurs, but you'll need to
119
+ # be careful to clean up any state that your code sets using `ensure`
120
+ # blocks (or equivalent).
121
+ #
122
+ # Your block of code is also run on a separate thread within the
123
+ # interpreter from the one in which `#synchronize` itself is called (in
124
+ # case that's important to you).
125
+ #
126
+ # @yield your code will be run once we have acquired the lock which
127
+ # controls this mutex.
128
+ #
129
+ # @raise [ArgumentError] if no block is passed to this method.
130
+ #
131
+ # @raise [RuntimeError] if some sort of internal logic error occurs
132
+ # (always a bug in this code, please report)
133
+ #
134
+ # @raise [SocketError] if a mysterious socket-related error occurs.
135
+ #
136
+ # @raise [ConsulError] if an error occurs talking to Consul. This is
137
+ # indicative of a problem with the Consul agent, or the network.
138
+ #
139
+ # @raise [ThreadExceptionError] if the worker thread exits with an
140
+ # exception during execution. This is indicative of a problem in your
141
+ # code.
142
+ #
143
+ # @raise [LostLockError] if the Consul lock is somehow disrupted while
144
+ # the worker thread is running. This should only happen if the Consul
145
+ # cluster has a serious problem of some sort, or someone fiddles with
146
+ # the lock key behind our backs. Find who is fiddling with the key, and
147
+ # break their fingers.
148
+ #
149
+ def synchronize
150
+ unless block_given?
151
+ raise ArgumentError,
152
+ "No block passed to #{self.inspect}#synchronize"
153
+ end
154
+
155
+ acquire_lock
156
+
157
+ begin
158
+ worker_thread = Thread.new { yield }
159
+ watcher_thread = Thread.new { hold_lock }
160
+
161
+ tw = ThreadsWait.new(worker_thread, watcher_thread)
162
+ finished_thread = tw.next_wait
163
+
164
+ err = nil
165
+
166
+ case finished_thread
167
+ when worker_thread
168
+ # Work completed successfully... excellent
169
+ return worker_thread.value
170
+
171
+ when watcher_thread
172
+ # We lost the lock... fiddlesticks
173
+
174
+ # May as well delete our session now, it's useless
175
+ delete_session
176
+
177
+ k = watcher_thread.value
178
+
179
+ msg = if k.nil?
180
+ "Lost lock, key deleted!"
181
+ else
182
+ if k.session.nil?
183
+ "Lost lock, no active session!"
184
+ else
185
+ "Lost lock to session '#{k.session}'"
186
+ end
187
+ end
188
+
189
+ raise LostLockError, msg
190
+ else
191
+ raise RuntimeError,
192
+ "Mysterious return value from `ThreadsWait#next_wait`: #{finished_thread.inspect}"
193
+ end
194
+ ensure
195
+ watcher_thread.kill
196
+ worker_thread.kill
197
+
198
+ begin
199
+ worker_thread.join
200
+ rescue Exception => ex
201
+ worker_thread = ex
202
+ end
203
+
204
+ begin
205
+ watcher_thread.join
206
+ rescue Exception => ex
207
+ watcher_thread = ex
208
+ end
209
+
210
+ release_lock if @session_id
211
+
212
+ if worker_thread.is_a?(Exception)
213
+ raise ThreadExceptionError.new(
214
+ "Worker thread raised exception",
215
+ worker_thread
216
+ )
217
+ end
218
+
219
+ if watcher_thread.is_a?(Exception)
220
+ if watcher_thread.is_a?(ConsulError)
221
+ raise watcher_thread
222
+ else
223
+ raise RuntimeError,
224
+ "Watcher thread raised exception: " +
225
+ "#{watcher_thread.message} " +
226
+ "(#{watcher_thread.class})"
227
+ end
228
+ end
229
+ end
230
+ end
231
+
232
+ private
233
+
234
+ def acquire_lock
235
+ wait_for_free_lock
236
+
237
+ k = nil
238
+
239
+ while k.nil? or k.session.nil?
240
+ unless set_key(@value, :acquire => session_id)
241
+ return acquire_lock
242
+ end
243
+
244
+ # It may seem extremely weird to do this loop construct, *and*
245
+ # get the key immediately after we've just (seemingly) successfully
246
+ # acquired the lock on the key, but here's the thing: Consul locks
247
+ # are entirely advisory. We can get the lock, then someone else
248
+ # can delete the key and recreate it, and our lock on the key is
249
+ # gone.
250
+ #
251
+ # We can guard against someone doing that without our knowledge by
252
+ # requesting the key and waiting for changes (as we will do, in
253
+ # `hold_lock`), but we can only do that once we have a value for
254
+ # `X-Consul-Index` which corresponds to a version of the key which
255
+ # shows that we do, indeed, hold the lock. Since PUTs on keys
256
+ # don't return `X-Consul-Index`, we need to make a separate GET
257
+ # request in order to get a value for `X-Consul-Index` **and check
258
+ # that response shows we still hold the lock** before we can do
259
+ # our wait-for-change dance in `hold_lock`.
260
+ #
261
+ # Yeah, distributed systems are *fun*, ain't they?
262
+ #
263
+ k = get_key
264
+ end
265
+
266
+ if k and k.session != session_id
267
+ # Someone else managed to get the lock out from underneath us.
268
+ # Do the whole dance again.
269
+ acquire_lock
270
+ end
271
+ end
272
+
273
+ def wait_for_free_lock
274
+ if (k = get_key(!@last_cas_index.nil?)).nil? or k.session.nil?
275
+ # The wait... is over!
276
+ return
277
+ else
278
+ wait_for_free_lock
279
+ end
280
+ end
281
+
282
+ def get_key(on_change = false)
283
+ url = key_url.dup
284
+
285
+ if on_change and @last_cas_index.nil?
286
+ raise RuntimeError,
287
+ "on_change=true and @last_cas_index.nil? -- that shouldn't happen, file a bug"
288
+ end
289
+
290
+ query_args = if on_change and @last_cas_index
291
+ { :index => @last_cas_index }
292
+ else
293
+ {}
294
+ end
295
+
296
+ url.query = URI.encode_www_form({ :consistent => nil }.merge(query_args))
297
+
298
+ consul_connection do |http|
299
+ res = http.get(url.request_uri)
300
+
301
+ case res.code
302
+ when '404'
303
+ nil
304
+ when '200'
305
+ Key.new(res).tap { |k| @last_cas_index = k.consul_index }
306
+ else
307
+ raise ConsulError,
308
+ "Consul returned bad response to GET #{url.request_uri}: #{res.code}: #{res.message}"
309
+ end
310
+ end
311
+ end
312
+
313
+ def set_key(value, query_args = {})
314
+ url = key_url.dup
315
+ url.query = URI.encode_www_form(query_args)
316
+
317
+ consul_connection do |http|
318
+ res = http.request(Net::HTTP::Put.new(url.request_uri), value)
319
+
320
+ if res.code == '200'
321
+ case res.body
322
+ when 'true'
323
+ return true
324
+ when 'false'
325
+ return false
326
+ else
327
+ raise ConsulError,
328
+ "Unexpected response body to PUT #{url.request_uri}: #{res.body.inspect}"
329
+ end
330
+ else
331
+ raise ConsulError,
332
+ "Unexpected response code to PUT #{url.request_uri}: #{res.code} #{res.message}"
333
+ end
334
+ end
335
+ end
336
+
337
+ def delete_key(query_args)
338
+ url = key_url.dup
339
+ url.query = URI.encode_www_form(query_args)
340
+
341
+ consul_connection do |http|
342
+ res = http.request(Net::HTTP::Delete.new(url.request_uri))
343
+
344
+ if res.code != '200'
345
+ raise ConsulError,
346
+ "Unexpected Consul response to DELETE #{url.request_uri}: #{res.code} #{res.message}"
347
+ elsif res.body != 'true' and res.body != 'false'
348
+ raise ConsulError,
349
+ "Unexpected Consul response to DELETE #{url.request_uri}: #{res.body}"
350
+ end
351
+ end
352
+ end
353
+
354
+ def hold_lock
355
+ loop do
356
+ k = get_key(true)
357
+
358
+ if k.nil? or k.session.nil? or k.session != session_id
359
+ return k
360
+ end
361
+ end
362
+ end
363
+
364
+ def release_lock
365
+ unless set_key('', :release => session_id)
366
+ raise ConsulError,
367
+ "Attempt to release lock returned false"
368
+ end
369
+
370
+ delete_session
371
+ k = get_key
372
+ delete_key(:cas => k.consul_index) if k and k.session.nil?
373
+ end
374
+
375
+ def key_url
376
+ @key_url ||= sub_url('/v1/kv/' + @key)
377
+ end
378
+
379
+ def sub_url(path)
380
+ @consul_url.dup.tap do |url|
381
+ url.path += '/' + path
382
+ url.path.gsub!('//', '/')
383
+ end
384
+ end
385
+
386
+ def consul_connection
387
+ begin
388
+ Net::HTTP.start(@consul_url.host, @consul_url.port, :use_ssl => @consul_url.scheme == 'https') do |http|
389
+ yield http
390
+ end
391
+ rescue Timeout::Error
392
+ raise ConsulError,
393
+ "Consul request failed: timeout"
394
+ rescue Errno::ECONNREFUSED
395
+ raise ConsulError,
396
+ "Consul request failed: connection refused"
397
+ rescue SocketError => ex
398
+ if ex.message == 'getaddrinfo: Name or service not known'
399
+ raise ConsulError,
400
+ "Consul request failed: Name resolution failure"
401
+ else
402
+ raise
403
+ end
404
+ rescue Net::HTTPBadResponse => ex
405
+ raise ConsulError,
406
+ "Bad HTTP response from Consul: #{ex.message}"
407
+ end
408
+ end
409
+
410
+ def session_id
411
+ @session_id ||= begin
412
+ consul_connection do |http|
413
+ res = http.request(Net::HTTP::Put.new(sub_url('/v1/session/create'), {}))
414
+
415
+ id = if res.code == '200'
416
+ begin
417
+ JSON.parse(res.body)['ID']
418
+ rescue JSON::ParserError => ex
419
+ raise ConsulError,
420
+ "Unparseable response from Consul session create: #{ex.message}"
421
+ end
422
+ else
423
+ raise ConsulError,
424
+ "Unexpected response from Consul session creation: #{res.code} #{res.message}"
425
+ end
426
+
427
+ if id.nil?
428
+ raise ConsulError,
429
+ "Consul did not provide us a session ID"
430
+ end
431
+
432
+ id
433
+ end
434
+ end
435
+ end
436
+
437
+ def delete_session
438
+ consul_connection do |http|
439
+ res = http.request(Net::HTTP::Put.new(sub_url("/v1/session/destroy/#{@session_id}"), {}))
440
+
441
+ if res.code != '200'
442
+ raise ConsulError,
443
+ "Unexpected Consul response to session deletion: #{res.code} #{res.message}"
444
+ elsif res.body != 'true'
445
+ raise ConsulError,
446
+ "Unexpected Consul response to session deletion: #{res.body}"
447
+ else
448
+ @session_id = nil
449
+ end
450
+ end
451
+ end
452
+ end
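
The `Key` class earlier in this file expects a JSON array holding exactly one object, reads its `Session` field, and Base64-decodes its `Value` field with `unpack("m")`. The following is a rough sketch of that decoding step in isolation. The response body is a hypothetical example written for illustration (the key name, session ID, and index are invented), not output captured from a real Consul agent.

```ruby
require 'json'

# Hypothetical body shaped like a single-key KV response; all field
# values below are made up for this sketch.
body = '[{"Key":"my/something/weird","Value":"d2ViMDE=",' \
       '"Session":"adf4238a-882b-9ddc-4a9d-5b6758e4159e","ModifyIndex":42}]'

entries = JSON.parse(body)
raise "expected exactly one entry" unless entries.is_a?(Array) && entries.length == 1

entry   = entries.first
session = entry['Session']
# The value arrives Base64-encoded; unpack("m") decodes it, as Key#initialize does.
holder  = entry['Value'].nil? ? nil : entry['Value'].unpack("m").first
```

Here `holder` is the decoded lock value (by default the hostname of whoever holds the lock) and `session` identifies the holder's Consul session; a `nil` session is what the library treats as "nobody holds the lock".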