consul-mutex 0.1.0

data/README.md ADDED
@@ -0,0 +1,95 @@
1
+ Sometimes, you just want some code to run on only one machine in a cluster
2
+ at any particular time. Perhaps you only need one copy running, and you'd
3
+ like to have something ready to failover, or maybe you want to make sure you
4
+ don't take down all your machines simultaneously for a code upgrade.
5
+
6
+ Either way, `Consul::Mutex` has got you covered.
7
+
8
+
9
+ # Installation
10
+
11
+ It's a gem:
12
+
13
+     gem install consul-mutex
14
+
15
+ There's also the wonders of [the Gemfile](http://bundler.io):
16
+
17
+     gem 'consul-mutex'
18
+
19
+ If you're the sturdy type that likes to run from git:
20
+
21
+     rake install
22
+
23
+ Or, if you've eschewed the convenience of Rubygems entirely, then you
24
+ presumably know what to do already.
25
+
26
+
27
+ # Usage
28
+
29
+ Simply instantiate a new `Consul::Mutex`, giving it the key you want to use
30
+ as the "lock":
31
+
32
+     require 'consul/mutex'
33
+
34
+     mutex = Consul::Mutex.new('/my/something/weird')
35
+
36
+ Then, whenever you want to only have one thing running at once, run the code
37
+ inside a block passed to `#synchronize`:
38
+
39
+     mutex.synchronize { print "There can be"; sleep 5; puts " only one." }
40
+
41
+ If your Consul server is not accessible via `http://localhost:8500`, you'll
42
+ need to tell `Consul::Mutex` where to find it:
43
+
44
+     mutex = Consul::Mutex.new('/some/key', consul_url: 'http://consul:8500')
45
+
46
+ By default, the "value" of the lock resource will be the hostname of the
47
+ machine that it's running on (so you know who has the lock). If, for some
48
+ reason, you'd like to set the value to something else, you can do that, too:
49
+
50
+     mutex = Consul::Mutex.new('/some/key', value: "It is now #{Time.now}")
51
+
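When you read the lock key back through Consul's KV HTTP API to see who holds the lock, the value arrives Base64-encoded; the library's internal `Key` class decodes it with `unpack("m")`. The following is a minimal sketch of that decoding step. The response body is hand-built for illustration, and the hostname and session ID in it are made up.

```ruby
require 'json'

# Hand-built stand-in for the body of a GET on /v1/kv/some/key.
# The hostname and session ID are invented for this example.
holder = 'web01.example.com'
body = JSON.generate([
  {
    'Key'     => 'some/key',
    'Value'   => [holder].pack('m0'),  # Consul returns values Base64-encoded
    'Session' => '00000000-0000-0000-0000-000000000000'
  }
])

# Same decoding the library applies: a nil value stays nil, anything
# else is Base64-decoded.
entry   = JSON.parse(body).first
decoded = entry['Value'].nil? ? nil : entry['Value'].unpack('m').first

puts "lock held by #{decoded} (session #{entry['Session']})"
```

A key with no `Session` is not currently locked, which is the condition the library waits for before trying to acquire the lock.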
52
+
53
+ ## Failure Is An Option
54
+
55
+ One thing that is a bit unsettling about Consul-mediated mutexes is that
56
+ the lock can be "lost" for a variety of reasons. The most common one, of
57
+ course, is simple communications failure -- if you're on the wrong side of a
58
+ split-brain, the rest of the cluster can recover and continue on, and you no
59
+ longer have the lock, because your half of the cluster is dead. Consul also
60
+ allows locks to be force-unlocked by an operator, because otherwise, if
61
+ something died without unlocking, the lock would be held *forever*.
62
+
63
+ As a result of all this, the block of code that you pass to `#synchronize`
64
+ runs on a separate thread, and *can be killed without warning* if the mutex
65
+ determines that it no longer holds the lock. That means you want to be a
66
+ bit extra-careful about idempotence and releasing resources you acquire (via
67
+ `ensure` blocks).
68
+
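To see why `ensure` matters here, consider what happens when a thread is killed part-way through its work. The sketch below does not use `Consul::Mutex` at all; it uses plain Ruby threads, and the `released` and `started` names are made up for illustration. It shows that an `ensure` clause still runs when the thread is killed, which is what `#synchronize` does to your block if the lock is lost.

```ruby
require 'thread'

released = []          # records which resources were cleaned up
started  = Queue.new   # tells the main thread the worker is inside `begin`

worker = Thread.new do
  begin
    started << :holding_resource
    sleep                    # stand-in for long-running work under the lock
  ensure
    released << :resource    # runs even though the thread is killed
  end
end

started.pop    # wait until the worker is inside the begin/ensure
worker.kill    # what happens to your block when the lock is lost
worker.join

p released
```

Anything acquired outside a `begin`/`ensure` pair gets no such guarantee, so acquire resources only once you are inside the block that releases them.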
69
+
70
+ # Contributing
71
+
72
+ Bug reports should be sent to the [GitHub issue
73
+ tracker](https://github.com/mpalmer/consul-mutex/issues), or
74
+ [e-mailed](mailto:theshed+consul-mutex@hezmatt.org). Patches can be sent as a
75
+ GitHub pull request, or [e-mailed](mailto:theshed+consul-mutex@hezmatt.org).
76
+
77
+
78
+ # Licence
79
+
80
+ Unless otherwise stated, everything in this repo is covered by the following
81
+ copyright notice:
82
+
83
+ Copyright (C) 2015 Civilized Discourse Construction Kit Inc.
84
+
85
+ This program is free software: you can redistribute it and/or modify it
86
+ under the terms of the GNU General Public License version 3, as
87
+ published by the Free Software Foundation.
88
+
89
+ This program is distributed in the hope that it will be useful,
90
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
91
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
92
+ GNU General Public License for more details.
93
+
94
+ You should have received a copy of the GNU General Public License
95
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
@@ -0,0 +1,38 @@
1
+ begin
2
+ require 'git-version-bump'
3
+ rescue LoadError
4
+ nil
5
+ end
6
+
7
+ Gem::Specification.new do |s|
8
+ s.name = "consul-mutex"
9
+
10
+ s.version = GVB.version rescue "0.0.0.1.NOGVB"
11
+ s.date = GVB.date rescue Time.now.strftime("%Y-%m-%d")
12
+
13
+ s.platform = Gem::Platform::RUBY
14
+
15
+ s.summary = "Manage distributed code with a Consul-mediated mutex"
16
+
17
+ s.authors = ["Matt Palmer"]
18
+ s.email = ["theshed+consul-mutex@hezmatt.org"]
19
+ s.homepage = "http://theshed.hezmatt.org/consul-mutex"
20
+
21
+ s.files = `git ls-files -z`.split("\0").reject { |f| f =~ /^(G|spec|Rakefile)/ }
22
+
23
+ s.required_ruby_version = ">= 1.9.3"
24
+
25
+ s.add_runtime_dependency 'nesty', '~> 1.0'
26
+
27
+ s.add_development_dependency 'bundler'
28
+ s.add_development_dependency 'github-release'
29
+ s.add_development_dependency 'guard-spork'
30
+ s.add_development_dependency 'guard-rspec'
31
+ s.add_development_dependency 'rake', '~> 10.4', '>= 10.4.2'
32
+ # Needed for guard
33
+ s.add_development_dependency 'rb-inotify', '~> 0.9'
34
+ s.add_development_dependency 'redcarpet'
35
+ s.add_development_dependency 'rspec'
36
+ s.add_development_dependency 'webmock'
37
+ s.add_development_dependency 'yard'
38
+ end
data/lib/.gitkeep ADDED
File without changes
@@ -0,0 +1,452 @@
1
+ require 'json'
2
+ require 'socket'
3
+ require 'uri'
4
+ require 'thwait'
5
+ require 'net/http'
6
+ require 'nesty'
7
+
8
+ if !Kernel.const_defined?(:Consul)
9
+ #:nodoc:
10
+ module Consul; end
11
+ end
12
+
13
+ # A Consul-mediated distributed mutex
14
+ #
15
+ # Sometimes, you just want some code to run on only one machine in a cluster
16
+ # at any particular time. Perhaps you only need one copy running, and you'd
17
+ # like to have something ready to failover, or maybe you want to make sure
18
+ # you don't take down all your machines simultaneously for a code upgrade.
19
+ #
20
+ # Either way, `Consul::Mutex` has got you covered.
21
+ #
22
+ class Consul::Mutex
23
+ # Indicates something went wrong in the worker thread.
24
+ #
25
+ # Catch this exception if you specifically want to know that your worker
26
+ # thread lost its mind and errored out. The *actual* exception which
27
+ # caused the thread to terminate is available in `#nested`.
28
+ #
29
+ ThreadExceptionError = Class.new(Nesty::NestedStandardError)
30
+
31
+ # Indicates some sort of problem communicating with Consul.
32
+ #
33
+ # Something has gone terribly, terribly wrong, and I need to tell you all
34
+ # about it.
35
+ #
36
+ ConsulError = Class.new(RuntimeError)
37
+
38
+ # Indicates that the worker thread was terminated because we lost the
39
+ # distributed lock.
40
+ #
41
+ # You can't assume *anything* about what state anything is in that you
42
+ # haven't explicitly made true using `ensure`. In general, if you're
43
+ # getting this exception, something is doing terrible, terrible things to
44
+ # your Consul cluster.
45
+ #
46
+ LostLockError = Class.new(RuntimeError)
47
+
48
+ # Internal-only class to abstract out the details of a Consul KV key,
49
+ # parsing it from the HTTP response into something a little easier
50
+ # to deal with.
51
+ #
52
+ class Key
53
+ attr_reader :consul_index, :session, :value
54
+
55
+ def initialize(http_response)
56
+ begin
57
+ json = JSON.parse(http_response.body)
58
+ rescue JSON::ParserError => ex
59
+ raise ConsulError,
60
+ "Consul returned unparseable JSON: #{ex.message}"
61
+ end
62
+
63
+ unless json.is_a?(Array)
64
+ raise ConsulError,
65
+ "Consul did not return an array; instead, it is a #{json.class}"
66
+ end
67
+
68
+ if json.length != 1
69
+ raise ConsulError,
70
+ "Invalid number of objects returned: expected 1, got #{json.length}"
71
+ end
72
+
73
+ json = json.first
74
+
75
+ @consul_index = http_response['X-Consul-Index']
76
+ @session = json['Session']
77
+ @value = json['Value'].nil? ? nil : json['Value'].unpack("m").first
78
+
79
+ @last_cas_index = nil
80
+ end
81
+ end
82
+ private_constant :Key
83
+
84
+ # Create a new Consul-mediated distributed mutex.
85
+ #
86
+ # @param key [String] the path (within the Consul KV store namespace,
87
+ # `/v1/kv`) which will be used as the lock key for all operations on
88
+ # this mutex. Every mutex created with the same key excludes every
89
+ # other mutex created with that key.
90
+ #
91
+ # @option opts [String] :value (hostname) the value to set on the lock
92
+ # key when we acquire the lock. This can be anything you like, but
93
+ # generally you'll want to set something that uniquely identifies who
94
+ # is currently holding the lock. The default, the local system's
95
+ # hostname, is generally a good choice.
96
+ #
97
+ # @option opts [String] :consul_url ('http://localhost:8500') where to
98
+ # connect to in order to talk to your local Consul cluster.
99
+ #
100
+ def initialize(key, opts = {})
101
+ @key = key
102
+ @value = opts.fetch(:value, Socket.gethostname)
103
+ @consul_url = URI(opts.fetch(:consul_url, 'http://localhost:8500'))
+ # Index of the last key version we saw; set by #get_key.
+ @last_cas_index = nil
104
+ end
105
+
106
+ # Run code under the protection of this mutex.
107
+ #
108
+ # This method works similarly to the `Mutex#synchronize` method in the
109
+ # Ruby stdlib. You pass it a block, and only one instance of the code
110
+ # in the block will be running at any given moment.
111
+ #
112
+ # The big difference is that the blocks of code being mutually excluded
113
+ # can be running in separate processes, on separate machines. We are
114
+ # truly living in the future.
115
+ #
116
+ # The slightly smaller difference is that the block of code you specify
117
+ # *may* not actually run, or it might be killed whilst running. You'll
118
+ # always get an exception raised if that occurs, but you'll need to
119
+ # be careful to clean up any state that your code sets using `ensure`
120
+ # blocks (or equivalent).
121
+ #
122
+ # Your block of code is also run on a separate thread within the
123
+ # interpreter from the one in which `#synchronize` itself is called (in
124
+ # case that's important to you).
125
+ #
126
+ # @yield your code will be run once we have acquired the lock which
127
+ # controls this mutex.
128
+ #
129
+ # @raise [ArgumentError] if no block is passed to this method.
130
+ #
131
+ # @raise [RuntimeError] if some sort of internal logic error occurs
132
+ # (always a bug in this code, please report)
133
+ #
134
+ # @raise [SocketError] if a mysterious socket-related error occurs.
135
+ #
136
+ # @raise [ConsulError] if an error occurs talking to Consul. This is
137
+ # indicative of a problem with the Consul agent, or the network.
138
+ #
139
+ # @raise [ThreadExceptionError] if the worker thread exits with an
140
+ # exception during execution. This is indicative of a problem in your
141
+ # code.
142
+ #
143
+ # @raise [LostLockError] if the Consul lock is somehow disrupted while
144
+ # the worker thread is running. This should only happen if the Consul
145
+ # cluster has a serious problem of some sort, or someone fiddles with
146
+ # the lock key behind our backs. Find who is fiddling with the key, and
147
+ # break their fingers.
148
+ #
149
+ def synchronize
150
+ unless block_given?
151
+ raise ArgumentError,
152
+ "No block passed to #{self.inspect}#synchronize"
153
+ end
154
+
155
+ acquire_lock
156
+
157
+ begin
158
+ worker_thread = Thread.new { yield }
159
+ watcher_thread = Thread.new { hold_lock }
160
+
161
+ tw = ThreadsWait.new(worker_thread, watcher_thread)
162
+ finished_thread = tw.next_wait
163
+
164
+ err = nil
165
+
166
+ case finished_thread
167
+ when worker_thread
168
+ # Work completed successfully... excellent
169
+ return worker_thread.value
170
+
171
+ when watcher_thread
172
+ # We lost the lock... fiddlesticks
173
+
174
+ # May as well delete our session now, it's useless
175
+ delete_session
176
+
177
+ k = watcher_thread.value
178
+
179
+ msg = if k.nil?
180
+ "Lost lock, key deleted!"
181
+ else
182
+ if k.session.nil?
183
+ "Lost lock, no active session!"
184
+ else
185
+ "Lost lock to session '#{k.session}'"
186
+ end
187
+ end
188
+
189
+ raise LostLockError, msg
190
+ else
191
+ raise RuntimeError,
192
+ "Mysterious return value from `ThreadsWait#next_wait`: #{finished_thread.inspect}"
193
+ end
194
+ ensure
195
+ watcher_thread.kill
196
+ worker_thread.kill
197
+
198
+ begin
199
+ worker_thread.join
200
+ rescue Exception => ex
201
+ worker_thread = ex
202
+ end
203
+
204
+ begin
205
+ watcher_thread.join
206
+ rescue Exception => ex
207
+ watcher_thread = ex
208
+ end
209
+
210
+ release_lock if @session_id
211
+
212
+ if worker_thread.is_a?(Exception)
213
+ raise ThreadExceptionError.new(
214
+ "Worker thread raised exception",
215
+ worker_thread
216
+ )
217
+ end
218
+
219
+ if watcher_thread.is_a?(Exception)
220
+ if watcher_thread.is_a?(ConsulError)
221
+ raise watcher_thread
222
+ else
223
+ raise RuntimeError,
224
+ "Watcher thread raised exception: " +
225
+ "#{watcher_thread.message} " +
226
+ "(#{watcher_thread.class})"
227
+ end
228
+ end
229
+ end
230
+ end
231
+
232
+ private
233
+
234
+ def acquire_lock
235
+ wait_for_free_lock
236
+
237
+ k = nil
238
+
239
+ while k.nil? or k.session.nil?
240
+ unless set_key(@value, :acquire => session_id)
241
+ return acquire_lock
242
+ end
243
+
244
+ # It may seem extremely weird to do this loop construct, *and*
245
+ # get the key immediately after we've just (seemingly) successfully
246
+ # acquired the lock on the key, but here's the thing: consul locks
247
+ # are entirely advisory. We can get the lock, then someone else
248
+ # can delete the key and recreate it, and our lock on the key is
249
+ # gone.
250
+ #
251
+ # We can guard against someone doing that without our knowledge by
252
+ # requesting the key and waiting for changes (as we will do, in
253
+ # `hold_lock`), but we can only do that once we have a value for
254
+ # `X-Consul-Index` which corresponds to a version of the key which
255
+ # shows that we do, indeed, hold the lock. Since PUTs on keys
256
+ # don't return `X-Consul-Index`, we need to make a separate GET
257
+ # request in order to get a value for `X-Consul-Index` **and check
258
+ # that response shows we still hold the lock** before we can do
259
+ # our wait-for-change dance in `hold_lock`.
260
+ #
261
+ # Yeah, distributed systems are *fun*, ain't they?
262
+ #
263
+ k = get_key
264
+ end
265
+
266
+ if k and k.session != session_id
267
+ # Someone else managed to get the lock out from underneath us.
268
+ # Do the whole dance again.
269
+ acquire_lock
270
+ end
271
+ end
272
+
273
+ def wait_for_free_lock
274
+ if (k = get_key(!@last_cas_index.nil?)).nil? or k.session.nil?
275
+ # The wait... is over!
276
+ return
277
+ else
278
+ wait_for_free_lock
279
+ end
280
+ end
281
+
282
+ def get_key(on_change = false)
283
+ url = key_url.dup
284
+
285
+ if on_change and @last_cas_index.nil?
286
+ raise RuntimeError,
287
+ "on_change=true and @last_cas_index.nil? -- that shouldn't happen, file a bug"
288
+ end
289
+
290
+ query_args = if on_change and @last_cas_index
291
+ { :index => @last_cas_index }
292
+ else
293
+ {}
294
+ end
295
+
296
+ url.query = URI.encode_www_form({ :consistent => nil }.merge(query_args))
297
+
298
+ consul_connection do |http|
299
+ res = http.get(url.request_uri)
300
+
301
+ case res.code
302
+ when '404'
303
+ nil
304
+ when '200'
305
+ Key.new(res).tap { |k| @last_cas_index = k.consul_index }
306
+ else
307
+ raise ConsulError,
308
+ "Consul returned bad response to GET #{url.request_uri}: #{res.code}: #{res.message}"
309
+ end
310
+ end
311
+ end
312
+
313
+ def set_key(value, query_args = {})
314
+ url = key_url.dup
315
+ url.query = URI.encode_www_form(query_args)
316
+
317
+ consul_connection do |http|
318
+ res = http.request(Net::HTTP::Put.new(url.request_uri), value)
319
+
320
+ if res.code == '200'
321
+ case res.body
322
+ when 'true'
323
+ return true
324
+ when 'false'
325
+ return false
326
+ else
327
+ raise ConsulError,
328
+ "Unexpected response body to PUT #{url.request_uri}: #{res.body.inspect}"
329
+ end
330
+ else
331
+ raise ConsulError,
332
+ "Unexpected response code to PUT #{url.request_uri}: #{res.code} #{res.message}"
333
+ end
334
+ end
335
+ end
336
+
337
+ def delete_key(query_args)
338
+ url = key_url.dup
339
+ url.query = URI.encode_www_form(query_args)
340
+
341
+ consul_connection do |http|
342
+ res = http.request(Net::HTTP::Delete.new(url.request_uri))
343
+
344
+ if res.code != '200'
345
+ raise ConsulError,
346
+ "Unexpected Consul response to DELETE #{url.request_uri}: #{res.code} #{res.message}"
347
+ elsif res.body != 'true' and res.body != 'false'
348
+ raise ConsulError,
349
+ "Unexpected Consul response to DELETE #{url.request_uri}: #{res.body}"
350
+ end
351
+ end
352
+ end
353
+
354
+ def hold_lock
355
+ loop do
356
+ k = get_key(true)
357
+
358
+ if k.nil? or k.session.nil? or k.session != session_id
359
+ return k
360
+ end
361
+ end
362
+ end
363
+
364
+ def release_lock
365
+ unless set_key('', :release => session_id)
366
+ raise ConsulError,
367
+ "Attempt to release lock returned false"
368
+ end
369
+
370
+ delete_session
371
+ k = get_key
372
+ delete_key(:cas => k.consul_index) if k and k.session.nil?
373
+ end
374
+
375
+ def key_url
376
+ @key_url ||= sub_url('/v1/kv/' + @key)
377
+ end
378
+
379
+ def sub_url(path)
380
+ @consul_url.dup.tap do |url|
381
+ url.path += '/' + path
382
+ url.path.gsub!('//', '/')
383
+ end
384
+ end
385
+
386
+ def consul_connection
387
+ begin
388
+ Net::HTTP.start(@consul_url.host, @consul_url.port, :use_ssl => @consul_url.scheme == 'https') do |http|
389
+ yield http
390
+ end
391
+ rescue Timeout::Error
392
+ raise ConsulError,
393
+ "Consul request failed: timeout"
394
+ rescue Errno::ECONNREFUSED
395
+ raise ConsulError,
396
+ "Consul request failed: connection refused"
397
+ rescue SocketError => ex
398
+ if ex.message == 'getaddrinfo: Name or service not known'
399
+ raise ConsulError,
400
+ "Consul request failed: Name resolution failure"
401
+ else
402
+ raise
403
+ end
404
+ rescue Net::HTTPBadResponse => ex
405
+ raise ConsulError,
406
+ "Bad HTTP response from Consul: #{ex.message}"
407
+ end
408
+ end
409
+
410
+ def session_id
411
+ @session_id ||= begin
412
+ consul_connection do |http|
413
+ res = http.request(Net::HTTP::Put.new(sub_url('/v1/session/create').request_uri, {}))
414
+
415
+ id = if res.code == '200'
416
+ begin
417
+ JSON.parse(res.body)['ID']
418
+ rescue JSON::ParserError => ex
419
+ raise ConsulError,
420
+ "Unparseable response from Consul session create: #{ex.message}"
421
+ end
422
+ else
423
+ raise ConsulError,
424
+ "Unexpected response from Consul session creation: #{res.code} #{res.message}"
425
+ end
426
+
427
+ if id.nil?
428
+ raise ConsulError,
429
+ "Consul did not provide us a session ID"
430
+ end
431
+
432
+ id
433
+ end
434
+ end
435
+ end
436
+
437
+ def delete_session
438
+ consul_connection do |http|
439
+ res = http.request(Net::HTTP::Put.new(sub_url("/v1/session/destroy/#{@session_id}").request_uri, {}))
440
+
441
+ if res.code != '200'
442
+ raise ConsulError,
443
+ "Unexpected Consul response to session deletion: #{res.code} #{res.message}"
444
+ elsif res.body != 'true'
445
+ raise ConsulError,
446
+ "Unexpected Consul response to session deletion: #{res.body}"
447
+ else
448
+ @session_id = nil
449
+ end
450
+ end
451
+ end
452
+ end
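The core of `#synchronize` is a race between two threads: the worker running your block and the watcher holding the lock, with whichever finishes first deciding the outcome. The library uses `ThreadsWait` from `thwait` for this, which is no longer shipped with Ruby as of 2.7. As an illustration only, here is a sketch of the same pattern using just `Thread` and `Queue`. The `first_to_finish` helper is hypothetical and is not part of this gem.

```ruby
require 'thread'

# Hypothetical helper: run each task on its own thread and return
# [name, value] for whichever finishes first. The remaining threads are
# killed. If the first task to finish raised, re-raise its exception.
def first_to_finish(tasks)
  done = Queue.new

  threads = tasks.map do |name, task|
    Thread.new do
      begin
        done << [name, task.call, nil]
      rescue StandardError => ex
        done << [name, nil, ex]
      end
    end
  end

  name, value, error = done.pop   # blocks until one task reports in
  raise error if error
  [name, value]
ensure
  # Mirror the library's cleanup: kill both threads, then wait for them.
  threads.each { |t| t.kill; t.join } if threads
end

winner, value = first_to_finish(
  worker:  -> { 6 * 7 },   # stand-in for the block given to #synchronize
  watcher: -> { sleep }    # stand-in for the long-poll loop in hold_lock
)

puts "#{winner} finished first with #{value.inspect}"
```

If the watcher wins instead, the caller knows the lock was lost and can raise accordingly, as `#synchronize` does with `LostLockError`.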