zk 1.6.1 → 1.6.2
- data/README.markdown +6 -61
- data/RELEASES.markdown +6 -0
- data/lib/zk/client/state_mixin.rb +0 -45
- data/lib/zk/client/threaded.rb +93 -14
- data/lib/zk/version.rb +1 -1
- data/lib/zk.rb +1 -1
- data/spec/zk/pool_spec.rb +1 -0
- metadata +4 -4
data/README.markdown
CHANGED
@@ -65,6 +65,12 @@ In addition to all of that, I would like to think that the public API the ZK::Cl
 [zk-eventmachine]: https://github.com/slyphon/zk-eventmachine
 
 ## NEWS ##
+### v1.6.2 ###
+
+* Change state call to reduce the chances of deadlocks
+
+One of the problems I've been seeing is that during some kind of shutdown event, some method will call `closed?` or `connected?` which will acquire a mutex and make a call on the underlying connection at the *exact* moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
+
 ### v1.6.0 ###
 
 * Locker cleanup code!
@@ -89,67 +95,6 @@ Will go through your locker nodes one by one and try to lock and unlock them. If
 
 * 'private' is not 'protected'. I've been writing ruby for several years now, and apparently I'd forgotten that 'protected' does not work like how it does in java. The visibility of these methods has been corrected, and all specs pass, so I don't expect issues...but please report if this change causes any bugs in user code.
 
-### v1.5.2 ###
-
-* Fix locker cleanup code to avoid a nasty race when a session is lost, see [issue #34](https://github.com/slyphon/zk/issues/34)
-
-* Fix potential deadlock in ForkHook code so the mutex is unlocked in the case of an exception
-
-* Do not hang forever when shutting down and the shutdown thread does not exit (wait 30 seconds).
-
-### v1.5.1 ###
-
-* Added a `:retry_duration` option to the Threaded client constructor which will allows the user to specify for how long in the case of a connection loss, should an operation wait for the connection to be re-established before retrying the operation. This can be set at a global level and overridden on a per-call basis. The default is to not retry (which may change at a later date). Generally speaking, a timeout of > 30s is probably excessive, and care should be taken because during a connection loss, the server-side state may change without you being aware of it (i.e. events will not be delivered).
-
-* Small fork-hook implementation fix. Previously we were using WeakRefs so that hooks would not prevent an object from being garbage collected. This has been replaced with a finalizer which is more deterministic.
-
-### v1.5.0 ###
-
-Ok, now seriously this time. I think all of the forking issues are done.
-
-* Implemented a 'stop the world' feature to ensure safety when forking. All threads are stopped, but state is preserved. `fork()` can then be called safely, and after fork returns, all threads will be restarted in the parent, and the connection will be torn down and reopened in the child.
-
-* The easiest, and supported, way of doing this is now to call `ZK.install_fork_hook` after requiring zk. This will install an `alias_method_chain` style hook around the `Kernel.fork` method, which handles pausing all clients in the parent, calling fork, then resuming in the parent and reconnecting in the child. If you're using ZK in resque, I *highly* recommend using this approach, as it will give the most consistent results.
-
-In your app that requires an open ZK instance and `fork()`:
-
-```ruby
-
-require 'zk'
-ZK.install_fork_hook
-
-```
-
-Then use fork as you normally would.
-
-* Logging is now off by default, but we now use the excellent, can't-recommend-it-enough, [logging](https://github.com/TwP/logging) gem. If you want to tap into the ZK logs, you can assign a stdlib compliant logger to `ZK.logger` and that will be used. Otherwise, you can use the Logging framework's controls. All ZK logs are consolidated under the 'ZK' logger instance.
-
-
-### v1.4.1 ###
-
-* All users of resque or other libraries that depend on `fork()` are encouraged to upgrade immediately. This version of ZK features the `zookeeper-1.1.0` gem with a completely rewritten backend that provides true fork safety. The rules still apply (you must call `#reopen` on your client as soon as possible in the child process) but you can be assured a much more stable experience.
-
-### v1.4.0 ###
-
-* Added a new `:ignore` option for convenience when you don't care if an operation fails. In the case of a failure, the method will return nil instead of raising an exception. This option works for `children`, `create`, `delete`, `get`, `get_acl`, `set`, and `set_acl`. `stat` will ignore the option (because it doesn't care about the state of a node).
-
-```
-# so instead of having to do:
-
-begin
-zk.delete('/some/path')
-rescue ZK::Exceptions;:NoNode
-end
-
-# you can do
-
-zk.delete('/some/path', :ignore => :no_node)
-
-```
-
-* MASSIVE fork/parent/child test around event delivery and much greater stability expected for linux (with the zookeeper-1.0.3 gem). Again, please see the documentation on the wiki about [proper fork procedure](http://github.com/slyphon/zk/wiki/Forking).
-
-
 
 ## Caveats
 
data/RELEASES.markdown
CHANGED
@@ -1,5 +1,11 @@
 This file notes feature differences and bugfixes contained between releases.
 
+### v1.6.2 ###
+
+* Change state call to reduce the chances of deadlocks
+
+One of the problems I've been seeing is that during some kind of shutdown event, some method will call `closed?` or `connected?` which will acquire a mutex and make a call on the underlying connection at the *exact* moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
+
 ### v1.6.1 ###
 
 * Small fixes for zk-eventmachine compatibilty
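The v1.6.2 note above describes answering state queries from a cached value rather than from the underlying connection. A minimal sketch of that idea follows. It is not the gem's code: the class `CachedStateClient`, the method `record_event`, and the state symbols are hypothetical names chosen for illustration, and the event-delivery path that would call `record_event` is assumed to exist elsewhere.

```ruby
require 'thread'

# Hypothetical illustration only: state queries read a cached value that the
# event-delivery path updates, so they never call into the connection object.
class CachedStateClient
  def initialize
    @mutex = Mutex.new
    @last_state = :connecting # assumed initial state before any event arrives
  end

  # Would be called by whatever delivers session events (assumed, not shown).
  def record_event(new_state)
    @mutex.synchronize { @last_state = new_state }
  end

  # Answers from the cache; holds the mutex only for a simple comparison.
  def connected?
    @mutex.synchronize { @last_state == :connected }
  end

  def state
    @mutex.synchronize { @last_state }
  end
end

client = CachedStateClient.new
before = client.connected?       # no event delivered yet
client.record_event(:connected)  # simulate the connected event
after = client.connected?
```

Because the critical section contains only a comparison against an instance variable, the query cannot block on the connection while holding the lock, which appears to be the situation the release note wants to avoid.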
data/lib/zk/client/state_mixin.rb
CHANGED
@@ -3,51 +3,6 @@ module ZK
     # Provides client-state related methods. Included in ZK::Client::Base.
     # (refactored out to this class to ease documentation overload)
     module StateMixin
-      # Returns true if the underlying connection is in the +connected+ state.
-      def connected?
-        wrap_state_closed_error { cnx and cnx.connected? }
-      end
-
-      # is the underlying connection is in the +associating+ state?
-      # @return [bool]
-      def associating?
-        wrap_state_closed_error { cnx and cnx.associating? }
-      end
-
-      # is the underlying connection is in the +connecting+ state?
-      # @return [bool]
-      def connecting?
-        wrap_state_closed_error { cnx and cnx.connecting? }
-      end
-
-      # is the underlying connection is in the +expired_session+ state?
-      # @return [bool]
-      def expired_session?
-        return nil unless @cnx
-
-        if defined?(::JRUBY_VERSION)
-          cnx.state == Java::OrgApacheZookeeper::ZooKeeper::States::EXPIRED_SESSION
-        else
-          wrap_state_closed_error { cnx.state == Zookeeper::ZOO_EXPIRED_SESSION_STATE }
-        end
-      end
-
-      # returns the current state of the connection as reported by the underlying driver
-      # as a symbol. The possible values are <tt>[:closed, :expired_session, :auth_failed
-      # :connecting, :connected, :associating]</tt>.
-      #
-      # See the Zookeeper session
-      # {documentation}[http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions]
-      # for more information
-      #
-      def state
-        if defined?(::JRUBY_VERSION)
-          cnx.state.to_string.downcase.to_sym
-        else
-          STATE_SYM_MAP.fetch(cnx.state) { |k| raise IndexError, "unrecognized state: #{k}" }
-        end
-      end
-
       # Register a block to be called when *any* connection event occurs
       #
       # @yield [event] yields the connection event to the block
data/lib/zk/client/threaded.rb
CHANGED
@@ -200,7 +200,7 @@ module ZK
       # @option opts [Fixnum] :timeout how long we will wait for the connection
       # to be established. If timeout is nil, we will wait forever: *use
       # carefully*.
-
+      def connect(opts={})
         @mutex.synchronize { unlocked_connect(opts) }
       end
 
@@ -216,7 +216,7 @@ module ZK
 
           setup_locks
 
-          @pid
+          @pid = Process.pid
           @client_state = RUNNING # reset state to running if we were paused
 
           old_cnx, @cnx = @cnx, nil
@@ -224,8 +224,6 @@ module ZK
 
           join_and_clear_reconnect_thread
 
-          @last_cnx_state = nil
-
           @mutex.synchronize do
             # it's important that we're holding the lock, as access to 'cnx' is
             # synchronized, and we want to avoid a race where event handlers
@@ -245,8 +243,14 @@ module ZK
             end
 
             logger.debug { "reopening, no fork detected" }
-            @last_cnx_state =
+            @last_cnx_state = Zookeeper::ZOO_CONNECTING_STATE
             @cnx.reopen(timeout) # ok, we werent' forked, so just reopen
+
+            # this is a bit of a hack, because we need to wait until the event thread
+            # delivers the connected event, which we used to be able to rely on just the
+            # connection doing. since we don't want to call the @cnx.state method to check
+            # (rather use the cached @last_cnx_state), we wait for consistency's sake
+            wait_until_connected_or_dying(timeout)
           end
         end
 
@@ -347,6 +351,37 @@ module ZK
         on_tpool ? shutdown_thread : shutdown_thread.join(30)
       end
 
+      # this overrides the implementation in StateMixin
+      def connected?
+        @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE }
+      end
+
+      def associating?
+        @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_ASSOCIATING_STATE }
+      end
+
+      def connecting?
+        @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_CONNECTING_STATE }
+      end
+
+      def expired_session?
+        @mutex.synchronize do
+          return false unless @cnx and running?
+
+          if defined?(::JRUBY_VERSION)
+            !@cnx.state.alive?
+          else
+            @last_cnx_state == Zookeeper::ZOO_EXPIRED_SESSION_STATE
+          end
+        end
+      end
+
+      def state
+        @mutex.synchronize do
+          STATE_SYM_MAP.fetch(@last_cnx_state) { |k| raise IndexError, "unrecognized state: #{k.inspect}" }
+        end
+      end
+
       # {see ZK::Client::Base#close}
       def close
         super
@@ -407,20 +442,47 @@ module ZK
 
       # @private
       def wait_until_connected_or_dying(timeout)
-        time_to_stop = Time.now + timeout
+        time_to_stop = timeout ? Time.now + timeout : nil
 
         @mutex.synchronize do
           while true
-
-
-
-
+            if timeout
+              now = Time.now
+              break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (now > time_to_stop) || (@client_state != RUNNING)
+              deadline = time_to_stop.to_f - now.to_f
+              @cond.wait(deadline)
+            else
+              break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (@client_state != RUNNING)
+              @cond.wait
+            end
           end
+        end
 
-
+        logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{timeout ? (Time.now.to_f < time_to_stop.to_f) : 'true'}, @client_state: #{@client_state.inspect}" }
+      end
+
+      # @private
+      def wait_until_closed(timeout=nil)
+        time_to_stop = timeout ? Time.now + timeout : nil
+
+        @mutex.synchronize do
+          while true
+            if timeout
+              now = Time.now
+              break if (now > time_to_stop) || (@client_state == CLOSED)
+              deadline = time_to_stop.to_f - now.to_f
+              @cond.wait(deadline)
+            else
+              break if @client_state == CLOSED
+              @cond.wait
+            end
+          end
         end
+
+        logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{timeout ? (Time.now.to_f < time_to_stop.to_f) : 'true'}, @client_state: #{@client_state.inspect}" }
       end
 
+
       # @private
       def client_state
         @mutex.synchronize { @client_state }
@@ -551,9 +613,26 @@ module ZK
       def unlocked_connect(opts={})
         return if @cnx
         timeout = opts.fetch(:timeout, @connection_timeout)
+
+        # this is a little bit of a lie, but is the legitimate state we're in when we first
+        # create the connection.
+        @last_cnx_state = Zookeeper::ZOO_CONNECTING_STATE
+
         @cnx = create_connection(@host, timeout, @event_handler.get_default_watcher_block)
+
         spawn_reconnect_thread
+
+        # this is a bit of a hack, because we need to wait until the event thread
+        # delivers the connected event, which we used to be able to rely on just the
+        # connection doing. since we don't want to call the @cnx.state method to check
+        # (rather use the cached @last_cnx_state), we wait for consistency's sake
+        #
+        # NOTE: this may cause issues later if we move to using non-reentrant locks
+        # TODO: this may wind up causing the whole process to take longer
+        # than `timeout` to complete, we should probably be using a difference
+        # (i.e. time-to-go) here
+        wait_until_connected_or_dying(timeout)
       end
-    end
-  end
-end
+    end # Threaded
+  end # Client
+end # ZK
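The timed wait added in `wait_until_connected_or_dying` and `wait_until_closed` above follows a common pattern: compute an absolute stop time once, then recompute the remaining time on every pass through the loop before waiting on the condition. A minimal standalone sketch of that pattern is shown below. It assumes a `Monitor` with a condition created by `new_cond`, which is a guess at how `@cond` is set up in the gem; `StateWaiter`, `update_state`, and `wait_until_connected` are hypothetical names used only for illustration.

```ruby
require 'monitor'

# Hypothetical stand-in for a client that caches its connection state.
class StateWaiter
  def initialize
    @mon = Monitor.new
    @cond = @mon.new_cond
    @state = :connecting
  end

  # Would be called from the thread that delivers events (assumed).
  def update_state(new_state)
    @mon.synchronize do
      @state = new_state
      @cond.broadcast # wake every waiter so each re-checks its predicate
    end
  end

  # Returns true if :connected was observed before the timeout elapsed,
  # false otherwise. A nil timeout waits indefinitely.
  def wait_until_connected(timeout = nil)
    time_to_stop = timeout && (Time.now + timeout)

    @mon.synchronize do
      until @state == :connected
        if time_to_stop
          remaining = time_to_stop - Time.now # time-to-go, recomputed each pass
          return false if remaining <= 0
          @cond.wait(remaining)
        else
          @cond.wait
        end
      end
      true
    end
  end
end

waiter = StateWaiter.new
notifier = Thread.new { sleep 0.05; waiter.update_state(:connected) }
result = waiter.wait_until_connected(1)
notifier.join
```

Recomputing the remaining time inside the loop means an early or spurious wakeup never extends the total wait past the original timeout, which is the same concern the TODO in `unlocked_connect` raises about using a time-to-go difference.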
data/lib/zk/version.rb
CHANGED
data/lib/zk.rb
CHANGED
data/spec/zk/pool_spec.rb
CHANGED
@@ -219,6 +219,7 @@ describe ZK::Pool do
     before do
       @min_clients = 1
       @max_clients = 2
+      @timeout = 10
       @connection_pool = ZK::Pool::Bounded.new(connection_host, :min_clients => @min_clients, :max_clients => @max_clients, :timeout => @timeout)
       @connection_pool.should be_open
       wait_until(2) { @connection_pool.available_size > 0 }
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: zk
 version: !ruby/object:Gem::Version
-  hash:
+  hash: 11
   prerelease:
   segments:
   - 1
   - 6
-  -
-  version: 1.6.
+  - 2
+  version: 1.6.2
 platform: ruby
 authors:
 - Jonathan D. Simms
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2012-
+date: 2012-06-01 00:00:00 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: zookeeper