zk 1.6.1 → 1.6.2

data/README.markdown CHANGED
@@ -65,6 +65,12 @@ In addition to all of that, I would like to think that the public API the ZK::Cl
  [zk-eventmachine]: https://github.com/slyphon/zk-eventmachine
 
  ## NEWS ##
+ ### v1.6.2 ###
+
+ * Change state call to reduce the chances of deadlocks
+
+ One of the problems I've been seeing is that during some kind of shutdown event, some method will call `closed?` or `connected?` which will acquire a mutex and make a call on the underlying connection at the *exact* moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
+
  ### v1.6.0 ###
 
  * Locker cleanup code!
@@ -89,67 +95,6 @@ Will go through your locker nodes one by one and try to lock and unlock them. If
 
  * 'private' is not 'protected'. I've been writing ruby for several years now, and apparently I'd forgotten that 'protected' does not work like how it does in java. The visibility of these methods has been corrected, and all specs pass, so I don't expect issues...but please report if this change causes any bugs in user code.
 
- ### v1.5.2 ###
-
- * Fix locker cleanup code to avoid a nasty race when a session is lost, see [issue #34](https://github.com/slyphon/zk/issues/34)
-
- * Fix potential deadlock in ForkHook code so the mutex is unlocked in the case of an exception
-
- * Do not hang forever when shutting down and the shutdown thread does not exit (wait 30 seconds).
-
- ### v1.5.1 ###
-
- * Added a `:retry_duration` option to the Threaded client constructor which will allows the user to specify for how long in the case of a connection loss, should an operation wait for the connection to be re-established before retrying the operation. This can be set at a global level and overridden on a per-call basis. The default is to not retry (which may change at a later date). Generally speaking, a timeout of > 30s is probably excessive, and care should be taken because during a connection loss, the server-side state may change without you being aware of it (i.e. events will not be delivered).
-
- * Small fork-hook implementation fix. Previously we were using WeakRefs so that hooks would not prevent an object from being garbage collected. This has been replaced with a finalizer which is more deterministic.
-
- ### v1.5.0 ###
-
- Ok, now seriously this time. I think all of the forking issues are done.
-
- * Implemented a 'stop the world' feature to ensure safety when forking. All threads are stopped, but state is preserved. `fork()` can then be called safely, and after fork returns, all threads will be restarted in the parent, and the connection will be torn down and reopened in the child.
-
- * The easiest, and supported, way of doing this is now to call `ZK.install_fork_hook` after requiring zk. This will install an `alias_method_chain` style hook around the `Kernel.fork` method, which handles pausing all clients in the parent, calling fork, then resuming in the parent and reconnecting in the child. If you're using ZK in resque, I *highly* recommend using this approach, as it will give the most consistent results.
-
- In your app that requires an open ZK instance and `fork()`:
-
- ```ruby
-
- require 'zk'
- ZK.install_fork_hook
-
- ```
-
- Then use fork as you normally would.
-
- * Logging is now off by default, but we now use the excellent, can't-recommend-it-enough, [logging](https://github.com/TwP/logging) gem. If you want to tap into the ZK logs, you can assign a stdlib compliant logger to `ZK.logger` and that will be used. Otherwise, you can use the Logging framework's controls. All ZK logs are consolidated under the 'ZK' logger instance.
-
-
- ### v1.4.1 ###
-
- * All users of resque or other libraries that depend on `fork()` are encouraged to upgrade immediately. This version of ZK features the `zookeeper-1.1.0` gem with a completely rewritten backend that provides true fork safety. The rules still apply (you must call `#reopen` on your client as soon as possible in the child process) but you can be assured a much more stable experience.
-
- ### v1.4.0 ###
-
- * Added a new `:ignore` option for convenience when you don't care if an operation fails. In the case of a failure, the method will return nil instead of raising an exception. This option works for `children`, `create`, `delete`, `get`, `get_acl`, `set`, and `set_acl`. `stat` will ignore the option (because it doesn't care about the state of a node).
-
- ```
- # so instead of having to do:
-
- begin
- zk.delete('/some/path')
- rescue ZK::Exceptions;:NoNode
- end
-
- # you can do
-
- zk.delete('/some/path', :ignore => :no_node)
-
- ```
-
- * MASSIVE fork/parent/child test around event delivery and much greater stability expected for linux (with the zookeeper-1.0.3 gem). Again, please see the documentation on the wiki about [proper fork procedure](http://github.com/slyphon/zk/wiki/Forking).
-
-
 
  ## Caveats
 
data/RELEASES.markdown CHANGED
@@ -1,5 +1,11 @@
  This file notes feature differences and bugfixes contained between releases.
 
+ ### v1.6.2 ###
+
+ * Change state call to reduce the chances of deadlocks
+
+ One of the problems I've been seeing is that during some kind of shutdown event, some method will call `closed?` or `connected?` which will acquire a mutex and make a call on the underlying connection at the *exact* moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
+
  ### v1.6.1 ###
 
  * Small fixes for zk-eventmachine compatibilty
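The v1.6.2 note above describes answering state queries from a cached value instead of calling into the underlying connection. A minimal sketch of that idea follows, assuming an event thread that pushes state changes into the cache. `CachedStateClient`, `on_state_event`, and the symbol constants are hypothetical names for illustration only; they are not the gem's API.

```ruby
require 'thread'

# Hypothetical stand-in for a client object; NOT ZK::Client::Threaded.
class CachedStateClient
  CONNECTING = :connecting
  CONNECTED  = :connected

  def initialize
    @mutex = Mutex.new
    # Start out in "connecting", mirroring the idea in the release note.
    @last_cnx_state = CONNECTING
  end

  # Assumed to be called by the event-delivery thread on each state change.
  def on_state_event(new_state)
    @mutex.synchronize { @last_cnx_state = new_state }
  end

  # Answers from the cache only; never touches the underlying connection,
  # so it cannot block on the connection's own locks.
  def connected?
    @mutex.synchronize { @last_cnx_state == CONNECTED }
  end
end

client = CachedStateClient.new
before = client.connected?
client.on_state_event(CachedStateClient::CONNECTED)
after = client.connected?
```

The trade-off in a design like this is that the cached value can lag slightly behind the real connection, in exchange for never holding the client lock while calling into the driver.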
@@ -3,51 +3,6 @@ module ZK
  # Provides client-state related methods. Included in ZK::Client::Base.
  # (refactored out to this class to ease documentation overload)
  module StateMixin
- # Returns true if the underlying connection is in the +connected+ state.
- def connected?
- wrap_state_closed_error { cnx and cnx.connected? }
- end
-
- # is the underlying connection is in the +associating+ state?
- # @return [bool]
- def associating?
- wrap_state_closed_error { cnx and cnx.associating? }
- end
-
- # is the underlying connection is in the +connecting+ state?
- # @return [bool]
- def connecting?
- wrap_state_closed_error { cnx and cnx.connecting? }
- end
-
- # is the underlying connection is in the +expired_session+ state?
- # @return [bool]
- def expired_session?
- return nil unless @cnx
-
- if defined?(::JRUBY_VERSION)
- cnx.state == Java::OrgApacheZookeeper::ZooKeeper::States::EXPIRED_SESSION
- else
- wrap_state_closed_error { cnx.state == Zookeeper::ZOO_EXPIRED_SESSION_STATE }
- end
- end
-
- # returns the current state of the connection as reported by the underlying driver
- # as a symbol. The possible values are <tt>[:closed, :expired_session, :auth_failed
- # :connecting, :connected, :associating]</tt>.
- #
- # See the Zookeeper session
- # {documentation}[http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions]
- # for more information
- #
- def state
- if defined?(::JRUBY_VERSION)
- cnx.state.to_string.downcase.to_sym
- else
- STATE_SYM_MAP.fetch(cnx.state) { |k| raise IndexError, "unrecognized state: #{k}" }
- end
- end
-
  # Register a block to be called when *any* connection event occurs
  #
  # @yield [event] yields the connection event to the block
@@ -200,7 +200,7 @@ module ZK
  # @option opts [Fixnum] :timeout how long we will wait for the connection
  # to be established. If timeout is nil, we will wait forever: *use
  # carefully*.
- def connect(opts={})
+ def connect(opts={})
  @mutex.synchronize { unlocked_connect(opts) }
  end
 
@@ -216,7 +216,7 @@ module ZK
 
  setup_locks
 
- @pid = Process.pid
+ @pid = Process.pid
  @client_state = RUNNING # reset state to running if we were paused
 
  old_cnx, @cnx = @cnx, nil
@@ -224,8 +224,6 @@ module ZK
 
  join_and_clear_reconnect_thread
 
- @last_cnx_state = nil
-
  @mutex.synchronize do
  # it's important that we're holding the lock, as access to 'cnx' is
  # synchronized, and we want to avoid a race where event handlers
@@ -245,8 +243,14 @@ module ZK
  end
 
  logger.debug { "reopening, no fork detected" }
- @last_cnx_state = nil
+ @last_cnx_state = Zookeeper::ZOO_CONNECTING_STATE
  @cnx.reopen(timeout) # ok, we werent' forked, so just reopen
+
+ # this is a bit of a hack, because we need to wait until the event thread
+ # delivers the connected event, which we used to be able to rely on just the
+ # connection doing. since we don't want to call the @cnx.state method to check
+ # (rather use the cached @last_cnx_state), we wait for consistency's sake
+ wait_until_connected_or_dying(timeout)
  end
  end
 
@@ -347,6 +351,37 @@ module ZK
  on_tpool ? shutdown_thread : shutdown_thread.join(30)
  end
 
+ # this overrides the implementation in StateMixin
+ def connected?
+ @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE }
+ end
+
+ def associating?
+ @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_ASSOCIATING_STATE }
+ end
+
+ def connecting?
+ @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_CONNECTING_STATE }
+ end
+
+ def expired_session?
+ @mutex.synchronize do
+ return false unless @cnx and running?
+
+ if defined?(::JRUBY_VERSION)
+ !@cnx.state.alive?
+ else
+ @last_cnx_state == Zookeeper::ZOO_EXPIRED_SESSION_STATE
+ end
+ end
+ end
+
+ def state
+ @mutex.synchronize do
+ STATE_SYM_MAP.fetch(@last_cnx_state) { |k| raise IndexError, "unrecognized state: #{k.inspect}" }
+ end
+ end
+
  # {see ZK::Client::Base#close}
  def close
  super
@@ -407,20 +442,47 @@ module ZK
 
  # @private
  def wait_until_connected_or_dying(timeout)
- time_to_stop = Time.now + timeout
+ time_to_stop = timeout ? Time.now + timeout : nil
 
  @mutex.synchronize do
  while true
- now = Time.now
- break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (now > time_to_stop) || (@client_state != RUNNING)
- deadline = time_to_stop.to_f - now.to_f
- @cond.wait(deadline)
+ if timeout
+ now = Time.now
+ break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (now > time_to_stop) || (@client_state != RUNNING)
+ deadline = time_to_stop.to_f - now.to_f
+ @cond.wait(deadline)
+ else
+ break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (@client_state != RUNNING)
+ @cond.wait
+ end
  end
+ end
 
- logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{Time.now.to_f < time_to_stop.to_f}, @client_state: #{@client_state.inspect}" }
+ logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{timeout ? (Time.now.to_f < time_to_stop.to_f) : 'true'}, @client_state: #{@client_state.inspect}" }
+ end
+
+ # @private
+ def wait_until_closed(timeout=nil)
+ time_to_stop = timeout ? Time.now + timeout : nil
+
+ @mutex.synchronize do
+ while true
+ if timeout
+ now = Time.now
+ break if (now > time_to_stop) || (@client_state == CLOSED)
+ deadline = time_to_stop.to_f - now.to_f
+ @cond.wait(deadline)
+ else
+ break if @client_state == CLOSED
+ @cond.wait
+ end
+ end
  end
+
+ logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{timeout ? (Time.now.to_f < time_to_stop.to_f) : 'true'}, @client_state: #{@client_state.inspect}" }
  end
 
+
  # @private
  def client_state
  @mutex.synchronize { @client_state }
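The hunk above makes both wait helpers accept a `nil` timeout, meaning "wait forever". A standalone sketch of the same wait-with-optional-deadline loop is shown here, assuming a `Monitor` with a condition variable created by `new_cond`. The `Waiter` class and its `close!` method are hypothetical and exist only to make the example self-contained.

```ruby
require 'monitor'

# Hypothetical example class; not part of the zk gem.
class Waiter
  def initialize
    @mon   = Monitor.new
    @cond  = @mon.new_cond
    @state = :open
  end

  # Flip the state and wake every thread blocked in wait_until_closed.
  def close!
    @mon.synchronize do
      @state = :closed
      @cond.broadcast
    end
  end

  # Returns true once closed, or false if `timeout` seconds pass first.
  # A nil timeout waits indefinitely.
  def wait_until_closed(timeout = nil)
    deadline = timeout ? Time.now + timeout : nil

    @mon.synchronize do
      until @state == :closed
        if deadline
          remaining = deadline - Time.now
          break if remaining <= 0
          @cond.wait(remaining) # re-check the state after every wakeup
        else
          @cond.wait
        end
      end
      @state == :closed
    end
  end
end

waiter = Waiter.new
timed_out = !waiter.wait_until_closed(0.05)
closer = Thread.new { sleep 0.05; waiter.close! }
closed = waiter.wait_until_closed(5)
closer.join
```

Looping on the predicate rather than trusting a single `wait` call guards against wakeups that arrive before the state has actually changed.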
@@ -551,9 +613,26 @@ module ZK
  def unlocked_connect(opts={})
  return if @cnx
  timeout = opts.fetch(:timeout, @connection_timeout)
+
+ # this is a little bit of a lie, but is the legitimate state we're in when we first
+ # create the connection.
+ @last_cnx_state = Zookeeper::ZOO_CONNECTING_STATE
+
  @cnx = create_connection(@host, timeout, @event_handler.get_default_watcher_block)
+
  spawn_reconnect_thread
+
+ # this is a bit of a hack, because we need to wait until the event thread
+ # delivers the connected event, which we used to be able to rely on just the
+ # connection doing. since we don't want to call the @cnx.state method to check
+ # (rather use the cached @last_cnx_state), we wait for consistency's sake
+ #
+ # NOTE: this may cause issues later if we move to using non-reentrant locks
+ # TODO: this may wind up causing the whole process to take longer
+ # than `timeout` to complete, we should probably be using a difference
+ # (i.e. time-to-go) here
+ wait_until_connected_or_dying(timeout)
  end
- end
- end
- end
+ end # Threaded
+ end # Client
+ end # ZK
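The TODO in the hunk above notes that passing the full `timeout` to each step can make the whole connect take longer than `timeout`, and suggests using the time still to go. One possible shape for that is sketched below. `TimeBudget` is a hypothetical helper, not something the diff adds, and the monotonic clock is a choice of this sketch rather than of the gem.

```ruby
# Hypothetical helper: one overall time budget shared by several steps.
class TimeBudget
  # timeout is in seconds; nil means "no limit".
  def initialize(timeout)
    @deadline = timeout && (now + timeout)
  end

  # Seconds left in the budget, never negative. Returns nil when unlimited,
  # which matches the "nil waits forever" convention used in the diff.
  def remaining
    return nil unless @deadline
    [@deadline - now, 0.0].max
  end

  private

  # Monotonic clock, so wall-clock adjustments do not distort the budget.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

budget = TimeBudget.new(0.2)
first_step_allowance = budget.remaining   # close to the full budget
sleep 0.25                                # stands in for a slow first step
second_step_allowance = budget.remaining  # budget is now used up
```

With a helper like this, each step would be handed `budget.remaining` instead of the original `timeout`, so the steps together stay within the single limit the caller asked for.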
data/lib/zk/version.rb CHANGED
@@ -1,3 +1,3 @@
  module ZK
- VERSION = "1.6.1"
+ VERSION = "1.6.2"
  end
data/lib/zk.rb CHANGED
@@ -211,7 +211,7 @@ module ZK
  ensure
  if cnx
  cnx.close!
- Thread.pass until cnx.closed?
+ cnx.wait_until_closed(30) # XXX: hardcoded here, do not hang forever
  end
  end
 
data/spec/zk/pool_spec.rb CHANGED
@@ -219,6 +219,7 @@ describe ZK::Pool do
  before do
  @min_clients = 1
  @max_clients = 2
+ @timeout = 10
  @connection_pool = ZK::Pool::Bounded.new(connection_host, :min_clients => @min_clients, :max_clients => @max_clients, :timeout => @timeout)
  @connection_pool.should be_open
  wait_until(2) { @connection_pool.available_size > 0 }
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: zk
  version: !ruby/object:Gem::Version
- hash: 13
+ hash: 11
  prerelease:
  segments:
  - 1
  - 6
- - 1
- version: 1.6.1
+ - 2
+ version: 1.6.2
  platform: ruby
  authors:
  - Jonathan D. Simms
@@ -16,7 +16,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2012-05-31 00:00:00 Z
+ date: 2012-06-01 00:00:00 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: zookeeper