RubyGems - zk - Versions diffs - 1.6.1 → 1.6.2 - Mend

zk 1.6.1 → 1.6.2

This diff shows the contents of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between those versions as they appear in their public registries.

Files changed (8)

data/README.markdown +6 -61
data/RELEASES.markdown +6 -0
data/lib/zk/client/state_mixin.rb +0 -45
data/lib/zk/client/threaded.rb +93 -14
data/lib/zk/version.rb +1 -1
data/lib/zk.rb +1 -1
data/spec/zk/pool_spec.rb +1 -0
metadata +4 -4

data/README.markdown CHANGED

@@ -65,6 +65,12 @@ In addition to all of that, I would like to think that the public API the ZK::Cl
 [zk-eventmachine]: https://github.com/slyphon/zk-eventmachine
 ## NEWS ##
+### v1.6.2 ###
+* Change state call to reduce the chances of deadlocks
+One of the problems I've been seeing is that during some kind of shutdown event, some method will call `closed?` or `connected?` which will acquire a mutex and make a call on the underlying connection at the *exact* moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
 ### v1.6.0 ###
 * Locker cleanup code!
@@ -89,67 +95,6 @@ Will go through your locker nodes one by one and try to lock and unlock them. If
 * 'private' is not 'protected'. I've been writing ruby for several years now, and apparently I'd forgotten that 'protected' does not work like how it does in java. The visibility of these methods has been corrected, and all specs pass, so I don't expect issues...but please report if this change causes any bugs in user code.
-### v1.5.2 ###
-* Fix locker cleanup code to avoid a nasty race when a session is lost, see [issue #34](https://github.com/slyphon/zk/issues/34)
-* Fix potential deadlock in ForkHook code so the mutex is unlocked in the case of an exception
-* Do not hang forever when shutting down and the shutdown thread does not exit (wait 30 seconds).
-### v1.5.1 ###
-* Added a `:retry_duration` option to the Threaded client constructor which will allows the user to specify for how long in the case of a connection loss, should an operation wait for the connection to be re-established before retrying the operation. This can be set at a global level and overridden on a per-call basis. The default is to not retry (which may change at a later date). Generally speaking, a timeout of > 30s is probably excessive, and care should be taken because during a connection loss, the server-side state may change without you being aware of it (i.e. events will not be delivered).
-* Small fork-hook implementation fix. Previously we were using WeakRefs so that hooks would not prevent an object from being garbage collected. This has been replaced with a finalizer which is more deterministic.
-### v1.5.0 ###
-Ok, now seriously this time. I think all of the forking issues are done.
-* Implemented a 'stop the world' feature to ensure safety when forking. All threads are stopped, but state is preserved. `fork()` can then be called safely, and after fork returns, all threads will be restarted in the parent, and the connection will be torn down and reopened in the child.
-* The easiest, and supported, way of doing this is now to call `ZK.install_fork_hook` after requiring zk. This will install an `alias_method_chain` style hook around the `Kernel.fork` method, which handles pausing all clients in the parent, calling fork, then resuming in the parent and reconnecting in the child. If you're using ZK in resque, I *highly* recommend using this approach, as it will give the most consistent results.
-In your app that requires an open ZK instance and `fork()`:
-```ruby
-require 'zk'
-ZK.install_fork_hook
-```
-Then use fork as you normally would.
-* Logging is now off by default, but we now use the excellent, can't-recommend-it-enough, [logging](https://github.com/TwP/logging) gem. If you want to tap into the ZK logs, you can assign a stdlib compliant logger to `ZK.logger` and that will be used. Otherwise, you can use the Logging framework's controls. All ZK logs are consolidated under the 'ZK' logger instance.
-### v1.4.1 ###
-* All users of resque or other libraries that depend on `fork()` are encouraged to upgrade immediately. This version of ZK features the `zookeeper-1.1.0` gem with a completely rewritten backend that provides true fork safety. The rules still apply (you must call `#reopen` on your client as soon as possible in the child process) but you can be assured a much more stable experience.
-### v1.4.0 ###
-* Added a new `:ignore` option for convenience when you don't care if an operation fails. In the case of a failure, the method will return nil instead of raising an exception. This option works for `children`, `create`, `delete`, `get`, `get_acl`, `set`, and `set_acl`. `stat` will ignore the option (because it doesn't care about the state of a node).
-```
-# so instead of having to do:
-begin
-  zk.delete('/some/path')
-rescue ZK::Exceptions;:NoNode
-end
-# you can do
-zk.delete('/some/path', :ignore => :no_node)
-```
-* MASSIVE fork/parent/child test around event delivery and much greater stability expected for linux (with the zookeeper-1.0.3 gem). Again, please see the documentation on the wiki about [proper fork procedure](http://github.com/slyphon/zk/wiki/Forking).
 ## Caveats

data/RELEASES.markdown CHANGED

@@ -1,5 +1,11 @@
 This file notes feature differences and bugfixes contained between releases.
+### v1.6.2 ###
+* Change state call to reduce the chances of deadlocks
+One of the problems I've been seeing is that during some kind of shutdown event, some method will call `closed?` or `connected?` which will acquire a mutex and make a call on the underlying connection at the *exact* moment necessary to cause a deadlock. In order to help prevent this, and building on some changes from 1.5.3, we now treat our cached `@last_cnx_state` as the current state of the connection and don't touch the underlying connection object (except in the case of the java driver, which is safe).
 ### v1.6.1 ###
 * Small fixes for zk-eventmachine compatibilty
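
The v1.6.2 note above describes answering `connected?` from a cached `@last_cnx_state` instead of calling into the underlying connection while holding the mutex. A minimal sketch of that idea follows. The `CachedStateClient` class, its `on_state_event` callback, and the symbol states are hypothetical and used only for illustration; they are not part of zk, which stores `Zookeeper::ZOO_*_STATE` constants instead.

```ruby
require 'thread'

# Hypothetical stand-in for a client that caches its connection state.
# Nothing here is zk's actual implementation.
class CachedStateClient
  def initialize
    @mutex = Mutex.new
    # Start out "connecting", as the diff does right before creating the connection.
    @last_cnx_state = :connecting
  end

  # Assumed to be called from the event-delivery thread on each session event.
  def on_state_event(new_state)
    @mutex.synchronize { @last_cnx_state = new_state }
  end

  # Reads only the cached value; never touches the underlying connection,
  # so it cannot block on the driver while holding the mutex.
  def connected?
    @mutex.synchronize { @last_cnx_state == :connected }
  end
end

client = CachedStateClient.new
client.on_state_event(:connected)
client.connected?
```

The trade-off is that the cached value can briefly lag behind the driver's real state, which is why the threaded client in this release waits for the connected event after connecting or reopening.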

data/lib/zk/client/state_mixin.rb CHANGED

@@ -3,51 +3,6 @@ module ZK
     # Provides client-state related methods. Included in ZK::Client::Base.
     # (refactored out to this class to ease documentation overload)
     module StateMixin
-      # Returns true if the underlying connection is in the +connected+ state.
-      def connected?
-        wrap_state_closed_error { cnx and cnx.connected? }
-      end
-      # is the underlying connection is in the +associating+ state?
-      # @return [bool]
-      def associating?
-        wrap_state_closed_error { cnx and cnx.associating? }
-      end
-      # is the underlying connection is in the +connecting+ state?
-      # @return [bool]
-      def connecting?
-        wrap_state_closed_error { cnx and cnx.connecting? }
-      end
-      # is the underlying connection is in the +expired_session+ state?
-      # @return [bool]
-      def expired_session?
-        return nil unless @cnx
-        if defined?(::JRUBY_VERSION)
-          cnx.state == Java::OrgApacheZookeeper::ZooKeeper::States::EXPIRED_SESSION
-        else
-          wrap_state_closed_error { cnx.state == Zookeeper::ZOO_EXPIRED_SESSION_STATE }
-        end
-      end
-      # returns the current state of the connection as reported by the underlying driver
-      # as a symbol. The possible values are <tt>[:closed, :expired_session, :auth_failed
-      # :connecting, :connected, :associating]</tt>.
-      #
-      # See the Zookeeper session
-      # {documentation}[http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions]
-      # for more information
-      #
-      def state
-        if defined?(::JRUBY_VERSION)
-          cnx.state.to_string.downcase.to_sym
-        else
-          STATE_SYM_MAP.fetch(cnx.state) { |k| raise IndexError, "unrecognized state: #{k}" }
-        end
-      end
       # Register a block to be called when *any* connection event occurs
       #
       # @yield [event] yields the connection event to the block

data/lib/zk/client/threaded.rb CHANGED

@@ -200,7 +200,7 @@ module ZK
       # @option opts [Fixnum] :timeout how long we will wait for the connection
       #   to be established. If timeout is nil, we will wait forever: *use
       #   carefully*.
-       def connect(opts={})
+      def connect(opts={})
         @mutex.synchronize { unlocked_connect(opts) }
       end
@@ -216,7 +216,7 @@ module ZK
           setup_locks
-          @pid        = Process.pid
+          @pid           = Process.pid
           @client_state  = RUNNING                     # reset state to running if we were paused
           old_cnx, @cnx = @cnx, nil
@@ -224,8 +224,6 @@ module ZK
           join_and_clear_reconnect_thread
-          @last_cnx_state = nil
           @mutex.synchronize do
             # it's important that we're holding the lock, as access to 'cnx' is
             # synchronized, and we want to avoid a race where event handlers
@@ -245,8 +243,14 @@ module ZK
             end
             logger.debug { "reopening, no fork detected" }
-            @last_cnx_state = nil
+            @last_cnx_state = Zookeeper::ZOO_CONNECTING_STATE
             @cnx.reopen(timeout)                # ok, we werent' forked, so just reopen
+            # this is a bit of a hack, because we need to wait until the event thread
+            # delivers the connected event, which we used to be able to rely on just the
+            # connection doing. since we don't want to call the @cnx.state method to check
+            # (rather use the cached @last_cnx_state), we wait for consistency's sake
+            wait_until_connected_or_dying(timeout)
           end
         end
@@ -347,6 +351,37 @@ module ZK
         on_tpool ? shutdown_thread : shutdown_thread.join(30)
       end
+      # this overrides the implementation in StateMixin
+      def connected?
+        @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE }
+      end
+      def associating?
+        @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_ASSOCIATING_STATE }
+      end
+      def connecting?
+        @mutex.synchronize { running? && @last_cnx_state == Zookeeper::ZOO_CONNECTING_STATE }
+      end
+      def expired_session?
+        @mutex.synchronize do
+          return false unless @cnx and running?
+          if defined?(::JRUBY_VERSION)
+            !@cnx.state.alive?
+          else
+            @last_cnx_state == Zookeeper::ZOO_EXPIRED_SESSION_STATE
+          end
+        end
+      end
+      def state
+        @mutex.synchronize do
+          STATE_SYM_MAP.fetch(@last_cnx_state) { |k| raise IndexError, "unrecognized state: #{k.inspect}" }
+        end
+      end
       # {see ZK::Client::Base#close}
       def close
         super
@@ -407,20 +442,47 @@ module ZK
       # @private
       def wait_until_connected_or_dying(timeout)
-        time_to_stop = Time.now + timeout
+        time_to_stop = timeout ? Time.now + timeout : nil
         @mutex.synchronize do
           while true
-            now = Time.now
-            break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (now > time_to_stop) || (@client_state != RUNNING)
-            deadline = time_to_stop.to_f - now.to_f
-            @cond.wait(deadline)
+            if timeout
+              now = Time.now
+              break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (now > time_to_stop) || (@client_state != RUNNING)
+              deadline = time_to_stop.to_f - now.to_f
+              @cond.wait(deadline)
+            else
+              break if (@last_cnx_state == Zookeeper::ZOO_CONNECTED_STATE) || (@client_state != RUNNING)
+              @cond.wait
+            end
           end
+        end
-          logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{Time.now.to_f < time_to_stop.to_f}, @client_state: #{@client_state.inspect}" }
+        logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{timeout ? (Time.now.to_f < time_to_stop.to_f) : 'true'}, @client_state: #{@client_state.inspect}" }
+      end
+      # @private
+      def wait_until_closed(timeout=nil)
+        time_to_stop = timeout ? Time.now + timeout : nil
+        @mutex.synchronize do
+          while true
+            if timeout
+              now = Time.now
+              break if (now > time_to_stop) || (@client_state == CLOSED)
+              deadline = time_to_stop.to_f - now.to_f
+              @cond.wait(deadline)
+            else
+              break if @client_state == CLOSED
+              @cond.wait
+            end
+          end
         end
+        logger.debug { "#{__method__} @last_cnx_state: #{@last_cnx_state.inspect}, time_left? #{timeout ? (Time.now.to_f < time_to_stop.to_f) : 'true'}, @client_state: #{@client_state.inspect}" }
       end
       # @private
       def client_state
         @mutex.synchronize { @client_state }
@@ -551,9 +613,26 @@ module ZK
         def unlocked_connect(opts={})
           return if @cnx
           timeout = opts.fetch(:timeout, @connection_timeout)
+          # this is a little bit of a lie, but is the legitimate state we're in when we first
+          # create the connection.
+          @last_cnx_state = Zookeeper::ZOO_CONNECTING_STATE
           @cnx = create_connection(@host, timeout, @event_handler.get_default_watcher_block)
           spawn_reconnect_thread
+          # this is a bit of a hack, because we need to wait until the event thread
+          # delivers the connected event, which we used to be able to rely on just the
+          # connection doing. since we don't want to call the @cnx.state method to check
+          # (rather use the cached @last_cnx_state), we wait for consistency's sake
+          #
+          # NOTE: this may cause issues later if we move to using non-reentrant locks
+          # TODO: this may wind up causing the whole process to take longer
+          #       than `timeout` to complete, we should probably be using a difference
+          #       (i.e. time-to-go) here
+          wait_until_connected_or_dying(timeout)
         end
-    end
-  end
-end
+    end # Threaded
+  end # Client
+end # ZK
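
The `wait_until_connected_or_dying` and `wait_until_closed` changes above share one pattern: wait on a condition variable until a cached state matches, with a deadline only when a timeout is given. A self-contained sketch of that pattern follows. The `StateWaiter` class and its `update` and `wait_for` methods are hypothetical names for illustration, not zk API. Unlike the diff, this sketch recomputes the remaining time on each pass and returns a boolean.

```ruby
require 'monitor'

# Hypothetical helper illustrating "wait for a state, optionally with a timeout".
class StateWaiter
  def initialize
    @mutex = Monitor.new            # reentrant lock
    @cond  = @mutex.new_cond        # condition variable tied to the monitor
    @state = :connecting
  end

  # Assumed to be called by whichever thread observes state changes.
  def update(new_state)
    @mutex.synchronize do
      @state = new_state
      @cond.broadcast               # wake every waiter so it can re-check
    end
  end

  # Returns true once @state == target, or false if the timeout elapses first.
  # A nil timeout waits indefinitely.
  def wait_for(target, timeout = nil)
    deadline = timeout ? Time.now + timeout : nil
    @mutex.synchronize do
      until @state == target
        if deadline
          remaining = deadline - Time.now
          return false if remaining <= 0
          @cond.wait(remaining)     # may wake early; the loop re-checks
        else
          @cond.wait
        end
      end
      true
    end
  end
end

waiter = StateWaiter.new
Thread.new { sleep 0.05; waiter.update(:connected) }
waiter.wait_for(:connected, 2)
```

Looping on the predicate guards against spurious or unrelated wakeups, and deriving `remaining` from a fixed deadline keeps the total wait bounded by `timeout`, which is the "time-to-go" concern raised in the TODO comment in `unlocked_connect`.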

data/lib/zk/version.rb CHANGED

@@ -1,3 +1,3 @@
 module ZK
-  VERSION = "1.6.1"
+  VERSION = "1.6.2"
 end

data/lib/zk.rb CHANGED

@@ -211,7 +211,7 @@ module ZK
   ensure
     if cnx
       cnx.close!
-      Thread.pass until cnx.closed?
+      cnx.wait_until_closed(30) # XXX: hardcoded here, do not hang forever
     end
   end

data/spec/zk/pool_spec.rb CHANGED

@@ -219,6 +219,7 @@ describe ZK::Pool do
     before do
       @min_clients = 1
       @max_clients = 2
+      @timeout = 10
       @connection_pool = ZK::Pool::Bounded.new(connection_host, :min_clients => @min_clients, :max_clients => @max_clients, :timeout => @timeout)
       @connection_pool.should be_open
       wait_until(2) { @connection_pool.available_size > 0 }

metadata CHANGED

@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: zk
 version: !ruby/object:Gem::Version
-  hash: 13
+  hash: 11
   prerelease:
   segments:
   - 1
   - 6
-  - 1
-  version: 1.6.1
+  - 2
+  version: 1.6.2
 platform: ruby
 authors:
 - Jonathan D. Simms
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-05-31 00:00:00 Z
+date: 2012-06-01 00:00:00 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: zookeeper