RubyGems - mongo_ha - Versions diffs - 1.11.0.rc1 - Mend

mongo_ha 1.11.0.rc1

Files changed (8)

checksums.yaml +7 -0
data/README.md +162 -0
data/Rakefile +28 -0
data/lib/mongo_ha/mongo_client.rb +188 -0
data/lib/mongo_ha/networking.rb +58 -0
data/lib/mongo_ha/version.rb +3 -0
data/lib/mongo_ha.rb +38 -0
metadata +65 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 40d6a7af7f740daf8f07e5f79713ae0b2ad76e2d
+  data.tar.gz: 1a0ba4bdda6f79ea283d1544f926d1901b500a1f
+SHA512:
+  metadata.gz: 58f85d47132a40bf22cad95bc38044590b3fbcb0dbf1a78171f3dc3ae29aa559a7ff836867f3926465ecb2fdc5809de3447d50318112fc7665e39313c21ce355
+  data.tar.gz: a9e5aae3321e9e17f5ed1f401a952c8e39a60c1bea47345c4c5a461135a95f10e56411bb087fbc07a115882facaca62b010b6d17640cd35140d646116645de87

data/README.md ADDED

@@ -0,0 +1,162 @@
+# mongo_ha
+High availability for the mongo ruby driver. Automatic reconnects and recovery when replica-set changes, etc.
+## Status
+Production Ready: Used every day in an enterprise environment across
+remote data centers.
+## Overview
+Adds methods to the Mongo Ruby driver to support retries on connection failure.
+In the event of a connection failure, only one thread will attempt to re-establish
+connectivity to the Mongo server(s). This is to prevent swamping the mongo
+servers with reconnect attempts.
+Retries are initially performed quickly in case it is a brief network issue,
+and then back off to give the replica-set time to elect a new master.
+Currently Only Supports Ruby Mongo driver v1.11.x
+mongo_ha transparently supports MongoMapper since it uses the mongo ruby driver
+that is patched by loading this gem.
+Mongo Router processes will often return a connection failure on their side
+as an OperationFailure. This code will also retry automatically when the router
+has errors talking to a sharded cluster.
+## Mongo Cursors
+Any operations that return a cursor need to be handled in your own code
+since the retry cannot be handled transparently.
+For example: `find` returns a cursor, whereas `find_one` is handled because
+it returns the data itself rather than a cursor.
+Example
+```ruby
+# Wrap existing cursor based calls with a retry on connection failure block
+results_collection.retry_on_connection_failure do
+  results_collection.find({}, sort: '_id', timeout: false) do |cursor|
+    cursor.each do |record|
+      puts "Record: #{record.inspect}"
+    end
+  end
+end
+```
+### Note
+In the above example the block will be repeated from the _beginning_ of the
+collection should a connection failure occur. Without appropriate handling it
+is possible to read the same records twice.
+If the collection cannot be processed twice, it may be better to just let the
+`Mongo::ConnectionFailure` flow up into the application for it to deal with at
+a higher level.
+## Installation
+Add to Gemfile:
+```ruby
+gem 'mongo_ha'
+```
+Or for standalone environments
+```shell
+gem install mongo_ha
+```
+If you are also using SemanticLogger, place `mongo_ha` below `semantic_logger`
+and/or `rails_semantic_logger` in the `Gemfile`. This way it will create a logger
+just for `Mongo::MongoClient` to improve the log output during connection recovery.
+## Configuration
+mongo_ha adds several new configuration options to fine tune the reconnect behavior
+for any environment.
+Sample mongo.yml:
+```yaml
+default_options: &default_options
+  :w:                           1
+  :pool_size:                   5
+  :pool_timeout:                5
+  :connect_timeout:             5
+  :reconnect_attempts:          53
+  :reconnect_retry_seconds:     0.1
+  :reconnect_retry_multiplier:  2
+  :reconnect_max_retry_seconds: 5
+development: &development
+  uri: mongodb://localhost:27017/development
+  options:
+    <<: *default_options
+test:
+  uri: mongodb://localhost:27017/test
+  options:
+    <<: *default_options
+# Sample Production Settings
+production:
+  uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production
+  options:
+    <<: *default_options
+    :pool_size:    50
+    :pool_timeout: 5
+```
+The following options can be specified in the Mongo configuration options
+to tune the retry intervals during a connection failure:
+### :reconnect_attempts
+* Number of times to attempt to reconnect.
+* Default: 53
+### :reconnect_retry_seconds
+* Initial delay before retrying
+* Default: 0.1
+### :reconnect_retry_multiplier
+* Multiply delay by this number with each retry to prevent overwhelming the server
+* Default: 2
+### :reconnect_max_retry_seconds
+* Maximum number of seconds to wait before retrying again
+* Default: 5
+Using the above default values results in reconnect attempts at the following intervals (in seconds):
+   0.1 0.2 0.4 0.8 1.6 3.2 5 5 5 5  ....
+## Testing
+There is really only one place to test something like `mongo_ha` and that is in
+a high volume mission critical production environment.
+The initial code in this gem was created over 2 years with MongoDB running in an
+enterprise production environment with hundreds of connections to Mongo servers
+in remote data centers across a WAN. It adds high availability to standalone
+MongoDB servers, replica-sets, and sharded clusters.
+## Issues
+If the following output appears after adding the above connection options:
+```shell
+reconnect_attempts is not a valid option for Mongo::MongoClient
+reconnect_retry_seconds is not a valid option for Mongo::MongoClient
+reconnect_retry_multiplier is not a valid option for Mongo::MongoClient
+reconnect_max_retry_seconds is not a valid option for Mongo::MongoClient
+```
+Then the `mongo_ha` gem was not loaded prior to connecting to Mongo.
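The retry intervals listed in the README's Configuration section can be reproduced with a short sketch. The helper name `backoff_schedule` is hypothetical and not part of the gem; it only mirrors the multiply-then-cap step visible in `reconnect` in `mongo_client.rb`, assuming the documented defaults.

```ruby
# Hypothetical helper (not part of mongo_ha): lists the delay, in seconds,
# slept before each reconnect attempt. Defaults mirror the README defaults.
def backoff_schedule(attempts: 53, initial: 0.1, multiplier: 2.0, max: 5.0)
  delay = initial.to_f
  Array.new(attempts) do
    current = delay
    # Grow the delay for the next attempt, but never beyond the maximum
    delay = [delay * multiplier, max].min
    current
  end
end

schedule = backoff_schedule
puts "First delays: #{schedule.first(8).inspect}"
puts "Total wait if every attempt fails: #{schedule.reduce(:+).round(1)} seconds"
```

Under these assumptions the delays reach the 5 second ceiling on the seventh attempt, so the default of 53 attempts adds up to roughly four minutes of waiting before the reconnect gives up.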

data/Rakefile ADDED

@@ -0,0 +1,28 @@
+require 'rake/clean'
+require 'rake/testtask'
+$LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
+require 'mongo_ha/version'
+task :gem do
+  system "gem build mongo_ha.gemspec"
+end
+task :publish => :gem do
+  system "git tag -a v#{MongoHA::VERSION} -m 'Tagging #{MongoHA::VERSION}'"
+  system "git push --tags"
+  system "gem push mongo_ha-#{MongoHA::VERSION}.gem"
+  system "rm mongo_ha-#{MongoHA::VERSION}.gem"
+end
+desc "Run Test Suite"
+task :test do
+  Rake::TestTask.new(:functional) do |t|
+    t.test_files = FileList['test/*_test.rb']
+    t.verbose    = true
+  end
+  Rake::Task['functional'].invoke
+end
+task :default => :test

data/lib/mongo_ha/mongo_client.rb ADDED

@@ -0,0 +1,188 @@
+require 'mongo'
+module MongoHA
+  module MongoClient
+    CONNECTION_RETRY_OPTS = [:reconnect_attempts, :reconnect_retry_seconds, :reconnect_retry_multiplier, :reconnect_max_retry_seconds]
+    # The following errors occur when mongos cannot connect to the shard
+    # They require a retry to resolve them
+    # This list was created through painful experience. Add any new ones as they are discovered
+    #   9001: socket exception
+    #   Operation failed with the following exception: Unknown error - Connection reset by peer:Unknown error - Connection reset by peer
+    #   DBClientBase::findOne: transport error
+    #   : db assertion failure
+    #   8002: 8002 all servers down!
+    #   Operation failed with the following exception: stream closed
+    #   Operation failed with the following exception: Bad file descriptor - Bad file descriptor:Bad file descriptor - Bad file descriptor
+    #   Failed to connect to primary node.
+    #   10009: ReplicaSetMonitor no master found for set: mdbb
+    MONGOS_CONNECTION_ERRORS = [
+      'socket exception',
+      'Connection reset by peer',
+      'transport error',
+      'db assertion failure',
+      '8002',
+      'stream closed',
+      'Bad file descriptor',
+      'Failed to connect',
+      '10009',
+      'no master found',
+      'not master',
+      'Timed out waiting on socket',
+      "didn't get writeback",
+    ]
+    module InstanceMethods
+      # Add retry logic to MongoClient
+      def self.included(base)
+        base.class_eval do
+          alias_method :receive_message_original, :receive_message
+          alias_method :connect_original, :connect
+          alias_method :valid_opts_original, :valid_opts
+          alias_method :setup_original, :setup
+          attr_accessor *CONNECTION_RETRY_OPTS
+          # Prevent multiple threads from trying to reconnect at the same time during
+          # connection failures
+          @@failover_mutex = Mutex.new
+          # Wrap internal networking calls with retry logic
+          # Do not stub out :send_message_with_gle or :send_message
+          # It modifies the message, see CollectionWriter#send_write_operation
+          def receive_message(*args)
+            retry_on_connection_failure do
+              receive_message_original *args
+            end
+          end
+          def connect(*args)
+            retry_on_connection_failure do
+              connect_original *args
+            end
+          end
+          protected
+          def valid_opts(*args)
+            valid_opts_original(*args) + CONNECTION_RETRY_OPTS
+          end
+          def setup(opts)
+            self.reconnect_attempts          = (opts.delete(:reconnect_attempts) || 53).to_i
+            self.reconnect_retry_seconds     = (opts.delete(:reconnect_retry_seconds) || 0.1).to_f
+            self.reconnect_retry_multiplier  = (opts.delete(:reconnect_retry_multiplier) || 2).to_f
+            self.reconnect_max_retry_seconds = (opts.delete(:reconnect_max_retry_seconds) || 5).to_f
+            setup_original(opts)
+          end
+        end
+      end
+      # Retry the supplied block when a Mongo::ConnectionFailure occurs
+      #
+      # Note: Check for Duplicate Key on inserts
+      #
+      # Returns the result of the block
+      #
+      # Example:
+      #   connection.retry_on_connection_failure { |retried| connection.ping }
+      def retry_on_connection_failure(&block)
+        raise "Missing mandatory block parameter on call to Mongo::Connection#retry_on_connection_failure" unless block
+        retried = false
+        mongos_retries = 0
+        begin
+          result = block.call(retried)
+          retried = false
+          result
+        rescue Mongo::ConnectionFailure => exc
+          # Retry if reconnected, but only once to prevent an infinite loop
+          logger.warn "Connection Failure: '#{exc.message}' [#{exc.error_code}]"
+          if !retried && reconnect
+            retried = true
+            # TODO There has to be a way to flush the connection pool of all inactive connections
+            retry
+          end
+          raise exc
+        rescue Mongo::OperationFailure => exc
+          # Workaround not master issue. Disconnect connection when we get a not master
+          # error message. Master checks for an exact match on "not master", whereas
+          # it sometimes gets: "not master and slaveok=false"
+          if exc.result
+            error = exc.result['err'] || exc.result['errmsg']
+            close if error && error.include?("not master")
+          end
+          # These get returned when connected to a local mongos router when it in turn
+          # has connection failures talking to the remote shards. All we do is retry the same operation
+          # since its connections to multiple remote shards may have failed.
+          # Disconnecting the current connection will not help since it is just to the mongos router
+          # First make sure it is connected to the mongos router
+          raise exc unless (MONGOS_CONNECTION_ERRORS.any? { |err| exc.message.include?(err) }) || (exc.message.strip == ':')
+          mongos_retries += 1
+          if mongos_retries <= 60
+            retried = true
+            Kernel.sleep(0.5)
+            logger.warn "[#{primary.inspect}] Router Connection Failure. Retry ##{mongos_retries}. Exc: '#{exc.message}' [#{exc.error_code}]"
+            # TODO Is there a way to flush the connection pool of all inactive connections
+            retry
+          end
+          raise exc
+        end
+      end
+      # Call this method whenever a Mongo::ConnectionFailure Exception
+      # has been raised to re-establish the connection
+      #
+      # This method is thread-safe and ensures that only one thread at a time
+      # per connection will attempt to re-establish the connection
+      #
+      # Returns whether the connection is connected again
+      def reconnect
+        logger.debug "Going to reconnect"
+        # Prevent other threads from invoking reconnect logic at the same time
+        @@failover_mutex.synchronize do
+          # Another thread may have already failed over the connection by the
+          # time this thread gets in
+          if active?
+            logger.info "Connected to: #{primary.inspect}"
+            return true
+          end
+          # Close all sockets that are not checked out so that other threads not
+          # currently waiting on Mongo, don't get bad connections and have to
+          # retry each one in turn
+          @primary_pool.close if @primary_pool
+          if reconnect_attempts > 0
+            # Wait for other threads to finish working on their sockets
+            retries = 1
+            retry_seconds = reconnect_retry_seconds
+            begin
+              logger.warn "Connection unavailable. Waiting: #{retry_seconds} seconds before retrying"
+              sleep retry_seconds
+              # Call original connect method since it is already within a retry block
+              connect_original
+            rescue Mongo::ConnectionFailure => exc
+              if retries < reconnect_attempts
+                retries += 1
+                retry_seconds *=  reconnect_retry_multiplier
+                retry_seconds = reconnect_max_retry_seconds if retry_seconds > reconnect_max_retry_seconds
+                retry
+              end
+              logger.error "Auto-reconnect giving up after #{retries} reconnect attempts"
+              raise exc
+            end
+            logger.info "Successfully reconnected to: #{primary.inspect}"
+          end
+          connected?
+        end
+      end
+    end
+  end
+end
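The single-thread reconnect guard in `reconnect` above can be illustrated in isolation. This is a minimal sketch, not the gem's code: `FakeClient` is a hypothetical stand-in for the driver's client, and it uses a per-instance mutex for simplicity, whereas the source uses a class-level `@@failover_mutex`.

```ruby
require 'thread'

# Hypothetical stand-in for a client; this is not the Mongo driver API.
class FakeClient
  attr_reader :reconnects

  def initialize
    @mutex      = Mutex.new
    @active     = false
    @reconnects = 0
  end

  def active?
    @active
  end

  # Only the first thread to take the lock performs the reconnect. Threads
  # that were waiting on the lock see an active connection and return.
  def reconnect
    @mutex.synchronize do
      return true if active?
      sleep 0.01 # simulate the time a real reconnect would take
      @reconnects += 1
      @active = true
    end
  end
end

client  = FakeClient.new
threads = Array.new(5) { Thread.new { client.reconnect } }
threads.each(&:join)
puts "Reconnects performed: #{client.reconnects}"
```

Checking `active?` again inside the lock is what keeps the waiting threads from each repeating the reconnect once they are let in.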

data/lib/mongo_ha/networking.rb ADDED

@@ -0,0 +1,58 @@
+module MongoHA
+  module Networking
+    module InstanceMethods
+      def self.included(base)
+        base.class_eval do
+          # Fix problem where a Timeout exception is not checking the socket back into the pool
+          #   Based on code from Gem V1.11.1, not needed with V1.12 or above
+          #   Only change is the ensure block
+          def send_message_with_gle(operation, message, db_name, log_message=nil, write_concern=false)
+            docs = num_received = cursor_id = ''
+            add_message_headers(message, operation)
+            last_error_message = build_get_last_error_message(db_name, write_concern)
+            last_error_id = add_message_headers(last_error_message, Mongo::Constants::OP_QUERY)
+            packed_message = message.append!(last_error_message).to_s
+            sock = nil
+            begin
+              sock = checkout_writer
+              send_message_on_socket(packed_message, sock)
+              docs, num_received, cursor_id = receive(sock, last_error_id)
+#              Removed checkin
+#              checkin(sock)
+            rescue Mongo::ConnectionFailure, Mongo::OperationFailure, Mongo::OperationTimeout => ex
+#              Removed checkin
+#              checkin(sock)
+              raise ex
+            rescue SystemStackError, NoMemoryError, SystemCallError => ex
+              close
+              raise ex
+#           Added ensure block to always check sock back in
+            ensure
+              checkin(sock) if sock
+            end
+            if num_received == 1
+              error = docs[0]['err'] || docs[0]['errmsg']
+              if error && error.include?("not master")
+                close
+                raise Mongo::ConnectionFailure.new(docs[0]['code'].to_s + ': ' + error, docs[0]['code'], docs[0])
+              elsif (!error.nil? && note = docs[0]['jnote'] || docs[0]['wnote']) # assignment
+                code = docs[0]['code'] || Mongo::ErrorCode::BAD_VALUE # as of server version 2.5.5
+                raise Mongo::WriteConcernError.new(code.to_s + ': ' + note, code, docs[0])
+              elsif error
+                code = docs[0]['code'] || Mongo::ErrorCode::UNKNOWN_ERROR
+                error = "wtimeout" if error == "timeout"
+                raise Mongo::WriteConcernError.new(code.to_s + ': ' + error, code, docs[0]) if error == "wtimeout"
+                raise Mongo::OperationFailure.new(code.to_s + ': ' + error, code, docs[0])
+              end
+            end
+            docs[0]
+          end
+        end
+      end
+    end
+  end
+end
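The only behavioral change in `send_message_with_gle` above is the `ensure` block that checks the socket back in. A minimal sketch of that pattern follows, assuming a hypothetical `TinyPool` class and `with_socket` helper that stand in for the driver's pool and are not part of the gem or the Mongo driver.

```ruby
require 'timeout'

# Hypothetical minimal pool; stands in for the driver's socket pool.
class TinyPool
  attr_reader :available

  def initialize(size)
    @available = Array.new(size) { |i| "socket-#{i}" }
  end

  def checkout
    @available.pop or raise "pool exhausted"
  end

  def checkin(sock)
    @available.push(sock)
  end
end

# Yields a checked-out socket and always returns it to the pool,
# whether the block completes or raises.
def with_socket(pool)
  sock = nil
  begin
    sock = pool.checkout
    yield sock
  ensure
    pool.checkin(sock) if sock
  end
end

pool = TinyPool.new(2)
begin
  with_socket(pool) { |_sock| raise Timeout::Error, "simulated timeout" }
rescue Timeout::Error => e
  puts "Operation failed: #{e.message}"
end
puts "Sockets available after failure: #{pool.available.size}"
```

Because the check-in lives in `ensure` rather than in individual `rescue` clauses, an exception type that no clause anticipates still returns the socket to the pool.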

data/lib/mongo_ha/version.rb ADDED

@@ -0,0 +1,3 @@
+module MongoHA #:nodoc
+  VERSION = "1.11.0.rc1"
+end

data/lib/mongo_ha.rb ADDED

@@ -0,0 +1,38 @@
+require 'mongo'
+require 'mongo_ha/version'
+require 'mongo_ha/mongo_client'
+require 'mongo_ha/networking'
+# Give MongoClient a class-specific logger if SemanticLogger is available
+# to give better logging information during a connection recovery scenario
+if defined?(SemanticLogger)
+  Mongo::MongoClient.send(:include, SemanticLogger::Loggable)
+  Mongo::MongoClient.send(:define_method, :logger) { super() }
+end
+# Add in retry methods
+Mongo::MongoClient.include(MongoHA::MongoClient::InstanceMethods)
+# Ensure connection is checked back into the pool when exceptions are thrown
+#   The following line is no longer required with Mongo V1.12 and above
+Mongo::Networking.include(MongoHA::Networking::InstanceMethods)
+# Wrap critical Mongo methods with retry_on_connection_failure
+{
+  Mongo::Collection                => [
+    :aggregate, :count, :capped?, :distinct, :drop, :drop_index, :drop_indexes,
+    :ensure_index, :find_one, :find_and_modify, :group, :index_information,
+    :options, :stats, :map_reduce
+  ],
+  Mongo::CollectionOperationWriter => [:send_write_operation, :batch_message_send],
+  Mongo::CollectionCommandWriter   => [:send_write_command, :batch_message_send]
+}.each_pair do |klass, methods|
+  methods.each do |method|
+    original_method = "#{method}_original".to_sym
+    klass.send(:alias_method, original_method, method)
+    klass.send(:define_method, method) do |*args|
+      @connection.retry_on_connection_failure { send(original_method, *args) }
+    end
+  end
+end
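The `alias_method` and `define_method` loop above is a general wrapping technique and can be tried without the Mongo driver. In this sketch `Store` and `around_call` are hypothetical: `Store` stands in for a driver class such as `Mongo::Collection`, and `around_call` stands in for `retry_on_connection_failure`.

```ruby
# Hypothetical class standing in for a driver class.
class Store
  attr_reader :wrapped_calls

  def initialize
    @wrapped_calls = 0
  end

  def fetch(key)
    "value-for-#{key}"
  end

  def size
    3
  end

  # Hypothetical stand-in for retry_on_connection_failure:
  # counts the call, then runs the block.
  def around_call
    @wrapped_calls += 1
    yield
  end
end

# Keep each original method under a new name, then redefine the public
# name so every call passes through around_call first.
{ Store => [:fetch, :size] }.each_pair do |klass, methods|
  methods.each do |method|
    original_method = "#{method}_original".to_sym
    klass.send(:alias_method, original_method, method)
    klass.send(:define_method, method) do |*args|
      around_call { send(original_method, *args) }
    end
  end
end

store = Store.new
puts store.fetch(:a)
puts store.size
puts "Calls routed through the wrapper: #{store.wrapped_calls}"
```

One property of this approach worth noting is that a wrapper defined with `|*args|` forwards positional arguments only, so a block passed to the wrapped method is not handed to the original.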

metadata ADDED

@@ -0,0 +1,65 @@
+--- !ruby/object:Gem::Specification
+name: mongo_ha
+version: !ruby/object:Gem::Version
+  version: 1.11.0.rc1
+platform: ruby
+authors:
+- Reid Morrison
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-01-01 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: mongo
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.11.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.11.0
+description: Automatic reconnects and recovery when replica-set changes, or connections
+  are lost, with transparent recovery
+email:
+- reidmo@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- README.md
+- Rakefile
+- lib/mongo_ha.rb
+- lib/mongo_ha/mongo_client.rb
+- lib/mongo_ha/networking.rb
+- lib/mongo_ha/version.rb
+homepage: https://github.com/reidmorrison/mongo_ha
+licenses:
+- Apache License V2.0
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 1.3.1
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.5
+signing_key:
+specification_version: 4
+summary: High availability for the mongo ruby driver
+test_files: []