mongo_ha 1.11.0.rc1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 40d6a7af7f740daf8f07e5f79713ae0b2ad76e2d
4
+ data.tar.gz: 1a0ba4bdda6f79ea283d1544f926d1901b500a1f
5
+ SHA512:
6
+ metadata.gz: 58f85d47132a40bf22cad95bc38044590b3fbcb0dbf1a78171f3dc3ae29aa559a7ff836867f3926465ecb2fdc5809de3447d50318112fc7665e39313c21ce355
7
+ data.tar.gz: a9e5aae3321e9e17f5ed1f401a952c8e39a60c1bea47345c4c5a461135a95f10e56411bb087fbc07a115882facaca62b010b6d17640cd35140d646116645de87
data/README.md ADDED
@@ -0,0 +1,162 @@
1
+ # mongo_ha
2
+
3
+ High availability for the Mongo Ruby driver. Automatic reconnects and recovery when the replica-set changes or connections are lost.
4
+
5
+ ## Status
6
+
7
+ Production Ready: Used every day in an enterprise environment across
8
+ remote data centers.
9
+
10
+ ## Overview
11
+
12
+ Adds methods to the Mongo Ruby driver to support retries on connection failure.
13
+
14
+ In the event of a connection failure, only one thread will attempt to re-establish
15
+ connectivity to the Mongo server(s). This is to prevent swamping the mongo
16
+ servers with reconnect attempts.
17
+
18
+ Retries are initially performed quickly in case it is a brief network issue
19
+ and then back off to give the replica-set time to elect a new master.
20
+
21
+ Currently only supports Ruby Mongo driver v1.11.x.
22
+
23
+ mongo_ha transparently supports MongoMapper since it uses the mongo ruby driver
24
+ that is patched by loading this gem.
25
+
26
+ Mongo Router processes will often return a connection failure on their side
27
+ as an OperationFailure. This code will also retry automatically when the router
28
+ has errors talking to a sharded cluster.
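The router-error retry described above relies on substring matching against the exception message. A minimal sketch of that check, assuming a hypothetical helper name (`router_connection_error?`); the fragments are copied from `MONGOS_CONNECTION_ERRORS` in `lib/mongo_ha/mongo_client.rb`, where the gem performs the test inline rather than through a helper:

```ruby
# Fragments copied from MONGOS_CONNECTION_ERRORS in lib/mongo_ha/mongo_client.rb.
ROUTER_ERROR_FRAGMENTS = [
  'socket exception',
  'Connection reset by peer',
  'transport error',
  'db assertion failure',
  '8002',
  'stream closed',
  'Bad file descriptor',
  'Failed to connect',
  '10009',
  'no master found',
  'not master',
  'Timed out waiting on socket',
  "didn't get writeback"
].freeze

# Hypothetical helper (not part of the gem's API): true when an
# OperationFailure message looks like a router-side connection problem.
def router_connection_error?(message)
  ROUTER_ERROR_FRAGMENTS.any? { |fragment| message.include?(fragment) } ||
    message.strip == ':' # the gem also retries on a bare ':' message
end

puts router_connection_error?('10009: ReplicaSetMonitor no master found for set: mdbb')
puts router_connection_error?('11000: E11000 duplicate key error')
```

Because the match is a plain substring test, short fragments such as `'8002'` would also match any unrelated message that happens to contain those digits.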
29
+
30
+ ## Mongo Cursors
31
+
32
+ Any operations that return a cursor need to be handled in your own code
33
+ since the retry cannot be handled transparently.
34
+ For example: `find` returns a cursor, whereas `find_one` is handled because
35
+ it returns the data itself rather than a cursor.
36
+
37
+ Example
38
+
39
+ ```ruby
40
+ # Wrap existing cursor based calls with a retry on connection failure block
41
+ results_collection.retry_on_connection_failure do
42
+ results_collection.find({}, sort: '_id', timeout: false) do |cursor|
43
+ cursor.each do |record|
44
+ puts "Record: #{record.inspect}"
45
+ end
46
+ end
47
+ end
48
+ ```
49
+
50
+ ### Note
51
+
52
+ In the above example the block will be repeated from the _beginning_ of the
53
+ collection should a connection failure occur. Without appropriate handling it
54
+ is possible to read the same records twice.
55
+
56
+ If the collection cannot be processed twice, it may be better to just let the
57
+ `Mongo::ConnectionFailure` flow up into the application for it to deal with at
58
+ a higher level.
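One way to avoid reading the same records twice, as cautioned in the note above, is to remember the last `_id` that was fully processed and resume after it on retry. The sketch below is an illustration only: `each_resumable`, the `fetch` lambda, and the use of `IOError` as a stand-in for `Mongo::ConnectionFailure` are all assumptions, not part of mongo_ha.

```ruby
# Hypothetical helper: iterate records in _id order and, when the source
# raises a connection error, resume after the last processed record.
# `fetch` is any callable that takes the last seen _id (or nil) and returns
# an enumerable of records sorted by _id.
def each_resumable(fetch, error_class: IOError, max_retries: 3)
  last_id = nil
  retries = 0
  begin
    fetch.call(last_id).each do |record|
      yield record
      last_id = record['_id'] # only advance once the record was handled
    end
  rescue error_class
    retries += 1
    raise if retries > max_retries
    retry
  end
end

# Simulated data source that fails once, part-way through the collection.
records = (1..5).map { |i| { '_id' => i } }
failed = false
fetch = lambda do |last_id|
  Enumerator.new do |yielder|
    records.select { |r| last_id.nil? || r['_id'] > last_id }.each do |r|
      if r['_id'] == 3 && !failed
        failed = true
        raise IOError, 'simulated connection failure'
      end
      yielder << r
    end
  end
end

seen = []
each_resumable(fetch) { |record| seen << record['_id'] }
puts seen.inspect
```

With the real driver, `fetch` would presumably wrap a `find` call that filters on `'_id' => { '$gt' => last_id }` and sorts by `_id`; that mapping is an assumption and is not shown in the README.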
59
+
60
+ ## Installation
61
+
62
+ Add to Gemfile:
63
+
64
+ ```ruby
65
+ gem 'mongo_ha'
66
+ ```
67
+
68
+ Or for standalone environments
69
+
70
+ ```shell
71
+ gem install mongo_ha
72
+ ```
73
+
74
+ If you are also using SemanticLogger, place `mongo_ha` below `semantic_logger`
75
+ and/or `rails_semantic_logger` in the `Gemfile`. This way it will create a logger
76
+ just for `Mongo::MongoClient` to improve the log output during connection recovery.
77
+
78
+ ## Configuration
79
+
80
+ mongo_ha adds several new configuration options to fine tune the reconnect behavior
81
+ for any environment.
82
+
83
+ Sample mongo.yml:
84
+
85
+ ```yaml
86
+ default_options: &default_options
87
+ :w: 1
88
+ :pool_size: 5
89
+ :pool_timeout: 5
90
+ :connect_timeout: 5
91
+ :reconnect_attempts: 53
92
+ :reconnect_retry_seconds: 0.1
93
+ :reconnect_retry_multiplier: 2
94
+ :reconnect_max_retry_seconds: 5
95
+
96
+ development: &development
97
+ uri: mongodb://localhost:27017/development
98
+ options:
99
+ <<: *default_options
100
+
101
+ test:
102
+ uri: mongodb://localhost:27017/test
103
+ options:
104
+ <<: *default_options
105
+
106
+ # Sample Production Settings
107
+ production:
108
+ uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production
109
+ options:
110
+ <<: *default_options
111
+ :pool_size: 50
112
+ :pool_timeout: 5
113
+ ```
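The sample `mongo.yml` above combines YAML anchors, merge keys (`<<`), and symbol-style keys (`:pool_size`). The following sketch shows how such a file resolves when loaded with Ruby's standard `yaml` library. It assumes Ruby 2.6 or later (for the `permitted_classes:` and `aliases:` keywords) and uses a shortened copy of the sample; how the host application actually loads the file is not covered by the README.

```ruby
require 'yaml'

# Shortened copy of the sample configuration, for illustration only.
MONGO_YML = <<~CONFIG
  default_options: &default_options
    :w: 1
    :pool_size: 5
    :reconnect_attempts: 53
    :reconnect_retry_seconds: 0.1

  production:
    uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production
    options:
      <<: *default_options
      :pool_size: 50
CONFIG

# Symbols and aliases must be permitted explicitly when using safe_load.
config  = YAML.safe_load(MONGO_YML, permitted_classes: [Symbol], aliases: true)
options = config['production']['options']

puts options.inspect
```

Keys listed after the merge key override the merged defaults, so `:pool_size` resolves to the production value while the reconnect settings are inherited from `default_options`.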
114
+
115
+ The following options can be specified in the Mongo configuration options
116
+ to tune the retry intervals during a connection failure:
117
+
118
+ ### :reconnect_attempts
119
+
120
+ * Number of times to attempt to reconnect.
121
+ * Default: 53
122
+
123
+ ### :reconnect_retry_seconds
124
+
125
+ * Initial delay before retrying
126
+ * Default: 0.1
127
+
128
+ ### :reconnect_retry_multiplier
129
+
130
+ * Multiply delay by this number with each retry to prevent overwhelming the server
131
+ * Default: 2
132
+
133
+ ### :reconnect_max_retry_seconds
134
+
135
+ * Maximum number of seconds to wait before retrying again
136
+ * Default: 5
137
+
138
+ Using the above default values results in reconnect attempts at the following intervals (in seconds):
139
+
140
+ 0.1 0.2 0.4 0.8 1.6 3.2 5 5 5 5 ....
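The interval sequence above follows from the four options: start at `:reconnect_retry_seconds`, multiply by `:reconnect_retry_multiplier` after each failed attempt, and cap at `:reconnect_max_retry_seconds`. A small sketch that reproduces the schedule; the function name `reconnect_intervals` is hypothetical, and the loop mirrors the logic in the `reconnect` method of `lib/mongo_ha/mongo_client.rb`:

```ruby
# Hypothetical helper: list the delay (in seconds) slept before each
# reconnect attempt, using the gem's documented defaults.
def reconnect_intervals(attempts: 53, initial: 0.1, multiplier: 2.0, max: 5.0)
  intervals = []
  delay = initial
  attempts.times do
    intervals << delay
    # Grow the delay, but never beyond the configured maximum.
    delay = [delay * multiplier, max].min
  end
  intervals
end

schedule = reconnect_intervals
puts schedule.first(10).join(' ')
puts "Total wait if every attempt fails: #{schedule.sum.round(1)} seconds"
```

With the defaults, the total comes to roughly four minutes of waiting before the reconnect gives up.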
141
+
142
+ ## Testing
143
+
144
+ There is really only one place to test something like `mongo_ha` and that is in
145
+ a high volume mission critical production environment.
146
+ The initial code in this gem was created over 2 years with MongoDB running in an
147
+ enterprise production environment with hundreds of connections to Mongo servers
148
+ in remote data centers across a WAN. It adds high availability to standalone
149
+ MongoDB servers, replica-sets, and sharded clusters.
150
+
151
+ ## Issues
152
+
153
+ If the following output appears after adding the above connection options:
154
+
155
+ ```shell
156
+ reconnect_attempts is not a valid option for Mongo::MongoClient
157
+ reconnect_retry_seconds is not a valid option for Mongo::MongoClient
158
+ reconnect_retry_multiplier is not a valid option for Mongo::MongoClient
159
+ reconnect_max_retry_seconds is not a valid option for Mongo::MongoClient
160
+ ```
161
+
162
+ Then the `mongo_ha` gem was not loaded prior to connecting to Mongo.
data/Rakefile ADDED
@@ -0,0 +1,28 @@
1
+ require 'rake/clean'
2
+ require 'rake/testtask'
3
+
4
+ $LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
5
+ require 'mongo_ha/version'
6
+
7
+ task :gem do
8
+ system "gem build mongo_ha.gemspec"
9
+ end
10
+
11
+ task :publish => :gem do
12
+ system "git tag -a v#{MongoHA::VERSION} -m 'Tagging #{MongoHA::VERSION}'"
13
+ system "git push --tags"
14
+ system "gem push mongo_ha-#{MongoHA::VERSION}.gem"
15
+ system "rm mongo_ha-#{MongoHA::VERSION}.gem"
16
+ end
17
+
18
+ desc "Run Test Suite"
19
+ task :test do
20
+ Rake::TestTask.new(:functional) do |t|
21
+ t.test_files = FileList['test/*_test.rb']
22
+ t.verbose = true
23
+ end
24
+
25
+ Rake::Task['functional'].invoke
26
+ end
27
+
28
+ task :default => :test
data/lib/mongo_ha/mongo_client.rb ADDED
@@ -0,0 +1,188 @@
1
+ require 'mongo'
2
+ module MongoHA
3
+ module MongoClient
4
+ CONNECTION_RETRY_OPTS = [:reconnect_attempts, :reconnect_retry_seconds, :reconnect_retry_multiplier, :reconnect_max_retry_seconds]
5
+
6
+ # The following errors occur when mongos cannot connect to the shard
7
+ # They require a retry to resolve them
8
+ # This list was created through painful experience. Add any new ones as they are discovered
9
+ # 9001: socket exception
10
+ # Operation failed with the following exception: Unknown error - Connection reset by peer:Unknown error - Connection reset by peer
11
+ # DBClientBase::findOne: transport error
12
+ # : db assertion failure
13
+ # 8002: 8002 all servers down!
14
+ # Operation failed with the following exception: stream closed
15
+ # Operation failed with the following exception: Bad file descriptor - Bad file descriptor:Bad file descriptor - Bad file descriptor
16
+ # Failed to connect to primary node.
17
+ # 10009: ReplicaSetMonitor no master found for set: mdbb
18
+ MONGOS_CONNECTION_ERRORS = [
19
+ 'socket exception',
20
+ 'Connection reset by peer',
21
+ 'transport error',
22
+ 'db assertion failure',
23
+ '8002',
24
+ 'stream closed',
25
+ 'Bad file descriptor',
26
+ 'Failed to connect',
27
+ '10009',
28
+ 'no master found',
29
+ 'not master',
30
+ 'Timed out waiting on socket',
31
+ "didn't get writeback",
32
+ ]
33
+
34
+ module InstanceMethods
35
+ # Add retry logic to MongoClient
36
+ def self.included(base)
37
+ base.class_eval do
38
+ alias_method :receive_message_original, :receive_message
39
+ alias_method :connect_original, :connect
40
+ alias_method :valid_opts_original, :valid_opts
41
+ alias_method :setup_original, :setup
42
+
43
+ attr_accessor *CONNECTION_RETRY_OPTS
44
+
45
+ # Prevent multiple threads from trying to reconnect at the same time during
46
+ # connection failures
47
+ @@failover_mutex = Mutex.new
48
+ # Wrap internal networking calls with retry logic
49
+
50
+ # Do not stub out :send_message_with_gle or :send_message
51
+ # It modifies the message, see CollectionWriter#send_write_operation
52
+
53
+ def receive_message(*args)
54
+ retry_on_connection_failure do
55
+ receive_message_original *args
56
+ end
57
+ end
58
+
59
+ def connect(*args)
60
+ retry_on_connection_failure do
61
+ connect_original *args
62
+ end
63
+ end
64
+
65
+ protected
66
+
67
+ def valid_opts(*args)
68
+ valid_opts_original(*args) + CONNECTION_RETRY_OPTS
69
+ end
70
+
71
+ def setup(opts)
72
+ self.reconnect_attempts = (opts.delete(:reconnect_attempts) || 53).to_i
73
+ self.reconnect_retry_seconds = (opts.delete(:reconnect_retry_seconds) || 0.1).to_f
74
+ self.reconnect_retry_multiplier = (opts.delete(:reconnect_retry_multiplier) || 2).to_f
75
+ self.reconnect_max_retry_seconds = (opts.delete(:reconnect_max_retry_seconds) || 5).to_f
76
+ setup_original(opts)
77
+ end
78
+
79
+ end
80
+ end
81
+
82
+ # Retry the supplied block when a Mongo::ConnectionFailure occurs
83
+ #
84
+ # Note: Check for Duplicate Key on inserts
85
+ #
86
+ # Returns the result of the block
87
+ #
88
+ # Example:
89
+ # connection.retry_on_connection_failure { |retried| connection.ping }
90
+ def retry_on_connection_failure(&block)
91
+ raise "Missing mandatory block parameter on call to Mongo::Connection#retry_on_connection_failure" unless block
92
+ retried = false
93
+ mongos_retries = 0
94
+ begin
95
+ result = block.call(retried)
96
+ retried = false
97
+ result
98
+ rescue Mongo::ConnectionFailure => exc
99
+ # Retry if reconnected, but only once to prevent an infinite loop
100
+ logger.warn "Connection Failure: '#{exc.message}' [#{exc.error_code}]"
101
+ if !retried && reconnect
102
+ retried = true
103
+ # TODO There has to be a way to flush the connection pool of all inactive connections
104
+ retry
105
+ end
106
+ raise exc
107
+ rescue Mongo::OperationFailure => exc
108
+ # Workaround not master issue. Disconnect connection when we get a not master
109
+ # error message. Master checks for an exact match on "not master", whereas
110
+ # it sometimes gets: "not master and slaveok=false"
111
+ if exc.result
112
+ error = exc.result['err'] || exc.result['errmsg']
113
+ close if error && error.include?("not master")
114
+ end
115
+
116
+ # These get returned when connected to a local mongos router when it in turn
117
+ # has connection failures talking to the remote shards. All we do is retry the same operation
118
+ # since its connections to multiple remote shards may have failed.
119
+ # Disconnecting the current connection will not help since it is just to the mongos router
120
+ # First make sure it is connected to the mongos router
121
+ raise exc unless (MONGOS_CONNECTION_ERRORS.any? { |err| exc.message.include?(err) }) || (exc.message.strip == ':')
122
+
123
+ mongos_retries += 1
124
+ if mongos_retries <= 60
125
+ retried = true
126
+ Kernel.sleep(0.5)
127
+ logger.warn "[#{primary.inspect}] Router Connection Failure. Retry ##{mongos_retries}. Exc: '#{exc.message}' [#{exc.error_code}]"
128
+ # TODO Is there a way to flush the connection pool of all inactive connections
129
+ retry
130
+ end
131
+ raise exc
132
+ end
133
+ end
134
+
135
+ # Call this method whenever a Mongo::ConnectionFailure Exception
136
+ # has been raised to re-establish the connection
137
+ #
138
+ # This method is thread-safe and ensures that only one thread at a time
139
+ # per connection will attempt to re-establish the connection
140
+ #
141
+ # Returns whether the connection is connected again
142
+ def reconnect
143
+ logger.debug "Going to reconnect"
144
+
145
+ # Prevent other threads from invoking reconnect logic at the same time
146
+ @@failover_mutex.synchronize do
147
+ # Another thread may have already failed over the connection by the
148
+ # time this thread gets in
149
+ if active?
150
+ logger.info "Connected to: #{primary.inspect}"
151
+ return true
152
+ end
153
+
154
+ # Close all sockets that are not checked out so that other threads not
155
+ # currently waiting on Mongo, don't get bad connections and have to
156
+ # retry each one in turn
157
+ @primary_pool.close if @primary_pool
158
+
159
+ if reconnect_attempts > 0
160
+ # Wait for other threads to finish working on their sockets
161
+ retries = 1
162
+ retry_seconds = reconnect_retry_seconds
163
+ begin
164
+ logger.warn "Connection unavailable. Waiting: #{retry_seconds} seconds before retrying"
165
+ sleep retry_seconds
166
+ # Call original connect method since it is already within a retry block
167
+ connect_original
168
+ rescue Mongo::ConnectionFailure => exc
169
+ if retries < reconnect_attempts
170
+ retries += 1
171
+ retry_seconds *= reconnect_retry_multiplier
172
+ retry_seconds = reconnect_max_retry_seconds if retry_seconds > reconnect_max_retry_seconds
173
+ retry
174
+ end
175
+
176
+ logger.error "Auto-reconnect giving up after #{retries} reconnect attempts"
177
+ raise exc
178
+ end
179
+ logger.info "Successfully reconnected to: #{primary.inspect}"
180
+ end
181
+ connected?
182
+ end
183
+
184
+ end
185
+
186
+ end
187
+ end
188
+ end
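The `reconnect` method above serialises recovery: a thread takes the mutex, re-checks whether another thread has already restored the connection, and only then reconnects. A self-contained sketch of that check-inside-the-lock pattern follows. The `Reconnector` class and its counters are hypothetical, and it uses a per-instance mutex for simplicity, whereas the gem uses a class-level `@@failover_mutex`.

```ruby
# Hypothetical stand-in for a client whose connection has been lost.
class Reconnector
  attr_reader :connect_calls

  def initialize
    @mutex = Mutex.new
    @connected = false
    @connect_calls = 0
  end

  def connected?
    @connected
  end

  # Returns true once the connection is usable again.
  def reconnect
    @mutex.synchronize do
      # Another thread may already have recovered the connection
      # while this one was waiting for the lock.
      return true if connected?

      @connect_calls += 1
      sleep 0.01 # stand-in for the real (slow) connect call
      @connected = true
    end
  end
end

client  = Reconnector.new
threads = Array.new(10) { Thread.new { client.reconnect } }
threads.each(&:join)
puts "connect attempts: #{client.connect_calls}"
```

Ten threads call `reconnect`, but only the first one through the lock performs the connect; the rest see the restored state and return immediately.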
data/lib/mongo_ha/networking.rb ADDED
@@ -0,0 +1,58 @@
1
+ module MongoHA
2
+ module Networking
3
+ module InstanceMethods
4
+ def self.included(base)
5
+ base.class_eval do
6
+ # Fix problem where a Timeout exception is not checking the socket back into the pool
7
+ # Based on code from Gem V1.11.1, not needed with V1.12 or above
8
+ # Only change is the ensure block
9
+ def send_message_with_gle(operation, message, db_name, log_message=nil, write_concern=false)
10
+ docs = num_received = cursor_id = ''
11
+ add_message_headers(message, operation)
12
+
13
+ last_error_message = build_get_last_error_message(db_name, write_concern)
14
+ last_error_id = add_message_headers(last_error_message, Mongo::Constants::OP_QUERY)
15
+
16
+ packed_message = message.append!(last_error_message).to_s
17
+ sock = nil
18
+ begin
19
+ sock = checkout_writer
20
+ send_message_on_socket(packed_message, sock)
21
+ docs, num_received, cursor_id = receive(sock, last_error_id)
22
+ # Removed checkin
23
+ # checkin(sock)
24
+ rescue Mongo::ConnectionFailure, Mongo::OperationFailure, Mongo::OperationTimeout => ex
25
+ # Removed checkin
26
+ # checkin(sock)
27
+ raise ex
28
+ rescue SystemStackError, NoMemoryError, SystemCallError => ex
29
+ close
30
+ raise ex
31
+ # Added ensure block to always check sock back in
32
+ ensure
33
+ checkin(sock) if sock
34
+ end
35
+
36
+ if num_received == 1
37
+ error = docs[0]['err'] || docs[0]['errmsg']
38
+ if error && error.include?("not master")
39
+ close
40
+ raise Mongo::ConnectionFailure.new(docs[0]['code'].to_s + ': ' + error, docs[0]['code'], docs[0])
41
+ elsif (!error.nil? && note = docs[0]['jnote'] || docs[0]['wnote']) # assignment
42
+ code = docs[0]['code'] || Mongo::ErrorCode::BAD_VALUE # as of server version 2.5.5
43
+ raise Mongo::WriteConcernError.new(code.to_s + ': ' + note, code, docs[0])
44
+ elsif error
45
+ code = docs[0]['code'] || Mongo::ErrorCode::UNKNOWN_ERROR
46
+ error = "wtimeout" if error == "timeout"
47
+ raise Mongo::WriteConcernError.new(code.to_s + ': ' + error, code, docs[0]) if error == "wtimeout"
48
+ raise Mongo::OperationFailure.new(code.to_s + ': ' + error, code, docs[0])
49
+ end
50
+ end
51
+
52
+ docs[0]
53
+ end
54
+ end
55
+ end
56
+ end
57
+ end
58
+ end
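The only behavioural change in the patched `send_message_with_gle` above is moving the socket check-in into an `ensure` block, so the socket returns to the pool even when an exception that is not explicitly rescued (such as a timeout) propagates. The sketch below illustrates that pattern with a hypothetical `TinyPool` class; it is not the driver's pool implementation.

```ruby
require 'timeout'

# Hypothetical minimal pool used only to illustrate the ensure pattern.
class TinyPool
  def initialize(size)
    @available = Array.new(size) { |i| "socket-#{i}" }
  end

  def available_count
    @available.size
  end

  def checkout
    @available.pop || raise('pool exhausted')
  end

  def checkin(sock)
    @available.push(sock)
  end

  # Yields a socket and always returns it to the pool afterwards,
  # whether the block succeeds or raises.
  def with_socket
    sock = nil
    begin
      sock = checkout
      yield sock
    ensure
      checkin(sock) if sock
    end
  end
end

pool = TinyPool.new(2)
begin
  pool.with_socket { |_sock| raise Timeout::Error, 'simulated timeout' }
rescue Timeout::Error
  # The error still reaches the caller, but the socket was checked back in.
end
puts "sockets available: #{pool.available_count}"
```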
data/lib/mongo_ha/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module MongoHA #:nodoc
2
+ VERSION = "1.11.0.rc1"
3
+ end
data/lib/mongo_ha.rb ADDED
@@ -0,0 +1,38 @@
1
+ require 'mongo'
2
+ require 'mongo_ha/version'
3
+ require 'mongo_ha/mongo_client'
4
+ require 'mongo_ha/networking'
5
+
6
+ # Give MongoClient a class-specific logger if SemanticLogger is available
7
+ # to give better logging information during a connection recovery scenario
8
+ if defined?(SemanticLogger)
9
+ Mongo::MongoClient.send(:include, SemanticLogger::Loggable)
10
+ Mongo::MongoClient.send(:define_method, :logger) { super() }
11
+ end
12
+
13
+ # Add in retry methods
14
+ Mongo::MongoClient.include(MongoHA::MongoClient::InstanceMethods)
15
+
16
+ # Ensure connection is checked back into the pool when exceptions are thrown
17
+ # The following line is no longer required with Mongo V1.12 and above
18
+ Mongo::Networking.include(MongoHA::Networking::InstanceMethods)
19
+
20
+ # Wrap critical Mongo methods with retry_on_connection_failure
21
+ {
22
+ Mongo::Collection => [
23
+ :aggregate, :count, :capped?, :distinct, :drop, :drop_index, :drop_indexes,
24
+ :ensure_index, :find_one, :find_and_modify, :group, :index_information,
25
+ :options, :stats, :map_reduce
26
+ ],
27
+ Mongo::CollectionOperationWriter => [:send_write_operation, :batch_message_send],
28
+ Mongo::CollectionCommandWriter => [:send_write_command, :batch_message_send]
29
+
30
+ }.each_pair do |klass, methods|
31
+ methods.each do |method|
32
+ original_method = "#{method}_original".to_sym
33
+ klass.send(:alias_method, original_method, method)
34
+ klass.send(:define_method, method) do |*args|
35
+ @connection.retry_on_connection_failure { send(original_method, *args) }
36
+ end
37
+ end
38
+ end
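The loop above wraps existing methods by aliasing each one to `<name>_original` and redefining the original name to call it inside a retry block. A self-contained sketch of the same technique on a toy class follows. `Store`, `retry_on_failure`, and the use of `IOError` are hypothetical stand-ins, and unlike the gem's wrapper this version also forwards a block argument.

```ruby
# Toy class whose first call to #count fails, standing in for a
# collection method that hits a connection failure.
class Store
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def count
    @calls += 1
    raise IOError, 'simulated connection failure' if @calls == 1

    42
  end

  # Simplified stand-in for retry_on_connection_failure.
  def retry_on_failure
    attempts = 0
    begin
      yield
    rescue IOError
      attempts += 1
      retry if attempts < 3
      raise
    end
  end
end

# Alias each method to "<name>_original", then redefine the original
# name so that every call goes through the retry block.
[:count].each do |method|
  original = :"#{method}_original"
  Store.send(:alias_method, original, method)
  Store.send(:define_method, method) do |*args, &block|
    retry_on_failure { send(original, *args, &block) }
  end
end

store  = Store.new
result = store.count
puts "result=#{result} after #{store.calls} calls"
```

Callers keep using `count` unchanged; the first failure is absorbed by the wrapper and the second call succeeds.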
metadata ADDED
@@ -0,0 +1,65 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mongo_ha
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.11.0.rc1
5
+ platform: ruby
6
+ authors:
7
+ - Reid Morrison
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-01-01 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: mongo
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 1.11.0
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 1.11.0
27
+ description: Automatic reconnects and recovery when replica-set changes, or connections
28
+ are lost, with transparent recovery
29
+ email:
30
+ - reidmo@gmail.com
31
+ executables: []
32
+ extensions: []
33
+ extra_rdoc_files: []
34
+ files:
35
+ - README.md
36
+ - Rakefile
37
+ - lib/mongo_ha.rb
38
+ - lib/mongo_ha/mongo_client.rb
39
+ - lib/mongo_ha/networking.rb
40
+ - lib/mongo_ha/version.rb
41
+ homepage: https://github.com/reidmorrison/mongo_ha
42
+ licenses:
43
+ - Apache License V2.0
44
+ metadata: {}
45
+ post_install_message:
46
+ rdoc_options: []
47
+ require_paths:
48
+ - lib
49
+ required_ruby_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ required_rubygems_version: !ruby/object:Gem::Requirement
55
+ requirements:
56
+ - - ">"
57
+ - !ruby/object:Gem::Version
58
+ version: 1.3.1
59
+ requirements: []
60
+ rubyforge_project:
61
+ rubygems_version: 2.4.5
62
+ signing_key:
63
+ specification_version: 4
64
+ summary: High availability for the mongo ruby driver
65
+ test_files: []