redis_failover 0.7.0 → 0.8.0

data/Changes.md CHANGED
@@ -1,3 +1,8 @@
+ 0.8.0
+ -----------
+ - Added manual failover support (can be initiated via RedisFailover::Client#manual_failover)
+ - Misc. cleanup
+
  0.7.0
  -----------
  - When new master promotion occurs, make existing slaves point to new candidate before promoting new master.
data/README.md CHANGED
@@ -10,7 +10,7 @@ architecture. The existing standard Redis client for Ruby also only supports con
  Redis server. When using master/slave replication, it is desirable to have all writes go to the
  master, and all reads go to one of the N configured slaves.
 
- This gem attempts to address these failover scenarios. A redis failover Node Manager daemon runs as a background
+ This gem (built using [ZK][]) attempts to address these failover scenarios. A redis failover Node Manager daemon runs as a background
  process and monitors all of your configured master/slave nodes. When the daemon starts up, it
  automatically discovers the current master/slaves. Background watchers are setup for each of
  the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unavailable" state.
@@ -35,6 +35,8 @@ dispatch Redis read operations to one of N slaves, and Redis write operations to
  If it fails to communicate with any node, it will go back and fetch the current list of available servers, and then
  optionally retry the operation.
 
+ [ZK]: https://github.com/slyphon/zk
+
  ## Installation
 
  redis_failover has an external dependency on ZooKeeper. You must have a running ZooKeeper cluster already available in order to use redis_failover. ZooKeeper provides redis_failover with its high availability and data consistency between Redis::Failover clients and the Node Manager daemon. Please see the requirements section below for more information on installing and setting up ZooKeeper if you don't have it running already.
@@ -120,6 +122,15 @@ The full set of options that can be passed to RedisFailover::Client are:
  :retry_failure - indicate if failures should be retried (default true)
  :max_retries - max retries for a failure (default 3)
 
+ ## Manual Failover
+
+ Manual failover can be initiated via RedisFailover::Client#manual_failover. This schedules a manual failover with the
+ currently active Node Manager. Once the Node Manager receives the request, it will either failover to the specific
+ server passed to #manual_failover, or it will pick a random slave to become the new master. Here's an example:
+
+     client = RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183')
+     client.manual_failover(:host => 'localhost', :port => 2222)
+
  ## Requirements
 
  - redis_failover is actively tested against MRI 1.9.2/1.9.3 and JRuby 1.6.7 (1.9 mode only). Other rubies may work, although I don't actively test against them. 1.8 is not supported.
@@ -129,8 +140,12 @@ The full set of options that can be passed to RedisFailover::Client are:
 
  - Note that by default the Node Manager will mark slaves that are currently syncing with their master as "available" based on the configuration value set for "slave-serve-stale-data" in redis.conf. By default this value is set to "yes" in the configuration, which means that slaves still syncing with their master will be available for servicing read requests. If you don't want this behavior, just set "slave-serve-stale-data" to "no" in your redis.conf file.
 
+ ## Limitations
+
  - Note that it's still possible for the RedisFailover::Client instances to see a stale list of servers for a very small window. In most cases this will not be the case due to how ZooKeeper handles distributed communication, but you should be aware that in the worst case the client could write to a "stale" master for a small period of time until the next watch event is received by the client via ZooKeeper.
 
+ - Note that currently multiple Node Managers are currently used for redundancy purposes only. The Node Managers do not communicate with each other to perform any type of election or voting to determine if they all agree on promoting a new master. Right now Node Managers that are not "active" just sit and wait until they can grab the lock to become the single decision-maker for which Redis servers are available or not. This means that a scenario could present itself where a Node Manager thinks the Redis master is available, however the actual RedisFailover::Client instances think they can't reach the Redis master (either due to network partitions or the Node Manager flapping due to machine failure, etc). We are exploring ways to improve this situation.
+
  ## Resources
 
  - To learn more about Redis master/slave replication, see the [Redis documentation](http://redis.io/topics/replication).
@@ -15,6 +15,7 @@ require 'redis_failover/node'
  require 'redis_failover/errors'
  require 'redis_failover/client'
  require 'redis_failover/runner'
+ require 'redis_failover/manual'
  require 'redis_failover/version'
  require 'redis_failover/node_manager'
  require 'redis_failover/node_watcher'
@@ -87,7 +87,6 @@ module RedisFailover
      # Creates a new failover redis client.
      #
      # Options:
-     #
      # :zkservers - comma-separated ZooKeeper host:port pairs (required)
      # :znode_path - the Znode path override for redis server list (optional)
      # :password - password for redis nodes (optional)
@@ -96,7 +95,6 @@ module RedisFailover
      # :logger - logger override (optional)
      # :retry_failure - indicate if failures should be retried (default true)
      # :max_retries - max retries for a failure (default 3)
-     #
      def initialize(options = {})
        Util.logger = options[:logger] if options[:logger]
        @zkservers = options.fetch(:zkservers) { raise ArgumentError, ':zkservers required'}
@@ -132,6 +130,18 @@ module RedisFailover
      end
      alias_method :to_s, :inspect
 
+     # Force a manual failover to a new server. A specific server can be specified
+     # via options. If no options are passed, a random slave will be selected as
+     # the candidate for the new master.
+     #
+     # Options:
+     # :host - the host of the failover candidate
+     # :port - the port of the failover candidate
+     def manual_failover(options = {})
+       Manual.failover(zk, options)
+       self
+     end
+
      private
 
      def zk
@@ -141,13 +151,15 @@ module RedisFailover
      def start_zk
        @delivery_thread ||= Thread.new do
          while event = @queue.pop
-           if event.is_a?(Proc)
-             event.call
-           else
-             handle_zk_event(event)
+           begin
+             Proc === event ? event.call : handle_zk_event(event)
+           rescue => ex
+             logger.error("Error while handling event: #{ex.inspect}")
+             logger.error(ex.backtrace.join("\n"))
            end
          end
        end
+
        reconnect_zk
      end
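The reworked delivery loop in the hunk above wraps each event in `begin`/`rescue`, so a failing handler is logged instead of killing the delivery thread. A minimal sketch of that pattern using only the standard `Queue` and `Thread` classes follows. The `handle_event` lambda and the `handled`/`errors` arrays are hypothetical stand-ins for `handle_zk_event` and the gem's logger, not part of the gem.

```ruby
require 'thread'

# Hypothetical stand-ins: `handled` collects processed events and `errors`
# collects failures, in place of handle_zk_event and the gem's logger.
handled = []
errors  = []
handle_event = lambda { |event| handled << event }

queue = Queue.new

delivery_thread = Thread.new do
  # A nil entry ends the loop, mirroring `while event = @queue.pop`.
  while event = queue.pop
    begin
      # Procs are invoked directly; anything else goes to the handler.
      Proc === event ? event.call : handle_event.call(event)
    rescue => ex
      # Record the failure and keep consuming; the thread stays alive.
      errors << ex.message
    end
  end
end

queue << :znode_changed
queue << proc { raise 'boom' }
queue << :znode_changed_again
queue << nil
delivery_thread.join
```

In this sketch the event queued after the failing proc is still delivered, which is the behavior the added `rescue` appears to be aiming for.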
153
165
 
@@ -165,6 +177,7 @@ module RedisFailover
          @zk.on_connected do
            @zk.stat(@znode, :watch => true)
          end
+         @zk.stat(@znode, :watch => true)
        end
      end
 
@@ -181,13 +194,13 @@ module RedisFailover
      end
 
      def reconnect_zk
-       handle_lost_connection
        @lock.synchronize do
+         handle_lost_connection
          @zk.close! if @zk
          @zk = ZK.new(@zkservers)
+         handle_session_established
+         update_znode_timestamp
        end
-       handle_session_established
-       update_znode_timestamp
      end
 
      def handle_lost_connection
@@ -200,8 +213,10 @@ module RedisFailover
 
      def dispatch(method, *args, &block)
        unless recently_heard_from_node_manager?
-         purge_clients
-         raise MissingNodeManagerError.new(ZNODE_UPDATE_TIMEOUT)
+         @lock.synchronize do
+           reconnect_zk
+           build_clients
+         end
        end
 
        verify_supported!(method)
@@ -40,10 +40,4 @@ module RedisFailover
        super("Operation `#{operation}` is currently unsupported")
      end
    end
-
-   class MissingNodeManagerError < Error
-     def initialize(timeout)
-       super("Failed to hear from Node Manager within #{timeout} seconds")
-     end
-   end
  end
@@ -0,0 +1,26 @@
+ module RedisFailover
+   # Provides manual failover support to a new master.
+   module Manual
+     extend self
+
+     # Path for manual failover communication.
+     ZNODE_PATH = '/redis_failover_manual'.freeze
+
+     # Denotes that any slave can be used as a candidate for promotion.
+     ANY_SLAVE = "ANY_SLAVE".freeze
+
+     def failover(zk, options = {})
+       create_path(zk)
+       node = options.empty? ? ANY_SLAVE : "#{options[:host]}:#{options[:port]}"
+       zk.set(ZNODE_PATH, node)
+     end
+
+     private
+
+     def create_path(zk)
+       zk.create(ZNODE_PATH)
+     rescue ZK::Exceptions::NodeExists
+       # best effort
+     end
+   end
+ end
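The new `Manual` module communicates the request as a plain string written to a znode: either the `ANY_SLAVE` sentinel or `host:port`. The sketch below isolates that encoding and the matching `split(':', 2)` on the Node Manager side, without any ZooKeeper connection. The helper names `encode_request` and `decode_request` are hypothetical and exist only for illustration.

```ruby
# Same sentinel value as Manual::ANY_SLAVE in the hunk above.
ANY_SLAVE = 'ANY_SLAVE'.freeze

# Hypothetical helper: builds the string that Manual.failover stores in the znode.
def encode_request(options = {})
  options.empty? ? ANY_SLAVE : "#{options[:host]}:#{options[:port]}"
end

# Hypothetical helper: splits the payload the way the Node Manager side does.
# Returns nil for the ANY_SLAVE sentinel, otherwise [host, port] as strings.
def decode_request(payload)
  return nil if payload == ANY_SLAVE
  payload.split(':', 2)
end
```

Two details fall out of this encoding: the port comes back as a string after the round trip, and an options hash with only `:host` produces a payload with an empty port, so callers presumably need to pass both keys.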
@@ -82,7 +82,7 @@ module RedisFailover
      end
 
      def ==(other)
-       return false unless other.is_a?(Node)
+       return false unless Node === other
        return true if self.equal?(other)
        [host, port] == [other.host, other.port]
      end
@@ -132,13 +132,13 @@ module RedisFailover
          yield redis
        end
      rescue
-       raise NodeUnavailableError.new(self)
+       raise NodeUnavailableError, self, caller
      ensure
        if redis
          begin
            redis.client.disconnect
          rescue
-           raise NodeUnavailableError.new(self)
+           raise NodeUnavailableError, self, caller
          end
        end
      end
@@ -24,13 +24,17 @@ module RedisFailover
        logger.info("Redis Node Manager v#{VERSION} starting (#{RUBY_DESCRIPTION})")
        @options = options
        @znode = @options[:znode_path] || Util::DEFAULT_ZNODE_PATH
+       @manual_znode = Manual::ZNODE_PATH
+       @mutex = Mutex.new
      end
 
      def start
        @queue = Queue.new
-       @zk = ZK.new(@options[:zkservers])
+       @leader = false
+       setup_zk
        logger.info('Waiting to become master Node Manager ...')
        @zk.with_lock(LOCK_PATH) do
+         @leader = true
          logger.info('Acquired master Node Manager lock')
          discover_nodes
          initialize_path
@@ -50,6 +54,7 @@ module RedisFailover
      end
 
      def shutdown
+       @queue.clear
        @queue << nil
        @watchers.each(&:shutdown) if @watchers
        @zk.close! if @zk
@@ -57,14 +62,33 @@ module RedisFailover
 
      private
 
+     def setup_zk
+       @zk.close! if @zk
+       @zk = ZK.new(@options[:zkservers])
+
+       @zk.register(@manual_znode) do |event|
+         @mutex.synchronize do
+           if event.node_changed?
+             schedule_manual_failover
+           end
+         end
+       end
+
+       @zk.on_connected do
+         @zk.stat(@manual_znode, :watch => true)
+       end
+       @zk.stat(@manual_znode, :watch => true)
+     end
+
      def handle_state_reports
        while state_report = @queue.pop
          begin
            node, state = state_report
            case state
-           when :unavailable then handle_unavailable(node)
-           when :available then handle_available(node)
-           when :syncing then handle_syncing(node)
+           when :unavailable     then handle_unavailable(node)
+           when :available       then handle_available(node)
+           when :syncing         then handle_syncing(node)
+           when :manual_failover then handle_manual_failover(node)
            else raise InvalidNodeStateError.new(node, state)
            end
 
@@ -127,6 +151,17 @@ module RedisFailover
        handle_available(node)
      end
 
+     def handle_manual_failover(node)
+       # no-op if node to be failed over is already master
+       return if @master == node
+       logger.info("Handling manual failover")
+
+       # make current master a slave, and promote new master
+       @slaves << @master
+       @slaves.delete(node)
+       promote_new_master(node)
+     end
+
      def promote_new_master(node = nil)
        delete_path
        @master = nil
@@ -244,5 +279,19 @@ module RedisFailover
        create_path
        @zk.set(@znode, encode(current_nodes))
      end
+
+     def schedule_manual_failover
+       return unless @leader
+       new_master = @zk.get(@manual_znode, :watch => true).first
+       logger.info("Received manual failover request for: #{new_master}")
+
+       node = if new_master == Manual::ANY_SLAVE
+         @slaves.sample
+       else
+         host, port = new_master.split(':', 2)
+         Node.new(:host => host, :port => port, :password => @options[:password])
+       end
+       notify_state(node, :manual_failover) if node
+     end
    end
  end
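Taken together, `schedule_manual_failover` and `handle_manual_failover` pick a candidate (a random slave for the sentinel, otherwise the named node) and then swap roles: the old master joins the slave list and the candidate leaves it. The sketch below models only that bookkeeping. `FailoverState` is a hypothetical class, nodes are plain `host:port` strings, and promotion is reduced to an assignment, whereas the gem's `promote_new_master` does more work.

```ruby
# Hypothetical, simplified state holder: nodes are plain "host:port" strings
# and promotion just assigns the new master (the gem does more than this).
class FailoverState
  attr_reader :master, :slaves

  def initialize(master, slaves)
    @master = master
    @slaves = slaves.dup
  end

  # Picks the candidate the way schedule_manual_failover does:
  # a random slave for the sentinel, otherwise the named node.
  def candidate_for(request)
    request == 'ANY_SLAVE' ? @slaves.sample : request
  end

  # Mirrors the bookkeeping in handle_manual_failover.
  def manual_failover(node)
    return if node.nil? || @master == node # nothing to do
    @slaves << @master   # demote the current master
    @slaves.delete(node) # the candidate is no longer a slave
    @master = node
  end
end
```

A request naming the current master is a no-op in this sketch, matching the early `return if @master == node` in the hunk above.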
@@ -17,7 +17,7 @@ module RedisFailover
      end
 
      def watch
-       @monitor_thread = Thread.new { monitor_node }
+       @monitor_thread ||= Thread.new { monitor_node }
        self
      end
 
@@ -9,17 +9,10 @@ module RedisFailover
        node_manager_thread.join
      end
 
-     def self.node_manager
-       @node_manager
-     end
-
      def self.trap_signals
        [:INT, :TERM].each do |signal|
-         previous_signal = trap(signal) do
+         trap(signal) do
            Util.logger.info('Shutting down ...')
-           if previous_signal && previous_signal.respond_to?(:call)
-             previous_signal.call
-           end
            @node_manager.shutdown
            exit(0)
          end
@@ -1,3 +1,3 @@
  module RedisFailover
-   VERSION = "0.7.0"
+   VERSION = "0.8.0"
  end
@@ -17,8 +17,8 @@ Gem::Specification.new do |gem|
 
    gem.add_dependency('redis')
    gem.add_dependency('redis-namespace')
-   gem.add_dependency('multi_json', '>= 1.0', '< 1.3')
-   gem.add_dependency('zk', '~> 0.9')
+   gem.add_dependency('multi_json', '~> 1')
+   gem.add_dependency('zk', '~> 1.0')
 
    gem.add_development_dependency('rake')
    gem.add_development_dependency('rspec')
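The dependency changes above loosen `multi_json` to any 1.x release and move `zk` from the 0.9 series to 1.x. The effect of each constraint can be checked directly with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The specific version numbers probed below are arbitrary examples chosen for illustration, not versions named in the diff.

```ruby
require 'rubygems'

# Constraints copied from the gemspec hunk above.
old_multi_json = Gem::Requirement.new('>= 1.0', '< 1.3')
new_multi_json = Gem::Requirement.new('~> 1')
old_zk         = Gem::Requirement.new('~> 0.9')
new_zk         = Gem::Requirement.new('~> 1.0')

# Small helper to keep the checks readable.
v = lambda { |string| Gem::Version.new(string) }

multi_json_1_3_old = old_multi_json.satisfied_by?(v.call('1.3.0')) # excluded by '< 1.3'
multi_json_1_3_new = new_multi_json.satisfied_by?(v.call('1.3.0')) # '~> 1' allows any 1.x
multi_json_2_new   = new_multi_json.satisfied_by?(v.call('2.0.0')) # but not 2.x

zk_1_0_old = old_zk.satisfied_by?(v.call('1.0.0')) # '~> 0.9' stops below 1.0
zk_1_0_new = new_zk.satisfied_by?(v.call('1.0.0')) # '~> 1.0' starts at 1.0
zk_0_9_new = new_zk.satisfied_by?(v.call('0.9.1')) # 0.9.x no longer accepted
```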
data/spec/client_spec.rb CHANGED
@@ -89,9 +89,10 @@ module RedisFailover
          expect { client.select }.to raise_error(UnsupportedOperationError)
        end
 
-       it 'raises error when no communication from Node Manager within certain time window' do
+       it 'attempts ZK reconnect when no communication from Node Manager within certain time window' do
          client.instance_variable_set(:@last_znode_timestamp, Time.at(0))
-         expect { client.del('foo') }.to raise_error(MissingNodeManagerError)
+         client.should_receive(:reconnect_zk)
+         client.del('foo')
        end
      end
    end
@@ -3,16 +3,6 @@ module RedisFailover
      attr_accessor :master
      public :current_nodes
 
-     def initialize(options)
-       super
-       @zklock = Object.new
-       @zklock.instance_eval do
-         def with_lock
-           yield
-         end
-       end
-     end
-
      def discover_nodes
        # only discover nodes once in testing
        return if @nodes_discovered
28
18
  @nodes_discovered = true
29
19
  end
30
20
 
21
+ def setup_zk
22
+ @zk = NullObject.new
23
+ end
24
+
31
25
  def slaves
32
26
  @slaves
33
27
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: redis_failover
  version: !ruby/object:Gem::Version
-   version: 0.7.0
+   version: 0.8.0
  prerelease:
  platform: ruby
  authors:
@@ -9,11 +9,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-04-24 00:00:00.000000000 Z
+ date: 2012-05-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: redis
-   requirement: &70345784320360 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -21,10 +21,15 @@ dependencies:
          version: '0'
    type: :runtime
    prerelease: false
-   version_requirements: *70345784320360
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: redis-namespace
-   requirement: &70345784319940 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -32,35 +37,47 @@ dependencies:
          version: '0'
    type: :runtime
    prerelease: false
-   version_requirements: *70345784319940
- - !ruby/object:Gem::Dependency
-   name: multi_json
-   requirement: &70345784319420 !ruby/object:Gem::Requirement
+   version_requirements: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
        - !ruby/object:Gem::Version
-         version: '1.0'
-     - - <
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: multi_json
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
        - !ruby/object:Gem::Version
-         version: '1.3'
+         version: '1'
    type: :runtime
    prerelease: false
-   version_requirements: *70345784319420
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '1'
  - !ruby/object:Gem::Dependency
    name: zk
-   requirement: &70345784318660 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
        - !ruby/object:Gem::Version
-         version: '0.9'
+         version: '1.0'
    type: :runtime
    prerelease: false
-   version_requirements: *70345784318660
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '1.0'
  - !ruby/object:Gem::Dependency
    name: rake
-   requirement: &70345784318280 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -68,10 +85,15 @@ dependencies:
          version: '0'
    type: :development
    prerelease: false
-   version_requirements: *70345784318280
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: rspec
-   requirement: &70345784317820 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -79,7 +101,12 @@ dependencies:
          version: '0'
    type: :development
    prerelease: false
-   version_requirements: *70345784317820
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
  description: Redis Failover is a ZooKeeper-based automatic master/slave failover solution
    for Ruby
  email:
@@ -102,6 +129,7 @@ files:
  - lib/redis_failover/cli.rb
  - lib/redis_failover/client.rb
  - lib/redis_failover/errors.rb
+ - lib/redis_failover/manual.rb
  - lib/redis_failover/node.rb
  - lib/redis_failover/node_manager.rb
  - lib/redis_failover/node_watcher.rb
@@ -132,7 +160,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
        version: '0'
        segments:
        - 0
-       hash: -4131531819785678189
+       hash: -3688012640966712276
  required_rubygems_version: !ruby/object:Gem::Requirement
    none: false
    requirements:
@@ -141,10 +169,10 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
        segments:
        - 0
-       hash: -4131531819785678189
+       hash: -3688012640966712276
  requirements: []
  rubyforge_project:
- rubygems_version: 1.8.16
+ rubygems_version: 1.8.23
  signing_key:
  specification_version: 3
  summary: Redis Failover is a ZooKeeper-based automatic master/slave failover solution