redis_failover 0.9.7.2 → 1.0.0

data/Changes.md CHANGED
@@ -1,3 +1,17 @@
+ 1.0.0
+ -----------
+ ** NOTE: This version of redis_failover requires that you upgrade your clients and Node Managers at the same time.
+
+ - redis_failover now supports distributed monitoring among the Node Managers! Previously, the Node Managers were only used
+   as a means of redundancy in case a particular Node Manager crashed. Starting with version 1.0 of redis_failover, the Node
+   Managers will all periodically report their health snapshots. The primary Node Manager will use a configurable
+   "node strategy" to determine if a particular node is available or unavailable.
+ - redis_failover now supports a configurable "failover strategy" that's consulted when performing a failover. Currently,
+   a single strategy is provided that takes into account the average latency of the last health check to the redis server.
+ - Improved handling of the underlying ZK client connection in RedisFailover::NodeManager
+ - Add support for passing in an existing ZK client instance to RedisFailover::Client.new
+ - Reduce unnecessary writes to ZK
+
  0.9.7.2
  -----------
  - Add support for Redis#client's location method. Fixes a compatibility issue with redis_failover and Sidekiq.
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
 [![Build Status](https://secure.travis-ci.org/ryanlecompte/redis_failover.png?branch=master)](http://travis-ci.org/ryanlecompte/redis_failover)
 
- redis_failover attempts to provides a full automatic master/slave failover solution for Ruby. Redis does not provide
+ redis_failover attempts to provide a full automatic master/slave failover solution for Ruby. Redis does not currently provide
 an automatic failover capability when configured for master/slave replication. When the master node dies,
 a new master must be manually brought online and assigned as the slave's new master. This manual
 switch-over is not desirable in high traffic sites where Redis is a critical part of the overall
@@ -10,8 +10,8 @@ architecture. The existing standard Redis client for Ruby also only supports con
 Redis server. When using master/slave replication, it is desirable to have all writes go to the
 master, and all reads go to one of the N configured slaves.
 
- This gem (built using [ZK][]) attempts to address these failover scenarios. A redis failover Node Manager daemon runs as a background
- process and monitors all of your configured master/slave nodes. When the daemon starts up, it
+ This gem (built using [ZK][]) attempts to address these failover scenarios. One or more Node Manager daemons run as background
+ processes and monitor all of your configured master/slave nodes. When a daemon starts up, it
 automatically discovers the current master/slaves. Background watchers are setup for each of
 the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unavailable" state.
 If the node that went offline was the master, then one of the slaves will be promoted as the new master.
@@ -22,8 +22,10 @@ nodes. Note that detection of a node going down should be nearly instantaneous,
 used to keep tabs on a node is via a blocking Redis BLPOP call (no polling). This call fails nearly
 immediately when the node actually goes offline. To avoid false positives (i.e., intermittent flaky
 network interruption), the Node Manager will only mark a node as unavailable if it fails to communicate with
- it 3 times (this is configurable via --max-failures, see configuration options below). Note that you can
- deploy multiple Node Manager daemons for added redundancy.
+ it 3 times (this is configurable via --max-failures, see configuration options below). Note that you can (and should)
+ deploy multiple Node Manager daemons since they each report periodic health snapshots of the redis servers. A
+ "node strategy" is used to determine if a node is actually unavailable. By default a majority strategy is used, but
+ you can also configure "consensus" or "single".
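The three node strategies boil down to a vote over the Node Managers' availability reports. The sketch below is an illustration of that idea only (the helper name and report shape are hypothetical, not the gem's API):

```ruby
# Vote over Node Manager reports. `reports` maps a Node Manager id to
# true (it saw the node available) or false (it saw the node down).
def node_unavailable?(reports, strategy = :majority)
  down = reports.values.count(false)
  case strategy
  when :majority  then down > reports.size / 2  # more than half say down
  when :consensus then down == reports.size     # every manager says down
  when :single    then down >= 1                # any single manager says down
  end
end

reports = { 'nm1' => true, 'nm2' => false, 'nm3' => false }
node_unavailable?(reports, :majority)   # => true  (2 of 3 report it down)
node_unavailable?(reports, :consensus)  # => false (nm1 still sees it up)
node_unavailable?(reports, :single)     # => true  (at least one failure report)
```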
 
 This gem provides a RedisFailover::Client wrapper that is master/slave aware. The client is configured
 with a list of ZooKeeper servers. The client will automatically contact the ZooKeeper cluster to find out
@@ -64,15 +66,20 @@ following options:
 
     Usage: redis_node_manager [OPTIONS]
 
+
     Specific options:
-        -n, --nodes NODES                Comma-separated redis host:port pairs
-        -z, --zkservers SERVERS          Comma-separated ZooKeeper host:port pairs
-        -p, --password PASSWORD          Redis password
-            --znode-path PATH            Znode path override for storing redis server list
-            --max-failures COUNT         Max failures before manager marks node unavailable
-        -C, --config PATH                Path to YAML configuration file
-        -E, --environment ENV            Config environment to use
-        -h, --help                       Display all options
+        -n, --nodes NODES                Comma-separated redis host:port pairs
+        -z, --zkservers SERVERS          Comma-separated ZooKeeper host:port pairs
+        -p, --password PASSWORD          Redis password
+            --znode-path PATH            Znode path override for storing redis server list
+            --max-failures COUNT         Max failures before manager marks node unavailable
+        -C, --config PATH                Path to YAML config file
+            --with-chroot ROOT           Path to ZooKeeper's chroot
+        -E, --environment ENV            Config environment to use
+            --node-strategy STRATEGY     Strategy used when determining availability of nodes (default: majority)
+            --failover-strategy STRATEGY Strategy used when failing over to a new node (default: latency)
+            --required-node-managers COUNT  Required Node Managers that must be reachable to determine node state (default: 1)
+        -h, --help                       Display all options
 
 To start the daemon for a simple master/slave configuration, use the following:
 
@@ -83,6 +90,9 @@ would look like the following:
 
     ---
     :max_failures: 2
+    :node_strategy: majority
+    :failover_strategy: latency
+    :required_node_managers: 2
     :nodes:
       - localhost:6379
       - localhost:1111
@@ -103,25 +113,35 @@ directory for configuration file samples.
 
 The Node Manager will automatically discover the master/slaves upon startup. Note that it is
 a good idea to run more than one instance of the Node Manager daemon in your environment. At
- any moment, a single Node Manager process will be designated to monitor the redis servers. If
+ any moment, a single Node Manager process will be designated to manage the redis servers. If
 this Node Manager process dies or becomes partitioned from the network, another Node Manager
- will be promoted as the primary monitor of redis servers. You can run as many Node Manager
- processes as you'd like for added redundancy.
+ will be promoted as the primary manager of redis servers. You can run as many Node Manager
+ processes as you'd like. Every Node Manager periodically records health "snapshots", which the
+ primary/master Node Manager consults when determining if it should officially mark a redis
+ server as unavailable. By default, a majority strategy is used. When a failover
+ happens, the primary Node Manager will also consult the node snapshots to determine the best
+ node to use as the new master.
 
 ## Client Usage
 
 The redis failover client must be used in conjunction with a running Node Manager daemon. The
 client supports various configuration options; however, the only mandatory option is the list of
- ZooKeeper servers:
+ ZooKeeper servers OR an existing ZK client instance:
 
+     # Explicitly specify the ZK servers
     client = RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183')
 
+     # Explicitly specify an existing ZK client instance (useful if using a connection pool, etc.)
+     zk = ZK.new('localhost:2181,localhost:2182,localhost:2183')
+     client = RedisFailover::Client.new(:zk => zk)
+
 The client actually employs the common redis and redis-namespace gems underneath, so this should be
 a drop-in replacement for your existing pure redis client usage.
 
 The full set of options that can be passed to RedisFailover::Client are:
 
-    :zkservers  - comma-separated ZooKeeper host:port pairs (required)
+    :zk         - an existing ZK client instance
+    :zkservers  - comma-separated ZooKeeper host:port pairs
     :znode_path - the Znode path override for redis server list (optional)
     :password   - password for redis nodes (optional)
     :db         - db to use for redis nodes (optional)
@@ -149,6 +169,25 @@ server passed to #manual_failover, or it will pick a random slave to become the
 
     client = RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183')
     client.manual_failover(:host => 'localhost', :port => 2222)
 
+ ## Node & Failover Strategies
+
+ As of redis_failover version 1.0, the notion of "node" and "failover" strategies exists. All running Node Managers periodically record
+ "snapshots" of their view of the redis nodes. The primary Node Manager processes these snapshots from all of the Node Managers by running a configurable
+ node strategy. By default, a majority strategy is used: if a majority of Node Managers indicate that a node is unavailable, then the primary
+ Node Manager will officially mark it as unavailable. Other strategies exist:
+
+ - consensus (all Node Managers must agree that the node is unavailable)
+ - single (at least one Node Manager reporting the node as unavailable will cause it to be marked as such)
+
+ When a failover happens, the primary Node Manager consults a "failover strategy" to determine which candidate node should be used. Currently only a single
+ strategy is provided by redis_failover: latency. This strategy selects the node that is both marked as available by all Node Managers and has the lowest
+ average latency for its last health check.
+
+ Note that you should set the "required_node_managers" configuration option appropriately. This value (which defaults to 1) determines how many Node
+ Managers must have reported their view of a node's state before the primary acts on it. For example, if you have deployed 5 Node Managers, set this value
+ to 5 if you only want to accept a node's availability when all 5 Node Managers are part of the snapshot. For more flexibility, you may want to set this
+ value to 3 instead, which would allow you to take down 2 Node Managers while still letting the cluster be managed appropriately.
+
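The interaction between required_node_managers and the node strategy can be pictured as a quorum gate in front of the vote. This is a toy model with a hypothetical helper name, not the gem's internals:

```ruby
# Toy model: the primary Node Manager only acts on a snapshot once enough
# Node Managers have contributed to it (required_node_managers), then
# applies the node strategy (majority, here).
# `reports` maps a Node Manager id to true/false (node seen available).
def mark_unavailable?(reports, required_node_managers)
  # Quorum gate: not enough Node Managers have reported yet, take no action.
  return false if reports.size < required_node_managers
  reports.values.count(false) > reports.size / 2 # majority strategy
end

# With required_node_managers = 3, two failure reports alone are not enough:
mark_unavailable?({ 'a' => false, 'b' => false }, 3)               # => false
# Once a third manager reports, the majority vote can proceed:
mark_unavailable?({ 'a' => false, 'b' => false, 'c' => true }, 3)  # => true
```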
 ## Documentation
 
 redis_failover uses YARD for its API documentation. Refer to the generated [API documentation](http://rubydoc.info/github/ryanlecompte/redis_failover/master/frames) for full coverage.
@@ -166,8 +205,6 @@ redis_failover uses YARD for its API documentation. Refer to the generated [API
 
 - Note that it's still possible for the RedisFailover::Client instances to see a stale list of servers for a very small window. In most cases this will not be the case due to how ZooKeeper handles distributed communication, but you should be aware that in the worst case the client could write to a "stale" master for a small period of time until the next watch event is received by the client via ZooKeeper.
 
- - Note that currently multiple Node Managers are currently used for redundancy purposes only. The Node Managers do not communicate with each other to perform any type of election or voting to determine if they all agree on promoting a new master. Right now Node Managers that are not "active" just sit and wait until they can grab the lock to become the single decision-maker for which Redis servers are available or not. This means that a scenario could present itself where a Node Manager thinks the Redis master is available, however the actual RedisFailover::Client instances think they can't reach the Redis master (either due to network partitions or the Node Manager flapping due to machine failure, etc). We are exploring ways to improve this situation.
-
 
 ## Resources
 
 - Check out Steve Whittaker's [redis-failover-test](https://github.com/swhitt/redis-failover-test) project which shows how to test redis_failover in a non-trivial configuration using Vagrant/Chef.
@@ -2,6 +2,9 @@
 # redis_node_manager -C config.yml
 ---
 :max_failures: 2
+:node_strategy: majority
+:failover_strategy: latency
+:required_node_managers: 2
 :nodes:
   - localhost:6379
   - localhost:1111
@@ -6,6 +6,7 @@ require 'thread'
 require 'logger'
 require 'timeout'
 require 'optparse'
+require 'benchmark'
 require 'multi_json'
 require 'securerandom'
 
@@ -16,7 +17,9 @@ require 'redis_failover/errors'
 require 'redis_failover/client'
 require 'redis_failover/runner'
 require 'redis_failover/version'
+require 'redis_failover/node_strategy'
 require 'redis_failover/node_manager'
 require 'redis_failover/node_watcher'
+require 'redis_failover/node_snapshot'
 require 'redis_failover/manual_failover'
-
+require 'redis_failover/failover_strategy'
@@ -48,6 +48,21 @@ module RedisFailover
         options[:config_environment] = config_env
       end
 
+      opts.on('--node-strategy STRATEGY',
+        'Strategy used when determining availability of nodes (default: majority)') do |strategy|
+        options[:node_strategy] = strategy
+      end
+
+      opts.on('--failover-strategy STRATEGY',
+        'Strategy used when failing over to a new node (default: latency)') do |strategy|
+        options[:failover_strategy] = strategy
+      end
+
+      opts.on('--required-node-managers COUNT',
+        'Required Node Managers that must be reachable to determine node state (default: 1)') do |count|
+        options[:required_node_managers] = Integer(count)
+      end
+
       opts.on('-h', '--help', 'Display all options') do
         puts opts
         exit
@@ -59,7 +74,7 @@ module RedisFailover
         options = from_file(config_file, options[:config_environment])
       end
 
-      if required_options_missing?(options)
+      if invalid_options?(options)
         puts parser
         exit
       end
@@ -68,7 +83,7 @@ module RedisFailover
     end
 
     # @return [Boolean] true if required options missing, false otherwise
-    def self.required_options_missing?(options)
+    def self.invalid_options?(options)
       return true if options.empty?
       return true unless options.values_at(:nodes, :zkservers).all?
       false
@@ -113,6 +128,14 @@ module RedisFailover
         options[:nodes].each { |opts| opts.update(:password => password) }
       end
 
+      if node_strategy = options[:node_strategy]
+        options[:node_strategy] = node_strategy.to_sym
+      end
+
+      if failover_strategy = options[:failover_strategy]
+        options[:failover_strategy] = failover_strategy.to_sym
+      end
+
       options
     end
   end
@@ -40,6 +40,7 @@ module RedisFailover
     #
     # @param [Hash] options the options used to initialize the client instance
     # @option options [String] :zkservers comma-separated ZooKeeper host:port
+    # @option options [String] :zk an existing ZK client connection instance
     # @option options [String] :znode_path znode path override for redis nodes
     # @option options [String] :password password for redis nodes
     # @option options [String] :db database to use for redis nodes
@@ -49,6 +50,7 @@ module RedisFailover
     # @option options [Integer] :max_retries max retries for a failure
     # @option options [Boolean] :safe_mode indicates if safe mode is used or not
    # @option options [Boolean] :master_only indicates if only redis master is used
+    # @note Use either :zkservers or :zk
     # @return [RedisFailover::Client]
     def initialize(options = {})
       Util.logger = options[:logger] if options[:logger]
@@ -133,7 +135,7 @@ module RedisFailover
     # @option options [String] :host the host of the failover candidate
     # @option options [String] :port the port of the failover candidate
     def manual_failover(options = {})
-      ManualFailover.new(@zk, options).perform
+      ManualFailover.new(@zk, @root_znode, options).perform
       self
     end
 
@@ -177,13 +179,13 @@ module RedisFailover
 
     # Sets up the underlying ZooKeeper connection.
     def setup_zk
-      @zk = ZK.new(@zkservers)
-      @zk.watcher.register(@znode) { |event| handle_zk_event(event) }
+      @zk = ZK.new(@zkservers) if @zkservers
+      @zk.register(redis_nodes_path) { |event| handle_zk_event(event) }
       if @safe_mode
         @zk.on_expired_session { purge_clients }
       end
-      @zk.on_connected { @zk.stat(@znode, :watch => true) }
-      @zk.stat(@znode, :watch => true)
+      @zk.on_connected { @zk.stat(redis_nodes_path, :watch => true) }
+      @zk.stat(redis_nodes_path, :watch => true)
       update_znode_timestamp
     end
 
@@ -196,12 +198,12 @@ module RedisFailover
         build_clients
       elsif event.node_deleted?
         purge_clients
-        @zk.stat(@znode, :watch => true)
+        @zk.stat(redis_nodes_path, :watch => true)
       else
         logger.error("Unknown ZK node event: #{event.inspect}")
       end
     ensure
-      @zk.stat(@znode, :watch => true)
+      @zk.stat(redis_nodes_path, :watch => true)
     end
 
     # Determines if a method is a known redis operation.
@@ -310,7 +312,7 @@ module RedisFailover
     #
     # @return [Hash] the known master/slave redis servers
     def fetch_nodes
-      data = @zk.get(@znode, :watch => true).first
+      data = @zk.get(redis_nodes_path, :watch => true).first
       nodes = symbolize_keys(decode(data))
       logger.debug("Fetched nodes: #{nodes.inspect}")
 
@@ -319,6 +321,10 @@ module RedisFailover
       logger.debug { "Caught #{ex.class} '#{ex.message}' - reopening ZK client" }
       @zk.reopen
       retry
+    rescue *ZK_ERRORS => ex
+      logger.warn { "Caught #{ex.class} '#{ex.message}' - retrying" }
+      sleep(RETRY_WAIT_TIME)
+      retry
     end
 
     # Builds new Redis clients for the specified nodes.
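The rescue/sleep/retry addition above is a standard Ruby resilience idiom. A standalone sketch of the same shape follows; `TransientError` and `WAIT` are placeholders standing in for the gem's `ZK_ERRORS` and `RETRY_WAIT_TIME`, and the sketch adds an attempt cap where fetch_nodes retries indefinitely:

```ruby
# Standalone sketch of the sleep-and-retry idiom used in fetch_nodes.
class TransientError < StandardError; end
WAIT = 0.01 # seconds between attempts (stand-in for RETRY_WAIT_TIME)

def with_retries(max_attempts: 3)
  attempts = 0
  begin
    yield
  rescue TransientError
    attempts += 1
    raise if attempts >= max_attempts # give up after max_attempts failures
    sleep(WAIT)
    retry # re-runs the begin block from the top
  end
end

calls = 0
with_retries { calls += 1; raise TransientError if calls < 3; :ok } # => :ok after two retries
```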
@@ -475,8 +481,12 @@ module RedisFailover
     #
     # @param [Hash] options the configuration options
     def parse_options(options)
-      @zkservers = options.fetch(:zkservers) { raise ArgumentError, ':zkservers required' }
-      @znode = options.fetch(:znode_path, Util::DEFAULT_ZNODE_PATH)
+      @zk, @zkservers = options.values_at(:zk, :zkservers)
+      if [@zk, @zkservers].all? || [@zk, @zkservers].none?
+        raise ArgumentError, 'must specify either :zk or :zkservers, but not both'
+      end
+
+      @root_znode = options.fetch(:znode_path, Util::DEFAULT_ROOT_ZNODE_PATH)
       @namespace = options[:namespace]
       @password = options[:password]
       @db = options[:db]
@@ -485,5 +495,10 @@ module RedisFailover
       @safe_mode = options.fetch(:safe_mode, true)
       @master_only = options.fetch(:master_only, false)
     end
+
+    # @return [String] the znode path for the master redis nodes config
+    def redis_nodes_path
+      "#{@root_znode}/nodes"
+    end
   end
 end
@@ -51,8 +51,4 @@ module RedisFailover
       super("Operation `#{operation}` is currently unsupported")
     end
   end
-
-  # Raised when we detect an expired ZK session.
-  class ZKDisconnectedError < Error
-  end
 end
@@ -0,0 +1,25 @@
+module RedisFailover
+  # Base class for strategies that determine which node is used during failover.
+  class FailoverStrategy
+    include Util
+
+    # Loads a strategy based on the given name.
+    #
+    # @param [String, Symbol] name the strategy name
+    # @return [Object] a new strategy instance
+    def self.for(name)
+      require "redis_failover/failover_strategy/#{name.downcase}"
+      const_get(name.capitalize).new
+    rescue LoadError, NameError
+      raise "Failed to find failover strategy: #{name}"
+    end
+
+    # Returns a candidate node as determined by this strategy.
+    #
+    # @param [Hash<Node, NodeSnapshot>] snapshots the node snapshots
+    # @return [Node] the candidate node or nil if one couldn't be found
+    def find_candidate(snapshots)
+      raise NotImplementedError
+    end
+  end
+end
@@ -0,0 +1,21 @@
+module RedisFailover
+  class FailoverStrategy
+    # Failover strategy that selects an available node that is both seen by all
+    # node managers and has the lowest reported health check latency.
+    class Latency < FailoverStrategy
+      # @see RedisFailover::FailoverStrategy#find_candidate
+      def find_candidate(snapshots)
+        candidates = {}
+        snapshots.each do |node, snapshot|
+          if snapshot.all_available?
+            candidates[node] = snapshot.avg_latency
+          end
+        end
+
+        if candidate = candidates.min_by(&:last)
+          candidate.first
+        end
+      end
+    end
+  end
+end
@@ -2,37 +2,49 @@ module RedisFailover
   # Provides manual failover support to a new master.
   class ManualFailover
     # Path for manual failover communication.
-    ZNODE_PATH = '/redis_failover_manual'.freeze
+    ZNODE_PATH = 'manual_failover'.freeze
 
     # Denotes that any slave can be used as a candidate for promotion.
     ANY_SLAVE = "ANY_SLAVE".freeze
 
+    def self.path(root_znode)
+      "#{root_znode}/#{ZNODE_PATH}"
+    end
+
     # Creates a new instance.
     #
     # @param [ZK] zk the ZooKeeper client
+    # @param [String] root_znode the root ZK node path
     # @param [Hash] options the options used for manual failover
     # @option options [String] :host the host of the failover candidate
     # @option options [String] :port the port of the failover candidate
     # @note
     #   If options is empty, a random slave will be used
     #   as a failover candidate.
-    def initialize(zk, options = {})
+    def initialize(zk, root_znode, options = {})
       @zk = zk
+      @root_znode = root_znode
       @options = options
+
+      unless @options.empty?
+        port = Integer(@options[:port]) rescue nil
+        raise ArgumentError, ':host not properly specified' if @options[:host].to_s.empty?
+        raise ArgumentError, ':port not properly specified' if port.nil?
+      end
     end
 
     # Performs a manual failover.
     def perform
       create_path
       node = @options.empty? ? ANY_SLAVE : "#{@options[:host]}:#{@options[:port]}"
-      @zk.set(ZNODE_PATH, node)
+      @zk.set(self.class.path(@root_znode), node)
     end
 
     private
 
     # Creates the znode path used for coordinating manual failovers.
     def create_path
-      @zk.create(ZNODE_PATH)
+      @zk.create(self.class.path(@root_znode))
     rescue ZK::Exceptions::NodeExists
       # best effort
     end