nogara-redis_failover 0.9.7 → 0.9.7.2

data/Changes.md CHANGED
@@ -1,3 +1,20 @@
+ HEAD
+ -----------
+ - redis_failover now supports distributed monitoring among the node managers! Previously, the node managers were only used
+ as a means of redundancy in case a particular node manager crashed. Starting with version 1.0 of redis_failover, the node
+ managers will all periodically report their health snapshots. The primary node manager will use a configurable
+ "node strategy" to determine if a particular node is available or unavailable.
+ - redis_failover now supports a configurable "failover strategy" that's consulted when performing a failover. Currently,
+ a single strategy is provided that takes into account the average latency of the last health check to the redis server.
+
+ 0.9.7.2
+ -----------
+ - Add support for Redis#client's location method. Fixes a compatibility issue between redis_failover and Sidekiq.
+
+ 0.9.7.1
+ -----------
+ - Stop repeated attempts to acquire exclusive lock in Node Manager (#36)
+
 0.9.7
 -----------
 - Stubbed Client#client to return itself, fixes a fork reconnect bug with Resque (dbalatero)
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
 [![Build Status](https://secure.travis-ci.org/ryanlecompte/redis_failover.png?branch=master)](http://travis-ci.org/ryanlecompte/redis_failover)
 
- redis_failover attempts to provides a full automatic master/slave failover solution for Ruby. Redis does not provide
+ redis_failover attempts to provide a fully automatic master/slave failover solution for Ruby. Redis does not currently provide
 an automatic failover capability when configured for master/slave replication. When the master node dies,
 a new master must be manually brought online and assigned as the slave's new master. This manual
 switch-over is not desirable in high traffic sites where Redis is a critical part of the overall
@@ -10,8 +10,8 @@ architecture. The existing standard Redis client for Ruby also only supports con
 Redis server. When using master/slave replication, it is desirable to have all writes go to the
 master, and all reads go to one of the N configured slaves.
 
- This gem (built using [ZK][]) attempts to address these failover scenarios. A redis failover Node Manager daemon runs as a background
- process and monitors all of your configured master/slave nodes. When the daemon starts up, it
+ This gem (built using [ZK][]) attempts to address these failover scenarios. One or more Node Manager daemons run as background
+ processes and monitor all of your configured master/slave nodes. When a daemon starts up, it
 automatically discovers the current master/slaves. Background watchers are set up for each of
 the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unavailable" state.
 If the node that went offline was the master, then one of the slaves will be promoted as the new master.
@@ -22,8 +22,10 @@ nodes. Note that detection of a node going down should be nearly instantaneous,
 used to keep tabs on a node is via a blocking Redis BLPOP call (no polling). This call fails nearly
 immediately when the node actually goes offline. To avoid false positives (i.e., intermittent flaky
 network interruption), the Node Manager will only mark a node as unavailable if it fails to communicate with
- it 3 times (this is configurable via --max-failures, see configuration options below). Note that you can
- deploy multiple Node Manager daemons for added redundancy.
+ it 3 times (this is configurable via --max-failures, see configuration options below). Note that you can (and should)
+ deploy multiple Node Manager daemons, since they each report periodic health snapshots of the redis servers. A
+ "node strategy" is used to determine if a node is actually unavailable. By default a majority strategy is used, but
+ you can also configure "consensus" or "single".
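For illustration, the three strategies reduce to a tally of per-manager reports. A minimal sketch, assuming a simple list of `:available`/`:unavailable` reports (the method name and report format are illustrative, not the gem's API):

```ruby
# Decide whether a node counts as available, given one report per Node Manager.
def node_available?(strategy, reports)
  unavailable = reports.count(:unavailable)
  case strategy
  when :majority  then unavailable <= reports.size / 2  # more than half must disagree
  when :consensus then unavailable < reports.size       # all managers must agree it's down
  when :single    then unavailable.zero?                # any single report marks it down
  else raise ArgumentError, "unknown strategy: #{strategy}"
  end
end

reports = [:available, :available, :unavailable]
node_available?(:majority, reports)  # => true
node_available?(:single, reports)    # => false
```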
 
 This gem provides a RedisFailover::Client wrapper that is master/slave aware. The client is configured
 with a list of ZooKeeper servers. The client will automatically contact the ZooKeeper cluster to find out
@@ -64,15 +66,20 @@ following options:
 
 Usage: redis_node_manager [OPTIONS]
 
+
 Specific options:
- -n, --nodes NODES Comma-separated redis host:port pairs
- -z, --zkservers SERVERS Comma-separated ZooKeeper host:port pairs
- -p, --password PASSWORD Redis password
- --znode-path PATH Znode path override for storing redis server list
- --max-failures COUNT Max failures before manager marks node unavailable
- -C, --config PATH Path to YAML configuration file
- -E, --environment ENV Config environment to use
- -h, --help Display all options
+ -n, --nodes NODES Comma-separated redis host:port pairs
+ -z, --zkservers SERVERS Comma-separated ZooKeeper host:port pairs
+ -p, --password PASSWORD Redis password
+ --znode-path PATH Znode path override for storing redis server list
+ --max-failures COUNT Max failures before manager marks node unavailable
+ -C, --config PATH Path to YAML config file
+ --with-chroot ROOT Path to ZooKeeper's chroot
+ -E, --environment ENV Config environment to use
+ --node-strategy STRATEGY Strategy used when determining availability of nodes (default: majority)
+ --failover-strategy STRATEGY Strategy used when failing over to a new node (default: latency)
+ --required-node-managers COUNT Required Node Managers that must be reachable to determine node state (default: 1)
+ -h, --help Display all options
 
 To start the daemon for a simple master/slave configuration, use the following:
 
@@ -103,10 +110,12 @@ directory for configuration file samples.
 
 The Node Manager will automatically discover the master/slaves upon startup. Note that it is
 a good idea to run more than one instance of the Node Manager daemon in your environment. At
- any moment, a single Node Manager process will be designated to monitor the redis servers. If
+ any moment, a single Node Manager process will be designated to manage the redis servers. If
 this Node Manager process dies or becomes partitioned from the network, another Node Manager
- will be promoted as the primary monitor of redis servers. You can run as many Node Manager
- processes as you'd like for added redundancy.
+ will be promoted as the primary manager of redis servers. You can run as many Node Manager
+ processes as you'd like. Every Node Manager periodically records health "snapshots", which the
+ primary/master Node Manager consults when determining if it should officially mark a redis
+ server as unavailable. By default, a majority strategy is used.
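A health snapshot can be pictured as a per-node tally of which managers saw the node and at what latency. A rough standalone sketch (the gem's actual NodeSnapshot class may differ in naming and detail):

```ruby
# Minimal stand-in for a node snapshot: records, per Node Manager,
# whether the node was reachable and how long the health check took.
class Snapshot
  def initialize
    @available = {}    # manager id => latency in seconds
    @unavailable = []  # manager ids that could not reach the node
  end

  def viewable_by(manager, latency)
    @available[manager] = latency
  end

  def unviewable_by(manager)
    @unavailable << manager
  end

  def all_available?
    !@available.empty? && @unavailable.empty?
  end

  def avg_latency
    return if @available.empty?
    @available.values.reduce(:+) / @available.size.to_f
  end
end

snapshot = Snapshot.new
snapshot.viewable_by('nm1', 0.25)
snapshot.viewable_by('nm2', 0.75)
snapshot.all_available?  # => true
snapshot.avg_latency     # => 0.5
```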
 
 ## Client Usage
 
@@ -149,6 +158,25 @@ server passed to #manual_failover, or it will pick a random slave to become the
 client = RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183')
 client.manual_failover(:host => 'localhost', :port => 2222)
 
+ ## Node & Failover Strategies
+
+ As of redis_failover version 1.0, the notion of "node" and "failover" strategies exists. All running Node Managers periodically record
+ "snapshots" of their view of the redis nodes. The primary Node Manager processes these snapshots from all of the Node Managers by running a configurable
+ node strategy. By default, a majority strategy is used. This means that if a majority of Node Managers indicate that a node is unavailable, then the primary
+ Node Manager will officially mark it as unavailable. Other strategies exist:
+
+ - consensus (all Node Managers must agree that the node is unavailable)
+ - single (a single Node Manager reporting the node as unavailable will cause it to be marked as such)
+
+ When a failover happens, the primary Node Manager consults a "failover strategy" to determine which candidate node should be used. Currently only a single
+ strategy is provided by redis_failover: latency. This strategy selects the node that is marked as available by all Node Managers and has the lowest
+ average latency for its last health check.
+
+ Note that you should set the "required_node_managers" configuration option appropriately. This value (which defaults to 1) determines how many Node
+ Managers must have reported their view of a node's state before that state is acted on. For example, if you have deployed 5 Node Managers, then you should set this value to 5 if you only
+ want to accept a node's availability when all 5 Node Managers are part of the snapshot. To give yourself some slack, you may want to set this value to 3
+ instead. This would let you take down 2 Node Managers while still allowing the cluster to be managed appropriately.
+
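The required_node_managers gate can be sketched as a size check that runs before the node strategy; names below are illustrative, not the gem's API:

```ruby
# Only evaluate a node's state once enough Node Managers have contributed
# reports to the snapshot; otherwise defer the decision.
def mark_unavailable?(reports, required_node_managers)
  return false if reports.size < required_node_managers
  reports.count(:unavailable) > reports.size / 2  # majority strategy
end

# 5 managers deployed, required_node_managers set to 3:
mark_unavailable?([:unavailable, :unavailable], 3)              # => false (too few reports)
mark_unavailable?([:unavailable, :unavailable, :available], 3)  # => true
```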
 ## Documentation
 
 redis_failover uses YARD for its API documentation. Refer to the generated [API documentation](http://rubydoc.info/github/ryanlecompte/redis_failover/master/frames) for full coverage.
@@ -166,8 +194,6 @@ redis_failover uses YARD for its API documentation. Refer to the generated [API
 
 - Note that it's still possible for the RedisFailover::Client instances to see a stale list of servers for a very small window. In most cases this won't happen due to how ZooKeeper handles distributed communication, but you should be aware that in the worst case the client could write to a "stale" master for a small period of time until the next watch event is received by the client via ZooKeeper.
 
- - Note that currently multiple Node Managers are currently used for redundancy purposes only. The Node Managers do not communicate with each other to perform any type of election or voting to determine if they all agree on promoting a new master. Right now Node Managers that are not "active" just sit and wait until they can grab the lock to become the single decision-maker for which Redis servers are available or not. This means that a scenario could present itself where a Node Manager thinks the Redis master is available, however the actual RedisFailover::Client instances think they can't reach the Redis master (either due to network partitions or the Node Manager flapping due to machine failure, etc). We are exploring ways to improve this situation.
-
 ## Resources
 
 - Check out Steve Whittaker's [redis-failover-test](https://github.com/swhitt/redis-failover-test) project which shows how to test redis_failover in a non-trivial configuration using Vagrant/Chef.
@@ -2,6 +2,9 @@
 # redis_node_manager -C config.yml
 ---
 :max_failures: 2
+ :node_strategy: majority
+ :failover_strategy: latency
+ :required_node_managers: 2
 :nodes:
 - localhost:6379
 - localhost:1111
@@ -6,6 +6,7 @@ require 'thread'
 require 'logger'
 require 'timeout'
 require 'optparse'
+ require 'benchmark'
 require 'multi_json'
 require 'securerandom'
 
@@ -16,7 +17,9 @@ require 'redis_failover/errors'
 require 'redis_failover/client'
 require 'redis_failover/runner'
 require 'redis_failover/version'
+ require 'redis_failover/node_strategy'
 require 'redis_failover/node_manager'
 require 'redis_failover/node_watcher'
+ require 'redis_failover/node_snapshot'
 require 'redis_failover/manual_failover'
-
+ require 'redis_failover/failover_strategy'
@@ -48,6 +48,21 @@ module RedisFailover
 options[:config_environment] = config_env
 end
 
+ opts.on('--node-strategy STRATEGY',
+ 'Strategy used when determining availability of nodes (default: majority)') do |strategy|
+ options[:node_strategy] = strategy
+ end
+
+ opts.on('--failover-strategy STRATEGY',
+ 'Strategy used when failing over to a new node (default: latency)') do |strategy|
+ options[:failover_strategy] = strategy
+ end
+
+ opts.on('--required-node-managers COUNT',
+ 'Required Node Managers that must be reachable to determine node state (default: 1)') do |count|
+ options[:required_node_managers] = Integer(count)
+ end
+
 opts.on('-h', '--help', 'Display all options') do
 puts opts
 exit
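The new handlers behave like any other OptionParser switch. A self-contained sketch of how the three flags parse (standalone; the gem wires these into redis_node_manager):

```ruby
require 'optparse'

options = {}
OptionParser.new do |opts|
  # Same switches as the diff above, minus the descriptions.
  opts.on('--node-strategy STRATEGY') { |s| options[:node_strategy] = s }
  opts.on('--failover-strategy STRATEGY') { |s| options[:failover_strategy] = s }
  opts.on('--required-node-managers COUNT') { |c| options[:required_node_managers] = Integer(c) }
end.parse(%w[--node-strategy majority --required-node-managers 3])

options  # => {:node_strategy=>"majority", :required_node_managers=>3}
```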
@@ -59,7 +74,7 @@ module RedisFailover
 options = from_file(config_file, options[:config_environment])
 end
 
- if required_options_missing?(options)
+ if invalid_options?(options)
 puts parser
 exit
 end
@@ -68,7 +83,7 @@ module RedisFailover
 end
 
 # @return [Boolean] true if required options missing, false otherwise
- def self.required_options_missing?(options)
+ def self.invalid_options?(options)
 return true if options.empty?
 return true unless options.values_at(:nodes, :zkservers).all?
 false
@@ -113,6 +128,14 @@ module RedisFailover
 options[:nodes].each { |opts| opts.update(:password => password) }
 end
 
+ if node_strategy = options[:node_strategy]
+ options[:node_strategy] = node_strategy.to_sym
+ end
+
+ if failover_strategy = options[:failover_strategy]
+ options[:failover_strategy] = failover_strategy.to_sym
+ end
+
 options
 end
 end
@@ -79,10 +79,12 @@ module RedisFailover
 self
 end
 
- # Sidekiq-web asks for a location in the redis client. Implements something here
- # just to make sure it works
+ # Delegates to the underlying Redis client to fetch the location.
+ # This method always returns the location of the master.
+ #
+ # @return [String] the redis location
 def location
- 'location'
+ dispatch(:client).location
 end
 
 # Specifies a callback to invoke when the current redis node list changes.
@@ -131,7 +133,7 @@ module RedisFailover
 # @option options [String] :host the host of the failover candidate
 # @option options [String] :port the port of the failover candidate
 def manual_failover(options = {})
- ManualFailover.new(@zk, options).perform
+ ManualFailover.new(@zk, @root_znode, options).perform
 self
 end
 
@@ -176,12 +178,12 @@ module RedisFailover
 # Sets up the underlying ZooKeeper connection.
 def setup_zk
 @zk = ZK.new(@zkservers)
- @zk.watcher.register(@znode) { |event| handle_zk_event(event) }
+ @zk.watcher.register(redis_nodes_path) { |event| handle_zk_event(event) }
 if @safe_mode
 @zk.on_expired_session { purge_clients }
 end
- @zk.on_connected { @zk.stat(@znode, :watch => true) }
- @zk.stat(@znode, :watch => true)
+ @zk.on_connected { @zk.stat(redis_nodes_path, :watch => true) }
+ @zk.stat(redis_nodes_path, :watch => true)
 update_znode_timestamp
 end
 
@@ -194,12 +196,12 @@ module RedisFailover
 build_clients
 elsif event.node_deleted?
 purge_clients
- @zk.stat(@znode, :watch => true)
+ @zk.stat(redis_nodes_path, :watch => true)
 else
 logger.error("Unknown ZK node event: #{event.inspect}")
 end
 ensure
- @zk.stat(@znode, :watch => true)
+ @zk.stat(redis_nodes_path, :watch => true)
 end
 
 # Determines if a method is a known redis operation.
@@ -308,7 +310,7 @@ module RedisFailover
 #
 # @return [Hash] the known master/slave redis servers
 def fetch_nodes
- data = @zk.get(@znode, :watch => true).first
+ data = @zk.get(redis_nodes_path, :watch => true).first
 nodes = symbolize_keys(decode(data))
 logger.debug("Fetched nodes: #{nodes.inspect}")
 
@@ -474,7 +476,7 @@ module RedisFailover
 # @param [Hash] options the configuration options
 def parse_options(options)
 @zkservers = options.fetch(:zkservers) { raise ArgumentError, ':zkservers required'}
- @znode = options.fetch(:znode_path, Util::DEFAULT_ZNODE_PATH)
+ @root_znode = options.fetch(:znode_path, Util::DEFAULT_ROOT_ZNODE_PATH)
 @namespace = options[:namespace]
 @password = options[:password]
 @db = options[:db]
@@ -483,5 +485,10 @@ module RedisFailover
 @safe_mode = options.fetch(:safe_mode, true)
 @master_only = options.fetch(:master_only, false)
 end
+
+ # @return [String] the znode path for the master redis nodes config
+ def redis_nodes_path
+ "#{@root_znode}/nodes"
+ end
 end
 end
@@ -51,8 +51,4 @@ module RedisFailover
 super("Operation `#{operation}` is currently unsupported")
 end
 end
-
- # Raised when we detect an expired ZK session.
- class ZKDisconnectedError < Error
- end
 end
@@ -0,0 +1,25 @@
+ module RedisFailover
+ # Base class for strategies that determine which node is used during failover.
+ class FailoverStrategy
+ include Util
+
+ # Loads a strategy based on the given name.
+ #
+ # @param [String, Symbol] name the strategy name
+ # @return [Object] a new strategy instance
+ def self.for(name)
+ require "redis_failover/failover_strategy/#{name.downcase}"
+ const_get(name.capitalize).new
+ rescue LoadError, NameError
+ raise "Failed to find failover strategy: #{name}"
+ end
+
+ # Returns a candidate node as determined by this strategy.
+ #
+ # @param [Hash<Node, NodeSnapshot>] snapshots the node snapshots
+ # @return [Node] the candidate node or nil if one couldn't be found
+ def find_candidate(snapshots)
+ raise NotImplementedError
+ end
+ end
+ end
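FailoverStrategy.for resolves a name to a class via require plus const_get. The same lookup pattern in a self-contained sketch (module and class names below are invented for illustration):

```ruby
module Strategies
  class Latency; end

  # Resolve :latency -> Strategies::Latency. The gem additionally
  # requires the strategy's file before the constant lookup.
  def self.for(name)
    const_get(name.to_s.capitalize).new
  rescue NameError
    raise "Failed to find strategy: #{name}"
  end
end

Strategies.for(:latency).class  # => Strategies::Latency
```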
@@ -0,0 +1,21 @@
+ module RedisFailover
+ class FailoverStrategy
+ # Failover strategy that selects an available node that is both seen by all
+ # node managers and has the lowest reported health check latency.
+ class Latency < FailoverStrategy
+ # @see RedisFailover::FailoverStrategy#find_candidate
+ def find_candidate(snapshots)
+ candidates = {}
+ snapshots.each do |node, snapshot|
+ if snapshot.all_available?
+ candidates[node] = snapshot.avg_latency
+ end
+ end
+
+ if candidate = candidates.min_by(&:last)
+ candidate.first
+ end
+ end
+ end
+ end
+ end
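The latency strategy's selection logic can be exercised standalone with stub snapshots; the Struct below stands in for the gem's NodeSnapshot, and the node strings are hypothetical:

```ruby
# Stub snapshot exposing only what the latency strategy reads.
StubSnapshot = Struct.new(:all_available, :avg_latency) do
  def all_available?
    all_available
  end
end

# Same selection logic as Latency#find_candidate above.
def find_candidate(snapshots)
  candidates = {}
  snapshots.each do |node, snapshot|
    candidates[node] = snapshot.avg_latency if snapshot.all_available?
  end
  if candidate = candidates.min_by(&:last)
    candidate.first
  end
end

snapshots = {
  'redis-1:6379' => StubSnapshot.new(true, 0.8),
  'redis-2:6379' => StubSnapshot.new(true, 0.2),
  'redis-3:6379' => StubSnapshot.new(false, 0.1),  # not seen by all managers
}
find_candidate(snapshots)  # => "redis-2:6379"
```

Note that a node unseen by some manager is excluded even if it has the lowest latency.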
@@ -2,37 +2,49 @@ module RedisFailover
 # Provides manual failover support to a new master.
 class ManualFailover
 # Path for manual failover communication.
- ZNODE_PATH = '/redis_failover_manual'.freeze
+ ZNODE_PATH = 'manual_failover'.freeze
 
 # Denotes that any slave can be used as a candidate for promotion.
 ANY_SLAVE = "ANY_SLAVE".freeze
 
+ def self.path(root_znode)
+ "#{root_znode}/#{ZNODE_PATH}"
+ end
+
 # Creates a new instance.
 #
 # @param [ZK] zk the ZooKeeper client
+ # @param [String] root_znode the root znode path
 # @param [Hash] options the options used for manual failover
 # @option options [String] :host the host of the failover candidate
 # @option options [String] :port the port of the failover candidate
 # @note
 # If options is empty, a random slave will be used
 # as a failover candidate.
- def initialize(zk, options = {})
+ def initialize(zk, root_znode, options = {})
 @zk = zk
+ @root_znode = root_znode
 @options = options
+
+ unless @options.empty?
+ port = Integer(@options[:port]) rescue nil
+ raise ArgumentError, ':host not properly specified' if @options[:host].to_s.empty?
+ raise ArgumentError, ':port not properly specified' if port.nil?
+ end
 end
 
 # Performs a manual failover.
 def perform
 create_path
 node = @options.empty? ? ANY_SLAVE : "#{@options[:host]}:#{@options[:port]}"
- @zk.set(ZNODE_PATH, node)
+ @zk.set(self.class.path(@root_znode), node)
 end
 
 private
 
 # Creates the znode path used for coordinating manual failovers.
 def create_path
- @zk.create(ZNODE_PATH)
+ @zk.create(self.class.path(@root_znode))
 rescue ZK::Exceptions::NodeExists
 # best effort
 end
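The new root-znode path construction and option validation can be exercised in isolation. A sketch (the real class also talks to ZooKeeper; the free-standing method names are illustrative):

```ruby
ZNODE_PATH = 'manual_failover'.freeze

def manual_failover_path(root_znode)
  "#{root_znode}/#{ZNODE_PATH}"
end

# Mirrors the constructor's validation: empty options mean "any slave",
# otherwise both a host and an integer-coercible port are required.
def validate_failover_options!(options)
  return if options.empty?
  port = Integer(options[:port]) rescue nil
  raise ArgumentError, ':host not properly specified' if options[:host].to_s.empty?
  raise ArgumentError, ':port not properly specified' if port.nil?
end

manual_failover_path('/redis_failover')  # => "/redis_failover/manual_failover"
```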
@@ -22,7 +22,8 @@ module RedisFailover
 # @option options [String] :host the host of the redis server
 # @option options [String] :port the port of the redis server
 def initialize(options = {})
- @host = options.fetch(:host) { raise InvalidNodeError, 'missing host'}
+ @host = options[:host]
+ raise InvalidNodeError, 'missing host' if @host.to_s.empty?
 @port = Integer(options[:port] || 6379)
 @password = options[:password]
 end
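The motivation for this change: options.fetch(:host) only raises when the key is absent, so an explicitly nil or empty host slipped through. A sketch of the tightened check, using ArgumentError in place of the gem's InvalidNodeError:

```ruby
# Reject nil and '' as well as a missing :host key.
def parse_host!(options)
  host = options[:host]
  raise ArgumentError, 'missing host' if host.to_s.empty?
  host
end

parse_host!(:host => 'localhost')  # => "localhost"
# parse_host!(:host => nil) and parse_host!({}) both raise ArgumentError.
```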
@@ -3,20 +3,23 @@ module RedisFailover
 # will discover the current redis master and slaves. Each redis node is
 # monitored by a NodeWatcher instance. The NodeWatchers periodically
 # report the current state of the redis node it's watching to the
- # NodeManager via an asynchronous queue. The NodeManager processes the
- # state reports and reacts appropriately by handling stale/dead nodes,
- # and promoting a new redis master if it sees fit to do so.
+ # NodeManager. The NodeManager processes the state reports and reacts
+ # appropriately by handling stale/dead nodes, and promoting a new redis master
+ # if it sees fit to do so.
 class NodeManager
 include Util
 
 # Number of seconds to wait before retrying bootstrap process.
- TIMEOUT = 3
+ TIMEOUT = 5
+ # Number of seconds between checks of node snapshots.
+ CHECK_INTERVAL = 10
+ # Max number of attempts to promote a master before releasing the master lock.
+ MAX_PROMOTION_ATTEMPTS = 3
 
 # ZK Errors that the Node Manager cares about.
 ZK_ERRORS = [
 ZK::Exceptions::LockAssertionFailedError,
- ZK::Exceptions::InterruptedSession,
- ZKDisconnectedError
+ ZK::Exceptions::InterruptedSession
 ].freeze
 
 # Errors that can happen during the node discovery process.
@@ -38,15 +41,16 @@ module RedisFailover
 def initialize(options)
 logger.info("Redis Node Manager v#{VERSION} starting (#{RUBY_DESCRIPTION})")
 @options = options
- @znode = @options[:znode_path] || Util::DEFAULT_ZNODE_PATH
- @manual_znode = ManualFailover::ZNODE_PATH
- @mutex = Mutex.new
+ @required_node_managers = options.fetch(:required_node_managers, 1)
+ @root_znode = options.fetch(:znode_path, Util::DEFAULT_ROOT_ZNODE_PATH)
+ @node_strategy = NodeStrategy.for(options.fetch(:node_strategy, :majority))
+ @failover_strategy = FailoverStrategy.for(options.fetch(:failover_strategy, :latency))
+ @nodes = Array(@options[:nodes]).map { |opts| Node.new(opts) }.uniq
+ @master_manager = false
+ @master_promotion_attempts = 0
+ @sufficient_node_managers = false
+ @lock = Monitor.new
 @shutdown = false
- @leader = false
- @master = nil
- @slaves = []
- @unavailable = []
- @lock_path = "#{@znode}_lock".freeze
 end
 
 # Starts the node manager.
@@ -54,21 +58,18 @@ module RedisFailover
 # @note This method does not return until the manager terminates.
 def start
 return unless running?
- @queue = Queue.new
 setup_zk
- logger.info('Waiting to become master Node Manager ...')
- with_lock do
- @leader = true
- logger.info('Acquired master Node Manager lock')
- if discover_nodes
- initialize_path
- spawn_watchers
- handle_state_reports
- end
- end
+ spawn_watchers
+ wait_until_master
 rescue *ZK_ERRORS => ex
 logger.error("ZK error while attempting to manage nodes: #{ex.inspect}")
 reset
+ sleep(TIMEOUT)
+ retry
+ rescue NoMasterError
+ logger.error("Failed to promote a new master after #{MAX_PROMOTION_ATTEMPTS} attempts.")
+ reset
+ sleep(TIMEOUT)
 retry
 end
 
@@ -77,78 +78,58 @@ module RedisFailover
 #
 # @param [Node] node the node
 # @param [Symbol] state the state
- def notify_state(node, state)
- @queue << [node, state]
+ # @param [Integer] latency an optional latency
+ def notify_state(node, state, latency = nil)
+ @lock.synchronize do
+ if running?
+ update_current_state(node, state, latency)
+ end
+ end
+ rescue => ex
+ logger.error("Error handling state report #{[node, state].inspect}: #{ex.inspect}")
+ logger.error(ex.backtrace.join("\n"))
 end
 
 # Performs a reset of the manager.
 def reset
- @leader = false
+ @master_manager = false
+ @master_promotion_attempts = 0
 @watchers.each(&:shutdown) if @watchers
- @queue.clear
- @zk.close! if @zk
- @zk_lock = nil
 end
 
 # Initiates a graceful shutdown.
 def shutdown
 logger.info('Shutting down ...')
- @mutex.synchronize do
+ @lock.synchronize do
 @shutdown = true
 end
+
+ reset
+ exit
 end
 
 private
 
 # Configures the ZooKeeper client.
 def setup_zk
- @zk.close! if @zk
- @zk = ZK.new("#{@options[:zkservers]}#{@options[:chroot] || ''}")
- @zk.on_expired_session { notify_state(:zk_disconnected, nil) }
-
- @zk.register(@manual_znode) do |event|
- if event.node_created? || event.node_changed?
- perform_manual_failover
+ unless @zk
+ @zk = ZK.new("#{@options[:zkservers]}#{@options[:chroot] || ''}")
+ @zk.register(manual_failover_path) do |event|
+ handle_manual_failover_update(event)
 end
+ @zk.on_connected { @zk.stat(manual_failover_path, :watch => true) }
 end
 
- @zk.on_connected { @zk.stat(@manual_znode, :watch => true) }
- @zk.stat(@manual_znode, :watch => true)
- end
-
- # Handles periodic state reports from {RedisFailover::NodeWatcher} instances.
- def handle_state_reports
- while running? && (state_report = @queue.pop)
- begin
- @mutex.synchronize do
- return unless running?
- @zk_lock.assert!
- node, state = state_report
- case state
- when :unavailable then handle_unavailable(node)
- when :available then handle_available(node)
- when :syncing then handle_syncing(node)
- when :zk_disconnected then raise ZKDisconnectedError
- else raise InvalidNodeStateError.new(node, state)
- end
-
- # flush current state
- write_state
- end
- rescue *ZK_ERRORS
- # fail hard if this is a ZK connection-related error
- raise
- rescue => ex
- logger.error("Error handling #{state_report.inspect}: #{ex.inspect}")
- logger.error(ex.backtrace.join("\n"))
- end
- end
+ create_path(@root_znode)
+ create_path(current_state_root)
+ @zk.stat(manual_failover_path, :watch => true)
 end
 
 # Handles an unavailable node.
 #
 # @param [Node] node the unavailable node
- def handle_unavailable(node)
+ # @param [Hash<Node, NodeSnapshot>] snapshots the current set of snapshots
+ def handle_unavailable(node, snapshots)
 # no-op if we already know about this node
 return if @unavailable.include?(node)
 logger.info("Handling unavailable node: #{node}")
@@ -157,7 +138,7 @@ module RedisFailover
 # find a new master if this node was a master
 if node == @master
 logger.info("Demoting currently unavailable master #{node}.")
- promote_new_master
+ promote_new_master(snapshots)
 else
 @slaves.delete(node)
 end
@@ -166,7 +147,8 @@ module RedisFailover
 # Handles an available node.
 #
 # @param [Node] node the available node
- def handle_available(node)
+ # @param [Hash<Node, NodeSnapshot>] snapshots the current set of snapshots
+ def handle_available(node, snapshots)
 reconcile(node)
 
 # no-op if we already know about this node
@@ -179,7 +161,7 @@ module RedisFailover
 @slaves << node
 else
 # no master exists, make this the new master
- promote_new_master(node)
+ promote_new_master(snapshots, node)
 end
 
 @unavailable.delete(node)
@@ -188,74 +170,75 @@ module RedisFailover
 # Handles a node that is currently syncing.
 #
 # @param [Node] node the syncing node
- def handle_syncing(node)
+ # @param [Hash<Node, NodeSnapshot>] snapshots the current set of snapshots
+ def handle_syncing(node, snapshots)
 reconcile(node)
 
 if node.syncing_with_master? && node.prohibits_stale_reads?
 logger.info("Node #{node} not ready yet, still syncing with master.")
 force_unavailable_slave(node)
- return
+ else
+ # otherwise, we can use this node
+ handle_available(node, snapshots)
 end
-
- # otherwise, we can use this node
- handle_available(node)
 end
 
 # Handles a manual failover request to the given node.
 #
 # @param [Node] node the candidate node for failover
- def handle_manual_failover(node)
+ # @param [Hash<Node, NodeSnapshot>] snapshots the current set of snapshots
+ def handle_manual_failover(node, snapshots)
 # no-op if node to be failed over is already master
 return if @master == node
 logger.info("Handling manual failover")
 
+ # ensure we can talk to the node
+ node.ping
+
 # make current master a slave, and promote new master
 @slaves << @master if @master
 @slaves.delete(node)
- promote_new_master(node)
+ promote_new_master(snapshots, node)
 end
 
 # Promotes a new master.
 #
+ # @param [Hash<Node, NodeSnapshot>] snapshots the current set of snapshots
 # @param [Node] node the optional node to promote
- # @note if no node is specified, a random slave will be used
- def promote_new_master(node = nil)
- delete_path
+ def promote_new_master(snapshots, node = nil)
+ delete_path(redis_nodes_path)
 @master = nil
 
- # make a specific node or slave the new master
- candidate = node || @slaves.pop
- unless candidate
+ # make a specific node or selected candidate the new master
+ candidate = node || failover_strategy_candidate(snapshots)
+
+ if candidate.nil?
 logger.error('Failed to promote a new master, no candidate available.')
- return
+ else
+ @slaves.delete(candidate)
+ @unavailable.delete(candidate)
+ redirect_slaves_to(candidate)
+ candidate.make_master!
+ @master = candidate
+ write_current_redis_nodes
+ @master_promotion_attempts = 0
+ logger.info("Successfully promoted #{candidate} to master.")
 end
-
- redirect_slaves_to(candidate)
- candidate.make_master!
- @master = candidate
-
- create_path
- write_state
- logger.info("Successfully promoted #{candidate} to master.")
 end
 
 # Discovers the current master and slave nodes.
 # @return [Boolean] true if nodes successfully discovered, false otherwise
 def discover_nodes
- @mutex.synchronize do
- return false unless running?
- nodes = @options[:nodes].map { |opts| Node.new(opts) }.uniq
+ @lock.synchronize do
+ return unless running?
+ @slaves, @unavailable = [], []
 if @master = find_existing_master
 logger.info("Using master #{@master} from existing znode config.")
- elsif @master = guess_master(nodes)
+ elsif @master = guess_master(@nodes)
 logger.info("Guessed master #{@master} from known redis nodes.")
 end
- @slaves = nodes - [@master]
- logger.info("Managing master (#{@master}) and slaves " +
- "(#{@slaves.map(&:to_s).join(', ')})")
- # ensure that slaves are correctly pointing to this master
- redirect_slaves_to(@master)
- true
+ @slaves = @nodes - [@master]
+ logger.info("Managing master (#{@master}) and slaves #{stringify_nodes(@slaves)}")
 end
 rescue *NODE_DISCOVERY_ERRORS => ex
 msg = <<-MSG.gsub(/\s+/, ' ')
@@ -273,7 +256,7 @@ module RedisFailover
 
  # Seeds the initial node master from an existing znode config.
  def find_existing_master
- if data = @zk.get(@znode).first
+ if data = @zk.get(redis_nodes_path).first
  nodes = symbolize_keys(decode(data))
  master = node_from(nodes[:master])
  logger.info("Master from existing znode config: #{master || 'none'}")
@@ -302,10 +285,13 @@ module RedisFailover
 
  # Spawns the {RedisFailover::NodeWatcher} instances for each managed node.
  def spawn_watchers
- @watchers = [@master, @slaves, @unavailable].flatten.compact.map do |node|
- NodeWatcher.new(self, node, @options[:max_failures] || 3)
+ @zk.delete(current_state_path, :ignore => :no_node)
+ @monitored_available, @monitored_unavailable = {}, []
+ @watchers = @nodes.map do |node|
+ NodeWatcher.new(self, node, @options.fetch(:max_failures, 3))
  end
  @watchers.each(&:watch)
+ logger.info("Monitoring redis nodes at #{stringify_nodes(@nodes)}")
  end
 
  # Searches for the master node.
@@ -373,77 +359,361 @@ module RedisFailover
  }
  end
 
+ # @return [Hash] the set of currently available/unavailable nodes as
+ # seen by this node manager instance
+ def node_availability_state
+ {
+ :available => Hash[@monitored_available.map { |k, v| [k.to_s, v] }],
+ :unavailable => @monitored_unavailable.map(&:to_s)
+ }
+ end
+
  # Deletes the znode path containing the redis nodes.
- def delete_path
- @zk.delete(@znode)
- logger.info("Deleted ZooKeeper node #{@znode}")
+ #
+ # @param [String] path the znode path to delete
+ def delete_path(path)
+ @zk.delete(path)
+ logger.info("Deleted ZK node #{path}")
  rescue ZK::Exceptions::NoNode => ex
  logger.info("Tried to delete missing znode: #{ex.inspect}")
  end
 
- # Creates the znode path containing the redis nodes.
- def create_path
- unless @zk.exists?(@znode)
- @zk.create(@znode, encode(current_nodes))
- logger.info("Created ZooKeeper node #{@znode}")
+ # Creates a znode path.
+ #
+ # @param [String] path the znode path to create
+ # @param [Hash] options the options used to create the path
+ # @option options [String] :initial_value an initial value for the znode
+ # @option options [Boolean] :ephemeral true if node is ephemeral, false otherwise
+ def create_path(path, options = {})
+ unless @zk.exists?(path)
+ @zk.create(path,
+ options[:initial_value],
+ :ephemeral => options.fetch(:ephemeral, false))
+ logger.info("Created ZK node #{path}")
  end
  rescue ZK::Exceptions::NodeExists
  # best effort
  end
 
- # Initializes the znode path containing the redis nodes.
- def initialize_path
- create_path
- write_state
+ # Writes state to a particular znode path.
+ #
+ # @param [String] path the znode path that should be written to
+ # @param [String] value the value to write to the znode
+ # @param [Hash] options the default options to be used when creating the node
+ # @note the path will be created if it doesn't exist
+ def write_state(path, value, options = {})
+ create_path(path, options.merge(:initial_value => value))
+ @zk.set(path, value)
+ end
+
+ # Handles a manual failover znode update.
+ #
+ # @param [ZK::Event] event the ZK event to handle
+ def handle_manual_failover_update(event)
+ if event.node_created? || event.node_changed?
+ perform_manual_failover
+ end
+ rescue => ex
+ logger.error("Error scheduling a manual failover: #{ex.inspect}")
+ logger.error(ex.backtrace.join("\n"))
+ ensure
+ @zk.stat(manual_failover_path, :watch => true)
+ end
+
+ # Produces a FQDN id for this Node Manager.
+ #
+ # @return [String] the FQDN for this Node Manager
+ def manager_id
+ @manager_id ||= [
+ Socket.gethostbyname(Socket.gethostname)[0],
+ Process.pid
+ ].join('-')
+ end
+
+ # Writes the current master list of redis nodes. This method is only invoked
+ # if this node manager instance is the master/primary manager.
+ def write_current_redis_nodes
+ write_state(redis_nodes_path, encode(current_nodes))
+ end
+
+ # @return [String] root path for current node manager state
+ def current_state_root
+ "#{@root_znode}/manager_node_state"
+ end
+
+ # @return [String] the znode path for this node manager's view
+ # of available nodes
+ def current_state_path
+ "#{current_state_root}/#{manager_id}"
+ end
+
+ # @return [String] the znode path for the master redis nodes config
+ def redis_nodes_path
+ "#{@root_znode}/nodes"
+ end
+
+ # @return [String] the znode path used for performing manual failovers
+ def manual_failover_path
+ ManualFailover.path(@root_znode)
+ end
+
+ # @return [Boolean] true if this node manager is the master, false otherwise
+ def master_manager?
+ @master_manager
  end
 
- # Writes the current redis nodes state to the znode path.
- def write_state
- create_path
- @zk.set(@znode, encode(current_nodes))
+ # Used to update the master node manager state. These states are only handled if
+ # this node manager instance is serving as the master manager.
+ #
+ # @param [Node] node the node to handle
+ # @param [Hash<Node, NodeSnapshot>] snapshots the current set of snapshots
+ def update_master_state(node, snapshots)
+ state = @node_strategy.determine_state(node, snapshots)
+ case state
+ when :unavailable
+ handle_unavailable(node, snapshots)
+ when :available
+ if node.syncing_with_master?
+ handle_syncing(node, snapshots)
+ else
+ handle_available(node, snapshots)
+ end
+ else
+ raise InvalidNodeStateError.new(node, state)
+ end
+ rescue *ZK_ERRORS
+ # fail hard if this is a ZK connection-related error
+ raise
+ rescue => ex
+ logger.error("Error handling state report for #{[node, state].inspect}: #{ex.inspect}")
+ end
+
+ # Updates the current view of the world for this particular node
+ # manager instance. All node managers write this state regardless
+ # of whether they are the master manager or not.
+ #
+ # @param [Node] node the node to handle
+ # @param [Symbol] state the node state
+ # @param [Integer] latency an optional latency
+ def update_current_state(node, state, latency = nil)
+ case state
+ when :unavailable
+ @monitored_unavailable |= [node]
+ @monitored_available.delete(node)
+ when :available
+ @monitored_available[node] = latency
+ @monitored_unavailable.delete(node)
+ else
+ raise InvalidNodeStateError.new(node, state)
+ end
+
+ # flush ephemeral current node manager state
+ write_state(current_state_path,
+ encode(node_availability_state),
+ :ephemeral => true)
+ end
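The bookkeeping in `update_current_state` can be exercised in isolation. This sketch mimics the available/unavailable transitions with plain collections and omits the ZK write; the names are illustrative, not the gem's internals:

```ruby
# Minimal mirror of the bookkeeping above: `available` maps
# node => last latency, `unavailable` stays duplicate-free via Array#|.
available   = {}
unavailable = []

update = lambda do |node, state, latency = nil|
  case state
  when :unavailable
    unavailable |= [node]   # set-union avoids duplicate entries
    available.delete(node)
  when :available
    available[node] = latency
    unavailable.delete(node)
  else
    raise ArgumentError, "invalid state: #{state}"
  end
end

update.call('redis-a:6379', :available, 0.25)
update.call('redis-a:6379', :unavailable)
update.call('redis-a:6379', :unavailable)  # repeat report, no duplicate
```

The `|=` union is why a node watcher can report the same failure repeatedly without bloating the unavailable list.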
+
+ # Fetches each currently running node manager's view of the
+ # world in terms of which nodes they think are available/unavailable.
+ #
+ # @return [Hash<String, Array>] a hash of node manager to host states
+ def fetch_node_manager_states
+ states = {}
+ @zk.children(current_state_root).each do |child|
+ full_path = "#{current_state_root}/#{child}"
+ begin
+ states[child] = symbolize_keys(decode(@zk.get(full_path).first))
+ rescue ZK::Exceptions::NoNode
+ # ignore, this is an edge case that can happen when a node manager
+ # process dies while fetching its state
+ rescue => ex
+ logger.error("Failed to fetch states for #{full_path}: #{ex.inspect}")
+ end
+ end
+ states
+ end
+
+ # Builds current snapshots of nodes across all running node managers.
+ #
+ # @return [Hash<Node, NodeSnapshot>] the snapshots for all nodes
+ def current_node_snapshots
+ nodes = {}
+ snapshots = Hash.new { |h, k| h[k] = NodeSnapshot.new(k) }
+ fetch_node_manager_states.each do |node_manager, states|
+ available, unavailable = states.values_at(:available, :unavailable)
+ available.each do |node_string, latency|
+ node = nodes[node_string] ||= node_from(node_string)
+ snapshots[node].viewable_by(node_manager, latency)
+ end
+ unavailable.each do |node_string|
+ node = nodes[node_string] ||= node_from(node_string)
+ snapshots[node].unviewable_by(node_manager)
+ end
+ end
+
+ snapshots
+ end
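The aggregation pattern in `current_node_snapshots` (a `Hash` whose default block lazily builds one snapshot per node) can be sketched independently. `Snapshot` below is a hypothetical miniature of the gem's `NodeSnapshot`, and the report shape is an assumption modeled on `node_availability_state`:

```ruby
# Hypothetical miniature of NodeSnapshot: records which managers can
# and cannot see a node.
class Snapshot
  attr_reader :node, :viewers, :non_viewers

  def initialize(node)
    @node = node
    @viewers = {}      # manager_id => last reported latency
    @non_viewers = []  # manager_ids reporting the node as down
  end

  def viewable_by(manager, latency)
    @viewers[manager] = latency
  end

  def unviewable_by(manager)
    @non_viewers << manager
  end
end

# per-manager reports, as if fetched from each manager's ephemeral znode
reports = {
  'mgr-1' => { :available => { 'redis-a:6379' => 0.2 }, :unavailable => [] },
  'mgr-2' => { :available => {}, :unavailable => ['redis-a:6379'] }
}

# the default block builds one snapshot per node on first access
snapshots = Hash.new { |h, node| h[node] = Snapshot.new(node) }
reports.each do |manager, states|
  states[:available].each { |node, lat| snapshots[node].viewable_by(manager, lat) }
  states[:unavailable].each { |node| snapshots[node].unviewable_by(manager) }
end
```

The default-proc form means no pre-pass is needed to discover the node set: the first manager to mention a node creates its snapshot.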
+
+ # Waits until this node manager becomes the master.
+ def wait_until_master
+ logger.info('Waiting to become master Node Manager ...')
+
+ with_lock do
+ @master_manager = true
+ logger.info('Acquired master Node Manager lock.')
+ logger.info("Configured node strategy #{@node_strategy.class}")
+ logger.info("Configured failover strategy #{@failover_strategy.class}")
+ logger.info("Required Node Managers to make a decision: #{@required_node_managers}")
+ manage_nodes
+ end
+ end
+
+ # Manages the redis nodes by periodically processing snapshots.
+ def manage_nodes
+ # Re-discover nodes, since the state of the world may have been changed
+ # by the time we've become the primary node manager.
+ discover_nodes
+
+ # ensure that slaves are correctly pointing to this master
+ redirect_slaves_to(@master)
+
+ # Periodically update master config state.
+ while running? && master_manager?
+ @zk_lock.assert!
+ sleep(CHECK_INTERVAL)
+
+ @lock.synchronize do
+ snapshots = current_node_snapshots
+ if ensure_sufficient_node_managers(snapshots)
+ snapshots.each_key do |node|
+ update_master_state(node, snapshots)
+ end
+
+ # flush current master state
+ write_current_redis_nodes
+
+ # check if we've exhausted our attempts to promote a master
+ unless @master
+ @master_promotion_attempts += 1
+ raise NoMasterError if @master_promotion_attempts > MAX_PROMOTION_ATTEMPTS
+ end
+ end
+ end
+ end
+ end
+
+ # Creates a Node instance from a string.
+ #
+ # @param [String] node_string a string representation of a node (e.g., host:port)
+ # @return [Node] the Node representation
+ def node_from(node_string)
+ return if node_string.nil?
+ host, port = node_string.split(':', 2)
+ Node.new(:host => host, :port => port, :password => @options[:password])
  end
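The `split(':', 2)` limit in `node_from` matters: only the first colon splits the string, so the host segment is never fragmented further. A standalone sketch of the same parsing (`parse_node` and the `to_i` coercion are illustrative; the gem passes the raw values to `Node.new`):

```ruby
# Parse a "host:port" node string the way node_from does; the limit of 2
# ensures exactly one split at the first ':'.
def parse_node(node_string)
  return if node_string.nil?
  host, port = node_string.split(':', 2)
  { :host => host, :port => port.to_i }
end

puts parse_node('redis-a.example.com:6379').inspect
```

The `return if node_string.nil?` guard mirrors the method above, which yields `nil` rather than raising when the znode held no master entry.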
 
  # Executes a block wrapped in a ZK exclusive lock.
  def with_lock
- @zk_lock = @zk.locker(@lock_path)
- while running? && !@zk_lock.lock
- sleep(TIMEOUT)
+ @zk_lock ||= @zk.locker('master_redis_node_manager_lock')
+
+ begin
+ @zk_lock.lock!(true)
+ rescue Exception
+ # handle shutdown case
+ running? ? raise : return
  end
 
  if running?
+ @zk_lock.assert!
  yield
  end
  ensure
- @zk_lock.unlock! if @zk_lock
+ if @zk_lock
+ begin
+ @zk_lock.unlock!
+ rescue => ex
+ logger.warn("Failed to release lock: #{ex.inspect}")
+ end
+ end
  end
 
  # Perform a manual failover to a redis node.
  def perform_manual_failover
- @mutex.synchronize do
- return unless running? && @leader && @zk_lock
+ @lock.synchronize do
+ return unless running? && @master_manager && @zk_lock
  @zk_lock.assert!
- new_master = @zk.get(@manual_znode, :watch => true).first
+ new_master = @zk.get(manual_failover_path, :watch => true).first
  return unless new_master && new_master.size > 0
  logger.info("Received manual failover request for: #{new_master}")
  logger.info("Current nodes: #{current_nodes.inspect}")
- node = new_master == ManualFailover::ANY_SLAVE ?
- @slaves.shuffle.first : node_from(new_master)
+ snapshots = current_node_snapshots
+
+ node = if new_master == ManualFailover::ANY_SLAVE
+ failover_strategy_candidate(snapshots)
+ else
+ node_from(new_master)
+ end
+
  if node
- handle_manual_failover(node)
+ handle_manual_failover(node, snapshots)
  else
  logger.error('Failed to perform manual failover, no candidate found.')
  end
  end
  rescue => ex
- logger.error("Error handling a manual failover: #{ex.inspect}")
+ logger.error("Error handling manual failover: #{ex.inspect}")
  logger.error(ex.backtrace.join("\n"))
  ensure
- @zk.stat(@manual_znode, :watch => true)
+ @zk.stat(manual_failover_path, :watch => true)
  end
 
  # @return [Boolean] true if running, false otherwise
  def running?
- !@shutdown
+ @lock.synchronize { !@shutdown }
+ end
+
+ # @return [String] a stringified version of redis nodes
+ def stringify_nodes(nodes)
+ "(#{nodes.map(&:to_s).join(', ')})"
+ end
+
+ # Determines if each snapshot has a sufficient number of node managers.
+ #
+ # @param [Hash<Node, Snapshot>] snapshots the current snapshots
+ # @return [Boolean] true if sufficient, false otherwise
+ def ensure_sufficient_node_managers(snapshots)
+ currently_sufficient = true
+ snapshots.each do |node, snapshot|
+ node_managers = snapshot.node_managers
+ if node_managers.size < @required_node_managers
+ logger.error("Not enough Node Managers in snapshot for node #{node}. " +
+ "Required: #{@required_node_managers}, " +
+ "Available: #{node_managers.size} #{node_managers}")
+ currently_sufficient = false
+ end
+ end
+
+ if currently_sufficient && !@sufficient_node_managers
+ logger.info("Required Node Managers are visible: #{@required_node_managers}")
+ end
+
+ @sufficient_node_managers = currently_sufficient
+ @sufficient_node_managers
+ end
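The quorum rule in `ensure_sufficient_node_managers` is simple set arithmetic: every node's snapshot must carry reports from at least `@required_node_managers` managers before any failover decision is made. A condensed sketch, with snapshots reduced to plain manager lists (the real method also logs and tracks a state flag):

```ruby
# Standalone sketch of the quorum test: every node must have been
# reported on by at least `required` node managers.
def sufficient_managers?(snapshots, required)
  snapshots.all? { |_node, managers| managers.size >= required }
end

views = {
  'redis-a:6379' => ['mgr-1', 'mgr-2'],
  'redis-b:6379' => ['mgr-1']
}
puts sufficient_managers?(views, 2) # => false, redis-b is seen by only one manager
```

Requiring a quorum before acting prevents a single partitioned manager from triggering a failover based on an incomplete view.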
+
+ # Invokes the configured failover strategy.
+ #
+ # @param [Hash<Node, NodeSnapshot>] snapshots the node snapshots
+ # @return [Node] a failover candidate
+ def failover_strategy_candidate(snapshots)
+ # only include nodes that this master Node Manager can see
+ filtered_snapshots = snapshots.select do |node, snapshot|
+ snapshot.viewable_by?(manager_id)
+ end
+
+ logger.info('Attempting to find candidate from snapshots:')
+ logger.info("\n" + filtered_snapshots.values.join("\n"))
+ @failover_strategy.find_candidate(filtered_snapshots)
  end
  end
  end