spbtv_redis_failover 1.0.2.1

Files changed (53)
  1. checksums.yaml +7 -0
  2. data/.gitignore +19 -0
  3. data/.travis.yml +7 -0
  4. data/.yardopts +6 -0
  5. data/Changes.md +191 -0
  6. data/Gemfile +2 -0
  7. data/LICENSE +22 -0
  8. data/README.md +240 -0
  9. data/Rakefile +9 -0
  10. data/bin/redis_node_manager +7 -0
  11. data/examples/config.yml +17 -0
  12. data/examples/multiple_environments_config.yml +15 -0
  13. data/lib/redis_failover.rb +25 -0
  14. data/lib/redis_failover/cli.rb +142 -0
  15. data/lib/redis_failover/client.rb +517 -0
  16. data/lib/redis_failover/errors.rb +54 -0
  17. data/lib/redis_failover/failover_strategy.rb +25 -0
  18. data/lib/redis_failover/failover_strategy/latency.rb +21 -0
  19. data/lib/redis_failover/manual_failover.rb +52 -0
  20. data/lib/redis_failover/node.rb +190 -0
  21. data/lib/redis_failover/node_manager.rb +741 -0
  22. data/lib/redis_failover/node_snapshot.rb +81 -0
  23. data/lib/redis_failover/node_strategy.rb +34 -0
  24. data/lib/redis_failover/node_strategy/consensus.rb +18 -0
  25. data/lib/redis_failover/node_strategy/majority.rb +18 -0
  26. data/lib/redis_failover/node_strategy/single.rb +17 -0
  27. data/lib/redis_failover/node_watcher.rb +83 -0
  28. data/lib/redis_failover/runner.rb +27 -0
  29. data/lib/redis_failover/util.rb +137 -0
  30. data/lib/redis_failover/version.rb +3 -0
  31. data/misc/redis_failover.png +0 -0
  32. data/spbtv_redis_failover.gemspec +26 -0
  33. data/spec/cli_spec.rb +75 -0
  34. data/spec/client_spec.rb +153 -0
  35. data/spec/failover_strategy/latency_spec.rb +41 -0
  36. data/spec/failover_strategy_spec.rb +17 -0
  37. data/spec/node_manager_spec.rb +136 -0
  38. data/spec/node_snapshot_spec.rb +30 -0
  39. data/spec/node_spec.rb +84 -0
  40. data/spec/node_strategy/consensus_spec.rb +30 -0
  41. data/spec/node_strategy/majority_spec.rb +22 -0
  42. data/spec/node_strategy/single_spec.rb +22 -0
  43. data/spec/node_strategy_spec.rb +22 -0
  44. data/spec/node_watcher_spec.rb +58 -0
  45. data/spec/spec_helper.rb +21 -0
  46. data/spec/support/config/multiple_environments.yml +15 -0
  47. data/spec/support/config/multiple_environments_with_chroot.yml +17 -0
  48. data/spec/support/config/single_environment.yml +7 -0
  49. data/spec/support/config/single_environment_with_chroot.yml +8 -0
  50. data/spec/support/node_manager_stub.rb +87 -0
  51. data/spec/support/redis_stub.rb +105 -0
  52. data/spec/util_spec.rb +21 -0
  53. metadata +207 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: cd8dbb91274794ac6cae34d019edc18a26989334
+   data.tar.gz: 400c4c3d172219b75227dc6093c27e6e477a5e6b
+ SHA512:
+   metadata.gz: a6fa61ddf6fe348feb8be032015905c4267ff47dfa62b44412afa470de8c894ee8b3d92a708ed15ec2fc2fb6575f665a093dcd3f9b94a88e903d66d4ace291b3
+   data.tar.gz: 652e4ff79d364e24b5fefd37340ab33e8f69002f7a1a4eb959e9cb5f70f9d7f5ea0e6301b60454c37c3d64fc03b3ff4976dcb3cbcc454c0a835152231042af72
data/.gitignore ADDED
@@ -0,0 +1,19 @@
+ *.gem
+ *.rbc
+ .bundle
+ .config
+ .yardoc
+ Gemfile.lock
+ InstalledFiles
+ _yardoc
+ coverage
+ doc/
+ lib/bundler/man
+ pkg
+ rdoc
+ spec/reports
+ test/tmp
+ test/version_tmp
+ tmp
+ tags
+ .DS_Store
data/.travis.yml ADDED
@@ -0,0 +1,7 @@
+ language: ruby
+ rvm:
+   - 1.9.2
+   - 1.9.3
+   - jruby-19mode
+   - 2.1.0
+   - 2.0.0
data/.yardopts ADDED
@@ -0,0 +1,6 @@
+ -m markdown
+ --readme README.md
+ -
+ Changes.md
+ LICENSE
+
data/Changes.md ADDED
@@ -0,0 +1,191 @@
+ 1.0.2
+ -----------
+ - Reopen client if a ZK::Exceptions::InterruptedSession occurs (#50, mauricio)
+ - Insert the "root_znode" path before "master_redis_node_manager_lock" and expose via accessor (#52, jzaleski)
+
+ 1.0.1
+ -----------
+ - Bumped required dependency on the ZK gem. ZK 1.7.4 fixes a critical bug with locking (see https://github.com/slyphon/zk/issues/54)
+ - Fix an issue where a failover would not occur if we couldn't check the role of a downed master
+
+ 1.0.0
+ -----------
+ ** NOTE: This version of redis_failover requires that you upgrade your clients and Node Managers at the same time.
+
+ - redis_failover now supports distributed monitoring among the Node Managers! Previously, the Node Managers were only used
+ as a means of redundancy in case a particular node manager crashed. Starting with version 1.0 of redis_failover, the Node
+ Managers will all periodically report their health reports/snapshots. The primary Node Manager will utilize a configurable
+ "node strategy" to determine if a particular node is available or unavailable.
+ - redis_failover now supports a configurable "failover strategy" that's consulted when performing a failover. Currently,
+ a single strategy is provided that takes into account the average latency of the last health check to the redis server.
+ - Improved handling of underlying ZK client connection in RedisFailover::NodeManager
+ - Add support for passing in an existing ZK client instance to RedisFailover::Client.new
+ - Reduce unnecessary writes to ZK
+
+ 0.9.7.2
+ -----------
+ - Add support for Redis#client's location method. Fixes a compatibility issue between redis_failover and Sidekiq.
+
+ 0.9.7.1
+ -----------
+ - Stop repeated attempts to acquire exclusive lock in Node Manager (#36)
+
+ 0.9.7
+ -----------
+ - Stubbed Client#client to return itself; fixes a fork reconnect bug with Resque (dbalatero)
+
+ 0.9.6
+ -----------
+ - Handle the node discovery error condition where the znode points to a master that is now a slave.
+
+ 0.9.5
+ -----------
+ - Introduce a safer master node discovery process for the Node Manager (#34)
+ - Improved shutdown process for Node Manager
+
+ 0.9.4
+ -----------
+ - Preserve original master by reading from existing znode state.
+ - Prevent Timeout::Error from bringing down the process (#32) (@eric)
+
+ 0.9.3
+ -----------
+ - Add lock assert for Node Manager.
+
+ 0.9.2
+ -----------
+ - Improved exception handling in NodeWatcher.
+
+ 0.9.1
+ -----------
+ - Improve nested exception handling.
+ - Fix manual failover support for the case where the znode does not yet exist.
+ - Various fixes to work better with 1.8.7.
+
+ 0.9.0
+ -----------
+ - Make Node Manager's lock path vary with its main znode. (Bira)
+ - Node Manager's znode for holding the current list of redis nodes is no longer ephemeral. This is unnecessary since the current master should only be changed by redis_failover.
+ - Introduce :master_only option for RedisFailover::Client (disabled by default). This option configures the client to direct all read/write operations to the master.
+ - Introduce :safe_mode option (enabled by default). This option configures the client to purge its redis clients when a ZK session expires or when the client hasn't recently heard from the node manager.
+ - Introduce RedisFailover::Client#on_node_change callback notification for when the currently known list of master/slave redis nodes changes.
+ - Added #current_master and #current_slaves to RedisFailover::Client. This is useful for programmatically doing things based on the current master/slaves.
+ - redis_node_manager should start even if no redis servers are available (#29)
+ - Better handling of ZK session expirations in Node Manager.
+
+ 0.8.9
+ -----------
+ - Handle errors raised by redis 3.x client (tsilen)
+
+ 0.8.8
+ -----------
+ - Use a stack for handling nested blocks in RedisFailover::Client (inspired by the connection_pool gem)
+ - Fix an issue with #multi and Redis 3.x.
+
+ 0.8.7
+ -----------
+ - Support TTL operation (#24)
+
+ 0.8.6
+ -----------
+ - No longer buffer output (kyohsuke)
+ - Update redis/zk gem versions to latest (rudionrails)
+
+ 0.8.5
+ -----------
+ - Lock down gemspec to version 1.1.x of redis-namespace to play nicely with redis 2.2.x.
+ - Fix RedisFailover::Client#manual_failover regression (oleriesenberg)
+
+ 0.8.4
+ -----------
+ - Lock down gemspec to redis 2.2.x in light of the upcoming redis 3.x release. Once sufficient testing
+ has been done with the 3.x release, I will relax the constraint in the gemspec.
+ - Add environment-scoped configuration file support (oleriesenberg)
+
+ 0.8.3
+ -----------
+ - Added a way to gracefully shutdown/reconnect a RedisFailover::Client. (#13)
+ - Upgraded to latest ZK version that supports forking.
+ - Handle case where the same RedisFailover::Client is referenced by a #multi block (#14)
+
+ 0.8.2
+ -----------
+ - Fix method signature for RedisFailover::Client#respond_to_missing? (#12)
+
+ 0.8.1
+ -----------
+ - Added YARD documentation.
+ - Improve ZooKeeper client connection management.
+ - Upgrade to latest ZK gem stable release.
+
+ 0.8.0
+ -----------
+ - Added manual failover support (can be initiated via RedisFailover::Client#manual_failover)
+ - Misc. cleanup
+
+ 0.7.0
+ -----------
+ - When new master promotion occurs, make existing slaves point to the new candidate before promoting the new master.
+ - Add support for specifying command-line options in a config.yml file for the Node Manager.
+ - Upgrade to 0.9 version of ZK client and clean up ZK connection error handling.
+
+ 0.6.0
+ -----------
+ - Add support for running multiple Node Manager processes for added redundancy (#4)
+ - Add support for specifying a redis database in RedisFailover::Client (#5)
+ - Improved Node Manager command-line option parsing
+
+ 0.5.4
+ -----------
+ - No longer use problematic ZK#reopen.
+
+ 0.5.3
+ -----------
+ - Handle more ZK exceptions as candidates for reconnecting the client on error.
+ - Add safety check to actively purge redis clients if a RedisFailover::Client hasn't heard from the Node Manager within a certain time window.
+
+ 0.5.2
+ -----------
+ - Always try to create the path before setting current state in Node Manager.
+ - More explicit rescuing of exceptions.
+
+ 0.5.1
+ -----------
+ - More logging around exceptions
+ - Handle re-watching on client session expirations / disconnections
+ - Use an ephemeral node for the list of redis servers
+
+ 0.5.0
+ -----------
+ - redis_failover is now built on top of ZooKeeper! This means redis_failover enjoys all of the reliability, redundancy, and data consistency offered by ZooKeeper. The old fragile HTTP-based approach has been removed and will no longer be supported in favor of ZooKeeper. This does mean that in order to use redis_failover, you must have ZooKeeper installed and running. Please see the README for steps on how to do this if you don't already have ZooKeeper running in your production environment.
+
+ 0.4.0
+ -----------
+ - No longer force newly available slaves to the master if they are already slaves of that master
+ - Honor a node's slave-serve-stale-data configuration option; do not mark a sync-with-master-in-progress slave as available if its slave-serve-stale-data is disabled
+ - Change reachable/unreachable wording to available/unavailable
+ - Added node reconciliation, i.e., if a node comes back up, make sure that the node manager and the node agree on the current role
+ - More efficient use of redis client connections
+ - Raise proper error for unsupported operations (i.e., those that don't make sense for a failover client)
+ - Properly handle any hanging node operations in the failover server
+
+ 0.3.0
+ -----------
+ - Integrated travis-ci
+ - Added background monitor to client for proactively detecting changes to the current set of redis nodes
+
+ 0.2.0
+ -----------
+ - Added retry support for contacting the failover server from the client
+ - Client now verifies proper master/slave role before attempting an operation
+ - General edge case cleanup for NodeManager
+
+ 0.1.1
+ -----------
+
+ - Fix option parser require
+
+ 0.1.0
+ -----------
+
+ - First release
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source 'https://rubygems.org'
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,22 @@
+ Copyright (c) 2012 Ryan LeCompte
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,240 @@
+ # Automatic Redis Failover
+
+ [![Build Status](https://secure.travis-ci.org/SPBTV/spbtv_redis_failover.png?branch=master)](http://travis-ci.org/SPBTV/spbtv_redis_failover)
+
+ redis_failover provides a fully automatic master/slave failover solution for Ruby. Redis does not currently provide
+ an automatic failover capability when configured for master/slave replication. When the master node dies,
+ a new master must be manually brought online and assigned as the slaves' new master. This manual
+ switch-over is not desirable in high-traffic sites where Redis is a critical part of the overall
+ architecture. The existing standard Redis client for Ruby also only supports configuration for a single
+ Redis server. When using master/slave replication, it is desirable to have all writes go to the
+ master, and all reads go to one of the N configured slaves.
+
+ This gem (built using [ZK][]) attempts to address these failover scenarios. One or more Node Manager daemons run as background
+ processes and monitor all of your configured master/slave nodes. When the daemon starts up, it
+ automatically discovers the current master/slaves. Background watchers are set up for each of
+ the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unavailable" state.
+ If the node that went offline was the master, then one of the slaves will be promoted as the new master.
+ All existing slaves will be automatically reconfigured to point to the new master for replication.
+ All nodes marked as unavailable will be periodically checked to see if they have been brought back online.
+ If so, the newly available nodes will be configured as slaves and brought back into the list of available
+ nodes. Note that detection of a node going down should be nearly instantaneous, since the mechanism
+ used to keep tabs on a node is a blocking Redis BLPOP call (no polling). This call fails nearly
+ immediately when the node actually goes offline. To avoid false positives (i.e., intermittent flaky
+ network interruptions), the Node Manager will only mark a node as unavailable if it fails to communicate with
+ it 3 times (this is configurable via --max-failures; see the configuration options below). Note that you can (and should)
+ deploy multiple Node Manager daemons, since they each report periodic health reports/snapshots of the redis servers. A
+ "node strategy" is used to determine if a node is actually unavailable. By default a majority strategy is used, but
+ you can also configure "consensus" or "single".
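+
+ For illustration only, here is a minimal sketch of that BLPOP-based detection approach (this is not the gem's actual NodeWatcher implementation; the probe key name and timeout below are hypothetical):
+
+     require 'redis'
+
+     # A BLPOP on a key that nobody writes to simply blocks until its
+     # timeout expires while the node is healthy, but raises a connection
+     # error almost immediately if the node goes offline.
+     def node_alive?(host, port)
+       redis = Redis.new(:host => host, :port => port)
+       redis.blpop('redis_failover_probe', :timeout => 1) # hypothetical key
+       true
+     rescue Redis::BaseConnectionError
+       false
+     end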
+
+ This gem provides a RedisFailover::Client wrapper that is master/slave aware. The client is configured
+ with a list of ZooKeeper servers. The client will automatically contact the ZooKeeper cluster to find out
+ the current state of the world (i.e., who is the current master and who are the current slaves). The client
+ also sets up a ZooKeeper watcher for the set of redis nodes controlled by the Node Manager daemon. When the daemon
+ promotes a new master or detects a node as going down, ZooKeeper will notify the client near-instantaneously so
+ that it can rebuild its set of Redis connections. The client also acts as a load balancer in that it will automatically
+ dispatch Redis read operations to one of N slaves, and Redis write operations to the master.
+ If it fails to communicate with any node, it will go back and fetch the current list of available servers, and then
+ optionally retry the operation.
+
+ [ZK]: https://github.com/slyphon/zk
+
+ ## Architecture Diagram
+
+ ![redis_failover architecture diagram](https://github.com/ryanlecompte/redis_failover/raw/master/misc/redis_failover.png)
+
+ ## Installation
+
+ redis_failover has an external dependency on ZooKeeper. You must have a running ZooKeeper cluster already available in order to use redis_failover. ZooKeeper provides redis_failover with its high availability and data consistency between RedisFailover clients and the Node Manager daemon. Please see the requirements section below for more information on installing and setting up ZooKeeper if you don't have it running already.
+
+ Add this line to your application's Gemfile:
+
+     gem 'redis_failover'
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install redis_failover
+
+ ## Node Manager Daemon Usage
+
+ The Node Manager is a simple process that should be run as a background daemon. The daemon supports the
+ following options:
+
+     Usage: redis_node_manager [OPTIONS]
+
+     Specific options:
+         -n, --nodes NODES               Comma-separated redis host:port pairs
+         -z, --zkservers SERVERS         Comma-separated ZooKeeper host:port pairs
+         -p, --password PASSWORD         Redis password
+             --znode-path PATH           Znode path override for storing redis server list
+             --max-failures COUNT        Max failures before manager marks node unavailable
+         -C, --config PATH               Path to YAML config file
+             --with-chroot ROOT          Path to ZooKeeper's chroot
+         -E, --environment ENV           Config environment to use
+             --node-strategy STRATEGY    Strategy used when determining availability of nodes (default: majority)
+             --failover-strategy STRATEGY Strategy used when failing over to a new node (default: latency)
+             --required-node-managers COUNT Required Node Managers that must be reachable to determine node state (default: 1)
+         -h, --help                      Display all options
+
+ To start the daemon for a simple master/slave configuration, use the following:
+
+     redis_node_manager -n localhost:6379,localhost:6380 -z localhost:2181,localhost:2182,localhost:2183
+
+ The configuration parameters can also be specified in a config.yml file. An example configuration
+ would look like the following:
+
+     ---
+     :max_failures: 2
+     :node_strategy: majority
+     :failover_strategy: latency
+     :required_node_managers: 2
+     :nodes:
+       - localhost:6379
+       - localhost:1111
+       - localhost:2222
+       - localhost:3333
+     :zkservers:
+       - localhost:2181
+       - localhost:2182
+       - localhost:2183
+     :password: foobar
+
+ You would then simply start the Node Manager via the following:
+
+     redis_node_manager -C config.yml
+
+ You can also scope the configuration to a particular environment (e.g., staging/development). See the examples
+ directory for configuration file samples.
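+
+ As a rough sketch, an environment-scoped file nests the same options under per-environment keys (the hostnames below are placeholders; see examples/multiple_environments_config.yml for the canonical format):
+
+     ---
+     staging:
+       :max_failures: 1
+       :nodes:
+         - staging-redis:6379
+       :zkservers:
+         - staging-zk:2181
+     production:
+       :max_failures: 2
+       :nodes:
+         - prod-redis-1:6379
+         - prod-redis-2:6379
+       :zkservers:
+         - prod-zk-1:2181
+         - prod-zk-2:2181
+         - prod-zk-3:2181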
+
+ The Node Manager will automatically discover the master/slaves upon startup. Note that it is
+ a good idea to run more than one instance of the Node Manager daemon in your environment. At
+ any moment, a single Node Manager process will be designated to manage the redis servers. If
+ this Node Manager process dies or becomes partitioned from the network, another Node Manager
+ will be promoted as the primary manager of redis servers. You can run as many Node Manager
+ processes as you'd like. Every Node Manager periodically records health "snapshots" which the
+ primary/master Node Manager consults when determining if it should officially mark a redis
+ server as unavailable. By default, a majority strategy is used. Also, when a failover
+ happens, the primary Node Manager will consult the node snapshots to determine the best
+ node to use as the new master.
+
+ ## Client Usage
+
+ The redis failover client must be used in conjunction with a running Node Manager daemon. The
+ client supports various configuration options; however, the only mandatory option is the list of
+ ZooKeeper servers OR an existing ZK client instance:
+
+     # Explicitly specify the ZK servers
+     client = RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183')
+
+     # Explicitly specify an existing ZK client instance (useful if using a connection pool, etc.)
+     zk = ZK.new('localhost:2181,localhost:2182,localhost:2183')
+     client = RedisFailover::Client.new(:zk => zk)
+
+ The client actually employs the common redis and redis-namespace gems underneath, so this should be
+ a drop-in replacement for your existing pure redis client usage.
+
+ The full set of options that can be passed to RedisFailover::Client are:
+
+     :zk            - an existing ZK client instance
+     :zkservers     - comma-separated ZooKeeper host:port pairs
+     :znode_path    - the Znode path override for redis server list (optional)
+     :password      - password for redis nodes (optional)
+     :db            - db to use for redis nodes (optional)
+     :namespace     - namespace for redis nodes (optional)
+     :logger        - logger override (optional)
+     :retry_failure - indicate if failures should be retried (default true)
+     :max_retries   - max retries for a failure (default 3)
+     :safe_mode     - indicates if safe mode is used or not (default true)
+     :master_only   - indicates if only the redis master is used (default false)
+     :verify_role   - verify the actual role of a redis node before every command (default true)
+
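+ As an illustration, a client combining several of these optional settings might be built like this (the values are placeholders):
+
+     client = RedisFailover::Client.new(
+       :zkservers   => 'localhost:2181,localhost:2182,localhost:2183',
+       :namespace   => 'myapp',
+       :db          => 1,
+       :max_retries => 3)
+
+     # Drop-in redis usage: writes are dispatched to the master,
+     # reads to one of the slaves.
+     client.set('foo', 'bar')
+     client.get('foo')
+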
+ The RedisFailover::Client also supports a custom callback that will be invoked whenever the list of redis clients changes. Example usage:
+
+     RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183') do |client|
+       client.on_node_change do |master, slaves|
+         logger.info("Nodes changed! master: #{master}, slaves: #{slaves}")
+       end
+     end
+
+ ## Manual Failover
+
+ Manual failover can be initiated via RedisFailover::Client#manual_failover. This schedules a manual failover with the
+ currently active Node Manager. Once the Node Manager receives the request, it will either fail over to the specific
+ server passed to #manual_failover, or it will pick a random slave to become the new master. Here's an example:
+
+     client = RedisFailover::Client.new(:zkservers => 'localhost:2181,localhost:2182,localhost:2183')
+     client.manual_failover(:host => 'localhost', :port => 2222)
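+
+ Assuming the no-argument form behaves as described above, you can also let the Node Manager pick a random slave itself:
+
+     client.manual_failover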
+
+ ## Node & Failover Strategies
+
+ As of redis_failover version 1.0, the notion of "node" and "failover" strategies exists. All running Node Managers will periodically record
+ "snapshots" of their view of the redis nodes. The primary Node Manager will process these snapshots from all of the Node Managers by running a configurable
+ node strategy. By default, a majority strategy is used. This means that if a majority of Node Managers indicate that a node is unavailable, then the primary
+ Node Manager will officially mark it as unavailable. Other strategies exist:
+
+ - consensus (all Node Managers must agree that the node is unavailable)
+ - single (at least one Node Manager saying the node is unavailable will cause the node to be marked as such)
+
+ When a failover happens, the primary Node Manager will consult a "failover strategy" to determine which candidate node should be used. Currently only a single
+ strategy is provided by redis_failover: latency. This strategy simply selects a node that is both marked as available by all Node Managers and has the lowest
+ average latency for its last health check.
+
+ Note that you should set the "required_node_managers" configuration option appropriately. This value (defaults to 1) is used to determine if enough Node
+ Managers have reported their view of a node's state. For example, if you have deployed 5 Node Managers, you should set this value to 5 if you only
+ want to accept a node's availability when all 5 Node Managers are part of the snapshot. Setting the value to 3 instead would let you take down 2 Node
+ Managers while still allowing the cluster to be managed appropriately.
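+
+ Tying this together, a Node Manager could be started with explicit strategy settings via the flags documented above (the values here are only illustrative):
+
+     redis_node_manager -C config.yml --node-strategy consensus --required-node-managers 3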
+
+ ## Documentation
+
+ redis_failover uses YARD for its API documentation. Refer to the generated [API documentation](http://rubydoc.info/github/ryanlecompte/redis_failover/master/frames) for full coverage.
+
+ ## Requirements
+
+ - redis_failover is actively tested against MRI 1.8.7/1.9.2/1.9.3 and JRuby 1.6.7 (1.9 mode only). Other rubies may work, although I don't actively test against them.
+ - redis_failover requires a ZooKeeper service cluster to ensure reliability and data consistency. ZooKeeper is very simple and easy to get up and running. Please refer to this [Quick ZooKeeper Guide](https://github.com/ryanlecompte/redis_failover/wiki/Quick-ZooKeeper-Guide) to get up and running quickly if you don't already have ZooKeeper as a part of your environment.
+
+ ## Considerations
+
+ - Note that by default the Node Manager will mark slaves that are currently syncing with their master as "available" based on the configuration value set for "slave-serve-stale-data" in redis.conf. By default this value is set to "yes" in the configuration, which means that slaves still syncing with their master will be available for servicing read requests. If you don't want this behavior, just set "slave-serve-stale-data" to "no" in your redis.conf file.
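+
+ For example, in your redis.conf:
+
+     slave-serve-stale-data no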
+
+ ## Limitations
+
+ - Note that it's still possible for RedisFailover::Client instances to see a stale list of servers for a very small window. In most cases this will not happen due to how ZooKeeper handles distributed communication, but you should be aware that in the worst case the client could write to a "stale" master for a small period of time until the next watch event is received by the client via ZooKeeper.
+
+ ## Resources
+
+ - Check out Steve Whittaker's [redis-failover-test](https://github.com/swhitt/redis-failover-test) project, which shows how to test redis_failover in a non-trivial configuration using Vagrant/Chef.
+ - To learn more about Redis master/slave replication, see the [Redis documentation](http://redis.io/topics/replication).
+ - To learn more about ZooKeeper, see the official [ZooKeeper](http://zookeeper.apache.org/) site.
+ - See the [Quick ZooKeeper Guide](https://github.com/ryanlecompte/redis_failover/wiki/Quick-ZooKeeper-Guide) for a quick guide to getting ZooKeeper up and running with redis_failover.
+ - To learn more about how ZooKeeper handles network partitions, see [ZooKeeper Failure Scenarios](http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios).
+ - Slides for a [lightning talk](http://www.slideshare.net/ryanlecompte/handling-redis-failover-with-zookeeper) that I gave at BaRuCo 2012.
+ - Feel free to join #zk-gem on the IRC freenode network. We're usually hanging out there talking about ZooKeeper and redis_failover.
+
+ ## License
+
+ Please see LICENSE for licensing details.
+
+ ## Author
+
+ Ryan LeCompte
+
+ [@ryanlecompte](https://twitter.com/ryanlecompte)
+
+ ## Acknowledgements
+
+ Special thanks to [Eric Lindvall](https://github.com/eric) and [Jonathan Simms](https://github.com/slyphon) for their invaluable ZooKeeper advice and guidance!
+
+ ## Contributing
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create a new Pull Request