beetle 0.2.3 → 0.2.5

@@ -0,0 +1,20 @@
1
+ Copyright (c) 2010 XING AG
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,124 @@
1
+ = Automatic Redis Failover for Beetle
2
+
3
+ == Introduction
4
+
5
+ Redis is used as the persistence layer in the AMQP message deduplication
6
+ process. Because it is such a critical piece in our infrastructure, it is
7
+ essential that a failure of this service is as unlikely as possible. As our
8
+ AMQP workers are working in a highly distributed manner, all accessing the same
9
+ Redis server, an automatic failover to another Redis server has to be very
10
+ defensive and ensure that every worker in the system will switch to the new
11
+ server at the same time. If the new server is not accepted by every
12
+ worker, the switch does not happen. This ensures that even in the case of a
13
+ partitioned network it is impossible that two different workers use two
14
+ different Redis servers for message deduplication.
15
+
16
+ == Our goals
17
+
18
+ * opt-in, no need to use the redis-failover solution
19
+ * no single point of failure
20
+ * automatic switch in case of redis-master failure
21
+ * switch should not cause inconsistent data on the redis servers
22
+ * workers should be able to determine the current redis-master without asking
23
+ another process (as long as the redis servers are working)
24
+
25
+ == How it works
26
+
27
+ To ensure consistency, a service (the Redis Configuration Server - RCS) is
28
+ constantly checking the availability and configuration of the currently
29
+ configured Redis master server. If this service detects that the Redis master
30
+ is no longer available, it tries to find an alternative server (one of the
31
+ slaves) which could be promoted to be the new Redis master.
32
+
33
+ Another daemon, the Redis Configuration Client (RCC), runs on every worker
34
+ server and listens to messages sent by the RCS.
35
+
36
+ If the RCS finds another potential Redis Master, it sends out a message to see
37
+ if all known RCCs are still available (once again to eliminate the risk of a
38
+ partitioned network) and if they agree to the master switch.
39
+
40
+ Once all RCCs have answered that message, the RCS sends out a message which
41
+ tells the RCCs to invalidate the current master.
42
+
43
+ This happens by deleting the contents of a special file which is used
44
+ by the workers to store the current Redis master (the content of that file is
45
+ the hostname:port of the currently active Redis master). This ensures that
46
+ no further operations reach the old Redis master, because before every Redis
47
+ operation the AMQP workers check this file's mtime and reread its contents if
48
+ the file has changed. When the file has been emptied, the
49
+ RCCs respond to the "invalidate" message of the RCS. When all RCCs have
50
+ responded, the RCS knows for sure that it is safe to switch the Redis master
51
+ now. It sends a "reconfigure" message with the new Redis master hostname:port
52
+ to the RCCs, which then write that value into their redis master file.
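
The mtime check described above can be illustrated with a minimal Ruby sketch. It is not Beetle's implementation: the class name `MasterFileReader`, its method `current_master` and the helper `demo_master_file` are hypothetical, and the sketch only assumes what the text states, namely that the file holds `hostname:port` and is empty while the master is invalidated.

```ruby
require "tempfile"

# Hypothetical helper, not part of Beetle's API: caches the contents of a
# redis master file and rereads it only when the file's mtime has changed.
class MasterFileReader
  def initialize(path)
    @path = path
    @mtime = nil
    @master = nil
  end

  # Returns "host:port" of the current master, or nil while the file is
  # empty (the master has been invalidated).
  def current_master
    mtime = File.mtime(@path)
    if @mtime != mtime
      @mtime = mtime
      content = File.read(@path).strip
      @master = content.empty? ? nil : content
    end
    @master
  end
end

# Walks one file through the states "configured", "invalidated" and
# "reconfigured" and collects what a worker would see at each step.
def demo_master_file
  Tempfile.create("redis-master") do |f|
    path = f.path
    File.write(path, "redis-1:6379")
    reader = MasterFileReader.new(path)
    seen = [reader.current_master]

    File.write(path, "")                      # what an "invalidate" does
    File.utime(Time.now, Time.now + 1, path)  # force a visible mtime change
    seen << reader.current_master

    File.write(path, "redis-2:6379")          # what a "reconfigure" does
    File.utime(Time.now, Time.now + 2, path)
    seen << reader.current_master
    seen
  end
end
```

The explicit `File.utime` calls exist only because the demo writes faster than some file systems' timestamp resolution; a worker calling `current_master` before every Redis operation would stop using the old master as soon as the file is emptied.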
53
+
54
+ Additionally, the RCS sends reconfigure messages with the current Redis master
55
+ periodically, to allow new RCCs to pick up the current master. It also turns
56
+ all other redis servers into slaves of the current master.
57
+
58
+ === Prerequisites
59
+
60
+ * one redis-configuration-server process ("RCS", on one server), one redis-configuration-client process ("RCC") on every worker server
61
+ * the RCS knows about all possible RCCs using a list of client ids
62
+ * the RCS and RCCs exchange messages via a "system queue"
63
+
64
+ === Flow of actions
65
+
66
+ * on startup, an RCC can consult its redis master file to determine the current master without the help of the RCS by checking that it's still a master (or wait for the periodic reconfigure message with the current master from the RCS)
67
+ * when the RCS finds the master to be down, it will retry a couple of times before starting a reconfiguration round
68
+ * the RCS sends all RCCs a "ping" message to check if every client is there and able to answer
69
+ * the RCCs acknowledge via a "pong" message if they can confirm the current master to be unavailable
70
+ * the RCS waits for *all* RCCs to reply via pong
71
+ * the RCS tells all RCCs to stop using the master by sending an "invalidate" message
72
+ * the RCCs acknowledge via an "invalidated" message if they can still confirm the current master to be unavailable
73
+ * the RCS waits for *all* RCCs to acknowledge the invalidation
74
+ * the RCS promotes the former slave to become the new master (by sending SLAVEOF no one)
75
+ * the RCS sends a "reconfigure" message containing the new master to every RCC
76
+ * the RCCs write the new master to their redis master file
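
The flow above can be sketched as a small in-memory simulation. This is an illustration under assumptions, not Beetle's code: `FakeClient` and `failover_round` are hypothetical names, the real RCS and RCCs are separate processes that exchange these messages via the system queue, and the text does not say what happens after a partially acknowledged invalidation, so the sketch simply aborts.

```ruby
# Hypothetical in-memory stand-in for an RCC.
class FakeClient
  attr_reader :id, :master

  def initialize(id, master, reachable: true, sees_master_down: true)
    @id = id
    @master = master
    @reachable = reachable
    @sees_master_down = sees_master_down
  end

  # Answers :pong only if reachable and the master looks down from here.
  def ping
    :pong if @reachable && @sees_master_down
  end

  # Forgets the master (the real client empties its master file) and
  # acknowledges, under the same conditions as for ping.
  def invalidate
    return nil unless @reachable && @sees_master_down
    @master = nil
    :invalidated
  end

  # Stores the new master (the real client writes it to its master file).
  def reconfigure(new_master)
    @master = new_master
  end
end

# One reconfiguration round. Returns the new master, or nil if the round
# was aborted because not *all* clients answered.
def failover_round(clients, new_master)
  return nil unless clients.map(&:ping).all? { |r| r == :pong }
  return nil unless clients.map(&:invalidate).all? { |r| r == :invalidated }
  # The real RCS would promote the slave at this point (SLAVEOF no one).
  clients.each { |c| c.reconfigure(new_master) }
  new_master
end
```

With two reachable clients the round returns the new master and both clients store it; if one client is constructed with `reachable: false`, the round returns `nil` after the ping phase and the remaining client keeps its old master.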
77
+
78
+ === Configuration
79
+
80
+ See Beetle::Configuration for setting redis configuration server and client options.
81
+
82
+ Please note:
83
+ Beetle::Configuration#redis_server must be a file path (not a redis host:port string) to use the redis failover. The RCS and RCCs store the current redis master in that file, and the handlers read from it.
84
+
85
+ == How to use it
86
+
87
+ This example uses two worker servers, identified by rcc-1 and rcc-2.
88
+
89
+ Please note:
90
+ All command line options can also be given as a yaml configuration file via the --config-file option.
91
+
92
+ === On one server
93
+
94
+ Start the Redis Configuration Server:
95
+
96
+ beetle configuration_server start -- --redis-servers redis-1:6379,redis-2:6379 --client-ids rcc-1,rcc-2
97
+
98
+ Get help for starting/stopping the server:
99
+
100
+ beetle configuration_server -h
101
+
102
+ Get help for server options:
103
+
104
+ beetle configuration_server start -- -h
105
+
106
+ === On every worker server
107
+
108
+ Start the Redis Configuration Client:
109
+
110
+ On first worker server:
111
+
112
+ beetle configuration_client start -- --client-id rcc-1
113
+
114
+ On second worker server:
115
+
116
+ beetle configuration_client start -- --client-id rcc-2
117
+
118
+ Get help for starting/stopping the client:
119
+
120
+ beetle configuration_client -h
121
+
122
+ Get help for client options:
123
+
124
+ beetle configuration_client start -- -h
@@ -0,0 +1,50 @@
1
+ = Release Notes
2
+
3
+ == Version 0.2.5
4
+
5
+ Added missing files to gem and rdoc
6
+
7
+ == Version 0.2.4
8
+
9
+ Log and send a system notification when a pong message from an unknown client is received.
10
+
11
+ == Version 0.2.2
12
+
13
+ Patch release which upgrades to redis-rb 2.0.4. This enables us to drop our redis monkey
14
+ patch which enabled connection timeouts for earlier redis versions. Note that earlier
15
+ Beetle versions are not compatible with redis 2.0.4.
16
+
17
+ == Version 0.2.1
18
+
19
+ Improved error message when no rabbitmq broker is available.
20
+
21
+ == Version 0.2
22
+
23
+ This version adds support for automatic redis deduplication store failover (see separate
24
+ file REDIS_AUTO_FAILOVER.rdoc).
25
+
26
+ === User visible changes
27
+
28
+ * it's possible to register auto deleted queues and exchanges
29
+ * Beetle::Client#configure returns self in order to simplify client setup
30
+ * it's possible to trace specific messages (see Beetle::Client#trace)
31
+ * default message handler timeout is 10 minutes now
32
+ * system wide configuration values can be specified via a yml formatted configuration
33
+ file (Beetle::Configuration#config_file)
34
+ * the config value redis_server specifies either a single server or a file path (used
35
+ by the automatic redis failover logic)
36
+
37
+ === Fugs Bixed
38
+
39
+ * handle active_support seconds notation for handler timeouts correctly
40
+ * error handler was erroneously called for expired messages
41
+ * subscribers would block when some non beetle process posts an undecodable message
42
+
43
+ === Gem Dependency Changes
44
+
45
+ * redis needs to be at least version 2.0.3
46
+ * we make use of the SystemTimer gem for ruby 1.8.7
47
+
48
+ == Version 0.1
49
+
50
+ Initial Release
@@ -0,0 +1,113 @@
1
+ require 'rake'
2
+ require 'rake/testtask'
3
+ require 'rcov/rcovtask'
4
+ require 'cucumber/rake/task'
5
+
6
+ # 1.8/1.9 compatible way of loading lib/beetle.rb
7
+ $:.unshift 'lib'
8
+ require 'beetle'
9
+
10
+ namespace :test do
11
+ namespace :coverage do
12
+ desc "Delete aggregate coverage data."
13
+ task(:clean) { rm_f "coverage.data" }
14
+ end
15
+
16
+ desc 'Aggregate code coverage'
17
+ task :coverage => "test:coverage:clean"
18
+
19
+ Rcov::RcovTask.new(:coverage) do |t|
20
+ t.libs << "test"
21
+ t.test_files = FileList["test/**/*_test.rb"]
22
+ t.output_dir = "test/coverage"
23
+ t.verbose = true
24
+ t.rcov_opts << "--exclude '.*' --include-file 'lib/beetle/'"
25
+ end
26
+ task :coverage do
27
+ system 'open test/coverage/index.html'
28
+ end if RUBY_PLATFORM =~ /darwin/
29
+ end
30
+
31
+
32
+ namespace :beetle do
33
+ task :test do
34
+ Beetle::Client.new.test
35
+ end
36
+
37
+ task :trace do
38
+ trap('INT'){ EM.stop_event_loop }
39
+ Beetle::Client.new.trace
40
+ end
41
+ end
42
+
43
+ namespace :rabbit do
44
+ def start(node_name, port)
45
+ script = File.expand_path(File.dirname(__FILE__)+"/script/start_rabbit")
46
+ puts "starting rabbit #{node_name} on port #{port}"
47
+ puts "type ^C a RETURN to abort"
48
+ sleep 1
49
+ exec "sudo #{script} #{node_name} #{port}"
50
+ end
51
+ desc "start rabbit instance 1"
52
+ task :start1 do
53
+ start "rabbit1", 5672
54
+ end
55
+ desc "start rabbit instance 2"
56
+ task :start2 do
57
+ start "rabbit2", 5673
58
+ end
59
+ desc "reset rabbit instances (deletes all data!)"
60
+ task :reset do
61
+ ["rabbit1", "rabbit2"].each do |node|
62
+ `sudo rabbitmqctl -n #{node} stop_app`
63
+ `sudo rabbitmqctl -n #{node} reset`
64
+ `sudo rabbitmqctl -n #{node} start_app`
65
+ end
66
+ end
67
+ end
68
+
69
+ namespace :redis do
70
+ def config_file(suffix)
71
+ File.expand_path(File.dirname(__FILE__)+"/etc/redis-#{suffix}.conf")
72
+ end
73
+ desc "start main redis"
74
+ task :start1 do
75
+ exec "redis-server #{config_file(:master)}"
76
+ end
77
+ desc "start slave redis"
78
+ task :start2 do
79
+ exec "redis-server #{config_file(:slave)}"
80
+ end
81
+ end
82
+
83
+ Cucumber::Rake::Task.new(:cucumber) do |t|
84
+ t.cucumber_opts = "features --format progress"
85
+ end
86
+
87
+ task :default do
88
+ Rake::Task[:test].invoke
89
+ Rake::Task[:cucumber].invoke
90
+ end
91
+
92
+ Rake::TestTask.new do |t|
93
+ t.libs << "test"
94
+ t.test_files = FileList['test/**/*_test.rb']
95
+ t.verbose = true
96
+ end
97
+
98
+ require 'rake/rdoctask'
99
+
100
+ Rake::RDocTask.new do |rdoc|
101
+ rdoc.rdoc_dir = 'site/rdoc'
102
+ rdoc.title = 'Beetle'
103
+ rdoc.main = 'README.rdoc'
104
+ rdoc.options << '--line-numbers' << '--inline-source' << '--quiet'
105
+ rdoc.rdoc_files.include('**/*.rdoc')
106
+ rdoc.rdoc_files.include('MIT-LICENSE')
107
+ rdoc.rdoc_files.include('lib/**/*.rb')
108
+ end
109
+
110
+ desc "build the beetle gem"
111
+ task :build do
112
+ system("gem build beetle.gemspec")
113
+ end
@@ -1,6 +1,6 @@
1
1
  Gem::Specification.new do |s|
2
2
  s.name = "beetle"
3
- s.version = "0.2.3"
3
+ s.version = "0.2.5"
4
4
 
5
5
  s.required_rubygems_version = ">= 1.3.1"
6
6
  s.authors = ["Stefan Kaes", "Pascal Friederich", "Ali Jelveh", "Sebastian Roebke"]
@@ -10,8 +10,8 @@ Gem::Specification.new do |s|
10
10
  s.summary = "High Availability AMQP Messaging with Redundant Queues"
11
11
  s.email = "developers@xing.com"
12
12
  s.executables = ["beetle"]
13
- s.extra_rdoc_files = ["README.rdoc"]
14
- s.files = Dir['{examples,ext,lib}/**/*.rb'] + %w(beetle.gemspec examples/README.rdoc)
13
+ s.extra_rdoc_files = Dir['**/*.rdoc'] + %w(MIT-LICENSE)
14
+ s.files = Dir['{examples,ext,lib}/**/*.rb'] + Dir['{features,script}/**/*'] + %w(beetle.gemspec Rakefile)
15
15
  s.extensions = 'ext/mkrf_conf.rb'
16
16
  s.homepage = "http://xing.github.com/beetle/"
17
17
  s.rdoc_options = ["--charset=UTF-8"]
@@ -0,0 +1,23 @@
1
+ === Cucumber
2
+
3
+ Beetle ships with a cucumber feature to test the automatic redis failover
4
+ as an integration test.
5
+
6
+ To run it, you have to start a RabbitMQ instance.
7
+
8
+ The top level Rakefile comes with targets to start several RabbitMQ instances locally.
9
+ Make sure the corresponding binaries are in your search path. Open a new shell
10
+ and execute the following command:
11
+
12
+ rake rabbit:start1
13
+
14
+ Then you can run the cucumber feature by running:
15
+
16
+ cucumber
17
+
18
+ or
19
+
20
+ rake cucumber
21
+
22
+
23
+ Note: Cucumber will automatically run after the unit tests when you run rake.
@@ -0,0 +1,105 @@
1
+ Feature: Redis auto failover
2
+ In order to eliminate a single point of failure
3
+ Beetle handlers should automatically switch to a new redis master in case of a redis master failure
4
+
5
+ Background:
6
+ Given a redis server "redis-1" exists as master
7
+ And a redis server "redis-2" exists as slave of "redis-1"
8
+
9
+ Scenario: Successful redis master switch
10
+ Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
11
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
12
+ And a redis configuration client "rc-client-2" using redis servers "redis-1,redis-2" exists
13
+ And a beetle handler using the redis-master file from "rc-client-1" exists
14
+ And redis server "redis-1" is down
15
+ And the retry timeout for the redis master check is reached
16
+ Then a system notification for "redis-1" not being available should be sent
17
+ And the role of redis server "redis-2" should be "master"
18
+ And the redis master of "rc-client-1" should be "redis-2"
19
+ And the redis master of "rc-client-2" should be "redis-2"
20
+ And the redis master of the beetle handler should be "redis-2"
21
+ And a system notification for switching from "redis-1" to "redis-2" should be sent
22
+ Given a redis server "redis-1" exists as master
23
+ Then the role of redis server "redis-1" should be "slave"
24
+
25
+ Scenario: Redis master only temporarily down (no switch necessary)
26
+ Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
27
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
28
+ And a redis configuration client "rc-client-2" using redis servers "redis-1,redis-2" exists
29
+ And a beetle handler using the redis-master file from "rc-client-1" exists
30
+ And redis server "redis-1" is down for less seconds than the retry timeout for the redis master check
31
+ And the retry timeout for the redis master check is reached
32
+ Then the role of redis server "redis-1" should be "master"
33
+ Then the role of redis server "redis-2" should be "slave"
34
+ And the redis master of "rc-client-1" should be "redis-1"
35
+ And the redis master of "rc-client-2" should be "redis-1"
36
+ And the redis master of the beetle handler should be "redis-1"
37
+
38
+ Scenario: Not all redis configuration clients available (no switch possible)
39
+ Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
40
+ And redis server "redis-1" is down
41
+ And the retry timeout for the redis master check is reached
42
+ Then the role of redis server "redis-2" should be "slave"
43
+
44
+ Scenario: No redis slave available to become new master (no switch possible)
45
+ Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
46
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
47
+ And a redis configuration client "rc-client-2" using redis servers "redis-1,redis-2" exists
48
+ And redis server "redis-1" is down
49
+ And redis server "redis-2" is down
50
+ And the retry timeout for the redis master check is reached
51
+ Then the redis master of "rc-client-1" should be "redis-1"
52
+ And the redis master of "rc-client-2" should be "redis-1"
53
+ And a system notification for no slave available to become new master should be sent
54
+
55
+ Scenario: Redis configuration client starts while no redis master available
56
+ Given redis server "redis-1" is down
57
+ And redis server "redis-2" is down
58
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
59
+ And the retry timeout for the redis master determination is reached
60
+ Then the redis master of "rc-client-1" should be undefined
61
+
62
+ Scenario: Redis configuration client starts while no redis master available but master file exists
63
+ Given redis server "redis-1" is down
64
+ And redis server "redis-2" is down
65
+ And an old redis master file for "rc-client-1" with master "redis-1" exists
66
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
67
+ And the retry timeout for the redis master determination is reached
68
+ Then the redis master of "rc-client-1" should be undefined
69
+
70
+ Scenario: Redis configuration client starts while both redis servers are master
71
+ Given redis server "redis-2" is master
72
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
73
+ Then the redis master of "rc-client-1" should be undefined
74
+
75
+ Scenario: Redis configuration client starts while both redis servers are master but master file exists
76
+ Given redis server "redis-2" is master
77
+ And an old redis master file for "rc-client-1" with master "redis-1" exists
78
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
79
+ Then the redis master of "rc-client-1" should be "redis-1"
80
+
81
+ Scenario: Redis configuration client starts while both redis servers are slave
82
+ Given a redis server "redis-3" exists as master
83
+ And redis server "redis-1" is slave of "redis-3"
84
+ And redis server "redis-2" is slave of "redis-3"
85
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
86
+ Then the redis master of "rc-client-1" should be undefined
87
+
88
+ Scenario: Redis configuration client starts while both redis servers are slave but master file exists
89
+ Given a redis server "redis-3" exists as master
90
+ And redis server "redis-1" is slave of "redis-3"
91
+ And redis server "redis-2" is slave of "redis-3"
92
+ And an old redis master file for "rc-client-1" with master "redis-1" exists
93
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
94
+ Then the redis master of "rc-client-1" should be undefined
95
+
96
+ Scenario: Redis configuration client starts while there is a redis master but no slave
97
+ Given redis server "redis-2" is down
98
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
99
+ Then the redis master of "rc-client-1" should be undefined
100
+
101
+ Scenario: Redis configuration client starts while there is a redis master but no slave but master file exists
102
+ Given redis server "redis-2" is down
103
+ And an old redis master file for "rc-client-1" with master "redis-1" exists
104
+ And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
105
+ Then the redis master of "rc-client-1" should be "redis-1"
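
The client startup scenarios above appear to follow one rule: a starting configuration client only adopts the server named in an existing master file, and only if that server currently reports itself as master. The following Ruby sketch encodes that reading. It is inferred from the scenarios, not taken from Beetle's source; `initial_master` and the `roles` hash (server name mapped to `:master`, `:slave` or `:down`) are hypothetical.

```ruby
# Hypothetical helper inferred from the scenarios, not Beetle's code.
# master_file_content: contents of the redis master file, or nil if the
#                      file does not exist.
# roles:               maps a server name to :master, :slave or :down.
# Returns the server to use as master, or nil for "undefined".
def initial_master(master_file_content, roles)
  candidate = master_file_content.to_s.strip
  return nil if candidate.empty?
  roles[candidate] == :master ? candidate : nil
end
```

For example, `initial_master("redis-1", "redis-1" => :master, "redis-2" => :master)` yields `"redis-1"`, matching the "both redis servers are master but master file exists" scenario, while any call without master file content yields `nil`.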