beetle 0.2.3 → 0.2.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,20 @@
+ Copyright (c) 2010 XING AG
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,124 @@
+ = Automatic Redis Failover for Beetle
+
+ == Introduction
+
+ Redis is used as the persistence layer in the AMQP message deduplication
+ process. Because it is such a critical piece of our infrastructure, a failure
+ of this service must be as unlikely as possible. As our AMQP workers operate
+ in a highly distributed manner, all accessing the same Redis server, an
+ automatic failover to another Redis server has to be very defensive and ensure
+ that every worker in the system switches to the new server at the same time.
+ If the new server is not accepted by every worker, the switch is not
+ performed. This ensures that even in the case of a partitioned network, two
+ different workers can never use two different Redis servers for message
+ deduplication.
+
+ == Our goals
+
+ * opt-in: using the redis failover solution is not required
+ * no single point of failure
+ * automatic switch in case of a redis master failure
+ * a switch should not cause inconsistent data on the redis servers
+ * workers should be able to determine the current redis master without asking
+ another process (as long as the redis servers are working)
+
+ == How it works
+
+ To ensure consistency, a service (the Redis Configuration Server, RCS)
+ continuously checks the availability and configuration of the currently
+ configured Redis master server. If this service detects that the Redis master
+ is no longer available, it tries to find an alternative server (one of the
+ slaves) that can be promoted to be the new Redis master.
+
+ Every worker server runs another daemon, the Redis Configuration Client (RCC),
+ which listens to messages sent by the RCS.
+
+ If the RCS finds another potential Redis master, it sends out a message to check
+ whether all known RCCs are still available (once again to eliminate the risk of a
+ partitioned network) and whether they agree to the master switch.
+
+ If all RCCs have answered that message, the RCS sends out a message that
+ tells the RCCs to invalidate the current master.
+
+ The RCCs invalidate the master by emptying a special file that the workers use
+ to look up the current Redis master (the file contains the hostname:port of the
+ currently active Redis master). Before every Redis operation, the AMQP workers
+ check this file's mtime and re-read its contents if the file has changed, so
+ once the file is empty no more operations are sent to the old Redis master.
+ When the file has been emptied, the RCCs respond to the "invalidate" message of
+ the RCS. When all RCCs have responded, the RCS knows that it is safe to switch
+ the Redis master. It sends a "reconfigure" message with the new Redis master's
+ hostname:port to the RCCs, which then write that value into their redis master
+ file.
+
+ Additionally, the RCS periodically sends reconfigure messages with the current
+ Redis master so that newly started RCCs can pick up the current master. It also
+ turns all other Redis servers into slaves of the current master.
+
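A minimal Ruby sketch of the worker side of this mechanism: re-read the redis master file only when its mtime changes, and treat an empty file as "no master". The class `MasterFileReader` and its `current_master` method are hypothetical names, not Beetle's API; the file format (a single `host:port` line) is taken from the description above.

```ruby
# Hypothetical sketch of the worker-side check (not Beetle's implementation).
# Assumes the redis master file exists and holds a single "host:port" line,
# or is empty while the RCCs have invalidated the current master.
class MasterFileReader
  def initialize(path)
    @path = path
    @mtime = nil   # mtime seen at the last read
    @master = nil  # cached "host:port", or nil
  end

  # Meant to be called before every Redis operation. The file is re-read only
  # when its mtime differs from the last one seen; nil means "no master right
  # now", so the caller must not send anything to Redis.
  def current_master
    mtime = File.mtime(@path)
    if mtime != @mtime
      @mtime = mtime
      content = File.read(@path).strip
      @master = content.empty? ? nil : content
    end
    @master
  end
end
```

Comparing mtimes keeps the per-operation cost to a single `stat` call. One caveat of this design: on filesystems with coarse timestamp resolution, two writes within the same tick leave the mtime unchanged, so a reader relying on mtime alone could miss the second write.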
+ === Prerequisites
+
+ * one redis-configuration-server process ("RCS", on one server), one redis-configuration-client process ("RCC") on every worker server
+ * the RCS knows about all possible RCCs using a list of client ids
+ * the RCS and RCCs exchange messages via a "system queue"
+
+ === Flow of actions
+
+ * on startup, an RCC can consult its redis master file to determine the current master without the help of the RCS, provided the server named in the file is still a master (otherwise it waits for the periodic reconfigure message with the current master from the RCS)
+ * when the RCS finds the master to be down, it retries a couple of times before starting a reconfiguration round
+ * the RCS sends all RCCs a "ping" message to check whether every client is there and able to answer
+ * the RCCs acknowledge via a "pong" message if they can confirm the current master to be unavailable
+ * the RCS waits for *all* RCCs to reply via pong
+ * the RCS tells all RCCs to stop using the master by sending an "invalidate" message
+ * the RCCs acknowledge via an "invalidated" message if they can still confirm the current master to be unavailable
+ * the RCS waits for *all* RCCs to acknowledge the invalidation
+ * the RCS promotes the chosen slave to become the new master (by sending SLAVEOF NO ONE)
+ * the RCS sends a "reconfigure" message containing the new master to every RCC
+ * the RCCs write the new master to their redis master file
+
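The flow above can be modeled as a small, synchronous Ruby sketch that makes the "every known client must answer" rule explicit. Everything here is an assumption for illustration: `RCC`, `all_answered?` and `reconfiguration_round` are hypothetical names, messages are plain method calls instead of AMQP messages, and timeouts are not modeled. Ignoring answers from unknown client ids mirrors the 0.2.4 entry in the release notes about pong messages from unknown clients.

```ruby
# Hypothetical, synchronous model of one RCS reconfiguration round; not
# Beetle's implementation. The real RCS and RCCs exchange messages
# asynchronously over the AMQP system queue and rely on timeouts; here every
# message is a plain method call so the all-or-nothing rule is easy to follow.
RCC = Struct.new(:id, :reachable, :sees_master_down, :master_file)

# True only if every *known* client answered. Answers from ids missing from
# the configured list are logged and ignored, so they can never stand in for
# a known client that stayed silent.
def all_answered?(known_ids, answered_ids)
  (answered_ids - known_ids).each { |id| warn "answer from unknown client #{id}" }
  (known_ids - answered_ids).empty?
end

# Returns the new master ("host:port"), or nil if no switch took place.
def reconfiguration_round(known_ids, clients, master_up, candidate, retries: 3)
  # Re-check the master a few times first (the real RCS waits between checks).
  return nil if retries.times.any? { master_up.call }
  # No slave available to become the new master.
  return nil if candidate.nil?

  # "ping" -> "pong": a client answers only if the RCS can reach it and it
  # also sees the current master as unavailable.
  agreeing = clients.select { |c| c.reachable && c.sees_master_down }
  return nil unless all_answered?(known_ids, agreeing.map(&:id))

  # "invalidate" -> "invalidated": each client empties its redis master file,
  # so its workers stop talking to the old master. (In this synchronous model
  # nobody can drop out between the two rounds; in the real system they can.)
  agreeing.each { |c| c.master_file = "" }
  return nil unless all_answered?(known_ids, agreeing.map(&:id))

  # The candidate slave would be promoted here by sending it SLAVEOF NO ONE.
  # "reconfigure": every client writes the new master into its file.
  agreeing.each { |c| c.master_file = candidate }
  candidate
end
```

For example, with known ids `%w(rcc-1 rcc-2)` and only `rcc-1` reachable, `reconfiguration_round` returns nil before any master file is touched, which matches the "Not all redis configuration clients available" scenario in the feature file.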
+ === Configuration
+
+ See Beetle::Configuration for setting redis configuration server and client options.
+
+ Please note:
+ Beetle::Configuration#redis_server must be a file path (not a redis host:port string) to use the redis failover. The RCS and RCCs store the current redis master in that file, and the handlers read from it.
+
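Because redis_server may hold either a host:port string or a file path (see also the 0.2 release notes), code that reads it has to tell the two apart. The helper below is a hypothetical sketch, not part of Beetle::Configuration; in particular, the rule that any value shaped like `host:port` is used directly is an assumption.

```ruby
# Hypothetical helper, not part of Beetle::Configuration. Assumption: a value
# that looks like "host:port" names the Redis server directly; any other value
# is the path of a redis master file maintained by an RCC.
HOST_PORT = /\A[^\s\/:]+:\d+\z/

# Returns "host:port", or nil when the master file is missing or empty
# (an empty file means the RCCs have invalidated the master).
def resolve_redis_master(redis_server)
  return redis_server if redis_server.match?(HOST_PORT)
  return nil unless File.exist?(redis_server)
  master = File.read(redis_server).strip
  master.empty? ? nil : master
end
```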
+ == How to use it
+
+ This example uses two worker servers, identified by rcc-1 and rcc-2.
+
+ Please note:
+ All command line options can also be given in a YAML configuration file via the --config-file option.
+
+ === On one server
+
+ Start the Redis Configuration Server:
+
+ beetle configuration_server start -- --redis-servers redis-1:6379,redis-2:6379 --client-ids rcc-1,rcc-2
+
+ Get help for starting/stopping the server:
+
+ beetle configuration_server -h
+
+ Get help for server options:
+
+ beetle configuration_server start -- -h
+
+ === On every worker server
+
+ Start the Redis Configuration Client:
+
+ On the first worker server:
+
+ beetle configuration_client start -- --client-id rcc-1
+
+ On the second worker server:
+
+ beetle configuration_client start -- --client-id rcc-2
+
+ Get help for starting/stopping the client:
+
+ beetle configuration_client -h
+
+ Get help for client options:
+
+ beetle configuration_client start -- -h
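The --redis-servers and --client-ids values above are comma-separated lists. The following sketch shows one way to parse and check them with Ruby's standard OptionParser; it is hypothetical, not Beetle's own option handling, and the option hash keys are made up for the example.

```ruby
require "optparse"

# Hypothetical parser for the server options shown above; not Beetle's own
# option handling. OptionParser's built-in Array type splits "a,b" into
# ["a", "b"].
def parse_server_options(argv)
  options = { redis_servers: [], client_ids: [] }
  OptionParser.new do |opts|
    opts.on("--redis-servers LIST", Array, "comma-separated host:port list") do |list|
      # Reject entries that are not of the form host:port (including empty ones).
      bad = list.reject { |s| s.to_s.match?(/\A[^\s:]+:\d+\z/) }
      raise OptionParser::InvalidArgument, bad.join(",") unless bad.empty?
      options[:redis_servers] = list
    end
    opts.on("--client-ids LIST", Array, "comma-separated client ids") do |list|
      options[:client_ids] = list
    end
  end.parse(argv)
  options
end
```

With the arguments from the server example, `parse_server_options(%w(--redis-servers redis-1:6379,redis-2:6379 --client-ids rcc-1,rcc-2))` returns `{ redis_servers: ["redis-1:6379", "redis-2:6379"], client_ids: ["rcc-1", "rcc-2"] }`, while a value such as `redis-1` without a port raises `OptionParser::InvalidArgument`.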
@@ -0,0 +1,50 @@
+ = Release Notes
+
+ == Version 0.2.5
+
+ Added missing files to gem and rdoc
+
+ == Version 0.2.4
+
+ Log and send a system notification when a pong message from an unknown client is received.
+
+ == Version 0.2.2
+
+ Patch release which upgrades to redis-rb 2.0.4. This enables us to drop our redis monkey
+ patch which enabled connection timeouts for earlier redis versions. Note that earlier
+ Beetle versions are not compatible with redis 2.0.4.
+
+ == Version 0.2.1
+
+ Improved error message when no rabbitmq broker is available.
+
+ == Version 0.2
+
+ This version adds support for automatic redis deduplication store failover (see separate
+ file REDIS_AUTO_FAILOVER.rdoc).
+
+ === User-visible changes
+
+ * it's possible to register auto-deleted queues and exchanges
+ * Beetle::Client#configure returns self in order to simplify client setup
+ * it's possible to trace specific messages (see Beetle::Client#trace)
+ * default message handler timeout is 10 minutes now
+ * system-wide configuration values can be specified via a YAML-formatted configuration
+ file (Beetle::Configuration#config_file)
+ * the config value redis_server specifies either a single server or a file path (used
+ by the automatic redis failover logic)
+
+ === Fugs Bixed
+
+ * handle active_support seconds notation for handler timeouts correctly
+ * error handler was erroneously called for expired messages
+ * subscribers would block when some non-beetle process posted an undecodable message
+
+ === Gem Dependency Changes
+
+ * redis needs to be at least version 2.0.3
+ * we make use of the SystemTimer gem for ruby 1.8.7
+
+ == Version 0.1
+
+ Initial Release
@@ -0,0 +1,113 @@
+ require 'rake'
+ require 'rake/testtask'
+ require 'rcov/rcovtask'
+ require 'cucumber/rake/task'
+
+ # 1.8/1.9 compatible way of loading lib/beetle.rb
+ $:.unshift 'lib'
+ require 'beetle'
+
+ namespace :test do
+   namespace :coverage do
+     desc "Delete aggregate coverage data."
+     task(:clean) { rm_f "coverage.data" }
+   end
+
+   desc 'Aggregate code coverage'
+   task :coverage => "test:coverage:clean"
+
+   Rcov::RcovTask.new(:coverage) do |t|
+     t.libs << "test"
+     t.test_files = FileList["test/**/*_test.rb"]
+     t.output_dir = "test/coverage"
+     t.verbose = true
+     t.rcov_opts << "--exclude '.*' --include-file 'lib/beetle/'"
+   end
+   task :coverage do
+     system 'open test/coverage/index.html'
+   end if RUBY_PLATFORM =~ /darwin/
+ end
+
+ namespace :beetle do
+   task :test do
+     Beetle::Client.new.test
+   end
+
+   task :trace do
+     # stop the EventMachine loop cleanly on ^C
+     trap('INT'){ EM.stop_event_loop }
+     Beetle::Client.new.trace
+   end
+ end
+
+ namespace :rabbit do
+   def start(node_name, port)
+     script = File.expand_path(File.dirname(__FILE__)+"/script/start_rabbit")
+     puts "starting rabbit #{node_name} on port #{port}"
+     puts "type ^C and RETURN to abort"
+     sleep 1
+     exec "sudo #{script} #{node_name} #{port}"
+   end
+   desc "start rabbit instance 1"
+   task :start1 do
+     start "rabbit1", 5672
+   end
+   desc "start rabbit instance 2"
+   task :start2 do
+     start "rabbit2", 5673
+   end
+   desc "reset rabbit instances (deletes all data!)"
+   task :reset do
+     ["rabbit1", "rabbit2"].each do |node|
+       `sudo rabbitmqctl -n #{node} stop_app`
+       `sudo rabbitmqctl -n #{node} reset`
+       `sudo rabbitmqctl -n #{node} start_app`
+     end
+   end
+ end
+
+ namespace :redis do
+   def config_file(suffix)
+     File.expand_path(File.dirname(__FILE__)+"/etc/redis-#{suffix}.conf")
+   end
+   desc "start main redis"
+   task :start1 do
+     exec "redis-server #{config_file(:master)}"
+   end
+   desc "start slave redis"
+   task :start2 do
+     exec "redis-server #{config_file(:slave)}"
+   end
+ end
+
+ Cucumber::Rake::Task.new(:cucumber) do |t|
+   t.cucumber_opts = "features --format progress"
+ end
+
+ task :default do
+   Rake::Task[:test].invoke
+   Rake::Task[:cucumber].invoke
+ end
+
+ Rake::TestTask.new do |t|
+   t.libs << "test"
+   t.test_files = FileList['test/**/*_test.rb']
+   t.verbose = true
+ end
+
+ require 'rake/rdoctask'
+
+ Rake::RDocTask.new do |rdoc|
+   rdoc.rdoc_dir = 'site/rdoc'
+   rdoc.title = 'Beetle'
+   rdoc.main = 'README.rdoc'
+   rdoc.options << '--line-numbers' << '--inline-source' << '--quiet'
+   rdoc.rdoc_files.include('**/*.rdoc')
+   rdoc.rdoc_files.include('MIT-LICENSE')
+   rdoc.rdoc_files.include('lib/**/*.rb')
+ end
+
+ desc "build the beetle gem"
+ task :build do
+   system("gem build beetle.gemspec")
+ end
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = "beetle"
- s.version = "0.2.3"
+ s.version = "0.2.5"
 
  s.required_rubygems_version = ">= 1.3.1"
  s.authors = ["Stefan Kaes", "Pascal Friederich", "Ali Jelveh", "Sebastian Roebke"]
@@ -10,8 +10,8 @@ Gem::Specification.new do |s|
  s.summary = "High Availability AMQP Messaging with Redundant Queues"
  s.email = "developers@xing.com"
  s.executables = ["beetle"]
- s.extra_rdoc_files = ["README.rdoc"]
- s.files = Dir['{examples,ext,lib}/**/*.rb'] + %w(beetle.gemspec examples/README.rdoc)
+ s.extra_rdoc_files = Dir['**/*.rdoc'] + %w(MIT-LICENSE)
+ s.files = Dir['{examples,ext,lib}/**/*.rb'] + Dir['{features,script}/**/*'] + %w(beetle.gemspec Rakefile)
  s.extensions = 'ext/mkrf_conf.rb'
  s.homepage = "http://xing.github.com/beetle/"
  s.rdoc_options = ["--charset=UTF-8"]
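The 0.2.5 release note "Added missing files to gem and rdoc" corresponds to this gemspec change. To see what the new glob expressions pick up, the sketch below evaluates the old and new expressions against a throwaway directory tree; the helper names and the example file layout are hypothetical.

```ruby
require "fileutils"

# Creates empty files (and their directories) below root. Used only to build
# a throwaway, hypothetical source tree for comparing the glob expressions.
def build_tree(root, paths)
  paths.each do |path|
    full = File.join(root, path)
    FileUtils.mkdir_p(File.dirname(full))
    FileUtils.touch(full)
  end
end

# Evaluates the old and new s.files / s.extra_rdoc_files expressions from the
# gemspec hunk above, assuming root is the working directory when the gemspec
# is loaded.
def gem_file_lists(root)
  Dir.chdir(root) do
    {
      old_files: (Dir['{examples,ext,lib}/**/*.rb'] + %w(beetle.gemspec examples/README.rdoc)).sort,
      new_files: (Dir['{examples,ext,lib}/**/*.rb'] + Dir['{features,script}/**/*'] +
                  %w(beetle.gemspec Rakefile)).sort,
      old_rdoc: ["README.rdoc"],
      new_rdoc: (Dir['**/*.rdoc'] + %w(MIT-LICENSE)).sort
    }
  end
end
```

With a tree containing lib/beetle.rb, a features/ directory (including a step_definitions/ subdirectory) and script/start_rabbit, the new `s.files` list additionally contains the feature files, the script and Rakefile. Note that `'{features,script}/**/*'` also matches directory entries such as `features/step_definitions`, not only regular files.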
@@ -0,0 +1,23 @@
+ === Cucumber
+
+ Beetle ships with a cucumber feature to test the automatic redis failover
+ as an integration test.
+
+ To run it, you need a running RabbitMQ broker.
+
+ The top-level Rakefile comes with targets to start several RabbitMQ instances locally.
+ Make sure the corresponding binaries are in your search path. Open a new shell
+ and execute the following command:
+
+ rake rabbit:start1
+
+ Then you can run the cucumber feature by running:
+
+ cucumber
+
+ or
+
+ rake cucumber
+
+ Note: Cucumber will automatically run after the unit tests when you run rake.
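The advice about binaries in the search path can be checked up front. This is a hypothetical preflight helper, not part of Beetle; it assumes a Unix-like PATH (no Windows executable extensions) and by default only checks the two binaries the Rakefile invokes by name, `redis-server` and `rabbitmqctl`.

```ruby
# Hypothetical preflight check, not part of Beetle: returns the names from
# `names` that are not found as executable files in any PATH directory.
# The defaults are the two binaries the Rakefile calls directly; the
# script/start_rabbit helper may need further ones.
def missing_executables(names = %w(redis-server rabbitmqctl), path = ENV.fetch("PATH", ""))
  dirs = path.split(File::PATH_SEPARATOR)
  names.reject do |name|
    dirs.any? do |dir|
      candidate = File.join(dir, name)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

missing = missing_executables
warn "not found on PATH: #{missing.join(', ')}" unless missing.empty?
```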
@@ -0,0 +1,105 @@
+ Feature: Redis auto failover
+   In order to eliminate a single point of failure
+   Beetle handlers should automatically switch to a new redis master in case of a redis master failure
+
+   Background:
+     Given a redis server "redis-1" exists as master
+     And a redis server "redis-2" exists as slave of "redis-1"
+
+   Scenario: Successful redis master switch
+     Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     And a redis configuration client "rc-client-2" using redis servers "redis-1,redis-2" exists
+     And a beetle handler using the redis-master file from "rc-client-1" exists
+     And redis server "redis-1" is down
+     And the retry timeout for the redis master check is reached
+     Then a system notification for "redis-1" not being available should be sent
+     And the role of redis server "redis-2" should be "master"
+     And the redis master of "rc-client-1" should be "redis-2"
+     And the redis master of "rc-client-2" should be "redis-2"
+     And the redis master of the beetle handler should be "redis-2"
+     And a system notification for switching from "redis-1" to "redis-2" should be sent
+     Given a redis server "redis-1" exists as master
+     Then the role of redis server "redis-1" should be "slave"
+
+   Scenario: Redis master only temporarily down (no switch necessary)
+     Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     And a redis configuration client "rc-client-2" using redis servers "redis-1,redis-2" exists
+     And a beetle handler using the redis-master file from "rc-client-1" exists
+     And redis server "redis-1" is down for less seconds than the retry timeout for the redis master check
+     And the retry timeout for the redis master check is reached
+     Then the role of redis server "redis-1" should be "master"
+     And the role of redis server "redis-2" should be "slave"
+     And the redis master of "rc-client-1" should be "redis-1"
+     And the redis master of "rc-client-2" should be "redis-1"
+     And the redis master of the beetle handler should be "redis-1"
+
+   Scenario: Not all redis configuration clients available (no switch possible)
+     Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
+     And redis server "redis-1" is down
+     And the retry timeout for the redis master check is reached
+     Then the role of redis server "redis-2" should be "slave"
+
+   Scenario: No redis slave available to become new master (no switch possible)
+     Given a redis configuration server using redis servers "redis-1,redis-2" with clients "rc-client-1,rc-client-2" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     And a redis configuration client "rc-client-2" using redis servers "redis-1,redis-2" exists
+     And redis server "redis-1" is down
+     And redis server "redis-2" is down
+     And the retry timeout for the redis master check is reached
+     Then the redis master of "rc-client-1" should be "redis-1"
+     And the redis master of "rc-client-2" should be "redis-1"
+     And a system notification for no slave available to become new master should be sent
+
+   Scenario: Redis configuration client starts while no redis master available
+     Given redis server "redis-1" is down
+     And redis server "redis-2" is down
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     And the retry timeout for the redis master determination is reached
+     Then the redis master of "rc-client-1" should be undefined
+
+   Scenario: Redis configuration client starts while no redis master available but master file exists
+     Given redis server "redis-1" is down
+     And redis server "redis-2" is down
+     And an old redis master file for "rc-client-1" with master "redis-1" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     And the retry timeout for the redis master determination is reached
+     Then the redis master of "rc-client-1" should be undefined
+
+   Scenario: Redis configuration client starts while both redis servers are master
+     Given redis server "redis-2" is master
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     Then the redis master of "rc-client-1" should be undefined
+
+   Scenario: Redis configuration client starts while both redis servers are master but master file exists
+     Given redis server "redis-2" is master
+     And an old redis master file for "rc-client-1" with master "redis-1" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     Then the redis master of "rc-client-1" should be "redis-1"
+
+   Scenario: Redis configuration client starts while both redis servers are slave
+     Given a redis server "redis-3" exists as master
+     And redis server "redis-1" is slave of "redis-3"
+     And redis server "redis-2" is slave of "redis-3"
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     Then the redis master of "rc-client-1" should be undefined
+
+   Scenario: Redis configuration client starts while both redis servers are slave but master file exists
+     Given a redis server "redis-3" exists as master
+     And redis server "redis-1" is slave of "redis-3"
+     And redis server "redis-2" is slave of "redis-3"
+     And an old redis master file for "rc-client-1" with master "redis-1" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     Then the redis master of "rc-client-1" should be undefined
+
+   Scenario: Redis configuration client starts while there is a redis master but no slave
+     Given redis server "redis-2" is down
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     Then the redis master of "rc-client-1" should be undefined
+
+   Scenario: Redis configuration client starts while there is a redis master but no slave but master file exists
+     Given redis server "redis-2" is down
+     And an old redis master file for "rc-client-1" with master "redis-1" exists
+     And a redis configuration client "rc-client-1" using redis servers "redis-1,redis-2" exists
+     Then the redis master of "rc-client-1" should be "redis-1"
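The "Redis configuration client starts while ..." scenarios together define how an RCC picks its initial master. The Ruby sketch below is a hypothetical reconstruction of that rule from the scenarios alone, not Beetle's code; servers are identified by name as in the feature, and it deliberately ignores which master a slave replicates from.

```ruby
# Hypothetical reconstruction of the startup rule encoded by the scenarios
# above; not Beetle's implementation. `roles` maps every configured server to
# :master, :slave or :down, and `master_file` is the content of an old redis
# master file (nil if there is none). Returns the master to use, or nil
# ("undefined").
def initial_master(roles, master_file)
  # An old master file is trusted only while the server it names is a master.
  return master_file if master_file && roles[master_file] == :master

  # Otherwise only an unambiguous setup qualifies: exactly one master, and
  # every other configured server up and running as a slave. (This sketch
  # does not check *which* master the slaves replicate from.)
  masters = roles.select { |_, role| role == :master }.keys
  slaves  = roles.select { |_, role| role == :slave }.keys
  masters.size == 1 && slaves.size == roles.size - 1 ? masters.first : nil
end
```

Each scenario maps to one call; for example, both servers running as masters with an old master file naming redis-1 corresponds to `initial_master({ "redis-1" => :master, "redis-2" => :master }, "redis-1")`, which returns `"redis-1"`.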