apprentice 0.0.5 → 0.0.6
- checksums.yaml +4 -4
- data/README.md +26 -10
- data/apprentice.gemspec +2 -2
- data/lib/apprentice.rb +21 -2
- data/lib/apprentice/checker.rb +50 -2
- data/lib/apprentice/checks/galera.rb +101 -1
- data/lib/apprentice/checks/mysql.rb +127 -0
- data/lib/apprentice/configuration.rb +80 -9
- data/lib/apprentice/server.rb +16 -3
- data/lib/apprentice/version.rb +2 -1
- data/ruby-apprentice.default +9 -1
- data/ruby-apprentice.init +5 -5
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5677b66a1b063db82716d77a866bed546a2e873b
|
4
|
+
data.tar.gz: 3cc0c846e58fccd80625fe2ed1728fc402a6d1f4
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: dadebfde96c397883370a51b3f3f20199245f78dd7dc451eac833fb4bbc3ce40677ad0aeb4b76a41fe90127dbab57eebaffe82ad4a34e3935e73f02c06058220
|
7
|
+
data.tar.gz: a210337e0c220d29f93bfc3e8d66d382c37bcf2ad304e68dfe735911627c98b386a98cdad19a0227742cba03d1b1da01374a2d2040fb17219d88b556de5eaf3c
|
data/README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# Apprentice
|
2
2
|
|
3
|
-
Apprentice is tiny server application that determines the
|
3
|
+
Apprentice is a tiny server application (under 300 lines of Ruby code) that determines the integrity of a running [MariaDB/MySQL slave](https://mariadb.com/kb/en/replication-overview/) or [MariaDB Galera master-master cluster member](https://mariadb.com/kb/en/what-is-mariadb-galera-cluster/) and responds to HTTP requests on a pre-defined port, depending on the state of the server it is checking on.
|
4
4
|
|
5
5
|
## How does it work?
|
6
6
|
|
@@ -10,13 +10,20 @@ You can find out about the syntax by running `apprentice --help`:
|
|
10
10
|
Usage: apprentice [options]
|
11
11
|
|
12
12
|
Specific options:
|
13
|
-
-s, --server SERVER
|
14
|
-
-u, --user USER USER to connect the server with
|
13
|
+
-s, --server SERVER SERVER to connect to
|
14
|
+
-u, --user USER USER to connect to the server with
|
15
15
|
-p, --password PASSWORD PASSWORD to use
|
16
|
+
-t, --type TYPE TYPE of server. Must be either "galera" or "mysql".
|
16
17
|
-i, --ip IP Local IP to bind to
|
18
|
+
(default: 0.0.0.0)
|
17
19
|
--port PORT Local PORT to use
|
18
|
-
|
19
|
-
--
|
20
|
+
(default: 3307)
|
21
|
+
--sql_port PORT Port of MariaDB/MySQL server to connect to
|
22
|
+
(default: 3306)
|
23
|
+
--[no-]accept-donor Accept galera cluster state "Donor/Desynced" as valid
|
24
|
+
(default: false)
|
25
|
+
--threshold SECONDS MariaDB/MySQL slave lag threshold
|
26
|
+
(default: 120)
|
20
27
|
|
21
28
|
Common options:
|
22
29
|
-h, --help Show this message
|
@@ -25,7 +32,7 @@ You can find out about the syntax by running `apprentice --help`:
|
|
25
32
|
|
26
33
|
## What it does
|
27
34
|
|
28
|
-
It determines whether or not the server it is connected to is alive and ready to serve connections to clients. Furthermore, it also determines whether said server is a healthy
|
35
|
+
It determines whether or not the server it is connected to is alive and ready to serve connections to clients. Furthermore, it also determines whether said server is healthy enough to serve connections, i.e. doesn't suffer from slave lag and hasn't separated from the cluster.
|
29
36
|
|
30
37
|
## What it doesn't do
|
31
38
|
|
@@ -35,7 +42,16 @@ It determines whether or not the server it is connected to is alive and ready to
|
|
35
42
|
* *`503 Service Unavailable`*: The server is unavailable and not ready for connections
|
36
43
|
|
37
44
|
## What's it checking exactly?
|
45
|
+
### MariaDB/MySQL
|
46
|
+
Apprentice checks the following variables:
|
47
|
+
|
48
|
+
* **Slave_IO_Running**: Indicates whether a slave is actually replicating from its master. If this is set to "No" or even "nil" the server is considered unfit for serving client connections.
|
49
|
+
* **Seconds_Behind_Master**: Indicates how far (in seconds) the slave is behind its master's state. A lag above 120 seconds is widely considered to be unsuitable for serving valid data. The lower the threshold, the higher the risk of Apprentice returning a negative result.
|
50
|
+
* *Note*: Generally, MariaDB/MySQL slaves lag a little (even if only by a fraction of a second to a few seconds). A threshold value below 30 - 60 (depending on your setup) would probably be too conservative. However, YMMV.
|
51
|
+
|
52
|
+
For Apprentice to be able to check on the mentioned variables, the user you specify on the command line needs [the 'REPLICATION CLIENT' privilege](http://dev.mysql.com/doc/refman/5.0/en/privileges-provided.html#priv_replication-client) granted on the given server. Otherwise Apprentice is going to return a negative result.
|
38
53
|
|
54
|
+
### Galera
|
39
55
|
Apprentice checks the following variables:
|
40
56
|
|
41
57
|
* **wsrep_cluster_size**: A cluster size below 2 is considered an error since there must never be one single server inside a cluster setup.
|
@@ -44,7 +60,8 @@ Apprentice checks the following variables:
|
|
44
60
|
* *Note*: The value `2` indicates the server in question is currently being used as a donor to another member of the cluster and might be exhibiting slow-downs and/or erratic behaviour due to elevated network traffic and disc IO. For further explanation please [consult the MariaDB documentation](https://mariadb.com/kb/en/what-is-mariadb-galera-cluster/).
|
45
61
|
|
46
62
|
## That's great and all, but what gives?
|
47
|
-
By itself, Apprentice doesn't do
|
63
|
+
By itself, Apprentice doesn't do anything all that useful. However, it accommodates [HAProxy's httpchk method](http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20httpchk) quite nicely, making it possible to let HAProxy not only balance connections among a large pool of MariaDB/MySQL slave nodes or cluster members but also check on their respective "health" while doing so.
|
64
|
+
Usually, HAProxy would only be able to establish a connection to a server without checking on its consistency. Apprentice does that job for you and helps HAProxy make the right decision on which servers to let a client gain access to.
|
48
65
|
|
49
66
|
## Goodies
|
50
67
|
|
@@ -53,13 +70,12 @@ I've included an init.d script, `ruby-apprentice.init` which you may use in orde
|
|
53
70
|
|
54
71
|
$ mv ruby-apprentice.init /etc/init.d/ruby-apprentice
|
55
72
|
$ chmod +x /etc/init.d/ruby-apprentice
|
56
|
-
$ mv ruby-apprentice.defaults /etc/defaults/
|
73
|
+
$ mv ruby-apprentice.default /etc/default/ruby-apprentice
|
57
74
|
|
58
75
|
Now you just need to add the relevant information for starting Apprentice. The defaults file is pretty self explanatory.
|
59
76
|
|
60
77
|
## TODO
|
61
78
|
|
62
|
-
* Write better (r)docs. I'm sorry for the abysmal state they're in right now
|
63
|
-
* Be a lot more forgiving when it comes to SQL connection errors/reconnects/server going awol.
|
64
79
|
* Finish the rspec definitions. Sorry for missing out on those as well.
|
80
|
+
* Maybe integrate a logger
|
65
81
|
* Write a better init script
|
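The README describes Apprentice as answering health checks with an HTTP status, and the `CODES` table in `checker.rb` lists `200 OK` and `503 Service Unavailable`. As a minimal sketch of the consuming side, the helper below classifies a raw response by its status line alone. The name `available?` is hypothetical and not part of Apprentice, and treating only `200` as healthy is an assumption made for illustration.

```ruby
# Hypothetical helper, not part of Apprentice: decides whether a raw HTTP
# response signals a healthy backend by inspecting only the status line.
def available?(raw_response)
  status_line = raw_response.to_s.lines.first.to_s
  match = status_line.match(%r{\AHTTP/\d\.\d (\d{3})})
  return false unless match # anything unparsable counts as unavailable

  match[1] == '200'
end

ok   = "HTTP/1.1 200 OK\r\nContent-type: text/plain\r\nContent-length: 0\r\n\r\n"
down = "HTTP/1.1 503 Service Unavailable\r\nContent-type: text/plain\r\nContent-length: 0\r\n\r\n"

puts available?(ok)
puts available?(down)
```

Failing closed on unparsable input keeps a broken or half-started checker from being mistaken for a healthy database.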
data/apprentice.gemspec
CHANGED
@@ -8,8 +8,8 @@ Gem::Specification.new do |spec|
|
|
8
8
|
spec.version = Apprentice::VERSION
|
9
9
|
spec.authors = 'Moritz Heiber'
|
10
10
|
spec.email = %w{moritz.heiber@gmail.com}
|
11
|
-
spec.description = 'A MariaDB cluster integrity checker'
|
12
|
-
spec.summary = '
|
11
|
+
spec.description = 'A MariaDB/MySQL slave lag and cluster integrity checker'
|
12
|
+
spec.summary = 'Checks a given server for consistency and replication status'
|
13
13
|
spec.homepage = 'http://github.com/moritzheiber/apprentice'
|
14
14
|
spec.license = 'MIT'
|
15
15
|
|
data/lib/apprentice.rb
CHANGED
@@ -3,15 +3,34 @@ require 'apprentice/configuration'
|
|
3
3
|
require 'apprentice/version'
|
4
4
|
require 'apprentice/server'
|
5
5
|
|
6
|
+
# The main Apprentice module including all other modules and classes
|
6
7
|
module Apprentice
|
8
|
+
|
9
|
+
# This defines the sentinel, i.e. tiny server, Apprentice uses to communicate with e.g. HAProxy's httpchk method.
|
7
10
|
class Sentinel
|
8
|
-
include Configuration
|
9
|
-
include Server
|
11
|
+
include Configuration #:nodoc:
|
12
|
+
include Server #:nodoc:
|
10
13
|
|
14
|
+
# This depends on the Configuration module since it uses the Configuration#get_config method.
|
15
|
+
#
|
16
|
+
# ==== Return value
|
17
|
+
#
|
18
|
+
# * <tt>@options</tt> - sets the instance variable <tt>@options</tt>, which is used inside #run to start the EventMachine server
|
11
19
|
def initialize
|
12
20
|
@options = get_config
|
13
21
|
end
|
14
22
|
|
23
|
+
# Starts the EventMachine server
|
24
|
+
#
|
25
|
+
# === Special conditions
|
26
|
+
#
|
27
|
+
# We are trapping the signals <tt>INT</tt> and <tt>TERM</tt> here in order to shut down the EventMachine gracefully.
|
28
|
+
#
|
29
|
+
# ==== Attributes
|
30
|
+
#
|
31
|
+
# * <tt>@options.ip</tt> - The server binds to this specific ip
|
32
|
+
# * <tt>@options.port</tt> - The server uses this specific port to expose its limited HTTP interface to the world
|
33
|
+
# * <tt>@options</tt> - Gets passed to the server as a whole to be used with Server::EventServer#initialize
|
15
34
|
def run
|
16
35
|
EM.run do
|
17
36
|
Signal.trap('INT') { EventMachine.stop }
|
data/lib/apprentice/checker.rb
CHANGED
@@ -1,9 +1,37 @@
|
|
1
|
+
# Contains all the relevant methods for checking on a server's state
|
2
|
+
#
|
3
|
+
# Conditionally includes either MariaDB/MySQL or Galera related checking code
|
1
4
|
module Checker
|
2
|
-
require 'apprentice/checks/galera'
|
3
|
-
include Galera
|
4
5
|
|
6
|
+
# HTTP response codes and their respective return value
|
7
|
+
#
|
8
|
+
# We're constructing our dumb HTTP response handler using these
|
5
9
|
CODES = {200 => 'OK',503 => 'Service Unavailable'}
|
6
10
|
|
11
|
+
case @type
|
12
|
+
when 'galera'
|
13
|
+
require 'apprentice/checks/galera'
|
14
|
+
include Galera
|
15
|
+
when 'mysql'
|
16
|
+
require 'apprentice/checks/mysql'
|
17
|
+
include Mysql_Checks
|
18
|
+
end
|
19
|
+
|
20
|
+
# Format our HTTP/1.1 response properly without using arbitrary line breaks.
|
21
|
+
#
|
22
|
+
# ==== Attributes
|
23
|
+
#
|
24
|
+
# * +texts+ - A hash containing all text responses returned from run_checks.
|
25
|
+
#
|
26
|
+
# ==== Return values
|
27
|
+
#
|
28
|
+
# * +value+ - The comprehensive text returned with a HTTP response.
|
29
|
+
#
|
30
|
+
# ==== Examples
|
31
|
+
#
|
32
|
+
# t = ['Something', 'Something else']
|
33
|
+
# response = format_text(t)
|
34
|
+
# response.inspect # => 'Something\r\nSomething else\r\n'
|
7
35
|
def format_text(texts)
|
8
36
|
value = ''
|
9
37
|
if !texts.empty?
|
@@ -14,6 +42,26 @@ module Checker
|
|
14
42
|
return value
|
15
43
|
end
|
16
44
|
|
45
|
+
# Generates the actual output returned by the Server::EventServer class.
|
46
|
+
#
|
47
|
+
# It's valid HTTP/1.1 and should be understood by almost any browser. Certainly by HAProxy's httpchk.
|
48
|
+
#
|
49
|
+
# ==== Attributes
|
50
|
+
#
|
51
|
+
# * +code+ - The HTTP code for the returned response
|
52
|
+
# * +text+ - Formatted text to be returned with the response
|
53
|
+
#
|
54
|
+
# ==== Return values
|
55
|
+
#
|
56
|
+
# * String - A HTTP response string
|
57
|
+
#
|
58
|
+
# ==== Examples
|
59
|
+
#
|
60
|
+
# code = 503
|
61
|
+
# text = 'Something is wrong'
|
62
|
+
#
|
63
|
+
# response = generate_response(code, text)
|
64
|
+
# response.inspect # => 'HTTP/1.1 503 Service Unavailable\r\nContent-type: text/plain\r\nContent-length: 18\r\n\r\nSomething is wrong\r\n'
|
17
65
|
def generate_response(code = 503, text)
|
18
66
|
"HTTP/1.1 #{code} #{CODES[code]}\r\nContent-type: text/plain\r\nContent-length: #{text.length}\r\n\r\n#{text}"
|
19
67
|
end
|
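A note on the `case @type` block in `checker.rb`: in Ruby, an instance variable referenced directly in a module body belongs to the module object itself and is evaluated once, when the module is loaded, so it does not see the `@type` assigned later in an instance's `initialize`. One possible way to choose the check code per instance is sketched below. All names (`GaleraChecksSketch`, `MysqlChecksSketch`, `CHECK_MODULES`, `CheckerSketch`) are hypothetical stand-ins, not the gem's own API.

```ruby
# Hypothetical stand-ins for the Galera and Mysql_Checks modules.
module GaleraChecksSketch
  def run_checks
    { code: 200, text: ['galera checks ran'] }
  end
end

module MysqlChecksSketch
  def run_checks
    { code: 200, text: ['mysql checks ran'] }
  end
end

# Maps the --type value to the module that provides #run_checks.
CHECK_MODULES = {
  'galera' => GaleraChecksSketch,
  'mysql'  => MysqlChecksSketch
}.freeze

class CheckerSketch
  def initialize(type)
    # fetch raises KeyError for an unknown type instead of silently
    # leaving the object without a #run_checks method.
    extend CHECK_MODULES.fetch(type)
  end
end

puts CheckerSketch.new('mysql').run_checks[:text].first
puts CheckerSketch.new('galera').run_checks[:text].first
```

Using `extend` inside `initialize` attaches the chosen module to that one object, so two instances with different types can coexist in the same process.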
data/lib/apprentice/checks/galera.rb
CHANGED
@@ -1,6 +1,22 @@
|
|
1
|
+
# Contains Galera specific methods for checking cluster member consistency
|
1
2
|
module Galera
|
2
|
-
STATES = {1 => 'Joining',2 => 'Donor/Desynced',3 => 'Joined',4 => 'Synced'}
|
3
3
|
|
4
|
+
# Galera knows {a couple of different states}[http://www.percona.com/doc/percona-xtradb-cluster/wsrep-status-index.html#wsrep_local_state].
|
5
|
+
# This constant describes their respective meaning for user feedback and, possibly, logging purposes.
|
6
|
+
STATES = {1 => 'Joining', 2 => 'Donor/Desynced', 3 => 'Joined', 4 => 'Synced'}
|
7
|
+
|
8
|
+
# Gets the actual status from the Galera cluster member using the Mysql2 gem.
|
9
|
+
# Notice that we're using the EventMachine-enabled Mysql2::Client.
|
10
|
+
#
|
11
|
+
# Right now it only returns the relevant error output and continues working afterwards.
|
12
|
+
#
|
13
|
+
# Nothing is mentioned about explicitly closing a client connection in the Mysql2 docs,
|
14
|
+
# however, we need to be careful with the amount of connections we're using since we might
|
15
|
+
# find ourselves in an environment where the number of connections is constrained to very few.
|
16
|
+
#
|
17
|
+
# ==== Return values
|
18
|
+
#
|
19
|
+
# * @status - Contains a hash of all the relevant wsrep_* variables to be examined by #run_checks
|
4
20
|
def get_galera_status
|
5
21
|
begin
|
6
22
|
client = Mysql2::Client.new(
|
@@ -12,16 +28,34 @@ module Galera
|
|
12
28
|
)
|
13
29
|
result = client.query "SHOW STATUS LIKE 'wsrep_%';"
|
14
30
|
if result.count > 0
|
31
|
+
|
32
|
+
# We need to do some conversion here in order to get a usable hash
|
15
33
|
result.each do |r|
|
16
34
|
@status.merge!(Hash[*r])
|
17
35
|
end
|
18
36
|
end
|
19
37
|
client.close
|
20
38
|
rescue Exception => message
|
39
|
+
# FIXME Properly handle exception
|
21
40
|
puts message
|
22
41
|
end
|
23
42
|
end
|
24
43
|
|
44
|
+
# Returns the relevant status HTTP code accompanied by a useful user feedback text
|
45
|
+
#
|
46
|
+
# ==== Attributes
|
47
|
+
#
|
48
|
+
# * @status - Should contain a hash with the relevant information to determine the
|
49
|
+
# cluster member status. Also see #get_galera_status.
|
50
|
+
#
|
51
|
+
# ==== Return values
|
52
|
+
#
|
53
|
+
# * +response+ - A hash containing a HTTP <tt>:code</tt> and a <tt>:text</tt> to return to the user
|
54
|
+
#
|
55
|
+
# ==== Example
|
56
|
+
#
|
57
|
+
# @status = {'wsrep_cluster_size' => 4 }
|
58
|
+
# response = self.run_checks # => {:code => 503, :text => 'Some text'}
|
25
59
|
def run_checks
|
26
60
|
get_galera_status
|
27
61
|
unless @status.empty?
|
@@ -42,16 +76,82 @@ module Galera
|
|
42
76
|
end
|
43
77
|
end
|
44
78
|
|
79
|
+
# Checks whether the cluster size as reported by the member is above 1.
|
80
|
+
# Any value below 2 is considered bad, as a cluster, by definition, should consist of at least
|
81
|
+
# 2 members connected to each other.
|
82
|
+
#
|
83
|
+
# A cluster size of 1 might also indicate a split-brain situation.
|
84
|
+
#
|
85
|
+
# ==== Return values
|
86
|
+
#
|
87
|
+
# * +true+ or +false+ - depending on the value of <tt>@status['wsrep_cluster_size']</tt>
|
88
|
+
#
|
89
|
+
# ==== Examples
|
90
|
+
#
|
91
|
+
# @status = Hash.new
|
92
|
+
#
|
93
|
+
# @status['wsrep_cluster_size'] = 3
|
94
|
+
# r = check_cluster_size
|
95
|
+
# r.inspect # => true
|
96
|
+
#
|
97
|
+
# @status['wsrep_cluster_size'] = 1
|
98
|
+
# r = check_cluster_size
|
99
|
+
# r.inspect # => false
|
45
100
|
def check_cluster_size
|
46
101
|
return true if Integer(@status['wsrep_cluster_size']) > 1
|
47
102
|
false
|
48
103
|
end
|
49
104
|
|
105
|
+
# Checks whether the cluster replication is running and active.
|
106
|
+
# If this returns false the <tt>'wsrep_ready'</tt> status variable is set to <tt>'OFF'</tt> and thus the server is not an active
|
107
|
+
# member of a running cluster.
|
108
|
+
#
|
109
|
+
# ==== Return values
|
110
|
+
#
|
111
|
+
# * +true+ or +false+ - depending on the value of <tt>@status['wsrep_ready']</tt>
|
112
|
+
#
|
113
|
+
# ==== Examples
|
114
|
+
#
|
115
|
+
# @status = Hash.new
|
116
|
+
#
|
117
|
+
# @status['wsrep_ready'] = 'ON'
|
118
|
+
# r = check_ready_state
|
119
|
+
# r.inspect # => true
|
120
|
+
#
|
121
|
+
# @status['wsrep_ready'] = 'OFF'
|
122
|
+
# r = check_ready_state
|
123
|
+
# r.inspect # => false
|
50
124
|
def check_ready_state
|
51
125
|
return true if @status['wsrep_ready'] == 'ON'
|
52
126
|
false
|
53
127
|
end
|
54
128
|
|
129
|
+
# Checks how the cluster member sees itself in terms of status
|
130
|
+
#
|
131
|
+
# Valid states, read from the <tt>'wsrep_local_state'</tt> variable and depending on the configuration, are <tt>4</tt>, meaning <tt>Synced</tt>, or <tt>2</tt>,
|
132
|
+
# meaning <tt>Donor/Desynced</tt>, if the option <tt>--accept-donor</tt> was passed at runtime.
|
133
|
+
#
|
134
|
+
# ==== Return values
|
135
|
+
#
|
136
|
+
# * +true+ or +false+ - depending on the value of <tt>@status['wsrep_local_state']</tt>
|
137
|
+
#
|
138
|
+
# ==== Examples
|
139
|
+
#
|
140
|
+
# @status = Hash.new
|
141
|
+
# @donor_allowed = false
|
142
|
+
#
|
143
|
+
# @status['wsrep_local_state'] = 4
|
144
|
+
# r = check_local_state
|
145
|
+
# r.inspect # => true
|
146
|
+
#
|
147
|
+
# @status['wsrep_local_state'] = 2
|
148
|
+
# r = check_local_state
|
149
|
+
# r.inspect # => false
|
150
|
+
#
|
151
|
+
# @donor_allowed = true
|
152
|
+
# @status['wsrep_local_state'] = 2
|
153
|
+
# r = check_local_state
|
154
|
+
# r.inspect # => true
|
55
155
|
def check_local_state
|
56
156
|
s = Integer(@status['wsrep_local_state'])
|
57
157
|
return true if s == 4 || (s == 2 && @donor_allowed)
|
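The three Galera checks documented above (`wsrep_cluster_size`, `wsrep_ready`, `wsrep_local_state`) can be condensed into one side-effect-free function for experimentation. This is a sketch under stated assumptions: `galera_problems` is a hypothetical name, the input is assumed to be a plain hash of `wsrep_*` values as strings, and the message texts are made up for illustration.

```ruby
# Hypothetical, side-effect-free version of the three Galera checks.
# `status` is assumed to be a hash of wsrep_* status variables.
def galera_problems(status, accept_donor: false)
  problems = []

  # A cluster of fewer than two members is treated as an error.
  size = Integer(status.fetch('wsrep_cluster_size', 0))
  problems << "Cluster size is #{size}, expected at least 2." if size < 2

  problems << 'Cluster replication is not ready.' unless status['wsrep_ready'] == 'ON'

  # 4 means Synced; 2 means Donor/Desynced and is only accepted on request.
  state = Integer(status.fetch('wsrep_local_state', 0))
  synced = state == 4
  donor_ok = state == 2 && accept_donor
  problems << "Local state is #{state}." unless synced || donor_ok

  problems
end

healthy = { 'wsrep_cluster_size' => '3', 'wsrep_ready' => 'ON', 'wsrep_local_state' => '4' }
donor   = healthy.merge('wsrep_local_state' => '2')

p galera_problems(healthy)
p galera_problems(donor)
p galera_problems(donor, accept_donor: true)
```

An empty list maps to a `200` response and a non-empty one to `503`, which mirrors how `run_checks` builds its `:text` array.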
data/lib/apprentice/checks/mysql.rb
ADDED
@@ -0,0 +1,127 @@
|
|
1
|
+
# Contains MariaDB/MySQL specific methods for checking on slave health
|
2
|
+
module Mysql_Checks
|
3
|
+
|
4
|
+
# Gets the actual status from the MariaDB/MySQL slave using the Mysql2 gem.
|
5
|
+
# Notice that we're using the EventMachine-enabled Mysql2::Client.
|
6
|
+
#
|
7
|
+
# Right now it only returns the relevant error output and continues working afterwards.
|
8
|
+
#
|
9
|
+
# Nothing is mentioned about explicitly closing a client connection in the Mysql2 docs,
|
10
|
+
# however, we need to be careful with the amount of connections we're using since we might
|
11
|
+
# find ourselves in an environment where the number of connections is constrained to very few.
|
12
|
+
#
|
13
|
+
# ==== Return values
|
14
|
+
#
|
15
|
+
# * @status - Contains a hash of all the relevant replication related variables to be examined by #run_checks
|
16
|
+
def get_mysql_status
|
17
|
+
begin
|
18
|
+
client = Mysql2::Client.new(
|
19
|
+
host: @server,
|
20
|
+
port: @sql_port,
|
21
|
+
username: @user,
|
22
|
+
password: @password
|
23
|
+
)
|
24
|
+
result = client.query 'SHOW SLAVE STATUS;'
|
25
|
+
if result.count > 0
|
26
|
+
result.each do |row|
|
27
|
+
@status.merge!(row)
|
28
|
+
end
|
29
|
+
end
|
30
|
+
client.close
|
31
|
+
rescue Exception => message
|
32
|
+
puts message
|
33
|
+
end
|
34
|
+
end
|
35
|
+
|
36
|
+
# Get the value of <tt>'Slave_IO_Running'</tt>, which, obviously, should be <tt>Yes</tt> since otherwise
|
37
|
+
# it would mean the slave is not replicated properly and/or has stopped because of an error.
|
38
|
+
#
|
39
|
+
# ==== Attributes
|
40
|
+
#
|
41
|
+
# * <tt>@status</tt> - Uses the <tt>'Slave_IO_Running'</tt> key inside the hash.
|
42
|
+
#
|
43
|
+
# ==== Return values
|
44
|
+
#
|
45
|
+
# +true+ or +false+ - depending on whether or not the slave's replication thread is running.
|
46
|
+
#
|
47
|
+
# ==== Examples
|
48
|
+
#
|
49
|
+
# @status = Hash.new
|
50
|
+
# @status['Slave_IO_Running'] = 'Yes'
|
51
|
+
#
|
52
|
+
# r = check_slave_io
|
53
|
+
# r.inspect # => true
|
54
|
+
#
|
55
|
+
# @status['Slave_IO_Running'] = 'No'
|
56
|
+
#
|
57
|
+
# r = check_slave_io
|
58
|
+
# r.inspect # => false
|
59
|
+
def check_slave_io
|
60
|
+
return true if @status['Slave_IO_Running'] == 'Yes'
|
61
|
+
false
|
62
|
+
end
|
63
|
+
|
64
|
+
# Get the value of <tt>'Seconds_Behind_Master'</tt>, which indicates the amount of time in seconds
|
65
|
+
# the slave is behind the master's instruction set received via the replication thread. This should
|
66
|
+
# always be as close to zero as possible (or even zero). If this value is beyond <tt>@threshold</tt>
|
67
|
+
# constantly you will need to think about changing your setup to accommodate the traffic coming in
|
68
|
+
# from the master.
|
69
|
+
#
|
70
|
+
# ==== Attributes
|
71
|
+
#
|
72
|
+
# * <tt>@status</tt> - Uses the <tt>'Seconds_Behind_Master'</tt> key inside the hash
|
73
|
+
# * <tt>@threshold</tt> - The globally defined threshold after which the slave is considered to be too far behind to still be an active member. The default is 120 seconds.
|
74
|
+
#
|
75
|
+
# ==== Return values
|
76
|
+
#
|
77
|
+
# +true+ or +false+ - depending on whether or not the slave's replication thread is behind <tt>@threshold</tt>
|
78
|
+
#
|
79
|
+
# ==== Examples
|
80
|
+
#
|
81
|
+
# @status = Hash.new
|
82
|
+
# @status['Seconds_Behind_Master'] = 30 # with @threshold = 120
|
83
|
+
#
|
84
|
+
# r = check_seconds_behind
|
85
|
+
# r.inspect # => true
|
86
|
+
#
|
87
|
+
# @status['Seconds_Behind_Master'] = 140
|
88
|
+
#
|
89
|
+
# r = check_seconds_behind
|
90
|
+
# r.inspect # => false
|
91
|
+
def check_seconds_behind
|
92
|
+
Integer(@status['Seconds_Behind_Master']) < Integer(@threshold)
|
93
|
+
end
|
94
|
+
|
95
|
+
# Returns the relevant status HTTP code accompanied by a useful user feedback text
|
96
|
+
#
|
97
|
+
# ==== Attributes
|
98
|
+
#
|
99
|
+
# * @status - Should contain a hash with the relevant information to determine the
|
100
|
+
# slave status. Also see #get_mysql_status.
|
101
|
+
#
|
102
|
+
# ==== Return values
|
103
|
+
#
|
104
|
+
# * +response+ - A hash containing a HTTP <tt>:code</tt> and a <tt>:text</tt> to return to the user
|
105
|
+
#
|
106
|
+
# ==== Example
|
107
|
+
#
|
108
|
+
# @status = {'Seconds_Behind_Master' => 140 }
|
109
|
+
# response = self.run_checks
|
110
|
+
# response.inspect # => {:code => 503, :text => 'Some text'}
|
111
|
+
def run_checks
|
112
|
+
get_mysql_status
|
113
|
+
unless @status.empty?
|
114
|
+
response = {code: 200, text: []}
|
115
|
+
if !check_slave_io
|
116
|
+
response[:text] << 'Slave IO is not running.'
|
117
|
+
end
|
118
|
+
if !check_seconds_behind
|
119
|
+
response[:text] << "Slave is #{@status['Seconds_Behind_Master']} seconds behind. Threshold is #{@threshold}"
|
120
|
+
end
|
121
|
+
response[:code] = 503 unless response[:text].empty?
|
122
|
+
return response
|
123
|
+
else
|
124
|
+
return {code: 503, text: ['Unable to determine slave status']}
|
125
|
+
end
|
126
|
+
end
|
127
|
+
end
|
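The slave checks in `mysql.rb` boil down to two comparisons on one row of `SHOW SLAVE STATUS`. The sketch below restates them as a pure function; `slave_problems` is a hypothetical name and the input is assumed to be a plain hash. It also covers two cases the comparison has to survive: `Seconds_Behind_Master` can be `NULL` (`nil` in Ruby) while replication is stopped, and a threshold read from the command line arrives as a string.

```ruby
# Hypothetical helper: returns a list of problems for one row of
# SHOW SLAVE STATUS output, given as a plain hash.
def slave_problems(row, threshold = 120)
  problems = []
  problems << 'Slave IO is not running.' unless row['Slave_IO_Running'] == 'Yes'

  lag = row['Seconds_Behind_Master']
  if lag.nil?
    # The server can report NULL here while replication is stopped.
    problems << 'Slave lag is unknown.'
  elsif Integer(lag) >= Integer(threshold)
    # Integer() accepts both 60 and '60', so a CLI string works too.
    problems << "Slave is #{lag} seconds behind. Threshold is #{threshold}"
  end

  problems
end

p slave_problems({ 'Slave_IO_Running' => 'Yes', 'Seconds_Behind_Master' => 5 })
p slave_problems({ 'Slave_IO_Running' => 'Yes', 'Seconds_Behind_Master' => 140 })
p slave_problems({ 'Slave_IO_Running' => 'No', 'Seconds_Behind_Master' => nil })
```

Reporting an unknown lag as a problem is a design choice made for this sketch; it errs on the side of taking the slave out of rotation.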
data/lib/apprentice/configuration.rb
CHANGED
@@ -1,13 +1,41 @@
|
|
1
1
|
require 'optparse'
|
2
2
|
require 'ostruct'
|
3
3
|
|
4
|
+
# This module contains all the command line configuration methods
|
4
5
|
module Configuration
|
6
|
+
|
7
|
+
# Reads ARGV with OptionParser and returns an OpenStruct object with the parsed values
|
8
|
+
#
|
9
|
+
# ==== Default values
|
10
|
+
#
|
11
|
+
# * +ip+ - By default Apprentice binds to 0.0.0.0.
|
12
|
+
# * +port+ - The port Apprentice binds to. It defaults to 3307.
|
13
|
+
# * +sql_port+ - The port of the MariaDB/MySQL server Apprentice connects to. Defaults to 3306.
|
14
|
+
# * +threshold+ - The acceptable slave lag in seconds. Defaults to 120 seconds. It only applies when the type is set to 'mysql'.
|
15
|
+
# * +accept_donor+ - If passed, cluster members in the state '2' aka "Donor/Desynced" are accepted as valid client providers. Defaults to false, which is recommended.
|
16
|
+
#
|
17
|
+
# ==== Attributes
|
18
|
+
#
|
19
|
+
# * +ARGV+
|
20
|
+
#
|
21
|
+
# ==== Return values
|
22
|
+
#
|
23
|
+
# * +options+ - OpenStruct object containing all options passed with ARGV
|
24
|
+
#
|
25
|
+
# ==== Example
|
26
|
+
#
|
27
|
+
# ARGV.replace(%w[--user user --password password --server server --type mysql])
|
28
|
+
# opt = get_config
|
29
|
+
# opt.user # => 'user'
|
30
|
+
# opt.password # => 'password'
|
31
|
+
# opt.server # => 'server'
|
5
32
|
def get_config
|
6
33
|
options = OpenStruct.new
|
7
34
|
options.ip = '0.0.0.0'
|
8
35
|
options.port = 3307
|
9
36
|
options.sql_port = 3306
|
10
37
|
options.accept_donor = false
|
38
|
+
options.threshold = 120
|
11
39
|
|
12
40
|
opt_parser = OptionParser.new do |opts|
|
13
41
|
opts.banner = "Usage: apprentice [options]\n"
|
@@ -15,20 +43,29 @@ module Configuration
|
|
15
43
|
opts.separator 'Specific options:'
|
16
44
|
|
17
45
|
opts.on('-s SERVER', '--server SERVER',
|
18
|
-
'
|
46
|
+
'SERVER to connect to') { |s| options.server = s }
|
19
47
|
opts.on('-u USER', '--user USER',
|
20
|
-
'USER to connect the server with') { |u| options.user = u }
|
48
|
+
'USER to connect to the server with') { |u| options.user = u }
|
21
49
|
opts.on('-p PASSWORD', '--password PASSWORD',
|
22
50
|
'PASSWORD to use') { |p| options.password = p }
|
51
|
+
opts.on('-t TYPE', '--type TYPE',
|
52
|
+
'TYPE of server. Must be either "galera" or "mysql".') { |t| options.type = t }
|
23
53
|
|
24
54
|
opts.on('-i', '--ip IP',
|
25
|
-
'Local IP to bind to'
|
55
|
+
'Local IP to bind to',
|
56
|
+
"(default: #{options.ip})") { |i| options.ip = i }
|
26
57
|
opts.on('--port PORT',
|
27
|
-
'Local PORT to use'
|
58
|
+
'Local PORT to use',
|
59
|
+
"(default: #{options.port})") { |p| options.port = p }
|
28
60
|
opts.on('--sql_port PORT',
|
29
|
-
'Port of
|
61
|
+
'Port of MariaDB/MySQL server to connect to',
|
62
|
+
"(default: #{options.sql_port})") { |p| options.sql_port = p }
|
30
63
|
opts.on('--[no-]accept-donor',
|
31
|
-
'Accept cluster state "Donor/Desynced" as valid'
|
64
|
+
'Accept galera cluster state "Donor/Desynced" as valid',
|
65
|
+
"(default: #{options.accept_donor})") { |ad| options.accept_donor = ad }
|
66
|
+
opts.on('--threshold SECONDS',
|
67
|
+
'MariaDB/MySQL slave lag threshold',
|
68
|
+
"(default: #{options.threshold})") { |tr| options.threshold = tr }
|
32
69
|
|
33
70
|
opts.separator ''
|
34
71
|
opts.separator 'Common options:'
|
@@ -44,10 +81,19 @@ module Configuration
|
|
44
81
|
end
|
45
82
|
|
46
83
|
begin
|
47
|
-
ARGV << 's-h' if ARGV.size < 3
|
48
84
|
opt_parser.parse!(ARGV)
|
49
|
-
|
50
|
-
|
85
|
+
|
86
|
+
# We need four variables:
|
87
|
+
# * user: a valid mysql user
|
88
|
+
# * password: the corresponding password
|
89
|
+
# * server: the server to connect to
|
90
|
+
# * type: either mysql or galera, depending on the setup
|
91
|
+
unless options.server &&
|
92
|
+
options.user &&
|
93
|
+
options.password &&
|
94
|
+
check_type(options.type)
|
95
|
+
$stderr.puts 'Error: you have to specify a user, a password, a server to connect to'
|
96
|
+
$stderr.puts 'and a valid type. It can either be "galera" or "mysql".'
|
51
97
|
$stderr.puts 'Try -h/--help for more options'
|
52
98
|
exit
|
53
99
|
end
|
@@ -57,4 +103,29 @@ module Configuration
|
|
57
103
|
exit
|
58
104
|
end
|
59
105
|
end
|
106
|
+
|
107
|
+
# Check the user input for a valid type
|
108
|
+
#
|
109
|
+
# ==== Attributes
|
110
|
+
#
|
111
|
+
# * +type+ - the type extracted from ARGV
|
112
|
+
#
|
113
|
+
# ==== Return values
|
114
|
+
#
|
115
|
+
# Either true or false, depending on whether the input provided
|
116
|
+
# matches either 'mysql' or 'galera'
|
117
|
+
#
|
118
|
+
# ==== Example
|
119
|
+
#
|
120
|
+
# r = check_type('mysql')
|
121
|
+
# r.inspect # => 'true'
|
122
|
+
#
|
123
|
+
# r = check_type('something else')
|
124
|
+
# r.inspect # => 'false'
|
125
|
+
def check_type(type)
|
126
|
+
%w{galera mysql}.each do |t|
|
127
|
+
return true if t == type
|
128
|
+
end
|
129
|
+
false
|
130
|
+
end
|
60
131
|
end
|
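`OptionParser` can take over part of the validation done by hand in `get_config`. The sketch below is a hypothetical variant (`parse_options` and the `Options` struct are illustrative names, and only a subset of the flags is shown): an array of allowed values restricts `--type`, and the `Integer` argument converts numeric options before they are stored.

```ruby
require 'optparse'

# Hypothetical container for the parsed values; the gem itself uses OpenStruct.
Options = Struct.new(:type, :port, :threshold, keyword_init: true)

# Hypothetical variant of get_config that takes argv as a parameter.
def parse_options(argv)
  options = Options.new(port: 3307, threshold: 120)

  parser = OptionParser.new do |opts|
    # The array restricts the accepted values for --type.
    opts.on('-t TYPE', '--type TYPE', %w[galera mysql],
            'TYPE of server (galera, mysql)') { |t| options.type = t }
    # Integer makes OptionParser convert and validate the argument.
    opts.on('--port PORT', Integer, 'Local PORT to use') { |p| options.port = p }
    opts.on('--threshold SECONDS', Integer,
            'Slave lag threshold') { |s| options.threshold = s }
  end

  parser.parse!(argv.dup) # dup keeps the caller's array untouched
  options
end

opts = parse_options(%w[--type mysql --threshold 60])
puts opts.type
puts opts.threshold
```

With this shape, an unknown type or a non-numeric threshold raises `OptionParser::InvalidArgument`, which the caller can rescue alongside the other parser errors.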
data/lib/apprentice/server.rb
CHANGED
@@ -1,12 +1,14 @@
|
|
1
|
+
# Main server module consisting of all server related methods and classes
|
1
2
|
module Server
|
3
|
+
|
4
|
+
# The actual EM::Connection instance referenced by the EventServer class.
|
5
|
+
# Notice that we use Mysql2::Client::EM instead of the regular Mysql2::Client class.
|
2
6
|
class EventServer < EM::Connection
|
3
7
|
require 'apprentice/checker'
|
4
8
|
require 'mysql2/em'
|
5
9
|
include Checker
|
6
10
|
|
7
|
-
|
8
|
-
|
9
|
-
def initialize(options)
|
11
|
+
def initialize(options) #:nodoc:
|
10
12
|
@ip = options.ip
|
11
13
|
@port = options.port
|
12
14
|
@sql_port = options.sql_port
|
@@ -14,9 +16,20 @@ module Server
|
|
14
16
|
@user = options.user
|
15
17
|
@password = options.password
|
16
18
|
@donor_allowed = options.accept_donor
|
19
|
+
@type = options.type
|
20
|
+
@threshold = options.threshold
|
17
21
|
@status = {}
|
18
22
|
end
|
19
23
|
|
24
|
+
# Take the raw data received on @port and initiate the checks against the server located at @server
|
25
|
+
#
|
26
|
+
# ==== Special conditions
|
27
|
+
#
|
28
|
+
# We are sending something to our client with #send_data inside the function, depending on what #run_checks returned to us during the function call.
|
29
|
+
#
|
30
|
+
# ==== Attributes
|
31
|
+
#
|
32
|
+
# * +data+ - We receive the actual HTTP request but since we're not a full blown HTTP server we don't actually use it to any extent
|
20
33
|
def receive_data(data)
|
21
34
|
response = run_checks
|
22
35
|
response_text = format_text(response[:text])
|
data/lib/apprentice/version.rb
CHANGED
data/ruby-apprentice.default
CHANGED
@@ -1,9 +1,17 @@
|
|
1
1
|
# Set to true to start the service
|
2
2
|
START=false
|
3
3
|
|
4
|
-
# MariaDB host
|
4
|
+
# MariaDB/MySQL host
|
5
5
|
DBHOST=''
|
6
6
|
# Username which shall be used to check the status
|
7
7
|
DBUSER=''
|
8
8
|
# Password
|
9
9
|
DBPASSWORD=''
|
10
|
+
# Type of server
|
11
|
+
# This should either be 'mysql' for MariaDB/MySQL slave lag detection
|
12
|
+
# or 'galera' for cluster member consistency checking
|
13
|
+
TYPE=''
|
14
|
+
# You can specify any other arguments you want to
|
15
|
+
# Example: '--threshold 60' for an accepted slave lag of 60 seconds
|
16
|
+
# For more options see 'apprentice --help'
|
17
|
+
EXTRA_ARGS=''
|
data/ruby-apprentice.init
CHANGED
@@ -5,7 +5,7 @@
|
|
5
5
|
# Required-Stop:
|
6
6
|
# Default-Start: 2 3 4 5
|
7
7
|
# Default-Stop: 0 1 6
|
8
|
-
# Short-Description: a MariaDB cluster integrity checker
|
8
|
+
# Short-Description: a MariaDB/MySQL slave lag and cluster integrity checker
|
9
9
|
### END INIT INFO
|
10
10
|
|
11
11
|
NAME="`basename ${0/.sh/}`"
|
@@ -30,7 +30,7 @@ do_start()
|
|
30
30
|
if [ ! "${START}" = "true" ]; then
|
31
31
|
log_failure_msg "this service is disabled. Enable it in /etc/default/$NAME"
|
32
32
|
return 2
|
33
|
-
elif [ ! "${DBHOST}" ] || [ ! "${DBPASSWORD}" ] || [ ! ${DBUSER} ] ; then
|
33
|
+
elif [ ! "${DBHOST}" ] || [ ! "${DBPASSWORD}" ] || [ ! "${DBUSER}" ] || [ ! "${TYPE}" ] ; then
|
34
34
|
log_failure_msg "Missing variables inside defaults file."
|
35
35
|
return 2
|
36
36
|
fi
|
@@ -41,13 +41,13 @@ do_start()
|
|
41
41
|
chown $USER:$GROUP "$pidfile_dirname"
|
42
42
|
chmod 0750 "$pidfile_dirname"
|
43
43
|
|
44
|
-
DAEMON_ARGS="--password ${DBPASSWORD} --user ${DBUSER} --server ${DBHOST} ${EXTRA_ARGS}"
|
44
|
+
DAEMON_ARGS="--password ${DBPASSWORD} --user ${DBUSER} --server ${DBHOST} --type ${TYPE} ${EXTRA_ARGS}"
|
45
45
|
|
46
46
|
start-stop-daemon --start --background --make-pidfile --quiet \
|
47
|
-
|
47
|
+
--user ${USER} --group ${GROUP} \
|
48
48
|
--pidfile ${PIDFILE} --exec ${DAEMON} --test > /dev/null || return 1
|
49
49
|
start-stop-daemon --start --background --make-pidfile --quiet \
|
50
|
-
|
50
|
+
--user ${USER} --group ${GROUP} \
|
51
51
|
--pidfile ${PIDFILE} --exec ${DAEMON} -- ${DAEMON_ARGS} || return 2
|
52
52
|
log_end_msg $?
|
53
53
|
}
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: apprentice
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.6
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Moritz Heiber
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2013-09-
|
11
|
+
date: 2013-09-13 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -38,7 +38,7 @@ dependencies:
|
|
38
38
|
- - '>='
|
39
39
|
- !ruby/object:Gem::Version
|
40
40
|
version: '0'
|
41
|
-
description: A MariaDB cluster integrity checker
|
41
|
+
description: A MariaDB/MySQL slave lag and cluster integrity checker
|
42
42
|
email:
|
43
43
|
- moritz.heiber@gmail.com
|
44
44
|
executables:
|
@@ -57,6 +57,7 @@ files:
|
|
57
57
|
- lib/apprentice.rb
|
58
58
|
- lib/apprentice/checker.rb
|
59
59
|
- lib/apprentice/checks/galera.rb
|
60
|
+
- lib/apprentice/checks/mysql.rb
|
60
61
|
- lib/apprentice/configuration.rb
|
61
62
|
- lib/apprentice/server.rb
|
62
63
|
- lib/apprentice/version.rb
|
@@ -87,7 +88,7 @@ rubyforge_project:
|
|
87
88
|
rubygems_version: 2.0.7
|
88
89
|
signing_key:
|
89
90
|
specification_version: 4
|
90
|
-
summary:
|
91
|
+
summary: Checks a given server for consistency and replication status
|
91
92
|
test_files:
|
92
93
|
- spec/lib/apprentice_spec.rb
|
93
94
|
- spec/spec_helper.rb
|