systemd_mon_mod 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 06a413928718c467090adf686c9dfa4d1f869077389f0fe83a81889066e87ef9
4
+ data.tar.gz: d7b6dd5c3f69ccd71e650ff4cbb5d99421f1344b1f6cb8f46fe2bf8830cb2257
5
+ SHA512:
6
+ metadata.gz: 4338ab309e663836cbe70a0958ee1fbc682f63e8f6c2406a18f56fc8d71d026e9c5928c87a8b82341bdc2f40a9b676f124478767aa84b8a958c5a31266526715
7
+ data.tar.gz: a441af6f624aa118b059288743a3b1bb8cee6fb79e1fb99aabdf67ae0973261642a1e9f788757e0eaa3963c12c65ca2c7273d97ec7ca4c4c9cb8fcfeeae058d3
data/.gitignore ADDED
@@ -0,0 +1,23 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+ *.bundle
19
+ *.so
20
+ *.o
21
+ *.a
22
+ mkmf.log
23
+ .vagrant
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in systemd_alert.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2014 Jon Cairns
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,125 @@
1
+ # SystemdMon
2
+
3
+ Monitor systemd units and trigger alerts for failed states. The command line tool runs as a daemon, using DBus to get notifications of changes to systemd services. If a service enters a failed state, or returns from a failed state to an active state, notifications will be triggered.
4
+
5
+ Built-in notifications include email, Slack, and HipChat, but more can be added via the Ruby API.
6
+
7
+ It works by subscribing to DBus notifications from systemd. This means that there is no polling, and no busy-loops. SystemdMon will sit in the background, happily waiting and using minimal resources.
8
+
9
+ ## Requirements
10
+
11
+ * A Linux server
12
+ * Ruby > 1.9.3
13
+ * Systemd (v204 was used in development)
14
+ * `mail` gem (if email notifier is used)
15
+ * `slack-notifier` gem > 1.0 (if slack notifier is used)
16
+ * `hipchat` gem (if hipchat notifier is used)
17
+
18
+ ## Installation
19
+
20
+ Install the gem using:
21
+
22
+ gem install systemd_mon
23
+
24
+ ## Usage
25
+
26
+ To run the command line tool, you will first need to create a YAML configuration file to specify which systemd units you want to monitor, and which notifications you want to trigger. A full example looks like this:
27
+
28
+ ```yaml
29
+ ---
30
+ verbose: true # Default is off
31
+ notifiers:
32
+   email:
32
+     to: "team@mydomain.com"
33
+     from: "systemdmon@mydomain.com"
34
+     # These are options passed to the 'mail' gem
35
+     smtp:
36
+       address: smtp.gmail.com
37
+       port: 587
38
+       domain: mydomain.com
39
+       user_name: "user@mydomain.com"
40
+       password: "supersecr3t"
41
+       authentication: "plain"
42
+       enable_starttls_auto: true
43
+   slack:
44
+     webhook_url: https://hooks.slack.com/services/super/secret/tokenthings
45
+     channel: mychannel
46
+     username: doge
47
+     icon_emoji: ":computer:"
48
+     icon_url: "http://example.com/icon"
49
+   hipchat:
50
+     token: bigsecrettokenhere
51
+     room: myroom
52
+     username: doge
54
+ units:
55
+ - unicorn.service
56
+ - nginx.service
57
+ - sidekiq.service
58
+ ```
59
+
60
+ Save that somewhere appropriate (e.g. `/etc/systemd_mon.yml`), then start the command line tool with:
61
+
62
+ $ systemd_mon /etc/systemd_mon.yml
63
+
64
+ You'll probably want to run it via systemd, which you can do with this example service file (change file paths as appropriate):
65
+
66
+ ```
67
+ [Unit]
68
+ Description=SystemdMon
69
+ After=network.target
70
+
71
+ [Service]
72
+ Type=simple
73
+ User=deploy
74
+ StandardInput=null
75
+ StandardOutput=syslog
76
+ StandardError=syslog
77
+ ExecStart=/usr/local/bin/systemd_mon /etc/systemd_mon.yml
78
+
79
+ [Install]
80
+ WantedBy=multi-user.target
81
+ ```
82
+
83
+ ## Behaviour
84
+
85
+ Systemd provides information about state changes in very fine detail. For example, if you start a service, it may go through the following states: activating (start-pre), activating (start) and finally active (running). This will likely happen in less than a second, and you probably don't want 3 notifications. Therefore, SystemdMon queues up states until it comes across one that it thinks you should know about. In this case, it will notify you when the state reaches active (running), but the notification can show the history of how the state changed so you get the full picture.
86
+
87
+ SystemdMon does simple analysis on the history of state changes, so it can summarise with statuses like "recovered", "automatically restarted", "still failed", etc. It will also report with the host name of the server.
88
+
89
+ You'll also want to know if SystemdMon itself falls over, and when it starts back up again. It will attempt to send a final notification before it exits, and one to say it's starting. However, be aware that it might not send a notification in some conditions (e.g. in the case of a SIGKILL or a network failure). The age-old question: who will watch the watcher?
90
+
91
+ ## Docker integration
92
+ There is a public Docker image available which bundles all requirements (Ruby + Gems). Since systemd_mon relies on dbus, you need to mount the host dbus directory into your container. Besides that, the configuration filename is currently hardcoded to systemd_mon.yml. You have to mount the directory where the systemd_mon.yml file is located on your host system into your container as well. Below is a working example:
93
+
94
+ ```
95
+ docker run --name "systemd_mon" -v /var/run/dbus:/var/run/dbus -v /path/to/systemd_mon/config/:/systemd_mon/ kromit/systemd_mon
96
+ ```
97
+
98
+ If you want to run this image with systemd (very handy on CoreOS for example) you can use it as follows:
99
+
100
+ ```
101
+ [Unit]
102
+ Description=systemd_mon
103
+ After=docker.service
104
+ Requires=docker.service
105
+
106
+ [Service]
107
+ Restart=always
108
+ RestartSec=60
109
+ ExecStartPre=-/usr/bin/docker kill systemd_mon
110
+ ExecStartPre=-/usr/bin/docker rm systemd_mon
111
+ ExecStart=/usr/bin/docker run --name "systemd_mon" -v /var/run/dbus:/var/run/dbus -v /path/to/systemd_mon/config/:/systemd_mon/ kromit/systemd_mon
112
+
113
+ [Install]
114
+ WantedBy=multi-user.target
115
+ ```
116
+
117
+ ## Contributing
118
+
119
+ I'd love more contributions, particularly new notifiers. Follow the example of the slack and email notifiers and either package as a new gem or submit a pull request if you think it should be part of the main project.
120
+
121
+ 1. Fork it ( https://github.com/joonty/systemd_mon/fork )
122
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
123
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
124
+ 4. Push to the branch (`git push origin my-new-feature`)
125
+ 5. Create a new Pull Request
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+
data/bin/systemd_mon ADDED
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env ruby
2
+ require 'systemd_mon/cli'
3
+
4
+ SystemdMon::CLI.new.start
@@ -0,0 +1,39 @@
1
+ require 'systemd_mon/unit_with_state'
2
+
3
+ module SystemdMon
4
+ class CallbackManager
5
+ def initialize(queue)
6
+ self.queue = queue
7
+ self.states = Hash.new { |h, u| h[u] = UnitWithState.new(u) }
8
+ end
9
+
10
+ def start(change_callback, each_state_change_callback)
11
+ loop do
12
+ unit, state = queue.deq
13
+ Logger.debug { "#{unit} new state: #{state}" }
14
+ unit_state = states[unit]
15
+ unit_state << state
16
+
17
+ if each_state_change_callback
18
+ with_error_handling { each_state_change_callback.call(unit_state) }
19
+ end
20
+
21
+ if change_callback && unit_state.state_change.important?
22
+ with_error_handling { change_callback.call(unit_state) }
23
+ end
24
+
25
+ unit_state.reset! if unit_state.state_change.important?
26
+ end
27
+ end
28
+
29
+ def with_error_handling
30
+ yield
31
+ rescue => e
32
+ Logger.error "Uncaught exception (#{e.class}) in callback: #{e.message}"
33
+ Logger.debug_error { "\n\t#{e.backtrace.join("\n\t")}\n" }
34
+ end
35
+
36
+ protected
37
+ attr_accessor :queue, :states
38
+ end
39
+ end
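The loop above accumulates states per unit and only fires the main callback once an "important" state arrives. A minimal sketch of that coalescing idea follows, assuming a hypothetical `DemoHistory` class in which "active" and "failed" count as important; the gem's real rule lives in `state_change.important?`, which is not shown in this hunk.

```ruby
require 'thread'

# Hypothetical stand-in for a unit's state history. Here `important?`
# simply treats "active" and "failed" as states worth reporting.
class DemoHistory
  IMPORTANT = %w[active failed].freeze

  def initialize
    @states = []
  end

  def <<(state)
    @states << state
    self
  end

  def important?
    IMPORTANT.include?(@states.last)
  end

  # Hand back the accumulated states and start a fresh history.
  def flush!
    flushed = @states
    @states = []
    flushed
  end
end

# Drain the queue, producing one batch per important state.
def drain(queue)
  history = DemoHistory.new
  batches = []
  until queue.empty?
    history << queue.deq
    batches << history.flush! if history.important?
  end
  batches
end

queue = Queue.new
%w[activating activating active deactivating failed].each { |s| queue.enq(s) }
drain(queue).each { |batch| puts batch.join(" -> ") }
```

Five raw state changes collapse into two reports here, each carrying the history that led up to it.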
@@ -0,0 +1,86 @@
1
+ require 'yaml'
2
+ require 'systemd_mon'
3
+ require 'systemd_mon/monitor'
4
+ require 'systemd_mon/error'
5
+ require 'systemd_mon/dbus_manager'
6
+
7
+ module SystemdMon
8
+ class CLI
9
+ def initialize
10
+ self.me = "systemd_mon"
11
+ self.verbose = true
12
+ end
13
+
14
+ def start
15
+ yaml_config_file = ARGV.first
16
+ self.options = load_and_validate_options(yaml_config_file)
17
+ self.verbose = options['verbose'] || false
18
+ Logger.verbose = verbose
19
+
20
+ start_monitor
21
+
22
+ rescue SystemdMon::Error => e
23
+ err_string = e.message.dup
24
+ if verbose
25
+ if e.original
26
+ err_string << " - #{e.original.message} (#{e.original.class})"
27
+ err_string << "\n\t#{e.original.backtrace.join("\n\t")}"
28
+ else
29
+ err_string << " (#{e.class})"
30
+ err_string << "\n\t#{e.backtrace.join("\n\t")}"
31
+ end
32
+ end
33
+ fatal_error(err_string)
34
+ rescue => e
35
+ err_string = e.message.dup
36
+ if verbose
37
+ err_string << " (#{e.class})"
38
+ err_string << "\n\t#{e.backtrace.join("\n\t")}"
39
+ end
40
+ fatal_error(err_string)
41
+ end
42
+
43
+ protected
44
+ def start_monitor
45
+ monitor = Monitor.new(DBusManager.new)
46
+
47
+ # Load units to monitor
48
+ monitor.register_units options['units']
49
+
50
+ options['notifiers'].each do |name, notifier_options|
51
+ klass = NotifierLoader.new.get_class(name)
52
+ monitor.add_notifier klass.new(notifier_options)
53
+ end
54
+
55
+ monitor.start
56
+ end
57
+
58
+ def load_and_validate_options(yaml_config_file)
59
+ options = load_options(yaml_config_file)
60
+
61
+ unless options.has_key?('notifiers') && options['notifiers'].any?
62
+ fatal_error("no notifiers have been defined, there is no reason to continue")
63
+ end
64
+ unless options.has_key?('units') && options['units'].any?
65
+ fatal_error("no units have been added for watching, there is no reason to continue")
66
+ end
67
+ options
68
+ end
69
+
70
+ def load_options(yaml_config_file)
71
+ unless yaml_config_file && File.exist?(yaml_config_file)
72
+ fatal_error "First argument must be a path to a YAML configuration file"
73
+ end
74
+
75
+ YAML.load_file(yaml_config_file)
76
+ end
77
+
78
+ def fatal_error(message, code = 255)
79
+ $stderr.puts " #{me} error: #{message}"
80
+ exit code
81
+ end
82
+
83
+ protected
84
+ attr_accessor :verbose, :options, :me
85
+ end
86
+ end
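The validation in `load_and_validate_options` exits on the first problem and assumes both keys hold collections. As an illustration only, the same checks can be sketched as a function that returns every problem it finds; `config_problems` is a hypothetical helper, not part of the gem, and the type checks are an added assumption.

```ruby
require 'yaml'

# Hypothetical helper: collect configuration problems instead of exiting.
def config_problems(options)
  return ["configuration must be a YAML mapping"] unless options.is_a?(Hash)

  problems = []
  notifiers = options['notifiers']
  units = options['units']
  unless notifiers.is_a?(Hash) && notifiers.any?
    problems << "no notifiers have been defined"
  end
  unless units.is_a?(Array) && units.any?
    problems << "no units have been added for watching"
  end
  problems
end

good = YAML.safe_load(<<~CONFIG)
  notifiers:
    slack:
      channel: mychannel
  units:
    - nginx.service
CONFIG

# A key with no value parses as nil, which the type checks catch.
bad = YAML.safe_load("notifiers:\nunits: []\n")

p config_problems(good)
p config_problems(bad)
```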
@@ -0,0 +1,35 @@
1
+ require 'dbus'
2
+ require 'systemd_mon/error'
3
+ require 'systemd_mon/dbus_unit'
4
+
5
+ module SystemdMon
6
+ class DBusManager
7
+ def initialize
8
+ self.dbus = DBus::SystemBus.instance
9
+ self.systemd_service = dbus.service("org.freedesktop.systemd1")
10
+ self.systemd_object = systemd_service.object("/org/freedesktop/systemd1")
11
+ systemd_object.introspect
12
+ if systemd_object.respond_to?("Subscribe")
13
+ systemd_object.Subscribe
14
+ else
15
+ raise SystemdMon::SystemdError, "Systemd is not installed, or is an incompatible version. It must provide the Subscribe dbus method: version 204 is the minimum recommended version."
16
+ end
17
+ end
18
+
19
+ def fetch_unit(unit_name)
20
+ path = systemd_object.GetUnit(unit_name).first
21
+ DBusUnit.new(unit_name, path, systemd_service.object(path))
22
+ rescue DBus::Error
23
+ raise SystemdMon::UnknownUnitError, "Unknown or unloaded systemd unit '#{unit_name}'"
24
+ end
25
+
26
+ def runner
27
+ main = DBus::Main.new
28
+ main << dbus
29
+ main
30
+ end
31
+
32
+ protected
33
+ attr_accessor :systemd_service, :systemd_object, :dbus
34
+ end
35
+ end
@@ -0,0 +1,70 @@
1
+ require 'systemd_mon/state'
2
+
3
+ module SystemdMon
4
+ class DBusUnit
5
+ attr_reader :name, :maybe_service_type
6
+
7
+ IFACE_UNIT = "org.freedesktop.systemd1.Unit"
8
+ IFACE_SERVICE = "org.freedesktop.systemd1.Service"
9
+ IFACE_PROPS = "org.freedesktop.DBus.Properties"
10
+
11
+ def initialize(name, path, dbus_object)
12
+ self.name = name
13
+ self.path = path
14
+ self.dbus_object = dbus_object
15
+ prepare_dbus_objects!
16
+ self.maybe_service_type = service_type
17
+ end
18
+
19
+ def register_listener!(queue)
20
+ queue.enq [self, build_state] # initial state
21
+ dbus_object.on_signal("PropertiesChanged") do |iface|
22
+ if iface == IFACE_UNIT
23
+ queue.enq [self, build_state]
24
+ end
25
+ end
26
+ end
27
+
28
+ def on_change(&callback)
29
+ self.change_callback = callback
30
+ end
31
+
32
+ def on_each_state_change(&callback)
33
+ self.each_state_change_callback = callback
34
+ end
35
+
36
+ def property(name)
37
+ dbus_object.Get(IFACE_UNIT, name).first
38
+ end
39
+
40
+ def to_s
41
+ "#{name}" << (maybe_service_type ? " (#{maybe_service_type})" : '')
42
+ end
43
+
44
+ protected
45
+ attr_accessor :path, :dbus_object, :change_callback, :each_state_change_callback
46
+ attr_writer :name, :maybe_service_type
47
+
48
+ def build_state
49
+ State.new(
50
+ property("ActiveState"),
51
+ property("SubState"),
52
+ property("LoadState"),
53
+ property("UnitFileState"),
54
+ maybe_service_type
55
+ )
56
+ end
57
+
58
+ def prepare_dbus_objects!
59
+ dbus_object.introspect
60
+ self.dbus_object.default_iface = IFACE_PROPS
61
+ self
62
+ end
63
+
64
+ def service_type
65
+ if dbus_object[IFACE_SERVICE]
66
+ dbus_object[IFACE_SERVICE]['Type']
67
+ end
68
+ end
69
+ end
70
+ end
@@ -0,0 +1,19 @@
1
+ module SystemdMon
2
+
3
+ # Save original exception for use in verbose mode
4
+ class Error < StandardError
5
+ attr_reader :original
6
+
7
+ def initialize(msg, original=$!)
8
+ super(msg)
9
+ @original = original
10
+ end
11
+ end
12
+
13
+ class SystemdError < Error; end
14
+ class MonitorError < Error; end
15
+ class UnknownUnitError < Error; end
16
+ class NotificationError < Error; end
17
+ class NotifierDependencyError < Error; end
18
+ class NotifierError < Error; end
19
+ end
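The `original=$!` default is what lets callers such as `fetch_unit` raise a domain error inside a `rescue` without passing the underlying exception along explicitly. A small sketch of the pattern, using a hypothetical `WrappingError` class and `Integer()` as a stand-in for a call that fails:

```ruby
# Hypothetical copy of the pattern, renamed so it is not mistaken for
# the gem's own SystemdMon::Error.
class WrappingError < StandardError
  attr_reader :original

  # $! holds the exception currently being handled (nil if there is none),
  # so raising inside a rescue clause captures the cause automatically.
  def initialize(msg, original = $!)
    super(msg)
    @original = original
  end
end

def lookup(unit_name)
  Integer(unit_name) # stand-in for a call that raises
rescue ArgumentError
  raise WrappingError, "Unknown unit '#{unit_name}'"
end

def describe_failure(unit_name)
  lookup(unit_name)
  "ok"
rescue WrappingError => e
  "#{e.message} - #{e.original.message} (#{e.original.class})"
end

puts describe_failure("nginx.service")
```

Ruby also records the same exception in `Exception#cause`; keeping an explicit `original` attribute additionally allows a caller to pass in a specific exception.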
@@ -0,0 +1,18 @@
1
+ module SystemdMon::Formatters
2
+ class Base
3
+ def initialize(unit)
4
+ self.unit = unit
5
+ end
6
+
7
+ def as_html
8
+ raise "The formatter #{self.class} does not provide an html formatted string"
9
+ end
10
+
11
+ def as_text
12
+ raise "The formatter #{self.class} does not provide a plain text string"
13
+ end
14
+
15
+ protected
16
+ attr_accessor :unit
17
+ end
18
+ end
@@ -0,0 +1,32 @@
1
+ require 'systemd_mon/formatters/base'
2
+ module SystemdMon::Formatters
3
+ class StateTableFormatter < Base
4
+ def as_text
5
+ table = render_table
6
+ lengths = table.transpose.map { |v| v.map(&:length).max }
7
+
8
+ full_width = lengths.inject(&:+) + (lengths.length * 3) + 1
9
+ div = " " + ("-" * full_width) + "\n"
10
+ s = div.dup
11
+ table.each do |row|
12
+ s << " | "
13
+ row.each_with_index { |col, i|
14
+ s << col.ljust(lengths[i]) + " | "
15
+ }
16
+ s << "\n" + div.dup
17
+ end
18
+ s
19
+ end
20
+
21
+ protected
22
+ def render_table
23
+ changed = unit.state_change.diff
24
+ table = []
25
+ table << ["Time"].concat(changed.map{|v| v.first.display_name})
26
+ changed.transpose.each do |vals|
27
+ table << [vals.first.timestamp.strftime("%H:%M:%S.%3N %z")].concat(vals.map{|v| v.value})
28
+ end
29
+ table
30
+ end
31
+ end
32
+ end
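The column-alignment logic in `as_text` can be tried without a unit or DBus connection. The sketch below applies the same approach to a plain array of string rows; `render_text_table` and the sample rows are illustrative and not part of the gem.

```ruby
# Render rows of strings as a bordered, column-aligned text table.
def render_text_table(table)
  # Widest cell in each column decides that column's width.
  lengths = table.transpose.map { |column| column.map(&:length).max }
  full_width = lengths.inject(0, :+) + (lengths.length * 3) + 1
  div = " " + ("-" * full_width) + "\n"

  out = div.dup
  table.each do |row|
    out << " | "
    row.each_with_index { |cell, i| out << cell.ljust(lengths[i]) << " | " }
    out << "\n" << div
  end
  out
end

rows = [
  ["Time", "Active", "Sub"],
  ["10:00:01", "activating", "start"],
  ["10:00:02", "active", "running"]
]
puts render_text_table(rows)
```

Every row must have the same number of cells, since `transpose` raises an `IndexError` on ragged input.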
@@ -0,0 +1,33 @@
1
+ module SystemdMon
2
+ class Logger
3
+ def self.verbose=(flag)
4
+ @verbose = flag
5
+ end
6
+
7
+ def self.verbose
8
+ @verbose
9
+ end
10
+
11
+ def self.debug(message = nil, stream = $stdout)
12
+ if verbose
13
+ if block_given?
14
+ stream.puts yield
15
+ else
16
+ stream.puts message
17
+ end
18
+ end
19
+ end
20
+
21
+ def self.error(message = nil)
22
+ $stderr.puts message
23
+ end
24
+
25
+ def self.debug_error(message = nil)
26
+ debug message, $stderr
27
+ end
28
+
29
+ def self.puts(message = nil)
30
+ $stdout.puts message
31
+ end
32
+ end
33
+ end
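`Logger.debug` accepts a block so that callers can defer building a message until verbose mode is known to be on. A minimal sketch of that idea, assuming a hypothetical instance-based `DemoLogger` that writes to an injected stream rather than the class-level logger above:

```ruby
require 'stringio'

# Hypothetical miniature of a verbose-gated logger.
class DemoLogger
  def initialize(verbose, stream)
    @verbose = verbose
    @stream = stream
  end

  # The block is only evaluated when verbose output is enabled,
  # so expensive message construction is skipped otherwise.
  def debug(message = nil)
    return unless @verbose
    @stream.puts(block_given? ? yield : message)
  end
end

# Count how many times the message block actually runs.
def count_evaluations(verbose)
  evaluations = 0
  logger = DemoLogger.new(verbose, StringIO.new)
  logger.debug { evaluations += 1; "expensive message" }
  evaluations
end

puts count_evaluations(true)
puts count_evaluations(false)
```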
@@ -0,0 +1,98 @@
1
+ require 'thread'
2
+ require 'systemd_mon/logger'
3
+ require 'systemd_mon/callback_manager'
4
+ require 'systemd_mon/notification_centre'
5
+ require 'systemd_mon/notification'
6
+ require 'systemd_mon/error'
7
+
8
+ module SystemdMon
9
+ class Monitor
10
+ def initialize(dbus_manager)
11
+ self.hostname = `hostname`.strip
12
+ self.dbus_manager = dbus_manager
13
+ self.units = []
14
+ self.change_callback = lambda(&method(:unit_change_callback))
15
+ self.notification_centre = NotificationCentre.new
16
+ Thread.abort_on_exception = true
17
+ end
18
+
19
+ def add_notifier(notifier)
20
+ notification_centre << notifier
21
+ self
22
+ end
23
+
24
+ def register_unit(unit_name)
25
+ self.units << dbus_manager.fetch_unit(unit_name)
26
+ self
27
+ end
28
+
29
+ def register_units(*unit_names)
30
+ self.units.concat unit_names.flatten.map { |unit_name|
31
+ dbus_manager.fetch_unit(unit_name)
32
+ }
33
+ self
34
+ end
35
+
36
+ def on_change(&callback)
37
+ self.change_callback = callback
38
+ self
39
+ end
40
+
41
+ def on_each_state_change(&callback)
42
+ self.each_state_change_callback = callback
43
+ self
44
+ end
45
+
46
+ def start
47
+ startup_check!
48
+ at_exit { notification_centre.notify_stop! hostname }
49
+ notification_centre.notify_start! hostname
50
+
51
+ Logger.puts "Monitoring changes to #{units.count} units"
52
+ Logger.debug { " - " + units.map(&:name).join("\n - ") + "\n\n" }
53
+ Logger.debug { "Using notifiers: #{notification_centre.classes.join(", ")}"}
54
+
55
+ state_q = Queue.new
56
+
57
+ units.each do |unit|
58
+ unit.register_listener! state_q
59
+ end
60
+
61
+ [start_callback_thread(state_q),
62
+ start_dbus_thread].each(&:join)
63
+ end
64
+
65
+ protected
66
+ attr_accessor :units, :dbus_manager, :change_callback, :each_state_change_callback, :hostname, :notification_centre
67
+
68
+ def startup_check!
69
+ unless units.any?
70
+ raise MonitorError, "At least one systemd unit should be registered before monitoring can start"
71
+ end
72
+ unless notification_centre.any?
73
+ raise MonitorError, "At least one notifier should be registered before monitoring can start"
74
+ end
75
+ self
76
+ end
77
+
78
+ def start_dbus_thread
79
+ Thread.new do
80
+ dbus_manager.runner.run
81
+ end
82
+ end
83
+
84
+ def start_callback_thread(state_q)
85
+ Thread.new do
86
+ manager = CallbackManager.new(state_q)
87
+ manager.start change_callback, each_state_change_callback
88
+ end
89
+ end
90
+
91
+ def unit_change_callback(unit)
92
+ Logger.puts "#{unit.name} #{unit.state_change.status_text}: #{unit.state.active} (#{unit.state.sub})"
93
+ Logger.debug unit.state_change.to_s
94
+ Logger.puts
95
+ notification_centre.notify! Notification.new(hostname, unit)
96
+ end
97
+ end
98
+ end
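`Monitor#start` connects two threads through a `Queue`: the DBus thread enqueues states and the callback thread dequeues them. The sketch below shows the same producer and consumer arrangement in isolation. The `:stop` sentinel is an assumption added so the example terminates; the gem's own threads run until the process exits.

```ruby
require 'thread'

# Pass events from a producer thread to a consumer thread via a Queue.
def run_pipeline(events)
  queue = Queue.new
  seen = []

  consumer = Thread.new do
    # Queue#deq blocks until an item is available, so there is no polling.
    while (event = queue.deq) != :stop
      seen << event
    end
  end

  producer = Thread.new do
    events.each { |event| queue.enq(event) }
    queue.enq(:stop) # hypothetical sentinel, not used by the gem
  end

  [producer, consumer].each(&:join)
  seen
end

p run_pipeline(%w[activating active])
```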
@@ -0,0 +1,34 @@
1
+ module SystemdMon
2
+ class Notification
3
+ attr_reader :unit, :type, :hostname
4
+
5
+ def initialize(hostname, unit)
6
+ self.hostname = hostname
7
+ self.unit = unit
8
+ self.type = determine_type
9
+ end
10
+
11
+ def self.types
12
+ [:alert, :warning, :info, :ok]
13
+ end
14
+
15
+ def type_text
16
+ type.to_s.capitalize
17
+ end
18
+
19
+ protected
20
+ attr_writer :unit, :type, :hostname
21
+
22
+ def determine_type
23
+ if unit.state_change.ok?
24
+ if unit.state_change.first.fail?
25
+ :ok
26
+ else
27
+ :info
28
+ end
29
+ else
30
+ :alert
31
+ end
32
+ end
33
+ end
34
+ end
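`determine_type` maps two facts about a state change onto a notification type: whether the unit is OK now, and whether the history began in a failed state. Reduced to two booleans, which is a simplification of the `state_change` object used above, the decision can be sketched as:

```ruby
# Hypothetical reduction of the type decision to two booleans.
def notification_type(now_ok, started_failed)
  if now_ok
    # Recovering from a failure is good news; anything else is informational.
    started_failed ? :ok : :info
  else
    :alert
  end
end

[[true, true], [true, false], [false, true], [false, false]].each do |now_ok, started_failed|
  puts "now_ok=#{now_ok} started_failed=#{started_failed} => #{notification_type(now_ok, started_failed)}"
end
```

Note that `:warning` appears in `Notification.types` but is never produced by this decision.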