recmon 1.0.0

data/.yardopts ADDED
@@ -0,0 +1,2 @@
1
+ --title "REC Monitor" -m rdoc --main README lib/**/*.rb - README CHANGELOG
2
+
data/CHANGELOG ADDED
@@ -0,0 +1,2 @@
1
+ == Version 1.0.0
2
+ - Initial publication
data/README ADDED
@@ -0,0 +1,143 @@
1
+ = Recmon
2
+ Recmon is a host-based system monitor. It complements REC, the Ruby Event Correlator.
3
+
4
+ == Installation
5
+
6
+ $ sudo gem install recmon
7
+
8
+ == Usage
9
+ Require the recmon gem:
10
+
11
+ require 'rubygems'
12
+ require 'recmon'
13
+
14
+ Create a monitor to obtain readings from the sensors at the right time:
15
+
16
+ s = Recmon::Monitor.new()
17
+
18
+ Now add sensors for each aspect you want to monitor:
19
+
20
+ s.web("Website", "http://www.example.com/index.html")
21
+ s.diskspace("Database", "/var/postgres/data/")
22
+ s.filesize("Messages", "/var/log/messages")
23
+ s.ping("earth", "206.125.172.58")
24
+
25
+ Then start the monitor running:
26
+
27
+ s.start()
28
+
29
+ and it will periodically write entries to the log file (default = +/var/log/recmon.log+):
30
+
31
+ 2012-09-13T16:08:59+10:00 Recmon is monitoring.
32
+ ...
33
+ 2012-09-13T16:08:59+10:00 site=Website status=down
34
+ 2012-09-13T16:08:59+10:00 Database usage=247 MB
35
+ 2012-09-13T16:08:59+10:00 Messages filesize=83 KB
36
+ 2012-09-13T16:08:59+10:00 ping host=earth status=up
37
+ ...
38
+ 2012-09-13T16:09:16+10:00 Recmon is exiting.
39
+
40
+ == Sensors
41
+ There are several sensors:
42
+
43
+ - *web*: ensure a website is responding
44
+ - *diskspace*: track the diskspace used by a folder
45
+ - *filesize*: monitor the size of a file
46
+ - *ping*: check if a server is alive
47
+ - *proc*: look for a named process
48
+ - *ssh*: ensure the SSH service is running
49
+ - *command*: run an arbitrary command and report success
50
+
51
+ === Common characteristics
52
+ Every sensor has a name, which distinguishes it from other sensors of the same type.
53
+ For example, you can monitor the +earth+ server and the +terra+ server.
54
+ The name is used in composing the log entry.
55
+
56
+ All sensors have a sane default frequency, so it is not necessary to
57
+ specify a frequency unless you want to override the default.
58
+
59
+ Sensors compose log messages that are designed for further processing. Although they are
60
+ readable, it is more important that they are parsable, so they always start with an ISO8601 date time
61
+ and then several +name+=+value+ pairs.
62
+
63
+ === web: Ensure a website is responding
64
+ To periodically check if a website is reachable, add a WebSensor, specifying the title, the URL, and optionally a frequency (default = 60 seconds).
65
+
66
+ s.web(name, url, freq=60)
67
+ s.web("Main", "http://www.finalstep.com.au/heartbeat.png")
68
+ # log entry => "2012-09-13T16:08:59+10:00 site=Main status=down"
69
+
70
+ s.web("Google", "http://www.google.com/jsapi", 120)
71
+ # log entry => "2012-09-13T16:09:02+10:00 site=Google status=up"
72
+
73
+ === diskspace: track the diskspace used by a folder
74
+ A DiskspaceSensor tracks how much disk space is being consumed by a set of files (log files or database files) in a folder:
75
+
76
+ s.diskspace(name, folderPath, freq=1200)
77
+ s.diskspace("Database", "/var/postgres/data/")
78
+ # log entry => "2012-09-13T16:09:02+10:00 Database usage=45 MB"
79
+ s.diskspace("Logs", "/var/log/", 86400) # daily frequency
80
+ # log entry => "2012-09-13T20:00:00+10:00 Logs usage=16 MB"
81
+
82
+ === filesize: monitor the size of a file
83
+ A FilesizeSensor tracks the size of a given file every 20 minutes by default:
84
+
85
+ s.filesize(name, path, freq=1200)
86
+ s.filesize("Messages", "/var/log/messages")
87
+ # log entry => "2012-09-13T16:09:02+10:00 Messages filesize=78 KB"
88
+
89
+ === ping: check if a server is alive
90
+ The PingSensor pings a server to ensure it is accessible through the network:
91
+
92
+ s.ping(name, ipaddr, freq=120)
93
+ s.ping("earth", "206.125.172.58")
94
+ # log entry => "2012-09-13T16:09:02+10:00 ping host=earth status=up"
95
+
96
+ === proc: look for a named process
97
+ It is often useful to directly check that a process is still running.
98
+ The ProcSensor finds processes that match a pattern.
99
+
100
+ s.proc(name, pattern, freq=60)
101
+ s.proc("Webserver", "httpd")
102
+ # log entry => "2012-09-13T16:09:02+10:00 proc=Webserver status=running"
103
+ # log entry => "2012-09-13T16:10:30+10:00 proc=Webserver status=stopped"
104
+ s.proc("Postgres server", "postgres: writer process")
105
+ # log entry => "2012-09-13T16:10:30+10:00 proc=Postgres server status=running"
106
+
107
+ === ssh: ensure the SSH service is running
108
+ Check that a server is accepting SSH connections. The SSHSensor actually runs a harmless
109
+ command (+pwd+) on the target server to confirm SSH is working. Obviously to make this
110
+ work in batch mode, the recmon user needs a private key in <code>~/.ssh/</code>, and the matching public key must be added
111
+ to <code>~/.ssh/authorized_keys</code> on the target server.
112
+
113
+ s.ssh(hostname, user="_recmon", port=22, freq=300)
114
+ s.ssh("earth")
115
+ # log entry => "2012-09-13T16:09:02+10:00 SSH at host=earth for user=_recmon status=up"
116
+ s.ssh("moon", "richard", 922, 600)
117
+ # log entry => "2012-09-13T16:09:02+10:00 SSH at host=moon for user=richard status=up"
118
+
119
+ === command: run an arbitrary command and report success
120
+ The *command* sensor simply executes the command and reports success (exit code = 0)
121
+ or failure.
122
+
123
+ s.command("MySQL server", "/usr/local/bin/mysqladmin ping")
124
+ # log entry => "2012-09-13T16:09:02+10:00 MySQL server status=up"
125
+
126
+
127
+ == Why Recmon?
128
+ === 1. Recmon is lightweight
129
+ There are several wonderful system monitoring tools (nagios, splunk)
130
+ but they require a considerable investment in configuration, not to mention
131
+ dependencies that may be undesired.
132
+ If you just want to keep an eye on a few key metrics then Recmon is much
133
+ faster and easier to set up.
134
+
135
+ === 2. Recmon complements REC
136
+ Recmon generates log entries for a range of events that are not typically logged.
137
+ Once they are logged, they can be analysed to generate alerts.
138
+
139
+ REC (Ruby Event Correlator) is the tool that correlates events across time to determine
140
+ if a situation is abnormal, and so generates fewer, more meaningful alerts
141
+ by email or instant message.
142
+
143
+ So Recmon + REC = lightweight Nagios
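The README says every entry starts with an ISO 8601 timestamp followed by name=value pairs. As a minimal sketch of what downstream processing could look like, the snippet below splits one such line into a timestamp and a hash of fields. The helper name `parse_recmon_line` is hypothetical and not part of recmon.

```ruby
require 'time'

# Hypothetical helper (not part of recmon): split one log line into its
# ISO 8601 timestamp and any name=value pairs that follow it.
def parse_recmon_line(line)
  stamp, rest = line.strip.split(' ', 2)
  # Each pair is a word, an equals sign, and a run of non-space characters.
  pairs = (rest || '').scan(/(\w+)=(\S+)/).to_h
  { time: Time.iso8601(stamp), fields: pairs }
end

entry = parse_recmon_line('2012-09-13T16:08:59+10:00 ping host=earth status=up')
puts entry[:fields]['status']
```

This sketch keeps only the first token of each value, so a unit such as the "MB" in "usage=247 MB" or the second word of a sensor name containing a space is dropped. A real consumer would need a rule for those cases.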
data/lib/recmon.rb ADDED
@@ -0,0 +1,22 @@
1
+ require 'recmon/monitor'
2
+ require 'recmon/sensor'
3
+ require 'recmon/command-sensor'
4
+ require 'recmon/diskspace-sensor'
5
+ require 'recmon/filesize-sensor'
6
+ require 'recmon/ping-sensor'
7
+ require 'recmon/proc-sensor'
8
+ require 'recmon/ssh-sensor'
9
+ require 'recmon/web-sensor'
10
+
11
+ # The Recmon Module includes:
12
+ # - Recmon::Monitor
13
+ # - Recmon::Sensor
14
+ # - Recmon::CommandSensor
15
+ # - Recmon::DiskspaceSensor
16
+ # - Recmon::FilesizeSensor
17
+ # - Recmon::PingSensor
18
+ # - Recmon::ProcSensor
19
+ # - Recmon::SSHSensor
20
+ # - Recmon::WebSensor
21
+ module Recmon
22
+ end
data/lib/recmon/command-sensor.rb ADDED
@@ -0,0 +1,26 @@
1
+ require 'recmon/sensor'
2
+ require 'ipaddr'
3
+
4
+ module Recmon
5
+
6
+ # Periodically execute a command and report success
7
+ class CommandSensor < Sensor
8
+
9
+ # Create a new CommandSensor. The name is descriptive and the command
10
+ # should be runnable by the +_recmon+ user
11
+ def initialize(name, command, freq=600)
12
+ super(name, freq)
13
+ @command = command
14
+ end
15
+
16
+ # Ensure user _recmon has permission to run the command.
17
+ # An exit status of zero is required for success
18
+ def sense()
19
+ `#{@command}`
20
+ up = $?.exitstatus == 0
21
+ status = up ? "up" : "down"
22
+ return("#{@name} status=#{status}")
23
+ end
24
+
25
+ end
26
+ end
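CommandSensor#sense runs the command with backticks and then inspects `$?.exitstatus`. Backticks can raise `Errno::ENOENT` when the program cannot be found, which would propagate into the monitor loop. The sketch below shows one alternative, assuming a Unix-like host where `true` and `false` are available; the helper name `command_status` is hypothetical.

```ruby
# Hypothetical helper (not part of recmon): map a command's outcome to the
# "up"/"down" strings used by the sensors.
def command_status(command)
  # Kernel#system returns true for exit status 0, false for a non-zero
  # status, and nil when the command could not be executed at all.
  ok = system(command, out: File::NULL, err: File::NULL)
  ok ? 'up' : 'down'
end

puts command_status('true')
puts command_status('false')
```

Because a missing program yields `nil` rather than an exception, it is reported as "down" like any other failure.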
data/lib/recmon/diskspace-sensor.rb ADDED
@@ -0,0 +1,28 @@
1
+ require 'recmon/sensor'
2
+
3
+ module Recmon
4
+
5
+ # Periodically check the diskspace usage for a folder
6
+ class DiskspaceSensor < Sensor
7
+
8
+ # create a DiskspaceSensor to check the diskspace usage of a folder.
9
+ # * +name+ is a descriptive name for the folder
10
+ # * +path+ should be absolute
11
+ def initialize(name, path, freq=1200)
12
+ super(name, freq)
13
+ @path = path.strip()
14
+ raise "Path not found: #{path}" unless File.directory?(@path)
15
+ end
16
+
17
+ # Called by Monitor. Executes du(1) to determine the disk usage in Megabytes.
18
+ def sense()
19
+ begin
20
+ size = `sudo /usr/bin/du -sm #{@path}`.to_i().to_s()
21
+ rescue
22
+ size = "unknown"
23
+ end
24
+ return("#{@name} usage=#{size} MB")
25
+ end
26
+
27
+ end
28
+ end
data/lib/recmon/monitor.rb ADDED
@@ -0,0 +1,105 @@
1
+ require 'time'
2
+
3
+ module Recmon
4
+
5
+ # The Monitor manages the list of sensors, and calls on them periodically
6
+ # to report their results.
7
+ class Monitor
8
+
9
+ # Returns an array of sensors
10
+ attr_reader :sensors
11
+
12
+ # Creates a new monitor, writing to the given log file.
13
+ # The default frequency of 10 seconds is frequent enough to be near-real-time
14
+ # without any burden on the processor. This is the frequency on which the
15
+ # monitor checks if any sensors are due - not the frequency for each sensor.
16
+ def initialize(log="/var/log/recmon.log", freq=10)
17
+ @logfile = log
18
+ @freq = freq # seconds
19
+ @log = nil
20
+ @sensors = []
21
+ end
22
+
23
+ # Starts the monitor running until an INT signal is received.
24
+ def start()
25
+ trap("HUP") { # defer monitoring for a minute and reopen log files
26
+ @log.close() unless @log.nil? or @log.closed?
27
+ sleep(60)
28
+ open_log()
29
+ }
30
+ trap("INT") {
31
+ stop()
32
+ }
33
+ open_log()
34
+ log("Recmon is monitoring.")
35
+ monitor()
36
+ while sleep(@freq)
37
+ monitor()
38
+ end
39
+ end
40
+
41
+ # opens the log file - occurs on starting, and if HUP signal is received
42
+ def open_log()
43
+ begin
44
+ @log = File.new(@logfile, 'a')
45
+ rescue
46
+ $stderr.puts("Unable to open log file #{@logfile}.")
47
+ exit 1
48
+ end
49
+ end
50
+
51
+ # Stops the monitor
52
+ def stop()
53
+ log("Recmon is exiting.")
54
+ @log.close() unless @log.nil? or @log.closed?
55
+ exit 0
56
+ end
57
+
58
+ # Checks the list of sensors to see if any are due to report
59
+ def monitor()
60
+ @sensors.each { |s| log(s.sense()) if s.check() }
61
+ @log.flush()
62
+ end
63
+
64
+ # Logs a message to the log file
65
+ def log(message)
66
+ @log.puts("#{Time.now.iso8601} #{message}")
67
+ end
68
+
69
+ # Adds a CommandSensor
70
+ def command(name, command, freq=120)
71
+ @sensors << Recmon::CommandSensor.new(name, command, freq)
72
+ end
73
+
74
+ # Adds a DiskspaceSensor
75
+ def diskspace(name, path, freq=1200)
76
+ @sensors << Recmon::DiskspaceSensor.new(name, path, freq)
77
+ end
78
+
79
+ # Adds a FilesizeSensor
80
+ def filesize(name, path, freq=1200)
81
+ @sensors << Recmon::FilesizeSensor.new(name, path, freq)
82
+ end
83
+
84
+ # Adds a PingSensor
85
+ def ping(name, ip, freq=120)
86
+ @sensors << Recmon::PingSensor.new(name, ip, freq)
87
+ end
88
+
89
+ # Adds a ProcSensor
90
+ def proc(name, pattern, freq=60)
91
+ @sensors << Recmon::ProcSensor.new(name, pattern, freq)
92
+ end
93
+
94
+ # Adds an SSHSensor
95
+ def ssh(name, user="_recmon", port=22, freq=300)
96
+ @sensors << Recmon::SSHSensor.new(name, user, port, freq)
97
+ end
98
+
99
+ # Adds a WebSensor
100
+ def web(name, url, freq=60)
101
+ @sensors << Recmon::WebSensor.new(name, url, freq)
102
+ end
103
+
104
+ end
105
+ end
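The Monitor's own frequency is how often it asks each sensor whether a reading is due, so a sensor can only fire on a monitor tick. The following is a small model of that interaction, not the gem's code: `ModelSensor` is a hypothetical class that mirrors Sensor#check but takes the clock as an argument so it can be stepped deterministically.

```ruby
# Hypothetical model of Sensor#check with an explicit clock; the real
# class reads Time.now instead of taking `now` as a parameter.
class ModelSensor
  def initialize(freq, created_at)
    @freq = freq
    @due = created_at
  end

  # Same rule as the original: fire only when the due time has passed,
  # then schedule the next reading `freq` seconds after this one.
  def check(now)
    return false unless @due < now
    @due = now + @freq
    true
  end
end

# Sensor frequency of 25 s; the monitor ticks every 10 s, starting 1 s
# after the sensor was created.
sensor = ModelSensor.new(25, 0)
ticks = (0..9).map { |i| 1 + 10 * i }
fired = ticks.select { |t| sensor.check(t) }
puts fired.inspect # prints [1, 31, 61, 91]
```

In this model the readings are 30 seconds apart rather than 25: the effective interval is the sensor frequency rounded up to the next monitor tick.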
data/lib/recmon/ping-sensor.rb ADDED
@@ -0,0 +1,25 @@
1
+ require 'recmon/sensor'
2
+ require 'ipaddr'
3
+
4
+ module Recmon
5
+
6
+ # Periodically pings a host to check it is alive
7
+ class PingSensor < Sensor
8
+
9
+ # Create a new PingSensor to ping the given IP every 2 minutes
10
+ def initialize(name, ip, freq=120)
11
+ super(name, freq)
12
+ @ip = IPAddr.new(ip)
13
+ end
14
+
15
+ # Called by Monitor. Sends a single ICMP ping to the IP.
16
+ # Of course, there is no point doing this if any firewall in between drops ICMP pings.
17
+ def sense()
18
+ `/sbin/ping -c1 #{@ip}`
19
+ up = $?.exitstatus == 0
20
+ status = up ? "up" : "down"
21
+ return("ping host=#{@name} status=#{status}")
22
+ end
23
+
24
+ end
25
+ end
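PingSensor passes its address through `IPAddr.new`, so a hostname such as "earth" is rejected when the sensor is created; only the sensor's name may be a hostname. A minimal sketch of checking an address up front, using a hypothetical helper `valid_ip?`:

```ruby
require 'ipaddr'

# Hypothetical helper (not part of recmon): true when IPAddr accepts the
# string as an IPv4 or IPv6 address.
def valid_ip?(text)
  IPAddr.new(text)
  true
rescue ArgumentError
  # IPAddr's address errors descend from ArgumentError.
  false
end

puts valid_ip?('206.125.172.58')
puts valid_ip?('earth')
```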
data/lib/recmon/proc-sensor.rb ADDED
@@ -0,0 +1,23 @@
1
+ require 'recmon/sensor'
2
+
3
+ module Recmon
4
+
5
+ # ProcSensor checks periodically to ensure a certain process is running
6
+ class ProcSensor < Sensor
7
+
8
+ # Create a new ProcSensor. The pattern is a string as specified in pgrep(1).
9
+ def initialize(name, pattern, freq=60)
10
+ super(name, freq)
11
+ @pattern = pattern
12
+ end
13
+
14
+ # Called by Monitor. This method executes pgrep(1) to check if one or more
15
+ # processes match
16
+ def sense()
17
+ down = `sudo /usr/bin/pgrep -f "#{@pattern}"` == ""
18
+ status = down ? "stopped" : "running"
19
+ return("proc=#{@name} status=#{status}")
20
+ end
21
+
22
+ end
23
+ end
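ProcSensor#sense interpolates the pattern into a shell command between double quotes, so a pattern that itself contains a double quote or `$` would be reinterpreted by the shell. One possible hardening, sketched here with the standard `shellwords` library, is to escape the pattern before building the command. The helper name `pgrep_command` is hypothetical, and `sudo` is left out for brevity.

```ruby
require 'shellwords'

# Hypothetical helper (not part of recmon): build the pgrep command line
# with the pattern escaped for the shell.
def pgrep_command(pattern)
  "/usr/bin/pgrep -f #{Shellwords.escape(pattern)}"
end

puts pgrep_command('postgres: writer process')
```

Splitting the result with `Shellwords.split` yields the original pattern as a single argument, which is a quick way to check the escaping.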
data/lib/recmon/sensor.rb ADDED
@@ -0,0 +1,34 @@
1
+
2
+ module Recmon
3
+
4
+ # Sensor is the abstract parent class for all sensors.
5
+ class Sensor
6
+
7
+ # should only be called by the Monitor
8
+ def initialize(name, freq)
9
+ @name = name
10
+ @freq = freq.to_i
11
+ @due = Time.now
12
+ raise "A name must be specified for the sensor" if name.nil? or name.length == 0
13
+ raise "Frequency must be positive seconds, #{@freq} is not valid" if @freq < 1
14
+ end
15
+
16
+ # check if the sensor is due to take another reading
17
+ def check()
18
+ now = Time.now
19
+ if @due < now
20
+ @due = now + @freq
21
+ return(true)
22
+ end
23
+ return(false)
24
+ end
25
+
26
+ # *Must* be implemented by the subclass.
27
+ #
28
+ # Called by the Monitor if #check returns true
29
+ def sense()
30
+ raise("#sense method should be overridden by the sub-class")
31
+ end
32
+
33
+ end
34
+ end
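Since Sensor is an abstract parent whose #sense must be overridden, a new kind of sensor only needs a constructor and a #sense method that returns the log message. The sketch below is an illustration, not part of the gem: `FileExistsSensor` is hypothetical, and the `Sensor` class here is a reduced stand-in so the example runs without recmon installed (with the gem, `require 'recmon'` would replace it).

```ruby
require 'tempfile'

module Recmon
  # Reduced stand-in for Recmon::Sensor so this sketch is self-contained.
  class Sensor
    def initialize(name, freq)
      @name = name
      @freq = freq.to_i
      @due = Time.now
    end
  end

  # Hypothetical sensor: reports whether a file exists.
  class FileExistsSensor < Sensor
    def initialize(name, path, freq = 300)
      super(name, freq)
      @path = path
    end

    # Returns the message body; the Monitor prepends the timestamp.
    def sense
      status = File.exist?(@path) ? 'present' : 'missing'
      "file=#{@name} status=#{status}"
    end
  end
end

Tempfile.create('recmon-demo') do |f|
  puts Recmon::FileExistsSensor.new('Demo', f.path).sense # prints file=Demo status=present
end
```

To schedule such a sensor, an instance can be appended to the array returned by `Monitor#sensors`, since the Monitor has no helper method for classes it does not know about.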
data/lib/recmon/ssh-sensor.rb ADDED
@@ -0,0 +1,26 @@
1
+ require 'recmon/sensor'
2
+
3
+ module Recmon
4
+
5
+ # SSHSensor periodically checks that an SSH service is available
6
+ class SSHSensor < Sensor
7
+
8
+ # Create a new SSHSensor to periodically test SSH at the named host
9
+ def initialize(name, user="_recmon", port=22, freq=300)
10
+ super(name, freq)
11
+ @user = user
12
+ @port = port
13
+ end
14
+
15
+ # Called by Monitor. sense() attempts to log in to the host in batch mode
16
+ # so you *MUST* set up a private key authentication for this user or
17
+ # this sensor will generate false negatives.
18
+ def sense()
19
+ `/usr/bin/ssh -p#{@port} -o "BatchMode yes" #{@user}@#{@name} pwd`
20
+ up = $?.exitstatus == 0
21
+ status = up ? "up" : "down"
22
+ return("SSH at host=#{@name} for user=#{@user} status=#{status}")
23
+ end
24
+
25
+ end
26
+ end
data/lib/recmon/web-sensor.rb ADDED
@@ -0,0 +1,36 @@
1
+ require 'recmon/sensor'
2
+ require 'uri'
3
+ require 'net/http'
4
+
5
+ module Recmon
6
+
7
+ # WebSensor periodically checks that a website is accessible
8
+ class WebSensor < Sensor
9
+
10
+ # Create a new WebSensor with the given name, which requests the url periodically
11
+ # to check if the website is available
12
+ def initialize(name, url, freq=60)
13
+ super(name, freq)
14
+ @url = URI.parse(url)
15
+ end
16
+
17
+ # Called by the Monitor, #sense checks for an <code>HTTP 200 OK</code> response from the URL.
18
+ def sense()
19
+ begin
20
+ q = @url.query ? "?#{@url.query}" : ""
21
+ q += (q.length > 0 ? "&" : "?")
22
+ q += "ts=" + Time.now().to_i().to_s() # make request unique to prevent caching
23
+ req = Net::HTTP::Get.new(@url.path + q)
24
+ res = Net::HTTP.start(@url.host, @url.port) {|http|
25
+ http.request(req)
26
+ }
27
+ up = (res.code == '200')
28
+ rescue
29
+ up = false
30
+ end
31
+ status = up ? "up" : "down"
32
+ return("site=#{@name} status=#{status}")
33
+ end
34
+
35
+ end
36
+ end
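WebSensor#sense appends a `ts` parameter so that each request is unique and caches are bypassed. That step can be isolated as a pure function, which makes the URL handling easy to check for URLs with and without an existing query string. This is a sketch under assumptions: the helper name `cache_busted_path` and the injectable `now` argument are hypothetical.

```ruby
require 'uri'

# Hypothetical helper (not part of recmon): build the request path with a
# timestamp parameter appended to any existing query string.
def cache_busted_path(url, now = Time.now)
  uri = URI.parse(url)
  # An HTTP URL with no path component parses to an empty path.
  path = uri.path.empty? ? '/' : uri.path
  query = [uri.query, "ts=#{now.to_i}"].compact.join('&')
  "#{path}?#{query}"
end

puts cache_busted_path('http://www.example.com/index.html', Time.at(0)) # prints /index.html?ts=0
```

Passing the time in as an argument keeps the function deterministic, so the three cases (plain path, existing query, empty path) can each be asserted directly.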
metadata ADDED
@@ -0,0 +1,77 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: recmon
3
+ version: !ruby/object:Gem::Version
4
+ hash: 23
5
+ prerelease:
6
+ segments:
7
+ - 1
8
+ - 0
9
+ - 0
10
+ version: 1.0.0
11
+ platform: ruby
12
+ authors:
13
+ - Richard Kernahan
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2012-09-28 00:00:00 Z
19
+ dependencies: []
20
+
21
+ description: "\tRecmon generates log entries for a range of events that are not typically logged.\n Once they are logged, they can be analysed to generate alerts.\n\n REC (Ruby Event Correlation) is the tool that correlates events across time to determine\n if a situation is abnormal, and so generates fewer, more meaningful alerts\n by email or instant message.\n\n So Recmon + REC = lightweight Nagios\n"
22
+ email: dev.recmon@finalstep.com.au
23
+ executables: []
24
+
25
+ extensions: []
26
+
27
+ extra_rdoc_files: []
28
+
29
+ files:
30
+ - lib/recmon.rb
31
+ - lib/recmon/monitor.rb
32
+ - lib/recmon/sensor.rb
33
+ - lib/recmon/command-sensor.rb
34
+ - lib/recmon/diskspace-sensor.rb
35
+ - lib/recmon/ping-sensor.rb
36
+ - lib/recmon/proc-sensor.rb
37
+ - lib/recmon/ssh-sensor.rb
38
+ - lib/recmon/web-sensor.rb
39
+ - .yardopts
40
+ - CHANGELOG
41
+ - README
42
+ homepage: http://rubygems.org/gems/recmon
43
+ licenses: []
44
+
45
+ post_install_message:
46
+ rdoc_options: []
47
+
48
+ require_paths:
49
+ - lib
50
+ required_ruby_version: !ruby/object:Gem::Requirement
51
+ none: false
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ hash: 3
56
+ segments:
57
+ - 0
58
+ version: "0"
59
+ required_rubygems_version: !ruby/object:Gem::Requirement
60
+ none: false
61
+ requirements:
62
+ - - ">="
63
+ - !ruby/object:Gem::Version
64
+ hash: 3
65
+ segments:
66
+ - 0
67
+ version: "0"
68
+ requirements: []
69
+
70
+ rubyforge_project:
71
+ rubygems_version: 1.8.24
72
+ signing_key:
73
+ specification_version: 3
74
+ summary: Recmon is a host-based system monitor. It complements REC, the Ruby Event Correlator.
75
+ test_files: []
76
+
77
+ has_rdoc: