solanum 0.2.0 → 1.0.0

Sign up to get free protection for your applications and to get access to all the features.
data/README.md CHANGED
@@ -1,51 +1,57 @@
1
1
  Solanum
2
2
  =======
3
3
 
4
- This gem provides a domain-specific language (DSL) for collecting metrics
5
- data in Ruby. The `solanum` script takes a number of monitoring configuration
6
- scripts as arguments and periodically collects the metrics defined. The results
7
- can be printed to the console or sent to a [Riemann](http://riemann.io/) server.
8
- This requires the `riemann-client` gem to work.
9
-
10
- ## Structure
11
-
12
- Solanum scripts define _sources_, which provide some string input when they are
13
- read. This input is processed by a set of _matchers_ for each source, which can
14
- generate named measurements from that data. A simple example would be a file
15
- source, which is read and matched line-by-line against a set of regular
16
- expressions.
17
-
18
- The emitted measurements can undergo a bit more processing before being
19
- reported. For example, some metrics are monotonically-increasing counters, and
20
- what we actually want is the _difference_ between each reading. For others, we
21
- may want to apply threshold-based states to the events. These are set by
22
- _service prototypes_, which are also defined in the scripts.
23
-
24
- ## Examples
25
-
26
- Here's an example of reading some information about the current system memory:
27
-
28
- ```ruby
29
- # Read memory usage.
30
- read "/proc/meminfo" do
31
- match /^MemTotal:\s+(\d+) kB$/, cast: :to_i, scale: 1024, record: 'memory total bytes'
32
- match /^MemFree:\s+(\d+) kB$/, cast: :to_i, scale: 1024, record: 'memory free bytes'
33
- end
34
-
35
- # Calculate percentages from total space.
36
- compute do |metrics|
37
- total = metrics['memory total bytes']
38
- free = metrics['memory free bytes']
39
- if total && free
40
- metrics['memory free pct'] = free.to_f/total
41
- end
42
- end
43
-
44
- # Define a service prototype with a threshold-based state.
45
- service 'memory free pct', state: thresholds(0.00, :critical, 0.10, :warning, 0.25, :ok)
46
- ```
47
-
48
- See the files in the `examples` directory for more monitor configuration samples.
4
+ This gem provides a monitoring daemon which can be configured to collect data
5
+ from a variety of pluggable sources. The results can be printed to the console
6
+ or sent to a [Riemann](http://riemann.io/) server. This requires the
7
+ `riemann-client` gem to work.
8
+
9
+
10
+ ## Installation
11
+
12
+ **TODO**
13
+
14
+
15
+ ## Metric Events
16
+
17
+ Solanum represents each measurement datapoint as an _event_. Each event must
18
+ have at minimum a `service` and `metric` with the measurement name and value,
19
+ respectively. Events may also contain other attributes such as a `state`, `ttl`,
20
+ `tags`, and so on - see the [Riemann concepts](http://riemann.io/concepts.html)
21
+ page for more details.
22
+
23
+
24
+ ## Configuration
25
+
26
+ Solanum is configured using one or more YAML files. These specify common event
27
+ attributes, sources, and outputs.
28
+
29
+ See the [example config](config.yml) in this repo for possible config options.
30
+
31
+ ### Defaults
32
+
33
+ The `defaults` section of the config provides common attributes to apply to
34
+ every event. This can be used to provide a common TTL, tags, and more.
35
+
36
+ ### Sources
37
+
38
+ A _source_ is a class which extends `Solanum::Source` and implements the
39
+ `collect!` method to return metric events. Solanum comes with several metric
40
+ sources built in, including basic host-level monitoring of CPU usage, load,
41
+ memory, diskstats, network, and more.
42
+
43
+ Additional custom sources can be provided, as long as they are in Ruby's lib
44
+ path for the daemon.
45
+
46
+ ### Outputs
47
+
48
+ An _output_ is a destination to report the collected events to. The simplest
49
+ one is the `print` output, which writes each event to STDOUT. This is useful for
50
+ debugging, but you probably won't leave it on for deployed daemons.
51
+
52
+ The other included choice is the `riemann` output, which sends each event to a
53
+ Riemann monitoring server.
54
+
49
55
 
50
56
  ## License
51
57
 
data/bin/solanum CHANGED
@@ -6,16 +6,13 @@ require 'optparse'
6
6
  require 'solanum'
7
7
 
8
8
  $options = {
9
- riemann_host: nil,
10
- riemann_port: 5555,
11
- interval: 5,
12
- verbose: false,
9
+ period: 10,
13
10
  }
14
11
 
15
12
  $defaults = {
16
- host: %x{hostname}.chomp,
13
+ host: %x{hostname --fqdn}.chomp,
17
14
  tags: [],
18
- ttl: 10,
15
+ ttl: 60,
19
16
  }
20
17
 
21
18
  def fail(msg, code=1)
@@ -23,25 +20,18 @@ def fail(msg, code=1)
23
20
  exit code
24
21
  end
25
22
 
26
- def log(msg)
27
- puts "%s %s" % [Time.now.strftime("%H:%M:%S"), msg] if $options[:verbose]
28
- end
29
-
30
23
  # Parse command-line options.
31
24
  options = OptionParser.new do |opts|
32
- opts.banner = "Usage: #{File.basename($0)} [options] <monitor config> [monitor config] ..."
25
+ opts.banner = "Usage: #{File.basename($0)} [options] <config.yml> [config2.yml ...]"
33
26
  opts.separator ""
34
27
  opts.separator "Event Attributes:"
35
28
  opts.on( '--host HOST', "Event hostname (default: #{$defaults[:host]})") {|v| $defaults[:host] = v }
36
- opts.on('-a', '--attribute KEY=VAL', "Attribute to add to the event (may be given multiple times)") {|attr| k,v = attr.split(/=/); if k and v then $defaults[k.intern] = v end }
29
+ opts.on('-a', '--attribute KEY=VAL', "Attribute to add to every event (may be given multiple times)") {|attr| k,v = attr.split(/=/); if k and v then $defaults[k.intern] = v end }
37
30
  opts.on('-t', '--tag TAG', "Tag to add to events (may be given multiple times)") {|v| $defaults[:tags] << v }
38
- opts.on( '--ttl SECONDS', "Default TTL for events (default: #{$options[:ttl]})") {|v| $defaults[:ttl] = v.to_i }
31
+ opts.on( '--ttl SECONDS', "Default TTL for events (default: #{$defaults[:ttl]})") {|v| $defaults[:ttl] = v.to_i }
39
32
  opts.separator ""
40
33
  opts.separator "General Options:"
41
- opts.on( '--riemann-host HOST', "Riemann host to report events to") {|v| $options[:riemann_host] = v }
42
- opts.on( '--riemann-port PORT', "Riemann port (default: #{$options[:riemann_port]})") {|v| $options[:riemann_port] = v.to_i }
43
- opts.on('-i', '--interval SECONDS', "Seconds between updates (default: #{$options[:interval]})") {|v| $options[:interval] = v.to_i }
44
- opts.on('-v', '--verbose', "Print additional information to stdout") { $options[:verbose] = true }
34
+ opts.on('-p', '--period SECONDS', "Seconds between updates (default: #{$options[:period]})") {|v| $options[:period] = v.to_i }
45
35
  opts.on('-h', '--help', "Displays usage information") { print opts; exit }
46
36
  end
47
37
  options.parse!
@@ -49,46 +39,14 @@ options.parse!
49
39
  # Check usage.
50
40
  fail options if ARGV.empty?
51
41
 
52
-
53
-
54
- ##### MONITORING CONFIGS #####
55
-
42
+ # Construct monitoring system.
56
43
  $solanum = Solanum.new(ARGV)
57
44
  fail "No sources loaded!" if $solanum.sources.empty?
58
45
 
59
- if $options[:riemann_host]
60
- begin
61
- require 'riemann/client'
62
- rescue LoadError
63
- fail "ERROR: could not load Riemann client library! `gem install riemann-client` to enable reporting"
64
- end
65
-
66
- $riemann = Riemann::Client.new(host: $options[:riemann_host], port: $options[:riemann_port])
67
- end
68
-
69
-
70
-
71
- ##### REPORT LOOP #####
72
-
46
+ # Handle ^C interrupts gracefully.
73
47
  trap "SIGINT" do
74
48
  exit
75
49
  end
76
50
 
77
- loop do
78
- $solanum.collect!
79
- events = $solanum.build_events($defaults)
80
-
81
- events.each do |event|
82
- if $options[:verbose] || $riemann.nil?
83
- puts "%-40s %5s (%s) %s" % [
84
- event[:service], event[:metric],
85
- event[:state].nil? ? "--" : event[:state],
86
- event.inspect
87
- ]
88
- end
89
-
90
- $riemann << event if $riemann
91
- end
92
-
93
- sleep $options[:interval]
94
- end
51
+ # Scheduling loop.
52
+ $solanum.run!
data/lib/solanum.rb CHANGED
@@ -1,81 +1,136 @@
1
+ require 'solanum/config'
2
+ require 'solanum/schedule'
3
+ require 'thread'
4
+
5
+
1
6
  # Class which wraps up an active Solanum monitoring system into an object.
2
- #
3
- # Author:: Greg Look
4
7
  class Solanum
5
- attr_reader :sources, :services, :metrics
8
+ attr_reader :defaults, :sources, :outputs
6
9
 
7
- require 'solanum/config'
8
- require 'solanum/source'
10
+ # Merge two event attribute maps together, concatenating tags.
11
+ def self.merge_attrs(a, b)
12
+ stringify = lambda do |x|
13
+ o = {}
14
+ x.keys.each do |k|
15
+ o[k.to_s] = x[k]
16
+ end
17
+ o
18
+ end
19
+
20
+ if a.nil?
21
+ stringify[b]
22
+ elsif b.nil?
23
+ stringify[a]
24
+ else
25
+ a = stringify[a]
26
+ b = stringify[b]
27
+ tags = a['tags'] ? a['tags'].dup : []
28
+ tags.concat(b['tags']) if b['tags']
29
+ tags.uniq!
30
+ x = a.dup.merge(b)
31
+ x['tags'] = tags unless tags.empty?
32
+ x
33
+ end
34
+ end
9
35
 
10
36
 
11
- # Loads the given monitoring scripts and initializes the sources and service
12
- # definitions.
13
- def initialize(scripts)
37
+ # Loads the given configuration file(s) and initializes the system.
38
+ def initialize(config_paths)
39
+ @defaults = {tags: []}
14
40
  @sources = []
15
- @services = []
16
- @metrics = {}
41
+ @outputs = []
17
42
 
18
- scripts.each do |path|
19
- begin
20
- config = Solanum::Config.new(path)
21
- @sources.concat(config.sources)
22
- @services.concat(config.services)
23
- rescue => e
24
- STDERR.puts "Error loading monitor script #{path}: #{e}"
25
- end
43
+ # Load and merge files.
44
+ config_paths.each do |path|
45
+ conf = Config.load_file(path)
46
+
47
+ # merge defaults, update tags
48
+ @defaults = Solanum.merge_attrs(@defaults, conf[:defaults])
49
+
50
+ # sources and outputs are additive
51
+ @sources.concat(conf[:sources])
52
+ @outputs.concat(conf[:outputs])
53
+ end
54
+
55
+ # Add default print output.
56
+ if @outputs.empty?
57
+ require 'solanum/output/print'
58
+ @outputs << Solanum::Output::Print.new()
26
59
  end
27
60
 
61
+ @defaults.freeze
62
+ @outputs.freeze
28
63
  @sources.freeze
29
- @services.freeze
64
+
65
+ @schedule = Solanum::Schedule.new
66
+ @sources.each_with_index do |source, i|
67
+ @schedule.insert!(source.next_run, i)
68
+ end
30
69
  end
31
70
 
32
71
 
33
- # Collects metrics from the given sources, in order. Updates the internal
34
- # merged map of metric data.
35
- def collect!
36
- @old_metrics = @metrics
37
- @metrics = @sources.reduce({}) do |metrics, source|
38
- begin
39
- new_metrics = source.collect(metrics) || {}
40
- metrics.merge(new_metrics)
41
- rescue => e
42
- STDERR.puts "Error collecting metrics from #{source}: #{e}"
43
- metrics
72
+ # Reschedule the given source for later running.
73
+ def reschedule!(source)
74
+ idx = nil
75
+ @sources.each_with_index do |s, i|
76
+ if s == source
77
+ idx = i
78
+ break
44
79
  end
45
80
  end
81
+ raise "Source #{source.inspect} is not present in source list!" unless idx
82
+ @schedule.insert!(source.next_run, idx)
83
+ @scheduler.wakeup
46
84
  end
47
85
 
48
86
 
49
- # Builds full events from a set of service prototypes, old metrics, and new
50
- # metrics.
51
- def build_events(defaults={})
52
- @metrics.keys.sort.map do |service|
53
- value = @metrics[service]
54
- prototype = @services.select{|m| m[0] === service }.map{|m| m[1] }.reduce({}, &:merge)
87
+ # Report a batch of events to all reporters.
88
+ def record!(events)
89
+ # TODO: does this need locking?
90
+ @outputs.each do |output|
91
+ output.write_events events
92
+ end
93
+ end
55
94
 
56
- state = prototype[:state] ? prototype[:state].call(value) : :ok
57
- tags = ((prototype[:tags] || []) + (defaults[:tags] || [])).uniq
58
- ttl = prototype[:ttl] || defaults[:ttl]
59
95
 
60
- if prototype[:diff]
61
- last = @old_metrics[service]
62
- if last && last <= value
63
- value = value - last
64
- else
65
- value = nil
96
+ # Run collection from the given source in a new thread.
97
+ def collect_events!(source)
98
+ Thread.new do
99
+ begin
100
+ events = source.collect!
101
+ attrs = Solanum.merge_attrs(@defaults, source.attributes)
102
+ events = events.map do |event|
103
+ Solanum.merge_attrs(attrs, event)
66
104
  end
105
+ record! events
106
+ rescue => e
107
+ STDERR.puts "Error collecting events from source #{source.type}: #{e}"
67
108
  end
109
+ reschedule! source
110
+ end
111
+ end
112
+
68
113
 
69
- if value
70
- defaults.merge({
71
- service: service,
72
- metric: value,
73
- state: state.to_s,
74
- tags: tags,
75
- ttl: ttl
76
- })
114
+ # Runs the collection loop.
115
+ def run!
116
+ @scheduler = Thread.current
117
+ loop do
118
+ # Determine when next scheduled source should run, and sleep if needed.
119
+ duration = @schedule.next_wait || 1
120
+ if 0 < duration
121
+ sleep duration
122
+ next
77
123
  end
78
- end.compact
124
+
125
+ # Get the next ready source.
126
+ idx = @schedule.pop_ready!
127
+ source = @sources[idx] if idx
128
+ next unless source
129
+ #puts "Source #{source.type} is ready to run!" # DEBUG
130
+
131
+ # Start thread to collect and report events.
132
+ collect_events! source
133
+ end
79
134
  end
80
135
 
81
136
  end
@@ -1,97 +1,90 @@
1
- require 'solanum/source'
1
+ require 'yaml'
2
2
 
3
- class Solanum::Config
4
- attr_reader :sources, :services
3
+ class Solanum
4
+ module Config
5
5
 
6
- def initialize(path)
7
- @sources = []
8
- @services = []
9
-
10
- instance_eval ::File.readlines(path).join, path, 1
11
-
12
- raise "No sources loaded from monitor script: #{path}" if @sources.empty?
6
+ # Helper method to clear the type cache.
7
+ def self.clear_type_cache!
8
+ @@type_classes = {}
13
9
  end
14
10
 
15
11
 
16
- private
17
-
18
- # Registers a new source object. If a block is given, it is used to configure
19
- # the source with instance_exec.
20
- def register_source(source, config=nil)
21
- source.instance_exec &config if config
22
- @sources << source
23
- source
24
- end
12
+ # Resolve a type based on a library path.
13
+ def self.resolve_type(namespace, type, lib_path=nil, class_name=nil)
14
+ @@type_classes ||= {}
25
15
 
16
+ type_key = "#{namespace}:#{type}"
17
+ return @@type_classes[type_key] if @@type_classes.include?(type_key)
26
18
 
27
- # Registers a source which runs a command and matches against output lines.
28
- def run(command, &config)
29
- register_source Solanum::Source::Command.new(command), config
30
- end
19
+ lib_path ||= type.include?('/') ? type : "solanum/#{namespace}/#{type}"
20
+ if class_name
21
+ cls_path = class_name.split('::')
22
+ else
23
+ cls_path = lib_path.split('/').map {|w| w.capitalize }
24
+ end
31
25
 
26
+ begin
27
+ require lib_path
28
+ cls = cls_path.inject(Object) do |mod, class_name|
29
+ mod.const_get(class_name) if mod
30
+ end
31
+ STDERR.puts "Unable to resolve class #{cls_path.join('::')}" unless cls
32
+ @@type_classes[type_key] = cls
33
+ rescue LoadError => e
34
+ STDERR.puts "Unable to load code for #{type_key} type: #{e}"
35
+ @@type_classes[type_key] = nil
36
+ end
32
37
 
33
- # Registers a source which matches against the lines in a file.
34
- def read(path, &config)
35
- register_source Solanum::Source::File.new(path), config
38
+ @@type_classes[type_key]
36
39
  end
37
40
 
38
41
 
39
- # Registers a source which computes metrics directly.
40
- def compute(&block)
41
- register_source Solanum::Source::Compute.new(block)
42
+ # Resolves a type config string and constructs a new instance of it. Memoizes
43
+ # the results of loading the class in the `@@type_classes` field.
44
+ def self.construct_type(namespace, type, args)
45
+ cls = resolve_type(namespace, type, args['lib_path'], args['class'])
46
+ if cls.nil?
47
+ STDERR.puts "Skipping construction of failed #{namespace} type #{type}"
48
+ nil
49
+ else
50
+ begin
51
+ #puts "#{cls}.new(#{args.inspect})" # DEBUG
52
+ cls.new(args)
53
+ rescue => e
54
+ STDERR.puts "Error constructing #{namespace} type #{type}: #{args.inspect} #{e}"
55
+ nil
56
+ end
57
+ end
42
58
  end
43
59
 
44
60
 
45
- # Registers a pair of [matcher, prototype] where matcher is generally a string
46
- # or regex to match a service name, and prototype is a map of :ttl, :state,
47
- # :tags, etc.
48
- def service(service, prototype={})
49
- @services << [service, prototype]
50
- end
61
+ # Load the given configuration file. Returns a map with initialized :sources
62
+ # and :outputs.
63
+ def self.load_file(path)
64
+ config = File.open(path) {|f| YAML.load(f) }
51
65
 
66
+ defaults = config['defaults'] || {}
52
67
 
53
- ##### HELPER METHODS #####
54
-
55
- # Creates a state function based on thresholds. If the first argument is a
56
- # symbol, it is taken as the default service state. Otherwise, arguments should
57
- # be alternating numeric thresholds and state values to assign if the metric
58
- # value exceeds the threshold.
59
- #
60
- # For example, for an 'availability' metric you often want to warn on low
61
- # values. To assign a 'critical' state to values between 0% and 10%,
62
- # 'warning' between 10% and 25%, and 'ok' above, use the following:
63
- #
64
- # thresholds(0.00, :critical, 0.10, :warning, 0.25, :ok)
65
- #
66
- # For 'usage' metrics it's the inverse, giving low values ok states and
67
- # warning about high values:
68
- #
69
- # thresholds(:ok, 55, :warning, 65, :critical)
70
- #
71
- def thresholds(*args)
72
- default_state = nil
73
- default_state = args.shift unless args.first.kind_of? Numeric
74
-
75
- # Check arguments.
76
- raise "Thresholds must be paired with state values" unless args.count.even?
77
- args.each_slice(2) do |threshold|
78
- limit, state = *threshold
79
- raise "Limits must be numeric: #{limit}" unless limit.kind_of? Numeric
80
- raise "State values must be strings or symbols: #{state}" unless state.instance_of?(String) || state.instance_of?(Symbol)
68
+ # Construct sources from config.
69
+ source_configs = config['sources'] || []
70
+ sources = source_configs.map do |conf|
71
+ self.construct_type('source', conf['type'], conf)
81
72
  end
73
+ sources.reject!(&:nil?)
82
74
 
83
- # State block.
84
- lambda do |v|
85
- state = default_state
86
- args.each_slice(2) do |threshold|
87
- if threshold[0] < v
88
- state = threshold[1]
89
- else
90
- break
91
- end
92
- end
93
- state
75
+ # Construct outputs from config.
76
+ output_configs = config['outputs'] || []
77
+ outputs = output_configs.map do |conf|
78
+ self.construct_type('output', conf['type'], conf)
94
79
  end
80
+ outputs.reject!(&:nil?)
81
+
82
+ {
83
+ defaults: defaults,
84
+ sources: sources,
85
+ outputs: outputs,
86
+ }
95
87
  end
96
88
 
97
89
  end
90
+ end