RubyGems - rec - Versions diffs - 1.0.2 → 1.0.4 - Mend

rec 1.0.2 → 1.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

data/lib/EXAMPLES +226 -0
data/lib/README +211 -0
metadata +22 -11

data/lib/EXAMPLES ADDED

@@ -0,0 +1,226 @@
+= REC Examples
+The best way to understand REC is to see how rules are written.
+The early examples were inspired by Risto Vaarandi's brilliant SEC (http://simple-evcorr.sourceforge.net/),
+so they employ similar names for easy comparison.
+== Single Threshold
+We are monitoring events where a user has had 3 incorrect password attempts.
+If we see that happen 3 times (+threshold+) within a minute (+lifespan+), alert the administrator.
+	# single threshold rule
+	Rule.new(10034, {
+		:pattern => /\w+ sudo\:  (\w+) \: 3 incorrect password attempts/,
+		:details => ["userid"],
+		:message => "Failed sudo password for user %userid$s",
+		:lifespan => 60,
+		:alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
+		:threshold => 3
+	}) { |state|
+	if state.count == state.threshold
+		Notify.urgent(state.generate_alert())
+		state.release()
+	end
+	}
+When we see the first event, a state is created with title "Failed sudo password for user richard".
+The second event has no effect, beyond automatically incrementing the count.
+When we see the third event, an output message is generated and logged, and then the generated
+message is also sent via IM to the administrator. Alternatively:
+	}) { |state|
+	message = state.generate_alert()	# writes out a new log entry, and returns it
+	Notify.urgent(message)			# sends the message to the administrator
+	}
+Finally, the state is released (we forget all about it).
+If there is a fourth event, that would then create another state of the same kind which
+would start counting again. Suppose we wanted to avoid that, and just keep on ignoring any
+more events in a sliding window until the user has given it a 3 minute rest.
+The action could be modified in this way:
+	}) { |state|
+	Notify.urgent(state.generate_alert()) if state.count == state.threshold
+	# keep on pushing expiry out to 3 minutes after the last event
+	state.extend_for(180) if state.count >= state.threshold
+	}
+Suppose we want to check for 3 events within 60 seconds, and then ignore further events
+for a fixed 5 minutes.
+	}) { |state|
+	if state.count == state.threshold
+		Notify.urgent(state.generate_alert())
+		state.extend_for(300)	# expire exactly 5 minutes after the 3rd event
+	end
+	}
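The counting behaviour behind these threshold rules can be made concrete with a small standalone sketch. It does not use REC's classes: `SketchState`, `record`, and `expired?` are hypothetical names chosen for illustration, and times are passed in as plain numbers of seconds so the result is deterministic. Only the idea of `extend_for` is borrowed from the rules above.

```ruby
# Hypothetical stand-in for a REC state (illustration only, not the gem's code).
# It counts matching events and remembers when it should be forgotten.
class SketchState
  attr_reader :count, :threshold

  def initialize(created_at, lifespan, threshold)
    @expires_at = created_at + lifespan
    @threshold  = threshold
    @count      = 0
  end

  # Record one matching event; true exactly when the threshold is reached.
  def record
    @count += 1
    @count == @threshold
  end

  # Push the expiry out to `seconds` after `now` (the idea behind extend_for).
  def extend_for(seconds, now)
    @expires_at = now + seconds
  end

  def expired?(now)
    now >= @expires_at
  end
end

state   = SketchState.new(0, 60, 3)        # created at t=0, lives 60s, threshold 3
reached = Array.new(4) { state.record }    # four events arrive
state.extend_for(180, 40)                  # last event seen at t=40
puts reached.inspect
puts state.expired?(100)
puts state.expired?(220)
```

Only the third event reports that the threshold was reached, which is why the rules above compare `state.count == state.threshold` rather than `>=` when they want a single alert.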
+== Adding a final block
+If we want to see one message when the user first has trouble, then another message
+after he has decided to stop trying, the format is a little different. The block
+given in previous examples is stored in the +params+ as +action+.
+Alternatively, the +action+ block may be specified directly as a member of the params hash,
+and the +final+ block must be specified in this way if it is to be used.
+	Rule.new(10034, {
+		:pattern => /^\s+\w+\s+sudo\[\d+\]\:\s+(\w+) \:/,
+		:details => ["userid"],
+		:message => "sudo activity for user %userid$s",
+		:threshold => 3,
+		:lifespan => 60,
+		:alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
+		:expiry => "'Gave sudo a rest' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
+		:action => Proc.new { |state|
+			if state.count == state.threshold
+				Notify.urgent(state.generate(:alert))
+				state.release()
+			end
+		},
+		:final => Proc.new { |state|
+			Notify.normal(state.generate(:expiry))
+		}
+	})
+When the state is about to expire, its :final block will be called. In this case,
+it generates a log entry using the :expiry message template, and sends the message
+to the administrator via normal (email) delivery.
+== Event compression
+Compression involves converting a stream of events into fewer, preferably one. In this example,
+we report when a Skype conversation starts and then suppress all further noise for about 8 minutes.
+	# suppression rule
+	Rule.new(10035, {
+		:pattern => /^\s\w+\sFirewall\[\d+\]\:\sSkype is listening from 0.0.0.0:(\d+)/,
+		:details => ["port"],
+		:message => "Skype conversation started on port %port$d",
+		:alert => "Skype running on port %port$d",
+		:lifespan => 479
+	}) { |state|
+		state.generate_first_only(:alert)
+	}
+The <code>generate_first_only</code> method creates a new event using the :alert template
+only if the state's +count+ is 1, so it notices the first event and ignores all subsequent
+events as long as the state lives.
+By default, generate() and generate_first_only() use the :alert template.
+If no :alert was provided, the :message will be used instead. In this example,
+we could have omitted the argument:
+	}) { |state|
+		state.generate_first_only()
+	}
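As a rough illustration of the compression idea (report the first event, swallow the rest while the state lives), here is a standalone sketch. The `compress` helper and the `[time, title]` event pairs are assumptions made for this example and are not part of REC; the lifespan is fixed at creation, as in the rule above.

```ruby
# Hypothetical illustration of event compression (not REC's own code):
# report a title once, then stay quiet until the remembered state expires.
def compress(events, lifespan)
  expires = {}   # title => time at which the state is forgotten
  output  = []
  events.each do |time, title|
    if expires[title].nil? || time >= expires[title]
      expires[title] = time + lifespan   # first event: create the state
      output << [time, title]            # and report it once
    end
    # any other event inside the lifespan is swallowed
  end
  output
end

title  = "Skype conversation started on port 51234"
events = [[0, title], [5, title], [400, title], [500, title]]
reported = compress(events, 479)
puts reported.map(&:first).inspect
```

The events at 5 and 400 seconds fall inside the 479 second lifespan and are dropped; the event at 500 seconds arrives after expiry, so it starts a new state and is reported.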
+== Pairs of rules
+We want to know when a server goes down, and when it comes back up again.
+In this example, rule 10036 creates a new log entry when we first
+see the server is not responding, and the state persists for 5 minutes.
+	# pair rule
+	Rule.new(10036, {
+		:pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
+		:details => ["host"],
+		:message => "Server %host$s is down",
+		:lifespan => 300
+	}) { |state|
+		state.generate_first_only()
+	}
+Rule 10037 looks for a message saying the server is OK, *AND* that there is a state
+with a title like "Server earth is down". The :allstates parameter contains an array of
+templates - the rule does not react to the event unless all of the named states exist.
+When all the conditions are satisfied, the rule generates a new log entry that
+the server is up, and then forgets both states.
+	Rule.new(10037, {
+		:pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
+		:details => ["host"],
+		:message => "Server %host$s is up again",
+		:allstates => ["Server %host$s is down"]
+	}) {|state|
+		state.generate()
+		state.release("Server %host$s is down")
+		state.release()
+	}
+Since no :alert is specified, it defaults to the :message. So +generate+ will
+log a message that "Server earth is up again".
+== Correlating events (and states)
+Now suppose we want to know how long the server was down. We have two options:
+1. we could add a final block to rule 10036 to report its age, but that would
+   just create an extra message and that's what we're trying to get away from
+2. we could report the duration in a single "Server earth is up again" message
+Since we've already seen how to add a final block, let's take option 2.
+	Rule.new(10037, {
+		:pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
+		:details => ["host"],
+		:message => "Server %host$s is up again after %outage$d minutes",
+		:allstates => ["Server %host$s is down"]
+	}) {|state|
+		duration = State.find("Server %host$s is down", state).age()
+		state.params[:outage] = (duration/60).to_i()
+		state.generate()
+		state.release("Server %host$s is down")
+		state.release()
+	}
+We can obtain the duration of the outage with the State.find method, which
+interpolates the current state's values into the template, and finds the
+matching state.
+We now need to store that duration into the state's values as an integer, because
+sprintf %d expects an integer.
+Having calculated the duration, we generate the message, and forget both states.
+== Shortcut actions
+Several actions are so common they have been provided as constants to make the rules
+more succinct but still readable. One is to generate a message on the first event only:
+	Rule.new(10036, {
+		:pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
+		:details => ["host"],
+		:message => "Server %host$s is down",
+		:lifespan => 300
+	}) { |state|
+		state.generate_first_only()
+	}
+can be abbreviated in this way:
+	Rule.new(10036, {
+		:pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
+		:details => ["host"],
+		:message => "Server %host$s is down",
+		:lifespan => 300,
+		:action => State::Generate_first_only
+	})
+Another common action is to generate a message and release the state immediately:
+	Rule.new(10040, {
+		:pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
+		:details => ["user", "ip"],
+		:message => "User %user$s signed in via SSH from %ip$s",
+	}) { |state|
+		state.generate()
+		state.release()
+	}
+can be abbreviated in this way:
+	Rule.new(10040, {
+		:pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
+		:details => ["user", "ip"],
+		:message => "User %user$s signed in via SSH from %ip$s",
+		:action => State::Generate_and_release
+	})

data/lib/README ADDED

@@ -0,0 +1,211 @@
+= Ruby Event Correlation
+Correlates events in order to generate a smaller set of more meaningful events.
+== Installation
+1. Install the gem
+	$ sudo gem install rec
+2. Select a ruleset or create your own
+	#!/usr/bin/ruby
+	require 'rec'
+	include REC
+	require 'rulesets/postfix-rules'
+	Correlator::start()
+3. Start it up
+	$ rulesets/rules.rb < /var/log/mail.log 3>missed.log 2>control.log > newevents.log
+== Why correlate events?
+We all know that we should read our log files. But reading log files is *really* boring,
+and frankly it's easy to miss important things in all the superfluous detail.
+[Save time]
+	If you are lazy enough to not want to review all of your log files manually forever, and
+	smart enough to work out what needs monitoring and when you might want to pay attention,
+	then wouldn't it be good if you could define those rules and let the computer do what it
+	does best?
+[Generate meaning]
+	The logs of many applications are filled with entries that are quite low level - perhaps
+	wonderful for debugging, but typically not terribly meaningful in terms of business.
+	Wouldn't it be good if we could summarise a bunch of low level events into a single
+	business event - and then just read the <em>business log</em>.
+== Alternatives
+There are several alternatives to REC which may suit your needs better:
+* splunk[www.splunk.com]
+* nagios[www.nagios.com]
+* scalextreme.com[www.scalextreme.com]
+While I like these options, I find they take a lot of configuring.
+They also have some dependencies that make them a bit heavier than you may want.
+If you just want to keep track of a few kinds of events, want a lot of flexibility
+and control without too much effort, then REC may be of some value.
+== How does REC work?
+Each entry in a log file is an *event*.
+The Correlator reads the events, and attempts to match an event against each Rule.
+If an event matches a rule, the rule creates a State which just means we're remembering
+that the event matched a rule. The pattern to match is a regexp, and the captured values
+are named. For example
+	# log entry => "nfs: server earth not responding"
+	pattern => /nfs\: server (\w+) not responding/
+	details => ['host']
+	# values of interest are captured into a hash => {'host' => 'earth' }
+	:message => "Server %host$s is down"
+	# interpolation with named parameters => "Server earth is down"
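The capture-then-interpolate step can be sketched in plain Ruby. This is only an imitation of the idea: the `interpolate` helper below is hypothetical and is not the String extension that ships with the gem, and it handles only the `s`, `d`, and `f` conversions that appear in this README.

```ruby
# Hypothetical helper: expand "%name$s"-style placeholders from a hash.
# This imitates the README's template syntax; it is not REC's implementation.
def interpolate(template, values)
  template.gsub(/%(\w+)\$([-0-9.]*[sdf])/) do
    name, spec = $1, $2
    format("%#{spec}", values.fetch(name))
  end
end

entry   = "nfs: server earth not responding"
pattern = /nfs\: server (\w+) not responding/
details = ["host"]

match  = pattern.match(entry)
values = details.zip(match.captures).to_h   # pair each name with its capture
title  = interpolate("Server %host$s is down", values)

puts values.inspect
puts title
```

Pairing `details` with `MatchData#captures` by position is what makes the order of names in `details` significant.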
+A state has a fixed lifetime, set when it is created. At the end of its life, it may simply
+expire quietly, or a pre-defined action may be executed. For example, if we find a server is down,
+we may want to wait for 3 minutes and if it is not up again, then alert the administrator.
+The server being down is a state, and two states are distinguished by their *titles*. For example,
+"host earth is down" and "host terra is down".
+Now that we're remembering a set of states, we can match events against not only the event's
+message, but also other states. For example, we can match "host terra is up" against a previously
+created state "host terra is down", and generate a new event "host terra is back up after 14 minutes".
+We can also 'swallow' all of the rest of the "host terra is down" events because they add nothing new.
+This <em>event compression</em> means the administrator gets one important message, and not 27
+distracting alerts.
+A notification can be sent by email or IM, depending on your preferences and working hours.
+The destinations and credentials are supplied to your ruleset:
+	# For better security, move the next few lines into a file readable only by
+	# the user running this script eg. /home/rec/alert.conf
+	# and then require that file
+	Notify.smtp_credentials("rec@gmail.com", "recret", "myfirm.com")
+	Notify.emailTo = "me@myfirm.com"
+	Notify.jabber_credentials("rec@gmail.com", "recret")
+	Notify.jabberTo = "me@myfirm.com"
+Rules can then send an alert when desired. Two common cases involve alerting immediately
+on the first event (eg. "host terra is down"), and alerting on expiry or at a subsequent event
+(eg. "host terra is back up").
+	state.alert_first_only()		# => generate a new event on first original event
+						# or
+	Notify.normal(state.alert_first_only())	# => log and also send the new event via email
+In most cases, however, it is not necessary to alert the administrator at all. It is enough to
+log the new event in the output logfile for later review.
+== Anatomy of a Rule
+Warn if a user is having trouble executing sudo commands.
+The log entry (/var/log/secure) looks like this:
+	Sep 16 07:09:22 earth sudo:  richard : 3 incorrect password attempts ;...
+and the rule might look like this:
+	# single threshold rule
+	Rule.new(10034, {
+		:pattern => /\w+ sudo\:  (\w+) \: 3 incorrect password attempts/,
+		:details => ["userid"],
+		:message => "Failed sudo password for user %userid$s",
+		:lifespan => 60,
+		:alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
+		:threshold => 3,
+		:capture => true
+	}) { |state|
+	if state.count == state.threshold
+		Notify.urgent(state.generate_alert())
+		state.release()
+	end
+	}
+Let's look at each part:
+[Rule ID]
+	Each rule must have a unique integer ID (+rid+).
+	It is the first argument and is mandatory.
+	It's probably a good idea to 'reserve' a number range for a ruleset
+	to keep them separate from other rules (eg. 17801-17899 for Postfix-related rules).
+The second argument is a hash of options:
+[pattern]
+	The +pattern+ is a regexp designed to match certain log messages.
+	A +message+ is what's left of a log entry after we have removed the timestamp and
+	any priority level. For example:
+		[Thu Aug 16 16:11:21 2012] [error] ap_proxy_connect_backend disabling worker for (127.0.0.1)
+		# timestamp parsed => 2012-08-16T16:11:21+10:00
+		# priority ignored => "error"
+		# message => "ap_proxy_connect_backend disabling worker for (127.0.0.1)"
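How an entry might be split into timestamp, priority, and message can be sketched as follows. The README does not show REC's actual parsing, so the `ENTRY_FORMAT` regexp and the `split_entry` helper are assumptions that cover only the Apache-style entry quoted above. The timestamp is interpreted in the local time zone of the machine running the sketch.

```ruby
require 'time'

# Hypothetical parser for the Apache-style entry shown above.
# REC's real parsing is not shown in the README; this is an illustration only.
ENTRY_FORMAT = /\A\[(?<stamp>[^\]]+)\]\s+\[(?<priority>\w+)\]\s+(?<message>.*)\z/

def split_entry(entry)
  m = ENTRY_FORMAT.match(entry)
  return nil unless m
  {
    :timestamp => Time.strptime(m[:stamp], "%a %b %d %H:%M:%S %Y"),
    :priority  => m[:priority],
    :message   => m[:message]
  }
end

entry = "[Thu Aug 16 16:11:21 2012] [error] ap_proxy_connect_backend disabling worker for (127.0.0.1)"
parts = split_entry(entry)
puts parts[:priority]
puts parts[:message]
```

Only `parts[:message]` would then be offered to a rule's +pattern+.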
+[details]
+	The pattern may contain regexp 'captures' (eg. (\d+\.\d+\.\d+\.\d+) to capture the ip).
+	For each capture a name should be specified in the +details+ array.
+	The sequence of captures is as specified for ruby Regexps.
+		:pattern => /\w+ sudo\:  (\w+) \: (\d) incorrect password attempts/,
+		:details => ["userid", "failures"],
+	The names chosen for captured values are used as keys to store the values in the same
+	hash that stores the parameters, so do *not* choose words like +pattern+, +details+,
+	+message+, +threshold+, +lifespan+, +alert+, +capture+, +continue+, or +action+.
+[message]
+	The +message+ is a string template into which the captured values are interpolated
+	to produce a unique key for a state.
+		:details => ["userid"],
+		:message => "Failed sudo password for user %userid$s",
+		# userid = "richard" => "Failed sudo password for user richard"
+	Note the modified +sprintf+ syntax: the value of +userid+ is inserted into the message
+	as a string by the String::sprinth method. This becomes the +title+ and key for the state
+	created by this rule.
+[lifespan]
+	When a rule creates a state, we need to know how long to remember the state for, and
+	when to expire it. The +lifespan+ specifies that duration in seconds.
+	It is also possible to extend the life of a state should other events take place (with
+	State::live_another) in the same way that a web session may be extended for another 10
+	minutes at each request.
+[alert]
+	This is a string template used to generate an output log message (the timestamp will be
+	prefixed automatically to complete the log entry).
+		:alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
+	By convention, we make our log messages very easy to parse by creating name=value pairs,
+	and single-quoting strings containing spaces, in case the output will be processed further.
+	If no +alert+ is provided, it will default to +message+.
+[capture]
+	The +capture+ parameter tells REC to store the original log entries in the state (in the
+	+logs+ attribute). You could in this way for example extract a transcript of each web session from
+	a noisy access log, and output them as each session finishes or expires.
+		:capture => true
+[threshold]
+	This parameter is used in the action.
+			:threshold => 3
+[allstates]
+	An array of templates used to determine if matching states exist. All the mentioned
+	states must be found or the rule will not take any action.
+[anystates]
+	An array of templates used to determine if matching states exist. If any one of the
+	mentioned states exist, then the rule will execute its action.
+[notstates]
+	An array of templates used to determine if matching states exist. If any one of the
+	mentioned states does exist, then the rule will *not* execute its action.
+[Any arbitrary parameter]
+	Any arbitrary parameters may be added to the rule, and they are passed on to the
+	state in the +params+ hash.
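The three state conditions described above can be read as simple set checks against the titles of the states currently remembered. The sketch below is an assumption-laden illustration, not REC's code: `conditions_met?` is a hypothetical helper, the titles are taken to be already interpolated, and an empty +anystates+ list is assumed to mean "no condition".

```ruby
# Hypothetical check of allstates / anystates / notstates against live titles.
# Illustration only; REC's own matching logic is not shown in this README.
def conditions_met?(live_titles, allstates: [], anystates: [], notstates: [])
  all_ok = allstates.all?  { |t| live_titles.include?(t) }
  any_ok = anystates.empty? || anystates.any? { |t| live_titles.include?(t) }
  not_ok = notstates.none? { |t| live_titles.include?(t) }
  all_ok && any_ok && not_ok
end

live = ["Server earth is down"]

puts conditions_met?(live, allstates: ["Server earth is down"])
puts conditions_met?(live, allstates: ["Server terra is down"])
puts conditions_met?(live, notstates: ["Server earth is down"])
```

With a state for earth remembered, the first check passes, the second fails because no state exists for terra, and the third fails because the forbidden state is present.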
+The third argument is a block.
+[action]
+	The action is a block with a single argument which is the state created by the rule.
+	The +count+ of matched events is maintained automatically. In this case, when we have
+	seen 3 events, we generate an output log entry and also send it by IM, then release
+	the state (forget about it).
+			:threshold => 3
+		}) { |state|
+		if state.count == state.threshold
+			Notify.urgent(state.generate_alert())
+			state.release()
+		end
+		}
+	By the magic of Ruby's #method_missing method (Yes, I'm looking at you Java!) we can
+	refer to any parameter succinctly instead of a cumbersome hash notation, so:
+		state.threshold === state.params['threshold']
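For readers unfamiliar with the technique, here is a minimal sketch of exposing hash entries as methods through `method_missing`. `ParamBag` is a hypothetical class written for this example, and it assumes symbol keys; it is not the gem's State class.

```ruby
# Hypothetical sketch of exposing hash parameters as methods.
# ParamBag is an illustrative name, not a class from the gem.
class ParamBag
  def initialize(params)
    @params = params
  end

  # Answer any zero-argument call whose name is a key in the hash.
  def method_missing(name, *args)
    return @params[name] if args.empty? && @params.key?(name)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @params.key?(name) || super
  end
end

bag = ParamBag.new(:threshold => 3, :lifespan => 60)
puts bag.threshold
puts bag.respond_to?(:lifespan)
```

Calling a name that is not a key falls through to `super` and raises `NoMethodError`, so misspelled parameter names still fail loudly.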
+For more examples, see the EXAMPLES page.

metadata CHANGED

@@ -1,12 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: rec
 version: !ruby/object:Gem::Version
-  prerelease: false
+  hash: 31
+  prerelease:
   segments:
   - 1
   - 0
-  - 2
-  version: 1.0.2
+  - 4
+  version: 1.0.4
 platform: ruby
 authors:
 - Richard Kernahan
@@ -14,8 +15,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-09-17 00:00:00 +10:00
-default_executable:
+date: 2012-09-17 00:00:00 Z
 dependencies: []
 description: "\t\tSifts through your log files in real time, using stateful intelligence to determine\n\
@@ -30,8 +30,9 @@ executables: []
 extensions: []
-extra_rdoc_files: []
+extra_rdoc_files:
+- lib/README
+- lib/EXAMPLES
 files:
 - lib/rec.rb
 - lib/rec/rule.rb
@@ -40,35 +41,45 @@ files:
 - lib/rec/notify.rb
 - lib/rec/mock-notify.rb
 - lib/string.rb
-has_rdoc: true
+- lib/README
+- lib/EXAMPLES
 homepage: http://rubygems.org/gems/rec
 licenses: []
 post_install_message:
-rdoc_options: []
+rdoc_options:
+- --show-hash
+- --main
+- lib/README
+- --title
+- REC -- Ruby Event Correlation
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
+      hash: 3
       segments:
       - 0
       version: "0"
 required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
+      hash: 3
       segments:
       - 0
       version: "0"
 requirements: []
 rubyforge_project: rec
-rubygems_version: 1.3.6
+rubygems_version: 1.8.24
 signing_key:
 specification_version: 3
 summary: Ruby event correlation
 test_files: []
+has_rdoc: