rec 1.0.2 → 1.0.4

Files changed (3)
  1. data/lib/EXAMPLES +226 -0
  2. data/lib/README +211 -0
  3. metadata +22 -11
data/lib/EXAMPLES ADDED
@@ -0,0 +1,226 @@
1
+ = REC Examples
2
+ The best way to understand REC is to see how rules are written.
3
+
4
+ The early examples were inspired by Risto Vaarandi's brilliant SEC (http://simple-evcorr.sourceforge.net/),
5
+ so they employ similar names for easy comparison.
6
+
7
+ == Single Threshold
8
+ We are monitoring events where a user has had 3 incorrect password attempts.
9
+ If we see that happen 3 times (+threshold+) within a minute (+lifespan+), alert the administrator.
10
+
11
+ # single threshold rule
12
+ Rule.new(10034, {
13
+ :pattern => /\w+ sudo\: (\w+) \: 3 incorrect password attempts/,
14
+ :details => ["userid"],
15
+ :message => "Failed sudo password for user %userid$s",
16
+ :lifespan => 60,
17
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
18
+ :threshold => 3
19
+ }) { |state|
20
+ if state.count == state.threshold
21
+ Notify.urgent(state.generate_alert())
22
+ state.release()
23
+ end
24
+ }
25
+
26
+ When we see the first event, a state is created with title "Failed sudo password for user richard".
27
+ The second event has no effect, beyond automatically incrementing the count.
28
+ When we see the third event, an output message is generated and logged, and then the generated
29
+ message is also sent via IM to the administrator. Alternatively:
30
+
31
+ }) { |state|
32
+ message = state.generate_alert() # writes out a new log entry, and returns it
33
+ Notify.urgent(message) # sends the message to the administrator
34
+ }
35
+
36
+ Finally, the state is released (we forget all about it).
37
+
38
+ If there is a fourth event, that would then create another state of the same kind which
39
+ would start counting again. Suppose we wanted to avoid that, and just keep on ignoring any
40
+ more events in a sliding window until the user has given it a 3 minute rest.
41
+ The action could be modified in this way:
42
+
43
+ }) { |state|
44
+ Notify.urgent(state.generate_alert()) if state.count == state.threshold
45
+ # keep on pushing expiry out to 3 minutes after the last event
46
+ state.extend_for(180) if state.count >= state.threshold
47
+ }
48
+
49
+ Suppose we want to check for 3 events within 60 seconds, and then ignore further events
50
+ for a fixed 5 minutes.
51
+
52
+ }) { |state|
53
+ if state.count == state.threshold
54
+ Notify.urgent(state.generate_alert())
55
+ state.extend_for(300) # expire exactly 5 minutes after the 3rd event
56
+ end
57
+ }
58
+
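The threshold-then-ignore behaviour described above can be sketched in plain Ruby without REC. This is only an illustration under stated assumptions: `SketchState` is a hypothetical stand-in for a REC state, and its `extend_for` takes the current time explicitly, which the real method shown above does not.

```ruby
# Hypothetical stand-in for a REC state: counts matching events and
# expires a fixed number of seconds after it was created.
class SketchState
  attr_reader :count, :threshold, :expires_at

  def initialize(now, lifespan, threshold)
    @count = 0
    @threshold = threshold
    @expires_at = now + lifespan
  end

  def expired?(now)
    now >= @expires_at
  end

  # Push the expiry out to `seconds` after `now`
  def extend_for(now, seconds)
    @expires_at = now + seconds
  end

  def record
    @count += 1
  end
end

alerts = []
state = nil
# Event times in seconds; the last one arrives after the state has expired
[0, 20, 40, 50, 400].each do |t|
  state = nil if state && state.expired?(t)
  state ||= SketchState.new(t, 60, 3)
  state.record
  if state.count == state.threshold
    alerts << t
    state.extend_for(t, 300) # ignore further events for a fixed 5 minutes
  end
end
p alerts
```

The third event raises the only alert, the fourth is swallowed by the extended state, and the fifth starts a fresh state that begins counting again.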
59
+ == Adding a final block
60
+ If we want to see one message when the user first has trouble, then another message
61
+ after he has decided to stop trying, the format is a little different. The block
62
+ given in previous examples is stored in the +params+ as +action+.
63
+
64
+ Instead, the +action+ block may be specified directly as a member of the params hash,
65
+ and the +final+ block must be specified in this way if it is to be used.
66
+
67
+ Rule.new(10034, {
68
+ :pattern => /^\s+\w+\s+sudo\[\d+\]\:\s+(\w+) \:/,
69
+ :details => ["userid"],
70
+ :message => "sudo activity for user %userid$s",
71
+ :threshold => 3,
72
+ :lifespan => 60,
73
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
74
+ :expiry => "'Gave sudo a rest' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
75
+ :action => Proc.new { |state|
76
+ if state.count == state.threshold
77
+ Notify.urgent(state.generate(:alert))
78
+ state.release()
79
+ end
80
+ },
81
+ :final => Proc.new { |state|
82
+ Notify.normal(state.generate(:expiry))
83
+ }
84
+ })
85
+
86
+ When the state is about to expire, its :final block will be called. In this case,
87
+ it generates a log entry using the :expiry message template, and sends the message
88
+ to the administrator via normal (email) delivery.
89
+
90
+ == Event compression
91
+ Compression involves converting a stream of events into fewer, preferably one. In this example,
92
+ we report when a skype conversation starts and then suppress all further noise for about 8 minutes.
93
+
94
+ # suppression rule
95
+ Rule.new(10035, {
96
+ :pattern => /^\s\w+\sFirewall\[\d+\]\:\sSkype is listening from 0.0.0.0:(\d+)/,
97
+ :details => ["port"],
98
+ :message => "Skype conversation started on port %port$d",
99
+ :alert => "Skype running on port %port$d",
100
+ :lifespan => 479
101
+ }) { |state|
102
+ state.generate_first_only(:alert)
103
+ }
104
+
105
+ The <code>generate_first_only</code> method creates a new event using the :alert template
106
+ only if the state's +count+ is 1, so it notices the first event and ignores all subsequent
107
+ events as long as the state lives.
108
+
109
+ By default, generate() and generate_first_only() use the :alert template.
110
+ If no :alert was provided, the :message will be used instead. In this example,
111
+ we could have omitted the argument:
112
+
113
+ }) { |state|
114
+ state.generate_first_only()
115
+ }
116
+
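The idea behind event compression can be sketched independently of REC. The `compress` helper below is hypothetical: it emits an event only when no live state with the same title exists, which is roughly what acting on a +count+ of 1 achieves.

```ruby
# Hypothetical sketch: emit an event only the first time a title is seen
# within its lifespan; later events with the same title are swallowed.
def compress(events, lifespan)
  live = {} # title => expiry time in seconds
  out = []
  events.each do |time, title|
    next if live.key?(title) && time < live[title]
    live[title] = time + lifespan
    out << [time, title]
  end
  out
end

title = "Skype running on port 5060"
events = [[0, title], [100, title], [478, title], [480, title]]
compressed = compress(events, 479)
compressed.each { |time, text| puts "#{time}s #{text}" }
```

With a lifespan of 479 seconds, the events at 100s and 478s are suppressed, and the event at 480s starts a new state.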
117
+ == Pairs of rules
118
+ We want to know when a server goes down, and when it comes back up again.
119
+ In this example, rule 10036 creates a new log entry when we first
120
+ see the server is not responding, and the state persists for 5 minutes.
121
+
122
+ # pair rule
123
+ Rule.new(10036, {
124
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
125
+ :details => ["host"],
126
+ :message => "Server %host$s is down",
127
+ :lifespan => 300
128
+ }) { |state|
129
+ state.generate_first_only()
130
+ }
131
+
132
+ Rule 10037 looks for a message saying the server is OK, *AND* that there is a state
133
+ with a title like "Server earth is down". The :allstates parameter contains an array of
134
+ templates - the rule does not react to the event unless all of the named states exist.
135
+
136
+ When all the conditions are satisfied, the rule generates a new log entry that
137
+ the server is up, and then forgets both states.
138
+
139
+ Rule.new(10037, {
140
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
141
+ :details => ["host"],
142
+ :message => "Server %host$s is up again",
143
+ :allstates => ["Server %host$s is down"]
144
+ }) {|state|
145
+ state.generate()
146
+ state.release("Server %host$s is down")
147
+ state.release()
148
+ }
149
+
150
+ Since no :alert is specified, it defaults to the :message. So +generate+ will
151
+ log a message that "Server earth is up again".
152
+
153
+ == Correlating events (and states)
154
+ Now suppose we want to know how long the server was down. We have two options:
155
+ 1. we could add a final block to rule 10036 to report its age, but that would
156
+ just create an extra message and that's what we're trying to get away from
157
+ 2. we could report the duration in a single "Server earth is up again" message
158
+
159
+ Since we've already seen how to add a final block, let's take option 2.
160
+
161
+ Rule.new(10037, {
162
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
163
+ :details => ["host"],
164
+ :message => "Server %host$s is up again after %outage$d minutes",
165
+ :allstates => ["Server %host$s is down"]
166
+ }) {|state|
167
+ duration = State.find("Server %host$s is down", state).age()
168
+ state.params[:outage] = (duration/60).to_i()
169
+ state.generate()
170
+ state.release("Server %host$s is down")
171
+ state.release()
172
+ }
173
+
174
+ We can obtain the duration of the outage with the State#find method, which
175
+ interpolates the current state's values into the template, and finds the
176
+ matching state.
177
+
178
+ We now need to store that duration into the state's values as an integer, because
179
+ sprintf %d expects an integer.
180
+
181
+ Having calculated the duration, we generate the message, and forget both states.
182
+
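The arithmetic in that action can be checked on its own. In this sketch the age of 877.25 seconds is an assumed value, and the standard Kernel#format stands in for REC's own template interpolation.

```ruby
# Assumed age of the "down" state, in seconds (a Float)
age_in_seconds = 14 * 60 + 37.25

# Truncate to whole minutes so that the value is an Integer
outage = (age_in_seconds / 60).to_i

message = format("Server %s is up again after %d minutes", "earth", outage)
puts message
```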
183
+ == Shortcut actions
184
+ Several actions are so common they have been provided as constants to make the rules
185
+ more succinct but still readable. One is to generate a message on the first event only:
186
+
187
+ Rule.new(10036, {
188
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
189
+ :details => ["host"],
190
+ :message => "Server %host$s is down",
191
+ :lifespan => 300
192
+ }) { |state|
193
+ state.generate_first_only()
194
+ }
195
+
196
+ can be abbreviated in this way:
197
+
198
+ Rule.new(10036, {
199
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
200
+ :details => ["host"],
201
+ :message => "Server %host$s is down",
202
+ :lifespan => 300,
203
+ :action => State::Generate_first_only
204
+ })
205
+
206
+ Another common action is to generate a message and release the state immediately:
207
+
208
+ Rule.new(10040, {
209
+ :pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
210
+ :details => ["user", "ip"],
211
+ :message => "User %user$s signed in via SSH from %ip$s",
212
+ }) { |state|
213
+ state.generate()
214
+ state.release()
215
+ }
216
+
217
+ can be abbreviated in this way:
218
+
219
+ Rule.new(10040, {
220
+ :pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
221
+ :details => ["user", "ip"],
222
+ :message => "User %user$s signed in via SSH from %ip$s",
223
+ :action => State::Generate_and_release
224
+ })
225
+
226
+
data/lib/README ADDED
@@ -0,0 +1,211 @@
1
+ = Ruby Event Correlation
2
+ Correlates events in order to generate a smaller set of more meaningful events.
3
+
4
+ == Installation
5
+ 1. Install the gem
6
+ $ sudo gem install rec
7
+
8
+ 2. Select a ruleset or create your own
9
+ #!/usr/bin/ruby
10
+ require 'rec'
11
+ include REC
12
+ require 'rulesets/postfix-rules'
13
+ Correlator::start()
14
+
15
+ 3. Start it up
16
+ $ rulesets/rules.rb < /var/log/mail.log 3>missed.log 2>control.log > newevents.log
17
+
18
+ == Why correlate events?
19
+ We all know that we should read our log files. But reading log files is *really* boring,
20
+ and frankly it's easy to miss important things in all the superfluous detail.
21
+
22
+ [Save time]
23
+ If you are lazy enough to not want to review all of your log files manually forever, and
24
+ smart enough to work out what needs monitoring and when you might want to pay attention,
25
+ then wouldn't it be good if you could define those rules and let the computer do what it
26
+ does best?
27
+
28
+ [Generate meaning]
29
+ The logs of many applications are filled with entries that are quite low level - perhaps
30
+ wonderful for debugging, but typically not terribly meaningful in terms of business.
31
+ Wouldn't it be good if we could summarise a bunch of low level events into a single
32
+ business event - and then just read the <em>business log</em>?
33
+
34
+ == Alternatives
35
+ There are several alternatives to REC which may suit your needs better:
36
+ * splunk[www.splunk.com]
37
+ * nagios[www.nagios.com]
38
+ * scalextreme.com[www.scalextreme.com]
39
+ While I like these options, I find they take a lot of configuring.
40
+ They also have some dependencies that make them a bit heavier than you may want.
41
+ If you just want to keep track of a few kinds of events, want a lot of flexibility
42
+ and control without too much effort, then REC may be of some value.
43
+
44
+ == How does REC work?
45
+ Each entry in a log file is an *event*.
46
+ The Correlator reads the events, and attempts to match an event against each Rule.
47
+ If an event matches a rule, the rule creates a State which just means we're remembering
48
+ that the event matched a rule. The pattern to match is a regexp, and the captured values
49
+ are named. For example
50
+ # log entry => "nfs: server earth not responding"
51
+ pattern => /nfs\: server (\w+) not responding/
52
+ details => ['host']
53
+ # values of interest are captured into a hash => {'host' => 'earth' }
54
+ :message => "Server %host$s is down"
55
+ # interpolation with named parameters => "Server earth is down"
56
+
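The capture-and-interpolate step above can be sketched in plain Ruby. This is a minimal illustration rather than REC's implementation: the `interpolate` helper is hypothetical and only mimics the <code>%name$s</code> template syntax shown in this README.

```ruby
# Hypothetical helper (not part of REC): fills %name$fmt placeholders
# from a hash, formatting each value with Kernel#format.
def interpolate(template, values)
  template.gsub(/%(\w+)\$([-+ 0-9.]*[sdf])/) do
    name, fmt = $1, $2
    format("%#{fmt}", values[name])
  end
end

pattern = /nfs\: server (\w+) not responding/
details = ['host']
entry   = "nfs: server earth not responding"

match = pattern.match(entry)
# Pair each capture with its name from the details array
values = details.zip(match.captures).to_h
title  = interpolate("Server %host$s is down", values)
puts title
```

The same helper handles numeric formats, so a template such as <code>dur=%dur$0.3fs</code> is filled by applying <code>%0.3f</code> to the value and leaving the trailing "s" in place.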
57
+ A state has a fixed lifetime, set when it is created. At the end of its life, it may simply
58
+ expire quietly, or a pre-defined action may be executed. For example, if we find a server is down,
59
+ we may want to wait for 3 minutes and if it is not up again, then alert the administrator.
60
+ The server being down is a state, and two states are distinguished by their *titles*. For example,
61
+ "host earth is down" and "host terra is down".
62
+
63
+ Now that we're remembering a set of states, we can match events against not only the event's
64
+ message, but also other states. For example, we can match "host terra is up" against a previously
65
+ created state "host terra is down", and generate a new event "host terra is back up after 14 minutes".
66
+ We can also 'swallow' all of the rest of the "host terra is down" events because they add nothing new.
67
+ This <em>event compression</em> means the administrator gets one important message, and not 27
68
+ distracting alerts.
69
+
70
+ A notification can be sent by email or IM, depending on your preferences and working hours.
71
+ The destinations and credentials are supplied to your ruleset:
72
+ # For better security, move the next few lines into a file readable only by
73
+ # the user running this script eg. /home/rec/alert.conf
74
+ # and then require that file
75
+ Notify.smtp_credentials("rec@gmail.com", "recret", "myfirm.com")
76
+ Notify.emailTo = "me@myfirm.com"
77
+ Notify.jabber_credentials("rec@gmail.com", "recret")
78
+ Notify.jabberTo = "me@myfirm.com"
79
+
80
+ Rules can then send an alert when desired. Two common cases involve alerting immediately
81
+ on the first event (eg. "host terra is down"), and alerting on expiry or at a subsequent event
82
+ (eg. "host terra is back up").
83
+ state.alert_first_only() # => generate a new event on first original event
84
+ # or
85
+ Notify.normal(state.alert_first_only()) # => log and also send the new event via email
86
+
87
+ In most cases, however, it is not necessary to alert the administrator at all. It is enough to
88
+ log the new event in the output logfile for later review.
89
+
90
+ == Anatomy of a Rule
91
+ Warn if a user is having trouble executing sudo commands.
92
+ The log entry (/var/log/secure) looks like this:
93
+
94
+ Sep 16 07:09:22 earth sudo: richard : 3 incorrect password attempts ;...
95
+
96
+ and the rule might look like this:
97
+
98
+ # single threshold rule
99
+ Rule.new(10034, {
100
+ :pattern => /\w+ sudo\: (\w+) \: 3 incorrect password attempts/,
101
+ :details => ["userid"],
102
+ :message => "Failed sudo password for user %userid$s",
103
+ :lifespan => 60,
104
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
105
+ :threshold => 3,
106
+ :capture => true
107
+ }) { |state|
108
+ if state.count == state.threshold
109
+ Notify.urgent(state.generate_alert())
110
+ state.release()
111
+ end
112
+ }
113
+
114
+ Let's look at each part:
115
+ [Rule ID]
116
+ Each rule must have a unique integer ID (+rid+).
117
+ It is the first argument and is mandatory.
118
+ It's probably a good idea to 'reserve' a number range for a ruleset
119
+ to keep them separate from other rules (eg. 17801-17899 for Postfix-related rules).
120
+
121
+ The second argument is a hash of options:
122
+ [pattern]
123
+ The +pattern+ is a regexp designed to match certain log messages.
124
+ A +message+ is what's left of a log entry after we have removed the timestamp and
125
+ any priority level. For example:
126
+ [Thu Aug 16 16:11:21 2012] [error] ap_proxy_connect_backend disabling worker for (127.0.0.1)
127
+ # timestamp parsed => 2012-08-16T16:11:21+10:00
128
+ # priority ignored => "error"
129
+ # message => "ap_proxy_connect_backend disabling worker for (127.0.0.1)"
130
+
131
+ [details]
132
+ The pattern may contain regexp 'captures' (eg. (\d+\.\d+\.\d+\.\d+) to capture the ip).
133
+ For each capture a name should be specified in the +details+ array.
134
+ The sequence of captures is as specified for ruby Regexps.
135
+ :pattern => /\w+ sudo\: (\w+) \: (\d) incorrect password attempts/,
136
+ :details => ["userid", "failures"],
137
+ The names chosen for captured values are used as keys to store the values in the same
138
+ hash that stores the parameters, so do *not* choose words like +pattern+, +details+,
139
+ +message+, +threshold+, +lifespan+, +alert+, +capture+, +continue+, or +action+.
140
+
141
+ [message]
142
+ The +message+ is a string template into which the captured values are interpolated
143
+ to produce a unique key for a state.
144
+ :details => ["userid"],
145
+ :message => "Failed sudo password for user %userid$s",
146
+ # userid = "richard" => "Failed sudo password for user richard"
147
+ Note the modified +sprintf+ syntax: the value of +userid+ is inserted into the message
148
+ as a string by the String::sprinth method. This becomes the +title+ and key for the state
149
+ created by this rule.
150
+
151
+ [lifespan]
152
+ When a rule creates a state, we need to know how long to remember the state for, and
153
+ when to expire it. The +lifespan+ specifies that duration in seconds.
154
+
155
+ It is also possible to extend the life of a state should other events take place (with
156
+ State::live_another) in the same way that a web session may be extended for another 10
157
+ minutes at each request.
158
+
159
+ [alert]
160
+ This is a string template used to generate an output log message (the timestamp will be
161
+ prefixed automatically to complete the log entry).
162
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
163
+ By convention, we make our log messages very easy to parse by creating name=value pairs,
164
+ and single-quoting strings containing spaces, in case the output will be processed further.
165
+
166
+ If no +alert+ is provided, it will default to +message+.
167
+
168
+ [capture]
169
+ The +capture+ parameter tells REC to store the original log entries in the state (in the
170
+ +logs+ attribute). In this way you could, for example, extract a transcript of each web session from
171
+ a noisy access log, and output them as each session finishes or expires.
172
+ :capture => true
173
+
174
+ [threshold]
175
+ This parameter is used in the action.
176
+ :threshold => 3
177
+
178
+ [allstates]
179
+ An array of templates used to determine if matching states exist. All the mentioned
180
+ states must be found or the rule will not take any action.
181
+
182
+ [anystates]
183
+ An array of templates used to determine if matching states exist. If any one of the
184
+ mentioned states exist, then the rule will execute its action.
185
+
186
+ [notstates]
187
+ An array of templates used to determine if matching states exist. If any one of the
188
+ mentioned states does exist, then the rule will *not* execute its action.
189
+
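How the three state conditions combine can be sketched as follows. The `conditions_met?` helper is hypothetical, it assumes the templates have already been interpolated into titles, and it treats an empty +anystates+ list as satisfied, which is an assumption rather than documented behaviour.

```ruby
# Hypothetical sketch of the three state conditions; `live` stands in for
# REC's set of live state titles.
def conditions_met?(live, allstates: [], anystates: [], notstates: [])
  # every title in allstates must be live
  return false unless allstates.all? { |title| live.include?(title) }
  # at least one title in anystates must be live (if any were given)
  return false unless anystates.empty? || anystates.any? { |title| live.include?(title) }
  # no title in notstates may be live
  return false if notstates.any? { |title| live.include?(title) }
  true
end

live = ["Server earth is down"]
puts conditions_met?(live, allstates: ["Server earth is down"])
puts conditions_met?(live, anystates: ["Server terra is down"])
puts conditions_met?(live, notstates: ["Server earth is down"])
```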
190
+ [Any arbitrary parameter]
191
+ Any arbitrary parameters may be added to the rule, and they are passed on to the
192
+ state in the +params+ hash.
193
+
194
+ The third argument is a block.
195
+ [action]
196
+ The action is a block with a single argument which is the state created by the rule.
197
+ The +count+ of matched events is maintained automatically. In this case, when we have
198
+ seen 3 events, we generate an output log entry and also send it by IM, then release
199
+ the state (forget about it).
200
+ :threshold => 3
201
+ }) { |state|
202
+ if state.count == state.threshold
203
+ Notify.urgent(state.generate_alert())
204
+ state.release()
205
+ end
206
+ }
207
+ By the magic of Ruby's #method_missing method (Yes, I'm looking at you Java!) we can
208
+ refer to any parameter succinctly instead of a cumbersome hash notation, so:
209
+ state.threshold === state.params['threshold']
210
+
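The +method_missing+ shortcut can be illustrated with a small stand-alone class. `ParamsSketch` is hypothetical and is not REC's State class; it only shows the general Ruby technique of delegating unknown method names to a hash.

```ruby
# Hypothetical sketch (not REC's State class): unknown method names are
# looked up as string keys in the params hash.
class ParamsSketch
  attr_reader :params

  def initialize(params)
    @params = params
  end

  def method_missing(name, *args)
    key = name.to_s
    return @params[key] if args.empty? && @params.key?(key)
    super # unknown names still raise NoMethodError
  end

  def respond_to_missing?(name, include_private = false)
    @params.key?(name.to_s) || super
  end
end

state = ParamsSketch.new('threshold' => 3)
puts state.threshold == state.params['threshold']
```

Defining +respond_to_missing?+ alongside +method_missing+ keeps <code>respond_to?</code> consistent with the delegated names.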
211
+ For more examples, see the EXAMPLES page.
metadata CHANGED
@@ -1,12 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rec
3
3
  version: !ruby/object:Gem::Version
4
- prerelease: false
4
+ hash: 31
5
+ prerelease:
5
6
  segments:
6
7
  - 1
7
8
  - 0
8
- - 2
9
- version: 1.0.2
9
+ - 4
10
+ version: 1.0.4
10
11
  platform: ruby
11
12
  authors:
12
13
  - Richard Kernahan
@@ -14,8 +15,7 @@ autorequire:
14
15
  bindir: bin
15
16
  cert_chain: []
16
17
 
17
- date: 2012-09-17 00:00:00 +10:00
18
- default_executable:
18
+ date: 2012-09-17 00:00:00 Z
19
19
  dependencies: []
20
20
 
21
21
  description: "\t\tSifts through your log files in real time, using stateful intelligence to determine\n\
@@ -30,8 +30,9 @@ executables: []
30
30
 
31
31
  extensions: []
32
32
 
33
- extra_rdoc_files: []
34
-
33
+ extra_rdoc_files:
34
+ - lib/README
35
+ - lib/EXAMPLES
35
36
  files:
36
37
  - lib/rec.rb
37
38
  - lib/rec/rule.rb
@@ -40,35 +41,45 @@ files:
40
41
  - lib/rec/notify.rb
41
42
  - lib/rec/mock-notify.rb
42
43
  - lib/string.rb
43
- has_rdoc: true
44
+ - lib/README
45
+ - lib/EXAMPLES
44
46
  homepage: http://rubygems.org/gems/rec
45
47
  licenses: []
46
48
 
47
49
  post_install_message:
48
- rdoc_options: []
49
-
50
+ rdoc_options:
51
+ - --show-hash
52
+ - --main
53
+ - lib/README
54
+ - --title
55
+ - REC -- Ruby Event Correlation
50
56
  require_paths:
51
57
  - lib
52
58
  required_ruby_version: !ruby/object:Gem::Requirement
59
+ none: false
53
60
  requirements:
54
61
  - - ">="
55
62
  - !ruby/object:Gem::Version
63
+ hash: 3
56
64
  segments:
57
65
  - 0
58
66
  version: "0"
59
67
  required_rubygems_version: !ruby/object:Gem::Requirement
68
+ none: false
60
69
  requirements:
61
70
  - - ">="
62
71
  - !ruby/object:Gem::Version
72
+ hash: 3
63
73
  segments:
64
74
  - 0
65
75
  version: "0"
66
76
  requirements: []
67
77
 
68
78
  rubyforge_project: rec
69
- rubygems_version: 1.3.6
79
+ rubygems_version: 1.8.24
70
80
  signing_key:
71
81
  specification_version: 3
72
82
  summary: Ruby event correlation
73
83
  test_files: []
74
84
 
85
+ has_rdoc: