rec 1.0.2 → 1.0.4

Files changed (3)
  1. data/lib/EXAMPLES +226 -0
  2. data/lib/README +211 -0
  3. metadata +22 -11
data/lib/EXAMPLES ADDED
@@ -0,0 +1,226 @@
1
+ = REC Examples
2
+ The best way to understand REC is to see how rules are written.
3
+
4
+ The early examples were inspired by Risto Vaarandi's brilliant SEC (http://simple-evcorr.sourceforge.net/),
5
+ so they employ similar names for easy comparison.
6
+
7
+ == Single Threshold
8
+ We are monitoring events where a user has had 3 incorrect password attempts.
9
+ If we see that happen 3 times (+threshold+) within a minute (+lifespan+), alert the administrator.
10
+
11
+ # single threshold rule
12
+ Rule.new(10034, {
13
+ :pattern => /\w+ sudo\: (\w+) \: 3 incorrect password attempts/,
14
+ :details => ["userid"],
15
+ :message => "Failed sudo password for user %userid$s",
16
+ :lifespan => 60,
17
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
18
+ :threshold => 3
19
+ }) { |state|
20
+ if state.count == state.threshold
21
+ Notify.urgent(state.generate_alert())
22
+ state.release()
23
+ end
24
+ }
25
+
26
+ When we see the first event, a state is created with title "Failed sudo password for user richard".
27
+ The second event has no effect, beyond automatically incrementing the count.
28
+ When we see the third event, an output message is generated and logged, and then the generated
29
+ message is also sent via IM to the administrator. Alternatively:
30
+
31
+ }) { |state|
32
+ message = state.generate_alert() # writes out a new log entry, and returns it
33
+ Notify.urgent(message) # sends the message to the administrator
34
+ }
35
+
36
+ Finally, the state is released (we forget all about it).
37
+
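The counting behaviour described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not REC's implementation: the class +ThresholdCounter+ and its +event+ method are hypothetical names, and times are plain seconds rather than parsed log timestamps.

```ruby
# Hypothetical stand-in for REC's state table; not part of the gem.
class ThresholdCounter
  def initialize(threshold, lifespan)
    @threshold = threshold
    @lifespan  = lifespan
    @states    = {}   # title => { :count, :created }
  end

  # Records one event at time +now+ (seconds).
  # Returns true when the event brings the count up to the threshold.
  def event(title, now)
    state = @states[title]
    # an expired state is forgotten, so counting starts afresh
    state = nil if state && now - state[:created] > @lifespan
    state ||= { :count => 0, :created => now }
    state[:count] += 1
    if state[:count] == @threshold
      @states.delete(title)   # release: forget the state
      return true
    end
    @states[title] = state
    false
  end
end

counter = ThresholdCounter.new(3, 60)
title   = "Failed sudo password for user richard"
results = [0, 20, 40, 50].map { |t| counter.event(title, t) }
p results
```

The third event reaches the threshold and releases the state, so the fourth event starts a new count.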
38
+ If there is a fourth event, that would then create another state of the same kind which
39
+ would start counting again. Suppose we wanted to avoid that, and just keep on ignoring any
40
+ more events in a sliding window until the user has given it a 3 minute rest.
41
+ The action could be modified in this way:
42
+
43
+ }) { |state|
44
+ Notify.urgent(state.generate_alert()) if state.count == state.threshold
45
+ # keep on pushing expiry out to 3 minutes after the last event
46
+ state.extend_for(180) if state.count >= state.threshold
47
+ }
48
+
49
+ Suppose we want to check for 3 events within 60 seconds, and then ignore further events
50
+ for a fixed 5 minutes.
51
+
52
+ }) { |state|
53
+ if state.count == state.threshold
54
+ Notify.urgent(state.generate_alert())
55
+ state.extend_for(300) # expire exactly 5 minutes after the 3rd event
56
+ end
57
+ }
58
+
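The difference between the sliding and the fixed window can be seen by tracing expiry times by hand. The sketch below is a model only: +final_expiry+ is a hypothetical helper, and it assumes that +extend_for(n)+ moves the expiry to n seconds after the current event, as the comments in the rules above suggest.

```ruby
# Hypothetical model of how a state's expiry moves; not REC code.
# The block decides, from the running count, whether an event extends the state.
def final_expiry(event_times, lifespan, extension)
  expires = event_times.first + lifespan
  event_times.each_with_index do |t, i|
    break if t > expires            # state already expired (times are sorted)
    expires = t + extension if yield(i + 1)
  end
  expires
end

events = [0, 10, 20, 100, 250]

# sliding window: every event at or past the threshold pushes expiry out again
sliding = final_expiry(events, 60, 180) { |count| count >= 3 }

# fixed window: only the event that reaches the threshold sets the expiry
fixed = final_expiry(events, 60, 300) { |count| count == 3 }

p sliding
p fixed
```

With these event times, the sliding state keeps being extended by later events, while the fixed state expires 300 seconds after the third event regardless of what follows.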
59
+ == Adding a final block
60
+ If we want to see one message when the user first has trouble, then another message
61
+ after he has decided to stop trying, the format is a little different. The block
62
+ given in previous examples is stored in the +params+ as +action+.
63
+
64
+ Instead, the +action+ block may be specified directly as a member of the params hash,
65
+ and the +final+ block must be specified in this way if it is to be used.
66
+
67
+ Rule.new(10034, {
68
+ :pattern => /^\s+\w+\s+sudo\[\d+\]\:\s+(\w+) \:/,
69
+ :details => ["userid"],
70
+ :message => "sudo activity for user %userid$s",
71
+ :threshold => 3,
72
+ :lifespan => 60,
73
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
74
+ :expiry => "'Gave sudo a rest' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
75
+ :action => Proc.new { |state|
76
+ if state.count == state.threshold
77
+ Notify.urgent(state.generate(:alert))
78
+ state.release()
79
+ end
80
+ },
81
+ :final => Proc.new { |state|
82
+ Notify.normal(state.generate(:expiry))
83
+ }
84
+ })
85
+
86
+ When the state is about to expire, its :final block will be called. In this case,
87
+ it generates a log entry using the :expiry message template, and sends the message
88
+ to the administrator via normal (email) delivery.
89
+
90
+ == Event compression
91
+ Compression involves converting a stream of events into fewer, preferably one. In this example,
92
+ we report when a skype conversation starts and then suppress all further noise for about 8 minutes.
93
+
94
+ # suppression rule
95
+ Rule.new(10035, {
96
+ :pattern => /^\s\w+\sFirewall\[\d+\]\:\sSkype is listening from 0.0.0.0:(\d+)/,
97
+ :details => ["port"],
98
+ :message => "Skype conversation started on port %port$d",
99
+ :alert => "Skype running on port %port$d",
100
+ :lifespan => 479
101
+ }) { |state|
102
+ state.generate_first_only(:alert)
103
+ }
104
+
105
+ The <code>generate_first_only</code> method creates a new event using the :alert template
106
+ only if the state's +count+ is 1, so it notices the first event and ignores all subsequent
107
+ events as long as the state lives.
108
+
109
+ By default, generate() and generate_first_only() use the :alert template.
110
+ If no :alert was provided, the :message will be used instead. In this example,
111
+ we could have omitted the argument:
112
+
113
+ }) { |state|
114
+ state.generate_first_only()
115
+ }
116
+
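Event compression can be illustrated without REC at all. In this minimal sketch, +compress+ is a hypothetical function, events are [time, title] pairs with times in seconds, and a title is reported only when no live state exists for it.

```ruby
# Hypothetical sketch: emit one event per title while its state is alive.
def compress(events, lifespan)
  alive_until = {}   # title => time at which the state expires
  output = []
  events.each do |time, title|
    if alive_until[title].nil? || time > alive_until[title]
      alive_until[title] = time + lifespan   # new state, count is 1
      output << [time, title]
    end
    # otherwise the count is above 1 and the event is swallowed
  end
  output
end

title  = "Skype conversation started on port 51000"
events = [[0, title], [30, title], [400, title], [500, title]]
compressed = compress(events, 479)
p compressed
```

The events at 30 and 400 seconds fall within the 479 second lifespan and are suppressed; the event at 500 seconds arrives after expiry and is reported as a new conversation.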
117
+ == Pairs of rules
118
+ We want to know when a server goes down, and when it comes back up again.
119
+ In this example, rule 10036 creates a new log entry when we first
120
+ see the server is not responding, and the state persists for 5 minutes.
121
+
122
+ # pair rule
123
+ Rule.new(10036, {
124
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
125
+ :details => ["host"],
126
+ :message => "Server %host$s is down",
127
+ :lifespan => 300
128
+ }) { |state|
129
+ state.generate_first_only()
130
+ }
131
+
132
+ Rule 10037 looks for a message saying the server is OK, *AND* that there is a state
133
+ with a title like "Server earth is down". The :allstates parameter contains an array of
134
+ templates - the rule does not react to the event unless all of the named states exist.
135
+
136
+ When all the conditions are satisfied, the rule generates a new log entry that
137
+ the server is up, and then forgets both states.
138
+
139
+ Rule.new(10037, {
140
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
141
+ :details => ["host"],
142
+ :message => "Server %host$s is up again",
143
+ :allstates => ["Server %host$s is down"]
144
+ }) {|state|
145
+ state.generate()
146
+ state.release("Server %host$s is down")
147
+ state.release()
148
+ }
149
+
150
+ Since no :alert is specified, it defaults to the :message. So +generate+ will
151
+ log a message that "Server earth is up again".
152
+
153
+ == Correlating events (and states)
154
+ Now suppose we want to know how long the server was down. We have two options:
155
+ 1. we could add a final block to rule 10036 to report its age, but that would
156
+ just create an extra message and that's what we're trying to get away from
157
+ 2. we could report the duration in a single "Server earth is up again" message
158
+
159
+ Since we've already seen how to add a final block, let's take option 2.
160
+
161
+ Rule.new(10037, {
162
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) OK/,
163
+ :details => ["host"],
164
+ :message => "Server %host$s is up again after %outage$d minutes",
165
+ :allstates => ["Server %host$s is down"]
166
+ }) {|state|
167
+ duration = State.find("Server %host$s is down", state).age()
168
+ state.params[:outage] = (duration/60).to_i()
169
+ state.generate()
170
+ state.release("Server %host$s is down")
171
+ state.release()
172
+ }
173
+
174
+ We can obtain the duration of the outage with the State#find method, which
175
+ interpolates the current state's values into the template, and finds the
176
+ matching state.
177
+
178
+ We now need to store that duration into the state's values as an integer, because
179
+ sprintf %d expects an integer.
180
+
181
+ Having calculated the duration, we generate the message, and forget both states.
182
+
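The pairing logic can be traced with a plain hash standing in for the state table. This is a sketch under assumptions, not REC's code: the +states+ hash, the sample log and its times (in seconds) are all made up for illustration, and the lifespan is not modelled.

```ruby
# Hypothetical stand-in for the state table: title => creation time (seconds).
states = {}
output = []

log = [
  [0,   "nfs: server earth not responding"],
  [60,  "nfs: server earth not responding"],
  [250, "nfs: server earth OK"]
]

log.each do |time, message|
  case message
  when /nfs: server (\w+) not responding/
    title = "Server #{$1} is down"
    unless states.key?(title)       # first event only
      states[title] = time
      output << title
    end
  when /nfs: server (\w+) OK/
    title = "Server #{$1} is down"
    if states.key?(title)           # react only if the 'down' state exists
      outage = ((time - states[title]) / 60).to_i
      output << format("Server %s is up again after %d minutes", $1, outage)
      states.delete(title)          # forget the 'down' state
    end
  end
end

puts output
```

The second "not responding" event adds nothing to the output, and the outage is reported once, in whole minutes, when the server recovers.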
183
+ == Shortcut actions
184
+ Several actions are so common they have been provided as constants to make the rules
185
+ more succinct but still readable. One is to generate a message on the first event only:
186
+
187
+ Rule.new(10036, {
188
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
189
+ :details => ["host"],
190
+ :message => "Server %host$s is down",
191
+ :lifespan => 300
192
+ }) { |state|
193
+ state.generate_first_only()
194
+ }
195
+
196
+ can be abbreviated in this way:
197
+
198
+ Rule.new(10036, {
199
+ :pattern => /^\s\w+\s\w+\: nfs\: server (\w+) not responding/,
200
+ :details => ["host"],
201
+ :message => "Server %host$s is down",
202
+ :lifespan => 300,
203
+ :action => State::Generate_first_only
204
+ })
205
+
206
+ Another common action is to generate a message and release the state immediately:
207
+
208
+ Rule.new(10040, {
209
+ :pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
210
+ :details => ["user", "ip"],
211
+ :message => "User %user$s signed in via SSH from %ip$s",
212
+ }) { |state|
213
+ state.generate()
214
+ state.release()
215
+ }
216
+
217
+ can be abbreviated in this way:
218
+
219
+ Rule.new(10040, {
220
+ :pattern => /Accepted password for (\w+) from (\d+\.\d+\.\d+\.\d+)/,
221
+ :details => ["user", "ip"],
222
+ :message => "User %user$s signed in via SSH from %ip$s",
223
+ :action => State::Generate_and_release
224
+ })
225
+
226
+
data/lib/README ADDED
@@ -0,0 +1,211 @@
1
+ = Ruby Event Correlation
2
+ Correlates events in order to generate a smaller set of more meaningful events.
3
+
4
+ == Installation
5
+ 1. Install the gem
6
+ $ sudo gem install rec
7
+
8
+ 2. Select a ruleset or create your own
9
+ #!/usr/bin/ruby
10
+ require 'rec'
11
+ include REC
12
+ require 'rulesets/postfix-rules'
13
+ Correlator::start()
14
+
15
+ 3. Start it up
16
+ $ rulesets/rules.rb < /var/log/mail.log 3>missed.log 2>control.log > newevents.log
17
+
18
+ == Why correlate events?
19
+ We all know that we should read our log files. But reading log files is *really* boring,
20
+ and frankly it's easy to miss important things in all the superfluous detail.
21
+
22
+ [Save time]
23
+ If you are lazy enough to not want to review all of your log files manually forever, and
24
+ smart enough to work out what needs monitoring and when you might want to pay attention,
25
+ then wouldn't it be good if you could define those rules and let the computer do what it
26
+ does best?
27
+
28
+ [Generate meaning]
29
+ The logs of many applications are filled with entries that are quite low level - perhaps
30
+ wonderful for debugging, but typically not terribly meaningful in terms of business.
31
+ Wouldn't it be good if we could summarise a bunch of low level events into a single
32
+ business event - and then just read the <em>business log</em>?
33
+
34
+ == Alternatives
35
+ There are several alternatives to REC which may suit your needs better:
36
+ * splunk[www.splunk.com]
37
+ * nagios[www.nagios.com]
38
+ * scalextreme.com[www.scalextreme.com]
39
+ While I like these options, I find they take a lot of configuring.
40
+ They also have some dependencies that make them a bit heavier than you may want.
41
+ If you just want to keep track of a few kinds of events, want a lot of flexibility
42
+ and control without too much effort, then REC may be of some value.
43
+
44
+ == How does REC work?
45
+ Each entry in a log file is an *event*.
46
+ The Correlator reads the events, and attempts to match an event against each Rule.
47
+ If an event matches a rule, the rule creates a State which just means we're remembering
48
+ that the event matched a rule. The pattern to match is a regexp, and the captured values
49
+ are named. For example
50
+ # log entry => "nfs: server earth not responding"
51
+ pattern => /nfs\: server (\w+) not responding/
52
+ details => ['host']
53
+ # values of interest are captured into a hash => {'host' => 'earth' }
54
+ :message => "Server %host$s is down"
55
+ # interpolation with named parameters => "Server earth is down"
56
+
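The capture-and-interpolate step above can be approximated in plain Ruby. The helper +interpolate+ below is hypothetical and is not REC's own method; it only assumes the <code>%name$format</code> placeholder shape seen in the templates.

```ruby
# Hypothetical helper: expands "%name$fmt" placeholders from a hash of values.
# It mimics the template shape shown above and is not part of REC.
def interpolate(template, values)
  template.gsub(/%(\w+)\$([-0-9.]*[sdf])/) do
    name, fmt = $1, $2
    format("%#{fmt}", values.fetch(name))
  end
end

pattern = /nfs\: server (\w+) not responding/
details = ['host']
entry   = "nfs: server earth not responding"

match  = pattern.match(entry)
values = Hash[details.zip(match.captures)]   # names paired with captured values
title  = interpolate("Server %host$s is down", values)
puts title
```

The same helper handles numeric formats, so a template such as <code>attempts=%count$d dur=%dur$0.3fs</code> would format +count+ as an integer and +dur+ to three decimal places, leaving the trailing "s" as literal text.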
57
+ A state has a fixed lifetime, set when it is created. At the end of its life, it may simply
58
+ expire quietly, or a pre-defined action may be executed. For example, if we find a server is down,
59
+ we may want to wait for 3 minutes and if it is not up again, then alert the administrator.
60
+ The server being down is a state, and two states are distinguished by their *titles*. For example,
61
+ "host earth is down" and "host terra is down".
62
+
63
+ Now that we're remembering a set of states, we can match events against not only the event's
64
+ message, but also other states. For example, we can match "host terra is up" against a previously
65
+ created state "host terra is down", and generate a new event "host terra is back up after 14 minutes".
66
+ We can also 'swallow' all of the rest of the "host terra is down" events because they add nothing new.
67
+ This <em>event compression</em> means the administrator gets one important message, and not 27
68
+ distracting alerts.
69
+
70
+ A notification can be sent by email or IM, depending on your preferences and working hours.
71
+ The destinations and credentials are supplied to your ruleset:
72
+ # For better security, move the next few lines into a file readable only by
73
+ # the user running this script eg. /home/rec/alert.conf
74
+ # and then require that file
75
+ Notify.smtp_credentials("rec@gmail.com", "recret", "myfirm.com")
76
+ Notify.emailTo = "me@myfirm.com"
77
+ Notify.jabber_credentials("rec@gmail.com", "recret")
78
+ Notify.jabberTo = "me@myfirm.com"
79
+
80
+ Rules can then send an alert when desired. Two common cases involve alerting immediately
81
+ on the first event (eg. "host terra is down"), and alerting on expiry or at a subsequent event
82
+ (eg. "host terra is back up").
83
+ state.alert_first_only() # => generate a new event on first original event
84
+ # or
85
+ Notify.normal(state.alert_first_only()) # => log and also send the new event via email
86
+
87
+ In most cases, however, it is not necessary to alert the administrator at all. It is enough to
88
+ log the new event in the output logfile for later review.
89
+
90
+ == Anatomy of a Rule
91
+ Warn if a user is having trouble executing sudo commands.
92
+ The log entry (/var/log/secure) looks like this:
93
+
94
+ Sep 16 07:09:22 earth sudo: richard : 3 incorrect password attempts ;...
95
+
96
+ and the rule might look like this:
97
+
98
+ # single threshold rule
99
+ Rule.new(10034, {
100
+ :pattern => /\w+ sudo\: (\w+) \: 3 incorrect password attempts/,
101
+ :details => ["userid"],
102
+ :message => "Failed sudo password for user %userid$s",
103
+ :lifespan => 60,
104
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
105
+ :threshold => 3,
106
+ :capture => true
107
+ }) { |state|
108
+ if state.count == state.threshold
109
+ Notify.urgent(state.generate_alert())
110
+ state.release()
111
+ end
112
+ }
113
+
114
+ Let's look at each part:
115
+ [Rule ID]
116
+ Each rule must have a unique integer ID (+rid+).
117
+ It is the first argument and is mandatory.
118
+ It's probably a good idea to 'reserve' a number range for a ruleset
119
+ to keep them separate from other rules (eg. 17801-17899 for Postfix-related rules).
120
+
121
+ The second argument is a hash of options:
122
+ [pattern]
123
+ The +pattern+ is a regexp designed to match certain log messages.
124
+ A +message+ is what's left of a log entry after we have removed the timestamp and
125
+ any priority level. For example:
126
+ [Thu Aug 16 16:11:21 2012] [error] ap_proxy_connect_backend disabling worker for (127.0.0.1)
127
+ # timestamp parsed => 2012-08-16T16:11:21+10:00
128
+ # priority ignored => "error"
129
+ # message => "ap_proxy_connect_backend disabling worker for (127.0.0.1)"
130
+
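Splitting a log entry into timestamp, priority and message might look roughly like this. The helper +split_entry+ is hypothetical and handles only the Apache-style layout shown above; REC's own parsing may work differently.

```ruby
require 'time'

# Hypothetical helper: splits an Apache-style error log entry into its parts.
# Returns nil when the entry does not have the expected layout.
def split_entry(entry)
  m = /\A\[([^\]]+)\] \[(\w+)\] (.*)\z/.match(entry)
  return nil unless m
  { :timestamp => Time.parse(m[1]), :priority => m[2], :message => m[3] }
end

entry = "[Thu Aug 16 16:11:21 2012] [error] ap_proxy_connect_backend disabling worker for (127.0.0.1)"
parts = split_entry(entry)
puts parts[:message]
```

Only the +message+ part would then be matched against a rule's +pattern+. The timestamp is parsed in the local time zone here, since the entry carries no zone of its own.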
131
+ [details]
132
+ The pattern may contain regexp 'captures' (eg. (\d+\.\d+\.\d+\.\d+) to capture the ip).
133
+ For each capture a name should be specified in the +details+ array.
134
+ The sequence of captures is as specified for ruby Regexps.
135
+ :pattern => /\w+ sudo\: (\w+) \: (\d) incorrect password attempts/,
136
+ :details => ["userid", "failures"],
137
+ The names chosen for captured values are used as keys to store the values in the same
138
+ hash that stores the parameters, so do *not* choose words like +pattern+, +details+,
139
+ +message+, +threshold+, +lifespan+, +alert+, +capture+, +continue+, or +action+.
140
+
141
+ [message]
142
+ The +message+ is a string template into which the captured values are interpolated
143
+ to produce a unique key for a state.
144
+ :details => ["userid"],
145
+ :message => "Failed sudo password for user %userid$s",
146
+ # userid = "richard" => "Failed sudo password for user richard"
147
+ Note the modified +sprintf+ syntax: the value of +userid+ is inserted into the message
148
+ as a string by the String::sprinth method. This becomes the +title+ and key for the state
149
+ created by this rule.
150
+
151
+ [lifespan]
152
+ When a rule creates a state, we need to know how long to remember the state for, and
153
+ when to expire it. The +lifespan+ specifies that duration in seconds.
154
+
155
+ It is also possible to extend the life of a state should other events take place (with
156
+ State::live_another) in the same way that a web session may be extended for another 10
157
+ minutes at each request.
158
+
159
+ [alert]
160
+ This is a string template used to generate an output log message (the timestamp will be
161
+ prefixed automatically to complete the log entry).
162
+ :alert => "'Too much sudo activity' userid=%userid$s attempts=%count$d dur=%dur$0.3fs ",
163
+ By convention, we make our log messages very easy to parse by creating name=value pairs,
164
+ and single-quoting strings containing spaces, in case the output will be processed further.
165
+
166
+ If no +alert+ is provided, it will default to +message+.
167
+
168
+ [capture]
169
+ The +capture+ parameter tells REC to store the original log entries in the state (in the
170
+ +logs+ attribute). In this way you could, for example, extract a transcript of each web session from
171
+ a noisy access log, and output them as each session finishes or expires.
172
+ :capture => true
173
+
174
+ [threshold]
175
+ This parameter is used in the action.
176
+ :threshold => 3
177
+
178
+ [allstates]
179
+ An array of templates used to determine if matching states exist. All the mentioned
180
+ states must be found or the rule will not take any action.
181
+
182
+ [anystates]
183
+ An array of templates used to determine if matching states exist. If any one of the
184
+ mentioned states exist, then the rule will execute its action.
185
+
186
+ [notstates]
187
+ An array of templates used to determine if matching states exist. If any one of the
188
+ mentioned states does exist, then the rule will *not* execute its action.
189
+
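The three state conditions can be read as a single predicate. The function +conditions_met?+ below is a hypothetical sketch of that reading, assuming the templates have already been interpolated into titles; REC's own matching logic may differ in detail.

```ruby
# Hypothetical predicate over the titles of the states that currently exist.
def conditions_met?(existing, allstates = [], anystates = [], notstates = [])
  # every title in allstates must exist
  return false unless allstates.all? { |t| existing.include?(t) }
  # if anystates is given, at least one of its titles must exist
  return false unless anystates.empty? || anystates.any? { |t| existing.include?(t) }
  # no title in notstates may exist
  return false if notstates.any? { |t| existing.include?(t) }
  true
end

existing = ["Server earth is down", "Backup running"]

p conditions_met?(existing, ["Server earth is down"])
p conditions_met?(existing, ["Server terra is down"])
p conditions_met?(existing, [], ["Server terra is down", "Backup running"])
p conditions_met?(existing, [], [], ["Backup running"])
```

The first and third calls are satisfied; the second fails because the required state is missing, and the fourth fails because a forbidden state exists.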
190
+ [Any arbitrary parameter]
191
+ Any arbitrary parameters may be added to the rule, and they are passed on to the
192
+ state in the +params+ hash.
193
+
194
+ The third argument is a block.
195
+ [action]
196
+ The action is a block with a single argument which is the state created by the rule.
197
+ The +count+ of matched events is maintained automatically. In this case, when we have
198
+ seen 3 events, we generate an output log entry and also send it by IM, then release
199
+ the state (forget about it).
200
+ :threshold => 3
201
+ }) { |state|
202
+ if state.count == state.threshold
203
+ Notify.urgent(state.generate_alert())
204
+ state.release()
205
+ end
206
+ }
207
+ By the magic of Ruby's #method_missing method (Yes, I'm looking at you Java!) we can
208
+ refer to any parameter succinctly instead of a cumbersome hash notation, so:
209
+ state.threshold === state.params['threshold']
210
+
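The +method_missing+ shortcut can be demonstrated with a miniature class. +MiniState+ is a hypothetical stand-in and not REC's State class; it assumes string keys in the params hash, as in the line above.

```ruby
# Hypothetical miniature of a state that exposes its params as methods.
class MiniState
  attr_reader :params

  def initialize(params)
    @params = params
  end

  # Unknown method names are looked up as keys in the params hash.
  def method_missing(name, *args)
    key = name.to_s
    return @params[key] if args.empty? && @params.key?(key)
    super
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @params.key?(name.to_s) || super
  end
end

state = MiniState.new('threshold' => 3, 'userid' => 'richard')
puts state.threshold
puts state.userid
```

Names that are not keys in the hash still raise NoMethodError, because the lookup falls through to +super+.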
211
+ For more examples, see the EXAMPLES page.
metadata CHANGED
@@ -1,12 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rec
3
3
  version: !ruby/object:Gem::Version
4
- prerelease: false
4
+ hash: 31
5
+ prerelease:
5
6
  segments:
6
7
  - 1
7
8
  - 0
8
- - 2
9
- version: 1.0.2
9
+ - 4
10
+ version: 1.0.4
10
11
  platform: ruby
11
12
  authors:
12
13
  - Richard Kernahan
@@ -14,8 +15,7 @@ autorequire:
14
15
  bindir: bin
15
16
  cert_chain: []
16
17
 
17
- date: 2012-09-17 00:00:00 +10:00
18
- default_executable:
18
+ date: 2012-09-17 00:00:00 Z
19
19
  dependencies: []
20
20
 
21
21
  description: "\t\tSifts through your log files in real time, using stateful intelligence to determine\n\
@@ -30,8 +30,9 @@ executables: []
30
30
 
31
31
  extensions: []
32
32
 
33
- extra_rdoc_files: []
34
-
33
+ extra_rdoc_files:
34
+ - lib/README
35
+ - lib/EXAMPLES
35
36
  files:
36
37
  - lib/rec.rb
37
38
  - lib/rec/rule.rb
@@ -40,35 +41,45 @@ files:
40
41
  - lib/rec/notify.rb
41
42
  - lib/rec/mock-notify.rb
42
43
  - lib/string.rb
43
- has_rdoc: true
44
+ - lib/README
45
+ - lib/EXAMPLES
44
46
  homepage: http://rubygems.org/gems/rec
45
47
  licenses: []
46
48
 
47
49
  post_install_message:
48
- rdoc_options: []
49
-
50
+ rdoc_options:
51
+ - --show-hash
52
+ - --main
53
+ - lib/README
54
+ - --title
55
+ - REC -- Ruby Event Correlation
50
56
  require_paths:
51
57
  - lib
52
58
  required_ruby_version: !ruby/object:Gem::Requirement
59
+ none: false
53
60
  requirements:
54
61
  - - ">="
55
62
  - !ruby/object:Gem::Version
63
+ hash: 3
56
64
  segments:
57
65
  - 0
58
66
  version: "0"
59
67
  required_rubygems_version: !ruby/object:Gem::Requirement
68
+ none: false
60
69
  requirements:
61
70
  - - ">="
62
71
  - !ruby/object:Gem::Version
72
+ hash: 3
63
73
  segments:
64
74
  - 0
65
75
  version: "0"
66
76
  requirements: []
67
77
 
68
78
  rubyforge_project: rec
69
- rubygems_version: 1.3.6
79
+ rubygems_version: 1.8.24
70
80
  signing_key:
71
81
  specification_version: 3
72
82
  summary: Ruby event correlation
73
83
  test_files: []
74
84
 
85
+ has_rdoc: