logstash-filter-aggregate 2.2.0 → 2.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: e023e6c80ed96fa874477b00888a78bf45ee56a7
- data.tar.gz: 1375bebbcda30c0052f8f2aa5188dda548d53d6c
+ metadata.gz: 89d2a20ef87157c9155e42001819efcd8b65c908
+ data.tar.gz: ad218ff47a177f75185534405db70ffa2c7fd7ca
  SHA512:
- metadata.gz: f611bd231a13d8933662dbe7df7075a551e89772f0007d4d694636f307e576c828f091f37c7a3f917b717bec9dbb0acacff944cff14a4fc27bee92318d394e5a
- data.tar.gz: bccd38bcb4f37a1f8f4a06ab01baa92883d6d4e3848ff380fa318583830a3600fe464ecd93f299993a42908ea70f0fd596300b6331d469eadfef6b47b165b4db
+ metadata.gz: cfd494edb94755712121a5a72325eeac1252d0b78e8ddcdffa69b861bc08b682b4cb08b6759e76be8b71066ddbe93d89f786b69891d7c5736ed4bc15ec0e7232
+ data.tar.gz: 1910c47decb6a4fc25cb03315c323a261071a69f7182d62f74f8b393ae02d58102fb9491802f34cb0a2a4dfee3c663b05eaa048e18e23b6346fa8e410a59a249
CHANGELOG.md CHANGED
@@ -1,3 +1,8 @@
+ ## 2.3.0
+ - new feature: Add new option "push_map_as_event_on_timeout" so that when a task timeout happens the aggregation map can be yielded as a new event
+ - new feature: Add new option "timeout_code" which takes the timeout event populated with the aggregation map and executes code on it. This works for "push_map_as_event_on_timeout" as well as "push_previous_map_as_event"
+ - new feature: Add new option "timeout_task_id_field" which is used to map the task_id on timeout events.
+
  ## 2.2.0
  - new feature: add new option "push_previous_map_as_event" so that each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event

CONTRIBUTORS CHANGED
@@ -6,6 +6,7 @@ Maintainers:

  Contributors:
  * Fabien Baligand (fbaligand)
+ * Artur Kronenberg (pandaadb)

  Note: If you've sent us patches, bug reports, or otherwise contributed to
  Logstash, and you aren't on the list above and want to be, please let us know
data/README.md CHANGED
@@ -62,7 +62,7 @@ otherwise events may be processed out of sequence and unexpected results will oc

  the field `sql_duration` is added and contains the sum of all sql queries durations.

- ## Example #2
+ ## Example #2: no start event

  * If you have the same logs than example #1, but without a start log :
  ```
@@ -100,11 +100,63 @@ the field `sql_duration` is added and contains the sum of all sql queries durati
  * the key point is the "||=" ruby operator.
  it allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized

- ## Example #3
+ ## Example #3: no end event

- Third use case : you have no specific start event and no specific end event.
- A typical case is aggregating results from jdbc input plugin.
- * Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+ Third use case: you have no specific end event.
+
+ A typical case is aggregating or tracking user behaviour. We can track a user by its ID through the events; however, once the user stops interacting, the events stop coming in. There is no specific event indicating the end of the user's interaction.
+
+ In this case, we can enable the option 'push_map_as_event_on_timeout' to push the aggregation map as a new event when a timeout occurs.
+ In addition, we can set 'timeout_code' to execute code on the populated timeout event.
+ We can also set 'timeout_task_id_field' to correlate the task_id, which in this case would be the user's ID.
+
+ * Given these logs:
+
+ ```
+ INFO - 12345 - Clicked One
+ INFO - 12345 - Clicked Two
+ INFO - 12345 - Clicked Three
+ ```
+
+ * You can aggregate the number of clicks the user made like this:
+
+ ``` ruby
+ filter {
+   grok {
+     match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:user_id} - %{GREEDYDATA:msg_text}" ]
+   }
+
+   aggregate {
+     task_id => "%{user_id}"
+     code => "map['clicks'] ||= 0; map['clicks'] += 1;"
+     push_map_as_event_on_timeout => true
+     timeout_task_id_field => "user_id"
+     timeout => 600 # 10 minutes timeout
+     timeout_code => "event.tag('_aggregatetimeout')"
+   }
+ }
+ ```
+
+ * After ten minutes, this will yield an event like:
+
+ ``` json
+ {
+   "user_id": "12345",
+   "clicks": 3,
+   "tags": [
+     "_aggregatetimeout"
+   ]
+ }
+ ```
+
+
+ ## Example #4: no end event and tasks come one after the other
+
+ Fourth use case: like example #3, you have no specific end event, but tasks also come one after the other.
+ That is to say: tasks are not interleaved. All task1 events come, then all task2 events come, and so on.
+ In that case, you don't want to wait for the task timeout to flush the aggregation map.
+ * A typical case is aggregating results from the jdbc input plugin.
+ * Given that you have this SQL query: `SELECT country_name, town_name FROM town ORDER BY country_name`
  * Using jdbc input plugin, you get these 3 events from :
  ``` json
  { "country_name": "France", "town_name": "Paris" }
@@ -119,26 +171,26 @@ A typical case is aggregating results from jdbc input plugin.
  * You can do that using `push_previous_map_as_event` aggregate plugin option :
  ``` ruby
  filter {
- aggregate {
- task_id => "%{country_name}"
- code => "
- map['tags'] ||= ['aggregated']
- map['town_name'] ||= []
- event.to_hash.each do |key,value|
- map[key] = value unless map.has_key?(key)
- map[key] << value if map[key].is_a?(Array)
- end
- "
- push_previous_map_as_event => true
- timeout => 5
- }
-
- if "aggregated" not in [tags] {
- drop {}
- }
- }
+   aggregate {
+     task_id => "%{country_name}"
+     code => "
+       map['tags'] ||= ['aggregated']
+       map['town_name'] ||= []
+       event.to_hash.each do |key,value|
+         map[key] = value unless map.has_key?(key)
+         map[key] << value if map[key].is_a?(Array)
+       end
+     "
+     push_previous_map_as_event => true
+     timeout => 5
+   }
+
+   if "aggregated" not in [tags] {
+     drop {}
+   }
+ }
  ```
- * The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+ * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new logstash event (with an 'aggregated' tag), and then creates a new empty map for the next country
  * When 5s timeout comes, the last aggregate map is pushed as a new event
  * Finally, initial events (which are not aggregated) are dropped because useless

@@ -202,6 +254,23 @@ and then creates a new empty map for the next task.
  _WARNING:_ this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
  Default value: `false`

+ - **push_map_as_event_on_timeout**
+ When this option is enabled, each time a task timeout is detected, it pushes the task aggregation map as a new logstash event.
+ This makes it possible to detect and process task timeouts in logstash, and also to manage tasks that have no explicit end event.
+
+ - **timeout_code**
+ The code to execute to complete the timeout-generated event, when 'push_map_as_event_on_timeout' or 'push_previous_map_as_event' is set to true.
+ The code block will have access to the newly generated timeout event that is pre-populated with the aggregation map.
+ If 'timeout_task_id_field' is set, the event is also populated with the task_id value.
+ Example value: `"event.tag('_aggregatetimeout')"`
+
+ - **timeout_task_id_field**
+ This option indicates the field of the timeout-generated event where the "task_id" value should be set.
+ The task id will then be set on the timeout event. This can help correlate which tasks have timed out.
+ This field has no default value and will not be set on the event if not configured.
+ Example:
+ If the task_id is "12345" and this field is set to "my_id", the generated event will have:
+ `event["my_id"] = "12345"`
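Taken together, a minimal filter block wiring up the three new options might look like the following sketch (it mirrors Example #3 above; the `user_id` field and the tag name are illustrative, not required by the plugin):

``` ruby
filter {
  aggregate {
    task_id => "%{user_id}"                            # one aggregation map per user
    code => "map['clicks'] ||= 0; map['clicks'] += 1"
    push_map_as_event_on_timeout => true               # emit the map when the task times out
    timeout => 600
    timeout_task_id_field => "user_id"                 # copy the task_id onto the timeout event
    timeout_code => "event.tag('_aggregatetimeout')"   # post-process the timeout event
  }
}
```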
  ## Changelog
lib/logstash/filters/aggregate.rb CHANGED
@@ -69,7 +69,7 @@ require "thread"
  #
  # the field `sql_duration` is added and contains the sum of all sql queries durations.
  #
- # ==== Example #2
+ # ==== Example #2: no start event
  #
  # * If you have the same logs than example #1, but without a start log :
  # [source,ruby]
@@ -109,9 +109,63 @@ require "thread"
  # * the key point is the "||=" ruby operator. It allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
  #
  #
- # ==== Example #3
+ # ==== Example #3: no end event
+ #
+ # Third use case: you have no specific end event.
+ #
+ # A typical case is aggregating or tracking user behaviour. We can track a user by its ID through the events; however, once the user stops interacting, the events stop coming in. There is no specific event indicating the end of the user's interaction.
+ #
+ # In this case, we can enable the option 'push_map_as_event_on_timeout' to push the aggregation map as a new event when a timeout occurs.
+ # In addition, we can set 'timeout_code' to execute code on the populated timeout event.
+ # We can also set 'timeout_task_id_field' to correlate the task_id, which in this case would be the user's ID.
+ #
+ # * Given these logs:
+ #
+ # [source,ruby]
+ # ----------------------------------
+ # INFO - 12345 - Clicked One
+ # INFO - 12345 - Clicked Two
+ # INFO - 12345 - Clicked Three
+ # ----------------------------------
+ #
+ # * You can aggregate the number of clicks the user made like this:
+ #
+ # [source,ruby]
+ # ----------------------------------
+ # filter {
+ #   grok {
+ #     match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:user_id} - %{GREEDYDATA:msg_text}" ]
+ #   }
+ #
+ #   aggregate {
+ #     task_id => "%{user_id}"
+ #     code => "map['clicks'] ||= 0; map['clicks'] += 1;"
+ #     push_map_as_event_on_timeout => true
+ #     timeout_task_id_field => "user_id"
+ #     timeout => 600 # 10 minutes timeout
+ #     timeout_code => "event.tag('_aggregatetimeout')"
+ #   }
+ # }
+ # ----------------------------------
+ #
+ # * After ten minutes, this will yield an event like:
+ #
+ # [source,json]
+ # ----------------------------------
+ # {
+ #   "user_id": "12345",
+ #   "clicks": 3,
+ #   "tags": [
+ #     "_aggregatetimeout"
+ #   ]
+ # }
+ # ----------------------------------
+ #
+ # ==== Example #4: no end event and tasks come one after the other
  #
- # Third use case : you have no specific start event and no specific end event.
+ # Fourth use case: like example #3, you have no specific end event, but tasks also come one after the other.
+ # That is to say: tasks are not interleaved. All task1 events come, then all task2 events come, and so on.
+ # In that case, you don't want to wait for the task timeout to flush the aggregation map.
  # * A typical case is aggregating results from jdbc input plugin.
  # * Given that you have this SQL query : `SELECT country_name, town_name FROM town`
  # * Using jdbc input plugin, you get these 3 events from :
@@ -150,7 +204,7 @@ require "thread"
  # }
  # }
  # ----------------------------------
- # * The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+ # * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new logstash event (with an 'aggregated' tag), and then creates a new empty map for the next country
  # * When 5s timeout comes, the last aggregate map is pushed as a new event
  # * Finally, initial events (which are not aggregated) are dropped because useless
  #
@@ -195,6 +249,30 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  # Example value : `"map['sql_duration'] += event['duration']"`
  config :code, :validate => :string, :required => true

+
+
+ # The code to execute to complete the timeout-generated event, when 'push_map_as_event_on_timeout' or 'push_previous_map_as_event' is set to true.
+ # The code block will have access to the newly generated timeout event that is pre-populated with the aggregation map.
+ #
+ # If 'timeout_task_id_field' is set, the event is also populated with the task_id value.
+ #
+ # Example value: `"event.tag('_aggregatetimeout')"`
+ config :timeout_code, :validate => :string, :required => false
+
+
+ # This option indicates the field of the timeout-generated event where the "task_id" value should be set.
+ # The task id will then be set on the timeout event. This can help correlate which tasks have timed out.
+ #
+ # This field has no default value and will not be set on the event if not configured.
+ #
+ # Example:
+ #
+ # If the task_id is "12345" and this field is set to "my_id", the generated event will have:
+ # event["my_id"] = "12345"
+ #
+ config :timeout_task_id_field, :validate => :string, :required => false
+
+
  # Tell the filter what to do with aggregate map.
  #
  # `create`: create the map, and execute the code only if map wasn't created before
@@ -225,10 +303,13 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base

  # When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
  # and then creates a new empty map for the next task.
- # 
+ #
  # WARNING: this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
  config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false

+ # When this option is enabled, each time a task timeout is detected, it pushes the task aggregation map as a new logstash event.
+ # This makes it possible to detect and process task timeouts in logstash, and also to manage tasks that have no explicit end event.
+ config :push_map_as_event_on_timeout, :validate => :boolean, :required => false, :default => false

  # Default timeout (in seconds) when not defined in plugin configuration
  DEFAULT_TIMEOUT = 1800
@@ -256,6 +337,11 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  # process lambda expression to call in each filter call
  eval("@codeblock = lambda { |event, map| #{@code} }", binding, "(aggregate filter code)")

+ # process lambda expression to call in the timeout case or previous event case
+ if @timeout_code
+   eval("@timeout_codeblock = lambda { |event| #{@timeout_code} }", binding, "(aggregate filter timeout code)")
+ end
+
  @@mutex.synchronize do
    # define eviction_instance
    if (!@timeout.nil? && (@@eviction_instance.nil? || @timeout < @@eviction_instance.timeout))
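As a side note on the mechanism: the `eval` above compiles the user-supplied `timeout_code` string into a lambda once at register time, instead of re-parsing the string on every timeout. A minimal standalone sketch of that pattern (a plain hash stands in for a `LogStash::Event` here):

``` ruby
# Sketch only: how a config string becomes a reusable code block.
timeout_code = "event['test'] = 'testValue'"     # user-supplied string from the config
timeout_codeblock = eval("lambda { |event| #{timeout_code} }")

event = {}                                       # stand-in for a LogStash::Event
timeout_codeblock.call(event)
puts event['test']                               # => testValue
```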
@@ -317,13 +403,14 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  # retrieve the current aggregate map
  aggregate_maps_element = @@aggregate_maps[task_id]

+
  # create aggregate map, if it doesn't exist
  if (aggregate_maps_element.nil?)
    return if @map_action == "update"
    # create new event from previous map, if @push_previous_map_as_event is enabled
    if (@push_previous_map_as_event and !@@aggregate_maps.empty?)
      previous_map = @@aggregate_maps.shift[1].map
-     event_to_yield = LogStash::Event.new(previous_map)
+     event_to_yield = create_timeout_event(previous_map, task_id)
    end
    aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(Time.now);
    @@aggregate_maps[task_id] = aggregate_maps_element
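One detail worth calling out in this hunk: `@@aggregate_maps.shift[1]` works because Ruby's `Hash#shift` removes and returns the first `[key, value]` pair in insertion order, so `[1]` is the stored element of the oldest task:

``` ruby
# Hash#shift removes and returns the first [key, value] pair.
maps = { "task1" => "element1", "task2" => "element2" }
pair = maps.shift
puts pair.inspect   # => ["task1", "element1"]
puts maps.inspect   # => {"task2"=>"element2"}
```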
@@ -354,6 +441,31 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base

  end

+ # Create a new event from the aggregation_map and the corresponding task_id.
+ # If @timeout_task_id_field is set, the task_id is written to that field of the timeout event;
+ # if @timeout_code is set, the timeout code is executed against the created event.
+ # Returns the newly created event.
+ def create_timeout_event(aggregation_map, task_id)
+   event_to_yield = LogStash::Event.new(aggregation_map)
+
+   if @timeout_task_id_field
+     event_to_yield[@timeout_task_id_field] = task_id
+   end
+
+   # Call code block if available
+   if @timeout_code
+     begin
+       @timeout_codeblock.call(event_to_yield)
+     rescue => exception
+       @logger.error("Aggregate exception occurred. Error: #{exception} ; TimeoutCode: #{@timeout_code} ; TimeoutEventData: #{event_to_yield.instance_variable_get('@data')}")
+       event_to_yield.tag("_aggregateexception")
+     end
+   end
+
+   return event_to_yield
+ end
+
  # Necessary to indicate logstash to periodically call 'flush' method
  def periodic_flush
    true
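Because a failing `timeout_code` is rescued and only surfaced as an `_aggregateexception` tag plus a log line, pipelines will usually want to route such events somewhere visible. A hypothetical output section (the conditional syntax matches the README examples; the index name is illustrative):

``` ruby
output {
  if "_aggregateexception" in [tags] {
    # Timeout events whose timeout_code raised an error land here,
    # so they can be inspected instead of silently mixing with good data.
    elasticsearch { index => "aggregate-errors-%{+YYYY.MM.dd}" }
  }
}
```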
@@ -371,23 +483,24 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  if (@@eviction_instance == self && (@@last_eviction_timestamp.nil? || Time.now > @@last_eviction_timestamp + @timeout / 2))
    events_to_flush = remove_expired_maps()
    @@last_eviction_timestamp = Time.now
+   return events_to_flush
  end
-
- return events_to_flush
+
  end


  # Remove the expired Aggregate maps from @@aggregate_maps if they are older than timeout.
- # If @push_previous_map_as_event option is set, expired maps are returned as new events to be flushed to Logstash pipeline.
+ # If the @push_previous_map_as_event or @push_map_as_event_on_timeout option is set, expired maps are returned as new events to be flushed to the Logstash pipeline.
  def remove_expired_maps()
    events_to_flush = []
    min_timestamp = Time.now - @timeout

    @@mutex.synchronize do
+
      @@aggregate_maps.delete_if do |key, element|
        if (element.creation_timestamp < min_timestamp)
-         if (@push_previous_map_as_event)
-           events_to_flush << LogStash::Event.new(element.map)
+         if (@push_previous_map_as_event) || (@push_map_as_event_on_timeout)
+           events_to_flush << create_timeout_event(element.map, key)
          end
          next true
        end
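To make the sweep above concrete, here is a self-contained sketch of the same `delete_if` pattern, with plain hashes standing in for the element objects and for `create_timeout_event`:

``` ruby
# Entries older than `timeout` seconds are removed from the map,
# and their aggregated data is collected as events to flush.
maps = {
  "task1" => { created: Time.now - 100, map: { "clicks" => 3 } },
  "task2" => { created: Time.now,       map: { "clicks" => 1 } },
}
timeout = 60
min_timestamp = Time.now - timeout
events_to_flush = []

maps.delete_if do |key, element|
  if element[:created] < min_timestamp
    # stand-in for create_timeout_event: merge the task id into the map
    events_to_flush << element[:map].merge("task_id" => key)
    next true   # returning true from delete_if removes the entry
  end
  false
end

p events_to_flush  # => [{"clicks"=>3, "task_id"=>"task1"}]
p maps.keys        # => ["task2"]
```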
@@ -409,4 +522,4 @@ class LogStash::Filters::Aggregate::Element
    @creation_timestamp = creation_timestamp
    @map = {}
  end
- end
+ end
logstash-filter-aggregate.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
    s.name = 'logstash-filter-aggregate'
-   s.version = '2.2.0'
+   s.version = '2.3.0'
    s.licenses = ['Apache License (2.0)']
    s.summary = "The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event."
    s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
spec/filters/aggregate_spec.rb CHANGED
@@ -10,7 +10,7 @@ describe LogStash::Filters::Aggregate do
    aggregate_maps.clear()
    @start_filter = setup_filter({ "map_action" => "create", "code" => "map['sql_duration'] = 0" })
    @update_filter = setup_filter({ "map_action" => "update", "code" => "map['sql_duration'] += event['duration']" })
-   @end_filter = setup_filter({ "map_action" => "update", "code" => "event.to_hash.merge!(map)", "end_of_task" => true, "timeout" => 5 })
+   @end_filter = setup_filter({ "timeout_task_id_field" => "my_id", "push_map_as_event_on_timeout" => true, "map_action" => "update", "code" => "event.to_hash.merge!(map)", "end_of_task" => true, "timeout" => 5, "timeout_code" => "event['test'] = 'testValue'" })
  end

  context "Start event" do
@@ -170,13 +170,18 @@ describe LogStash::Filters::Aggregate do

  describe "timeout defined on the filter" do
    it "event is not removed if not expired" do
-     @end_filter.flush()
+     entries = @end_filter.flush()
      expect(aggregate_maps.size).to eq(1)
+     expect(entries).to be_empty
    end
-   it "event is removed if expired" do
+   it "removes event if expired and creates a new timeout event" do
      sleep(2)
-     @end_filter.flush()
+     entries = @end_filter.flush()
      expect(aggregate_maps).to be_empty
+     expect(entries.size).to eq(1)
+     expect(entries[0]['my_id']).to eq("id_123") # task id
+     expect(entries[0]["sql_duration"]).to eq(0) # Aggregation map
+     expect(entries[0]['test']).to eq("testValue") # Timeout code
    end
  end

@@ -231,13 +236,14 @@ describe LogStash::Filters::Aggregate do

  describe "when timeout happens, " do
    it "flush method should return last map as new event" do
-     push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1 })
+     push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1, "timeout_code" => "event['test'] = 'testValue'" })
      push_filter.filter(event({"taskid" => "1"}))
      sleep(2)
      events_to_flush = push_filter.flush()
      expect(events_to_flush).not_to be_nil
      expect(events_to_flush.size).to eq(1)
      expect(events_to_flush[0]["taskid"]).to eq("1")
+     expect(events_to_flush[0]['test']).to eq("testValue")
      expect(aggregate_maps.size).to eq(0)
    end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: logstash-filter-aggregate
  version: !ruby/object:Gem::Version
- version: 2.2.0
+ version: 2.3.0
  platform: ruby
  authors:
  - Elastic
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-07-09 00:00:00.000000000 Z
+ date: 2016-08-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement