logstash-filter-aggregate 2.1.2 → 2.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f880dab9eca9eca31f5634938e973fd0bc7d739b
- data.tar.gz: e2f233c165cd6f944c7f0cc63968e2bb6428f694
+ metadata.gz: e023e6c80ed96fa874477b00888a78bf45ee56a7
+ data.tar.gz: 1375bebbcda30c0052f8f2aa5188dda548d53d6c
  SHA512:
- metadata.gz: 8cbf67652af3eaf6547302b701a6aec844e7ca1a82b722a83ca048724a3ce252e1e5ed2bbf5c2fee8533f8fd1ab387791e7b7749fed33a6f745c5650bc57632e
- data.tar.gz: ec57ad6ef531427c1c43800340f8352b44d954aba39bf3f786bf315a7f843458f8252f846f60b4c2c9312c0e94293bc2ec47aad0ad5088b291326083cd00ab76
+ metadata.gz: f611bd231a13d8933662dbe7df7075a551e89772f0007d4d694636f307e576c828f091f37c7a3f917b717bec9dbb0acacff944cff14a4fc27bee92318d394e5a
+ data.tar.gz: bccd38bcb4f37a1f8f4a06ab01baa92883d6d4e3848ff380fa318583830a3600fe464ecd93f299993a42908ea70f0fd596300b6331d469eadfef6b47b165b4db
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
+ ## 2.2.0
+ - new feature: add new option "push_previous_map_as_event" so that each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new Logstash event
+
  ## 2.1.2
  - bugfix: clarify default timeout behaviour : by default, timeout is 1800s
  
data/README.md CHANGED
@@ -4,9 +4,8 @@
  
  The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event.
  
- You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
- correctly otherwise documents
- may be processed out of sequence and unexpected results will occur.
+ You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly,
+ otherwise events may be processed out of sequence and unexpected results will occur.
  
  ## Example #1
  
@@ -101,6 +100,47 @@ the field `sql_duration` is added and contains the sum of all sql queries durations
  * the key point is the "||=" ruby operator.
  it allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
  
+ ## Example #3
+
+ Third use case: you have no specific start event and no specific end event.
+ A typical case is aggregating results from the jdbc input plugin.
+ * Given that you have this SQL query: `SELECT country_name, town_name FROM town`
+ * Using the jdbc input plugin, you get these 3 events:
+ ``` json
+ { "country_name": "France", "town_name": "Paris" }
+ { "country_name": "France", "town_name": "Marseille" }
+ { "country_name": "USA", "town_name": "New-York" }
+ ```
+ * And you would like to push these 2 result events into elasticsearch:
+ ``` json
+ { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+ { "country_name": "USA", "town_name": [ "New-York" ] }
+ ```
+ * You can do that using the `push_previous_map_as_event` aggregate plugin option:
+ ``` ruby
+ filter {
+   aggregate {
+     task_id => "%{country_name}"
+     code => "
+       map['tags'] ||= ['aggregated']
+       map['town_name'] ||= []
+       event.to_hash.each do |key,value|
+         map[key] = value unless map.has_key?(key)
+         map[key] << value if map[key].is_a?(Array)
+       end
+     "
+     push_previous_map_as_event => true
+     timeout => 5
+   }
+
+   if "aggregated" not in [tags] {
+     drop {}
+   }
+ }
+ ```
+ * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new Logstash event (with the 'aggregated' tag), and then creates a new empty map for the next country
+ * When the 5s timeout expires, the last aggregate map is pushed as a new event
+ * Finally, the initial events (which are not aggregated) are dropped because they are no longer needed
  
  ## How it works
  - the filter needs a "task_id" to correlate events (log lines) of a same task
@@ -114,7 +154,7 @@ it allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
  
  ## Use Cases
  - extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
- - extract error information in any task log line, and push it in final task event (to get a final document with all error information if any)
+ - extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
  - extract all back-end calls as a list, and push this list in final task event (to get a task profile)
  - extract all http headers logged in several lines to push this list in final task event (complete http request info)
  - for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -156,6 +196,12 @@ If not defined, aggregate maps will not be stored at logstash stop and will be lost
  Must be defined in only one aggregate filter (as aggregate maps are global).
  Example value : `"/path/to/.aggregate_maps"`
  
+ - **push_previous_map_as_event:**
+ When this option is enabled, each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new Logstash event,
+ and then creates a new empty map for the next task.
+ _WARNING:_ this option works fine only if tasks come one after the other. It means: all task1 events, then all task2 events, etc.
+ Default value: `false`
+
  
  ## Changelog
  
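Since `push_previous_map_as_event` only behaves correctly when all events of a task arrive one after the other, the jdbc use case in Example #3 depends on the query returning rows grouped by the task id. A minimal input sketch illustrating this; the connection settings and driver paths below are hypothetical placeholders, not taken from this plugin:

``` ruby
input {
  jdbc {
    # hypothetical connection details - adapt to your own database
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/demo"
    jdbc_user => "demo"
    # ORDER BY keeps all rows of a given country_name contiguous,
    # which is exactly the ordering the WARNING above requires
    statement => "SELECT country_name, town_name FROM town ORDER BY country_name"
  }
}
```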
data/lib/logstash/filters/aggregate.rb CHANGED
@@ -8,9 +8,8 @@ require "thread"
  # The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task,
  # and finally push aggregated information into final task event.
  #
- # You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
- # correctly otherwise documents
- # may be processed out of sequence and unexpected results will occur.
+ # You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly,
+ # otherwise events may be processed out of sequence and unexpected results will occur.
  #
  # ==== Example #1
  #
@@ -110,6 +109,52 @@ require "thread"
  # * the key point is the "||=" ruby operator. It allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
  #
  #
+ # ==== Example #3
+ #
+ # Third use case: you have no specific start event and no specific end event.
+ # * A typical case is aggregating results from the jdbc input plugin.
+ # * Given that you have this SQL query: `SELECT country_name, town_name FROM town`
+ # * Using the jdbc input plugin, you get these 3 events:
+ # [source,json]
+ # ----------------------------------
+ # { "country_name": "France", "town_name": "Paris" }
+ # { "country_name": "France", "town_name": "Marseille" }
+ # { "country_name": "USA", "town_name": "New-York" }
+ # ----------------------------------
+ # * And you would like to push these 2 result events into elasticsearch:
+ # [source,json]
+ # ----------------------------------
+ # { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+ # { "country_name": "USA", "town_name": [ "New-York" ] }
+ # ----------------------------------
+ # * You can do that using the `push_previous_map_as_event` aggregate plugin option:
+ # [source,ruby]
+ # ----------------------------------
+ # filter {
+ #   aggregate {
+ #     task_id => "%{country_name}"
+ #     code => "
+ #       map['tags'] ||= ['aggregated']
+ #       map['town_name'] ||= []
+ #       event.to_hash.each do |key,value|
+ #         map[key] = value unless map.has_key?(key)
+ #         map[key] << value if map[key].is_a?(Array)
+ #       end
+ #     "
+ #     push_previous_map_as_event => true
+ #     timeout => 5
+ #   }
+ #
+ #   if "aggregated" not in [tags] {
+ #     drop {}
+ #   }
+ # }
+ # ----------------------------------
+ # * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new Logstash event (with the 'aggregated' tag), and then creates a new empty map for the next country
+ # * When the 5s timeout expires, the last aggregate map is pushed as a new event
+ # * Finally, the initial events (which are not aggregated) are dropped because they are no longer needed
+ #
+ #
  # ==== How it works
  # * the filter needs a "task_id" to correlate events (log lines) of a same task
  # * at the task beginning, filter creates a map, attached to task_id
@@ -123,7 +168,7 @@ require "thread"
  #
  # ==== Use Cases
  # * extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
- # * extract error information in any task log line, and push it in final task event (to get a final document with all error information if any)
+ # * extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
  # * extract all back-end calls as a list, and push this list in final task event (to get a task profile)
  # * extract all http headers logged in several lines to push this list in final task event (complete http request info)
  # * for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -178,6 +223,12 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
    # Example value : `"/path/to/.aggregate_maps"`
    config :aggregate_maps_path, :validate => :string, :required => false
  
+   # When this option is enabled, each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new Logstash event,
+   # and then creates a new empty map for the next task.
+   #
+   # WARNING: this option works fine only if tasks come one after the other. It means: all task1 events, then all task2 events, etc.
+   config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false
+
  
    # Default timeout (in seconds) when not defined in plugin configuration
    DEFAULT_TIMEOUT = 1800
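For readers unfamiliar with the `config` DSL used above: each `config` declaration exposes a plugin setting and surfaces it as an instance variable once the plugin is registered. A minimal, self-contained sketch of that mechanism, using a hypothetical filter that is not part of this gem:

``` ruby
require "logstash/filters/base"
require "logstash/namespace"

# Hypothetical filter illustrating the Logstash config DSL only.
class LogStash::Filters::BooleanFlagSketch < LogStash::Filters::Base
  config_name "boolean_flag_sketch"

  # declared like push_previous_map_as_event: a boolean, optional,
  # false by default; available as @enable_flag after registration
  config :enable_flag, :validate => :boolean, :required => false, :default => false

  def register
  end

  def filter(event)
    # mark the event only when the option was set to true in the config
    event["flag_was_enabled"] = true if @enable_flag
    filter_matched(event)
  end
end
```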
@@ -258,14 +309,22 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
      return if task_id.nil? || task_id == @task_id
  
      noError = false
+     event_to_yield = nil
  
      # protect aggregate_maps against concurrent access, using a mutex
      @@mutex.synchronize do
  
        # retrieve the current aggregate map
        aggregate_maps_element = @@aggregate_maps[task_id]
+
+       # create aggregate map, if it doesn't exist
        if (aggregate_maps_element.nil?)
          return if @map_action == "update"
+         # create new event from previous map, if @push_previous_map_as_event is enabled
+         if (@push_previous_map_as_event and !@@aggregate_maps.empty?)
+           previous_map = @@aggregate_maps.shift[1].map
+           event_to_yield = LogStash::Event.new(previous_map)
+         end
          aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(Time.now);
          @@aggregate_maps[task_id] = aggregate_maps_element
        else
@@ -284,10 +343,15 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  
        # delete the map if task is ended
        @@aggregate_maps.delete(task_id) if @end_of_task
+
      end
  
      # match the filter, only if no error occurred
      filter_matched(event) if noError
+
+     # yield previous map as new event if set
+     yield event_to_yield unless event_to_yield.nil?
+
    end
  
    # Necessary to indicate logstash to periodically call 'flush' method
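A note on the `@@aggregate_maps.shift[1].map` line above: Ruby hashes preserve insertion order, so `Hash#shift` removes and returns the oldest `[key, value]` pair, which here is the previous task's entry (`[1]` selects the stored element, whose `map` carries the aggregated fields). A standalone sketch of the underlying behaviour, using plain hashes instead of the plugin's Element objects:

``` ruby
# Ruby hashes iterate in insertion order; Hash#shift removes and
# returns the first (oldest) [key, value] pair.
aggregate_maps = {}
aggregate_maps["task1"] = { "sql_duration" => 42 }
aggregate_maps["task2"] = { "sql_duration" => 7 }

oldest_key, previous_map = aggregate_maps.shift
puts oldest_key                   # task1
puts previous_map.inspect         # {"sql_duration"=>42}
puts aggregate_maps.keys.inspect  # ["task2"] - only the current task remains
```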
@@ -305,20 +369,33 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  
      # Launch eviction only every interval of (@timeout / 2) seconds
      if (@@eviction_instance == self && (@@last_eviction_timestamp.nil? || Time.now > @@last_eviction_timestamp + @timeout / 2))
-       remove_expired_elements()
+       events_to_flush = remove_expired_maps()
        @@last_eviction_timestamp = Time.now
      end
  
-     return nil
+     return events_to_flush
    end
  
  
-   # Remove the expired Aggregate elements from "aggregate_maps" if they are older than timeout
-   def remove_expired_elements()
+   # Remove the expired Aggregate maps from @@aggregate_maps if they are older than timeout.
+   # If the @push_previous_map_as_event option is set, expired maps are returned as new events to be flushed to the Logstash pipeline.
+   def remove_expired_maps()
+     events_to_flush = []
      min_timestamp = Time.now - @timeout
+
      @@mutex.synchronize do
-       @@aggregate_maps.delete_if { |key, element| element.creation_timestamp < min_timestamp }
+       @@aggregate_maps.delete_if do |key, element|
+         if (element.creation_timestamp < min_timestamp)
+           if (@push_previous_map_as_event)
+             events_to_flush << LogStash::Event.new(element.map)
+           end
+           next true
+         end
+         next false
+       end
      end
+
+     return events_to_flush
    end
  
  end # class LogStash::Filters::Aggregate
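The eviction rewrite above relies on `Hash#delete_if`: an entry is removed exactly when the block returns a truthy value, and `next true` / `next false` set that return value explicitly while still allowing the side effect of collecting the evicted map first. A standalone sketch of the same pattern, with plain hashes standing in for the plugin's Element objects:

``` ruby
# Hash#delete_if removes each entry for which the block returns truthy.
min_timestamp = Time.now - 5

maps = {
  "task1" => { :creation_timestamp => Time.now - 10, :map => { "n" => 1 } },
  "task2" => { :creation_timestamp => Time.now,      :map => { "n" => 2 } },
}

events_to_flush = []
maps.delete_if do |key, element|
  if element[:creation_timestamp] < min_timestamp
    events_to_flush << element[:map]  # collect the expired map before eviction
    next true                         # truthy: delete this entry
  end
  next false                          # falsy: keep this entry
end

puts events_to_flush.inspect  # [{"n"=>1}]
puts maps.keys.inspect        # ["task2"]
```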
data/logstash-filter-aggregate.gemspec CHANGED
@@ -1,9 +1,9 @@
  Gem::Specification.new do |s|
    s.name = 'logstash-filter-aggregate'
-   s.version = '2.1.2'
+   s.version = '2.2.0'
    s.licenses = ['Apache License (2.0)']
    s.summary = "The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event."
-   s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
+   s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
    s.authors = ["Elastic", "Fabien Baligand"]
    s.email = 'info@elastic.co'
    s.homepage = "https://github.com/logstash-plugins/logstash-filter-aggregate"
data/spec/filters/aggregate_spec.rb CHANGED
@@ -218,4 +218,30 @@ describe LogStash::Filters::Aggregate do
      end
    end
  end
+
+ context "push_previous_map_as_event option is defined, " do
+   describe "when a new task id is detected, " do
+     it "should push previous map as new event" do
+       push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 5 })
+       push_filter.filter(event({"taskid" => "1"})) { |yield_event| fail "task 1 shouldn't have yield event" }
+       push_filter.filter(event({"taskid" => "2"})) { |yield_event| expect(yield_event["taskid"]).to eq("1") }
+       expect(aggregate_maps.size).to eq(1)
+     end
+   end
+
+   describe "when timeout happens, " do
+     it "flush method should return last map as new event" do
+       push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1 })
+       push_filter.filter(event({"taskid" => "1"}))
+       sleep(2)
+       events_to_flush = push_filter.flush()
+       expect(events_to_flush).not_to be_nil
+       expect(events_to_flush.size).to eq(1)
+       expect(events_to_flush[0]["taskid"]).to eq("1")
+       expect(aggregate_maps.size).to eq(0)
+     end
+   end
+ end
+
+
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: logstash-filter-aggregate
  version: !ruby/object:Gem::Version
- version: 2.1.2
+ version: 2.2.0
  platform: ruby
  authors:
  - Elastic
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-06-04 00:00:00.000000000 Z
+ date: 2016-07-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement