logstash-filter-aggregate 2.1.2 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f880dab9eca9eca31f5634938e973fd0bc7d739b
- data.tar.gz: e2f233c165cd6f944c7f0cc63968e2bb6428f694
+ metadata.gz: e023e6c80ed96fa874477b00888a78bf45ee56a7
+ data.tar.gz: 1375bebbcda30c0052f8f2aa5188dda548d53d6c
  SHA512:
- metadata.gz: 8cbf67652af3eaf6547302b701a6aec844e7ca1a82b722a83ca048724a3ce252e1e5ed2bbf5c2fee8533f8fd1ab387791e7b7749fed33a6f745c5650bc57632e
- data.tar.gz: ec57ad6ef531427c1c43800340f8352b44d954aba39bf3f786bf315a7f843458f8252f846f60b4c2c9312c0e94293bc2ec47aad0ad5088b291326083cd00ab76
+ metadata.gz: f611bd231a13d8933662dbe7df7075a551e89772f0007d4d694636f307e576c828f091f37c7a3f917b717bec9dbb0acacff944cff14a4fc27bee92318d394e5a
+ data.tar.gz: bccd38bcb4f37a1f8f4a06ab01baa92883d6d4e3848ff380fa318583830a3600fe464ecd93f299993a42908ea70f0fd596300b6331d469eadfef6b47b165b4db
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
+ ## 2.2.0
+ - new feature: add a new option "push_previous_map_as_event" so that each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new logstash event
+
  ## 2.1.2
  - bugfix: clarify default timeout behaviour: by default, timeout is 1800s
 
data/README.md CHANGED
@@ -4,9 +4,8 @@
 
  The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event.
 
- You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
- correctly otherwise documents
- may be processed out of sequence and unexpected results will occur.
+ You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly,
+ otherwise events may be processed out of sequence and unexpected results will occur.
 
  ## Example #1
 
@@ -101,6 +100,47 @@ the field `sql_duration` is added and contains the sum of all sql queries durati
  * the key point is the "||=" ruby operator.
  it initializes the 'sql_duration' map entry to 0 only if this map entry is not already initialized
 
+ ## Example #3
+
+ Third use case: you have no specific start event and no specific end event.
+ A typical case is aggregating results from the jdbc input plugin.
+ * Given that you have this SQL query: `SELECT country_name, town_name FROM town`
+ * Using the jdbc input plugin, you get these 3 events:
+ ``` json
+ { "country_name": "France", "town_name": "Paris" }
+ { "country_name": "France", "town_name": "Marseille" }
+ { "country_name": "USA", "town_name": "New-York" }
+ ```
+ * And you would like to push these 2 resulting events into elasticsearch:
+ ``` json
+ { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+ { "country_name": "USA", "town_name": [ "New-York" ] }
+ ```
+ * You can do that using the `push_previous_map_as_event` aggregate plugin option:
+ ``` ruby
+ filter {
+   aggregate {
+     task_id => "%{country_name}"
+     code => "
+       map['tags'] ||= ['aggregated']
+       map['town_name'] ||= []
+       event.to_hash.each do |key,value|
+         map[key] = value unless map.has_key?(key)
+         map[key] << value if map[key].is_a?(Array)
+       end
+     "
+     push_previous_map_as_event => true
+     timeout => 5
+   }
+
+   if "aggregated" not in [tags] {
+     drop {}
+   }
+ }
+ ```
+ * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new logstash event (with the 'aggregated' tag), and then creates a new empty map for the next country
+ * When the 5s timeout expires, the last aggregate map is pushed as a new event
+ * Finally, the initial events (which are not aggregated) are dropped because they are no longer needed
 
  ## How it works
  - the filter needs a "task_id" to correlate events (log lines) of a same task
@@ -114,7 +154,7 @@ it allows to initialize 'sql_duration' map entry to 0 only if this map entry is
 
  ## Use Cases
  - extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
- - extract error information in any task log line, and push it in final task event (to get a final document with all error information if any)
+ - extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
  - extract all back-end calls as a list, and push this list in final task event (to get a task profile)
  - extract all http headers logged in several lines to push this list in final task event (complete http request info)
  - for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -156,6 +196,12 @@ If not defined, aggregate maps will not be stored at logstash stop and will be l
  Must be defined in only one aggregate filter (as aggregate maps are global).
  Example value: `"/path/to/.aggregate_maps"`
 
+ - **push_previous_map_as_event:**
+ When this option is enabled, each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new logstash event,
+ and then creates a new empty map for the next task.
+ _WARNING:_ this option works fine only if tasks come one after the other. It means: all task1 events, then all task2 events, etc.
+ Default value: `false`
+
 
  ## Changelog
 
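Editorial aside on the `_WARNING_` in the new `push_previous_map_as_event` doc above: the option assumes all events of a task arrive contiguously. The following minimal plain-Ruby sketch (an illustration with hypothetical task ids, not plugin code; it is a simplified model of the plugin's task-switch behaviour) shows why interleaved task ids produce partial events:

```ruby
# Simplified model: when an unseen task id arrives, the previous map is
# pushed as an event; whatever remains is flushed at the end (the timeout).
def simulate(task_ids)
  maps = {}
  pushed = []
  task_ids.each do |task_id|
    unless maps.key?(task_id)
      pushed << maps.shift[1] unless maps.empty? # push previous map on new task id
      maps[task_id] = []
    end
    maps[task_id] << task_id
  end
  pushed + maps.values # remaining maps are flushed on timeout
end

p simulate(%w[t1 t1 t2 t2]) # => [["t1", "t1"], ["t2", "t2"]] -- 2 complete events
p simulate(%w[t1 t2 t1 t2]) # => [["t1"], ["t2"], ["t1"], ["t2"]] -- 4 partial events
```

With ordered input each task yields one complete event; with interleaved input every task-id change flushes a half-built map.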
data/lib/logstash/filters/aggregate.rb CHANGED
@@ -8,9 +8,8 @@ require "thread"
  # The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task,
  # and finally push aggregated information into final task event.
  #
- # You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
- # correctly otherwise documents
- # may be processed out of sequence and unexpected results will occur.
+ # You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly,
+ # otherwise events may be processed out of sequence and unexpected results will occur.
  #
  # ==== Example #1
  #
@@ -110,6 +109,52 @@ require "thread"
  # * the key point is the "||=" ruby operator. It initializes the 'sql_duration' map entry to 0 only if this map entry is not already initialized
  #
  #
+ # ==== Example #3
+ #
+ # Third use case: you have no specific start event and no specific end event.
+ # * A typical case is aggregating results from the jdbc input plugin.
+ # * Given that you have this SQL query: `SELECT country_name, town_name FROM town`
+ # * Using the jdbc input plugin, you get these 3 events:
+ # [source,json]
+ # ----------------------------------
+ # { "country_name": "France", "town_name": "Paris" }
+ # { "country_name": "France", "town_name": "Marseille" }
+ # { "country_name": "USA", "town_name": "New-York" }
+ # ----------------------------------
+ # * And you would like to push these 2 resulting events into elasticsearch:
+ # [source,json]
+ # ----------------------------------
+ # { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+ # { "country_name": "USA", "town_name": [ "New-York" ] }
+ # ----------------------------------
+ # * You can do that using the `push_previous_map_as_event` aggregate plugin option:
+ # [source,ruby]
+ # ----------------------------------
+ # filter {
+ #   aggregate {
+ #     task_id => "%{country_name}"
+ #     code => "
+ #       map['tags'] ||= ['aggregated']
+ #       map['town_name'] ||= []
+ #       event.to_hash.each do |key,value|
+ #         map[key] = value unless map.has_key?(key)
+ #         map[key] << value if map[key].is_a?(Array)
+ #       end
+ #     "
+ #     push_previous_map_as_event => true
+ #     timeout => 5
+ #   }
+ #
+ #   if "aggregated" not in [tags] {
+ #     drop {}
+ #   }
+ # }
+ # ----------------------------------
+ # * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new logstash event (with the 'aggregated' tag), and then creates a new empty map for the next country
+ # * When the 5s timeout expires, the last aggregate map is pushed as a new event
+ # * Finally, the initial events (which are not aggregated) are dropped because they are no longer needed
+ #
+ #
  # ==== How it works
  # * the filter needs a "task_id" to correlate events (log lines) of a same task
  # * at the task beginning, filter creates a map, attached to task_id
@@ -123,7 +168,7 @@ require "thread"
  #
  # ==== Use Cases
  # * extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
- # * extract error information in any task log line, and push it in final task event (to get a final document with all error information if any)
+ # * extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
  # * extract all back-end calls as a list, and push this list in final task event (to get a task profile)
  # * extract all http headers logged in several lines to push this list in final task event (complete http request info)
  # * for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -178,6 +223,12 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
  # Example value: `"/path/to/.aggregate_maps"`
  config :aggregate_maps_path, :validate => :string, :required => false
 
+ # When this option is enabled, each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new logstash event,
+ # and then creates a new empty map for the next task.
+ #
+ # WARNING: this option works fine only if tasks come one after the other. It means: all task1 events, then all task2 events, etc.
+ config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false
+
 
  # Default timeout (in seconds) when not defined in plugin configuration
  DEFAULT_TIMEOUT = 1800
@@ -258,14 +309,22 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
    return if task_id.nil? || task_id == @task_id
 
    noError = false
+   event_to_yield = nil
 
    # protect aggregate_maps against concurrent access, using a mutex
    @@mutex.synchronize do
 
      # retrieve the current aggregate map
      aggregate_maps_element = @@aggregate_maps[task_id]
+
+     # create aggregate map, if it doesn't exist
      if (aggregate_maps_element.nil?)
        return if @map_action == "update"
+       # create new event from previous map, if @push_previous_map_as_event is enabled
+       if (@push_previous_map_as_event and !@@aggregate_maps.empty?)
+         previous_map = @@aggregate_maps.shift[1].map
+         event_to_yield = LogStash::Event.new(previous_map)
+       end
        aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(Time.now);
        @@aggregate_maps[task_id] = aggregate_maps_element
      else
@@ -284,10 +343,15 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
      # delete the map if task is ended
      @@aggregate_maps.delete(task_id) if @end_of_task
+
    end
 
    # match the filter, only if no error occurred
    filter_matched(event) if noError
+
+   # yield previous map as new event if set
+   yield event_to_yield unless event_to_yield.nil?
+
  end
 
  # Necessary to indicate logstash to periodically call 'flush' method
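Editorial aside on the `@@aggregate_maps.shift[1].map` call added in the hunk above: Ruby hashes preserve insertion order, so `Hash#shift` removes and returns the oldest `[key, value]` pair, which is exactly the previous task's entry when tasks arrive one after the other. A minimal illustration (hypothetical task ids and plain hashes; in the plugin the value is an aggregate Element whose `.map` holds the event data):

```ruby
# Hash#shift pops the OLDEST entry, because Ruby hashes keep insertion order.
maps = { "task1" => { "sql_duration" => 46 }, "task2" => { "sql_duration" => 12 } }
oldest = maps.shift # => ["task1", {"sql_duration"=>46}]
oldest[1]           # => {"sql_duration"=>46} -- the data that becomes the new event
maps                # => {"task2"=>{"sql_duration"=>12}} -- only the current task remains
```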
@@ -305,20 +369,33 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
    # Launch eviction only every interval of (@timeout / 2) seconds
    if (@@eviction_instance == self && (@@last_eviction_timestamp.nil? || Time.now > @@last_eviction_timestamp + @timeout / 2))
-     remove_expired_elements()
+     events_to_flush = remove_expired_maps()
      @@last_eviction_timestamp = Time.now
    end
 
-   return nil
+   return events_to_flush
  end
 
 
- # Remove the expired Aggregate elements from "aggregate_maps" if they are older than timeout
- def remove_expired_elements()
+ # Remove the expired Aggregate maps from @@aggregate_maps if they are older than timeout.
+ # If @push_previous_map_as_event option is set, expired maps are returned as new events to be flushed to Logstash pipeline.
+ def remove_expired_maps()
+   events_to_flush = []
    min_timestamp = Time.now - @timeout
+
    @@mutex.synchronize do
-     @@aggregate_maps.delete_if { |key, element| element.creation_timestamp < min_timestamp }
+     @@aggregate_maps.delete_if do |key, element|
+       if (element.creation_timestamp < min_timestamp)
+         if (@push_previous_map_as_event)
+           events_to_flush << LogStash::Event.new(element.map)
+         end
+         next true
+       end
+       next false
+     end
    end
+
+   return events_to_flush
  end
 
  end # class LogStash::Filters::Aggregate
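Editorial aside on the rewritten eviction in `remove_expired_maps` above: `Hash#delete_if` removes every entry whose block returns true, and `next true` / `next false` set that block's return value explicitly, which lets the plugin collect an event for each evicted map before removing it. A minimal plain-Ruby sketch of the same collect-then-evict pattern (hypothetical data; the plugin builds `LogStash::Event` objects instead of collecting task ids):

```ruby
# Collect expired entries while evicting them, mirroring remove_expired_maps.
min_timestamp = Time.now - 1800                      # same role as Time.now - @timeout
maps = { "t1" => Time.now - 3600, "t2" => Time.now } # task id => creation timestamp
flushed = []
maps.delete_if do |task_id, created_at|
  if created_at < min_timestamp
    flushed << task_id # plugin: events_to_flush << LogStash::Event.new(element.map)
    next true          # evict this entry
  end
  next false           # keep this entry
end
p flushed   # => ["t1"] -- what flush() would return into the pipeline
p maps.keys # => ["t2"] -- only the fresh map remains
```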
data/logstash-filter-aggregate.gemspec CHANGED
@@ -1,9 +1,9 @@
  Gem::Specification.new do |s|
    s.name = 'logstash-filter-aggregate'
-   s.version = '2.1.2'
+   s.version = '2.2.0'
    s.licenses = ['Apache License (2.0)']
    s.summary = "The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event."
-   s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
+   s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
    s.authors = ["Elastic", "Fabien Baligand"]
    s.email = 'info@elastic.co'
    s.homepage = "https://github.com/logstash-plugins/logstash-filter-aggregate"
data/spec/filters/aggregate_spec.rb CHANGED
@@ -218,4 +218,30 @@ describe LogStash::Filters::Aggregate do
        end
      end
    end
+
+   context "push_previous_map_as_event option is defined, " do
+     describe "when a new task id is detected, " do
+       it "should push previous map as new event" do
+         push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 5 })
+         push_filter.filter(event({"taskid" => "1"})) { |yield_event| fail "task 1 shouldn't have yield event" }
+         push_filter.filter(event({"taskid" => "2"})) { |yield_event| expect(yield_event["taskid"]).to eq("1") }
+         expect(aggregate_maps.size).to eq(1)
+       end
+     end
+
+     describe "when timeout happens, " do
+       it "flush method should return last map as new event" do
+         push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1 })
+         push_filter.filter(event({"taskid" => "1"}))
+         sleep(2)
+         events_to_flush = push_filter.flush()
+         expect(events_to_flush).not_to be_nil
+         expect(events_to_flush.size).to eq(1)
+         expect(events_to_flush[0]["taskid"]).to eq("1")
+         expect(aggregate_maps.size).to eq(0)
+       end
+     end
+   end
+
+
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: logstash-filter-aggregate
  version: !ruby/object:Gem::Version
-   version: 2.1.2
+   version: 2.2.0
  platform: ruby
  authors:
  - Elastic
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-06-04 00:00:00.000000000 Z
+ date: 2016-07-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement