logstash-filter-aggregate 2.1.2 → 2.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +3 -0
- data/README.md +50 -4
- data/lib/logstash/filters/aggregate.rb +86 -9
- data/logstash-filter-aggregate.gemspec +2 -2
- data/spec/filters/aggregate_spec.rb +26 -0
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e023e6c80ed96fa874477b00888a78bf45ee56a7
+  data.tar.gz: 1375bebbcda30c0052f8f2aa5188dda548d53d6c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f611bd231a13d8933662dbe7df7075a551e89772f0007d4d694636f307e576c828f091f37c7a3f917b717bec9dbb0acacff944cff14a4fc27bee92318d394e5a
+  data.tar.gz: bccd38bcb4f37a1f8f4a06ab01baa92883d6d4e3848ff380fa318583830a3600fe464ecd93f299993a42908ea70f0fd596300b6331d469eadfef6b47b165b4db
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,6 @@
+## 2.2.0
+- new feature: add new option "push_previous_map_as_event" so that each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event
+
 ## 2.1.2
 - bugfix: clarify default timeout behaviour : by default, timeout is 1800s
 
data/README.md
CHANGED
@@ -4,9 +4,8 @@
 
 The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event.
 
-You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
-
-may be processed out of sequence and unexpected results will occur.
+You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly
+otherwise events may be processed out of sequence and unexpected results will occur.
 
 ## Example #1
 
@@ -101,6 +100,47 @@ the field `sql_duration` is added and contains the sum of all sql queries durati
 * the key point is the "||=" ruby operator.
 it allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
 
+## Example #3
+
+Third use case : you have no specific start event and no specific end event.
+A typical case is aggregating results from jdbc input plugin.
+* Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+* Using jdbc input plugin, you get these 3 events from :
+``` json
+  { "country_name": "France", "town_name": "Paris" }
+  { "country_name": "France", "town_name": "Marseille" }
+  { "country_name": "USA", "town_name": "New-York" }
+```
+* And you would like these 2 result events to push them into elasticsearch :
+``` json
+  { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+  { "country_name": "USA", "town_name": [ "New-York" ] }
+```
+* You can do that using `push_previous_map_as_event` aggregate plugin option :
+``` ruby
+filter {
+  aggregate {
+    task_id => "%{country_name}"
+    code => "
+      map['tags'] ||= ['aggregated']
+      map['town_name'] ||= []
+      event.to_hash.each do |key,value|
+        map[key] = value unless map.has_key?(key)
+        map[key] << value if map[key].is_a?(Array)
+      end
+    "
+    push_previous_map_as_event => true
+    timeout => 5
+  }
+
+  if "aggregated" not in [tags] {
+    drop {}
+  }
+}
+```
+* The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+* When 5s timeout comes, the last aggregate map is pushed as a new event
+* Finally, initial events (which are not aggregated) are dropped because useless
 
 ## How it works
 - the filter needs a "task_id" to correlate events (log lines) of a same task
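To see what the `code` block in Example #3 above computes, independent of Logstash, here is a minimal plain-Ruby sketch of the same aggregation over the three sample jdbc rows. The names `rows` and `maps` are scaffolding invented for this illustration, not plugin API:

``` ruby
# Plain-Ruby sketch of the Example #3 aggregation over the three sample rows.
rows = [
  { 'country_name' => 'France', 'town_name' => 'Paris' },
  { 'country_name' => 'France', 'town_name' => 'Marseille' },
  { 'country_name' => 'USA', 'town_name' => 'New-York' }
]

maps = {}  # plays the role of the plugin's aggregate maps : task_id => map
rows.each do |event|
  map = (maps[event['country_name']] ||= {})
  map['tags'] ||= ['aggregated']
  map['town_name'] ||= []
  event.each do |key, value|
    map[key] = value unless map.has_key?(key)    # keep first scalar value
    map[key] << value if map[key].is_a?(Array)   # accumulate array values
  end
end

maps.each_value { |map| puts map.inspect }
# {"tags"=>["aggregated"], "town_name"=>["Paris", "Marseille"], "country_name"=>"France"}
# {"tags"=>["aggregated"], "town_name"=>["New-York"], "country_name"=>"USA"}
```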
@@ -114,7 +154,7 @@ it allows to initialize 'sql_duration' map entry to 0 only if this map entry is
 
 ## Use Cases
 - extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
-- extract error information in any task log line, and push it in final task event (to get a final
+- extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
 - extract all back-end calls as a list, and push this list in final task event (to get a task profile)
 - extract all http headers logged in several lines to push this list in final task event (complete http request info)
 - for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -156,6 +196,12 @@ If not defined, aggregate maps will not be stored at logstash stop and will be l
 Must be defined in only one aggregate filter (as aggregate maps are global).
 Example value : `"/path/to/.aggregate_maps"`
 
+- **push_previous_map_as_event:**
+When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
+and then creates a new empty map for the next task.
+_WARNING:_ this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
+Default value: `false`
+
 
 ## Changelog
 
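To make that warning concrete, here is a hypothetical trace (all names invented for this sketch, not plugin code) of what goes wrong when a task id re-appears after another task has started: each change of task id pushes the previous map, so the re-appearing task is split across two events.

``` ruby
# Sketch of the interleaving caveat for push_previous_map_as_event.
pushed = []   # maps pushed as new logstash events
maps   = {}   # current aggregate maps, keyed by task id

%w[France USA France].each do |task_id|
  unless maps.has_key?(task_id)
    pushed << maps.shift[1] unless maps.empty?   # new task id => push previous map
    maps[task_id] = { 'country_name' => task_id, 'town_name' => [] }
  end
end

puts pushed.size        # => 2 : the first France map, then the USA map
puts maps.keys.inspect  # => ["France"] : a SECOND, empty France map
```

With the events ordered France, France, USA (all task1 events first), the same loop would push exactly one complete map per country.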
data/lib/logstash/filters/aggregate.rb
CHANGED
@@ -8,9 +8,8 @@ require "thread"
 # The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task,
 # and finally push aggregated information into final task event.
 #
-# You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
-#
-# may be processed out of sequence and unexpected results will occur.
+# You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly
+# otherwise events may be processed out of sequence and unexpected results will occur.
 #
 # ==== Example #1
 #
@@ -110,6 +109,52 @@ require "thread"
 # * the key point is the "||=" ruby operator. It allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
 #
 #
+# ==== Example #3
+#
+# Third use case : you have no specific start event and no specific end event.
+# * A typical case is aggregating results from jdbc input plugin.
+# * Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+# * Using jdbc input plugin, you get these 3 events from :
+# [source,json]
+# ----------------------------------
+#  { "country_name": "France", "town_name": "Paris" }
+#  { "country_name": "France", "town_name": "Marseille" }
+#  { "country_name": "USA", "town_name": "New-York" }
+# ----------------------------------
+# * And you would like these 2 result events to push them into elasticsearch :
+# [source,json]
+# ----------------------------------
+#  { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+#  { "country_name": "USA", "town_name": [ "New-York" ] }
+# ----------------------------------
+# * You can do that using `push_previous_map_as_event` aggregate plugin option :
+# [source,ruby]
+# ----------------------------------
+# filter {
+#   aggregate {
+#     task_id => "%{country_name}"
+#     code => "
+#       map['tags'] ||= ['aggregated']
+#       map['town_name'] ||= []
+#       event.to_hash.each do |key,value|
+#         map[key] = value unless map.has_key?(key)
+#         map[key] << value if map[key].is_a?(Array)
+#       end
+#     "
+#     push_previous_map_as_event => true
+#     timeout => 5
+#   }
+#
+#   if "aggregated" not in [tags] {
+#     drop {}
+#   }
+# }
+# ----------------------------------
+# * The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+# * When 5s timeout comes, the last aggregate map is pushed as a new event
+# * Finally, initial events (which are not aggregated) are dropped because useless
+#
+#
 # ==== How it works
 # * the filter needs a "task_id" to correlate events (log lines) of a same task
 # * at the task beggining, filter creates a map, attached to task_id
@@ -123,7 +168,7 @@ require "thread"
 #
 # ==== Use Cases
 # * extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
-# * extract error information in any task log line, and push it in final task event (to get a final
+# * extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
 # * extract all back-end calls as a list, and push this list in final task event (to get a task profile)
 # * extract all http headers logged in several lines to push this list in final task event (complete http request info)
 # * for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -178,6 +223,12 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
   # Example value : `"/path/to/.aggregate_maps"`
   config :aggregate_maps_path, :validate => :string, :required => false
 
+  # When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
+  # and then creates a new empty map for the next task.
+  #
+  # WARNING: this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
+  config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false
+
 
   # Default timeout (in seconds) when not defined in plugin configuration
   DEFAULT_TIMEOUT = 1800
@@ -258,14 +309,22 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
     return if task_id.nil? || task_id == @task_id
 
     noError = false
+    event_to_yield = nil
 
     # protect aggregate_maps against concurrent access, using a mutex
     @@mutex.synchronize do
 
       # retrieve the current aggregate map
       aggregate_maps_element = @@aggregate_maps[task_id]
+
+      # create aggregate map, if it doesn't exist
       if (aggregate_maps_element.nil?)
         return if @map_action == "update"
+        # create new event from previous map, if @push_previous_map_as_event is enabled
+        if (@push_previous_map_as_event and !@@aggregate_maps.empty?)
+          previous_map = @@aggregate_maps.shift[1].map
+          event_to_yield = LogStash::Event.new(previous_map)
+        end
         aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(Time.now);
         @@aggregate_maps[task_id] = aggregate_maps_element
       else
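The `@@aggregate_maps.shift[1].map` line added above works because Ruby hashes preserve insertion order: `Hash#shift` removes and returns the oldest `[key, value]` pair, so `[1]` is the stored Element for the previous task and `.map` is its aggregate map. A quick standalone illustration:

``` ruby
# Ruby hashes iterate in insertion order, so Hash#shift pops the oldest entry.
maps = {}
maps['task1'] = 'element for task 1'
maps['task2'] = 'element for task 2'

oldest = maps.shift        # => ["task1", "element for task 1"]
puts oldest[1]             # => "element for task 1"  (what shift[1] selects)
puts maps.keys.inspect     # => ["task2"]
```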
@@ -284,10 +343,15 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
       # delete the map if task is ended
       @@aggregate_maps.delete(task_id) if @end_of_task
+
     end
 
     # match the filter, only if no error occurred
     filter_matched(event) if noError
+
+    # yield previous map as new event if set
+    yield event_to_yield unless event_to_yield.nil?
+
   end
 
   # Necessary to indicate logstash to periodically call 'flush' method
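The `yield event_to_yield` line relies on the block that the Logstash pipeline passes to a filter's `filter` method: anything the filter yields is injected into the event stream as an extra event, which is also how the spec further down observes the pushed map. A generic Ruby sketch of that pattern (illustrative names, not actual pipeline code):

``` ruby
# Generic sketch of the yield pattern: the caller supplies a block, and the
# method hands extra results back through it.
def filter(event)
  event_to_yield = "aggregated event built from the previous map"
  # ... normal processing of `event` would happen here ...
  yield event_to_yield unless event_to_yield.nil?
end

filter("current event") { |extra| puts "pipeline received: #{extra}" }
# => pipeline received: aggregated event built from the previous map
```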
@@ -305,20 +369,33 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
     # Launch eviction only every interval of (@timeout / 2) seconds
     if (@@eviction_instance == self && (@@last_eviction_timestamp.nil? || Time.now > @@last_eviction_timestamp + @timeout / 2))
-
+      events_to_flush = remove_expired_maps()
       @@last_eviction_timestamp = Time.now
     end
 
-    return
+    return events_to_flush
   end
 
 
-  # Remove the expired Aggregate
-
+  # Remove the expired Aggregate maps from @@aggregate_maps if they are older than timeout.
+  # If @push_previous_map_as_event option is set, expired maps are returned as new events to be flushed to Logstash pipeline.
+  def remove_expired_maps()
+    events_to_flush = []
     min_timestamp = Time.now - @timeout
+
     @@mutex.synchronize do
-      @@aggregate_maps.delete_if
+      @@aggregate_maps.delete_if do |key, element|
+        if (element.creation_timestamp < min_timestamp)
+          if (@push_previous_map_as_event)
+            events_to_flush << LogStash::Event.new(element.map)
+          end
+          next true
+        end
+        next false
+      end
     end
+
+    return events_to_flush
   end
 
 end # class LogStash::Filters::Aggregate
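In the rewritten `remove_expired_maps()`, `Hash#delete_if` removes every entry whose block returns true; `next true` / `next false` are the block's return values, which lets the method collect expired maps into `events_to_flush` in the same pass. A standalone sketch of that pattern, with a hypothetical element layout standing in for the plugin's Element class:

``` ruby
# Standalone demo of the delete_if / next pattern used above.
Element = Struct.new(:creation_timestamp, :map)

events_to_flush = []
min_timestamp = Time.now - 1800   # mirrors DEFAULT_TIMEOUT

aggregate_maps = {
  'old_task' => Element.new(Time.now - 3600, { 'taskid' => 'old_task' }),
  'new_task' => Element.new(Time.now, { 'taskid' => 'new_task' })
}

aggregate_maps.delete_if do |key, element|
  if element.creation_timestamp < min_timestamp
    events_to_flush << element.map   # the plugin wraps this in LogStash::Event.new
    next true                        # expired : delete this entry
  end
  next false                         # fresh : keep it
end

puts events_to_flush.inspect      # => [{"taskid"=>"old_task"}]
puts aggregate_maps.keys.inspect  # => ["new_task"]
```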
data/logstash-filter-aggregate.gemspec
CHANGED
@@ -1,9 +1,9 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-filter-aggregate'
-  s.version = '2.1.2'
+  s.version = '2.2.0'
   s.licenses = ['Apache License (2.0)']
   s.summary = "The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event."
-  s.description
+  s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
   s.authors = ["Elastic", "Fabien Baligand"]
   s.email = 'info@elastic.co'
   s.homepage = "https://github.com/logstash-plugins/logstash-filter-aggregate"
data/spec/filters/aggregate_spec.rb
CHANGED
@@ -218,4 +218,30 @@ describe LogStash::Filters::Aggregate do
       end
     end
   end
+
+  context "push_previous_map_as_event option is defined, " do
+    describe "when a new task id is detected, " do
+      it "should push previous map as new event" do
+        push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 5 })
+        push_filter.filter(event({"taskid" => "1"})) { |yield_event| fail "task 1 shouldn't have yield event" }
+        push_filter.filter(event({"taskid" => "2"})) { |yield_event| expect(yield_event["taskid"]).to eq("1") }
+        expect(aggregate_maps.size).to eq(1)
+      end
+    end
+
+    describe "when timeout happens, " do
+      it "flush method should return last map as new event" do
+        push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1 })
+        push_filter.filter(event({"taskid" => "1"}))
+        sleep(2)
+        events_to_flush = push_filter.flush()
+        expect(events_to_flush).not_to be_nil
+        expect(events_to_flush.size).to eq(1)
+        expect(events_to_flush[0]["taskid"]).to eq("1")
+        expect(aggregate_maps.size).to eq(0)
+      end
+    end
+  end
+
+
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: logstash-filter-aggregate
 version: !ruby/object:Gem::Version
-  version: 2.1.2
+  version: 2.2.0
 platform: ruby
 authors:
 - Elastic
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-07-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement