logstash-filter-aggregate 2.1.2 → 2.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +3 -0
- data/README.md +50 -4
- data/lib/logstash/filters/aggregate.rb +86 -9
- data/logstash-filter-aggregate.gemspec +2 -2
- data/spec/filters/aggregate_spec.rb +26 -0
- metadata +2 -2
checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e023e6c80ed96fa874477b00888a78bf45ee56a7
+  data.tar.gz: 1375bebbcda30c0052f8f2aa5188dda548d53d6c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f611bd231a13d8933662dbe7df7075a551e89772f0007d4d694636f307e576c828f091f37c7a3f917b717bec9dbb0acacff944cff14a4fc27bee92318d394e5a
+  data.tar.gz: bccd38bcb4f37a1f8f4a06ab01baa92883d6d4e3848ff380fa318583830a3600fe464ecd93f299993a42908ea70f0fd596300b6331d469eadfef6b47b165b4db
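For readers who want to verify the artifacts, the digests above can be recomputed locally. A minimal Ruby sketch, assuming `data.tar.gz` has been extracted from the downloaded `.gem` archive into the current directory (the path is illustrative):

``` ruby
# Recompute the digests listed in checksums.yaml for one artifact.
# 'data.tar.gz' is an assumed local path, extracted from the .gem file.
require 'digest'

path = 'data.tar.gz'
puts Digest::SHA1.file(path).hexdigest    # expect 1375bebbcda30c0052f8f2aa5188dda548d53d6c
puts Digest::SHA512.file(path).hexdigest  # expect bccd38bc... (full SHA512 value above)
```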
data/CHANGELOG.md CHANGED

@@ -1,3 +1,6 @@
+## 2.2.0
+ - new feature: add new option "push_previous_map_as_event" so that each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event
+
 ## 2.1.2
  - bugfix: clarify default timeout behaviour : by default, timeout is 1800s
 
data/README.md CHANGED

@@ -4,9 +4,8 @@
 
 The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event.
 
-You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
-
-may be processed out of sequence and unexpected results will occur.
+You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly
+otherwise events may be processed out of sequence and unexpected results will occur.
 
 ## Example #1
 
@@ -101,6 +100,47 @@ the field `sql_duration` is added and contains the sum of all sql queries durati
 * the key point is the "||=" ruby operator.
 it allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
 
+## Example #3
+
+Third use case : you have no specific start event and no specific end event.
+A typical case is aggregating results from jdbc input plugin.
+* Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+* Using jdbc input plugin, you get these 3 events from :
+``` json
+    { "country_name": "France", "town_name": "Paris" }
+    { "country_name": "France", "town_name": "Marseille" }
+    { "country_name": "USA", "town_name": "New-York" }
+```
+* And you would like these 2 result events to push them into elasticsearch :
+``` json
+    { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+    { "country_name": "USA", "town_name": [ "New-York" ] }
+```
+* You can do that using `push_previous_map_as_event` aggregate plugin option :
+``` ruby
+filter {
+  aggregate {
+    task_id => "%{country_name}"
+    code => "
+      map['tags'] ||= ['aggregated']
+      map['town_name'] ||= []
+      event.to_hash.each do |key,value|
+        map[key] = value unless map.has_key?(key)
+        map[key] << value if map[key].is_a?(Array)
+      end
+    "
+    push_previous_map_as_event => true
+    timeout => 5
+  }
+
+  if "aggregated" not in [tags] {
+    drop {}
+  }
+}
+```
+* The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+* When 5s timeout comes, the last aggregate map is pushed as a new event
+* Finally, initial events (which are not aggregated) are dropped because useless
 
 ## How it works
 - the filter needs a "task_id" to correlate events (log lines) of a same task
@@ -114,7 +154,7 @@ it allows to initialize 'sql_duration' map entry to 0 only if this map entry is
 
 ## Use Cases
 - extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
-- extract error information in any task log line, and push it in final task event (to get a final
+- extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
 - extract all back-end calls as a list, and push this list in final task event (to get a task profile)
 - extract all http headers logged in several lines to push this list in final task event (complete http request info)
 - for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -156,6 +196,12 @@ If not defined, aggregate maps will not be stored at logstash stop and will be l
 Must be defined in only one aggregate filter (as aggregate maps are global).
 Example value : `"/path/to/.aggregate_maps"`
 
+- **push_previous_map_as_event:**
+When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
+and then creates a new empty map for the next task.
+_WARNING:_ this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
+Default value: `false`
+
 
 ## Changelog
 
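To make the merge logic in the README's Example #3 concrete, here is a minimal plain-Ruby sketch of what the `code` block does to the three sample jdbc events. It runs outside Logstash; the `maps` hash keyed by `country_name` is a stand-in for the plugin's internal aggregate map store:

``` ruby
# Plain-Ruby sketch of the Example #3 merge logic (not running inside Logstash).
events = [
  { "country_name" => "France", "town_name" => "Paris" },
  { "country_name" => "France", "town_name" => "Marseille" },
  { "country_name" => "USA",    "town_name" => "New-York" },
]

# One map per task_id (country_name), pre-seeded like the ||= lines in the code block.
maps = Hash.new { |h, k| h[k] = { "tags" => ["aggregated"], "town_name" => [] } }

events.each do |event|
  map = maps[event["country_name"]]
  event.each do |key, value|
    map[key] = value unless map.has_key?(key)  # keep the first scalar value seen
    map[key] << value if map[key].is_a?(Array) # accumulate values into arrays
  end
end

maps.each_value { |m| p m }
# {"tags"=>["aggregated"], "town_name"=>["Paris", "Marseille"], "country_name"=>"France"}
# {"tags"=>["aggregated"], "town_name"=>["New-York"], "country_name"=>"USA"}
```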
data/lib/logstash/filters/aggregate.rb CHANGED

@@ -8,9 +8,8 @@ require "thread"
 # The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task,
 # and finally push aggregated information into final task event.
 #
-# You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
-#
-# may be processed out of sequence and unexpected results will occur.
+# You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly
+# otherwise events may be processed out of sequence and unexpected results will occur.
 #
 # ==== Example #1
 #
@@ -110,6 +109,52 @@ require "thread"
 # * the key point is the "||=" ruby operator. It allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
 #
 #
+# ==== Example #3
+#
+# Third use case : you have no specific start event and no specific end event.
+# * A typical case is aggregating results from jdbc input plugin.
+# * Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+# * Using jdbc input plugin, you get these 3 events from :
+# [source,json]
+# ----------------------------------
+#  { "country_name": "France", "town_name": "Paris" }
+#  { "country_name": "France", "town_name": "Marseille" }
+#  { "country_name": "USA", "town_name": "New-York" }
+# ----------------------------------
+# * And you would like these 2 result events to push them into elasticsearch :
+# [source,json]
+# ----------------------------------
+#  { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+#  { "country_name": "USA", "town_name": [ "New-York" ] }
+# ----------------------------------
+# * You can do that using `push_previous_map_as_event` aggregate plugin option :
+# [source,ruby]
+# ----------------------------------
+# filter {
+#   aggregate {
+#     task_id => "%{country_name}"
+#     code => "
+#       map['tags'] ||= ['aggregated']
+#       map['town_name'] ||= []
+#       event.to_hash.each do |key,value|
+#         map[key] = value unless map.has_key?(key)
+#         map[key] << value if map[key].is_a?(Array)
+#       end
+#     "
+#     push_previous_map_as_event => true
+#     timeout => 5
+#   }
+#
+#   if "aggregated" not in [tags] {
+#     drop {}
+#   }
+# }
+# ----------------------------------
+# * The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+# * When 5s timeout comes, the last aggregate map is pushed as a new event
+# * Finally, initial events (which are not aggregated) are dropped because useless
+#
+#
 # ==== How it works
 # * the filter needs a "task_id" to correlate events (log lines) of a same task
 # * at the task beggining, filter creates a map, attached to task_id
@@ -123,7 +168,7 @@ require "thread"
 #
 # ==== Use Cases
 # * extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
-# * extract error information in any task log line, and push it in final task event (to get a final
+# * extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
 # * extract all back-end calls as a list, and push this list in final task event (to get a task profile)
 # * extract all http headers logged in several lines to push this list in final task event (complete http request info)
 # * for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -178,6 +223,12 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
   # Example value : `"/path/to/.aggregate_maps"`
   config :aggregate_maps_path, :validate => :string, :required => false
 
+  # When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
+  # and then creates a new empty map for the next task.
+  #
+  # WARNING: this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
+  config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false
+
 
   # Default timeout (in seconds) when not defined in plugin configuration
   DEFAULT_TIMEOUT = 1800
@@ -258,14 +309,22 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
     return if task_id.nil? || task_id == @task_id
 
     noError = false
+    event_to_yield = nil
 
     # protect aggregate_maps against concurrent access, using a mutex
     @@mutex.synchronize do
 
       # retrieve the current aggregate map
      aggregate_maps_element = @@aggregate_maps[task_id]
+
+      # create aggregate map, if it doesn't exist
       if (aggregate_maps_element.nil?)
         return if @map_action == "update"
+        # create new event from previous map, if @push_previous_map_as_event is enabled
+        if (@push_previous_map_as_event and !@@aggregate_maps.empty?)
+          previous_map = @@aggregate_maps.shift[1].map
+          event_to_yield = LogStash::Event.new(previous_map)
+        end
         aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(Time.now);
         @@aggregate_maps[task_id] = aggregate_maps_element
       else
@@ -284,10 +343,15 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
       # delete the map if task is ended
       @@aggregate_maps.delete(task_id) if @end_of_task
+
     end
 
     # match the filter, only if no error occurred
     filter_matched(event) if noError
+
+    # yield previous map as new event if set
+    yield event_to_yield unless event_to_yield.nil?
+
   end
 
   # Necessary to indicate logstash to periodically call 'flush' method
@@ -305,20 +369,33 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
     # Launch eviction only every interval of (@timeout / 2) seconds
     if (@@eviction_instance == self && (@@last_eviction_timestamp.nil? || Time.now > @@last_eviction_timestamp + @timeout / 2))
-
+      events_to_flush = remove_expired_maps()
       @@last_eviction_timestamp = Time.now
     end
 
-    return
+    return events_to_flush
   end
 
 
-  # Remove the expired Aggregate
-
+  # Remove the expired Aggregate maps from @@aggregate_maps if they are older than timeout.
+  # If @push_previous_map_as_event option is set, expired maps are returned as new events to be flushed to Logstash pipeline.
+  def remove_expired_maps()
+    events_to_flush = []
     min_timestamp = Time.now - @timeout
+
     @@mutex.synchronize do
-      @@aggregate_maps.delete_if
+      @@aggregate_maps.delete_if do |key, element|
+        if (element.creation_timestamp < min_timestamp)
+          if (@push_previous_map_as_event)
+            events_to_flush << LogStash::Event.new(element.map)
+          end
+          next true
+        end
+        next false
+      end
     end
+
+    return events_to_flush
   end
 
 end # class LogStash::Filters::Aggregate
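The rewritten `remove_expired_maps()` above relies on a Ruby idiom worth spelling out: inside a `Hash#delete_if` block, `next true` evicts the entry and `next false` keeps it, which lets the method collect evicted maps before they disappear. A standalone sketch of that idiom, with a simplified stand-in for the plugin's `Element` struct:

``` ruby
# Standalone demo of the delete_if / next true / next false eviction idiom.
# Element is a simplified stand-in for LogStash::Filters::Aggregate::Element.
Element = Struct.new(:creation_timestamp, :map)

maps = {
  "task1" => Element.new(Time.now - 3600, { "taskid" => "task1" }), # expired
  "task2" => Element.new(Time.now,        { "taskid" => "task2" }), # fresh
}

timeout = 1800
min_timestamp = Time.now - timeout
events_to_flush = []

maps.delete_if do |key, element|
  if element.creation_timestamp < min_timestamp
    events_to_flush << element.map # stand-in for LogStash::Event.new(element.map)
    next true                      # evict this entry
  end
  next false                       # keep this entry
end

p events_to_flush # [{"taskid"=>"task1"}]
p maps.keys       # ["task2"]
```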
data/logstash-filter-aggregate.gemspec CHANGED

@@ -1,9 +1,9 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-filter-aggregate'
-  s.version
+  s.version = '2.2.0'
   s.licenses = ['Apache License (2.0)']
   s.summary = "The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event."
-  s.description
+  s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
   s.authors = ["Elastic", "Fabien Baligand"]
   s.email = 'info@elastic.co'
   s.homepage = "https://github.com/logstash-plugins/logstash-filter-aggregate"
data/spec/filters/aggregate_spec.rb CHANGED

@@ -218,4 +218,30 @@ describe LogStash::Filters::Aggregate do
       end
     end
   end
+
+  context "push_previous_map_as_event option is defined, " do
+    describe "when a new task id is detected, " do
+      it "should push previous map as new event" do
+        push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 5 })
+        push_filter.filter(event({"taskid" => "1"})) { |yield_event| fail "task 1 shouldn't have yield event" }
+        push_filter.filter(event({"taskid" => "2"})) { |yield_event| expect(yield_event["taskid"]).to eq("1") }
+        expect(aggregate_maps.size).to eq(1)
+      end
+    end
+
+    describe "when timeout happens, " do
+      it "flush method should return last map as new event" do
+        push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1 })
+        push_filter.filter(event({"taskid" => "1"}))
+        sleep(2)
+        events_to_flush = push_filter.flush()
+        expect(events_to_flush).not_to be_nil
+        expect(events_to_flush.size).to eq(1)
+        expect(events_to_flush[0]["taskid"]).to eq("1")
+        expect(aggregate_maps.size).to eq(0)
+      end
+    end
+  end
+
+
 end
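The first new spec exercises the block-passing contract added to `filter`: when a new task id arrives, the previously accumulated map is handed to the caller's block as a fresh event. A minimal self-contained sketch of that contract, using a hypothetical `MiniAggregate` class rather than the real plugin (no mutex, timeout, or LogStash::Event):

``` ruby
# Hypothetical MiniAggregate: illustrates only the yield contract the spec tests.
class MiniAggregate
  def initialize
    @maps = {} # task_id => map
  end

  def filter(event)
    task_id = event["taskid"]
    unless @maps.key?(task_id)
      previous = @maps.shift # oldest [task_id, map] pair, or nil if empty
      yield previous[1] if block_given? && previous # push previous map as new event
      @maps[task_id] = {}
    end
    @maps[task_id]["taskid"] = task_id
  end
end

f = MiniAggregate.new
f.filter({ "taskid" => "1" }) { |e| raise "task 1 shouldn't yield" }
f.filter({ "taskid" => "2" }) { |e| puts "yielded: #{e.inspect}" }
# => yielded: {"taskid"=>"1"}
```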
metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: logstash-filter-aggregate
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.2.0
 platform: ruby
 authors:
 - Elastic
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-07-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement