logstash-filter-aggregate 2.1.2 → 2.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +3 -0
- data/README.md +50 -4
- data/lib/logstash/filters/aggregate.rb +86 -9
- data/logstash-filter-aggregate.gemspec +2 -2
- data/spec/filters/aggregate_spec.rb +26 -0
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e023e6c80ed96fa874477b00888a78bf45ee56a7
+  data.tar.gz: 1375bebbcda30c0052f8f2aa5188dda548d53d6c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f611bd231a13d8933662dbe7df7075a551e89772f0007d4d694636f307e576c828f091f37c7a3f917b717bec9dbb0acacff944cff14a4fc27bee92318d394e5a
+  data.tar.gz: bccd38bcb4f37a1f8f4a06ab01baa92883d6d4e3848ff380fa318583830a3600fe464ecd93f299993a42908ea70f0fd596300b6331d469eadfef6b47b165b4db
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,6 @@
+## 2.2.0
+- new feature: add new option "push_previous_map_as_event" so that each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event
+
 ## 2.1.2
 - bugfix: clarify default timeout behaviour : by default, timeout is 1800s
 
data/README.md
CHANGED
@@ -4,9 +4,8 @@
 
 The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event.
 
-You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
-
-may be processed out of sequence and unexpected results will occur.
+You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly
+otherwise events may be processed out of sequence and unexpected results will occur.
 
 ## Example #1
 
@@ -101,6 +100,47 @@ the field `sql_duration` is added and contains the sum of all sql queries durati
 * the key point is the "||=" ruby operator.
 it allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
 
+## Example #3
+
+Third use case : you have no specific start event and no specific end event.
+A typical case is aggregating results from jdbc input plugin.
+* Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+* Using jdbc input plugin, you get these 3 events from :
+``` json
+  { "country_name": "France", "town_name": "Paris" }
+  { "country_name": "France", "town_name": "Marseille" }
+  { "country_name": "USA", "town_name": "New-York" }
+```
+* And you would like these 2 result events to push them into elasticsearch :
+``` json
+  { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+  { "country_name": "USA", "town_name": [ "New-York" ] }
+```
+* You can do that using `push_previous_map_as_event` aggregate plugin option :
+``` ruby
+filter {
+  aggregate {
+    task_id => "%{country_name}"
+    code => "
+      map['tags'] ||= ['aggregated']
+      map['town_name'] ||= []
+      event.to_hash.each do |key,value|
+        map[key] = value unless map.has_key?(key)
+        map[key] << value if map[key].is_a?(Array)
+      end
+    "
+    push_previous_map_as_event => true
+    timeout => 5
+  }
+
+  if "aggregated" not in [tags] {
+    drop {}
+  }
+}
+```
+* The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+* When 5s timeout comes, the last aggregate map is pushed as a new event
+* Finally, initial events (which are not aggregated) are dropped because useless
 
 ## How it works
 - the filter needs a "task_id" to correlate events (log lines) of a same task
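To see what the `code` block in Example #3 above computes, independent of Logstash, here is a minimal plain-Ruby sketch of the same aggregation over the three sample jdbc rows. The names `rows` and `maps` are scaffolding invented for this illustration, not plugin API:

``` ruby
# Plain-Ruby sketch of the Example #3 aggregation over the three sample rows.
rows = [
  { 'country_name' => 'France', 'town_name' => 'Paris' },
  { 'country_name' => 'France', 'town_name' => 'Marseille' },
  { 'country_name' => 'USA', 'town_name' => 'New-York' }
]

maps = {}  # plays the role of the plugin's aggregate maps : task_id => map
rows.each do |event|
  map = (maps[event['country_name']] ||= {})
  map['tags'] ||= ['aggregated']
  map['town_name'] ||= []
  event.each do |key, value|
    map[key] = value unless map.has_key?(key)    # keep first scalar value
    map[key] << value if map[key].is_a?(Array)   # accumulate array values
  end
end

maps.each_value { |map| puts map.inspect }
# {"tags"=>["aggregated"], "town_name"=>["Paris", "Marseille"], "country_name"=>"France"}
# {"tags"=>["aggregated"], "town_name"=>["New-York"], "country_name"=>"USA"}
```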
@@ -114,7 +154,7 @@ it allows to initialize 'sql_duration' map entry to 0 only if this map entry is
 
 ## Use Cases
 - extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
-- extract error information in any task log line, and push it in final task event (to get a final
+- extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
 - extract all back-end calls as a list, and push this list in final task event (to get a task profile)
 - extract all http headers logged in several lines to push this list in final task event (complete http request info)
 - for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -156,6 +196,12 @@ If not defined, aggregate maps will not be stored at logstash stop and will be l
 Must be defined in only one aggregate filter (as aggregate maps are global).
 Example value : `"/path/to/.aggregate_maps"`
 
+- **push_previous_map_as_event:**
+When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
+and then creates a new empty map for the next task.
+_WARNING:_ this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
+Default value: `false`
+
 
 ## Changelog
 
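To make that warning concrete, here is a hypothetical trace (all names invented for this sketch, not plugin code) of what goes wrong when a task id re-appears after another task has started: each change of task id pushes the previous map, so the re-appearing task is split across two events.

``` ruby
# Sketch of the interleaving caveat for push_previous_map_as_event.
pushed = []   # maps pushed as new logstash events
maps   = {}   # current aggregate maps, keyed by task id

%w[France USA France].each do |task_id|
  unless maps.has_key?(task_id)
    pushed << maps.shift[1] unless maps.empty?   # new task id => push previous map
    maps[task_id] = { 'country_name' => task_id, 'town_name' => [] }
  end
end

puts pushed.size        # => 2 : the first France map, then the USA map
puts maps.keys.inspect  # => ["France"] : a SECOND, empty France map
```

With the events ordered France, France, USA (all task1 events first), the same loop would push exactly one complete map per country.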
data/lib/logstash/filters/aggregate.rb
CHANGED
@@ -8,9 +8,8 @@ require "thread"
 # The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task,
 # and finally push aggregated information into final task event.
 #
-# You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work
-#
-# may be processed out of sequence and unexpected results will occur.
+# You should be very careful to set logstash filter workers to 1 (`-w 1` flag) for this filter to work correctly
+# otherwise events may be processed out of sequence and unexpected results will occur.
 #
 # ==== Example #1
 #
@@ -110,6 +109,52 @@ require "thread"
 # * the key point is the "||=" ruby operator. It allows to initialize 'sql_duration' map entry to 0 only if this map entry is not already initialized
 #
 #
+# ==== Example #3
+#
+# Third use case : you have no specific start event and no specific end event.
+# * A typical case is aggregating results from jdbc input plugin.
+# * Given that you have this SQL query : `SELECT country_name, town_name FROM town`
+# * Using jdbc input plugin, you get these 3 events from :
+# [source,json]
+# ----------------------------------
+#  { "country_name": "France", "town_name": "Paris" }
+#  { "country_name": "France", "town_name": "Marseille" }
+#  { "country_name": "USA", "town_name": "New-York" }
+# ----------------------------------
+# * And you would like these 2 result events to push them into elasticsearch :
+# [source,json]
+# ----------------------------------
+#  { "country_name": "France", "town_name": [ "Paris", "Marseille" ] }
+#  { "country_name": "USA", "town_name": [ "New-York" ] }
+# ----------------------------------
+# * You can do that using `push_previous_map_as_event` aggregate plugin option :
+# [source,ruby]
+# ----------------------------------
+# filter {
+#   aggregate {
+#     task_id => "%{country_name}"
+#     code => "
+#       map['tags'] ||= ['aggregated']
+#       map['town_name'] ||= []
+#       event.to_hash.each do |key,value|
+#         map[key] = value unless map.has_key?(key)
+#         map[key] << value if map[key].is_a?(Array)
+#       end
+#     "
+#     push_previous_map_as_event => true
+#     timeout => 5
+#   }
+#
+#   if "aggregated" not in [tags] {
+#     drop {}
+#   }
+# }
+# ----------------------------------
+# * The key point is that, each time aggregate plugin detects a new `country_name`, it pushes previous aggregate map as a new logstash event (with 'aggregated' tag), and then creates a new empty map for the next country
+# * When 5s timeout comes, the last aggregate map is pushed as a new event
+# * Finally, initial events (which are not aggregated) are dropped because useless
+#
+#
 # ==== How it works
 # * the filter needs a "task_id" to correlate events (log lines) of a same task
 # * at the task beggining, filter creates a map, attached to task_id
@@ -123,7 +168,7 @@ require "thread"
 #
 # ==== Use Cases
 # * extract some cool metrics from task logs and push them into task final log event (like in example #1 and #2)
-# * extract error information in any task log line, and push it in final task event (to get a final
+# * extract error information in any task log line, and push it in final task event (to get a final event with all error information if any)
 # * extract all back-end calls as a list, and push this list in final task event (to get a task profile)
 # * extract all http headers logged in several lines to push this list in final task event (complete http request info)
 # * for every back-end call, collect call details available on several lines, analyse it and finally tag final back-end call log line (error, timeout, business-warning, ...)
@@ -178,6 +223,12 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
   # Example value : `"/path/to/.aggregate_maps"`
   config :aggregate_maps_path, :validate => :string, :required => false
 
+  # When this option is enabled, each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event,
+  # and then creates a new empty map for the next task.
+  #
+  # WARNING: this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc...
+  config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false
+
 
   # Default timeout (in seconds) when not defined in plugin configuration
   DEFAULT_TIMEOUT = 1800
@@ -258,14 +309,22 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
     return if task_id.nil? || task_id == @task_id
 
     noError = false
+    event_to_yield = nil
 
     # protect aggregate_maps against concurrent access, using a mutex
     @@mutex.synchronize do
 
       # retrieve the current aggregate map
       aggregate_maps_element = @@aggregate_maps[task_id]
+
+      # create aggregate map, if it doesn't exist
       if (aggregate_maps_element.nil?)
         return if @map_action == "update"
+        # create new event from previous map, if @push_previous_map_as_event is enabled
+        if (@push_previous_map_as_event and !@@aggregate_maps.empty?)
+          previous_map = @@aggregate_maps.shift[1].map
+          event_to_yield = LogStash::Event.new(previous_map)
+        end
         aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(Time.now);
         @@aggregate_maps[task_id] = aggregate_maps_element
       else
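The `@@aggregate_maps.shift[1].map` line added above works because Ruby hashes preserve insertion order: `Hash#shift` removes and returns the oldest `[key, value]` pair, so `[1]` is the stored Element for the previous task and `.map` is its aggregate map. A quick standalone illustration:

``` ruby
# Ruby hashes iterate in insertion order, so Hash#shift pops the oldest entry.
maps = {}
maps['task1'] = 'element for task 1'
maps['task2'] = 'element for task 2'

oldest = maps.shift        # => ["task1", "element for task 1"]
puts oldest[1]             # => "element for task 1"  (what shift[1] selects)
puts maps.keys.inspect     # => ["task2"]
```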
@@ -284,10 +343,15 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
       # delete the map if task is ended
       @@aggregate_maps.delete(task_id) if @end_of_task
+
     end
 
     # match the filter, only if no error occurred
     filter_matched(event) if noError
+
+    # yield previous map as new event if set
+    yield event_to_yield unless event_to_yield.nil?
+
   end
 
   # Necessary to indicate logstash to periodically call 'flush' method
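The `yield event_to_yield` line relies on the block that the Logstash pipeline passes to a filter's `filter` method: anything the filter yields is injected into the event stream as an extra event, which is also how the spec further down observes the pushed map. A generic Ruby sketch of that pattern (illustrative names, not actual pipeline code):

``` ruby
# Generic sketch of the yield pattern: the caller supplies a block, and the
# method hands extra results back through it.
def filter(event)
  event_to_yield = "aggregated event built from the previous map"
  # ... normal processing of `event` would happen here ...
  yield event_to_yield unless event_to_yield.nil?
end

filter("current event") { |extra| puts "pipeline received: #{extra}" }
# => pipeline received: aggregated event built from the previous map
```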
@@ -305,20 +369,33 @@ class LogStash::Filters::Aggregate < LogStash::Filters::Base
 
     # Launch eviction only every interval of (@timeout / 2) seconds
     if (@@eviction_instance == self && (@@last_eviction_timestamp.nil? || Time.now > @@last_eviction_timestamp + @timeout / 2))
-
+      events_to_flush = remove_expired_maps()
       @@last_eviction_timestamp = Time.now
     end
 
-    return
+    return events_to_flush
   end
 
 
-  # Remove the expired Aggregate
-
+  # Remove the expired Aggregate maps from @@aggregate_maps if they are older than timeout.
+  # If @push_previous_map_as_event option is set, expired maps are returned as new events to be flushed to Logstash pipeline.
+  def remove_expired_maps()
+    events_to_flush = []
     min_timestamp = Time.now - @timeout
+
     @@mutex.synchronize do
-      @@aggregate_maps.delete_if
+      @@aggregate_maps.delete_if do |key, element|
+        if (element.creation_timestamp < min_timestamp)
+          if (@push_previous_map_as_event)
+            events_to_flush << LogStash::Event.new(element.map)
+          end
+          next true
+        end
+        next false
+      end
     end
+
+    return events_to_flush
   end
 
 end # class LogStash::Filters::Aggregate
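In the rewritten `remove_expired_maps()`, `Hash#delete_if` removes every entry whose block returns true; `next true` / `next false` are the block's return values, which lets the method collect expired maps into `events_to_flush` in the same pass. A standalone sketch of that pattern, with a hypothetical element layout standing in for the plugin's Element class:

``` ruby
# Standalone demo of the delete_if / next pattern used above.
Element = Struct.new(:creation_timestamp, :map)

events_to_flush = []
min_timestamp = Time.now - 1800   # mirrors DEFAULT_TIMEOUT

aggregate_maps = {
  'old_task' => Element.new(Time.now - 3600, { 'taskid' => 'old_task' }),
  'new_task' => Element.new(Time.now, { 'taskid' => 'new_task' })
}

aggregate_maps.delete_if do |key, element|
  if element.creation_timestamp < min_timestamp
    events_to_flush << element.map   # the plugin wraps this in LogStash::Event.new
    next true                        # expired : delete this entry
  end
  next false                         # fresh : keep it
end

puts events_to_flush.inspect      # => [{"taskid"=>"old_task"}]
puts aggregate_maps.keys.inspect  # => ["new_task"]
```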
data/logstash-filter-aggregate.gemspec
CHANGED
@@ -1,9 +1,9 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-filter-aggregate'
-  s.version = '2.1.2'
+  s.version = '2.2.0'
   s.licenses = ['Apache License (2.0)']
   s.summary = "The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated information into final task event."
-  s.description
+  s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
   s.authors = ["Elastic", "Fabien Baligand"]
   s.email = 'info@elastic.co'
   s.homepage = "https://github.com/logstash-plugins/logstash-filter-aggregate"
data/spec/filters/aggregate_spec.rb
CHANGED
@@ -218,4 +218,30 @@ describe LogStash::Filters::Aggregate do
       end
     end
   end
+
+  context "push_previous_map_as_event option is defined, " do
+    describe "when a new task id is detected, " do
+      it "should push previous map as new event" do
+        push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 5 })
+        push_filter.filter(event({"taskid" => "1"})) { |yield_event| fail "task 1 shouldn't have yield event" }
+        push_filter.filter(event({"taskid" => "2"})) { |yield_event| expect(yield_event["taskid"]).to eq("1") }
+        expect(aggregate_maps.size).to eq(1)
+      end
+    end
+
+    describe "when timeout happens, " do
+      it "flush method should return last map as new event" do
+        push_filter = setup_filter({ "code" => "map['taskid'] = event['taskid']", "push_previous_map_as_event" => true, "timeout" => 1 })
+        push_filter.filter(event({"taskid" => "1"}))
+        sleep(2)
+        events_to_flush = push_filter.flush()
+        expect(events_to_flush).not_to be_nil
+        expect(events_to_flush.size).to eq(1)
+        expect(events_to_flush[0]["taskid"]).to eq("1")
+        expect(aggregate_maps.size).to eq(0)
+      end
+    end
+  end
+
+
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: logstash-filter-aggregate
 version: !ruby/object:Gem::Version
-  version: 2.1.2
+  version: 2.2.0
 platform: ruby
 authors:
 - Elastic
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-07-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement