RubyGems - fluent-plugin-cloudwatch-ingest - Versions diffs - 0.6.0 → 1.0.0 - Mend

fluent-plugin-cloudwatch-ingest 0.6.0 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +38 -0
data/README.md +19 -3
data/circle.yml +3 -0
data/lib/fluent/plugin/cloudwatch/ingest/version.rb +1 -1
data/lib/fluent/plugin/in_cloudwatch_ingest.rb +128 -59
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 97564b12944b487966ac98b5f0175b1fed34b2d9
-  data.tar.gz: 7bde517b6c51e6b78902da36d090d3ddb7594e4e
+  metadata.gz: 906c62816c5c16ed1fd7bfe31fc9eed873a0d149
+  data.tar.gz: a4d5b892302e21a96ef8c8d3b067a605cb1c49f5
 SHA512:
-  metadata.gz: 1d5ee8f52a6c7669ab9386b9874f3f227d20f5983550d65bcfd389925b6af3b84bcfa76f59f788547902930fad9ddcfabb60c1b0ff92e93d86497ec01050c064
-  data.tar.gz: 86f6914db8742f1c4eb852a7faea833c11714563ffee10c7f0c3a37acf83d5f713e5baaa2717ac4811f72ae1cba99cea2515f28782c3ad6ab8b189eec579a2b0
+  metadata.gz: a5e306f56e216c0742fd8983591481559569f93add886e1c5eca867f44c1b824dfcd912f29ea1a0ede09f0681841380a14cfddd1586557a2428e8e9b8a9ac87a
+  data.tar.gz: 9ea98c8202409bf0a0116dbbcd85204760c7f3e1645f24e4a8d1c69853b61a45b95ce9327a459190bc303359c0b9faee5d713b4597c408aebde4e603cab56987

data/CHANGELOG.md ADDED

@@ -0,0 +1,38 @@
+# Changelog
+## 0.1.3
+* Initial release
+## 0.2.1
+* AWS SDK logging
+* Code reorganization
+## 0.3.1
+* Limit events per API call
+* Parser constructor fix (@snltd)
+## 0.4.0
+* Optionally fetch oldest logs first (@chaeyk)
+## 0.5.4
+* Optionally parse the body as JSON into structured fields
+## 0.6.0
+* Add statsd telemetry
+## 1.0.0
+* Print a stack trace when rescuing exceptions (@chaeyk)
+* If stored API token is invalid or corrupt, use a stored timestamp (@chaeyk)
+* Truncate statefile before saving (@chaeyk)
+* Amend how `api_interval` is used (see README.md) (@chaeyk)
+* Improve null stream detection (@chaeyk)
+* Remove streams from state file that are no longer present (@chaeyk)
+* Apply `error_interval` when failing to get statefile lock (@chaeyk)
+* `api_interval` deprecated in favour of `error_interval`
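Several of the 1.0.0 entries above (falling back to a stored timestamp, truncating the state file before saving, dropping streams that are no longer present) concern the state file. The following is a minimal Ruby sketch of that round trip, not the plugin's own code. It assumes the layout visible in the code diff, `{ group => { stream => { 'token' => ..., 'timestamp' => ... } } }`, with 0.x files holding a bare forward-token string per stream. `migrate_state`, `load_state` and `save_state` are hypothetical names. Timestamps are taken to be milliseconds since the epoch, the unit the README gives for `event_start_time` and the unit CloudWatch Logs uses for `GetLogEvents` start times.

```ruby
require 'psych'

# Sketch only. Assumed 1.0.0 layout:
#   { group => { stream => { 'token' => ..., 'timestamp' => ... } } }
# Assumed legacy 0.x layout:
#   { group => { stream => 'forward-token-string' } }

# Convert legacy String entries to the Hash shape; leave new-style entries alone.
# now_ms is milliseconds since the epoch (an assumption, see the lead-in).
def migrate_state(store, now_ms = (Time.now.to_f * 1000).to_i)
  store.transform_values do |streams|
    streams.transform_values do |entry|
      entry.is_a?(String) ? { 'token' => entry, 'timestamp' => now_ms } : entry
    end
  end
end

# Read and migrate whatever YAML is in an already-open file.
# An empty file yields false or nil from safe_load, hence the `|| {}`.
def load_state(file)
  file.rewind
  migrate_state(Psych.safe_load(file.read) || {})
end

# Overwrite the file with new_store (in the diff, only the streams seen
# in the current pass). truncate(0) drops stale bytes that would remain
# if the new document were shorter than the old one.
def save_state(file, new_store)
  file.rewind
  file.truncate(0)
  file.write(Psych.dump(new_store))
  file.flush
  file.rewind
end
```

The migration returns the block's value from `transform_values` (Ruby 2.4+) rather than using `return`, because `return` inside a block exits the enclosing method instead of just supplying the new value. Writing only the streams touched in the current pass is what lets vanished streams fall out of the file.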

data/README.md CHANGED

@@ -1,4 +1,5 @@
-# Fluentd Cloudwatch Plugin [![Circle CI](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest.svg?style=shield)](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest) [![Gem Version](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest.svg)](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest) ![](http://ruby-gem-downloads-badge.herokuapp.com/fluent-plugin-cloudwatch-ingest?type=total)
+# Fluentd Cloudwatch Plugin
+[![Circle CI](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest.svg?style=shield)](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest) [![Gem Version](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest.svg)](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest) ![](http://ruby-gem-downloads-badge.herokuapp.com/fluent-plugin-cloudwatch-ingest?type=total) [![Join the chat at https://gitter.im/fluent-plugin-cloudwatch-ingest](https://badges.gitter.im/fluent-plugin-cloudwatch-ingest.svg)](https://gitter.im/fluent-plugin-cloudwatch-ingest/Lobby?utm_source=share-link&utm_medium=link&utm_campaign=share-link)
 ## Introduction
@@ -36,10 +37,9 @@ Or install it yourself as:
   aws_logging_enabled true
   log_group_name_prefix /aws/lambda
   log_stream_name_prefix 2017
-  limit_events 10000
   state_file_name /mnt/nfs/cloudwatch.state
   interval 60
-  api_interval 5            # Time to wait between API call failures before retry
+  error_interval 5          # Time to wait between error conditions before retry
   limit_events 10000        # Number of events to fetch in any given iteration
   event_start_time 0        # Do not fetch events before this time (UNIX epoch, milliseconds)
   oldest_logs_first false   # When true fetch the oldest logs first
@@ -81,6 +81,22 @@ If `fail_on_unparsable_json` is set to `true` a record body consisting of malfor
 The `expression` is applied before JSON parsing is attempted. One may therefore extract a JSON fragment from within the event body if it is decorated with additional free-form text.
+### Telemetry
+With `telemetry` set to `true` and a valid `statsd_endpoint`, the plugin will emit telemetry in statsd format to UDP port 8125 on that endpoint. It is up to you to configure your statsd-speaking daemon to add any prefix or tagging that you might want.
+The metrics emitted in this version are:
+```
+api.calls.describeloggroups.attempted
+api.calls.describeloggroups.failed
+api.calls.describelogstreams.attempted
+api.calls.describelogstreams.failed
+api.calls.getlogevents.attempted
+api.calls.getlogevents.failed
+api.calls.getlogevents.invalid_token
+events.emitted.success
+```
 ### Sub-second timestamps
 When using `event_time true` the `@timestamp` field for the record is taken from the time recorded against the event by Cloudwatch. This is the most common mode to run in as it's an easy path to normalization: all of your Lambdas or other AWS services need not have the same, valid, `time_format` nor a regex that matches every case.
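The plugin builds its client with `Statsd.new @statsd_endpoint, 8125` and records counters through `metric(:increment, ...)`, both visible in the code diff. The sketch below is a stdlib-only stand-in that shows what such an increment looks like on the wire, assuming the plain statsd line protocol in which a counter increment is the datagram `<name>:<value>|c`. `TinyStatsd` is a hypothetical class, not the client the plugin uses.

```ruby
require 'socket'

# Hypothetical stdlib-only statsd counter client (not the plugin's client).
# Assumes the plain statsd line protocol: "<metric>:<value>|c" per datagram.
class TinyStatsd
  def initialize(host, port = 8125)
    @host = host
    @port = port
    @socket = UDPSocket.new
  end

  # Build the counter line without touching the network.
  def self.counter_line(name, value = 1)
    "#{name}:#{value}|c"
  end

  # One UDP datagram per increment; no reply is expected.
  def increment(name, value = 1)
    @socket.send(self.class.counter_line(name, value), 0, @host, @port)
  end

  def close
    @socket.close
  end
end
```

For example, `TinyStatsd.new('127.0.0.1').increment('events.emitted.success')` sends `events.emitted.success:1|c` to UDP port 8125 on the local host.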

data/circle.yml CHANGED

@@ -9,3 +9,6 @@ deployment:
     owner: sampointer
     commands:
       - bin/deploy
+notify:
+  webhooks:
+    - url: https://webhooks.gitter.im/e/e1ae263bbcea2f51419f

data/lib/fluent/plugin/cloudwatch/ingest/version.rb CHANGED

@@ -2,7 +2,7 @@ module Fluent
   module Plugin
     module Cloudwatch
       module Ingest
-        VERSION = '0.6.0'.freeze
+        VERSION = '1.0.0'.freeze
       end
     end
   end

data/lib/fluent/plugin/in_cloudwatch_ingest.rb CHANGED

@@ -27,8 +27,9 @@ module Fluent::Plugin
     config_param :state_file_name, :string, default: '/var/spool/td-agent/cloudwatch.state' # rubocop:disable LineLength
     desc 'Fetch logs every interval'
     config_param :interval, :time, default: 60
-    desc 'Time to pause between API call failures and limits'
-    config_param :api_interval, :time, default: 5
+    desc 'Time to pause between error conditions'
+    config_param :error_interval, :time, default: 5
+    config_param :api_interval, :time
     desc 'Tag to apply to record'
     config_param :tag, :string, default: 'cloudwatch'
     desc 'Enabled AWS SDK logging'
@@ -88,6 +89,12 @@ module Fluent::Plugin
       # Configure telemetry, if enabled
       @statsd = Statsd.new @statsd_endpoint, 8125 if @telemetry
+      # Fixup deprecated options
+      if @api_interval
+        @error_interval = @api_interval
+        log.warn('api_interval is deprecated for error_interval')
+      end
       @parser = parser_create(conf: parser_config)
       log.info('Configured fluentd-plugin-cloudwatch-ingest')
     end
@@ -154,10 +161,10 @@ module Fluent::Plugin
           break unless response.next_token
           next_token = response.next_token
         rescue => boom
-          log.error("Unable to retrieve log groups: #{boom}")
+          log.error("Unable to retrieve log groups: #{boom.inspect}")
           metric(:increment, 'api.calls.describeloggroups.failed')
           next_token = nil
-          sleep @api_interval
+          sleep @error_interval
           retry
         end
       end
@@ -189,11 +196,11 @@ module Fluent::Plugin
           break unless response.next_token
           next_token = response.next_token
         rescue => boom
-          log.error("Unable to retrieve log streams for group #{log_group_name} with stream prefix #{log_stream_name_prefix}: #{boom}") # rubocop:disable LineLength
+          log.error("Unable to retrieve log streams for group #{log_group_name} with stream prefix #{log_stream_name_prefix}: #{boom.inspect}") # rubocop:disable LineLength
           metric(:increment, 'api.calls.describelogstreams.failed')
           log_streams = []
           next_token = nil
-          sleep @api_interval
+          sleep @error_interval
           retry
         end
       end
@@ -202,83 +209,135 @@ module Fluent::Plugin
       return log_streams
     end
+    def process_stream(group, stream, next_token, start_time, state)
+      event_count = 0
+      metric(:increment, 'api.calls.getlogevents.attempted')
+      response = @aws.get_log_events(
+        log_group_name: group,
+        log_stream_name: stream,
+        next_token: next_token,
+        limit: @limit_events,
+        start_time: start_time,
+        start_from_head: @oldest_logs_first
+      )
+      response.events.each do |e|
+        begin
+          emit(e, group, stream)
+          event_count += 1
+        rescue => boom
+          log.error("Failed to emit event #{e}: #{boom.inspect}")
+        end
+      end
+      has_stream_timestamp = true if state.store[group][stream]['timestamp']
+      if !has_stream_timestamp && response.events.count.zero?
+        # This stream has returned no data ever.
+        # In this case, don't save state (token could be an invalid one)
+      else
+        # Once all events for this stream have been processed,
+        # in this iteration, store the forward token
+        state.new_store[group][stream]['token'] = response.next_forward_token
+        if response.events.last
+          state.new_store[group][stream]['timestamp'] =
+            response.events.last.timestamp
+        else
+          state.new_store[group][stream]['timestamp'] =
+            state.store[group][stream]['timestamp']
+        end
+      end
+      return event_count
+    end
     def run
       until @finished
         begin
           state = State.new(@state_file_name, log)
         rescue => boom
-          log.info("Failed lock state. Sleeping for #{@interval}: #{boom}")
-          sleep @interval
-          retry
+          log.info("Failed lock state. Sleeping for #{@error_interval}: "\
+                   "#{boom.inspect}")
+          sleep @error_interval
+          next
         end
+        event_count = 0
         # Fetch the streams for each log group
         log_groups(@log_group_name_prefix).each do |group|
           # For each log stream get and emit the events
           log_streams(group, @log_stream_name_prefix).each do |stream|
+            state.store[group][stream] = {} unless state.store[group][stream]
+            log.info("processing stream: #{stream}")
             # See if we have some stored state for this group and stream.
             # If we have then use the stored forward_token to pick up
             # from that point. Otherwise start from the start.
-            if state.store[group] && state.store[group][stream]
-              stream_token =
-                (state.store[group][stream] if state.store[group][stream])
-            else
-              stream_token = nil
-            end
             begin
-              metric(:increment, 'api.calls.getlogevents.attempted')
-              response = @aws.get_log_events(
-                log_group_name: group,
-                log_stream_name: stream,
-                next_token: stream_token,
-                limit: @limit_events,
-                start_time: @event_start_time,
-                start_from_head: @oldest_logs_first
-              )
-              response.events.each do |e|
-                begin
-                  emit(e, group, stream)
-                rescue => boom
-                  log.error("Failed to emit event #{e}: #{boom}")
-                  metric(:increment, 'events.emitted.failed')
-                end
+              event_count += process_stream(group, stream,
+                                            state.store[group][stream]['token'],
+                                            @event_start_time, state)
+            rescue Aws::CloudWatchLogs::Errors::InvalidParameterException
+              metric(:increment, 'api.calls.getlogevents.invalid_token')
+              log.error('cloudwatch token is expired or broken. '\
+                        'trying with timestamp.')
+              # try again with timestamp instead of forward token
+              begin
+                timestamp = state.store[group][stream]['timestamp']
+                timestamp = @event_start_time unless timestamp
+                event_count += process_stream(group, stream,
+                                              nil, timestamp, state)
+              rescue => boom
+                log.error("Unable to retrieve events for stream #{stream} "\
+                          "in group #{group}: #{boom.inspect}") # rubocop:disable all
+                metric(:increment, 'api.calls.getlogevents.failed')
+                sleep @error_interval
+                next
               end
-              # Once all events for this stream have been processed,
-              # in this iteration, store the forward token
-              state.store[group][stream] = response.next_forward_token
             rescue => boom
-              log.error("Unable to retrieve events for stream #{stream} in group #{group}: #{boom}") # rubocop:disable LineLength
+              log.error("Unable to retrieve events for stream #{stream} in group #{group}: #{boom.inspect}") # rubocop:disable LineLength
               metric(:increment, 'api.calls.getlogevents.failed')
-              sleep @api_interval
-              retry
+              sleep @error_interval
+              next
             end
           end
         end
-        log.info('Pruning and saving state')
-        state.prune(log_groups(@log_group_name_prefix)) # Remove dead streams
+        log.info('Saving state')
         begin
           state.save
           state.close
-        rescue
-          log.error("Unable to save state file: #{boom}")
+        rescue => boom
+          log.error("Unable to save state file: #{boom.inspect}")
         end
-        log.info("Pausing for #{@interval}")
-        sleep @interval
+        if event_count > 0
+          sleep_interval = @interval
+        else
+          sleep_interval = @error_interval # when there is no events, slow down
+        end
+        log.info("#{event_count} events processed.")
+        log.info("Pausing for #{sleep_interval}")
+        sleep sleep_interval
       end
     end
     class CloudwatchIngestInput::State
       class LockFailed < RuntimeError; end
-      attr_accessor :statefile, :store
+      attr_accessor :statefile, :store, :new_store
       def initialize(filepath, log)
         @filepath = filepath
         @log = log
-        @store = Hash.new { |h, k| h[k] = {} }
+        @store = Hash.new { |h, k| h[k] = Hash.new { |x, y| x[y] = {} } }
+        @new_store = Hash.new { |h, k| h[k] = Hash.new { |x, y| x[y] = {} } }
         if File.exist?(filepath)
           self.statefile = Pathname.new(@filepath).open('r+')
@@ -288,7 +347,8 @@ module Fluent::Plugin
             self.statefile = Pathname.new(@filepath).open('w+')
             save
           rescue => boom
-            @log.error("Unable to create new file #{statefile.path}: #{boom}")
+            @log.error("Unable to create new file #{statefile.path}: "\
+                       "#{boom.inspect}")
           end
         end
@@ -298,13 +358,30 @@ module Fluent::Plugin
         lockstatus = statefile.flock(File::LOCK_EX | File::LOCK_NB)
         raise CloudwatchIngestInput::State::LockFailed if lockstatus == false
-        @store.merge!(Psych.safe_load(statefile.read))
-        @log.info("Loaded #{@store.keys.size} groups from #{statefile.path}")
+        begin
+          @store.merge!(Psych.safe_load(statefile.read))
+          # Migrate old state file
+          @store.each do |_group, streams|
+            streams.update(streams) do |_name, stream|
+              if stream.is_a? String
+                return { 'token' => stream, 'timestamp' => Time.now.to_i }
+              end
+              return stream
+            end
+          end
+          @log.info("Loaded #{@store.keys.size} groups from #{statefile.path}")
+        rescue
+          statefile.close
+          raise
+        end
       end
       def save
         statefile.rewind
-        statefile.write(Psych.dump(@store))
+        statefile.truncate(0)
+        statefile.write(Psych.dump(@new_store))
         @log.info("Saved state to #{statefile.path}")
         statefile.rewind
       end
@@ -312,14 +389,6 @@ module Fluent::Plugin
       def close
         statefile.close
       end
-      def prune(log_groups)
-        groups_before = @store.keys.size
-        @store.delete_if { |k, _v| true unless log_groups.include?(k) }
-        @log.info("Pruned #{groups_before - @store.keys.size} keys from store")
-        # TODO: also prune streams as these are most likely to be transient
-      end
     end
   end
 end
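The rewritten `run` loop tries each stream's stored forward token first, falls back to the stored timestamp (or `event_start_time`) when CloudWatch rejects the token with `InvalidParameterException`, and then picks its pause from whether any events were emitted. A minimal sketch of that control flow follows. The AWS call is replaced by a hypothetical `fetch` callable taking `(stream, token, start_time)` and returning the number of events emitted, and `InvalidTokenError` stands in for `Aws::CloudWatchLogs::Errors::InvalidParameterException`; both names are assumptions made so the sketch runs without the SDK.

```ruby
# Hypothetical stand-in for Aws::CloudWatchLogs::Errors::InvalidParameterException.
class InvalidTokenError < StandardError; end

# One pass over the streams, modelled on the 1.0.0 run loop (sketch only).
# state: { stream => { 'token' => ..., 'timestamp' => ... } } (assumed shape)
# fetch: hypothetical callable (stream, token, start_time) -> events emitted
def poll_once(streams, state, fetch, event_start_time: 0)
  event_count = 0
  streams.each do |stream|
    entry = state[stream] ||= {}
    begin
      # First attempt: resume from the stored forward token, if any.
      event_count += fetch.call(stream, entry['token'], event_start_time)
    rescue InvalidTokenError
      # Token rejected: retry once without a token, starting from the
      # stored timestamp, or from event_start_time if none was stored.
      begin
        event_count += fetch.call(stream, nil, entry['timestamp'] || event_start_time)
      rescue StandardError
        next # give up on this stream until the next pass
      end
    rescue StandardError
      next # any other failure: skip this stream for this pass
    end
  end
  event_count
end

# Pause choice as in the diff: interval after a pass that emitted events,
# error_interval after a pass that emitted none.
def next_sleep(event_count, interval:, error_interval:)
  event_count > 0 ? interval : error_interval
end
```

As in the diff, a stream that still fails after the fallback is skipped for the current pass instead of being retried in place; this sketch omits the diff's `sleep @error_interval` on failure so it can run instantly.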

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-cloudwatch-ingest
 version: !ruby/object:Gem::Version
-  version: 0.6.0
+  version: 1.0.0
 platform: ruby
 authors:
 - Sam Pointer
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-06-23 00:00:00.000000000 Z
+date: 2017-06-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -133,6 +133,7 @@ files:
 - ".rspec"
 - ".rubocop.yml"
 - ".ruby-version"
+- CHANGELOG.md
 - Gemfile
 - LICENSE
 - README.md