fluent-plugin-cloudwatch-ingest 1.1.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 284c46daa64025bff3faa261efa317eed76257ea
4
- data.tar.gz: 6fdbe82b7723d0f7e03c5c5e38346a7ca6274e97
3
+ metadata.gz: 9099fc9a7106d463763eedd19d543e9e1023231c
4
+ data.tar.gz: 2d0203a3edd8f96f7a0f25a71dadf79b1d6f8c3d
5
5
  SHA512:
6
- metadata.gz: 70cdaf653605e98b277717969c47fc54570204d3cc0d4f5a0e1c0228e1cf845587821609d41a23a7a728a5d1cd70be05b62eb296524e1956efde87ebd9eaf95b
7
- data.tar.gz: 21f9998c51a1f117ccf09d5a864a82df7dcaff27f55ec93d5dd642223ced0182ea26795e209363ed76ef8a18ca97f1c67970fc6805cc1628b39588b9638e4db5
6
+ metadata.gz: 41d7a5122379a9e2c8ff975e61b49475e2a1ee31b7ca3bba5aa468e5913f7d1ed2f71e7175f7fb2a2fef3c5a1f844150fc9a35054e0d235152b1fe39ba18132e
7
+ data.tar.gz: 39c4c78a663a47054c147e785075de17eb70f12933eb42a1f4e821485cb7e59bed17d260babe967b22edb84687213dafdbd6f7fa0a478e61eccb13eff265cc9a
data/CHANGELOG.md CHANGED
@@ -36,3 +36,14 @@
36
36
  * Remove streams from state file that are no longer present (@chaeyk)
37
37
  * Apply `error_interval` when failing to get statefile lock (@chaeyk)
38
38
  * `api_interval` deprecated in favour of `error_interval`
39
+
40
+ ## 1.1.0
41
+
42
+ * Update aws-sdk runtime dependency
43
+
44
+ ## 1.2.0
45
+
46
+ * Add the ability to inject both the `ingestion_time` returned from the Cloudwatch Logs API and the time at which this plugin ingested the event into the record.
47
+ * Add telemetry to the parser
48
+
49
+ Both of these changes are designed to make debugging ingestion problems from high-volume log groups easier.
data/README.md CHANGED
@@ -1,5 +1,5 @@
1
1
  # Fluentd Cloudwatch Plugin
2
- [![Circle CI](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest.svg?style=shield)](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest) [![Gem Version](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest.svg)](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest) ![](http://ruby-gem-downloads-badge.herokuapp.com/fluent-plugin-cloudwatch-ingest?type=total) [![Join the chat at https://gitter.im/fluent-plugin-cloudwatch-ingest](https://badges.gitter.im/fluent-plugin-cloudwatch-ingest.svg)](https://gitter.im/fluent-plugin-cloudwatch-ingest/Lobby?utm_source=share-link&utm_medium=link&utm_campaign=share-link)[![Dependency Status](https://gemnasium.com/badges/github.com/sampointer/fluent-plugin-cloudwatch-ingest.svg)](https://gemnasium.com/github.com/sampointer/fluent-plugin-cloudwatch-ingest)
2
+ [![Circle CI](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest.svg?style=shield)](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest) [![Gem Version](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest.svg)](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest) ![](http://ruby-gem-downloads-badge.herokuapp.com/fluent-plugin-cloudwatch-ingest?type=total) [![Join the chat at https://gitter.im/fluent-plugin-cloudwatch-ingest](https://badges.gitter.im/fluent-plugin-cloudwatch-ingest.svg)](https://gitter.im/fluent-plugin-cloudwatch-ingest/Lobby?utm_source=share-link&utm_medium=link&utm_campaign=share-link) [![Dependency Status](https://gemnasium.com/badges/github.com/sampointer/fluent-plugin-cloudwatch-ingest.svg)](https://gemnasium.com/github.com/sampointer/fluent-plugin-cloudwatch-ingest)
3
3
 
4
4
  ## Introduction
5
5
 
@@ -49,11 +49,15 @@ Or install it yourself as:
49
49
  @type cloudwatch_ingest
50
50
  expression /^(?<message>.+)$/
51
51
  time_format %Y-%m-%d %H:%M:%S.%L
52
- event_time true # take time from the Cloudwatch event, rather than parse it from the body
53
- inject_group_name true # inject the group name into the record
54
- inject_stream_name true # inject the stream name into the record
55
- parse_json_body false # Attempt to parse the body as json and add structured fields from the result
56
- fail_on_unparsable_json false # If the body cannot be parsed as json do not ingest the record
52
+ event_time true # take time from the Cloudwatch event, rather than parse it from the body
53
+ inject_group_name true # inject the group name into the record
54
+ inject_stream_name true # inject the stream name into the record
55
+ inject_cloudwatch_ingestion_time field_name # inject the `ingestion_time` as returned by the Cloudwatch API
56
+ inject_plugin_ingestion_time field_name # inject the ISO 8601 time at which the plugin ingested the event
57
+ parse_json_body false # Attempt to parse the body as json and add structured fields from the result
58
+ fail_on_unparsable_json false # If the body cannot be parsed as json do not ingest the record
59
+ telemetry false # Produce statsd telemetry
60
+ statsd_endpoint localhost # Endpoint to which telemetry should be sent
57
61
  </parse>
58
62
  </source>
59
63
  ```
@@ -81,6 +85,11 @@ If `fail_on_unparsable_json` is set to `true` a record body consisting of malfor
81
85
 
82
86
  The `expression` is applied before JSON parsing is attempted. One may therefore extract a JSON fragment from within the event body if it is decorated with additional free-form text.
83
87
 
88
+ ### High volume Log Groups
89
+ If you're having ingestion problems with high-volume log groups, enable telemetry in both the main plugin and the parser, and set both `inject_cloudwatch_ingestion_time` and `inject_plugin_ingestion_time` to the name of the record field that should receive each timestamp.
90
+
91
+ This lets you plot the state of your rate limiting and the effect of the ingestion delay _inside_ Cloudwatch Logs (`timestamp` vs `ingestion_time`), and then take appropriate tuning action.
92
+
84
93
  ### Telemetry
85
94
  With `telemetry` set to `true` and a valid `statsd_endpoint` the plugin will emit telemetry in statsd format to 8125:UDP. It is up to you to configure your statsd-speaking daemon to add any prefix or tagging that you might want.
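For readers unfamiliar with the statsd wire format, the sketch below builds the plain-text lines a statsd client would send for a counter increment and a gauge. The helper names `statsd_counter` and `statsd_gauge` are hypothetical; the plugin itself delegates this work to the `statsd-ruby` gem, so this only illustrates the `name:value|type` line format.

```ruby
# Hypothetical helpers illustrating the statsd line format
# (`<name>:<value>|<type>`); they are not part of the plugin.

# A counter increment, e.g. "events.emitted.success:1|c".
def statsd_counter(name, by = 1)
  "#{name}:#{by}|c"
end

# A gauge carrying an absolute value, e.g. "parser.ingestion_skew:2500|g".
def statsd_gauge(name, value)
  "#{name}:#{value}|g"
end

puts statsd_counter('events.emitted.success')
puts statsd_gauge('parser.ingestion_skew', 2500)
```

Any prefixing or tagging would then be applied by the statsd-speaking daemon that receives these lines.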
86
95
 
@@ -97,6 +106,16 @@ api.calls.getlogevents.invalid_token
97
106
  events.emitted.success
98
107
  ```
99
108
 
109
+ Likewise when telemetry is enabled for the parser, the emitted metrics are:
110
+
111
+ ```
112
+ parser.record.attempted
113
+ parser.record.success
114
+ parser.json.success # if json parsing is enabled
115
+ parser.json.failed # if json parsing is enabled
116
+ parser.ingestion_skew # the difference between `timestamp` and `ingestion_time` as returned by the Cloudwatch API
117
+ ```
118
+
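As a rough illustration of what `parser.ingestion_skew` measures, the sketch below computes the gap between an event's `timestamp` and its `ingestion_time`, both of which the Cloudwatch Logs API reports as epoch milliseconds. The `Event` struct and the `ingestion_skew_ms` helper are hypothetical names used only for this example.

```ruby
# Hypothetical stand-in for a Cloudwatch Logs event. Both fields are
# epoch milliseconds, as returned by the API.
Event = Struct.new(:timestamp, :ingestion_time)

# Milliseconds between when the event occurred and when Cloudwatch Logs
# ingested it. This mirrors the value the parser reports as a gauge.
def ingestion_skew_ms(event)
  event.ingestion_time - event.timestamp
end

event = Event.new(1_503_014_400_000, 1_503_014_402_500)
puts ingestion_skew_ms(event)
```

A persistently large or growing value suggests the delay sits inside Cloudwatch Logs rather than in the plugin.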
100
119
  ### Sub-second timestamps
101
120
  When using `event_time true` the `@timestamp` field for the record is taken from the time recorded against the event by Cloudwatch. This is the most common mode to run in as it's an easy path to normalization: all of your Lambdas or other AWS services need not have the same, valid, `time_format` nor a regex that matches every case.
102
121
 
@@ -2,7 +2,7 @@ module Fluent
2
2
  module Plugin
3
3
  module Cloudwatch
4
4
  module Ingest
5
- VERSION = '1.1.0'.freeze
5
+ VERSION = '1.2.0'.freeze
6
6
  end
7
7
  end
8
8
  end
@@ -5,6 +5,7 @@ require 'fluent/plugin/parser'
5
5
  require 'json'
6
6
  require 'pathname'
7
7
  require 'psych'
8
+ require 'statsd-ruby'
8
9
 
9
10
  module Fluent::Plugin
10
11
  class CloudwatchIngestInput < Fluent::Plugin::Input
@@ -1,6 +1,8 @@
1
+ require 'date'
1
2
  require 'fluent/plugin/parser_regexp'
2
3
  require 'fluent/time'
3
4
  require 'multi_json'
5
+ require 'statsd-ruby'
4
6
 
5
7
  module Fluent
6
8
  module Plugin
@@ -11,9 +13,13 @@ module Fluent
11
13
  config_param :time_format, :string, default: '%Y-%m-%d %H:%M:%S.%L'
12
14
  config_param :event_time, :bool, default: true
13
15
  config_param :inject_group_name, :bool, default: true
16
+ config_param :inject_cloudwatch_ingestion_time, :string, default: false
17
+ config_param :inject_plugin_ingestion_time, :string, default: false
14
18
  config_param :inject_stream_name, :bool, default: true
15
19
  config_param :parse_json_body, :bool, default: false
16
20
  config_param :fail_on_unparsable_json, :bool, default: false
21
+ config_param :telemetry, :bool, default: false
22
+ config_param :statsd_endpoint, :string, default: 'localhost'
17
23
 
18
24
  def initialize
19
25
  super
@@ -21,9 +27,21 @@ module Fluent
21
27
 
22
28
  def configure(conf)
23
29
  super
30
+ @statsd = Statsd.new @statsd_endpoint, 8125 if @telemetry
31
+ end
32
+
33
+ def metric(method, name, value = 0)
34
+ case method
35
+ when :increment
36
+ @statsd.send(method, name) if @telemetry
37
+ else
38
+ @statsd.send(method, name, value) if @telemetry
39
+ end
24
40
  end
25
41
 
26
42
  def parse(event, group, stream)
43
+ metric(:increment, 'parser.record.attempted')
44
+
27
45
  time = nil
28
46
  record = nil
29
47
  super(event.message) do |t, r|
@@ -38,10 +56,12 @@ module Fluent
38
56
  # message into the record we'd bork on
39
57
  # nested keys. Force level one Strings.
40
58
  json_body = MultiJson.load(record['message'])
59
+ metric(:increment, 'parser.json.success')
41
60
  json_body.each_pair do |k, v|
42
61
  record[k.to_s] = v.to_s
43
62
  end
44
63
  rescue MultiJson::ParseError
64
+ metric(:increment, 'parser.json.failed')
45
65
  if @fail_on_unparsable_json
46
66
  yield nil, nil
47
67
  return
@@ -53,6 +73,27 @@ module Fluent
53
73
  record['log_group_name'] = group if @inject_group_name
54
74
  record['log_stream_name'] = stream if @inject_stream_name
55
75
 
76
+ if @inject_plugin_ingestion_time
77
+ now = DateTime.now
78
+ record[@inject_plugin_ingestion_time] = now.iso8601
79
+ end
80
+
81
+ if @inject_cloudwatch_ingestion_time
82
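+ epoch_s = event.ingestion_time.to_f / 1000 # API value is epoch milliseconds
83
+ ingested_at = Time.at(epoch_s) # separate local so the parsed `time` is not overwritten
84
+ record[@inject_cloudwatch_ingestion_time] =
85
+ ingested_at.to_datetime.iso8601(3)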
86
+ end
87
+
88
+ # Optionally emit cloudwatch event and ingestion time skew telemetry
89
+ if @telemetry
90
+ metric(
91
+ :gauge,
92
+ 'parser.ingestion_skew',
93
+ event.ingestion_time - event.timestamp
94
+ )
95
+ end
96
+
56
97
  # We do String processing on the event time here to
57
98
  # avoid rounding errors introduced by floating point
58
99
  # arithmetic.
@@ -61,6 +102,7 @@ module Fluent
61
102
 
62
103
  time = Fluent::EventTime.new(event_s, event_ns) if @event_time
63
104
 
105
+ metric(:increment, 'parser.record.success')
64
106
  yield time, record
65
107
  end
66
108
  end
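The parser change above converts `event.ingestion_time` (epoch milliseconds) into an ISO 8601 string with millisecond precision. A minimal standalone sketch of that conversion follows. `epoch_ms_to_iso8601` is a hypothetical helper name, and unlike the plugin code it normalises to UTC and uses a `Rational` instead of a float, purely so the output is deterministic.

```ruby
require 'time' # provides Time#iso8601

# Hypothetical helper: convert epoch milliseconds into an ISO 8601
# string with millisecond precision, expressed in UTC.
def epoch_ms_to_iso8601(epoch_ms)
  # Rational keeps the millisecond part exact (no float rounding).
  Time.at(Rational(epoch_ms, 1000)).utc.iso8601(3)
end

puts epoch_ms_to_iso8601(1_503_014_402_500) # 2017-08-18T00:00:02.500Z
```

The plugin's own code keeps the local time zone offset, so its output will differ from this sketch on hosts that are not running in UTC.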
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-cloudwatch-ingest
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.0
4
+ version: 1.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sam Pointer
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2017-07-07 00:00:00.000000000 Z
11
+ date: 2017-08-18 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -168,7 +168,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
168
168
  version: '0'
169
169
  requirements: []
170
170
  rubyforge_project:
171
- rubygems_version: 2.6.11
171
+ rubygems_version: 2.6.12
172
172
  signing_key:
173
173
  specification_version: 4
174
174
  summary: Fluentd plugin to ingest AWS Cloudwatch logs