fluent-plugin-cloudwatch-ingest-chaeyk 0.3.2

data/README.md ADDED
@@ -0,0 +1,132 @@
+ # Fluentd Cloudwatch Plugin [![Circle CI](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest.svg?style=shield)](https://circleci.com/gh/sampointer/fluent-plugin-cloudwatch-ingest) [![Gem Version](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest.svg)](https://badge.fury.io/rb/fluent-plugin-cloudwatch-ingest) ![](http://ruby-gem-downloads-badge.herokuapp.com/fluent-plugin-cloudwatch-ingest?type=total)
+
+ ## Introduction
+
+ This gem was created out of frustration with existing solutions for Cloudwatch log ingestion into a Fluentd pipeline. Specifically, it has been designed to support:
+
+ * The 0.14.x fluentd plugin API
+ * Native IAM including cross-account authentication via STS
+ * Tidy state serialization
+ * HA configurations without ingestion duplication
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'fluent-plugin-cloudwatch-ingest'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install fluent-plugin-cloudwatch-ingest
+
+ ## Usage
+ ```
+ <source>
+   @type cloudwatch_ingest
+   region us-east-1
+   sts_enabled true
+   sts_arn arn:aws:iam::123456789012:role/role_in_another_account
+   sts_session_name fluentd-dev
+   aws_logging_enabled true
+   log_group_name_prefix /aws/lambda
+   log_stream_name_prefix 2017
+   state_file_name /mnt/nfs/cloudwatch.state
+   interval 60
+   api_interval 5 # Time to wait between API call failures before retry
+   limit_events 10000 # Number of events to fetch in any given iteration
+   event_start_time 0 # Do not fetch events before this time (UNIX epoch, milliseconds)
+   <parse>
+     @type cloudwatch_ingest
+     expression /^(?<message>.+)$/
+     time_format %Y-%m-%d %H:%M:%S.%L
+     event_time true # take time from the Cloudwatch event, rather than parse it from the body
+     inject_group_name true # inject the group name into the record
+     inject_stream_name true # inject the stream name into the record
+   </parse>
+ </source>
+ ```
+
+ ### Authentication
+ The plugin runs under an IAM instance role. Without the `sts_*` options set, that role is used for authentication directly. With them set, the plugin will attempt to `sts:AssumeRole` the `sts_arn`. This is useful for fetching logs from many accounts when the fluentd infrastructure lives in a single account.
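+
+ For example, a minimal sketch of a non-STS source relying solely on the instance role (values illustrative; the `<parse>` options shown are the ones `configure` insists upon):
+
+ ```
+ <source>
+   @type cloudwatch_ingest
+   region us-east-1
+   <parse>
+     @type cloudwatch_ingest
+     expression /^(?<message>.+)$/
+     event_time true
+   </parse>
+ </source>
+ ```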
+
+ ### Prefixes
+ Both the `log_group_name_prefix` and `log_stream_name_prefix` may be omitted, in which case all groups and streams will be ingested. For performance reasons it is often desirable to set the `log_stream_name_prefix` to be today's date, managed by a configuration management system.
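+
+ As an illustration (the template and output paths are hypothetical), a daily configuration management run might render the prefix like so:
+
+ ```ruby
+ # regenerate_cloudwatch_conf.rb - illustrative sketch only.
+ # Renders today's date into log_stream_name_prefix so that only
+ # current, date-prefixed streams (e.g. Lambda's) are enumerated.
+ require 'date'
+ require 'erb'
+
+ log_stream_name_prefix = Date.today.strftime('%Y/%m/%d')
+ template = ERB.new(File.read('cloudwatch.conf.erb'))
+ File.write('/etc/td-agent/conf.d/cloudwatch.conf', template.result(binding))
+ ```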
+
+ ### State file
+ The state file is a YAML serialization of the current ingestion state. When running in a HA configuration this should be placed on a shared filesystem, such as EFS.
+ The state file is opened with an exclusive lock and as such also functions as a lock file in HA configurations. See below.
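+
+ The serialized state is simply a mapping of log group to log stream to the last forward token; an illustrative example (group, stream and token values invented):
+
+ ```
+ ---
+ /aws/lambda/my-function:
+   2017/03/01/[$LATEST]0123456789abcdef0123456789abcdef: f/31415926535897932384626433832795028841
+ ```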
+
+ ### HA Setup
+ When the state file is located on a shared filesystem an exclusive write lock will be attempted each `interval`.
+ As such it is safe to run multiple instances of this plugin consuming from the same CloudWatch logging source without fear of duplication, as long as they share a state file.
+ In a properly configured auto-scaling group this provides for uninterrupted log ingestion in the event of a failure of any single node.
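+
+ The locking behaviour this relies upon is the non-blocking exclusive `flock` used by the plugin's `State` class; a minimal sketch of the semantics:
+
+ ```ruby
+ # Illustration only: the second process to attempt the lock is refused
+ # immediately (flock returns false) rather than blocking.
+ statefile = File.open('/mnt/nfs/cloudwatch.state', 'r+')
+ if statefile.flock(File::LOCK_EX | File::LOCK_NB)
+   puts 'Lock acquired: this instance will ingest'
+ else
+   puts 'Lock held by another instance: skipping this interval'
+ end
+ ```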
+
+ ### Sub-second timestamps
+ When using `event_time true` the `@timestamp` field for the record is taken from the time recorded against the event by Cloudwatch. This is the most common mode to run in, as it's an easy path to normalization: your Lambdas and other AWS services need not share the same, valid, `time_format`, nor a regex that matches every case.
+
+ If your output plugin supports sub-second precision (and you're running fluentd 0.14.x) you'll "enjoy" sub-second precision.
+
+ #### Elasticsearch
+ It is a common pattern to use fluentd alongside the [fluentd-plugin-elasticsearch](https://github.com/uken/fluent-plugin-elasticsearch) plugin, either directly or via [fluent-plugin-aws-elasticsearch-service](https://github.com/atomita/fluent-plugin-aws-elasticsearch-service), to ingest logs into Elasticsearch.
+
+ At present there is a bug within that plugin that, via an unwise cast, causes records without a named timestamp field to be cast to `DateTime`, losing sub-second precision. This pull request fixes the issue: https://github.com/uken/fluent-plugin-elasticsearch/pull/249. If you need this functionality then I would urge you to comment and express interest over there.
+
+ Failing that, I maintain my own fork of that repository with the fix in place: https://github.com/sampointer/fluent-plugin-elasticsearch/tree/add_configurable_time_precision_when_timestamp_missing
+
+ ### IAM
+ IAM is a tricky and often bespoke subject. Here's a starter policy that will ingest all of the logs for all of your Lambdas in the account in which the plugin is running:
+
+ ```json
+ {
+   "Version": "2012-10-17",
+   "Statement": [
+     {
+       "Effect": "Allow",
+       "Action": [
+         "logs:DescribeLogGroups",
+         "logs:DescribeLogStreams",
+         "logs:DescribeMetricFilters",
+         "logs:FilterLogEvents",
+         "logs:GetLogEvents"
+       ],
+       "Resource": [
+         "arn:aws:logs:eu-west-1:123456789012:log-group:/aws/lambda/*:*"
+       ]
+     },
+     {
+       "Effect": "Allow",
+       "Action": [
+         "logs:DescribeLogGroups"
+       ],
+       "Resource": [
+         "arn:aws:logs:eu-west-1:123456789012:log-group:*:*"
+       ]
+     }
+   ]
+ }
+ ```
+
+ ### Cross-account authentication
+ Cross-account authentication is a tricky subject that cannot be fully covered here. Broadly speaking, the IAM instance role of the host on which the plugin is running needs to be able to `sts:AssumeRole` the `sts_arn` (and `sts_enabled` must be true).
+
+ The assumed role should look more-or-less like the policy above in terms of the actions and resource combinations required.
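+
+ As a sketch, the instance role needs a statement along these lines (account ID and role name are illustrative):
+
+ ```json
+ {
+   "Version": "2012-10-17",
+   "Statement": [
+     {
+       "Effect": "Allow",
+       "Action": "sts:AssumeRole",
+       "Resource": "arn:aws:iam::123456789012:role/role_in_another_account"
+     }
+   ]
+ }
+ ```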
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/sampointer/fluent-plugin-cloudwatch-ingest.
+
data/Rakefile ADDED
@@ -0,0 +1,8 @@
+ require 'bundler/gem_tasks'
+ require 'rspec/core/rake_task'
+ require 'rubocop/rake_task'
+
+ RSpec::Core::RakeTask.new(:spec)
+ RuboCop::RakeTask.new
+
+ task default: %i[rubocop spec build]
data/assets/credentials ADDED
@@ -0,0 +1,2 @@
+ ---
+ :rubygems_api_key: REPLACEME
data/bin/console ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require 'bundler/setup'
+ require 'fluent/plugin/cloudwatch/ingest'
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require 'irb'
+ IRB.start
data/bin/deploy ADDED
@@ -0,0 +1,10 @@
+ #!/bin/bash
+ #
+ # deploy to rubygems.
+
+ gem=$(ls pkg/*.gem | tail -1)
+ mkdir -p /home/ubuntu/.gem/
+ cp assets/credentials /home/ubuntu/.gem/credentials
+ sed -i "s/REPLACEME/${rubygems_api_key}/g" /home/ubuntu/.gem/credentials
+ chmod 0600 /home/ubuntu/.gem/credentials
+ gem push ${gem}
data/bin/setup ADDED
@@ -0,0 +1,7 @@
+ #!/bin/bash
+ set -euo pipefail
+ IFS=$'\n\t'
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
data/circle.yml ADDED
@@ -0,0 +1,11 @@
+ test:
+   override:
+     - bundle exec rake
+   post:
+     - cp pkg/*.gem ${CIRCLE_ARTIFACTS}
+ deployment:
+   release:
+     tag: /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)?(\+[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*)?$/
+     owner: sampointer
+     commands:
+       - bin/deploy
@@ -0,0 +1,37 @@
+ # coding: utf-8
+
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'fluent/plugin/cloudwatch/ingest/version'
+
+ Gem::Specification.new do |spec|
+   spec.name = 'fluent-plugin-cloudwatch-ingest-chaeyk'
+   spec.version = Fluent::Plugin::Cloudwatch::Ingest::VERSION
+   spec.authors = ['Sam Pointer']
+   spec.email = ['san@outsidethe.net']
+
+   spec.summary = 'Fluentd plugin to ingest AWS Cloudwatch logs'
+   spec.description = 'Fluentd plugin to ingest AWS Cloudwatch logs'
+   spec.homepage = 'https://github.com/sampointer/fluent-plugin-cloudwatch-ingest'
+
+   # Prevent pushing this gem to RubyGems.org by setting 'allowed_push_host', or
+   # delete this section to allow pushing this gem to any host.
+   if spec.respond_to?(:metadata) # rubocop:disable all
+     spec.metadata['allowed_push_host'] = 'https://rubygems.org'
+   else
+     raise 'RubyGems 2.0 or newer is required to protect against public gem pushes.' # rubocop:disable all
+   end
+
+   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) } # rubocop:disable all
+   spec.bindir = 'exe'
+   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ['lib']
+
+   spec.add_development_dependency 'bundler', '~> 1.10'
+   spec.add_development_dependency 'rake', '~> 10.0'
+   spec.add_development_dependency 'rspec'
+   spec.add_development_dependency 'rubocop'
+
+   spec.add_dependency 'fluentd', '~>0.14.13'
+   spec.add_dependency 'aws-sdk', '~>2.8.4'
+ end
data/lib/fluent/plugin/cloudwatch/ingest.rb ADDED
@@ -0,0 +1,9 @@
+ module Fluent
+   module Plugin
+     module Cloudwatch
+       module Ingest
+         # Your code goes here...
+       end
+     end
+   end
+ end
data/lib/fluent/plugin/cloudwatch/ingest/version.rb ADDED
@@ -0,0 +1,9 @@
+ module Fluent
+   module Plugin
+     module Cloudwatch
+       module Ingest
+         VERSION = '0.3.2'.freeze
+       end
+     end
+   end
+ end
data/lib/fluent/plugin/in_cloudwatch_ingest.rb ADDED
@@ -0,0 +1,299 @@
+ require 'aws-sdk'
+ require 'fluent/config/error'
+ require 'fluent/plugin/input'
+ require 'fluent/plugin/parser'
+ require 'json'
+ require 'pathname'
+ require 'psych'
+
+ module Fluent::Plugin
+   class CloudwatchIngestInput < Fluent::Plugin::Input
+     Fluent::Plugin.register_input('cloudwatch_ingest', self)
+     helpers :compat_parameters, :parser
+
+     desc 'The region of the source cloudwatch logs'
+     config_param :region, :string, default: 'us-east-1'
+     desc 'Enable STS for cross-account IAM'
+     config_param :sts_enabled, :bool, default: false
+     desc 'The IAM role ARN in the source account to use when STS is enabled'
+     config_param :sts_arn, :string, default: ''
+     desc 'The session name for use with STS'
+     config_param :sts_session_name, :string, default: 'fluentd'
+     desc 'Log group name or prefix. Not setting means "all"'
+     config_param :log_group_name_prefix, :string, default: ''
+     desc 'Log stream name or prefix. Not setting means "all"'
+     config_param :log_stream_name_prefix, :string, default: ''
+     desc 'State file name'
+     config_param :state_file_name, :string, default: '/var/spool/td-agent/cloudwatch.state' # rubocop:disable all
+     desc 'Fetch logs every interval'
+     config_param :interval, :time, default: 60
+     desc 'Time to pause between API call failures and limits'
+     config_param :api_interval, :time, default: 5
+     desc 'Tag to apply to record'
+     config_param :tag, :string, default: 'cloudwatch'
+     desc 'Enable AWS SDK logging'
+     config_param :aws_logging_enabled, :bool, default: false
+     desc 'Limit the number of events fetched in any iteration'
+     config_param :limit_events, :integer, default: 10_000
+     desc 'Do not fetch events before this time'
+     config_param :event_start_time, :integer, default: 0
+     config_section :parse do
+       config_set_default :@type, 'cloudwatch_ingest'
+       desc 'Regular expression with which to parse the event message'
+       config_param :expression, :string, default: '^(?<message>.+)$'
+       desc 'Take the timestamp from the event rather than the expression'
+       config_param :event_time, :bool, default: true
+       desc 'Time format to use when parsing event message'
+       config_param :time_format, :string, default: '%Y-%m-%d %H:%M:%S.%L'
+       desc 'Inject the log group name into the record'
+       config_param :inject_group_name, :bool, default: true
+       desc 'Inject the log stream name into the record'
+       config_param :inject_stream_name, :bool, default: true
+     end
+
+     def initialize
+       super
+       log.info('Starting fluentd-plugin-cloudwatch-ingest')
+     end
+
+     def configure(conf)
+       super
+       compat_parameters_convert(conf, :parser)
+       parser_config = conf.elements('parse').first
+       unless parser_config
+         raise Fluent::ConfigError, '<parse> section is required.'
+       end
+       unless parser_config['expression']
+         raise Fluent::ConfigError, 'parse/expression is required.'
+       end
+       unless parser_config['event_time']
+         raise Fluent::ConfigError, 'parse/event_time is required.'
+       end
+
+       @parser = parser_create(conf: parser_config)
+       log.info('Configured fluentd-plugin-cloudwatch-ingest')
+     end
+
+     def start
+       super
+       log.info('Started fluentd-plugin-cloudwatch-ingest')
+
+       # Get a handle to Cloudwatch
+       aws_options = {}
+       Aws.config[:region] = @region
+       Aws.config[:logger] = log if @aws_logging_enabled
+       log.info("Working in region #{@region}")
+
+       if @sts_enabled
+         aws_options[:credentials] = Aws::AssumeRoleCredentials.new(
+           role_arn: @sts_arn,
+           role_session_name: @sts_session_name
+         )
+
+         log.info("Using STS for authentication with source account ARN: #{@sts_arn}, session name: #{@sts_session_name}") # rubocop:disable all
+       else
+         log.info('Using local instance IAM role for authentication')
+       end
+       @aws = Aws::CloudWatchLogs::Client.new(aws_options)
+       @finished = false
+       @thread = Thread.new(&method(:run))
+     end
+
+     def shutdown
+       @finished = true
+       @thread.join
+     end
+
+     private
+
+     def emit(event, log_group_name, log_stream_name)
+       @parser.parse(event, log_group_name, log_stream_name) do |time, record|
+         router.emit(@tag, time, record)
+       end
+     end
+
+     def log_groups(log_group_prefix)
+       log_groups = []
+
+       # Fetch all log group names
+       next_token = nil
+       loop do
+         begin
+           response = if !log_group_prefix.empty?
+                        @aws.describe_log_groups(
+                          log_group_name_prefix: log_group_prefix,
+                          next_token: next_token
+                        )
+                      else
+                        @aws.describe_log_groups(
+                          next_token: next_token
+                        )
+                      end
+
+           response.log_groups.each { |g| log_groups << g.log_group_name }
+           break unless response.next_token
+           next_token = response.next_token
+         rescue => boom
+           log.error("Unable to retrieve log groups: #{boom}")
+           log_groups = []
+           next_token = nil
+           sleep @api_interval
+           retry
+         end
+       end
+       log.info("Found #{log_groups.size} log groups")
+
+       return log_groups
+     end
+
+     def log_streams(log_group_name, log_stream_name_prefix)
+       log_streams = []
+       next_token = nil
+       loop do
+         begin
+           response = if !log_stream_name_prefix.empty?
+                        @aws.describe_log_streams(
+                          log_group_name: log_group_name,
+                          log_stream_name_prefix: log_stream_name_prefix,
+                          next_token: next_token
+                        )
+                      else
+                        @aws.describe_log_streams(
+                          log_group_name: log_group_name,
+                          next_token: next_token
+                        )
+                      end
+
+           response.log_streams.each { |s| log_streams << s.log_stream_name }
+           break unless response.next_token
+           next_token = response.next_token
+         rescue => boom
+           log.error("Unable to retrieve log streams for group #{log_group_name} with stream prefix #{log_stream_name_prefix}: #{boom}") # rubocop:disable all
+           log_streams = []
+           next_token = nil
+           sleep @api_interval
+           retry
+         end
+       end
+       log.info("Found #{log_streams.size} streams for #{log_group_name}")
+
+       return log_streams
+     end
+
+     def run
+       until @finished
+         begin
+           state = State.new(@state_file_name, log)
+         rescue => boom
+           log.info("Failed to lock state. Sleeping for #{@interval}: #{boom}")
+           sleep @interval
+           retry
+         end
+
+         # Fetch the streams for each log group
+         log_groups(@log_group_name_prefix).each do |group|
+           # For each log stream get and emit the events
+           log_streams(group, @log_stream_name_prefix).each do |stream|
+             # See if we have some stored state for this group and stream.
+             # If we have then use the stored forward_token to pick up
+             # from that point. Otherwise start from the beginning.
+             stream_token = state.store[group][stream]
+
+             begin
+               response = @aws.get_log_events(
+                 log_group_name: group,
+                 log_stream_name: stream,
+                 next_token: stream_token,
+                 limit: @limit_events,
+                 start_time: @event_start_time,
+                 start_from_head: true
+               )
+
+               response.events.each do |e|
+                 begin
+                   emit(e, group, stream)
+                 rescue => boom
+                   log.error("Failed to emit event #{e}: #{boom}")
+                 end
+               end
+
+               # Once all events for this stream have been processed,
+               # in this iteration, store the forward token
+               state.store[group][stream] = response.next_forward_token
+             rescue => boom
+               log.error("Unable to retrieve events for stream #{stream} in group #{group}: #{boom}") # rubocop:disable all
+               sleep @api_interval
+               retry
+             end
+           end
+         end
+
+         log.info('Pruning and saving state')
+         state.prune(log_groups(@log_group_name_prefix)) # Remove dead streams
+         begin
+           state.save
+           state.close
+         rescue => boom
+           log.error("Unable to save state file: #{boom}")
+         end
+         log.info("Pausing for #{@interval}")
+         sleep @interval
+       end
+     end
+
+     class CloudwatchIngestInput::State
+       class LockFailed < RuntimeError; end
+       attr_accessor :statefile, :store
+
+       def initialize(filepath, log)
+         @filepath = filepath
+         @log = log
+         @store = Hash.new { |h, k| h[k] = {} }
+
+         if File.exist?(filepath)
+           self.statefile = Pathname.new(@filepath).open('r+')
+         else
+           @log.warn("No state file #{@filepath}. Creating a new one.")
+           begin
+             self.statefile = Pathname.new(@filepath).open('w+')
+             save
+           rescue => boom
+             @log.error("Unable to create new state file #{@filepath}: #{boom}")
+           end
+         end
+
+         # Attempt to obtain an exclusive flock on the file and raise an
+         # exception if we can't
+         @log.info("Obtaining exclusive lock on state file #{statefile.path}")
+         lockstatus = statefile.flock(File::LOCK_EX | File::LOCK_NB)
+         raise CloudwatchIngestInput::State::LockFailed if lockstatus == false
+
+         loaded = Psych.safe_load(statefile.read)
+         @store.merge!(loaded) if loaded
+         @log.info("Loaded #{@store.keys.size} groups from #{statefile.path}")
+       end
+
+       def save
+         statefile.rewind
+         statefile.write(Psych.dump(@store))
+         @log.info("Saved state to #{statefile.path}")
+         statefile.rewind
+       end
+
+       def close
+         statefile.close
+       end
+
+       def prune(log_groups)
+         groups_before = @store.keys.size
+         @store.delete_if { |k, _v| !log_groups.include?(k) }
+         @log.info("Pruned #{groups_before - @store.keys.size} keys from store")
+
+         # TODO: also prune streams as these are most likely to be transient
+       end
+     end
+   end
+ end