fluent-plugin-cloudfront-log-v0.14-fix 0.0.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: bddf4c6de34233ea0fc327a2620d1739598e041d
4
+ data.tar.gz: 53f6abff048836b78e187046a5a3647bcf64d155
5
+ SHA512:
6
+ metadata.gz: fd657fc1db7d5c378f54ea1fd4c4c42be6d114864360749a1d8cd448e9479d7c6db64a1e3626d6329dca4a0a2774ead543be4db7e96b74cddbee4c072c35ee8a
7
+ data.tar.gz: fc97ec882180002c0756b9a4cc559769205d6a68d565226bfd1382a612e90a65a79380b7b470d51c7bf5f15adcb8ba9a395d891897a0519a57c27163c7504f3c
@@ -0,0 +1,42 @@
1
+ # Created by https://www.gitignore.io/api/ruby
2
+
3
+ ### Ruby ###
4
+ .bin
5
+ *.gems
6
+ *.gem
7
+ *.rbc
8
+ /.config
9
+ /coverage/
10
+ /InstalledFiles
11
+ /pkg/
12
+ /spec/reports/
13
+ /spec/examples.txt
14
+ /test/tmp/
15
+ /test/version_tmp/
16
+ /tmp/
17
+
18
+ ## Specific to RubyMotion:
19
+ .dat*
20
+ .repl_history
21
+ build/
22
+
23
+ ## Documentation cache and generated files:
24
+ /.yardoc/
25
+ /_yardoc/
26
+ /doc/
27
+ /rdoc/
28
+
29
+ ## Environment normalization:
30
+ /.bundle/
31
+ /vendor/bundle
32
+ /lib/bundler/man/
33
+
34
+ # for a library or gem, you might want to ignore these files since the code is
35
+ # intended to run in multiple environments; otherwise, check them in:
36
+ Gemfile.lock
37
+ .ruby-version
38
+ .ruby-gemset
39
+
40
+ # unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
41
+ .rvmrc
42
+
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in fluent-plugin-cloudfront-log-v0.14-fix.gemspec
4
+ gemspec
@@ -0,0 +1,94 @@
1
+ # Fluent::Plugin::Cloudfront::Log
2
+ This plugin connects to the S3 bucket where you store your CloudFront logs. After the plugin processes the logs and ships them to FluentD, it moves them to another location (either another bucket or a subdirectory).
3
+
4
+ This is a fork of https://github.com/kubihie/fluent-plugin-cloudfront-log with a
5
+ fix for an error under fluentd v0.14. This is a temporary gem that I will
6
+ maintain until PR https://github.com/kubihie/fluent-plugin-cloudfront-log/pull/9
7
+ is merged.
8
+
9
+ ## Example config
10
+ ```
11
+ <source>
12
+ @type cloudfront_log
13
+ log_bucket cloudfront-logs
14
+ log_prefix production
15
+ region us-east-1
16
+ interval 300
17
+ aws_key_id xxxxxx
18
+ aws_sec_key xxxxxx
19
+ tag reverb.cloudfront
20
+ verbose true
21
+ </source>
22
+ ```
23
+
24
+ ## Configuration options
25
+
26
+ #### log_bucket
27
+ The S3 bucket in which the plugin looks for CloudFront logs.
28
+
29
+ #### log_prefix
30
+ The key prefix (folder) within `log_bucket` that holds your logs. For example, if your logs are stored in a folder called "production" under the "cloudfront-logs" bucket, their paths look like "cloudfront-logs/production/log.gz".
31
+ In this case, you'd want to use the prefix "production".
32
+
33
+ #### moved_log_bucket
34
+ The bucket to which log files are moved after processing. If left blank, this defaults to the bucket configured in `log_bucket`, so with the default `moved_log_prefix` processed files end up under `_moved/` in that bucket.
35
+
36
+ #### moved_log_prefix
37
+ The prefix (folder) under which processed log files are stored in `moved_log_bucket`. This defaults to `_moved`.
38
+
39
+ #### region
40
+ The region where your cloudfront logs are stored.
41
+
42
+ #### interval
43
+ The interval, in seconds, at which the plugin checks the bucket for new logs. This defaults to 300.
44
+ #### aws_key_id
45
+ The access key ID of your AWS key pair. Note: since this plugin uses aws-sdk under the hood, you can leave both AWS key fields blank if an IAM role is attached to your FluentD instance.
46
+
47
+ #### aws_sec_key
48
+ The secret access key of your AWS key pair.
49
+
50
+ #### tag
51
+ The tag attached to the emitted events. This defaults to `cloudfront.access`.
52
+
53
+ #### thread_num
54
+ The number of threads to create to concurrently process the S3 objects. Defaults to 4.
55
+
56
+ #### s3_get_max
57
+ The maximum number of S3 objects listed (the `max_keys` value) on each iteration. Defaults to 200.
58
+
59
+ #### delimiter
60
+ You shouldn't need to set `delimiter`, but it is passed to the S3 client in case your log file names use an unusual delimiter. Defaults to `nil`.
61
+
62
+ #### verbose
63
+ Turn this on if you'd like to see verbose information about the plugin and how it's processing your files.
64
+
65
+ ## Installation
66
+
67
+ Add this line to your application's Gemfile:
68
+
69
+ ```ruby
70
+ gem 'fluent-plugin-cloudfront-log-v0.14-fix'
71
+ ```
72
+
73
+ And then execute:
74
+
75
+ $ bundle
76
+
77
+ Or install it yourself as:
78
+
79
+ $ gem install 'fluent-plugin-cloudfront-log-v0.14-fix'
80
+
81
+ ## Development
82
+
83
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
84
+
85
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `fluent-plugin-cloudfront-log-v0.14-fix.gemspec`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
86
+
87
+ ## Contributing
88
+
89
+ Bug reports and pull requests are welcome on GitHub at https://github.com/packetloop/fluent-plugin-cloudfront-log-v0.14-fix.
90
+
91
+ ## Credits
92
+
93
+ kubihie
94
+
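A minimal sketch of how `log_bucket`, `log_prefix`, `moved_log_bucket`, and `moved_log_prefix` combine into the copy source and destination key, assuming the same string joins that `purge` in `lib/fluent/plugin/in_cloudfront_log.rb` performs. The helper name `moved_paths` is hypothetical, and the S3 calls themselves are omitted.

```ruby
# Hypothetical helper mirroring the key construction in the plugin's `purge`.
# Returns the copy source ("bucket/key") plus the destination bucket and key.
def moved_paths(filename, log_bucket:, log_prefix:, moved_log_bucket: nil, moved_log_prefix: '_moved')
  # As in `configure`, an unset moved_log_bucket falls back to log_bucket.
  moved_log_bucket ||= log_bucket

  source_key = [log_prefix, filename].join('/')
  {
    copy_source: [log_bucket, source_key].join('/'),
    dest_bucket: moved_log_bucket,
    dest_key: [moved_log_prefix, filename].join('/')
  }
end

paths = moved_paths('E2ABC.2018-01-01-00.abcd1234.gz',
                    log_bucket: 'cloudfront-logs', log_prefix: 'production')
puts paths[:copy_source] # cloudfront-logs/production/E2ABC.2018-01-01-00.abcd1234.gz
puts paths[:dest_key]    # _moved/E2ABC.2018-01-01-00.abcd1234.gz
```

Because S3 `prefix` matching is a plain string prefix, a `moved_log_prefix` that starts with `log_prefix` (for example `production_moved` with `log_prefix production`) would be returned by `list_objects` again on the next interval; the default `_moved` avoids that.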
@@ -0,0 +1,11 @@
1
+ require "bundler/gem_tasks"
2
+
3
+ require "rake/testtask"
4
+ Rake::TestTask.new(:test) do |test|
5
+ test.libs << 'lib' << 'test'
6
+ test.pattern = 'test/**/test_*.rb'
7
+ test.verbose = true
8
+ end
9
+
10
+
11
+ task :default => :test
@@ -0,0 +1,25 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "fluent-plugin-cloudfront-log-v0.14-fix"
7
+ spec.version = "0.0.1"
8
+ spec.authors = ["lenfree"]
9
+ spec.email = ["lenfree.yeung@gmail.com"]
10
+
11
+ spec.summary = %q{AWS CloudFront log input plugin with temporary fix for v0.14. Credit to kubihie.}
12
+ spec.description = %q{AWS CloudFront log input plugin for fluentd. This repo is temporary until PR to upstream is addressed.}
13
+ spec.homepage = "https://github.com/packetloop/fluent-plugin-cloudfront-log-v0.14-fix"
14
+
15
+ spec.files = `git ls-files`.split($/)
16
+ spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_dependency "fluentd", "~> 0"
21
+ spec.add_dependency "aws-sdk", "~> 2.1"
22
+ spec.add_development_dependency "bundler"
23
+ spec.add_development_dependency "rake"
24
+ spec.add_development_dependency 'test-unit', "~> 2"
25
+ end
@@ -0,0 +1,192 @@
1
+ require 'fluent/input'
2
+
3
+ class Fluent::Cloudfront_LogInput < Fluent::Input
4
+ Fluent::Plugin.register_input('cloudfront_log', self)
5
+
6
+ config_param :aws_key_id, :string, :default => nil, :secret => true
7
+ config_param :aws_sec_key, :string, :default => nil, :secret => true
8
+ config_param :log_bucket, :string
9
+ config_param :log_prefix, :string
10
+ config_param :moved_log_bucket, :string, :default => nil # falls back to log_bucket in #configure
11
+ config_param :moved_log_prefix, :string, :default => '_moved'
12
+ config_param :region, :string
13
+ config_param :tag, :string, :default => 'cloudfront.access'
14
+ config_param :interval, :integer, :default => 300
15
+ config_param :delimiter, :string, :default => nil
16
+ config_param :verbose, :bool, :default => false
17
+ config_param :thread_num, :integer, :default => 4
18
+ config_param :s3_get_max, :integer, :default => 200
19
+
20
+ def initialize
21
+ super
22
+ require 'logger'
23
+ require 'aws-sdk'
24
+ require 'zlib'
25
+ require 'time'
26
+ require 'uri'
27
+ end
28
+
29
+ def configure(conf)
30
+ super
31
+
32
+ raise Fluent::ConfigError.new unless @log_bucket
33
+ raise Fluent::ConfigError.new unless @region
34
+ raise Fluent::ConfigError.new unless @log_prefix
35
+
36
+ @moved_log_bucket = @log_bucket unless @moved_log_bucket
37
+ @moved_log_prefix = @log_prefix + '_moved' unless @moved_log_prefix
38
+
39
+ if @verbose
40
+ log.info("@log_bucket: #{@log_bucket}")
41
+ log.info("@moved_log_bucket: #{@moved_log_bucket}")
42
+ log.info("@log_prefix: #{@log_prefix}")
43
+ log.info("@moved_log_prefix: #{@moved_log_prefix}")
44
+ log.info("@thread_num: #{@thread_num}")
45
+ end
46
+ end
47
+
48
+ def start
49
+ super
50
+ log.info("Cloudfront verbose logging enabled") if @verbose
51
+ client
52
+
53
+ @loop = Coolio::Loop.new
54
+ timer = TimerWatcher.new(@interval, true, log, &method(:input))
55
+
56
+ @loop.attach(timer)
57
+ @thread = Thread.new(&method(:run))
58
+ end
59
+
60
+ def shutdown
61
+ @loop.stop
62
+ @thread.join
+ super
63
+ end
64
+
65
+ def run
66
+ @loop.run
67
+ end
68
+
69
+ def client
70
+ begin
71
+ options = {:region => @region}
72
+ if @aws_key_id and @aws_sec_key
73
+ options[:access_key_id] = @aws_key_id
74
+ options[:secret_access_key] = @aws_sec_key
75
+ end
76
+ @client = Aws::S3::Client.new(options)
77
+ rescue => e
78
+ log.warn("S3 client error. #{e.message}")
79
+ end
80
+ end
81
+
82
+ def parse_header(line)
83
+ case line
84
+ when /^#Version:.+/i then
85
+ @version = line.sub(/^#Version:/i, '').strip
86
+ when /^#Fields:.+/i then
87
+ @fields = line.sub(/^#Fields:/i, '').strip.split("\s")
88
+ end
89
+ end
90
+
91
+ def purge(filename)
92
+ # Key is the name of the object without the bucket prefix, e.g: asdf/asdf.jpg
93
+ source_object_key = [@log_prefix, filename].join('/')
94
+
95
+ # Full path includes bucket name in addition to object key, e.g: bucket/asdf/asdf.jpg
96
+ source_object_full_path = [@log_bucket, source_object_key].join('/')
97
+
98
+ dest_object_key = [@moved_log_prefix, filename].join('/')
99
+ dest_object_full_path = [@moved_log_bucket, dest_object_key].join('/')
100
+
101
+ log.info("Copying object: #{source_object_full_path} to #{dest_object_full_path}") if @verbose
102
+
103
+ begin
104
+ client.copy_object(:bucket => @moved_log_bucket, :copy_source => source_object_full_path, :key => dest_object_key)
105
+ rescue => e
106
+ log.warn("S3 Copy client error. #{e.message}")
107
+ return
108
+ end
109
+
110
+
111
+ log.info("Deleting object: #{source_object_key} from #{@log_bucket}") if @verbose
112
+ begin
113
+ client.delete_object(:bucket => @log_bucket, :key => source_object_key)
114
+ rescue => e
115
+ log.warn("S3 Delete client error. #{e.message}")
116
+ return
117
+ end
118
+ end
119
+
120
+
121
+ def process_content(content)
122
+ filename = content.key.sub(/\A#{Regexp.escape(@log_prefix)}\//, "")
123
+ log.info("CloudFront Currently processing: #{filename}") if @verbose
124
+ return if filename[-1] == '/' #skip directory/
125
+ return unless filename[-2, 2] == 'gz' #skip without gz file
126
+
127
+ begin
128
+ access_log_gz = client.get_object(:bucket => @log_bucket, :key => content.key).body
129
+ access_log = Zlib::GzipReader.new(access_log_gz).read
130
+ rescue => e
131
+ log.warn("S3 GET client error. #{e.message}")
132
+ return
133
+ end
134
+
135
+ access_log.split("\n").each do |line|
136
+ if line.start_with?('#')
137
+ parse_header(line)
138
+ next
139
+ end
140
+ line = URI.unescape(line) #hoge%2520fuga -> hoge%20fuga
141
+ line = URI.unescape(line) #hoge%20fuga -> hoge fuga
142
+ line = line.split("\t")
143
+ record = @fields.zip(line).to_h
144
+ timestamp = Time.parse("#{record['date']}T#{record['time']}+00:00").to_i
145
+ router.emit(@tag, timestamp, record)
146
+ end
147
+ purge(filename)
148
+ end
149
+
150
+ def input
151
+ log.info("CloudFront Beginning input, listing S3 objects")
152
+ begin
153
+ s3_list = client.list_objects(:bucket => @log_bucket, :prefix => @log_prefix , :delimiter => @delimiter, :max_keys => @s3_get_max)
154
+ rescue => e
155
+ log.warn("S3 GET list error. #{e.message}")
156
+ return
157
+ end
158
+ log.info("Finished S3 get list")
159
+ queue = Queue.new
160
+ threads = []
161
+ log.debug("S3 List size: #{s3_list.contents.length}")
162
+ s3_list.contents.each do |content|
163
+ queue << content
164
+ end
165
+ # BEGINS THREADS
166
+ @thread_num.times do
167
+ threads << Thread.new do
168
+ until queue.empty?
169
+ work_unit = queue.pop(true) rescue nil
170
+ if work_unit
171
+ process_content(work_unit)
172
+ end
173
+ end
174
+ end
175
+ end
176
+ log.debug("CloudFront Waiting for Threads to finish...")
177
+ threads.each { |t| t.join }
178
+ log.debug("CloudFront Finished")
179
+ end
180
+
181
+ class TimerWatcher < Coolio::TimerWatcher
182
+ def initialize(interval, repeat, log, &callback)
183
+ @callback = callback
184
+ @log = log
185
+ super(interval, repeat)
186
+ end
187
+
188
+ def on_timer
189
+ @callback.call
+ rescue => e
+ @log.error("CloudFront input error: #{e.message}")
190
+ end
191
+ end
192
+ end
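The `input` method fans the listed S3 objects out to `thread_num` worker threads through a shared `Queue`. Below is a standalone sketch of that pattern with `process_content` replaced by an arbitrary block; `drain_with_workers` is a hypothetical name and this is not the plugin's code. It uses `loop` with a non-blocking `pop` instead of an `until queue.empty?` check, which drains the queue the same way.

```ruby
# Hypothetical helper: run `block` on every item using `thread_num` threads
# that share one Queue, as the plugin's `input` method does.
def drain_with_workers(items, thread_num, &block)
  queue = Queue.new
  items.each { |item| queue << item }

  results = Queue.new # Queue is thread-safe, so workers can push concurrently
  workers = Array.new(thread_num) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          item = queue.pop(true)
        rescue ThreadError
          break # queue drained, let this worker finish
        end
        results << block.call(item)
      end
    end
  end
  # Thread#join waits for each worker and re-raises any exception it died with.
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

keys = %w[a.gz b.gz c.gz d.gz e.gz]
processed = drain_with_workers(keys, 4) { |key| key.upcase }
p processed.sort # ["A.GZ", "B.GZ", "C.GZ", "D.GZ", "E.GZ"]
```

Because `Thread#join` re-raises a worker's exception, one object that fails outside the plugin's `rescue` blocks (for example while parsing) aborts the whole `input` run for that interval.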
@@ -0,0 +1,28 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ begin
4
+ Bundler.setup(:default, :development)
5
+ rescue Bundler::BundlerError => e
6
+ $stderr.puts e.message
7
+ $stderr.puts "Run `bundle install` to install missing gems"
8
+ exit e.status_code
9
+ end
10
+ require 'test/unit'
11
+
12
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
13
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
14
+ require 'fluent/test'
15
+ unless ENV.has_key?('VERBOSE')
16
+ nulllogger = Object.new
17
+ nulllogger.instance_eval {|obj|
18
+ def method_missing(method, *args)
19
+ # pass
20
+ end
21
+ }
22
+ $log = nulllogger
23
+ end
24
+
25
+ require 'fluent/plugin/in_cloudfront_log'
26
+
27
+ class Test::Unit::TestCase
28
+ end
@@ -0,0 +1,66 @@
1
+ require_relative '../helper'
2
+ require 'fluent/test'
3
+
4
+ class Cloudfront_LogInputTest < Test::Unit::TestCase
5
+ def setup
6
+ Fluent::Test.setup
7
+ end
8
+
9
+ DEFAULT_CONFIG = {
10
+ :aws_key_id => 'AKIAZZZZZZZZZZZZZZZZ',
11
+ :aws_sec_key => '1234567890qwertyuiopasdfghjklzxcvbnm',
12
+ :log_bucket => 'bucket-name',
13
+ :log_prefix => 'a/b/c',
14
+ :moved_log_bucket => 'bucket-name-moved',
15
+ :moved_log_prefix => 'a/b/c_moved',
16
+ :region => 'ap-northeast-1',
17
+ :tag => 'cloudfront',
18
+ :interval => '500',
19
+ :delimiter => nil,
20
+ :verbose => true,
21
+ }
22
+
23
+ def parse_config(conf = {})
24
+ ''.tap{|s| conf.each { |k, v| s << "#{k} #{v}\n" } }
25
+ end
26
+
27
+ def create_driver(conf = DEFAULT_CONFIG)
28
+ Fluent::Test::InputTestDriver.new(Fluent::Cloudfront_LogInput).configure(parse_config conf)
29
+ end
30
+
31
+ def test_configure
32
+ assert_nothing_raised { driver = create_driver }
33
+
34
+ exception = assert_raise(Fluent::ConfigError) {
35
+ conf = DEFAULT_CONFIG.clone
36
+ conf.delete(:log_bucket)
37
+ driver = create_driver(conf)
38
+ }
39
+ assert_equal("'log_bucket' parameter is required", exception.message)
40
+
41
+ exception = assert_raise(Fluent::ConfigError) {
42
+ conf = DEFAULT_CONFIG.clone
43
+ conf.delete(:region)
44
+ driver = create_driver(conf)
45
+ }
46
+ assert_equal("'region' parameter is required", exception.message)
47
+
48
+ exception = assert_raise(Fluent::ConfigError) {
49
+ conf = DEFAULT_CONFIG.clone
50
+ conf.delete(:log_prefix)
51
+ driver = create_driver(conf)
52
+ }
53
+ assert_equal("'log_prefix' parameter is required", exception.message)
54
+
55
+ conf = DEFAULT_CONFIG.clone
56
+ conf.delete(:moved_log_bucket)
57
+ driver = create_driver(conf)
58
+ assert_equal(driver.instance.log_bucket, driver.instance.moved_log_bucket)
59
+
60
+ conf = DEFAULT_CONFIG.clone
61
+ conf.delete(:moved_log_prefix)
62
+ driver = create_driver(conf)
63
+ assert_equal('_moved', driver.instance.moved_log_prefix)
64
+ end
65
+
66
+ end
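The test above covers only configuration. The per-line parsing in `process_content` could be exercised without S3 or fluentd by pulling it into a pure function. The sketch below is a hypothetical standalone version (`parse_cloudfront_log` is not part of the plugin); it omits the plugin's double URI unescaping, and the sample body is a shortened, illustrative log, not real data.

```ruby
require 'time'

# Hypothetical standalone version of the per-line loop in `process_content`.
# Returns an array of [epoch_seconds, record_hash] pairs.
def parse_cloudfront_log(body)
  fields = nil
  events = []
  body.each_line do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?('#')
      # '#Fields:' names the tab-separated columns; other headers are ignored here.
      fields = line.sub(/\A#Fields:/i, '').split if line =~ /\A#Fields:/i
      next
    end
    next unless fields # skip data that appears before any '#Fields:' header
    record = fields.zip(line.split("\t")).to_h
    # The 'date' and 'time' columns are treated as UTC, as in the plugin.
    time = Time.parse("#{record['date']}T#{record['time']}+00:00").to_i
    events << [time, record]
  end
  events
end

sample = <<~LOG
  #Version: 1.0
  #Fields: date time x-edge-location sc-bytes c-ip cs-method
  2018-01-01\t00:00:05\tNRT12\t1024\t192.0.2.1\tGET
LOG

time, record = parse_cloudfront_log(sample).first
p time           # 1514764805
p record['c-ip'] # "192.0.2.1"
```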
metadata ADDED
@@ -0,0 +1,124 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fluent-plugin-cloudfront-log-v0.14-fix
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - lenfree
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2018-01-02 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: fluentd
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: aws-sdk
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '2.1'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '2.1'
41
+ - !ruby/object:Gem::Dependency
42
+ name: bundler
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rake
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: test-unit
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '2'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '2'
83
+ description: AWS CloudFront log input plugin for fluentd. This repo is temporary until
84
+ PR to upstream is addressed.
85
+ email:
86
+ - lenfree.yeung@gmail.com
87
+ executables: []
88
+ extensions: []
89
+ extra_rdoc_files: []
90
+ files:
91
+ - ".gitignore"
92
+ - Gemfile
93
+ - README.md
94
+ - Rakefile
95
+ - fluent-plugin-cloudfront-log-v0.14-fix.gemspec
96
+ - lib/fluent/plugin/in_cloudfront_log.rb
97
+ - test/helper.rb
98
+ - test/plugin/test_in_cloudfrontlog.rb
99
+ homepage: https://github.com/packetloop/fluent-plugin-cloudfront-log-v0.14-fix
100
+ licenses: []
101
+ metadata: {}
102
+ post_install_message:
103
+ rdoc_options: []
104
+ require_paths:
105
+ - lib
106
+ required_ruby_version: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ required_rubygems_version: !ruby/object:Gem::Requirement
112
+ requirements:
113
+ - - ">="
114
+ - !ruby/object:Gem::Version
115
+ version: '0'
116
+ requirements: []
117
+ rubyforge_project:
118
+ rubygems_version: 2.6.11
119
+ signing_key:
120
+ specification_version: 4
121
+ summary: AWS CloudFront log input plugin with temporary fix for v0.14. Credit to kubihie.
122
+ test_files:
123
+ - test/helper.rb
124
+ - test/plugin/test_in_cloudfrontlog.rb