fluent-plugin-s3in 0.1.2.2

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 451f49b44a18a7a497a848e88f6577116097ef26
+   data.tar.gz: a9ad7edfa2aeca7df49b4867b34804a8437fd682
+ SHA512:
+   metadata.gz: a44412621dbe9ef4d647549b16901c5372b87cd07be0ef963d1aef4a9dcad0f3d1151a9e50246ca9aa8e207ff1b03f01f64aed10bf11b1c7f1e9ceb619117488
+   data.tar.gz: 9b8640131595abb4d16238b69ec5a43c4c5ad4b81669764487f390f16da3f6d303fc44c6e5ee43eebf100b272a534f6e0f73fb017b9d0344c3b680dec3b582f9
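A `.gem` file is a plain tar archive, and the SHA1/SHA512 values above are digests of its `metadata.gz` and `data.tar.gz` members. The following is a minimal sketch, using only the RubyGems and Digest standard libraries, for recomputing them so they can be compared with the listed values; the helper name `gem_member_digests` is hypothetical.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'rubygems/package'

# The two .gem members whose digests checksums.yaml records.
CHECKSUMMED_MEMBERS = %w[metadata.gz data.tar.gz].freeze

# Hypothetical helper: reads a .gem archive (a plain tar) from `io` and
# returns { 'SHA1' => { member => hex }, 'SHA512' => { member => hex } }.
def gem_member_digests(io)
  digests = { 'SHA1' => {}, 'SHA512' => {} }
  Gem::Package::TarReader.new(io).each do |entry|
    next unless CHECKSUMMED_MEMBERS.include?(entry.full_name)

    bytes = entry.read || '' # read returns nil for an empty member
    digests['SHA1'][entry.full_name] = Digest::SHA1.hexdigest(bytes)
    digests['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(bytes)
  end
  digests
end

# Usage, assuming the archive was fetched first, e.g. with
# `gem fetch fluent-plugin-s3in -v 0.1.2.2`:
#
#   File.open('fluent-plugin-s3in-0.1.2.2.gem', 'rb') do |f|
#     puts gem_member_digests(f)['SHA1']['data.tar.gz']
#   end
```

Members other than the two checksummed ones (such as `checksums.yaml.gz` itself) are skipped, since they are not covered by the digests.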
data/.gitignore ADDED
@@ -0,0 +1,13 @@
+ .DS_Store
+ /.bundle/
+ /temp/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ /vendor/
+ /temp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ #--format documentation
+ --color
data/.rubocop.yml ADDED
@@ -0,0 +1,3 @@
+
+ LineLength:
+   Max: 110
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
+ language: ruby
+ rvm:
+   - 2.1.7
+ before_install: gem install bundler -v 1.10.6
+ script: bundle exec rake test
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,13 @@
+ # Contributor Code of Conduct
+
+ As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
+
+ We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
+
+ Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
+
+ Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
+
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
+
+ This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org), version 1.0.0, available at [http://contributor-covenant.org/version/1/0/0/](http://contributor-covenant.org/version/1/0/0/)
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in fluent-plugin-s3in.gemspec
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2015 TODO: Write your name
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,41 @@
+ # Fluent::Plugin::S3in
+
+ Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/fluent/plugin/in_s3.rb`.
+
+ TODO: Delete this and the text above, and describe your gem
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'fluent-plugin-s3in'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install fluent-plugin-s3in
+
+ ## Usage
+
+ TODO: Write usage instructions here
+
+ ## Development
+
+ After checking out the repo, run `bundle install` to install dependencies. Then, run `bundle exec rake test` to run the tests.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `VERSION`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/fluent-plugin-s3in. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
data/Rakefile ADDED
@@ -0,0 +1,11 @@
+ require 'bundler/gem_tasks'
+
+ task :default => :test
+
+ task :test do
+   require 'rspec/core'
+   require 'rspec/core/rake_task'
+   RSpec::Core::RakeTask.new(:test) do |spec|
+     spec.pattern = FileList['spec/**/*_spec.rb']
+   end
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.1.2.2
data/fluent-plugin-s3in.gemspec ADDED
@@ -0,0 +1,36 @@
+ # coding: utf-8
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+
+ Gem::Specification.new do |spec|
+   spec.name = 'fluent-plugin-s3in'
+   spec.version = File.read('VERSION').strip
+   spec.authors = ['Takeshi Shiihara']
+   spec.email = ['shi-take@dummy.dummy']
+
+   spec.summary = %q{Write a short summary, because Rubygems requires one.}
+   spec.description = %q{Write a longer description or delete this line.}
+   spec.homepage = 'https://github.com/shii-take/fluent-plugin-s3in'
+   spec.license = 'MIT'
+
+   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+   spec.require_paths = ['lib']
+
+   spec.required_ruby_version = '~> 2'
+
+   spec.add_dependency 'fluentd', '~> 0'
+   spec.add_dependency 'sequel', '~> 4'
+   spec.add_dependency 'aws-sdk', '~> 2.1'
+   spec.add_dependency 'sqlite3', '~> 1.3'
+   spec.add_dependency 'tzinfo', '~> 1.2'
+
+   spec.add_development_dependency 'bundler', '~> 1.0'
+   spec.add_development_dependency 'rake', '~> 10.0'
+   spec.add_development_dependency 'fakes3'
+   spec.add_development_dependency 'rspec', '~> 3.3.0'
+   spec.add_development_dependency 'glint'
+   spec.add_development_dependency 'apache-loggen'
+   spec.add_development_dependency 'webmock'
+ end
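Most constraints above use RubyGems' pessimistic operator `~>`, which lets only the last listed version segment grow: `'~> 2.1'` means `>= 2.1, < 3.0`, `'~> 3.3.0'` means `>= 3.3.0, < 3.4`, and a single-segment constraint such as `'~> 0'` (fluentd) or `'~> 2'` (`required_ruby_version`) means `>= 0, < 1` and `>= 2, < 3` respectively. The sketch below probes a few of these boundaries with `Gem::Requirement`, which ships with Ruby; the helper name `admits?` is hypothetical.

```ruby
require 'rubygems'

# Hypothetical helper: does the constraint string admit the given version?
def admits?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# [constraint from the gemspec above, probed version, expected result]
[
  ['~> 0', '0.12.20', true],    # fluentd 0.x is accepted ...
  ['~> 0', '1.0.0', false],     # ... but fluentd 1.x is not
  ['~> 2.1', '2.99.0', true],   # aws-sdk: any 2.x from 2.1 upward
  ['~> 2.1', '3.0.0', false],
  ['~> 3.3.0', '3.3.9', true],  # rspec: 3.3.x patch releases only
  ['~> 3.3.0', '3.4.0', false],
  ['~> 2', '2.7.8', true],      # required_ruby_version: any Ruby 2.x
  ['~> 2', '3.0.0', false]
].each do |constraint, version, expected|
  actual = admits?(constraint, version)
  puts "#{constraint.ljust(9)} #{version.ljust(8)} #{actual}"
  raise "unexpected result for #{constraint} / #{version}" unless actual == expected
end
```

In practice this means the released gem only resolves against fluentd 0.x and Ruby 2.x installations.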
data/lib/fluent/plugin/in_s3.rb ADDED
@@ -0,0 +1,673 @@
+ require 'fileutils'
+ require 'sequel'
+ require 'aws-sdk'
+ require 'time'
+ require 'digest/md5'
+ require 'sqlite3'
+ require 'tzinfo'
+ require 'zlib'
+ require 'net/http'
+ require 'uri'
+
+ # Fluent
+ module Fluent
+   # S3Input
+   class S3Input < Fluent::Input
+     Fluent::Plugin.register_input('s3in', self)
+
+     define_method('router') { Fluent::Engine } unless method_defined?(:router)
+
+     # AWS Common Config
+     config_param :region, :string, default: nil
+     # AWS Credential Config
+     config_param :access_key_id, :string, default: nil
+     config_param :secret_access_key, :string, default: nil
+     # AWS S3 Config
+     config_param :s3_bucket, :string, default: nil
+     config_param :s3_prefix, :string, default: nil
+     DATE_CONDITION_MAX_NUM = 20
+     (1..DATE_CONDITION_MAX_NUM).each do |i|
+       config_param "s3_key_date_condition#{i}".to_sym, :string, default: nil
+     end
+     config_param :s3_key_format, :string, default: nil
+     config_param :s3_key_exclude_format, :string, default: nil
+     config_param :s3_key_current_format, :string, default: nil
+     # Log Format
+     config_param :format, :string, default: nil
+     config_param :multiline, :bool, default: false
+     config_param :format_firstline, :string, default: nil
+     config_param :tag, :string, default: 's3in.log'
+     config_param :timestamp, :string, default: nil
+     config_param :timezone, :string, default: 'UTC'
+     # Workspace Config
+     config_param :work_dir, :string, default: '/var/s3in'
+     config_param :clear_db_at_start, :bool, default: false
+     # S3 Describe Interval
+     config_param :refresh_interval, :integer, default: 300
+     config_param :start_now, :bool, default: false
+     # EC2 Describe Config
+     config_param :add_instance_tags, :bool, default: false
+     # Performance Config
+     config_param :download_thread_num, :integer, default: 5
+     config_param :parse_thread_num, :integer, default: 5
+
+     attr_reader :status
+     attr_accessor :start_queue
+
+     module Status
+       READY = 1
+       RUNNING = 2
+       WAITING = 3
+       SHUTDOWN = 4
+     end
+
+     def initialize
+       super
+       @status = Status::READY
+       @shutdown_flag = false
+       @router = router
+       @download_queue = Queue.new
+       @archive_parse_queue = Queue.new
+       @current_parse_queue = Queue.new
+       @start_queue = Queue.new
+     end
+
+     def configure(conf)
+       super
+
+       @region = _region
+       fail 'region is required' if @region.nil?
+       if @access_key_id.nil? && @secret_access_key.nil?
+         fail 'set "access_key_id" and "secret_access_key", or attach an IAM role to this instance' unless _iam_role?
+       else
+         fail 'access_key_id is required' if @access_key_id.nil?
+         fail 'secret_access_key is required' if @secret_access_key.nil?
+       end
+       fail 's3_bucket is required' if @s3_bucket.nil?
+       fail 's3_prefix is required' if @s3_prefix.nil?
+
+       @date_conditions = []
+       (1..DATE_CONDITION_MAX_NUM).each do |index|
+         next if conf["s3_key_date_condition#{index}"].nil?
+         params = conf["s3_key_date_condition#{index}"].split(' ', 2)
+         fail "s3_key_date_condition#{index} parse error" if params.size != 2
+         @date_conditions << { group_name: params[0], condition_str: params[1] }
+       end
+       time = Time.now.utc
+       @date_conditions.each do |date_condition|
+         _validate_date_condition(time_str: nil, condition_str: date_condition[:condition_str], current_time: time)
+       end
+
+       @s3_prefix = _add_end_slash(@s3_prefix)
+       fail '"s3_key_format" is required when "s3_key_date_condition" is set' if @date_conditions.size > 0 && @s3_key_format.nil?
+       @s3_key_format_regexp = _regexp_format(format: @s3_key_format, deny_no_name: @date_conditions.size > 0) unless @s3_key_format.nil?
+       @s3_key_exclude_format_regexp = _regexp_format(format: @s3_key_exclude_format) unless @s3_key_exclude_format.nil?
+       @s3_key_current_format_regexp = _regexp_format(format: @s3_key_current_format) unless @s3_key_current_format.nil?
+
+       fail 'format is required' if @format.nil?
+       @format_regexp = _regexp_format(format: @format, deny_no_name: true)
+       @format_firstline_regexp = _regexp_format(format: @format_firstline) unless @format_firstline.nil?
+
+       fail 'download_thread_num must be 1 or more' if @download_thread_num <= 0
+       fail 'parse_thread_num must be 1 or more' if @parse_thread_num <= 0
+
+       fail 'timezone is required' if @timezone.nil?
+
+       _validate_timestamp unless @timestamp.nil?
+
+       fail 'work_dir is required' if @work_dir.nil? || @work_dir.empty?
+       @work_dir = _add_end_slash(@work_dir)
+       _make_work_dir
+
+       true
+     rescue => e
+       raise Fluent::ConfigError, "error occurred: #{e.message}, #{e.backtrace.join("\n")}"
+     end
+
+     def _region
+       return @region unless @region.nil?
+       uri = URI.parse('http://169.254.169.254/latest/meta-data/placement/availability-zone')
+       http = Net::HTTP.new(uri.host, uri.port)
+       http.open_timeout = 5
+       http.read_timeout = 10
+       http.get(uri.path).body.chop
+     rescue
+       nil
+     end
+
+     def _validate_timestamp
+       params = @timestamp.split(' ', 2)
+       fail "timestamp group name '#{params[0]}' not found in format" unless @format_regexp.named_captures.include? params[0]
+       @timestamp = {
+         group_name: params[0],
+         format: params[1].nil? ? nil : params[1]
+       }
+     end
+
+     def _strptime_with_timezone(date, format = nil)
+       utc_offset = @timezone.nil? ? Time.now.utc_offset / 60 : TZInfo::Timezone.get(@timezone).current_period.utc_offset / 60
+       offset_ope = (utc_offset >= 0) ? '+' : '-'
+       utc_offset = utc_offset.abs
+       offset_hour = (utc_offset / 60).to_i
+       offset_min = utc_offset - (offset_hour * 60)
+       offset_str = sprintf('%s%02d:%02d', offset_ope, offset_hour, offset_min)
+       unless (/(%z|%:z|%::z|%Z)/ =~ format)
+         if md = date.match(/([\+\-Z])([0-9]{2})?:?([0-9]{2})?$/)
+           date = date.gsub(md[0], offset_str)
+         end
+       end
+       time = format.nil? ? Time.parse(date).iso8601 : Time.strptime(date, format).iso8601
+       unless (/(%z|%:z|%::z|%Z)/ =~ format)
+         if md = time.match(/([\+\-Z])([0-9]{2})?:?([0-9]{2})?$/)
+           time = time.gsub(md[0], offset_str)
+         end
+       end
+       Time.iso8601(time)
+     end
+
+     def _make_work_dir
+       if Dir.exist?(@work_dir)
+         fail 'work_dir is not writable' unless FileTest.writable?(@work_dir)
+       else
+         begin
+           FileUtils.mkdir_p @work_dir
+         rescue
+           raise 'could not make work_dir'
+         end
+       end
+     end
+
+     def _validate_date_condition(time_str: nil, condition_str:, current_time:)
+       date_format = condition_str.rpartition(' ')[0].rpartition(' ')[0]
+       left_side_time = time_str.nil? ? nil : _strptime_with_timezone(time_str, date_format)
+       comparison_operator = condition_str.rpartition(' ')[0].rpartition(' ')[2]
+       right_side_time = _diff_to_time(condition_str.rpartition(' ')[2], current_time)
+       _compare_times(left_side_time, comparison_operator, right_side_time)
+     rescue => e
+       raise "s3_key_date_condition parse error: #{e.message}"
+     end
+
+     def _diff_to_time(diff_str, current_time)
+       if md = diff_str.match(/^([\+\-])[0-9]+$/)
+         number = diff_str.rpartition(md[1])[2].to_i
+         case md[1]
+         when '+'
+           return current_time + number
+         when '-'
+           return current_time - number
+         end
+       end
+       Time.iso8601(diff_str)
+     end
+
+     def _compare_times(time_a, operator, time_b)
+       case operator
+       when '>'
+         return time_a.nil? || (time_a <=> time_b) > 0
+       when '<'
+         return time_a.nil? || (time_a <=> time_b) < 0
+       when '>='
+         return time_a.nil? || (time_a <=> time_b) >= 0
+       when '<='
+         return time_a.nil? || (time_a <=> time_b) <= 0
+       when '=='
+         return time_a.nil? || (time_a <=> time_b) == 0
+       when '!='
+         return time_a.nil? || (time_a <=> time_b) != 0
+       else
+         fail 'invalid comparison operator'
+       end
+       false
+     end
+
+     def _regexp_format(format:, deny_no_name: false)
+       regex = nil
+       unless format.start_with?('/') && format.end_with?('/')
+         fail "Invalid regexp in format '#{format}' (must be enclosed in slashes)"
+       end
+       begin
+         regex = @multiline ? Regexp.new(format[1..-2], Regexp::MULTILINE) : Regexp.new(format[1..-2])
+       rescue => e
+         raise "Invalid regexp in format '#{format[1..-2]}': #{e.message}"
+       end
+       fail "No named captures in format '#{format[1..-2]}'" if deny_no_name && regex.named_captures.empty?
+       regex
+     end
+
+     def _iam_role?
+       ec2 = Aws::EC2::Client.new(region: @region)
+       s3 = Aws::S3::Client.new(region: @region)
+       !ec2.config.credentials.nil? && !s3.config.credentials.nil?
+     rescue => e
+       raise "Aws Client error occurred: #{e.message}"
+     end
+
+     def _add_end_slash(str)
+       str.end_with?('/') ? str : str + '/'
+     end
+
+     def start
+       super
+
+       #@db = Datastore.instance
+       @db = Datastore.new
+       @db.connect(work_dir: @work_dir, clear: @clear_db_at_start)
+
+       _clear_queues
+       @timer = Thread.new(&method(:_timer))
+       @thread = Thread.start do
+         while @start_queue.pop
+           @status = Status::RUNNING
+           break if @shutdown_flag
+           run
+           @status = Status::WAITING
+         end
+         @status = Status::SHUTDOWN
+       end
+       @thread
+     end
+
+     def _timer
+       loop do
+         sleep @refresh_interval unless @start_now
+         @start_now = false if @start_now
+         @start_queue.push true
+       end
+     end
+
+     def shutdown
+       super
+       @timer.kill
+       @shutdown_flag = true
+       @start_queue.push nil
+       @parse_thread_num.times { @archive_parse_queue.push nil }
+       @parse_thread_num.times { @current_parse_queue.push nil }
+       @download_thread_num.times { @download_queue.push nil }
+       # thread timeout 30sec
+       @thread.join 30
+       _clear_queues
+       @db.close
+     end
+
+     def _clear_queues
+       @start_queue.clear
+       @download_queue.clear
+       @current_parse_queue.clear
+       @archive_parse_queue.clear
+     end
+
+     def run
+       obj_list = _diff_objects
+       return if obj_list.size <= 0
+       obj_list.each { |obj| @download_queue.push obj }
+       threads = []
+       dl_finish_queue = Queue.new
+       @download_thread_num.times do
+         threads << Thread.new do
+           while obj = @download_queue.pop
+             break if @shutdown_flag
+             _download_object obj
+             break if @shutdown_flag
+             @archive_parse_queue.push obj
+             dl_finish_queue.push true
+             if @download_queue.empty?
+               @download_thread_num.times { @download_queue.push nil }
+             end
+           end
+         end
+       end
+       @parse_thread_num.times do
+         threads << Thread.new do
+           while obj = @archive_parse_queue.pop
+             break if @shutdown_flag
+             if obj[:current]
+               @current_parse_queue.push obj
+             else
+               _parse_object obj
+             end
+             if @archive_parse_queue.empty? && dl_finish_queue.size == obj_list.size
+               @parse_thread_num.times { @archive_parse_queue.push nil }
+             end
+           end
+           sleep 0.01 until @archive_parse_queue.empty?
+           @current_parse_queue.push nil if @current_parse_queue.empty?
+           while obj = @current_parse_queue.pop
+             break if @shutdown_flag
+             _parse_object obj
+             @current_parse_queue.push nil if @current_parse_queue.empty?
+           end
+         end
+       end
+       threads.each(&:join)
+     rescue => e
+       raise "error occurred: #{e.message}, #{e.backtrace.join("\n")}"
+     end
+
+     def _diff_objects
+       s3_obj_list = _s3_objects(prefix: @s3_prefix)
+       diff = []
+       s3_obj_list.each do |obj|
+         record = @db.init_record(obj)
+         diff << record unless record.nil?
+       end
+       diff
+     end
+
+     def _s3_objects(prefix:, objects: [], next_marker: nil, timestamp: Time.now.utc)
+       return objects if @shutdown_flag
+       resp = _s3_client.list_objects(
+         bucket: @s3_bucket,
+         delimiter: '/',
+         marker: next_marker,
+         prefix: prefix
+       )
+       resp.contents.each do |obj|
+         next if obj.key.end_with?('/') || obj.size <= 0
+
+         key_match = nil
+         unless @s3_key_format_regexp.nil?
+           key_match = @s3_key_format_regexp.match(obj.key)
+           next if key_match.nil?
+         end
+
+         unless @s3_key_exclude_format_regexp.nil?
+           exkey_match = @s3_key_exclude_format_regexp.match(obj.key)
+           next unless exkey_match.nil?
+         end
+
+         skip_flag = false
+         key_match.names.each do |name|
+           next unless @date_conditions.collect { |item| item[:group_name] }.include?(name)
+           @date_conditions.each do |condition|
+             unless _validate_date_condition(time_str: key_match[name.to_sym], condition_str: condition[:condition_str], current_time: timestamp)
+               skip_flag = true
+               break
+             end
+           end
+           break if skip_flag
+         end unless key_match.nil?
+         next if skip_flag
+
+         objects << @db.default_schema.merge(
+           {
+             bucket: resp.name,
+             key: obj.key,
+             size: obj.size.to_i,
+             modified: obj.last_modified.iso8601,
+             current: _current_target?(obj.key),
+             # TODO REMOVE
+             #position: 0
+           }
+         ) if obj.storage_class != 'GLACIER'
+       end
+       _s3_objects(
+         prefix: prefix, objects: objects, next_marker: resp.next_marker, timestamp: timestamp
+       ) if resp.is_truncated
+       resp.common_prefixes.each do |cmn_prefix|
+         _s3_objects(prefix: cmn_prefix.prefix, objects: objects, timestamp: timestamp)
+       end if next_marker.nil?
+       objects
+     rescue => e
+       $log.error "error occurred: #{e.message}, #{e.backtrace.join("\n")}"
+     end
+
+     def _current_target?(key)
+       return false if @s3_key_current_format_regexp.nil?
+       !@s3_key_current_format_regexp.match(key).nil?
+     end
+
+     def _download_object(obj)
+       file = _file_path(bucket: obj[:bucket], key: obj[:key])
+       begin
+         File.delete file if FileTest.exist? file
+         resp = _s3_client.get_object(
+           response_target: (/\.gz$/ =~ obj[:key]) ? file + '.gz' : file,
+           bucket: obj[:bucket],
+           key: obj[:key]
+         )
+         obj[:size] = resp.content_length
+         obj[:modified] = resp.last_modified.iso8601
+         return FileTest.exist? file
+       rescue => e
+         $log.error "S3 GetObject error occurred: #{e.message}"
+         File.delete file if FileTest.exist? file
+         return false
+       end
+     end
+
+     def _file_path(bucket:, key:)
+       @work_dir + Digest::MD5.hexdigest("#{bucket}/#{key}")
+     end
+
+     def _s3_client
+       options = { region: @region }
+       options.merge!(
+         access_key_id: @access_key_id,
+         secret_access_key: @secret_access_key
+       ) if @access_key_id && @secret_access_key
+       Aws::S3::Client.new(options)
+     rescue => e
+       raise "S3 Client error occurred: #{e.message}"
+     end
+
+     def _ends_line(file, position)
+       first_line = nil
+       last_line = nil
+       File.open(file, File::RDONLY) do |f|
+         first_line = f.gets
+         if f.pos >= position
+           last_line = first_line
+           break
+         end
+         cr_count = 0
+         f.seek(position - 1, IO::SEEK_SET)
+         while char = f.read(1)
+           if char == "\n"
+             break if cr_count > 0
+             cr_count += 1
+           end
+           f.seek(f.pos - 2, IO::SEEK_SET)
+         end
+         last_line = f.gets
+       end
+       [first_line, last_line]
+     end
+
+     def _parse_object(obj)
+       file = _file_path(bucket: obj[:bucket], key: obj[:key])
+
+       if /\.gz$/ =~ obj[:key]
+         Zlib::GzipReader.open(file + '.gz', encoding: 'UTF-8') { |f| File.write(file, f.read) }
+         File.delete file + '.gz'
+       end
+
+       first_line, last_line = _ends_line(file, obj[:position])
+       obj[:first_line] = first_line
+
+       return if @shutdown_flag
+
+       File.open(file, File::RDONLY) { |f| _read_object_lines f, obj }
+
+       File.delete file
+     rescue => e
+       $log.error "error occurred: #{e.message} #{e.backtrace}"
+     end
+
+     def _read_object_lines(f, obj)
+       return if @shutdown_flag
+       current_record = @db.search_current_record obj
+       line_buf = []
+       instance_tag_cache = {}
+       f.seek(obj[:position], IO::SEEK_SET)
+       while line = f.gets
+         break if @shutdown_flag
+         line_buf = [] if @multiline
+
+         unless @format_firstline_regexp.nil?
+           firstline_match = @format_firstline_regexp.match(line)
+           line_buf = [] unless firstline_match.nil?
+         end
+
+         line_buf << line
+         line_match = @format_regexp.match(line_buf.join(''))
+         next if line_match.nil?
+
+         line_buf = []
+         emit_record = {}
+         log_time = nil
+         line_match.names.each do |name|
+           emit_record[name] = line_match[name.to_sym]
+
+           if !@timestamp.nil? && name == @timestamp[:group_name] && !emit_record[name].nil?
+             begin
+               log_time = _strptime_with_timezone(emit_record[name], @timestamp[:format])
+             rescue => e
+               $log.warn "error occurred: #{e.message}"
+               log_time = Fluent::Engine.now
+             end
+           end
+
+           next if !@add_instance_tags || name != 'instance_id' || emit_record[name].nil?
+           instance_id = line_match[name.to_sym]
+           if instance_tag_cache[instance_id].nil?
+             instance_tag_cache[instance_id] = _describe_instance_tags(instance_id)
+           end
+           instance_tag_cache[instance_id].each do |key, value|
+             emit_record[key] = value
+           end
+         end
+
+         obj[:position] = f.pos
+         obj[:last_line] = line
+
+         @db.transaction do |db|
+           unless current_record.nil?
+             db.where(id: current_record[:id]).update(
+               size: 0,
+               position: 0,
+               first_line: nil,
+               last_line: nil
+             )
+             current_record = nil
+           end
+           @router.emit(@tag, log_time, emit_record)
+           db.where(id: obj[:id]).limit(1).update(obj)
+         end
+
+         break if @shutdown_flag
+       end
+     end
+
+     def _describe_instance_tags(instance_id)
+       tags = _ec2_client.describe_instances(instance_ids: [instance_id]).reservations[0].instances[0].tags
+       new_tag_hash = {}
+       tags.each { |tag| new_tag_hash.store(tag[:key], tag[:value]) }
+       new_tag_hash
+     rescue => e
+       raise "EC2 Client error occurred: #{e.message}"
+     end
+
+     def _ec2_client
+       options = { region: @region }
+       options.merge!(
+         access_key_id: @access_key_id,
+         secret_access_key: @secret_access_key
+       ) if @access_key_id && @secret_access_key
+       Aws::EC2::Client.new(options)
+     rescue => e
+       raise "EC2 Client error occurred: #{e.message}"
+     end
+
+     class Datastore
+       #include Singleton
+
+       attr_accessor :counter
+
+       def initialize
+         @semaphore = Mutex.new
+       end
+
+       def connect(work_dir:, clear: false)
+         db = nil
+         @work_dir = work_dir
+         db = Sequel.sqlite(@work_dir + 's3in.sqlite')
+         #db = Sequel.sqlite
+
+         # http://arbitrage.jpn.org/it/2015-07-07-2/
+         {
+           'journal_mode' => 'MEMORY',
+           #'journal_mode' => 'wal',
+           #'journal_mode' => 'Persist',
+           'synchronous' => 'OFF',
+           'busy_timeout' => 50_000
+         }.each { |key, value| db.pragma_set(key, value) }
+
+         db.drop_table(:objects) if clear && db.table_exists?(:objects)
+         db.create_table :objects do
+           primary_key :id
+           String :bucket, allow_null: false, index: true
+           String :key, allow_null: false, index: true
+           Integer :size, allow_null: false, index: false, default: 0
+           Boolean :current, allow_null: false, index: true, default: false
+           Integer :position, allow_null: false, index: false, default: 0
+           String :modified, allow_null: false, index: false
+           String :first_line, allow_null: true, index: true, default: nil
+           String :last_line, allow_null: true, index: true, default: nil
+         end unless db.table_exists?(:objects)
+         @db = db
+       end
+
+       def close
+         @db.disconnect
+         Sequel::DATABASES.delete(@db)
+       end
+
+       def default_schema
+         if @schema_hash.nil?
+           @schema_hash = {}
+           @db.schema(:objects).collect { |item| @schema_hash[item[0]] = item[1][:default] }
+           @schema_hash[:size] = @schema_hash[:size].to_i
+           @schema_hash[:position] = @schema_hash[:position].to_i
+         end
+         @schema_hash.clone
+       end
+
+       def transaction
+         @semaphore.synchronize do
+           @db.transaction { yield @db[:objects] }
+         end
+       end
+
+       def init_record(obj)
+         transaction do |db|
+           record = db.first(bucket: obj[:bucket], key: obj[:key])
+           if record.nil?
+             obj[:id] = db.insert(obj)
+           else
+             return nil if Time.iso8601(obj[:modified]) == Time.iso8601(record[:modified])
+             # return nil if Time.iso8601(obj[:modified]) == Time.iso8601(record[:modified]) && obj[:size] == record[:size]
+             obj[:id] = record[:id]
+             obj[:position] = record[:position]
+             obj[:first_line] = record[:first_line]
+             obj[:last_line] = record[:last_line]
+           end
+           obj
+         end
+       end
+
+       def search_current_record(obj)
+         return nil unless obj[:current]
+         transaction do |db|
+           return db.exclude(
+             bucket: obj[:bucket],
+             key: obj[:key]
+           ).where(
+             current: true,
+             first_line: obj[:first_line],
+             last_line: obj[:last_line]
+           ).first
+         end
+       end
+     end
+   end
+ end
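Each `s3_key_date_conditionN` value (up to 20) is split by `configure` into a capture-group name and a condition string; `_validate_date_condition` then splits the condition from the right into a date format, a comparison operator and a right-hand side, which `_diff_to_time` reads either as a signed number of seconds relative to the polling time or as an ISO 8601 timestamp. The sketch below re-implements that evaluation with only the standard library so the syntax can be tried in isolation. It is a simplification: the captured date is parsed as UTC rather than in the configured `timezone`, and the helper name `date_condition_met?` and the sample key layout are hypothetical.

```ruby
require 'time'

COMPARISON_OPERATORS = %w[> < >= <= == !=].freeze

# Evaluates a condition string such as "%Y/%m/%d >= -86400" against a date
# captured from an S3 key. The right-hand side is either a signed number of
# seconds relative to `now` or an ISO 8601 timestamp.
# Simplification: the captured date is parsed as UTC, not in `timezone`.
def date_condition_met?(captured, condition, now: Time.now.utc)
  rest, _, rhs = condition.rpartition(' ')
  date_format, _, operator = rest.rpartition(' ')
  unless COMPARISON_OPERATORS.include?(operator)
    raise ArgumentError, "invalid comparison operator #{operator.inspect}"
  end

  right = rhs.match?(/\A[+-]\d+\z/) ? now + Integer(rhs, 10) : Time.iso8601(rhs)
  left = Time.strptime("#{captured} +0000", "#{date_format} %z")
  left.public_send(operator, right)
end

# Hypothetical key layout: <prefix>/YYYY/MM/DD/<file>
key_date = %r{/(?<date>\d{4}/\d{2}/\d{2})/}
captured = key_date.match('logs/2015/11/07/app.log.gz')[:date] # "2015/11/07"
now = Time.utc(2015, 11, 8, 12, 0, 0)

p date_condition_met?(captured, '%Y/%m/%d >= -172800', now: now) # => true (within 2 days)
p date_condition_met?(captured, '%Y/%m/%d >= -43200', now: now)  # => false (dated more than 12 hours ago)
```

Splitting from the right, as the plugin does, is what allows the date format itself to contain spaces (for example `%Y-%m-%d %H:%M`).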
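`_strptime_with_timezone` makes a timestamp whose format has no zone directive (`%z`, `%:z`, `%::z` or `%Z`) mean wall-clock time in the configured `timezone`, by overwriting the offset suffix of the parsed ISO 8601 string with that zone's UTC offset. Because the offset comes from `TZInfo::Timezone#current_period`, it is the offset in effect when the line is parsed, not when it was written, so entries from the other side of a daylight-saving change can end up an hour off. A minimal standard-library sketch of the same interpretation, assuming a fixed offset string such as `"+09:00"` stands in for the tzinfo lookup (`parse_in_offset` is a hypothetical name); instead of editing the ISO 8601 string, it rebuilds the `Time` from its fields:

```ruby
require 'time'

# Interprets the wall-clock fields of `str` (parsed with `format`, which must
# not contain a zone directive) as local time at the fixed `utc_offset`.
def parse_in_offset(str, format, utc_offset)
  # Parse as UTC first so the host's own time zone (and its DST gaps)
  # cannot shift the fields.
  fields = Time.strptime("#{str} +0000", "#{format} %z")
  Time.new(fields.year, fields.month, fields.day,
           fields.hour, fields.min, fields.sec + fields.subsec, utc_offset)
end

t = parse_in_offset('08/Nov/2015:12:00:00', '%d/%b/%Y:%H:%M:%S', '+09:00')
puts t.iso8601        # prints 2015-11-08T12:00:00+09:00
puts t.getutc.iso8601 # prints 2015-11-08T03:00:00Z
```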
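`run` fans work out to `download_thread_num` download threads and `parse_thread_num` parse threads through `Queue`s, and both `run` and `shutdown` stop those threads with `nil`: each worker loops on `while obj = queue.pop` and returns when it pops `nil`, so one `nil` per thread stops the whole pool. A minimal sketch of that sentinel pattern on its own (`process_in_pool` is a hypothetical name, and the plugin's separate queue for `current` objects is left out):

```ruby
# Applies `work` to every item using `thread_num` worker threads and returns
# the results in completion order. nil is the stop signal, so items
# themselves must never be nil or false.
def process_in_pool(items, thread_num, &work)
  jobs = Queue.new
  results = Queue.new
  items.each { |item| jobs.push(item) }
  # One sentinel per worker, queued after all real items.
  thread_num.times { jobs.push(nil) }

  workers = Array.new(thread_num) do
    Thread.new do
      while (item = jobs.pop)
        results.push(work.call(item))
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

squares = process_in_pool((1..10).to_a, 3) { |n| n * n }
p squares.sort # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Because every item is known before the workers start, this sketch can queue the sentinels immediately. The plugin instead pushes them when a worker finds its queue empty, which is why it also counts finished downloads in `dl_finish_queue` before releasing the parse threads.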
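Downloaded objects are staged in `work_dir` under the MD5 hex digest of `"bucket/key"` (`_file_path`); for keys ending in `.gz` the download goes to that path plus `.gz`, and `_parse_object` inflates it with `Zlib::GzipReader` before reading lines. A standard-library sketch of those two steps, with hypothetical helper names:

```ruby
require 'digest/md5'
require 'tmpdir'
require 'zlib'

# Local staging path for an S3 object: work_dir + MD5("bucket/key").
def staging_path(work_dir, bucket, key)
  File.join(work_dir, Digest::MD5.hexdigest("#{bucket}/#{key}"))
end

# Inflates "<path>.gz" into "<path>" in 64 KiB chunks, then removes the
# compressed copy.
def inflate_staged(path)
  Zlib::GzipReader.open("#{path}.gz") do |gz|
    File.open(path, 'wb') do |out|
      while (chunk = gz.read(64 * 1024))
        out.write(chunk)
      end
    end
  end
  File.delete("#{path}.gz")
  path
end

Dir.mktmpdir do |work_dir|
  path = staging_path(work_dir, 'my-bucket', 'logs/2015/11/08/app.log.gz')
  Zlib::GzipWriter.open("#{path}.gz") { |gz| gz.write("line 1\nline 2\n") }
  p File.read(inflate_staged(path)) # => "line 1\nline 2\n"
end
```

One deliberate difference: the plugin inflates with `File.write(file, f.read)`, which holds the whole decompressed object in memory, whereas the sketch streams in chunks so memory use stays bounded for large log archives.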
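`format_firstline` lets one record span several lines: in `_read_object_lines`, a line that matches it discards whatever is buffered, the line is appended to the buffer, `format` is tried against the joined buffer after every line, and the first match is emitted and clears the buffer. A simplified sketch of that buffering (it does not reproduce the plugin's handling of the `multiline` flag; `assemble_records` and the sample log are hypothetical):

```ruby
# Groups raw lines into records. A line matching `firstline` starts a fresh
# buffer; `format` is tried on the joined buffer after every line and the
# named captures are returned as a Hash on the first match.
def assemble_records(lines, format:, firstline:)
  records = []
  buf = []
  lines.each do |line|
    buf = [] if firstline.match?(line)
    buf << line
    md = format.match(buf.join)
    next if md.nil?

    records << md.names.map { |name| [name, md[name]] }.to_h
    buf = []
  end
  records
end

log = [
  "-- rotated --\n", # stray line, dropped by the firstline reset
  "2015-11-08 12:00:00 ERROR boom\n",
  "  at foo.rb:1\n",
  "END\n",
  "2015-11-08 12:00:01 INFO ok\n",
  "END\n"
]
record_format = /\A(?<time>\S+ \S+) (?<level>\w+) (?<message>.*?)^END$/m
firstline = /\A\d{4}-\d{2}-\d{2} /
assemble_records(log, format: record_format, firstline: firstline).each { |r| p r }
```

Because the format is tried after every line, a multi-line `format` needs something that only matches once the record is complete, like the `END` terminator here; otherwise the first line of each record would already match on its own.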
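Polling stays incremental because of `Datastore#init_record` and the stored `position`: a key seen for the first time is inserted with `position` 0, a key whose `modified` time equals the stored one is skipped (`init_record` returns `nil`), and a changed key is queued with its previous `position`, `first_line` and `last_line`, so `_read_object_lines` can `seek` past bytes it already emitted and record the new offset after each line. The sketch below models those rules with a Hash in place of the SQLite table; `ObjectState`, `sync_object` and `read_from` are hypothetical names, and unlike the plugin, which persists the new `modified` value only as lines are emitted, it updates `modified` immediately.

```ruby
require 'time'
require 'tmpdir'

# Per-object state the plugin keeps in its SQLite `objects` table.
ObjectState = Struct.new(:modified, :position)

# Returns the byte offset to resume from, or nil if the object's
# modification time is unchanged since the last poll.
def sync_object(store, bucket, key, modified)
  state = store[[bucket, key]]
  if state.nil?
    store[[bucket, key]] = ObjectState.new(modified, 0)
    return 0
  end
  return nil if Time.iso8601(state.modified) == Time.iso8601(modified)

  state.modified = modified
  state.position
end

# Reads every line from byte `offset` onward; returns the lines and the
# new offset to store as the object's position.
def read_from(path, offset)
  File.open(path, 'rb') do |f|
    f.seek(offset, IO::SEEK_SET)
    lines = []
    while (line = f.gets)
      lines << line
    end
    [lines, f.pos]
  end
end

store = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'app.log')
  File.write(path, "line 1\n")

  offset = sync_object(store, 'bucket', 'app.log', '2015-11-08T00:00:00Z') # first sighting
  lines, pos = read_from(path, offset)
  store[%w[bucket app.log]].position = pos
  p lines # => ["line 1\n"]

  # Next poll, same modification time: nothing to do.
  p sync_object(store, 'bucket', 'app.log', '2015-11-08T00:00:00Z') # => nil

  # The object was rewritten with an extra line and a newer timestamp.
  File.write(path, "line 1\nline 2\n")
  offset = sync_object(store, 'bucket', 'app.log', '2015-11-08T00:05:00Z')
  p read_from(path, offset).first # => ["line 2\n"]
end
```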
metadata ADDED
@@ -0,0 +1,224 @@
+ --- !ruby/object:Gem::Specification
+ name: fluent-plugin-s3in
+ version: !ruby/object:Gem::Version
+   version: 0.1.2.2
+ platform: ruby
+ authors:
+ - Takeshi Shiihara
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2015-11-08 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: fluentd
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: sequel
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '4'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '4'
+ - !ruby/object:Gem::Dependency
+   name: aws-sdk
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.1'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.1'
+ - !ruby/object:Gem::Dependency
+   name: sqlite3
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.3'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.3'
+ - !ruby/object:Gem::Dependency
+   name: tzinfo
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: fakes3
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 3.3.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 3.3.0
+ - !ruby/object:Gem::Dependency
+   name: glint
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: apache-loggen
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: webmock
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description: Write a longer description or delete this line.
+ email:
+ - shi-take@dummy.dummy
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - ".rspec"
+ - ".rubocop.yml"
+ - ".travis.yml"
+ - CODE_OF_CONDUCT.md
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - VERSION
+ - fluent-plugin-s3in.gemspec
+ - lib/fluent/plugin/in_s3.rb
+ homepage: https://github.com/shii-take/fluent-plugin-s3in
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - "~>"
+     - !ruby/object:Gem::Version
+       version: '2'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.2.5
+ signing_key:
+ specification_version: 4
+ summary: Write a short summary, because Rubygems requires one.
+ test_files: []