logstash-output-s3 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +15 -0
- data/.gitignore +4 -0
- data/Gemfile +4 -0
- data/LICENSE +13 -0
- data/Rakefile +6 -0
- data/lib/logstash/outputs/s3.rb +357 -0
- data/logstash-output-s3.gemspec +28 -0
- data/rakelib/publish.rake +9 -0
- data/rakelib/vendor.rake +169 -0
- data/spec/outputs/s3_spec.rb +1 -0
- metadata +91 -0
checksums.yaml
ADDED
@@ -0,0 +1,15 @@
---
!binary "U0hBMQ==":
  metadata.gz: !binary |-
    YzM5YTk3ZjY1OWU3Mjg2YTMwZTA1NTY1ZWIxMWYyODJkMTcxMWQ5YQ==
  data.tar.gz: !binary |-
    NWQxNDkyNWEzNGEyNjE4MDQwZmE0ZTk3NjdhYjI4ZWY5ZWVmZTM2OA==
SHA512:
  metadata.gz: !binary |-
    YmQ1MjhiYTdjNGExOTMzYjBkYjc2MGRhOGRiODY0YWY0YjBiN2RhMjk2OWI4
    YjJhMjEwOTk2YzJkNWRhYTYwOWM1MDA0YWI2MjI3NjdjOGYxMjFkNTI1Yzdj
    MzM3ZGMwNjlkZGY4MzZmMjVhMDE4ZWQxZGVjNDVkYjBlOThmZjQ=
  data.tar.gz: !binary |-
    N2M5YTNlYzEyZjQ4MDZjZjExZDg1YjEzZDQ1MzNmNTk1ZWI0NWJlZjJjZTA0
    NTM5NmY3NzA1MjdlOTU5MDcwZTczZWI5ZDRiZTAxZTdhYzAzZjVlOTYxZDIy
    ODVlMTAxNjcwYzZkNGRjN2NmZDRjYjY0NzA4ZjRiNDcxNTUyNWQ=
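These values are the SHA1 and SHA512 digests of the gem's metadata.gz and data.tar.gz members (the `!binary "U0hBMQ=="` key is simply "SHA1" base64-encoded, and each value is a hex digest stored through YAML's !binary encoding). A minimal sketch of how such a file can be reproduced, assuming the two members have been extracted from the .gem archive:

    require "digest"
    require "yaml"

    # Hypothetical paths: metadata.gz and data.tar.gz extracted from the .gem archive.
    checksums = {
      "SHA1"   => {},
      "SHA512" => {},
    }
    ["metadata.gz", "data.tar.gz"].each do |member|
      checksums["SHA1"][member]   = Digest::SHA1.file(member).hexdigest
      checksums["SHA512"][member] = Digest::SHA512.file(member).hexdigest
    end
    puts checksums.to_yaml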
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,13 @@
Copyright (c) 2012-2014 Elasticsearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
data/Rakefile
ADDED
data/lib/logstash/outputs/s3.rb
ADDED
@@ -0,0 +1,357 @@
# encoding: utf-8
require "logstash/outputs/base"
require "logstash/namespace"
require "socket" # for Socket.gethostname

# TODO integrate aws_config in the future
#require "logstash/plugin_mixins/aws_config"

# INFORMATION:

# This plugin stores Logstash events in Amazon Simple Storage Service (Amazon S3).
# To use it you need AWS credentials and an S3 bucket.
# Make sure you have permission to write files to the S3 bucket, and run Logstash
# as a user that can establish the connection.

# The S3 plugin does something fairly involved, so let's explain :)

# The S3 output creates temporary files in "/opt/logstash/S3_temp/". If you want, you can
# change the path at the start of the register method.
# These files have a special name, for example:

# ls.s3.ip-10-228-27-95.2013-04-18T10.00.tag_hello.part0.txt

# "ls.s3" : indicates the Logstash S3 plugin.

# "ip-10-228-27-95" : the machine's hostname, useful if several Logstash instances write to the same bucket.
# "2013-04-18T10.00" : the timestamp of the period selected with time_file.
# "tag_hello" : the event's tag, so you can group events that share a tag.
# "part0" : if you set size_file, more parts are generated whenever file.size > size_file.
# When a file is full it is pushed to the bucket and deleted from the temporary directory.
# An empty file is not pushed, only deleted.

# The plugin can restore the temporary files left behind if something crashes.

##[Note] :

## If you specify both size_file and time_file, a file is created per tag (if specified);
## whenever time_file elapses or a file's size exceeds size_file, the files are pushed
## to the S3 bucket and deleted from the local disk.

## If you specify only time_file, a single file is created per tag (if specified);
## whenever time_file elapses, the files are pushed to the S3 bucket and deleted from the local disk.

## If you specify only size_file, files are created per tag (if specified); whenever a file's
## size exceeds size_file, it is pushed to the S3 bucket and deleted from the local disk.

## If you specify neither size_file nor time_file, you get a curious mode: a single file is
## created per tag (if specified), but it stays in the temporary directory and is not pushed
## to the bucket until Logstash restarts.

# INFORMATION ABOUT CLASS:

# I commented the class as best I could.
# There is still plenty to improve; if you want some points to develop, here is a list:

# TODO Integrate aws_config in the future
# TODO Find a way to push all remaining files when Logstash closes the session.
# TODO Integrate @field in the file path
# TODO Permanent connection or on demand? For now it is on demand, which isn't a good implementation.
#      Use a loop or a thread to retry the connection before hitting a timeout and signaling an error.
# TODO If you have bug reports or helpful advice, get in touch, but remember this code is as much yours as it is mine:
#      work on it if you want :)


# USAGE:

# This is an example of Logstash config:

# output {
#    s3{
#      access_key_id => "crazy_key"             (required)
#      secret_access_key => "monkey_access_key" (required)
#      endpoint_region => "eu-west-1"           (required)
#      bucket => "boss_please_open_your_bucket" (required)
#      size_file => 2048                        (optional)
#      time_file => 5                           (optional)
#      format => "plain"                        (optional)
#      canned_acl => "private"                  (optional. Options are "private", "public_read", "public_read_write", "authenticated_read". Defaults to "private" )
#    }
# }

# Let's walk through the options:

# access_key_id => "crazy_key"
# The access key Amazon gives you when you buy or trial their service. (Not very open source, anyway.)

# secret_access_key => "monkey_access_key"
# The secret access key Amazon gives you when you buy or trial their service. (Not very open source, anyway.)

# endpoint_region => "eu-west-1"
# The AWS region your services run in; you should know it from your contract with Amazon.

# bucket => "boss_please_open_your_bucket"
# Make sure you know the bucket's name and have permission to write to it.

# size_file => 2048
# The maximum size, in KB, a file may reach in the temporary directory before it is pushed to the bucket.
# Useful on a small server with little disk space, when you don't want temporary log files to blow up the server.

# time_file => 5
# The interval, in minutes, after which files are pushed to the bucket. Useful if you want to push files at a fixed cadence.

# format => "plain"
# The format in which events are stored in the files.

# canned_acl => "private"
# The S3 canned ACL to use when putting the file. Defaults to "private".

# LET'S ROCK AND ROLL ON THE CODE!

class LogStash::Outputs::S3 < LogStash::Outputs::Base
  #TODO integrate aws_config in the future
  # include LogStash::PluginMixins::AwsConfig

  config_name "s3"
  milestone 1

  # AWS access_key_id.
  config :access_key_id, :validate => :string

  # AWS secret_access_key.
  config :secret_access_key, :validate => :string

  # S3 bucket.
  config :bucket, :validate => :string

  # AWS endpoint_region.
  config :endpoint_region, :validate => ["us-east-1", "us-west-1", "us-west-2",
                                         "eu-west-1", "ap-southeast-1", "ap-southeast-2",
                                         "ap-northeast-1", "sa-east-1", "us-gov-west-1"], :default => "us-east-1"

  # The file size limit in KB: once a file in the temporary directory grows past size_file,
  # its contents end up in the bucket as two or more parts.
  # If you use tags, a file of this size is generated per tag.
  ## NOTE: defining a file size is the better choice, because the plugin generates a local
  ## temporary file on disk and then puts it in the bucket.
  config :size_file, :validate => :number, :default => 0

  # The time, in minutes, after which the current sub_time_section of the bucket is closed.
  # If you also define size_file, each section (and tag) can produce several files.
  # 0 keeps the listener open forever; beware of combining time_file 0 with size_file 0,
  # because the file is then never pushed to the bucket: for now the only thing this plugin
  # can do in that case is push the file when Logstash restarts.
  config :time_file, :validate => :number, :default => 0

  # The event format you want to store in files. Defaults to plain text.
  config :format, :validate => [ "json", "plain", "nil" ], :default => "plain"

  ## IMPORTANT: if you run multiple instances of this output, specify "restore => true" on one
  ## of them and "restore => false" on the others.
  ## This is a hack to avoid destroying the new files after restoring the initial files.
  ## If you do not specify "restore => true" and Logstash crashes or is restarted, the leftover
  ## files are not sent to the bucket, for example if you have a single instance.
  config :restore, :validate => :boolean, :default => false

  # The S3 canned ACL.
  config :canned_acl, :validate => ["private", "public_read", "public_read_write", "authenticated_read"],
         :default => "private"

  # Set up the AWS configuration and establish the connection.
  def aws_s3_config

    @endpoint_region == 'us-east-1' ? @endpoint_region = 's3.amazonaws.com' : @endpoint_region = 's3-'+@endpoint_region+'.amazonaws.com'

    @logger.info("Registering s3 output", :bucket => @bucket, :endpoint_region => @endpoint_region)

    AWS.config(
      :access_key_id => @access_key_id,
      :secret_access_key => @secret_access_key,
      :s3_endpoint => @endpoint_region
    )
    @s3 = AWS::S3.new

  end

  # Manage the sleep/wake cycle of the timer thread.
  def time_alert(interval)

    Thread.new do
      loop do
        start_time = Time.now
        yield
        elapsed = Time.now - start_time
        sleep([interval - elapsed, 0].max)
      end
    end

  end

  # Write a file to the bucket. Takes the file and its basename.
  def write_on_bucket(file_data, file_basename)

    # Reconnect if the connection to S3 was lost; crude error handling.
    if (@s3 == nil)
      aws_s3_config
    end

    # find and use the bucket
    bucket = @s3.buckets[@bucket]

    @logger.debug "S3: ready to write "+file_basename+" in bucket "+@bucket+", Fire in the hole!"

    # prepare to write the file
    object = bucket.objects[file_basename]
    object.write(:file => file_data, :acl => @canned_acl)

    @logger.debug "S3: has written "+file_basename+" in bucket "+@bucket + " with canned ACL \"" + @canned_acl + "\""

  end

  # Build a new base path used to name the file.
  def getFinalPath

    @pass_time = Time.now
    return @temp_directory+"ls.s3."+Socket.gethostname+"."+(@pass_time).strftime("%Y-%m-%dT%H.%M")

  end

  # Restore the files from a previous Logstash crash, or prepare files to be sent to the bucket.
  # Takes two parameters: flag indicates whether this is a restore run; name is a glob matching the files.
  def upFile(flag, name)

    Dir[@temp_directory+name].each do |file|
      name_file = File.basename(file)

      if (flag == true)
        @logger.warn "S3: found temporary file: "+name_file+", something crashed before... preparing to upload it to the bucket!"
      end

      if (!File.zero?(file))
        write_on_bucket(file, name_file)

        if (flag == true)
          @logger.debug "S3: file: "+name_file+" restored on bucket "+@bucket
        else
          @logger.debug "S3: file: "+name_file+" was put on bucket "+@bucket
        end
      end

      File.delete(file)

    end
  end

  # Create a new empty temporary file. Flag indicates the start of a new time_file subsection.
  def newFile(flag)

    if (flag == true)
      @current_final_path = getFinalPath
      @sizeCounter = 0
    end

    if (@tags.size != 0)
      @tempFile = File.new(@current_final_path+".tag_"+@tag_path+"part"+@sizeCounter.to_s+".txt", "w")
    else
      @tempFile = File.new(@current_final_path+".part"+@sizeCounter.to_s+".txt", "w")
    end

  end

  public
  def register
    require "aws-sdk"
    @temp_directory = "/opt/logstash/S3_temp/"

    if (@tags.size != 0)
      @tag_path = ""
      for i in (0..@tags.size-1)
        @tag_path += @tags[i].to_s+"."
      end
    end

    if !(File.directory? @temp_directory)
      @logger.debug "S3: Directory "+@temp_directory+" doesn't exist, let's make it!"
      Dir.mkdir(@temp_directory)
    else
      @logger.debug "S3: Directory "+@temp_directory+" exists, nothing to do"
    end

    if (@restore == true)
      @logger.debug "S3: attempting to verify previous crashes..."

      upFile(true, "*.txt")
    end

    newFile(true)

    if (time_file != 0)
      first_time = true
      @thread = time_alert(@time_file*60) do
        if (first_time == false)
          @logger.debug "S3: time_file triggered, let's bucket the file if it isn't empty and create a new file"
          upFile(false, File.basename(@tempFile))
          newFile(true)
        else
          first_time = false
        end
      end
    end

  end

  public
  def receive(event)
    return unless output?(event)

    # Prepare the event in the configured format.
    if (@format == "plain")
      message = self.class.format_message(event)
    elsif (@format == "json")
      message = event.to_json
    else
      message = event.to_s
    end

    if(time_file != 0)
      @logger.debug "S3: trigger files after "+((@pass_time+60*time_file)-Time.now).to_s
    end

    # if a size limit is specified
    if(size_file != 0)

      if (@tempFile.size < @size_file)

        @logger.debug "S3: File has size: "+@tempFile.size.to_s+" and size_file is: "+ @size_file.to_s
        @logger.debug "S3: put event into: "+File.basename(@tempFile)

        # Put the event in the file, now!
        File.open(@tempFile, 'a') do |file|
          file.puts message
          file.write "\n"
        end

      else

        @logger.debug "S3: file: "+File.basename(@tempFile)+" is too large, let's bucket it and create a new file"
        upFile(false, File.basename(@tempFile))
        @sizeCounter += 1
        newFile(false)

      end

    # else we put everything in one file
    else

      @logger.debug "S3: put event into "+File.basename(@tempFile)
      File.open(@tempFile, 'a') do |file|
        file.puts message
        file.write "\n"
      end
    end

  end

  def self.format_message(event)
    message = "Date: #{event[LogStash::Event::TIMESTAMP]}\n"
    message << "Source: #{event["source"]}\n"
    message << "Tags: #{event["tags"].join(', ')}\n"
    message << "Fields: #{event.to_hash.inspect}\n"
    message << "Message: #{event["message"]}"
  end

end

# Enjoy it, by Bistic:)
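To make the naming scheme described in the header comments concrete, here is a small standalone sketch of how getFinalPath and newFile compose a temporary file name; the tag and counter values are illustrative:

    require "socket"

    # Illustrative values; the plugin derives these from its config and the event tags.
    temp_directory = "/opt/logstash/S3_temp/"
    tag_path       = "hello."   # built in register from @tags, one "<tag>." segment per tag
    size_counter   = 0          # bumped each time size_file rolls the current part

    base = temp_directory + "ls.s3." + Socket.gethostname + "." +
           Time.now.strftime("%Y-%m-%dT%H.%M")
    puts base + ".tag_" + tag_path + "part" + size_counter.to_s + ".txt"
    # e.g. /opt/logstash/S3_temp/ls.s3.ip-10-228-27-95.2013-04-18T10.00.tag_hello.part0.txt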
data/logstash-output-s3.gemspec
ADDED
@@ -0,0 +1,28 @@
Gem::Specification.new do |s|

  s.name = 'logstash-output-s3'
  s.version = '0.1.0'
  s.licenses = ['Apache License (2.0)']
  s.summary = "This plugin stores Logstash events in Amazon Simple Storage Service (Amazon S3)"
  s.description = "This plugin stores Logstash events in Amazon Simple Storage Service (Amazon S3)"
  s.authors = ["Elasticsearch"]
  s.email = 'richard.pijnenburg@elasticsearch.com'
  s.homepage = "http://logstash.net/"
  s.require_paths = ["lib"]

  # Files
  s.files = `git ls-files`.split($\)+::Dir.glob('vendor/*')

  # Tests
  s.test_files = s.files.grep(%r{^(test|spec|features)/})

  # Special flag to let us know this is actually a logstash plugin
  s.metadata = { "logstash_plugin" => "true", "group" => "output" }

  # Gem dependencies
  s.add_runtime_dependency 'logstash', '>= 1.4.0', '< 2.0.0'

  s.add_runtime_dependency 'aws-sdk'

end
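For reference, the gem described by this diff can be rebuilt from the spec with the RubyGems API; a sketch, run from the package root:

    require "rubygems/package"

    spec = Gem::Specification.load("logstash-output-s3.gemspec")
    Gem::Package.build(spec)  # writes logstash-output-s3-0.1.0.gem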
data/rakelib/publish.rake
ADDED
@@ -0,0 +1,9 @@
require "gem_publisher"

desc "Publish gem to RubyGems.org"
task :publish_gem do |t|
  gem_file = Dir.glob(File.expand_path('../*.gemspec', File.dirname(__FILE__))).first
  gem = GemPublisher.publish_if_updated(gem_file, :rubygems)
  puts "Published #{gem}" if gem
end
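GemPublisher.publish_if_updated compares the version in the gemspec against the latest release on RubyGems.org and pushes only when it has changed, returning the built gem file name (or nil). Normally you would just run `rake publish_gem`; invoking it programmatically is roughly:

    require "rake"

    # Assumes the rakelib/*.rake tasks have been loaded, as `rake` does automatically.
    Rake::Task["publish_gem"].invoke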
data/rakelib/vendor.rake
ADDED
@@ -0,0 +1,169 @@
require "net/http"
require "uri"
require "digest/sha1"

def vendor(*args)
  return File.join("vendor", *args)
end

directory "vendor/" => ["vendor"] do |task, args|
  mkdir task.name
end

def fetch(url, sha1, output)

  puts "Downloading #{url}"
  actual_sha1 = download(url, output)

  if actual_sha1 != sha1
    fail "SHA1 does not match (expected '#{sha1}' but got '#{actual_sha1}')"
  end
end # def fetch

def file_fetch(url, sha1)
  filename = File.basename( URI(url).path )
  output = "vendor/#{filename}"
  task output => [ "vendor/" ] do
    begin
      actual_sha1 = file_sha1(output)
      if actual_sha1 != sha1
        fetch(url, sha1, output)
      end
    rescue Errno::ENOENT
      fetch(url, sha1, output)
    end
  end.invoke

  return output
end

def file_sha1(path)
  digest = Digest::SHA1.new
  fd = File.new(path, "r")
  while true
    begin
      digest << fd.sysread(16384)
    rescue EOFError
      break
    end
  end
  return digest.hexdigest
ensure
  fd.close if fd
end

def download(url, output)
  uri = URI(url)
  digest = Digest::SHA1.new
  tmp = "#{output}.tmp"
  Net::HTTP.start(uri.host, uri.port, :use_ssl => (uri.scheme == "https")) do |http|
    request = Net::HTTP::Get.new(uri.path)
    http.request(request) do |response|
      # response.code is a String; fail on anything other than a success or redirect.
      fail "HTTP fetch failed for #{url}. #{response}" unless ["200", "301"].include?(response.code)
      size = (response["content-length"].to_i || -1).to_f
      count = 0
      File.open(tmp, "w") do |fd|
        response.read_body do |chunk|
          fd.write(chunk)
          digest << chunk
          if size > 0 && $stdout.tty?
            count += chunk.bytesize
            $stdout.write(sprintf("\r%0.2f%%", count/size * 100))
          end
        end
      end
      $stdout.write("\r \r") if $stdout.tty?
    end
  end

  File.rename(tmp, output)

  return digest.hexdigest
rescue SocketError => e
  puts "Failure while downloading #{url}: #{e}"
  raise
ensure
  File.unlink(tmp) if File.exist?(tmp)
end # def download

def untar(tarball, &block)
  require "archive/tar/minitar"
  tgz = Zlib::GzipReader.new(File.open(tarball))
  # Pull out typesdb
  tar = Archive::Tar::Minitar::Input.open(tgz)
  tar.each do |entry|
    path = block.call(entry)
    next if path.nil?
    parent = File.dirname(path)

    mkdir_p parent unless File.directory?(parent)

    # Skip this file if the output file is the same size
    if entry.directory?
      mkdir path unless File.directory?(path)
    else
      entry_mode = entry.instance_eval { @mode } & 0777
      if File.exists?(path)
        stat = File.stat(path)
        # TODO(sissel): Submit a patch to archive-tar-minitar upstream to
        # expose headers in the entry.
        entry_size = entry.instance_eval { @size }
        # If file sizes are same, skip writing.
        next if stat.size == entry_size && (stat.mode & 0777) == entry_mode
      end
      puts "Extracting #{entry.full_name} from #{tarball} #{entry_mode.to_s(8)}"
      File.open(path, "w") do |fd|
        # eof? check lets us skip empty files. Necessary because the API provided by
        # Archive::Tar::Minitar::Reader::EntryStream only mostly acts like an
        # IO object. Something about empty files in this EntryStream causes
        # IO.copy_stream to throw "can't convert nil into String" on JRuby
        # TODO(sissel): File a bug about this.
        while !entry.eof?
          chunk = entry.read(16384)
          fd.write(chunk)
        end
        #IO.copy_stream(entry, fd)
      end
      File.chmod(entry_mode, path)
    end
  end
  tar.close
  File.unlink(tarball) if File.file?(tarball)
end # def untar

def ungz(file)

  outpath = file.gsub('.gz', '')
  tgz = Zlib::GzipReader.new(File.open(file))
  begin
    File.open(outpath, "w") do |out|
      IO::copy_stream(tgz, out)
    end
    File.unlink(file)
  rescue
    File.unlink(outpath) if File.file?(outpath)
    raise
  end
  tgz.close
end

desc "Process any vendor files required for this plugin"
task "vendor" do |task, args|

  @files.each do |file|
    download = file_fetch(file['url'], file['sha1'])
    if download =~ /.tar.gz/
      prefix = download.gsub('.tar.gz', '').gsub('vendor/', '')
      untar(download) do |entry|
        if !file['files'].nil?
          next unless file['files'].include?(entry.full_name.gsub(prefix, ''))
          out = entry.full_name.split("/").last
        end
        File.join('vendor', out)
      end
    elsif download =~ /.gz/
      ungz(download)
    end
  end

end
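The vendor task iterates over an @files list that this file does not define; the plugin's Rakefile is expected to provide it. Each entry is a hash; a hypothetical example of the shape the task consumes (note that for .tar.gz downloads the 'files' whitelist is effectively required, since the output path is only computed for whitelisted entries):

    # Hypothetical entries; 'url' and 'sha1' are required, 'files' whitelists
    # paths inside the tarball (with the "<name>-<version>" prefix stripped).
    @files = [
      {
        'url'   => 'https://example.com/some-dependency-1.0.tar.gz',
        'sha1'  => 'da39a3ee5e6b4b0d3255bfef95601890afd80709',
        'files' => ['/data/types.db'],
      },
    ]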
data/spec/outputs/s3_spec.rb
ADDED
@@ -0,0 +1 @@
require 'spec_helper'
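The spec file is a one-line placeholder. A minimal smoke test, assuming the RSpec setup that spec_helper provides in the logstash 1.4 toolchain, might look like this (hypothetical; register is deliberately not called because it would create /opt/logstash/S3_temp/ and require AWS credentials):

    require 'spec_helper'
    require 'logstash/outputs/s3'

    describe LogStash::Outputs::S3 do
      it "can be instantiated with the required settings" do
        output = LogStash::Outputs::S3.new(
          "access_key_id"     => "key",
          "secret_access_key" => "secret",
          "bucket"            => "my-bucket"
        )
        expect(output).to be_a(LogStash::Outputs::S3)
      end
    end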
metadata
ADDED
@@ -0,0 +1,91 @@
--- !ruby/object:Gem::Specification
name: logstash-output-s3
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- Elasticsearch
autorequire:
bindir: bin
cert_chain: []
date: 2014-11-06 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: logstash
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: 1.4.0
    - - <
      - !ruby/object:Gem::Version
        version: 2.0.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: 1.4.0
    - - <
      - !ruby/object:Gem::Version
        version: 2.0.0
- !ruby/object:Gem::Dependency
  name: aws-sdk
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
description: This plugin stores Logstash events in Amazon Simple Storage Service
  (Amazon S3)
email: richard.pijnenburg@elasticsearch.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- .gitignore
- Gemfile
- LICENSE
- Rakefile
- lib/logstash/outputs/s3.rb
- logstash-output-s3.gemspec
- rakelib/publish.rake
- rakelib/vendor.rake
- spec/outputs/s3_spec.rb
homepage: http://logstash.net/
licenses:
- Apache License (2.0)
metadata:
  logstash_plugin: 'true'
  group: output
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 2.4.1
signing_key:
specification_version: 4
summary: This plugin stores Logstash events in Amazon Simple Storage Service (Amazon
  S3)
test_files:
- spec/outputs/s3_spec.rb