RubyGems - fluent-plugin-redshift-anton - Versions diffs - 1.0.1 - Mend

fluent-plugin-redshift-anton 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +7 -0
data/Gemfile +3 -0
data/README.md +162 -0
data/Rakefile +16 -0
data/VERSION +1 -0
data/fluent-plugin-redshift-anton.gemspec +25 -0
data/lib/fluent/plugin/out_redshift_auto.rb +330 -0
data/test/plugin/test_out_redshift_auto.rb +526 -0
data/test/test_helper.rb +8 -0
metadata +137 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: fba5c12a592c44c13ab3faadd0842a13fd2f635d
+  data.tar.gz: 55a73f89f4174e1a58aa07708d05aa9a6184df17
+SHA512:
+  metadata.gz: 263d0db356aabc5c02cfb08497a3da6bbe1d2cc1fcc38c1f03ceec0a7ceda31e2ef2e14be9b2c3af71dabeb68e185191c4c8907c5ebe97973e34f9144b58ca60
+  data.tar.gz: 825d2401ddbed069b12916698c911ec18028b60f2a6e258240377b6e3e11cfe14b7a4bfb6cc9985bd494bea13f932de939253fcc045c47dc481fa6efea3b00dd

data/Gemfile ADDED

@@ -0,0 +1,3 @@
+source 'https://rubygems.org'
+gemspec

data/README.md ADDED

@@ -0,0 +1,162 @@
+Amazon Redshift output plugin for Fluentd
+========
+## Overview
+Amazon Redshift output plugin uploads event logs to an Amazon Redshift Cluster. Supported data formats are csv, tsv and json. An S3 bucket and a Redshift Cluster are required to use this plugin.
+## Installation
+    gem install fluent-plugin-redshift-anton
+## Configuration
+Format:
+    <match my.tag>
+        type redshift_anton
+        # s3 (for copying data to redshift)
+        aws_key_id YOUR_AWS_KEY_ID
+        aws_sec_key YOUR_AWS_SECRET_KEY
+        s3_bucket YOUR_S3_BUCKET
+        s3_endpoint YOUR_S3_BUCKET_END_POINT
+        path YOUR_S3_PATH
+        timestamp_key_format year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M
+        # redshift
+        redshift_host YOUR_AMAZON_REDSHIFT_CLUSTER_END_POINT
+        redshift_port YOUR_AMAZON_REDSHIFT_CLUSTER_PORT
+        redshift_dbname YOUR_AMAZON_REDSHIFT_CLUSTER_DATABASE_NAME
+        redshift_user YOUR_AMAZON_REDSHIFT_CLUSTER_USER_NAME
+        redshift_password YOUR_AMAZON_REDSHIFT_CLUSTER_PASSWORD
+        redshift_schemaname YOUR_AMAZON_REDSHIFT_CLUSTER_TARGET_SCHEMA_NAME
+        redshift_tablename YOUR_AMAZON_REDSHIFT_CLUSTER_TARGET_TABLE_NAME
+        make_auto_table 1 # 1 => make table auto 0 => no
+        tag_table 1 # 1 => tag_name = table_name, 0 => no, use redshift_tablename
+        file_type [tsv|csv|json|msgpack]
+        varchar_length ALL_COLUMNS_VARCHAR_LENGTH
+        # buffer
+        buffer_type file
+        buffer_path /var/log/fluent/redshift
+        flush_interval 15m
+        buffer_chunk_limit 1g
+    </match>
+Example (watch and upload json formatted apache log):
+    <source>
+        type tail
+        path redshift_test.json
+        pos_file redshift_test_json.pos
+        tag redshift.json
+        format /^(?<log>.*)$/
+    </source>
+    <match redshift.json>
+        type redshift_anton
+        # s3 (for copying data to redshift)
+        aws_key_id YOUR_AWS_KEY_ID
+        aws_sec_key YOUR_AWS_SECRET_KEY
+        s3_bucket hapyrus-example
+        s3_endpoint s3.amazonaws.com
+        path path/on/s3/apache_json_log/
+        timestamp_key_format year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M
+        # redshift
+        redshift_host xxx-yyy-zzz.xxxxxxxxxx.us-east-1.redshift.amazonaws.com
+        redshift_port 5439
+        redshift_dbname fluent-redshift-test
+        redshift_user fluent
+        redshift_password fluent-password
+        redshift_tablename apache_log
+        file_type json
+        # buffer
+        buffer_type file
+        buffer_path /var/log/fluent/redshift
+        flush_interval 15m
+        buffer_chunk_limit 1g
+    </match>
++ `type` (required) : The value must be `redshift_anton`.
++ `aws_key_id` (required) : AWS access key id to access s3 bucket.
++ `aws_sec_key` (required) : AWS secret key to access s3 bucket.
++ `s3_bucket` (required) : s3 bucket name. The S3 bucket must be in the same region as your Redshift cluster.
++ `s3_endpoint` : s3 endpoint.
++ `path` (required) : s3 path to input.
++ `timestamp_key_format` : The format of the object keys. It can include date-format directives.
+  - Default parameter is "year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M"
+  - For example, with the above example configuration the s3 path is as follows.
+    <pre>
+  hapyrus-example/apache_json_log/year=2013/month=03/day=05/hour=12/20130305-1215_00.gz
+  hapyrus-example/apache_json_log/year=2013/month=03/day=05/hour=12/20130305-1230_00.gz
+</pre>
++ `redshift_host` (required) : the end point(or hostname) of your Amazon Redshift cluster.
++ `redshift_port` (required) : port number.
++ `redshift_dbname` (required) : database name.
++ `redshift_user` (required) : user name.
++ `redshift_password` (required) : password for the user name.
++ `redshift_tablename` (required) : table name to store data.
++ `redshift_schemaname` : schema name to store data. Defaults to `public`.
++ `make_auto_table` (optional, integer) : whether to create tables automatically. Set to 1 to create missing tables, or 0 to disable this. Defaults to 1.
++ `tag_table` (optional, integer) : whether the tag name is used as the table name. Set to 1 to use the tag name as the table name, or 0 to use `redshift_tablename`. Defaults to 1.
++ `file_type` : file format of the source data.  `csv`, `tsv`, `msgpack` or `json` are available.
++ `delimiter` : delimiter of the source data. This option will be ignored if `file_type` is specified.
++ `buffer_type` : buffer type.
++ `buffer_path` : path prefix of the files to buffer logs.
++ `flush_interval` : flush interval.
++ `buffer_chunk_limit` : limit buffer size to chunk.
++ `utc` : utc time zone. This parameter affects `timestamp_key_format`.
+## Logging examples
+```ruby
+# examples by fluent-logger
+require 'fluent-logger'
+log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhost', :port => 24224)
+# file_type: csv
+log.post('your.tag', :log => "12345,12345")
+# file_type: tsv
+log.post('your.tag', :log => "12345\t12345")
+# file_type: json
+require 'json'
+log.post('your.tag', :log => { :user_id => 12345, :data_id => 12345 }.to_json)
+# file_type: msgpack
+log.post('your.tag', :user_id => 12345, :data_id => 12345)
+```
+## License
+Copyright (c) 2013 [Hapyrus Inc](http://hapyrus.com)
+[Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
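
Note on the `timestamp_key_format` option described in the README above: the following is a minimal Ruby sketch of how such a format string could expand into an S3 object key. The helper name `build_s3_key` and its parameters are hypothetical and are not part of the gem. The key layout (path prefix, formatted timestamp, two-digit suffix, `.gz`) is an assumption drawn from the README's default value and example.

```ruby
# Hypothetical helper (not part of the gem): builds an object key from a
# path prefix, a strftime-style format, a time, and a numeric suffix.
def build_s3_key(path, timestamp_key_format, time, index = 0)
  timestamp_key = time.strftime(timestamp_key_format)
  suffix = format('_%02d', index) # zero-padded counter, e.g. "_00"
  "#{path}#{timestamp_key}#{suffix}.gz"
end

default_format = 'year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M'
time = Time.utc(2013, 3, 5, 12, 15)
puts build_s3_key('apache_json_log/', default_format, time)
```

Passing a larger `index` shows how a counter suffix can keep keys distinct when several objects fall into the same minute.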

data/Rakefile ADDED

@@ -0,0 +1,16 @@
+require "bundler"
+Bundler::GemHelper.install_tasks
+require 'rake/testtask'
+Rake::TestTask.new(:test) do |test|
+  test.libs << 'lib' << 'test'
+  test.test_files = FileList['test/plugin/*.rb']
+  test.verbose = true
+end
+task :coverage do |t|
+  ENV['COVERAGE'] = '1'
+  Rake::Task["test"].invoke
+end
+task :default => [:build]

data/VERSION ADDED

@@ -0,0 +1 @@
+1.0.1

data/fluent-plugin-redshift-anton.gemspec ADDED

@@ -0,0 +1,25 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path('../lib', __FILE__)
+Gem::Specification.new do |gem|
+  gem.name          = "fluent-plugin-redshift-anton"
+  gem.version       = File.read("VERSION").strip
+  gem.authors       = ["Anton Kuchinsky"]
+  gem.email         = ["akuchinsky@gmail.com"]
+  gem.description   = %q{Amazon Redshift output plugin for Fluentd with creating table}
+  gem.summary       = gem.description
+  gem.homepage      = "https://github.com/akuchins/fluent-plugin-redshift-anton"
+  gem.has_rdoc      = false
+  gem.files         = `git ls-files`.split($/)
+  gem.executables   = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+  gem.test_files    = gem.files.grep(%r{^(test|spec|features)/})
+  gem.require_paths = ["lib"]
+  gem.add_dependency "fluentd", ">= 0.10.0"
+  gem.add_dependency "aws-sdk", ">= 1.6.3"
+  gem.add_dependency "pg", ">= 0.14.0"
+  gem.add_development_dependency "rake"
+  gem.add_development_dependency "simplecov", ">= 0.5.4"
+  gem.add_development_dependency "flexmock", ">= 1.3.1"
+end

data/lib/fluent/plugin/out_redshift_auto.rb ADDED

@@ -0,0 +1,330 @@
+module Fluent
+class RedshiftOutput < BufferedOutput
+  Fluent::Plugin.register_output('redshift_anton', self)
+  # ignore load table error. (invalid data format)
+  IGNORE_REDSHIFT_ERROR_REGEXP = /^ERROR:  Load into table '[^']+' failed\./
+  def initialize
+    super
+    require 'aws-sdk'
+    require 'zlib'
+    require 'time'
+    require 'tempfile'
+    require 'pg'
+    require 'json'
+    require 'csv'
+  end
+  config_param :record_log_tag, :string, :default => 'log'
+  # s3
+  config_param :aws_key_id, :string
+  config_param :aws_sec_key, :string
+  config_param :s3_bucket, :string
+  config_param :s3_endpoint, :string, :default => nil
+  config_param :path, :string, :default => ""
+  config_param :timestamp_key_format, :string, :default => 'year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M'
+  config_param :utc, :bool, :default => false
+  # redshift
+  config_param :redshift_host, :string
+  config_param :redshift_port, :integer, :default => 5439
+  config_param :redshift_dbname, :string
+  config_param :redshift_user, :string
+  config_param :redshift_password, :string
+  config_param :redshift_tablename, :string
+  config_param :redshift_schemaname, :string, :default => "public"
+  config_param :redshift_copy_base_options, :string , :default => "FILLRECORD ACCEPTANYDATE TRUNCATECOLUMNS"
+  config_param :make_auto_table, :integer, :default => 1 #1 => make_auto 0=> no
+  config_param :tag_table, :integer, :default => 1 #1 => tag_name = table_name, 0 => no
+  # file format
+  config_param :file_type, :string, :default => nil  # json, tsv, csv
+  config_param :delimiter, :string, :default => nil
+  # for debug
+  config_param :log_suffix, :string, :default => ''
+  # for varchar length
+  config_param :varchar_length, :integer, :default => 255
+  def configure(conf)
+    super
+    @path = "#{@path}/" if /.+[^\/]$/ =~ @path
+    @path = "" if @path == "/"
+    @utc = true if conf['utc']
+    @db_conf = {
+      host:@redshift_host,
+      port:@redshift_port,
+      dbname:@redshift_dbname,
+      user:@redshift_user,
+      password:@redshift_password
+    }
+    @delimiter = determine_delimiter(@file_type) if @delimiter.nil? or @delimiter.empty?
+    $log.debug format_log("redshift file_type:#{@file_type} delimiter:'#{@delimiter}'")
+    @copy_sql_template = "copy #{@redshift_schemaname}.%s from '%s' CREDENTIALS 'aws_access_key_id=#{@aws_key_id};aws_secret_access_key=%s' delimiter '#{@delimiter}' GZIP TRUNCATECOLUMNS ESCAPE #{@redshift_copy_base_options};"
+  end
+  def start
+    super
+    # init s3 conf
+    $log.debug format_log("redshift file_type:#{@file_type} delimiter:'#{@delimiter}'")
+    options = {
+      :access_key_id     => @aws_key_id,
+      :secret_access_key => @aws_sec_key
+    }
+    options[:s3_endpoint] = @s3_endpoint if @s3_endpoint
+    @s3 = AWS::S3.new(options)
+    @bucket = @s3.buckets[@s3_bucket]
+  end
+  def format(tag, time, record)
+    record = JSON.generate(record)
+    if @make_auto_table == 1 && json?
+      json = JSON.parse(record)
+      cols = []
+      json.each do |key,val|
+        cols.push("#{key}")
+      end
+      make_table_from_tag_name(tag, cols)
+    end
+    (json?) ? record.to_msgpack : "#{record[@record_log_tag]}\n"
+  end
+  def write(chunk)
+    $log.debug format_log("start creating gz.")
+    if @tag_table == 1 then
+      file_name = File::basename(chunk.path)
+      table_name = file_name.sub(/\..*/, "")
+    else
+      table_name = @redshift_tablename
+    end
+    # create a gz file
+    tmp = Tempfile.new("s3-")
+    tmp = (json?) ? create_gz_file_from_json(tmp, chunk, @delimiter)
+                  : create_gz_file_from_msgpack(tmp, chunk)
+    # no data -> skip
+    unless tmp
+      $log.debug format_log("received no valid data. ")
+      return false # for debug
+    end
+    # create a file path with time format
+    s3path = create_s3path(@bucket, @path)
+    # upload gz to s3
+    @bucket.objects[s3path].write(Pathname.new(tmp.path),
+                                  :acl => :bucket_owner_full_control)
+    # copy gz on s3 to redshift
+    s3_uri = "s3://#{@s3_bucket}/#{s3path}"
+    sql = @copy_sql_template % [table_name, s3_uri, @aws_sec_key]
+    $log.debug  format_log("start copying. s3_uri=#{s3_uri}")
+    conn = nil
+    begin
+      conn = PG.connect(@db_conf)
+      conn.exec(sql)
+      $log.info format_log("completed copying to redshift. s3_uri=#{s3_uri}")
+    rescue PG::Error => e
+      $log.error format_log("failed to copy data into redshift. s3_uri=#{s3_uri}"), :error=>e.to_s
+      raise e unless e.to_s =~ IGNORE_REDSHIFT_ERROR_REGEXP
+      return false # for debug
+    ensure
+      conn.close rescue nil if conn
+    end
+    true # for debug
+  end
+  protected
+  def format_log(message)
+    (@log_suffix and not @log_suffix.empty?) ? "#{message} #{@log_suffix}" : message
+  end
+  private
+  def json?
+    @file_type == 'json'
+  end
+  def create_gz_file_from_msgpack(dst_file, chunk)
+    gzw = nil
+    begin
+      gzw = Zlib::GzipWriter.new(dst_file)
+      chunk.write_to(gzw)
+    ensure
+      gzw.close rescue nil if gzw
+    end
+    dst_file
+  end
+  def create_gz_file_from_json(dst_file, chunk, delimiter)
+    # fetch the table definition from redshift
+    redshift_table_columns = fetch_table_columns
+    if redshift_table_columns == nil
+      raise "failed to fetch the redshift table definition."
+    elsif redshift_table_columns.empty?
+      $log.warn format_log("no table on redshift. table_name=#{@redshift_tablename}")
+      return nil
+    end
+    # convert json to tsv format text
+    gzw = nil
+    begin
+      gzw = Zlib::GzipWriter.new(dst_file)
+      $log.debug format_log("redshift file_type:#{@file_type} delimiter:'#{@delimiter}'")
+      chunk.msgpack_each do |record|
+        begin
+          tsv_text = json_to_table_text(redshift_table_columns, record, delimiter)
+          gzw.write(tsv_text) if tsv_text and not tsv_text.empty?
+        rescue => e
+          $log.error format_log("failed to create table text from json. text=(#{record[@record_log_tag]})"), :error=>$!.to_s
+          $log.error_backtrace
+        end
+      end
+      return nil unless gzw.pos > 0
+    ensure
+      gzw.close rescue nil if gzw
+    end
+    dst_file
+  end
+  def determine_delimiter(file_type)
+    case file_type
+    when 'json', 'tsv'
+      "\t"
+    when "csv"
+      ','
+    else
+      raise Fluent::ConfigError, "Invalid file_type:#{file_type}."
+    end
+  end
+  def fetch_table_columns
+    fetch_columns_sql = "select column_name from INFORMATION_SCHEMA.COLUMNS where table_name = '#{@redshift_tablename}' and table_schema = '#{@redshift_schemaname}' order by ordinal_position;"
+    conn = PG.connect(@db_conf)
+    begin
+      columns = nil
+      conn.exec(fetch_columns_sql) do |result|
+        columns = result.collect{|row| row['column_name']}
+      end
+      columns
+    ensure
+      conn.close rescue nil
+    end
+  end
+  def json_to_table_text(redshift_table_columns, json_text, delimiter)
+    return "" if json_text.nil? or json_text.empty?
+    # parse json text
+    json_obj = nil
+    begin
+      json_obj = JSON.parse(json_text)
+    rescue => e
+      $log.warn format_log("failed to parse json. "), :error=>e.to_s
+      return ""
+    end
+    return "" unless json_obj
+    # extract values from json
+    val_list = redshift_table_columns.collect do |cn|
+      val = json_obj[cn]
+      val = nil unless val and not val.to_s.empty?
+      val = JSON.generate(val) if val.kind_of?(Hash) or val.kind_of?(Array)
+      val.to_s unless val.nil?
+    end
+    if val_list.all?{|v| v.nil? or v.empty?}
+      $log.warn format_log("no data match for table columns on redshift. json_text=#{json_text} table_columns=#{redshift_table_columns}")
+      return ""
+    end
+    generate_line_with_delimiter(val_list, delimiter)
+  end
+  def generate_line_with_delimiter(val_list, delimiter)
+    val_list = val_list.collect do |val|
+      if val.nil? or val.empty?
+        ""
+      else
+        val.gsub(/\\/, "\\\\\\").gsub(/\t/, "\\\t").gsub(/\n/, "\\\n") # escape tab, newline and backslash
+      end
+    end
+    val_list.join(delimiter) + "\n"
+  end
+  def create_s3path(bucket, path)
+    timestamp_key = (@utc) ? Time.now.utc.strftime(@timestamp_key_format) : Time.now.strftime(@timestamp_key_format)
+    i = 0
+    begin
+      suffix = "_#{'%02d' % i}"
+      s3path = "#{path}#{timestamp_key}#{suffix}.gz"
+      i += 1
+    end while bucket.objects[s3path].exists?
+    s3path
+  end
+  def make_table_from_tag_name(tag, columns_arr)
+    conn = PG.connect(@db_conf)
+    sql = "SELECT table_name FROM INFORMATION_SCHEMA.TABLES WHERE table_name LIKE '#{tag}';"
+    cnt = 0
+    conn.exec(sql).each do |r|
+      cnt = cnt + 1
+    end
+    if cnt >= 1
+      return
+    end
+    cols = ""
+    for col_name in columns_arr do
+      cols = cols + "\"#{col_name}\" varchar(#{varchar_length}),"
+    end
+    len = cols.length
+    cols.slice!(len - 1)
+    if @redshift_schemaname && @redshift_schemaname != "public"
+      sql = "SELECT nspname FROM pg_namespace WHERE nspname LIKE '#{@redshift_schemaname}';"
+      cnt = 0
+      conn.exec(sql).each do |r|
+        cnt = cnt + 1
+      end
+      if cnt == 0
+        sql = "CREATE SCHEMA #{@redshift_schemaname}"
+        begin
+          conn.exec(sql)
+        rescue PGError => e
+          $log.error format_log("failed CREATE SCHEMA schema_name: #{@redshift_schemaname}")
+          $log.error format_log("class: #{e.class} msg: #{e.message}")
+        rescue => e
+          $log.error format_log("failed CREATE SCHEMA schema_name: #{@redshift_schemaname}")
+          $log.error format_log("class: #{e.class} msg: #{e.message}")
+        end
+        $log.info format_log("SCHEMA CREATED: => #{sql}")
+      end
+      table_name = "#{@redshift_schemaname}.#{tag}"
+    else
+      table_name = "#{tag}"
+    end
+    sql = "CREATE TABLE #{table_name} (#{cols});"
+    begin
+      conn.exec(sql)
+    rescue PGError => e
+      $log.error format_log("failed CREATE TABLE table_name: #{table_name}")
+      $log.error format_log("class: #{e.class} msg: #{e.message}")
+    rescue => e
+      $log.error format_log("failed CREATE TABLE table_name: #{table_name}")
+      $log.error format_log("class: #{e.class} msg: #{e.message}")
+    end
+    conn.close
+    $log.info format_log("TABLE CREATED: => #{sql}")
+  end
+end
+end
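
Note on the row-building step in `generate_line_with_delimiter` above: the escaping it performs (a backslash placed before each backslash, tab, and newline, with nil or empty values written as empty fields) can be illustrated in isolation. This is a minimal sketch, not the plugin's code. The names `escape_for_copy` and `join_row` are hypothetical, and the sketch uses a single block-form `gsub` in place of the chained calls in the diff, on the assumption that both give the same output.

```ruby
# Hypothetical helper (not part of the gem): prefixes backslash, tab and
# newline with a backslash. The block form of gsub inserts the returned
# string literally, so no replacement-string escaping rules apply.
def escape_for_copy(value)
  value.gsub(/[\\\t\n]/) { |ch| "\\#{ch}" }
end

# Hypothetical helper: joins values into one delimited line, writing nil
# or empty values as empty fields.
def join_row(values, delimiter = "\t")
  fields = values.map { |v| v.nil? || v.empty? ? "" : escape_for_copy(v) }
  fields.join(delimiter) + "\n"
end

print join_row(["val_a", nil, "with\ttab"])
```

The block form is used here only to make the intent easier to read; it does not change what is written to the file.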

data/test/plugin/test_out_redshift_auto.rb ADDED

@@ -0,0 +1,526 @@
+require 'test_helper'
+require 'fluent/test'
+require 'fluent/plugin/out_redshift_auto'
+require 'flexmock/test_unit'
+require 'zlib'
+class RedshiftOutputTest < Test::Unit::TestCase
+  def setup
+    require 'aws-sdk'
+    require 'pg'
+    require 'csv'
+    Fluent::Test.setup
+  end
+  CONFIG_BASE= %[
+    aws_key_id test_key_id
+    aws_sec_key test_sec_key
+    s3_bucket test_bucket
+    path log
+    redshift_host test_host
+    redshift_dbname test_db
+    redshift_user test_user
+    redshift_password test_password
+    redshift_tablename test_table
+    buffer_type memory
+    utc
+    log_suffix id:5 host:localhost
+  ]
+  CONFIG_CSV= %[
+    #{CONFIG_BASE}
+    file_type csv
+  ]
+  CONFIG_TSV= %[
+    #{CONFIG_BASE}
+    file_type tsv
+  ]
+  CONFIG_JSON = %[
+    #{CONFIG_BASE}
+    file_type json
+  ]
+  CONFIG_JSON_WITH_SCHEMA = %[
+    #{CONFIG_BASE}
+    redshift_schemaname test_schema
+    file_type json
+  ]
+  CONFIG_MSGPACK = %[
+    #{CONFIG_BASE}
+    file_type msgpack
+  ]
+  CONFIG_PIPE_DELIMITER= %[
+    #{CONFIG_BASE}
+    delimiter |
+  ]
+  CONFIG_PIPE_DELIMITER_WITH_NAME= %[
+    #{CONFIG_BASE}
+    file_type pipe
+    delimiter |
+  ]
+  CONFIG=CONFIG_CSV
+  RECORD_CSV_A = {"log" => %[val_a,val_b,val_c,val_d]}
+  RECORD_CSV_B = {"log" => %[val_e,val_f,val_g,val_h]}
+  RECORD_TSV_A = {"log" => %[val_a\tval_b\tval_c\tval_d]}
+  RECORD_TSV_B = {"log" => %[val_e\tval_f\tval_g\tval_h]}
+  RECORD_JSON_A = {"log" => %[{"key_a" : "val_a", "key_b" : "val_b"}]}
+  RECORD_JSON_B = {"log" => %[{"key_c" : "val_c", "key_d" : "val_d"}]}
+  RECORD_MSGPACK_A = {"key_a" => "val_a", "key_b" => "val_b"}
+  RECORD_MSGPACK_B = {"key_c" => "val_c", "key_d" => "val_d"}
+  DEFAULT_TIME = Time.parse("2013-03-06 12:15:02 UTC").to_i
+  def create_driver(conf = CONFIG, tag='test.input')
+    Fluent::Test::BufferedOutputTestDriver.new(Fluent::RedshiftOutput, tag).configure(conf)
+  end
+  def create_driver_no_write(conf = CONFIG, tag='test.input')
+    Fluent::Test::BufferedOutputTestDriver.new(Fluent::RedshiftOutput, tag) do
+      def write(chunk)
+        chunk.read
+      end
+    end.configure(conf)
+  end
+  def test_configure
+    assert_raise(Fluent::ConfigError) {
+      d = create_driver('')
+    }
+    assert_raise(Fluent::ConfigError) {
+      d = create_driver(CONFIG_BASE)
+    }
+    d = create_driver(CONFIG_CSV)
+    assert_equal "test_key_id", d.instance.aws_key_id
+    assert_equal "test_sec_key", d.instance.aws_sec_key
+    assert_equal "test_bucket", d.instance.s3_bucket
+    assert_equal "log/", d.instance.path
+    assert_equal "test_host", d.instance.redshift_host
+    assert_equal 5439, d.instance.redshift_port
+    assert_equal "test_db", d.instance.redshift_dbname
+    assert_equal "test_user", d.instance.redshift_user
+    assert_equal "test_password", d.instance.redshift_password
+    assert_equal "test_table", d.instance.redshift_tablename
+    assert_equal nil, d.instance.redshift_schemaname
+    assert_equal "FILLRECORD ACCEPTANYDATE TRUNCATECOLUMNS", d.instance.redshift_copy_base_options
+    assert_equal nil, d.instance.redshift_copy_options
+    assert_equal "csv", d.instance.file_type
+    assert_equal ",", d.instance.delimiter
+    assert_equal true, d.instance.utc
+  end
+  def test_configure_with_schemaname
+    d = create_driver(CONFIG_JSON_WITH_SCHEMA)
+    assert_equal "test_schema", d.instance.redshift_schemaname
+  end
+  def test_configure_localtime
+    d = create_driver(CONFIG_CSV.gsub(/ *utc */, ''))
+    assert_equal false, d.instance.utc
+  end
+  def test_configure_no_path
+    d = create_driver(CONFIG_CSV.gsub(/ *path *.+$/, ''))
+    assert_equal "", d.instance.path
+  end
+  def test_configure_root_path
+    d = create_driver(CONFIG_CSV.gsub(/ *path *.+$/, 'path /'))
+    assert_equal "", d.instance.path
+  end
+  def test_configure_path_with_slash
+    d = create_driver(CONFIG_CSV.gsub(/ *path *.+$/, 'path log/'))
+    assert_equal "log/", d.instance.path
+  end
+  def test_configure_path_starts_with_slash
+    d = create_driver(CONFIG_CSV.gsub(/ *path *.+$/, 'path /log/'))
+    assert_equal "log/", d.instance.path
+  end
+  def test_configure_path_starts_with_slash_without_last_slash
+    d = create_driver(CONFIG_CSV.gsub(/ *path *.+$/, 'path /log'))
+    assert_equal "log/", d.instance.path
+  end
+  def test_configure_tsv
+    d1 = create_driver(CONFIG_TSV)
+    assert_equal "tsv", d1.instance.file_type
+    assert_equal "\t", d1.instance.delimiter
+  end
+  def test_configure_json
+    d2 = create_driver(CONFIG_JSON)
+    assert_equal "json", d2.instance.file_type
+    assert_equal "\t", d2.instance.delimiter
+  end
+  def test_configure_msgpack
+    d2 = create_driver(CONFIG_MSGPACK)
+    assert_equal "msgpack", d2.instance.file_type
+    assert_equal "\t", d2.instance.delimiter
+  end
+  def test_configure_original_file_type
+    d3 = create_driver(CONFIG_PIPE_DELIMITER)
+    assert_equal nil, d3.instance.file_type
+    assert_equal "|", d3.instance.delimiter
+    d4 = create_driver(CONFIG_PIPE_DELIMITER_WITH_NAME)
+    assert_equal "pipe", d4.instance.file_type
+    assert_equal "|", d4.instance.delimiter
+  end
+  def test_configure_no_log_suffix
+    d = create_driver(CONFIG_CSV.gsub(/ *log_suffix *.+$/, ''))
+    assert_equal "", d.instance.log_suffix
+  end
+  def emit_csv(d)
+    d.emit(RECORD_CSV_A, DEFAULT_TIME)
+    d.emit(RECORD_CSV_B, DEFAULT_TIME)
+  end
+  def emit_tsv(d)
+    d.emit(RECORD_TSV_A, DEFAULT_TIME)
+    d.emit(RECORD_TSV_B, DEFAULT_TIME)
+  end
+  def emit_json(d)
+    d.emit(RECORD_JSON_A, DEFAULT_TIME)
+    d.emit(RECORD_JSON_B, DEFAULT_TIME)
+  end
+  def emit_msgpack(d)
+    d.emit(RECORD_MSGPACK_A, DEFAULT_TIME)
+    d.emit(RECORD_MSGPACK_B, DEFAULT_TIME)
+  end
+  def test_format_csv
+    d_csv = create_driver_no_write(CONFIG_CSV)
+    emit_csv(d_csv)
+    d_csv.expect_format RECORD_CSV_A['log'] + "\n"
+    d_csv.expect_format RECORD_CSV_B['log'] + "\n"
+    d_csv.run
+  end
+  def test_format_tsv
+    d_tsv = create_driver_no_write(CONFIG_TSV)
+    emit_tsv(d_tsv)
+    d_tsv.expect_format RECORD_TSV_A['log'] + "\n"
+    d_tsv.expect_format RECORD_TSV_B['log'] + "\n"
+    d_tsv.run
+  end
+  def test_format_json
+    d_json = create_driver_no_write(CONFIG_JSON)
+    emit_json(d_json)
+    d_json.expect_format RECORD_JSON_A.to_msgpack
+    d_json.expect_format RECORD_JSON_B.to_msgpack
+    d_json.run
+  end
+  def test_format_msgpack
+    d_msgpack = create_driver_no_write(CONFIG_MSGPACK)
+    emit_msgpack(d_msgpack)
+    d_msgpack.expect_format({ 'log' => RECORD_MSGPACK_A }.to_msgpack)
+    d_msgpack.expect_format({ 'log' => RECORD_MSGPACK_B }.to_msgpack)
+    d_msgpack.run
+  end
+  class PGConnectionMock
+    def initialize(options = {})
+      @return_keys = options[:return_keys] || ['key_a', 'key_b', 'key_c', 'key_d', 'key_e', 'key_f', 'key_g', 'key_h']
+      @target_schema = options[:schemaname] || nil
+      @target_table = options[:tablename] || 'test_table'
+    end
+    def expected_column_list_query
+      if @target_schema
+        /\Aselect column_name from INFORMATION_SCHEMA.COLUMNS where table_schema = '#{@target_schema}' and table_name = '#{@target_table}'/
+      else
+        /\Aselect column_name from INFORMATION_SCHEMA.COLUMNS where table_name = '#{@target_table}'/
+      end
+    end
+    def expected_copy_query
+      if @target_schema
+        /\Acopy #{@target_schema}.#{@target_table} from/
+      else
+        /\Acopy #{@target_table} from/
+      end
+    end
+    def exec(sql, &block)
+      if block_given?
+        if sql =~ expected_column_list_query
+          yield @return_keys.collect{|key| {'column_name' => key}}
+        else
+          yield []
+        end
+      else
+        unless sql =~ expected_copy_query
+          error = PG::Error.new("ERROR:  Load into table '#{@target_table}' failed.  Check 'stl_load_errors' system table for details.")
+          error.result = "ERROR:  Load into table '#{@target_table}' failed.  Check 'stl_load_errors' system table for details."
+          raise error
+        end
+      end
+    end
+    def close
+    end
+  end
+  def setup_pg_mock
+    # create mock of PG
+    def PG.connect(dbinfo)
+      return PGConnectionMock.new
+    end
+  end
+  def setup_s3_mock(expected_data)
+    current_time = Time.now
+    # create mock of s3 object
+    s3obj = flexmock(AWS::S3::S3Object)
+    s3obj.should_receive(:exists?).with_any_args.and_return { false }
+    s3obj.should_receive(:write).with(
+      # pathname
+      on { |pathname|
+        data = nil
+        pathname.open { |f|
+          gz = Zlib::GzipReader.new(f)
+          data = gz.read
+          gz.close
+        }
+        assert_equal expected_data, data
+      },
+      :acl => :bucket_owner_full_control
+    ).and_return { true }
+    # create mock of s3 object collection
+    s3obj_col = flexmock(AWS::S3::ObjectCollection)
+    s3obj_col.should_receive(:[]).with(
+      on { |key|
+        expected_key = current_time.utc.strftime("log/year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M_00.gz")
+        key == expected_key
+      }).
+      and_return {
+        s3obj
+      }
+    # create mock of s3 bucket
+    flexmock(AWS::S3::Bucket).new_instances do |bucket|
+      bucket.should_receive(:objects).with_any_args.
+        and_return {
+          s3obj_col
+        }
+    end
+  end
+  def setup_tempfile_mock_to_be_closed
+    flexmock(Tempfile).new_instances.should_receive(:close!).at_least.once
+  end
+  def setup_mocks(expected_data)
+    setup_pg_mock
+    setup_s3_mock(expected_data) end
+  def test_write_with_csv
+    setup_mocks(%[val_a,val_b,val_c,val_d\nval_e,val_f,val_g,val_h\n])
+    setup_tempfile_mock_to_be_closed
+    d_csv = create_driver
+    emit_csv(d_csv)
+    assert_equal true, d_csv.run
+  end
+  def test_write_with_json
+    setup_mocks(%[val_a\tval_b\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n])
+    setup_tempfile_mock_to_be_closed
+    d_json = create_driver(CONFIG_JSON)
+    emit_json(d_json)
+    assert_equal true, d_json.run
+  end
+  def test_write_with_json_hash_value
+    setup_mocks("val_a\t{\"foo\":\"var\"}\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n")
+    d_json = create_driver(CONFIG_JSON)
+    d_json.emit({"log" => %[{"key_a" : "val_a", "key_b" : {"foo" : "var"}}]} , DEFAULT_TIME)
+    d_json.emit(RECORD_JSON_B, DEFAULT_TIME)
+    assert_equal true, d_json.run
+  end
+  def test_write_with_json_array_value
+    setup_mocks("val_a\t[\"foo\",\"var\"]\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n")
+    d_json = create_driver(CONFIG_JSON)
+    d_json.emit({"log" => %[{"key_a" : "val_a", "key_b" : ["foo", "var"]}]} , DEFAULT_TIME)
+    d_json.emit(RECORD_JSON_B, DEFAULT_TIME)
+    assert_equal true, d_json.run
+  end
+  def test_write_with_json_including_tab_newline_quote
+    setup_mocks("val_a_with_\\\t_tab_\\\n_newline\tval_b_with_\\\\_quote\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n")
+    d_json = create_driver(CONFIG_JSON)
+    d_json.emit({"log" => %[{"key_a" : "val_a_with_\\t_tab_\\n_newline", "key_b" : "val_b_with_\\\\_quote"}]} , DEFAULT_TIME)
+    d_json.emit(RECORD_JSON_B, DEFAULT_TIME)
+    assert_equal true, d_json.run
+  end
+  def test_write_with_json_no_data
+    setup_mocks("")
+    d_json = create_driver(CONFIG_JSON)
+    d_json.emit("", DEFAULT_TIME)
+    d_json.emit("", DEFAULT_TIME)
+    assert_equal false, d_json.run
+  end
+  def test_write_with_json_invalid_one_line
+    setup_mocks(%[\t\tval_c\tval_d\t\t\t\t\n])
+    d_json = create_driver(CONFIG_JSON)
+    d_json.emit({"log" => %[}}]}, DEFAULT_TIME)
+    d_json.emit(RECORD_JSON_B, DEFAULT_TIME)
+    assert_equal true, d_json.run
+  end
+  def test_write_with_json_no_available_data
+    setup_mocks(%[val_a\tval_b\t\t\t\t\t\t\n])
+    d_json = create_driver(CONFIG_JSON)
+    d_json.emit(RECORD_JSON_A, DEFAULT_TIME)
+    d_json.emit({"log" => %[{"key_o" : "val_o", "key_p" : "val_p"}]}, DEFAULT_TIME)
+    assert_equal true, d_json.run
+  end
+  def test_write_with_msgpack
+    setup_mocks(%[val_a\tval_b\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n])
+    d_msgpack = create_driver(CONFIG_MSGPACK)
+    emit_msgpack(d_msgpack)
+    assert_equal true, d_msgpack.run
+  end
+  def test_write_with_msgpack_hash_value
+    setup_mocks("val_a\t{\"foo\":\"var\"}\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n")
+    d_msgpack = create_driver(CONFIG_MSGPACK)
+    d_msgpack.emit({"key_a" => "val_a", "key_b" => {"foo" => "var"}} , DEFAULT_TIME)
+    d_msgpack.emit(RECORD_MSGPACK_B, DEFAULT_TIME)
+    assert_equal true, d_msgpack.run
+  end
+  def test_write_with_msgpack_array_value
+    setup_mocks("val_a\t[\"foo\",\"var\"]\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n")
+    d_msgpack = create_driver(CONFIG_MSGPACK)
+    d_msgpack.emit({"key_a" => "val_a", "key_b" => ["foo", "var"]} , DEFAULT_TIME)
+    d_msgpack.emit(RECORD_MSGPACK_B, DEFAULT_TIME)
+    assert_equal true, d_msgpack.run
+  end
+  def test_write_with_msgpack_including_tab_newline_quote
+    setup_mocks("val_a_with_\\\t_tab_\\\n_newline\tval_b_with_\\\\_quote\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n")
+    d_msgpack = create_driver(CONFIG_MSGPACK)
+    d_msgpack.emit({"key_a" => "val_a_with_\t_tab_\n_newline", "key_b" => "val_b_with_\\_quote"}, DEFAULT_TIME)
+    d_msgpack.emit(RECORD_MSGPACK_B, DEFAULT_TIME)
+    assert_equal true, d_msgpack.run
+  end
+  def test_write_with_msgpack_no_data
+    setup_mocks("")
+    d_msgpack = create_driver(CONFIG_MSGPACK)
+    d_msgpack.emit({}, DEFAULT_TIME)
+    d_msgpack.emit({}, DEFAULT_TIME)
+    assert_equal false, d_msgpack.run
+  end
+  def test_write_with_msgpack_no_available_data
+    setup_mocks(%[val_a\tval_b\t\t\t\t\t\t\n])
+    d_msgpack = create_driver(CONFIG_MSGPACK)
+    d_msgpack.emit(RECORD_MSGPACK_A, DEFAULT_TIME)
+    d_msgpack.emit({"key_o" => "val_o", "key_p" => "val_p"}, DEFAULT_TIME)
+    assert_equal true, d_msgpack.run
+  end
+  def test_write_redshift_connection_error
+    def PG.connect(dbinfo)
+      return Class.new do
+        def initialize(return_keys=[]); end
+        def exec(sql)
+          raise PG::Error, "redshift connection error"
+        end
+        def close; end
+      end.new
+    end
+    setup_s3_mock(%[val_a,val_b,val_c,val_d\nval_e,val_f,val_g,val_h\n])
+    d_csv = create_driver
+    emit_csv(d_csv)
+    assert_raise(PG::Error) {
+      d_csv.run
+    }
+  end
+  def test_write_redshift_load_error
+    PG::Error.module_eval { attr_accessor :result }
+    def PG.connect(dbinfo)
+      return Class.new do
+        def initialize(return_keys=[]); end
+        def exec(sql)
+          error = PG::Error.new("ERROR:  Load into table 'apache_log' failed.  Check 'stl_load_errors' system table for details.")
+          error.result = "ERROR:  Load into table 'apache_log' failed.  Check 'stl_load_errors' system table for details."
+          raise error
+        end
+        def close; end
+      end.new
+    end
+    setup_s3_mock(%[val_a,val_b,val_c,val_d\nval_e,val_f,val_g,val_h\n])
+    d_csv = create_driver
+    emit_csv(d_csv)
+    assert_equal false, d_csv.run
+  end
+  def test_write_with_json_redshift_connection_error
+    def PG.connect(dbinfo)
+      return Class.new do
+        def initialize(return_keys=[]); end
+        def exec(sql, &block)
+          error = PG::Error.new("redshift connection error")
+          raise error
+        end
+        def close; end
+      end.new
+    end
+    setup_s3_mock(%[val_a,val_b,val_c,val_d\nval_e,val_f,val_g,val_h\n])
+    d_json = create_driver(CONFIG_JSON)
+    emit_json(d_json)
+    assert_raise(PG::Error) {
+      d_json.run
+    }
+  end
+  def test_write_with_json_no_table_on_redshift
+    def PG.connect(dbinfo)
+      return Class.new do
+        def initialize(return_keys=[]); end
+        def exec(sql, &block)
+          yield [] if block_given?
+        end
+        def close; end
+      end.new
+    end
+    setup_s3_mock(%[val_a,val_b,val_c,val_d\nval_e,val_f,val_g,val_h\n])
+    d_json = create_driver(CONFIG_JSON)
+    emit_json(d_json)
+    assert_equal false, d_json.run
+  end
+  def test_write_with_json_failed_to_get_columns
+    def PG.connect(dbinfo)
+      return Class.new do
+        def initialize(return_keys=[]); end
+        def exec(sql, &block)
+        end
+        def close; end
+      end.new
+    end
+    setup_s3_mock("")
+    d_json = create_driver(CONFIG_JSON)
+    emit_json(d_json)
+    assert_raise(RuntimeError, "failed to fetch the redshift table definition.") {
+      d_json.run
+    }
+  end
+  def test_write_with_json_fetch_column_with_schema
+    def PG.connect(dbinfo)
+      return PGConnectionMock.new(:schemaname => 'test_schema')
+    end
+    setup_s3_mock(%[val_a\tval_b\t\t\t\t\t\t\n\t\tval_c\tval_d\t\t\t\t\n])
+    d_json = create_driver(CONFIG_JSON_WITH_SCHEMA)
+    emit_json(d_json)
+    assert_equal true, d_json.run
+  end
+end
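
The expected strings passed to `setup_mocks` in the msgpack and JSON tests above imply a particular row format: one tab-separated line per record, empty fields for missing keys, JSON text for hash and array values, a backslash prefix before tabs, newlines and backslashes, and no line at all for a record that matches no column. The sketch below reproduces that behaviour so the expectations are easier to read. It is an illustration only, not the plugin's code: `tsv_row` and `COLUMNS` are hypothetical names, and the column names beyond `key_a` and `key_b` are assumed from the eight-field width of the expected output.

```ruby
require 'json'

# Hypothetical helper, not taken from the plugin: builds one tab-separated
# line for a record, given the target table's column names.
def tsv_row(record, columns)
  values = columns.map do |column|
    value = record[column]
    # Hashes and arrays are written as JSON text, which is what the
    # *_hash_value and *_array_value tests expect.
    value = JSON.generate(value) if value.is_a?(Hash) || value.is_a?(Array)
    # nil becomes ""; backslash, tab and newline get a backslash prefix.
    value.to_s.gsub(/[\\\t\n]/) { |char| "\\#{char}" }
  end
  # A record with no value for any known column produces no line.
  return nil if values.all?(&:empty?)
  values.join("\t") + "\n"
end

# Assumed column list: the tests only show key_a and key_b directly.
COLUMNS = %w[key_a key_b key_c key_d key_e key_f key_g key_h]

print tsv_row({ "key_a" => "val_a", "key_b" => { "foo" => "var" } }, COLUMNS)
```

A record such as `{"key_o" => "val_o", "key_p" => "val_p"}` returns `nil` here, mirroring the `*_no_available_data` tests, where only the first record reaches the uploaded file.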

data/test/test_helper.rb ADDED

@@ -0,0 +1,8 @@
+if ENV['COVERAGE']
+  require 'simplecov'
+  SimpleCov.start do
+    add_filter 'test/'
+    add_filter 'pkg/'
+    add_filter 'vendor/'
+  end
+end

metadata ADDED

@@ -0,0 +1,137 @@
+--- !ruby/object:Gem::Specification
+name: fluent-plugin-redshift-anton
+version: !ruby/object:Gem::Version
+  version: 1.0.1
+platform: ruby
+authors:
+- Anton Kuchinsky
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-04-11 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: fluentd
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 0.10.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 0.10.0
+- !ruby/object:Gem::Dependency
+  name: aws-sdk
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 1.6.3
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 1.6.3
+- !ruby/object:Gem::Dependency
+  name: pg
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 0.14.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 0.14.0
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: simplecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 0.5.4
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 0.5.4
+- !ruby/object:Gem::Dependency
+  name: flexmock
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 1.3.1
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 1.3.1
+description: Amazon Redshift output plugin for Fluentd with creating table
+email:
+- akuchinsky@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- Gemfile
+- README.md
+- Rakefile
+- VERSION
+- fluent-plugin-redshift-anton.gemspec
+- lib/fluent/plugin/out_redshift_auto.rb
+- test/plugin/test_out_redshift_auto.rb
+- test/test_helper.rb
+homepage: https://github.com/akuchins/fluent-plugin-redshift-anton
+licenses: []
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.0.14
+signing_key:
+specification_version: 4
+summary: Amazon Redshift output plugin for Fluentd with creating table
+test_files:
+- test/plugin/test_out_redshift_auto.rb
+- test/test_helper.rb