fluent-plugin-bigquery 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 793b3fbd0189044538497bcef9dc244adf987d86
4
+ data.tar.gz: 115ff35e20bf3e3fe58e54978f7e659678e14a17
5
+ SHA512:
6
+ metadata.gz: d230732372df108fbcdf59aec5485f2837028531fb1cb9edf0582dfc31654a747bb35494e9de646067e295960709e778d9e69f41f80f1a8810e122b952c971e4
7
+ data.tar.gz: fc7dbd59a34a44f9f8f2aef3829fc352636c58a5fad3d8ddbf661368c912d8a2ba5d6f2080c66fc85c32f615916c3ad980f3e4e3a9b6e967bc8bfc1fc5712e2c
data/.gitignore ADDED
@@ -0,0 +1,18 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+ script/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in fluent-plugin-bigquery.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2012- TAGOMORI Satoshi
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,140 @@
1
+ # fluent-plugin-bigquery
2
+
3
+ Fluentd output plugin to load/insert data into Google BigQuery.
4
+
5
+ * insert data over streaming inserts
6
+ * for continuous real-time insertions, under many limitations
7
+ * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
8
+ * (NOT IMPLEMENTED) load data
9
+ * for data loading as batch jobs, for large amounts of data
10
+ * https://developers.google.com/bigquery/loading-data-into-bigquery
11
+
12
+ The current version of this plugin supports the Google API with Service Account Authentication only; OAuth is not supported.
13
+
14
+ ## Configuration
15
+
16
+ ### Streaming inserts
17
+
18
+ For service account authentication, generate a service account private key file and note the service account email, then upload the private key file to your server.
19
+
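+ Under the hood, the plugin authenticates roughly as below (a minimal sketch mirroring the `client` method in `lib/fluent/plugin/out_bigquery.rb`; the key path and service account email are placeholders):
+
+ ```ruby
+ require 'google/api_client'
+
+ # load the PKCS12 private key and build a JWT asserter for the BigQuery scope
+ key = Google::APIClient::PKCS12.load_key(
+   '/home/username/.keys/00000000000000000000000000000000-privatekey.p12',
+   'notasecret'
+ )
+ asserter = Google::APIClient::JWTAsserter.new(
+   'xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com',
+   'https://www.googleapis.com/auth/bigquery',
+   key
+ )
+
+ client = Google::APIClient.new(:application_name => 'Fluentd BigQuery plugin')
+ client.authorization = asserter.authorize
+ ```
+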
20
+ Configure the insert specification with the target table schema and your credentials. This is the minimum configuration:
21
+
22
+ ```apache
23
+ <match dummy>
24
+ type bigquery
25
+
26
+ method insert # default
27
+
28
+ email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
29
+ private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
30
+ # private_key_passphrase notasecret # default
31
+
32
+ project yourproject_id
33
+ dataset yourdataset_id
34
+ table tablename
35
+
36
+ time_format %s
37
+ time_field time
38
+
39
+ field_integer time,status,bytes
40
+ field_string rhost,vhost,path,method,protocol,agent,referer
41
+ field_float requestime
42
+ field_boolean bot_access,loginsession
43
+ </match>
44
+ ```
45
+
46
+ For high-rate streaming inserts, you should also tune the flush interval and buffer chunk options:
47
+
48
+ ```apache
49
+ <match dummy>
50
+ type bigquery
51
+
52
+ method insert # default
53
+
54
+ flush_interval 1 # flush as frequently as possible
55
+
56
+ buffer_chunk_records_limit 300 # default rate limit for users is 100
57
+ buffer_queue_limit 10240 # 1MB * 10240 -> 10GB!
58
+
59
+ num_threads 16
60
+
61
+ email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
62
+ private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
63
+ # private_key_passphrase notasecret # default
64
+
65
+ project yourproject_id
66
+ dataset yourdataset_id
67
+ tables accesslog1,accesslog2,accesslog3
68
+
69
+ time_format %s
70
+ time_field time
71
+
72
+ field_integer time,status,bytes
73
+ field_string rhost,vhost,path,method,protocol,agent,referer
74
+ field_float requestime
75
+ field_boolean bot_access,loginsession
76
+ </match>
77
+ ```
78
+
79
+ Important options for high-rate events are:
80
+
81
+ * `tables`
82
+ * two or more tables can be specified, separated by ','
83
+ * `out_bigquery` distributes inserts over these tables (table sharding)
84
+ * all of them must have the same schema
85
+ * `buffer_chunk_records_limit`
86
+ * the streaming inserts API limits insertions to 100 records per second, per table
87
+ * the default average rate limit is 100 rows/second, and the spike rate limit is 1,000 rows/second
88
+ * `out_bigquery` flushes 100 buffered records per insert API call
89
+ * `buffer_queue_limit`
90
+ * BigQuery streaming inserts need very small buffer chunks
91
+ * for high-rate events, `buffer_queue_limit` should be configured with a big number
92
+ * with the default configuration, up to about 1GB of memory may be used while the network is down:
93
+ * `buffer_chunk_limit (default 1MB)` x `buffer_queue_limit (default 1024)` (see the sketch after this list)
94
+ * `num_threads`
95
+ * threads for insert API calls made in parallel
96
+ * specify this option for 100 or more records per second
97
+ * 10 or more threads seem good for inserts over the internet
98
+ * fewer threads may be enough for Google Compute Engine instances (with low latency to BigQuery)
99
+ * `flush_interval`
100
+ * `1` is the lowest possible value without patches, on Fluentd v0.10.41 or earlier
101
+ * see `patches` below
102
+
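+ The worst-case buffer footprint is simply the chunk size multiplied by the queue length. A minimal sketch of that arithmetic, using this plugin's defaults (`buffer_chunk_limit 1000000`, `buffer_queue_limit 1024`) and the tuned values from the example above:
+
+ ```ruby
+ # worst-case bytes buffered = buffer_chunk_limit * buffer_queue_limit
+ default_bytes = 1_000_000 * 1024    # => 1_024_000_000 (about 1GB)
+ tuned_bytes   = 1_000_000 * 10_240  # => 10_240_000_000 (about 10GB)
+
+ puts "default: %.1f GB" % (default_bytes / 1000.0**3)
+ puts "tuned:   %.1f GB" % (tuned_bytes / 1000.0**3)
+ ```
+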
103
+ ### patches
104
+
105
+ This plugin depends on `fluent-plugin-buffer-lightening`, which includes a monkey patch module for the BufferedOutput plugin to realize high-rate and low-latency flushing. With this patch, sub-second flushing is available.
106
+
107
+ To use this feature, execute fluentd with the `-r fluent/plugin/output_try_flush_interval_patch` option,
108
+ and configure `flush_interval` and `try_flush_interval` with floating point values.
109
+
110
+ ```apache
111
+ <match dummy>
112
+ type bigquery
113
+
114
+ method insert # default
115
+
116
+ flush_interval 0.2
117
+ try_flush_interval 0.05
118
+
119
+ buffer_chunk_records_limit 300 # default rate limit for users is 100
120
+ buffer_queue_limit 10240 # 1MB * 10240 -> 10GB!
121
+
122
+ num_threads 16
123
+
124
+ # credentials, project/dataset/table and schema specs.
125
+ </match>
126
+ ```
127
+
128
+ With this configuration, flushing happens within 0.25 seconds (`flush_interval` 0.2 + `try_flush_interval` 0.05) of record input in the worst case.
129
+
130
+ ## TODO
131
+
132
+ * support Load API
133
+ * with automatically configured flush/buffer options
134
+ * support RECORD field
135
+ * and support optional data fields
136
+ * support NULLABLE/REQUIRED/REPEATED field options
137
+ * OAuth installed application credentials support
138
+ * Google API discovery expiration
139
+ * Error classes
140
+ * check row size limits
data/Rakefile ADDED
@@ -0,0 +1,11 @@
1
+ #!/usr/bin/env rake
2
+ require "bundler/gem_tasks"
3
+
4
+ require 'rake/testtask'
5
+ Rake::TestTask.new(:test) do |test|
6
+ test.libs << 'lib' << 'test'
7
+ test.pattern = 'test/**/test_*.rb'
8
+ test.verbose = true
9
+ end
10
+
11
+ task :default => :test
data/fluent-plugin-bigquery.gemspec ADDED
@@ -0,0 +1,29 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'fluent/plugin/bigquery/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "fluent-plugin-bigquery"
8
+ spec.version = Fluent::BigQueryPlugin::VERSION
9
+ spec.authors = ["TAGOMORI Satoshi"]
10
+ spec.email = ["tagomoris@gmail.com"]
11
+ spec.description = %q{Fluentd plugin to store data on Google BigQuery, by load, or by stream inserts}
12
+ spec.summary = %q{Fluentd plugin to store data on Google BigQuery}
13
+ spec.homepage = "https://github.com/tagomoris/fluent-plugin-bigquery"
14
+ spec.license = "APLv2"
15
+
16
+ spec.files = `git ls-files`.split($/)
17
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_development_dependency "rake"
22
+ spec.add_runtime_dependency "google-api-client", "~> 0.6.4"
23
+ spec.add_runtime_dependency "fluentd"
24
+ spec.add_runtime_dependency "fluent-mixin-plaintextformatter", '>= 0.2.1'
25
+ spec.add_runtime_dependency "fluent-mixin-config-placeholders", ">= 0.2.0"
26
+ spec.add_runtime_dependency "fluent-plugin-buffer-lightening"
27
+
28
+ spec.add_development_dependency "fluent-plugin-dummydata-producer"
29
+ end
data/lib/fluent/plugin/bigquery/load_request_body_wrapper.rb ADDED
@@ -0,0 +1,173 @@
1
+ module Fluent
2
+ module BigQueryPlugin
3
+ class LoadRequestBodyWrapper
4
+ # body can be an instance of IO (#rewind, #read, #to_str)
5
+ # http://rubydoc.info/github/google/google-api-ruby-client/Google/APIClient/Request#body-instance_method
6
+
7
+ # http://rubydoc.info/github/google/google-api-ruby-client/Google/APIClient#execute-instance_method
8
+ # (Google::APIClient::Method) api_method: The method object or the RPC name of the method being executed.
9
+ # (Hash, Array) parameters: The parameters to send to the method.
10
+ # (String) body: The body of the request.
11
+ # (Hash, Array) headers: The HTTP headers for the request.
12
+ # (Hash) options: A set of options for the request, of which:
13
+ # (#generate_authenticated_request) :authorization (default: true)
14
+ # - The authorization mechanism for the response. Used only if :authenticated is true.
15
+ # (TrueClass, FalseClass) :authenticated (default: true)
16
+ # - true if the request must be signed or somehow authenticated, false otherwise.
17
+ # (TrueClass, FalseClass) :gzip (default: true) - true if gzip enabled, false otherwise.
18
+
19
+ # https://developers.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest
20
+
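+ # Intended usage (a sketch only, and an assumption: the load API is not implemented yet):
+ #   body = LoadRequestBodyWrapper.new(project, dataset, table, field_defs, chunk)
+ #   client.execute(:api_method => bq.jobs.insert,
+ #                  :parameters => {'projectId' => project},
+ #                  :body       => body,
+ #                  :headers    => {'Content-Type' => 'multipart/related; boundary=xxx'})
+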
21
+ JSON_PRETTY_DUMP = JSON::State.new(space: " ", indent:" ", object_nl:"\n", array_nl:"\n")
22
+
23
+ CONTENT_TYPE_FIRST = "Content-Type: application/json; charset=UTF-8\n\n"
24
+ CONTENT_TYPE_SECOND = "Content-Type: application/octet-stream\n\n"
25
+
26
+ MULTIPART_BOUNDARY = "--xxx\n"
27
+ MULTIPART_BOUNDARY_END = "--xxx--\n"
28
+
29
+ def initialize(project_id, dataset_id, table_id, field_defs, buffer)
30
+ @metadata = {
31
+ configuration: {
32
+ load: {
33
+ sourceFormat: "<required for JSON files>",
34
+ schema: {
35
+ fields: field_defs
36
+ },
37
+ destinationTable: {
38
+ projectId: project_id,
39
+ datasetId: dataset_id,
40
+ tableId: table_id
41
+ }
42
+ }
43
+ }
44
+ }
45
+
46
+ @non_buffer = MULTIPART_BOUNDARY + CONTENT_TYPE_FIRST + @metadata.to_json(JSON_PRETTY_DUMP) + "\n" +
47
+ MULTIPART_BOUNDARY + CONTENT_TYPE_SECOND
48
+ @non_buffer.force_encoding("ASCII-8BIT")
49
+ @non_buffer_bytesize = @non_buffer.bytesize
50
+
51
+ @buffer = buffer # read
52
+ @buffer_bytesize = @buffer.size # Fluentd Buffer Chunk #size -> bytesize
53
+
54
+ @footer = MULTIPART_BOUNDARY_END.force_encoding("ASCII-8BIT")
55
+
56
+ @contents_bytesize = @non_buffer_bytesize + @buffer_bytesize
57
+ @total_bytesize = @contents_bytesize + MULTIPART_BOUNDARY_END.bytesize
58
+
59
+ @whole_data = nil
60
+
61
+ @counter = 0
62
+ @eof = false
63
+ end
64
+
65
+ # sample_body = <<EOF
66
+ # --xxx
67
+ # Content-Type: application/json; charset=UTF-8
68
+ #
69
+ # {
70
+ # "configuration": {
71
+ # "load": {
72
+ # "sourceFormat": "<required for JSON files>",
73
+ # "schema": {
74
+ # "fields": [
75
+ # {"name":"f1", "type":"STRING"},
76
+ # {"name":"f2", "type":"INTEGER"}
77
+ # ]
78
+ # },
79
+ # "destinationTable": {
80
+ # "projectId": "projectId",
81
+ # "datasetId": "datasetId",
82
+ # "tableId": "tableId"
83
+ # }
84
+ # }
85
+ # }
86
+ # }
87
+ # --xxx
88
+ # Content-Type: application/octet-stream
89
+ #
90
+ # <your data>
91
+ # --xxx--
92
+ # EOF
93
+ def rewind
94
+ @counter = 0
95
+ @eof = false
96
+ end
97
+
98
+ def eof?
99
+ @eof
100
+ end
101
+
102
+ def to_str
103
+ rewind
104
+ self.read # all data
105
+ end
106
+
107
+ def read(length=nil, outbuf="")
108
+ raise ArgumentError, "negative read length" if length && length < 0
109
+ return (length.nil? || length == 0) ? "" : nil if @eof
110
+ return outbuf if length == 0
111
+
112
+ # read all data
113
+ if length.nil? || length >= @total_bytesize
114
+ @whole_data ||= @buffer.read.force_encoding("ASCII-8BIT")
115
+
116
+ if @counter.zero?
117
+ outbuf.replace(@non_buffer)
118
+ outbuf << @whole_data
119
+ outbuf << @footer
120
+ elsif @counter < @non_buffer_bytesize
121
+ outbuf.replace(@non_buffer[ @counter .. -1 ])
122
+ outbuf << @whole_data
123
+ outbuf << @footer
124
+ elsif @counter < @contents_bytesize
125
+ outbuf.replace(@whole_data[ (@counter - @non_buffer_bytesize) .. -1 ])
126
+ outbuf << @footer
127
+ else
128
+ outbuf.replace(@footer[ (@counter - @contents_bytesize) .. -1 ])
129
+ end
130
+ @counter = @total_bytesize
131
+ @eof = true
132
+ return outbuf
133
+ end
134
+
135
+ # At the Ruby script level (not a C extension), we cannot prevent callers from changing outbuf's length or re-assigning the object
136
+ outbuf.replace("")
137
+
138
+ # return first part (metadata)
139
+ if @counter < @non_buffer_bytesize
140
+ non_buffer_part = @non_buffer[@counter, length]
141
+ if non_buffer_part
142
+ outbuf << non_buffer_part
143
+ length -= non_buffer_part.bytesize
144
+ @counter += non_buffer_part.bytesize
145
+ end
146
+ end
147
+ return outbuf if length < 1
148
+
149
+ # return second part (buffer content)
150
+ if @counter < @contents_bytesize
151
+ @whole_data ||= @buffer.read.force_encoding("ASCII-8BIT")
152
+ buffer_part = @whole_data[@counter - @non_buffer_bytesize, length]
153
+ if buffer_part
154
+ outbuf << buffer_part
155
+ length -= buffer_part.bytesize
156
+ @counter += buffer_part.bytesize
157
+ end
158
+ end
159
+ return outbuf if length < 1
160
+
161
+ # return footer
162
+ footer_part = @footer[@counter - @contents_bytesize, length]
163
+ if footer_part
164
+ outbuf << footer_part
165
+ @counter += footer_part.bytesize
166
+ @eof = true if @counter >= @total_bytesize
167
+ end
168
+
169
+ outbuf
170
+ end
171
+ end
172
+ end
173
+ end
data/lib/fluent/plugin/bigquery/version.rb ADDED
@@ -0,0 +1,6 @@
1
+ module Fluent
2
+ module BigQueryPlugin
3
+ VERSION = "0.0.1"
4
+ end
5
+ end
6
+
data/lib/fluent/plugin/out_bigquery.rb ADDED
@@ -0,0 +1,296 @@
1
+ # -*- coding: utf-8 -*-
2
+
3
+ require 'fluent/plugin/bigquery/version'
4
+
5
+ require 'fluent/mixin/config_placeholders'
6
+ require 'fluent/mixin/plaintextformatter'
7
+
8
+ ## TODO: load implementation
9
+ # require 'fluent/plugin/bigquery/load_request_body_wrapper'
10
+
11
+ require 'fluent/plugin/output_try_flush_interval_patch'
12
+
13
+ module Fluent
14
+ ### TODO: error classes for each api error responses
15
+ # class BigQueryAPIError < StandardError
16
+ # end
17
+
18
+ class BigQueryOutput < BufferedOutput
19
+ Fluent::Plugin.register_output('bigquery', self)
20
+
21
+ # https://developers.google.com/bigquery/browser-tool-quickstart
22
+ # https://developers.google.com/bigquery/bigquery-api-quickstart
23
+
24
+ config_set_default :buffer_type, 'lightening'
25
+
26
+ config_set_default :flush_interval, 0.25
27
+ config_set_default :try_flush_interval, 0.05
28
+
29
+ config_set_default :buffer_chunk_records_limit, 100
30
+ config_set_default :buffer_chunk_limit, 1000000
31
+ config_set_default :buffer_queue_limit, 1024
32
+
33
+ ### for loads
34
+ ### TODO: different default values for buffering between 'load' and insert
35
+ # config_set_default :flush_interval, 1800 # 30min => 48 imports/day
36
+ # config_set_default :buffer_chunk_limit, 1000**4 # 1.0*10^12 < 1TB (1024^4)
37
+
38
+ ### OAuth credential
39
+ # config_param :client_id, :string
40
+ # config_param :client_secret, :string
41
+
42
+ ### Service Account credential
43
+ config_param :email, :string
44
+ config_param :private_key_path, :string
45
+ config_param :private_key_passphrase, :string, :default => 'notasecret'
46
+
47
+ # see as simple reference
48
+ # https://github.com/abronte/BigQuery/blob/master/lib/bigquery.rb
49
+ config_param :project, :string
50
+
51
+ # dataset_name
52
+ # The name can be up to 1,024 characters long, and consist of A-Z, a-z, 0-9, and the underscore,
53
+ # but it cannot start with a number or underscore, or have spaces.
54
+ config_param :dataset, :string
55
+
56
+ # table_id
57
+ # In Table ID, enter a name for your new table. Naming rules are the same as for your dataset.
58
+ config_param :table, :string, :default => nil
59
+ config_param :tables, :string, :default => nil
60
+
61
+ config_param :field_string, :string, :default => nil
62
+ config_param :field_integer, :string, :default => nil
63
+ config_param :field_float, :string, :default => nil
64
+ config_param :field_boolean, :string, :default => nil
65
+ ### TODO: record field stream inserts don't work well?
66
+ ### At table creation, table type json + field type record -> field type validation fails
67
+ ### At streaming inserts, schema cannot be specified
68
+ # config_param :field_record, :string, :default => nil
69
+ # config_param :optional_data_field, :string, :default => nil
70
+
71
+ config_param :time_format, :string, :default => nil
72
+ config_param :localtime, :bool, :default => nil
73
+ config_param :utc, :bool, :default => nil
74
+ config_param :time_field, :string, :default => nil
75
+
76
+ config_param :method, :string, :default => 'insert' # or 'load' # TODO: not implemented now
77
+
78
+ config_param :load_size_limit, :integer, :default => 1000**4 # < 1TB (1024^4) # TODO: not implemented now
79
+ ### method: 'load'
80
+ # https://developers.google.com/bigquery/loading-data-into-bigquery
81
+ # Maximum File Sizes:
82
+ # File Type Compressed Uncompressed
83
+ # CSV 1 GB With new-lines in strings: 4 GB
84
+ # Without new-lines in strings: 1 TB
85
+ # JSON 1 GB 1 TB
86
+
87
+ config_param :row_size_limit, :integer, :default => 100*1000 # < 100KB # configurable in google ?
88
+ # config_param :insert_size_limit, :integer, :default => 1000**2 # < 1MB
89
+ # config_param :rows_per_second_limit, :integer, :default => 1000 # spike limit
90
+ ### method: 'insert' (streaming data inserts support)
91
+ # https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
92
+ # Maximum row size: 100 KB
93
+ # Maximum data size of all rows, per insert: 1 MB
94
+ # Maximum rows per second: 100 rows per second, per table, with allowed and occasional bursts of up to 1,000 rows per second.
95
+ # If you exceed 100 rows per second for an extended period of time, throttling might occur.
96
+ ### Toooooooooooooo short/small per inserts and row!
97
+
98
+ ### Table types
99
+ # https://developers.google.com/bigquery/docs/tables
100
+ #
101
+ # type - The following data types are supported; see Data Formats for details on each data type:
102
+ # STRING
103
+ # INTEGER
104
+ # FLOAT
105
+ # BOOLEAN
106
+ # RECORD A JSON object, used when importing nested records. This type is only available when using JSON source files.
107
+ #
108
+ # mode - Whether a field can be null. The following values are supported:
109
+ # NULLABLE - The cell can be null.
110
+ # REQUIRED - The cell cannot be null.
111
+ # REPEATED - Zero or more repeated simple or nested subfields. This mode is only supported when using JSON source files.
112
+
113
+ def initialize
114
+ super
115
+ require 'google/api_client'
116
+ require 'google/api_client/client_secrets'
117
+ require 'google/api_client/auth/installed_app'
118
+ end
119
+
120
+ def configure(conf)
121
+ super
122
+
123
+ if (!@table && !@tables) || (@table && @tables)
124
+ raise Fluent::ConfigError, "'table' or 'tables' must be specified, but not both"
125
+ end
126
+
127
+ @tablelist = @tables ? @tables.split(',') : [@table]
128
+
129
+ @fields = {}
130
+ if @field_string
131
+ @field_string.split(',').each do |fieldname|
132
+ @fields[fieldname] = :string
133
+ end
134
+ end
135
+ if @field_integer
136
+ @field_integer.split(',').each do |fieldname|
137
+ @fields[fieldname] = :integer
138
+ end
139
+ end
140
+ if @field_float
141
+ @field_float.split(',').each do |fieldname|
142
+ @fields[fieldname] = :float
143
+ end
144
+ end
145
+ if @field_boolean
146
+ @field_boolean.split(',').each do |fieldname|
147
+ @fields[fieldname] = :boolean
148
+ end
149
+ end
150
+
151
+ if @localtime.nil?
152
+ if @utc
153
+ @localtime = false
154
+ end
155
+ end
156
+ @timef = TimeFormatter.new(@time_format, @localtime)
157
+ end
158
+
159
+ def start
160
+ super
161
+
162
+ @bq = client.discovered_api("bigquery", "v2") # TODO: refresh with specified expiration
163
+ @cached_client = nil
164
+ @cached_client_expiration = nil
165
+
166
+ @tables_queue = @tablelist.dup.shuffle
167
+ @tables_mutex = Mutex.new
168
+ end
169
+
170
+ def shutdown
171
+ super
172
+ # nothing to do
173
+ end
174
+
175
+ def client
176
+ return @cached_client if @cached_client && @cached_client_expiration > Time.now
177
+
178
+ client = Google::APIClient.new(
179
+ :application_name => 'Fluentd BigQuery plugin',
180
+ :application_version => Fluent::BigQueryPlugin::VERSION
181
+ )
182
+
183
+ key = Google::APIClient::PKCS12.load_key( @private_key_path, @private_key_passphrase )
184
+ asserter = Google::APIClient::JWTAsserter.new(
185
+ @email,
186
+ "https://www.googleapis.com/auth/bigquery",
187
+ key
188
+ )
189
+ # refresh_auth
190
+ client.authorization = asserter.authorize
191
+ @cached_client_expiration = Time.now + 1800
192
+ @cached_client = client
193
+ end
194
+
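+ # rows: an Array of {"json" => record_hash} objects, as produced by #format_stream
+ # (the request body format expected by the tabledata.insertAll API)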
195
+ def insert(table_id, rows)
196
+ res = client().execute(
197
+ :api_method => @bq.tabledata.insert_all,
198
+ :parameters => {
199
+ 'projectId' => @project,
200
+ 'datasetId' => @dataset,
201
+ 'tableId' => table_id,
202
+ },
203
+ :body_object => {
204
+ "rows" => rows
205
+ }
206
+ )
207
+ if res.status != 200
208
+ # api_error? -> client cache clear
209
+ @cached_client = nil
210
+
211
+ message = res.body
212
+ if res.body =~ /^\{/
213
+ begin
214
+ res_obj = JSON.parse(res.body)
215
+ message = res_obj['error']['message'] || res.body
216
+ rescue => e
217
+ $log.warn "Parse error: google api error response body", :body => res.body
218
+ end
219
+ end
220
+ $log.error "tabledata.insertAll API", :project_id => @project_id, :dataset => @dataset_id, :table => table_id, :code => res.status, :message => message
221
+ raise "failed to insert into bigquery" # TODO: error class
222
+ end
223
+ end
224
+
225
+ def load
226
+ # https://developers.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest
227
+ raise NotImplementedError # TODO
228
+ end
229
+
230
+ def format_record(record)
231
+ out = {}
232
+ @fields.each do |key, type|
233
+ value = record[key]
234
+ next if value.nil? # field does not exist, or has a null value
235
+ out[key] = case type
236
+ when :string then record[key].to_s
237
+ when :integer then record[key].to_i
238
+ when :float then record[key].to_f
239
+ when :boolean then !!record[key]
240
+ # when :record
241
+ else
242
+ raise "BUG: unknown field type #{type}"
243
+ end
244
+ end
245
+ out
246
+ end
247
+
248
+ def format_stream(tag, es)
249
+ super
250
+ buf = ''
251
+ es.each do |time, record|
252
+ row = if @time_field
253
+ format_record(record.merge({@time_field => @timef.format(time)}))
254
+ else
255
+ format_record(record)
256
+ end
257
+ buf << {"json" => row}.to_msgpack unless row.empty?
258
+ end
259
+ buf
260
+ end
261
+
262
+ def write(chunk)
263
+ rows = []
264
+ chunk.msgpack_each do |row_object|
265
+ # TODO: row size limit
266
+ rows << row_object
267
+ end
268
+
269
+ # TODO: method
270
+
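+ # pick the next target table in round-robin order (table sharding across `tables`)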
271
+ insert_table = @tables_mutex.synchronize do
272
+ t = @tables_queue.shift
273
+ @tables_queue.push t
274
+ t
275
+ end
276
+ insert(insert_table, rows)
277
+ end
278
+
279
+ # def client_oauth # not implemented
280
+ # raise NotImplementedError, "OAuth needs browser authentication..."
281
+ #
282
+ # client = Google::APIClient.new(
283
+ # :application_name => 'Example Ruby application',
284
+ # :application_version => '1.0.0'
285
+ # )
286
+ # bigquery = client.discovered_api('bigquery', 'v2')
287
+ # flow = Google::APIClient::InstalledAppFlow.new(
288
+ # :client_id => @client_id
289
+ # :client_secret => @client_secret
290
+ # :scope => ['https://www.googleapis.com/auth/bigquery']
291
+ # )
292
+ # client.authorization = flow.authorize # browser authentication !
293
+ # client
294
+ # end
295
+ end
296
+ end
data/test/helper.rb ADDED
@@ -0,0 +1,33 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ begin
4
+ Bundler.setup(:default, :development)
5
+ rescue Bundler::BundlerError => e
6
+ $stderr.puts e.message
7
+ $stderr.puts "Run `bundle install` to install missing gems"
8
+ exit e.status_code
9
+ end
10
+ require 'test/unit'
11
+
12
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
13
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
14
+ require 'fluent/test'
15
+ unless ENV.has_key?('VERBOSE')
16
+ nulllogger = Object.new
17
+ nulllogger.instance_eval {|obj|
18
+ def method_missing(method, *args)
19
+ # pass
20
+ end
21
+ }
22
+ $log = nulllogger
23
+ end
24
+
25
+ require 'fluent/buffer'
26
+ require 'fluent/plugin/buf_memory'
27
+ require 'fluent/plugin/buf_file'
28
+
29
+ require 'fluent/plugin/out_bigquery'
30
+ require 'fluent/plugin/bigquery/load_request_body_wrapper'
31
+
32
+ class Test::Unit::TestCase
33
+ end
data/test/test_load_request_body_wrapper.rb ADDED
@@ -0,0 +1,190 @@
1
+ # -*- coding: utf-8 -*-
2
+ require 'helper'
3
+ require 'json'
4
+ require 'tempfile'
5
+
6
+ class LoadRequestBodyWrapperTest < Test::Unit::TestCase
7
+ def content_alphabet(repeat)
8
+ (0...repeat).map{|i| "#{i}0123456789\n" }.join
9
+ end
10
+
11
+ def content_kana(repeat)
12
+ (0...repeat).map{|i| "#{i}あいうえおかきくけこ\n" }.join
13
+ end
14
+
15
+ def mem_chunk(repeat=10, kana=false)
16
+ content = kana ? content_kana(repeat) : content_alphabet(repeat)
17
+ Fluent::MemoryBufferChunk.new('bc_mem', content)
18
+ end
19
+
20
+ def file_chunk(repeat=10, kana=false)
21
+ content = kana ? content_kana(repeat) : content_alphabet(repeat)
22
+ tmpfile = Tempfile.new('fluent_bigquery_plugin_test')
23
+ buf = Fluent::FileBufferChunk.new('bc_mem', tmpfile.path, tmpfile.object_id)
24
+ buf << content
25
+ buf
26
+ end
27
+
28
+ def field_defs
29
+ [{"name" => "field1", "type" => "STRING"}, {"name" => "field2", "type" => "INTEGER"}]
30
+ end
31
+
32
+ def check_meta(blank, first, last)
33
+ assert_equal "", blank
34
+
35
+ header1, body1 = first.split("\n\n")
36
+ assert_equal "Content-Type: application/json; charset=UTF-8", header1
37
+ metadata = JSON.parse(body1)
38
+ assert_equal "<required for JSON files>", metadata["configuration"]["load"]["sourceFormat"]
39
+ assert_equal "field1", metadata["configuration"]["load"]["schema"]["fields"][0]["name"]
40
+ assert_equal "STRING", metadata["configuration"]["load"]["schema"]["fields"][0]["type"]
41
+ assert_equal "field2", metadata["configuration"]["load"]["schema"]["fields"][1]["name"]
42
+ assert_equal "INTEGER", metadata["configuration"]["load"]["schema"]["fields"][1]["type"]
43
+ assert_equal "pname1", metadata["configuration"]["load"]["destinationTable"]["projectId"]
44
+ assert_equal "dname1", metadata["configuration"]["load"]["destinationTable"]["datasetId"]
45
+ assert_equal "tname1", metadata["configuration"]["load"]["destinationTable"]["tableId"]
46
+
47
+ assert_equal "--\n", last
48
+ end
49
+
50
+ def check_ascii(data)
51
+ blank, first, second, last = data.split(/--xxx\n?/)
52
+
53
+ check_meta(blank, first, last)
54
+
55
+ header2, body2 = second.split("\n\n")
56
+ assert_equal "Content-Type: application/octet-stream", header2
57
+ i = 0
58
+ body2.each_line do |line|
59
+ assert_equal "#{i}0123456789\n", line
60
+ i += 1
61
+ end
62
+ end
63
+
64
+ def check_kana(data)
65
+ blank, first, second, last = data.split(/--xxx\n?/)
66
+
67
+ check_meta(blank, first, last)
68
+
69
+ header2, body2 = second.split("\n\n")
70
+ assert_equal "Content-Type: application/octet-stream", header2
71
+ i = 0
72
+ body2.each_line do |line|
73
+ assert_equal "#{i}あいうえおかきくけこ\n", line
74
+ i += 1
75
+ end
76
+ end
77
+
78
+ def setup
79
+ @klass = Fluent::BigQueryPlugin::LoadRequestBodyWrapper
80
+ self
81
+ end
82
+
83
+ def test_memory_buf
84
+ d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(10))
85
+ data1 = d1.read.force_encoding("UTF-8")
86
+ check_ascii(data1)
87
+
88
+ d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(10))
89
+ data2 = ""
90
+ while !d2.eof? do
91
+ buf = " "
92
+ objid = buf.object_id
93
+ data2 << d2.read(20, buf)
94
+ assert_equal objid, buf.object_id
95
+ end
96
+ data2.force_encoding("UTF-8")
97
+
98
+ assert_equal data1.size, data2.size
99
+ end
100
+
101
+ def test_memory_buf2
102
+ d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000))
103
+ data1 = d1.read.force_encoding("UTF-8")
104
+ check_ascii(data1)
105
+
106
+ d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000))
107
+ data2 = ""
108
+ while !d2.eof? do
109
+ buf = " "
110
+ objid = buf.object_id
111
+ data2 << d2.read(2048, buf)
112
+ assert_equal objid, buf.object_id
113
+ end
114
+ data2.force_encoding("UTF-8")
115
+
116
+ assert_equal data1.size, data2.size
117
+ end
118
+
119
+ def test_memory_buf3 # kana
120
+ d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000, true))
121
+ data1 = d1.read.force_encoding("UTF-8")
122
+ check_kana(data1)
123
+
124
+ d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000, true))
125
+ data2 = ""
126
+ while !d2.eof? do
127
+ buf = " "
128
+ objid = buf.object_id
129
+ data2 << d2.read(2048, buf)
130
+ assert_equal objid, buf.object_id
131
+ end
132
+ data2.force_encoding("UTF-8")
133
+
134
+ assert_equal data1.size, data2.size
135
+ end
136
+
137
+ def test_file_buf
138
+ d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(10))
139
+ data1 = d1.read.force_encoding("UTF-8")
140
+ check_ascii(data1)
141
+
142
+ d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(10))
143
+ data2 = ""
144
+ while !d2.eof? do
145
+ buf = " "
146
+ objid = buf.object_id
147
+ data2 << d2.read(20, buf)
148
+ assert_equal objid, buf.object_id
149
+ end
150
+ data2.force_encoding("UTF-8")
151
+
152
+ assert_equal data1.size, data2.size
153
+ end
154
+
155
+ def test_file_buf2
156
+ d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000))
157
+ data1 = d1.read.force_encoding("UTF-8")
158
+ check_ascii(data1)
159
+
160
+ d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000))
161
+ data2 = ""
162
+ while !d2.eof? do
163
+ buf = " "
164
+ objid = buf.object_id
165
+ data2 << d2.read(20480, buf)
166
+ assert_equal objid, buf.object_id
167
+ end
168
+ data2.force_encoding("UTF-8")
169
+
170
+ assert_equal data1.size, data2.size
171
+ end
172
+
173
+ def test_file_buf3 # kana
174
+ d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000, true))
175
+ data1 = d1.read.force_encoding("UTF-8")
176
+ check_kana(data1)
177
+
178
+ d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000, true))
179
+ data2 = ""
180
+ while !d2.eof? do
181
+ buf = " "
182
+ objid = buf.object_id
183
+ data2 << d2.read(20480, buf)
184
+ assert_equal objid, buf.object_id
185
+ end
186
+ data2.force_encoding("UTF-8")
187
+
188
+ assert_equal data1.size, data2.size
189
+ end
190
+ end
metadata ADDED
@@ -0,0 +1,157 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fluent-plugin-bigquery
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - TAGOMORI Satoshi
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2013-12-23 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rake
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - '>='
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - '>='
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: google-api-client
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ~>
32
+ - !ruby/object:Gem::Version
33
+ version: 0.6.4
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ~>
39
+ - !ruby/object:Gem::Version
40
+ version: 0.6.4
41
+ - !ruby/object:Gem::Dependency
42
+ name: fluentd
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: fluent-mixin-plaintextformatter
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - '>='
60
+ - !ruby/object:Gem::Version
61
+ version: 0.2.1
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - '>='
67
+ - !ruby/object:Gem::Version
68
+ version: 0.2.1
69
+ - !ruby/object:Gem::Dependency
70
+ name: fluent-mixin-config-placeholders
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - '>='
74
+ - !ruby/object:Gem::Version
75
+ version: 0.2.0
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - '>='
81
+ - !ruby/object:Gem::Version
82
+ version: 0.2.0
83
+ - !ruby/object:Gem::Dependency
84
+ name: fluent-plugin-buffer-lightening
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - '>='
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :runtime
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - '>='
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: fluent-plugin-dummydata-producer
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - '>='
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - '>='
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ description: Fluentd plugin to store data on Google BigQuery, by load, or by stream
112
+ inserts
113
+ email:
114
+ - tagomoris@gmail.com
115
+ executables: []
116
+ extensions: []
117
+ extra_rdoc_files: []
118
+ files:
119
+ - .gitignore
120
+ - Gemfile
121
+ - LICENSE.txt
122
+ - README.md
123
+ - Rakefile
124
+ - fluent-plugin-bigquery.gemspec
125
+ - lib/fluent/plugin/bigquery/load_request_body_wrapper.rb
126
+ - lib/fluent/plugin/bigquery/version.rb
127
+ - lib/fluent/plugin/out_bigquery.rb
128
+ - test/helper.rb
129
+ - test/test_load_request_body_wrapper.rb
130
+ homepage: https://github.com/tagomoris/fluent-plugin-bigquery
131
+ licenses:
132
+ - APLv2
133
+ metadata: {}
134
+ post_install_message:
135
+ rdoc_options: []
136
+ require_paths:
137
+ - lib
138
+ required_ruby_version: !ruby/object:Gem::Requirement
139
+ requirements:
140
+ - - '>='
141
+ - !ruby/object:Gem::Version
142
+ version: '0'
143
+ required_rubygems_version: !ruby/object:Gem::Requirement
144
+ requirements:
145
+ - - '>='
146
+ - !ruby/object:Gem::Version
147
+ version: '0'
148
+ requirements: []
149
+ rubyforge_project:
150
+ rubygems_version: 2.0.3
151
+ signing_key:
152
+ specification_version: 4
153
+ summary: Fluentd plugin to store data on Google BigQuery
154
+ test_files:
155
+ - test/helper.rb
156
+ - test/test_load_request_body_wrapper.rb
157
+ has_rdoc: