fluent-plugin-bigquery 0.2.15 → 0.2.16
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +69 -18
- data/fluent-plugin-bigquery.gemspec +3 -2
- data/lib/fluent/plugin/bigquery/version.rb +1 -1
- data/lib/fluent/plugin/out_bigquery.rb +148 -57
- data/test/helper.rb +0 -1
- data/test/plugin/test_out_bigquery.rb +109 -0
- metadata +19 -8
- data/lib/fluent/plugin/bigquery/load_request_body_wrapper.rb +0 -173
- data/test/test_load_request_body_wrapper.rb +0 -190
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6283655314f920c8d3f1bab8f387d96c6fe79da0
+  data.tar.gz: f1016e03203cf12c4c26ad62f1c3a05926423fa7
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 15e484f5df810cd5736711bd70df5a9e34950e10c77118a3b6097fba6f9c1efd9641ac515df547a15b2f9dac653deb3d5b2fa665541a47bf43dba750754d584e
+  data.tar.gz: 1bbcea1f4ec490c69028eca66032b6dca734231fedf431951618b1d5ad08a354395f357e31a028f3e60531ec15e831bbbb0d4f9a70bd48082a57562885564023
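A gem's checksums.yaml records digests of the archive's internal members (metadata.gz and data.tar.gz). A minimal verification sketch in Ruby, assuming the downloaded `.gem` (a plain tar archive) has already been unpacked into the current directory; the filename is illustrative:

```ruby
require 'digest'

# Compare the SHA1 published above for data.tar.gz with a locally computed digest.
# Assumes: `tar xf fluent-plugin-bigquery-0.2.16.gem` was run here, producing
# metadata.gz, data.tar.gz and checksums.yaml.gz.
expected = "f1016e03203cf12c4c26ad62f1c3a05926423fa7"
actual   = Digest::SHA1.file("data.tar.gz").hexdigest
puts(actual == expected ? "data.tar.gz checksum OK" : "checksum MISMATCH")
```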
data/README.md
CHANGED
@@ -5,7 +5,7 @@
 * insert data over streaming inserts
   * for continuous real-time insertions
   * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
-*
+* load data
   * for data loading as batch jobs, for big amount of data
   * https://developers.google.com/bigquery/loading-data-into-bigquery
 
@@ -20,7 +20,7 @@ Configure insert specifications with target table schema, with your credentials.
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   method insert # default
 
@@ -47,7 +47,7 @@ For high rate inserts over streaming inserts, you should specify flush intervals
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   method insert # default
 
@@ -106,6 +106,37 @@ Important options for high rate events are:
 See [Quota policy](https://cloud.google.com/bigquery/streaming-data-into-bigquery#quota)
 section in the Google BigQuery document.
 
+### Load
+```apache
+<match bigquery>
+  @type bigquery
+
+  method load
+  buffer_type file
+  buffer_path bigquery.*.buffer
+  flush_interval 1800
+  flush_at_shutdown true
+  try_flush_interval 1
+  utc
+
+  auth_method json_key
+  json_key json_key_path.json
+
+  time_format %s
+  time_field time
+
+  project yourproject_id
+  dataset yourdataset_id
+  auto_create_table true
+  table yourtable%{time_slice}
+  schema_path bq_schema.json
+</match>
+```
+
+I recommend using a file buffer and a long flush interval.
+
+__CAUTION: the `flush_interval` default is still `0.25` even when `method` is `load` in the current version.__
+
 ### Authentication
 
 There are two methods supported to fetch access token for the service account.
@@ -127,7 +158,7 @@ download its JSON key and deploy the key with fluentd.
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   auth_method json_key
   json_key /home/username/.keys/00000000000000000000000000000000-jsonkey.json
@@ -144,7 +175,7 @@ You need to only include `private_key` and `client_email` key from JSON key file
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   auth_method json_key
   json_key {"private_key": "-----BEGIN PRIVATE KEY-----\n...", "client_email": "xxx@developer.gserviceaccount.com"}
@@ -165,7 +196,7 @@ Compute Engine instance, then you can configure fluentd like this.
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   auth_method compute_engine
 
@@ -198,6 +229,7 @@ In this authentication method, the credentials returned are determined by the en
 
 ### Table id formatting
 
+#### strftime formatting
 `table` and `tables` options accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
 format to construct table ids.
 Table ids are formatted at runtime
@@ -208,7 +240,7 @@ data is inserted into tables `accesslog_2014_08`, `accesslog_2014_09` and so on.
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   ...
 
@@ -220,8 +252,11 @@ data is inserted into tables `accesslog_2014_08`, `accesslog_2014_09` and so on.
 </match>
 ```
 
+#### record attribute formatting
 The format can be suffixed with attribute name.
 
+__NOTE: This feature is available only if `method` is `insert`, because it has a performance impact. Use `%{time_slice}` instead where possible.__
+
 ```apache
 <match dummy>
   ...
@@ -233,23 +268,39 @@ The format can be suffixed with attribute name.
 If attribute name is given, the time to be used for formatting is value of each row.
 The value for the time should be a UNIX time.
 
+#### time_slice_key formatting
 Or, the options can use `%{time_slice}` placeholder.
 `%{time_slice}` is replaced by formatted time slice key at runtime.
 
 ```apache
 <match dummy>
-  type bigquery
-
+  @type bigquery
+
   ...
-
-  project yourproject_id
-  dataset yourdataset_id
   table accesslog%{time_slice}
-
   ...
 </match>
 ```
 
+#### record attribute value formatting
+Alternatively, the `${attr_name}` placeholder lets you use the value of an attribute as part of the table id.
+`${attr_name}` is replaced by the string value of the attribute named `attr_name`.
+
+__NOTE: This feature is available only if `method` is `insert`.__
+
+```apache
+<match dummy>
+  ...
+  table accesslog_%Y_%m_${subdomain}
+  ...
+</match>
+```
+
+For example, if the value of the `subdomain` attribute is `"bq.fluent"`, the table id will be something like "accesslog_2016_03_bqfluent".
+
+- any type of attribute is allowed, because the stringified value is used as the replacement.
+- acceptable characters are alphabets, digits and `_`; all other characters are removed.
+
 ### Dynamic table creating
 
 When `auto_create_table` is set to `true`, try to create the table using BigQuery API when insertion failed with code=404 "Not Found: Table ...".
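A short Ruby sketch of how these placeholders combine at runtime may help; it mirrors the substitution added to `generate_table_id` further down in this diff. The helper name, the record hash and its values are illustrative, not part of the plugin's API:

```ruby
require 'time'

# Illustrative only: ${attr} is replaced by the stringified attribute value with
# every non-word character stripped (as in the gsub added to generate_table_id),
# then the remaining strftime escapes are expanded from the event time.
def expand_table_id(format, time, record)
  expanded = format.gsub(/\$\{\s*(\w+)\s*\}/) { record[$1].to_s.gsub(/[^\w]/, '') }
  time.strftime(expanded)
end

expand_table_id('accesslog_%Y_%m_${subdomain}', Time.utc(2016, 3, 16), 'subdomain' => 'bq.fluent')
# => "accesslog_2016_03_bqfluent"
```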
@@ -259,7 +310,7 @@ NOTE: `auto_create_table` option cannot be used with `fetch_schema`. You should
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   ...
 
@@ -283,7 +334,7 @@ you can also specify nested fields by prefixing their belonging record fields.
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   ...
 
@@ -322,7 +373,7 @@ The second method is to specify a path to a BigQuery schema file instead of list
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   ...
 
@@ -339,7 +390,7 @@ The third method is to set `fetch_schema` to `true` to enable fetch a schema usi
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   ...
 
@@ -363,7 +414,7 @@ You can set `insert_id_field` option to specify the field to use as `insertId` p
 
 ```apache
 <match dummy>
-  type bigquery
+  @type bigquery
 
   ...
 
data/fluent-plugin-bigquery.gemspec
CHANGED
@@ -11,7 +11,7 @@ Gem::Specification.new do |spec|
   spec.description = %q{Fluentd plugin to store data on Google BigQuery, by load, or by stream inserts}
   spec.summary = %q{Fluentd plugin to store data on Google BigQuery}
   spec.homepage = "https://github.com/kaizenplatform/fluent-plugin-bigquery"
-  spec.license = "
+  spec.license = "Apache-2.0"
 
   spec.files = `git ls-files`.split($/)
   spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
@@ -23,9 +23,10 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency "test-unit", "~> 3.0.2"
   spec.add_development_dependency "test-unit-rr", "~> 1.0.3"
 
-  spec.add_runtime_dependency "google-api-client", "~> 0.9.
+  spec.add_runtime_dependency "google-api-client", "~> 0.9.3"
   spec.add_runtime_dependency "googleauth", ">= 0.5.0"
   spec.add_runtime_dependency "multi_json"
+  spec.add_runtime_dependency "activesupport", ">= 3.2"
   spec.add_runtime_dependency "fluentd"
   spec.add_runtime_dependency "fluent-mixin-plaintextformatter", '>= 0.2.1'
   spec.add_runtime_dependency "fluent-mixin-config-placeholders", ">= 0.3.0"
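The new `activesupport` runtime dependency appears to back the `Hash#deep_symbolize_keys` call used in the rewritten write path further down in this diff. A minimal sketch of what that core extension does (the sample hash is illustrative):

```ruby
require 'active_support/core_ext/hash'

# deep_symbolize_keys converts string keys to symbols at every nesting level,
# which is how buffered msgpack rows are normalized before grouping/inserting.
row = { "json" => { "uuid" => "9ABFF756", "nested" => { "a" => 1 } } }
row.deep_symbolize_keys
# => {:json=>{:uuid=>"9ABFF756", :nested=>{:a=>1}}}
```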
data/lib/fluent/plugin/out_bigquery.rb
CHANGED
@@ -92,7 +92,7 @@ module Fluent
 
     config_param :insert_id_field, :string, default: nil
 
-    config_param :method, :string, default: 'insert' # or 'load'
+    config_param :method, :string, default: 'insert' # or 'load'
 
     config_param :load_size_limit, :integer, default: 1000**4 # < 1TB (1024^4) # TODO: not implemented now
     ### method: 'load'
@@ -150,6 +150,14 @@ module Fluent
     def configure(conf)
       super
 
+      if @method == "insert"
+        extend(InsertImplementation)
+      elsif @method == "load"
+        extend(LoadImplementation)
+      else
+        raise Fluend::ConfigError "'method' must be 'insert' or 'load'"
+      end
+
       case @auth_method
       when 'private_key'
         unless @email && @private_key_path
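The new configure logic selects the write path by extending the output instance with one of two modules at configuration time. A standalone sketch of that dispatch pattern; the class and module names here are illustrative, not part of the plugin:

```ruby
# Minimal sketch of configure-time dispatch via Object#extend: the module chosen
# at configure time supplies the per-method behaviour for this instance only.
module InsertPath
  def write_path; "streaming insert"; end
end

module LoadPath
  def write_path; "load job"; end
end

class FakeOutput
  def configure(method)
    case method
    when "insert" then extend(InsertPath)
    when "load"   then extend(LoadPath)
    else raise ArgumentError, "'method' must be 'insert' or 'load'"
    end
  end
end

out = FakeOutput.new
out.configure("load")
puts out.write_path  # => "load job"
```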
@@ -286,6 +294,12 @@ module Fluent
               else
                 current_time
               end
+      if row && format =~ /\$\{/
+        json = row[:json]
+        format.gsub!(/\$\{\s*(\w+)\s*\}/) do |m|
+          row[:json][$1.to_sym].to_s.gsub(/[^\w]/, '')
+        end
+      end
       table_id = time.strftime(format)
 
       if chunk
@@ -321,29 +335,6 @@ module Fluent
       raise "failed to create table in bigquery" # TODO: error class
     end
 
-    def insert(table_id, rows)
-      client.insert_all_table_data(@project, @dataset, table_id, {
-        rows: rows
-      }, {})
-    rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
-      # api_error? -> client cache clear
-      @cached_client = nil
-
-      message = e.message
-      if @auto_create_table && e.status_code == 404 && /Not Found: Table/i =~ message.to_s
-        # Table Not Found: Auto Create Table
-        create_table(table_id)
-        raise "table created. send rows next time."
-      end
-      log.error "tabledata.insertAll API", project_id: @project, dataset: @dataset, table: table_id, code: e.status_code, message: message
-      raise "failed to insert into bigquery" # TODO: error class
-    end
-
-    def load
-      # https://developers.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest
-      raise NotImplementedError # TODO
-    end
-
     def replace_record_key(record)
       new_record = {}
       record.each do |key, _|
@@ -366,44 +357,13 @@ module Fluent
       record
     end
 
-    def format(tag, time, record)
-      buf = ''
-
-      if @replace_record_key
-        record = replace_record_key(record)
-      end
-
-      if @convert_hash_to_json
-        record = convert_hash_to_json(record)
-      end
-
-      row = @fields.format(@add_time_field.call(record, time))
-      unless row.empty?
-        row = {"json" => row}
-        row['insert_id'] = @get_insert_id.call(record) if @get_insert_id
-        buf << row.to_msgpack
-      end
-      buf
-    end
-
     def write(chunk)
-
-      chunk.msgpack_each do |row_object|
-        # TODO: row size limit
-        rows << row_object.deep_symbolize_keys
-      end
-
-      # TODO: method
-
-      insert_table_format = @tables_mutex.synchronize do
+      table_id_format = @tables_mutex.synchronize do
         t = @tables_queue.shift
         @tables_queue.push t
         t
       end
-
-      rows.group_by {|row| generate_table_id(insert_table_format, Time.at(Fluent::Engine.now), row, chunk) }.each do |table_id, rows|
-        insert(table_id, rows)
-      end
+      _write(chunk, table_id_format)
     end
 
     def fetch_schema
@@ -422,6 +382,137 @@ module Fluent
       raise "failed to fetch schema from bigquery" # TODO: error class
     end
 
+    module InsertImplementation
+      def format(tag, time, record)
+        buf = ''
+
+        if @replace_record_key
+          record = replace_record_key(record)
+        end
+
+        if @convert_hash_to_json
+          record = convert_hash_to_json(record)
+        end
+
+        row = @fields.format(@add_time_field.call(record, time))
+        unless row.empty?
+          row = {"json" => row}
+          row['insert_id'] = @get_insert_id.call(record) if @get_insert_id
+          buf << row.to_msgpack
+        end
+        buf
+      end
+
+      def _write(chunk, table_format)
+        rows = []
+        chunk.msgpack_each do |row_object|
+          # TODO: row size limit
+          rows << row_object.deep_symbolize_keys
+        end
+
+        rows.group_by {|row| generate_table_id(table_format, Time.at(Fluent::Engine.now), row, chunk) }.each do |table_id, group|
+          insert(table_id, group)
+        end
+      end
+
+      def insert(table_id, rows)
+        client.insert_all_table_data(@project, @dataset, table_id, {
+          rows: rows
+        }, {})
+      rescue Google::Apis::ServerError, Google::Apis::ClientError, Google::Apis::AuthorizationError => e
+        # api_error? -> client cache clear
+        @cached_client = nil
+
+        message = e.message
+        if @auto_create_table && e.status_code == 404 && /Not Found: Table/i =~ message.to_s
+          # Table Not Found: Auto Create Table
+          create_table(table_id)
+          raise "table created. send rows next time."
+        end
+        log.error "tabledata.insertAll API", project_id: @project, dataset: @dataset, table: table_id, code: e.status_code, message: message
+        raise "failed to insert into bigquery" # TODO: error class
+      end
+    end
+
+    module LoadImplementation
+      def format(tag, time, record)
+        buf = ''
+
+        if @replace_record_key
+          record = replace_record_key(record)
+        end
+        row = @fields.format(@add_time_field.call(record, time))
+        unless row.empty?
+          buf << MultiJson.dump(row) + "\n"
+        end
+        buf
+      end
+
+      def _write(chunk, table_id_format)
+        table_id = generate_table_id(table_id_format, Time.at(Fluent::Engine.now), nil, chunk)
+        load(chunk, table_id)
+      end
+
+      def load(chunk, table_id)
+        res = nil
+        create_upload_source(chunk) do |upload_source|
+          res = client.insert_job(@project, {
+            configuration: {
+              load: {
+                destination_table: {
+                  project_id: @project,
+                  dataset_id: @dataset,
+                  table_id: table_id,
+                },
+                schema: {
+                  fields: @fields.to_a,
+                },
+                write_disposition: "WRITE_APPEND",
+                source_format: "NEWLINE_DELIMITED_JSON"
+              }
+            }
+          }, {upload_source: upload_source, content_type: "application/octet-stream"})
+        end
+        wait_load(res, table_id)
+      end
+
+      private
+
+      def wait_load(res, table_id)
+        wait_interval = 10
+        _response = res
+        until _response.status.state == "DONE"
+          log.debug "wait for load job finish", state: _response.status.state
+          sleep wait_interval
+          _response = client.get_job(@project, _response.job_reference.job_id)
+        end
+
+        if _response.status.error_result
+          log.error "job.insert API", project_id: @project, dataset: @dataset, table: table_id, message: _response.status.error_result.message
+          raise "failed to load into bigquery"
+        end
+
+        log.debug "finish load job", state: _response.status.state
+      end
+
+      def create_upload_source(chunk)
+        chunk_is_file = @buffer_type == 'file'
+        if chunk_is_file
+          File.open(chunk.path) do |file|
+            yield file
+          end
+        else
+          Tempfile.open("chunk-tmp") do |file|
+            file.binmode
+            chunk.write_to(file)
+            file.sync
+            file.rewind
+            yield file
+          end
+        end
+      end
+    end
+
     class FieldSchema
       def initialize(name, mode = :nullable)
         unless [:nullable, :required, :repeated].include?(mode)
data/test/helper.rb
CHANGED
data/test/plugin/test_out_bigquery.rb
CHANGED
@@ -710,6 +710,35 @@ class BigQueryOutputTest < Test::Unit::TestCase
     assert_equal expected, MessagePack.unpack(buf)
   end
 
+  def test_format_for_load
+    now = Time.now
+    input = [
+      now,
+      {
+        "uuid" => "9ABFF756-0267-4247-847F-0895B65F0938",
+      }
+    ]
+    expected = MultiJson.dump({
+      "uuid" => "9ABFF756-0267-4247-847F-0895B65F0938",
+    }) + "\n"
+
+    driver = create_driver(<<-CONFIG)
+      method load
+      table foo
+      email foo@bar.example
+      private_key_path /path/to/key
+      project yourproject_id
+      dataset yourdataset_id
+
+      field_string uuid
+    CONFIG
+    driver.instance.start
+    buf = driver.instance.format_stream("my.tag", [input])
+    driver.instance.shutdown
+
+    assert_equal expected, buf
+  end
+
   def test_empty_value_in_required
     now = Time.now
     input = [
@@ -857,6 +886,66 @@ class BigQueryOutputTest < Test::Unit::TestCase
     driver.instance.shutdown
   end
 
+  def test_write_for_load
+    schema_path = File.join(File.dirname(__FILE__), "testdata", "sudo.schema")
+    entry = {a: "b"}, {b: "c"}
+    driver = create_driver(<<-CONFIG)
+      method load
+      table foo
+      email foo@bar.example
+      private_key_path /path/to/key
+      project yourproject_id
+      dataset yourdataset_id
+
+      time_format %s
+      time_field time
+
+      schema_path #{schema_path}
+      field_integer time
+    CONFIG
+    schema_fields = MultiJson.load(File.read(schema_path)).map(&:deep_symbolize_keys).tap do |h|
+      h[0][:type] = "INTEGER"
+      h[0][:mode] = "NULLABLE"
+    end
+
+    chunk = Fluent::MemoryBufferChunk.new("my.tag")
+    io = StringIO.new("hello")
+    mock(driver.instance).create_upload_source(chunk).yields(io)
+    mock_client(driver) do |expect|
+      expect.insert_job('yourproject_id', {
+        configuration: {
+          load: {
+            destination_table: {
+              project_id: 'yourproject_id',
+              dataset_id: 'yourdataset_id',
+              table_id: 'foo',
+            },
+            schema: {
+              fields: schema_fields,
+            },
+            write_disposition: "WRITE_APPEND",
+            source_format: "NEWLINE_DELIMITED_JSON"
+          }
+        }
+      }, {upload_source: io, content_type: "application/octet-stream"}) {
+        s = stub!
+        status_stub = stub!
+        s.status { status_stub }
+        status_stub.state { "DONE" }
+        status_stub.error_result { nil }
+        s
+      }
+    end
+
+    entry.each do |e|
+      chunk << MultiJson.dump(e) + "\n"
+    end
+
+    driver.instance.start
+    driver.instance.write(chunk)
+    driver.instance.shutdown
+  end
+
   def test_write_with_row_based_table_id_formatting
     entry = [
       {json: {a: "b", created_at: Time.local(2014,8,20,9,0,0).to_i}},
@@ -935,6 +1024,26 @@ class BigQueryOutputTest < Test::Unit::TestCase
     assert_equal 'foo_20140811', table_id
   end
 
+  def test_generate_table_id_with_attribute_replacement
+    driver = create_driver
+    table_id_format = 'foo_%Y_%m_%d_${baz}'
+    current_time = Time.now
+    time = Time.local(2014, 8, 11, 21, 20, 56)
+    [
+      [ { baz: 1234 }, 'foo_2014_08_11_1234' ],
+      [ { baz: 'piyo' }, 'foo_2014_08_11_piyo' ],
+      [ { baz: true }, 'foo_2014_08_11_true' ],
+      [ { baz: nil }, 'foo_2014_08_11_' ],
+      [ { baz: '' }, 'foo_2014_08_11_' ],
+      [ { baz: "_X-Y.Z !\n" }, 'foo_2014_08_11__XYZ' ],
+      [ { baz: { xyz: 1 } }, 'foo_2014_08_11_xyz1' ],
+    ].each do |attrs, expected|
+      row = { json: { created_at: Time.local(2014,8,10,21,20,57).to_i }.merge(attrs) }
+      table_id = driver.instance.generate_table_id(table_id_format, time, row)
+      assert_equal expected, table_id
+    end
+  end
+
   def test_auto_create_table_by_bigquery_api
     now = Time.now
     message = {
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.16
 platform: ruby
 authors:
 - Naoya Ito
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-03-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -72,14 +72,14 @@ dependencies:
     requirements:
     - - "~>"
     - !ruby/object:Gem::Version
-      version: 0.9.
+      version: 0.9.3
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
    - - "~>"
    - !ruby/object:Gem::Version
-      version: 0.9.
+      version: 0.9.3
 - !ruby/object:Gem::Dependency
   name: googleauth
   requirement: !ruby/object:Gem::Requirement
@@ -108,6 +108,20 @@ dependencies:
   - - ">="
   - !ruby/object:Gem::Version
     version: '0'
+- !ruby/object:Gem::Dependency
+  name: activesupport
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+    - !ruby/object:Gem::Version
+      version: '3.2'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+    - !ruby/object:Gem::Version
+      version: '3.2'
 - !ruby/object:Gem::Dependency
   name: fluentd
   requirement: !ruby/object:Gem::Requirement
@@ -193,7 +207,6 @@ files:
 - README.md
 - Rakefile
 - fluent-plugin-bigquery.gemspec
-- lib/fluent/plugin/bigquery/load_request_body_wrapper.rb
 - lib/fluent/plugin/bigquery/version.rb
 - lib/fluent/plugin/out_bigquery.rb
 - test/helper.rb
@@ -201,10 +214,9 @@ files:
 - test/plugin/testdata/apache.schema
 - test/plugin/testdata/json_key.json
 - test/plugin/testdata/sudo.schema
-- test/test_load_request_body_wrapper.rb
 homepage: https://github.com/kaizenplatform/fluent-plugin-bigquery
 licenses:
--
+- Apache-2.0
 metadata: {}
 post_install_message:
 rdoc_options: []
@@ -232,4 +244,3 @@ test_files:
 - test/plugin/testdata/apache.schema
 - test/plugin/testdata/json_key.json
 - test/plugin/testdata/sudo.schema
-- test/test_load_request_body_wrapper.rb
data/lib/fluent/plugin/bigquery/load_request_body_wrapper.rb
DELETED
@@ -1,173 +0,0 @@
-module Fluent
-  module BigQueryPlugin
-    class LoadRequestBodyWrapper
-      # body can be a instance of IO (#rewind, #read, #to_str)
-      # http://rubydoc.info/github/google/google-api-ruby-client/Google/APIClient/Request#body-instance_method
-
-      # http://rubydoc.info/github/google/google-api-ruby-client/Google/APIClient#execute-instance_method
-      # (Google::APIClient::Method) api_method: The method object or the RPC name of the method being executed.
-      # (Hash, Array) parameters: The parameters to send to the method.
-      # (String) body: The body of the request.
-      # (Hash, Array) headers: The HTTP headers for the request.
-      # (Hash) options: A set of options for the request, of which:
-      #   (#generate_authenticated_request) :authorization (default: true)
-      #    - The authorization mechanism for the response. Used only if :authenticated is true.
-      #   (TrueClass, FalseClass) :authenticated (default: true)
-      #    - true if the request must be signed or somehow authenticated, false otherwise.
-      #   (TrueClass, FalseClass) :gzip (default: true) - true if gzip enabled, false otherwise.
-
-      # https://developers.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest
-
-      JSON_PRETTY_DUMP = JSON::State.new(space: " ", indent:" ", object_nl:"\n", array_nl:"\n")
-
-      CONTENT_TYPE_FIRST = "Content-Type: application/json; charset=UTF-8\n\n"
-      CONTENT_TYPE_SECOND = "Content-Type: application/octet-stream\n\n"
-
-      MULTIPART_BOUNDARY = "--xxx\n"
-      MULTIPART_BOUNDARY_END = "--xxx--\n"
-
-      def initialize(project_id, dataset_id, table_id, field_defs, buffer)
-        @metadata = {
-          configuration: {
-            load: {
-              sourceFormat: "<required for JSON files>",
-              schema: {
-                fields: field_defs
-              },
-              destinationTable: {
-                projectId: project_id,
-                datasetId: dataset_id,
-                tableId: table_id
-              }
-            }
-          }
-        }
-
-        @non_buffer = MULTIPART_BOUNDARY + CONTENT_TYPE_FIRST + @metadata.to_json(JSON_PRETTY_DUMP) + "\n" +
-                      MULTIPART_BOUNDARY + CONTENT_TYPE_SECOND
-        @non_buffer.force_encoding("ASCII-8BIT")
-        @non_buffer_bytesize = @non_buffer.bytesize
-
-        @buffer = buffer # read
-        @buffer_bytesize = @buffer.size # Fluentd Buffer Chunk #size -> bytesize
-
-        @footer = MULTIPART_BOUNDARY_END.force_encoding("ASCII-8BIT")
-
-        @contents_bytesize = @non_buffer_bytesize + @buffer_bytesize
-        @total_bytesize = @contents_bytesize + MULTIPART_BOUNDARY_END.bytesize
-
-        @whole_data = nil
-
-        @counter = 0
-        @eof = false
-      end
-
-      # sample_body = <<EOF
-      # --xxx
-      # Content-Type: application/json; charset=UTF-8
-      #
-      # {
-      #   "configuration": {
-      #     "load": {
-      #       "sourceFormat": "<required for JSON files>",
-      #       "schema": {
-      #         "fields": [
-      #           {"name":"f1", "type":"STRING"},
-      #           {"name":"f2", "type":"INTEGER"}
-      #         ]
-      #       },
-      #       "destinationTable": {
-      #         "projectId": "projectId",
-      #         "datasetId": "datasetId",
-      #         "tableId": "tableId"
-      #       }
-      #     }
-      #   }
-      # }
-      # --xxx
-      # Content-Type: application/octet-stream
-      #
-      # <your data>
-      # --xxx--
-      # EOF
-      def rewind
-        @counter = 0
-        @eof = false
-      end
-
-      def eof?
-        @eof
-      end
-
-      def to_str
-        rewind
-        self.read # all data
-      end
-
-      def read(length=nil, outbuf="")
-        raise ArgumentError, "negative read length" if length && length < 0
-        return (length.nil? || length == 0) ? "" : nil if @eof
-        return outbuf if length == 0
-
-        # read all data
-        if length.nil? || length >= @total_bytesize
-          @whole_data ||= @buffer.read.force_encoding("ASCII-8BIT")
-
-          if @counter.zero?
-            outbuf.replace(@non_buffer)
-            outbuf << @whole_data
-            outbuf << @footer
-          elsif @counter < @non_buffer_bytesize
-            outbuf.replace(@non_buffer[ @counter .. -1 ])
-            outbuf << @whole_data
-            outbuf << @footer
-          elsif @counter < @contents_bytesize
-            outbuf.replace(@whole_data[ (@counter - @non_buffer_bytesize) .. -1 ])
-            outbuf << @footer
-          else
-            outbuf.replace(@footer[ (@counter - @contents_bytesize) .. -1 ])
-          end
-          @counter = @total_bytesize
-          @eof = true
-          return outbuf
-        end
-
-        # In ruby script level (non-ext module), we cannot prevent to change outbuf length or object re-assignment
-        outbuf.replace("")
-
-        # return first part (metadata)
-        if @counter < @non_buffer_bytesize
-          non_buffer_part = @non_buffer[@counter, length]
-          if non_buffer_part
-            outbuf << non_buffer_part
-            length -= non_buffer_part.bytesize
-            @counter += non_buffer_part.bytesize
-          end
-        end
-        return outbuf if length < 1
-
-        # return second part (buffer content)
-        if @counter < @contents_bytesize
-          @whole_data ||= @buffer.read.force_encoding("ASCII-8BIT")
-          buffer_part = @whole_data[@counter - @non_buffer_bytesize, length]
-          if buffer_part
-            outbuf << buffer_part
-            length -= buffer_part.bytesize
-            @counter += buffer_part.bytesize
-          end
-        end
-        return outbuf if length < 1
-
-        # return footer
-        footer_part = @footer[@counter - @contents_bytesize, length]
-        if footer_part
-          outbuf << footer_part
-          @counter += footer_part.bytesize
-          @eof = true if @counter >= @total_bytesize
-        end
-
-        outbuf
-      end
-    end
-  end
-end
data/test/test_load_request_body_wrapper.rb
DELETED
@@ -1,190 +0,0 @@
-# -*- coding: utf-8 -*-
-require 'helper'
-require 'json'
-require 'tempfile'
-
-class LoadRequestBodyWrapperTest < Test::Unit::TestCase
-  def content_alphabet(repeat)
-    (0...repeat).map{|i| "#{i}0123456789\n" }.join
-  end
-
-  def content_kana(repeat)
-    (0...repeat).map{|i| "#{i}あいうえおかきくけこ\n" }.join
-  end
-
-  def mem_chunk(repeat=10, kana=false)
-    content = kana ? content_kana(repeat) : content_alphabet(repeat)
-    Fluent::MemoryBufferChunk.new('bc_mem', content)
-  end
-
-  def file_chunk(repeat=10, kana=false)
-    content = kana ? content_kana(repeat) : content_alphabet(repeat)
-    tmpfile = Tempfile.new('fluent_bigquery_plugin_test')
-    buf = Fluent::FileBufferChunk.new('bc_mem', tmpfile.path, tmpfile.object_id)
-    buf << content
-    buf
-  end
-
-  def field_defs
-    [{"name" => "field1", "type" => "STRING"}, {"name" => "field2", "type" => "INTEGER"}]
-  end
-
-  def check_meta(blank, first, last)
-    assert_equal "", blank
-
-    header1, body1 = first.split("\n\n")
-    assert_equal "Content-Type: application/json; charset=UTF-8", header1
-    metadata = JSON.parse(body1)
-    assert_equal "<required for JSON files>", metadata["configuration"]["load"]["sourceFormat"]
-    assert_equal "field1", metadata["configuration"]["load"]["schema"]["fields"][0]["name"]
-    assert_equal "STRING", metadata["configuration"]["load"]["schema"]["fields"][0]["type"]
-    assert_equal "field2", metadata["configuration"]["load"]["schema"]["fields"][1]["name"]
-    assert_equal "INTEGER", metadata["configuration"]["load"]["schema"]["fields"][1]["type"]
-    assert_equal "pname1", metadata["configuration"]["load"]["destinationTable"]["projectId"]
-    assert_equal "dname1", metadata["configuration"]["load"]["destinationTable"]["datasetId"]
-    assert_equal "tname1", metadata["configuration"]["load"]["destinationTable"]["tableId"]
-
-    assert_equal "--\n", last
-  end
-
-  def check_ascii(data)
-    blank, first, second, last = data.split(/--xxx\n?/)
-
-    check_meta(blank, first, last)
-
-    header2, body2 = second.split("\n\n")
-    assert_equal "Content-Type: application/octet-stream", header2
-    i = 0
-    body2.each_line do |line|
-      assert_equal "#{i}0123456789\n", line
-      i += 1
-    end
-  end
-
-  def check_kana(data)
-    blank, first, second, last = data.split(/--xxx\n?/)
-
-    check_meta(blank, first, last)
-
-    header2, body2 = second.split("\n\n")
-    assert_equal "Content-Type: application/octet-stream", header2
-    i = 0
-    body2.each_line do |line|
-      assert_equal "#{i}あいうえおかきくけこ\n", line
-      i += 1
-    end
-  end
-
-  def setup
-    @klass = Fluent::BigQueryPlugin::LoadRequestBodyWrapper
-    self
-  end
-
-  def test_memory_buf
-    d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(10))
-    data1 = d1.read.force_encoding("UTF-8")
-    check_ascii(data1)
-
-    d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(10))
-    data2 = ""
-    while !d2.eof? do
-      buf = " "
-      objid = buf.object_id
-      data2 << d2.read(20, buf)
-      assert_equal objid, buf.object_id
-    end
-    data2.force_encoding("UTF-8")
-
-    assert_equal data1.size, data2.size
-  end
-
-  def test_memory_buf2
-    d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000))
-    data1 = d1.read.force_encoding("UTF-8")
-    check_ascii(data1)
-
-    d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000))
-    data2 = ""
-    while !d2.eof? do
-      buf = " "
-      objid = buf.object_id
-      data2 << d2.read(2048, buf)
-      assert_equal objid, buf.object_id
-    end
-    data2.force_encoding("UTF-8")
-
-    assert_equal data1.size, data2.size
-  end
-
-  def test_memory_buf3 # kana
-    d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000, true))
-    data1 = d1.read.force_encoding("UTF-8")
-    check_kana(data1)
-
-    d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), mem_chunk(100000, true))
-    data2 = ""
-    while !d2.eof? do
-      buf = " "
-      objid = buf.object_id
-      data2 << d2.read(2048, buf)
-      assert_equal objid, buf.object_id
-    end
-    data2.force_encoding("UTF-8")
-
-    assert_equal data1.size, data2.size
-  end
-
-  def test_file_buf
-    d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(10))
-    data1 = d1.read.force_encoding("UTF-8")
-    check_ascii(data1)
-
-    d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(10))
-    data2 = ""
-    while !d2.eof? do
-      buf = " "
-      objid = buf.object_id
-      data2 << d2.read(20, buf)
-      assert_equal objid, buf.object_id
-    end
-    data2.force_encoding("UTF-8")
-
-    assert_equal data1.size, data2.size
-  end
-
-  def test_file_buf2
-    d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000))
-    data1 = d1.read.force_encoding("UTF-8")
-    check_ascii(data1)
-
-    d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000))
-    data2 = ""
-    while !d2.eof? do
-      buf = " "
-      objid = buf.object_id
-      data2 << d2.read(20480, buf)
-      assert_equal objid, buf.object_id
-    end
-    data2.force_encoding("UTF-8")
-
-    assert_equal data1.size, data2.size
-  end
-
-  def test_file_buf3 # kana
-    d1 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000, true))
-    data1 = d1.read.force_encoding("UTF-8")
-    check_kana(data1)
-
-    d2 = @klass.new('pname1', 'dname1', 'tname1', field_defs(), file_chunk(100000, true))
-    data2 = ""
-    while !d2.eof? do
-      buf = " "
-      objid = buf.object_id
-      data2 << d2.read(20480, buf)
-      assert_equal objid, buf.object_id
-    end
-    data2.force_encoding("UTF-8")
-
-    assert_equal data1.size, data2.size
-  end
-end