embulk-output-bigquery 0.4.2 → 0.4.3

This diff shows the content of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between the two versions as they appear in the public registry.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 4287886bb0467a77706c88ae428cc95f082cdca1
-  data.tar.gz: 5c3677c05609d29b5835be54bde777827ab6b607
+  metadata.gz: e8b074a351f22417a10571e1a9aa60a1bc82df0d
+  data.tar.gz: c50a49b3b99f5cab88e023af6d95b39990c40d89
 SHA512:
-  metadata.gz: 4d2ff6070fc2eb7a27c26513f712d056ef2e7f0a158125c8921293a5c8299f4963891928bf6e3c43679d4131bbdd7481acf96d1a9fe5cb338b5c39f8c3553b6b
-  data.tar.gz: 3bae2694c1d59218517b0f6755216d5050b522b20263c83ebde062df069c2e97fc0adbd586ddc192e43c80af4691c26d55665e363c13e1f8a467cc44acae5f04
+  metadata.gz: 52e5a630d3173d2baec83dd03fbf0e3e4cb7d46aeb870e192ce31c6c8178534cade8975ec586e2a07bc1c213d616505ed52b5ec154cabdc19669317f8ba673b3
+  data.tar.gz: f9b15c9ff54a64626b33ce123a11ac80a7984535eb9fc47c042c9647b3d1728bd6f5f4f3157c9fcbac9163b097f83de5de7e3db2fb443a93aa4bdf32bd9d7fd5
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+## 0.4.3 - 2017-02-11
+
+* [maintenance] Fix `schema_update_options` was not set with load_from_gcs (thanks to h10a-bf)
+
 ## 0.4.2 - 2016-10-12
 
 * [maintenance] Fix `schema_update_options` was not working (nil error)
data/README.md CHANGED
@@ -102,8 +102,8 @@ Following options are same as [bq command-line tools](https://cloud.google.com/b
 | allow_quoted_newlines | boolean | optional | false | Set true, if data contains newline characters. It may cause slow processing |
 | time_partitioning | hash | optional | `{"type":"DAY"}` if `table` parameter has a partition decorator, otherwise nil | See [Time Partitioning](#time-partitioning) |
 | time_partitioning.type | string | required | nil | The only type supported is DAY, which will generate one partition per day based on data loading time. |
-| time_partitioning.expiration__ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. partition |
-| schema_update_options | array | optional | nil | List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) |
+| time_partitioning.expiration_ms | int | optional | nil | Number of milliseconds for which to keep the storage for a partition. |
+| schema_update_options | array | optional | nil | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE on the current status: `schema_update_options` does not work for the `copy` job, that is, it is not effective for most modes such as `append`, `append_direct`, `replace`, and `replace_backup` (except `delete_in_advance`) |
 
 ### Example
 
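A note on the renamed `time_partitioning.expiration_ms` option above: the value is given in milliseconds. As a quick Ruby sanity check, the value 259200000 used in the README's time-partitioning example later in this diff is exactly three days:

```ruby
# 259200000 ms, the expiration_ms value used in the README example below,
# is three days expressed in milliseconds.
3 * 24 * 60 * 60 * 1000 # => 259200000
```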
@@ -127,24 +127,25 @@ out:
 
 ##### append
 
-1. Load to temporary table.
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy temporary table to destination table (or partition). (WRITE_APPEND)
 
 ##### append_direct
 
-Insert data into existing table (or partition) directly.
+1. Insert data into existing table (or partition) directly. (WRITE_APPEND in parallel)
+
 This is not transactional, i.e., if it fails, the target table could have some rows inserted.
 
 ##### replace
 
-1. Load to temporary table.
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 
 ```is_skip_job_result_check``` must be false when using replace mode
 
 ##### replace_backup
 
-1. Load to temporary table.
+1. Load to temporary table (Create and WRITE_APPEND in parallel)
 2. Copy destination table (or partition) to backup table (or partition). (dataset_old, table_old)
 3. Copy temporary table to destination table (or partition). (WRITE_TRUNCATE)
 
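For the load-then-copy modes documented above, step 2 is a BigQuery copy job whose write disposition is the only difference between `append` (WRITE_APPEND) and `replace`/`replace_backup` (WRITE_TRUNCATE). A minimal sketch of such a copy-job body, assuming hash keys follow google-api-client's snake_case convention as in the plugin code later in this diff; project, dataset, and table names are placeholders:

```ruby
# Hypothetical copy-job configuration for step 2 of the modes above.
copy_body = {
  configuration: {
    copy: {
      source_table:      {project_id: 'my-project', dataset_id: 'my_dataset', table_id: 'LOAD_TEMP_my_table'},
      destination_table: {project_id: 'my-project', dataset_id: 'my_dataset', table_id: 'my_table'},
      write_disposition: 'WRITE_TRUNCATE', # 'WRITE_APPEND' for the append mode
    }
  }
}
```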
@@ -316,7 +317,7 @@ Therefore, it is recommended to format records with filter plugins written in Ja
 filters:
   - type: to_json
     column: {name: payload, type: string}
-    default_format: %Y-%m-%d %H:%M:%S.%6N
+    default_format: "%Y-%m-%d %H:%M:%S.%6N"
 out:
   type: bigquery
   payload_column_index: 0 # or, payload_column: payload
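The quoting added above matters because `%` is a reserved indicator character in YAML, so a plain (unquoted) scalar may not begin with it. A quick check with Ruby's standard library:

```ruby
require 'yaml'

# Quoted, as in the fixed README example: parses as a plain string value.
YAML.load('default_format: "%Y-%m-%d %H:%M:%S.%6N"')
# => {"default_format"=>"%Y-%m-%d %H:%M:%S.%6N"}

# Unquoted, as before the fix: `%` cannot start a plain scalar.
begin
  YAML.load('default_format: %Y-%m-%d %H:%M:%S.%6N')
rescue Psych::SyntaxError => e
  puts e.message # YAML parse error
end
```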
@@ -397,24 +398,12 @@ out:
     expiration_ms: 259200000
 ```
 
-Use `schema_update_options` to allow the schema of the desitination table to be updated as a side effect of the load job as:
-
-```yaml
-out:
-  type: bigquery
-  table: table_name$20160929
-  auto_create_table: true
-  time_partitioning:
-    type: DAY
-    expiration_ms: 259200000
-  schema_update_options:
-    - ALLOW_FIELD_ADDITION
-    - ALLOW_FIELD_RELAXATION
-```
+Use the [Tables: patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of the partitioned table; embulk-output-bigquery itself does not support it, though.
+Note that only adding a new column, and relaxing `REQUIRED` columns to `NULLABLE`, are supported now. Deleting columns and renaming columns are not supported.
 
-It seems that only adding a new column, and relaxing non-necessary columns to be `NULLABLE` are supported now.
-Deleting columns, and renaming columns are not supported.
-See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) for details.
+MEMO: [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) is available
+to update the schema of the destination table as a side effect of the load job, but it is not available for copy jobs.
+Thus, it was not suitable for the embulk-output-bigquery idempotent modes `append`, `replace`, and `replace_backup`, sigh.
 
 ## Development
 
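Since the README now points at the Tables: patch API rather than a plugin option, here is a hedged sketch of what such a call could look like with the google-api-client gem the plugin already depends on. This is an assumption about the generated BigQuery V2 Ruby surface, not code from the plugin; project, dataset, table, and field names are placeholders, and credentials are omitted:

```ruby
require 'google/apis/bigquery_v2'

bigquery = Google::Apis::BigqueryV2::BigqueryService.new
# bigquery.authorization = ... (credentials omitted)

# Patch the table with a schema that adds a NULLABLE column; per the note
# above, only additions and REQUIRED -> NULLABLE relaxations are accepted.
table = Google::Apis::BigqueryV2::Table.new(
  schema: Google::Apis::BigqueryV2::TableSchema.new(
    fields: [
      Google::Apis::BigqueryV2::TableFieldSchema.new(name: 'id', type: 'INTEGER', mode: 'REQUIRED'),
      Google::Apis::BigqueryV2::TableFieldSchema.new(name: 'new_column', type: 'STRING', mode: 'NULLABLE'),
    ]
  )
)
bigquery.patch_table('my-project', 'my_dataset', 'my_table', table)
```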
data/embulk-output-bigquery.gemspec CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name          = "embulk-output-bigquery"
-  spec.version       = "0.4.2"
+  spec.version       = "0.4.3"
   spec.authors       = ["Satoshi Akama", "Naotoshi Seo"]
   spec.summary       = "Google BigQuery output plugin for Embulk"
   spec.description   = "Embulk plugin that insert records to Google BigQuery."
@@ -13,7 +13,8 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
 
   spec.add_dependency 'google-api-client'
-  spec.add_dependency "tzinfo"
+  spec.add_dependency 'time_with_zone'
+
   spec.add_development_dependency 'embulk', ['>= 0.8.2']
   spec.add_development_dependency 'bundler', ['>= 1.10.6']
   spec.add_development_dependency 'rake', ['>= 10.0']
data/lib/embulk/output/bigquery/bigquery_client.rb CHANGED
@@ -104,6 +104,11 @@ module Embulk
             }
           }
         }
+
+        if @task['schema_update_options']
+          body[:configuration][:load][:schema_update_options] = @task['schema_update_options']
+        end
+
         opts = {}
 
         Embulk.logger.debug { "embulk-output-bigquery: insert_job(#{@project}, #{body}, #{opts})" }
@@ -258,10 +263,6 @@ module Embulk
           }
         }
 
-        if @task['schema_update_options']
-          body[:configuration][:copy][:schema_update_options] = @task['schema_update_options']
-        end
-
         opts = {}
         Embulk.logger.debug { "embulk-output-bigquery: insert_job(#{@project}, #{body}, #{opts})" }
         response = with_network_retry { client.insert_job(@project, body, opts) }
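The two hunks above are the substance of the 0.4.3 fix: `schema_update_options` is now attached to the load job, where BigQuery honors it, and removed from the copy job, where it had no effect. A minimal sketch of the guard's behavior in plain Ruby (hash shapes follow the plugin code above; the option value is illustrative):

```ruby
task = {'schema_update_options' => ['ALLOW_FIELD_ADDITION', 'ALLOW_FIELD_RELAXATION']}

# Load job: the option is copied into the job configuration when present.
load_body = {configuration: {load: {}}}
if task['schema_update_options']
  load_body[:configuration][:load][:schema_update_options] = task['schema_update_options']
end
load_body[:configuration][:load]
# => {:schema_update_options=>["ALLOW_FIELD_ADDITION", "ALLOW_FIELD_RELAXATION"]}

# Copy job: after this change the key is never set.
copy_body = {configuration: {copy: {}}}
copy_body[:configuration][:copy].key?(:schema_update_options) # => false
```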
data/lib/embulk/output/bigquery/value_converter_factory.rb CHANGED
@@ -1,5 +1,5 @@
 require 'time'
-require 'tzinfo'
+require 'time_with_zone'
 require 'json'
 require_relative 'helper'
 
@@ -23,8 +23,8 @@ module Embulk
       # @return [Array] an array whose key is column_index, and value is its converter (Proc)
       def self.create_converters(task, schema)
         column_options_map = Helper.column_options_map(task['column_options'])
-        default_timestamp_format = task['default_timestamp_format']
-        default_timezone = task['default_timezone']
+        default_timestamp_format = task['default_timestamp_format'] || DEFAULT_TIMESTAMP_FORMAT
+        default_timezone = task['default_timezone'] || DEFAULT_TIMEZONE
         schema.map do |column|
           column_name = column[:name]
           embulk_type = column[:type]
@@ -53,7 +53,7 @@ module Embulk
         @timestamp_format = timestamp_format
         @default_timestamp_format = default_timestamp_format
         @timezone = timezone || default_timezone
-        @zone_offset = get_zone_offset(@timezone) if @timezone
+        @zone_offset = TimeWithZone.zone_offset(@timezone)
         @strict = strict.nil? ? true : strict
       end
 
@@ -194,7 +194,7 @@ module Embulk
           Proc.new {|val|
             next nil if val.nil?
             with_typecast_error(val) do |val|
-              strptime_with_zone(val, @timestamp_format, zone_offset).to_f
+              TimeWithZone.set_zone_offset(Time.strptime(val, @timestamp_format), zone_offset).strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
             end
           }
         else
@@ -238,7 +238,7 @@ module Embulk
         when 'TIMESTAMP'
           Proc.new {|val|
             next nil if val.nil?
-            val.to_f # BigQuery supports UNIX timestamp
+            val.strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
           }
         else
           raise NotSupportedType, "cannot take column type #{type} for timestamp column"
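With this change the TIMESTAMP converter emits a formatted string rather than a UNIX epoch float. The format string can be checked directly in plain Ruby; the epoch value below is the one the updated tests later in this diff use:

```ruby
# 1456444800.5 is 2016-02-26 00:00:00.5 UTC, as asserted by the tests below.
Time.at(1456444800.5).utc.strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
# => "2016-02-26 00:00:00.500000 +00:00"
```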
@@ -261,31 +261,6 @@ module Embulk
             raise NotSupportedType, "cannot take column type #{type} for json column"
         end
       end
-
-      private
-
-      # [+-]HH:MM, [+-]HHMM, [+-]HH
-      NUMERIC_PATTERN = %r{\A[+-]\d\d(:?\d\d)?\z}
-
-      # Region/Zone, Region/Zone/Zone
-      NAME_PATTERN = %r{\A[^/]+/[^/]+(/[^/]+)?\z}
-
-      def strptime_with_zone(date, timestamp_format, zone_offset)
-        time = Time.strptime(date, timestamp_format)
-        utc_offset = time.utc_offset
-        time.localtime(zone_offset) + utc_offset - zone_offset
-      end
-
-      def get_zone_offset(timezone)
-        if NUMERIC_PATTERN === timezone
-          Time.zone_offset(timezone)
-        elsif NAME_PATTERN === timezone || 'UTC' == timezone
-          tz = TZInfo::Timezone.get(timezone)
-          tz.period_for_utc(Time.now).utc_total_offset
-        else
-          raise ArgumentError, "timezone format is invalid: #{timezone}"
-        end
-      end
     end
   end
 end
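The removed `strptime_with_zone`/`get_zone_offset` helpers are replaced by the `time_with_zone` gem. A small sketch of the two calls the plugin now relies on (both appear verbatim in the hunks above; 'Asia/Tokyo' is an example timezone):

```ruby
require 'time'
require 'time_with_zone'

# Offset lookup, replacing get_zone_offset:
offset = TimeWithZone.zone_offset('Asia/Tokyo') # => 32400 (seconds east of UTC)

# Parse then stamp the offset, replacing strptime_with_zone: keeps the parsed
# wall-clock time and attaches the zone's offset to it.
time = Time.strptime('2016-02-26', '%Y-%m-%d')
TimeWithZone.set_zone_offset(time, offset).strftime("%Y-%m-%d %H:%M:%S.%6N %:z")
# => "2016-02-26 00:00:00.000000 +09:00"
```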
data/test/test_file_writer.rb CHANGED
@@ -43,7 +43,7 @@ module Embulk
     end
 
     def record
-      [true, 1, 1.1, 'foo', Time.parse("2016-02-26 00:00:00 +09:00"), {"foo"=>"foo"}]
+      [true, 1, 1.1, 'foo', Time.parse("2016-02-26 00:00:00 +00:00").utc, {"foo"=>"foo"}]
     end
 
     def page
@@ -81,7 +81,7 @@ module Embulk
       formatter_proc = file_writer.instance_variable_get(:@formatter_proc)
       assert_equal :to_csv, formatter_proc.name
 
-      expected = %Q[true,1,1.1,foo,1456412400.0,"{""foo"":""foo""}"\n]
+      expected = %Q[true,1,1.1,foo,2016-02-26 00:00:00.000000 +00:00,"{""foo"":""foo""}"\n]
       assert_equal expected, formatter_proc.call(record)
     end
 
@@ -91,7 +91,7 @@ module Embulk
       formatter_proc = file_writer.instance_variable_get(:@formatter_proc)
       assert_equal :to_jsonl, formatter_proc.name
 
-      expected = %Q[{"boolean":true,"long":1,"double":1.1,"string":"foo","timestamp":1456412400.0,"json":"{\\"foo\\":\\"foo\\"}"}\n]
+      expected = %Q[{"boolean":true,"long":1,"double":1.1,"string":"foo","timestamp":"2016-02-26 00:00:00.000000 +00:00","json":"{\\"foo\\":\\"foo\\"}"}\n]
       assert_equal expected, formatter_proc.call(record)
     end
   end
data/test/test_value_converter_factory.rb CHANGED
@@ -23,8 +23,8 @@ module Embulk
       assert_equal 1, converters[1].call(1)
       assert_equal 1.1, converters[2].call(1.1)
       assert_equal 'foo', converters[3].call('foo')
-      timestamp = Time.parse("2016-02-26 00:00:00.100000 UTC")
-      assert_equal 1456444800.1, converters[4].call(timestamp)
+      timestamp = Time.parse("2016-02-26 00:00:00.500000 +00:00")
+      assert_equal "2016-02-26 00:00:00.500000 +00:00", converters[4].call(timestamp)
       assert_equal %Q[{"foo":"foo"}], converters[5].call({'foo'=>'foo'})
     end
 
@@ -55,7 +55,7 @@ module Embulk
       assert_equal '1', converters[1].call(1)
       assert_equal '1.1', converters[2].call(1.1)
      assert_equal 1, converters[3].call('1')
-      timestamp = Time.parse("2016-02-26 00:00:00.100000 UTC")
+      timestamp = Time.parse("2016-02-26 00:00:00.100000 +00:00")
       assert_equal 1456444800, converters[4].call(timestamp)
       assert_equal({'foo'=>'foo'}, converters[5].call({'foo'=>'foo'}))
     end
@@ -208,7 +208,7 @@ module Embulk
         timestamp_format: '%Y-%m-%d', timezone: 'Asia/Tokyo'
       ).create_converter
       assert_equal nil, converter.call(nil)
-      assert_equal 1456412400.0, converter.call("2016-02-26")
+      assert_equal "2016-02-26 00:00:00.000000 +09:00", converter.call("2016-02-26")
 
       # Users must take care of the BQ timestamp format by themselves with no timestamp_format
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'TIMESTAMP').create_converter
@@ -240,22 +240,22 @@ module Embulk
     def test_float
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'FLOAT').create_converter
       assert_equal nil, converter.call(nil)
-      expected = 1456444800.100000
+      expected = 1456444800.500000
       assert_equal expected, converter.call(Time.at(expected))
     end
 
     def test_string
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'STRING').create_converter
       assert_equal nil, converter.call(nil)
-      timestamp = Time.parse("2016-02-26 00:00:00.100000 UTC")
-      expected = "2016-02-26 00:00:00.100000"
+      timestamp = Time.parse("2016-02-26 00:00:00.500000 +00:00")
+      expected = "2016-02-26 00:00:00.500000"
       assert_equal expected, converter.call(timestamp)
 
       converter = ValueConverterFactory.new(
         SCHEMA_TYPE, 'STRING',
         timestamp_format: '%Y-%m-%d', timezone: 'Asia/Tokyo'
       ).create_converter
-      timestamp = Time.parse("2016-02-25 15:00:00.100000 UTC")
+      timestamp = Time.parse("2016-02-25 15:00:00.500000 +00:00")
       expected = "2016-02-26"
       assert_equal expected, converter.call(timestamp)
     end
@@ -263,8 +263,9 @@ module Embulk
     def test_timestamp
       converter = ValueConverterFactory.new(SCHEMA_TYPE, 'TIMESTAMP').create_converter
       assert_equal nil, converter.call(nil)
-      expected = 1456444800.100000
-      assert_equal expected, converter.call(Time.at(expected))
+      subject = 1456444800.500000
+      expected = "2016-02-26 00:00:00.500000 +00:00"
+      assert_equal expected, converter.call(Time.at(subject).utc)
     end
 
     def test_record
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: embulk-output-bigquery
 version: !ruby/object:Gem::Version
-  version: 0.4.2
+  version: 0.4.3
 platform: ruby
 authors:
 - Satoshi Akama
@@ -9,78 +9,78 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-10-12 00:00:00.000000000 Z
+date: 2017-02-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: google-api-client
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+  name: google-api-client
   prerelease: false
   type: :runtime
-- !ruby/object:Gem::Dependency
-  name: tzinfo
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+  name: time_with_zone
   prerelease: false
   type: :runtime
-- !ruby/object:Gem::Dependency
-  name: embulk
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.8.2
+        version: '0'
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: 0.8.2
+  name: embulk
   prerelease: false
   type: :development
-- !ruby/object:Gem::Dependency
-  name: bundler
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 1.10.6
+        version: 0.8.2
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: 1.10.6
+  name: bundler
   prerelease: false
   type: :development
-- !ruby/object:Gem::Dependency
-  name: rake
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: 1.10.6
+- !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
         version: '10.0'
+  name: rake
   prerelease: false
   type: :development
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '10.0'
 description: Embulk plugin that insert records to Google BigQuery.
 email:
 - satoshiakama@gmail.com