tumugi-plugin-bigquery 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 3ab49f7c9e04361951e1f7ab9eff4019d09c8829
4
- data.tar.gz: 1b5ed8b53e77fb413a98ce6cfe4a682611b33f62
3
+ metadata.gz: 1f82d5d752da3918795afc6cc669a0fb4711cf95
4
+ data.tar.gz: fed486ae8aeb9266d4fd11cf523a19a8507755af
5
5
  SHA512:
6
- metadata.gz: ea2e037bc46885c0e7cec71d085a5883dd9508f597d6e98d275b11464d2a9f27596a26cfd4a9141f51090b9ab2c9a96df63eb27c59ebba12113c00e2cac4d239
7
- data.tar.gz: 1e18b4578eda83a38d200896cf0b8e22a85726b08f956bd20e9e794d81322df18416004d58f977e78b053b3609e559064d04cfe683ab1ecc60d32e8f89f34067
6
+ metadata.gz: 8418f29dfe96d38bcdfa0c5098d59efd819edaa715a7e2af0945c57f70a4d08fa5c477560df636389eefe0bcf40712c4d240b2a97d3301cadebf6e5615808f2b
7
+ data.tar.gz: aa34ee20fdec506277f3ac40b8819b62f7d000db7c802ed4c9fcedd5af1bf33644ecf673b10326c8e64f8d554a9a708cdfc0f9ae7942b5570257adec3108ab28
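The `checksums.yaml` entries above are hex digests of the two archives inside the gem. As a minimal sketch of how such a digest could be checked with Ruby's standard `digest` library (the helper name `checksum_matches?` and the in-memory payload are illustrative assumptions, not part of the gem):

```ruby
require 'digest'

# Hypothetical helper: compare raw bytes against an expected SHA512 hex digest.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA512.hexdigest(bytes) == expected_hex.downcase
end

# Stand-in for File.binread("data.tar.gz"); a real check would read the archive.
payload  = "example archive bytes"
expected = Digest::SHA512.hexdigest(payload)

puts checksum_matches?(payload, expected)
```

For a real archive, `expected` would be the `data.tar.gz` value listed under `SHA512:`.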
data/.travis.yml CHANGED
@@ -1,7 +1,9 @@
1
1
  language: ruby
2
2
  cache: bundler
3
3
  rvm:
4
- - 2.1.10
5
- - 2.2.5
6
- - 2.3.1
7
- - jruby-9.0.5.0
4
+ - 2.1
5
+ - 2.3.1
6
+ - jruby-9.1.2.0
7
+ before_install:
8
+ - gem install bundler
9
+
data/CHANGELOG.md ADDED
@@ -0,0 +1,48 @@
1
+ # Change Log
2
+
3
+ ## [0.2.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/0.2.0) (2016-06-06)
4
+ [Full Changelog](https://github.com/tumugi/tumugi-plugin-bigquery/compare/v0.1.0...0.2.0)
5
+
6
+ **Implemented enhancements:**
7
+
8
+ - Support extract table to FileSystemTarget [\#23](https://github.com/tumugi/tumugi-plugin-bigquery/issues/23)
9
+ - Support load from GCS [\#5](https://github.com/tumugi/tumugi-plugin-bigquery/issues/5)
10
+ - Support extract table to Google Cloud Storage [\#4](https://github.com/tumugi/tumugi-plugin-bigquery/issues/4)
11
+ - Support service account application default auth [\#22](https://github.com/tumugi/tumugi-plugin-bigquery/pull/22) ([hakobera](https://github.com/hakobera))
12
+
13
+ **Fixed bugs:**
14
+
15
+ - Fix typo and dependency [\#24](https://github.com/tumugi/tumugi-plugin-bigquery/pull/24) ([hakobera](https://github.com/hakobera))
16
+ - Fix missing project\_id of dataset/table [\#21](https://github.com/tumugi/tumugi-plugin-bigquery/pull/21) ([hakobera](https://github.com/hakobera))
17
+ - Fix private key file auth does not work [\#19](https://github.com/tumugi/tumugi-plugin-bigquery/pull/19) ([hakobera](https://github.com/hakobera))
18
+ - Fix support private key file in config section [\#13](https://github.com/tumugi/tumugi-plugin-bigquery/pull/13) ([hakobera](https://github.com/hakobera))
19
+
20
+ **Closed issues:**
21
+
22
+ - Update tumugi to v0.5.0 [\#8](https://github.com/tumugi/tumugi-plugin-bigquery/issues/8)
23
+
24
+ **Merged pull requests:**
25
+
26
+ - Cache output [\#26](https://github.com/tumugi/tumugi-plugin-bigquery/pull/26) ([hakobera](https://github.com/hakobera))
27
+ - Prepare release for 0.2.0 [\#25](https://github.com/tumugi/tumugi-plugin-bigquery/pull/25) ([hakobera](https://github.com/hakobera))
28
+ - Use Thor's invoke instead of system method [\#18](https://github.com/tumugi/tumugi-plugin-bigquery/pull/18) ([hakobera](https://github.com/hakobera))
29
+ - Change test ruby version [\#17](https://github.com/tumugi/tumugi-plugin-bigquery/pull/17) ([hakobera](https://github.com/hakobera))
30
+ - Change tumugi dependency version [\#16](https://github.com/tumugi/tumugi-plugin-bigquery/pull/16) ([hakobera](https://github.com/hakobera))
31
+ - Implement extract table to google cloud storage feature [\#15](https://github.com/tumugi/tumugi-plugin-bigquery/pull/15) ([hakobera](https://github.com/hakobera))
32
+ - Add BigqueryLoadTask [\#12](https://github.com/tumugi/tumugi-plugin-bigquery/pull/12) ([hakobera](https://github.com/hakobera))
33
+ - Update dependency gems [\#11](https://github.com/tumugi/tumugi-plugin-bigquery/pull/11) ([hakobera](https://github.com/hakobera))
34
+ - Update tumugi to v0.5.0 [\#9](https://github.com/tumugi/tumugi-plugin-bigquery/pull/9) ([hakobera](https://github.com/hakobera))
35
+ - Add rubygems badge [\#3](https://github.com/tumugi/tumugi-plugin-bigquery/pull/3) ([hakobera](https://github.com/hakobera))
36
+
37
+ ## [v0.1.0](https://github.com/tumugi/tumugi-plugin-bigquery/tree/v0.1.0) (2016-05-16)
38
+ **Fixed bugs:**
39
+
40
+ - Fix unused arguments [\#2](https://github.com/tumugi/tumugi-plugin-bigquery/pull/2) ([hakobera](https://github.com/hakobera))
41
+
42
+ **Merged pull requests:**
43
+
44
+ - First implementation [\#1](https://github.com/tumugi/tumugi-plugin-bigquery/pull/1) ([hakobera](https://github.com/hakobera))
45
+
46
+
47
+
48
+ \* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- [![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery)
1
+ [![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [![Gem Version](https://badge.fury.io/rb/tumugi-plugin-bigquery.svg)](https://badge.fury.io/rb/tumugi-plugin-bigquery)
2
2
 
3
3
  # tumugi-plugin-bigquery
4
4
 
@@ -68,6 +68,8 @@ end
68
68
 
69
69
  #### Usage
70
70
 
71
+ Copy `test.src_table` to `test.dest_table`.
72
+
71
73
  ```rb
72
74
  task :task1, type: :bigquery_copy do
73
75
  param_set :src_dataset_id, 'test'
@@ -77,6 +79,24 @@ task :task1, type: :bigquery_copy do
77
79
  end
78
80
  ```
79
81
 
82
+ ### Tumugi::Plugin::BigqueryLoadTask
83
+
84
+ `Tumugi::Plugin::BigqueryLoadTask` is a task that loads structured data from GCS into BigQuery.
85
+
86
+ #### Usage
87
+
88
+ Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`.
89
+
90
+ ```rb
91
+ task :task1, type: :bigquery_load do
92
+ param_set :bucket, 'test_bucket'
93
+ param_set :key, 'load_data.csv'
94
+ param_set :project_id, 'dest_project'
95
+ param_set :dataset_id, 'dest_dataset'
96
+ param_set :table_id, 'dest_table'
97
+ end
98
+ ```
99
+
80
100
  ### Config Section
81
101
 
82
102
  tumugi-plugin-bigquery provides a config section named "bigquery" that specifies BigQuery authentication info.
@@ -84,7 +104,7 @@ tumugi-plugin-bigquery provide config section named "bigquery" which can specifi
84
104
  #### Authenticate by client_email and private_key
85
105
 
86
106
  ```rb
87
- Tumugi.config do |config|
107
+ Tumugi.configure do |config|
88
108
  config.section("bigquery") do |section|
89
109
  section.project_id = "xxx"
90
110
  section.client_email = "yyy@yyy.iam.gserviceaccount.com"
@@ -96,7 +116,7 @@ end
96
116
  #### Authenticate by JSON key file
97
117
 
98
118
  ```rb
99
- Tumugi.config do |config|
119
+ Tumugi.configure do |config|
100
120
  config.section("bigquery") do |section|
101
121
  section.private_key_file = "/path/to/key.json"
102
122
  end
data/examples/load.rb ADDED
@@ -0,0 +1,24 @@
1
+ task :task1, type: :bigquery_load do
2
+ requires :task2
3
+ param_set :bucket, 'tumugi-plugin-bigquery'
4
+ param_set :key, 'test.csv'
5
+ param_set :dataset_id, -> { input.dataset_id }
6
+ param_set :table_id, 'load_test'
7
+ param_set :skip_leading_rows, 1
8
+ param_set :schema, [
9
+ {
10
+ name: 'row_number',
11
+ type: 'INTEGER',
12
+ mode: 'NULLABLE'
13
+ },
14
+ {
15
+ name: 'value',
16
+ type: 'INTEGER',
17
+ mode: 'NULLABLE'
18
+ },
19
+ ]
20
+ end
21
+
22
+ task :task2, type: :bigquery_dataset do
23
+ param_set :dataset_id, 'test'
24
+ end
data/examples/test.csv ADDED
@@ -0,0 +1,6 @@
1
+ row_number,value
2
+ 1,1
3
+ 2,2
4
+ 3,3
5
+ 4,4
6
+ 5,5
@@ -1,7 +1,7 @@
1
- Tumugi.config do |c|
2
- c.section('bigquery') do |s|
3
- s.project_id = ENV["PROJECT_ID"]
4
- s.client_email = ENV["CLIENT_EMAIL"]
5
- s.private_key = ENV["PRIVATE_KEY"].gsub(/\\n/, "\n")
1
+ Tumugi.configure do |config|
2
+ config.section('bigquery') do |section|
3
+ section.project_id = ENV["PROJECT_ID"]
4
+ section.client_email = ENV["CLIENT_EMAIL"]
5
+ section.private_key = ENV["PRIVATE_KEY"].gsub(/\\n/, "\n")
6
6
  end
7
7
  end
@@ -1,4 +1,5 @@
1
1
  require 'kura'
2
+ require 'json'
2
3
  require_relative './error'
3
4
 
4
5
  Tumugi::Config.register_section('bigquery', :project_id, :client_email, :private_key, :private_key_file)
@@ -9,12 +10,22 @@ module Tumugi
9
10
  class Client
10
11
  attr_reader :project_id
11
12
 
12
- def initialize(project_id: nil, client_email: nil, private_key: nil)
13
- config = Tumugi.config.section('bigquery')
14
- @project_id = project_id || config.project_id
15
- @client_email = client_email || config.client_email
16
- @private_key = private_key || config.private_key
17
- @client = Kura.client(@project_id, @client_email, @private_key)
13
+ def initialize(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
14
+ @project_id = project_id
15
+
16
+ if client_email.nil? && private_key.nil? && !private_key_file.nil?
17
+ @client = Kura.client(private_key_file)
18
+ if @project_id.nil?
19
+ key = JSON.parse(File.read(private_key_file))
20
+ @project_id = key['project_id']
21
+ end
22
+ else
23
+ # This method call style is needed for jruby.
24
+ # JRuby cannot correctly handle a method that takes both keyword arguments and a trailing hash argument.
25
+ # see https://bugs.ruby-lang.org/issues/7529
26
+ @client = Kura.client(project_id = { "project_id" => @project_id, "client_email" => client_email, "private_key" => private_key },
27
+ client_email = nil, private_key = nil, {http_options: {timeout: 60}})
28
+ end
18
29
  rescue Kura::ApiError => e
19
30
  process_error(e)
20
31
  end
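The new `initialize` branch above uses the JSON key file only when neither `client_email` nor `private_key` is given, and falls back to the key file's `project_id` when none was passed. A minimal sketch of that decision, assuming a hypothetical helper `resolve_project_id` that mirrors the branch without calling Kura:

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper mirroring the branch logic; it does not create a Kura client.
def resolve_project_id(project_id: nil, client_email: nil, private_key: nil, private_key_file: nil)
  use_key_file = client_email.nil? && private_key.nil? && !private_key_file.nil?
  # An explicit project_id always wins; the key file is only a fallback.
  return project_id unless use_key_file && project_id.nil?

  JSON.parse(File.read(private_key_file))['project_id']
end

# Illustrative key file containing only the field this sketch reads.
key_file = Tempfile.new(['key', '.json'])
key_file.write(JSON.generate('project_id' => 'from-key-file'))
key_file.flush

puts resolve_project_id(private_key_file: key_file.path)
```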
@@ -77,6 +88,12 @@ module Tumugi
77
88
  process_error(e)
78
89
  end
79
90
 
91
+ def table(dataset_id, table_id, project_id: nil)
92
+ @client.table(dataset_id, table_id, project_id: project_id || @project_id)
93
+ rescue Kura::ApiError => e
94
+ process_error(e)
95
+ end
96
+
80
97
  def table_exist?(dataset_id, table_id, project_id: nil)
81
98
  !@client.table(dataset_id, table_id, project_id: project_id || @project_id).nil?
82
99
  rescue Kura::ApiError => e
@@ -163,6 +180,7 @@ module Tumugi
163
180
  use_query_cache: true,
164
181
  user_defined_function_resources: nil,
165
182
  project_id: nil,
183
+ job_project_id: nil,
166
184
  job_id: nil,
167
185
  wait: nil,
168
186
  dry_run: false,
@@ -175,7 +193,7 @@ module Tumugi
175
193
  use_query_cache: use_query_cache,
176
194
  user_defined_function_resources: user_defined_function_resources,
177
195
  project_id: project_id || @project_id,
178
- job_project_id: project_id || @project_id,
196
+ job_project_id: job_project_id || @project_id,
179
197
  job_id: job_id,
180
198
  wait: wait,
181
199
  dry_run: dry_run,
@@ -185,28 +203,38 @@ module Tumugi
185
203
  end
186
204
 
187
205
  def load(dataset_id, table_id, source_uris=nil,
188
- schema: nil, delimiter: ",", field_delimiter: delimiter, mode: :append,
189
- allow_jagged_rows: false, max_bad_records: 0,
206
+ schema: nil,
207
+ field_delimiter: ",",
208
+ mode: :append,
209
+ allow_jagged_rows: false,
210
+ max_bad_records: 0,
190
211
  ignore_unknown_values: false,
191
212
  allow_quoted_newlines: false,
192
- quote: '"', skip_leading_rows: 0,
213
+ quote: '"',
214
+ skip_leading_rows: 0,
193
215
  source_format: "CSV",
194
216
  project_id: nil,
217
+ job_project_id: nil,
195
218
  job_id: nil,
196
219
  file: nil, wait: nil,
197
220
  dry_run: false,
198
221
  &blk)
199
222
  @client.load(dataset_id, table_id, source_uris=source_uris,
200
- schema: schema, delimiter: delimiter, field_delimiter: field_delimiter, mode: mode,
201
- allow_jagged_rows: allow_jagged_rows, max_bad_records: max_bad_records,
223
+ schema: schema,
224
+ field_delimiter: field_delimiter,
225
+ mode: mode,
226
+ allow_jagged_rows: allow_jagged_rows,
227
+ max_bad_records: max_bad_records,
202
228
  ignore_unknown_values: ignore_unknown_values,
203
229
  allow_quoted_newlines: allow_quoted_newlines,
204
- quote: quote, skip_leading_rows: skip_leading_rows,
230
+ quote: quote,
231
+ skip_leading_rows: skip_leading_rows,
205
232
  source_format: source_format,
206
233
  project_id: project_id || @project_id,
207
- job_project_id: project_id || @project_id,
234
+ job_project_id: job_project_id || @project_id,
208
235
  job_id: job_id,
209
- file: file, wait: wait,
236
+ file: file,
237
+ wait: wait,
210
238
  dry_run: dry_run,
211
239
  &blk)
212
240
  rescue Kura::ApiError => e
@@ -219,6 +247,7 @@ module Tumugi
219
247
  field_delimiter: ",",
220
248
  print_header: true,
221
249
  project_id: nil,
250
+ job_project_id: nil,
222
251
  job_id: nil,
223
252
  wait: nil,
224
253
  dry_run: false,
@@ -229,7 +258,7 @@ module Tumugi
229
258
  field_delimiter: field_delimiter,
230
259
  print_header: print_header,
231
260
  project_id: project_id || @project_id,
232
- job_project_id: project_id || @project_id,
261
+ job_project_id: job_project_id || @project_id,
233
262
  job_id: job_id,
234
263
  wait: wait,
235
264
  dry_run: dry_run,
@@ -242,6 +271,7 @@ module Tumugi
242
271
  mode: :truncate,
243
272
  src_project_id: nil,
244
273
  dest_project_id: nil,
274
+ job_project_id: dest_project_id,
245
275
  job_id: nil,
246
276
  wait: nil,
247
277
  dry_run: false,
@@ -250,7 +280,7 @@ module Tumugi
250
280
  mode: mode,
251
281
  src_project_id: src_project_id || @project_id,
252
282
  dest_project_id: dest_project_id || @project_id,
253
- job_project_id: dest_project_id || @project_id,
283
+ job_project_id: job_project_id || @project_id,
254
284
  job_id: job_id,
255
285
  wait: wait,
256
286
  dry_run: dry_run,
@@ -280,7 +310,7 @@ module Tumugi
280
310
  private
281
311
 
282
312
  def process_error(e)
283
- raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.reason, e.message)
313
+ raise Tumugi::Plugin::Bigquery::BigqueryError.new(e.message, e.reason)
284
314
  end
285
315
  end
286
316
  end
@@ -1,7 +1,7 @@
1
1
  module Tumugi
2
2
  module Plugin
3
3
  module Bigquery
4
- VERSION = "0.1.0"
4
+ VERSION = "0.2.0"
5
5
  end
6
6
  end
7
7
  end
@@ -17,8 +17,8 @@ module Tumugi
17
17
  cfg = Tumugi.config.section('bigquery')
18
18
  @project_id = project_id || cfg.project_id
19
19
  @dataset_id = dataset_id
20
- @client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
21
- @dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @project_id, dataset_id: @dataset_id)
20
+ @client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
21
+ @dataset = Tumugi::Plugin::Bigquery::Dataset.new(project_id: @client.project_id, dataset_id: @dataset_id)
22
22
  end
23
23
 
24
24
  def exist?
@@ -18,8 +18,8 @@ module Tumugi
18
18
  @project_id = project_id || cfg.project_id
19
19
  @dataset_id = dataset_id
20
20
  @table_id = table_id
21
- @client = client || Tumugi::Plugin::Bigquery::Client.new(project_id: @project_id)
22
- @table = Tumugi::Plugin::Bigquery::Table.new(project_id: @project_id, dataset_id: @dataset_id, table_id: @table_id)
21
+ @client = client || Tumugi::Plugin::Bigquery::Client.new(cfg.to_h.merge(project_id: @project_id))
22
+ @table = Tumugi::Plugin::Bigquery::Table.new(project_id: @client.project_id, dataset_id: @dataset_id, table_id: @table_id)
23
23
  end
24
24
 
25
25
  def exist?
@@ -15,9 +15,11 @@ module Tumugi
15
15
  param :wait, type: :int, default: 60
16
16
 
17
17
  def output
18
+ return @output if @output
19
+
18
20
  opts = { dataset_id: dest_dataset_id, table_id: dest_table_id }
19
21
  opts[:project_id] = dest_project_id if dest_project_id
20
- Tumugi::Plugin::BigqueryTableTarget.new(opts)
22
+ @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
21
23
  end
22
24
 
23
25
  def run
@@ -10,7 +10,7 @@ module Tumugi
10
10
  param :dataset_id, type: :string, required: true
11
11
 
12
12
  def output
13
- Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
13
+ @output ||= Tumugi::Plugin::BigqueryDatasetTarget.new(project_id: project_id, dataset_id: dataset_id)
14
14
  end
15
15
 
16
16
  def run
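The `output` changes in these task hunks ("Cache output", #26) memoize the target so repeated calls return the same object. A minimal sketch of the `||=` pattern, using a hypothetical `ExampleTask` class and a plain `Object` in place of a real target:

```ruby
# Hypothetical task class; build_target stands in for constructing a real target.
class ExampleTask
  attr_reader :build_count

  def initialize
    @build_count = 0
  end

  # The target is built on the first call and reused afterwards.
  def output
    @output ||= build_target
  end

  private

  def build_target
    @build_count += 1
    Object.new
  end
end

task   = ExampleTask.new
first  = task.output
second = task.output

puts first.equal?(second)
puts task.build_count
```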
@@ -0,0 +1,112 @@
1
+ require 'json'
2
+ require 'tumugi'
3
+ require 'tumugi/plugin/file_system_target'
4
+ require_relative '../target/bigquery_table'
5
+
6
+ module Tumugi
7
+ module Plugin
8
+ class BigqueryExportTask < Tumugi::Task
9
+ Tumugi::Plugin.register_task('bigquery_export', self)
10
+
11
+ param :project_id, type: :string
12
+ param :job_project_id, type: :string
13
+ param :dataset_id, type: :string, required: true
14
+ param :table_id, type: :string, required: true
15
+
16
+ param :compression, type: :string, default: 'NONE' # GZIP
17
+ param :destination_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
18
+
19
+ # Only takes effect if destination_format == 'CSV'
20
+ param :field_delimiter, type: :string, default: ','
21
+ param :print_header, type: :bool, default: true
22
+
23
+ param :page_size, type: :integer, default: 10000
24
+
25
+ param :wait, type: :integer, default: 120
26
+
27
+ def run
28
+ unless output.is_a?(Tumugi::Plugin::FileSystemTarget)
29
+ raise Tumugi::TumugiError.new("BigqueryExportTask#output must return an instance of Tumugi::Plugin::FileSystemTarget")
30
+ end
31
+
32
+ client = Tumugi::Plugin::Bigquery::Client.new(config)
33
+ table = Tumugi::Plugin::Bigquery::Table.new(project_id: client.project_id, dataset_id: dataset_id, table_id: table_id)
34
+ job_project_id = client.project_id if job_project_id.nil?
35
+
36
+ log "Source: #{table}"
37
+ log "Destination: #{output}"
38
+
39
+ if is_gcs?(output)
40
+ export_to_gcs(client)
41
+ else
42
+ if destination_format.upcase == 'AVRO'
43
+ raise Tumugi::TumugiError.new("destination_format='AVRO' is only supported when export to Google Cloud Storage")
44
+ end
45
+ if compression.upcase == 'GZIP'
46
+ logger.warn("compression parameter is ignored, it's only supported when export to Google Cloud Storage")
47
+ end
48
+ export_to_file_system(client)
49
+ end
50
+ end
51
+
52
+ private
53
+
54
+ def is_gcs?(target)
55
+ not target.to_s.match(/^gs:\/\/[^\/]+\/.+$/).nil?
56
+ end
57
+
58
+ def export_to_gcs(client)
59
+ options = {
60
+ compression: compression.upcase,
61
+ destination_format: destination_format.upcase,
62
+ field_delimiter: field_delimiter,
63
+ print_header: print_header,
64
+ project_id: client.project_id,
65
+ job_project_id: job_project_id || client.project_id,
66
+ wait: wait
67
+ }
68
+ client.extract(dataset_id, table_id, output.to_s, options)
69
+ end
70
+
71
+ def export_to_file_system(client)
72
+ schema ||= client.table(dataset_id, table_id, project_id: client.project_id).schema.fields
73
+ field_names = schema.map{|f| f.respond_to?(:[]) ? (f["name"] || f[:name]) : f.name }
74
+ start_index = 0
75
+ page_token = nil
76
+ options = {
77
+ max_result: page_size,
78
+ project_id: client.project_id,
79
+ }
80
+
81
+ output.open('w') do |file|
82
+ file.puts field_names.join(field_delimiter) if destination_format == 'CSV' && print_header
83
+ begin
84
+ table_data_list = client.list_tabledata(dataset_id, table_id, options.merge(start_index: start_index, page_token: page_token))
85
+ start_index += page_size
86
+ page_token = table_data_list[:next_token]
87
+ table_data_list[:rows].each do |row|
88
+ file.puts line(field_names, row, destination_format)
89
+ end
90
+ end while not page_token.nil?
91
+ end
92
+ end
93
+
94
+ def line(field_names, row, format)
95
+ case format
96
+ when 'CSV'
97
+ row.map{|v| v[1]}.join(field_delimiter)
98
+ when 'NEWLINE_DELIMITED_JSON'
99
+ JSON.generate(row.to_h)
100
+ end
101
+ end
102
+
103
+ def config
104
+ cfg = Tumugi.config.section('bigquery').to_h
105
+ unless project_id.nil?
106
+ cfg[:project_id] = project_id
107
+ end
108
+ cfg
109
+ end
110
+ end
111
+ end
112
+ end
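The export task above chooses between a GCS extract and a local file export by matching the output against a `gs://` pattern, then formats each row as CSV or newline-delimited JSON. A minimal sketch of those two steps, assuming each row is a `Hash` of column name to value; the names `gcs_uri?` and `format_row` are illustrative, and the pattern here uses `\A`/`\z` anchors rather than the `^`/`$` in the source:

```ruby
require 'json'

# Matches "gs://<bucket>/<path>" for the whole string.
GCS_URI = %r{\Ags://[^/]+/.+\z}

def gcs_uri?(target)
  GCS_URI.match?(target.to_s)
end

# Assumes row is a Hash; values are joined for CSV, the whole Hash is serialized for JSON.
def format_row(row, format, field_delimiter: ',')
  case format
  when 'CSV' then row.values.join(field_delimiter)
  when 'NEWLINE_DELIMITED_JSON' then JSON.generate(row)
  else raise ArgumentError, "unsupported format: #{format}"
  end
end

row = { 'row_number' => 1, 'value' => 2 }
puts gcs_uri?('gs://bucket/path/file.csv')
puts format_row(row, 'CSV')
puts format_row(row, 'NEWLINE_DELIMITED_JSON')
```

Note that joining values with a delimiter does not quote fields that themselves contain the delimiter; Ruby's `csv` library would be the safer choice for such data.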
@@ -0,0 +1,73 @@
1
+ require 'tumugi'
2
+ require_relative '../target/bigquery_table'
3
+
4
+ module Tumugi
5
+ module Plugin
6
+ class BigqueryLoadTask < Tumugi::Task
7
+ Tumugi::Plugin.register_task('bigquery_load', self)
8
+
9
+ param :bucket, type: :string, required: true
10
+ param :key, type: :string, required: true
11
+ param :project_id, type: :string
12
+ param :dataset_id, type: :string, required: true
13
+ param :table_id, type: :string, required: true
14
+
15
+ param :schema # type: :array
16
+ param :field_delimiter, type: :string, default: ','
17
+ param :mode, type: :string, default: 'append' # truncate, empty
18
+ param :allow_jagged_rows, type: :bool, default: false
19
+ param :max_bad_records, type: :integer, default: 0
20
+ param :ignore_unknown_values, type: :bool, default: false
21
+ param :allow_quoted_newlines, type: :bool, default: false
22
+ param :quote, type: :string, default: '"'
23
+ param :skip_leading_rows, type: :integer, default: 0
24
+ param :source_format, type: :string, default: 'CSV' # NEWLINE_DELIMITED_JSON, AVRO
25
+ param :wait, type: :integer, default: 60
26
+
27
+ def output
28
+ return @output if @output
29
+
30
+ opts = { dataset_id: dataset_id, table_id: table_id }
31
+ opts[:project_id] = project_id if project_id
32
+ @output = Tumugi::Plugin::BigqueryTableTarget.new(opts)
33
+ end
34
+
35
+ def run
36
+ if mode != 'append'
37
+ raise Tumugi::ParameterError.new("Parameter 'schema' is required when 'mode' is 'truncate' or 'empty'") if schema.nil?
38
+ end
39
+
40
+ src_uri = "gs://#{bucket}#{normalize_path(key)}"
41
+ log "Source: #{src_uri}"
42
+ log "Destination: #{output}"
43
+
44
+ bq_client = output.client
45
+ opts = {
46
+ schema: schema,
47
+ field_delimiter: field_delimiter,
48
+ mode: mode.to_sym,
49
+ allow_jagged_rows: allow_jagged_rows,
50
+ max_bad_records: max_bad_records,
51
+ ignore_unknown_values: ignore_unknown_values,
52
+ allow_quoted_newlines: allow_quoted_newlines,
53
+ quote: quote,
54
+ skip_leading_rows: skip_leading_rows,
55
+ source_format: source_format,
56
+ project_id: output.project_id,
57
+ wait: wait
58
+ }
59
+ bq_client.load(output.dataset_id, output.table_id, src_uri, opts)
60
+ end
61
+
62
+ private
63
+
64
+ def normalize_path(path)
65
+ unless path.start_with?('/')
66
+ "/#{path}"
67
+ else
68
+ path
69
+ end
70
+ end
71
+ end
72
+ end
73
+ end
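The load task above builds its source URI from the `bucket` and `key` parameters, adding a leading slash to the key when it is missing. A minimal standalone sketch of that construction (the wrapper name `gcs_source_uri` is an illustrative assumption):

```ruby
# Same normalization as the task: ensure the key starts with exactly one added slash.
def normalize_path(path)
  path.start_with?('/') ? path : "/#{path}"
end

# Hypothetical wrapper combining bucket and key into a gs:// URI.
def gcs_source_uri(bucket, key)
  "gs://#{bucket}#{normalize_path(key)}"
end

puts gcs_source_uri('test_bucket', 'load_data.csv')
puts gcs_source_uri('test_bucket', '/load_data.csv')
```

Both calls produce the same URI, so a key can be written with or without the leading slash.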
@@ -13,7 +13,7 @@ module Tumugi
13
13
  param :wait, type: :int, default: 60
14
14
 
15
15
  def output
16
- Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
16
+ @output ||= Tumugi::Plugin::BigqueryTableTarget.new(project_id: project_id, dataset_id: dataset_id, table_id: table_id)
17
17
  end
18
18
 
19
19
  def run
@@ -20,8 +20,8 @@ Gem::Specification.new do |spec|
20
20
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
21
  spec.require_paths = ["lib"]
22
22
 
23
- spec.add_runtime_dependency "tumugi", "~> 0.4.5"
24
- spec.add_runtime_dependency "kura", "0.2.16"
23
+ spec.add_runtime_dependency "tumugi", ">= 0.5.1"
24
+ spec.add_runtime_dependency "kura", "~> 0.2.17"
25
25
 
26
26
  spec.add_development_dependency 'bundler', '~> 1.11'
27
27
  spec.add_development_dependency 'rake', '~> 10.0'
@@ -29,4 +29,5 @@ Gem::Specification.new do |spec|
29
29
  spec.add_development_dependency 'test-unit-rr'
30
30
  spec.add_development_dependency 'coveralls'
31
31
  spec.add_development_dependency 'github_changelog_generator'
32
+ spec.add_development_dependency 'tumugi-plugin-google_cloud_storage'
32
33
  end
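The gemspec hunk above changes the runtime constraints to `>= 0.5.1` for tumugi and `~> 0.2.17` for kura. As a sketch of what those operators accept, RubyGems' own `Gem::Requirement` can be queried directly (the version numbers tested below are arbitrary examples, not actual releases):

```ruby
require 'rubygems'

kura   = Gem::Requirement.new('~> 0.2.17') # >= 0.2.17 and < 0.3
tumugi = Gem::Requirement.new('>= 0.5.1')  # no upper bound

puts kura.satisfied_by?(Gem::Version.new('0.2.20'))
puts kura.satisfied_by?(Gem::Version.new('0.3.0'))
puts tumugi.satisfied_by?(Gem::Version.new('1.0.0'))
```

The pessimistic operator keeps kura within the 0.2.x series, while the tumugi constraint would also admit any later major version.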
metadata CHANGED
@@ -1,43 +1,43 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: tumugi-plugin-bigquery
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Kazuyuki Honda
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-05-16 00:00:00.000000000 Z
11
+ date: 2016-06-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: tumugi
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - "~>"
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
- version: 0.4.5
19
+ version: 0.5.1
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - "~>"
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
- version: 0.4.5
26
+ version: 0.5.1
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: kura
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - '='
31
+ - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: 0.2.16
33
+ version: 0.2.17
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - '='
38
+ - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: 0.2.16
40
+ version: 0.2.17
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: bundler
43
43
  requirement: !ruby/object:Gem::Requirement
@@ -122,6 +122,20 @@ dependencies:
122
122
  - - ">="
123
123
  - !ruby/object:Gem::Version
124
124
  version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: tumugi-plugin-google_cloud_storage
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :development
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
125
139
  description:
126
140
  email:
127
141
  - hakobera@gmail.com
@@ -131,13 +145,16 @@ extra_rdoc_files: []
131
145
  files:
132
146
  - ".gitignore"
133
147
  - ".travis.yml"
148
+ - CHANGELOG.md
134
149
  - Gemfile
135
150
  - README.md
136
151
  - Rakefile
137
152
  - bin/setup
138
153
  - examples/copy.rb
139
154
  - examples/dataset.rb
155
+ - examples/load.rb
140
156
  - examples/query.rb
157
+ - examples/test.csv
141
158
  - examples/tumugi_config_example.rb
142
159
  - lib/tumugi/plugin/bigquery/client.rb
143
160
  - lib/tumugi/plugin/bigquery/dataset.rb
@@ -148,6 +165,8 @@ files:
148
165
  - lib/tumugi/plugin/target/bigquery_table.rb
149
166
  - lib/tumugi/plugin/task/bigquery_copy.rb
150
167
  - lib/tumugi/plugin/task/bigquery_dataset.rb
168
+ - lib/tumugi/plugin/task/bigquery_export.rb
169
+ - lib/tumugi/plugin/task/bigquery_load.rb
151
170
  - lib/tumugi/plugin/task/bigquery_query.rb
152
171
  - tumugi-plugin-bigquery.gemspec
153
172
  homepage: https://github.com/tumugi/tumugi-plugin-bigquery