embulk-output-bigquery 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 46c61dd1c73ff99c3c69bd217ca772f07b2e1127
-   data.tar.gz: ba184360972884260c1fe90264af7d5386791804
+   metadata.gz: b37d638ca9c217221687cdcadfbd45257291aef4
+   data.tar.gz: 972bf78e9ce75972fd3f2e1f77389a7383c3d2a0
  SHA512:
-   metadata.gz: aa693e59cb4b45c2d43f07479f3d61e63242185be9964d4f00b83a4a784a0443ae270a63760f3f2f188e74deb77cbb94a89a18db49d2c5cd4621f18b73363ab3
-   data.tar.gz: 7c0ea783220de28befd7c565ff83ec5ff58f13af0db16b3d341a12c3e415adeacba375e5688a42fcbb26d0402a48071622ed5b161fa52fd08b1f56444faf66e1
+   metadata.gz: 6d18639e76da80f45e2852df8408ec4c9c655e06a77a776c42dfb52310cc787e8319fd8f182eef20c59ffd01e5939ccb3799b4a7bad5b20f416aa668d513b3e3
+   data.tar.gz: 5a37cded1558ba6f3fbb1d4c475c46a593151a32e831299c00c386185d32828ec72810c2334863f7e03cdc7a3b2d974e7d1ec0cf653276073e7d336398ae92ba
data/README.md CHANGED
@@ -1,7 +1,7 @@
 
  # embulk-output-bigquery
 
- [Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/)
+ [Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/) using [direct insert](https://cloud.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest)
 
  ## Overview
 
@@ -16,7 +16,7 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
  ### NOT IMPLEMENTED
  * insert data over streaming inserts
    * for continuous real-time insertions
-   * Pleast use other product, like [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)
+   * Please use other product, like [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)
    * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
 
  Current version of this plugin supports Google API with Service Account Authentication, but does not support
@@ -24,8 +24,9 @@ OAuth flow for installed applications.
 
  ## Configuration
 
- - **service_account_email**: your Google service account email (string, required)
- - **p12_keyfile_path**: fullpath of private key in P12(PKCS12) format (string, required)
+ - **auth_method**: (private_key or compute_engine) (string, optional, default is private_key)
+ - **service_account_email**: your Google service account email (string, required when auth_method is private_key)
+ - **p12_keyfile_path**: fullpath of private key in P12(PKCS12) format (string, required when auth_method is private_key)
  - **path_prefix**: (string, required)
  - **sequence_format**: (string, optional, default is %03d.%02d)
  - **file_ext**: (string, required)
@@ -42,13 +43,14 @@ OAuth flow for installed applications.
  - **is_skip_job_result_check**: (boolean, optional, default is 0)
  - **field_delimiter**: (string, optional, default is ",")
  - **max_bad_records**: (int, optional, default is 0)
- - **encoding**: (UTF-8 or ISO-8859-1) (string, optional, default is "UTF-8")
+ - **encoding**: (UTF-8 or ISO-8859-1) (string, optional, default is UTF-8)
 
- ## Example
+ ### Example
 
  ```yaml
  out:
    type: bigquery
+   auth_method: private_key # default
    service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
    p12_keyfile_path: /path/to/p12_keyfile.p12
    path_prefix: /path/to/output
@@ -64,19 +66,58 @@ out:
    - {type: gzip}
  ```
 
- ## Dynamic table creating
+ ### Authentication
+
+ There are two methods supported to fetch access token for the service account.
+
+ 1. Public-Private key pair
+ 2. Predefined access token (Compute Engine only)
+
+ The examples above use the first one. You first need to create a service account (client ID),
+ download its private key and deploy the key with embulk.
+
+ On the other hand, you don't need to explicitly create a service account for embulk when you
+ run embulk in Google Compute Engine. In this second authentication method, you need to
+ add the API scope "https://www.googleapis.com/auth/bigquery" to the scope list of your
+ Compute Engine instance, then you can configure embulk like this.
+
+ ```yaml
+ out:
+   type: bigquery
+   auth_method: compute_engine
+ ```
+
+ ### Table id formatting
+
+ `table` and option accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
+ format to construct table ids.
+ Table ids are formatted at runtime
+ using the local time of the embulk server.
+
+ For example, with the configuration below,
+ data is inserted into tables `table_2015_04`, `table_2015_05` and so on.
+
+ ```yaml
+ out:
+   type: bigquery
+   table: table_%Y_%m
+ ```
+
+ ### Dynamic table creating
 
  When `auto_create_table` is set to true, try to create the table using BigQuery API.
 
+ If table already exists, insert into it.
+
  To describe the schema of the target table, please write schema path.
 
- `table` option accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
- format of ruby to construct table name.
 
- ```
- auto_create_table: true
- table: table_%Y_%m
- schema_path: /path/to/schema.json
+ ```yaml
+ out:
+   type: bigquery
+   auto_create_table: true
+   table: table_%Y_%m
+   schema_path: /path/to/schema.json
  ```
 
  ## Build
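
Note on the "Table id formatting" section added to the README above: the expansion of `table_%Y_%m` into `table_2015_04` can be illustrated with a minimal Java sketch. This is not the plugin's code. The class `TableIdFormatSketch` and the helper `formatTableId` are hypothetical, the sketch handles only the `%Y` and `%m` directives, and it assumes Java 8 or later for `java.time` (the build in this diff targets 1.7).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not part of embulk-output-bigquery.
public class TableIdFormatSketch {

    // Expands just the two strftime directives used in the README example:
    // %Y (four-digit year) and %m (zero-padded month).
    static String formatTableId(String template, LocalDateTime now) {
        String year = now.format(DateTimeFormatter.ofPattern("uuuu"));
        String month = now.format(DateTimeFormatter.ofPattern("MM"));
        return template.replace("%Y", year).replace("%m", month);
    }

    public static void main(String[] args) {
        // A fixed timestamp keeps the output reproducible; the README says the
        // plugin uses the local time of the embulk server at runtime.
        LocalDateTime releaseDay = LocalDateTime.of(2015, 4, 6, 0, 0);
        System.out.println(formatTableId("table_%Y_%m", releaseDay));
    }
}
```

Because the id is derived from the clock at run time, a job that starts in one month and a later run in the next month write to different tables, which matches the `table_2015_04`, `table_2015_05` sequence described in the README.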
@@ -15,7 +15,7 @@ configurations {
  sourceCompatibility = 1.7
  targetCompatibility = 1.7
 
- version = "0.1.2"
+ version = "0.1.3"
 
  dependencies {
      compile "org.embulk:embulk-core:0.5.1"
@@ -1,44 +1,38 @@
  package org.embulk.output;
 
  import java.io.File;
- import java.io.FileNotFoundException;
- import java.io.FileInputStream;
  import java.io.IOException;
- import java.util.ArrayList;
- import java.util.List;
- import java.util.IllegalFormatException;
- import com.google.api.client.auth.oauth2.Credential;
- import com.google.api.client.auth.oauth2.CredentialRefreshListener;
- import com.google.api.client.auth.oauth2.TokenErrorResponse;
- import com.google.api.client.auth.oauth2.TokenResponse;
+
+ import com.google.common.base.Optional;
  import com.google.common.collect.ImmutableList;
  import java.security.GeneralSecurityException;
 
- import org.embulk.spi.Exec;
- import org.slf4j.Logger;
-
  import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
+ import com.google.api.client.googleapis.compute.ComputeCredential;
  import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
  import com.google.api.client.http.HttpTransport;
- import com.google.api.client.http.InputStreamContent;
  import com.google.api.client.json.JsonFactory;
  import com.google.api.client.json.jackson2.JacksonFactory;
+ import com.google.api.client.http.HttpRequestInitializer;
+ import com.google.api.client.googleapis.json.GoogleJsonResponseException;
  import com.google.api.services.bigquery.Bigquery;
  import com.google.api.services.bigquery.BigqueryScopes;
  import com.google.api.services.bigquery.model.ProjectList;
+ import org.embulk.spi.Exec;
+ import org.slf4j.Logger;
 
  public class BigqueryAuthentication
  {
-
      private final Logger log = Exec.getLogger(BigqueryAuthentication.class);
-     private final String serviceAccountEmail;
-     private final String p12KeyFilePath;
+     private final Optional<String> serviceAccountEmail;
+     private final Optional<String> p12KeyFilePath;
      private final String applicationName;
      private final HttpTransport httpTransport;
      private final JsonFactory jsonFactory;
-     private final GoogleCredential credentials;
+     private final HttpRequestInitializer credentials;
 
-     public BigqueryAuthentication(String serviceAccountEmail, String p12KeyFilePath, String applicationName) throws IOException, GeneralSecurityException
+     public BigqueryAuthentication(String authMethod, Optional<String> serviceAccountEmail, Optional<String> p12KeyFilePath, String applicationName)
+             throws IOException, GeneralSecurityException
      {
          this.serviceAccountEmail = serviceAccountEmail;
          this.p12KeyFilePath = p12KeyFilePath;
@@ -46,41 +40,60 @@ public class BigqueryAuthentication
 
          this.httpTransport = GoogleNetHttpTransport.newTrustedTransport();
          this.jsonFactory = new JacksonFactory();
-         this.credentials = getCredentialProvider();
+
+         if (authMethod.toLowerCase().equals("compute_engine")) {
+             this.credentials = getComputeCredential();
+         } else {
+             this.credentials = getServiceAccountCredential();
+         }
      }
 
      /**
       * @see https://developers.google.com/accounts/docs/OAuth2ServiceAccount#authorizingrequests
       */
-     private GoogleCredential getCredentialProvider() throws IOException, GeneralSecurityException
+     private GoogleCredential getServiceAccountCredential() throws IOException, GeneralSecurityException
      {
          // @see https://cloud.google.com/compute/docs/api/how-tos/authorization
          // @see https://developers.google.com/resources/api-libraries/documentation/storage/v1/java/latest/com/google/api/services/storage/STORAGE_SCOPE.html
-         GoogleCredential cred = new GoogleCredential.Builder()
+         // @see https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/BigqueryScopes.html
+         return new GoogleCredential.Builder()
                  .setTransport(httpTransport)
                  .setJsonFactory(jsonFactory)
-                 .setServiceAccountId(serviceAccountEmail)
+                 .setServiceAccountId(serviceAccountEmail.orNull())
                  .setServiceAccountScopes(
                          ImmutableList.of(
                                  BigqueryScopes.BIGQUERY
                          )
                  )
-                 .setServiceAccountPrivateKeyFromP12File(new File(p12KeyFilePath))
+                 .setServiceAccountPrivateKeyFromP12File(new File(p12KeyFilePath.orNull()))
                  .build();
-         return cred;
      }
 
-     public Bigquery getBigqueryClient() throws IOException
+     /**
+      * @see http://developers.guge.io/accounts/docs/OAuth2ServiceAccount#creatinganaccount
+      * @see https://developers.google.com/accounts/docs/OAuth2
+      */
+     private ComputeCredential getComputeCredential() throws IOException
+     {
+         ComputeCredential credential = new ComputeCredential.Builder(httpTransport, jsonFactory)
+                 .build();
+         credential.refreshToken();
+
+         //log.debug("access_token:" + credential.getAccessToken());
+         log.debug("access_token expired:" + credential.getExpiresInSeconds());
+
+         return credential;
+     }
+
+     public Bigquery getBigqueryClient() throws GoogleJsonResponseException, IOException
      {
          Bigquery client = new Bigquery.Builder(httpTransport, jsonFactory, credentials)
-                 .setHttpRequestInitializer(credentials)
                  .setApplicationName(applicationName)
                  .build();
 
-         // For throw IOException when authentication is failed.
+         // For throw IOException when authentication is fail.
          long maxResults = 1;
-         Bigquery.Projects.List req = client.projects().list().setMaxResults(maxResults);
-         ProjectList projectList = req.execute();
+         ProjectList projectList = client.projects().list().setMaxResults(maxResults).execute();
 
          return client;
      }
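
Note on the `BigqueryAuthentication` change above: in the hunks shown, `service_account_email` and `p12_keyfile_path` become `Optional` and are passed to the credential builder via `orNull()`. Whether they are checked for presence elsewhere is not visible in this diff. The sketch below shows one way such a check could look, under stated assumptions. `AuthMethodCheckSketch` and `validate` are hypothetical names, `java.util.Optional` stands in for the Guava `Optional` used by the plugin, and rejecting unknown `auth_method` values is a choice made here, whereas the constructor in the diff treats anything other than `compute_engine` as `private_key`.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only; not part of embulk-output-bigquery.
public class AuthMethodCheckSketch {

    // Enforces the README rule: service_account_email and p12_keyfile_path
    // are required only when auth_method is private_key.
    static void validate(String authMethod, Optional<String> serviceAccountEmail, Optional<String> p12KeyFilePath) {
        // Locale.ROOT avoids locale-dependent lowercasing of the option value.
        String method = authMethod.toLowerCase(Locale.ROOT);
        if (method.equals("compute_engine")) {
            return; // the instance's own service account is used, nothing else is needed
        }
        if (!method.equals("private_key")) {
            throw new IllegalArgumentException("unknown auth_method: " + authMethod);
        }
        if (!serviceAccountEmail.isPresent()) {
            throw new IllegalArgumentException("service_account_email is required when auth_method is private_key");
        }
        if (!p12KeyFilePath.isPresent()) {
            throw new IllegalArgumentException("p12_keyfile_path is required when auth_method is private_key");
        }
    }

    public static void main(String[] args) {
        validate("compute_engine", Optional.<String>empty(), Optional.<String>empty());
        validate("private_key", Optional.of("ABCXYZ123ABCXYZ123.gserviceaccount.com"), Optional.of("/path/to/p12_keyfile.p12"));
        try {
            validate("private_key", Optional.<String>empty(), Optional.<String>empty());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing early with a message that names the missing option is easier to act on than a `NullPointerException` raised from `new File(null)` inside the credential builder.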
@@ -1,14 +1,11 @@
  package org.embulk.output;
 
  import java.io.File;
- import java.io.FileWriter;
  import java.io.FileNotFoundException;
  import java.io.FileOutputStream;
  import java.io.BufferedOutputStream;
  import java.io.IOException;
  import java.util.List;
- import java.util.ArrayList;
- import java.util.HashMap;
  import java.util.concurrent.TimeoutException;
  import com.google.common.base.Optional;
  import com.google.common.base.Throwables;
@@ -36,11 +33,17 @@ public class BigqueryOutputPlugin
      public interface PluginTask
              extends Task
      {
+         @Config("auth_method")
+         @ConfigDefault("\"private_key\"")
+         public String getAuthMethod();
+
          @Config("service_account_email")
-         public String getServiceAccountEmail();
+         @ConfigDefault("null")
+         public Optional<String> getServiceAccountEmail();
 
          @Config("p12_keyfile_path")
-         public String getP12KeyfilePath();
+         @ConfigDefault("null")
+         public Optional<String> getP12KeyfilePath();
 
          @Config("application_name")
          @ConfigDefault("\"Embulk BigQuery plugin\"")
@@ -115,7 +118,8 @@ public class BigqueryOutputPlugin
          final PluginTask task = config.loadConfig(PluginTask.class);
 
          try {
-             bigQueryWriter = new BigqueryWriter.Builder(task.getServiceAccountEmail())
+             bigQueryWriter = new BigqueryWriter.Builder(task.getAuthMethod())
+                     .setServiceAccountEmail(task.getServiceAccountEmail())
                      .setP12KeyFilePath(task.getP12KeyfilePath())
                      .setApplicationName(task.getApplicationName())
                      .setProject(task.getProject())
@@ -170,8 +174,6 @@ public class BigqueryOutputPlugin
          private BufferedOutputStream output = null;
          private File file;
          private String filePath;
-         private String fileName;
-         private long fileSize;
 
          public void nextFile()
          {
@@ -6,19 +6,11 @@ import java.io.FileNotFoundException;
  import java.io.FileInputStream;
  import java.io.BufferedInputStream;
  import com.google.api.client.http.InputStreamContent;
- import java.util.ArrayList;
  import java.util.List;
- import java.util.Iterator;
- import java.util.HashMap;
- import java.util.IllegalFormatException;
- import java.util.concurrent.Callable;
  import java.util.concurrent.TimeoutException;
- import org.apache.commons.lang3.StringUtils;
  import com.google.common.base.Optional;
- import com.google.common.collect.ImmutableSet;
  import com.google.common.base.Throwables;
  import java.security.GeneralSecurityException;
-
  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.core.type.TypeReference;
 
@@ -26,31 +18,21 @@ import org.embulk.spi.Exec;
  import org.slf4j.Logger;
 
  import com.google.api.services.bigquery.Bigquery;
- import com.google.api.services.bigquery.BigqueryScopes;
- import com.google.api.services.bigquery.Bigquery.Datasets;
  import com.google.api.services.bigquery.Bigquery.Tables;
  import com.google.api.services.bigquery.Bigquery.Jobs.Insert;
- import com.google.api.services.bigquery.Bigquery.Jobs.GetQueryResults;
  import com.google.api.services.bigquery.model.Job;
  import com.google.api.services.bigquery.model.JobConfiguration;
  import com.google.api.services.bigquery.model.JobConfigurationLoad;
- import com.google.api.services.bigquery.model.JobStatus;
  import com.google.api.services.bigquery.model.JobStatistics;
  import com.google.api.services.bigquery.model.JobReference;
- import com.google.api.services.bigquery.model.DatasetList;
  import com.google.api.services.bigquery.model.Table;
- import com.google.api.services.bigquery.model.TableList;
  import com.google.api.services.bigquery.model.TableSchema;
  import com.google.api.services.bigquery.model.TableReference;
  import com.google.api.services.bigquery.model.TableFieldSchema;
- import com.google.api.services.bigquery.model.TableCell;
- import com.google.api.services.bigquery.model.TableRow;
  import com.google.api.services.bigquery.model.ErrorProto;
  import com.google.api.client.googleapis.json.GoogleJsonResponseException;
-
  import com.google.api.client.googleapis.media.MediaHttpUploader;
  import com.google.api.client.googleapis.media.MediaHttpUploaderProgressListener;
- import com.google.api.client.googleapis.media.MediaHttpUploader.UploadState;
 
  public class BigqueryWriter
  {
@@ -86,7 +68,7 @@ public class BigqueryWriter
          this.jobStatusPollingInterval = builder.jobStatusPollingInterval;
          this.isSkipJobResultCheck = builder.isSkipJobResultCheck;
 
-         BigqueryAuthentication auth = new BigqueryAuthentication(builder.serviceAccountEmail, builder.p12KeyFilePath, builder.applicationName);
+         BigqueryAuthentication auth = new BigqueryAuthentication(builder.authMethod, builder.serviceAccountEmail, builder.p12KeyFilePath, builder.applicationName);
          this.bigQueryClient = auth.getBigqueryClient();
 
          checkConfig();
@@ -252,7 +234,7 @@ public class BigqueryWriter
      {
          if (autoCreateTable) {
              if (!schemaPath.isPresent()) {
-                 throw new IOException("schema_path is empty");
+                 throw new FileNotFoundException("schema_path is empty");
              } else {
                  File file = new File(schemaPath.orNull());
                  if (!file.exists()) {
@@ -296,8 +278,9 @@ public class BigqueryWriter
 
      public static class Builder
      {
-         private final String serviceAccountEmail;
-         private String p12KeyFilePath;
+         private final String authMethod;
+         private Optional<String> serviceAccountEmail;
+         private Optional<String> p12KeyFilePath;
          private String applicationName;
          private String project;
          private String dataset;
@@ -312,13 +295,18 @@ public class BigqueryWriter
          private int jobStatusPollingInterval;
          private boolean isSkipJobResultCheck;
 
+         public Builder(String authMethod)
+         {
+             this.authMethod = authMethod;
+         }
 
-         public Builder(String serviceAccountEmail)
+         public Builder setServiceAccountEmail(Optional<String> serviceAccountEmail)
          {
              this.serviceAccountEmail = serviceAccountEmail;
+             return this;
          }
 
-         public Builder setP12KeyFilePath(String p12KeyFilePath)
+         public Builder setP12KeyFilePath(Optional<String> p12KeyFilePath)
          {
              this.p12KeyFilePath = p12KeyFilePath;
              return this;
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk-output-bigquery
  version: !ruby/object:Gem::Version
-   version: 0.1.2
+   version: 0.1.3
  platform: ruby
  authors:
  - Satoshi Akama
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-04-01 00:00:00.000000000 Z
+ date: 2015-04-06 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
@@ -46,7 +46,6 @@ extensions: []
  extra_rdoc_files: []
  files:
  - .gitignore
- - LICENSE.txt
  - README.md
  - build.gradle
  - gradle/wrapper/gradle-wrapper.jar
@@ -63,7 +62,7 @@ files:
  - src/test/java/org/embulk/output/TestBigqueryWriter.java
  - classpath/commons-codec-1.3.jar
  - classpath/commons-logging-1.1.1.jar
- - classpath/embulk-output-bigquery-0.1.2.jar
+ - classpath/embulk-output-bigquery-0.1.3.jar
  - classpath/google-api-client-1.19.1.jar
  - classpath/google-api-services-bigquery-v2-rev193-1.19.1.jar
  - classpath/google-http-client-1.19.0.jar
@@ -1,21 +0,0 @@
-
- MIT License
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.