embulk-output-bigquery 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 46c61dd1c73ff99c3c69bd217ca772f07b2e1127
- data.tar.gz: ba184360972884260c1fe90264af7d5386791804
+ metadata.gz: b37d638ca9c217221687cdcadfbd45257291aef4
+ data.tar.gz: 972bf78e9ce75972fd3f2e1f77389a7383c3d2a0
  SHA512:
- metadata.gz: aa693e59cb4b45c2d43f07479f3d61e63242185be9964d4f00b83a4a784a0443ae270a63760f3f2f188e74deb77cbb94a89a18db49d2c5cd4621f18b73363ab3
- data.tar.gz: 7c0ea783220de28befd7c565ff83ec5ff58f13af0db16b3d341a12c3e415adeacba375e5688a42fcbb26d0402a48071622ed5b161fa52fd08b1f56444faf66e1
+ metadata.gz: 6d18639e76da80f45e2852df8408ec4c9c655e06a77a776c42dfb52310cc787e8319fd8f182eef20c59ffd01e5939ccb3799b4a7bad5b20f416aa668d513b3e3
+ data.tar.gz: 5a37cded1558ba6f3fbb1d4c475c46a593151a32e831299c00c386185d32828ec72810c2334863f7e03cdc7a3b2d974e7d1ec0cf653276073e7d336398ae92ba
data/README.md CHANGED
@@ -1,7 +1,7 @@
 
  # embulk-output-bigquery
 
- [Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/)
+ [Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/) using [direct insert](https://cloud.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest)
 
  ## Overview
 
@@ -16,7 +16,7 @@ https://developers.google.com/bigquery/loading-data-into-bigquery
  ### NOT IMPLEMENTED
  * insert data over streaming inserts
  * for continuous real-time insertions
- * Pleast use other product, like [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)
+ * Please use other product, like [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)
  * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
 
  Current version of this plugin supports Google API with Service Account Authentication, but does not support
@@ -24,8 +24,9 @@ OAuth flow for installed applications.
 
  ## Configuration
 
- - **service_account_email**: your Google service account email (string, required)
- - **p12_keyfile_path**: fullpath of private key in P12(PKCS12) format (string, required)
+ - **auth_method**: (private_key or compute_engine) (string, optional, default is private_key)
+ - **service_account_email**: your Google service account email (string, required when auth_method is private_key)
+ - **p12_keyfile_path**: fullpath of private key in P12(PKCS12) format (string, required when auth_method is private_key)
  - **path_prefix**: (string, required)
  - **sequence_format**: (string, optional, default is %03d.%02d)
  - **file_ext**: (string, required)
@@ -42,13 +43,14 @@ OAuth flow for installed applications.
  - **is_skip_job_result_check**: (boolean, optional, default is 0)
  - **field_delimiter**: (string, optional, default is ",")
  - **max_bad_records**: (int, optional, default is 0)
- - **encoding**: (UTF-8 or ISO-8859-1) (string, optional, default is "UTF-8")
+ - **encoding**: (UTF-8 or ISO-8859-1) (string, optional, default is UTF-8)
 
- ## Example
+ ### Example
 
  ```yaml
  out:
  type: bigquery
+ auth_method: private_key # default
  service_account_email: ABCXYZ123ABCXYZ123.gserviceaccount.com
  p12_keyfile_path: /path/to/p12_keyfile.p12
  path_prefix: /path/to/output
@@ -64,19 +66,58 @@ out:
  - {type: gzip}
  ```
 
- ## Dynamic table creating
+ ### Authentication
+
+ There are two methods supported to fetch access token for the service account.
+
+ 1. Public-Private key pair
+ 2. Predefined access token (Compute Engine only)
+
+ The examples above use the first one. You first need to create a service account (client ID),
+ download its private key and deploy the key with embulk.
+
+ On the other hand, you don't need to explicitly create a service account for embulk when you
+ run embulk in Google Compute Engine. In this second authentication method, you need to
+ add the API scope "https://www.googleapis.com/auth/bigquery" to the scope list of your
+ Compute Engine instance, then you can configure embulk like this.
+
+ ```yaml
+ out:
+ type: bigquery
+ auth_method: compute_engine
+ ```
+
+ ### Table id formatting
+
+ `table` and option accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
+ format to construct table ids.
+ Table ids are formatted at runtime
+ using the local time of the embulk server.
+
+ For example, with the configuration below,
+ data is inserted into tables `table_2015_04`, `table_2015_05` and so on.
+
+ ```yaml
+ out:
+ type: bigquery
+ table: table_%Y_%m
+ ```
+
+ ### Dynamic table creating
 
  When `auto_create_table` is set to true, try to create the table using BigQuery API.
 
+ If table already exists, insert into it.
+
  To describe the schema of the target table, please write schema path.
 
- `table` option accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
- format of ruby to construct table name.
 
- ```
- auto_create_table: true
- table: table_%Y_%m
- schema_path: /path/to/schema.json
+ ```yaml
+ out:
+ type: bigquery
+ auto_create_table: true
+ table: table_%Y_%m
+ schema_path: /path/to/schema.json
  ```
 
  ## Build
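
Note on the "Table id formatting" section added to the README in this release: the README says `table` accepts `Time#strftime` directives and is expanded with the local time of the embulk server. The diff does not show how the plugin performs that expansion, so the following is only a minimal sketch of the idea. The class `TableIdFormat` and its `format` method are hypothetical names, and the sketch handles just `%Y`, `%m`, `%d` and `%%` rather than the full strftime set.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper, not part of embulk-output-bigquery.
final class TableIdFormat {
    private TableIdFormat() {}

    // Expands a small subset of strftime directives (%Y, %m, %d, %%).
    // Unknown directives are copied through unchanged.
    static String format(String pattern, Calendar when) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c != '%' || i + 1 >= pattern.length()) {
                out.append(c);
                continue;
            }
            char directive = pattern.charAt(++i);
            switch (directive) {
                case 'Y':
                    out.append(String.format(Locale.ROOT, "%04d", when.get(Calendar.YEAR)));
                    break;
                case 'm':
                    // Calendar months are zero-based, strftime months are not.
                    out.append(String.format(Locale.ROOT, "%02d", when.get(Calendar.MONTH) + 1));
                    break;
                case 'd':
                    out.append(String.format(Locale.ROOT, "%02d", when.get(Calendar.DAY_OF_MONTH)));
                    break;
                case '%':
                    out.append('%');
                    break;
                default:
                    out.append('%').append(directive);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Uses the local time of the machine running the sketch.
        System.out.println(format("table_%Y_%m", new GregorianCalendar()));
    }
}
```

Because the table id depends on the clock at run time, a job that starts shortly before midnight on the last day of a month may resolve to a different table than one started a few minutes later.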
@@ -15,7 +15,7 @@ configurations {
  sourceCompatibility = 1.7
  targetCompatibility = 1.7
 
- version = "0.1.2"
+ version = "0.1.3"
 
  dependencies {
  compile "org.embulk:embulk-core:0.5.1"
@@ -1,44 +1,38 @@
  package org.embulk.output;
 
  import java.io.File;
- import java.io.FileNotFoundException;
- import java.io.FileInputStream;
  import java.io.IOException;
- import java.util.ArrayList;
- import java.util.List;
- import java.util.IllegalFormatException;
- import com.google.api.client.auth.oauth2.Credential;
- import com.google.api.client.auth.oauth2.CredentialRefreshListener;
- import com.google.api.client.auth.oauth2.TokenErrorResponse;
- import com.google.api.client.auth.oauth2.TokenResponse;
+
+ import com.google.common.base.Optional;
  import com.google.common.collect.ImmutableList;
  import java.security.GeneralSecurityException;
 
- import org.embulk.spi.Exec;
- import org.slf4j.Logger;
-
  import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
+ import com.google.api.client.googleapis.compute.ComputeCredential;
  import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
  import com.google.api.client.http.HttpTransport;
- import com.google.api.client.http.InputStreamContent;
  import com.google.api.client.json.JsonFactory;
  import com.google.api.client.json.jackson2.JacksonFactory;
+ import com.google.api.client.http.HttpRequestInitializer;
+ import com.google.api.client.googleapis.json.GoogleJsonResponseException;
  import com.google.api.services.bigquery.Bigquery;
  import com.google.api.services.bigquery.BigqueryScopes;
  import com.google.api.services.bigquery.model.ProjectList;
+ import org.embulk.spi.Exec;
+ import org.slf4j.Logger;
 
  public class BigqueryAuthentication
  {
-
  private final Logger log = Exec.getLogger(BigqueryAuthentication.class);
- private final String serviceAccountEmail;
- private final String p12KeyFilePath;
+ private final Optional<String> serviceAccountEmail;
+ private final Optional<String> p12KeyFilePath;
  private final String applicationName;
  private final HttpTransport httpTransport;
  private final JsonFactory jsonFactory;
- private final GoogleCredential credentials;
+ private final HttpRequestInitializer credentials;
 
- public BigqueryAuthentication(String serviceAccountEmail, String p12KeyFilePath, String applicationName) throws IOException, GeneralSecurityException
+ public BigqueryAuthentication(String authMethod, Optional<String> serviceAccountEmail, Optional<String> p12KeyFilePath, String applicationName)
+ throws IOException, GeneralSecurityException
  {
  this.serviceAccountEmail = serviceAccountEmail;
  this.p12KeyFilePath = p12KeyFilePath;
@@ -46,41 +40,60 @@ public class BigqueryAuthentication
 
  this.httpTransport = GoogleNetHttpTransport.newTrustedTransport();
  this.jsonFactory = new JacksonFactory();
- this.credentials = getCredentialProvider();
+
+ if (authMethod.toLowerCase().equals("compute_engine")) {
+ this.credentials = getComputeCredential();
+ } else {
+ this.credentials = getServiceAccountCredential();
+ }
  }
 
  /**
  * @see https://developers.google.com/accounts/docs/OAuth2ServiceAccount#authorizingrequests
  */
- private GoogleCredential getCredentialProvider() throws IOException, GeneralSecurityException
+ private GoogleCredential getServiceAccountCredential() throws IOException, GeneralSecurityException
  {
  // @see https://cloud.google.com/compute/docs/api/how-tos/authorization
  // @see https://developers.google.com/resources/api-libraries/documentation/storage/v1/java/latest/com/google/api/services/storage/STORAGE_SCOPE.html
- GoogleCredential cred = new GoogleCredential.Builder()
+ // @see https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/BigqueryScopes.html
+ return new GoogleCredential.Builder()
  .setTransport(httpTransport)
  .setJsonFactory(jsonFactory)
- .setServiceAccountId(serviceAccountEmail)
+ .setServiceAccountId(serviceAccountEmail.orNull())
  .setServiceAccountScopes(
  ImmutableList.of(
  BigqueryScopes.BIGQUERY
  )
  )
- .setServiceAccountPrivateKeyFromP12File(new File(p12KeyFilePath))
+ .setServiceAccountPrivateKeyFromP12File(new File(p12KeyFilePath.orNull()))
  .build();
- return cred;
  }
 
- public Bigquery getBigqueryClient() throws IOException
+ /**
+ * @see http://developers.guge.io/accounts/docs/OAuth2ServiceAccount#creatinganaccount
+ * @see https://developers.google.com/accounts/docs/OAuth2
+ */
+ private ComputeCredential getComputeCredential() throws IOException
+ {
+ ComputeCredential credential = new ComputeCredential.Builder(httpTransport, jsonFactory)
+ .build();
+ credential.refreshToken();
+
+ //log.debug("access_token:" + credential.getAccessToken());
+ log.debug("access_token expired:" + credential.getExpiresInSeconds());
+
+ return credential;
+ }
+
+ public Bigquery getBigqueryClient() throws GoogleJsonResponseException, IOException
  {
  Bigquery client = new Bigquery.Builder(httpTransport, jsonFactory, credentials)
- .setHttpRequestInitializer(credentials)
  .setApplicationName(applicationName)
  .build();
 
- // For throw IOException when authentication is failed.
+ // For throw IOException when authentication is fail.
  long maxResults = 1;
- Bigquery.Projects.List req = client.projects().list().setMaxResults(maxResults);
- ProjectList projectList = req.execute();
+ ProjectList projectList = client.projects().list().setMaxResults(maxResults).execute();
 
  return client;
  }
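
Review note on the `auth_method` branch in this file: the constructor treats every value other than `compute_engine` as `private_key`, and in the `private_key` path `p12KeyFilePath.orNull()` is handed straight to `new File(...)`, which throws `NullPointerException` when the option is absent. One possible way to fail earlier with a clearer message is sketched below. This is an illustration under assumptions, not the plugin's code: `AuthMethodCheck`, `AuthMethod`, `parse` and `validate` are hypothetical names, and plain nullable strings stand in for Guava's `Optional<String>` so the sketch needs no dependencies.

```java
import java.util.Locale;

// Hypothetical up-front validation for the auth_method option.
final class AuthMethodCheck {
    enum AuthMethod { PRIVATE_KEY, COMPUTE_ENGINE }

    private AuthMethodCheck() {}

    // Unlike the diff, which falls back to private_key, this stricter
    // variant rejects values other than the two documented ones.
    static AuthMethod parse(String raw) {
        String normalized = raw.toLowerCase(Locale.ROOT);
        switch (normalized) {
            case "private_key":
                return AuthMethod.PRIVATE_KEY;
            case "compute_engine":
                return AuthMethod.COMPUTE_ENGINE;
            default:
                throw new IllegalArgumentException("unknown auth_method: " + raw);
        }
    }

    // The README marks both options as required only for private_key.
    static void validate(AuthMethod method, String serviceAccountEmail, String p12KeyFilePath) {
        if (method != AuthMethod.PRIVATE_KEY) {
            return;
        }
        if (serviceAccountEmail == null) {
            throw new IllegalArgumentException(
                    "service_account_email is required when auth_method is private_key");
        }
        if (p12KeyFilePath == null) {
            throw new IllegalArgumentException(
                    "p12_keyfile_path is required when auth_method is private_key");
        }
    }

    public static void main(String[] args) {
        AuthMethod method = parse("compute_engine");
        validate(method, null, null); // passes: no key material needed
        System.out.println(method);
    }
}
```

Whether to reject unknown values or keep the lenient fallback shown in the diff is a design choice; the strict variant surfaces typos such as `compute-engine` at configuration time.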
@@ -1,14 +1,11 @@
  package org.embulk.output;
 
  import java.io.File;
- import java.io.FileWriter;
  import java.io.FileNotFoundException;
  import java.io.FileOutputStream;
  import java.io.BufferedOutputStream;
  import java.io.IOException;
  import java.util.List;
- import java.util.ArrayList;
- import java.util.HashMap;
  import java.util.concurrent.TimeoutException;
  import com.google.common.base.Optional;
  import com.google.common.base.Throwables;
@@ -36,11 +33,17 @@ public class BigqueryOutputPlugin
  public interface PluginTask
  extends Task
  {
+ @Config("auth_method")
+ @ConfigDefault("\"private_key\"")
+ public String getAuthMethod();
+
  @Config("service_account_email")
- public String getServiceAccountEmail();
+ @ConfigDefault("null")
+ public Optional<String> getServiceAccountEmail();
 
  @Config("p12_keyfile_path")
- public String getP12KeyfilePath();
+ @ConfigDefault("null")
+ public Optional<String> getP12KeyfilePath();
 
  @Config("application_name")
  @ConfigDefault("\"Embulk BigQuery plugin\"")
@@ -115,7 +118,8 @@ public class BigqueryOutputPlugin
  final PluginTask task = config.loadConfig(PluginTask.class);
 
  try {
- bigQueryWriter = new BigqueryWriter.Builder(task.getServiceAccountEmail())
+ bigQueryWriter = new BigqueryWriter.Builder(task.getAuthMethod())
+ .setServiceAccountEmail(task.getServiceAccountEmail())
  .setP12KeyFilePath(task.getP12KeyfilePath())
  .setApplicationName(task.getApplicationName())
  .setProject(task.getProject())
@@ -170,8 +174,6 @@ public class BigqueryOutputPlugin
  private BufferedOutputStream output = null;
  private File file;
  private String filePath;
- private String fileName;
- private long fileSize;
 
  public void nextFile()
  {
@@ -6,19 +6,11 @@ import java.io.FileNotFoundException;
  import java.io.FileInputStream;
  import java.io.BufferedInputStream;
  import com.google.api.client.http.InputStreamContent;
- import java.util.ArrayList;
  import java.util.List;
- import java.util.Iterator;
- import java.util.HashMap;
- import java.util.IllegalFormatException;
- import java.util.concurrent.Callable;
  import java.util.concurrent.TimeoutException;
- import org.apache.commons.lang3.StringUtils;
  import com.google.common.base.Optional;
- import com.google.common.collect.ImmutableSet;
  import com.google.common.base.Throwables;
  import java.security.GeneralSecurityException;
-
  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.core.type.TypeReference;
 
@@ -26,31 +18,21 @@ import org.embulk.spi.Exec;
  import org.slf4j.Logger;
 
  import com.google.api.services.bigquery.Bigquery;
- import com.google.api.services.bigquery.BigqueryScopes;
- import com.google.api.services.bigquery.Bigquery.Datasets;
  import com.google.api.services.bigquery.Bigquery.Tables;
  import com.google.api.services.bigquery.Bigquery.Jobs.Insert;
- import com.google.api.services.bigquery.Bigquery.Jobs.GetQueryResults;
  import com.google.api.services.bigquery.model.Job;
  import com.google.api.services.bigquery.model.JobConfiguration;
  import com.google.api.services.bigquery.model.JobConfigurationLoad;
- import com.google.api.services.bigquery.model.JobStatus;
  import com.google.api.services.bigquery.model.JobStatistics;
  import com.google.api.services.bigquery.model.JobReference;
- import com.google.api.services.bigquery.model.DatasetList;
  import com.google.api.services.bigquery.model.Table;
- import com.google.api.services.bigquery.model.TableList;
  import com.google.api.services.bigquery.model.TableSchema;
  import com.google.api.services.bigquery.model.TableReference;
  import com.google.api.services.bigquery.model.TableFieldSchema;
- import com.google.api.services.bigquery.model.TableCell;
- import com.google.api.services.bigquery.model.TableRow;
  import com.google.api.services.bigquery.model.ErrorProto;
  import com.google.api.client.googleapis.json.GoogleJsonResponseException;
-
  import com.google.api.client.googleapis.media.MediaHttpUploader;
  import com.google.api.client.googleapis.media.MediaHttpUploaderProgressListener;
- import com.google.api.client.googleapis.media.MediaHttpUploader.UploadState;
 
  public class BigqueryWriter
  {
@@ -86,7 +68,7 @@ public class BigqueryWriter
  this.jobStatusPollingInterval = builder.jobStatusPollingInterval;
  this.isSkipJobResultCheck = builder.isSkipJobResultCheck;
 
- BigqueryAuthentication auth = new BigqueryAuthentication(builder.serviceAccountEmail, builder.p12KeyFilePath, builder.applicationName);
+ BigqueryAuthentication auth = new BigqueryAuthentication(builder.authMethod, builder.serviceAccountEmail, builder.p12KeyFilePath, builder.applicationName);
  this.bigQueryClient = auth.getBigqueryClient();
 
  checkConfig();
@@ -252,7 +234,7 @@ public class BigqueryWriter
  {
  if (autoCreateTable) {
  if (!schemaPath.isPresent()) {
- throw new IOException("schema_path is empty");
+ throw new FileNotFoundException("schema_path is empty");
  } else {
  File file = new File(schemaPath.orNull());
  if (!file.exists()) {
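
Review note on the `schema_path` check in this hunk: the exception for a missing option changes from `IOException` to `FileNotFoundException`, which is a subclass of `IOException`, so existing `catch (IOException e)` handlers still apply. The hunk is cut off before the body of the `!file.exists()` branch, so what the plugin throws there is not visible. The sketch below shows one way the whole check could look as a standalone helper; `SchemaPathCheck` and `requireSchemaFile` are hypothetical names, a nullable string stands in for `Optional<String>`, and throwing `FileNotFoundException` for a missing file is an assumption.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical standalone version of the schema_path validation.
final class SchemaPathCheck {
    private SchemaPathCheck() {}

    // Returns the schema file when auto_create_table is enabled,
    // or null when the option is off and no schema is needed.
    static File requireSchemaFile(boolean autoCreateTable, String schemaPath)
            throws FileNotFoundException {
        if (!autoCreateTable) {
            return null;
        }
        if (schemaPath == null || schemaPath.isEmpty()) {
            throw new FileNotFoundException("schema_path is empty");
        }
        File file = new File(schemaPath);
        if (!file.isFile()) {
            // Assumption: the elided branch in the diff reports a missing file similarly.
            throw new FileNotFoundException("schema file not found: " + schemaPath);
        }
        return file;
    }
}
```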
@@ -296,8 +278,9 @@ public class BigqueryWriter
 
  public static class Builder
  {
- private final String serviceAccountEmail;
- private String p12KeyFilePath;
+ private final String authMethod;
+ private Optional<String> serviceAccountEmail;
+ private Optional<String> p12KeyFilePath;
  private String applicationName;
  private String project;
  private String dataset;
@@ -312,13 +295,18 @@ public class BigqueryWriter
  private int jobStatusPollingInterval;
  private boolean isSkipJobResultCheck;
 
+ public Builder(String authMethod)
+ {
+ this.authMethod = authMethod;
+ }
 
- public Builder(String serviceAccountEmail)
+ public Builder setServiceAccountEmail(Optional<String> serviceAccountEmail)
  {
  this.serviceAccountEmail = serviceAccountEmail;
+ return this;
  }
 
- public Builder setP12KeyFilePath(String p12KeyFilePath)
+ public Builder setP12KeyFilePath(Optional<String> p12KeyFilePath)
  {
  this.p12KeyFilePath = p12KeyFilePath;
  return this;
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk-output-bigquery
  version: !ruby/object:Gem::Version
- version: 0.1.2
+ version: 0.1.3
  platform: ruby
  authors:
  - Satoshi Akama
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-04-01 00:00:00.000000000 Z
+ date: 2015-04-06 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -46,7 +46,6 @@ extensions: []
  extra_rdoc_files: []
  files:
  - .gitignore
- - LICENSE.txt
  - README.md
  - build.gradle
  - gradle/wrapper/gradle-wrapper.jar
@@ -63,7 +62,7 @@ files:
  - src/test/java/org/embulk/output/TestBigqueryWriter.java
  - classpath/commons-codec-1.3.jar
  - classpath/commons-logging-1.1.1.jar
- - classpath/embulk-output-bigquery-0.1.2.jar
+ - classpath/embulk-output-bigquery-0.1.3.jar
  - classpath/google-api-client-1.19.1.jar
  - classpath/google-api-services-bigquery-v2-rev193-1.19.1.jar
  - classpath/google-http-client-1.19.0.jar
@@ -1,21 +0,0 @@
-
- MIT License
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.