fluent-plugin-bigquery-patched-retry-insert-502 0.4.0

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 742731c54150af380dad12023e7600404f4589c6
+   data.tar.gz: 9df991f3d2b66ad22ee6a7a2bde86c2524b9ab53
+ SHA512:
+   metadata.gz: a38f152af649fc7f276e87c0b4d415e6ebffca05e944b51cff67d456406e5110c117f5fc932e5b4b12c9349d401a992f96d95a412aee3b9606d01c642ea5ebd3
+   data.tar.gz: e949e32c403e48b785bd5fa6cb0242e3d8ca614686b4b2b7b718931353b370866fc95e4bd9d8c53c06720c3f65c1db99c275dbf0d24d2a894960e90e67be0f0f
@@ -0,0 +1,19 @@
+ *.gem
+ *.rbc
+ .bundle
+ .config
+ .yardoc
+ .ruby-version
+ Gemfile.lock
+ InstalledFiles
+ _yardoc
+ coverage
+ doc/
+ lib/bundler/man
+ pkg
+ rdoc
+ spec/reports
+ test/tmp
+ test/version_tmp
+ tmp
+ script/
@@ -0,0 +1,23 @@
+ language: ruby
+ 
+ rvm:
+   - 2.0
+   - 2.1
+   - 2.2
+   - 2.3.3
+ 
+ gemfile:
+   - Gemfile
+   - gemfiles/activesupport-4.gemfile
+ 
+ matrix:
+   exclude:
+     - rvm: 2.0
+       gemfile: Gemfile
+     - rvm: 2.1
+       gemfile: Gemfile
+ 
+ before_install:
+   - gem update bundler
+ 
+ script: bundle exec rake test
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+ 
+ # Specify your gem's dependencies in fluent-plugin-bigquery.gemspec
+ gemspec
@@ -0,0 +1,13 @@
+ Copyright (c) 2012- TAGOMORI Satoshi
+ 
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ 
+     http://www.apache.org/licenses/LICENSE-2.0
+ 
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
@@ -0,0 +1,563 @@
+ # fluent-plugin-bigquery
+ 
+ [Fluentd](http://fluentd.org) output plugin to load/insert data into Google BigQuery.
+ 
+ - **Plugin type**: TimeSlicedOutput
+ 
+ * insert data over streaming inserts
+   * for continuous real-time insertions
+   * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
+ * load data
+   * for data loading as batch jobs, suited to large amounts of data
+   * https://developers.google.com/bigquery/loading-data-into-bigquery
+ 
+ The current version of this plugin supports the Google API with Service Account Authentication, but does not support
+ the OAuth flow for installed applications.
+ 
+ ## Notice
+ If you use ruby-2.1 or earlier, you must use activesupport-4.2.x or earlier.
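One hedged way to enforce this constraint in a Gemfile (the exact version bound is illustrative, not taken from this README):

```ruby
# Keep ActiveSupport on 4.2.x or earlier when running Ruby 2.1 or older
gem "activesupport", "< 5"
```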
+ 
+ ## With docker image
+ If you use the official Alpine-based fluentd docker image (https://github.com/fluent/fluentd-docker-image),
+ you need to install the `bigdecimal` gem in your own Dockerfile.
+ The Alpine-based image ships only a minimal Ruby environment in order to keep the image small,
+ and in most cases a dependency on an embedded gem is not declared in the gemspec,
+ because depending on an embedded gem can restrict the Ruby environment.
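A minimal Dockerfile sketch of that setup. The base image tag, the build packages and the plugin install line are assumptions for illustration, not taken from this README:

```dockerfile
# Illustrative sketch: extend the official Alpine-based fluentd image
# (pick the image tag you actually run; "edge" is only an example)
FROM fluent/fluentd:edge
USER root
# bigdecimal is a C extension, so temporarily install build tools on Alpine
RUN apk add --no-cache --virtual .build-deps build-base ruby-dev \
 && gem install bigdecimal --no-document \
 && gem install fluent-plugin-bigquery-patched-retry-insert-502 --no-document \
 && apk del .build-deps
USER fluent
```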
+ 
+ ## Configuration
+ 
+ ### Options
+ 
+ | name | type | required? | default | description |
+ | :--- | :--- | :-------- | :------ | :---------- |
+ | method | string | no | insert | `insert` (Streaming Insert) or `load` (load job) |
+ | buffer_type | string | no | lightening (insert) or file (load) | |
+ | buffer_chunk_limit | integer | no | 1MB (insert) or 1GB (load) | |
+ | buffer_queue_limit | integer | no | 1024 (insert) or 32 (load) | |
+ | buffer_chunk_records_limit | integer | no | 500 | |
+ | flush_interval | float | no | 0.25 (insert) or default of time sliced output (load) | |
+ | try_flush_interval | float | no | 0.05 (insert) or default of time sliced output (load) | |
+ | auth_method | enum | yes | private_key | `private_key` or `json_key` or `compute_engine` or `application_default` |
+ | email | string | yes (private_key) | nil | GCP Service Account Email |
+ | private_key_path | string | yes (private_key) | nil | GCP Private Key file path |
+ | private_key_passphrase | string | yes (private_key) | nil | GCP Private Key Passphrase |
+ | json_key | string | yes (json_key) | nil | GCP JSON Key file path or JSON Key string |
+ | project | string | yes | nil | |
+ | table | string | yes (unless `tables` is set) | nil | |
+ | tables | string | yes (unless `table` is set) | nil | can set multiple table names split by `,` |
+ | template_suffix | string | no | nil | can use the `%{time_slice}` placeholder, replaced according to `time_slice_format` |
+ | auto_create_table | bool | no | false | If true, creates the table automatically |
+ | skip_invalid_rows | bool | no | false | `insert` method only. |
+ | max_bad_records | integer | no | 0 | `load` method only. If the number of bad records exceeds this value, an invalid error is returned in the job result. |
+ | ignore_unknown_values | bool | no | false | Accept rows that contain values that do not match the schema. The unknown values are ignored. |
+ | schema | array | yes (unless `fetch_schema` or `schema_path` is set) | nil | Schema definition, formatted as JSON. |
+ | schema_path | string | yes (unless `fetch_schema` is set) | nil | Schema definition file path, formatted as JSON. |
+ | fetch_schema | bool | yes (unless `schema_path` is set) | false | If true, fetch the table schema definition from the BigQuery table automatically. |
+ | fetch_schema_table | string | no | nil | If set, fetch the table schema definition from this table. Ignored if `fetch_schema` is false. |
+ | schema_cache_expire | integer | no | 600 | Value in seconds. After this interval, the table schema definition is re-fetched. |
+ | field_string (deprecated) | string | no | nil | see examples. |
+ | field_integer (deprecated) | string | no | nil | see examples. |
+ | field_float (deprecated) | string | no | nil | see examples. |
+ | field_boolean (deprecated) | string | no | nil | see examples. |
+ | field_timestamp (deprecated) | string | no | nil | see examples. |
+ | time_field | string | no | nil | If set, the plugin writes the formatted time string to this field. |
+ | time_format | string | no | nil | e.g. `%s`, `%Y/%m/%d %H:%M:%S` |
+ | replace_record_key | bool | no | false | see examples. |
+ | replace_record_key_regexp{1-10} | string | no | nil | see examples. |
+ | convert_hash_to_json (deprecated) | bool | no | false | If true, converts Hash values of the record to JSON strings. |
+ | insert_id_field | string | no | nil | Use this key as the `insertId` property of the Streaming Insert API. |
+ | allow_retry_insert_errors | bool | no | false | Retry inserting rows when insertErrors occurs. Rows may be inserted in duplicate. |
+ | request_timeout_sec | integer | no | nil | BigQuery API response timeout |
+ | request_open_timeout_sec | integer | no | 60 | BigQuery API connection and request timeout. If you send big data to BigQuery, set a larger value. |
+ | time_partitioning_type | enum | no | nil | Type of BigQuery time partitioning (only `day` is supported; experimental BigQuery feature). |
+ | time_partitioning_expiration | time | no | nil | Expiration in milliseconds for BigQuery time partitioning (experimental BigQuery feature). |
+ 
+ ### Standard Options
+ 
+ | name | type | required? | default | description |
+ | :--- | :--- | :-------- | :------ | :---------- |
+ | localtime | bool | no | nil | Use localtime |
+ | utc | bool | no | nil | Use utc |
+ 
+ See also http://docs.fluentd.org/articles/output-plugin-overview#time-sliced-output-parameters
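The deprecated `field_*` options and the `replace_record_key*` options say "see examples", but no such example appears in this README. A hedged sketch of how they are commonly written (the exact syntax is an assumption here, not confirmed by this document):

```apache
<match dummy>
  @type bigquery
  ...
  # deprecated flat schema definition (comma-separated field names per type)
  field_integer time,status,bytes
  field_string  vhost,path,method,agent
  # assumed syntax: rewrite record keys before insert, e.g. replace "-" with "_"
  replace_record_key true
  replace_record_key_regexp1 - _
  ...
</match>
```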
+ 
+ ## Examples
+ 
+ ### Streaming inserts
+ 
+ Configure insert specifications with the target table schema and your credentials. This is the minimum configuration:
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   method insert # default
+ 
+   auth_method private_key # default
+   email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
+   private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
+   # private_key_passphrase notasecret # default
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   table tablename
+ 
+   time_format %s
+   time_field time
+ 
+   schema [
+     {"name": "time", "type": "INTEGER"},
+     {"name": "status", "type": "INTEGER"},
+     {"name": "bytes", "type": "INTEGER"},
+     {"name": "vhost", "type": "STRING"},
+     {"name": "path", "type": "STRING"},
+     {"name": "method", "type": "STRING"},
+     {"name": "protocol", "type": "STRING"},
+     {"name": "agent", "type": "STRING"},
+     {"name": "referer", "type": "STRING"},
+     {"name": "remote", "type": "RECORD", "fields": [
+       {"name": "host", "type": "STRING"},
+       {"name": "ip", "type": "STRING"},
+       {"name": "user", "type": "STRING"}
+     ]},
+     {"name": "requesttime", "type": "FLOAT"},
+     {"name": "bot_access", "type": "BOOLEAN"},
+     {"name": "loginsession", "type": "BOOLEAN"}
+   ]
+ </match>
+ ```
+ 
+ For high-rate inserts over streaming inserts, you should specify flush intervals and buffer chunk options:
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   method insert # default
+ 
+   flush_interval 1 # flush as frequently as possible
+ 
+   buffer_chunk_records_limit 300 # default rate limit for users is 100
+   buffer_queue_limit 10240 # 1MB * 10240 -> 10GB!
+ 
+   num_threads 16
+ 
+   auth_method private_key # default
+   email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
+   private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
+   # private_key_passphrase notasecret # default
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   tables accesslog1,accesslog2,accesslog3
+ 
+   time_format %s
+   time_field time
+ 
+   schema [
+     {"name": "time", "type": "INTEGER"},
+     {"name": "status", "type": "INTEGER"},
+     {"name": "bytes", "type": "INTEGER"},
+     {"name": "vhost", "type": "STRING"},
+     {"name": "path", "type": "STRING"},
+     {"name": "method", "type": "STRING"},
+     {"name": "protocol", "type": "STRING"},
+     {"name": "agent", "type": "STRING"},
+     {"name": "referer", "type": "STRING"},
+     {"name": "remote", "type": "RECORD", "fields": [
+       {"name": "host", "type": "STRING"},
+       {"name": "ip", "type": "STRING"},
+       {"name": "user", "type": "STRING"}
+     ]},
+     {"name": "requesttime", "type": "FLOAT"},
+     {"name": "bot_access", "type": "BOOLEAN"},
+     {"name": "loginsession", "type": "BOOLEAN"}
+   ]
+ </match>
+ ```
+ 
+ Important options for high rate events are:
+ 
+ * `tables`
+   * 2 or more tables are available with the ',' separator
+   * `out_bigquery` uses these tables for Table Sharding inserts
+   * these must have the same schema
+ * `buffer_chunk_limit`
+   * max size of an insert or chunk (default 1000000, i.e. 1MB)
+   * the max size is limited to 1MB on BigQuery
+ * `buffer_chunk_records_limit`
+   * the number of records per streaming inserts API call is limited to 500, per insert or chunk
+   * `out_bigquery` flushes the buffer with 500 records per inserts API call
+ * `buffer_queue_limit`
+   * BigQuery streaming inserts needs very small buffer chunks
+   * for high-rate events, `buffer_queue_limit` should be configured with a big number
+   * up to 1GB of memory may be used under network problems in the default configuration
+     * `buffer_chunk_limit (default 1MB)` x `buffer_queue_limit (default 1024)`
+ * `num_threads`
+   * threads for insert API calls in parallel
+   * specify this option for 100 or more records per second
+   * 10 or more threads seem good for inserts over the internet
+   * fewer threads may be good for Google Compute Engine instances (with low latency to BigQuery)
+ * `flush_interval`
+   * interval between data flushes (default 0.25)
+   * you can set subsecond values such as `0.15` on Fluentd v0.10.42 or later
+ 
+ See the [Quota policy](https://cloud.google.com/bigquery/streaming-data-into-bigquery#quota)
+ section in the Google BigQuery documentation.
+ 
+ ### Load
+ ```apache
+ <match bigquery>
+   @type bigquery
+ 
+   method load
+   buffer_type file
+   buffer_path bigquery.*.buffer
+   flush_interval 1800
+   flush_at_shutdown true
+   try_flush_interval 1
+   utc
+ 
+   auth_method json_key
+   json_key json_key_path.json
+ 
+   time_format %s
+   time_field time
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   auto_create_table true
+   table yourtable%{time_slice}
+   schema_path bq_schema.json
+ </match>
+ ```
+ 
+ I recommend using a file buffer and a long flush interval.
+ 
+ __CAUTION: the `flush_interval` default is still `0.25` even if `method` is `load` in the current version.__
+ 
+ ### Authentication
+ 
+ There are four supported ways to fetch an access token for the service account.
+ 
+ 1. Public-private key pair of a GCP (Google Cloud Platform) service account
+ 2. JSON key of a GCP (Google Cloud Platform) service account
+ 3. Predefined access token (Compute Engine only)
+ 4. Google application default credentials (http://goo.gl/IUuyuX)
+ 
+ #### Public-private key pair of a GCP service account
+ 
+ The examples above use the first one. You first need to create a service account (client ID),
+ download its private key and deploy the key with fluentd.
+ 
+ #### JSON key of a GCP (Google Cloud Platform) service account
+ 
+ You first need to create a service account (client ID),
+ download its JSON key and deploy the key with fluentd.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   auth_method json_key
+   json_key /home/username/.keys/00000000000000000000000000000000-jsonkey.json
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   table tablename
+   ...
+ </match>
+ ```
+ 
+ You can also provide `json_key` as an embedded JSON string like this.
+ You only need to include the `private_key` and `client_email` keys from the JSON key file.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   auth_method json_key
+   json_key {"private_key": "-----BEGIN PRIVATE KEY-----\n...", "client_email": "xxx@developer.gserviceaccount.com"}
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   table tablename
+   ...
+ </match>
+ ```
+ 
+ #### Predefined access token (Compute Engine only)
+ 
+ When you run fluentd on a Google Compute Engine instance,
+ you don't need to explicitly create a service account for fluentd.
+ In this authentication method, you need to add the API scope "https://www.googleapis.com/auth/bigquery" to the scope list of your
+ Compute Engine instance, then you can configure fluentd like this.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   auth_method compute_engine
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   table tablename
+ 
+   time_format %s
+   time_field time
+   ...
+ </match>
+ ```
+ 
+ #### Application default credentials
+ 
+ The Application Default Credentials provide a simple way to get authorization credentials for use in calling Google APIs, and are described in detail at http://goo.gl/IUuyuX.
+ 
+ In this authentication method, the credentials returned are determined by the environment the code is running in. Conditions are checked in the following order (a configuration sketch follows the list):
+ 
+ 1. The environment variable `GOOGLE_APPLICATION_CREDENTIALS` is checked. If this variable is specified, it should point to a JSON key file that defines the credentials.
+ 2. The environment variables `GOOGLE_PRIVATE_KEY` and `GOOGLE_CLIENT_EMAIL` are checked. If these variables are specified, `GOOGLE_PRIVATE_KEY` should hold the `private_key` and `GOOGLE_CLIENT_EMAIL` the `client_email` from a JSON key.
+ 3. The well-known path is checked. If the file exists, it is used as a JSON key file. This path is `$HOME/.config/gcloud/application_default_credentials.json`.
+ 4. The system default path is checked. If the file exists, it is used as a JSON key file. This path is `/etc/google/auth/application_default_credentials.json`.
+ 5. If you are running in Google Compute Engine production, the built-in service account associated with the virtual machine instance will be used.
+ 6. If none of these conditions is true, an error will occur.
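A minimal configuration sketch for this method. The `application_default` value comes from the options table above; the project, dataset and table names are placeholders:

```apache
<match dummy>
  @type bigquery

  auth_method application_default

  project yourproject_id
  dataset yourdataset_id
  table tablename
  ...
</match>
```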
+ 
+ ### Table id formatting
+ 
+ #### strftime formatting
+ The `table` and `tables` options accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
+ format to construct table ids.
+ Table ids are formatted at runtime
+ using the local time of the fluentd server.
+ 
+ For example, with the configuration below,
+ data is inserted into tables `accesslog_2014_08`, `accesslog_2014_09` and so on.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+ 
+   project yourproject_id
+   dataset yourdataset_id
+   table accesslog_%Y_%m
+ 
+   ...
+ </match>
+ ```
+ 
+ #### record attribute formatting
+ The format can be suffixed with an attribute name.
+ 
+ __NOTE: This feature is available only if `method` is `insert`, because it has a performance impact. Use `%{time_slice}` instead where possible.__
+ 
+ ```apache
+ <match dummy>
+   ...
+   table accesslog_%Y_%m@timestamp
+   ...
+ </match>
+ ```
+ 
+ If an attribute name is given, the time used for formatting is the value in each row.
+ The value for the time should be a UNIX time.
+ 
+ #### time_slice_key formatting
+ Or, the options can use the `%{time_slice}` placeholder.
+ `%{time_slice}` is replaced by the formatted time slice key at runtime.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+   table accesslog%{time_slice}
+   ...
+ </match>
+ ```
+ 
+ #### record attribute value formatting
+ Or, the `${attr_name}` placeholder is available to use the value of an attribute as part of the table id.
+ `${attr_name}` is replaced by the string value of the attribute specified by `attr_name`.
+ 
+ __NOTE: This feature is available only if `method` is `insert`.__
+ 
+ ```apache
+ <match dummy>
+   ...
+   table accesslog_%Y_%m_${subdomain}
+   ...
+ </match>
+ ```
+ 
+ For example, if the value of the `subdomain` attribute is `"bq.fluent"`, the table id will be like "accesslog_2016_03_bqfluent".
+ 
+ - any type of attribute is allowed because the stringified value is used as the replacement.
+ - acceptable characters are alphabetic characters, digits and `_`. All other characters will be removed.
+ 
+ ### Date partitioned table support
+ This plugin can insert (load) into a date partitioned table.
+ 
+ Use `%{time_slice}`.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+   time_slice_format %Y%m%d
+   table accesslog$%{time_slice}
+   ...
+ </match>
+ ```
+ 
+ Note that dynamic table creating does not support date partitioned tables yet.
+ 
+ ### Dynamic table creating
+ 
+ When `auto_create_table` is set to `true`, the plugin tries to create the table using the BigQuery API when an insertion fails with code=404 "Not Found: Table ...".
+ The next insert retry is then expected to succeed.
+ 
+ NOTE: The `auto_create_table` option cannot be used together with `fetch_schema`. You should create the table in advance to use `fetch_schema`.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+ 
+   auto_create_table true
+   table accesslog_%Y_%m
+ 
+   ...
+ </match>
+ ```
+ 
+ ### Table schema
+ 
+ There are three methods to describe the schema of the target table.
+ 
+ 1. List fields in fluent.conf
+ 2. Load a schema file in JSON.
+ 3. Fetch a schema using the BigQuery API
+ 
+ The examples above use the first method. In this method,
+ you can also specify nested fields by listing them under the record field they belong to.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+ 
+   time_format %s
+   time_field time
+ 
+   schema [
+     {"name": "time", "type": "INTEGER"},
+     {"name": "status", "type": "INTEGER"},
+     {"name": "bytes", "type": "INTEGER"},
+     {"name": "vhost", "type": "STRING"},
+     {"name": "path", "type": "STRING"},
+     {"name": "method", "type": "STRING"},
+     {"name": "protocol", "type": "STRING"},
+     {"name": "agent", "type": "STRING"},
+     {"name": "referer", "type": "STRING"},
+     {"name": "remote", "type": "RECORD", "fields": [
+       {"name": "host", "type": "STRING"},
+       {"name": "ip", "type": "STRING"},
+       {"name": "user", "type": "STRING"}
+     ]},
+     {"name": "requesttime", "type": "FLOAT"},
+     {"name": "bot_access", "type": "BOOLEAN"},
+     {"name": "loginsession", "type": "BOOLEAN"}
+   ]
+ </match>
+ ```
+ 
+ This schema accepts structured JSON data like:
+ 
+ ```json
+ {
+   "request":{
+     "time":1391748126.7000976,
+     "vhost":"www.example.com",
+     "path":"/",
+     "method":"GET",
+     "protocol":"HTTP/1.1",
+     "agent":"HotJava",
+     "bot_access":false
+   },
+   "remote":{ "ip": "192.0.2.1" },
+   "response":{
+     "status":200,
+     "bytes":1024
+   }
+ }
+ ```
+ 
+ The second method is to specify a path to a BigQuery schema file instead of listing fields. In this case, your fluent.conf looks like:
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+ 
+   time_format %s
+   time_field time
+ 
+   schema_path /path/to/httpd.schema
+ </match>
+ ```
+ where `/path/to/httpd.schema` is a path to the JSON-encoded schema file which you used for creating the table on BigQuery. By using an external schema file you are able to write a full schema that supports NULLABLE/REQUIRED/REPEATED; this is really useful and adds full flexibility.
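A short sketch of what such a schema file can look like; the field names and modes below are illustrative, not taken from this README:

```json
[
  {"name": "time", "type": "TIMESTAMP", "mode": "REQUIRED"},
  {"name": "vhost", "type": "STRING", "mode": "NULLABLE"},
  {"name": "remote", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "ip", "type": "STRING", "mode": "NULLABLE"}
  ]}
]
```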
+ 
+ The third method is to set `fetch_schema` to `true` to fetch the schema using the BigQuery API. In this case, your fluent.conf looks like:
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+ 
+   time_format %s
+   time_field time
+ 
+   fetch_schema true
+   # fetch_schema_table other_table # if you want to fetch the schema from another table
+ </match>
+ ```
+ 
+ If you specify multiple tables in the configuration file, the plugin fetches the schema of each table from BigQuery and merges them.
+ 
+ NOTE: Since JSON does not define how to encode data of TIMESTAMP type,
+ you are still recommended to specify JSON types for TIMESTAMP fields, as the "time" field does in the example above, if you use the second or third method.
+ 
+ ### Specifying insertId property
+ 
+ BigQuery uses the `insertId` property to detect duplicate insertion requests (see [data consistency](https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency) in the Google BigQuery documentation).
+ You can set the `insert_id_field` option to specify the field to use as the `insertId` property.
+ 
+ ```apache
+ <match dummy>
+   @type bigquery
+ 
+   ...
+ 
+   insert_id_field uuid
+   schema [{"name": "uuid", "type": "STRING"}]
+ </match>
+ ```
+ 
+ ## TODO
+ 
+ * OAuth installed application credentials support
+ * Google API discovery expiration
+ * check row size limits
+ 
+ ## Authors
+ 
+ * @tagomoris: First author, original version
+ * KAIZEN platform Inc.: Maintainer, since 2014.08.19
+ * @joker1007