fluent-plugin-bigquery-test 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: ff14eb5085151de11780105f37826c8c6359e064e8b03b151593a825392743bb
4
+ data.tar.gz: 04ba1a56eef89610cdce659129de4b887141a992b1836114df07a0bc546e5d95
5
+ SHA512:
6
+ metadata.gz: 8016409a53493922cd2df4d1e1628fb47ca2189392fb3c0eef154e512ee34ce59bb491af4f6dca0d9841af52b81326fd2525f22971a174ff1b1ac2a6f627ac79
7
+ data.tar.gz: 2ec49bcf6281f40887128c079ded78a62f837737a4c264933cb8750233dd64cbdf76ab018260d071a6f2fa0f064b39fa340026e2d3bb20656be448a058a853c9
@@ -0,0 +1,16 @@
1
+ <!-- Please check your configuration and the fluentd docs first! -->
2
+
3
+ ## Environments
4
+
5
+ - fluentd version:
6
+ - plugin version:
7
+
8
+ ## Configuration
9
+ <!-- Please write your configuration -->
10
+
11
+ ## Expected Behavior
12
+
13
+ ## Actual Behavior
14
+
15
+ ## Log (if you have)
16
+
@@ -0,0 +1,21 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ .ruby-version
7
+ Gemfile.lock
8
+ InstalledFiles
9
+ _yardoc
10
+ coverage
11
+ doc/
12
+ lib/bundler/man
13
+ pkg
14
+ rdoc
15
+ spec/reports
16
+ test/tmp
17
+ test/version_tmp
18
+ tmp
19
+ script/
20
+
21
+ fluentd-0.12
@@ -0,0 +1,14 @@
1
+ language: ruby
2
+
3
+ rvm:
4
+ - 2.3.7
5
+ - 2.4.4
6
+ - 2.5.1
7
+
8
+ gemfile:
9
+ - Gemfile
10
+
11
+ before_install:
12
+ - gem update bundler
13
+
14
+ script: bundle exec rake test
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in fluent-plugin-bigquery.gemspec
4
+ gemspec
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2012- TAGOMORI Satoshi
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
@@ -0,0 +1,602 @@
1
+ # fluent-plugin-bigquery
2
+
3
+ ## Notice
4
+
5
+ We will transfer the fluent-plugin-bigquery repository to the [fluent-plugins-nursery](https://github.com/fluent-plugins-nursery) organization.
6
+ This does not change the maintenance plan.
7
+ The main purpose is to resolve the mismatch between the maintainers and the current organization.
8
+
9
+ ---
10
+
11
+ [Fluentd](http://fluentd.org) output plugin to load/insert data into Google BigQuery.
12
+
13
+ - **Plugin type**: Output
14
+
15
+ * insert data over streaming inserts
16
+ * plugin type is `bigquery_insert`
17
+ * for continuous real-time insertions
18
+ * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases
19
+ * load data
20
+ * plugin type is `bigquery_load`
21
+ * for loading data as batch jobs, suitable for large amounts of data
22
+ * https://developers.google.com/bigquery/loading-data-into-bigquery
23
+
24
+ The current version of this plugin supports the Google API with Service Account Authentication, but does not support
25
+ the OAuth flow for installed applications.
26
+
27
+ ## Support Version
28
+
29
+ | plugin version | fluentd version | ruby version |
30
+ | :----------- | :----------- | :----------- |
31
+ | v0.4.x | 0.12.x | 2.0 or later |
32
+ | v1.x.x | 0.14.x or later | 2.2 or later |
33
+ | v2.x.x | 0.14.x or later | 2.3 or later |
34
+
35
+ ## With docker image
36
+ If you use the official Alpine-based fluentd Docker image (https://github.com/fluent/fluentd-docker-image),
37
+ you need to install the `bigdecimal` gem in your own Dockerfile,
38
+ because the Alpine-based image ships only a minimal Ruby environment in order to reduce image size.
39
+ In most cases this dependency on an embedded gem is not declared in the gemspec,
40
+ because an embedded gem dependency can restrict the Ruby environment.
41
+
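+ For reference, here is a minimal Dockerfile sketch (the base image tag and the installed package set are assumptions; adjust them to your environment):
+
+ ```dockerfile
+ # Sketch only: tag and packages are illustrative, not pinned recommendations.
+ FROM fluent/fluentd:edge
+ USER root
+ # Build tools are needed to compile native extensions such as bigdecimal on Alpine.
+ RUN apk add --no-cache --virtual .build-deps build-base ruby-dev \
+  && gem install bigdecimal fluent-plugin-bigquery \
+  && apk del .build-deps
+ USER fluent
+ ```
+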
42
+ ## Configuration
43
+
44
+ ### Options
45
+
46
+ #### common
47
+
48
+ | name | type | required? | placeholder? | default | description |
49
+ | :-------------------------------------------- | :------------ | :----------- | :---------- | :------------------------- | :----------------------- |
50
+ | auth_method | enum | yes | no | private_key | `private_key` or `json_key` or `compute_engine` or `application_default` |
51
+ | email | string | yes (private_key) | no | nil | GCP Service Account Email |
52
+ | private_key_path | string | yes (private_key) | no | nil | GCP Private Key file path |
53
+ | private_key_passphrase | string | yes (private_key) | no | nil | GCP Private Key Passphrase |
54
+ | json_key | string | yes (json_key) | no | nil | GCP JSON Key file path or JSON Key string |
55
+ | location | string | no | no | nil | BigQuery Data Location. The geographic location of the job. Required except for US and EU. |
56
+ | project | string | yes | yes | nil | |
57
+ | dataset | string | yes | yes | nil | |
58
+ | table | string | yes (either `tables`) | yes | nil | |
59
+ | tables | array(string) | yes (either `table`) | yes | nil | can set multiple table names separated by `,` |
60
+ | auto_create_table | bool | no | no | false | If true, creates table automatically |
61
+ | ignore_unknown_values | bool | no | no | false | Accept rows that contain values that do not match the schema. The unknown values are ignored. |
62
+ | schema | array | yes (either `fetch_schema` or `schema_path`) | no | nil | Schema definition, formatted as JSON. |
63
+ | schema_path | string | yes (either `fetch_schema`) | no | nil | Schema definition file path, formatted as JSON. |
64
+ | fetch_schema | bool | yes (either `schema_path`) | no | false | If true, fetch the table schema definition from the BigQuery table automatically. |
65
+ | fetch_schema_table | string | no | yes | nil | If set, fetch the table schema definition from this table. If fetch_schema is false, this param is ignored |
66
+ | schema_cache_expire | integer | no | no | 600 | Value is in seconds. If the current time is past the expiration interval, the table schema definition is re-fetched. |
67
+ | request_timeout_sec | integer | no | no | nil | Bigquery API response timeout |
68
+ | request_open_timeout_sec | integer | no | no | 60 | Bigquery API connection, and request timeout. If you send big data to Bigquery, set large value. |
69
+ | time_partitioning_type | enum | no (either day) | no | nil | Type of bigquery time partitioning feature. |
70
+ | time_partitioning_field | string | no | no | nil | Field used to determine how to create a time-based partition. |
71
+ | time_partitioning_expiration | time | no | no | nil | Expiration milliseconds for bigquery time partitioning. |
72
+ | clustering_fields | array(string) | no | no | nil | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |
73
+
74
+ #### bigquery_insert
75
+
76
+ | name | type | required? | placeholder? | default | description |
77
+ | :------------------------------------- | :------------ | :----------- | :---------- | :------------------------- | :----------------------- |
78
+ | template_suffix | string | no | yes | nil | can use `%{time_slice}` placeholder replaced by `time_slice_format` |
79
+ | skip_invalid_rows | bool | no | no | false | |
80
+ | insert_id_field | string | no | no | nil | Use key as `insert_id` of Streaming Insert API parameter. see. https://docs.fluentd.org/v1.0/articles/api-plugin-helper-record_accessor |
81
+ | add_insert_timestamp | string | no | no | nil | Adds a timestamp column just before sending the rows to BigQuery, so that buffering time is not taken into account. Gives a field in BigQuery which represents the insert time of the row. |
82
+ | allow_retry_insert_errors | bool | no | no | false | Retry to insert rows when an insertErrors occurs. There is a possibility that rows are inserted in duplicate. |
83
+
84
+ #### bigquery_load
85
+
86
+ | name | type | required? | placeholder? | default | description |
87
+ | :------------------------------------- | :------------ | :----------- | :---------- | :------------------------- | :----------------------- |
88
+ | source_format | enum | no | no | json | Specify source format `json` or `csv` or `avro`. If you change this parameter, you must change formatter plugin via `<format>` config section. |
89
+ | max_bad_records | integer | no | no | 0 | If the number of bad records exceeds this value, an invalid error is returned in the job result. |
90
+
91
+ ### Buffer section
92
+
93
+ | name | type | required? | default | description |
94
+ | :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
95
+ | @type | string | no | memory (insert) or file (load) | |
96
+ | chunk_limit_size | integer | no | 1MB (insert) or 1GB (load) | |
97
+ | total_limit_size | integer | no | 1GB (insert) or 32GB (load) | |
98
+ | chunk_records_limit | integer | no | 500 (insert) or nil (load) | |
99
+ | flush_mode | enum | no | interval | default, lazy, interval, immediate |
100
+ | flush_interval | float | no | 1.0 (insert) or 3600 (load) | |
101
+ | flush_thread_interval | float | no | 0.05 (insert) or 5 (load) | |
102
+ | flush_thread_burst_interval | float | no | 0.05 (insert) or 5 (load) | |
103
+
104
+ Other parameters (defined by the base class) are also available.
105
+
106
+ See https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/output.rb
107
+
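+ For example, here is a hedged sketch of a buffer section for `bigquery_insert` that spells out a few of these defaults (values are illustrative, not recommendations):
+
+ ```apache
+ <buffer>
+ @type memory # default for bigquery_insert
+ flush_mode interval
+ flush_interval 1.0 # default for insert
+ chunk_limit_size 1m # BigQuery limits one insert request to 1MB
+ chunk_records_limit 500 # streaming inserts accept up to 500 rows per call
+ total_limit_size 1g
+ </buffer>
+ ```
+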
108
+ ### Inject section
109
+
110
+ It is a replacement for the `time_field` and `time_format` options from previous versions.
111
+
112
+ For example.
113
+
114
+ ```
115
+ <inject>
116
+ time_key time_field_name
117
+ time_type string
118
+ time_format %Y-%m-%d %H:%M:%S
119
+ </inject>
120
+ ```
121
+
122
+ | name | type | required? | default | description |
123
+ | :------------------------------------- | :------------ | :----------- | :------------------------- | :----------------------- |
124
+ | hostname_key | string | no | nil | |
125
+ | hostname | string | no | nil | |
126
+ | tag_key | string | no | nil | |
127
+ | time_key | string | no | nil | |
128
+ | time_type | string | no | nil | |
129
+ | time_format | string | no | nil | |
130
+ | localtime | bool | no | true | |
131
+ | utc | bool | no | false | |
132
+ | timezone | string | no | nil | |
133
+
134
+ See https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin_helper/inject.rb
135
+
136
+ ### Formatter section
137
+
138
+ This section is for `load` mode only.
139
+ If you use `insert` mode, only the `json` formatter is used.
140
+
141
+ BigQuery supports the `csv`, `json` and `avro` formats. The default is `json`.
142
+ I recommend using `json` for now.
143
+
144
+ For example.
145
+
146
+ ```
147
+ source_format csv
148
+
149
+ <format>
150
+ @type csv
151
+ fields col1, col2, col3
152
+ </format>
153
+ ```
154
+
155
+ See https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin_helper/formatter.rb
156
+
157
+ ## Examples
158
+
159
+ ### Streaming inserts
160
+
161
+ Configure the insert specification with the target table schema and your credentials. This is the minimum configuration:
162
+
163
+ ```apache
164
+ <match dummy>
165
+ @type bigquery_insert
166
+
167
+ auth_method private_key # default
168
+ email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
169
+ private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
170
+ # private_key_passphrase notasecret # default
171
+
172
+ project yourproject_id
173
+ dataset yourdataset_id
174
+ table tablename
175
+
176
+ schema [
177
+ {"name": "time", "type": "INTEGER"},
178
+ {"name": "status", "type": "INTEGER"},
179
+ {"name": "bytes", "type": "INTEGER"},
180
+ {"name": "vhost", "type": "STRING"},
181
+ {"name": "path", "type": "STRING"},
182
+ {"name": "method", "type": "STRING"},
183
+ {"name": "protocol", "type": "STRING"},
184
+ {"name": "agent", "type": "STRING"},
185
+ {"name": "referer", "type": "STRING"},
186
+ {"name": "remote", "type": "RECORD", "fields": [
187
+ {"name": "host", "type": "STRING"},
188
+ {"name": "ip", "type": "STRING"},
189
+ {"name": "user", "type": "STRING"}
190
+ ]},
191
+ {"name": "requesttime", "type": "FLOAT"},
192
+ {"name": "bot_access", "type": "BOOLEAN"},
193
+ {"name": "loginsession", "type": "BOOLEAN"}
194
+ ]
195
+ </match>
196
+ ```
197
+
198
+ For high-rate inserts over the streaming inserts API, you should specify flush intervals and buffer chunk options:
199
+
200
+ ```apache
201
+ <match dummy>
202
+ @type bigquery_insert
203
+
204
+ <buffer>
205
+ flush_interval 0.1 # flush as frequent as possible
206
+
207
+ total_limit_size 10g
208
+
209
+ flush_thread_count 16
210
+ </buffer>
211
+
212
+ auth_method private_key # default
213
+ email xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
214
+ private_key_path /home/username/.keys/00000000000000000000000000000000-privatekey.p12
215
+ # private_key_passphrase notasecret # default
216
+
217
+ project yourproject_id
218
+ dataset yourdataset_id
219
+ tables accesslog1,accesslog2,accesslog3
220
+
221
+ schema [
222
+ {"name": "time", "type": "INTEGER"},
223
+ {"name": "status", "type": "INTEGER"},
224
+ {"name": "bytes", "type": "INTEGER"},
225
+ {"name": "vhost", "type": "STRING"},
226
+ {"name": "path", "type": "STRING"},
227
+ {"name": "method", "type": "STRING"},
228
+ {"name": "protocol", "type": "STRING"},
229
+ {"name": "agent", "type": "STRING"},
230
+ {"name": "referer", "type": "STRING"},
231
+ {"name": "remote", "type": "RECORD", "fields": [
232
+ {"name": "host", "type": "STRING"},
233
+ {"name": "ip", "type": "STRING"},
234
+ {"name": "user", "type": "STRING"}
235
+ ]},
236
+ {"name": "requesttime", "type": "FLOAT"},
237
+ {"name": "bot_access", "type": "BOOLEAN"},
238
+ {"name": "loginsession", "type": "BOOLEAN"}
239
+ ]
240
+ </match>
241
+ ```
242
+
243
+ Important options for high-rate events are (a combined buffer sketch follows this list):
244
+
245
+ * `tables`
246
+ * 2 or more tables are available with a ',' separator
247
+ * `out_bigquery` uses these tables for table-sharding inserts
248
+ * these tables must have the same schema
249
+ * `buffer/chunk_limit_size`
250
+ * max size of an insert or chunk (default 1000000 or 1MB)
251
+ * the max size is limited to 1MB on BigQuery
252
+ * `buffer/chunk_records_limit`
253
+ * the number of records per streaming inserts API call is limited to 500, per insert or chunk
254
+ * `out_bigquery` flushes the buffer with 500 records per insert API call
255
+ * `buffer/queue_length_limit`
256
+ * BigQuery streaming inserts need very small buffer chunks
257
+ * for high-rate events, `buffer_queue_limit` should be configured with a large number
258
+ * in the default configuration, up to 1GB of memory may be used when network problems occur
259
+ * `chunk_limit_size (default 1MB)` x `queue_length_limit (default 1024)`
260
+ * `buffer/flush_thread_count`
261
+ * threads for insert API calls in parallel
262
+ * specify this option for 100 or more records per second
263
+ * 10 or more threads seem good for inserts over the internet
264
+ * fewer threads may be enough for Google Compute Engine instances (with low latency to BigQuery)
265
+ * `buffer/flush_interval`
266
+ * interval between data flushes (default 0.25)
267
+ * you can set subsecond values such as `0.15` on Fluentd v0.10.42 or later
268
+
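+ As a reference for the options above, here is a hedged buffer sketch for a high-rate setup (the numbers are illustrative; tune them against your own traffic and the quota policy below):
+
+ ```apache
+ <buffer>
+ flush_interval 0.1 # flush as frequently as possible
+ chunk_records_limit 500 # limit of the streaming inserts API
+ chunk_limit_size 1m # limit of the streaming inserts API
+ total_limit_size 10g
+ flush_thread_count 16 # parallel insert API calls
+ </buffer>
+ ```
+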
269
+ See [Quota policy](https://cloud.google.com/bigquery/streaming-data-into-bigquery#quota)
270
+ section in the Google BigQuery document.
271
+
272
+ ### Load
273
+ ```apache
274
+ <match bigquery>
275
+ @type bigquery_load
276
+
277
+ <buffer>
278
+ path bigquery.*.buffer
279
+ flush_at_shutdown true
280
+ timekey_use_utc
281
+ </buffer>
282
+
283
+ auth_method json_key
284
+ json_key json_key_path.json
285
+
286
+ project yourproject_id
287
+ dataset yourdataset_id
288
+ auto_create_table true
289
+ table yourtable%{time_slice}
290
+ schema_path bq_schema.json
291
+ </match>
292
+ ```
293
+
294
+ I recommend using a file buffer and a long flush interval, as sketched below.
295
+
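+ A hedged sketch of such a buffer section (the path and values are illustrative):
+
+ ```apache
+ <buffer>
+ @type file # default buffer type for bigquery_load
+ path /var/log/fluent/bigquery.*.buffer # illustrative path
+ flush_interval 3600 # default for load; large, infrequent chunks suit load jobs
+ flush_at_shutdown true
+ total_limit_size 32g # default for load
+ </buffer>
+ ```
+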
296
+ ### Authentication
297
+
298
+ Four methods are supported to fetch an access token for the service account.
299
+
300
+ 1. Public-Private key pair of GCP(Google Cloud Platform)'s service account
301
+ 2. JSON key of GCP(Google Cloud Platform)'s service account
302
+ 3. Predefined access token (Compute Engine only)
303
+ 4. Google application default credentials (http://goo.gl/IUuyuX)
304
+
305
+ #### Public-Private key pair of GCP's service account
306
+
307
+ The examples above use the first one. You first need to create a service account (client ID),
308
+ download its private key and deploy the key with fluentd.
309
+
310
+ #### JSON key of GCP(Google Cloud Platform)'s service account
311
+
312
+ You first need to create a service account (client ID),
313
+ download its JSON key and deploy the key with fluentd.
314
+
315
+ ```apache
316
+ <match dummy>
317
+ @type bigquery_insert
318
+
319
+ auth_method json_key
320
+ json_key /home/username/.keys/00000000000000000000000000000000-jsonkey.json
321
+
322
+ project yourproject_id
323
+ dataset yourdataset_id
324
+ table tablename
325
+ ...
326
+ </match>
327
+ ```
328
+
329
+ You can also provide `json_key` as an embedded JSON string like this.
330
+ You only need to include the `private_key` and `client_email` keys from the JSON key file.
331
+
332
+ ```apache
333
+ <match dummy>
334
+ @type bigquery_insert
335
+
336
+ auth_method json_key
337
+ json_key {"private_key": "-----BEGIN PRIVATE KEY-----\n...", "client_email": "xxx@developer.gserviceaccount.com"}
338
+
339
+ project yourproject_id
340
+ dataset yourdataset_id
341
+ table tablename
342
+ ...
343
+ </match>
344
+ ```
345
+
346
+ #### Predefined access token (Compute Engine only)
347
+
348
+ When you run fluentd on a Google Compute Engine instance,
349
+ you don't need to explicitly create a service account for fluentd.
350
+ In this authentication method, you need to add the API scope "https://www.googleapis.com/auth/bigquery" to the scope list of your
351
+ Compute Engine instance; then you can configure fluentd like this.
352
+
353
+ ```apache
354
+ <match dummy>
355
+ @type bigquery_insert
356
+
357
+ auth_method compute_engine
358
+
359
+ project yourproject_id
360
+ dataset yourdataset_id
361
+ table tablename
362
+
363
+ ...
364
+ </match>
365
+ ```
366
+
367
+ #### Application default credentials
368
+
369
+ The Application Default Credentials provide a simple way to get authorization credentials for use in calling Google APIs, which are described in detail at http://goo.gl/IUuyuX.
370
+
371
+ In this authentication method, the credentials returned are determined by the environment the code is running in. Conditions are checked in the following order (a configuration sketch follows the list below):
372
+
373
+ 1. The environment variable `GOOGLE_APPLICATION_CREDENTIALS` is checked. If this variable is specified it should point to a JSON key file that defines the credentials.
374
+ 2. The environment variables `GOOGLE_PRIVATE_KEY` and `GOOGLE_CLIENT_EMAIL` are checked. If these variables are specified, `GOOGLE_PRIVATE_KEY` should contain the `private_key` and `GOOGLE_CLIENT_EMAIL` the `client_email` from a JSON key.
375
+ 3. The well-known path is checked. If the file exists, it is used as a JSON key file. This path is `$HOME/.config/gcloud/application_default_credentials.json`.
376
+ 4. The system default path is checked. If the file exists, it is used as a JSON key file. This path is `/etc/google/auth/application_default_credentials.json`.
377
+ 5. If you are running in Google Compute Engine production, the built-in service account associated with the virtual machine instance will be used.
378
+ 6. If none of these conditions is true, an error will occur.
379
+
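+ A hedged configuration sketch using this method (it relies on one of the credential sources listed above, such as `GOOGLE_APPLICATION_CREDENTIALS`):
+
+ ```apache
+ <match dummy>
+ @type bigquery_insert
+
+ auth_method application_default
+
+ project yourproject_id
+ dataset yourdataset_id
+ table tablename
+
+ ...
+ </match>
+ ```
+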
380
+ ### Table id formatting
381
+
382
+ This plugin supports fluentd-0.14 style placeholders.
383
+
384
+ #### strftime formatting
385
+ `table` and `tables` options accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)
386
+ format to construct table ids.
387
+ Table ids are formatted at runtime
388
+ using the chunk key time.
389
+
390
+ See http://docs.fluentd.org/v0.14/articles/output-plugin-overview
391
+
392
+ For example, with the configuration below,
393
+ data is inserted into tables `accesslog_2014_08`, `accesslog_2014_09` and so on.
394
+
395
+ ```apache
396
+ <match dummy>
397
+ @type bigquery_insert
398
+
399
+ ...
400
+
401
+ project yourproject_id
402
+ dataset yourdataset_id
403
+ table accesslog_%Y_%m
404
+
405
+ <buffer time>
406
+ timekey 1d
407
+ </buffer>
408
+ ...
409
+ </match>
410
+ ```
411
+
412
+ #### record attribute formatting
413
+ The format can be suffixed with a record attribute name.
414
+
415
+ __CAUTION: the format is different from previous versions__
416
+
417
+ ```apache
418
+ <match dummy>
419
+ ...
420
+ table accesslog_${status_code}
421
+
422
+ <buffer status_code>
423
+ </buffer>
424
+ ...
425
+ </match>
426
+ ```
427
+
428
+ If an attribute name is given, the time used for formatting is the value in each row.
429
+ The value for the time should be a UNIX time.
430
+
431
+ #### time_slice_key formatting
432
+
433
+ Use strftime formatting instead.
434
+
435
+ The strftime formatting in the current version is based on the chunk key.
436
+ That is the same as the previous time_slice_key formatting.
437
+
438
+ ### Date partitioned table support
439
+ This plugin can insert (load) into date-partitioned tables.
440
+
441
+ Use a placeholder.
442
+
443
+ ```apache
444
+ <match dummy>
445
+ @type bigquery_load
446
+
447
+ ...
448
+ table accesslog$%Y%m%d
449
+
450
+ <buffer time>
451
+ timekey 1d
452
+ </buffer>
453
+ ...
454
+ </match>
455
+ ```
456
+
457
+ However, dynamic table creation doesn't support date-partitioned tables yet,
458
+ and streaming inserts are not allowed to use the `$%Y%m%d` suffix.
459
+ If you use a date-partitioned table with streaming inserts, please omit the `$%Y%m%d` suffix from `table`.
460
+
461
+ ### Dynamic table creating
462
+
463
+ When `auto_create_table` is set to `true`, the plugin tries to create the table via the BigQuery API when an insertion fails with code=404 "Not Found: Table ...".
464
+ The next insertion retry is expected to succeed.
465
+
466
+ NOTE: the `auto_create_table` option cannot be used with `fetch_schema`. You should create the table in advance to use `fetch_schema`.
467
+
468
+ ```apache
469
+ <match dummy>
470
+ @type bigquery_insert
471
+
472
+ ...
473
+
474
+ auto_create_table true
475
+ table accesslog_%Y_%m
476
+
477
+ ...
478
+ </match>
479
+ ```
480
+
481
+ Also, you can create a clustered table by using `clustering_fields`, as sketched below.
482
+
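+ A hedged sketch that combines automatic table creation with time partitioning and clustering (the field names `time`, `vhost` and `path` are illustrative and must exist in your schema; the partitioning field must be a TIMESTAMP or DATE column):
+
+ ```apache
+ <match dummy>
+ @type bigquery_insert
+
+ ...
+
+ auto_create_table true
+ table accesslog_%Y_%m
+
+ time_partitioning_type day
+ time_partitioning_field time
+ clustering_fields ["vhost", "path"]
+ </match>
+ ```
+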
483
+ ### Table schema
484
+
485
+ There are three methods to describe the schema of the target table.
486
+
487
+ 1. List fields in fluent.conf
488
+ 2. Load a schema file in JSON.
489
+ 3. Fetch a schema using BigQuery API
490
+
491
+ The examples above use the first method. In this method,
492
+ you can also specify nested fields by nesting them under their parent RECORD field.
493
+
494
+ ```apache
495
+ <match dummy>
496
+ @type bigquery_insert
497
+
498
+ ...
499
+
500
+ schema [
501
+ {"name": "time", "type": "INTEGER"},
502
+ {"name": "status", "type": "INTEGER"},
503
+ {"name": "bytes", "type": "INTEGER"},
504
+ {"name": "vhost", "type": "STRING"},
505
+ {"name": "path", "type": "STRING"},
506
+ {"name": "method", "type": "STRING"},
507
+ {"name": "protocol", "type": "STRING"},
508
+ {"name": "agent", "type": "STRING"},
509
+ {"name": "referer", "type": "STRING"},
510
+ {"name": "remote", "type": "RECORD", "fields": [
511
+ {"name": "host", "type": "STRING"},
512
+ {"name": "ip", "type": "STRING"},
513
+ {"name": "user", "type": "STRING"}
514
+ ]},
515
+ {"name": "requesttime", "type": "FLOAT"},
516
+ {"name": "bot_access", "type": "BOOLEAN"},
517
+ {"name": "loginsession", "type": "BOOLEAN"}
518
+ ]
519
+ </match>
520
+ ```
521
+
522
+ This schema accepts structured JSON data like:
523
+
524
+ ```json
525
+ {
526
+ "request":{
527
+ "time":1391748126.7000976,
528
+ "vhost":"www.example.com",
529
+ "path":"/",
530
+ "method":"GET",
531
+ "protocol":"HTTP/1.1",
532
+ "agent":"HotJava",
533
+ "bot_access":false
534
+ },
535
+ "remote":{ "ip": "192.0.2.1" },
536
+ "response":{
537
+ "status":200,
538
+ "bytes":1024
539
+ }
540
+ }
541
+ ```
542
+
543
+ The second method is to specify a path to a BigQuery schema file instead of listing fields. In this case, your fluent.conf looks like:
544
+
545
+ ```apache
546
+ <match dummy>
547
+ @type bigquery_insert
548
+
549
+ ...
550
+
551
+ schema_path /path/to/httpd.schema
552
+ </match>
553
+ ```
554
+ where `/path/to/httpd.schema` is the path to the JSON-encoded schema file which you used for creating the table on BigQuery. By using an external schema file you can write a full schema that supports NULLABLE/REQUIRED/REPEATED; this feature is really useful and adds full flexibility. A hedged example of such a file is shown below.
555
+
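+ For reference, a hedged example of what such a schema file might look like (a trimmed, illustrative subset of the schema above; the `tags` field is hypothetical and only shows the REPEATED mode):
+
+ ```json
+ [
+ {"name": "time", "type": "INTEGER", "mode": "REQUIRED"},
+ {"name": "vhost", "type": "STRING", "mode": "NULLABLE"},
+ {"name": "remote", "type": "RECORD", "mode": "NULLABLE", "fields": [
+ {"name": "host", "type": "STRING", "mode": "NULLABLE"},
+ {"name": "ip", "type": "STRING", "mode": "NULLABLE"}
+ ]},
+ {"name": "tags", "type": "STRING", "mode": "REPEATED"}
+ ]
+ ```
+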
556
+ The third method is to set `fetch_schema` to `true` to fetch the schema using the BigQuery API. In this case, your fluent.conf looks like:
557
+
558
+ ```apache
559
+ <match dummy>
560
+ @type bigquery_insert
561
+
562
+ ...
563
+
564
+ fetch_schema true
565
+ # fetch_schema_table other_table # if you want to fetch schema from other table
566
+ </match>
567
+ ```
568
+
569
+ If you specify multiple tables in the configuration file, the plugin fetches the schema of each table from BigQuery and merges them.
570
+
571
+ NOTE: Since JSON does not define how to encode data of the TIMESTAMP type,
572
+ you are still recommended to specify JSON-compatible types for TIMESTAMP fields, as the "time" field does in the example, if you use the second or third method.
573
+
574
+ ### Specifying insertId property
575
+
576
+ BigQuery uses `insertId` property to detect duplicate insertion requests (see [data consistency](https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency) in Google BigQuery documents).
577
+ You can set the `insert_id_field` option to specify the field to use as the `insertId` property.
578
+ `insert_id_field` can use the fluentd record_accessor format like `$['key1'][0]['key2']`.
579
+ (See https://docs.fluentd.org/v1.0/articles/api-plugin-helper-record_accessor for details.)
580
+
581
+ ```apache
582
+ <match dummy>
583
+ @type bigquery_insert
584
+
585
+ ...
586
+
587
+ insert_id_field uuid
588
+ schema [{"name": "uuid", "type": "STRING"}]
589
+ </match>
590
+ ```
591
+
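+ A hedged sketch using the record_accessor syntax for a nested key (the `payload`/`uuid` field names are illustrative):
+
+ ```apache
+ <match dummy>
+ @type bigquery_insert
+
+ ...
+
+ insert_id_field $['payload']['uuid']
+ schema [{"name": "payload", "type": "RECORD", "fields": [{"name": "uuid", "type": "STRING"}]}]
+ </match>
+ ```
+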
592
+ ## TODO
593
+
594
+ * OAuth installed application credentials support
595
+ * Google API discovery expiration
596
+ * check row size limits
597
+
598
+ ## Authors
599
+
600
+ * @tagomoris: First author, original version
601
+ * KAIZEN platform Inc.: Maintainer, since 2014.08.19
602
+ * @joker1007