fluent-plugin-kinesis 0.4.1 → 1.0.0

Files changed (44)
  1. checksums.yaml +4 -4
  2. data/.gitignore +13 -18
  3. data/.travis.yml +9 -9
  4. data/CHANGELOG.md +9 -0
  5. data/CONTRIBUTORS.txt +1 -1
  6. data/Gemfile +12 -9
  7. data/LICENSE.txt +39 -201
  8. data/Makefile +40 -0
  9. data/NOTICE.txt +1 -1
  10. data/README-v0.4.md +348 -0
  11. data/README.md +398 -183
  12. data/Rakefile +20 -14
  13. data/benchmark/dummer.conf +13 -0
  14. data/benchmark/firehose.conf +24 -0
  15. data/benchmark/producer.conf +28 -0
  16. data/benchmark/streams.conf +24 -0
  17. data/fluent-plugin-kinesis.gemspec +34 -23
  18. data/gemfiles/Gemfile.fluentd-0.10.58 +20 -0
  19. data/lib/fluent/plugin/kinesis_helper.rb +30 -0
  20. data/lib/fluent/plugin/kinesis_helper/api.rb +164 -0
  21. data/lib/fluent/plugin/kinesis_helper/class_methods.rb +120 -0
  22. data/lib/fluent/plugin/kinesis_helper/client.rb +36 -0
  23. data/lib/fluent/plugin/kinesis_helper/credentials.rb +51 -0
  24. data/lib/fluent/plugin/kinesis_helper/error.rb +38 -0
  25. data/lib/fluent/plugin/kinesis_helper/format.rb +85 -0
  26. data/lib/fluent/plugin/kinesis_helper/initialize.rb +58 -0
  27. data/lib/fluent/plugin/kinesis_helper/kpl.rb +81 -0
  28. data/lib/fluent/plugin/out_kinesis.rb +13 -11
  29. data/lib/fluent/plugin/out_kinesis_firehose.rb +44 -0
  30. data/lib/fluent/plugin/out_kinesis_producer.rb +38 -0
  31. data/lib/fluent/plugin/out_kinesis_streams.rb +47 -0
  32. data/lib/fluent/plugin/patched_detach_process_impl.rb +103 -0
  33. data/lib/fluent_plugin_kinesis/version.rb +17 -0
  34. data/lib/kinesis_producer.rb +24 -0
  35. data/lib/kinesis_producer/binary.rb +10 -0
  36. data/lib/kinesis_producer/daemon.rb +238 -0
  37. data/lib/kinesis_producer/library.rb +122 -0
  38. data/lib/kinesis_producer/protobuf/config.pb.rb +66 -0
  39. data/lib/kinesis_producer/protobuf/messages.pb.rb +151 -0
  40. data/lib/kinesis_producer/tasks/binary.rake +73 -0
  41. metadata +196 -36
  42. data/lib/fluent/plugin/version.rb +0 -16
  43. data/test/helper.rb +0 -32
  44. data/test/plugin/test_out_kinesis.rb +0 -641
data/README.md CHANGED
@@ -1,47 +1,51 @@
- # Fluent Plugin for Amazon Kinesis
+ # Fluent plugin for Amazon Kinesis
 
- [![Build Status](https://travis-ci.org/awslabs/aws-fluent-plugin-kinesis.svg?branch=master)](https://travis-ci.org/awslabs/aws-fluent-plugin-kinesis)
+ [![Build Status](https://travis-ci.org/awslabs/aws-fluent-plugin-kinesis.svg?branch=master)](https://travis-ci.org/awslabs/aws-fluent-plugin-kinesis)
 
- ## Overview
+ [Fluentd][fluentd] output plugin
+ that sends events to [Amazon Kinesis Streams][streams] (via both the API and the [Kinesis Producer Library (KPL)][producer]) and [Amazon Kinesis Firehose][firehose] (via the API). This gem includes three output plugins:
 
- [Fluentd](http://fluentd.org/) output plugin
- that sends events to [Amazon Kinesis](https://aws.amazon.com/kinesis/).
+ - `kinesis_streams`
+ - `kinesis_producer`
+ - `kinesis_firehose`
 
- Also, there is a documentation on [Fluentd official site](http://docs.fluentd.org/articles/kinesis-stream).
+ Also, there is [documentation on the Fluentd official site][fluentd-doc-kinesis].
 
- ## Installation
+ ## Warning: `kinesis` is no longer supported
+ As of v1.0.0, the `kinesis` plugin is no longer supported. You can still use the plugin, but if you see the warning log below, please consider switching to `kinesis_streams`.
+
+     [warn]: Deprecated warning: out_kinesis is no longer supported after v1.0.0. Please check out_kinesis_streams out.
 
+ If you still want to use `kinesis`, please see [the old README][old-readme].
+
+ ## Installation
  This fluentd plugin is available as the `fluent-plugin-kinesis` gem from RubyGems.
 
      gem install fluent-plugin-kinesis
 
- Or you can install this plugin for [td-agent](https://github.com/treasure-data/td-agent) as:
+ Or you can install this plugin for [td-agent][td-agent] as:
 
-     fluent-gem install fluent-plugin-kinesis
+     td-agent-gem install fluent-plugin-kinesis
 
- If you would like to build by yourself and install, please see the section below.
- Your need [bundler](http://bundler.io/) for this.
+ If you would like to build and install it yourself, please see the section below. You need [bundler][bundler] for this.
 
- In case of using with Fluentd:
- Fluentd will be also installed via the process below.
+ In case of using with Fluentd: Fluentd will also be installed via the process below.
 
      git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
      cd aws-fluent-plugin-kinesis
      bundle install
-     rake build
-     rake install
+     bundle exec rake build
+     bundle exec rake install
 
- Also, you can use this plugin with td-agent:
- You have to install td-agent before installing this plugin.
+ Also, you can use this plugin with td-agent: you have to install td-agent before installing this plugin.
 
      git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
      cd aws-fluent-plugin-kinesis
      bundle install
-     rake build
+     bundle exec rake build
      fluent-gem install pkg/fluent-plugin-kinesis
 
- Or just download specify your Ruby library path.
- Below is the sample for specifying your library path via RUBYLIB.
+ Or just download it and specify your Ruby library path. Below is a sample for specifying your library path via RUBYLIB.
 
      git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
      cd aws-fluent-plugin-kinesis
@@ -49,12 +53,10 @@ Below is the sample for specifying your library path via RUBYLIB.
  export RUBYLIB=$RUBYLIB:/path/to/aws-fluent-plugin-kinesis/lib
 
  ## Dependencies
-
- * Ruby 1.9.3+
- * Fluentd 0.10.43+
+ * Ruby 2.0.0+
+ * Fluentd 0.10.58+
 
  ## Basic Usage
-
  Here are general procedures for using this plugin:
 
  1. Install.
@@ -73,256 +75,432 @@ To run with td-agent, it would be as follows:
  1. Edit configuration file provided by td-agent.
  1. Then, run or restart td-agent.
 
- ## Configuration
+ ## Getting started
+ Assume you use Amazon EC2 instances with an instance profile. If you want to use specific credentials, see [Credentials](#configuration-credentials).
+
+ ### kinesis_streams
+     <match your_tag>
+       @type kinesis_streams
+       region us-east-1
+       stream_name your_stream
+       partition_key key  # Otherwise, a random partition key is used
+     </match>
+ For more detail, see [Configuration: kinesis_streams](#configuration-kinesis_streams).
 
- Here are items for Fluentd configuration file.
+ ### kinesis_producer
+     <match your_tag>
+       @type kinesis_producer
+       region us-east-1
+       stream_name your_stream
+       partition_key key  # Otherwise, a random partition key is used
+     </match>
+ For more detail, see [Configuration: kinesis_producer](#configuration-kinesis_producer).
 
- To put records into Amazon Kinesis,
- you need to provide AWS security credentials.
- If you provide aws_key_id and aws_sec_key in configuration file as below,
- we use it. You can also provide credentials via environment variables as
- AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY. Also we support IAM Role for
- authentication. Please find the [AWS SDK for Ruby Developer Guide](http://docs.aws.amazon.com/AWSSdkDocsRuby/latest/DeveloperGuide/ruby-dg-setup.html)
- for more information about authentication.
- We support all options which AWS SDK for Ruby supports.
+ ### kinesis_firehose
+     <match your_tag>
+       @type kinesis_firehose
+       region us-east-1
+       delivery_stream_name your_stream
+     </match>
+ For more detail, see [Configuration: kinesis_firehose](#configuration-kinesis_firehose).
 
- ### type
+ ### For better throughput
+ Add configuration like the following:
 
- Use the word 'kinesis'.
+     flush_interval 1
+     buffer_chunk_limit 1m
+     try_flush_interval 0.1
+     queued_chunk_flush_interval 0.01
+     num_threads 15
+     detach_process 5
 
- ### stream_name
+ Note: each value should be adjusted to your own system.
 
- Name of the stream to put data.
+ ## Configuration: Credentials
+ To put records into Amazon Kinesis Streams or Firehose, you need to provide AWS security credentials.
 
- ### aws_key_id
+ The credential provider will be chosen by the steps below:
 
+ - Use the [**shared_credentials**](#shared_credentials) section if you set it
+ - Use the [**assume_role_credentials**](#assume_role_credentials) section if you set it
+ - Otherwise, the default provider chain:
+   - [**aws_key_id**](#aws_key_id) and [**aws_sec_key**](#aws_sec_key)
+   - Environment variables (e.g. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc.)
+   - Default shared credentials (`default` in `~/.aws/credentials`)
+   - Instance profile (for Amazon EC2)
+
+ ### aws_key_id
  AWS access key id.
 
  ### aws_sec_key
+ AWS secret access key.
+
+ ### shared_credentials
+ Use this config section to specify the shared credential file path and profile name. If you want to use the default profile (`default` in `~/.aws/credentials`), you don't have to specify it here.
+
+ #### profile_name
+ Profile name in the credential file.
+
+ #### path
+ Path of the credential file.
+
+ ### assume_role_credentials
+ Use this config section for cross-account access.
+
+ #### role_arn
+ IAM Role to be assumed with [AssumeRole][assume_role].
+
+ #### external_id
+ A unique identifier that is used by third parties when [assuming roles][assume_role] in their customers' accounts. Use this option with `role_arn` for third-party cross-account access. For details, please see [How to Use an External ID When Granting Access to Your AWS Resources to a Third Party][external_id].
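+
+ As an illustrative sketch (not taken from this repository), the credential sections are written as nested Fluentd config sections inside the match block; the profile name, role ARN, and external ID below are placeholders:
+
+     <match your_tag>
+       @type kinesis_streams
+       region us-east-1
+       stream_name your_stream
+       # Either a shared credentials profile ...
+       <shared_credentials>
+         profile_name your_profile   # hypothetical profile name
+       </shared_credentials>
+       # ... or an assumed role for cross-account access:
+       # <assume_role_credentials>
+       #   role_arn arn:aws:iam::123456789012:role/your_role   # placeholder ARN
+       #   external_id your_external_id                        # placeholder ID
+       # </assume_role_credentials>
+     </match>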
 
- AWS secret key.
+ ## Configuration: Format
+ This plugin uses `Fluent::TextFormatter` to serialize records to strings. For more detail, see [formatter.rb]. Also, this plugin includes `Fluent::SetTimeKeyMixin` and `Fluent::SetTagKeyMixin` to support **include_time_key** and **include_tag_key**.
 
- ### role_arn
+ ### formatter
+ Default `json`.
 
- IAM Role to be assumed with [AssumeRole](http://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html).
- Use this option for cross account access.
+ ### include_time_key
+ Default `false`. If you want to include the `time` field in your record, set it to `true`.
 
- ### external_id
+ ### include_tag_key
+ Default `false`. If you want to include the `tag` field in your record, set it to `true`.
 
- A unique identifier that is used by third parties when
- [assuming roles](http://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) in their customers' accounts.
- Use this option with `role_arn` for third party cross account access.
- For detail, please see [How to Use an External ID When Granting Access to Your AWS Resources to a Third Party](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).
+ ### data_key
+ If your record contains a field whose string value should be sent to Amazon Kinesis directly (without the formatter), use this parameter to specify the field. In that case, fields other than **data_key** are thrown away and never sent to Amazon Kinesis. Default `nil`, which means the whole record will be formatted and sent.
+
+ ### log_truncate_max_size
+ Integer, default 0. When emitting a log entry, the message will be truncated to this size to avoid an infinite loop when the log is also sent to Kinesis. The value 0 (default) means no truncation.
+
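+ As a brief illustrative sketch (the `log` field name is hypothetical, not from this repository), the format options combine like this:
+
+     <match your_tag>
+       @type kinesis_streams
+       region us-east-1
+       stream_name your_stream
+       formatter json
+       include_time_key true  # add a "time" field to each record
+       include_tag_key true   # add a "tag" field to each record
+       # data_key log         # alternatively, send only the "log" field as-is
+     </match>
+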
+ ## Configuration: kinesis_streams
+ Here are the `kinesis_streams` specific configurations.
+
+ ### stream_name
+ Name of the stream to put data into.
 
  ### region
+ AWS region of your stream. It should be in a form like `us-east-1` or `us-west-2`. Refer to [Regions and Endpoints in AWS General Reference][region] for supported regions.
+
+ Default `nil`, which means the plugin tries to find the region from the environment variable `AWS_REGION`.
+
+ ### partition_key
+ A key to extract the partition key from the JSON object. Default `nil`, which means the partition key will be generated randomly.
 
- AWS region of your stream.
- It should be in form like "us-east-1", "us-west-2".
- Refer to [Regions and Endpoints in AWS General Reference](http://docs.aws.amazon.com/general/latest/gr/rande.html#ak_region)
- for supported regions.
+ ### retries_on_batch_request
+ Integer, default 3. The plugin will put multiple records to Amazon Kinesis Streams in batches using PutRecords. A set of records in a batch may fail for reasons documented in the Kinesis Service API Reference for PutRecords. Failed records will be retried **retries_on_batch_request** times. If a record fails all retries, an error log will be emitted.
 
- ### ensure_stream_connection
+ ### reset_backoff_if_success
+ Boolean, default `true`. If enabled, each retry checks the number of succeeded records in the former batch request and resets the exponential backoff if there was any success. Because a batch request could be composed of requests across shards, simple exponential backoff for the whole batch request wouldn't work in some cases.
 
- When enabled, the plugin checks and ensures a connection to the stream you are using by [DescribeStream](http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html) and throws exception if it fails. Enabled by default.
+ ### batch_request_max_count
+ Integer, default 500. The maximum number of records per batch request made from a record chunk. It can't exceed the default value because it's the API limit.
+
+ ### batch_request_max_size
+ Integer, default 5 * 1024 * 1024. The maximum size in bytes of a batch request made from a record chunk. It can't exceed the default value because it's the API limit.
 
  ### http_proxy
+ HTTP proxy for API calls. Default `nil`.
+
+ ### endpoint
+ API endpoint URL, for testing. Default `nil`.
+
+ ### ssl_verify_peer
+ Boolean. Disable it if you don't want to verify the SSL connection, for testing. Default `true`.
+
+ ### debug
+ Boolean. Enable it if you need to debug Amazon Kinesis Streams API calls. Default is `false`.
+
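+ Putting the `kinesis_streams` options together (a sketch; the stream name and partition key are placeholders):
+
+     <match your_tag>
+       @type kinesis_streams
+       region us-east-1
+       stream_name your_stream
+       partition_key name           # use record["name"] as the partition key
+       retries_on_batch_request 3   # retries for failed PutRecords entries
+       reset_backoff_if_success true
+     </match>
+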
+ ## Configuration: kinesis_producer
+ Here are the `kinesis_producer` specific configurations.
 
- Proxy server, if any.
- It should be in form like "http://squid:3128/"
+ ### stream_name
+ Name of the stream to put data into.
 
- ### random_partition_key
+ ### region
+ AWS region of your stream. It should be in a form like `us-east-1` or `us-west-2`. Refer to [Regions and Endpoints in AWS General Reference][region] for supported regions.
 
- Boolean. If true, the plugin uses randomly generated
- partition key for each record. Note that this parameter
- overrides *partition_key*, *partition_key_expr*,
- *explicit_hash_key* and *explicit_hash_key_expr*.
+ Default `nil`, which means the plugin tries to find the region from the environment variable `AWS_REGION`. If neither **region** nor `AWS_REGION` is defined, the KPL will try to find the region from Amazon EC2 metadata.
 
  ### partition_key
+ A key to extract the partition key from the JSON object. Default `nil`, which means the partition key will be generated randomly.
 
- A key to extract partition key from JSON object.
+ ### debug
+ Boolean. Enable it if you need to debug Kinesis Producer Library metrics. Default is `false`.
 
- ### partition_key_expr
+ ### kinesis_producer
+ This section is the configuration for the Kinesis Producer Library. Almost all of the description comes from [default_config.properties of the KPL Java Sample Application][default_config.properties].
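+
+ As an illustrative sketch (the parameter values are arbitrary examples, not recommendations), these KPL options go in a nested config section:
+
+     <match your_tag>
+       @type kinesis_producer
+       region us-east-1
+       stream_name your_stream
+       <kinesis_producer>
+         record_max_buffered_time 100  # example value
+         metrics_level summary         # example value
+       </kinesis_producer>
+     </match>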
 
- A Ruby expression to extract partition key from JSON object.
- We treat your expression as below.
+ #### aggregation_enabled
+ Enable aggregation. With aggregation, multiple user records are packed into a single KinesisRecord. If disabled, each user record is sent in its own KinesisRecord.
 
-     a_proc = eval(sprintf('proc {|record| %s }', YOUR_EXPRESSION))
-     a_proc.call(record)
+ If your records are small, enabling aggregation will allow you to put many more records than you would otherwise be able to for a shard before getting throttled.
 
- You should write your Ruby expression that receives input data
- as a variable 'record', process it and return it. The returned
- value will be used as a partition key. For use case example,
- see 'Configuration examples' part.
+ Default: `true`
 
- ### explicit_hash_key
+ #### aggregation_max_count
+ Maximum number of items to pack into an aggregated record.
 
- A key to extract explicit hash key from JSON object.
- Explicit hash key is hash value used to explicitly
- determine the shard the data record is assigned to
- by overriding the partition key hash.
+ There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.
 
- ### explicit_hash_key_expr
+ Default: 4294967295
+ Minimum: 1
+ Maximum (inclusive): 9223372036854775807
 
- A Ruby expression to extract explicit hash key from JSON object.
- Your expression will be treat in the same way as we treat partition_key_expr.
+ #### aggregation_max_size
+ Maximum number of bytes to pack into an aggregated Kinesis record.
 
- ### order_events
+ There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.
 
- Boolean. By enabling it, you can strictly order events in Amazon Kinesis,
- according to arrival of events. Without this, events will be coarsely ordered
- based on arrival time. For detail,
- see [Using the Amazon Kinesis Service API](http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-using-api-java.html#kinesis-using-api-defn-sequence-number).
+ If a record has more data by itself than this limit, it will bypass the aggregator. Note the backend enforces a limit of 50KB on record size. If you set this beyond 50KB, oversize records will be rejected at the backend.
 
- Please note that if you set *detach_process* or *num_threads greater than 1*,
- this option will be ignored.
+ Default: 51200
+ Minimum: 64
+ Maximum (inclusive): 1048576
 
- ### detach_process
+ #### collection_max_count
+ Maximum number of items to pack into a PutRecords request.
 
- Integer. Optional. This defines the number of parallel processes to start.
- This can be used to increase throughput by allowing multiple processes to
- execute the plugin at once. This cannot be used together with **order_events**.
- Setting this option to > 0 will cause the plugin to run in a separate
- process. The default is 0.
+ There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.
 
- ### num_threads
+ Default: 500
+ Minimum: 1
+ Maximum (inclusive): 500
 
- Integer. The number of threads to flush the buffer. This plugin is based on
- Fluentd::BufferedOutput, so we buffer incoming records before emitting them to
- Amazon Kinesis. You can find the detail about buffering mechanism [here](http://docs.fluentd.org/articles/buffer-plugin-overview).
- Emitting records to Amazon Kinesis via network causes I/O Wait, so parallelizing
- emitting with threads will improve throughput.
+ #### collection_max_size
+ Maximum amount of data to send with a PutRecords request.
 
- This option can be used to parallelize writes into the output(s)
- designated by the output plugin. The default is 1.
- Also you can use this option with *detach_process*.
+ There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.
 
- ### retries_on_putrecords
+ Records larger than the limit will still be sent, but will not be grouped with others.
 
- Integer, default is 3. When **order_events** is false, the plugin will put multiple
- records to Amazon Kinesis in batches using PutRecords. A set of records in a batch
- may fail for reasons documented in the Kinesis Service API Reference for PutRecords.
- Failed records will be retried **retries_on_putrecords** times. If a record
- fails all retries an error log will be emitted.
+ Default: 5242880
+ Minimum: 52224
+ Maximum (inclusive): 9223372036854775807
 
- ### use_yajl
+ #### connect_timeout
+ Timeout (milliseconds) for establishing TLS connections.
 
- Boolean, default is false.
- In case you find error `Encoding::UndefinedConversionError` with multibyte texts, you can avoid that error with this option.
+ Default: 6000
+ Minimum: 100
+ Maximum (inclusive): 300000
 
- ### zlib_compression
+ #### custom_endpoint
+ Use a custom Kinesis and CloudWatch endpoint.
 
- Boolean, default is false.
- Zlib compresses the message data blob.
- Each zlib compressed message must remain within megabyte in size.
+ Mostly for testing use. Note this does not accept protocols or paths, only host names or IP addresses. There is no way to disable TLS. The KPL always connects with TLS.
 
- ### debug
+ Expected pattern: `^([A-Za-z0-9-\\.]+)?$`
 
- Boolean. Enable if you need to debug Amazon Kinesis API call. Default is false.
+ #### fail_if_throttled
+ If `true`, throttled puts are not retried. The records that got throttled will be failed immediately upon receiving the throttling error. This is useful if you want to react immediately to any throttling without waiting for the KPL to retry. For example, you can use a different hash key to send the throttled record to a backup shard.
 
- ## Configuration examples
+ If `false`, the KPL will automatically retry throttled puts. The KPL performs backoff for shards that it has received throttling errors from, and will avoid flooding them with retries. Note that records may fail from expiration (see [**record_ttl**](#record_ttl)) if they get delayed for too long because of throttling.
 
- Here are some configuration examles.
- Assume that the JSON object below is coming to with tag 'your_tag'.
+ Default: `false`
 
- {
- "name":"foo",
- "action":"bar"
- }
+ #### log_level
+ Minimum level of logs. Messages below the specified level will not be logged. Logs for the native KPL daemon show up on stderr.
 
- ### Simply putting events to Amazon Kinesis with a partition key
+ Default: `info`
+ Expected pattern: `info|warning|error`
 
- In this example, simply a value 'foo' will be used as partition key,
- then events will be sent to the stream specified in 'stream_name'.
+ #### max_connections
+ Maximum number of connections to open to the backend. HTTP requests are sent in parallel over multiple connections.
 
- <match your_tag>
- type kinesis
+ Setting this too high may impact latency and consume additional resources without increasing throughput.
 
- stream_name YOUR_STREAM_NAME
+ Default: 4
+ Minimum: 1
+ Maximum (inclusive): 128
 
- aws_key_id YOUR_AWS_ACCESS_KEY
- aws_sec_key YOUR_SECRET_KEY
+ #### metrics_granularity
+ Controls the granularity of metrics that are uploaded to CloudWatch. Greater granularity produces more metrics.
 
- region us-east-1
+ When `shard` is selected, metrics are emitted with the stream name and shard id as dimensions. On top of this, the same metric is also emitted with only the stream name dimension, and lastly, without the stream name. This means for a particular metric, 2 streams with 2 shards (each) will produce 7 CloudWatch metrics, one for each shard, one for each stream, and one overall, all describing the same statistics, but at different levels of granularity.
 
- partition_key name
- </match>
+ When `stream` is selected, per-shard metrics are not uploaded; when `global` is selected, only the total aggregate for all streams and all shards is uploaded.
 
- ### Using partition_key_expr to add specific prefix to partition key
+ Consider reducing the granularity if you're not interested in shard-level metrics, or if you have a large number of shards.
 
- In this example, we add partition_key_expr to the example above.
- This expression adds string 'some_prefix-' to partition key 'name',
- then partition key finally will be 'some_prefix-foo'.
+ If you only have 1 stream, select `global`; the global data will be equivalent to that for the stream.
 
- With specifying parition_key and parition_key_expr both,
- the extracted value for partition key from JSON object will be
- passed to your Ruby expression as a variable 'record'.
+ Refer to the metrics documentation for details about each metric.
 
- <match your_tag>
- type kinesis
+ Default: `shard`
+ Expected pattern: `global|stream|shard`
 
- stream_name YOUR_STREAM_NAME
+ #### metrics_level
+ Controls the number of metrics that are uploaded to CloudWatch.
 
- aws_key_id YOUR_AWS_ACCESS_KEY
- aws_sec_key YOUR_SECRET_KEY
+ `none` disables all metrics.
 
- region us-east-1
+ `summary` enables the following metrics: UserRecordsPut, KinesisRecordsPut, ErrorsByCode, AllErrors, BufferingTime.
 
- partition_key name
- partition_key_expr 'some_prefix-' + record
- </match>
+ `detailed` enables all remaining metrics.
 
- ### Using partition_key_expr to extract a value for partition key
+ Refer to the metrics documentation for details about each metric.
 
- In this example, we use only partition_key_expr to extract
- a value for partition key. It will be 'bar'.
+ Default: `detailed`
+ Expected pattern: `none|summary|detailed`
 
- Specifying partition_key_expr without partition_key,
- hash object that is converted from whole JSON object will be
- passed to your Ruby expression as a variable 'record'.
+ #### metrics_namespace
+ The namespace to upload metrics under.
 
- <match your_tag>
- type kinesis
+ If you have multiple applications running the KPL under the same AWS account, you should use a different namespace for each application.
 
- stream_name YOUR_STREAM_NAME
+ If you are also using the KCL, you may wish to use the application name you have configured for the KCL as the namespace here. This way both your KPL and KCL metrics show up under the same namespace.
 
- aws_key_id YOUR_AWS_ACCESS_KEY
- aws_sec_key YOUR_SECRET_KEY
+ Default: `KinesisProducerLibrary`
+ Expected pattern: `(?!AWS/).{1,255}`
 
- region us-east-1
+ #### metrics_upload_delay
+ Delay (in milliseconds) between each metrics upload.
 
- partition_key_expr record['action']
- </match>
+ For testing only. There is no benefit in setting this lower or higher in production.
 
- ### Improving throughput to Amazon Kinesis
+ Default: 60000
+ Minimum: 1
+ Maximum (inclusive): 60000
+
+ #### min_connections
+ Minimum number of connections to keep open to the backend.
 
- The achievable throughput to Amazon Kinesis is limited to single-threaded
- PutRecord calls if **order_events** is set to true. By setting **order_events**
- to false records will be sent to Amazon Kinesis in batches. When operating in
- this mode the plugin can also be configured to execute in parallel.
- The **detach_process** and **num_threads** configuration settings control
- parallelism.
+ There should be no need to increase this in general.
 
- Please note that **order_events** option will be ignored if you choose to
- use either **detach_process** or **num_threads**.
+ Default: 1
+ Minimum: 1
+ Maximum (inclusive): 16
+
+ #### port
+ Server port to connect to. Only useful with [**custom_endpoint**](#custom_endpoint).
+
+ Default: 443
+ Minimum: 1
+ Maximum (inclusive): 65535
+
+ #### rate_limit
+ Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.
+
+ The rate limit prevents the producer from sending data too fast to a shard. Such a limit is useful for reducing bandwidth and CPU cycle wastage from sending requests that we know are going to fail from throttling.
+
+ Kinesis enforces limits on both the number of records and number of bytes per second. This setting applies to both.
+
+ The default value of 150% is chosen to allow a single producer instance to completely saturate the allowance for a shard. This is an aggressive setting. If you prefer to reduce throttling errors rather than completely saturate the shard, consider reducing this setting.
+
+ Default: 150
+ Minimum: 1
+ Maximum (inclusive): 9223372036854775807
+
+ #### record_max_buffered_time
+ Maximum amount of time (milliseconds) a record may spend being buffered before it gets sent. Records may be sent sooner than this depending on the other buffering limits.
+
+ This setting provides coarse ordering among records - any two records will be reordered by no more than twice this amount (assuming no failures and retries and equal network latency).
+
+ The library makes a best effort to enforce this time, but cannot guarantee that it will be precisely met. In general, if the CPU is not overloaded, the library will meet this deadline to within 10ms.
+
+ Failures and retries can additionally increase the amount of time records spend in the KPL. If your application cannot tolerate late records, use the [**record_ttl**](#record_ttl) setting to drop records that do not get transmitted in time.
+
+ Setting this too low can negatively impact throughput.
+
+ Default: 100
+ Maximum (inclusive): 9223372036854775807
+
+ #### record_ttl
+ Set a time-to-live on records (milliseconds). Records that do not get successfully put within the limit are failed.
+
+ This setting is useful if your application cannot or does not wish to tolerate late records. Records will still incur network latency after they leave the KPL, so take that into consideration when choosing a value for this setting.
+
+ If you do not wish to lose records and prefer to retry indefinitely, set record_ttl to a large value like INT_MAX. This has the potential to cause head-of-line blocking if network issues or throttling occur. You can respond to such situations by using the metrics reporting functions of the KPL. You may also set [**fail_if_throttled**](#fail_if_throttled) to true to prevent automatic retries in case of throttling.
+
+ Default: 30000
+ Minimum: 100
+ Maximum (inclusive): 9223372036854775807
+
+ #### request_timeout
+ The maximum total time (milliseconds) elapsed between when we begin an HTTP request and when we receive all of the response. If it goes over, the request will be timed out.
+
+ Note that a timed-out request may actually succeed at the backend. Retrying then leads to duplicates. Setting the timeout too low will therefore increase the probability of duplicates.
+
+ Default: 6000
+ Minimum: 100
+ Maximum (inclusive): 600000
+
+ #### verify_certificate
+ Verify the endpoint's certificate. Do not disable unless using [**custom_endpoint**](#custom_endpoint) for testing. Never disable this in production.
+
+ Default: `true`
+
+ #### credentials_refresh_delay
+ Interval (seconds) for refreshing the credentials sent to the KPL.
+
+ Default 5000
+
+ ## Configuration: kinesis_firehose
+ Here are the `kinesis_firehose` specific configurations.
+
+ ### delivery_stream_name
+ Name of the delivery stream to put data into.
+
+ ### region
+ AWS region of your stream. It should be in a form like `us-east-1` or `us-west-2`. Refer to [Regions and Endpoints in AWS General Reference][region] for supported regions.
+
+ Default `nil`, which means the plugin tries to find the region from the environment variable `AWS_REGION`.
+
+ ### append_new_line
+ Boolean. Default `true`. If enabled, the plugin adds a newline character (`\n`) to each serialized record.
+
+ ### retries_on_batch_request
+ Integer, default 3. The plugin will put multiple records to Amazon Kinesis Firehose in batches using PutRecordBatch. A set of records in a batch may fail for reasons documented in the Kinesis Service API Reference for PutRecordBatch. Failed records will be retried **retries_on_batch_request** times. If a record fails all retries, an error log will be emitted.
+
+ ### reset_backoff_if_success
+ Boolean, default `true`. If enabled, each retry checks the number of succeeded records in the former batch request and resets the exponential backoff if there was any success. Because a batch request could be composed of requests across shards, simple exponential backoff for the whole batch request wouldn't work in some cases.
+
+ ### batch_request_max_count
+ Integer, default 500. The maximum number of records per batch request made from a record chunk. It can't exceed the default value because it's the API limit.
+
+ ### batch_request_max_size
+ Integer, default 4 * 1024 * 1024. The maximum size in bytes of a batch request made from a record chunk. It can't exceed the default value because it's the API limit.
+
+ ### http_proxy
+ HTTP proxy for API calls. Default `nil`.
+
+ ### endpoint
+ API endpoint URL, for testing. Default `nil`.
+
+ ### ssl_verify_peer
+ Boolean. Disable it if you don't want to verify the SSL connection, for testing. Default `true`.
+
+ ### debug
+ Boolean. Enable it if you need to debug Amazon Kinesis Firehose API calls. Default is `false`.
+
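+ Putting the `kinesis_firehose` options together (a sketch; the delivery stream name is a placeholder):
+
+     <match your_tag>
+       @type kinesis_firehose
+       region us-east-1
+       delivery_stream_name your_stream
+       append_new_line true          # add "\n" to each serialized record
+       retries_on_batch_request 3    # retries for failed PutRecordBatch entries
+     </match>
+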
+ ## Configuration: Examples
+
+ Here are some configuration examples.
+ Assume that the JSON object below is coming in with tag 'your_tag'.
+
+     {
+       "name":"foo",
+       "action":"bar"
+     }
+
+ ### Improving throughput to Amazon Kinesis
+ The plugin can also be configured to execute in parallel. The `detach_process` and `num_threads` configuration settings control parallelism.
 
  In case of the configuration below, you will spawn 2 processes.
 
      <match your_tag>
-       type kinesis
+       type kinesis_*
        stream_name YOUR_STREAM_NAME
        region us-east-1
 
        detach_process 2
-
      </match>
 
- You can also specify a number of threads to put.
- The number of threads is bound to each individual processes.
- So in this case, you will spawn 1 process which has 50 threads.
+ You can also specify a number of threads to put with. The number of threads is bound to each individual process, so in this case, you will spawn 1 process which has 50 threads.
 
      <match your_tag>
-       type kinesis
+       type kinesis_*
 
        stream_name YOUR_STREAM_NAME
        region us-east-1
328
506
  region us-east-1
@@ -330,11 +508,10 @@ So in this case, you will spawn 1 process which has 50 threads.
330
508
  num_threads 50
331
509
  </match>
332
510
 
333
- Both options can be used together, in the configuration below,
334
- you will spawn 2 processes and 50 threads per each processes.
511
+ Both options can be used together, in the configuration below, you will spawn 2 processes and 50 threads per each processes.
335
512
 
336
513
  <match your_tag>
337
- type kinesis
514
+ type kinesis_*
338
515
 
339
516
  stream_name YOUR_STREAM_NAME
340
517
  region us-east-1
@@ -343,6 +520,44 @@ you will spawn 2 processes and 50 threads per each processes.
        num_threads 50
      </match>
 
+ ## Development
+
+ To launch a `fluentd` process with this plugin for development, follow the steps below:
+
+     git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
+     cd aws-fluent-plugin-kinesis
+     make                              # installs gems, downloads the KPL jar file and extracts binaries
+     make [stream/firehose/producer]
+
+ Then, in another terminal, run the command below. It will emit one record.
+
+     make hello
+
+ Also, you can test streaming log data with `dummer`:
+
+     make dummer   # keeps writing to /tmp/dummy.log
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on [GitHub][github].
+
  ## Related Resources
 
- * [Amazon Kinesis Developer Guide](http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html)
+ * [Amazon Kinesis Streams Developer Guide](http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html)
+ * [Amazon Kinesis Firehose Developer Guide](http://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html)
+
+ [fluentd]: http://fluentd.org/
+ [streams]: https://aws.amazon.com/kinesis/streams/
+ [firehose]: https://aws.amazon.com/kinesis/firehose/
+ [producer]: http://docs.aws.amazon.com/kinesis/latest/dev/developing-producers-with-kpl.html
+ [td-agent]: https://github.com/treasure-data/td-agent
+ [bundler]: http://bundler.io/
+ [assume_role]: http://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html
+ [external_id]: http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
+ [region]: http://docs.aws.amazon.com/general/latest/gr/rande.html#ak_region
+ [fluentd_buffer]: http://docs.fluentd.org/articles/buffer-plugin-overview
+ [github]: https://github.com/awslabs/aws-fluent-plugin-kinesis
+ [formatter.rb]: https://github.com/fluent/fluentd/blob/master/lib/fluent/formatter.rb
+ [default_config.properties]: https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties
+ [old-readme]: https://github.com/awslabs/aws-fluent-plugin-kinesis/blob/master/README-v0.4.md
+ [fluentd-doc-kinesis]: http://docs.fluentd.org/articles/kinesis-stream