fluent-plugin-kinesis 0.4.1 → 1.0.0
- checksums.yaml +4 -4
- data/.gitignore +13 -18
- data/.travis.yml +9 -9
- data/CHANGELOG.md +9 -0
- data/CONTRIBUTORS.txt +1 -1
- data/Gemfile +12 -9
- data/LICENSE.txt +39 -201
- data/Makefile +40 -0
- data/NOTICE.txt +1 -1
- data/README-v0.4.md +348 -0
- data/README.md +398 -183
- data/Rakefile +20 -14
- data/benchmark/dummer.conf +13 -0
- data/benchmark/firehose.conf +24 -0
- data/benchmark/producer.conf +28 -0
- data/benchmark/streams.conf +24 -0
- data/fluent-plugin-kinesis.gemspec +34 -23
- data/gemfiles/Gemfile.fluentd-0.10.58 +20 -0
- data/lib/fluent/plugin/kinesis_helper.rb +30 -0
- data/lib/fluent/plugin/kinesis_helper/api.rb +164 -0
- data/lib/fluent/plugin/kinesis_helper/class_methods.rb +120 -0
- data/lib/fluent/plugin/kinesis_helper/client.rb +36 -0
- data/lib/fluent/plugin/kinesis_helper/credentials.rb +51 -0
- data/lib/fluent/plugin/kinesis_helper/error.rb +38 -0
- data/lib/fluent/plugin/kinesis_helper/format.rb +85 -0
- data/lib/fluent/plugin/kinesis_helper/initialize.rb +58 -0
- data/lib/fluent/plugin/kinesis_helper/kpl.rb +81 -0
- data/lib/fluent/plugin/out_kinesis.rb +13 -11
- data/lib/fluent/plugin/out_kinesis_firehose.rb +44 -0
- data/lib/fluent/plugin/out_kinesis_producer.rb +38 -0
- data/lib/fluent/plugin/out_kinesis_streams.rb +47 -0
- data/lib/fluent/plugin/patched_detach_process_impl.rb +103 -0
- data/lib/fluent_plugin_kinesis/version.rb +17 -0
- data/lib/kinesis_producer.rb +24 -0
- data/lib/kinesis_producer/binary.rb +10 -0
- data/lib/kinesis_producer/daemon.rb +238 -0
- data/lib/kinesis_producer/library.rb +122 -0
- data/lib/kinesis_producer/protobuf/config.pb.rb +66 -0
- data/lib/kinesis_producer/protobuf/messages.pb.rb +151 -0
- data/lib/kinesis_producer/tasks/binary.rake +73 -0
- metadata +196 -36
- data/lib/fluent/plugin/version.rb +0 -16
- data/test/helper.rb +0 -32
- data/test/plugin/test_out_kinesis.rb +0 -641
data/README.md
CHANGED
# Fluent plugin for Amazon Kinesis

[![Build Status](https://travis-ci.org/awslabs/aws-fluent-plugin-kinesis.svg?branch=master)](https://travis-ci.org/awslabs/aws-fluent-plugin-kinesis)

[Fluentd][fluentd] output plugin that sends events to [Amazon Kinesis Streams][streams] (via both the API and the [Kinesis Producer Library (KPL)][producer]) and [Amazon Kinesis Firehose][firehose] (via the API). This gem includes three output plugins respectively:

- `kinesis_streams`
- `kinesis_producer`
- `kinesis_firehose`

Also, there is [documentation on the Fluentd official site][fluentd-doc-kinesis].

## Warning: `kinesis` is no longer supported
As of v1.0.0, the `kinesis` plugin is no longer supported. You can still use the plugin, but if you see the warning log below, please consider switching to `kinesis_streams`.

    [warn]: Deprecated warning: out_kinesis is no longer supported after v1.0.0. Please check out_kinesis_streams out.

If you still want to use `kinesis`, please see [the old README][old-readme].

## Installation
This fluentd plugin is available as the `fluent-plugin-kinesis` gem from RubyGems.

    gem install fluent-plugin-kinesis

Or you can install this plugin for [td-agent][td-agent] as:

    td-agent-gem install fluent-plugin-kinesis

If you would like to build and install it yourself, please see the section below. You need [bundler][bundler] for this.
In case of using with Fluentd: Fluentd will also be installed via the process below.

    git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
    cd aws-fluent-plugin-kinesis
    bundle install
    bundle exec rake build
    bundle exec rake install

Also, you can use this plugin with td-agent: you have to install td-agent before installing this plugin.

    git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
    cd aws-fluent-plugin-kinesis
    bundle install
    bundle exec rake build
    fluent-gem install pkg/fluent-plugin-kinesis

Or just download it and specify your Ruby library path. Below is a sample for specifying your library path via RUBYLIB.

    git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
    cd aws-fluent-plugin-kinesis
    bundle install
    export RUBYLIB=$RUBYLIB:/path/to/aws-fluent-plugin-kinesis/lib

## Dependencies
* Ruby 2.0.0+
* Fluentd 0.10.58+

## Basic Usage
Here are general procedures for using this plugin:

1. Install.
1. Edit configuration.
1. Run Fluentd or td-agent.

To run with td-agent, it would be as follows:

1. Edit configuration file provided by td-agent.
1. Then, run or restart td-agent.

## Getting started
Assume you use Amazon EC2 instances with an instance profile. If you want to use specific credentials, see [Credentials](#configuration-credentials).

### kinesis_streams

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      partition_key key  # Otherwise, use random partition key
    </match>

For more detail, see [Configuration: kinesis_streams](#configuration-kinesis_streams).

### kinesis_producer

    <match your_tag>
      @type kinesis_producer
      region us-east-1
      stream_name your_stream
      partition_key key  # Otherwise, use random partition key
    </match>

For more detail, see [Configuration: kinesis_producer](#configuration-kinesis_producer).

### kinesis_firehose

    <match your_tag>
      @type kinesis_firehose
      region us-east-1
      delivery_stream_name your_stream
    </match>

For more detail, see [Configuration: kinesis_firehose](#configuration-kinesis_firehose).

### For better throughput
Add configuration like below:

    flush_interval 1
    buffer_chunk_limit 1m
    try_flush_interval 0.1
    queued_chunk_flush_interval 0.01
    num_threads 15
    detach_process 5

Note: Each value should be adjusted to your system by yourself.
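
For example, a complete `kinesis_streams` match section combining these tuning options might look like the sketch below; the stream name and every value are placeholders to adjust for your own workload:

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      # Buffering and parallelism tuning (illustrative values)
      flush_interval 1
      buffer_chunk_limit 1m
      try_flush_interval 0.1
      queued_chunk_flush_interval 0.01
      num_threads 15
      detach_process 5
    </match>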

## Configuration: Credentials
To put records into Amazon Kinesis Streams or Firehose, you need to provide AWS security credentials.

The credential provider will be chosen by the steps below:

- Use the [**shared_credentials**](#shared_credentials) section if you set it
- Use the [**assume_role_credentials**](#assume_role_credentials) section if you set it
- Otherwise, the default provider chain:
  - [**aws_key_id**](#aws_key_id) and [**aws_sec_key**](#aws_sec_key)
  - Environment variables (e.g. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc.)
  - Default shared credentials (`default` in `~/.aws/credentials`)
  - Instance profile (for Amazon EC2)

### aws_key_id
AWS access key id.

### aws_sec_key
AWS secret access key.

### shared_credentials
Use this config section to specify the shared credential file path and profile name. If you want to use the default profile (`default` in `~/.aws/credentials`), you don't have to specify it here.

#### profile_name
Profile name of the credential file.

#### path
Path for the credential file.
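
As a sketch, assuming the `shared_credentials` section nests inside the match section like the Getting started examples, a configuration with a named profile could look like this (the profile name and path are placeholders):

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      <shared_credentials>
        profile_name your_profile
        path /home/fluentd/.aws/credentials
      </shared_credentials>
    </match>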

### assume_role_credentials
Use this config section for cross account access.

#### role_arn
IAM Role to be assumed with [AssumeRole][assume_role].

#### external_id
A unique identifier that is used by third parties when [assuming roles][assume_role] in their customers' accounts. Use this option with `role_arn` for third party cross account access. For detail, please see [How to Use an External ID When Granting Access to Your AWS Resources to a Third Party][external_id].
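
A cross-account setup could then be sketched like this (the role ARN and external ID are placeholders):

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      <assume_role_credentials>
        role_arn arn:aws:iam::123456789012:role/your-kinesis-role
        external_id your_external_id
      </assume_role_credentials>
    </match>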

## Configuration: Format
This plugin uses `Fluent::TextFormatter` to serialize the record to a string. For more detail, see [formatter.rb]. Also, this plugin includes `Fluent::SetTimeKeyMixin` and `Fluent::SetTagKeyMixin` to support **include_time_key** and **include_tag_key**.

### formatter
Default `json`.

### include_time_key
Default `false`. If you want to include the `time` field in your record, set `true`.

### include_tag_key
Default `false`. If you want to include the `tag` field in your record, set `true`.

### data_key
If your record contains a field whose string should be sent to Amazon Kinesis directly (without the formatter), use this parameter to specify the field. In that case, fields other than **data_key** are thrown away and never sent to Amazon Kinesis. Default `nil`, which means the whole record will be formatted and sent.

### log_truncate_max_size
Integer, default 0. When emitting a log entry, the message will be truncated to this size to avoid an infinite loop when the log is also sent to Kinesis. The value 0 (default) means no truncation.
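
As an illustration of these format options, the sketch below switches the formatter to `ltsv` and includes the time and tag fields; the `log_line` field name used with `data_key` is a hypothetical example:

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      formatter ltsv
      include_time_key true
      include_tag_key true
      # Alternatively, send one field verbatim and drop the rest:
      # data_key log_line
    </match>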

## Configuration: kinesis_streams
Here are `kinesis_streams` specific configurations.

### stream_name
Name of the stream to put data.

### region
AWS region of your stream. It should be in a form like `us-east-1`, `us-west-2`. Refer to [Regions and Endpoints in AWS General Reference][region] for supported regions.

Default `nil`, which means try to find it from the environment variable `AWS_REGION`.

### partition_key
A key to extract the partition key from the JSON object. Default `nil`, which means the partition key will be generated randomly.

### retries_on_batch_request
Integer, default 3. The plugin will put multiple records to Amazon Kinesis Streams in batches using PutRecords. A set of records in a batch may fail for reasons documented in the Kinesis Service API Reference for PutRecords. Failed records will be retried **retries_on_batch_request** times. If a record fails all retries, an error log will be emitted.

### reset_backoff_if_success
Boolean, default `true`. If enabled, each retry checks the number of succeeded records in the former batch request and resets the exponential backoff if there was any success. Because a batch request could be composed of requests across shards, simple exponential backoff for the whole batch request wouldn't work in some cases.

### batch_request_max_count
Integer, default 500. The maximum number of records in a batch request built from a record chunk. It can't exceed the default value because it's an API limit.

### batch_request_max_size
Integer, default 5 * 1024 * 1024. The maximum size of a batch request built from a record chunk. It can't exceed the default value because it's an API limit.

### http_proxy
HTTP proxy for API calls. Default `nil`.

### endpoint
API endpoint URL, for testing. Default `nil`.

### ssl_verify_peer
Boolean. Disable if you don't want to verify the SSL connection, for testing. Default `true`.

### debug
Boolean. Enable if you need to debug Amazon Kinesis Streams API calls. Default is `false`.
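
Putting several of these options together, a fuller `kinesis_streams` configuration might look like this sketch (all values, including the proxy address, are illustrative placeholders):

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      partition_key key
      retries_on_batch_request 3
      reset_backoff_if_success true
      batch_request_max_count 500
      http_proxy http://proxy.example.com:3128
      # endpoint and ssl_verify_peer are for testing only
    </match>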

## Configuration: kinesis_producer
Here are `kinesis_producer` specific configurations.

### stream_name
Name of the stream to put data.

### region
AWS region of your stream. It should be in a form like `us-east-1`, `us-west-2`. Refer to [Regions and Endpoints in AWS General Reference][region] for supported regions.

Default `nil`, which means try to find it from the environment variable `AWS_REGION`. If neither **region** nor `AWS_REGION` is defined, the KPL will try to find the region from Amazon EC2 metadata.

### partition_key
A key to extract the partition key from the JSON object. Default `nil`, which means the partition key will be generated randomly.

### debug
Boolean. Enable if you need to debug Kinesis Producer Library metrics. Default is `false`.

### kinesis_producer
This section is configuration for the Kinesis Producer Library. Almost all of the descriptions come from [default_config.properties of the KPL Java Sample Application][default_config.properties].

#### aggregation_enabled
Enable aggregation. With aggregation, multiple user records are packed into a single KinesisRecord. If disabled, each user record is sent in its own KinesisRecord.

If your records are small, enabling aggregation will allow you to put many more records than you would otherwise be able to for a shard before getting throttled.

Default: `true`

#### aggregation_max_count
Maximum number of items to pack into an aggregated record.

There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.

Default: 4294967295
Minimum: 1
Maximum (inclusive): 9223372036854775807

#### aggregation_max_size
Maximum number of bytes to pack into an aggregated Kinesis record.

There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.

If a record has more data by itself than this limit, it will bypass the aggregator. Note the backend enforces a limit of 50KB on record size. If you set this beyond 50KB, oversize records will be rejected at the backend.

Default: 51200
Minimum: 64
Maximum (inclusive): 1048576

#### collection_max_count
Maximum number of items to pack into a PutRecords request.

There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.

Default: 500
Minimum: 1
Maximum (inclusive): 500

#### collection_max_size
Maximum amount of data to send with a PutRecords request.

There should normally be no need to adjust this. If you want to limit the time records spend buffering, look into [**record_max_buffered_time**](#record_max_buffered_time) instead.

Records larger than the limit will still be sent, but will not be grouped with others.

Default: 5242880
Minimum: 52224
Maximum (inclusive): 9223372036854775807

#### connect_timeout
Timeout (milliseconds) for establishing TLS connections.

Default: 6000
Minimum: 100
Maximum (inclusive): 300000

#### custom_endpoint
Use a custom Kinesis and CloudWatch endpoint.

Mostly for testing use. Note this does not accept protocols or paths, only host names or IP addresses. There is no way to disable TLS. The KPL always connects with TLS.

Expected pattern: `^([A-Za-z0-9-\\.]+)?$`

#### fail_if_throttled
If `true`, throttled puts are not retried. The records that got throttled will be failed immediately upon receiving the throttling error. This is useful if you want to react immediately to any throttling without waiting for the KPL to retry. For example, you can use a different hash key to send the throttled record to a backup shard.

If `false`, the KPL will automatically retry throttled puts. The KPL performs backoff for shards that it has received throttling errors from, and will avoid flooding them with retries. Note that records may fail from expiration (see [**record_ttl**](#record_ttl)) if they get delayed for too long because of throttling.

Default: `false`

#### log_level
Minimum level of logs. Messages below the specified level will not be logged. Logs for the native KPL daemon show up on stderr.

Default: `info`
Expected pattern: `info|warning|error`

#### max_connections
Maximum number of connections to open to the backend. HTTP requests are sent in parallel over multiple connections.

Setting this too high may impact latency and consume additional resources without increasing throughput.

Default: 4
Minimum: 1
Maximum (inclusive): 128

#### metrics_granularity
Controls the granularity of metrics that are uploaded to CloudWatch. Greater granularity produces more metrics.

When `shard` is selected, metrics are emitted with the stream name and shard id as dimensions. On top of this, the same metric is also emitted with only the stream name dimension, and lastly, without the stream name. This means for a particular metric, 2 streams with 2 shards (each) will produce 7 CloudWatch metrics, one for each shard, one for each stream, and one overall, all describing the same statistics, but at different levels of granularity.

When `stream` is selected, per shard metrics are not uploaded; when `global` is selected, only the total aggregate for all streams and all shards is uploaded.

Consider reducing the granularity if you're not interested in shard-level metrics, or if you have a large number of shards.

If you only have 1 stream, select `global`; the global data will be equivalent to that for the stream.

Refer to the metrics documentation for details about each metric.

Default: `shard`
Expected pattern: `global|stream|shard`

#### metrics_level
Controls the number of metrics that are uploaded to CloudWatch.

`none` disables all metrics.

`summary` enables the following metrics: UserRecordsPut, KinesisRecordsPut, ErrorsByCode, AllErrors, BufferingTime.

`detailed` enables all remaining metrics.

Refer to the metrics documentation for details about each metric.

Default: `detailed`
Expected pattern: `none|summary|detailed`

#### metrics_namespace
The namespace to upload metrics under.

If you have multiple applications running the KPL under the same AWS account, you should use a different namespace for each application.

If you are also using the KCL, you may wish to use the application name you have configured for the KCL as the namespace here. This way both your KPL and KCL metrics show up under the same namespace.

Default: `KinesisProducerLibrary`
Expected pattern: `(?!AWS/).{1,255}`

#### metrics_upload_delay
Delay (in milliseconds) between each metrics upload.

For testing only. There is no benefit in setting this lower or higher in production.

Default: 60000
Minimum: 1
Maximum (inclusive): 60000

#### min_connections
Minimum number of connections to keep open to the backend.

There should be no need to increase this in general.

Default: 1
Minimum: 1
Maximum (inclusive): 16

#### port
Server port to connect to. Only useful with [**custom_endpoint**](#custom_endpoint).

Default: 443
Minimum: 1
Maximum (inclusive): 65535

#### rate_limit
Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.

The rate limit prevents the producer from sending data too fast to a shard. Such a limit is useful for reducing bandwidth and CPU cycle wastage from sending requests that we know are going to fail from throttling.

Kinesis enforces limits on both the number of records and the number of bytes per second. This setting applies to both.

The default value of 150% is chosen to allow a single producer instance to completely saturate the allowance for a shard. This is an aggressive setting. If you prefer to reduce throttling errors rather than completely saturate the shard, consider reducing this setting.

Default: 150
Minimum: 1
Maximum (inclusive): 9223372036854775807

#### record_max_buffered_time
Maximum amount of time (milliseconds) a record may spend being buffered before it gets sent. Records may be sent sooner than this depending on the other buffering limits.

This setting provides coarse ordering among records - any two records will be reordered by no more than twice this amount (assuming no failures and retries and equal network latency).

The library makes a best effort to enforce this time, but cannot guarantee that it will be precisely met. In general, if the CPU is not overloaded, the library will meet this deadline to within 10ms.

Failures and retries can additionally increase the amount of time records spend in the KPL. If your application cannot tolerate late records, use the [**record_ttl**](#record_ttl) setting to drop records that do not get transmitted in time.

Setting this too low can negatively impact throughput.

Default: 100
Maximum (inclusive): 9223372036854775807

#### record_ttl
Set a time-to-live on records (milliseconds). Records that do not get successfully put within the limit are failed.

This setting is useful if your application cannot or does not wish to tolerate late records. Records will still incur network latency after they leave the KPL, so take that into consideration when choosing a value for this setting.

If you do not wish to lose records and prefer to retry indefinitely, set record_ttl to a large value like INT_MAX. This has the potential to cause head-of-line blocking if network issues or throttling occur. You can respond to such situations by using the metrics reporting functions of the KPL. You may also set [**fail_if_throttled**](#fail_if_throttled) to true to prevent automatic retries in case of throttling.

Default: 30000
Minimum: 100
Maximum (inclusive): 9223372036854775807

#### request_timeout
The maximum total time (milliseconds) elapsed between when we begin an HTTP request and when we receive all of the response. If it goes over, the request will be timed-out.

Note that a timed-out request may actually succeed at the backend. Retrying then leads to duplicates. Setting the timeout too low will therefore increase the probability of duplicates.

Default: 6000
Minimum: 100
Maximum (inclusive): 600000

#### verify_certificate
Verify the endpoint's certificate. Do not disable unless using [**custom_endpoint**](#custom_endpoint) for testing. Never disable this in production.

Default: `true`

#### credentials_refresh_delay
Interval (seconds) for refreshing the credentials sent to the KPL.

Default: 5000
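
Bringing these together, a `kinesis_producer` configuration with a nested KPL section might be sketched as follows (all values are illustrative, not recommendations):

    <match your_tag>
      @type kinesis_producer
      region us-east-1
      stream_name your_stream
      partition_key key
      <kinesis_producer>
        record_max_buffered_time 100
        record_ttl 30000
        rate_limit 150
        metrics_level summary
        log_level warning
      </kinesis_producer>
    </match>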

## Configuration: kinesis_firehose
Here are `kinesis_firehose` specific configurations.

### delivery_stream_name
Name of the delivery stream to put data.

### region
AWS region of your stream. It should be in a form like `us-east-1`, `us-west-2`. Refer to [Regions and Endpoints in AWS General Reference][region] for supported regions.

Default `nil`, which means try to find it from the environment variable `AWS_REGION`.

### append_new_line
Boolean. Default `true`. If it is enabled, the plugin adds a newline character (`\n`) to each serialized record.

### retries_on_batch_request
Integer, default 3. The plugin will put multiple records to Amazon Kinesis Firehose in batches using PutRecordBatch. A set of records in a batch may fail for reasons documented in the Kinesis Service API Reference for PutRecordBatch. Failed records will be retried **retries_on_batch_request** times. If a record fails all retries, an error log will be emitted.

### reset_backoff_if_success
Boolean, default `true`. If enabled, each retry checks the number of succeeded records in the former batch request and resets the exponential backoff if there was any success. Because a batch request could be composed of requests across shards, simple exponential backoff for the whole batch request wouldn't work in some cases.

### batch_request_max_count
Integer, default 500. The maximum number of records in a batch request built from a record chunk. It can't exceed the default value because it's an API limit.

### batch_request_max_size
Integer, default 4 * 1024 * 1024. The maximum size of a batch request built from a record chunk. It can't exceed the default value because it's an API limit.

### http_proxy
HTTP proxy for API calls. Default `nil`.

### endpoint
API endpoint URL, for testing. Default `nil`.

### ssl_verify_peer
Boolean. Disable if you don't want to verify the SSL connection, for testing. Default `true`.

### debug
Boolean. Enable if you need to debug Amazon Kinesis Firehose API calls. Default is `false`.
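
A fuller `kinesis_firehose` example combining these documented options might be sketched like this (the delivery stream name and values are placeholders):

    <match your_tag>
      @type kinesis_firehose
      region us-east-1
      delivery_stream_name your_stream
      append_new_line true
      retries_on_batch_request 3
      batch_request_max_count 500
    </match>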

## Configuration: Examples

Here are some configuration examples. Assume that the JSON object below is coming in with tag 'your_tag'.

    {
      "name":"foo",
      "action":"bar"
    }
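
With `partition_key name` configured, for example, the plugin would extract the value `foo` from this record and use it as the partition key; without `partition_key`, a random key is generated:

    <match your_tag>
      @type kinesis_streams
      region us-east-1
      stream_name your_stream
      # record["name"] == "foo" becomes the partition key
      partition_key name
    </match>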

### Improving throughput to Amazon Kinesis
The plugin can also be configured to execute in parallel. The `detach_process` and `num_threads` configuration settings control parallelism.

In case of the configuration below, you will spawn 2 processes.

    <match your_tag>
      type kinesis_*
      stream_name YOUR_STREAM_NAME
      region us-east-1
      detach_process 2
    </match>

You can also specify a number of threads to put. The number of threads is bound to each individual process. So in this case, you will spawn 1 process which has 50 threads.

    <match your_tag>
      type kinesis_*
      stream_name YOUR_STREAM_NAME
      region us-east-1
      num_threads 50
    </match>

Both options can be used together; in the configuration below, you will spawn 2 processes and 50 threads per process.

    <match your_tag>
      type kinesis_*
      stream_name YOUR_STREAM_NAME
      region us-east-1
      detach_process 2
      num_threads 50
    </match>

## Development

To launch a `fluentd` process with this plugin for development, follow the steps below:

    git clone https://github.com/awslabs/aws-fluent-plugin-kinesis.git
    cd aws-fluent-plugin-kinesis
    make # will install gems and download KPL jar file and extract binaries
    make [stream/firehose/producer]

Then, in another terminal, run the command below. It will emit one record.

    make hello

Also, you can test streaming log data by `dummer`:

    make dummer # keep writing to /tmp/dummy.log

## Contributing

Bug reports and pull requests are welcome on [GitHub][github].

## Related Resources

* [Amazon Kinesis Streams Developer Guide](http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html)
* [Amazon Kinesis Firehose Developer Guide](http://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html)

[fluentd]: http://fluentd.org/
[streams]: https://aws.amazon.com/kinesis/streams/
[firehose]: https://aws.amazon.com/kinesis/firehose/
[producer]: http://docs.aws.amazon.com/kinesis/latest/dev/developing-producers-with-kpl.html
[td-agent]: https://github.com/treasure-data/td-agent
[bundler]: http://bundler.io/
[assume_role]: http://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html
[external_id]: http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
[region]: http://docs.aws.amazon.com/general/latest/gr/rande.html#ak_region
[fluentd_buffer]: http://docs.fluentd.org/articles/buffer-plugin-overview
[github]: https://github.com/awslabs/aws-fluent-plugin-kinesis
[formatter.rb]: https://github.com/fluent/fluentd/blob/master/lib/fluent/formatter.rb
[default_config.properties]: https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties
[old-readme]: https://github.com/awslabs/aws-fluent-plugin-kinesis/blob/master/README-v0.4.md
[fluentd-doc-kinesis]: http://docs.fluentd.org/articles/kinesis-stream