roched-fluent-plugin-kafka 0.6.5

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: e881322a89e987344e6548fa9072160f02f36972
+   data.tar.gz: 891fe76d4b08be328dc184d482670c04c6cf0169
+ SHA512:
+   metadata.gz: 24e79b9778e49e92e380d7a0f5df370557f3c19d0baee536e0e216d3e0d30365865280b35c525ec019ed3a261c89a043f4c36f6be0bf941398d40e93e8cf305b
+   data.tar.gz: 6ec09a40c4ff6928933d2c7e38d5bdf47813966786a61b696b39051955c4ea7bbb23d42639d8b8444c6c0d276ca57dae62892e5ccef1e34a2b2f9780b77118c5
data/.gitignore ADDED
@@ -0,0 +1,2 @@
+ /Gemfile.lock
+ *.swp
data/.travis.yml ADDED
@@ -0,0 +1,18 @@
+ language: ruby
+
+ rvm:
+   - 2.1
+   - 2.2
+   - 2.3.1
+   - 2.4.1
+   - ruby-head
+
+ script:
+   - bundle exec rake test
+
+ sudo: false
+
+ matrix:
+   allow_failures:
+     - rvm: ruby-head
+
data/ChangeLog ADDED
@@ -0,0 +1,94 @@
+ Release 0.6.3 - 2017/11/14
+
+ 	* in_kafka_group: re-create consumer when error happens during event fetch
+
+ Release 0.6.2 - 2017/11/1
+
+ 	* Fix ltsv parsing issue which generates symbol keys
+
+ Release 0.6.1 - 2017/08/30
+
+ 	* Add stats and datadog monitoring support
+ 	* ssl_ca_certs now accepts multiple paths
+ 	* Fix bug by ruby-kafka 0.4.1 changes
+ 	* Update ruby-kafka dependency to v0.4.1
+
+ Release 0.6.0 - 2017/07/25
+
+ 	* Add principal and keytab parameters for SASL support
+
+ Release 0.5.7 - 2017/07/13
+
+ 	* out_kafka_buffered: Add kafka_agg_max_messages parameter
+
+ Release 0.5.6 - 2017/07/10
+
+ 	* output: Add ActiveSupport notification support
+
+ Release 0.5.5 - 2017/04/19
+
+ 	* output: Some trace log level changed to debug
+ 	* out_kafka_buffered: Add discard_kafka_delivery_failed parameter
+
+ Release 0.5.4 - 2017/04/12
+
+ 	* out_kafka_buffered: Add max_send_limit_bytes parameter
+ 	* out_kafka: Improve buffer overflow handling of ruby-kafka
+
+ Release 0.5.3 - 2017/02/13
+
+ 	* Relax ruby-kafka dependency
+
+ Release 0.5.2 - 2017/02/13
+
+ 	* in_kafka_group: Add max_bytes parameter
+
+ Release 0.5.1 - 2017/02/06
+
+ 	* in_kafka_group: Fix uninitialized constant error
+
+ Release 0.5.0 - 2017/01/17
+
+ 	* output: Add out_kafka2 plugin with v0.14 API
+
+ Release 0.4.2 - 2016/12/10
+
+ 	* input: Add use_record_time and time_format parameters
+ 	* Update ruby-kafka dependency to 0.3.16.beta2
+
+ Release 0.4.1 - 2016/12/01
+
+ 	* output: Support specifying partition
+
+ Release 0.4.0 - 2016/11/08
+
+ 	* Remove zookeeper dependency
+
+ Release 0.3.5 - 2016/10/21
+
+ 	* output: Support message key and related parameters. #91
+
+ Release 0.3.4 - 2016/10/20
+
+ 	* output: Add exclude_topic_key and exclude_partition_key. #89
+
+ Release 0.3.3 - 2016/10/17
+
+ 	* out_kafka_buffered: Add get_kafka_client_log parameter. #83
+ 	* out_kafka_buffered: Skip and log invalid record to avoid buffer stuck. #86
+ 	* in_kafka_group: Add retry_emit_limit to handle BufferQueueLimitError. #87
+
+ Release 0.3.2 - 2016/10/06
+
+ 	* in_kafka_group: Re-fetch events after consumer error. #79
+
+ Release 0.3.1 - 2016/08/28
+
+ 	* output: Change default required_acks to -1. #70
+ 	* Support ruby version changed to 2.1.0 or later
+
+ Release 0.3.0 - 2016/08/24
+
+ 	* Fully replace poseidon ruby library with ruby-kafka to support latest kafka versions
+
+ See git commits for older changes
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in fluent-plugin-kafka.gemspec
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,14 @@
+ Copyright (C) 2014 htgc
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
data/README.md ADDED
@@ -0,0 +1,244 @@
+ # fluent-plugin-kafka, a plugin for [Fluentd](http://fluentd.org)
+
+ [![Build Status](https://travis-ci.org/htgc/fluent-plugin-kafka.svg?branch=master)](https://travis-ci.org/htgc/fluent-plugin-kafka)
+
+ A fluentd plugin to both consume and produce data for Apache Kafka.
+
+ TODO: Write tests.
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+     gem 'fluent-plugin-kafka'
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install fluent-plugin-kafka
+
+ If you want to use zookeeper-related parameters, you also need to install the zookeeper gem. The zookeeper gem includes a native extension, so development tools such as gcc and make are required.
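+
+ For example, to install the gem itself (assuming the build tools above are already present):
+
+     $ gem install zookeeper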
+
+ ## Requirements
+
+ - Ruby 2.1 or later
+ - Input plugins work with kafka v0.9 or later
+ - Output plugins work with kafka v0.8 or later
+
+ ## Usage
+
+ ### Common parameters
+
+ #### SSL authentication
+
+ - ssl_ca_cert
+ - ssl_client_cert
+ - ssl_client_cert_key
+
+ Set the paths to your SSL-related files. See [Encryption and Authentication using SSL](https://github.com/zendesk/ruby-kafka#encryption-and-authentication-using-ssl) for more details.
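+
+ A minimal sketch of how these parameters might look in a source block (the paths are placeholders, and the other required parameters are abbreviated):
+
+     <source>
+       @type kafka_group
+       brokers <broker1_host>:<broker1_port>
+       ssl_ca_cert /path/to/ca.crt
+       ssl_client_cert /path/to/client.crt
+       ssl_client_cert_key /path/to/client.key
+     </source>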
+
+ #### SASL authentication
+
+ - principal
+ - keytab
+
+ Set principal and path to keytab for SASL/GSSAPI authentication. See [Authentication using SASL](https://github.com/zendesk/ruby-kafka#authentication-using-sasl) for more details.
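+
+ A minimal sketch (the principal and keytab path are placeholders; other parameters are abbreviated):
+
+     <source>
+       @type kafka_group
+       brokers <broker1_host>:<broker1_port>
+       principal kafka/broker1.example.com@EXAMPLE.COM
+       keytab /path/to/kafka.keytab
+     </source>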
+
+ ### Input plugin (@type 'kafka')
+
+ Consumes events with a single consumer.
+
+     <source>
+       @type kafka
+
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
+       topics <listening topics(separate with comma',')>
+       format <input text type (text|json|ltsv|msgpack)> :default => json
+       message_key <key (Optional, for text format only, default is message)>
+       add_prefix <tag prefix (Optional)>
+       add_suffix <tag suffix (Optional)>
+
+       # Optionally, you can manage topic offset by using zookeeper
+       offset_zookeeper <zookeeper node list (<zookeeper1_host>:<zookeeper1_port>,<zookeeper2_host>:<zookeeper2_port>,..)>
+       offset_zk_root_node <offset path in zookeeper> default => '/fluent-plugin-kafka'
+
+       # ruby-kafka consumer options
+       max_bytes (integer) :default => nil (Use default of ruby-kafka)
+       max_wait_time (integer) :default => nil (Use default of ruby-kafka)
+       min_bytes (integer) :default => nil (Use default of ruby-kafka)
+     </source>
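+
+ For example, a filled-in configuration might look like this (the broker address, topics, and tag prefix are illustrative):
+
+     <source>
+       @type kafka
+       brokers localhost:9092
+       topics app_events,app_errors
+       format json
+       add_prefix kafka
+     </source>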
+
+ Starting processing from an assigned offset for specific topics is also supported.
+
+     <source>
+       @type kafka
+
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
+       format <input text type (text|json|ltsv|msgpack)>
+       <topic>
+         topic <listening topic>
+         partition <listening partition: default=0>
+         offset <listening start offset: default=-1>
+       </topic>
+       <topic>
+         topic <listening topic>
+         partition <listening partition: default=0>
+         offset <listening start offset: default=-1>
+       </topic>
+     </source>
+
+ See also the [ruby-kafka README](https://github.com/zendesk/ruby-kafka#consuming-messages-from-kafka) for more detailed documentation about ruby-kafka.
+
+ ### Input plugin (@type 'kafka_group', supports kafka group)
+
+ Consumes events using Kafka consumer group features.
+
+     <source>
+       @type kafka_group
+
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
+       consumer_group <consumer group name (required)>
+       topics <listening topics(separate with comma',')>
+       format <input text type (text|json|ltsv|msgpack)> :default => json
+       message_key <key (Optional, for text format only, default is message)>
+       add_prefix <tag prefix (Optional)>
+       add_suffix <tag suffix (Optional)>
+       retry_emit_limit <Wait retry_emit_limit x 1s when BufferQueueLimitError happens. The default is nil, which means waiting until the BufferQueueLimitError is resolved>
+       use_record_time <If true, replace event time with contents of 'time' field of fetched record>
+       time_format <string (Optional when use_record_time is used)>
+
+       # ruby-kafka consumer options
+       max_bytes (integer) :default => 1048576
+       max_wait_time (integer) :default => nil (Use default of ruby-kafka)
+       min_bytes (integer) :default => nil (Use default of ruby-kafka)
+       offset_commit_interval (integer) :default => nil (Use default of ruby-kafka)
+       offset_commit_threshold (integer) :default => nil (Use default of ruby-kafka)
+       start_from_beginning (bool) :default => true
+     </source>
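+
+ For example, a filled-in configuration might look like this (the group name, brokers, and topic are illustrative):
+
+     <source>
+       @type kafka_group
+       brokers kafka1:9092,kafka2:9092
+       consumer_group fluentd_consumers
+       topics app_events
+       format json
+       start_from_beginning false
+     </source>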
+
+ See also the [ruby-kafka README](https://github.com/zendesk/ruby-kafka#consuming-messages-from-kafka) for more detailed documentation about ruby-kafka options.
+
+ ### Buffered output plugin
+
+ This plugin uses the ruby-kafka producer for writing data and works with recent kafka versions.
+
+     <match *.**>
+       @type kafka_buffered
+
+       # Brokers: you can choose either brokers or zookeeper. If you are not familiar with zookeeper, use the brokers parameter.
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,.. # Set brokers directly
+       zookeeper <zookeeper_host>:<zookeeper_port> # Set brokers via Zookeeper
+       zookeeper_path <broker path in zookeeper> :default => /brokers/ids # Set path in zookeeper for kafka
+
+       default_topic (string) :default => nil
+       default_partition_key (string) :default => nil
+       default_message_key (string) :default => nil
+       output_data_type (json|ltsv|msgpack|attr:<record name>|<formatter name>) :default => json
+       output_include_tag (bool) :default => false
+       output_include_time (bool) :default => false
+       exclude_topic_key (bool) :default => false
+       exclude_partition_key (bool) :default => false
+       get_kafka_client_log (bool) :default => false
+
+       # See the fluentd documentation for buffer-related parameters: http://docs.fluentd.org/articles/buffer-plugin-overview
+
+       # ruby-kafka producer options
+       max_send_retries (integer) :default => 1
+       required_acks (integer) :default => -1
+       ack_timeout (integer) :default => nil (Use default of ruby-kafka)
+       compression_codec (gzip|snappy) :default => nil (No compression)
+       kafka_agg_max_bytes (integer) :default => 4096
+       kafka_agg_max_messages (integer) :default => nil (No limit)
+       max_send_limit_bytes (integer) :default => nil (No drop)
+       discard_kafka_delivery_failed (bool) :default => false (No discard)
+       monitoring_list (array) :default => []
+     </match>
+
+ `<formatter name>` of `output_data_type` uses fluentd's formatter plugins. See the [formatter article](http://docs.fluentd.org/articles/formatter-plugin-overview).
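+
+ For example, to emit only a single field of each record, you might use the `attr:` form (the field name `message` is illustrative):
+
+     output_data_type attr:message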
+
+ ruby-kafka sometimes returns a `Kafka::DeliveryFailed` error without useful information.
+ In this case, `get_kafka_client_log` is useful for identifying the cause of the error.
+ ruby-kafka's log is routed to the fluentd log, so you can see ruby-kafka's output in the fluentd logs.
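+
+ A minimal sketch of enabling it (other parameters, such as `brokers`, are omitted here):
+
+     <match *.**>
+       @type kafka_buffered
+       get_kafka_client_log true
+     </match>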
+
+ The following ruby-kafka producer options are supported:
+
+ - max_send_retries - default: 1 - Number of times to retry sending messages to a leader.
+ - required_acks - default: -1 - The number of acks required per request. If you need faster flushes, set a lower value, e.g. 1 or 2.
+ - ack_timeout - default: nil - How long the producer waits for acks, in seconds.
+ - compression_codec - default: nil - The codec the producer uses to compress messages.
+ - kafka_agg_max_bytes - default: 4096 - Maximum total message size to be included in one batch transmission.
+ - kafka_agg_max_messages - default: nil - Maximum number of messages to include in one batch transmission.
+ - max_send_limit_bytes - default: nil - Maximum byte size of a message to send, to avoid MessageSizeTooLarge. For example, if you set 1000000 (message.max.bytes in kafka), messages larger than 1000000 bytes will be dropped.
+ - discard_kafka_delivery_failed - default: false - Discard the record when [Kafka::DeliveryFailed](http://www.rubydoc.info/gems/ruby-kafka/Kafka/DeliveryFailed) occurs.
+ - monitoring_list - default: [] - Libraries used for monitoring; statsd and datadog are supported.
+
+ For details about monitoring, see also https://github.com/zendesk/ruby-kafka#monitoring
+
+ See also [Kafka::Client](http://www.rubydoc.info/gems/ruby-kafka/Kafka/Client) for more detailed documentation about ruby-kafka.
+
+ This plugin also supports the "snappy" compression codec.
+ Install the snappy gem before using snappy compression:
+
+     $ gem install snappy
+
+ The snappy gem uses a native extension, so you need to install several packages first.
+ On Ubuntu, you need the development packages and the snappy library:
+
+     $ sudo apt-get install build-essential autoconf automake libtool libsnappy-dev
+
+ #### Load balancing
+
+ By default, ruby-kafka assigns each message to a partition at random, but messages with the same partition key are always assigned to the same partition when `default_partition_key` is set in the config file.
+ If a key named `partition_key` exists in a message, this plugin uses its value as the partition key, as summarized below.
+
+ |default_partition_key|partition_key| behavior |
+ | --- | --- | --- |
+ |Not set|Not exists| All messages are assigned a partition at random |
+ |Set| Not exists| All messages are assigned to the specific partition |
+ |Not set| Exists | Messages which have a partition_key record are assigned to the specific partition, others are assigned a partition at random |
+ |Set| Exists | Messages which have a partition_key record are assigned to the specific partition with partition_key, others are assigned to the specific partition with default_partition_key |
+
+ If a key named `message_key` exists in a message, this plugin publishes the value of message_key to kafka, where it can be read by consumers. The same message key is assigned to all messages when `default_message_key` is set in the config file. If message_key exists and partition_key is not set explicitly, message_key will be used for partitioning.
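+
+ For example, with the (illustrative) configuration below, records that contain a `partition_key` field are routed by that value, while records without one fall back to `default_partition_key`:
+
+     <match *.**>
+       @type kafka_buffered
+       brokers localhost:9092
+       default_topic app_events
+       default_partition_key fallback-key
+     </match>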
+
+ ### Non-buffered output plugin
+
+ This plugin uses the ruby-kafka producer for writing data. For performance and reliability, use the `kafka_buffered` output instead. This one is mainly for testing.
+
+     <match *.**>
+       @type kafka
+
+       # Brokers: you can choose either brokers or zookeeper.
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,.. # Set brokers directly
+       zookeeper <zookeeper_host>:<zookeeper_port> # Set brokers via Zookeeper
+       zookeeper_path <broker path in zookeeper> :default => /brokers/ids # Set path in zookeeper for kafka
+
+       default_topic (string) :default => nil
+       default_partition_key (string) :default => nil
+       default_message_key (string) :default => nil
+       output_data_type (json|ltsv|msgpack|attr:<record name>|<formatter name>) :default => json
+       output_include_tag (bool) :default => false
+       output_include_time (bool) :default => false
+       exclude_topic_key (bool) :default => false
+       exclude_partition_key (bool) :default => false
+
+       # ruby-kafka producer options
+       max_send_retries (integer) :default => 1
+       required_acks (integer) :default => -1
+       ack_timeout (integer) :default => nil (Use default of ruby-kafka)
+       compression_codec (gzip|snappy) :default => nil
+       max_buffer_size (integer) :default => nil (Use default of ruby-kafka)
+       max_buffer_bytesize (integer) :default => nil (Use default of ruby-kafka)
+     </match>
+
+ This plugin also supports the ruby-kafka related parameters. See the Buffered output plugin section.
+
+ ## Contributing
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create a new Pull Request
data/Rakefile ADDED
@@ -0,0 +1,12 @@
+ require 'bundler'
+ Bundler::GemHelper.install_tasks
+
+ require 'rake/testtask'
+
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.test_files = FileList['test/**/test_*.rb']
+   test.verbose = true
+ end
+
+ task :default => [:build]
data/fluent-plugin-kafka.gemspec ADDED
@@ -0,0 +1,24 @@
+ # -*- encoding: utf-8 -*-
+
+ Gem::Specification.new do |gem|
+   gem.authors = ["Hidemasa Togashi", "Masahiro Nakagawa"]
+   gem.email = ["togachiro@gmail.com", "repeatedly@gmail.com"]
+   gem.description = %q{Fluentd plugin for Apache Kafka > 0.8}
+   gem.summary = %q{Fluentd plugin for Apache Kafka > 0.8}
+   gem.homepage = "https://github.com/roche-d/fluent-plugin-kafka"
+   gem.license = "Apache-2.0"
+
+   gem.files = `git ls-files`.split($\)
+   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
+   gem.name = "roched-fluent-plugin-kafka"
+   gem.require_paths = ["lib"]
+   gem.version = '0.6.5'
+   gem.required_ruby_version = ">= 2.1.0"
+
+   gem.add_dependency "fluentd", [">= 0.10.58", "< 2"]
+   gem.add_dependency 'ltsv'
+   gem.add_dependency 'ruby-kafka', '~> 0.4.1'
+   gem.add_development_dependency "rake", ">= 0.9.2"
+   gem.add_development_dependency "test-unit", ">= 3.0.8"
+ end