fluent-plugin-kafka 0.13.0 → 0.15.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d2916b74ae31b54e70f789e5891e6cef47e34a714c1a76216ef0f7d8769c5f64
-  data.tar.gz: b75ca1d9b41fb0dc2ec547a544185c4cecdfa8759587b2f456fa51ee2720bebd
+  metadata.gz: 43c3a759f4636168c932c33f45c38105ebb522b5ea8222f1b1a7eceb53343348
+  data.tar.gz: c64a103244e721fa2de124f466f2480c960daafc713fd16f685ea4dd4a545a3d
 SHA512:
-  metadata.gz: fd9dfbae3b9b663ba4cdc502bd59138bf3713555cc736935b269ccb54e5c5228f24fb738be5ff6e2262759102fae9f98fe68bccbf76e4a579e1b47bb401f0843
-  data.tar.gz: 785ff04d203d38d064a0b1673e91b94399e857e13db918debeed90435e0337a644c8c166971b0915b03123da998a29327deca993aa78e63587a46fc329947e5b
+  metadata.gz: 707d92f2a23041b53daf6410d3fadb0e84053c4eb250b20c6dd3c72a15969273d2279b71950334187d156767bf6646a0af468a0f84e85ca683a34c127e47e363
+  data.tar.gz: 978883c8a72152bb6b9262ccea4e6b65b91bca1a3907ea43a7930cf7b4d414f1a9f47cb593d420738a48bd47d86451f376fe4cc6e7dec6b4f2c4e81ad5213d00
data/ChangeLog CHANGED
@@ -1,3 +1,27 @@
+Release 0.15.0 - 2020/09/14
+
+	* Add experimental `in_rdkafka_group`
+	* in_kafka: Expose `ssl_verify_hostname` parameter
+
+Release 0.14.2 - 2020/08/26
+
+	* in_kafka_group: Add `add_headers` parameter
+	* out_kafka2/out_rdkafka2: Support `discard_kafka_delivery_failed` parameter
+
+Release 0.14.1 - 2020/08/11
+
+	* kafka_producer_ext: Fix regression by v0.14.0 changes
+
+Release 0.14.0 - 2020/08/07
+
+	* Update ruby-kafka dependency to v1.2.0 or later. Check https://github.com/zendesk/ruby-kafka#compatibility
+	* kafka_producer_ext: Follow Partitioner API change
+
+Release 0.13.1 - 2020/07/17
+
+	* in_kafka_group: Support ssl_verify_hostname parameter
+	* out_kafka2/out_rdkafka2: Support topic parameter with placeholders
+
 Release 0.13.0 - 2020/03/09
 
 	* Accept ruby-kafka v1 or later
data/README.md CHANGED
@@ -118,10 +118,13 @@ Consume events by kafka consumer group features..
   topics <listening topics(separate with comma',')>
   format <input text type (text|json|ltsv|msgpack)> :default => json
   message_key <key (Optional, for text format only, default is message)>
+  kafka_message_key <key (Optional, If specified, set kafka's message key to this key)>
+  add_headers <If true, add kafka's message headers to record>
   add_prefix <tag prefix (Optional)>
   add_suffix <tag suffix (Optional)>
   retry_emit_limit <Wait retry_emit_limit x 1s when BufferQueueLimitError happens. The default is nil and it means waiting until BufferQueueLimitError is resolved>
-  use_record_time <If true, replace event time with contents of 'time' field of fetched record>
+  use_record_time (Deprecated. Use 'time_source record' instead.) <If true, replace event time with contents of 'time' field of fetched record>
+  time_source <source for message timestamp (now|kafka|record)> :default => now
   time_format <string (Optional when use_record_time is used)>
 
   # ruby-kafka consumer options
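
For example, a consumer-group source that takes the event timestamp from Kafka and copies message headers into each record could look like the sketch below. This is only an illustration, not taken from the README: the broker address, consumer group name, and topic are placeholders, and `brokers`/`consumer_group` are the plugin's existing parameters rather than part of this change.

    <source>
      @type kafka_group
      brokers <broker1_host>:<broker1_port>
      consumer_group fluentd-consumer-group
      topics app_event
      format json
      add_headers true
      time_source kafka
    </source>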
@@ -138,9 +141,43 @@ See also [ruby-kafka README](https://github.com/zendesk/ruby-kafka#consuming-mes
 
 Consuming topic name is used for event tag. So when the target topic name is `app_event`, the tag is `app_event`. If you want to modify tag, use `add_prefix` or `add_suffix` parameter. With `add_prefix kafka`, the tag is `kafka.app_event`.
 
+### Input plugin (@type 'rdkafka_group', supports kafka consumer groups, uses rdkafka-ruby)
+
+:warning: **The in_rdkafka_group consumer was not yet tested under heavy production load. Use it at your own risk!**
+
+With the introduction of the rdkafka-ruby based input plugin we hope to support Kafka brokers above version 2.1, where we saw [compatibility issues](https://github.com/fluent/fluent-plugin-kafka/issues/315) when using the ruby-kafka based @kafka_group input type. The rdkafka-ruby gem wraps the highly performant and production-ready librdkafka C library.
+
+    <source>
+      @type rdkafka_group
+      topics <listening topics(separate with comma',')>
+      format <input text type (text|json|ltsv|msgpack)> :default => json
+      message_key <key (Optional, for text format only, default is message)>
+      kafka_message_key <key (Optional, If specified, set kafka's message key to this key)>
+      add_headers <If true, add kafka's message headers to record>
+      add_prefix <tag prefix (Optional)>
+      add_suffix <tag suffix (Optional)>
+      retry_emit_limit <Wait retry_emit_limit x 1s when BufferQueueLimitError happens. The default is nil and it means waiting until BufferQueueLimitError is resolved>
+      use_record_time (Deprecated. Use 'time_source record' instead.) <If true, replace event time with contents of 'time' field of fetched record>
+      time_source <source for message timestamp (now|kafka|record)> :default => now
+      time_format <string (Optional when use_record_time is used)>
+
+      # kafka consumer options
+      max_wait_time_ms 500
+      max_batch_size 10000
+      kafka_configs {
+        "bootstrap.servers": "<broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>",
+        "group.id": "<consumer group name>"
+      }
+    </source>
+
+See also [rdkafka-ruby](https://github.com/appsignal/rdkafka-ruby) and [librdkafka](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md) for more detailed documentation about Kafka consumer options.
+
+Consuming topic name is used for event tag. So when the target topic name is `app_event`, the tag is `app_event`. If you want to modify tag, use `add_prefix` or `add_suffix` parameter. With `add_prefix kafka`, the tag is `kafka.app_event`.
+
 ### Output plugin
 
-This `kafka2` plugin is for fluentd v1.0 or later. This will be `out_kafka` plugin in the future.
+This `kafka2` plugin is for fluentd v1 or later. This plugin uses `ruby-kafka` producer for writing data.
+If `ruby-kafka` doesn't fit your kafka environment, check `rdkafka2` plugin instead. This will be `out_kafka` plugin in the future.
 
 <match app.**>
   @type kafka2
@@ -161,6 +198,7 @@ This `kafka2` plugin is for fluentd v1.0 or later. This will be `out_kafka` plug
   headers (hash) :default => {}
   headers_from_record (hash) :default => {}
   use_default_for_unknown_topic (bool) :default => false
+  discard_kafka_delivery_failed (bool) :default => false (No discard)
 
   <format>
     @type (json|ltsv|msgpack|attr:<record name>|<formatter name>) :default => json
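
As a concrete illustration of the new option, the sketch below enables `discard_kafka_delivery_failed` on a kafka2 output; with it set, a `Kafka::DeliveryFailed` is logged and the producer buffer is cleared instead of the error being raised for retry (see the out_kafka2.rb change further down). The broker address and topic are placeholders, and the surrounding parameters are existing kafka2 settings, not part of this change.

    <match app.**>
      @type kafka2
      brokers <broker1_host>:<broker1_port>
      default_topic app_event
      discard_kafka_delivery_failed true
      <format>
        @type json
      </format>
    </match>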
@@ -384,6 +422,7 @@ You need to install rdkafka gem.
   default_message_key (string) :default => nil
   exclude_topic_key (bool) :default => false
   exclude_partition_key (bool) :default => false
+  discard_kafka_delivery_failed (bool) :default => false (No discard)
 
   # same with kafka2
   headers (hash) :default => {}
@@ -443,7 +482,7 @@ See ruby-kafka README for more details: https://github.com/zendesk/ruby-kafka#co
 
 To avoid the problem, there are 2 approaches:
 
-- Upgrade your kafka cluster to latest version. This is better becase recent version is faster and robust.
+- Upgrade your kafka cluster to latest version. This is better because recent version is faster and robust.
 - Downgrade ruby-kafka/fluent-plugin-kafka to work with your older kafka.
 
 ## Contributing
fluent-plugin-kafka.gemspec CHANGED
@@ -13,12 +13,12 @@ Gem::Specification.new do |gem|
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.name = "fluent-plugin-kafka"
   gem.require_paths = ["lib"]
-  gem.version = '0.13.0'
+  gem.version = '0.15.0'
   gem.required_ruby_version = ">= 2.1.0"
 
   gem.add_dependency "fluentd", [">= 0.10.58", "< 2"]
   gem.add_dependency 'ltsv'
-  gem.add_dependency 'ruby-kafka', '>= 0.7.8', '< 2'
+  gem.add_dependency 'ruby-kafka', '>= 1.2.0', '< 2'
   gem.add_development_dependency "rake", ">= 0.9.2"
   gem.add_development_dependency "test-unit", ">= 3.0.8"
 end
lib/fluent/plugin/in_kafka.rb CHANGED
@@ -39,6 +39,8 @@ class Fluent::KafkaInput < Fluent::Input
                :deprecated => "Use 'time_source record' instead."
   config_param :time_source, :enum, :list => [:now, :kafka, :record], :default => :now,
                :desc => "Source for message timestamp."
+  config_param :record_time_key, :string, :default => 'time',
+               :desc => "Time field when time_source is 'record'"
   config_param :get_kafka_client_log, :bool, :default => false
   config_param :time_format, :string, :default => nil,
                :desc => "Time format to be used to parse 'time' field."
@@ -186,16 +188,17 @@ class Fluent::KafkaInput < Fluent::Input
       @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id, logger: logger, ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
                          ssl_client_cert: read_ssl_file(@ssl_client_cert), ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key),
                          ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_scram_username: @username, sasl_scram_password: @password,
-                         sasl_scram_mechanism: @scram_mechanism, sasl_over_ssl: @sasl_over_ssl)
+                         sasl_scram_mechanism: @scram_mechanism, sasl_over_ssl: @sasl_over_ssl, ssl_verify_hostname: @ssl_verify_hostname)
     elsif @username != nil && @password != nil
       @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id, logger: logger, ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
                          ssl_client_cert: read_ssl_file(@ssl_client_cert), ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key),
                          ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_plain_username: @username, sasl_plain_password: @password,
-                         sasl_over_ssl: @sasl_over_ssl)
+                         sasl_over_ssl: @sasl_over_ssl, ssl_verify_hostname: @ssl_verify_hostname)
     else
       @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id, logger: logger, ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
                          ssl_client_cert: read_ssl_file(@ssl_client_cert), ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key),
-                         ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_gssapi_principal: @principal, sasl_gssapi_keytab: @keytab)
+                         ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_gssapi_principal: @principal, sasl_gssapi_keytab: @keytab,
+                         ssl_verify_hostname: @ssl_verify_hostname)
     end
 
     @zookeeper = Zookeeper.new(@offset_zookeeper) if @offset_zookeeper
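
These calls now forward `ssl_verify_hostname` to ruby-kafka, so hostname verification can be relaxed for brokers whose certificates do not match their hostnames. A hedged sketch, assuming the parameter sits alongside the existing SSL options (the CA path and broker address are placeholders; keep verification enabled in production where possible):

    <source>
      @type kafka
      brokers <broker1_host>:<broker1_port>
      topics app_event
      format json
      ssl_ca_cert /path/to/ca.crt
      ssl_verify_hostname false
    </source>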
@@ -292,9 +295,9 @@ class Fluent::KafkaInput < Fluent::Input
         record_time = Fluent::Engine.now
       when :record
         if @time_format
-          record_time = @time_parser.parse(record['time'])
+          record_time = @time_parser.parse(record[@record_time_key])
         else
-          record_time = record['time']
+          record_time = record[@record_time_key]
         end
       else
         $log.fatal "BUG: invalid time_source: #{@time_source}"
lib/fluent/plugin/in_kafka_group.rb CHANGED
@@ -18,6 +18,8 @@ class Fluent::KafkaGroupInput < Fluent::Input
                :desc => "Supported format: (json|text|ltsv|msgpack)"
   config_param :message_key, :string, :default => 'message',
                :desc => "For 'text' format only."
+  config_param :add_headers, :bool, :default => false,
+               :desc => "Add kafka's message headers to event record"
   config_param :add_prefix, :string, :default => nil,
                :desc => "Tag prefix (Optional)"
   config_param :add_suffix, :string, :default => nil,
@@ -29,6 +31,8 @@ class Fluent::KafkaGroupInput < Fluent::Input
                :deprecated => "Use 'time_source record' instead."
   config_param :time_source, :enum, :list => [:now, :kafka, :record], :default => :now,
                :desc => "Source for message timestamp."
+  config_param :record_time_key, :string, :default => 'time',
+               :desc => "Time field when time_source is 'record'"
   config_param :get_kafka_client_log, :bool, :default => false
   config_param :time_format, :string, :default => nil,
                :desc => "Time format to be used to parse 'time' field."
@@ -166,16 +170,17 @@ class Fluent::KafkaGroupInput < Fluent::Input
       @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id, logger: logger, connect_timeout: @connect_timeout, socket_timeout: @socket_timeout, ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
                          ssl_client_cert: read_ssl_file(@ssl_client_cert), ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key),
                          ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_scram_username: @username, sasl_scram_password: @password,
-                         sasl_scram_mechanism: @scram_mechanism, sasl_over_ssl: @sasl_over_ssl)
+                         sasl_scram_mechanism: @scram_mechanism, sasl_over_ssl: @sasl_over_ssl, ssl_verify_hostname: @ssl_verify_hostname)
     elsif @username != nil && @password != nil
       @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id, logger: logger, connect_timeout: @connect_timeout, socket_timeout: @socket_timeout, ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
                          ssl_client_cert: read_ssl_file(@ssl_client_cert), ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key),
                          ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_plain_username: @username, sasl_plain_password: @password,
-                         sasl_over_ssl: @sasl_over_ssl)
+                         sasl_over_ssl: @sasl_over_ssl, ssl_verify_hostname: @ssl_verify_hostname)
     else
       @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id, logger: logger, connect_timeout: @connect_timeout, socket_timeout: @socket_timeout, ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
                          ssl_client_cert: read_ssl_file(@ssl_client_cert), ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key),
-                         ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_gssapi_principal: @principal, sasl_gssapi_keytab: @keytab)
+                         ssl_ca_certs_from_system: @ssl_ca_certs_from_system, sasl_gssapi_principal: @principal, sasl_gssapi_keytab: @keytab,
+                         ssl_verify_hostname: @ssl_verify_hostname)
     end
 
     @consumer = setup_consumer
@@ -198,7 +203,14 @@ class Fluent::KafkaGroupInput < Fluent::Input
   def setup_consumer
     consumer = @kafka.consumer(@consumer_opts)
     @topics.each { |topic|
-      consumer.subscribe(topic, start_from_beginning: @start_from_beginning, max_bytes_per_partition: @max_bytes)
+      if m = /^\/(.+)\/$/.match(topic)
+        topic_or_regex = Regexp.new(m[1])
+        $log.info "Subscribe to topics matching the regex #{topic}"
+      else
+        topic_or_regex = topic
+        $log.info "Subscribe to topic #{topic}"
+      end
+      consumer.subscribe(topic_or_regex, start_from_beginning: @start_from_beginning, max_bytes_per_partition: @max_bytes)
     }
     consumer
   end
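
With this change, a topic entry wrapped in slashes is compiled into a Regexp and passed to ruby-kafka's `subscribe`, so one source can follow every topic matching a pattern. A hedged sketch (broker address, consumer group, and the pattern are placeholders):

    <source>
      @type kafka_group
      brokers <broker1_host>:<broker1_port>
      consumer_group fluentd-consumer-group
      topics /^app_.+$/
      format json
    </source>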
@@ -243,9 +255,9 @@ class Fluent::KafkaGroupInput < Fluent::Input
           record_time = Fluent::Engine.now
         when :record
           if @time_format
-            record_time = @time_parser.parse(record['time'].to_s)
+            record_time = @time_parser.parse(record[@record_time_key].to_s)
           else
-            record_time = record['time']
+            record_time = record[@record_time_key]
           end
         else
           log.fatal "BUG: invalid time_source: #{@time_source}"
@@ -253,6 +265,11 @@ class Fluent::KafkaGroupInput < Fluent::Input
           if @kafka_message_key
             record[@kafka_message_key] = msg.key
           end
+          if @add_headers
+            msg.headers.each_pair { |k, v|
+              record[k] = v
+            }
+          end
           es.add(record_time, record)
         rescue => e
           log.warn "parser error in #{batch.topic}/#{batch.partition}", :error => e.to_s, :value => msg.value, :offset => msg.offset
lib/fluent/plugin/in_rdkafka_group.rb ADDED
@@ -0,0 +1,284 @@
+require 'fluent/plugin/input'
+require 'fluent/time'
+require 'fluent/plugin/kafka_plugin_util'
+
+require 'rdkafka'
+
+class Fluent::Plugin::RdKafkaGroupInput < Fluent::Plugin::Input
+  Fluent::Plugin.register_input('rdkafka_group', self)
+
+  helpers :thread
+
+  config_param :topics, :string,
+               :desc => "Listening topics(separate with comma',')."
+
+  config_param :format, :string, :default => 'json',
+               :desc => "Supported format: (json|text|ltsv|msgpack)"
+  config_param :message_key, :string, :default => 'message',
+               :desc => "For 'text' format only."
+  config_param :add_headers, :bool, :default => false,
+               :desc => "Add kafka's message headers to event record"
+  config_param :add_prefix, :string, :default => nil,
+               :desc => "Tag prefix (Optional)"
+  config_param :add_suffix, :string, :default => nil,
+               :desc => "Tag suffix (Optional)"
+  config_param :use_record_time, :bool, :default => false,
+               :desc => "Replace message timestamp with contents of 'time' field.",
+               :deprecated => "Use 'time_source record' instead."
+  config_param :time_source, :enum, :list => [:now, :kafka, :record], :default => :now,
+               :desc => "Source for message timestamp."
+  config_param :record_time_key, :string, :default => 'time',
+               :desc => "Time field when time_source is 'record'"
+  config_param :time_format, :string, :default => nil,
+               :desc => "Time format to be used to parse 'time' field."
+  config_param :kafka_message_key, :string, :default => nil,
+               :desc => "Set kafka's message key to this field"
+
+  config_param :retry_emit_limit, :integer, :default => nil,
+               :desc => "How long to stop event consuming when BufferQueueLimitError happens. Wait retry_emit_limit x 1s. The default is waiting until BufferQueueLimitError is resolved"
+  config_param :retry_wait_seconds, :integer, :default => 30
+  config_param :disable_retry_limit, :bool, :default => false,
+               :desc => "If set true, it disables retry_limit and make Fluentd retry indefinitely (default: false)"
+  config_param :retry_limit, :integer, :default => 10,
+               :desc => "The maximum number of retries for connecting kafka (default: 10)"
+
+  config_param :max_wait_time_ms, :integer, :default => 250,
+               :desc => "How long to block polls in milliseconds until the server sends us data."
+  config_param :max_batch_size, :integer, :default => 10000,
+               :desc => "Maximum number of log lines emitted in a single batch."
+
+  config_param :kafka_configs, :hash, :default => {},
+               :desc => "Kafka configuration properties as described in https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md"
+
+  include Fluent::KafkaPluginUtil::SSLSettings
+  include Fluent::KafkaPluginUtil::SaslSettings
+
+  class ForShutdown < StandardError
+  end
+
+  BufferError = Fluent::Plugin::Buffer::BufferOverflowError
+
+  def initialize
+    super
+
+    @time_parser = nil
+    @retry_count = 1
+  end
+
+  def _config_to_array(config)
+    config_array = config.split(',').map {|k| k.strip }
+    if config_array.empty?
+      raise Fluent::ConfigError, "kafka_group: '#{config}' is a required parameter"
+    end
+    config_array
+  end
+
+  def multi_workers_ready?
+    true
+  end
+
+  private :_config_to_array
+
+  def configure(conf)
+    super
+
+    log.warn "The in_rdkafka_group consumer was not yet tested under heavy production load. Use it at your own risk!"
+
+    log.info "Will watch for topics #{@topics} at brokers " \
+             "#{@kafka_configs["bootstrap.servers"]} and '#{@kafka_configs["group.id"]}' group"
+
+    @topics = _config_to_array(@topics)
+
+    @parser_proc = setup_parser
+
+    @time_source = :record if @use_record_time
+
+    if @time_source == :record and @time_format
+      @time_parser = Fluent::TimeParser.new(@time_format)
+    end
+  end
+
+  def setup_parser
+    case @format
+    when 'json'
+      begin
+        require 'oj'
+        Oj.default_options = Fluent::DEFAULT_OJ_OPTIONS
+        Proc.new { |msg| Oj.load(msg.payload) }
+      rescue LoadError
+        require 'yajl'
+        Proc.new { |msg| Yajl::Parser.parse(msg.payload) }
+      end
+    when 'ltsv'
+      require 'ltsv'
+      Proc.new { |msg| LTSV.parse(msg.payload, {:symbolize_keys => false}).first }
+    when 'msgpack'
+      require 'msgpack'
+      Proc.new { |msg| MessagePack.unpack(msg.payload) }
+    when 'text'
+      Proc.new { |msg| {@message_key => msg.payload} }
+    end
+  end
+
+  def start
+    super
+
+    @consumer = setup_consumer
+
+    thread_create(:in_rdkafka_group, &method(:run))
+  end
+
+  def shutdown
+    # This nil assignment should be guarded by mutex in multithread programming manner.
+    # But the situation is very low contention, so we don't use mutex for now.
+    # If the problem happens, we will add a guard for consumer.
+    consumer = @consumer
+    @consumer = nil
+    consumer.close
+
+    super
+  end
+
+  def setup_consumer
+    consumer = Rdkafka::Config.new(@kafka_configs).consumer
+    consumer.subscribe(*@topics)
+    consumer
+  end
+
+  def reconnect_consumer
+    log.warn "Stopping Consumer"
+    consumer = @consumer
+    @consumer = nil
+    if consumer
+      consumer.close
+    end
+    log.warn "Could not connect to broker. retry_time:#{@retry_count}. Next retry will be in #{@retry_wait_seconds} seconds"
+    @retry_count = @retry_count + 1
+    sleep @retry_wait_seconds
+    @consumer = setup_consumer
+    log.warn "Re-starting consumer #{Time.now.to_s}"
+    @retry_count = 0
+  rescue => e
+    log.error "unexpected error during re-starting consumer object access", :error => e.to_s
+    log.error_backtrace
+    if @retry_count <= @retry_limit or disable_retry_limit
+      reconnect_consumer
+    end
+  end
+
+  class Batch
+    attr_reader :topic
+    attr_reader :messages
+
+    def initialize(topic)
+      @topic = topic
+      @messages = []
+    end
+  end
+
+  # Executes the passed codeblock on a batch of messages.
+  # It is guaranteed that every message in a given batch belongs to the same topic, because the tagging logic in :run expects that property.
+  # The number of maximum messages in a batch is capped by the :max_batch_size configuration value. It ensures that consuming from a single
+  # topic for a long time (e.g. with `auto.offset.reset` set to `earliest`) does not lead to memory exhaustion. Also, calling consumer.poll
+  # advances the consumer offset, so in case the process crashes we might lose at most :max_batch_size messages.
+  def each_batch(&block)
+    batch = nil
+    message = nil
+    while @consumer
+      message = @consumer.poll(@max_wait_time_ms)
+      if message
+        if not batch
+          batch = Batch.new(message.topic)
+        elsif batch.topic != message.topic || batch.messages.size >= @max_batch_size
+          yield batch
+          batch = Batch.new(message.topic)
+        end
+        batch.messages << message
+      else
+        yield batch if batch
+        batch = nil
+      end
+    end
+    yield batch if batch
+  end
+
+  def run
+    while @consumer
+      begin
+        each_batch { |batch|
+          log.debug "A new batch for topic #{batch.topic} with #{batch.messages.size} messages"
+          es = Fluent::MultiEventStream.new
+          tag = batch.topic
+          tag = @add_prefix + "." + tag if @add_prefix
+          tag = tag + "." + @add_suffix if @add_suffix
+
+          batch.messages.each { |msg|
+            begin
+              record = @parser_proc.call(msg)
+              case @time_source
+              when :kafka
+                record_time = Fluent::EventTime.from_time(msg.timestamp)
+              when :now
+                record_time = Fluent::Engine.now
+              when :record
+                if @time_format
+                  record_time = @time_parser.parse(record[@record_time_key].to_s)
+                else
+                  record_time = record[@record_time_key]
+                end
+              else
+                log.fatal "BUG: invalid time_source: #{@time_source}"
+              end
+              if @kafka_message_key
+                record[@kafka_message_key] = msg.key
+              end
+              if @add_headers
+                msg.headers.each_pair { |k, v|
+                  record[k] = v
+                }
+              end
+              es.add(record_time, record)
+            rescue => e
+              log.warn "parser error in #{msg.topic}/#{msg.partition}", :error => e.to_s, :value => msg.payload, :offset => msg.offset
+              log.debug_backtrace
+            end
+          }
+
+          unless es.empty?
+            emit_events(tag, es)
+          end
+        }
+      rescue ForShutdown
+      rescue => e
+        log.error "unexpected error during consuming events from kafka. Re-fetch events.", :error => e.to_s
+        log.error_backtrace
+        reconnect_consumer
+      end
+    end
+  rescue => e
+    log.error "unexpected error during consumer object access", :error => e.to_s
+    log.error_backtrace
+  end
+
+  def emit_events(tag, es)
+    retries = 0
+    begin
+      router.emit_stream(tag, es)
+    rescue BufferError
+      raise ForShutdown if @consumer.nil?
+
+      if @retry_emit_limit.nil?
+        sleep 1
+        retry
+      end
+
+      if retries < @retry_emit_limit
+        retries += 1
+        sleep 1
+        retry
+      else
+        raise RuntimeError, "Exceeds retry_emit_limit"
+      end
+    end
+  end
+end
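
A concrete configuration for the new input might look like the sketch below; it is only an illustration. The broker list, group id, and the `auto.offset.reset` entry are placeholders/examples, and any other librdkafka property from the CONFIGURATION.md list can be added to `kafka_configs`.

    <source>
      @type rdkafka_group
      topics app_event
      format json
      add_prefix kafka
      max_wait_time_ms 500
      max_batch_size 10000
      kafka_configs {
        "bootstrap.servers": "broker1:9092,broker2:9092",
        "group.id": "fluentd-rdkafka-group",
        "auto.offset.reset": "earliest"
      }
    </source>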
lib/fluent/plugin/kafka_producer_ext.rb CHANGED
@@ -69,12 +69,13 @@ module Kafka
         retry_backoff: retry_backoff,
         max_buffer_size: max_buffer_size,
         max_buffer_bytesize: max_buffer_bytesize,
+        partitioner: @partitioner,
       )
     end
   end
 
   class TopicProducer
-    def initialize(topic, cluster:, transaction_manager:, logger:, instrumenter:, compressor:, ack_timeout:, required_acks:, max_retries:, retry_backoff:, max_buffer_size:, max_buffer_bytesize:)
+    def initialize(topic, cluster:, transaction_manager:, logger:, instrumenter:, compressor:, ack_timeout:, required_acks:, max_retries:, retry_backoff:, max_buffer_size:, max_buffer_bytesize:, partitioner:)
       @cluster = cluster
       @transaction_manager = transaction_manager
       @logger = logger
@@ -86,6 +87,7 @@ module Kafka
       @max_buffer_size = max_buffer_size
       @max_buffer_bytesize = max_buffer_bytesize
       @compressor = compressor
+      @partitioner = partitioner
 
       @topic = topic
       @cluster.add_target_topics(Set.new([topic]))
@@ -250,7 +252,7 @@ module Kafka
 
       begin
         if partition.nil?
-          partition = Partitioner.partition_for_key(partition_count, message)
+          partition = @partitioner.call(partition_count, message)
         end
 
         @buffer.write(
lib/fluent/plugin/out_kafka2.rb CHANGED
@@ -15,6 +15,7 @@ module Fluent::Plugin
 Set brokers directly:
 <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
 DESC
+  config_param :topic, :string, :default => nil, :desc => "kafka topic. Placeholders are supported"
   config_param :topic_key, :string, :default => 'topic', :desc => "Field for kafka topic"
   config_param :default_topic, :string, :default => nil,
                :desc => "Default output topic when record doesn't have topic field"
@@ -68,6 +69,7 @@ The codec the producer uses to compress messages.
 Supported codecs depends on ruby-kafka: https://github.com/zendesk/ruby-kafka#compression
 DESC
   config_param :max_send_limit_bytes, :size, :default => nil
+  config_param :discard_kafka_delivery_failed, :bool, :default => false
   config_param :active_support_notification_regex, :string, :default => nil,
                :desc => <<-DESC
 Add a regular expression to capture ActiveSupport notifications from the Kafka client
@@ -215,7 +217,11 @@ DESC
   # TODO: optimize write performance
   def write(chunk)
     tag = chunk.metadata.tag
-    topic = (chunk.metadata.variables && chunk.metadata.variables[@topic_key_sym]) || @default_topic || tag
+    topic = if @topic
+              extract_placeholders(@topic, chunk)
+            else
+              (chunk.metadata.variables && chunk.metadata.variables[@topic_key_sym]) || @default_topic || tag
+            end
 
     messages = 0
     record_buf = nil
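
With the new `topic` parameter, the output topic is resolved by `extract_placeholders` against the buffer chunk. A hedged sketch that writes each tag to its own topic, assuming the usual Fluentd placeholder rules (so `tag` must be a buffer chunk key); the broker address is a placeholder:

    <match app.**>
      @type kafka2
      brokers <broker1_host>:<broker1_port>
      topic app.${tag}
      <buffer tag>
      </buffer>
      <format>
        @type json
      </format>
    </match>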
@@ -262,7 +268,16 @@ DESC
 
     if messages > 0
       log.debug { "#{messages} messages send." }
-      producer.deliver_messages
+      if @discard_kafka_delivery_failed
+        begin
+          producer.deliver_messages
+        rescue Kafka::DeliveryFailed => e
+          log.warn "DeliveryFailed occurred. Discard broken event:", :error => e.to_s, :error_class => e.class.to_s, :tag => tag
+          producer.clear_buffer
+        end
+      else
+        producer.deliver_messages
+      end
     end
   rescue Kafka::UnknownTopicOrPartition
     if @use_default_for_unknown_topic && topic != @default_topic
lib/fluent/plugin/out_rdkafka2.rb CHANGED
@@ -33,6 +33,7 @@ Set brokers directly:
 <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
 Brokers: you can choose to use either brokers or zookeeper.
 DESC
+  config_param :topic, :string, :default => nil, :desc => "kafka topic. Placeholders are supported"
   config_param :topic_key, :string, :default => 'topic', :desc => "Field for kafka topic"
   config_param :default_topic, :string, :default => nil,
                :desc => "Default output topic when record doesn't have topic field"
@@ -72,6 +73,7 @@ The codec the producer uses to compress messages. Used for compression.codec
 Supported codecs: (gzip|snappy)
 DESC
   config_param :max_send_limit_bytes, :size, :default => nil
+  config_param :discard_kafka_delivery_failed, :bool, :default => false
   config_param :rdkafka_buffering_max_ms, :integer, :default => nil, :desc => 'Used for queue.buffering.max.ms'
   config_param :rdkafka_buffering_max_messages, :integer, :default => nil, :desc => 'Used for queue.buffering.max.messages'
   config_param :rdkafka_message_max_bytes, :integer, :default => nil, :desc => 'Used for message.max.bytes'
@@ -278,7 +280,11 @@ DESC
 
   def write(chunk)
     tag = chunk.metadata.tag
-    topic = (chunk.metadata.variables && chunk.metadata.variables[@topic_key_sym]) || @default_topic || tag
+    topic = if @topic
+              extract_placeholders(@topic, chunk)
+            else
+              (chunk.metadata.variables && chunk.metadata.variables[@topic_key_sym]) || @default_topic || tag
+            end
 
     handlers = []
     record_buf = nil
@@ -320,9 +326,13 @@ DESC
       }
     end
   rescue Exception => e
-    log.warn "Send exception occurred: #{e} at #{e.backtrace.first}"
-    # Raise exception to retry sendind messages
-    raise e
+    if @discard_kafka_delivery_failed
+      log.warn "Delivery failed. Discard events:", :error => e.to_s, :error_class => e.class.to_s, :tag => tag
+    else
+      log.warn "Send exception occurred: #{e} at #{e.backtrace.first}"
+      # Raise exception to retry sending messages
+      raise e
+    end
   end
 
   def enqueue_with_retry(producer, topic, record_buf, message_key, partition, headers)
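
For rdkafka2 the switch applies to the whole send rescue above: with `discard_kafka_delivery_failed` enabled, a failed delivery is logged and the events are dropped instead of the exception being re-raised for retry. A hedged sketch (broker address and topic are placeholders; the remaining parameters are existing rdkafka2 settings, not part of this change):

    <match app.**>
      @type rdkafka2
      brokers <broker1_host>:<broker1_port>
      default_topic app_event
      discard_kafka_delivery_failed true
      <format>
        @type json
      </format>
    </match>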
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-kafka
 version: !ruby/object:Gem::Version
-  version: 0.13.0
+  version: 0.15.0
 platform: ruby
 authors:
 - Hidemasa Togashi
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-03-10 00:00:00.000000000 Z
+date: 2020-09-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: fluentd
@@ -51,7 +51,7 @@ dependencies:
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 0.7.8
+      version: 1.2.0
   - - "<"
     - !ruby/object:Gem::Version
       version: '2'
@@ -61,7 +61,7 @@ dependencies:
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 0.7.8
+      version: 1.2.0
   - - "<"
     - !ruby/object:Gem::Version
       version: '2'
@@ -111,6 +111,7 @@ files:
 - fluent-plugin-kafka.gemspec
 - lib/fluent/plugin/in_kafka.rb
 - lib/fluent/plugin/in_kafka_group.rb
+- lib/fluent/plugin/in_rdkafka_group.rb
 - lib/fluent/plugin/kafka_plugin_util.rb
 - lib/fluent/plugin/kafka_producer_ext.rb
 - lib/fluent/plugin/out_kafka.rb