fluent-plugin-kafka-enchanced 0.5.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 056609426bfbcfd6ea642db660c8360beaee7f72
+   data.tar.gz: acd3535de138c68109edcaf7ad318ad76cc8dc5d
+ SHA512:
+   metadata.gz: 1d25c86afc8ea2513504bd6d2b086751f3bdcc88c3f8f35673188bd3ee5edf7b23fc2d688392e79df3be2d5995b02bc168bc68a5b94564c8f0ebce7ce7e9a19e
+   data.tar.gz: b5c07478cd3595b420330490efa62b1d0878020962a742c9bf72f373186cb4229715ae723335e52fef2256087f17b22550faf849e0454fd66ffb1d1aedf8ee96
data/.gitignore ADDED
@@ -0,0 +1,2 @@
+ /Gemfile.lock
+ *.swp
data/.travis.yml ADDED
@@ -0,0 +1,17 @@
+ language: ruby
+
+ rvm:
+   - 2.1
+   - 2.2
+   - 2.3.1
+   - ruby-head
+
+ script:
+   - bundle exec rake test
+
+ sudo: false
+
+ matrix:
+   allow_failures:
+     - rvm: ruby-head
+
data/ChangeLog ADDED
@@ -0,0 +1,49 @@
+ Release 0.5.1 - 2017/02/06
+
+ * in_kafka_group: Fix uninitialized constant error
+
+ Release 0.5.0 - 2017/01/17
+
+ * output: Add out_kafka2 plugin with v0.14 API
+
+ Release 0.4.2 - 2016/12/10
+
+ * input: Add use_record_time and time_format parameters
+ * Update ruby-kafka dependency to 0.3.16.beta2
+
+ Release 0.4.1 - 2016/12/01
+
+ * output: Support specifying partition
+
+ Release 0.4.0 - 2016/11/08
+
+ * Remove zookeeper dependency
+
+ Release 0.3.5 - 2016/10/21
+
+ * output: Support message key and related parameters. #91
+
+ Release 0.3.4 - 2016/10/20
+
+ * output: Add exclude_topic_key and exclude_partition_key. #89
+
+ Release 0.3.3 - 2016/10/17
+
+ * out_kafka_buffered: Add get_kafka_client_log parameter. #83
+ * out_kafka_buffered: Skip and log invalid record to avoid buffer stuck. #86
+ * in_kafka_group: Add retry_emit_limit to handle BufferQueueLimitError. #87
+
+ Release 0.3.2 - 2016/10/06
+
+ * in_kafka_group: Re-fetch events after consumer error. #79
+
+ Release 0.3.1 - 2016/08/28
+
+ * output: Change default required_acks to -1. #70
+ * Support ruby version changed to 2.1.0 or later
+
+ Release 0.3.0 - 2016/08/24
+
+ * Fully replace poseidon ruby library with ruby-kafka to support latest kafka versions
+
+ See git commits for older changes
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in fluent-plugin-kafka.gemspec
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,22 @@
+ Copyright (c) 2014 htgc
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,221 @@
+ # fluent-plugin-kafka, a plugin for [Fluentd](http://fluentd.org)
+
+ [![Build Status](https://travis-ci.org/htgc/fluent-plugin-kafka.svg?branch=master)](https://travis-ci.org/htgc/fluent-plugin-kafka)
+
+ A fluentd plugin to both consume and produce data for Apache Kafka.
+
+ TODO: Also, I need to write tests
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+     gem 'fluent-plugin-kafka'
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install fluent-plugin-kafka
+
+ If you want to use zookeeper related parameters, you also need to install the zookeeper gem. The zookeeper gem includes a native extension, so development tools such as gcc and make are needed.
+
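+ For example, assuming the standard `zookeeper` gem from rubygems.org:
+
+     $ gem install zookeeper
+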
+ ## Requirements
+
+ - Ruby 2.1 or later
+ - Input plugins work with kafka v0.9 or later
+ - Output plugins work with kafka v0.8 or later
+
+ ## Usage
+
+ ### Common parameters
+
+ - ssl_ca_cert
+ - ssl_client_cert
+ - ssl_client_cert_key
+
+ Set the paths to the SSL related files. See [Encryption and Authentication using SSL](https://github.com/zendesk/ruby-kafka#encryption-and-authentication-using-ssl) for more details.
+
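+ As a minimal sketch (the broker address and file paths are placeholders), these parameters sit alongside the other options of an input or output section:
+
+     <match *.**>
+       @type kafka_buffered
+       brokers <broker1_host>:<broker1_port>
+       ssl_ca_cert /path/to/ca.crt
+       ssl_client_cert /path/to/client.crt
+       ssl_client_cert_key /path/to/client.key
+     </match>
+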
+ ### Input plugin (@type 'kafka')
+
+ Consume events with a single consumer.
+
+     <source>
+       @type kafka
+
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
+       topics <listening topics(separate with comma',')>
+       format <input text type (text|json|ltsv|msgpack)> :default => json
+       message_key <key (Optional, for text format only, default is message)>
+       add_prefix <tag prefix (Optional)>
+       add_suffix <tag suffix (Optional)>
+
+       # Optionally, you can manage topic offset by using zookeeper
+       offset_zookeeper <zookeeper node list (<zookeeper1_host>:<zookeeper1_port>,<zookeeper2_host>:<zookeeper2_port>,..)>
+       offset_zk_root_node <offset path in zookeeper> default => '/fluent-plugin-kafka'
+
+       # ruby-kafka consumer options
+       max_bytes (integer) :default => nil (Use default of ruby-kafka)
+       max_wait_time (integer) :default => nil (Use default of ruby-kafka)
+       min_bytes (integer) :default => nil (Use default of ruby-kafka)
+     </source>
+
+ Processing can also start from an assigned offset for specific topics:
+
+     <source>
+       @type kafka
+
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
+       format <input text type (text|json|ltsv|msgpack)>
+       <topic>
+         topic <listening topic>
+         partition <listening partition: default=0>
+         offset <listening start offset: default=-1>
+       </topic>
+       <topic>
+         topic <listening topic>
+         partition <listening partition: default=0>
+         offset <listening start offset: default=-1>
+       </topic>
+     </source>
+
+ See also the [ruby-kafka README](https://github.com/zendesk/ruby-kafka#consuming-messages-from-kafka) for more detailed documentation about ruby-kafka.
+
+ ### Input plugin (@type 'kafka_group', supports kafka group)
+
+ Consume events using Kafka consumer group features.
+
+     <source>
+       @type kafka_group
+
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,..
+       consumer_group <consumer group name, must set>
+       topics <listening topics(separate with comma',')>
+       format <input text type (text|json|ltsv|msgpack)> :default => json
+       message_key <key (Optional, for text format only, default is message)>
+       add_prefix <tag prefix (Optional)>
+       add_suffix <tag suffix (Optional)>
+       retry_emit_limit <Wait retry_emit_limit x 1s when BufferQueueLimitError happens. The default is nil and it means waiting until BufferQueueLimitError is resolved>
+       use_record_time <If true, replace event time with contents of 'time' field of fetched record>
+       time_format <string (Optional when use_record_time is used)>
+
+       # ruby-kafka consumer options
+       max_bytes (integer) :default => 1048576
+       max_wait_time (integer) :default => nil (Use default of ruby-kafka)
+       min_bytes (integer) :default => nil (Use default of ruby-kafka)
+       offset_commit_interval (integer) :default => nil (Use default of ruby-kafka)
+       offset_commit_threshold (integer) :default => nil (Use default of ruby-kafka)
+       start_from_beginning (bool) :default => true
+     </source>
+
+ See also the [ruby-kafka README](https://github.com/zendesk/ruby-kafka#consuming-messages-from-kafka) for more detailed documentation about ruby-kafka options.
+
+ ### Buffered output plugin
+
+ This plugin uses the ruby-kafka producer for writing data and works with recent Kafka versions.
+
+     <match *.**>
+       @type kafka_buffered
+
+       # Brokers: you can choose either brokers or zookeeper. If you are not familiar with zookeeper, use the brokers parameter.
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,.. # Set brokers directly
+       zookeeper <zookeeper_host>:<zookeeper_port> # Set brokers via Zookeeper
+       zookeeper_path <broker path in zookeeper> :default => /brokers/ids # Set path in zookeeper for kafka
+
+       default_topic (string) :default => nil
+       default_partition_key (string) :default => nil
+       default_message_key (string) :default => nil
+       output_data_type (json|ltsv|msgpack|attr:<record name>|<formatter name>) :default => json
+       output_include_tag (bool) :default => false
+       output_include_time (bool) :default => false
+       exclude_topic_key (bool) :default => false
+       exclude_partition_key (bool) :default => false
+       get_kafka_client_log (bool) :default => false
+
+       # See fluentd document for buffer related parameters: http://docs.fluentd.org/articles/buffer-plugin-overview
+
+       # ruby-kafka producer options
+       max_send_retries (integer) :default => 1
+       required_acks (integer) :default => -1
+       ack_timeout (integer) :default => nil (Use default of ruby-kafka)
+       compression_codec (gzip|snappy) :default => nil (No compression)
+     </match>
+
+ `<formatter name>` of `output_data_type` uses fluentd's formatter plugins. See the [formatter article](http://docs.fluentd.org/articles/formatter-plugin-overview).
+
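+ For example (the `attr:message` field name is illustrative and assumes the record has a `message` field):
+
+     output_data_type ltsv
+     # or publish only a single field of each record
+     output_data_type attr:message
+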
+ ruby-kafka sometimes returns a `Kafka::DeliveryFailed` error without useful information.
+ In this case, `get_kafka_client_log` is useful for identifying the error cause.
+ ruby-kafka's log is routed to the fluentd log, so you can see ruby-kafka's messages in the fluentd logs.
+
+ The following ruby-kafka producer options are supported:
+
+ - max_send_retries - default: 1 - Number of times to retry sending of messages to a leader.
+ - required_acks - default: -1 - The number of acks required per request. If you need flush performance, set a lower value, e.g. 1 or 2.
+ - ack_timeout - default: nil - How long the producer waits for acks. The unit is seconds.
+ - compression_codec - default: nil - The codec the producer uses to compress messages.
+
+ See also [Kafka::Client](http://www.rubydoc.info/gems/ruby-kafka/Kafka/Client) for more detailed documentation about ruby-kafka.
+
+ This plugin also supports the "snappy" compression codec.
+ Install the snappy module before you use snappy compression:
+
+     $ gem install snappy
+
+ The snappy gem uses a native extension, so you need to install several packages first.
+ On Ubuntu, the development packages and the snappy library are needed:
+
+     $ sudo apt-get install build-essential autoconf automake libtool libsnappy-dev
+
+ #### Load balancing
+
+ By default, ruby-kafka assigns each message to a partition at random, but messages with the same partition key are always assigned to the same partition; set `default_partition_key` in the config file to use one key for all messages.
+ If a key named `partition_key` exists in a message, this plugin uses its value as the partition key for that message.
+
+ |default_partition_key|partition_key| behavior |
+ | --- | --- | --- |
+ |Not set|Not exists| All messages are assigned a partition at random |
+ |Set| Not exists| All messages are assigned to the specific partition |
+ |Not set| Exists | Messages which have a partition_key record are assigned to the specific partition, others are assigned a partition at random |
+ |Set| Exists | Messages which have a partition_key record are assigned to the specific partition with partition_key, others are assigned to the specific partition with default_partition_key |
+
+ If a key named `message_key` exists in a message, this plugin publishes its value to Kafka as the message key, where it can be read by consumers. The same message key can be assigned to all messages by setting `default_message_key` in the config file. If message_key exists and partition_key is not set explicitly, message_key will be used for partitioning.
+
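+ As a sketch (the key values are illustrative), with
+
+     default_partition_key group-a
+
+ in the `<match>` section, a record such as
+
+     {"message": "hello", "partition_key": "group-b", "message_key": "user-123"}
+
+ is routed using `group-b` as the partition key and `user-123` as the Kafka message key, while records without those fields fall back to `group-a` and random message keys.
+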
+ ### Non-buffered output plugin
+
+ This plugin uses the ruby-kafka producer for writing data. For performance and reliability, use the `kafka_buffered` output instead; this one is mainly for testing.
+
+     <match *.**>
+       @type kafka
+
+       # Brokers: you can choose either brokers or zookeeper.
+       brokers <broker1_host>:<broker1_port>,<broker2_host>:<broker2_port>,.. # Set brokers directly
+       zookeeper <zookeeper_host>:<zookeeper_port> # Set brokers via Zookeeper
+       zookeeper_path <broker path in zookeeper> :default => /brokers/ids # Set path in zookeeper for kafka
+
+       default_topic (string) :default => nil
+       default_partition_key (string) :default => nil
+       default_message_key (string) :default => nil
+       output_data_type (json|ltsv|msgpack|attr:<record name>|<formatter name>) :default => json
+       output_include_tag (bool) :default => false
+       output_include_time (bool) :default => false
+       exclude_topic_key (bool) :default => false
+       exclude_partition_key (bool) :default => false
+
+       # ruby-kafka producer options
+       max_send_retries (integer) :default => 1
+       required_acks (integer) :default => -1
+       ack_timeout (integer) :default => nil (Use default of ruby-kafka)
+       compression_codec (gzip|snappy) :default => nil
+     </match>
+
+ This plugin also supports the ruby-kafka related parameters. See the Buffered output plugin section.
+
+ ## Contributing
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create new Pull Request
data/Rakefile ADDED
@@ -0,0 +1,12 @@
+ require 'bundler'
+ Bundler::GemHelper.install_tasks
+
+ require 'rake/testtask'
+
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.test_files = FileList['test/**/test_*.rb']
+   test.verbose = true
+ end
+
+ task :default => [:build]
@@ -0,0 +1,23 @@
+ # -*- encoding: utf-8 -*-
+
+ Gem::Specification.new do |gem|
+   gem.authors = ["Paschenko Konstantin"]
+   gem.email = ["zhr0n4x@gmail.com"]
+   gem.description = %q{Fluentd plugin for Apache Kafka > 0.8}
+   gem.summary = %q{Fluentd plugin for Apache Kafka > 0.8}
+   gem.homepage = "https://github.com/fluent/fluent-plugin-kafka"
+
+   gem.files = `git ls-files`.split($\)
+   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
+   gem.name = "fluent-plugin-kafka-enchanced"
+   gem.require_paths = ["lib"]
+   gem.version = '0.5.1'
+   gem.required_ruby_version = ">= 2.1.0"
+
+   gem.add_dependency "fluentd", [">= 0.10.58", "< 2"]
+   gem.add_dependency 'ltsv'
+   gem.add_dependency 'ruby-kafka', '= 0.3.16.beta2'
+   gem.add_development_dependency "rake", ">= 0.9.2"
+   gem.add_development_dependency "test-unit", ">= 3.0.8"
+ end
@@ -0,0 +1,308 @@
+ require 'fluent/input'
+ require 'fluent/time'
+ require 'fluent/plugin/kafka_plugin_util'
+
+ class Fluent::KafkaInput < Fluent::Input
+   Fluent::Plugin.register_input('kafka', self)
+
+   config_param :format, :string, :default => 'json',
+                :desc => "Supported format: (json|text|ltsv|msgpack)"
+   config_param :message_key, :string, :default => 'message',
+                :desc => "For 'text' format only."
+   config_param :host, :string, :default => nil,
+                :desc => "Broker host"
+   config_param :port, :integer, :default => nil,
+                :desc => "Broker port"
+   config_param :brokers, :string, :default => 'localhost:9092',
+                :desc => "List of broker-host:port, separate with comma, must set."
+   config_param :interval, :integer, :default => 1, # seconds
+                :desc => "Interval (Unit: seconds)"
+   config_param :topics, :string, :default => nil,
+                :desc => "Listening topics(separate with comma',')"
+   config_param :client_id, :string, :default => 'kafka'
+   config_param :partition, :integer, :default => 0,
+                :desc => "Listening partition"
+   config_param :offset, :integer, :default => -1,
+                :desc => "Listening start offset"
+   config_param :add_prefix, :string, :default => nil,
+                :desc => "Tag prefix"
+   config_param :add_suffix, :string, :default => nil,
+                :desc => "Tag suffix"
+   config_param :add_offset_in_record, :bool, :default => false
+
+   config_param :offset_zookeeper, :string, :default => nil
+   config_param :offset_zk_root_node, :string, :default => '/fluent-plugin-kafka'
+   config_param :use_record_time, :bool, :default => false,
+                :desc => "Replace message timestamp with contents of 'time' field."
+   config_param :time_format, :string, :default => nil,
+                :desc => "Time format to be used to parse 'time' field."
+
+   # Kafka#fetch_messages options
+   config_param :max_bytes, :integer, :default => nil,
+                :desc => "Maximum number of bytes to fetch."
+   config_param :max_wait_time, :integer, :default => nil,
+                :desc => "How long to block until the server sends us data."
+   config_param :min_bytes, :integer, :default => nil,
+                :desc => "Smallest amount of data the server should send us."
+
+   include Fluent::KafkaPluginUtil::SSLSettings
+
+   unless method_defined?(:router)
+     define_method("router") { Fluent::Engine }
+   end
+
+   def initialize
+     super
+     require 'kafka'
+
+     @time_parser = nil
+   end
+
+   def configure(conf)
+     super
+
+     @topic_list = []
+     if @topics
+       @topic_list = @topics.split(',').map { |topic|
+         TopicEntry.new(topic.strip, @partition, @offset)
+       }
+     else
+       conf.elements.select { |element| element.name == 'topic' }.each do |element|
+         unless element.has_key?('topic')
+           raise Fluent::ConfigError, "kafka: 'topic' is a required parameter in the 'topic' element."
+         end
+         partition = element.has_key?('partition') ? element['partition'].to_i : 0
+         offset = element.has_key?('offset') ? element['offset'].to_i : -1
+         @topic_list.push(TopicEntry.new(element['topic'], partition, offset))
+       end
+     end
+
+     if @topic_list.empty?
+       raise Fluent::ConfigError, "kafka: 'topics' or 'topic' element is a required parameter"
+     end
+
+     # For backward compatibility
+     @brokers = case
+                when @host && @port
+                  ["#{@host}:#{@port}"]
+                when @host
+                  ["#{@host}:9092"]
+                when @port
+                  ["localhost:#{@port}"]
+                else
+                  @brokers
+                end
+
+     if conf['max_wait_ms']
+       log.warn "'max_wait_ms' parameter is deprecated. Use second unit 'max_wait_time' instead"
+       @max_wait_time = conf['max_wait_ms'].to_i / 1000
+     end
+
+     @max_wait_time = @interval if @max_wait_time.nil?
+
+     require 'zookeeper' if @offset_zookeeper
+
+     @parser_proc = setup_parser
+
+     if @use_record_time and @time_format
+       @time_parser = Fluent::TextParser::TimeParser.new(@time_format)
+     end
+   end
+
+   def setup_parser
+     case @format
+     when 'json'
+       require 'yajl'
+       Proc.new { |msg, te|
+         r = Yajl::Parser.parse(msg.value)
+         add_offset_in_hash(r, te, msg.offset) if @add_offset_in_record
+         r
+       }
+     when 'ltsv'
+       require 'ltsv'
+       Proc.new { |msg, te|
+         r = LTSV.parse(msg.value).first
+         add_offset_in_hash(r, te, msg.offset) if @add_offset_in_record
+         r
+       }
+     when 'msgpack'
+       require 'msgpack'
+       Proc.new { |msg, te|
+         r = MessagePack.unpack(msg.value)
+         add_offset_in_hash(r, te, msg.offset) if @add_offset_in_record
+         r
+       }
+     when 'text'
+       Proc.new { |msg, te|
+         r = {@message_key => msg.value}
+         add_offset_in_hash(r, te, msg.offset) if @add_offset_in_record
+         r
+       }
+     end
+   end
+
+   def add_offset_in_hash(hash, te, offset)
+     hash['kafka_topic'.freeze] = te.topic
+     hash['kafka_partition'.freeze] = te.partition
+     hash['kafka_offset'.freeze] = offset
+   end
+
+   def start
+     super
+
+     @loop = Coolio::Loop.new
+     opt = {}
+     opt[:max_bytes] = @max_bytes if @max_bytes
+     opt[:max_wait_time] = @max_wait_time if @max_wait_time
+     opt[:min_bytes] = @min_bytes if @min_bytes
+
+     @kafka = Kafka.new(seed_brokers: @brokers, client_id: @client_id,
+                        ssl_ca_cert: read_ssl_file(@ssl_ca_cert),
+                        ssl_client_cert: read_ssl_file(@ssl_client_cert),
+                        ssl_client_cert_key: read_ssl_file(@ssl_client_cert_key))
+     @zookeeper = Zookeeper.new(@offset_zookeeper) if @offset_zookeeper
+
+     @topic_watchers = @topic_list.map {|topic_entry|
+       offset_manager = OffsetManager.new(topic_entry, @zookeeper, @offset_zk_root_node) if @offset_zookeeper
+       TopicWatcher.new(
+         topic_entry,
+         @kafka,
+         interval,
+         @parser_proc,
+         @add_prefix,
+         @add_suffix,
+         offset_manager,
+         router,
+         opt)
+     }
+     @topic_watchers.each {|tw|
+       tw.attach(@loop)
+     }
+     @thread = Thread.new(&method(:run))
+   end
+
+   def shutdown
+     @loop.stop
+     @zookeeper.close! if @zookeeper
+     @thread.join
+     @kafka.close
+     super
+   end
+
+   def run
+     @loop.run
+   rescue => e
+     $log.error "unexpected error", :error => e.to_s
+     $log.error_backtrace
+   end
+
+   class TopicWatcher < Coolio::TimerWatcher
+     def initialize(topic_entry, kafka, interval, parser, add_prefix, add_suffix, offset_manager, router, options={})
+       @topic_entry = topic_entry
+       @kafka = kafka
+       @callback = method(:consume)
+       @parser = parser
+       @add_prefix = add_prefix
+       @add_suffix = add_suffix
+       @options = options
+       @offset_manager = offset_manager
+       @router = router
+
+       @next_offset = @topic_entry.offset
+       if @topic_entry.offset == -1 && offset_manager
+         @next_offset = offset_manager.next_offset
+       end
+       @fetch_args = {
+         topic: @topic_entry.topic,
+         partition: @topic_entry.partition,
+       }.merge(@options)
+
+       super(interval, true)
+     end
+
+     def on_timer
+       @callback.call
+     rescue => e
+       # TODO log?
+       $log.error e.to_s
+       $log.error_backtrace
+     end
+
+     def consume
+       offset = @next_offset
+       @fetch_args[:offset] = offset
+       messages = @kafka.fetch_messages(@fetch_args)
+
+       return if messages.size.zero?
+
+       es = Fluent::MultiEventStream.new
+       tag = @topic_entry.topic
+       tag = @add_prefix + "." + tag if @add_prefix
+       tag = tag + "." + @add_suffix if @add_suffix
+
+       messages.each { |msg|
+         begin
+           record = @parser.call(msg, @topic_entry)
+           if @use_record_time
+             if @time_format
+               record_time = @time_parser.parse(record['time'])
+             else
+               record_time = record['time']
+             end
+           else
+             record_time = Fluent::Engine.now
+           end
+           es.add(record_time, record)
+         rescue => e
+           $log.warn "parser error in #{@topic_entry.topic}/#{@topic_entry.partition}", :error => e.to_s, :value => msg.value, :offset => msg.offset
+           $log.debug_backtrace
+         end
+       }
+       offset = messages.last.offset + 1
+
+       unless es.empty?
+         @router.emit_stream(tag, es)
+
+         if @offset_manager
+           @offset_manager.save_offset(offset)
+         end
+         @next_offset = offset
+       end
+     end
+   end
+
+   class TopicEntry
+     def initialize(topic, partition, offset)
+       @topic = topic
+       @partition = partition
+       @offset = offset
+     end
+     attr_reader :topic, :partition, :offset
+   end
+
+   class OffsetManager
+     def initialize(topic_entry, zookeeper, zk_root_node)
+       @zookeeper = zookeeper
+       @zk_path = "#{zk_root_node}/#{topic_entry.topic}/#{topic_entry.partition}/next_offset"
+       create_node(@zk_path, topic_entry.topic, topic_entry.partition)
+     end
+
+     def create_node(zk_path, topic, partition)
+       path = ""
+       zk_path.split(/(\/[^\/]+)/).reject(&:empty?).each { |dir|
+         path = path + dir
+         @zookeeper.create(:path => "#{path}")
+       }
+       $log.trace "use zk offset node : #{path}"
+     end
+
+     def next_offset
+       @zookeeper.get(:path => @zk_path)[:data].to_i
+     end
+
+     def save_offset(offset)
+       @zookeeper.set(:path => @zk_path, :data => offset.to_s)
+       $log.trace "update zk offset node : #{offset.to_s}"
+     end
+   end
+ end