ruby-kafka 0.4.2 → 0.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5c001c97f73b001e170f23d7fd970781aa6bb196
- data.tar.gz: 39f0b03b56ad97eee788fac63ec467599ad8cd8e
+ metadata.gz: 6c934181fd72d8f78eb9b4438b7297a6a71a51eb
+ data.tar.gz: e3a970abf6839ab03bdd34caeb53dfe9fe91666d
  SHA512:
- metadata.gz: 6f9312f25ab7493d5803b581b22af270399f0f5ff2fbf4012da1f4a30525c430fb1fa21085cb4042aa4b59074a0abe3af9038ee6e97f0b48770baa50eaab6393
- data.tar.gz: 7523013dbc08d7a193faf6b2b2419a6704cb4b7918f9ed993545b534e356084a765a1c8cfba937a55d7236c26985e3cf252cbe50df6c3c98a45e1bc47e32caad
+ metadata.gz: 266b2624a721b56c991797a6848531e00675274a24241db59de6d8649ac70ba41799b40d39efbfd8c90a1b467aa142fe68526533e23933c606fbaac7b9ae780d
+ data.tar.gz: 1d41e52b87201e803a3edf62a56a0086c1f8aed86dc255a9c73eee5b295acc27113e7d82adba4558666142509664610f5531c654049e596c546c89e468f153b5
data/.gitignore CHANGED
@@ -3,6 +3,7 @@
  /_yardoc/
  /coverage/
  /doc/
+ /Gemfile.lock
  /pkg/
  /spec/reports/
  /tmp/
data/.rubocop.yml ADDED
@@ -0,0 +1,44 @@
+ AllCops:
+   DisplayCopNames: true
+   TargetRubyVersion: 2.1
+
+ Lint:
+   Enabled: false
+ Metrics:
+   Enabled: false
+ Performance:
+   Enabled: false
+ Style:
+   Enabled: false
+
+ # Configured cops
+
+ Layout/CaseIndentation:
+   EnforcedStyle: end
+ Layout/FirstParameterIndentation:
+   EnforcedStyle: consistent
+ Layout/IndentArray:
+   EnforcedStyle: consistent
+ Layout/IndentHash:
+   EnforcedStyle: consistent
+ Layout/MultilineMethodCallIndentation:
+   EnforcedStyle: indented
+ Layout/MultilineOperationIndentation:
+   EnforcedStyle: indented
+ Lint/EndAlignment:
+   EnforcedStyleAlignWith: variable
+
+ #
+ # Disabled cops
+ #
+
+ Layout/AlignHash:
+   Enabled: false
+ Layout/AlignParameters:
+   Enabled: false
+ Layout/EmptyLinesAroundClassBody:
+   Enabled: false
+ Layout/EmptyLinesAroundModuleBody:
+   Enabled: false
+ Layout/SpaceInsideBlockBraces:
+   Enabled: false
data/CHANGELOG.md CHANGED
@@ -4,6 +4,20 @@ Changes and additions to the library will be listed here.
 
  ## Unreleased
 
+ ## v0.4.3
+
+ - Restart the async producer thread automatically after errors.
+ - Include the offset lag in batch consumer metrics (Statsd).
+ - Make the default `max_wait_time` more sane.
+ - Fix issue with cached default offset lookups (#431).
+ - Upgrade to Datadog client version 3.
+
+ ## v0.4.2
+
+ - Fix connection issue on SASL connections (#401).
+ - Add more instrumentation of consumer groups (#407).
+ - Improve error logging (#385)
+
  ## v0.4.1
 
  - Allow seeking the consumer position (#386).
data/Gemfile CHANGED
@@ -1,4 +1,3 @@
  source "https://rubygems.org"
- ruby "2.2.3"
 
  gemspec
data/README.md CHANGED
@@ -31,8 +31,9 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
  5. [Logging](#logging)
  6. [Instrumentation](#instrumentation)
  7. [Monitoring](#monitoring)
- 1. [Reporting Metrics to Statsd](#reporting-metrics-to-statsd)
- 2. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
+ 1. [What to Monitor](#what-to-monitor)
+ 2. [Reporting Metrics to Statsd](#reporting-metrics-to-statsd)
+ 3. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
  8. [Understanding Timeouts](#understanding-timeouts)
  9. [Security](#security)
  1. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
@@ -654,14 +655,15 @@ In order to optimize for low latency, you want to process a message as soon as p
  There are three values that can be tuned in order to balance these two concerns.
 
  * `min_bytes` is the minimum number of bytes to return from a single message fetch. By setting this to a high value you can increase the processing throughput. The default value is one byte.
- * `max_wait_time` is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides `min_bytes`, so you'll _always_ get data back within the time specified. The default value is five seconds. If you want to have at most one second of latency, set `max_wait_time` to 1.
+ * `max_wait_time` is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides `min_bytes`, so you'll _always_ get data back within the time specified. The default value is one second. If you want to have at most five seconds of latency, set `max_wait_time` to 5. You should make sure `max_wait_time` * num brokers + `heartbeat_interval` is less than `session_timeout`.
  * `max_bytes_per_partition` is the maximum amount of data a broker will return for a single partition when fetching new messages. The default is 1MB, but increasing this number may lead to better throughtput since you'll need to fetch less frequently. Setting it to a lower value is not recommended unless you have so many partitions that it's causing network and latency issues to transfer a fetch response from a broker to a client. Setting the number too high may result in instability, so be careful.
 
  The first two settings can be passed to either `#each_message` or `#each_batch`, e.g.
 
  ```ruby
- # Waits for data for up to 30 seconds, preferring to fetch at least 5KB at a time.
- consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 30) do |message|
+ # Waits for data for up to 5 seconds on each broker, preferring to fetch at least 5KB at a time.
+ # This can wait up to num brokers * 5 seconds.
+ consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 5) do |message|
  # ...
  end
  ```
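As an illustration of that budget (not part of the diff; the broker count, group id, and topic below are assumed values):

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092", "kafka3:9092"])

broker_count       = 3   # brokers a single fetch may wait on
max_wait_time      = 5   # seconds per fetch
heartbeat_interval = 10  # seconds between group heartbeats
session_timeout    = 30  # seconds before the group evicts a silent member

# Keep the worst-case fetch time inside the session timeout, otherwise the
# consumer risks being kicked out of the group while it is still fetching.
if max_wait_time * broker_count + heartbeat_interval >= session_timeout
  raise "max_wait_time is too high for this session_timeout"
end

consumer = kafka.consumer(
  group_id: "my-consumer-group",
  session_timeout: session_timeout,
  heartbeat_interval: heartbeat_interval,
)
consumer.subscribe("events")

consumer.each_batch(min_bytes: 1024 * 5, max_wait_time: max_wait_time) do |batch|
  # process batch.messages here
end
```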
@@ -729,6 +731,33 @@ end
  * `message_count` is the number of messages for which delivery was attempted.
  * `delivered_message_count` is the number of messages that were acknowledged by the brokers - if this number is smaller than `message_count` not all messages were successfully delivered.
 
+ #### Consumer Notifications
+
+ * `process_message.consumer.kafka` is sent whenever a message is processed by a consumer. It includes the following payload:
+ * `value` is the message value.
+ * `key` is the message key.
+ * `topic` is the topic that the message was consumed from.
+ * `partition` is the topic partition that the message was consumed from.
+ * `offset` is the message's offset within the topic partition.
+ * `offset_lag` is the number of messages within the topic partition that have not yet been consumed.
+
+ * `process_batch.consumer.kafka` is sent whenever a message batch is processed by a consumer. It includes the following payload:
+ * `message_count` is the number of messages in the batch.
+ * `topic` is the topic that the message batch was consumed from.
+ * `partition` is the topic partition that the message batch was consumed from.
+ * `highwater_mark_offset` is the message batch's highest offset within the topic partition.
+ * `offset_lag` is the number of messages within the topic partition that have not yet been consumed.
+
+ * `join_group.consumer.kafka` is sent whenever a consumer joins a consumer group. It includes the following payload:
+ * `group_id` is the consumer group id.
+
+ * `sync_group.consumer.kafka` is sent whenever a consumer is assigned topic partitions within a consumer group. It includes the following payload:
+ * `group_id` is the consumer group id.
+
+ * `leave_group.consumer.kafka` is sent whenever a consumer leaves a consumer group. It includes the following payload:
+ * `group_id` is the consumer group id.
+
+
  #### Connection Notifications
 
  * `request.connection.kafka` is sent whenever a network request is sent to a Kafka broker. It includes the following payload:
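As an illustration, a minimal subscriber for one of these new events, using the standard ActiveSupport::Notifications API (the log output is only a sketch, not part of the library):

```ruby
require "active_support/notifications"

# Report processing time and lag for every message a consumer handles.
ActiveSupport::Notifications.subscribe("process_message.consumer.kafka") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  topic     = event.payload.fetch(:topic)
  partition = event.payload.fetch(:partition)
  lag       = event.payload.fetch(:offset_lag)

  puts "processed #{topic}/#{partition} in #{event.duration}ms (lag: #{lag})"
end
```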
@@ -749,6 +778,29 @@ It is highly recommended that you monitor your Kafka client applications in prod
  You can quite easily build monitoring on top of the provided [instrumentation hooks](#instrumentation). In order to further help with monitoring, a prebuilt [Statsd](https://github.com/etsy/statsd) and [Datadog](https://www.datadoghq.com/) reporter is included with ruby-kafka.
 
 
+ #### What to Monitor
+
+ We recommend monitoring the following:
+
+ * Low-level Kafka API calls:
+ * The rate of API call errors to the total number of calls by both API and broker.
+ * The API call throughput by both API and broker.
+ * The API call latency by both API and broker.
+ * Producer-level metrics:
+ * Delivery throughput by topic.
+ * The latency of deliveries.
+ * The producer buffer fill ratios.
+ * The async producer queue sizes.
+ * Message delivery delays.
+ * Failed delivery attempts.
+ * Consumer-level metrics:
+ * Message processing throughput by topic.
+ * Processing latency by topic.
+ * Processing errors by topic.
+ * Consumer lag (how many messages are yet to be processed) by topic/partition.
+ * Group join/sync/leave by client host.
+
+
  #### Reporting Metrics to Statsd
 
  The Statsd reporter is automatically enabled when the `kafka/statsd` library is required. You can optionally change the configuration.
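As an illustration of that optional configuration (the Statsd host below is an assumed value):

```ruby
require "kafka/statsd"

# Optional overrides; port 8125 is the library default.
Kafka::Statsd.host = "statsd.example.com"
Kafka::Statsd.port = 8125
Kafka::Statsd.namespace = "ruby_kafka"
```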
@@ -938,6 +990,8 @@ Currently, there are three actively developed frameworks based on ruby-kafka, th
 
  * [Racecar](https://github.com/zendesk/racecar) - A simple framework that integrates with Ruby on Rails to provide a seamless way to write, test, configure, and run Kafka consumers. It comes with sensible defaults and conventions.
 
+ * [DeliveryBoy](https://github.com/zendesk/delivery_boy) – A library that integrates with Ruby on Rails, making it easy to publish Kafka messages from any Rails application.
+
  * [Karafka](https://github.com/karafka/karafka) - Framework used to simplify Apache Kafka based Ruby and Rails applications development. Karafka provides higher abstraction layers, including Capistrano, Docker and Heroku support.
 
  * [Phobos](https://github.com/klarna/phobos) - Micro framework and library for applications dealing with Apache Kafka. It wraps common behaviors needed by consumers and producers in an easy and convenient API.
data/ci/init.rb CHANGED
@@ -4,7 +4,7 @@ require "kafka"
 
  logger = Logger.new(STDOUT)
  logger.level = Logger::INFO
- logger.formatter = -> (_, _, _, msg) { msg }
+ logger.formatter = ->(_, _, _, msg) { msg }
 
  STDOUT.sync = true
 
data/circle.yml CHANGED
@@ -5,17 +5,19 @@ machine:
  - docker
  environment:
  LOG_LEVEL: DEBUG
+ ruby:
+ version: 2.4.1
 
  dependencies:
  pre:
  - docker -v
  - docker pull ches/kafka:0.9.0.1
  - docker pull jplock/zookeeper:3.4.6
- - gem install bundler -v 1.9.5
 
  test:
  override:
  - bundle exec rspec -r rspec_junit_formatter --format RspecJunitFormatter -o $CIRCLE_TEST_REPORTS/rspec/unit.xml
  - bundle exec rspec -r rspec_junit_formatter --format RspecJunitFormatter -o $CIRCLE_TEST_REPORTS/rspec/functional.xml --tag functional
  post:
+ - bundle exec rubocop
  - cp *.log $CIRCLE_ARTIFACTS/ || true
data/lib/kafka.rb CHANGED
@@ -203,6 +203,13 @@ module Kafka
 
  # Raised if not all messages could be sent by a producer.
  class DeliveryFailed < Error
+ attr_reader :failed_messages
+
+ def initialize(message, failed_messages)
+ @failed_messages = failed_messages
+
+ super(message)
+ end
  end
 
  class HeartbeatError < Error
@@ -142,16 +142,10 @@ module Kafka
 
  def ensure_threads_running!
  @worker_thread = nil unless @worker_thread && @worker_thread.alive?
- @worker_thread ||= start_thread { @worker.run }
+ @worker_thread ||= Thread.new { @worker.run }
 
  @timer_thread = nil unless @timer_thread && @timer_thread.alive?
- @timer_thread ||= start_thread { @timer.run }
- end
-
- def start_thread(&block)
- thread = Thread.new(&block)
- thread.abort_on_exception = true
- thread
+ @timer_thread ||= Thread.new { @timer.run }
  end
 
  def buffer_overflow(topic)
@@ -191,6 +185,8 @@ module Kafka
  end
 
  def run
+ @logger.info "Starting async producer in the background..."
+
  loop do
  operation, payload = @queue.pop
 
@@ -218,6 +214,15 @@ module Kafka
  raise "Unknown operation #{operation.inspect}"
  end
  end
+ rescue Kafka::Error => e
+ @logger.error "Unexpected Kafka error #{e.class}: #{e.message}\n#{e.backtrace.join("\n")}"
+ @logger.info "Restarting in 10 seconds..."
+
+ sleep 10
+ retry
+ rescue Exception => e
+ @logger.error "Unexpected Kafka error #{e.class}: #{e.message}\n#{e.backtrace.join("\n")}"
+ @logger.error "Async producer crashed!"
  ensure
  @producer.shutdown
  end
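The change above makes the background worker rescue `Kafka::Error`, log it, sleep ten seconds, and retry, rather than dying silently. As an illustration of the async producer it protects (broker address, delivery interval, and topic are assumed values):

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092"])

# Deliver buffered messages every second from a background thread.
producer = kafka.async_producer(delivery_interval: 1)

producer.produce("hello world", topic: "events")

# ... later, when shutting the process down:
producer.shutdown
```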
data/lib/kafka/client.rb CHANGED
@@ -14,12 +14,13 @@ require "kafka/sasl_authenticator"
 
  module Kafka
  class Client
+ URI_SCHEMES = ["kafka", "kafka+ssl"]
 
  # Initializes a new Kafka client.
  #
  # @param seed_brokers [Array<String>, String] the list of brokers used to initialize
  # the client. Either an Array of connections, or a comma separated string of connections.
- # Connections can either be a string of "port:protocol" or a full URI with a scheme.
+ # A connection can either be a string of "host:port" or a full URI with a scheme.
  # If there's a scheme it's ignored and only host/port are used.
  #
  # @param client_id [String] the identifier for this application.
@@ -142,7 +143,7 @@ module Kafka
  operation.execute
 
  unless buffer.empty?
- raise DeliveryFailed
+ raise DeliveryFailed.new(nil, [message])
  end
  end
 
@@ -435,7 +436,6 @@ module Kafka
  @cluster.resolve_offset(topic, partition, :latest) - 1
  end
 
-
  # Retrieve the offset of the last message in each partition of the specified topics.
  #
  # @param topics [Array<String>] topic names.
@@ -516,6 +516,11 @@ module Kafka
  connection = "kafka://" + connection unless connection =~ /:\/\//
  uri = URI.parse(connection)
  uri.port ||= 9092 # Default Kafka port.
+
+ unless URI_SCHEMES.include?(uri.scheme)
+ raise Kafka::Error, "invalid protocol `#{uri.scheme}` in `#{connection}`"
+ end
+
  uri
  end
  end
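As an illustration of the new scheme check (broker hostnames below are assumed values):

```ruby
require "kafka"

# Plain "host:port" strings and kafka:// or kafka+ssl:// URIs are accepted.
Kafka.new(seed_brokers: ["broker1:9092", "kafka://broker2:9092"])
Kafka.new(seed_brokers: ["kafka+ssl://broker3:9093"], ssl_ca_cert: File.read("ca.pem"))

# Any other scheme now raises Kafka::Error, e.g.:
#   Kafka.new(seed_brokers: ["http://broker4:9092"])
#   => Kafka::Error: invalid protocol `http` in `http://broker4:9092`
```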
@@ -165,10 +165,10 @@ module Kafka
  # at the last committed offsets.
  #
  # @param min_bytes [Integer] the minimum number of bytes to read before
- # returning messages from the server; if `max_wait_time` is reached, this
+ # returning messages from each broker; if `max_wait_time` is reached, this
  # is ignored.
  # @param max_wait_time [Integer, Float] the maximum duration of time to wait before
- # returning messages from the server, in seconds.
+ # returning messages from each broker, in seconds.
  # @param automatically_mark_as_processed [Boolean] whether to automatically
  # mark a message as successfully processed when the block returns
  # without an exception. Once marked successful, the offsets of processed
@@ -178,7 +178,7 @@ module Kafka
  # The original exception will be returned by calling `#cause` on the
  # {Kafka::ProcessingError} instance.
  # @return [nil]
- def each_message(min_bytes: 1, max_wait_time: 5, automatically_mark_as_processed: true)
+ def each_message(min_bytes: 1, max_wait_time: 1, automatically_mark_as_processed: true)
  consumer_loop do
  batches = fetch_batches(min_bytes: min_bytes, max_wait_time: max_wait_time)
 
@@ -232,17 +232,17 @@ module Kafka
  # at the last committed offsets.
  #
  # @param min_bytes [Integer] the minimum number of bytes to read before
- # returning messages from the server; if `max_wait_time` is reached, this
+ # returning messages from each broker; if `max_wait_time` is reached, this
  # is ignored.
  # @param max_wait_time [Integer, Float] the maximum duration of time to wait before
- # returning messages from the server, in seconds.
+ # returning messages from each broker, in seconds.
  # @param automatically_mark_as_processed [Boolean] whether to automatically
  # mark a batch's messages as successfully processed when the block returns
  # without an exception. Once marked successful, the offsets of processed
  # messages can be committed to Kafka.
  # @yieldparam batch [Kafka::FetchedBatch] a message batch fetched from Kafka.
  # @return [nil]
- def each_batch(min_bytes: 1, max_wait_time: 5, automatically_mark_as_processed: true)
+ def each_batch(min_bytes: 1, max_wait_time: 1, automatically_mark_as_processed: true)
  consumer_loop do
  batches = fetch_batches(min_bytes: min_bytes, max_wait_time: max_wait_time)
 
@@ -260,7 +260,7 @@ module Kafka
  begin
  yield batch
  rescue => e
- offset_range = (batch.first_offset .. batch.last_offset)
+ offset_range = (batch.first_offset..batch.last_offset)
  location = "#{batch.topic}/#{batch.partition} in offset range #{offset_range}"
  backtrace = e.backtrace.join("\n")
 
@@ -289,9 +289,9 @@ module Kafka
  # you will want to do this in every consumer group member in order to make sure
  # that the member that's assigned the partition knows where to start.
  #
- # @param topic [String]
- # @param partition [Integer]
- # @param offset [Integer]
+ # @param topic [String]
+ # @param partition [Integer]
+ # @param offset [Integer]
  # @return [nil]
  def seek(topic, partition, offset)
  @offset_manager.seek_to(topic, partition, offset)
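As an illustration of the documented `seek` call (group id, topic, partition, and offset below are assumed values):

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092"])
consumer = kafka.consumer(group_id: "my-consumer-group")
consumer.subscribe("events")

# Every member of the group should seek, since any of them may end up
# being assigned this partition.
consumer.seek("events", 0, 1_000)

consumer.each_message do |message|
  # Processing starts at offset 1,000 in partition 0 of "events".
end
```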
data/lib/kafka/datadog.rb CHANGED
@@ -27,24 +27,53 @@ module Kafka
  module Datadog
  STATSD_NAMESPACE = "ruby_kafka"
 
- def self.statsd
- @statsd ||= ::Datadog::Statsd.new(::Datadog::Statsd::DEFAULT_HOST, ::Datadog::Statsd::DEFAULT_PORT, namespace: STATSD_NAMESPACE)
- end
+ class << self
+ def statsd
+ @statsd ||= ::Datadog::Statsd.new(host, port, namespace: namespace, tags: tags)
+ end
 
- def self.host=(host)
- statsd.host = host
- end
+ def host
+ @host ||= ::Datadog::Statsd::DEFAULT_HOST
+ end
 
- def self.port=(port)
- statsd.port = port
- end
+ def host=(host)
+ @host = host
+ clear
+ end
 
- def self.namespace=(namespace)
- statsd.namespace = namespace
- end
+ def port
+ @port ||= ::Datadog::Statsd::DEFAULT_PORT
+ end
+
+ def port=(port)
+ @port = port
+ clear
+ end
+
+ def namespace
+ @namespace ||= STATSD_NAMESPACE
+ end
 
- def self.tags=(tags)
- statsd.tags = tags
+ def namespace=(namespace)
+ @namespace = namespace
+ clear
+ end
+
+ def tags
+ @tags ||= []
+ end
+
+ def tags=(tags)
+ @tags = tags
+ clear
+ end
+
+ private
+
+ def clear
+ @statsd && @statsd.close
+ @statsd = nil
+ end
  end
 
  class StatsdSubscriber < ActiveSupport::Subscriber
@@ -113,6 +142,7 @@ module Kafka
  end
 
  def process_batch(event)
+ lag = event.payload.fetch(:offset_lag)
  messages = event.payload.fetch(:message_count)
 
  tags = {
@@ -128,6 +158,8 @@ module Kafka
  timing("consumer.process_batch.latency", event.duration, tags: tags)
  count("consumer.messages", messages, tags: tags)
  end
+
+ gauge("consumer.lag", lag, tags: tags)
  end
 
  def join_group(event)
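As an illustration of configuring the reporter with the new accessors (the host below is an assumed value):

```ruby
require "kafka/datadog"

# Each writer tears down the cached Datadog::Statsd client, so the next
# metric is emitted with the new settings.
Kafka::Datadog.host = "dogstatsd.example.com"
Kafka::Datadog.port = 8125
Kafka::Datadog.namespace = "ruby_kafka"
Kafka::Datadog.tags = ["environment:production"]
```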
@@ -1,4 +1,7 @@
  module Kafka
+
+ # Manages a consumer's position in partitions, figures out where to resume processing
+ # from, etc.
  class OffsetManager
 
  # The default broker setting for offsets.retention.minutes is 1440.
@@ -21,10 +24,28 @@ module Kafka
  @recommit_interval = (offset_retention_time || DEFAULT_RETENTION_TIME) / 2
  end
 
+ # Set the default offset for a topic.
+ #
+ # When the consumer is started for the first time, or in cases where it gets stuck and
+ # has to reset its position, it must start either with the earliest messages or with
+ # the latest, skipping to the very end of each partition.
+ #
+ # @param topic [String] the name of the topic.
+ # @param default_offset [Symbol] either `:earliest` or `:latest`.
+ # @return [nil]
  def set_default_offset(topic, default_offset)
  @default_offsets[topic] = default_offset
  end
 
+ # Mark a message as having been processed.
+ #
+ # When offsets are committed, the message's offset will be stored in Kafka so
+ # that we can resume from this point at a later time.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @param offset [Integer] the offset of the message that should be marked as processed.
+ # @return [nil]
  def mark_as_processed(topic, partition, offset)
  @uncommitted_offsets += 1
  @processed_offsets[topic] ||= {}
@@ -35,15 +56,35 @@ module Kafka
  @logger.debug "Marking #{topic}/#{partition}:#{offset} as processed"
  end
 
+ # Move the consumer's position in the partition back to the configured default
+ # offset, either the first or latest in the partition.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @return [nil]
  def seek_to_default(topic, partition)
+ # Remove any cached offset, in case things have changed broker-side.
+ clear_resolved_offset(topic)
+
  seek_to(topic, partition, -1)
  end
 
+ # Move the consumer's position in the partition to the specified offset.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @param offset [Integer] the offset that the consumer position should be moved to.
+ # @return [nil]
  def seek_to(topic, partition, offset)
  @processed_offsets[topic] ||= {}
  @processed_offsets[topic][partition] = offset
  end
 
+ # Return the next offset that should be fetched for the specified partition.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @return [Integer] the next offset that should be fetched.
  def next_offset_for(topic, partition)
  offset = @processed_offsets.fetch(topic, {}).fetch(partition) {
  committed_offset_for(topic, partition)
@@ -59,6 +100,16 @@ module Kafka
  end
  end
 
+ # Commit offsets of messages that have been marked as processed.
+ #
+ # If `recommit` is set to true, we will also commit the existing positions
+ # even if no messages have been processed on a partition. This is done
+ # in order to avoid the offset information expiring in cases where messages
+ # are very rare -- it's essentially a keep-alive.
+ #
+ # @param recommit [Boolean] whether to recommit offsets that have already been
+ # committed.
+ # @return [nil]
  def commit_offsets(recommit = false)
  offsets = offsets_to_commit(recommit)
  unless offsets.empty?
@@ -74,6 +125,10 @@ module Kafka
  end
  end
 
+ # Commit offsets if necessary, according to the offset commit policy specified
+ # when initializing the class.
+ #
+ # @return [nil]
  def commit_offsets_if_necessary
  recommit = recommit_timeout_reached?
  if recommit || commit_timeout_reached? || commit_threshold_reached?
@@ -81,6 +136,9 @@ module Kafka
  end
  end
 
+ # Clear all stored offset information.
+ #
+ # @return [nil]
  def clear_offsets
  @processed_offsets.clear
  @resolved_offsets.clear
@@ -89,6 +147,12 @@ module Kafka
  @committed_offsets = nil
  end
 
+ # Clear stored offset information for all partitions except those specified
+ # in `excluded`.
+ #
+ # offset_manager.clear_offsets_excluding("my-topic" => [1, 2, 3])
+ #
+ # @return [nil]
  def clear_offsets_excluding(excluded)
  # Clear all offsets that aren't in `excluded`.
  @processed_offsets.each do |topic, partitions|
@@ -104,6 +168,10 @@ module Kafka
 
  private
 
+ def clear_resolved_offset(topic)
+ @resolved_offsets.delete(topic)
+ end
+
  def resolve_offset(topic, partition)
  @resolved_offsets[topic] ||= fetch_resolved_offsets(topic)
  @resolved_offsets[topic].fetch(partition)
@@ -11,5 +11,15 @@ module Kafka
  @create_time = create_time
  @bytesize = key.to_s.bytesize + value.to_s.bytesize
  end
+
+ def ==(other)
+ @value == other.value &&
+ @key == other.key &&
+ @topic == other.topic &&
+ @partition == other.partition &&
+ @partition_key == other.partition_key &&
+ @create_time == other.create_time &&
+ @bytesize == other.bytesize
+ end
  end
  end
@@ -98,7 +98,7 @@ module Kafka
  timeout: @ack_timeout * 1000, # Kafka expects the timeout in milliseconds.
  )
 
- handle_response(response) if response
+ handle_response(broker, response) if response
  rescue ConnectionError => e
  @logger.error "Could not connect to broker #{broker}: #{e}"
 
@@ -108,7 +108,7 @@ module Kafka
  end
  end
 
- def handle_response(response)
+ def handle_response(broker, response)
  response.each_partition do |topic_info, partition_info|
  topic = topic_info.topic
  partition = partition_info.partition
@@ -138,24 +138,24 @@ module Kafka
  })
  end
  rescue Kafka::CorruptMessage
- @logger.error "Corrupt message when writing to #{topic}/#{partition}"
+ @logger.error "Corrupt message when writing to #{topic}/#{partition} on #{broker}"
  rescue Kafka::UnknownTopicOrPartition
- @logger.error "Unknown topic or partition #{topic}/#{partition}"
+ @logger.error "Unknown topic or partition #{topic}/#{partition} on #{broker}"
  @cluster.mark_as_stale!
  rescue Kafka::LeaderNotAvailable
  @logger.error "Leader currently not available for #{topic}/#{partition}"
  @cluster.mark_as_stale!
  rescue Kafka::NotLeaderForPartition
- @logger.error "Broker not currently leader for #{topic}/#{partition}"
+ @logger.error "Broker #{broker} not currently leader for #{topic}/#{partition}"
  @cluster.mark_as_stale!
  rescue Kafka::RequestTimedOut
- @logger.error "Timed out while writing to #{topic}/#{partition}"
+ @logger.error "Timed out while writing to #{topic}/#{partition} on #{broker}"
  rescue Kafka::NotEnoughReplicas
  @logger.error "Not enough in-sync replicas for #{topic}/#{partition}"
  rescue Kafka::NotEnoughReplicasAfterAppend
  @logger.error "Messages written, but to fewer in-sync replicas than required for #{topic}/#{partition}"
  else
- @logger.debug "Successfully appended #{messages.count} messages to #{topic}/#{partition}"
+ @logger.debug "Successfully appended #{messages.count} messages to #{topic}/#{partition} on #{broker}"
 
  # The messages were successfully written; clear them from the buffer.
  @buffer.clear_messages(topic: topic, partition: partition)
@@ -294,7 +294,11 @@ module Kafka
 
  notification[:attempts] = attempt
 
- @cluster.refresh_metadata_if_necessary!
+ begin
+ @cluster.refresh_metadata_if_necessary!
+ rescue ConnectionError => e
+ raise DeliveryFailed.new(e, buffer_messages)
+ end
 
  assign_partitions!
  operation.execute
@@ -321,13 +325,13 @@ module Kafka
  unless @pending_message_queue.empty?
  # Mark the cluster as stale in order to force a cluster metadata refresh.
  @cluster.mark_as_stale!
- raise DeliveryFailed, "Failed to assign partitions to #{@pending_message_queue.size} messages"
+ raise DeliveryFailed.new("Failed to assign partitions to #{@pending_message_queue.size} messages", buffer_messages)
  end
 
  unless @buffer.empty?
  partitions = @buffer.map {|topic, partition, _| "#{topic}/#{partition}" }.join(", ")
 
- raise DeliveryFailed, "Failed to send messages to #{partitions}"
+ raise DeliveryFailed.new("Failed to send messages to #{partitions}", buffer_messages)
  end
  end
 
@@ -380,6 +384,29 @@ module Kafka
  @pending_message_queue.replace(failed_messages)
  end
 
+ def buffer_messages
+ messages = []
+
+ @pending_message_queue.each do |message|
+ messages << message
+ end
+
+ @buffer.each do |topic, partition, messages_for_partition|
+ messages_for_partition.each do |message|
+ messages << PendingMessage.new(
+ message.value,
+ message.key,
+ topic,
+ partition,
+ nil,
+ message.create_time
+ )
+ end
+ end
+
+ messages
+ end
+
  def buffer_overflow(topic, message)
  @instrumenter.instrument("buffer_overflow.producer", {
  topic: topic,
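As an illustration of how a caller might use the new `failed_messages` accessor (broker address and topic below are assumed values):

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092"])
producer = kafka.producer

begin
  producer.produce("hello world", topic: "events")
  producer.deliver_messages
rescue Kafka::DeliveryFailed => e
  # Each entry is a pending message exposing its value, key, topic, and
  # partition, so the caller can log or retry exactly what was lost.
  e.failed_messages.each do |message|
    puts "could not deliver #{message.value.inspect} to #{message.topic}"
  end
end
```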
@@ -38,7 +38,7 @@ module Kafka
  end
 
  # we can continue, so send OK
- @encoder.write([0,2].pack('l>c'))
+ @encoder.write([0, 2].pack('l>c'))
 
  # read wrapped message and return it back with principal
  handshake_messages
@@ -31,15 +31,15 @@ module Kafka
 
  # first initiate the TCP socket
  begin
- # Initiate the socket connection in the background. If it doesn't fail
- # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
+ # Initiate the socket connection in the background. If it doesn't fail
+ # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
  # indicating the connection is in progress.
  @tcp_socket.connect_nonblock(sockaddr)
  rescue IO::WaitWritable
  # select will block until the socket is writable or the timeout
  # is exceeded, whichever comes first.
  unless select_with_timeout(@tcp_socket, :connect_write)
- # select returns nil when the socket is not ready before timeout
+ # select returns nil when the socket is not ready before timeout
  # seconds have elapsed
  @tcp_socket.close
  raise Errno::ETIMEDOUT
@@ -57,8 +57,8 @@ module Kafka
  @ssl_socket = OpenSSL::SSL::SSLSocket.new(@tcp_socket, ssl_context)
 
  begin
- # Initiate the socket connection in the background. If it doesn't fail
- # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
+ # Initiate the socket connection in the background. If it doesn't fail
+ # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
  # indicating the connection is in progress.
  # Unlike waiting for a tcp socket to connect, you can't time out ssl socket
  # connections during the connect phase properly, because IO.select only partially works.
@@ -130,7 +130,7 @@ module Kafka
  # our write buffer.
  written += @ssl_socket.write_nonblock(bytes)
  rescue Errno::EFAULT => error
- raise error
+ raise error
  rescue OpenSSL::SSL::SSLError, Errno::EAGAIN, Errno::EWOULDBLOCK, IO::WaitWritable => error
  if error.is_a?(OpenSSL::SSL::SSLError) && error.message == 'write would block'
  if select_with_timeout(@ssl_socket, :write)
data/lib/kafka/statsd.rb CHANGED
@@ -29,7 +29,7 @@ module Kafka
  DEFAULT_PORT = 8125
 
  def self.statsd
- @statsd ||= ::Statsd.new(DEFAULT_HOST, DEFAULT_PORT).tap{ |sd| sd.namespace = DEFAULT_NAMESPACE }
+ @statsd ||= ::Statsd.new(DEFAULT_HOST, DEFAULT_PORT).tap { |sd| sd.namespace = DEFAULT_NAMESPACE }
  end
 
  def self.host=(host)
data/lib/kafka/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Kafka
- VERSION = "0.4.2"
+ VERSION = "0.4.3"
  end
data/ruby-kafka.gemspec CHANGED
@@ -1,4 +1,5 @@
  # coding: utf-8
+
  lib = File.expand_path('../lib', __FILE__)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  require 'kafka/version'
@@ -36,9 +37,10 @@ Gem::Specification.new do |spec|
  spec.add_development_dependency "snappy"
  spec.add_development_dependency "colored"
  spec.add_development_dependency "rspec_junit_formatter", "0.2.2"
- spec.add_development_dependency "dogstatsd-ruby", ">= 2.0.0"
+ spec.add_development_dependency "dogstatsd-ruby", ">= 3.0.0"
  spec.add_development_dependency "statsd-ruby"
  spec.add_development_dependency "ruby-prof"
  spec.add_development_dependency "timecop"
+ spec.add_development_dependency "rubocop", "~> 0.49.1"
  spec.add_development_dependency "gssapi", '>=1.2.0'
  end
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'rubocop' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../Gemfile",
+ Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("rubocop", "rubocop")
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'ruby-parse' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../Gemfile",
+ Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("parser", "ruby-parse")
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'ruby-rewrite' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../Gemfile",
+ Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("parser", "ruby-rewrite")
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-kafka
  version: !ruby/object:Gem::Version
- version: 0.4.2
+ version: 0.4.3
  platform: ruby
  authors:
  - Daniel Schierbeck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-09-08 00:00:00.000000000 Z
+ date: 2017-10-20 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -170,14 +170,14 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.0.0
+ version: 3.0.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.0.0
+ version: 3.0.0
  - !ruby/object:Gem::Dependency
  name: statsd-ruby
  requirement: !ruby/object:Gem::Requirement
@@ -220,6 +220,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rubocop
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 0.49.1
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 0.49.1
  - !ruby/object:Gem::Dependency
  name: gssapi
  requirement: !ruby/object:Gem::Requirement
@@ -243,10 +257,10 @@ extra_rdoc_files: []
  files:
  - ".gitignore"
  - ".rspec"
+ - ".rubocop.yml"
  - ".yardopts"
  - CHANGELOG.md
  - Gemfile
- - Gemfile.lock
  - ISSUE_TEMPLATE.md
  - LICENSE.txt
  - Procfile
@@ -341,8 +355,11 @@ files:
  - vendor/bundle/bin/pry
  - vendor/bundle/bin/rake
  - vendor/bundle/bin/rspec
+ - vendor/bundle/bin/rubocop
+ - vendor/bundle/bin/ruby-parse
  - vendor/bundle/bin/ruby-prof
  - vendor/bundle/bin/ruby-prof-check-trace
+ - vendor/bundle/bin/ruby-rewrite
  homepage: https://github.com/zendesk/ruby-kafka
  licenses:
  - Apache License Version 2.0
@@ -363,7 +380,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.5.1
+ rubygems_version: 2.6.11
  signing_key:
  specification_version: 4
  summary: A client library for the Kafka distributed commit log.
data/Gemfile.lock DELETED
@@ -1,92 +0,0 @@
- PATH
- remote: .
- specs:
- ruby-kafka (0.4.1)
-
- GEM
- remote: https://rubygems.org/
- specs:
- activesupport (4.2.5)
- i18n (~> 0.7)
- json (~> 1.7, >= 1.7.7)
- minitest (~> 5.1)
- thread_safe (~> 0.3, >= 0.3.4)
- tzinfo (~> 1.1)
- benchmark-perf (0.1.0)
- builder (3.2.2)
- coderay (1.1.0)
- colored (1.2)
- diff-lcs (1.2.5)
- docker-api (1.32.1)
- excon (>= 0.38.0)
- json
- dogstatsd-ruby (2.1.0)
- dotenv (2.1.0)
- excon (0.54.0)
- ffi (1.9.18)
- gssapi (1.2.0)
- ffi (>= 1.0.1)
- i18n (0.7.0)
- json (1.8.3)
- method_source (0.8.2)
- minitest (5.8.3)
- pry (0.9.12.6)
- coderay (~> 1.0)
- method_source (~> 0.8)
- slop (~> 3.4)
- rake (10.5.0)
- rspec (3.4.0)
- rspec-core (~> 3.4.0)
- rspec-expectations (~> 3.4.0)
- rspec-mocks (~> 3.4.0)
- rspec-benchmark (0.1.0)
- benchmark-perf (~> 0.1.0)
- rspec (>= 3.0.0, < 4.0.0)
- rspec-core (3.4.1)
- rspec-support (~> 3.4.0)
- rspec-expectations (3.4.0)
- diff-lcs (>= 1.2.0, < 2.0)
- rspec-support (~> 3.4.0)
- rspec-mocks (3.4.1)
- diff-lcs (>= 1.2.0, < 2.0)
- rspec-support (~> 3.4.0)
- rspec-support (3.4.1)
- rspec_junit_formatter (0.2.2)
- builder (< 4)
- rspec-core (>= 2, < 4, != 2.12.0)
- ruby-prof (0.15.9)
- slop (3.6.0)
- snappy (0.0.12)
- statsd-ruby (1.4.0)
- thread_safe (0.3.5)
- timecop (0.8.0)
- tzinfo (1.2.2)
- thread_safe (~> 0.1)
-
- PLATFORMS
- ruby
-
- DEPENDENCIES
- activesupport
- bundler (>= 1.9.5)
- colored
- docker-api
- dogstatsd-ruby (>= 2.0.0)
- dotenv
- gssapi (>= 1.2.0)
- pry
- rake (~> 10.0)
- rspec
- rspec-benchmark
- rspec_junit_formatter (= 0.2.2)
- ruby-kafka!
- ruby-prof
- snappy
- statsd-ruby
- timecop
-
- RUBY VERSION
- ruby 2.2.3p173
-
- BUNDLED WITH
- 1.15.3