ruby-kafka 0.4.2 → 0.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5c001c97f73b001e170f23d7fd970781aa6bb196
- data.tar.gz: 39f0b03b56ad97eee788fac63ec467599ad8cd8e
+ metadata.gz: 6c934181fd72d8f78eb9b4438b7297a6a71a51eb
+ data.tar.gz: e3a970abf6839ab03bdd34caeb53dfe9fe91666d
  SHA512:
- metadata.gz: 6f9312f25ab7493d5803b581b22af270399f0f5ff2fbf4012da1f4a30525c430fb1fa21085cb4042aa4b59074a0abe3af9038ee6e97f0b48770baa50eaab6393
- data.tar.gz: 7523013dbc08d7a193faf6b2b2419a6704cb4b7918f9ed993545b534e356084a765a1c8cfba937a55d7236c26985e3cf252cbe50df6c3c98a45e1bc47e32caad
+ metadata.gz: 266b2624a721b56c991797a6848531e00675274a24241db59de6d8649ac70ba41799b40d39efbfd8c90a1b467aa142fe68526533e23933c606fbaac7b9ae780d
+ data.tar.gz: 1d41e52b87201e803a3edf62a56a0086c1f8aed86dc255a9c73eee5b295acc27113e7d82adba4558666142509664610f5531c654049e596c546c89e468f153b5
data/.gitignore CHANGED
@@ -3,6 +3,7 @@
  /_yardoc/
  /coverage/
  /doc/
+ /Gemfile.lock
  /pkg/
  /spec/reports/
  /tmp/
data/.rubocop.yml ADDED
@@ -0,0 +1,44 @@
+ AllCops:
+   DisplayCopNames: true
+   TargetRubyVersion: 2.1
+
+ Lint:
+   Enabled: false
+ Metrics:
+   Enabled: false
+ Performance:
+   Enabled: false
+ Style:
+   Enabled: false
+
+ # Configured cops
+
+ Layout/CaseIndentation:
+   EnforcedStyle: end
+ Layout/FirstParameterIndentation:
+   EnforcedStyle: consistent
+ Layout/IndentArray:
+   EnforcedStyle: consistent
+ Layout/IndentHash:
+   EnforcedStyle: consistent
+ Layout/MultilineMethodCallIndentation:
+   EnforcedStyle: indented
+ Layout/MultilineOperationIndentation:
+   EnforcedStyle: indented
+ Lint/EndAlignment:
+   EnforcedStyleAlignWith: variable
+
+ #
+ # Disabled cops
+ #
+
+ Layout/AlignHash:
+   Enabled: false
+ Layout/AlignParameters:
+   Enabled: false
+ Layout/EmptyLinesAroundClassBody:
+   Enabled: false
+ Layout/EmptyLinesAroundModuleBody:
+   Enabled: false
+ Layout/SpaceInsideBlockBraces:
+   Enabled: false
data/CHANGELOG.md CHANGED
@@ -4,6 +4,20 @@ Changes and additions to the library will be listed here.

  ## Unreleased

+ ## v0.4.3
+
+ - Restart the async producer thread automatically after errors.
+ - Include the offset lag in batch consumer metrics (Statsd).
+ - Make the default `max_wait_time` more sane.
+ - Fix issue with cached default offset lookups (#431).
+ - Upgrade to Datadog client version 3.
+
+ ## v0.4.2
+
+ - Fix connection issue on SASL connections (#401).
+ - Add more instrumentation of consumer groups (#407).
+ - Improve error logging (#385)
+
  ## v0.4.1

  - Allow seeking the consumer position (#386).
data/Gemfile CHANGED
@@ -1,4 +1,3 @@
  source "https://rubygems.org"
- ruby "2.2.3"

  gemspec
data/README.md CHANGED
@@ -31,8 +31,9 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
  5. [Logging](#logging)
  6. [Instrumentation](#instrumentation)
  7. [Monitoring](#monitoring)
-   1. [Reporting Metrics to Statsd](#reporting-metrics-to-statsd)
-   2. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
+   1. [What to Monitor](#what-to-monitor)
+   2. [Reporting Metrics to Statsd](#reporting-metrics-to-statsd)
+   3. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
  8. [Understanding Timeouts](#understanding-timeouts)
  9. [Security](#security)
    1. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
@@ -654,14 +655,15 @@ In order to optimize for low latency, you want to process a message as soon as p
  There are three values that can be tuned in order to balance these two concerns.

  * `min_bytes` is the minimum number of bytes to return from a single message fetch. By setting this to a high value you can increase the processing throughput. The default value is one byte.
- * `max_wait_time` is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides `min_bytes`, so you'll _always_ get data back within the time specified. The default value is five seconds. If you want to have at most one second of latency, set `max_wait_time` to 1.
+ * `max_wait_time` is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides `min_bytes`, so you'll _always_ get data back within the time specified. The default value is one second. If you want to have at most five seconds of latency, set `max_wait_time` to 5. You should make sure `max_wait_time` * num brokers + `heartbeat_interval` is less than `session_timeout`.
  * `max_bytes_per_partition` is the maximum amount of data a broker will return for a single partition when fetching new messages. The default is 1MB, but increasing this number may lead to better throughtput since you'll need to fetch less frequently. Setting it to a lower value is not recommended unless you have so many partitions that it's causing network and latency issues to transfer a fetch response from a broker to a client. Setting the number too high may result in instability, so be careful.

  The first two settings can be passed to either `#each_message` or `#each_batch`, e.g.

  ```ruby
- # Waits for data for up to 30 seconds, preferring to fetch at least 5KB at a time.
- consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 30) do |message|
+ # Waits for data for up to 5 seconds on each broker, preferring to fetch at least 5KB at a time.
+ # This can wait up to num brokers * 5 seconds.
+ consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 5) do |message|
  # ...
  end
  ```
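As a rough illustration of the `max_wait_time` guideline above (an editorial sketch, not part of the diff; the broker count, group id, and timeout values are only examples): with three brokers and a `max_wait_time` of 5 seconds plus a `heartbeat_interval` of 10 seconds, the consumer's `session_timeout` should exceed 3 * 5 + 10 = 25 seconds.

```ruby
# Illustrative only: 3 brokers * 5s max_wait_time + 10s heartbeat_interval = 25s,
# so pick a session_timeout comfortably above that.
consumer = kafka.consumer(
  group_id: "my-group",
  session_timeout: 30,    # seconds
  heartbeat_interval: 10  # seconds
)

consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 5) do |message|
  # ...
end
```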
@@ -729,6 +731,33 @@ end
  * `message_count` is the number of messages for which delivery was attempted.
  * `delivered_message_count` is the number of messages that were acknowledged by the brokers - if this number is smaller than `message_count` not all messages were successfully delivered.

+ #### Consumer Notifications
+
+ * `process_message.consumer.kafka` is sent whenever a message is processed by a consumer. It includes the following payload:
+   * `value` is the message value.
+   * `key` is the message key.
+   * `topic` is the topic that the message was consumed from.
+   * `partition` is the topic partition that the message was consumed from.
+   * `offset` is the message's offset within the topic partition.
+   * `offset_lag` is the number of messages within the topic partition that have not yet been consumed.
+
+ * `process_batch.consumer.kafka` is sent whenever a message batch is processed by a consumer. It includes the following payload:
+   * `message_count` is the number of messages in the batch.
+   * `topic` is the topic that the message batch was consumed from.
+   * `partition` is the topic partition that the message batch was consumed from.
+   * `highwater_mark_offset` is the message batch's highest offset within the topic partition.
+   * `offset_lag` is the number of messages within the topic partition that have not yet been consumed.
+
+ * `join_group.consumer.kafka` is sent whenever a consumer joins a consumer group. It includes the following payload:
+   * `group_id` is the consumer group id.
+
+ * `sync_group.consumer.kafka` is sent whenever a consumer is assigned topic partitions within a consumer group. It includes the following payload:
+   * `group_id` is the consumer group id.
+
+ * `leave_group.consumer.kafka` is sent whenever a consumer leaves a consumer group. It includes the following payload:
+   * `group_id` is the consumer group id.
+
+
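A sketch of how the consumer notifications listed above can be consumed (an editorial addition, not part of the diff; ruby-kafka emits these events through `ActiveSupport::Notifications`, as described in the Instrumentation section):

```ruby
require "active_support/notifications"
require "logger"

logger = Logger.new($stdout)

# Log every processed message along with its offset lag.
ActiveSupport::Notifications.subscribe("process_message.consumer.kafka") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  payload = event.payload

  logger.info(
    "Processed #{payload.fetch(:topic)}/#{payload.fetch(:partition)} " \
    "at offset #{payload.fetch(:offset)} " \
    "(lag: #{payload.fetch(:offset_lag)}, #{event.duration.round(1)}ms)"
  )
end
```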
  #### Connection Notifications

  * `request.connection.kafka` is sent whenever a network request is sent to a Kafka broker. It includes the following payload:
@@ -749,6 +778,29 @@ It is highly recommended that you monitor your Kafka client applications in prod
  You can quite easily build monitoring on top of the provided [instrumentation hooks](#instrumentation). In order to further help with monitoring, a prebuilt [Statsd](https://github.com/etsy/statsd) and [Datadog](https://www.datadoghq.com/) reporter is included with ruby-kafka.


+ #### What to Monitor
+
+ We recommend monitoring the following:
+
+ * Low-level Kafka API calls:
+   * The rate of API call errors to the total number of calls by both API and broker.
+   * The API call throughput by both API and broker.
+   * The API call latency by both API and broker.
+ * Producer-level metrics:
+   * Delivery throughput by topic.
+   * The latency of deliveries.
+   * The producer buffer fill ratios.
+   * The async producer queue sizes.
+   * Message delivery delays.
+   * Failed delivery attempts.
+ * Consumer-level metrics:
+   * Message processing throughput by topic.
+   * Processing latency by topic.
+   * Processing errors by topic.
+   * Consumer lag (how many messages are yet to be processed) by topic/partition.
+   * Group join/sync/leave by client host.
+
+
  #### Reporting Metrics to Statsd

  The Statsd reporter is automatically enabled when the `kafka/statsd` library is required. You can optionally change the configuration.
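For example (an editorial sketch, not part of the diff; the host and port values are placeholders, and this assumes the reporter exposes `host=`/`port=` setters like the ones visible in `lib/kafka/statsd.rb` further down):

```ruby
require "kafka/statsd"

# Point the reporter at your Statsd agent instead of the defaults.
Kafka::Statsd.host = "statsd.example.com"
Kafka::Statsd.port = 8125
```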
@@ -938,6 +990,8 @@ Currently, there are three actively developed frameworks based on ruby-kafka, th

  * [Racecar](https://github.com/zendesk/racecar) - A simple framework that integrates with Ruby on Rails to provide a seamless way to write, test, configure, and run Kafka consumers. It comes with sensible defaults and conventions.

+ * [DeliveryBoy](https://github.com/zendesk/delivery_boy) – A library that integrates with Ruby on Rails, making it easy to publish Kafka messages from any Rails application.
+
  * [Karafka](https://github.com/karafka/karafka) - Framework used to simplify Apache Kafka based Ruby and Rails applications development. Karafka provides higher abstraction layers, including Capistrano, Docker and Heroku support.

  * [Phobos](https://github.com/klarna/phobos) - Micro framework and library for applications dealing with Apache Kafka. It wraps common behaviors needed by consumers and producers in an easy and convenient API.
data/ci/init.rb CHANGED
@@ -4,7 +4,7 @@ require "kafka"

  logger = Logger.new(STDOUT)
  logger.level = Logger::INFO
- logger.formatter = -> (_, _, _, msg) { msg }
+ logger.formatter = ->(_, _, _, msg) { msg }

  STDOUT.sync = true

data/circle.yml CHANGED
@@ -5,17 +5,19 @@ machine:
  - docker
  environment:
  LOG_LEVEL: DEBUG
+ ruby:
+ version: 2.4.1

  dependencies:
  pre:
  - docker -v
  - docker pull ches/kafka:0.9.0.1
  - docker pull jplock/zookeeper:3.4.6
- - gem install bundler -v 1.9.5

  test:
  override:
  - bundle exec rspec -r rspec_junit_formatter --format RspecJunitFormatter -o $CIRCLE_TEST_REPORTS/rspec/unit.xml
  - bundle exec rspec -r rspec_junit_formatter --format RspecJunitFormatter -o $CIRCLE_TEST_REPORTS/rspec/functional.xml --tag functional
  post:
+ - bundle exec rubocop
  - cp *.log $CIRCLE_ARTIFACTS/ || true
data/lib/kafka.rb CHANGED
@@ -203,6 +203,13 @@ module Kafka

  # Raised if not all messages could be sent by a producer.
  class DeliveryFailed < Error
+ attr_reader :failed_messages
+
+ def initialize(message, failed_messages)
+ @failed_messages = failed_messages
+
+ super(message)
+ end
  end

  class HeartbeatError < Error
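A sketch of what the new `failed_messages` accessor enables for callers (an editorial addition, not part of the diff; `producer` and `logger` are placeholders):

```ruby
begin
  producer.deliver_messages
rescue Kafka::DeliveryFailed => e
  # Each entry is a Kafka::PendingMessage with value, key, topic, etc.,
  # so the caller can log or re-enqueue exactly what was not delivered.
  e.failed_messages.each do |message|
    logger.warn "Delivery failed for #{message.topic}: #{message.value.inspect}"
  end
end
```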
@@ -142,16 +142,10 @@ module Kafka

  def ensure_threads_running!
  @worker_thread = nil unless @worker_thread && @worker_thread.alive?
- @worker_thread ||= start_thread { @worker.run }
+ @worker_thread ||= Thread.new { @worker.run }

  @timer_thread = nil unless @timer_thread && @timer_thread.alive?
- @timer_thread ||= start_thread { @timer.run }
- end
-
- def start_thread(&block)
- thread = Thread.new(&block)
- thread.abort_on_exception = true
- thread
+ @timer_thread ||= Thread.new { @timer.run }
  end

  def buffer_overflow(topic)
@@ -191,6 +185,8 @@ module Kafka
  end

  def run
+ @logger.info "Starting async producer in the background..."
+
  loop do
  operation, payload = @queue.pop

@@ -218,6 +214,15 @@ module Kafka
  raise "Unknown operation #{operation.inspect}"
  end
  end
+ rescue Kafka::Error => e
+ @logger.error "Unexpected Kafka error #{e.class}: #{e.message}\n#{e.backtrace.join("\n")}"
+ @logger.info "Restarting in 10 seconds..."
+
+ sleep 10
+ retry
+ rescue Exception => e
+ @logger.error "Unexpected Kafka error #{e.class}: #{e.message}\n#{e.backtrace.join("\n")}"
+ @logger.error "Async producer crashed!"
  ensure
  @producer.shutdown
  end
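For context on the retry and crash handling above, a sketch of how the async producer whose worker thread this code runs is typically used (an editorial addition, not part of the diff; the parameters are illustrative):

```ruby
# Flush buffered messages every 10 seconds or whenever 100 messages are queued.
producer = kafka.async_producer(
  delivery_interval: 10,
  delivery_threshold: 100
)

producer.produce("hello", topic: "greetings")

# On shutdown, deliver anything still buffered and stop the background threads.
producer.shutdown
```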
data/lib/kafka/client.rb CHANGED
@@ -14,12 +14,13 @@ require "kafka/sasl_authenticator"

  module Kafka
  class Client
+ URI_SCHEMES = ["kafka", "kafka+ssl"]

  # Initializes a new Kafka client.
  #
  # @param seed_brokers [Array<String>, String] the list of brokers used to initialize
  # the client. Either an Array of connections, or a comma separated string of connections.
- # Connections can either be a string of "port:protocol" or a full URI with a scheme.
+ # A connection can either be a string of "host:port" or a full URI with a scheme.
  # If there's a scheme it's ignored and only host/port are used.
  #
  # @param client_id [String] the identifier for this application.
@@ -142,7 +143,7 @@ module Kafka
  operation.execute

  unless buffer.empty?
- raise DeliveryFailed
+ raise DeliveryFailed.new(nil, [message])
  end
  end

@@ -435,7 +436,6 @@ module Kafka
  @cluster.resolve_offset(topic, partition, :latest) - 1
  end

-
  # Retrieve the offset of the last message in each partition of the specified topics.
  #
  # @param topics [Array<String>] topic names.
@@ -516,6 +516,11 @@ module Kafka
  connection = "kafka://" + connection unless connection =~ /:\/\//
  uri = URI.parse(connection)
  uri.port ||= 9092 # Default Kafka port.
+
+ unless URI_SCHEMES.include?(uri.scheme)
+ raise Kafka::Error, "invalid protocol `#{uri.scheme}` in `#{connection}`"
+ end
+
  uri
  end
  end
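A sketch of what the new scheme check accepts and rejects (an editorial addition, not part of the diff; broker hostnames are placeholders):

```ruby
# Both the kafka and kafka+ssl schemes are accepted, as is a bare "host:port"
# string, which is normalized to kafka:// before parsing.
Kafka.new(seed_brokers: ["kafka://broker1:9092", "kafka+ssl://broker2:9093", "broker3:9092"])

# Any other scheme now fails fast:
Kafka.new(seed_brokers: ["http://broker1:9092"])
# => Kafka::Error: invalid protocol `http` in `http://broker1:9092`
```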
@@ -165,10 +165,10 @@ module Kafka
  # at the last committed offsets.
  #
  # @param min_bytes [Integer] the minimum number of bytes to read before
- # returning messages from the server; if `max_wait_time` is reached, this
+ # returning messages from each broker; if `max_wait_time` is reached, this
  # is ignored.
  # @param max_wait_time [Integer, Float] the maximum duration of time to wait before
- # returning messages from the server, in seconds.
+ # returning messages from each broker, in seconds.
  # @param automatically_mark_as_processed [Boolean] whether to automatically
  # mark a message as successfully processed when the block returns
  # without an exception. Once marked successful, the offsets of processed
@@ -178,7 +178,7 @@ module Kafka
  # The original exception will be returned by calling `#cause` on the
  # {Kafka::ProcessingError} instance.
  # @return [nil]
- def each_message(min_bytes: 1, max_wait_time: 5, automatically_mark_as_processed: true)
+ def each_message(min_bytes: 1, max_wait_time: 1, automatically_mark_as_processed: true)
  consumer_loop do
  batches = fetch_batches(min_bytes: min_bytes, max_wait_time: max_wait_time)

@@ -232,17 +232,17 @@ module Kafka
  # at the last committed offsets.
  #
  # @param min_bytes [Integer] the minimum number of bytes to read before
- # returning messages from the server; if `max_wait_time` is reached, this
+ # returning messages from each broker; if `max_wait_time` is reached, this
  # is ignored.
  # @param max_wait_time [Integer, Float] the maximum duration of time to wait before
- # returning messages from the server, in seconds.
+ # returning messages from each broker, in seconds.
  # @param automatically_mark_as_processed [Boolean] whether to automatically
  # mark a batch's messages as successfully processed when the block returns
  # without an exception. Once marked successful, the offsets of processed
  # messages can be committed to Kafka.
  # @yieldparam batch [Kafka::FetchedBatch] a message batch fetched from Kafka.
  # @return [nil]
- def each_batch(min_bytes: 1, max_wait_time: 5, automatically_mark_as_processed: true)
+ def each_batch(min_bytes: 1, max_wait_time: 1, automatically_mark_as_processed: true)
  consumer_loop do
  batches = fetch_batches(min_bytes: min_bytes, max_wait_time: max_wait_time)

@@ -260,7 +260,7 @@ module Kafka
  begin
  yield batch
  rescue => e
- offset_range = (batch.first_offset .. batch.last_offset)
+ offset_range = (batch.first_offset..batch.last_offset)
  location = "#{batch.topic}/#{batch.partition} in offset range #{offset_range}"
  backtrace = e.backtrace.join("\n")

@@ -289,9 +289,9 @@ module Kafka
  # you will want to do this in every consumer group member in order to make sure
  # that the member that's assigned the partition knows where to start.
  #
- # @param topic [String]
- # @param partition [Integer]
- # @param offset [Integer]
+ # @param topic [String]
+ # @param partition [Integer]
+ # @param offset [Integer]
  # @return [nil]
  def seek(topic, partition, offset)
  @offset_manager.seek_to(topic, partition, offset)
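A sketch of the `seek` API documented above (an editorial addition, not part of the diff; topic, partition, and offset values are placeholders):

```ruby
consumer = kafka.consumer(group_id: "my-group")
consumer.subscribe("greetings")

# Start processing partition 0 from offset 42 instead of the last committed offset.
consumer.seek("greetings", 0, 42)

consumer.each_message do |message|
  puts "#{message.offset}: #{message.value}"
end
```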
data/lib/kafka/datadog.rb CHANGED
@@ -27,24 +27,53 @@ module Kafka
  module Datadog
  STATSD_NAMESPACE = "ruby_kafka"

- def self.statsd
- @statsd ||= ::Datadog::Statsd.new(::Datadog::Statsd::DEFAULT_HOST, ::Datadog::Statsd::DEFAULT_PORT, namespace: STATSD_NAMESPACE)
- end
+ class << self
+ def statsd
+ @statsd ||= ::Datadog::Statsd.new(host, port, namespace: namespace, tags: tags)
+ end

- def self.host=(host)
- statsd.host = host
- end
+ def host
+ @host ||= ::Datadog::Statsd::DEFAULT_HOST
+ end

- def self.port=(port)
- statsd.port = port
- end
+ def host=(host)
+ @host = host
+ clear
+ end

- def self.namespace=(namespace)
- statsd.namespace = namespace
- end
+ def port
+ @port ||= ::Datadog::Statsd::DEFAULT_PORT
+ end
+
+ def port=(port)
+ @port = port
+ clear
+ end
+
+ def namespace
+ @namespace ||= STATSD_NAMESPACE
+ end

- def self.tags=(tags)
- statsd.tags = tags
+ def namespace=(namespace)
+ @namespace = namespace
+ clear
+ end
+
+ def tags
+ @tags ||= []
+ end
+
+ def tags=(tags)
+ @tags = tags
+ clear
+ end
+
+ private
+
+ def clear
+ @statsd && @statsd.close
+ @statsd = nil
+ end
  end

  class StatsdSubscriber < ActiveSupport::Subscriber
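A sketch of configuring the reporter through the new accessors (an editorial addition, not part of the diff; the host, port, and tag values are placeholders):

```ruby
require "kafka/datadog"

Kafka::Datadog.host = "dogstatsd.example.com"
Kafka::Datadog.port = 8125
Kafka::Datadog.namespace = "ruby_kafka"
Kafka::Datadog.tags = ["env:production", "app:my-app"]

# The Datadog::Statsd client is built lazily with whatever was set above;
# changing a setting closes the old client and rebuilds it on next use.
Kafka::Datadog.statsd
```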
@@ -113,6 +142,7 @@ module Kafka
  end

  def process_batch(event)
+ lag = event.payload.fetch(:offset_lag)
  messages = event.payload.fetch(:message_count)

  tags = {
@@ -128,6 +158,8 @@ module Kafka
  timing("consumer.process_batch.latency", event.duration, tags: tags)
  count("consumer.messages", messages, tags: tags)
  end
+
+ gauge("consumer.lag", lag, tags: tags)
  end

  def join_group(event)
@@ -1,4 +1,7 @@
  module Kafka
+
+ # Manages a consumer's position in partitions, figures out where to resume processing
+ # from, etc.
  class OffsetManager

  # The default broker setting for offsets.retention.minutes is 1440.
@@ -21,10 +24,28 @@ module Kafka
  @recommit_interval = (offset_retention_time || DEFAULT_RETENTION_TIME) / 2
  end

+ # Set the default offset for a topic.
+ #
+ # When the consumer is started for the first time, or in cases where it gets stuck and
+ # has to reset its position, it must start either with the earliest messages or with
+ # the latest, skipping to the very end of each partition.
+ #
+ # @param topic [String] the name of the topic.
+ # @param default_offset [Symbol] either `:earliest` or `:latest`.
+ # @return [nil]
  def set_default_offset(topic, default_offset)
  @default_offsets[topic] = default_offset
  end

+ # Mark a message as having been processed.
+ #
+ # When offsets are committed, the message's offset will be stored in Kafka so
+ # that we can resume from this point at a later time.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @param offset [Integer] the offset of the message that should be marked as processed.
+ # @return [nil]
  def mark_as_processed(topic, partition, offset)
  @uncommitted_offsets += 1
  @processed_offsets[topic] ||= {}
@@ -35,15 +56,35 @@ module Kafka
  @logger.debug "Marking #{topic}/#{partition}:#{offset} as processed"
  end

+ # Move the consumer's position in the partition back to the configured default
+ # offset, either the first or latest in the partition.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @return [nil]
  def seek_to_default(topic, partition)
+ # Remove any cached offset, in case things have changed broker-side.
+ clear_resolved_offset(topic)
+
  seek_to(topic, partition, -1)
  end

+ # Move the consumer's position in the partition to the specified offset.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @param offset [Integer] the offset that the consumer position should be moved to.
+ # @return [nil]
  def seek_to(topic, partition, offset)
  @processed_offsets[topic] ||= {}
  @processed_offsets[topic][partition] = offset
  end

+ # Return the next offset that should be fetched for the specified partition.
+ #
+ # @param topic [String] the name of the topic.
+ # @param partition [Integer] the partition number.
+ # @return [Integer] the next offset that should be fetched.
  def next_offset_for(topic, partition)
  offset = @processed_offsets.fetch(topic, {}).fetch(partition) {
  committed_offset_for(topic, partition)
@@ -59,6 +100,16 @@ module Kafka
  end
  end

+ # Commit offsets of messages that have been marked as processed.
+ #
+ # If `recommit` is set to true, we will also commit the existing positions
+ # even if no messages have been processed on a partition. This is done
+ # in order to avoid the offset information expiring in cases where messages
+ # are very rare -- it's essentially a keep-alive.
+ #
+ # @param recommit [Boolean] whether to recommit offsets that have already been
+ # committed.
+ # @return [nil]
  def commit_offsets(recommit = false)
  offsets = offsets_to_commit(recommit)
  unless offsets.empty?
@@ -74,6 +125,10 @@ module Kafka
  end
  end

+ # Commit offsets if necessary, according to the offset commit policy specified
+ # when initializing the class.
+ #
+ # @return [nil]
  def commit_offsets_if_necessary
  recommit = recommit_timeout_reached?
  if recommit || commit_timeout_reached? || commit_threshold_reached?
@@ -81,6 +136,9 @@ module Kafka
  end
  end

+ # Clear all stored offset information.
+ #
+ # @return [nil]
  def clear_offsets
  @processed_offsets.clear
  @resolved_offsets.clear
@@ -89,6 +147,12 @@ module Kafka
  @committed_offsets = nil
  end

+ # Clear stored offset information for all partitions except those specified
+ # in `excluded`.
+ #
+ # offset_manager.clear_offsets_excluding("my-topic" => [1, 2, 3])
+ #
+ # @return [nil]
  def clear_offsets_excluding(excluded)
  # Clear all offsets that aren't in `excluded`.
  @processed_offsets.each do |topic, partitions|
@@ -104,6 +168,10 @@ module Kafka

  private

+ def clear_resolved_offset(topic)
+ @resolved_offsets.delete(topic)
+ end
+
  def resolve_offset(topic, partition)
  @resolved_offsets[topic] ||= fetch_resolved_offsets(topic)
  @resolved_offsets[topic].fetch(partition)
@@ -11,5 +11,15 @@ module Kafka
  @create_time = create_time
  @bytesize = key.to_s.bytesize + value.to_s.bytesize
  end
+
+ def ==(other)
+ @value == other.value &&
+ @key == other.key &&
+ @topic == other.topic &&
+ @partition == other.partition &&
+ @partition_key == other.partition_key &&
+ @create_time == other.create_time &&
+ @bytesize == other.bytesize
+ end
  end
  end
@@ -98,7 +98,7 @@ module Kafka
  timeout: @ack_timeout * 1000, # Kafka expects the timeout in milliseconds.
  )

- handle_response(response) if response
+ handle_response(broker, response) if response
  rescue ConnectionError => e
  @logger.error "Could not connect to broker #{broker}: #{e}"

@@ -108,7 +108,7 @@ module Kafka
  end
  end

- def handle_response(response)
+ def handle_response(broker, response)
  response.each_partition do |topic_info, partition_info|
  topic = topic_info.topic
  partition = partition_info.partition
@@ -138,24 +138,24 @@ module Kafka
  })
  end
  rescue Kafka::CorruptMessage
- @logger.error "Corrupt message when writing to #{topic}/#{partition}"
+ @logger.error "Corrupt message when writing to #{topic}/#{partition} on #{broker}"
  rescue Kafka::UnknownTopicOrPartition
- @logger.error "Unknown topic or partition #{topic}/#{partition}"
+ @logger.error "Unknown topic or partition #{topic}/#{partition} on #{broker}"
  @cluster.mark_as_stale!
  rescue Kafka::LeaderNotAvailable
  @logger.error "Leader currently not available for #{topic}/#{partition}"
  @cluster.mark_as_stale!
  rescue Kafka::NotLeaderForPartition
- @logger.error "Broker not currently leader for #{topic}/#{partition}"
+ @logger.error "Broker #{broker} not currently leader for #{topic}/#{partition}"
  @cluster.mark_as_stale!
  rescue Kafka::RequestTimedOut
- @logger.error "Timed out while writing to #{topic}/#{partition}"
+ @logger.error "Timed out while writing to #{topic}/#{partition} on #{broker}"
  rescue Kafka::NotEnoughReplicas
  @logger.error "Not enough in-sync replicas for #{topic}/#{partition}"
  rescue Kafka::NotEnoughReplicasAfterAppend
  @logger.error "Messages written, but to fewer in-sync replicas than required for #{topic}/#{partition}"
  else
- @logger.debug "Successfully appended #{messages.count} messages to #{topic}/#{partition}"
+ @logger.debug "Successfully appended #{messages.count} messages to #{topic}/#{partition} on #{broker}"

  # The messages were successfully written; clear them from the buffer.
  @buffer.clear_messages(topic: topic, partition: partition)
@@ -294,7 +294,11 @@ module Kafka

  notification[:attempts] = attempt

- @cluster.refresh_metadata_if_necessary!
+ begin
+ @cluster.refresh_metadata_if_necessary!
+ rescue ConnectionError => e
+ raise DeliveryFailed.new(e, buffer_messages)
+ end

  assign_partitions!
  operation.execute
@@ -321,13 +325,13 @@ module Kafka
  unless @pending_message_queue.empty?
  # Mark the cluster as stale in order to force a cluster metadata refresh.
  @cluster.mark_as_stale!
- raise DeliveryFailed, "Failed to assign partitions to #{@pending_message_queue.size} messages"
+ raise DeliveryFailed.new("Failed to assign partitions to #{@pending_message_queue.size} messages", buffer_messages)
  end

  unless @buffer.empty?
  partitions = @buffer.map {|topic, partition, _| "#{topic}/#{partition}" }.join(", ")

- raise DeliveryFailed, "Failed to send messages to #{partitions}"
+ raise DeliveryFailed.new("Failed to send messages to #{partitions}", buffer_messages)
  end
  end

@@ -380,6 +384,29 @@ module Kafka
  @pending_message_queue.replace(failed_messages)
  end

+ def buffer_messages
+ messages = []
+
+ @pending_message_queue.each do |message|
+ messages << message
+ end
+
+ @buffer.each do |topic, partition, messages_for_partition|
+ messages_for_partition.each do |message|
+ messages << PendingMessage.new(
+ message.value,
+ message.key,
+ topic,
+ partition,
+ nil,
+ message.create_time
+ )
+ end
+ end
+
+ messages
+ end
+
  def buffer_overflow(topic, message)
  @instrumenter.instrument("buffer_overflow.producer", {
  topic: topic,
@@ -38,7 +38,7 @@ module Kafka
  end

  # we can continue, so send OK
- @encoder.write([0,2].pack('l>c'))
+ @encoder.write([0, 2].pack('l>c'))

  # read wrapped message and return it back with principal
  handshake_messages
@@ -31,15 +31,15 @@ module Kafka

  # first initiate the TCP socket
  begin
- # Initiate the socket connection in the background. If it doesn't fail
- # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
+ # Initiate the socket connection in the background. If it doesn't fail
+ # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
  # indicating the connection is in progress.
  @tcp_socket.connect_nonblock(sockaddr)
  rescue IO::WaitWritable
  # select will block until the socket is writable or the timeout
  # is exceeded, whichever comes first.
  unless select_with_timeout(@tcp_socket, :connect_write)
- # select returns nil when the socket is not ready before timeout
+ # select returns nil when the socket is not ready before timeout
  # seconds have elapsed
  @tcp_socket.close
  raise Errno::ETIMEDOUT
@@ -57,8 +57,8 @@ module Kafka
  @ssl_socket = OpenSSL::SSL::SSLSocket.new(@tcp_socket, ssl_context)

  begin
- # Initiate the socket connection in the background. If it doesn't fail
- # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
+ # Initiate the socket connection in the background. If it doesn't fail
+ # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
  # indicating the connection is in progress.
  # Unlike waiting for a tcp socket to connect, you can't time out ssl socket
  # connections during the connect phase properly, because IO.select only partially works.
@@ -130,7 +130,7 @@ module Kafka
  # our write buffer.
  written += @ssl_socket.write_nonblock(bytes)
  rescue Errno::EFAULT => error
- raise error
+ raise error
  rescue OpenSSL::SSL::SSLError, Errno::EAGAIN, Errno::EWOULDBLOCK, IO::WaitWritable => error
  if error.is_a?(OpenSSL::SSL::SSLError) && error.message == 'write would block'
  if select_with_timeout(@ssl_socket, :write)
data/lib/kafka/statsd.rb CHANGED
@@ -29,7 +29,7 @@ module Kafka
  DEFAULT_PORT = 8125

  def self.statsd
- @statsd ||= ::Statsd.new(DEFAULT_HOST, DEFAULT_PORT).tap{ |sd| sd.namespace = DEFAULT_NAMESPACE }
+ @statsd ||= ::Statsd.new(DEFAULT_HOST, DEFAULT_PORT).tap { |sd| sd.namespace = DEFAULT_NAMESPACE }
  end

  def self.host=(host)
data/lib/kafka/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Kafka
- VERSION = "0.4.2"
+ VERSION = "0.4.3"
  end
data/ruby-kafka.gemspec CHANGED
@@ -1,4 +1,5 @@
  # coding: utf-8
+
  lib = File.expand_path('../lib', __FILE__)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  require 'kafka/version'
@@ -36,9 +37,10 @@ Gem::Specification.new do |spec|
  spec.add_development_dependency "snappy"
  spec.add_development_dependency "colored"
  spec.add_development_dependency "rspec_junit_formatter", "0.2.2"
- spec.add_development_dependency "dogstatsd-ruby", ">= 2.0.0"
+ spec.add_development_dependency "dogstatsd-ruby", ">= 3.0.0"
  spec.add_development_dependency "statsd-ruby"
  spec.add_development_dependency "ruby-prof"
  spec.add_development_dependency "timecop"
+ spec.add_development_dependency "rubocop", "~> 0.49.1"
  spec.add_development_dependency "gssapi", '>=1.2.0'
  end
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'rubocop' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../Gemfile",
+ Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("rubocop", "rubocop")
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'ruby-parse' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../Gemfile",
+ Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("parser", "ruby-parse")
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'ruby-rewrite' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../../../Gemfile",
+ Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("parser", "ruby-rewrite")
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-kafka
  version: !ruby/object:Gem::Version
- version: 0.4.2
+ version: 0.4.3
  platform: ruby
  authors:
  - Daniel Schierbeck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-09-08 00:00:00.000000000 Z
+ date: 2017-10-20 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -170,14 +170,14 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.0.0
+ version: 3.0.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.0.0
+ version: 3.0.0
  - !ruby/object:Gem::Dependency
  name: statsd-ruby
  requirement: !ruby/object:Gem::Requirement
@@ -220,6 +220,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rubocop
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 0.49.1
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 0.49.1
  - !ruby/object:Gem::Dependency
  name: gssapi
  requirement: !ruby/object:Gem::Requirement
@@ -243,10 +257,10 @@ extra_rdoc_files: []
  files:
  - ".gitignore"
  - ".rspec"
+ - ".rubocop.yml"
  - ".yardopts"
  - CHANGELOG.md
  - Gemfile
- - Gemfile.lock
  - ISSUE_TEMPLATE.md
  - LICENSE.txt
  - Procfile
@@ -341,8 +355,11 @@ files:
  - vendor/bundle/bin/pry
  - vendor/bundle/bin/rake
  - vendor/bundle/bin/rspec
+ - vendor/bundle/bin/rubocop
+ - vendor/bundle/bin/ruby-parse
  - vendor/bundle/bin/ruby-prof
  - vendor/bundle/bin/ruby-prof-check-trace
+ - vendor/bundle/bin/ruby-rewrite
  homepage: https://github.com/zendesk/ruby-kafka
  licenses:
  - Apache License Version 2.0
@@ -363,7 +380,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.5.1
+ rubygems_version: 2.6.11
  signing_key:
  specification_version: 4
  summary: A client library for the Kafka distributed commit log.
data/Gemfile.lock DELETED
@@ -1,92 +0,0 @@
- PATH
- remote: .
- specs:
- ruby-kafka (0.4.1)
-
- GEM
- remote: https://rubygems.org/
- specs:
- activesupport (4.2.5)
- i18n (~> 0.7)
- json (~> 1.7, >= 1.7.7)
- minitest (~> 5.1)
- thread_safe (~> 0.3, >= 0.3.4)
- tzinfo (~> 1.1)
- benchmark-perf (0.1.0)
- builder (3.2.2)
- coderay (1.1.0)
- colored (1.2)
- diff-lcs (1.2.5)
- docker-api (1.32.1)
- excon (>= 0.38.0)
- json
- dogstatsd-ruby (2.1.0)
- dotenv (2.1.0)
- excon (0.54.0)
- ffi (1.9.18)
- gssapi (1.2.0)
- ffi (>= 1.0.1)
- i18n (0.7.0)
- json (1.8.3)
- method_source (0.8.2)
- minitest (5.8.3)
- pry (0.9.12.6)
- coderay (~> 1.0)
- method_source (~> 0.8)
- slop (~> 3.4)
- rake (10.5.0)
- rspec (3.4.0)
- rspec-core (~> 3.4.0)
- rspec-expectations (~> 3.4.0)
- rspec-mocks (~> 3.4.0)
- rspec-benchmark (0.1.0)
- benchmark-perf (~> 0.1.0)
- rspec (>= 3.0.0, < 4.0.0)
- rspec-core (3.4.1)
- rspec-support (~> 3.4.0)
- rspec-expectations (3.4.0)
- diff-lcs (>= 1.2.0, < 2.0)
- rspec-support (~> 3.4.0)
- rspec-mocks (3.4.1)
- diff-lcs (>= 1.2.0, < 2.0)
- rspec-support (~> 3.4.0)
- rspec-support (3.4.1)
- rspec_junit_formatter (0.2.2)
- builder (< 4)
- rspec-core (>= 2, < 4, != 2.12.0)
- ruby-prof (0.15.9)
- slop (3.6.0)
- snappy (0.0.12)
- statsd-ruby (1.4.0)
- thread_safe (0.3.5)
- timecop (0.8.0)
- tzinfo (1.2.2)
- thread_safe (~> 0.1)
-
- PLATFORMS
- ruby
-
- DEPENDENCIES
- activesupport
- bundler (>= 1.9.5)
- colored
- docker-api
- dogstatsd-ruby (>= 2.0.0)
- dotenv
- gssapi (>= 1.2.0)
- pry
- rake (~> 10.0)
- rspec
- rspec-benchmark
- rspec_junit_formatter (= 0.2.2)
- ruby-kafka!
- ruby-prof
- snappy
- statsd-ruby
- timecop
-
- RUBY VERSION
- ruby 2.2.3p173
-
- BUNDLED WITH
- 1.15.3