ruby-kafka 0.3.16 → 0.3.17

This diff shows the changes between publicly released versions of the package as they appear in their public registries, and is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 02443c4134a1dd98934b6bd2f73b2edc09fe4cf5
- data.tar.gz: b03995318665f91f070ecb00a14aa483e4c4444e
+ metadata.gz: ae14c68f2b224bf04659de3a17203e7065d5efd7
+ data.tar.gz: d33dd2594e8159eb23d3a8589b789f2b66c5993b
  SHA512:
- metadata.gz: 2c69cb9baa6eed992cdd7dc66b14547df32059144748b0c12fe269ec126bbd4ed62b18300aacb955f62b17235e26e605cfea53425adf278a57edcb99d4e7c272
- data.tar.gz: d2b7dd37bc7eb3ac6666808172d289334d926d9cb17b01690c19e1f67174af4f054c003fcc204fb9f711441d4207a4693e2d54419af873930306ae9f2fe8d1a0
+ metadata.gz: 9f2d374b0237fad7d3678f18ee903640a40bc735f995b3138286928398d5d6b595bc3ce395a86d60810934617cdc2c1e207d305c5b245323c9281ffe4bc33bc5
+ data.tar.gz: 0403e6ba029d666660395fc666dc4003195e9aa947b2c44936415394b12abd14946830d4b369ac5587f5d71858f40c6ff0630628c51602603e98f340e065ef36
data/CHANGELOG.md CHANGED
@@ -4,11 +4,19 @@ Changes and additions to the library will be listed here.
 
  ## Unreleased
 
+ ## v0.3.17
+
+ - Re-commit previously committed offsets periodically with an interval of half
+   the offset retention time, starting with the first commit (#318).
+ - Expose offset retention time in the Consumer API (#316).
+ - Don't get blocked when there's temporarily no leader for a topic (#336).
+
  ## v0.3.16
 
  - Fix SSL socket timeout (#283).
  - Update to the latest Datadog gem (#296).
  - Automatically detect private key type (#297).
+ - Only fetch messages for subscribed topics (#309).
 
  ## v0.3.15
 
data/README.md CHANGED
@@ -9,35 +9,35 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
  1. [Installation](#installation)
  2. [Compatibility](#compatibility)
  3. [Usage](#usage)
- 1. [Setting up the Kafka Client](#setting-up-the-kafka-client)
- 2. [Producing Messages to Kafka](#producing-messages-to-kafka)
- 1. [Efficiently Producing Messages](#efficiently-producing-messages)
- 1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
- 2. [Serialization](#serialization)
- 3. [Partitioning](#partitioning)
- 4. [Buffering and Error Handling](#buffering-and-error-handling)
- 5. [Message Durability](#message-durability)
- 6. [Message Delivery Guarantees](#message-delivery-guarantees)
- 7. [Compression](#compression)
- 8. [Producing Messages from a Rails Application](#producing-messages-from-a-rails-application)
- 3. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
- 1. [Consumer Groups](#consumer-groups)
- 2. [Consumer Checkpointing](#consumer-checkpointing)
- 3. [Topic Subscriptions](#topic-subscriptions)
- 4. [Shutting Down a Consumer](#shutting-down-a-consumer)
- 5. [Consuming Messages in Batches](#consuming-messages-in-batches)
- 6. [Balancing Throughput and Latency](#balancing-throughput-and-latency)
- 4. [Thread Safety](#thread-safety)
- 5. [Logging](#logging)
- 6. [Instrumentation](#instrumentation)
- 7. [Monitoring](#monitoring)
- 1. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
- 8. [Understanding Timeouts](#understanding-timeouts)
- 9. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
+ 1. [Setting up the Kafka Client](#setting-up-the-kafka-client)
+ 2. [Producing Messages to Kafka](#producing-messages-to-kafka)
+ 1. [Efficiently Producing Messages](#efficiently-producing-messages)
+ 1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
+ 2. [Serialization](#serialization)
+ 3. [Partitioning](#partitioning)
+ 4. [Buffering and Error Handling](#buffering-and-error-handling)
+ 5. [Message Durability](#message-durability)
+ 6. [Message Delivery Guarantees](#message-delivery-guarantees)
+ 7. [Compression](#compression)
+ 8. [Producing Messages from a Rails Application](#producing-messages-from-a-rails-application)
+ 3. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
+ 1. [Consumer Groups](#consumer-groups)
+ 2. [Consumer Checkpointing](#consumer-checkpointing)
+ 3. [Topic Subscriptions](#topic-subscriptions)
+ 4. [Shutting Down a Consumer](#shutting-down-a-consumer)
+ 5. [Consuming Messages in Batches](#consuming-messages-in-batches)
+ 6. [Balancing Throughput and Latency](#balancing-throughput-and-latency)
+ 4. [Thread Safety](#thread-safety)
+ 5. [Logging](#logging)
+ 6. [Instrumentation](#instrumentation)
+ 7. [Monitoring](#monitoring)
+ 1. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
+ 8. [Understanding Timeouts](#understanding-timeouts)
+ 9. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
  4. [Design](#design)
- 1. [Producer Design](#producer-design)
- 2. [Asynchronous Producer Design](#asynchronous-producer-design)
- 3. [Consumer Design](#consumer-design)
+ 1. [Producer Design](#producer-design)
+ 2. [Asynchronous Producer Design](#asynchronous-producer-design)
+ 3. [Consumer Design](#consumer-design)
  5. [Development](#development)
  6. [Roadmap](#roadmap)
 
@@ -166,7 +166,7 @@ Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafk
 
  #### Asynchronously Producing Messages
 
- A normal producer will block while `#deliver_messages` is sending messages to Kafka, possible for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
+ A normal producer will block while `#deliver_messages` is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
 
  In order to avoid blocking during message deliveries you can use the _asynchronous producer_ API. It is mostly similar to the synchronous API, with calls to `#produce` and `#deliver_messages`. The main difference is that rather than blocking, these calls will return immediately. The actual work will be done in a background thread, with the messages and operations being sent from the caller over a thread safe queue.
 
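The paragraphs above describe the asynchronous producer in prose; as a point of reference, here is a minimal sketch of how that API is typically used. The broker address, topic name, and delivery settings are illustrative placeholders, not part of this diff.

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092"], client_id: "my-app")

# The async producer buffers messages and delivers them from a background
# thread, so #produce returns immediately instead of blocking the caller.
producer = kafka.async_producer(
  delivery_threshold: 100, # deliver once 100 messages have been buffered...
  delivery_interval: 30,   # ...or every 30 seconds, whichever comes first
)

producer.produce("hello", topic: "greetings")

# Shut down to flush any buffered messages before the process exits.
producer.shutdown
```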
@@ -505,6 +505,10 @@ By default, offsets are committed every 10 seconds. You can increase the frequen
 
  In addition to the time based trigger it's possible to trigger checkpointing in response to _n_ messages having been processed, known as the _offset commit threshold_. This puts a bound on the number of messages that can be double-processed before the problem is detected. Setting this to 1 will cause an offset commit to take place every time a message has been processed. By default this trigger is disabled.
 
+ Stale offsets are periodically purged by the broker. The broker setting `offsets.retention.minutes` controls the retention window for committed offsets, and defaults to 1 day. The length of the retention window, known as _offset retention time_, can be changed for the consumer.
+
+ Previously committed offsets are re-committed, to reset the retention window, at the first commit and periodically at an interval of half the _offset retention time_.
+
  ```ruby
  consumer = kafka.consumer(
  group_id: "some-group",
@@ -514,6 +518,9 @@ consumer = kafka.consumer(
 
  # Commit offsets when 100 messages have been processed.
  offset_commit_threshold: 100,
+
+ # Increase the length of time that committed offsets are kept.
+ offset_retention_time: 7 * 60 * 60
  )
  ```
 
@@ -668,7 +675,7 @@ end
 
  It is highly recommended that you monitor your Kafka client applications in production. Typical problems you'll see are:
 
- * high network errors rates, which may impact performance and time-to-delivery;
+ * high network error rates, which may impact performance and time-to-delivery;
  * producer buffer growth, which may indicate that producers are unable to deliver messages at the rate they're being produced;
  * consumer processing errors, indicating exceptions are being raised in the processing code;
  * frequent consumer rebalances, which may indicate unstable network conditions or consumer configurations.
data/lib/kafka/client.rb CHANGED
@@ -31,8 +31,8 @@ module Kafka
  # @param socket_timeout [Integer, nil] the timeout setting for socket
  # connections. See {BrokerPool#initialize}.
  #
- # @param ssl_ca_cert [String, nil] a PEM encoded CA cert to use with an
- # SSL connection.
+ # @param ssl_ca_cert [String, Array<String>, nil] a PEM encoded CA cert, or an Array of
+ # PEM encoded CA certs, to use with an SSL connection.
  #
  # @param ssl_client_cert [String, nil] a PEM encoded client cert to use with an
  # SSL connection. Must be used in combination with ssl_client_cert_key.
@@ -216,19 +216,25 @@ module Kafka
  # not triggered by message processing.
  # @param heartbeat_interval [Integer] the interval between heartbeats; must be less
  # than the session window.
+ # @param offset_retention_time [Integer] the time period that committed
+ # offsets will be retained, in seconds. Defaults to the broker setting.
  # @return [Consumer]
- def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10)
+ def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10, offset_retention_time: nil)
  cluster = initialize_cluster
 
  instrumenter = DecoratingInstrumenter.new(@instrumenter, {
  group_id: group_id,
  })
 
+ # The Kafka protocol expects the retention time to be in ms.
+ retention_time = (offset_retention_time && offset_retention_time * 1_000) || -1
+
  group = ConsumerGroup.new(
  cluster: cluster,
  logger: @logger,
  group_id: group_id,
  session_timeout: session_timeout,
+ retention_time: retention_time
  )
 
  offset_manager = OffsetManager.new(
@@ -237,6 +243,7 @@ module Kafka
  logger: @logger,
  commit_interval: offset_commit_interval,
  commit_threshold: offset_commit_threshold,
+ offset_retention_time: offset_retention_time
  )
 
  heartbeat = Heartbeat.new(
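For clarity on the retention time conversion in the `consumer` method above: `offset_retention_time` is given in seconds and multiplied by 1,000 for the protocol, while `nil` becomes `-1`, which the Kafka protocol treats as "use the broker's own retention default". A quick illustration with made-up values:

```ruby
offset_retention_time = 7 * 24 * 60 * 60  # one week, in seconds
(offset_retention_time && offset_retention_time * 1_000) || -1  # => 604_800_000 ms sent to Kafka

offset_retention_time = nil
(offset_retention_time && offset_retention_time * 1_000) || -1  # => -1, broker default applies
```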
@@ -447,7 +454,9 @@ module Kafka
 
  if ca_cert
  store = OpenSSL::X509::Store.new
- store.add_cert(OpenSSL::X509::Certificate.new(ca_cert))
+ Array(ca_cert).each do |cert|
+ store.add_cert(OpenSSL::X509::Certificate.new(cert))
+ end
  ssl_context.cert_store = store
  end
 
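Taken together, the documentation change and the `Array(ca_cert)` loop above let a client trust more than one CA. A minimal sketch of what that looks like from the caller's side; the broker address and file paths are placeholders:

```ruby
require "kafka"

# ssl_ca_cert now accepts either a single PEM string or an Array of them;
# each certificate is added to the OpenSSL certificate store.
kafka = Kafka.new(
  seed_brokers: ["kafka1:9093"],
  client_id: "my-app",
  ssl_ca_cert: [
    File.read("certs/root_ca.pem"),
    File.read("certs/intermediate_ca.pem"),
  ],
)
```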
data/lib/kafka/consumer.rb CHANGED
@@ -286,6 +286,7 @@ module Kafka
  @cluster.mark_as_stale!
  rescue LeaderNotAvailable => e
  @logger.error "Leader not available; waiting 1s before retrying"
+ @cluster.mark_as_stale!
  sleep 1
  end
  end
data/lib/kafka/consumer_group.rb CHANGED
@@ -5,7 +5,7 @@ module Kafka
  class ConsumerGroup
  attr_reader :assigned_partitions, :generation_id
 
- def initialize(cluster:, logger:, group_id:, session_timeout:)
+ def initialize(cluster:, logger:, group_id:, session_timeout:, retention_time:)
  @cluster = cluster
  @logger = logger
  @group_id = group_id
@@ -16,6 +16,7 @@ module Kafka
  @topics = Set.new
  @assigned_partitions = {}
  @assignment_strategy = RoundRobinAssignmentStrategy.new(cluster: @cluster)
+ @retention_time = retention_time
  end
 
  def subscribe(topic)
@@ -68,6 +69,7 @@ module Kafka
  member_id: @member_id,
  generation_id: @generation_id,
  offsets: offsets,
+ retention_time: @retention_time
  )
 
  response.topics.each do |topic, partitions|
@@ -45,7 +45,7 @@ module Kafka
  if empty?
  0
  else
- highwater_mark_offset - last_offset
+ (highwater_mark_offset - 1) - last_offset
  end
  end
  end
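The hunk above adjusts the difference between the high watermark and the batch's last offset. Since Kafka's high watermark points one past the offset of the last available message, the old expression overstated the remaining backlog by one. A worked example with illustrative offsets:

```ruby
highwater_mark_offset = 10  # offset of the *next* message; the last available message is 9
last_offset = 7             # offset of the last message in the fetched batch

highwater_mark_offset - last_offset        # => 3 (off by one)
(highwater_mark_offset - 1) - last_offset  # => 2 messages left to consume after this batch
```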
@@ -56,7 +56,7 @@ module Kafka
  return unless @buffer.key?(topic) && @buffer[topic].key?(partition)
 
  @size -= @buffer[topic][partition].count
- @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(:+)
+ @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(0, :+)
 
  @buffer[topic].delete(partition)
  @buffer.delete(topic) if @buffer[topic].empty?
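The added initial value matters when the partition's message list is empty: `reduce(:+)` on an empty array returns `nil`, which would make the `-=` above raise. A quick illustration:

```ruby
[].reduce(:+)        # => nil  (so `@bytesize -= nil` raises a TypeError)
[].reduce(0, :+)     # => 0    (safe no-op)
[3, 5].reduce(0, :+) # => 8
```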
data/lib/kafka/offset_manager.rb CHANGED
@@ -1,6 +1,10 @@
  module Kafka
  class OffsetManager
- def initialize(cluster:, group:, logger:, commit_interval:, commit_threshold:)
+
+ # The default broker setting for offsets.retention.minutes is 1440.
+ DEFAULT_RETENTION_TIME = 1440 * 60
+
+ def initialize(cluster:, group:, logger:, commit_interval:, commit_threshold:, offset_retention_time:)
  @cluster = cluster
  @group = group
  @logger = logger
@@ -13,6 +17,8 @@ module Kafka
  @committed_offsets = nil
  @resolved_offsets = {}
  @last_commit = Time.now
+ @last_recommit = nil
+ @recommit_interval = (offset_retention_time || DEFAULT_RETENTION_TIME) / 2
  end
 
  def set_default_offset(topic, default_offset)
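To make the "half the offset retention time" interval concrete: with no `offset_retention_time` configured, recommits happen every 12 hours. The numbers below are just the arithmetic implied by the constants above, not code from the gem:

```ruby
DEFAULT_RETENTION_TIME = 1440 * 60  # => 86_400 seconds (1 day)
DEFAULT_RETENTION_TIME / 2          # => 43_200 seconds (12 hours between recommits)

(7 * 24 * 60 * 60) / 2              # => 302_400 seconds (3.5 days, if retention is set to one week)
```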
@@ -49,17 +55,15 @@ module Kafka
  end
  end
 
- def commit_offsets
- unless @processed_offsets.empty?
- pretty_offsets = @processed_offsets.flat_map {|topic, partitions|
- partitions.map {|partition, offset| "#{topic}/#{partition}:#{offset}" }
- }.join(", ")
-
- @logger.info "Committing offsets: #{pretty_offsets}"
+ def commit_offsets(recommit = false)
+ offsets = offsets_to_commit(recommit)
+ unless offsets.empty?
+ @logger.info "Committing offsets#{recommit ? ' with recommit' : ''}: #{prettify_offsets(offsets)}"
 
- @group.commit_offsets(@processed_offsets)
+ @group.commit_offsets(offsets)
 
  @last_commit = Time.now
+ @last_recommit = Time.now if recommit
 
  @uncommitted_offsets = 0
  @committed_offsets = nil
@@ -67,8 +71,9 @@ module Kafka
  end
  end
  def commit_offsets_if_necessary
- if commit_timeout_reached? || commit_threshold_reached?
- commit_offsets
+ recommit = recommit_timeout_reached?
+ if recommit || commit_timeout_reached? || commit_threshold_reached?
+ commit_offsets(recommit)
  end
  end
 
@@ -107,13 +112,44 @@ module Kafka
  @cluster.resolve_offsets(topic, partitions, default_offset)
  end
 
+ def seconds_since(time)
+ Time.now - time
+ end
+
  def seconds_since_last_commit
- Time.now - @last_commit
+ seconds_since(@last_commit)
  end
 
- def committed_offset_for(topic, partition)
+ def committed_offsets
  @committed_offsets ||= @group.fetch_offsets
- @committed_offsets.offset_for(topic, partition)
+ end
+
+ def committed_offset_for(topic, partition)
+ committed_offsets.offset_for(topic, partition)
+ end
+
+ def offsets_to_commit(recommit = false)
+ if recommit
+ offsets_to_recommit.merge!(@processed_offsets) do |_topic, committed, processed|
+ committed.merge!(processed)
+ end
+ else
+ @processed_offsets
+ end
+ end
+
+ def offsets_to_recommit
+ committed_offsets.topics.each_with_object({}) do |(topic, partition_info), offsets|
+ topic_offsets = partition_info.keys.each_with_object({}) do |partition, partition_map|
+ offset = committed_offsets.offset_for(topic, partition)
+ partition_map[partition] = offset unless offset == -1
+ end
+ offsets[topic] = topic_offsets unless topic_offsets.empty?
+ end
+ end
+
+ def recommit_timeout_reached?
+ @last_recommit.nil? || seconds_since(@last_recommit) >= @recommit_interval
  end
  end
  def commit_timeout_reached?
@@ -123,5 +159,11 @@ module Kafka
  def commit_threshold_reached?
  @commit_threshold != 0 && @uncommitted_offsets >= @commit_threshold
  end
+
+ def prettify_offsets(offsets)
+ offsets.flat_map do |topic, partitions|
+ partitions.map { |partition, offset| "#{topic}/#{partition}:#{offset}" }
+ end.join(', ')
+ end
  end
  end
data/lib/kafka/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Kafka
- VERSION = "0.3.16"
+ VERSION = "0.3.17"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-kafka
  version: !ruby/object:Gem::Version
- version: 0.3.16
+ version: 0.3.17
  platform: ruby
  authors:
  - Daniel Schierbeck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-01-20 00:00:00.000000000 Z
+ date: 2017-04-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler