ruby-kafka 0.3.16 → 0.3.17

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 02443c4134a1dd98934b6bd2f73b2edc09fe4cf5
-   data.tar.gz: b03995318665f91f070ecb00a14aa483e4c4444e
+   metadata.gz: ae14c68f2b224bf04659de3a17203e7065d5efd7
+   data.tar.gz: d33dd2594e8159eb23d3a8589b789f2b66c5993b
  SHA512:
-   metadata.gz: 2c69cb9baa6eed992cdd7dc66b14547df32059144748b0c12fe269ec126bbd4ed62b18300aacb955f62b17235e26e605cfea53425adf278a57edcb99d4e7c272
-   data.tar.gz: d2b7dd37bc7eb3ac6666808172d289334d926d9cb17b01690c19e1f67174af4f054c003fcc204fb9f711441d4207a4693e2d54419af873930306ae9f2fe8d1a0
+   metadata.gz: 9f2d374b0237fad7d3678f18ee903640a40bc735f995b3138286928398d5d6b595bc3ce395a86d60810934617cdc2c1e207d305c5b245323c9281ffe4bc33bc5
+   data.tar.gz: 0403e6ba029d666660395fc666dc4003195e9aa947b2c44936415394b12abd14946830d4b369ac5587f5d71858f40c6ff0630628c51602603e98f340e065ef36
@@ -4,11 +4,19 @@ Changes and additions to the library will be listed here.

  ## Unreleased

+ ## v0.3.17
+
+ - Re-commit previously committed offsets periodically with an interval of half
+   the offset retention time, starting with the first commit (#318).
+ - Expose offset retention time in the Consumer API (#316).
+ - Don't get blocked when there's temporarily no leader for a topic (#336).
+
  ## v0.3.16

  - Fix SSL socket timeout (#283).
  - Update to the latest Datadog gem (#296).
  - Automatically detect private key type (#297).
+ - Only fetch messages for subscribed topics (#309).

  ## v0.3.15

data/README.md CHANGED
@@ -9,35 +9,35 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
  1. [Installation](#installation)
  2. [Compatibility](#compatibility)
  3. [Usage](#usage)
- 1. [Setting up the Kafka Client](#setting-up-the-kafka-client)
- 2. [Producing Messages to Kafka](#producing-messages-to-kafka)
- 1. [Efficiently Producing Messages](#efficiently-producing-messages)
- 1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
- 2. [Serialization](#serialization)
- 3. [Partitioning](#partitioning)
- 4. [Buffering and Error Handling](#buffering-and-error-handling)
- 5. [Message Durability](#message-durability)
- 6. [Message Delivery Guarantees](#message-delivery-guarantees)
- 7. [Compression](#compression)
- 8. [Producing Messages from a Rails Application](#producing-messages-from-a-rails-application)
- 3. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
- 1. [Consumer Groups](#consumer-groups)
- 2. [Consumer Checkpointing](#consumer-checkpointing)
- 3. [Topic Subscriptions](#topic-subscriptions)
- 4. [Shutting Down a Consumer](#shutting-down-a-consumer)
- 5. [Consuming Messages in Batches](#consuming-messages-in-batches)
- 6. [Balancing Throughput and Latency](#balancing-throughput-and-latency)
- 4. [Thread Safety](#thread-safety)
- 5. [Logging](#logging)
- 6. [Instrumentation](#instrumentation)
- 7. [Monitoring](#monitoring)
- 1. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
- 8. [Understanding Timeouts](#understanding-timeouts)
- 9. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
+ 1. [Setting up the Kafka Client](#setting-up-the-kafka-client)
+ 2. [Producing Messages to Kafka](#producing-messages-to-kafka)
+ 1. [Efficiently Producing Messages](#efficiently-producing-messages)
+ 1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
+ 2. [Serialization](#serialization)
+ 3. [Partitioning](#partitioning)
+ 4. [Buffering and Error Handling](#buffering-and-error-handling)
+ 5. [Message Durability](#message-durability)
+ 6. [Message Delivery Guarantees](#message-delivery-guarantees)
+ 7. [Compression](#compression)
+ 8. [Producing Messages from a Rails Application](#producing-messages-from-a-rails-application)
+ 3. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
+ 1. [Consumer Groups](#consumer-groups)
+ 2. [Consumer Checkpointing](#consumer-checkpointing)
+ 3. [Topic Subscriptions](#topic-subscriptions)
+ 4. [Shutting Down a Consumer](#shutting-down-a-consumer)
+ 5. [Consuming Messages in Batches](#consuming-messages-in-batches)
+ 6. [Balancing Throughput and Latency](#balancing-throughput-and-latency)
+ 4. [Thread Safety](#thread-safety)
+ 5. [Logging](#logging)
+ 6. [Instrumentation](#instrumentation)
+ 7. [Monitoring](#monitoring)
+ 1. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
+ 8. [Understanding Timeouts](#understanding-timeouts)
+ 9. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
  4. [Design](#design)
- 1. [Producer Design](#producer-design)
- 2. [Asynchronous Producer Design](#asynchronous-producer-design)
- 3. [Consumer Design](#consumer-design)
+ 1. [Producer Design](#producer-design)
+ 2. [Asynchronous Producer Design](#asynchronous-producer-design)
+ 3. [Consumer Design](#consumer-design)
  5. [Development](#development)
  6. [Roadmap](#roadmap)

@@ -166,7 +166,7 @@ Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafk

  #### Asynchronously Producing Messages

- A normal producer will block while `#deliver_messages` is sending messages to Kafka, possible for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
+ A normal producer will block while `#deliver_messages` is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.

  In order to avoid blocking during message deliveries you can use the _asynchronous producer_ API. It is mostly similar to the synchronous API, with calls to `#produce` and `#deliver_messages`. The main difference is that rather than blocking, these calls will return immediately. The actual work will be done in a background thread, with the messages and operations being sent from the caller over a thread safe queue.

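For reference, a minimal sketch of the asynchronous producer API that the paragraph above describes; the delivery settings are illustrative only:

```ruby
# Deliver buffered messages every 10 seconds, or once 100 messages have
# accumulated, whichever happens first (example values).
producer = kafka.async_producer(
  delivery_interval: 10,
  delivery_threshold: 100,
)

# Returns immediately; a background thread performs the actual delivery.
producer.produce("hello", topic: "greetings")

# Flush pending messages and stop the background thread before exiting.
producer.shutdown
```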
@@ -505,6 +505,10 @@ By default, offsets are committed every 10 seconds. You can increase the frequen

  In addition to the time based trigger it's possible to trigger checkpointing in response to _n_ messages having been processed, known as the _offset commit threshold_. This puts a bound on the number of messages that can be double-processed before the problem is detected. Setting this to 1 will cause an offset commit to take place every time a message has been processed. By default this trigger is disabled.

+ Stale offsets are periodically purged by the broker. The broker setting `offsets.retention.minutes` controls the retention window for committed offsets, and defaults to 1 day. The length of the retention window, known as _offset retention time_, can be changed for the consumer.
+
+ Previously committed offsets are re-committed, to reset the retention window, at the first commit and periodically at an interval of half the _offset retention time_.
+
  ```ruby
  consumer = kafka.consumer(
    group_id: "some-group",
@@ -514,6 +518,9 @@ consumer = kafka.consumer(

    # Commit offsets when 100 messages have been processed.
    offset_commit_threshold: 100,
+
+   # Increase the length of time that committed offsets are kept.
+   offset_retention_time: 7 * 60 * 60
  )
  ```

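A minimal sketch of the new `offset_retention_time` option in use; the value is given in seconds (a later hunk converts it to milliseconds for the Kafka protocol), and leaving it as `nil` keeps the broker's own default:

```ruby
consumer = kafka.consumer(
  group_id: "some-group",
  # Keep committed offsets around for a week (value in seconds; example only).
  offset_retention_time: 7 * 24 * 60 * 60,
)
```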
@@ -668,7 +675,7 @@ end

  It is highly recommended that you monitor your Kafka client applications in production. Typical problems you'll see are:

- * high network errors rates, which may impact performance and time-to-delivery;
+ * high network error rates, which may impact performance and time-to-delivery;
  * producer buffer growth, which may indicate that producers are unable to deliver messages at the rate they're being produced;
  * consumer processing errors, indicating exceptions are being raised in the processing code;
  * frequent consumer rebalances, which may indicate unstable network conditions or consumer configurations.
@@ -31,8 +31,8 @@ module Kafka
  # @param socket_timeout [Integer, nil] the timeout setting for socket
  #   connections. See {BrokerPool#initialize}.
  #
- # @param ssl_ca_cert [String, nil] a PEM encoded CA cert to use with an
- #   SSL connection.
+ # @param ssl_ca_cert [String, Array<String>, nil] a PEM encoded CA cert, or an Array of
+ #   PEM encoded CA certs, to use with an SSL connection.
  #
  # @param ssl_client_cert [String, nil] a PEM encoded client cert to use with an
  #   SSL connection. Must be used in combination with ssl_client_cert_key.
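A minimal sketch of the widened `ssl_ca_cert` parameter, passing several PEM encoded CA certificates when building the client; the broker address and file paths are hypothetical:

```ruby
kafka = Kafka.new(
  seed_brokers: ["kafka1.example.com:9092"],
  # A single String still works; an Array of certs is now accepted as well.
  ssl_ca_cert: [
    File.read("/etc/ssl/internal-ca.pem"),
    File.read("/etc/ssl/legacy-ca.pem"),
  ],
)
```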
@@ -216,19 +216,25 @@ module Kafka
  #   not triggered by message processing.
  # @param heartbeat_interval [Integer] the interval between heartbeats; must be less
  #   than the session window.
+ # @param offset_retention_time [Integer] the time period that committed
+ #   offsets will be retained, in seconds. Defaults to the broker setting.
  # @return [Consumer]
- def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10)
+ def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10, offset_retention_time: nil)
    cluster = initialize_cluster

    instrumenter = DecoratingInstrumenter.new(@instrumenter, {
      group_id: group_id,
    })

+   # The Kafka protocol expects the retention time to be in ms.
+   retention_time = (offset_retention_time && offset_retention_time * 1_000) || -1
+
    group = ConsumerGroup.new(
      cluster: cluster,
      logger: @logger,
      group_id: group_id,
      session_timeout: session_timeout,
+     retention_time: retention_time
    )

    offset_manager = OffsetManager.new(
@@ -237,6 +243,7 @@ module Kafka
      logger: @logger,
      commit_interval: offset_commit_interval,
      commit_threshold: offset_commit_threshold,
+     offset_retention_time: offset_retention_time
    )

    heartbeat = Heartbeat.new(
@@ -447,7 +454,9 @@ module Kafka

    if ca_cert
      store = OpenSSL::X509::Store.new
-     store.add_cert(OpenSSL::X509::Certificate.new(ca_cert))
+     Array(ca_cert).each do |cert|
+       store.add_cert(OpenSSL::X509::Certificate.new(cert))
+     end
      ssl_context.cert_store = store
    end

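The `Array()` call above is what lets `ssl_ca_cert` be either a single String or an Array of Strings; a quick reminder of how `Kernel#Array` coerces its argument:

```ruby
Array("one cert")            # => ["one cert"]
Array(["cert a", "cert b"])  # => ["cert a", "cert b"]
Array(nil)                   # => []
```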
@@ -286,6 +286,7 @@ module Kafka
        @cluster.mark_as_stale!
      rescue LeaderNotAvailable => e
        @logger.error "Leader not available; waiting 1s before retrying"
+       @cluster.mark_as_stale!
        sleep 1
      end
    end
@@ -5,7 +5,7 @@ module Kafka
  class ConsumerGroup
    attr_reader :assigned_partitions, :generation_id

-   def initialize(cluster:, logger:, group_id:, session_timeout:)
+   def initialize(cluster:, logger:, group_id:, session_timeout:, retention_time:)
      @cluster = cluster
      @logger = logger
      @group_id = group_id
@@ -16,6 +16,7 @@ module Kafka
      @topics = Set.new
      @assigned_partitions = {}
      @assignment_strategy = RoundRobinAssignmentStrategy.new(cluster: @cluster)
+     @retention_time = retention_time
    end

    def subscribe(topic)
@@ -68,6 +69,7 @@ module Kafka
        member_id: @member_id,
        generation_id: @generation_id,
        offsets: offsets,
+       retention_time: @retention_time
      )

      response.topics.each do |topic, partitions|
@@ -45,7 +45,7 @@ module Kafka
      if empty?
        0
      else
-       highwater_mark_offset - last_offset
+       (highwater_mark_offset - 1) - last_offset
      end
    end
  end
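The off-by-one correction reflects the fact that Kafka's high watermark is the offset the *next* message will receive, so the newest readable message sits at `highwater_mark_offset - 1`. A small worked example using the same names:

```ruby
highwater_mark_offset = 100  # offset the next message will be written to
last_offset = 99             # the newest message has already been consumed

(highwater_mark_offset - 1) - last_offset  # => 0, no lag
highwater_mark_offset - last_offset        # => 1, the old formula over-reports by one
```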
@@ -56,7 +56,7 @@ module Kafka
      return unless @buffer.key?(topic) && @buffer[topic].key?(partition)

      @size -= @buffer[topic][partition].count
-     @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(:+)
+     @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(0, :+)

      @buffer[topic].delete(partition)
      @buffer.delete(topic) if @buffer[topic].empty?
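The added `0` seed guards against an empty list of messages: `reduce(:+)` on an empty Array returns `nil`, which would make the `-=` above raise, whereas `reduce(0, :+)` returns `0`:

```ruby
[].reduce(:+)           # => nil
[].reduce(0, :+)        # => 0
[10, 20].reduce(0, :+)  # => 30
```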
@@ -1,6 +1,10 @@
  module Kafka
    class OffsetManager
-     def initialize(cluster:, group:, logger:, commit_interval:, commit_threshold:)
+
+     # The default broker setting for offsets.retention.minutes is 1440.
+     DEFAULT_RETENTION_TIME = 1440 * 60
+
+     def initialize(cluster:, group:, logger:, commit_interval:, commit_threshold:, offset_retention_time:)
        @cluster = cluster
        @group = group
        @logger = logger
@@ -13,6 +17,8 @@ module Kafka
        @committed_offsets = nil
        @resolved_offsets = {}
        @last_commit = Time.now
+       @last_recommit = nil
+       @recommit_interval = (offset_retention_time || DEFAULT_RETENTION_TIME) / 2
      end

      def set_default_offset(topic, default_offset)
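With the broker default of 1440 minutes, the arithmetic in the constructor above works out to a re-commit every 12 hours; a quick check:

```ruby
DEFAULT_RETENTION_TIME = 1440 * 60                   # => 86_400 seconds (1 day)

(nil || DEFAULT_RETENTION_TIME) / 2                  # => 43_200 seconds (12 hours)
((7 * 24 * 60 * 60) || DEFAULT_RETENTION_TIME) / 2   # => 302_400 seconds (3.5 days)
```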
@@ -49,17 +55,15 @@ module Kafka
        end
      end

-     def commit_offsets
-       unless @processed_offsets.empty?
-         pretty_offsets = @processed_offsets.flat_map {|topic, partitions|
-           partitions.map {|partition, offset| "#{topic}/#{partition}:#{offset}" }
-         }.join(", ")
-
-         @logger.info "Committing offsets: #{pretty_offsets}"
+     def commit_offsets(recommit = false)
+       offsets = offsets_to_commit(recommit)
+       unless offsets.empty?
+         @logger.info "Committing offsets#{recommit ? ' with recommit' : ''}: #{prettify_offsets(offsets)}"

-         @group.commit_offsets(@processed_offsets)
+         @group.commit_offsets(offsets)

          @last_commit = Time.now
+         @last_recommit = Time.now if recommit

          @uncommitted_offsets = 0
          @committed_offsets = nil
@@ -67,8 +71,9 @@ module Kafka
        end
      end
      def commit_offsets_if_necessary
-       if commit_timeout_reached? || commit_threshold_reached?
-         commit_offsets
+       recommit = recommit_timeout_reached?
+       if recommit || commit_timeout_reached? || commit_threshold_reached?
+         commit_offsets(recommit)
        end
      end

@@ -107,13 +112,44 @@ module Kafka
        @cluster.resolve_offsets(topic, partitions, default_offset)
      end

+     def seconds_since(time)
+       Time.now - time
+     end
+
      def seconds_since_last_commit
-       Time.now - @last_commit
+       seconds_since(@last_commit)
      end

-     def committed_offset_for(topic, partition)
+     def committed_offsets
        @committed_offsets ||= @group.fetch_offsets
-       @committed_offsets.offset_for(topic, partition)
+     end
+
+     def committed_offset_for(topic, partition)
+       committed_offsets.offset_for(topic, partition)
+     end
+
+     def offsets_to_commit(recommit = false)
+       if recommit
+         offsets_to_recommit.merge!(@processed_offsets) do |_topic, committed, processed|
+           committed.merge!(processed)
+         end
+       else
+         @processed_offsets
+       end
+     end
+
+     def offsets_to_recommit
+       committed_offsets.topics.each_with_object({}) do |(topic, partition_info), offsets|
+         topic_offsets = partition_info.keys.each_with_object({}) do |partition, partition_map|
+           offset = committed_offsets.offset_for(topic, partition)
+           partition_map[partition] = offset unless offset == -1
+         end
+         offsets[topic] = topic_offsets unless topic_offsets.empty?
+       end
+     end
+
+     def recommit_timeout_reached?
+       @last_recommit.nil? || seconds_since(@last_recommit) >= @recommit_interval
      end

      def commit_timeout_reached?
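When a re-commit is due, `offsets_to_commit` layers the freshly processed offsets on top of the previously committed ones, so a partition that saw new messages wins over its stale entry, while partitions that have never committed (reported as `-1`) are skipped entirely. A toy illustration of that merge, with hypothetical topic data:

```ruby
committed = { "events" => { 0 => 10, 1 => 20 } }  # re-committed as-is
processed = { "events" => { 1 => 25 } }           # partition 1 advanced since the last commit

committed.merge!(processed) do |_topic, old_partitions, new_partitions|
  old_partitions.merge!(new_partitions)
end
# => { "events" => { 0 => 10, 1 => 25 } }
```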
@@ -123,5 +159,11 @@ module Kafka
      def commit_threshold_reached?
        @commit_threshold != 0 && @uncommitted_offsets >= @commit_threshold
      end
+
+     def prettify_offsets(offsets)
+       offsets.flat_map do |topic, partitions|
+         partitions.map { |partition, offset| "#{topic}/#{partition}:#{offset}" }
+       end.join(', ')
+     end
    end
  end
@@ -1,3 +1,3 @@
  module Kafka
-   VERSION = "0.3.16"
+   VERSION = "0.3.17"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-kafka
  version: !ruby/object:Gem::Version
-   version: 0.3.16
+   version: 0.3.17
  platform: ruby
  authors:
  - Daniel Schierbeck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-01-20 00:00:00.000000000 Z
+ date: 2017-04-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler