ruby-kafka 0.3.16 → 0.3.17
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +37 -30
- data/lib/kafka/client.rb +13 -4
- data/lib/kafka/consumer.rb +1 -0
- data/lib/kafka/consumer_group.rb +3 -1
- data/lib/kafka/fetched_batch.rb +1 -1
- data/lib/kafka/message_buffer.rb +1 -1
- data/lib/kafka/offset_manager.rb +56 -14
- data/lib/kafka/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ae14c68f2b224bf04659de3a17203e7065d5efd7
+  data.tar.gz: d33dd2594e8159eb23d3a8589b789f2b66c5993b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9f2d374b0237fad7d3678f18ee903640a40bc735f995b3138286928398d5d6b595bc3ce395a86d60810934617cdc2c1e207d305c5b245323c9281ffe4bc33bc5
+  data.tar.gz: 0403e6ba029d666660395fc666dc4003195e9aa947b2c44936415394b12abd14946830d4b369ac5587f5d71858f40c6ff0630628c51602603e98f340e065ef36
data/CHANGELOG.md
CHANGED
@@ -4,11 +4,19 @@ Changes and additions to the library will be listed here.
 
 ## Unreleased
 
+## v0.3.17
+
+- Re-commit previously committed offsets periodically with an interval of half
+  the offset retention time, starting with the first commit (#318).
+- Expose offset retention time in the Consumer API (#316).
+- Don't get blocked when there's temporarily no leader for a topic (#336).
+
 ## v0.3.16
 
 - Fix SSL socket timeout (#283).
 - Update to the latest Datadog gem (#296).
 - Automatically detect private key type (#297).
+- Only fetch messages for subscribed topics (#309).
 
 ## v0.3.15
 
data/README.md
CHANGED
@@ -9,35 +9,35 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
 1. [Installation](#installation)
 2. [Compatibility](#compatibility)
 3. [Usage](#usage)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+   1. [Setting up the Kafka Client](#setting-up-the-kafka-client)
+   2. [Producing Messages to Kafka](#producing-messages-to-kafka)
+      1. [Efficiently Producing Messages](#efficiently-producing-messages)
+         1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
+      2. [Serialization](#serialization)
+      3. [Partitioning](#partitioning)
+      4. [Buffering and Error Handling](#buffering-and-error-handling)
+      5. [Message Durability](#message-durability)
+      6. [Message Delivery Guarantees](#message-delivery-guarantees)
+      7. [Compression](#compression)
+      8. [Producing Messages from a Rails Application](#producing-messages-from-a-rails-application)
+   3. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
+      1. [Consumer Groups](#consumer-groups)
+      2. [Consumer Checkpointing](#consumer-checkpointing)
+      3. [Topic Subscriptions](#topic-subscriptions)
+      4. [Shutting Down a Consumer](#shutting-down-a-consumer)
+      5. [Consuming Messages in Batches](#consuming-messages-in-batches)
+      6. [Balancing Throughput and Latency](#balancing-throughput-and-latency)
+   4. [Thread Safety](#thread-safety)
+   5. [Logging](#logging)
+   6. [Instrumentation](#instrumentation)
+   7. [Monitoring](#monitoring)
+      1. [Reporting Metrics to Datadog](#reporting-metrics-to-datadog)
+   8. [Understanding Timeouts](#understanding-timeouts)
+   9. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
 4. [Design](#design)
-
-
-
+   1. [Producer Design](#producer-design)
+   2. [Asynchronous Producer Design](#asynchronous-producer-design)
+   3. [Consumer Design](#consumer-design)
 5. [Development](#development)
 6. [Roadmap](#roadmap)
 
@@ -166,7 +166,7 @@ Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafk
 
 #### Asynchronously Producing Messages
 
-A normal producer will block while `#deliver_messages` is sending messages to Kafka,
+A normal producer will block while `#deliver_messages` is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
 
 In order to avoid blocking during message deliveries you can use the _asynchronous producer_ API. It is mostly similar to the synchronous API, with calls to `#produce` and `#deliver_messages`. The main difference is that rather than blocking, these calls will return immediately. The actual work will be done in a background thread, with the messages and operations being sent from the caller over a thread safe queue.
 
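As context for the README change above, here is a minimal sketch of the asynchronous producer flow it describes. The broker address, topic name, and delivery thresholds are placeholders, not part of this diff.

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092"], client_id: "my-app")

# Deliveries happen in a background thread, triggered by time or buffer size.
producer = kafka.async_producer(
  delivery_interval: 30,   # deliver buffered messages every 30 seconds...
  delivery_threshold: 100, # ...or as soon as 100 messages have been buffered.
)

# Returns immediately; the message is handed off over a thread safe queue.
producer.produce("hello", topic: "greetings")

# Flush any buffered messages and stop the background thread before exiting.
producer.shutdown
```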
@@ -505,6 +505,10 @@ By default, offsets are committed every 10 seconds. You can increase the frequen
 
 In addition to the time based trigger it's possible to trigger checkpointing in response to _n_ messages having been processed, known as the _offset commit threshold_. This puts a bound on the number of messages that can be double-processed before the problem is detected. Setting this to 1 will cause an offset commit to take place every time a message has been processed. By default this trigger is disabled.
 
+Stale offsets are periodically purged by the broker. The broker setting `offsets.retention.minutes` controls the retention window for committed offsets, and defaults to 1 day. The length of the retention window, known as _offset retention time_, can be changed for the consumer.
+
+Previously committed offsets are re-committed, to reset the retention window, at the first commit and periodically at an interval of half the _offset retention time_.
+
 ```ruby
 consumer = kafka.consumer(
   group_id: "some-group",
@@ -514,6 +518,9 @@ consumer = kafka.consumer(
 
   # Commit offsets when 100 messages have been processed.
   offset_commit_threshold: 100,
+
+  # Increase the length of time that committed offsets are kept.
+  offset_retention_time: 7 * 60 * 60
 )
 ```
 
@@ -668,7 +675,7 @@ end
 
 It is highly recommended that you monitor your Kafka client applications in production. Typical problems you'll see are:
 
-* high network
+* high network error rates, which may impact performance and time-to-delivery;
 * producer buffer growth, which may indicate that producers are unable to deliver messages at the rate they're being produced;
 * consumer processing errors, indicating exceptions are being raised in the processing code;
 * frequent consumer rebalances, which may indicate unstable network conditions or consumer configurations.
data/lib/kafka/client.rb
CHANGED
@@ -31,8 +31,8 @@ module Kafka
     # @param socket_timeout [Integer, nil] the timeout setting for socket
     #   connections. See {BrokerPool#initialize}.
     #
-    # @param ssl_ca_cert [String, nil] a PEM encoded CA cert
-    #   SSL connection.
+    # @param ssl_ca_cert [String, Array<String>, nil] a PEM encoded CA cert, or an Array of
+    #   PEM encoded CA certs, to use with an SSL connection.
     #
     # @param ssl_client_cert [String, nil] a PEM encoded client cert to use with an
     #   SSL connection. Must be used in combination with ssl_client_cert_key.
@@ -216,19 +216,25 @@ module Kafka
     #   not triggered by message processing.
     # @param heartbeat_interval [Integer] the interval between heartbeats; must be less
     #   than the session window.
+    # @param offset_retention_time [Integer] the time period that committed
+    #   offsets will be retained, in seconds. Defaults to the broker setting.
     # @return [Consumer]
-    def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10)
+    def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10, offset_retention_time: nil)
       cluster = initialize_cluster
 
       instrumenter = DecoratingInstrumenter.new(@instrumenter, {
         group_id: group_id,
       })
 
+      # The Kafka protocol expects the retention time to be in ms.
+      retention_time = (offset_retention_time && offset_retention_time * 1_000) || -1
+
       group = ConsumerGroup.new(
         cluster: cluster,
         logger: @logger,
         group_id: group_id,
         session_timeout: session_timeout,
+        retention_time: retention_time
       )
 
       offset_manager = OffsetManager.new(
@@ -237,6 +243,7 @@ module Kafka
         logger: @logger,
         commit_interval: offset_commit_interval,
         commit_threshold: offset_commit_threshold,
+        offset_retention_time: offset_retention_time
       )
 
       heartbeat = Heartbeat.new(
@@ -447,7 +454,9 @@ module Kafka
 
       if ca_cert
         store = OpenSSL::X509::Store.new
-
+        Array(ca_cert).each do |cert|
+          store.add_cert(OpenSSL::X509::Certificate.new(cert))
+        end
         ssl_context.cert_store = store
       end
 
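To illustrate the `ssl_ca_cert` change above: the option now accepts either a single PEM encoded string or an array of them. A minimal sketch, with hypothetical certificate paths and broker address:

```ruby
require "kafka"

# Hypothetical paths; any PEM encoded CA certificates work here.
ca_certs = [
  File.read("/etc/ssl/internal-root-ca.pem"),
  File.read("/etc/ssl/legacy-root-ca.pem"),
]

# Each certificate in the array is added to the OpenSSL::X509::Store
# backing the SSL connection (see the Array(ca_cert).each block above).
kafka = Kafka.new(
  seed_brokers: ["kafka1:9093"],
  ssl_ca_cert: ca_certs,
)
```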
data/lib/kafka/consumer.rb
CHANGED
data/lib/kafka/consumer_group.rb
CHANGED
@@ -5,7 +5,7 @@ module Kafka
   class ConsumerGroup
     attr_reader :assigned_partitions, :generation_id
 
-    def initialize(cluster:, logger:, group_id:, session_timeout:)
+    def initialize(cluster:, logger:, group_id:, session_timeout:, retention_time:)
       @cluster = cluster
       @logger = logger
       @group_id = group_id
@@ -16,6 +16,7 @@ module Kafka
       @topics = Set.new
       @assigned_partitions = {}
       @assignment_strategy = RoundRobinAssignmentStrategy.new(cluster: @cluster)
+      @retention_time = retention_time
     end
 
     def subscribe(topic)
@@ -68,6 +69,7 @@ module Kafka
         member_id: @member_id,
         generation_id: @generation_id,
         offsets: offsets,
+        retention_time: @retention_time
       )
 
       response.topics.each do |topic, partitions|
data/lib/kafka/fetched_batch.rb
CHANGED
data/lib/kafka/message_buffer.rb
CHANGED
@@ -56,7 +56,7 @@ module Kafka
       return unless @buffer.key?(topic) && @buffer[topic].key?(partition)
 
       @size -= @buffer[topic][partition].count
-      @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(:+)
+      @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(0, :+)
 
       @buffer[topic].delete(partition)
       @buffer.delete(topic) if @buffer[topic].empty?
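The change above guards against an empty partition buffer: without an initial value, `reduce(:+)` returns `nil` for an empty array, which the `-=` on `@bytesize` cannot handle. A standalone illustration of the Ruby behaviour:

```ruby
[].reduce(:+)          # => nil (no initial value, nothing to reduce)
[].reduce(0, :+)       # => 0   (initial value is returned as-is)
[10, 20].reduce(0, :+) # => 30

bytesize = 100
# bytesize -= [].reduce(:+)  # would raise TypeError: nil can't be coerced into Integer
bytesize -= [].reduce(0, :+) # => 100, no error
```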
data/lib/kafka/offset_manager.rb
CHANGED
@@ -1,6 +1,10 @@
 module Kafka
   class OffsetManager
-
+
+    # The default broker setting for offsets.retention.minutes is 1440.
+    DEFAULT_RETENTION_TIME = 1440 * 60
+
+    def initialize(cluster:, group:, logger:, commit_interval:, commit_threshold:, offset_retention_time:)
       @cluster = cluster
       @group = group
       @logger = logger
@@ -13,6 +17,8 @@ module Kafka
       @committed_offsets = nil
       @resolved_offsets = {}
       @last_commit = Time.now
+      @last_recommit = nil
+      @recommit_interval = (offset_retention_time || DEFAULT_RETENTION_TIME) / 2
     end
 
     def set_default_offset(topic, default_offset)
@@ -49,17 +55,15 @@ module Kafka
       end
     end
 
-    def commit_offsets
-
-
-
-      }.join(", ")
-
-        @logger.info "Committing offsets: #{pretty_offsets}"
+    def commit_offsets(recommit = false)
+      offsets = offsets_to_commit(recommit)
+      unless offsets.empty?
+        @logger.info "Committing offsets#{recommit ? ' with recommit' : ''}: #{prettify_offsets(offsets)}"
 
-        @group.commit_offsets(
+        @group.commit_offsets(offsets)
 
         @last_commit = Time.now
+        @last_recommit = Time.now if recommit
 
         @uncommitted_offsets = 0
         @committed_offsets = nil
@@ -67,8 +71,9 @@ module Kafka
     end
 
     def commit_offsets_if_necessary
-
-
+      recommit = recommit_timeout_reached?
+      if recommit || commit_timeout_reached? || commit_threshold_reached?
+        commit_offsets(recommit)
       end
     end
 
@@ -107,13 +112,44 @@ module Kafka
       @cluster.resolve_offsets(topic, partitions, default_offset)
     end
 
+    def seconds_since(time)
+      Time.now - time
+    end
+
     def seconds_since_last_commit
-
+      seconds_since(@last_commit)
     end
 
-    def
+    def committed_offsets
       @committed_offsets ||= @group.fetch_offsets
-
+    end
+
+    def committed_offset_for(topic, partition)
+      committed_offsets.offset_for(topic, partition)
+    end
+
+    def offsets_to_commit(recommit = false)
+      if recommit
+        offsets_to_recommit.merge!(@processed_offsets) do |_topic, committed, processed|
+          committed.merge!(processed)
+        end
+      else
+        @processed_offsets
+      end
+    end
+
+    def offsets_to_recommit
+      committed_offsets.topics.each_with_object({}) do |(topic, partition_info), offsets|
+        topic_offsets = partition_info.keys.each_with_object({}) do |partition, partition_map|
+          offset = committed_offsets.offset_for(topic, partition)
+          partition_map[partition] = offset unless offset == -1
+        end
+        offsets[topic] = topic_offsets unless topic_offsets.empty?
+      end
+    end
+
+    def recommit_timeout_reached?
+      @last_recommit.nil? || seconds_since(@last_recommit) >= @recommit_interval
     end
 
     def commit_timeout_reached?
@@ -123,5 +159,11 @@ module Kafka
     def commit_threshold_reached?
       @commit_threshold != 0 && @uncommitted_offsets >= @commit_threshold
     end
+
+    def prettify_offsets(offsets)
+      offsets.flat_map do |topic, partitions|
+        partitions.map { |partition, offset| "#{topic}/#{partition}:#{offset}" }
+      end.join(', ')
+    end
   end
 end
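To make the new recommit schedule concrete, here is a small standalone sketch (not part of the gem) of the interval computed by `@recommit_interval = (offset_retention_time || DEFAULT_RETENTION_TIME) / 2` above:

```ruby
DEFAULT_RETENTION_TIME = 1440 * 60 # broker default offsets.retention.minutes, in seconds

def recommit_interval(offset_retention_time = nil)
  (offset_retention_time || DEFAULT_RETENTION_TIME) / 2
end

recommit_interval              # => 43200 seconds, i.e. recommit every 12 hours by default
recommit_interval(7 * 60 * 60) # => 12600 seconds with offset_retention_time: 7 * 60 * 60
```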
data/lib/kafka/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ruby-kafka
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.17
 platform: ruby
 authors:
 - Daniel Schierbeck
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-
+date: 2017-04-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler