ruby-kafka 0.3.7 → 0.3.8
- checksums.yaml +4 -4
- data/CHANGELOG.md +6 -0
- data/README.md +30 -4
- data/circle.yml +4 -2
- data/lib/kafka/async_producer.rb +2 -2
- data/lib/kafka/client.rb +21 -14
- data/lib/kafka/message_buffer.rb +4 -1
- data/lib/kafka/version.rb +1 -1
- data/ruby-kafka.gemspec +1 -0
- metadata +16 -2
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ca28feeee3d3fd46dab6145eb79a67b842d7f6ec
+  data.tar.gz: 720984362d2a5c4492a35022749d458618010736
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c95bc7e7befbf91326d87a3a21af6bed1ba138b2243849f071084b1ec351eae728e0d03da20581f20a47a1b11a1b8842495aaf5ac55b8d6530a8c17c6e219a45
+  data.tar.gz: 56b6062281cec53d293a3e4e2e295b423dc8fbc9e2affef51f67c5c5f6697daf064f07bf6a6b34d3cd7b94161dffcfe050cdfc3e967cbe58a317ba041cfc84ef
```
|
data/CHANGELOG.md
CHANGED

```diff
@@ -4,6 +4,12 @@ Changes and additions to the library will be listed here.
 
 ## Unreleased
 
+## v0.3.8
+
+- Keep separate connection pools for consumers and producers initialized from
+  the same client.
+- Handle connection errors automatically in the async producer.
+
 ## v0.3.7
 
 - Default to port 9092 if no port is provided for a seed broker.
```
|
data/README.md
CHANGED

````diff
@@ -21,10 +21,11 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
     2. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
         1. [Consumer Checkpointing](#consumer-checkpointing)
         2. [Consuming Messages in Batches](#consuming-messages-in-batches)
-    3. [
-    4. [
-    5. [
-    6. [
+    3. [Thread Safety](#thread-safety)
+    4. [Logging](#logging)
+    5. [Instrumentation](#instrumentation)
+    6. [Understanding Timeouts](#understanding-timeouts)
+    7. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
 3. [Development](#development)
 4. [Roadmap](#roadmap)
@@ -259,6 +260,25 @@ That is, once `#deliver_messages` returns we can be sure that Kafka has received
 - If you configure the producer to not require acknowledgements from the Kafka brokers by setting `required_acks` to zero there is no guarantee that the messsage will ever make it to a Kafka broker.
 - If you use the asynchronous producer there's no guarantee that messages will have been delivered after `#deliver_messages` returns. A way of blocking until a message has been delivered with the asynchronous producer may be implemented in the future.
 
+It's possible to improve your chances of success when calling `#deliver_messages`, at the price of a longer max latency:
+
+```ruby
+producer = kafka.producer(
+  # The number of retries when attempting to deliver messages. The default is
+  # 2, so 3 attempts in total, but you can configure a higher or lower number:
+  max_retries: 5,
+
+  # The number of seconds to wait between retries. In order to handle longer
+  # periods of Kafka being unavailable, increase this number. The default is
+  # 1 second.
+  retry_backoff: 5,
+)
+```
+
+Note that these values affect the max latency of the operation; see [Understanding Timeouts](#understanding-timeouts) for an explanation of the various timeouts and latencies.
+
+If you use the asynchronous producer you typically don't have to worry too much about this, as retries will be done in the background.
+
 #### Compression
 
 Depending on what kind of data you produce, enabling compression may yield improved bandwidth and space usage. Compression in Kafka is done on entire messages sets rather than on individual messages. This improves the compression rate and generally means that compressions works better the larger your buffers get, since the message sets will be larger by the time they're compressed.
@@ -424,6 +444,12 @@ end
 
 One important thing to note is that the client commits the offset of the batch's messages only after the _entire_ batch has been processed.
 
+### Thread Safety
+
+You typically don't want to share a Kafka client between threads, since the network communication is not synchronized. Furthermore, you should avoid using threads in a consumer unless you're very careful about waiting for all work to complete before returning from the `#each_message` or `#each_batch` block. This is because _checkpointing_ assumes that returning from the block means that the messages that have been yielded have been successfully processed.
+
+You should also avoid sharing a synchronous producer between threads, as the internal buffers are not thread safe. However, the _asynchronous_ producer should be safe to use in a multi-threaded environment.
+
 ### Logging
 
 It's a very good idea to configure the Kafka client with a logger. All important operations and errors are logged. When instantiating your client, simply pass in a valid logger:
````
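The retry knobs in the README example translate directly into worst-case latency. As a rough back-of-the-envelope sketch (my own simplification — the real worst case also includes each attempt's connect and socket timeouts, which the README's "Understanding Timeouts" section covers):

```ruby
# Hypothetical values matching the README's example configuration.
max_retries   = 5 # retry attempts after the first try
retry_backoff = 5 # seconds to sleep between attempts

# Extra time spent purely waiting between attempts, ignoring the time each
# delivery attempt itself takes:
extra_wait = max_retries * retry_backoff
puts extra_wait # => 25
```

So bumping both values to 5 can add roughly 25 seconds of backoff on top of per-attempt timeouts before `#deliver_messages` finally raises.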
data/circle.yml
CHANGED

```diff
@@ -1,8 +1,9 @@
 machine:
+  pre:
+    - curl -sSL https://s3.amazonaws.com/circle-downloads/install-circleci-docker.sh | bash -s -- 1.10.0
   services:
     - docker
   environment:
-    LOG_TO_STDERR: true
     LOG_LEVEL: DEBUG
 
 dependencies:
@@ -14,6 +15,7 @@ dependencies:
 
 test:
   override:
-    - bundle exec rspec
+    - bundle exec rspec -r rspec_junit_formatter --format RspecJunitFormatter -o $CIRCLE_TEST_REPORTS/rspec/unit.xml
+    - bundle exec rspec -r rspec_junit_formatter --format RspecJunitFormatter -o $CIRCLE_TEST_REPORTS/rspec/functional.xml --tag functional
   post:
     - cp *.log $CIRCLE_ARTIFACTS/ || true
```
data/lib/kafka/async_producer.rb
CHANGED

```diff
@@ -211,8 +211,8 @@ module Kafka
 
     def deliver_messages
       @producer.deliver_messages
-    rescue DeliveryFailed
-      #
+    rescue DeliveryFailed, ConnectionError
+      # Failed to deliver messages -- nothing to do but try again later.
     end
 
     def threshold_reached?
```
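The widened `rescue` works because a single Ruby rescue clause may list several exception classes. A minimal standalone sketch of the pattern — the error classes here are stand-ins mirroring the library's names, not the real `Kafka::DeliveryFailed` and `Kafka::ConnectionError`:

```ruby
# Stand-in error classes mirroring the ones ruby-kafka defines.
class DeliveryFailed < StandardError; end
class ConnectionError < StandardError; end

def deliver_and_retry_later(error_to_raise)
  # Simulate a delivery attempt that fails with the given error.
  raise error_to_raise
rescue DeliveryFailed, ConnectionError => e
  # Swallow the failure; the async producer's timer loop retries later.
  "suppressed #{e.class}"
end

puts deliver_and_retry_later(ConnectionError.new)
# => suppressed ConnectionError
```

Before this change, a `ConnectionError` raised during delivery would have escaped the background thread; now both failure modes are deferred to the next delivery cycle.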
data/lib/kafka/client.rb
CHANGED

```diff
@@ -43,10 +43,11 @@ module Kafka
     def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil)
       @logger = logger || Logger.new(nil)
       @instrumenter = Instrumenter.new(client_id: client_id)
+      @seed_brokers = normalize_seed_brokers(seed_brokers)
 
       ssl_context = build_ssl_context(ssl_ca_cert, ssl_client_cert, ssl_client_cert_key)
 
-      connection_builder = ConnectionBuilder.new(
+      @connection_builder = ConnectionBuilder.new(
         client_id: client_id,
         connect_timeout: connect_timeout,
         socket_timeout: socket_timeout,
@@ -55,16 +56,7 @@ module Kafka
         instrumenter: @instrumenter,
       )
 
-
-        connection_builder: connection_builder,
-        logger: @logger,
-      )
-
-      @cluster = Cluster.new(
-        seed_brokers: normalize_seed_brokers(seed_brokers),
-        broker_pool: broker_pool,
-        logger: @logger,
-      )
+      @cluster = initialize_cluster
     end
 
     # Initializes a new Kafka producer.
@@ -105,7 +97,7 @@ module Kafka
       )
 
       Producer.new(
-        cluster:
+        cluster: initialize_cluster,
         logger: @logger,
         instrumenter: @instrumenter,
         compressor: compressor,
@@ -158,8 +150,10 @@ module Kafka
     # than the session window.
     # @return [Consumer]
     def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0, heartbeat_interval: 10)
+      cluster = initialize_cluster
+
       group = ConsumerGroup.new(
-        cluster:
+        cluster: cluster,
         logger: @logger,
         group_id: group_id,
         session_timeout: session_timeout,
@@ -178,7 +172,7 @@ module Kafka
       )
 
       Consumer.new(
-        cluster:
+        cluster: cluster,
         logger: @logger,
         instrumenter: @instrumenter,
         group: group,
@@ -286,6 +280,19 @@ module Kafka
 
     private
 
+    def initialize_cluster
+      broker_pool = BrokerPool.new(
+        connection_builder: @connection_builder,
+        logger: @logger,
+      )
+
+      Cluster.new(
+        seed_brokers: @seed_brokers,
+        broker_pool: broker_pool,
+        logger: @logger,
+      )
+    end
+
     def build_ssl_context(ca_cert, client_cert, client_cert_key)
       return nil unless ca_cert || client_cert || client_cert_key
 
```
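The refactor above extracts cluster construction into a private `initialize_cluster` helper, so every `#producer` and `#consumer` call builds its own `Cluster` backed by a fresh `BrokerPool`, rather than sharing the client's pool of connections. A toy sketch of the pattern — the class names here are simplified stand-ins for illustration, not the library's real constructors:

```ruby
# Toy stand-in illustrating "one fresh connection pool per collaborator".
Pool = Struct.new(:id)

class Client
  def initialize
    @counter = 0
  end

  def producer
    initialize_pool # each producer gets its own private pool
  end

  def consumer
    initialize_pool # consumers never share the producer's connections
  end

  private

  # Mirrors the role of ruby-kafka's private #initialize_cluster helper:
  # build a brand-new pool on every call instead of memoizing one.
  def initialize_pool
    @counter += 1
    Pool.new(@counter)
  end
end

client = Client.new
p1 = client.producer
c1 = client.consumer
puts p1.id == c1.id # => false -- separate pools
```

The benefit is that the network communication for a consumer and a producer created from the same client can no longer interleave on shared connections, which is what the v0.3.8 changelog entry describes.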
data/lib/kafka/message_buffer.rb
CHANGED

```diff
@@ -27,7 +27,7 @@ module Kafka
       buffer_for(topic, partition).concat(messages)
 
       @size += messages.count
-      @bytesize += messages.map(&:bytesize).reduce(:+)
+      @bytesize += messages.map(&:bytesize).reduce(0, :+)
     end
 
     def to_h
@@ -53,6 +53,8 @@ module Kafka
     #
     # @return [nil]
     def clear_messages(topic:, partition:)
+      return unless @buffer.key?(topic) && @buffer[topic].key?(partition)
+
       @size -= @buffer[topic][partition].count
       @bytesize -= @buffer[topic][partition].map(&:bytesize).reduce(:+)
 
@@ -70,6 +72,7 @@ module Kafka
     def clear
       @buffer = {}
       @size = 0
+      @bytesize = 0
     end
 
     private
```
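The `reduce(0, :+)` change matters because `Enumerable#reduce` without an initial value returns `nil` for an empty collection, so buffering an empty batch of messages would have crashed the `@bytesize +=` line. A minimal sketch of the failure mode:

```ruby
messages = [] # an empty batch of messages

# Without an initial value, reduce on an empty array yields nil, so
# `@bytesize += subtotal` would raise a TypeError:
subtotal = messages.map(&:bytesize).reduce(:+)
puts subtotal.inspect # => nil

# Seeding reduce with 0 keeps the arithmetic safe for empty batches:
safe_subtotal = messages.map(&:bytesize).reduce(0, :+)

bytesize = 0
bytesize += safe_subtotal
puts bytesize # => 0
```

The same nil-on-empty behavior explains the new guard in `clear_messages`: bailing out early when a topic or partition has no buffer avoids both a `NoMethodError` on a missing hash key and the `reduce(:+)` nil result.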
data/lib/kafka/version.rb
CHANGED
data/ruby-kafka.gemspec
CHANGED

```diff
@@ -37,4 +37,5 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency "activesupport", ">= 4.2.0", "< 5.1"
   spec.add_development_dependency "snappy"
   spec.add_development_dependency "colored"
+  spec.add_development_dependency "rspec_junit_formatter", "0.2.2"
 end
```
metadata
CHANGED

```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ruby-kafka
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.8
 platform: ruby
 authors:
 - Daniel Schierbeck
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-05-
+date: 2016-05-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -156,6 +156,20 @@ dependencies:
     - - ">="
     - !ruby/object:Gem::Version
       version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec_junit_formatter
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '='
+    - !ruby/object:Gem::Version
+      version: 0.2.2
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '='
+    - !ruby/object:Gem::Version
+      version: 0.2.2
 description: |-
   A client library for the Kafka distributed commit log.
```