ruby-kafka 0.2.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +171 -19
- data/examples/firehose-producer.rb +49 -0
- data/examples/ssl-producer.rb +42 -0
- data/lib/kafka/broker_pool.rb +3 -1
- data/lib/kafka/client.rb +117 -16
- data/lib/kafka/connection.rb +8 -2
- data/lib/kafka/consumer.rb +40 -39
- data/lib/kafka/consumer_group.rb +1 -19
- data/lib/kafka/offset_manager.rb +75 -0
- data/lib/kafka/producer.rb +2 -40
- data/lib/kafka/protocol/offset_fetch_response.rb +7 -4
- data/lib/kafka/ssl_socket_with_timeout.rb +154 -0
- data/lib/kafka/version.rb +1 -1
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 87132b87fc32443be48994590b059a88b6cc5fea
+  data.tar.gz: c32e7ed87e9dcc7c2ba6f3128db0aadb123d1962
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 84d5def3fa8963f928d2e59239e9c2678cfbe2ba904ad67bc13080a16b861bb5296394c442c14853144d065a4b86f205a8ba378e28302f67f8b80c5eaf0b7100
+  data.tar.gz: b39147eb0fd72f1753af38831871a00f597d0191a70f3f47734f87a2212b7d1500c0b1e712c36755b7ab09b27d66a26ea020f6a2fc413ea9ed9a0e2e15449341
data/CHANGELOG.md
CHANGED
@@ -4,6 +4,13 @@ Changes and additions to the library will be listed here.
 
 ## Unreleased
 
+## v0.3.0
+
+- Add support for encryption and authentication with SSL (Tom Crayford).
+- Allow configuring consumer offset commit policies.
+- Instrument consumer message processing.
+- Fixed an issue causing exceptions when no logger was specified.
+
 ## v0.2.0
 
 - Add instrumentation of message compression.
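The changelog entry "Allow configuring consumer offset commit policies" maps to the new `offset_commit_interval` and `offset_commit_threshold` options this release adds to `Kafka::Client#consumer` (see the client.rb changes further down). A standalone sketch of that commit-trigger logic; the `CommitPolicy` class below is hypothetical, written only to mirror the semantics of the new `OffsetManager` options, and is not library code:

```ruby
# Commit offsets when either a time interval has elapsed or a processed-message
# threshold has been crossed, whichever happens first. A value of zero disables
# the corresponding trigger, matching the documented consumer options.
class CommitPolicy
  def initialize(commit_interval:, commit_threshold:, clock: -> { Time.now })
    @commit_interval = commit_interval    # seconds between commits; 0 disables
    @commit_threshold = commit_threshold  # messages between commits; 0 disables
    @clock = clock
    @uncommitted = 0
    @last_commit = clock.call
  end

  # Called once per processed message.
  def mark_processed
    @uncommitted += 1
  end

  def commit_due?
    interval_due = @commit_interval > 0 &&
      (@clock.call - @last_commit) >= @commit_interval
    threshold_due = @commit_threshold > 0 &&
      @uncommitted >= @commit_threshold

    interval_due || threshold_due
  end

  def committed!
    @uncommitted = 0
    @last_commit = @clock.call
  end
end

# Commit every 3 messages, ignoring the timer:
policy = CommitPolicy.new(commit_interval: 0, commit_threshold: 3)
2.times { policy.mark_processed }
puts policy.commit_due?  # false
policy.mark_processed
puts policy.commit_due?  # true
```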
data/README.md
CHANGED
@@ -1,11 +1,29 @@
 # ruby-kafka
 
-[![Circle CI](https://circleci.com/gh/zendesk/ruby-kafka.svg?style=shield)](https://circleci.com/gh/zendesk/ruby-kafka/tree/master)
-
 A Ruby client library for [Apache Kafka](http://kafka.apache.org/), a distributed log and message bus. The focus of this library will be operational simplicity, with good logging and metrics that can make debugging issues easier.
 
 The Producer API is currently beta level and used in production. There's an alpha level Consumer Group API that has not yet been used in production and that may change without warning. Feel free to try it out but don't expect it to be stable or correct quite yet.
 
+Although parts of this library work with Kafka 0.8 – specifically, the Producer API – it's being tested and developed against Kafka 0.9. The Consumer API will be 0.9 only.
+
+#### Table of Contents
+
+1. [Installation](#installation)
+2. [Usage](#usage)
+   1. [Producing Messages to Kafka](#producing-messages-to-kafka)
+      1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
+      2. [Serialization](#serialization)
+      3. [Partitioning](#partitioning)
+      4. [Buffering and Error Handling](#buffering-and-error-handling)
+      5. [Message Delivery Guarantees](#message-delivery-guarantees)
+      6. [Compression](#compression)
+   2. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
+   3. [Logging](#logging)
+   4. [Understanding Timeouts](#understanding-timeouts)
+   5. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
+6. [Development](#development)
+7. [Roadmap](#roadmap)
+
 ## Installation
 
 Add this line to your application's Gemfile:
@@ -74,7 +92,7 @@ producer.deliver_messages
 
 Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafka/Producer) for more details.
 
-
+#### Asynchronously Producing Messages
 
 A normal producer will block while `#deliver_messages` is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
 
@@ -119,7 +137,7 @@ producer.produce("hello", topic: "greetings")
 
 **Note:** if the calling thread produces messages faster than the producer can write them to Kafka, you'll eventually run into problems. The internal queue used for sending messages from the calling thread to the background worker has a size limit; once this limit is reached, a call to `#produce` will raise `Kafka::BufferOverflow`.
 
-
+#### Serialization
 
 This library is agnostic to which serialization format you prefer. Both the value and key of a message are treated as binary strings of data. This makes it easier to use whatever serialization format you want, since you don't have to do anything special to make it work with ruby-kafka. Here's an example of encoding data with JSON:
 
@@ -139,12 +157,12 @@ data = JSON.dump(event)
 producer.produce(data, topic: "events")
 ```
 
-
+#### Partitioning
 
 Kafka topics are partitioned, with messages being assigned to a partition by the client. This allows a great deal of flexibility for the users. This section describes several strategies for partitioning and how they impact performance, data locality, etc.
 
 
-
+##### Load Balanced Partitioning
 
 When optimizing for efficiency, we either distribute messages as evenly as possible to all partitions, or make sure each producer always writes to a single partition. The former ensures an even load for downstream consumers; the latter ensures the highest producer performance, since message batching is done per partition.
 
@@ -163,7 +181,7 @@ producer.produce(msg2, topic: "messages", partition_key: partition_key)
 
 You can also base the partition key on some property of the producer, for example the host name.
 
-
+##### Semantic Partitioning
 
 By assigning messages to a partition based on some property of the message, e.g. making sure all events tracked in a user session are assigned to the same partition, downstream consumers can make simplifying assumptions about data locality. In this example, a consumer can keep process-local state pertaining to a user session, knowing that all events for the session will be read from a single partition. This is also called _semantic partitioning_, since the partition assignment is part of the application behavior.
 
@@ -185,7 +203,7 @@ partition = some_number % partitions
 producer.produce(event, topic: "events", partition: partition)
 ```
 
-
+##### Compatibility with Other Clients
 
 There's no standardized way to assign messages to partitions across different Kafka client implementations. If you have a heterogeneous set of clients producing messages to the same topics it may be important to ensure a consistent partitioning scheme. This library doesn't try to implement all schemes, so you'll have to figure out which scheme the other client is using and replicate it. An example:
 
@@ -198,7 +216,7 @@ partition = PartitioningScheme.assign(partitions, event)
 producer.produce(event, topic: "events", partition: partition)
 ```
 
-
+#### Buffering and Error Handling
 
 The producer is designed for resilience in the face of temporary network errors, Kafka broker failovers, and other issues that prevent the client from writing messages to the destination topics. It does this by employing local, in-memory buffers. Only when messages are acknowledged by a Kafka broker will they be removed from the buffer.
 
@@ -208,6 +226,108 @@ Note that there's a maximum buffer size; pass in a different value for `max_buff
 
 A final note on buffers: local buffers give resilience against broker and network failures, and allow higher throughput due to message batching, but they also trade off consistency guarantees for higher availability and resilience. If your local process dies while messages are buffered, those messages will be lost. If you require high levels of consistency, you should call `#deliver_messages` immediately after `#produce`.
 
+#### Message Delivery Guarantees
+
+There are basically two different and incompatible guarantees that can be made in a message delivery system such as Kafka:
+
+1. _at-most-once_ delivery guarantees that a message is delivered to the recipient at most _once_. This is useful only if delivering the message twice carries some risk and should be avoided. Implicit is the fact that there's no guarantee that the message will be delivered at all.
+2. _at-least-once_ delivery guarantees that a message is delivered, but it may be delivered more than once. If the final recipient de-duplicates messages, e.g. by checking a unique message id, then it's even possible to implement _exactly-once_ delivery.
+
+Of these two options, ruby-kafka implements the second one: when in doubt about whether a message has been delivered, a producer will try to deliver it again.
+
+The guarantee is made only for the synchronous producer and boils down to this:
+
+```ruby
+producer = kafka.producer
+
+producer.produce("hello", topic: "greetings")
+
+# If this line fails with Kafka::DeliveryFailed we *may* have succeeded in delivering
+# the message to Kafka but won't know for sure.
+producer.deliver_messages
+
+# If we get to this line we can be sure that the message has been delivered to Kafka!
+```
+
+That is, once `#deliver_messages` returns we can be sure that Kafka has received the message. Note that there are some big caveats here:
+
+- Depending on how your cluster and topic are configured, the message could still be lost by Kafka.
+- If you configure the producer to not require acknowledgements from the Kafka brokers by setting `required_acks` to zero, there is no guarantee that the message will ever make it to a Kafka broker.
+- If you use the asynchronous producer, there's no guarantee that messages will have been delivered after `#deliver_messages` returns. A way of blocking until a message has been delivered with the asynchronous producer may be implemented in the future.
+
+#### Compression
+
+Depending on what kind of data you produce, enabling compression may yield improved bandwidth and space usage. Compression in Kafka is done on entire message sets rather than on individual messages. This improves the compression rate and generally means that compression works better the larger your buffers get, since the message sets will be larger by the time they're compressed.
+
+Since many workloads have variations in throughput and distribution across partitions, it's possible to configure a threshold for when to enable compression by setting `compression_threshold`. Only if the defined number of messages are buffered for a partition will the messages be compressed.
+
+Compression is enabled by passing the `compression_codec` parameter to `#producer` with the name of one of the algorithms allowed by Kafka:
+
+* `:snappy` for [Snappy](http://google.github.io/snappy/) compression.
+* `:gzip` for [gzip](https://en.wikipedia.org/wiki/Gzip) compression.
+
+By default, all message sets will be compressed if you specify a compression codec. To increase the compression threshold, set `compression_threshold` to an integer value higher than one.
+
+```ruby
+producer = kafka.producer(
+  compression_codec: :snappy,
+  compression_threshold: 10,
+)
+```
+
+### Consuming Messages from Kafka
+
+**Warning:** The Consumer API is still alpha level and will likely change. The consumer code should not be considered stable, as it hasn't been exhaustively tested in production environments yet.
+
+The simplest way to consume messages from a Kafka topic is using the `#fetch_messages` API:
+
+```ruby
+require "kafka"
+
+kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
+
+messages = kafka.fetch_messages(topic: "greetings", partition: 42)
+
+messages.each do |message|
+  puts message.offset, message.key, message.value
+end
+```
+
+While this is great for extremely simple use cases, there are a number of downsides:
+
+- You can only fetch from a single topic and partition at a time.
+- If you want to have multiple processes consume from the same topic, there's no way of coordinating which processes should fetch from which partitions.
+- If a process dies, there's no way to have another process resume fetching from the point in the partition that the original process had reached.
+
+The Consumer API solves all of these issues, and more. It uses the Consumer Groups feature released in Kafka 0.9 to allow multiple consumer processes to coordinate access to a topic, assigning each partition to a single consumer. When a consumer fails, the partitions that were assigned to it are re-assigned to other members of the group.
+
+Using the API is simple:
+
+```ruby
+require "kafka"
+
+kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
+
+# Consumers with the same group id will form a Consumer Group together.
+consumer = kafka.consumer(group_id: "my-consumer")
+
+consumer.subscribe("greetings")
+
+begin
+  # This will loop indefinitely, yielding each message in turn.
+  consumer.each_message do |message|
+    puts message.topic, message.partition
+    puts message.offset, message.key, message.value
+  end
+ensure
+  # Always make sure to shut down the consumer properly.
+  consumer.shutdown
+end
+```
+
+Each consumer process will be assigned one or more partitions from each topic that the group subscribes to. In order to handle more messages, simply start more processes.
+
+
 ### Logging
 
 It's a very good idea to configure the Kafka client with a logger. All important operations and errors are logged. When instantiating your client, simply pass in a valid logger:
@@ -239,33 +359,65 @@ When sending many messages, it's likely that the client needs to send some messa
 
 Make sure your application can survive being blocked for so long.
 
+### Encryption and Authentication using SSL
+
+By default, communication between Kafka clients and brokers is unencrypted and unauthenticated. Kafka 0.9 added optional support for [encryption and client authentication and authorization](http://kafka.apache.org/documentation.html#security_ssl). There are two layers of security made possible by this:
+
+#### Encryption of Communication
+
+By enabling SSL encryption you can have some confidence that messages can be sent to Kafka over an untrusted network without being intercepted.
+
+In this case you just need to pass a valid CA certificate as a string when configuring your `Kafka` client:
+
+```ruby
+kafka = Kafka.new(
+  ssl_ca_cert: File.read('my_ca_cert.pem'),
+  # ...
+)
+```
+
+Without passing the CA certificate to the client it would be impossible to protect against [man-in-the-middle attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack).
+
+#### Client Authentication
+
+In order to authenticate the client to the cluster, you need to pass in a certificate and key created for the client and trusted by the brokers.
+
+```ruby
+kafka = Kafka.new(
+  ssl_ca_cert: File.read('my_ca_cert.pem'),
+  ssl_client_cert: File.read('my_client_cert.pem'),
+  ssl_client_cert_key: File.read('my_client_cert_key.pem'),
+  # ...
+)
+```
+
+Once client authentication is set up, it is possible to configure the Kafka cluster to [authorize client requests](http://kafka.apache.org/documentation.html#security_authz).
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
 **Note:** the specs require a working [Docker](https://www.docker.com/) instance, but should work out of the box if you have Docker installed. Please create an issue if that's not the case.
 
-
-
-The current stable release is v0.1. This release is running in production at Zendesk, but it's still not recommended that you use it when data loss is unacceptable. It will take a little while until all edge cases have been uncovered and handled.
+[![Circle CI](https://circleci.com/gh/zendesk/ruby-kafka.svg?style=shield)](https://circleci.com/gh/zendesk/ruby-kafka/tree/master)
 
-
+## Roadmap
 
-
+The current stable release is v0.2. This release is running in production at Zendesk, but it's still not recommended that you use it when data loss is unacceptable. It will take a little while until all edge cases have been uncovered and handled.
 
-
+### v0.3
 
-
+Beta release of the Consumer API, allowing balanced Consumer Groups coordinating access to partitions. Kafka 0.9 only.
 
-### v1.0
+### v1.0
 
-
+API freeze. All new changes will be backwards compatible.
 
 ## Why a new library?
 
 There are a few existing Kafka clients in Ruby:
 
-* [Poseidon](https://github.com/bpot/poseidon) seems to work for Kafka 0.8, but the project
+* [Poseidon](https://github.com/bpot/poseidon) seems to work for Kafka 0.8, but the project is unmaintained and has known issues.
 * [Hermann](https://github.com/reiseburo/hermann) wraps the C library [librdkafka](https://github.com/edenhill/librdkafka) and seems to be very efficient, but its API and mode of operation is too intrusive for our needs.
 * [jruby-kafka](https://github.com/joekiller/jruby-kafka) is a great option if you're running on JRuby.
 
data/examples/firehose-producer.rb
ADDED
@@ -0,0 +1,49 @@
+$LOAD_PATH.unshift(File.expand_path("../../lib", __FILE__))
+
+require "kafka"
+
+KAFKA_CLIENT_CERT = ENV.fetch("KAFKA_CLIENT_CERT")
+KAFKA_CLIENT_CERT_KEY = ENV.fetch("KAFKA_CLIENT_CERT_KEY")
+KAFKA_SERVER_CERT = ENV.fetch("KAFKA_SERVER_CERT")
+KAFKA_URL = ENV.fetch("KAFKA_URL")
+KAFKA_BROKERS = KAFKA_URL.gsub("kafka+ssl://", "").split(",")
+KAFKA_TOPIC = "test-messages"
+
+NUM_THREADS = 20
+
+threads = NUM_THREADS.times.map do
+  Thread.new do
+    logger = Logger.new($stderr)
+    logger.level = Logger::INFO
+
+    kafka = Kafka.new(
+      seed_brokers: KAFKA_BROKERS,
+      logger: logger,
+      ssl_client_cert: KAFKA_CLIENT_CERT,
+      ssl_client_cert_key: KAFKA_CLIENT_CERT_KEY,
+      ssl_ca_cert: KAFKA_SERVER_CERT,
+    )
+
+    producer = kafka.async_producer(
+      delivery_interval: 1,
+      max_queue_size: 5_000,
+      max_buffer_size: 10_000,
+    )
+
+    begin
+      loop do
+        producer.produce(rand.to_s, key: rand.to_s, topic: KAFKA_TOPIC)
+      end
+    rescue Kafka::BufferOverflow
+      logger.error "Buffer overflow, backing off for 1s"
+      sleep 1
+      retry
+    ensure
+      producer.shutdown
+    end
+  end
+end
+
+threads.each {|t| t.abort_on_exception = true }
+
+threads.map(&:join)
data/examples/ssl-producer.rb
ADDED
@@ -0,0 +1,42 @@
+# Reads lines from STDIN, writing them to Kafka.
+
+$LOAD_PATH.unshift(File.expand_path("../../lib", __FILE__))
+
+require "kafka"
+
+logger = Logger.new($stderr)
+brokers = ENV.fetch("KAFKA_BROKERS").split(",")
+
+# Make sure to create this topic in your Kafka cluster or configure the
+# cluster to auto-create topics.
+topic = "page-visits"
+
+ssl_context = OpenSSL::SSL::SSLContext.new
+ssl_context.set_params(
+  cert: OpenSSL::X509::Certificate.new(ENV.fetch("KAFKA_CLIENT_CERT")),
+  key: OpenSSL::PKey::RSA.new(ENV.fetch("KAFKA_CLIENT_CERT_KEY")),
+)
+
+kafka = Kafka.new(
+  seed_brokers: brokers,
+  client_id: "ssl-producer",
+  logger: logger,
+  ssl: true,
+  ssl_context: ssl_context,
+)
+
+producer = kafka.producer
+
+begin
+  $stdin.each_with_index do |line, index|
+    producer.produce(line, topic: topic)
+
+    # Send messages for every 10 lines.
+    producer.deliver_messages if index % 10 == 0
+  end
+ensure
+  # Make sure to send any remaining messages.
+  producer.deliver_messages
+
+  producer.shutdown
+end
data/lib/kafka/broker_pool.rb
CHANGED
@@ -2,12 +2,13 @@ require "kafka/broker"
 
 module Kafka
   class BrokerPool
-    def initialize(client_id:, connect_timeout: nil, socket_timeout: nil, logger:)
+    def initialize(client_id:, connect_timeout: nil, socket_timeout: nil, logger:, ssl_context: nil)
       @client_id = client_id
       @connect_timeout = connect_timeout
       @socket_timeout = socket_timeout
       @logger = logger
       @brokers = {}
+      @ssl_context = ssl_context
     end
 
     def connect(host, port, node_id: nil)
@@ -21,6 +22,7 @@ module Kafka
         connect_timeout: @connect_timeout,
         socket_timeout: @socket_timeout,
         logger: @logger,
+        ssl_context: @ssl_context,
       )
 
       @brokers[node_id] = broker unless node_id.nil?
data/lib/kafka/client.rb
CHANGED
@@ -1,3 +1,5 @@
+require "openssl"
+
 require "kafka/cluster"
 require "kafka/producer"
 require "kafka/consumer"
@@ -23,32 +25,83 @@ module Kafka
     # @param socket_timeout [Integer, nil] the timeout setting for socket
     #   connections. See {BrokerPool#initialize}.
     #
+    # @param ssl_ca_cert [String, nil] a PEM encoded CA cert to use with an
+    #   SSL connection.
+    #
+    # @param ssl_client_cert [String, nil] a PEM encoded client cert to use with an
+    #   SSL connection. Must be used in combination with ssl_client_cert_key.
+    #
+    # @param ssl_client_cert_key [String, nil] a PEM encoded client cert key to use with an
+    #   SSL connection. Must be used in combination with ssl_client_cert.
+    #
     # @return [Client]
-    def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil)
-      @logger = logger || Logger.new(
+    def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil)
+      @logger = logger || Logger.new(nil)
+
+      ssl_context = build_ssl_context(ssl_ca_cert, ssl_client_cert, ssl_client_cert_key)
 
       broker_pool = BrokerPool.new(
        client_id: client_id,
        connect_timeout: connect_timeout,
        socket_timeout: socket_timeout,
-       logger: logger,
+       logger: @logger,
+       ssl_context: ssl_context,
      )
 
      @cluster = Cluster.new(
        seed_brokers: seed_brokers,
        broker_pool: broker_pool,
-       logger: logger,
+       logger: @logger,
      )
    end
 
-    #
+    # Initializes a new Kafka producer.
+    #
+    # @param ack_timeout [Integer] The number of seconds a broker can wait for
+    #   replicas to acknowledge a write before responding with a timeout.
+    #
+    # @param required_acks [Integer] The number of replicas that must acknowledge
+    #   a write.
     #
-    #
+    # @param max_retries [Integer] the number of retries that should be attempted
+    #   before giving up sending messages to the cluster. Does not include the
+    #   original attempt.
+    #
+    # @param retry_backoff [Integer] the number of seconds to wait between retries.
+    #
+    # @param max_buffer_size [Integer] the number of messages allowed in the buffer
+    #   before new writes will raise {BufferOverflow} exceptions.
+    #
+    # @param max_buffer_bytesize [Integer] the maximum size of the buffer in bytes.
+    #   attempting to produce messages when the buffer reaches this size will
+    #   result in {BufferOverflow} being raised.
+    #
+    # @param compression_codec [Symbol, nil] the name of the compression codec to
+    #   use, or nil if no compression should be performed. Valid codecs: `:snappy`
+    #   and `:gzip`.
+    #
+    # @param compression_threshold [Integer] the number of messages that needs to
+    #   be in a message set before it should be compressed. Note that message sets
+    #   are per-partition rather than per-topic or per-producer.
     #
-    # @see Producer#initialize
     # @return [Kafka::Producer] the Kafka producer.
-    def producer(
-
+    def producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: 1, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
+      compressor = Compressor.new(
+        codec_name: compression_codec,
+        threshold: compression_threshold,
+      )
+
+      Producer.new(
+        cluster: @cluster,
+        logger: @logger,
+        compressor: compressor,
+        ack_timeout: ack_timeout,
+        required_acks: required_acks,
+        max_retries: max_retries,
+        retry_backoff: retry_backoff,
+        max_buffer_size: max_buffer_size,
+        max_buffer_bytesize: max_buffer_bytesize,
+      )
    end
 
    # Creates a new AsyncProducer instance.
@@ -76,17 +129,38 @@ module Kafka
      )
    end
 
-    # Creates a new
-    #
-    #
-    #
-    #
+    # Creates a new Kafka consumer.
+    #
+    # @param group_id [String] the id of the group that the consumer should join.
+    # @param session_timeout [Integer] the number of seconds after which, if a client
+    #   hasn't contacted the Kafka cluster, it will be kicked out of the group.
+    # @param offset_commit_interval [Integer] the interval between offset commits,
+    #   in seconds.
+    # @param offset_commit_threshold [Integer] the number of messages that can be
+    #   processed before their offsets are committed. If zero, offset commits are
+    #   not triggered by message processing.
    # @return [Consumer]
-    def consumer(
+    def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0)
+      group = ConsumerGroup.new(
+        cluster: @cluster,
+        logger: @logger,
+        group_id: group_id,
+        session_timeout: session_timeout,
+      )
+
+      offset_manager = OffsetManager.new(
+        group: group,
+        logger: @logger,
+        commit_interval: offset_commit_interval,
+        commit_threshold: offset_commit_threshold,
+      )
+
      Consumer.new(
        cluster: @cluster,
        logger: @logger,
-
+        group: group,
+        offset_manager: offset_manager,
+        session_timeout: session_timeout,
      )
    end
 
@@ -185,5 +259,32 @@ module Kafka
    def close
      @cluster.disconnect
    end
+
+    private
+
+    def build_ssl_context(ca_cert, client_cert, client_cert_key)
+      return nil unless ca_cert || client_cert || client_cert_key
+
+      ssl_context = OpenSSL::SSL::SSLContext.new
+
+      if client_cert && client_cert_key
+        ssl_context.set_params(
+          cert: OpenSSL::X509::Certificate.new(client_cert),
+          key: OpenSSL::PKey::RSA.new(client_cert_key)
+        )
+      elsif client_cert && !client_cert_key
+        raise ArgumentError, "Kafka client initialized with `ssl_client_cert` but no `ssl_client_cert_key`. Please provide both."
+      elsif !client_cert && client_cert_key
+        raise ArgumentError, "Kafka client initialized with `ssl_client_cert_key`, but no `ssl_client_cert`. Please provide both."
+      end
+
+      if ca_cert
+        store = OpenSSL::X509::Store.new
+        store.add_cert(OpenSSL::X509::Certificate.new(ca_cert))
+        ssl_context.cert_store = store
+      end
+
+      ssl_context
+    end
  end
end
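The new `build_ssl_context` in client.rb turns PEM strings into an `OpenSSL::SSL::SSLContext`. Below is a runnable sketch of the same construction that generates a throwaway self-signed certificate on the spot so no files are needed; all certificate details (`/CN=example-client`, serial, lifetime) are invented for illustration:

```ruby
require "openssl"

# Generate a throwaway key and self-signed certificate so the sketch is
# self-contained; in real use the PEM strings come from files or env vars.
key = OpenSSL::PKey::RSA.new(2048)
name = OpenSSL::X509::Name.parse("/CN=example-client")

cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 1
cert.subject = name
cert.issuer = name              # self-signed: issuer == subject
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new("SHA256"))

ca_cert_pem = cert.to_pem       # doubling as its own CA for the sketch
client_cert_pem = cert.to_pem
client_cert_key_pem = key.to_pem

# Same shape as the new Kafka::Client#build_ssl_context: the client cert and
# key go into the context, the CA cert goes into a trust store.
ssl_context = OpenSSL::SSL::SSLContext.new
ssl_context.set_params(
  cert: OpenSSL::X509::Certificate.new(client_cert_pem),
  key: OpenSSL::PKey::RSA.new(client_cert_key_pem),
)

store = OpenSSL::X509::Store.new
store.add_cert(OpenSSL::X509::Certificate.new(ca_cert_pem))
ssl_context.cert_store = store

puts ssl_context.cert.subject
```

Note the guard rails in the real method: passing a client cert without its key (or vice versa) raises `ArgumentError` rather than silently producing a context that cannot authenticate.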
data/lib/kafka/connection.rb
CHANGED
@@ -1,5 +1,6 @@
 require "stringio"
 require "kafka/socket_with_timeout"
+require "kafka/ssl_socket_with_timeout"
 require "kafka/instrumentation"
 require "kafka/protocol/request_message"
 require "kafka/protocol/encoder"
@@ -42,12 +43,13 @@ module Kafka
    #   broker. Default is 10 seconds.
    #
    # @return [Connection] a new connection.
-    def initialize(host:, port:, client_id:, logger:, connect_timeout: nil, socket_timeout: nil)
+    def initialize(host:, port:, client_id:, logger:, connect_timeout: nil, socket_timeout: nil, ssl_context: nil)
      @host, @port, @client_id = host, port, client_id
      @logger = logger
 
      @connect_timeout = connect_timeout || CONNECT_TIMEOUT
      @socket_timeout = socket_timeout || SOCKET_TIMEOUT
+      @ssl_context = ssl_context
    end
 
    def to_s
@@ -101,7 +103,11 @@ module Kafka
    def open
      @logger.debug "Opening connection to #{@host}:#{@port} with client id #{@client_id}..."
 
-
+      if @ssl_context
+        @socket = SSLSocketWithTimeout.new(@host, @port, connect_timeout: @connect_timeout, timeout: @socket_timeout, ssl_context: @ssl_context)
+      else
+        @socket = SocketWithTimeout.new(@host, @port, connect_timeout: @connect_timeout, timeout: @socket_timeout)
+      end
 
      @encoder = Kafka::Protocol::Encoder.new(@socket)
      @decoder = Kafka::Protocol::Decoder.new(@socket)
data/lib/kafka/consumer.rb CHANGED
@@ -1,4 +1,5 @@
 require "kafka/consumer_group"
+require "kafka/offset_manager"
 require "kafka/fetch_operation"

 module Kafka
@@ -50,28 +51,12 @@ module Kafka
     #
     class Consumer

-
-    #
-    # @param cluster [Kafka::Cluster]
-    # @param logger [Logger]
-    # @param group_id [String] the id of the group that the consumer should join.
-    # @param session_timeout [Integer] the interval between consumer heartbeats,
-    #   in seconds.
-    def initialize(cluster:, logger:, group_id:, session_timeout: 30)
+    def initialize(cluster:, logger:, group:, offset_manager:, session_timeout:)
       @cluster = cluster
       @logger = logger
-      @group_id = group_id
+      @group = group
+      @offset_manager = offset_manager
       @session_timeout = session_timeout
-
-      @group = ConsumerGroup.new(
-        cluster: cluster,
-        logger: logger,
-        group_id: group_id,
-        session_timeout: @session_timeout,
-      )
-
-      @offsets = {}
-      @default_offsets = {}
     end

     # Subscribes the consumer to a topic.
@@ -87,7 +72,7 @@ module Kafka
     # @return [nil]
     def subscribe(topic, default_offset: :earliest)
       @group.subscribe(topic)
-      @default_offsets[topic] = default_offset
+      @offset_manager.set_default_offset(topic, default_offset)

       nil
     end
@@ -111,15 +96,32 @@ module Kafka
       batch = fetch_batch

       batch.each do |message|
-        yield message
+        Instrumentation.instrument("process_message.consumer.kafka") do |notification|
+          notification.update(
+            topic: message.topic,
+            partition: message.partition,
+            offset: message.offset,
+            key: message.key,
+            value: message.value,
+          )
+
+          yield message
+        end
+
+        @offset_manager.commit_offsets_if_necessary

         send_heartbeat_if_necessary
         mark_message_as_processed(message)
       end
     rescue ConnectionError => e
-      @logger.error "Connection error while
-
-
+      @logger.error "Connection error while sending heartbeat; rejoining"
+      join_group
+    rescue UnknownMemberId
+      @logger.error "Kicked out of group; rejoining"
+      join_group
+    rescue RebalanceInProgress
+      @logger.error "Group is rebalancing; rejoining"
+      join_group
     end
   end
 end
@@ -137,13 +139,20 @@ module Kafka
     #
     # @return [nil]
     def shutdown
+      @offset_manager.commit_offsets
       @group.leave
+    rescue ConnectionError
     end

     private

+    def join_group
+      @offset_manager.clear_offsets
+      @group.join
+    end
+
     def fetch_batch
-
+      join_group unless @group.member?

       @logger.debug "Fetching a batch of messages"

@@ -160,15 +169,9 @@ module Kafka
         max_wait_time: 5,
       )

-      offset_response = @group.fetch_offsets
-
       assigned_partitions.each do |topic, partitions|
         partitions.each do |partition|
-          offset = @offsets.fetch(topic, {}).fetch(partition) {
-            offset_response.offset_for(topic, partition)
-          }
-
-          offset = @default_offsets.fetch(topic) if offset < 0
+          offset = @offset_manager.next_offset_for(topic, partition)

           @logger.debug "Fetching from #{topic}/#{partition} starting at offset #{offset}"

@@ -178,14 +181,13 @@ module Kafka

       messages = operation.execute

-      @logger.
+      @logger.info "Fetched #{messages.count} messages"

       messages
-
+    rescue ConnectionError => e
+      @logger.error "Connection error while fetching messages: #{e}"

-
-      @logger.debug "Committing offsets"
-      @group.commit_offsets(@offsets)
+      return []
     end

     # Sends a heartbeat if it would be necessary in order to avoid getting
@@ -204,8 +206,7 @@ module Kafka
     end

     def mark_message_as_processed(message)
-      @offsets[message.topic] ||= {}
-      @offsets[message.topic][message.partition] = message.offset + 1
+      @offset_manager.mark_as_processed(message.topic, message.partition, message.offset)
     end
   end
 end
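The consumer diff above wraps each yielded message in a `process_message.consumer.kafka` instrumentation event. A runnable sketch of that pattern; `FakeNotification` and the `instrument` helper are simplified stand-ins (in ruby-kafka, `Kafka::Instrumentation` forwards to ActiveSupport-style notifications when they are available):

```ruby
# Stand-in for the notification object an instrumentation backend would pass in.
class FakeNotification
  attr_reader :payload

  def initialize
    @payload = {}
  end

  def update(hash)
    @payload.update(hash)
  end
end

# Stand-in for Instrumentation.instrument: yields a notification, records the event.
def instrument(name, events)
  notification = FakeNotification.new
  yield notification
  events << [name, notification.payload]
end

Message = Struct.new(:topic, :partition, :offset, :key, :value)

events = []
batch = [Message.new("greetings", 0, 41, nil, "hello")]

batch.each do |message|
  instrument("process_message.consumer.kafka", events) do |notification|
    notification.update(
      topic: message.topic,
      partition: message.partition,
      offset: message.offset,
      value: message.value,
    )
    # The message itself would be yielded to the application block here.
  end
end
```

Because the yield happens inside the instrumented block, the emitted event also measures the application's processing time, not just the consumer's bookkeeping.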
data/lib/kafka/consumer_group.rb CHANGED
@@ -44,6 +44,7 @@ module Kafka
     def leave
       @logger.info "[#{@member_id}] Leaving group `#{@group_id}`"
       coordinator.leave_group(group_id: @group_id, member_id: @member_id)
+    rescue ConnectionError
     end

     def fetch_offsets
@@ -66,14 +67,6 @@ module Kafka
           Protocol.handle_error(error_code)
         end
       end
-    rescue UnknownMemberId
-      @logger.error "Kicked out of group; rejoining"
-      join
-      retry
-    rescue IllegalGeneration
-      @logger.error "Illegal generation #{@generation_id}; rejoining group"
-      join
-      retry
     end

     def heartbeat
@@ -86,15 +79,6 @@ module Kafka
       )

       Protocol.handle_error(response.error_code)
-    rescue ConnectionError => e
-      @logger.error "Connection error while sending heartbeat; rejoining"
-      join
-    rescue UnknownMemberId
-      @logger.error "Kicked out of group; rejoining"
-      join
-    rescue RebalanceInProgress
-      @logger.error "Group is rebalancing; rejoining"
-      join
     end

     private
@@ -130,8 +114,6 @@ module Kafka
     end

     def synchronize
-      @logger.info "[#{@member_id}] Synchronizing group"
-
       group_assignment = {}

       if group_leader?
data/lib/kafka/offset_manager.rb ADDED
@@ -0,0 +1,75 @@
+module Kafka
+  class OffsetManager
+    def initialize(group:, logger:, commit_interval:, commit_threshold:)
+      @group = group
+      @logger = logger
+      @commit_interval = commit_interval
+      @commit_threshold = commit_threshold
+
+      @uncommitted_offsets = 0
+      @processed_offsets = {}
+      @default_offsets = {}
+      @committed_offsets = nil
+      @last_commit = Time.at(0)
+    end
+
+    def set_default_offset(topic, default_offset)
+      @default_offsets[topic] = default_offset
+    end
+
+    def mark_as_processed(topic, partition, offset)
+      @uncommitted_offsets += 1
+      @processed_offsets[topic] ||= {}
+      @processed_offsets[topic][partition] = offset + 1
+    end
+
+    def next_offset_for(topic, partition)
+      offset = @processed_offsets.fetch(topic, {}).fetch(partition) {
+        committed_offset_for(topic, partition)
+      }
+
+      offset = @default_offsets.fetch(topic) if offset < 0
+
+      offset
+    end
+
+    def commit_offsets
+      unless @processed_offsets.empty?
+        @logger.info "Committing offsets for #{@uncommitted_offsets} messages"
+
+        @group.commit_offsets(@processed_offsets)
+
+        @last_commit = Time.now
+        @processed_offsets.clear
+        @uncommitted_offsets = 0
+      end
+    end
+
+    def commit_offsets_if_necessary
+      if seconds_since_last_commit >= @commit_interval || commit_threshold_reached?
+        commit_offsets
+      end
+    end
+
+    def clear_offsets
+      @uncommitted_offsets = 0
+      @processed_offsets.clear
+      @committed_offsets = nil
+    end
+
+    private
+
+    def seconds_since_last_commit
+      Time.now - @last_commit
+    end
+
+    def committed_offset_for(topic, partition)
+      @committed_offsets ||= @group.fetch_offsets
+      @committed_offsets.offset_for(topic, partition)
+    end
+
+    def commit_threshold_reached?
+      @commit_threshold != 0 && @uncommitted_offsets >= @commit_threshold
+    end
+  end
+end
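The new `OffsetManager` above centralizes the consumer's offset bookkeeping: it tracks processed offsets per topic/partition, falls back to the committed or default offset, and commits when the interval elapses or the uncommitted-message threshold is hit. A condensed, runnable sketch of that behavior (logger dropped, default offset given as a plain integer; in ruby-kafka the `:earliest`/`:latest` symbols are resolved elsewhere), with a fake group standing in for `Kafka::ConsumerGroup`:

```ruby
# Fake fetch-offsets response: -1 signals "no committed offset".
FakeOffsetResponse = Struct.new(:offsets) do
  def offset_for(topic, partition)
    offsets.fetch([topic, partition], -1)
  end
end

# Stand-in for Kafka::ConsumerGroup that records committed snapshots.
class FakeGroup
  attr_reader :commits

  def initialize
    @commits = []
  end

  def commit_offsets(offsets)
    @commits << Marshal.load(Marshal.dump(offsets)) # deep-copy the snapshot
  end

  def fetch_offsets
    FakeOffsetResponse.new({})
  end
end

# Condensed version of the OffsetManager in the diff above.
class MiniOffsetManager
  def initialize(group:, commit_interval:, commit_threshold:)
    @group = group
    @commit_interval = commit_interval
    @commit_threshold = commit_threshold
    @uncommitted_offsets = 0
    @processed_offsets = {}
    @default_offsets = {}
    @last_commit = Time.at(0)
  end

  def set_default_offset(topic, offset)
    @default_offsets[topic] = offset
  end

  def mark_as_processed(topic, partition, offset)
    @uncommitted_offsets += 1
    (@processed_offsets[topic] ||= {})[partition] = offset + 1
  end

  def next_offset_for(topic, partition)
    offset = @processed_offsets.fetch(topic, {}).fetch(partition) {
      @group.fetch_offsets.offset_for(topic, partition)
    }
    offset < 0 ? @default_offsets.fetch(topic) : offset
  end

  def commit_offsets_if_necessary
    interval_elapsed = Time.now - @last_commit >= @commit_interval
    threshold_hit = @commit_threshold != 0 && @uncommitted_offsets >= @commit_threshold
    commit_offsets if interval_elapsed || threshold_hit
  end

  def commit_offsets
    return if @processed_offsets.empty?
    @group.commit_offsets(@processed_offsets)
    @last_commit = Time.now
    @processed_offsets = {}
    @uncommitted_offsets = 0
  end
end

group = FakeGroup.new
manager = MiniOffsetManager.new(group: group, commit_interval: 10, commit_threshold: 100)

manager.set_default_offset("events", 0)
manager.mark_as_processed("events", 0, 41)

next_offset = manager.next_offset_for("events", 0) # offset after the last processed message
fresh = manager.next_offset_for("events", 1)       # nothing processed or committed -> default
manager.commit_offsets_if_necessary                # @last_commit starts at the epoch -> commits
```

Note that storing `offset + 1` means the committed value is the *next* offset to fetch, matching Kafka's commit semantics.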
data/lib/kafka/producer.rb CHANGED
@@ -130,41 +130,7 @@ module Kafka
   #
   class Producer

-
-    #
-    # @param cluster [Cluster] the cluster client. Typically passed in for you.
-    #
-    # @param logger [Logger] the logger that should be used. Typically passed
-    #   in for you.
-    #
-    # @param ack_timeout [Integer] The number of seconds a broker can wait for
-    #   replicas to acknowledge a write before responding with a timeout.
-    #
-    # @param required_acks [Integer] The number of replicas that must acknowledge
-    #   a write.
-    #
-    # @param max_retries [Integer] the number of retries that should be attempted
-    #   before giving up sending messages to the cluster. Does not include the
-    #   original attempt.
-    #
-    # @param retry_backoff [Integer] the number of seconds to wait between retries.
-    #
-    # @param max_buffer_size [Integer] the number of messages allowed in the buffer
-    #   before new writes will raise {BufferOverflow} exceptions.
-    #
-    # @param max_buffer_bytesize [Integer] the maximum size of the buffer in bytes.
-    #   attempting to produce messages when the buffer reaches this size will
-    #   result in {BufferOverflow} being raised.
-    #
-    # @param compression_codec [Symbol, nil] the name of the compression codec to
-    #   use, or nil if no compression should be performed. Valid codecs: `:snappy`
-    #   and `:gzip`.
-    #
-    # @param compression_threshold [Integer] the number of messages that needs to
-    #   be in a message set before it should be compressed. Note that message sets
-    #   are per-partition rather than per-topic or per-producer.
-    #
-    def initialize(cluster:, logger:, compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: 1, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
+    def initialize(cluster:, logger:, compressor:, ack_timeout:, required_acks:, max_retries:, retry_backoff:, max_buffer_size:, max_buffer_bytesize:)
       @cluster = cluster
       @logger = logger
       @required_acks = required_acks
@@ -173,11 +139,7 @@ module Kafka
       @retry_backoff = retry_backoff
       @max_buffer_size = max_buffer_size
       @max_buffer_bytesize = max_buffer_bytesize
-
-      @compressor = Compressor.new(
-        codec_name: @compression_codec,
-        threshold: @compression_threshold,
-      )
+      @compressor = compressor

       # The set of topics that are produced to.
       @target_topics = Set.new
data/lib/kafka/protocol/offset_fetch_response.rb CHANGED
@@ -18,11 +18,14 @@ module Kafka
     end

     def offset_for(topic, partition)
-      offset_info = topics.fetch(topic).fetch(partition)
+      offset_info = topics.fetch(topic).fetch(partition, nil)

-
-
-
+      if offset_info
+        Protocol.handle_error(offset_info.error_code)
+        offset_info.offset
+      else
+        -1
+      end
     end

     def self.decode(decoder)
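The `offset_for` change above makes a missing partition entry return the `-1` sentinel instead of raising `KeyError`; the `OffsetManager` then maps `-1` to the topic's default offset. A runnable sketch of the new lookup; `OffsetInfo` is a stand-in struct and the `raise` replaces `Protocol.handle_error`:

```ruby
# Stand-in for the decoded per-partition offset entry.
OffsetInfo = Struct.new(:offset, :error_code)

# Sketch of the changed lookup: nil entry -> -1 ("no committed offset").
def offset_for(topics, topic, partition)
  offset_info = topics.fetch(topic).fetch(partition, nil)

  if offset_info
    # In ruby-kafka this is Protocol.handle_error(offset_info.error_code).
    raise "Kafka error code #{offset_info.error_code}" unless offset_info.error_code.zero?
    offset_info.offset
  else
    -1
  end
end

topics = { "events" => { 0 => OffsetInfo.new(42, 0) } }
known   = offset_for(topics, "events", 0)
unknown = offset_for(topics, "events", 1)
```

Returning a sentinel rather than raising lets a consumer start on a partition it has never committed to without special-casing the error path.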
data/lib/kafka/ssl_socket_with_timeout.rb ADDED
@@ -0,0 +1,154 @@
+require "socket"
+
+module Kafka
+
+  # Opens sockets in a non-blocking fashion, ensuring that we're not stalling
+  # for long periods of time.
+  #
+  # It's possible to set timeouts for connecting to the server, for reading data,
+  # and for writing data. Whenever a timeout is exceeded, Errno::ETIMEDOUT is
+  # raised.
+  #
+  class SSLSocketWithTimeout
+
+    # Opens a socket.
+    #
+    # @param host [String]
+    # @param port [Integer]
+    # @param connect_timeout [Integer] the connection timeout, in seconds.
+    # @param timeout [Integer] the read and write timeout, in seconds.
+    # @param ssl_context [OpenSSL::SSL::SSLContext] which SSLContext the ssl connection should use
+    # @raise [Errno::ETIMEDOUT] if the timeout is exceeded.
+    def initialize(host, port, connect_timeout: nil, timeout: nil, ssl_context:)
+      addr = Socket.getaddrinfo(host, nil)
+      sockaddr = Socket.pack_sockaddr_in(port, addr[0][3])
+
+      @timeout = timeout
+
+      @tcp_socket = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)
+      @tcp_socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
+
+      # first initiate the TCP socket
+      begin
+        # Initiate the socket connection in the background. If it doesn't fail
+        # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
+        # indicating the connection is in progress.
+        @tcp_socket.connect_nonblock(sockaddr)
+      rescue IO::WaitWritable
+        # IO.select will block until the socket is writable or the timeout
+        # is exceeded, whichever comes first.
+        unless IO.select(nil, [@tcp_socket], nil, connect_timeout)
+          # IO.select returns nil when the socket is not ready before timeout
+          # seconds have elapsed
+          @tcp_socket.close
+          raise Errno::ETIMEDOUT
+        end
+
+        begin
+          # Verify there is now a good connection.
+          @tcp_socket.connect_nonblock(sockaddr)
+        rescue Errno::EISCONN
+          # The socket is connected, we're good!
+        end
+      end
+
+      # once that's connected, we can start initiating the ssl socket
+      @ssl_socket = OpenSSL::SSL::SSLSocket.new(@tcp_socket, ssl_context)
+
+      begin
+        # Initiate the socket connection in the background. If it doesn't fail
+        # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
+        # indicating the connection is in progress.
+        # Unlike waiting for a tcp socket to connect, you can't time out ssl socket
+        # connections during the connect phase properly, because IO.select only partially works.
+        # Instead, you have to retry.
+        @ssl_socket.connect_nonblock
+      rescue Errno::EAGAIN, Errno::EWOULDBLOCK, IO::WaitReadable
+        IO.select([@ssl_socket])
+        retry
+      rescue IO::WaitWritable
+        IO.select(nil, [@ssl_socket])
+        retry
+      end
+    end
+
+    # Reads bytes from the socket, possible with a timeout.
+    #
+    # @param num_bytes [Integer] the number of bytes to read.
+    # @raise [Errno::ETIMEDOUT] if the timeout is exceeded.
+    # @return [String] the data that was read from the socket.
+    def read(num_bytes)
+      buffer = ''
+      until buffer.length >= num_bytes
+        begin
+          # unlike plain tcp sockets, ssl sockets don't support IO.select
+          # properly.
+          # Instead, timeouts happen on a per read basis, and we have to
+          # catch exceptions from read_nonblock, and gradually build up
+          # our read buffer.
+          buffer << @ssl_socket.read_nonblock(num_bytes - buffer.length)
+        rescue IO::WaitReadable
+          unless IO.select([@ssl_socket], nil, nil, @timeout)
+            raise Errno::ETIMEDOUT
+          end
+          retry
+        rescue IO::WaitWritable
+          unless IO.select(nil, [@ssl_socket], nil, @timeout)
+            raise Errno::ETIMEDOUT
+          end
+          retry
+        end
+      end
+      buffer
+    end
+
+    # Writes bytes to the socket, possible with a timeout.
+    #
+    # @param bytes [String] the data that should be written to the socket.
+    # @raise [Errno::ETIMEDOUT] if the timeout is exceeded.
+    # @return [Integer] the number of bytes written.
+    def write(bytes)
+      loop do
+        written = 0
+        begin
+          # unlike plain tcp sockets, ssl sockets don't support IO.select
+          # properly.
+          # Instead, timeouts happen on a per write basis, and we have to
+          # catch exceptions from write_nonblock, and gradually build up
+          # our write buffer.
+          written += @ssl_socket.write_nonblock(bytes)
+        rescue Errno::EFAULT => error
+          raise error
+        rescue OpenSSL::SSL::SSLError, Errno::EAGAIN, Errno::EWOULDBLOCK, IO::WaitWritable => error
+          if error.is_a?(OpenSSL::SSL::SSLError) && error.message == 'write would block'
+            if IO.select(nil, [@ssl_socket], nil, @timeout)
+              retry
+            else
+              raise Errno::ETIMEDOUT
+            end
+          else
+            raise error
+          end
+        end
+
+        # Fast, common case.
+        break if written == bytes.size
+
+        # This takes advantage of the fact that most ruby implementations
+        # have Copy-On-Write strings. Thusly why requesting a subrange
+        # of data, we actually don't copy data because the new string
+        # simply references a subrange of the original.
+        bytes = bytes[written, bytes.size]
+      end
+    end
+
+    def close
+      @tcp_socket.close
+      @ssl_socket.close
+    end
+
+    def set_encoding(encoding)
+      @tcp_socket.set_encoding(encoding)
+    end
+  end
+end
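The `write` loop above resumes a partial write by re-slicing the buffer: `bytes[written, bytes.size]` returns everything from index `written` onward (an over-long length is simply clamped at the end of the string), so each retry sends only the unwritten tail. A runnable simulation of that re-slicing, with a sink that accepts at most 4 bytes per call standing in for the SSL socket:

```ruby
# Simulates the partial-write retry loop: the hypothetical sink accepts
# at most 4 bytes per call, so the buffer is repeatedly re-sliced.
def write_all(bytes)
  sink = +""
  loop do
    chunk = bytes[0, 4] # pretend the socket accepted only 4 bytes
    sink << chunk
    written = chunk.bytesize
    break sink if written == bytes.size
    bytes = bytes[written, bytes.size] # keep only the unwritten tail
  end
end

result = write_all("hello, kafka!")
```

Because Ruby's `String#[]` with a start and length never copies past the end of the string, the loop terminates exactly when the tail has been fully consumed.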
data/lib/kafka/version.rb CHANGED

metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ruby-kafka
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.3.0
 platform: ruby
 authors:
 - Daniel Schierbeck
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-03-
+date: 2016-03-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -163,8 +163,10 @@ files:
 - bin/console
 - bin/setup
 - circle.yml
+- examples/firehose-producer.rb
 - examples/simple-consumer.rb
 - examples/simple-producer.rb
+- examples/ssl-producer.rb
 - lib/kafka.rb
 - lib/kafka/async_producer.rb
 - lib/kafka/broker.rb
@@ -181,6 +183,7 @@ files:
 - lib/kafka/gzip_codec.rb
 - lib/kafka/instrumentation.rb
 - lib/kafka/message_buffer.rb
+- lib/kafka/offset_manager.rb
 - lib/kafka/partitioner.rb
 - lib/kafka/pending_message.rb
 - lib/kafka/pending_message_queue.rb
@@ -219,6 +222,7 @@ files:
 - lib/kafka/round_robin_assignment_strategy.rb
 - lib/kafka/snappy_codec.rb
 - lib/kafka/socket_with_timeout.rb
+- lib/kafka/ssl_socket_with_timeout.rb
 - lib/kafka/version.rb
 - lib/ruby-kafka.rb
 - ruby-kafka.gemspec