ruby-kafka 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: e58cc6609bed1f99291323a15a90e1a8433370a9
4
- data.tar.gz: eca9c8f1efc71812b4d4d7580f0917a7921864f1
3
+ metadata.gz: 87132b87fc32443be48994590b059a88b6cc5fea
4
+ data.tar.gz: c32e7ed87e9dcc7c2ba6f3128db0aadb123d1962
5
5
  SHA512:
6
- metadata.gz: 7be411f4c72bd6de0154bbee286df4a0d67052af54f371fc643248cc6f6a8cb4760a2f4c18c47921a6031722be8c280502617fb27aa38423cf6add20667db992
7
- data.tar.gz: cb721b17f832fc9c4485e6db40902e5ebdd2fbbcf62d24b27879ffa697455f38eb2ebc05355b37a413264a2452169052a35c3f0910312752a328f35d787371a2
6
+ metadata.gz: 84d5def3fa8963f928d2e59239e9c2678cfbe2ba904ad67bc13080a16b861bb5296394c442c14853144d065a4b86f205a8ba378e28302f67f8b80c5eaf0b7100
7
+ data.tar.gz: b39147eb0fd72f1753af38831871a00f597d0191a70f3f47734f87a2212b7d1500c0b1e712c36755b7ab09b27d66a26ea020f6a2fc413ea9ed9a0e2e15449341
@@ -4,6 +4,13 @@ Changes and additions to the library will be listed here.
4
4
 
5
5
  ## Unreleased
6
6
 
7
+ ## v0.3.0
8
+
9
+ - Add support for encryption and authentication with SSL (Tom Crayford).
10
+ - Allow configuring consumer offset commit policies.
11
+ - Instrument consumer message processing.
12
+ - Fix an issue causing exceptions when no logger was specified.
13
+
7
14
  ## v0.2.0
8
15
 
9
16
  - Add instrumentation of message compression.
data/README.md CHANGED
@@ -1,11 +1,29 @@
1
1
  # ruby-kafka
2
2
 
3
- [![Circle CI](https://circleci.com/gh/zendesk/ruby-kafka.svg?style=shield)](https://circleci.com/gh/zendesk/ruby-kafka/tree/master)
4
-
5
3
  A Ruby client library for [Apache Kafka](http://kafka.apache.org/), a distributed log and message bus. The focus of this library will be operational simplicity, with good logging and metrics that can make debugging issues easier.
6
4
 
7
5
  The Producer API is currently at beta level and is used in production. There's an alpha-level Consumer Group API that has not yet been used in production and may change without warning. Feel free to try it out, but don't expect it to be stable or correct quite yet.
8
6
 
7
+ Although parts of this library work with Kafka 0.8 – specifically, the Producer API – it's being tested and developed against Kafka 0.9. The Consumer API will be 0.9 only.
8
+
9
+ #### Table of Contents
10
+
11
+ 1. [Installation](#installation)
12
+ 2. [Usage](#usage)
13
+ 1. [Producing Messages to Kafka](#producing-messages-to-kafka)
14
+ 1. [Asynchronously Producing Messages](#asynchronously-producing-messages)
15
+ 2. [Serialization](#serialization)
16
+ 3. [Partitioning](#partitioning)
17
+ 4. [Buffering and Error Handling](#buffering-and-error-handling)
18
+ 5. [Message Delivery Guarantees](#message-delivery-guarantees)
19
+ 6. [Compression](#compression)
20
+ 2. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
21
+ 3. [Logging](#logging)
22
+ 4. [Understanding Timeouts](#understanding-timeouts)
23
+ 5. [Encryption and Authentication using SSL](#encryption-and-authentication-using-ssl)
24
+ 6. [Development](#development)
25
+ 7. [Roadmap](#roadmap)
26
+
9
27
  ## Installation
10
28
 
11
29
  Add this line to your application's Gemfile:
@@ -74,7 +92,7 @@ producer.deliver_messages
74
92
 
75
93
  Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafka/Producer) for more details.
76
94
 
77
- ### Asynchronously Producing Messages
95
+ #### Asynchronously Producing Messages
78
96
 
79
97
  A normal producer will block while `#deliver_messages` is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
80
98
 
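For reference, a minimal sketch of the asynchronous producer, using the `delivery_interval` and `max_buffer_size` options that also appear in the firehose producer example added later in this diff (broker addresses are placeholders):

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])

# A background worker delivers buffered messages once per second.
producer = kafka.async_producer(
  delivery_interval: 1,
  max_buffer_size: 10_000,
)

producer.produce("hello", topic: "greetings")

# Deliver any remaining messages and stop the background worker.
producer.shutdown
```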
@@ -119,7 +137,7 @@ producer.produce("hello", topic: "greetings")
119
137
 
120
138
  **Note:** if the calling thread produces messages faster than the producer can write them to Kafka, you'll eventually run into problems. The internal queue used for sending messages from the calling thread to the background worker has a size limit; once this limit is reached, a call to `#produce` will raise `Kafka::BufferOverflow`.
121
139
 
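One way to handle this, mirroring the firehose producer example added later in this diff, is to back off briefly and retry when the queue is full; a minimal sketch, assuming `producer` and `event` are defined as in the earlier examples:

```ruby
begin
  producer.produce(event, topic: "events")
rescue Kafka::BufferOverflow
  # The queue between the calling thread and the background worker is full;
  # give the worker a chance to catch up before retrying.
  sleep 1
  retry
end
```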
122
- ### Serialization
140
+ #### Serialization
123
141
 
124
142
  This library is agnostic to which serialization format you prefer. Both the value and key of a message are treated as binary strings of data. This makes it easier to use whatever serialization format you want, since you don't have to do anything special to make it work with ruby-kafka. Here's an example of encoding data with JSON:
125
143
 
@@ -139,12 +157,12 @@ data = JSON.dump(event)
139
157
  producer.produce(data, topic: "events")
140
158
  ```
141
159
 
142
- ### Partitioning
160
+ #### Partitioning
143
161
 
144
162
  Kafka topics are partitioned, with messages being assigned to a partition by the client. This allows a great deal of flexibility for the users. This section describes several strategies for partitioning and how they impact performance, data locality, etc.
145
163
 
146
164
 
147
- #### Load Balanced Partitioning
165
+ ##### Load Balanced Partitioning
148
166
 
149
167
  When optimizing for efficiency, we either distribute messages as evenly as possible to all partitions, or make sure each producer always writes to a single partition. The former ensures an even load for downstream consumers; the latter ensures the highest producer performance, since message batching is done per partition.
150
168
 
@@ -163,7 +181,7 @@ producer.produce(msg2, topic: "messages", partition_key: partition_key)
163
181
 
164
182
  You can also base the partition key on some property of the producer, for example the host name.
165
183
 
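A minimal sketch of host-based keying, using `Socket.gethostname` from Ruby's standard library (not part of ruby-kafka) as the partition key:

```ruby
require "socket"

# All messages produced from this host are routed to the same partition.
partition_key = Socket.gethostname

producer.produce(event, topic: "messages", partition_key: partition_key)
```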
166
- #### Semantic Partitioning
184
+ ##### Semantic Partitioning
167
185
 
168
186
  By assigning messages to a partition based on some property of the message, e.g. making sure all events tracked in a user session are assigned to the same partition, downstream consumers can make simplifying assumptions about data locality. For example, a consumer can keep process-local state pertaining to a user session, knowing that all events for the session will be read from a single partition. This is also called _semantic partitioning_, since the partition assignment is part of the application behavior.
169
187
 
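As a sketch, assuming each event carries a `session_id`, the `partition_key` option shown earlier can be used to route a session's events to a single partition:

```ruby
# Events that share a session id end up in the same partition, so a consumer
# of that partition sees the full session.
producer.produce(event, topic: "events", partition_key: session_id)
```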
@@ -185,7 +203,7 @@ partition = some_number % partitions
185
203
  producer.produce(event, topic: "events", partition: partition)
186
204
  ```
187
205
 
188
- #### Compatibility with Other Clients
206
+ ##### Compatibility with Other Clients
189
207
 
190
208
  There's no standardized way to assign messages to partitions across different Kafka client implementations. If you have a heterogeneous set of clients producing messages to the same topics it may be important to ensure a consistent partitioning scheme. This library doesn't try to implement all schemes, so you'll have to figure out which scheme the other client is using and replicate it. An example:
191
209
 
@@ -198,7 +216,7 @@ partition = PartitioningScheme.assign(partitions, event)
198
216
  producer.produce(event, topic: "events", partition: partition)
199
217
  ```
200
218
 
201
- ### Buffering and Error Handling
219
+ #### Buffering and Error Handling
202
220
 
203
221
  The producer is designed for resilience in the face of temporary network errors, Kafka broker failovers, and other issues that prevent the client from writing messages to the destination topics. It does this by employing local, in-memory buffers. Only when messages are acknowledged by a Kafka broker will they be removed from the buffer.
204
222
 
@@ -208,6 +226,108 @@ Note that there's a maximum buffer size; pass in a different value for `max_buff
208
226
 
209
227
  A final note on buffers: local buffers give resilience against broker and network failures, and allow higher throughput due to message batching, but they also trade off consistency guarantees for higher availability and resilience. If your local process dies while messages are buffered, those messages will be lost. If you require high levels of consistency, you should call `#deliver_messages` immediately after `#produce`.
210
228
 
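A minimal sketch of that high-consistency pattern, flushing after every write and handling the `Kafka::DeliveryFailed` error described in the next section:

```ruby
producer.produce(event, topic: "events")

begin
  # Flush immediately so the message isn't sitting in a local buffer if the
  # process dies.
  producer.deliver_messages
rescue Kafka::DeliveryFailed => e
  # Delivery could not be confirmed; decide whether to retry or give up.
  $stderr.puts "Delivery failed: #{e}"
end
```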
229
+ #### Message Delivery Guarantees
230
+
231
+ There are basically two different and incompatible guarantees that can be made in a message delivery system such as Kafka:
232
+
233
+ 1. _at-most-once_ delivery guarantees that a message is delivered to the recipient at most _once_. This is useful only if delivering the message twice carries some risk and should therefore be avoided. Implicit is the fact that there's no guarantee that the message will be delivered at all.
234
+ 2. _at-least-once_ delivery guarantees that a message is delivered, but it may be delivered more than once. If the final recipient de-duplicates messages, e.g. by checking a unique message id, then it's even possible to implement _exactly-once_ delivery (see the sketch after this list).
235
+
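A hedged, consumer-side sketch of such de-duplication, assuming each message carries a unique id in its key and a hypothetical `process` method; a real implementation would need a persistent, bounded store of seen ids:

```ruby
require "set"

seen_ids = Set.new

consumer.each_message do |message|
  # Skip messages whose unique id (here: the message key) was already handled.
  next if seen_ids.include?(message.key)

  process(message)

  seen_ids << message.key
end
```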
236
+ Of these two options, ruby-kafka implements the second one: when in doubt about whether a message has been delivered, a producer will try to deliver it again.
237
+
238
+ The guarantee is made only for the synchronous producer and boils down to this:
239
+
240
+ ```ruby
241
+ producer = kafka.producer
242
+
243
+ producer.produce("hello", topic: "greetings")
244
+
245
+ # If this line fails with Kafka::DeliveryFailed we *may* have succeeded in delivering
246
+ # the message to Kafka but won't know for sure.
247
+ producer.deliver_messages
248
+
249
+ # If we get to this line we can be sure that the message has been delivered to Kafka!
250
+ ```
251
+
252
+ That is, once `#deliver_messages` returns we can be sure that Kafka has received the message. Note that there are some big caveats here:
253
+
254
+ - Depending on how your cluster and topic are configured, the message could still be lost by Kafka.
255
+ - If you configure the producer to not require acknowledgements from the Kafka brokers by setting `required_acks` to zero, there is no guarantee that the message will ever make it to a Kafka broker (see the sketch after this list).
256
+ - If you use the asynchronous producer there's no guarantee that messages will have been delivered after `#deliver_messages` returns. A way of blocking until a message has been delivered with the asynchronous producer may be implemented in the future.
257
+
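For illustration, disabling acknowledgements looks like this; the `required_acks` option is documented on `Client#producer` further down in this diff:

```ruby
# Fire and forget: brokers do not acknowledge writes, so deliveries may be
# lost without any error being raised. Only use this when losing messages
# is acceptable.
producer = kafka.producer(required_acks: 0)
```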
258
+ #### Compression
259
+
260
+ Depending on what kind of data you produce, enabling compression may yield improved bandwidth and space usage. Compression in Kafka is done on entire message sets rather than on individual messages. This improves the compression rate and generally means that compression works better the larger your buffers get, since the message sets will be larger by the time they're compressed.
261
+
262
+ Since many workloads have variations in throughput and distribution across partitions, it's possible to configure a threshold for when to enable compression by setting `compression_threshold`. Only if the defined number of messages are buffered for a partition will the messages be compressed.
263
+
264
+ Compression is enabled by passing the `compression_codec` parameter to `#producer` with the name of one of the algorithms allowed by Kafka:
265
+
266
+ * `:snappy` for [Snappy](http://google.github.io/snappy/) compression.
267
+ * `:gzip` for [gzip](https://en.wikipedia.org/wiki/Gzip) compression.
268
+
269
+ By default, all message sets will be compressed if you specify a compression codec. To increase the compression threshold, set `compression_threshold` to an integer value higher than one.
270
+
271
+ ```ruby
272
+ producer = kafka.producer(
273
+ compression_codec: :snappy,
274
+ compression_threshold: 10,
275
+ )
276
+ ```
277
+
278
+ ### Consuming Messages from Kafka
279
+
280
+ **Warning:** The Consumer API is still alpha level and will likely change. The consumer code should not be considered stable, as it hasn't been exhaustively tested in production environments yet.
281
+
282
+ The simplest way to consume messages from a Kafka topic is using the `#fetch_messages` API:
283
+
284
+ ```ruby
285
+ require "kafka"
286
+
287
+ kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
288
+
289
+ messages = kafka.fetch_messages(topic: "greetings", partition: 42)
290
+
291
+ messages.each do |message|
292
+ puts message.offset, message.key, message.value
293
+ end
294
+ ```
295
+
296
+ While this is great for extremely simple use cases, there are a number of downsides:
297
+
298
+ - You can only fetch from a single topic and partition at a time.
299
+ - If you want to have multiple processes consume from the same topic, there's no way of coordinating which processes should fetch from which partitions.
300
+ - If a process dies, there's no way to have another process resume fetching from the point in the partition that the original process had reached.
301
+
302
+ The Consumer API solves all of these issues, and more. It uses the Consumer Groups feature released in Kafka 0.9 to allow multiple consumer processes to coordinate access to a topic, assigning each partition to a single consumer. When a consumer fails, the partitions that were assigned to it are re-assigned to other members of the group.
303
+
304
+ Using the API is simple:
305
+
306
+ ```ruby
307
+ require "kafka"
308
+
309
+ kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
310
+
311
+ # Consumers with the same group id will form a Consumer Group together.
312
+ consumer = kafka.consumer(group_id: "my-consumer")
313
+
314
+ consumer.subscribe("greetings")
315
+
316
+ begin
317
+ # This will loop indefinitely, yielding each message in turn.
318
+ consumer.each_message do |message|
319
+ puts message.topic, message.partition
320
+ puts message.offset, message.key, message.value
321
+ end
322
+ ensure
323
+ # Always make sure to shut down the consumer properly.
324
+ consumer.shutdown
325
+ end
326
+ ```
327
+
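The offset commit policy is configurable via the new `offset_commit_interval` and `offset_commit_threshold` options documented on `Client#consumer` further down in this diff; a minimal sketch:

```ruby
consumer = kafka.consumer(
  group_id: "my-consumer",
  offset_commit_interval: 10,   # Commit offsets every 10 seconds...
  offset_commit_threshold: 100, # ...or after every 100 processed messages.
)
```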
328
+ Each consumer process will be assigned one or more partitions from each topic that the group subscribes to. In order to handle more messages, simply start more processes.
329
+
330
+
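Message processing is now instrumented as `process_message.consumer.kafka` (see the consumer changes further down in this diff). A hedged sketch of subscribing to that event, assuming `Kafka::Instrumentation` forwards to `ActiveSupport::Notifications` when that gem is loaded (the instrumentation backend itself is not shown in this diff):

```ruby
require "active_support/notifications"

ActiveSupport::Notifications.subscribe("process_message.consumer.kafka") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  payload = event.payload

  # The payload carries topic, partition, offset, key, and value.
  puts "Processed #{payload[:topic]}/#{payload[:partition]} @ #{payload[:offset]} in #{event.duration}ms"
end
```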
211
331
  ### Logging
212
332
 
213
333
  It's a very good idea to configure the Kafka client with a logger. All important operations and errors are logged. When instantiating your client, simply pass in a valid logger:
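The example scripts added in this diff follow this pattern; a minimal sketch:

```ruby
require "logger"

logger = Logger.new($stderr)
logger.level = Logger::INFO

kafka = Kafka.new(
  seed_brokers: ["kafka1:9092", "kafka2:9092"],
  logger: logger,
)
```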
@@ -239,33 +359,65 @@ When sending many messages, it's likely that the client needs to send some messa
239
359
 
240
360
  Make sure your application can survive being blocked for so long.
241
361
 
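The relevant knobs are the `connect_timeout` and `socket_timeout` options accepted by `Kafka.new` (see the `Client#initialize` signature further down in this diff); a minimal sketch with illustrative values:

```ruby
kafka = Kafka.new(
  seed_brokers: ["kafka1:9092", "kafka2:9092"],
  connect_timeout: 10, # Seconds allowed for establishing a connection to a broker.
  socket_timeout: 30,  # Seconds allowed for a broker to respond to a request.
)
```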
362
+ ### Encryption and Authentication using SSL
363
+
364
+ By default, communication between Kafka clients and brokers is unencrypted and unauthenticated. Kafka 0.9 added optional support for [encryption, client authentication, and authorization](http://kafka.apache.org/documentation.html#security_ssl). There are two layers of security made possible by this:
365
+
366
+ #### Encryption of Communication
367
+
368
+ By enabling SSL encryption you can have some confidence that messages can be sent to Kafka over an untrusted network without being intercepted.
369
+
370
+ In this case you just need to pass a valid CA certificate as a string when configuring your `Kafka` client:
371
+
372
+ ```ruby
373
+ kafka = Kafka.new(
374
+ ssl_ca_cert: File.read('my_ca_cert.pem'),
375
+ # ...
376
+ )
377
+ ```
378
+
379
+ Without passing the CA certificate to the client it would be impossible to protect against [man-in-the-middle attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack).
380
+
381
+ #### Client Authentication
382
+
383
+ In order to authenticate the client to the cluster, you need to pass in a certificate and key created for the client and trusted by the brokers.
384
+
385
+ ```ruby
386
+ kafka = Kafka.new(
387
+ ssl_ca_cert: File.read('my_ca_cert.pem'),
388
+ ssl_client_cert: File.read('my_client_cert.pem'),
389
+ ssl_client_cert_key: File.read('my_client_cert_key.pem'),
390
+ # ...
391
+ )
392
+ ```
393
+
394
+ Once client authentication is set up, it is possible to configure the Kafka cluster to [authorize client requests](http://kafka.apache.org/documentation.html#security_authz).
395
+
242
396
  ## Development
243
397
 
244
398
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
245
399
 
246
400
  **Note:** the specs require a working [Docker](https://www.docker.com/) instance, but should work out of the box if you have Docker installed. Please create an issue if that's not the case.
247
401
 
248
- ## Roadmap
249
-
250
- The current stable release is v0.1. This release is running in production at Zendesk, but it's still not recommended that you use it when data loss is unacceptable. It will take a little while until all edge cases have been uncovered and handled.
402
+ [![Circle CI](https://circleci.com/gh/zendesk/ruby-kafka.svg?style=shield)](https://circleci.com/gh/zendesk/ruby-kafka/tree/master)
251
403
 
252
- The API may still be changed in v0.2.
404
+ ## Roadmap
253
405
 
254
- ### v0.2: Stable Producer API
406
+ The current stable release is v0.2. This release is running in production at Zendesk, but it's still not recommended that you use it when data loss is unacceptable. It will take a little while until all edge cases have been uncovered and handled.
255
407
 
256
- Target date: end of February.
408
+ ### v0.3
257
409
 
258
- The API should now have stabilized and the library should be battle tested enough to deploy for critical use cases.
410
+ Beta release of the Consumer API, allowing balanced Consumer Groups to coordinate access to partitions. Kafka 0.9 only.
259
411
 
260
- ### v1.0: Consumer API
412
+ ### v1.0
261
413
 
262
- The Consumer API defined by Kafka 0.9 will be implemented.
414
+ API freeze. All new changes will be backwards compatible.
263
415
 
264
416
  ## Why a new library?
265
417
 
266
418
  There are a few existing Kafka clients in Ruby:
267
419
 
268
- * [Poseidon](https://github.com/bpot/poseidon) seems to work for Kafka 0.8, but the project has is unmaintained and has known issues.
420
+ * [Poseidon](https://github.com/bpot/poseidon) seems to work for Kafka 0.8, but the project is unmaintained and has known issues.
269
421
  * [Hermann](https://github.com/reiseburo/hermann) wraps the C library [librdkafka](https://github.com/edenhill/librdkafka) and seems to be very efficient, but its API and mode of operation are too intrusive for our needs.
270
422
  * [jruby-kafka](https://github.com/joekiller/jruby-kafka) is a great option if you're running on JRuby.
271
423
 
examples/firehose-producer.rb ADDED
@@ -0,0 +1,49 @@
1
+ $LOAD_PATH.unshift(File.expand_path("../../lib", __FILE__))
2
+
3
+ require "kafka"
4
+
5
+ KAFKA_CLIENT_CERT = ENV.fetch("KAFKA_CLIENT_CERT")
6
+ KAFKA_CLIENT_CERT_KEY = ENV.fetch("KAFKA_CLIENT_CERT_KEY")
7
+ KAFKA_SERVER_CERT = ENV.fetch("KAFKA_SERVER_CERT")
8
+ KAFKA_URL = ENV.fetch("KAFKA_URL")
9
+ KAFKA_BROKERS = KAFKA_URL.gsub("kafka+ssl://", "").split(",")
10
+ KAFKA_TOPIC = "test-messages"
11
+
12
+ NUM_THREADS = 20
13
+
14
+ threads = NUM_THREADS.times.map do
15
+ Thread.new do
16
+ logger = Logger.new($stderr)
17
+ logger.level = Logger::INFO
18
+
19
+ kafka = Kafka.new(
20
+ seed_brokers: KAFKA_BROKERS,
21
+ logger: logger,
22
+ ssl_client_cert: KAFKA_CLIENT_CERT,
23
+ ssl_client_cert_key: KAFKA_CLIENT_CERT_KEY,
24
+ ssl_ca_cert: KAFKA_SERVER_CERT,
25
+ )
26
+
27
+ producer = kafka.async_producer(
28
+ delivery_interval: 1,
29
+ max_queue_size: 5_000,
30
+ max_buffer_size: 10_000,
31
+ )
32
+
33
+ begin
34
+ loop do
35
+ producer.produce(rand.to_s, key: rand.to_s, topic: KAFKA_TOPIC)
36
+ end
37
+ rescue Kafka::BufferOverflow
38
+ logger.error "Buffer overflow, backing off for 1s"
39
+ sleep 1
40
+ retry
41
+ ensure
42
+ producer.shutdown
43
+ end
44
+ end
45
+ end
46
+
47
+ threads.each {|t| t.abort_on_exception = true }
48
+
49
+ threads.map(&:join)
examples/ssl-producer.rb ADDED
@@ -0,0 +1,42 @@
1
+ # Reads lines from STDIN, writing them to Kafka.
2
+
3
+ $LOAD_PATH.unshift(File.expand_path("../../lib", __FILE__))
4
+
5
+ require "kafka"
6
+
7
+ logger = Logger.new($stderr)
8
+ brokers = ENV.fetch("KAFKA_BROKERS").split(",")
9
+
10
+ # Make sure to create this topic in your Kafka cluster or configure the
11
+ # cluster to auto-create topics.
12
+ topic = "page-visits"
13
+
14
+ kafka = Kafka.new(
+   seed_brokers: brokers,
+   client_id: "ssl-producer",
+   logger: logger,
+
+   # Pass the PEM encoded client cert and key directly; the client builds its
+   # own SSL context from them (see the `Client#initialize` changes in this diff).
+   ssl_client_cert: ENV.fetch("KAFKA_CLIENT_CERT"),
+   ssl_client_cert_key: ENV.fetch("KAFKA_CLIENT_CERT_KEY"),
+ )
27
+
28
+ producer = kafka.producer
29
+
30
+ begin
31
+ $stdin.each_with_index do |line, index|
32
+ producer.produce(line, topic: topic)
33
+
34
+ # Deliver buffered messages every 10 lines.
35
+ producer.deliver_messages if index % 10 == 0
36
+ end
37
+ ensure
38
+ # Make sure to send any remaining messages.
39
+ producer.deliver_messages
40
+
41
+ producer.shutdown
42
+ end
@@ -2,12 +2,13 @@ require "kafka/broker"
2
2
 
3
3
  module Kafka
4
4
  class BrokerPool
5
- def initialize(client_id:, connect_timeout: nil, socket_timeout: nil, logger:)
5
+ def initialize(client_id:, connect_timeout: nil, socket_timeout: nil, logger:, ssl_context: nil)
6
6
  @client_id = client_id
7
7
  @connect_timeout = connect_timeout
8
8
  @socket_timeout = socket_timeout
9
9
  @logger = logger
10
10
  @brokers = {}
11
+ @ssl_context = ssl_context
11
12
  end
12
13
 
13
14
  def connect(host, port, node_id: nil)
@@ -21,6 +22,7 @@ module Kafka
21
22
  connect_timeout: @connect_timeout,
22
23
  socket_timeout: @socket_timeout,
23
24
  logger: @logger,
25
+ ssl_context: @ssl_context,
24
26
  )
25
27
 
26
28
  @brokers[node_id] = broker unless node_id.nil?
@@ -1,3 +1,5 @@
1
+ require "openssl"
2
+
1
3
  require "kafka/cluster"
2
4
  require "kafka/producer"
3
5
  require "kafka/consumer"
@@ -23,32 +25,83 @@ module Kafka
23
25
  # @param socket_timeout [Integer, nil] the timeout setting for socket
24
26
  # connections. See {BrokerPool#initialize}.
25
27
  #
28
+ # @param ssl_ca_cert [String, nil] a PEM encoded CA cert to use with an
29
+ # SSL connection.
30
+ #
31
+ # @param ssl_client_cert [String, nil] a PEM encoded client cert to use with an
32
+ # SSL connection. Must be used in combination with ssl_client_cert_key.
33
+ #
34
+ # @param ssl_client_cert_key [String, nil] a PEM encoded client cert key to use with an
35
+ # SSL connection. Must be used in combination with ssl_client_cert.
36
+ #
26
37
  # @return [Client]
27
- def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil)
28
- @logger = logger || Logger.new("/dev/null")
38
+ def initialize(seed_brokers:, client_id: "ruby-kafka", logger: nil, connect_timeout: nil, socket_timeout: nil, ssl_ca_cert: nil, ssl_client_cert: nil, ssl_client_cert_key: nil)
39
+ @logger = logger || Logger.new(nil)
40
+
41
+ ssl_context = build_ssl_context(ssl_ca_cert, ssl_client_cert, ssl_client_cert_key)
29
42
 
30
43
  broker_pool = BrokerPool.new(
31
44
  client_id: client_id,
32
45
  connect_timeout: connect_timeout,
33
46
  socket_timeout: socket_timeout,
34
- logger: logger,
47
+ logger: @logger,
48
+ ssl_context: ssl_context,
35
49
  )
36
50
 
37
51
  @cluster = Cluster.new(
38
52
  seed_brokers: seed_brokers,
39
53
  broker_pool: broker_pool,
40
- logger: logger,
54
+ logger: @logger,
41
55
  )
42
56
  end
43
57
 
44
- # Builds a new producer.
58
+ # Initializes a new Kafka producer.
59
+ #
60
+ # @param ack_timeout [Integer] The number of seconds a broker can wait for
61
+ # replicas to acknowledge a write before responding with a timeout.
62
+ #
63
+ # @param required_acks [Integer] The number of replicas that must acknowledge
64
+ # a write.
45
65
  #
46
- # `options` are passed to {Producer#initialize}.
66
+ # @param max_retries [Integer] the number of retries that should be attempted
67
+ # before giving up sending messages to the cluster. Does not include the
68
+ # original attempt.
69
+ #
70
+ # @param retry_backoff [Integer] the number of seconds to wait between retries.
71
+ #
72
+ # @param max_buffer_size [Integer] the number of messages allowed in the buffer
73
+ # before new writes will raise {BufferOverflow} exceptions.
74
+ #
75
+ # @param max_buffer_bytesize [Integer] the maximum size of the buffer in bytes.
76
+ # Attempting to produce messages when the buffer reaches this size will
77
+ # result in {BufferOverflow} being raised.
78
+ #
79
+ # @param compression_codec [Symbol, nil] the name of the compression codec to
80
+ # use, or nil if no compression should be performed. Valid codecs: `:snappy`
81
+ # and `:gzip`.
82
+ #
83
+ # @param compression_threshold [Integer] the number of messages that needs to
84
+ # be in a message set before it should be compressed. Note that message sets
85
+ # are per-partition rather than per-topic or per-producer.
47
86
  #
48
- # @see Producer#initialize
49
87
  # @return [Kafka::Producer] the Kafka producer.
50
- def producer(**options)
51
- Producer.new(cluster: @cluster, logger: @logger, **options)
88
+ def producer(compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: 1, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
89
+ compressor = Compressor.new(
90
+ codec_name: compression_codec,
91
+ threshold: compression_threshold,
92
+ )
93
+
94
+ Producer.new(
95
+ cluster: @cluster,
96
+ logger: @logger,
97
+ compressor: compressor,
98
+ ack_timeout: ack_timeout,
99
+ required_acks: required_acks,
100
+ max_retries: max_retries,
101
+ retry_backoff: retry_backoff,
102
+ max_buffer_size: max_buffer_size,
103
+ max_buffer_bytesize: max_buffer_bytesize,
104
+ )
52
105
  end
53
106
 
54
107
  # Creates a new AsyncProducer instance.
@@ -76,17 +129,38 @@ module Kafka
76
129
  )
77
130
  end
78
131
 
79
- # Creates a new Consumer instance.
80
- #
81
- # `options` are passed to {Consumer#initialize}.
82
- #
83
- # @see Consumer
132
+ # Creates a new Kafka consumer.
133
+ #
134
+ # @param group_id [String] the id of the group that the consumer should join.
135
+ # @param session_timeout [Integer] the number of seconds after which, if a client
136
+ # hasn't contacted the Kafka cluster, it will be kicked out of the group.
137
+ # @param offset_commit_interval [Integer] the interval between offset commits,
138
+ # in seconds.
139
+ # @param offset_commit_threshold [Integer] the number of messages that can be
140
+ # processed before their offsets are committed. If zero, offset commits are
141
+ # not triggered by message processing.
84
142
  # @return [Consumer]
85
- def consumer(**options)
143
+ def consumer(group_id:, session_timeout: 30, offset_commit_interval: 10, offset_commit_threshold: 0)
144
+ group = ConsumerGroup.new(
145
+ cluster: @cluster,
146
+ logger: @logger,
147
+ group_id: group_id,
148
+ session_timeout: session_timeout,
149
+ )
150
+
151
+ offset_manager = OffsetManager.new(
152
+ group: group,
153
+ logger: @logger,
154
+ commit_interval: offset_commit_interval,
155
+ commit_threshold: offset_commit_threshold,
156
+ )
157
+
86
158
  Consumer.new(
87
159
  cluster: @cluster,
88
160
  logger: @logger,
89
- **options,
161
+ group: group,
162
+ offset_manager: offset_manager,
163
+ session_timeout: session_timeout,
90
164
  )
91
165
  end
92
166
 
@@ -185,5 +259,32 @@ module Kafka
185
259
  def close
186
260
  @cluster.disconnect
187
261
  end
262
+
263
+ private
264
+
265
+ def build_ssl_context(ca_cert, client_cert, client_cert_key)
266
+ return nil unless ca_cert || client_cert || client_cert_key
267
+
268
+ ssl_context = OpenSSL::SSL::SSLContext.new
269
+
270
+ if client_cert && client_cert_key
271
+ ssl_context.set_params(
272
+ cert: OpenSSL::X509::Certificate.new(client_cert),
273
+ key: OpenSSL::PKey::RSA.new(client_cert_key)
274
+ )
275
+ elsif client_cert && !client_cert_key
276
+ raise ArgumentError, "Kafka client initialized with `ssl_client_cert` but no `ssl_client_cert_key`. Please provide both."
277
+ elsif !client_cert && client_cert_key
278
+ raise ArgumentError, "Kafka client initialized with `ssl_client_cert_key`, but no `ssl_client_cert`. Please provide both."
279
+ end
280
+
281
+ if ca_cert
282
+ store = OpenSSL::X509::Store.new
283
+ store.add_cert(OpenSSL::X509::Certificate.new(ca_cert))
284
+ ssl_context.cert_store = store
285
+ end
286
+
287
+ ssl_context
288
+ end
188
289
  end
189
290
  end
@@ -1,5 +1,6 @@
1
1
  require "stringio"
2
2
  require "kafka/socket_with_timeout"
3
+ require "kafka/ssl_socket_with_timeout"
3
4
  require "kafka/instrumentation"
4
5
  require "kafka/protocol/request_message"
5
6
  require "kafka/protocol/encoder"
@@ -42,12 +43,13 @@ module Kafka
42
43
  # broker. Default is 10 seconds.
43
44
  #
44
45
  # @return [Connection] a new connection.
45
- def initialize(host:, port:, client_id:, logger:, connect_timeout: nil, socket_timeout: nil)
46
+ def initialize(host:, port:, client_id:, logger:, connect_timeout: nil, socket_timeout: nil, ssl_context: nil)
46
47
  @host, @port, @client_id = host, port, client_id
47
48
  @logger = logger
48
49
 
49
50
  @connect_timeout = connect_timeout || CONNECT_TIMEOUT
50
51
  @socket_timeout = socket_timeout || SOCKET_TIMEOUT
52
+ @ssl_context = ssl_context
51
53
  end
52
54
 
53
55
  def to_s
@@ -101,7 +103,11 @@ module Kafka
101
103
  def open
102
104
  @logger.debug "Opening connection to #{@host}:#{@port} with client id #{@client_id}..."
103
105
 
104
- @socket = SocketWithTimeout.new(@host, @port, connect_timeout: @connect_timeout, timeout: @socket_timeout)
106
+ if @ssl_context
107
+ @socket = SSLSocketWithTimeout.new(@host, @port, connect_timeout: @connect_timeout, timeout: @socket_timeout, ssl_context: @ssl_context)
108
+ else
109
+ @socket = SocketWithTimeout.new(@host, @port, connect_timeout: @connect_timeout, timeout: @socket_timeout)
110
+ end
105
111
 
106
112
  @encoder = Kafka::Protocol::Encoder.new(@socket)
107
113
  @decoder = Kafka::Protocol::Decoder.new(@socket)
@@ -1,4 +1,5 @@
1
1
  require "kafka/consumer_group"
2
+ require "kafka/offset_manager"
2
3
  require "kafka/fetch_operation"
3
4
 
4
5
  module Kafka
@@ -50,28 +51,12 @@ module Kafka
50
51
  #
51
52
  class Consumer
52
53
 
53
- # Creates a new Consumer.
54
- #
55
- # @param cluster [Kafka::Cluster]
56
- # @param logger [Logger]
57
- # @param group_id [String] the id of the group that the consumer should join.
58
- # @param session_timeout [Integer] the interval between consumer heartbeats,
59
- # in seconds.
60
- def initialize(cluster:, logger:, group_id:, session_timeout: 30)
54
+ def initialize(cluster:, logger:, group:, offset_manager:, session_timeout:)
61
55
  @cluster = cluster
62
56
  @logger = logger
63
- @group_id = group_id
57
+ @group = group
58
+ @offset_manager = offset_manager
64
59
  @session_timeout = session_timeout
65
-
66
- @group = ConsumerGroup.new(
67
- cluster: cluster,
68
- logger: logger,
69
- group_id: group_id,
70
- session_timeout: @session_timeout,
71
- )
72
-
73
- @offsets = {}
74
- @default_offsets = {}
75
60
  end
76
61
 
77
62
  # Subscribes the consumer to a topic.
@@ -87,7 +72,7 @@ module Kafka
87
72
  # @return [nil]
88
73
  def subscribe(topic, default_offset: :earliest)
89
74
  @group.subscribe(topic)
90
- @default_offsets[topic] = default_offset
75
+ @offset_manager.set_default_offset(topic, default_offset)
91
76
 
92
77
  nil
93
78
  end
@@ -111,15 +96,32 @@ module Kafka
111
96
  batch = fetch_batch
112
97
 
113
98
  batch.each do |message|
114
- yield message
99
+ Instrumentation.instrument("process_message.consumer.kafka") do |notification|
100
+ notification.update(
101
+ topic: message.topic,
102
+ partition: message.partition,
103
+ offset: message.offset,
104
+ key: message.key,
105
+ value: message.value,
106
+ )
107
+
108
+ yield message
109
+ end
110
+
111
+ @offset_manager.commit_offsets_if_necessary
115
112
 
116
113
  send_heartbeat_if_necessary
117
114
  mark_message_as_processed(message)
118
115
  end
119
116
  rescue ConnectionError => e
120
- @logger.error "Connection error while fetching messages: #{e}"
121
- else
122
- commit_offsets unless batch.nil? || batch.empty?
117
+ @logger.error "Connection error while sending heartbeat; rejoining"
118
+ join_group
119
+ rescue UnknownMemberId
120
+ @logger.error "Kicked out of group; rejoining"
121
+ join_group
122
+ rescue RebalanceInProgress
123
+ @logger.error "Group is rebalancing; rejoining"
124
+ join_group
123
125
  end
124
126
  end
125
127
  end
@@ -137,13 +139,20 @@ module Kafka
137
139
  #
138
140
  # @return [nil]
139
141
  def shutdown
142
+ @offset_manager.commit_offsets
140
143
  @group.leave
144
+ rescue ConnectionError
141
145
  end
142
146
 
143
147
  private
144
148
 
149
+ def join_group
150
+ @offset_manager.clear_offsets
151
+ @group.join
152
+ end
153
+
145
154
  def fetch_batch
146
- @group.join unless @group.member?
155
+ join_group unless @group.member?
147
156
 
148
157
  @logger.debug "Fetching a batch of messages"
149
158
 
@@ -160,15 +169,9 @@ module Kafka
160
169
  max_wait_time: 5,
161
170
  )
162
171
 
163
- offset_response = @group.fetch_offsets
164
-
165
172
  assigned_partitions.each do |topic, partitions|
166
173
  partitions.each do |partition|
167
- offset = @offsets.fetch(topic, {}).fetch(partition) {
168
- offset_response.offset_for(topic, partition)
169
- }
170
-
171
- offset = @default_offsets.fetch(topic) if offset < 0
174
+ offset = @offset_manager.next_offset_for(topic, partition)
172
175
 
173
176
  @logger.debug "Fetching from #{topic}/#{partition} starting at offset #{offset}"
174
177
 
@@ -178,14 +181,13 @@ module Kafka
178
181
 
179
182
  messages = operation.execute
180
183
 
181
- @logger.debug "Fetched #{messages.count} messages"
184
+ @logger.info "Fetched #{messages.count} messages"
182
185
 
183
186
  messages
184
- end
187
+ rescue ConnectionError => e
188
+ @logger.error "Connection error while fetching messages: #{e}"
185
189
 
186
- def commit_offsets
187
- @logger.debug "Committing offsets"
188
- @group.commit_offsets(@offsets)
190
+ return []
189
191
  end
190
192
 
191
193
  # Sends a heartbeat if it would be necessary in order to avoid getting
@@ -204,8 +206,7 @@ module Kafka
204
206
  end
205
207
 
206
208
  def mark_message_as_processed(message)
207
- @offsets[message.topic] ||= {}
208
- @offsets[message.topic][message.partition] = message.offset + 1
209
+ @offset_manager.mark_as_processed(message.topic, message.partition, message.offset)
209
210
  end
210
211
  end
211
212
  end
@@ -44,6 +44,7 @@ module Kafka
44
44
  def leave
45
45
  @logger.info "[#{@member_id}] Leaving group `#{@group_id}`"
46
46
  coordinator.leave_group(group_id: @group_id, member_id: @member_id)
47
+ rescue ConnectionError
47
48
  end
48
49
 
49
50
  def fetch_offsets
@@ -66,14 +67,6 @@ module Kafka
66
67
  Protocol.handle_error(error_code)
67
68
  end
68
69
  end
69
- rescue UnknownMemberId
70
- @logger.error "Kicked out of group; rejoining"
71
- join
72
- retry
73
- rescue IllegalGeneration
74
- @logger.error "Illegal generation #{@generation_id}; rejoining group"
75
- join
76
- retry
77
70
  end
78
71
 
79
72
  def heartbeat
@@ -86,15 +79,6 @@ module Kafka
86
79
  )
87
80
 
88
81
  Protocol.handle_error(response.error_code)
89
- rescue ConnectionError => e
90
- @logger.error "Connection error while sending heartbeat; rejoining"
91
- join
92
- rescue UnknownMemberId
93
- @logger.error "Kicked out of group; rejoining"
94
- join
95
- rescue RebalanceInProgress
96
- @logger.error "Group is rebalancing; rejoining"
97
- join
98
82
  end
99
83
 
100
84
  private
@@ -130,8 +114,6 @@ module Kafka
130
114
  end
131
115
 
132
116
  def synchronize
133
- @logger.info "[#{@member_id}] Synchronizing group"
134
-
135
117
  group_assignment = {}
136
118
 
137
119
  if group_leader?
lib/kafka/offset_manager.rb ADDED
@@ -0,0 +1,75 @@
1
+ module Kafka
2
+ class OffsetManager
3
+ def initialize(group:, logger:, commit_interval:, commit_threshold:)
4
+ @group = group
5
+ @logger = logger
6
+ @commit_interval = commit_interval
7
+ @commit_threshold = commit_threshold
8
+
9
+ @uncommitted_offsets = 0
10
+ @processed_offsets = {}
11
+ @default_offsets = {}
12
+ @committed_offsets = nil
13
+ @last_commit = Time.at(0)
14
+ end
15
+
16
+ def set_default_offset(topic, default_offset)
17
+ @default_offsets[topic] = default_offset
18
+ end
19
+
20
+ def mark_as_processed(topic, partition, offset)
21
+ @uncommitted_offsets += 1
22
+ @processed_offsets[topic] ||= {}
23
+ @processed_offsets[topic][partition] = offset + 1
24
+ end
25
+
26
+ def next_offset_for(topic, partition)
27
+ offset = @processed_offsets.fetch(topic, {}).fetch(partition) {
28
+ committed_offset_for(topic, partition)
29
+ }
30
+
31
+ offset = @default_offsets.fetch(topic) if offset < 0
32
+
33
+ offset
34
+ end
35
+
36
+ def commit_offsets
37
+ unless @processed_offsets.empty?
38
+ @logger.info "Committing offsets for #{@uncommitted_offsets} messages"
39
+
40
+ @group.commit_offsets(@processed_offsets)
41
+
42
+ @last_commit = Time.now
43
+ @processed_offsets.clear
44
+ @uncommitted_offsets = 0
45
+ end
46
+ end
47
+
48
+ def commit_offsets_if_necessary
49
+ if seconds_since_last_commit >= @commit_interval || commit_threshold_reached?
50
+ commit_offsets
51
+ end
52
+ end
53
+
54
+ def clear_offsets
55
+ @uncommitted_offsets = 0
56
+ @processed_offsets.clear
57
+ @committed_offsets = nil
58
+ end
59
+
60
+ private
61
+
62
+ def seconds_since_last_commit
63
+ Time.now - @last_commit
64
+ end
65
+
66
+ def committed_offset_for(topic, partition)
67
+ @committed_offsets ||= @group.fetch_offsets
68
+ @committed_offsets.offset_for(topic, partition)
69
+ end
70
+
71
+ def commit_threshold_reached?
72
+ @commit_threshold != 0 && @uncommitted_offsets >= @commit_threshold
73
+ end
74
+ end
75
+ end
@@ -130,41 +130,7 @@ module Kafka
130
130
  #
131
131
  class Producer
132
132
 
133
- # Initializes a new Producer.
134
- #
135
- # @param cluster [Cluster] the cluster client. Typically passed in for you.
136
- #
137
- # @param logger [Logger] the logger that should be used. Typically passed
138
- # in for you.
139
- #
140
- # @param ack_timeout [Integer] The number of seconds a broker can wait for
141
- # replicas to acknowledge a write before responding with a timeout.
142
- #
143
- # @param required_acks [Integer] The number of replicas that must acknowledge
144
- # a write.
145
- #
146
- # @param max_retries [Integer] the number of retries that should be attempted
147
- # before giving up sending messages to the cluster. Does not include the
148
- # original attempt.
149
- #
150
- # @param retry_backoff [Integer] the number of seconds to wait between retries.
151
- #
152
- # @param max_buffer_size [Integer] the number of messages allowed in the buffer
153
- # before new writes will raise {BufferOverflow} exceptions.
154
- #
155
- # @param max_buffer_bytesize [Integer] the maximum size of the buffer in bytes.
156
- # attempting to produce messages when the buffer reaches this size will
157
- # result in {BufferOverflow} being raised.
158
- #
159
- # @param compression_codec [Symbol, nil] the name of the compression codec to
160
- # use, or nil if no compression should be performed. Valid codecs: `:snappy`
161
- # and `:gzip`.
162
- #
163
- # @param compression_threshold [Integer] the number of messages that needs to
164
- # be in a message set before it should be compressed. Note that message sets
165
- # are per-partition rather than per-topic or per-producer.
166
- #
167
- def initialize(cluster:, logger:, compression_codec: nil, compression_threshold: 1, ack_timeout: 5, required_acks: 1, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000, max_buffer_bytesize: 10_000_000)
133
+ def initialize(cluster:, logger:, compressor:, ack_timeout:, required_acks:, max_retries:, retry_backoff:, max_buffer_size:, max_buffer_bytesize:)
168
134
  @cluster = cluster
169
135
  @logger = logger
170
136
  @required_acks = required_acks
@@ -173,11 +139,7 @@ module Kafka
173
139
  @retry_backoff = retry_backoff
174
140
  @max_buffer_size = max_buffer_size
175
141
  @max_buffer_bytesize = max_buffer_bytesize
176
-
177
- @compressor = Compressor.new(
178
- codec_name: @compression_codec,
179
- threshold: @compression_threshold,
180
- )
142
+ @compressor = compressor
181
143
 
182
144
  # The set of topics that are produced to.
183
145
  @target_topics = Set.new
@@ -18,11 +18,14 @@ module Kafka
18
18
  end
19
19
 
20
20
  def offset_for(topic, partition)
21
- offset_info = topics.fetch(topic).fetch(partition)
21
+ offset_info = topics.fetch(topic).fetch(partition, nil)
22
22
 
23
- Protocol.handle_error(offset_info.error_code)
24
-
25
- offset_info.offset
23
+ if offset_info
24
+ Protocol.handle_error(offset_info.error_code)
25
+ offset_info.offset
26
+ else
27
+ -1
28
+ end
26
29
  end
27
30
 
28
31
  def self.decode(decoder)
lib/kafka/ssl_socket_with_timeout.rb ADDED
@@ -0,0 +1,154 @@
1
+ require "socket"
2
+
3
+ module Kafka
4
+
5
+ # Opens sockets in a non-blocking fashion, ensuring that we're not stalling
6
+ # for long periods of time.
7
+ #
8
+ # It's possible to set timeouts for connecting to the server, for reading data,
9
+ # and for writing data. Whenever a timeout is exceeded, Errno::ETIMEDOUT is
10
+ # raised.
11
+ #
12
+ class SSLSocketWithTimeout
13
+
14
+ # Opens a socket.
15
+ #
16
+ # @param host [String]
17
+ # @param port [Integer]
18
+ # @param connect_timeout [Integer] the connection timeout, in seconds.
19
+ # @param timeout [Integer] the read and write timeout, in seconds.
20
+ # @param ssl_context [OpenSSL::SSL::SSLContext] which SSLContext the ssl connection should use
21
+ # @raise [Errno::ETIMEDOUT] if the timeout is exceeded.
22
+ def initialize(host, port, connect_timeout: nil, timeout: nil, ssl_context:)
23
+ addr = Socket.getaddrinfo(host, nil)
24
+ sockaddr = Socket.pack_sockaddr_in(port, addr[0][3])
25
+
26
+ @timeout = timeout
27
+
28
+ @tcp_socket = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)
29
+ @tcp_socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
30
+
31
+ # first initiate the TCP socket
32
+ begin
33
+ # Initiate the socket connection in the background. If it doesn't fail
34
+ # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
35
+ # indicating the connection is in progress.
36
+ @tcp_socket.connect_nonblock(sockaddr)
37
+ rescue IO::WaitWritable
38
+ # IO.select will block until the socket is writable or the timeout
39
+ # is exceeded, whichever comes first.
40
+ unless IO.select(nil, [@tcp_socket], nil, connect_timeout)
41
+ # IO.select returns nil when the socket is not ready before timeout
42
+ # seconds have elapsed
43
+ @tcp_socket.close
44
+ raise Errno::ETIMEDOUT
45
+ end
46
+
47
+ begin
48
+ # Verify there is now a good connection.
49
+ @tcp_socket.connect_nonblock(sockaddr)
50
+ rescue Errno::EISCONN
51
+ # The socket is connected, we're good!
52
+ end
53
+ end
54
+
55
+ # once that's connected, we can start initiating the ssl socket
56
+ @ssl_socket = OpenSSL::SSL::SSLSocket.new(@tcp_socket, ssl_context)
57
+
58
+ begin
59
+ # Initiate the socket connection in the background. If it doesn't fail
60
+ # immediately it will raise an IO::WaitWritable (Errno::EINPROGRESS)
61
+ # indicating the connection is in progress.
62
+ # Unlike waiting for a tcp socket to connect, you can't time out ssl socket
63
+ # connections during the connect phase properly, because IO.select only partially works.
64
+ # Instead, you have to retry.
65
+ @ssl_socket.connect_nonblock
66
+ rescue Errno::EAGAIN, Errno::EWOULDBLOCK, IO::WaitReadable
67
+ IO.select([@ssl_socket])
68
+ retry
69
+ rescue IO::WaitWritable
70
+ IO.select(nil, [@ssl_socket])
71
+ retry
72
+ end
73
+ end
74
+
75
+ # Reads bytes from the socket, possibly with a timeout.
76
+ #
77
+ # @param num_bytes [Integer] the number of bytes to read.
78
+ # @raise [Errno::ETIMEDOUT] if the timeout is exceeded.
79
+ # @return [String] the data that was read from the socket.
80
+ def read(num_bytes)
81
+ buffer = ''
82
+ until buffer.length >= num_bytes
83
+ begin
84
+ # unlike plain tcp sockets, ssl sockets don't support IO.select
85
+ # properly.
86
+ # Instead, timeouts happen on a per read basis, and we have to
87
+ # catch exceptions from read_nonblock, and gradually build up
88
+ # our read buffer.
89
+ buffer << @ssl_socket.read_nonblock(num_bytes - buffer.length)
90
+ rescue IO::WaitReadable
91
+ unless IO.select([@ssl_socket], nil, nil, @timeout)
92
+ raise Errno::ETIMEDOUT
93
+ end
94
+ retry
95
+ rescue IO::WaitWritable
96
+ unless IO.select(nil, [@ssl_socket], nil, @timeout)
97
+ raise Errno::ETIMEDOUT
98
+ end
99
+ retry
100
+ end
101
+ end
102
+ buffer
103
+ end
104
+
105
+ # Writes bytes to the socket, possibly with a timeout.
106
+ #
107
+ # @param bytes [String] the data that should be written to the socket.
108
+ # @raise [Errno::ETIMEDOUT] if the timeout is exceeded.
109
+ # @return [Integer] the number of bytes written.
110
+ def write(bytes)
111
+ loop do
112
+ written = 0
113
+ begin
114
+ # unlike plain tcp sockets, ssl sockets don't support IO.select
115
+ # properly.
116
+ # Instead, timeouts happen on a per write basis, and we have to
117
+ # catch exceptions from write_nonblock, and gradually build up
118
+ # our write buffer.
119
+ written += @ssl_socket.write_nonblock(bytes)
120
+ rescue Errno::EFAULT => error
121
+ raise error
122
+ rescue OpenSSL::SSL::SSLError, Errno::EAGAIN, Errno::EWOULDBLOCK, IO::WaitWritable => error
123
+ if error.is_a?(OpenSSL::SSL::SSLError) && error.message == 'write would block'
124
+ if IO.select(nil, [@ssl_socket], nil, @timeout)
125
+ retry
126
+ else
127
+ raise Errno::ETIMEDOUT
128
+ end
129
+ else
130
+ raise error
131
+ end
132
+ end
133
+
134
+ # Fast, common case.
135
+ break if written == bytes.size
136
+
137
+ # This takes advantage of the fact that most ruby implementations
138
+ # have Copy-On-Write strings. Thus, when requesting a subrange
139
+ # of data, we actually don't copy data because the new string
140
+ # simply references a subrange of the original.
141
+ bytes = bytes[written, bytes.size]
142
+ end
143
+ end
144
+
145
+ def close
146
+ @tcp_socket.close
147
+ @ssl_socket.close
148
+ end
149
+
150
+ def set_encoding(encoding)
151
+ @tcp_socket.set_encoding(encoding)
152
+ end
153
+ end
154
+ end
@@ -1,3 +1,3 @@
1
1
  module Kafka
2
- VERSION = "0.2.0"
2
+ VERSION = "0.3.0"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ruby-kafka
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Daniel Schierbeck
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-03-01 00:00:00.000000000 Z
11
+ date: 2016-03-08 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -163,8 +163,10 @@ files:
163
163
  - bin/console
164
164
  - bin/setup
165
165
  - circle.yml
166
+ - examples/firehose-producer.rb
166
167
  - examples/simple-consumer.rb
167
168
  - examples/simple-producer.rb
169
+ - examples/ssl-producer.rb
168
170
  - lib/kafka.rb
169
171
  - lib/kafka/async_producer.rb
170
172
  - lib/kafka/broker.rb
@@ -181,6 +183,7 @@ files:
181
183
  - lib/kafka/gzip_codec.rb
182
184
  - lib/kafka/instrumentation.rb
183
185
  - lib/kafka/message_buffer.rb
186
+ - lib/kafka/offset_manager.rb
184
187
  - lib/kafka/partitioner.rb
185
188
  - lib/kafka/pending_message.rb
186
189
  - lib/kafka/pending_message_queue.rb
@@ -219,6 +222,7 @@ files:
219
222
  - lib/kafka/round_robin_assignment_strategy.rb
220
223
  - lib/kafka/snappy_codec.rb
221
224
  - lib/kafka/socket_with_timeout.rb
225
+ - lib/kafka/ssl_socket_with_timeout.rb
222
226
  - lib/kafka/version.rb
223
227
  - lib/ruby-kafka.rb
224
228
  - ruby-kafka.gemspec