ruby-kafka 0.1.5 → 0.1.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: add61b5ca7d42ab4a606bf411d78d9fb8baa0224
-   data.tar.gz: c8704b6429bed2e7e1159c5fc32e03db9476717f
+   metadata.gz: 50a1c28cf71285d37c57e3dbe7a5d156f891576c
+   data.tar.gz: f8a139dc4061ec8a771f86c0a5ca280692ddd5cc
  SHA512:
-   metadata.gz: 1669663bd73bee8b0ac73d0aefe93b8219a996f1c482ad6e7867c0b46ce5d3f91857acb21e418880d7754b67d687ff785596383b31eff7793f1211af13c943d8
-   data.tar.gz: f1de6fec8420801ca562d99925da270e61b955495a486117cc1659ccafe3c431fe8872000ebb4346af69ea8306ff4cceb237d047a957f1bebce3d8e48a87ef78
+   metadata.gz: 6d3245db50893aba63b50b600903dc0baf345017c940dcaa36b0e2fbe0f50fe231bc15f9af2dc083d4969fd63ba0493781570811a3ba4ba0e7df21b06e0fd993
+   data.tar.gz: d5cddd687cc85f02b0826e95205114e2f64cffd9dad48ef692e717dbb3f7cac41ce2b0afd71ee271e8590e27b9eac42d8f67f247c71fe760dec2915191361273
data/README.md CHANGED
@@ -24,7 +24,7 @@ Or install it yourself as:
 
  ## Usage
 
- Please see the [documentation site](http://www.rubydoc.info/gems/ruby-kafka) for detailed documentation on the latest release.
+ Please see the [documentation site](http://www.rubydoc.info/gems/ruby-kafka) for detailed documentation on the latest release. Note that the documentation on GitHub may not match the version of the library you're using – many changes are still being made to the API.
 
  ### Producing Messages to Kafka
 
@@ -39,7 +39,7 @@ kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
  A producer buffers messages and sends them to the broker that is the leader of the partition a given message is assigned to.
 
  ```ruby
- producer = kafka.get_producer
+ producer = kafka.producer
  ```
 
  `produce` will buffer the message in the producer but will _not_ actually send it to the Kafka cluster.
@@ -66,7 +66,7 @@ If you don't know exactly how many partitions are in the topic, or you'd rather
  producer.produce("hello4", topic: "test-messages", partition_key: "yo")
  ```
 
- `deliver_messages` will send the buffered messages to the cluster. Since messages may be destined for different partitions, this could involve writing to more than one Kafka broker. Note that a failure to send all buffered messages after the configured number of retries will result in `Kafka::FailedToSendMessages` being raised. This can be rescued and ignored; the messages will be kept in the buffer until the next attempt.
+ `deliver_messages` will send the buffered messages to the cluster. Since messages may be destined for different partitions, this could involve writing to more than one Kafka broker. Note that a failure to send all buffered messages after the configured number of retries will result in `Kafka::DeliveryFailed` being raised. This can be rescued and ignored; the messages will be kept in the buffer until the next attempt.
 
  ```ruby
  producer.deliver_messages
@@ -74,6 +74,71 @@ producer.deliver_messages
 
  Read the docs for [Kafka::Producer](http://www.rubydoc.info/gems/ruby-kafka/Kafka/Producer) for more details.
 
+ ### Asynchronously Producing Messages
+
+ A normal producer will block while `#deliver_messages` is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call `#deliver_messages` manually, with a frequency that balances batch size with message delay.
+
+ In order to avoid blocking during message deliveries you can use the _asynchronous producer_ API. It is mostly similar to the synchronous API, with calls to `#produce` and `#deliver_messages`. The main difference is that rather than blocking, these calls will return immediately. The actual work will be done in a background thread, with the messages and operations being sent from the caller over a thread safe queue.
+
+ ```ruby
+ # `#async_producer` will create a new asynchronous producer.
+ producer = kafka.async_producer
+
+ # The `#produce` API works as normal.
+ producer.produce("hello", topic: "greetings")
+
+ # `#deliver_messages` will return immediately.
+ producer.deliver_messages
+
+ # Make sure to call `#shutdown` on the producer in order to
+ # avoid leaking resources.
+ producer.shutdown
+ ```
+
+ By default, the delivery policy will be the same as for a synchronous producer: only when `#deliver_messages` is called will the messages be delivered. However, the asynchronous producer offers two complementary policies for _automatic delivery_:
+
+ 1. Trigger a delivery once the producer's message buffer reaches a specified _threshold_. This can be used to improve efficiency by increasing the batch size when sending messages to the Kafka cluster.
+ 2. Trigger a delivery at a _fixed time interval_. This puts an upper bound on message delays.
+
+ These policies can be used alone or in combination.
+
+ ```ruby
+ # `async_producer` will create a new asynchronous producer.
+ producer = kafka.async_producer(
+   # Trigger a delivery once 100 messages have been buffered.
+   delivery_threshold: 100,
+
+   # Trigger a delivery every 30 seconds.
+   delivery_interval: 30,
+ )
+
+ producer.produce("hello", topic: "greetings")
+
+ # ...
+ ```
+
+ **Note:** if the calling thread produces messages faster than the producer can write them to Kafka, you'll eventually run into problems. The internal queue used for sending messages from the calling thread to the background worker has a size limit; once this limit is reached, a call to `#produce` will raise `Kafka::BufferOverflow`.
+
+ ### Serialization
+
+ This library is agnostic to which serialization format you prefer. Both the value and key of a message are treated as binary strings of data. This makes it easier to use whatever serialization format you want, since you don't have to do anything special to make it work with ruby-kafka. Here's an example of encoding data with JSON:
+
+ ```ruby
+ require "json"
+
+ # ...
+
+ event = {
+   "name" => "pageview",
+   "url" => "https://example.com/posts/123",
+   # ...
+ }
+
+ data = JSON.dump(event)
+
+ producer.produce(data, topic: "events")
+ ```
+
  ### Partitioning
 
  Kafka topics are partitioned, with messages being assigned to a partition by the client. This allows a great deal of flexibility for the users. This section describes several strategies for partitioning and how they impact performance, data locality, etc.
@@ -137,9 +202,9 @@ producer.produce(event, topic: "events", partition: partition)
 
  The producer is designed for resilience in the face of temporary network errors, Kafka broker failovers, and other issues that prevent the client from writing messages to the destination topics. It does this by employing local, in-memory buffers. Only when messages are acknowledged by a Kafka broker will they be removed from the buffer.
 
- Typically, you'd configure the producer to retry failed attempts at sending messages, but sometimes all retries are exhausted. In that case, `Kafka::FailedToSendMessages` is raised from `Kafka::Producer#deliver_messages`. If you wish to have your application be resilient to this happening (e.g. if you're logging to Kafka from a web application) you can rescue this exception. The failed messages are still retained in the buffer, so a subsequent call to `#deliver_messages` will still attempt to send them.
+ Typically, you'd configure the producer to retry failed attempts at sending messages, but sometimes all retries are exhausted. In that case, `Kafka::DeliveryFailed` is raised from `Kafka::Producer#deliver_messages`. If you wish to have your application be resilient to this happening (e.g. if you're logging to Kafka from a web application) you can rescue this exception. The failed messages are still retained in the buffer, so a subsequent call to `#deliver_messages` will still attempt to send them.
 
- Note that there's a maximum buffer size; pass in a different value for `max_buffer_size` when calling `#get_producer` in order to configure this.
+ Note that there's a maximum buffer size; pass in a different value for `max_buffer_size` when calling `#producer` in order to configure this.
 
  A final note on buffers: local buffers give resilience against broker and network failures, and allow higher throughput due to message batching, but they also trade off consistency guarantees for higher availability and resilience. If your local process dies while messages are buffered, those messages will be lost. If you require high levels of consistency, you should call `#deliver_messages` immediately after `#produce`.
 
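A minimal sketch of the rescue pattern described above, assuming a `producer` built with `kafka.producer` and that it is acceptable to carry the failed messages over to the next delivery attempt:

```ruby
begin
  producer.deliver_messages
rescue Kafka::DeliveryFailed
  # All retries were exhausted. The undelivered messages stay in the
  # producer's buffer, so a later call to #deliver_messages retries them.
end
```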
@@ -152,7 +217,7 @@ It's important to understand how timeouts work if you have a latency sensitive a
  * `connect_timeout` sets the number of seconds to wait while connecting to a broker for the first time. When ruby-kafka initializes, it needs to connect to at least one host in `seed_brokers` in order to discover the Kafka cluster. Each host is tried until there's one that works. Usually that means the first one, but if your entire cluster is down, or there's a network partition, you could wait up to `n * connect_timeout` seconds, where `n` is the number of seed brokers.
  * `socket_timeout` sets the number of seconds to wait when reading from or writing to a socket connection to a broker. After this timeout expires the connection will be killed. Note that some Kafka operations are by definition long-running, such as waiting for new messages to arrive in a partition, so don't set this value too low. When configuring timeouts relating to specific Kafka operations, make sure to make them shorter than this one.
 
- **Producer timeouts** can be configured when calling `#get_producer` on a client instance:
+ **Producer timeouts** can be configured when calling `#producer` on a client instance:
 
  * `ack_timeout` is a timeout executed by a broker when the client is sending messages to it. It defines the number of seconds the broker should wait for replicas to acknowledge the write before responding to the client with an error. As such, it relates to the `required_acks` setting. It should be set lower than `socket_timeout`.
  * `retry_backoff` configures the number of seconds to wait after a failed attempt to send messages to a Kafka broker before retrying. The `max_retries` setting defines the maximum number of retries to attempt, and so the total duration could be up to `max_retries * retry_backoff` seconds. The timeout can be arbitrarily long, and shouldn't be too short: if a broker goes down its partitions will be handed off to another broker, and that can take tens of seconds.
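A hedged sketch combining the settings listed above. The broker addresses and numbers are illustrative only; the network timeouts are assumed to be `Kafka.new` options, while the producer timeouts go to `#producer` as stated above:

```ruby
kafka = Kafka.new(
  seed_brokers: ["kafka1:9092", "kafka2:9092"],

  # Give up quickly when a seed broker can't be reached.
  connect_timeout: 5,

  # Kill socket reads and writes that take longer than this.
  socket_timeout: 10,
)

producer = kafka.producer(
  # How long a broker may wait for replica acknowledgement;
  # keep this below socket_timeout.
  ack_timeout: 5,

  # Wait one second between attempts, for at most two retries.
  retry_backoff: 1,
  max_retries: 2,
)
```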
@@ -23,7 +23,7 @@ kafka = Kafka.new(
    logger: logger,
  )
 
- producer = kafka.get_producer
+ producer = kafka.producer
 
  begin
    $stdin.each_with_index do |line, index|
data/lib/kafka.rb CHANGED
@@ -98,7 +98,7 @@ module Kafka
    end
 
    # Raised if not all messages could be sent by a producer.
-   class FailedToSendMessages < Error
+   class DeliveryFailed < Error
    end
 
    # Initializes a new Kafka client.
@@ -0,0 +1,181 @@
+ require "thread"
+
+ module Kafka
+
+   # A Kafka producer that does all its work in the background so as to not block
+   # the calling thread. Calls to {#deliver_messages} are asynchronous and return
+   # immediately.
+   #
+   # In addition to this property it's possible to define automatic delivery
+   # policies. These allow placing an upper bound on the number of buffered
+   # messages and the time between message deliveries.
+   #
+   # * If `delivery_threshold` is set to a value _n_ higher than zero, the producer
+   #   will automatically deliver its messages once its buffer size reaches _n_.
+   # * If `delivery_interval` is set to a value _n_ higher than zero, the producer
+   #   will automatically deliver its messages every _n_ seconds.
+   #
+   # By default, automatic delivery is disabled and you'll have to call
+   # {#deliver_messages} manually.
+   #
+   # The calling thread communicates with the background thread doing the actual
+   # work using a thread safe queue. While the background thread is busy delivering
+   # messages, new messages will be buffered in the queue. In order to avoid
+   # the queue growing uncontrollably in cases where the background thread gets
+   # stuck or can't follow the pace of the calling thread, there's a maximum
+   # number of messages that is allowed to be buffered. You can configure this
+   # value by setting `max_queue_size`.
+   #
+   # ## Example
+   #
+   #     producer = kafka.async_producer(
+   #       # Keep at most 1.000 messages in the buffer before delivering:
+   #       delivery_threshold: 1000,
+   #
+   #       # Deliver messages every 30 seconds:
+   #       delivery_interval: 30,
+   #     )
+   #
+   #     # There's no need to manually call #deliver_messages, it will happen
+   #     # automatically in the background.
+   #     producer.produce("hello", topic: "greetings")
+   #
+   #     # Remember to shut down the producer when you're done with it.
+   #     producer.shutdown
+   #
+   class AsyncProducer
+
+     # Initializes a new AsyncProducer.
+     #
+     # @param sync_producer [Kafka::Producer] the synchronous producer that should
+     #   be used in the background.
+     # @param max_queue_size [Integer] the maximum number of messages allowed in
+     #   the queue.
+     # @param delivery_threshold [Integer] if greater than zero, the number of
+     #   buffered messages that will automatically trigger a delivery.
+     # @param delivery_interval [Integer] if greater than zero, the number of
+     #   seconds between automatic message deliveries.
+     #
+     def initialize(sync_producer:, max_queue_size: 1000, delivery_threshold: 0, delivery_interval: 0)
+       raise ArgumentError unless max_queue_size > 0
+       raise ArgumentError unless delivery_threshold >= 0
+       raise ArgumentError unless delivery_interval >= 0
+
+       @queue = Queue.new
+       @max_queue_size = max_queue_size
+
+       @worker_thread = Thread.new do
+         worker = Worker.new(
+           queue: @queue,
+           producer: sync_producer,
+           delivery_threshold: delivery_threshold,
+         )
+
+         worker.run
+       end
+
+       @worker_thread.abort_on_exception = true
+
+       if delivery_interval > 0
+         Thread.new do
+           Timer.new(queue: @queue, interval: delivery_interval).run
+         end
+       end
+     end
+
+     # Produces a message to the specified topic.
+     #
+     # @see Kafka::Producer#produce
+     # @param (see Kafka::Producer#produce)
+     # @raise [BufferOverflow] if the message queue is full.
+     # @return [nil]
+     def produce(*args)
+       raise BufferOverflow if @queue.size >= @max_queue_size
+       @queue << [:produce, args]
+
+       nil
+     end
+
+     # Asynchronously delivers the buffered messages. This method will return
+     # immediately and the actual work will be done in the background.
+     #
+     # @see Kafka::Producer#deliver_messages
+     # @return [nil]
+     def deliver_messages
+       @queue << [:deliver_messages, nil]
+
+       nil
+     end
+
+     # Shuts down the producer, releasing the network resources used. This
+     # method will block until the buffered messages have been delivered.
+     #
+     # @see Kafka::Producer#shutdown
+     # @return [nil]
+     def shutdown
+       @queue << [:shutdown, nil]
+       @worker_thread.join
+
+       nil
+     end
+
+     class Timer
+       def initialize(interval:, queue:)
+         @queue = queue
+         @interval = interval
+       end
+
+       def run
+         loop do
+           sleep(@interval)
+           @queue << [:deliver_messages, nil]
+         end
+       end
+     end
+
+     class Worker
+       def initialize(queue:, producer:, delivery_threshold:)
+         @queue = queue
+         @producer = producer
+         @delivery_threshold = delivery_threshold
+       end
+
+       def run
+         loop do
+           operation, payload = @queue.pop
+
+           case operation
+           when :produce
+             @producer.produce(*payload)
+             deliver_messages if threshold_reached?
+           when :deliver_messages
+             deliver_messages
+           when :shutdown
+             # Deliver any pending messages first.
+             deliver_messages
+
+             # Stop the run loop.
+             break
+           else
+             raise "Unknown operation #{operation.inspect}"
+           end
+         end
+       ensure
+         @producer.shutdown
+       end
+
+       private
+
+       def deliver_messages
+         @producer.deliver_messages
+       rescue DeliveryFailed
+         # Delivery failed.
+       end
+
+       def threshold_reached?
+         @delivery_threshold > 0 &&
+           @producer.buffer_size >= @delivery_threshold
+       end
+     end
+   end
+ end
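Because the asynchronous producer delivers from a background thread, a process that exits without calling `#shutdown` can drop whatever is still queued. A small sketch of one way application code might guard against that; the `at_exit` hook is an assumption about the calling application, not something the gem installs for you:

```ruby
kafka = Kafka.new(seed_brokers: ["kafka1:9092"])
producer = kafka.async_producer(delivery_interval: 10)

# Flush buffered messages and join the worker thread on process exit.
at_exit { producer.shutdown }

producer.produce("hello", topic: "greetings")
```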
data/lib/kafka/client.rb CHANGED
@@ -1,5 +1,6 @@
  require "kafka/cluster"
  require "kafka/producer"
+ require "kafka/async_producer"
  require "kafka/fetched_message"
  require "kafka/fetch_operation"
 
@@ -47,10 +48,21 @@ module Kafka
    #
    # @see Producer#initialize
    # @return [Kafka::Producer] the Kafka producer.
-   def get_producer(**options)
+   def producer(**options)
      Producer.new(cluster: @cluster, logger: @logger, **options)
    end
 
+   def async_producer(delivery_interval: 0, delivery_threshold: 0, max_queue_size: 1000, **options)
+     sync_producer = producer(**options)
+
+     AsyncProducer.new(
+       sync_producer: sync_producer,
+       delivery_interval: delivery_interval,
+       delivery_threshold: delivery_threshold,
+       max_queue_size: max_queue_size,
+     )
+   end
+
    # Fetches a batch of messages from a single partition. Note that it's possible
    # to get back empty batches.
    #
@@ -0,0 +1,23 @@
+ require "kafka/snappy_codec"
+ require "kafka/gzip_codec"
+
+ module Kafka
+   module Compression
+     def self.find_codec(name)
+       case name
+       when nil then nil
+       when :snappy then SnappyCodec.new
+       when :gzip then GzipCodec.new
+       else raise "Unknown compression codec #{name}"
+       end
+     end
+
+     def self.find_codec_by_id(codec_id)
+       case codec_id
+       when 1 then GzipCodec.new
+       when 2 then SnappyCodec.new
+       else raise "Unknown codec id #{codec_id}"
+       end
+     end
+   end
+ end
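This lookup is what the producer uses internally when it is given a `compression_codec:` option (see the producer changes further down). A minimal round-trip sketch of the codec interface, using gzip so that only the standard library is needed:

```ruby
require "stringio"
require "kafka"

codec = Kafka::Compression.find_codec(:gzip)

compressed = codec.compress("hello " * 1_000)
codec.decompress(compressed) == "hello " * 1_000 # => true
```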
@@ -69,13 +69,13 @@ module Kafka
        fetched_topic.partitions.flat_map {|fetched_partition|
          Protocol.handle_error(fetched_partition.error_code)
 
-         fetched_partition.messages.map {|offset, message|
+         fetched_partition.messages.map {|message|
            FetchedMessage.new(
              value: message.value,
              key: message.key,
              topic: fetched_topic.name,
              partition: fetched_partition.partition,
-             offset: offset,
+             offset: message.offset,
            )
          }
        }
@@ -0,0 +1,28 @@
+ module Kafka
+   class GzipCodec
+     def initialize
+       require "zlib"
+     end
+
+     def codec_id
+       1
+     end
+
+     def compress(data)
+       buffer = StringIO.new
+       buffer.set_encoding(Encoding::BINARY)
+
+       writer = Zlib::GzipWriter.new(buffer, Zlib::DEFAULT_COMPRESSION, Zlib::DEFAULT_STRATEGY)
+       writer.write(data)
+       writer.close
+
+       buffer.string
+     end
+
+     def decompress(data)
+       buffer = StringIO.new(data)
+       reader = Zlib::GzipReader.new(buffer)
+       reader.read
+     end
+   end
+ end
@@ -1,3 +1,5 @@
+ require "kafka/protocol/message_set"
+
  module Kafka
    # A produce operation attempts to send all messages in a buffer to the Kafka cluster.
    # Since topics and partitions are spread among all brokers in a cluster, this usually
@@ -23,11 +25,12 @@ module Kafka
    # * `sent_message_count` – the number of messages that were successfully sent.
    #
    class ProduceOperation
-     def initialize(cluster:, buffer:, required_acks:, ack_timeout:, logger:)
+     def initialize(cluster:, buffer:, compression_codec:, required_acks:, ack_timeout:, logger:)
        @cluster = cluster
        @buffer = buffer
        @required_acks = required_acks
        @ack_timeout = ack_timeout
+       @compression_codec = compression_codec
        @logger = logger
      end
 
@@ -67,12 +70,20 @@ module Kafka
        end
      end
 
-     messages_for_broker.each do |broker, message_set|
+     messages_for_broker.each do |broker, message_buffer|
        begin
-         @logger.info "Sending #{message_set.size} messages to #{broker}"
+         @logger.info "Sending #{message_buffer.size} messages to #{broker}"
+
+         messages_for_topics = {}
+
+         message_buffer.each do |topic, partition, messages|
+           message_set = Protocol::MessageSet.new(messages: messages, compression_codec: @compression_codec)
+           messages_for_topics[topic] ||= {}
+           messages_for_topics[topic][partition] = message_set
+         end
 
          response = broker.produce(
-           messages_for_topics: message_set.to_h,
+           messages_for_topics: messages_for_topics,
            required_acks: @required_acks,
            timeout: @ack_timeout * 1000, # Kafka expects the timeout in milliseconds.
          )
@@ -2,6 +2,7 @@ require "kafka/partitioner"
  require "kafka/message_buffer"
  require "kafka/produce_operation"
  require "kafka/pending_message"
+ require "kafka/compression"
 
  module Kafka
 
@@ -14,11 +15,11 @@ module Kafka
  #     kafka = Kafka.new(...)
  #
  #     # Will instantiate Kafka::Producer
- #     producer = kafka.get_producer
+ #     producer = kafka.producer
  #
  # This is done in order to share a logger as well as a pool of broker connections across
  # different producers. This also means that you don't need to pass the `cluster` and
- # `logger` options to `#get_producer`. See {#initialize} for the list of other options
+ # `logger` options to `#producer`. See {#initialize} for the list of other options
  # you can pass in.
  #
  # ## Buffering
@@ -77,7 +78,7 @@ module Kafka
  #       logger: logger,
  #     )
  #
- #     producer = kafka.get_producer
+ #     producer = kafka.producer
  #
  #     begin
  #       $stdin.each_with_index do |line, index|
@@ -117,7 +118,7 @@ module Kafka
    # @param max_buffer_size [Integer] the number of messages allowed in the buffer
    #   before new writes will raise BufferOverflow exceptions.
    #
-   def initialize(cluster:, logger:, ack_timeout: 5, required_acks: 1, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000)
+   def initialize(cluster:, logger:, compression_codec: nil, ack_timeout: 5, required_acks: 1, max_retries: 2, retry_backoff: 1, max_buffer_size: 1000)
      @cluster = cluster
      @logger = logger
      @required_acks = required_acks
@@ -125,6 +126,7 @@ module Kafka
      @max_retries = max_retries
      @retry_backoff = retry_backoff
      @max_buffer_size = max_buffer_size
+     @compression_codec = Compression.find_codec(compression_codec)
 
      # A buffer organized by topic/partition.
      @buffer = MessageBuffer.new
@@ -185,7 +187,7 @@ module Kafka
    # the writes. The `ack_timeout` setting places an upper bound on the amount of
    # time the call will block before failing.
    #
-   # @raise [FailedToSendMessages] if not all messages could be successfully sent.
+   # @raise [DeliveryFailed] if not all messages could be successfully sent.
    # @return [nil]
    def deliver_messages
      # There's no need to do anything if the buffer is empty.
@@ -233,6 +235,7 @@ module Kafka
        buffer: @buffer,
        required_acks: @required_acks,
        ack_timeout: @ack_timeout,
+       compression_codec: @compression_codec,
        logger: @logger,
      )
 
@@ -268,7 +271,7 @@ module Kafka
      unless @buffer.empty?
        partitions = @buffer.map {|topic, partition, _| "#{topic}/#{partition}" }.join(", ")
 
-       raise FailedToSendMessages, "Failed to send messages to #{partitions}"
+       raise DeliveryFailed, "Failed to send messages to #{partitions}"
      end
    end
 
@@ -16,43 +16,77 @@ module Kafka
    class Message
      MAGIC_BYTE = 0
 
-     attr_reader :key, :value, :attributes
+     attr_reader :key, :value, :attributes, :offset
 
-     def initialize(key:, value:, attributes: 0)
+     def initialize(value:, key: nil, attributes: 0, offset: -1)
        @key = key
        @value = value
        @attributes = attributes
+       @offset = offset
      end
 
      def encode(encoder)
-       data = encode_without_crc
-       crc = Zlib.crc32(data)
+       data = encode_with_crc
 
-       encoder.write_int32(crc)
-       encoder.write(data)
+       encoder.write_int64(offset)
+       encoder.write_bytes(data)
      end
 
      def ==(other)
-       @key == other.key && @value == other.value && @attributes == other.attributes
+       @key == other.key &&
+         @value == other.value &&
+         @attributes == other.attributes &&
+         @offset == other.offset
+     end
+
+     def compressed?
+       @attributes != 0
+     end
+
+     # @return [Kafka::Protocol::MessageSet]
+     def decompress
+       codec = Compression.find_codec_by_id(@attributes)
+
+       # For some weird reason we need to cut out the first 20 bytes.
+       data = codec.decompress(value)
+       message_set_decoder = Decoder.from_string(data)
+
+       MessageSet.decode(message_set_decoder)
      end
 
      def self.decode(decoder)
-       crc = decoder.int32
-       magic_byte = decoder.int8
+       offset = decoder.int64
+       message_decoder = Decoder.from_string(decoder.bytes)
+
+       crc = message_decoder.int32
+       magic_byte = message_decoder.int8
 
        unless magic_byte == MAGIC_BYTE
          raise Kafka::Error, "Invalid magic byte: #{magic_byte}"
        end
 
-       attributes = decoder.int8
-       key = decoder.bytes
-       value = decoder.bytes
+       attributes = message_decoder.int8
+       key = message_decoder.bytes
+       value = message_decoder.bytes
 
-       new(key: key, value: value, attributes: attributes)
+       new(key: key, value: value, attributes: attributes, offset: offset)
      end
 
      private
 
+     def encode_with_crc
+       buffer = StringIO.new
+       encoder = Encoder.new(buffer)
+
+       data = encode_without_crc
+       crc = Zlib.crc32(data)
+
+       encoder.write_int32(crc)
+       encoder.write(data)
+
+       buffer.string
+     end
+
      def encode_without_crc
        buffer = StringIO.new
        encoder = Encoder.new(buffer)
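A minimal sketch of the new per-message framing introduced above: the offset is written first, followed by the length-prefixed CRC and payload, and `decode` reverses the process. Class and method names are taken from the diff; the values are illustrative.

```ruby
require "stringio"
require "zlib"
require "kafka"

message = Kafka::Protocol::Message.new(value: "hello", key: "greeting")

# Encode into an in-memory buffer, then decode it back.
buffer = StringIO.new
message.encode(Kafka::Protocol::Encoder.new(buffer))

decoder = Kafka::Protocol::Decoder.from_string(buffer.string)
copy = Kafka::Protocol::Message.decode(decoder)

copy.value  # => "hello"
copy.offset # => -1, the placeholder offset used when producing
```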
@@ -3,23 +3,65 @@ module Kafka
    class MessageSet
      attr_reader :messages
 
-     def initialize(messages:)
+     def initialize(messages: [], compression_codec: nil)
        @messages = messages
+       @compression_codec = compression_codec
+     end
+
+     def ==(other)
+       messages == other.messages
+     end
+
+     def encode(encoder)
+       if @compression_codec.nil?
+         encode_without_compression(encoder)
+       else
+         encode_with_compression(encoder)
+       end
      end
 
      def self.decode(decoder)
        fetched_messages = []
 
        until decoder.eof?
-         offset = decoder.int64
-         message_decoder = Decoder.from_string(decoder.bytes)
-         message = Message.decode(message_decoder)
+         message = Message.decode(decoder)
 
-         fetched_messages << [offset, message]
+         if message.compressed?
+           wrapped_message_set = message.decompress
+           fetched_messages.concat(wrapped_message_set.messages)
+         else
+           fetched_messages << message
+         end
        end
 
        new(messages: fetched_messages)
      end
+
+     private
+
+     def encode_with_compression(encoder)
+       codec = @compression_codec
+
+       buffer = StringIO.new
+       encode_without_compression(Encoder.new(buffer))
+       data = codec.compress(buffer.string)
+
+       wrapper_message = Protocol::Message.new(
+         value: data,
+         attributes: codec.codec_id,
+       )
+
+       message_set = MessageSet.new(messages: [wrapper_message])
+       message_set.encode(encoder)
+     end
+
+     def encode_without_compression(encoder)
+       # Messages in a message set are *not* encoded as an array. Rather,
+       # they are written in sequence.
+       @messages.each do |message|
+         message.encode(encoder)
+       end
+     end
    end
  end
end
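A sketch of how the compressed path above fits together: encoding with a codec wraps the whole set in a single compressed message, and decoding transparently unwraps it again. Class names are as used in the diff; the values are illustrative.

```ruby
require "stringio"
require "kafka"

messages = [
  Kafka::Protocol::Message.new(value: "hello"),
  Kafka::Protocol::Message.new(value: "world"),
]

message_set = Kafka::Protocol::MessageSet.new(
  messages: messages,
  compression_codec: Kafka::GzipCodec.new,
)

buffer = StringIO.new
message_set.encode(Kafka::Protocol::Encoder.new(buffer))

decoded = Kafka::Protocol::MessageSet.decode(
  Kafka::Protocol::Decoder.from_string(buffer.string)
)

decoded.messages.map(&:value) # => ["hello", "world"]
```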
@@ -59,44 +59,18 @@ module Kafka
      encoder.write_array(@messages_for_topics) do |topic, messages_for_partition|
        encoder.write_string(topic)
 
-       encoder.write_array(messages_for_partition) do |partition, messages|
+       encoder.write_array(messages_for_partition) do |partition, message_set|
+         encoder.write_int32(partition)
+
          # When encoding the message set into the request, the bytesize of the message
-         # set must precede the actual bytes. Therefore we need to encode the entire
+         # set must precede the actual data. Therefore we need to encode the entire
          # message set into a separate buffer first.
-         encoded_message_set = encode_message_set(messages)
-
-         encoder.write_int32(partition)
+         encoded_message_set = Encoder.encode_with(message_set)
 
-         # When encoding bytes, the 32 bit size of the byte buffer is encoded first.
          encoder.write_bytes(encoded_message_set)
        end
      end
    end
-
-   private
-
-   def encode_message_set(messages)
-     buffer = StringIO.new
-     encoder = Encoder.new(buffer)
-
-     # Messages in a message set are *not* encoded as an array. Rather,
-     # they are written in sequence with only the byte size prepended.
-     messages.each do |message|
-       offset = -1 # offsets don't matter here.
-
-       # When encoding a message into a message set, the bytesize of the message must
-       # precede the actual bytes. Therefore we need to encode the message into a
-       # separate buffer first.
-       encoded_message = Encoder.encode_with(message)
-
-       encoder.write_int64(offset)
-
-       # When encoding bytes, the 32 bit size of the byte buffer is encoded first.
-       encoder.write_bytes(encoded_message)
-     end
-
-     buffer.string
-   end
    end
  end
end
@@ -0,0 +1,20 @@
+ module Kafka
+   class SnappyCodec
+     def initialize
+       require "snappy"
+     end
+
+     def codec_id
+       2
+     end
+
+     def compress(data)
+       Snappy.deflate(data)
+     end
+
+     def decompress(data)
+       buffer = StringIO.new(data)
+       Snappy::Reader.new(buffer).read
+     end
+   end
+ end
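Although the README changes above don't document it yet, the producer diff wires a new `compression_codec:` option through `Compression.find_codec`. A hedged sketch of enabling it; note that `snappy` is only a development dependency of ruby-kafka itself, so your own application needs to add the snappy gem to its bundle, while `:gzip` works with the standard library alone:

```ruby
kafka = Kafka.new(seed_brokers: ["kafka1:9092"])

# :gzip works out of the box; :snappy requires the snappy gem.
producer = kafka.producer(compression_codec: :snappy)

producer.produce("hello", topic: "greetings")
producer.deliver_messages
```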
data/lib/kafka/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Kafka
-   VERSION = "0.1.5"
+   VERSION = "0.1.6"
  end
data/ruby-kafka.gemspec CHANGED
@@ -35,4 +35,5 @@ Gem::Specification.new do |spec|
    spec.add_development_dependency "docker-api"
    spec.add_development_dependency "rspec-benchmark"
    spec.add_development_dependency "activesupport", ">= 4.2.0", "< 5.1"
+   spec.add_development_dependency "snappy"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-kafka
  version: !ruby/object:Gem::Version
-   version: 0.1.5
+   version: 0.1.6
  platform: ruby
  authors:
  - Daniel Schierbeck
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2016-02-18 00:00:00.000000000 Z
+ date: 2016-02-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -128,6 +128,20 @@ dependencies:
      - - "<"
        - !ruby/object:Gem::Version
          version: '5.1'
+ - !ruby/object:Gem::Dependency
+   name: snappy
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  description: |-
    A client library for the Kafka distributed commit log.
 
@@ -151,13 +165,16 @@ files:
  - examples/simple-consumer.rb
  - examples/simple-producer.rb
  - lib/kafka.rb
+ - lib/kafka/async_producer.rb
  - lib/kafka/broker.rb
  - lib/kafka/broker_pool.rb
  - lib/kafka/client.rb
  - lib/kafka/cluster.rb
+ - lib/kafka/compression.rb
  - lib/kafka/connection.rb
  - lib/kafka/fetch_operation.rb
  - lib/kafka/fetched_message.rb
+ - lib/kafka/gzip_codec.rb
  - lib/kafka/instrumentation.rb
  - lib/kafka/message_buffer.rb
  - lib/kafka/partitioner.rb
@@ -178,6 +195,7 @@ files:
  - lib/kafka/protocol/produce_response.rb
  - lib/kafka/protocol/request_message.rb
  - lib/kafka/protocol/topic_metadata_request.rb
+ - lib/kafka/snappy_codec.rb
  - lib/kafka/socket_with_timeout.rb
  - lib/kafka/version.rb
  - lib/ruby-kafka.rb