ruby-kafka 0.3.4 → 0.3.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +6 -0
- data/README.md +35 -0
- data/lib/kafka/async_producer.rb +42 -20
- data/lib/kafka/client.rb +1 -0
- data/lib/kafka/producer.rb +10 -2
- data/lib/kafka/protocol/offset_commit_request.rb +4 -1
- data/lib/kafka/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 99773c65ab50857fb0d09cabebd3d985a9d5b88c
+  data.tar.gz: 6f94dfae3f17778c3d4c73207545e194e7e5dedf
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6c9b7268300b13d023fcf509041bf945f791aa145156a7fffa9dff6cf197fae9715562676df035ffe994e9c67fecfcb9056de4ffffae31fe4454cf43ec81b88a
+  data.tar.gz: 5fe45243f286bef386f7589252c5ea59e9bc011e56c9edf5d0b999ecd1e34dfab51d0578700b55f86de4039a59de4de630b14d18077065d8732c71d204bc5b12
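The checksums above are SHA1 and SHA512 digests of the two archives inside the published `.gem` file (`metadata.gz` and `data.tar.gz`). A minimal verification sketch, not part of the diff, assuming those archives have already been unpacked locally; the file paths are illustrative:

```ruby
# Compare the published SHA1 digests against locally unpacked gem archives.
# A .gem file is a tar archive whose members include metadata.gz and data.tar.gz.
require "digest"

{
  "metadata.gz" => "99773c65ab50857fb0d09cabebd3d985a9d5b88c",
  "data.tar.gz" => "6f94dfae3f17778c3d4c73207545e194e7e5dedf",
}.each do |file, expected|
  actual = Digest::SHA1.file(file).hexdigest
  puts "#{file}: #{actual == expected ? "OK" : "MISMATCH"}"
end
```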
data/CHANGELOG.md
CHANGED
@@ -4,6 +4,12 @@ Changes and additions to the library will be listed here.
 
 ## Unreleased
 
+## v0.3.5
+
+- Fix bug that caused the async producer to not work with Unicorn (#166).
+- Fix bug that caused committed consumer offsets to be lost (#167).
+- Instrument buffer overflows in the producer.
+
 ## v0.3.4
 
 - Make the producer buffer more resilient in the face of isolated topic errors.
data/README.md
CHANGED
@@ -19,6 +19,8 @@ Although parts of this library work with Kafka 0.8 – specifically, the Produce
     6. [Compression](#compression)
     7. [Producing Messages from a Rails Application](#producing-messages-from-a-rails-application)
 2. [Consuming Messages from Kafka](#consuming-messages-from-kafka)
+    1. [Consumer Checkpointing](#consumer-checkpointing)
+    2. [Consuming Messages in Batches](#consuming-messages-in-batches)
 3. [Logging](#logging)
 4. [Instrumentation](#instrumentation)
 5. [Understanding Timeouts](#understanding-timeouts)
@@ -372,6 +374,39 @@ end
 
 Each consumer process will be assigned one or more partitions from each topic that the group subscribes to. In order to handle more messages, simply start more processes.
 
+#### Consumer Checkpointing
+
+In order to be able to resume processing after a consumer crashes, each consumer will periodically _checkpoint_ its position within each partition it reads from. Since each partition has a monotonically increasing sequence of message offsets, this works by _committing_ the offset of the last message that was processed in a given partition. Kafka handles these commits and allows another consumer in a group to resume from the last commit when a member crashes or becomes unresponsive.
+
+
+#### Consuming Messages in Batches
+
+Sometimes it is easier to deal with messages in batches rather than individually. A _batch_ is a sequence of one or more Kafka messages that all belong to the same topic and partition. One common reason to want to use batches is when some external system has a batch or transactional API.
+
+```ruby
+# A mock search index that we'll be keeping up to date with new Kafka messages.
+index = SearchIndex.new
+
+consumer.subscribe("posts")
+
+consumer.each_batch do |batch|
+  puts "Received batch: #{batch.topic}/#{batch.partition}"
+
+  transaction = index.transaction
+
+  batch.messages.each do |message|
+    # Let's assume that adding a document is idempotent.
+    transaction.add(id: message.key, body: message.value)
+  end
+
+  # Once this method returns, the messages have been successfully written to the
+  # search index. The consumer will only checkpoint a batch *after* the block
+  # has completed without an exception.
+  transaction.commit!
+end
+```
+
+One important thing to note is that the client commits the offset of the batch's messages only after the _entire_ batch has been processed.
 
 ### Logging
 
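The Consumer Checkpointing section added above describes offsets being committed per partition so that a group member can resume after a crash. As a per-message counterpart to the batch example, here is a minimal sketch assuming the consumer API documented elsewhere in the README (`Kafka.new`, `consumer`, `subscribe`, `each_message`); the broker address and group id are placeholders:

```ruby
require "kafka"

# Placeholder broker list and client id.
kafka = Kafka.new(seed_brokers: ["kafka1:9092"], client_id: "my-app")
consumer = kafka.consumer(group_id: "post-indexer")

consumer.subscribe("posts")

consumer.each_message do |message|
  # The consumer periodically commits the offset of the last processed message.
  # If this block raises, the offset is not committed and the message will be
  # reprocessed once the group resumes.
  puts "#{message.topic}/#{message.partition}@#{message.offset}: #{message.value}"
end
```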
data/lib/kafka/async_producer.rb
CHANGED
@@ -69,31 +69,23 @@ module Kafka
     # @param delivery_interval [Integer] if greater than zero, the number of
     #   seconds between automatic message deliveries.
     #
-    def initialize(sync_producer:, max_queue_size: 1000, delivery_threshold: 0, delivery_interval: 0)
+    def initialize(sync_producer:, max_queue_size: 1000, delivery_threshold: 0, delivery_interval: 0, instrumenter:)
       raise ArgumentError unless max_queue_size > 0
       raise ArgumentError unless delivery_threshold >= 0
       raise ArgumentError unless delivery_interval >= 0
 
       @queue = Queue.new
       @max_queue_size = max_queue_size
+      @instrumenter = instrumenter
 
-      @
-
-
-
-
-      )
+      @worker = Worker.new(
+        queue: @queue,
+        producer: sync_producer,
+        delivery_threshold: delivery_threshold,
+      )
 
-
-
-
-      @worker_thread.abort_on_exception = true
-
-      if delivery_interval > 0
-        Thread.new do
-          Timer.new(queue: @queue, interval: delivery_interval).run
-        end
-      end
+      # The timer will no-op if the delivery interval is zero.
+      @timer = Timer.new(queue: @queue, interval: delivery_interval)
     end
 
     # Produces a message to the specified topic.
@@ -102,9 +94,12 @@ module Kafka
     # @param (see Kafka::Producer#produce)
     # @raise [BufferOverflow] if the message queue is full.
    # @return [nil]
-    def produce(
-
+    def produce(value, topic:, **options)
+      ensure_threads_running!
+
+      buffer_overflow(topic) if @queue.size >= @max_queue_size
 
+      args = [value, **options.merge(topic: topic)]
       @queue << [:produce, args]
 
       nil
@@ -128,11 +123,35 @@ module Kafka
     # @return [nil]
     def shutdown
       @queue << [:shutdown, nil]
-      @worker_thread.join
+      @worker_thread && @worker_thread.join
 
       nil
     end
 
+    private
+
+    def ensure_threads_running!
+      @worker_thread = nil unless @worker_thread && @worker_thread.alive?
+      @worker_thread ||= start_thread { @worker.run }
+
+      @timer_thread = nil unless @timer_thread && @timer_thread.alive?
+      @timer_thread ||= start_thread { @timer.run }
+    end
+
+    def start_thread(&block)
+      thread = Thread.new(&block)
+      thread.abort_on_exception = true
+      thread
+    end
+
+    def buffer_overflow(topic)
+      @instrumenter.instrument("buffer_overflow.producer", {
+        topic: topic,
+      })
+
+      raise BufferOverflow
+    end
+
     class Timer
       def initialize(interval:, queue:)
         @queue = queue
@@ -140,6 +159,9 @@ module Kafka
       end
 
       def run
+        # Permanently sleep if the timer interval is zero.
+        Thread.stop if @interval.zero?
+
         loop do
           sleep(@interval)
           @queue << [:deliver_messages, nil]
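The rewritten initializer no longer spawns the worker and timer threads eagerly; instead, `produce` calls `ensure_threads_running!` and restarts any thread that is missing or dead. This lines up with the Unicorn fix noted in the changelog (#166), since threads do not survive a fork. A usage sketch, assuming the client-level `async_producer` constructor documented in the README; the broker address is a placeholder:

```ruby
require "kafka"

kafka = Kafka.new(seed_brokers: ["kafka1:9092"], client_id: "my-app")
producer = kafka.async_producer(delivery_interval: 10)

pid = fork do
  # Threads never survive a fork. Because `produce` calls ensure_threads_running!,
  # the worker and timer threads are recreated here in the child process instead
  # of the producer silently never delivering anything.
  producer.produce("hello from the child", topic: "greetings")
  producer.shutdown
end

Process.wait(pid)
```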
data/lib/kafka/client.rb
CHANGED
data/lib/kafka/producer.rb
CHANGED
@@ -194,11 +194,11 @@ module Kafka
       )
 
       if buffer_size >= @max_buffer_size
-
+        buffer_overflow topic, "Max buffer size (#{@max_buffer_size} messages) exceeded"
       end
 
       if buffer_bytesize + message.bytesize >= @max_buffer_bytesize
-
+        buffer_overflow topic, "Max buffer bytesize (#{@max_buffer_bytesize} bytes) exceeded"
       end
 
       @target_topics.add(topic)
@@ -362,5 +362,13 @@ module Kafka
 
       @pending_message_queue.replace(failed_messages)
     end
+
+    def buffer_overflow(topic, message)
+      @instrumenter.instrument("buffer_overflow.producer", {
+        topic: topic,
+      })
+
+      raise BufferOverflow, message
+    end
   end
 end
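Both the sync and async producers now emit a `buffer_overflow.producer` event through the instrumenter before raising `BufferOverflow`, matching the "Instrument buffer overflows in the producer" changelog entry. A monitoring sketch, assuming instrumentation is backed by ActiveSupport::Notifications; the subscription matches by regex so it does not depend on the exact namespaced event name:

```ruby
require "active_support/notifications"

overflow_count = 0

# Match any event whose name contains "buffer_overflow.producer", regardless of
# any library namespace suffix the instrumenter may append.
ActiveSupport::Notifications.subscribe(/buffer_overflow\.producer/) do |_name, _start, _finish, _id, payload|
  overflow_count += 1
  warn "Producer buffer overflow for topic #{payload[:topic]} (total: #{overflow_count})"
end
```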
data/lib/kafka/protocol/offset_commit_request.rb
CHANGED
@@ -1,6 +1,9 @@
 module Kafka
   module Protocol
     class OffsetCommitRequest
+      # This value signals to the broker that its default configuration should be used.
+      DEFAULT_RETENTION_TIME = -1
+
       def api_key
         8
       end
@@ -13,7 +16,7 @@ module Kafka
         OffsetCommitResponse
       end
 
-      def initialize(group_id:, generation_id:, member_id:, retention_time:
+      def initialize(group_id:, generation_id:, member_id:, retention_time: DEFAULT_RETENTION_TIME, offsets:)
         @group_id = group_id
         @generation_id = generation_id
         @member_id = member_id
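With `DEFAULT_RETENTION_TIME = -1` as the default, callers that omit `retention_time` ask the broker to apply its own offset retention setting (`offsets.retention.minutes`), which lines up with the changelog entry about committed consumer offsets being lost (#167). An instantiation sketch that directly uses this internal protocol class purely for illustration; the `offsets` value is a hypothetical placeholder, since its exact shape is defined by the request encoder, which this diff does not show:

```ruby
require "kafka"

# Illustrative topic => { partition => offset } structure.
offsets = { "posts" => { 0 => 42 } }

request = Kafka::Protocol::OffsetCommitRequest.new(
  group_id: "post-indexer",
  generation_id: 1,
  member_id: "consumer-1",
  # retention_time is omitted, so DEFAULT_RETENTION_TIME (-1) applies and the
  # broker falls back to its own offset retention configuration.
  offsets: offsets,
)
```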
data/lib/kafka/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ruby-kafka
 version: !ruby/object:Gem::Version
-  version: 0.3.4
+  version: 0.3.5
 platform: ruby
 authors:
 - Daniel Schierbeck
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-04-
+date: 2016-04-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler