ruby-kafka 0.3.18.beta1 → 0.3.18.beta2
This diff shows the content of publicly available package versions as they appear in their respective public registries, and is provided for informational purposes only.
- checksums.yaml +4 -4
- data/README.md +14 -4
- data/lib/kafka/broker.rb +0 -16
- data/lib/kafka/connection.rb +23 -111
- data/lib/kafka/consumer.rb +5 -1
- data/lib/kafka/fetch_operation.rb +6 -14
- data/lib/kafka/instrumenter.rb +4 -28
- data/lib/kafka/version.rb +1 -1
- metadata +2 -3
- data/lib/kafka/protocol/null_response.rb +0 -11
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: …
-  data.tar.gz: …
+  metadata.gz: a315e2a5db26fa2430705e5dc25757593682703b
+  data.tar.gz: d80c0b9f184d4ec2da61139de39bf97177c7de4a
 SHA512:
-  metadata.gz: …
-  data.tar.gz: …
+  metadata.gz: 63f090c16636aff10749d7e20628996207bf07dcd68da7fc58a76f49113972eb35e995b46af91d4b418692effefdc737e652750a583b6d7c3f066d5e797ff1e6
+  data.tar.gz: d5e813fbdf4d9663ca7e7718493f6b6b34523828eefd1697f69da2fb715fa1b533f84681be2fbd38ecd023c9b70bfb5085b04873764ab6f3cc040cc95c2f1ef0
data/README.md
CHANGED
@@ -637,12 +637,13 @@ In order to optimize for throughput, you want to make sure to fetch as many mess
 
 In order to optimize for low latency, you want to process a message as soon as possible, even if that means fetching a smaller batch of messages.
 
-There are …
+There are three values that can be tuned in order to balance these two concerns.
 
 * `min_bytes` is the minimum number of bytes to return from a single message fetch. By setting this to a high value you can increase the processing throughput. The default value is one byte.
-* `max_wait_time` is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides `min_bytes`, so you'll _always_ get data back within the time specified. The default value is five seconds.
+* `max_wait_time` is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides `min_bytes`, so you'll _always_ get data back within the time specified. The default value is five seconds. If you want to have at most one second of latency, set `max_wait_time` to 1.
+* `max_bytes_per_partition` is the maximum amount of data a broker will return for a single partition when fetching new messages. The default is 1MB, but increasing this number may lead to better throughput since you'll need to fetch less frequently. Setting it to a lower value is not recommended unless you have so many partitions that it's causing network and latency issues to transfer a fetch response from a broker to a client. Setting the number too high may result in instability, so be careful.
 
-…
+The first two settings can be passed to either `#each_message` or `#each_batch`, e.g.
 
 ```ruby
 # Waits for data for up to 30 seconds, preferring to fetch at least 5KB at a time.
@@ -651,7 +652,16 @@ consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 30) do |message|
 end
 ```
 
-…
+The last setting is configured when subscribing to a topic, and can vary between topics:
+
+```ruby
+# Fetches up to 5MB per partition at a time for better throughput.
+consumer.subscribe("greetings", max_bytes_per_partition: 5 * 1024 * 1024)
+
+consumer.each_message do |message|
+  # ...
+end
+```
 
 
 ### Thread Safety
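The README's new examples only show `#each_message`, but the text says the first two settings also apply to `#each_batch`. A minimal sketch, assuming a `consumer` built via `kafka.consumer(group_id: ...)` as elsewhere in the README:

```ruby
# Waits up to 10 seconds for at least 100KB to accumulate, then yields
# the fetched messages one batch per partition.
consumer.each_batch(min_bytes: 1024 * 100, max_wait_time: 10) do |batch|
  batch.messages.each do |message|
    # ...
  end
end
```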
data/lib/kafka/broker.rb
CHANGED
@@ -40,22 +40,6 @@ module Kafka
       @connection.send_request(request)
     end
 
-    # Fetches messages asynchronously.
-    #
-    # The fetch request is sent to the broker, but the response is not read.
-    # This allows the broker to process the request, wait for new messages,
-    # and send a response without the client having to wait. In order to
-    # read the response, call `#call` on the returned object. This will
-    # block the caller until the response is available.
-    #
-    # @param (see Kafka::Protocol::FetchRequest#initialize)
-    # @return [Kafka::AsyncResponse]
-    def fetch_messages_async(**options)
-      request = Protocol::FetchRequest.new(**options)
-
-      @connection.send_async_request(request)
-    end
-
     # Lists the offset of the specified topics and partitions.
     #
     # @param (see Kafka::Protocol::ListOffsetRequest#initialize)
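With `fetch_messages_async` gone, the only fetch path is the synchronous `Broker#fetch_messages`, which blocks until the decoded response is available (see connection.rb below). A sketch of the new call shape, mirroring how fetch_operation.rb now uses it; this is an internal API, `broker` is assumed to be a connected `Kafka::Broker`, and the `topics` layout is an assumption:

```ruby
# Sketch only: the layout of `topics` (topic => partition => fetch params)
# is assumed from how fetch_operation.rb builds its requests.
topics = {
  "greetings" => {
    0 => { fetch_offset: 0, max_bytes: 1024 * 1024 },
  },
}

# Blocks until the broker responds; no AsyncResponse, no #call.
response = broker.fetch_messages(
  max_wait_time: 5, # seconds the broker may wait for min_bytes
  min_bytes: 1,     # respond as soon as any data is available
  topics: topics,
)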
data/lib/kafka/connection.rb
CHANGED
@@ -2,41 +2,11 @@ require "stringio"
 require "kafka/socket_with_timeout"
 require "kafka/ssl_socket_with_timeout"
 require "kafka/protocol/request_message"
-require "kafka/protocol/null_response"
 require "kafka/protocol/encoder"
 require "kafka/protocol/decoder"
 
 module Kafka
 
-  # An asynchronous response object allows us to deliver a response at some
-  # later point in time.
-  #
-  # When instantiating an AsyncResponse, you provide a response decoder and
-  # a block that will force the caller to wait until a response is available.
-  class AsyncResponse
-    # Use a custom "nil" value so that nil can be an actual value.
-    MISSING = Object.new
-
-    def initialize(decoder, &block)
-      @decoder = decoder
-      @block = block
-      @response = MISSING
-    end
-
-    # Block until a response is available.
-    def call
-      @block.call if @response == MISSING
-      @response
-    end
-
-    # Deliver the response data.
-    #
-    # After calling this, `#call` will return the decoded response.
-    def deliver(data)
-      @response = @decoder.decode(data)
-    end
-  end
-
   # A connection to a single Kafka broker.
   #
   # Usually you'll need a separate connection to each broker in a cluster, since most
@@ -108,18 +78,6 @@ module Kafka
     #
     # @return [Object] the response.
     def send_request(request)
-      # Immediately block on the asynchronous request.
-      send_async_request(request).call
-    end
-
-    # Sends a request over the connection.
-    #
-    # @param request [#encode, #response_class] the request that should be
-    #   encoded and written.
-    #
-    # @return [AsyncResponse] the async response, allowing the caller to choose
-    #   when to block.
-    def send_async_request(request)
       # Default notification payload.
       notification = {
         broker_host: @host,
@@ -128,41 +86,15 @@ module Kafka
         response_size: 0,
       }
 
-      @instrumenter.…
-
-      open unless open?
-
-      @correlation_id += 1
+      @instrumenter.instrument("request.connection", notification) do
+        open unless open?
 
-…
+        @correlation_id += 1
 
-…
-      correlation_id = @correlation_id
+        write_request(request, notification)
 
-…
-…
-…
-        # Immediately deliver a nil value.
-        async_response.deliver(nil)
-
-        @instrumenter.finish("request.connection", notification)
-
-        async_response
-      else
-        async_response = AsyncResponse.new(response_class) {
-          # A caller is trying to read the response, so we have to wait for it
-          # before we can return.
-          wait_for_response(correlation_id, notification)
-
-          # Once done, we can finish the instrumentation.
-          @instrumenter.finish("request.connection", notification)
-        }
-
-        # Store the asynchronous response so that data can be delivered to it
-        # at a later time.
-        @pending_async_responses[correlation_id] = async_response
-
-        async_response
+        response_class = request.response_class
+        wait_for_response(response_class, notification) unless response_class.nil?
       end
     rescue Errno::EPIPE, Errno::ECONNRESET, Errno::ETIMEDOUT, EOFError => e
      close
@@ -186,9 +118,6 @@ module Kafka
 
       # Correlation id is initialized to zero and bumped for each request.
       @correlation_id = 0
-
-      # The pipeline of pending response futures must be reset.
-      @pending_async_responses = {}
     rescue Errno::ETIMEDOUT => e
       @logger.error "Timed out while trying to connect to #{self}: #{e}"
       raise ConnectionError, e
@@ -230,8 +159,8 @@ module Kafka
     # a given Decoder.
     #
     # @return [nil]
-    def read_response(…
-      @logger.debug "Waiting for response #{…
+    def read_response(response_class, notification)
+      @logger.debug "Waiting for response #{@correlation_id} from #{to_s}"
 
       data = @decoder.bytes
       notification[:response_size] = data.bytesize
@@ -240,49 +169,32 @@ module Kafka
       response_decoder = Kafka::Protocol::Decoder.new(buffer)
 
       correlation_id = response_decoder.int32
+      response = response_class.decode(response_decoder)
 
       @logger.debug "Received response #{correlation_id} from #{to_s}"
 
-      return correlation_id, …
+      return correlation_id, response
     rescue Errno::ETIMEDOUT
-      @logger.error "Timed out while waiting for response #{…
+      @logger.error "Timed out while waiting for response #{@correlation_id}"
       raise
-    rescue Errno::EPIPE, Errno::ECONNRESET, Errno::ETIMEDOUT, EOFError => e
-      close
-
-      raise ConnectionError, "Connection error: #{e}"
     end
 
-    def wait_for_response(…
+    def wait_for_response(response_class, notification)
       loop do
-        correlation_id, …
-…
-…
-…
-…
-…
-        #…
-…
-…
-        elsif correlation_id > expected_correlation_id
-          raise Kafka::Error, "Correlation id mismatch: expected #{expected_correlation_id} but got #{correlation_id}"
+        correlation_id, response = read_response(response_class, notification)
+
+        # There may have been a previous request that timed out before the client
+        # was able to read the response. In that case, the response will still be
+        # sitting in the socket waiting to be read. If the response we just read
+        # was to a previous request, we can safely skip it.
+        if correlation_id < @correlation_id
+          @logger.error "Received out-of-order response id #{correlation_id}, was expecting #{@correlation_id}"
+        elsif correlation_id > @correlation_id
+          raise Kafka::Error, "Correlation id mismatch: expected #{@correlation_id} but got #{correlation_id}"
         else
-…
-          # async response future.
-          async_response = @pending_async_responses.delete(correlation_id)
-          async_response.deliver(data)
-
-          return async_response.call
+          return response
         end
       end
-    rescue Errno::EPIPE, Errno::ECONNRESET, Errno::ETIMEDOUT, EOFError => e
-      notification[:exception] = [e.class.name, e.message]
-      notification[:exception_object] = e
-
-      close
-
-      raise ConnectionError, "Connection error: #{e}"
     end
   end
 end
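The interesting part of the new `wait_for_response` is the stale-response skip: a reply whose correlation id is lower than the current one belongs to a request that already timed out, so it is logged and discarded rather than mis-delivered. A self-contained toy model of that rule (illustrative only; the names do not match the gem's internals):

```ruby
# Replies carry the correlation id of the request they answer. Lower ids
# are leftovers from timed-out requests; higher ids mean we lost track.
def wait_for_reply(expected_id, replies)
  loop do
    id, payload = replies.shift
    if id < expected_id
      next # stale reply from a request that timed out; skip it
    elsif id > expected_id
      raise "correlation id mismatch: expected #{expected_id}, got #{id}"
    else
      return payload
    end
  end
end

replies = [[1, :stale], [2, :fresh]]
p wait_for_reply(2, replies) # => :fresh, after skipping the stale reply
```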
data/lib/kafka/consumer.rb
CHANGED
@@ -288,6 +288,10 @@ module Kafka
       @offset_manager.mark_as_processed(message.topic, message.partition, message.offset)
     end
 
+    def send_heartbeat_if_necessary
+      @heartbeat.send_if_necessary
+    end
+
     private
 
     def consumer_loop
@@ -316,7 +320,7 @@ module Kafka
 
     def make_final_offsets_commit!(attempts = 3)
       @offset_manager.commit_offsets
-    rescue ConnectionError
+    rescue ConnectionError
       # It's important to make sure final offsets commit is done
       # As otherwise messages that have been processed after last auto-commit
       # will be processed again and that may be huge amount of messages
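`#send_heartbeat_if_necessary` becomes public here, which lets a handler that processes a single message slowly heartbeat mid-work instead of waiting for the next loop iteration. A minimal usage sketch; the chunked work is hypothetical:

```ruby
consumer.each_message do |message|
  work_units_for(message).each do |unit| # hypothetical slow, chunked work
    unit.call
    # Only sends if the heartbeat interval has elapsed, so it is cheap to
    # call often; keeps the group coordinator from evicting this consumer
    # while a single message is being processed.
    consumer.send_heartbeat_if_necessary
  end
end
```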
data/lib/kafka/fetch_operation.rb
CHANGED
@@ -40,11 +40,7 @@ module Kafka
       }
     end
 
-    def execute
-      if block.nil?
-        return to_enum(:execute)
-      end
-
+    def execute
       @cluster.add_target_topics(@topics.keys)
       @cluster.refresh_metadata_if_necessary!
 
@@ -60,7 +56,7 @@ module Kafka
        end
      end
 
-…
+      topics_by_broker.flat_map {|broker, topics|
        resolve_offsets(broker, topics)
 
        options = {
@@ -69,14 +65,10 @@ module Kafka
          topics: topics,
        }
 
-        broker.…
-      }
-
-      responses.each {|response_future|
-        response = response_future.call
+        response = broker.fetch_messages(**options)
 
-        response.topics.…
-          fetched_topic.partitions.…
+        response.topics.flat_map {|fetched_topic|
+          fetched_topic.partitions.map {|fetched_partition|
            begin
              Protocol.handle_error(fetched_partition.error_code)
            rescue Kafka::OffsetOutOfRange => e
@@ -101,7 +93,7 @@ module Kafka
              )
            }
 
-…
+            FetchedBatch.new(
              topic: fetched_topic.name,
              partition: fetched_partition.partition,
              highwater_mark_offset: fetched_partition.highwater_mark_offset,
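With the response futures gone, `execute` is a single `flat_map` traversal: brokers → topics → partitions, flattened into one array of batches. The shape of that traversal in isolation, using plain hashes instead of the gem's response objects:

```ruby
# Two broker responses, three fetched partitions in total.
responses = [
  { topics: [{ name: "greetings", partitions: [0, 1] }] },
  { topics: [{ name: "farewells", partitions: [2] }] },
]

batches = responses.flat_map do |response|
  response[:topics].flat_map do |topic|
    topic[:partitions].map do |partition|
      { topic: topic[:name], partition: partition }
    end
  end
end

p batches.length # => 3: one entry per partition, no nesting left
```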
data/lib/kafka/instrumenter.rb
CHANGED
@@ -6,35 +6,19 @@ module Kafka
       @default_payload = default_payload
 
       if defined?(ActiveSupport::Notifications)
-        @backend = ActiveSupport::Notifications
+        @backend = ActiveSupport::Notifications
       else
         @backend = nil
       end
     end
 
-    def instrument(event_name, payload = {})
+    def instrument(event_name, payload = {}, &block)
       if @backend
         payload.update(@default_payload)
 
-        @backend.instrument("#{event_name}.#{NAMESPACE}", payload)
+        @backend.instrument("#{event_name}.#{NAMESPACE}", payload, &block)
       else
-…
-      end
-    end
-
-    def start(event_name, payload = {})
-      if @backend
-        payload.update(@default_payload)
-
-        @backend.start("#{event_name}.#{NAMESPACE}", payload)
-      end
-    end
-
-    def finish(event_name, payload = {})
-      if @backend
-        payload.update(@default_payload)
-
-        @backend.finish("#{event_name}.#{NAMESPACE}", payload)
+        block.call(payload) if block
       end
     end
   end
@@ -48,13 +32,5 @@ module Kafka
     def instrument(event_name, payload = {}, &block)
       @backend.instrument(event_name, @extra_payload.merge(payload), &block)
     end
-
-    def start(event_name, payload = {})
-      @backend.start(event_name, @extra_payload.merge(payload))
-    end
-
-    def finish(event_name, payload = {})
-      @backend.finish(event_name, @extra_payload.merge(payload))
-    end
   end
 end
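Since `instrument` now hands the block straight to `ActiveSupport::Notifications.instrument`, the whole request (including `open` and `wait_for_response`) is timed as one event instead of paired start/finish calls. A subscriber sketch; the full event name `"request.connection.kafka"` assumes `NAMESPACE` is `"kafka"`, and only the payload keys visible in this diff are used:

```ruby
require "active_support/notifications"

# Logs one line per Kafka request with its wall-clock duration and the
# response size recorded in the notification payload.
ActiveSupport::Notifications.subscribe("request.connection.kafka") do |name, start, finish, id, payload|
  duration_ms = (finish - start) * 1000.0
  puts format("%s took %.1fms, %d bytes received",
              payload[:broker_host], duration_ms, payload[:response_size])
end
```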
data/lib/kafka/version.rb
CHANGED
@@ -1,3 +1,3 @@
 module Kafka
-  VERSION = "0.3.18.beta1"
+  VERSION = "0.3.18.beta2"
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ruby-kafka
 version: !ruby/object:Gem::Version
-  version: 0.3.18.beta1
+  version: 0.3.18.beta2
 platform: ruby
 authors:
 - Daniel Schierbeck
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-06-…
+date: 2017-06-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: gssapi
@@ -295,7 +295,6 @@ files:
 - lib/kafka/protocol/message.rb
 - lib/kafka/protocol/message_set.rb
 - lib/kafka/protocol/metadata_response.rb
-- lib/kafka/protocol/null_response.rb
 - lib/kafka/protocol/offset_commit_request.rb
 - lib/kafka/protocol/offset_commit_response.rb
 - lib/kafka/protocol/offset_fetch_request.rb