karafka 2.0.4 → 2.0.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +18 -0
- data/Gemfile.lock +1 -1
- data/README.md +9 -9
- data/bin/integrations +13 -3
- data/config/errors.yml +3 -0
- data/lib/karafka/admin.rb +2 -1
- data/lib/karafka/base_consumer.rb +20 -4
- data/lib/karafka/connection/client.rb +6 -6
- data/lib/karafka/connection/listener.rb +9 -5
- data/lib/karafka/contracts/consumer_group.rb +2 -2
- data/lib/karafka/contracts/consumer_group_topic.rb +10 -9
- data/lib/karafka/messages/builders/batch_metadata.rb +2 -3
- data/lib/karafka/messages/builders/messages.rb +3 -1
- data/lib/karafka/pro/active_job/consumer.rb +1 -1
- data/lib/karafka/pro/base_consumer.rb +32 -3
- data/lib/karafka/pro/contracts/consumer_group_topic.rb +21 -1
- data/lib/karafka/pro/loader.rb +1 -1
- data/lib/karafka/pro/processing/coordinator.rb +14 -0
- data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb +3 -2
- data/lib/karafka/pro/processing/partitioner.rb +3 -5
- data/lib/karafka/pro/routing/topic_extensions.rb +41 -5
- data/lib/karafka/processing/executor.rb +14 -6
- data/lib/karafka/processing/jobs/base.rb +4 -0
- data/lib/karafka/processing/jobs/consume.rb +7 -2
- data/lib/karafka/processing/worker.rb +0 -1
- data/lib/karafka/routing/proxy.rb +9 -16
- data/lib/karafka/routing/subscription_groups_builder.rb +1 -0
- data/lib/karafka/routing/topic.rb +3 -1
- data/lib/karafka/templates/karafka.rb.erb +1 -1
- data/lib/karafka/version.rb +1 -1
- data.tar.gz.sig +0 -0
- metadata +2 -2
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0abed3f97a58be6b48f640468f7d7e6d48bc0960596b21d022b4616dd047be28
+  data.tar.gz: 48143253beee640e25e47a81474767c179e715e855d6173b59566483a57af5a8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9c9f8c170ac82fc0f1eb6ea41698dcd82cc525006931a59443d004c94eb18b56ffcb67eb1eb45fcc1fd557fee22e6e63ceb7a8a001245469e3e574d87c88c8e8
+  data.tar.gz: 47bc7e7dfe5ca3d503a3cb18da4e4b95c076197dc26b5633195e169d3f4d94da4effaf27bd4360ddff1481031b1ee20f61e465e24f6984570f6067ca4fbd51ea
checksums.yaml.gz.sig
CHANGED
Binary file
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,23 @@
 # Karafka framework changelog
 
+## 2.0.7 (Unreleased)
+- [Breaking change] Redefine the Virtual Partitions routing DSL to accept concurrency
+- Allow the `concurrency` setting in Virtual Partitions to extend or limit the number of jobs per regular partition. This makes it possible to ensure we do not use all the threads on Virtual Partitions jobs
+- Allow creation of as many Virtual Partitions as needed, without taking the global `concurrency` into consideration
+
+## 2.0.6 (2022-09-02)
+- Improve client closing.
+- Fix for: multiple LRJ topics fetched concurrently block the ability for LRJ to kick in (#1002)
+- Introduce a pre-enqueue sync execution layer to prevent starvation cases for LRJ
+- Close admin upon critical errors to prevent segmentation faults
+- Add support for manual subscription group management (#852)
+
+## 2.0.5 (2022-08-23)
+- Fix unnecessary double new line in the `karafka.rb` template for Ruby on Rails
+- Fix a case where a manually paused partition would not be processed after rebalance (#988)
+- Increase specs stability.
+- Lower concurrency of execution of specs in Github CI.
+
 ## 2.0.4 (2022-08-19)
 - Fix hanging topic creation (#964)
 - Fix conflict with other Rails loading libraries like `gruf` (#974)
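As a quick illustration of the redefined Virtual Partitions DSL (shown in full in the `topic_extensions.rb` diff below), a minimal routing sketch; the consumer class and the key-based partitioner here are hypothetical placeholders:

```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    config.concurrency = 10 # global number of worker threads
  end

  routes.draw do
    topic :orders_states do
      # Hypothetical Pro consumer (must inherit from Karafka::Pro::BaseConsumer)
      consumer OrdersStatesConsumer

      # Split each partition's batch by message key into at most 5 virtual
      # partitions, independently of the global concurrency above
      virtual_partitions(
        partitioner: ->(message) { message.key },
        concurrency: 5
      )
    end
  end
end
```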
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -8,12 +8,12 @@
 
 Karafka is a Ruby and Rails multi-threaded efficient Kafka processing framework that:
 
-- Supports parallel processing in [multiple threads](https://
-- Has [ActiveJob backend](https://
-- [Automatically integrates](https://
-- Supports in-development [code reloading](https://
+- Supports parallel processing in [multiple threads](https://karafka.io/docs/Concurrency-and-multithreading) (also for a [single topic partition](https://karafka.io/docs/Pro-Virtual-Partitions) work)
+- Has [ActiveJob backend](https://karafka.io/docs/Active-Job) support (including [ordered jobs](https://karafka.io/docs/Pro-Enhanced-Active-Job#ordered-jobs))
+- [Automatically integrates](https://karafka.io/docs/Integrating-with-Ruby-on-Rails-and-other-frameworks#integrating-with-ruby-on-rails) with Ruby on Rails
+- Supports in-development [code reloading](https://karafka.io/docs/Auto-reload-of-code-changes-in-development)
 - Is powered by [librdkafka](https://github.com/edenhill/librdkafka) (the Apache Kafka C/C++ client library)
-- Has an out-of the box [StatsD/DataDog monitoring](https://
+- Has an out-of the box [StatsD/DataDog monitoring](https://karafka.io/docs/Monitoring-and-logging) with a dashboard template.
 
 ```ruby
 # Define what topics you want to consume with which consumers in karafka.rb
@@ -42,13 +42,13 @@ If you're entirely new to the subject, you can start with our "Kafka on Rails" a
 - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 1 – Kafka basics and its advantages](https://mensfeld.pl/2017/11/kafka-on-rails-using-kafka-with-ruby-on-rails-part-1-kafka-basics-and-its-advantages/)
 - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 2 – Getting started with Rails and Kafka](https://mensfeld.pl/2018/01/kafka-on-rails-using-kafka-with-ruby-on-rails-part-2-getting-started-with-ruby-and-kafka/)
 
-If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://
+If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://karafka.io/docs/Getting-Started) guides and the [example apps repository](https://github.com/karafka/example-apps).
 
 We also maintain many [integration specs](https://github.com/karafka/karafka/tree/master/spec/integrations) illustrating various use-cases and features of the framework.
 
 ### TL;DR (1 minute from setup to publishing and consuming messages)
 
-**Prerequisites**: Kafka running. You can start it by following instructions from [here](https://
+**Prerequisites**: Kafka running. You can start it by following instructions from [here](https://karafka.io/docs/Setting-up-Kafka).
 
 1. Add and install Karafka:
 
@@ -85,8 +85,8 @@ Help me provide high-quality open-source software. Please see the Karafka [homep
 
 ## Support
 
-Karafka has [Wiki pages](https://
+Karafka has [Wiki pages](https://karafka.io/docs) for almost everything and a pretty decent [FAQ](https://karafka.io/docs/FAQ). It covers the installation, setup, and deployment, along with other useful details on how to run Karafka.
 
 If you have questions about using Karafka, feel free to join our [Slack](https://slack.karafka.io) channel.
 
-Karafka has [priority support](https://
+Karafka has [priority support](https://karafka.io/docs/Pro-Support) for technical and architectural questions that is part of the Karafka Pro subscription.
data/bin/integrations
CHANGED
@@ -19,7 +19,7 @@ ROOT_PATH = Pathname.new(File.expand_path(File.join(File.dirname(__FILE__), '../
 # When the value is high, there's a problem with thread allocation on Github CI, tht is why
 # we limit it. Locally we can run a lot of those, as many of them have sleeps and do not use a lot
 # of CPU
-CONCURRENCY = ENV.key?('CI') ?
+CONCURRENCY = ENV.key?('CI') ? 3 : Etc.nprocessors * 2
 
 # How may bytes do we want to keep from the stdout in the buffer for when we need to print it
 MAX_BUFFER_OUTPUT = 51_200
@@ -47,6 +47,8 @@ class Scenario
   # @param path [String] path to the scenarios file
   def initialize(path)
     @path = path
+    # First 1024 characters from stdout
+    @stdout_head = ''
     # Last 1024 characters from stdout
     @stdout_tail = ''
   end
@@ -75,8 +77,6 @@ class Scenario
   def finished?
     # If the thread is running too long, kill it
     if current_time - @started_at > MAX_RUN_TIME
-      @wait_thr.kill
-
       begin
         Process.kill('TERM', pid)
       # It may finish right after we want to kill it, that's why we ignore this
@@ -88,6 +88,7 @@ class Scenario
     # to stdout. Otherwise after reaching the buffer size, it would hang
     buffer = ''
     @stdout.read_nonblock(MAX_BUFFER_OUTPUT, buffer, exception: false)
+    @stdout_head = buffer if @stdout_head.empty?
     @stdout_tail << buffer
     @stdout_tail = @stdout_tail[-MAX_BUFFER_OUTPUT..-1] || @stdout_tail
 
@@ -112,6 +113,11 @@ class Scenario
     @wait_thr.value&.exitstatus || 123
   end
 
+  # @return [String] exit status of the process
+  def exit_status
+    @wait_thr.value.to_s
+  end
+
   # Prints a status report when scenario is finished and stdout if it failed
   def report
     if success?
@@ -123,7 +129,11 @@ class Scenario
 
     puts
     puts "\e[#{31}m#{'[FAILED]'}\e[0m #{name}"
+    puts "Time taken: #{current_time - @started_at} seconds"
     puts "Exit code: #{exit_code}"
+    puts "Exit status: #{exit_status}"
+    puts @stdout_head
+    puts '...'
     puts @stdout_tail
     puts buffer
     puts
data/config/errors.yml
CHANGED
@@ -35,6 +35,7 @@ en:
     consumer_format: needs to be present
     id_format: 'needs to be a string with a Kafka accepted format'
     initial_offset_format: needs to be either earliest or latest
+    subscription_group_format: must be nil or a non-empty string
 
   consumer_group:
     missing: needs to be present
@@ -54,3 +55,5 @@ en:
 
   pro_consumer_group_topic:
     consumer_format: needs to inherit from Karafka::Pro::BaseConsumer and not Karafka::Consumer
+    virtual_partitions.partitioner_respond_to_call: needs to be defined and needs to respond to `#call`
+    virtual_partitions.concurrency_format: needs to be equl or more than 1
data/lib/karafka/admin.rb
CHANGED
data/lib/karafka/base_consumer.rb
CHANGED
@@ -15,13 +15,24 @@ module Karafka
     # @return [Waterdrop::Producer] producer instance
     attr_accessor :producer
 
-    # Can be used to run preparation code
+    # Can be used to run preparation code prior to the job being enqueued
     #
     # @private
-    # @note This should not be used by the end users as it is part of the lifecycle of things
+    # @note This should not be used by the end users as it is part of the lifecycle of things and
+    #   not as a part of the public api. This should not perform any extensive operations as it is
+    #   blocking and running in the listener thread.
+    def on_before_enqueue; end
+
+    # Can be used to run preparation code in the worker
+    #
+    # @private
+    # @note This should not be used by the end users as it is part of the lifecycle of things and
     #   not as part of the public api. This can act as a hook when creating non-blocking
     #   consumers and doing other advanced stuff
-    def on_before_consume
+    def on_before_consume
+      messages.metadata.processed_at = Time.now
+      messages.metadata.freeze
+    end
 
     # Executes the default consumer flow.
     #
@@ -70,10 +81,15 @@ module Karafka
       end
     end
 
-    # Trigger method for running on
+    # Trigger method for running on partition revocation.
    #
    # @private
    def on_revoked
+      # We need to always un-pause the processing in case we have lost a given partition.
+      # Otherwise the underlying librdkafka would not know we may want to continue processing and
+      # the pause could in theory last forever
+      resume
+
      coordinator.revoke
 
      Karafka.monitor.instrument('consumer.revoked', caller: self) do
data/lib/karafka/connection/client.rb
CHANGED
@@ -275,16 +275,16 @@ module Karafka
 
     # Commits the stored offsets in a sync way and closes the consumer.
     def close
-      # Once client is closed, we should not close it again
-      # This could only happen in case of a race-condition when forceful shutdown happens
-      # and triggers this from a different thread
-      return if @closed
-
       @mutex.synchronize do
-
+        # Once client is closed, we should not close it again
+        # This could only happen in case of a race-condition when forceful shutdown happens
+        # and triggers this from a different thread
+        return if @closed
 
        @closed = true
 
+        internal_commit_offsets(async: false)
+
        # Remove callbacks runners that were registered
        ::Karafka::Instrumentation.statistics_callbacks.delete(@subscription_group.id)
        ::Karafka::Instrumentation.error_callbacks.delete(@subscription_group.id)
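The relocation above moves the `@closed` guard inside `@mutex.synchronize`, closing the race where a forceful shutdown triggers `close` from another thread. A minimal standalone sketch of that pattern (not Karafka's code):

```ruby
# With the guard outside the lock, two threads can both observe
# closed == false and run the close body twice; under the lock, the
# second thread sees the flag already flipped by the first.
class Closable
  def initialize
    @mutex = Mutex.new
    @closed = false
  end

  def close
    @mutex.synchronize do
      return if @closed

      @closed = true
      puts "closed once by #{Thread.current.object_id}"
    end
  end
end

closable = Closable.new
Array.new(2) { Thread.new { closable.close } }.each(&:join)
# => prints exactly once
```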
data/lib/karafka/connection/listener.rb
CHANGED
@@ -185,7 +185,9 @@ module Karafka
       # processed (if it was assigned and revoked really fast), thus we may not have it
       # here. In cases like this, we do not run a revocation job
       @executors.find_all(topic, partition).each do |executor|
-        jobs << @jobs_builder.revoked(executor)
+        job = @jobs_builder.revoked(executor)
+        job.before_enqueue
+        jobs << job
       end
 
       # We need to remove all the executors of a given topic partition that we have lost, so
@@ -205,7 +207,9 @@ module Karafka
       jobs = []
 
       @executors.each do |_, _, executor|
-        jobs << @jobs_builder.shutdown(executor)
+        job = @jobs_builder.shutdown(executor)
+        job.before_enqueue
+        jobs << job
       end
 
       @scheduler.schedule_shutdown(@jobs_queue, jobs)
@@ -238,10 +242,10 @@ module Karafka
       @partitioner.call(topic, messages) do |group_id, partition_messages|
         # Count the job we're going to create here
         coordinator.increment
-
         executor = @executors.find_or_create(topic, partition, group_id)
-
-        jobs << @jobs_builder.consume(executor, partition_messages, coordinator)
+        job = @jobs_builder.consume(executor, partition_messages, coordinator)
+        job.before_enqueue
+        jobs << job
       end
     end
data/lib/karafka/contracts/consumer_group.rb
CHANGED
@@ -12,8 +12,8 @@ module Karafka
       ).fetch('en').fetch('validations').fetch('consumer_group')
     end
 
-    required(:id) { |
-    required(:topics) { |
+    required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:topics) { |val| val.is_a?(Array) && !val.empty? }
 
     virtual do |data, errors|
       next unless errors.empty?
data/lib/karafka/contracts/consumer_group_topic.rb
CHANGED
@@ -12,15 +12,16 @@ module Karafka
       ).fetch('en').fetch('validations').fetch('consumer_group_topic')
     end
 
-    required(:consumer) { |
-    required(:deserializer) { |
-    required(:id) { |
-    required(:kafka) { |
-    required(:max_messages) { |
-    required(:initial_offset) { |
-    required(:max_wait_time) { |
-    required(:manual_offset_management) { |
-    required(:name) { |
+    required(:consumer) { |val| !val.nil? }
+    required(:deserializer) { |val| !val.nil? }
+    required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:kafka) { |val| val.is_a?(Hash) && !val.empty? }
+    required(:max_messages) { |val| val.is_a?(Integer) && val >= 1 }
+    required(:initial_offset) { |val| %w[earliest latest].include?(val) }
+    required(:max_wait_time) { |val| val.is_a?(Integer) && val >= 10 }
+    required(:manual_offset_management) { |val| [true, false].include?(val) }
+    required(:name) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:subscription_group) { |val| val.nil? || (val.is_a?(String) && !val.empty?) }
 
     virtual do |data, errors|
       next unless errors.empty?
data/lib/karafka/messages/builders/batch_metadata.rb
CHANGED
@@ -28,9 +28,8 @@ module Karafka
         created_at: messages.last.timestamp,
         # When this batch was built and scheduled for execution
         scheduled_at: scheduled_at,
-        #
-
-        processed_at: Time.now
+        # This needs to be set to a correct value prior to processing starting
+        processed_at: nil
       )
     end
   end
data/lib/karafka/messages/builders/messages.rb
CHANGED
@@ -14,11 +14,13 @@ module Karafka
       # @param received_at [Time] moment in time when the messages were received
       # @return [Karafka::Messages::Messages] messages batch object
       def call(messages, topic, received_at)
+        # We cannot freeze the batch metadata because it is altered with the processed_at time
+        # prior to the consumption. It is being frozen there
         metadata = BatchMetadata.call(
           messages,
           topic,
           received_at
-        ).freeze
+        )
 
         Karafka::Messages::Messages.new(
           messages,
data/lib/karafka/pro/base_consumer.rb
CHANGED
@@ -23,13 +23,17 @@ module Karafka
 
     private_constant :MAX_PAUSE_TIME
 
-    # Pauses processing of a given partition until we're done with the processing
+    # Pauses processing of a given partition until we're done with the processing.
     # This ensures, that we can easily poll not reaching the `max.poll.interval`
-    def on_before_consume
+    # @note This needs to happen in the listener thread, because we cannot wait on this being
+    #   executed in the workers. Workers may be already running some LRJ jobs that are blocking
+    #   all the threads until finished, yet unless we pause the incoming partitions information,
+    #   we may be kicked out of the consumer group due to not polling often enough
+    def on_before_enqueue
       return unless topic.long_running_job?
 
       # This ensures, that when running LRJ with VP, things operate as expected
-      coordinator.
+      coordinator.on_enqueued do |first_group_message|
        # Pause at the first message in a batch. That way in case of a crash, we will not loose
        # any messages
        pause(first_group_message.offset, MAX_PAUSE_TIME)
@@ -44,6 +48,29 @@ module Karafka
       end
     end
 
+    # Trigger method for running on partition revocation.
+    #
+    # @private
+    def on_revoked
+      # We do not want to resume on revocation in case of a LRJ.
+      # For LRJ we resume after the successful processing or do a backoff pause in case of a
+      # failure. Double non-blocking resume could cause problems in coordination.
+      resume unless topic.long_running_job?
+
+      coordinator.revoke
+
+      Karafka.monitor.instrument('consumer.revoked', caller: self) do
+        revoked
+      end
+    rescue StandardError => e
+      Karafka.monitor.instrument(
+        'error.occurred',
+        error: e,
+        caller: self,
+        type: 'consumer.revoked.error'
+      )
+    end
+
     private
 
     # Handles the post-consumption flow depending on topic settings
@@ -74,6 +101,8 @@ module Karafka
         resume
       else
         # If processing failed, we need to pause
+        # For long running job this will overwrite the default never-ending pause and will cause
+        # the processing to keep going after the error backoff
         pause(@seek_offset || first_message.offset)
       end
     end
data/lib/karafka/pro/contracts/consumer_group_topic.rb
CHANGED
@@ -22,11 +22,31 @@ module Karafka
       ).fetch('en').fetch('validations').fetch('pro_consumer_group_topic')
     end
 
-    virtual do |data, errors|
+    nested(:virtual_partitions) do
+      required(:active) { |val| [true, false].include?(val) }
+      required(:partitioner) { |val| val.nil? || val.respond_to?(:call) }
+      required(:concurrency) { |val| val.is_a?(Integer) && val >= 1 }
+    end
+
+    virtual do |data, errors|
+      next unless errors.empty?
       next if data[:consumer] < Karafka::Pro::BaseConsumer
 
       [[%i[consumer], :consumer_format]]
     end
+
+    # When virtual partitions are defined, partitioner needs to respond to `#call` and it
+    # cannot be nil
+    virtual do |data, errors|
+      next unless errors.empty?
+
+      virtual_partitions = data[:virtual_partitions]
+
+      next unless virtual_partitions[:active]
+      next if virtual_partitions[:partitioner].respond_to?(:call)
+
+      [[%i[virtual_partitions partitioner], :respond_to_call]]
+    end
   end
 end
data/lib/karafka/pro/loader.rb
CHANGED
@@ -67,7 +67,7 @@ module Karafka
 
     # Loads routing extensions
    def load_routing_extensions
-      ::Karafka::Routing::Topic.
+      ::Karafka::Routing::Topic.prepend(Routing::TopicExtensions)
      ::Karafka::Routing::Builder.prepend(Routing::BuilderExtensions)
    end
  end
data/lib/karafka/pro/processing/coordinator.rb
CHANGED
@@ -18,6 +18,7 @@ module Karafka
     # @param args [Object] anything the base coordinator accepts
     def initialize(*args)
       super
+      @on_enqueued_invoked = false
       @on_started_invoked = false
       @on_finished_invoked = false
       @flow_lock = Mutex.new
@@ -30,6 +31,7 @@ module Karafka
       super
 
       @mutex.synchronize do
+        @on_enqueued_invoked = false
         @on_started_invoked = false
         @on_finished_invoked = false
         @first_message = messages.first
@@ -42,6 +44,18 @@ module Karafka
       @running_jobs.zero?
     end
 
+    # Runs synchronized code once for a collective of virtual partitions prior to work being
+    # enqueued
+    def on_enqueued
+      @flow_lock.synchronize do
+        return if @on_enqueued_invoked
+
+        @on_enqueued_invoked = true
+
+        yield(@first_message, @last_message)
+      end
+    end
+
     # Runs given code only once per all the coordinated jobs upon starting first of them
     def on_started
       @flow_lock.synchronize do
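`on_enqueued` follows the same run-once pattern as `on_started` and `on_finished`: whichever of the coordinated virtual partition jobs gets there first executes the block, the rest skip it. A standalone sketch of the idea (class and method names assumed, not the framework API):

```ruby
# Run a block exactly once for a collective of related jobs, regardless
# of how many of them attempt to trigger it
class OnceFlow
  def initialize
    @flow_lock = Mutex.new
    @invoked = false
  end

  def once
    @flow_lock.synchronize do
      return if @invoked

      @invoked = true
      yield
    end
  end
end

flow = OnceFlow.new
5.times { flow.once { puts 'pausing the partition once, before enqueuing' } }
# => prints exactly once
```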
data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb
CHANGED
@@ -25,8 +25,9 @@ module Karafka
     # @note It needs to be working with a proper consumer that will handle the partition
     #   management. This layer of the framework knows nothing about Kafka messages consumption.
     class ConsumeNonBlocking < ::Karafka::Processing::Jobs::Consume
-      #
-
+      # Makes this job non-blocking from the start
+      # @param args [Array] any arguments accepted by `::Karafka::Processing::Jobs::Consume`
       def initialize(*args)
         super
         @non_blocking = true
       end
data/lib/karafka/pro/processing/partitioner.rb
CHANGED
@@ -21,17 +21,15 @@ module Karafka
     def call(topic, messages)
       ktopic = @subscription_group.topics.find(topic)
 
-      @concurrency ||= ::Karafka::App.config.concurrency
-
       # We only partition work if we have a virtual partitioner and more than one thread to
       # process the data. With one thread it is not worth partitioning the work as the work
       # itself will be assigned to one thread (pointless work)
-      if ktopic.
+      if ktopic.virtual_partitions? && ktopic.virtual_partitions.concurrency > 1
         # We need to reduce it to number of threads, so the group_id is not a direct effect
         # of the end user action. Otherwise the persistence layer for consumers would cache
         # it forever and it would cause memory leaks
         groupings = messages
-          .group_by { |msg| ktopic.
+          .group_by { |msg| ktopic.virtual_partitions.partitioner.call(msg) }
           .values
 
         # Reduce the max concurrency to a size that matches the concurrency
@@ -41,7 +39,7 @@ module Karafka
         # The algorithm here is simple, we assume that the most costly in terms of processing,
         # will be processing of the biggest group and we reduce the smallest once to have
         # max of groups equal to concurrency
-        while groupings.size >
+        while groupings.size > ktopic.virtual_partitions.concurrency
           groupings.sort_by! { |grouping| -grouping.size }
 
           # Offset order needs to be maintained for virtual partitions
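The reduction loop elided above merges the smallest groupings until their count matches the configured concurrency, keeping offset order inside each merged group. A self-contained sketch of that idea (simplified, not the exact Pro implementation):

```ruby
# Messages only need to respond to #offset for this sketch
Message = Struct.new(:key, :offset)
messages = Array.new(20) { |i| Message.new("key#{i % 7}", i) }

concurrency = 3
partitioner = ->(msg) { msg.key }

groupings = messages.group_by { |msg| partitioner.call(msg) }.values

while groupings.size > concurrency
  # Biggest group first, so the two smallest sit at the end
  groupings.sort_by! { |grouping| -grouping.size }
  # Merge the two smallest; offset order needs to be maintained
  groupings << groupings.pop(2).flatten.sort_by(&:offset)
end

groupings.each_with_index { |group, i| puts "vp#{i}: #{group.map(&:offset)}" }
```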
data/lib/karafka/pro/routing/topic_extensions.rb
CHANGED
@@ -15,23 +15,59 @@ module Karafka
   module Routing
     # Routing extensions that allow to configure some extra PRO routing options
     module TopicExtensions
+      # Internal representation of the virtual partitions settings and configuration
+      # This allows us to abstract away things in a nice manner
+      #
+      # For features with more options than just on/off we use this approach as it simplifies
+      # the code. We do not use it for all not to create unneeded complexity
+      VirtualPartitions = Struct.new(
+        :active,
+        :partitioner,
+        :concurrency,
+        keyword_init: true
+      ) { alias_method :active?, :active }
+
       class << self
         # @param base [Class] class we extend
-        def included(base)
+        def prepended(base)
           base.attr_accessor :long_running_job
-          base.attr_accessor :virtual_partitioner
         end
       end
 
-      # @
-
-
+      # @param concurrency [Integer] max number of virtual partitions that can come out of the
+      #   single distribution flow. When set to more than the Karafka threading, will create
+      #   more work than workers. When less, can ensure we have spare resources to process other
+      #   things in parallel.
+      # @param partitioner [nil, #call] nil or callable partitioner
+      # @return [VirtualPartitions] method that allows to set the virtual partitions details
+      #   during the routing configuration and then allows to retrieve it
+      def virtual_partitions(
+        concurrency: Karafka::App.config.concurrency,
+        partitioner: nil
+      )
+        @virtual_partitions ||= VirtualPartitions.new(
+          active: !partitioner.nil?,
+          concurrency: concurrency,
+          partitioner: partitioner
+        )
+      end
+
+      # @return [Boolean] are virtual partitions enabled for given topic
+      def virtual_partitions?
+        virtual_partitions.active?
       end
 
       # @return [Boolean] is a given job on a topic a long-running one
       def long_running_job?
         @long_running_job || false
       end
+
+      # @return [Hash] hash with topic details and the extensions details
+      def to_h
+        super.merge(
+          virtual_partitions: virtual_partitions.to_h
+        )
+      end
     end
   end
 end
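Since `active` is derived purely from whether a partitioner was provided, a topic configured as in the earlier routing sketch would behave roughly as follows (illustrative values):

```ruby
topic.virtual_partitions?              # => true, a partitioner was given
topic.virtual_partitions.concurrency   # => 5
topic.virtual_partitions.partitioner   # => the proc from the routing block
topic.to_h[:virtual_partitions]        # => { active: true, partitioner: #<Proc...>, concurrency: 5 }
```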
data/lib/karafka/processing/executor.rb
CHANGED
@@ -37,14 +37,17 @@ module Karafka
       @topic = topic
     end
 
-    #
-    #
+    # Allows us to prepare the consumer in the listener thread prior to the job being send to
+    # the queue. It also allows to run some code that is time sensitive and cannot wait in the
+    # queue as it could cause starvation.
     #
     # @param messages [Array<Karafka::Messages::Message>]
-    # @param received_at [Time] the moment we've received the batch (actually the moment we've)
-    #   enqueued it, but good enough
     # @param coordinator [Karafka::Processing::Coordinator] coordinator for processing management
-    def
+    def before_enqueue(messages, coordinator)
+      # the moment we've received the batch or actually the moment we've enqueued it,
+      # but good enough
+      @enqueued_at = Time.now
+
      # Recreate consumer with each batch if persistence is not enabled
      # We reload the consumers with each batch instead of relying on some external signals
      # when needed for consistency. That way devs may have it on or off and not in this
@@ -57,9 +60,14 @@ module Karafka
      consumer.messages = Messages::Builders::Messages.call(
        messages,
        @topic,
-        received_at
+        @enqueued_at
      )
 
+      consumer.on_before_enqueue
+    end
+
+    # Runs setup and warm-up code in the worker prior to running the consumption
+    def before_consume
      consumer.on_before_consume
    end
data/lib/karafka/processing/jobs/base.rb
CHANGED
@@ -22,6 +22,10 @@ module Karafka
       @non_blocking = false
     end
 
+    # When redefined can run any code prior to the job being enqueued
+    # @note This will run in the listener thread and not in the worker
+    def before_enqueue; end
+
     # When redefined can run any code that should run before executing the proper code
     def before_call; end
data/lib/karafka/processing/jobs/consume.rb
CHANGED
@@ -18,13 +18,18 @@ module Karafka
       @executor = executor
       @messages = messages
       @coordinator = coordinator
-      @created_at = Time.now
       super()
     end
 
+    # Runs all the preparation code on the executor that needs to happen before the job is
+    # enqueued.
+    def before_enqueue
+      executor.before_enqueue(@messages, @coordinator)
+    end
+
     # Runs the before consumption preparations on the executor
     def before_call
-      executor.before_consume
+      executor.before_consume
     end
 
     # Runs the given executor
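Taken together, the executor and job changes split preparation across threads: `before_enqueue` runs in the listener thread before the job can sit in the queue, while `before_call` and the consumption itself run in a worker. A condensed sketch of that ordering (the queue wiring here is illustrative, not Karafka's internals):

```ruby
# Listener thread: build the job and run the time-sensitive preparation
# (e.g. LRJ pausing) before the job waits in the queue
job = jobs_builder.consume(executor, partition_messages, coordinator)
job.before_enqueue
jobs_queue << job

# Worker thread: picks the job up and runs the remaining lifecycle
job = jobs_queue.pop
job.before_call # => executor.before_consume => consumer.on_before_consume
job.call        # runs the actual consumption
```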
data/lib/karafka/routing/proxy.rb
CHANGED
@@ -7,15 +7,6 @@ module Karafka
   class Proxy
     attr_reader :target
 
-    # We should proxy only non ? and = methods as we want to have a regular dsl
-    IGNORED_POSTFIXES = %w[
-      ?
-      =
-      !
-    ].freeze
-
-    private_constant :IGNORED_POSTFIXES
-
     # @param target [Object] target object to which we proxy any DSL call
     # @param block [Proc] block that we want to evaluate in the proxy context
     def initialize(target, &block)
@@ -25,21 +16,23 @@ module Karafka
 
     # Translates the no "=" DSL of routing into elements assignments on target
     # @param method_name [Symbol] name of the missing method
-
-    # @param block [Proc] block provided to the method
-    def method_missing(method_name, *arguments, &block)
+    def method_missing(method_name, ...)
       return super unless respond_to_missing?(method_name)
 
-      @target.
+      if @target.respond_to?(:"#{method_name}=")
+        @target.public_send(:"#{method_name}=", ...)
+      else
+        @target.public_send(method_name, ...)
+      end
     end
 
     # Tells whether or not a given element exists on the target
     # @param method_name [Symbol] name of the missing method
     # @param include_private [Boolean] should we include private in the check as well
     def respond_to_missing?(method_name, include_private = false)
-
-
-
+      @target.respond_to?(:"#{method_name}=", include_private) ||
+        @target.respond_to?(method_name, include_private) ||
+        super
     end
   end
 end
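A standalone sketch of the resulting proxy behavior (simplified; forwarding with `...` in this position requires Ruby 3.0+): a bare `name :value` call on the proxy becomes `target.name = :value` whenever the target exposes a writer, and is forwarded as-is otherwise.

```ruby
class MiniProxy
  def initialize(target, &block)
    @target = target
    instance_eval(&block)
  end

  # Translate `name value` DSL calls into `name=` assignments on the target
  def method_missing(method_name, ...)
    return super unless respond_to_missing?(method_name)

    if @target.respond_to?(:"#{method_name}=")
      @target.public_send(:"#{method_name}=", ...)
    else
      @target.public_send(method_name, ...)
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    @target.respond_to?(:"#{method_name}=", include_private) ||
      @target.respond_to?(method_name, include_private) ||
      super
  end
end

Topic = Struct.new(:name, :consumer)
topic = Topic.new
MiniProxy.new(topic) do
  name 'payments'
  consumer 'PaymentsConsumer'
end
topic.name # => "payments"
```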
data/lib/karafka/routing/topic.rb
CHANGED
@@ -8,6 +8,7 @@ module Karafka
   class Topic
     attr_reader :id, :name, :consumer_group
     attr_writer :consumer
+    attr_accessor :subscription_group
 
     # Attributes we can inherit from the root unless they were defined on this level
     INHERITABLE_ATTRIBUTES = %i[
@@ -91,7 +92,8 @@ module Karafka
       id: id,
       name: name,
       consumer: consumer,
-      consumer_group_id: consumer_group.id
+      consumer_group_id: consumer_group.id,
+      subscription_group: subscription_group
     ).freeze
   end
 end
data/lib/karafka/templates/karafka.rb.erb
CHANGED
@@ -1,6 +1,6 @@
 # frozen_string_literal: true
-
 <% unless rails? -%>
+
 # This file is auto-generated during the install process.
 # If by any chance you've wanted a setup for Rails app, either run the `karafka:install`
 # command again or refer to the install templates available in the source codes
data/lib/karafka/version.rb
CHANGED
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: karafka
 version: !ruby/object:Gem::Version
-  version: 2.0.4
+  version: 2.0.7
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
   Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
   MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
   -----END CERTIFICATE-----
-date: 2022-
+date: 2022-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: karafka-core
metadata.gz.sig
CHANGED
Binary file