karafka 1.0.1 → 1.1.0.alpha1

Files changed (43)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +27 -3
  3. data/Gemfile +1 -0
  4. data/Gemfile.lock +14 -32
  5. data/README.md +1 -1
  6. data/karafka.gemspec +2 -3
  7. data/lib/karafka.rb +2 -3
  8. data/lib/karafka/attributes_map.rb +3 -3
  9. data/lib/karafka/backends/inline.rb +2 -2
  10. data/lib/karafka/base_controller.rb +19 -69
  11. data/lib/karafka/base_responder.rb +10 -5
  12. data/lib/karafka/cli/info.rb +1 -2
  13. data/lib/karafka/cli/server.rb +6 -8
  14. data/lib/karafka/connection/{messages_consumer.rb → consumer.rb} +27 -12
  15. data/lib/karafka/connection/listener.rb +6 -13
  16. data/lib/karafka/connection/{messages_processor.rb → processor.rb} +3 -3
  17. data/lib/karafka/controllers/callbacks.rb +54 -0
  18. data/lib/karafka/controllers/includer.rb +1 -1
  19. data/lib/karafka/controllers/single_params.rb +2 -2
  20. data/lib/karafka/errors.rb +7 -0
  21. data/lib/karafka/fetcher.rb +11 -5
  22. data/lib/karafka/monitor.rb +2 -2
  23. data/lib/karafka/params/params.rb +3 -1
  24. data/lib/karafka/params/params_batch.rb +1 -1
  25. data/lib/karafka/patches/dry_configurable.rb +0 -2
  26. data/lib/karafka/patches/ruby_kafka.rb +34 -0
  27. data/lib/karafka/persistence/consumer.rb +25 -0
  28. data/lib/karafka/persistence/controller.rb +24 -9
  29. data/lib/karafka/process.rb +1 -1
  30. data/lib/karafka/responders/topic.rb +8 -1
  31. data/lib/karafka/schemas/config.rb +0 -10
  32. data/lib/karafka/schemas/consumer_group.rb +9 -8
  33. data/lib/karafka/schemas/consumer_group_topic.rb +1 -1
  34. data/lib/karafka/schemas/responder_usage.rb +1 -0
  35. data/lib/karafka/server.rb +6 -19
  36. data/lib/karafka/setup/config.rb +15 -34
  37. data/lib/karafka/setup/configurators/base.rb +1 -1
  38. data/lib/karafka/setup/configurators/water_drop.rb +11 -13
  39. data/lib/karafka/templates/karafka.rb.example +1 -1
  40. data/lib/karafka/version.rb +1 -1
  41. metadata +15 -28
  42. data/Rakefile +0 -7
  43. data/lib/karafka/setup/configurators/celluloid.rb +0 -19
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: c7d3b30943ab27b3ae2d8609e645590ca918ff2b
- data.tar.gz: 0b6b31b10d070c07d45ab6bb8bc177e94d4d9c3c
+ metadata.gz: 3a4a0b649f34461fbeef206fdc2c92bd3e6d6ff2
+ data.tar.gz: 87c937be4ee2bdfdc9b1957e458754e88cb15231
  SHA512:
- metadata.gz: b8a3f16977e6af10f6cda783c326be7bb8540b483cacfe225fc7c422cad091d9014950c6a6de4a9f5cd033b974a41de73bc445e7813063d12013684f056d35df
- data.tar.gz: 2e425da4e9e420ad092b3b563e5fa796f3f73159bc9c5d6cdf77bbf8463842f0a9379fe156b54328a3f65144f08ec657fc6f65aa01295a3efc40181e4bd0d858
+ metadata.gz: bee787850078c780417e90a87ea5945b12ef84b70e49a888696fd37454de8301fa25130a0f2b736ac8ea9df2a72a58837fe37aced3a3d2f7261bf11546cf23d6
+ data.tar.gz: 3c42797b1d29ca0d107b570c42d9f7b28e45c1f6c3664d8acd055379594569eb2fd801c8c9e44bc9120ee001fe2c4ade401577cb82024689549c9ff67b6177cb
data/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
  # Karafka framework changelog
 
+ ## 1.1.0 Unreleased
+ - Gem bump
+ - Switch from Celluloid to native Thread management
+ - Improved shutdown process
+ - Introduced optional fetch callbacks and moved the current ```after_received``` there as well
+ - Karafka will raise an Errors::InvalidPauseTimeout exception when trying to pause with the timeout set to 0
+ - Allow floats for timeouts and other time-based settings expressed in seconds
+ - Renamed MessagesProcessor to Processor and MessagesConsumer to Consumer - we don't process or consume anything else, so it was pointless to keep this "namespace"
+ - #232 - Remove unused ActiveSupport require
+ - #214 - Expose consumer on a controller layer
+ - #193 - Process shutdown callbacks
+ - Fixed accessibility of ```#params_batch``` from outside of the controller
+ - connection_pool config options are no longer required
+ - celluloid config options are no longer required
+ - ```#perform``` is now renamed to ```#consume```, with a warning-level deprecation message when the old one is used
+ - #235 - Rename perform to consume
+ - Upgrade to ruby-kafka 0.5
+ - Due to the redesign of WaterDrop, the concurrency setting is no longer needed
+ - #236 - Manual offset management
+ - WaterDrop 1.0.0 support with async
+ - Renamed the ```batch_consuming``` option to ```batch_fetching```, as it is not consumption (with processing) but the process of fetching messages from Kafka. A message is considered consumed when it is processed.
+ - Renamed ```batch_processing``` to ```batch_consuming``` to resemble the Kafka concept of consuming messages.
+ - Renamed ```after_received``` to ```after_fetched``` to normalize the naming conventions.
+
  ## 1.0.1
  - #210 - LoadError: cannot load such file -- [...]/karafka.rb
  - Ruby 2.4.2 as a default (+travis integration)
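Several of the renames above change user-facing settings. As a rough orientation, here is a minimal sketch of an application setup using the 1.1.0 names; the broker address and client id are illustrative, and the exact option set of a real app will differ:

```ruby
# Sketch of an app config using the 1.1.0 setting names:
# batch_fetching (formerly batch_consuming) controls batched fetching from
# Kafka; batch_consuming (formerly batch_processing) controls whether a
# whole batch is handed to the controller at once.
class App < Karafka::App
  setup do |config|
    config.client_id = 'example_app'
    config.kafka.seed_brokers = %w[127.0.0.1:9092]
    config.batch_fetching = true
    config.batch_consuming = true
  end
end
```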
@@ -60,11 +84,11 @@
 
  ### New features and improvements
 
- - batch processing thanks to ```#batch_processing``` flag and ```#params_batch``` on controllers
+ - batch processing thanks to ```#batch_consuming``` flag and ```#params_batch``` on controllers
  - ```#topic``` method on a controller instance to make a clear distinction between params and route details
  - Changed routing model (still compatible with 0.5) to allow better resources management
  - Lower memory requirements due to object creation limitation (2-3 times fewer objects on each new message)
- - Introduced the ```#batch_processing``` config flag (config for #126) that can be set per consumer_group
+ - Introduced the ```#batch_consuming``` config flag (config for #126) that can be set per consumer_group
  - Added support for partition, offset and partition key in the params hash
  - ```name``` option in config renamed to ```client_id```
  - Long running controllers with ```persistent``` flag on a topic config level, to make controller instances persistent between message batches (a single controller instance per topic per partition, not per message batch) - turned on by default
@@ -77,7 +101,7 @@
  - ```start_from_beginning``` moved into kafka scope (```kafka.start_from_beginning```)
  - Router no longer checks for route uniqueness - now you can define the same routes for multiple kafkas and do a lot of crazy stuff, so it's your responsibility to check uniqueness
  - Change in the way we identify topics between Karafka and Sidekiq workers. If you upgrade, please make sure all the jobs scheduled in Sidekiq are finished before the upgrade.
- - ```batch_mode``` renamed to ```batch_consuming```
+ - ```batch_mode``` renamed to ```batch_fetching```
  - Renamed content to value to better resemble the ruby-kafka internal message naming convention
  - When having a responder with ```required``` topics and not using ```#respond_with``` at all, it will raise an exception
  - Renamed ```inline_mode``` to ```inline_processing``` to resemble other settings conventions
data/Gemfile CHANGED
@@ -5,6 +5,7 @@ source 'https://rubygems.org'
  gemspec
 
  group :development, :test do
+ gem 'waterdrop'
  gem 'timecop'
  gem 'rspec'
  gem 'simplecov'
data/Gemfile.lock CHANGED
@@ -1,18 +1,17 @@
  PATH
  remote: .
  specs:
- karafka (1.0.1)
+ karafka (1.1.0.alpha1)
  activesupport (>= 5.0)
- celluloid
  dry-configurable (~> 0.7)
  dry-validation (~> 0.11)
  envlogic (~> 1.0)
  multi_json (>= 1.12)
  rake (>= 11.3)
  require_all (>= 1.4)
- ruby-kafka (>= 0.4)
+ ruby-kafka (>= 0.5)
  thor (~> 0.19)
- waterdrop (~> 0.4)
+ waterdrop (>= 1.0.alpha2)
 
  GEM
  remote: https://rubygems.org/
@@ -22,25 +21,10 @@ GEM
  i18n (~> 0.7)
  minitest (~> 5.1)
  tzinfo (~> 1.1)
- celluloid (0.17.3)
- celluloid-essentials
- celluloid-extras
- celluloid-fsm
- celluloid-pool
- celluloid-supervision
- timers (>= 4.1.1)
- celluloid-essentials (0.20.5)
- timers (>= 4.1.1)
- celluloid-extras (0.20.5)
- timers (>= 4.1.1)
- celluloid-fsm (0.20.5)
- timers (>= 4.1.1)
- celluloid-pool (0.20.5)
- timers (>= 4.1.1)
- celluloid-supervision (0.20.6)
- timers (>= 4.1.1)
  concurrent-ruby (1.0.5)
- connection_pool (2.2.1)
+ delivery_boy (0.2.2)
+ king_konf (~> 0.1.8)
+ ruby-kafka (~> 0.4)
  diff-lcs (1.3)
  docile (1.1.5)
  dry-configurable (0.7.0)
@@ -72,11 +56,11 @@ GEM
  dry-types (~> 0.12.0)
  envlogic (1.0.4)
  activesupport
- hitimes (1.2.6)
  i18n (0.9.0)
  concurrent-ruby (~> 1.0)
  inflecto (0.0.2)
  json (2.1.0)
+ king_konf (0.1.8)
  minitest (5.10.3)
  multi_json (1.12.2)
  null-logger (0.1.4)
@@ -95,7 +79,7 @@ GEM
  diff-lcs (>= 1.2.0, < 2.0)
  rspec-support (~> 3.7.0)
  rspec-support (3.7.0)
- ruby-kafka (0.4.3)
+ ruby-kafka (0.5.0)
  simplecov (0.15.1)
  docile (~> 1.1.0)
  json (>= 1.8, < 3)
@@ -104,17 +88,14 @@ GEM
  thor (0.20.0)
  thread_safe (0.3.6)
  timecop (0.9.1)
- timers (4.1.2)
- hitimes
  tzinfo (1.2.4)
  thread_safe (~> 0.1)
- waterdrop (0.4.0)
- bundler
- connection_pool
- dry-configurable (~> 0.6)
+ waterdrop (1.0.0.alpha2)
+ delivery_boy (>= 0.2.2)
+ dry-configurable (~> 0.7)
+ dry-validation (~> 0.11)
  null-logger
- rake
- ruby-kafka (~> 0.4)
+ ruby-kafka (>= 0.5)
 
  PLATFORMS
  ruby
@@ -124,6 +105,7 @@ DEPENDENCIES
  rspec
  simplecov
  timecop
+ waterdrop
 
  BUNDLED WITH
  1.15.4
data/README.md CHANGED
@@ -80,7 +80,7 @@ Unfortunately, it does not yet support independent forks, however you should be
  Please run:
 
  ```bash
- bundle exec rake
+ bundle exec rspec spec
  ```
 
  to check if everything is in order. After that you can submit a pull request.
data/karafka.gemspec CHANGED
@@ -17,16 +17,15 @@ Gem::Specification.new do |spec|
  spec.license = 'MIT'
 
  spec.add_dependency 'activesupport', '>= 5.0'
- spec.add_dependency 'celluloid'
  spec.add_dependency 'dry-configurable', '~> 0.7'
  spec.add_dependency 'dry-validation', '~> 0.11'
  spec.add_dependency 'envlogic', '~> 1.0'
  spec.add_dependency 'multi_json', '>= 1.12'
  spec.add_dependency 'rake', '>= 11.3'
  spec.add_dependency 'require_all', '>= 1.4'
- spec.add_dependency 'ruby-kafka', '>= 0.4'
+ spec.add_dependency 'ruby-kafka', '>= 0.5'
  spec.add_dependency 'thor', '~> 0.19'
- spec.add_dependency 'waterdrop', '~> 0.4'
+ spec.add_dependency 'waterdrop', '>= 1.0.alpha2'
 
  spec.required_ruby_version = '>= 2.3.0'
 
data/lib/karafka.rb CHANGED
@@ -2,8 +2,6 @@
 
  %w[
  English
- bundler
- celluloid/current
  waterdrop
  kafka
  envlogic
@@ -14,7 +12,6 @@
  dry-configurable
  dry-validation
  active_support/callbacks
- active_support/core_ext/class/subclasses
  active_support/core_ext/hash/indifferent_access
  active_support/descendants_tracker
  active_support/inflector
@@ -67,3 +64,5 @@ module Karafka
  end
 
  Karafka::Loader.load!(Karafka.core_root)
+ Kafka::Consumer.prepend(Karafka::Patches::RubyKafka)
+ Dry::Configurable::Config.prepend(Karafka::Patches::DryConfigurable)
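The two `prepend` calls added above install the monkey patches shipped in `lib/karafka/patches/`. A generic, self-contained illustration of the mechanism (not the body of Karafka's actual patches): a prepended module is placed before the class in the ancestor chain, so its methods run first and can delegate to the original implementation via `super`.

```ruby
# A module prepended to a class intercepts calls before the class's own
# method and can wrap the original behavior through super.
module LoggingPatch
  def fetch
    puts 'before original #fetch'
    super
  end
end

class Client
  def fetch
    puts 'original #fetch'
  end
end

Client.prepend(LoggingPatch)
Client.new.fetch
# prints: before original #fetch
# prints: original #fetch
```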
data/lib/karafka/attributes_map.rb CHANGED
@@ -21,7 +21,7 @@ module Karafka
  offset_retention_time heartbeat_interval
  ],
  subscription: %i[start_from_beginning max_bytes_per_partition],
- consuming: %i[min_bytes max_wait_time],
+ consuming: %i[min_bytes max_wait_time automatically_mark_as_processed],
  pausing: %i[pause_timeout],
  # All the options that are under kafka config namespace, but are not used
  # directly with kafka api, but from the Karafka user perspective, they are
@@ -37,7 +37,7 @@ module Karafka
  name
  parser
  responder
- batch_processing
+ batch_consuming
  persistent
  ]).uniq
  end
@@ -52,7 +52,7 @@ module Karafka
  # only when proxying details to ruby-kafka. We use ignored fields internally in karafka
  ignored_settings = config_adapter[:subscription]
  defined_settings = config_adapter.values.flatten
- karafka_settings = %i[batch_consuming]
+ karafka_settings = %i[batch_fetching]
  # This is a dirty and bad hack of dry-configurable to get keys before setting values
  dynamically_proxied = Karafka::Setup::Config
  ._settings
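The new `automatically_mark_as_processed` entry in the `consuming` group is the configuration half of #236 (manual offset management). A minimal sketch of turning it off; the `config.kafka.*` placement is an assumption based on the adapter groups above:

```ruby
class App < Karafka::App
  setup do |config|
    # Assumed to live under the kafka.* namespace like the other
    # config_adapter options. With false, fetched messages are no longer
    # auto-marked as processed, so offsets advance only when marked
    # explicitly from the consuming code.
    config.kafka.automatically_mark_as_processed = false
  end
end
```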
data/lib/karafka/backends/inline.rb CHANGED
@@ -7,10 +7,10 @@ module Karafka
  module Inline
  private
 
- # Executes perform code immediately (without enqueuing)
+ # Executes consume code immediately (without enqueuing)
  def process
  Karafka.monitor.notice(self.class, params_batch)
- perform
+ consume
  end
  end
  end
data/lib/karafka/base_controller.rb CHANGED
@@ -3,57 +3,8 @@
  # Karafka module namespace
  module Karafka
  # Base controller from which all Karafka controllers should inherit
- # Similar to Rails controllers we can define after_received callbacks
- # that will be executed
- #
- # Note that if after_received returns false, the chain will be stopped and
- # the perform method won't be executed
- #
- # @example Create a simple controller
- # class ExamplesController < Karafka::BaseController
- # def perform
- # # some logic here
- # end
- # end
- #
- # @example Create a controller with a block after_received
- # class ExampleController < Karafka::BaseController
- # after_received do
- # # Here we should have some checking logic
- # # If false is returned, won't schedule a perform action
- # end
- #
- # def perform
- # # some logic here
- # end
- # end
- #
- # @example Create a controller with a method after_received
- # class ExampleController < Karafka::BaseController
- # after_received :after_received_method
- #
- # def perform
- # # some logic here
- # end
- #
- # private
- #
- # def after_received_method
- # # Here we should have some checking logic
- # # If false is returned, won't schedule a perform action
- # end
- # end
  class BaseController
  extend ActiveSupport::DescendantsTracker
- include ActiveSupport::Callbacks
-
- # The call method is wrapped with a set of callbacks
- # We won't run perform at the backend if any of the callbacks
- # returns false
- # @see http://api.rubyonrails.org/classes/ActiveSupport/Callbacks/ClassMethods.html#method-i-get_callbacks
- define_callbacks :after_received
-
- attr_reader :params_batch
 
  class << self
  attr_reader :topic
@@ -66,21 +17,6 @@ module Karafka
  @topic = topic
  Controllers::Includer.call(self)
  end
-
- # Creates a callback that will be executed after receiving a message but before executing the
- # backend for processing
- # @param method_name [Symbol, String] method name or nil if we plan to provide a block
- # @yield A block with code that should be executed before scheduling
- # @example Define a block after_received callback
- # after_received do
- # # logic here
- # end
- #
- # @example Define a class name after_received callback
- # after_received :method_name
- def after_received(method_name = nil, &block)
- set_callback :after_received, :before, method_name ? method_name : block
- end
  end
 
  # @return [Karafka::Routing::Topic] topic to which a given controller is subscribed
@@ -100,18 +36,32 @@ module Karafka
  # Executes the default controller flow, runs callbacks and if not halted
  # will call process method of a proper backend
  def call
- run_callbacks :after_received do
- process
- end
+ process
  end
 
  private
 
- # Method that will perform business logic on data received from Kafka
+ # We make it private as it should be accessible only from the inside of a controller
+ attr_reader :params_batch
+
+ # @return [Karafka::Connection::Consumer] messages consumer that can be used to
+ # commit offsets manually or pause / stop the consumer based on the business logic
+ def consumer
+ Persistence::Consumer.read
+ end
+
+ # Method that will perform business logic on data received from Kafka (it will consume
+ # the data)
  # @note This method needs to be implemented in a subclass. We stub it here as a failover if
  # someone forgets about it or makes one with a typo
- def perform
+ def consume
  raise NotImplementedError, 'Implement this in a subclass'
  end
+
+ # Alias for #consume method and deprecation warning
+ def perform
+ Karafka.logger.warn('[DEPRECATION WARNING]: please use #consume instead of #perform')
+ consume
+ end
  end
  end
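Taken together, these controller changes reshape the basic usage pattern: `#consume` replaces `#perform`, `#params_batch` is private to the controller, and `#consumer` exposes the underlying consumer. A minimal sketch (class and topic names are illustrative):

```ruby
class OrdersController < Karafka::BaseController
  def consume
    params_batch.each do |params|
      # business logic for a single message goes here
    end
    # With automatic marking disabled, progress could be recorded manually
    # through the exposed consumer, e.g. via ruby-kafka's
    # consumer.mark_message_as_processed(message)
  end
end
```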
data/lib/karafka/base_responder.rb CHANGED
@@ -147,12 +147,11 @@ module Karafka
  # @note By default will not change topic (if default mapper used)
  mapped_topic = Karafka::App.config.topic_mapper.outgoing(topic)
 
- data_elements.each do |(data, options)|
- ::WaterDrop::Message.new(
- mapped_topic,
+ data_elements.each do |data, options|
+ producer(options).call(
  data,
- options
- ).send!
+ options.merge(topic: mapped_topic)
+ )
  end
  end
  end
@@ -176,5 +175,11 @@ module Karafka
  messages_buffer[topic.to_s] ||= []
  messages_buffer[topic.to_s] << [@parser_class.generate(data), options]
  end
+
+ # @param options [Hash] options for waterdrop
+ # @return [Class] WaterDrop producer (sync or async, based on the settings)
+ def producer(options)
+ options[:async] ? WaterDrop::AsyncProducer : WaterDrop::SyncProducer
+ end
  end
  end
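With the producer selection above, a responder can opt into asynchronous delivery per message. A sketch that assumes the `async` flag travels through the regular `respond_to` options (class and topic names are illustrative):

```ruby
class OrdersResponder < Karafka::BaseResponder
  topic :orders_created

  def respond(order)
    # async: true routes this message through WaterDrop::AsyncProducer;
    # without it, WaterDrop::SyncProducer is used (see #producer above)
    respond_to :orders_created, order, async: true
  end
end
```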
data/lib/karafka/cli/info.rb CHANGED
@@ -15,9 +15,8 @@ module Karafka
  "Karafka framework version: #{Karafka::VERSION}",
  "Application client id: #{config.client_id}",
  "Backend: #{config.backend}",
+ "Batch fetching: #{config.batch_fetching}",
  "Batch consuming: #{config.batch_consuming}",
- "Batch processing: #{config.batch_processing}",
- "Number of threads: #{config.concurrency}",
  "Boot file: #{Karafka.boot_file}",
  "Environment: #{Karafka.env}",
  "Kafka seed brokers: #{config.kafka.seed_brokers}"
data/lib/karafka/cli/server.rb CHANGED
@@ -20,21 +20,19 @@ module Karafka
 
  if cli.options[:daemon]
  FileUtils.mkdir_p File.dirname(cli.options[:pid])
- # For some reason Celluloid spins up threads that break forking
- # Threads are not shut down immediately, so daemonization will stall until
- # those threads are killed by the Celluloid manager (via timeout)
- # There's nothing initialized here yet, so instead we shut down Celluloid
- # and run it again when we need it (after fork)
- Celluloid.shutdown
  daemonize
- Celluloid.boot
  end
 
  # We assign active topics on a server level, as only the server is expected to listen on
  # part of the topics
  Karafka::Server.consumer_groups = cli.options[:consumer_groups]
 
- # Remove pidfile on shutdown, just before the server instance is going to be GCed
+ # Remove pidfile on stop, just before the server instance is going to be GCed
+ # We want to delay the moment in which the pidfile is removed as much as we can,
+ # so instead of removing it after the server stops running, we rely on the GC moment
+ # when this object gets removed (it is a bit later), so it is closer to the actual
+ # system process end. We do that so monitoring and deployment tools that rely on pids
+ # won't alarm or start a new system process up until the current one is finished
  ObjectSpace.define_finalizer(self, proc { send(:clean) })
 
  # After we fork, we can boot celluloid again
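The pidfile finalizer described in the new comment block is plain Ruby. A self-contained illustration with a hypothetical PidManager (not Karafka's actual cleanup code):

```ruby
# The proc registered with ObjectSpace.define_finalizer runs when the object
# is garbage collected, which usually happens close to process exit, so the
# pidfile disappears as late as possible.
class PidManager
  def initialize(path)
    File.write(path, Process.pid)
    # Capture only the path (not self) so the instance stays collectible
    ObjectSpace.define_finalizer(self, proc { File.delete(path) if File.exist?(path) })
  end
end

PidManager.new('/tmp/example.pid')
```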