karafka 2.5.0.rc1 → 2.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2a66089d998c0dabb1070e4e8f1895a068e8f2aa8e752fb38ef9da1633b9704d
- data.tar.gz: 188ea36894e0a32168303654510ef2e072d19f8ad39e5f0155547b6c96dbfdb2
+ metadata.gz: 03a08ef42e32f92069ef95b4380b744f7188dd2248f296abc752e5cee9d12c7f
+ data.tar.gz: e23896dcf66e16cddf193ee1b412fb0560dd82e50dcf01b31f3bb93d451afcc3
  SHA512:
- metadata.gz: d3ee8f86dd3b26dea69f9e03972ac3aced8b76d8156a9c359cb2b50114f3156306281a7f147935fa4d566e2d398f0a343205255a96f9560528b9dc2d21ca166c
- data.tar.gz: 843e553470b78b107080df06ae7f7bd716d5d08363fb0d381709522aed397bb5da6d46c0d1aee06a72c6ba063b112118a669ff3f079fef4c53f672cf08ed1ee0
+ metadata.gz: 52356bcb5a97f121a6e383bccc7530ef3e5252442ff0a13122b3cfebc8579a00a396576e4965f74473f950b28c820d898d259cffd90b26ff80c40701527cf97f
+ data.tar.gz: 8fe1550960de8de921e21a45b170a0b1282fdd6d9c5f81f2285cfc5dc958c8d54dbfcb953a997d87891f2ab29f010d8b418987a55825f7d25bb124c717876fab
data/CHANGELOG.md CHANGED
@@ -1,11 +1,14 @@
  # Karafka Framework Changelog

- ## 2.5.0 (Unreleased)
+ ## 2.5.0 (2025-06-15)
  - **[Breaking]** Change how consistency of DLQ dispatches works in Pro (`partition_key` vs. direct partition id mapping).
  - **[Breaking]** Remove the headers `source_key` from the Pro DLQ dispatched messages as the original key is now fully preserved.
  - **[Breaking]** Use DLQ and Piping prefix `source_` instead of `original_` to align with naming convention of Kafka Streams and Apache Flink for future usage.
  - **[Breaking]** Rename scheduled jobs topics names in their config (Pro).
+ - **[Breaking]** Change K8s listener response from `204` to `200` and include JSON body with reasons.
+ - **[Breaking]** Replace admin config `max_attempts` with `max_retries_duration` and `retry_backoff` so admin retries are bounded by elapsed time instead of attempt count.
  - **[Feature]** Parallel Segments for concurrent processing of the same partition with more than partition count of processes (Pro).
+ - [Enhancement] Normalize topic + partition logs format.
  - [Enhancement] Support KIP-82 (header values of arrays).
  - [Enhancement] Enhance errors tracker with `#counts` that contains per-error class specific counters for granular flow handling.
  - [Enhancement] Provide explicit `Karafka::Admin.copy_consumer_group` API.
@@ -41,7 +44,9 @@
  - [Enhancement] Enrich scheduled messages state reporter with debug data.
  - [Enhancement] Introduce a new state called `stopped` to the scheduled messages.
  - [Enhancement] Do not overwrite the `key` in the Pro DLQ dispatched messages for routing reasons.
- - [Enhancement] Introduce `errors_tracker.trace_id` for distributed error details correlation with the Web UI.
+ - [Enhancement] Introduce `errors_tracker.trace_id` for distributed error details correlation with the Web UI.
+ - [Enhancement] Improve contracts validations reporting.
+ - [Enhancement] Optimize topic creation and repartitioning admin operations for topics with hundreds of partitions.
  - [Refactor] Introduce a `bin/verify_kafka_warnings` script to clean Kafka from temporary test-suite topics.
  - [Refactor] Introduce a `bin/verify_topics_naming` script to ensure proper test topics naming convention.
  - [Refactor] Make sure all temporary topics have a `it-` prefix in their name.
@@ -66,6 +71,8 @@
  - [Fix] Scheduled Messages re-seek moves to `latest` on inheritance of initial offset when `0` offset is compacted.
  - [Fix] Seek to `:latest` without `topic_partition_position` (-1) will not seek at all.
  - [Fix] Extremely high turn over of scheduled messages can cause them not to reach EOF/Loaded state.
+ - [Fix] Fix incorrectly passed `max_wait_time` to rdkafka (ms instead of seconds) causing too long wait.
+ - [Fix] Remove aggressive requerying of the Kafka cluster on topic creation/removal/altering.
  - [Change] Move to trusted-publishers and remove signing since no longer needed.

  ## 2.4.18 (2025-04-09)
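The breaking admin change above swaps the attempt counter for time-based settings (the corresponding code appears in the `karafka/admin.rb` and `karafka/setup/config.rb` hunks further down). A minimal migration sketch, assuming the standard `Karafka::App.setup` flow; the broker address and values are illustrative, not recommendations:

```ruby
# Hedged sketch: moving from the removed `max_attempts` to the new
# time-based admin retry settings. All values below are illustrative.
Karafka::App.setup do |config|
  config.kafka = { 'bootstrap.servers': '127.0.0.1:9092' }

  # Previously: config.admin.max_attempts = 60
  # Now the retry budget is expressed as total wait time in milliseconds
  config.admin.max_retries_duration = 60_000
  # Backoff between visibility re-checks, also in milliseconds (>= 100)
  config.admin.retry_backoff = 500
end
```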
data/Gemfile CHANGED
@@ -18,7 +18,7 @@ end
  group :integrations do
  gem 'activejob', require: false
  gem 'karafka-testing', '>= 2.5.0', require: false
- gem 'karafka-web', '>= 0.11.0.beta1', require: false
+ gem 'karafka-web', '>= 0.11.0.rc2', require: false
  end

  group :test do
data/Gemfile.lock CHANGED
@@ -1,9 +1,9 @@
  PATH
  remote: .
  specs:
- karafka (2.5.0.rc1)
+ karafka (2.5.0)
  base64 (~> 0.2)
- karafka-core (>= 2.5.0, < 2.6.0)
+ karafka-core (>= 2.5.2, < 2.6.0)
  karafka-rdkafka (>= 0.19.5)
  waterdrop (>= 2.8.3, < 3.0.0)
  zeitwerk (~> 2.3)
@@ -27,9 +27,9 @@ GEM
  securerandom (>= 0.3)
  tzinfo (~> 2.0, >= 2.0.5)
  uri (>= 0.13.1)
- base64 (0.2.0)
+ base64 (0.3.0)
  benchmark (0.4.1)
- bigdecimal (3.1.9)
+ bigdecimal (3.2.2)
  byebug (12.0.0)
  concurrent-ruby (1.3.5)
  connection_pool (2.5.3)
@@ -39,7 +39,7 @@ GEM
  erubi (1.13.1)
  et-orbi (1.2.11)
  tzinfo
- factory_bot (6.5.2)
+ factory_bot (6.5.4)
  activesupport (>= 6.1.0)
  ffi (1.17.2)
  ffi (1.17.2-aarch64-linux-gnu)
@@ -59,7 +59,7 @@ GEM
  activesupport (>= 6.1)
  i18n (1.14.7)
  concurrent-ruby (~> 1.0)
- karafka-core (2.5.1)
+ karafka-core (2.5.2)
  karafka-rdkafka (>= 0.19.2, < 0.21.0)
  logger (>= 1.6.0)
  karafka-rdkafka (0.19.5)
@@ -69,9 +69,9 @@ GEM
  karafka-testing (2.5.1)
  karafka (>= 2.5.0.beta1, < 2.6.0)
  waterdrop (>= 2.8.0)
- karafka-web (0.11.0.beta3)
+ karafka-web (0.11.0)
  erubi (~> 1.4)
- karafka (>= 2.5.0.beta1, < 2.6.0)
+ karafka (>= 2.5.0.rc2, < 2.6.0)
  karafka-core (>= 2.5.0, < 2.6.0)
  roda (~> 3.68, >= 3.69)
  tilt (~> 2.0)
@@ -80,9 +80,9 @@ GEM
  minitest (5.25.5)
  ostruct (0.6.1)
  raabro (1.4.0)
- rack (3.1.15)
+ rack (3.1.16)
  rake (13.3.0)
- roda (3.92.0)
+ roda (3.93.0)
  rack
  rspec (3.13.1)
  rspec-core (~> 3.13.0)
@@ -113,7 +113,7 @@ GEM
  karafka-core (>= 2.4.9, < 3.0.0)
  karafka-rdkafka (>= 0.19.2)
  zeitwerk (~> 2.3)
- zeitwerk (2.6.18)
+ zeitwerk (2.7.3)

  PLATFORMS
  aarch64-linux-gnu
@@ -135,7 +135,7 @@ DEPENDENCIES
  fugit
  karafka!
  karafka-testing (>= 2.5.0)
- karafka-web (>= 0.11.0.beta1)
+ karafka-web (>= 0.11.0.rc2)
  ostruct
  rspec
  simplecov
data/bin/integrations CHANGED
@@ -45,6 +45,7 @@ class Scenario
  'shutdown/on_hanging_on_shutdown_job_and_a_shutdown_spec.rb' => [2].freeze,
  'shutdown/on_hanging_listener_and_shutdown_spec.rb' => [2].freeze,
  'swarm/forceful_shutdown_of_hanging_spec.rb' => [2].freeze,
+ 'swarm/with_blocking_at_exit_spec.rb' => [2].freeze,
  'instrumentation/post_errors_instrumentation_error_spec.rb' => [1].freeze,
  'cli/declaratives/delete/existing_with_exit_code_spec.rb' => [2].freeze,
  'cli/declaratives/create/new_with_exit_code_spec.rb' => [2].freeze,
@@ -84,7 +84,8 @@ en:
  admin.kafka_format: needs to be a hash
  admin.group_id_format: 'needs to be a string with a Kafka accepted format'
  admin.max_wait_time_format: 'needs to be an integer bigger than 0'
- admin.max_attempts_format: 'needs to be an integer bigger than 0'
+ admin.retry_backoff_format: 'needs to be an integer bigger than 100'
+ admin.max_retries_duration_format: 'needs to be an integer bigger than 1000'

  swarm.nodes_format: 'needs to be an integer bigger than 0'
  swarm.node_format: needs to be false or node instance
data/docker-compose.yml CHANGED
@@ -1,7 +1,7 @@
  services:
  kafka:
  container_name: kafka
- image: confluentinc/cp-kafka:7.9.1
+ image: confluentinc/cp-kafka:8.0.0

  ports:
  - 9092:9092
data/karafka.gemspec CHANGED
@@ -22,7 +22,7 @@ Gem::Specification.new do |spec|
  DESC

  spec.add_dependency 'base64', '~> 0.2'
- spec.add_dependency 'karafka-core', '>= 2.5.0', '< 2.6.0'
+ spec.add_dependency 'karafka-core', '>= 2.5.2', '< 2.6.0'
  spec.add_dependency 'karafka-rdkafka', '>= 0.19.5'
  spec.add_dependency 'waterdrop', '>= 2.8.3', '< 3.0.0'
  spec.add_dependency 'zeitwerk', '~> 2.3'
@@ -21,7 +21,10 @@ module Karafka

  # Make sure, that karafka options that someone wants to use are valid before assigning
  # them
- App.config.internal.active_job.job_options_contract.validate!(new_options)
+ App.config.internal.active_job.job_options_contract.validate!(
+ new_options,
+ scope: %w[active_job]
+ )

  # We need to modify this hash because otherwise we would modify parent hash.
  self._karafka_options = _karafka_options.dup
data/lib/karafka/admin.rb CHANGED
@@ -10,10 +10,13 @@ module Karafka
  # Cluster on which operations are performed can be changed via `admin.kafka` config, however
  # there is no multi-cluster runtime support.
  module Admin
+ extend Core::Helpers::Time
+
  extend Helpers::ConfigImporter.new(
  max_wait_time: %i[admin max_wait_time],
  poll_timeout: %i[admin poll_timeout],
- max_attempts: %i[admin max_attempts],
+ max_retries_duration: %i[admin max_retries_duration],
+ retry_backoff: %i[admin retry_backoff],
  group_id: %i[admin group_id],
  app_kafka: %i[kafka],
  admin_kafka: %i[admin kafka]
@@ -122,7 +125,7 @@ module Karafka
  handler = admin.create_topic(name, partitions, replication_factor, topic_config)

  with_re_wait(
- -> { handler.wait(max_wait_timeout: max_wait_time) },
+ -> { handler.wait(max_wait_timeout: max_wait_time_seconds) },
  -> { topics_names.include?(name) }
  )
  end
@@ -136,7 +139,7 @@ module Karafka
  handler = admin.delete_topic(name)

  with_re_wait(
- -> { handler.wait(max_wait_timeout: max_wait_time) },
+ -> { handler.wait(max_wait_timeout: max_wait_time_seconds) },
  -> { !topics_names.include?(name) }
  )
  end
@@ -151,7 +154,7 @@ module Karafka
  handler = admin.create_partitions(name, partitions)

  with_re_wait(
- -> { handler.wait(max_wait_timeout: max_wait_time) },
+ -> { handler.wait(max_wait_timeout: max_wait_time_seconds) },
  -> { topic_info(name).fetch(:partition_count) >= partitions }
  )
  end
@@ -362,7 +365,7 @@ module Karafka
  def delete_consumer_group(consumer_group_id)
  with_admin do |admin|
  handler = admin.delete_group(consumer_group_id)
- handler.wait(max_wait_timeout: max_wait_time)
+ handler.wait(max_wait_timeout: max_wait_time_seconds)
  end
  end

@@ -564,6 +567,12 @@ module Karafka

  private

+ # @return [Integer] number of seconds to wait. `rdkafka` requires this value
+ # (`max_wait_time`) to be provided in seconds while we define it in ms hence the conversion
+ def max_wait_time_seconds
+ max_wait_time / 1_000.0
+ end
+
  # Adds a new callback for given rdkafka instance for oauth token refresh (if needed)
  #
  # @param id [String, Symbol] unique (for the lifetime of instance) id that we use for
@@ -602,20 +611,23 @@ module Karafka
  # @param handler [Proc] the wait handler operation
  # @param breaker [Proc] extra condition upon timeout that indicates things were finished ok
  def with_re_wait(handler, breaker)
- attempt ||= 0
- attempt += 1
+ start_time = monotonic_now
+ # Convert milliseconds to seconds for sleep
+ sleep_time = retry_backoff / 1000.0

- handler.call
+ loop do
+ handler.call

- # If breaker does not operate, it means that the requested change was applied but is still
- # not visible and we need to wait
- raise(Errors::ResultNotVisibleError) unless breaker.call
- rescue Rdkafka::AbstractHandle::WaitTimeoutError, Errors::ResultNotVisibleError
- return if breaker.call
+ sleep(sleep_time)

- retry if attempt <= max_attempts
+ return if breaker.call
+ rescue Rdkafka::AbstractHandle::WaitTimeoutError
+ return if breaker.call

- raise
+ next if monotonic_now - start_time < max_retries_duration
+
+ raise(Errors::ResultNotVisibleError)
+ end
  end

  # @param type [Symbol] type of config we want
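The rewritten `with_re_wait` above keeps re-running the wait handler and re-checking result visibility until `max_retries_duration` elapses, instead of counting attempts. A hedged usage sketch of an admin call that goes through this path; the topic name and sizing are illustrative:

```ruby
# Hedged sketch: the public admin API is unchanged, only the waiting strategy
# behind it is now time-bound. 'events', 6 partitions and RF 1 are illustrative.
Karafka::Admin.create_topic('events', 6, 1)
# Returns once the topic is visible in cluster metadata, or raises
# Karafka::Errors::ResultNotVisibleError after roughly max_retries_duration ms.
```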
@@ -5,12 +5,13 @@ module Karafka
  # Base contract for all Karafka contracts
  class Base < ::Karafka::Core::Contractable::Contract
  # @param data [Hash] data for validation
+ # @param scope [Array<String>] nested scope if in use
  # @return [Boolean] true if all good
  # @raise [Errors::InvalidConfigurationError] invalid configuration error
  # @note We use contracts only in the config validation context, so no need to add support
  # for multiple error classes. It will be added when it will be needed.
- def validate!(data)
- super(data, Errors::InvalidConfigurationError)
+ def validate!(data, scope: [])
+ super(data, Errors::InvalidConfigurationError, scope: scope)
  end
  end
  end
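The new `scope:` keyword threads a location path into the validation call so failures can be reported against the configuration section they came from (this is the "Improve contracts validations reporting" changelog entry). A minimal sketch, assuming an already configured application and mirroring how the framework now calls its own contracts:

```ruby
# Hedged sketch of the new keyword: `scope:` only labels where the validated
# data lives, so InvalidConfigurationError reports can point at that section.
Karafka::Contracts::Config.new.validate!(
  Karafka::App.config.to_h,
  scope: %w[config]
)
```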
@@ -53,7 +53,8 @@ module Karafka
  required(:kafka) { |val| val.is_a?(Hash) }
  required(:group_id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
  required(:max_wait_time) { |val| val.is_a?(Integer) && val.positive? }
- required(:max_attempts) { |val| val.is_a?(Integer) && val.positive? }
+ required(:retry_backoff) { |val| val.is_a?(Integer) && val >= 100 }
+ required(:max_retries_duration) { |val| val.is_a?(Integer) && val >= 1_000 }

  # We validate internals just to be sure, that they are present and working
@@ -76,7 +76,7 @@ module Karafka
  consumer = job.executor.topic.consumer
  topic = job.executor.topic.name
  partition = job.executor.partition
- info "[#{job.id}] #{job_type} job for #{consumer} on #{topic}/#{partition} started"
+ info "[#{job.id}] #{job_type} job for #{consumer} on #{topic}-#{partition} started"
  end

  # Prints info about the fact that a given job has finished
@@ -91,7 +91,7 @@ module Karafka
  partition = job.executor.partition
  info <<~MSG.tr("\n", ' ').strip!
  [#{job.id}] #{job_type} job for #{consumer}
- on #{topic}/#{partition} finished in #{time} ms
+ on #{topic}-#{partition} finished in #{time} ms
  MSG
  end

@@ -108,7 +108,7 @@ module Karafka

  info <<~MSG.tr("\n", ' ').strip!
  [#{client.id}]
- Pausing on topic #{topic}/#{partition}
+ Pausing on topic #{topic}-#{partition}
  on #{offset ? "offset #{offset}" : 'the consecutive offset'}
  MSG
  end
@@ -122,7 +122,7 @@ module Karafka
  client = event[:caller]

  info <<~MSG.tr("\n", ' ').strip!
- [#{client.id}] Resuming on topic #{topic}/#{partition}
+ [#{client.id}] Resuming on topic #{topic}-#{partition}
  MSG
  end

@@ -138,7 +138,7 @@ module Karafka

  info <<~MSG.tr("\n", ' ').strip!
  [#{consumer.id}] Retrying of #{consumer.class} after #{timeout} ms
- on topic #{topic}/#{partition} from offset #{offset}
+ on topic #{topic}-#{partition} from offset #{offset}
  MSG
  end

@@ -153,7 +153,7 @@ module Karafka

  info <<~MSG.tr("\n", ' ').strip!
  [#{consumer.id}] Seeking from #{consumer.class}
- on topic #{topic}/#{partition} to offset #{seek_offset}
+ on topic #{topic}-#{partition} to offset #{seek_offset}
  MSG
  end

@@ -233,7 +233,7 @@ module Karafka
  info "#{group_prefix}: No partitions revoked"
  else
  revoked_partitions.each do |topic, partitions|
- info "#{group_prefix}: Partition(s) #{partitions.join(', ')} of #{topic} revoked"
+ info "#{group_prefix}: #{topic}-[#{partitions.join(',')}] revoked"
  end
  end
  end
@@ -251,7 +251,7 @@ module Karafka
  info "#{group_prefix}: No partitions assigned"
  else
  assigned_partitions.each do |topic, partitions|
- info "#{group_prefix}: Partition(s) #{partitions.join(', ')} of #{topic} assigned"
+ info "#{group_prefix}: #{topic}-[#{partitions.join(',')}] assigned"
  end
  end
  end
@@ -269,7 +269,7 @@ module Karafka

  info <<~MSG.tr("\n", ' ').strip!
  [#{consumer.id}] Dispatched message #{offset}
- from #{topic}/#{partition}
+ from #{topic}-#{partition}
  to DLQ topic: #{dlq_topic}
  MSG
  end
@@ -288,7 +288,7 @@ module Karafka
  info <<~MSG.tr("\n", ' ').strip!
  [#{consumer.id}] Throttled and will resume
  from message #{offset}
- on #{topic}/#{partition}
+ on #{topic}-#{partition}
  MSG
  end

@@ -303,7 +303,7 @@ module Karafka

  info <<~MSG.tr("\n", ' ').strip!
  [#{consumer.id}] Post-filtering seeking to message #{offset}
- on #{topic}/#{partition}
+ on #{topic}-#{partition}
  MSG
  end

@@ -8,11 +8,12 @@ module Karafka
  # Namespace for instrumentation related with Kubernetes
  module Kubernetes
  # Base Kubernetes Listener providing basic HTTP server capabilities to respond with health
+ # statuses
  class BaseListener
  include ::Karafka::Core::Helpers::Time

  # All good with Karafka
- OK_CODE = '204 No Content'
+ OK_CODE = '200 OK'

  # Some timeouts, fail
  FAIL_CODE = '500 Internal Server Error'
@@ -38,11 +39,15 @@ module Karafka

  # Responds to a HTTP request with the process liveness status
  def respond
+ body = JSON.generate(status_body)
+
  client = @server.accept
  client.gets
  client.print "HTTP/1.1 #{healthy? ? OK_CODE : FAIL_CODE}\r\n"
- client.print "Content-Type: text/plain\r\n"
+ client.print "Content-Type: application/json\r\n"
+ client.print "Content-Length: #{body.bytesize}\r\n"
  client.print "\r\n"
+ client.print body
  client.close

  true
@@ -50,6 +55,16 @@ module Karafka
  !@server.closed?
  end

+ # @return [Hash] hash that will be the response body
+ def status_body
+ {
+ status: healthy? ? 'healthy' : 'unhealthy',
+ timestamp: Time.now.to_i,
+ port: @port,
+ process_id: ::Process.pid
+ }
+ end
+
  # Starts background thread with micro-http monitoring
  def start
  @server = TCPServer.new(*[@hostname, @port].compact)
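With this change the probe endpoints answer `200 OK` plus a JSON body instead of an empty `204`. A hedged probe sketch; the port is whatever the listener was started with (3000 here is illustrative), and the `errors` key is contributed by the concrete listener subclasses shown below:

```ruby
# Hedged sketch: probing a locally running liveness listener and reading the
# new JSON payload. Host and port are illustrative, not defaults.
require 'net/http'
require 'json'

response = Net::HTTP.get_response(URI('http://127.0.0.1:3000/'))
payload = JSON.parse(response.body)

response.code      # => "200" when healthy, "500" otherwise
payload['status']  # => "healthy" or "unhealthy"
payload['errors']  # per-check flags merged in by the listener subclasses
```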
@@ -53,7 +53,7 @@ module Karafka
  consuming_ttl: 5 * 60 * 1_000,
  polling_ttl: 5 * 60 * 1_000
  )
- # If this is set to true, it indicates unrecoverable error like fencing
+ # If this is set to a symbol, it indicates unrecoverable error like fencing
  # While fencing can be partial (for one of the SGs), we still should consider this
  # as an undesired state for the whole process because it halts processing in a
  # non-recoverable manner forever
@@ -116,7 +116,7 @@ module Karafka
  # We mark as unrecoverable only on certain errors that will not be fixed by retrying
  return unless UNRECOVERABLE_RDKAFKA_ERRORS.include?(error.code)

- @unrecoverable = true
+ @unrecoverable = error.code
  end

  # Deregister the polling tracker for given listener
@@ -142,17 +142,29 @@ module Karafka
  # Did we exceed any of the ttls
  # @return [String] 204 string if ok, 500 otherwise
  def healthy?
- time = monotonic_now
-
  return false if @unrecoverable
- return false if @pollings.values.any? { |tick| (time - tick) > @polling_ttl }
- return false if @consumptions.values.any? { |tick| (time - tick) > @consuming_ttl }
+ return false if polling_ttl_exceeded?
+ return false if consuming_ttl_exceeded?

  true
  end

  private

+ # @return [Boolean] true if the consumer exceeded the polling ttl
+ def polling_ttl_exceeded?
+ time = monotonic_now
+
+ @pollings.values.any? { |tick| (time - tick) > @polling_ttl }
+ end
+
+ # @return [Boolean] true if the consumer exceeded the consuming ttl
+ def consuming_ttl_exceeded?
+ time = monotonic_now
+
+ @consumptions.values.any? { |tick| (time - tick) > @consuming_ttl }
+ end
+
  # Wraps the logic with a mutex
  # @param block [Proc] code we want to run in mutex
  def synchronize(&block)
@@ -191,6 +203,17 @@ module Karafka
  @consumptions.delete(thread_id)
  end
  end
+
+ # @return [Hash] response body status
+ def status_body
+ super.merge!(
+ errors: {
+ polling_ttl_exceeded: polling_ttl_exceeded?,
+ consumption_ttl_exceeded: consuming_ttl_exceeded?,
+ unrecoverable: @unrecoverable
+ }
+ )
+ end
  end
  end
  end
@@ -47,6 +47,15 @@ module Karafka
  def healthy?
  (monotonic_now - @controlling) < @controlling_ttl
  end
+
+ # @return [Hash] response body status
+ def status_body
+ super.merge!(
+ errors: {
+ controlling_ttl_exceeded: !healthy?
+ }
+ )
+ end
  end
  end
  end
@@ -22,7 +22,10 @@ module Karafka

  # @param config [Karafka::Core::Configurable::Node] root node config
  def post_setup(config)
- Encryption::Contracts::Config.new.validate!(config.to_h)
+ Encryption::Contracts::Config.new.validate!(
+ config.to_h,
+ scope: %w[config]
+ )

  # Don't inject extra components if encryption is not active
  return unless config.encryption.active
@@ -29,7 +29,10 @@ module Karafka
  @schedule.instance_exec(&block)

  @schedule.each do |task|
- Contracts::Task.new.validate!(task.to_h)
+ Contracts::Task.new.validate!(
+ task.to_h,
+ scope: ['recurring_tasks', task.id]
+ )
  end

  @schedule
@@ -59,7 +62,10 @@ module Karafka

  # @param config [Karafka::Core::Configurable::Node] root node config
  def post_setup(config)
- RecurringTasks::Contracts::Config.new.validate!(config.to_h)
+ RecurringTasks::Contracts::Config.new.validate!(
+ config.to_h,
+ scope: %w[config]
+ )

  # Published after task is successfully executed
  Karafka.monitor.notifications_bus.register_event('recurring_tasks.task.executed')
@@ -28,7 +28,8 @@ module Karafka
  # Validates that each node has at least one assignment.
  #
  # @param builder [Karafka::Routing::Builder]
- def validate!(builder)
+ # @param scope [Array<String>]
+ def validate!(builder, scope: [])
  nodes_setup = Hash.new do |h, node_id|
  h[node_id] = { active: false, node_id: node_id }
  end
@@ -49,7 +50,7 @@ module Karafka
  end

  nodes_setup.each_value do |details|
- super(details)
+ super(details, scope: scope)
  end
  end

@@ -17,7 +17,10 @@ module Karafka
  # @param config [Karafka::Core::Configurable::Node] app config
  def post_setup(config)
  config.monitor.subscribe('app.before_warmup') do
- Contracts::Routing.new.validate!(config.internal.routing.builder)
+ Contracts::Routing.new.validate!(
+ config.internal.routing.builder,
+ scope: %w[swarm]
+ )
  end
  end
  end
@@ -60,7 +60,11 @@ module Karafka
  # We need to ensure that the message we want to proxy is fully legit. Otherwise, since
  # we envelope details like target topic, we could end up having incorrect data to
  # schedule
- MSG_CONTRACT.validate!(message, WaterDrop::Errors::MessageInvalidError)
+ MSG_CONTRACT.validate!(
+ message,
+ WaterDrop::Errors::MessageInvalidError,
+ scope: %w[scheduled_messages message]
+ )

  headers = (message[:headers] || {}).merge(
  'schedule_schema_version' => ScheduledMessages::SCHEMA_VERSION,
@@ -166,9 +170,17 @@ module Karafka
  # complies with our requirements
  # @param proxy_message [Hash] our message envelope
  def validate!(proxy_message)
- POST_CONTRACT.validate!(proxy_message)
+ POST_CONTRACT.validate!(
+ proxy_message,
+ scope: %w[scheduled_messages message]
+ )
+
  # After proxy specific validations we also ensure, that the final form is correct
- MSG_CONTRACT.validate!(proxy_message, WaterDrop::Errors::MessageInvalidError)
+ MSG_CONTRACT.validate!(
+ proxy_message,
+ WaterDrop::Errors::MessageInvalidError,
+ scope: %w[scheduled_messages message]
+ )
  end
  end
  end
@@ -51,7 +51,10 @@ module Karafka

  # @param config [Karafka::Core::Configurable::Node] root node config
  def post_setup(config)
- RecurringTasks::Contracts::Config.new.validate!(config.to_h)
+ ScheduledMessages::Contracts::Config.new.validate!(
+ config.to_h,
+ scope: %w[config]
+ )
  end

  # Basically since we may have custom producers configured that are not the same as the
@@ -50,15 +50,24 @@ module Karafka

  # Ensures high-level routing details consistency
  # Contains checks that require knowledge about all the consumer groups to operate
- Contracts::Routing.new.validate!(map(&:to_h))
+ Contracts::Routing.new.validate!(
+ map(&:to_h),
+ scope: %w[routes]
+ )

  each do |consumer_group|
  # Validate consumer group settings
- Contracts::ConsumerGroup.new.validate!(consumer_group.to_h)
+ Contracts::ConsumerGroup.new.validate!(
+ consumer_group.to_h,
+ scope: ['routes', consumer_group.name]
+ )

  # and then its topics settings
  consumer_group.topics.each do |topic|
- Contracts::Topic.new.validate!(topic.to_h)
+ Contracts::Topic.new.validate!(
+ topic.to_h,
+ scope: ['routes', consumer_group.name, topic.name]
+ )
  end

  # Initialize subscription groups after all the routing is done
@@ -38,13 +38,19 @@ module Karafka

  each do |consumer_group|
  if scope::Contracts.const_defined?('ConsumerGroup', false)
- scope::Contracts::ConsumerGroup.new.validate!(consumer_group.to_h)
+ scope::Contracts::ConsumerGroup.new.validate!(
+ consumer_group.to_h,
+ scope: ['routes', consumer_group.name]
+ )
  end

  next unless scope::Contracts.const_defined?('Topic', false)

  consumer_group.topics.each do |topic|
- scope::Contracts::Topic.new.validate!(topic.to_h)
+ scope::Contracts::Topic.new.validate!(
+ topic.to_h,
+ scope: ['routes', consumer_group.name, topic.name]
+ )
  end
  end

@@ -51,7 +51,10 @@ module Karafka
  # embedded
  # We cannot validate this during the start because config needs to be populated and routes
  # need to be defined.
- cli_contract.validate!(activity_manager.to_h)
+ cli_contract.validate!(
+ activity_manager.to_h,
+ scope: %w[cli]
+ )

  # We clear as we do not want parent handlers in case of working from fork
  process.clear
@@ -131,11 +131,20 @@ module Karafka
  # option max_wait_time [Integer] We wait only for this amount of time before raising error
  # as we intercept this error and retry after checking that the operation was finished or
  # failed using external factor.
- setting :max_wait_time, default: 1_000
+ #
+ # For async this will finish immediately but for sync operations this will wait and we
+ # will get a confirmation. 60 seconds is ok for both cases as for async, the re-wait will
+ # kick in
+ setting :max_wait_time, default: 60 * 1_000
+
+ # How long should we wait on admin operation retrying before giving up and raising an
+ # error that result is not visible
+ setting :max_retries_duration, default: 60_000

- # How many times should be try. 1 000 ms x 60 => 60 seconds wait in total and then we give
- # up on pending operations
- setting :max_attempts, default: 60
+ # In case of fast-finished async work, this `retry_backoff` help us not re-query Kafka
+ # too fast after previous call to check the async operation results. Basically prevents
+ # us from spamming metadata requests to Kafka in a loop
+ setting :retry_backoff, default: 500

  # option poll_timeout [Integer] time in ms
  # How long should a poll wait before yielding on no results (rdkafka-ruby setting)
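Under these defaults the re-wait loop in `Karafka::Admin` sleeps `retry_backoff` milliseconds between visibility checks and gives up once `max_retries_duration` milliseconds have passed. Illustrative arithmetic only, ignoring the time spent in the wait handler itself:

```ruby
# Rough upper bound on visibility re-checks with the defaults from this diff
# (both values are in milliseconds).
max_retries_duration = 60_000
retry_backoff = 500

max_retries_duration / retry_backoff # => 120 re-checks at most before
                                     #    ResultNotVisibleError is raised
```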
@@ -352,7 +361,10 @@ module Karafka

  configure(&block)

- Contracts::Config.new.validate!(config.to_h)
+ Contracts::Config.new.validate!(
+ config.to_h,
+ scope: %w[config]
+ )

  configure_components

@@ -42,7 +42,10 @@ module Karafka
  # Creates needed number of forks, installs signals and starts supervision
  def run
  # Validate the CLI provided options the same way as we do for the regular server
- cli_contract.validate!(activity_manager.to_h)
+ cli_contract.validate!(
+ activity_manager.to_h,
+ scope: %w[swarm cli]
+ )

  # Close producer just in case. While it should not be used, we do not want even a
  # theoretical case since librdkafka is not thread-safe.
@@ -154,7 +157,7 @@ module Karafka
  # Run forceful kill
  manager.terminate
  # And wait until linux kills them
- # This prevents us from existing forcefully with any dead child process still existing
+ # This prevents us from exiting forcefully with any dead child process still existing
  # Since we have sent the `KILL` signal, it must die, so we can wait until all dead
  sleep(supervision_sleep) until manager.stopped?

@@ -3,5 +3,5 @@
  # Main module namespace
  module Karafka
  # Current Karafka version
- VERSION = '2.5.0.rc1'
+ VERSION = '2.5.0'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: karafka
  version: !ruby/object:Gem::Version
- version: 2.5.0.rc1
+ version: 2.5.0
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -29,7 +29,7 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.5.0
+ version: 2.5.2
  - - "<"
  - !ruby/object:Gem::Version
  version: 2.6.0
@@ -39,7 +39,7 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.5.0
+ version: 2.5.2
  - - "<"
  - !ruby/object:Gem::Version
  version: 2.6.0