gitlab-sidekiq-fetcher 0.5.0.pre.alpha → 0.6.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 7da36ba54ba6a0e97cef5da210f02bdde86dc06e22960d37573e51e20046dd40
-  data.tar.gz: bd3bf13edc789374109a2c18b6797b44c1440bc2665938c654e3a627bbc5bc23
+  metadata.gz: a71699d717aeb95cb406ed566e394a34e7978df8289db2fe89aff82046dd8e19
+  data.tar.gz: 4390a2a95507b8a6c5c08a9d3dac384f58f0d11311afd1da5973e4c61240120a
 SHA512:
-  metadata.gz: 852f01191a052384cfe4215766c6b27dc932ced77782a92c23f73aba091b520e83d293d57b407e45ed5d141661664a9f77b88211b50e9bed52cb15a011b0347a
-  data.tar.gz: 59ee3493c9ddcb3e145ce77259b85ed508f8e57ca97878c12f5427e25952d2a4418860cd1d90562d39aa693dd630359e88ac78fd882aaceb9c5c374b6ca7509a
+  metadata.gz: fa95de2f9b33f01b45c547b23bfd4728ab88ed304ecbeaa2416e03b1a6d6bff678f097b37ae85679ef69f520a37a6f7115207170284229df706172f60d103b19
+  data.tar.gz: d7981fd4afe0abf8454caffac4c1fa8e290dd8617ce28edf45b4a29a9deb9b3a9cd583f64038cc3808b814923ab74fc370d65d9a42257b4af9fa97ea48542959
data/.gitignore CHANGED
@@ -1,2 +1,3 @@
 *.gem
 coverage
+.DS_Store
@@ -25,7 +25,7 @@ rspec:
 .integration:
   stage: test
   script:
-    - cd tests/reliability_test
+    - cd tests/reliability
     - bundle exec ruby reliability_test.rb
   services:
     - redis:alpine
@@ -47,11 +47,19 @@ integration_basic:
   variables:
     JOB_FETCHER: basic
 
-retry_test:
+kill_interruption:
   stage: test
   script:
-    - cd tests/retry_test
-    - bundle exec ruby retry_test.rb
+    - cd tests/interruption
+    - bundle exec ruby test_kill_signal.rb
+  services:
+    - redis:alpine
+
+term_interruption:
+  stage: test
+  script:
+    - cd tests/interruption
+    - bundle exec ruby test_term_signal.rb
   services:
     - redis:alpine
 
data/Gemfile CHANGED
@@ -7,6 +7,6 @@ git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }
 group :test do
   gem "rspec", '~> 3'
   gem "pry"
-  gem "sidekiq", '~> 5.0'
+  gem "sidekiq", '~> 6.0.0'
   gem 'simplecov', require: false
 end
@@ -2,7 +2,7 @@ GEM
   remote: https://rubygems.org/
   specs:
     coderay (1.1.2)
-    connection_pool (2.2.2)
+    connection_pool (2.2.3)
     diff-lcs (1.3)
     docile (1.3.1)
     json (2.1.0)
@@ -10,10 +10,10 @@ GEM
     pry (0.11.3)
       coderay (~> 1.1.0)
      method_source (~> 0.9.0)
-    rack (2.0.5)
-    rack-protection (2.0.4)
+    rack (2.2.3)
+    rack-protection (2.0.8.1)
       rack
-    redis (4.0.2)
+    redis (4.2.1)
     rspec (3.8.0)
       rspec-core (~> 3.8.0)
       rspec-expectations (~> 3.8.0)
@@ -27,10 +27,11 @@ GEM
       diff-lcs (>= 1.2.0, < 2.0)
       rspec-support (~> 3.8.0)
     rspec-support (3.8.0)
-    sidekiq (5.2.2)
-      connection_pool (~> 2.2, >= 2.2.2)
-      rack-protection (>= 1.5.0)
-      redis (>= 3.3.5, < 5)
+    sidekiq (6.0.7)
+      connection_pool (>= 2.2.2)
+      rack (~> 2.0)
+      rack-protection (>= 2.0.0)
+      redis (>= 4.1.0)
     simplecov (0.16.1)
       docile (~> 1.1)
       json (>= 1.8, < 3)
@@ -43,8 +44,8 @@ PLATFORMS
 DEPENDENCIES
   pry
   rspec (~> 3)
-  sidekiq (~> 5.0)
+  sidekiq (~> 6.0.0)
   simplecov
 
 BUNDLED WITH
-   1.17.1
+   1.17.2
data/README.md CHANGED
@@ -10,6 +10,17 @@ There are two strategies implemented: [Reliable fetch](http://redis.io/commands/
 semi-reliable fetch that uses regular `brpop` and `lpush` to pick the job and put it to working queue. The main benefit of "Reliable" strategy is that `rpoplpush` is atomic, eliminating a race condition in which jobs can be lost.
 However, it comes at a cost because `rpoplpush` can't watch multiple lists at the same time so we need to iterate over the entire queue list which significantly increases pressure on Redis when there are more than a few queues. The "semi-reliable" strategy is much more reliable than the default Sidekiq fetcher, though. Compared to the reliable fetch strategy, it does not increase pressure on Redis significantly.
 
+### Interruption handling
+
+Sidekiq expects every job to report success or failure. On failure, Sidekiq stores a `retry_count` counter
+in the job and keeps re-running it until the counter reaches the maximum allowed value. When a job never
+gets a chance to finish its work (to report success or failure), for example because it was killed forcibly or was requeued after receiving a TERM signal, the standard retry mechanism does not kick in and the job would be retried indefinitely. This is why Reliable Fetcher maintains a special counter, `interrupted_count`,
+which is used to limit the number of such retries. In both cases, Reliable Fetcher increments `interrupted_count` and rejects the job from running again once the counter exceeds `max_retries_after_interruption` (default: 3).
+Such a job is put into the `interrupted` queue. This queue behaves much like the Sidekiq Dead queue, storing only a limited number of jobs for a limited time. As with the Dead queue, all the limits are configurable via the `interrupted_max_jobs` (default: 10_000) and `interrupted_timeout_in_seconds` (default: 3 months) Sidekiq option keys.
+
+You can also disable the special handling of interrupted jobs by setting `max_retries_after_interruption` to `-1`.
+In that case, interrupted jobs run without any limits from Reliable Fetcher and are never put into the interrupted queue.
+
 
 ## Installation
 
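The option keys above are plain Sidekiq options, so they can be set like any other; a minimal configuration sketch (the worker class name is illustrative, not part of the gem):

```ruby
require 'sidekiq'

# Global defaults; keys match those described above.
Sidekiq.configure_server do |config|
  config.options[:max_retries_after_interruption] = 3
  config.options[:interrupted_max_jobs] = 10_000
  config.options[:interrupted_timeout_in_seconds] = 90 * 24 * 60 * 60 # 3 months
end

# Per-worker override; -1 disables interruption handling for this worker.
class ImportWorker
  include Sidekiq::Worker
  sidekiq_options max_retries_after_interruption: -1
end
```

The worker-level `sidekiq_options` value takes precedence over the global option, falling back to the built-in default of 3.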
@@ -1,14 +1,14 @@
 Gem::Specification.new do |s|
-  s.name = 'gitlab-sidekiq-fetcher'
-  s.version = '0.5.0-alpha'
-  s.authors = ['TEA', 'GitLab']
-  s.email = 'valery@gitlab.com'
-  s.license = 'LGPL-3.0'
-  s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
-  s.summary = 'Reliable fetch extension for Sidekiq'
-  s.description = 'Redis reliable queue pattern implemented in Sidekiq'
+  s.name = 'gitlab-sidekiq-fetcher'
+  s.version = '0.6.1'
+  s.authors = ['TEA', 'GitLab']
+  s.email = 'valery@gitlab.com'
+  s.license = 'LGPL-3.0'
+  s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
+  s.summary = 'Reliable fetch extension for Sidekiq'
+  s.description = 'Redis reliable queue pattern implemented in Sidekiq'
   s.require_paths = ['lib']
-  s.files = `git ls-files`.split($\)
-  s.test_files = []
-  s.add_dependency 'sidekiq', '~> 5'
+  s.files = `git ls-files`.split($\)
+  s.test_files = []
+  s.add_dependency 'sidekiq', '>= 5', '< 6.1'
 end
@@ -1,4 +1,5 @@
 require 'sidekiq'
+require 'sidekiq/api'
 
 require_relative 'sidekiq/base_reliable_fetch'
 require_relative 'sidekiq/reliable_fetch'
@@ -1,6 +1,6 @@
 # frozen_string_literal: true
 
-require 'sidekiq/job_retry'
+require_relative 'interrupted_set'
 
 module Sidekiq
   class BaseReliableFetch
@@ -18,6 +18,9 @@ module Sidekiq
     # Defines the COUNT parameter that will be passed to Redis SCAN command
     SCAN_COUNT = 1000
 
+    # How many times a job can be interrupted
+    DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3
+
     UnitOfWork = Struct.new(:queue, :job) do
       def acknowledge
         Sidekiq.redis { |conn| conn.lrem(Sidekiq::BaseReliableFetch.working_queue_name(queue), 1, job) }
@@ -84,136 +87,145 @@ module Sidekiq
     def self.bulk_requeue(inprogress, _options)
       return if inprogress.empty?
 
-      Sidekiq.logger.debug('Re-queueing terminated jobs')
-
       Sidekiq.redis do |conn|
         inprogress.each do |unit_of_work|
           conn.multi do |multi|
-            multi.lpush(unit_of_work.queue, unit_of_work.job)
+            preprocess_interrupted_job(unit_of_work.job, unit_of_work.queue, multi)
+
             multi.lrem(working_queue_name(unit_of_work.queue), 1, unit_of_work.job)
           end
         end
       end
-
-      Sidekiq.logger.info("Pushed #{inprogress.size} jobs back to Redis")
     rescue => e
       Sidekiq.logger.warn("Failed to requeue #{inprogress.size} jobs: #{e.message}")
     end
 
-    def self.heartbeat_key(hostname, pid)
-      "reliable-fetcher-heartbeat-#{hostname}-#{pid}"
-    end
+    def self.clean_working_queue!(working_queue)
+      original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
 
-    def self.working_queue_name(queue)
-      "#{WORKING_QUEUE_PREFIX}:#{queue}:#{hostname}:#{pid}"
+      Sidekiq.redis do |conn|
+        while job = conn.rpop(working_queue)
+          preprocess_interrupted_job(job, original_queue)
+        end
+      end
     end
 
-    attr_reader :cleanup_interval, :last_try_to_take_lease_at, :lease_interval,
-                :queues, :use_semi_reliable_fetch,
-                :strictly_ordered_queues
+    def self.preprocess_interrupted_job(job, queue, conn = nil)
+      msg = Sidekiq.load_json(job)
+      msg['interrupted_count'] = msg['interrupted_count'].to_i + 1
 
-    def initialize(options)
-      @cleanup_interval = options.fetch(:cleanup_interval, DEFAULT_CLEANUP_INTERVAL)
-      @lease_interval = options.fetch(:lease_interval, DEFAULT_LEASE_INTERVAL)
-      @last_try_to_take_lease_at = 0
-      @strictly_ordered_queues = !!options[:strict]
-      @queues = options[:queues].map { |q| "queue:#{q}" }
+      if interruption_exhausted?(msg)
+        send_to_quarantine(msg, conn)
+      else
+        requeue_job(queue, msg, conn)
+      end
     end
 
-    def retrieve_work
-      clean_working_queues! if take_lease
-
-      retrieve_unit_of_work
-    end
+    # Detect "old" jobs and requeue them because the worker they were assigned
+    # to probably failed miserably.
+    def self.clean_working_queues!
+      Sidekiq.logger.info('Cleaning working queues')
 
-    def retrieve_unit_of_work
-      raise NotImplementedError,
-            "#{self.class} does not implement #{__method__}"
-    end
+      Sidekiq.redis do |conn|
+        conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
+          # Example: "working:name_of_the_job:queue:{hostname}:{PID}"
+          hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
 
-    private
+          continue if hostname.nil? || pid.nil?
 
-    def clean_working_queue!(working_queue)
-      original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
+          clean_working_queue!(key) if worker_dead?(hostname, pid, conn)
+        end
+      end
+    end
 
-      Sidekiq.redis do |conn|
-        count = 0
+    def self.worker_dead?(hostname, pid, conn)
+      !conn.get(heartbeat_key(hostname, pid))
+    end
 
-        while job = conn.rpop(working_queue)
-          msg = begin
-                  Sidekiq.load_json(job)
-                rescue => e
-                  Sidekiq.logger.info("Skipped job: #{job} as we couldn't parse it")
-                  next
-                end
+    def self.heartbeat_key(hostname, pid)
+      "reliable-fetcher-heartbeat-#{hostname}-#{pid}"
+    end
 
-          msg['retry_count'] = msg['retry_count'].to_i + 1
+    def self.working_queue_name(queue)
+      "#{WORKING_QUEUE_PREFIX}:#{queue}:#{hostname}:#{pid}"
+    end
 
-          if retries_exhausted?(msg)
-            send_to_morgue(msg)
-          else
-            job = Sidekiq.dump_json(msg)
+    def self.interruption_exhausted?(msg)
+      return false if max_retries_after_interruption(msg['class']) < 0
 
-            conn.lpush(original_queue, job)
+      msg['interrupted_count'].to_i >= max_retries_after_interruption(msg['class'])
+    end
 
-            count += 1
-          end
-        end
+    def self.max_retries_after_interruption(worker_class)
+      max_retries_after_interruption = nil
 
-        Sidekiq.logger.info("Requeued #{count} dead jobs to #{original_queue}")
+      max_retries_after_interruption ||= begin
+        Object.const_get(worker_class).sidekiq_options[:max_retries_after_interruption]
+      rescue NameError
       end
-    end
 
-    def retries_exhausted?(msg)
-      max_retries_default = Sidekiq.options.fetch(:max_retries, Sidekiq::JobRetry::DEFAULT_MAX_RETRY_ATTEMPTS)
+      max_retries_after_interruption ||= Sidekiq.options[:max_retries_after_interruption]
+      max_retries_after_interruption ||= DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION
+      max_retries_after_interruption
+    end
 
-      max_retry_attempts = retry_attempts_from(msg['retry'], max_retries_default)
+    def self.send_to_quarantine(msg, multi_connection = nil)
+      Sidekiq.logger.warn(
+        class: msg['class'],
+        jid: msg['jid'],
+        message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']} to interrupted queue)
+      )
 
-      msg['retry_count'] >= max_retry_attempts
+      job = Sidekiq.dump_json(msg)
+      Sidekiq::InterruptedSet.new.put(job, connection: multi_connection)
     end
 
-    def retry_attempts_from(msg_retry, default)
-      if msg_retry.is_a?(Integer)
-        msg_retry
-      else
-        default
+    # If you want this method to run in the scope of a multi connection,
+    # you need to pass it in
+    def self.requeue_job(queue, msg, conn)
+      with_connection(conn) do |conn|
+        conn.lpush(queue, Sidekiq.dump_json(msg))
       end
-    end
 
-    def send_to_morgue(msg)
-      Sidekiq.logger.warn(
-        class: msg['class'],
+      Sidekiq.logger.info(
+        message: "Pushed job #{msg['jid']} back to queue #{queue}",
         jid: msg['jid'],
-        message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']})
+        queue: queue
       )
+    end
+
+    # Yields the block with an existing connection or creates a new one
+    def self.with_connection(conn, &block)
+      return yield(conn) if conn
 
-      payload = Sidekiq.dump_json(msg)
-      Sidekiq::DeadSet.new.kill(payload, notify_failure: false)
+      Sidekiq.redis { |conn| yield(conn) }
     end
 
-    # Detect "old" jobs and requeue them because the worker they were assigned
-    # to probably failed miserably.
-    def clean_working_queues!
-      Sidekiq.logger.info("Cleaning working queues")
+    attr_reader :cleanup_interval, :last_try_to_take_lease_at, :lease_interval,
+                :queues, :use_semi_reliable_fetch,
+                :strictly_ordered_queues
 
-      Sidekiq.redis do |conn|
-        conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
-          # Example: "working:name_of_the_job:queue:{hostname}:{PID}"
-          hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
+    def initialize(options)
+      @cleanup_interval = options.fetch(:cleanup_interval, DEFAULT_CLEANUP_INTERVAL)
+      @lease_interval = options.fetch(:lease_interval, DEFAULT_LEASE_INTERVAL)
+      @last_try_to_take_lease_at = 0
+      @strictly_ordered_queues = !!options[:strict]
+      @queues = options[:queues].map { |q| "queue:#{q}" }
+    end
 
-          continue if hostname.nil? || pid.nil?
+    def retrieve_work
+      self.class.clean_working_queues! if take_lease
 
-          clean_working_queue!(key) if worker_dead?(hostname, pid)
-        end
-      end
+      retrieve_unit_of_work
     end
 
-    def worker_dead?(hostname, pid)
-      Sidekiq.redis do |conn|
-        !conn.get(self.class.heartbeat_key(hostname, pid))
-      end
+    def retrieve_unit_of_work
+      raise NotImplementedError,
+            "#{self.class} does not implement #{__method__}"
     end
 
+    private
+
     def take_lease
       return unless allowed_to_take_a_lease?
 
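The requeue-or-quarantine decision made by `preprocess_interrupted_job` and `interruption_exhausted?` reduces to a counter check. A dependency-free sketch of that logic (the method name `handle_interruption` is illustrative, not part of the gem's API):

```ruby
DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3

# Returns :requeue or :quarantine for a decoded job hash, mirroring the
# interrupted_count bookkeeping above. A negative max_retries (e.g. -1)
# disables quarantining entirely, so the job is always requeued.
def handle_interruption(msg, max_retries: DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION)
  msg['interrupted_count'] = msg['interrupted_count'].to_i + 1

  return :requeue if max_retries < 0

  msg['interrupted_count'] >= max_retries ? :quarantine : :requeue
end
```

Note that the counter is incremented before the comparison, so a job interrupted for the third time (with the default limit of 3) is already quarantined.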
@@ -0,0 +1,47 @@
+require 'sidekiq/api'
+
+module Sidekiq
+  class InterruptedSet < ::Sidekiq::JobSet
+    DEFAULT_MAX_CAPACITY = 10_000
+    DEFAULT_MAX_TIMEOUT = 90 * 24 * 60 * 60 # 3 months
+
+    def initialize
+      super "interrupted"
+    end
+
+    def put(message, opts = {})
+      now = Time.now.to_f
+
+      with_multi_connection(opts[:connection]) do |conn|
+        conn.zadd(name, now.to_s, message)
+        conn.zremrangebyscore(name, '-inf', now - self.class.timeout)
+        conn.zremrangebyrank(name, 0, - self.class.max_jobs)
+      end
+
+      true
+    end
+
+    # Yields the block inside an existing multi connection or creates a new one
+    def with_multi_connection(conn, &block)
+      return yield(conn) if conn
+
+      Sidekiq.redis do |c|
+        c.multi do |multi|
+          yield(multi)
+        end
+      end
+    end
+
+    def retry_all
+      each(&:retry) while size > 0
+    end
+
+    def self.max_jobs
+      Sidekiq.options[:interrupted_max_jobs] || DEFAULT_MAX_CAPACITY
+    end
+
+    def self.timeout
+      Sidekiq.options[:interrupted_timeout_in_seconds] || DEFAULT_MAX_TIMEOUT
+    end
+  end
+end
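`InterruptedSet#put` keeps the sorted set bounded in both age and size by trimming on every insert with `zremrangebyscore` and `zremrangebyrank`. The same trimming can be sketched without Redis, using a plain array of `[score, payload]` pairs as a stand-in (the method name `put_and_trim` is illustrative only):

```ruby
# Redis-free model of the trim performed on every insert: drop entries older
# than `timeout` seconds, then cap the number of entries.
def put_and_trim(set, payload, now:, timeout:, max_jobs:)
  set << [now, payload]
  set.sort_by! { |score, _| score }

  # zremrangebyscore name, '-inf', now - timeout  => drop expired entries
  set.reject! { |score, _| score <= now - timeout }

  # zremrangebyrank name, 0, -max_jobs  => remove the oldest entries on overflow
  overflow = set.size - max_jobs + 1
  set.shift(overflow) if overflow > 0

  set
end
```

Because the rank trim removes ranks `0` through `-max_jobs` inclusive, the set ends up holding at most `max_jobs - 1` entries after an over-capacity insert, which mirrors how Redis negative ranks behave here.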
@@ -5,10 +5,11 @@ require 'sidekiq/reliable_fetch'
 require 'sidekiq/semi_reliable_fetch'
 
 describe Sidekiq::BaseReliableFetch do
+  let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
+
   before { Sidekiq.redis(&:flushdb) }
 
   describe 'UnitOfWork' do
-    let(:job) { Sidekiq.dump_json({ class: 'Bob', args: [1, 2, 'foo'] }) }
     let(:fetcher) { Sidekiq::ReliableFetch.new(queues: ['foo']) }
 
     describe '#requeue' do
@@ -39,24 +40,47 @@ describe Sidekiq::BaseReliableFetch do
   end
 
   describe '.bulk_requeue' do
+    let!(:queue1) { Sidekiq::Queue.new('foo') }
+    let!(:queue2) { Sidekiq::Queue.new('bar') }
+
     it 'requeues the bulk' do
-      queue1 = Sidekiq::Queue.new('foo')
-      queue2 = Sidekiq::Queue.new('bar')
+      uow = described_class::UnitOfWork
+      jobs = [ uow.new('queue:foo', job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+      described_class.bulk_requeue(jobs, queues: [])
 
-      expect(queue1.size).to eq 0
-      expect(queue2.size).to eq 0
+      expect(queue1.size).to eq 2
+      expect(queue2.size).to eq 1
+    end
 
+    it 'puts jobs into interrupted queue' do
       uow = described_class::UnitOfWork
-      jobs = [ uow.new('queue:foo', 'bob'), uow.new('queue:foo', 'bar'), uow.new('queue:bar', 'widget') ]
+      interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
+      jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+      described_class.bulk_requeue(jobs, queues: [])
+
+      expect(queue1.size).to eq 1
+      expect(queue2.size).to eq 1
+      expect(Sidekiq::InterruptedSet.new.size).to eq 1
+    end
+
+    it 'does not put jobs into interrupted queue if it is disabled' do
+      Sidekiq.options[:max_retries_after_interruption] = -1
+
+      uow = described_class::UnitOfWork
+      interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
+      jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
       described_class.bulk_requeue(jobs, queues: [])
 
       expect(queue1.size).to eq 2
       expect(queue2.size).to eq 1
+      expect(Sidekiq::InterruptedSet.new.size).to eq 0
+
+      Sidekiq.options[:max_retries_after_interruption] = 3
     end
   end
 
   it 'sets heartbeat' do
-    config = double(:sidekiq_config, options: {})
+    config = double(:sidekiq_config, options: { queues: [] })
 
     heartbeat_thread = described_class.setup_reliable_fetch!(config)
 
@@ -4,7 +4,7 @@ shared_examples 'a Sidekiq fetcher' do
   before { Sidekiq.redis(&:flushdb) }
 
   describe '#retrieve_work' do
-    let(:job) { Sidekiq.dump_json({ class: 'Bob', args: [1, 2, 'foo'] }) }
+    let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
     let(:fetcher) { described_class.new(queues: ['assigned']) }
 
     it 'retrieves the job and puts it to working queue' do
@@ -24,13 +24,13 @@ shared_examples 'a Sidekiq fetcher' do
       expect(fetcher.retrieve_work).to be_nil
     end
 
-    it 'requeues jobs from dead working queue with incremented retry_count' do
+    it 'requeues jobs from dead working queue with incremented interrupted_count' do
       Sidekiq.redis do |conn|
        conn.rpush(other_process_working_queue_name('assigned'), job)
       end
 
       expected_job = Sidekiq.load_json(job)
-      expected_job['retry_count'] = 1
+      expected_job['interrupted_count'] = 1
       expected_job = Sidekiq.dump_json(expected_job)
 
       uow = fetcher.retrieve_work
@@ -61,8 +61,7 @@ shared_examples 'a Sidekiq fetcher' do
     it 'does not clean up orphaned jobs more than once per cleanup interval' do
       Sidekiq.redis = Sidekiq::RedisConnection.create(url: REDIS_URL, size: 10)
 
-      expect_any_instance_of(described_class)
-        .to receive(:clean_working_queues!).once
+      expect(described_class).to receive(:clean_working_queues!).once
 
       threads = 10.times.map do
         Thread.new do
@@ -18,15 +18,20 @@ You need to have redis server running on default HTTP port `6379`. To use other
 This tool spawns a configured number of Sidekiq workers and, when the number of processed jobs reaches about half of the original
 count, it kills all the workers with `kill -9` and then spawns new workers again until all the jobs are processed. To track the process and counters we use Redis keys/counters.
 
-# How to run retry tests
+# How to run interruption tests
 
 ```
-cd retry_test
-bundle exec ruby retry_test.rb
+cd tests/interruption
+
+# Verify "KILL" signal
+bundle exec ruby test_kill_signal.rb
+
+# Verify "TERM" signal
+bundle exec ruby test_term_signal.rb
 ```
 
 It requires Redis to be running on 6379 port.
 
 ## How it works
 
-It spawns Sidekiq workers then creates a job that will kill itself after a moment. The reliable fetcher will bring it back. The purpose is to verify that job is run no more then `retry` parameter says even when job was killed.
+It spawns Sidekiq workers, then creates a job that will kill itself after a moment. The reliable fetcher will bring it back. The purpose is to verify that the job is run no more than the allowed number of times.
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+require 'sidekiq'
+require_relative 'config'
+require_relative '../support/utils'
+
+EXPECTED_NUM_TIMES_BEEN_RUN = 3
+NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
+
+Sidekiq.redis(&:flushdb)
+
+pids = spawn_workers(NUM_WORKERS)
+
+RetryTestWorker.perform_async
+
+sleep 300
+
+Sidekiq.redis do |redis|
+  times_has_been_run = redis.get('times_has_been_run').to_i
+  assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
+end
+
+assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
+
+stop_workers(pids)
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+require 'sidekiq'
+require_relative 'config'
+require_relative '../support/utils'
+
+EXPECTED_NUM_TIMES_BEEN_RUN = 3
+NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
+
+Sidekiq.redis(&:flushdb)
+
+pids = spawn_workers(NUM_WORKERS)
+
+RetryTestWorker.perform_async('TERM', 60)
+
+sleep 300
+
+Sidekiq.redis do |redis|
+  times_has_been_run = redis.get('times_has_been_run').to_i
+  assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
+end
+
+assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
+
+stop_workers(pids)
@@ -0,0 +1,15 @@
+# frozen_string_literal: true
+
+class RetryTestWorker
+  include Sidekiq::Worker
+
+  def perform(signal = 'KILL', wait_seconds = 1)
+    Sidekiq.redis do |redis|
+      redis.incr('times_has_been_run')
+    end
+
+    Process.kill(signal, Process.pid)
+
+    sleep wait_seconds
+  end
+end
@@ -0,0 +1,14 @@
+# frozen_string_literal: true
+
+class ReliabilityTestWorker
+  include Sidekiq::Worker
+
+  def perform
+    # To mimic a long-running job and to increase the probability of losing the job
+    sleep 1
+
+    Sidekiq.redis do |redis|
+      redis.lpush(REDIS_FINISHED_LIST, jid)
+    end
+  end
+end
@@ -0,0 +1,26 @@
+def assert(text, actual, expected)
+  if actual == expected
+    puts "#{text}: #{actual} (Success)"
+  else
+    puts "#{text}: #{actual} (Failed). Expected: #{expected}"
+    exit 1
+  end
+end
+
+def spawn_workers(number)
+  pids = []
+
+  number.times do
+    pids << spawn('sidekiq -r ./config.rb')
+  end
+
+  pids
+end
+
+# Stop Sidekiq workers
+def stop_workers(pids)
+  pids.each do |pid|
+    Process.kill('KILL', pid)
+    Process.wait pid
+  end
+end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gitlab-sidekiq-fetcher
 version: !ruby/object:Gem::Version
-  version: 0.5.0.pre.alpha
+  version: 0.6.1
 platform: ruby
 authors:
 - TEA
@@ -9,22 +9,28 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-08-02 00:00:00.000000000 Z
+date: 2020-08-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sidekiq
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
        version: '5'
+    - - "<"
+      - !ruby/object:Gem::Version
+        version: '6.1'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
         version: '5'
+    - - "<"
+      - !ruby/object:Gem::Version
+        version: '6.1'
 description: Redis reliable queue pattern implemented in Sidekiq
 email: valery@gitlab.com
 executables: []
@@ -42,6 +48,7 @@ files:
 - gitlab-sidekiq-fetcher.gemspec
 - lib/sidekiq-reliable-fetch.rb
 - lib/sidekiq/base_reliable_fetch.rb
+- lib/sidekiq/interrupted_set.rb
 - lib/sidekiq/reliable_fetch.rb
 - lib/sidekiq/semi_reliable_fetch.rb
 - spec/base_reliable_fetch_spec.rb
@@ -50,13 +57,14 @@ files:
 - spec/semi_reliable_fetch_spec.rb
 - spec/spec_helper.rb
 - tests/README.md
-- tests/reliability_test/config.rb
-- tests/reliability_test/reliability_test.rb
-- tests/reliability_test/worker.rb
-- tests/retry_test/config.rb
-- tests/retry_test/retry_test.rb
-- tests/retry_test/simple_assert.rb
-- tests/retry_test/worker.rb
+- tests/interruption/config.rb
+- tests/interruption/test_kill_signal.rb
+- tests/interruption/test_term_signal.rb
+- tests/interruption/worker.rb
+- tests/reliability/config.rb
+- tests/reliability/reliability_test.rb
+- tests/reliability/worker.rb
+- tests/support/utils.rb
 homepage: https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/
 licenses:
 - LGPL-3.0
@@ -72,11 +80,11 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
-rubygems_version: 3.0.3
+rubygems_version: 3.0.6
 signing_key:
 specification_version: 4
 summary: Reliable fetch extension for Sidekiq
@@ -1,26 +0,0 @@
-# frozen_string_literal: true
-
-class ReliabilityTestWorker
-  include Sidekiq::Worker
-
-  def perform
-    # To mimic long running job and to increase the probability of losing the job
-    sleep 1
-
-    Sidekiq.redis do |redis|
-      redis.lpush(REDIS_FINISHED_LIST, get_sidekiq_job_id)
-    end
-  end
-
-  def get_sidekiq_job_id
-    context_data = Thread.current[:sidekiq_context]&.first
-
-    return unless context_data
-
-    index = context_data.index('JID-')
-
-    return unless index
-
-    context_data[index + 4..-1]
-  end
-end
@@ -1,40 +0,0 @@
-# frozen_string_literal: true
-
-require 'sidekiq'
-require 'sidekiq/util'
-require 'sidekiq/cli'
-require_relative 'config'
-require_relative 'simple_assert'
-
-NUM_WORKERS = RetryTestWorker::EXPECTED_NUM_TIMES_BEEN_RUN + 1
-
-Sidekiq.redis(&:flushdb)
-
-def spawn_workers
-  pids = []
-
-  NUM_WORKERS.times do
-    pids << spawn('sidekiq -r ./config.rb')
-  end
-
-  pids
-end
-
-pids = spawn_workers
-
-jid = RetryTestWorker.perform_async
-
-sleep 300
-
-Sidekiq.redis do |redis|
-  times_has_been_run = redis.get('times_has_been_run').to_i
-  assert "The job has been run", times_has_been_run, 2
-end
-
-assert "Found dead jobs", Sidekiq::DeadSet.new.size, 1
-
-# Stop Sidekiq workers
-pids.each do |pid|
-  Process.kill('KILL', pid)
-  Process.wait pid
-end
@@ -1,8 +0,0 @@
-def assert(text, actual, expected)
-  if actual == expected
-    puts "#{text}: #{actual} (Success)"
-  else
-    puts "#{text}: #{actual} (Failed). Expected: #{expected}"
-    exit 1
-  end
-end
@@ -1,23 +0,0 @@
-# frozen_string_literal: true
-
-class RetryTestWorker
-  include Sidekiq::Worker
-
-  EXPECTED_NUM_TIMES_BEEN_RUN = 2
-
-  sidekiq_options retry: EXPECTED_NUM_TIMES_BEEN_RUN
-
-  sidekiq_retry_in do |count, exception|
-    1 # retry in one second
-  end
-
-  def perform
-    sleep 1
-
-    Sidekiq.redis do |redis|
-      redis.incr('times_has_been_run')
-    end
-
-    Process.kill('KILL', Process.pid) # Job suicide, OOM killer imitation
-  end
-end