gitlab-sidekiq-fetcher 0.4.0 → 0.6.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/.gitlab-ci.yml +18 -2
- data/README.md +11 -0
- data/gitlab-sidekiq-fetcher.gemspec +11 -11
- data/lib/sidekiq-reliable-fetch.rb +1 -0
- data/lib/sidekiq/base_reliable_fetch.rb +108 -48
- data/lib/sidekiq/interrupted_set.rb +47 -0
- data/spec/base_reliable_fetch_spec.rb +33 -8
- data/spec/fetch_shared_examples.rb +13 -9
- data/spec/reliable_fetch_spec.rb +1 -0
- data/spec/semi_reliable_fetch_spec.rb +1 -0
- data/tests/README.md +37 -0
- data/tests/interruption/config.rb +19 -0
- data/tests/interruption/test_kill_signal.rb +25 -0
- data/tests/interruption/test_term_signal.rb +25 -0
- data/tests/interruption/worker.rb +15 -0
- data/{test → tests/reliability}/config.rb +1 -3
- data/{test → tests/reliability}/reliability_test.rb +1 -1
- data/{test → tests/reliability}/worker.rb +1 -1
- data/tests/support/utils.rb +26 -0
- metadata +21 -10
- data/test/README.md +0 -34
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 4815e3e75915230d7b2eaf2fe3fa0daa288ec4c670b2cd211cb659aff4838788
|
4
|
+
data.tar.gz: c8b491f2d1a2678ef40fe856a55bf89341e041070c40941ca693685fa3c048cf
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 1667fb3ffb47117ac3c756c07a85867d96871b5489e820823735b62912df0d82b4d6c10be478e1f291d637d2ed15aff99a958f4fd884df98f026ee8c8f9c4c83
|
7
|
+
data.tar.gz: 5dad56f30515be87e79c58c3b5dce301213d427fe579309bca2a525e2b8664b94f0763c7eabfb2443324c5d8ecf5403f9826a0772e3b09b74e215e72ebfc4226
|
data/.gitignore
CHANGED
data/.gitlab-ci.yml
CHANGED
@@ -3,7 +3,7 @@ image: "ruby:2.5"
|
|
3
3
|
before_script:
|
4
4
|
- ruby -v
|
5
5
|
- which ruby
|
6
|
-
- gem install bundler
|
6
|
+
- gem install bundler
|
7
7
|
- bundle install --jobs $(nproc) "${FLAGS[@]}"
|
8
8
|
|
9
9
|
variables:
|
@@ -25,7 +25,7 @@ rspec:
|
|
25
25
|
.integration:
|
26
26
|
stage: test
|
27
27
|
script:
|
28
|
-
- cd
|
28
|
+
- cd tests/reliability
|
29
29
|
- bundle exec ruby reliability_test.rb
|
30
30
|
services:
|
31
31
|
- redis:alpine
|
@@ -47,6 +47,22 @@ integration_basic:
|
|
47
47
|
variables:
|
48
48
|
JOB_FETCHER: basic
|
49
49
|
|
50
|
+
kill_interruption:
|
51
|
+
stage: test
|
52
|
+
script:
|
53
|
+
- cd tests/interruption
|
54
|
+
- bundle exec ruby test_kill_signal.rb
|
55
|
+
services:
|
56
|
+
- redis:alpine
|
57
|
+
|
58
|
+
term_interruption:
|
59
|
+
stage: test
|
60
|
+
script:
|
61
|
+
- cd tests/interruption
|
62
|
+
- bundle exec ruby test_term_signal.rb
|
63
|
+
services:
|
64
|
+
- redis:alpine
|
65
|
+
|
50
66
|
|
51
67
|
# rubocop:
|
52
68
|
# script:
|
data/README.md
CHANGED
@@ -10,6 +10,17 @@ There are two strategies implemented: [Reliable fetch](http://redis.io/commands/
|
|
10
10
|
semi-reliable fetch that uses regular `brpop` and `lpush` to pick the job and put it to working queue. The main benefit of "Reliable" strategy is that `rpoplpush` is atomic, eliminating a race condition in which jobs can be lost.
|
11
11
|
However, it comes at a cost because `rpoplpush` can't watch multiple lists at the same time so we need to iterate over the entire queue list which significantly increases pressure on Redis when there are more than a few queues. The "semi-reliable" strategy is much more reliable than the default Sidekiq fetcher, though. Compared to the reliable fetch strategy, it does not increase pressure on Redis significantly.
|
12
12
|
|
13
|
+
### Interruption handling
|
14
|
+
|
15
|
+
Sidekiq expects every job to either report success or fail. In the latter case, Sidekiq puts a `retry_count` counter
|
16
|
+
into the job and keeps re-running the job until the counter reaches the maximum allowed value. When the job has
|
17
|
+
not been given a chance to finish its work (to report success or failure), for example when it was killed forcibly or requeued after receiving a TERM signal, the standard retry mechanism does not come into play and the job would be retried indefinitely. This is why Reliable Fetcher maintains a special counter, `interrupted_count`,
|
18
|
+
which is used to limit the number of such retries. In both cases, Reliable Fetcher increments `interrupted_count` and stops the job from running again once the counter reaches `max_retries_after_interruption` (default: 3).
|
19
|
+
Such a job is put into the `interrupted` queue. This queue mostly behaves like the Sidekiq Dead queue, so it only stores a limited number of jobs for a limited time. As with the Dead queue, the limits are configurable via the `interrupted_max_jobs` (default: 10_000) and `interrupted_timeout_in_seconds` (default: 3 months) Sidekiq option keys.
|
20
|
+
|
21
|
+
You can also disable special handling of interrupted jobs by setting `max_retries_after_interruption` to `-1`.
|
22
|
+
In this case, Reliable Fetcher places no limit on re-running interrupted jobs, and they are never put into the `interrupted` queue.
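
The counting rule described above can be sketched in plain Ruby. This is a minimal illustration, not the gem's code: `handle_interruption` and the `:requeued` / `:interrupted` symbols are hypothetical names, and the job is a bare JSON hash rather than a real Sidekiq payload.

```ruby
require 'json'

# Mirrors the documented default; the constant name here is only for the sketch.
DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3

# True once the job has been interrupted the allowed number of times.
# A negative limit disables the check entirely.
def interruption_exhausted?(msg, limit = DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION)
  return false if limit.negative?

  msg['interrupted_count'].to_i >= limit
end

# Hypothetical helper: bump the counter, then decide where the job goes.
def handle_interruption(raw_job, limit = DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION)
  msg = JSON.parse(raw_job)
  msg['interrupted_count'] = msg['interrupted_count'].to_i + 1

  destination = interruption_exhausted?(msg, limit) ? :interrupted : :requeued
  [destination, JSON.generate(msg)]
end

# Interrupt the same job three times in a row and record each decision.
job = JSON.generate('class' => 'Bob', 'args' => [1, 2, 'foo'])
outcomes = []
3.times do
  destination, job = handle_interruption(job)
  outcomes << destination
end
```

With the default limit of 3, the first two interruptions requeue the job and the third sends it to the `interrupted` queue. Passing `-1` as the limit makes every interruption requeue the job.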
|
23
|
+
|
13
24
|
|
14
25
|
## Installation
|
15
26
|
|
data/gitlab-sidekiq-fetcher.gemspec
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
Gem::Specification.new do |s|
|
2
|
-
s.name
|
3
|
-
s.version
|
4
|
-
s.authors
|
5
|
-
s.email
|
6
|
-
s.license
|
7
|
-
s.homepage
|
8
|
-
s.summary
|
9
|
-
s.description
|
2
|
+
s.name = 'gitlab-sidekiq-fetcher'
|
3
|
+
s.version = '0.6.0'
|
4
|
+
s.authors = ['TEA', 'GitLab']
|
5
|
+
s.email = 'valery@gitlab.com'
|
6
|
+
s.license = 'LGPL-3.0'
|
7
|
+
s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
|
8
|
+
s.summary = 'Reliable fetch extension for Sidekiq'
|
9
|
+
s.description = 'Redis reliable queue pattern implemented in Sidekiq'
|
10
10
|
s.require_paths = ['lib']
|
11
|
-
s.files
|
12
|
-
s.test_files
|
13
|
-
s.add_dependency 'sidekiq', '
|
11
|
+
s.files = `git ls-files`.split($\)
|
12
|
+
s.test_files = []
|
13
|
+
s.add_dependency 'sidekiq', '>= 5', '< 7'
|
14
14
|
end
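
The dependency constraint `'>= 5', '< 7'` can be checked with RubyGems' own classes. The sketch below is illustrative only; the version strings are made-up samples. It also contrasts `Gem::Version` comparison with plain string comparison, which is what a check such as `Sidekiq::VERSION >= '6'` relies on and which only orders versions correctly while the major version has a single digit.

```ruby
require 'rubygems'

requirement = Gem::Requirement.new('>= 5', '< 7')

# Sample versions inside and outside the allowed range.
supported = %w[5.0.0 6.1.2].all? { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
rejected  = %w[4.2.10 7.0.0].none? { |v| requirement.satisfied_by?(Gem::Version.new(v)) }

# String comparison is lexicographic, so "10.0.0" sorts before "6".
string_check  = '10.0.0' >= '6'
version_check = Gem::Version.new('10.0.0') >= Gem::Version.new('6')
```

Within the `< 7` range declared here, both comparisons give the same answer; the difference would only matter for a two-digit major version.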
|
data/lib/sidekiq/base_reliable_fetch.rb
CHANGED
@@ -1,5 +1,7 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
+
require_relative 'interrupted_set'
|
4
|
+
|
3
5
|
module Sidekiq
|
4
6
|
class BaseReliableFetch
|
5
7
|
DEFAULT_CLEANUP_INTERVAL = 60 * 60 # 1 hour
|
@@ -16,6 +18,9 @@ module Sidekiq
|
|
16
18
|
# Defines the COUNT parameter that will be passed to Redis SCAN command
|
17
19
|
SCAN_COUNT = 1000
|
18
20
|
|
21
|
+
# How many times a job can be interrupted
|
22
|
+
DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3
|
23
|
+
|
19
24
|
UnitOfWork = Struct.new(:queue, :job) do
|
20
25
|
def acknowledge
|
21
26
|
Sidekiq.redis { |conn| conn.lrem(Sidekiq::BaseReliableFetch.working_queue_name(queue), 1, job) }
|
@@ -36,12 +41,10 @@ module Sidekiq
|
|
36
41
|
end
|
37
42
|
|
38
43
|
def self.setup_reliable_fetch!(config)
|
39
|
-
|
40
|
-
|
41
|
-
else
|
42
|
-
Sidekiq::ReliableFetch
|
43
|
-
end
|
44
|
+
fetch = config.options[:semi_reliable_fetch] ? SemiReliableFetch : ReliableFetch
|
45
|
+
fetch = fetch.new(config.options) if Sidekiq::VERSION >= '6'
|
44
46
|
|
47
|
+
config.options[:fetch] = fetch
|
45
48
|
Sidekiq.logger.info('GitLab reliable fetch activated!')
|
46
49
|
|
47
50
|
start_heartbeat_thread
|
@@ -79,25 +82,68 @@ module Sidekiq
|
|
79
82
|
Sidekiq.logger.debug("Heartbeat for hostname: #{hostname} and pid: #{pid}")
|
80
83
|
end
|
81
84
|
|
85
|
+
def bulk_requeue(inprogress, options)
|
86
|
+
self.class.bulk_requeue(inprogress, options)
|
87
|
+
end
|
88
|
+
|
82
89
|
def self.bulk_requeue(inprogress, _options)
|
83
90
|
return if inprogress.empty?
|
84
91
|
|
85
|
-
Sidekiq.logger.debug('Re-queueing terminated jobs')
|
86
|
-
|
87
92
|
Sidekiq.redis do |conn|
|
88
93
|
inprogress.each do |unit_of_work|
|
89
94
|
conn.multi do |multi|
|
90
|
-
|
95
|
+
preprocess_interrupted_job(unit_of_work.job, unit_of_work.queue, multi)
|
96
|
+
|
91
97
|
multi.lrem(working_queue_name(unit_of_work.queue), 1, unit_of_work.job)
|
92
98
|
end
|
93
99
|
end
|
94
100
|
end
|
95
|
-
|
96
|
-
Sidekiq.logger.info("Pushed #{inprogress.size} jobs back to Redis")
|
97
101
|
rescue => e
|
98
102
|
Sidekiq.logger.warn("Failed to requeue #{inprogress.size} jobs: #{e.message}")
|
99
103
|
end
|
100
104
|
|
105
|
+
def self.clean_working_queue!(working_queue)
|
106
|
+
original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
|
107
|
+
|
108
|
+
Sidekiq.redis do |conn|
|
109
|
+
while job = conn.rpop(working_queue)
|
110
|
+
preprocess_interrupted_job(job, original_queue)
|
111
|
+
end
|
112
|
+
end
|
113
|
+
end
|
114
|
+
|
115
|
+
def self.preprocess_interrupted_job(job, queue, conn = nil)
|
116
|
+
msg = Sidekiq.load_json(job)
|
117
|
+
msg['interrupted_count'] = msg['interrupted_count'].to_i + 1
|
118
|
+
|
119
|
+
if interruption_exhausted?(msg)
|
120
|
+
send_to_quarantine(msg, conn)
|
121
|
+
else
|
122
|
+
requeue_job(queue, msg, conn)
|
123
|
+
end
|
124
|
+
end
|
125
|
+
|
126
|
+
# Detect "old" jobs and requeue them because the worker they were assigned
|
127
|
+
# to probably failed miserably.
|
128
|
+
def self.clean_working_queues!
|
129
|
+
Sidekiq.logger.info('Cleaning working queues')
|
130
|
+
|
131
|
+
Sidekiq.redis do |conn|
|
132
|
+
conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
|
133
|
+
# Example: "working:name_of_the_job:queue:{hostname}:{PID}"
|
134
|
+
hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
|
135
|
+
|
136
|
+
next if hostname.nil? || pid.nil?
|
137
|
+
|
138
|
+
clean_working_queue!(key) if worker_dead?(hostname, pid, conn)
|
139
|
+
end
|
140
|
+
end
|
141
|
+
end
|
142
|
+
|
143
|
+
def self.worker_dead?(hostname, pid, conn)
|
144
|
+
!conn.get(heartbeat_key(hostname, pid))
|
145
|
+
end
|
146
|
+
|
101
147
|
def self.heartbeat_key(hostname, pid)
|
102
148
|
"reliable-fetcher-heartbeat-#{hostname}-#{pid}"
|
103
149
|
end
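
The two regular expressions used by `clean_working_queues!` and `clean_working_queue!` can be tried in isolation. In this sketch `parse_working_queue` is a hypothetical helper, the sample key is invented, and `WORKING_QUEUE_PREFIX` is assumed to be `'working'` based on the example comment in the code above.

```ruby
# Assumed value; the real constant is defined elsewhere in the gem.
WORKING_QUEUE_PREFIX = 'working'

# Hypothetical helper: splits a working queue key into its parts,
# or returns nil when the key has no trailing ":hostname:pid" suffix.
def parse_working_queue(key)
  hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
  return nil if hostname.nil? || pid.nil?

  # Strip the prefix and the ":hostname:pid" suffix to recover the source queue.
  original_queue = key.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
  { queue: original_queue, hostname: hostname, pid: pid }
end

parsed   = parse_working_queue('working:queue:default:worker-1:4242')
unparsed = parse_working_queue('working:queue:default')
```

Here `parsed[:queue]` is `"queue:default"`, which is the list an interrupted job would be pushed back to, and `unparsed` is `nil` because the key lacks the suffix.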
|
@@ -106,6 +152,57 @@ module Sidekiq
|
|
106
152
|
"#{WORKING_QUEUE_PREFIX}:#{queue}:#{hostname}:#{pid}"
|
107
153
|
end
|
108
154
|
|
155
|
+
def self.interruption_exhausted?(msg)
|
156
|
+
return false if max_retries_after_interruption(msg['class']) < 0
|
157
|
+
|
158
|
+
msg['interrupted_count'].to_i >= max_retries_after_interruption(msg['class'])
|
159
|
+
end
|
160
|
+
|
161
|
+
def self.max_retries_after_interruption(worker_class)
|
162
|
+
max_retries_after_interruption = nil
|
163
|
+
|
164
|
+
max_retries_after_interruption ||= begin
|
165
|
+
Object.const_get(worker_class).sidekiq_options[:max_retries_after_interruption]
|
166
|
+
rescue NameError
|
167
|
+
end
|
168
|
+
|
169
|
+
max_retries_after_interruption ||= Sidekiq.options[:max_retries_after_interruption]
|
170
|
+
max_retries_after_interruption ||= DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION
|
171
|
+
max_retries_after_interruption
|
172
|
+
end
|
173
|
+
|
174
|
+
def self.send_to_quarantine(msg, multi_connection = nil)
|
175
|
+
Sidekiq.logger.warn(
|
176
|
+
class: msg['class'],
|
177
|
+
jid: msg['jid'],
|
178
|
+
message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']} to interrupted queue)
|
179
|
+
)
|
180
|
+
|
181
|
+
job = Sidekiq.dump_json(msg)
|
182
|
+
Sidekiq::InterruptedSet.new.put(job, connection: multi_connection)
|
183
|
+
end
|
184
|
+
|
185
|
+
# If you want this method to be run in the scope of a multi connection,
|
186
|
+
# you need to pass it
|
187
|
+
def self.requeue_job(queue, msg, conn)
|
188
|
+
with_connection(conn) do |conn|
|
189
|
+
conn.lpush(queue, Sidekiq.dump_json(msg))
|
190
|
+
end
|
191
|
+
|
192
|
+
Sidekiq.logger.info(
|
193
|
+
message: "Pushed job #{msg['jid']} back to queue #{queue}",
|
194
|
+
jid: msg['jid'],
|
195
|
+
queue: queue
|
196
|
+
)
|
197
|
+
end
|
198
|
+
|
199
|
+
# Yields the block with an existing connection or creates a new one
|
200
|
+
def self.with_connection(conn, &block)
|
201
|
+
return yield(conn) if conn
|
202
|
+
|
203
|
+
Sidekiq.redis { |conn| yield(conn) }
|
204
|
+
end
|
205
|
+
|
109
206
|
attr_reader :cleanup_interval, :last_try_to_take_lease_at, :lease_interval,
|
110
207
|
:queues, :use_semi_reliable_fetch,
|
111
208
|
:strictly_ordered_queues
|
@@ -119,7 +216,7 @@ module Sidekiq
|
|
119
216
|
end
|
120
217
|
|
121
218
|
def retrieve_work
|
122
|
-
clean_working_queues! if take_lease
|
219
|
+
self.class.clean_working_queues! if take_lease
|
123
220
|
|
124
221
|
retrieve_unit_of_work
|
125
222
|
end
|
@@ -131,43 +228,6 @@ module Sidekiq
|
|
131
228
|
|
132
229
|
private
|
133
230
|
|
134
|
-
def clean_working_queue!(working_queue)
|
135
|
-
original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
|
136
|
-
|
137
|
-
Sidekiq.redis do |conn|
|
138
|
-
count = 0
|
139
|
-
|
140
|
-
while conn.rpoplpush(working_queue, original_queue) do
|
141
|
-
count += 1
|
142
|
-
end
|
143
|
-
|
144
|
-
Sidekiq.logger.info("Requeued #{count} dead jobs to #{original_queue}")
|
145
|
-
end
|
146
|
-
end
|
147
|
-
|
148
|
-
# Detect "old" jobs and requeue them because the worker they were assigned
|
149
|
-
# to probably failed miserably.
|
150
|
-
def clean_working_queues!
|
151
|
-
Sidekiq.logger.info("Cleaning working queues")
|
152
|
-
|
153
|
-
Sidekiq.redis do |conn|
|
154
|
-
conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
|
155
|
-
# Example: "working:name_of_the_job:queue:{hostname}:{PID}"
|
156
|
-
hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
|
157
|
-
|
158
|
-
continue if hostname.nil? || pid.nil?
|
159
|
-
|
160
|
-
clean_working_queue!(key) if worker_dead?(hostname, pid)
|
161
|
-
end
|
162
|
-
end
|
163
|
-
end
|
164
|
-
|
165
|
-
def worker_dead?(hostname, pid)
|
166
|
-
Sidekiq.redis do |conn|
|
167
|
-
!conn.get(self.class.heartbeat_key(hostname, pid))
|
168
|
-
end
|
169
|
-
end
|
170
|
-
|
171
231
|
def take_lease
|
172
232
|
return unless allowed_to_take_a_lease?
|
173
233
|
|
data/lib/sidekiq/interrupted_set.rb
ADDED
@@ -0,0 +1,47 @@
|
|
1
|
+
require 'sidekiq/api'
|
2
|
+
|
3
|
+
module Sidekiq
|
4
|
+
class InterruptedSet < ::Sidekiq::JobSet
|
5
|
+
DEFAULT_MAX_CAPACITY = 10_000
|
6
|
+
DEFAULT_MAX_TIMEOUT = 90 * 24 * 60 * 60 # 3 months
|
7
|
+
|
8
|
+
def initialize
|
9
|
+
super "interrupted"
|
10
|
+
end
|
11
|
+
|
12
|
+
def put(message, opts = {})
|
13
|
+
now = Time.now.to_f
|
14
|
+
|
15
|
+
with_multi_connection(opts[:connection]) do |conn|
|
16
|
+
conn.zadd(name, now.to_s, message)
|
17
|
+
conn.zremrangebyscore(name, '-inf', now - self.class.timeout)
|
18
|
+
conn.zremrangebyrank(name, 0, - self.class.max_jobs)
|
19
|
+
end
|
20
|
+
|
21
|
+
true
|
22
|
+
end
|
23
|
+
|
24
|
+
# Yields the block inside an existing multi connection or creates a new one
|
25
|
+
def with_multi_connection(conn, &block)
|
26
|
+
return yield(conn) if conn
|
27
|
+
|
28
|
+
Sidekiq.redis do |c|
|
29
|
+
c.multi do |multi|
|
30
|
+
yield(multi)
|
31
|
+
end
|
32
|
+
end
|
33
|
+
end
|
34
|
+
|
35
|
+
def retry_all
|
36
|
+
each(&:retry) while size > 0
|
37
|
+
end
|
38
|
+
|
39
|
+
def self.max_jobs
|
40
|
+
Sidekiq.options[:interrupted_max_jobs] || DEFAULT_MAX_CAPACITY
|
41
|
+
end
|
42
|
+
|
43
|
+
def self.timeout
|
44
|
+
Sidekiq.options[:interrupted_timeout_in_seconds] || DEFAULT_MAX_TIMEOUT
|
45
|
+
end
|
46
|
+
end
|
47
|
+
end
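
The trimming performed in `put` (add, expire by score, cap by rank) can be simulated without Redis. The sketch below is an in-memory approximation under stated assumptions: `TinySortedSet` and its method names are hypothetical, and its rank handling follows the Redis convention that a negative stop index counts from the end of the set.

```ruby
# In-memory stand-in for a Redis sorted set: [score, member] pairs
# kept ordered by score. Purely illustrative; the gem talks to Redis.
class TinySortedSet
  def initialize
    @entries = []
  end

  def zadd(score, member)
    @entries.reject! { |_, m| m == member }
    @entries << [score, member]
    @entries.sort_by! { |s, m| [s, m] }
  end

  # Drop every entry whose score is <= max_score.
  def remove_scores_up_to(max_score)
    @entries.reject! { |s, _| s <= max_score }
  end

  # Drop ranks 0..stop, where a negative stop counts from the end.
  def remove_ranks_from_zero(stop)
    stop += @entries.size if stop.negative?
    return if stop.negative?

    @entries.shift(stop + 1)
  end

  def members
    @entries.map(&:last)
  end
end

# Same three steps as InterruptedSet#put, against the stand-in set.
def put(set, message, now:, timeout:, max_jobs:)
  set.zadd(now, message)
  set.remove_scores_up_to(now - timeout)
  set.remove_ranks_from_zero(-max_jobs)
end

set = TinySortedSet.new
put(set, 'old', now: 0.0,   timeout: 100, max_jobs: 3)
put(set, 'a',   now: 150.0, timeout: 100, max_jobs: 3) # 'old' is past the timeout
put(set, 'b',   now: 151.0, timeout: 100, max_jobs: 3)
put(set, 'c',   now: 152.0, timeout: 100, max_jobs: 3) # cap reached, oldest dropped
```

In this simulation the set ends up holding `b` and `c`: once the size reaches `max_jobs`, the oldest entry is trimmed, so the set settles at `max_jobs - 1` entries.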
|
data/spec/base_reliable_fetch_spec.rb
CHANGED
@@ -5,6 +5,8 @@ require 'sidekiq/reliable_fetch'
|
|
5
5
|
require 'sidekiq/semi_reliable_fetch'
|
6
6
|
|
7
7
|
describe Sidekiq::BaseReliableFetch do
|
8
|
+
let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
|
9
|
+
|
8
10
|
before { Sidekiq.redis(&:flushdb) }
|
9
11
|
|
10
12
|
describe 'UnitOfWork' do
|
@@ -12,7 +14,7 @@ describe Sidekiq::BaseReliableFetch do
|
|
12
14
|
|
13
15
|
describe '#requeue' do
|
14
16
|
it 'requeues job' do
|
15
|
-
Sidekiq.redis { |conn| conn.rpush('queue:foo',
|
17
|
+
Sidekiq.redis { |conn| conn.rpush('queue:foo', job) }
|
16
18
|
|
17
19
|
uow = fetcher.retrieve_work
|
18
20
|
|
@@ -25,7 +27,7 @@ describe Sidekiq::BaseReliableFetch do
|
|
25
27
|
|
26
28
|
describe '#acknowledge' do
|
27
29
|
it 'acknowledges job' do
|
28
|
-
Sidekiq.redis { |conn| conn.rpush('queue:foo',
|
30
|
+
Sidekiq.redis { |conn| conn.rpush('queue:foo', job) }
|
29
31
|
|
30
32
|
uow = fetcher.retrieve_work
|
31
33
|
|
@@ -38,24 +40,47 @@ describe Sidekiq::BaseReliableFetch do
|
|
38
40
|
end
|
39
41
|
|
40
42
|
describe '.bulk_requeue' do
|
43
|
+
let!(:queue1) { Sidekiq::Queue.new('foo') }
|
44
|
+
let!(:queue2) { Sidekiq::Queue.new('bar') }
|
45
|
+
|
41
46
|
it 'requeues the bulk' do
|
42
|
-
|
43
|
-
|
47
|
+
uow = described_class::UnitOfWork
|
48
|
+
jobs = [ uow.new('queue:foo', job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
|
49
|
+
described_class.bulk_requeue(jobs, queues: [])
|
44
50
|
|
45
|
-
expect(queue1.size).to eq
|
46
|
-
expect(queue2.size).to eq
|
51
|
+
expect(queue1.size).to eq 2
|
52
|
+
expect(queue2.size).to eq 1
|
53
|
+
end
|
47
54
|
|
55
|
+
it 'puts jobs into interrupted queue' do
|
48
56
|
uow = described_class::UnitOfWork
|
49
|
-
|
57
|
+
interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
|
58
|
+
jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
|
59
|
+
described_class.bulk_requeue(jobs, queues: [])
|
60
|
+
|
61
|
+
expect(queue1.size).to eq 1
|
62
|
+
expect(queue2.size).to eq 1
|
63
|
+
expect(Sidekiq::InterruptedSet.new.size).to eq 1
|
64
|
+
end
|
65
|
+
|
66
|
+
it 'does not put jobs into interrupted queue if it is disabled' do
|
67
|
+
Sidekiq.options[:max_retries_after_interruption] = -1
|
68
|
+
|
69
|
+
uow = described_class::UnitOfWork
|
70
|
+
interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
|
71
|
+
jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
|
50
72
|
described_class.bulk_requeue(jobs, queues: [])
|
51
73
|
|
52
74
|
expect(queue1.size).to eq 2
|
53
75
|
expect(queue2.size).to eq 1
|
76
|
+
expect(Sidekiq::InterruptedSet.new.size).to eq 0
|
77
|
+
|
78
|
+
Sidekiq.options[:max_retries_after_interruption] = 3
|
54
79
|
end
|
55
80
|
end
|
56
81
|
|
57
82
|
it 'sets heartbeat' do
|
58
|
-
config = double(:sidekiq_config, options: {})
|
83
|
+
config = double(:sidekiq_config, options: { queues: [] })
|
59
84
|
|
60
85
|
heartbeat_thread = described_class.setup_reliable_fetch!(config)
|
61
86
|
|
data/spec/fetch_shared_examples.rb
CHANGED
@@ -4,33 +4,38 @@ shared_examples 'a Sidekiq fetcher' do
|
|
4
4
|
before { Sidekiq.redis(&:flushdb) }
|
5
5
|
|
6
6
|
describe '#retrieve_work' do
|
7
|
+
let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
|
7
8
|
let(:fetcher) { described_class.new(queues: ['assigned']) }
|
8
9
|
|
9
10
|
it 'retrieves the job and puts it to working queue' do
|
10
|
-
Sidekiq.redis { |conn| conn.rpush('queue:assigned',
|
11
|
+
Sidekiq.redis { |conn| conn.rpush('queue:assigned', job) }
|
11
12
|
|
12
13
|
uow = fetcher.retrieve_work
|
13
14
|
|
14
15
|
expect(working_queue_size('assigned')).to eq 1
|
15
16
|
expect(uow.queue_name).to eq 'assigned'
|
16
|
-
expect(uow.job).to eq
|
17
|
+
expect(uow.job).to eq job
|
17
18
|
expect(Sidekiq::Queue.new('assigned').size).to eq 0
|
18
19
|
end
|
19
20
|
|
20
21
|
it 'does not retrieve a job from foreign queue' do
|
21
|
-
Sidekiq.redis { |conn| conn.rpush('queue:not_assigned',
|
22
|
+
Sidekiq.redis { |conn| conn.rpush('queue:not_assigned', job) }
|
22
23
|
|
23
24
|
expect(fetcher.retrieve_work).to be_nil
|
24
25
|
end
|
25
26
|
|
26
|
-
it 'requeues jobs from dead working queue' do
|
27
|
+
it 'requeues jobs from dead working queue with incremented interrupted_count' do
|
27
28
|
Sidekiq.redis do |conn|
|
28
|
-
conn.rpush(other_process_working_queue_name('assigned'),
|
29
|
+
conn.rpush(other_process_working_queue_name('assigned'), job)
|
29
30
|
end
|
30
31
|
|
32
|
+
expected_job = Sidekiq.load_json(job)
|
33
|
+
expected_job['interrupted_count'] = 1
|
34
|
+
expected_job = Sidekiq.dump_json(expected_job)
|
35
|
+
|
31
36
|
uow = fetcher.retrieve_work
|
32
37
|
|
33
|
-
expect(uow.job).to eq
|
38
|
+
expect(uow.job).to eq expected_job
|
34
39
|
|
35
40
|
Sidekiq.redis do |conn|
|
36
41
|
expect(conn.llen(other_process_working_queue_name('assigned'))).to eq 0
|
@@ -41,7 +46,7 @@ shared_examples 'a Sidekiq fetcher' do
|
|
41
46
|
working_queue = live_other_process_working_queue_name('assigned')
|
42
47
|
|
43
48
|
Sidekiq.redis do |conn|
|
44
|
-
conn.rpush(working_queue,
|
49
|
+
conn.rpush(working_queue, job)
|
45
50
|
end
|
46
51
|
|
47
52
|
uow = fetcher.retrieve_work
|
@@ -56,8 +61,7 @@ shared_examples 'a Sidekiq fetcher' do
|
|
56
61
|
it 'does not clean up orphaned jobs more than once per cleanup interval' do
|
57
62
|
Sidekiq.redis = Sidekiq::RedisConnection.create(url: REDIS_URL, size: 10)
|
58
63
|
|
59
|
-
|
60
|
-
.to receive(:clean_working_queues!).once
|
64
|
+
expect(described_class).to receive(:clean_working_queues!).once
|
61
65
|
|
62
66
|
threads = 10.times.map do
|
63
67
|
Thread.new do
|
data/spec/reliable_fetch_spec.rb
CHANGED
data/tests/README.md
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
# How to run reliability tests
|
2
|
+
|
3
|
+
```
|
4
|
+
cd tests/reliability
|
5
|
+
bundle exec ruby reliability_test.rb
|
6
|
+
```
|
7
|
+
|
8
|
+
You can adjust some parameters of the test in the `config.rb`.
|
9
|
+
|
10
|
+
JOB_FETCHER can be set to one of these values: `semi`, `reliable`, `basic`
|
11
|
+
|
12
|
+
You need a Redis server running on the default port `6379`. To use another port, you can define the
|
13
|
+
`REDIS_URL` environment variable with the port you need (example: `REDIS_URL="redis://localhost:9999"`).
|
14
|
+
|
15
|
+
|
16
|
+
## How it works
|
17
|
+
|
18
|
+
This tool spawns the configured number of Sidekiq workers and, when the number of processed jobs is about half of the original
|
19
|
+
number, it kills all the workers with `kill -9` and then spawns new workers until all the jobs are processed. Redis keys and counters are used to track progress.
|
20
|
+
|
21
|
+
# How to run interruption tests
|
22
|
+
|
23
|
+
```
|
24
|
+
cd tests/interruption
|
25
|
+
|
26
|
+
# Verify "KILL" signal
|
27
|
+
bundle exec ruby test_kill_signal.rb
|
28
|
+
|
29
|
+
# Verify "TERM" signal
|
30
|
+
bundle exec ruby test_term_signal.rb
|
31
|
+
```
|
32
|
+
|
33
|
+
It requires Redis to be running on port 6379.
|
34
|
+
|
35
|
+
## How it works
|
36
|
+
|
37
|
+
It spawns Sidekiq workers, then creates a job that kills itself after a moment. The reliable fetcher brings it back. The purpose is to verify that the job is run no more than the allowed number of times.
|
data/tests/interruption/config.rb
ADDED
@@ -0,0 +1,19 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require_relative '../../lib/sidekiq-reliable-fetch'
|
4
|
+
require_relative 'worker'
|
5
|
+
|
6
|
+
TEST_CLEANUP_INTERVAL = 20
|
7
|
+
TEST_LEASE_INTERVAL = 5
|
8
|
+
|
9
|
+
Sidekiq.configure_server do |config|
|
10
|
+
config.options[:semi_reliable_fetch] = true
|
11
|
+
|
12
|
+
# We need to override these parameters to not wait too long
|
13
|
+
# The default values are good for production use only
|
14
|
+
# These will be ignored for :basic
|
15
|
+
config.options[:cleanup_interval] = TEST_CLEANUP_INTERVAL
|
16
|
+
config.options[:lease_interval] = TEST_LEASE_INTERVAL
|
17
|
+
|
18
|
+
Sidekiq::ReliableFetch.setup_reliable_fetch!(config)
|
19
|
+
end
|
data/tests/interruption/test_kill_signal.rb
ADDED
@@ -0,0 +1,25 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'sidekiq'
|
4
|
+
require_relative 'config'
|
5
|
+
require_relative '../support/utils'
|
6
|
+
|
7
|
+
EXPECTED_NUM_TIMES_BEEN_RUN = 3
|
8
|
+
NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
|
9
|
+
|
10
|
+
Sidekiq.redis(&:flushdb)
|
11
|
+
|
12
|
+
pids = spawn_workers(NUM_WORKERS)
|
13
|
+
|
14
|
+
RetryTestWorker.perform_async
|
15
|
+
|
16
|
+
sleep 300
|
17
|
+
|
18
|
+
Sidekiq.redis do |redis|
|
19
|
+
times_has_been_run = redis.get('times_has_been_run').to_i
|
20
|
+
assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
|
21
|
+
end
|
22
|
+
|
23
|
+
assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
|
24
|
+
|
25
|
+
stop_workers(pids)
|
data/tests/interruption/test_term_signal.rb
ADDED
@@ -0,0 +1,25 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'sidekiq'
|
4
|
+
require_relative 'config'
|
5
|
+
require_relative '../support/utils'
|
6
|
+
|
7
|
+
EXPECTED_NUM_TIMES_BEEN_RUN = 3
|
8
|
+
NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
|
9
|
+
|
10
|
+
Sidekiq.redis(&:flushdb)
|
11
|
+
|
12
|
+
pids = spawn_workers(NUM_WORKERS)
|
13
|
+
|
14
|
+
RetryTestWorker.perform_async('TERM', 60)
|
15
|
+
|
16
|
+
sleep 300
|
17
|
+
|
18
|
+
Sidekiq.redis do |redis|
|
19
|
+
times_has_been_run = redis.get('times_has_been_run').to_i
|
20
|
+
assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
|
21
|
+
end
|
22
|
+
|
23
|
+
assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
|
24
|
+
|
25
|
+
stop_workers(pids)
|
data/tests/interruption/worker.rb
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
class RetryTestWorker
|
4
|
+
include Sidekiq::Worker
|
5
|
+
|
6
|
+
def perform(signal = 'KILL', wait_seconds = 1)
|
7
|
+
Sidekiq.redis do |redis|
|
8
|
+
redis.incr('times_has_been_run')
|
9
|
+
end
|
10
|
+
|
11
|
+
Process.kill(signal, Process.pid)
|
12
|
+
|
13
|
+
sleep wait_seconds
|
14
|
+
end
|
15
|
+
end
|
@@ -1,8 +1,6 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require_relative '
|
4
|
-
require_relative '../lib/sidekiq/reliable_fetch'
|
5
|
-
require_relative '../lib/sidekiq/semi_reliable_fetch'
|
3
|
+
require_relative '../../lib/sidekiq-reliable-fetch'
|
6
4
|
require_relative 'worker'
|
7
5
|
|
8
6
|
REDIS_FINISHED_LIST = 'reliable-fetcher-finished-jids'
|
data/tests/support/utils.rb
ADDED
@@ -0,0 +1,26 @@
|
|
1
|
+
def assert(text, actual, expected)
|
2
|
+
if actual == expected
|
3
|
+
puts "#{text}: #{actual} (Success)"
|
4
|
+
else
|
5
|
+
puts "#{text}: #{actual} (Failed). Expected: #{expected}"
|
6
|
+
exit 1
|
7
|
+
end
|
8
|
+
end
|
9
|
+
|
10
|
+
def spawn_workers(number)
|
11
|
+
pids = []
|
12
|
+
|
13
|
+
number.times do
|
14
|
+
pids << spawn('sidekiq -r ./config.rb')
|
15
|
+
end
|
16
|
+
|
17
|
+
pids
|
18
|
+
end
|
19
|
+
|
20
|
+
# Stop Sidekiq workers
|
21
|
+
def stop_workers(pids)
|
22
|
+
pids.each do |pid|
|
23
|
+
Process.kill('KILL', pid)
|
24
|
+
Process.wait pid
|
25
|
+
end
|
26
|
+
end
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: gitlab-sidekiq-fetcher
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.6.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- TEA
|
@@ -9,22 +9,28 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date:
|
12
|
+
date: 2020-07-22 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: sidekiq
|
16
16
|
requirement: !ruby/object:Gem::Requirement
|
17
17
|
requirements:
|
18
|
-
- - "
|
18
|
+
- - ">="
|
19
19
|
- !ruby/object:Gem::Version
|
20
20
|
version: '5'
|
21
|
+
- - "<"
|
22
|
+
- !ruby/object:Gem::Version
|
23
|
+
version: '7'
|
21
24
|
type: :runtime
|
22
25
|
prerelease: false
|
23
26
|
version_requirements: !ruby/object:Gem::Requirement
|
24
27
|
requirements:
|
25
|
-
- - "
|
28
|
+
- - ">="
|
26
29
|
- !ruby/object:Gem::Version
|
27
30
|
version: '5'
|
31
|
+
- - "<"
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '7'
|
28
34
|
description: Redis reliable queue pattern implemented in Sidekiq
|
29
35
|
email: valery@gitlab.com
|
30
36
|
executables: []
|
@@ -42,6 +48,7 @@ files:
|
|
42
48
|
- gitlab-sidekiq-fetcher.gemspec
|
43
49
|
- lib/sidekiq-reliable-fetch.rb
|
44
50
|
- lib/sidekiq/base_reliable_fetch.rb
|
51
|
+
- lib/sidekiq/interrupted_set.rb
|
45
52
|
- lib/sidekiq/reliable_fetch.rb
|
46
53
|
- lib/sidekiq/semi_reliable_fetch.rb
|
47
54
|
- spec/base_reliable_fetch_spec.rb
|
@@ -49,10 +56,15 @@ files:
|
|
49
56
|
- spec/reliable_fetch_spec.rb
|
50
57
|
- spec/semi_reliable_fetch_spec.rb
|
51
58
|
- spec/spec_helper.rb
|
52
|
-
-
|
53
|
-
-
|
54
|
-
-
|
55
|
-
-
|
59
|
+
- tests/README.md
|
60
|
+
- tests/interruption/config.rb
|
61
|
+
- tests/interruption/test_kill_signal.rb
|
62
|
+
- tests/interruption/test_term_signal.rb
|
63
|
+
- tests/interruption/worker.rb
|
64
|
+
- tests/reliability/config.rb
|
65
|
+
- tests/reliability/reliability_test.rb
|
66
|
+
- tests/reliability/worker.rb
|
67
|
+
- tests/support/utils.rb
|
56
68
|
homepage: https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/
|
57
69
|
licenses:
|
58
70
|
- LGPL-3.0
|
@@ -72,8 +84,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
72
84
|
- !ruby/object:Gem::Version
|
73
85
|
version: '0'
|
74
86
|
requirements: []
|
75
|
-
|
76
|
-
rubygems_version: 2.7.6
|
87
|
+
rubygems_version: 3.0.3
|
77
88
|
signing_key:
|
78
89
|
specification_version: 4
|
79
90
|
summary: Reliable fetch extension for Sidekiq
|
data/test/README.md
DELETED
@@ -1,34 +0,0 @@
|
|
1
|
-
# How to run
|
2
|
-
|
3
|
-
```
|
4
|
-
cd test
|
5
|
-
bundle exec ruby reliability_test.rb
|
6
|
-
```
|
7
|
-
|
8
|
-
You can adjust some parameters of the test in the `config.rb`
|
9
|
-
|
10
|
-
|
11
|
-
# How it works
|
12
|
-
|
13
|
-
This tool spawns configured number of Sidekiq workers and when the amount of processed jobs is about half of origin
|
14
|
-
number it will kill all the workers with `kill -9` and then it will spawn new workers again until all the jobs are processed. To track the process and counters we use Redis keys/counters.
|
15
|
-
|
16
|
-
# How to run tests
|
17
|
-
|
18
|
-
To run rspec:
|
19
|
-
|
20
|
-
```
|
21
|
-
bundle exec rspec
|
22
|
-
```
|
23
|
-
|
24
|
-
To run performance tests:
|
25
|
-
|
26
|
-
```
|
27
|
-
cd test
|
28
|
-
JOB_FETCHER=semi bundle exec ruby reliability_test.rb
|
29
|
-
```
|
30
|
-
|
31
|
-
JOB_FETCHER can be set to one of these values: `semi`, `reliable`, `basic`
|
32
|
-
|
33
|
-
To run both kind of tests you need to have redis server running on default HTTP port `6379`. To use other HTTP port, you can define
|
34
|
-
`REDIS_URL` environment varible with the port you need(example: `REDIS_URL="redis://localhost:9999"`).
|