gitlab-sidekiq-fetcher 0.5.1 → 0.5.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA256:
- metadata.gz: eb3707eefa697c806e40fc41b6406714341c5f8dd7115451391313a5f7f01725
- data.tar.gz: 51046486212181fd92a7cbc2f48f707391ab3e1438ad814b29f2a12c84c2b8cf
+ SHA1:
+ metadata.gz: d35e77ddc4f9b05b7eaa90ef5d05815f724e47ef
+ data.tar.gz: 1aac18934185ff3b3a09050cb38dbcf31115fbbc
  SHA512:
- metadata.gz: b7453b6ff9e45f2d9cc7b8ad3b120c17bb04093214c79967ed6a75698b26fe358bf10ab6424dff95f5cab2b3839ac20e5a2af23a01985f4fe715d2f00c3d086b
- data.tar.gz: 03afebd011c716cdc96b59b2b18e80514701f6bef3674ab3bdeb568b35c1730c5c8906a3eb40ad9c8aff93b971dd88b538898b326ece66642ecf64f879524088
+ metadata.gz: 9776fe9d58da5f580b1c27df618c7b4aec1088e46b14efc4497e8c77b233ac97ce7fbb6bb6a03a84ed65a8cb24a41d0ee31a42dcb380d57836ef709ea3577ef8
+ data.tar.gz: 7651a6704485f5150e22a30bc9ef47674e1d1fafabb5a5a42aaa844e14274012c688d41bef650d1f12a9268609291164a84f9ed2678157c5cc8af73116275d93
data/.gitignore CHANGED
@@ -1,2 +1,3 @@
  *.gem
  coverage
+ .DS_Store
data/.gitlab-ci.yml CHANGED
@@ -25,7 +25,7 @@ rspec:
  .integration:
  stage: test
  script:
- - cd tests/reliability_test
+ - cd tests/reliability
  - bundle exec ruby reliability_test.rb
  services:
  - redis:alpine
@@ -47,19 +47,19 @@ integration_basic:
  variables:
  JOB_FETCHER: basic
 
- retry_test:
+ kill_interruption:
  stage: test
  script:
- - cd tests/retry_test
- - bundle exec ruby retry_test.rb
+ - cd tests/interruption
+ - bundle exec ruby test_kill_signal.rb
  services:
  - redis:alpine
 
- no_retry_test:
+ term_interruption:
  stage: test
  script:
- - cd tests/retry_test
- - bundle exec ruby no_retry_test.rb
+ - cd tests/interruption
+ - bundle exec ruby test_term_signal.rb
  services:
  - redis:alpine
 
data/README.md CHANGED
@@ -10,6 +10,17 @@ There are two strategies implemented: [Reliable fetch](http://redis.io/commands/
  semi-reliable fetch that uses regular `brpop` and `lpush` to pick the job and put it to working queue. The main benefit of "Reliable" strategy is that `rpoplpush` is atomic, eliminating a race condition in which jobs can be lost.
  However, it comes at a cost because `rpoplpush` can't watch multiple lists at the same time so we need to iterate over the entire queue list which significantly increases pressure on Redis when there are more than a few queues. The "semi-reliable" strategy is much more reliable than the default Sidekiq fetcher, though. Compared to the reliable fetch strategy, it does not increase pressure on Redis significantly.
 
+ ### Interruption handling
+
+ Sidekiq expects every job to report success or failure. In the failure case, Sidekiq stores a `retry_count` counter
+ in the job and keeps re-running it until the counter reaches the maximum allowed value. When a job never gets the chance
+ to finish its work (to report success or failure), for example when it is killed forcibly or requeued after receiving a TERM signal, the standard retry mechanism does not come into play and the job would be retried indefinitely. This is why Reliable Fetcher maintains a special counter, `interrupted_count`,
+ which is used to limit the number of such retries. In both cases, Reliable Fetcher increments `interrupted_count` and rejects the job from running again when the counter exceeds `max_retries_after_interruption` (default: 3 times).
+ Such a job is put into the `interrupted` queue. This queue largely behaves like the Sidekiq Dead queue, so it only stores a limited number of jobs for a limited time. As with the Dead queue, the limits are configurable via the `interrupted_max_jobs` (default: 10_000) and `interrupted_timeout_in_seconds` (default: 3 months) Sidekiq option keys.
+
+ You can also disable the special handling of interrupted jobs by setting `max_retries_after_interruption` to `-1`.
+ In this case, interrupted jobs run without any limits from Reliable Fetcher and are not put into the interrupted queue.
+
 
  ## Installation
 
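The decision described in the README addition above boils down to a small piece of counter arithmetic. A minimal plain-Ruby sketch (a simplified model with the limit as a plain keyword argument, not the gem's actual code):

```ruby
# Simplified model of the interruption-handling decision:
# bump the counter, then either requeue or quarantine the job.
def handle_interrupted_job(job, max_retries_after_interruption: 3)
  job['interrupted_count'] = job['interrupted_count'].to_i + 1

  # Setting the limit to -1 disables quarantining entirely.
  return :requeue if max_retries_after_interruption < 0

  if job['interrupted_count'] >= max_retries_after_interruption
    :send_to_interrupted_queue
  else
    :requeue
  end
end

job = {}
3.times.map { handle_interrupted_job(job) }
# => [:requeue, :requeue, :send_to_interrupted_queue]
```

With the default limit of 3, a job survives two interruptions and is quarantined on the third.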
data/gitlab-sidekiq-fetcher.gemspec CHANGED
@@ -1,14 +1,14 @@
  Gem::Specification.new do |s|
- s.name = 'gitlab-sidekiq-fetcher'
- s.version = '0.5.1'
- s.authors = ['TEA', 'GitLab']
- s.email = 'valery@gitlab.com'
- s.license = 'LGPL-3.0'
- s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
- s.summary = 'Reliable fetch extension for Sidekiq'
- s.description = 'Redis reliable queue pattern implemented in Sidekiq'
+ s.name = 'gitlab-sidekiq-fetcher'
+ s.version = '0.5.2'
+ s.authors = ['TEA', 'GitLab']
+ s.email = 'valery@gitlab.com'
+ s.license = 'LGPL-3.0'
+ s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
+ s.summary = 'Reliable fetch extension for Sidekiq'
+ s.description = 'Redis reliable queue pattern implemented in Sidekiq'
  s.require_paths = ['lib']
- s.files = `git ls-files`.split($\)
- s.test_files = []
+ s.files = `git ls-files`.split($\)
+ s.test_files = []
  s.add_dependency 'sidekiq', '~> 5'
  end
data/lib/sidekiq-reliable-fetch.rb CHANGED
@@ -1,4 +1,5 @@
  require 'sidekiq'
+ require 'sidekiq/api'
 
  require_relative 'sidekiq/base_reliable_fetch'
  require_relative 'sidekiq/reliable_fetch'
data/lib/sidekiq/base_reliable_fetch.rb CHANGED
@@ -1,6 +1,6 @@
  # frozen_string_literal: true
 
- require 'sidekiq/job_retry'
+ require_relative 'interrupted_set'
 
  module Sidekiq
  class BaseReliableFetch
@@ -18,6 +18,9 @@ module Sidekiq
  # Defines the COUNT parameter that will be passed to Redis SCAN command
  SCAN_COUNT = 1000
 
+ # How many times a job can be interrupted
+ DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3
+
  UnitOfWork = Struct.new(:queue, :job) do
  def acknowledge
  Sidekiq.redis { |conn| conn.lrem(Sidekiq::BaseReliableFetch.working_queue_name(queue), 1, job) }
@@ -84,143 +87,145 @@ module Sidekiq
  def self.bulk_requeue(inprogress, _options)
  return if inprogress.empty?
 
- Sidekiq.logger.debug('Re-queueing terminated jobs')
-
  Sidekiq.redis do |conn|
  inprogress.each do |unit_of_work|
  conn.multi do |multi|
- multi.lpush(unit_of_work.queue, unit_of_work.job)
+ preprocess_interrupted_job(unit_of_work.job, unit_of_work.queue, multi)
+
  multi.lrem(working_queue_name(unit_of_work.queue), 1, unit_of_work.job)
  end
  end
  end
-
- Sidekiq.logger.info("Pushed #{inprogress.size} jobs back to Redis")
  rescue => e
  Sidekiq.logger.warn("Failed to requeue #{inprogress.size} jobs: #{e.message}")
  end
 
- def self.heartbeat_key(hostname, pid)
- "reliable-fetcher-heartbeat-#{hostname}-#{pid}"
- end
-
- def self.working_queue_name(queue)
- "#{WORKING_QUEUE_PREFIX}:#{queue}:#{hostname}:#{pid}"
- end
-
- attr_reader :cleanup_interval, :last_try_to_take_lease_at, :lease_interval,
- :queues, :use_semi_reliable_fetch,
- :strictly_ordered_queues
+ def self.clean_working_queue!(working_queue)
+ original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
 
- def initialize(options)
- @cleanup_interval = options.fetch(:cleanup_interval, DEFAULT_CLEANUP_INTERVAL)
- @lease_interval = options.fetch(:lease_interval, DEFAULT_LEASE_INTERVAL)
- @last_try_to_take_lease_at = 0
- @strictly_ordered_queues = !!options[:strict]
- @queues = options[:queues].map { |q| "queue:#{q}" }
+ Sidekiq.redis do |conn|
+ while job = conn.rpop(working_queue)
+ preprocess_interrupted_job(job, original_queue)
+ end
+ end
  end
 
- def retrieve_work
- clean_working_queues! if take_lease
-
- retrieve_unit_of_work
- end
+ def self.preprocess_interrupted_job(job, queue, conn = nil)
+ msg = Sidekiq.load_json(job)
+ msg['interrupted_count'] = msg['interrupted_count'].to_i + 1
 
- def retrieve_unit_of_work
- raise NotImplementedError,
- "#{self.class} does not implement #{__method__}"
+ if interruption_exhausted?(msg)
+ send_to_quarantine(msg, conn)
+ else
+ requeue_job(queue, msg, conn)
+ end
  end
 
- private
-
- def clean_working_queue!(working_queue)
- original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
+ # Detect "old" jobs and requeue them because the worker they were assigned
+ # to probably failed miserably.
+ def self.clean_working_queues!
+ Sidekiq.logger.info('Cleaning working queues')
 
  Sidekiq.redis do |conn|
- count = 0
-
- while job = conn.rpop(working_queue)
- msg = begin
- Sidekiq.load_json(job)
- rescue => e
- Sidekiq.logger.info("Skipped job: #{job} as we couldn't parse it")
- next
- end
-
- msg['retry_count'] = msg['retry_count'].to_i + 1
-
- if retries_exhausted?(msg)
- send_to_morgue(msg)
- else
- job = Sidekiq.dump_json(msg)
+ conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
+ # Example: "working:name_of_the_job:queue:{hostname}:{PID}"
+ hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
 
- conn.lpush(original_queue, job)
+ continue if hostname.nil? || pid.nil?
 
- count += 1
- end
+ clean_working_queue!(key) if worker_dead?(hostname, pid, conn)
  end
-
- Sidekiq.logger.info("Requeued #{count} dead jobs to #{original_queue}")
  end
  end
 
- def retries_exhausted?(msg)
- # `retry` parameter can be empty when job is running the first time and when
- # it's not specified in worker class explicitly.
- # In that case, the default parameter gets injected into the job when
- # it fails the first time in JobRetry#local.
- # We should handle the case when `retry` is explicitly set to false
- return true if msg['retry'] === false
+ def self.worker_dead?(hostname, pid, conn)
+ !conn.get(heartbeat_key(hostname, pid))
+ end
 
- max_retries_default = Sidekiq.options.fetch(:max_retries, Sidekiq::JobRetry::DEFAULT_MAX_RETRY_ATTEMPTS)
+ def self.heartbeat_key(hostname, pid)
+ "reliable-fetcher-heartbeat-#{hostname}-#{pid}"
+ end
 
- max_retry_attempts = retry_attempts_from(msg['retry'], max_retries_default)
+ def self.working_queue_name(queue)
+ "#{WORKING_QUEUE_PREFIX}:#{queue}:#{hostname}:#{pid}"
+ end
+
+ def self.interruption_exhausted?(msg)
+ return false if max_retries_after_interruption(msg['class']) < 0
 
- msg['retry_count'] >= max_retry_attempts
+ msg['interrupted_count'].to_i >= max_retries_after_interruption(msg['class'])
  end
 
- def retry_attempts_from(msg_retry, default)
- if msg_retry.is_a?(Integer)
- msg_retry
- else
- default
+ def self.max_retries_after_interruption(worker_class)
+ max_retries_after_interruption = nil
+
+ max_retries_after_interruption ||= begin
+ Object.const_get(worker_class).sidekiq_options[:max_retries_after_interruption]
+ rescue NameError
  end
+
+ max_retries_after_interruption ||= Sidekiq.options[:max_retries_after_interruption]
+ max_retries_after_interruption ||= DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION
+ max_retries_after_interruption
  end
 
- def send_to_morgue(msg)
+ def self.send_to_quarantine(msg, multi_connection = nil)
  Sidekiq.logger.warn(
  class: msg['class'],
  jid: msg['jid'],
- message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']})
+ message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']} to interrupted queue)
  )
 
- payload = Sidekiq.dump_json(msg)
- Sidekiq::DeadSet.new.kill(payload, notify_failure: false)
+ job = Sidekiq.dump_json(msg)
+ Sidekiq::InterruptedSet.new.put(job, connection: multi_connection)
  end
 
- # Detect "old" jobs and requeue them because the worker they were assigned
- # to probably failed miserably.
- def clean_working_queues!
- Sidekiq.logger.info("Cleaning working queues")
+ # If you want this method to run in the scope of a multi connection,
+ # you need to pass it in
+ def self.requeue_job(queue, msg, conn)
+ with_connection(conn) do |conn|
+ conn.lpush(queue, Sidekiq.dump_json(msg))
+ end
 
- Sidekiq.redis do |conn|
- conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
- # Example: "working:name_of_the_job:queue:{hostname}:{PID}"
- hostname, pid = key.scan(/:([^:]*):([0-9]*)\z/).flatten
+ Sidekiq.logger.info(
+ message: "Pushed job #{msg['jid']} back to queue #{queue}",
+ jid: msg['jid'],
+ queue: queue
+ )
+ end
 
- continue if hostname.nil? || pid.nil?
+ # Yields the block with an existing connection or creates another one
+ def self.with_connection(conn, &block)
+ return yield(conn) if conn
 
- clean_working_queue!(key) if worker_dead?(hostname, pid)
- end
- end
+ Sidekiq.redis { |conn| yield(conn) }
  end
 
- def worker_dead?(hostname, pid)
- Sidekiq.redis do |conn|
- !conn.get(self.class.heartbeat_key(hostname, pid))
- end
+ attr_reader :cleanup_interval, :last_try_to_take_lease_at, :lease_interval,
+ :queues, :use_semi_reliable_fetch,
+ :strictly_ordered_queues
+
+ def initialize(options)
+ @cleanup_interval = options.fetch(:cleanup_interval, DEFAULT_CLEANUP_INTERVAL)
+ @lease_interval = options.fetch(:lease_interval, DEFAULT_LEASE_INTERVAL)
+ @last_try_to_take_lease_at = 0
+ @strictly_ordered_queues = !!options[:strict]
+ @queues = options[:queues].map { |q| "queue:#{q}" }
  end
 
+ def retrieve_work
+ self.class.clean_working_queues! if take_lease
+
+ retrieve_unit_of_work
+ end
+
+ def retrieve_unit_of_work
+ raise NotImplementedError,
+ "#{self.class} does not implement #{__method__}"
+ end
+
+ private
+
  def take_lease
  return unless allowed_to_take_a_lease?
data/lib/sidekiq/interrupted_set.rb ADDED
@@ -0,0 +1,47 @@
+ require 'sidekiq/api'
+
+ module Sidekiq
+ class InterruptedSet < ::Sidekiq::JobSet
+ DEFAULT_MAX_CAPACITY = 10_000
+ DEFAULT_MAX_TIMEOUT = 90 * 24 * 60 * 60 # 3 months
+
+ def initialize
+ super "interrupted"
+ end
+
+ def put(message, opts = {})
+ now = Time.now.to_f
+
+ with_multi_connection(opts[:connection]) do |conn|
+ conn.zadd(name, now.to_s, message)
+ conn.zremrangebyscore(name, '-inf', now - self.class.timeout)
+ conn.zremrangebyrank(name, 0, - self.class.max_jobs)
+ end
+
+ true
+ end
+
+ # Yields the block inside an existing multi connection or creates a new one
+ def with_multi_connection(conn, &block)
+ return yield(conn) if conn
+
+ Sidekiq.redis do |c|
+ c.multi do |multi|
+ yield(multi)
+ end
+ end
+ end
+
+ def retry_all
+ each(&:retry) while size > 0
+ end
+
+ def self.max_jobs
+ Sidekiq.options[:interrupted_max_jobs] || DEFAULT_MAX_CAPACITY
+ end
+
+ def self.timeout
+ Sidekiq.options[:interrupted_timeout_in_seconds] || DEFAULT_MAX_TIMEOUT
+ end
+ end
+ end
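`InterruptedSet#put` uses the capped, time-bounded sorted-set pattern: `zadd` with the current timestamp as score, `zremrangebyscore` to expire old entries, and `zremrangebyrank` to cap the size. A plain-Ruby, in-memory model of the trimming (a sketch of the commands' effect on `[score, message]` pairs, not the gem's code):

```ruby
# Models the effect of the three Redis calls in InterruptedSet#put
# on an array of [score, message] pairs (score = insertion timestamp).
def put_and_trim(entries, message, now:, timeout:, max_jobs:)
  entries << [now, message]                             # zadd(name, now, message)
  entries.sort_by! { |score, _| score }
  entries.reject! { |score, _| score <= now - timeout } # zremrangebyscore('-inf', now - timeout)
  overflow = entries.size - max_jobs + 1                # zremrangebyrank(0, -max_jobs)
  entries.shift(overflow) if overflow > 0               # drops the oldest entries
  entries
end
```

One subtlety worth noting: the `0, -max_jobs` rank range removes everything up to and including rank `size - max_jobs`, so once the set reaches capacity each `put` trims it back to `max_jobs - 1` entries, the same trimming pattern Sidekiq's own Dead set uses.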
data/spec/base_reliable_fetch_spec.rb CHANGED
@@ -5,10 +5,11 @@ require 'sidekiq/reliable_fetch'
  require 'sidekiq/semi_reliable_fetch'
 
  describe Sidekiq::BaseReliableFetch do
+ let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
+
  before { Sidekiq.redis(&:flushdb) }
 
  describe 'UnitOfWork' do
- let(:job) { Sidekiq.dump_json({ class: 'Bob', args: [1, 2, 'foo'] }) }
  let(:fetcher) { Sidekiq::ReliableFetch.new(queues: ['foo']) }
 
  describe '#requeue' do
@@ -39,19 +40,42 @@ describe Sidekiq::BaseReliableFetch do
  end
 
  describe '.bulk_requeue' do
+ let!(:queue1) { Sidekiq::Queue.new('foo') }
+ let!(:queue2) { Sidekiq::Queue.new('bar') }
+
  it 'requeues the bulk' do
- queue1 = Sidekiq::Queue.new('foo')
- queue2 = Sidekiq::Queue.new('bar')
+ uow = described_class::UnitOfWork
+ jobs = [ uow.new('queue:foo', job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+ described_class.bulk_requeue(jobs, queues: [])
 
- expect(queue1.size).to eq 0
- expect(queue2.size).to eq 0
+ expect(queue1.size).to eq 2
+ expect(queue2.size).to eq 1
+ end
 
+ it 'puts jobs into interrupted queue' do
  uow = described_class::UnitOfWork
- jobs = [ uow.new('queue:foo', 'bob'), uow.new('queue:foo', 'bar'), uow.new('queue:bar', 'widget') ]
+ interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
+ jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+ described_class.bulk_requeue(jobs, queues: [])
+
+ expect(queue1.size).to eq 1
+ expect(queue2.size).to eq 1
+ expect(Sidekiq::InterruptedSet.new.size).to eq 1
+ end
+
+ it 'does not put jobs into interrupted queue if it is disabled' do
+ Sidekiq.options[:max_retries_after_interruption] = -1
+
+ uow = described_class::UnitOfWork
+ interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
+ jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
  described_class.bulk_requeue(jobs, queues: [])
 
  expect(queue1.size).to eq 2
  expect(queue2.size).to eq 1
+ expect(Sidekiq::InterruptedSet.new.size).to eq 0
+
+ Sidekiq.options[:max_retries_after_interruption] = 3
  end
  end
 
data/spec/fetch_shared_examples.rb CHANGED
@@ -4,7 +4,7 @@ shared_examples 'a Sidekiq fetcher' do
  before { Sidekiq.redis(&:flushdb) }
 
  describe '#retrieve_work' do
- let(:job) { Sidekiq.dump_json({ class: 'Bob', args: [1, 2, 'foo'] }) }
+ let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
  let(:fetcher) { described_class.new(queues: ['assigned']) }
 
  it 'retrieves the job and puts it to working queue' do
@@ -24,13 +24,13 @@ shared_examples 'a Sidekiq fetcher' do
  expect(fetcher.retrieve_work).to be_nil
  end
 
- it 'requeues jobs from dead working queue with incremented retry_count' do
+ it 'requeues jobs from dead working queue with incremented interrupted_count' do
  Sidekiq.redis do |conn|
  conn.rpush(other_process_working_queue_name('assigned'), job)
  end
 
  expected_job = Sidekiq.load_json(job)
- expected_job['retry_count'] = 1
+ expected_job['interrupted_count'] = 1
  expected_job = Sidekiq.dump_json(expected_job)
 
  uow = fetcher.retrieve_work
@@ -61,8 +61,7 @@ shared_examples 'a Sidekiq fetcher' do
  it 'does not clean up orphaned jobs more than once per cleanup interval' do
  Sidekiq.redis = Sidekiq::RedisConnection.create(url: REDIS_URL, size: 10)
 
- expect_any_instance_of(described_class)
- .to receive(:clean_working_queues!).once
+ expect(described_class).to receive(:clean_working_queues!).once
 
  threads = 10.times.map do
  Thread.new do
data/tests/README.md CHANGED
@@ -18,18 +18,20 @@ You need to have redis server running on default HTTP port `6379`. To use other
  This tool spawns configured number of Sidekiq workers and when the amount of processed jobs is about half of origin
  number it will kill all the workers with `kill -9` and then it will spawn new workers again until all the jobs are processed. To track the process and counters we use Redis keys/counters.
 
- # How to run retry tests
+ # How to run interruption tests
 
  ```
- cd retry_test
- bundle exec ruby retry_test.rb
+ cd tests/interruption
 
- # To verify that workers with "retry: false" are not retried
- bundle exec ruby no_retry_test.rb
+ # Verify "KILL" signal
+ bundle exec ruby test_kill_signal.rb
+
+ # Verify "TERM" signal
+ bundle exec ruby test_term_signal.rb
  ```
 
  It requires Redis to be running on 6379 port.
 
  ## How it works
 
- It spawns Sidekiq workers then creates a job that will kill itself after a moment. The reliable fetcher will bring it back. The purpose is to verify that job is run no more then `retry` parameter says even when job was killed.
+ It spawns Sidekiq workers, then creates a job that will kill itself after a moment. The reliable fetcher will bring it back. The purpose is to verify that the job is run no more than the allowed number of times.
data/tests/interruption/config.rb CHANGED
@@ -2,7 +2,6 @@
 
  require_relative '../../lib/sidekiq-reliable-fetch'
  require_relative 'worker'
- require_relative 'no_retry_worker'
 
  TEST_CLEANUP_INTERVAL = 20
  TEST_LEASE_INTERVAL = 5
data/tests/interruption/test_kill_signal.rb CHANGED
@@ -4,21 +4,22 @@ require 'sidekiq'
  require_relative 'config'
  require_relative '../support/utils'
 
- NUM_WORKERS = 2 # one worker will be killed and one spare worker to verify that job is not picked up
+ EXPECTED_NUM_TIMES_BEEN_RUN = 3
+ NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
 
  Sidekiq.redis(&:flushdb)
 
  pids = spawn_workers(NUM_WORKERS)
 
- jid = NoRetryTestWorker.perform_async
+ RetryTestWorker.perform_async
 
  sleep 300
 
  Sidekiq.redis do |redis|
  times_has_been_run = redis.get('times_has_been_run').to_i
- assert 'The job has been run', times_has_been_run, 1
+ assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
  end
 
- assert 'Found dead jobs', Sidekiq::DeadSet.new.size, 1
+ assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
 
  stop_workers(pids)
data/tests/interruption/test_term_signal.rb CHANGED
@@ -4,21 +4,22 @@ require 'sidekiq'
  require_relative 'config'
  require_relative '../support/utils'
 
- NUM_WORKERS = RetryTestWorker::EXPECTED_NUM_TIMES_BEEN_RUN + 1
+ EXPECTED_NUM_TIMES_BEEN_RUN = 3
+ NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
 
  Sidekiq.redis(&:flushdb)
 
  pids = spawn_workers(NUM_WORKERS)
 
- jid = RetryTestWorker.perform_async
+ RetryTestWorker.perform_async('TERM', 60)
 
  sleep 300
 
  Sidekiq.redis do |redis|
  times_has_been_run = redis.get('times_has_been_run').to_i
- assert 'The job has been run', times_has_been_run, RetryTestWorker::EXPECTED_NUM_TIMES_BEEN_RUN
+ assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
  end
 
- assert 'Found dead jobs', Sidekiq::DeadSet.new.size, 1
+ assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
 
  stop_workers(pids)
data/tests/interruption/worker.rb ADDED
@@ -0,0 +1,15 @@
+ # frozen_string_literal: true
+
+ class RetryTestWorker
+ include Sidekiq::Worker
+
+ def perform(signal = 'KILL', wait_seconds = 1)
+ Sidekiq.redis do |redis|
+ redis.incr('times_has_been_run')
+ end
+
+ Process.kill(signal, Process.pid)
+
+ sleep wait_seconds
+ end
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gitlab-sidekiq-fetcher
  version: !ruby/object:Gem::Version
- version: 0.5.1
+ version: 0.5.2
  platform: ruby
  authors:
  - TEA
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-08-06 00:00:00.000000000 Z
+ date: 2019-09-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: sidekiq
@@ -42,6 +42,7 @@ files:
  - gitlab-sidekiq-fetcher.gemspec
  - lib/sidekiq-reliable-fetch.rb
  - lib/sidekiq/base_reliable_fetch.rb
+ - lib/sidekiq/interrupted_set.rb
  - lib/sidekiq/reliable_fetch.rb
  - lib/sidekiq/semi_reliable_fetch.rb
  - spec/base_reliable_fetch_spec.rb
@@ -50,14 +51,13 @@ files:
  - spec/semi_reliable_fetch_spec.rb
  - spec/spec_helper.rb
  - tests/README.md
- - tests/reliability_test/config.rb
- - tests/reliability_test/reliability_test.rb
- - tests/reliability_test/worker.rb
- - tests/retry_test/config.rb
- - tests/retry_test/no_retry_test.rb
- - tests/retry_test/no_retry_worker.rb
- - tests/retry_test/retry_test.rb
- - tests/retry_test/worker.rb
+ - tests/interruption/config.rb
+ - tests/interruption/test_kill_signal.rb
+ - tests/interruption/test_term_signal.rb
+ - tests/interruption/worker.rb
+ - tests/reliability/config.rb
+ - tests/reliability/reliability_test.rb
+ - tests/reliability/worker.rb
  - tests/support/utils.rb
  homepage: https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/
  licenses:
@@ -78,7 +78,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubyforge_project:
+ rubygems_version: 2.5.2
  signing_key:
  specification_version: 4
  summary: Reliable fetch extension for Sidekiq
data/tests/retry_test/no_retry_worker.rb DELETED
@@ -1,21 +0,0 @@
- # frozen_string_literal: true
-
- class NoRetryTestWorker
- include Sidekiq::Worker
-
- sidekiq_options retry: false
-
- sidekiq_retry_in do |count, exception|
- 1 # retry in one second
- end
-
- def perform
- sleep 1
-
- Sidekiq.redis do |redis|
- redis.incr('times_has_been_run')
- end
-
- Process.kill('KILL', Process.pid) # Job suicide, OOM killer imitation
- end
- end
data/tests/retry_test/worker.rb DELETED
@@ -1,23 +0,0 @@
- # frozen_string_literal: true
-
- class RetryTestWorker
- include Sidekiq::Worker
-
- EXPECTED_NUM_TIMES_BEEN_RUN = 2
-
- sidekiq_options retry: EXPECTED_NUM_TIMES_BEEN_RUN
-
- sidekiq_retry_in do |count, exception|
- 1 # retry in one second
- end
-
- def perform
- sleep 1
-
- Sidekiq.redis do |redis|
- redis.incr('times_has_been_run')
- end
-
- Process.kill('KILL', Process.pid) # Job suicide, OOM killer imitation
- end
- end