gitlab-sidekiq-fetcher 0.5.0.pre.alpha → 0.7.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7da36ba54ba6a0e97cef5da210f02bdde86dc06e22960d37573e51e20046dd40
- data.tar.gz: bd3bf13edc789374109a2c18b6797b44c1440bc2665938c654e3a627bbc5bc23
+ metadata.gz: 6b9cf610a1915d63141331ba0e4820306235fc1c58e37e0124a9700d19005b99
+ data.tar.gz: 3690c4aaff9d47c8ec108ff264984343bc3a4412123ad56ac727cdf1ae3e8fd3
  SHA512:
- metadata.gz: 852f01191a052384cfe4215766c6b27dc932ced77782a92c23f73aba091b520e83d293d57b407e45ed5d141661664a9f77b88211b50e9bed52cb15a011b0347a
- data.tar.gz: 59ee3493c9ddcb3e145ce77259b85ed508f8e57ca97878c12f5427e25952d2a4418860cd1d90562d39aa693dd630359e88ac78fd882aaceb9c5c374b6ca7509a
+ metadata.gz: 6d5d61280c6db3b91c8107fca593fa12246db18c727389ef31cf8edf58e923bc78a2cbbe2be505962664ed56c8189513dc0ad0312e545988b70666839f191f66
+ data.tar.gz: cd1459179a3f97b3b3194a21e8335367e9021db590cb9441613d6d88030b1b86cc495b00e1ff37dc0cf7ea683d40e6b10d3c511c7d956a70e8c26c67a4bc0903
data/.gitignore CHANGED
@@ -1,2 +1,3 @@
  *.gem
  coverage
+ .DS_Store
@@ -25,7 +25,7 @@ rspec:
  .integration:
  stage: test
  script:
- - cd tests/reliability_test
+ - cd tests/reliability
  - bundle exec ruby reliability_test.rb
  services:
  - redis:alpine
@@ -40,21 +40,27 @@ integration_reliable:
  variables:
  JOB_FETCHER: reliable
 
-
  integration_basic:
  extends: .integration
  allow_failure: yes
  variables:
  JOB_FETCHER: basic
 
- retry_test:
+ kill_interruption:
  stage: test
  script:
- - cd tests/retry_test
- - bundle exec ruby retry_test.rb
+ - cd tests/interruption
+ - bundle exec ruby test_kill_signal.rb
  services:
  - redis:alpine
 
+ term_interruption:
+ stage: test
+ script:
+ - cd tests/interruption
+ - bundle exec ruby test_term_signal.rb
+ services:
+ - redis:alpine
 
  # rubocop:
  # script:
data/Gemfile CHANGED
@@ -7,6 +7,6 @@ git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }
  group :test do
  gem "rspec", '~> 3'
  gem "pry"
- gem "sidekiq", '~> 5.0'
+ gem "sidekiq", '~> 6.1'
  gem 'simplecov', require: false
  end
@@ -2,7 +2,7 @@ GEM
  remote: https://rubygems.org/
  specs:
  coderay (1.1.2)
- connection_pool (2.2.2)
+ connection_pool (2.2.3)
  diff-lcs (1.3)
  docile (1.3.1)
  json (2.1.0)
@@ -10,10 +10,8 @@ GEM
  pry (0.11.3)
  coderay (~> 1.1.0)
  method_source (~> 0.9.0)
- rack (2.0.5)
- rack-protection (2.0.4)
- rack
- redis (4.0.2)
+ rack (2.2.3)
+ redis (4.2.1)
  rspec (3.8.0)
  rspec-core (~> 3.8.0)
  rspec-expectations (~> 3.8.0)
@@ -27,10 +25,10 @@ GEM
  diff-lcs (>= 1.2.0, < 2.0)
  rspec-support (~> 3.8.0)
  rspec-support (3.8.0)
- sidekiq (5.2.2)
- connection_pool (~> 2.2, >= 2.2.2)
- rack-protection (>= 1.5.0)
- redis (>= 3.3.5, < 5)
+ sidekiq (6.1.0)
+ connection_pool (>= 2.2.2)
+ rack (~> 2.0)
+ redis (>= 4.2.0)
  simplecov (0.16.1)
  docile (~> 1.1)
  json (>= 1.8, < 3)
@@ -43,8 +41,8 @@ PLATFORMS
  DEPENDENCIES
  pry
  rspec (~> 3)
- sidekiq (~> 5.0)
+ sidekiq (~> 6.1)
  simplecov
 
  BUNDLED WITH
- 1.17.1
+ 1.17.2
data/README.md CHANGED
@@ -6,10 +6,23 @@ fetches from Redis.
 
  It's based on https://github.com/TEA-ebook/sidekiq-reliable-fetch.
 
+ **IMPORTANT NOTE:** Since version `0.7.0` this gem works only with `sidekiq >= 6.1` (which introduced breaking changes to the Fetch API). Please use version `~> 0.5` if you use an older version of `sidekiq`.
+
  There are two strategies implemented: [Reliable fetch](http://redis.io/commands/rpoplpush#pattern-reliable-queue) using `rpoplpush` command and
  semi-reliable fetch that uses regular `brpop` and `lpush` to pick the job and put it to working queue. The main benefit of "Reliable" strategy is that `rpoplpush` is atomic, eliminating a race condition in which jobs can be lost.
  However, it comes at a cost because `rpoplpush` can't watch multiple lists at the same time so we need to iterate over the entire queue list which significantly increases pressure on Redis when there are more than a few queues. The "semi-reliable" strategy is much more reliable than the default Sidekiq fetcher, though. Compared to the reliable fetch strategy, it does not increase pressure on Redis significantly.
 
+ ### Interruption handling
+
+ Sidekiq expects every job to report success or failure. In the latter case, Sidekiq stores a `retry_count` counter
+ in the job and keeps re-running it until the counter reaches the maximum allowed value. When the job has
+ not been given a chance to finish its work (to report success or failure), for example, when it was killed forcibly or requeued after receiving a TERM signal, the standard retry mechanism does not kick in and the job would be retried indefinitely. This is why Reliable Fetcher maintains a special counter, `interrupted_count`,
+ which is used to limit the number of such retries. In both cases, Reliable Fetcher increments `interrupted_count` and rejects the job from running again when the counter exceeds `max_retries_after_interruption` (default: 3).
+ Such a job is put into the `interrupted` queue. This queue behaves much like the Sidekiq Dead queue, so it only stores a limited number of jobs for a limited time. As with the Dead queue, the limits are configurable via the `interrupted_max_jobs` (default: 10_000) and `interrupted_timeout_in_seconds` (default: 3 months) Sidekiq option keys.
+
+ You can also disable the special handling of interrupted jobs by setting `max_retries_after_interruption` to `-1`.
+ In this case, interrupted jobs will be retried without any limits from Reliable Fetcher and they won't be put into the interrupted queue.
+
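For reference, the option keys described above can be set from a Sidekiq server initializer. A minimal sketch, not part of this diff, assuming the usual pattern of merging keys into the server's `config.options` (values shown are the documented defaults):

```ruby
# Sketch: tuning the interruption limits the fetcher reads from Sidekiq options.
Sidekiq.configure_server do |config|
  config.options[:max_retries_after_interruption] = 3      # -1 disables the limit entirely
  config.options[:interrupted_max_jobs] = 10_000           # cap on the interrupted queue size
  config.options[:interrupted_timeout_in_seconds] = 90 * 24 * 60 * 60 # 3 months
end
```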
 
  ## Installation
 
@@ -1,14 +1,14 @@
  Gem::Specification.new do |s|
- s.name = 'gitlab-sidekiq-fetcher'
- s.version = '0.5.0-alpha'
- s.authors = ['TEA', 'GitLab']
- s.email = 'valery@gitlab.com'
- s.license = 'LGPL-3.0'
- s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
- s.summary = 'Reliable fetch extension for Sidekiq'
- s.description = 'Redis reliable queue pattern implemented in Sidekiq'
+ s.name = 'gitlab-sidekiq-fetcher'
+ s.version = '0.7.0'
+ s.authors = ['TEA', 'GitLab']
+ s.email = 'valery@gitlab.com'
+ s.license = 'LGPL-3.0'
+ s.homepage = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
+ s.summary = 'Reliable fetch extension for Sidekiq'
+ s.description = 'Redis reliable queue pattern implemented in Sidekiq'
  s.require_paths = ['lib']
- s.files = `git ls-files`.split($\)
- s.test_files = []
- s.add_dependency 'sidekiq', '~> 5'
+ s.files = `git ls-files`.split($\)
+ s.test_files = []
+ s.add_dependency 'sidekiq', '~> 6.1'
  end
@@ -1,4 +1,5 @@
  require 'sidekiq'
+ require 'sidekiq/api'
 
  require_relative 'sidekiq/base_reliable_fetch'
  require_relative 'sidekiq/reliable_fetch'
@@ -1,6 +1,6 @@
  # frozen_string_literal: true
 
- require 'sidekiq/job_retry'
+ require_relative 'interrupted_set'
 
  module Sidekiq
  class BaseReliableFetch
@@ -18,6 +18,9 @@ module Sidekiq
  # Defines the COUNT parameter that will be passed to Redis SCAN command
  SCAN_COUNT = 1000
 
+ # How many times a job can be interrupted
+ DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3
+
  UnitOfWork = Struct.new(:queue, :job) do
  def acknowledge
  Sidekiq.redis { |conn| conn.lrem(Sidekiq::BaseReliableFetch.working_queue_name(queue), 1, job) }
@@ -38,11 +41,13 @@ module Sidekiq
  end
 
  def self.setup_reliable_fetch!(config)
- config.options[:fetch] = if config.options[:semi_reliable_fetch]
- Sidekiq::SemiReliableFetch
- else
- Sidekiq::ReliableFetch
- end
+ fetch_strategy = if config.options[:semi_reliable_fetch]
+ Sidekiq::SemiReliableFetch
+ else
+ Sidekiq::ReliableFetch
+ end
+
+ config.options[:fetch] = fetch_strategy.new(config.options)
 
  Sidekiq.logger.info('GitLab reliable fetch activated!')
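This change tracks the Sidekiq 6.1 Fetch API, which expects `options[:fetch]` to hold a fetcher *instance* rather than a class. A plain-Ruby sketch of the selection logic above; the two fetcher classes here are stand-ins for `Sidekiq::ReliableFetch` and `Sidekiq::SemiReliableFetch`:

```ruby
# Stand-in fetcher classes, instantiated with the options hash like the real ones.
class ReliableFetch
  def initialize(options)
    @options = options
  end
end

class SemiReliableFetch
  def initialize(options)
    @options = options
  end
end

# Pick the strategy class from the options, then store an *instance* of it,
# as Sidekiq >= 6.1 requires.
def setup_fetch_strategy(options)
  strategy = options[:semi_reliable_fetch] ? SemiReliableFetch : ReliableFetch
  options[:fetch] = strategy.new(options)
end

options = { semi_reliable_fetch: true }
setup_fetch_strategy(options)
options[:fetch].class # => SemiReliableFetch
```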
 
@@ -81,23 +86,8 @@ module Sidekiq
  Sidekiq.logger.debug("Heartbeat for hostname: #{hostname} and pid: #{pid}")
  end
 
- def self.bulk_requeue(inprogress, _options)
- return if inprogress.empty?
-
- Sidekiq.logger.debug('Re-queueing terminated jobs')
-
- Sidekiq.redis do |conn|
- inprogress.each do |unit_of_work|
- conn.multi do |multi|
- multi.lpush(unit_of_work.queue, unit_of_work.job)
- multi.lrem(working_queue_name(unit_of_work.queue), 1, unit_of_work.job)
- end
- end
- end
-
- Sidekiq.logger.info("Pushed #{inprogress.size} jobs back to Redis")
- rescue => e
- Sidekiq.logger.warn("Failed to requeue #{inprogress.size} jobs: #{e.message}")
+ def self.worker_dead?(hostname, pid, conn)
+ !conn.get(heartbeat_key(hostname, pid))
  end
 
  def self.heartbeat_key(hostname, pid)
@@ -113,6 +103,8 @@ module Sidekiq
  :strictly_ordered_queues
 
  def initialize(options)
+ raise ArgumentError, 'missing queue list' unless options[:queues]
+
  @cleanup_interval = options.fetch(:cleanup_interval, DEFAULT_CLEANUP_INTERVAL)
  @lease_interval = options.fetch(:lease_interval, DEFAULT_LEASE_INTERVAL)
  @last_try_to_take_lease_at = 0
@@ -128,73 +120,56 @@ module Sidekiq
 
  def retrieve_unit_of_work
  raise NotImplementedError,
- "#{self.class} does not implement #{__method__}"
+ "#{self.class} does not implement #{__method__}"
  end
 
- private
-
- def clean_working_queue!(working_queue)
- original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
+ def bulk_requeue(inprogress, _options)
+ return if inprogress.empty?
 
  Sidekiq.redis do |conn|
- count = 0
-
- while job = conn.rpop(working_queue)
- msg = begin
- Sidekiq.load_json(job)
- rescue => e
- Sidekiq.logger.info("Skipped job: #{job} as we couldn't parse it")
- next
- end
-
- msg['retry_count'] = msg['retry_count'].to_i + 1
-
- if retries_exhausted?(msg)
- send_to_morgue(msg)
- else
- job = Sidekiq.dump_json(msg)
-
- conn.lpush(original_queue, job)
+ inprogress.each do |unit_of_work|
+ conn.multi do |multi|
+ preprocess_interrupted_job(unit_of_work.job, unit_of_work.queue, multi)
 
- count += 1
+ multi.lrem(self.class.working_queue_name(unit_of_work.queue), 1, unit_of_work.job)
  end
  end
-
- Sidekiq.logger.info("Requeued #{count} dead jobs to #{original_queue}")
  end
+ rescue => e
+ Sidekiq.logger.warn("Failed to requeue #{inprogress.size} jobs: #{e.message}")
  end
 
- def retries_exhausted?(msg)
- max_retries_default = Sidekiq.options.fetch(:max_retries, Sidekiq::JobRetry::DEFAULT_MAX_RETRY_ATTEMPTS)
-
- max_retry_attempts = retry_attempts_from(msg['retry'], max_retries_default)
+ private
 
- msg['retry_count'] >= max_retry_attempts
- end
+ def preprocess_interrupted_job(job, queue, conn = nil)
+ msg = Sidekiq.load_json(job)
+ msg['interrupted_count'] = msg['interrupted_count'].to_i + 1
 
- def retry_attempts_from(msg_retry, default)
- if msg_retry.is_a?(Integer)
- msg_retry
+ if interruption_exhausted?(msg)
+ send_to_quarantine(msg, conn)
  else
- default
+ requeue_job(queue, msg, conn)
  end
  end
 
- def send_to_morgue(msg)
- Sidekiq.logger.warn(
- class: msg['class'],
+ # If you want this method to run in the scope of a MULTI connection,
+ # you need to pass it in
+ def requeue_job(queue, msg, conn)
+ with_connection(conn) do |conn|
+ conn.lpush(queue, Sidekiq.dump_json(msg))
+ end
+
+ Sidekiq.logger.info(
+ message: "Pushed job #{msg['jid']} back to queue #{queue}",
  jid: msg['jid'],
- message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']})
+ queue: queue
  )
-
- payload = Sidekiq.dump_json(msg)
- Sidekiq::DeadSet.new.kill(payload, notify_failure: false)
  end
 
  # Detect "old" jobs and requeue them because the worker they were assigned
  # to probably failed miserably.
  def clean_working_queues!
- Sidekiq.logger.info("Cleaning working queues")
+ Sidekiq.logger.info('Cleaning working queues')
 
  Sidekiq.redis do |conn|
  conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
@@ -203,15 +178,56 @@ module Sidekiq
 
  continue if hostname.nil? || pid.nil?
 
- clean_working_queue!(key) if worker_dead?(hostname, pid)
+ clean_working_queue!(key) if self.class.worker_dead?(hostname, pid, conn)
  end
  end
  end
 
- def worker_dead?(hostname, pid)
+ def clean_working_queue!(working_queue)
+ original_queue = working_queue.gsub(/#{WORKING_QUEUE_PREFIX}:|:[^:]*:[0-9]*\z/, '')
+
  Sidekiq.redis do |conn|
- !conn.get(self.class.heartbeat_key(hostname, pid))
+ while job = conn.rpop(working_queue)
+ preprocess_interrupted_job(job, original_queue)
+ end
+ end
+ end
+
+ def interruption_exhausted?(msg)
+ return false if max_retries_after_interruption(msg['class']) < 0
+
+ msg['interrupted_count'].to_i >= max_retries_after_interruption(msg['class'])
+ end
+
+ def max_retries_after_interruption(worker_class)
+ max_retries_after_interruption = nil
+
+ max_retries_after_interruption ||= begin
+ Object.const_get(worker_class).sidekiq_options[:max_retries_after_interruption]
+ rescue NameError
  end
+
+ max_retries_after_interruption ||= Sidekiq.options[:max_retries_after_interruption]
+ max_retries_after_interruption ||= DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION
+ max_retries_after_interruption
+ end
+
+ def send_to_quarantine(msg, multi_connection = nil)
+ Sidekiq.logger.warn(
+ class: msg['class'],
+ jid: msg['jid'],
+ message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']} to interrupted queue)
+ )
+
+ job = Sidekiq.dump_json(msg)
+ Sidekiq::InterruptedSet.new.put(job, connection: multi_connection)
+ end
+
+ # Yields the block with an existing connection or creates a new one
+ def with_connection(conn)
+ return yield(conn) if conn
+
+ Sidekiq.redis { |redis_conn| yield(redis_conn) }
  end
 
  def take_lease
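The `max_retries_after_interruption` lookup added above is a three-level fallback: the worker's own `sidekiq_options`, then the global Sidekiq option, then the gem default. The chain in isolation, with the worker-level and global values passed in as plain arguments for illustration:

```ruby
# Sketch of the fallback chain: worker-level setting wins over the global
# option, which wins over the default of 3. A value of -1 means "no limit".
DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3

def max_retries_after_interruption(worker_setting, global_setting)
  worker_setting || global_setting || DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION
end

max_retries_after_interruption(nil, nil) # => 3  (default)
max_retries_after_interruption(nil, 5)   # => 5  (global option)
max_retries_after_interruption(-1, 5)    # => -1 (worker opts out entirely)
```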
@@ -0,0 +1,47 @@
+ require 'sidekiq/api'
+
+ module Sidekiq
+ class InterruptedSet < ::Sidekiq::JobSet
+ DEFAULT_MAX_CAPACITY = 10_000
+ DEFAULT_MAX_TIMEOUT = 90 * 24 * 60 * 60 # 3 months
+
+ def initialize
+ super "interrupted"
+ end
+
+ def put(message, opts = {})
+ now = Time.now.to_f
+
+ with_multi_connection(opts[:connection]) do |conn|
+ conn.zadd(name, now.to_s, message)
+ conn.zremrangebyscore(name, '-inf', now - self.class.timeout)
+ conn.zremrangebyrank(name, 0, - self.class.max_jobs)
+ end
+
+ true
+ end
+
+ # Yields the block inside an existing MULTI connection or creates a new one
+ def with_multi_connection(conn, &block)
+ return yield(conn) if conn
+
+ Sidekiq.redis do |c|
+ c.multi do |multi|
+ yield(multi)
+ end
+ end
+ end
+
+ def retry_all
+ each(&:retry) while size > 0
+ end
+
+ def self.max_jobs
+ Sidekiq.options[:interrupted_max_jobs] || DEFAULT_MAX_CAPACITY
+ end
+
+ def self.timeout
+ Sidekiq.options[:interrupted_timeout_in_seconds] || DEFAULT_MAX_TIMEOUT
+ end
+ end
+ end
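The `put` method keeps the sorted set bounded in two dimensions: entries older than the timeout are dropped by score, and the set is capped by rank. A plain-Ruby model of that trimming, with an array of `[score, message]` pairs standing in for the Redis sorted set (the real `ZREMRANGEBYRANK` cap differs slightly at the boundary; this is an approximation):

```ruby
# Model of InterruptedSet#put trimming. `set` is ordered oldest-first;
# TIMEOUT_SECONDS / MAX_JOBS mirror the class-level timeout and max_jobs.
TIMEOUT_SECONDS = 60
MAX_JOBS = 3

def put(set, message, now)
  set << [now, message]                                    # conn.zadd(name, now, message)
  set.reject! { |score, _| score < now - TIMEOUT_SECONDS } # zremrangebyscore('-inf', now - timeout)
  set.shift(set.size - MAX_JOBS) if set.size > MAX_JOBS    # rank-based cap, as zremrangebyrank
  set
end

set = []
%w[a b c d].each_with_index { |msg, i| put(set, msg, i.to_f) }
set.map(&:last) # => ["b", "c", "d"]  (oldest entry dropped by the capacity cap)
put(set, 'e', 100.0)
set.map(&:last) # => ["e"]            (older entries dropped by the timeout)
```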
@@ -6,23 +6,21 @@ module Sidekiq
  # we inject a regular sleep into the loop.
  RELIABLE_FETCH_IDLE_TIMEOUT = 5 # seconds
 
- attr_reader :queues_iterator, :queues_size
+ attr_reader :queues_size
 
  def initialize(options)
  super
 
+ @queues = queues.uniq if strictly_ordered_queues
  @queues_size = queues.size
- @queues_iterator = queues.cycle
  end
 
  private
 
  def retrieve_unit_of_work
- @queues_iterator.rewind if strictly_ordered_queues
-
- queues_size.times do
- queue = queues_iterator.next
+ queues_list = strictly_ordered_queues ? queues : queues.shuffle
 
+ queues_list.each do |queue|
  work = Sidekiq.redis do |conn|
  conn.rpoplpush(queue, self.class.working_queue_name(queue))
  end
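The shared iterator is replaced by building the polling order on each pass: strict ordering polls the configured (deduplicated) queue list in order, while the default shuffles each pass so queues listed last are not starved. A minimal sketch of that per-pass ordering:

```ruby
# Sketch of the per-pass queue order used by retrieve_unit_of_work above.
def polling_order(queues, strictly_ordered)
  strictly_ordered ? queues : queues.shuffle
end

polling_order(%w[high default low], true)  # => ["high", "default", "low"]
polling_order(%w[high default low], false) # same queues, random order each pass
```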
@@ -5,10 +5,11 @@ require 'sidekiq/reliable_fetch'
  require 'sidekiq/semi_reliable_fetch'
 
  describe Sidekiq::BaseReliableFetch do
+ let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
+
  before { Sidekiq.redis(&:flushdb) }
 
  describe 'UnitOfWork' do
- let(:job) { Sidekiq.dump_json({ class: 'Bob', args: [1, 2, 'foo'] }) }
  let(:fetcher) { Sidekiq::ReliableFetch.new(queues: ['foo']) }
 
  describe '#requeue' do
@@ -38,25 +39,49 @@ describe Sidekiq::BaseReliableFetch do
  end
  end
 
- describe '.bulk_requeue' do
+ describe '#bulk_requeue' do
+ let(:options) { { queues: %w[foo bar] } }
+ let!(:queue1) { Sidekiq::Queue.new('foo') }
+ let!(:queue2) { Sidekiq::Queue.new('bar') }
+
  it 'requeues the bulk' do
- queue1 = Sidekiq::Queue.new('foo')
- queue2 = Sidekiq::Queue.new('bar')
+ uow = described_class::UnitOfWork
+ jobs = [ uow.new('queue:foo', job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+ described_class.new(options).bulk_requeue(jobs, nil)
 
- expect(queue1.size).to eq 0
- expect(queue2.size).to eq 0
+ expect(queue1.size).to eq 2
+ expect(queue2.size).to eq 1
+ end
 
+ it 'puts jobs into interrupted queue' do
  uow = described_class::UnitOfWork
- jobs = [ uow.new('queue:foo', 'bob'), uow.new('queue:foo', 'bar'), uow.new('queue:bar', 'widget') ]
- described_class.bulk_requeue(jobs, queues: [])
+ interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
+ jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+ described_class.new(options).bulk_requeue(jobs, nil)
+
+ expect(queue1.size).to eq 1
+ expect(queue2.size).to eq 1
+ expect(Sidekiq::InterruptedSet.new.size).to eq 1
+ end
+
+ it 'does not put jobs into interrupted queue if it is disabled' do
+ Sidekiq.options[:max_retries_after_interruption] = -1
+
+ uow = described_class::UnitOfWork
+ interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
+ jobs = [ uow.new('queue:foo', interrupted_job), uow.new('queue:foo', job), uow.new('queue:bar', job) ]
+ described_class.new(options).bulk_requeue(jobs, nil)
 
  expect(queue1.size).to eq 2
  expect(queue2.size).to eq 1
+ expect(Sidekiq::InterruptedSet.new.size).to eq 0
+
+ Sidekiq.options[:max_retries_after_interruption] = 3
  end
  end
 
  it 'sets heartbeat' do
- config = double(:sidekiq_config, options: {})
+ config = double(:sidekiq_config, options: { queues: %w[foo bar] })
 
  heartbeat_thread = described_class.setup_reliable_fetch!(config)
 
@@ -4,8 +4,8 @@ shared_examples 'a Sidekiq fetcher' do
  before { Sidekiq.redis(&:flushdb) }
 
  describe '#retrieve_work' do
- let(:job) { Sidekiq.dump_json({ class: 'Bob', args: [1, 2, 'foo'] }) }
- let(:fetcher) { described_class.new(queues: ['assigned']) }
+ let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
+ let(:fetcher) { described_class.new(queues: queues) }
 
  it 'retrieves the job and puts it to working queue' do
  Sidekiq.redis { |conn| conn.rpush('queue:assigned', job) }
@@ -24,13 +24,13 @@ shared_examples 'a Sidekiq fetcher' do
  expect(fetcher.retrieve_work).to be_nil
  end
 
- it 'requeues jobs from dead working queue with incremented retry_count' do
+ it 'requeues jobs from dead working queue with incremented interrupted_count' do
  Sidekiq.redis do |conn|
  conn.rpush(other_process_working_queue_name('assigned'), job)
  end
 
  expected_job = Sidekiq.load_json(job)
- expected_job['retry_count'] = 1
+ expected_job['interrupted_count'] = 1
  expected_job = Sidekiq.dump_json(expected_job)
 
  uow = fetcher.retrieve_work
@@ -61,12 +61,11 @@ shared_examples 'a Sidekiq fetcher' do
  it 'does not clean up orphaned jobs more than once per cleanup interval' do
  Sidekiq.redis = Sidekiq::RedisConnection.create(url: REDIS_URL, size: 10)
 
- expect_any_instance_of(described_class)
- .to receive(:clean_working_queues!).once
+ expect(fetcher).to receive(:clean_working_queues!).once
 
  threads = 10.times.map do
  Thread.new do
- described_class.new(queues: ['assigned']).retrieve_work
+ fetcher.retrieve_work
  end
  end
 
@@ -18,15 +18,20 @@ You need to have redis server running on default HTTP port `6379`. To use other
  This tool spawns configured number of Sidekiq workers and when the amount of processed jobs is about half of origin
  number it will kill all the workers with `kill -9` and then it will spawn new workers again until all the jobs are processed. To track the process and counters we use Redis keys/counters.
 
- # How to run retry tests
+ # How to run interruption tests
 
  ```
- cd retry_test
- bundle exec ruby retry_test.rb
+ cd tests/interruption
+
+ # Verify "KILL" signal
+ bundle exec ruby test_kill_signal.rb
+
+ # Verify "TERM" signal
+ bundle exec ruby test_term_signal.rb
  ```
 
  It requires Redis to be running on 6379 port.
 
  ## How it works
 
- It spawns Sidekiq workers then creates a job that will kill itself after a moment. The reliable fetcher will bring it back. The purpose is to verify that job is run no more then `retry` parameter says even when job was killed.
+ It spawns Sidekiq workers, then creates a job that will kill itself after a moment. The reliable fetcher will bring it back. The purpose is to verify that the job is run no more than the allowed number of times.
@@ -0,0 +1,25 @@
+ # frozen_string_literal: true
+
+ require 'sidekiq'
+ require_relative 'config'
+ require_relative '../support/utils'
+
+ EXPECTED_NUM_TIMES_BEEN_RUN = 3
+ NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
+
+ Sidekiq.redis(&:flushdb)
+
+ pids = spawn_workers(NUM_WORKERS)
+
+ RetryTestWorker.perform_async
+
+ sleep 300
+
+ Sidekiq.redis do |redis|
+ times_has_been_run = redis.get('times_has_been_run').to_i
+ assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
+ end
+
+ assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
+
+ stop_workers(pids)
@@ -0,0 +1,25 @@
+ # frozen_string_literal: true
+
+ require 'sidekiq'
+ require_relative 'config'
+ require_relative '../support/utils'
+
+ EXPECTED_NUM_TIMES_BEEN_RUN = 3
+ NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1
+
+ Sidekiq.redis(&:flushdb)
+
+ pids = spawn_workers(NUM_WORKERS)
+
+ RetryTestWorker.perform_async('TERM', 60)
+
+ sleep 300
+
+ Sidekiq.redis do |redis|
+ times_has_been_run = redis.get('times_has_been_run').to_i
+ assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
+ end
+
+ assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1
+
+ stop_workers(pids)
@@ -0,0 +1,15 @@
+ # frozen_string_literal: true
+
+ class RetryTestWorker
+ include Sidekiq::Worker
+
+ def perform(signal = 'KILL', wait_seconds = 1)
+ Sidekiq.redis do |redis|
+ redis.incr('times_has_been_run')
+ end
+
+ Process.kill(signal, Process.pid)
+
+ sleep wait_seconds
+ end
+ end
@@ -57,7 +57,7 @@ end
  def spawn_workers
  pids = []
  NUMBER_OF_WORKERS.times do
- pids << spawn('sidekiq -r ./config.rb')
+ pids << spawn('sidekiq -q default -q low -q high -r ./config.rb')
  end
 
  pids
@@ -0,0 +1,14 @@
+ # frozen_string_literal: true
+
+ class ReliabilityTestWorker
+ include Sidekiq::Worker
+
+ def perform
+ # To mimic long running job and to increase the probability of losing the job
+ sleep 1
+
+ Sidekiq.redis do |redis|
+ redis.lpush(REDIS_FINISHED_LIST, jid)
+ end
+ end
+ end
@@ -0,0 +1,26 @@
+ def assert(text, actual, expected)
+ if actual == expected
+ puts "#{text}: #{actual} (Success)"
+ else
+ puts "#{text}: #{actual} (Failed). Expected: #{expected}"
+ exit 1
+ end
+ end
+
+ def spawn_workers(number)
+ pids = []
+
+ number.times do
+ pids << spawn('sidekiq -q default -q high -q low -r ./config.rb')
+ end
+
+ pids
+ end
+
+ # Stop Sidekiq workers
+ def stop_workers(pids)
+ pids.each do |pid|
+ Process.kill('KILL', pid)
+ Process.wait pid
+ end
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gitlab-sidekiq-fetcher
  version: !ruby/object:Gem::Version
- version: 0.5.0.pre.alpha
+ version: 0.7.0
  platform: ruby
  authors:
  - TEA
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-08-02 00:00:00.000000000 Z
+ date: 2020-07-30 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: sidekiq
@@ -17,14 +17,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '5'
+ version: '6.1'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '5'
+ version: '6.1'
  description: Redis reliable queue pattern implemented in Sidekiq
  email: valery@gitlab.com
  executables: []
@@ -42,6 +42,7 @@ files:
  - gitlab-sidekiq-fetcher.gemspec
  - lib/sidekiq-reliable-fetch.rb
  - lib/sidekiq/base_reliable_fetch.rb
+ - lib/sidekiq/interrupted_set.rb
  - lib/sidekiq/reliable_fetch.rb
  - lib/sidekiq/semi_reliable_fetch.rb
  - spec/base_reliable_fetch_spec.rb
@@ -50,13 +51,14 @@ files:
  - spec/semi_reliable_fetch_spec.rb
  - spec/spec_helper.rb
  - tests/README.md
- - tests/reliability_test/config.rb
- - tests/reliability_test/reliability_test.rb
- - tests/reliability_test/worker.rb
- - tests/retry_test/config.rb
- - tests/retry_test/retry_test.rb
- - tests/retry_test/simple_assert.rb
- - tests/retry_test/worker.rb
+ - tests/interruption/config.rb
+ - tests/interruption/test_kill_signal.rb
+ - tests/interruption/test_term_signal.rb
+ - tests/interruption/worker.rb
+ - tests/reliability/config.rb
+ - tests/reliability/reliability_test.rb
+ - tests/reliability/worker.rb
+ - tests/support/utils.rb
  homepage: https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/
  licenses:
  - LGPL-3.0
@@ -72,11 +74,11 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 1.3.1
+ version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubygems_version: 3.0.6
  signing_key:
  specification_version: 4
  summary: Reliable fetch extension for Sidekiq
@@ -1,26 +0,0 @@
- # frozen_string_literal: true
-
- class ReliabilityTestWorker
- include Sidekiq::Worker
-
- def perform
- # To mimic long running job and to increase the probability of losing the job
- sleep 1
-
- Sidekiq.redis do |redis|
- redis.lpush(REDIS_FINISHED_LIST, get_sidekiq_job_id)
- end
- end
-
- def get_sidekiq_job_id
- context_data = Thread.current[:sidekiq_context]&.first
-
- return unless context_data
-
- index = context_data.index('JID-')
-
- return unless index
-
- context_data[index + 4..-1]
- end
- end
@@ -1,40 +0,0 @@
- # frozen_string_literal: true
-
- require 'sidekiq'
- require 'sidekiq/util'
- require 'sidekiq/cli'
- require_relative 'config'
- require_relative 'simple_assert'
-
- NUM_WORKERS = RetryTestWorker::EXPECTED_NUM_TIMES_BEEN_RUN + 1
-
- Sidekiq.redis(&:flushdb)
-
- def spawn_workers
- pids = []
-
- NUM_WORKERS.times do
- pids << spawn('sidekiq -r ./config.rb')
- end
-
- pids
- end
-
- pids = spawn_workers
-
- jid = RetryTestWorker.perform_async
-
- sleep 300
-
- Sidekiq.redis do |redis|
- times_has_been_run = redis.get('times_has_been_run').to_i
- assert "The job has been run", times_has_been_run, 2
- end
-
- assert "Found dead jobs", Sidekiq::DeadSet.new.size, 1
-
- # Stop Sidekiq workers
- pids.each do |pid|
- Process.kill('KILL', pid)
- Process.wait pid
- end
@@ -1,8 +0,0 @@
- def assert(text, actual, expected)
- if actual == expected
- puts "#{text}: #{actual} (Success)"
- else
- puts "#{text}: #{actual} (Failed). Expected: #{expected}"
- exit 1
- end
- end
@@ -1,23 +0,0 @@
- # frozen_string_literal: true
-
- class RetryTestWorker
- include Sidekiq::Worker
-
- EXPECTED_NUM_TIMES_BEEN_RUN = 2
-
- sidekiq_options retry: EXPECTED_NUM_TIMES_BEEN_RUN
-
- sidekiq_retry_in do |count, exception|
- 1 # retry in one second
- end
-
- def perform
- sleep 1
-
- Sidekiq.redis do |redis|
- redis.incr('times_has_been_run')
- end
-
- Process.kill('KILL', Process.pid) # Job suicide, OOM killer imitation
- end
- end