resque_stuck_queue 0.1.1 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +27 -6
- data/lib/resque_stuck_queue.rb +22 -24
- data/lib/resque_stuck_queue/config.rb +3 -3
- data/lib/resque_stuck_queue/heartbeat_job.rb +14 -0
- data/lib/resque_stuck_queue/version.rb +1 -1
- data/test/test_helper.rb +0 -1
- data/test/test_integration.rb +1 -2
- data/test/test_lagtime.rb +0 -1
- data/test/test_set_custom_refresh_job.rb +3 -3
- metadata +3 -4
- data/test/resque/refresh_latest_timestamp.rb +0 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 8441e82a5f962f49c9de740fd9a843c50b3125458c69f08598b210774b2920f6bac676f87a748c095aafdcc86799934229a03ba5c2cc0c8a906f3834f484cc7c
+data.tar.gz: 824e77bbf4ab0c7fa6b150e69ed0a4845d4e6a18022715b51880e60af7ba189be704277d0776aa91bbe39003b09f5b8f22895421484b10bb6c454b73f77391d7
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ec5948be0403dbafd9958e4dbb650f88ebd15de2
+data.tar.gz: a97a2968698d72fa37b14d1ba3eabe7e9345d9cf
data/README.md
CHANGED
@@ -8,6 +8,8 @@ This is to be used to satisfy an ops problem. There have been cases resque proce
 
 If resque doesn't run jobs in specific queues (defaults to `@queue = :app`) within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you.
 
+It will also fire a proc to notify you when it's recovered.
+
 ## How it works
 
 When you call `start` you are essentially starting two threads that will continiously run until `stop` is called or until the process shuts down.
@@ -16,18 +18,33 @@ One thread is responsible for pushing a 'heartbeat' job to resque which will ess
 
 The other thread is a continious loop that will check redis (bypassing resque) for that key and check what the latest time the hearbeat job successfully updated that key.
 
-
+StuckQueue will trigger a pre-defined proc if the queue is lagging according to the times you've configured (see below).
+
+After firing the proc, it will continue to monitor the queue, but won't call the proc again until the queue is found to be good again (it will then call a different "recovered" handler).
+
+By calling the recovered proc, it will then complain again the next time the lag is found.
 
-##
+## Configuration Options
 
-Configure it first
+Configure it first via something like:
 
 <pre>
-
+Resque::StuckQueue.config[:triggered_handler] = proc { send_email }
+</pre>
+
+Configuration settings are below. You'll most likely at the least want to tune `:triggered_handler`,`:heartbeat` and `:trigger_timeout` settings.
+
+<pre>
+triggered_handler:
 set to what gets triggered when resque-stuck-queue will detect the latest heartbeat is older than the trigger_timeout time setting.
 Example:
 Resque::StuckQueue.config[:triggered_handler] = proc { |queue_name, lagtime| send_email('queue #{queue_name} isnt working, aaah the daemons') }
 
+recovered_handler:
+set to what gets triggered when resque-stuck-queue has triggered a problem, but then detects the queue went back down to functioning well again (it wont trigger again until it has recovered).
+Example:
+Resque::StuckQueue.config[:recovered_handler] = proc { |queue_name, lagtime| send_email('phew, queue #{queue_name} is ok') }
+
 heartbeat:
 set to how often to push that 'heartbeat' job to refresh the latest time it worked.
 Example:
@@ -44,6 +61,9 @@ redis:
 heartbeat_key:
 optional, name of keys to keep track of the last good resque heartbeat time
 
+triggered_key:
+optional, name of keys to keep track of the last trigger time
+
 logger:
 optional, pass a Logger. Default a ruby logger will be instantiated. Needs to respond to that interface.
 
@@ -55,9 +75,10 @@ abort_on_exception:
 
 refresh_job:
 optional, your own custom refreshing job. if you are using something other than resque
+
 </pre>
 
-
+To start it:
 
 <pre>
 Resque::StuckQueue.start # blocking
@@ -121,7 +142,7 @@ class CustomJob
 end
 end
 
-Resque::StuckQueue.config[:
+Resque::StuckQueue.config[:heartbeat_job] = proc {
 # or however else you enque your custom job, Sidekiq::Client.enqueue(CustomJob), whatever, etc.
 CustomJob.perform_async
 }
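The README changes above describe a latch: the triggered proc fires once when the lag crosses the timeout, stays quiet while the queue remains stuck, and the recovered proc fires once when the lag drops back. A minimal sketch of that behaviour follows. It assumes a hypothetical `LagLatch` class that is not part of the gem, takes the lag as a plain number of seconds instead of reading it from Redis, and uses `>=` for the threshold comparison, which the diff does not confirm.

```ruby
# Hypothetical illustration only: LagLatch is not part of resque_stuck_queue.
class LagLatch
  def initialize(trigger_timeout:, triggered_handler:, recovered_handler:)
    @trigger_timeout   = trigger_timeout
    @triggered_handler = triggered_handler
    @recovered_handler = recovered_handler
    @triggered         = false # true while the queue is considered stuck
  end

  # Feed in the current lag (seconds since the last good heartbeat).
  def check(queue_name, lag)
    if !@triggered && lag >= @trigger_timeout
      # First time the lag crosses the threshold: fire once, then stay quiet.
      @triggered = true
      @triggered_handler.call(queue_name, lag)
    elsif @triggered && lag < @trigger_timeout
      # Queue is healthy again: fire the recovered handler and re-arm.
      @triggered = false
      @recovered_handler.call(queue_name, lag)
    end
  end
end

events = []
latch = LagLatch.new(
  trigger_timeout:   60,
  triggered_handler: proc { |queue, lag| events << [:triggered, queue, lag] },
  recovered_handler: proc { |queue, lag| events << [:recovered, queue, lag] }
)

# Simulated lag readings from successive checker iterations.
[10, 75, 90, 5, 80].each { |lag| latch.check(:app, lag) }
```

With these readings the second stuck reading (90) produces no event, which is the "won't call the proc again" behaviour the README text describes.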
data/lib/resque_stuck_queue.rb
CHANGED
@@ -1,5 +1,6 @@
 require "resque_stuck_queue/version"
 require "resque_stuck_queue/config"
+require "resque_stuck_queue/heartbeat_job"
 
 # TODO move this require into a configurable?
 require 'resque'
@@ -75,7 +76,7 @@ module Resque
 
 Redis::Classy.db = redis if Redis::Classy.db.nil?
 
-
+enqueue_repeating_heartbeat_job
 setup_checker_thread
 
 # fo-eva.
@@ -121,14 +122,14 @@ module Resque
 
 private
 
-def
+def enqueue_repeating_heartbeat_job
 @threads << Thread.new do
 Thread.current.abort_on_exception = config[:abort_on_exception]
 logger.info("Starting heartbeat thread")
 while @running
 # we want to go through resque jobs, because that's what we're trying to test here:
 # ensure that jobs get executed and the time is updated!
-logger.info("Sending
+logger.info("Sending heartbeat jobs")
 enqueue_jobs
 wait_for_it
 end
@@ -136,12 +137,14 @@ module Resque
 end
 
 def enqueue_jobs
-if config[:
-# FIXME config[:
-config[:
+if config[:heartbeat_job]
+# FIXME config[:heartbeat_job] with mutliple queues is bad semantics
+config[:heartbeat_job].call
 else
 queues.each do |queue_name|
-Resque.enqueue_to(queue_name,
+Resque.enqueue_to(queue_name, HeartbeatJob, [heartbeat_key_for(queue_name), redis.client.host, redis.client.port])
+queue_name = :snapshot_progress
+Resque.enqueue_to(queue_name, HeartbeatJob, [Resque::StuckQueue.heartbeat_key_for(queue_name), Resque.redis.client.host, Resque.redis.client.port])
 end
 end
 end
@@ -155,17 +158,10 @@ module Resque
 if mutex.lock
 begin
 queues.each do |queue_name|
-
-if triggered_ago = last_triggered(queue_name)
-logger.info("Last triggered for #{queue_name} is #{triggered_ago.inspect} seconds.")
-else
-logger.info("No last trigger found for #{queue_name}.")
-end
+log_checker_info(queue_name)
 if should_trigger?(queue_name)
-logger.info("Triggering :triggered handler for #{queue_name} at #{Time.now}.")
 trigger_handler(queue_name, :triggered)
 elsif should_recover?(queue_name)
-logger.info("Triggering :recovered handler for #{queue_name} at #{Time.now}.")
 trigger_handler(queue_name, :recovered)
 end
 end
@@ -235,6 +231,7 @@ module Resque
 def trigger_handler(queue_name, type)
 raise 'Must trigger either the recovered or triggered handler!' unless (type == :recovered || type == :triggered)
 handler_name = :"#{type}_handler"
+logger.info("Triggering #{type} handler for #{queue_name} at #{Time.now}.")
 (config[handler_name] || const_get(handler_name.upcase)).call(queue_name, lag_time(queue_name))
 manual_refresh(queue_name, type)
 rescue => e
@@ -243,6 +240,16 @@ module Resque
 force_stop!
 end
 
+def log_checker_info(queue_name)
+logger.info("Lag time for #{queue_name} is #{lag_time(queue_name).inspect} seconds.")
+if triggered_ago = last_triggered(queue_name)
+logger.info("Last triggered for #{queue_name} is #{triggered_ago.inspect} seconds.")
+else
+logger.info("No last trigger found for #{queue_name}.")
+end
+
+end
+
 def read_from_redis(keyname)
 redis.get(keyname)
 end
@@ -258,12 +265,3 @@ module Resque
 end
 end
 
-class RefreshLatestTimestamp
-def self.perform(args)
-timestamp_key = args[0]
-host = args[1]
-port = args[2]
-r = Redis.new(:host => host, :port => port)
-r.set(timestamp_key, Time.now.to_i)
-end
-end
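The `enqueue_jobs` change above has two paths: a user-supplied `:heartbeat_job` proc is called once, otherwise one heartbeat is enqueued per monitored queue. A minimal sketch of that dispatch follows. It assumes a hypothetical `enqueue_heartbeats` function, which is not the gem's method, and a lambda standing in for `Resque.enqueue_to` so the example runs without Resque or Redis.

```ruby
# Hypothetical illustration; enqueue_heartbeats is not the gem's method.
def enqueue_heartbeats(config, queues, enqueue)
  if config[:heartbeat_job]
    # Custom path: the proc is called once, however many queues are monitored.
    config[:heartbeat_job].call
  else
    # Default path: one heartbeat per monitored queue.
    queues.each { |queue_name| enqueue.call(queue_name) }
  end
end

enqueued = []
enqueue  = ->(queue_name) { enqueued << queue_name } # stand-in for Resque.enqueue_to

# No custom job configured: every queue gets its own heartbeat.
enqueue_heartbeats({}, [:app, :mail], enqueue)

# Custom job configured: it runs once and the per-queue path is skipped.
custom_calls = 0
enqueue_heartbeats({ heartbeat_job: proc { custom_calls += 1 } }, [:app, :mail], enqueue)
```

The single call on the custom path is presumably what the FIXME comment in the diff means by "bad semantics" with multiple queues: the proc is not told which queue it is refreshing.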
data/lib/resque_stuck_queue/config.rb
CHANGED
@@ -15,8 +15,8 @@ module Resque
 class Config < Hash
 
 OPTIONS_DESCRIPTIONS = {
-:triggered_handler
-:recovered_handler => "set to what gets triggered when resque-stuck-queue has triggered a problem, but then detects the queue went back down to functioning well again(
+:triggered_handler => "set to what gets triggered when resque-stuck-queue will detect the latest heartbeat is older than the trigger_timeout time setting.\n\tExample:\n\tResque::StuckQueue.config[:triggered_handler] = proc { |queue_name, lagtime| send_email('queue \#{queue_name} isnt working, aaah the daemons') }",
+:recovered_handler => "set to what gets triggered when resque-stuck-queue has triggered a problem, but then detects the queue went back down to functioning well again(it wont trigger again until it has recovered).\n\tExample:\n\tResque::StuckQueue.config[:recovered_handler] = proc { |queue_name, lagtime| send_email('phew, queue \#{queue_name} is ok') }",
 :heartbeat => "set to how often to push that 'heartbeat' job to refresh the latest time it worked.\n\tExample:\n\tResque::StuckQueue.config[:heartbeat] = 5.minutes",
 :trigger_timeout => "set to how much of a resque work lag you are willing to accept before being notified. note: take the :heartbeat setting into account when setting this timeout.\n\tExample:\n\tResque::StuckQueue.config[:trigger_timeout] = 55.minutes",
 :redis => "set the Redis instance StuckQueue will use",
@@ -25,7 +25,7 @@ module Resque
 :logger => "optional, pass a Logger. Default a ruby logger will be instantiated. Needs to respond to that interface.",
 :queues => "optional, monitor specific queues you want to send a heartbeat/monitor to. default is :app",
 :abort_on_exception => "optional, if you want the resque-stuck-queue threads to explicitly raise, default is false",
-:
+:heartbeat_job => "optional, your own custom refreshing job. if you are using something other than resque",
 }
 
 OPTIONS = OPTIONS_DESCRIPTIONS.keys
data/lib/resque_stuck_queue/heartbeat_job.rb
ADDED
@@ -0,0 +1,14 @@
+module Resque
+  module StuckQueue
+    class HeartbeatJob
+      def self.perform(args)
+        timestamp_key = args[0]
+        host = args[1]
+        port = args[2]
+        new_time = Time.now.to_i
+        r = Redis.new(:host => host, :port => port)
+        r.set(timestamp_key, new_time)
+      end
+    end
+  end
+end
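The job above only records a Unix timestamp under a key; the checker thread then needs the age of that timestamp to decide whether the queue is lagging. A minimal sketch of that round trip follows. It assumes a hypothetical in-memory `FakeStore` in place of Redis, a hypothetical key name, and hypothetical `record_heartbeat` and `lag_time` helpers. The gem's own `lag_time` implementation is not shown in this diff, so the arithmetic here is an assumption.

```ruby
# Hypothetical stand-in for a Redis connection: only get/set are sketched.
class FakeStore
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value.to_s # Redis hands values back as strings
    "OK"
  end

  def get(key)
    @data[key]
  end
end

# Mirrors the shape of HeartbeatJob.perform, but writes to the store passed in
# instead of opening a Redis connection from a host and port.
def record_heartbeat(store, timestamp_key, now = Time.now)
  store.set(timestamp_key, now.to_i)
end

# Hypothetical helper: seconds since the last heartbeat, or nil if none yet.
def lag_time(store, timestamp_key, now = Time.now)
  last = store.get(timestamp_key)
  return nil if last.nil?
  now.to_i - last.to_i
end

store = FakeStore.new
key   = "app:heartbeat"     # hypothetical key name
start = Time.at(1_000_000)  # fixed clock so the example is deterministic

before_lag = lag_time(store, key, start)      # no heartbeat written yet
record_heartbeat(store, key, start)
after_lag  = lag_time(store, key, start + 90) # checked 90 seconds later
```

Passing the clock in as an argument keeps the sketch deterministic; the job in the diff calls `Time.now` directly.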
data/test/test_helper.rb
CHANGED
@@ -7,7 +7,6 @@ require "mocha/mini_test"
 $:.unshift(".")
 require 'resque_stuck_queue'
 require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "set_redis_key")
-require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "refresh_latest_timestamp")
 
 module TestHelper
 
data/test/test_integration.rb
CHANGED
@@ -6,7 +6,6 @@ require 'pry'
 $:.unshift(".")
 require 'resque_stuck_queue'
 require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "set_redis_key")
-require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "refresh_latest_timestamp")
 require File.join(File.expand_path(File.dirname(__FILE__)), "test_helper")
 
 class TestIntegration < Minitest::Test
@@ -88,7 +87,7 @@ class TestIntegration < Minitest::Test
 Resque::StuckQueue.config[:heartbeat] = 1
 
 begin
-Resque::StuckQueue.config[:
+Resque::StuckQueue.config[:heartbeat_job] = proc { Resque.enqueue_to(:app, Resque::StuckQueue::HeartbeatJob, Resque::StuckQueue.heartbeat_key_for(:app)) }
 @triggered = false
 Resque::StuckQueue.config[:triggered_handler] = proc { @triggered = true }
 start_and_stop_loops_after(4)
data/test/test_lagtime.rb
CHANGED
@@ -5,7 +5,6 @@ require 'pry'
 $:.unshift(".")
 require 'resque_stuck_queue'
 require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "set_redis_key")
-require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "refresh_latest_timestamp")
 require File.join(File.expand_path(File.dirname(__FILE__)), "test_helper")
 
 class TestLagTime < Minitest::Test
data/test/test_set_custom_refresh_job.rb
CHANGED
@@ -9,7 +9,7 @@ class TestYourOwnRefreshJob < Minitest::Test
 Resque::StuckQueue.config[:trigger_timeout] = 1
 Resque::StuckQueue.config[:heartbeat] = 1
 Resque::StuckQueue.config[:abort_on_exception] = true
-Resque::StuckQueue.config[:
+Resque::StuckQueue.config[:heartbeat_job] = nil
 Resque::StuckQueue.redis = Redis.new
 Resque::StuckQueue.redis.flushall
 end
@@ -17,7 +17,7 @@ class TestYourOwnRefreshJob < Minitest::Test
 def test_will_trigger_with_unrefreshing_custom_heartbeat_job
 # it will trigger because the key will be unrefreshed, hence 'old' and will always trigger.
 puts "#{__method__}"
-Resque::StuckQueue.config[:
+Resque::StuckQueue.config[:heartbeat_job] = proc { nil } # does not refresh global key
 @triggered = false
 Resque::StuckQueue.config[:triggered_handler] = proc { @triggered = true }
 start_and_stop_loops_after(3)
@@ -27,7 +27,7 @@ class TestYourOwnRefreshJob < Minitest::Test
 def test_will_fail_with_bad_custom_heartbeat_job
 puts "#{__method__}"
 begin
-Resque::StuckQueue.config[:
+Resque::StuckQueue.config[:heartbeat_job] = proc { raise 'bad proc doc' } # does not refresh global key
 @triggered = false
 Resque::StuckQueue.config[:triggered_handler] = proc { @triggered = true }
 start_and_stop_loops_after(3)
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: resque_stuck_queue
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.2.0
 platform: ruby
 authors:
 - Shai Rosenfeld
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2014-01-
+date: 2014-01-29 00:00:00 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: redis-mutex
@@ -59,9 +59,9 @@ files:
 - lib/resque/stuck_queue.rb
 - lib/resque_stuck_queue.rb
 - lib/resque_stuck_queue/config.rb
+- lib/resque_stuck_queue/heartbeat_job.rb
 - lib/resque_stuck_queue/version.rb
 - resque_stuck_queue.gemspec
-- test/resque/refresh_latest_timestamp.rb
 - test/resque/set_redis_key.rb
 - test/test_collision.rb
 - test/test_config.rb
@@ -96,7 +96,6 @@ signing_key:
 specification_version: 4
 summary: fire a handler when your queues are wonky
 test_files:
-- test/resque/refresh_latest_timestamp.rb
 - test/resque/set_redis_key.rb
 - test/test_collision.rb
 - test/test_config.rb