resque_stuck_queue 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA512:
- metadata.gz: 18578e956daa34b9788d57db6fc5539e0e425c692d672c89302c82051f81a639eff25fbf76570a1b0ac74099051503835fca1912e051086c1a7f6df2ea485e0f
- data.tar.gz: 2a721eec9c3ba4555a71425111b9ab322913da8df68612e6a08a6f4577c9f3a77793014617742661a4db1bfdede87e65567a8578c5f3511201d7678005e16bc0
+ metadata.gz: 8441e82a5f962f49c9de740fd9a843c50b3125458c69f08598b210774b2920f6bac676f87a748c095aafdcc86799934229a03ba5c2cc0c8a906f3834f484cc7c
+ data.tar.gz: 824e77bbf4ab0c7fa6b150e69ed0a4845d4e6a18022715b51880e60af7ba189be704277d0776aa91bbe39003b09f5b8f22895421484b10bb6c454b73f77391d7
  SHA1:
- metadata.gz: d8a0f8ef0e451bba4b99236eb653aae4078c498e
- data.tar.gz: 4cb3c0daad447f88893d8f2ebbd1bf991bea0d10
+ metadata.gz: ec5948be0403dbafd9958e4dbb650f88ebd15de2
+ data.tar.gz: a97a2968698d72fa37b14d1ba3eabe7e9345d9cf
data/README.md CHANGED
@@ -8,6 +8,8 @@ This is to be used to satisfy an ops problem. There have been cases resque proce
 
  If resque doesn't run jobs in specific queues (defaults to `@queue = :app`) within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you.
 
+ It will also fire a proc to notify you when it's recovered.
+
  ## How it works
 
  When you call `start` you are essentially starting two threads that will continiously run until `stop` is called or until the process shuts down.
@@ -16,18 +18,33 @@ One thread is responsible for pushing a 'heartbeat' job to resque which will ess
 
  The other thread is a continious loop that will check redis (bypassing resque) for that key and check what the latest time the hearbeat job successfully updated that key.
 
- It will trigger a pre-defined proc (see below) if the last time the hearbeat job updated that key is older than the trigger_timeout setting (see below).
+ StuckQueue will trigger a pre-defined proc if the queue is lagging according to the times you've configured (see below).
+
+ After firing the proc, it will continue to monitor the queue, but won't call the proc again until the queue is found to be good again (it will then call a different "recovered" handler).
+
+ By calling the recovered proc, it will then complain again the next time the lag is found.
 
- ## Usage
+ ## Configuration Options
 
- Configure it first. Optional settings are below. You'll most likely at the least want to tune `:triggered_handler`,`:heartbeat` and `:trigger_timeout` settings.
+ Configure it first via something like:
 
  <pre>
- handler:
+ Resque::StuckQueue.config[:triggered_handler] = proc { send_email }
+ </pre>
+
+ Configuration settings are below. You'll most likely at the least want to tune `:triggered_handler`,`:heartbeat` and `:trigger_timeout` settings.
+
+ <pre>
+ triggered_handler:
  set to what gets triggered when resque-stuck-queue will detect the latest heartbeat is older than the trigger_timeout time setting.
  Example:
  Resque::StuckQueue.config[:triggered_handler] = proc { |queue_name, lagtime| send_email('queue #{queue_name} isnt working, aaah the daemons') }
 
+ recovered_handler:
+ set to what gets triggered when resque-stuck-queue has triggered a problem, but then detects the queue went back down to functioning well again (it wont trigger again until it has recovered).
+ Example:
+ Resque::StuckQueue.config[:recovered_handler] = proc { |queue_name, lagtime| send_email('phew, queue #{queue_name} is ok') }
+
  heartbeat:
  set to how often to push that 'heartbeat' job to refresh the latest time it worked.
  Example:
@@ -44,6 +61,9 @@ redis:
  heartbeat_key:
  optional, name of keys to keep track of the last good resque heartbeat time
 
+ triggered_key:
+ optional, name of keys to keep track of the last trigger time
+
  logger:
  optional, pass a Logger. Default a ruby logger will be instantiated. Needs to respond to that interface.
 
@@ -55,9 +75,10 @@ abort_on_exception:
 
  refresh_job:
  optional, your own custom refreshing job. if you are using something other than resque
+
  </pre>
 
- Then start it:
+ To start it:
 
  <pre>
  Resque::StuckQueue.start # blocking
@@ -121,7 +142,7 @@ class CustomJob
  end
  end
 
- Resque::StuckQueue.config[:refresh_job] = proc {
+ Resque::StuckQueue.config[:heartbeat_job] = proc {
  # or however else you enque your custom job, Sidekiq::Client.enqueue(CustomJob), whatever, etc.
  CustomJob.perform_async
  }
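
The README hunks above describe a latch: the triggered handler fires once when a queue starts lagging, stays quiet while the lag persists, and the recovered handler fires once when the queue is healthy again, which re-arms the trigger. A minimal sketch of that behaviour follows. The `LagMonitor` class, its `check` method and the 60-second timeout are hypothetical names chosen for illustration, not part of the gem, which drives this logic from its checker thread and Redis keys.

```ruby
# Hypothetical stand-in for the checker's decision logic; NOT the gem's code.
class LagMonitor
  attr_reader :events

  # trigger_timeout: acceptable lag in seconds (assumed unit).
  def initialize(trigger_timeout)
    @trigger_timeout = trigger_timeout
    @triggered = false
    @events = []
  end

  # Feed one observed lag value and record which handler would fire, if any.
  def check(lag)
    if lag > @trigger_timeout && !@triggered
      @triggered = true        # latch: stay silent until recovery
      @events << :triggered
    elsif lag <= @trigger_timeout && @triggered
      @triggered = false       # re-arm the trigger for the next lag
      @events << :recovered
    end
  end
end

monitor = LagMonitor.new(60)
[10, 90, 120, 30, 20, 75].each { |lag| monitor.check(lag) }
p monitor.events
```

With the sample lags above, the second consecutive lagging check (120) produces no event, and the trigger fires again only after a recovery has been recorded.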
@@ -1,5 +1,6 @@
  require "resque_stuck_queue/version"
  require "resque_stuck_queue/config"
+ require "resque_stuck_queue/heartbeat_job"
 
  # TODO move this require into a configurable?
  require 'resque'
@@ -75,7 +76,7 @@ module Resque
 
  Redis::Classy.db = redis if Redis::Classy.db.nil?
 
- enqueue_repeating_refresh_job
+ enqueue_repeating_heartbeat_job
  setup_checker_thread
 
  # fo-eva.
@@ -121,14 +122,14 @@ module Resque
 
  private
 
- def enqueue_repeating_refresh_job
+ def enqueue_repeating_heartbeat_job
  @threads << Thread.new do
  Thread.current.abort_on_exception = config[:abort_on_exception]
  logger.info("Starting heartbeat thread")
  while @running
  # we want to go through resque jobs, because that's what we're trying to test here:
  # ensure that jobs get executed and the time is updated!
- logger.info("Sending refresh jobs")
+ logger.info("Sending heartbeat jobs")
  enqueue_jobs
  wait_for_it
  end
@@ -136,12 +137,14 @@ module Resque
  end
 
  def enqueue_jobs
- if config[:refresh_job]
- # FIXME config[:refresh_job] with mutliple queues is bad semantics
- config[:refresh_job].call
+ if config[:heartbeat_job]
+ # FIXME config[:heartbeat_job] with mutliple queues is bad semantics
+ config[:heartbeat_job].call
  else
  queues.each do |queue_name|
- Resque.enqueue_to(queue_name, RefreshLatestTimestamp, [heartbeat_key_for(queue_name), redis.client.host, redis.client.port])
+ Resque.enqueue_to(queue_name, HeartbeatJob, [heartbeat_key_for(queue_name), redis.client.host, redis.client.port])
+ queue_name = :snapshot_progress
+ Resque.enqueue_to(queue_name, HeartbeatJob, [Resque::StuckQueue.heartbeat_key_for(queue_name), Resque.redis.client.host, Resque.redis.client.port])
  end
  end
  end
@@ -155,17 +158,10 @@ module Resque
  if mutex.lock
  begin
  queues.each do |queue_name|
- logger.info("Lag time for #{queue_name} is #{lag_time(queue_name).inspect} seconds.")
- if triggered_ago = last_triggered(queue_name)
- logger.info("Last triggered for #{queue_name} is #{triggered_ago.inspect} seconds.")
- else
- logger.info("No last trigger found for #{queue_name}.")
- end
+ log_checker_info(queue_name)
  if should_trigger?(queue_name)
- logger.info("Triggering :triggered handler for #{queue_name} at #{Time.now}.")
  trigger_handler(queue_name, :triggered)
  elsif should_recover?(queue_name)
- logger.info("Triggering :recovered handler for #{queue_name} at #{Time.now}.")
  trigger_handler(queue_name, :recovered)
  end
  end
@@ -235,6 +231,7 @@ module Resque
  def trigger_handler(queue_name, type)
  raise 'Must trigger either the recovered or triggered handler!' unless (type == :recovered || type == :triggered)
  handler_name = :"#{type}_handler"
+ logger.info("Triggering #{type} handler for #{queue_name} at #{Time.now}.")
  (config[handler_name] || const_get(handler_name.upcase)).call(queue_name, lag_time(queue_name))
  manual_refresh(queue_name, type)
  rescue => e
@@ -243,6 +240,16 @@ module Resque
  force_stop!
  end
 
+ def log_checker_info(queue_name)
+ logger.info("Lag time for #{queue_name} is #{lag_time(queue_name).inspect} seconds.")
+ if triggered_ago = last_triggered(queue_name)
+ logger.info("Last triggered for #{queue_name} is #{triggered_ago.inspect} seconds.")
+ else
+ logger.info("No last trigger found for #{queue_name}.")
+ end
+
+ end
+
  def read_from_redis(keyname)
  redis.get(keyname)
  end
@@ -258,12 +265,3 @@ module Resque
  end
  end
 
- class RefreshLatestTimestamp
- def self.perform(args)
- timestamp_key = args[0]
- host = args[1]
- port = args[2]
- r = Redis.new(:host => host, :port => port)
- r.set(timestamp_key, Time.now.to_i)
- end
- end
@@ -15,8 +15,8 @@ module Resque
  class Config < Hash
 
  OPTIONS_DESCRIPTIONS = {
- :triggered_handler => "set to what gets triggered when resque-stuck-queue will detect the latest heartbeat is older than the trigger_timeout time setting.\n\tExample:\n\tResque::StuckQueue.config[:triggered_handler] = proc { |queue_name, lagtime| send_email('queue \#{queue_name} isnt working, aaah the daemons') }",
- :recovered_handler => "set to what gets triggered when resque-stuck-queue has triggered a problem, but then detects the queue went back down to functioning well again(before the next trigger).\n\tExample:\n\tResque::StuckQueue.config[:recovered_handler] = proc { |queue_name, lagtime| send_email('phew, queue \#{queue_name} is ok') }",
+ :triggered_handler => "set to what gets triggered when resque-stuck-queue will detect the latest heartbeat is older than the trigger_timeout time setting.\n\tExample:\n\tResque::StuckQueue.config[:triggered_handler] = proc { |queue_name, lagtime| send_email('queue \#{queue_name} isnt working, aaah the daemons') }",
+ :recovered_handler => "set to what gets triggered when resque-stuck-queue has triggered a problem, but then detects the queue went back down to functioning well again(it wont trigger again until it has recovered).\n\tExample:\n\tResque::StuckQueue.config[:recovered_handler] = proc { |queue_name, lagtime| send_email('phew, queue \#{queue_name} is ok') }",
  :heartbeat => "set to how often to push that 'heartbeat' job to refresh the latest time it worked.\n\tExample:\n\tResque::StuckQueue.config[:heartbeat] = 5.minutes",
  :trigger_timeout => "set to how much of a resque work lag you are willing to accept before being notified. note: take the :heartbeat setting into account when setting this timeout.\n\tExample:\n\tResque::StuckQueue.config[:trigger_timeout] = 55.minutes",
  :redis => "set the Redis instance StuckQueue will use",
@@ -25,7 +25,7 @@ module Resque
  :logger => "optional, pass a Logger. Default a ruby logger will be instantiated. Needs to respond to that interface.",
  :queues => "optional, monitor specific queues you want to send a heartbeat/monitor to. default is :app",
  :abort_on_exception => "optional, if you want the resque-stuck-queue threads to explicitly raise, default is false",
- :refresh_job => "optional, your own custom refreshing job. if you are using something other than resque",
+ :heartbeat_job => "optional, your own custom refreshing job. if you are using something other than resque",
  }
 
  OPTIONS = OPTIONS_DESCRIPTIONS.keys
@@ -0,0 +1,14 @@
+ module Resque
+ module StuckQueue
+ class HeartbeatJob
+ def self.perform(args)
+ timestamp_key = args[0]
+ host = args[1]
+ port = args[2]
+ new_time = Time.now.to_i
+ r = Redis.new(:host => host, :port => port)
+ r.set(timestamp_key, new_time)
+ end
+ end
+ end
+ end
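
`HeartbeatJob.perform` writes the current epoch second under the heartbeat key, so a queue's lag can plausibly be derived by subtracting that stored value from the current time. The sketch below illustrates the arithmetic only. `FakeRedis`, `beat`, `lag_time` and the key name `"app:heartbeat"` are hypothetical stand-ins so the example runs without a Redis server; the gem's own `lag_time(queue_name)` is not shown in this diff and may differ.

```ruby
# Hypothetical in-memory stand-in for a Redis connection (get/set only).
class FakeRedis
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value.to_s # Redis stores values as strings
    "OK"
  end

  def get(key)
    @store[key]
  end
end

# Mirrors what HeartbeatJob.perform writes: an epoch second under the key.
def beat(redis, key, now = Time.now.to_i)
  redis.set(key, now)
end

# Seconds since the last heartbeat, or nil if none was ever recorded.
def lag_time(redis, key, now = Time.now.to_i)
  stamp = redis.get(key)
  stamp && now - stamp.to_i
end

redis = FakeRedis.new
beat(redis, "app:heartbeat", 1_000)            # heartbeat recorded at t=1000
puts lag_time(redis, "app:heartbeat", 1_045)   # checked 45 seconds later
```

Passing `now` explicitly keeps the sketch deterministic; in real use both helpers would default to the wall clock.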
@@ -1,5 +1,5 @@
  module Resque
  module StuckQueue
- VERSION = "0.1.1"
+ VERSION = "0.2.0"
  end
  end
data/test/test_helper.rb CHANGED
@@ -7,7 +7,6 @@ require "mocha/mini_test"
  $:.unshift(".")
  require 'resque_stuck_queue'
  require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "set_redis_key")
- require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "refresh_latest_timestamp")
 
  module TestHelper
 
@@ -6,7 +6,6 @@ require 'pry'
  $:.unshift(".")
  require 'resque_stuck_queue'
  require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "set_redis_key")
- require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "refresh_latest_timestamp")
  require File.join(File.expand_path(File.dirname(__FILE__)), "test_helper")
 
  class TestIntegration < Minitest::Test
@@ -88,7 +87,7 @@ class TestIntegration < Minitest::Test
  Resque::StuckQueue.config[:heartbeat] = 1
 
  begin
- Resque::StuckQueue.config[:refresh_job] = proc { Resque.enqueue(RefreshLatestTimestamp, Resque::StuckQueue.heartbeat_key_for(:app)) }
+ Resque::StuckQueue.config[:heartbeat_job] = proc { Resque.enqueue_to(:app, Resque::StuckQueue::HeartbeatJob, Resque::StuckQueue.heartbeat_key_for(:app)) }
  @triggered = false
  Resque::StuckQueue.config[:triggered_handler] = proc { @triggered = true }
  start_and_stop_loops_after(4)
data/test/test_lagtime.rb CHANGED
@@ -5,7 +5,6 @@ require 'pry'
  $:.unshift(".")
  require 'resque_stuck_queue'
  require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "set_redis_key")
- require File.join(File.expand_path(File.dirname(__FILE__)), "resque", "refresh_latest_timestamp")
  require File.join(File.expand_path(File.dirname(__FILE__)), "test_helper")
 
  class TestLagTime < Minitest::Test
@@ -9,7 +9,7 @@ class TestYourOwnRefreshJob < Minitest::Test
  Resque::StuckQueue.config[:trigger_timeout] = 1
  Resque::StuckQueue.config[:heartbeat] = 1
  Resque::StuckQueue.config[:abort_on_exception] = true
- Resque::StuckQueue.config[:refresh_job] = nil
+ Resque::StuckQueue.config[:heartbeat_job] = nil
  Resque::StuckQueue.redis = Redis.new
  Resque::StuckQueue.redis.flushall
  end
@@ -17,7 +17,7 @@ class TestYourOwnRefreshJob < Minitest::Test
  def test_will_trigger_with_unrefreshing_custom_heartbeat_job
  # it will trigger because the key will be unrefreshed, hence 'old' and will always trigger.
  puts "#{__method__}"
- Resque::StuckQueue.config[:refresh_job] = proc { nil } # does not refresh global key
+ Resque::StuckQueue.config[:heartbeat_job] = proc { nil } # does not refresh global key
  @triggered = false
  Resque::StuckQueue.config[:triggered_handler] = proc { @triggered = true }
  start_and_stop_loops_after(3)
@@ -27,7 +27,7 @@ class TestYourOwnRefreshJob < Minitest::Test
  def test_will_fail_with_bad_custom_heartbeat_job
  puts "#{__method__}"
  begin
- Resque::StuckQueue.config[:refresh_job] = proc { raise 'bad proc doc' } # does not refresh global key
+ Resque::StuckQueue.config[:heartbeat_job] = proc { raise 'bad proc doc' } # does not refresh global key
  @triggered = false
  Resque::StuckQueue.config[:triggered_handler] = proc { @triggered = true }
  start_and_stop_loops_after(3)
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: resque_stuck_queue
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 0.2.0
  platform: ruby
  authors:
  - Shai Rosenfeld
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2014-01-27 00:00:00 Z
+ date: 2014-01-29 00:00:00 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: redis-mutex
@@ -59,9 +59,9 @@ files:
  - lib/resque/stuck_queue.rb
  - lib/resque_stuck_queue.rb
  - lib/resque_stuck_queue/config.rb
+ - lib/resque_stuck_queue/heartbeat_job.rb
  - lib/resque_stuck_queue/version.rb
  - resque_stuck_queue.gemspec
- - test/resque/refresh_latest_timestamp.rb
  - test/resque/set_redis_key.rb
  - test/test_collision.rb
  - test/test_config.rb
@@ -96,7 +96,6 @@ signing_key:
  specification_version: 4
  summary: fire a handler when your queues are wonky
  test_files:
- - test/resque/refresh_latest_timestamp.rb
  - test/resque/set_redis_key.rb
  - test/test_collision.rb
  - test/test_config.rb
@@ -1,8 +0,0 @@
- class RefreshLatestTimestamp
- @queue = :app
- def self.perform(args)
- timestamp_key, host, port = args[0], args[1], args[2]
- r = Redis.new(:host => host, :port => port)
- r.set(timestamp_key, Time.now.to_i)
- end
- end