solid_queue_web 1.3.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (26)
  1. checksums.yaml +4 -4
  2. data/README.md +99 -3
  3. data/app/controllers/solid_queue_web/application_controller.rb +20 -0
  4. data/app/controllers/solid_queue_web/audit_controller.rb +43 -0
  5. data/app/controllers/solid_queue_web/dashboard_controller.rb +2 -0
  6. data/app/controllers/solid_queue_web/failed_jobs/selections_controller.rb +2 -0
  7. data/app/controllers/solid_queue_web/failed_jobs_controller.rb +2 -0
  8. data/app/controllers/solid_queue_web/jobs/selections_controller.rb +1 -0
  9. data/app/controllers/solid_queue_web/jobs_controller.rb +9 -2
  10. data/app/controllers/solid_queue_web/queues/pauses_controller.rb +2 -0
  11. data/app/controllers/solid_queue_web/queues_controller.rb +5 -0
  12. data/app/controllers/solid_queue_web/retry_failed_jobs_controller.rb +2 -0
  13. data/app/models/solid_queue_web/audit_event.rb +17 -0
  14. data/app/services/solid_queue_web/slow_job_alert.rb +70 -0
  15. data/app/services/solid_queue_web/stale_process_alert.rb +68 -0
  16. data/app/views/layouts/solid_queue_web/application.html.erb +1 -0
  17. data/app/views/solid_queue_web/audit/index.html.erb +78 -0
  18. data/app/views/solid_queue_web/jobs/index.html.erb +4 -0
  19. data/app/views/solid_queue_web/queues/index.html.erb +3 -3
  20. data/config/routes.rb +1 -0
  21. data/db/migrate/01_create_solid_queue_web_audit_events.rb +16 -0
  22. data/lib/generators/solid_queue_web/install/migrations_generator.rb +24 -0
  23. data/lib/generators/solid_queue_web/install/templates/create_solid_queue_web_audit_events.rb.tt +16 -0
  24. data/lib/solid_queue_web/version.rb +1 -1
  25. data/lib/solid_queue_web.rb +15 -1
  26. metadata +9 -1
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: f84e63b803df1ce7a322b564eeb78262d7ec76c00d34d087076c842606960e45
- data.tar.gz: 3cc8a8e1dd074bf9770cea28053b7f7d947f40588f6ac20bc0ccf3b1936adcd0
+ metadata.gz: 0aea2023ccc5983daeb0781f0b56105492179d638822d9d6c4a853e7296f3258
+ data.tar.gz: 0b430cccb3b56335a562451f0d598390a4a3f81d2abb1ad5122ce3233fa5663c
  SHA512:
- metadata.gz: d7ecc0d4f79f041cca6f6d1aff56953336c63579eae4802e9fdf72603d19523189cf81f27409560f26d1c9df8ecb165c7a0c5bb1302a91a681759ecf5840f8bb
- data.tar.gz: 5ae9f1035b8443dcbaa66fc02a5171cd91d65b0c09bcb7d66f365350c86aa87a682a5a10c0235227315818fdc7ad8b7d6a3bdb72807c232d429f2a0d046f4a16
+ metadata.gz: f654b581b1bddc2c2559d39fa09cfc83c0bc664268ba085770179476fbed468e672341025f1d34f5e64c8cb9fc9127914d0d1601a073cc7fb19df03ed890c369
+ data.tar.gz: 5c4794c4ab5e1f5bd5e33aaa4b6630f5de7c4289c55b6506d2f21493c9b63e4e24680b03c2cd1554812426b3b285c5cf640adc079f4b5a9fe93916bcbbd9b3c3
data/README.md CHANGED
@@ -53,7 +53,9 @@ SolidQueueWeb surfaces all of this in a browser UI available at any route you ch
  - **Dashboard quick actions** — "Retry All Failed" and "Discard All Blocked" cards appear on the dashboard only when the respective count is non-zero; one-click bulk operations with confirm dialogs, keeping the dashboard clean when everything is healthy
  - **CSV export** — "Export CSV" button on the jobs, failed jobs, and history pages downloads all records matching the current filters; columns are tailored per view
  - **Slow job detection** — when `slow_job_threshold` is configured, claimed jobs running longer than the threshold are flagged with an orange row, a "slow" badge, and a "Running For" duration column on the Running tab; a "Slow Jobs" warning card appears on the dashboard with a link to the Running tab
- - **Webhook alerts** — set `alert_webhook_url` and `alert_failure_threshold` to receive a POST request whenever the failed job count meets or exceeds the threshold; fires asynchronously so dashboard performance is unaffected; a configurable cooldown (default 1 h) prevents repeated alerts while the count stays elevated
+ - **Job wait time** — the Running tab shows a "Wait Time" column with how long each job waited in the queue from enqueue to pickup; also exported as `wait_time_seconds` in the claimed-status CSV
+ - **Admin audit log** — every discard, retry, queue pause, and resume is recorded to a `solid_queue_web_audit_events` table and viewable at `/jobs/audit` with action/actor/queue filters and CSV export; actor identity captured via the optional `current_actor` config block; requires running the install generator to create the table
+ - **Webhook alerts** — set `alert_webhook_url` and `alert_failure_threshold` to receive a POST request whenever the failed job count meets or exceeds the threshold; set `alert_queue_thresholds` for per-queue depth alerts; set `alert_slow_job_count_threshold` (requires `slow_job_threshold`) for slow-job count alerts; set `alert_stale_process_threshold` for stale-worker alerts; all fire asynchronously with a configurable cooldown (default 1 h) to prevent repeated alerts
  - **Performance analytics** — per-job-class statistics at `/jobs/performance` showing run count, average, p50, p95, p99, standard deviation, min, and max duration; sorted by p95 descending so the slowest classes surface first; high std dev surfaces inconsistent jobs worth investigating; period filter scopes to 1h / 24h / 7d or all time; each class name links to the filtered History view
  - **Failed job trend chart** — a "Failures — Last 12 Hours" bar chart on the dashboard shows failures per hour over the last 12 hours; bars are red, making failure spikes visible before clicking into the failed jobs list
  - **Error frequency report** — `GET /jobs/failed_jobs/errors` groups all failed jobs by error class and message prefix, shows a count per group, and surfaces a sample backtrace in an expandable row; sorted by count descending so the most common errors appear first; accessible via the "Error Summary" button on the Failed Jobs page
@@ -106,8 +108,11 @@ SolidQueueWeb.configure do |config|
    config.slow_job_threshold = 5.minutes # flag claimed jobs running longer than this (default: nil = disabled)
    config.alert_webhook_url = "https://hooks.example.com/solid-queue" # POST target — string or array (default: nil = disabled)
    config.alert_failure_threshold = 10 # fire when failed count >= this (default: nil = disabled)
-   config.alert_queue_thresholds = { "critical" => 50, "default" => 200 } # fire when queue depth >= threshold (default: {})
-   config.alert_webhook_cooldown = 1800 # seconds between repeated alerts per alert type (default: 3600)
+   config.alert_queue_thresholds = { "critical" => 50, "default" => 200 } # fire when queue depth >= threshold (default: {})
+   config.alert_slow_job_count_threshold = 5 # fire when slow job count >= this (default: nil = disabled)
+   config.alert_stale_process_threshold = 1 # fire when stale process count >= this (default: nil = disabled)
+   config.alert_webhook_cooldown = 1800 # seconds between repeated alerts per alert type (default: 3600)
+   config.current_actor = -> { current_user&.email } # identity for audit log (default: nil)
    config.connects_to = { reading: :reading, writing: :writing } # read replica (default: nil)
    config.time_zone = "America/New_York" # display timezone for all timestamps (default: nil = UTC)
  end
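In `lib/solid_queue_web.rb` (later in this diff), `current_actor` is defined as `def current_actor(&block)`, and it is not added to the `attr_writer` list. Assuming `configure` yields the `SolidQueueWeb` module itself (the `attr_writer` setup suggests this, but `configure` is not part of this diff), the README line `config.current_actor = -> { ... }` would therefore raise `NoMethodError`. The block form `config.current_actor { current_user&.email }` stores the block that `resolve_current_actor` later reads. The sketch below reproduces that shape with a hypothetical `SettingsSketch` module rather than the real gem:

```ruby
# Hypothetical stand-in for SolidQueueWeb's settings module, mirroring the
# `def current_actor(&block)` reader/setter added in lib/solid_queue_web.rb.
module SettingsSketch
  class << self
    # Assumption: configure yields the module itself.
    def configure
      yield self
    end

    # Stores the block when one is given; always returns the stored block.
    def current_actor(&block)
      @current_actor = block if block_given?
      @current_actor
    end
  end
end

# Block form: the braces bind to current_actor, so the block is stored.
SettingsSketch.configure do |config|
  config.current_actor { "admin@example.com" }
end

# Assignment form: there is no current_actor= writer, so this raises.
assignment_error =
  begin
    SettingsSketch.configure { |config| config.current_actor = -> { "x" } }
    nil
  rescue NoMethodError => e
    e
  end
```

The failed assignment leaves the previously stored block in place, because the error is raised before anything is written.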
@@ -182,6 +187,97 @@ The same `alert_webhook_url` endpoint(s) receive the payload, with a distinct ev
 
  Cooldown is tracked independently per queue, so a persistently deep "critical" queue does not suppress alerts for "default". The shared `alert_webhook_cooldown` setting applies to each queue separately.
 
+ ## Slow job alerts
+
+ Set `alert_slow_job_count_threshold` to fire a webhook when the number of currently-running slow jobs meets or exceeds a count. This requires `slow_job_threshold` to also be configured — it defines what "slow" means.
+
+ ```ruby
+ SolidQueueWeb.configure do |config|
+   config.slow_job_threshold = 5.minutes # a job is "slow" if it has been claimed longer than this
+   config.alert_slow_job_count_threshold = 3 # fire when >= 3 jobs are slow
+   config.alert_webhook_url = "https://hooks.example.com/solid-queue"
+   config.alert_webhook_cooldown = 1800 # don't re-fire for 30 minutes (default: 3600)
+ end
+ ```
+
+ The same `alert_webhook_url` endpoint(s) receive the payload with a distinct event type:
+
+ ```json
+ {
+   "event": "slow_job_threshold_exceeded",
+   "slow_job_count": 5,
+   "threshold": 3,
+   "fired_at": "2026-05-28T08:00:00Z"
+ }
+ ```
+
+ The alert fires on every dashboard page load while the condition persists, subject to the cooldown window.
+
+ ## Stale process alerts
+
+ Set `alert_stale_process_threshold` to fire a webhook when the number of stale workers meets or exceeds a count. A process is considered stale when its `last_heartbeat_at` has not been updated within `SolidQueue.process_alive_threshold` (default 5 minutes). A stale worker means jobs in its queues have silently stopped processing.
+
+ ```ruby
+ SolidQueueWeb.configure do |config|
+   config.alert_stale_process_threshold = 1 # fire when any process goes stale
+   config.alert_webhook_url = "https://hooks.example.com/solid-queue"
+   config.alert_webhook_cooldown = 1800 # don't re-fire for 30 minutes (default: 3600)
+ end
+ ```
+
+ The same `alert_webhook_url` endpoint(s) receive the payload with a distinct event type:
+
+ ```json
+ {
+   "event": "stale_process_detected",
+   "stale_process_count": 2,
+   "threshold": 1,
+   "fired_at": "2026-05-28T08:00:00Z"
+ }
+ ```
+
+ The alert fires on every dashboard page load while the condition persists, subject to the cooldown window.
+
+ ## Admin audit log
+
+ Every discard, retry, queue pause, and resume action is recorded to a `solid_queue_web_audit_events` table and viewable at `/jobs/audit`.
+
+ ### Installation
+
+ The audit log requires an opt-in migration. Run the install generator to copy it to your application:
+
+ ```bash
+ rails generate solid_queue_web:install:migrations
+ rails db:migrate
+ ```
+
+ ### Identity
+
+ Set `SolidQueueWeb.current_actor` to a block that returns the current user's identity as a string. The block is evaluated in controller context, so you have access to helpers like `current_user`:
+
+ ```ruby
+ SolidQueueWeb.configure do |config|
+   config.current_actor = -> { current_user&.email }
+ end
+ ```
+
+ If not configured, the actor column is left `nil`.
+
+ ### Audited actions
+
+ | Action | Trigger |
+ |---|---|
+ | `job_discarded` | Single job discarded from the jobs list |
+ | `jobs_discarded` | Bulk or selection discard from the jobs list |
+ | `failed_job_retried` | Single failed job retried |
+ | `failed_jobs_retried` | Bulk or selection retry of failed jobs |
+ | `failed_job_discarded` | Single failed job discarded |
+ | `failed_jobs_discarded` | Bulk or selection discard of failed jobs |
+ | `queue_paused` | Queue paused |
+ | `queue_resumed` | Queue resumed |
+
+ The audit log page at `/jobs/audit` supports filtering by action, actor, and queue name. All records can be exported as CSV.
+
  ## Metrics endpoint
 
  `GET /jobs/metrics.json` returns a machine-readable JSON document suitable for Prometheus scraping, uptime monitors, or external dashboards. No configuration is required — the endpoint is available as soon as the engine is mounted.
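On the receiving side, the two payloads documented in this release can be told apart by their `event` field. Here is a minimal sketch of a receiver-side formatter. It assumes only the JSON shapes shown in the README section of this diff. `describe_alert` is a hypothetical helper, and the failure-count and queue-depth payloads (whose shapes are not shown in this diff) fall through to the generic branch:

```ruby
require "json"

# Hypothetical receiver-side helper: turns a SolidQueueWeb webhook body into a
# one-line summary. Only the two event types documented in this release are
# handled explicitly; anything else is reported generically.
def describe_alert(body)
  payload = JSON.parse(body)

  case payload["event"]
  when "slow_job_threshold_exceeded"
    "#{payload["slow_job_count"]} slow jobs (threshold #{payload["threshold"]}) at #{payload["fired_at"]}"
  when "stale_process_detected"
    "#{payload["stale_process_count"]} stale processes (threshold #{payload["threshold"]}) at #{payload["fired_at"]}"
  else
    "Unhandled SolidQueueWeb event: #{payload["event"].inspect}"
  end
end
```

For the stale-process example payload from the README, this returns `"2 stale processes (threshold 1) at 2026-05-28T08:00:00Z"`.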
data/app/controllers/solid_queue_web/application_controller.rb CHANGED
@@ -37,5 +37,25 @@ module SolidQueueWeb
      def request_basic_auth
        request_http_basic_authentication("Solid Queue Dashboard")
      end
+
+     def record_audit(action, job_class: nil, queue_name: nil, item_count: 1)
+       AuditEvent.create!(
+         action: action,
+         actor: resolve_current_actor,
+         job_class: job_class,
+         queue_name: queue_name,
+         item_count: item_count
+       )
+     rescue => e
+       Rails.logger.error("[SolidQueueWeb] Audit log failed: #{e.message}")
+     end
+
+     def resolve_current_actor
+       block = SolidQueueWeb.current_actor
+       instance_exec(&block) if block
+     rescue => e
+       Rails.logger.error("[SolidQueueWeb] current_actor block failed: #{e.message}")
+       nil
+     end
    end
  end
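`resolve_current_actor` evaluates the stored block with `instance_exec`, so `self` inside the block is the controller and helpers such as `current_user` resolve normally. Any exception is logged and swallowed, which leaves the actor `nil` without blocking the audit write. A plain-Ruby sketch of that behaviour follows. `FakeController` and `User` are hypothetical stand-ins for a Rails controller and user model, and `warn` stands in for `Rails.logger.error`:

```ruby
# Hypothetical stand-in for a host-app user object.
User = Struct.new(:email)

class FakeController
  def initialize(actor_block, user)
    @actor_block = actor_block
    @user = user
  end

  # Plays the role of a host-app helper such as current_user.
  def current_user
    @user
  end

  # Mirrors resolve_current_actor: run the block with self = this controller;
  # on any error, log it and fall back to nil.
  def resolve_current_actor
    instance_exec(&@actor_block) if @actor_block
  rescue => e
    warn("[sketch] current_actor block failed: #{e.message}")
    nil
  end
end

actor_block = proc { current_user&.email }
FakeController.new(actor_block, User.new("ops@example.com")).resolve_current_actor
```

With a signed-in user this yields the email. With no user, the safe-navigation operator yields `nil`. A block that raises also ends in `nil`, after a log line.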
data/app/controllers/solid_queue_web/audit_controller.rb ADDED
@@ -0,0 +1,43 @@
+ module SolidQueueWeb
+   class AuditController < ApplicationController
+     before_action :set_filters
+
+     def index
+       scope = audit_scope
+       respond_to do |format|
+         format.html { @pagy, @audit_events = pagy(scope) }
+         format.csv do
+           send_data audit_csv(scope),
+             filename: "audit-log-#{Date.today}.csv",
+             type: "text/csv", disposition: "attachment"
+         end
+       end
+     end
+
+     private
+
+     def set_filters
+       @action_filter = params[:action_filter].presence_in(AuditEvent::ACTIONS)
+       @actor_filter = params[:actor].presence
+       @queue_filter = params[:queue].presence
+     end
+
+     def audit_scope
+       scope = AuditEvent.recent
+       scope = scope.where(action: @action_filter) if @action_filter
+       scope = scope.where(actor: @actor_filter) if @actor_filter
+       scope = scope.where(queue_name: @queue_filter) if @queue_filter
+       scope
+     end
+
+     def audit_csv(scope)
+       CSV.generate(headers: true) do |csv|
+         csv << %w[id action actor job_class queue_name item_count created_at]
+         scope.each do |event|
+           csv << [event.id, event.action, event.actor, event.job_class,
+                   event.queue_name, event.item_count, event.created_at.iso8601]
+         end
+       end
+     end
+   end
+ end
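In `set_filters`, the action filter passes through ActiveSupport's `presence_in(AuditEvent::ACTIONS)`. An unrecognised value therefore becomes `nil` and disables that filter instead of reaching the query. Actor and queue are matched exactly after `presence` turns blank strings into `nil`. The following sketch reproduces the filtering over an in-memory array. `whitelist` and `presence` are hypothetical plain-Ruby stand-ins for the ActiveSupport methods, and `Event` replaces the ActiveRecord model:

```ruby
# Action names copied from SolidQueueWeb::AuditEvent::ACTIONS in this release.
ACTIONS = %w[
  job_discarded jobs_discarded
  failed_job_retried failed_jobs_retried
  failed_job_discarded failed_jobs_discarded
  queue_paused queue_resumed
].freeze

Event = Struct.new(:action, :actor, :queue_name, keyword_init: true)

# Plain-Ruby equivalent of Object#presence_in: the value if allowed, else nil.
def whitelist(value, allowed)
  allowed.include?(value) ? value : nil
end

# Plain-Ruby equivalent of String#presence: nil for nil or blank strings.
def presence(value)
  value.nil? || value.strip.empty? ? nil : value
end

# Hypothetical in-memory version of AuditController#audit_scope:
# each filter narrows the result only when it is set.
def filter_events(events, params)
  action = whitelist(params[:action_filter], ACTIONS)
  actor  = presence(params[:actor])
  queue  = presence(params[:queue])

  events.select do |event|
    (action.nil? || event.action == action) &&
      (actor.nil? || event.actor == actor) &&
      (queue.nil? || event.queue_name == queue)
  end
end
```

Passing `action_filter: "drop_table"` returns every event, because the unknown value is discarded rather than matched.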
data/app/controllers/solid_queue_web/dashboard_controller.rb CHANGED
@@ -4,6 +4,8 @@ module SolidQueueWeb
        @stats = DashboardStats.new
        AlertWebhook.call(failure_count: @stats.counts[:failed])
        QueueDepthAlert.call
+       SlowJobAlert.call
+       StaleProcessAlert.call
      end
    end
  end
data/app/controllers/solid_queue_web/failed_jobs/selections_controller.rb CHANGED
@@ -6,6 +6,7 @@ module SolidQueueWeb
          executions = SolidQueue::FailedExecution.where(id: ids)
          jobs = executions.includes(:job).map(&:job)
          SolidQueue::FailedExecution.retry_all(jobs)
+         record_audit("failed_jobs_retried", item_count: jobs.size)
          redirect_to failed_jobs_path,
            notice: "#{jobs.size} #{"job".pluralize(jobs.size)} queued for retry."
        rescue => e
@@ -17,6 +18,7 @@ module SolidQueueWeb
          executions = SolidQueue::FailedExecution.where(id: ids)
          jobs = executions.includes(:job).map(&:job)
          SolidQueue::FailedExecution.discard_all_from_jobs(jobs)
+         record_audit("failed_jobs_discarded", item_count: jobs.size)
          redirect_to failed_jobs_path,
            notice: "#{jobs.size} #{"job".pluralize(jobs.size)} discarded."
        rescue => e
data/app/controllers/solid_queue_web/failed_jobs_controller.rb CHANGED
@@ -37,7 +37,9 @@ module SolidQueueWeb
 
      def perform_discard(executions)
        jobs = executions.map(&:job)
+       action = params[:id] ? "failed_job_discarded" : "failed_jobs_discarded"
        SolidQueue::FailedExecution.discard_all_from_jobs(jobs)
+       record_audit(action, job_class: jobs.first&.class_name, queue_name: jobs.first&.queue_name, item_count: jobs.size)
        redirect_to failed_jobs_path(queue: @queue, q: @search, period: @period),
          notice: "#{jobs.size} #{"job".pluralize(jobs.size)} discarded."
      end
data/app/controllers/solid_queue_web/jobs/selections_controller.rb CHANGED
@@ -9,6 +9,7 @@ module SolidQueueWeb
          ids = Array(params[:ids]).map(&:to_i).reject(&:zero?)
          jobs = model.where(id: ids).includes(:job).map(&:job)
          model.discard_all_from_jobs(jobs)
+         record_audit("jobs_discarded", item_count: jobs.size)
          redirect_to jobs_path(status: status, period: period),
            notice: "#{jobs.size} #{"job".pluralize(jobs.size)} discarded."
        rescue ArgumentError => e
data/app/controllers/solid_queue_web/jobs_controller.rb CHANGED
@@ -30,7 +30,9 @@ module SolidQueueWeb
        model = Job.execution_model_for!(@status)
        if params[:id]
          @execution = model.find(params[:id])
+         discarded_job = @execution.job
          @execution.discard
+         record_audit("job_discarded", job_class: discarded_job&.class_name, queue_name: discarded_job&.queue_name)
          @remaining_count = filtered_scope(model).count
          respond_to do |format|
            format.turbo_stream
@@ -39,6 +41,7 @@ module SolidQueueWeb
        else
          jobs = filtered_scope(model).map(&:job)
          model.discard_all_from_jobs(jobs)
+         record_audit("jobs_discarded", item_count: jobs.size)
          redirect_to jobs_return_path, notice: "#{jobs.size} #{"job".pluralize(jobs.size)} discarded."
        end
      rescue ArgumentError => e
@@ -82,10 +85,14 @@ module SolidQueueWeb
 
      def jobs_csv(scope)
        CSV.generate(headers: true) do |csv|
-         csv << %w[id class_name queue_name status priority enqueued_at]
+         headers = %w[id class_name queue_name status priority enqueued_at]
+         headers << "wait_time_seconds" if @status == "claimed"
+         csv << headers
          scope.each do |execution|
            job = execution.job
-           csv << [job.id, job.class_name, job.queue_name, @status, job.priority, job.created_at.iso8601]
+           row = [job.id, job.class_name, job.queue_name, @status, job.priority, job.created_at.iso8601]
+           row << (execution.created_at - job.created_at).to_i if @status == "claimed"
+           csv << row
          end
        end
      end
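For the claimed status, `jobs_csv` now appends `wait_time_seconds`. The value is the claimed execution's `created_at` (pickup) minus the job's `created_at` (enqueue), truncated toward zero by `to_i`. Below is a standalone sketch using Ruby's `csv` library. The `Job` and `Execution` structs are hypothetical stand-ins for the SolidQueue models:

```ruby
require "csv"
require "time"

# Hypothetical stand-ins for SolidQueue::Job and a claimed execution row.
Job = Struct.new(:id, :class_name, :queue_name, :priority, :created_at)
Execution = Struct.new(:job, :created_at)

def jobs_csv(executions, status)
  CSV.generate do |csv|
    headers = %w[id class_name queue_name status priority enqueued_at]
    headers << "wait_time_seconds" if status == "claimed"
    csv << headers

    executions.each do |execution|
      job = execution.job
      row = [job.id, job.class_name, job.queue_name, status, job.priority, job.created_at.iso8601]
      # Whole seconds between enqueue (job row created) and pickup (claim created).
      row << (execution.created_at - job.created_at).to_i if status == "claimed"
      csv << row
    end
  end
end
```

A job enqueued at 08:00:00 UTC and claimed 42.7 seconds later produces the row `1,ReportJob,default,claimed,0,2026-05-28T08:00:00Z,42`. Other statuses keep the original six columns.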
data/app/controllers/solid_queue_web/queues/pauses_controller.rb CHANGED
@@ -4,6 +4,7 @@ module SolidQueueWeb
        def create
          queue = SolidQueue::Queue.find_by_name(params[:queue_name])
          queue.pause
+         record_audit("queue_paused", queue_name: queue.name)
          redirect_to queues_path, notice: "Queue \"#{queue.name}\" paused."
        rescue => e
          redirect_to queues_path, alert: "Could not pause queue: #{e.message}"
@@ -12,6 +13,7 @@ module SolidQueueWeb
        def destroy
          queue = SolidQueue::Queue.find_by_name(params[:queue_name])
          queue.resume
+         record_audit("queue_resumed", queue_name: queue.name)
          redirect_to queues_path, notice: "Queue \"#{queue.name}\" resumed."
        rescue => e
          redirect_to queues_path, alert: "Could not resume queue: #{e.message}"
data/app/controllers/solid_queue_web/queues_controller.rb CHANGED
@@ -7,6 +7,11 @@ module SolidQueueWeb
        @failed_24h = stats.failed_24h
        @oldest_ready = stats.oldest_ready
        @failure_sparklines = stats.failure_sparklines
+       @queue_sizes = SolidQueue::ReadyExecution
+         .joins(:job)
+         .group("solid_queue_jobs.queue_name")
+         .count
+       @paused_queue_names = SolidQueue::Pause.pluck(:queue_name).to_set
      end
    end
  end
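The queues page now computes every ready-execution count with one grouped query and loads paused queue names once into a `Set`. Previously it made per-queue `queue.size` and `queue.paused?` calls from the view. Queues with no ready executions are absent from the grouped result, which is why the view falls back to `@queue_sizes[queue.name] || 0`. A plain-Ruby sketch of the same shape follows, with `tally` standing in for `GROUP BY ... COUNT`; the queue data is made up for illustration:

```ruby
require "set"

# Hypothetical rows: the queue_name of each ready execution's job, as the
# grouped query would see them, plus the rows of the pauses table.
ready_queue_names = %w[default default critical default mailers]
paused_rows = ["mailers"]
all_queues = %w[critical default mailers low]

# One pass over the rows, like GROUP BY solid_queue_jobs.queue_name + COUNT.
queue_sizes = ready_queue_names.tally
# One lookup structure for pause state instead of a check per row.
paused_queue_names = paused_rows.to_set

summary = all_queues.map do |name|
  # Queues with no ready executions are missing from the tally; default to 0.
  [name, queue_sizes[name] || 0, paused_queue_names.include?(name) ? "Paused" : "Running"]
end
```

Here `summary` contains `["low", 0, "Running"]` and `["mailers", 1, "Paused"]`.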
data/app/controllers/solid_queue_web/retry_failed_jobs_controller.rb CHANGED
@@ -16,6 +16,8 @@ module SolidQueueWeb
        else
          SolidQueue::FailedExecution.retry_all(jobs)
        end
+       action = params[:id] ? "failed_job_retried" : "failed_jobs_retried"
+       record_audit(action, job_class: jobs.first&.class_name, queue_name: jobs.first&.queue_name, item_count: jobs.size)
        redirect_to failed_jobs_path(queue: @queue, q: @search, period: @period),
          notice: retry_notice(jobs.size)
      rescue ArgumentError => e
data/app/models/solid_queue_web/audit_event.rb ADDED
@@ -0,0 +1,17 @@
+ module SolidQueueWeb
+   class AuditEvent < ApplicationRecord
+     self.table_name = "solid_queue_web_audit_events"
+
+     ACTIONS = %w[
+       job_discarded jobs_discarded
+       failed_job_retried failed_jobs_retried
+       failed_job_discarded failed_jobs_discarded
+       queue_paused queue_resumed
+     ].freeze
+
+     validates :action, presence: true, inclusion: { in: ACTIONS }
+     validates :item_count, numericality: { greater_than: 0 }
+
+     scope :recent, -> { order(created_at: :desc) }
+   end
+ end
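The model requires `action` to be one of `ACTIONS` and `item_count` to be greater than zero. `record_audit` rescues every exception raised by `create!`. Taken together, a bulk action that reaches `record_audit` with an empty `jobs` list (so `item_count: 0`) fails validation and is only logged, not stored. A plain-Ruby sketch of that interaction follows. `audit_valid?`, `record_audit_sketch`, and the in-memory `AUDIT_LOG` are hypothetical stand-ins for the model validations, the controller helper, and the table:

```ruby
# Action names copied from SolidQueueWeb::AuditEvent::ACTIONS.
AUDIT_ACTIONS = %w[
  job_discarded jobs_discarded
  failed_job_retried failed_jobs_retried
  failed_job_discarded failed_jobs_discarded
  queue_paused queue_resumed
].freeze

# In-memory stand-in for the solid_queue_web_audit_events table.
AUDIT_LOG = []

# Mirrors the two validations: inclusion in ACTIONS and item_count > 0.
def audit_valid?(action, item_count)
  AUDIT_ACTIONS.include?(action) && item_count.is_a?(Integer) && item_count.positive?
end

# Mirrors record_audit: raise on invalid input (like create!), but rescue
# everything so the calling action is never interrupted.
def record_audit_sketch(action, item_count: 1)
  raise ArgumentError, "invalid audit event #{action.inspect} (item_count=#{item_count})" unless audit_valid?(action, item_count)

  AUDIT_LOG << { action: action, item_count: item_count }
rescue => e
  warn("[sketch] Audit log failed: #{e.message}")
  nil
end
```

Calling `record_audit_sketch("jobs_discarded", item_count: 0)` returns `nil` and leaves `AUDIT_LOG` empty. A valid call appends a row.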
data/app/services/solid_queue_web/slow_job_alert.rb ADDED
@@ -0,0 +1,70 @@
+ require "net/http"
+ require "json"
+ require "uri"
+
+ module SolidQueueWeb
+   class SlowJobAlert
+     MUTEX = Mutex.new
+
+     class << self
+       def call
+         return unless configured?
+
+         slow_count = SolidQueue::ClaimedExecution
+           .where("created_at <= ?", SolidQueueWeb.slow_job_threshold.ago)
+           .count
+
+         return if slow_count < SolidQueueWeb.alert_slow_job_count_threshold
+         return unless should_fire?
+
+         urls = webhook_urls
+         Thread.new { urls.each { |url| post(url, slow_count) } }
+       end
+
+       def reset!
+         MUTEX.synchronize { @last_fired_at = nil }
+       end
+
+       private
+
+       def configured?
+         SolidQueueWeb.slow_job_threshold.present? &&
+           SolidQueueWeb.alert_slow_job_count_threshold.present? &&
+           webhook_urls.any?
+       end
+
+       def webhook_urls
+         Array(SolidQueueWeb.alert_webhook_url).flatten.compact.select(&:present?)
+       end
+
+       def should_fire?
+         MUTEX.synchronize do
+           cooldown = SolidQueueWeb.alert_webhook_cooldown
+           return false if @last_fired_at && Time.current - @last_fired_at < cooldown
+
+           @last_fired_at = Time.current
+           true
+         end
+       end
+
+       def post(url_string, slow_count)
+         uri = URI.parse(url_string)
+         payload = JSON.generate(
+           event: "slow_job_threshold_exceeded",
+           slow_job_count: slow_count,
+           threshold: SolidQueueWeb.alert_slow_job_count_threshold,
+           fired_at: Time.current.iso8601
+         )
+         http = Net::HTTP.new(uri.host, uri.port)
+         http.use_ssl = uri.scheme == "https"
+         http.open_timeout = 5
+         http.read_timeout = 10
+         request = Net::HTTP::Post.new(uri.path.presence || "/", "Content-Type" => "application/json")
+         request.body = payload
+         http.request(request)
+       rescue => e
+         Rails.logger.error("[SolidQueueWeb] Slow job alert webhook failed: #{e.message}")
+       end
+     end
+   end
+ end
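`should_fire?` in `SlowJobAlert` (and its twin in `StaleProcessAlert`) keeps one `@last_fired_at` per alert class behind a `Mutex`. The first qualifying call fires, later calls within `alert_webhook_cooldown` seconds return `false`, and `reset!` clears the timestamp. Because the timestamp lives in process memory, each web server process (for example, each Puma worker) tracks its own cooldown, and a restart clears it. A standalone sketch of the same gate follows. `CooldownGate` and its injectable `clock` are hypothetical additions for testability; the gem itself calls `Time.current` directly:

```ruby
# Hypothetical stand-alone version of the should_fire?/reset! pair in
# SlowJobAlert and StaleProcessAlert, with an injectable clock.
class CooldownGate
  def initialize(cooldown_seconds, clock: -> { Time.now })
    @cooldown = cooldown_seconds
    @clock = clock
    @mutex = Mutex.new
    @last_fired_at = nil
  end

  # True (and records the time) only if nothing fired within the cooldown.
  # Returning from inside synchronize still releases the lock.
  def should_fire?
    @mutex.synchronize do
      now = @clock.call
      return false if @last_fired_at && now - @last_fired_at < @cooldown

      @last_fired_at = now
      true
    end
  end

  # Clears the timestamp so the next qualifying call fires immediately.
  def reset!
    @mutex.synchronize { @last_fired_at = nil }
  end
end
```

With a 1800-second cooldown, a call 600 seconds after the first is suppressed. A call exactly 1800 seconds after it fires again, because the comparison is a strict `<`.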
data/app/services/solid_queue_web/stale_process_alert.rb ADDED
@@ -0,0 +1,68 @@
+ require "net/http"
+ require "json"
+ require "uri"
+
+ module SolidQueueWeb
+   class StaleProcessAlert
+     MUTEX = Mutex.new
+
+     class << self
+       def call
+         return unless configured?
+
+         stale_count = SolidQueue::Process
+           .where("last_heartbeat_at < ?", SolidQueue.process_alive_threshold.ago)
+           .count
+
+         return if stale_count < SolidQueueWeb.alert_stale_process_threshold
+         return unless should_fire?
+
+         urls = webhook_urls
+         Thread.new { urls.each { |url| post(url, stale_count) } }
+       end
+
+       def reset!
+         MUTEX.synchronize { @last_fired_at = nil }
+       end
+
+       private
+
+       def configured?
+         SolidQueueWeb.alert_stale_process_threshold.present? && webhook_urls.any?
+       end
+
+       def webhook_urls
+         Array(SolidQueueWeb.alert_webhook_url).flatten.compact.select(&:present?)
+       end
+
+       def should_fire?
+         MUTEX.synchronize do
+           cooldown = SolidQueueWeb.alert_webhook_cooldown
+           return false if @last_fired_at && Time.current - @last_fired_at < cooldown
+
+           @last_fired_at = Time.current
+           true
+         end
+       end
+
+       def post(url_string, stale_count)
+         uri = URI.parse(url_string)
+         payload = JSON.generate(
+           event: "stale_process_detected",
+           stale_process_count: stale_count,
+           threshold: SolidQueueWeb.alert_stale_process_threshold,
+           fired_at: Time.current.iso8601
+         )
+         http = Net::HTTP.new(uri.host, uri.port)
+         http.use_ssl = uri.scheme == "https"
+         http.open_timeout = 5
+         http.read_timeout = 10
+         request = Net::HTTP::Post.new(uri.path.presence || "/", "Content-Type" => "application/json")
+         request.body = payload
+         http.request(request)
+       rescue => e
+         Rails.logger.error("[SolidQueueWeb] Stale process alert webhook failed: #{e.message}")
+       end
+     end
+   end
+ end
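`StaleProcessAlert.call` counts process rows whose `last_heartbeat_at` is strictly older than `SolidQueue.process_alive_threshold.ago`. It alerts when that count is at least `alert_stale_process_threshold`. The comparison is strict (`<`), whereas `SlowJobAlert` uses `<=` for claimed executions. A plain-Ruby sketch of the counting rule follows. It uses a hypothetical `ProcessRow` struct and expresses the threshold in seconds (300 here, matching the 5-minute default stated in the README):

```ruby
# Hypothetical stand-in for a solid_queue_processes row.
ProcessRow = Struct.new(:name, :last_heartbeat_at)

# Mirrors the WHERE clause in StaleProcessAlert.call:
#   last_heartbeat_at < process_alive_threshold.ago
def stale_count(processes, alive_threshold_seconds, now: Time.now)
  cutoff = now - alive_threshold_seconds
  processes.count { |process| process.last_heartbeat_at < cutoff }
end

# Alert only when the stale count meets or exceeds the configured threshold.
def stale_alert?(processes, alive_threshold_seconds, alert_threshold, now: Time.now)
  stale_count(processes, alive_threshold_seconds, now: now) >= alert_threshold
end
```

A process whose last heartbeat lands exactly on the cutoff is not counted. One second older and it is.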
data/app/views/layouts/solid_queue_web/application.html.erb CHANGED
@@ -26,6 +26,7 @@
          <li><%= link_to "Recurring", recurring_tasks_path, class: current_page?(recurring_tasks_path) ? "active" : "", aria: { current: current_page?(recurring_tasks_path) ? "page" : nil } %></li>
          <li><%= link_to "Processes", processes_path, class: current_page?(processes_path) ? "active" : "", aria: { current: current_page?(processes_path) ? "page" : nil } %></li>
          <li><%= link_to "Search", search_path, class: current_page?(search_path) ? "active" : "", aria: { current: current_page?(search_path) ? "page" : nil } %></li>
+         <li><%= link_to "Audit", audit_path, class: current_page?(audit_path) ? "active" : "", aria: { current: current_page?(audit_path) ? "page" : nil } %></li>
        </ul>
      </nav>
    </div>
data/app/views/solid_queue_web/audit/index.html.erb ADDED
@@ -0,0 +1,78 @@
+ <h1 class="sqd-page-title">Audit Log</h1>
+
+ <div class="sqd-page-header">
+   <div class="sqd-filters">
+     <form action="<%= audit_path %>" method="get" style="display: flex; gap: 0.5rem; align-items: center; flex-wrap: wrap;">
+       <select name="action_filter" class="sqd-select" aria-label="Filter by action" onchange="this.form.submit()">
+         <option value="">All actions</option>
+         <% SolidQueueWeb::AuditEvent::ACTIONS.each do |a| %>
+           <option value="<%= a %>" <%= @action_filter == a ? "selected" : "" %>><%= a.tr("_", " ") %></option>
+         <% end %>
+       </select>
+       <% if @actor_filter.present? %>
+         <span class="sqd-badge sqd-badge--muted">Actor: <%= @actor_filter %></span>
+         <%= link_to "×", audit_path(action_filter: @action_filter, queue: @queue_filter), class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
+       <% end %>
+       <% if @queue_filter.present? %>
+         <span class="sqd-badge sqd-badge--muted">Queue: <%= @queue_filter %></span>
+         <%= link_to "×", audit_path(action_filter: @action_filter, actor: @actor_filter), class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
+       <% end %>
+       <% if @action_filter.present? || @actor_filter.present? || @queue_filter.present? %>
+         <%= link_to "Clear", audit_path, class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
+       <% end %>
+     </form>
+   </div>
+   <% if @audit_events.any? %>
+     <div class="sqd-actions">
+       <%= link_to "Export CSV", audit_path(format: :csv, action_filter: @action_filter, actor: @actor_filter, queue: @queue_filter),
+             class: "sqd-btn sqd-btn--muted", data: { turbo: false } %>
+     </div>
+   <% end %>
+ </div>
+
+ <div class="sqd-card">
+   <% if @audit_events.empty? %>
+     <div class="sqd-empty">No audit events recorded.</div>
+   <% else %>
+     <table>
+       <thead>
+         <tr>
+           <th scope="col">Time</th>
+           <th scope="col">Action</th>
+           <th scope="col">Actor</th>
+           <th scope="col">Job Class</th>
+           <th scope="col">Queue</th>
+           <th scope="col">Count</th>
+         </tr>
+       </thead>
+       <tbody>
+         <% @audit_events.each do |event| %>
+           <tr>
+             <td class="sqd-mono"><%= format_timestamp(event.created_at) %></td>
+             <td><span class="sqd-badge sqd-badge--<%= event.action.include?("discard") ? "failed" : event.action.include?("paused") || event.action.include?("resumed") ? "paused" : "ready" %>"><%= event.action.tr("_", " ") %></span></td>
+             <td class="sqd-mono sqd-muted-text">
+               <% if event.actor.present? %>
+                 <%= link_to event.actor, audit_path(action_filter: @action_filter, queue: @queue_filter, actor: event.actor), style: "color: inherit;" %>
+               <% else %>
+                 <span style="color: var(--muted)">—</span>
+               <% end %>
+             </td>
+             <td class="sqd-mono"><%= event.job_class || "—" %></td>
+             <td class="sqd-mono">
+               <% if event.queue_name.present? %>
+                 <%= link_to event.queue_name, audit_path(action_filter: @action_filter, actor: @actor_filter, queue: event.queue_name), style: "color: inherit;" %>
+               <% else %>
+                 <span style="color: var(--muted)">—</span>
+               <% end %>
+             </td>
+             <td><%= event.item_count %></td>
+           </tr>
+         <% end %>
+       </tbody>
+     </table>
+   <% end %>
+ </div>
+
+ <% if @pagy.last > 1 %>
+   <%= @pagy.series_nav.html_safe %>
+ <% end %>
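The action badge in the audit view picks its CSS modifier with a nested ternary. Because `||` binds tighter than `?:` in Ruby, it reads as: discard actions map to `failed`, pause and resume map to `paused`, and everything else (the retries) maps to `ready`. The same logic written as a helper method (the name `audit_badge_modifier` is hypothetical, not part of the gem) makes the precedence explicit:

```ruby
# Hypothetical helper equivalent to the nested ternary in audit/index.html.erb.
def audit_badge_modifier(action)
  if action.include?("discard")
    "failed"
  elsif action.include?("paused") || action.include?("resumed")
    "paused"
  else
    "ready"
  end
end
```

Across the eight actions in `AuditEvent::ACTIONS`, this yields four `failed`, two `ready`, and two `paused` badges.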
data/app/views/solid_queue_web/jobs/index.html.erb CHANGED
@@ -164,6 +164,7 @@
          <%= sort_header_th("Enqueued At", "created_at", sort_url, current_sort: @sort, current_dir: @direction) %>
          <% if @status == "claimed" %>
            <th scope="col">Running For</th>
+           <th scope="col">Wait Time</th>
          <% end %>
        </tr>
      </thead>
@@ -192,6 +193,9 @@
              <td class="sqd-mono<%= slow ? " sqd-slow-duration" : "" %>">
                <%= time_ago_in_words(execution.created_at) %>
              </td>
+             <td class="sqd-mono">
+               <%= format_duration(execution.created_at - job.created_at) %>
+             </td>
            <% end %>
          </tr>
        <% end %>
data/app/views/solid_queue_web/queues/index.html.erb CHANGED
@@ -21,7 +21,7 @@
        <% @queues.each do |queue| %>
          <tr>
            <td class="sqd-mono"><%= queue.name %></td>
-           <td><%= queue.size %></td>
+           <td><%= @queue_sizes[queue.name] || 0 %></td>
            <td>
              <% if (oldest = @oldest_ready[queue.name]) %>
                <% age = Time.current - oldest %>
@@ -52,14 +52,14 @@
              <% end %>
            </td>
            <td>
-             <% if queue.paused? %>
+             <% if @paused_queue_names.include?(queue.name) %>
                <span class="sqd-badge sqd-badge--paused">Paused</span>
              <% else %>
                <span class="sqd-badge sqd-badge--running">Running</span>
              <% end %>
            </td>
            <td class="sqd-row-actions">
-             <% if queue.paused? %>
+             <% if @paused_queue_names.include?(queue.name) %>
                <%= button_to "Resume", queue_pause_path(queue.name), method: :delete,
                  class: "sqd-btn sqd-btn--primary sqd-btn--sm" %>
              <% else %>
data/config/routes.rb CHANGED
@@ -3,6 +3,7 @@ SolidQueueWeb::Engine.routes.draw do
    resource :blocked_jobs, only: [:destroy]
 
    get "metrics", to: "metrics#index", as: :metrics, defaults: { format: :json }
+   get "audit", to: "audit#index", as: :audit
    get "search", to: "search#index", as: :search
    get "history", to: "history#index", as: :history
    get "performance", to: "performance#index", as: :performance
data/db/migrate/01_create_solid_queue_web_audit_events.rb ADDED
@@ -0,0 +1,16 @@
+ class CreateSolidQueueWebAuditEvents < ActiveRecord::Migration[7.1]
+   def change
+     create_table :solid_queue_web_audit_events do |t|
+       t.string :action, null: false
+       t.string :actor
+       t.string :job_class
+       t.string :queue_name
+       t.integer :item_count, null: false, default: 1
+       t.datetime :created_at, null: false
+     end
+
+     add_index :solid_queue_web_audit_events, :created_at
+     add_index :solid_queue_web_audit_events, :action
+     add_index :solid_queue_web_audit_events, :actor
+   end
+ end
data/lib/generators/solid_queue_web/install/migrations_generator.rb ADDED
@@ -0,0 +1,24 @@
+ require "rails/generators"
+ require "rails/generators/active_record"
+
+ module SolidQueueWeb
+   module Install
+     class MigrationsGenerator < Rails::Generators::Base
+       include Rails::Generators::Migration
+
+       source_root File.expand_path("templates", __dir__)
+       desc "Copy SolidQueueWeb migrations to your application."
+
+       def self.next_migration_number(path)
+         ActiveRecord::Generators::Base.next_migration_number(path)
+       end
+
+       def create_migration_file
+         migration_template(
+           "create_solid_queue_web_audit_events.rb.tt",
+           "db/migrate/create_solid_queue_web_audit_events.rb"
+         )
+       end
+     end
+   end
+ end
data/lib/generators/solid_queue_web/install/templates/create_solid_queue_web_audit_events.rb.tt ADDED
@@ -0,0 +1,16 @@
+ class CreateSolidQueueWebAuditEvents < ActiveRecord::Migration[<%= ActiveRecord::Migration.current_version %>]
+   def change
+     create_table :solid_queue_web_audit_events do |t|
+       t.string :action, null: false
+       t.string :actor
+       t.string :job_class
+       t.string :queue_name
+       t.integer :item_count, null: false, default: 1
+       t.datetime :created_at, null: false
+     end
+
+     add_index :solid_queue_web_audit_events, :created_at
+     add_index :solid_queue_web_audit_events, :action
+     add_index :solid_queue_web_audit_events, :actor
+   end
+ end
data/lib/solid_queue_web/version.rb CHANGED
@@ -1,3 +1,3 @@
  module SolidQueueWeb
-   VERSION = "1.3.0"
+   VERSION = "1.5.0"
  end
data/lib/solid_queue_web.rb CHANGED
@@ -6,7 +6,8 @@ module SolidQueueWeb
    class << self
      attr_writer :page_size, :dashboard_refresh_interval, :default_refresh_interval, :search_results_limit,
                  :slow_job_threshold, :alert_webhook_url, :alert_failure_threshold, :alert_webhook_cooldown,
-                 :alert_queue_thresholds, :connects_to, :time_zone
+                 :alert_queue_thresholds, :alert_slow_job_count_threshold, :alert_stale_process_threshold,
+                 :connects_to, :time_zone
 
      def page_size
        @page_size || 25
@@ -44,6 +45,14 @@ module SolidQueueWeb
        @alert_queue_thresholds || {}
      end
 
+     def alert_slow_job_count_threshold
+       @alert_slow_job_count_threshold
+     end
+
+     def alert_stale_process_threshold
+       @alert_stale_process_threshold
+     end
+
      def connects_to
        @connects_to
      end
@@ -60,5 +69,10 @@ module SolidQueueWeb
        @authenticate = block if block_given?
        @authenticate
      end
+
+     def current_actor(&block)
+       @current_actor = block if block_given?
+       @current_actor
+     end
    end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: solid_queue_web
  version: !ruby/object:Gem::Version
-   version: 1.3.0
+   version: 1.5.0
  platform: ruby
  authors:
  - Chuck Smith
@@ -122,6 +122,7 @@ files:
  - app/assets/stylesheets/solid_queue_web/_12_dark_mode.css
  - app/assets/stylesheets/solid_queue_web/application.css
  - app/controllers/solid_queue_web/application_controller.rb
+ - app/controllers/solid_queue_web/audit_controller.rb
  - app/controllers/solid_queue_web/blocked_jobs_controller.rb
  - app/controllers/solid_queue_web/dashboard_controller.rb
  - app/controllers/solid_queue_web/failed_jobs/arguments_controller.rb
@@ -151,6 +152,7 @@ files:
  - app/javascript/solid_queue_web/theme_controller.js
  - app/jobs/solid_queue_web/application_job.rb
  - app/models/solid_queue_web/application_record.rb
+ - app/models/solid_queue_web/audit_event.rb
  - app/models/solid_queue_web/job.rb
  - app/services/solid_queue_web/alert_webhook.rb
  - app/services/solid_queue_web/dashboard_stats.rb
@@ -159,7 +161,10 @@ files:
  - app/services/solid_queue_web/metrics_payload.rb
  - app/services/solid_queue_web/queue_depth_alert.rb
  - app/services/solid_queue_web/queue_stats.rb
+ - app/services/solid_queue_web/slow_job_alert.rb
+ - app/services/solid_queue_web/stale_process_alert.rb
  - app/views/layouts/solid_queue_web/application.html.erb
+ - app/views/solid_queue_web/audit/index.html.erb
  - app/views/solid_queue_web/dashboard/index.html.erb
  - app/views/solid_queue_web/failed_jobs/errors/index.html.erb
  - app/views/solid_queue_web/failed_jobs/index.html.erb
@@ -177,6 +182,9 @@ files:
  - app/views/solid_queue_web/search/index.html.erb
  - config/importmap.rb
  - config/routes.rb
+ - db/migrate/01_create_solid_queue_web_audit_events.rb
+ - lib/generators/solid_queue_web/install/migrations_generator.rb
+ - lib/generators/solid_queue_web/install/templates/create_solid_queue_web_audit_events.rb.tt
  - lib/solid_queue_web.rb
  - lib/solid_queue_web/engine.rb
  - lib/solid_queue_web/version.rb