solid_queue_web 1.1.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: e28c1cc2c32722a5b876083166f98549f60c2788cb5e225d6d6fa1122a2964ae
-   data.tar.gz: b68d4c0cf42091242b1816956e217ae64f471142df5ce974f71772ae8b454d83
+   metadata.gz: e4178da05b230b0990212f6dd74b68f45b18e97e50fd1b732bc54a3243e1d61d
+   data.tar.gz: af6f4c2352ce0076fb441fe90d8e38f233bb3bee99fce040155d34e23af2e0b9
  SHA512:
-   metadata.gz: ba9a31d372ecf83a97b0ed708ce8209ea5b24e9e042d11f7ff61ce54925bef988d0a6e12a7337c9757051268210279bd46eb6e011b6008d602832d340448b0db
-   data.tar.gz: 0c41966b6dab2b47e4049b0afbf884d016d62fb78feb424934738900d4f676978c05c0129c7b02b31cc2eb47caf7770e090ad2921f8fec56e8854bc307b7f1df
+   metadata.gz: '0751948c9e66b3b5d0bbd3bb47007121466ebec956b523b2d3c3b1e16df399005d4acc823b2a1196f75a9511cc5167f135ba8bbb83f6e5c3778f405accc62e3f'
+   data.tar.gz: 422342bb5b419181d5ed4612cf2b06f907f5dcf540d43bb88973b145d1b08b5e4afd9e6319af5249dd6d16351480ab5440b54b111390f6692fc1209a217cfb4d
data/README.md CHANGED
@@ -8,6 +8,8 @@
 
  A monitoring and management dashboard for [Solid Queue](https://github.com/rails/solid_queue), mountable as a Rails engine in any app.
 
+ > **Note:** Development of this gem will continue, but if you need a unified dashboard that covers **Solid Queue**, **Solid Cable**, and **Solid Cache** in a single interface, check out [solid_stack_web](https://github.com/eclectic-coding/solid_stack_web).
+
  ![SolidQueueWeb dashboard](docs/solid-queue-web.png)
 
  ## The problem
@@ -52,7 +54,9 @@ SolidQueueWeb surfaces all of this in a browser UI available at any route you ch
  - **CSV export** — "Export CSV" button on the jobs, failed jobs, and history pages downloads all records matching the current filters; columns are tailored per view
  - **Slow job detection** — when `slow_job_threshold` is configured, claimed jobs running longer than the threshold are flagged with an orange row, a "slow" badge, and a "Running For" duration column on the Running tab; a "Slow Jobs" warning card appears on the dashboard with a link to the Running tab
  - **Webhook alerts** — set `alert_webhook_url` and `alert_failure_threshold` to receive a POST request whenever the failed job count meets or exceeds the threshold; fires asynchronously so dashboard performance is unaffected; a configurable cooldown (default 1 h) prevents repeated alerts while the count stays elevated
- - **Performance analytics** — per-job-class statistics at `/jobs/performance` showing run count, average, p50, p95, min, and max duration; sorted by p95 descending so the slowest classes surface first; period filter scopes to 1h / 24h / 7d or all time; each class name links to the filtered History view
+ - **Performance analytics** — per-job-class statistics at `/jobs/performance` showing run count, average, p50, p95, p99, standard deviation, min, and max duration; sorted by p95 descending so the slowest classes surface first; high std dev surfaces inconsistent jobs worth investigating; period filter scopes to 1h / 24h / 7d or all time; each class name links to the filtered History view
+ - **Failed job trend chart** — a "Failures — Last 12 Hours" bar chart on the dashboard shows failures per hour over the last 12 hours; bars are red, making failure spikes visible before clicking into the failed jobs list
+ - **Error frequency report** — `GET /jobs/failed_jobs/errors` groups all failed jobs by error class and message prefix, shows a count per group, and surfaces a sample backtrace in an expandable row; sorted by count descending so the most common errors appear first; accessible via the "Error Summary" button on the Failed Jobs page
  - **Metrics / health endpoint** — `GET /jobs/metrics.json` returns a machine-readable JSON document with job counts, throughput, per-queue depth and pause state, and process health summary; suitable for Prometheus scraping, uptime monitors, or external dashboards; `slow_jobs` count included when `slow_job_threshold` is configured
 
  ## Compatibility
@@ -75,6 +75,13 @@
 
  .sqd-pre--muted { color: var(--muted); }
 
+ .sqd-error-details summary {
+   cursor: pointer;
+   list-style: none;
+ }
+ .sqd-error-details summary::-webkit-details-marker { display: none; }
+ .sqd-error-details .sqd-pre { margin-top: 0.5rem; }
+
  .sqd-error-header {
    font-size: 13px;
    padding: 0.5rem 0.75rem;
@@ -94,4 +94,8 @@
 
  .sqd-sparkline__bar--depth {
    background: var(--purple);
+ }
+
+ .sqd-sparkline__bar--failure {
+   background: var(--danger);
  }
@@ -0,0 +1,9 @@
+ module SolidQueueWeb
+   module FailedJobs
+     class ErrorsController < ApplicationController
+       def index
+         @groups = ErrorFrequencyReport.new.groups
+       end
+     end
+   end
+ end
@@ -1,6 +1,6 @@
  module SolidQueueWeb
    class DashboardStats
-     attr_reader :counts, :throughput, :sparkline, :depth_sparkline, :slow_jobs_count
+     attr_reader :counts, :throughput, :sparkline, :depth_sparkline, :failure_sparkline, :slow_jobs_count
 
      def initialize
        @now = Time.current
@@ -32,6 +32,13 @@ module SolidQueueWeb
          finished_times.count { |t| t >= from && t < to }
        end
 
+       failed_times = SolidQueue::FailedExecution.where(created_at: 12.hours.ago..@now).pluck(:created_at)
+       @failure_sparkline = 12.times.map do |i|
+         from = (12 - i).hours.ago
+         to = i == 11 ? @now : (11 - i).hours.ago
+         failed_times.count { |t| t >= from && t < to }
+       end
+
        threshold = SolidQueueWeb.slow_job_threshold
        @slow_jobs_count = threshold ? SolidQueue::ClaimedExecution.where("created_at <= ?", threshold.ago).count : 0
 
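The hourly bucketing in the hunk above can be sketched in plain Ruby without ActiveSupport. This is a minimal illustration, not the gem's code: `hourly_buckets` and `HOUR` are hypothetical names, and unlike the hunk, every window boundary is derived from a single `now` value so the result is reproducible.

```ruby
# Hypothetical helper, not part of solid_queue_web.
HOUR = 3600 # seconds

# Count timestamps into `buckets` consecutive one-hour windows ending at `now`.
# Index 0 is the oldest hour; each window is half-open: from <= t < to.
def hourly_buckets(times, now:, buckets: 12)
  Array.new(buckets) do |i|
    from = now - (buckets - i) * HOUR
    to   = now - (buckets - 1 - i) * HOUR
    times.count { |t| t >= from && t < to }
  end
end

now   = Time.utc(2024, 1, 1, 12, 0, 0)
times = [now - 30 * 60, now - 45 * 60, now - 5 * HOUR - 1]

p hourly_buckets(times, now: now)
# → [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2]
```

Because the upper bound is exclusive, a timestamp exactly equal to `now` falls outside the last window, which matches the `t < to` comparison in the hunk.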
@@ -0,0 +1,34 @@
+ module SolidQueueWeb
+   class ErrorFrequencyReport
+     Row = Data.define(:exception_class, :message_prefix, :count, :sample_backtrace)
+
+     MESSAGE_LIMIT = 120
+
+     def groups
+       SolidQueue::FailedExecution
+         .order(created_at: :desc)
+         .each_with_object({}) do |execution, acc|
+           key = [execution.exception_class.to_s, message_prefix(execution.message)]
+           entry = acc[key] ||= { count: 0, sample_backtrace: nil }
+           entry[:count] += 1
+           entry[:sample_backtrace] ||= execution.backtrace
+         end
+         .map do |(exception_class, prefix), data|
+           Row.new(
+             exception_class: exception_class,
+             message_prefix: prefix,
+             count: data[:count],
+             sample_backtrace: data[:sample_backtrace]
+           )
+         end
+         .sort_by { |row| -row.count }
+     end
+
+     private
+
+     def message_prefix(message)
+       return "" if message.nil?
+       message.length > MESSAGE_LIMIT ? "#{message[0, MESSAGE_LIMIT]}…" : message
+     end
+   end
+ end
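The grouping performed by `ErrorFrequencyReport#groups` can be tried outside Rails with a small stand-in. The sketch below is an assumption-laden illustration: `Failure` is a hypothetical Struct replacing `SolidQueue::FailedExecution`, `error_groups` is a hypothetical name, and `group_by` is swapped in for the `each_with_object` accumulator used in the hunk.

```ruby
# Hypothetical stand-in for a failed-execution record; not part of the gem.
Failure = Struct.new(:exception_class, :message, :backtrace, keyword_init: true)

MESSAGE_LIMIT = 120

# Truncate long messages so near-identical errors share one grouping key.
def message_prefix(message)
  return "" if message.nil?
  message.length > MESSAGE_LIMIT ? "#{message[0, MESSAGE_LIMIT]}…" : message
end

# Group by [class, message prefix], keep the first backtrace seen as the
# sample, and sort so the most frequent group comes first.
def error_groups(failures)
  failures
    .group_by { |f| [f.exception_class.to_s, message_prefix(f.message)] }
    .map do |(klass, prefix), members|
      { exception_class: klass, message_prefix: prefix,
        count: members.size, sample_backtrace: members.first.backtrace }
    end
    .sort_by { |group| -group[:count] }
end

failures = [
  Failure.new(exception_class: "Timeout::Error", message: "execution expired", backtrace: ["a.rb:1"]),
  Failure.new(exception_class: "ArgumentError", message: nil, backtrace: nil),
  Failure.new(exception_class: "Timeout::Error", message: "execution expired", backtrace: ["b.rb:2"])
]

error_groups(failures).each { |group| p group }
```

Ruby's `sort_by` is not guaranteed to be stable, so groups with equal counts may appear in any order relative to each other.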
@@ -1,6 +1,6 @@
  module SolidQueueWeb
    class JobPerformanceStats
-     Row = Struct.new(:class_name, :count, :avg, :p50, :p95, :min, :max, keyword_init: true)
+     Row = Struct.new(:class_name, :count, :avg, :p50, :p95, :p99, :std_dev, :min, :max, keyword_init: true)
 
      def initialize(scope)
        @scope = scope
@@ -18,6 +18,8 @@ module SolidQueueWeb
    avg: mean(durations),
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
+   p99: percentile(durations, 99),
+   std_dev: std_dev(durations),
    min: durations.first,
    max: durations.last
  )
@@ -34,5 +36,11 @@ module SolidQueueWeb
        idx = [(pct / 100.0 * sorted.size).ceil - 1, 0].max
        sorted[idx]
      end
+
+     def std_dev(sorted)
+       return 0.0 if sorted.size < 2
+       m = mean(sorted)
+       Math.sqrt(sorted.sum { |x| (x - m)**2 } / sorted.size)
+     end
    end
  end
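A standalone sketch of the statistics in this hunk, useful for checking the arithmetic by hand. `percentile` and `std_dev` follow the lines shown above; the body of `mean` is not visible in the diff, so the version here is an assumption (a plain arithmetic mean).

```ruby
# Assumed implementation; the gem's own `mean` is not shown in the diff.
def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile over an ascending-sorted array.
def percentile(sorted, pct)
  idx = [(pct / 100.0 * sorted.size).ceil - 1, 0].max
  sorted[idx]
end

# Population standard deviation: divides by N rather than N - 1.
def std_dev(sorted)
  return 0.0 if sorted.size < 2
  m = mean(sorted)
  Math.sqrt(sorted.sum { |x| (x - m)**2 } / sorted.size)
end

durations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] # already sorted

puts "p50=#{percentile(durations, 50)} p95=#{percentile(durations, 95)} " \
     "p99=#{percentile(durations, 99)} std_dev=#{std_dev(durations)}"
# → p50=4.0 p95=9.0 p99=9.0 std_dev=2.0
```

With only eight samples, nearest-rank p95 and p99 both land on the maximum, so the two columns begin to differ only once a job class has enough recorded runs (more than 20 for p95 to separate from the maximum, more than 100 for p99).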
@@ -104,6 +104,35 @@
    <% end %>
  </div>
 
+ <% max_failures = [@stats.failure_sparkline.max, 1].max %>
+ <div class="sqd-card" style="margin-bottom: 1rem;">
+   <div class="sqd-card__header">
+     <span class="sqd-card__title">Failures &mdash; Last 12 Hours</span>
+     <div class="sqd-throughput__summary">
+       <span>Total: <strong><%= @stats.failure_sparkline.sum %></strong></span>
+     </div>
+   </div>
+   <% if @stats.failure_sparkline.all?(&:zero?) %>
+     <div class="sqd-sparkline__empty">No failures in the last 12 hours</div>
+   <% else %>
+     <div class="sqd-sparkline" aria-label="Failed jobs per hour over the last 12 hours">
+       <% @stats.failure_sparkline.each_with_index do |count, i| %>
+         <% pct = (count.to_f / max_failures * 100).round %>
+         <% hour_start = (12 - i).hours.ago %>
+         <% show_tick = [0, 3, 6, 9, 11].include?(i) %>
+         <div class="sqd-sparkline__col">
+           <div class="sqd-sparkline__bar-wrap">
+             <div class="sqd-sparkline__bar sqd-sparkline__bar--failure"
+                  style="height: <%= [pct, 3].max %>%"
+                  title="<%= hour_start.strftime('%-I%p').downcase %>: <%= count %> <%= "failure".pluralize(count) %>"></div>
+           </div>
+           <div class="sqd-sparkline__tick"><%= show_tick ? (i == 11 ? "now" : hour_start.strftime("%-I%p").downcase) : "" %></div>
+         </div>
+       <% end %>
+     </div>
+   <% end %>
+ </div>
+
  <div style="display:grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr)); gap: 1rem;">
    <div class="sqd-card">
      <div class="sqd-card__header">
@@ -0,0 +1,44 @@
+ <div class="sqd-page-header">
+   <h1 class="sqd-page-title">Error Summary</h1>
+   <div class="sqd-actions">
+     <%= link_to "← Failed Jobs", failed_jobs_path, class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
+   </div>
+ </div>
+
+ <% if @groups.any? %>
+   <div class="sqd-card">
+     <table>
+       <thead>
+         <tr>
+           <th scope="col">Error Class</th>
+           <th scope="col">Message</th>
+           <th scope="col" style="text-align: right;">Count</th>
+         </tr>
+       </thead>
+       <tbody>
+         <% @groups.each do |group| %>
+           <tr>
+             <td class="sqd-mono"><%= group.exception_class.presence || "—" %></td>
+             <td>
+               <% if group.sample_backtrace.present? %>
+                 <details class="sqd-error-details">
+                   <summary class="sqd-truncate" title="<%= group.message_prefix %>">
+                     <%= group.message_prefix.presence || "—" %>
+                   </summary>
+                   <pre class="sqd-pre sqd-pre--muted"><%= Array(group.sample_backtrace).first(10).join("\n") %></pre>
+                 </details>
+               <% else %>
+                 <span class="sqd-truncate" title="<%= group.message_prefix %>"><%= group.message_prefix.presence || "—" %></span>
+               <% end %>
+             </td>
+             <td style="text-align: right;"><%= group.count %></td>
+           </tr>
+         <% end %>
+       </tbody>
+     </table>
+   </div>
+ <% else %>
+   <div class="sqd-card">
+     <div class="sqd-empty">No failed jobs. All clear!</div>
+   </div>
+ <% end %>
@@ -2,6 +2,7 @@
    <h1 class="sqd-page-title">Failed Jobs</h1>
    <% if @failed_jobs.any? %>
      <div class="sqd-actions">
+       <%= link_to "Error Summary", failed_job_errors_path, class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
        <%= link_to "Export CSV", failed_jobs_path(format: :csv, queue: @queue, q: @search, period: @period),
            class: "sqd-btn sqd-btn--muted", data: { turbo: false } %>
        <%= button_to "Retry All", retry_all_failed_jobs_path,
@@ -21,6 +21,8 @@
    <th scope="col" style="text-align: right;">Avg</th>
    <th scope="col" style="text-align: right;">p50</th>
    <th scope="col" style="text-align: right;">p95</th>
+   <th scope="col" style="text-align: right;">p99</th>
+   <th scope="col" style="text-align: right;">Std Dev</th>
    <th scope="col" style="text-align: right;">Min</th>
    <th scope="col" style="text-align: right;">Max</th>
  </tr>
@@ -36,6 +38,8 @@
    <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.avg) %></td>
    <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p50) %></td>
    <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p95) %></td>
+   <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p99) %></td>
+   <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.std_dev) %></td>
    <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.min) %></td>
    <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.max) %></td>
  </tr>
data/config/routes.rb CHANGED
@@ -35,6 +35,8 @@ SolidQueueWeb::Engine.routes.draw do
      end
    end
 
+   get "failed_jobs/errors", to: "failed_jobs/errors#index", as: :failed_job_errors
+
    resource :failed_job_selection, path: "failed_jobs/selection", only: [:create, :destroy],
      controller: "failed_jobs/selections"
    resources :failed_jobs, only: [:index, :destroy] do
@@ -1,3 +1,3 @@
  module SolidQueueWeb
-   VERSION = "1.1.0"
+   VERSION = "1.2.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: solid_queue_web
  version: !ruby/object:Gem::Version
-   version: 1.1.0
+   version: 1.2.0
  platform: ruby
  authors:
  - Chuck Smith
@@ -125,6 +125,7 @@ files:
  - app/controllers/solid_queue_web/blocked_jobs_controller.rb
  - app/controllers/solid_queue_web/dashboard_controller.rb
  - app/controllers/solid_queue_web/failed_jobs/arguments_controller.rb
+ - app/controllers/solid_queue_web/failed_jobs/errors_controller.rb
  - app/controllers/solid_queue_web/failed_jobs/selections_controller.rb
  - app/controllers/solid_queue_web/failed_jobs_controller.rb
  - app/controllers/solid_queue_web/history_controller.rb
@@ -152,12 +153,14 @@ files:
  - app/models/solid_queue_web/job.rb
  - app/services/solid_queue_web/alert_webhook.rb
  - app/services/solid_queue_web/dashboard_stats.rb
+ - app/services/solid_queue_web/error_frequency_report.rb
  - app/services/solid_queue_web/job_performance_stats.rb
  - app/services/solid_queue_web/metrics_payload.rb
  - app/services/solid_queue_web/queue_depth_alert.rb
  - app/services/solid_queue_web/queue_stats.rb
  - app/views/layouts/solid_queue_web/application.html.erb
  - app/views/solid_queue_web/dashboard/index.html.erb
+ - app/views/solid_queue_web/failed_jobs/errors/index.html.erb
  - app/views/solid_queue_web/failed_jobs/index.html.erb
  - app/views/solid_queue_web/history/index.html.erb
  - app/views/solid_queue_web/jobs/destroy.turbo_stream.erb